TTS-driven Data Augmentation with transliteration for non-native English ASR

Title
TTS-driven Data Augmentation with transliteration for non-native English ASR
Author
박민수
Advisor(s)
김태욱
Issue Date
2023-02
Publisher
Hanyang University
Degree
Master
Abstract
Automatic speech recognition (ASR) systems perform well on native (L1) English but considerably worse on non-native (L2) English, because recent state-of-the-art ASR systems focus on native English. Reducing the performance gap between L1 and L2 English requires training data from non-native speakers. However, such data, whether labeled or unlabeled, is hard to obtain from publicly available datasets. Speech synthesis (text-to-speech, TTS) can be used to build ASR training datasets and address low-resource problems, although traditional speech synthesis systems focus on generating native speech. In this paper, we present a novel way to synthesize non-native speech by combining transliteration with native TTS systems. We also investigate the influence of synthetic L2 English and synthetic L1 English data on L2 English recognition performance. Our best model, trained on synthetic L2 and authentic L2 datasets, achieves a ~53.34% relative word error rate (WER) reduction compared to the traditional ASR system. In few-shot settings, the model trained with additional synthetic L2 English shows a ~31.45% relative WER reduction compared to the model trained on 10 minutes of authentic L2 English.
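The abstract reports its results as relative WER reductions. As a minimal sketch of how such a figure is usually computed, the Python below implements the standard word-level edit-distance definition of WER and the relative reduction between a baseline and an improved system. The function names and all numbers in the usage lines are illustrative assumptions, not taken from the thesis, which does not list its absolute WER values on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (single-row dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction in percent: (baseline - new) / baseline * 100."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# Hypothetical values for illustration only (not results from the thesis).
baseline = 30.0  # assumed baseline WER in percent
improved = 15.0  # assumed WER after adding synthetic data, in percent
reduction = relative_wer_reduction(baseline, improved)
```

With these assumed inputs, a drop from 30% to 15% WER is a 50% relative reduction, even though the absolute reduction is 15 percentage points. The relative figures quoted in the abstract should be read in the same way.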
URI
http://hanyang.dcollection.net/common/orgView/200000651793
https://repository.hanyang.ac.kr/handle/20.500.11754/179599
Appears in Collections:
GRADUATE SCHOOL OF APPLIED ARTIFICIAL INTELLIGENCE[S](인공지능융합대학원) > DEPARTMENT OF ARTIFICIAL INTELLIGENCE SYSTEMS(인공지능시스템학과) > Theses (Master)