TTS-driven Data Augmentation with transliteration for non-native English ASR
- Author
- 박민수
- Advisor(s)
- 김태욱
- Issue Date
- 2023. 2
- Publisher
- Hanyang University (한양대학교)
- Degree
- Master
- Abstract
- Automatic speech recognition (ASR) systems work well on native (L1) English but poorly on non-native (L2) English, because recent state-of-the-art ASR systems focus on native English. To reduce the performance gap between L1 and L2 English, training data from non-native speakers is needed. However, both unlabeled and labeled data are hard to obtain from publicly available datasets. Speech synthesis (text-to-speech, TTS) can be used to build ASR training datasets and address low-resource problems, although traditional speech synthesis systems focus on generating native-language speech. In this paper, we present a novel way to synthesize non-native speech by combining transliteration with native TTS systems. We also investigate the influence of synthetic L2 English and synthetic L1 English data on L2 English performance. Our best model, trained on synthetic L2 and authentic L2 datasets, achieves a ~53.34% relative word error rate (WER) reduction compared to the traditional ASR system. In few-shot settings, the model trained with additional synthetic L2 English shows a ~31.45% relative WER reduction compared to the model trained on 10 minutes of authentic L2 English.
- URI
- http://hanyang.dcollection.net/common/orgView/200000651793
- https://repository.hanyang.ac.kr/handle/20.500.11754/179599
- Appears in Collections:
- GRADUATE SCHOOL OF APPLIED ARTIFICIAL INTELLIGENCE (인공지능융합대학원) > DEPARTMENT OF ARTIFICIAL INTELLIGENCE SYSTEMS (인공지능시스템학과) > Theses (Master)
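
The abstract reports its results as relative WER reductions. As a reading aid, the sketch below shows one common way to compute WER (word-level edit distance divided by the number of reference words) and a relative reduction between two systems. It is not taken from the thesis: the function names `word_error_rate` and `relative_wer_reduction`, the example sentences, and the WER values 0.30 and 0.15 are all hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction in percent: (baseline - new) / baseline * 100."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


if __name__ == "__main__":
    # Hypothetical sentences: one substitution and one deletion in 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical WERs, not the thesis results.
    print(relative_wer_reduction(0.30, 0.15))
```

Under this definition, a relative reduction compares the two error rates as a ratio, so it says nothing about the absolute WER of either system on its own.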