
Full metadata record

DC Field | Value | Language
dc.contributor.advisor | 김태욱 | -
dc.contributor.author | 박민수 | -
dc.date.accessioned | 2023-05-11T11:47:46Z | -
dc.date.available | 2023-05-11T11:47:46Z | -
dc.date.issued | 2023-02 | -
dc.identifier.uri | http://hanyang.dcollection.net/common/orgView/200000651793 | en_US
dc.identifier.uri | https://repository.hanyang.ac.kr/handle/20.500.11754/179599 | -
dc.description.abstract | Automatic speech recognition (ASR) systems perform well on native English (L1) but poorly on non-native English (L2), because recent state-of-the-art ASR systems are trained primarily on native English. Reducing the performance gap between L1 and L2 English requires training data from non-native speakers, yet both labeled and unlabeled L2 data are hard to find in publicly available datasets. Speech synthesis (text-to-speech, TTS) can be used to build ASR training datasets and alleviate this low-resource problem, even though traditional speech synthesis systems focus on generating native speech. In this paper, we present a novel way to synthesize non-native speech by combining transliteration with a native TTS system, and we investigate the influence of synthetic L2 English and synthetic L1 English data on L2 English ASR performance (a minimal sketch of this pipeline follows the record below). Our best model, trained on synthetic L2 and authentic L2 data, achieves a ~53.34% relative word error rate (WER) reduction compared to the baseline ASR system. In a few-shot setting, the model trained with additional synthetic L2 English shows a ~31.45% relative WER reduction compared to a model trained on only 10 minutes of authentic L2 English. | -
dc.publisher | 한양대학교 (Hanyang University) | -
dc.title | TTS-driven Data Augmentation with transliteration for non-native English ASR | -
dc.type | Theses | -
dc.contributor.googleauthor | 박민수 | -
dc.sector.campus | S | -
dc.sector.daehak | 인공지능융합대학원 (Graduate School of Artificial Intelligence Convergence) | -
dc.sector.department | 인공지능시스템학과 (Department of Artificial Intelligence Systems) | -
dc.description.degree | Master | -
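
The abstract's core idea is that an unmodified native-language TTS voice can be made to produce foreign-accented (L2) English simply by feeding it English text respelled in the native script. The Python below is a minimal sketch of that pipeline, not the thesis code: the native language (Korean) is inferred from the record's metadata rather than stated in the abstract, and the word-level Hangul table, the names `transliterate`, `synthesize_l2`, and `relative_wer_reduction`, and all numbers in the demo are hypothetical, chosen only to show the pipeline shape and the relative-WER metric the abstract reports.

```python
"""Minimal sketch of the transliteration-driven L2 TTS idea from the abstract.

Everything here is illustrative: the thesis does not publish this code, the
word table is a toy stand-in for a real transliteration model, and the WER
figures in the demo are made up to show the metric's arithmetic.
"""
from typing import Callable

# Toy word-level English -> Hangul transliteration table. A real system would
# use a grapheme- or phoneme-level transliteration model instead of a lookup.
HANGUL = {
    "hello": "헬로",
    "world": "월드",
    "speech": "스피치",
}


def transliterate(english_text: str) -> str:
    """Respell English words in Hangul so a Korean TTS will read them aloud."""
    return " ".join(HANGUL.get(word.lower(), word) for word in english_text.split())


def synthesize_l2(english_text: str, korean_tts: Callable[[str], bytes]) -> bytes:
    """Produce Korean-accented (L2) English speech from an unmodified native
    Korean TTS by handing it the transliterated prompt."""
    return korean_tts(transliterate(english_text))


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    print(transliterate("hello speech world"))  # -> 헬로 스피치 월드
    # Hypothetical WERs chosen only to demonstrate the metric: dropping from
    # 30.0 to 14.0 WER is a ~53.3% relative reduction, the same kind of
    # figure (~53.34%) the abstract reports for its best model.
    print(f"{relative_wer_reduction(30.0, 14.0):.1f}% relative WER reduction")
```

The design point this makes concrete is that no new TTS model needs to be trained: the accent comes entirely from forcing the native voice to pronounce transliterated text, so any off-the-shelf native TTS can be reused to mass-produce L2 training audio for the ASR system.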

