Document-Level Neural TTS Using Curriculum Learning and Attention Masking

Title
Document-Level Neural TTS Using Curriculum Learning and Attention Masking
Author
장준혁
Keywords
Speech synthesis; document-level neural TTS; curriculum learning; attention masking; Tacotron2; MelGAN; DeepVoice3; ParaNet; MultiSpeech
Issue Date
2021-01
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Citation
IEEE Access, vol. 9, pp. 8954-8960
Abstract
Attention-based end-to-end text-to-speech (TTS) models have brought speech synthesis to a natural, human-like level. However, these models struggle to form a correct attention alignment when the input text is longer than the utterances seen during training, as is the case for document-level text. In this paper, we propose a neural speech synthesis model that can synthesize more than 5 min of speech at once using training data comprising short utterances of less than 10 s. The model can be used for tasks that need to synthesize document-level speech in a single pass, such as a singing voice synthesis (SVS) system or a book reading system. First, through curriculum learning, our model automatically increases the length of the speech trained in each epoch while reducing the batch size, so that long sentences can be trained with a limited graphics processing unit (GPU) capacity. During synthesis, an attention-masking mechanism lets the model use only the context needed at the current time step and masks the rest of the document-level text. A Tacotron2-based speech synthesis model and a duration predictor were used in the experiments, and the results showed that the proposed method can synthesize document-level speech with far lower character error rates and attention error rates, and with higher quality, than the existing model.
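The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the linear growth rule, the fixed frame budget, and the symmetric window are all assumptions made for illustration, and the window center is taken as given (in practice it might come from a duration-based estimate).

```python
import math


def curriculum_schedule(num_epochs, start_len, max_len, frame_budget):
    """Return one (max_frames, batch_size) pair per epoch.

    Assumption: the allowed utterance length grows linearly per epoch, and
    the batch size shrinks so that max_frames * batch_size never exceeds
    a fixed frame_budget (a stand-in for limited GPU capacity).
    """
    schedule = []
    for epoch in range(num_epochs):
        frac = epoch / max(1, num_epochs - 1)
        max_frames = round(start_len + frac * (max_len - start_len))
        batch_size = max(1, frame_budget // max_frames)
        schedule.append((max_frames, batch_size))
    return schedule


def attention_window_mask(num_chars, center, width):
    """Return a list where True marks an input position to be masked.

    Only positions within `width` of `center` stay visible; `center` must
    be a valid index so that at least one position is left unmasked.
    """
    lo = max(0, center - width)
    hi = min(num_chars, center + width + 1)
    return [not (lo <= i < hi) for i in range(num_chars)]


def masked_softmax(scores, mask):
    """Softmax over scores, giving masked positions zero weight."""
    masked = [-math.inf if m else s for s, m in zip(scores, mask)]
    peak = max(masked)  # finite as long as one position is visible
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative numbers only: 5 epochs, lengths from 800 to 4000 frames.
schedule = curriculum_schedule(5, 800, 4000, 32000)

# Keep a 5-character window around position 4 of a 10-character input.
mask = attention_window_mask(10, 4, 2)
weights = masked_softmax([0.0] * 10, mask)
```

In this sketch the batch size falls as the length cap rises, and the attention weights outside the window are exactly zero, so positions far from the current time step cannot attract attention regardless of how long the input is.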
URI
https://ieeexplore.ieee.org/document/9312676/
https://repository.hanyang.ac.kr/handle/20.500.11754/175538
ISSN
2169-3536
DOI
10.1109/ACCESS.2020.3049073
Appears in Collections:
COLLEGE OF ENGINEERING[S](공과대학) > ELECTRONIC ENGINEERING(융합전자공학부) > Articles