DNN based multi-speaker speech synthesis with temporal auxiliary speaker ID embedding
- Author
- 장준혁
- Keywords
- Deep learning; Sequence to sequence; Speech synthesis; Multi speaker speech synthesis
- Issue Date
- 2019-01
- Publisher
- IEEE/ICEIC
- Citation
- ICEIC 2019 - International Conference on Electronics, Information, and Communication, 8706390
- Abstract
- In this paper, multi-speaker speech synthesis using speaker embeddings is proposed. The proposed model is based on the Tacotron network, but its post-processing network is modified with dilated convolution layers, as used in the WaveNet architecture, to make it more adaptive to speech. The model can generate multiple speakers' voices with a single neural network by feeding an auxiliary input, the speaker embedding, to the network. The model successfully generates two speakers' voices without significant deterioration of speech quality. © 2019 Institute of Electronics and Information Engineers (IEIE).
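The abstract describes two ingredients: conditioning the network on an auxiliary speaker-ID embedding, and a WaveNet-style post-processing network built from dilated convolutions. A minimal NumPy sketch of both ideas follows; it is not the authors' implementation, and all function names, dimensions, and the causal-padding choice are illustrative assumptions.

```python
import numpy as np

def speaker_condition(frames, speaker_id, embedding_table):
    """Concatenate a per-speaker embedding vector to every time frame,
    so one network can be conditioned on the target speaker's identity."""
    emb = embedding_table[speaker_id]                # (emb_dim,)
    tiled = np.tile(emb, (frames.shape[0], 1))       # (T, emb_dim)
    return np.concatenate([frames, tiled], axis=1)   # (T, feat_dim + emb_dim)

def dilated_conv1d(x, weights, dilation):
    """Causal dilated 1-D convolution over the time axis (axis 0),
    the building block of a WaveNet-style post-processing stack."""
    k, in_dim, out_dim = weights.shape               # kernel taps, channels
    pad = (k - 1) * dilation
    xp = np.pad(x, ((pad, 0), (0, 0)))               # left-pad: causal
    T = x.shape[0]
    y = np.zeros((T, out_dim))
    for t in range(T):
        for i in range(k):
            # tap i looks back i * dilation frames; weights[k-1] is "now"
            y[t] += xp[t + pad - i * dilation] @ weights[k - 1 - i]
    return y

# Illustrative sizes: 100 mel frames of 80 dims, 2 speakers, 16-dim embeddings.
rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 80))
table = rng.standard_normal((2, 16))
cond = speaker_condition(frames, speaker_id=1, embedding_table=table)
weights = rng.standard_normal((3, 96, 80))           # kernel 3, 96 -> 80 ch
out = dilated_conv1d(cond, weights, dilation=4)
```

Stacking several such layers with increasing dilation (1, 2, 4, ...) widens the receptive field exponentially while keeping the layer count small, which is the usual motivation for replacing a plain convolutional post-net with dilated layers.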
- URI
- https://ieeexplore.ieee.org/document/8706390
- https://repository.hanyang.ac.kr/handle/20.500.11754/122095
- ISBN
- 978-899500444-9
- DOI
- 10.23919/ELINFOCOM.2019.8706390
- Appears in Collections:
- COLLEGE OF ENGINEERING[S](공과대학) > ELECTRONIC ENGINEERING(융합전자공학부) > Articles
- Files in This Item:
There are no files associated with this item.