DNN based multi-speaker speech synthesis with temporal auxiliary speaker ID embedding

Author
장준혁
Keywords
Deep learning; Sequence to sequence; Speech synthesis; Multi speaker speech synthesis
Issue Date
2019-01
Publisher
IEEE/ICEIC
Citation
ICEIC 2019 - International Conference on Electronics, Information, and Communication, 8706390
Abstract
In this paper, multi-speaker speech synthesis using a speaker embedding is proposed. The proposed model is based on the Tacotron network, but its post-processing network is modified with dilated convolution layers, as used in the WaveNet architecture, to make it better suited to speech. The model can generate multiple speakers' voices with a single neural network by feeding an auxiliary input, the speaker-ID embedding, to the network. The model successfully generates two speakers' voices without significant deterioration of speech quality. © 2019 Institute of Electronics and Information Engineers (IEIE).
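The two ingredients the abstract names can be illustrated in a few lines: a causal dilated 1-D convolution (the WaveNet-style layer used in the modified post-processing network) and the broadcasting of a fixed speaker-ID embedding across every time frame as an auxiliary input. This is a minimal NumPy sketch under those assumptions, not the authors' implementation; all function names are illustrative.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Causal dilated 1-D convolution, as in WaveNet's convolution stacks.
    x: (T,) input signal; kernel: (K,) filter taps; dilation: gap between taps.
    Left-pads with zeros so the output has the same length as the input."""
    K = len(kernel)
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    out = np.zeros(len(x))
    for t in range(len(x)):
        for k in range(K):
            # Tap k looks k * dilation steps into the past (causal).
            out[t] += kernel[k] * xp[t + pad - k * dilation]
    return out

def condition_on_speaker(frames, speaker_emb):
    """Concatenate a fixed speaker-ID embedding to every time frame,
    turning one network into a multi-speaker model via an auxiliary input.
    frames: (T, D) acoustic features; speaker_emb: (E,) embedding vector."""
    T = frames.shape[0]
    tiled = np.tile(speaker_emb, (T, 1))          # (T, E): same vector each frame
    return np.concatenate([frames, tiled], axis=1)  # (T, D + E)

# Example: kernel [1, 1] with dilation 2 sums each sample with the one
# two steps earlier; a 4-dim speaker embedding widens (3, 2) frames to (3, 6).
y = dilated_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2)  # [1, 2, 4, 6]
z = condition_on_speaker(np.zeros((3, 2)), np.ones(4))            # shape (3, 6)
```

Swapping the embedding vector at synthesis time selects the target speaker, which is how a single set of network weights can serve several voices.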
URI
https://ieeexplore.ieee.org/document/8706390
https://repository.hanyang.ac.kr/handle/20.500.11754/122095
ISBN
978-899500444-9
DOI
10.23919/ELINFOCOM.2019.8706390
Appears in Collections:
COLLEGE OF ENGINEERING[S](공과대학) > ELECTRONIC ENGINEERING(융합전자공학부) > Articles
Files in This Item:
There are no files associated with this item.



