Repository at Hanyang University: X-vector based speaker recognition using Aishell Chinese speech dataset

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	최준원	-
dc.contributor.author	우새요	-
dc.date.accessioned	2022-02-22T02:00:00Z	-
dc.date.available	2022-02-22T02:00:00Z	-
dc.date.issued	2022-02	-
dc.identifier.uri	http://hanyang.dcollection.net/common/orgView/200000589746	en_US
dc.identifier.uri	https://repository.hanyang.ac.kr/handle/20.500.11754/167978	-
dc.description.abstract	Speaker recognition, also known as voiceprint recognition, is a technology that identifies a speaker from their voice. It is a fundamental task in speech processing and has wide real-world applications. In recent years, with the rapid development of the Internet and the spread of intelligent devices, speaker recognition has come into wide use. Extracting information that represents the speaker's identity from speech is the theoretical core of speaker recognition. Because of their powerful computing and modeling capabilities, deep neural networks have been applied to natural language and speech processing, computer vision, autonomous driving, and other fields, and introducing them into speaker recognition is an active research topic. With advances in science and technology, speaker recognition systems have achieved impressive performance. In this thesis, we first introduce the background and history of speaker recognition and its theoretical foundations. Second, we analyze traditional and deep learning methods in the field of speaker recognition. Finally, we build an x-vector speaker recognition model based on a TDNN (time-delay neural network) on the Chinese Mandarin speech dataset "Aishell", which is more effective than the traditional model. Compared with the official project, we add a data augmentation step, which greatly reduces the EER (equal error rate) and improves the model's robustness to noise and reverberation.	-
dc.publisher	Hanyang University	-
dc.title	X-vector based speaker recognition using Aishell Chinese speech dataset	-
dc.type	Theses	-
dc.contributor.googleauthor	우새요	-
dc.contributor.alternativeauthor	우새요	-
dc.sector.campus	S	-
dc.sector.daehak	Graduate School	-
dc.sector.department	Electrical Engineering	-
dc.description.degree	Master	-
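The record does not describe the thesis's augmentation pipeline or its scoring back-end, so the following is only a generic sketch under stated assumptions, not the author's method. `add_noise_at_snr` (a hypothetical helper) mixes additive noise into speech at a chosen signal-to-noise ratio, one common form of data augmentation; reverberation (convolving speech with room impulse responses) is not shown. `compute_eer` (also hypothetical) approximates the equal error rate, the operating point where the false-acceptance and false-rejection rates are equal, by trying each observed trial score as a decision threshold.

```python
import math


def add_noise_at_snr(speech, noise, snr_db):
    """Hypothetical helper: mix noise into speech at a target SNR (in dB).

    `speech` and `noise` are equal-length lists of float samples. The noise
    is rescaled so that 10 * log10(P_speech / P_scaled_noise) == snr_db,
    where P is mean power (the mean of the squared samples).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0.0:
        return list(speech)  # all-zero noise: nothing to mix in
    # Gain that brings the noise power down (or up) to p_speech / 10**(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def compute_eer(target_scores, nontarget_scores):
    """Hypothetical helper: approximate the equal error rate (EER).

    A trial is accepted when its score >= threshold. FRR is the fraction of
    target (same-speaker) trials rejected; FAR is the fraction of non-target
    trials accepted. Every observed score is tried as a threshold, and the
    mean of FAR and FRR is returned where the two are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

Sweeping only the observed scores is a coarse approximation; a finer estimate would interpolate between the neighboring thresholds where FAR and FRR cross. A lower EER means fewer errors at that balanced operating point, which is the sense in which the abstract reports that augmentation "reduces the EER".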

Appears in Collections: GRADUATE SCHOOL[S] > ELECTRICAL ENGINEERING > Theses (Master)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

The Hanyang University Repository was built through the National Library of Korea's OAK distribution project.