Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 장준혁 | - |
dc.date.accessioned | 2018-08-30T00:43:29Z | - |
dc.date.available | 2018-08-30T00:43:29Z | - |
dc.date.issued | 2016-07 | - |
dc.identifier.citation | COMPUTER SPEECH AND LANGUAGE (2016), v. 38, Page. 1-12 | en_US |
dc.identifier.issn | 0885-2308 | - |
dc.identifier.issn | 1095-8363 | - |
dc.identifier.uri | https://www.sciencedirect.com/science/article/pii/S0885230815001072?via%3Dihub | - |
dc.identifier.uri | https://repository.hanyang.ac.kr/handle/20.500.11754/74576 | - |
dc.description.abstract | In this paper, we investigate an ensemble of deep neural networks (DNNs) that uses an acoustic environment classification (AEC) technique for statistical model-based voice activity detection (VAD). In statistical model-based VAD, the traditional decision rule is based on the geometric mean of the likelihood ratios or on a support vector machine (SVM), i.e., a shallow model with zero or one hidden layer. Since shallow models cannot exploit the diversity of the feature-space distribution, in the training step we build multiple DNNs, one per noise type, using the parameters of the statistical model-based VAD algorithm. In addition, a separate DNN is designed for the AEC algorithm in order to choose the best DNN for each noise. In the on-line noise-aware VAD step, AEC is first performed on a frame-by-frame basis using this separate DNN to obtain the a posteriori probabilities of each noise type. Given these probabilities, the environmental knowledge allows us to combine the speech presence probabilities derived from the ensemble of DNNs trained for the individual noise types. Our VAD approach was evaluated in terms of objective measures and showed significant improvement over the conventional algorithm. (C) 2015 Elsevier Ltd. All rights reserved. | en_US |
dc.description.sponsorship | This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2014R1A2A1A10049735). This work was also supported by the ICT R&D program of MSIP/IITP [R0126-15-1119, Development of a solution for situation-awareness based on the analysis of speech and environmental sounds]. | en_US |
dc.language.iso | en | en_US |
dc.publisher | ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD | en_US |
dc.subject | Voice activity detection | en_US |
dc.subject | Statistical model | en_US |
dc.subject | Acoustic environment classification | en_US |
dc.subject | Deep neural network | en_US |
dc.subject | Ensemble | en_US |
dc.title | Ensemble of deep neural networks using acoustic environment classification for statistical model-based voice activity detection | en_US |
dc.type | Article | en_US |
dc.relation.volume | 38 | - |
dc.identifier.doi | 10.1016/j.csl.2015.11.003 | - |
dc.relation.page | 1-12 | - |
dc.relation.journal | COMPUTER SPEECH AND LANGUAGE | - |
dc.contributor.googleauthor | Hwang, Inyoung | - |
dc.contributor.googleauthor | Park, Hyung-Min | - |
dc.contributor.googleauthor | Chang, Joon-Hyuk | - |
dc.relation.code | 2016011173 | - |
dc.sector.campus | S | - |
dc.sector.daehak | COLLEGE OF ENGINEERING[S] | - |
dc.sector.department | DEPARTMENT OF ELECTRONIC ENGINEERING | - |
dc.identifier.pid | jchang | - |
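The combination step described in the abstract — frame-wise AEC posteriors weighting the speech presence probabilities of noise-specific DNNs — can be sketched as a simple weighted sum. This is a minimal illustrative sketch, not the authors' implementation: all function names, array shapes, and the weighted-sum rule itself are assumptions inferred from the abstract's description.

```python
import numpy as np

def combine_speech_presence(aec_posteriors, vad_probs):
    """Combine per-noise VAD outputs using AEC posteriors (hypothetical sketch).

    aec_posteriors: (T, K) array, per-frame posterior over K noise types
                    (each row sums to 1), from the separate AEC DNN.
    vad_probs:      (K, T) array, speech presence probability per frame
                    from each of the K noise-specific VAD DNNs.
    Returns:        (T,) combined speech presence probability per frame.
    """
    # For each frame t: sum_k p(noise_k | frame_t) * P_k(speech | frame_t)
    return np.einsum('tk,kt->t', aec_posteriors, vad_probs)

def vad_decision(combined, threshold=0.5):
    """Frame-level speech/non-speech decision on the combined probability."""
    return combined >= threshold

# Toy example with K=2 noise types and T=2 frames.
aec = np.array([[0.7, 0.3],   # frame 0: mostly noise type 0
                [0.2, 0.8]])  # frame 1: mostly noise type 1
vad = np.array([[0.9, 0.1],   # DNN trained on noise type 0
                [0.5, 0.3]])  # DNN trained on noise type 1
combined = combine_speech_presence(aec, vad)  # [0.78, 0.26]
decisions = vad_decision(combined)            # [True, False]
```

This mirrors a mixture-of-experts weighting: each noise-specific DNN acts as an expert, and the AEC network supplies the per-frame gating weights.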