Repository at Hanyang University: Speech Enhancement Based on Stacked Deep Neural Networks

Title: 적층형 심화 신경망을 이용한 음성 향상 기법 (Speech Enhancement Technique Using Stacked Deep Neural Networks)

Other Titles: Speech Enhancement Based on Stacked Deep Neural Networks

Author: 김상현

Alternative Author(s): Kim, Sang-Hyeon

Advisor(s): 장준혁

Issue Date: 2016-02

Publisher: Hanyang University (한양대학교)

Degree: Master

Abstract: This thesis proposes a regression-based speech enhancement technique built on a deep architecture of multiple stacked deep neural networks (DNNs). In contrast to conventional statistical model-based speech enhancement methods, DNN-based enhancement models perform well even in non-stationary noise conditions. However, training a DNN effectively for speech enhancement remains challenging: the network can get stuck in poor local minima, which causes speech distortion and annoying artificial sounds. To alleviate this problem, we introduce a hierarchical training strategy that connects a series of DNN modules. Once the previous DNN has been trained, its reconstructed speech spectrum is fed, together with the original noisy spectrum, into the next DNN, which then starts a new round of training. Each DNN therefore starts from the local minimum reached by the previous DNN and re-maps sequentially toward the target spectrum; in this way the next DNN learns a new mapping function that takes the previous module's result as its lower bound and yields a better representation of the target spectrum. Experimental results show that the proposed method outperforms both conventional statistical model-based enhancement methods and an enhancement model built from a single DNN, in terms of speech quality and speech recognition rate.
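
The stacked training procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the one-hidden-layer regression network, the layer sizes, the learning rate, the random synthetic "spectra", and all function names (`init_mlp`, `forward`, `train_mlp`, `train_stack`, `enhance`) are hypothetical choices made for the example. Only the overall structure comes from the abstract: each stage is trained to completion, and the next stage receives the previous stage's estimate together with the raw noisy input.

```python
import numpy as np

def init_mlp(rng, n_in, n_hidden, n_out):
    """Small random weights for a one-hidden-layer regression network (assumed model)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(p, x):
    """Return the network output and the hidden activations."""
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"], h

def train_mlp(p, x, y, epochs=300, lr=0.01):
    """Full-batch gradient descent on the mean squared error."""
    n = x.shape[0]
    for _ in range(epochs):
        y_hat, h = forward(p, x)
        err = (y_hat - y) / n                    # d(loss)/d(y_hat)
        dh = (err @ p["W2"].T) * (1.0 - h ** 2)  # backprop through tanh
        p["W2"] -= lr * (h.T @ err)
        p["b2"] -= lr * err.sum(axis=0)
        p["W1"] -= lr * (x.T @ dh)
        p["b1"] -= lr * dh.sum(axis=0)
    return p

def train_stack(noisy, clean, n_stages=2, n_hidden=16, seed=0):
    """Train the stages one after another, as the abstract describes."""
    rng = np.random.default_rng(seed)
    stages = []
    stage_in = noisy                             # first stage sees only the noisy input
    for _ in range(n_stages):
        p = init_mlp(rng, stage_in.shape[1], n_hidden, clean.shape[1])
        p = train_mlp(p, stage_in, clean)        # finish this stage before the next
        stages.append(p)
        est, _ = forward(p, stage_in)
        # Next stage input: previous estimate concatenated with the raw noisy input.
        stage_in = np.concatenate([est, noisy], axis=1)
    return stages

def enhance(stages, noisy):
    """Run the noisy input through every trained stage in order."""
    stage_in = noisy
    for p in stages:
        est, _ = forward(p, stage_in)
        stage_in = np.concatenate([est, noisy], axis=1)
    return est

# Synthetic stand-in data (assumption): 256 frames of 8-dimensional "spectra".
data_rng = np.random.default_rng(1)
clean = data_rng.normal(size=(256, 8))
noisy = clean + 0.5 * data_rng.normal(size=(256, 8))

stages = train_stack(noisy, clean)
enhanced = enhance(stages, noisy)
```

Because each later stage takes both the previous estimate and the noisy input, its input width is twice that of the first stage (16 versus 8 in this toy setup). The sketch makes no claim about enhancement quality; it only shows how the stages are wired and trained in sequence.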

URI: https://repository.hanyang.ac.kr/handle/20.500.11754/126398 http://hanyang.dcollection.net/common/orgView/200000428777

Appears in Collections: GRADUATE SCHOOL[S](대학원) > ELECTRONICS AND COMPUTER ENGINEERING(전자컴퓨터통신공학과) > Theses (Master)
