Repository at Hanyang University: A Method for Improving the Performance of Multiclass Ensemble Classification Based on Bayesian Methodology

Title: 베이지안 방법론 기반의 다중클래스 앙상블 분류 성능 향상 방안

Other Titles: A Method for Improving the Performance of Multiclass Ensemble Classification Based on Bayesian Methodology

Author: 전성대

Alternative Author(s): Jeon Seong Dae

Advisor(s): 배석주

Issue Date: 2021. 8

Publisher: Hanyang University

Degree: Master

Abstract: Since the Fourth Industrial Revolution, data have become more complex and diverse, and demand for multi-class classification has grown. However, general machine learning algorithms suffer from problems such as overfitting, class imbalance, and local minima, which lead to misclassification costs. In manufacturing processes and medicine in particular, misclassification can result in large economic losses or medical accidents. Ensemble learning can compensate for these shortcomings of machine learning algorithms.
Ensemble learning (EL) combines the classification results of multiple learning algorithms into a final classification in order to obtain good predictive performance. The individual classifiers are combined by voting; the most basic schemes are direct voting (Majority Voting, MV) and indirect voting (Soft Voting, SV). This study proposes an ensemble framework based on Bayesian methodology that uses the classification probabilities produced by a bagging algorithm. The posterior probability was derived using prior information, and the probability values that maximize the posterior probability were re-estimated. An advantage of the Bayesian methodology is that results are much easier and more intuitive to interpret than in traditional statistics. In addition, Bayesian methods provide a good inference procedure: in complex problems, parameters can be obtained easily with MCMC, and they can be inferred more accurately through the posterior distribution. The classifiers used in the ensemble are Support Vector Machine (SVM), Random Forest (RF), Artificial Neural Network (ANN), and Gaussian Naive Bayes (GNB) models, which are known for excellent classification performance on multi-class data. A case study applied the machine learning algorithms to four multi-class datasets published on OpenML and compared the proposed ensemble framework with previously studied ensemble frameworks. On three of the four datasets, the Ensemble Learning based on Bayesian methodology (ELB) framework achieved higher classification performance than the previously studied ensemble frameworks.
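To make the voting schemes named in the abstract concrete, here is a minimal Python sketch of majority voting and soft voting over per-classifier class probabilities. The third function is only an illustration of one possible way to re-estimate probabilities by maximizing a posterior: it assumes a Dirichlet prior and treats the classifiers' probabilities as fractional counts. That choice, the function names, and the example numbers are assumptions made for this sketch; the abstract does not state the thesis's actual prior or derivation.

```python
from collections import Counter


def majority_vote(probas):
    """Hard (majority) voting: each classifier votes for its most probable class."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probas):
    """Soft voting: average the class probabilities, then pick the largest."""
    n_classes = len(probas[0])
    mean = [sum(p[k] for p in probas) / len(probas) for k in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__), mean


def dirichlet_map(probas, alpha):
    """ASSUMPTION (not from the thesis): Dirichlet(alpha) prior over class
    probabilities, with each classifier's probabilities added as fractional
    counts. Returns the mode (MAP estimate) of the resulting Dirichlet."""
    n_classes = len(alpha)
    post = [alpha[k] + sum(p[k] for p in probas) for k in range(n_classes)]
    if any(a <= 1.0 for a in post):
        raise ValueError("the mode formula needs every posterior parameter > 1")
    denom = sum(post) - n_classes
    return [(a - 1.0) / denom for a in post]


# Made-up probabilities from four classifiers (e.g. SVM, RF, ANN, GNB)
# for a single sample with three classes.
example_probas = [
    [0.50, 0.45, 0.05],
    [0.48, 0.47, 0.05],
    [0.10, 0.85, 0.05],
    [0.40, 0.35, 0.25],
]

if __name__ == "__main__":
    print("majority vote:", majority_vote(example_probas))
    print("soft vote:", soft_vote(example_probas)[0])
    print("MAP estimate:", dirichlet_map(example_probas, [2.0, 2.0, 2.0]))
```

On this made-up sample the two voting schemes disagree: three of the four classifiers narrowly prefer class 0, while the averaged probabilities favor class 1. Differences of this kind are one reason to combine the probabilities themselves and not only the votes.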

URI: http://hanyang.dcollection.net/common/orgView/200000496582 https://repository.hanyang.ac.kr/handle/20.500.11754/164009

Appears in Collections: GRADUATE SCHOOL[S] > INDUSTRIAL ENGINEERING > Theses (Master)
