Repository at Hanyang University: Evaluation of penalized and machine learning methods for asthma disease prediction in the Korean Genome and Epidemiology Study (KoGES)

Title: Evaluation of penalized and machine learning methods for asthma disease prediction in the Korean Genome and Epidemiology Study (KoGES)

Author: 최성경

Keywords: Naive Bayes classification; Machine learning; Receiver operating characteristic curves; Genome-wide association studies; Epidemiology; K-nearest neighbor classification; Genomes; Support vector machines; Asthma; Disease risk prediction model; Ensemble methods; Genome-wide association study; GWAS; KoGES; Korean Genome and Epidemiology Study; Large-scale genetic data; Machine learning methods; Oversampling; Penalized methods

Issue Date: 2024-02-02

Publisher: BMC

Citation: BMC Bioinformatics

Abstract: Background: Genome-wide association studies have successfully identified genetic variants associated with human disease. Various statistical approaches based on penalized and machine learning methods have recently been proposed for disease prediction. In this study, we evaluated the performance of several such methods for predicting asthma using the Korean Chip (KORV1.1) from the Korean Genome and Epidemiology Study (KoGES). Results: First, single-nucleotide polymorphisms were selected via single-variant tests using logistic regression adjusted for several epidemiological factors. Next, we evaluated the following methods for disease prediction: ridge, least absolute shrinkage and selection operator (LASSO), elastic net, smoothly clipped absolute deviation (SCAD), support vector machine, random forest, boosting, bagging, naive Bayes, and k-nearest neighbor. Finally, we compared their predictive performance based on the area under the receiver operating characteristic curve, precision, recall, F1-score, Cohen's kappa, balanced accuracy, error rate, Matthews correlation coefficient, and area under the precision-recall curve. Additionally, three oversampling algorithms were used to deal with class imbalance. Conclusions: Our results show that penalized methods exhibit better predictive performance for asthma than machine learning methods. In the oversampling study, however, random forest and boosting methods overall showed better prediction performance than penalized methods.
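Several of the threshold-based metrics named in the abstract (precision, recall, F1-score, Cohen's kappa, balanced accuracy, error rate, Matthews correlation coefficient) can be computed from a single 2x2 confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' code; the function name `binary_metrics` and the example counts are hypothetical and chosen only to illustrate an imbalanced case/control split. The two area-under-curve metrics are not covered because they require ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return threshold-based metrics for a binary classifier.

    tp, fp, fn, tn are the confusion-matrix counts
    (cases = positive class, controls = negative class).
    """
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / n

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "cohen_kappa": kappa,
        "balanced_accuracy": (recall + specificity) / 2,
        "error_rate": 1 - accuracy,
        "mcc": mcc,
    }


# Hypothetical counts (illustration only): 40 cases, 960 controls.
example = binary_metrics(tp=30, fp=20, fn=10, tn=940)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With a rare positive class, the error rate can look small even when many cases are missed, which is one reason imbalance-aware metrics such as balanced accuracy and the Matthews correlation coefficient are reported alongside it.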

URI: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-024-05677-x https://repository.hanyang.ac.kr/handle/20.500.11754/189485

ISSN: 1471-2105

DOI: 10.1186/s12859-024-05677-x

Appears in Collections: COLLEGE OF SCIENCE AND CONVERGENCE TECHNOLOGY[E](과학기술융합대학) > ETC
