AI-Based Medical Data Processing System and Model Interpretation through XAI

Title
AI-Based Medical Data Processing System and Model Interpretation through XAI
Author
이시렬
Alternative Author(s)
Siryeol Lee
Advisor(s)
이주현
Issue Date
2024. 2
Publisher
Hanyang University Graduate School (한양대학교 대학원)
Degree
Master
Abstract
The incidence of heart- and brain-related vascular disease, collectively termed cardiocerebrovascular disease (CVD), is rising steadily, and CVD is the leading cause of death. Because physicians cannot feasibly screen very large numbers of individuals, screening methods are needed that are both highly accurate and easy to interpret. We therefore built four machine learning classifiers to identify CVD: a multi-layer perceptron, a support vector machine, a random forest, and a light gradient boosting machine (LightGBM). All four were developed using data from the Korea National Health and Nutrition Examination Survey (KNHANES). So that the dataset reflects the general population, we resampled and rebalanced the KNHANES data using its complex sampling weights, which simulates a dataset collected uniformly from a broad population base. To clarify the risk factor analysis, we removed variables that were multicollinear or irrelevant to CVD by filtering on the Variance Inflation Factor (VIF) and applying the Boruta algorithm. Before training the models, we applied the synthetic minority oversampling technique (SMOTE), which significantly improved their ability to generalize. The proposed classifiers achieved Area Under the Curve (AUC) scores exceeding 0.853. A Shapley value-based risk factor analysis identified age, sex, and hypertension as the most important risk factors for CVD.
Further, the analysis showed that CVD prevalence correlates positively with age, hypertension, and Body Mass Index (BMI), and negatively with being female, alcohol consumption, and monthly income. Our study shows that appropriate feature selection and class balancing can significantly enhance the interpretability of machine learning models in healthcare.
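Two of the preprocessing steps named in the abstract, resampling by survey weight and SMOTE, can be illustrated with a minimal standard-library sketch. This is not the thesis implementation: the function names `resample_by_weight` and `smote`, the toy records, and the weights are all hypothetical, and a real pipeline would more likely rely on an established SMOTE implementation and the actual KNHANES weight variables.

```python
import math
import random


def resample_by_weight(rows, weights, n, rng):
    """Draw n rows with replacement, each with probability
    proportional to its survey weight (hypothetical helper)."""
    return rng.choices(rows, weights=weights, k=n)


def smote(minority, n_new, k, rng):
    """Core SMOTE idea: create n_new synthetic points, each placed at a
    random position on the segment between a minority sample and one of
    its k nearest minority neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


rng = random.Random(0)

# Made-up toy records (age, BMI), each with a made-up survey weight.
rows = [(45.0, 22.1), (63.0, 27.4), (71.0, 25.0), (38.0, 30.2)]
weights = [1200.0, 300.0, 150.0, 900.0]
population_like = resample_by_weight(rows, weights, n=1000, rng=rng)

# Made-up minority-class (CVD-positive) samples to oversample.
minority = [(63.0, 27.4), (71.0, 25.0), (68.0, 29.1)]
new_points = smote(minority, n_new=5, k=2, rng=rng)
```

Rows with larger weights appear more often in `population_like`, which is the sense in which the resampled data approximates a uniformly collected sample. Every point in `new_points` lies between two existing minority samples, so SMOTE adds minority examples without simply duplicating records.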
URI
http://hanyang.dcollection.net/common/orgView/200000721358
https://repository.hanyang.ac.kr/handle/20.500.11754/188852
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > APPLIED ARTIFICIAL INTELLIGENCE(인공지능융합학과) > Theses(Master)