Repository at Hanyang University: Proposing an AI-based approach to imbalanced medical datasets

Title: Proposing an AI-based approach to imbalanced medical datasets

Author: 전민종

Advisor(s): 이욱

Issue Date: 2023. 2

Publisher: Hanyang University (한양대학교)

Degree: Doctor

Abstract: Deep learning and machine learning are the two primary forms of artificial intelligence (AI), and both have matured quickly; by 2015, AI had already surpassed human performance in a visual recognition challenge. Despite this usefulness, characteristics of medical data, such as class imbalance, make AI algorithms hard to deploy. Various anomaly detection algorithms have been introduced to address this problem. However, while there are plenty of algorithms suited to tabular (CSV) data, algorithms for image data remain insufficient. This thesis therefore consists of two parts. The first introduces machine learning and deep learning algorithms for anomaly detection and data augmentation. The second proposes a novel method, U-AnoGAN, for detecting anomalies in X-ray images; it classifies and identifies anomalies using masks rather than whole X-ray images. The first part comprises three experiments and the second comprises seven. The first part uses EEG datasets. The datasets were first classified with LGBM, a machine learning classifier. They were then subjected to anomaly detection algorithms such as Isolation Forest (IF) and Local Outlier Factor (LOF), and the methods were compared by AUC score; LGBM achieved the highest AUC score, 72%. In the third stage, CTGAN and TVAE were employed for data augmentation. After the synthetic datasets were created, the generated data were compared using visualization and similarity scores. CTGAN outperformed TVAE because it attained a higher similarity score. The second part begins by clarifying the problems of deploying AnoGAN and SR-AnoGAN on X-ray images: both showed unstable learning and produced ambiguous results that could not be applied to medical diagnosis. Several experiments then verified that employing masks rather than entire X-ray images could still secure diagnostic accuracy.
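The first part compares detectors by AUC score. As a minimal sketch of how such a comparison works, the snippet below computes AUC from anomaly scores using the rank-based (Mann-Whitney) definition. The function name `roc_auc`, the toy labels, and the two score lists are illustrative assumptions; they are not the thesis's data, code, or results.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random anomaly outscores a random normal sample.

    labels: 1 for anomaly (minority class), 0 for normal.
    scores: higher means "more anomalous". Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical, imbalanced toy data: 6 normal samples, 2 anomalies.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
detector_a = [0.1, 0.2, 0.3, 0.2, 0.4, 0.1, 0.9, 0.35]   # made-up scores
detector_b = [0.5, 0.1, 0.6, 0.2, 0.3, 0.4, 0.55, 0.25]  # made-up scores

auc_a = roc_auc(labels, detector_a)
auc_b = roc_auc(labels, detector_b)
print(f"detector A AUC: {auc_a:.3f}")
print(f"detector B AUC: {auc_b:.3f}")
```

AUC is a common choice for imbalanced data because it depends only on how anomalies rank relative to normal samples, not on the class ratio or on a fixed decision threshold.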
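While whole X-ray images yielded about 90% accuracy, mask images yielded 80%. Next, the segmentation stage of U-AnoGAN was trained on normal datasets and achieved a Dice coefficient above 90% and an accuracy above 95%. This pre-trained segmentation model was then applied to the Covid-19 and pneumonia datasets to derive their masks. VGG-16 was used to classify the derived masks and achieved about 90% accuracy. Lastly, U-AnoGAN calculated anomaly scores on each dataset and improved on the previous algorithms (AnoGAN and SR-AnoGAN) both quantitatively and qualitatively. In conclusion, this study helps readers understand how to handle imbalanced medical datasets. The suggested model, U-AnoGAN, is expected to lead to a breakthrough in resolving this problem and in applying AI across several medical sectors.
Abstract (translated from the Korean): Deep learning and machine learning are representative AI technologies; they have advanced rapidly, and by 2015 they already showed performance exceeding that of humans. Despite these advances, characteristics of medical data such as data imbalance make AI algorithms difficult to apply, and various anomaly detection algorithms have been introduced to resolve this. Algorithms suited to the CSV format have been developed sufficiently, but algorithms suited to image data such as X-rays are still insufficient. This thesis therefore consists of two parts. First, it introduces machine learning and deep learning techniques specialized for anomaly detection and data augmentation. Second, it presents U-AnoGAN, a simple yet distinctive model suited to X-ray data. U-AnoGAN performs classification and anomaly detection using image masks rather than whole X-ray images. The first part consists of three experiments, and the second part consists of seven experiments designed to demonstrate the performance of U-AnoGAN. The first study uses EEG datasets. LGBM, a representative machine learning classifier, and anomaly detection algorithms such as IF and LOF were applied, and AUC scores were calculated. LGBM obtained the best AUC score, 72%. In the third stage, CTGAN and TVAE were used for data augmentation. After the synthetic datasets were created, the generated data were compared using visualization and similarity scores. In the experiments, CTGAN performed better than TVAE in terms of similarity. The second study applied AnoGAN and SR-AnoGAN to X-ray data to clarify their shortcomings. Both algorithms showed unstable learning, which suggests they would be difficult to apply to medical data. The subsequent experiments demonstrated that masks alone can achieve performance comparable to using whole X-ray images. First, across the three labels (normal, Covid-19, and pneumonia), X-ray images reached 90% accuracy and masks reached 80%. Next, the image segmentation component of U-AnoGAN was trained only on normal data and showed a Dice coefficient above 90% and an accuracy above 95%. The trained U-AnoGAN was then applied to data from Covid-19 and pneumonia patients to derive their image masks. The derived masks were used as input to VGG-16, and mask classification achieved 90% accuracy. Finally, U-AnoGAN calculated an anomaly score for each dataset, showing quantitative and qualitative improvement over AnoGAN and SR-AnoGAN. U-AnoGAN can focus diagnosis on important organs in the image, such as the lungs, shows statistically significant results, and lets abnormal regions within the image be identified through visualization. This shows that image masks play a significant role in medical diagnosis and that the model surpassed previous models such as AnoGAN and SR-AnoGAN.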
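The abstract reports the segmentation stage in terms of Dice coefficient and accuracy. For readers unfamiliar with these metrics, here is a minimal sketch of how both can be computed for binary masks. The function name `dice_coefficient` and the 4x4 toy masks are illustrative assumptions and do not come from the thesis.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    if pred.shape != true.shape:
        raise ValueError("masks must have the same shape")
    total = pred.sum() + true.sum()
    if total == 0:
        # Both masks are empty: treat as perfect agreement (a convention, assumed here).
        return 1.0
    intersection = np.logical_and(pred, true).sum()
    return float(2.0 * intersection / total)


# Hypothetical 4x4 masks: the ground truth covers a 2x2 region,
# the prediction covers the same region plus one extra column.
true_mask = np.zeros((4, 4), dtype=int)
true_mask[1:3, 1:3] = 1
pred_mask = np.zeros((4, 4), dtype=int)
pred_mask[1:3, 1:4] = 1

dice = dice_coefficient(pred_mask, true_mask)
pixel_accuracy = float((pred_mask == true_mask).mean())
print(f"Dice: {dice:.3f}, pixel accuracy: {pixel_accuracy:.3f}")
```

Dice scores only the overlap of the foreground regions, whereas pixel accuracy also counts correctly predicted background, so the two values can differ noticeably when the mask occupies a small part of the image.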

URI: http://hanyang.dcollection.net/common/orgView/200000653509 https://repository.hanyang.ac.kr/handle/20.500.11754/179392

Appears in Collections: GRADUATE SCHOOL[S](대학원) > INFORMATION SYSTEMS(정보시스템학과) > Theses (Ph.D.)
