Clustering and Visualizing Similar Diseases with Comparative Vector Embeddings for Medical Diagnosis

Author
왕격
Advisor(s)
조인휘
Issue Date
2022. 8
Publisher
Hanyang University
Degree
Master
Abstract
Disease diagnosis has long been a central problem in medicine, and progress in diagnosis plays an important role in the prevention and treatment of human diseases. Disease prediction has therefore been studied for a long time. Traditionally, diagnosis relied on doctors' knowledge and experience, a process that can be time-consuming and susceptible to error. With the advance of computer technology, deep learning has been widely adopted for disease prediction problems. Deep learning lets a computer learn from data by training large neural networks, adjusting the weights that connect the layers whenever new data is available. In medicine, its use in disease diagnosis and detection has grown significantly. In this paper, we propose a diabetes diagnosis method that learns embeddings of clinical concepts from both structured and unstructured data using the Skip-gram model. We also cluster similar diseases by computing distances between their embedding vectors and visualize them in a lower-dimensional space. Specifically, the method uses Skip-gram with Negative Sampling based on pre-trained GloVe vectors to learn medical-domain word representations, then clusters clinical disease terms with similar semantics and visualizes them in a 2-dimensional space. We evaluate our model on three datasets that contain different relationships among clinical concept words, and compare the results after learning medical term embeddings on each dataset.
By examining the Euclidean distance between congestive heart failure and four related diseases (hypertension, coronary artery disease, cardiac arrest, and atrial fibrillation), we found that the EMR data yielded the best performance, with distances of 13.32, 14.98, 12.37, and 14.11, respectively. We also found that the diseases clustered together in each group are highly similar in semantics, based on the word embeddings learned by the model.
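The distance comparison described in the abstract can be illustrated with a minimal sketch. The vectors below are invented 3-dimensional toy values, not embeddings or results from the thesis, and the helper names `euclidean` and `nearest` are hypothetical; the thesis itself may compute and rank distances differently.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# They are NOT embeddings or results taken from the thesis.
embeddings = {
    "congestive_heart_failure": [1.0, 2.0, 2.0],
    "hypertension": [1.0, 2.0, 3.0],
    "atrial_fibrillation": [3.0, 2.0, 2.0],
    "asthma": [4.0, 6.0, 2.0],
}


def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(term, table):
    """Return (name, distance) pairs for all other terms, closest first."""
    anchor = table[term]
    others = [
        (name, euclidean(anchor, vec))
        for name, vec in table.items()
        if name != term
    ]
    return sorted(others, key=lambda pair: pair[1])


ranked = nearest("congestive_heart_failure", embeddings)
for name, dist in ranked:
    print(f"{name}: {dist:.2f}")
```

With real embeddings, a smaller distance to the query term would suggest a closer semantic relationship under this measure; the same pairwise distances could then feed a clustering step or a 2-dimensional projection.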
URI
http://hanyang.dcollection.net/common/orgView/200000626629
https://repository.hanyang.ac.kr/handle/20.500.11754/174215
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > COMPUTER SCIENCE(컴퓨터·소프트웨어학과) > Theses (Master)