Repository at Hanyang University: Discovering rare variant elements from next generation sequencing data in genomics

Title: Discovering rare variant elements from next generation sequencing data in genomics

Author: 배준우

Advisor(s): Heejin Park

Issue Date: 2024. 2

Publisher: Hanyang University Graduate School

Degree: Doctor

Abstract: Bioinformatics is a rapidly evolving field that bridges biology and computational science. Its applications range from fundamental biological research to medical and biotechnological advances. It enables scientists to make sense of the vast and complex biological data generated in the modern era, advancing our understanding of life and improving human health and well-being. In particular, rare variant detection from Next-Generation Sequencing (NGS) data is a crucial task in genomics, as rare variants may play a significant role in understanding the genetic basis of various diseases.
Recent advances in sequencing technology have made it possible to investigate personal genomes for structural variations, which have been studied extensively to identify associations with the physiology of diseases such as cancer. Mobile genetic elements (MGEs) are one of the major constituents of the human genome, and they cause genome instability through insertion, mutation, and rearrangement. We have developed a new program, iMGEins, to identify such novel MGEs from the sequencing reads of individual genomes, and to explore the breakpoints together with the supporting reads and the MGEs detected. iMGEins is the first universal MGE detection tool that applies three algorithmic paradigms (discordant read-pair mapping, split-read mapping, and contig assembly). Our evaluation showed excellent performance in detecting novel MGEs from simulated genomes as well as real personal genomes. In detail, the average recall of iMGEins was 96.25%, about twice that of the two other tools compared. The average precision of iMGEins was 99.57%, also about twice that of the other tools. In tests on real human genome data from the NA12878 sample, iMGEins found 2,040 known MGEs that are individually inserted, along with 122 inserted MGE sequences.
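As an illustration of two of the paradigms named above, discordant read-pair mapping and split-read mapping, the following is a minimal sketch of how aligned reads might be flagged as evidence for an insertion. It is not taken from iMGEins: the `Alignment` record, the function names, and the thresholds `max_insert` and `min_clip` are all hypothetical, and the CIGAR handling is simplified (hard clips are not treated specially).

```python
import re
from dataclasses import dataclass


@dataclass
class Alignment:
    """Simplified, hypothetical alignment record (not a real SAM/BAM API)."""
    chrom: str
    pos: int
    mate_chrom: str
    mate_pos: int
    cigar: str


def is_discordant(aln, max_insert=1000):
    """True if the mate maps to another chromosome or farther than max_insert.

    max_insert is an assumed threshold, not a value from the thesis.
    """
    if aln.chrom != aln.mate_chrom:
        return True
    return abs(aln.mate_pos - aln.pos) > max_insert


def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clip lengths parsed from a CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    lead = int(ops[0][0]) if ops and ops[0][1] == "S" else 0
    trail = int(ops[-1][0]) if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail


def is_split(aln, min_clip=20):
    """True if either end is soft-clipped by at least min_clip bases (assumed threshold)."""
    lead, trail = soft_clip_lengths(aln.cigar)
    return max(lead, trail) >= min_clip


def classify(aln):
    """Label an alignment with the kinds of insertion evidence it carries."""
    labels = []
    if is_discordant(aln):
        labels.append("discordant")
    if is_split(aln):
        labels.append("split")
    return labels or ["concordant"]
```

In a sketch like this, reads labelled `discordant` or `split` near the same position would be grouped as candidate breakpoint support; the third paradigm, contig assembly, is not shown.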
Numerous programs have been developed for finding SNPs and short indels from NGS data. However, existing programs, in which users set parameters directly and then call variants, still miss many true positives or call many false positives despite the efforts of countless researchers. The problem is particularly serious when different SNP/indel variants occur at the same location. To address it, we conducted research on finding SNPs and indels from NGS read data by applying a deep-learning method that performed better than the general state-of-the-art algorithm. After processing the read pileup data into a text-based form, we applied it to a deep learning model adapted from the Transformer model. This method performed similarly to existing programs overall, but performed better in the special case where SNPs and indels are mixed.
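The abstract does not specify how the read pileup is turned into text, so the following is only a sketch of one possible encoding, under the assumption that each pileup column becomes a token made of the reference base followed by observed-allele counts. The token format and the function names `pileup_column_to_token` and `pileup_to_sentence` are hypothetical.

```python
from collections import Counter


def pileup_column_to_token(ref_base, read_bases):
    """Encode one pileup column as text: reference base, then sorted allele counts.

    Hypothetical format, e.g. "A|A2,G1" for reference A with reads A, A, G.
    """
    counts = Counter(b.upper() for b in read_bases)
    parts = [f"{allele}{counts[allele]}" for allele in sorted(counts)]
    return f"{ref_base.upper()}|" + ",".join(parts)


def pileup_to_sentence(ref_seq, columns):
    """Join per-position tokens into a whitespace-separated 'sentence'."""
    return " ".join(
        pileup_column_to_token(ref, col) for ref, col in zip(ref_seq, columns)
    )


if __name__ == "__main__":
    # "-" marks a deletion and "+T" an inserted T in this illustrative notation.
    print(pileup_to_sentence("AC", [["A", "A"], ["C", "-", "+T"]]))
```

A sequence of such tokens could then be passed through a tokenizer to a Transformer-style classifier; columns where SNP and indel alleles co-occur would show up as tokens containing several distinct allele types.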

URI: http://hanyang.dcollection.net/common/orgView/200000723107 https://repository.hanyang.ac.kr/handle/20.500.11754/188323

Appears in Collections: GRADUATE SCHOOL[S] > ELECTRONICS AND COMPUTER ENGINEERING > Theses (Ph.D.)

