Development of Ultra-fast and High-performing Variant Callers from Whole-Genome Sequencing Data

Title
Development of Ultra-fast and High-performing Variant Callers from Whole-Genome Sequencing Data
Other Titles
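전장유전체시퀀싱데이터를 이용한 초고속 그리고 고성능 변이 검출기들의 개발 (Development of ultra-fast and high-performance variant callers using whole-genome sequencing data)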
Author
최민학
Alternative Author(s)
Min-hak Choi
Advisor(s)
남진우
Issue Date
2023. 8
Publisher
Hanyang University
Degree
Doctor
Abstract
Following the first human genome sequence by J. Craig Venter, human genome sequencing has progressed rapidly. Next-generation sequencing (NGS) technology now enables cost-effective production of whole-genome sequencing (WGS) data on a population scale. The cost of sequencing an entire human genome has fallen from around one million dollars in 2007 to around 500–600 dollars by 2023. WGS data have revealed important insights into several diseases. For example, repeat expansions of "CAG" in the HTT gene are associated with Huntington's disease, while single nucleotide variations and defects in BRCA1/2 are associated with breast and ovarian cancer. In addition, genomic variations such as the BCR-ABL fusion gene in chronic myeloid leukaemia serve as diagnostic markers, aid drug development and act as therapeutic targets for many diseases. With the advancement of WGS technology and the emergence of research institutions and consortia dedicated to large-scale production of WGS data, interest in the effective detection of variations from WGS data has grown. However, the sheer volume of data generated by WGS poses several challenges. For example, WGS data generated at 30x coverage with Illumina short paired-end reads can reach approximately 100–150 GB for a single sample. In addition, detecting the desired types of variations requires high-performance computing resources, and execution time is significant, typically 3–5 days from raw data to germline mutation detection. To address these challenges, researchers have sought to develop powerful and efficient variation detection algorithms and methods. In this context, I have developed two rapid and powerful mutation detection tools that detect genomic variations effectively and accurately amid the substantial noise in WGS data. The first is ETCHING, a tool for detecting structural variations (SVs). The second is the BIG-RS de novo mutation (DNM) detection pipeline.
This pipeline is based on the widely used GATK program and includes several in-house filters that reflect the characteristics of DNMs, such as compound heterozygous variants and their absence in the parental genomes. Using the WGS dataset of the CEPH/Utah NA12878 family, I compared the performance of the BIG-RS DNM detection pipeline with that of DNM detection programs such as DeNovoCNN, TrioDenovo and RUFUS, as well as with the criteria used by Thomas A. Sasani et al. (eLife, 2019) and Lucie A. Bergeron et al. (eLife, 2022). The evaluation showed that the BIG-RS DNM detection pipeline had superior DNM detection capabilities. ETCHING and the BIG-RS DNM detection pipeline are expected to accelerate the study of SVs and DNMs using large-scale WGS data. In addition, these tools are expected to contribute to personalized medicine by enabling the rapid and efficient detection of de novo variations in individuals using clinical WGS data.
URI
http://hanyang.dcollection.net/common/orgView/200000684102
https://repository.hanyang.ac.kr/handle/20.500.11754/186927
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > LIFE SCIENCE(생명과학과) > Theses (Ph.D.)
Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
