Construction of Korean genetic variants database and a variome analysis pipeline for population WES/WGS data

Author
박영찬
Advisor(s)
고인송
Issue Date
2016-08
Publisher
Hanyang University (한양대학교)
Degree
Master
Abstract
The map of the human genome, produced by the Human Genome Project (HGP) over roughly ten years, left us with both important scientific information and new questions. One such question is how the genome map can be applied in practice to treating disease or developing cures. Years after the HGP ended, next-generation sequencing (NGS) technology, which allows an entire human genome to be re-sequenced quickly and at low cost, may answer this question: an individual's whole genome can now be sequenced within a few weeks. The sequence data accumulating from NGS allow researchers to identify large numbers of variants that serve as genetic markers, which are important for understanding how variation affects hereditary disorders. While NGS makes human genome re-sequencing cheap, it has also caused an exponential increase in data volume, creating major computational challenges. To process NGS data, new computational approaches and software tools have been developed to support whole-genome and whole-exome data analysis. By analyzing exome data from 100 Korean individuals, we identified an initial pool of 1,907,598 single nucleotide variants (SNVs) and 325,166 insertions and deletions (InDels). These variants were masked against dbSNP and the 1000 Genomes Project (1KGP), used as sources of known variation, to construct a database of Korean variants. The database can serve as a pilot database of the Korean variome and contribute to Korean variome research. The discovered variants and their annotations are stored in the newly developed Hanyang variome database (http://166.104.77.48). Entries are linked to other related databases so that researchers can access information quickly and easily.
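The abstract does not say how variants were typed or how known variation was masked. The following is a minimal Python sketch of one straightforward approach, not the thesis's actual code. It assumes biallelic calls with VCF-style `REF`/`ALT` alleles, and that the dbSNP/1KGP records have already been loaded into a set of `(chrom, pos, ref, alt)` keys. `Variant`, `variant_type`, and `mask_known` are hypothetical names introduced only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int   # 1-based position, as in VCF
    ref: str
    alt: str


def variant_type(v: Variant) -> str:
    """Classify a biallelic call: single-base REF and ALT is an SNV,
    a REF/ALT length difference is an InDel, anything else (e.g. an MNV) is 'other'."""
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNV"
    if len(v.ref) != len(v.alt):
        return "InDel"
    return "other"


def mask_known(variants, known_keys):
    """Split calls into known (exact key present in the reference set,
    e.g. built from dbSNP / 1KGP records) and novel (absent)."""
    known, novel = [], []
    for v in variants:
        key = (v.chrom, v.pos, v.ref, v.alt)
        (known if key in known_keys else novel).append(v)
    return known, novel


# Toy usage with made-up calls and a one-entry "known" set.
calls = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 200, "AT", "A"),
    Variant("chr2", 50, "C", "T"),
]
known_keys = {("chr1", 100, "A", "G")}
counts = Counter(variant_type(v) for v in calls)
known, novel = mask_known(calls, known_keys)
print(counts)                   # Counter({'SNV': 2, 'InDel': 1})
print(len(known), len(novel))   # 1 2
```

Exact-key matching only works if both call sets use the same chromosome naming and the same normalized (left-aligned, trimmed) allele representation. A real pipeline would normalize both sides first.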
Building on the methods of the exome study, we also released an automated pipeline that lets novice researchers in bioinformatics and genomics analyze population-scale whole-exome and whole-genome sequencing (WES/WGS) data. Its outputs include statistical summaries such as population distance relationships, principal component analysis (PCA), and variant quality metrics, along with visualization through a genome browser.
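PCA is listed among the pipeline's outputs, but the abstract does not describe its implementation. Below is a minimal NumPy sketch of genotype PCA under stated assumptions, not the pipeline's actual method. Genotypes are assumed to be coded as alternate-allele dosages (0/1/2) in a samples × variants matrix. Each variant is standardized by the binomial standard deviation sqrt(2p(1 − p)), a normalization commonly used for genotype PCA. `genotype_pca` is a hypothetical name, and the demo data are synthetic.

```python
import numpy as np


def genotype_pca(G, n_components=2):
    """Project samples onto the top principal components of a
    samples x variants dosage matrix (values 0, 1, 2)."""
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0                # alt-allele frequency per variant
    keep = (p > 0) & (p < 1)                # drop monomorphic sites (zero variance)
    G, p = G[:, keep], p[keep]
    Z = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))   # center and scale each variant
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]      # per-sample PC scores


# Synthetic demo: two populations whose allele frequencies differ at 200 sites.
rng = np.random.default_rng(0)
freqs_a = rng.uniform(0.05, 0.5, 200)
freqs_b = np.clip(freqs_a + 0.3, 0.0, 0.95)
pop_a = rng.binomial(2, freqs_a, size=(20, 200))
pop_b = rng.binomial(2, freqs_b, size=(20, 200))
G = np.vstack([pop_a, pop_b])

pcs = genotype_pca(G)
print(pcs.shape)   # (40, 2)
```

With this much frequency differentiation, PC1 separates the two synthetic groups cleanly. On real cohorts, analyses usually prune variants in linkage disequilibrium and filter rare sites before running PCA.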
URI
https://repository.hanyang.ac.kr/handle/20.500.11754/125405
http://hanyang.dcollection.net/common/orgView/200000429235
Appears in Collections:
Graduate School of Biomedical Science and Engineering > Department of Biomedical Informatics > Theses (Master)