Comparison of Algorithms for Detecting Genomic Variants

Title
Comparison of Algorithms for Detecting Genomic Variants
Other Titles
A survey of algorithms for identifying structural variants with genomic sequencing data
Author
이시현
Alternative Author(s)
Lee Seehyun
Advisor(s)
노미나
Issue Date
2017-02
Publisher
Hanyang University
Degree
Master
Abstract
Next-generation sequencing (NGS) decodes a genome by fragmenting it into a very large number of pieces, sequencing each piece, and combining the resulting nucleotide sequences. As NGS technology has matured and its cost has fallen, it has come into common use across many research fields. Data generated by whole-genome sequencing (WGS), one NGS technique, is typically analyzed in a four-step pipeline: quality assessment, read alignment, variant identification, and annotation. Decoding and comparing the genomes of individuals, in particular those of patients with a given disease and of unaffected people, makes it possible to find genetic factors associated with that disease. This work focuses on the variant identification step: it compares the algorithms of several tools that detect structural variants and analyzes their performance. Structural variants generally include insertions, deletions, inversions, duplications, and translocations; among these, efficiently finding the long insertions known as transposable elements (TEs) has become an important issue. Five software tools that are still being actively developed and revised (BreakDancer, Pindel, Socrates, Delly, and TEMP) were selected for comparison. To measure the sensitivity and accuracy of each tool's long-insertion identification, three simulated data sets with sixty long insertions were generated and analyzed.
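The abstract names sensitivity and accuracy as the evaluation criteria but does not say how they were computed. The following is a minimal sketch of one common way to score insertion calls against a simulated truth set, not the thesis's actual procedure. Everything in it is an assumption made for illustration: the `Insertion` record, the `evaluate_calls` function, the 100 bp matching `tolerance`, and the reading of "accuracy" as precision (the fraction of calls that match a simulated insertion).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    """Hypothetical record for one insertion breakpoint (not from the thesis)."""
    chrom: str
    pos: int  # breakpoint position on the reference


def evaluate_calls(truth, calls, tolerance=100):
    """Return (sensitivity, precision) for predicted insertion breakpoints.

    Assumption: a call is a true positive if it lies on the same chromosome
    as a not-yet-matched simulated insertion and within `tolerance` bp of it.
    Each simulated insertion can be matched by at most one call.
    """
    unmatched = list(truth)
    true_positives = 0
    for call in calls:
        candidates = [
            t for t in unmatched
            if t.chrom == call.chrom and abs(t.pos - call.pos) <= tolerance
        ]
        if candidates:
            # Greedily pair the call with the nearest unmatched truth site.
            nearest = min(candidates, key=lambda t: abs(t.pos - call.pos))
            unmatched.remove(nearest)
            true_positives += 1
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(calls) if calls else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Toy data, invented for this example only.
    truth = [Insertion("chr1", 1000), Insertion("chr1", 5000), Insertion("chr2", 2000)]
    calls = [Insertion("chr1", 1020), Insertion("chr2", 9000)]
    print(evaluate_calls(truth, calls))
```

In the toy data, one of the three simulated insertions is recovered and one of the two calls is correct, so sensitivity is 1/3 and precision is 1/2. The greedy nearest-site matching is a simplification; a stricter evaluation might also compare the inserted sequence or its length.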
URI
https://repository.hanyang.ac.kr/handle/20.500.11754/124229
http://hanyang.dcollection.net/common/orgView/200000430358
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > COMPUTER SCIENCE(컴퓨터·소프트웨어학과) > Theses (Master)