Repository at Hanyang University: Comparison of Algorithms for Detecting Genomic Variants (유전체 변이를 검출하는 알고리즘 비교)

Title: 유전체 변이를 검출하는 알고리즘 비교 (Comparison of Algorithms for Detecting Genomic Variants)

Other Titles: A Survey of Algorithms for Identifying Structural Variants with Genomic Sequencing Data

Author: 이시현

Alternative Author(s): Lee Seehyun

Advisor(s): 노미나

Issue Date: 2017-02

Publisher: Hanyang University (한양대학교)

Degree: Master

Abstract: Next-Generation Sequencing (NGS) decodes a genome by fragmenting it into a very large number of pieces, reading the nucleotide sequence of each piece, and assembling the results. As NGS technology has matured and its cost has fallen, it has come into common use across many research fields. Data produced by Whole-Genome Sequencing, one NGS technique, are analyzed in a four-step pipeline: quality assessment, read alignment, variant identification, and annotation. Decoding and comparing the genomes of individuals, in particular those of patients with a given disease and of people without it, makes it possible to find genetic factors associated with that disease. This work compares the algorithms and analyzes the performance of several tools that handle the structural-variant part of variant identification. Structural variants generally include insertions, deletions, inversions, duplications, and translocations; among these, efficiently finding long insertions, referred to as transposable elements (TEs), has become an important issue. Five software tools that are still being actively developed and revised (BreakDancer, Pindel, Socrates, Delly, and TEMP) were selected for comparison. To measure the sensitivity and accuracy of each tool's long-insertion identification, three simulated datasets with sixty long insertions were generated and analyzed.
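As a rough illustration of how calls on simulated data can be scored, the sketch below matches predicted insertion breakpoints to a known truth set and reports sensitivity and precision. This is not the thesis's evaluation code: the `Insertion` record, the `evaluate_calls` function, the greedy nearest-breakpoint matching, and the 100 bp tolerance are all assumptions made for this example. The abstract does not define "accuracy", so precision (the fraction of calls that match a simulated insertion) is used here as a stand-in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    """Hypothetical record for one insertion breakpoint."""
    chrom: str
    pos: int


def evaluate_calls(truth, calls, tolerance=100):
    """Return (sensitivity, precision) for insertion calls against a truth set.

    A call counts as a true positive if it lies within `tolerance` bases of a
    not-yet-matched truth insertion on the same chromosome. Each truth
    insertion can be matched at most once (greedy, in call order).
    """
    matched_truth = set()
    true_positive_calls = 0
    for call in calls:
        best = None  # (truth index, distance) of the closest eligible truth site
        for i, t in enumerate(truth):
            if i in matched_truth or t.chrom != call.chrom:
                continue
            dist = abs(t.pos - call.pos)
            if dist <= tolerance and (best is None or dist < best[1]):
                best = (i, dist)
        if best is not None:
            matched_truth.add(best[0])
            true_positive_calls += 1
    sensitivity = len(matched_truth) / len(truth) if truth else 0.0
    precision = true_positive_calls / len(calls) if calls else 0.0
    return sensitivity, precision


# Toy example: three simulated insertions, four calls from a hypothetical tool.
truth = [Insertion("chr1", 1000), Insertion("chr1", 5000), Insertion("chr2", 300)]
calls = [
    Insertion("chr1", 1040),  # matches chr1:1000
    Insertion("chr1", 9000),  # no truth site nearby: false positive
    Insertion("chr2", 310),   # matches chr2:300
    Insertion("chr2", 305),   # chr2:300 already matched: counted as false positive
]
sensitivity, precision = evaluate_calls(truth, calls)
print(f"sensitivity={sensitivity:.3f} precision={precision:.3f}")
```

The matching tolerance matters in practice because tools report breakpoints with differing positional precision, so any comparison across tools should state the window it used.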

URI: https://repository.hanyang.ac.kr/handle/20.500.11754/124229 http://hanyang.dcollection.net/common/orgView/200000430358

Appears in Collections: GRADUATE SCHOOL[S](대학원) > COMPUTER SCIENCE(컴퓨터·소프트웨어학과) > Theses (Master)
