Efficient Algorithms to Identify Peptides from Massive High-resolution MS/MS Spectra

Author
김현우
Advisor(s)
박희진
Issue Date
2016-08
Publisher
Hanyang University (한양대학교)
Degree
Doctor
Abstract
Peptide identification is an important problem in proteomics. One of the most popular scoring schemes for peptide identification is XCorr (cross-correlation). Because calculating XCorr is computationally intensive, considerable effort has gone into developing fast XCorr engines. However, existing XCorr engines are not suitable for high-resolution tandem mass spectrometry because they are too slow and account for most of the running time. We present a high-speed XCorr engine for high-resolution tandem mass spectrometry based on a novel algorithm for calculating XCorr. The algorithm computes XCorr 1.25 to 49 times faster than previous algorithms at a fragment tolerance of 0.01 Da. Recently, proteogenomics has emerged as a research field that combines proteomics and genomics. Proteogenomics research has used six-frame translation of the whole genome or amino acid exon graphs to overcome the limitations of reference protein sequence databases. However, six-frame translation is not suitable for annotating genes that span multiple exons, and amino acid exon graphs are not convenient for representing novel splice variants and exon-skipping events between exons with incompatible reading frames. We propose NextSearch (Nucleotide EXon-graph Transcriptome Search), a proteogenomic pipeline based on a nucleotide exon graph. The pipeline consists of two parts: construction of a compact nucleotide exon graph that systematically incorporates novel splice variations, and a search tool that identifies peptides by searching tandem mass spectra directly against the nucleotide exon graph. Because the exon graph stores nucleotide sequences, it can easily represent novel splice variations and exon-skipping events between exons with incompatible reading frames. The search is performed against the nucleotide exon graph itself, without converting it into protein sequences in FASTA format, which reduces the size of the sequence database storage by an order of magnitude. NextSearch outputs the proteome-to-genome/transcriptome mapping results in a General Feature Format (GFF) file, which can be visualized with public tools such as the UCSC Genome Browser.
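The abstract's point about six-frame translation and multi-exon genes can be illustrated with a toy example. The sketch below is not NextSearch and is not taken from the thesis. The sequences `EXON1`, `INTRON`, and `EXON2` and all function names are made up for illustration, and the only real-world data used is the standard genetic code. It translates a small genomic region in all six reading frames and checks whether a peptide that spans the exon junction appears in any of them.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (last base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate complete codons from the start of nt; '*' marks a stop codon."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))


def reverse_complement(nt):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return nt.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame(nt):
    """Translate both strands at offsets 0, 1, and 2."""
    frames = []
    for strand in (nt, reverse_complement(nt)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


# Hypothetical toy gene: two exons separated by one intron.
EXON1 = "ATGGCTAAAG"
INTRON = "GTAAGTTTTTAG"
EXON2 = "GTCTGGAATGG"

genome = EXON1 + INTRON + EXON2   # what six-frame translation sees
spliced = EXON1 + EXON2           # one path through a nucleotide exon graph

peptide = translate(spliced)      # peptide encoded across the exon junction
in_genome = any(peptide in frame for frame in six_frame(genome))

print("junction peptide:", peptide)
print("found by six-frame translation of the genome:", in_genome)
```

In this toy case the codon for the fourth residue is split across the two exons, so the junction peptide only appears once the exon nucleotides are joined before translation. Storing nucleotides rather than amino acids is what lets a path through the graph be translated in whatever frame the junction produces.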
URI
https://repository.hanyang.ac.kr/handle/20.500.11754/125561
http://hanyang.dcollection.net/common/orgView/200000429276
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > ELECTRONICS AND COMPUTER ENGINEERING(전자컴퓨터통신공학과) > Theses (Ph.D.)
Files in This Item:
There are no files associated with this item.