Confident Identification of Co-fragmented Peptides from Unrestrictive Modification Search Using Machine Learning

Author
박선진
Alternative Author(s)
Sunjin Park
Advisor(s)
Eunok Paek
Issue Date
February 2024
Publisher
Hanyang University Graduate School
Degree
Master
Abstract
MODplus is an unrestrictive search tool that can detect any type of post-translational modification (PTM) from tandem mass spectrometry (MS/MS) data. For each MS/MS spectrum, MODplus generates two peptide-spectrum match (PSM) results from two types of search: 1) a NOR search, performed within a specified precursor mass tolerance, and 2) a C2N search, performed outside the specified mass tolerance. Currently, MODplus chooses the more reliable of the NOR and C2N PSMs to avoid false PTM identifications caused by wrong precursor masses. Sometimes, however, the NOR and C2N PSMs represent co-eluting peptides, in which case both need to be retained.

Because MODplus was developed for open modification search, it does not consider co-eluting peptides. Many recent studies, however, deal with data containing co-eluting peptide information, such as data generated by Data-Independent Acquisition (DIA), so a process is needed to extract co-eluting peptide information from MODplus results. Identifying co-eluting peptides in an unrestrictive modification search can be challenging because of the computational overhead, and it would be inefficient to identify them inside MODplus using methods like those applied in restrictive modification searches. A post-processing step that automatically classifies co-eluting peptides in existing MODplus results, by contrast, can run in reasonable time.

Recent approaches have reported the identification of co-eluting peptides from standard database searches, but in unrestrictive or open searches, determining co-eluting peptides is not trivial. Here, we propose a novel post-processing procedure for MODplus results that identifies co-eluting peptides. We employed a random forest-based Co-eluting Peptide Classifier (CPC) to determine whether NOR and C2N peptides are co-eluting.
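To make the NOR/C2N distinction concrete, the following is a minimal Python sketch of a precursor-tolerance check that separates the two search windows. It is an illustration under stated assumptions, not MODplus's implementation: the helper names `ppm_error` and `search_window`, the 10 ppm default, and the use of a single candidate mass are all hypothetical. The second example mimics a wrong precursor mass caused by picking the wrong isotope peak, one common source of such errors.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Relative mass error in parts per million (ppm)."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def search_window(observed_mass: float, candidate_mass: float,
                  tol_ppm: float = 10.0) -> str:
    """Label a candidate as inside (NOR) or outside (C2N) the precursor tolerance.

    Hypothetical helper: tol_ppm=10.0 is an illustrative default,
    not a MODplus setting.
    """
    if abs(ppm_error(observed_mass, candidate_mass)) <= tol_ppm:
        return "NOR"
    return "C2N"


# Candidate 0.01 Da away from the observed mass: about 6.7 ppm -> NOR
print(search_window(1500.0, 1500.01))
# Candidate one 13C spacing (1.00335 Da) away: about 669 ppm -> C2N
print(search_window(1500.0, 1498.99665))
```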
We analyzed various PSM features and found that the delta score, sequence distance, and retention time difference between NOR and C2N PSMs are important for determining which of the NOR and C2N PSMs is eluting, or whether the two are co-eluting; that is, for assigning each pair to one of three classes: 1) NOR-eluting, 2) C2N-eluting, and 3) co-eluting. Finally, the PSMs classified by CPC were validated with Percolator. Our CPC-based post-processing successfully yielded additional peptide identifications while minimizing false positives, increasing the number of identified peptide sequences by 32.88%.
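The three features and three classes described above can be sketched as follows. This is a minimal Python illustration under assumptions, not the CPC implementation: the `PSM` fields, the use of Levenshtein edit distance as the "sequence distance", taking the absolute difference of a per-PSM retention time value (for example, a predicted RT) as the RT difference, and all helper names are hypothetical. The random forest itself is omitted; a feature vector like this could, for instance, be fed to an off-the-shelf implementation such as scikit-learn's `RandomForestClassifier`.

```python
from dataclasses import dataclass
from enum import Enum


class ElutionClass(Enum):
    NOR_ELUTING = 1
    C2N_ELUTING = 2
    CO_ELUTING = 3


@dataclass
class PSM:
    peptide: str  # peptide sequence (modifications omitted; assumption)
    score: float  # search score, higher is better (assumption)
    rt: float     # retention time in minutes, e.g. predicted RT (assumption)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as a stand-in for 'sequence distance'."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def pair_features(nor: PSM, c2n: PSM) -> dict[str, float]:
    """Hypothetical feature vector for one NOR/C2N pair from the same spectrum."""
    return {
        "delta_score": nor.score - c2n.score,
        "seq_distance": float(edit_distance(nor.peptide, c2n.peptide)),
        "rt_diff": abs(nor.rt - c2n.rt),
    }


def retained_psms(label: ElutionClass, nor: PSM, c2n: PSM) -> list[PSM]:
    """PSMs kept after classification: one for *-eluting, both for co-eluting."""
    if label is ElutionClass.NOR_ELUTING:
        return [nor]
    if label is ElutionClass.C2N_ELUTING:
        return [c2n]
    return [nor, c2n]
```

For example, `pair_features(PSM("PEPTIDEK", 45.0, 32.1), PSM("PEPTLDEK", 38.5, 35.4))` gives a delta score of 6.5, a sequence distance of 1 (a single I/L substitution), and an RT difference of about 3.3 min. Keeping both PSMs only in the co-eluting class mirrors the abstract's point that co-eluting NOR and C2N PSMs both need to be retained.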
URI
http://hanyang.dcollection.net/common/orgView/200000724142
https://repository.hanyang.ac.kr/handle/20.500.11754/188382
Appears in Collections:
GRADUATE SCHOOL > COMPUTER SCIENCE > Theses (Master)
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
