Computational approach for comprehensive transcriptome assembly using high throughput RNA-seq

Title
Computational approach for comprehensive transcriptome assembly using high throughput RNA-seq
Other Titles
A computational transcriptome assembly method using large-scale RNA sequence data
Author
유보현
Alternative Author(s)
Bo-Hyun You
Advisor(s)
남진우
Issue Date
2015-08
Publisher
Hanyang University
Degree
Master
Abstract
High-throughput RNA sequencing (RNA-seq) has revealed unprecedented transcriptome diversity in microorganisms, plants, and animals. However, the annotations of many putative transcripts are fragmentary or have poorly defined boundaries, mainly because RNA-seq reads lack strand information and/or because RNA-seq signals are blurred at both ends of transcripts. Strand information for the reads and unbiased 5’ and 3’ terminal signals of transcripts could correct these erroneous annotations. Here, we present Co-assembly Followed by End-correction (CAFE), which estimates the strand of unstranded reads using maximum likelihood estimation with hidden Markov models (HMMs), re-assembles the RNA-seq reads with the predicted strand information, and corrects transcript boundaries using pre-compiled transcription start sites (TSSs) and cleavage and polyadenylation sites (CPSs). Using CAFE, strand information was successfully predicted for all unstranded RNA-seq reads in this study. CAFE improved the original annotations, which had been assembled separately from strand-specific and unstranded RNA-seq data, by 1.3 to 1.6% in sensitivity and 14 to 18.4% in specificity, reconstructing 166,227 transfrags for HeLa cells (89.49% had both a TSS and a CPS) and 244,085 transfrags for mouse embryonic stem (mES) cells (93.63% had both a TSS and a CPS). More full-length transcripts with both a TSS and a CPS (2,314 for HeLa and 872 for mES cells) were reconstructed than in the original annotations, regardless of the transcriptome assembler used. Among the resulting transfrags, hundreds of putative lincRNAs and a thousand protein-coding genes were newly identified in HeLa and mES cells, in addition to thousands of known lincRNAs, suggesting that CAFE is broadly applicable to discovering novel isoforms and transcripts and to expanding the known repertoire of RNAs.
URI
https://repository.hanyang.ac.kr/handle/20.500.11754/128016
http://hanyang.dcollection.net/common/orgView/200000426957
Appears in Collections:
GRADUATE SCHOOL[S](대학원) > LIFE SCIENCE(생명과학과) > Theses (Master)
Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
