An efficient assembler to remove superbubbles, cycles and pseudocycles in assembly graphs

Title
An efficient assembler to remove superbubbles, cycles and pseudocycles in assembly graphs
Author
루스다
Alternative Author(s)
Rushda Muneer
Advisor(s)
Yongsu Park
Issue Date
2023-08
Publisher
Hanyang University
Degree
Master
Abstract
With advances in DNA sequencing technologies, effective genome assembly has become an active topic in bioinformatics. Although present-day assemblers such as MEGAHIT can efficiently assemble the reads generated by sequencing machines using de Bruijn graph-based approaches, there remains considerable room for improvement because the graph fragments in regions that are complex to assemble. Our assembler takes a graph-based approach that uses the shape of substructures, rather than relying on the underlying sequence alone, to reconstruct longer contigs and minimize misassemblies for microbial genomes. We focus on removing typical graph topologies such as tips, bubbles, cycles, pseudocycles and superbubbles, which usually arise either from erroneous reads or from a high number of closely spaced repeated regions in the fragments. Simplifying these complex structures in the assembly graph helps reconstruct the original assembly more effectively. When evaluated with QUAST metrics, our assembler achieved better results for simpler single genomes: higher longest-alignment and N50 values and fewer misassemblies. However, for larger and more complex genomes, it produced comparable or slightly lower longest-alignment or N50 values and more misassemblies than MEGAHIT. We suspect that removing certain cycles and pseudocycles oversimplifies the graph and causes this problem. So far, our assembler works well with simpler microbial genomes, and we aim to improve it further and extend this research to the reconstruction of complex genomes.
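The abstract compares assemblies by their N50 values. As a reminder of what that metric measures, here is a minimal sketch of the standard N50 definition: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the sample lengths are illustrative assumptions; this is not code from the thesis or from QUAST.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if it is empty)."""
    # Walk the contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2

    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly length defines N50.
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([80, 70, 50, 40, 30, 20]))
```

A higher N50 means that half of the assembled bases sit in longer contigs, which is why graph simplification that merges fragmented paths tends to raise it.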
URI
http://hanyang.dcollection.net/common/orgView/200000684401
https://repository.hanyang.ac.kr/handle/20.500.11754/186747
Appears in Collections:
GRADUATE SCHOOL[S] > COMPUTER SCIENCE > Theses (Master)