AlignGraph: Algorithm for secondary de novo genome assembly guided by closely related references

Abstract

Motivation: De novo assemblies of genomes remain one of the most challenging applications in next-generation sequencing. Usually, their results are incomplete and fragmented into hundreds of contigs. Repeats in genomes and sequencing errors are the main reasons for these complications. With the rapidly growing number of sequenced genomes, it is now feasible to improve assemblies by guiding them with genomes from related species. Results: Here we introduce AlignGraph, an algorithm for extending and joining de novo-assembled contigs or scaffolds guided by closely related reference genomes. It aligns paired-end reads and preassembled contigs or scaffolds to a close reference. From the obtained alignments, it builds a novel data structure, called the PE multipositional de Bruijn graph. The incorporated positional information from the alignments and PE reads allows us to extend the initial assemblies, while avoiding incorrect extensions and early terminations. In our performance tests, AlignGraph was able to substantially improve the contigs and scaffolds from several assemblers. For instance, 28.7-62.3% of the contigs of Arabidopsis thaliana and human could be extended, resulting in improvements of common assembly metrics, such as an increase of the N50 of the extendable contigs by 89.9-94.5% and 80.3-165.8%, respectively. In another test, AlignGraph was able to improve the assembly of a published genome by increasing the N50 of its extendable scaffolds by 86.6%. These results demonstrate AlignGraph's efficiency in improving genome assemblies by taking advantage of closely related references. © 2014 The Author. Published by Oxford University Press. All rights reserved.

Links

PhilArchive



    Upload a copy of this work     Papers currently archived: 93,069

External links

Setup an account with your affiliations in order to access resources via your University's proxy server

Through your library

  • Only published works are available at libraries.

Similar books and articles

Analytics

Added to PP
2017-03-08

Downloads
2 (#1,818,851)

6 months
2 (#1,259,626)

Historical graph of downloads

Sorry, there are not enough data points to plot this chart.
How can I increase my downloads?

Author's Profile

Tao Jiang
Fudan University

Citations of this work

No citations found.

Add more citations

References found in this work

No references found.

Add more references