For the last twenty years, fragment assembly in DNA sequencing followed the """"""""overlap - layout - consensus"""""""" paradigm that is used in all currently available assembly tools. Although this approach proved to be useful in clone-by-clone DNA sequencing, it faces difficulties in genomic shotgun assembly: the existing algorithms make assembly errors and are often unable to resolve repeats even in prokaryotic genomes. Biologists are well aware of potential assembly errors and are forced to carry additional experiments to verify the assembled contigs. We abandon the classical """"""""overlap - layout - consensus"""""""" paradigm in favor of a new Eulerian Superpath approach to fragment assembly. This allows us to reduce the fragment assembly to a variation of the classical Eulerian path problem, and, for the first time, to resolve the problem of repeats in fragment assembly. Our new EULER algorithm resolves all repeats except long 100 percent perfect repeats that are theoretically impossible to resolve without additional experiments. This reduction allows one to generate provably optimal error-free fragment assemblies. The main goal of this proposal is to scale up EULER for assembly of eukaryotic genomes.
Zhi, Degui; Keich, Uri; Pevzner, Pavel et al. (2007) Correcting base-assignment errors in repeat regions of shotgun assembly. IEEE/ACM Trans Comput Biol Bioinform 4:54-64 |
Raphael, Benjamin J; Pevzner, Pavel A (2004) Reconstructing tumor amplisomes. Bioinformatics 20 Suppl 1:i265-73 |
Pevzner, Pavel A; Pevzner, Paul A; Tang, Haixu et al. (2004) De novo repeat classification and fragment assembly. Genome Res 14:1786-96 |
Li, Lei M; Kim, Jong Hyun; Waterman, Michael S (2004) Haplotype reconstruction from SNP alignment. J Comput Biol 11:505-16 |
Raphael, Benjamin; Zhi, Degui; Tang, Haixu et al. (2004) A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Res 14:2336-46 |
Raphael, Benjamin J; Volik, Stanislav; Collins, Colin et al. (2003) Reconstructing tumor genome architectures. Bioinformatics 19 Suppl 2:ii162-71 |
Mulyukov, Zufar; Pevzner, Pavel A (2002) EULER-PCR: finishing experiments for repeat resolution. Pac Symp Biocomput :199-210 |
Heber, Steffen; Alekseyev, Max; Sze, Sing-Hoi et al. (2002) Splicing graphs and EST assembly problem. Bioinformatics 18 Suppl 1:S181-8 |