For the last twenty years, fragment assembly in DNA sequencing followed the """"""""overlap - layout - consensus"""""""" paradigm that is used in all currently available assembly tools. Although this approach proved to be useful in clone-by-clone DNA sequencing, it faces difficulties in genomic shotgun assembly: the existing algorithms make assembly errors and are often unable to resolve repeats even in prokaryotic genomes. Biologists are well aware of potential assembly errors and are forced to carry additional experiments to verify the assembled contigs. We abandon the classical """"""""overlap - layout - consensus"""""""" paradigm in favor of a new Eulerian Superpath approach to fragment assembly. This allows us to reduce the fragment assembly to a variation of the classical Eulerian path problem, and, for the first time, to resolve the problem of repeats in fragment assembly. Our new EULER algorithm resolves all repeats except long 100 percent perfect repeats that are theoretically impossible to resolve without additional experiments. This reduction allows one to generate provably optimal error-free fragment assemblies. The main goal of this proposal is to scale up EULER for assembly of eukaryotic genomes.

Agency
National Institute of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
1R01HG002366-01
Application #
6360359
Study Section
Genome Study Section (GNM)
Program Officer
Felsenfeld, Adam
Project Start
2001-08-24
Project End
2004-07-31
Budget Start
2001-08-24
Budget End
2002-07-31
Support Year
1
Fiscal Year
2001
Total Cost
$717,074
Indirect Cost
Name
University of California San Diego
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
077758407
City
La Jolla
State
CA
Country
United States
Zip Code
92093
Zhi, Degui; Keich, Uri; Pevzner, Pavel et al. (2007) Correcting base-assignment errors in repeat regions of shotgun assembly. IEEE/ACM Trans Comput Biol Bioinform 4:54-64
Raphael, Benjamin J; Pevzner, Pavel A (2004) Reconstructing tumor amplisomes. Bioinformatics 20 Suppl 1:i265-73
Pevzner, Pavel A; Pevzner, Paul A; Tang, Haixu et al. (2004) De novo repeat classification and fragment assembly. Genome Res 14:1786-96
Li, Lei M; Kim, Jong Hyun; Waterman, Michael S (2004) Haplotype reconstruction from SNP alignment. J Comput Biol 11:505-16
Raphael, Benjamin; Zhi, Degui; Tang, Haixu et al. (2004) A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Res 14:2336-46
Raphael, Benjamin J; Volik, Stanislav; Collins, Colin et al. (2003) Reconstructing tumor genome architectures. Bioinformatics 19 Suppl 2:ii162-71
Mulyukov, Zufar; Pevzner, Pavel A (2002) EULER-PCR: finishing experiments for repeat resolution. Pac Symp Biocomput :199-210
Heber, Steffen; Alekseyev, Max; Sze, Sing-Hoi et al. (2002) Splicing graphs and EST assembly problem. Bioinformatics 18 Suppl 1:S181-8