DNA sequencing enables biomedical discovery. Next-generation DNA sequencers such as the 454 platform provide cost-effective alternatives to traditional Sanger sequencing. All sequencing platforms rely on sophisticated assembly software to transform a collection of reads into chromosome-length sequences. Celera Assembler is a premier assembler that has remained at the forefront of genomics for 10 years. It has enabled ground-breaking publications of eukaryotic genomes and metagenomic samples sequenced by Sanger chemistry. Celera Assembler now also supports the 454 technology. Among available software, it is the most aggressive at contig construction and repeat resolution on 454 read and mate-pair data sets, and it is the only assembler that handles the large data sets involved in human-genome-scale projects. Renewed funding will enable several important algorithmic extensions. The software will show improved performance and yield on Sanger, 454, and hybrid data sets. It will gain the capability to exploit short-read next-generation data during assembly of longer reads, and the ability to incorporate reference sequences as an aid to de novo assembly. Its data model will be refined for increased precision on real data. The software will be able to integrate non-shotgun reads generated by directed sequencing or other means. It will improve metagenomic assembly of environmental samples. Software engineering improvements will support algorithm development. To enable wider adoption of Celera Assembler, the project will include documentation, training, end-user support, and support for integration with data sources and analysis tools.

Public Health Relevance

This research will enable scientists to study basic biology by helping them decipher the DNA sequences of entire genomes. This research is necessary because current sequencing technology reads very few DNA bases at a time. Sophisticated software is required to reconstruct the chromosome-length sequences that allow biologists to recognize genes, pathways, evolutionary trees, and ecosystem interdependencies.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM077117-05
Application #: 7916754
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Lyster, Peter
Project Start: 2006-05-11
Project End: 2012-06-30
Budget Start: 2010-07-01
Budget End: 2011-06-30
Support Year: 5
Fiscal Year: 2010
Total Cost: $598,559
Indirect Cost:
Name: J. Craig Venter Institute, Inc.
Department:
Type:
DUNS #: 076364392
City: Rockville
State: MD
Country: United States
Zip Code: 20850

Pearce, S L; Clarke, D F; East, P D et al. (2017) Genomic innovations, transcriptional plasticity and gene loss underlying the evolution and divergence of two highly polyphagous and invasive Helicoverpa pest species. BMC Biol 15:63
Gulia-Nuss, Monika; Nuss, Andrew B; Meyer, Jason M et al. (2016) Genomic insights into the Ixodes scapularis tick vector of Lyme disease. Nat Commun 7:10507
Marinotti, Osvaldo; Cerqueira, Gustavo C; de Almeida, Luiz Gonzaga Paula et al. (2013) The genome of Anopheles darlingi, the main neotropical malaria vector. Nucleic Acids Res 41:7387-400
Koren, Sergey; Schatz, Michael C; Walenz, Brian P et al. (2012) Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol 30:693-700
Prüfer, Kay; Munch, Kasper; Hellmann, Ines et al. (2012) The bonobo genome compared with the chimpanzee and human genomes. Nature 486:527-31
Miller, Webb; Hayes, Vanessa M; Ratan, Aakrosh et al. (2011) Genetic diversity and population structure of the endangered marsupial Sarcophilus harrisii (Tasmanian devil). Proc Natl Acad Sci U S A 108:12348-53
Koren, Sergey; Miller, Jason R; Walenz, Brian P et al. (2010) An algorithm for automated closure during assembly. BMC Bioinformatics 11:457
Miller, Jason R; Koren, Sergey; Sutton, Granger (2010) Assembly algorithms for next-generation sequencing data. Genomics 95:315-27
Rausch, Tobias; Koren, Sergey; Denisov, Gennady et al. (2009) A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads. Bioinformatics 25:1118-24
Denisov, Gennady; Walenz, Brian; Halpern, Aaron L et al. (2008) Consensus generation and variant detection by Celera Assembler. Bioinformatics 24:1035-40

Showing the most recent 10 out of 12 publications