The two most widely used Next Generation Sequencing (NGS) technologies are 454 sequencing and Illumina sequencing. We propose to determine the best sequencing strategy, that is, the optimal mix of 454 and Illumina reads and mate-pair data that produces the best possible assembly at the lowest cost. We also propose to continue developing our software for closing gaps and fixing misassemblies with our shooting method. The method can be extended to use additional NGS reads and mate pairs to close gaps in existing assemblies, which increases contiguity, and to find and correct misassemblies. It can serve as a cheaper alternative to traditional finishing techniques.

The final product of any assembly project is a set of chromosome sequence files. We propose to develop improved software that produces chromosome sequences from assembled contigs using mate-pair and marker data. Our preliminary version works for assemblies with large contigs (N50 size > 100 Kb), whereas genomes assembled from NGS data typically have small contigs (N50 size of 10-20 Kb). We propose to extend the software so that it applies to assemblies of NGS data.

We propose to apply the experience gained in the previous project period to re-assemble the genomes of chicken, rat, and possibly other organisms of public health interest. These assemblies will use existing Trace Archive data combined with additional NGS data where available. Because NGS data is getting cheaper, many groups now want to sequence a wide range of genomes. We therefore propose to produce de novo assemblies of insect and plant genomes, and of other organisms of public health interest, in collaboration with the centers that generate the data. Our goal is to serve as an expert genome assembly group that provides its services and techniques to the community.
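The abstract compares assemblies by their N50 contig size (> 100 Kb for the preliminary software's target, 10-20 Kb for typical NGS assemblies). The sketch below illustrates how that statistic is conventionally computed. It uses the standard definition: N50 is the length L such that contigs of length at least L together contain at least half of all assembled bases. The function name `n50` and the contig lengths in the usage note are hypothetical illustrations, not part of the project's software or data.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (standard definition).

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    """
    # Longest contigs first, so the running sum accumulates the largest pieces.
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths or any(n <= 0 for n in lengths):
        raise ValueError("contig lengths must be a non-empty list of positive integers")

    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid fractional "half" values.
        if 2 * running >= total:
            return length
    # The running sum always reaches the total, so the loop returns first.
    raise AssertionError("unreachable")
```

With made-up contig lengths in Kb, `n50([150, 120, 90, 40, 20])` gives 120, an assembly in the "large contig" range. `n50([25, 18, 15, 12, 10, 8, 5, 3])` gives 15, which is typical of the short-read assemblies the abstract describes. Sorting once and scanning is O(n log n), which is ample for contig sets of assembly size.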

Public Health Relevance

Advances in sequencing technologies have made it possible to obtain large amounts of sequence data quickly and at a lower cost than Sanger sequencing. Our goal is to contribute our techniques, software, and expertise in assembling short-read data to the community. We will continuously improve our methods to obtain the best possible assemblies of new genomes sequenced with the latest technologies. The ultimate goal of this project is to improve public health through a better understanding of the human genome and the genomes of other species.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 2R01HG002945-07
Application #: 8040077
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Felsenfeld, Adam
Project Start: 2003-08-13
Project End: 2014-05-31
Budget Start: 2011-07-13
Budget End: 2012-05-31
Support Year: 7
Fiscal Year: 2011
Total Cost: $288,125
Indirect Cost:

Organization: University of Maryland College Park
Department: Other Basic Sciences
Organization Type: Schools of Arts and Sciences
DUNS #: 790934285
City: College Park
State: MD
Country: United States
Zip Code: 20742

Publications

Muñoz, Adriana; Santos Muñoz, Daniella; Zimin, Aleksey et al. (2016) Evolution of transcriptional networks in yeast: alternative teams of transcriptional factors for different species. BMC Genomics 17:826
Li, Gang; Hillier, LaDeana W; Grahn, Robert A et al. (2016) A High-Resolution SNP Array-Based Linkage Map Anchors a New Domestic Cat Draft Genome Assembly and Provides Detailed Patterns of Recombination. G3 (Bethesda) 6:1607-16
Marçais, Guillaume; Yorke, James A; Zimin, Aleksey (2015) QuorUM: An Error Corrector for Illumina Reads. PLoS One 10:e0130821
Schrader, Lukas; Kim, Jay W; Ence, Daniel et al. (2014) Transposable element islands facilitate adaptation to novel environments in an invasive species. Nat Commun 5:5495
Zimin, Aleksey V; Marçais, Guillaume; Puiu, Daniela et al. (2013) The MaSuRCA genome assembler. Bioinformatics 29:2669-77
Patro, Rob; Sefer, Emre; Malin, Justin et al. (2012) Parsimonious reconstruction of network evolution. Algorithms Mol Biol 7:25
Salzberg, Steven L; Phillippy, Adam M; Zimin, Aleksey et al. (2012) GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res 22:557-67
Marcais, Guillaume; Kingsford, Carl (2011) A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764-70
Dalloul, Rami A; Long, Julie A; Zimin, Aleksey V et al. (2010) Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol 8:
Zimin, Aleksey V; Delcher, Arthur L; Florea, Liliana et al. (2009) A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol 10:R42

Showing the most recent 10 out of 16 publications