Continued improvement of genome assemblies and assembly techniques for Next Gener

Yorke, James

Abstract

The two most widely used Next Generation Sequencing (NGS) technologies are 454 Sequencing and Illumina sequencing. We propose to determine the best sequencing strategy, that is, the optimal mix of 454 and Illumina reads and mate-pair data that produces the best possible assembly at the lowest cost.

We propose to continue developing our software for closing gaps and fixing misassemblies with our shooting method. The method can be extended to use additional NGS reads and mate pairs to close gaps in existing assemblies, which increases contiguity, and to find and correct misassemblies. It can serve as a cheaper alternative to traditional finishing techniques.

The final product of any assembly project is a set of chromosome sequence files. We propose to develop improved software that produces chromosome sequences from assembled contigs using mate-pair and marker data. Our preliminary version works for assemblies with large contigs (N50 > 100 Kb), but genomes assembled from NGS data typically have small contigs (N50 of 10-20 Kb). We propose to extend the software so that it is applicable to assemblies of NGS data.

We propose to apply the experience gained in the previous project period to re-assemble the chicken, rat, and possibly other genomes of public health interest from existing Trace Archive data, combined with additional NGS data where available. As NGS data become cheaper, many groups are now interested in sequencing various genomes. We therefore propose to produce de novo assemblies of insect and plant genomes, and of other organisms of public health interest, in collaboration with the centers that generate the data. Our goal is to serve as an expert genome assembly group that provides its services and techniques to the community.
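The abstract compares assemblies by contig N50 (above 100 Kb for the preliminary chromosome-building software versus 10-20 Kb for typical NGS assemblies). As an illustration only, here is a minimal Python sketch of the standard N50 definition: the length L such that contigs of length at least L together hold at least half of all assembled bases. The function name `n50` is hypothetical and is not part of the project's software.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths in base pairs.

    Hypothetical helper for illustration: contigs are sorted longest
    first and summed until they cover at least half of the total.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
    # Not reached: running equals total after the last contig.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    lengths = [100_000, 80_000, 60_000, 40_000, 20_000]
    # Total is 300,000 bp; the two longest contigs (180,000 bp) pass the
    # halfway mark, so N50 is the second contig's length.
    print(n50(lengths))  # prints 80000
```

Under this definition, the "large contig" case in the abstract corresponds to `n50(lengths) > 100_000`.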

Public Health Relevance

Advances in sequencing technologies have made it possible to obtain large amounts of sequence data quickly and at lower cost than Sanger sequencing. Our goal is to contribute our techniques, software, and expertise in assembling short-read data to the community. We will continuously improve our methods to obtain the best possible assemblies of new genomes sequenced with the latest technologies. The ultimate goal of this project is to improve public health through a better understanding of the human genome and the genomes of other species.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG002945-08
Application #: 8300065
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Felsenfeld, Adam

Project Start: 2003-08-13
Project End: 2014-05-31
Budget Start: 2012-06-01
Budget End: 2013-05-31
Support Year: 8
Fiscal Year: 2012
Total Cost: $288,125
Indirect Cost: $88,125

Institution

Name: University of Maryland College Park
Department: Other Basic Sciences
Type: Schools of Arts and Sciences
DUNS #: 790934285

City: College Park
State: MD
Country: United States
Zip Code: 20742

Related projects


NIH 2013 R01 HG	Continued improvement of genome assemblies and assembly techniques for Next Gener Yorke, James A. / University of Maryland College Park	$275,160
NIH 2012 R01 HG	Continued improvement of genome assemblies and assembly techniques for Next Gener Yorke, James A. / University of Maryland College Park	$288,125
NIH 2011 R01 HG	Continued improvement of genome assemblies and assembly techniques for Next Gener Yorke, James A. / University of Maryland College Park	$288,125
NIH 2009 R01 HG	Continued Improvements of Whole Genome Shotgun Assembly Yorke, James A. / University of Maryland College Park	$227,592
NIH 2009 R01 HG	Continued Improvements of Whole Genome Shotgun Assembly Yorke, James A. / University of Maryland College Park	$71,250
NIH 2008 R01 HG	Continued Improvements of Whole Genome Shotgun Assembly Yorke, James A. / University of Maryland College Park	$227,592
NIH 2007 R01 HG	Continued Improvements of Whole Genome Shotgun Assembly Yorke, James A. / University of Maryland College Park	$232,001
NIH 2005 R01 HG	Reliable Assembler for Whole Genome Shotgun Data. Yorke, James A. / University of Maryland College Park	$174,771
NIH 2004 R01 HG	Reliable Assembler for Whole Genome Shotgun Data. Yorke, James A. / University of Maryland College Park	$174,771
NIH 2003 R01 HG	Reliable Assembler for Whole Genome Shotgun Data. Yorke, James A. / University of Maryland College Park	$145,251

Publications

Muñoz, Adriana; Santos Muñoz, Daniella; Zimin, Aleksey et al. (2016) Evolution of transcriptional networks in yeast: alternative teams of transcriptional factors for different species. BMC Genomics 17:826

Li, Gang; Hillier, LaDeana W; Grahn, Robert A et al. (2016) A High-Resolution SNP Array-Based Linkage Map Anchors a New Domestic Cat Draft Genome Assembly and Provides Detailed Patterns of Recombination. G3 (Bethesda) 6:1607-16

Marçais, Guillaume; Yorke, James A; Zimin, Aleksey (2015) QuorUM: An Error Corrector for Illumina Reads. PLoS One 10:e0130821

Schrader, Lukas; Kim, Jay W; Ence, Daniel et al. (2014) Transposable element islands facilitate adaptation to novel environments in an invasive species. Nat Commun 5:5495

Zimin, Aleksey V; Marçais, Guillaume; Puiu, Daniela et al. (2013) The MaSuRCA genome assembler. Bioinformatics 29:2669-77

Patro, Rob; Sefer, Emre; Malin, Justin et al. (2012) Parsimonious reconstruction of network evolution. Algorithms Mol Biol 7:25

Salzberg, Steven L; Phillippy, Adam M; Zimin, Aleksey et al. (2012) GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res 22:557-67

Marçais, Guillaume; Kingsford, Carl (2011) A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764-70

Dalloul, Rami A; Long, Julie A; Zimin, Aleksey V et al. (2010) Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol 8:

Zimin, Aleksey V; Delcher, Arthur L; Florea, Liliana et al. (2009) A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol 10:R42

Showing the most recent 10 out of 16 publications
