Our main goal is to create a whole genome shotgun assembler for large, repetitive genomes that is superior at recovering the sequence in repeat regions. Our most obvious departure from previous methods is that we use mated pairs from the beginning of the assembly and start by building what we call a "virtual physical map," which determines the relative positions of BACs in the genome. Our assembly will require only whole genome shotgun sequence data, including BAC end reads.

We propose to accomplish the following tasks:

* To develop an integrated code that performs the assembly and outputs the consensus sequence along with quality values. We intend to document our code and post the source on the Internet to make it available to the scientific community around the world.
* To make our program modular so that groups (such as the group at Baylor) can use parts of our assembler separately, including the overlapper routine and the virtual physical map routine.
* To evaluate the reliability of our assembler using data from a finished genome such as C. elegans.
* To compare the performance of our assembler with that of existing assemblers such as ARACHNE and Phusion, using publicly available read data for the human and mosquito genomes.
* To assemble the mouse and rat genomes using publicly available read data and compare our (draft) assembly with publicly available draft assemblies.

The results of our investigations will be published in peer-reviewed scientific journals. We are a purely academic, not-for-profit research group, and we do not plan to patent or in any other way restrict the community's access to our software and results. Our research is directed toward uncovering more of the sequence than existing whole genome shotgun assemblers can provide in highly repetitive genomes such as human or mouse. Our approach may find more genes and lead to a better understanding of the genetic structure of the species.
The ultimate goal of this work is of course the public health benefits expected from more accurately determining and better understanding the human genome.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
1R01HG002945-01
Application #
6676673
Study Section
Genome Study Section (GNM)
Program Officer
Felsenfeld, Adam
Project Start
2003-08-13
Project End
2006-07-31
Budget Start
2003-08-13
Budget End
2004-07-31
Support Year
1
Fiscal Year
2003
Total Cost
$145,251
Indirect Cost
Name
University of Maryland College Park
Department
Other Basic Sciences
Type
Other Domestic Higher Education
DUNS #
790934285
City
College Park
State
MD
Country
United States
Zip Code
20742
Muñoz, Adriana; Santos Muñoz, Daniella; Zimin, Aleksey et al. (2016) Evolution of transcriptional networks in yeast: alternative teams of transcriptional factors for different species. BMC Genomics 17:826
Li, Gang; Hillier, LaDeana W; Grahn, Robert A et al. (2016) A High-Resolution SNP Array-Based Linkage Map Anchors a New Domestic Cat Draft Genome Assembly and Provides Detailed Patterns of Recombination. G3 (Bethesda) 6:1607-16
Marçais, Guillaume; Yorke, James A; Zimin, Aleksey (2015) QuorUM: An Error Corrector for Illumina Reads. PLoS One 10:e0130821
Schrader, Lukas; Kim, Jay W; Ence, Daniel et al. (2014) Transposable element islands facilitate adaptation to novel environments in an invasive species. Nat Commun 5:5495
Zimin, Aleksey V; Marçais, Guillaume; Puiu, Daniela et al. (2013) The MaSuRCA genome assembler. Bioinformatics 29:2669-77
Patro, Rob; Sefer, Emre; Malin, Justin et al. (2012) Parsimonious reconstruction of network evolution. Algorithms Mol Biol 7:25
Salzberg, Steven L; Phillippy, Adam M; Zimin, Aleksey et al. (2012) GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res 22:557-67
Marçais, Guillaume; Kingsford, Carl (2011) A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764-70
Dalloul, Rami A; Long, Julie A; Zimin, Aleksey V et al. (2010) Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol 8:
Rapatski, Brandy; Yorke, James (2009) Modeling HIV outbreaks: the male to female prevalence ratio in the core population. Math Biosci Eng 6:135-43
