Our main goal is to create a whole genome shotgun assembler for large repetitive genomes that is superior at finding the sequence in repeat regions. Our most obvious departure from previous methods is that we use mated pairs from the beginning of the assembly and start by building what we call a "virtual physical map," which determines the relative positions of BACs in the genome. Our assembly will require only whole genome shotgun sequence data, including BAC end reads.

There are several tasks that we propose to accomplish.
* To develop an integrated code that performs the assembly and outputs the consensus sequence along with quality values. We intend to document our code and post the source on the Internet to make it available to the scientific community around the world.
* To make our program modular so that groups (such as the group at Baylor) can use parts of our assembler separately, including the overlapper routine and the virtual physical map routine.
* To evaluate the reliability of our assembler using data from a finished genome such as C. elegans.
* To compare the performance of our assembler with that of existing assemblers such as ARACHNE and Phusion, using publicly available read data for the human and mosquito genomes.
* To assemble the mouse and rat genomes using publicly available read data and compare our (draft) assembly with publicly available draft assemblies.

The results of our investigations will be published in peer-reviewed scientific journals. We are a purely academic, not-for-profit research group, and we do not plan to patent or in any other way restrict the community's access to our software and results. Our research is directed toward uncovering more of the sequence in highly repetitive genomes, such as human or mouse, than existing whole genome shotgun assemblers can provide. Our approach may find more genes and lead to a better understanding of the genetic structure of the species.
The ultimate goal of this work is of course the public health benefits expected from more accurately determining and better understanding the human genome.
Muñoz, Adriana; Santos Muñoz, Daniella; Zimin, Aleksey et al. (2016) Evolution of transcriptional networks in yeast: alternative teams of transcriptional factors for different species. BMC Genomics 17:826
Li, Gang; Hillier, LaDeana W; Grahn, Robert A et al. (2016) A High-Resolution SNP Array-Based Linkage Map Anchors a New Domestic Cat Draft Genome Assembly and Provides Detailed Patterns of Recombination. G3 (Bethesda) 6:1607-16
Marçais, Guillaume; Yorke, James A; Zimin, Aleksey (2015) QuorUM: An Error Corrector for Illumina Reads. PLoS One 10:e0130821
Schrader, Lukas; Kim, Jay W; Ence, Daniel et al. (2014) Transposable element islands facilitate adaptation to novel environments in an invasive species. Nat Commun 5:5495
Zimin, Aleksey V; Marçais, Guillaume; Puiu, Daniela et al. (2013) The MaSuRCA genome assembler. Bioinformatics 29:2669-77
Patro, Rob; Sefer, Emre; Malin, Justin et al. (2012) Parsimonious reconstruction of network evolution. Algorithms Mol Biol 7:25
Salzberg, Steven L; Phillippy, Adam M; Zimin, Aleksey et al. (2012) GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res 22:557-67
Marçais, Guillaume; Kingsford, Carl (2011) A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764-70
Dalloul, Rami A; Long, Julie A; Zimin, Aleksey V et al. (2010) Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol 8:
Zimin, Aleksey V; Delcher, Arthur L; Florea, Liliana et al. (2009) A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol 10:R42