Our main goal is to create a whole genome shotgun assembler for large repetitive genomes that is superior at finding the sequence in repeat regions. Our most obvious departure from previous methods is that we use mated pairs at the start of the assembly, beginning by building what we call a "virtual physical map" that determines the relative positions of BACs in the genome. For our assembly we will require only whole genome shotgun sequence data, including BAC end reads. We propose to accomplish the following tasks.
* To develop an integrated code that performs the assembly and outputs the consensus sequence along with quality values. We intend to document our code and post the source on the Internet to make it available to the scientific community around the world.
* To make our program modular so that groups (such as the group at Baylor) can use parts of our assembler separately, including the overlapper routine and the virtual physical map routine.
* To evaluate the reliability of our assembler using data from a finished genome such as C. elegans.
* To compare the performance of our assembler with that of existing assemblers such as ARACHNE and Phusion, using publicly available read data for the human and mosquito genomes.
* To assemble the mouse and rat genomes using publicly available read data and compare our (draft) assembly with publicly available draft assemblies.
The results of our investigations will be published in peer-reviewed scientific journals. We are a purely academic, not-for-profit research group, and we do not plan to patent or in any other way restrict the community's access to our software and results. Our research is directed toward uncovering more of the sequence than existing whole genome shotgun assemblers can provide in highly repetitive genomes such as human or mouse. Our approach may find more genes and lead to a better understanding of the genetic structure of the species.
The ultimate goal of this work is, of course, the public health benefit expected from determining the human genome more accurately and understanding it better.