Next-generation sequencing has created a wealth of information about the genetic basis of disease and natural variation. The major limitation of next-generation sequencers is their short read length, which makes it difficult to piece together a complete genome. The goal of this proposal is to use molecular biology to create DNA libraries that can be sequenced and more easily assembled into long contiguous stretches. The proposal will first focus on creating contiguous sequences (contigs) of 5,000 nucleotides, assembled from each sub-region of the genome. Next, the goal is to create assemblies of 50,000 nucleotides, also pieced together in parallel across the genome. Computer scripts to optimize the assembly of the genomic regions will also be developed. These longer contiguous sequences will greatly improve the ability of researchers to accurately determine the proper order of genes in a genome, which will speed discovery of the DNA changes that give rise to altered phenotypes in model organisms, non-model organisms, and humans.
Understanding the sequence of a genome helps researchers make new discoveries about biology and health more quickly. This application proposes to develop methods to help put a genome sequence together in the proper order, simplifying the process of reading a genome.
Kamps-Hughes, Nick; Quimby, Aine; Zhu, Zhenyu et al. (2013) Massively parallel characterization of restriction endonucleases. Nucleic Acids Res 41:e119
Davey, John W; Hohenlohe, Paul A; Etter, Paul D et al. (2011) Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet 12:499-510
Etter, Paul D; Preston, Jessica L; Bassham, Susan et al. (2011) Local de novo assembly of RAD paired-end contigs using short sequencing reads. PLoS One 6:e18561