In this project, we propose to discover, sequence, and integrate into the human reference genome three classes of refractory structural variation: novel insertions not present in the reference sequence, regions of recurrent copy-number variation, and duplicated regions of high sequence diversity. Using an existing fosmid clone resource from 16 reference individuals, we will identify and subclone these regions, generate high-quality sequence commensurate in quality with the human reference genome, and assess copy-number variation of these regions. We predict that this project will recover and characterize 1,575 loci that are difficult to fully characterize by other experimental and computational approaches. The advantage of our approach is that it uses the entire sequence of the clone to determine the complete context of this variation. Our goal is to integrate this high-quality sequence into the reference genome as alternate haplotypes that may be annotated and further characterized. This work will thus complement ongoing efforts by the 1000 Genomes Project and the Genome Reference Consortium to comprehensively assess the complete spectrum of human genetic variation.