This subproject is one of many research subprojects that use resources provided by a Center grant funded by NIH/NCRR. The subproject and its principal investigator (PI) may have received primary funding from another NIH source and thus could be represented in other CRISP entries. The institution listed is that of the Center, which is not necessarily the investigator's institution.

Biological research increasingly depends on 'finished' genome sequence as a baseline. More than 99% of the targeted human genome is now represented as high-quality finished sequence, with each base ordered and oriented. Two major types of gaps remain: heterochromatic gaps (estimated at ~190 Mb) and euchromatic gaps (23.0 Mb). Within euchromatic regions, 54.5% (168/308) of all assembly gaps are flanked by segmental duplications. Gap density in the finished genome is greatest within the 2 Mb transition regions between the centromere and euchromatic DNA. We propose that duplications and large-scale structural variation have complicated the sequencing and assembly of these regions, creating de facto gaps.

This grant outlines a systematic strategy to target the sequencing and assembly of pericentromeric DNA using genomic libraries of haploid complexity. Comparative sequence analysis of one pericentromeric region among primates will serve as a model for understanding the pattern of structural variation as a function of evolutionary time. In addition, this competitive renewal develops a computational pipeline to support analysis of duplication content in other mammalian genomes. The results will provide a framework for understanding these regions in other organisms and will complement ongoing NHGRI-approved whole-genome shotgun sequencing efforts. The presence of recent segmental duplications remains the single most important predictor of gap location within euchromatic sequence.
The resolution of these exceptional regions is, therefore, critical for accurate assembly and annotation of genomes.
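The flanking statistic cited above (168 of 308 euchromatic gaps, or 54.5%, flanked by segmental duplication) is a simple interval-overlap count. The following sketch is only an illustration and does not describe the project's pipeline. It assumes gaps and duplications are given as half-open intervals, and it treats a gap as "flanked" if any duplication overlaps a fixed window on either side. The `Interval` type, the function names, the window size, and the toy coordinates are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A half-open genomic interval [start, end) on one chromosome (hypothetical type)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_flanked(gap: Interval, dups: list[Interval], window: int) -> bool:
    """Assumed definition: a duplication overlaps the `window` bp left or right of the gap."""
    left = Interval(gap.chrom, max(0, gap.start - window), gap.start)
    right = Interval(gap.chrom, gap.end, gap.end + window)
    return any(overlaps(flank, d) for d in dups for flank in (left, right))


def flanked_fraction(gaps: list[Interval], dups: list[Interval], window: int) -> tuple[int, int, float]:
    """Return (flanked gap count, total gap count, fraction flanked)."""
    flanked = sum(1 for g in gaps if is_flanked(g, dups, window))
    return flanked, len(gaps), (flanked / len(gaps) if gaps else 0.0)


# Toy example with invented coordinates (not real assembly data).
toy_gaps = [
    Interval("chr1", 1_000, 2_000),
    Interval("chr1", 50_000, 51_000),
    Interval("chr2", 500, 600),
]
toy_dups = [
    Interval("chr1", 2_500, 4_000),
    Interval("chr2", 10_000, 12_000),
]

print(flanked_fraction(toy_gaps, toy_dups, window=1_000))  # prints (1, 3, 0.3333333333333333)
```

With real data, the same count over 308 gaps with 168 flanked gives 168/308 ≈ 0.545, which matches the 54.5% figure. The result depends on the flank window chosen, and this sketch does not specify that window.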