The goal of the "Genome sequence variation" project is to understand the genomic structure of variation and the fundamental forces that have shaped it, and to use this knowledge to understand the genetic causes of disease. First, we analyzed the overlapping portions of large-insert (BAC) clones sequenced for the construction of the public human reference sequence and found 500,000 computational candidate SNPs. Independent laboratory experiments verified these candidates to be high-quality predictions (Marth et al., Nature Genetics 2001). We analyzed the genomic distribution of these SNPs and found that nucleotide diversity correlated with structural and functional features such as G+C content, CpG dinucleotide content, repeat content, recombination frequency, and coding features. However, the variance around these correlations is so large that even all of these features together are very poor predictors of local nucleotide diversity. This showed that random forces (genetic drift) are likely the main component in the description of nucleotide diversity. We have studied genetic drift under realistic recombination and mutation rates and dynamic models of population history, and found that a bottleneck-shaped model accounts for the data at all length scales we analyzed (Marth et al., PNAS, in press). Among other population-genetic conclusions, this model predicts a lower level of linkage disequilibrium in the genome of the population (or populations) represented in the public genome sequence than previously expected. This prediction has since been confirmed in several studies from other laboratories. In the BAC overlaps, we have also found over 100,000 deletion/insertion polymorphisms (DIPs). Our collaborator analyzed thousands of these, and the results have been reported (Weber et al., AJHG 2002).
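As a simplified illustration of the kind of analysis described above, the sketch below summarizes a pre-aligned overlap between two clone sequences: it counts candidate SNPs (mismatched bases) and candidate DIPs (runs of alignment gaps), estimates nucleotide diversity (with only two sequences, π is just the fraction of compared sites that differ), and correlates π with G+C content across fixed windows. It is not the project's pipeline. It assumes the overlap is already aligned, treats every mismatch as a SNP, ignores base quality and duplicated (paralogous) sequence, and the function names `overlap_stats`, `pearson`, and `diversity_vs_gc` are hypothetical.

```python
from math import sqrt


def overlap_stats(seq_a: str, seq_b: str) -> dict:
    """Summarize one pre-aligned overlap between two clone sequences.

    Both strings must have equal length; '-' marks an alignment gap.
    Mismatches at ungapped columns count as candidate SNPs, and each maximal
    run of gapped columns counts as one candidate DIP (a simplification).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = snps = dips = gc = 0
    in_gap = False
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            if not in_gap:  # first column of a new gap run
                dips += 1
            in_gap = True
            continue
        in_gap = False
        compared += 1
        if a != b:
            snps += 1
        if a in "GC":  # G+C content is measured on seq_a only
            gc += 1
    return {
        "compared": compared,
        "snps": snps,
        "dips": dips,
        # With two sequences, pi is simply differences per compared site.
        "pi": snps / compared if compared else 0.0,
        "gc": gc / compared if compared else 0.0,
    }


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def diversity_vs_gc(seq_a: str, seq_b: str, window: int) -> float:
    """Correlate per-window nucleotide diversity with G+C content."""
    stats = [overlap_stats(seq_a[i:i + window], seq_b[i:i + window])
             for i in range(0, len(seq_a), window)]
    stats = [s for s in stats if s["compared"] > 0]  # skip all-gap windows
    return pearson([s["gc"] for s in stats], [s["pi"] for s in stats])


if __name__ == "__main__":
    print(overlap_stats("ACGTACGTAC", "ACGAACG-AC"))
    print(diversity_vs_gc("GGCCATATGCAT", "GGAAATATGCAA", window=4))
```

On the toy overlap, `overlap_stats` compares 9 ungapped sites and reports 1 candidate SNP and 1 candidate DIP. Real windows would span kilobases of sequence; squaring r gives the fraction of window-to-window variance in π that G+C content explains linearly, one way to quantify how weak such a predictor is.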

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000158-01
Application #
6681416
Study Section
(CBB)
Support Year
1
Fiscal Year
2002
Name
National Library of Medicine
Country
United States
Publications
Weber, James L; David, Donna; Heil, Jeremy et al. (2002) Human diallelic insertion/deletion polymorphisms. Am J Hum Genet 71:854-62
Sachidanandam, R; Weissman, D; Schmidt, S C et al. (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928-33
Marth, G; Yeh, R; Minton, M et al. (2001) Single-nucleotide polymorphisms in the public domain: how useful are they? Nat Genet 27:371-2