The 1000 Genomes Project (TGP) has tremendous potential to answer fundamental questions in human population genetics and shape the future design of medical genomic studies. Key to realizing this potential is the development of efficient, robust, and powerful computational methods for analysis of the copious amounts of data generated by the project. Here, we propose novel approaches for characterizing population structure, analyzing patterns of admixture, and localizing signatures of selection across the 2,000 samples of the TGP. Our project has three primary aims.

First, we will construct detailed models of human demographic history based on the TGP. To accomplish this, we will develop approaches for analyzing the joint allele frequency spectrum of rare and common SNPs, copy number variants (CNVs), and haplotypes across all the populations being surveyed. Full sequence data will make these approaches considerably more informative about the recent past, where distortions in frequency spectra are particularly important for testing associations with rare variants.

Second, we will characterize patterns of population structure and admixture in the four Hispanic/Latino and three African-American TGP samples. The TGP presents a tremendous opportunity for catalyzing population and medical genomics research for these important and understudied ethnic minority groups. We will develop novel statistical genomic approaches for reconstructing the genetic history of admixed populations and apply these methods to the TGP samples. Our methods will be tailored for short-read sequence data and will leverage the trio design of the sampling.

Third, we will detect signatures of balancing, purifying, and positive selection in the full TGP data set. We will develop software tools to integrate signatures of natural selection based on a new approach that uses numerical methods to fit a diffusion approximation to the multi-dimensional site frequency spectrum. This approach allows identification of distortions caused by positive, balancing, or negative selection, and it is especially well suited to low-coverage short-read sequence data. These inferences will be integrated with maps of GWAS hits to accelerate discovery of disease-associated variants.
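
As an illustration of the central data summary named above, the following minimal sketch tabulates a joint (two-population) site frequency spectrum from per-site derived-allele counts. It is not the project's software: the function name `joint_sfs`, its parameters, and the toy counts are hypothetical, and the sketch assumes that derived alleles are already polarized and that no genotypes are missing.

```python
import numpy as np

def joint_sfs(counts_a, counts_b, n_a, n_b):
    """Tabulate a two-population joint site frequency spectrum.

    counts_a[i], counts_b[i]: derived-allele counts at site i in each population.
    n_a, n_b: number of sampled chromosomes in each population.
    Returns an (n_a + 1) x (n_b + 1) integer matrix whose entry [j, k] is the
    number of sites with j derived copies in population A and k in population B.
    """
    counts_a = np.asarray(counts_a, dtype=int)
    counts_b = np.asarray(counts_b, dtype=int)
    if counts_a.shape != counts_b.shape:
        raise ValueError("both populations must cover the same sites")
    if counts_a.min() < 0 or counts_a.max() > n_a:
        raise ValueError("counts_a must lie in [0, n_a]")
    if counts_b.min() < 0 or counts_b.max() > n_b:
        raise ValueError("counts_b must lie in [0, n_b]")

    sfs = np.zeros((n_a + 1, n_b + 1), dtype=int)
    # np.add.at accumulates correctly when the same cell is hit more than once.
    np.add.at(sfs, (counts_a, counts_b), 1)
    return sfs

# Toy data (hypothetical): five sites, four chromosomes sampled per population.
counts_pop1 = [1, 0, 2, 1, 4]
counts_pop2 = [0, 1, 2, 0, 4]
sfs = joint_sfs(counts_pop1, counts_pop2, n_a=4, n_b=4)
print(sfs)
```

Demographic or selection inference of the kind described would then compare such an observed matrix with the spectrum expected under a model. That fitting step is not shown here.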

Public Health Relevance

Medical genetics research provides a vehicle for uncovering the heritable basis of complex disease. The 1000 Genomes Project is an international effort to sequence the genomes of approximately 2,000 diverse human subjects. We propose to analyze these data in order to characterize differences among genomes and catalyze medical and population genomic research throughout the world.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 3U01HG005715-02S1
Application #: 8526601
Study Section: Special Emphasis Panel (ZHG1-HGR-M (J1))
Program Officer: Brooks, Lisa
Project Start: 2010-09-09
Project End: 2014-06-30
Budget Start: 2012-09-01
Budget End: 2014-06-30
Support Year: 2
Fiscal Year: 2012
Total Cost: $196,250
Indirect Cost: $71,250

Name: Stanford University
Department: Genetics
Type: Schools of Medicine
DUNS #: 009214214
City: Stanford
State: CA
Country: United States
Zip Code: 94305
1000 Genomes Project Consortium; Auton, Adam; Brooks, Lisa D et al. (2015) A global reference for human genetic variation. Nature 526:68-74
Shringarpure, Suyash S; Carroll, Andrew; De La Vega, Francisco M et al. (2015) Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes. PLoS One 10:e0129277
Arbiza, Leonardo; Gottipati, Srikanth; Siepel, Adam et al. (2014) Contrasting X-linked and autosomal diversity across 14 human populations. Am J Hum Genet 94:827-44
Gazave, Elodie; Ma, Li; Chang, Diana et al. (2014) Neutral genomic regions refine models of recent rapid human population growth. Proc Natl Acad Sci U S A 111:757-62
Carpenter, Meredith L; Buenrostro, Jason D; Valdiosera, Cristina et al. (2013) Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. Am J Hum Genet 93:852-64
Ma, Li; Clark, Andrew G; Keinan, Alon (2013) Gene-based testing of interactions in association studies of quantitative traits. PLoS Genet 9:e1003321
Gravel, Simon; Zakharia, Fouad; Moreno-Estrada, Andres et al. (2013) Reconstructing Native American migrations from whole-genome and whole-exome data. PLoS Genet 9:e1004023
Gazave, Elodie; Chang, Diana; Clark, Andrew G et al. (2013) Population growth inflates the per-individual number of deleterious mutations and reduces their mean effect. Genetics 195:969-78
1000 Genomes Project Consortium; Abecasis, Goncalo R; Auton, Adam et al. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491:56-65
Chang, Diana; Keinan, Alon (2012) Predicting signatures of "synthetic associations" and "natural associations" from empirical patterns of human genetic variation. PLoS Comput Biol 8:e1002600

Showing the most recent 10 out of 19 publications