Population Structure Admixture and Selection across the 1000 Genomes Data Set

Bustamante, Carlos; Clark, Andrew

Abstract

The 1000 Genomes Project (TGP) has tremendous potential to answer fundamental questions in human population genetics and shape the future design of medical genomic studies. Key to realizing this potential is the development of efficient, robust, and powerful computational methods for analysis of the copious amounts of data generated by the project. Here, we propose novel approaches for characterizing population structure, analyzing patterns of admixture, and localizing signatures of selection across the 2,000 samples of the TGP. Our project has three primary aims. First, we will construct detailed models of human demographic history based on the TGP. To accomplish this, we will develop approaches for analyzing the joint allele frequency spectrum of rare and common SNPs, copy number variants (CNVs), and haplotypes across all the populations being surveyed. Having full sequence data will render these approaches dramatically better at making inferences about the recent past, where distortions in frequency spectra are particularly important for testing associations with rare variants. Second, we will characterize patterns of population structure and admixture in the four Hispanic/Latino and three African-American TGP samples. The TGP presents a tremendous opportunity for catalyzing population and medical genomics research for these important and understudied ethnic minority groups. We will develop novel statistical genomic approaches for reconstructing the genetic history of admixed populations and apply these methods to the TGP samples. Our methods will be tailored for short-read sequence data and will leverage the trio design of the sampling. Third, we will detect signatures of balancing, purifying, and positive selection in the full TGP data set. We will develop software tools to integrate signatures of natural selection based on a new approach that uses numerical methods to fit a diffusion approximation to the multi-dimensional site frequency spectrum. This approach allows identification of distortions caused by positive, balancing, or negative selection. The method is especially well suited to low coverage short-read sequence data. These inferences will be integrated with the maps of GWAS hits to accelerate discovery of disease-associated variants.
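
For readers unfamiliar with the joint allele (site) frequency spectrum mentioned in the abstract, the following is a minimal illustrative sketch of how such a spectrum can be tabulated for two populations. It is not the project's software: the function name `joint_sfs`, its parameters, and the toy allele counts are all hypothetical, and real analyses would also need to handle missing data, ancestral-state uncertainty, and low-coverage genotype likelihoods.

```python
def joint_sfs(derived_a, derived_b, n_a, n_b):
    """Tabulate a two-population joint site frequency spectrum.

    derived_a[k] and derived_b[k] are the derived-allele counts observed at
    site k in samples of n_a and n_b chromosomes, respectively.
    Returns an (n_a + 1) x (n_b + 1) nested list in which entry [i][j] is the
    number of sites carrying i derived copies in population A and j in B.
    (Hypothetical helper for illustration only.)
    """
    if len(derived_a) != len(derived_b):
        raise ValueError("both populations must cover the same sites")
    sfs = [[0] * (n_b + 1) for _ in range(n_a + 1)]
    for i, j in zip(derived_a, derived_b):
        if not (0 <= i <= n_a and 0 <= j <= n_b):
            raise ValueError("allele count outside the sample size")
        sfs[i][j] += 1
    return sfs


# Toy data (made up): five sites, four chromosomes sampled per population.
counts_a = [1, 0, 2, 1, 4]
counts_b = [0, 1, 2, 0, 4]
sfs = joint_sfs(counts_a, counts_b, n_a=4, n_b=4)

# Sites private to population A at count 1 land in cell [1][0].
print(sfs[1][0])
```

Demographic inference of the kind described in the abstract compares a table like this against the spectrum expected under a candidate model; an excess of low-count cells, for example, is the kind of distortion associated with recent population growth.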

Public Health Relevance

Medical genetics research provides a vehicle for uncovering the heritable basis of complex disease. The 1000 Genomes Project is an international effort to sequence the genomes of approximately 2,000 diverse human subjects. We propose to analyze these data in order to characterize differences among genomes and catalyze medical and population genomic research throughout the world.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 1U01HG005715-01
Application #: 7881973
Study Section: Special Emphasis Panel (ZHG1-HGR-M (J1))
Program Officer: Brooks, Lisa

Project Start: 2010-09-09
Project End: 2012-06-30
Budget Start: 2010-09-09
Budget End: 2011-06-30
Support Year: 1
Fiscal Year: 2010
Total Cost: $441,922
Indirect Cost

Institution

Name: Stanford University
Department: Genetics
Type: Schools of Medicine
DUNS #: 009214214

City: Stanford
State: CA
Country: United States
Zip Code: 94305

Related projects


NIH 2012 U01 HG	Population Structure Admixture and Selection across the 1000 Genomes Data Set	Bustamante, Carlos D.; Clark, Andrew G. / Stanford University	$196,250
NIH 2011 U01 HG	Population Structure Admixture and Selection across the 1000 Genomes Data Set	Bustamante, Carlos D.; Clark, Andrew G. / Stanford University	$436,083
NIH 2010 U01 HG	Population Structure Admixture and Selection across the 1000 Genomes Data Set	Bustamante, Carlos D.; Clark, Andrew G. / Stanford University	$441,922

Publications

1000 Genomes Project Consortium; Auton, Adam; Brooks, Lisa D et al. (2015) A global reference for human genetic variation. Nature 526:68-74

Shringarpure, Suyash S; Carroll, Andrew; De La Vega, Francisco M et al. (2015) Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes. PLoS One 10:e0129277

Gazave, Elodie; Ma, Li; Chang, Diana et al. (2014) Neutral genomic regions refine models of recent rapid human population growth. Proc Natl Acad Sci U S A 111:757-62

Arbiza, Leonardo; Gottipati, Srikanth; Siepel, Adam et al. (2014) Contrasting X-linked and autosomal diversity across 14 human populations. Am J Hum Genet 94:827-44

Carpenter, Meredith L; Buenrostro, Jason D; Valdiosera, Cristina et al. (2013) Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. Am J Hum Genet 93:852-64

Ma, Li; Clark, Andrew G; Keinan, Alon (2013) Gene-based testing of interactions in association studies of quantitative traits. PLoS Genet 9:e1003321

Gravel, Simon; Zakharia, Fouad; Moreno-Estrada, Andres et al. (2013) Reconstructing Native American migrations from whole-genome and whole-exome data. PLoS Genet 9:e1004023

Gazave, Elodie; Chang, Diana; Clark, Andrew G et al. (2013) Population growth inflates the per-individual number of deleterious mutations and reduces their mean effect. Genetics 195:969-78

1000 Genomes Project Consortium; Abecasis, Goncalo R; Auton, Adam et al. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491:56-65

Chang, Diana; Keinan, Alon (2012) Predicting signatures of "synthetic associations" and "natural associations" from empirical patterns of human genetic variation. PLoS Comput Biol 8:e1002600

Showing the most recent 10 out of 19 publications
