The 1000 Genomes Project (TGP) has tremendous potential to answer fundamental questions in human population genetics and to shape the future design of medical genomic studies. Key to realizing this potential is the development of efficient, robust, and powerful computational methods for analyzing the copious data the project generates. Here, we propose novel approaches for characterizing population structure, analyzing patterns of admixture, and localizing signatures of selection across the 2,000 samples of the TGP. Our project has three primary aims.
First, we will construct detailed models of human demographic history based on the TGP. To accomplish this, we will develop approaches for analyzing the joint allele frequency spectrum of rare and common SNPs, copy number variants (CNVs), and haplotypes across all the populations being surveyed. Full sequence data will make these approaches dramatically better at inferring the recent past, where distortions in frequency spectra are particularly important for testing associations with rare variants.
Second, we will characterize patterns of population structure and admixture in the four Hispanic/Latino and three African-American TGP samples. The TGP presents a tremendous opportunity for catalyzing population and medical genomics research for these important and understudied ethnic minority groups. We will develop novel statistical genomic approaches for reconstructing the genetic history of admixed populations and apply these methods to the TGP samples. Our methods will be tailored for short-read sequence data and will leverage the trio design of the sampling.
Third, we will detect signatures of balancing, purifying, and positive selection in the full TGP data set. We will develop software tools to integrate signatures of natural selection based on a new approach that uses numerical methods to fit a diffusion approximation to the multi-dimensional site frequency spectrum. This approach allows identification of distortions caused by positive, balancing, or negative (purifying) selection, and it is especially well suited to low-coverage short-read sequence data. These inferences will be integrated with maps of GWAS hits to accelerate the discovery of disease-associated variants.
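The joint (multi-dimensional) site frequency spectrum that the first and third aims rely on can be illustrated with a minimal sketch for two populations. This is not the project's software. The function name `joint_sfs`, the toy allele counts, and the sample sizes are all hypothetical, and the sketch assumes that derived-allele counts per site have already been called for each population.

```python
import numpy as np


def joint_sfs(counts_a, counts_b, n_a, n_b):
    """Tabulate a two-population joint site frequency spectrum.

    counts_a, counts_b: derived-allele counts per site in each population
                        (hypothetical inputs, assumed already called).
    n_a, n_b:           number of sampled chromosomes in each population.
    Returns an (n_a + 1) x (n_b + 1) integer array whose entry [i, j] is the
    number of sites with i derived alleles in A and j derived alleles in B.
    """
    counts_a = np.asarray(counts_a, dtype=int)
    counts_b = np.asarray(counts_b, dtype=int)
    if counts_a.shape != counts_b.shape:
        raise ValueError("both populations must cover the same sites")
    if counts_a.min() < 0 or counts_a.max() > n_a:
        raise ValueError("counts_a outside the range 0..n_a")
    if counts_b.min() < 0 or counts_b.max() > n_b:
        raise ValueError("counts_b outside the range 0..n_b")

    sfs = np.zeros((n_a + 1, n_b + 1), dtype=int)
    # np.add.at accumulates correctly when the same (i, j) cell repeats.
    np.add.at(sfs, (counts_a, counts_b), 1)
    return sfs


# Toy data: five sites, four chromosomes sampled per population.
toy_a = [1, 0, 2, 1, 4]
toy_b = [0, 1, 2, 0, 4]
sfs = joint_sfs(toy_a, toy_b, n_a=4, n_b=4)

# Summing over population B recovers the one-dimensional spectrum of A.
marginal_a = sfs.sum(axis=1)
print(sfs)
print(marginal_a)
```

A demographic or selection model would then be fitted by comparing such an observed table with the spectrum expected under the model, for example one derived from a diffusion approximation. That fitting step is outside the scope of this sketch.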
Medical genetics research provides a vehicle for uncovering the heritable basis of complex disease. The 1000 Genomes Project is an international effort to sequence the genomes of approximately 2,000 diverse human subjects. We propose to analyze these data in order to characterize differences among genomes and to catalyze medical and population genomic research throughout the world.
|1000 Genomes Project Consortium; Auton, Adam; Brooks, Lisa D et al. (2015) A global reference for human genetic variation. Nature 526:68-74|
|Shringarpure, Suyash S; Carroll, Andrew; De La Vega, Francisco M et al. (2015) Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes. PLoS One 10:e0129277|
|Arbiza, Leonardo; Gottipati, Srikanth; Siepel, Adam et al. (2014) Contrasting X-linked and autosomal diversity across 14 human populations. Am J Hum Genet 94:827-44|
|Gazave, Elodie; Ma, Li; Chang, Diana et al. (2014) Neutral genomic regions refine models of recent rapid human population growth. Proc Natl Acad Sci U S A 111:757-62|
|Carpenter, Meredith L; Buenrostro, Jason D; Valdiosera, Cristina et al. (2013) Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. Am J Hum Genet 93:852-64|
|Ma, Li; Clark, Andrew G; Keinan, Alon (2013) Gene-based testing of interactions in association studies of quantitative traits. PLoS Genet 9:e1003321|
|Gravel, Simon; Zakharia, Fouad; Moreno-Estrada, Andres et al. (2013) Reconstructing Native American migrations from whole-genome and whole-exome data. PLoS Genet 9:e1004023|
|Gazave, Elodie; Chang, Diana; Clark, Andrew G et al. (2013) Population growth inflates the per-individual number of deleterious mutations and reduces their mean effect. Genetics 195:969-78|
|1000 Genomes Project Consortium; Abecasis, Goncalo R; Auton, Adam et al. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491:56-65|
|Chang, Diana; Keinan, Alon (2012) Predicting signatures of "synthetic associations" and "natural associations" from empirical patterns of human genetic variation. PLoS Comput Biol 8:e1002600|