Significant reductions in cost, together with improvements in the ability to obtain genome-wide genotypes and even entire genome sequences of individuals, provide an opportunity to understand genetic variation among humans, and its relationship to complex trait variation, at an unprecedented level of resolution. Such data have revealed that human population genetic structure is better described as a continuous phenomenon than as a set of well-defined discrete subpopulations. However, classical population genetic theory and methods for population structure are largely based on the assumption of non-overlapping, discretely structured populations. For example, Wright's F-statistics and association testing have mostly been studied in the context of K distinct populations. A number of innovative algorithms for analyzing structured populations with dense genotyping data have recently been proposed, but a firm statistical foundation for this setting has yet to be developed. We propose to tackle a number of open problems in a way that yields a coherent theoretical framework and set of statistical methodologies. We will apply these methods in extensive analyses of existing data sets, including the Human Genome Diversity Project, the 1000 Genomes Project, and genome-wide association studies (GWAS) involving structured populations.
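To make the discrete-population assumption concrete, here is the classical variance form of Wright's F_ST at a single biallelic locus, F_ST = Var(p_k) / (p̄(1 − p̄)), computed from the allele frequencies p_1, ..., p_K of K predefined subpopulations. This is an illustrative textbook calculation, not a method from this project; the function name `wright_fst` and the equal weighting of subpopulations are assumptions made for the sketch.

```python
from statistics import fmean


def wright_fst(subpop_freqs):
    """Classical Wright's F_ST at one biallelic locus (illustrative sketch).

    subpop_freqs: allele frequencies p_1..p_K, one per discrete subpopulation.
    Assumes K equally weighted subpopulations (hypothetical simplification).
    """
    k = len(subpop_freqs)
    if k < 2:
        raise ValueError("need at least two subpopulations")
    # Mean allele frequency across the K subpopulations
    pbar = fmean(subpop_freqs)
    if pbar in (0.0, 1.0):
        raise ValueError("locus is monomorphic in the pooled population")
    # Variance of subpopulation frequencies around the mean (divide by K)
    var_p = sum((p - pbar) ** 2 for p in subpop_freqs) / k
    # Scale by the binomial variance of the pooled frequency
    return var_p / (pbar * (1.0 - pbar))


# Two subpopulations with frequencies 0.2 and 0.8 give F_ST of about 0.36
print(wright_fst([0.2, 0.8]))
```

The calculation requires every individual to carry a discrete subpopulation label before any p_k can be computed; when structure is continuous, no such labeling exists, which is the gap the abstract describes.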

Public Health Relevance

Understanding genome-wide patterns of genetic variation among individuals, and how this variation relates to complex diseases, is one of the primary goals of modern medical research. The proposed research will contribute to this goal by tackling a number of open problems so that a coherent statistical framework and set of methodologies emerge. These methods can be applied to genome-wide data sets of genetic variation to produce a clearer picture of the genetic basis of human disease.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG006448-01A1
Application #: 8296777
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brooks, Lisa
Project Start: 2012-08-25
Project End: 2015-06-30
Budget Start: 2012-08-25
Budget End: 2013-06-30
Support Year: 1
Fiscal Year: 2012
Total Cost: $300,004
Indirect Cost: $113,666
Organization: Princeton University
Organization Type: Organized Research Units
DUNS #: 002484665
City: Princeton
State: NJ
Country: United States
Zip Code: 08544

Publications

Gopalan, Prem; Hao, Wei; Blei, David M et al. (2016) Scaling probabilistic models of genetic variation to millions of humans. Nat Genet 48:1587-1590
Hao, Wei; Song, Minsun; Storey, John D (2016) Probabilistic models of genetic variation in structured populations applied to global human studies. Bioinformatics 32:713-21
Song, Minsun; Hao, Wei; Storey, John D (2015) Testing for genetic associations in arbitrarily structured populations. Nat Genet 47:550-4
Chung, Neo Christopher; Storey, John D (2015) Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics 31:545-54