Significant improvements in the cost and feasibility of obtaining genome-wide genotypes, and even entire genome sequences, of individuals provide an opportunity to understand genetic variation among humans and its relationship to complex trait variation at an unprecedented level of resolution. As these data have accumulated, it has been discovered that human population genetic structure is more a continuous phenomenon than a set of well-defined, discrete subpopulations. However, classical population genetic theory and methods for population structure are largely based on the assumption of non-overlapping, discretely structured populations. For example, Wright's F-statistics and association testing have mostly been studied in the context of K distinct populations. A number of innovative algorithms for analyzing structured populations with dense genotyping have recently been proposed, but a firm statistical foundation has yet to be developed for this setting. We propose to tackle a number of open problems in such a way that a coherent theoretical framework and set of statistical methodologies will emerge. We will apply these methods in extensive analyses of existing data sets, including the Human Genome Diversity Project, the 1000 Genomes Project, and genome-wide association studies (GWAS) involving structured populations.
Understanding genome-wide patterns of genetic variation among individuals and how these patterns relate to complex diseases is one of the primary goals of modern medical research. The proposed research will contribute to this goal by tackling a number of open problems so that a coherent statistical framework and set of methodologies emerge. These can then be applied to data sets of genome-wide genetic variation to produce a clearer picture of the genetic basis of human disease.
Gopalan, Prem; Hao, Wei; Blei, David M et al. (2016) Scaling probabilistic models of genetic variation to millions of humans. Nat Genet 48:1587-1590
Hao, Wei; Song, Minsun; Storey, John D (2016) Probabilistic models of genetic variation in structured populations applied to global human studies. Bioinformatics 32:713-721
Chung, Neo Christopher; Storey, John D (2015) Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics 31:545-554
Song, Minsun; Hao, Wei; Storey, John D (2015) Testing for genetic associations in arbitrarily structured populations. Nat Genet 47:550-554