Title: Models and Methods for Population Genomics

Abstract: Understanding genome-wide genetic variation and its role in health-related complex traits in humans is one of the most important goals of modern biomedical research. There continues to be a substantial need for new statistical models and methods that can be applied in these studies, particularly as study designs become more ambitious and sample sizes increase. The overall goal of the proposed research is to develop statistical methods and software useful in understanding population genomics studies that involve genome-wide genotyping, many simultaneously measured traits, and very large sample sizes. Our focus is on flexible modeling that adapts to systematic variation and robustly models data encountered in these modern studies.
The specific aims involve (1) developing tests of association immune to arbitrary population structure that work for general distributions of traits, many simultaneous traits, or extremely large sample sizes; (2) introducing new models and estimates of kinship and FST in generalized settings, which will lead to improved quantitative genetic modeling of complex traits; (3) introducing new estimation and testing frameworks for population structure that show superior performance to existing approaches; (4) developing and distributing software; and (5) analyzing important data sets to discover new biology and validate our methods and software.

Public Health Relevance

Understanding genome-wide patterns of genetic variation among individuals and how they relate to complex diseases is one of the primary goals of modern medical research. The proposed research will contribute to this goal by tackling a number of open problems so that a coherent statistical framework and set of methodologies emerge. These can then be applied to data sets of genome-wide genetic variation to produce a clearer picture of the genetic basis of human disease.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brooks, Lisa
Princeton University
Organized Research Units
United States
Gopalan, Prem; Hao, Wei; Blei, David M et al. (2016) Scaling probabilistic models of genetic variation to millions of humans. Nat Genet 48:1587-1590
Hao, Wei; Song, Minsun; Storey, John D (2016) Probabilistic models of genetic variation in structured populations applied to global human studies. Bioinformatics 32:713-21
Song, Minsun; Hao, Wei; Storey, John D (2015) Testing for genetic associations in arbitrarily structured populations. Nat Genet 47:550-4
Chung, Neo Christopher; Storey, John D (2015) Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics 31:545-54