Complex phenotypes, i.e., observable characteristics, are a central focus of biology and medicine. Growing collections of data that survey how genes and phenotypes vary across individuals present the tantalizing opportunity to systematically understand the genetic architecture of complex phenotypes. Drawing inferences about genetic architecture from these large collections of high-dimensional genomic data will elucidate the sets of rules that predict an organism’s phenotype. Mixed-model methods, which model the joint effects of large numbers of genetic variants, have emerged as an important tool in this endeavor. These methods, however, rely on a number of simplifying modeling assumptions that can lead to biased inferences. Further, applying them to large-scale genetic datasets is computationally impractical. The project will develop novel, scalable methods that can characterize the genetic architecture of complex traits. Applying these methods to available large genetic datasets will yield novel insights into the genes that underlie variation in complex phenotypes, which will in turn help uncover further rules of life. The interdisciplinary aspect of the project will bring together researchers and students from computer science, statistics, bioinformatics, and human genetics, and will lead to cross-fertilization and closer interactions across these communities.

The project will develop new computational mixed-model methods that are flexible and efficient, and will apply these methods to obtain novel insights into genetic architecture. Specifically, the project will develop 1) linear mixed models that provide accurate estimates of heritability across a wide range of genetic architectures, 2) non-linear mixed models that estimate the contribution of gene-gene and gene-environment interactions, and 3) multi-trait mixed models that estimate the genetic component shared across traits. Importantly, the proposed methods are designed to scale to datasets that contain millions of individuals. To demonstrate their utility, the project will apply these methods to large genetic datasets to obtain novel insights into heritability, its distribution across the genome, its correlation with other traits, the contribution of gene-gene and gene-environment interactions, and the impact of natural selection.
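
To make the first aim concrete, the following is a minimal sketch of how heritability can be estimated under a standard linear mixed model. It is not the project's method: it assumes the textbook model in which a standardized phenotype has covariance h2 * K + (1 - h2) * I, where K is a genetic relationship matrix built from standardized genotypes, and it uses Haseman-Elston regression (a method-of-moments estimator) as an illustrative stand-in. The function names are hypothetical, and the pure-Python loops are written for clarity rather than for the scale the abstract targets.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Center a vector to mean 0 and scale it to population variance 1."""
    mu, sd = fmean(values), pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant vector")
    return [(v - mu) / sd for v in values]


def genetic_relationship_matrix(genotypes):
    """Return K = X X^T / M for an N x M genotype matrix (rows are individuals).

    Each variant (column) is standardized first; variants with no variation
    are dropped because they cannot be scaled and carry no information.
    """
    columns = [list(col) for col in zip(*genotypes)]
    scaled = [standardize(col) for col in columns if pstdev(col) > 0]
    if not scaled:
        raise ValueError("no polymorphic variants in the genotype matrix")
    n, m = len(genotypes), len(scaled)
    return [
        [sum(col[i] * col[j] for col in scaled) / m for j in range(n)]
        for i in range(n)
    ]


def he_regression(grm, phenotype):
    """Haseman-Elston estimate of heritability (assumed model, see lead-in).

    Regresses the products z_i * z_j of the standardized phenotype on the
    off-diagonal relationship entries K_ij, without an intercept:
        h2 = sum_{i<j} K_ij z_i z_j / sum_{i<j} K_ij^2
    The estimate is not constrained to lie in [0, 1].
    """
    z = standardize(phenotype)
    n = len(z)
    numerator = denominator = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            numerator += grm[i][j] * z[i] * z[j]
            denominator += grm[i][j] ** 2
    if denominator == 0:
        raise ValueError("off-diagonal relationship entries are all zero")
    return numerator / denominator
```

A call such as `he_regression(genetic_relationship_matrix(genotypes), phenotype)` returns a single heritability estimate. A moment estimator of this kind avoids repeated likelihood evaluations, which is one reason such estimators are often considered for large cohorts, but the quadratic loops above would need to be replaced by more scalable computation for datasets with millions of individuals.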

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Application #: 1943497
Program Officer: Jean Gao
Project Start:
Project End:
Budget Start: 2020-07-01
Budget End: 2025-06-30
Support Year:
Fiscal Year: 2019
Total Cost: $128,875
Indirect Cost:
Name: University of California Los Angeles
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095