There are now hundreds of thousands of people enrolled in biobank studies, with genome-wide genotypes and rich phenotype data recorded. These data are an unprecedented resource for learning about human genetic and phenotypic variation. At the same time, population geneticists are developing advanced tools for representing large genetic datasets as ancestral recombination graphs, which concisely encode the genealogical relationships among genomic segments from different people in the dataset. Combining biobank-scale data with these advanced computational tools presents a tremendous opportunity, but we need new statistical methods to realize the possible benefits. My research group will develop and apply such statistical methods to complex biomedical traits, drawing on expertise in population-genetic theory, statistical genetics, and computation. First, we will leverage our new methods and large public datasets to study the evolution of genetic variants associated with biomedically relevant traits, providing important clues about disease etiology. Ancestral recombination graphs can encode all historical information available in a sample of contemporary genomes, so they are a rich basis for evolutionary inference. Second, we will develop methods for enhancing genome-wide association studies (GWAS) aimed at discovering trait-associated genetic variants. Many key GWAS goals that remain challenging today (adjustment for population stratification and assortative mating, fine mapping of causal variants, and others) hinge critically on parsing correlations among both neighboring and distant genetic loci. Ancestral recombination graphs represent such correlations naturally, and so emerging tools present a variety of novel possibilities for clarifying the genetic basis of trait variation, including heritable differences in disease susceptibility. Finally, a third leg of our research program will be aimed at protecting the privacy of participants in large genetic databases.
The assembly of large genetic datasets is critical to biomedical research and also hinges on public trust that privacy can be ensured. We will consider a set of new privacy threats that will arise as genetic research advances, particularly as genotype-phenotype associations are better understood and as applications of genetic genealogy become more prevalent.

Public Health Relevance

There are now hundreds of thousands of people enrolled in biobank studies, presenting unprecedented opportunities for understanding the genetic basis of vulnerability to disease and other complex traits. Novel computational methods from population genetics allow these vast genetic datasets to be expressed in terms of genealogical relationships among segments of DNA scattered across people, a representation that naturally encodes all historical information available in a set of contemporary genomes, as well as all correlations among genetic variants. We will use these new resources to understand the evolution of complex biomedical traits and better identify genetic variants associated with disease.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Unknown (R35)
Project #
1R35GM137758-01
Application #
10023459
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Krasnewich, Donna M
Project Start
2020-09-01
Project End
2025-07-31
Budget Start
2020-09-01
Budget End
2021-07-31
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Southern California
Department
Biology
Type
Schools of Arts and Sciences
DUNS #
072933393
City
Los Angeles
State
CA
Country
United States
Zip Code
90089