data can have high utility for providing insight into the genetic etiology of health and disease. Databases of genotype frequencies, such as the Genome Aggregation Database (gnomAD), are used to prioritize putative causal variants and, more recently, as pseudo-controls in case-control analysis. Genome-wide association study (GWAS) test statistics are used in a variety of secondary data analyses, including polygenic risk scores (PRS), genetic correlation analysis, and fine mapping of causal variants. Compared with individual-level data, genetic summary data often have fewer barriers to access, promoting broad use of these valuable data resources. However, the availability and use of summary genetic data are often not equitable across ancestral groups, especially for understudied ancestral groups that have little to no representation within these resources. Furthermore, heterogeneity within the summary data can lead to confounding and reduced power in case-control analysis, incorrect prioritization of putative causal variants for rare diseases, and reduced accuracy of polygenic risk scores. I develop robust and efficient methods to appropriately use genetic summary data while estimating, modeling, and harnessing the heterogeneity within. My methods coalesce around a unifying framework in which I flip the paradigm of genetic and genomic data analysis, treating the genetic variant or element, rather than the individual, as the observational unit by which the data are analyzed. This simple yet innovative paradigm shift enables the use of classical statistical techniques and the creation of methods that detect, adjust for, and even use heterogeneity within summary-level data. To enable broad and equitable use of these methods, I will create publicly available R packages compatible with Bioconductor and Shiny apps for interactive internet use.

Public Health Relevance

Publicly available genetic summary data (e.g. genotype frequencies in diverse ancestries) can increase the understanding of health and disease. Many methods that use this genetic summary data are biased due to the underlying and unobserved heterogeneity within the data sets, limiting the use of this resource for understudied or admixed ancestries. Here, I develop methods that use summary genetic and genomic data while estimating and modeling the heterogeneity, thus increasing the robustness and effectiveness of publicly available genetic data for genetic research and precision medicine.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Unknown (R35)
Project #
1R35HG011293-01
Application #
10047677
Study Section
Special Emphasis Panel (ZHG1)
Program Officer
Sofia, Heidi J
Project Start
2020-09-01
Project End
2025-06-30
Budget Start
2020-09-01
Budget End
2021-06-30
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Colorado Denver
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
041096314
City
Aurora
State
CO
Country
United States
Zip Code
80045