The Human Genome Project and follow-on projects such as 1000 Genomes, GTEx, ENCODE, and TOPMed provide powerful resources to identify genes that influence human health and disease and variability in disease-related quantitative traits (QTs). Along with these resources have come increasingly efficient tools to genotype, sequence, and annotate the genome, and to support computation across these data. These resources and tools will be critical as we continue to explore the genetic basis of human disease and disease-related QTs. In this proposal, we describe statistical and computational problems that arise in human gene mapping, with a particular focus on sequence analysis, genotype imputation, and quality control. We describe statistical methods to address these problems and software tools and web services to facilitate their use. We will test resulting methods, tools, and web services via computer simulation and analysis of data from complex trait genetics studies in which we are involved. Specifically, we will: (1) develop tools to detect and estimate DNA sample contamination that are agnostic to genetic ancestry; (2) develop a test for Hardy-Weinberg equilibrium of sequence-based or imputed genotypes in the presence of population structure and robust to sample contamination; (3) enable more accurate variant filtering and genotype calling from DNA sequence data in the presence of population structure and/or sample contamination; (4) develop methods to detect sample contamination in RNA- and epigenomic sequence data; (5) extend the Michigan Imputation Server (MIS) to increase the power of sequence-based association studies by supporting use of external controls from existing sequence data resources, augmenting an existing imputation reference panel with the investigator's sequenced samples, and checking for contamination; and (6) document, distribute, and support efficient software tools to support these methods.
Under separate funding, we will apply the resulting methods to help understand the genetic basis of type 2 diabetes and related QTs, and of schizophrenia and bipolar disorder. Success in these aims will enable more rapid identification of variants that predispose to human disease and account for variability in disease-related QTs, and has the potential to lead to new insights into basic biology and disease etiology, identify novel therapies, improve targeting of therapies, assist in disease classification, and support more accurate disease risk prediction. The modest cost of statistical and computational methods development, and the impact of these methods across many studies, make our proposed research highly cost-effective.
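
For orientation, the conventional test that aim (2) sets out to improve on can be sketched in a few lines. This is the textbook 1-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium, which assumes hard-called genotypes from a single randomly mating population with no sample contamination. It is not the method proposed here; the function name `hwe_chi_square` and the example counts are illustrative assumptions.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Textbook chi-square test for Hardy-Weinberg equilibrium (illustrative only).

    Takes counts of the three genotypes at a biallelic variant and returns
    (chi-square statistic, p-value). Assumes one unstructured population and
    uncontaminated, hard-called genotypes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip cells with zero expectation (monomorphic variants).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: a strong deficit of heterozygotes.
chi2, p_value = hwe_chi_square(50, 0, 50)
print(f"chi2={chi2:.1f}, p={p_value:.3g}")
```

A heterozygote deficit like the one in the example can arise from population structure alone, and contamination can distort apparent heterozygosity, which is why a test that accounts for both is listed as a separate aim.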

Public Health Relevance

Studies to localize and identify genetic variants that predispose to human diseases and influence the variability of disease-related quantitative traits have the potential to inform breakthrough strategies to develop new drugs, to develop genetic tests to stratify risk, and to enable more targeted approaches to disease prevention and treatment. Efficient statistical and computational methods and software tools are critical for the success of such studies.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG009976-03
Application #
9954132
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Li, Rongling
Project Start
2018-09-07
Project End
2023-06-30
Budget Start
2020-07-01
Budget End
2021-06-30
Support Year
3
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Michigan Ann Arbor
Department
Type
Schools of Public Health
DUNS #
073133571
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109