Next-generation sequencing (NGS) of DNA provides an unprecedented opportunity to discover rare disease-influencing variants. However, the current practice of first calling the underlying genotypes and then treating the called values as known in rare variant tests is problematic in the presence of genotyping errors and inefficient if poorly genotyped variants are filtered out altogether. The goal of this application is to develop statistical approaches that move beyond the standard genotype-calling paradigm to instead model the sequencing reads directly for testing rare variant associations. We believe our approaches are robust to many practical designs of NGS studies, and substantially more powerful than the use of only variants whose genotypes are accurately called.
Aim 1 proposes methods for testing rare variant associations in case-control studies with external controls, which we believe will be robust to the systematic sequencing differences between cases and controls.
Aim 2 proposes rare variant methods for case-parent trio studies that control the type I error by formulating association through the underlying transmissions of the rare allele.
Aim 3 establishes novel methods for testing rare variant associations on the X chromosome; the flexible nature of our likelihood approach makes it ideal for accounting for copy number differences between sexes, X inactivation, different sex ratios between cases and controls, and sex-specific effects. We will evaluate these methods using datasets from real sequencing studies in which we are actively involved and will implement the methods in user-friendly software for public distribution (Aim 4).

Public Health Relevance

The goal of this project is to develop innovative and high-impact statistical methods for identifying genetic loci that influence complex human diseases. These methods will move us beyond the current analytical paradigm, which is error-prone and inefficient. Application of the proposed methods to real datasets should improve our understanding of the genetic origins of complex diseases.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM116065-01A1
Application #
9053162
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Krasnewich, Donna M
Project Start
2015-09-18
Project End
2020-08-31
Budget Start
2015-09-18
Budget End
2016-08-31
Support Year
1
Fiscal Year
2015
Total Cost
$320,090
Indirect Cost
$113,617
Name
Emory University
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
066469933
City
Atlanta
State
GA
Country
United States
Zip Code
30322
Hu, Yi-Juan; Schmidt, Amand F; Dudbridge, Frank et al. (2017) Impact of Selection Bias on Estimation of Subsequent Event Risk. Circ Cardiovasc Genet 10:
Liao, Peizhou; Satten, Glen A; Hu, Yi-Juan (2017) PhredEM: a phred-score-informed genotype-calling approach for next-generation sequencing studies. Genet Epidemiol 41:375-387
Hu, Yi-Juan; Liao, Peizhou; Johnston, H Richard et al. (2016) Testing Rare-Variant Association without Calling Genotypes Allows for Systematic Differences in Sequencing between Cases and Controls. PLoS Genet 12:e1006040
Sun, Yan V; Hu, Yi-Juan (2016) Integrative Analysis of Multi-omics Data for Discovery and Functional Studies of Complex Human Diseases. Adv Genet 93:147-90