Next-generation sequencing (NGS) of DNA provides an unprecedented opportunity to discover rare disease-influencing variants. However, the current practice of first calling the underlying genotypes and then treating the called values as known in rare variant tests is problematic in the presence of genotyping errors and inefficient if poorly genotyped variants are filtered out altogether. The goal of this application is to develop statistical approaches that move beyond the standard genotype-calling paradigm and instead model the sequencing reads directly when testing rare variant associations. We believe our approaches are robust to many practical designs of NGS studies and substantially more powerful than analyses restricted to variants whose genotypes are accurately called.
Aim 1 proposes methods for testing rare variant associations in case-control studies with external controls, which we believe will be robust to the systematic sequencing differences between cases and controls.
Aim 2 proposes rare variant methods for case-parent trio studies that control the type I error by formulating association through the underlying transmissions of the rare allele.
Aim 3 establishes novel methods for testing rare variant associations on the X chromosome; the flexible nature of our likelihood approach makes it well suited to accounting for copy-number differences between sexes, X inactivation, different sex ratios between cases and controls, and sex-specific effects. We will evaluate these methods using datasets from real sequencing studies in which we are actively involved and will implement the methods in user-friendly software for public distribution (Aim 4).

Public Health Relevance

The goal of this project is to develop innovative and high-impact statistical methods for identifying genetic loci that influence complex human diseases. These methods will move us beyond the current analytical paradigm, which is error-prone and inefficient. Application of the proposed methods to real sequencing datasets should improve our understanding of the genetic origins of complex diseases.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Krasnewich, Donna M
Emory University
Biostatistics & Other Math Sci
Schools of Public Health
United States
Hu, Yi-Juan; Schmidt, Amand F; Dudbridge, Frank et al. (2017) Impact of Selection Bias on Estimation of Subsequent Event Risk. Circ Cardiovasc Genet 10:
Liao, Peizhou; Satten, Glen A; Hu, Yi-Juan (2017) PhredEM: a phred-score-informed genotype-calling approach for next-generation sequencing studies. Genet Epidemiol 41:375-387
Hu, Yi-Juan; Liao, Peizhou; Johnston, H Richard et al. (2016) Testing Rare-Variant Association without Calling Genotypes Allows for Systematic Differences in Sequencing between Cases and Controls. PLoS Genet 12:e1006040
Sun, Yan V; Hu, Yi-Juan (2016) Integrative Analysis of Multi-omics Data for Discovery and Functional Studies of Complex Human Diseases. Adv Genet 93:147-90