Genome-wide association (GWA) studies have begun to reveal the genetic architecture of many common, complex diseases. Variants identified in GWA studies are proving useful for predicting disease risk, refining diagnoses, and optimizing treatment regimens. Because a substantial proportion of heritability remains unaccounted for in most complex diseases, improved statistical methodology has great potential to uncover as yet undetected signals in new and existing GWA studies. As the clinical use of whole-genome sequencing and electronic medical records becomes standard, the high-dimensional inference procedures developed for GWA studies will find renewed use in whole-genome and phenome-wide association studies. Motivated by challenges arising in the study of obstructive sleep apnea (OSA), I propose to develop novel statistical methods that address outstanding gaps in methodology for GWA testing of non-normal and missing phenotypes.
My first aim addresses the challenge of conducting GWA analysis of quantitative traits with non-normal residuals. This work is motivated by analysis of the apnea-hypopnea index (AHI), a skewed phenotype used in the diagnosis of OSA. A common recourse when analyzing phenotypes with skewed or heavy-tailed distributions is to apply the rank-based inverse normal transformation (INT). However, it is unclear how best to apply the INT to optimize power, or even whether INT-based association testing is always valid. Preliminary results indicate that two variations on INT-based testing are indeed valid, but that neither is uniformly most powerful.
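For illustration, the rank-based INT mentioned above can be sketched as follows. This is a minimal sketch, not the implementation used in this project: the function name `rank_inverse_normal`, the Blom offset of 3/8, the use of average ranks for ties, and the sample values are all assumptions made for the example.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.375):
    """Rank-based inverse normal transformation (hypothetical helper).

    Each observation is replaced by the standard normal quantile of
    (rank - offset) / (n - 2 * offset + 1). The default offset of 3/8
    is the Blom offset, assumed here for illustration.
    """
    n = len(values)
    if n == 0:
        return []

    # Indices of the observations, sorted by value.
    order = sorted(range(n), key=lambda i: values[i])

    # Assign 1-based ranks, averaging ranks within runs of tied values.
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for j in range(start, end + 1):
            ranks[order[j]] = average_rank
        start = end + 1

    # Map the fractional ranks to standard normal quantiles.
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


# Made-up, right-skewed values standing in for a phenotype such as AHI.
ahi_like = [0.4, 1.2, 2.5, 3.1, 5.0, 8.7, 15.3, 42.0, 97.5]
transformed = rank_inverse_normal(ahi_like)
```

The transformed values depend only on the ranks of the observations, so the result is symmetric about zero regardless of how skewed the input is.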
In aim 1, I combine these approaches into a robust, well-powered, and general-purpose omnibus test. The omnibus test will enable researchers to easily perform GWA analysis of quantitative traits without needing to check assumptions about the residual distribution. I will apply the omnibus test to GWA analysis of AHI in several cohorts, including the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), the Multi-Ethnic Study of Atherosclerosis (MESA), and the Sleep Heart Health Study (SHHS).
My second aim addresses the challenge of conducting GWA analysis using a surrogate of the target phenotype. This work is motivated by the setting where the target phenotype (AHI) is available for only a subset of subjects, but surrogates of the target phenotype are available for all subjects. Performing the analysis using only subjects with complete data leads to loss of power and has the potential to introduce bias. An existing approach to retaining all subjects imputes the target phenotype using the surrogate phenotype. However, this approach neither makes full use of the information contained in the data nor appropriately propagates uncertainty due to imputation of the target phenotype.
In aim 2, I develop a surrogate phenotype association test using the expectation-maximization (EM) algorithm. I will apply the resulting test to leverage the SHHS, where AHI is measured, for performing GWA analysis in the much larger UK Biobank, where only surrogate measurements (sleep duration and excessive daytime sleepiness) are available.

Public Health Relevance

Genome-wide association (GWA) studies published to date have identified thousands of genetic variants associated with hundreds of complex diseases and phenotypic traits. At present, there are gaps in the statistical methodology for GWA analysis of non-normal quantitative phenotypes and of phenotypes that are missing for some subjects. Addressing these gaps will enable the scientific community to better leverage existing and future data for understanding the genetic architecture of common, complex diseases, including obstructive sleep apnea.

Agency
National Institutes of Health (NIH)
Institute
National Heart, Lung, and Blood Institute (NHLBI)
Type
Predoctoral Individual National Research Service Award (F31)
Project #
5F31HL140822-02
Application #
9655229
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Laposky, Aaron D
Project Start
2018-03-01
Project End
2019-05-31
Budget Start
2019-03-01
Budget End
2019-05-31
Support Year
2
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Harvard University
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
149617367
City
Boston
State
MA
Country
United States
Zip Code
02115