The increasing accessibility and availability of genetic and genomic data have resulted in an influx of translational research aimed at understanding complex diseases. Multiple types of genetic and genomic data have been demonstrated to contribute to the underlying biological mechanisms of human disease. Genetic variants, gene expression, methylation and other genomic features often act in concert to effect change in cellular functions and downstream diseases. Genetic association methods traditionally have been used to analyze the relationship between variants and outcomes. This standard approach has several noted limitations: it does not elucidate the complex relationship between variants and diseases, and it does not use potentially valuable information from other genomic data types. This proposal aims to address these issues by developing integrative analysis methods for genetic and genomic data using (1) causal mediation analysis and (2) network analysis, with the goal of understanding the biological mechanisms underlying phenotypes of complex diseases.
Aim 1 proposes to use causal mediation analysis of genetic and genomic data in the setting of a common dichotomous outcome. Causal mediation analysis provides an attractive framework for identifying the biological pathways that drive diseases. It proceeds by jointly analyzing multiple types of genetic and genomic data, where the effect of genetic variation is hypothesized to be mediated through genomic features. The method decomposes the overall effect of a variant on a disease outcome into the effect through the mediator, e.g. gene expression, and the effect through other biological mechanisms. Traditional mediation analysis for binary outcomes assumes that the outcome is rare, an assumption that can be violated for complex diseases, such as asthma, that are common in certain populations. In this aim, estimators are constructed for common binary outcomes without imposing additional distributional assumptions.
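The decomposition used in causal mediation analysis can be written out in one common counterfactual notation. The symbols here are assumptions made for illustration and do not come from the proposal: A is a binary indicator for the variant, M is the mediator (e.g. gene expression), and Y(a, M(a')) is the outcome that would be observed if the variant were set to a and the mediator took the value it would have under a'. On the risk-difference scale, the total effect (TE) splits into a natural direct effect (NDE) and a natural indirect effect (NIE):

```latex
\begin{align*}
\mathrm{TE}  &= E[Y(1, M(1))] - E[Y(0, M(0))] \\
\mathrm{NDE} &= E[Y(1, M(0))] - E[Y(0, M(0))] \\
\mathrm{NIE} &= E[Y(1, M(1))] - E[Y(1, M(0))] \\
\mathrm{TE}  &= \mathrm{NDE} + \mathrm{NIE}
\end{align*}
```

For a binary outcome the expectations are probabilities, and the same decomposition is often expressed multiplicatively on the odds-ratio scale. This is a sketch of the general idea only, not the estimators the proposal constructs.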
In Aim 2, statistical inference for network analysis of genetic and genomic data is developed. Network analyses have also emerged as an integrative approach to characterize complex genomic associations. Features of a network of genetic and genomic variants can inform biological function. Current network methods treat edges as fixed and known, when in fact these relationships are estimated in the initial analyses. Because the estimated edges carry uncertainty and error, it is important to quantify that error to ensure the reliability and reproducibility of the results. Here, measures of error for network metrics are developed. The methods described in these aims will be applied to studies of asthma and chronic obstructive pulmonary disease. The success of this work will improve the ability to reproducibly detect relationships between various biomedical features using mediation and network analysis under many settings.
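One generic way to attach a measure of error to a network metric is to resample the subjects and re-estimate the network each time. The sketch below is an assumption for illustration, not the method developed in Aim 2: it uses a nonparametric bootstrap, a correlation network thresholded at a hypothetical cutoff of 0.3, node degree as the metric, and purely synthetic data.

```python
import numpy as np

def node_degree(data, threshold=0.3):
    """Degree of each feature in a network with an edge where |correlation| > threshold.

    The threshold value is a hypothetical choice for this illustration.
    """
    corr = np.corrcoef(data, rowvar=False)
    adjacency = np.abs(corr) > threshold
    np.fill_diagonal(adjacency, False)  # ignore self-correlations
    return adjacency.sum(axis=0)

def bootstrap_se(data, metric, n_boot=200, seed=0):
    """Bootstrap standard error of a per-node metric, resampling rows (subjects)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    reps = np.empty((n_boot, data.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample subjects with replacement
        reps[b] = metric(data[idx])       # re-estimate the network and its metric
    return reps.std(axis=0, ddof=1)

# Synthetic data only: 100 subjects, 5 features, the first three sharing a common signal.
rng = np.random.default_rng(1)
n_samples, n_features = 100, 5
shared = rng.normal(size=(n_samples, 1))
data = rng.normal(size=(n_samples, n_features))
data[:, :3] += shared

degree = node_degree(data)
degree_se = bootstrap_se(data, node_degree)
print("degree:", degree)
print("bootstrap SE:", degree_se)
```

A node whose degree changes across resamples gets a nonzero standard error, which is the kind of uncertainty that is hidden when edges are treated as fixed and known.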

Public Health Relevance

Integrative genomic analyses provide a method for jointly analyzing the many genomic variants that contribute to disease. Existing statistical methods work only in limited settings and do not account for error propagated through intermediate steps of the analysis. Methods that allow for both the expanded use of these integrative approaches and the quantification of error are essential to gaining more complete, reproducible insight into the complex genomic basis of diseases like asthma and chronic obstructive pulmonary disease.

Agency: National Institutes of Health (NIH)
Institute: National Heart, Lung, and Blood Institute (NHLBI)
Type: Predoctoral Individual National Research Service Award (F31)
Project #: 1F31HL138832-01
Application #: 9393470
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Tigno, Xenia
Project Start: 2017-09-01
Project End: 2018-08-31
Budget Start: 2017-09-01
Budget End: 2018-08-31
Support Year: 1
Fiscal Year: 2017
Total Cost:
Indirect Cost:

Name: Harvard University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 149617367
City: Boston
State: MA
Country: United States
Zip Code: 02115