Research Project: Chronic lung diseases account for over 100,000 deaths a year in the United States. Their pathogenetic underpinnings are known to be highly heterogeneous, which underscores the importance of identifying disease subtypes. In recent years, integrative analysis of multi-omics data, including transcriptomic gene expression data and exome- or genome-wide DNA sequence variant data, has identified molecular subtypes of many diseases that can predict a patient's response to cytotoxic and biologic treatments. This promises a future of biomarker-driven personalized treatments that improve outcomes, reduce toxicity, and reduce cost. In contrast, similar analyses have not yet been realized for chronic lung diseases. This is in part due to the complexity of the genetic and transcriptomic perturbations that contribute to these diseases. In addition, not all disease-relevant exome or genome sequence variants are expressed in a chronically diseased organ, which makes much of the exome or whole-genome sequencing data irrelevant to any specific disease. Our team has been analyzing large-scale transcriptomic data to identify disease heterogeneity. Our previous studies in asthma suggested that integrative analysis of longitudinal transcriptional data and genetic sequence variant data from the same subjects will significantly enhance discovery in chronic inflammatory diseases, and in lung diseases specifically. In addition, pre-defined biological pathway information can significantly reduce the data dimension and enrich for signals of molecular endotypes of lung diseases.
Taken together, we hypothesize that integrative analysis of longitudinal gene expression and genetic sequence variation in lung-derived RNAs, combined with prior pathway information, will identify disease heterogeneity that is more strongly associated with important clinical features of disease than heterogeneity identified by integrating gene expression and prior pathway information alone. To examine this hypothesis, we propose to 1) develop disease-specific methods to identify sequence variants from longitudinal RNA sequencing data derived from lung tissue; and 2) develop novel statistical models that integrate the genetic sequence variants, the longitudinal transcriptional signatures from the same dataset, and the biological pathways to identify endotypes of chronic lung diseases, including asthma and sarcoidosis.

Environment and Collaborators: I will work on the proposed research with Dr. Hongyu Zhao and collaborate with Drs. Geoffrey L. Chupp and Naftali Kaminski, a team of experienced, committed experts in statistical genomics and genetics, pulmonary medicine, and translational research. This team has demonstrated collaborative success, and each member brings unique expertise. The data sets for our main study populations will be generated mainly in Dr. Chupp's and Dr. Kaminski's labs in the PCCSM section. Validation of the discoveries and downstream functional studies will also be conducted in the Chupp and Kaminski labs.

Public Health Relevance

Chronic lung diseases are known to have complex origins, which cause differences, or heterogeneity, in the clinical manifestations of disease and in treatment responses. My research effort will focus on developing novel statistical and computational methods to identify this heterogeneity from longitudinal RNA sequencing data through a deep, integrative analysis strategy. The ultimate goal is to identify fundamentally different disease subtypes, determine the molecular markers that distinguish patients in each subtype, and discover potential therapeutic targets for personalized treatment.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21LM012884-02
Application #
9731648
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2018-07-01
Project End
2020-06-30
Budget Start
2019-07-01
Budget End
2020-06-30
Support Year
2
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Yale University
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
043207562
City
New Haven
State
CT
Country
United States
Zip Code
06520