New cost-effective, high-throughput genomic and imaging technologies have revolutionized clinical diagnosis and research, but they have also created significant new challenges. The resulting datasets are large and frequently multi-modal, i.e., the measured variables are of different types: continuous (omics, fMRI measurements), binary (SNPs, gender), numerical (age, drug dosage), categorical (family history of disease, tissue of metastasis), and ordinal (tumor stage, smoking). A key aspect of the analysis is discovering the direct (causal) associations between variables. This is important for many reasons: such associations can be used for classification, biomarker selection, assessment of drug effects, and mechanistic studies of network perturbations in disease. Graphical models have been used in the past, but they are not tuned for (a) multi-modal data or (b) large datasets. The objective of this application is to develop novel methodologies that identify causal or partially causal networks, which can be used to support and enhance accurate disease prediction and sub-disease classification and to help identify key interactions in the molecular mechanisms of disease. We will develop and test new methodologies based on mixed variable partially causal graphical (MVPCG) models. Evaluation will be done on synthetic and real datasets, including parallel datasets with genomic, genetic and epigenetic data, clinical information and time series diagnostic image data. Our central hypothesis is that an integrative, computational analysis of different modalities of diagnostic patient data can identify complex associations and causal relations between clinical and other disease-relevant features and thus help decipher the molecular disease mechanisms.
The deliverables will be (1) new graphical approaches for integration and co-analysis of multi-modal biomedical and clinical data; (2) a new, fully documented software package for MATLAB and R that can be seamlessly incorporated into other algorithms; (3) a new, fully supported graphical user interface (GUI) to further disseminate our methodologies to users who are not computer-savvy; (4) results on the pathogenesis and predictive features of metastatic melanoma patients; and (5) results on predictive features of autistic spectrum subjects and neurotypicals. If successful, this cross-disciplinary team project will have a positive impact beyond the above deliverables: the generality of our approaches makes them suitable for studying any disease and easy to integrate into future personalized medicine strategies, when massive high-throughput data collection becomes a routine diagnostic procedure in all hospitals.

Public Health Relevance

New data collection methods have the potential to revolutionize medicine by generating multiple, informationally complementary data streams from patients. A current roadblock is the lack of efficient methods for analyzing such multi-modal datasets. We propose to use partially causal graphical models to represent the data and identify the direct associations between variables. Efficient co-analysis of multiple patient data types is expected to improve current disease diagnosis and prognosis procedures, identify sub-disease types, and help study the molecular mechanisms of disease. This project will be performed by a multi-disciplinary team of investigators at the University of Pittsburgh and Carnegie Mellon University, including experts in causal modeling, regulatory genomics and clinical care.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM012087-03
Application #: 9281893
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2015-06-01
Project End: 2019-05-31
Budget Start: 2017-06-01
Budget End: 2018-05-31
Support Year: 3
Fiscal Year: 2017
Total Cost: $324,992
Indirect Cost: $68,162
Name: University of Pittsburgh
Department: Biology
Type: Schools of Medicine
DUNS #: 004514360
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213

Raghu, Vineet K; Ramsey, Joseph D; Morris, Alison et al. (2018) Comparison of strategies for scalable causal discovery of latent variable models from mixed data. Int J Data Sci Anal 6:33-45
Huang, Biwei; Zhang, Kun; Lin, Yizhu et al. (2018) Generalized Score Functions for Causal Discovery. KDD 2018:1551-1560
Zhang, Kun; Schölkopf, Bernhard; Spirtes, Peter et al. (2018) Learning causality and causality-related learning: some recent progress. Natl Sci Rev 5:26-29
Kitsios, Georgios D; Fitch, Adam; Manatakis, Dimitris V et al. (2018) Respiratory Microbiome Profiling for Etiologic Diagnosis of Pneumonia in Mechanically Ventilated Patients. Front Microbiol 9:1413
Manatakis, Dimitris V; Raghu, Vineet K; Benos, Panayiotis V (2018) piMGM: incorporating multi-source priors in mixed graphical models for learning disease networks. Bioinformatics 34:i848-i856
Ping, Peipei; Hermjakob, Henning; Polson, Jennifer S et al. (2018) Biomedical Informatics on the Cloud: A Treasure Hunt for Advancing Cardiovascular Medicine. Circ Res 122:1290-1301
Raghu, Vineet K; Beckwitt, Colin H; Warita, Katsuhiko et al. (2018) Biomarker identification for statin sensitivity of cancer cell lines. Biochem Biophys Res Commun 495:659-665
Andrews, Bryan; Ramsey, Joseph; Cooper, Gregory F (2018) Scoring Bayesian Networks of Mixed Variables. Int J Data Sci Anal 6:3-18
Raghu, Vineet K; Ge, Xiaoyu; Chrysanthis, Panos K et al. (2017) Integrated Theory- and Data-driven Feature Selection in Gene Expression Data Analysis. Proc Int Conf Data Eng 2017:1525-1532
Pociask, Derek A; Robinson, Keven M; Chen, Kong et al. (2017) Epigenetic and Transcriptomic Regulation of Lung Repair during Recovery from Influenza Infection. Am J Pathol 187:851-863

Showing the most recent 10 out of 13 publications