Our Vision: We propose DeepLink, a versatile data translator that integrate multi-scale, heterogeneous, and multi-source biomedical and clinical data. The primary goal of DeepLink is to enable meaningful bidirectional translation between clinical and molecular science by closing the interoperability gap between models and knowledge at different scales. The translator will enhance clinical science with molecular insights from basic and translational research (e.g. genetic variants, protein interactions, pathway functions, and cellular organization), and enable the molecular sciences by connecting biological discoveries with their pathophysiological consequences (e.g. diseases, signs and symptoms, pharmacological effects, physiological systems). Fundamental differences in the language and semantics used to describe the models and knowledge between the clinical and molecular domains results in an interoperability gap. DeepLink will systematically and comprehensively close this gap. We will begin with the latest technology in semantic knowledge graphs to support an extensible architecture for dynamic data federation and knowledge harmonization. We will design a system for multi-scale model integration that is ontology-based and will combine model execution with prior, curated biomedical knowledge. Our design strategy will be iterative and participatory and anchored by 10 major milestones. In a series of demonstrations of DeepLink?s functions, we will address one of the major challenges facing translational science: reproducibility of biomedical research findings that are based on evolving molecular datasets. Reproducibility of analyses and replication of results are central to scientific advancement. Many landmark studies have used data that are constantly being updated, curated, and pared down over time. Our series of demonstrations projects are designed to prototype the technology required for a scalable and robust translator as well as the techniques we will use to close the interoperability gap for a specific use case. The demonstration project will, itself, will be a significant and novel contribution to science. DeepLink will be able to answer questions that are currently enigmatic. Examples include: - From clinicians: What is the comparative effectiveness of all the treatments for disease Y given a patient's genetic/metabolic/proteomic profile? What are the functional variants in cell type X that are associated with differential treatment outcomes? What metabolite perturbations in cell type Y are associated with different subtypes of disease X? - From basic science researchers: What is known about disease Y across all model organisms (even those not designed to model Y)? What are all the clinical phenotypes that result from a change in function in protein X? Which biological pathways are affected by a pathogenic variant of disease Y? What patient data are available to evaluate a molecularlyderived clinical hypothesis? Challenges and Our Approaches: DeepLink will close the interoperability gap that currently prohibits molecular discoveries from leading to clinical innovations. DeepLink will be technologically driven, addressing the challenges associated with large, heterogeneous, semantically ambiguous, continuously changing, partially overlapping, and contextually dependent data by using (1) scalable, distributed, and versioned graph stores; (2) semantic technologies such as ontologies and Linked Data; (3) network analysis quality control methods; (4) machine-learning focused data fusion methods; (5) context-aware text mining, entity recognition and relation extraction; (6) multi-scale knowledge discovery using patient and molecular data; and (7) presentation of actionable knowledge to clinicians and basic scientists via user-friendly interfaces.

Agency
National Institute of Health (NIH)
Institute
National Center for Advancing Translational Sciences (NCATS)
Project #
3OT3TR002027-01S1
Application #
9635840
Study Section
Special Emphasis Panel (ZTR1)
Program Officer
Colvis, Christine
Project Start
2018-03-01
Project End
2018-12-31
Budget Start
2018-03-01
Budget End
2018-12-31
Support Year
1
Fiscal Year
2018
Total Cost
Indirect Cost
Name
Columbia University (N.Y.)
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
621889815
City
New York
State
NY
Country
United States
Zip Code
10032
Boland, Mary Regina; Polubriaginof, Fernanda; Tatonetti, Nicholas P (2017) Development of A Machine Learning Algorithm to Classify Drugs Of Unknown Fetal Effect. Sci Rep 7:12839
Boland, Mary Regina; Karczewski, Konrad J; Tatonetti, Nicholas P (2017) Ten Simple Rules to Enable Multi-site Collaborations through Data Sharing. PLoS Comput Biol 13:e1005278