Our Vision: We propose DeepLink, a versatile data translator that integrates multi-scale, heterogeneous, and multi-source biomedical and clinical data. The primary goal of DeepLink is to enable meaningful bidirectional translation between clinical and molecular science by closing the interoperability gap between models and knowledge at different scales. The translator will enhance clinical science with molecular insights from basic and translational research (e.g., genetic variants, protein interactions, pathway functions, and cellular organization), and enable the molecular sciences by connecting biological discoveries with their pathophysiological consequences (e.g., diseases, signs and symptoms, pharmacological effects, physiological systems). Fundamental differences in the language and semantics used to describe models and knowledge in the clinical and molecular domains result in an interoperability gap. DeepLink will systematically and comprehensively close this gap. We will begin with the latest technology in semantic knowledge graphs to support an extensible architecture for dynamic data federation and knowledge harmonization. We will design a system for multi-scale model integration that is ontology-based and will combine model execution with prior, curated biomedical knowledge. Our design strategy will be iterative and participatory, anchored by 10 major milestones. In a series of demonstrations of DeepLink's functions, we will address one of the major challenges facing translational science: reproducibility of biomedical research findings that are based on evolving molecular datasets. Reproducibility of analyses and replication of results are central to scientific advancement. Many landmark studies have used data that are constantly being updated, curated, and pared down over time.
Our series of demonstration projects is designed to prototype the technology required for a scalable and robust translator, as well as the techniques we will use to close the interoperability gap for a specific use case. The demonstration project will itself be a significant and novel contribution to science. DeepLink will be able to answer questions that currently go unanswered. Examples include:
- From clinicians: What is the comparative effectiveness of all the treatments for disease Y given a patient's genetic/metabolic/proteomic profile? What are the functional variants in cell type X that are associated with differential treatment outcomes? What metabolite perturbations in cell type Y are associated with different subtypes of disease X?
- From basic science researchers: What is known about disease Y across all model organisms (even those not designed to model Y)? What are all the clinical phenotypes that result from a change in function in protein X? Which biological pathways are affected by a pathogenic variant of disease Y? What patient data are available to evaluate a molecularly derived clinical hypothesis?

Challenges and Our Approaches: DeepLink will close the interoperability gap that currently prevents molecular discoveries from leading to clinical innovations. DeepLink will be technologically driven, addressing the challenges associated with large, heterogeneous, semantically ambiguous, continuously changing, partially overlapping, and contextually dependent data by using (1) scalable, distributed, and versioned graph stores; (2) semantic technologies such as ontologies and Linked Data; (3) network analysis quality control methods; (4) machine-learning-focused data fusion methods; (5) context-aware text mining, entity recognition, and relation extraction; (6) multi-scale knowledge discovery using patient and molecular data; and (7) presentation of actionable knowledge to clinicians and basic scientists via user-friendly interfaces.

National Institutes of Health (NIH)
National Center for Advancing Translational Sciences (NCATS)
Study Section: Special Emphasis Panel (ZTR1)
Program Officer: Colvis, Christine
Columbia University (N.Y.)
Internal Medicine/Medicine
Schools of Medicine
New York
United States
Polubriaginof, Fernanda C G; Vanguri, Rami; Quinnies, Kayla et al. (2018) Disease Heritability Inferred from Familial Relationships Reported in Medical Records. Cell 173:1692-1704.e11
Tatonetti, Nicholas P (2018) The Next Generation of Drug Safety Science: Coupling Detection, Corroboration, and Validation to Discover Novel Drug Effects and Drug-Drug Interactions. Clin Pharmacol Ther 103:177-179
Ta, Casey N; Dumontier, Michel; Hripcsak, George et al. (2018) Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records. Sci Data 5:180273
Wilkinson, Mark D; Sansone, Susanna-Assunta; Schultes, Erik et al. (2018) A design framework and exemplar metrics for FAIRness. Sci Data 5:180118
Weng, Chunhua; Goldstein, Andrew; Yuan, Chi et al. (2018) The ranking of scientists. J Biomed Inform 79:145-146
Karczewski, Konrad J; Tatonetti, Nicholas P; Manrai, Arjun K et al. (2017) Methods to Ensure the Reproducibility of Biomedical Research. Pac Symp Biocomput 22:117-119
Boland, Mary Regina; Polubriaginof, Fernanda; Tatonetti, Nicholas P (2017) Development of A Machine Learning Algorithm to Classify Drugs Of Unknown Fetal Effect. Sci Rep 7:12839
Boland, Mary Regina; Karczewski, Konrad J; Tatonetti, Nicholas P (2017) Ten Simple Rules to Enable Multi-site Collaborations through Data Sharing. PLoS Comput Biol 13:e1005278