Biomedical Data Translator Technical Feasibility Assessment and Architecture Design

Dumontier, Michel; Tatonetti, Nicholas; Weng, Chunhua

Abstract

Our Vision: We propose DeepLink, a versatile data translator that integrates multi-scale, heterogeneous, and multi-source biomedical and clinical data. The primary goal of DeepLink is to enable meaningful bidirectional translation between clinical and molecular science by closing the interoperability gap between models and knowledge at different scales. The translator will enhance clinical science with molecular insights from basic and translational research (e.g., genetic variants, protein interactions, pathway functions, and cellular organization), and enable the molecular sciences by connecting biological discoveries with their pathophysiological consequences (e.g., diseases, signs and symptoms, pharmacological effects, physiological systems). Fundamental differences in the language and semantics used to describe models and knowledge in the clinical and molecular domains result in an interoperability gap. DeepLink will systematically and comprehensively close this gap.

We will begin with the latest technology in semantic knowledge graphs to support an extensible architecture for dynamic data federation and knowledge harmonization. We will design a system for multi-scale model integration that is ontology-based and that combines model execution with prior, curated biomedical knowledge. Our design strategy will be iterative and participatory, anchored by 10 major milestones.

In a series of demonstrations of DeepLink's functions, we will address one of the major challenges facing translational science: reproducibility of biomedical research findings that are based on evolving molecular datasets. Reproducibility of analyses and replication of results are central to scientific advancement. Many landmark studies have used data that are constantly being updated, curated, and pared down over time. Our series of demonstration projects is designed to prototype the technology required for a scalable and robust translator, as well as the techniques we will use to close the interoperability gap for a specific use case. The demonstration project will itself be a significant and novel contribution to science.

DeepLink will be able to answer questions that currently go unanswered. Examples include:

- From clinicians: What is the comparative effectiveness of all the treatments for disease Y given a patient's genetic/metabolic/proteomic profile? What are the functional variants in cell type X that are associated with differential treatment outcomes? What metabolite perturbations in cell type Y are associated with different subtypes of disease X?
- From basic science researchers: What is known about disease Y across all model organisms (even those not designed to model Y)? What are all the clinical phenotypes that result from a change in function in protein X? Which biological pathways are affected by a pathogenic variant of disease Y? What patient data are available to evaluate a molecularly derived clinical hypothesis?

Challenges and Our Approaches: DeepLink will close the interoperability gap that currently prevents molecular discoveries from leading to clinical innovations. DeepLink will be technologically driven, addressing the challenges associated with large, heterogeneous, semantically ambiguous, continuously changing, partially overlapping, and contextually dependent data by using (1) scalable, distributed, and versioned graph stores; (2) semantic technologies such as ontologies and Linked Data; (3) network analysis quality control methods; (4) machine-learning focused data fusion methods; (5) context-aware text mining, entity recognition, and relation extraction; (6) multi-scale knowledge discovery using patient and molecular data; and (7) presentation of actionable knowledge to clinicians and basic scientists via user-friendly interfaces.
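As an illustration only, the idea of a versioned graph store supporting reproducible analyses over evolving data can be sketched in a few lines of Python. This is a toy, in-memory sketch and not the DeepLink design: the class `VersionedGraph`, its methods, and the identifiers such as `variant:V1` or `disease:D1` are all hypothetical names invented for this example. The point it shows is that a query pinned to an earlier version keeps returning the same answer after later curation changes the graph.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedGraph:
    """Toy triple store (hypothetical): every change creates a new version."""
    version: int = 0
    # Each record is [(subject, predicate, object), added_in, removed_in or None].
    _records: list = field(default_factory=list)

    def add(self, s, p, o) -> int:
        """Assert a triple and return the version in which it first appears."""
        self.version += 1
        self._records.append([(s, p, o), self.version, None])
        return self.version

    def retract(self, s, p, o) -> int:
        """Mark a triple as removed from this version on; history is kept."""
        self.version += 1
        for rec in self._records:
            if rec[0] == (s, p, o) and rec[2] is None:
                rec[2] = self.version
        return self.version

    def query(self, s=None, p=None, o=None, at: Optional[int] = None):
        """Return triples matching the pattern as of version `at` (None = latest)."""
        at = self.version if at is None else at
        out = []
        for (ts, tp, to), added, removed in self._records:
            visible = added <= at and (removed is None or removed > at)
            if visible and s in (None, ts) and p in (None, tp) and o in (None, to):
                out.append((ts, tp, to))
        return out


# Made-up example data: a molecular-to-clinical link that is later re-curated.
g = VersionedGraph()
g.add("variant:V1", "affects", "protein:P1")
v2 = g.add("protein:P1", "associated_with", "disease:D1")
g.retract("protein:P1", "associated_with", "disease:D1")  # curation drops the link
g.add("protein:P1", "associated_with", "disease:D2")

pinned = g.query(s="protein:P1", at=v2)  # what a study run at version v2 saw
latest = g.query(s="protein:P1")         # what the graph says today
```

Here `pinned` still contains the original disease association while `latest` reflects the curated one, so an analysis that records the version it used can be re-run against the same snapshot. A production system would use a real distributed graph store and standard identifiers rather than this in-memory list.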

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Advancing Translational Sciences (NCATS)
Project #: 3OT3TR002027-01S2
Application #: 9855180
Study Section: Special Emphasis Panel (ZTR1)
Program Officer: Colvis, Christine

Project Start: 2016-09-25
Project End: 2019-12-31
Budget Start: 2019-01-01
Budget End: 2019-12-31
Support Year: 1
Fiscal Year: 2019
Total Cost
Indirect Cost

Institution

Name: Columbia University (N.Y.)
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 621889815

City: New York
State: NY
Country: United States
Zip Code: 10032

Related projects


NIH 2019 OT3 TR	Biomedical Data Translator Technical Feasibility Assessment and Architecture Design	Dumontier, Michel; Tatonetti, Nicholas P.; Weng, Chunhua / Columbia University (N.Y.)
NIH 2018 OT3 TR	Biomedical Data Translator Technical Feasibility Assessment and Architecture Design	Dumontier, Michel; Tatonetti, Nicholas P.; Weng, Chunhua / Columbia University (N.Y.)
NIH 2016 OT3 TR	Biomedical Data Translator Technical Feasibility Assessment and Architecture Design	Tatonetti, Nicholas P.; Dumontier, Michel; Weng, Chunhua / Columbia University (N.Y.)	$1,183,132

Publications

Polubriaginof, Fernanda C G; Vanguri, Rami; Quinnies, Kayla et al. (2018) Disease Heritability Inferred from Familial Relationships Reported in Medical Records. Cell 173:1692-1704.e11

Tatonetti, Nicholas P (2018) The Next Generation of Drug Safety Science: Coupling Detection, Corroboration, and Validation to Discover Novel Drug Effects and Drug-Drug Interactions. Clin Pharmacol Ther 103:177-179

Ta, Casey N; Dumontier, Michel; Hripcsak, George et al. (2018) Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records. Sci Data 5:180273

Wilkinson, Mark D; Sansone, Susanna-Assunta; Schultes, Erik et al. (2018) A design framework and exemplar metrics for FAIRness. Sci Data 5:180118

Weng, Chunhua; Goldstein, Andrew; Yuan, Chi et al. (2018) The ranking of scientists. J Biomed Inform 79:145-146

Karczewski, Konrad J; Tatonetti, Nicholas P; Manrai, Arjun K et al. (2017) Methods to Ensure the Reproducibility of Biomedical Research. Pac Symp Biocomput 22:117-119

Boland, Mary Regina; Polubriaginof, Fernanda; Tatonetti, Nicholas P (2017) Development of A Machine Learning Algorithm to Classify Drugs Of Unknown Fetal Effect. Sci Rep 7:12839

Boland, Mary Regina; Karczewski, Konrad J; Tatonetti, Nicholas P (2017) Ten Simple Rules to Enable Multi-site Collaborations through Data Sharing. PLoS Comput Biol 13:e1005278
