A fundamental challenge in translating insights between biomedical researchers, who study biological mechanisms, and clinicians, who diagnose patient symptoms, is that many links between biological processes and disease pathophysiology are poorly understood. A comprehensive Biomedical Translator must support chains of inference across objects as diverse as genetic mutations, molecular effects, tissue-specific expression patterns, cellular processes, organ phenotypes, disease states, patient symptoms, and drug responses, a challenge beyond the scope of any single organization. Fortunately, many individual links in this chain have already been established by experiments that yield statistical connections between pairs of data types. High-throughput perturbation screens link chemical and genetic perturbations to cellular phenotypes such as gene-expression patterns, cell survival, or changes in phosphorylation. Genetic association studies link mutations to human diseases or to intermediate phenotypes and biomarkers. Electronic medical records (EMRs) link diseases or human phenotypes to diagnostic codes or Current Procedural Terminology (CPT) codes, and clinical trials link drugs and drug candidates to their impact on disease states. In principle, chaining these links together could translate results among all of the data types they connect. In practice, each link is maintained by experts with their own domain-specific experiments, semantic terminology, and methodological standards. Although a key challenge for a global Biomedical Translator is to establish consistent standards across these existing data types, a more important goal is to develop a principled and robust framework to (a) model biological systems and the experimental approaches used to investigate them; (b) organize knowledge about biological mechanisms and disease; and (c) incorporate diverse datasets that serve as windows into the underlying, unknown state of nature.
We propose to implement a Biomedical Translator as a probabilistic graphical model, a paradigm from artificial intelligence (AI) research. Just as separate research communities form weakly coupled parts of the translation process, graphical models allow global inferences to be drawn from weakly coupled "nodes". These inferences require each node to publish only probability distributions, which enables interoperability without necessarily requiring global entity-resolution standards, and they can draw on paradigms for quality control, fault tolerance, and relevance assessment that are common in AI research. We hypothesize that a limited number of APIs, implemented as probability computations by communities around the world, would yield a Biomedical Translator as an emergent property of weakly coupled knowledge sources. By the basic properties of graphical models, such a Translator could translate probabilistically among any data types connected within it, supporting relatively complex queries. For example: Which cellular processes, in which tissues, are affected in a patient according to their EMR? Which genetic mutations sensitize cells to the effects of small-molecule treatment? Which small molecules mimic genetic "experiments of nature" that protect against disease? To illustrate the value of these resources and of our architectural paradigm, we propose a demonstration project implementing a Biomedical Translator that supports queries among small molecules, biological processes, genes, and disease. The demonstration project will be a valuable first step in confronting key data-integration and organizational challenges, and it will enable previously impossible queries, such as identifying small molecules that perturb the same biological processes that human genetics implicates in a disease context.
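To make the idea of nodes that "publish only probability distributions" concrete, the sketch below shows how two independently maintained nodes could be chained by marginalizing over a shared intermediate variable, e.g. small molecule → biological process → disease. It is a minimal illustration under stated assumptions, not the project's API: the names `KnowledgeNode`, `chain`, `compound_X`, and `disease_A` are hypothetical, the probabilities are invented toy values, and the composition assumes the chain is Markov (disease is conditionally independent of the molecule given the process), so that P(c | a) = Σ_b P(c | b) · P(b | a).

```python
from dataclasses import dataclass
from typing import Dict, Mapping

# Hypothetical representation of a conditional distribution P(target | source):
# {source_value: {target_value: probability}}. All values below are toy data.
CPT = Mapping[str, Mapping[str, float]]


@dataclass
class KnowledgeNode:
    """Hypothetical node that exposes only a conditional distribution.

    The node never shares its raw data, only P(target | source) over its
    own vocabulary; this stands in for one community-maintained API.
    """
    name: str
    table: CPT

    def conditional(self, source_value: str) -> Dict[str, float]:
        dist = dict(self.table[source_value])
        total = sum(dist.values())
        # Basic quality control: a published distribution must be normalized.
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"{self.name}: P(.|{source_value}) sums to {total}")
        return dist


def chain(first: KnowledgeNode, second: KnowledgeNode, source_value: str) -> Dict[str, float]:
    """Compose two nodes by summing over the shared intermediate variable.

    Assumes A -> B -> C is Markov: P(c | a) = sum_b P(c | b) * P(b | a).
    Also assumes both nodes use the same vocabulary for B (no entity resolution).
    """
    result: Dict[str, float] = {}
    for intermediate, p_b in first.conditional(source_value).items():
        for target, p_c in second.conditional(intermediate).items():
            result[target] = result.get(target, 0.0) + p_b * p_c
    return result


# Toy node standing in for a perturbation screen: molecule -> process.
molecule_to_process = KnowledgeNode("perturbation screen", {
    "compound_X": {"autophagy": 0.7, "lipid_metabolism": 0.3},
})

# Toy node standing in for human genetics: process -> disease.
process_to_disease = KnowledgeNode("human genetics", {
    "autophagy": {"disease_A": 0.6, "no_disease_A": 0.4},
    "lipid_metabolism": {"disease_A": 0.1, "no_disease_A": 0.9},
})

if __name__ == "__main__":
    posterior = chain(molecule_to_process, process_to_disease, "compound_X")
    print({k: round(v, 2) for k, v in posterior.items()})
    # prints {'disease_A': 0.45, 'no_disease_A': 0.55}
```

Because each node only answers "given this input, what is the distribution over my outputs?", a query spanning many data types reduces to repeated applications of `chain`; a real system would additionally need to handle mismatched vocabularies between nodes and uncertainty in the published distributions themselves.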
In this capacity, such a Translator could realistically identify existing drugs for known symptoms (i.e., drug repurposing), but it could more broadly serve as an engine for hypothesis generation and biological discovery, suggesting which preclinical small molecules to develop based on their observed biological activity, or proposing previously unrecognized links between cellular protein function and disease pathophysiology.

Agency: National Institutes of Health (NIH)
Institute: National Center for Advancing Translational Sciences (NCATS)
Project #: 3OT3TR002025-01S1
Application #: 9540181
Study Section: Special Emphasis Panel (ZTR1)
Program Officer: Colvis, Christine
Project Start: 2016-09-23
Project End: 2018-06-30
Budget Start: 2017-07-01
Budget End: 2018-06-30
Support Year: 1
Fiscal Year: 2017
Organization: Broad Institute, Inc.
DUNS #: 623544785
City: Cambridge
State: MA
Country: United States
Zip Code: 02142