Biomedical Data Translator Technical Feasibility Assessment and Architecture Design

Clemons, Paul

Abstract

A fundamental challenge in translating insights between biomedical researchers, who study biological mechanisms, and clinicians, who diagnose patient symptoms, is that many links between biological processes and disease pathophysiology are poorly understood. A comprehensive Biomedical Translator must enable chains of inference across objects as diverse as genetic mutations, molecular effects, tissue-specific expression patterns, cellular processes, organ phenotypes, disease states, patient symptoms, and drug responses, a challenge beyond the scope of any one organization. Fortunately, many individual links in this chain have been made by experiments yielding statistical connections between individual data types. High-throughput perturbation screens link chemical and genetic perturbations to cellular phenotypes such as gene-expression patterns, cell survival, or changes in phosphorylation. Genetic association studies link mutations to human disease or to intermediate phenotypes and biomarkers. Electronic medical records (EMR) link diseases or human phenotypes to diagnostic or current procedural terminology (CPT) codes, and clinical trials link drugs and drug candidates to their impact on disease states. In principle, incorporating these links into chains of inference could translate results between the full set of data types within them. In practice, each link is maintained by experts with domain-specific experiments, semantic terminology, and methodological standards. While a key challenge faced by a global Biomedical Translator is to establish consistent standards across these existing data types, a more important goal is to develop a principled and robust framework to (a) model biological systems and the experimental approaches used to investigate them; (b) organize knowledge about biological mechanism and disease; and (c) incorporate diverse datasets that serve as windows into the underlying and unknown state of nature.
We propose to implement a Biomedical Translator as a probabilistic graphical model, a paradigm from artificial intelligence (AI) research. Just as separate research communities form weakly coupled parts of the translation process, graphical models allow global inferences from weakly coupled "nodes". These inferences require each node to publish only probability distributions, enabling interoperability without necessarily having global entity-resolution standards, and benefit from paradigms for quality control, fault tolerance, and relevance assessment common in AI research. We hypothesize that a limited number of APIs, implemented as probability computations by communities around the world, would yield a Biomedical Translator as an emergent property of weakly coupled knowledge sources. From basic properties of graphical models, such a Translator could probabilistically translate among any data types connected within it, allowing for relatively complex query concepts. For example: What cellular processes in which tissues are impacted in a patient-based EMR? What genetic mutations sensitize cells to small-molecule treatment effects? Which small molecules mimic genetic "experiments of nature" that protect against disease? To illustrate the value of these resources and our architectural paradigm, we propose a demonstration project to implement a Biomedical Translator supporting queries between small molecules, biological processes, genes, and disease. The demonstration project will provide a valuable first step to confront key data-integration and organizational challenges and will enable previously impossible queries, such as identifying small molecules that perturb the same biological processes implicated by human genetics in a disease context.
In this capacity, such a Translator could realistically identify existing drugs for known symptoms (i.e., repurposing), but could more broadly serve as an engine for hypothesis generation and biological discovery, suggesting pre-clinical small molecules to develop based on their observed biological activity, or providing novel links between cellular protein function and disease pathophysiology.
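
As a rough illustration of the chain-of-inference idea, the sketch below composes two conditional probability tables, one as a hypothetical "small molecule to biological process" node might publish it and one as a "biological process to disease" node might, into a direct molecule-to-disease distribution by summing over the intermediate variable. It is not the project's implementation: the function name chain, the node and state names, and every probability are invented for illustration, and the sketch assumes the disease state is conditionally independent of the molecule given the process.

```python
# Hypothetical illustration only: all names and numbers are made up.
# Each "node" publishes nothing but a conditional probability table (CPT),
# stored here as {parent_state: {child_state: probability}}.


def chain(p_b_given_a, p_c_given_b):
    """Compose P(B|A) and P(C|B) into P(C|A) by marginalizing over B.

    Assumes C is conditionally independent of A given B.
    """
    result = {}
    for a, dist_b in p_b_given_a.items():
        out = {}
        for b, p_b in dist_b.items():
            for c, p_c in p_c_given_b[b].items():
                out[c] = out.get(c, 0.0) + p_b * p_c
        result[a] = out
    return result


# CPT from a hypothetical perturbation-screen node: P(process | molecule).
p_process_given_molecule = {
    "molecule_X": {"process_up": 0.8, "process_unchanged": 0.2},
    "molecule_Y": {"process_up": 0.1, "process_unchanged": 0.9},
}

# CPT from a hypothetical genetics node: P(disease outcome | process).
p_disease_given_process = {
    "process_up": {"protected": 0.7, "not_protected": 0.3},
    "process_unchanged": {"protected": 0.2, "not_protected": 0.8},
}

# Translated link: P(disease outcome | molecule).
p_disease_given_molecule = chain(p_process_given_molecule, p_disease_given_process)

for molecule, dist in p_disease_given_molecule.items():
    print(molecule, {state: round(p, 3) for state, p in dist.items()})
```

Neither node needs to know the other's internal identifiers beyond the shared process states, which is the sense in which publishing only probability distributions could allow interoperability between weakly coupled sources. A real graphical model would also have to handle cycles, evidence, and uncertainty about the tables themselves.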

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Advancing Translational Sciences (NCATS)
Project #: 3OT3TR002025-01S1
Application #: 9540181
Study Section: Special Emphasis Panel (ZTR1)
Program Officer: Colvis, Christine

Project Start: 2016-09-23
Project End: 2018-06-30
Budget Start: 2017-07-01
Budget End: 2018-06-30
Support Year: 1
Fiscal Year: 2017
Total Cost
Indirect Cost

Institution

Name: Broad Institute, Inc.
Department
Type
DUNS #: 623544785

City: Cambridge
State: MA
Country: United States
Zip Code: 02142

Related projects


NIH 2017 OT3 TR	Biomedical Data Translator Technical Feasibility Assessment and Architecture Design Clemons, Paul Andrew / Broad Institute, Inc.
NIH 2016 OT3 TR	Biomedical Data Translator Technical Feasibility Assessment and Architecture Design Clemons, Paul Andrew / Broad Institute, Inc.	$399,608
