Biomedical Data Translator Technical Feasibility Assessment of Reasoning Tool: Oregon State University

Deutsch, Eric; Nandi, Arnab; Ramsey, Stephen

Abstract

For developing the POC software, we used a rapid-prototyping, cloud-based approach built on Python code running in a Docker container in Amazon's Elastic Compute Cloud (EC2). We used Git for distributed source code control, distributed project management, and code deployment. We implemented a blackboard-like software module (Orangeboard) that provides a knowledge-graph object model (including information about source database and edge types for seven different types of relationships) and the ability to load the graph into Neo4j using a high-performance bulk-transfer method (parameterized Cypher) and protocol (Bolt).
We implemented Python classes to provide RESTful querying functionality for 14 different knowledge sources (Monarch/BioLink, DisGeNET, Disease Ontology, GeneProf, miRBase, miRGate, MyGene.info, OMIM, Pathway Commons 2, Pharos, Human Phenotype Ontology, Reactome, Monarch/SciGraph, and UniProt). We implemented client-side HTTP request/response caching as well as non-persistent method-level caching in Python to accelerate knowledge graph expansion. We implemented a BioNetExpander class that can iteratively expand a knowledge graph (in Orangeboard) from one or more seed nodes. This approach is flexible with respect to future types of queries and can accommodate future selective rules for node extension. Using BioNetExpander, we are able to expand a knowledge graph from 21 seed diseases to 20,000 nodes and 800,000 relationships in an hour.
To enable path scoring, we implemented a Python class for obtaining the topological characteristics and metadata of a given path in the Neo4j graph. We then implemented Python-based scripts that use Cypher to query the Neo4j knowledge graph for paths between genetic conditions and the 21 diseases (Q1) and for the 1,000 drug/disease pairs (Q2).
We benchmarked the path-finding performance of this system and found that a typical shortest-paths query with two fixed endpoints takes 50 ms; thus, this approach should have low query-response latency. In order to leverage PubMed abstract co-occurrence information in scoring a path in the knowledge graph, we used high-performance software (from Dr. Liang Huang's lab) for indexing PubMed and are in the process of obtaining Normalized Google Distance (NGD) scores for pairs of genetic conditions and diseases (for Q1) and for pairs of drugs and diseases (for Q2). With the knowledge graph in hand, we are in the process of refining our path-scoring approaches for Q1 and Q2 in preparation for the POC demo.
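For readers unfamiliar with NGD, the sketch below computes it from co-occurrence counts using the standard definition, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). It assumes that f(x) and f(y) are the numbers of abstracts mentioning each term, f(x, y) is the number mentioning both, and N is the total number of indexed abstracts. How the project actually derives these counts from its PubMed index is not stated in the abstract, and the function name `ngd` and the example counts are hypothetical.

```python
import math


def ngd(fx, fy, fxy, n_total):
    """Normalized Google Distance from raw counts.

    fx, fy  : number of documents containing each term (assumed > 0)
    fxy     : number of documents containing both terms
    n_total : total number of documents in the index
    """
    if fxy == 0:
        # The terms never co-occur, so the distance is unbounded.
        return math.inf
    log_fx, log_fy = math.log(fx), math.log(fy)
    numerator = max(log_fx, log_fy) - math.log(fxy)
    denominator = math.log(n_total) - min(log_fx, log_fy)
    return numerator / denominator


# Hypothetical counts, for illustration only.
score = ngd(fx=1000, fy=100, fxy=10, n_total=1_000_000)
print(round(score, 3))
```

Lower values indicate that two terms co-occur more often than their individual frequencies would suggest, so a pair with a small NGD is more closely associated in the literature.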

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Advancing Translational Sciences (NCATS)
Project #: 1OT2TR002520-01
Application #: 9613383
Study Section: Special Emphasis Panel (ZTR1)
Program Officer: Colvis, Christine

Project Start: 2017-12-29
Project End: 2019-12-28
Budget Start: 2017-12-29
Budget End: 2019-12-28
Support Year: 1
Fiscal Year: 2018

Institution

Name: Oregon State University
Type: Schools of Public Health
DUNS #: 053599908

City: Corvallis
State: OR
Country: United States
Zip Code: 97331

Related projects


NIH 2019 OT2 TR	Biomedical Data Translator Technical Feasibility Assessment of Reasoning Tool: Oregon State University Ramsey, Stephen A.; Deutsch, Eric / Oregon State University
NIH 2018 OT2 TR	Biomedical Data Translator Technical Feasibility Assessment of Reasoning Tool: Oregon State University Deutsch, Eric; Nandi, Arnab; Ramsey, Stephen A. / Oregon State University
