Text Mining for High-fidelity Curation and Discovery of Gene-drug-phenotype Relationships

Altman, Russ

Abstract

The rate at which new drugs are being introduced to market is decreasing, with grave implications for human health. Knowledge about the molecular mechanisms relevant to drug response is critical, but is collected in myriad individual experiments. As a result, the published literature contains rich information about how drugs and genes/gene products interact to produce phenotypes at the molecular, cellular and organismal levels, but this textual data requires substantial additional processing. There are therefore efforts to manually curate the literature and extract relationships between three key entities (genes/gene products, drugs and phenotypes), with the goal of representing the information in structured, computable formats. Although automated text mining may ultimately replace expert human curators, its best current role is to triage the literature and bring potentially important information to the attention of human curators. Recent advances in computational natural language processing (NLP) generally, and within our laboratory specifically, offer an opportunity to extract relationships between key entities with high accuracy. In particular, we have prototyped methods that take a relatively small set of examples of a relationship of interest (e.g. examples of gene-drug pairs in which the gene product metabolizes the drug) and then find other pairs that share a similar relationship. These methods can be applied to any relationship between our three key entity types. Thus, we propose an ambitious plan to (1) gather large corpora of biomedical text and extend existing lexicons for these entities, (2) build a database of all sentences/paragraphs relating these entities to one another, (3) create methods for accurately extracting semantically precise relationships from all pairs of entity types, and (4) validate these extracted relationships using both available gold standard experimental data sources and expert curator evaluation.
In addition to directly supporting curation, our methods and extractions will be made available as general purpose resources for understanding drug action.
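The seed-based idea described in the abstract (start from a few example gene-drug pairs and find other pairs that are described in similar terms) can be illustrated with a toy sketch. This is not the laboratory's actual method: the bag-of-words profiles, the cosine similarity measure, the stopword list, and the names `MENTIONS`, `SEEDS`, `pair_profiles`, and `rank_candidates` are all assumptions made for illustration, and the connecting phrases are invented rather than taken from any corpus.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: (gene, drug, text connecting the two mentions).
# The phrases are invented for illustration, not extracted from a real corpus.
MENTIONS = [
    ("CYP2D6", "codeine", "metabolizes"),
    ("CYP2D6", "codeine", "catalyzes the O-demethylation of"),
    ("CYP3A4", "midazolam", "metabolizes"),
    ("CYP3A4", "midazolam", "is responsible for the hydroxylation of"),
    ("CYP2C9", "warfarin", "metabolizes"),
    ("CYP2C9", "warfarin", "catalyzes the hydroxylation of"),
    ("VKORC1", "warfarin", "is inhibited by"),
    ("VKORC1", "warfarin", "is the pharmacological target of"),
]

# Seed examples of the relationship of interest ("gene product metabolizes drug").
SEEDS = {("CYP2D6", "codeine"), ("CYP3A4", "midazolam")}

# Assumed minimal stopword list so that function words do not drive similarity.
STOPWORDS = {"the", "of", "is", "for", "by"}


def pair_profiles(mentions):
    """Aggregate a bag-of-words profile for each (gene, drug) pair."""
    profiles = {}
    for gene, drug, context in mentions:
        tokens = [t for t in context.lower().split() if t not in STOPWORDS]
        profiles.setdefault((gene, drug), Counter()).update(tokens)
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(mentions, seeds):
    """Rank non-seed pairs by similarity to the summed seed profile."""
    profiles = pair_profiles(mentions)
    centroid = Counter()
    for pair in seeds:
        centroid.update(profiles.get(pair, Counter()))
    scored = [
        (cosine(profile, centroid), pair)
        for pair, profile in profiles.items()
        if pair not in seeds
    ]
    return sorted(scored, reverse=True)


ranked = rank_candidates(MENTIONS, SEEDS)
for score, (gene, drug) in ranked:
    print(f"{gene}-{drug}: {score:.3f}")
```

On this toy input the CYP2C9-warfarin pair ranks above VKORC1-warfarin, because its connecting phrases share vocabulary with the seed pairs. A realistic system would need entity recognition, richer features such as syntactic paths between the entity mentions, and validation against curated data, as the proposal outlines.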

Public Health Relevance

Published biological text contains an abundance of valuable information about how drugs work at the molecular, cellular and organism levels and how they affect patients. However, it is difficult to gather all relevant information to support discovery of new drugs and optimal use of existing drugs. This proposal outlines a plan to develop methods for extracting drug-related information from huge collections of text, and using it to improve drug discovery and use.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 2R01LM005652-19A1
Application #: 8963236
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane

Project Start: 1994-07-01
Project End: 2019-06-30
Budget Start: 2015-09-01
Budget End: 2016-06-30
Support Year: 19
Fiscal Year: 2015

Institution

Name: Stanford University
Department: Biomedical Engineering
Type: Schools of Medicine
DUNS #: 009214214

City: Stanford
State: CA
Country: United States
Zip Code: 94304

Publications

Zhou, Weizhuang; Altman, Russ B (2018) Data-driven human transcriptomic modules determined by independent component analysis. BMC Bioinformatics 19:327

Lo, Yu-Chen; Rensi, Stefano E; Torng, Wen et al. (2018) Machine learning in chemoinformatics and drug discovery. Drug Discov Today 23:1538-1546

Previde, Paul; Thomas, Brook; Wong, Mike et al. (2018) GeneDive: A gene interaction search and visualization tool to facilitate precision medicine. Pac Symp Biocomput 23:590-601

Liu, Tianyun; Ish-Shalom, Shirbi; Torng, Wen et al. (2018) Biological and functional relevance of CASP predictions. Proteins 86 Suppl 1:374-386

Percha, Bethany; Altman, Russ B (2018) A global network of biomedical relationships derived from text. Bioinformatics 34:2614-2624

Lavertu, Adam; McInnes, Greg; Daneshjou, Roxana et al. (2018) Pharmacogenomics and big genomic data: from lab to clinic and back again. Hum Mol Genet 27:R72-R78

Petkovic, Dragutin; Altman, Russ; Wong, Mike et al. (2018) Improving the explainability of Random Forest classifier - user centered approach. Pac Symp Biocomput 23:204-215

Mallory, Emily K; Acharya, Ambika; Rensi, Stefano E et al. (2018) Chemical reaction vector embeddings: towards predicting drug metabolism in the human gut microbiome. Pac Symp Biocomput 23:56-67

Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne et al. (2017) Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans. Genome Med 9:98

Torng, Wen; Altman, Russ B (2017) 3D deep convolutional neural networks for amino acid environment similarity analysis. BMC Bioinformatics 18:302

Showing the most recent 10 out of 64 publications
