The rate at which new drugs are being introduced to market is decreasing, with grave implications for human health. Knowledge about the molecular mechanisms relevant to drug response is critical, but it is collected in myriad individual experiments. As a result, the published literature contains rich information about how drugs and genes/gene products interact to produce phenotypes at the molecular, cellular and organismal levels, but this textual data requires substantial additional processing. Consequently, there are efforts to manually curate the literature and extract relationships between three key entities (genes/gene products, drugs and phenotypes), with the goal of representing the information in structured, computable formats. Although automated text mining may ultimately replace expert human curators, its best current role is to triage the literature and bring potentially important information to the attention of human curators. Recent advances in computational natural language processing (NLP) generally, and within our laboratory specifically, offer an opportunity to extract relationships between key entities with high accuracy. In particular, we have prototyped methods that take a relatively small set of examples of a relationship of interest (e.g. examples of gene-drug pairs in which the gene product metabolizes the drug) and then find other pairs that share a similar relationship. These methods can be applied to any relationship between our three key entity types. Thus, we propose an ambitious plan to (1) gather large corpora of biomedical text and extend existing lexicons for these entities, (2) build a database of all sentences/paragraphs relating these entities to one another, (3) create methods for accurately extracting semantically precise relationships from all pairs of entity types, and (4) validate these extracted relationships using both available gold standard data from experimental sources and expert curator evaluation.
In addition to directly supporting curation, our methods and extractions will be made available as general purpose resources for understanding drug action.

Public Health Relevance

Published biological text contains an abundance of valuable information about how drugs work at the molecular, cellular and organism levels and how they affect patients. However, it is difficult to gather all relevant information to support discovery of new drugs and optimal use of existing drugs. This proposal outlines a plan to develop methods for extracting drug-related information from huge collections of text, and using it to improve drug discovery and use.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
2R01LM005652-19A1
Application #
8963236
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
1994-07-01
Project End
2019-06-30
Budget Start
2015-09-01
Budget End
2016-06-30
Support Year
19
Fiscal Year
2015
Total Cost
Indirect Cost
Name
Stanford University
Department
Biomedical Engineering
Type
Schools of Medicine
DUNS #
009214214
City
Stanford
State
CA
Country
United States
Zip Code
94304
Zhou, Weizhuang; Altman, Russ B (2018) Data-driven human transcriptomic modules determined by independent component analysis. BMC Bioinformatics 19:327
Lo, Yu-Chen; Rensi, Stefano E; Torng, Wen et al. (2018) Machine learning in chemoinformatics and drug discovery. Drug Discov Today 23:1538-1546
Previde, Paul; Thomas, Brook; Wong, Mike et al. (2018) GeneDive: A gene interaction search and visualization tool to facilitate precision medicine. Pac Symp Biocomput 23:590-601
Liu, Tianyun; Ish-Shalom, Shirbi; Torng, Wen et al. (2018) Biological and functional relevance of CASP predictions. Proteins 86 Suppl 1:374-386
Percha, Bethany; Altman, Russ B (2018) A global network of biomedical relationships derived from text. Bioinformatics 34:2614-2624
Lavertu, Adam; McInnes, Greg; Daneshjou, Roxana et al. (2018) Pharmacogenomics and big genomic data: from lab to clinic and back again. Hum Mol Genet 27:R72-R78
Petkovic, Dragutin; Altman, Russ; Wong, Mike et al. (2018) Improving the explainability of Random Forest classifier - user centered approach. Pac Symp Biocomput 23:204-215
Mallory, Emily K; Acharya, Ambika; Rensi, Stefano E et al. (2018) Chemical reaction vector embeddings: towards predicting drug metabolism in the human gut microbiome. Pac Symp Biocomput 23:56-67
Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne et al. (2017) Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans. Genome Med 9:98
Torng, Wen; Altman, Russ B (2017) 3D deep convolutional neural networks for amino acid environment similarity analysis. BMC Bioinformatics 18:302

Showing the most recent 10 out of 64 publications