The rate at which new drugs are being introduced to market is decreasing, with grave implications for human health. Knowledge about the molecular mechanisms relevant to drug response is critical, but it is collected in myriad individual experiments. The published literature therefore contains rich information about how drugs and genes/gene products interact to produce phenotypes at the molecular, cellular and organismal levels, but this textual data requires substantial additional processing. As a result, there are efforts to manually curate the literature and extract relationships between three key entities (genes/gene products, drugs and phenotypes), with the goal of representing the information in structured, computable formats. Although automated text mining may ultimately replace expert human curators, its best current role is to triage the literature and bring potentially important information to the attention of human curators. Recent advances in computational natural language processing (NLP) generally, and within our laboratory specifically, offer an opportunity to extract relationships between key entities with high accuracy. In particular, we have prototyped methods that take a relatively small set of examples of a relationship of interest (e.g. examples of gene-drug pairs in which the gene product metabolizes the drug) and then find other pairs that share a similar relationship. These methods can be applied to any relationship between our three key entity types. Thus, we propose an ambitious plan to (1) gather large corpora of biomedical text and extend existing lexicons for these entities, (2) build a database of all sentences/paragraphs relating these entities to one another, (3) create methods for accurately extracting semantically precise relationships from all pairs of entity types, and (4) validate these extracted relationships using both available gold-standard experimental data sources and expert curator evaluation.
In addition to directly supporting curation, our methods and extractions will be made available as general purpose resources for understanding drug action.
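The seed-based idea described above (start from a few example gene-drug pairs and look for other pairs that appear in similar contexts) can be illustrated with a minimal sketch. This is not the proposal's method: it swaps in a plain bag-of-words cosine similarity, and the function names, the toy sentences and the tokenization are all assumptions made for illustration only.

```python
import math
from collections import Counter


def context_vector(sentences, gene, drug):
    """Count the words that co-occur with both entities in a sentence.

    Hypothetical helper: a real system would use a lexicon-based entity
    tagger and richer features such as dependency paths.
    """
    gene, drug = gene.lower(), drug.lower()
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().replace(".", " ").split()
        if gene in tokens and drug in tokens:
            counts.update(t for t in tokens if t not in (gene, drug))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(sentences, seeds, candidates):
    """Rank candidate pairs by similarity to the pooled seed contexts."""
    centroid = Counter()
    for gene, drug in seeds:
        centroid.update(context_vector(sentences, gene, drug))
    scored = [
        ((gene, drug), cosine(context_vector(sentences, gene, drug), centroid))
        for gene, drug in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy sentences written for this sketch, not drawn from any corpus.
sentences = [
    "CYP2D6 metabolizes codeine.",
    "CYP2C9 metabolizes warfarin.",
    "CYP3A4 metabolizes midazolam in the liver.",
    "Warfarin inhibits VKORC1.",
]
seeds = [("CYP2D6", "codeine"), ("CYP2C9", "warfarin")]
candidates = [("CYP3A4", "midazolam"), ("VKORC1", "warfarin")]

ranked = rank_candidates(sentences, seeds, candidates)
for pair, score in ranked:
    print(pair, round(score, 3))
```

In this toy setting the pair whose sentence shares the verb "metabolizes" with the seeds ranks above the pair described by "inhibits". In a triage workflow, such a ranking would only order candidates for review by human curators.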
Published biological text contains an abundance of valuable information about how drugs work at the molecular, cellular and organismal levels and how they affect patients. However, it is difficult to gather all relevant information to support discovery of new drugs and optimal use of existing drugs. This proposal outlines a plan to develop methods for extracting drug-related information from huge collections of text, and using it to improve drug discovery and use.
Zhou, Weizhuang; Altman, Russ B (2018) Data-driven human transcriptomic modules determined by independent component analysis. BMC Bioinformatics 19:327
Lo, Yu-Chen; Rensi, Stefano E; Torng, Wen et al. (2018) Machine learning in chemoinformatics and drug discovery. Drug Discov Today 23:1538-1546
Previde, Paul; Thomas, Brook; Wong, Mike et al. (2018) GeneDive: A gene interaction search and visualization tool to facilitate precision medicine. Pac Symp Biocomput 23:590-601
Liu, Tianyun; Ish-Shalom, Shirbi; Torng, Wen et al. (2018) Biological and functional relevance of CASP predictions. Proteins 86 Suppl 1:374-386
Percha, Bethany; Altman, Russ B (2018) A global network of biomedical relationships derived from text. Bioinformatics 34:2614-2624
Lavertu, Adam; McInnes, Greg; Daneshjou, Roxana et al. (2018) Pharmacogenomics and big genomic data: from lab to clinic and back again. Hum Mol Genet 27:R72-R78
Petkovic, Dragutin; Altman, Russ; Wong, Mike et al. (2018) Improving the explainability of Random Forest classifier - user centered approach. Pac Symp Biocomput 23:204-215
Mallory, Emily K; Acharya, Ambika; Rensi, Stefano E et al. (2018) Chemical reaction vector embeddings: towards predicting drug metabolism in the human gut microbiome. Pac Symp Biocomput 23:56-67
Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne et al. (2017) Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans. Genome Med 9:98
Torng, Wen; Altman, Russ B (2017) 3D deep convolutional neural networks for amino acid environment similarity analysis. BMC Bioinformatics 18:302
Showing the most recent 10 out of 64 publications