Owing to exponential growth in scientific literature, it has become increasingly difficult for researchers to keep up with the latest developments in their fields of study. Hence, computational approaches that automatically mine large amounts of free text to extract essential information have gained popularity. This information is typically represented in the form of binary relations between different biomedical concepts. In this context, automatic extraction of meaningful relations from natural language narratives, a task often termed biomedical relation extraction (BRE), has garnered attention from informaticians. The extracted relations are used in high-level applications including information retrieval (IR), literature-based discovery (LBD), question answering, and text summarization. Most current BRE efforts focus on a specific subdomain in biomedicine. For example, researchers have built models that extract gene-protein or gene-gene interactions; in the clinical domain, recent results focus on drug-drug and drug-disease interactions mentioned in clinical narratives. The only effort that extracts a broad set of relations adhering to a large standardized vocabulary is the rule-based SemRep program being developed by researchers at the National Library of Medicine (NLM). SemRep extracts binary relations, called semantic predications, between biomedical entities from the UMLS Metathesaurus, with predicates coming from an extension of the UMLS Semantic Network. Although SemRep achieves reasonable precision, its recall is very low on a gold standard dataset created for its evaluation. Given that many applications in LBD and IR already use the predication database SemMedDB (obtained by running SemRep on all biomedical citations made available through PubMed), a predication extraction framework with higher recall and a small, acceptable loss in precision is desirable, especially if it can complement SemRep's extractions.
We propose to build and evaluate a supervised BRE framework that converts syntactic relations obtained using the paradigm of open information extraction (OIE) into semantic predications by leveraging, through distant supervision, the existing database of predications in SemMedDB and relations from the UMLS Metathesaurus. We will conduct a domain-independent evaluation based on a gold standard dataset built by researchers at the NLM for evaluating SemRep. We will also conduct application-oriented evaluations by simulating predication-graph-based document and passage retrieval, using the Text REtrieval Conference (TREC) Genomics and OHSUMED datasets for IR experiments. In addition, we will evaluate the quality of subgraphs resulting from LBD experiments that aim to rediscover nine well-known biomedical discoveries. We hypothesize that the predications extracted through our methods will complement those in SemMedDB and that the combined predication dataset will result in improved overall performance compared with using SemMedDB alone.
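
To make the idea of distant supervision concrete, the following is a minimal Python sketch, not the project's actual implementation. It assumes OIE output has already been mapped to concept identifiers, and it labels an OIE triple with a predicate whenever the same subject-object pair appears in a set of known predications. The identifiers (such as `C_ASPIRIN`), the toy predications, and the function names `build_pair_index` and `distant_labels` are all hypothetical and stand in for data that would come from SemMedDB or the UMLS Metathesaurus.

```python
from collections import defaultdict

# Hypothetical stand-in for known predications (subject, predicate, object).
# Real data would come from a resource such as SemMedDB; these IDs are invented.
KNOWN_PREDICATIONS = {
    ("C_ASPIRIN", "TREATS", "C_HEADACHE"),
    ("C_ASPIRIN", "INHIBITS", "C_COX1"),
    ("C_SMOKING", "CAUSES", "C_LUNG_CANCER"),
}


def build_pair_index(predications):
    """Map each (subject, object) pair to the set of predicates known for it."""
    index = defaultdict(set)
    for subj, pred, obj in predications:
        index[(subj, obj)].add(pred)
    return index


def distant_labels(oie_triples, pair_index):
    """Attach a predicate label to every OIE triple whose concept pair is known.

    Triples whose pair is absent from the index are left unlabeled (skipped).
    """
    labeled = []
    for subj, phrase, obj in oie_triples:
        for pred in sorted(pair_index.get((subj, obj), set())):
            labeled.append((subj, phrase, obj, pred))
    return labeled


# Toy OIE output: (subject concept, free-text relation phrase, object concept).
oie_triples = [
    ("C_ASPIRIN", "is effective for", "C_HEADACHE"),
    ("C_ASPIRIN", "blocks", "C_COX1"),
    ("C_ASPIRIN", "was given with", "C_WARFARIN"),
]

pair_index = build_pair_index(KNOWN_PREDICATIONS)
training_examples = distant_labels(oie_triples, pair_index)
for example in training_examples:
    print(example)
```

Labels produced this way are noisy, since a sentence mentioning a known pair need not express the known relation. In a setup like this, the labeled triples would typically serve as training data for a supervised classifier rather than as final extractions.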

Public Health Relevance

Semantic predications are binary relations extracted from biomedical text by the SemRep program; they connect biomedical entities using a fixed set of relation types. Although SemRep extractions have reasonable precision, their recall is very low. We propose to build a supervised predication extraction framework whose results will complement SemRep's extractions, yielding improved performance both in direct gold standard evaluation and in application-oriented evaluation in the context of information retrieval and literature-based discovery.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21LM012274-02
Application #
9274042
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Vanbiervliet, Alan
Project Start
2016-06-01
Project End
2018-05-31
Budget Start
2017-06-01
Budget End
2018-05-31
Support Year
2
Fiscal Year
2017
Total Cost
$164,263
Indirect Cost
$42,009
Name
University of Kentucky
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
939017877
City
Lexington
State
KY
Country
United States
Zip Code
40506
Peng, Yifan; Rios, Anthony; Kavuluru, Ramakanth et al. (2018) Extracting chemical-protein relations with ensembles of SVM and deep learning models. Database (Oxford) 2018:
Rios, Anthony; Kavuluru, Ramakanth (2018) EMR Coding with Semi-Parametric Multi-Head Matching Networks. Proc Conf 2018:2081-2091
Tran, Tung; Kavuluru, Ramakanth (2018) An end-to-end deep learning architecture for extracting protein-protein interactions affected by genetic mutations. Database (Oxford) 2018:
Bakal, Gokhan; Talari, Preetham; Kakani, Elijah V et al. (2018) Exploiting semantic patterns over biomedical knowledge graphs for predicting treatment and causative relations. J Biomed Inform 82:189-199
Rios, Anthony; Kavuluru, Ramakanth; Lu, Zhiyong (2018) Generalizing biomedical relation classification with neural adversarial domain adaptation. Bioinformatics 34:2973-2981
Tran, Tung; Kavuluru, Ramakanth (2017) Predicting mental conditions based on "history of present illness" in psychiatric notes with deep neural networks. J Biomed Inform 75S:S138-S148
Kavuluru, Ramakanth; Rios, Anthony; Tran, Tung (2017) Extracting Drug-Drug Interactions with Word and Character-Level Recurrent Neural Networks. IEEE Int Conf Healthc Inform 2017:5-12
Sabbir, Akm; Jimeno-Yepes, Antonio; Kavuluru, Ramakanth (2017) Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings. Proc IEEE Int Symp Bioinformatics Bioeng 2017:163-170
Rios, Anthony; Kavuluru, Ramakanth (2017) Ordinal convolutional neural networks for predicting RDoC positive valence psychiatric symptom severity scores. J Biomed Inform 75S:S85-S93