From Syntactic Relations to Semantic Predications: Porting Open Information Extraction to Biomedicine

Kavuluru, Venkata

Abstract

Owing to exponential growth in scientific literature, it has become increasingly difficult for researchers to keep up with the latest developments in their fields of study. Hence, computational approaches that automatically mine large amounts of free text to extract essential information have gained popularity. This information is typically represented in the form of binary relations between different biomedical concepts. In this context, automatic extraction of meaningful relations from natural language narratives, a task often termed biomedical relation extraction (BRE), has garnered attention from informaticians. The extracted relations are used in high-level applications including information retrieval (IR), literature-based discovery (LBD), question answering, and text summarization. Most current BRE efforts tend to focus on a specific subdomain in biomedicine. For example, researchers have built models that extract gene-protein or gene-gene interactions; in the clinical domain, recent results are focused on drug-drug and drug-disease interactions mentioned in clinical narratives. The only effort that extracts a broad set of relations adhering to a large standardized vocabulary is the rule-based SemRep program being developed by researchers at the National Library of Medicine (NLM). SemRep extracts binary relations, called semantic predications, between biomedical entities from the UMLS Metathesaurus with predicates coming from an extension of the UMLS Semantic Network. Although SemRep achieves reasonable precision, its recall is very low on a gold standard dataset created for its evaluation. Given that many applications in LBD and IR already use the predication database SemMedDB (obtained by running SemRep on all biomedical citations made available through PubMed), a predication extraction framework with higher recall and an acceptably small loss in precision is desirable, especially if it can complement SemRep's extractions.
We propose to build and evaluate a supervised BRE framework that converts syntactic relations obtained using the paradigm of open information extraction (OIE) to semantic predications by leveraging the existing database of predications in SemMedDB and relations from the UMLS Metathesaurus through distant supervision. We will conduct domain-independent evaluation based on a gold standard dataset built by researchers at the NLM for evaluating SemRep. We will also conduct application-oriented evaluations by simulating predication-graph-based document and passage retrieval using the Text REtrieval Conference (TREC) Genomics and OHSUMED datasets for IR experiments. In addition, we will evaluate the quality of subgraphs resulting from LBD experiments to rediscover nine well-known biomedical discoveries. We hypothesize that the predications extracted through our methods will complement those in SemMedDB and that the combined predication dataset will result in improved overall performance compared with using SemMedDB alone.
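
As a rough illustration of the distant supervision idea described above, the following minimal Python sketch labels OIE-style triples with predicates from a small lookup of known predications. It is not the project's implementation: the concept identifiers, the toy `KNOWN_PREDICATIONS` table, and the function names are all hypothetical placeholders, and a real system would draw the lookup from SemMedDB and the UMLS Metathesaurus and train a classifier on the resulting noisy labels.

```python
from collections import defaultdict

# Hypothetical toy lookup standing in for known predications:
# (subject concept, object concept) -> set of predicates.
# The identifiers below are made-up placeholders, not real UMLS CUIs.
KNOWN_PREDICATIONS = {
    ("C_ASPIRIN", "C_HEADACHE"): {"TREATS"},
    ("C_SMOKING", "C_LUNG_CANCER"): {"CAUSES"},
}


def distant_labels(oie_triples, kb):
    """Pair each OIE relation phrase with the predicates known for its arguments.

    oie_triples: iterable of (subject concept, relation phrase, object concept).
    Triples whose argument pair is absent from kb receive no label.
    """
    labeled = []
    for subj, phrase, obj in oie_triples:
        for predicate in sorted(kb.get((subj, obj), ())):
            labeled.append((phrase, predicate))
    return labeled


def phrase_predicate_counts(labeled):
    """Count how often each relation phrase co-occurs with each predicate."""
    counts = defaultdict(lambda: defaultdict(int))
    for phrase, predicate in labeled:
        counts[phrase][predicate] += 1
    return counts


triples = [
    ("C_ASPIRIN", "relieves", "C_HEADACHE"),
    ("C_SMOKING", "leads to", "C_LUNG_CANCER"),
    ("C_X", "binds", "C_Y"),  # no known predication, so it stays unlabeled
]
labeled = distant_labels(triples, KNOWN_PREDICATIONS)
counts = phrase_predicate_counts(labeled)
```

The labels produced this way are noisy by construction, since a sentence mentioning two concepts need not express the relation recorded for them; they would serve only as weak training signal for a supervised model.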

Public Health Relevance

Semantic predications are binary relations extracted from biomedical text by the SemRep program and connect biomedical entities with a fixed set of relation types. Although SemRep extractions have reasonable precision, their recall is very low. We propose to build a supervised predication extraction framework whose results will complement SemRep's extractions in terms of improved performance in both direct gold standard evaluation and application oriented evaluation in the context of information retrieval and literature based discovery.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21LM012274-02
Application #: 9274042
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Vanbiervliet, Alan

Project Start: 2016-06-01
Project End: 2018-05-31
Budget Start: 2017-06-01
Budget End: 2018-05-31
Support Year: 2
Fiscal Year: 2017
Total Cost: $164,263
Indirect Cost: $42,009

Institution

Name: University of Kentucky
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 939017877

City: Lexington
State: KY
Country: United States
Zip Code: 40506

Related projects


NIH 2017 R21 LM	From Syntactic Relations to Semantic Predications: Porting Open Information Extraction to Biomedicine Kavuluru, Venkata Naga Ramakanth / University of Kentucky	$164,263
NIH 2016 R21 LM	From Syntactic Relations to Semantic Predications: Porting Open Information Extraction to Biomedicine Kavuluru, Venkata Naga Ramakanth / University of Kentucky	$209,697

Publications

Peng, Yifan; Rios, Anthony; Kavuluru, Ramakanth et al. (2018) Extracting chemical-protein relations with ensembles of SVM and deep learning models. Database (Oxford) 2018:

Rios, Anthony; Kavuluru, Ramakanth (2018) EMR Coding with Semi-Parametric Multi-Head Matching Networks. Proc Conf 2018:2081-2091

Tran, Tung; Kavuluru, Ramakanth (2018) An end-to-end deep learning architecture for extracting protein-protein interactions affected by genetic mutations. Database (Oxford) 2018:

Bakal, Gokhan; Talari, Preetham; Kakani, Elijah V et al. (2018) Exploiting semantic patterns over biomedical knowledge graphs for predicting treatment and causative relations. J Biomed Inform 82:189-199

Rios, Anthony; Kavuluru, Ramakanth; Lu, Zhiyong (2018) Generalizing biomedical relation classification with neural adversarial domain adaptation. Bioinformatics 34:2973-2981

Tran, Tung; Kavuluru, Ramakanth (2017) Predicting mental conditions based on "history of present illness" in psychiatric notes with deep neural networks. J Biomed Inform 75S:S138-S148

Kavuluru, Ramakanth; Rios, Anthony; Tran, Tung (2017) Extracting Drug-Drug Interactions with Word and Character-Level Recurrent Neural Networks. IEEE Int Conf Healthc Inform 2017:5-12

Sabbir, Akm; Jimeno-Yepes, Antonio; Kavuluru, Ramakanth (2017) Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings. Proc IEEE Int Symp Bioinformatics Bioeng 2017:163-170

Rios, Anthony; Kavuluru, Ramakanth (2017) Ordinal convolutional neural networks for predicting RDoC positive valence psychiatric symptom severity scores. J Biomed Inform 75S:S85-S93
