Relations linking various biomedical entities constitute a crucial resource that enables biomedical data science applications and knowledge discovery. Relational information spans the translational science spectrum going from biology (e.g., protein?protein interactions) to translational bioinformatics (e.g., gene?disease associations), and eventually to clinical care (e.g., drug?drug interactions). Scientists report newly discovered relations in nat- ural language through peer-reviewed literature and physicians may communicate them in clinical notes. More recently, patients are also reporting side-effects and adverse events on social media. With exponential growth in textual data, advances in biomedical natural language processing (BioNLP) methods are gaining prominence for biomedical relation extraction (BRE) from text. Most current efforts in BRE follow a pipeline approach containing named entity recognition (NER), entity normalization (EN), and relation classi?cation (RC) as subtasks. They typically suffer from error snowballing ? errors in a component of the pipeline leading to more downstream errors ? resulting in lower performance of the overall BRE system. This situation has lead to evaluation of different BRE substaks conducted in isolation. In this proposal we make a strong case for strictly end-to-end evaluations where relations are to be produced from raw text. We propose novel deep neural network architectures that model BRE in an end-to-end fashion and directly identify relations and corresponding entity spans in a single pass. We also extend our architectures to n-ary and cross-sentence settings where more than two entities may need to be linked even as the relation is expressed across multiple sentences. We also propose to create two new gold standard BRE datasets, one for drug?disease treatment relations and another ?rst of a kind dataset for combination drug therapies. Our main hypothesis is that our end-to-end extraction models will yield supe- rior performance when compared with traditional pipelines. We test this through (1). intrinsic evaluations based on standard performance measures with several gold standard datasets and (2). extrinsic application oriented assessments of relations extracted with use-cases in information retrieval, question answering, and knowledge base completion. All software and data developed as part of this project will be made available for public use and we hope this will foster rigorous end-to-end benchmarking of BRE systems.

Public Health Relevance

Relations connecting biomedical entities are at the heart of biomedical research given they encapsulate mech- anisms of disease etiology, progression, and treatment. As most such relations are ?rst disclosed in textual narratives (scienti?c literature or clinical notes), methods to extract and represent them in a structured format are essential to facilitate applications such as hypotheses generation, question answering, and information retrieval. The high level objective of this project is to develop and evaluate novel end-to-end supervised machine learning methods for biomedical relation extraction using latest advances in deep neural networks.

National Institute of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Project #
Application #
Study Section
Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer
Sim, Hua-Chuan
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
University of Kentucky
Internal Medicine/Medicine
Schools of Medicine
United States
Zip Code