Relations linking various biomedical entities constitute a crucial resource that enables biomedical data science applications and knowledge discovery. Relational information spans the translational science spectrum going from biology (e.g., protein?protein interactions) to translational bioinformatics (e.g., gene?disease associations), and eventually to clinical care (e.g., drug?drug interactions). Scientists report newly discovered relations in nat- ural language through peer-reviewed literature and physicians may communicate them in clinical notes. More recently, patients are also reporting side-effects and adverse events on social media. With exponential growth in textual data, advances in biomedical natural language processing (BioNLP) methods are gaining prominence for biomedical relation extraction (BRE) from text. Most current efforts in BRE follow a pipeline approach containing named entity recognition (NER), entity normalization (EN), and relation classi?cation (RC) as subtasks. They typically suffer from error snowballing ? errors in a component of the pipeline leading to more downstream errors ? resulting in lower performance of the overall BRE system. This situation has lead to evaluation of different BRE substaks conducted in isolation. In this proposal we make a strong case for strictly end-to-end evaluations where relations are to be produced from raw text. We propose novel deep neural network architectures that model BRE in an end-to-end fashion and directly identify relations and corresponding entity spans in a single pass. We also extend our architectures to n-ary and cross-sentence settings where more than two entities may need to be linked even as the relation is expressed across multiple sentences. We also propose to create two new gold standard BRE datasets, one for drug?disease treatment relations and another ?rst of a kind dataset for combination drug therapies. Our main hypothesis is that our end-to-end extraction models will yield supe- rior performance when compared with traditional pipelines. We test this through (1). intrinsic evaluations based on standard performance measures with several gold standard datasets and (2). extrinsic application oriented assessments of relations extracted with use-cases in information retrieval, question answering, and knowledge base completion. All software and data developed as part of this project will be made available for public use and we hope this will foster rigorous end-to-end benchmarking of BRE systems.

Public Health Relevance

Relations connecting biomedical entities are at the heart of biomedical research given they encapsulate mech- anisms of disease etiology, progression, and treatment. As most such relations are ?rst disclosed in textual narratives (scienti?c literature or clinical notes), methods to extract and represent them in a structured format are essential to facilitate applications such as hypotheses generation, question answering, and information retrieval. The high level objective of this project is to develop and evaluate novel end-to-end supervised machine learning methods for biomedical relation extraction using latest advances in deep neural networks.

Agency
National Institute of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
1R01LM013240-01A1
Application #
10052028
Study Section
Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer
Sim, Hua-Chuan
Project Start
2020-07-01
Project End
2024-03-31
Budget Start
2020-07-01
Budget End
2021-03-31
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Kentucky
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
939017877
City
Lexington
State
KY
Country
United States
Zip Code
40526