Semi-structured Information Retrieval in Clinical Text for Cohort Identification

Liu, Hongfang

Abstract

Natural Language Processing (NLP) techniques have shown promise for extracting data from the free text of electronic health records (EHRs), but studies have consistently found that these techniques do not readily generalize across application settings. Unfortunately, most of the focus in applying NLP to real use cases has remained on a paradigm of single, well-defined application settings, so that generalizability to unseen use cases remains implicitly unaddressed. We propose to explicitly account for unseen application settings by adopting an information retrieval (IR) perspective with the objective of patient-level cohort identification. To do so, we introduce layered language models, an IR framework that enables the reuse of NLP-produced artifacts. Our long-term goal is to accelerate investigations of patient health and disease by providing robust, user-centric tools that are necessary to process, retrieve, and utilize the free text of EHRs. The main goal of this proposal is to accurately retrieve ad hoc, realistic cohorts from clinical text at Mayo Clinic and OHSU, establishing methods, resources, and evaluation for patient-level IR. We hypothesize that cohort identification can be addressed in a generalizable fashion by a new IR framework: layered language models. We will test this hypothesis through four specific aims.
In Aim 1, we will make medical NLP artifacts searchable in our layered language IR framework. This involves storing and indexing the NLP artifacts, as well as using statistical language models to retrieve documents based on text and its associated NLP artifacts.
In Aim 2, we deal with the practical setting of ad hoc cohort identification, moving to patient-level (rather than document-level) IR. To accurately handle patient cohorts in which qualifying evidence may be spread over multiple documents, we will develop and implement patient-level retrieval models that account for cross-document relational and temporal combinations of events.
In Aim 3, we will construct parallel IR test collections using EHR data from two sites; a diverse set of cohort queries written by multiple people toward various clinical or epidemiological ends; and assessments of which patients are relevant to which queries at both sites.
Finally, in Aim 4, we refine and evaluate patient-level layered language IR on the ad hoc cohort identification task, making comparisons across the users, queries, optimization metrics, and institutions. We will draw additional extrinsic comparisons with pre-existing techniques, e.g., for cohorts from the Electronic Medical Records and Genomics network.
The expected outcomes of the proposed work are: (i) An open-source cohort identification tool, usable by clinicians and epidemiologists, that makes principled use of NLP artifacts for unseen queries; (ii) A parallel test collection for cohort identification, including two intra-institutional document collections, diverse test topics and user-produced text queries, and patient-level judgments of relevance to each query; and (iii) Validation of the reusability of medical NLP via the task of retrieving patient cohorts.
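The general idea behind Aims 1 and 2 can be illustrated with a minimal sketch. It is not the project's layered language model, whose details the abstract does not give. It assumes each document carries two hypothetical "layers" (word tokens and NLP-produced concept labels), scores each layer with a standard Dirichlet-smoothed query-likelihood language model, combines the layers with fixed weights, and takes a patient's best-scoring document as the patient-level score. All function names, layer names, weights, and concept labels below are invented for illustration.

```python
import math
from collections import Counter


def dirichlet_log_likelihood(query_terms, doc_terms, collection_counts,
                             collection_len, mu=10.0):
    """Log query likelihood under a Dirichlet-smoothed unigram language model."""
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        # Background probability with add-one smoothing, so that terms
        # never seen in the collection still get a finite score.
        p_bg = (collection_counts[term] + 1) / (
            collection_len + len(collection_counts) + 1)
        score += math.log((doc_counts[term] + mu * p_bg) / (doc_len + mu))
    return score


def rank_patients(query, documents, layer_weights):
    """Rank patient ids by their best document score across weighted layers.

    query: dict mapping layer name -> list of query terms (hypothetical layout)
    documents: list of dicts with a 'patient' id and one term list per layer
    layer_weights: dict mapping layer name -> interpolation weight (assumed)
    """
    # Collection statistics are computed separately for each layer.
    stats = {}
    for layer in layer_weights:
        counts = Counter()
        for doc in documents:
            counts.update(doc[layer])
        stats[layer] = (counts, sum(counts.values()))

    patient_scores = {}
    for doc in documents:
        score = sum(
            weight * dirichlet_log_likelihood(
                query.get(layer, []), doc[layer], *stats[layer])
            for layer, weight in layer_weights.items()
        )
        # Patient-level aggregation: keep the best-scoring document.
        best = patient_scores.get(doc["patient"])
        if best is None or score > best:
            patient_scores[doc["patient"]] = score
    return sorted(patient_scores, key=patient_scores.get, reverse=True)


# Toy data; the concept labels are made up and are not real terminology codes.
documents = [
    {"patient": "p1", "text": ["patient", "reports", "wheezing", "asthma"],
     "concepts": ["C_ASTHMA"]},
    {"patient": "p1", "text": ["follow", "up", "visit"], "concepts": []},
    {"patient": "p2", "text": ["patient", "denies", "chest", "pain"],
     "concepts": ["C_CHEST_PAIN"]},
]
query = {"text": ["asthma"], "concepts": ["C_ASTHMA"]}
ranking = rank_patients(query, documents, {"text": 0.5, "concepts": 0.5})
print(ranking)
```

Taking the maximum over a patient's documents is the simplest possible aggregation. It does not capture qualifying evidence spread across several documents, which is the cross-document relational and temporal modeling that Aim 2 sets out to address.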

Public Health Relevance

With the widespread adoption of electronic medical records, one might expect that it would be simple for a medical expert to find things like 'patients in the community who suffer from asthma.' Unfortunately, on top of lab tests, medications, and demographic information, there are observations that a physician writes down as text, which are difficult for a computer to understand. Therefore, we aim to process text so that a computer can understand enough of it, and then search that text along with the rest of a patient's medical record; this will allow clinicians or researchers to find and study patient groups of interest.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM011934-05
Application #: 9534183
Study Section: Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer: Vanbiervliet, Alan

Project Start: 2014-09-20
Project End: 2019-07-31
Budget Start: 2018-08-01
Budget End: 2019-07-31
Support Year: 5
Fiscal Year: 2018

Institution

Name: Mayo Clinic, Rochester
DUNS #: 006471700

City: Rochester
State: MN
Country: United States
Zip Code: 55905

Related projects


NIH 2018 R01 LM	Semi-structured Information Retrieval in Clinical Text for Cohort Identification Liu, Hongfang / Mayo Clinic, Rochester
NIH 2017 R01 LM	Semi-structured Information Retrieval in Clinical Text for Cohort Identification Liu, Hongfang / Mayo Clinic, Rochester
NIH 2016 R01 LM	Semi-structured Information Retrieval in Clinical Text for Cohort Identification Liu, Hongfang; Wu, Stephen Tze-Inn / Mayo Clinic, Rochester	$387,966
NIH 2015 R01 LM	Semi-structured Information Retrieval in Clinical Text for Cohort Identification Liu, Hongfang; Wu, Stephen Tze-Inn / Mayo Clinic, Rochester
NIH 2014 R01 LM	Semi-structured Information Retrieval in Clinical Text for Cohort Identification Liu, Hongfang; Wu, Stephen Tze-Inn / Mayo Clinic, Rochester	$460,688

Publications

Liu, Sijia; Shen, Feichen; Komandur Elayavilli, Ravikumar et al. (2018) Extracting chemical-protein relations using attention-based neural networks. Database (Oxford) 2018:

Wang, Yanshan; Liu, Sijia; Afzal, Naveed et al. (2018) A comparison of word embeddings for the biomedical natural language processing. J Biomed Inform 87:12-20

Wu, Stephen; Liu, Sijia; Sohn, Sunghwan et al. (2018) Modeling asynchronous event sequences with RNNs. J Biomed Inform 83:167-177

Wang, Yanshan; Wang, Liwei; Rastegar-Mojarad, Majid et al. (2018) Clinical information extraction applications: A literature review. J Biomed Inform 77:34-49

Wang, Liwei; Rastegar-Mojarad, Majid; Ji, Zhiliang et al. (2018) Detecting Pharmacovigilance Signals Combining Electronic Medical Records With Spontaneous Reports: A Case Study of Conventional Disease-Modifying Antirheumatic Drugs for Rheumatoid Arthritis. Front Pharmacol 9:875

Zeng, Yuqun; Liu, Xusheng; Wang, Yanshan et al. (2017) Recommending Education Materials for Diabetic Questions Using Information Retrieval Approaches. J Med Internet Res 19:e342

Liu, Sijia; Wang, Liwei; Ihrke, Donna et al. (2017) Correlating Lab Test Results in Clinical Notes with Structured Lab Data: A Case Study in HbA1c and Glucose. AMIA Jt Summits Transl Sci Proc 2017:221-228

Ravikumar, K E; Rastegar-Mojarad, Majid; Liu, Hongfang (2017) BELMiner: adapting a rule-based relation extraction system to extract biological expression language statements from bio-medical literature evidence sentences. Database (Oxford) 2017:

Shen, Feichen; Liu, Sijia; Wang, Yanshan et al. (2017) Leveraging Collaborative Filtering to Accelerate Rare Disease Diagnosis. AMIA Annu Symp Proc 2017:1554-1563

Sohn, Sunghwan; Wi, Chung-Il; Juhn, Young J et al. (2017) Analysis of Clinical Variations in Asthma Care Documented in Electronic Health Records Between Staff and Resident Physicians. Stud Health Technol Inform 245:1170-1174

Showing the most recent 10 out of 21 publications
