Bayesian Generative Methods for Extracting and Modeling Relations in EHR Narratives

Luo, Yuan

Abstract

Medicine has entered an era in which the entire hospital progressively adopts more real-time patient monitoring and generates ICU-like clinical data. This rapidly growing data makes the ICU a snapshot of tomorrow's standard of care, which should benefit from computer-aided decision making. These data contain not only numerical or coded information but also a large volume of unstructured narrative text, such as physicians' and nurses' notes, specialists' reports, and discharge summaries. Both types of data have been shown to be highly informative for tasks such as cohort selection, and they work best in combination. To achieve this, however, specific pieces of information must be extracted from the narrative reports and coded in a formal representation. These include medical concepts such as symptoms, diseases, medications, and procedures; characteristics such as certainty, severity, and dose; assertions about these items, such as whether they pertain to the patient or a family member; and relations among these mentions, including indications of what condition is treated by what action and its degree of success, the time sequence and duration of events, and interpretations of laboratory test results as relations among medical concepts such as cells and antigens (e.g., "[large atypical cells] express [CD30]"). Concepts and assertions can be regarded as simple relations, and our proposal focuses on modeling narrative relations to augment structured data for predicting patient outcomes. Most existing techniques for interpreting clinical narratives either rely on hand-crafted rule systems and large medical thesauri or use machine learning to build classification or regression models from large annotated data sets. The former are difficult and laborious to generalize, whereas the latter require large volumes of human-labeled data and may result in models whose operation is difficult to interpret and that are therefore considered unsuitable for computer-aided decision making. We propose to build on our previous work, which uses unsupervised learning methods to find frequent patterns in un-annotated narratives and tensor factorization to identify the informative ones. Although existing methods can also identify patterns that are meaningful in a data-driven sense, these patterns are difficult for clinicians to understand. Our specific goal is to develop a novel method that uses a Bayesian generative model integrating relation mining with tensor factorization to learn patterns that correspond to an understanding of the clinical domain and can be used for evidence-based patient outcome prediction. Our framework represents relations in clinical narratives as graphs, then mines subgraphs for important relations. These relations are used as features to build a tensor model in order to reduce dimensionality, discover coherent groups of relations, and explore the interactions among groups. We develop a Bayesian formulation that integrates relation mining and tensor modeling in a generative model, incorporates existing medical knowledge as probability priors, and reliably estimates the posterior probabilities and confidence intervals of any findings from the model.
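
As a rough illustration of the first two steps described in the abstract (mining frequent relations from sentence graphs and arranging them in a tensor), the following Python sketch may help. It is not the project's implementation: the toy corpus, the function names mine_frequent_edges and build_count_tensor, the restriction to single-edge patterns in place of general subgraph mining, and the choice of patient, pattern, and word as the three tensor modes are all assumptions made for illustration. The factorization and Bayesian modeling steps are not shown.

```python
from collections import Counter

# Hypothetical toy corpus: per patient, a list of sentences; each sentence is
# (words, edges), where edges are (head, relation, tail) triples.
CD30 = ("large atypical cells", "express", "CD30")
CD20 = ("cells", "negative_for", "CD20")
CD3 = ("lymphocytes", "express", "CD3")

notes = {
    "patient_a": [
        (["large", "atypical", "cells", "express", "CD30"], {CD30}),
        (["cells", "express", "CD30", "negative", "CD20"], {CD30, CD20}),
    ],
    "patient_b": [
        (["large", "atypical", "cells", "express", "CD30"], {CD30}),
        (["lymphocytes", "express", "CD3"], {CD3}),
    ],
    "patient_c": [
        (["cells", "negative", "CD20"], {CD20}),
    ],
}


def mine_frequent_edges(notes, min_support):
    """Keep edges that appear in at least `min_support` distinct patients.

    Assumption: single edges stand in for mined subgraphs in this sketch.
    """
    support = Counter()
    for sentences in notes.values():
        # Count each edge at most once per patient.
        support.update({edge for _, edges in sentences for edge in edges})
    return {edge: n for edge, n in support.items() if n >= min_support}


def build_count_tensor(notes, patterns):
    """Sparse tensor: (patient, pattern, word) -> number of sentences
    in which the pattern and the word co-occur (assumed mode layout)."""
    tensor = Counter()
    for patient, sentences in notes.items():
        for words, edges in sentences:
            for edge in edges.intersection(patterns):
                for word in set(words):
                    tensor[(patient, edge, word)] += 1
    return tensor


frequent = mine_frequent_edges(notes, min_support=2)
tensor = build_count_tensor(notes, frequent)
```

Here the relation seen in only one patient falls below the support threshold and is dropped, so the sparse tensor holds co-occurrence counts only for the two shared relations. A factorization step would then operate on these counts.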

Public Health Relevance

Rapid growth in hospital adoption of Electronic Health Records (EHRs) has led to an unprecedented availability of narrative datasets for clinical and translational research. We propose to develop a novel Bayesian generative framework that extracts accurate and clinically meaningful patterns from EHR narratives in order to support evidence-based diagnostic reasoning and outcome risk prediction.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21LM012618-02
Application #: 9535479
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Sim, Hua-Chuan

Project Start: 2017-09-01
Project End: 2019-08-31
Budget Start: 2018-09-01
Budget End: 2019-08-31
Support Year: 2
Fiscal Year: 2018

Institution

Name: Northwestern University at Chicago
Department: Public Health & Prev Medicine
Type: Schools of Medicine
DUNS #: 005436803

City: Chicago
State: IL
Country: United States
Zip Code: 60611


Publications

Sanchez-Pinto, L Nelson; Luo, Yuan; Churpek, Matthew M (2018) Big Data and Data Science in Critical Care. Chest 154:1239-1248

Garg, Ravi; Prabhakaran, Shyam; Holl, Jane L et al. (2018) Improving the Accuracy of Scores to Predict Gastrostomy after Intracerebral Hemorrhage with Machine Learning. J Stroke Cerebrovasc Dis 27:3570-3574

Zeng, Zexian; Espino, Sasa; Roy, Ankita et al. (2018) Using natural language processing and machine learning to identify breast cancer local recurrence. BMC Bioinformatics 19:498

Xue, Ye; Klabjan, Diego; Luo, Yuan (2018) Predicting ICU readmission using grouped physiological and medication trends. Artif Intell Med :

Mao, Chengsheng; Zhao, Yuan; Sun, Mengxin et al. (2018) Are My EHRs Private Enough? - Event-level Privacy Protection. IEEE/ACM Trans Comput Biol Bioinform :

Zeng, Zexian; Deng, Yu; Li, Xiaoyu et al. (2018) Natural Language Processing for EHR-Based Computational Phenotyping. IEEE/ACM Trans Comput Biol Bioinform :
