Medicine has entered an era in which hospitals increasingly adopt real-time patient monitoring throughout their wards and generate ICU-like clinical data. This rapid growth in data makes the ICU a snapshot of tomorrow's standard of care, one that should benefit from computer-aided decision making. These data contain not only numerical or coded information, but also a large volume of unstructured narrative text such as physicians' and nurses' notes, specialists' reports, and discharge summaries. Both types of data have been shown to be highly informative for tasks such as cohort selection, and they work best in combination. To combine them, however, specific pieces of information must be extracted from the narrative reports and coded in a formal representation. These include medical concepts such as symptoms, diseases, medications, and procedures; characteristics such as certainty, severity, and dose; assertions about these items, such as whether they pertain to the patient or a family member; and relations among these mentions, including indications of which condition is treated by which action and its degree of success, the time sequence and duration of events, and interpretations of laboratory test results as relations among medical concepts such as cells and antigens (e.g., "[large atypical cells] express [CD30]"). Concepts and assertions can be regarded as simple relations, and our proposal focuses on modeling narrative relations to augment structured data for predicting patient outcomes.
Most existing techniques for interpreting clinical narratives either rely on hand-crafted rule systems and large medical thesauri or use machine learning to build classification or regression models from large annotated data sets. The former are difficult and laborious to generalize, whereas the latter require large volumes of human-labeled data and may yield models whose operation is difficult to interpret and that are therefore considered unsuitable for computer-aided decision making. We propose to build on our previous work by using unsupervised learning methods that identify frequent patterns in unannotated narratives and select informative patterns by tensor factorization. Although existing methods can also identify patterns that are meaningful in a data-driven sense, these patterns are difficult for clinicians to understand. Our specific goal is to develop a novel method, based on a Bayesian generative model that integrates relation mining with tensor factorization, to learn patterns that correspond to an understanding of the clinical domain and can be used for evidence-based patient outcome prediction. Our framework represents relations in clinical narratives as graphs, then mines subgraphs for important relations. These relations are used as features to build a tensor model that reduces dimensionality, discovers coherent groups of relations, and explores the interactions among groups. We will develop a Bayesian formulation that integrates relation mining and tensor modeling in a single generative model, incorporates existing medical knowledge as prior probabilities, and reliably estimates the posterior probabilities and confidence intervals of any findings from the model.
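The tensor step described above can be illustrated with a minimal sketch. It is not the proposed Bayesian generative model: it substitutes a plain CP (CANDECOMP/PARAFAC) factorization fitted by alternating least squares, and it assumes that relation patterns have already been mined. The three tensor modes (patient, relation pattern, note type), the pattern names, and the helper functions `build_count_tensor` and `cp_als` are all hypothetical choices made for illustration only.

```python
import numpy as np

# Hypothetical toy input: one tuple per occurrence of an already-mined
# relation pattern, as (patient, relation pattern, note type).
observations = [
    ("p1", "cells-express-CD30", "pathology"),
    ("p1", "cells-express-CD30", "pathology"),
    ("p2", "cells-express-CD30", "pathology"),
    ("p2", "drug-treats-condition", "discharge"),
    ("p3", "drug-treats-condition", "discharge"),
    ("p3", "drug-treats-condition", "progress"),
    ("p4", "symptom-of-family-member", "progress"),
]


def build_count_tensor(obs):
    """Count occurrences into a patient x pattern x note-type tensor."""
    axes = [sorted({o[m] for o in obs}) for m in range(3)]
    index = [{label: i for i, label in enumerate(axis)} for axis in axes]
    x = np.zeros([len(axis) for axis in axes])
    for o in obs:
        x[tuple(index[m][o[m]] for m in range(3))] += 1.0
    return x, axes


def khatri_rao(a, b):
    """Column-wise Kronecker product; row index is i * b.shape[0] + j."""
    return (a[:, None, :] * b[None, :, :]).reshape(-1, a.shape[1])


def unfold(x, mode):
    """Matricize x so that `mode` indexes the rows (C-order columns)."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def cp_als(x, rank, n_iter=100, seed=0):
    """Plain CP decomposition by alternating least squares (assumption:
    stands in for the Bayesian factorization, which is not shown here)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(x.ndim):
            others = [factors[m] for m in range(x.ndim) if m != mode]
            kr = others[0]
            gram = others[0].T @ others[0]
            for f in others[1:]:
                kr = khatri_rao(kr, f)
                gram = gram * (f.T @ f)
            # Least-squares update of one factor with the others held fixed.
            factors[mode] = unfold(x, mode) @ kr @ np.linalg.pinv(gram)
    return factors


x, axes = build_count_tensor(observations)
factors = cp_als(x, rank=2)
approx = np.einsum("ir,jr,kr->ijk", *factors)
rel_err = np.linalg.norm(x - approx) / np.linalg.norm(x)
print("tensor shape:", x.shape, "relative reconstruction error:", rel_err)
```

Each column of the pattern factor (`factors[1]`) weights the relation patterns that tend to co-occur, which is one simple way to read off "coherent groups of relations". A Bayesian formulation would replace the least-squares updates with posterior inference under priors, which this sketch does not attempt.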

Public Health Relevance

The rapid growth in hospital adoption of Electronic Health Records (EHRs), and the large volume of records this produces, has led to an unprecedented availability of narrative data for clinical and translational research. We propose to develop a novel Bayesian generative framework that extracts accurate and clinically meaningful patterns from EHR narratives in order to support evidence-based diagnostic reasoning and outcome risk prediction.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21LM012618-02
Application #
9535479
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Sim, Hua-Chuan
Project Start
2017-09-01
Project End
2019-08-31
Budget Start
2018-09-01
Budget End
2019-08-31
Support Year
2
Fiscal Year
2018
Total Cost
Indirect Cost
Name
Northwestern University at Chicago
Department
Public Health & Prev Medicine
Type
Schools of Medicine
DUNS #
005436803
City
Chicago
State
IL
Country
United States
Zip Code
60611
Sanchez-Pinto, L Nelson; Luo, Yuan; Churpek, Matthew M (2018) Big Data and Data Science in Critical Care. Chest 154:1239-1248
Garg, Ravi; Prabhakaran, Shyam; Holl, Jane L et al. (2018) Improving the Accuracy of Scores to Predict Gastrostomy after Intracerebral Hemorrhage with Machine Learning. J Stroke Cerebrovasc Dis 27:3570-3574
Zeng, Zexian; Espino, Sasa; Roy, Ankita et al. (2018) Using natural language processing and machine learning to identify breast cancer local recurrence. BMC Bioinformatics 19:498
Xue, Ye; Klabjan, Diego; Luo, Yuan (2018) Predicting ICU readmission using grouped physiological and medication trends. Artif Intell Med :
Mao, Chengsheng; Zhao, Yuan; Sun, Mengxin et al. (2018) Are My EHRs Private Enough? - Event-level Privacy Protection. IEEE/ACM Trans Comput Biol Bioinform :
Zeng, Zexian; Deng, Yu; Li, Xiaoyu et al. (2018) Natural Language Processing for EHR-Based Computational Phenotyping. IEEE/ACM Trans Comput Biol Bioinform :