Interactive machine learning methods for clinical natural language processing

Xu, Hua

Abstract

Growing deployments of electronic health record (EHR) systems have made massive amounts of clinical data available electronically. However, much of the detailed clinical information about patients is embedded in narrative text and is not directly accessible to computerized clinical applications. Therefore, natural language processing (NLP) technologies, which can unlock information in narrative documents, have received great attention in the medical domain. Current state-of-the-art NLP approaches often involve building probabilistic models. However, the wide adoption of statistical methods in clinical NLP faces two grand challenges: 1) the lack of large annotated clinical corpora; and 2) the lack of methodologies that can efficiently integrate linguistic and domain knowledge with statistical learning. High-performance statistical NLP methods rely on large-scale, high-quality annotations of clinical text, but creating large annotated clinical corpora is time-consuming and costly because it often requires manual review by physicians. Moreover, the medical domain is knowledge-intensive: to achieve optimal performance, probabilistic models need to leverage medical domain knowledge. Therefore, methods that can efficiently integrate domain and expert knowledge with machine learning processes to quickly build high-quality probabilistic models at minimal annotation cost would be highly desirable for clinical text processing. In this study, we propose to investigate interactive machine learning (IML) methods to address these challenges in clinical NLP. An IML system builds a classification model iteratively: it actively selects informative samples for annotation based on models built from previously annotated samples, thus reducing the annotation cost of model development. More importantly, an IML system also incorporates human input into the learning process (e.g., an expert can specify important features for a classification task based on domain knowledge).
Thus, IML is an ideal framework for efficiently integrating rule-based (via domain experts specifying features) and statistics-based (via different learning algorithms) approaches to clinical NLP. To achieve our goal, we propose three specific aims.
In Aim 1, we plan to investigate different aspects of IML for word sense disambiguation, including developing new active learning algorithms and conducting cognitive usability analysis for efficient feature annotation by users. To demonstrate the broad utility of IML, in Aim 2 we will extend IML approaches to two other important clinical NLP classification tasks: named entity recognition and clinical phenotyping. Finally, in Aim 3, we propose to disseminate the IML methods and tools to the biomedical research community.
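The abstract describes the core IML loop (train on the samples annotated so far, then ask the annotator to label the sample the current model finds most informative) without naming a specific algorithm. As a minimal sketch, assuming a binary word sense disambiguation task, bag-of-words features, logistic regression trained by stochastic gradient ascent, and least-confidence uncertainty sampling, the loop might look like the Python below. All names (`featurize`, `LogisticModel`, `active_learning`, `prior_weights`), the toy "RA" sentences, and the use of expert-flagged features as starting weights are hypothetical illustrations, not the project's actual methods or data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def featurize(text):
    """Bag-of-words features: the set of lowercase whitespace tokens."""
    return set(text.lower().split())


class LogisticModel:
    """Binary logistic regression over sparse binary features, trained by SGD."""

    def __init__(self, prior_weights=None):
        # Hypothetical hook for expert input: starting weights for features a
        # domain expert marks as indicative of one sense (copied, not mutated).
        self.w = dict(prior_weights or {})
        self.b = 0.0

    def prob(self, feats):
        """Probability that the example belongs to class 1."""
        return sigmoid(self.b + sum(self.w.get(f, 0.0) for f in feats))

    def fit(self, examples, epochs=50, lr=0.5):
        """Stochastic gradient ascent on the log-likelihood."""
        for _ in range(epochs):
            for feats, y in examples:
                err = y - self.prob(feats)
                self.b += lr * err
                for f in feats:
                    self.w[f] = self.w.get(f, 0.0) + lr * err
        return self


def active_learning(pool, oracle, seed, rounds, prior_weights=None):
    """Pool-based active learning with least-confidence uncertainty sampling.

    pool: unlabeled texts; oracle(i): label of pool[i], standing in for the
    human annotator; seed: indices labeled before the loop starts.
    """
    feats = [featurize(t) for t in pool]
    labeled = {i: oracle(i) for i in seed}

    def train():
        data = [(feats[i], y) for i, y in labeled.items()]
        return LogisticModel(prior_weights).fit(data)

    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        model = train()  # retrain on everything annotated so far
        # Query the example whose predicted probability is closest to 0.5.
        query = min(unlabeled, key=lambda i: abs(model.prob(feats[i]) - 0.5))
        labeled[query] = oracle(query)
    return train(), labeled


# Toy, made-up sentences for the abbreviation "RA":
# 1 = rheumatoid arthritis, 0 = right atrium.
pool = [
    "ra flare with joint pain and morning stiffness",
    "echo shows dilated ra and tricuspid regurgitation",
    "methotrexate started for ra joint swelling",
    "ra pressure elevated on cardiac catheterization",
    "ra with synovitis of hands",
    "enlarged ra seen on echo",
]
gold = [1, 0, 1, 0, 1, 0]

model, labeled = active_learning(
    pool, oracle=lambda i: gold[i], seed=[0, 1], rounds=2,
    prior_weights={"joint": 1.0, "echo": -1.0},  # hypothetical expert features
)
print(sorted(labeled))
print(round(model.prob(featurize("ra with joint stiffness")), 3))
```

Retraining from scratch after each query keeps the sketch simple; in practice the `oracle` function would be replaced by a physician-facing annotation interface, and samples could be queried in batches rather than one at a time.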

Public Health Relevance

In this project, we propose to develop interactive machine learning methods to process clinical text stored in electronic health record (EHR) systems. Such methods can efficiently integrate domain and expert knowledge with machine learning processes to quickly build high-quality probabilistic models at minimal annotation cost, thus improving the performance of text processors. This technology will allow more accurate data extraction from clinical documents, thereby facilitating clinical research that relies on EHR data.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM010681-07
Application #: 9132834
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Sim, Hua-Chuan

Project Start: 2010-05-31
Project End: 2018-09-28
Budget Start: 2016-09-29
Budget End: 2017-09-28
Support Year: 7
Fiscal Year: 2016

Institution

Name: University of Texas Health Science Center Houston
Type: Schools of Allied Health Professions
DUNS #: 800771594

City: Houston
State: TX
Country: United States
Zip Code: 77225

Related projects


NIH 2017 R01 LM	Interactive machine learning methods for clinical natural language processing Xu, Hua / University of Texas Health Science Center Houston
NIH 2016 R01 LM	Interactive machine learning methods for clinical natural language processing Xu, Hua / University of Texas Health Science Center Houston
NIH 2015 R01 LM	Interactive machine learning methods for clinical natural language processing Xu, Hua / University of Texas Health Science Center Houston
NIH 2014 R01 LM	Interactive machine learning methods for clinical natural language processing Xu, Hua / University of Texas Health Science Center Houston	$558,372
NIH 2012 R01 LM	Real-time Disambiguation of Abbreviations in Clinical Notes Xu, Hua / Vanderbilt University Medical Center	$129,035
NIH 2011 R01 LM	Real-time Disambiguation of Abbreviations in Clinical Notes Xu, Hua / Vanderbilt University Medical Center	$374,000
NIH 2010 R01 LM	Real-time Disambiguation of Abbreviations in Clinical Notes Xu, Hua / Vanderbilt University Medical Center	$387,500

Publications

Lee, Hee-Jin; Zhang, Yaoyun; Jiang, Min et al. (2018) Identifying direct temporal relations between time and events from clinical notes. BMC Med Inform Decis Mak 18:49

Brusco, Lauren L; Wathoo, Chetna; Mills Shaw, Kenna R et al. (2018) Physician interpretation of genomic test results and treatment selection. Cancer 124:966-972

Zhang, Yaoyun; Zhang, Olivia; Wu, Yonghui et al. (2017) Psychiatric symptom recognition without labeled data using distributional representations of phrases and on-line knowledge. J Biomed Inform 75S:S129-S137

Wu, Yonghui; Jiang, Min; Xu, Jun et al. (2017) Clinical Named Entity Recognition Using Deep Learning Models. AMIA Annu Symp Proc 2017:1812-1819

Ji, Zongcheng; Zhang, Yaoyun; Xu, Jun et al. (2017) Comparing Cancer Information Needs for Consumers in the US and China. Stud Health Technol Inform 245:126-130

Lee, Hee-Jin; Zhang, Yaoyun; Roberts, Kirk et al. (2017) Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation. AMIA Annu Symp Proc 2017:1070-1079

Lee, Hee-Jin; Wu, Yonghui; Zhang, Yaoyun et al. (2017) A hybrid approach to automatic de-identification of psychiatric notes. J Biomed Inform 75S:S19-S27

Zhang, Yaoyun; Xu, Jun; Chen, Hui et al. (2016) Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning. Database (Oxford) 2016:

Duan, Rui; Cao, Ming; Wu, Yonghui et al. (2016) An Empirical Study for Impacts of Measurement Errors on EHR based Association Studies. AMIA Annu Symp Proc 2016:1764-1773

Xu, Jun; Wu, Yonghui; Zhang, Yaoyun et al. (2016) CD-REST: a system for extracting chemical-induced disease relation in literature. Database (Oxford) 2016:

Showing the most recent 10 out of 30 publications
