The VA has invested heavily in electronic medical records and has achieved a nationwide system that collects medical information from all patients. Currently, however, the textual information in these records is inaccessible to all but a small number of researchers. To obtain the highest value from this existing system, administrators and practitioners need to be able to access the textual information they need. It is our responsibility to get the most benefit from this resource for biomedical research and patient care. Clinical natural language processing (NLP) is an important part of the solution. The value of NLP has been recognized in the biomedical domain, as shown by funding for the following national initiatives focused on clinical NLP: Integrating Biology and the Bedside (i2b2), Consortium for Health Informatics Research (CHIR), VA Informatics and Computing Infrastructure (VINCI), Strategic Health IT Advanced Research Projects (SHARP), and Electronic Medical Records and Genomics (eMERGE). On the one hand, these efforts testify to the demand for NLP research: they have produced new NLP tools, created annotated datasets, developed common data models, shared semantic labels, and even piloted a prototype software ecosystem. On the other hand, the general consensus in the informatics community is that processing and utilizing textual data remains challenging due to a lack of interoperability and collaboration. Unless the pace of research and development in clinical NLP is accelerated, we cannot meet the increasing demand for NLP from the biomedical and health services research communities. Although synergistic development promises to advance the science of NLP and accelerate the pace of NLP tool production, there is no vibrant collaborative environment that attracts the participation of a significant number of clinical NLP developers and researchers. 
Within the VA CHIR and VINCI efforts, we have created a prototype NLP ecosystem called V3NLP that supports the interoperability and integration of heterogeneous tools into VA research and operational initiatives. However, the environment needed to foster collaboration and build a critical mass of users is lacking. In the proposed project, we will study the needs of existing and potential users of the V3NLP ecosystem to increase its utility and ease of adoption and to facilitate collaboration. The ultimate goal of an NLP ecosystem is to produce new and more accurate NLP methods for clinical text. This requires a good understanding of the characteristics of various types of clinical text and of the strengths and weaknesses of existing methods. Because most clinical NLP solutions have been driven by individual use cases and note collections, they are optimized for the characteristics of the specific NLP tasks and text corpora analyzed. Since there are numerous tasks and corpora, these solutions tend to be difficult to reuse, especially by other developers. To remedy this, we will study the characteristics of a very large and heterogeneous collection of VA text records to understand and model the sublanguages of VA clinical notes. This systematic and comprehensive sublanguage analysis will play a critical role in the proposed ecosystem, guiding both the development of new clinical NLP methods and the customization of existing solutions. Our overall goal is to accelerate clinical NLP research and development.
The specific aims are as follows: (1) Collect and analyze the needs of NLP developers, health informatics researchers and health services researchers to inform the design of a collaborative NLP ecosystem that will facilitate development of more accurate methods. (2) Design and implement a clinical NLP ecosystem that fosters collaboration and accelerates research and adoption of accurate and generalizable NLP methods. (3) Conduct a comprehensive sublanguage analysis to guide the creation of adaptable NLP tools and methods based on VA text notes to support text processing and information extraction across multiple clinical domains.

Public Health Relevance

The value of clinical natural language processing (NLP) has been recognized in the biomedical domain. Synergistic development promises to advance the science of NLP, but there is no vibrant collaborative environment that attracts significant participation from NLP developers, researchers, and informaticians. In the proposed project, we will collect and analyze the needs of NLP developers, researchers, and informaticians to inform the design of a collaborative NLP ecosystem. We will design and implement a clinical NLP ecosystem that fosters collaboration and accelerates research and adoption of accurate and generalizable NLP methods. We will also conduct a comprehensive sublanguage analysis based on VA text notes in order to guide the creation of adaptable NLP tools and methods that support text processing and information extraction across multiple VA clinical domains.

Agency: National Institutes of Health (NIH)
Institute: Veterans Affairs (VA)
Type: Non-HHS Research Projects (I01)
Project #: 5I01HX001145-03
Application #: 9117281
Study Section: HCR 7 - CREATE Diabetes (HCR7)
Project Start: 2013-12-01
Project End: 2018-11-30
Budget Start: 2015-12-01
Budget End: 2016-11-30
Support Year: 3
Fiscal Year: 2018
Name: VA Salt Lake City Healthcare System
DUNS #: 009094756
City: Salt Lake City
State: UT
Country: United States
Zip Code: 84148