The adoption of Electronic Health Record (EHR) systems is growing at a fast pace in the U.S., and this growth results in very large quantities of patient clinical data becoming available in electronic format, with tremendous potentials, but also equally growing concern for patient confidentiality breaches. Secondary use of clinical data is essential to fulfill the potentials for high quality healthcare, improved healthcare management, and effective clinical research. NIH expects that larger research projects share their research data in a way that protects the confidentiality of the research subjects. De-identification of patient data has been proposed as a solution to both facilitate secondary uses of clinical data, and protect patient data confidentiality. The majority of clinical data found in the EHR is represented as narrative text clinical notes, and de-identification of clinical text is a tedious ad costly manual endeavor. Automated approaches based on Natural Language Processing have been implemented and evaluated, allowing for much faster de- identification than manual approaches. Clinacuity, Inc. proposes to develop a new system to automatically de-identify clinical notes found in the EHR, to then improve the availability of clinical text for secondary uses, as well as ameliorate the protection of patient data confidentiality. To establish the merit and feasibility of such a system, we will work on the following objectives: 1) create a reference standard for training and testing the text de-identification application, a reference standard that will include a random sample of clinical narratives with protected health information annotated by domain experts; 2) develop a prototype to automatically de-identify clinical text in near real-time, a prototype that will implement a novel stepwise hybrid approach to maximize sensitivity first (our priority for de-identification), and then filter out false positives to enhance positive predictive value, and also replace PHI with realistic substitutes for improved protection; and 3) test the prototype with the aforementioned reference standard, using a cross-validation approach for training and testing, and also train and test the prototype with a reference standard from another healthcare organization. Commercial application: To strengthen patient information confidentiality protection, the HITECH Act heightened financial penalties incurred for breaches of PHI, even introducing criminal penalties. These new severe consequences for violation of patient information confidentiality render protection requirements even more obvious, and automatic high-accuracy clinical text de-identification, as offered by the system Clinacuity proposes, will strongly help healthcare and clinical research organizations avoid such consequences. This system has potential commercial applications in clinical research and in healthcare settings. It will improve access to richer, more detailed, and more accurate clinical data (in clinical text) for clinical researchers. It will also ease research data sharing, and help healthcare organizations protect patient data confidentiality.

Public Health Relevance

The adoption of Electronic Health Record systems is growing at a fast pace in the U.S., and this growth results in very large quantities of patient clinical daa becoming available in electronic format, with tremendous potentials, but also equally growing concern for patient confidentiality breaches. Secondary use of clinical data is essential to fulfil the potentials for high quality healthcare and effective clinical research. De-identification of patient data has been proposed as a solution to both facilitate secondary uses of clinical data, and protect patient data confidentiality. The majority of clinical data found in the EHR is represented as narrative text clinical notes that have been dictated and transcribed or directly typed in, and de-identification of clinical text is a tedious and costly manual endeavor. Automated approaches based on Natural Language Processing have been implemented and evaluated, allowing for much faster de-identification than manual approaches. The overall goal of this project is to develop a new system to automatically de-identify clinical narrative text in he Electronic Health Record, to then improve the availability of clinical text for secondary uses, as well as ameliorate the protection of patient data confidentiality.

Agency
National Institute of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Small Business Technology Transfer (STTR) Grants - Phase I (R41)
Project #
1R41GM116479-01
Application #
8982010
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Ravichandran, Veerasamy
Project Start
2016-02-15
Project End
2017-02-14
Budget Start
2016-02-15
Budget End
2017-02-14
Support Year
1
Fiscal Year
2016
Total Cost
Indirect Cost
Name
Clinacuity,Inc.
Department
Type
DUNS #
078649023
City
Salt Lake City
State
UT
Country
United States
Zip Code
84101