This project builds a novel privacy-preserving framework with both new algorithms and software tools to: 1) evaluate the effectiveness of current identifier-suppression techniques for Electronic Healthcare Record (EHR) data; 2) de-identify and anonymize EHR data to protect personal information without significantly reducing the utility of data for secondary data analysis. The proposed techniques eliminate the violation of privacy through re-identification, and facilitate the secondary usage, sharing, publishing and exchange of healthcare data without the risk of breaching protected health information (PHI). This new privacy-preserving framework injects the ICD-9-CM-aware constraint-based privacy-preserving techniques into EHRs to eliminate the threat of identifying an individual in the secondary use of research data. The proposed technique and development can be readily adapted to other types of healthcare databases in order to ensure privacy and prevent re-identification of published data. The project produces groundbreaking algorithms and tools for identifying privacy leakages and protecting personal privacy information in EHRs to improve healthcare data publishing. New privacy-preserving techniques developed in this project lead towards a new type of healthcare science for EHRs. The project also delivers fundamental advancements to engineering by showing how to integrate biomedical domain knowledge with a computationally advanced quantitative framework for preserving the privacy of published EHRs. HIPAA has established protocols and industry standards to protect the confidentiality of PHI. However, our results demonstrate that, even with regard to health data that meets HIPAA requirements, the risk of re-identification is not completely eliminated. By identifying the security vulnerabilities inherent in the HIPAA standards, our research develops a more rigorous security standard that greatly improves privacy protections by applying state-of-the-art algorithms.

The developed data privacy-preserving framework has significant implications for the future of US healthcare data publishing and related applications. Specifically, the transition from paper records to EHRs has accelerated significantly since the passage of the HITECH Act of 2009. The Act provides monetary incentives for the "meaningful use" of EHRs. As a result, the quality and quantity of healthcare databases has risen sharply, which has renewed the public's fear of a breach of privacy of their medical information. This research work is innovative and crucial not only for facilitating EHR data publishing, but also for enhancing the development and promotion of EHRs. At the educational front, this project facilitates the development of novel educational tools to construct entirely new courses and laboratory classes for healthcare, data privacy, data mining, and a wide range of applications. As a result, it enhances the current instructional methods for teaching data privacy and data mining, and has compelling biomedical and healthcare applications that can facilitate learning of computational algorithms. This project involves both undergraduate and graduate students in the three participating institutions. The PIs make a strong effort to engage minority graduate and undergraduate students in research activities in order to increase their exposure to cutting-edge research.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1344072
Program Officer
Sylvia J. Spengler
Project Start
Project End
Budget Start
2014-01-01
Budget End
2017-12-31
Support Year
Fiscal Year
2013
Total Cost
$198,000
Indirect Cost
Name
University of North Texas Health Science Center at Fort Worth
Department
Type
DUNS #
City
Fort Worth
State
TX
Country
United States
Zip Code
76107