This Small Business Innovation Research Phase I project seeks to develop a novel prototype system for privacy protection in information-sharing and data-mining environments. The research addresses society's growing concerns about invasions of individual privacy by information technology in general, and by data mining in particular. The intellectual merit of the proposed project is fourfold. First, it identifies the problem of privacy breaches in de-identified data. A common practice in many organizations today is to remove identifying attributes from customer records (a process called de-identification) before releasing them to third parties. This research analyzes the disclosure risks that remain in such de-identified data. Second, the primary objective of the proposed research is to develop a privacy protection system that addresses this problem, initially in the health-care domain. The project will take a systematic approach to developing methods and algorithms, conducting experimental evaluations with health-care providers, and producing a tested, commercially viable privacy protection solution. Third, the proposed research integrates a variety of techniques, such as linear programming, Bayes estimation, kd-trees, and data masking, in a novel way. The proposed approach overcomes several limitations of existing approaches and can be flexibly integrated with other related techniques. Fourth, the project is expected to result in a software solution consisting of a set of techniques that organizations can use to protect privacy while still providing data to researchers and partners for research that benefits both industry and society at large.
This project addresses pressing privacy concerns and will therefore have broad impact in today's information-rich society. It has the potential for significant commercial impact and has secured strong commitments from industry. Several organizations have expressed interest in acquiring the resulting products. Three organizations have agreed to participate in the study by providing data for experiments and feedback on the final prototype.