A Theoretical Foundation for Achievability and Optimization in Privacy-Preserving Data Mining
Data mining has been successfully applied to support a variety of applications, including marketing, weather forecasting, medical diagnosis, and homeland security. Mining data without violating the privacy of the data being mined, however, remains a critical challenge. How to mine patients' personal information, for example, is an ongoing problem in healthcare applications. Emerging privacy legislation, such as the Health Insurance Portability and Accountability Act (HIPAA), as well as heightened public concern about privacy protection, requires immediate and resolute attention from the computing community to the protection of private information in data mining.
This research involves the understanding, analysis, and optimization of the tradeoffs among privacy protection, accuracy of data mining, and system resources in privacy-preserving data mining. The methodology is to establish a solid theoretical foundation that defines the requirements for privacy protection in data mining, identifies the domain of privacy-preserving strategies, and determines the achievability of such strategies. This theoretical foundation enables the design and optimization of privacy-preserving data mining algorithms that are realistic, generic, and efficient. The research results of this project have broader impacts on the nation's higher education system and high-tech industries. The ability to mine private data without violating the privacy of data owners is essential for a wide variety of corporations, universities, hospitals, and government agencies. Similarly, theoretically and empirically validated means of protecting privacy in data mining would benefit privacy-conscious individuals at large. The impact of this project also extends to academia through educational efforts, including graduate and undergraduate student training, curriculum development, seminars, and outreach.