A Theoretical Foundation for Achievability and Optimization in Privacy-Preserving Data Mining

Data mining has been successfully applied to support a variety of applications, including marketing, weather forecasting, medical diagnosis, and homeland security. Mining data without violating the privacy of data being mined, however, is still a critical challenge. How to mine patients' personal information, for example, is an ongoing problem in healthcare applications. Emerging privacy legislation, such as the Health Insurance Portability and Accountability Act (HIPAA), as well as the heightened public concerns about privacy protection, require immediate and resolute attention from the computing community on the protection of private information in data mining.

This research involves the understanding, analysis, and optimization of the tradeoff between privacy protection, accuracy of data mining, and system resources in privacy-preserving data mining. The methodology is to establish a solid theoretical foundation that defines the requirements for privacy protection in data mining, identifies the domain of privacy-preserving strategies, and determines the achievability of such strategies. This theoretical foundation enables the design and optimization of privacy-preserving data mining algorithms that are realistic, generic, and efficient. The research results of this project have broader impacts on the nation's higher education system and high-tech industries. The ability to mine private data without violating the privacy of data owners is a must for a wide variety of corporations, universities, hospitals, and government agencies. Similarly, theoretically and empirically validated means to protect privacy in data mining would benefit all privacy-concerned individuals at large. The impact of this project also extends to academia through educational efforts, including graduate and undergraduate student training, curriculum development, seminars, and outreach.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Application #: 0852674
Program Officer: Dmitry Maslov
Project Start:
Project End:
Budget Start: 2008-09-01
Budget End: 2013-12-31
Support Year:
Fiscal Year: 2008
Total Cost: $440,926
Indirect Cost:
Name: George Washington University
Department:
Type:
DUNS #:
City: Washington
State: DC
Country: United States
Zip Code: 20052