Privacy concerns can prevent the construction of a centralized data warehouse to support data mining. For example, the Centers for Disease Control (CDC) may want to mine insurance companies' data to identify trends and patterns in disease outbreaks, such as understanding and predicting the progression of a flu epidemic. Gathering all patient data into a single warehouse increases the opportunities for privacy breaches and misuse. We propose an alternative: secure collaborative computing among the parties holding the data, which produces the desired data mining results while provably preventing disclosure of private data.

This project will enable knowledge discovery under the following assumptions: (1) data are distributed across multiple sources, with security or privacy concerns that limit data sharing, and (2) if the data were gathered into a centralized warehouse, data mining tools could identify patterns or relationships that yield useful knowledge. The techniques developed will replicate or approximate the results of centralized data mining, with quantifiable limits on the disclosure of data from each site. The goal is to develop a toolkit of privacy-preserving distributed computation techniques that can be assembled to solve specific real-world problems. Making the assembly of these components a matter of development rather than research will make widespread use of privacy-preserving distributed data mining feasible.
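To make the idea of a reusable privacy-preserving component concrete, the sketch below shows a ring-based secure sum, a well-known building block of this kind. The abstract does not say which primitives the toolkit contains, so this is an illustration under stated assumptions, not the project's method: the function name secure_sum, the modulus parameter, and the single-process simulation of message passing are all hypothetical, and the sites are assumed to follow the protocol honestly and not to collude.

```python
import secrets


def secure_sum(site_values, modulus):
    """Simulate a ring-based secure sum over the private values of several sites.

    Assumptions (illustrative only): at least three sites, every value lies in
    [0, modulus), the true total is below modulus, and no sites collude.
    """
    if len(site_values) < 3:
        # With two sites, each could recover the other's value from the total.
        raise ValueError("secure sum needs at least three sites")
    if any(not 0 <= v < modulus for v in site_values):
        raise ValueError("each value must lie in [0, modulus)")

    # The initiating site picks a uniformly random mask known only to itself.
    mask = secrets.randbelow(modulus)

    # The initiator sends (mask + own value) mod modulus to the next site.
    running = (mask + site_values[0]) % modulus

    # Each later site adds its own value to the message it received and
    # forwards it. The message looks uniformly random to that site, so it
    # reveals nothing about the values added earlier in the ring.
    for value in site_values[1:]:
        running = (running + value) % modulus

    # The message returns to the initiator, which removes the mask.
    return (running - mask) % modulus


# Hypothetical weekly flu case counts held by three insurers.
total = secure_sum([12, 7, 30], modulus=1000)
```

In this simulation every site learns the total number of cases, but the only thing any site sees in transit is a masked running sum. A real deployment would also need authenticated channels between sites and a defense against colluding neighbors, both of which are outside this sketch.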

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0312357
Program Officer: Gia-Loi Le Gruenwald
Project Start:
Project End:
Budget Start: 2003-08-15
Budget End: 2006-09-30
Support Year:
Fiscal Year: 2003
Total Cost: $282,274
Indirect Cost:
Name: Purdue University
Department:
Type:
DUNS #:
City: West Lafayette
State: IN
Country: United States
Zip Code: 47907