A wealth of data about individuals is constantly accumulating in various databases in the form of medical records, social network graphs, mobility traces in cellular networks, search logs, and movie ratings, to name only a few. There are many valuable uses for such datasets, but it is difficult to realize these uses while protecting privacy. Even when data collectors try to protect the privacy of their customers by releasing anonymized or aggregated data, this data often reveals much more information than intended. To reliably prevent such privacy violations, we need to replace the current ad hoc solutions with a principled data release mechanism that offers strong, provable privacy guarantees. Recent research on DIFFERENTIAL PRIVACY has brought us a big step closer to achieving this goal. Differential privacy allows us to reason formally about what an adversary could learn from released data, while avoiding the need for many assumptions (e.g., about what an adversary might already know) whose failure has been the cause of privacy violations in the past. However, despite its great promise, differential privacy is still rarely used in practice. Proving that a given computation can be performed in a differentially private way requires substantial manual effort by experts in the field, which prevents the approach from scaling in practice.
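
As an illustration of the guarantee (not part of the proposal itself), the standard Laplace mechanism shows the basic idea: a counting query changes by at most 1 when any one individual's record is added or removed, so adding Laplace noise with scale 1/epsilon makes the released count epsilon-differentially private. The following is a minimal Python sketch; the function names and sample data are hypothetical.

    import random

    def laplace_noise(scale):
        # The difference of two i.i.d. Exponential draws with rate 1/scale
        # is distributed as Laplace(0, scale).
        return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

    def private_count(records, predicate, epsilon):
        # A counting query has sensitivity 1, so Laplace noise with scale
        # 1/epsilon yields an epsilon-differentially private release.
        true_count = sum(1 for r in records if predicate(r))
        return true_count + laplace_noise(1.0 / epsilon)

    # Hypothetical data: release the number of patients over 65 with epsilon = 0.1.
    patients = [{"age": 70}, {"age": 34}, {"age": 66}]
    print(private_count(patients, lambda r: r["age"] > 65, epsilon=0.1))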

This project aims to put differential privacy to work: to build a system that supports differentially private data analysis, can be used by the average programmer, and is general enough to be used in a wide variety of applications. Such a system could be used pervasively and make strong privacy guarantees a standard feature wherever sensitive data is being released or analyzed. Specific contributions will include ENRICHING THE FUNDAMENTAL MODEL OF DIFFERENTIAL PRIVACY to address practical issues such as data with inherent correlations, the need for increased accuracy, privacy of functions, and privacy for streaming data; DEVELOPING A DIFFERENTIALLY PRIVATE PROGRAMMING LANGUAGE, along with a compiler that can automatically prove programs in this language to be differentially private, and a runtime system that is hardened against side-channel attacks; and SHOWING HOW TO APPLY DIFFERENTIAL PRIVACY IN A DISTRIBUTED SETTING in which the private data is spread across many databases in different administrative domains, with possible overlaps, heterogeneous schemata, and different expectations of privacy. The long-term goal is to combine ideas from differential privacy, programming languages, and distributed systems to make data analysis techniques with strong, provable privacy guarantees practical for general use. The themes of differential privacy are also being integrated into Penn's new undergraduate curriculum on Market and Social Systems Engineering.
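
To make the programming-language goal concrete, the sketch below (hypothetical, and not the proposed language, compiler, or API) shows the kind of bookkeeping such a system must enforce automatically. By the standard sequential composition theorem, running an eps1-differentially private query followed by an eps2-differentially private query on the same data is (eps1 + eps2)-differentially private, so a runtime can track a cumulative privacy budget and refuse queries that would exceed it.

    import random

    def noisy_count(records, epsilon):
        # Counting query with sensitivity 1: add Laplace noise of scale 1/epsilon
        # (sampled as the difference of two exponentials with rate epsilon).
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return len(records) + noise

    class PrivacyBudget:
        # Tracks cumulative epsilon and refuses queries once the budget is spent,
        # enforcing the sequential composition bound.
        def __init__(self, total_epsilon):
            self.total_epsilon = total_epsilon
            self.spent = 0.0

        def run(self, query, records, epsilon):
            if self.spent + epsilon > self.total_epsilon:
                raise RuntimeError("privacy budget exhausted")
            self.spent += epsilon
            return query(records, epsilon)

    budget = PrivacyBudget(total_epsilon=1.0)
    records = [1, 2, 3, 4]                        # hypothetical sensitive records
    print(budget.run(noisy_count, records, 0.5))  # allowed: 0.5 of 1.0 spent
    print(budget.run(noisy_count, records, 0.5))  # allowed, and the budget is now exhausted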

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 1065060
Program Officer: Sol Greenspan
Project Start:
Project End:
Budget Start: 2011-03-15
Budget End: 2017-02-28
Support Year:
Fiscal Year: 2010
Total Cost: $1,199,950
Indirect Cost:
Name: University of Pennsylvania
Department:
Type:
DUNS #:
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104