Information technology and electronic communications have been rapidly applied to every sphere of human activity, including commerce, medicine and social networking. The concomitant emergence of myriad large, centralized, searchable data repositories has made "leakage" of private information via data correlation (whether inadvertent or by malicious design) an urgent societal problem. Maintaining the usefulness of these data sources while also providing the necessary privacy guarantees remains an unsolved problem. This problem drives the need for an overarching analytic framework that can tell us unequivocally how safe private data can be (privacy) while still providing useful benefit (utility) to multiple legitimate information consumers.
This research develops a unified framework to study the utility-privacy tradeoff irrespective of the type of data source or method of perturbation. Techniques and results from rate-distortion theory are used to model data sources, to develop application-independent utility and privacy metrics, and to develop a side-information model for dealing with questions of external knowledge. The framework, initially applicable to single-query data source models, is extended to study the utility-privacy tradeoffs for multiple-query models. Also studied is a successive disclosure problem that draws on classic results in successive refinement to develop the conditions under which multiple queries result in no additional information loss. The universal framework developed includes tools and techniques to bridge the gap between the information-theoretic model on the one hand and, on the other, the current approaches and the dominant theoretical framework in computer science.
The ubiquity of technologies such as on-line data repositories, biometric identification systems, financial (e.g., credit card) databases, healthcare information systems and smart electricity meters has created new challenges in information security and privacy. The research pursued under this project has developed a fundamental framework for examining, in a general setting, the tradeoff between the privacy of data in such systems and its measurable benefits. Although earlier approaches have considered the issue of data privacy alone, this new ability to understand the basic tradeoff between data privacy and the usefulness of data provides a means for developing methods and protocols for use in practical applications. With the support of this grant, the methodology has been applied in the areas of smart electricity metering, biometric identification systems, and general databases. The importance of this work to society is that it provides a way to understand the basic tradeoffs inherent in using data and in keeping data private. These are opposing goals, and so they must both be considered when designing protocols for information systems. Given the widespread and growing importance of this issue, the research conducted under this grant has the potential to guide this critical area of technology development.