Privacy is critical to freedom of creativity and innovation. Assured privacy protection offers unprecedented opportunities for industry innovation, scientific and engineering discovery, and new life-enhancing experiences. The ability to perform efficient yet privacy-preserving big data computations in the Cloud holds great potential for safe and effective data analytics, such as enabling health-care applications to provide personalized medical treatments using an individual's DNA sequence, or enabling advertisers to create targeted advertisements by mining a user's clickstream and social activities, without violating data privacy. The PrivacyGuard project is developing algorithms, systems and tools that provide end-to-end privacy guarantees over the life cycle of a data analytic job. The end-to-end privacy guarantee can be measured by how difficult it is to learn about the original sensitive data from the sanitized data releases, the intermediate results of execution, and the output of an analytic job. The ultimate goal of PrivacyGuard is to develop a methodical framework and a suite of techniques for ensuring that distributed computations meet the desired privacy requirements of their input data, and for protecting against disclosure of sensitive patterns during execution and in the final output of the computation.

The PrivacyGuard project advances the knowledge and understanding of privacy-preserving distributed computation from three perspectives: (1) It designs formal mechanisms to formulate a data owner's end-to-end privacy requirement for each data release, for example, by associating each data release with a well-defined usage scope that confines the set of data analytics models and algorithms allowed to operate on the released data. (2) It develops a suite of execution privacy guards with dual objectives: to audit and enforce privacy compliance during distributed computation against data-flow based privacy violations, and to guard the compliance of input privacy. (3) It devises a proactive approach to output privacy that guards against information leakage associated with mining output, for example, by leveraging the differential privacy model to maximize the upper bound on the data privacy guarantee and minimize the lower bound on data utility loss. The PrivacyGuard project is the first effort towards a practical and systematic implementation framework for ensuring end-to-end privacy in distributed big data computations. Furthermore, by integrating the PrivacyGuard research with curriculum development for big data systems and analytics courses at Georgia Institute of Technology, it contributes to the education and training of a new generation of data scientists as privacy compliance advocates.
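
As an illustration of the differential privacy model mentioned in (3), the following is a minimal sketch of the Laplace mechanism applied to a counting query. It is a standard textbook technique, not code from the PrivacyGuard project; the function names `laplace_noise` and `private_count` and the parameter `epsilon` are hypothetical names chosen for this example. A smaller `epsilon` gives a stronger privacy guarantee at the cost of a noisier, less useful answer, which is the privacy and utility trade-off the abstract refers to.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching `predicate` (illustrative only)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 58, 62, 67, 71]
    # Noisy answer to "how many people are 60 or older?"
    print(private_count(ages, lambda a: a >= 60, epsilon=0.5))
```

Each call draws fresh noise, so repeated queries over the same data consume additional privacy budget; a real system would need to track that budget across an entire analytic job.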

Project Start:
Project End:
Budget Start: 2016-05-01
Budget End: 2022-04-30
Support Year:
Fiscal Year: 2015
Total Cost: $1,199,999
Indirect Cost:
Name: Georgia Tech Research Corporation
Department:
Type:
DUNS #:
City: Atlanta
State: GA
Country: United States
Zip Code: 30332