A defining feature of the present era is the dissemination of massive amounts of personal and sensitive data. Despite their enormous societal benefits, the powerful tools of modern machine learning, especially deep learning, can pose real threats to personal privacy. Over the last few years, it has become evident that deep neural networks are remarkably effective at learning even the finest details of large, complex data sets. With such powerful tools in hand, the need for robust and rigorous privacy guarantees has become ever more crucial. The last decade has witnessed the rise of a sound mathematical theory, known as differential privacy, that enables the design of data-analysis algorithms with rigorous privacy guarantees for their input data sets. Despite the notable success of this theory, existing tools from differential privacy offer severely limited utility guarantees for complex models such as those arising in deep learning. This project will address those limitations by offering new principled approaches for designing differentially private deep-learning algorithms that can scale to industrial workloads. The project also involves collaboration with industry, which will facilitate the evaluation of the developed algorithms on real-world datasets and the development of open-source software tools. The products of this project have the potential to transform the way massive sets of sensitive data are used in modern machine-learning systems, influencing how these systems are designed and implemented in practice. The project's activities will also aim to promote diversity in computing by recruiting women and members of underrepresented groups.

The investigators will develop a rigorous, multi-faceted design paradigm for scalable, practical, differentially private algorithms for modern machine learning. This paradigm is based on two general strategies: (i) exploiting realistic and useful properties of the data and of the machine-learning models to circumvent existing limitations in the differential-privacy literature, and (ii) leveraging a limited amount of public data (which carries no privacy constraints) to boost the utility of the algorithms. Based on these strategies, the project will pursue the following directions: (1) developing a new, generic framework for utilizing public data in privacy-preserving machine learning; (2) designing improved iterative training algorithms that can bypass the standard use of the so-called "composition theorem" of differential privacy; and (3) designing new differentially private stochastic-gradient methods tuned specifically to non-convex, over-parameterized machine-learning problems.
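For context, the kind of private stochastic-gradient method referenced in direction (3) is typically built on per-example gradient clipping plus calibrated Gaussian noise, with the overall privacy loss tracked across iterations via composition. The sketch below is a minimal illustration of that standard mechanism, not the project's actual algorithms; the function name, the toy least-squares objective, and all hyperparameters (`clip_norm`, `noise_multiplier`, `lr`) are illustrative assumptions.

```python
import numpy as np

def dp_sgd_step(w, X, y, clip_norm=1.0, noise_multiplier=1.1, lr=0.1, rng=None):
    """One differentially private SGD step on the squared loss 0.5*(x.w - y)^2.

    Each example's gradient is clipped to L2 norm <= clip_norm, the clipped
    gradients are summed, Gaussian noise with standard deviation
    noise_multiplier * clip_norm is added to the sum, and the noisy average
    is used for the update. Each such step is a Gaussian mechanism; the total
    privacy cost over many steps is tracked by a composition argument.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    residuals = X @ w - y                       # shape (n,)
    grads = residuals[:, None] * X              # per-example gradients, (n, d)
    # Clip each example's gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip_norm)
    # Add noise calibrated to the clipping norm, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (grads.sum(axis=0) + noise) / n
    return w - lr * noisy_mean

# Toy usage: recover a linear model from synthetic data under DP-SGD.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
w_true = np.ones(5)
y = X @ w_true
w = np.zeros(5)
for _ in range(200):
    w = dp_sgd_step(w, X, y, rng=rng)
```

The utility limitation the project targets is visible here: the noise scale needed for a strong guarantee grows with the number of composed steps, which is exactly why bypassing naive composition (direction 2) or exploiting public data (direction 1) can improve accuracy.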

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

University of California Santa Cruz, Santa Cruz, United States