Practical privacy-preserving machine learning methods are of critical importance in medical, financial, and consumer applications, among others. The aim of this project is to develop practical private machine learning algorithms that can be easily implemented by practitioners in any field that handles sensitive data, while maintaining robust privacy guarantees. The proposed research will extend the existing rigorous theoretical guarantees of differential privacy to meet the requirements of modern machine learning algorithms in concrete practical settings. The intellectual merit therefore spans from theory to practical algorithms. The resulting methods have the potential to benefit existing real-world applications in many high-impact domains. The PIs have several ongoing, successful collaborations with medical practitioners and researchers and will evaluate the resulting algorithms on real patient data in high-impact medical applications. All algorithms will be released as open source. Both PIs are dedicated to actively recruiting minority students and involving undergraduate students in research.

Differential privacy (DP) is now recognized as one of the most rigorous and practically usable notions of statistical privacy, and has become a full-fledged research field. The aim of this research is to provide reliable privacy guarantees for practical machine learning algorithms, which is invaluable for protecting individuals who volunteer their sensitive data for research purposes. We identify several areas with high impact potential and propose four concrete research thrusts (a minimal sketch of the core DP primitive appears after the list).

(1) Private Causal Inference. Causal inference is one of the most promising new directions in machine learning and has recently become practical. Some of the most interesting causal questions, however, involve medical or government policy data, which are inherently sensitive. We propose to unite the recent breakthroughs in both fields (causal inference and DP) and derive a practical, theoretically sound method for differentially private causal inference.

(2) Privacy for Bayesian Global Optimization. The success of deep learning has created a surge in popularity for Bayesian Global Optimization (BGO) for hyper-parameter tuning. Simultaneously, recent publications have tied the stability properties of differential privacy to generalization in adaptive data analysis. We propose to unite these recent developments and improve the generalization of BGO using insights from DP. Here, we are not protecting individuals from privacy leaks but algorithms from overfitting, allowing fine-grained trade-offs of "privacy" versus efficacy.

(3) Private Communication-Efficient Distributed Learning. In response to the growth of data distributed over multiple machines, we aim to design practical, private, and communication-efficient algorithms for supervised and unsupervised learning problems. This work will build on our recent work on distributed learning and clustering algorithms.

(4) Practical Private Active Learning. In the age of big data, there has been tremendous interest, both within machine learning and across its application areas, in designing active learning algorithms that use the available data most efficiently while minimizing the need for human intervention. Recently, there have been exciting results on the underlying statistical and computational principles (including work by the PIs). This research will develop new foundations and new practical, well-founded active learning algorithms that are not only statistically and computationally efficient but also differentially private.
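As background for the thrusts above: a randomized algorithm M is epsilon-differentially private if, for any two datasets D and D' differing in a single individual's record and any set S of outputs, Pr[M(D) in S] <= exp(epsilon) * Pr[M(D') in S]. The sketch below shows the standard Laplace mechanism, the basic primitive for achieving this guarantee when releasing a numeric statistic. It is a textbook illustration rather than code from the proposal; the example data, the assumed [0, 100] attribute range, and the choice epsilon = 0.5 are hypothetical.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value with epsilon-differential privacy by adding
    Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Hypothetical example: privately release the mean of a bounded attribute.
# Each record is assumed to lie in [0, 100], so changing one of n records
# shifts the mean by at most 100 / n (the sensitivity of the query).
ages = np.array([34.0, 51.0, 29.0, 62.0, 45.0])
sensitivity = 100.0 / len(ages)
epsilon = 0.5  # smaller epsilon = stronger privacy but noisier answer
private_mean = laplace_mechanism(ages.mean(), sensitivity, epsilon)
print(f"true mean: {ages.mean():.2f}, private mean: {private_mean:.2f}")
```

The noise scale grows with the query's sensitivity and shrinks with epsilon, which is exactly the "privacy" versus efficacy trade-off referenced in thrust (2).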

Budget Start: 2016-09-01
Budget End: 2019-08-31
Fiscal Year: 2016
Total Cost: $249,729
Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213