Modern data sets are largely unlabeled. Unsupervised learning of useful representations that expose the structure in data is a critical challenge in data science and machine learning; it finds application in computational and social science, including information retrieval, web mining, and recommendation systems. As we progress further into the age of big data, and the amount of data to be processed grows faster than our computational resources, better and faster methods for unsupervised learning and data analysis on such data sets become ever more necessary. Furthermore, with the advent of the Internet of Things, private data is collected ubiquitously and seamlessly through devices and services such as smartphones, cameras, microphones, radio-frequency identification (RFID) readers, and social networks, raising serious concerns about individual privacy. Therefore, in this project, we initiate a formal investigation into privacy-aware unsupervised learning for big data applications.

Taking a stochastic optimization view of unsupervised learning, we capture more general learning problems than those previously studied in the privacy literature. One such class is non-convex problems, such as matrix learning, tensor factorization, deep learning, and many more. While most of these problems are NP-hard, in practice solutions can often be found efficiently. We conjecture that noisy stochastic gradient descent updates, which have recently been shown to efficiently find local minima for a large class of non-convex problems, also guarantee privacy implicitly. Finally, we consider extensions of the privacy model from that of a single curator to distributed learning, the continual release model, the streaming model, and a novel sliding window model.
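
As an illustration of the kind of update the conjecture refers to, the following is a minimal sketch of noisy stochastic gradient descent on one non-convex unsupervised problem: recovering a leading principal direction by minimizing the per-sample loss (1/4)||x x^T - w w^T||_F^2. The function names, step size, noise level, and synthetic data are assumptions made for this example only. The sketch is not the project's algorithm, and no privacy guarantee is claimed for these parameter choices.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def noisy_sgd_pca(samples, dim, step=0.01, noise_std=0.01, epochs=20, seed=0):
    """Noisy SGD on the non-convex loss (1/4) * ||x x^T - w w^T||_F^2.

    All hyperparameters are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    # Small random initialization away from the saddle point at w = 0.
    w = [rng.gauss(0.0, 0.1) for _ in range(dim)]
    for _ in range(epochs):
        for x in samples:
            xw = dot(x, w)
            ww = dot(w, w)
            # Per-sample gradient: (w.w) w - (x.w) x
            grad = [ww * wi - xw * xi for wi, xi in zip(w, x)]
            # Gradient step with independent Gaussian noise on each coordinate.
            w = [wi - step * (gi + rng.gauss(0.0, noise_std))
                 for wi, gi in zip(w, grad)]
    return w


# Synthetic data (an assumption): large variance along the first axis,
# small variance along the second.
data_rng = random.Random(1)
samples = [[data_rng.gauss(0.0, 1.5), data_rng.gauss(0.0, 0.2)]
           for _ in range(500)]
w = noisy_sgd_pca(samples, dim=2)
```

On this synthetic data the iterate should align with the first axis, the direction of largest variance. The injected noise both helps the iterate leave the saddle point at the origin and perturbs each individual update, which is the property the conjecture above connects to privacy.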

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2018-10-01
Budget End: 2021-09-30
Fiscal Year: 2018
Total Cost: $911,398
Name: Johns Hopkins University
City: Baltimore
State: MD
Country: United States
Zip Code: 21218