Data is a key resource in this information age. The availability of data, however, often raises privacy concerns. Many data sharing scenarios require that data be anonymized for privacy protection. Most existing data anonymization techniques, however, satisfy only weak privacy notions that rely on particular assumptions about the adversaries, and thus provide inadequate protection. In recent years, the elegant notion of differential privacy has gradually been accepted as the privacy notion of choice for answering statistical queries. Most research on differential privacy, however, focuses on answering interactive queries, and there are several negative results on publishing microdata while satisfying differential privacy. Nevertheless, many data sharing scenarios require the sharing of microdata, and research is needed to bridge this gap.
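To illustrate the interactive-query setting that most differential privacy research addresses, the following is a minimal sketch of the standard Laplace mechanism applied to a counting query. The helper names are hypothetical; this is only an illustration of the privacy notion, not a technique proposed by the project.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Sample from a Laplace(0, scale) distribution via inverse-CDF sampling.
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon: float) -> float:
    # A counting query has sensitivity 1: adding or removing one record
    # changes the count by at most 1. Adding Laplace noise with scale
    # 1/epsilon therefore satisfies epsilon-differential privacy.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

For example, `dp_count(ages, lambda a: a >= 65, epsilon=0.1)` returns a noisy count of senior records; smaller `epsilon` means stronger privacy but noisier answers.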
This project aims to bridge the gap between the elegant notion of differential privacy and the practical difficulty of publishing microdata while preserving utility. Building on the PI's preliminary results on using random sampling together with "safe" k-anonymization to satisfy differential privacy, the project aims to advance the state of the art in both scientific understanding and specific techniques for privacy-preserving microdata publishing. Research activities include developing (1) practical anonymization methods that can be proven to satisfy differential privacy while remaining capable of handling high-dimensional data; (2) relaxations of differential privacy that are more suitable for microdata publishing; (3) privacy theory and techniques that are easily applied to a family of data sanitization algorithms called localized algorithms, enabling the use of input perturbation techniques for provably-private microdata publishing; and (4) privacy notions and techniques for publishing social network data and network trace data.
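The "random sampling plus safe k-anonymization" idea mentioned above can be sketched as follows. This is a hypothetical, simplified pipeline under assumed helper names, not the project's actual algorithm: each record is sampled independently, quasi-identifiers are generalized by a data-independent mapping (the "safe" requirement), and groups smaller than k are suppressed.

```python
import random
from collections import Counter

def sample_and_anonymize(records, generalize, beta: float, k: int):
    # Step 1: include each record independently with probability beta.
    sampled = [r for r in records if random.random() < beta]
    # Step 2: generalize quasi-identifiers. The mapping must be
    # data-independent (fixed in advance) for the scheme to be "safe".
    generalized = [generalize(r) for r in sampled]
    # Step 3: suppress equivalence classes with fewer than k records.
    counts = Counter(generalized)
    return [g for g in generalized if counts[g] >= k]
```

The intuition is that the uncertainty introduced by sampling, combined with suppression of small groups, can be shown (for suitable beta and k) to satisfy a relaxed, (epsilon, delta)-style differential privacy guarantee, which is what makes the combination attractive for microdata publishing.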
Advances in data anonymization techniques will benefit society by providing a better balance between the need to release data to serve the public interest and the need to protect individuals' privacy. The project also includes the development of a graduate seminar course on data privacy and supports two graduate students.