Group-based anonymization is the most widely studied approach to privacy-preserving data publishing; it includes k-anonymity, l-diversity, and t-closeness, among others. This proposal raises a fundamental privacy exposure of the approach that has been overlooked in the past, and aims to develop a computationally efficient solution. Group-based anonymization hides each individual record within a group to preserve privacy. However, patterns can still be mined from the published anonymized data and used by an adversary to breach individual privacy. The objective of this research is therefore to develop novel group-based anonymization methods that can defend against such an attack. The first part of the project defines the attack problem, i.e., it shows that published anonymized data can in fact be mined for privacy attacks, and it identifies and formulates the resulting privacy exposure. The second part conducts a systematic study of how exposed existing privacy techniques are to the attack. The third part derives a condition under which the attack can be resisted and develops efficient data publishing algorithms that prevent it.
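To make the notion of hiding a record behind a group concrete, the following is a minimal Python sketch of the standard k-anonymity check, together with a distinct l-diversity check. It illustrates the existing group-based definitions only, not the attack or the defense studied in this proposal. The table, the attribute names (age, zip, disease), and the function names are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k records."""
    group_sizes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(size >= k for size in group_sizes.values())


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """Return True if every group contains at least l distinct
    values of the sensitive attribute."""
    values_per_group = defaultdict(set)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        values_per_group[key].add(record[sensitive])
    return all(len(values) >= l for values in values_per_group.values())


# Hypothetical published table with generalized quasi-identifiers.
published = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]

print(is_k_anonymous(published, ["age", "zip"], 2))
print(is_distinct_l_diverse(published, ["age", "zip"], "disease", 2))
```

In this hypothetical table every record shares its quasi-identifier values with one other record, so the table is 2-anonymous, yet the second group carries a single disease value and therefore fails distinct 2-diversity. Satisfying one group-based condition does not by itself rule out every disclosure.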
Due to rapid advances in the storage, processing, and networking capabilities of computing devices, the collection of digital information about individuals has grown tremendously. The collected data offer great opportunities for mining useful information, such as research on patient records to devise personalized medicine, or research on trading records to devise more effective policies for avoiding a financial meltdown of banking systems. However, there is also a threat to privacy, because data in raw form often contain sensitive information about individuals. Privacy-preserving data publishing (PPDP) studies how to transform raw data into a version that is immunized against privacy attacks but still supports effective data analysis. Our work identifies a weakness of current PPDP approaches and proposes novel alternatives that will make the sharing of valuable data safer and more likely, so that the rich information can be used to build a better society. Group-based anonymization is the most widely studied approach to privacy-preserving data publishing, and our work is the first to identify its exposure. ε-differential privacy is another approach, designed for an interactive querying model. We propose a novel data publishing approach for the non-interactive setting based on ε-differential privacy. The work creates awareness of the weakness of current privacy-preserving data publishing schemes and provides an alternative by extending a privacy preservation scheme designed for the interactive query model to the non-interactive data publishing model. This will facilitate the sharing of data to advance data-driven research.
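For readers unfamiliar with ε-differential privacy in the interactive model, the following minimal Python sketch shows the standard Laplace mechanism applied to a single count query. It is not the non-interactive publishing approach proposed here, which this summary does not specify. The function names, the sample count, and the value of epsilon are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the
    given scale, as the difference of two exponential samples."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Answer a count query under epsilon-differential privacy by adding
    Laplace noise with scale sensitivity / epsilon. A count changes by at
    most 1 when one record is added or removed, hence sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
answer = noisy_count(true_count=100, epsilon=0.5, rng=rng)
print(answer)
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy. In the interactive model each answered query consumes part of the privacy budget, which is one reason publishing data once in a non-interactive setting needs a different design.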