Massive graphs arise in many social media applications, such as social networks, e-commerce recommendation systems, e-mail communication patterns, and other collaborative applications. Such data is often privacy-sensitive. Many privacy-preserving schemes have recently been proposed to protect network data when it is released. The question is how effective these schemes are at preventing the re-identification of nodes, i.e., at preserving the anonymity of the network nodes.
This project will highlight the inadequacy of current network anonymization schemes for massive, sparse graphs. It is important to understand the theoretical properties that make these schemes susceptible to re-identification attacks. Through a systematic study of the re-identification risks of existing approaches, and the development of new principles for anonymizing network data, we will deepen our understanding of the problem and be better able to protect data privacy. Designing a new type of attack algorithm and exposing the privacy risks of current network anonymization schemes can lead to fundamentally different thinking about how to perform privacy-preserving publishing of network data. The work provides new insights into how to devise anonymization schemes that protect the privacy of social network data.
One of the biggest obstacles to sharing information is concern about privacy. This project has the potential to make fundamental, disruptive advances in protecting the privacy of network data. It provides new insights into the inadequacy of current anonymization schemes. Many researchers need access to sensitive data, such as social network data and e-mail and communication patterns. By advancing knowledge of privacy-preserving data publishing, the project will lower the barriers to sharing data and thereby facilitate scientific research.