Social-networking sites (e.g., Facebook, MySpace, LinkedIn) and other online collaborative tools have emerged as places where people can post and share information. This information sharing has many benefits, ranging from practical (e.g., sharing a business document) to purely social (e.g., communicating with distant friends). At the same time, information sharing inevitably poses significant threats to user privacy. In social-networking sites, for example, documented threats range from identity theft to digital stalking and personalized spam. As a result, a growing number of such sites allow individual users to specify fine-grained policies that indicate who can access their data, and to what extent. However, studies have consistently shown that most end-users find the task of specifying access-control policies for their own data overwhelming; as a result, users often skip the process altogether.

The goal of this project is to help collaborative and social-media users gain control of their data. To that end, the project will include three main components: assisted specification, feedback, and refinement recommendations. To assist users in initially specifying access-control policies for their data, the project will develop a "privacy wizard," which employs data mining and machine learning methods, including active learning, to construct an accurate policy, with minimal input from the user. To provide feedback regarding existing privacy settings, the project will pursue two approaches: aggregate scores and visualizations. For example, an aggregate score can be used to concisely explain to the user how her settings differ from those of other users. Preliminary work found that Item Response Theory (IRT) can be used effectively for this purpose. Finally, the project will consider how aggregate scores and visual feedback can be enriched with recommendations for refinements to help the user achieve an expressed level of social exposure.
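To make the idea of an aggregate score concrete, the following is a minimal sketch of a naive exposure score. It is an illustrative simplification, not the IRT-based score mentioned above: it assumes each user's settings are a list of 0/1 flags (1 meaning the profile item is shared), treats an item as more sensitive the fewer users share it, and sums the sensitivities of the items a user shares. The function names and the sample data are hypothetical.

```python
def item_sensitivity(settings):
    """Estimate how sensitive each profile item is.

    settings: dict mapping user -> list of 0/1 flags (1 = item is shared).
    Assumption for this sketch: an item that few users share is sensitive,
    so sensitivity = 1 - (fraction of users who share the item).
    """
    n_users = len(settings)
    n_items = len(next(iter(settings.values())))
    shared_counts = [
        sum(flags[i] for flags in settings.values()) for i in range(n_items)
    ]
    return [1 - count / n_users for count in shared_counts]


def exposure_score(flags, sensitivity):
    """Sum the sensitivities of the items this user shares."""
    return sum(s for shared, s in zip(flags, sensitivity) if shared)


# Hypothetical settings for four users over three profile items.
settings = {
    "ann": [1, 1, 0],
    "bob": [1, 0, 0],
    "cy": [1, 1, 1],
    "dee": [1, 0, 0],
}
sensitivity = item_sensitivity(settings)
scores = {user: exposure_score(flags, sensitivity) for user, flags in settings.items()}
```

Comparing a user's score with the scores of other users gives a single number that indicates whether her settings are more or less exposed than is typical. An IRT-based score would replace the simple sensitivity estimate with fitted model parameters.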

Online collaborative tools and social media offer great promise in a number of arenas. In addition to communicating with friends via social networking sites, collaborative tools are now used in fields as diverse as business, medicine and education. However, the absence of usable privacy and access control prevents such tools from realizing their full potential. Results of this project will be disseminated via prototype implementations, as well as research publications. New undergraduate and graduate curriculum modules will also increase awareness of the importance of policy-specification and emerging research in this area.

Project Report

The research supported by this award investigated issues related to the privacy of publicly available data. The research revolved around three main themes:

1. How do we quantify the privacy one loses when participating in online activities? Can this quantification act as a measure for increasing the awareness of individual users with respect to the privacy (or lack thereof) of their online presence? For this research thrust we investigated the statistical properties of different privacy score measures and designed efficient algorithms for evaluating them.

2. Given a publicly available network dataset that reveals some of the connections or information exchanges between individual nodes, can we infer other hidden connections? Given the same dataset, can we also identify important and influential nodes in the network? For this second research theme, we designed algorithms for identifying important nodes in networks, i.e., nodes that initiated the propagation of different ideas in the network. We also designed algorithms for inferring hidden relationships and communication patterns between network nodes.

3. Finally, we investigated the following general question: "Given a set of aggregate statistics describing an underlying dataset, can we infer the individual points of the dataset?" This question has a wealth of different instantiations, depending on the particular application domain, and during the course of this project we explored several of them. For example, given information about the number of common neighbors of network nodes, we designed an algorithm for inferring the most probable underlying graph structure. Similarly, given the row and column marginals of a 0-1 table that records the products that individuals bought, we designed efficient algorithms for identifying the underlying table that indicates which customer purchased which product.
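As an illustration of the last instantiation, the sketch below builds one 0-1 table that is consistent with given row and column marginals, using the classical greedy construction associated with the Gale-Ryser theorem. This is not the project's algorithm: marginals usually admit many consistent tables, and this sketch returns only one of them (or `None` when none exists), without estimating which table is the most likely. The function name `table_from_marginals` is hypothetical.

```python
def table_from_marginals(row_sums, col_sums):
    """Return one 0-1 table with the given marginals, or None if none exists.

    Greedy construction: each row places its ones in the columns that
    still need the most ones. This yields a consistent table whenever
    one exists, but not necessarily the true underlying table.
    """
    if sum(row_sums) != sum(col_sums) or any(c < 0 for c in col_sums):
        return None
    remaining = list(col_sums)  # ones still needed in each column
    n_cols = len(remaining)
    table = []
    for r in row_sums:
        if r < 0 or r > n_cols:
            return None
        # Pick the r columns with the largest remaining demand.
        cols = sorted(range(n_cols), key=lambda j: remaining[j], reverse=True)[:r]
        if any(remaining[j] == 0 for j in cols):
            return None  # the row needs more ones than the columns can absorb
        row = [0] * n_cols
        for j in cols:
            row[j] = 1
            remaining[j] -= 1
        table.append(row)
    return table


# Three customers (rows) and three products (columns).
example = table_from_marginals([2, 1, 1], [2, 1, 1])
infeasible = table_from_marginals([2, 2], [3, 1])
```

In the example, the row sums say how many products each customer bought and the column sums say how many customers bought each product. The second call returns `None` because a column sum of 3 cannot be met with only two rows.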

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1017529
Program Officer: Sylvia J. Spengler
Budget Start: 2010-09-01
Budget End: 2014-08-31
Fiscal Year: 2010
Total Cost: $247,323

Name: Boston University
City: Boston
State: MA
Country: United States
Zip Code: 02215