This research investigates how to publish data while limiting disclosure about the entities it describes. Census data, an invaluable source of socioeconomic information, is one example. Simple approaches to limiting disclosure, such as removing identifying attributes like Social Security number and name, are not sufficient: combinations of the remaining attributes can still identify individuals, especially when the data can be linked to external databases. This linkage, and more generally the fact that data is often explicitly linked to other data, is the focus of this project.
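
The linkage risk described above can be illustrated with a minimal sketch. It assumes a published table with names removed and an external table that still carries names. All records, field names, and the `link` helper are hypothetical and invented for illustration; they are not taken from the project.

```python
# Hypothetical published records: direct identifiers already removed.
published = [
    {"zip": "01003", "birth_year": 1980, "sex": "F", "income": 52000},
    {"zip": "01003", "birth_year": 1975, "sex": "M", "income": 61000},
    {"zip": "01002", "birth_year": 1980, "sex": "F", "income": 47000},
]

# Hypothetical external database that still contains names.
external = [
    {"name": "Alice", "zip": "01003", "birth_year": 1980, "sex": "F"},
    {"name": "Bob", "zip": "01002", "birth_year": 1990, "sex": "M"},
]

# Attributes that are not identifying alone but may be in combination
# (an assumed choice for this example).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(published, external, keys=QUASI_IDENTIFIERS):
    """Return {name: published_row} for every external person whose
    quasi-identifier combination matches exactly one published row."""
    # Group published rows by their quasi-identifier combination.
    index = {}
    for row in published:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = {}
    for person in external:
        candidates = index.get(tuple(person[k] for k in keys), [])
        # A unique match re-identifies the person despite name removal.
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]
    return matches


reidentified = link(published, external)
```

In this toy data, one external person matches exactly one published row, so that person's income is exposed even though the published table contains no names.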

In linked data, records are connected through explicit relationships. Examples include data about students and classes, where the links associate each student with the classes she took; data about network packets and routers, where the links associate each packet with the routers that forwarded it; and data about people and their social networks, where the links are the social relationships between people. The explicit representation of these links in the data violates some of the key assumptions of prior work. This research spans the full spectrum, from motivating applications of linked data, to novel privacy models and practical anonymization algorithms, to new techniques for attacking and analyzing anonymized data.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 0627642
Program Officer: Vijayalakshmi Atluri
Budget Start: 2006-09-15
Budget End: 2011-08-31
Fiscal Year: 2006
Total Cost: $344,850
Institution Name: University of Massachusetts Amherst
City: Amherst
State: MA
Country: United States
Zip Code: 01003