In an era of information overload and big data, there is a pressing need to analyze, protect, prioritize, and utilize data. Much of this data is inherently relational; thus, it is crucial to understand the benefits, challenges, and potential hazards of exploiting its relational properties. Integrating, cleaning, and linking relational data requires matching and resolving references in the data. At the same time, matching and linking pose significant privacy risks. The proposed work develops a theoretical understanding of entity resolution in network data, with the goal of developing tools and methods that can indicate how easy or difficult it will be to resolve data in different settings. Building on this theory, new entity resolution algorithms will be developed that offer accuracy guarantees and scale to large data sources. These research results will enable more informed data sharing and usage decisions by individuals, industry, and government. Accurate analysis of network data is of utmost importance to science, medicine, and national security. Whether studying socioeconomic trends, integrating data from large microarrays, analyzing organized crime or terrorist networks, or mining financial data for corporate misconduct, accurate network data and its associated statistics are crucial. At the same time, understanding how entity resolution affects privacy guarantees, and educating the public about the impact of releasing identifying information, are equally important.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1218488
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 2012-09-01
Budget End: 2016-08-31
Support Year:
Fiscal Year: 2012
Total Cost: $500,000
Indirect Cost:
Name: University of Maryland College Park
Department:
Type:
DUNS #:
City: College Park
State: MD
Country: United States
Zip Code: 20742