This research project studies exposure detection, a new research area at the intersection of data mining, security, and natural language processing. Exposure detection refers to discovering the components or attributes of a user's public profile that reduce the user's privacy. To help the public understand the privacy risks of sharing certain information on the web, the project develops efficient algorithms for modeling how an adversary learns information from incomplete and schemaless public data sources. It introduces theoretically sound and efficient techniques for identifying accurate web footprints, including new methods for data matching using a novel probabilistic join operator on multi-granular data, automated approaches for generating inference rules, and new solutions for identifying missing information and unifying mismatched vocabulary using lightweight natural language processing and text mining. The research activities also investigate methods for quantifying and adjusting exposure and risk, facilitating a better understanding of individuals' vulnerability on the web. These techniques not only advance the state of the art in re-identification, probabilistic reasoning and inference logic, and natural language understanding, but also serve as a foundation for exposure detection.

Project Start:
Project End:
Budget Start: 2013-01-15
Budget End: 2016-12-31
Support Year:
Fiscal Year: 2012
Total Cost: $499,996
Indirect Cost:

Name: Georgetown University
Department:
Type:
DUNS #:
City: Washington
State: DC
Country: United States
Zip Code: 20057