Everyday web users have little guidance in handling the growing number of privacy issues they face online. Many web sites - some legitimate, some less so - behave in ways that many users would consider unexpected or undesirable. These include popular, well-known web sites as well as web sites that aim to dupe customers with "free" trials. Such sites often describe their behaviors in privacy policies and terms of use pages, but these policies are rarely read, hard to understand, and sometimes intentionally obfuscated with legal jargon, small text, and pale fonts. The goal of this research is to develop new techniques to pinpoint and summarize the most surprising and most important parts of these policies. The results will be made publicly available on a web site and through web browser extensions.

The major activity of this research will be to design, implement, and evaluate CrowdVerify, a system that combines crowdsourcing with machine learning techniques to flag the most important and unexpected behaviors of web sites. The core idea is to slice a given policy into smaller text segments, have crowd workers compare different segments, and then aggregate the results. Several rating systems designed for ranking competitors, including Elo, Glicko, and TrueSkill, will also be evaluated for scoring the importance of segments. Using these results, computational models will be built that can predict what people find most surprising as well as most important in web policies.
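
The segment-and-compare idea can be illustrated with a minimal sketch. It is not the CrowdVerify implementation. It assumes a policy is split on blank lines, that each crowd judgment is a (winner, loser) pair of segment indices, and that judgments are aggregated with a plain Elo update. The function names, the K-factor of 32, and the starting rating of 1500 are illustrative assumptions, not values taken from the project.

```python
from collections import defaultdict


def split_policy(text):
    """Split a policy into segments on blank lines (an assumed segmentation rule)."""
    return [seg.strip() for seg in text.split("\n\n") if seg.strip()]


def expected_score(r_a, r_b):
    """Standard Elo expectation that the item rated r_a beats the item rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_rank(comparisons, k=32.0, initial=1500.0):
    """Aggregate (winner, loser) judgments into one rating per segment.

    k and initial are conventional Elo defaults, assumed here for illustration.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in comparisons:
        e_w = expected_score(ratings[winner], ratings[loser])
        delta = k * (1.0 - e_w)   # the winner gains exactly what the loser gives up
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical policy text and crowd votes, made up for this example.
policy = (
    "We may share your data with partners.\n\n"
    "We use cookies.\n\n"
    "Contact us at any time."
)
segments = split_policy(policy)

# Each pair means: a worker judged the first segment more surprising than the second.
votes = [(0, 1), (0, 2), (1, 2), (0, 1)]
ratings = elo_rank(votes)

# Segment indices ordered from most to least surprising under these votes.
ranked = sorted(ratings, key=ratings.get, reverse=True)
```

Glicko and TrueSkill would take the place of `elo_rank` in the same pipeline. Both also track the uncertainty of each rating, which may matter when some segments receive only a few comparisons.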

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1422018
Program Officer: Dan Cosley
Project Start:
Project End:
Budget Start: 2014-10-01
Budget End: 2017-09-30
Support Year:
Fiscal Year: 2014
Total Cost: $515,290
Indirect Cost:
Name: Carnegie-Mellon University
Department:
Type:
DUNS #:
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213