With the advent of "big data," the Web has become a powerful yet opaque place which, left unchecked, can breed data misuse and unfair practices toward individuals. Today's Web services accumulate staggering amounts of personal data, such as emails, searches, and activity logs. This data is mined for business value, retained for extended periods of time, and at times shared with third parties -- all without individuals' knowledge or explicit consent. Are these uses good or bad for individuals? Do they match a service's privacy policy? How long does a service keep using data after the user deletes it? Who shares data with whom? Presently, there is little insight into these questions. This project creates a set of building blocks to track different types of individual data (e.g., keywords used in searches, previous sites visited) and to measure multiple types of use (e.g., targeted advertising). These building blocks are then used to develop robust and scalable tools that track data and uncover misuse.
Our project seeks to increase the data-driven Web's transparency by developing both the tools and the scientific foundations for tracking data's journey across the Web. We make contributions on three fronts. First, we design a set of building blocks: highly reusable, scalable components that make it easier to build a new generation of auditing tools that lift the curtain on how personal data is used. Second, we build a set of robust and scalable transparency tools that instantiate those building blocks and enable users, journalists, and investigators to obtain visibility into Web services' data uses. We are actively seeking deployments through collaborations with journalists and investigators. Third, we leverage these tools to run extensive measurement studies of various data-driven platforms and practices, such as targeted advertising ecosystems, data brokers, and online price discrimination. These studies increase awareness and help uncover examples of data mistreatment, which we hope will provide the grounds for an informed societal debate on the need for increased voluntary transparency on the Web.