The objective of this project is to understand, evaluate, and contribute towards the suppression of sensitive aggregates over hidden databases. Hidden databases are prevalent on the Web, ranging from databases of government agencies and databases in scientific and health domains to databases in the commercial world. They provide proprietary form-like search interfaces that allow users to execute search queries by specifying desired attribute values of the sought-after tuple(s); the system responds by returning a few (e.g., top-k) satisfying tuples sorted by a suitable ranking function.
While owners of hidden databases would like to allow individual search queries, many also want to maintain a certain level of privacy for aggregates over their hidden databases. This has implications in the commercial domain (e.g., to prevent competitors from gaining strategic advantages) as well as in homeland-security-related applications (e.g., to prevent potential terrorists from learning flight occupancy distributions). The PIs' prior work pioneered techniques to efficiently obtain approximate aggregates over hidden databases using only a small number of search queries issued via the database's proprietary front-end. Such powerful and versatile techniques may also be used by adversaries to obtain sensitive aggregates; thus defending against them becomes an urgent task requiring immediate attention.
This project investigates techniques to suppress sensitive aggregates while maintaining the usability of hidden databases for bona fide search users. In particular, it explores a solution space that spans all three components of a hidden database system: (1) the back-end hidden database, (2) the query processing module, and (3) the front-end search interface.
The intellectual merit of the project is two-fold: (1) problem novelty: it initiates a new direction of research in information privacy, namely suppressing sensitive aggregates over hidden databases, and (2) solution novelty: it investigates a variety of promising techniques across the three components. The outcomes of this research have broader impacts on the nation's higher education system and high-tech industries. Parts of the project will be carried out by students of the University of Texas at Arlington and George Washington University as advanced class projects or individual research projects.