Spurred by financial scandals and privacy concerns, governments worldwide have moved to ensure confidence in digital records by regulating their retention and deletion. The goal of this project is to develop and explore a database management system (DBMS) architecture that supports a spectrum of approaches to regulatory compliance, thereby extending the level of protection afforded by conventional file-based compliance storage servers to the vast amounts of structured data residing in databases. The key challenge of this work is to provide compliance assurances for the DBMS, even against insiders with super-user powers, while balancing the need for trustworthiness against the conflicting requirements for scalable performance guarantees and low cost. The resulting architecture provides tunable tradeoffs between security and performance, through a spectrum of techniques ranging from tamper detection to tamper prevention for data, indexes, logs, and metadata; tunable vulnerability windows; tunable granularities of protection; careful use of magnetic disk as a cache and of secure coprocessors on the DBMS platform and compliance storage server platform; and judicious retargeting of an on-disk encryption unit.

This work enables compliance laws to be applied to business, government, and personal data now stored in databases, increasing societal confidence in such data. A new web course on compliance data management will raise the computer science community's awareness of compliance issues and will help train a new generation of professionals cognizant of these challenges and solutions. The software prototypes, and the technical papers describing them, will be disseminated through the project's web site (http://web.crypto.cs.sunysb.edu/cdb/).

Project Report

To ensure institutional accountability and societal trust, hundreds of regulations require long-term retention of records that are important for business and society. Often there is a significant incentive to tamper with these records; consider, e.g., financial records at Enron or e-government data such as voter registrations or birth records of Olympic gymnasts. For technical reasons, however, retention regulations have been interpreted much less stringently for database data than for email, spreadsheets, and other corporate documents. This project made major strides toward closing that gap by producing low-cost methods to detect and deflect database tampering attempts, even if the attacker is a system administrator with super-user privileges. More precisely, the project produced low-cost techniques to ensure that the entire life cycle of a database record is trustworthy, including its creation, querying, migration to a new generation of servers, update, deletion, subpoena and litigation holds, and finally its mandated destruction. These new techniques are ready for adoption by the database industry when governments or companies want tighter guarantees against tampering with database contents.

Specific technical contributions of the project include a 1%-overhead method to ensure that the results of a transaction processing workload are trustworthy, i.e., have not been tampered with subsequent to execution, even if the system crashes or super-user insiders attack the system. Under this approach, an organization can check its data for tampering as frequently as desired by running a routine that takes only a couple of minutes to execute.
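
As a generic illustration of what a periodic tamper check can look like, the sketch below uses a simple hash chain over an append-only log. It is not the project's method, which the report does not detail. The names `GENESIS`, `chain_digest`, `append`, and `audit` are hypothetical, and the sketch assumes the latest chain head is kept on trusted storage (for example, a write-once compliance server) that an insider cannot rewrite.

```python
import hashlib
import json

# Hypothetical starting value for the chain (an assumption of this sketch).
GENESIS = "0" * 64


def chain_digest(prev_digest: str, record: dict) -> str:
    """Hash a record together with the digest of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_digest.encode("ascii") + payload).hexdigest()


def append(log: list, record: dict) -> str:
    """Append a record to the log and return the new chain head."""
    prev = log[-1]["digest"] if log else GENESIS
    digest = chain_digest(prev, record)
    log.append({"record": record, "digest": digest})
    return digest


def audit(log: list, trusted_head: str) -> bool:
    """Recompute the chain and compare it with the head on trusted storage."""
    prev = GENESIS
    for entry in log:
        if chain_digest(prev, entry["record"]) != entry["digest"]:
            return False  # a record or its stored digest was altered
        prev = entry["digest"]
    # Catches an attacker who rewrote every digest consistently.
    return prev == trusted_head


log = []
append(log, {"id": 1, "amount": 100})
head = append(log, {"id": 2, "amount": 250})
print(audit(log, head))  # True

log[0]["record"]["amount"] = 999  # simulated insider tampering
print(audit(log, head))  # False
```

The audit is a single pass over the log, so it can be rerun as often as desired; the assurance rests entirely on the assumption that the stored head is out of the attacker's reach.
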
The project also produced low-cost schemes for several challenges associated with subpoenas and litigation holds, including ensuring that the data produced in response to a subpoena does not depend on irrelevant details of how the data is organized, and guaranteeing that subpoenaed data has been retained even after it passes its mandated destruction date. The project also produced techniques for ensuring that data that has been destroyed cannot be reconstructed from other information left behind in a database. Further, since many records must be retained but few will ever be queried, the project demonstrated that a document’s contents intrinsically govern how likely it is to ever appear in the answer to any future query. Based on this, the project produced techniques for automatically identifying, at the time of their creation, the records likely to be queried, and for placing them and their associated metadata on long-term storage that supports reasonably fast retrieval, while the remaining documents and metadata can be retained on very low-cost media.

In the course of producing these results, the project trained and educated a new generation of researchers and experts in secure data management and exploration, including three PhD students, two MS students, and half a dozen undergraduate interns who gained hands-on experience in creating large-scale secure data management systems.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0803280
Program Officer: Frank Olken
Budget Start: 2008-09-01
Budget End: 2014-02-28
Fiscal Year: 2008
Total Cost: $356,000
Name: University of Illinois Urbana-Champaign
City: Champaign
State: IL
Country: United States
Zip Code: 61820