When handling an intrusion, one organization may need to share logs and network traces with other organizations. However, the data to be shared may contain sensitive information that the first organization does not wish to disclose to the other parties. One solution is to sanitize the data by removing that information before sharing it. This protects the privacy of the organization. However, the removed data may be essential to the analysis that the other organizations must perform. This research explores the tension between privacy and security analysis. The goals of this research are to: (1) develop a sanitizing language that describes the requirements for privacy and security analysis in a way that allows the requirements to be checked automatically for inconsistencies; (2) determine the conditions, if any, under which "perfect sanitization" can occur; and (3) examine the problem of sanitizing a dynamic data set that changes as the sanitization proceeds. This will involve developing the sanitization language and testing it on two kinds of data: data extracted from a network, and files containing student grades. The network data will provide examples of both static and dynamic data. Perfect sanitization will be studied by translating the sanitization process into functions and analyzing their domains and ranges.
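The idea of checking privacy and analysis requirements automatically for inconsistencies can be illustrated with a toy model. The Python sketch below is hypothetical and is not the proposed sanitizing language. It assumes each party states, for each field of a record, the set of transformations it will accept. The privacy side lists transformations that hide enough, and the analysis side lists transformations that preserve enough. A field is consistent if at least one transformation satisfies both sides. The names `Transform` and `find_conflicts`, and the example fields, are illustrative assumptions.

```python
from enum import Enum, auto


class Transform(Enum):
    """Hypothetical per-field sanitizing transformations."""
    KEEP = auto()          # leave the value unchanged
    PSEUDONYMIZE = auto()  # replace with a consistent token (equality preserved)
    GENERALIZE = auto()    # coarsen the value, e.g. an IP address to its /24 prefix
    REMOVE = auto()        # delete the field entirely


# Assumed requirement format: field name -> transformations that party accepts.
Requirements = dict[str, set[Transform]]


def find_conflicts(privacy: Requirements, analysis: Requirements) -> dict[str, set[Transform]]:
    """For each field, return the transformations acceptable to both sides.

    A field missing from one side's requirements is unconstrained by that side.
    A field mapped to an empty set is an inconsistency that the sanitizer
    must resolve by relaxing one of the two requirements.
    """
    all_transforms = set(Transform)
    fields = privacy.keys() | analysis.keys()
    return {
        f: privacy.get(f, all_transforms) & analysis.get(f, all_transforms)
        for f in sorted(fields)
    }


# Illustrative requirements for an intrusion log (assumed, not from any real data set).
privacy = {
    "src_ip": {Transform.PSEUDONYMIZE, Transform.GENERALIZE, Transform.REMOVE},
    "username": {Transform.REMOVE},
}
analysis = {
    "src_ip": {Transform.KEEP, Transform.PSEUDONYMIZE},  # analysts must correlate sources
    "username": {Transform.KEEP, Transform.PSEUDONYMIZE},
    "timestamp": {Transform.KEEP},
}

allowed = find_conflicts(privacy, analysis)
conflicts = [f for f, options in allowed.items() if not options]
print(allowed)
print("conflicts:", conflicts)
```

In this example, `src_ip` can be pseudonymized to satisfy both parties. `username` cannot satisfy both: privacy demands removal, while the analysis needs at least a consistent token. The sanitizers would then have to choose which requirement to relax.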
The significance of this work lies in its treatment of the balance between privacy and security. Previous work sanitizes data in an ad hoc manner. It does not analyze the balance between privacy and security, and it does not let sanitizers choose among specific requirements when the needs of privacy and security analysis conflict. The potential impact is broad: if the work succeeds, the results can be applied in a wide variety of fields where privacy and analysis (not just security analysis) must be balanced.