Statistical databases for public use pose a critical problem: how to make the data available for analysis without disclosing information that would infringe on privacy, violate confidentiality, or endanger national security. Organizations in the public and private sectors have a major stake in this confidentiality protection problem, because access to data is essential for advancing research and formulating policy. Yet the possibility of extracting certain sensitive elements of information from the data can jeopardize the welfare of these organizations and, potentially, the welfare of the society in which they operate. The challenge, therefore, is to represent the data in a form that permits accurate analysis in support of research, decision-making and policy initiatives, while preventing an unscrupulous or ill-intentioned party from exploiting the data for harmful ends. The objective of this project is to develop a practical, computer-based framework for assessing, measuring, and mitigating disclosure risk in public-use data. Our proposed framework, called OptShield, overcomes the disadvantages found in currently deployed disclosure limitation methods. We achieve this by combining perturbation and suppression methods with optimal switching of sensitive records at the micro-data level, producing a method that protects confidentiality while preserving data integrity. In Phase II, we propose to continue algorithmic and software development toward a working prototype of the software and service. This software will serve as the core technology for an application aimed at a broad market of customers with a major stake in confidentiality protection.
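As a rough illustration of what "switching of sensitive records at the micro-data level" can mean, the sketch below implements plain random data swapping: a sensitive value is exchanged between pairs of records that share the same stratum, so tabulations of the sensitive attribute within each stratum are unchanged while individual record linkages are broken. This is a generic textbook technique, not OptShield's optimization; the function name `swap_within_strata`, the parameters `sensitive`, `stratum`, and `swap_rate`, and the record layout are all hypothetical.

```python
import random
from typing import Dict, List

Record = Dict[str, object]


def swap_within_strata(
    records: List[Record],
    sensitive: str,
    stratum: str,
    swap_rate: float,
    seed: int = 0,
) -> List[Record]:
    """Randomly exchange the sensitive value between pairs of records in the same stratum.

    Hypothetical illustration of generic data swapping (random, not optimal).
    The multiset of sensitive values within each stratum is preserved.
    """
    if not 0.0 <= swap_rate <= 1.0:
        raise ValueError("swap_rate must be in [0, 1]")
    rng = random.Random(seed)
    out = [dict(r) for r in records]  # copy so the caller's records are not mutated

    # Group record indices by stratum value (e.g., region or age band).
    strata: Dict[object, List[int]] = {}
    for i, r in enumerate(out):
        strata.setdefault(r[stratum], []).append(i)

    for idxs in strata.values():
        rng.shuffle(idxs)
        # Number of disjoint pairs to swap; never exceeds len(idxs) // 2.
        n_pairs = int(len(idxs) * swap_rate) // 2
        for p in range(n_pairs):
            a, b = idxs[2 * p], idxs[2 * p + 1]
            out[a][sensitive], out[b][sensitive] = out[b][sensitive], out[a][sensitive]
    return out


if __name__ == "__main__":
    data = [
        {"region": "A", "income": 10},
        {"region": "A", "income": 20},
        {"region": "B", "income": 30},
    ]
    print(swap_within_strata(data, sensitive="income", stratum="region", swap_rate=1.0))
```

With `swap_rate=1.0`, the two region-A records exchange incomes, while the lone region-B record has no partner and is left unchanged. An optimizing method would instead choose which pairs to swap so as to minimize disclosure risk subject to limits on analytical distortion; random selection is used here only to keep the sketch short.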
The application we ultimately plan to offer in Phase III will take a three-phased approach to the disclosure limitation problem: (1) Assess a user's qualitative and quantitative disclosure risks inherent in the organization's data publishing and sharing plans; (2) Measure the disclosure risks in a user's proposed data products; and (3) Protect the user's data by applying the appropriate disclosure limitation techniques.
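To make the "measure" and "protect" steps concrete, the sketch below uses one common, simple disclosure risk measure: the fraction of records whose combination of quasi-identifiers (for example, age and ZIP code) is shared by fewer than `k` records. It then applies record suppression, dropping those records. This is an assumed, generic example rather than the risk model or protection method used in OptShield; the function names and the record layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def _key(record: Record, quasi_ids: Sequence[str]) -> Tuple[object, ...]:
    """Combination of quasi-identifier values that defines a record's equivalence class."""
    return tuple(record[q] for q in quasi_ids)


def equivalence_class_sizes(records: List[Record], quasi_ids: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(_key(r, quasi_ids) for r in records)


def disclosure_risk(records: List[Record], quasi_ids: Sequence[str], k: int) -> float:
    """Measure: fraction of records in equivalence classes smaller than k."""
    if not records:
        return 0.0
    sizes = equivalence_class_sizes(records, quasi_ids)
    at_risk = sum(1 for r in records if sizes[_key(r, quasi_ids)] < k)
    return at_risk / len(records)


def suppress_risky_records(records: List[Record], quasi_ids: Sequence[str], k: int) -> List[Record]:
    """Protect: drop every record whose equivalence class has fewer than k members.

    Whole classes are either kept or removed, so every remaining class has size >= k.
    """
    sizes = equivalence_class_sizes(records, quasi_ids)
    return [r for r in records if sizes[_key(r, quasi_ids)] >= k]


if __name__ == "__main__":
    data = [
        {"age": 34, "zip": "80302", "diagnosis": "flu"},
        {"age": 34, "zip": "80302", "diagnosis": "asthma"},
        {"age": 71, "zip": "80305", "diagnosis": "diabetes"},
    ]
    qids = ["age", "zip"]
    print(disclosure_risk(data, qids, k=2))
    protected = suppress_risky_records(data, qids, k=2)
    print(len(protected), disclosure_risk(protected, qids, k=2))
```

In this toy data the 71-year-old in ZIP 80305 is unique on the quasi-identifiers, so one record in three is at risk; suppressing it brings the measured risk to zero at the cost of losing that record. Dropping whole records is the bluntest form of suppression, and practical systems typically weigh such losses against alternatives like perturbation or swapping.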
Public health organizations that collect and share sensitive data face a dilemma: access to their data is essential for advancing research and formulating policy, yet they are apprehensive about the risk of inadvertently disclosing confidential information. The possibility of extracting certain vulnerable elements of information from the data, even after personal identifiers have been removed, can jeopardize the welfare of these organizations and, potentially, the welfare of the society in which they operate. Within the US Department of Health and Human Services, for example, preserving the confidentiality of records in order to continue to elicit information from the American people and from health care providers is "a matter of primary concern" (CDC/NCHS confidentiality guide). OptTek Systems, Inc. (OptTek) is developing a comprehensive framework designed to help public health and other organizations avoid disclosing confidential information in public-use data. The application takes a three-phased approach to the disclosure limitation problem: (1) Assess a user's qualitative and quantitative disclosure risks; (2) Measure the disclosure risks in a user's proposed data publishing and sharing plans; and (3) Protect the user's data by applying the appropriate disclosure limitation techniques.