The objective of this project is to develop an innovative technique for preventing disclosure of confidential data in public-use tabular data. Our proposed technique, called Optimal Data Switching (OS), overcomes the limitations of currently deployed disclosure limitation methods. Statistical databases for public use pose a critical problem: how to make the data available for analysis without disclosing information that would infringe on privacy, violate confidentiality, or endanger national security. Organizations in both the public and private sectors have a major stake in this confidentiality protection problem because access to data is essential for advancing research and formulating policy. Yet the possibility of extracting sensitive information from the data can jeopardize the welfare of these organizations and, in some instances, of the society in which they operate. The challenge, therefore, is to represent the data in a form that permits accurate analysis in support of research, decision-making, and policy initiatives while preventing an unscrupulous or ill-intentioned party from exploiting the data to harmful ends. Our goal is to build on the latest advances in optimization, to which the OptTek Systems, Inc. (OptTek) research team has made pioneering contributions, to provide a framework based on optimal data switching that enables the Centers for Disease Control and Prevention (CDC) and other organizations to meet the challenge of confidentiality protection effectively. The framework we propose is structured to be easy to use in a wide array of application settings and diverse user environments, from client-server to web-based, regardless of whether the micro-data are continuous, ordinal, binary, or any combination of these types.
The successful development of such a framework, and of the computer-based method for implementing it, is badly needed and will be of value to many types of organizations, in both the public and private sectors, for whom the incentive to publish data is economic as well as scientific. Public-sector examples are evident: organizations such as the CDC and the U.S. Census Bureau exist to collect, analyze, and publish data for analysis by other parties. Numerous examples also arise in the private sector, notably in banking and financial services, healthcare (including drug companies and medical research institutions), market research, oil exploration, computational biology, renewable and sustainable energy, retail sales, product development, and a wide variety of other areas.

Public Health Relevance

In accumulating and disseminating public health data for reporting, statistical analysis, and other uses, we must guarantee that the individual records describing each person or establishment are protected. Organizations in both the public and private sectors have a major stake in this confidentiality protection problem because access to data is essential for advancing research and formulating policy. This project proposes to develop a robust methodology and practical framework that deliver an efficient and effective tool for protecting confidentiality in published tabular data.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Mental Health (NIMH)
Type
Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #
3R43MH086138-01A1S1
Application #
7790821
Study Section
Special Emphasis Panel (ZRG1-HOP-E (10))
Program Officer
Stirratt, Michael J
Project Start
2008-09-10
Project End
2009-09-11
Budget Start
2008-09-10
Budget End
2009-09-11
Support Year
1
Fiscal Year
2009
Total Cost
$4,047
Indirect Cost
Name
OptTek Systems, Inc.
Department
Type
DUNS #
128005423
City
Boulder
State
CO
Country
United States
Zip Code
80302