Privacy is a fundamental right and needs to be protected. For health care related information, there are regulations governing disclosure. These regulations were motivated by the public's concern about breaches of confidentiality that might result in discrimination. Recent progress in electronic medical record technology, the Internet, and the genetic revolution, together with media reports on violations of privacy, has generated increasing interest in this topic. A common belief is that sensitive information is more easily available with the use of networked computers. Since a total lack of disclosure is not realistic, current regulations require that the "minimal amount" of information be given to a certain party. A thorough study of what constitutes "minimal" for particular types of applications, and of a "usefulness index", is lacking. An exact quantification of the potential for privacy breach in de-identified or anonymized databases is also lacking. Defining and quantifying these indices is important for decision-making. As we demonstrate, de-identified data sets can still be used for inference and therefore may disclose sensitive information. Using machine learning methods to verify the functional dependencies that remain in a de-identified data set leads to a better understanding of the possible inferences. Anonymization techniques based on logic, statistics, database theory, and machine learning methods can help protect privacy. We will formally define and study anonymity in databases, from both a theoretical and a practical standpoint. We will develop and implement algorithms that anonymize data sets while balancing anonymity against the "usefulness" of the disclosed data. We will also develop and implement algorithms to verify the anonymity of a given data set and to indicate the types of records that are at highest risk of a privacy attack. We will make our methods and documented tools freely available to researchers via the WWW.
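The abstract does not specify how anonymity would be measured. As a rough illustration only, the sketch below uses a k-anonymity style count, a technique substituted here and not confirmed as the project's method: records that share the same values on a set of quasi-identifying fields form a group, and small groups are easier to single out. The field names (age_band, zip3, diagnosis), the sample records, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical de-identified records; field names and values are
# illustrative assumptions, not data from the project.
records = [
    {"age_band": "30-39", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "diabetes"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "influenza"},
    {"age_band": "50-59", "zip3": "024", "diagnosis": "asthma"},
]


def group_sizes(rows, quasi_identifiers):
    """For each row, count how many rows share its quasi-identifier values."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    return [counts[key] for key in keys]


def anonymity_level(rows, quasi_identifiers):
    """Size of the smallest group (the k in k-anonymity); 0 for no rows."""
    sizes = group_sizes(rows, quasi_identifiers)
    return min(sizes) if sizes else 0


def records_at_risk(rows, quasi_identifiers, k):
    """Rows whose group has fewer than k members."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [row for row, size in zip(rows, sizes) if size < k]


if __name__ == "__main__":
    qi = ["age_band", "zip3"]
    print("anonymity level:", anonymity_level(records, qi))
    print("at risk for k=2:", records_at_risk(records, qi, 2))
```

In this toy data the single record in the 50-59 age band is unique on the chosen fields, so it is the one flagged as at risk. A count like this covers only exact matches on the listed fields; it does not capture the inference risks from remaining functional dependencies that the abstract also raises.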

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM007273-03
Application #: 6733529
Study Section: Special Emphasis Panel (ZLM1-MMR-X (O1))
Program Officer: Sim, Hua-Chuan
Project Start: 2002-02-01
Project End: 2005-07-31
Budget Start: 2004-02-01
Budget End: 2005-07-31
Support Year: 3
Fiscal Year: 2004
Total Cost: $406,979
Indirect Cost:
Name: Brigham and Women's Hospital
Department:
Type:
DUNS #: 030811269
City: Boston
State: MA
Country: United States
Zip Code: 02115

Mehta, Sanjay R; Vinterbo, Staal A; Little, Susan J (2014) Ensuring privacy in the study of pathogen genetics. Lancet Infect Dis 14:773-777
Vinterbo, Staal A; Sarwate, Anand D; Boxwala, Aziz A (2012) Protecting count queries in study design. J Am Med Inform Assoc 19:750-7
Lasko, Thomas A; Vinterbo, Staal A (2010) Spectral Anonymization of Data. IEEE Trans Knowl Data Eng 22:437-446
Vinterbo, Staal A; Dreiseitl, Stephan; Ohno-Machado, Lucila (2006) Approximation properties of haplotype tagging. BMC Bioinformatics 7:8
Vinterbo, Staal A; Kim, Eun-Young; Ohno-Machado, Lucila (2005) Small, fuzzy and interpretable gene expression based classifiers. Bioinformatics 21:1964-70
Ohno-Machado, Lucila; Silveira, Paulo Sergio Panse; Vinterbo, Staal (2004) Protecting patient privacy by quantifiable control of disclosures in disseminated databases. Int J Med Inform 73:599-606
Weber, Griffin; Vinterbo, Staal; Ohno-Machado, Lucila (2004) Multivariate selection of genetic markers in diagnostic classification. Artif Intell Med 31:155-67