Technologies that guarantee privacy protection while keeping data practically useful are essential both to the viability of medical research and to doctor-patient confidentiality. Patient-specific, population-based clinical and/or genomic databases are emerging in many areas of medical research. These databases offer tremendous scientific potential and, at the same time, catastrophic privacy risks. Developers have built privacy shields into today's databases, but these are naïve, ad hoc first attempts that offer no provable privacy protection and little or no real-world protection. We envision databases for which it can be scientifically shown that: (1) no person whose information is contained in the database can be re-identified; (2) researchers can easily access the information they need; and (3) experimental results are equivalent to those that would be obtained without privacy protection. Our specific research aims are:

Aim 1. To provide a protocol that thwarts a common re-identification vulnerability. Encrypted pseudonyms are a prevalent privacy guard in many of today's databases, but we will show that the protection they are believed to provide is often a fallacy. We will then build on our prior experience with cryptographic protocols to develop methods that insulate against the vulnerability we reveal.

Aim 2. To provide tools that reduce IRB burdens. Human review of protocols by Institutional Review Boards (IRBs) is the most widely used privacy safeguard in medical research. We will reduce IRB burdens by creating a tool that automatically identifies optimal data subsets for pre-approval. This work builds on our prior experience with automated privacy risk assessment and on data mining algorithms.

Aim 3. To create a new research paradigm in which software agents (not humans) access sensitive data. Rather than sharing data, we will construct a system in which software agents, working on behalf of human researchers, perform computations on secluded data and report verifiable results back to researchers. Results are shared, not specific patient values. This approach builds on prior work in database security on query restriction and inference control.

These research activities are conducted in a computer science lab and an informatics group to ensure solutions that are generalizable yet practical. New technologies, with their accompanying proofs of privacy and utility, will be constructed at Carnegie Mellon using publicly and semi-publicly available data so that results can be publicly verified. Researchers at Vanderbilt will implement these technologies within their own systems, thereby verifying practical results on sensitive patient information.
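The idea in Aim 3 that agents return aggregate results rather than patient values, guarded by query restriction, can be illustrated with a classic query-set-size control from the inference-control literature. The sketch below is not the project's system; the `Record` layout, the `RestrictedQueryAgent` class, its `count` and `mean_lab_value` methods, and the threshold `k` are all hypothetical names chosen for illustration. The agent refuses any query whose matching set has fewer than `k` or more than `n - k` records, since a tiny set (or the complement of a huge one) can single out one patient.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


# Hypothetical record layout, used only for illustration.
@dataclass(frozen=True)
class Record:
    age: int
    diagnosis: str
    lab_value: float


Predicate = Callable[[Record], bool]


class RestrictedQueryAgent:
    """Hypothetical agent that answers aggregate queries over secluded records.

    Only aggregates leave the agent. A query is refused when its query set
    (the matching records) has fewer than k members or more than n - k
    members, because a very small set, or the complement of a very large
    one, can isolate a single patient's value.
    """

    def __init__(self, records: Sequence[Record], k: int) -> None:
        # Require 1 <= k <= n/2 so the allowed range [k, n - k] is non-empty.
        if k < 1 or 2 * k > len(records):
            raise ValueError("k must satisfy 1 <= k <= n/2")
        self._records: List[Record] = list(records)
        self._k = k

    def _query_set(self, predicate: Predicate) -> List[Record]:
        # Collect matching records internally; they are never returned to callers.
        matches = [r for r in self._records if predicate(r)]
        n, size = len(self._records), len(matches)
        if size < self._k or size > n - self._k:
            raise PermissionError(
                f"query set size {size} outside allowed range [{self._k}, {n - self._k}]"
            )
        return matches

    def count(self, predicate: Predicate) -> int:
        return len(self._query_set(predicate))

    def mean_lab_value(self, predicate: Predicate) -> float:
        # Non-empty by construction, since size >= k >= 1.
        matches = self._query_set(predicate)
        return sum(r.lab_value for r in matches) / len(matches)


if __name__ == "__main__":
    # Synthetic, made-up records; not patient data.
    records = [
        Record(34, "T2D", 7.1),
        Record(51, "T2D", 8.4),
        Record(47, "HTN", 5.6),
        Record(62, "T2D", 6.9),
        Record(29, "HTN", 5.2),
        Record(58, "HTN", 5.9),
    ]
    agent = RestrictedQueryAgent(records, k=2)
    print(agent.count(lambda r: r.diagnosis == "T2D"))  # 3
    print(agent.mean_lab_value(lambda r: r.diagnosis == "T2D"))
    try:
        agent.count(lambda r: r.age == 62)  # would isolate one person: refused
    except PermissionError as err:
        print("refused:", err)
```

Query-set-size restriction on its own is known to be defeatable by "tracker" attacks, in which several individually permitted queries are combined to deduce a restricted one, so in practice it is paired with further inference controls such as query auditing or result perturbation.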

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM009018-03
Application #: 7488898
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Sim, Hua-Chuan
Project Start: 2006-06-20
Project End: 2010-06-19
Budget Start: 2008-06-20
Budget End: 2010-06-19
Support Year: 3
Fiscal Year: 2008
Total Cost: $300,160
Indirect Cost:
Name: Carnegie-Mellon University
Department:
Type: Schools of Arts and Sciences
DUNS #: 052184116
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213
Atreya, Ravi V; Smith, Joshua C; McCoy, Allison B et al. (2013) Reducing patient re-identification risk for laboratory results within research datasets. J Am Med Inform Assoc 20:95-101