There is increasing concern about disclosing sensitive information when clinical data are disseminated, given the potential for breaches of individual privacy. At the same time, data sharing has become critical to accelerating biomedical research and healthcare quality improvement. We will develop new methods for privacy protection that can adapt to the amount of data being disseminated and the sensitivity of certain variables.
Our first aim is to measure the fine-grained privacy risk of individual records in patient subpopulations. The resulting index can be used to monitor and customize privacy protection of individual clinical records and to help prioritize efforts in privacy protection.
The second aim is to develop a new and practical method to support privacy-preserving data dissemination in both centralized and distributed environments, with or without knowledge of which analytic techniques will be applied to the disclosed data.
The third aim is to speed up privacy-preserving algorithms through advanced parallelization techniques. If successful, these new methods will allow privacy protection for the dissemination and analysis of large data sets in real time.
These aims are faithful to the mission of the National Library of Medicine, and they are closely related to the mentors' efforts in leading the development of trustworthy data sharing and individualized predictive models as part of the National Center for Biomedical Computing (NCBC), iDASH (integrating Data for analysis, Anonymization, and SHaring). The applicant wishes to use this funding opportunity to complement his computer science skills with biomedical knowledge and specialized training in parallel computing, in order to investigate new algorithms for privacy protection in disseminated data. Success in this project will advance his long-term goal of becoming an independently funded investigator and joining the core faculty of the Division of Biomedical Informatics at UCSD.

Public Health Relevance

There are important trade-offs between disseminating clinical and genetic data for societal benefit and protecting personal privacy. We will develop practical solutions to address fine-grained privacy and usability trade-offs, provide multi-resolution protection to satisfy the needs of different stakeholders, and accelerate privacy-preserving algorithms to support efficient data anonymization, analysis, and sharing.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Career Transition Award (K99)
Project #
1K99LM011392-01
Application #
8354440
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
Project Start
2012-09-01
Project End
2013-08-31
Budget Start
2012-09-01
Budget End
2013-08-31
Support Year
1
Fiscal Year
2012
Total Cost
$75,510
Indirect Cost
$5,593
Name
University of California San Diego
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
804355790
City
La Jolla
State
CA
Country
United States
Zip Code
92093
Li, Pinghao; Jiang, Xiaoqian; Wang, Shuang et al. (2014) HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads. J Am Med Inform Assoc 21:363-73
Menon, Aditya Krishna; Jiang, Xiaoqian; Kim, Jihoon et al. (2014) Detecting Inappropriate Access to Electronic Health Records Using Collaborative Filtering. Mach Learn 95:87-101
Wang, Shuang; Jiang, Xiaoqian; Wu, Yuan et al. (2013) EXpectation Propagation LOgistic REgRession (EXPLORER): distributed privacy-preserving online model learning. J Biomed Inform 46:480-96
Gardner, James; Xiong, Li; Xiao, Yonghui et al. (2013) SHARE: system design and case studies for statistical health information release. J Am Med Inform Assoc 20:109-16
Vaidya, Jaideep; Shafiq, Basit; Jiang, Xiaoqian et al. (2013) Identifying inference attacks against healthcare data repositories. AMIA Jt Summits Transl Sci Proc 2013:262-6
Roozgard, Aminmohammad; Barzigar, Nafise; Wang, Shuang et al. (2013) Nucleotide sequence alignment using sparse coding and belief propagation. Conf Proc IEEE Eng Med Biol Soc 2013:588-91