There is increasing concern about disclosing sensitive information when clinical data are disseminated, given the potential for breaches of individual privacy. At the same time, data sharing has become critical to accelerating biomedical research and improving healthcare quality. We will develop new methods for privacy protection that adapt to the amount of data being disseminated and to the sensitivity of particular variables.
Our first aim is to measure the fine-grained privacy risk of individual records in patient subpopulations. The resulting risk index can be used to monitor and customize privacy protection of individual clinical records and to help prioritize privacy protection efforts.
The second aim is to develop a new and practical method to support privacy-preserving data dissemination in both centralized and distributed environments, with or without knowledge of which analytic techniques will be applied to the disclosed data.
The third aim is to speed up privacy-preserving algorithms through advanced parallelization techniques. If successful, these new methods will allow privacy protection for the dissemination and analysis of large data sets in real time.
These aims are faithful to the mission of the National Library of Medicine, and they are closely tied to the mentors' efforts in leading the development of trustworthy data sharing and individualized predictive models as part of iDASH (integrating Data for Analysis, anonymization, and SHaring), a National Center for Biomedical Computing (NCBC). The applicant wishes to use this funding opportunity to complement his computer science skills with biomedical knowledge and specialized training in parallel computing, in order to investigate new algorithms for privacy protection in disseminated data. Success in this project will advance his long-term goal of becoming an independently funded investigator and joining the core faculty of the Division of Biomedical Informatics at UCSD.

Public Health Relevance

There are important trade-offs between disseminating clinical and genetic data for societal benefit and protecting personal privacy. We will develop practical solutions that address fine-grained trade-offs between privacy and usability, provide multi-resolution protection to satisfy the needs of different stakeholders, and accelerate privacy-preserving algorithms to support efficient data anonymization, analysis, and sharing.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Career Transition Award (K99)
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Sim, Hua-Chuan
University of California San Diego
Internal Medicine/Medicine
Schools of Medicine
La Jolla
United States
Li, Pinghao; Jiang, Xiaoqian; Wang, Shuang et al. (2014) HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads. J Am Med Inform Assoc 21:363-73
Menon, Aditya Krishna; Jiang, Xiaoqian; Kim, Jihoon et al. (2014) Detecting Inappropriate Access to Electronic Health Records Using Collaborative Filtering. Mach Learn 95:87-101
Wang, Shuang; Jiang, Xiaoqian; Wu, Yuan et al. (2013) EXpectation Propagation LOgistic REgRession (EXPLORER): distributed privacy-preserving online model learning. J Biomed Inform 46:480-96
Gardner, James; Xiong, Li; Xiao, Yonghui et al. (2013) SHARE: system design and case studies for statistical health information release. J Am Med Inform Assoc 20:109-16
Vaidya, Jaideep; Shafiq, Basit; Jiang, Xiaoqian et al. (2013) Identifying inference attacks against healthcare data repositories. AMIA Jt Summits Transl Sci Proc 2013:262-6
Roozgard, Aminmohammad; Barzigar, Nafise; Wang, Shuang et al. (2013) Nucleotide sequence alignment using sparse coding and belief propagation. Conf Proc IEEE Eng Med Biol Soc 2013:588-91