There is increasing concern about disclosing sensitive information when clinical data are disseminated, given the potential for breach of individual privacy. At the same time, data sharing has become critical to accelerating biomedical research and healthcare quality improvement. We will develop new methods for privacy protection that can adapt to the amount of data being disseminated and to the sensitivity of certain variables.
Our first aim is to measure the fine-grained privacy risk of individual records in patient subpopulations. This index can be used to monitor and customize privacy protection of individual clinical records and to help prioritize efforts in privacy protection.
The second aim is to develop a new and practical method to support privacy-preserving data dissemination in both centralized and distributed environments, with or without knowledge of which analytic techniques will be applied to the disclosed data.
The third aim is to speed up privacy-preserving algorithms through advanced parallelization techniques. If successful, these new methods will allow privacy protection for the dissemination and analysis of large data sets in real time.
These aims are faithful to the mission of the National Library of Medicine, and they are tightly related to the mentors' efforts in leading the development of trustworthy data sharing and individualized predictive models as part of the National Center for Biomedical Computing (NCBC), iDASH (integrating Data for Analysis, Anonymization, and SHaring). The applicant wishes to use this funding opportunity to complement his computer science skills with biomedical knowledge and specialized training in parallel computing, in order to investigate new algorithms for privacy protection in disseminated data. Success in this project will advance his long-term goal of becoming an independently funded investigator and joining the core faculty of the Division of Biomedical Informatics at UCSD.
There are important trade-offs between disseminating clinical and genetic data for societal benefit and protecting personal privacy. We will develop practical solutions to address fine-grained trade-offs between privacy and usability, provide multi-resolution protection to satisfy the needs of different stakeholders, and accelerate privacy-preserving algorithms to support efficient data anonymization, analysis, and sharing.
|Zhao, Yongan; Wang, Xiaofeng; Jiang, Xiaoqian et al. (2015) Choosing blindly but wisely: differentially private solicitation of DNA datasets for disease marker discovery. J Am Med Inform Assoc 22:100-8|
|Ji, Zhanglong; Jiang, Xiaoqian; Wang, Shuang et al. (2014) Differentially private distributed logistic regression using private and public data. BMC Med Genomics 7 Suppl 1:S14|
|Li, Haoran; Xiong, Li; Jiang, Xiaoqian (2014) Differentially Private Synthesization of Multi-Dimensional Data using Copula Functions. Adv Database Technol 2014:475-486|
|Li, Haoran; Xiong, Li; Ohno-Machado, Lucila et al. (2014) Privacy preserving RBF kernel support vector machine. Biomed Res Int 2014:827371|
|Li, Pinghao; Jiang, Xiaoqian; Wang, Shuang et al. (2014) HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads. J Am Med Inform Assoc 21:363-73|
|Menon, Aditya Krishna; Jiang, Xiaoqian; Kim, Jihoon et al. (2014) Detecting Inappropriate Access to Electronic Health Records Using Collaborative Filtering. Mach Learn 95:87-101|
|Jiang, Wenchao; Li, Pinghao; Wang, Shuang et al. (2013) WebGLORE: a web service for Grid LOgistic REgression. Bioinformatics 29:3238-40|
|Li, Pinghao; Wang, Shuang; Kim, Jihoon et al. (2013) DNA-COMPACT: DNA COMpression based on a pattern-aware contextual modeling technique. PLoS One 8:e80377|
|Roozgard, Aminmohammad; Barzigar, Nafise; Wang, Shuang et al. (2013) Nucleotide sequence alignment using sparse coding and belief propagation. Conf Proc IEEE Eng Med Biol Soc 2013:588-91|