The rapid development of Next Generation Sequencing (NGS) technologies has significantly reduced the cost of producing DNA data. As a result, genome sequencing may soon become a routine tool for clinical diagnosis and therapy selection. Meanwhile, the demand for large-scale meta-analysis of human genomic data from patients with various diseases is expected to grow substantially in the near future. However, the effort to meet this demand has not benefited from the progress in sequencing technologies, for two reasons: storing and analyzing NGS data requires massive computational resources, and researchers must follow complicated procedures to gain access to the data, which are in place to protect the privacy of human subjects. To address these challenges and facilitate DNA data sharing that is both secure and convenient, we propose to study and develop a suite of innovative and transformative techniques aimed at achieving practical and cost-effective genomic data protection. Using these techniques, an NIH data center can offer a centralized analysis service on the genome data it hosts, execute the analysis programs submitted by data users, and control the release of analysis outcomes to protect the privacy of DNA donors. Our techniques will also help the center outsource computation tasks that it lacks the resources to handle to computing systems rented locally and remotely, in a highly privacy-preserving manner. The proposed research will be conducted in close collaboration with iDASH, a National Center for Biomedical Computing for "integrating Data for Analysis, Anonymization and Sharing", using its data to evaluate our techniques and its infrastructure to deploy them.

Public Health Relevance

Collaborating with iDASH, a National Center for Biomedical Computing for "integrating Data for Analysis, Anonymization and Sharing", we will develop innovative and practical techniques for protecting the privacy of human subjects in the large-scale analysis of human genome sequencing data. These techniques will significantly reduce the cost of human genome research, help overcome barriers to data access, and ultimately accelerate translational research in human genomics and the discovery of novel diagnostic tools based on genomic techniques.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Sofia, Heidi J
Indiana University Bloomington
Other Domestic Higher Education
United States
Farhan, Wael; Wang, Zhimu; Huang, Yingxiang et al. (2016) A Predictive Model for Medical Events Based on Contextual Embedding of Temporal Sequences. JMIR Med Inform 4:e39
Li, Sujun; Bandeira, Nuno; Wang, Xiaofeng et al. (2016) On the privacy risks of sharing clinical proteomics data. AMIA Jt Summits Transl Sci Proc 2016:122-31
Tang, Haixu; Jiang, Xiaoqian; Wang, Xiaofeng et al. (2016) Protecting genomic data analytics in the cloud: state of the art and opportunities. BMC Med Genomics 9:63
Shi, Haoyi; Jiang, Chao; Dai, Wenrui et al. (2016) Secure Multi-pArty Computation Grid LOgistic REgression (SMAC-GLORE). BMC Med Inform Decis Mak 16 Suppl 3:89
Yang, Lei; Wang, Shuang; Jiang, Xiaoqian et al. (2016) PATTERN: Pain Assessment for paTients who can't TEll using Restricted Boltzmann machiNe. BMC Med Inform Decis Mak 16 Suppl 3:73
Han, Dong; Wang, Shuang; Jiang, Chao et al. (2015) Trends in biomedical informatics: automated topic analysis of JAMIA articles. J Am Med Inform Assoc 22:1153-63
Zhao, Yongan; Wang, XiaoFeng; Tang, Haixu (2015) Secure Genomic Computation through Site-Wise Encryption. AMIA Jt Summits Transl Sci Proc 2015:227-31
Zhao, Yongan; Wang, Xiaofeng; Jiang, Xiaoqian et al. (2015) Choosing blindly but wisely: differentially private solicitation of DNA datasets for disease marker discovery. J Am Med Inform Assoc 22:100-8
Jiang, Xiaoqian; Zhao, Yongan; Wang, Xiaofeng et al. (2014) A community assessment of privacy preserving techniques for human genomes. BMC Med Inform Decis Mak 14 Suppl 1:S1
Wang, Shuang; Mohammed, Noman; Chen, Rui (2014) Differentially private genome data dissemination through top-down specialization. BMC Med Inform Decis Mak 14 Suppl 1:S2
