Computer-assisted medicine is at a crossroads: medical care requires accurate data, but making such data widely available can create unacceptable risks to the privacy of individual patients. This tension between utility and privacy is especially acute in predictive personalized medicine (PPM). PPM holds the promise of tailoring treatment decisions to the individual based on her or his particular genetics and clinical history. Making PPM a reality requires running statistical, data mining and machine learning algorithms on combined genetic, clinical and demographic data to construct predictive models. Access to such data competes directly with the need for healthcare providers to protect the privacy of each patient's data, creating a tradeoff between model efficacy and privacy. We therefore find ourselves in an unfortunate standoff: significant medical advances that would result from more powerful mining of the data by a wider variety of researchers are hindered by significant privacy concerns on behalf of the patients represented in the data set. In this proposed work, we seek to develop and evaluate technology to resolve this standoff, enabling health practitioners and researchers to compute on privacy-sensitive medical records in order to make treatment decisions or create accurate models, while protecting patient privacy. We will evaluate our approach on an actual, de-identified electronic medical record data set, with an average of 29 years of clinical history on each patient, and with detailed genetic data (650K SNPs) available for a subset of 5000 of the patients. This data set is available to us now through the Wisconsin Genomics Initiative, but only on a computer at the Marshfield Clinic. If successful, our approach will make possible the sharing of this cutting-edge data set, and others like it that are now in development, and will allow us to analyze this data at UW-Madison, where we have thousands of processors available in our Condor pool.
Our privacy approach integrates secure data access environments, including those appropriate to the use of laptops and cloud computing, with novel anonymization algorithms providing differential privacy guarantees for data and/or published results of data analysis. To this end, our specific aims are as follows:
AIM 1: Develop and deploy a secure local environment that, in combination with secure network functionality, will ensure end-to-end security and privacy for electronic medical records and biomedical datasets shared between clinical institutions and researchers.
AIM 2: Develop and deploy a secure virtual environment to allow large-scale, privacy-preserving data analysis "in the cloud."
AIM 3: Develop and evaluate privacy-preserving data mining algorithms for use with original (not anonymized) data sets consisting of electronic medical records and genetic data.
AIM 4: Develop and evaluate anonymizing data publishing algorithms and privacy guarantees that are appropriate to the complex structure present in electronic medical records with genetic data.
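
To make the notion of a differential privacy guarantee for published analysis results concrete, the following is a minimal sketch of the standard Laplace mechanism applied to a counting query. It is an illustration only and is not the project's algorithm; the function names `laplace_noise` and `dp_count`, the `carrier` field, and the toy records are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two
    independent exponential samples with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.

    Adding or removing one patient changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is used.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy records; not drawn from any real data set.
patients = [
    {"id": 1, "carrier": True},
    {"id": 2, "carrier": False},
    {"id": 3, "carrier": True},
    {"id": 4, "carrier": True},
    {"id": 5, "carrier": False},
]

rng = random.Random(0)  # seeded only so the sketch is repeatable
noisy = dp_count(patients, lambda p: p["carrier"], epsilon=0.5, rng=rng)
print(f"noisy carrier count: {noisy:.2f}")
```

A smaller `epsilon` gives stronger privacy but noisier answers, which is the efficacy and privacy tradeoff described in the abstract. A real deployment would also need to track the privacy budget across repeated queries and handle floating-point subtleties that this sketch ignores.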

Public Health Relevance

This project will develop an integrated approach to secure sharing of clinical and genetic data that is based on algorithms for anonymization of data to achieve differential privacy guarantees, on privacy-preserving publication of data analysis results, and on secure environments for data sharing that address the increasing use of laptops and of cloud computing. The end goal of this project is to meet the competing demands of providing patients with both privacy and accurate predictive models based on clinical history and genetics. This project includes the first concrete evaluation of privacy-preserving data mining algorithms on actual combined EMR and genetic data, using the Wisconsin Genomics Initiative data set.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Special Emphasis Panel (ZLM1-ZH-C (J2))
Program Officer
Sim, Hua-Chuan
University of Wisconsin Madison
Biostatistics & Other Math Sci
Schools of Medicine
United States
Burnside, Elizabeth S; Liu, Jie; Wu, Yirong et al. (2016) Comparing Mammography Abnormality Features to Genetic Variants in the Prediction of Breast Cancer in Women Recommended for Breast Biopsy. Acad Radiol 23:62-9
Fan, Jun; Wu, Yirong; Yuan, Ming et al. (2016) Structure-Leveraged Methods in Breast Cancer Risk Prediction. J Mach Learn Res 17:
Wu, Yirong; Abbey, Craig K; Liu, Jie et al. (2016) Discriminatory power of common genetic variants in personalized breast cancer diagnosis. Proc SPIE Int Soc Opt Eng 9787:
Ye, Zhan; Mayer, John; Ivacic, Lynn et al. (2015) Phenome-wide association studies (PheWASs) for functional variants. Eur J Hum Genet 23:523-9
Wu, Yirong; Liu, Jie; Del Rio, Alejandro Munoz et al. (2015) Developing a clinical utility framework to evaluate prediction models in radiogenomics. Proc SPIE Int Soc Opt Eng 9416:
Liu, Jie; Wu, Yirong; Ong, Irene et al. (2015) Leveraging Interaction between Genetic Variants and Mammographic Findings for Personalized Breast Cancer Diagnosis. AMIA Jt Summits Transl Sci Proc 2015:107-11
Benndorf, Matthias; Burnside, Elizabeth S; Herda, Christoph et al. (2015) External validation of a publicly available computer assisted diagnostic tool for mammographic mass lesions with two high prevalence research datasets. Med Phys 42:4987-96
Benndorf, Matthias; Kotter, Elmar; Langer, Mathias et al. (2015) Development of an online, publicly accessible naive Bayesian decision support tool for mammographic mass lesions based on the American College of Radiology (ACR) BI-RADS lexicon. Eur Radiol 25:1768-75
Wu, Yirong; Abbey, Craig K; Chen, Xianqiao et al. (2015) Developing a utility decision framework to evaluate predictive models in breast cancer risk estimation. J Med Imaging (Bellingham) 2:041005
Kuusisto, Finn; Dutra, Inês; Elezaby, Mai et al. (2015) Leveraging Expert Knowledge to Improve Machine-Learned Decision Support Systems. AMIA Jt Summits Transl Sci Proc 2015:87-91

Showing the most recent 10 out of 33 publications