Computer-assisted medicine is at a crossroads: medical care requires accurate data, but making such data widely available can create unacceptable risks to the privacy of individual patients. This tension between utility and privacy is especially acute in predictive personalized medicine (PPM). PPM holds the promise of tailoring treatment decisions to the individual based on his or her particular genetics and clinical history. Making PPM a reality requires running statistical, data mining and machine learning algorithms on combined genetic, clinical and demographic data to construct predictive models. Access to such data directly competes with the need for healthcare providers to protect the privacy of each patient's data, creating a tradeoff between model efficacy and privacy. The result is an unfortunate standoff: significant medical advances that would result from more powerful mining of the data by a wider variety of researchers are hindered by significant privacy concerns on behalf of the patients represented in the data set. In this proposed work, we seek to develop and evaluate technology to resolve this standoff, enabling health practitioners and researchers to compute on privacy-sensitive medical records in order to make treatment decisions or create accurate models, while protecting patient privacy. We will evaluate our approach on an actual de-identified electronic medical record, with an average of 29 years of clinical history on each patient, and with detailed genetic data (650K SNPs) available for a subset of 5000 of the patients. This data set is available to us now through the Wisconsin Genomics Initiative, but only on a computer at the Marshfield Clinic. If successful, our approach will make it possible to share this cutting-edge data set, and others like it that are now in development, and will allow us to analyze this data at UW-Madison, where we have thousands of processors available in our Condor pool.
Our privacy approach integrates secure data access environments, including those appropriate to the use of laptops and cloud computing, with novel anonymization algorithms providing differential privacy guarantees for data and/or published results of data analysis. To this end, our specific aims are as follows:
AIM 1 : Develop and deploy a secure local environment that, in combination with secure network functionality, will ensure end-to-end security and privacy for electronic medical records and biomedical datasets shared between clinical institutions and researchers.
AIM 2 : Develop and deploy a secure virtual environment to allow large-scale, privacy-preserving data analysis "in the cloud."
AIM 3 : Develop and evaluate privacy-preserving data mining algorithms for use with original (not anonymized) data sets consisting of electronic medical records and genetic data.
AIM 4 : Develop and evaluate anonymizing data publishing algorithms and privacy guarantees that are appropriate to the complex structure present in electronic medical records with genetic data.
This project will develop an integrated approach to secure sharing of clinical and genetic data that is based on algorithms for anonymization of data to achieve differential privacy guarantees, algorithms for privacy-preserving publication of data analysis results, and secure environments for data sharing that address the increasing use of laptops and of cloud computing. The end goal of this project is to meet the competing demands of providing patients with both privacy and accurate predictive models based on clinical history and genetics. This project includes the first concrete evaluation of privacy-preserving data mining algorithms on actual combined EMR and genetic data, using the Wisconsin Genomics Initiative data set.