Machine learning on large-scale patient medical records can lead to the discovery of novel population-wide patterns enabling advances in genetics, disease mechanisms, drug discovery, healthcare policy, and public health. However, concerns over patient privacy prevent biomedical researchers from running their algorithms on large volumes of patient data, creating a barrier to important new discoveries through machine-learning.

The goal of this project is to address this barrier by developing privacy-preserving tools to query, cluster, classify and analyze medical databases. In particular, the project aims to ensure differential privacy --- a formal mathematical notion of privacy designed by cryptographers which has gained considerable attention in the systems, algorithms, machine-learning and data-mining communities in recent years. The primary challenge in applying differentially-private machine learning tools to biomedical informatics is the lack of statistical efficiency, or the large number of samples required.

The project will overcome this challenge by drawing on insights obtained from the PI's expertise to develop differentially-private and highly statistically-efficient machine learning tools for classification and clustering. The proposed research will advance the state-of-the-art in privacy-preserving data analysis by combining insights from differential privacy, statistics, machine learning, and database algorithms.

The proposed research is closely tied to the development of the undergraduate and graduate curricula at UCSD, feeding into the PI's new undergraduate machine learning class, a new graduate learning theory class, and updates to an algorithm design and analysis class. The corresponding materials will be publicly disseminated through the PI's website. The PI is strongly committed to increasing the participation of women and minorities, and will engage in outreach activities to attract and retain women in computer science.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
1253942
Program Officer
Rebecca Hwa
Project Start
Project End
Budget Start
2013-07-01
Budget End
2019-06-30
Support Year
Fiscal Year
2012
Total Cost
$490,625
Indirect Cost
Name
University of California San Diego
Department
Type
DUNS #
City
La Jolla
State
CA
Country
United States
Zip Code
92093