Advances in DNA sequencing technologies are expected to make a massive amount of human genomic data available in the years to come, and processing these data will require computation on a tremendous scale. This demand cannot be met by today's commercial clouds, which do not provide strong privacy guarantees, nor by existing cryptographic techniques, which still cannot achieve the performance required for big-data analytics. The emergence of new-generation hardware support for trusted execution environments presents a new opportunity for scalable data protection, through processors designed to withstand attacks even from a fully compromised operating system. To seize this opportunity, this project aims to develop a distributed, parallel computing framework critical for executing data-intensive tasks on systems that support trusted execution environments. This research will be performed in collaboration with Intel, which will transfer the new methods developed for large-scale data protection to industry and genomic researchers. Students from historically black colleges and universities will participate in the work.

The project focuses on developing a big-data analytics framework built on Intel Software Guard Extensions (SGX) and applying it to support privacy-preserving, large-scale genomic data analyses and other computing tasks. Building on an understanding of the unique performance impacts of SGX systems, including those incurred by enclave creation, enclave management, trust establishment, and cross-enclave communication, among others, the project will build a new MPI-based cluster computing framework that automatically optimizes the deployment of computing nodes across enclaves and CPU packages under resource constraints. The framework will support a set of fundamental genomic computing tasks, ranging from read mapping to peptide identification, as well as machine-learning-based models. Its potential risks, side-channel leaks in particular, will also be analyzed and controlled to provide strong privacy assurance. The work will enable broad sharing of previously inaccessible data and help drive new insights into individualized health care.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1838083
Program Officer: Sylvia Spengler
Budget Start: 2019-01-01
Budget End: 2021-12-31
Fiscal Year: 2018
Total Cost: $1,000,000
Organization: Indiana University
City: Bloomington
State: IN
Country: United States
Zip Code: 47401