Scientists are often stymied in their research because relevant data are inaccessible. In addition, many data owners silo data away from powerful, economical cloud computing resources because of privacy and confidentiality concerns. This project enables data scientists to compute statistics over protected datasets while empowering the owners of the underlying datasets to maintain control over how their data are used in computations and viewed by other people. The work also brings a cryptographically secure computing engine to one of the largest collections of small- to medium-sized research data in the world, running on a federated datacenter operated by multiple mutually distrusting vendors. In doing so, the project improves the flow of information, promotes transparency and accountability for data sharing and processing decisions, and reduces tenants' need to trust the cloud's behavior through cryptographic protections for confidentiality and integrity. The project enables scientific research computing on workflows involving collaborative experiments or the replication and extension of existing results when the underlying data are encumbered by privacy concerns.

To accomplish this goal and enhance the economic potential of the cloud, the researchers and engineers on this project integrate and enhance three technologies they have previously developed. First, the Dataverse data management infrastructure houses, curates, and indexes social, physical, and life science data. Second, the Massachusetts Open Cloud (MOC) is a computing environment designed from the ground up to promote user control and flexibility over trust decisions. Third, Conclave compiles legacy code into a cryptographically secure multi-party computation program that can be executed on top of existing data processing frameworks like Hadoop and Spark. This project develops and open-sources the cyberinfrastructure needed to integrate these technologies and provide a combined "secure computing element" into which data and analytics may be inserted and from which the resulting answers are fed back. This secure computing element incorporates several designs: (i) policy-agnostic programming to ensure that legacy code may be accepted, (ii) the MOC's isolation mechanism to ensure that data owners may choose exactly which environment to entrust with their data, (iii) Conclave to hide the source data from everyone other than the intended recipient (even the cloud itself), (iv) a policy engine to ensure that the data owner consents to the requested analytic, (v) Dataverse's data classification engine to manage access control over source and derived data, and (vi) a new auditing and billing mechanism to promote transparency, punish those who exceed their privileges, and provide a sustainable economic model for growth.
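
To make the idea of computing a statistic without revealing the source data concrete, the following is a minimal sketch of additive secret sharing, one common building block of secure multi-party computation. It is an illustration only and is not Conclave's actual protocol or API: the function names, the modulus, and the single-process simulation of the parties are all assumptions introduced here. Inputs are assumed to be non-negative integers whose total is smaller than the modulus.

```python
import secrets

# Hypothetical modulus chosen for illustration; all arithmetic is done mod this value.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split a non-negative integer into n additive shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the value they encode."""
    return sum(shares) % MODULUS


def secure_sum(private_inputs):
    """Simulate, in one process, parties jointly computing the sum of their inputs.

    Each data owner splits its input into one share per party. Each party adds
    up the shares it holds, and only those partial sums are combined, so no
    single party ever sees another owner's raw input.
    """
    n = len(private_inputs)
    # distributed[i][j] is the share of owner i's input that is held by party j.
    distributed = [share(v, n) for v in private_inputs]
    # Each party j locally sums the shares it received.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # The intended recipient combines the partial sums to learn only the total.
    return reconstruct(partial_sums)
```

Under these assumptions, calling secure_sum on three owners' private values returns their total, while each individual share is a uniformly random number that reveals nothing about the input it came from.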

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 1739000
Program Officer: Robert Beverly
Budget Start: 2017-09-01
Budget End: 2020-08-31
Fiscal Year: 2017
Total Cost: $1,002,988
Name: Boston University
City: Boston
State: MA
Country: United States
Zip Code: 02215