CC*DNI DIBBs: Data Analysis and Management Building Blocks for Multi-Campus Cyberinfrastructure through Cloud Federation

Lifka, David; Wolski, Richard; Furlani, Thomas

Abstract

The ability to aggregate, share, and analyze important large data sets while optimizing time-to-science is essential to support multi-disciplinary and multi-institutional data-driven discovery. This project is deploying a federated cloud computing system in New York State and California composed of data infrastructure building blocks designed to support scientists requiring flexible workflows and analysis tools for large-scale data sets. Data challenges from seven different communities (earth and atmospheric sciences, finance, chemistry, astronomy, civil engineering, genomics, and food science) are being addressed using a rich set of open source software, optimized frameworks, and cloud usage modalities. The federated cloud is operating at Cornell University (project lead) and at partner sites at the University at Buffalo and the University of California, Santa Barbara. The project team is supporting multi-disciplinary research groups with over forty global collaborators and documenting science use cases. The broader goal of this project is to develop a federated cloud model that encourages and rewards institutions for sharing large-scale data analysis resources that can be expanded internally with common, incremental building blocks and externally through meaningful collaborations with other institutions, public clouds, and NSF cloud resources.

Project documentation and webinars cover best practices, including how to create virtual machine instances, run at federated sites, burst to Amazon Web Services, and access, move, and store large-scale data. A new tool for cloud metrics is being built into Open XDMoD (XD Net Metrics on Demand) that features QBETS (Queue Bounds Estimation from Time Series) statistics to enable users to make online forecasts of future performance and allocation-level availability as well as to predict when to burst from federation resources. A new allocations and accounting model allows institutional administrators to track utilization across federated sites and use this data as an exchange mechanism. These tools provide a better understanding of how the sharing of data infrastructure building block capacity across institutional boundaries can create wider science and engineering collaborations and increase data sharing in a scalable and sustainable way.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Cooperative Agreement (Coop)
Application #: 1541215
Program Officer: Amy Walton

Budget Start: 2015-10-01
Budget End: 2021-09-30
Fiscal Year: 2015
Total Cost: $8,229,079

Institution

Cornell University, Ithaca, NY, United States
