CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science

Fox, Geoffrey; Marathe, Madhav; Jha, Shantenu; Qiu, Judy; Wang, Fusheng

Abstract

Many scientific problems depend on the ability to analyze and compute on large amounts of data. This analysis often does not scale well; its effectiveness is hampered by the increasing volume, variety and rate of change (velocity) of big data. This project will design, develop and implement building blocks that enable a fundamental improvement in the ability to support data-intensive analysis on a broad range of cyberinfrastructure, including that supported by NSF for the scientific community. The project will integrate features of traditional high-performance computing, such as scientific libraries, communication and resource management middleware, with the rich set of capabilities found in the commercial Big Data ecosystem. The latter includes many important software systems such as Hadoop, available from the Apache open source community. A collaboration between university teams at Arizona, Emory, Indiana (lead), Kansas, Rutgers, Virginia Tech, and Utah provides the broad expertise needed to design and successfully execute the project. The project will engage scientists and educators through annual workshops and activities at discipline-specific meetings, to gather both requirements for and feedback on its software. It will include under-represented communities through summer experiences, and will develop curriculum modules that include demonstrations built as 'Data Analytics as a Service.'

The project will design and implement the Middleware for Data-Intensive Analytics and Science (MIDAS), software that will enable scalable applications with the performance of HPC (High Performance Computing) and the rich functionality of the commodity Apache Big Data Stack. Further, this project will design and implement a set of cross-cutting high-performance data-analysis libraries; SPIDAL (Scalable Parallel Interoperable Data Analytics Library) will support new programming and execution models for data-intensive analysis in a wide range of science and engineering applications. The project addresses major data challenges in seven communities: Biomolecular Simulations, Network and Computational Social Science, Epidemiology, Computer Vision, Spatial Geographical Information Systems, Remote Sensing for Polar Science, and Pathology Informatics. The project libraries will have the same beneficial impact on data analytics that scientific libraries such as PETSc, MPI and ScaLAPACK have had for supercomputer simulations. These libraries will be implemented to be scalable and interoperable across a range of computing systems including clouds, clusters and supercomputers.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 1443054
Program Officer: Amy Walton

Budget Start: 2014-10-01
Budget End: 2021-09-30
Fiscal Year: 2014
Total Cost: $5,283,170

Institution: Indiana University, Bloomington, IN, United States
