SHF: Small: Collaborative Research: Modeling and Analyzing Big Data on Peta- and Exascale Distributed Systems Supported by MapReduce Methodologies

Cicotti, Pietro

Abstract

Current petascale platforms can perform large-scale simulations and generate massive amounts of data at unprecedented rates. These rates are expected to increase as exascale platforms are introduced. The generation of ever more data presents new challenges for scientists, who struggle with the analysis, sorting, and selection of scientifically meaningful results. When very large numbers of data records are spread across many nodes of a distributed-memory system, even a small number of comparisons between records can be costly or impossible. Therefore, new methodologies are necessary to analyze large scientific datasets at scale.

The goal of this project is to develop a transformative analysis method that models the properties of large scientific datasets in a distributed manner on petascale systems today and on exascale systems in the future. The research activity includes (1) the design of new algorithms that encode, in parallel, properties embedded in distributed data by using space-reduction techniques; (2) the design of new algorithms for clustering and classifying these properties by using distributed paradigms such as MapReduce; (3) the deployment of the algorithms on diverse datasets in structural biology and astronomy; and (4) the tuning of the algorithms for both performance and accuracy of results on emerging storage technologies.
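The abstract does not specify which encoding or clustering algorithms the project uses. As an illustration only, the Python sketch below shows the general pattern behind items (1) and (2): each node reduces its local records to a compact property, and a MapReduce-style map/reduce pass groups those properties without comparing records across nodes. The block-averaging encoding (`encode`), the grid-cell binning, the `cell_width` and `min_count` parameters, and all function names are hypothetical assumptions, not the project's method.

```python
from collections import Counter
from typing import Iterable, Sequence

Cell = tuple[int, ...]  # hypothetical cluster key: integer grid-cell coordinates


def encode(record: Sequence[float], dims: int = 2) -> list[float]:
    """Hypothetical space reduction: split a record into `dims` equal blocks
    and keep each block's mean (trailing elements that don't fill a block are dropped)."""
    size = len(record) // dims
    if size == 0:
        raise ValueError("record must have at least `dims` values")
    return [sum(record[i * size:(i + 1) * size]) / size for i in range(dims)]


def map_partition(records: Iterable[Sequence[float]], cell_width: float) -> Counter:
    """Map phase, run locally on one node: encode each record, bin the
    reduced point into a grid cell, and emit partial (cell, count) pairs.
    No record is compared with records held by other nodes."""
    counts: Counter = Counter()
    for record in records:
        point = encode(record)
        cell = tuple(int(x // cell_width) for x in point)
        counts[cell] += 1
    return counts


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce phase: sum the per-node partial counts for each cell key."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


def dense_cells(total: Counter, min_count: int) -> list[Cell]:
    """Treat cells holding at least `min_count` records as candidate clusters."""
    return sorted(cell for cell, n in total.items() if n >= min_count)


if __name__ == "__main__":
    # Two simulated nodes, each holding its own partition of 4-value records.
    nodes = [
        [[0.2, 0.3, 1.2, 1.3], [0.3, 0.2, 1.3, 1.2]],
        [[0.25, 0.25, 1.25, 1.25], [5.1, 5.1, 7.1, 7.1]],
    ]
    total = reduce_counts(map_partition(p, cell_width=0.5) for p in nodes)
    print(dense_cells(total, min_count=2))  # prints [(0, 2)]
```

Binning into fixed cells is one simple way to replace all-pairs comparisons with a key-based grouping that MapReduce can shuffle and sum; a real deployment would choose the encoding and the clustering criterion to fit the scientific data.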

The analysis method will provide the scientific community with infrastructure and instrumentation to identify features that can be used to predict class membership, find recurrent patterns in datasets, and identify class membership from a specific feature or property. By capturing scientific information effectively, accurately, and scalably, this infrastructure and instrumentation will break the traditional constraint of data centralization and allow scientists to overcome the difficulties associated with the fully distributed nature of the data.

The project's educational component promotes training and learning in computational modeling and analysis techniques as well as data-intensive algorithms and platforms by involving undergraduate and graduate students in research activities and integrating big data analytics into the undergraduate curriculum at the University of Delaware. The research-based educational materials developed in this project will be made available to the scientific community through the project portal and through tutorials at XSEDE and Supercomputing (SC) conferences.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Computing and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 1318417
Program Officer: Almadena Chtchelkanova

Budget Start: 2013-09-01
Budget End: 2017-08-31
Fiscal Year: 2013
Total Cost: $69,038

Institution

University of California San Diego, La Jolla, CA, United States