The most powerful computing systems in the world have historically been dedicated to solving scientific problems. Until recently, the computations performed by these systems have typically been simulations of various physical phenomena. However, a new paradigm for scientific discovery has been steadily rising in importance, namely, data-intensive science, which focuses sophisticated analysis techniques on the enormous (and ever increasing) amounts of data being produced in scientific, commercial, and social endeavors. Important research based on data-intensive science include areas as diverse as knowledge discovery, bioinformatics, proteomics and genomics, data mining and search, electronic design automation, computer vision, and Internet routing. Unfortunately, the computational approaches needed for data-intensive science differ markedly from those that have been so effective for simulation-based supercomputing. To enable and facilitate efficient execution of data-intensive scientific problems, this project will develop a comprehensive hardware and software supercomputing system for data-intensive science.

Graph algorithms and data structures are fundamental to data-intensive computations and, consequently, this project is focused on providing fundamental, new understandings of the basics of large-scale graph processing and how to build scalable systems to efficiently solve large-scale graph problems. In particular, this work will characterize processing overheads and the limits of graph processing scalability, develop performance models that properly capture graph algorithms, define the (co-design) process for developing graph-specific hardware, and experimentally verify our approach with a prototype execution environment. Key capabilities of our system include: a novel fine-grained parallel programming model, a scalable library of graph algorithms and data structures, a graph-optimized core architecture, and a scalable graph execution platform. The project will also address the programming challenges involved in constructing scalable and reliable software for data-intensive problems.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Application #
1111888
Program Officer
Almadena Y. Chtchelkanova
Project Start
Project End
Budget Start
2011-08-01
Budget End
2015-07-31
Support Year
Fiscal Year
2011
Total Cost
$811,941
Indirect Cost
Name
Indiana University
Department
Type
DUNS #
City
Bloomington
State
IN
Country
United States
Zip Code
47401