Proposal #: CNS 07-09140 07-08307 07-08820 PI(s): Brockman, Jay B. Bader, David A. Gao, Guang R. Barabasi,Albert-Laszlo;Chawla,Nitesh;Kogge,PeterM. Vetter, Jeffrey S. Institution: University of Notre Dame Georgia Institute Tech U.Delaware Notre Dame, IN 46556-5602 Atlanta, GA 30332-0002 Newark, DE 19716-1551 Proposal #: CNS 07-09385 07-09111 07-09254 PI(s): Gilbert, John R. Upchurch, Edwin T. Yelick, Katherine A. Wolski, Richard. Institution: UC-Santa Barbara California Inst Tech UC-Berkeley Santa Barbara, CA 93106-2050 Pasadena, CA 91125-0600 Berkeley, CA 94704-5940 Title: Colla Rsch:IAD:Dev Rsch Infr. for Multithreaded Computing Community Using Cray Eldorado Platform
Project Proposed:
This collaborative project, developing a shared infrastructure needed to broaden its impact for developing software to run on the next generation of computer hardware, brings a diverse group of researchers from six universities in a joint effort. The work responds to the trend towards multicore processors where developers envision placing tens to hundreds of cores on a single die, each running multiple threads (in contrast to the currently dominant message-passing architectures resulting from the advent of MPI and Linux clusters). Three objectives are proposed: . Acquiring computer hardware as a shared community resource capable of efficiently running, in experimental and production modes, complex programs with thousands of threads in shared memory; . Assembling software infrastructure for developing and measuring performance of programs running on the hardware; and . Building stronger ties between the people themselves, creating ways for researchers at the partner institutions to collaborate and communicate their findings to the broader community. The Cray XMT system, scheduled for delivery in 2007 serves as an ideal platform. The second bullet includes algorithms, data sets, libraries, languages, tools, and simulators to evaluate performance of program running on the hardware focusing on applications that benefit from large numbers of threats, massively data intensive, ""sparse-graph"" problems that are difficult to parallelize using conventional message-passing on clusters. Each university contributes a piece to the infrastructure, using it for support of projects. Sandia National Laboratories has agreed to host the system and provide supplementary funding. Each university will use the Cray XMT system in courses.
Broader Impacts: The infrastructure measures performance providing a basis for the community to improve sharin, and build strong ties for collaboration and communication. Courses will be created and materials will be made available. Workshops for dissemination of the findings are also planned.
On this research project, we acquired a Cray XMT massively multithreaded supercomputing platform, and performed basic research on developing algorithms for big data challenges. Generalizing k-Betweenness Centrality Using Short Paths and a Parallel Multithreaded Implementation: We present a new parallel algorithm that extends and generalizes the traditional graph analysis metric of betweenness centrality to include additional non-shortest paths according to an input parameter k. Betweenness centrality is a useful kernel for analyzing the importance of vertices or edges in a graph and has found uses in social networks, biological networks, and power grids, among others. k-betweenness centrality captures the additional information provided by paths whose length is within k units of the shortest path length. These additional paths provide robustness that is not captured in traditional betweenness centrality computations, and they may become important shortest paths if key edges are missing in the data. We implement our parallel algorithm using lock-free methods on a massively multithreaded Cray XMT. We apply this implementation to a real-world data set of pages on the World Wide Web and show the importance of the additional data incorporated by our algorithm. Massive Streaming Data Analytics: A Case Study with Clustering Coefficients: We present a new approach for parallel massive graph analysis of streaming, temporal data with a dynamic and extensible representation. Handling the constant stream of new data from health care, security, business, and social network applications requires new algorithms and data structures. We examine data structure and algorithm trade-offs that extract the parallelism necessary for high-performance updating analysis of massive graphs. Static analysis kernels often rely on storing input data in a specific structure. Maintaining these structures for each possible kernel with high data rates incurs a significant performance cost. A case study computing clustering coefficients on a general-purpose data structure demonstrates incremental updates can be more efficient than global recomputation. Within this kernel, we compare three methods for dynamically updating local clustering coefficients: a brute-force local recalculation, a sorting algorithm, and our new approximation method using a Bloom filter. On 32 processors of a Cray XMT with a synthetic scale-free graph of 16 million vertices and 537 million edges, the brute-force method processes a mean of over 50,000 updates per second and our Bloom filter approaches 200,000 updates per second.