This award adds 12 cabinets of compute nodes to Kraken, contributing an additional 144 TF and 18 TB of memory and taking the system to 1,174 TF and 147 TB. The 1,152 new nodes will increase the node count from 8,256 to 9,408. With this addition, Kraken will have significant room left to run other jobs simultaneously with 8,192-node jobs; indeed, it will be possible to run an 8,192-node job and a 1,024-node job at the same time. Given the "bi-modal" distribution of user jobs, with most job sizes either at 8,192 nodes and above or at 1,024 nodes and below, this will greatly improve the responsiveness of the machine for users. The Kraken upgrade will allow users to extend their codes by scaling them up to the largest NSF supercomputer, with over 110,000 cores, implementing new models that could not have been run before this upgrade. A number of users are already scaling their applications to the full size of the current Kraken, and some of them are already prepared to use a larger system. Additionally, this will help facilitate scaling for applications already chosen to run on the Track 1 Blue Waters system, preparing them for that very large machine.
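Concretely, the scheduling headroom claimed above follows from simple node arithmetic. The short Python sketch below works through the numbers; the node totals come from the award text, while the 96-nodes-per-cabinet and 12-cores-per-node figures are assumptions about the Cray XT5 configuration, not values stated in the award.

    # Node arithmetic for the Kraken upgrade. Totals are from the award text;
    # the per-cabinet and per-node figures are assumed XT5 values.
    cabinets_added = 12
    nodes_per_cabinet = 96                  # assumption: XT5 cabinet layout
    nodes_added = cabinets_added * nodes_per_cabinet   # 1,152 nodes

    nodes_before = 8256
    nodes_after = nodes_before + nodes_added           # 9,408 nodes

    capability_job = 8192                   # typical "full machine" job size
    headroom_before = nodes_before - capability_job    # only 64 nodes
    headroom_after = nodes_after - capability_job      # 1,216 nodes

    # After the upgrade, a 1,024-node job fits alongside an 8,192-node job.
    assert headroom_after >= 1024

    cores_per_node = 12                     # assumption: dual six-core Opterons
    total_cores = nodes_after * cores_per_node   # 112,896, i.e. "over 110,000"

    print(headroom_before, headroom_after, total_cores)   # 64 1216 112896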

Project Report

Kraken is the most popular and powerful system within the NSF's TeraGrid organization, providing approximately 50% of the total available resources. It is an extremely usable and scalable system, delivering over 80% of peak on the High Performance Linpack benchmark, and it ranked as the world's top academic supercomputer, and third overall, in the most recent Top500 list. Under this award, Kraken was augmented with 12 cabinets of compute nodes, adding 144 TF and 18 TB of memory and taking it to 1,174 TF and 147 TB. The 1,152 new nodes increased the node count from 8,256 to 9,408. Given the shortfall in available resources, the NSF scientific community badly needed this additional compute capacity, and the 12 new cabinets provided even more than that.

Kraken runs a large proportion of "full machine" jobs, many of which use exactly 8,192 nodes. Previously, with only 8,256 compute nodes in total, very little room was left on the system for smaller jobs. With the additional 1,152 nodes, Kraken now has significant room to run other jobs simultaneously with 8,192-node jobs; we now run an 8,192-node job and a 1,024-node job at the same time. Given the "bi-modal" distribution of user jobs, with most job sizes either at 8,192 nodes and above or at 1,024 nodes and below, this has greatly improved the responsiveness of the machine for users. This was a very cost-effective upgrade, taking advantage of the support infrastructure already installed for Kraken, and the upgraded system will operate throughout Kraken's presently scheduled lifetime, i.e., through April 2012.

This upgrade has allowed users to extend their codes by scaling them up to the largest NSF supercomputer, with over 110,000 cores, implementing new models that could not have been run before. The larger machine has also enabled easier access for classes and courses on Kraken. In particular, the great increase in headroom above the popular 8,192-node threshold has made it much easier to fit educational work onto the machine while it is being used for capability computing. In the past, we were unable to provide time for some classes because doing so would have prevented time-sensitive 8,192-node jobs from running; with the extra headroom, it is possible to accommodate more of these requests.

Under this award, Kraken enabled science successfully and reliably for the TeraGrid from February 2011 to June 2011. In this time, Kraken delivered over 359 million hours to scientists in 266,804 jobs, with over 97% uptime and 95% utilization. Over 20 US institutions used time on this machine, as allocated by the TeraGrid Resource Allocations Committee. The machine was used in classes at the University of Tennessee-Knoxville, Brown University, and Truman State University. NICS also participated in the Virtual School for Computational Science and Engineering, the TeraGrid/DEISA Summer School in HPC, the USA Science and Engineering Festival, the Tapia conference, the National Society of Black Engineers, EPSCoR, and the TeraGrid 2010 and Supercomputing 2010 conferences.

Agency
National Science Foundation (NSF)
Institute
Division of Advanced CyberInfrastructure (ACI)
Type
Standard Grant (Standard)
Application #
1041709
Program Officer
Barry I. Schneider
Project Start
Project End
Budget Start
2010-09-01
Budget End
2011-08-31
Support Year
Fiscal Year
2010
Total Cost
$2,850,000
Indirect Cost
Name
University of Tennessee Knoxville
Department
Type
DUNS #
City
Knoxville
State
TN
Country
United States
Zip Code
37916