High-accuracy artificial intelligence (AI) has repeatedly delivered great benefit across science and engineering. AI enables information extraction from, and analysis of, large datasets, as well as the creation of high-fidelity models that can augment or replace more computationally expensive calculations in traditional simulation codes. In this way, AI can accelerate time-to-science by orders of magnitude. To reap these benefits, highly complex AI models must first be trained. Training is a computation-intensive process that takes days, weeks, or even months and requires optimization of both the network architecture and its hyperparameters. This cost limits the scope and complexity of the challenges that can be addressed. What if training time could be reduced to the point of being interactive, taking only minutes, or at most hours in the most extreme cases? The result would be transformative: scientists and engineers could rapidly develop and refine their ideas, enabling them to achieve high-impact solutions to the most pressing and complex problems.
To help advance knowledge by enabling unprecedented AI speed and scalability, the Pittsburgh Supercomputing Center (PSC), a joint research center of Carnegie Mellon University and the University of Pittsburgh, will deploy Neocortex in partnership with Cerebras Systems and Hewlett Packard Enterprise (HPE). Neocortex is an innovative computing resource that will accelerate scientific discovery by vastly shortening the time required for deep learning training and inference, foster greater integration of deep AI models with scientific workflows, and provide revolutionary hardware for developing more efficient algorithms for artificial intelligence and graph analytics. Neocortex will advance knowledge by accelerating scientific research, enabling the development of more accurate models and the use of larger training datasets, scaling model parallelism to unprecedented levels, focusing on human productivity by simplifying tuning and hyperparameter optimization, and providing a revolutionary hardware platform for exploring new frontiers.
Neocortex will introduce the most powerful AI processor to the NSF cyberinfrastructure ecosystem and will democratize access to game-changing compute power, otherwise available only to tech giants, for students, postdocs, faculty, and others who need faster training turnaround to analyze data and integrate AI with simulation. It will provide a unique opportunity to explore the potential of a groundbreaking AI hardware architecture, combining the revolutionary AI processor technology of the Cerebras CS-1 AI platform with the large in-memory scale-up capabilities of the HPE Superdome Flex to unlock new insights and accelerate time to discovery. The Neocortex project will also focus on building a strong community around these capabilities, including collaborations with other leading national institutions, with an emphasis on inclusion and diversity. It will build STEM talent through training and internships, develop the U.S. workforce and national competitiveness through industrial outreach, and foster international collaborations. Public outreach and XSEDE Campus Champion and Domain Champion activities will help engage a wider audience.
The novel Neocortex architecture will couple two exceptionally powerful Cerebras CS-1 AI servers with a very large shared-memory HPE Superdome Flex HPC server to achieve unprecedented AI scalability with excellent system balance. Each Cerebras CS-1 is powered by one Cerebras Wafer Scale Engine (WSE) processor, a high-performance processor designed specifically to accelerate deep learning training and inference. The Cerebras WSE is the largest chip ever built, containing 400,000 AI-optimized cores and 1.2 trillion transistors on a 46,225 square millimeter wafer. An on-chip fabric provides 100Pb/s of bandwidth through a fully configurable 2D mesh with no software overhead. The WSE includes 18GB of on-chip SRAM, accessible within a single clock cycle, with 9PB/s of memory bandwidth. The WSE is also uniquely engineered for efficient sparse computation, wasting neither time nor power multiplying the many zeroes that occur in deep networks. The Cerebras CS-1 can be programmed with common machine learning (ML) frameworks such as TensorFlow and PyTorch; for computational efficiency, the Cerebras software maps models written in these frameworks onto an optimized graph representation and a set of model-specific computation kernels. It also supports native code development. Support for the most popular deep learning frameworks, combined with automatic and transparent acceleration, will give researchers exceptional ease of use.
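As a rough illustration of what the aggregate WSE figures above imply, the Python sketch below divides the quoted SRAM capacity and bandwidth by the core count, assuming decimal units (1GB = 10^9 bytes) and an even split across all 400,000 cores. It also includes a software analogy for the zero-skipping idea. The function names are hypothetical, and `zero_skipping_dot` only mimics "not multiplying zeroes" in plain Python; it does not model how the WSE hardware actually exploits sparsity.

```python
# Per-core view of one Cerebras WSE, derived only by dividing the aggregate
# figures quoted in the text by the core count. Assumes decimal units and a
# uniform split; these are not vendor per-core specifications.
WSE_CORES = 400_000
WSE_SRAM_BYTES = 18 * 10**9            # 18GB on-chip SRAM
WSE_SRAM_BW_BYTES_PER_S = 9 * 10**15   # 9PB/s SRAM bandwidth


def per_core_budget():
    """Return (SRAM bytes per core, SRAM bandwidth per core in bytes/s)."""
    return WSE_SRAM_BYTES / WSE_CORES, WSE_SRAM_BW_BYTES_PER_S / WSE_CORES


def zero_skipping_dot(activations, weights):
    """Dot product that never multiplies when the activation is zero.

    Returns (result, multiplications_performed, multiplications_skipped).
    A software analogy for sparsity harvesting, not the WSE mechanism.
    """
    total = 0.0
    performed = 0
    skipped = 0
    for a, w in zip(activations, weights):
        if a == 0.0:
            skipped += 1  # a zero operand adds nothing, so skip the multiply
            continue
        total += a * w
        performed += 1
    return total, performed, skipped


if __name__ == "__main__":
    sram, bw = per_core_budget()
    print(f"SRAM per core: {sram / 1e3:.1f} KB")            # SRAM per core: 45.0 KB
    print(f"SRAM bandwidth per core: {bw / 1e9:.1f} GB/s")  # SRAM bandwidth per core: 22.5 GB/s
    # ReLU zeroes out negative inputs: [0.0, 2.0, 0.0, 3.0]
    acts = [max(0.0, x) for x in (-1.0, 2.0, -0.5, 3.0)]
    print(zero_skipping_dot(acts, [0.5, 0.25, 4.0, 1.0]))   # (3.5, 2, 2)
```

Half of the multiplications in the ReLU example are skipped without changing the result, which is the effect the sparsity claim describes.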
The HPE Superdome Flex HPC server of Neocortex will be an extremely powerful, user-friendly front end for the Cerebras CS-1 servers. It will enable flexible pre- and post-processing of data flowing into and out of the attached WSEs, preventing bottlenecks so that applications can take full advantage of the WSEs, and will implement advanced deep learning functions such as data augmentation, hyperparameter and model optimization, and ensemble learning. The Superdome Flex will be robustly provisioned with 24TB of RAM, 204.8TB of high-performance NVMe flash storage, 32 Intel Xeon CPUs, and 24 100GbE network interface cards, providing the greatest flexibility for scaling applications across multiple CS-1 systems. Internally, the Superdome Flex is interconnected by a custom memory-fabric ASIC that provides cache-coherent hardware shared memory and sustains 850GB/s of interconnect bandwidth. Its large, fast memory and high compute performance will enable training on very large datasets with exceptional ease, avoiding the laborious task of splitting datasets across worker nodes and trying to load-balance them.
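To make the point about avoiding dataset splitting concrete, the sketch below compares holding a dataset in the Superdome Flex's 24TB of shared memory with sharding it across a distributed-memory cluster. The 10TB dataset, the 4TB of working headroom, and the 200GB of usable memory per worker node are hypothetical figures chosen only for illustration, and decimal units are assumed.

```python
# Shared-memory fit versus distributed sharding. Only the 24TB RAM and
# 32-CPU figures come from the text; dataset size, headroom, and per-node
# memory below are hypothetical illustration values.
TB = 10**12
GB = 10**9

SUPERDOME_RAM_BYTES = 24 * TB  # 24TB shared memory
SUPERDOME_CPUS = 32            # 32 Intel Xeon CPUs


def fits_in_shared_memory(dataset_bytes: int, headroom_bytes: int = 0) -> bool:
    """True if the dataset plus working headroom fits in one shared address space."""
    return dataset_bytes + headroom_bytes <= SUPERDOME_RAM_BYTES


def shards_needed(dataset_bytes: int, per_node_data_bytes: int) -> int:
    """Worker nodes a distributed-memory split would need (ceiling division)."""
    return -(-dataset_bytes // per_node_data_bytes)


if __name__ == "__main__":
    dataset = 10 * TB                                           # hypothetical dataset
    print(fits_in_shared_memory(dataset, headroom_bytes=4 * TB))  # True
    print(shards_needed(dataset, 200 * GB))                     # 50 (hypothetical 200GB nodes)
    print(SUPERDOME_RAM_BYTES // SUPERDOME_CPUS // GB)          # 750 (GB of RAM per CPU)
```

In the sharded case, each of the 50 pieces must also be balanced against the others; in the shared-memory case the dataset stays in one address space.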
Each Cerebras CS-1 has 1.2Tbps of I/O bandwidth and will connect to the HPE Superdome Flex via twelve standard 100GbE links. This configuration will deliver the greatest possible performance and flexibility, including, through PSC-Cerebras and PSC-HPE research partnerships, the exploration of scaling training across multiple CS-1 systems.
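The link counts above can be checked against the Superdome Flex provisioning with simple arithmetic. The sketch below, using only figures stated in this abstract, confirms that two CS-1 systems at twelve 100GbE links each use exactly the 24 network interface cards on the Superdome Flex, and that their combined line rate stays well below the 850GB/s internal fabric. The helper names are hypothetical, and all rates are ideal line rates that ignore Ethernet and protocol overhead.

```python
# Link-balance check between two CS-1 systems and the Superdome Flex,
# using only figures stated in the text. Rates are ideal line rates.
LINK_GBPS = 100          # one 100GbE link, in Gb/s
LINKS_PER_CS1 = 12       # twelve 100GbE links per CS-1
NUM_CS1 = 2              # Neocortex couples two CS-1 systems
SUPERDOME_NICS = 24      # 24 100GbE NICs on the Superdome Flex
FABRIC_GBYTES_S = 850    # Superdome Flex internal fabric, GB/s


def cs1_bandwidth_gbytes_s() -> float:
    """Peak I/O of one CS-1 in GB/s (12 x 100Gb/s = 1.2Tb/s, divided by 8)."""
    return LINKS_PER_CS1 * LINK_GBPS / 8


def check_balance():
    """Return (links used, fits in NIC count, aggregate GB/s, below fabric rate)."""
    links = NUM_CS1 * LINKS_PER_CS1
    aggregate = NUM_CS1 * cs1_bandwidth_gbytes_s()
    return links, links <= SUPERDOME_NICS, aggregate, aggregate <= FABRIC_GBYTES_S


def transfer_seconds(gigabytes: float) -> float:
    """Ideal time to move the given number of GB over one CS-1's links."""
    return gigabytes / cs1_bandwidth_gbytes_s()


if __name__ == "__main__":
    print(cs1_bandwidth_gbytes_s())       # 150.0
    print(check_balance())                # (24, True, 300.0, True)
    # Time to stream a volume equal to the WSE's 18GB SRAM at line rate
    print(f"{transfer_seconds(18):.2f} s")  # 0.12 s
```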
Neocortex will be federated with Bridges-2, an NSF-supported capacity resource, via 16 InfiniBand HDR100 connections (1.6Tbps aggregate). This federation will yield great benefits to the user community, including access to the Bridges-2 filesystem to manage persistent data; general-purpose computing for data preprocessing and traditional machine learning; interoperation with data-intensive projects using Bridges-2; and high-bandwidth external network connectivity to other XSEDE Service Providers, campuses, labs, and clouds.
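A quick way to read the 1.6Tbps figure is as a lower bound on the time to stage data from the Bridges-2 filesystem into Neocortex. The sketch below computes that bound for a hypothetical 10TB dataset, assuming decimal units. The `efficiency` parameter is a hypothetical knob for sustained throughput below line rate; real transfers would also depend on filesystem and protocol performance, which this sketch does not model.

```python
# Ideal staging time across the Neocortex/Bridges-2 federation links.
# 16 x HDR100 comes from the text; dataset size and efficiency are hypothetical.
LINK_GBPS = 100   # HDR100 link rate, Gb/s
NUM_LINKS = 16    # 16 InfiniBand HDR100 connections


def aggregate_tbps(links: int = NUM_LINKS, link_gbps: int = LINK_GBPS) -> float:
    """Aggregate link rate in Tb/s."""
    return links * link_gbps / 1000


def stage_time_seconds(dataset_bytes: int, links: int = NUM_LINKS,
                       link_gbps: int = LINK_GBPS, efficiency: float = 1.0) -> float:
    """Lower-bound seconds to move dataset_bytes at the given fraction of line rate."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    bits = dataset_bytes * 8
    return bits / (links * link_gbps * 10**9 * efficiency)


if __name__ == "__main__":
    ten_tb = 10 * 10**12                                 # hypothetical dataset
    print(aggregate_tbps())                              # 1.6
    print(stage_time_seconds(ten_tb))                    # 50.0
    print(stage_time_seconds(ten_tb, efficiency=0.5))    # 100.0
```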
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.