Computing substrates such as multi-core processors and Field Programmable Gate Arrays (FPGAs) share a common structure: two-dimensional arrays of processing elements interconnected by a routing fabric. At one end of the spectrum, FPGAs have single-output programmable logic functions; at the other end, multi-core chips have complex 32/64-bit processing cores. Different applications are best served by different programmable substrates in terms of area-power-performance tradeoffs. This project is developing a large-scale multi-core substrate with hundreds or thousands of simple processing cores, along with a compilation system that maps computations onto this fabric. This many-core architecture, named the Diastolic Array, is coarser-grained than FPGAs but finer-grained than conventional multi-cores. To efficiently exploit such a large number of processing cores, the architecture requires spatially mapping a computation onto the processing cores and its communication onto the point-to-point interconnect network. To be practically viable, this mapping process must be automated and effective. The project addresses this challenge by developing the hardware architecture and the compilation system together.
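To make the spatial mapping problem concrete, the following is a minimal sketch of one possible approach: a greedy placer that assigns tasks of a communication graph to cores on a 2D mesh so that communicating tasks land near each other. The names (`place`, `task_graph`) are hypothetical illustrations, and the project's actual compiler uses more sophisticated optimization; this sketch only assumes at least as many cores as tasks.

```python
# Illustrative sketch only, not the project's actual placement algorithm.
import itertools

def place(task_graph, mesh_width, mesh_height):
    """Greedily assign each task to the free core that minimizes total
    Manhattan distance to its already-placed communication partners.
    Assumes mesh_width * mesh_height >= number of tasks."""
    free = set(itertools.product(range(mesh_width), range(mesh_height)))
    placement = {}
    # Place the most heavily connected tasks first.
    for task in sorted(task_graph, key=lambda t: -len(task_graph[t])):
        def cost(core):
            x, y = core
            return sum(abs(x - placement[n][0]) + abs(y - placement[n][1])
                       for n in task_graph[task] if n in placement)
        best = min(free, key=cost)
        placement[task] = best
        free.remove(best)
    return placement

# Example: a 4-stage pipeline A -> B -> C -> D mapped onto a 2x2 mesh.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(place(graph, 2, 2))
```

A real mapping flow must also route each producer-consumer pair through the interconnect and schedule link usage, which is why the project pairs the architecture with static optimization in the compiler rather than relying on run-time adaptation.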

A diastolic array chip is expected to outperform FPGAs or general-purpose processors on an interesting class of applications, enabling more efficient prototyping and low-volume production. The outcomes of this project, such as the statically configured interconnect architecture and its associated algorithms for routing and resource allocation, will also be applicable to other multi-core designs. Finally, the project is developing a new parallel processing module for an undergraduate computer architecture class to give sophomores early exposure to parallel hardware, experience writing parallel programs, and practice using compilers that exploit parallelism.

Project Report

Today’s computing systems typically use processors with a few processing cores; a typical smartphone, for example, uses a dual-core or quad-core processor. Future computing systems are expected to rely on many more cores, including not only traditional processing cores but also more specialized ones such as graphics processing units and multimedia processing engines, in order to provide higher performance at lower energy consumption. As the number of processing elements on a single processor grows, however, it becomes increasingly challenging to have them communicate efficiently with each other and to program them correctly for parallel computation. At the same time, a large number of processing cores sharing hardware resources such as the on-chip interconnect and off-chip memory controllers introduces a new security challenge: protecting secrets. For example, recent studies of cloud computing platforms such as Amazon EC2 have demonstrated that information can leak through timing variations caused by resource contention.

The goal of this project was to develop a many-core architecture, along with a compilation system, that achieves high-performance, low-power, and trustworthy operation. To achieve this goal, the project investigated multiple aspects of future many-core processors. To improve efficiency, we developed a new on-chip interconnect network that uses static optimization algorithms to compute efficient network resource allocations, including task placement, scheduling, and traffic routing. This approach provides efficiency without the complex hardware required by run-time adaptation techniques.

For programmability, we investigated new ways to detect common bugs in parallel programs. The research identified a new heuristic for detecting common non-race bugs in multi-threaded programs, and showed that the new scheme can detect a broad range of such bugs with few false positives. The project also designed hardware that enables high-coverage detection of data races with almost no run-time overhead.

For security, the project developed protection techniques that eliminate timing interference in on-chip networks and off-chip memory controllers, and showed that their overhead can be reasonable.

The simulation infrastructure, named HORNET, developed partly through this project is available for public download. In addition to these technical developments, the project trained a number of graduate and undergraduate students through its research activities. Because the project involved collaboration with researchers in traditional (Internet) networking, the students were trained to work across traditional discipline boundaries. The outcomes of the project have also been integrated into a graduate course on advanced computer architecture.
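To illustrate the timing-interference problem and one standard mitigation, the sketch below shows time-division multiplexing (temporal partitioning) of a shared memory controller. This is a generic textbook technique, not necessarily the project's exact hardware design, and the function name `tdm_memory_arbiter` is hypothetical: each security domain owns fixed time slots, so when a domain's request is served never depends on other domains' traffic, closing the timing channel at the cost of some idle slots.

```python
# Illustrative sketch of temporal partitioning, not the project's design.
from collections import deque

def tdm_memory_arbiter(num_domains, queues, num_cycles):
    """Serve requests in fixed round-robin time slots, one domain per slot.
    `queues` maps a domain id to a deque of its pending requests."""
    served = []
    for cycle in range(num_cycles):
        domain = cycle % num_domains  # slot ownership is fixed by the cycle
        if queues[domain]:
            served.append((cycle, domain, queues[domain].popleft()))
        # If the slot owner has nothing pending, the slot stays idle.
        # Donating it to another domain would reintroduce timing leakage.
    return served

# Example: domain 0's service times are identical whether domain 1 is
# idle (as here) or flooding the controller with its own requests.
qs = {0: deque(["r0", "r1", "r2"]), 1: deque()}
print(tdm_memory_arbiter(2, qs, 8))
```

The same temporal-partitioning idea extends to on-chip network links and router ports, which is why the overhead can be kept reasonable when domains are coarse-grained.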

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Application #: 0905208
Program Officer: Sankar Basu
Budget Start: 2009-09-01
Budget End: 2014-08-31
Fiscal Year: 2009
Total Cost: $350,000
Name: Cornell University
City: Ithaca
State: NY
Country: United States
Zip Code: 14850