"This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5)."
Mainstream computing - on the desktop, in the datacenter, and in embedded devices - is undergoing an unprecedented shift toward parallelism as manufacturers adopt multi-core architectures. Conventional multi-core processors have inefficient communication, synchronization, and locality-management mechanisms, which cause them to scale poorly on hard problems and make them difficult to program. The research proposes to develop a set of efficient mechanisms - hardware APIs and supporting microarchitecture - for communication, synchronization, and locality management. Specifically, the project proposes to develop an agile memory hierarchy, register-based communication and synchronization, and fast active messages. Reducing communication and synchronization overhead by orders of magnitude makes it possible to substantially improve the efficiency, programmability, and scalability of multi-core processors. Overall, the mechanisms will free programmers from the incidental constraints imposed by conventional multi-core architectures, allowing them to concentrate instead on the fundamental issues of parallelism, locality, and load balance.
The proposed work is expected to have an immense impact on future architecture and programming systems for multi-core processors. By reducing communication and synchronization overhead, the mechanisms will enable many applications that are not embarrassingly parallel to benefit from multi-core architectures. The work is likely to enable a new generation of multi-core programming systems. The educational plan integrates the results of this research into graduate and undergraduate courses at Stanford University.