The "dark silicon" effect, in which a growing fraction of a chip's cores must be kept powered off (or "dark") with each generation of transistor scaling, has made it difficult to sustain further efficiency gains through semiconductor technology scaling alone. Meanwhile, applications and their data place rapidly growing demands on storage and processing capabilities, widening the gap between the efficiency of the system stack and the needs of modern applications. This research project aims to redesign the system stack around a novel paradigm that combines a throughput-processing architecture with a concurrency-centric compilation framework. The architecture is specialized for throughput: it trades the units that exploit single-thread instruction-level parallelism (ILP) for throughput units. The compiler is specialized for concurrency, the execution model in which the effect of single-thread latency is minimized by interleaving the execution of a very large number of concurrent threads.
This research project reveals the implications of concurrent execution on throughput processors and how those implications affect compile-time decisions and the corresponding runtime optimizations. The intellectual merits are twofold: 1) it shows that existing mainstream CPU compilation techniques are concurrency-oblivious, which leaves many challenging problems unanswered and many opportunities for performance improvement unexplored, and 2) it tackles these problems by addressing both the resource-allocation and the instruction/thread-scheduling aspects of compile-time decision making, which is where the fundamental difference between the concurrent execution model and the traditional CPU execution model arises. The broader impacts are that the research results will drive innovation in business, education, and computing applications by reinventing the system stack to improve efficiency and by helping to reach the next supercomputing milestone, exascale computing.