Processor clock frequencies rose steadily from 1980 to 2003, but the rate of increase has slowed significantly in recent years. To improve computer performance, computer architects are exploring parallel architectures, including many-core architectures. In a many-core or multi-core architecture, processor cores of relatively low complexity are connected to memory and to each other via a high-bandwidth on-chip interconnect. The most popular programming model for multi-cores is shared memory. In this model, programmers write threads that can run on different processors, all of which share a single memory space. This means that the on-chip cache memory of the multi-core chip should behave like one large shared cache. Unfortunately, current cache coherence schemes either scale poorly or require large directories at each core, significantly increasing chip area and power.
This project investigates a directoryless cache coherence scheme that relies on execution migration. In execution migration, a thread's context, or state, moves to the processor in whose cache the data resides. An important advantage of an execution migration architecture is that only a one-way trip is required to access data, since the thread moves to the data. In conventional data migration architectures, a round trip is required: a request is sent to the location where the data resides, and the data is then sent back to the requesting thread. Further, only one copy of the data need be present on chip if execution migration is used, since threads can move to it. This means that cache coherence is trivially ensured. Moreover, because data is not replicated, the chip can store more distinct data, which reduces off-chip access rates. Finally, an execution migration architecture can exploit the plentiful on-chip bandwidth to speed up thread migration, thereby reducing data access latency.
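The one-way versus round-trip distinction can be illustrated with a toy trip counter. This is only a sketch under stated assumptions, not the project's simulator: the core count, the line size, the static `home_core` mapping, and the `Thread` bookkeeping are all hypothetical, and the data migration model keeps local copies without modeling the coherence traffic needed to keep them consistent.

```python
from dataclasses import dataclass, field
from typing import Set

NUM_CORES = 4    # assumed toy core count
LINE_BYTES = 64  # assumed cache-line size in bytes


def home_core(address: int) -> int:
    """Hypothetical static mapping: each cache line lives on exactly one core."""
    return (address // LINE_BYTES) % NUM_CORES


@dataclass
class Thread:
    core: int = 0                                         # core the thread currently runs on
    trips: int = 0                                        # one-way network traversals so far
    local_copies: Set[int] = field(default_factory=set)  # used by data migration only


def access_by_execution_migration(thread: Thread, address: int) -> None:
    """The thread context moves to the data: one one-way trip, no copy is made."""
    home = home_core(address)
    if thread.core != home:
        thread.core = home
        thread.trips += 1


def access_by_data_migration(thread: Thread, address: int) -> None:
    """The thread stays put: request out, data back, then a local copy is kept."""
    line = address // LINE_BYTES
    if home_core(address) == thread.core or line in thread.local_copies:
        return  # already local, no network traffic
    thread.local_copies.add(line)
    thread.trips += 2


trace = [0, 64, 64, 128, 128, 128]  # assumed access trace (byte addresses)
em, dm = Thread(), Thread()
for addr in trace:
    access_by_execution_migration(em, addr)
    access_by_data_migration(dm, addr)

print(em.trips, em.core, dm.trips, len(dm.local_copies))  # prints: 2 2 4 2
```

On this trace the migrating thread makes two one-way trips and leaves no replicas behind, while the stationary thread makes two round trips and ends up holding two replicated lines. A trace that alternates between lines homed on different cores would make the thread migrate on every access, so the comparison depends heavily on the access pattern.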
This architecture poses two challenges: contention for shared data across multiple threads, and the energy required to move thread contexts. The first challenge is being met through judicious replication of data at the program source level or compiler level. In particular, limited read-only copies of data are created across multiple threads. Since these copies exist only between two writes to the data, coherence is ensured as before, without the need for complex coherence logic, while contention for shared data is significantly reduced. The second challenge, energy consumption, is being met through migration of partial thread contexts: if a stack machine is used as the processor core, energy consumption can be reduced by migrating only the subset of the thread context corresponding to the top part of the stack instead of the entire stack.
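The partial-context idea can be sketched as splitting a stack at a chosen depth and counting the words that would cross the network. The function names, the one-word program counter, and the 16-entry stack are assumptions made for illustration; the sketch does not model what happens when a migrated thread later needs entries that were left behind.

```python
from typing import List, Tuple


def split_context(stack: List[int], depth: int) -> Tuple[List[int], List[int]]:
    """Split a stack (top at the end of the list) into (left_behind, migrated).

    `migrated` holds the top `depth` entries; everything below stays on the
    original core. Hypothetical helper, not the project's actual mechanism.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    depth = min(depth, len(stack))
    cut = len(stack) - depth
    return stack[:cut], stack[cut:]


def words_transferred(migrated: List[int], pc_words: int = 1) -> int:
    """Words sent over the network: migrated stack entries plus an assumed PC."""
    return len(migrated) + pc_words


stack = list(range(16))                    # assumed 16-entry stack, top is 15
left_behind, migrated = split_context(stack, 4)

print(migrated)                            # prints: [12, 13, 14, 15]
print(words_transferred(migrated))         # prints: 5
print(words_transferred(stack))            # prints: 17 (full-context migration)
```

In this toy case, migrating the top four entries moves 5 words instead of 17. How deep to cut is a design choice that trades transfer energy against the risk of needing the entries left behind.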
In this project, an Execution Migration Machine with over 100 cores is being designed and evaluated using cycle-accurate simulation, and critical elements of the machine are being built on a Field Programmable Gate Array (FPGA). This project has the potential to meet the scalability and programmability challenges that face shared memory multi-core architectures. The Execution Migration Machine design will shed light on how thread migration can best be used to enhance multi-core performance, possibly in combination with data migration. If successful, the project will impact the design of future multi-core processors through intelligent use of program and data migration.