Chip multiprocessor (CMP) systems, also known as multicore systems, provide multiple processors on a single chip and have displaced single-processor architectures as the de facto standard for computing platforms. This change has occurred because CMPs offer superior performance and power efficiency compared to traditional designs. An emerging feature of the CMP era is the deployment of several different types of processing elements on the same platform, with varying computation speed and power consumption characteristics. An additional complicating factor is the trend toward overprovisioned designs, in which only a subset of the available cores can be active at any time because of power and thermal constraints. Such heterogeneous CMPs are increasingly being deployed in systems where applications with different safety assurance (dependability) and timeliness requirements must co-exist on the same CMP. Hence, there is a growing need for an integrated framework to allocate the heterogeneous hardware resources of a CMP among applications in a way that makes efficient use of the resources while assuring that the diverse safety and timeliness requirements of the applications are met.
This project aims to develop models, algorithms, and run-time management schemes for collections of applications with a mix of different timing and dependability requirements running on a shared heterogeneous CMP platform. In particular, a central objective is to develop a sound methodology to selectively apply known hardware and software fault tolerance mechanisms (such as modular redundancy, task replication, and re-execution) to such mixed-dependability applications, by considering resource, power, and timing constraints simultaneously. A second objective is to extend the framework to tackle the challenge of intermittent run-time faults that occur in bursts and can affect multiple applications at once during a bounded time window. Success in these efforts could improve the safety, and reduce the development and production costs, of the increasingly complex cyber-physical systems upon which we have all come to depend.
Education and outreach activities include integration of aspects of the research into undergraduate and graduate courses at the two participating institutions, involvement of students as research assistants, and efforts to recruit student participants from under-represented demographic groups.