Future multicore processor systems will have a growing number of system-wide shared resources. However, these shared resources will exhibit significant and undesirable asymmetry. For example, the capabilities of processor cores, cache access latency, and memory access cost will differ depending on when and where they are used. If such asymmetry is not properly managed, the full potential of the multicore computing paradigm will not be realized. This exploratory research will investigate a novel predictive resource management framework called MAESTRO. The proposed framework automatically learns the asymmetry in the system and useful application behavior, accumulates and refines the learned knowledge, and exploits that knowledge to make resource management decisions, such as cache capacity allocation, in a predictive manner. MAESTRO's predictive strategies, backed by detailed system and application knowledge, are expected to be more effective for new multicore resource management problems than conventional reactive strategies with limited knowledge. The PI will validate this expectation through solid system prototyping and by studying two target resource management problems.
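To make "predictive" cache capacity allocation concrete, here is a minimal sketch of how accumulated knowledge could drive a decision. It assumes, hypothetically, that the knowledge base has already learned a miss curve for each application (predicted misses for each number of cache ways), and it allocates ways greedily by marginal utility in the style of well-known utility-based cache partitioning. That is a swapped-in textbook technique, not MAESTRO's published algorithm, and the names `miss_curves` and `allocate_ways` are illustrative only.

```python
from typing import Dict, List


def allocate_ways(miss_curves: Dict[str, List[float]], total_ways: int) -> Dict[str, int]:
    """Greedy way allocation driven by *predicted* (learned) miss curves.

    Hypothetical input: miss_curves[app][w] is the predicted miss count when
    `app` owns w ways (w = 0..total_ways). Every application gets one way first.
    """
    apps = list(miss_curves)
    if total_ways < len(apps):
        raise ValueError("need at least one way per application")
    alloc = {app: 1 for app in apps}

    def gain(app: str) -> float:
        # Predicted reduction in misses if `app` received one more way.
        w = alloc[app]
        curve = miss_curves[app]
        return curve[w] - curve[w + 1] if w + 1 < len(curve) else 0.0

    for _ in range(total_ways - len(apps)):
        # Hand the next way to the application expected to benefit most.
        best = max(apps, key=gain)
        alloc[best] += 1
    return alloc


curves = {"A": [100, 60, 40, 35, 34], "B": [100, 90, 80, 70, 60]}
print(allocate_ways(curves, 4))  # {'A': 2, 'B': 2}
```

Because the curves come from accumulated knowledge rather than from misses observed after the fact, an allocation like this can be made before a program phase begins, which is the contrast with reactive strategies drawn above.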
The project has the potential to influence the way future computer systems are designed and managed. It is interdisciplinary by nature and requires an understanding of applications, computer architecture, operating systems, and machine learning. Students working on this project will receive rigorous interdisciplinary training.
The common theme of the research tasks in this project has been to intelligently monitor and characterize system workloads and to predictively manage a computer system's resources, such as CPUs, caches, and memory. In the early stage of our research, we developed a novel multithreaded program characterization framework called "BarrierWatch" (published in Computing Frontiers 2011). The main idea of BarrierWatch is that a program shows repeatable behavior in each program interval delimited by two barrier synchronization points. Because such intervals occur multiple times, this characteristic lets us predict the behavior of an interval that has been seen before. We demonstrated the concept with a realistic example: power-performance management of network-on-chip resources.
In our "Maestro" work (published in the NASA/ESA Conference on Adaptive Hardware and Systems (AHS) 2011), we outline our overall system architecture for intelligent predictive resource management. This work finds that both intended and unintended asymmetries in system resource characteristics and usage will only increase in the future. It also finds that efficiently learning an application's characteristics, as well as the properties of the platform resources, is key to the success of predictive resource management.
In the "DEFCAM" work (published in ACM TACO), we monitor the cache's "faultiness" and predictively apply and tune cache resource salvaging methods to achieve the highest performance for the given situation. This work is comprehensive and provides a framework to evaluate existing and new performance- and yield-enhancing strategies for on-chip memory designs.
"RDIS" (Recursively Defined Invertible Set) is a novel technique to mask the effects of faults in large resistive memories.
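The BarrierWatch observation, that each interval between two barriers tends to repeat its behavior, can be sketched as a table keyed by the interval's bounding barriers. This minimal sketch assumes, hypothetically, that each barrier has a static identifier (for example, its program counter), that the monitored quantity is the interval's network-on-chip link utilization normalized to full-speed capacity, and that the NoC offers a few discrete frequency levels. `IntervalPredictor`, the headroom factor, and the frequency levels are illustrative, not the published implementation.

```python
from typing import Dict, Optional, Sequence, Tuple

IntervalKey = Tuple[int, int]  # (ID of the starting barrier, ID of the ending barrier)


class IntervalPredictor:
    """Last-value predictor: assume an interval behaves as it did last time."""

    def __init__(self, freq_levels: Sequence[float], headroom: float = 1.1):
        # Available NoC frequencies as fractions of the maximum; a level f is
        # assumed to sustain a normalized utilization of at most f.
        self.freq_levels = sorted(freq_levels)
        self.headroom = headroom  # safety margin applied to the prediction
        self.history: Dict[IntervalKey, float] = {}

    def choose_frequency(self, key: IntervalKey) -> float:
        """Pick the lowest frequency expected to carry the predicted load."""
        predicted: Optional[float] = self.history.get(key)
        if predicted is None:
            return self.freq_levels[-1]  # interval not seen yet: run at full speed
        demand = predicted * self.headroom
        for f in self.freq_levels:
            if f >= demand:
                return f
        return self.freq_levels[-1]

    def record(self, key: IntervalKey, observed_utilization: float) -> None:
        """At the closing barrier, remember what this interval actually did."""
        self.history[key] = observed_utilization


p = IntervalPredictor([0.25, 0.5, 0.75, 1.0])
print(p.choose_frequency((1, 2)))  # 1.0 (first occurrence)
p.record((1, 2), 0.4)
print(p.choose_frequency((1, 2)))  # 0.5 (repeat occurrence, predicted light load)
```

A last-value table is the simplest predictor that exploits repetition; a real system might average several past instances or fall back when predictions prove wrong.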
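The DEFCAM idea of monitoring a cache's faultiness and choosing a salvaging method to match can be illustrated with a toy selector. DEFCAM, as described above, is a framework for evaluating many existing and new strategies; this sketch compares only two textbook options, line disabling (retire any line containing a fault) and word disabling (retire only the faulty words). The fault-map format, the scheme names, and the per-scheme efficiency factors standing in for a performance model are all hypothetical.

```python
from typing import Dict, List


def usable_words(fault_map: List[List[bool]], scheme: str) -> int:
    """Count usable words; fault_map[line][word] is True if that word is faulty."""
    if scheme == "line_disable":
        # Any fault retires the whole line.
        return sum(len(line) for line in fault_map if not any(line))
    if scheme == "word_disable":
        # Only the faulty words themselves are retired.
        return sum(line.count(False) for line in fault_map)
    raise ValueError(f"unknown scheme: {scheme}")


def choose_scheme(fault_map: List[List[bool]], efficiency: Dict[str, float]) -> str:
    """Pick the scheme with the most capacity after its assumed overhead."""
    return max(efficiency, key=lambda s: usable_words(fault_map, s) * efficiency[s])


eff = {"line_disable": 1.0, "word_disable": 0.75}  # hypothetical overhead factors
few = [[False] * 8 for _ in range(4)]
few[0][3] = True                                   # a single faulty word
many = [[i == j for j in range(8)] for i in range(4)]  # one fault in every line
print(choose_scheme(few, eff), choose_scheme(many, eff))  # line_disable word_disable
```

With one faulty word in this 32-word toy cache, line disabling keeps more effective capacity (24 vs. 23.25); once every line has a fault, word disabling wins (21 vs. 0). This is the "given the situation" behavior in miniature.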
What separates RDIS (published in DSN 2012) from prior art is that it builds on a strong theoretical analysis of two important fault characteristics of resistive memory: the stuck-at fault model and the readability of faulty cells. RDIS allows data to be retrieved correctly by recursively determining, and efficiently keeping track of, the positions of the bits that are stuck at a value different from the one written, and then, at read time, inverting the values read from those positions. RDIS has a very low probability of failure, which grows slowly as the number of faults increases. Moreover, at the same overhead level, RDIS tolerates many more faults than the best existing schemes, by up to 95% on average.
We have also extended our BarrierWatch work and developed a new technique to predict future cache coherence activity for synchronization epochs (published at MICRO 2012). We were motivated by the fact that predicting the target processors that must be contacted on a cache miss can reduce miss handling latency in shared memory systems. Our new run-time coherence target prediction scheme exploits the inherent correlation between synchronization points in a program and coherence communication. The predictor reduces the miss latency of a directory protocol by 13%. Compared with existing prediction techniques, it achieves comparable performance with substantially lower power and storage overheads.
Besides these research achievements, the project contributed to human resource development by funding Ph.D. students, and to general education by providing new course materials at the University of Pittsburgh.
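The recursive idea behind RDIS can be illustrated with a simplified sketch. It assumes that a data block is laid out as a 2-D grid of cells and that the positions and stuck values of faulty cells are already known (for example, found by reading back after a write, which relies on faulty cells remaining readable). In this sketch each invertible set is the cross product of a row set and a column set, so a level costs one bit per row plus one per column. Each new set covers the stuck cells that the previous level left wrong, and a cell is flipped on read if it lies in an odd number of sets. The level budget `max_levels` and all function names are hypothetical; this is not the exact encoding, metadata layout, or failure analysis of the DSN 2012 paper.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]
Level = Tuple[Set[int], Set[int]]  # rows and columns defining one invertible set


def depth(levels: List[Level], r: int, c: int) -> int:
    """How many of the nested invertible sets contain cell (r, c)."""
    return sum(1 for rows, cols in levels if r in rows and c in cols)


def rdis_encode(data: List[List[int]], stuck: Dict[Cell, int],
                max_levels: int = 4) -> Optional[List[Level]]:
    """Find nested sets so every stuck cell reads back correctly, or None."""
    levels: List[Level] = []
    # Before any level exists, the current set is the whole array (parity 0).
    region: Set[Cell] = {(r, c) for r in range(len(data)) for c in range(len(data[0]))}
    while True:
        parity = len(levels) % 2  # inversion parity shared by every cell in `region`
        # Stuck cells inside the current set that would read back wrong.
        wrong = [(r, c) for (r, c), v in stuck.items()
                 if (r, c) in region and (v ^ data[r][c]) != parity]
        if not wrong:
            return levels
        if len(levels) == max_levels:
            return None  # no convergence within the level budget
        rows = {r for r, _ in wrong}
        cols = {c for _, c in wrong}
        levels.append((rows, cols))
        region = {(r, c) for r in rows for c in cols}  # nested inside the old region


def write_cells(data: List[List[int]], stuck: Dict[Cell, int],
                levels: List[Level]) -> List[List[int]]:
    """Store data XOR inversion parity; stuck cells keep their stuck value."""
    return [[stuck.get((r, c), data[r][c] ^ (depth(levels, r, c) % 2))
             for c in range(len(data[0]))] for r in range(len(data))]


def read_cells(stored: List[List[int]], levels: List[Level]) -> List[List[int]]:
    """Undo the inversion: cells inside an odd number of sets are flipped back."""
    return [[stored[r][c] ^ (depth(levels, r, c) % 2)
             for c in range(len(stored[0]))] for r in range(len(stored))]


data = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
stuck = {(0, 0): 1, (1, 1): 1, (0, 1): 0}  # two stuck-at-wrong cells, one stuck-at-right
levels = rdis_encode(data, stuck)
print(read_cells(write_cells(data, stuck, levels), levels) == data)  # True
```

In this example the first level inverts rows {0, 1} by columns {0, 1}, which wrongly flips the stuck-at-right cell (0, 1), so a second, smaller level flips it back. In this sketch, a checkerboard of stuck-at-wrong and stuck-at-right cells never converges and returns None, illustrating how a recursion of this kind can fail for adversarial fault patterns.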
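The synchronization-epoch coherence target predictor can be sketched in a few lines. All details here are hypothetical: each epoch is identified by the synchronization point that begins it (for example, a barrier or lock address), a per-core predictor remembers which processors took part in coherence transactions during the previous instance of that epoch, and on a miss it predicts that set as the targets to contact directly. This illustrates the correlation described above; it is not the MICRO 2012 design, whose table organization and update policy are not given here.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import DefaultDict, Dict, Set


class EpochTargetPredictor:
    """Per-core predictor: targets seen in an epoch's last instance predict the next."""

    def __init__(self) -> None:
        self.learned: Dict[Hashable, Set[int]] = {}  # epoch ID -> targets last time
        self.current: DefaultDict[Hashable, Set[int]] = defaultdict(set)

    def predict(self, epoch: Hashable) -> Set[int]:
        """Processors to contact on a miss; an empty set means 'use the directory'."""
        return set(self.learned.get(epoch, set()))

    def observe(self, epoch: Hashable, responder: int) -> None:
        """Record a processor that actually took part in a coherence transaction."""
        self.current[epoch].add(responder)

    def end_epoch(self, epoch: Hashable) -> None:
        """At the next synchronization point, keep this instance's target set."""
        self.learned[epoch] = self.current.pop(epoch, set())


p = EpochTargetPredictor()
p.observe("barrier_A", 3)
p.observe("barrier_A", 5)
p.end_epoch("barrier_A")
print(sorted(p.predict("barrier_A")))  # [3, 5]
```

Keying the table on synchronization points rather than on individual cache blocks is one plausible reason such a predictor can stay small, consistent with the storage savings reported above.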