With the advent of scalable parallel computing, thousands of devices are connected and managed collectively. This era faces a new challenge, performance failure: systems often perform worse than expected because of large-scale management issues such as hardware failures, software bugs, and configuration mistakes. This project targets one overlooked cause of performance failure, "lagging hardware": hardware whose performance degrades significantly relative to its specification. Many reports indicate that the effects of a single piece of lagging hardware can easily cascade and cause the performance of a whole cluster to collapse. When that happens, parallelism goes unexploited, productivity is reduced, the system is underutilized, and energy is wasted. The goal of the LigHTS project is to transform computing systems into Lagging-Hardware Tolerant Systems. The LigHTS project will bring many direct benefits to society: users in many areas (science, healthcare, business, education, military, and government) increasingly rely on large-scale storage and computation services. For these users, predictable performance is key to success, and lagging-hardware tolerant computing is a critical ingredient.

The LigHTS project has three major objectives. The first is lagging-hardware data analysis and instrumentation. To improve the robustness of future parallel systems, it is crucial to study the lagging characteristics exhibited by modern hardware and to devise new instrumentation methodologies that can collect cases of lagging hardware in deployment. The second is lagging-failure system analysis. It is important to rigorously analyze the impact of lagging hardware (including disks, networks, and processors) on currently deployed systems. The results will unearth design flaws and provide a valuable reevaluation of how deployed systems should evolve. The last is LigHTS principles, design, and implementation. There is a need to establish foundational principles of lagging-hardware tolerant computing and to apply those principles in building prototypes of cross-layer LigHTS systems spanning distributed storage, computing frameworks, and operating and runtime systems.

Project Start:
Project End:
Budget Start: 2013-09-15
Budget End: 2017-08-31
Support Year:
Fiscal Year: 2013
Total Cost: $749,854
Indirect Cost:
Name: University of Chicago
Department:
Type:
DUNS #:
City: Chicago
State: IL
Country: United States
Zip Code: 60637