Mission-critical scientific simulations (e.g., climate and fluid dynamics simulations) and enterprise workloads (e.g., search and encryption) running on large-scale computing systems are jeopardized by the increasing rate of hardware and software faults and errors. Understanding the vulnerability of these large-scale applications is important for minimizing the performance and power overhead of protecting them. A lack of knowledge about application vulnerability is a major bottleneck to execution efficiency and jeopardizes high-performance computing (HPC) simulation capabilities. Previous work relies on random fault injection or detailed architecture analysis to evaluate application vulnerability, and these approaches can be slow and inaccurate. There is a large gap between what reliable and efficient HPC requires and what current methodologies can provide. This research explores a new methodology for understanding application vulnerability. It investigates new analytical and statistical models that quantify and characterize application vulnerability based on a novel metric and on application semantics (including algorithm semantics and data semantics). The PI integrates these modeling techniques into a broader vulnerability-analysis context to improve modeling accuracy and to explore reliable, efficient protection for applications, while examining the interplay among reliability, power, and performance.

The outcomes of this research will support correct and efficient execution of large-scale applications on future computing systems that demand high data integrity. The proposed research will influence the design of reliable applications and algorithms. Because it builds on collaboration with industry, the research is expected to produce tangible outcomes with a direct impact on realistic scientific problems. Furthermore, the tight coupling between the research and education components creates an HPC learning culture that engages students in HPC, helping to address the nation's HPC workforce shortage.

Agency: National Science Foundation (NSF)
Institute: Division of Computing and Communication Foundations (CCF)
Application #: 1553645
Program Officer: Almadena Chtchelkanova
Budget Start: 2016-02-01
Budget End: 2022-01-31
Fiscal Year: 2015
Total Cost: $507,989
Institution: University of California, Merced
City: Merced
State: CA
Country: United States
Zip Code: 95343