Driven by the demand for higher performance and energy efficiency, novel technologies such as non-volatile memory (e.g., spin-transfer torque RAM (STT-RAM)), 3D integration (3D), and near-threshold voltage computing (NTC) are increasingly being deployed in state-of-the-art throughput processors. Because these technologies were not designed for dependable computing, reliability challenges, already a crucial issue in conventional throughput architecture design, become the major obstacle to integrating them into next-generation throughput processors. There is a pressing need for innovative techniques that exploit the unique features of throughput processors to characterize and improve the reliability of next-generation throughput architectures built on these technologies.

The paramount reliability challenges in throughput processors include soft errors induced by particle strikes, hard errors driven by aging effects, and manufacturing process variations. The principal investigator is building new foundations for vulnerability characterization and prediction, error detection, and fault tolerance against these dominant reliability challenges in throughput processors integrated with novel technologies. The project objectives are: (1) modeling and analyzing the vulnerability of throughput processors enabled by novel technologies (e.g., STT-RAM, NTC, and 3D) in the presence of soft errors, aging effects, and process variations; (2) building fast and accurate predictive models to forecast the vulnerability phase behavior of throughput processors under these technologies; (3) developing lightweight error detection mechanisms; and (4) exploring the opportunities and challenges that the novel technologies introduce for cost-effectively tolerating various types of errors in next-generation throughput architecture design. The proposed research will significantly advance the capability to architect reliable throughput processors in future technologies beyond CMOS, making it possible to continue Moore's Law scaling without suffering the negative effects of various fault mechanisms. Moreover, this project will enable throughput processors to be applied across a wide range of computing scales, from mobile computing to cloud computing, and will increase their deployment in support of supercomputing in science and engineering (e.g., finance, medicine, biology, petroleum, aerospace, and geology).
This project will also contribute to society by engaging high-school and undergraduate students from minority-serving institutions in research, expanding the computer engineering curriculum with reliability modeling and optimization techniques for throughput processors, attracting women and under-represented groups to graduate education, and disseminating research infrastructure for the education and training of the US IT workforce.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Application #
1537085
Program Officer
Yuanyuan Yang
Project Start
Project End
Budget Start
2014-12-01
Budget End
2020-01-31
Support Year
Fiscal Year
2015
Total Cost
$411,484
Indirect Cost
Name
University of Houston
Department
Type
DUNS #
City
Houston
State
TX
Country
United States
Zip Code
77204