A memory device may exhibit errors due to manufacturing defects, device aging, or particle strikes from cosmic-ray-induced neutrons. As semiconductor technologies continue to scale, memory errors pose a growing threat to computer system reliability. This project develops a new approach that protects against memory errors non-uniformly by exploiting the unequal error susceptibility of different memory regions in a computer system. Collaboration with industry researchers facilitates the integration of the developed techniques into real-world memory technologies. The results of this project will contribute to comprehensive computer system reliability, which is critical to society and to the health of the global economy. Curriculum enhancements and student training in this project will help develop the advanced workforce needed for today's and tomorrow's digital economy.
Research and development efforts within this project comprise four synergistic components. First, the project introduces a new software approach that systematically uncovers important characteristics of memory error propagation and its consequences. Second, the project develops new energy-efficient hardware support for flexible, dynamic adjustment of memory error protection for each memory region. Third, the project devises non-uniform memory error protection policies that optimize for reliability and efficiency based on software error susceptibility and hardware protection costs. Finally, the developed error susceptibility assessment and non-uniform protection techniques are evaluated using real application scenarios. The cross-layer (software/hardware) technologies developed in this project enable wide adoption of advanced memory reliability mechanisms without significant loss of performance or energy efficiency.