As transistors shrink, more functionality fits on the same area of silicon. However, smaller transistors are also more susceptible to intermittent errors caused by alpha particles striking the silicon. Engineering systems to tolerate such errors is a problem that will become increasingly important over time. Evaluating fault-tolerant designs has traditionally been almost impossible because (i) faults occur very infrequently, (ii) highly detailed (and thus time-consuming) simulations are required to determine whether a fault has any real effect, and (iii) millions or even billions of such simulations must be performed to determine whether a design can actually tolerate particle strikes with different characteristics.
This project addresses the problem through a set of interlocking simulators, each operating at a different level of accuracy and simulation speed. The initial particle strike is modeled in extremely high detail for a short time, until its effects can be handed to a faster simulator with less detail, and so on up the chain. Deciding intelligently when the particle strikes, what it strikes, and when to transition between simulators is among the core intellectual components of this project. Such a simulator could help develop techniques for building reliable systems that are more efficient and effective than is possible today.
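The hand-off idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `Level` and `run_cascade`, the three level names, and the toy numeric dynamics are hypothetical and do not represent the project's actual simulators or any physical model. The sketch only shows the control flow of running the most detailed level until a hand-off test passes, then continuing with the next, cheaper level.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Level:
    """One simulator in the chain (hypothetical interface)."""
    name: str
    step: Callable[[float], float]         # advance the disturbance by one step
    can_hand_off: Callable[[float], bool]  # True once a coarser model suffices


def run_cascade(levels: List[Level], state: float,
                max_steps: int = 1000) -> Tuple[float, List[Tuple[str, int]]]:
    """Run each level in order, most detailed first, until it can hand off."""
    trace = []
    for level in levels:
        steps = 0
        while not level.can_hand_off(state) and steps < max_steps:
            state = level.step(state)
            steps += 1
        trace.append((level.name, steps))
    return state, trace


# Toy dynamics (assumed, not physical): the state is a disturbance magnitude
# in arbitrary units that each level reduces in its own way.
levels = [
    Level("device", lambda x: x * 0.5, lambda x: x <= 1.0),
    Level("circuit", lambda x: x - 0.25, lambda x: x <= 0.5),
    Level("architecture", lambda x: 0.0, lambda x: x == 0.0),
]

final_state, trace = run_cascade(levels, 8.0)
for name, steps in trace:
    print(f"{name}: {steps} step(s)")
```

In this toy run the detailed level is used only for the first few steps, which is where the time savings of such a chain would come from. In a real system, each hand-off test would encode the hard question the project describes: when the coarser model is accurate enough to take over.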