Fast computer processors, tensor processing units, hardware accelerators, and heterogeneous architectures have enabled large-scale speed-ups in computational power, but memory speeds have not kept pace. Memory performance has therefore become the bottleneck in many applications that rely on heavy memory access. Several emerging memory technologies, such as 3D-stacked dynamic random access memory (3D-DRAM) and non-volatile memory, attempt to address memory bottlenecks from a hardware perspective, but with tradeoffs among bandwidth, power, latency, and cost. Rather than redesigning existing algorithms to suit a specific memory technology, this project will develop a machine learning-based approach that automatically learns access patterns that can be used to prefetch data optimally. Specifically, highly compact long short-term memory (LSTM) models will serve as the centerpiece of the prefetcher for predicting memory accesses. Through novel model compression techniques, hierarchical memory modeling, and dedicated hardware, this project will overcome barriers to fully exploiting machine learning and emerging hardware to improve prefetching. Successful completion of this project will lead to improved memory performance for applications including signal processing, computer vision, and language processing.

A practical LSTM-based prefetcher implementation in hardware must deal with several challenges that will be addressed in this endeavor: (i) training a small model (to enable fast inference) on large traces so that it predicts memory accesses accurately across multiple applications; (ii) compressing the model to ensure real-time inference; (iii) retraining the model online, on demand, to learn application-specific models, which requires fast learning from a small amount of data; (iv) making prefetching decisions in real time based on the prediction and uncertainty of the model, namely "what", "when", and "where" to prefetch, which also requires careful modeling of the target memory hierarchy; (v) deciding in real time, based on the predictions, whether reordering data (dynamic data layout) can improve latency and make future prefetches more effective; (vi) mapping the prediction and decision-making framework onto the limited configurable hardware available, ensuring low-latency training and high-throughput prefetching within a small area and power budget.
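As an illustration of the decision step in challenge (iv), the following minimal sketch shows how a predicted next access and its confidence might gate a prefetch. It is not the project's method: the `FrequencyPredictor` class is a hypothetical stand-in for the compact LSTM (it only counts which address delta follows which), and the delta encoding, the function names, and the 0.75 confidence threshold are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def to_deltas(addresses):
    """Convert an address trace into successive address differences."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


class FrequencyPredictor:
    """Hypothetical stand-in for the LSTM: counts which delta follows which."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, deltas):
        # Record every (previous delta, next delta) pair seen in the trace.
        for prev, nxt in zip(deltas, deltas[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, prev_delta):
        """Return (most likely next delta, empirical probability)."""
        followers = self.counts.get(prev_delta)
        if not followers:
            return None, 0.0
        delta, hits = followers.most_common(1)[0]
        return delta, hits / sum(followers.values())


def decide_prefetch(predictor, last_address, last_delta, threshold=0.75):
    """Return the address to prefetch, or None if the model is too uncertain."""
    delta, confidence = predictor.predict(last_delta)
    if delta is None or confidence < threshold:
        return None  # assumed policy: skip the prefetch rather than pollute the cache
    return last_address + delta


# A strided trace: each access is 64 bytes after the previous one.
trace = [0x1000, 0x1040, 0x1080, 0x10C0, 0x1100, 0x1140]
deltas = to_deltas(trace)

predictor = FrequencyPredictor()
predictor.train(deltas)

target = decide_prefetch(predictor, trace[-1], deltas[-1])
print(hex(target) if target is not None else "no prefetch")
```

Predicting deltas rather than raw addresses keeps the output vocabulary small, which is one common way to keep a learned prefetcher compact. The threshold only covers "what" to prefetch; "when" and "where" would additionally depend on a model of the memory hierarchy, which this sketch leaves out.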

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

University of Southern California
Los Angeles
United States