A growing number of commercial and enterprise systems rely on machine learning algorithms. This shift is largely due to breakthroughs in machine learning algorithms that extract insights from massive amounts of data. Consequently, such systems must process ever-increasing amounts of data, demanding higher memory bandwidth and capacity. However, the bandwidth between processors and off-chip memory has not kept pace, owing to various stringent physical constraints. Moreover, data transfers between processors and off-chip memory consume orders of magnitude more energy than on-chip computation, a consequence of the disparity between interconnect and transistor scaling.
Exploiting recent 3D-stacking technology, the research community has explored near-data processing architectures that place processors and memory on the same chip. However, it is unclear whether such processing-in-memory (PIM) attempts will succeed in commodity computing systems, given the high cost of 3D-stacking technology and the changes they demand in existing processors, memory, and/or applications. Faced with these challenges, the PIs propose to investigate near-data processing platforms that require no changes to processors, memory, or applications, exploiting deep insights into commodity memory subsystems and the network software stack. The success of this project will produce inexpensive but powerful near-data processing platforms that can directly run existing machine learning applications without modification.