Machine Learning (ML) has rapidly emerged as one of the foundational technologies of this century. It is pervasive in our lives today, from allowing us to unlock smartphones to powering recommendation engines for almost any human activity (dining, movies, services, etc.). The applications of ML are expected to become even more transformative in the future, especially in healthcare, autonomous transport, robotics, agriculture, education, and space exploration. ML models are computationally expensive, need large amounts of memory to store the trained model, and have strict runtime requirements. They cannot be run efficiently on general-purpose processors, which has led to an explosive growth in custom hardware accelerators for ML. However, getting good performance and energy efficiency from these accelerators is itself challenging, as it depends on three components: the ML model itself, the hardware parameters, and the scheduling of the ML model's computations onto the accelerator's limited compute and memory resources. The proposed research will develop an open-source software cyberinfrastructure called MAESTRO that can be used to analytically determine the performance and energy efficiency of ML models on target hardware platforms, prior to actually building the hardware and deploying the model. MAESTRO will be extremely useful for students, researchers, and industry practitioners alike to learn about, design, and deploy custom ML solutions. The project will also engage undergraduate and high-school students, teaching them about ML through outreach activities involving hackathons and hardware building.
Mapping ML computations onto the finite compute elements within an accelerator, and understanding the corresponding data that must move across the memory hierarchy, is a non-trivial problem; the space of all possible ways of slicing and dicing the model (known as the "dataflow") is exponentially large, and the benefits of any mapping vary across ML models and target accelerators. To address this, the PI will first develop a set of data-centric directives that directly describe the mapping of the ML model over the accelerator, enabling precise calculation of data reuse opportunities across space and time to reduce overall data movement. Next, the PI will develop the MAESTRO analytical cost model framework to estimate reuse, end-to-end performance, and energy on the target hardware. Finally, a set of tools will be developed around MAESTRO to automatically search for the optimal hardware, mapping, and model given constraints on runtime, power, energy, or area. The proposed framework will enable iterative innovation and co-design across the ML model, the mapping, and the target hardware, and will therefore be highly valuable to ML model developers, compiler writers, and computer architects. MAESTRO will be released and maintained under an open-source license, and the PI will run periodic tutorials to build an active user base in the research community.
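To illustrate the kind of analytical reasoning such a cost model performs, consider a toy estimate of off-chip (DRAM) traffic for a tiled matrix multiply, one of the core kernels in ML workloads. This is a minimal sketch written for this description, not MAESTRO's actual interface; the function name, the output-stationary schedule, and the tile parameters are all illustrative assumptions. It shows how the choice of mapping (here, the tile sizes) directly determines how often each operand must be re-fetched, and hence the achievable data reuse.

```python
import math

def dram_traffic(M, N, K, Tm, Tn):
    """Toy analytical model (hypothetical, for illustration only).

    Estimates DRAM traffic in elements for a tiled matmul
    C[M,N] += A[M,K] * B[K,N] under an output-stationary schedule:
    each (Tm x Tn) tile of C stays on-chip while K is streamed.
    """
    # Each element of A is fetched once per column-tile of C it feeds.
    traffic_a = M * K * math.ceil(N / Tn)
    # Each element of B is fetched once per row-tile of C it feeds.
    traffic_b = K * N * math.ceil(M / Tm)
    # Output-stationary: every C element is written back exactly once.
    traffic_c = M * N
    total = traffic_a + traffic_b + traffic_c
    macs = M * N * K
    # Arithmetic intensity: multiply-accumulates per element moved.
    return total, macs / total
```

Evaluating the model for a 64x64x64 matmul shows the effect of the mapping: 64x64 tiles move 12,288 elements, while 16x16 tiles move 36,864, a 3x difference in data movement for identical arithmetic. A full cost model extends this style of analysis to arbitrary loop nests, multi-level memory hierarchies, and energy per access.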
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.