Moving machine learning techniques from the computing cloud down to edge computing nodes closer to the user is highly desirable in many use cases that require quick responses derived from locally collected data. A typical scenario is multi-task applications, where a cloud server hosts well-trained large-scale models that are deployed to edge devices according to specific local needs. Examples include language translation and accented speech recognition in multi-language audio conferences. However, supporting multi-task applications on edge devices is challenging because of the high computational cost and the large variety of models involved. Very little effort has been devoted to the corresponding hardware design, especially for supporting multi-task speech and natural language processing (NLP) applications on edge computing devices. This research aims to design a novel computing system dedicated to such multi-task applications, particularly for accelerating speech/NLP workloads, by combining innovations in both the algorithm and hardware domains. The study benefits big data research and industry at large by inspiring an interactive design philosophy between speech/NLP algorithms and the corresponding computing platforms. Undergraduate and graduate students involved in this research will be trained for the next-generation information technology workforce.
Unlike conventional edge computing approaches, which mainly focus on balancing workloads between the cloud and edge devices and on optimizing the communication between them, this project concentrates on efficiently decomposing and compressing task-specific sub-models extracted from a large multi-task model in the cloud, so that the models deployed on edge devices meet functionality and performance needs under specific hardware constraints. More specifically, the algorithm-level innovations enable a decomposable speech/NLP model that consistently delivers proper functionality and performance on resource-limited edge devices, while the hardware-level innovations allow these devices to efficiently support speech/NLP multi-task applications and unleash the great potential of Resistive Random Access Memory (ReRAM)-based computing platforms. During real-time operation, the model on an edge device can be scaled up or shrunk down to accommodate the dynamic hardware environment and user needs. The research leads to a holistic methodology spanning algorithm redesign, hardware acceleration, and integrated software/hardware co-design.
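To make the decompose-and-compress idea concrete, the following is a minimal illustrative sketch in PyTorch, not the project's actual method: a shared encoder with per-task heads stands in for the cloud-resident multi-task model, a single task head is extracted for deployment, and the hidden width is pruned to fit a hardware budget. All names here (MultiTaskModel, extract_submodel, shrink_width, and the keep budget) are hypothetical.

import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Shared encoder with per-task heads (e.g., translation, ASR)."""
    def __init__(self, dim_in=64, hidden=256, tasks=("translate", "asr")):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, hidden), nn.ReLU())
        self.heads = nn.ModuleDict({t: nn.Linear(hidden, 32) for t in tasks})

    def forward(self, x, task):
        return self.heads[task](self.encoder(x))

def extract_submodel(model, task):
    """Decompose: keep the shared encoder and only the head this edge device needs."""
    sub = MultiTaskModel(tasks=(task,))
    sub.encoder.load_state_dict(model.encoder.state_dict())
    sub.heads[task].load_state_dict(model.heads[task].state_dict())
    return sub

def shrink_width(sub, task, keep):
    """Compress: retain the `keep` hidden units whose outgoing weights have the
    largest L1 norm, slicing the encoder and head consistently (a stand-in for
    the project's compression techniques)."""
    enc, head = sub.encoder[0], sub.heads[task]
    scores = head.weight.abs().sum(dim=0)           # importance per hidden unit
    idx = torch.topk(scores, keep).indices.sort().values
    new_enc = nn.Linear(enc.in_features, keep)
    new_enc.weight.data = enc.weight.data[idx].clone()
    new_enc.bias.data = enc.bias.data[idx].clone()
    new_head = nn.Linear(keep, head.out_features)
    new_head.weight.data = head.weight.data[:, idx].clone()
    new_head.bias.data = head.bias.data.clone()
    sub.encoder[0], sub.heads[task] = new_enc, new_head
    return sub

cloud_model = MultiTaskModel()
edge_model = shrink_width(extract_submodel(cloud_model, "asr"), "asr", keep=64)
print(edge_model(torch.randn(1, 64), "asr").shape)  # torch.Size([1, 32])

Under the same assumptions, scaling the edge model back up at run time would restore pruned units from the cloud-resident copy of the full model, while shrinking further would repeat the pruning step with a smaller budget.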
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.