The era of data science is underway: an explosion of data from social media, environmental monitoring, e-health, national defense, and advances in science and engineering is driving a fast-growing information-technology sector. As a foundational pillar of big data, data centers play a crucial role in efficiently collecting, storing, retrieving, classifying, and processing large datasets. In addition, the tremendous volumes of data held in data centers, together with rapid advances in modern computing techniques, have propelled the ongoing boom of machine learning (ML) within artificial intelligence (AI). While ML aims to automatically learn useful properties from data for accurate and timely stochastic decision making, there is an increasing need for this decision making to occur in real time. Thus, one of the most important tasks for an AI-based interactive data center is to efficiently process computation-intensive and time-sensitive multimedia (e.g., video and audio) data and to provide AI-based decision-making services. However, because of limited computing and storage capabilities, random uncertainties in the availability of software/hardware resources, and statistically multiplexed switching in data centers, deterministic delay-bounded requirements for high-volume real-time services in AI-based interactive data centers are often infeasible. Thus, the PI proposes to extend and apply the statistical delay-bounded quality-of-service (QoS) provisioning theory as an alternative solution for supporting real-time decision-making services, where the goal is to guarantee a bounded delay with a small violation probability, thereby significantly reducing the processing delays currently found in AI-based interactive data centers. Meeting these diverse delay-bounded QoS requirements demands the development of various software/hardware accelerators.
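The difference between a deterministic and a statistical delay bound can be illustrated with a minimal sketch. It is not the project's method; it only assumes that a statistical guarantee means the fraction of requests exceeding a delay bound stays at or below a small violation probability. The function names, the sample delays, the 20 ms bound, and the tolerance values are all hypothetical and chosen for illustration.

```python
from typing import Sequence


def violation_probability(delays_ms: Sequence[float], bound_ms: float) -> float:
    """Empirical estimate of Pr{delay > bound} over observed request delays."""
    if not delays_ms:
        raise ValueError("need at least one delay sample")
    exceeded = sum(1 for d in delays_ms if d > bound_ms)
    return exceeded / len(delays_ms)


def meets_statistical_bound(
    delays_ms: Sequence[float], bound_ms: float, epsilon: float
) -> bool:
    """True if the empirical violation probability is at most epsilon."""
    return violation_probability(delays_ms, bound_ms) <= epsilon


# Hypothetical per-request delays (ms); one request falls in the latency tail.
sample_delays_ms = [12.0, 15.0, 11.0, 14.0, 13.0, 95.0, 12.5, 13.5, 14.5, 16.0]
bound_ms = 20.0  # hypothetical delay bound

# A deterministic bound requires every request to finish within the bound.
deterministic_ok = max(sample_delays_ms) <= bound_ms

# A statistical bound tolerates a small fraction of violations (here 15%).
statistical_ok = meets_statistical_bound(sample_delays_ms, bound_ms, epsilon=0.15)

print(f"violation probability: {violation_probability(sample_delays_ms, bound_ms):.2f}")
print(f"deterministic bound met: {deterministic_ok}")
print(f"statistical bound met:   {statistical_ok}")
```

In this sketch the single slow request makes the deterministic requirement fail, while the statistical requirement still holds because only one request in ten exceeds the bound.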
The objective of this research is to systematically investigate the fundamental and challenging issues in extending, applying, and implementing the statistical delay-bounded QoS provisioning theory to support real-time, interactive, decision-making services over AI-based interactive data centers.
While the statistical delay-bounded QoS provisioning theory has been shown to be a powerful technique and a useful performance metric for supporting time-sensitive multimedia transmissions over mobile computing networks, how to efficiently extend and implement it to statistically upper-bound the tail latency (the worst-case latency that dictates delay-bounded QoS performance) of AI-based interactive data-center services has been neither well understood nor thoroughly studied. To overcome these challenges, this project employs various emerging computer software/hardware technologies to develop a set of AI-based hybrid software/hardware acceleration architectures, algorithms, and schemes that support low-tail-latency QoS provisioning for multi-core AI-based interactive data-center services while reducing the computational workloads and complexities introduced by parallel and distributed data centers. The proposed framework is based mainly on novel acceleration architectures for both software and hardware design and optimization, which aim to significantly boost computing efficiency by minimizing instruction and data movement and processing across processors and memories. A number of QoS-enabling engines, which leverage the unique features and techniques of the statistical delay-bounded QoS provisioning theory and AI-based computing accelerators, constitute the main foundation of this project. More specifically, the research focuses mainly on the following closely coupled research tasks. (1) Develop deep-learning-based processing-in-memory (PIM) systems (the PIM QoS-enabling engine) to accelerate training for application classification. (2) Develop deep-learning-based application-encoding/aggregating mechanisms and then compare the encoded vectors with trained profiling outputs to classify/aggregate applications.
(3) Develop hierarchical cache-partitioning architectures to statistically upper-bound the tail latency of data-center services by clustering applications based on their load profiles. (4) Develop precise tail-latency QoS performance-prediction models/metrics and monitoring systems to guarantee the statistical delay-bounded QoS for low tail latency of higher-priority co-running applications. (5) Develop modeling and analytical techniques, and simulation tools/testbeds, to validate and evaluate the performance of the proposed architectures, frameworks, protocols/algorithms, and schemes. The project's research is intended to benefit the national economy, environment, and society. Also, this project is well integrated with the PI's development of new graduate and undergraduate data-center-relevant curricula/courses at Texas A&M University. The important findings of this project are to be disseminated to the research community through journals, conferences, and websites.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.