Deep learning has made substantial strides in computer vision, speech recognition, natural language processing, and other applications. New algorithms, larger datasets, increased compute power, and machine learning frameworks all contribute to this success. An important missing piece, however, is that it remains challenging for users to effectively provision deep learning applications and integrate them into existing datacenters. This project develops novel solutions that enable effective use of cloud resources, which in turn will help broaden the population of users capable of discovering new and better deep learning models and applying them in novel settings and applications.

When developing new applications, users experiment with many deep neural networks (DNNs) but have limited knowledge of their computational demands. Because DNN performance scales non-linearly with allocated resources, predicting throughput improvements is challenging; techniques developed in thrust 1 of this project quickly guide provisioning and resource allocation decisions. In such environments, efficient inter-job resource sharing, particularly among similar DNNs, remains an open problem, addressed in thrust 2 of the project by developing effective scheduling techniques. The diversity of datacenter workloads (e.g., DNNs and web services), each with a different resource "affinity", creates opportunities to embrace cloud federations. While promising, cloud federations lack techniques to support their sustainable deployment; these are developed in thrust 3 of the project.

This project is committed to diversity in research and education, involving undergraduate and graduate students, coupled with an existing extensive K-12 outreach effort. The developed experimental testbed is utilized for both research and education. All algorithms, designs, software, and data are made publicly available so that researchers and educators can replicate and improve on the developed technologies. Solutions to the fundamental problems that are the focus of this project will enable the development of new deep learning models and increase the adoption rate of these technologies in novel application domains.

All reports and code are stored in an SVN-based repository, and software and related documents are publicly available on GitHub. All data is retained for at least 7 years beyond the life of the project. Research products, including supplemental information, are made available promptly after publication through http://qed.usc.edu. These records are durable, accessible through standard web protocols, and kept secure. Appropriate storage media are used to keep data access current as needed. Data that supports patents resulting from the project is retained for the duration of the patents. The repository URL is http://qed.usc.edu/D3/repository.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1816887
Program Officer: Erik Brunvand
Budget Start: 2018-10-01
Budget End: 2021-09-30
Fiscal Year: 2018
Total Cost: $516,000
Name: University of Southern California
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089