We stand on the verge of dramatic advances in deep learning algorithms, which will soon enable widespread adoption of computer-vision-based object recognition in scientific inquiry, commercial applications, and everyday life. However, practical large-scale applications in this area are currently limited by the computational capabilities of conventional computer systems. In recent years, performance improvement in Central Processing Units (CPUs) has slowed considerably. This slowdown has increased interest in using Graphics Processing Units (GPUs) to accelerate deep learning computer vision algorithms. Although GPUs can perform these tasks faster than CPUs, they suffer from limited flexibility and very high power consumption. An alternative technology, the Field-Programmable Gate Array (FPGA), is very attractive for problems in this domain thanks to its flexibility and power efficiency. However, FPGAs have been underused in this area, largely because of unfamiliarity and misconceptions. The goal of this project is to demonstrate, through hard experimental evidence, the power and performance advantages of FPGAs over GPUs for deep-learning-based computer vision problems. The PIs will disseminate their findings to the broader research community with the goal of encouraging the use of FPGAs in ground-breaking work on the grand challenges of deep learning and computer vision.

This project follows a three-stage research plan. First, the PIs will prepare and validate a state-of-the-art image detection application based on convolutional neural networks. This application will use the popular Caffe library, which allows convolutional networks to be evaluated on both CPUs and GPUs. Second, the PIs will perform a detailed characterization and profiling of the application's performance on the GPU, seeking to understand its performance characteristics and their underlying causes. Third, the PIs will implement portions of the algorithm on an FPGA and perform an in-depth analysis to identify and explain the advantages and disadvantages the platform offers. The PIs anticipate that the portion of the algorithm that runs slowest on the GPU will achieve a significant speedup on the FPGA, owing to the FPGA's efficient support for irregular, fine-grained parallelism. Meanwhile, the portion that runs fastest on the GPU is expected to achieve comparable performance on the FPGA, but at dramatically lower power consumption.

This project will integrate research with graduate and undergraduate education. PhD students will gain experience in GPU optimization and application-specific high-performance FPGA design. Master's and undergraduate students will gain valuable skills by assisting with the project through the Masters Advanced Project in Computer Science and the Undergraduate Senior Design Project in Electrical and Computer Engineering. The results of the study will be published at prominent venues to ensure maximum exposure to the relevant research communities.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 1453460
Program Officer: Tracy J. Kimbrel
Budget Start: 2014-08-01
Budget End: 2016-07-31
Fiscal Year: 2014
Total Cost: $95,000
Name: State University New York Stony Brook
City: Stony Brook
State: NY
Country: United States
Zip Code: 11794