We stand on the verge of dramatic advances in deep learning applications, advances that will soon make computer-vision-based recognition practical and widely adopted in scientific inquiry, commercial applications, and everyday life. Grand challenge problems are within our reach: we will soon be able to build automated systems that recognize nearly everything we see, systems that can recognize the tens of thousands of basic-level categories that psychologists posit humans can recognize, and systems that continuously learn from photos, video, and web content to build more complete and accurate visual models of the world. However, while the capabilities of deep learning clearly put these goals within reach, it is equally clear that the required computational power cannot come from general-purpose processors. To succeed, we will need to build specialized, domain-specific computing systems based on hardware accelerators capable of exploiting the extreme fine-grained parallelism inherent in deep-learning workloads. This project leverages parallelization and reconfigurable hardware to create an automated system that distributes computer vision algorithms onto a large number of field-programmable gate arrays (an FPGA cloud).
This project builds on recent advances in domain-specific hardware generation tools to bring the parallelism and performance-per-watt advantages of FPGAs to large-scale computer vision problems. By developing a platform that runs deep learning algorithms on large clouds of FPGAs, the project explicitly addresses scaling algorithms beyond what a single chip can process. Doing so requires solving a wide range of challenging problems: analyzing algorithms, building domain-specific hardware generators, designing the communication needed to scale algorithms across multiple FPGAs, and extensively validating the generated hardware on state-of-the-art deep learning approaches to computer vision. The project advances tools for designing domain-specific FPGA implementations of algorithms, a step toward making highly parallel, energy-efficient computing more widely available. For computer vision in particular, the benefits compound: higher parallelism, lower gate counts from moving to fixed-point arithmetic where possible, and better performance per watt, which together yield higher computational density in servers. These improvements have the potential to significantly expand the role computer vision plays in our daily lives, making computers better able to understand the context of our world.
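To make the fixed-point savings concrete, the sketch below quantizes a neuron's weights and inputs to a 16-bit fixed-point format and performs the dot product using only integer multiply-accumulate operations, the pattern an FPGA's DSP blocks implement directly. This is a minimal software illustration; the specific Q8.8 format, bit widths, and helper names are assumptions for exposition, not part of the project's actual toolchain, which would emit FPGA logic rather than run software.

```python
# Minimal fixed-point sketch (illustrative; the Q8.8 format and helper names
# are assumptions, not the project's actual toolchain).
import numpy as np

FRAC_BITS = 8                      # fractional bits in Q8.8
TOTAL_BITS = 16                    # word width of the fixed-point type
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    """Quantize a float array to signed 16-bit fixed point (Q8.8)."""
    lo, hi = -(1 << (TOTAL_BITS - 1)), (1 << (TOTAL_BITS - 1)) - 1
    return np.clip(np.round(x * SCALE), lo, hi).astype(np.int32)

def fixed_dot(a_fx, b_fx):
    """Dot product using only integer multiply-accumulate; the product of
    two Q8.8 values carries 2 * FRAC_BITS fractional bits."""
    acc = np.sum(a_fx.astype(np.int64) * b_fx.astype(np.int64))
    return acc / float(1 << (2 * FRAC_BITS))   # rescale to real units

rng = np.random.default_rng(0)
w = rng.standard_normal(64)        # e.g. one neuron's weight vector
x = rng.standard_normal(64)        # one input activation vector

exact = float(np.dot(w, x))
approx = fixed_dot(to_fixed(w), to_fixed(x))
print(f"float: {exact:.4f}  Q8.8: {approx:.4f}  error: {abs(exact - approx):.2e}")
```

In hardware, the analogous change replaces floating-point multipliers with narrow integer multipliers, which require substantially fewer gates; this, roughly speaking, is the source of the gate-count and computational-density gains described above.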