This project develops a new parallel computing platform, the Visual Computing Database, that facilitates the development of applications requiring visual data analysis at massive scale. The system combines ideas from traditional relational database management systems (to organize and manage visual data collections more easily and powerfully) with modern graphics programming abstractions for efficiently manipulating pixel data. The project implements a prototype of the visual computing database, releases it to the community as an open-source project, and deploys the system at scale as a service to scientists and researchers on the Google Cloud Platform. There is strong evidence that, in domains ranging from personal digital assistants that interpret one's surroundings, to management of critical infrastructure in smart cities, to scientific data analysis, a fundamental requirement of the next generation of visual and experiential computing (VEC) applications will be the efficient analysis and mining of large repositories of visual data (images, videos, RGBD, etc.). Scaling visual data analysis applications to operate on collections such as the photos and videos on Facebook and YouTube, the traffic cameras in a city, or petabytes of images in a digital sky survey presents significant computer science challenges, due to the size of visual data representations and the computational expense of algorithms that understand and manipulate large image datasets. The difficulty of developing efficient, supercomputing-scale applications from scratch inhibits the field's ability to explore advanced data-driven VEC applications.
A central aspect of the project is the design of a new visual data query language that integrates concepts from high-performance, functional image processing languages with relational operators and spatial and temporal predicates, providing the ability to execute sequences of complex image and video analysis operations with high efficiency in the database (near the data store). Since visual analysis workloads involve tight integration of data retrieval operations and processing of the result sets (e.g., large-scale machine learning, image registration/alignment, and 3D reconstruction), a key design challenge is making the results of database operations easily accessible to non-relational, supercomputing-scale computations. Altogether, the project addresses fundamental systems design questions such as: What is a good visual query language for future visual data analysis tasks? How can key operations be implemented efficiently on throughput hardware at scale? What are the appropriate benchmarks for evaluating visual data analysis systems at scale?
URL: http://graphics.cs.cmu.edu/projects/visualdb