This big data project develops tools to support researchers and developers in prototyping multimedia content analysis algorithms at large scale. Typically, scientists and engineers prefer high-level programming languages such as Python or MATLAB to conduct experiments, because they allow a novel idea to be implemented quickly. Experiments on big data, however, are often computationally intensive and must therefore eventually be recoded into a low-level language by expert programmers to achieve sufficient performance, creating a gap between productivity and performance. In addition, multiple strategies may exist for mapping a problem onto parallel hardware depending on the input data size and the hardware parameters, further exacerbating the problem. Using the application area of multimedia content analysis as an example (an area with one of the largest and fastest-growing volumes of data, due to the steady upload of consumer-produced videos), this project performs research on a pattern-oriented, application-specific specialization framework that uses a tiered approach to parallel programming. The ultimate aim is to provide the scalability of diverse parallel hardware at the productivity level of high-level languages.
Social media videos are increasingly used for scientific research, as they allow researchers to observe and model many phenomena studied, for example, in the social sciences, economics, meteorology, and medicine. More scalable content analysis therefore benefits any field that uses social media videos. Moreover, social media videos are an everyday part of many people's lives: making multimedia content analysis more scalable enables more students and researchers to develop better algorithms, with impact that reaches far beyond the research community. The framework is made available on the project website (http://smash.icsi.berkeley.edu).