With advances in camera technologies, and as cloud storage, network bandwidth and protocols become available, visual media are becoming ubiquitous. Video recording became de facto universal means of instruction for a wide range of applications such as physical exercise, technology, assembly, or cooking. This project addresses the scientific and technological challenges of video shooting in terms of coverage and optimal views planning while leaving high level aspects including creativity to the video editing and post-production stages.
Camera placement and novel view selection challenges are modeled as optimization problems that minimize the uncertainty in the location of actors and objects, maximize coverage and effective appearance resolution, and optimize object detection for the sake of semantic annotation of the scene. New probabilistic models capture long range correlations when the trajectories of actors are only partially observable. Quality of potential novel views is modeled in terms of resolution that is optimized by maximizing the coverage of a 3D orientation histogram while an active view selection process for object detection minimizes a dynamic programming objective function capturing the loss due to classification error as well as the resources spent for each view.
The project advances active sensing and perception and provides the technology for further automation on video capturing. Such technology has broader impact on the production of education videos for online courses as well as in telepresence applications. Research results are integrated into robotics and digital media programs addressing K-12 students.