Cameras are pervasively used for surveillance and monitoring applications and can capture a substantial amount of image data. The processing of this data, however, is either performed a posteriori or at powerful backend servers. While a posteriori and non-real-time video analysis may be sufficient for certain groups of applications, it does not suffice for applications such as autonomous navigation in complex environments, or hyper spectral image analysis using cameras on drones, that require near real-time video and image analysis, sometimes under SWAP (Size Weight and Power) constraints.
This work hypothesizes that future data challenges in real-time imaging can be overcome by pushing computation into the image sensor. Such systems will exploit the massive parallel nature of sensor arrays to reduce the amount of data analyzed at the processing unit. To this end, vertically integrated technology, such as focal plane sensor processors (FPSP), have been developed to overcome the limitations of conventional image processing systems. While some of these devices are programmable and offer the benefits of close-to-sensor processing such as performance and bandwidth reduction, they exhibit many drawbacks. For instance, each column of pixels is handled by a single processor, which reduces the parallelism and all pixels are treated equally and processed at the same rate, despite differences in input relevance for the application at hand. Consequently, systems spend more time spinning on non-relevant data, which increases sensing and computation time and power consumption. Research on FPSPs has mostly focused on technology aspects with some proof of concepts. Architectural design approaches, that involve high-level synthesis with the goal of mapping applications to low-level architectures, have not gained a lot of attention.
To overcome the limitations of existing architectures, the goal of this research is the design of a highly parallel, hierarchical, reconfigurable and vertically-integrated 3D sensing-computing architecture (XPU), along with high-level synthesis methods for real-time, low-power video analysis. The architecture is composed of hierarchical intertwined planes, each of which consists of computational units called XPUs. The lowest-level plane processes pixels in parallel to determine low level shapes in an image while higher-level planes use outputs from low-level planes to infer global features in the image. The proposed architecture presents three novel contributions: a hierarchical, configurable architecture for parallel feature extraction in video streams, a machine learning based relevance-feedback method that adapts computational performance and resource usage to input data relevance, and a framework for converting sequential image processing algorithms to multiple layers of parallel computational processing units in the sensor.
The results of this projects can be used in other fields, where large amounts of processing need to be performed on data collected by generic sensors deployed in the field. Furthermore, mechanisms for translating sequential constructs into functionally equivalent accelerators using hardware constructs will lead to highly parallel and efficient sensing units that can perform domain specific tasks more efficiently.