This project explores new directions for solving the following problem: given an image, determine whether and where specific objects, or objects from a specific category, appear in the image. A visual category is defined, as earlier, as a collection of objects that share characteristic features which are visually similar and occur in similar configurations. The visual nature of the objects sought is communicated through (training) data containing them and is estimated using machine learning. The approach consists of two main parts. First, it learns whether a given set of previously unseen images (including videos), say supplied by a user, contains any dominant themes, namely, subimages that occur frequently and look similar. Second, given a set of categories automatically inferred during training and a new test image, the approach recognizes all occurrences of the learned categories in the image. It delineates each such object in the image and labels it with its category name. Neither learning nor subsequent recognition requires human supervision. The approach learns and recognizes categories as image hierarchies. The impact of the project includes accurate high-speed extraction of image regions, image representation by a connected segmentation tree, robust image matching, unsupervised extraction of hierarchical category models, efficient recognition of a large number of categories, unsupervised estimation of the perceptually salient relevance weights of subcategory detections for category recognition, and generalization of the proposed approach to the extraction of texture elements. More broadly, the proposed approach is useful for applications in search engines, surveillance, video analytics, monitoring, and data mining.
1. Work Done

This extended grant has built upon the work completed through the last report (2011-12). We have continued the work on the three previously reported major parts and completed one major new part. These parts are as follows.

(1) We have exploited the accuracy and the detailed low level analyses of the segmentation algorithm proposed in our earlier work [3] for the optical flow estimation problem, and have obtained promising results.
(2) We have developed a novel way of enhancing the perceived sharpness of natural images by systematically learning human preferences.
(3) We have started to develop a novel way of compressive sampling and reconstruction of images that exploits the homogeneity and spatial correlation of pixels within image regions.
(4) We have begun to use our low level segmentation algorithm for obtaining a super-resolution image from a given low resolution image.

Segmentation Based Optical Flow Estimation: We have proposed a method for regularizing the optical flow field in order to reduce the excessive blur around motion boundaries seen in most existing flow algorithms. We generate hypotheses for accurate motion boundaries by fusing information from a preliminary, smoothing-error-prone flow field (e.g., obtained from a traditional optical flow algorithm) with the relatively accurate low level image segmentation produced by the algorithm developed by our group. The method can bootstrap the performance of any given traditional flow estimation algorithm, so we expect it to consistently improve or, at worst, preserve the performance of that algorithm. By working with several traditional optical flow algorithms and datasets, we have demonstrated that our approach indeed meets this expectation.

Learning Human Preferences for Sharpening Images: We have proposed a method for maximizing the perceived sharpness of an image. Image sharpness is defined in terms of the one-dimensional contrast across region boundaries.
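As a rough illustration of sharpness as one-dimensional contrast across a region boundary, the following Python sketch measures the contrast of a single ramp in a 1-D intensity profile. It is not the project's ramp extractor: the function name `ramp_contrast` and the working definition of a ramp (the maximal strictly monotonic run of samples around the steepest step) are assumptions made only for this example.

```python
def ramp_contrast(profile):
    """Return (start, end, contrast) for the ramp containing the steepest step.

    Assumption (hypothetical definition for this sketch): a ramp is the
    maximal strictly monotonic run of samples around the largest absolute
    first difference of a 1-D intensity profile.
    """
    values = [float(v) for v in profile]
    if len(values) < 2:
        return 0, 0, 0.0

    # First differences: diffs[i] is the step from sample i to sample i + 1.
    diffs = [b - a for a, b in zip(values, values[1:])]

    # Index of the steepest step (the first one, if there are ties).
    k = max(range(len(diffs)), key=lambda i: abs(diffs[i]))
    if diffs[k] == 0:
        return 0, 0, 0.0  # flat profile: no ramp at all

    rising = diffs[k] > 0

    def same_direction(d):
        # True if step d continues the ramp in the same direction.
        return d != 0 and (d > 0) == rising

    # Grow the ramp to the left while the steps keep the same direction.
    start = k
    while start > 0 and same_direction(diffs[start - 1]):
        start -= 1

    # Grow the ramp to the right in the same way.
    end = k + 1
    while end < len(diffs) and same_direction(diffs[end]):
        end += 1

    # Contrast is the absolute intensity change between the ramp's two ends.
    return start, end, abs(values[end] - values[start])


if __name__ == "__main__":
    # A blurred step edge: flat at 10, a gradual rise, then flat at 30.
    print(ramp_contrast([10, 10, 10, 12, 18, 26, 30, 30, 30]))
```

For the sample profile, the ramp spans samples 2 to 6 and has a contrast of 20. A sharper rendition of the same boundary would keep this contrast but cover fewer samples.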
Region boundaries are treated as ramps, which are automatically extracted at all natural scales present. The unknown spatial (size, geometry) and photometric (contrast) scales at which the regions happen to occur in a given image are themselves identified automatically. The ramps are modified by adding over- and under-shoots at their two ends, in amounts that must be determined. Human judgments of perceived sharpness are collected and used to learn a function that models how the preferred over- and under-shoot amounts depend on the image content. Specifically, the best sharpening parameter values at an image location are estimated as a function of certain local image properties. We use a Gaussian mixture model (GMM) to estimate the joint probability density of the preferred sharpening parameters and the local image properties. The sharpening parameters are then adaptively estimated by parametric regression from the GMM. Experimental results demonstrate the superior performance of our approach over the traditional unsharp masking method.

Non Local Compressive Sampling (CS) and Reconstruction of Images: Despite the remarkable progress in the theory of CS, the sampling rate required to acquire an image using CS is still very high. In our preliminary work, a non-local compressive sampling (NLCS) recovery method is proposed to further reduce the sampling rate by exploiting the non-local patch correlation and local piecewise smoothness of regions in natural images. In addition, we are targeting linear complexity as a function of the matrix size without compromising accuracy.

From a Low-Resolution Image to a Super-Resolution Image: In [5] we propose a new image domain prior term for regularizing the super-resolution objective function. This term encourages the super-resolution reconstruction algorithm to preserve the local ramp structure around edges.
We then perform a domain transformation of the pixels belonging to the steepest ramps at the edge pixels, in order to adaptively ‘compress’ the ramps. The resulting non-uniformly spaced image is then upscaled to a uniform, high resolution grid using an edge preserving non-uniform interpolation scheme. This image is then used both as the prior constraint and as the initial guess for the iterative super-resolution reconstruction algorithm. Our results compare favorably to the classical backprojection algorithm as well as to conventional gradient domain priors.

2. Intellectual Merit

Overall, this work has contributed approaches to some low level computer vision and image processing problems that can, in turn, potentially aid or improve the performance of higher level tasks such as matching and recognition. We have found that the low level characteristics of our region based image representation framework can significantly aid the dense correspondence estimation (optical flow) problem and can be used to enhance the perceptual quality of images. We have validated our findings using several real world datasets and established benchmarks. We have begun to extend our region based image representation to design improved compressive sampling and reconstruction algorithms.

3. Broader Impact

This work has contributed to motion analysis, image enhancement and related aspects of machine learning, and image acquisition.

4. Human Resources

Three Ph.D. students have each been partially supported by the grant, in different semesters.