Learning visual representations remains a challenge for systems that interact with the real world or analyze visual information available on the web. Significant progress has been made with simple visual features based on gradient histograms: these models work extremely well on objects that have highly textured and nearly planar patterns or parts. However, these systems suffer when faced with classes of real-world objects that lack discriminative locally-planar opaque texture patches, especially objects with complex photometric models. This project develops layered visual models for visual recognition that can capture these classes of phenomena. The research team grounds the methods in a probabilistic foundation, primarily exploiting a sparse Bayesian approach to factoring observed image features into a set of component layers corresponding to an additive image formation process. Considering both local descriptor and local feature detector variants of the model, the research team proposes a new concept for interest point detection in the case of transparent objects: extrema detection in a latent-factor scale space. This model has the potential to find invariant local detections despite transparency, and could be useful in a range of vision applications beyond pure recognition for which sparse local feature detectors have proven valuable (e.g., registration, mosaicing, SLAM). Robotic vision systems can use this representation for enhanced recognition of everyday objects, supporting domestic and industrial applications. These representations also facilitate intelligent media processing and indexing.
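The core additive-factorization idea can be illustrated with a minimal sketch. This is not the project's actual sparse Bayesian method; it substitutes a plain L1-regularized sparse coder (ISTA) for the Bayesian machinery, and the dictionary, dimensions, and parameter values are all hypothetical. The point is only to show the shape of the problem: an observed feature vector is modeled as an additive combination of a few component layers, and inference recovers which layers are active.

```python
import numpy as np

def ista(D, y, lam=0.1, n_iter=200):
    """Recover a sparse coefficient vector x with y ~ D @ x via
    iterative soft-thresholding (ISTA). Illustrative stand-in for
    the sparse Bayesian factorization described in the text."""
    L = np.linalg.norm(D, ord=2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)             # gradient of 0.5*||D@x - y||^2
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

# Toy example: a 64-dim "image feature" formed additively from two of
# sixteen hypothetical layer atoms, plus a little noise.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 16))
D /= np.linalg.norm(D, axis=0)               # unit-norm dictionary atoms
x_true = np.zeros(16)
x_true[[2, 9]] = [1.5, -0.8]                 # two active component layers
y = D @ x_true + 0.01 * rng.standard_normal(64)

x_hat = ista(D, y, lam=0.05)
top = sorted(np.argsort(-np.abs(x_hat))[:2])
print(top)                                    # indices of the dominant recovered layers
```

In the project's setting the analogous inference would run over local image features rather than synthetic vectors, and a latent-factor scale space built from such decompositions would then be searched for extrema to obtain interest points.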