The objective of this research is the development of methods and software that will allow robots to detect and localize objects using Active Vision and to develop descriptions of their visual appearance in terms of shape primitives. The approach is bio-inspired and consists of three novel components. First, the robot will actively search the space of interest using an attention mechanism consisting of filters tuned to the appearance of objects. Second, an anthropomorphic segmentation mechanism will be used: the robot will fixate on a point within the attended area and segment the surface containing the fixation point, using contours and depth information from motion and stereo. Finally, a description of the segmented object will be developed, in terms of the contours of its visible surfaces and a qualitative description of their 3D shape. The intellectual merit of the proposed approach comes from the bio-inspired design and the interaction of visual learning with advanced behavior. The availability of filters will allow the triggering of contextual models that work in a top-down fashion and meet, at some point, the bottom-up low-level processes. Thus, the approach defines, for the first time, the meeting point where perception happens. The broader impacts of the proposed effort stem from the general usability of the proposed components. Adding top-down attention and segmentation capabilities to robots that can navigate and manipulate will enable many technologies, for example household robots, assistive robots for the care of the elderly, and robots in manufacturing, space exploration and education.
If I ask you for the scissors, you look around, you may go into another room, you may open a box or a drawer, and after some searching you find the scissors. In our project we developed the theory and the implementation of the set of processes necessary for a robot to solve a similar problem, namely to find objects that it knows about in images of a scene. So, if the robot knows about a thousand different objects, our algorithms will allow the robot to locate those objects in images of scenes. To achieve this we had to develop a new image processing operator, namely the "torque" operator. This operator highlights places in the image that denote "objecthood", i.e., places that are surrounded by boundaries. By then modulating the torque operator according to a prior object model (e.g., scissors), the image locations where the object may be light up, allowing us to find it by scrutinizing those locations through a process of segmentation. A robot that understands its environment well enough to navigate safely through it and also to search for and manipulate objects has obvious implications for a wide variety of applications. Two applications that are immediately ripe are healthcare and warehouse operations. For example, a robot could navigate a complex hospital environment with complete autonomy and perform basic fetch functions for patients who are bedridden or mobility-limited.
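The idea of scoring image places by how well they are surrounded by boundaries can be illustrated with a minimal sketch. This is not the project's implementation, and the report does not give the operator's formula. The sketch assumes a torque-like measure: for a square patch, sum the cross product of each pixel's displacement from the patch center with the local edge tangent (the image gradient rotated by 90 degrees), then normalize by the patch area. The function name `patch_torque` and its parameters are hypothetical.

```python
import numpy as np

def patch_torque(image, center, half_size):
    """Torque-like score of a square patch (illustrative assumption,
    not the project's actual operator)."""
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    gy, gx = np.gradient(image.astype(float))
    # Edge tangent: the gradient rotated by 90 degrees. Its length is the
    # edge strength, so flat regions contribute nothing.
    fx, fy = -gy, gx

    cy, cx = center
    y0, y1 = max(cy - half_size, 0), min(cy + half_size + 1, image.shape[0])
    x0, x1 = max(cx - half_size, 0), min(cx + half_size + 1, image.shape[1])

    # Displacement of every patch pixel from the patch center.
    ys, xs = np.mgrid[y0:y1, x0:x1]
    rx, ry = xs - cx, ys - cy

    # 2D cross product r x F at every pixel, summed over the patch.
    cross = rx * fy[y0:y1, x0:x1] - ry * fx[y0:y1, x0:x1]
    area = (y1 - y0) * (x1 - x0)
    return cross.sum() / (2.0 * area)

# Synthetic test scene: a bright disk on a dark background.
yy, xx = np.mgrid[0:64, 0:64]
disk = ((yy - 32) ** 2 + (xx - 32) ** 2 <= 10 ** 2).astype(float)

on_object = patch_torque(disk, (32, 32), 14)   # patch encloses the disk boundary
off_object = patch_torque(disk, (8, 8), 5)     # patch contains no edges
```

Under these assumptions, a patch that encloses a closed boundary gets a score of large magnitude, because the contributions around the contour share the same sign, while an edge-free patch scores zero. Choosing the patch size to match the expected size of a known object is one crude way to picture how a prior object model could modulate such a measure.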