The ubiquitous deployment of service robots in homes and service environments rests on the ability to detect and recognize objects of interest and navigate towards them. In the past few years, largely enabled by machine-learning approaches, the computer vision community has made tremendous progress on these problems. The standard datasets for training and evaluation, however, typically consist of static images curated from the internet that require extensive manual annotation. While this paradigm is effective for learning commonly encountered object categories, it does not generalize to the possibly thousands of objects of interest in service robotics applications. The development of learning algorithms that do not require supervision through detailed human annotations is one of the central problems in computer vision and artificial intelligence. The open problems in this area are motivated by our understanding of how humans and biological systems acquire new knowledge about visual content in their environments. This project will lead to a new class of algorithms for object discovery, object detection, 3-D environment modeling, and navigation. The research will support a cohort of diverse graduate and undergraduate students at George Mason University and will further advance the active vision benchmark dataset for evaluating the development and deployment of service robots.

The technical aims of the project focus on the development of methods for learning representations of objects that are specific to the context in which the robot operates, can be learned in a self-supervised manner without the need for laborious annotations, and are reusable across multiple tasks. This research uses camera motion as a form of self-supervision for learning new multi-view object embeddings, followed by zero-shot or few-shot training of powerful object detector models with little or no labeling effort. The inherent limitations of object detection will be tackled in the robotic setting by semantic target-driven navigation techniques, learned in a reinforcement learning framework on top of the representations and architectures developed for object detection. These policies will constitute a basic set of visually guided navigation skills for the robotic agent and will be integrated with mapping and exploration strategies. The approaches are motivated by the current challenges of embodied agents' perception in indoor scenes, but the solutions will be broadly applicable in settings that require long-term, ongoing interaction of an agent with dynamically changing environments.
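To make the self-supervision idea concrete, the following is a minimal sketch in PyTorch of learning multi-view object embeddings where camera motion supplies the training signal: crops of the same object tracked across two viewpoints along the robot's trajectory serve as positive pairs, and other objects in the batch serve as negatives. The encoder architecture, loss temperature, and input shapes are illustrative assumptions, not the project's actual design.

    # Minimal sketch: multi-view contrastive embeddings with camera motion
    # as self-supervision. All architectural details are placeholders.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ViewEncoder(nn.Module):
        """Tiny CNN mapping an object crop to an L2-normalized embedding
        (stand-in for whatever backbone the detector would share)."""
        def __init__(self, dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, dim),
            )

        def forward(self, x):
            return F.normalize(self.net(x), dim=-1)

    def multiview_nce_loss(z_a, z_b, temperature=0.07):
        """InfoNCE loss: crops of the same object seen from two camera
        poses are positives; all other objects in the batch are negatives."""
        logits = z_a @ z_b.t() / temperature       # (B, B) similarity matrix
        targets = torch.arange(z_a.size(0))        # diagonal = positive pairs
        return F.cross_entropy(logits, targets)

    # Usage: view_a, view_b hold crops of the same B objects tracked across
    # frames as the camera moves; no human labels are involved.
    encoder = ViewEncoder()
    view_a = torch.randn(8, 3, 64, 64)
    view_b = torch.randn(8, 3, 64, 64)
    loss = multiview_nce_loss(encoder(view_a), encoder(view_b))
    loss.backward()

Embeddings trained this way cluster recurring object instances without manual labels, which is what would allow a detector head to be specialized to the robot's environment with zero-shot or few-shot effort, as described above.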

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2019-09-01
Budget End: 2022-08-31
Fiscal Year: 2019
Total Cost: $499,990
Name: George Mason University
City: Fairfax
State: VA
Country: United States
Zip Code: 22030