Machine learning has become an integral part of many technologies deployed in our daily lives. Traditional machine learning methods work by first collecting data and then training a fixed model for future predictions. However, much more challenging scenarios emerge as machine learning is deployed in more sophisticated applications, especially those that interact with humans or other agents, such as recommender systems, game-playing agents, and self-driving cars. One main challenge in these applications is that the learning agent often receives only limited feedback from the surrounding environment, and it is thus critical to learn effectively with such limited feedback. Most existing approaches are conservative and assume worst-case environments. This project focuses on understanding how to exploit specific structures exhibited in particular problem instances, with the goal of developing more adaptive and efficient learning algorithms with strong theoretical guarantees. The success of this project requires developing new algorithmic techniques and mathematical tools in a variety of disciplines. Education is integrated into this project through curriculum development, student mentoring, organizing workshops, and developing a partnership with the Montebello Unified School District to support the goal of building Computer Science pathways.

The project consists of three main directions: partial monitoring, bandit optimization, and reinforcement learning. Each direction generalizes the classic multi-armed bandit problem in a different dimension: partial monitoring generalizes the feedback model; bandit optimization generalizes the decision space and objective functions; and reinforcement learning generalizes from stateless to stateful models. Each direction has several main objectives: (1) for partial monitoring, the focus is on understanding how to adapt to data, environments, and models; (2) for bandit optimization, the focus is on developing adaptive algorithms for learning with linear, convex, and non-convex functions, respectively; (3) for reinforcement learning, the focus is on investigating under what conditions learning becomes easier, and how to learn in non-stationary or even adversarial environments. In addition to theoretical developments, the project also aims to implement all developed algorithms as open-source software and to evaluate them on benchmark datasets.
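To ground the classic multi-armed bandit problem that all three directions generalize, the sketch below runs the standard UCB1 algorithm on simulated Bernoulli arms. This is an illustrative baseline only, not one of the algorithms developed in this project; the arm means, horizon, and simulation setup are assumptions chosen for the example.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms.

    arm_means: true success probabilities (unknown to the learner).
    horizon: number of rounds to play.
    Returns per-arm pull counts and empirical mean rewards.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # how many times each arm was pulled
    sums = [0.0] * k    # total reward collected per arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize estimates
        else:
            # Choose the arm maximizing empirical mean + exploration bonus;
            # the bonus shrinks as an arm accumulates pulls.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts, [sums[i] / counts[i] for i in range(k)]

# The learner gradually concentrates its pulls on the best arm (mean 0.8).
counts, means = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

Partial monitoring weakens the feedback the learner sees after each pull, bandit optimization replaces the finite arm set with a structured decision space, and reinforcement learning adds state transitions on top of this stateless loop.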

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1943607
Program Officer: Rebecca Hwa
Budget Start: 2020-03-01
Budget End: 2025-02-28
Fiscal Year: 2019
Total Cost: $200,793
Name: University of Southern California
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089