Disentangling Exploration from Exploitation

Yariv, Leeat

Abstract

This group of collaborative awards funds a project in economic theory. The project will develop a new approach to analyzing how people decide between trying new experiences and continuing with long-standing choices. The new approach applies to many real-world situations that existing models do not explain well. For instance, an investor might invest in one stock while gathering information on the returns to a different investment. A policymaker might explore new alternatives while implementing existing policy. An employee might search for a new job while continuing her current employment. The model applies to situations like these, in which an individual chooses among multiple projects with unknown rewards and can invest in one to earn a reward while also observing the rewards of different projects. The new model is an alternative to a large class of models from economics and statistics that studies a different kind of experimentation problem, often referred to as the bandit problem. In a bandit problem, the decision maker can learn only through investing. She gains information only about the project that she chooses: what she learns is linked to what she earns. The result is a well-known trade-off between exploration and exploitation: the decision maker may want to invest in a low-reward project because it offers valuable learning. The existing group of bandit models has been widely applied. The new research funded by this grant could result in a deeper understanding of a similarly wide range of applications, including applications that could enhance national security and strengthen the US economy.

The research team will consider three environments that offer a particularly stark contrast with the classical experimentation framework. First, when experimentation is conducted by individuals, the optimal alternative is always eventually discovered. This contrasts with the well-known result of incomplete learning in classical experimentation with discounting. However, behavior is more complex than in standard experimentation environments, since a simple index (such as the Gittins index, which characterizes solutions in the classical framework) need not exist. Second, teams composed of similar members are no longer subject to free-rider problems: when observation and exploitation are separated, teams achieve efficient experimentation. Third, the team will analyze questions that cannot be addressed in the standard setting. Specifically, the project explores a delegation environment in which there are two agents: the executive (Doer), who chooses the project to be implemented, and the intelligence agency (Observer), who chooses which project to observe. Because the objectives of these two agents may be misaligned, delegated choices may yield either minimal or maximal experimentation, depending on the nature of contact between the Doer and the Observer. Ultimately, the project has the potential to uncover novel features of experimentation that can be applied in a variety of settings and tested against data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Type: Standard Grant (Standard)
Application #: 1949381
Program Officer: Nancy Lutz

Budget Start: 2020-03-01
Budget End: 2022-02-28
Fiscal Year: 2019
Total Cost: $199,516

Institution

Princeton University, Princeton, NJ, United States