A primary reason for studying repeated games is to understand how selfish players can coordinate their actions to achieve improvements without a collusive agreement. Unfortunately, existing game-theoretic models admit so many outcomes that it is impossible to predict whether coordination will emerge. Moreover, the standard analysis postulates a rational agent with unbounded computational capability and perfect foresight. These assumptions are critical for equilibrium models but also rather unrealistic. This project explores alternative models in which perfectly rational agents are replaced by "boundedly rational" agents who have only limited computational capabilities and cannot perfectly foresee the strategies of other players, which they must instead learn from past experience. This approach is shown to capture learning dynamics and to permit applications to a wide class of repeated and dynamic games that have a "big" player who can influence the long-run outcome of the model. Applications include international debt and optimal growth with moral hazard.

More specifically, the project examines two-person repeated games in which each player learns the opponent's strategy by the gradient method, under the assumption that the opponent plays a linear strategy. In addition, each player deliberately adds random noise, which disappears slowly, in order to experiment against the opponent's strategy. No restrictions are imposed on feasible strategies, but each player's forecast must be a linear function of past observations. This class of strategies is selected because such strategies are simple enough to be parameterized easily; each player can then learn the opponent's strategy by least squares estimation. The agent's preferences are also modified slightly, so that he selects a best response while minimizing the complexity of the decision-making process.
The result is a recursive least squares learning model in which each player updates his beliefs, as well as his repeated game strategy, as the game proceeds. The learning dynamics converge with probability 1, and in the limit both players have an identical estimator. Consequently, the behavior of the two players is highly correlated, and the limiting frequency of outcomes can be sustained by some Nash equilibrium in linear strategies. In the prisoner's dilemma, for example, the limiting frequency of outcomes must be a strict convex combination of cooperation and defection, which implies that the players must learn to cooperate with positive probability.
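The learning scheme described above can be sketched in a short simulation. The sketch below is illustrative, not the project's actual specification: the payoff matrix, the regressors (a constant plus each player's last action), the prior covariance, and the noise schedule `t ** -0.5` are all assumptions made for concreteness. Each player runs a standard recursive least squares update on the opponent's observed actions, plays a myopic best response to the resulting linear forecast, and occasionally experiments at random with a slowly vanishing probability.

```python
import numpy as np

# Prisoner's dilemma payoffs for the row player (illustrative values):
# action 0 = cooperate, 1 = defect.
PAYOFF = np.array([[3.0, 0.0],
                   [4.0, 1.0]])

rng = np.random.default_rng(0)

class RLSLearner:
    """Estimates the opponent's (assumed linear) strategy by recursive least squares."""
    def __init__(self, dim):
        self.theta = np.zeros(dim)      # coefficients of the assumed linear strategy
        self.P = np.eye(dim) * 100.0    # large prior covariance (diffuse prior)

    def forecast(self, x):
        # Predicted probability that the opponent defects, clipped to [0, 1].
        return float(np.clip(x @ self.theta, 0.0, 1.0))

    def update(self, x, y):
        # Standard RLS recursion: theta <- theta + gain * prediction error.
        Px = self.P @ x
        gain = Px / (1.0 + x @ Px)
        self.theta = self.theta + gain * (y - x @ self.theta)
        self.P = self.P - np.outer(gain, Px)

def best_response(p_defect):
    # Expected payoff of each own action against the forecast mixture over
    # the opponent's cooperate/defect probabilities.
    expected = PAYOFF @ np.array([1.0 - p_defect, p_defect])
    return int(np.argmax(expected))

# Simulate: each player's regressors are (1, own last action, opponent's last action).
learners = [RLSLearner(3), RLSLearner(3)]
last = [0, 0]
for t in range(1, 2001):
    eps = t ** -0.5                     # experimentation noise that vanishes slowly
    x = [np.array([1.0, last[i], last[1 - i]]) for i in range(2)]
    acts = []
    for i in range(2):
        a = best_response(learners[i].forecast(x[i]))
        if rng.random() < eps:          # occasional random experiment
            a = int(rng.integers(0, 2))
        acts.append(a)
    for i in range(2):
        # Each player regresses the opponent's realized action on his own regressors.
        learners[i].update(x[i], acts[1 - i])
    last = acts
```

Because the two players observe the same public history and run the same recursion, their estimators tend to move together, which is the source of the correlated limiting behavior the abstract describes.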

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Application #: 9602082
Program Officer: Daniel H. Newlon
Project Start:
Project End:
Budget Start: 1996-08-01
Budget End: 1998-11-09
Support Year:
Fiscal Year: 1996
Total Cost: $182,019
Indirect Cost:
Name: Brown University
Department:
Type:
DUNS #:
City: Providence
State: RI
Country: United States
Zip Code: 02912