A primary reason for studying repeated games is to understand how selfish players can coordinate their actions to achieve improvements without a collusive agreement. Unfortunately, existing game-theoretic models admit so many outcomes that it is impossible to predict whether coordination will emerge. Moreover, standard analysis postulates a rational agent with unbounded computational capability and perfect foresight. These assumptions are critical for equilibrium models but also rather unrealistic. This project explores alternative models in which perfectly rational agents are replaced by "boundedly rational" agents who have only limited computational capabilities and who cannot perfectly foresee the strategies of other players, which they must instead learn from past experience. This approach is shown to capture learning dynamics and to permit applications to a wide class of repeated and dynamic games that have a "big" player who can influence the long-run outcome of the model. Applications include international debt and optimal growth with moral hazard.

More specifically, the project examines two-person repeated games in which each player learns the opponent's strategy by the gradient method, under the assumption that the opponent is playing a linear strategy. In addition, each player deliberately adds random noise, which vanishes slowly over time, in order to experiment against the opponent's strategy. No restrictions are imposed on feasible strategies, but each player's forecast must be a linear function of past observations. This particular class of strategies is selected because it is simple enough to be parameterized easily, so that each player can learn the opponent's strategy by least squares estimation. Each agent's preferences are also modified slightly so that he selects a best response while minimizing the complexity of the decision-making process. The result is a recursive least squares learning model in which each player updates his beliefs, as well as his repeated-game strategy, as the game proceeds. The learning dynamics converge with probability one, and in the limit both players hold identical estimates. Consequently, the behavior of the two players is highly correlated, and the limit frequency of outcomes can be sustained by some Nash equilibrium in linear strategies. In the prisoner's dilemma, for example, the limit frequency of outcomes must be a strict convex combination of cooperation and defection, which implies that the players must learn to cooperate with positive probability.
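
To make the estimation step concrete, the following is a minimal sketch of the recursive least squares update a player might use to estimate the coefficients of the opponent's assumed linear strategy. The regressor layout (an intercept plus recent observed actions), the prior variance, and all names here are illustrative assumptions, not details taken from the project.

```python
import numpy as np

class RLSLearner:
    """Recursive least squares estimate of an opponent's linear strategy.

    Assumed model: the opponent's action y_t is approximately theta' x_t,
    where x_t stacks an intercept and recently observed actions (an
    illustrative choice of regressors, not the project's specification).
    """

    def __init__(self, dim, prior_var=100.0):
        self.theta = np.zeros(dim)         # coefficient estimate
        self.P = prior_var * np.eye(dim)   # "covariance" of the estimate

    def predict(self, x):
        """Forecast the opponent's next action from regressors x."""
        return float(self.theta @ x)

    def update(self, x, y):
        """Standard RLS recursion: gain, coefficient step, covariance step."""
        Px = self.P @ x
        gain = Px / (1.0 + x @ Px)
        self.theta += gain * (y - self.theta @ x)  # correct by prediction error
        self.P -= np.outer(gain, Px)
```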
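
To show how the pieces fit together, here is a hedged end-to-end sketch, reusing the `RLSLearner` class from the previous block: two learners forecast each other's cooperation probability in a repeated prisoner's dilemma, choose a smoothed (logit) response as a stand-in for the complexity-penalized best response described above, and add experimentation noise whose standard deviation decays like 1/sqrt(t). The payoff values, the logit smoothing, the regressor window, and the decay rate are all assumptions made for illustration; the sketch is not claimed to reproduce the project's convergence results.

```python
import numpy as np

rng = np.random.default_rng(0)
T, R, P, S = 5.0, 3.0, 1.0, 0.0   # conventional PD payoffs (assumed values)
LAGS = 2                           # regressors: intercept + last LAGS actions of each player
dim = 1 + 2 * LAGS

learners = [RLSLearner(dim), RLSLearner(dim)]
actions = [0.5, 0.5]                       # cooperation probabilities
hist = [[0.5] * LAGS, [0.5] * LAGS]        # recent actions of each player

def regressors(i):
    """Intercept plus recent actions of both players, from player i's view."""
    return np.array([1.0] + hist[i] + hist[1 - i])

def smoothed_response(b, beta=1.0):
    """Logit response to a forecast cooperation probability b. This smoothing
    is an illustrative stand-in for the complexity-penalized best response;
    with a hard best response, defection is dominant in the one-shot PD."""
    gain_from_C = (R * b + S * (1 - b)) - (T * b + P * (1 - b))
    return 1.0 / (1.0 + np.exp(-beta * gain_from_C))

for t in range(1, 5001):
    x = [regressors(0), regressors(1)]
    forecasts = [np.clip(learners[i].predict(x[i]), 0.0, 1.0) for i in range(2)]
    noise_sd = 1.0 / np.sqrt(t)            # slowly vanishing experimentation
    actions = [float(np.clip(smoothed_response(forecasts[i])
                             + noise_sd * rng.normal(), 0.0, 1.0))
               for i in range(2)]
    for i in range(2):
        learners[i].update(x[i], actions[1 - i])   # observe opponent's action
        hist[i] = hist[i][1:] + [actions[i]]
```

One can then track the empirical action frequencies over the run and check whether they settle at an interior mixture of cooperation and defection, echoing the convex-combination property described above.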