The intensity or vigor of goal-directed behavior is a correlate of the motivation underlying it; motivation can therefore be inferred by monitoring the performance of goal-directed behavior. To study motivated behavior in monkeys, we use a task in which monkeys must perform some work, in this case detecting when a target spot turns from red to green, to obtain a drop of juice. In one set of tasks, a visual stimulus, a cue, indicates how much discomfort must be endured, e.g., the number of trials to be worked, to obtain the reward. The monkeys learn about the cues quickly, often after just a few trials.

The number of errors in detecting the red-to-green change becomes proportional to the amount of work remaining before reward: the monkeys work faster and with fewer errors when a visual cue indicates that the reward will be delivered immediately after the next correct response than when the cue indicates that additional red-to-green detections will be needed. That is, the monkeys decrease the accuracy of their performance in response to an increased predicted workload. This achieves our goal of manipulating motivation.

Using a reward postponement version of this task, we examined the relationship among two external factors, reward size and delay, and one internal factor, thirst, which we modeled as accumulated reward/total reward, and their effect on motivation. Reward size and delay were manipulated independently and signaled by the incentive cue. A simple mathematical relation describes the effects of reward size, delay, and satiation level on motivation: errors = (1 + kD) / (aR F(S)), where k is a constant, D is the delay, a is another constant, R is reward size (in drops), and F(S) is a sigmoidal function of satiation, which is 1 - thirst. These results are consistent with temporal discounting of reward.
In this form there is no interactive effect between satiation level and incentive value on motivation. Thirst simply enhances the incentive value, and reward is discounted as a hyperbolic function of delay duration, as shown before in choice tasks. This model gives us an extended view of how incentive and motivational values are calculated and represented in the brain.

It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically lower in trials equally distant from reward but belonging to longer schedules (referred to as the schedule length effect). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions do not differ from those of the standard temporal difference model in choice tasks. In our modification of temporal difference learning, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, in which the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the applications of our model to the reinforcement schedule task and to the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior.
As examples, we examined two phenomena observed in humans that often derive from the violation of the principle of invariance: framing, wherein equivalent options are treated differently depending on the context in which they are presented, and the sunk cost effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys.

The orbitofrontal cortex (OFC) is involved in assessing stimulus and outcome value. We compared single-neuron responses related to outcome value in medial OFC (areas 14/25/32) with those in lateral OFC (areas 11/13) while the monkey performed a task in which reward can be obtained through an operant response (operant trials) or after a delay (passive trials). Each trial begins with a visual cue that informs the monkey about both the trial type, operant or passive, and the amount of reward to be delivered at the end of the trial. In the operant trials the monkeys had to release a bar when a red fixation point turned green to obtain the reward. After correct bar release the green spot turned blue. In the passive trials, after the cue appeared, nothing else happened until the blue point appeared on the screen just before reward delivery. In operant trials, reaction times and error rates were higher in low-reward trials, showing that the monkeys were sensitive to the cues. We also measured an instinctive response to cues, lipping. Cue-evoked lipping was linearly proportional to the reward size in both operant and passive trials, indicating that cues were associated with predicted outcome. We recorded 81 neurons in medial OFC and 37 neurons in lateral OFC. In medial OFC, neurons displayed a significant effect of reward size at the onset of the cues (n=13 cells).
This influence of reward size on neuronal firing decreased at the time the blue spot appeared (n=9), when a significant effect of trial type (n=15 neurons) appeared. In lateral OFC, 18 neurons displayed a small but significant effect of reward size that was maintained throughout the trial. In lateral OFC, neurons were more sensitive to reward size than to trial type throughout the trials when comparing the variance accounted for by these two variables (F(1)=8.4, p=0.004). In medial OFC, the variance accounted for by reward size and trial type was also different (F(1)=18, p=2.5 × 10^-5), and there was a significant interaction between these two variables over time within the trial (F(4,760)=4.2, p=0.003): neurons shifted their relative sensitivity to reward size vs. contingency between the time of cue onset and the appearance of the blue point. Overall, the lateral neurons were more sensitive to the outcome value (reward size) than to the trial type (passive or operant). Medial neurons became more sensitive to trial type than to reward size after the feedback signal (blue spot) appeared but before the reward was delivered.
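
The error-rate relation stated earlier in this abstract, errors = (1 + kD) / (aR F(S)), can be illustrated with a minimal Python sketch. The abstract says only that F(S) is sigmoidal in satiation, so the logistic form used here, its midpoint and slope (`s0`, `sigma`), and the values chosen for `k` and `a` are illustrative assumptions rather than the fitted model. F(S) is taken to decrease with satiation, which is one reading of the statement that thirst enhances incentive value.

```python
import math


def satiation_factor(s, s0=0.5, sigma=0.1):
    """Hypothetical sigmoidal F(S): close to 1 when thirsty (S near 0),
    falling toward 0 as satiation S approaches 1. The logistic shape and
    the parameters s0 and sigma are assumptions made for illustration."""
    return 1.0 / (1.0 + math.exp((s - s0) / sigma))


def error_rate(delay, reward, satiation, k=0.2, a=5.0):
    """errors = (1 + k*D) / (a*R * F(S)).

    delay     -- D, delay to reward
    reward    -- R, reward size in drops
    satiation -- S, between 0 and 1 (S = 1 - thirst)
    k, a      -- constants; the defaults are placeholders, not fitted values
    """
    return (1.0 + k * delay) / (a * reward * satiation_factor(satiation))


if __name__ == "__main__":
    # Tabulate the predicted error rate for a few cue conditions.
    for satiation in (0.1, 0.5, 0.9):
        for reward in (1, 2, 4):
            for delay in (0.0, 3.0, 6.0):
                rate = error_rate(delay, reward, satiation)
                print(f"S={satiation:.1f} R={reward} D={delay:.0f} "
                      f"errors={rate:.3f}")
```

Because the satiation term enters only as a multiplicative factor, the ratio of error rates between two reward sizes at a fixed delay is the same at every satiation level, which is what the absence of an interactive effect between satiation and incentive value amounts to in this form.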

Agency: National Institutes of Health (NIH)
Institute: National Institute of Mental Health (NIMH)
Type: Intramural Research (Z01)
Project #: 1Z01MH002619-17
Application #: 7735123
Support Year: 17
Fiscal Year: 2008
Total Cost: $1,420,813
Name: U.S. National Institute of Mental Health
Country: United States
Bouret, Sebastien; Richmond, Barry J (2009) Relation of locus coeruleus neurons in monkeys to Pavlovian and operant behaviors. J Neurophysiol 101:898-911
Minamimoto, Takafumi; La Camera, Giancarlo; Richmond, Barry J (2009) Measuring and modeling the interaction among reward size, delay to reward, and satiation level on motivation in monkeys. J Neurophysiol 101:437-47
Simmons, Janine M; Saad, Ziad S; Lizak, Martin J et al. (2008) Mapping prefrontal circuits in vivo with manganese-enhanced magnetic resonance imaging in monkeys. J Neurosci 28:7637-47
Simmons, Janine M; Richmond, Barry J (2008) Dynamic changes in representations of preceding and upcoming reward in monkey orbitofrontal cortex. Cereb Cortex 18:93-103
La Camera, Giancarlo; Richmond, Barry J (2008) Modeling the violation of reward maximization and invariance in reinforcement schedules. PLoS Comput Biol 4:e1000131
Simmons, Janine M; Ravel, Sabrina; Shidara, Munetaka et al. (2007) A comparison of reward-contingent neuronal activity in monkey orbitofrontal cortex and ventral striatum: guiding actions toward rewards. Ann N Y Acad Sci 1121:376-94
Sugase-Miyamoto, Yasuko; Richmond, Barry J (2007) Cue and reward signals carried by monkey entorhinal cortex neurons during reward schedules. Exp Brain Res 181:267-76
Lerchner, Alexander; La Camera, Giancarlo; Richmond, Barry (2007) Knowing without doing. Nat Neurosci 10:15-7
Mizuhiki, Takashi; Richmond, Barry J; Shidara, Munetaka (2007) Mode changes in activity of single neurons in anterior insular cortex across trials during multi-trial reward schedules. Neurosci Res 57:587-91
Nakahara, Hiroyuki; Amari, Shun-ichi; Richmond, Barry J (2006) A comparison of descriptive models of a single spike train by information-geometric measure. Neural Comput 18:545-68
