1. BACKGROUND AND SIGNIFICANCE

Learning from feedback in the real world is limited by constant fluctuations in the reward outcomes associated with choosing certain options or actions. Some of these fluctuations are caused by fundamental changes in the reward values of those options or actions and necessitate dramatic adjustments to current learning strategies, as in epiphany learning or one-shot learning [Chen & Krajbich, 2017; Lee et al., 2015]. Other fluctuations reflect inherent stochasticity in an otherwise stable environment and should be tolerated and ignored in order to maintain stable choice preferences. In other words, learning in dynamic environments is bounded by a tradeoff between being adaptable (i.e., responding quickly to changes in the environment) and being precise (i.e., updating slowly after each feedback to be more accurate), which we refer to as the adaptability-precision tradeoff [Farashahi et al., 2017; Khorsand & Soltani, 2017]. Therefore, distinguishing meaningful changes in the environment from natural fluctuations can greatly enhance adaptive learning, suggesting that adaptive learning depends on interactions between multiple brain areas.

To date, most computational models of learning under uncertainty are very high-level and/or descriptive [Behrens et al., 2007; Costa et al., 2015; Iigaya, 2016; Jang et al., 2015; Nassar et al., 2010; Payzan-LeNestour & Bossaerts, 2011] and therefore do not provide specific testable predictions. On the other hand, neural mechanisms of uncertainty monitoring for adaptive learning have been predominantly investigated in humans, and in a few cases monkeys, both of which are limited in terms of circuit-level manipulations. However, interactions between brain areas unfold on short timescales and can be specific to certain cell types.
These properties have severely limited the ability of functional MRI [Logothetis, 2003] or MEG [Dale et al., 2000; Mostert et al., 2015] to reveal the microcircuit mechanisms within brain regions and the fine-grained interactions between brain regions.

To overcome these limitations and reveal the neural mechanisms underlying adaptive learning under uncertainty, we propose a combination of detailed computational modeling, imaging of stable neuronal ensembles, and precise system-level manipulation of interactions between multiple brain areas in rodents. The latter is possible in part due to powerful circuit-dissection techniques in rodents that allow manipulation of genetically tractable cell types and thus of specific projections between brain regions. Combined with decoding of neuronal activity in cortex and guided by mechanistic computational modeling, this approach enables us to investigate both microcircuit and system-level mechanisms of adaptive learning under uncertainty.

We have recently proposed a mechanistic model for adaptive learning under uncertainty [Farashahi et al., 2017]. This model, which we refer to as the reward-dependent metaplasticity (RDMP) model, provides a synaptic mechanism for how learning can be self-adjusted to reward statistics in the environment. The model predicts that as more time is spent in a given environment with a certain reward schedule, the organism should become less sensitive to feedback that does not support what has been learned. This and other predictions of the model were confirmed using a large set of behavioral data from monkeys performing a probabilistic reversal learning task [Farashahi et al., 2017]. Although the proposed metaplasticity mechanism makes the model more robust against random fluctuations, it also prevents the model from responding quickly to actual changes in the environment. This limitation can be partially mitigated by allowing synapses to become unstable in response to changes in the environment [Iigaya, 2016].
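The adaptability-precision tradeoff described above can be illustrated with a minimal sketch. It is a toy delta-rule learner with a fixed learning rate, not the metaplasticity model of Farashahi et al. [2017]; the function names, the reward probabilities (0.8 before and 0.2 after a reversal), the session length, and the two learning rates are all assumptions chosen for illustration.

```python
import random


def simulate_delta_rule(alpha, p_before=0.8, p_after=0.2,
                        n_trials=1000, reversal=500, seed=0):
    """Track a reward probability with a fixed learning rate (toy model)."""
    rng = random.Random(seed)  # same seed gives the same reward sequence
    value = 0.5  # initial estimate of the reward probability
    estimates = []
    for trial in range(n_trials):
        p_reward = p_before if trial < reversal else p_after
        reward = 1.0 if rng.random() < p_reward else 0.0
        value += alpha * (reward - value)  # delta-rule update
        estimates.append(value)
    return estimates


def rms_error(estimates, target, start, stop):
    """Root-mean-square deviation of the estimate from the true probability."""
    window = estimates[start:stop]
    return (sum((v - target) ** 2 for v in window) / len(window)) ** 0.5


def trials_to_adapt(estimates, reversal, threshold=0.5):
    """Trials after the reversal until the estimate first drops below threshold."""
    for lag, v in enumerate(estimates[reversal:]):
        if v < threshold:
            return lag
    return None  # never adapted within the simulated session


def summarize(alpha):
    """Return (pre-reversal RMS error, trials to adapt) for one learning rate."""
    est = simulate_delta_rule(alpha)
    return rms_error(est, 0.8, 300, 500), trials_to_adapt(est, 500)


if __name__ == "__main__":
    for alpha in (0.3, 0.02):  # assumed "fast" and "slow" learning rates
        err, lag = summarize(alpha)
        print(f"alpha={alpha}: pre-reversal RMS error={err:.3f}, "
              f"trials to adapt={lag}")
```

In this sketch, the larger learning rate should cross the threshold sooner after the reversal but fluctuate more around the true reward probability during the stable period, whereas the smaller learning rate should show the opposite pattern. No single fixed learning rate achieves both, which is the gap that a self-adjusting mechanism is meant to address.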
Interestingly, in our model, changes in the activity of neurons that encode reward values can be used by another system to compute volatility in the environment. This volatility signal can subsequently be used to increase the speed of learning when volatility is high, that is, when there is a higher chance of real changes in the environment. We hypothesize that such interactions between value-encoding and uncertainty-monitoring systems can enhance the adaptability required in dynamic environments.

In addition to this modeling study, we have recently shown that the basolateral amygdala (BLA) and orbitofrontal cortex (OFC) play complementary roles in adaptive value learning under uncertainty in rodents [Stolyarova & Izquierdo, 2017]. In this experiment, rats chose between visual stimuli and learned the variance in delays to food reward associated with each stimulus. We found that OFC is necessary to accurately learn such stimulus-outcome associations (in terms of

Agency: National Institutes of Health (NIH)
Institute: National Institute on Drug Abuse (NIDA)
Type: Research Project (R01)
Project #: 5R01DA047870-03
Application #: 9982289
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Pariyadath, Vani
Project Start: 2018-09-15
Project End: 2023-07-31
Budget Start: 2020-08-01
Budget End: 2021-07-31
Support Year: 3
Fiscal Year: 2020
Total Cost:
Indirect Cost:
Name: Dartmouth College
Department: Psychology
Type: Schools of Arts and Sciences
DUNS #: 041027822
City: Hanover
State: NH
Country: United States
Zip Code: 03755