Individual tendencies in the Stroop test predict use of model-based learning

This post is based on the paper "Cognitive Control Predicts Use of Model-Based Reinforcement-Learning," authored by A. Ross Otto, Anya Skatova, Seth Madlon-Kay, and Nathaniel D. Daw, Journal of Cognitive Neuroscience, 2015 February; 27(2): 319–333. doi:10.1162/jocn_a_00709. The paper is difficult to understand, but covers some interesting subject matter. Andy Clark alerted me to these authors in his book Surfing Uncertainty.
This paper begins from the familiar observation that dual-process theories of decision making abound, and that a recurring theme is that the two systems rely differentially upon automatic or habitual versus deliberative or goal-directed modes of processing. According to Otto et al., a popular refinement of this idea proposes that the two modes of choice arise from distinct strategies for learning the values of different actions, operating in parallel. In this theory, habitual choices are produced by model-free reinforcement learning (RL), which learns which actions tend to be followed by rewards. In contrast, goal-directed choice is formalized by model-based RL, which reasons prospectively about the value of candidate actions using knowledge (a learned internal "model") of the environment's structure and the organism's current goals.

Whereas model-free choice merely requires retrieving the (directly learned) values of previous actions, model-based valuation requires a sort of mental simulation, carried out at decision time, of the likely consequences of candidate actions, using the learned internal model. Under this framework, at any given moment both the model-based and model-free systems can provide action values to guide choices, inviting a critical question: how does the brain determine which system's preferences ultimately control behavior?
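To make the contrast concrete, here is a minimal sketch (not the authors' code; all names, probabilities, and rewards are invented for illustration) of the two valuation strategies in a toy two-stage task: the model-free system caches action values learned directly from reward, while the model-based system computes values at decision time from a learned transition model.

```python
ALPHA = 0.5  # learning rate for the model-free (habitual) system

# Model-free system: cached action values, updated directly from reward feedback.
q_mf = {"left": 0.0, "right": 0.0}

def model_free_update(action, reward):
    """Habitual learning: nudge the cached value toward the observed reward."""
    q_mf[action] += ALPHA * (reward - q_mf[action])

# Model-based system: an internal model of the environment's structure.
# transitions[action] maps each first-stage action to second-stage states and
# their probabilities; state_values holds the learned values of those states.
transitions = {
    "left":  {"stateA": 0.7, "stateB": 0.3},
    "right": {"stateA": 0.3, "stateB": 0.7},
}
state_values = {"stateA": 1.0, "stateB": 0.0}

def model_based_value(action):
    """Goal-directed valuation: simulate likely outcomes at decision time."""
    return sum(p * state_values[s] for s, p in transitions[action].items())

# The model-free system only learns from direct experience:
model_free_update("left", reward=1.0)  # cached value moves from 0.0 toward 1.0

# The model-based system can evaluate even an action it has never taken,
# by combining the transition model with the current state values:
mb_right = model_based_value("right")  # 0.3 * 1.0 + 0.7 * 0.0 = 0.3
```

Note how the model-based value would change instantly if `state_values` changed (say, because the organism's goals changed), whereas the cached model-free value would only update after further direct experience, which is exactly the flexibility/efficiency trade-off the dual-system framework turns on.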
