This post is based on a draft dated July 10, 2015, “Learning in Dynamic Probabilistic Environments: A Parallel-constraint Satisfaction Network-model Approach,” written by Marc Jekel, Andreas Glöckner, & Arndt Bröder. The paper includes experiments that contrast Parallel Constraint Satisfaction with the Adaptive Toolbox Approach. I have chosen to look only at the update of the PCS model with learning. The authors develop an integrative model for decision making and learning by extending previous work on parallel constraint satisfaction networks with backward error-propagation learning algorithms. The Parallel Constraint Satisfaction Theory for Decision Making and Learning (PCS-DM-L) conceptualizes decision making as a process of coherence structuring in which learning is achieved by adjusting network weights from one decision to the next. PCS-DM-L predicts that individuals adapt to the environment by gradual changes in cue weighting.
This post is based on a paper: “Learning from experience in nonlinear environments: Evidence from a competition scenario,” authored by Emre Soyer and Robin M. Hogarth, Cognitive Psychology 81 (2015) 48-73. It is not a new topic, but it adds to the evidence on our shortcomings in nonlinear environments.
In 1980, Brehmer questioned whether people can learn from experience – more specifically, whether they can learn to make appropriate inferential judgments in probabilistic environments outside the psychological laboratory. His assessment was quite pessimistic. Other scholars have also highlighted difficulties in learning from experience. Klayman, for example, pointed out that in naturally occurring environments, feedback can be scarce, subject to distortion, and biased by lack of appropriate comparative data. Hogarth asked when experience-based judgments are accurate and introduced the concepts of kind and wicked learning environments (see post Learning, Feedback, and Intuition). In kind learning environments, people receive plentiful, accurate feedback on their judgments; but in wicked learning environments they don’t. Thus, Hogarth argued, a kind learning environment is a necessary condition for learning from experience whereas wicked learning environments lead to error. This paper explores the boundary conditions of learning to make inferential judgments from experience in kind environments. Such learning depends on both identifying relevant information and aggregating information appropriately. Moreover, for many tasks in the naturally occurring environment, people have prior beliefs about cues and how they should be aggregated.
This post is based on the paper: “Priors in perception: Top-down modulation, Bayesian perceptual learning rate, and prediction error minimization,” authored by Jakob Hohwy (see post Explaining Away), which appeared (or is scheduled to appear) in Consciousness and Cognition, 2017. Hohwy writes in an understandable manner and is open enough to post papers even before they are complete, of which this is an example. Hohwy pursues the idea of cognitive penetration – the notion that beliefs can determine perception.
Can ‘high level’ or ‘cognitive’ beliefs modulate perception? Hohwy methodically examines this question by trying to create the conditions under which it might work and not be trivial. Under standard Bayesian inference, the learning rate declines gradually as evidence is accumulated, and the prior is updated to be ever more accurate. The more you already know, the less you will learn from the world. In a changing world this is not optimal, since when things in the environment change we should vary the learning rate. Hohwy provides this example. As the ambient light conditions improve, the learning rate for detecting a visible target should increase (since the samples, and therefore the prediction errors, have better precision in better light). This means Bayesian perceptual inference needs a tool for regulating the learning rate. The inferential system should build expectations for the variability in lighting conditions throughout the day, so that the learning rate in visual detection tasks can be regulated up and down accordingly.
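A minimal sketch of this point, using a Kalman-filter update as a stand-in for Bayesian perceptual learning (all numbers are illustrative assumptions, not from Hohwy's paper): when no change is expected, the gain on each new prediction error decays toward zero as evidence accumulates; building in an expectation of volatility keeps the learning rate elevated.

```python
# Sketch: why the Bayesian learning rate declines as evidence accumulates,
# and how an expectation of environmental change keeps it from vanishing.
def kalman_gains(n_steps, obs_noise_r, process_noise_q, prior_var=10.0):
    """Return the sequence of Kalman gains (learning rates) over n_steps."""
    var = prior_var
    gains = []
    for _ in range(n_steps):
        var += process_noise_q            # the world may have drifted
        gain = var / (var + obs_noise_r)  # weight given to the prediction error
        gains.append(gain)
        var = (1.0 - gain) * var          # posterior variance shrinks
    return gains

static = kalman_gains(50, obs_noise_r=1.0, process_noise_q=0.0)
volatile = kalman_gains(50, obs_noise_r=1.0, process_noise_q=0.5)
print(round(static[-1], 3))    # gain has decayed toward zero in a static world
print(round(volatile[-1], 3))  # gain stays elevated when change is expected
```

The expected-volatility parameter (`process_noise_q`) is the knob that a hierarchy of expectations, in Hohwy's account, would tune up and down.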
The human brain is thus hypothesized to build up a vast hierarchy of expectations that overall help regulate the learning rate and thereby optimize perceptual inference for a world that delivers changeable sensory input. Hohwy suggests that this makes the brain a hierarchical filter that takes the non-linear time series of sensory input and seeks to filter out regularities at different time scales. Treating the distributions in question as normal (Gaussian), the brain is modeled as a hierarchical Gaussian filter (HGF).
This post is based on a paper: “Intuition and analytic processes in probabilistic reasoning: The role of time pressure,” authored by Sarah Furlan, Franca Agnoli, and Valerie F. Reyna. Valerie Reyna is, of course, the primary creator of fuzzy-trace theory. Reyna’s papers tend to do a good job of summing up the state of the decision making art and fitting in her ideas.
The authors note that although there are many points of disagreement, theorists generally agree that there are heuristic processes (Type 1) that are fast, automatic, unconscious, and require low effort. Many adult judgment biases are considered a consequence of these fast heuristic responses, also called default responses, because they are the first responses that come to mind. Type 1 processes are a central feature of intuitive thinking, requiring little cognitive effort or control. In contrast, analytic (Type 2) processes are considered slow, conscious, deliberate, and effortful, and they place demands on central working memory resources. Furlan, Agnoli, and Reyna assert that Type 2 processes are thought to be related to individual differences in cognitive capacity and Type 1 processes are thought to be independent of cognitive ability, a position challenged by the research presented in their paper. I was surprised that typical dual-process theories are set up with intuitive abilities taken as unrelated to overall intelligence and cognitive abilities.
This post is from Judgment and Decision Making, Vol. 11, No. 6, November 2016, pp. 601–610, and is based on the paper: “The irrational hungry judge effect revisited: Simulations reveal that the magnitude of the effect is overestimated,” written by Andreas Glöckner. Danziger, Levav and Avnaim-Pesso analyzed legal rulings of Israeli parole boards concerning the effect of serial order in which cases are presented within ruling sessions. DLA analyzed 1,112 legal rulings of Israeli parole boards that cover about 40% of the parole requests of the country. They assessed the effect of the serial order in which cases are presented within a ruling session and took advantage of the fact that the ruling boards work on the cases in three sessions per day, separated by a late morning snack and a lunch break. They found that the probability of a favorable decision drops from about 65% to 5% from the first ruling to the last ruling within each session. This is equivalent to an odds ratio of 35. The authors argue that these findings provide support for extraneous factors influencing judicial decisions and speculate that the effect might be driven by mental depletion. Glöckner notes that the article has attracted attention and the supposed order effect is widely cited in psychology.
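The quoted odds ratio follows directly from the two probabilities:

```python
# Arithmetic behind the "odds ratio of 35": a drop from a 65% to a 5%
# probability of a favorable ruling within a session.
p_first, p_last = 0.65, 0.05
odds_first = p_first / (1 - p_first)  # 0.65 / 0.35
odds_last = p_last / (1 - p_last)     # 0.05 / 0.95
odds_ratio = odds_first / odds_last
print(round(odds_ratio, 1))  # 35.3
```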
David Brooks seems to be a fascination of mine. The New York Times columnist surprises me both in positive and negative ways. I only mention it when the surprise is negative. Below is an excerpt from his November 25, 2016, column.
And this is my problem with the cognitive sciences and the advice world generally. It’s built on the premise that we are chess masters who make decisions, for good or ill. But when it comes to the really major things we mostly follow our noses. What seems interesting, beautiful, curious and addicting?
Have you ever known anybody to turn away from anything they found compulsively engaging?
We don’t decide about life; we’re captured by life. In the major spheres, decision-making, when it happens at all, is downstream from curiosity and engagement. If we really want to understand and shape behavior, maybe we should look less at decision-making and more at curiosity. Why are you interested in the things you are interested in? Why are some people zealously seized, manically attentive and compulsively engaged?
Now that we know a bit more about decision-making, maybe the next frontier is desire. Maybe the next Kahneman and Tversky will help us understand what explains, fires and orders our loves.
I can imagine his frustration with the advice world and maybe with Kahneman and Tversky (see post Prospect Theory), but it appears that Brooks is only looking at the advice world. Brooks would benefit by looking at the work of Ken Hammond. The post Cognitive Continuum examines some of Hammond’s 1980 work. Hammond has those chess masters to whom Brooks refers as one extreme of the cognitive continuum. The post Intuition in J-DM looks at the work of Tilmann Betsch and Andreas Glöckner in what is called Parallel Constraint Satisfaction theory.
Betsch and Glöckner believe that information integration and output formation (choice, preference) are intuitive. Analysis involves directed search (looking for valid cues or asking an expert for advice), making sense of information, anticipating future events, etc. Thus, they see a judgment as a collaboration of intuition and analysis. The depth of analysis varies, but intuition is always working, so preferences are formed even without intention. Limiting processing time and capacity constrains only input. Thus, once information is in the system, intuition will use that information irrespective of amount and capacity.
Curiosity might be considered the degree of dissonance we encounter in our automatic decision making that in effect tells us to analyze–find more information and examine it. We do mostly follow our noses, because it is adaptive. But it is also adaptive to be able to recognize change that is persistent and must be responded to. A parameter of the parallel constraint satisfaction model is the individual sensitivity to differences between cue validities. This implies that individuals respond differently to changing cue validities. Some change quickly when they perceive differences and others change at a glacial pace.
The post Rationality Defined Again: RUN & JUMP looks at the work of Tilmann Betsch and Carsten Held. Brooks in his opinion piece seems to be suggesting that analytic processing is pretty worthless. Betsch and Held have seen this before. They note that research on non-analytic processing has led some authors to conclude that intuition is superior to analysis, or at least to promote it as such, the obvious example being Malcolm Gladwell in Blink. Such a notion, however, neglects the important role of decision context. The advantages and disadvantages of the different types of thought depend on the nature of the task. Moreover, the plea for a general superiority of intuition neglects the fact that analysis is capable of things that intuition is not. Consider, for example, the case of routine maintenance and deviation decisions. Routine decisions will lead to good results if prior experiences are representative of the task at hand. In a changing world, however, routines can become obsolete.
In the absence of analytic thought, adapting to changing contexts requires slow, repetitive learning. Upon encountering repeated failure, the individual’s behavioral tendencies will change. The virtue of deliberate analysis, Brooks’ chess mastering, lies in its power to quickly adapt to new situations without necessitating slow reinforcement learning. Whereas intuition is fast and holistic due to parallel processing, it is a slave to the pre-formed structure of knowledge as well as the representation of the decision problem. The relations among goals, situations, options and outcomes that result from prior knowledge provide the structural constraints under which intuitive processes operate. They can work very efficiently but, nevertheless, cannot change these constraints. The potential of analytic thought dwells in the power to change the structure of the representation of a decision problem.
I believe that Brooks realizes that analytic thought is one thing that distinguishes us from other creatures even though it does not seem to inform much of our decision making. The post Embodied (Grounded) Prediction (Cognition) might also open a window for Brooks.
This post is based on a paper written by Fabienne Picard and Karl Friston, entitled: “Predictions, perceptions, and a sense of self,” that appeared in Neurology® 2014;83:1112–1118. Karl Friston is one of the prime authors of predictive processing and Fabienne Picard is a doctor known for studying epilepsy. The ideas here are not new or even new to this blog, but the paper and specifically the figure below provide a good summary of the ideas of predictive processing. Andy Clark’s Surfing Uncertainty is the place to go if the subject interests you.
This post examines the paper: “Are There Levels of Consciousness?” written by Tim Bayne, Jakob Hohwy, and Adrian M. Owen, that appeared in Trends in Cognitive Sciences, June 2016, Vol. 20, No. 6. The paper is described as opinion and for me bridges ideas of predictive processing with some of the ideas of Stanislas Dehaene. Jakob Hohwy is an important describer of predictive processing. The paper argues that the levels-based or continuum-based framework for conceptualizing global states of consciousness is untenable and develops in its place a multidimensional account of global states.
Consciousness is typically taken to have two aspects: local states and global states. Local states of consciousness include perceptual experiences of various kinds, imagery experiences, bodily sensations, affective experiences, and occurrent thoughts. In the science of consciousness local states are usually referred to as ‘conscious contents’. By contrast, global states of consciousness are not typically distinguished from each other on the basis of the objects or features that are represented in experience. Instead, they are typically distinguished from each other on cognitive, behavioral, and physiological grounds. For example, the global state associated with alert wakefulness is distinguished from the global states that are associated with post-comatose conditions.
The authors suggest that to describe global states as levels of consciousness is to imply that consciousness comes in degrees, and that changes in a creature’s global state of consciousness can be represented as changes along a single dimension of analysis. Bayne, Hohwy, and Owen see two problems with this. One person can be conscious of more objects and properties than another person, but to be conscious of more is not to be more conscious. A sighted person might be conscious of more than someone who is blind, but they are not more conscious than the blind person is. The second problem that they see with the level-based analysis of global states is that there is good reason to doubt whether all global states can be assigned a determinate ordering relative to each other. The authors provide the example of the relationship between the global conscious state associated with rapid eye movement (REM) sleep and that which is associated with light levels of sedation. They do not believe that one of these states must be absolutely ‘higher’ than the other. Perhaps states can be compared with each other only relative to certain dimensions of analysis: the global state associated with REM sleep might be higher than that associated with sedation on some dimensions of analysis, whereas the opposite might be the case on other dimensions of analysis (Figure 1A).
The authors recognize two clear dimensions, but suggest there are likely several more. The first is gating. In some global states the contents of consciousness appear to be gated in various ways, with the result that individuals are able to experience only a restricted range of contents. Patients in a minimally conscious state (MCS), patients undergoing absence seizures, and mildly sedated individuals can consciously represent the low-level features of objects, but they are typically unable to represent the categories to which perceptual objects belong. Thus, the gating of conscious contents is likely to provide one dimension along which certain global states can be hierarchically organized. The second dimension is availability, often captured by saying that the contents of consciousness are globally available for the control of thought and action. There is good reason to think that this availability is compromised in a number of pathologies of consciousness. For example, patients undergoing absence seizures can engage in perceptually driven motor responses even though their capacities for reasoning, executive processing, and memory consolidation are typically limited. With respect to this dimension, the global state of patients who have emerged from the minimally conscious state (EMCS) is ‘higher’ than that which is associated with the MCS, for EMCS patients have access to a wider range of cognitive and behavioral consuming systems than MCS patients do.
Beyond the dimensions of gating of contents and the availability associated with consciousness, the authors suggest there might be a role for attention in structuring global states. There is also the question of possible interactions between some of the dimensions that structure consciousness. Although some dimensions may be completely independent of each other, others are likely to modulate each other. For example, there might be interactions between the gating of contents and functionality such that consciousness cannot be high on the gating dimension but low on certain dimensions of functionality (Figure 1C).
This idea that global states of consciousness are best understood as regions in a multidimensional space seems to me a natural progression as we learn more about consciousness and its underpinnings. An example is the time when you are completely immersed in some task and you don’t notice time passing or who walked by. Your attention is completely focused and gated so that you are missing other things. It is not a higher level of consciousness, but a different state of consciousness. The spotlight is focused on a smaller area. The light itself is not any brighter. At the same time, the argument that Bayne, Hohwy and Owen are making seems to be focused on very limited consciousness. Most of us just see a sleeping person as unconscious, without an active global neuronal workspace. We do not see a person as conscious until some threshold or phase change occurs so that the light is brighter and the availability is greater. There must be some level of error coming back from our predictions. Several previous posts, including Consciousness: Confessions of a Romantic Reductionist, The Global Neuronal Workspace, and Dehaene: Consciousness and Decision Making, have looked at consciousness. This paper did not address the consciousness of other animals. It also did not address intuition, which is often considered unconscious in some ways since it is typically effortless as we perceive it. Global availability seems important to the idea. Of course, as you develop expertise, global availability is not so necessary for certain subjects. Auto-pilot can handle normal situations once you have expertise, so maybe we all have different conscious realms since we have different expertise.
Frankly, I doubt that many would argue that consciousness has only a single dimension. Dehaene may ignore multiple dimensions, but I would suggest that he does this to make the idea more understandable to laymen.
This post is based on the paper: “Cognitive Control Predicts Use of Model-Based Reinforcement-Learning,” authored by A. Ross Otto, Anya Skatova, Seth Madlon-Kay, and Nathaniel D. Daw, Journal of Cognitive Neuroscience. 2015 February ; 27(2): 319–333. doi:10.1162/jocn_a_00709. The paper is difficult to understand, but covers some interesting subject matter. Andy Clark alerted me to these authors in his book Surfing Uncertainty.
This paper makes the obvious assertion that dual process theories of decision making abound, and that a recurring theme is that the systems rely differentially upon automatic or habitual versus deliberative or goal-directed modes of processing. According to Otto et al., a popular refinement of this idea proposes that the two modes of choice arise from distinct strategies for learning the values of different actions, which operate in parallel. In this theory, habitual choices are produced by model-free reinforcement learning (RL), which learns which actions tend to be followed by rewards. In contrast, goal-directed choice is formalized by model-based RL, which reasons prospectively about the value of candidate actions using knowledge (a learned internal “model”) about the environment’s structure and the organism’s current goals. Whereas model-free choice requires merely retrieving the (directly learned) values of previous actions, model-based valuation requires a sort of mental simulation – carried out at decision time – of the likely consequences of candidate actions, using the learned internal model. Under this framework, at any given moment both the model-based and model-free systems can provide action values to guide choices, inviting a critical question: how does the brain determine which system’s preferences ultimately control behavior?
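A toy contrast between the two systems as I understand the distinction (the two-action world and all numbers are invented for illustration, not the task Otto et al. used):

```python
# Minimal sketch: model-free caching vs. model-based simulation.
import random
random.seed(0)

ACTIONS = ["left", "right"]
STATES = {"left": "A", "right": "B"}  # deterministic transitions for brevity
REWARD = {"A": 1.0, "B": 0.0}

# Model-free: cache a value per action, updated from experienced reward.
q = {a: 0.0 for a in ACTIONS}
alpha = 0.1
for _ in range(100):
    a = random.choice(ACTIONS)
    r = REWARD[STATES[a]]
    q[a] += alpha * (r - q[a])  # temporal-difference update on a cached value

# Model-based: hold a transition model (here simply copied rather than
# learned, for brevity) and evaluate actions by simulation at choice time.
model = dict(STATES)
def model_based_value(action):
    return REWARD[model[action]]  # mental simulation of the consequence

# If the outcome in state A is revalued, the model-based system adapts
# immediately, while the cached model-free value lags until re-experienced.
REWARD["A"] = -1.0
print(model_based_value("left"))  # reflects the new value at once
print(round(q["left"], 2))        # still near the old cached value
```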
This is the first post in quite a while. I have been trying to consolidate and integrate my past inventory of posts into what I am calling papers. This has turned out to be time consuming and difficult since I really have to write something. As part of that effort, I have been reading Surfing Uncertainty: Prediction, Action, and the Embodied Mind authored by Andy Clark. (I note that this book is not designed for the popular press–it is quite challenging.) Clark refers to the extensive literature on decision making and points out a part of that literature unknown to me. With that recommendation, I sought out: “Perception, Action and Utility: The Tangled Skein,” (2012) in M. Rabinovich, K. Friston & P. Varona, editors, Principles of brain dynamics: Global state interactions. Cambridge, MA: MIT Press, written by Samuel J. Gershman and Nathaniel D. Daw.
Gershman and Daw focus on two aspects of decision theory that have important implications for its implementation in the brain:
1. Decision theory implies a strong form of separation between probabilities and utilities. In particular, the posterior must be computed before (and hence independently of) the expected utility. This assumption is sometimes known as probabilistic sophistication. It means that I can state how much enjoyment I would derive from having a picnic in sunny weather, independently of my belief that it will be sunny tomorrow. This framework supports a sequentially staged view of the problem – perception guiding evaluation.
2. The mathematics that formalizes decision making under uncertainty, Bayes’ theorem, generally relies on Gaussian or multinomial assumptions for distributions. Gershman and Daw note that these assumptions are not generally applicable to real-world decision-making tasks, where distributions may not take any convenient parametric form. This means that if the brain is to perform the necessary calculations, it must employ some form of approximation.
Statistical decision theory, to be plausibly implemented in the brain, requires segregated representations of probability and utility, and a mechanism for performing approximate inference.
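The two points above can be sketched in miniature: beliefs and utilities live in separate structures that meet only at evaluation time, and when the posterior has no convenient parametric form, the expectation can be approximated by sampling. The picnic framing echoes the example above; all numbers are invented.

```python
# Sketch: segregated probability and utility, combined by Monte Carlo
# approximation of expected utility.
import random
random.seed(1)

def utility(action, weather):
    # Utilities never consult beliefs.
    table = {("picnic", "sunny"): 10.0, ("picnic", "rain"): -5.0,
             ("stay_home", "sunny"): 2.0, ("stay_home", "rain"): 2.0}
    return table[(action, weather)]

def sample_weather():
    # Beliefs never consult utilities; a stand-in for drawing from an
    # arbitrary, possibly non-parametric posterior.
    return "sunny" if random.random() < 0.7 else "rain"

def expected_utility(action, n_samples=10_000):
    # Approximate E[U] by averaging over posterior samples.
    return sum(utility(action, sample_weather()) for _ in range(n_samples)) / n_samples

print(round(expected_utility("picnic"), 1))     # ≈ 0.7*10 + 0.3*(-5) = 5.5
print(round(expected_utility("stay_home"), 1))  # = 2.0
```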
The full story, however, is not so simple. First, abundant evidence from vision indicates that reward modulation occurs at all levels of the visual hierarchy, including V1 and even earlier, in the lateral geniculate nucleus. Gershman and Daw suggest that the idea of far-downstream LIP (lateral intraparietal area) as a pure representation of posterior state probability is dubious. Indeed, other work varying rewarding outcomes for actions shows that neurons in LIP are modulated by the probability and amount of reward expected for an action – probably better thought of as related to expected utility than to state probability per se. Then recall that area LIP is only one synapse downstream from the instantaneous motion-energy representation in MT. If it already represents expected utility, there seems to be no candidate for an intermediate stage of pure probability representation.
A different source of contrary evidence comes from behavioral economics. The classic Ellsberg paradox revealed preferences in human choice behavior that are not probabilistically sophisticated. The example given by Ellsberg involves drawing a ball from an urn containing 30 red balls and 60 black or yellow balls in an unknown proportion. Subjects are asked to choose between pairs of gambles (A vs. B or C vs. D) drawn from the following set: A pays $100 if the ball is red; B pays $100 if it is black; C pays $100 if it is red or yellow; D pays $100 if it is black or yellow.
Experimentally, subjects prefer A over B and D over C. The intuitive reasoning is that in gambles A and D, the probability of winning $100 is known (unambiguous), whereas in B and C it is unknown (ambiguous). There is no subjective probability distribution that can produce this pattern of preferences. This is widely regarded as violating the assumption of probability-utility segregation in statistical decision theory. (See post Allais and Ellsberg Paradoxes).
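The impossibility claim is easy to verify by brute force. Assuming the standard Ellsberg payoffs (A pays on red, B on black, C on red or yellow, D on black or yellow), no subjective probability for drawing black makes both observed preferences maximize expected utility:

```python
# Brute-force check: no subjective probability for "black" rationalizes
# both observed preferences (A over B, and D over C) at once.
P_RED = 30 / 90  # 30 of 90 balls are red; black + yellow = 60 of 90
consistent = []
for i in range(0, 601):
    p_black = i / 1000.0              # sweep p_black over [0, 2/3]
    p_yellow = (60 / 90) - p_black
    eu = {"A": 100 * P_RED,
          "B": 100 * p_black,
          "C": 100 * (P_RED + p_yellow),
          "D": 100 * (p_black + p_yellow)}
    if eu["A"] > eu["B"] and eu["D"] > eu["C"]:
        consistent.append(p_black)
print(consistent)  # [] -- no belief produces both preferences
```

Algebraically, A over B requires p(black) < 1/3 while D over C requires p(black) > 1/3, hence the empty result.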
Gershman and Daw suggest two ways that the separation between probabilities and utilities might be weakened or abandoned:
A. Decision-making as probabilistic inference
The idea here is that by transforming the utility function appropriately, one can treat it as a probability density function parameterized by the action and hidden state. Consequently, maximizing the “probability” of utility with respect to action, while marginalizing the hidden state, is formally equivalent to maximizing the expected utility. Although this is more or less an algebraic maneuver, it has profound implications for the organization of decision-making circuitry in the brain. The insight is that what appear to be dedicated motivational and valuation circuits may instead be regarded as parallel applications of the same underlying computational mechanisms over effectively different likelihood functions.
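One common version of this maneuver can be sketched with invented numbers: exponentiate utility so it can play the role of a likelihood for an "optimality" variable, then score actions by that pseudo-probability with the hidden state marginalized out (the inverse temperature `beta` is an assumed free parameter, not something from the paper).

```python
# Sketch of decision-making as probabilistic inference: utilities are
# transformed by exp() into likelihood-like quantities, so choosing an
# action becomes maximizing p(optimal | action) with the state
# marginalized out.
import math

P_STATE = {"sunny": 0.7, "rain": 0.3}  # prior over the hidden state
UTILITY = {("picnic", "sunny"): 10.0, ("picnic", "rain"): -5.0,
           ("stay_home", "sunny"): 2.0, ("stay_home", "rain"): 2.0}

def prob_optimal(action, beta=0.1):
    # p(optimal | a) proportional to sum_s p(s) * exp(beta * U(a, s))
    return sum(p * math.exp(beta * UTILITY[(action, s)])
               for s, p in P_STATE.items())

best = max(["picnic", "stay_home"], key=prob_optimal)
print(best)
```

The point of the transform is architectural: action selection now runs on the same marginalization machinery as perceptual inference, rather than on a separate valuation stage.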
Karl Friston builds on this foundation to assert a much more provocative concept: that for biologically evolved organisms, the desired equilibrium is by definition just the species’ evolved equilibrium state distribution. The mathematical equivalence rests on the evolutionary argument that hidden states with high prior probability also tend to have high utility. This situation arises through a combination of evolution and ontogenetic development, whereby the brain is immersed in a “statistical bath” that prescribes the landscape of its prior distribution. Because agents who find themselves more often in congenial states are more likely to survive, they inherit (or develop) priors with modes located at the states of highest congeniality. Conversely, states that are surprising given your evolutionary niche, like a fish out of water, are maladaptive and should be avoided. (See post Neuromodulation.)
B. The costs of representation and computation
Probabilistic computations make exorbitant demands on a limited resource, and in a real physiological and psychological sense, these demands incur a cost that debits the utility of action. According to Gershman and Daw, humans are “cognitive misers” who seek to avoid effortful thought at every opportunity, and this effort diminishes the same neural signals that are excited by reward. For instance, one can study whether a rat that has learned to lever press for food while hungry will continue to do so when full; a full probabilistic representation over outcomes will adjust its expected utilities to the changed outcome value, whereas representing utilities only in expectation can preclude this and so predicts hapless working for unwanted food. The upshot of many such experiments is that the brain adopts both approaches, depending on circumstances. Which approach the circumstances elicit can be explained by a sort of meta-optimization over the costs (e.g., extra computation) of maintaining the full representation relative to its benefits (better statistical accuracy).
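The meta-optimization idea reduces to a simple cost-benefit rule. A deliberately minimal sketch, with invented placeholder quantities rather than anything estimated in the paper:

```python
# Sketch of cost-benefit arbitration between the two controllers: deploy
# the full (model-based) representation only when its accuracy benefit
# outweighs its computational cost.
def choose_system(accuracy_gain, computation_cost):
    """Return which controller a cost-benefit arbiter would deploy."""
    return "model-based" if accuracy_gain - computation_cost > 0 else "model-free"

print(choose_system(accuracy_gain=0.8, computation_cost=0.3))  # novel task
print(choose_system(accuracy_gain=0.1, computation_cost=0.3))  # overtrained habit
```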