The Mixed Instrumental Controller

This is more or less a continuation of the previous post, based on Andy Clark’s “Embodied Prediction,” in T. Metzinger & J. M. Windt (Eds.), Open MIND: 7(T). Frankfurt am Main: MIND Group (2015). It further weighs in on the issue of changing strategies or changing weights (see the post Revisiting Swiss Army Knife or Adaptive Tool Box). Clark has brought to my attention the terms model-free and model-based, which seem to roughly equate to intuition/System 1 and analysis/System 2, respectively. With this translation, I am better able to tie this into ideas like cognitive niches and parallel constraint satisfaction. Clark, in a footnote:

Current thinking about switching between model-free and model-based strategies places them squarely in the context of hierarchical inference, through the use of “Bayesian parameter averaging”. This essentially associates model-free schemes with simpler (less complex) lower levels of the hierarchy that may, at times, need to be contextualized by (more complex) higher levels.

As humans, we have been able to use language, our social skills, and our understanding of hierarchy to extend our cognition. Multiplication of large numbers is an example. We cannot hold enough numbers in our heads, so we created a way to do any multiplication on paper or its equivalent, provided we learn our multiplication tables. Clark cites the example of the way that learning to perform mental arithmetic has been scaffolded, in some cultures, by the deliberate use of an abacus. Experience with patterns thus made available helps to install appreciation of many complex arithmetical operations and relations. We structure (and repeatedly re-structure) our physical and social environments in ways that make available new knowledge and skills. Prediction-hungry brains, exposed in the course of embodied action to novel patterns of sensory stimulation, may thus acquire forms of knowledge that were genuinely out of reach prior to such physical-manipulation-based re-tuning of the generative model. Action and perception thus work together to reduce prediction error against the more slowly evolving backdrop of a culturally distributed process that spawns a succession of designed environments whose impact on the development and unfolding of human thought and reason can hardly be overestimated.

Clark notes that the predictive brain is not doomed to deploy high-cost, model-rich strategies moment-by-moment in a demanding and time-pressured world. Instead, that very same apparatus supports the learning and contextually determined deployment of low-cost strategies that make the most of body, world, and action. Clark cites the example of painting white lines along the edges of a winding cliff-top road. This environmental alteration allows the driver to solve the complex problem of keeping the car on the road by (in part) predicting the ebb and flow of various simpler optical features and cues. In such cases, we are building a better world in which to predict, while simultaneously structuring the world to cue the low-cost strategy at the right time.

According to Clark, this suggests a very natural model of “extended cognition”, where this is simply the idea that bio-external structures and operations may sometimes form integral parts of an agent’s cognitive routines. Actions that engage and exploit specific external resources will now be selected in just the same manner as the inner coalitions of neural resources themselves. Minimal internal models that involve calls to world-recruiting actions may thus be selected in the same way as a purely internal model. The availability of such strategies (of trading inner complexity against real-world action) is the basic feature of embodied prediction machines. The brain and body have evolved together in a changing but usually stable environment, so they work together. In studying them, we need to use representative design, as suggested by Egon Brunswik.

Clark considers the work of Pezzulo and colleagues (“The Mixed Instrumental Controller: using value of information to combine habitual choice and mental simulation,” by Giovanni Pezzulo, Francesco Rigoli, and Fabian Chersi, Frontiers in Psychology: Cognition, March 2013, Volume 4, Article 92), who propose that a single instrumental process of decision making produces both goal-directed and habitual behavior by flexibly combining aspects of model-based and model-free computations. Here, a so-called “Mixed Instrumental Controller” (MIC) determines whether to choose an action based upon a set of simple, pre-computed (“cached”) values, or by running a mental simulation enabling a more flexible, model-based assessment of the desirability, or otherwise, of actually performing the action. The mixed controller computes the “value of information”, selecting the more informative (but costly) model-based option only when that value is sufficiently high. More specifically, the MIC calculates the value of information of mental simulation on the basis of uncertainty and of how much the alternative “cached” action values differ from each other. Mental simulation, in such cases, then produces new reward expectations that can determine current action by updating the values used to determine choice. We can think of this as a mechanism that, moment-by-moment, determines whether to exploit simple, already-cached routines or to explore a richer set of possibilities using some form of mental simulation.
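One simple way to write down this meta-choice is sketched here as an assumption for illustration, not as the exact expression used in the paper. Let Q(a) be the cached value of action a, U(a) an estimate of the uncertainty about that value, a' the closest competing action, ε a small constant that avoids division by zero, and θ a threshold standing in for the cost of simulating. All of these symbols are my own labels.

```latex
\mathrm{VoI}(a) = \frac{U(a)}{\lvert Q(a) - Q(a') \rvert + \varepsilon},
\qquad \text{simulate } a \text{ if } \mathrm{VoI}(a) > \theta
```

On this reading, simulation is favored when uncertainty is high or when the cached values are close together, and the cached habit takes over once one value is both well learned and clearly ahead.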

Figure 2 illustrates the algorithm followed by the mixed instrumental controller model. This algorithm can be separated into four sub-processes, called meta-choice (between cached values and mental simulation), mental simulation, choice, and learning.
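To make the four sub-processes concrete, here is a toy Python sketch on a one-step choice between a few actions. It is not the published model. The function name run_mic, the uncertainty measure (a running mean of absolute prediction error), the value-of-information ratio, the threshold, and the learning rate are all assumptions of mine chosen for readability.

```python
import random

def run_mic(true_rewards, trials=30, threshold=1.0, alpha=0.5,
            noise=0.1, epsilon=0.05, seed=0):
    """Toy MIC-style loop (illustrative assumptions throughout).

    true_rewards maps each action to its mean reward (at least two actions).
    Returns (cached_values, number_of_simulations_per_trial).
    """
    rng = random.Random(seed)
    actions = list(true_rewards)
    q = {a: 0.0 for a in actions}      # cached action values
    u = {a: 1.0 for a in actions}      # uncertainty: running mean of |error|
    model = {a: 0.0 for a in actions}  # internal model: mean real reward so far
    counts = {a: 0 for a in actions}
    sims_per_trial = []

    def observe(a, r):
        """Update uncertainty and cached value from a real or simulated reward."""
        err = r - q[a]
        u[a] += alpha * (abs(err) - u[a])
        q[a] += alpha * err

    def voi(a):
        """Assumed heuristic: uncertainty divided by the gap to the closest rival."""
        gap = min(abs(q[a] - q[b]) for b in actions if b != a)
        return u[a] / (gap + 1e-6)

    for _ in range(trials):
        # 1. Meta-choice: pick the actions whose simulation looks worth its cost.
        to_simulate = [a for a in actions if voi(a) > threshold]

        # 2. Mental simulation: clamp one action at a time, sample an outcome
        #    from the internal model, and treat it like an actual observation.
        for a in to_simulate:
            observe(a, model[a] + rng.gauss(0.0, noise))
        sims_per_trial.append(len(to_simulate))

        # 3. Choice: mostly greedy on the (possibly updated) cached values.
        if rng.random() < epsilon:
            chosen = rng.choice(actions)
        else:
            chosen = max(actions, key=lambda a: q[a])

        # 4. Learning: act in the world, then update the cache and the model.
        reward = true_rewards[chosen] + rng.gauss(0.0, noise)
        observe(chosen, reward)
        counts[chosen] += 1
        model[chosen] += (reward - model[chosen]) / counts[chosen]

    return q, sims_per_trial

# Noise-free, purely greedy run: simulations happen early, then stop.
values, sims = run_mic({"left": 1.0, "right": 0.2}, trials=20,
                       noise=0.0, epsilon=0.0)
```

In this noise-free, greedy run the controller simulates both actions on the first trial, when the cached values are tied, and stops simulating within a few trials as the value of "left" pulls ahead and its uncertainty shrinks. Note that a purely greedy agent never actually tries "right" here, so some exploration (epsilon above zero) is needed for a less lopsided picture.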

Pezzulo’s experiments find that mental simulations at decision points diminish after sufficient learning, in line with evidence showing that in this condition habitization replaces goal-directed mechanisms of choice. However, if variance is high or if the values of the alternatives are too close, the system is slower in developing habits. When environmental contingencies change, mental simulations are used anew, consistent with evidence of a passage from habitual to goal-directed strategies after outcome devaluation (unless it occurs after “overtraining”). When contingencies change, the goal-directed system can immediately change behavior. The method adopted in the MIC consists of “clamping” one policy at a time, which produces a serial process of (simulated) internal experience sampling in which simulations are treated as actual observations. This method is different from the idea of a “tree search” as it is typically described in normative approaches, and from models of parallel “diffusion” processes for planning. From my understanding, the chess board provides an example. Tree search of each move soon becomes too complicated, but if you were given the probability of winning for each potential move you now face, you would not need to search the tree.

I see this Mixed Instrumental Controller as bolting on to the front of Parallel Constraint Satisfaction. We go along in “cached” mode until the value of information is sufficiently high to suggest that we should do mental simulation. Mental simulation seems like parallel constraint satisfaction to me, and we can do mental simulation in either intuitive or analytic modes. If the mental simulation does not give us a clear decision in the intuitive mode, it suggests that we get new information in the analytic mode. Once we get new information, we can run new simulations.