Predictive Processing

A. Prediction Machine
(See Post Prediction Machine)

  1. Encoding only unexpected variation

This paper focuses on an idea about the brain’s goals: the idea of the prediction machine. Andy Clark suggests that the brain has evolved to support perception and action by attempting to match incoming sensory inputs with top-down expectations, or predictions. This is done by using a hierarchical model that minimizes prediction error, or surprise, within a bidirectional cascade of cortical processing.

According to Clark, the process shares much with data compression strategies in signal processing. Consider a basic task such as image transmission: in most images, the value of one pixel regularly predicts the value of its nearest neighbors, with differences marking important features such as the boundaries between objects. That means that the code for a rich image can be compressed (for a properly informed receiver) by encoding only the “unexpected” variation: the cases where the actual value departs from the predicted one. What needs to be transmitted is therefore just the difference (a.k.a. the “prediction error”) between the actual current signal and the predicted one. Descendants of this kind of compression technique are currently used in JPEGs, in various forms of lossless audio compression, and in motion-compressed coding for video. The information that needs to be communicated “upward” under all these regimes is just the prediction error: the divergence from the expected signal.
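To make the compression idea concrete, here is a minimal Python sketch of my own (not Clark's). It assumes the simplest possible predictor: each pixel is predicted to equal its left neighbor, and sender and receiver agree that the very first prediction is 0. The function names encode and decode are hypothetical.

```python
def encode(signal):
    """Return only the prediction errors for a row of pixel values."""
    errors = []
    predicted = 0  # assumed starting prediction shared by sender and receiver
    for actual in signal:
        errors.append(actual - predicted)  # transmit the surprise only
        predicted = actual  # next prediction: "same as the last pixel"
    return errors


def decode(errors):
    """Rebuild the row by adding each error to the running prediction."""
    signal = []
    predicted = 0
    for error in errors:
        actual = predicted + error
        signal.append(actual)
        predicted = actual
    return signal


row = [10, 10, 10, 10, 200, 200, 200, 200]  # one edge between two flat regions
errors = encode(row)
print(errors)          # [10, 0, 0, 0, 190, 0, 0, 0]
print(decode(errors))  # the original row
```

Most of the transmitted values are zero. Only the first pixel and the boundary between the two regions carry any news, which is the sense in which the "unexpected" variation is all that needs to be sent.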

Forward (to the brain) connections between levels thus carry the “residual errors” separating the predictions from the actual lower-level activity, while backward (from the brain) connections, which as Clark says do most of the “heavy lifting” in these models, carry the predictions themselves. The generative model providing the “top-down” predictions is here doing much of the more traditionally “perceptual” work, with the bottom-up driving signals really providing a kind of ongoing feedback on their activity (by fitting, or failing to fit, the cascade of downward-flowing predictions). This leads to the development of neurons that exhibit a “selectivity that is not intrinsic to the area but depends on interactions across levels of a processing hierarchy”. For hierarchical predictive coding, context-sensitivity is fundamental.

To see this, Clark says that we need only reflect that the neuronal responses that follow an input may be expected to change quite profoundly according to the contextualizing information provided by a current winning top-down prediction. “When a neuron or population is predicted by top-down inputs it will be much easier to drive than when it is not”. This is because the best overall fit between driving signal and expectations will often be found by inferring noise in the driving signal and thus recognizing a stimulus as, for example, the letter m in the context of the word “mother”, even though the same bare stimulus, presented out of context or in most other contexts, would have been a better fit with the letter n. A unit normally responsive to the letter m might, under such circumstances, be successfully driven by an n-like stimulus.

Attention fits into this picture as a means of variably balancing the potent interactions between top-down and bottom-up influences by factoring in their degree of uncertainty. This is achieved by altering the gain (the “volume”) on the error units accordingly. Attention, if this is correct, is simply one means by which certain error-unit responses are given increased weight, hence becoming more apt to drive learning. This means that the precise mix of top-down and bottom-up influence is not fixed. Instead, the weight given to sensory prediction error is varied according to how reliable (how noisy, certain, or uncertain) the signal is taken to be. Thus we are not (not quite) slaves to our expectations. Successful perception requires the brain to minimize surprisal. But the agent is able to see surprising things, at least in conditions where the brain assigns high reliability to the driving signal.
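One common way to picture this weighting is the textbook rule for combining a Gaussian prior with a Gaussian observation. That Gaussian framing is an assumption of this sketch, not something the passage above specifies, and the function name precision_weighted_update is hypothetical. Precision here means inverse variance: the more reliable a source is taken to be, the higher its precision.

```python
def precision_weighted_update(prior_mean, prior_precision,
                              observation, sensory_precision):
    """Move the prior toward the observation in proportion to its reliability.

    Assumes Gaussian uncertainty, so precision = 1 / variance.
    """
    prediction_error = observation - prior_mean
    # The gain is the "volume" on the error signal: it approaches 1 when
    # the senses are trusted and approaches 0 when the prior is trusted.
    gain = sensory_precision / (prior_precision + sensory_precision)
    return prior_mean + gain * prediction_error


# Same prior expectation (0) and same surprising input (10) in both cases.
trusted_signal = precision_weighted_update(0.0, 1.0, 10.0, 9.0)
noisy_signal = precision_weighted_update(0.0, 9.0, 10.0, 1.0)
print(trusted_signal)  # ends up close to the input
print(noisy_signal)    # stays close to the expectation
```

With a trusted signal the estimate moves almost all the way to the surprising input, which is the sense in which we can still see surprising things. With a noisy signal the expectation largely wins.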

2. Bootstrap heaven (Clark, Andy (2016). Surfing Uncertainty: Prediction, Action, and the Embodied Mind, Oxford University Press, New York. pp. 17-19.) See also posts Bootstrapping and Dialectical Bootstrapping

The environment provides a form of supervised learning. The line between predicting the present and predicting the very near future vanishes. The states of your sensory registers will change, in ways driven by the incoming signal, as the world changes around you. The evolving states of your own sensory receptors provide a training signal  allowing your brain to ‘self-supervise’ its own learning. If you attempt to predict everything that happens next, then every single moment is a learning opportunity. This might explain how an infant seems to magically acquire such a sophisticated understanding of the world, despite their seemingly inert overt behavior.

The prediction task, thus conceived, is a kind of bootstrap heaven. One way to learn a surprising amount about grammar is to look for the best ways to predict the next words in sentences. This is just the kind of training that the world provides, since your attempts at prediction are soon followed by the next word in the sentence. You can thus use the prediction task to bootstrap your way to the grammar, which you then use in the prediction task in the future.
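A toy version of this next-word training can be sketched in a few lines of Python. This is only an illustration under strong assumptions: it counts which word follows which (a bigram model), which is far cruder than anything Clark has in mind, and the names train_bigram and predict_next are hypothetical. The point is that the "teacher" is just the next word in the sentence.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count, for every word, which words were observed to follow it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Each word is a prediction target for the word before it,
        # so the text supervises itself.
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of a word, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = [
    "the dog chased the cat",
    "the dog ate the bone",
    "a dog chased a ball",
]
model = train_bigram(corpus)
print(predict_next(model, "dog"))  # chased
print(predict_next(model, "the"))  # dog
```

Even this crude counter picks up a sliver of structure (articles tend to be followed by nouns, nouns by verbs) without anyone labeling the data.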

We also can use another trick: recalling specific concrete events that may be relevant to the task at hand. This is part of what is called PIMMS, a predictive interactive multiple memory system. The key novelty of the PIMMS model is that ongoing feedback links the three systems both during encoding and retrieval, and that different patterns of recurrent interaction at both points account for the observed differences in the data. Thus, what may appear to be a modular and motley group of systems may be a web of statistically sensitive mutual influence that combines context with content, and balances specialization with integration. (Clark, Andy (2016). Surfing Uncertainty: Prediction, Action, and the Embodied Mind, Oxford University Press, New York. pp. 102-106.)

We command a forward model of the likely sensory consequences of our own motor commands. The forward model allows us to overcome the signalling delays that would otherwise impede fluid motion. Our control systems only have access to out-of-date information about the world and our own bodies, and the delays are not even the same across channels. Forward models enable us to live in the present and to control our bodies without much sense of struggle or effort.

 3. Explaining Away (See Post Explaining Away)

Both Clark and Hohwy use “explaining away” to illustrate the concept of cancelling out sensory prediction error. Perception thus involves “explaining away” the driving (incoming) sensory signal by matching it with a cascade of predictions pitched at a variety of spatial and temporal scales. These predictions reflect what the system already knows about the world (including the body) and the uncertainties associated with its own processing. What we perceive depends heavily upon the set of priors that the brain brings to bear in its best attempt to predict the current sensory signal. An example that Hohwy provides is when the power goes out at home. You should have gotten a new main circuit breaker months ago so that is your first worry, but when you go outside and see everybody’s lights are out, you explain away that first hypothesis. Of course, it could be an unusual coincidence, but you dismiss it.

The balancing act between cancelling out and selective enhancement of nerve signals is made possible only by the existence of “two functionally distinct subpopulations, encoding the conditional expectations of perceptual causes and the prediction error respectively”. Superficial pyramidal cells are depicted as playing the role of error units, passing prediction error forward, while deep pyramidal cells play the role of representation units, passing predictions (made on the basis of a complex generative model) downward. Some form of functional separation is required. Hohwy sees such separation as “a central feature of the proposed architecture, and one without which it would be unable to combine the radical elements drawn from predictive coding with simultaneous support for the more traditional structure of increasingly complex feature detection and top-down signal enhancement.”

4. Neuromodulation (See Post Neuromodulation)

Karl Friston states:

It is self evident that if our brains entail generative models of our world, then much of the brain must be devoted to modelling entities that populate our world; namely, other people. In other words, we spend much of our time generating hypotheses and predictions about the behavior of people—including ourselves.

According to Friston this places the mirror neuron system center stage in generating predictions about how we will behave. A mirror neuron is a neuron that fires both when a person acts and when the person observes the same action performed by another. Thus, the neuron “mirrors” the behavior of the other, as though the observer were itself acting. If I see someone picking up an apple, my brain will make the motor plans necessary to snatch the apple myself. I prepare to imitate the movement and to feel what it would be like. In discussing this, Friston brings up three terms that were new to me. I will try to explain them as an aside.

Proprioception is the sense of the relative position of neighboring parts of the body and strength of effort being employed in movement. The proprioceptive sense is believed to be composed of information from sensory neurons located in the inner ear (motion and orientation) and in the stretch receptors located in the muscles and the joint-supporting ligaments (stance). Proprioception is sometimes said to combine the kinaesthetic (extremities) system and the vestibular (inner ear) system. It is distinguished from exteroception, by which one perceives the outside world through specific receptors for pressure, light, temperature, sound, and other sensory experiences, and interoception, by which one perceives pain, hunger, etc., and the movement of internal organs.

To appreciate the bilateral nature of predictions provided by the mirror neuron system, Friston suggests that we have to take unconscious inference to the next level and consider it in an embodied context. Put simply, one can regard perception as the suppression of outside-the-body (exteroceptive) prediction errors by selecting predictions that are best able to explain sensations. However, exactly the same argument can be applied to action that minimizes inside-the-body (proprioceptive) prediction errors via the classical reflex arcs. In other words, we can reduce prediction errors in one of two ways: we can either change predictions so that they match (exteroceptive) sensory samples, or we can change the samples through action, to make them match (proprioceptive) predictions. This is active inference. So what has this got to do with mirror neurons? If mirror neurons provide top-down predictions of both the proprioceptive and exteroceptive consequences of moving (and thereby cause movements through motor reflexes), then they provide a ready-made set of hypotheses for inferring the motor intentions of other people. This is because the exteroceptive (e.g. visual) consequences of movements are the same and all we have to do is to suppress the proprioceptive predictions. This provides a perspective on why mirror neurons respond both to self-made acts and during action observation.
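The two routes to a smaller prediction error can be shown with a deliberately tiny Python sketch. It is my own illustration, not Friston's formalism: a single number stands in for the prediction, another for the sensory sample, and the function name reduce_error, the mode labels, and the rate parameter are all hypothetical.

```python
def reduce_error(prediction, sample, mode, steps=5, rate=0.5):
    """Shrink the gap between a prediction and a sensory sample.

    mode="perception": revise the prediction toward the sample.
    mode="action":     change the sample (by moving) toward the prediction.
    Returns the final (prediction, sample) pair.
    """
    if mode not in ("perception", "action"):
        raise ValueError("mode must be 'perception' or 'action'")
    for _ in range(steps):
        error = sample - prediction
        if mode == "perception":
            prediction += rate * error  # the belief gives way to the world
        else:
            sample -= rate * error      # the world is made to fit the belief
    return prediction, sample


print(reduce_error(0.0, 10.0, "perception"))  # (9.6875, 10.0)
print(reduce_error(0.0, 10.0, "action"))      # (0.0, 0.3125)
```

In both runs the error falls from 10 to 0.3125. What differs is which side moved: in perception the prediction changed, in action the sample did.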

‘just as the visual brain constructs models of reality from figural primitives, so our social brain is innately wired to function as a psychologist, forming models of other people’s motivations, desires and thoughts.’ (p. 406)

However, Friston suggests that there is an important twist here. To harness the mirror neuron system during action observation, we have to suppress proprioceptive prediction errors that would otherwise elicit movements and cause us to mimic (mirror) the subject of our observation. This suppression rests on (mathematically speaking) reducing the precision of, or confidence in, proprioceptive prediction errors. This speaks to a fundamental aspect of inference in the brain; namely the encoding of precision or confidence through neuromodulation. In other words, not only do we have to infer the content of our sensory apparatus but also the context, in terms of the precision or certainty about the content. This represents a subtle but always present problem that the brain has to solve, and the solution rests on modulating the gain or post-synaptic sensitivity of neuronal populations encoding prediction error.

This neuromodulation  ties inference in the brain to synaptic processes that may be compromised in syndromes like schizophrenia, Parkinson’s disease, autism, hysterical disorders, and so on. The basic idea is that many disorders of active inference can be understood as a failure of neuromodulation.  Friston suggests that the biggest challenge to formal descriptions of the brain as an inference machine is how one can accommodate emotions, self-awareness and disorders thereof. Prediction errors are formed by comparing primary input from stretch receptors with descending proprioceptive predictions to alpha motor neurons in the spinal-cord (and cranial nerve nuclei). This view replaces descending motor commands with motor predictions that are fulfilled by peripheral reflexes. The predictions themselves are elaborated on the basis of deep hierarchical inference about states of the world, including the trajectories of our own bodies. Friston believes that exactly the same mechanism can be applied to interoceptive signals. This means that the internal milieu is controlled by autonomic reflexes that transcribe descending interoceptive predictions into a physiological homoeostasis. As with the mirror neuron system (and sensorimotor representations in general), these interoceptive predictions are just one—among many—of multimodal predictions that come from high-level hypotheses about our embodied state.

Friston says that the best explanation for the myriad of sensory inputs I am experiencing is my situated state of mind. A situated state comprises an internally consistent hierarchical model of the world, with multiple levels of description. Crucially, these hierarchical representations also predict my interoceptive state, including sympathetic and parasympathetic outflow: literally, my gut feelings. In this view, interoceptive information does not cause our self-awareness, or vice versa. There is a circular causality in which neuronal representations cause changes in autonomic status by enslaving autonomic reflexes. At the same time, interoceptive signals entrain hierarchical representations so that they provide the best prediction. Emotional valence is therefore a necessary aspect of any representation in the brain that includes interoceptive predictions. This means that, in terms of the brain’s computational anatomy, the influence of gut feelings (interoceptive signals) is inherently contextualized by concomitant exteroceptive and proprioceptive input. As Kandel observes:

‘As with visual perception, where we have learned that the brain is not a camera but a Homeric storyteller, so with emotion: the brain actively interprets the world using top-down inferences that depend upon context. As James pointed out, feelings do not exist until the brain interprets the cause of the body’s physiological signals and assembles an appropriate, creative response that is consistent with our expectations and the immediate context.’ (p. 351)


B. Prediction Error Minimization

(See Post Prediction Error Minimization)

  1. Hierarchy

An interesting example of the hierarchical predictive coding model is binocular rivalry. Binocular rivalry is a form of visual experience that occurs when, using a special experimental set-up, each eye is presented (simultaneously) with a different visual stimulus. Thus, the right eye might be presented with an image of a house, while the left receives an image of a face. Under these albeit artificial conditions, subjective experience unfolds in a surprising, “bi-stable” manner. Instead of visually experiencing a confusing all-points merger of house and face information, subjects report a kind of perceptual alternation between seeing the house and seeing the face. The driving (bottom-up) signals contain information that suggests two distinct, and incompatible, states of the visually presented world – for example, face at location X/house at location X. When one of these is selected as the best overall hypothesis, it will account for all and only those elements of the driving input that the hypothesis predicts. As a result, prediction error for that hypothesis decreases. But prediction error associated with the elements of the driving signal suggestive of the alternative hypothesis is not suppressed; it is now propagated up the hierarchy. To suppress those prediction errors, the system needs to find another hypothesis. But having done so (and hence, having flipped the dominant hypothesis to the other interpretation), there will again emerge a large prediction error signal, this time deriving from those elements of the driving signal not accounted for by the flipped interpretation. No single hypothesis accounts for all the data, so the system alternates between the two semi-stable states. It behaves as a bi-stable system.

We do not experience a combined or interwoven image, because such an image does not constitute a viable hypothesis given our more general knowledge about the visual world. For it is part of that general knowledge that, for example, houses and faces are not present in the same place, at the same scale, at the same time. This, indeed, is the explanation of the existence of competition between certain higher-level hypotheses in the first place. They compete because the system has learned that “only one object can exist in the same place at the same time”.

2. Slowing down

The brain maintains its own integrity against the onslaught of sensory input by slowing down and controlling the causal transition of the input through itself. If it had no means to slow down the input, its states would be at the mercy of the world and would disperse quickly. Hohwy compares this to a good dam builder, who must slow and control the inflow of water with a good system of dams, channels, and locks. This dam system must in some sense anticipate the flows of water in a way that makes sense in the long run and that manages flows well on average. The system will do this by minimizing “flow errors”, and it will do this by learning about the states of water flow in the world on the other side of the dam.

This means that other types of descriptions of mental processes must all come down to the way neurons manage to slow sensory input. Hohwy is studying autism as a disorder where the prediction hierarchy is stuck closer to the senses, so the model of the world is not corrected through a great number of repetitions. Thus, things are not slowed down enough. Hohwy provides an example of trying to determine the mean of 20 numbers that are given to us one at a time. If things are working correctly, we maintain a running mean that, in stepwise manner, eventually arrives at the answer. In this example, however, an autistic person would be presented with a single number each time without the running mean, which makes prediction difficult.
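The running mean in this example has a neat prediction-error form, which the following Python sketch spells out. The function name running_mean is hypothetical, and the link to autism is only Hohwy's analogy as reported above, not something the code models.

```python
def running_mean(numbers):
    """Return the running mean after each number arrives.

    The current mean acts as the prediction for the next number; each
    new number nudges it by the prediction error divided by the count.
    """
    mean = 0.0
    history = []
    for count, value in enumerate(numbers, start=1):
        error = value - mean   # how far the new number is from the prediction
        mean += error / count  # later numbers move the estimate less and less
        history.append(mean)
    return history


print(running_mean([2, 4, 6]))  # [2.0, 3.0, 4.0]
```

Because the error is divided by a growing count, each new number shifts the estimate less than the one before. That is the slowing down: the estimate stops being jerked around by individual inputs. Without the running mean, the only available guess is the latest number itself.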

Many of the technical, social and cultural ways we interact with the world can be characterized as attempts to make the link between sensory input and environmental causes less volatile–slowing them down. We see this in the benefits of the built environment that protect from the heat or cold, in radio that lets us hear things directly rather than through hearsay, and in language. This picture relies on the internal nature of the neural mechanism that minimizes prediction error, relative to which all our cultural and technological trappings are external. Culture and technology situate the mind closer to the world through improving the reliability of its sensory input. They help us to communicate with and predict other people’s behavior more accurately. An overlooked aspect  is that ritual, convention, music, and other shared practices help align our mental states with each other and further enhance mutual predictability.

Professional athletes talk about how the game slows down as they get used to a stiffer level of competition. Their predictive models get better and that slows things down. My budding football career burned out at age 14. Having progressed from backyard football to eleven-man team football, I can recall carrying the ball and getting beyond the line of scrimmage and just seeing a blur, not knowing what to focus on. Slowness also seems to be the opposite of an anxiety attack where ideas spiral and move too quickly. That the brain exists to slow sensory input is an interesting idea.

Clark notes that this “Bayesian brain”, even at its best, does not get everything right. He provides this description by Lt Colonel Henry Worsley on an expedition to the South Pole.

Whiteout days are tricky. That’s when the cloud cover gets so low it obscures the horizon. Amundsen called it ‘the white darkness’.  You have no idea of distance or height. There’s a story of him seeing what he thought was a man on the horizon.  As he started walking, he realized it was a dog turd just three feet in front of him.


(Clark, Andy (2016). Surfing Uncertainty: Prediction, Action, and the Embodied Mind, Oxford University Press, New York. p. 8.)

The ‘man’ prediction may have been Bayes optimal, but it was wrong.

3. Embodied cognition (See Post Embodied Cognition)

Precision estimation thus has a kind of meta-representational feel, since we are estimating the uncertainty of our own representations of the world. These ongoing task and context-varying estimates alter the weighting on select prediction error units, so as to increase the impact of task-relevant, reliable information. One key effect of this is to allow the brain to vary the balance between sensory inputs and prior expectations at different levels in ways sensitive to task and context. High-precision prediction errors have greater weight, and thus play a large role in driving processing and response.

Estimating the reliability or lack thereof of our own prediction error signals is clearly a delicate and tricky business as evidenced by the whiteout days above. Estimating the reliability of some item of news is never easy, as anyone who has encountered widely differing reports of the same event in different media knows. Suppose that the ownership of your most trusted information source changed overnight. Streams of information that you are pre-inclined to believe are now seriously misleading. Martians really have landed. You are not suddenly impaired in forming predictions or prediction errors. The problem lies in the way they are used to inform inference or hypotheses.

What precision weighting provides is essentially a means of balancing patterns of inference and action, and as such it is strangely neutral concerning the intuitive difference between increasing the precision upon a prior belief or decreasing the precision upon the sensory evidence. What matters is just the relative balance of influence, however that is achieved.

4. Heuristics (See post Embodied Prediction)

Clark then brings up Gerd Gigerenzer’s gift to the popular press: the “outfielder’s problem”: running to catch a fly ball in baseball. Giving perception its standard role, the job of the visual system is to get information about the current position of the ball so as to allow a distinct “reasoning system” to project its future trajectory. Nature, however, seems to have found a more elegant and efficient solution. Clark gives Chapman credit for the solution that involves running in a way that seems to keep the ball moving at a constant speed through the visual field. As long as the fielder’s own movements cancel any apparent changes in the ball’s optical acceleration, he will end up in the location where the ball hits the ground. This solution, OAC (Optical Acceleration Cancellation), explains why fielders, when asked to stand still and simply predict where the ball will land, typically do rather badly. They are unable to predict the landing spot because OAC is a strategy that works by means of moment-by-moment self-corrections that, crucially, involve the agent’s own movements. According to Clark, OAC is a case of fast, economical problem-solving. The use of data available in the optic flow enables the outfielder to sidestep the need to deploy a rich inner model to calculate the forward trajectory of the ball.

Clark is eloquent here:

Instead of using sensing to get enough information inside, past the visual bottleneck, so as to allow the reasoning system to “throw away the world” and solve the problem wholly internally, such strategies use the sensor as an open conduit allowing environmental magnitudes to exert a constant influence on behavior. Sensing is here depicted as the opening of a channel, with successful whole-system behavior emerging when activity in this channel is kept within a certain range. In such cases: The focus shifts from accurately representing an environment to continuously engaging that environment with a body so as to stabilize appropriate co-ordinated patterns of behavior. (See post Embodied Prediction)

Apt precision weightings here function to select what to predict at any given moment. They may thus select a pre-learned, fast, low-cost strategy for solving a problem, as task and context dictate. Contextually recruited patterns of precision weighting thus accomplish a form of set-selection or strategy switching. (Surfing Uncertainty, p 257)


C. Scaffolding

As humans, we have been able to use language, our social skills, and our understanding of hierarchy to extend our cognition. Multiplication of large numbers is an example. We cannot remember enough numbers in our heads so we created a way to do any multiplication on paper or its equivalent if we can learn our multiplication tables. Clark cites the example of the way that learning to perform mental arithmetic has been scaffolded, in some cultures, by the deliberate use of an abacus. Experience with patterns thus made available helps to install appreciation of many complex arithmetical operations and relations. We structure (and repeatedly re-structure) our physical and social environments in ways that make available new knowledge and skills. Prediction-hungry brains, exposed in the course of embodied action to novel patterns of sensory stimulation, may thus acquire forms of knowledge that were genuinely out of reach prior to such physical-manipulation-based re-tuning of the generative model. Action and perception thus work together to reduce prediction error against the more slowly evolving backdrop of a culturally distributed process that spawns a succession of designed environments whose impact on the development and unfolding of human thought and reason can hardly be overestimated.

Clark brings out the distinction between model-free and model-based schemes that would select strategies according to context. Model-free learning is associated with habitual, intuitive, automatic control of choice and action, while model-based learning is associated with conscious analytical approaches. The basic idea is that within predictive processing, there would be both a model-based and a model-free controller using precision estimation and weighting. Based on the estimate of which is best in the current situation, either the model-based or the model-free controller would be selected, and there would be toggling between them when the need arises. Here, a so-called “Mixed Instrumental Controller” (See post Mixed Instrumental Controller) determines whether to choose an action based upon a set of simple, pre-computed (“cached”) values, or by running a mental simulation enabling a more flexible, model-based assessment of the desirability, or otherwise, of actually performing the action. The mixed controller computes the “value of information”, selecting the more informative (but costly) model-based option only when that value is sufficiently high. More specifically, the MIC calculates the value of information of mental simulation on the basis of uncertainty and of how much the alternative “cached” action values differ from each other. Mental simulation, in such cases, then produces new reward expectations that can determine current action by updating the values used to determine choice. We can think of this as a mechanism that, moment-by-moment, determines whether to exploit simple, already-cached routines or to explore a richer set of possibilities using some form of mental simulation. This is, in fact, Hammond’s idea of a cognitive continuum that was put forth nearly forty years ago. (See post Cognitive Continuum.) At that time I wondered whether Hammond’s idea could be related to actual brain function, and it now seems clear that it can.
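The switching logic can be sketched in Python, with a loud caveat: the scoring formula below is invented for illustration and is not the one used in the Mixed Instrumental Controller. It only respects the two ingredients named above, so that the value of information rises with uncertainty and rises when the best cached values are close together. The function names, the threshold, and the simulate callback are all hypothetical.

```python
def value_of_information(cached_values, uncertainty):
    """Illustrative score only (not the MIC's actual formula).

    High when uncertainty is high and the two best cached values are
    nearly tied. Expects at least two candidate actions.
    """
    best, second = sorted(cached_values.values(), reverse=True)[:2]
    gap = best - second
    return uncertainty / (gap + 1e-6)  # small constant avoids division by zero


def choose_action(cached_values, uncertainty, simulate, threshold=1.0):
    """Pick an action from cached values, simulating only when it pays.

    simulate is a caller-supplied function that returns a fresh value
    estimate for an action (a stand-in for costly mental simulation).
    """
    if value_of_information(cached_values, uncertainty) > threshold:
        # Model-based route: replace the cached values with simulated ones.
        updated = {action: simulate(action) for action in cached_values}
        return max(updated, key=updated.get), "model-based"
    # Model-free route: trust the habit.
    return max(cached_values, key=cached_values.get), "model-free"


def imagined(action):
    """Hypothetical simulation results for the example below."""
    return {"left": 0.3, "right": 0.9}[action]


print(choose_action({"left": 1.0, "right": 0.2}, 0.1, imagined))
# ('left', 'model-free')
print(choose_action({"left": 1.0, "right": 0.95}, 0.5, imagined))
# ('right', 'model-based')
```

In the first call the habit is clear-cut and uncertainty is low, so the cached choice is used as is. In the second the cached values are nearly tied and uncertainty is higher, so the costly simulation is run and ends up overturning the habit.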

Clark makes the point that the conceptualizations of System 1 with its fast, automatic response and System 2 with its slow effortful deliberative reasoning look increasingly shallow. These are now just convenient labels for different admixtures of resource and influence, each of which is recruited in the same general way as circumstances dictate.

Clark doubts that there are distinct model-based and model-free systems that could somehow be found in neural subsystems. Clark suggests that the way to think about this in the predictive processing perspective is by associating model-free responses with processing dominated by bottom-up sensory flow, while model-based responses are those that involve greater and more widespread kinds of top-down influence. The context-dependent balancing between these two sources of information, achieved by adjusting the precision-weighting of prediction error, then allows for whatever admixtures of strategy task and circumstances dictate. Clark notes that this allows model-free modes to use model-based ones to teach them how to respond. Within the predictive processing framework, this results in a hierarchical embedding of the shallow model-free responses in a deeper model-based economy.

Humans have been able to shape our environment in ways that both cue and help constitute low-cost routes to success. We build better worlds to predict in. The downside is that these culturally guided processes create path dependence so that later solutions build on earlier ones. Maybe the QWERTY keyboard is an example. Clark suggests that the downside is small, and that many of our capacities for cultural learning are themselves cultural innovations acquired from social interactions. Reading and writing are prime examples. Language seems to play a double role: words function as communicative vehicles, but they also play a role in the unfolding of our own thoughts and ideas. Words make a difference. Hearing the word “dog” has been shown to be better than simply hearing a barking sound as a means of improving performance in a related discrimination task. Words are able to tune neuronal populations by shifting the category sensitivity in a particular direction.

Clark notes that the predictive brain is not doomed to deploy high-cost, model rich strategies moment-by-moment in a demanding and time-pressured world. Instead, that very same apparatus supports the learning and contextually determined deployment of low cost strategies that make the most of body, world, and action. Clark cites the example of painting white lines along the edges of a winding cliff-top road. This environmental alteration allows the driver to solve the complex problem of keeping the car on the road by (in part) predicting the ebb and flow of various simpler optical features and cues. In such cases, we are building a better world in which to predict, while simultaneously structuring the world to cue the low-cost strategy at the right time.

Clark, Andy (2016). Surfing Uncertainty: Prediction, Action, and the Embodied Mind, Oxford University Press, New York. “Lazy Predictive Brain”.