This post is based on the paper: “Priors in perception: Top-down modulation, Bayesian
perceptual learning rate, and prediction error
minimization,” authored by Jakob Hohwy (see post Explaining Away), which appeared (or is scheduled to appear) in Consciousness and Cognition, 2017. Hohwy writes in an understandable manner and is open enough to post papers even before they are complete; this is one such example. Hohwy pursues the idea of cognitive penetration – the notion that beliefs can determine perception.
Can ‘high level’ or ‘cognitive’ beliefs modulate perception? Hohwy methodically examines this question by trying to create the conditions under which it might work and not be trivial. Under standard Bayesian inference, the learning rate declines gradually as evidence is accumulated and the prior is updated to be ever more accurate. The more you already know, the less you will learn from the world. In a changing world this is not optimal: when things in the environment change, the learning rate should change too. Hohwy provides this example. As the ambient light conditions improve, the learning rate for detecting a visible target should increase (since the samples, and therefore the prediction error, have better precision in better light). This means Bayesian perceptual inference needs a tool for regulating the learning rate. The inferential system should build expectations for the variability in lighting conditions throughout the day, so that the learning rate in visual detection tasks can be regulated up and down accordingly.
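To make the learning-rate idea concrete, here is a minimal sketch (my illustration, not code from the paper) of conjugate Gaussian belief updating, where the learning rate is the familiar Kalman-style gain:

```python
# Sketch (my illustration): Bayesian updating of a Gaussian belief.
# The "learning rate" is the gain: how far the posterior mean moves
# toward each new sample.

def update(mu, prior_prec, sample, sample_prec):
    """One conjugate Gaussian update; returns (new_mu, new_prec, gain)."""
    gain = sample_prec / (prior_prec + sample_prec)   # learning rate
    new_mu = mu + gain * (sample - mu)                # precision-weighted prediction error
    return new_mu, prior_prec + sample_prec, gain

mu, prec = 0.0, 1.0
gains = []
for sample in [2.0, 1.8, 2.2, 1.9, 2.1]:
    mu, prec, g = update(mu, prec, sample, sample_prec=1.0)
    gains.append(g)
# As evidence accumulates the prior gets more precise and the gain shrinks:
assert gains[0] > gains[1] > gains[2]

# "Better light" = more precise samples = a higher learning rate:
_, _, g_dim    = update(0.0, 1.0, 2.0, sample_prec=0.5)
_, _, g_bright = update(0.0, 1.0, 2.0, sample_prec=4.0)
assert g_bright > g_dim
```

The gain falls as 1/2, 1/3, 1/4, … under steady evidence, which is exactly the "the more you know, the less you learn" pattern described above.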
The human brain is thus hypothesized to build up a vast hierarchy of expectations that together regulate the learning rate and thereby optimize perceptual inference for a world that delivers changeable sensory input. Hohwy suggests that this makes the brain a hierarchical filter that takes the non-linear time series of sensory input and seeks to filter out regularities at different time scales. Taking the distributions in question to be normal (Gaussian), the brain is considered a hierarchical Gaussian filter, or HGF.
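A drastically simplified, HGF-flavored sketch (my own simplification, not the full hierarchical Gaussian filter) shows how an expectation of volatility at a higher level keeps the lower-level learning rate from collapsing:

```python
# Sketch (my simplification, not the full HGF): before each update the
# prior variance is inflated by the expected environmental volatility.
# More expected volatility -> the prior stays uncertain -> the learning
# rate stays high.

def filter_step(mu, var, sample, obs_var, volatility):
    var = var + volatility                 # higher level: expected change inflates uncertainty
    gain = var / (var + obs_var)           # lower level: learning rate
    mu = mu + gain * (sample - mu)
    var = (1 - gain) * var
    return mu, var, gain

def final_gain(samples, volatility):
    mu, var, gain = 0.0, 1.0, 0.0
    for s in samples:
        mu, var, gain = filter_step(mu, var, s, obs_var=1.0, volatility=volatility)
    return gain  # learning rate after all the evidence is in

stable   = final_gain([1.0] * 20, volatility=0.01)
volatile = final_gain([1.0] * 20, volatility=1.0)
assert volatile > stable   # expected volatility keeps the learning rate up
```

With low expected volatility the gain settles near 0.1; with high expected volatility it settles above 0.6, so the system never stops learning from new input.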
In some visual illusions, such as the Müller-Lyer, highly precise priors for typical line lengths may lead to a very low learning rate, making the system perceive falsely. Interpreted in terms of hierarchical inference, a Müller-Lyer stimulus gives rise to unexpected uncertainty (i.e., equal line lengths in spite of inwards and outwards wings). Depending on how high-level the involved priors are, Hohwy believes it is not unreasonable to classify this case of top-down modulation as a mild type of cognitive penetration, even if in the vast majority of cases the same priors lead to true perception of line lengths.
Individuals also differ. Hohwy says that it has long been argued that individuals with schizophrenia have a higher learning rate than healthy individuals in simple urn tasks where the priors and likelihoods would appear to be the same for all participants. There is also evidence that individuals with autism spectrum disorder are less prone to experience some visual illusions, suggesting individual variability in cases that pertain to cognitive penetration.
Long term, hierarchical priors for means and precisions are filtered from the environment. Some of these priors may influence even simple tasks, like picking marbles from urns. It could be that individuals with schizophrenia expect more volatility in the environment than healthy individuals. This would make them less likely to sit back and passively accumulate evidence for or against a particular hypothesis since the prior for volatility says the world will change rapidly. Hence they might jump to conclusions in a way that is consistent with Bayes’ rule, given their priors.
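A toy urn simulation (my illustration; the paper reports no such code) makes the point: a prior for volatility acts like a leak on accumulated evidence, so a decision threshold is reached on fewer draws:

```python
# Sketch (my illustration): an urn task where each draw is red (1) or
# blue (0). A prior that the urn's composition may change acts as a
# leak on old evidence, so recent draws dominate and the agent crosses
# the decision threshold sooner -- "jumping to conclusions" in a way
# that is still Bayes-consistent given its priors.

def draws_to_decide(draws, leak, threshold=0.85):
    """Leaky beta-binomial: number of draws until P(red-biased) crosses threshold."""
    red, blue = 1.0, 1.0                                  # Beta(1, 1) prior counts
    for n, d in enumerate(draws, start=1):
        red, blue = red * (1 - leak), blue * (1 - leak)   # volatility prior leaks old evidence
        if d:
            red += 1
        else:
            blue += 1
        if red / (red + blue) >= threshold:
            return n
    return None

draws = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
stable_agent   = draws_to_decide(draws, leak=0.0)
volatile_agent = draws_to_decide(draws, leak=0.5)
assert volatile_agent < stable_agent   # higher expected volatility -> earlier decision
```

The leak-free agent needs a dozen draws to commit; the high-volatility agent commits after two, on the very same evidence.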
According to Hohwy, even if the brain does not literally know Bayes’ rule, it is the kind of system that has the mechanics to minimize prediction error. The brain’s models of the world can be used to predict sensory input and to keep the difference between prediction and input – the prediction error – as low as possible. The main challenge for the prediction error minimizing brain is to manage the time frame over which prediction error minimization should occur. Getting rid of too much error over a short period might overfit the models, making them prone to increased prediction error in the long run. This is equivalent to having too high a learning rate: the modeling is swayed by mere variability. Conversely, getting rid of too little error in the short run would suggest underfitted models, which may blind the prediction error minimizing system to important real patterns in the environment. This is equivalent to having too low a learning rate: the model is not sensitive enough to the evidence.
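The overfitting/underfitting trade-off can be mimicked with a simple exponential moving average (my illustration, not the paper's), where the smoothing constant plays the role of the learning rate:

```python
# Sketch (my illustration): an exponential moving average with learning
# rate alpha. Too high an alpha chases the noise (overfitting); too low
# an alpha misses a real shift in the signal (underfitting). The
# deterministic "noise" pattern keeps the example reproducible.

def tracking_error(alpha, signal):
    est, sq_err = 0.0, 0.0
    for x in signal:
        sq_err += (x - est) ** 2
        est += alpha * (x - est)      # learning-rate-weighted prediction error
    return sq_err / len(signal)

noise = [0.5, -0.5] * 25                # pure variability around a stable mean of 0
shift = [0.0] * 25 + [5.0] * 25         # a real change in the world
assert tracking_error(0.9, noise) > tracking_error(0.1, noise)   # high alpha overfits noise
assert tracking_error(0.1, shift) > tracking_error(0.9, shift)   # low alpha underfits change
```

Neither extreme wins on both signals, which is exactly the time-frame problem: the right learning rate depends on whether the residual error is mere variability or a real pattern.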
The route to long term prediction error minimization can be ‘bumpy’. The prediction error minimization system will have expectations for its long term average prediction error bound and can only try various strategies for staying within those bounds.
As shown in Figure 1, the brain minimizes prediction error in the long run and thereby approximates hierarchical Bayesian inference. This yields a useful account in terms of fluctuations around a varying learning rate, such that cognitive penetrability can be individually variable, can sometimes give false answers, and can be mediated by high-level belief.
Hohwy explains that attention can increase the weight on prediction error and thereby counteract a decreasing learning rate, which in turn diminishes cognitive penetration; conversely, when attention is withdrawn from some sensory input the learning rate for the relevant inference drops and penetration increases. Perceptual inference is always guided by expected precisions, so attention always plays a role alongside expectation in modulating the learning rate.
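One common way to model this (again my sketch, not the paper's) is to let attention multiply the expected precision of the attended input, which directly raises the gain on the prediction error:

```python
# Sketch (my illustration): attention modeled as a multiplier on the
# expected precision of the attended input. Boosting that precision
# raises the weight on the prediction error, counteracting a learning
# rate that accumulated prior evidence has driven down; withdrawing
# attention lowers it further.

def gain(prior_prec, sensory_prec, attention=1.0):
    weighted = attention * sensory_prec   # attention up-weights expected sensory precision
    return weighted / (prior_prec + weighted)

strong_prior = 9.0                                       # much evidence already accumulated
assert gain(strong_prior, 1.0) == 0.1                    # low learning rate
assert gain(strong_prior, 1.0, attention=9.0) == 0.5     # attention restores it
assert gain(strong_prior, 1.0, attention=0.25) < 0.1     # withdrawn attention lowers it
```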
Expectation and attention also interact with adaptation. Hohwy writes that adaptation is the expectation that the world is not unchanging, so that prior probabilities will tend to drop in anticipation of new events in the environment. Hence, even if attention is increasing the accumulation of evidence for a given prior hypothesis, adaptation will flatten that prior regardless. Similarly, attention can counteract the normal course of adaptation. For example, there may be more cognitive penetrability as we enter a new context, like a new job, where we don’t yet know how to allocate attention or how fast things tend to change.
Often, understanding is compromised, namely when we don’t know which hypothesis to test. Hohwy refers to the rubber hand illusion. There are two hypotheses: either the touch I can feel is located on the rubber hand I can see being touched, or the touch I can feel is in a different location and the coincidence between what is felt and what is seen is explained by a more deeply hidden common cause (such as the experimenter tapping on the real and the rubber hand). Most people will eventually select the first hypothesis and experience the illusion. The idea is that visual input is expected to be highly precise and therefore is given high weighting in inference.
Individuals who experience the illusion with a delay relative to most others appear in fact to display cognitive penetrability from the high-level belief that touch cannot be felt on rubber limbs.
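The precision-weighted cue-combination account behind the illusion can be sketched as follows (hypothetical numbers; my illustration):

```python
# Sketch (my illustration, with made-up numbers): precision-weighted
# fusion of the seen and the felt touch location along the table. When
# vision is expected to be much more precise than proprioception, the
# fused estimate lands near the rubber hand -- the illusion.

def fuse(x_vis, prec_vis, x_touch, prec_touch):
    """Optimal combination of two Gaussian location cues."""
    return (prec_vis * x_vis + prec_touch * x_touch) / (prec_vis + prec_touch)

rubber_hand, real_hand = 0.0, 20.0   # positions in cm (hypothetical)
felt = fuse(x_vis=rubber_hand, prec_vis=9.0, x_touch=real_hand, prec_touch=1.0)
assert felt == 2.0   # the felt touch is pulled 90% of the way to the rubber hand
```

A strong high-level prior against feeling touch on rubber limbs would act like extra precision on the non-illusion hypothesis, delaying the shift of the felt location, which fits the individual differences just mentioned.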
Hohwy then adds one more factor to consider, namely action, which is everywhere in all but the
most constrained experimental conditions. From the perspective of prediction error minimization, action approximates Bayesian inference. The reason is simple: action occurs when a hypothesis is assumed and prediction error minimized relative to it. For example, if you
hypothesize that a sound is right in front of you when it is heard to come from the right, then the resulting prediction error may be minimized by turning the head rightwards. Since minimization of prediction error approximates Bayesian inference, action is a kind of inference, or ‘active inference’. The learning rate in active inference is essentially zero since the prior is not
updated at all in the light of the current sensory input. In fact, to ensure the learning rate is zero so that action can occur, the current sensory input is attenuated; this explains, for example, our inability to tickle ourselves.
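Active inference can be caricatured (my sketch, with made-up parameters) as gradient descent on the action rather than on the belief:

```python
# Sketch (my illustration): the belief is frozen (learning rate zero)
# while the action -- head direction -- is adjusted to reduce the
# prediction error. Error falls because the world is made to match the
# prediction, not the other way around.

def active_inference(belief_angle, sound_angle, steps=50, step_size=0.2):
    head = 0.0                                    # head direction in degrees
    for _ in range(steps):
        predicted = belief_angle                  # "the sound is straight ahead"
        sensed = sound_angle - head               # sound angle relative to the head
        error = sensed - predicted
        head += step_size * error                 # act on the world; belief untouched
    return head, abs(sound_angle - head - belief_angle)

head, residual = active_inference(belief_angle=0.0, sound_angle=30.0)
assert residual < 1e-3 and abs(head - 30.0) < 1e-3   # head turned to face the sound
```

The prediction error vanishes even though the prior was never updated, which is the sense in which the learning rate in active inference is essentially zero.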
Cognitive penetration should be expected to happen primarily when there is heightened uncertainty: when prior learning and the current input are not particularly informative, there will be more unconstrained exploration of the prediction error landscape leading to higher probability of deviations from the learning rate. In addition, deviations should occur more when there is a relatively strong expectation of change (cf. high adaptation), which again predicts more exploration of alternate hypotheses. In contrast, deviations and thereby cognitive penetration will be more restricted under low uncertainty and with little expectation of change (e.g., under strong attentional demands).
I note that cognitive penetration falls outside decision making as normally conceived. But I see cognitive beliefs as largely conscious, modulating perception, which is typically subconscious. This seems a close parallel to the idea that analytic thought modulates intuitive thought. The learning rate seems closely allied with parameter p or k in parallel constraint satisfaction theory (see post The Fog of the Blog), although there might be multiple individual parameters operating at different levels of the hierarchy.