Understanding or inferring the intentions, feelings and beliefs of others is a hallmark of human social cognition, often referred to as having a Theory of Mind (ToM). ToM has been described as a cognitive ability to infer the intentions and beliefs of others through processing of their physical appearance, clothes, and bodily and facial expressions. Of course, the repertoire of hypotheses available to our ToM is borrowed from the hypotheses that cause our own behavior.
But how can processing of internal visceral/autonomic information (interoception) contribute to the understanding of others’ intentions? The authors consider interoceptive inference as a special case of active inference. Friston (see post Prediction Error Minimization) has theorized that the goal of the brain is to minimize prediction error, and that this can be achieved both by changing predictions to match the observed data and, via action, by changing the sensory input to match predictions. When you drop the knife and then catch it with the other hand, you are using active inference.
Prediction errors are the difference between sensations and predictions of those sensations. For example, perception of a surprising object is associated with an attempt to suppress visual prediction error. Action, on the other hand, minimizes prediction error by directly altering sensory inputs through movement and visceral control that fulfil proprioceptive and interoceptive predictions. Ondobaka et al note that the intensity and frequency of on-going contractions of the heart muscle can be modulated to suppress the interoceptive prediction error signalling surprising interoceptive states (e.g. those related to blood pressure). This reflexive suppression of interoceptive prediction error corresponds to autonomic reflexes mediated by smooth muscles.
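The two routes described above can be illustrated with a toy numerical sketch. This is not the authors’ model; it is a minimal scalar predictive-coding caricature, and all names, learning rates and step counts here are illustrative assumptions. Both functions shrink the same error term, but perception does so by revising the prediction, while action does so by changing the sensation.

```python
# Toy sketch (illustrative only): two routes to minimizing a scalar
# prediction error, error = sensation - prediction.

def perceptual_inference(pred, sense, lr=0.5, steps=20):
    """Perception: revise the prediction toward the sensation."""
    for _ in range(steps):
        error = sense - pred
        pred += lr * error          # update the belief to explain the input
    return pred

def active_inference(pred, sense, lr=0.5, steps=20):
    """Action: change the sensory input to fulfil the prediction."""
    for _ in range(steps):
        error = sense - pred
        sense -= lr * error         # act on the world/body instead
    return sense

# Both routes drive the same error toward zero, from opposite ends:
belief = perceptual_inference(pred=0.0, sense=1.0)
state = active_inference(pred=0.0, sense=1.0)
print(round(belief, 3), round(state, 3))  # prints: 1.0 0.0
```

In the perceptual route the belief moves to match the world; in the active route the world (here, a visceral variable such as heart rate) is moved to match the belief, which is the sense in which autonomic reflexes “fulfil” interoceptive predictions.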
From the active inference perspective, knowing the contents of another’s mind can be cast as finding an optimal explanation for perceived (motor and visceral) behavior in others – the behavior that we ourselves would have produced, had we been in the same intentional and emotional state. The authors state that deep generative models permit inferences at multiple levels. For example, inferring the interoceptive (visceral) states of another necessarily constrains the hypothesis space of plausible explanations for their current behavior (e.g. she wants to go back indoors because she’s cold). An observer’s predictions of their own interoceptive states that cause a feeling of being cold play a principal role in understanding this feeling when seeing someone shiver. This follows because interoceptive predictions are generated by the observer’s deep hierarchical model, which concurrently generates exteroceptive predictions of seeing someone shiver. In this instance, the observation of shivering may induce an interoceptive or emotional contagion and empathy – implying that an observer can also be sympathetic to another’s desires and intentions (e.g. to go back into the warmth).
Compared to proprioception and exteroception, interoceptive sensations have a low degree of spatiotemporal acuity and do not typically reach conscious awareness. For example, it is difficult to localize a stomach ache in space and time. Similarly, there is little conscious access to the peristaltic contractions of the transverse colon or to the status of renal function. These low levels of acuity or resolution are a direct consequence of the generative models we have inherited to predict the continuously changing interoceptive signals reported by a large variety of receptors. Interoceptive–exteroceptive correspondence is rather poor, making the direct mapping of others’ interoceptive states to our own interoception more challenging.
Interoceptive states inform and contextualize behavior by biasing perception and action towards fulfilling the organism’s physiological needs. Some research has proposed a fundamental role for interoception in body ownership, emotion and selfhood. The ability to recognize others’ emotions from facial expressions is correlated with perceivers’ ability to report their own interoceptive states. For interoceptive cues, perspective taking is difficult and sometimes perhaps impossible; for example, we cannot see our own pupils dilate. This suggests that the emotional and intentional theory of mind has to be learned through interpersonal interactions, probably at an early stage of development in which attachments are formed.
During social inference, ToM regions combine interoceptive information with proprioceptive and exteroceptive signals. Whereas the effects of one’s interoceptive states play a crucial role in behavior, there is no direct one-to-one mapping of interoception onto proprioceptive and exteroceptive states. Interoceptive states are instead mapped, together with exteroceptive and proprioceptive states, onto multimodal constructs in a hierarchical fashion (Fig. 1). This means that each modality contextualizes the others, through ascending prediction errors and the resulting updates at deep (higher, conceptual) levels of the hierarchical model. For example, to maintain an expected heart rate, many (probabilistic) mappings to exteroceptive states (objects) and proprioceptive states (movements) could exist. The fundamental advantage offered by these mappings is the potential to engage in simulation or counterfactual inference.
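The idea that modalities contextualize each other via a shared deep cause can be sketched numerically. The following is a deliberately crude caricature, not the paper’s model: a single latent cause (say, “being cold”) linearly generates predictions in two modalities, and a prediction error in one modality (seeing shivering) updates the shared cause, which in turn changes the prediction in the other modality (feeling cold). The weights, learning rate and variable names are all assumptions for illustration.

```python
# Toy sketch (illustrative only): one deep latent cause generates
# predictions in two modalities; an error in one modality updates the
# shared cause, which changes predictions in the other modality.

W_EXT = 1.0   # cause -> exteroceptive prediction (e.g. seeing shivering)
W_INT = 0.8   # cause -> interoceptive prediction (e.g. feeling cold)

def update_cause(cause, ext_obs, int_obs, lr=0.1, steps=100):
    """Gradient descent on the summed squared prediction error."""
    for _ in range(steps):
        ext_err = ext_obs - W_EXT * cause   # ascending exteroceptive error
        int_err = int_obs - W_INT * cause   # ascending interoceptive error
        cause += lr * (W_EXT * ext_err + W_INT * int_err)
    return cause

# Strong exteroceptive evidence (shivering seen) pulls the shared cause
# up, which raises the interoceptive prediction "cold" as a side effect:
cause = update_cause(cause=0.0, ext_obs=1.0, int_obs=0.0)
print(round(W_INT * cause, 2))  # interoceptive prediction induced by vision
```

Because both modalities hang off the same deep node, evidence arriving in one channel necessarily reshapes the predictions in the other, which is the sense in which seeing someone shiver can induce an interoceptive expectation of cold in the observer.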
In summary, Ondobaka et al suggest that inference about, or understanding of, the models that cause another’s behavior in ToM may be mediated by the joint minimization of hierarchical prediction errors elicited by unexpected sensations. Multimodal expectations induced at deep (high) hierarchical levels – which necessarily entail interoceptive predictions – should play a fundamental role in inferring the causes of sensory impressions produced by the behavior of other people.