This post is based on a draft dated July 10, 2015, “Learning in Dynamic Probabilistic Environments: A Parallel-constraint Satisfaction Network-model Approach,” written by Marc Jekel, Andreas Glöckner, & Arndt Bröder. The paper includes experiments that contrast Parallel Constraint Satisfaction with the Adaptive Toolbox Approach. I have chosen to look only at the update of the PCS model with learning. The authors develop an integrative model for decision making and learning by extending previous work on parallel constraint satisfaction networks with algorithms of backward error-propagation learning. The Parallel Constraint Satisfaction Theory for Decision Making and Learning (PCS-DM-L) conceptualizes decision making as a process of coherence structuring in which learning is achieved by adjusting network weights from one decision to the next. PCS-DM-L predicts that individuals adapt to the environment by gradual changes in cue weighting.

Research using the Brunswikian Lens Model, in which the structure of the environment and cue usage by individuals are investigated simultaneously, has shown that the accuracy of judgments in various domains is fairly high. Judgmental accuracy can be increased by providing repeated feedback concerning the accuracy of a judgment (outcome feedback) in relatively simple tasks, but less so in complex and uncertain tasks. Providing direct information about cue-criterion relations (task information feedback) is more efficient than outcome feedback, but from a practical perspective it will not be available in many real-world situations. People’s capacity for deliberate cognitive calculations is limited, and it is therefore assumed that people look for good-enough solutions instead of hopelessly searching for the best. Inferring criteria from probabilistic cues is already a difficult task. Individuals have to (explicitly or implicitly) know what the relevant cues are and how well they predict the criterion, they have to search for cues in memory or the environment, and they have to integrate them to determine the criterion and to select one of the available options. Importantly, in natural environments decision making does not end at this stage. People are involved in a continuous stream of inferences and actions in which feedback on (and consequences of) previous actions is used to reduce the gap between reality and predictions in order to improve the quality of subsequent choices (Prediction Machine). In short, to improve over time, individuals have to learn.

In the most recent implementation of PCS for decision making, it is assumed that the default mode of cue integration is automatic and parallel. Deliberate processes only intervene if the parallel processes of cue integration do not lead to a sufficiently coherent interpretation. Even then, the conscious processes are usually called on only to gather more information. This approach assumes that validities for the cues are updated by feedback, thus changing weights in the network structure. The integration mechanism for the information rests on spreading activation and does not change through learning. A model of learning in probabilistic decision making needs to address how people learn probabilistic cue-criterion relations between decision trials based on feedback, how people make decisions in each trial based on those learned relations, and how both processes relate to each other. The authors extend PCS by adding a modified Delta rule for learning cue-criterion relations between trials.

As shown in Figure 1 above, the network model consists of nodes that are interconnected. Cues are represented in the middle layer of the network as nodes. Choice options are represented in the upper layer as nodes. From a general source node, constant activation spreads into the network by an iterative updating algorithm that simulates the decision process. In total there are N = L + K + 1 nodes in the model in three layers. The cue pattern of trial t is represented as weights attached to the bi-directional connections between K = 2 options and L = 5 or L = 6 cues. The property of bi-directional links distinguishes PCS from other prominent network models and leads to distinct predictions concerning coherence effects in the process of decision making. Weights can be excitatory (positive) or inhibitory (negative): a weight between a cue and an option receives a value of +.01 when the cue speaks for the option and a value of -.01 when it speaks against the option. The subjective validities of the cues are represented as weights attached to the connections between the source node and the cue nodes. Validity weights are positive when the presence of a cue for an option is associated with a higher likelihood of being the better option and negative when the absence is associated with a higher likelihood. The option nodes are connected to each other with negative (inhibitory) weights.
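To make the spreading-activation idea concrete, here is a minimal sketch of such a three-layer network in Python. Only the +.01/-.01 cue-option weights, the constant source node, and the inhibitory link between options come from the description above. The update function, the decay of 0.1, the inhibition weight of -0.2, the activation bounds of -1 and +1, and the stopping tolerance are my assumptions, chosen in the spirit of interactive-activation networks. They are not taken from the paper.

```python
def run_pcs(cue_pattern, validity_weights, decay=0.1, inhibition=-0.2,
            floor=-1.0, ceiling=1.0, tol=1e-6, max_iter=1000):
    """Spread activation until the network settles (illustrative sketch).

    cue_pattern[k][o] is +0.01 if cue k speaks for option o, -0.01 if against.
    validity_weights[k] is the source-to-cue weight for cue k.
    Returns the final option activations and the number of iterations used.
    """
    n_cues, n_opts = len(cue_pattern), len(cue_pattern[0])
    cues = [0.0] * n_cues
    opts = [0.0] * n_opts
    source = 1.0  # the source node has constant activation

    def update(a, net):
        # Assumed rule: decay the old activation, then move toward the
        # ceiling for positive input or toward the floor for negative input.
        if net >= 0:
            new = a * (1 - decay) + net * (ceiling - a)
        else:
            new = a * (1 - decay) + net * (a - floor)
        return max(floor, min(ceiling, new))

    for iteration in range(1, max_iter + 1):
        # Links are bi-directional: cues receive input from the source
        # and from the options, options receive input from the cues.
        cue_in = [validity_weights[k] * source
                  + sum(cue_pattern[k][o] * opts[o] for o in range(n_opts))
                  for k in range(n_cues)]
        opt_in = [sum(cue_pattern[k][o] * cues[k] for k in range(n_cues))
                  + inhibition * sum(opts[p] for p in range(n_opts) if p != o)
                  for o in range(n_opts)]
        new_cues = [update(a, n) for a, n in zip(cues, cue_in)]
        new_opts = [update(a, n) for a, n in zip(opts, opt_in)]
        change = max(abs(x - y)
                     for x, y in zip(new_cues + new_opts, cues + opts))
        cues, opts = new_cues, new_opts
        if change < tol:  # the network has reached a stable state
            break
    return opts, iteration


# Five cues that all favor option 0, with descending validity weights.
pattern = [[0.01, -0.01] for _ in range(5)]
activations, n_iter = run_pcs(pattern, [0.5, 0.4, 0.3, 0.2, 0.1])
print(activations, n_iter)
```

The option with the higher final activation is the predicted choice, and the iteration count is the quantity that PCS uses as a stand-in for decision time.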

To add learning to the model, validity weights in the network are updated using a modified Delta rule based on the final activations of the option nodes. The modified Delta rule was developed by McClelland and Rumelhart as a learning rule. A larger error, expressed as the difference in activation at the final iteration of the decision process, results in a larger change of weights. The free parameter lambda moderates the impact of a single decision trial on the change of the weights and thus moderates the speed of learning. For example, cue 1 might initially speak against option 1. After the final iteration of the network, the activation of option 1 might be positive due to other cues. Through much calculus that is beyond my understanding, updating via the Delta rule results in a lower weight in this example, since a lower, ultimately negative, weight would still have produced a correct prediction for option 1.
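The paper's modified Delta rule is not reproduced in this post, so what follows is only a generic delta-rule sketch that behaves the way the paragraph above describes. The target activations of +1 for the correct option and -1 for the other, the use of +1/-1 cue signs, and the function name `delta_update` are all my assumptions for illustration.

```python
def delta_update(validity_weights, cue_signs, activations, correct_option,
                 lam=0.1):
    """Generic delta-rule step for the validity weights (illustrative only).

    cue_signs[k][o] is +1 if cue k spoke for option o, -1 if against.
    activations[o] is the final activation of option o in this trial.
    lam is the learning rate (lambda in the text).
    """
    # Assumed targets: +1 for the option that feedback marked as better.
    targets = [1.0 if o == correct_option else -1.0
               for o in range(len(activations))]
    errors = [t - a for t, a in zip(targets, activations)]
    updated = []
    for w, signs in zip(validity_weights, cue_signs):
        # A cue is strengthened when it pointed toward the missing
        # activation and weakened when it pointed away from it.
        adjustment = sum(e * s for e, s in zip(errors, signs))
        updated.append(w + lam * adjustment)
    return updated


# Cue 0 spoke against option 0, cue 1 spoke for it; option 0 was correct
# and ended the trial with a positive activation.
new_w = delta_update([0.2, 0.3], [[-1, 1], [1, -1]], [0.5, -0.5],
                     correct_option=0)
print(new_w)
```

In this toy trial the weight of cue 0 goes down and the weight of cue 1 goes up, which matches the example in the text. A larger gap between target and final activation, or a larger lambda, produces a larger change.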

Jekel et al. test two different algorithms that are based upon different psychologically plausible properties. One is based on the sensitivity to the strength of evidence. A high activation for one option and a low activation for the other option result from an interplay between an unambiguous cue pattern in a trial (e.g., most cues favor one option) and high net-weights for connections between the source node and validity nodes (i.e., a clear cue pattern does not necessarily maximize the difference between node activations when net-weights for validities are low or indistinct from each other). Those differences relate to observed confidence judgments in participants’ decisions. This model, **PCS Transformation**, also includes a free parameter gamma that accounts for individual differences in how differences in node activations map to choice probabilities. A decreasing gamma results in a decreasing sensitivity to the activations of the options in the network (e.g., for gamma = 0, probabilities for both options are .5, that is, a participant is insensitive to differences in node activations).
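One simple way to picture the role of gamma is a logistic function of the activation difference. The logistic form is my assumption, chosen because it gives .5 for gamma = 0 as described above. The post does not state the exact transformation the authors use.

```python
import math


def choice_probability(a_option1, a_option2, gamma):
    """Probability of choosing option 1 (assumed logistic transformation)."""
    return 1.0 / (1.0 + math.exp(-gamma * (a_option1 - a_option2)))


# The same activation difference under increasing sensitivity.
for gamma in (0.0, 1.0, 5.0):
    print(gamma, choice_probability(0.6, -0.6, gamma))
```

With gamma = 0 the function returns .5 regardless of the activations, and the probability of choosing the more active option rises toward 1 as gamma grows.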

The second algorithm is **PCS Noise**. It assumes that learning of cue validities, and thus updating of validity weights in PCS, is not deterministic but noisy. This approach shares the same Delta learning rule for updating net-weights with the other approach of implementing learning in PCS, but differs in how the deterministic predictions of PCS are transformed into probabilistic predictions: it assumes that updating of cue validities is partially probabilistic due to unsystematic error in the learning process. To simulate noisy updating of validity weights in PCS for trial t, a random number from a normal distribution is added to the validity weight after each trial.
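The sketch below shows how noise on the weights can turn a deterministic model into one that predicts choice probabilities. It is a simplification under my own assumptions: the noise is drawn fresh in every simulated run rather than accumulated over a learning history, and a plain weighted sum of cue differences stands in for the full PCS network. The function names are hypothetical.

```python
import random


def noisy_weights(validity_weights, sd, rng):
    """Add normally distributed noise (mean 0, given sd) to each weight."""
    return [w + rng.gauss(0.0, sd) for w in validity_weights]


def simulated_choice_probability(validity_weights, cue_diffs, sd,
                                 n_sim=2000, seed=0):
    """Share of simulated runs in which option 1 wins.

    cue_diffs[k] is +1 if cue k favors option 1, -1 if it favors option 2,
    and 0 if it does not discriminate. The weighted sum below is only a
    stand-in for running the network.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sim):
        weights = noisy_weights(validity_weights, sd, rng)
        evidence = sum(w * d for w, d in zip(weights, cue_diffs))
        wins += evidence > 0
    return wins / n_sim


# Without noise the prediction is deterministic; with noise it is graded.
print(simulated_choice_probability([0.3, 0.2], [1, -1], sd=0.0))
print(simulated_choice_probability([0.3, 0.2], [1, -1], sd=0.5))
```

A larger standard deviation pulls the predicted probability toward .5 when the weights are close together, which is one way to see why a low standard deviation goes along with better performance.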

To test the models, the authors used a hypothetical stock-market game. Participants were asked to select the more profitable of two stocks in a series of trials, each consisting of a new pair of stocks. Participants were provided with information from five experts (i.e., cues) who made recommendations concerning whether the expected profitability of the respective stock was good or bad (i.e., binary cues). The recommendations of each expert could differentiate between the two options or could be indifferent between options (i.e., both options recommended). Participants received information on the ranking of the experts according to their validity in the first round (i.e., expert A was the most valid cue, expert B the second most valid, and so on). The authors defined the better of two options for each trial as the more probable option in accordance with (naïve) Bayes given the cue pattern and the validities of the experts in the environment. After each decision, participants received feedback about the better option and could thereby update subjective validities for the five experts and/or the success of decision strategies. Three different experiments were run with tens of thousands of responses. The **PCS Noise** model was the most accurate in predicting responses, at 89%, and in predicting response times.
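The naïve Bayes criterion for the better option can be sketched as follows. I am assuming equal prior probabilities for the two stocks, conditionally independent experts, and a validity defined as the probability that an expert points to the better stock when the expert discriminates. The validity values in the example are made up for illustration and are not the ones used in the experiments.

```python
def bayes_prob_option_a(cue_diffs, validities):
    """Posterior probability that option A is the better option.

    cue_diffs[k] is +1 if expert k favors A, -1 if expert k favors B,
    and 0 if the expert does not differentiate between the options.
    validities[k] is the assumed probability that expert k is right
    when he or she does differentiate.
    """
    odds = 1.0  # equal priors for A and B (assumption)
    for d, v in zip(cue_diffs, validities):
        if d > 0:
            odds *= v / (1.0 - v)
        elif d < 0:
            odds *= (1.0 - v) / v
        # non-discriminating experts leave the odds unchanged
    return odds / (1.0 + odds)


# Expert 1 favors A, expert 2 favors B, expert 3 is indifferent.
p_a = bayes_prob_option_a([1, -1, 0], [0.8, 0.6, 0.7])
print(p_a)
```

Option A counts as the better option whenever this probability is above .5. Here the more valid expert outweighs the less valid one, so A is the better option.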

The models predict, and the experiments verify, that a higher learning rate in both PCS implementations leads to higher performance rates, while a high sensitivity gamma and a low standard deviation for error are significantly related to higher performance. Two further points stand out. First, both PCS models make different predictions for decision times based on the number of iterations: the higher the number of iterations, the slower the expected decision time. Second, **PCS Noise** differs from all other models in its tendency to scatter probabilities for predicted choices across the entire range of probabilities.

Results show that the assumption of powerful automatic cognitive processes that take many pieces of information into account can explain human learning of cue-criterion relations and integration well. Low decision times (i.e., between one and two seconds) and mostly incorrect rankings of cues according to validity further speak for automatic processes that resemble characteristics of intuition.