I mentioned this in my last post and could not resist following up. It is based on a 2009 paper by Herzog and Hertwig, “The Wisdom of Many in One Mind: Improving Individual Judgments With Dialectical Bootstrapping.” How can a set of individually mediocre estimates become superior when averaged? The secret is a statistical fact that, although well known in measurement theory, has implications that are often not intuitively evident. A subjective quantitative estimate can be expressed as an additive function of three components: the truth (the true value of the estimated quantity), random error (random fluctuations in the judge’s performance), and systematic error (the judge’s systematic tendency to over- or underestimate the true value). Averaging estimates increases accuracy in two ways: it cancels out random error, and it can reduce systematic error. This reminds me of Scott Page’s diversity prediction theorem, which states that the crowd’s squared error equals the average individual squared error minus the diversity of the estimates. I expect to look at systematic error and diversity in future posts, but for now: how can we conduct a dialogue with ourselves and improve our predictions?
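Page’s theorem is an exact algebraic identity, so it is easy to check for yourself. Here is a minimal sketch with made-up numbers (the estimates and true value below are my own illustration, not data from the paper):

```python
# Page's diversity prediction theorem:
#   (crowd error) = (average individual error) - (diversity)
# where errors are squared deviations from the truth and diversity
# is the variance of the estimates around their own mean.
truth = 100.0
estimates = [110.0, 80.0, 95.0, 130.0]  # hypothetical judges

crowd = sum(estimates) / len(estimates)  # the averaged estimate
crowd_error = (crowd - truth) ** 2
avg_error = sum((e - truth) ** 2 for e in estimates) / len(estimates)
diversity = sum((e - crowd) ** 2 for e in estimates) / len(estimates)

print(crowd_error)             # squared error of the crowd's average
print(avg_error - diversity)   # identical, by the theorem
```

Notice that the crowd’s error can only be as bad as the average individual error, and any spread among the judges (diversity) pulls it below that.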
This can be illustrated using the concept of bracketing. If two estimates fall on the same side of the truth (i.e., do not “bracket” the true value), averaging them will be as accurate, on average, as randomly choosing one estimate. But if two estimates bracket the true value (one overestimates it and the other underestimates it), averaging the two will yield a smaller absolute error than randomly choosing one of the estimates. Assume that the true value is 100, and two judges estimate it to be 110 and 120, erring by 10 and 20 units, respectively. Randomly choosing between their estimates gives an expected absolute error of 15, and averaging them yields 115, which is also off by 15. Now assume that the second judge’s estimate is 80 rather than 120. The two estimates have the same absolute errors as before, but they now lie on opposite sides of the true value. Because the second estimate still errs by 20 units, choosing randomly between the two again gives an expected absolute error of 15. Averaging them, however, gives 95, an error of only 5 units! Averaging therefore dominates the strategy of choosing randomly between two estimates: without bracketing, averaging matches random choice, and with bracketing, averaging beats it. Bracketing can arise from random error or from differing systematic errors. Consequently, a low correlation among the errors of a set of judges virtually guarantees some bracketing, making the average estimate more accurate than an estimate from a randomly selected judge.
Herzog and Hertwig’s thesis is that it is possible to reduce estimation error within one person by averaging his or her first estimate with a dialectical second estimate that is at odds with the first one. How can one elicit a dialectical estimate that is likely to fall in the gain range (the range of values for which averaging improves on the first estimate)? Herzog and Hertwig propose that any technique prompting people to generate the dialectical estimate from knowledge at least partly different from the knowledge behind the first estimate can suffice. Retrieving different but plausible information makes it likely that the second estimate will be accurate enough to fall inside the gain range and that its error will differ from that of the first estimate, perhaps even falling on the opposite side of the true value and producing bracketing. This proposal builds on insights from debiasing research, in which people are prompted to consider knowledge that was previously overlooked, ignored, or deemed inconsistent with current beliefs, for example by asking them to think of reasons why their first judgment might be wrong.
Is dialectical bootstrapping more than a theoretical possibility, and if so, how well does it work? The authors examined these questions in an empirical study in which participants first gave estimates in response to a set of questions and then generated dialectical estimates. The 105 participants were students at the University of Basel, each randomly assigned to one of two conditions. In both conditions, participants first generated their estimates without knowing that they would later be asked to generate a second estimate. In the dialectical-bootstrapping condition, participants were then asked to give dialectical estimates:
“First, assume that your first estimate is off the mark. Second, think about a few reasons why that could be. Which assumptions and considerations could have been wrong? Third, what do these new considerations imply? Was the first estimate rather too high or too low? Fourth, based on this new perspective, make a second, alternative estimate.”
Figure 2 summarizes the results. “Repeated” averages the same person’s second estimate made without the dialectical prompt above, and “other person’s” averages in the estimate of another randomly chosen participant. As you can see, dialectical bootstrapping improved accuracy by about 4%.
Part of the wisdom of the many resides in an individual mind. Are there ways of improving on mere reliability gains apart from the consider-the-opposite strategy employed by Herzog and Hertwig? They suggest that increasing the time delay between two repeated estimates also boosts the gains produced by averaging. Dialectical bootstrapping is a simple mental tool that fosters accuracy by leveraging people’s capacity to construct conflicting realities. The authors do not claim that people spontaneously make use of this tool. Rather, they suggest that, after learning to generate good dialectical estimates based on different knowledge, anyone can benefit from dialectical bootstrapping. Although limited to the domain of quantitative estimates and predictions, this mental tool has a versatility stemming from its general statistical rationale.
Herzog, S. M., & Hertwig, R. (2009). The wisdom of many in one mind: Improving individual judgments with dialectical bootstrapping. Psychological Science, 20(2), 231–237.