At My Real Job: Expert Forecasting
In the most comprehensive analysis of expert prediction ever conducted, Philip Tetlock assembled a group of some 280 anonymous volunteers—economists, political scientists, intelligence analysts, journalists—whose work involved forecasting to some degree or other. These experts were then asked about a wide array of subjects. Will inflation rise, fall, or stay the same? Will the presidential election be won by a Republican or Democrat? Will there be open war on the Korean peninsula? Time frames varied. So did the relative turbulence of the moment when the questions were asked, as the experiment went on for years. In all, the experts made some 28,000 predictions. Time passed, the accuracy of the predictions was determined, the data analyzed, and the average expert’s forecasts were revealed to be only slightly more accurate than random guessing—or, to put it more harshly, only a bit better than the proverbial dart-throwing chimpanzee. And the average expert performed slightly worse than a still more mindless competition: simple extrapolation algorithms that automatically predicted more of the same.
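To make those baselines concrete: forecast accuracy in studies like this is scored with the Brier score, the squared gap between the probability a forecaster stated and what actually happened, averaged over all of their predictions. Here is a toy sketch in Python (my own illustrative numbers, not Tetlock's data) of how a dart-throwing chimp, a "more of the same" extrapolation rule, and a boldly overconfident expert compare in a world where the status quo usually holds:

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared gap between stated probabilities and 0/1 outcomes.
    Lower is better; always saying 50% earns exactly 0.25."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))

rng = np.random.default_rng(0)
n = 1000

# Hypothetical yes/no questions where "yes" (the status quo holding) happens 70% of the time.
outcomes = (rng.random(n) < 0.7).astype(float)

chimp = np.full(n, 0.5)            # dart-throwing chimp: 50% on everything
extrapolation = np.full(n, 0.7)    # "more of the same": always predict the base rate
# A hedgehog-style expert: bold 95%/5% calls that are uncorrelated with reality.
bold_expert = np.where(rng.random(n) < 0.6, 0.95, 0.05)

for name, probs in [("chimp", chimp), ("extrapolation", extrapolation), ("bold expert", bold_expert)]:
    print(f"{name:14s} Brier = {brier_score(probs, outcomes):.3f}")
# Typical output: extrapolation (~0.21) edges out the chimp (0.250),
# and both comfortably beat the overconfident expert (~0.42).
```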
But this should not in any sense be read as a case for what-do-they-know populism. On one level, it’s a call for suspicion of ideologically driven predictions:
Cynics resonate to these results and sometimes cite them to justify a stance of populist know-nothingism. But we would be wrong to stop there, because Tetlock also discovered that the experts could be divided roughly into two overlapping yet statistically distinguishable groups. One group would actually have been beaten rather soundly even by the chimp, not to mention the more formidable extrapolation algorithm. The other would have beaten the chimp and sometimes even the extrapolation algorithm, although not by a wide margin…
One group of experts tended to use one analytical tool in many different domains; they preferred keeping their analysis simple and elegant by minimizing “distractions.” These experts zeroed in on only essential information, and they were unusually confident—they were far more likely to say something is “certain” or “impossible.” In explaining their forecasts, they often built up a lot of intellectual momentum in favor of their preferred conclusions. For instance, they were more likely to say “moreover” than “however.”
The other lot used a wide assortment of analytical tools, sought out information from diverse sources, were comfortable with complexity and uncertainty, and were much less sure of themselves—they tended to talk in terms of possibilities and probabilities and were often happy to say “maybe.” In explaining their forecasts, they frequently shifted intellectual gears, sprinkling their speech with transition markers such as “although,” “but,” and “however.”
Using terms drawn from a scrap of ancient Greek poetry, the philosopher Isaiah Berlin once noted how, in the world of knowledge, “the fox knows many things but the hedgehog knows one big thing.” Drawing on this ancient insight, Tetlock dubbed the two camps hedgehogs and foxes.
On another level, it’s an empirical research project:
Consider a major new research project funded by the Intelligence Advanced Research Projects Activity, a branch of the intelligence community.
In an unprecedented “forecasting tournament,” five teams will compete to see who can most accurately predict future political and economic developments. One of the five is Tetlock’s “Good Judgment” Team, which will measure individual differences in thinking styles among 2,400 volunteers (e.g., fox versus hedgehog) and then assign volunteers to experimental conditions designed to encourage alternative problem-solving approaches to forecasting problems. The volunteers will then make individual forecasts which statisticians will aggregate in various ways in pursuit of optimal combinations of perspectives. It’s hoped that combining superior styles of thinking with the famous “wisdom of crowds” will significantly boost forecast accuracy beyond the untutored control groups of forecasters who are left to fend for themselves.
Other teams will use different methods, including prediction markets and Bayesian networks, but all the results will be directly comparable, and so, with a little luck, we will learn more about which methods work better and under what conditions. This sort of research holds out the promise of improving our ability to peer into the future.
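What "aggregate in various ways" can mean is worth a quick illustration. Two standard ways to pool a crowd's probability estimates are a plain average and an average taken in log-odds space, optionally pushed toward the extremes to counteract the timidity of averaged forecasts. The sketch below is my own illustration of those two pooling rules, not the Good Judgment team's actual algorithm:

```python
import numpy as np

def pool_mean(probs):
    """Linear opinion pool: simply average everyone's probability."""
    return float(np.mean(probs))

def pool_logodds(probs, extremize=1.0):
    """Average in log-odds space, then optionally push the pooled forecast
    toward 0 or 1 (extremize > 1), a common fix for the timidity of averages."""
    probs = np.clip(np.asarray(probs, dtype=float), 1e-6, 1 - 1e-6)
    log_odds = np.log(probs / (1 - probs))
    pooled = extremize * np.mean(log_odds)
    return float(1 / (1 + np.exp(-pooled)))

# Five hypothetical forecasters on the same "Will X happen this year?" question.
crowd = [0.55, 0.70, 0.60, 0.80, 0.65]
print(pool_mean(crowd))                    # ~0.66
print(pool_logodds(crowd))                 # ~0.67
print(pool_logodds(crowd, extremize=2.0))  # ~0.80, a sharper consensus
```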
In his response, Robin Hanson suggests that expert predictions aren’t really about trying to predict the future. They’re about signaling group affiliation and building authority. Experts make predictions — and more importantly, non-experts demand predictions — so that they can seem smart and well-connected. Branché, as the French would say. (See how easy signaling is? And how fun!)
Robin suggests bringing back the pre-Victorian attitude about wagering. If you put money on it, you’ll be relatively more motivated by accuracy and relatively less by making a good impression. If we think it dishonorable to predict without a wager, we will see a lot more accuracy, if such is anywhere to be found.
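The mechanism is straightforward expected value. One standard way to formalize "putting money on it" is a proper scoring rule, under which your expected payoff is highest exactly when the probability you state matches the probability you believe, so confident-sounding exaggeration costs you. A toy calculation (my numbers, not anything from Robin's piece):

```python
import numpy as np

def expected_log_score(stated, believed):
    """Expected payoff under a logarithmic scoring rule when you privately
    believe the event has probability `believed` but publicly state `stated`."""
    return believed * np.log(stated) + (1 - believed) * np.log(1 - stated)

belief = 0.6
for stated in [0.5, 0.6, 0.7, 0.9, 0.99]:
    print(f"state {stated:.2f}: expected score {expected_log_score(stated, belief):+.4f}")
# The expected score peaks exactly at stated == belief (0.60); the impressive-sounding
# 0.99 is the most expensive option, which is the whole point of staking something.
```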
I’d like to see an Elo rating system for head-to-head forecasting challenges. Unlike wagering, a rating system in which you essentially stake non-monetary rating points wouldn’t implicate gambling law at all.
I’m not sure the institutional support would exist for it, but saying that you’re a grandmaster-level forecaster would be one heck of a credential.
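To be concrete about what I have in mind, here is one way such a system could work: pair two forecasters on the same question, count whoever's stated probability was closer to the actual outcome as the winner of that "game," and apply the ordinary Elo update. This is a hypothetical design, and the names, ratings, and K-factor below are made up for illustration:

```python
def elo_expected(r_a, r_b):
    """Standard Elo expected score for player A against player B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a, k=32):
    """Return both players' new ratings; score_a is 1, 0.5, or 0 for player A."""
    exp_a = elo_expected(r_a, r_b)
    return r_a + k * (score_a - exp_a), r_b + k * ((1 - score_a) - (1 - exp_a))

def duel(prob_a, prob_b, outcome):
    """One head-to-head question: whoever's probability was closer to the
    0/1 outcome wins the game; an exact tie splits the point."""
    err_a, err_b = (prob_a - outcome) ** 2, (prob_b - outcome) ** 2
    return 1.0 if err_a < err_b else (0.0 if err_b < err_a else 0.5)

# Hypothetical example: Alice (1600) says 80% and Bob (1500) says 55% on the
# same yes/no question, and it resolves yes (outcome = 1). Alice wins the duel.
alice, bob = elo_update(1600, 1500, duel(0.80, 0.55, 1))
print(round(alice), round(bob))  # 1612 1488: Alice gains a little, Bob loses a little
```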