Foxes, Hedgehogs, and the Art of Prediction

By James Case

BOOK REVIEW: The Signal and the Noise: Why So Many Predictions Fail, but Some Don’t. By Nate Silver. The Penguin Press, New York, 2012, 544 pages, $27.95.

Nate Silver is a bona fide celebrity. A Google search on his name returns more than 35 million items; his Wikipedia bio is 14 pages long and concludes with a 163-item bibliography. His book (released in August 2012) rose to second place on The New York Times bestseller list once it was clear that he had correctly predicted the outcomes in all 50 states in the 2012 presidential election. He had missed by just one in 2008, when Obama bested McCain in Indiana by a single percentage point. He also correctly predicted the outcomes of 31 of 33 senatorial races in 2012, missing only in Montana and North Dakota, where lightly regarded Democrats scored narrow upset victories. His 2013 Super Bowl predictions, though incorrect, were breathlessly reported from coast to coast.

Silver got his start as a professional prognosticator by selling a computerized player-evaluation system called PECOTA to Baseball Prospectus, an annual publication of little interest to anyone save the hardest of hardcore baseball fans. The acronym, which stands for Pitcher Empirical Comparison and Optimization Test Algorithm, was named after Bill Pecota, a journeyman infielder with the Kansas City Royals during the 1980s. A lifetime 0.249 hitter, Pecota became a recurrent thorn in the side of Silver’s beloved Detroit Tigers, against whom his batting average soared to 0.303. As a condition of purchase, Silver was required to extend the PECOTA system to evaluate hitters as well as pitchers, thus more than doubling the reach of the program without changing the acronym.

Growing up in Lansing, Michigan, Silver became hooked on baseball at an early age. He was only six when the 1984 Tigers won 35 of their first 40 games, romped virtually unchallenged to the American League pennant, and defeated the National League Champion San Diego Padres in the World Series. He learned that summer to decipher the morning box scores, and permanently succumbed to the lure of what he still considers “the world’s richest data set.” Only later, while majoring in economics at the University of Chicago (he subsequently did graduate work at Chicago and the London School of Economics), did he become familiar with Baseball Prospectus and begin applying his newly acquired quantitative skills to the wealth of data contained therein. Drawn initially to the raw numbers, he was surprised to find the writing sharp and entertaining.

After college, Silver spent nearly four years in Chicago, working as a “transfer pricing consultant” for the accounting firm KPMG. Despite the good money, congenial colleagues, and apparently secure future, he found the work boring in the extreme. To amuse himself, he began to develop the “colorful spreadsheet” that eventually became PECOTA. Only later, when Baseball Prospectus abandoned the computerized system it had been using to identify major league players likely to enjoy notable success in the coming season—information dear to subscribers—did he seek to capitalize on his efforts.

The title of the book is borrowed from radio engineering, practitioners of which struggled for years to design receivers capable of eliminating the noise—due mainly to atmospheric effects—that inevitably corrupts the signal transmitted to listeners at home. Silver argues that all who seek to interpret time series data face a similar problem, as all such data can be viewed as disguising a (wanted) signal with an (unwanted) quantity of apparently meaningless noise. His argument applies with particular force to financial data, which tends to embed a faint or non-existent signal within a cacophony of distracting noise, resulting in theories that model the past very well, yet fail to prove predictive.

Many of humankind’s misfortunes, Silver writes, result from failure to predict clearly foreseeable events. The recent financial crisis is a case in point. He identifies four separate failures: that of homeowners and investors to foresee that housing prices could not continue to rise indefinitely, that of the ratings agencies to foresee that a fall in housing prices would cause a crisis in U.S. financial markets, that of economists to foresee that a financial crisis in the U.S. was tantamount to a global financial crisis, and that of leadership to foresee that financial crises would produce an unusually long and deep recession. None of the four qualifies as a “black swan”—all were readily apparent to anyone who cared to look “under the hood” of the models then in use.

■ ■ ■

In the first half of the book, Silver surveys the current state of the art of prediction. Though his own prognostications have concerned mainly the outcomes of baseball and political contests, he seems to be fascinated by every phase and aspect of the predictive process, eager to learn what works for others and what leads so many astray. The second half of the book examines the ways in which predictions are used in various activities, including competitive ones like chess, war, and portfolio management.

A recurrent theme in the book concerns the need to revise predictions as additional information becomes available. Silver uses the 9/11 attacks to illustrate the process, applying the simplest version of Bayes’ theorem twice in succession. On 9/10, one might have judged the probability that terrorists would launch such an attack to be $x$ = 0.005%. Once the first plane hits, one should reflect that the probability of such a collision if an attack actually is under way approaches $y$ = 100%, while the probability if no such attack is in progress, as Silver estimates from historical data, is a mere $z$ = 0.008%. The probability that such an attack is under way jumps instantaneously, as soon as the first plane hits, from $x$ = 0.005% to $xy/(xy + z(1 - x)) \approx 38\%$. A second application of the theorem, with 38% as the new prior, reveals that the probability jumps again when the second plane hits, from 38% to 99.99%!
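The two-step revision can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic quoted above, not code from Silver's book: the function name `bayes_update` and the variable names are illustrative, and reusing the same $y$ and $z$ for the second plane is an assumption made here for simplicity.

```python
def bayes_update(prior, p_if_attack, p_if_no_attack):
    """Return the posterior probability of an attack after one observed crash.

    Implements xy / (xy + z(1 - x)) with x = prior, y = p_if_attack,
    z = p_if_no_attack (all expressed as fractions, not percentages).
    """
    numerator = prior * p_if_attack
    return numerator / (numerator + p_if_no_attack * (1.0 - prior))


x = 0.00005  # prior probability of an attack: 0.005%
y = 1.0      # probability of a crash if an attack is under way: ~100%
z = 0.00008  # probability of a crash if no attack is under way: 0.008%

# First plane hits: revise the 9/10 prior.
after_first = bayes_update(x, y, z)

# Second plane hits: the first posterior becomes the new prior.
# Assumption: y and z are left unchanged for the second update.
after_second = bayes_update(after_first, y, z)

print(f"after first plane:  {after_first:.2%}")
print(f"after second plane: {after_second:.2%}")
```

Run as written, the two printed values should round to the 38% and 99.99% figures cited in the review.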

A second recurrent theme is the thinness of the market for accurate predictions. Whereas forecasts obtained from the National Weather Service are meant to be the best possible, local weather shows often purposely overstate the probability of rain. The theory is that a type I error, such as failure to predict rain on a day when it does rain (with its potential to spoil somebody’s picnic or golf outing), is more disruptive to viewers than a type II error, such as a prediction of rain on a day when it doesn’t rain. In another example, guest commentators on TV talk shows are more likely to be invited back if they venture welcome predictions that turn out to be wrong (think upset victory by the home team over a ranked opponent, or a construction project that should bring jobs to the region) than if they venture dour predictions that turn out to be right. In short, there are many reasons for issuing predictions that are less than the most accurate possible. Silver confines his interest to accurate predictions, and the people who make them.

■ ■ ■

Silver claims to have discovered, in the writings of social scientist Philip Tetlock, the psychological profile of a successful prognosticator. Beginning in 1987, Tetlock collected predictions from a broad array of government and academic experts on a variety of topics, including the immediate fate of the Soviet Union. Virtually none of those questioned foresaw the troubled nation’s impending collapse. Puzzled, Tetlock broadened his horizons, asking experts to venture predictions on the Gulf War, the Japanese real estate bubble, the potential secession of Quebec from Canada, and a host of other then-timely issues. His studies, which he continued for more than fifteen years, were eventually published in the 2005 book Expert Political Judgment. His conclusions were damning.

Tetlock’s experts proved all but clueless, grossly overconfident, and unable to calculate probabilities. Fully 15% of the events they judged to have no chance of occurring did soon occur, while almost 25% of those they deemed to be sure things never happened! Whether they were predicting events in economics, domestic politics, or international affairs, their collective judgment was utterly unreliable. A few individuals, however, performed conspicuously well.

From their answers to a battery of questions lifted from standard personality tests, Tetlock was able to separate his respondents into two groups: foxes, who demonstrated genuine predictive ability, and hedgehogs, who did not. The names were inspired by a line from an ancient Greek poem: “The fox knows many little things, while the hedgehog knows one big thing.” Foxes, he found, tend to be multidisciplinary, adaptable, self-critical, tolerant of complexity, cautious, and empirical, whereas hedgehogs are more likely to be specialized, stalwart, stubborn, order-seeking, confident, and ideological. Foxes provide the more accurate forecasts, but hedgehogs tend to be more entertaining as TV guests. The more interviews the experts had done, the less reliable Tetlock found their predictions to be.

Although he has plenty of other points to make, Silver returns to the fox-versus-hedgehog theme repeatedly. Indeed, he devotes much of the book to arguing that, in activities like forecasting baseball player performance, election results, and hurricane paths, where predictions are made by faceless foxes, the quality of predictions tends to be relatively high and gradually improving. But in fields like economics and climate science, where predictions are more likely to be made by publicity-seeking hedgehogs, prediction quality tends to be low and to stay low. He has thought long and hard about the art of prediction, and his book on the subject is well worth reading!

James Case writes from Baltimore, Maryland.