March 01, 2016

Hot Hands, Streaks and Coin-flips: How The New York Times Got it Wrong

The existence of “Hot Hands” and “Streaks” in sports and gambling is hotly debated. Two recent articles in The New York Times (NYT) discussing streaks and hot hands in basketball and coin flips misinterpreted elementary concepts in probability and statistics. While it is disheartening to see the newspaper of record make such basic errors in mathematical reporting, the articles do provide a case study for how they got it wrong. This article is adapted from a longer piece I wrote for that purpose and audience [2].

The starting point is an article by George Johnson in the NYT Sunday Review on October 18, 2015, titled “Gamblers, Scientists and the Mysterious Hot Hand,” which discusses recent claims by Joshua Miller and Adam Sanjurjo [3] that a classic 1985 paper [1] debunking the concept of hot hands in basketball “is flawed by a subtle misperception about randomness.” Then, on October 27, 2015, in a follow-up NYT article (published in The Upshot) called “Streaks Like Daniel Murphy’s Aren’t Necessarily Random,” Binyamin Appelbaum writes that Miller and Sanjurjo [3] claimed that the classic paper had “made a basic statistical error.”

What I discuss here is not hot hands per se, but how the NYT articles addressed probability and statistics. The following quotes are from the two articles:

(From Johnson): “For a 50 percent shooter, for example, the odds of making a basket are supposed to be no better after a hit – still 50-50. But in a purely random situation, according to the new analysis, a hit would be expected to be followed by another hit less than half the time.” (Italics added). The NYT article concerns a “purely random situation,” not some basketball-related phenomenon. I interpret the Johnson statement in italics as “... the probability that a hit will be followed by another hit is less than one-half.”

(From Appelbaum): “Flip a coin, and there’s an equal chance it will land heads or tails ... But Joshua Miller of Bocconi University and Adam Sanjurjo of the Universidad de Alicante pointed out something surprising: In the average series of four coin flips, the sequence heads-heads is significantly less common than heads-tails. On average, just 40.5 percent of the heads are followed by another heads.” (Italics added).

In a “purely random situation,” every flip will be an H with the same probability that it is a T – exactly one-half. Thus, a hit is expected to be followed by another hit (H) one-half of the time, which is as often as it is expected to be followed by a miss (T). So what is going on?

Following a similar example and table in the Miller and Sanjurjo paper (but not yielding a similar conclusion), Johnson did the following. He looked at the 16 length-four sequences, shown in Figure 1. For each sequence that contains an H in one of the first three positions (there are 14), he calculated the percentage of those Hs that are followed by another H. Call that the HH-percentage. Then he calculated the unweighted average of the HH-percentages and got about 40.5%.
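The procedure just described is easy to reproduce. The following is a minimal Python sketch of it, not code from either article or from the MS paper; the names `hh_fraction`, `percentages`, and `unweighted` are my own, chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def hh_fraction(seq):
    """Return (Hs followed by an H, Hs in all but the last position)."""
    heads = sum(1 for i in range(len(seq) - 1) if seq[i] == "H")
    followed = sum(
        1 for i in range(len(seq) - 1) if seq[i] == "H" and seq[i + 1] == "H"
    )
    return followed, heads


# All 16 sequences of four flips.
sequences = ["".join(flips) for flips in product("HT", repeat=4)]

# HH-percentage of each sequence that has an H in the first three positions.
percentages = {}
for seq in sequences:
    followed, heads = hh_fraction(seq)
    if heads > 0:  # TTTH and TTTT have no such H and are skipped
        percentages[seq] = Fraction(followed, heads)

# Unweighted average: every sequence counts once, however many Hs it has.
unweighted = sum(percentages.values()) / len(percentages)
print(len(percentages), unweighted, float(unweighted))  # 14 sequences, 17/42, about 0.405
```

Exact fractions are used so that the result, 17/42, is not blurred by rounding.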

Figure 1. The 16 sequences of length four. The number of Hs in the first three positions is 24, and the number of those Hs followed by another H is 12, exactly 50%. However, the HHs are not distributed uniformly. For example, sequence HHHH has three HHs and TTHH only has one, but both have HH-percentage of 100%. So, the unweighted average of the HH-percentages is not 50%, but about 40.5%. True, but so what? It does not follow that the probability of a hit following another hit is less than half.

The arithmetic is right, but so what? The unweighted average gives equal weight to each sequence, ignoring the fact that some sequences have more Hs than others, and that occurrences of HH are not uniformly distributed among the sequences. To correctly calculate the probability that an H follows an H, you need to give equal weight to each H that occurs in the first three positions of a sequence. Equivalently, if you start from the HH-percentages of the 14 sequences, you need to compute a weighted average of those HH-percentages: multiply each HH-percentage by the number of Hs in the first three positions of the associated sequence, add the products, and divide by the total number of such Hs (24). The result is 12/24, exactly 50%. The NYT articles get this wrong, because they suggest that the unweighted average of the HH-percentages equals the probability of an H following an H in a fair coin flip.
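To make the difference between the two averages concrete, here is a short Python sketch that computes both from the same 14 sequences. It is my own illustration under the definitions above; the variable names (`rows`, `weighted`, `unweighted`) are hypothetical and do not come from the NYT articles or the MS paper.

```python
from fractions import Fraction
from itertools import product

# One row per sequence: (sequence, Hs in first three positions, those Hs followed by H).
rows = []
for flips in product("HT", repeat=4):
    seq = "".join(flips)
    heads = sum(1 for i in range(3) if seq[i] == "H")
    followed = sum(1 for i in range(3) if seq[i] == "H" and seq[i + 1] == "H")
    if heads:  # keep only the 14 sequences with an H in the first three positions
        rows.append((seq, heads, followed))

total_heads = sum(heads for _, heads, _ in rows)

# Weighted average: each HH-percentage counts once per H it is based on.
weighted = sum(heads * Fraction(followed, heads) for _, heads, followed in rows) / total_heads

# Unweighted average: each sequence counts once.
unweighted = sum(Fraction(followed, heads) for _, heads, followed in rows) / len(rows)

print(total_heads, weighted, unweighted)  # 24 Hs, weighted 1/2, unweighted 17/42
```

Weighting by the number of Hs simply undoes the per-sequence division, so the weighted average is the pooled count of HHs over the pooled count of Hs.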

While Johnson and Appelbaum miss the issue of weighted versus unweighted averaging, Miller and Sanjurjo (MS) do not. They state that “The key ... is that it is not the flip that is treated as the unit of analysis, but rather the sequence of [four] flips from each coin ... Therefore, in treating the sequence as the unit of analysis, the average empirical probability across coins amounts to an unweighted average.”

But Why?

Why did MS intentionally calculate unweighted averages? Let me explain one reason. Suppose that a player, with an established 50% hit rate, shoots four times in a game. To decide if he had a hot hand in the game, we compute his HH-percentage for the game and compare it to a reference number representing the expected HH-percentage for someone without a hot hand. We could compare to 50%, which is essentially (but not exactly) what the authors did in the classic 1985 paper. But the MS approach is to model a player without a hot hand as a fair coin; we can think of that player as selecting (with equal probability) one of the 16 four-flip sequences.¹ The expected value of those 14 HH-percentages is their unweighted average (about 40.5%).

In general, for any k ≥ 3, the unweighted average HH-percentage over the 2^k sequences of length k is less than one-half. So, in the MS view, an HH-percentage of 50% in a game for a player with a hit-rate of 50% is evidence the player did exhibit a hot hand, rather than evidence against it. That is the main point made by MS, and the reason for their use of the unweighted average. The NYT articles missed that point.
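The general claim can be checked by brute force for small k. The sketch below is my own illustration, not the MS derivation; `hh_averages` is a hypothetical helper, and as in the four-flip example it skips sequences with no H before the last position. It enumerates every sequence, so it is only practical for small k.

```python
from fractions import Fraction
from itertools import product


def hh_averages(k):
    """Return (unweighted average HH-percentage, pooled HH fraction) for length k."""
    per_sequence = []
    total_heads = 0
    total_followed = 0
    for seq in product("HT", repeat=k):
        heads = sum(1 for i in range(k - 1) if seq[i] == "H")
        followed = sum(
            1 for i in range(k - 1) if seq[i] == "H" and seq[i + 1] == "H"
        )
        if heads:  # sequences with no H before the last flip have no HH-percentage
            per_sequence.append(Fraction(followed, heads))
            total_heads += heads
            total_followed += followed
    unweighted = sum(per_sequence) / len(per_sequence)
    pooled = Fraction(total_followed, total_heads)
    return unweighted, pooled


for k in range(2, 11):
    unweighted, pooled = hh_averages(k)
    print(k, round(float(unweighted), 4), pooled)
```

For k = 2 the unweighted average is exactly one-half (only HH and HT qualify), and for k = 3 it is 5/12. In every case the pooled fraction, which counts each H once, is exactly one-half.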

The Wall Street Journal also addressed the hot hands dispute in “The ‘hot hand’ debate gets flipped on its head,” by Ben Cohen, September 29, 2015, and initially made the same mistake as the NYT articles. Cohen wrote: “Toss a coin four times. Write down what happened. Repeat that process one million times. What percentage of flips after heads also come up heads? The obvious answer is 50%. That answer is also wrong. The real answer is 40%.”

But on September 30, in an online version of the article, the error was corrected to “Toss a coin four times. Write down the percentage of heads on the flips coming immediately after heads. Repeat that process one million times. On average, what is that percentage?”
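The two wordings describe two different experiments, and a small simulation shows the difference. This is a rough sketch of my own, assuming a fair coin simulated with Python's `random` module; the function name `simulate`, the seed, and the number of trials are arbitrary choices (I use 200,000 repetitions rather than one million to keep the run short).

```python
import random


def simulate(trials, seed=0):
    """Flip four fair coins `trials` times; return (pooled rate, mean per-sequence rate)."""
    rng = random.Random(seed)
    pooled_heads = 0
    pooled_followed = 0
    proportion_sum = 0.0
    counted = 0
    for _ in range(trials):
        flips = [rng.random() < 0.5 for _ in range(4)]  # True means heads
        heads = sum(flips[:3])
        followed = sum(1 for i in range(3) if flips[i] and flips[i + 1])
        # Original wording: pool every flip that comes after a heads.
        pooled_heads += heads
        pooled_followed += followed
        # Corrected wording: record one percentage per four-flip sequence.
        if heads:
            proportion_sum += followed / heads
            counted += 1
    return pooled_followed / pooled_heads, proportion_sum / counted


pooled, per_sequence = simulate(200_000)
print(f"fraction of flips after heads that are heads: {pooled:.3f}")
print(f"average of the per-sequence percentages:      {per_sequence:.3f}")
```

The first number should come out near 0.50 and the second near 0.40, which is why the original wording was wrong and the corrected wording is right.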

The NYT, on the other hand, has not yet issued a correction at the time of this publication. As an educator in a field involving mathematical reasoning, and one concerned with the public’s understanding of quantitative issues, I find this disturbing. Articles such as these reinforce the need for discussions in high school and college focused on quantitative reasoning, data analysis, probability, and statistics.

¹ MS restrict attention to the 14 sequences that have an H in one of the first three positions.

References
[1] Gilovich, T., Vallone, R., & Tversky, A. (1985). The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology, 17, 295–314.

[2] Gusfield, D. (2015, December 29). Hot hands, streaks and coin-flips: Numerical nonsense in the New York Times. Cornell University Library. Preprint, arXiv:1512.08773v1 [math.HO].

[3] Miller, J.B., & Sanjurjo, A. (2015, September 15). Surprised by the Gambler’s and Hot Hand Fallacies? A Truth in the Law of Small Numbers. IGIER Working Paper no. 552.

Dan Gusfield is a professor in the computer science department at the University of California, Davis.