That error the “hot hands” guys say everyone makes? Turns out nobody makes it

A followup to this post.

First, a recap. Last month, there was an article in the New York Times touting a working paper by Miller and Sanjurjo. The paper claimed that various things related to the gambler’s fallacy could be explained by a certain (supposedly) counterintuitive fact about probabilities. In particular, it claimed that past attempts to measure the “hot hands” phenomenon (the perception that, say, basketball players are more likely to make a shot when they’ve made their previous shots) were tainted by a mistaken intuition about probabilities.

The mathematical result described in the paper is correct, but I was very dubious about the claim that it was counterintuitive, and also about the claim that it was responsible for errors in past published work.

To save you the trouble of following the link, here are some excerpts from my previous post:

Suppose you flip a coin four times. Every time heads comes up, you look at the next flip and see if it’s heads or tails. (Of course, you can’t do this if heads comes up on the last flip, since there is no next flip.) You write down the fraction of the time that it came up heads. For instance, if the coin flips went HHTH, you’d write down 1/2, because the first H was followed by an H, but the second H was followed by a T.

You then repeat the procedure many times (each time using a sequence of four coin flips). You average together all the results you get. The average comes out less than 1/2.

I guess that might be a counterintuitive result. Maybe. Personally, I find the described procedure so baroque that I’m not sure I would have had any intuition at all as to what the result should be.

My question is whether the average-of-averages procedure described in the article actually corresponds to anything that any actual human would do.
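Whatever the answer to that question, the arithmetic itself is easy to check by brute force. Here is a quick simulation sketch (in Python; the code and the handling of edge cases are mine, not the paper's): it repeats the four-flip exercise many times, throws out sequences that never show a head before the last flip (since those produce no fraction at all), and averages the rest.

```python
import random

def h_after_h_fraction(flips):
    """Fraction of heads among the first n-1 flips that are followed by another head.
    Returns None if no head appears before the last flip."""
    followers = [flips[i + 1] for i in range(len(flips) - 1) if flips[i] == "H"]
    if not followers:
        return None
    return followers.count("H") / len(followers)

random.seed(0)
results = []
for _ in range(1_000_000):
    flips = [random.choice("HT") for _ in range(4)]
    frac = h_after_h_fraction(flips)
    if frac is not None:          # sequences like TTTT or TTTH are simply dropped
        results.append(frac)

print(sum(results) / len(results))
```

The average settles around 0.40 rather than 0.50; enumerating the sixteen possible four-flip sequences by hand gives an exact expectation of 17/42. So the mathematical claim checks out. The question is whether anyone relies on this quantity.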

According to the Miller-Sanjurjo paper, in previous published work,

The standard measure of hot hand effect size in these studies is to compare the empirical probability of a hit on those shots that immediately follow a streak of hits to the empirical probability of a hit on those shots that immediately follow a streak of misses.

If someone did that for a bunch of different people, then took the mean of the results, and expected that mean to be zero in the absence of a hot-hands effect, they would indeed be making the error Miller and Sanjurjo describe: even for a shooter with no hot hand, the expected values of those two empirical probabilities are not equal, for the reason described in the paper. So does anyone actually do this?
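To make the error concrete, here is what it would look like in code, with parameters I picked purely for illustration (a 50% shooter with no memory, 100 shots per simulated player, streaks of length three, roughly matching the 100-shots-per-player study discussed below):

```python
import random

def streak_diff(shots, k=3):
    """P(hit | previous k shots were all hits) minus P(hit | previous k shots were all misses),
    estimated from one player's sequence of shots (1 = hit, 0 = miss).
    Returns None if either kind of streak never occurs."""
    after_hit_streak, after_miss_streak = [], []
    for i in range(k, len(shots)):
        window = shots[i - k:i]
        if all(window):
            after_hit_streak.append(shots[i])
        elif not any(window):
            after_miss_streak.append(shots[i])
    if not after_hit_streak or not after_miss_streak:
        return None
    return (sum(after_hit_streak) / len(after_hit_streak)
            - sum(after_miss_streak) / len(after_miss_streak))

random.seed(0)
diffs = []
while len(diffs) < 10_000:
    shots = [random.randint(0, 1) for _ in range(100)]   # a memoryless 50% shooter
    d = streak_diff(shots)
    if d is not None:
        diffs.append(d)

print(sum(diffs) / len(diffs))   # clearly negative, despite the absence of any hot hand
```

With these settings the mean comes out several percentage points below zero, not just barely. Anyone who expected it to be zero for a shooter with no hot hand would indeed be making the error.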

The sentence I quoted above cites three papers. I don’t seem to have full-text access to one of them, but I looked at the other two. One of them (Koehler and Conley) contains nothing remotely like the procedure described in this working paper. Citing it in this context is extremely misleading.

The other one (Gilovich et al.) does indeed calculate the probabilities described in the quote, but (a) that’s just one of many things it calculates, and (b) they never take the mean of the results. Fact (a) casts doubt on the claim that this is “the standard measure” — it’s a tiny fraction of what Gilovich et al. talk about — and (b) means that they don’t make the error anyway.

The relevant section of Gilovich et al. is Table 4, which tallies the results of an experiment in which Cornell basketball players attempted a sequence of 100 shots each. The authors calculate the probabilities of getting a hit after a sequence of hits and after a sequence of misses, and note little difference between the two. Miller and Sanjurjo’s main point is that the mean of the differences should be nonzero, even in the absence of a “hot hands” effect. That’s true, but since Gilovich et al. don’t base any conclusions on the mean, it’s irrelevant.

Gilovich et al. do talk about the number of players with a positive or negative difference, but that’s different from the mean. In fact, the median difference between the two probabilities is unaffected by the Miller-Sanjurjo effect (in a simulation I did, it seems to be zero in the absence of a “hot hands” effect), so counting the number of players with positive or negative differences seems like it might be an OK thing to do.
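For what it's worth, here is the shape of the simulation I mean, using the same memoryless 50% shooter and the same per-player difference as in the sketch above (the settings are again mine):

```python
import random
from statistics import median

def streak_diff(shots, k=3):
    """Same per-player difference as in the earlier sketch:
    P(hit | k straight hits) minus P(hit | k straight misses)."""
    after_hits = [s for i, s in enumerate(shots[k:], k) if all(shots[i - k:i])]
    after_misses = [s for i, s in enumerate(shots[k:], k) if not any(shots[i - k:i])]
    if not after_hits or not after_misses:
        return None
    return sum(after_hits) / len(after_hits) - sum(after_misses) / len(after_misses)

random.seed(1)
diffs = []
while len(diffs) < 10_000:
    d = streak_diff([random.randint(0, 1) for _ in range(100)])
    if d is not None:
        diffs.append(d)

print("mean:  ", sum(diffs) / len(diffs))   # dragged below zero by the selection effect
print("median:", median(diffs))             # sits at (or very near) zero in my runs
```

The mean is pulled down for the reason Miller and Sanjurjo describe; the median, at least in my runs, is not, which is why tallying who lands on which side of zero doesn't seem obviously broken.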

In any case, Gilovich et al. draw their actual conclusions about this study from estimates of the serial correlation, which is an unimpeachably sensible thing to do, and which is unaffected by the Miller-Sanjurjo effect.
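For readers who haven't seen it spelled out, the serial correlation in question is just the lag-one correlation between each shot and the shot after it. Gilovich et al. have their own estimation and testing details, so this is only a bare-bones sketch of the quantity itself:

```python
import random

def serial_correlation(shots):
    """Lag-one correlation between each shot and the shot that follows it (1 = hit, 0 = miss)."""
    x, y = shots[:-1], shots[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = (sum((a - mean_x) ** 2 for a in x) / n) ** 0.5
    sd_y = (sum((b - mean_y) ** 2 for b in y) / n) ** 0.5
    return cov / (sd_x * sd_y)

random.seed(2)
shots = [random.randint(0, 1) for _ in range(100)]   # one memoryless 50% shooter
print(serial_correlation(shots))   # bounces around zero; a real hot hand would push it positive
```

There is no conditioning on streaks anywhere in that calculation, which is why the selection effect Miller and Sanjurjo describe doesn't enter.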

So I can find no evidence that anyone actually makes the error that Miller and Sanjurjo claim to be widespread. Two of the three papers they cite as examples of this error are free of it. I couldn’t check the third. I suppose I could get access to it with more effort, but I’m not going to bother.