Some folks had the nice idea of looking at the data from the Iran election returns for signs of election fraud. In particular, they looked at the last and second-to-last digits of the totals for different candidates in different districts, to see whether these digits are uniformly distributed, as you'd expect. Regarding the last digit, they conclude:
The ministry provided data for 29 provinces, and we examined the number of votes each of the four main candidates — Ahmadinejad, Mousavi, Karroubi and Mohsen Rezai — is reported to have received in each of the provinces — a total of 116 numbers.
The numbers look suspicious. We find too many 7s and not enough 5s in the last digit. We expect each digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran’s provincial results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number 5. Two such departures from the average — a spike of 17 percent or more in one digit and a drop to 4 percent or less in another — are extremely unlikely. Fewer than four in a hundred non-fraudulent elections would produce such numbers.
The calculations are correct. There’s about a 20% chance of getting a downward fluctuation as large as the one seen, about a 10% chance of getting an upward fluctuation as large as the one seen, and about a 3.5% chance of getting both simultaneously.
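These probabilities are easy to check with a quick Monte Carlo sketch. The thresholds below are my reading of the quote: 20 occurrences of one digit (20/116 ≈ 17.2%, the spike in 7s) and 5 or fewer of another (5/116 ≈ 4.3%, the dip in 5s, which rounds to the quoted 4 percent).

```python
import random

def digit_fluctuations(n_trials=50_000, n_counts=116, seed=0):
    """Monte Carlo estimate, for 116 uniform last digits, of the chance
    that some digit appears at least 20 times (a spike like the 20
    sevens), that some digit appears at most 5 times (a dip like the
    5 fives), and that both happen at once."""
    rng = random.Random(seed)
    up = down = both = 0
    for _ in range(n_trials):
        counts = [0] * 10
        for _ in range(n_counts):
            counts[rng.randrange(10)] += 1
        spike = max(counts) >= 20
        dip = min(counts) <= 5
        up += spike
        down += dip
        both += spike and dip
    return up / n_trials, down / n_trials, both / n_trials
```

With these thresholds the three estimates come out close to the 10%, 20%, and 3.5% figures above.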
The authors then go on to consider patterns in the last two digits:
Psychologists have also found that humans have trouble generating non-adjacent digits (such as 64 or 17, as opposed to 23) as frequently as one would expect in a sequence of random numbers. To check for deviations of this type, we examined the pairs of last and second-to-last digits in Iran’s vote counts. On average, if the results had not been manipulated, 70 percent of these pairs should consist of distinct, non-adjacent digits.
Not so in the data from Iran: Only 62 percent of the pairs contain non-adjacent digits. This may not sound so different from 70 percent, but the probability that a fair election would produce a difference this large is less than 4.2 percent.
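Both the 70 percent baseline and the quoted tail probability can be checked directly. A sketch, under two assumptions of mine: that "adjacent" counts 0 and 9 as neighbors (that convention is what makes the non-adjacent fraction come out to exactly 70%), and that "62 percent" of the 116 pairs means 72 of them.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Fraction of digit pairs (a, b) that are distinct and non-adjacent,
# counting 0 and 9 as adjacent -- this reproduces the stated 70%.
p_nonadj = sum(1 for a in range(10) for b in range(10)
               if b not in {a, (a - 1) % 10, (a + 1) % 10}) / 100

# 62% of 116 pairs is about 72; probability of seeing 72 or fewer
# non-adjacent pairs when the true rate is 70%.
p_tail = binom_cdf(72, 116, p_nonadj)
```

Here `p_nonadj` comes out to exactly 0.70, and `p_tail` to about 4%, consistent with the authors' "less than 4.2 percent."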
Each of these tests alone is of marginal statistical significance, it seems to me, but in combination they start to look significant. Still, I don't think that's a fair conclusion to draw. This analysis strikes me as an example of the classic error of a posteriori statistical significance. (This fallacy must have a catchy name, but I can't come up with it now. If you know it, please tell me.)
This error goes like this: you notice a surprising pattern in your data, and then you calculate how unlikely that particular pattern is to have arisen. When that probability is low, you conclude that there’s something funny going on. The problem is that there are many different ways in which your data could look funny, and the probability that one of them will occur is much larger than the probability that a particular one of them will occur. In fact, in a large data set, you’re pretty much guaranteed to find some sort of anomaly that, taken in isolation, looks extremely unlikely.
In this case, there are lots of things that one could have calculated instead of the probabilities for these particular outcomes. For instance, we could have looked at the number of times the last two digits were identical, or the number of times they differed by two, three, or any given number. Odds are that at least one of those would have looked surprising, even if there’s nothing funny going on.
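One can put a number on this by simulating perfectly fair data and running a whole battery of digit tests on it, asking how often the best-looking test clears the conventional 5% bar. The particular battery below, one test for each mod-10 distance between the two digits plus one for the frequency of each last digit, is an invented example of mine, not the authors' analysis:

```python
import random
from math import comb

def two_sided_p(k, n, p):
    """Crude two-sided p-value: probability of a Binomial(n, p) count
    at least as far from the mean as the observed count k."""
    mean = n * p
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n + 1) if abs(i - mean) >= abs(k - mean))

def min_p_over_battery(rng, n=116):
    """Draw n fair last-two-digit pairs and run 16 tests on them:
    one for each mod-10 distance between the digits (0..5) and one
    for the frequency of each possible last digit (0..9).
    Return the smallest p-value in the battery."""
    pairs = [(rng.randrange(10), rng.randrange(10)) for _ in range(n)]
    pvals = []
    for d in range(6):
        # chance that a uniform random pair has mod-10 distance d
        p_d = sum(1 for a in range(10) for b in range(10)
                  if min((a - b) % 10, (b - a) % 10) == d) / 100
        k = sum(1 for a, b in pairs if min((a - b) % 10, (b - a) % 10) == d)
        pvals.append(two_sided_p(k, n, p_d))
    for digit in range(10):
        k = sum(1 for _, b in pairs if b == digit)
        pvals.append(two_sided_p(k, n, 0.1))
    return min(pvals)

rng = random.Random(1)
trials = 300
frac = sum(min_p_over_battery(rng) < 0.05 for _ in range(trials)) / trials
```

Even though every "election" here is fair, `frac`, the fraction of runs in which at least one of the sixteen tests looks significant at the 5% level, comes out well above 5%.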
By the way, this is an issue that I’m worrying a lot about these days in a completely different context. There are a number of claims of anomalous patterns in observations of the cosmic microwave background radiation. It’s of great interest to know whether these anomalies are real, but any attempt to quantify their statistical significance runs into exactly the same problem.