In Slate, Jordan Ellenberg asks, “How will we know if Nate Silver was right?”

It’s a good question. If you have a bunch of models that make probabilistic predictions, is there any way to tell which one was right? Every model will predict *some* probability for the outcome that actually occurs. As long as that probability is nonzero, how can you say the model was wrong?

Essentially every question that a scientist asks is of this form. Because measurements always have some uncertainty, you can virtually never say that the probability of any given outcome is *exactly* zero, so how can you ever rule anything out?

The answer, of course, is statistics. You don’t rule things out with absolute certainty, but you rule them out with high confidence if they fit the data badly. And “fit the data badly” essentially means “assign a low probability to what was actually observed.”

So Ellenberg proposes that all the modelers publish detailed probabilities for all possible outcomes (specifically, all possible combinations of victories by the candidates in each state). Once we know the outcome, the one who assigned the highest probability to it is the best.

In statistics terminology, what he’s proposing is simply ranking the models by *likelihood*. That is indeed a standard thing to do, and if I had to come up with something, it’s what I’d suggest too. In this case, though, it’s probably not going to give a definitive answer, simply because all the forecasters will probably have comparable probabilities for the one outcome that will occur.
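Here is a minimal sketch of what ranking by likelihood looks like. Everything in it is made up for illustration: the two models, the three states, and the win probabilities are hypothetical. It also assumes each model treats the states as independent, which real forecasters do not; a real comparison would use each model's published probability for the full combination of state results.

```python
import math

def log_likelihood(state_probs, outcome):
    """Log-probability a model assigned to the observed outcome.

    state_probs: state -> probability that candidate X wins it.
    outcome:     state -> True if candidate X actually won it.
    Assumes (for illustration only) that states are independent.
    """
    total = 0.0
    for state, p_win in state_probs.items():
        p = p_win if outcome[state] else 1.0 - p_win
        total += math.log(p)
    return total

# Hypothetical forecasts, not taken from any real model.
model_a = {"OH": 0.75, "FL": 0.50, "VA": 0.70}
model_b = {"OH": 0.90, "FL": 0.35, "VA": 0.80}

# Hypothetical result: candidate X wins all three states.
outcome = {"OH": True, "FL": True, "VA": True}

scores = {
    "A": log_likelihood(model_a, outcome),
    "B": log_likelihood(model_b, outcome),
}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(name, "likelihood:", math.exp(score))
print("highest likelihood:", best)
```

Working in log space is just a precaution: with 50 states the product of probabilities gets small, and sums of logs are easier to handle than very small products.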

All of those probabilities will be low, because there are lots of possible outcomes, and any given one is unlikely. That doesn’t matter. What matters is whether they’re all similar. If 538 predicts a probability of 0.8%, and Princeton Election Consortium predicts 0.0000005%, then I agree that 538 wins. But what if the two predictions are 0.8% and 0.5%? The larger number still wins, but how strong is that evidence?

The way to answer that question is to use a technique called *reasoning* (or as some old-fashioned people insist on calling it, *Bayesian reasoning*). Bayes’s theorem gives a way of turning those likelihoods into *posterior probabilities*, which are the probabilities that any given model is correct, given the evidence. The answer depends on the *prior probabilities* — how likely you thought each model was before the data came in. If, as I suspect, the likelihoods come out comparable to each other, then the final outcome depends strongly on the prior probabilities. That is, the new information won’t change your mind all that much.
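To make that concrete, here is a small sketch that applies Bayes’s theorem to the two cases above: likelihoods of 0.8% versus 0.5%, and 0.8% versus 0.0000005%. The prior probabilities are my own illustrative choices, not anything the forecasters published, and the sketch assumes exactly one of the two models is correct.

```python
def posterior(priors, likelihoods):
    """Posterior probability of each model via Bayes's theorem.

    Assumes the listed models are the only candidates, so the
    posteriors are normalized to sum to 1.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Likelihoods from the text, written as fractions rather than percentages.
close = [0.008, 0.005]            # 0.8% vs 0.5%
lopsided = [0.008, 0.000000005]   # 0.8% vs 0.0000005%

# Illustrative priors (assumptions): no preference, then a prior
# that favors the second model four to one.
prior_choices = ([0.5, 0.5], [0.2, 0.8])

for name, likes in [("close", close), ("lopsided", lopsided)]:
    print(name, "likelihood ratio:", likes[0] / likes[1])
    for priors in prior_choices:
        print("  priors", priors, "-> posteriors", posterior(priors, likes))
```

In the close case the likelihood ratio is only 1.6, so the posterior for the first model is about 62% with even priors and about 29% with the skeptical prior: the prior dominates. In the lopsided case the ratio is 1.6 million, and the first model ends up above 99.999% either way.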

If things turn out that way, then Ellenberg’s proposal won’t answer the question, but that’s because there won’t be any good way to answer the question. The Bayesian analysis is the correct one, and if it says that the posterior distribution depends strongly on the prior, then that means that the available data don’t tell you who’s better, and there’s nothing you can do about it.

I always enjoy your posts when you refer to Bayesian reasoning and then say “or as I like to call it – reasoning”. So I really enjoy the (intentional?) swap you made in this post: “a technique called *reasoning*…” To me these are different levels of tongue-in-cheek-ness, and both are highly amusing.