Knowing my interest in such things, my brother Andy pointed me toward an article in Science by Bradley Efron on Bayesian and frequentist statistical techniques. I think it’s behind a paywall, unfortunately.
Efron is obviously far more qualified to discuss this than I am — he’s a professor of statistics at Stanford, and I’ve never taken a single course in statistics. But, with the physicist’s trademark arrogance, I won’t let that stop me. As far as I can tell, much of what he says is muddled and incoherent.
Near the beginning, Efron gives a perfectly serviceable example of Bayesian inference:
A physicist couple I know learned, from sonograms, that they were due to be parents of twin boys. They wondered what the probability was that their twins would be identical rather than fraternal. There are two pieces of relevant evidence. One-third of twins are identical; on the other hand, identical twins are twice as likely to yield twin boy sonograms, because they are always same-sex, whereas the likelihood of fraternal twins being same-sex is 50:50. Putting this together, Bayes’ rule correctly concludes that the two pieces balance out, and that the odds of the twins being identical are even.
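Just to check the arithmetic, here’s that calculation in a few lines of Python (the code and names are mine, not Efron’s):

```python
def posterior_identical(prior_identical):
    """P(identical | twin-boy sonogram), by Bayes' rule.

    Identical twins are always same-sex, so they yield a twin-boy
    sonogram with probability 1/2.  Fraternal twins must be two boys
    out of the equally likely BB/BG/GB/GG, probability 1/4.
    """
    joint_identical = prior_identical * 0.5
    joint_fraternal = (1 - prior_identical) * 0.25
    return joint_identical / (joint_identical + joint_fraternal)

# With the 1/3 prior, the prior odds and the 2:1 likelihood ratio cancel:
print(posterior_identical(1 / 3))  # 0.5
```

The factor-of-two likelihood ratio exactly offsets the 2:1 prior odds against identical twins, which is why the answer comes out even.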
So far so good. Then, a couple of paragraphs later:
Bayes’ 1763 paper was an impeccable exercise in probability theory. The trouble and the subsequent busts came from overenthusiastic application of the theorem in the absence of genuine prior information, with Pierre-Simon Laplace as a prime violator. Suppose that in the twins example we lacked the prior knowledge that one-third of twins are identical. Laplace would have assumed a uniform distribution between zero and one for the unknown prior probability of identical twins, yielding 2/3 rather than 1/2 as the answer to the physicists’ question. In modern parlance, Laplace would be trying to assign an “uninformative prior” or “objective prior”, one having only neutral effects on the output of Bayes’ rule. Whether or not this can be done legitimately has fueled the 250-year controversy.
I have no idea whether Efron’s description of what Laplace would have thought is correct (and, I suspect, neither does Efron). But assuming it is, is the reasoning correct? Everybody agrees that, given the assumed prior, Bayes’s rule gives the correct recipe for calculating the final probability. The question is whether imaginary-Laplace used the right prior.
That question is, unfortunately, unanswerable without mind-reading. The correct prior for imaginary-Laplace to use is, unsurprisingly, the one that accurately reflected his prior beliefs. If imaginary-Laplace believed, before looking at the sonogram, that all possible values for the probability were equally likely — that is, if he would have been equally unsurprised by a divine revelation of the true value, whatever value was revealed — then the result is correct. If, on the other hand, he believed, based on his prior life experience, that some values of the probability were more likely than others — say, he’d encountered a lot more fraternal than identical twins in his life — then he would have been wrong to use a uniform prior.
The conclusion imaginary-Laplace comes to depends on what he thought before. Many people who dislike Bayesian statistics state that fact as if it were a problem, but it’s not. New evidence allows you to update your previous beliefs. Unless that evidence tells you things with logical certainty, that’s all it does. If you expect more than that, you’re asking your data to do something it just can’t do.
Efron claims that the traditional, non-Bayesian approach to statistics, known as “frequentism,” has the advantage that it “does away with prior distributions entirely,” with a resulting “gain in scientific objectivity.” This is true, I guess, but at a cost: Frequentism simply refuses to answer questions about the probability of hypotheses. To be specific, suppose that you went back to those expectant physicists and told them that they were only allowed to use frequentist reasoning and could not take into account their prior knowledge. You then asked them, based on the sonogram results, about the probability that their twins were identical. Frequentism flatly refuses to answer this question.
What this means in practice is that frequentism refuses to answer any actually interesting questions. Want to know whether the physicists at the LHC have found the Higgs boson? Don’t ask a frequentist. If he’s honest, he’ll admit that his methods can’t answer that question. All a frequentist can tell you is the probability that, if the Higgs boson has a certain set of properties, then the LHC folks would have gotten the data they did. What’s the probability that the Higgs actually has those properties? A frequentist can’t say.
For those who care, let me state that precisely. In frequentist analyses, all probabilities are of the form P(data | hypothesis), pronounced “the probability that the data would occur, given the hypothesis.” Frequentism flatly refuses to consider probabilities with the hypothesis to the left of the bar — that is, it refuses to consider the probability that any scientific hypothesis is correct.
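In the twins example, the only numbers on offer from a frequentist are P(data | identical) = 1/2 and P(data | fraternal) = 1/4. Getting from those to P(identical | data) requires a prior over the hypotheses; there is no way around it. A sketch (the code is mine):

```python
# The two numbers a frequentist will happily report: P(data | hypothesis).
likelihood = {"identical": 0.5, "fraternal": 0.25}

def posterior(prior):
    """P(hypothesis | data): needs a prior over the hypotheses."""
    joint = {h: prior[h] * likelihood[h] for h in likelihood}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}

# The physicists' 1/3 prior gives even odds...
print(posterior({"identical": 1 / 3, "fraternal": 2 / 3}))
# ...while someone whose prior was 1/10 would conclude about 0.18.
print(posterior({"identical": 0.1, "fraternal": 0.9}))
```

The likelihoods are the same in both calls; only the prior changes, and with it the answer. That’s the step frequentism declines to take.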
The situation can be summarized in a Venn diagram:
One might naively have supposed that this would generally be regarded as a problem.
This makes the advice with which Efron concludes his article particularly silly: in the absence of strong prior information, he says,
Bayesian calculations cannot be uncritically accepted and should be checked by other methods, which usually means frequentistically.
This is meaningless. The Bayesian questions that depend on the prior are precisely those that frequentism refuses to address. If you don’t like Bayesian methods, your only choice is to decide that you’re not interested in the questions those methods answer. It’s a mystery to me how scientists make themselves go to work in the morning while believing that it’s impossible for their data to tell them anything about scientific hypotheses, but apparently many do.
(Actually, I’m pretty sure I do know how they do it: even the frequentist scientists are actually closet Bayesians. They cheerfully use the data to update their mental tally of how likely various hypotheses about the world are to be true, based on their prior beliefs.)
In the last part of the article, Efron advocates for an approach he seems to have developed called “Empirical Bayes.” I find the details confusing, but the claim is that, in some situations involving large data sets, “we can effectively estimate the relevant prior from the data itself.” As a matter of logic, this is of course nonsense. By definition, the prior is stuff you thought before you looked at the data. I think that what he means is that, in some situations involving large data sets, the evidence from the data is very sharply peaked, so that the conclusions you draw don’t depend much at all on the prior. That’s certainly true in some circumstances (and has been well known to all practitioners of Bayesian statistics forever).
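The “sharply peaked” point is easy to illustrate with a beta-binomial model (my choice of toy example, not Efron’s): estimating a success probability from coin-flip-style data, two wildly different priors give nearly the same posterior once the data set is large.

```python
# Under a Beta(a, b) prior, the posterior mean of a success probability
# after s successes in n trials is (a + s) / (a + b + n).
def posterior_mean(a, b, s, n):
    return (a + s) / (a + b + n)

s, n = 7, 10          # small data set: the prior matters a lot
print(posterior_mean(1, 1, s, n))    # flat prior         -> ~0.667
print(posterior_mean(50, 50, s, n))  # strong 50:50 prior -> ~0.518

s, n = 7000, 10000    # large data set: the prior barely matters
print(posterior_mean(1, 1, s, n))    # -> ~0.700
print(posterior_mean(50, 50, s, n))  # -> ~0.698
```

With 10,000 trials the two posteriors agree to two decimal places, whatever you believed beforehand; that’s the regime in which “the prior doesn’t matter” is a fair summary.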