If only scientists hyped marginal results more, our problems would be solved

Yesterday’s New York Times Sunday Review section contains one of the most gloriously silly pieces of science journalism I’ve seen in a while.

The main point of the article, which is by the science historian Naomi Oreskes and is headlined “Playing Dumb on Climate Change,” is that the 95% confidence threshold that’s commonly used as the requirement for “statistical significance” is too high. That’s right — in a world in which there’s strong reason to believe that most published research findings are false (in biomedical research), Oreskes thinks that the main problem we need to address is that scientists are too shy and retiring when it comes to promoting marginal results.

The truth, of course, is precisely the opposite. To quote from a good Nature article on this stuff from last year (which I wrote about back in February),

The irony is that when UK statistician Ronald Fisher introduced the P value in the 1920s, he did not mean it to be a definitive test. He intended it simply as an informal way to judge whether evidence was significant in the old-fashioned sense: worthy of a second look.

In case anyone doesn’t know, what both the Times article and I referred to as “95% confidence” is the same thing as what statisticians call a P-value threshold of 5% — that is, requiring P < 0.05.

Fisher, not surprisingly, had this exactly right. A “95% confidence” result is merely a hint that something interesting might be going on. It’s far from definitive evidence. And yet scientists and journalists routinely report these results as if they were virtual certainties. Oreskes’s proposal to lower that threshold is precisely the opposite of what we should do.

In the course of arguing for this position, Oreskes repeats a common misconception about P-values:

Typically, scientists apply a 95 percent confidence limit, meaning that they will accept a causal claim only if they can show that the odds of the relationship’s occurring by chance are no more than one in 20. But it also means that if there’s more than even a scant 5 percent possibility that an event occurred by chance, scientists will reject the causal claim. It’s like not gambling in Las Vegas even though you had a nearly 95 percent chance of winning.

A 95% confidence result (P < 5%) certainly does not mean that there’s a 5% probability that the event occurred by chance. It means that, if we assume that there is no causal link, then there’s at most a 5% chance of seeing results at least as extreme as the ones we actually found.
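To make that concrete, here’s a minimal sketch (my own toy example, not anything from the article) of how a P-value is actually computed: assume the no-effect hypothesis, simulate data under it, and ask how often results at least as extreme as the observed ones turn up. The fair-coin null and the 65-heads “observation” are purely illustrative.

```python
import random

# Toy example: the null hypothesis is a fair coin, and the (hypothetical)
# observed result is 65 heads out of 100 flips. The P-value is the
# probability, *assuming the null*, of results at least this extreme.
observed_heads = 65
n_flips = 100
n_sims = 100_000

extreme = 0
for _ in range(n_sims):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    # Count simulations at least as far from the expected 50 heads
    # as the observed result.
    if abs(heads - 50) >= abs(observed_heads - 50):
        extreme += 1

p_value = extreme / n_sims
print(f"Estimated P-value under the fair-coin null: {p_value:.4f}")
# This is P(data this extreme | no effect), NOT P(no effect | data).
```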

That distinction (which I’ve written about before) may sound like nitpicking, but it’s extremely important. Suppose that a pregnancy test is guaranteed to be 95% accurate. If I take that pregnancy test and get a positive result, it does not mean that there’s a 95% chance that I’m pregnant. Because there was a very low prior probability of my being pregnant (among other reasons, because I’m male), I’d be quite confident that the test result was a false positive.
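Here’s that pregnancy-test logic worked through with Bayes’ theorem. The specific numbers (a 0.1% prior, 95% sensitivity, a 5% false-positive rate) are illustrative assumptions of mine, not figures from the article:

```python
# Illustrative numbers (mine, not the article's):
prior = 0.001          # P(pregnant) before seeing the test result
sensitivity = 0.95     # P(positive | pregnant)
false_positive = 0.05  # P(positive | not pregnant)

# Bayes' theorem:
# P(pregnant | positive) = P(positive | pregnant) * P(pregnant) / P(positive)
p_positive = sensitivity * prior + false_positive * (1 - prior)
posterior = sensitivity * prior / p_positive
print(f"P(pregnant | positive test) = {posterior:.3f}")  # about 0.019, not 0.95
```

Even with a “95% accurate” test, the low prior drags the posterior probability down to around 2%.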

When scientists quote P-values, they’re like the 95% accuracy quoted for the pregnancy test: both are probabilities of getting a given outcome (positive test result), assuming a given hypothesis (I’m pregnant). Oreskes (like all too many others) turns this into a statement about the probability of the hypothesis being true. You can’t do that without folding in information about the hypothesis’s prior probability (I’m unlikely to be pregnant, test or no test).

This is one reason that 95%-confidence results aren’t nearly as certain as people seem to think. Lots of scientific hypotheses are a priori not very likely, so even a 95%-confidence confirmation of them doesn’t mean all that much. Other phenomena, such as P-hacking and publication bias, mean that even fewer 95%-confidence results are true than you’d expect.
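A rough back-of-the-envelope calculation shows how bad this can get. The numbers below (10% of tested hypotheses true, 80% power) are illustrative assumptions of mine, not figures from the article or the Nature piece:

```python
# Illustrative numbers (mine): 1000 hypotheses tested at 95% confidence,
# only 10% of them actually true, and 80% power to detect a true effect.
n_hypotheses = 1000
fraction_true = 0.10
power = 0.80   # P(significant result | hypothesis is true)
alpha = 0.05   # P(significant result | hypothesis is false)

true_positives = n_hypotheses * fraction_true * power         # 80
false_positives = n_hypotheses * (1 - fraction_true) * alpha  # 45

false_share = false_positives / (true_positives + false_positives)
print(f"{false_share:.0%} of 'significant' results are false")  # about 36%
# ...and that's before P-hacking and publication bias make things worse.
```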

Oreskes says that scientists “practice a form of self-denial, denying themselves the right to believe anything that has not passed very high intellectual hurdles,” a description I’m happy to agree with, as an aspiration if not always a reality. Where she loses me is in suggesting that this is a bad thing.

She also claims that this posture of extreme skepticism is due to scientists’ fervent desire to distinguish their beliefs from religious beliefs. It’s possible that the latter claim could be justified (Oreskes is a historian of science, after all), but there’s not a hint of evidence or argument to support it in this article.

Oreskes’s article is misleading in another way:

We’ve all heard the slogan “correlation is not causation,” but that’s a misleading way to think about the issue. It would be better to say that correlation is not necessarily causation, because we need to rule out the possibility that we are just observing a coincidence.

This is at best a confusing way to think about the correlation-causation business, as it seems to suggest that the only two possibilities for explaining a correlation are coincidence and causation. That dichotomy leaves out a third, very common possibility: a shared underlying cause. There is a correlation between chocolate consumption and Nobel prizes. The correlation is not due to chance (the P-value is extremely low), but one cannot conclude that chocolate causes Nobel prizes (or vice versa); something like national wealth plausibly drives both.
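As a sketch of that third possibility, here’s a toy simulation in which a hidden “wealth” variable (my invention, purely illustrative) drives both chocolate consumption and Nobel prizes. Neither causes the other, yet the correlation is strong and the corresponding P-value would be vanishingly small:

```python
import random

random.seed(0)
n = 10_000

# A hidden common cause ("wealth") drives both variables; neither one
# causes the other.
wealth = [random.gauss(0, 1) for _ in range(n)]
chocolate = [w + random.gauss(0, 0.5) for w in wealth]
nobels = [w + random.gauss(0, 0.5) for w in wealth]

def correlation(xs, ys):
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / m
    sx = (sum((x - mx) ** 2 for x in xs) / m) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / m) ** 0.5
    return cov / (sx * sy)

print(f"correlation(chocolate, nobels) = {correlation(chocolate, nobels):.2f}")
# Strong correlation (about 0.8), not a coincidence, and no causation either way.
```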