I learned via Peter Coles of this list of ways that scientists try to spin results that don’t reach the standard-but-arbitrary threshold of statistical significance. The compiler, Matthew Hankins, says
You don’t need to play the significance testing game – there are better methods, like quoting the effect size with a confidence interval – but if you do, the rules are simple: the result is either significant or it isn’t.
…
The following list is culled from peer-reviewed journal articles in which (a) the authors set themselves the threshold of 0.05 for significance, (b) failed to achieve that threshold value for p and (c) described it in such a way as to make it seem more interesting.
The list begins like this:
(barely) not statistically significant (p=0.052)
a barely detectable statistically significant difference (p=0.073)
a borderline significant trend (p=0.09)
a certain trend toward significance (p=0.08)
a clear tendency to significance (p=0.052)
a clear trend (p<0.09)
a clear, strong trend (p=0.09)
a considerable trend toward significance (p=0.069)
a decreasing trend (p=0.09)
a definite trend (p=0.08)
a distinct trend toward significance (p=0.07)
And goes on at considerable length.
Hankins doesn’t provide sources for these, so I can’t rule out the possibility that some are quoted out of context in a way that makes them sound worse than they are. Still, if you like snickering at statistical solecisms, snicker away.
I would like to note one quasi-serious point. The ones that talk about a “trend,” and especially “a trend toward significance,” are much worse than the ones that merely use language such as “marginally significant.” In the latter case, the authors are merely acknowledging that the usual threshold for “significance” (p=0.05) is arbitrary. Hankins says that, having agreed to play the significance game, you have to follow its rules, but that seems like excessive pedantry to me. The “trend” language, on the other hand, suggests either a deep misunderstanding of how statistics work or an active attempt to mislead.
Hankins:
For example, “a trend towards significance” expresses non-significance as some sort of motion towards significance, which it isn’t: there is no ‘trend’, in any direction, and nowhere for the trend to be ‘towards’.
This is exactly right. A p-value tells you just one thing: how likely it is that chance alone, with no real effect present, would produce results at least as extreme as the ones you saw. Under that null hypothesis, a p-value is equally likely to land anywhere between 0 and 1, so a low value is just a chance fluctuation and will (with high probability) drift back up toward more typical values if you gather more data.
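To make that concrete, here's a quick simulation (Python with numpy and scipy; the two-sample t-test and the sample sizes are arbitrary choices of mine, not anything from Hankins's post). When there's no real effect, a p-value between 0.05 and 0.10 is no more special than any other band of the same width:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 10,000 experiments with no real effect: both groups come from
# the same normal distribution, so any difference is pure chance.
pvals = []
for _ in range(10_000):
    a = rng.normal(size=30)  # group A
    b = rng.normal(size=30)  # group B, same distribution
    pvals.append(stats.ttest_ind(a, b).pvalue)
pvals = np.array(pvals)

print(f"p < 0.05:         {np.mean(pvals < 0.05):.3f}")                    # ~0.05
print(f"0.05 <= p < 0.10: {np.mean((pvals >= 0.05) & (pvals < 0.10)):.3f}")  # also ~0.05
```

About 5% of these null experiments land in the "marginal" band, exactly as often as they land below 0.05. There is nothing "almost there" about them.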
The “trend” language suggests, either deliberately or accidentally, that the results are marching toward significance and will get there if only we can gather more data. But that’s only true if the effect you’re looking for is really there, which is precisely what we don’t know yet. (If we knew that, we wouldn’t need the data.) If it’s not there, then there will be no trend; rather, you’ll get regression to more typical (higher / less “significant”) p-values.
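The same sort of simulation shows what "gathering more data" actually does (again Python; the effect size of 0.4 and the sample sizes 30 and 120 are made-up numbers for illustration). Keep only the experiments whose first 30 points give a "marginal" p between 0.05 and 0.10, then quadruple the data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def final_pvalues(effect, n_initial=30, n_final=120, trials=20_000):
    """Keep only the runs whose first n_initial points give a
    'marginal' p in [0.05, 0.10); return the p-values after the
    sample has grown to n_final points."""
    kept = []
    for _ in range(trials):
        x = rng.normal(loc=effect, size=n_final)  # one-sample test against 0
        p_start = stats.ttest_1samp(x[:n_initial], 0.0).pvalue
        if 0.05 <= p_start < 0.10:
            kept.append(stats.ttest_1samp(x, 0.0).pvalue)
    return np.array(kept)

for effect in (0.0, 0.4):  # no effect vs. a genuinely nonzero mean
    ps = final_pvalues(effect)
    print(f"effect = {effect}: median final p = {np.median(ps):.2f}; "
          f"fraction ending below 0.05 = {np.mean(ps < 0.05):.2f}")
```

When the effect is really there, nearly all of the marginal results do go on to cross the threshold; when it isn't, the typical marginal result regresses to a thoroughly unimpressive p-value.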
Twenty years or so ago, several people tried to place upper limits on the value of the cosmological constant using gravitational-lensing statistics, back when only a handful of gravitational-lens systems were known. (I bucked the trend by writing a paper on a lower limit for the cosmological constant. OK, my limit was negative, but now that we have more data it is the lower limit, today positive, that is the more interesting one.) Someone asked me how low the upper limit could be if we had, say, 100 gravitational-lens systems. I replied that the answer depends on what the value of the cosmological constant actually is, which we didn't know at the time. I don't think my reply was understood. (At the time, there was a prejudice that the cosmological constant was zero, so people tried to place upper limits on its value from observation. That's fine as far as it goes, but one should set aside one's prejudices when analyzing data.)
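The statistical point translates into the same kind of toy simulation as before (this is a cartoon of my own devising, not the actual lensing-statistics calculation; the Poisson model and every number in it are invented for illustration). Forecast the 95% upper limit you'd get from 100 survey fields under two different true values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Cartoon model: each survey field yields lenses at a Poisson rate
# proportional to (1 + lam), where lam stands in for whatever the
# cosmological constant controls. (Invented for illustration.)
def upper_limit(true_lam, n_fields, base_rate=1.0, cl=0.95):
    counts = rng.poisson(base_rate * (1 + true_lam), size=n_fields)
    total = counts.sum()
    # Exact (Garwood) upper limit on a Poisson mean given `total`
    # observed counts, converted back into a limit on lam.
    mu_up = stats.chi2.ppf(cl, 2 * (total + 1)) / 2
    return mu_up / (base_rate * n_fields) - 1

for true_lam in (0.0, 0.7):
    limits = [upper_limit(true_lam, 100) for _ in range(1000)]
    print(f"true lam = {true_lam}: median 95% upper limit "
          f"from 100 fields is {np.median(limits):.2f}")
```

If the true value is zero, the limit comes down toward zero; if it isn't, the "upper limit" just tracks the true value. That's all I meant: you can't forecast how low the limit will go without assuming the answer.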