Peer review, that is.
Remember BICEP2?
They announced a detection of B-mode microwave background polarization, which would be strong evidence that inflation happened in the early universe, but some people expressed doubt about whether they’d adequately eliminated the possibility that what they were seeing was contamination from nearer sources of radiation, particularly Galactic dust. (Some other very eminent people then said silly things.)
All of this occurred before the paper describing the results had undergone peer review. The paper has now been accepted for publication in the prestigious journal Physical Review Letters, with significant changes. As of now, the arXiv still has the original version, so it’s easy to compare it with the published version.
The authors have removed all discussion of one of the dust models that they used to argue against the possibility of dust contamination. This is the notorious “DDM2” model, which was based in part on data gleaned from a slide shown in a talk. A footnote explains the removal of this model, saying in part that “we have concluded the information used for the DDM2 model has unquantifiable uncertainty.”
Although the concerns about the DDM2 model got the most attention, people raised a number of concerns about the preprint’s discussion of dust contamination. Presumably the referees agreed, because the published paper is much more cautious in its claims. For instance, take a look at the end of the preprint,
The long search for tensor B-modes is apparently over, and a new era of B-mode cosmology has begun.
and compare it with the published version,
We have pushed into a new regime of sensitivity, and the high-confidence detection of B-mode polarization at degree angular scales brings us to an exciting juncture. If the origin is in tensors, as favored by the evidence presented above, it heralds a new era of B-mode cosmology. However, if these B modes represent evidence of a high-dust foreground, it reveals the scale of the challenges that lie ahead.
This is a case in which peer review clearly improved the quality of the paper: the second version is much more accurate than the first.
Other than the removal of the DDM2 model, I don’t think that the actual results have changed; the difference is all in the description of their significance. This is exactly as it should be. Even those of us who harbor doubts about the interpretation generally agree that this is a very important and exciting data set. The researchers deserve high praise for performing an experimental tour de force.
Some people say that the BICEP team shouldn’t have released their results at all until after peer review. I think that this objection is wrongheaded. There’s no way of knowing, but I bet that the official referees were able to give a much better critique of the paper because lots of other experts had been examining and commenting on it.
One argument against publicizing unreviewed results is that this sort of thing makes us look bad. The media gave a lot of coverage to the original result and are naturally covering subsequent events as a reversal, which is perhaps embarrassing. In this particular case, of course I wish that the earlier reports had emphasized the doubts (which started to appear right away), but in general I can’t get too upset about this problem. I think it’s much better if the media covers science as it actually is — people get exciting results, and then the rest of the community chews them over before deciding what to think about them — instead of a sanitized version. It seems clear to me that the advantages of an open discussion in figuring out the truth far outweigh the disadvantages in (arguably) bad publicity.
There’s one strange thing about the BICEP2 paper. It appeared in Physical Review Letters, which is traditionally limited to very short papers. The limit used to be four pages; it’s now expressed in word count, but it comes to about the same thing. The published paper runs to at least five times that length. I don’t know if this has ever happened before.
Here’s another piece of the puzzle. The preprint doesn’t say which journal it was submitted to but is formatted in a style that doesn’t match PRL at all. In particular, the citations are in author-year format, whereas PRL uses numbered citations.
It’s no big deal, but I’m mildly curious about the explanation for these facts.
It’s not clear to me that this case is evidence that peer review works. If the published paper is better than the preprint, how much of the improvement should we attribute to peer review, and how much to the public discussion?
I interpret this case as an argument against traditional peer review (which would have precluded the public discussion before publication) and in favor of alternative systems, like posting preprints on the arXiv.
I agree that the open discussion was more important than the official peer review. If we had to choose between the two, I’d definitely opt for the former. But I think that there’s added value in the formal review, at least on this occasion. The version of the paper that carries a sort of official imprimatur is the one that does not make unwarranted claims, rather than the one that does.
Of course, all I’m really saying here is that I agree more with the referees than I do with the authors. No doubt there are other cases where the reverse is true.