Replication

I heard (via Sean Carroll) about this piece in Science headlined “Replication Effort Provokes Praise—And ‘Bullying’ Charges.” It’s about efforts to replicate published results in certain areas of psychology.

In general, I think that publication bias and dodgy statistics are real problems in science, so I’d bet that lots of results, particularly those that are called “significant” because they clear the ridiculously weak threshold of 5%, are wrong. Apparently lots of people, particularly in certain parts of psychology, are worried about this. I think it’s great for people to try to replicate past results and find out. (Medical researchers are on the case too, particularly John Ioannidis, who claims that “most published research findings are false.”)
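To see why a 5% threshold is so weak, here is a rough back-of-the-envelope sketch in Python. The numbers are purely illustrative assumptions, not taken from the Science piece or from Ioannidis: suppose only 10% of the hypotheses being tested are actually true, and a real effect is detected 50% of the time. The function name and its parameters are made up for this example, and the sketch ignores publication bias entirely, which would only make matters worse.

```python
def fraction_of_significant_results_that_are_true(prior_true, power, alpha=0.05):
    """Of all results that clear the significance threshold, what fraction are real?

    prior_true: assumed fraction of tested hypotheses that are actually true
    power:      assumed probability that a real effect comes out "significant"
    alpha:      significance threshold (chance a null effect comes out "significant")
    """
    true_positives = prior_true * power
    false_positives = (1.0 - prior_true) * alpha
    return true_positives / (true_positives + false_positives)


# Illustrative (assumed) inputs: 10% of hypotheses true, 50% power, 5% threshold.
real = fraction_of_significant_results_that_are_true(prior_true=0.1, power=0.5)
print(f"Fraction of 'significant' results that are real:  {real:.2f}")
print(f"Fraction of 'significant' results that are wrong: {1.0 - real:.2f}")
```

With these made-up inputs, nearly half of the “significant” results are false positives, even though every individual test used the 5% threshold correctly.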

The most striking part of the Science piece is the “bullying” claim. It seems ridiculous on its face for a scientist to complain about other people trying to replicate their results. Isn’t that what science is all about? But I can understand in part what they’re worrying about. You can easily imagine someone trying to replicate your work, doing something wrong (or perhaps just different from what you did), and then publicly shaming you because your results couldn’t be replicated. For instance,

Schnall [the original researcher] contends that Donnellan’s effort [to replicate Schnall’s results] was flawed by a “ceiling effect” that, essentially, discounted subjects’ most severe moral sentiments. “We tried a number of strategies to deal with her ceiling effect concern,” Donnellan counters, “but it did not change the conclusions.” Donnellan and his supporters say that Schnall simply tested too few people to avoid a false positive result. (A colleague of Schnall’s, Oliver Genschow, a psychologist at Ghent University in Belgium, told Science in an e-mail that he has successfully replicated Schnall’s study and plans to publish it.)

The solution, of course, is for Donnellan to describe clearly what he did and how it differs from Schnall’s work. The readers can then decide (using Bayesian reasoning, or as I like to call it, “reasoning”) whether those differences matter and hence how much to discount the original work.
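As a minimal sketch of what that reasoning might look like, here is Bayes’ theorem applied to a failed replication, in Python. Everything in it is an assumption for illustration: the function name, the 50% prior, and the idea that differences between the replication and the original show up as a lower chance of detecting a real effect. None of the numbers come from the Schnall or Donnellan studies.

```python
def posterior_after_failed_replication(prior, replication_power, alpha=0.05):
    """Probability the original effect is real, given that a replication failed.

    prior:             belief that the effect is real before the replication
    replication_power: chance the replication would detect the effect if it is real
    alpha:             chance the replication "detects" an effect that is not there
    """
    p_fail_if_real = 1.0 - replication_power
    p_fail_if_not_real = 1.0 - alpha
    numerator = prior * p_fail_if_real
    return numerator / (numerator + (1.0 - prior) * p_fail_if_not_real)


# A faithful, well-powered replication (assumed 80% power) that fails:
print(posterior_after_failed_replication(prior=0.5, replication_power=0.8))

# A replication whose differences from the original cut its power to an assumed 40%:
print(posterior_after_failed_replication(prior=0.5, replication_power=0.4))
```

Under these assumptions, the faithful replication drops the probability from 50% to about 17%, while the less faithful one drops it only to about 39%. That is the sense in which the differences between the two studies determine how much to discount the original work.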

The piece quotes Daniel Kahneman giving an utterly sane point of view:

To reduce professional damage, Kahneman calls for a “replication etiquette,” which he describes in a commentary published with the replications in Social Psychology. For example, he says, “the original authors of papers should be actively involved in replication efforts” and “a demonstrable good-faith effort to achieve the collaboration of the original authors should be a requirement for publishing replications.”

If the two groups work in good faith to do a good replication, it’ll make the final results easier to interpret. If the original group refuses to work with people who are trying to replicate their results, well, everyone is entitled to take that into account when performing (Bayesian) reasoning about whether to believe the original results.

Dust or not?

Since the recent rumor, more useful information has been coming out about the questions some people are raising over whether the BICEP experiment really has seen signs of gravitational waves from inflation in the polarization of the cosmic microwave background radiation. The Washington Post has by far the best news article I’ve seen on the subject: it actually quotes people on the record, rather than repeating vague anonymous speculation.

The original rumor seems to be generally true, in the sense that it accurately described some criticisms that cosmologists were making about the BICEP analysis. The rumor does seem to have exaggerated and/or oversimplified things, and of course whether those criticisms are valid or not remains to be seen.

The best place I know of to get the technical details is this talk by Raphael Flauger. (Unfortunately, the video doesn’t show the slides as he’s talking, so if you want to follow it, download the slides first and follow along as he talks.) He argues that the dust models used by the BICEP team are inaccurate for a few reasons, most of them related to the issue raised in the original rumor: the BICEP team appears to have used an image in a slide from a talk for part of their model, and they seem (he claims) to have misinterpreted what was in that slide. In addition (he claims), digitizing the image rather than using the real data (which BICEP doesn’t have access to) introduces other errors. Flauger further claims that when you use a different (better?) dust model, the possible contribution of dust to what BICEP saw gets significantly larger, possibly large enough to explain their signal.

If BICEP has offered a detailed, technical rebuttal to this criticism, I haven’t seen it yet.

My personal assessment, based on obviously incomplete information: Flauger’s arguments seem to me to need serious consideration. BICEP needs to supply a detailed response. As of now, I don’t know whether he’s right or not, but my view has changed somewhat since the original rumor. The available information now does seem to me sufficient to substantially lower my own estimate of the probability that BICEP has seen primordial gravitational waves. I was fairly skeptical all along, but now I’m more skeptical. If you must know, I’d put the probability significantly below 50%.

Rumors

The story so far:

  • BICEP2 announces a detection of B modes in the cosmic microwave background (CMB) polarization on large angular scales. If this result is correct, it’s very strong evidence that inflation happened in the very early Universe and is a really big deal. But that “if” part is important: we shouldn’t place too much confidence in this result until it’s independently confirmed.
  • In the blog Résonaances, Adam Falkowski publishes a rumor that an error has been found in the BICEP2 analysis.
  • Various science news outlets pick up the story (particularly this one and this one). They ask the BICEP2 people what they think, and the BICEP2 people vehemently stand by their results.

So what are we supposed to think?

The key claim in the Résonaances post is that the BICEP2 team made an error in modeling Galactic dust. This is potentially important, because a central part of the analysis is testing to make sure that the signal seen in the data is due to the CMB and not to boring, nearby sources such as dust.

Résonaances:

To estimate polarized emission from the galactic dust, BICEP digitized an unpublished 353 GHz map shown by the Planck collaboration at a conference.  However, it seems they misinterpreted the Planck results: that map shows the polarization fraction for all foregrounds, not for the galactic dust only (see the “not CIB subtracted” caveat in the slide). Once you correct for that and rescale the Planck results appropriately, some experts claim that the polarized galactic dust emission can account for most of the BICEP signal.

This looks to me like it might be at least partially true.

There is not a definitive map of polarized Galactic dust emission, so the BICEP team had to cobble together models of dust from different sources. They did so in several different ways: section 9.1 of their paper lists six different dust models. One of these models is based on data from the Planck satellite. It appears that they created the model using a digitized image of a slide from a talk by the Planck people, because the relevant data hadn’t been released in any other form. (Footnote 33 of the paper is the evidence for this last statement, in case you want to check it out.) The evidence does seem to me to support Falkowski’s statement: the image in question explicitly says “not CIB subtracted,” meaning that the data that went into that image includes other stuff besides what the BICEP team wanted. This does seem like a flaw in the construction of this particular model.

But it seems to me that Falkowski greatly overstates the significance of this flaw. For one thing, this is just one of six dust models used in the analysis. It was regarded as in some sense the “best” of them, but the more important point is that the other models yielded similar results. The BICEP team’s claim, as I understand it, is that the entire analysis, taking into account all the models, makes it implausible that dust is the source of the signal. Even if you throw out this model, I don’t think that that claim is significantly weakened.

As I’ve said before, I don’t think that the BICEP team has made a thoroughly convincing case that what they’ve seen can’t be foreground contamination. I think we need more data to answer that question. But even if Falkowski has correctly identified an error in the analysis, I don’t think that it changes the level of doubt all that much.

In the past, I’ve found Résonaances to be a good source of information, but I can’t say I’m thrilled with the way Falkowski handled this.

Animal magnetism

Interesting piece in Nature:

Interference from electronics and AM radio signals can disrupt the internal magnetic compasses of migratory birds, researchers report today in Nature. The work raises the possibility that cities have significant effects on bird migration patterns.

That’s from a news item. The actual paper is here (possibly paywalled).

There’s strong evidence that some animals (birds, sharks, and bacteria, among others) respond to the Earth’s magnetic field, but the mechanisms by which they sense the field are still quite uncertain in many cases. Physics Today did a nice overview of this about six years ago. I think it’s fascinating that such a simple question remains unsolved.

The new result appears to be that robins do poorly at orienting themselves to Earth’s magnetic field when they’re in an environment with human-generated radio frequency electromagnetic fields, but when you shield them from those fields, they get better. Here’s a figure from the paper:

The dots around the two blue circles show the way the birds oriented themselves when they were inside of a grounded metal shield. The two red circles show what happened when the shield was not grounded. In each case, the arrow at the center is the average of all the directions, and the dashed circle shows the threshold for a significant deviation from random (5% significance, I believe). The graphs below are the field strengths with and without grounding, as functions of frequency.
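For readers who want to see how an arrow and a threshold circle like these can be computed, here is a minimal Python sketch. It assumes the dashed circle comes from a Rayleigh test for non-uniformity of directions, which the description above does not confirm, and it uses the simplest large-sample approximation to that test (p ≈ exp(−nR²), where R is the length of the mean vector and n the number of birds). The function names and the headings are invented for illustration; they are not the paper’s data.

```python
import math


def mean_resultant(angles_deg):
    """Return (length, direction in degrees) of the mean of unit vectors at the given angles."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    length = math.hypot(c, s)  # 0 = no preferred direction, 1 = all birds agree
    direction = math.degrees(math.atan2(s, c)) % 360.0
    return length, direction


def rayleigh_threshold(n, alpha=0.05):
    """Mean-vector length needed for significance at level alpha.

    Uses the first-order approximation p ~ exp(-n * R**2); this is an
    assumption about the test, and it is least accurate for small n.
    """
    return math.sqrt(-math.log(alpha) / n)


# Made-up headings (degrees) for eight hypothetical birds.
headings = [350, 10, 20, 340, 30, 5, 355, 15]
length, direction = mean_resultant(headings)
print(f"mean vector length = {length:.2f}, direction = {direction:.0f} degrees")
print(f"5% threshold for n = {len(headings)}: {rayleigh_threshold(len(headings)):.2f}")
```

If the arrow is longer than the threshold radius, the directions are clustered more tightly than you would expect from randomly oriented birds at that significance level.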

These results barely exceed the 5% threshold, but the paper gives results of other similar experiments that show the same pattern.

Although the experiment seems to have been well-designed, I have to admit I’m skeptical, for a familiar reason: you should never believe an experiment until it’s been confirmed by a theory. I find it hard to imagine a mechanism for birds to sense magnetic fields that would be disrupted by the weak, low-frequency fields involved here.

The authors acknowledge this:

Any report of an effect of low-frequency electromagnetic fields on a biological system should be subjected to particular scrutiny for at least three reasons. First, such claims in the past have often proved difficult to reproduce. Second, animal studies are commonly used to evaluate human health risks and have contributed to guidelines for human exposures. Third, “seemingly implausible effects require stronger proof”.

Here’s what they say about mechanisms:

The biophysical mechanism that would allow such extraordinarily weak, broadband electromagnetic noise to affect a biological system is far from clear. The energies involved are tiny compared to the thermal energy, k_B T, but the effects might be explained if hyperfine interactions in light-induced radical pairs or large clusters of iron-containing particles are involved.

The “tiny compared to the thermal energy” part is the really puzzling thing. If these electromagnetic fields are having an effect inside the system, they must do it by something absorbing photons (since that’s all an electromagnetic field is). But the energy of a photon at these frequencies is tiny in comparison to the thermal energy sloshing around a biological system anyway, so how could there be an effect?
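To put a rough number on “tiny,” here is a short Python sketch comparing the energy of a single photon, E = hf, to the thermal energy k_B T. The frequency (1 MHz, roughly in the AM radio band) and the temperature (310 K, roughly body temperature) are assumed round figures chosen for illustration, not values taken from the paper.

```python
PLANCK_H = 6.62607015e-34   # Planck constant, J*s (exact SI value)
BOLTZMANN_K = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)


def photon_to_thermal_ratio(frequency_hz, temperature_k):
    """Ratio of one photon's energy (h*f) to the thermal energy scale (k_B*T)."""
    photon_energy = PLANCK_H * frequency_hz
    thermal_energy = BOLTZMANN_K * temperature_k
    return photon_energy / thermal_energy


# Assumed illustrative values: a 1 MHz field and a 310 K organism.
ratio = photon_to_thermal_ratio(frequency_hz=1.0e6, temperature_k=310.0)
print(f"photon energy / thermal energy = {ratio:.1e}")
```

With these assumed inputs, a single photon carries less than a millionth of the typical thermal energy, which is why it is so hard to see how absorbing one could matter.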

The first of these two mechanisms seems to refer to one of the proposed mechanisms for magnetoreception in birds, which Wikipedia describes as follows:

According to one model, cryptochrome, when exposed to blue light, becomes activated to form a pair of two radicals (molecules with a single unpaired electron) where the spins of the two unpaired electrons are correlated. The surrounding magnetic field affects the kind of this correlation (parallel or anti-parallel), and this in turn affects the length of time cryptochrome stays in its activated state.

I think that this mechanism involves tiny energy differences between quantum states of a system, depending on how the electron spins are oriented. If the energy differences are tiny enough, then I guess low-frequency EM fields could disrupt the effect. But if the energy differences are that small, then I don’t understand why normal thermal fluctuations don’t mess it up all the time. Presumably, for this mechanism to work, the electrons have to be shielded from thermal fluctuations while external EM fields can still sneak in and mess them up. That might be possible, but I’d want to see the details.

I completely don’t get what the authors are talking about when they refer to “large clusters of iron-containing particles”. I can’t see any conceivable way such particles could be affected by weak oscillating fields of the sort described here.

I have no idea whether you should believe this result or not. I hope that others will attempt to replicate it. If it’s real, it’s got to be a big clue about the interesting puzzle of how birds feel magnetic fields.