The Open Journal of Astrophysics

I’m pleased to point out that the Open Journal of Astrophysics is now open for submissions. Editor-in-chief Peter Coles has all the details. I’m a member of the editorial board of this new journal, although I confess that I have done nothing to help with it so far.

The journal performs peer review like other scholarly journals. Articles in it are published on the arxiv (where most astrophysics articles get posted anyway). The journal does not have the overhead of publishing in the traditional way, so it is free for both authors and readers.

I find the economics of traditional scholarly journals utterly baffling. As Coles observes, “The only useful function that journals provide is peer review, and we in the research community do that (usually for free) anyway.” I hope that efforts like this one will point the way to a more efficient system. I urge my astrophysics colleagues to submit articles to it.

Now let me confess to a bit of hypocrisy, or at least timidity. I’m hoping to submit an article for publication in the next few weeks, and I’m planning to send it to an established journal, not the Open Journal. The only reason is that I expect to apply for promotion (from Associate Professor to Full Professor) this summer, and I think there’s a significant possibility that some of the people evaluating my application will be more impressed by an established journal, with all the various accoutrements such as impact factors that go along with it.

This is quite possibly the last time in my career that I’ll have to worry about this sort of thing. In general, I care about the opinions of people who have actually read my work and formed a judgment based on its merits. Such people don’t need to rely on things like impact factors, which are a terribly stupid way to evaluate quality. So after this one, I’ll promise to submit future articles to the Open Journal (unless I have coauthors whom I can’t persuade to do it, I guess).


Which candidates are most electable, according to the market?

When people talk about US politics, they often focus on the various candidates’ “electability”. In particular, they talk about basing their support for a given candidate in the primary on how likely that candidate is to win the election.

This is a perfectly reasonable thing to think about, of course. If your primary goal is, say, to get a Democrat into the White House, then it makes sense to pick the Democrat who’s most likely to get there, even if that’s not your favorite candidate. The only problem is that, I suspect, people are often quite bad at guessing who’s the most electable candidate.

Eight years ago, I observed (1,2,3) that there is one source of data that might help with this, namely the political futures markets. These are sites where bettors can place bets on the outcomes of the elections. The odds available on the market at any given time show the probabilities that the bettors are willing to assign to the various outcomes. For instance, as of yesterday, at the PredictIt market, you could place a bet of $88 that would pay off $100 if Hillary Clinton wins the Democratic nomination. This means that the market “thinks” Clinton has an 88% chance of getting the nomination.

To assess a candidate’s electability, you want the conditional probability that the candidate wins the election, if he or she wins the nomination. The futures markets don’t tell you those probabilities directly, but you can get them from the information they do give.

Here’s a fundamental law of probability:

P(X becomes President) = P(X is nominated) * P(X becomes President, given that X is nominated).

The last term, the conditional probability, is the candidate’s “electability”. PredictIt lets you bet on whether a candidate will win the nomination, and on whether a candidate will win the general election. The odds for those bets tell you the other two probabilities in that equation, so you can get the electability simply by dividing one by the other.

So, as of Saturday, December 5, here’s what the PredictIt investors think about the various candidates:

PartyCandidateNomination ProbabilityElection ProbabilityElectability

(In case you’re wondering, the 0.5’s are because PredictIt has a 1% difference between the buy and sell prices on all these contracts. I went with the average of the two. They include other candidates, with lower probabilities, but I didn’t include them in this table.)

I’ve heard lots of people on the left say that they hope Donald Trump wins the nomination, because he’s unelectable — that is, the Democrat would surely beat him in the general election. I don’t know if that’s true or not, but it’s sure not what this market is saying.

Of course, the market could be wrong. If you think it is, then you have a chance to make some money. In particular, if you do think that Trump is unelectable, you can go place a bet against him to win the general election.

To be more specific, suppose that you are confident the market has overestimated Trump’s electability. That means that they’re either overestimating his odds of winning the general election, or they’re underestimating his odds of getting the nomination. If you think you know which is wrong, then you can bet accordingly. If you’re not sure which of those two is wrong, you can place a pair of bets: one that he’ll lose the general election, and one that he’ll win the nomination. Choose the amounts to hedge your bet, so that you break even if he doesn’t get the nomination. This amounts to a direct bet on Trump’s electability. If you’re right that his electability is less than 61%, then this bet will be at favorable odds.

So to all my lefty friends who say they hope Trump wins the nomination, so that Clinton (or Sanders) will stroll into the White House, I say put your money where your mouth is.


That error the “hot hands” guys say that everyone makes? Turns out nobody makes it

A followup to this post.

First, a recap. Last month, there was an article in the New York Times touting a working paper by Miller and Sanjurjo. The paper claimed that various things related to the gambler’s fallacy could be explained by a certain (supposedly) counterintuitive fact about probabilities. In particular, it claimed that past attempts to measure the “hot hands” phenomenon (the perception that, say, basketball players are more likely to make a shot when they’ve made their previous shots) were tainted by a mistaken intuition about probabilities.

The mathematical result described in the paper is correct, but I was very dubious about the claim that it was counterintuitive, and also about the claim that it was responsible for errors in past published work.

To save you the trouble of following the link, here are some excerpts from my previous post:

Suppose you flip a coin four times. Every time heads comes up, you look at the next flip and see if it’s heads or tails. (Of course, you can’t do this if heads comes up on the last flip, since there is no next flip.) You write down the fraction of the time that it came up heads. For instance, if the coin flips went HHTH, you’d write down 1/2, because the first H was followed by an H, but the second H was followed by a T.

You then repeat the procedure many times (each time using a sequence of four coin flips). You average together all the results you get. The average comes out less than 1/2.

I guess that might be a counterintuitive result. Maybe. Personally, I find the described procedure so baroque that I’m not sure I would have had any intuition at all as to what the result should be.

My question is whether the average-of-averages procedure described in the article actually corresponds to anything that any actual human would do.

According to the Miller-Sanjurjo paper, in previous published work,

The standard measure of hot hand effect size in these studies is to compare the empirical probability of a hit on those shots that immediately follow a streak of hits to the empirical probability of a hit on those shots that immediately follow a streak of misses.

If someone did that for a bunch of different people, and then took the mean of the results, and expected that mean to be zero in the absence of a hot-hands effect, they would indeed be making the error Miller and Sanjurjo describe, because it’s true that these two means differ for the reason described in the paper. So does anyone actually do this?

The sentence I quoted above cites three papers. I don’t seem to have full-text access to one of them, but I looked at the other two. One of them (Koehler and Conley) contains nothing remotely like the procedure described in this working paper. Citing it in this context is extremely misleading.

The other one (Gilovich et al.) does indeed calculate the probabilities described in the quote, but (a) that’s just one of many things it calculates, and (b) they never take the mean of the results. Fact (a) casts doubt on the claim that this is “the standard measure” — it’s a tiny fraction of what Gilovich et al. talk about — and (b) means that they don’t make the error anyway.

The relevant section of Gilovich et al. is Table 4, which tallies the results of an experiment in which Cornell basketball players attempted a sequence of 100 free throws each. The authors calculate the probabilities of getting a hit after a sequence of hits and after a sequence of misses, and note little difference between the two. Miller and Sanjurjo’s main point is that the mean of the differences should be nonzero, even in the absence of a “hot hands” effect. That’s true, but since Gilovich et al. don’t base any conclusions on the mean, it’s irrelevant.

Gilovich et al. do talk about the number of players with a positive or negative difference, but that’s different from the mean. In fact, the median difference between the two probabilities is unaffected by the Miller-Sanjurjo effect (in a simulation I did, it seems to be zero in the absence of a “hot hands” effect), so counting the number of players with positive or negative differences seems like it might be an OK thing to do.

In any case, Gilovich et al. draw their actual conclusions about this study from estimates of the serial correlation, which is an unimpeachably sensible thing to do, and which is unaffected by the Miller-Sanjurjo effect.

So I can find no evidence that anyone actually makes the error that Miller and Sanjurjo claim to be widespread. Two of the three papers they cite as examples of this error are free of it. I couldn’t check the third. I suppose I could get access to it with more effort, but I’m not going to bother.

A milestone for UR Physics

Check out this list produced by the American Institute of Physics:


This is the first time we’ve made the list. As department chair, I’d like to take credit for the accomplishment, but a close reading of the data will reveal that it all happened before I took over. Everyone in the department has worked hard to build our program, but particular credit has to go to the two previous department chairs, Jerry Gilfoyle and Con Beausang.


Another strange NY Times article about probability

Oddly enough, I still read the Sunday New York Times on paper. As a result, I was extremely confused by George Johnson’s article headlined Gamblers, Scientists, and the Mysterious Hot Hand.

The heart of the article is this claim:

In a study that appeared this summer, Joshua B. Miller and Adam Sanjurjo suggest why the gambler’s fallacy remains so deeply ingrained. Take a fair coin — one as likely to land on heads as tails — and flip it four times. How often was heads followed by another head? In the sequence HHHT, for example, that happened two out of three times — a score of about 67 percent. For HHTH or HHTT, the score is 50 percent.

Altogether there are 16 different ways the coins can fall. I know it sounds crazy but when you average the scores together the answer is not 50-50, as most people would expect, but about 40-60 in favor of tails.

Maybe it’s just me, but I couldn’t make sense of this claim. The online version has a graphic that clears it up. The last sentence is literally correct (with the possible exception of the phrase “as most people would expect,” as I’ll explain), but I couldn’t manage to parse it. I wonder if it was just me.

Here’s my summary of exactly what is being said:

Suppose you flip a coin four times. Every time heads comes up, you look at the next flip and see if it’s heads or tails. (Of course, you can’t do this if heads comes up on the last flip, since there is no next flip.) You write down the fraction of the time that it came up heads. For instance, if the coin flips went HHTH, you’d write down 1/2, because the first H was followed by an H, but the second H was followed by a T.

You then repeat the procedure many times (each time using a sequence of four coin flips). You average together all the results you get. The average comes out less than 1/2.

I guess that might be a counterintuitive result. Maybe. Personally, I find the described procedure so baroque that I’m not sure I would have had any intuition at all as to what the result should be. Hence my skepticism about the “as most people would expect” phrase. I think that if you took a survey, you’d get something like this:


(And, by the way, I don’t mean this as an insulting remark about the average person’s mathematical skills: I think I would have been in the 90%.)

The reason you don’t get 50% from this procedure is that it weights the outcomes of individual flips unevenly. For instance whenever HHTH shows up, the HH gets an effective weight of 1/2 (because it’s averaged together with the HT). But in each instance of HHHH, each of the three HH’s gets an effective weight of 1/3 (because there are three of them in that sequence). The correct averaging procedure is to count all the individual instances of the things you’re looking for (HH’s); not to group them together, average those, and then average the averages.

My question is whether the average-of-averages procedure described in the article actually corresponds to anything that any actual human would do.

The paper Johnson cites (which, incidentally, is an non-peer-reviewed working paper) makes grandiose claims about this result. For one thing, it is supposed to explain the “gambler’s fallacy.” Also, supposedly some published analyses of the “hot hands” phenomenon in various sports are incorrect because the authors used an averaging method like this.

At some point, I’ll look at the publications that supposedly fall prey to this error, but I have to say that I find all this extremely dubious. It doesn’t seem at all likely to me that that bizarre averaging procedure corresponds to people’s intuitive notions of probability, nor does it seem likely that a statistician would use such a method in a published analysis.




Replication in psychology

A few quick thoughts on the news that many published results in psychology can’t be replicated.

First, good for the researchers for doing this!

Second, I’ve read that R.A. Fisher, who was largely responsible for introducing the notion of using p-values to test for statistical significance, regarded the standard 5% level as merely an indication that something interesting might be going on, not as a definitive detection of an effect (although other sources seem to indicate that his views were more complicated than that). In any case, whether or not that’s what Fisher thought, it’s a good way to think of things. If you see a hypothesis confirmed with a 5% level of significance, you should think, “Hmm. Somebody should do a follow-up to see if this interesting result holds up,” rather than “Wow! It must be true, then.”

Finally, a bit of bragging about my own discipline. There’s plenty of bad work in physics, but I suspect that the particular problems that this study measured are not as bad in physics. The main reason is that in physics we do publish, and often even value, null results.

Take, for instance, attempts to detect dark matter particles. No one has ever done it, but the failed attempts to do so are not only publishable but highly respected. Here is a review article on the subject, which includes the following figure:


Every point in here is an upper limit — a record of a failed attempt to measure the number of dark matter particles.

I suspect that part of the reason we do this in physics is that we often think of our experiments primarily as measuring numbers, not testing hypotheses. Each dark matter experiment can be thought of as an attempt to measure the density of dark matter particles. Each measurement has an associated uncertainty. So far, all of those measurements have included the value zero within their error bars — that is, they have no statistically significant detection, and can’t rule out the null hypothesis that there are no dark matter particles. But if the measurement is better than previous ones — if it has smaller errors — then it’s valued.


538 on p-hacking

Christie Aschwanden has a piece on about the supposed “crisis” in science. She writes about recent high-profile results that turned out to be wrong, flaws in peer review, and bad statistics in published papers.

By far the best part of the article is the applet that lets you engage in your own p-hacking. It’s a great way to illustrate what p-hacking is and why it’s a problem. The idea is to take a bunch of data on performance of the US economy over time, and examine whether it has done better under Democrats or Republicans. There are multiple different measures of the economy one might choose to focus on, and multiple ways one might quantify levels of Democratic or Republican power. The applet lets you make different choices and determines whether there’s a statistically significant effect. By fiddling around for a few minutes, you can easily get a “significant” result in either direction.

Go and play around with it for a few minutes.

The rest of the article has some valuable observations, but it’s a bit of a hodgepodge. Curmudgeon that I am, I have to complain about a couple of things.

Here’s a longish quote from the article:

P-hacking is generally thought of as cheating, but what if we made it compulsory instead? If the purpose of studies is to push the frontiers of knowledge, then perhaps playing around with different methods shouldn’t be thought of as a dirty trick, but encouraged as a way of exploring boundaries. A recent project spearheaded by Brian Nosek, a founder of the nonprofit Center for Open Science, offered a clever way to do this.

Nosek’s team invited researchers to take part in a crowdsourcing data analysis project. The setup was simple. Participants were all given the same data set and prompt: Do soccer referees give more red cards to dark-skinned players than light-skinned ones? They were then asked to submit their analytical approach for feedback from other teams before diving into the analysis.

Twenty-nine teams with a total of 61 analysts took part. The researchers used a wide variety of methods, ranging — for those of you interested in the methodological gore — from simple linear regression techniques to complex multilevel regressions and Bayesian approaches. They also made different decisions about which secondary variables to use in their analyses.

Despite analyzing the same data, the researchers got a variety of results. Twenty teams concluded that soccer referees gave more red cards to dark-skinned players, and nine teams found no significant relationship between skin color and red cards.



The variability in results wasn’t due to fraud or sloppy work. These were highly competent analysts who were motivated to find the truth, said Eric Luis Uhlmann, a psychologist at the Insead business school in Singapore and one of the project leaders. Even the most skilled researchers must make subjective choices that have a huge impact on the result they find.

But these disparate results don’t mean that studies can’t inch us toward truth. “On the one hand, our study shows that results are heavily reliant on analytic choices,” Uhlmann told me. “On the other hand, it also suggests there’s a there there. It’s hard to look at that data and say there’s no bias against dark-skinned players.” Similarly, most of the permutations you could test in the study of politics and the economy produced, at best, only weak effects, which suggests that if there’s a relationship between the number of Democrats or Republicans in office and the economy, it’s not a strong one.

The last paragraph is simply appalling. This is precisely the sort of conclusion you can’t draw. Some methods got marginally “significant” results — if you define “significance” by the ridiculously weak 5% threshold — and others didn’t. The reason p-hacking is a problem is that people may be choosing their methods (either consciously or otherwise) to lead to their preferred conclusion. If that’s really a problem, then you can’t draw any valid conclusion from the fact that these analyses tended to go one way.

As long as I’m whining, there’s this:

Take, for instance, naive realism — the idea that whatever belief you hold, you believe it because it’s true.

Naive realism means different things to psychologists and philosophers, but this isn’t either of them.

Anyway, despite my complaining, there’s some good stuff in here.

How strong is the scientific consensus on climate change?

There is overwhelming consensus among scientists that climate change is real and is caused largely by human activity. When people want to emphasize this fact, they often cite the finding that 97% of climate scientists agree with the consensus view.

I’ve always been dubious about that claim, but in the opposite way from the climate-change skeptics: I find it hard to believe that the figure can be as low as 97%. I’m not a climate scientist myself, but whenever I talk to one, I get the impression that the consensus is much stronger than that.

97% may sound like a large number, but a scientific paradigm strongly supported by a wide variety of forms of evidence will typically garner essentially 100% consensus among experts. That’s one of the great things about science: you can gather evidence that, for all practical purposes, completely settles a question. My impression was that human-caused climate change was pretty much there.

So I was interested to see this article by James Powell, which claims that the 97% figure is a gross understatement. The author estimates the actual figure to be well over 99.9%. A few caveats: I haven’t gone over the paper in great detail, this is not my area of expertise, and the paper is still under peer review. But I find this result quite easy to believe, both because the final number sounds plausible to me and because the main methodological point in the paper strikes me as unquestionably correct.

Powell points out that the study that led to the 97% figure was derived in a way that excluded a large number of articles from consideration:


Cook et al. (2013) used the Web of Science to review the titles and abstracts of peer-reviewed articles from 1991-2011 with the keywords “global climate change” and “global warming.” With no reason to suppose otherwise, the reader assumes from the title of their article, “Quantifying the consensus on anthropogenic global warming [AGW] in the scientific literature,” that they had measured the scientific consensus on AGW using consensus in its dictionary definition and common understanding: the extent to which scientists agree with or accept the theory. But that is not how CEA used consensus.

Instead, CEA defined consensus to include only abstracts in which the authors “express[ed] an opinion on AGW (p. 1).” No matter how clearly an abstract revealed that its author accepts AGW, if it did “not address or mention the cause of global warming (p. 3),” CEA classified the abstract as having “no position” and omitted it from their calculation of the consensus. Of the 11944 articles in their database, 7930 (66.4%) were labeled as taking no position. If AGW is the ruling paradigm of climate science, as CEA set out to show, then rather than having no position, the vast majority of authors in that category must accept the theory. I return to this point below.
CEA went on to report that “Among abstracts expressing a position on AGW, 97.1% endorsed the consensus position that humans are causing global warming (p. 6, italics added).” Thus, the now widely adopted “97% consensus” refers not to what scientists accept, the conventional meaning, but to whether they used language that met the more restrictive CEA definition.

This strikes me as a completely valid critique. When there is a consensus on something, people tend not to mention it at all. Articles in physics journals pretty much never state explicitly in the abstract that, say, quantum mechanics is a good theory, or that gravity is what makes things fall.

I particularly recommend the paper’s section on plate tectonics as an explicit example of how the above methodology would lead to a false conclusion.

Let me be clear: I haven’t studied this paper enough to vouch for the complete correctness the methodology contained in it. But it looks to me like the author has made a convincing case that the 97% figure is a great understatement.


Prescription: retire the words “prescriptivist” and “descriptivist”

If, like me, you like arguments about English grammar and usage, you frequently come across the distinction between “prescriptivists” and “descriptivists.” Supposedly, prescriptivists are people who think that grammatical rules are absolute and unchanging. Descriptivists, on the other hand, supposedly think that there’s no use in rules that tell people how they should speak and write, and that all that matters is the way people actually do speak and write. As the story goes, descriptivists regard prescriptivists as persnickety schoolmarms, while prescriptivists regard descriptivists as people with no standards who are causing the language to go to hell in a handbasket.

In fact, of course, these terms are mostly mere invective. If you call someone either of these names, you’re almost always trying to tar that person with an absurd set of beliefs that nobody actually holds.

A prescriptivist, in the usual telling, is someone who thinks that all grammatical rules are set in stone forever. (Anyone who actually believed this would presumably have to speak only in Old English, or better yet, proto-Indo-European.) A descriptivist, on the other hand, is supposedly someone who thinks that all ways of writing and speaking are equally valid, and that we therefore shouldn’t teach students to write standard English grammar.

I can’t claim that there is nobody who believes either of these caricatures — I’m old enough to have learned that there’s someone out there who believes pretty much any foolish thing you can think of — but I do claim that practically nobody you’re likely to encounter when wading into a usage debate fits them.

I was reminded of all this when listening to an interview with Bryan Garner, author of Modern American Usage, which is by far the best usage guide I know of. Incidentally, I’m not the only one in my household who likes Garner’s book: my dog Nora also devoured it.


Like many people, I first learned about Garner in David Foster Wallace’s essay “Authority and American Usage,” which originally appeared in Harper’s and was reprinted in his book Consider the Lobster. Wallace’s essay is very long and shaggy, but it’s quite entertaining (if you like that sort of thing) and has some insightful observations. The essay is nominally a review of Garner’s usage guide, but it turns into a meditation on the nature of grammatical rules and their role in society and education.

From his book, Garner strikes me as a clear thinker about lexicographic matters, so I was disappointed to hear him go in for the most simpleminded straw-man caricatures of the hated descriptivists in that interview.

Garner’s scorn is mostly reserved for Steven Pinker. Pinker is pretty much the arch-descriptivist, in the minds of those people for whom that is a term of invective. But that hasn’t stopped him from writing a usage guide, in which he espouses some (but not all) of the standard prescriptivist rules. Pinker’s and Garner’s approaches actually have something in common: both try to give reasons for the various rules they advocate, rather than simply issuing fiats. But because Garner thinks of Pinker as a loosy-goosy descriptivist, he can’t bring himself to engage with Pinker’s actual arguments.

Garner says that Pinker has “flip-flopped,” and that his new book is  “a confused book, because he’s trying to be prescriptivist while at the same time being descriptivist.” As it turns out, what he means by this is that Pinker has declined to nestle himself into the pigeonhole that Garner has designated for him. I’ve read all of Pinker’s general-audience books on language — his first such book, The Language Instinct, may be the best pop-science book I’ve ever read — and I don’t see the new one as contradicting the previous ones. Pinker has never espoused the straw-man position that all ways of writing are equally good, or that there’s no point in trying to teach people write more effectively. Garner thinks that that’s what a descriptivist believes, and so he can’t be bothered to check.

The Language Instinct has a chapter on “language mavens,” which is the place to go for Pinker’s views on prescriptive grammatical rules. (That chapter is essentially reproduced as an essay published in the New Republic.) Garner has evidently read this chapter, as he mockingly summarizes Pinker’s view as “You shouldn’t criticize the way people use language any more than you should criticize how whales emit their moans,” which is a direct reference to an analogy found in this chapter. But he either deliberately or carelessly misleads the reader about its meaning.

Pinker is not saying that there are no rules that can help people improve their writing. Rather, he’s making the simple but important point that scientists are more interested in studying language as a natural system (how do people talk) than in how they should talk.

So there is no contradiction, after all, in saying that every normal person can speak grammatically (in the sense of systematically) and ungrammatically (in the sense of nonprescriptively), just as there is no contradiction in saying that a taxi obeys the laws of  physics but breaks the laws of Massachusetts.

Pinker is a descriptivist because, as a scientist, he’s more interested in the first kind of rules than the second kind. It doesn’t follow from this that he thinks the second kind don’t or shouldn’t exist. A physicist is more interested in studying the laws of physics than the laws of Massachusetts, but you can’t conclude from this that he’s an anarchist.

(Pinker does unleash a great deal of scorn on the language mavens, not for saying that there are guidelines that can help you improve your writing, but for saying specific stupid things, which he documents thoroughly and, in my opinion, convincingly.)

Although Pinker is the only so-called descriptivist Garner mentions by name, he does tar other people by saying, “There was this view, in the mid-20th century, that we should not try to change the dialect into which somebody was born.” He doesn’t indicate who those people were, but my best guess is that this is a reference to the controversy that arose over Webster’s Third in the 1950s. If so, it sounds as if Garner (like Wallace) is buying into a mythologized version of that controversy.

It seems to me that the habit of lumping people into the “prescriptive” and “descriptive” categories is responsible for Garner’s inability to pay attention to what Pinker et al. are actually saying (and for various other silly things he says in this interview). All sane people agree with the prescriptivists that some ways of writing are more effective than others and that it’s worthwhile to try to teach people what those ways are. All sane people agree with the descriptivists that some specific things written by some language “experts” are stupid, and that at least some prescriptive rules are mere shibboleths, signaling membership in an elite group rather than enhancing clarity. All of the interesting questions arise after you acknowledge that common ground, but if you start by dividing people up according to a false dichotomy, you never get to them.

Hence my prescription.


It’s still not rocket science

In the last couple of days, I’ve seen a little flareup of interest on social media in the “reactionless drive” that supposedly generates thrust without expelling any sort of propellant. This was impossible a year ago, and it’s still impossible.

OK, it’s not literally impossible in the mathematical sense, but it’s close enough. Such a device would violate the law of conservation of momentum, which is an incredibly well-tested part of physics. Any reasonable application of reasoning (or as some people insist on calling it, Bayesian reasoning) says, with overwhelmingly high probability, that conservation of momentum is right and this result is wrong.

Extraordinary claims require extraordinary evidence, never believe an experiment until it’s been confirmed by a theory, etc.

The reason for the recent flareup seems to be that another group has replicated the original group’s results. They actually do seem to have done a better job. In particular, they did the experiment in a vacuum. Bizarrely, the original experimenters went to great lengths to describe the vacuum chamber in which they did their experiment, and then noted, in a way that was easy for a reader to miss, that the experiments were done “at ambient pressure.” That’s important, because stray air currents were a plausible source of error that could have explained the tiny thrust they found.

The main thing to note about the new experiment is that they are appropriately circumspect in describing their results. In particular, they make clear that what they’re seeing is almost certainly some sort of undiagnosed effect of ordinary (momentum-conserving) physics, not a revolutionary reactionless drive.

We identified the magnetic interaction of the power feeding lines going to and from the liquid metal contacts as the most important possible side-effect that is not fully characterized yet. Our test campaign can not confirm or refute the claims of the EMDrive …

Just because I like it, let me repeat what my old friend John Baez said about the original claim a year ago. The original researchers speculated that they were seeing some sort of effect due to interactions with the “quantum vacuum virtual plasma.” As John put it,

 “Quantum vacuum virtual plasma” is something you’d say if you failed a course in quantum field theory and then smoked too much weed.