Boycott Elsevier

I’ve mentioned before some of the reasons for academics not to do business with the publisher Elsevier. A bunch of scientists are now boycotting Elsevier journals. I’ve just signed onto the boycott myself, and I urge my colleagues to do so too.

To be honest, this is an easy stand for me to take, since there’s pretty much never a time when refusing to publish in an Elsevier journal imposes a significant cost on me. For people in fields in which Elsevier journals are clearly better / more prestigious than others, the situation’s a bit different, I suppose.

Who knows what evil lurks in the hearts of men? The Bayesian doesn’t care.

Let me tell you a story (originally inspired by this post on Allen Downey’s blog).

Frank and Betsy are wondering whether a particular coin is a fair coin (i.e., comes up heads and tails equally often when flipped).  Frank, being a go-getter type, offers to do some tests to find out. He takes the coin away, flips it a bunch of times, and eventually comes back to Betsy to report his results.

“I flipped the coin 3022 times,” he says, “and it came up heads 1583 times. That’s 72 more heads than you’d expect with a fair coin. I worked out the p-value — that is, the probability of this large an excess occurring if the coin is fair — and it’s under 1%. So we can conclude that the coin is unfair at a significance level of  1% (or ‘99% confidence’ as physicists often say).”
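Here’s a quick way to check Frank’s arithmetic, if you’re so inclined: a minimal sketch in Python, using scipy (the choice of a two-sided test is mine and isn’t essential to the story).

```python
from scipy.stats import binom

n, k = 3022, 1583            # Frank's flips and heads
excess = abs(k - n / 2)      # 72 more heads than the expected 1511

# Two-sided p-value: the probability that a fair coin deviates from an
# even split by at least this many heads or tails.
p_value = binom.sf(n / 2 + excess - 1, n, 0.5) + binom.cdf(n / 2 - excess, n, 0.5)
print(p_value)               # a bit under 0.01, i.e. just under the 1% threshold
```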

You can take my word for it that Frank’s done the calculation correctly (or you can check it yourself with something like the sketch above). Now, I want you to consider two different possibilities:

  1. Frank is an honest man, who has followed completely orthodox (frequentist) statistical procedure. To be specific, he decided on the exact protocol for his test (including, for some reason, the decision to do 3022 trials) in advance.
  2. Frank is a scoundrel who, for some reason, wants to reach the conclusion that the coin is unfair. He comes up with a nefarious plan: he keeps flipping the coin for as long as it takes to reach that 1% significance threshold, and then he stops and reports his results.

(I thought about making up some sort of backstory to explain why scoundrel Frank would behave this way, but I couldn’t come up with anything that wasn’t stupid.)

Here are some questions for you:

  • What should Betsy conclude on the basis of the information Frank has given her?
  • Does the answer depend on whether Frank is an honest man or a scoundrel?

I should add one more bit of information: Betsy is a rational person — that is, she draws conclusions from the available evidence via Bayesian inference.

As you can guess, I’m asking these questions because I think the answers are surprising. In fact, they turn out to be surprising in two different ways.

There’s one thing we can say immediately: if Frank is a scoundrel, then the 1% significance figure is meaningless. It turns out that, if you start with a fair coin and flip it long enough, you will (with probability 1) always eventually reach 1% significance (or, for that matter, any other significance you care to name). So the fact that he reached 1% significance conveys no information in this scenario.
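You can watch this happen in a simulation. Here’s a rough sketch in Python (the setup is mine, not part of the original argument): flip a fair coin up to some maximum number of times and ask whether the running excess of heads ever crosses the two-sided 1% threshold, using the normal approximation. The fraction of runs that cross keeps creeping upward as the cap grows; with no cap it would eventually reach 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def reaches_significance(max_flips, z_crit=2.576):
    """True if a fair coin's running excess of heads ever hits the
    two-sided 1% threshold (normal approximation) within max_flips."""
    flips = rng.integers(0, 2, size=max_flips)   # 1 = heads, 0 = tails
    heads = np.cumsum(flips)
    n = np.arange(1, max_flips + 1)
    z = np.abs(heads - n / 2) / np.sqrt(n / 4)
    return bool(np.any(z >= z_crit))

for cap in (1_000, 10_000, 100_000):
    trials = 200
    hits = sum(reaches_significance(cap) for _ in range(trials))
    print(cap, hits / trials)   # the fraction grows as the cap grows
```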

On the other hand, the fact that he reached 1% significance after 3022 trials does still convey some information, which Betsy will use when she performs her Bayesian inference. In fact, the conclusion Betsy draws will be exactly the same whether Frank is an honest man or a scoundrel. The reason is that, either way, the evidence Betsy uses in performing her Bayesian inference is the same, namely that there were 1583 heads in 3022 flips.

[Technical aside: if Frank is a scoundrel, and Betsy knows it, then she has some additional information about the order in which those heads and tails occurred. For instance, she knows that Frank didn’t start with an initial run of 20 heads in a row, because if he had he would have stopped long before 3022 flips. You can convince yourself that this doesn’t affect the conclusion.]
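Here’s a compact way to see why the stopping rule drops out. For the specific sequence of heads and tails that Frank reports, the likelihood is

$$\Pr[E \mid P] \;=\; P^{\,k}\,(1-P)^{\,n-k}, \qquad n = 3022,\ k = 1583,$$

whatever rule Frank used to decide when to stop: the stopping rule only determines which sequences he could possibly end up reporting, not how probable each one is for a given P. Since the dependence on P is identical whether Frank is honest or a scoundrel, Bayes’s theorem hands Betsy the same posterior either way.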

That’s surprise #1. (At least, I think it’s kind of surprising. Maybe you don’t.) From a frequentist point of view, the p-value is the main thing that matters. Once we realize that the p-value quoted by scoundrel Frank is meaningless, you might think that the whole data set is useless. But in fact, viewed rationally (i.e., using Bayesian inference), the data set means exactly the same thing as if Frank had produced it honestly.

Here’s surprise #2: for reasonable assumptions about Betsy’s prior beliefs, she should regard this evidence as increasing the probability that the coin is fair, even though Frank thinks the evidence establishes (at 1% significance) the coin’s unfairness. Moreover, even if Frank’s results had ruled out the coin’s fairness at a more stringent significance (0.1%, 0.00001%, whatever), it’s always possible that he’ll wind up with a result that Betsy regards as evidence in favor of the coin’s fairness.

Often, we expect Bayesians and frequentists to come up with different conclusions when the evidence is weak, but we expect the difference to go away when the evidence is strong. But in fact, no matter how strong the evidence is from a frequentist point of view, it’s always possible that the Bayesian will view it in precisely the opposite way.

I’ll show you that this is true with some specific assumptions, although the conclusion applies more generally.

Suppose that Betsy’s initial belief is that 95% of coins are fair — that is, the probability P that they come up heads is exactly 0.5. Betsy has no idea what the other 5% of coins are like, so she assumes that all values of P are equally likely for them. To be precise, her prior probability density on P, the probability that the given coin comes up heads, is

Pr[P] = 0.95 δ(P-0.5) + 0.05

over the range 0 < P < 1. (I’m using the Dirac delta notation here.)

The likelihood function (i.e., the probability of getting the observed evidence for any given P) is

Pr[E | P] = A P^1583 (1-P)^1439.

Here A is a constant whose value doesn’t matter. (To be precise, it’s the number of possible orders in which heads and tails could have arisen.) Turning the Bayes’s theorem crank, we find that the posterior probability distribution is

Pr[P | E] = 0.964 δ(P-0.5) + B P^1583 (1-P)^1439.

Here B is some other constant I’m not bothering to tell you because it doesn’t matter. What does matter is the factor 0.964 in front of the delta function, which says that, in this particular case, Betsy regards Frank’s information as increasing the probability that the coin is fair from 95% to 96.4%. In other words, she initially thought that there was a 5% chance the coin was unfair, but based on Frank’s results she now thinks there’s only a 3.6% chance that it is.
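If you want to reproduce the 0.964, here’s the same calculation as a short Python sketch (scipy again), done with logarithms to avoid underflow. The combinatorial factor A cancels out of the odds, so it never needs to be computed.

```python
import numpy as np
from scipy.special import betaln

n, k = 3022, 1583
prior_fair = 0.95

# Log-likelihoods, up to the common combinatorial factor A (which cancels):
log_like_fair = n * np.log(0.5)              # P fixed at exactly 1/2
log_like_unfair = betaln(k + 1, n - k + 1)   # integral of P^k (1-P)^(n-k) dP over a uniform prior

# Posterior odds of "fair" = prior odds times the Bayes factor
posterior_odds = (prior_fair / (1 - prior_fair)) * np.exp(log_like_fair - log_like_unfair)
posterior_fair = posterior_odds / (1 + posterior_odds)
print(posterior_fair)                        # about 0.964
```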

It’s not surprising that a Bayesian and frequentist interpretation of the same result give different answers, but I think it’s kind of surprising that Frank and Betsy interpret the same evidence in opposite ways: Frank says it rules out the possibility that the coin is fair with high significance, but Betsy says it increases her belief that the coin is fair.  Moreover, as I mentioned before, even if Frank had adopted a more stringent criterion for significance — say 0.01% instead of 1% — the same sort of thing could happen.

If Betsy had had a different prior, this evidence might not have had the same effect, but it turns out that  you’d get the same kind of result for a pretty broad range of priors. In particular, you  could change the 95% in the prior to any value you like, and you’d still find that the evidence increases the probability that the coin is fair. Also, you could decide that the assumption of a uniform prior for the unfair coins is unrealistic. (There probably aren’t any coins that come up heads 99% of the time, for instance.) But if you changed that uniform prior to any reasonably smooth, not too sharply peaked function, it wouldn’t change the result much.

In fact, you can prove a general theorem that says essentially the following:

No matter what significance level s Frank chooses, and what Betsy’s prior is, it’s still possible to find a number of coin flips and a number of heads such that Frank rules out the possibility that the coin is fair at significance s, while Betsy regards the evidence as increasing the probability that the coin is fair.

I could write out a formal proof of this with a few equations, but instead I’ll just sketch the main idea. Let n be the number of flips and k be the number of heads. Suppose Frank is a scoundrel, flipping the coin until he reaches the desired significance and then stopping. Imagine listing all the possible pairs (n,k) at which he might stop. If you just told Betsy that Frank had stopped at one of those points, but not which one, then you’d be giving Betsy no information at all (since Frank is guaranteed to stop eventually). With that information, therefore, her posterior probability distribution would be the same as her prior. But that posterior probability distribution is also a weighted average of the posterior probability distributions corresponding to each of the possible pairs (n,k), with weights given by the probability that Frank stops at each of those points. Since the weighted average comes out the same as the prior, some terms in the average must give a probability of the coin being fair which is greater than the prior (and some must be less).
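In symbols: writing w(n,k) for the probability that scoundrel Frank stops at the pair (n,k),

$$\Pr[\text{fair}] \;=\; \sum_{(n,k)} w(n,k)\,\Pr[\text{fair} \mid n,k], \qquad \sum_{(n,k)} w(n,k) = 1,$$

so the prior probability of fairness is a weighted average of the possible posteriors. A weighted average can equal the prior only if some terms are at least as large as the prior, and (unless every term equals it exactly) some must be strictly larger, even though every one of those stopping points counts, for Frank, as significant evidence against fairness.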

Incidentally, in case you’re wondering, Betsy and Frank are my parents’ names, which fortuitously have the same initials as Bayesian and frequentist. My father Frank probably did learn statistics from a frequentist point of view (for which he deserves pity, not blame), but he would certainly never behave like a scoundrel.

sigmas : statistics :: __________ : astronomy

Americans above a certain age will remember that the SAT used to include a category of “analogy questions” of the form “puppy : dog :: ______ : cow.” (This is pronounced “puppy is to dog as blank is to cow,” and the answer is “calf.”) Upon reading Peter Coles’s snarky and informative blog post about the recent quasi-news about the Higgs particle, I thought of one of my own. (By the way, Peter’s post is worth reading for several reasons, not least of which is his definition of the word “compact” as it is used in particle physics.)

Answer after the jump.


Faster than light neutrinos

My brother Andy asked me what I thought of the news that the faster-than-light neutrino result had been confirmed. Like pretty much all physicists, I was very skeptical of the original result, and I’m still skeptical. Here’s what I told him:

This is a confirmation by the same group, using essentially the same technique. They’ve improved the setup in one way that should eliminate one possible source of error (specifically, they made the neutrino pulses narrower, which makes it easier to compare arrival and departure times). That is an improvement, but that wasn’t the only possible source of error, and in fact I never thought it was the most likely one. I’m still waiting for confirmation by an independent group.

Is the wavefunction physically real?

To be honest, I hate this sort of question. I don’t know what “real” means, and I always have a suspicion that the people advocating for one answer or another to this question don’t know either.

There’s a new preprint by Pusey, Barrett, and Rudolph that is being described as shedding light on this question. According to Nature News, “The wavefunction is a real physical object after all, say researchers.”

From Nature:

The debate over how to understand the wavefunction goes back to the 1920s. In the ‘Copenhagen interpretation’ pioneered by Danish physicist Niels Bohr, the wavefunction was considered a computational tool: it gave correct results when used to calculate the probability of particles having various properties, but physicists were encouraged not to look for a deeper explanation of what the wavefunction is.

Albert Einstein also favoured a statistical interpretation of the wavefunction, although he thought that there had to be some other as-yet-unknown underlying reality. But others, such as Austrian physicist Erwin Schrödinger, considered the wavefunction, at least initially, to be a real physical object.

The Copenhagen interpretation later fell out of popularity, but the idea that the wavefunction reflects what we can know about the world, rather than physical reality, has come back into vogue in the past 15 years with the rise of quantum information theory, Valentini says.

Rudolph and his colleagues may put a stop to that trend. Their theorem effectively says that individual quantum systems must “know” exactly what state they have been prepared in, or the results of measurements on them would lead to results at odds with quantum mechanics. They declined to comment while their preprint is undergoing the journal-submission process, but say in their paper that their finding is similar to the notion that an individual coin being flipped in a biased way — for example, so that it comes up ‘heads’ six out of ten times — has the intrinsic, physical property of being biased, in contrast to the idea that the bias is simply a statistical property of many coin-flip outcomes.

As far as I can tell, the result in this paper looks technically correct, but it’s important not to read too much into it. In particular, this paper has precisely nothing to say, as far as I can tell, on the subject known as the “interpretation of quantum mechanics.”

When people argue about different interpretations of quantum mechanics, they generally agree about the actual physical content of the theory (specifically about what the theory predicts will happen in any given situation) but disagree about what the predictions mean. In particular, the wavefunction-is-real camp and the wavefunction-isn’t-real camp would do the exact same calculations, and get the exact same results, for any specific experimental setup.

This paper considers a class of theories that are physically distinct from quantum mechanics — to be specific, a certain class of “hidden-variables theories,” although not the ones that were considered in most earlier hidden-variables work — and shows that they lead to predictions that are different from quantum mechanics. Therefore, we can in principle tell by experiment whether these alternative theories are right.

This is a nice result, but it seems to me much more modest than you’d think from the Nature description. I don’t think that people in the wavefunction-isn’t-real camp believe that one of these hidden-variables theories is correct, and therefore I don’t see how this argument can convince anyone that the wavefunction is real.

I admit that I’m not up-to-date on the current literature in the foundations of quantum mechanics, but I don’t know of anyone who was advocating in favor of the particular class of theories being described in this paper, and so to me the paper has the feel of a straw-man argument.

Personally, to the limited extent that I think the question is meaningful, I think that the wavefunction is real (in the ontological sense — mathematically, everyone knows it’s complex, not real!). But this preprint doesn’t seem to me to add significantly to the weight of evidence in favor of that position.

Faster-than-light neutrino results explained?

Note: The original version of this post was completely, embarrassingly wrong. I replaced it with a new version that says pretty much the opposite. Then Louis3 in the comments pointed out that I had misunderstood the van Elburg preprint yet again, but, if I’m not mistaken, that misinterpretation on my part doesn’t fundamentally change the argument. I hope I’ve got it right now, but given my track record on this I wouldn’t blame you for being skeptical!

If you’re reading this, you almost certainly know about the recent announcement by the OPERA group of experimental results showing that neutrinos travel slightly faster than light. I didn’t write about the original result here, because I didn’t have anything original to say. I pretty much agreed with the consensus among physicists: Probably something wrong with the experiments, extraordinary claims require extraordinary evidence, Bayesian priors, wait for replication, etc.

Recently, there’s been some buzz about a preprint being circulated by Ronald van Elburg claiming to have found an error in the OPERA analysis that would explain everything. If you don’t want to slog through the preprint itself (which is short but has equations), this blog post does a good job summarizing it.

van Elburg’s claim is that the OPERA people have incorrectly calculated the time of flight of a light signal between the source and detector in the experiment. (This is a hypothetical light signal, used for reference — no actual light signal went from one place to the other.) He goes through a complicated special-relativity calculation involving switching back and forth between an Earth-fixed (“baseline”) reference frame and a reference frame attached to a GPS satellite. I don’t understand why he thinks this complicated procedure is necessary: the final result is a relationship between baseline-frame quantities, and I don’t see why you can’t just calculate it entirely in the baseline frame. But more importantly, his procedure contains an error in the application of special relativity. When this error is corrected, the discrepancy he claims to have found goes away.

As a mea culpa for getting this completely wrong initially (and also for the benefit of the students in a course I’m teaching now), I’ve written up a critique of the van Elburg preprint, in which I try to explain the error in detail. I find it cumbersome to include equations in blog posts (maybe I just haven’t installed the right tools to do it), so I’ve put the critique in a separate PDF document. I’ll just summarize the main points briefly here.

van Elburg calculates the time of flight between source and detector in the following complicated way:

  1. He relates the satellite-frame source-detector distance to the baseline-frame distance via Lorentz contraction.
  2. He calculates the flight time in the satellite frame (correctly accounting for the fact that the detector is moving in this frame — which is what he claims OPERA didn’t do).
  3. He transforms back to the baseline frame.

At the very least, this is unnecessarily complicated. The whole point of special relativity is that you can work in whatever inertial frame you want, so why jump back and forth this way, rather than just doing the calculation in the Earth frame? In fact, I originally (incorrectly) thought that he’d done the calculation correctly but in an unnecessarily cumbersome way. It turns out that it’s worse than that, though: his calculation is just plain wrong.

The main error is in his equation (5), the step in which he relates the time of flight in the satellite frame to the time of flight in the Earth frame by a simple time-dilation factor.

But the time-dilation rule doesn’t apply in this situation. It’s only correct to calculate time dilation in this simple way (multiply by gamma) if you’re talking about events that are at the same place in one of the two reference frames. The standard example is two birthdays of one of the two twins in the twin paradox. When you’re considering two birthdays of the rocket-borne twin, you’re considering two events that are at the same place in the rocket frame, and the multiply-by-gamma rule is fine.

But in this case the time intervals under consideration are times of flight. That means that they’re time intervals between one event at one place (radio wave leaves the source) and another event at another place (radio wave arrives at detector). To properly relate time intervals of this sort in two different frames, you need the full machinery of the Lorentz transformation. If you use that full machinery to convert from satellite frame to Earth frame, you find that the time of flight comes out just the way you’d expect it to if you’d done the whole calculation in the Earth frame to begin with. (Of course it had to be that way — that’s the whole point of the principle of relativity.)
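To see the point concretely, here’s a small numerical sketch in Python. The numbers are purely illustrative (they’re not taken from the OPERA analysis or from the preprint): a signal crosses a baseline of length L in the Earth frame, and we compare (a) transforming the arrival event back with the full Lorentz transformation, which recovers the Earth-frame flight time exactly, with (b) just multiplying the satellite-frame flight time by gamma, which doesn’t.

```python
import numpy as np

c = 299_792_458.0        # speed of light (m/s)
v = 3_900.0              # satellite speed along the baseline (m/s); illustrative only
L = 730_000.0            # baseline length in the Earth frame (m); illustrative only

gamma = 1.0 / np.sqrt(1.0 - (v / c) ** 2)

# Earth-frame events: emission at (t=0, x=0), arrival at (t=L/c, x=L).
t_arr, x_arr = L / c, L

# Full Lorentz transformation into the satellite frame (moving at +v along x).
# The emission event (0, 0) maps to (0, 0), so the satellite-frame flight
# time is just the transformed arrival time.
t_arr_sat = gamma * (t_arr - v * x_arr / c**2)
x_arr_sat = gamma * (x_arr - v * t_arr)

# (a) Transform the arrival event back to the Earth frame: exact agreement.
t_back = gamma * (t_arr_sat + v * x_arr_sat / c**2)
print(t_back - t_arr)                 # ~0, up to round-off

# (b) Naive "multiply by gamma" applied to the flight time: wrong, because
# the emission and arrival events happen at different places in both frames.
print(gamma * t_arr_sat - t_arr)      # off by roughly -v*L/c**2, a few tens of ns here
```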

Now if the OPERA people had done their analysis the way van Elburg does (jumping back and forth with wild abandon between Earth and satellite frames), and if when they were in the satellite frame they had calculated a time of flight without accounting for the detector’s motion, then they would have been making an error of essentially the sort van Elburg describes. But as far as I can tell there’s no credible evidence, either in this preprint or in the OPERA paper, that they did the analysis this way at all, let alone that they made this error.

So this explanation of the OPERA results is a non-starter. Sorry for originally stating otherwise.

The passive voice can be used

I’m trying to be pretty rigorous in evaluating my students’ writing, but one thing I’m not telling them is to avoid the passive voice. I think that the avoid-the-passive rule, despite its popularity among writing teachers and in usage guides, is pretty much a superstition. It’s a marginally more worthwhile rule than other superstitions such as avoiding split infinitives, but only marginally.

Lots of people disagree with me about this, as I found when I participated in a big discussion of this on my brother’s Facebook wall recently. (I also seem to end up discussing the Oxford comma with surprising frequency on Facebook. No doubt once this information gets out I’ll be deluged with friend requests.) So I was glad to see this spirited defense of the passive by linguist Geoffrey Pullum recently.

Pullum also wrote a blistering takedown of Strunk and White a while back. I had mixed feelings about that one, but I think he’s got things right in the passive-voice piece.

Scientists are often taught to write in the passive in order to de-emphasize the role of the experimenter. You’re supposed to say “The samples were collected” instead of “We collected the samples,” because it’s not supposed to matter who did the collecting. Personally, I think this is another superstition, roughly equal in silliness to the no-passive-voice superstition. Ignore them both, and write whatever sounds best. (In most cases like the above,  I think that the active-voice construction ends up sounding more natural.)

I have heard one cogent argument in favor of teaching the avoid-the-passive rule: even if write-whatever-sounds-better is a superior rule, it’s not one that most inexperienced writers are capable of following. They need firm rules, even if those rules are just heuristics which they’ll later outgrow.

There’s some truth in this, and as long as we’re all clear that avoid-the-passive is a sometimes useful heuristic, as opposed to a firm rule, I have no major objections. But there are so many exceptions to this rule that I’m not convinced it’s all that good even as a heuristic. As Pullum points out, Orwell’s essay warning against the passive is itself 20% passive.

At least in the case of my students, overuse of the passive doesn’t seem like one of the top priorities to address. If I’m looking for heuristics to help them improve their writing, this one wouldn’t be near the top of the list. Here are two better ones that come immediately to mind:

  • Cut out all intensifiers (“very”, “extremely”, etc.), unless you have a good reason for them.
  • If you feel the need to include a qualifier like “As I mentioned earlier,” then the sentence in question probably doesn’t need to be there at all.
(Rules like these can be hard to follow. I initially wrote “a very good reason” in the first one, for instance.)

Addenda: Libby rightly points out “In other words” as a marker for the sort of thing I’m talking about in the second “rule.” A couple more I’d add to the list:

  • Don’t use fancy words for their own sake, especially if you’re in any doubt about the word’s precise meaning. Plain, familiar words are just fine.
  • Read your work aloud. Often, a sentence that looks OK on the page sounds unnatural when you hear it.
  • If you’ve got a really long paragraph (at a rough guess, greater than about 200 words), chances are that you’ve muddled together several different ideas, each of which deserves its own paragraph.
One final point, emphasized by Pullum: An additional problem with teaching the avoid-the-passive rule is that most people don’t find grammar intuitive and don’t even recognize passive constructions correctly a lot of the time. (This is the place where his takedown of Strunk and White is most compelling. Even they get it wrong most of the time.) The avoid-the-passive rule seems to be meant as a simple proxy for more difficult rules, but it’s not even simple for most people in the target group.

A scientist teaching writing

I’m teaching a first-year seminar this semester, which is quite a different sort of course for me. We’re at the halfway point in the semester, which seems like as good a time as any to reflect a bit on how it’s going.

First, some background. First-year seminars replaced the Core course we used to require of all entering students (a change I strongly supported, by the way). Under the current system, all students have to take a first-year seminar each semester of their first year. These courses cover a wide variety of topics, based on faculty interest and expertise, but they’re all supposed to have certain things in common. Perhaps the most important of these is that all seminars are “writing-intensive.”

My seminar is called “Space is Big.” It’s about how ideas about the size of the Universe have changed over time, focusing on three periods: the Copernican revolution, the early 20th century, and the present.

So what do I have to say at the halfway point?

First, reading and grading essays takes a lot of time. It’s much harder than grading problem sets and exams. This is not a surprise, of course. The most time-consuming part is writing comments on each essay. I find pretty detailed comments are necessary, both to clarify my own thinking about why I’m giving the grade I am and more importantly to give the student guidance for improvement.

There are some teaching duties we scientists have that others don’t (designing labs, for instance). When we feel like complaining about that sort of thing, we should remember how much easier we generally have it when it comes to grading. (Not that we’ll stop complaining. Complaining is one of the great joys of life. It sets us apart from the animals.)

Based on my experience so far, the main problems our students have with their writing involve organization and structure: they’re pretty good at the individual sentence level, but they sometimes have trouble combining those sentences in a coherent way. The most common serious flaw in my students’ essays is the long, rambling paragraph that contains lots of true facts in no discernible order. Other problems include unnecessary repetitiveness and puffed-up, diffuse phrases that add no meaning. (I should add that not all of my students have these problems: some of them write quite well.)

This should be reassuring to my science colleagues, some of whom are convinced that they’re not qualified to teach writing because they don’t know rules of grammar and usage. True, some science professors do get confused about grammar and usage in ways that you wouldn’t expect to see from, say, an English or history professor. (Present company excluded, of course! I’m a bit of a usage geek, and while I have many flaws, you’re not likely to catch me in a comma splice.) But based on my experience, the main sort of help students need with their writing concerns structuring an argument clearly, logically, and concisely, not misplacing apostrophes.

We as scientists are perfectly qualified to teach and evaluate writing in this sense. We spend huge amounts of our time writing and evaluating other people’s writing (papers, grant proposals, etc.). We wouldn’t have gotten anywhere in science without skill in both these areas. That’s not to say that teaching and evaluating writing is easy — for scientists or anyone else. But we can do it.

And by the way, for those who are concerned about gaps in their ability to teach grammar and usage (or other aspects of writing), the University’s Writing Center has good support for faculty and students.

This course is far more work than a “normal” course of the sort I’m used to, but on the whole it’s been fun, mostly because I get to read and think about familiar subjects in a new way. I urge my science colleagues not to be scared to try it out.

Doom from the sky

I was on the Channel 8 Richmond TV news last night. You can see the video here.

Since I’m apparently the only astrophysicist in the greater Richmond area, I sometimes get asked to comment on space stories. I think this is my first time on this channel; I’ve been on Channel 6 from time to time.

In this case, they wanted to talk about the UARS satellite, which is going to reenter the atmosphere in the next couple of weeks. Some pieces are predicted to survive reentry and reach the ground.

Two disappointing things about this piece:

  1. The reporter says that there’s a 1 in 3200 chance of “being hit” by the debris. This is NASA’s estimate of the chance of someone, somewhere in the world being hit. The chance of any given person (such as you) being hit is 7 billion times smaller — i.e., one in 20 trillion. I stated that in the interview, but they chose not to use that part. The way they stated it is extremely misleading.
  2. The Santa Claus line is mine.