Someone doesn’t understand probabilities

August 21st, 2014 by Ted Bunn

I know: as headlines go, this one is not exactly Man Bites Dog. Let me be a bit more specific. Either the New York Times or trial lawyers don’t understand probability. (This, incidentally, is a good example of the inclusive “or”.)

The Times has an interactive feature illustrating the process by which lawyers decide whether to allow someone to be seated on a jury. For those who don’t know, in most if not all US courts, lawyers are allowed to have potential jurors stricken from jury pools, either for cause, if there’s evidence that a juror is biased, or using a limited number of “peremptory challenges” to remove people that the lawyer merely suspects will be unfavorable to his or her side. The Times piece asks you a series of questions and indicates how your answers affect the lawyers’ opinion about you in a hypothetical lawsuit by an investor suing her money manager for mismanaging her investments.

The first two questions are about your job and age. As a white-collar worker, I’m told that I’d be more likely to side with the defendant, but the fact that I’m over 30 makes me more likely to favor the plaintiff. A slider at the top of the screen indicates the net effect:

So far so good. Question 3 then asks about my income. Here are the two possible outcomes:

So if I’m high-income, there’s no effect, but if I’m low-income, I’m more likely to side with the plaintiff. This is logically impossible. If one answer shifts the probability one direction, the other answer must shift it the other direction (by some nonzero amount).

Before the lawyers found out the answer, they knew that I was either low-income or high-income. (A waggish mathematician might observe that the possibility that my income is exactly $50,000 is not included in the two possibilities. This is why no one likes a waggish mathematician.) The lawyers’ assessment of me before asking the question must be a weighted average of the two subsequent possibilities, with weights given by their prior beliefs about what my income would turn out to be. For instance, if they thought initially that there was a 70% chance that I’d be in the high-income category, then the initial probability should have been 0.7 times the high-income probability plus 0.3 times the low-income probability.

That means that if one answer to the income question shifts the probability toward the plaintiffs, then the other answer must shift the probability in the other direction.
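To make the weighted-average argument concrete, here’s a minimal sketch in Python. The numbers are invented for illustration (the Times doesn’t report actual probabilities):

```python
# Law of total probability: the lawyers' pre-question assessment must be
# a weighted average of the two post-answer assessments.
# All numbers below are hypothetical.
p_plaintiff_given_high = 0.40  # P(favors plaintiff | high income)
p_plaintiff_given_low = 0.55   # P(favors plaintiff | low income)
p_high = 0.70                  # prior belief that the juror is high-income

p_prior = p_high * p_plaintiff_given_high + (1 - p_high) * p_plaintiff_given_low
print(p_prior)  # ~0.445: strictly between the two conditional values
```

Because the prior always lands strictly between the two conditional probabilities (whenever they differ), learning the answer must move the needle one way or the other; it can’t stay put for one answer while moving for the other.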

So either the lawyers the reporter talked to are irrational or the reporter has misunderstood them. For what it’s worth, my money is on the first option. Lots of people don’t understand probabilities, but it seems likely to me that the Times reporters would have asked these questions straightforwardly and accurately reported the answers they heard from the lawyers they talked to.

If that’s true, it seems like it should present a money-making opportunity for people with expertise in probability. Lawyers who hired such people as consultants would presumably do a better job at jury selection and win more cases.


August 20th, 2014 by Ted Bunn

Update: Got a very nice and very prompt note back from the people who run the place. Apparently they’ve removed this material.

My wife and I just got back from a very nice vacation in Nova Scotia, which is very beautiful (and much cooler than Richmond in August). Among other things (such as rafting on the tidal bore in the Bay of Fundy, which I highly recommend), we visited the Joggins Fossil Cliffs, a UNESCO World Heritage Site where, as the name suggests, you can see tons of fossils. The site includes both a museum and a stretch of beach you can walk along and spot fossils in their natural habitat, so to speak. There are guides to show you things and help you figure out what you’re seeing. On the whole, it’s quite interesting and educational. If you’re nearby, it’s definitely worth a visit.

The site is run by a nonprofit educational organization. As usual, they get part of their revenue from a gift shop. Among the things you can buy in the gift shop are pretty polished stones.

So far so good. Now for the curmudgeonliness. The polished stones are accompanied by this pamphlet.

As I’m sure I don’t need to tell anyone who’s reading this, the last sentence of each description is complete nonsense. Stones and crystals do not have any effect on the human psyche.

I understand that the organization needs to raise money, but is it too much to ask that they refrain from actively promoting pseudoscience in doing so? The gift shop does not stock Creationist books that claim the Earth is 6000 years old, presumably because to do so would undermine their educational mission. This may be somewhat different in degree but not at all different in kind.

This sort of thing might seem harmless, but it’s not. People really believe in things like this. If they didn’t, there wouldn’t be Web sites like (I’d rather not link to it) that will sell you crystals to cure hundreds of different ailments. Look at this screen shot, for instance.

This is a link to 728 items you can buy that purport to help you if you have cancer but in fact do nothing. People with cancer (among other things) are being fleeced and are being given false hope by this sort of nonsense. For a science educator to give any sort of seal of approval to this is not OK.

As my colleague Matt Trawick pointed out, the last item on the list is particularly interesting, in a Catch-22 sort of way. Suppose that you buy some sodalite and it does in fact cause you “to become logical and rational.” Would you then go ask for your money back?

By the way, I’ve sent a note to the organization that runs the Fossil Cliffs outlining my concern. I’ll post something if I hear anything back. (See Update at the top.)

It’s not rocket science

August 4th, 2014 by Ted Bunn

You may have heard about these NASA engineers who claim to have demonstrated a reactionless drive mechanism — that is, something that can generate thrust without shooting anything out the back end. Such a device would violate one of the most well-established laws of physics, namely conservation of momentum. It would be an incredibly big deal if true.

Of course it’s not true, for all the usual reasons: Extraordinary claims require extraordinary evidence, never believe an experiment until it’s been confirmed by a theory, etc.

You can be confident that this result is wrong by using reasoning, or, as some people like to call it, Bayesian reasoning. To be specific, the new experimental result causes you to update your prior beliefs. Your prior belief was, or at least should have been, that there’s an incredibly high probability that momentum is conserved, particularly in situations like this one that are described by the best-tested theory in the history of science. When your prior is extremely strong (in this case because of centuries’ worth of experimental confirmation), even a very well-done experiment is not enough to dislodge it.
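Here’s that update in miniature, with deliberately made-up numbers (a sketch of the reasoning, not an analysis of the actual experiment): even if you grant the measurement a generous likelihood ratio in favor of the anomaly, an extremely strong prior barely moves.

```python
# Bayesian update in odds form. Both numbers are illustrative guesses.
prior = 1e-12            # prior P(momentum conservation fails here)
likelihood_ratio = 1000  # how much the data favor the anomaly (generous)

prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)
print(posterior)  # ~1e-9: still overwhelmingly likely to be an error
```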

Phil Plait’s post is a reasonable place to go for more details, although he’s much too kind at a couple of points:

I’m not saying it’s wrong, but I am saying it’s very, very likely to be some sort of measurement or experimental error.

This is bizarrely wishy-washy. I, for one, am saying it’s wrong.

Plait also says

The only other way this device could possibly work is if it’s interacting with “virtual particles”, an interesting idea, but a highly speculative one.

Again, this is far too kind. To say that this works by “interacting with virtual particles” means precisely as much as saying that it works by interacting with invisible blue fairies. “Virtual particles,” as the term has been used in physics for nearly a century, would definitely not produce an effect like this. If the authors mean anything by this claim at all, then they are using that term in a way that bears no relation to its usual meaning, but of course there’s no indication at all of what they do think it means. They should just call them invisible blue fairies instead, to avoid confusion.

Despite my complaints, Plait does sound an appropriate, if understated, note of skepticism. He also links to a couple of posts by my old friend John Baez, which treat the subject with an appropriate level of scorn. No euphemisms like “highly speculative” for him:

“Quantum vacuum virtual plasma” is something you’d say if you failed a course in quantum field theory and then smoked too much weed.

Despite being a mathematician, Baez digs deeper than most people into the experimental details, pointing out one astonishing fact that I haven’t seen mentioned elsewhere: the article describes in detail the workings of the vacuum chamber in which the experiment was performed, but the actual experiment was done “at ambient atmospheric pressure” (i.e., not in a vacuum). This is important because one obvious possible source of error is the production of air currents surrounding the device.



Another Mathematica bug

July 5th, 2014 by Ted Bunn

I got a reply to my last bug report, acknowledging that it’s a bug and saying they’d send it on to their engineers.

Here’s another:

The first one should only evaluate to zero if it’s zero for all a,b. The second indicates that that’s not the case. This one cost me several hours, as I searched elsewhere for why my results didn’t make sense. (The offending bit was originally in the middle of a larger expression; otherwise, I would have figured it out more quickly.)

Update: The Mathematica people have acknowledged this bug and passed the word on to their developers, presumably to be fixed in the next version.

Many worlds

June 30th, 2014 by Ted Bunn

I just wanted to link to Sean Carroll’s post defending the many-worlds interpretation of quantum mechanics. Sean has a habit of getting this sort of thing right.

He explains that the multiple worlds are not an add-on to the theory but instead are simply what happens naturally when you take the equations of the theory at face value. The standard (“Copenhagen”) interpretation is the one that needs to postulate an ad hoc extra rule. We should simply rename things:

  • The “many-worlds interpretation of quantum mechanics” should henceforth be known as “quantum mechanics.”
  • The Copenhagen interpretation should henceforth be known as “the disappearing-worlds interpretation of quantum mechanics.”

The system works

June 20th, 2014 by Ted Bunn

Peer review, that is.

Remember BICEP2?

They announced a detection of B-mode microwave background polarization, which would be strong evidence that inflation happened in the early universe, but some people expressed doubt about whether they’d adequately eliminated the possibility that what they were seeing was contamination due to more nearby sources of radiation, particularly Galactic dust. (Some other very eminent people then said silly things.)

All of this occurred before the paper describing the results had undergone peer review. The paper has now been accepted for publication in the prestigious journal Physical Review Letters, with significant changes. As of now, the arXiv still has the original version, so it’s easy to compare it with the published version.

The authors have removed all discussion of one of the dust models that they used to argue against the possibility of dust contamination. This is the notorious “DDM2” model, which was based in part on data gleaned from a slide shown in a talk. A footnote explains the removal of this model, saying in part that “we have concluded the information used for the DDM2 model has unquantifiable uncertainty.”

Although the concerns about the DDM2 model got the most attention, people raised a number of concerns about the preprint’s discussion of dust contamination. Presumably the referees agreed, because the published paper is much more cautious in its claims. For instance, take a look at the end of the preprint,

The long search for tensor B-modes is apparently over, and a new era of B-mode cosmology has begun.

and compare it with the published version,

We have pushed into a new regime of sensitivity, and the high-confidence detection of B-mode polarization at degree angular scales brings us to an exciting juncture. If the origin is in tensors, as favored by the evidence presented above, it heralds a new era of B-mode cosmology. However, if these B modes represent evidence of a high-dust foreground, it reveals the scale of the challenges that lie ahead.

This is a case in which peer review clearly improved the quality of the paper: the second version is much more accurate than the first.

Other than the removal of the DDM2 model, I don’t think that the actual results have changed; the difference is all in the description of their significance. This is exactly as it should be. Even those of us who harbor doubts about the interpretation generally agree that this is a very important and exciting data set. The researchers deserve high praise for performing an experimental tour de force.

Some people say that the BICEP team shouldn’t have released their results at all until after peer review. I think that this objection is wrongheaded. There’s no way of knowing, but I bet that the official referees were able to give a much better critique of the paper because lots of other experts had been examining and commenting on it.

One argument is that we shouldn’t publicize unreviewed results because this sort of thing makes us look bad. The media gave a lot of coverage to the original result and are naturally covering subsequent events as a reversal, which is perhaps embarrassing. In this particular case, of course I wish that the earlier reports had emphasized the doubts (which started to appear right away), but in general I can’t get too upset about this problem. I think it’s much better if the media covers science as it actually is — people get exciting results, and then the rest of the community chews them over before deciding what to think about them — instead of a sanitized version. It seems clear to me that the advantages of an open discussion in figuring out the truth far outweigh the disadvantages in (arguably) bad publicity.

There’s one strange thing about the BICEP2 paper. It appeared in Physical Review Letters, which traditionally is limited to very short papers. The limit used to be four pages. It’s now expressed in word count, but it comes to about the same thing. The published paper is at least five times longer than this limit. I don’t know if this has ever happened before.

Here’s another piece of the puzzle. The preprint doesn’t say which journal it was submitted to but is formatted in a style that doesn’t match PRL at all. In particular, the citations are in author-year format, whereas PRL uses numbered citations.

It’s no big deal, but I’m mildly curious about the explanation for these facts.

Strange Mathematica behavior

June 19th, 2014 by Ted Bunn

So far so good. This is the correct result. Now multiply the x by 1.0:


That makes Mathematica think that the integral fails to converge.

I found this out when reproducing some calculations from a couple of years ago. It seemed to work back then, so this behavior seems to have been introduced in a recent version of Mathematica.

I know of various reasons why putting in the 1.0 could make a difference (because it forces Mathematica to think in terms of floating-point numbers with finite accuracy, rather than exact integers), but I don’t think any of them should make a difference here. The integral is failing to converge at x=0 (I checked that the problem is there, not at infinity), and the integrand is perfectly well-behaved there, even if you replace the 1.0 by any other complex number.
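For what it’s worth, the generic exact-versus-floating-point distinction looks like this (a Python illustration of the general phenomenon, not a reconstruction of the integral in question):

```python
from fractions import Fraction

# Exact rational arithmetic vs. finite-precision floats.
exact = Fraction(1, 10) + Fraction(2, 10)
floating = 0.1 + 0.2

print(exact == Fraction(3, 10))  # True: exact arithmetic stays exact
print(floating == 0.3)           # False: floats carry rounding error
```

Multiplying by 1.0 pushes Mathematica from the first regime into the second, which can legitimately change results near the edge of convergence; the complaint here is that this integrand is nowhere near such an edge.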


10 Scientific Ideas That Scientists Wish You Would Stop Misusing

June 16th, 2014 by Ted Bunn

That’s the headline of a piece on io9. I find the headline a bit obnoxious — we scientists are lecturing you, the unwashed masses, about what you’re doing wrong, when in fact scientists themselves are to blame for at least some of the misunderstandings described. But the actual content is very good.

Sean Carroll says very sensible things about “proof”. Science is mostly about accumulation of evidence, which allows us to update our model of the world via Bayesian reasoning (or as I like to call it “reasoning”).

Jordan Ellenberg takes aim at “statistically significant”:

“Statistically significant” is one of those phrases scientists would love to have a chance to take back and rename. “Significant” suggests importance; but the test of statistical significance, developed by the British statistician R.A. Fisher, doesn’t measure the importance or size of an effect; only whether we are able to distinguish it, using our keenest statistical tools, from zero. “Statistically noticeable” or “Statistically discernable” would be much better.

Well said. The fact that something can be “statistically significant” and simultaneously utterly unimportant is very often lost, particularly in descriptions of medical findings.
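A toy calculation (with invented numbers) shows how easily the two come apart: a trivially small effect becomes “statistically significant” once the sample is large enough.

```python
import math

effect = 0.01  # effect size: one hundredth of a standard deviation
n = 100_000    # sample size

# One-sample z-test against zero, with known unit variance.
z = effect * math.sqrt(n)
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # two-sided p-value
print(f"z = {z:.2f}, p = {p:.4f}")  # z is about 3.16, p < 0.01
```

The p-value clears any conventional significance threshold, yet a hundredth of a standard deviation is, for most medical purposes, no effect at all.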

This item illustrates what bothers me about the headline of the piece, by the way. It smacks of blaming the victim. Scientists are at least as much to blame as anyone else for talking about “statistically significant” results in a misleading way.

The other items are well worth reading too. I particularly recommend the ones on quantum weirdness and “natural”.


Female hurricane names

June 9th, 2014 by Ted Bunn

Supposedly, hurricanes with feminine-sounding names are more deadly than those with male-sounding names. That’s the conclusion of a study published in PNAS. Put me down on the side of skepticism.

I should point out that the published paper discusses two different sets of results: an analysis of the death toll from past hurricanes, and a set of surveys of people’s perceptions of hurricanes based only on their names. The latter study shows that people do perceive hurricanes as milder when they have feminine-sounding names (in the absence of other information). I’m quite prepared to believe that one. The first finding, about actual people actually dying, is the one I don’t believe.

First, this is precisely the sort of result that’s most susceptible to publication bias. (If you checked for this and found nothing, you wouldn’t publish, but if you found something, you would.) This is the main reason that some people claim that “most published research findings are false.”

Add to that the closely related problem that’s sometimes known as p-hacking. This is the practice of testing out multiple hypotheses, data sets, or statistical methods, and only reporting the ones that yield interesting results. P-hacking artificially inflates the statistical significance of your results. From the PNAS paper:

The analyses showed that the change in hurricane fatalities as a function of MFI [a measure of how masculine or feminine a name is perceived to be] was marginal for hurricanes lower in normalized damage, indicating no effect of masculinity-femininity of name for less severe storms. For hurricanes higher in normalized damage, however, this change was substantial, such that hurricanes with feminine names were much deadlier than those with masculine names.

To summarize, “the first thing we tried didn’t yield an interesting result, but we kept trying until we found something that did.” (To give the authors credit, at least they acknowledge that they did this. That doesn’t make the result right, but it’s more honest than the alternative.)
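A quick simulation shows how much damage this practice does. This is a deliberately crude toy model, not a reconstruction of the PNAS analysis: run many “studies,” each of which tests 20 independent null hypotheses on pure noise and reports a finding if any test clears p < 0.05.

```python
import random

random.seed(0)

def one_test(n=30):
    """Crude z-test of mean 0 on pure-noise data (known sigma = 1).
    Returns True if the result is 'significant' at the 0.05 level."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    mean = sum(xs) / n
    z = mean * n ** 0.5
    return abs(z) > 1.96  # two-sided alpha = 0.05

studies = 2000
hits = sum(any(one_test() for _ in range(20)) for _ in range(studies))
print(hits / studies)  # close to 1 - 0.95**20, about 0.64 -- not 0.05
```

Each individual test has the advertised 5% false-positive rate, but a researcher who keeps trying hypotheses until one “works” finds something roughly two times out of three even when there is nothing to find.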

But here’s the biggest problem. The study used hurricanes from 1950 onwards. Hurricanes were essentially all given female names until 1978, so the masculine names are heavily weighted toward late times. So something else that produces a change in hurricane deadliness over time would mimic the effect that is seen. An obvious candidate: better early warning systems, which reduce fatalities for any given level of storm severity.

Not surprisingly, other people have pointed out problems like these. The authors of the study have responded. If you want to decide what you think about this, you should definitely read what they have to say. I’ll just point out a few things.

On the most important issue (the question of whether to use pre-1979 all-female-named hurricanes), the authors point out that their data used degree of perceived femininity of names, not simply a male-female binary, so there is information in the pre-1979 names. This is not sufficient to resolve the problem, which is that the pre-1979 data are, on average, very different in MFI from the later data. They then say

Although it is true that if we model the data using only hurricanes since 1979 (n=54) this is too small a sample to obtain a significant interaction, when we model the fatalities of all hurricanes since 1950 using their names’ degree of femininity, the interaction between name-femininity and damage is statistically significant.

This response, of course, simply digs them deeper into the p-hacking hole. Essentially, they’re saying that they used the less reliable data set because the more reliable one didn’t give an interesting result.

The authors give a second, better response to this problem:

We included elapsed years (years since the hurricane) in our modeling and this did not have any significant effect in predicting fatalities. In other words, how long ago the storm occurred did not predict its death toll.

This is potentially a good argument, although the details depend on precisely what tests they performed. The relevant question is whether replacing the MFI with the time since the hurricane gives a model that fits reasonably well (in which case assuming an MFI effect is not necessary). Unfortunately, these results are not presented in the paper, so there’s no way to tell if that’s what was done. The authors do mention that they tried including elapsed time as one of their variables, but without enough specifics to tell what was actually done.

The more I think about it, the stranger it is that this issue is not addressed in detail in the paper. Usually, in this sort of study, you control for other variables that might mimic the signal you’re looking for. In this case, an overall drift in hurricane deadliness with time is by far the most obvious such variable, and yet it’s not included in any of the models for which they compute goodness of fit. It’s very strange that the authors would not include this in their models, and even stranger that the reviewers let them.

Even if BICEP2 is wrong, inflation is still science

June 4th, 2014 by Ted Bunn

Paul Steinhardt played a major role in developing the theory behind cosmological inflation, but he has since turned into one of the theory’s biggest detractors. Sometimes, theorists get so attached to their theories that they become blind proponents of them, so it’s quite commendable for someone to become a critic of a theory that he pioneered. But of course that doesn’t mean that Steinhardt’s specific criticisms are correct.

He’s got a short and fiery manifesto in Nature (not behind a paywall, I think, but if you can’t get to it, let me know). The title and subheading:

Big Bang blunder bursts the multiverse bubble

Premature hype over gravitational waves highlights gaping holes in models for the origins and evolution of the Universe.

For a column like this (as opposed to a research article), the author isn’t necessarily responsible for the title, but in this case the headlines pretty accurately capture the tone of the piece.

The hook for the piece is the controversy surrounding the BICEP2 claim to have detected the signature of gravitational waves from inflation in the cosmic microwave background (CMB) radiation. Since my last post on this, the reasons for doubt have gotten stronger: two preprints have come out giving detailed arguments that the BICEP team have not made a convincing case against the possibility that their signal is due to dust contamination. The BICEP team continues to say everything is fine, but, as far as I know, they have not provided a detailed rebuttal of the arguments in the preprints.

For what it’s worth, I find the doubts raised in these preprints to be significant. I’m not saying the BICEP2 result is definitely not CMB, but there’s significant doubt in my mind. At this point, I would place an even-odds bet that they have not seen CMB, but I wouldn’t make the bet at 5-1 odds.

So I share Steinhardt’s skepticism about the BICEP2 claim, at least to some extent. But he leaps from this to a bunch of ridiculously overblown statements about the validity of inflation as a scientific theory.

The common view is that [inflation] is a highly predictive theory. If that was the case and the detection of gravitational waves was the ‘smoking gun’ proof of inflation, one would think that non-detection means that the theory fails. Such is the nature of normal science. Yet some proponents of inflation who celebrated the BICEP2 announcement already insist that the theory is equally valid whether or not gravitational waves are detected. How is this possible?

The “smoking gun” is a terribly overused metaphor in this context, but here it’s actually helpful to take it quite seriously. A smoking gun is strong evidence that a crime has been committed, but the absence of a smoking gun doesn’t mean there was no crime. That’s exactly the way it is with inflation, and despite what Steinhardt says, this is perfectly consistent with “normal” science. People searched for the Higgs boson for decades before they found it. When a search failed to find it, that didn’t mean that the Higgs didn’t exist or that the standard model (which predicted the existence of the Higgs) wasn’t “normal science.”

Steinhardt knows this perfectly well, and by pretending otherwise he is behaving shamefully.

Steinhardt goes on to say

The answer given by proponents is alarming: the inflationary paradigm is so flexible that it is immune to experimental and observational tests.

Whenever someone attributes an opinion to unnamed people and provides no citation to back up the claim, you should assume you’re being swindled. I know of no “proponent” of inflation who bases his or her support on this rationale.

There is a true statement underlying this claim: inflation is not a unique theory but rather a family of theories. There are many different versions of inflation, which make different predictions. To put it another way, the theory has adjustable parameters. Again, this is a perfectly well-accepted part of “normal science.” If BICEP2 turns out to be right, they will have measured some of the important parameters of the theory.

It’s certainly not true that inflation is immune to tests. To cite just one obvious example, inflation predicts a spatially flat Universe. If we measured the curvature of the Universe and found it to be significantly different from zero, that would be, essentially, a falsification of inflation. As it turns out, inflation passed this test.

I put the word “essentially” in there because what inflation actually predicts is that the probability of getting a curved Universe is extremely low, not that it’s zero. So a measurement of nonzero curvature wouldn’t constitute a mathematical proof that inflation was false. Once again, that’s science. No matter what Popper says, what we get in science is (probabilistic) evidence for or against theories, not black-and-white proof. We use Bayesian reasoning (or as I like to call it, “reasoning”) to draw conclusions from this evidence. A curved Universe would have been extremely strong evidence against inflation.

Part of Steinhardt’s objection to inflation stems from the fact that inflationary models often predict a multiverse. That is, in these theories there are different patches of the Universe with different properties.

Scanning over all possible bubbles in the multiverse, everything that can physically happen does happen an infinite number of times. No experiment can rule out a theory that allows for all possible outcomes. Hence, the paradigm of inflation is unfalsifiable.

This once again ignores the fact that essentially all scientific tests are probabilistic in nature. Because measurements always have some uncertainty, you pretty much never measure anything that allows you to conclude “X is impossible.” Instead, you get measurements that, by means of [Bayesian] reasoning, lead to the conclusion “X is extremely unlikely.” Even if anything can happen in these bubbles, some things happen much more than others, and hence are much more likely. So by observing whether our patch of the Universe fits in with the likely outcomes of inflation or the unlikely ones, we build up evidence for or against the theory. Normal science.

To be fair, I should say that there are technical issues associated with this paradigm. Because inflation often predicts an infinite number of bubbles, there are nontrivial questions about how to calculate probabilities. The buzzword for this is the “measure problem.” To be as charitable as possible to Steinhardt, I suppose I should allow for the possibility that that’s what he’s referring to here, but I don’t think that that’s the most natural reading of the text, and in any case it’s far from clear that the measure problem is as serious as all that.

One final note. As Steinhardt says, future experiments will shed light on the BICEP2 situation, and these experiments will justifiably face heightened scrutiny:

This time, the teams can be assured that the world will be paying close attention. This time, acceptance will require measurements over a range of frequencies to discriminate from foreground effects, as well as tests to rule out other sources of confusion. And this time, the announcements should be made after submission to journals and vetting by expert referees. If there must be a press conference, hopefully the scientific community and the media will demand that it is accompanied by a complete set of documents, including details of the systematic analysis and sufficient data to enable objective verification.

For what it’s worth, I don’t think that people should necessarily wait until results have been refereed before announcing them publicly. In astrophysics, it’s become standard to release preprints publicly before peer review. I know that lots of scientists disagree about this, but on balance I think that that’s a good thing. The doubts that have been raised about BICEP2 could very easily not have been caught by the journal referees. If they’d waited to announce the result publicly until after peer review, we could easily be having the same argument months later, about something that had undergone peer review. Errors are much more likely to be caught when the entire community is scrutinizing the results rather than one or two referees.

I should add that Steinhardt is completely right about the “accompanied by a complete set of documents” part.

Update: Peter Coles has a very nice post (as usual) on this. His views and mine are extremely similar.