p-values aren’t wrong; they’re just uninteresting

March 9th, 2015

The journal Basic and Applied Social Psychology has come out with a ban on p-values. To be precise, they’ve banned the “null hypothesis significance testing procedure” from articles published in the journal. This ban means that authors in the journal can’t claim that an effect they see in their data is “statistically significant” in the usual way that we’re all accustomed to reading.

Like all right-thinking people, I believe that the only coherent way to think about statistical questions is the Bayesian way, and that there are serious problems with the alternative “frequentist” approach. Moreover, the sort of (frequentist) significance testing banned by this journal can indeed lead to serious problems. It’s a large part of the reason that a strong argument can be made that (at least in some scientific disciplines) most published research findings are false.

All of that suggests that I ought to applaud this decision, but at the risk of seeming disloyal to my fellow Bayesians, I don’t. In fact, the journal editors’ decision to impose this ban makes me trust the quality of the journal less than I otherwise would, not more.

I was pleased to see that my old friend Allen Downey, always a voice of sanity on matters of this sort, is of the same opinion. I won’t rehash everything he says in his post, but I heartily endorse it.

The main thing to realize is that the techniques in question aren’t actually wrong. On the contrary, they correctly answer the questions they’re supposed to answer.

If your data allow you to reject the null hypothesis with a significance (“p-value”) of 5%, that means that, if the null hypothesis were true, there’d be only a 5% chance of getting data at least as extreme as the data you actually got.

Some people — or so I’ve been told — labor under the misconception that the p-value tells you the probability that the null hypothesis is true, but it doesn’t. I’m going to rehash the old story here; skip ahead if you’ve heard it before.

Suppose that a pregnancy test yields correct results 95% of the time. Pat takes the test, which comes out positive. That means that the “null hypothesis” that Pat is not pregnant can be ruled out with a significance (p-value) of 5%. But it does not mean that there’s a 95% chance that Pat is pregnant. The probability that Pat is pregnant depends both on the result of the test and on any additional information you have about Pat — that is, the prior probability that Pat is pregnant. For example, if Pat is anatomically male, then the probability of pregnancy is zero, regardless of the test result.
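
To make this concrete, here is a minimal sketch (in Python) of the calculation Pat actually cares about. The 95% accuracy is the figure from the example above; the priors are numbers I have picked purely for illustration.

```python
# Posterior probability of pregnancy given a positive test, via Bayes' theorem.
# The test is assumed to be 95% accurate in both directions (sensitivity and
# specificity both 0.95); the priors are chosen purely for illustration.

def posterior_pregnant(prior, sensitivity=0.95, specificity=0.95):
    """P(pregnant | positive test) for a given prior P(pregnant)."""
    p_pos_given_pregnant = sensitivity
    p_pos_given_not = 1.0 - specificity   # false-positive rate
    p_pos = prior * p_pos_given_pregnant + (1.0 - prior) * p_pos_given_not
    return prior * p_pos_given_pregnant / p_pos

for prior in [0.5, 0.1, 0.01, 0.0]:
    print(f"prior = {prior:4.2f} -> P(pregnant | positive) = {posterior_pregnant(prior):.3f}")
```

The p-value-style number (5%) never changes, but the answer to Pat's question ranges from 95% all the way down to zero, depending entirely on the prior.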

Needless to say, Pat doesn’t care about the p-value; Pat cares about whether Pat is pregnant. The p-value is not wrong; it’s just uninteresting. As I’ve said before, this can be summarized with a convenient Venn diagram:

[Venn diagram]

So if p-values are uninteresting, why shouldn’t the journal ban them?

The main reason is that they can be an ingredient in drawing useful conclusions. You can combine the p-value with your prior knowledge to get an answer to the question you’re actually interested in (“How likely is the hypothesis to be true?”). The p-value doesn’t directly answer Pat’s question about whether she’s pregnant, but that doesn’t mean it isn’t valid and useful information about the reliability of the pregnancy test, which she can (and should) use in drawing conclusions.

As far as I can tell, the argument in favor of banning p-values is that people sometimes misinterpret them, but that’s a weak argument for a ban. It’s worth distinguishing two possibilities here:

  1. The article describes the statistical procedures and results accurately, but statistically illiterate readers misunderstand them. In this situation, I’m perfectly happy to go all libertarian and caveat emptor. Why should an intelligent reader be denied the right to hear about a p-value just because someone else might be misled due to his own ignorance?
  2. The article describes the statistical procedures and results in a misleading way. Obviously, the journal should not allow this, but a ban shouldn’t be necessary to prevent it. The whole point of a peer-reviewed journal is that experts are evaluating the article to make sure it doesn’t contain incorrect or misleading statements. If the editors feel the need for a ban, then they are in effect admitting that they and their referees cannot effectively evaluate the validity of an article’s statistical claims.

The editorial describing the reasons for the ban states, incorrectly, that the banned technique is “invalid.” Moreover, the perceived need for a ban seems to me to arise from the editors’ lack of confidence in their own ability to weed out good statistics from bad. That’s why I say that, although I have no great love for p-values, this ban reduces my willingness to trust any results published in this journal.

If only scientists hyped marginal results more, our problems would be solved

January 5th, 2015

Yesterday’s New York Times Sunday Review section contains one of the most gloriously silly pieces of science journalism I’ve seen in a while.

The main point of the article, which is by the science historian Naomi Oreskes and is headlined “Playing Dumb on Climate Change,” is that the 95% confidence threshold that’s commonly used as the requirement for “statistical significance” is too high. That’s right — in a world in which there’s strong reason to believe that most published research findings are false (in biomedical research), Oreskes thinks that the main problem we need to address is that scientists are too shy and retiring when it comes to promoting marginal results.

The truth, of course, is precisely the opposite. To quote from a good Nature article on this stuff from last year (which I wrote about back in February),

The irony is that when UK statistician Ronald Fisher introduced the P value in the 1920s, he did not mean it to be a definitive test. He intended it simply as an informal way to judge whether evidence was significant in the old-fashioned sense: worthy of a second look.

In case anyone doesn’t know, what both the Times article and I referred to as “95% confidence” is the same thing as what statisticians call a P-value of 5%.

Fisher, not surprisingly, had this exactly right. A “95% confidence” result is merely a hint that something interesting might be going on. It’s far from definitive evidence. And yet scientists and journalists routinely report these results as if they were virtual certainties. Oreskes’s proposal to lower that threshold is precisely the opposite of what we should do.

In the course of arguing for this position, Oreskes repeats a common misconception about P-values:

Typically, scientists apply a 95 percent confidence limit, meaning that they will accept a causal claim only if they can show that the odds of the relationship’s occurring by chance are no more than one in 20. But it also means that if there’s more than even a scant 5 percent possibility that an event occurred by chance, scientists will reject the causal claim. It’s like not gambling in Las Vegas even though you had a nearly 95 percent chance of winning.

A 95% confidence result (P < 5%) certainly does not mean that there’s a 5% probability that the event occurred by chance. It means that, if we assume that there is no causal link, then there’s a 5% chance of seeing results at least as extreme as the ones we found.

That distinction (which I’ve written about before) may sound like nitpicking, but it’s extremely important. Suppose that a pregnancy test is guaranteed to be 95% accurate. If I take that pregnancy test and get a positive result, it does not mean that there’s a 95% chance that I’m pregnant. Because there was a very low prior probability of my being pregnant (among other reasons, because I’m male), I’d be quite confident that that test result was a false positive.

When scientists quote P-values, those values are like the 95% accuracy quoted for the pregnancy test: both are probabilities of getting a given outcome (positive test result), assuming a given hypothesis (I’m pregnant). Oreskes (like all too many others) turns this into a statement about the probability of the hypothesis being true. You can’t do that without folding in information about the hypothesis’s prior probability (I’m unlikely to be pregnant, test or no test).

This is one reason that 95%-confidence results aren’t nearly as certain as people seem to think. Lots of scientific hypotheses are a priori not very likely, so even a 95%-confidence confirmation of them doesn’t mean all that much. Other phenomena, such as P-hacking and publication bias, mean that even fewer 95%-confidence results are true than you’d expect.
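
Here is a quick simulation of that effect, with made-up numbers: suppose that only 10% of the hypotheses being tested are actually true, and that a real effect is detected 80% of the time. Both numbers are assumptions chosen only to illustrate the point.

```python
import numpy as np

rng = np.random.default_rng(0)

n_hypotheses = 100_000
prior_true = 0.10   # assumed fraction of tested hypotheses that are really true
alpha = 0.05        # significance threshold (the "95% confidence" level)
power = 0.80        # assumed chance that a real effect gives a significant result

truth = rng.random(n_hypotheses) < prior_true
significant = np.where(truth,
                       rng.random(n_hypotheses) < power,   # real effects detected with prob. = power
                       rng.random(n_hypotheses) < alpha)   # false positives at rate alpha

print(f"Fraction of 'significant' results that are real: {truth[significant].mean():.2f}")
# With these numbers, only about 0.10*0.80 / (0.10*0.80 + 0.90*0.05), roughly 64%,
# of the "discoveries" are real -- and that's before P-hacking and publication bias
# make things worse.
```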

Oreskes says that scientists “practice a form of self-denial, denying themselves the right to believe anything that has not passed very high intellectual hurdles,” a description I’m happy to agree with, as an aspiration if not always a reality. Where she loses me is in suggesting that this is a bad thing.

She also claims that this posture of extreme skepticism is due to scientists’ fervent desire to distinguish their beliefs from religious beliefs. It’s possible that the latter claim could be justified (Oreskes is a historian of science, after all), but there’s not a hint of evidence or argument to support it in this article.

Oreskes’s article is misleading in another way:

We’ve all heard the slogan “correlation is not causation,” but that’s a misleading way to think about the issue. It would be better to say that correlation is not necessarily causation, because we need to rule out the possibility that we are just observing a coincidence.

This is at best a confusing way to think about the correlation-causation business, as it seems to suggest that the only two possibilities for explaining a correlation are coincidence and causation. This dichotomy is incorrect. There is a correlation between chocolate consumption and Nobel prizes. The correlation is not due to chance (the P-value is extremely low), but one cannot conclude that chocolate causes Nobel prizes (or vice versa).
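
A toy simulation shows how a shared cause produces exactly this situation. Everything below is made up; the "wealth" variable stands in for whatever confounder you prefer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

n = 50                                  # hypothetical countries
wealth = rng.normal(size=n)             # the confounder
chocolate = 2.0 * wealth + rng.normal(scale=0.5, size=n)  # driven by wealth, not by Nobels
nobels = 1.5 * wealth + rng.normal(scale=0.5, size=n)     # driven by wealth, not by chocolate

r, p = stats.pearsonr(chocolate, nobels)
print(f"correlation r = {r:.2f}, P-value = {p:.2g}")
# A strong correlation with a tiny P-value, and no causal link in either direction:
# the association is real (not a coincidence), but it comes from the shared cause.
```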

 

Physics is not in danger of abandoning empiricism

December 19th, 2014

Nature just published an opinion piece by George Ellis and Joe Silk warning that the “integrity of physics” is under threat from people who claim that physical theories should no longer require experimental verification. The article seems quite strange to me. It contains some well-argued points but mixes them with some muddled thinking. (This is a subject that seems to bring out this sort of thing.)

Although this article seems to me to be flawed, I have enormous respect for the authors, particularly for Joe Silk. Very, very far down the list of his accomplishments is supervising my Ph.D. research. I know from working with him that he is a phenomenal scientist.

I’ll comment on a few specific subjects treated in the article.

1. String theory.

The section on string theory is the best part of the article (by which I mean it’s the part that I agree with most).

As is well known, string theory has become very popular among some theoretical physicists, despite the fact that it has no prospect of experimental test in the foreseeable future. Personally, I have a fair amount of sympathy with Silk and Ellis’s view that this is an unhealthy situation.

Silk and Ellis use an article by the philosopher Richard Dawid to illustrate the point of view they disagree with. I’d never heard of this article, but it does seem to me to reflect the sorts of arguments that string theory partisans often make. Dawid seems to me to accept a bit uncritically some elements of string-theory boosterism, such as that it’s the only possible theory that can unify gravity with quantum physics and that it is “structurally unique” — meaning that it reproduces known physics with no adjustable parameters.

Silk and Ellis:

Dawid argues that the veracity of string theory can be established through philosophical and probabilistic arguments about the research process. Citing Bayesian analysis, a statistical method for inferring the likelihood that an explanation fits a set of facts, Dawid equates confirmation with the increase of the probability that a theory is true or viable. But that increase of probability can be purely theoretical. Because “no-one has found a good alternative” and “theories without alternatives tended to be viable in the past”, he reasons that string theory should be taken to be valid.

I don’t think Dawid actually mentions Bayesian reasoning explicitly, but this does seem to be a fair characterization of his argument.

I have no problem in principle with the idea that a theory can be shown to be highly probable based only on theoretical arguments and consistency with past data. Suppose that someone did manage to show that string theory really did have “structural uniqueness”  — that is, that the mathematics of the theory could be used to derive all of the parameters of the standard model of particle physics (masses of all the elementary particles, strengths of their interactions, etc.) with no adjustable parameters. That would be overwhelming evidence that string theory was correct, even if the theory never made a novel prediction. That hasn’t happened, and I don’t see evidence that it’s likely to happen, but it’s a logical possibility.

The last sentence of the above quote is a good reason for skepticism. In your Bayesian reasoning, you should give significant prior weight to the possibility that we simply haven’t thought of the right approach yet. So although “This is the only viable approach we’ve thought of” does constitute some evidence in favor of the theory, it provides less evidence than some string theorists seem to think.
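
To put toy numbers on that: in Bayesian terms, the strength of the evidence "no good alternative has been found" is the ratio of how probable that observation is if string theory is right to how probable it is if string theory is wrong. If there is a substantial chance that the right approach simply hasn't been conceived yet, the denominator isn't small and the evidence is weak. Every probability in the sketch below is invented for illustration.

```python
# Toy Bayes-factor calculation for the evidence E = "no good alternative found".
# All of the probabilities are invented, purely to illustrate the structure of the argument.

p_E_given_true = 0.9    # if string theory is right, we'd probably see no good alternative
p_E_given_false = 0.6   # if it's wrong but the right approach hasn't been thought of yet,
                        # we'd *also* see no good alternative

bayes_factor = p_E_given_true / p_E_given_false    # = 1.5

prior_odds = 1.0        # even prior odds, again just for illustration
posterior_odds = prior_odds * bayes_factor
posterior_prob = posterior_odds / (1.0 + posterior_odds)
print(f"Bayes factor = {bayes_factor:.2f}, posterior probability = {posterior_prob:.2f}")
# The observation nudges the probability from 0.50 to 0.60: some evidence,
# but nothing like a confirmation.
```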

So in the end I agree with Silk and Ellis that string theory should be regarded with great skepticism, although I’m not entirely in accord with their reasons.

2. Multiverse cosmology.

Silk and Ellis are quite unhappy with theories in which our observable Universe is just part of a much larger multiverse, and particularly with the sort of anthropic reasoning that’s often combined with multiverse theories. They claim that this sort of reasoning is anti-scientific because it doesn’t satisfy Karl Popper’s falsifiability criterion. This is a weak line of argument, as there are lots of reasons to regard Popperian falsifiability as too blunt an instrument to characterize scientific reasoning.

Silk and Ellis use an essay by Sean Carroll as their exemplar of the dangers of this sort of reasoning:

Earlier this year, championing the multiverse and the many-worlds hypothesis, Carroll dismissed Popper’s falsifiability criterion as a “blunt instrument”. He offered two other requirements: a scientific theory should be “definite” and “empirical”. By definite, Carroll means that the theory says “something clear and unambiguous about how reality functions”. By empirical, he agrees with the customary definition that a theory should be judged a success or failure by its ability to explain the data.

He argues that inaccessible domains can have a “dramatic effect” in our cosmic back-yard, explaining why the cosmological constant is so small in the part we see. But in multiverse theory, that explanation could be given no matter what astronomers observe. All possible combinations of cosmological parameters would exist somewhere, and the theory has many variables that can be tweaked.

The last couple of sentences are quite unfair. The idea behind these anthropic multiverse theories is that they predict that some outcomes are far more common than others. When we observe a particular feature of our Universe, we can ask whether that feature is common or uncommon in the multiverse. If it’s common, then the theory had a high probability of producing this outcome; if it’s uncommon, the probability is low. In terms of Bayesian reasoning (or as I like to call it, “reasoning”), the observation would then provide evidence for or against the theory.

Now you can argue that this is a bad approach in various ways. To take the most obvious line of attack, you can argue that in any particular theory the people doing the calculations have done them wrong. (Something called the “measure problem” may mean that the ways people are calculating the probabilities are incorrect, and there are other possible objections.) But those objections have nothing to do with the supposed looming menace of anti-empiricism. If you make such an argument, you’re saying that the theory under discussion is wrong, not that it’s unempirical. In other words, if that’s the problem you’re worried about, then you’re having a “normal” scientific argument, not defending the very nature of science itself.

3. Many-worlds interpretation of quantum mechanics.

This is just a little thing that irked me.

Silk and Ellis:

The many-worlds theory of quantum reality posed by physicist Hugh Everett is the ultimate quantum multiverse, where quantum probabilities affect the macroscopic. According to Everett, each of Schrödinger’s famous cats, the dead and the live, poisoned or not in its closed box by random radioactive decays, is real in its own universe. Each time you make a choice, even one as mundane as whether to go left or right, an alternative universe pops out of the quantum vacuum to accommodate the other action.

Personally, I’m a fan of this interpretation of quantum mechanics. But as far as I know, everyone acknowledges that the Everett interpretation is “just” an interpretation, not a distinct physical theory. In other words, it’s generally acknowledged that the question of whether to accept the Everett interpretation is outside the domain of science, precisely because the interpretation makes no distinct physical predictions. So it’s disingenuous to cite this as an example of people dragging science away from empiricism.

 

Multitalented students

December 11th, 2014

At a liberal-arts college like the University of Richmond, students are encouraged (and even to some extent required) to pursue interests beyond their primary field of study. This does come with costs — our students do less advanced coursework in their discipline than a student who graduates from, say, a European university — but on balance I really like it.

In the past week, I’ve seen a couple examples of students in my department doing exciting, non-physics-related things.

On Sunday, I went to the Christmas service of lessons and carols at the University Chapel. Two student choirs, the Women’s Chorale and the mixed-gender Schola Cantorum, performed a beautiful and highly varied set of Christmas vocal music. About half a dozen students I’ve had in advanced physics classes (Isaac Rohrer, Joe Kelly, Grace Dawson, Kelsey Janik, Ed Chandler, and I’m pretty sure some more I’m forgetting) were among the performers.

Just this morning, there was a piece on our local public radio station about an interdisciplinary art / archaeology / ecology project in which students consider and modify the environment of a university parking lot in a variety of ways. One of the students interviewed in the piece, David Ricculli, is a physics major who worked in my lab. Another, Kelsey Janik, is not a physics major but has taken several of our advanced courses, as well as my first-year seminar, and is also one of the singers mentioned above.

In past years, I’ve seen our physics students display art work in on-campus exhibitions, perform in a huge variety of theatrical and musical events, and present talks on topics like monetary policy in ancient Rome. I really enjoy seeing the range of things they can do.

Training the next generation of nitpickers

October 20th, 2014

One of my students just submitted his first bug report to Wolfram. I’m so proud.

This is arguably not a bug, but it’s certainly unexpected, nonstandard, and undesirable behavior.  Wolfram Alpha is calculating the norm of a vector as the square root of the sum of the squares of the vector’s components (i.e., the usual Pythagorean relation). But when the vector has complex numbers, that’s not the right thing to do: you have to use absolute squares. Otherwise, you get absurd results like these, and your norm isn’t even a norm.
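
Here's the difference in a quick sketch (I'm reproducing the formula described above, not Wolfram's actual internals, which I can only guess at):

```python
import numpy as np

v = np.array([1.0, 1.0j])   # an ordinary nonzero complex vector

# The Pythagorean formula applied naively to complex components:
naive = np.sqrt(np.sum(v**2))             # sqrt(1 + i^2) = sqrt(0) = 0
# The correct definition uses absolute squares:
correct = np.sqrt(np.sum(np.abs(v)**2))   # sqrt(1 + 1) = sqrt(2)

print(naive, correct)       # 0j  versus  1.414...
print(np.linalg.norm(v))    # NumPy does it right: 1.414...
```

A nonzero vector with a "norm" of zero fails the defining properties of a norm, which is the sort of absurdity in question.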

 

Lamar Smith actually is going after peer review

October 14th, 2014

Last year, scientists and science writers got worked up over a bill proposed by Representative Lamar Smith (Republican of Texas) that, it was claimed, constituted an attack on peer review of grants at the National Science Foundation. I thought that that attack was silly. The proposed law, while certainly not a good idea, would have had little or no effect on the peer review process. I still think that that diagnosis was correct.

To repeat something else I said at the time, even if this bill is mostly harmless, that doesn’t refute the claim that Smith is an enemy of science (he certainly is), and it doesn’t rule out the possibility that he does want to go after peer review in other ways.

Since I was (sort of, I guess) defending Smith before, I feel like I should point out that he has been meddling in the NSF review process lately in ways that bother me considerably more than that proposed legislation. Science and io9 have pieces that are worth reading on the subject.

Smith has made a list of grants that he doesn’t like and has had staffers examine the process by which these grants are reviewed. There doesn’t seem to be any question that Smith chose to go after these grants because he didn’t like their titles and brief descriptions. In other words, as Representative Eddie Bernice Johnson (Democrat of Texas) put it in a letter to Smith,

 The plain truth is that there are no credible allegations of waste, fraud, or abuse associated with these 20 awards. The only issue with them appears to be that you, personally, think that the grants sound wasteful based on your understanding of their titles and purpose. Seeking to substitute your judgment for the determinations of NSF’s merit review process is the antithesis of the successful principles our nation has relied on to make our research investment decisions. The path you are going down risks becoming a textbook example of political judgment trumping expert judgment.

Smith argues that Congress has the duty to oversee how NSF is spending its money, which is undoubtedly true. But it makes no sense to do that by picking individual awards based on their titles and having people with no expertise try to evaluate their merits. And in the process, actual harm can be done, particularly if the anonymity of the peer review process is compromised, as Johnson claims it has been in her letter. (I have not examined Johnson’s allegation in detail.)

In case it’s not obvious, let me make clear that anonymity does matter. As an untenured assistant professor, I participated on an NSF review panel that gave a negative recommendation to a proposal from one of the biggest names in my field (someone who could surely torpedo my career). Among the proposals recommended for funding by that panel were some stronger proposals by young, relatively unknown researchers. I hope that I would have made the same recommendation if I had not been anonymous, but I’m not at all sure that I would have.

Smith’s actual agenda seems to be that certain categories of proposals (largely in the social sciences) should be eliminated from NSF funding. If he wants to propose that straightforwardly and try to pass a law to that effect, he has the right to do so. But Johnson’s exactly right that interfering with the peer review process is not the way to pursue this goal. In my experience, NSF peer review works remarkably well. Having individual members of Congress examining individual proposals is certainly not going to improve that system.

In one way, Smith’s actions fit into a long tradition of politicians railing against wasteful-sounding research grants. William Proxmire had his “Golden Fleece” awards way back in the 1970s. Then there was this tweet from John McCain:

That was actually about an earmark, not a peer-reviewed grant, so it raises quite different issues about the funding process, but as an example of a thoughtless critique of science, it fits right in. (Astronomy is a significant industry in Hawaii, and  astronomy jobs are in fact jobs.)  What Smith’s doing is different from these, because he’s using the investigative machinery of Congress rather than just bloviating.

 

Atheists who believe in E.T.

October 2nd, 2014

According to a press release from Vanderbilt, atheists are more likely than members of various religions to believe in the existence of extraterrestrial life:

Belief in extraterrestrials varies by religion

  • 55 percent of Atheists
  • 44 percent of Muslims
  • 37 percent of Jews
  • 36 percent of Hindus
  • 32 percent of Christians

I heard about this via a blog hosted at the Institute of Physics. The writer expresses surprise at the finding:

Apparently, the people most likely to believe in extraterrestrial life are…atheists. More than half (55%) of the atheists in the poll professed a belief in extraterrestrials, compared with 44% of Muslims, 37% of Jews, 36% of Hindus and just 32% of Christians.

Without information about how many people were polled, or how they were selected, it’s hard to know how seriously to take these results. The press release also didn’t say how the question was phrased, which is likewise pretty important. After all, believing that we are unlikely to be alone in a vast universe is very different from believing that little green men gave you a ride in their spaceship last Tuesday. But even so, it seems odd that atheists – a group defined by their lack of belief in a being (or beings) for which there is no good scientific evidence – are so willing to believe in the existence of extraterrestrials. Because, of course, there’s no good evidence for them, either.

I agree that the lack of information about polling methodology is annoying. (The press release refers to a book that’s not out yet, and I can’t find any other publications by this author that contain the results.) But the last part of this quote is just silly. There certainly is evidence for the existence of extraterrestrial life, and it’s not at all unreasonable for a rationalist (assuming, for the moment, the author’s implicit equation of atheism with rationalism) to believe in it.

In particular, we know that there are a huge number of planets like Earth out there. There’s considerable evidence that that number is unbelievably large (i.e., 10 to some large power), and it might even be infinite. Furthermore, we know that in the one instance of an Earthlike planet that we’ve studied in detail, life arose almost as soon as it could have. Those facts constitute strong evidence in favor of the idea that extraterrestrial life exists.

Of course that’s not a proof (in the sense of pure mathematics or logic) that life exists, but presumably “belief in” something requires only (probabilistic) evidence, not literal mathematical proof. (If mathematical certainty were required for belief, then the list of things a rational person should believe in would be quite short.)

I don’t think it’s the least bit surprising that atheists are more likely than theists to believe in extraterrestrial life. That’s exactly what I would have predicted. After all, some major religious traditions are based on the idea that God created the Universe specifically for us humans. A natural consequence of that idea is that we humans are the only living beings out there. On the other hand, someone who doesn’t believe in such a tradition is far more likely to believe that life is a random occurrence that happens with some probability whenever conditions are right for it. A natural consequence of this belief is that life exists elsewhere.


To dust we return

September 22nd, 2014

In case you haven’t heard, the people behind the Planck satellite have released their analysis of the region of the sky observed by BICEP earlier this year. They find higher levels of dust than those found in BICEP’s foreground models. In fact, the amount of dust is large enough to completely explain BICEP’s detection.

This doesn’t rule out the possibility that there is some cosmological signal in the BICEP data, but it does mean there’s no strong evidence for such a signal.

I should disclose that I haven’t read the Planck paper in detail yet; I’ve just skimmed the key sections. But at a quick glance the analysis they’ve done certainly looks sensible, and for a variety of reasons I’d be surprised if they got this wrong. Of course, I already thought there was significant reason to doubt the original interpretation of the BICEP results.

I don’t have much more to say, so here are some links: Peter Coles, Sean Carroll, BBC.

Actually, I will make one quick meta observation. Some people are once again castigating the BICEP team for going public with this result prematurely. I think that that criticism is largely misguided. They may well be subject to fair criticism for getting the analysis wrong, of course, but that’s different from saying that they shouldn’t have made it public. I’m fine with people seeing the process by which science gets done, which includes everyone scrutinizing everyone else’s work.

 

GPA puzzles

September 5th, 2014

A colleague pointed me to an article by Valen Johnson called An alternative to traditional GPA for evaluating student performance, because the article takes a Bayesian approach, and he knew I liked that sort of thing.

Johnson addresses the problem that a student’s grade point average (GPA), the standard measure of academic quality in US educational institutions, doesn’t necessarily give fair or useful results. Some instructors, and even some entire disciplines, on average grade higher than others, so some students are unfairly penalized / rewarded in their GPAs based on what they choose to study.

To illustrate the problem, Johnson uses an example taken from an earlier paper by Larkey and Caulkin. I’d never seen this before, and I thought it was cute, so I’m passing it on.

Imagine that four students take nine courses, receiving the following grades:

In this scenario, every individual course indicates that the ranking of the students is I, II, III, IV (from best to worst). That is, in every course in which students I and II overlap, I beats II, and similarly for all other pairs. But the students’ GPAs put them in precisely the opposite order.

This is a made-up example, of course, but it illustrates the idea that in the presence of systematic differences in grading standards, you can get anomalous results.

This example tickles my love of math puzzles. If you’d asked me whether it was possible to construct a scenario like this, I think I would have said no.

There are obvious follow-up questions, for those who like this sort of thing. Could you get similar results with fewer courses? If you had a different number of students, how many courses would you need to get this outcome?

I know the answer for the case of two students. If you allow for courses with only one student in them, then it’s easy to get this sort of inversion: have the students get a C+ and a C respectively in one course, and then give student II an A in some other course. If you don’t allow one-student courses, then it’s impossible. But as soon as you go up to three students, I don’t think the answer is obvious at all.
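
Spelled out, assuming a standard 4.0-point scale with C = 2.0, C+ = 2.3, and A = 4.0 (the exact values don't matter):

```python
# Two students, two courses. Student I beats student II in the only course they
# share, yet student II ends up with the higher GPA.
points = {"A": 4.0, "C+": 2.3, "C": 2.0}   # assumed grade-point scale

student_I = ["C+"]         # one course, in which I beats II head-to-head
student_II = ["C", "A"]    # the shared course plus an A elsewhere

gpa = lambda grades: sum(points[g] for g in grades) / len(grades)
print(f"GPA of student I  = {gpa(student_I):.2f}")    # 2.30
print(f"GPA of student II = {gpa(student_II):.2f}")   # 3.00
```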

As I said, I was mostly interested in this curious puzzle, but in case you’re curious, here are a few words about the problem Johnson is addressing. I don’t have much to say about it, because I haven’t studied the paper in enough detail.

Some people have proposed that a student’s transcript should include statistical information about the grade distribution in each of the student’s courses, so that anyone reading the transcript will have some idea of what the grade is worth. For what it’s worth, that strikes me as a sensible thing to do, although getting the details right may be tricky.

That only solves the problem if the person evaluating the student (prospective employer, graduate program, or the like) is going to take the time to look at the transcript in detail. Often, people just look at a summary statistic like GPA. Johnson proposes a way of calculating a quantity that could be considered an average measure of student achievement, taking into account the variation in instructors’ grading habits. Other people have done this before, of course. Johnson’s approach is different in that it’s justified by Bayesian probability calculations from a well-specified underlying model, as opposed to more-or-less ad hoc calculations.
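
I haven't worked through Johnson's model, so the sketch below is emphatically not his method; it's the crudest possible illustration of the general idea. Model each grade as a student effect plus a course effect, fit by least squares, and compare students by their fitted effects rather than by raw GPA. The data reuse the two-student example above, and every number is made up.

```python
import numpy as np

# Made-up grades: students I and II share one harshly graded course (I does better),
# and II also takes a leniently graded course.
grades = {("I", "hard course"): 2.3,
          ("II", "hard course"): 2.0,
          ("II", "easy course"): 4.0}

students = ["I", "II"]
courses = ["hard course", "easy course"]

# Design matrix for the model  grade = student_effect + course_effect
X, y = [], []
for (s, c), g in grades.items():
    row = np.zeros(len(students) + len(courses))
    row[students.index(s)] = 1.0
    row[len(students) + courses.index(c)] = 1.0
    X.append(row)
    y.append(g)

coef, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
# Only *differences* between student effects are identified in this model.
print(f"Student I's effect minus student II's: {coef[0] - coef[1]:+.2f}")   # +0.30
print("Raw GPAs:", 2.3, "(I) vs", (2.0 + 4.0) / 2, "(II)")
```

The adjusted comparison puts student I ahead, matching the head-to-head result, even though student I's raw GPA is lower. Johnson's Bayesian model is far more sophisticated than this, but the goal is the same.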

I’m philosophically sympathetic to this approach, although some of the details of Johnson’s calculations seem a bit odd to me. I’d have to study it much more carefully than I intend to in order to say for sure what I think of it.

 

Vaccines are still good for you

August 29th, 2014

People seem to have been talking about some new reports that claim (yet again) a connection between vaccines and autism. The latest versions go further, alleging a cover-up by the CDC. The most important thing to know about this is that the overwhelming scientific consensus remains that vaccines are not linked to autism. They do, on the other hand, prevent vast amounts of suffering due to preventable diseases. The anti-vaccine folks do enormous harm.

(Although I have a few other things to say, the main point of this piece is to link to an excellent post by Allen Downey. The link is below, but it’s mixed in with a bunch of other stuff, so  I thought I’d highlight it up here.)

The usual pro-science people (e.g., Phil Plait) have jumped on this most recent story, stating correctly that the new report is bogus. They tend to link to two articles explaining why, but I’d rather steer you toward a piece by my old friend Allen Downey. Unlike the other articles, Allen explains one specific way in which the new study is wrong.

The error Allen describes is a common one. People often claim that a result is “statistically significant” if it has a “p-value” below 5%. This means that there is only a 5% chance of a false positive — that is, if there is no real effect, you’d be fooled into thinking there was an effect 5% of the time. Now suppose that you do 20 tests. The odds are very high in that case that at least one of them will be “significant” at the 5% level. People often draw attention to these positive results while sweeping under the rug the other tests that didn’t show anything. As far as I can tell, Allen’s got the goods on these guys, demonstrating convincingly that that’s what they did.
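
The arithmetic behind "the odds are very high": with 20 independent tests at the 5% level and no real effects anywhere, the chance of at least one false positive is 1 - 0.95^20, or about 64%. A quick check (the independence of the tests is an assumption):

```python
import numpy as np

n_tests, alpha = 20, 0.05
print(f"P(at least one false positive) = {1 - (1 - alpha)**n_tests:.2f}")   # 0.64

# The same thing by simulation: p-values for 20 null tests, repeated many times.
rng = np.random.default_rng(2)
p_values = rng.random((100_000, n_tests))
print((p_values < alpha).any(axis=1).mean())   # also about 0.64
```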

The other pieces I’ve read debunking the recent study have tended to focus on the people involved, pointing out (correctly, as far as I know) that they’ve made bogus arguments in the past, that they have no training in statistics or epidemiology, etc. Some people say that you shouldn’t pay any attention to considerations like that: all that matters is the content of the argument, and ad hominem considerations are irrelevant. That’s actually not true. Life is short. If you hear an argument from someone who’s always been wrong before, you might quite rationally decide that it’s not worth your time to figure out why it’s wrong. Combine that with a strong prior belief (tons of other evidence have shown no vaccine-autism link), and perfectly sound Bayesian reasoning (or as I like to call it, “reasoning”) tells you to discount the new claims. So before I saw Allen’s piece, I was pretty convinced that the new results were wrong.

But despite all that, it’s clearly much better if someone is willing to do the public service of figuring out why it’s wrong and explaining it clearly. This is pretty much the reason that I bothered to figure out in detail that evolution doesn’t violate the laws of thermodynamics: there was no doubt about the conclusion, but because the bogus argument continues to get raised, it’s good to be able to point people towards an explanation of  exactly why it’s wrong.

So thanks, Allen!