I’ve been programming in IDL for a couple of decades. How did I not know about this bizarre behavior of its random number generator?

Apparently, this is something people know about, but somehow I’d missed it for all this time.

I’m a little late getting to this, but in case you haven’t heard about it, here it is.

A journalist named John Bohannon did a stunt recently in which he and some coauthors “published” a “study” that “showed” that chocolate caused weight loss. (The reasons for the scare quotes will become apparent.) The work was picked up by a bunch of news outlets. Bohannon wrote about the whole thing on io9. It’s also been picked up in a bunch of other places, including the BBC radio program *More or Less* (which I’ve mentioned before a few times).

The idea was to do a study that was shoddy in precisely the ways that many “real” studies are, get it published in a low-quality journal, and see if they could get it picked up by credulous journalists.

My colleagues and I recruited actual human subjects in Germany. We ran an actual clinical trial, with subjects randomly assigned to different diet regimes. And the statistically significant benefits of chocolate that we reported are based on the actual data. It was, in fact, a fairly typical study for the field of diet research. Which is to say: It was terrible science. The results are meaningless, and the health claims that the media blasted out to millions of people around the world are utterly unfounded.

There is an interesting question of journalistic ethics here. Bohannon calls himself a journalist, but he deliberately introduced bad science into the mediasphere with the specific intent of deception. Is it OK for a journalist to do that, if his motives are pure? I don’t know.

I don’t want to focus on that sort of issue, because I don’t have anything non-obvious to say. Instead, I want to dig a bit into the details of what Bohannon *et al.* did. Although Bohannon’s io9 post is well worth reading and gets the big picture largely right, it’s wrong or misleading in a few ways, which happen to be the sort of thing I care about.

Bohannon *et al.* recruited a group of subjects and divided them into three groups: a control group, a group that was put on a low-carb diet, and a group that was put on a low-carb diet but also told to eat a certain amount of chocolate each day. The chocolate group lost weight faster than the other groups. The result was “statistically significant,” in the usual meaning of that term — the *p*-value was below 0.05.

So what was wrong with this study? As Bohannon explains it in his io9 post,

Here’s a dirty little science secret: If you measure a large number of things about a small number of people, you are almost guaranteed to get a “statistically significant” result. Our study included 18 different measurements—weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, etc.—from 15 people. (One subject was dropped.) That study design is a recipe for false positives.

…

The conventional cutoff for being “significant” is 0.05, which means that there is just a 5 percent chance that your result is a random fluctuation. The more lottery tickets, the better your chances of getting a false positive. So how many tickets do you need to buy?

$P(\text{winning}) = 1 - (1 - p)^{n}$

With our 18 measurements, we had a 60% chance of getting some “significant” result with *p* < 0.05. (The measurements weren’t independent, so it could be even higher.) The game was stacked in our favor.

It’s called *p*-hacking—fiddling with your experimental design and data to push *p* under 0.05—and it’s a big problem. Most scientists are honest and do it unconsciously. They get negative results, convince themselves they goofed, and repeat the experiment until it “works.” Or they drop “outlier” data points.

Sadly, even in this piece, whose purpose is to debunk bad statistics, Bohannon repeats an incredibly common error. A *p*-value of 0.05 does not mean that “there is just a 5 percent chance that your result is a random fluctuation.” It means that, if you assume that nothing but random fluctuations are at work, there’s a 5% chance of getting results as extreme as you did. A (frequentist) *p*-value is incapable of telling you anything about the probability of any given hypothesis (such as “your result is a random fluctuation”).

One other quibble: the parenthetical remark about the measurements not being independent is literally true but misleading. The fact that the measurements aren’t independent means that the probability of a false positive “could be even higher”, but it could also be lower. In fact, the latter seems more likely to me. (The probability goes down if the measurements are positively correlated with each other, and up if they’re anticorrelated.)
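To see the direction of the effect for positive correlations, here is a rough simulation sketch. It is not a model of the actual study. I assume 18 test statistics that are each standard normal under the null hypothesis and that share a single common factor, so that every pair has correlation `rho` (a made-up parameter), and I count how often at least one of them crosses the two-sided 5% cutoff. It covers only the positively correlated case.

```python
import random
from statistics import NormalDist


def any_false_positive_rate(n_tests, rho, n_trials, rng):
    """Fraction of simulated null studies in which at least one of
    n_tests equally correlated z-scores is 'significant' at 5%."""
    cutoff = NormalDist().inv_cdf(0.975)  # two-sided 5% threshold
    hits = 0
    for _trial in range(n_trials):
        shared = rng.gauss(0.0, 1.0)  # common factor (assumed model)
        for _test in range(n_tests):
            # Each z has unit variance; any pair has correlation rho.
            z = rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
            if abs(z) > cutoff:
                hits += 1
                break
    return hits / n_trials


rng = random.Random(0)
independent = any_false_positive_rate(18, 0.0, 20000, rng)
correlated = any_false_positive_rate(18, 0.8, 20000, rng)

print(f"independent measurements: {independent:.2f}")
print(f"positively correlated (rho = 0.8): {correlated:.2f}")
```

In this toy setup the independent case reproduces the roughly 60% figure, and the positively correlated case gives a noticeably smaller chance of at least one false positive.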

The other thing that’s worth focusing on is the number of subjects in the study, which was incredibly small (15 across all three groups). Bohannon suggests (in the first sentence quoted above) that this is part of the reason they got a false positive, and other pieces I’ve read on this say the same thing. But it’s not true. The reason they got a false positive was *p*-hacking (buying many lottery tickets), which would have worked just as well with a larger number of subjects. If you had more subjects, the random fluctuations would have gotten smaller, but the level of fluctuation required for statistical “significance” would have gone down as well. By definition, the probability of any one “lottery ticket” winning is 5%, whether you have a lot of subjects or a few.
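A quick simulation makes the point. This is a sketch under simplifying assumptions that are mine, not the paper’s: each “study” measures one quantity on `n_subjects` people, the null hypothesis is true, the noise is Gaussian with known unit variance, and significance is judged with a two-sided z-test rather than whatever test the actual paper used.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_subjects, n_trials, rng, alpha=0.05):
    """Fraction of null studies that come out 'significant' at level alpha."""
    cutoff = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    hits = 0
    for _trial in range(n_trials):
        total = sum(rng.gauss(0.0, 1.0) for _ in range(n_subjects))
        # z-score of the sample mean: mean / (1 / sqrt(n)) = total / sqrt(n)
        z = total / n_subjects ** 0.5
        if abs(z) > cutoff:
            hits += 1
    return hits / n_trials


rng = random.Random(1)
rate_small = false_positive_rate(5, 5000, rng)
rate_large = false_positive_rate(50, 5000, rng)

print(f"5 subjects:  {rate_small:.3f}")
print(f"50 subjects: {rate_large:.3f}")
```

Both rates should land near 0.05. The size of the fluctuation in the mean shrinks with more subjects, but so does the cutoff it is compared against, so each ticket wins equally often.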

It’s true that with fewer subjects the effect size (i.e., the number of extra pounds lost, on average) is likely to be larger, but the published article went to great lengths to downplay the effect size (e.g., not mentioning it in the abstract, which is often all anyone reads).

Let me repeat that I think that Bohannon’s description of what he did is well worth reading and has a lot that’s right at the macro-scale, even though I wish that he’d gotten the above details right.

In *Surely You’re Joking, Mr. Feynman*, Richard Feynman tells a story of sitting in on a philosophy seminar and being asked by the instructor whether he thought that an electron was an “essential object.”

Well, now I was in trouble. I admitted that I hadn’t read the book, so I had no idea of what Whitehead meant by the phrase; I had only come to watch. “But,” I said, “I’ll try to answer the professor’s question if you will first answer a question from me, so I can have a better idea of what ‘essential object’ means. Is a brick an essential object?”

What I had intended to do was to find out whether they thought theoretical constructs were essential objects. The electron is a theory that we use; it is so useful in understanding the way nature works that we can almost call it real. I wanted to make the idea of a theory clear by analogy. In the case of the brick, my next question was going to be, “What about the inside of the brick?”–and I would then point out that no one has ever seen the inside of a brick. Every time you break the brick, you only see the surface. That the brick has an inside is a simple theory which helps us understand things better. The theory of electrons is analogous. So I began by asking, “Is a brick an essential object?”

The way he tells the story (which, of course, need not be presumed to be 100% accurate), he never got to the followup question, because the philosophers got bogged down in an argument over the first question.

I was reminded of this when I read “A Crisis at the Edge of Physics,” by Adam Frank and Marcelo Gleiser, in tomorrow’s *New York Times*. The article is a pretty good overview of some of the recent hand-wringing over certain areas of theoretical physics that seem, to some people, to be straying too far from experimental testability. (Frank and Gleiser mention a silly article by my old Ph.D. adviser that waxes particularly melodramatic on this subject.)

From the *Times* piece:

If a theory successfully explains what we can detect but does so by positing entities that we can’t detect (like other universes or the hyperdimensional superstrings of string theory) then what is the status of these posited entities? Should we consider them as real as the verified particles of the standard model? How are scientific claims about them any different from any other untestable — but useful — explanations of reality?

These entities are, it seems to me, not fundamentally different from the inside of Feynman’s brick, or from an electron for that matter. No one has ever seen an electron, or the inside of a brick, or the core of the Earth, for that matter. We believe that those things are real, because they’re essential parts of a theory that we believe in. We believe in that theory because it makes a lot of successful predictions. If string theory or theories that predict a multiverse someday produce a rich set of confirmed predictions, then the entities contained in those theories will have as much claim to reality as electrons do.

Just to be clear, that hasn’t happened yet, and it may never happen. But it’s just wrong to say that these theories represent a fundamental retreat from the scientific method, just because they contain unobservable entities. (To be fair, Frank and Gleiser don’t say this, but many other people do.) Most interesting theories contain unobservable entities!

I’ve been reading some news coverage about the now-retracted paper published in *Science*, which purported to show that voters’ opinions on same-sex marriage could be altered by conversations with gay canvassers. Some of the things the senior author, Donald Green, said in one article struck me as very odd, from my perspective in the natural sciences. I wonder if the culture in political science is really that different?

Here’s the first quote:

“It’s a very delicate situation when a senior scholar makes a move to look at a junior scholar’s data set,” Dr. Green said. “This is his career, and if I reach in and grab it, it may seem like I’m boxing him out.”

In case you don’t have your scorecard handy, Dr. Green is the senior author of the paper. There’s only one other author, a graduate student named Michael LaCour. LaCour did all of the actual work on the study (or at least he said he did — that’s the point of the retraction).

In physics, it’s bizarre to imagine that one of the two authors of a paper would feel any delicacy about asking to see the data the paper is based on. If a paper has many authors, then of course not every author will actually look at the data, but with only two authors, it would be extremely strange for one of them not to look. Is it really that different in political science?

Later in the same article, we find this:

Money seemed ample for the undertaking — and Dr. Green did not ask where exactly it was coming from.

“Michael said he had hundreds of thousands in grant money, and, yes, in retrospect, I could have asked about that,” Dr. Green said. “But it’s a delicate matter to ask another scholar the exact method through which they’re paying for their work.”

Delicacy again! This one is, if anything, even more incomprehensible to me. I can’t imagine having my name on a paper presenting the results of research without knowing where the funding came from. For one thing, in my field the funding source is always acknowledged in the paper.

In both cases, Green is treating this as someone else’s work that he has nothing to do with. If that were true, then asking to see the raw data would be presumptuous (although in my world asking about the funding source would not). But he’s one of only two authors on the paper: it’s (supposedly) his work too.

It seems to me that there are two possibilities:

- The folkways of political scientists are even more different from those of natural scientists than I had realized.
- Green is saying ridiculous things to pretend that he wasn’t grossly negligent.

I don’t know which one is right.

I learned via Peter Coles of this list of ways that scientists try to spin results that don’t reach the standard-but-arbitrary threshold of statistical significance. The compiler, Matthew Hankins, says

You don’t need to play the significance testing game – there are better methods, like quoting the effect size with a confidence interval – but if you do, the rules are simple: the result is either significant or it isn’t.

…

The following list is culled from peer-reviewed journal articles in which (a) the authors set themselves the threshold of 0.05 for significance, (b) failed to achieve that threshold value for p and (c) described it in such a way as to make it seem more interesting.

The list begins like this:

(barely) not statistically significant (p=0.052)

a barely detectable statistically significant difference (p=0.073)

a borderline significant trend (p=0.09)

a certain trend toward significance (p=0.08)

a clear tendency to significance (p=0.052)

a clear trend (p<0.09)

a clear, strong trend (p=0.09)

a considerable trend toward significance (p=0.069)

a decreasing trend (p=0.09)

a definite trend (p=0.08)

a distinct trend toward significance (p=0.07)

And goes on at considerable length.

Hankins doesn’t provide sources for these, so I can’t rule out the possibility that some are quoted out of context in a way that makes them sound worse than they are. Still, if you like snickering at statistical solecisms, snicker away.

I would like to note one quasi-serious point. The ones that talk about a “trend,” and especially “a trend toward significance,” are much worse than the ones that merely use language such as “marginally significant.” In the latter case, the authors are merely acknowledging that the usual threshold for “significance” (p=0.05) is arbitrary. Hankins says that, having agreed to play the significance game, you have to follow its rules, but that seems like excessive pedantry to me. The “trend” language, on the other hand, suggests either a deep misunderstanding of how statistics work or an active attempt to mislead.

Hankins:

For example, “a trend towards significance” expresses non-significance as some sort of motion towards significance, which it isn’t: there is no ‘trend’, in any direction, and nowhere for the trend to be ‘towards’.

This is exactly right. The only thing a p-value does is tell you about the probability that results like the ones you saw could have occurred by chance. Under that hypothesis, a low p-value occurred due to a chance fluctuation and will (with high probability) revert to higher values if you gather more data.

The “trend” language suggests, either deliberately or accidentally, that the results are marching toward significance and will get there if only we can gather more data. But that’s only true if the effect you’re looking for is really there, which is precisely what we don’t know yet. (If we knew that, we wouldn’t need the data.) If it’s not there, then there will be no trend; rather, you’ll get regression to more typical (higher / less “significant”) p-values.
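Here is a small simulation sketch of that regression. The setup is my own simplification: there is no real effect, the first batch of data is summarized by a standard normal z-score, and the follow-up batch is assumed to be the same size as the first, so the combined z-score is the sum of the two divided by √2. I keep only the first batches that show a “trend” (0.05 < p < 0.10) and ask what happens after the follow-up data are added.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def follow_up_on_trends(n_trials, rng):
    """Among null results with 0.05 < p < 0.10, return the fraction that
    become significant, and the fraction whose p-value rises, after an
    equally sized second batch of data is added."""
    n_trends = n_significant_later = n_p_went_up = 0
    for _trial in range(n_trials):
        z_first = rng.gauss(0.0, 1.0)
        p_first = two_sided_p(z_first)
        if not (0.05 < p_first < 0.10):
            continue
        n_trends += 1
        z_second = rng.gauss(0.0, 1.0)  # independent batch, still no effect
        p_combined = two_sided_p((z_first + z_second) / 2 ** 0.5)
        if p_combined < 0.05:
            n_significant_later += 1
        if p_combined > p_first:
            n_p_went_up += 1
    return n_significant_later / n_trends, n_p_went_up / n_trends


frac_significant, frac_p_up = follow_up_on_trends(200000, random.Random(2))
print(f"'trends' that reach significance: {frac_significant:.2f}")
print(f"'trends' whose p-value goes up:   {frac_p_up:.2f}")
```

Under these assumptions only a minority of the “trends” cross the 0.05 line with more data, while most of them drift back to larger p-values.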

Last week, *Nature* ran a piece under the headline “Quantum physics: What is really real?” I obnoxiously posted this on Facebook (because I’m too old and out of touch to be on any of the hipper social media sites):

I have now read the piece, and I can report that there’s no need for a recantation. As expected, *Nature* is making grandiose claims about quantum mechanics and the nature of reality that go beyond anything supported by evidence.

*Nature* writes pretty much the same story every couple of years. The main idea behind all of these articles is the question of whether the quantum mechanical wavefunction describes *the way a system really is* or merely *our knowledge of the system*. In philosophy-of-science circles, these two points of view are sometimes known as the psi-ontic and psi-epistemic stances. More specifically, all three of these articles have to do with a theorem published (in one of the *Nature* journals) by Pusey *et al.* that claims to provide an experimental way of distinguishing between these possibilities. After Pusey *et al.* published this theoretical result, others went ahead and performed the proposed experimental tests, leading to the (claimed) conclusion that the wavefunction describes actual reality, not merely our knowledge.

You should of course be skeptical of any claim that an experimental result reveals something about the deep nature of reality. Sure enough, if you dig down just a little bit, it becomes clear that these results do no such thing. The Pusey *et al.* theorem proves that *a certain class* of psi-epistemic theories make predictions that differ from the predictions of standard quantum mechanics. The subsequent experiments confirmed the standard predictions, so they rule out that class of theories.

The problem is that ruling out a specific class of psi-epistemic theories is not the same thing as ruling out the psi-epistemic point of view as a whole. We now know that that class of theories is wrong, but that’s all we know. To make matters worse, the class of theories ruled out by these experiments, as far as I can tell, does not contain any theories that any proponents of psi-epistemicism actually believe in. The theories they tested are straw men.

In particular, the most prominent proponents of the psi-epistemic point of view are the advocates of something called quantum Bayesianism (QBism). QBism is an interpretation of quantum mechanics, as opposed to an alternative theory — that is, it makes predictions that are identical to those of standard quantum mechanics. There is, therefore, no experimental result that would distinguish QBism from psi-ontic versions of quantum mechanics.

Not all psi-epistemicists are QBists, of course, but as far as I can tell even the others never advocated for any theories in the class considered by Pusey *et al*. If I’m wrong about that, I’d be interested to know.

The journal *Basic and Applied Social Psychology* has come out with a ban on *p*-values. To be precise, they’ve banned the “null hypothesis significance testing procedure” from articles published in the journal. This ban means that authors in the journal can’t claim that an effect they see in their data is “statistically significant” in the usual way that we’re all accustomed to reading.

Like all right-thinking people, I believe that the only coherent way to think about statistical questions is the Bayesian way, and that there are serious problems with the alternative “frequentist” approach. Moreover, the sort of (frequentist) significance testing banned by this journal can indeed lead to serious problems. It’s a large part of the reason that a strong argument can be made that (at least in some scientific disciplines) most published research findings are false.

All of that suggests that I ought to applaud this decision, but at the risk of seeming disloyal to my fellow Bayesians, I don’t. In fact, the journal editors’ decision to impose this ban makes me trust the quality of the journal less than I otherwise would, not more.

I was pleased to see that my old friend Allen Downey, always a voice of sanity on matters of this sort, is of the same opinion. I won’t rehash everything he says in his post, but I heartily endorse it.

The main thing to realize is that the techniques in question aren’t actually wrong. On the contrary, they correctly answer the questions they’re supposed to answer.

If your data allow you to reject the null hypothesis with a significance (“*p*-value”) of 5%, that means that, *if the null hypothesis were true, there’d be only a 5% chance of getting data that look like the data you actually got.*

Some people — or so I’ve been told — labor under the misconception that the *p*-value tells you the probability that the null hypothesis is true, but it doesn’t. I’m going to rehash the old story here; skip ahead if you’ve heard it before.

Suppose that a pregnancy test yields correct results 95% of the time. Pat takes the test, which comes out positive. That means that the “null hypothesis” that Pat is not pregnant can be ruled out with a significance (*p*-value) of 5%. But it does not mean that there’s a 95% chance that Pat is pregnant. The probability that Pat is pregnant depends both on the result of the test and on any additional information you have about Pat — that is, the *prior probability* that Pat is pregnant. For example, if Pat is anatomically male, then the probability of pregnancy is zero, regardless of the test result.

Needless to say, Pat doesn’t care about the *p*-value; Pat cares about whether Pat is pregnant. The *p*-value is not wrong; it’s just uninteresting. As I’ve said before, this can be summarized with a convenient Venn diagram:

So if *p*-values are uninteresting, why shouldn’t the journal ban them?

The main reason is that they can be an ingredient in drawing useful conclusions. You can combine the *p*-value with your prior knowledge to get an answer to the question you’re actually interested in (“How likely is the hypothesis to be true?”). Just because the *p*-value doesn’t directly answer Pat’s question about whether she’s pregnant, that doesn’t mean that it’s not valid and useful information about the reliability of the pregnancy test, which she can (and should) use in drawing conclusions.
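For the pregnancy test, the combination is just Bayes’s theorem. The following Python sketch is one way to write it down. I’m assuming that “correct 95% of the time” means both that a pregnant person tests positive 95% of the time and that a non-pregnant person tests positive 5% of the time; the function and parameter names are mine.

```python
def probability_pregnant(prior, sensitivity=0.95, false_positive_rate=0.05):
    """Posterior probability of pregnancy after a positive test.

    prior               -- probability of pregnancy before seeing the test
    sensitivity         -- P(positive | pregnant), assumed to be 0.95
    false_positive_rate -- P(positive | not pregnant), assumed to be 0.05
    """
    true_positive = sensitivity * prior
    false_positive = false_positive_rate * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


for prior in (0.5, 0.01, 0.0):
    posterior = probability_pregnant(prior)
    print(f"prior {prior:.2f} -> posterior {posterior:.2f}")
```

With a prior of 50%, the positive test raises the probability to 95%. With a prior of 1%, it only gets to about 16%. With a prior of zero, it stays at zero, as it must. The 5% figure is the same in all three cases; only the prior changes.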

As far as I can tell, the argument in favor of banning *p*-values is that people sometimes misinterpret them, but that’s a weak argument for a ban. It’s worth distinguishing two possibilities here:

- The article describes the statistical procedures and results accurately, but statistically illiterate readers misunderstand them. In this situation, I’m perfectly happy to go all libertarian and *caveat emptor*. Why should an intelligent reader be denied the right to hear about a *p*-value just because someone else might be misled due to his own ignorance?
- The article describes the statistical procedures and results in a misleading way. Obviously, the journal should not allow this. But a ban shouldn’t be necessary to enforce this. The whole point of a peer-reviewed journal is that experts are evaluating the article to make sure it doesn’t contain incorrect or misleading statements. If the editors feel the need for a ban, then they are in effect admitting that they and referees cannot effectively evaluate the validity of an article’s statistical claims.

The editorial describing the reasons for the ban states, incorrectly, that the banned technique is “invalid.” Moreover, the perceived need for a ban seems to me to arise from the editors’ lack of confidence in their own ability to tell good statistics from bad. That’s why I say that, although I have no great love for *p*-values, this ban reduces my willingness to trust any results published in this journal.

Yesterday’s *New York Times* Sunday Review section contains one of the most gloriously silly pieces of science journalism I’ve seen in a while.

The main point of the article, which is by the science historian Naomi Oreskes and is headlined “Playing Dumb on Climate Change,” is that the 95% confidence threshold that’s commonly used as the requirement for “statistical significance” is too high. That’s right — in a world in which there’s strong reason to believe that most published research findings are false (in biomedical research), Oreskes thinks that the main problem we need to address is that scientists are too shy and retiring when it comes to promoting marginal results.

The truth, of course, is precisely the opposite. To quote from a good *Nature* article on this stuff from last year (which I wrote about back in February),

The irony is that when UK statistician Ronald Fisher introduced the P value in the 1920s, he did not mean it to be a definitive test. He intended it simply as an informal way to judge whether evidence was significant in the old-fashioned sense: worthy of a second look.

In case anyone doesn’t know, what both the *Times* article and I referred to as “95% confidence” is the same thing as what statisticians call a P-value of 5%.

Fisher, not surprisingly, had this exactly right. A “95% confidence” result is merely a hint that something interesting might be going on. It’s far from definitive evidence. And yet scientists and journalists routinely report these results as if they were virtual certainties. Oreskes’s proposal to lower that threshold is precisely the opposite of what we should do.

In the course of arguing for this position, Oreskes repeats a common misconception about P-values:

Typically, scientists apply a 95 percent confidence limit, meaning that they will accept a causal claim only if they can show that the odds of the relationship’s occurring by chance are no more than one in 20. But it also means that if there’s more than even a scant 5 percent possibility that an event occurred by chance, scientists will reject the causal claim. It’s like not gambling in Las Vegas even though you had a nearly 95 percent chance of winning.

A 95% confidence result (P < 5%) certainly does not mean that there’s a 5% probability that the event occurred by chance. It means that, if we assume that there is no causal link, then there’s a 5% chance of seeing results as extreme as the ones we found.

That distinction (which I’ve written about before) may sound like nitpicking, but it’s extremely important. Suppose that a pregnancy test is guaranteed to be 95% accurate. If I take that pregnancy test and get a positive result, it does not mean that there’s a 95% chance that I’m pregnant. Because there was a very low *prior probability* of my being pregnant (among other reasons, because I’m male), I’d be quite confident that that test result was a false positive.

When scientists quote P-values, they’re like the 95% accuracy quoted for the pregnancy test: both are probabilities of getting a given *outcome* (positive test result), assuming a given *hypothesis* (I’m pregnant). Oreskes (like all too many others) turns this into a statement about the probability of the hypothesis being true. You can’t do that without folding in information about the hypothesis’s prior probability (I’m unlikely to be pregnant, test or no test).

This is one reason that 95%-confidence results aren’t nearly as certain as people seem to think. Lots of scientific hypotheses are *a priori* not very likely, so even a 95%-confidence confirmation of them doesn’t mean all that much. Other phenomena, such as P-hacking and publication bias, mean that even fewer 95%-confidence results are true than you’d expect.
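To put rough numbers on this, here is a back-of-the-envelope sketch in Python. Everything in it beyond the 5% threshold is an assumption of mine for illustration: the fraction of tested hypotheses that are actually true (`prior_true`) and the probability that a real effect shows up as significant (`power`, set to 0.8 here). It ignores P-hacking and publication bias, which would only make things worse.

```python
def true_share_of_significant_results(prior_true, power=0.8, alpha=0.05):
    """Fraction of 'significant' results that reflect real effects.

    prior_true -- assumed fraction of tested hypotheses that are true
    power      -- assumed chance that a real effect reaches significance
    alpha      -- significance threshold (the usual 5%)
    """
    true_hits = power * prior_true
    false_hits = alpha * (1.0 - prior_true)
    return true_hits / (true_hits + false_hits)


for prior_true in (0.5, 0.1, 0.01):
    share = true_share_of_significant_results(prior_true)
    print(f"{prior_true:.0%} of hypotheses true -> {share:.0%} of significant results real")
```

Under these assumptions, if half of the hypotheses being tested are true, about 94% of significant results are real. If one in ten is true, that drops to 64%, and if one in a hundred is true, to about 14%.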

Oreskes says that scientists “practice a form of self-denial, denying themselves the right to believe anything that has not passed very high intellectual hurdles,” a description I’m happy to agree with, as an aspiration if not always a reality. Where she loses me is in suggesting that this is a bad thing.

She also claims that this posture of extreme skepticism is due to scientists’ fervent desire to distinguish their beliefs from religious beliefs. It’s possible that the latter claim could be justified (Oreskes is a historian of science, after all), but there’s not a hint of evidence or argument to support it in this article.

Oreskes’s article is misleading in another way:

We’ve all heard the slogan “correlation is not causation,” but that’s a misleading way to think about the issue. It would be better to say that correlation is not necessarily causation, because we need to rule out the possibility that we are just observing a coincidence.

This is at best a confusing way to think about the correlation-causation business, as it seems to suggest that the only two possibilities for explaining a correlation are coincidence and causation. This dichotomy is incorrect. There is a correlation between chocolate consumption and Nobel prizes. The correlation is not due to chance (the P-value is extremely low), but one cannot conclude that chocolate causes Nobel prizes (or vice versa).
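A toy simulation shows how a third possibility, a common cause, can produce a correlation that is neither coincidence nor causation. The data below are entirely synthetic, and the “common cause” is a hypothetical stand-in (think of something like national wealth); nothing here is taken from the actual chocolate and Nobel prize numbers.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(3)
# Hypothetical common cause; it drives both quantities below.
common_cause = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Neither quantity appears in the formula for the other.
chocolate = [c + rng.gauss(0.0, 1.0) for c in common_cause]
prizes = [c + rng.gauss(0.0, 1.0) for c in common_cause]

raw_r = pearson_r(chocolate, prizes)
# Subtract the common cause from each and the correlation disappears.
adjusted_r = pearson_r(
    [x - c for x, c in zip(chocolate, common_cause)],
    [y - c for y, c in zip(prizes, common_cause)],
)

print(f"raw correlation: {raw_r:.2f}")
print(f"after removing the common cause: {adjusted_r:.2f}")
```

In this construction the raw correlation sits near 0.5, far too large to be a fluke with 5,000 points, yet neither variable has any effect on the other. Once the shared driver is subtracted, the correlation is close to zero.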

*Nature* just published an opinion piece by George Ellis and Joe Silk warning that the “integrity of physics” is under threat from people who claim that physical theories should no longer require experimental verification. The article seems quite strange to me. It contains some well-argued points but mixes them with some muddled thinking. (This is a subject that seems to bring out this sort of thing.)

Although this article seems to me to be flawed, I have enormous respect for the authors, particularly for Joe Silk. Very, very far down the list of his accomplishments is supervising my Ph.D. research. I know from working with him that he is a phenomenal scientist.

I’ll comment on a few specific subjects treated in the article.

The section on string theory is the best part of the article (by which I mean it’s the part that I agree with most).

As is well known, string theory has become very popular among some theoretical physicists, despite the fact that it has no prospect of experimental test in the foreseeable future. Personally, I have a fair amount of sympathy with Silk and Ellis’s view that this is an unhealthy situation.

Silk and Ellis use an article by the philosopher Richard Dawid to illustrate the point of view they disagree with. I’d never heard of this article, but it does seem to me to reflect the sorts of arguments that string theory partisans often make. Dawid seems to me to accept a bit uncritically some elements of string-theory boosterism, such as that it’s the only possible theory that can unify gravity with quantum physics and that it is “structurally unique” — meaning that it reproduces known physics with no adjustable parameters.

Silk and Ellis:

Dawid argues that the veracity of string theory can be established through philosophical and probabilistic arguments about the research process. Citing Bayesian analysis, a statistical method for inferring the likelihood that an explanation fits a set of facts, Dawid equates confirmation with the increase of the probability that a theory is true or viable. But that increase of probability can be purely theoretical. Because “no-one has found a good alternative” and “theories without alternatives tended to be viable in the past”, he reasons that string theory should be taken to be valid.

I don’t think Dawid actually mentions Bayesian reasoning explicitly, but this does seem to be a fair characterization of his argument.

I have no problem in principle with the idea that a theory can be shown to be highly probable based only on theoretical arguments and consistency with past data. Suppose that someone did manage to show that string theory really did have “structural uniqueness” — that is, that the mathematics of the theory could be used to derive all of the parameters of the standard model of particle physics (masses of all the elementary particles, strengths of their interactions, etc.) with no adjustable parameters. That would be overwhelming evidence that string theory was correct, even if the theory never made a novel prediction. That hasn’t happened, and I don’t see evidence that it’s likely to happen, but it’s a logical possibility.

The last sentence of the above quote is a good reason for skepticism. In your Bayesian reasoning, you should give significant prior weight to the possibility that we simply haven’t thought of the right approach yet. So although “This is the only viable approach we’ve thought of” does constitute *some* evidence in favor of the theory, it provides less evidence than some string theorists seem to think.

So in the end I agree with Silk and Ellis that string theory should be regarded with great skepticism, although I’m not entirely in accord with their reasons.

Silk and Ellis are quite unhappy with theories in which our observable Universe is just part of a much larger multiverse, and particularly with the sort of anthropic reasoning that’s often combined with multiverse theories. They claim that this sort of reasoning is anti-scientific because it doesn’t satisfy Karl Popper’s falsifiability criterion. This is a weak line of argument, as there are lots of reasons to regard Popperian falsifiability as too blunt an instrument to characterize scientific reasoning.

Silk and Ellis use an essay by Sean Carroll as their exemplar of the dangers of this sort of reasoning:

Earlier this year, championing the multiverse and the many-worlds hypothesis, Carroll dismissed Popper’s falsifiability criterion as a “blunt instrument”. He offered two other requirements: a scientific theory should be “definite” and “empirical”. By definite, Carroll means that the theory says “something clear and unambiguous about how reality functions”. By empirical, he agrees with the customary definition that a theory should be judged a success or failure by its ability to explain the data.

He argues that inaccessible domains can have a “dramatic effect” in our cosmic back-yard, explaining why the cosmological constant is so small in the part we see. But in multiverse theory, that explanation could be given no matter what astronomers observe. All possible combinations of cosmological parameters would exist somewhere, and the theory has many variables that can be tweaked.

The last couple of sentences are quite unfair. The idea behind these anthropic multiverse theories is that they predict that some outcomes are far more common than others. When we observe a particular feature of our Universe, we can ask whether that feature is common or uncommon in the multiverse. If it’s common, then the theory had a high probability of producing this outcome; if it’s uncommon, the probability is low. In terms of Bayesian reasoning (or as I like to call it, “reasoning”), the observation would then provide evidence for or against the theory.
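The logic can be sketched in a few lines of Python. This is a toy illustration, not a calculation from any actual multiverse model: the fractions are invented, the function names are my own, and "rival" stands for some hypothetical competing theory, since evidence for or against a theory is always measured relative to an alternative.

```python
def bayes_factor(frac_in_theory, frac_in_rival):
    """Ratio of the probabilities the two theories assign to the observation.

    Each argument is the (assumed) fraction of universes, or more carefully
    of observers, that would see the observed feature under that theory.
    """
    return frac_in_theory / frac_in_rival


def update_odds(prior_odds, factor):
    """Posterior odds (theory : rival) = prior odds times the Bayes factor."""
    return prior_odds * factor


prior_odds = 1.0  # assumption: start out indifferent between the two

# Case 1: the observed feature is common in the multiverse theory.
common = bayes_factor(frac_in_theory=0.5, frac_in_rival=0.05)

# Case 2: the observed feature is rare in the multiverse theory.
uncommon = bayes_factor(frac_in_theory=0.001, frac_in_rival=0.05)

print("odds if feature is common:  ", update_odds(prior_odds, common))
print("odds if feature is uncommon:", update_odds(prior_odds, uncommon))
```

In the first case the odds move in favor of the multiverse theory, and in the second they move against it. That is exactly what it means for a theory to be exposed to the data.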

Now you can argue that this is a bad approach in various ways. To take the most obvious line of attack, you can argue that in any particular theory the people doing the calculations have done them wrong. (Something called the “measure problem” may mean that the ways people are calculating the probabilities are incorrect, and there are other possible objections.) But those objections have nothing to do with the supposed looming menace of anti-empiricism. If you make such an argument, you’re saying that the theory under discussion is wrong, not that it’s unempirical. In other words, if that’s the problem you’re worried about, then you’re having a “normal” scientific argument, not defending the very nature of science itself.

This is just a little thing that irked me.

Silk and Ellis:

The many-worlds theory of quantum reality posed by physicist Hugh Everett is the ultimate quantum multiverse, where quantum probabilities affect the macroscopic. According to Everett, each of Schrödinger’s famous cats, the dead and the live, poisoned or not in its closed box by random radioactive decays, is real in its own universe. Each time you make a choice, even one as mundane as whether to go left or right, an alternative universe pops out of the quantum vacuum to accommodate the other action.

Personally, I’m a fan of this interpretation of quantum mechanics. But as far as I know, everyone acknowledges that the Everett interpretation is “just” an interpretation, not a distinct physical theory. In other words, it’s generally acknowledged that the question of whether to accept the Everett interpretation is outside the domain of science, precisely because the interpretation makes no distinct physical predictions. So it’s disingenuous to cite this as an example of people dragging science away from empiricism.

At a liberal-arts college like the University of Richmond, students are encouraged (and even to some extent required) to pursue interests beyond their primary field of study. This does come with costs (our students do less advanced coursework in their discipline than students who graduate from, say, a European university), but on balance I really like it.

In the past week, I’ve seen a couple of examples of students in my department doing exciting, non-physics-related things.

On Sunday, I went to the Christmas service of lessons and carols at the University Chapel. Two student choirs, the Women’s Chorale and the mixed-gender Schola Cantorum, performed a beautiful and highly varied set of Christmas vocal music. About half a dozen students I’ve had in advanced physics classes (Isaac Rohrer, Joe Kelly, Grace Dawson, Kelsey Janik, Ed Chandler, and I’m pretty sure some more I’m forgetting) were among the performers.

Just this morning, there was a piece on our local public radio station about an interdisciplinary art / archaeology / ecology project in which students consider and modify the environment of a university parking lot in a variety of ways. One of the students interviewed in the piece, David Ricculli, is a physics major who worked in my lab. Another, Kelsey Janik, is not a physics major but has taken several of our advanced courses, as well as my first-year seminar, and is also one of the singers.

In past years, I’ve seen our physics students display artwork in on-campus exhibitions, perform in a huge variety of theatrical and musical events, and present talks on topics like monetary policy in ancient Rome. I really enjoy seeing the range of things they can do.