Zheng & Bunn paper submitted

Jeff Zheng (UR ’11) and I just submitted a paper for publication in Physical Review D.  It’s also on the preprint arXiv.  The point of the paper is to examine possible explanations for a couple of the supposed “anomalies” in the cosmic microwave background radiation on large angular scales, that is, patterns in the microwave background that shouldn’t be there according to the standard theory.  For instance, fluctuations corresponding to waves with different wavelengths on the sky should be completely unrelated to each other, but a couple of the largest observable wave patterns point in almost exactly the same direction as each other.  In addition, one half of the sky seems to have slightly larger-amplitude fluctuations than the other half.

It’s hard to know how seriously to take these puzzles: human beings are very good at seeing patterns, even when those patterns are just due to chance.  That might be all that’s going on here, but in our paper we tentatively adopt the point of view that the patterns are meaningful, and then assess some possible theories for what might have caused them.  We show that some alternative theories do provide a better fit to the data, but only slightly better, and we calculate (using a piece of machinery known as Bayesian evidence) that that improvement in fit is too small to be regarded as significant evidence in favor of the alternative theories.
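
To give a feel for what that machinery does, here’s a toy illustration of my own (not the calculation in the paper): adding a parameter always improves the best fit a little, but the Bayesian evidence averages the fit over the parameter’s whole prior range, so a small improvement gets swamped by the “Occam penalty” for the extra freedom.  A minimal sketch, with made-up numbers:

```python
# Toy model comparison.  Baseline model: n data points with unit noise and mean
# fixed at zero (no free parameters).  Alternative model: mean mu, with a prior
# mu ~ N(0, sigma^2).  The numbers below (n, sample mean, prior width) are
# invented for illustration.
import math

n, ybar, sigma = 20, 0.3, 1.0

# Improvement in the best fit from the extra parameter:
delta_chi2 = n * ybar**2

# Ratio of Bayesian evidences (alternative / baseline), integrating mu
# analytically over its Gaussian prior:
bayes_factor = math.sqrt(1 / (n * sigma**2 + 1)) * math.exp(
    n**2 * ybar**2 * sigma**2 / (2 * (n * sigma**2 + 1)))

print(f"delta chi^2 = {delta_chi2:.1f}, Bayes factor = {bayes_factor:.2f}")
# -> delta chi^2 = 1.8, Bayes factor = 0.51: the fit improves, but the evidence
#    doesn't actually favor the more complicated model.
```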

Science News on probability and statistics

This piece does a good job of explaining some of the dangers in the use and interpretation of statistics in scientific studies. It’s mostly about ways in which the statistical results quoted in the scientific literature can be misleading (to scientists as well as to the general public).

The article chiefly attacks the use of 95% confidence results (i.e., results that reject the null hypothesis with p=0.05 or less) as indications that something has been scientifically established.  It does a good job laying out several related problems with this attitude:

  • 95% isn’t all that high.  If you do a bunch of such tests, you get a false positive one time out of every 20.  Lots of scientists are out there doing lots of tests, so there are bound to be lots of false positives.
  • Sometimes even a single study will perform many comparisons, each of which could yield a positive result.  In that case, the probability of getting at least one false positive goes up very rapidly (see the sketch after this list).
  • Of course we hear about the positive results, not the negative ones.  Because of that selection, a lot (much more than 5%) of the results you hear about are false positives.
  • People — even lots of scientists — misunderstand what the probabilities here refer to.  When a test is done that has a p-value of 5% (often referred to as a 95% confidence result), they think that it means that there’s a 95% chance the hypothesis being tested is correct.  In fact, it means that there’s a 5% chance that the test would have come out the way it did (or more extremely) if the hypothesis is false.  That is, they’re probabilities about the possible results of the test, not probabilities about the ideas being tested.  That distinction seems minor, but it’s actually hugely important.
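
To put some numbers on the first two points (a sketch of my own, not from the article): if each individual test has a 5% false-positive rate and there’s no real effect anywhere, the chance of getting at least one false positive grows quickly with the number of tests.

```python
# Chance of at least one false positive when m independent tests are each done
# at the 95% confidence (p = 0.05) level and no real effects exist.
for m in (1, 5, 20, 100):
    p_any = 1 - 0.95**m
    print(f"{m:3d} tests: {p_any:.0%} chance of at least one false positive")
# -> roughly 5%, 23%, 64%, and 99% respectively.
```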

If you don’t think the distinction in that last item matters, imagine the following scenario.  Your doctor gives you a test for a deadly disease, and the test is 99% accurate.  If you get a positive result, it does not mean there’s a 99% chance you have the disease: when the disease is rare, most of the positive results come from the many healthy people being tested, so the majority of positives can be false.  Box 4 in the article works through a numerical example of this.
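
Here’s a worked version of that sort of example (the numbers are mine, not the ones in the article’s Box 4): a rare disease plus a “99% accurate” test.

```python
# Bayes' theorem for the disease-test scenario.  Assumed numbers: 1 person in
# 1000 has the disease; the test catches 99% of real cases and gives a false
# positive 1% of the time.
base_rate   = 0.001   # P(disease)
sensitivity = 0.99    # P(positive | disease)
false_pos   = 0.01    # P(positive | no disease)

p_positive = sensitivity * base_rate + false_pos * (1 - base_rate)
p_disease_given_positive = sensitivity * base_rate / p_positive
print(f"P(disease | positive test) = {p_disease_given_positive:.0%}")
# -> about 9%: most positive results come from the much larger pool of healthy
#    people, so a positive test is far from a 99% guarantee.
```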

As the article puts it,

Correctly phrased, experimental data yielding a P value of .05 means that there is only a 5 percent chance of obtaining the observed (or more extreme) result if no real effect exists (that is, if the no-difference hypothesis is correct). But many explanations mangle the subtleties in that definition. A recent popular book on issues involving science, for example, states a commonly held misperception about the meaning of statistical significance at the .05 level: "This means that it is 95 percent certain that the observed difference between groups, or sets of samples, is real and could not have arisen by chance."

That interpretation commits an egregious logical error (technical term: "transposed conditional"): confusing the odds of getting a result (if a hypothesis is true) with the odds favoring the hypothesis if you observe that result. A well-fed dog may seldom bark, but observing the rare bark does not imply that the dog is hungry. A dog may bark 5 percent of the time even if it is well-fed all of the time.

This is exactly right, and it’s a very important distinction.

The specific cases discussed in the article mostly have to do with medical research.  I know very little about the cultural attitudes in that discipline, so it’s hard for me to judge some things that are said.  The article seems (as I read it) to imply that lots of people, including scientists, regard a 95% confidence result as meaning that something is pretty well established as true.  If that’s correct, then lots of people are out there believing lots of wrong things.   A 95% confidence result should be regarded as an interesting hint that something might be true, leading to new hypotheses and experiments that will either confirm or refute it.  Something’s not well-established until its statistical significance is way better than that.

Let me repeat: I have no idea whether medical researchers really do routinely make that error.  The article seems to me to suggest that they do, but I have no way of telling whether it’s right.  It certainly is true that science journalism falls into this trap with depressing regularity, though.

Since I don’t know much about medical research, let me comment on a couple of ways this stuff plays out in physics.

  • In astrophysics we do quote 95% confidence results quite often, although we also use other confidence levels.  Most of the time, I think, other researchers correctly adopt the interesting-hint attitude towards such results.  In particle physics, they’re often quite strict in their use of terminology: a particle physicist would never claim a “detection” of a particle based on a mere 95% confidence result.  I think that their usual threshold for use of that magic word is either 4 or 5 sigma (for normally distributed errors), which means roughly 99.99% or 99.9999% confidence (see the conversion sketched after this list).
  • The multiple-tests problem, on the other hand, can be serious in physics.  One place it shows up, which I’ve written about before, is in the various “anomalies” that people have claimed to see in the microwave background data.  A bunch of these anomalies show up in statistical tests at 95% to 99% confidence.  But we don’t have any good way of assessing how many tests have been done (many, especially those that yield no detection, aren’t published), so it’s hard to tell how to interpret these results.
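
For reference, here’s the sigma-to-confidence conversion mentioned in the first item above (my quick sketch; conventions differ on one- versus two-sided definitions, which shifts the numbers a bit):

```python
# Convert an n-sigma fluctuation (assuming normally distributed errors) into a
# two-sided p-value and the corresponding confidence level.
from scipy.stats import norm

for n_sigma in (2, 3, 4, 5):
    p = 2 * norm.sf(n_sigma)   # probability of a fluctuation at least this large
    print(f"{n_sigma} sigma: p = {p:.1e}, confidence = {1 - p:.6%}")
# -> 2 sigma ~ 95%, 3 sigma ~ 99.7%, 4 sigma ~ 99.994%, 5 sigma ~ 99.99994%
```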

Although the Science News article is mostly right, I do have some complaints.  The main one is just that it is overwrought from time to time:

It's science's dirtiest secret: The "scientific method" of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions.

Nonsense.  It’s certainly true that people mis- and overinterpret statistical statements all the time, both in the general-interest press and in the scholarly literature, but that doesn’t mean that the tools themselves are invalid.  If I use a hammer to open a beer bottle, I’ll get bad results, but it’s not the hammer’s fault.

By the way, the “mutually inconsistent philosophies” here seem at one point to refer to the quite obscure difference between Fisher’s approach and that of Neyman and Pearson, and later to be the somewhat less obscure difference between frequentists and Bayesians.  Either way, “mutually inconsistent” and “offer no meaningful basis” are huge exaggerations.

(Lots of people seem to think that such clash-of-the-titans language is correct when applied to frequentists vs. Bayesians, but I think that’s wrong.  When it comes to statistical methods, as opposed to the pure philosophy of probability, the two approaches are simply different sets of tools, not irreconcilable ways of viewing the world.  People can and do use tools from both boxes.)

The concluding few paragraphs of the article take this hand-wringing to absurd lengths. For the record, it is absolutely not true, as a quotation from the author David Salsburg claims, that the coexistence of Bayesian and frequentist attitudes to statistics means that the whole edifice “may come crashing down from the weight of its own inconsistencies.”  The problems described in the article are real, but they’re cultural problems in the way people communicate and talk about their results, not problems in the philosophy of probability.

One other technical problem: the article suggests that randomized clinical trials have a problem because they don’t guarantee that all relevant characteristics are equally split between the treatment and control groups:

Randomization also should ensure that unknown differences among individuals are mixed in roughly the same proportions in the groups being tested. But statistics do not guarantee an equal distribution any more than they prohibit 10 heads in a row when flipping a penny. With thousands of clinical trials in progress, some will not be well randomized.

This is true but is not a problem.  The possibility of an unlucky randomization is automatically accounted for in the statistical analysis (i.e., the p-values) that results from the study.
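
If you want to convince yourself of that, here’s a little simulation of my own (not from the article): many fake randomized trials with no real treatment effect, each including an influential covariate that the randomization sometimes splits unevenly.  The false-positive rate still comes out at the nominal 5%.

```python
# Simulated randomized trials with NO treatment effect.  Each subject has an
# unmeasured "risk" covariate; randomization occasionally leaves the two arms
# imbalanced in it, yet the t-test still rejects at about the nominal 5% rate.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_trials, n_per_arm = 5000, 50
false_positives = 0

for _ in range(n_trials):
    risk = rng.normal(size=2 * n_per_arm)                 # hidden covariate
    outcome = risk + rng.normal(size=2 * n_per_arm)       # outcome; treatment does nothing
    order = rng.permutation(2 * n_per_arm)                # random assignment to arms
    a = outcome[order[:n_per_arm]]
    b = outcome[order[n_per_arm:]]
    if ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1

print(f"False-positive rate: {false_positives / n_trials:.1%}")   # comes out near 5%
```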

Earthquakes and Earth’s rotation

Updates: 1. Every article other than the one I linked to says the change in the length of the day is in microseconds, not milliseconds.  Much more plausible.  2. The Onion’s on the case.

Ashley pointed out this article on the Chile earthquake’s effect on Earth’s rotation.

The massive 8.8 earthquake that struck Chile may have changed the entire Earth’s rotation and shortened the length of days on our planet, a NASA scientist said Monday.

The quake, the seventh strongest earthquake in recorded history, hit Chile Saturday and should have shortened the length of an Earth day by 1.26 milliseconds, according to research scientist Richard Gross at NASA’s Jet Propulsion Laboratory in Pasadena, Calif.

“Perhaps more impressive is how much the quake shifted Earth’s axis,” NASA officials said in a Monday update.

The change in the length of the day is good first-year physics stuff.  Angular momentum is conserved, and is equal to moment of inertia times rotation rate.  The moment of inertia of a body depends on how its mass is distributed.  If you change the distribution you change the moment of inertia, and the rotation rate has to change to compensate.  Think of the standard-issue spinning figure skater pulling in his arms, or a diver going into a tuck position, and starting to rotate faster.  I’m a bit surprised the change is as large as this, but I guess it’s possible.
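
Here’s the back-of-the-envelope version of that argument, using the microsecond figure from the update above (my arithmetic, not the JPL calculation):

```python
# Conservation of angular momentum: L = I * omega is fixed, so a change in the
# moment of inertia I forces a change in the rotation rate omega.  Since the
# day length is T = 2*pi/omega, the fractional changes match: dT/T = dI/I.
day = 86400.0    # length of a day, seconds
dT  = 1.26e-6    # quoted change in day length, taken as microseconds per the update

print(f"Implied fractional change in Earth's moment of inertia: {dT / day:.1e}")
# -> about 1.5e-11: the quake only had to shift Earth's mass distribution by
#    roughly one part in 10^11 to produce the quoted effect.
```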

Here’s an embarrassing confession. I can’t make sense of this:

The Earth’s figure axis is not the same as its north-south axis, which it spins around once every day at a speed of about 1,000 mph (1,604 kph).

The figure axis is the axis around which the Earth’s mass is balanced. It is offset from the Earth’s north-south axis by about 33 feet (10 meters).

I don’t think I know what “figure axis” means in this context.  The Earth at any instant has an axis about which it’s rotating, and that axis will always pass through the center of mass, which is my best guess at the meaning of the phrase “around which the Earth’s mass is balanced.”  But is that the figure axis or the north-south axis?  What’s the difference between the two?  (The north-south axis could in principle be defined by the magnetic field, but that would be different by much more than 10 meters, so it’s not that.)

There’s one other thing I don’t understand:

Over the course of a year, the length of a day normally changes gradually by one millisecond. It increases in the winter, when the Earth rotates more slowly, and decreases in the summer, Gross has said in the past.

Why would Earth’s rotation vary over the course of a year?  I can think of two possibilities:

Possibility 1. Annual changes in wind speed and/or direction.  The total angular momentum of Earth-plus-atmosphere is what’s conserved, so when the wind is blowing west to east, the Earth will rotate slower than when it’s blowing east to west.  Do winds blow more west to east in the (northern-hemisphere) winter?  Paging my brother Andy for the answer to this.

Possibility 2. The article’s made a mistake.  It’s not that the rotation rate changes, but rather that the Earth’s orbital speed around the Sun changes.  If the rotation rate is fixed, then the length of a sidereal day (a day measured relative to the stars)  remains the same.  But a solar day (measured relative to the Sun, of course) is a bit longer than a sidereal day, and the difference depends on the orbital speed.  In the (northern-hemisphere) winter, the orbital speed is faster, which means that the length of a solar day is longer, and vice versa in the summer.  So that effect has the right sign to be what Gross is talking about.  But it’s much too large an effect: I think it’s a few seconds, not milliseconds.
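
Here’s a rough, eccentricity-only version of that estimate (my numbers; the tilt of Earth’s axis also contributes to the annual variation, but I’m ignoring it):

```python
# How much the apparent solar day varies over the year because Earth's orbital
# angular speed varies.  The solar day is T_solar = 2*pi/(omega_rot - omega_orb),
# so a small change in the orbital angular speed changes it by
#   d(T_solar) ~ (T_solar**2 / (2*pi)) * d(omega_orb),
# and over an orbit of eccentricity e the orbital angular speed swings by about
# +/- 2*e times its mean value of 2*pi/T_year.
T_solar = 86400.0          # mean solar day, seconds
T_year  = 365.25 * 86400   # orbital period, seconds
e       = 0.0167           # Earth's orbital eccentricity

dT_solar = T_solar**2 * 2 * e / T_year
print(f"Solar day varies by roughly +/- {dT_solar:.0f} s over the year")
# -> about +/- 8 s: seconds, not milliseconds, consistent with the guess above.
```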

After the jump, I’ll try a back-of-the-envelope calculation to see if Possibility 2 makes sense.


Physics for pre-meds

Last June, a committee convened by the Association of American Medical Colleges and the Howard Hughes Medical Institute issued a report titled Scientific Foundations for Future Physicians, proposing changes to the science requirements for medical students, including both the pre-med and medical-school curricula.  Among other things, this report is intended as input to a committee that is planning major revisions to the MCAT some time around 2014.  As far as I can tell, physics faculty members (including me until recently) tend not to know this is going on.  But since many physics departments earn a significant part of their living by teaching premeds, we should probably be paying attention to this process.

The main broad-brush recommendation in the report is to move away from specific lists of required courses and toward “competency” requirements.  Medical schools should no longer say, “Thou shalt take two semesters of physics,” for instance, but rather should require that students be able to perform certain tasks.  Part of the reason for this is to remove barriers to colleges that want to implement novel ways of teaching science, especially ways that emphasize interdisciplinarity:

Organizing educational programs according to departmental priorities is a long-standing tradition in both undergraduate and professional education, but some institutions have begun to develop their educational program through an integrated, nondepartmental approach, and it is this approach the committee supports in its report.

That quote could have been talking about UR’s new Interdisciplinary Quantitative Science course.  During the development of the course, one thing we had to pay attention to was making sure that it checked all of the required pre-med boxes, and in particular that it would be evident from the transcript that students had had the required courses.  For instance, since medical schools require two semesters of physics, and this course replaces one of those, we had to make sure that at least one unit’s worth of the course was listed in the transcript as Physics (in addition, of course, to making sure students actually learned the appropriate physics).

Naturally, one of the first things I looked at in the document was how the proposed changes would affect the physics that students would take.  One of the eight “competencies” recommended for admission to medical school is

Demonstrate knowledge of basic physical principles and their application to the understanding of living systems.

This is fleshed out with a bunch of “learning objectives”:

  • Mechanics
  • Electricity & magnetism
  • Waves & optics
  • Thermodynamics & fluids
  • Principles of quantum mechanics
  • Principles of systems behavior

The committee’s recommendation is that these competencies replace explicit course requirements such as “two semesters of physics.”  But the above list of learning objectives pretty much matches what’s taught in a usual two-semester physics-for-premeds sequence.  Actually, it covers a bit more than that: we never do “principles of systems behavior”, and quantum mechanics is often left out as well.

So it seems to me that, if these recommendations are implemented, premed students will not end up taking less physics than they do now.  To a good approximation, they won’t even be taking different physics from what they do now.  As far as physics is concerned, it’s surprising how little change the committee recommends.  Despite the report’s words about encouraging interdisciplinary approaches to teaching science, it’s easy to imagine these recommendations leading to physics-for-premed courses chugging along pretty much as before.

Of course, I don’t know if medical school admissions people, and more importantly the MCAT-redesign people, will adopt these recommendations, or how they’ll implement them if they do.  In particular, what actually happens with the MCAT will probably be the most important driver of changes to the premed curriculum.  The MCAT-revision process is just getting started now.  People who care about undergraduate science curriculum issues should certainly pay attention.