May 2012 – Ted Bunn’s Blog

Watch out for Andromeda

There’s something I’ve wondered about for years. Astronomers always say that the Andromeda Galaxy (also known as M31) is going to collide with our own. After all, we measure the galaxy’s velocity via the Doppler effect, and we find it’s moving towards us. But the Doppler effect only lets you measure the radial component of the galaxy’s velocity — that is, how fast it’s moving towards or away from us, but not how fast it’s moving laterally. How do we know that Andromeda isn’t moving sideways at a high enough speed to miss us?

Well, it turns out not to be moving sideways very fast at all, so it is going to hit us. Good to know I guess. I’m just glad to see that this question I’ve been wondering about for so long is a legitimate question: I’ve seen so many mentions of the impending collision, with no reference at all to the lateral-velocity question, that I was wondering if I was missing something obvious.

Does NSF know how to design an experiment?

The journal Science has a news piece on an experimental program in which the National Science Foundation varied the way grant proposals are submitted and evaluated to see if there was any effect on the outcomes:

They invited the applicants to supplement their standard, 15-page project descriptions and itemized cost estimates with two-page synopses that left out the details of each proposal but underscored the central idea. All of the applicants agreed to participate.

The division assembled two peer-review panels. One rated the traditional full proposals, while the other evaluated the two-page versions, which omitted the names and affiliations of applicants.

The two panels came up with quite different ratings.

The Science piece goes on to speculate that making the applicants anonymous makes a big difference in how proposals are evaluated. They have to base this on anecdotal evidence, though: one person from a lesser-known institution got funded under the anonymized protocol but had previously been rejected under the onymized protocol. (“Onymized” isn’t a word, as far as I know, but it should be.)

They have to rely on anecdata rather than using the actual data from this experiment, for a reason that should be obvious to any middle-school science fair entrant: the experimental protocol changed two important things at once. There’s no way to tell from the results of this experiment whether the change in outcomes was due to the applicants’ anonymity or to the shortening of the proposals from 15 pages to 2.

Radically shortening proposals like this is a huge change. There’s no way for a 2-page proposal to contain a significant amount of information about the proposed research. I’d be astonished if you got similar results from reviews of the 15-page and 2-page proposals, even if you leave anonymity aside. But because of what appears to be a bizarrely flawed experimental design, I don’t know for sure, and neither does anyone else.

In fairness, Science does note that NSF plans to do another round of the study to try to separate out the two effects. But I’m baffled by the choice to put them together in the first place.

Another stupid comment from the Science piece:

Two divisions within NSF’s Directorate for Biological Sciences are already applying one insight gained from the Big Pitch: Shorter might be better.

Well I suppose it might be, but as far as I can tell this experiment provided precisely no evidence for this claim. It may or may not have shown that shorter is different: different proposals rose to the top in the two protocols. (I think it would have been astonishing if this had not been the case.) But as far as I can tell there’s no reason for thinking that either is better.

Electrostatics puzzle

Suppose you have two conducting spheres of different radii. Both have positive charge on them. If the spheres are far apart, of course, they repel each other in the usual Coulomb’s-Law way. You bring the two spheres closer and closer together. Does the force remain repulsive for arbitrarily small distances, or does it become attractive when the surfaces of the two spheres are sufficiently close? Does the answer depend on the values of the charges and radii?

(The idea, which should be familiar to undergraduate physics students, is that the charges move around on the surfaces of the conductors. When they’re close to each other, the positive charge on one sphere will repel the positive charge on the other, leaving negative charge nearby and potentially leading to an attractive force.)

The general solution seems to be quite difficult, but according to a Nature News article it was recently solved. The answer is that for almost all values of charges and radii, the spheres do attract each other at sufficiently close distances.

Even though the general case is quite ugly, you can turn this question into a nice puzzle, accessible to undergraduate physics students:

Show that there are some choices of charges and radii such that, at sufficiently close distances, the two spheres attract each other.

A hint, in case you need it: Consider extreme cases. That’s good advice for thinking about lots of physics problems, by the way — one of the reasons I think this is a cute puzzle to give students.

Stop digging

Andrew Sullivan deserves a lot of credit for posting dissenting views from readers on his blog. But sometimes he’d be better off not bothering. After posting a silly quote about complexity and the second law of thermodynamics, he posted a dissent from a reader that, remarkably, manages to be even sillier.

Among other things, the dissenter says that “more complex systems are more efficient in their use of energy.” It’s not clear that that clause even means anything: “more efficient” at achieving what goal? In what sense is a cow “more efficient” than a bacterium?

There is also no sense in which “increasing complexity is built into the fabric of the universe.” Some patches of the Universe (e.g., our neighborhood) are quite complex, but the overwhelming majority is simple and boring.

Finally, the dissenter quotes Einstein as saying “The most important decision we make is whether we believe we live in a friendly or hostile universe.” To an excellent approximation, all quotations attributed to Einstein are apocryphal. If the quote sounds spiritual and woo-woo, while not actually meaning anything, the probability rises to virtually 100%. In this particular case, the earliest printed source seems to be The Complete Idiot’s Guide to Spiritual Healing (2000). Enough said.

Want proof that data illiteracy is a problem? Talk to a member of Congress

If I could change one thing about our education system, I’d emphasize at all grade levels a set of skills you might call “data literacy” — a combination of things like numeracy, basic statistics, and generally the ability to think about quantitative data coherently.

Until we get this, I’ll just have to dream of the day when it would be surprising to hear a member of Congress say

In the end this is not a scientific survey. It’s a random survey.

Kudos to the reporter for following this sentence immediately with the following one-sentence paragraph:

In fact, the randomness of the survey is precisely what makes the survey scientific, statistical experts say.

Less physics for pre-meds?

The Association of American Medical Colleges has approved changes to the MCAT (the exam required for admission to US medical schools). Maybe I haven’t been paying attention, but as far as I can tell the physics community isn’t paying much attention to this. I think we should be, because some standard topics in the introductory physics sequence seem to have moved off the list of topics covered in the exam.

Compare this document describing topics covered on the current exam with this one describing the new one, which is to start in 2015. There are a number of topics on the old list but not the new list, the biggest ones (in my opinion) being magnetic fields and momentum.

I’m not going to comment on whether the loss of these topics will have a deleterious effect on future generations of physicians. I am interested in what effect it will have on the curriculum at universities such as mine. There are some topics on the MCAT list that we don’t cover in great depth in our first-year physics course because we don’t have time (e.g., sound, fluids, geometric optics). Many of the students coming through our introductory course are pre-med students. Will we be expected to dump magnetism and conservation of momentum to make room for these?

Nature really hates psychology

That’s Nature the premier science journal, not the nature that abhors a vacuum. Check out this news feature on problems in the field.

These problems occur throughout the sciences, but psychology has a number of deeply entrenched cultural norms that exacerbate them. It has become common practice, for example, to tweak experimental designs in ways that practically guarantee positive results. And once positive results are published, few researchers replicate the experiment exactly, instead carrying out ‘conceptual replications’ that test similar hypotheses using different methods. This practice, say critics, builds a house of cards on potentially shaky foundations.

I’m not a psychologist, so I can’t be sure how much merit there is in the article’s indictments. But don’t worry — I won’t let that stop me from commenting!

The article seems to me to make three main claims:

1. “Publication bias” is a serious problem: results that show something positive and surprising are much more likely to be published.
2. Psychologists often use dishonest statistical methods (perhaps unintentionally).
3. People don’t replicate previous results exactly.

Let’s take each in turn.

1. Publication bias.

There’s no doubt that experiments that show a positive result are more highly valued than experiments that don’t show one. It’s much better for your career to find something than to find nothing. That’s bound to lead to a certain amount of “publication bias”: the published literature will contain many more false positives than you’d naively expect, because the false positives are more likely to get published.

This problem is exacerbated by the fact that people often use the ridiculously low threshold of 95% confidence to decide whether a result is statistically significant. This means that, among the tests you do in which there is no real effect to find, you expect one out of every 20 to yield a false positive result. There are lots of people doing lots of tests, so of course there are lots of false positives. That’s why, according to John Ioannidis, “most published [medical] research findings are false.”
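To make the one-in-20 figure concrete, here is a minimal simulation sketch. It is not from the Nature article. It assumes every experiment tests a true null hypothesis (no real effect), draws unit-variance Gaussian data, and uses a simple two-sided z-test with known standard deviation. The function names and sample sizes are illustrative choices.

```python
import random
import statistics
from math import sqrt, erfc


def two_sided_p_value(sample, null_mean=0.0, sigma=1.0):
    """Two-sided z-test p-value, assuming the standard deviation sigma is known."""
    n = len(sample)
    z = (statistics.fmean(sample) - null_mean) / (sigma / sqrt(n))
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2.0))


def false_positive_rate(n_experiments=20000, n_per_experiment=30,
                        alpha=0.05, seed=0):
    """Fraction of no-effect experiments that come out 'significant' at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # The null hypothesis is true by construction: the mean really is zero.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_per_experiment)]
        if two_sided_p_value(sample) < alpha:
            hits += 1
    return hits / n_experiments


if __name__ == "__main__":
    print(f"false-positive rate at 95% confidence: {false_positive_rate():.3f}")
```

The printed rate should land close to 0.05. If only the "significant" experiments were written up, every one of them would be a false positive in this toy setup, which is the mechanism behind publication bias.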

The Nature piece claims that these problems are worse in psychology than in other fields. I don’t know if that’s true or not. Some evidence from the article:

Psychology and psychiatry, according to other work by Fanelli, are the worst offenders: they are five times more likely to report a positive result than are the space sciences, which are at the other end of the spectrum. The situation is not improving. In 1959, statistician Theodore Sterling found that 97% of the studies in four major psychology journals had reported statistically significant positive results. When he repeated the analysis in 1995, nothing had changed.

One reason for the excess in positive results for psychology is an emphasis on “slightly freak-show-ish” results, says Chris Chambers, an experimental psychologist at Cardiff University, UK. “High-impact journals often regard psychology as a sort of parlour-trick area,” he says. Results need to be exciting, eye-catching, even implausible. Simmons says that the blame lies partly in the review process. “When we review papers, we’re often making authors prove that their findings are novel or interesting,” he says. “We’re not often making them prove that their findings are true.”

Incidentally, the article has a graphic illustrating the fraction of papers in different disciplines that report positive results, but it doesn’t seem to support the claim that psychology is five times worse than space science.

I estimate psychology is at about 93% and space science is at about 70%. That’s not a factor of 5.

2. Dishonest statistics.

Nature doesn’t use a loaded word like “dishonest,” but here’s what they claim:

Many psychologists make on-the-fly decisions about key aspects of their studies, including how many volunteers to recruit, which variables to measure and how to analyse the results. These choices could be innocently made, but they give researchers the freedom to torture experiments and data until they produce positive results.

In a survey of more than 2,000 psychologists, Leslie John, a consumer psychologist from Harvard Business School in Boston, Massachusetts, showed that more than 50% had waited to decide whether to collect more data until they had checked the significance of their results, thereby allowing them to hold out until positive results materialize. More than 40% had selectively reported studies that “worked”. On average, most respondents felt that these practices were defensible. “Many people continue to use these approaches because that is how they were taught,” says Brent Roberts, a psychologist at the University of Illinois at Urbana–Champaign.

If you deliberately choose to report data that lead to positive results and not data that lead to negative results, then you’re just plain lying about your data. The same goes for continuing to gather data, only stopping when “positive results materialize.”

Once again, I am not claiming that psychologists do this. I am claiming that if they do this, as Nature claims, it’s a very serious problem.
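To see why stopping only when “positive results materialize” is a problem, here is a rough simulation sketch. It rests on assumptions of my own choosing, not on anything in the Nature article: there is no real effect, the data are unit-variance Gaussian, the test is a two-sided z-test, and the hypothetical researcher checks significance after 20 subjects and then after every additional 10, up to 200.

```python
import random
from math import sqrt, erfc


def p_value(total, n):
    """Two-sided z-test p-value for the mean of n unit-variance draws with sum `total`."""
    z = abs(total) / sqrt(n)  # (total / n) divided by the standard error 1 / sqrt(n)
    return erfc(z / sqrt(2.0))


def significant_with_peeking(rng, start=20, step=10, max_n=200, alpha=0.05):
    """Collect data in batches and stop as soon as the result looks significant."""
    total, n = 0.0, 0
    while n < max_n:
        batch = start if n == 0 else step
        for _ in range(batch):
            total += rng.gauss(0.0, 1.0)  # no real effect: the true mean is zero
        n += batch
        if p_value(total, n) < alpha:
            return True  # stop and report a "positive" result
    return False


def significant_fixed_n(rng, n=200, alpha=0.05):
    """Decide the sample size in advance and test exactly once."""
    total = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    return p_value(total, n) < alpha


def compare(n_experiments=5000, seed=1):
    """Return (false-positive rate with peeking, false-positive rate with fixed n)."""
    rng = random.Random(seed)
    peeking = sum(significant_with_peeking(rng) for _ in range(n_experiments))
    fixed = sum(significant_fixed_n(rng) for _ in range(n_experiments))
    return peeking / n_experiments, fixed / n_experiments


if __name__ == "__main__":
    peeking_rate, fixed_rate = compare()
    print(f"peeking: {peeking_rate:.3f}   fixed sample size: {fixed_rate:.3f}")
```

The fixed-sample-size rate should sit near the nominal 5%, while the peeking rate comes out well above it, even though every individual test uses the same 95% threshold.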

3. People don’t replicate previous studies.

The Nature article tells the story of the people who tried to replicate a previous study claiming to show precognition. Journals wouldn’t accept the paper detailing the failed replication. I agree with the author that this is a shame.

But the article’s concerns on this seem quite overwrought to me. They point out that most of the time people don’t try to replicate studies exactly, instead performing “conceptual replications,” in which they do something similar but not identical to what’s been done before. The author seems to think that this is a problem, but I don’t really see why. Here’s his argument:

But to other psychologists, reliance on conceptual replication is problematic. “You can’t replicate a concept,” says Chambers. “It’s so subjective. It’s anybody’s guess as to how similar something needs to be to count as a conceptual replication.” The practice also produces a “logical double-standard”, he says. For example, if a heavy clipboard unconsciously influences people’s judgements, that could be taken to conceptually replicate the slow-walking effect. But if the weight of the clipboard had no influence, no one would argue that priming had been conceptually falsified. With its ability to verify but not falsify, conceptual replication allows weak results to support one another. “It is the scientific embodiment of confirmation bias,” says Brian Nosek, a social psychologist from the University of Virginia in Charlottesville. “Psychology would suffer if it wasn’t practised but it doesn’t replace direct replication. To show that ‘A’ is true, you don’t do ‘B’. You do ‘A’ again.”

It’s true that a “conceptual replication” doesn’t directly falsify a particular result in the same way that a (failed) exact replication would. But I can’t bring myself to care that much. If a result is incorrect, it will gradually become clear that it doesn’t fit in with the pattern built up by many subsequent similar-but-not-identical experiments. The incorrect result will gradually atrophy from lack of attention, even if there’s not a single definitive refutation.

At least, that’s the way I see things working in physics, where direct replications of previous experiments are extremely uncommon.

Scooped … by Newton and Leibniz

I just learned via the FQXi twitter feed about this paper that was actually published in a medical journal in 1994:

A Mathematical Model for the Determination of Total Area Under Glucose Tolerance and Other Metabolic Curves

Abstract

OBJECTIVE To develop a mathematical model for the determination of total areas under curves from various metabolic studies.

RESEARCH DESIGN AND METHODS In Tai’s Model, the total area under a curve is computed by dividing the area under the curve between two designated values on the X-axis (abscissas) into small segments (rectangles and triangles) whose areas can be accurately calculated from their respective geometrical formulas. The total sum of these individual areas thus represents the total area under the curve. Validity of the model is established by comparing total areas obtained from this model to these same areas obtained from graphic method (less than ±0.4%). Other formulas widely applied by researchers under- or overestimated total area under a metabolic curve by a great margin.

RESULTS Tai’s model proves to be able to 1) determine total area under a curve with precision; 2) calculate area with varied shapes that may or may not intercept on one or both X/Y axes; 3) estimate total area under a curve plotted against varied time intervals (abscissas), whereas other formulas only allow the same time interval; and 4) compare total areas of metabolic curves produced by different studies.

CONCLUSIONS The Tai model allows flexibility in experimental conditions, which means, in the case of the glucose-response curve, samples can be taken with differing time intervals and total area under the curve can still be determined with precision.

That’s right — the author discovered the trapezoidal rule from integral calculus and named it after herself. A subsequent issue of the journal contained several letters pointing this out. (Both links may be paywalled for all I know.)

It’s pretty embarrassing that such a paper (a) got written in the first place and (b) passed peer review, but the really shocking part is that the paper has garnered 161 citations.
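For reference, here is a short sketch of the trapezoidal rule with unevenly spaced sample times. Each segment is a rectangle plus a triangle, which together make a trapezoid. The function name and the glucose readings are made up for illustration and are not taken from the paper.

```python
def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i]).

    The x values must be in increasing order but need not be evenly spaced.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("need at least two points")
    area = 0.0
    for i in range(1, len(x)):
        width = x[i] - x[i - 1]
        # Width times the average of the two heights: one trapezoid.
        area += 0.5 * width * (y[i] + y[i - 1])
    return area


if __name__ == "__main__":
    # Hypothetical glucose-tolerance readings with uneven time intervals.
    times_min = [0, 15, 30, 60, 120]
    glucose_mg_dl = [90, 140, 160, 130, 100]
    print(trapezoid_area(times_min, glucose_mg_dl), "mg/dL * min")
```

Because the width of each segment is computed separately, differing time intervals need no special handling.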

Why astrophysicists should measure immigration delays at Heathrow

As I’ve mentioned before, I like the BBC’s podcast More or Less, which examines the use and misuse of statistics in the news. It generally conveys some reasonably sophisticated ideas engagingly and correctly. Here’s a nice example I learned from them recently.

The British Government has set a goal for the time people spend in immigration queues at Heathrow. According to an article by Tim Harford, host of More or Less,

The Border Force is supposed to ensure that passengers from outside the EU get through immigration checks within 45 minutes 19 times out of 20, while EU-based passengers should get through within 25 minutes, again 19 times out of 20.

They then measure whether this goal has been met:

At regular intervals [once per hour, to be precise] they pick somebody joining the back of the queue and then time how long it takes for that person to clear immigration.

At first glance, this might sound like a reasonable method, but in fact it’s not. The reason should certainly be familiar to astrophysicists (and probably to lots of other sorts of scientists). I’ll put a page break here just in case you want to think about it for a minute.


ESA’s going to Jupiter

The European Space Agency’s next large mission will be JUICE, a probe to study Jupiter’s icy moons. People who study other areas of astrophysics (like me) are disappointed that ESA didn’t choose a mission devoted to the stuff we’re terribly excited about. In particular, some people are very disappointed that the X-ray observatory ATHENA lost out.

I don’t really know what should have happened, but this blog post (which I learned about from Peter Coles, by the way) does a pretty good job of explaining why JUICE is interesting.

The most striking thing to me is the extremely long time scale. JUICE isn’t scheduled to get to Jupiter until the 2030s. I know it’s sometimes necessary to plan way in advance, but it does seem like a big gamble to devote a bunch of resources to something that far off. How certain are we that the questions that seem interesting now will still seem interesting then?