Auntie Beeb

I got an email from someone at the BBC last week, asking for permission to use some images of mine (as opposed to images of me, in which they have expressed no interest) in an upcoming documentary. The images in question are ones I made to illustrate some aspects of the maps made by the COBE satellite about 20 years ago. I was a bit surprised that they wanted them, but of course I’m happy to help out.

Because the images are so old, I couldn’t lay my hands on decently high-resolution versions of them. All I could find were copies at various web sites such as this one by Wayne Hu. It turned out to be easy enough just to remake them, so that’s what I did. In fact, thanks to the HEALPix software package, it was easy to make versions that were considerably better than the originals.

The BBC may not end up using the images, but I'll go ahead and put them up here with a bit of explanation anyway.

COBE made all-sky maps of temperature variations in the cosmic microwave background radiation. The pattern of hot and cold regions in these maps provided invaluable information about the early Universe and won a couple of people the Nobel Prize. The COBE maps look like this:

(I didn’t make this one, by the way. The COBE team did. The rest of the images in this post are mine.)

The COBE instrument had imperfect resolution (like all telescopes), meaning that it couldn’t see features smaller than some given size. It also had significant noise in the images, because the signals it was looking at were very weak. So the relation between what COBE saw and what’s actually out there is not obvious. Here’s one way to illustrate the difference.

Suppose that COBE had been designed to measure the Earth, rather than the Universe. Then the “true” signal it would look at is something like this:

(This is a map of the elevation of Earth’s surface.)

The telescope’s resolution is such that features smaller than about 7 degrees are blurred out, so the map would be degraded to something like this:

To make matters worse, there is noise (random “static”) in the data, which would make the actual observed map look more like this:

(By the way, I wasn’t terribly precise at this stage in the process. The noise level is roughly equivalent to the COBE noise level, but only roughly.)

You can see the large-scale features (e.g., continents) peeking out from the noise, but especially on small scales the signal is dominated by noise.
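In case anyone wants to play along at home, here is roughly how you could produce maps like these with healpy, the Python interface to the HEALPix package mentioned above. The file name and the noise level in the snippet are placeholders for illustration, not the exact values behind the images in this post:

```python
import numpy as np
import healpy as hp

# "True" signal: an all-sky HEALPix map of Earth's elevation.
# The file name is a placeholder for whatever elevation map you have.
true_map = hp.read_map("earth_elevation.fits")

# Blur out features smaller than about 7 degrees, mimicking the beam.
smoothed = hp.smoothing(true_map, fwhm=np.radians(7.0))

# Add white pixel noise; the 0.3 is only a rough guess at the level.
sigma = 0.3 * np.std(smoothed)
noisy = smoothed + np.random.normal(0.0, sigma, smoothed.size)

hp.mollview(smoothed, title="Blurred to ~7 degrees")
hp.mollview(noisy, title="Blurred + noise")
```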

There are various ways you can “filter” the data to reduce the effects of noise. The optimal filter (for some definition of “optimal”) in this situation is called the Wiener filter. This is essentially a way of smoothing out the data to get rid of the small-scale variation (which is mostly noise) and keep the large-scale stuff (which is mostly signal). If you apply a Wiener filter to the noisy map above, you get this:

This is what you might reasonably expect to see if you observed the elevation of Earth with a COBE-like instrument. The large-scale features do correspond roughly to real things, but you can’t trust all the details.
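The Wiener filter itself is easy to describe in spherical-harmonic space: each harmonic coefficient gets multiplied by S_l / (S_l + N_l), where S_l and N_l are the signal and noise power spectra, so the signal-dominated large scales pass through and the noise-dominated small scales get suppressed. Continuing the sketch above (and cheating slightly, since in this toy example we know the signal and the noise separately):

```python
# Continuing from the previous snippet. Estimate the signal and noise
# spectra, build the filter S_l / (S_l + N_l), and apply it to the a_lm's.
lmax = 3 * hp.get_nside(noisy) - 1

signal_cl = hp.anafast(smoothed, lmax=lmax)
noise_cl = hp.anafast(noisy - smoothed, lmax=lmax)
wiener = signal_cl / (signal_cl + noise_cl)

alm = hp.map2alm(noisy, lmax=lmax)
filtered = hp.alm2map(hp.almxfl(alm, wiener), hp.get_nside(noisy))

hp.mollview(filtered, title="Wiener-filtered map")
```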

Note that this is not a criticism of the COBE work — it’s just that the signals they were looking for were very hard to measure. That’s why other telescopes, most notably the successor satellites WMAP and Planck, were necessary.

Telescope lost and found

Some of you probably already know about this, but for those who don't, here's a strange story. A telescope designed to observe the cosmic microwave background radiation was on its way to the NASA balloon launch facility in Palestine, Texas, when it went missing for about three days.

The driver of the truck disappeared for a while, then turned up without the trailer containing the telescope. The trailer surfaced later. The driver won't say what happened.

Apparently the telescope is fine. All that was missing from the trailer were “two bicycles and three ladders.”

The cost to replace the telescope would have been enormous, but of course it’s an extremely illiquid asset — who would you try to fence it to?

Since I work in this field, it’s no surprise that I know many of the people involved in the experiment, and of course I’m very relieved for them and for the cosmology community.

I feel a strong connection to this story, because many years ago I was briefly a member of a team that flew a similar (albeit much more primitive) microwave background telescope from the balloon launch facility in Palestine. I wasn’t involved in the project for very long, and my only real contribution was driving the truck containing the telescope back from Palestine to California with my friend and fellow graduate student Warren Holmes. Nowadays, apparently they outsource tasks like that to professionals rather than grad students. Look how well that worked out.

A couple of elitist coastal vignettes about Palestine:

  1. One guy on the experiment was a vegetarian. Options for him were not plentiful. He got mighty sick of the salad bar at the Golden Corral.
  2. One of the more interesting places to visit in Palestine was the pawn shop. As I recall, about 80% of the shelf space was taken up by two items: guns and guitars.


The new MCAT

Only about two months late, I finally got around to reading the New York Times Education Life piece on the new MCAT (the test taken by applicants to US medical schools). Self-centered soul that I am, I've primarily paid attention to what's happening to the physics content of the MCAT, not to what other new subjects are being added.

The short answer is social sciences, primarily psychology and sociology.

This may well be a good idea, although the primary reason for the change, according to the article, strikes me as bizarre:

 In surveys, “the public had great confidence in doctors’ knowledge but much less in their bedside manner,” said Darrell G. Kirch, president of the association, in announcing the change. “The goal is to improve the medical admissions process to find the people who you and I would want as our doctors. Being a good doctor isn’t just about understanding science, it’s about understanding people.”

The adoption of the new test, which will be first administered in 2015, is part of a decade-long effort by medical educators to restore a bit of good old-fashioned healing and bedside patient skills into a profession that has come to be dominated by technology and laboratory testing.

The hypothesis that studying psychology and sociology will improve bedside manner is, to say the least, far from obviously true. Studying psychology to improve this sort of skill is about as likely to work as studying physics to improve your ability to play baseball.

Let me stipulate a couple of things. This is not intended as a putdown of psychology as a discipline. Physics is worth studying even if it doesn’t improve your batting average, and psychology is worth studying even if it doesn’t improve your bedside manner. Similarly, I’m not claiming that the change in emphasis on the MCAT is a bad idea — in fact, it may well be very sensible. I just don’t think this is a good reason for it. I doubt very much that anything you could put on a multiple-choice test would select well for bedside manner.

A few other little things:

1. The article begins with an anecdote about a medical ethics class that had a sudden increase in enrollment due to students wanting to be ready for the new MCAT. But these students will have taken the MCAT before the new test rolls out. Perhaps what the test really needs is an emphasis on arithmetic.

2. Some of the graphics in the article are truly atrocious.

Presumably the point of this one is to tell you something about the gender balance at different stages of the application process. Without actually counting the little purple and orange folks, can you tell whether the percentage of women goes up or down at each stage?

Then there’s this one.

Again the point is presumably to show how the percentages of different groups change from the applicant pool to the accepted pool. Can you tell from this graph whether the green percentage went up or down? The magenta percentage? (Confession: I cropped out the legend, which listed the percentages, so in fact you’d know the answers if you saw the original graph. But if the only way to tell the interesting information is by reading the numbers from the legend, what’s the point of the pie chart?)

Also, because the wedges are different heights, the volumes aren’t proportional to the actual percentages. This is a classic how-to-lie-with-statistics problem.

These graphs aren’t quite perfectly designed to obscure the relevant information, but they’re close.

Watch out for Andromeda

There’s something I’ve wondered about for years. Astronomers always say that the Andromeda Galaxy (also known as M31) is going to collide with our own. After all, we measure the galaxy’s velocity via the Doppler effect, and we find it’s moving towards us. But the Doppler effect only lets you measure the radial component of the galaxy’s velocity — that is, how fast it’s moving towards or away from us, but not how fast it’s moving laterally. How do we know that Andromeda isn’t moving sideways at a high enough speed to miss us?

Well, it turns out not to be moving sideways very fast at all, so it is going to hit us. Good to know, I guess. I'm just glad to see that this question I've been wondering about for so long is a legitimate one: I'd seen so many mentions of the impending collision, with no reference at all to the lateral-velocity question, that I wondered whether I was missing something obvious.
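To get a feel for why the sideways motion is so much harder to measure than the radial motion, here's a rough back-of-the-envelope number (the distance and velocity below are round illustrative values): a transverse velocity has to be inferred from the galaxy's proper motion across the sky, and at Andromeda's distance even a large velocity corresponds to a minuscule angle per year.

```python
# A transverse velocity v_t (km/s) at distance d (parsecs) corresponds to a
# proper motion mu = v_t / (4.74 * d) in arcseconds per year.
d_pc = 780e3    # distance to M31, roughly 780 kpc
v_t = 100.0     # a hypothetical sideways velocity, km/s

mu = v_t / (4.74 * d_pc)                   # arcsec per year
print(mu * 1e6, "micro-arcsec per year")   # about 27 micro-arcsec per year
```

Tens of micro-arcseconds per year is an absurdly small angular rate, which is why the transverse velocity took so much longer to pin down than the Doppler velocity.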

Does NSF know how to design an experiment?

The journal Science has a news piece on an experimental program in which the National Science Foundation varied the way grant proposals are submitted and evaluated to see if there was any effect on the outcomes:

They invited the applicants to supplement their standard, 15-page project descriptions and itemized cost estimates with two-page synopses that left out the details of each proposal but underscored the central idea. All of the applicants agreed to participate.

The division assembled two peer-review panels. One rated the traditional full proposals, while the other evaluated the two-page versions, which omitted the names and affiliations of applicants.

The two panels came up with quite different ratings.

The Science piece goes on to speculate that making the applicants anonymous makes a big difference in how proposals are evaluated. They have to base this on anecdotal evidence, though: one person from a lesser-known institution got funded under the anonymized protocol but had previously been rejected under the onymized protocol. (“Onymized” isn’t a word, as far as I know, but it should be.)

They have to rely on anecdata rather than using the actual data from this experiment, for a reason that should be obvious to any middle-school science fair entrant: the experimental protocol changed two important things at once. There’s no way to tell from the results of this experiment whether the change in outcomes was due to the applicants’ anonymity or to the shortening of the proposals from 15 pages to 2.

Radically shortening proposals like this is a huge change. There’s no way for a 2-page proposal to contain a significant amount of information about the proposed research. I’d be astonished if you got similar results from reviews of the 15-page and 2-page proposals, even if you leave anonymity aside. But because of what appears to be a bizarrely flawed experimental design, I don’t know for sure, and neither does anyone else.

In fairness, Science does note that NSF plans to do another round of the study to try to separate out the two effects. But I’m baffled by the choice to put them together in the first place.

Another stupid comment from the Science piece:

Two divisions within NSF’s Directorate for Biological Sciences are already applying one insight gained from the Big Pitch: Shorter might be better.

Well I suppose it might be, but as far as I can tell this experiment provided precisely no evidence for this claim. It may or may not have shown that shorter is different: different proposals rose to the top in the two protocols. (I think it would have been astonishing if this had not been the case.) But as far as I can tell there’s no reason for thinking that either is better.


Electrostatics puzzle

Suppose you have two conducting spheres of different radii. Both have positive charge on them. If the spheres are far apart, of course, they repel each other in the usual Coulomb’s-Law way. You bring the two spheres closer and closer together. Does the force remain repulsive for arbitrarily small distances, or does it become attractive when the surfaces of the two spheres are sufficiently close? Does the answer depend on the values of the charges and radii?

(The idea, which should be familiar to undergraduate physics students, is that the charges move around on the surfaces of the conductors. When they’re close to each other, the positive charge on one sphere will repel the positive charge on the other, leaving negative charge nearby and potentially leading to an attractive force.)

The general solution seems to be quite difficult, but according to a Nature News article it was recently solved. The answer is that for almost all values of charges and radii, the spheres do attract each other at sufficiently close distances.

Even though the general case is quite ugly, you can turn this question into a nice puzzle, accessible to undergraduate physics students:

Show that there are some choices of charges and radii such that, at sufficiently close distances, the two spheres attract each other.

A hint, in case you need it: Consider extreme cases. That’s good advice for thinking about lots of physics problems, by the way — one of the reasons I think this is a cute puzzle to give students.
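If you'd rather peek at how one extreme case works out (this is just the textbook image-charge construction, not anything from the paper described in the Nature News piece): shrink one sphere down to a point charge q, and let the other be an isolated conducting sphere of radius R carrying total charge Q. With the point charge a distance d > R from the sphere's center, the image-charge method gives a force

$$ F = \frac{q}{4\pi\epsilon_0}\left[\frac{Q + qR/d}{d^2} - \frac{qRd}{(d^2 - R^2)^2}\right]. $$

The first term is the repulsion from Q plus the compensating charge that sits at the center in the image construction; the second term is the attraction toward the induced image charge near the surface, and it blows up as d approaches R. So in this extreme case the force always turns attractive at sufficiently small separations, no matter how large Q is.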


Stop digging

Andrew Sullivan deserves a lot of credit for posting dissenting views from readers on his blog. But sometimes he’d be better off not bothering. After posting a silly quote about complexity and the second law of thermodynamics, he posted a dissent from a reader that, remarkably, manages to be even sillier.

Among other things, the dissenter says that "more complex systems are more efficient in their use of energy." It's not even clear that that clause means anything: "more efficient" at achieving what goal? In what sense is a cow "more efficient" than a bacterium?

There is also no sense in which “increasing complexity is built into the fabric of the universe.” Some patches of the Universe (e.g., our neighborhood) are quite complex, but the overwhelming majority is simple and boring.

Finally, the dissenter quotes Einstein as saying “The most important decision we make is whether we believe we live in a friendly or hostile universe.” To an excellent approximation, all quotations attributed to Einstein are apocryphal. If the quote sounds spiritual and woo-woo, while not actually meaning anything, the probability rises to virtually 100%. In this particular case, the earliest printed source seems to be The Complete Idiot’s Guide to Spiritual Healing (2000). Enough said.

Want proof that data illiteracy is a problem? Talk to a member of Congress

If I could change one thing about our education system, I’d emphasize at all grade levels a set of skills you might call “data literacy” — a combination of things like numeracy, basic statistics, and generally the ability to think about quantitative data coherently.

Until we get this, I’ll just have to dream of the day when it would be surprising to hear a member of Congress say

In the end this is not a scientific survey. It’s a random survey.

Kudos to the reporter for immediately following that sentence with this one-sentence paragraph:

In fact, the randomness of the survey is precisely what makes the survey scientific, statistical experts say.


Less physics for pre-meds?

The Association of American Medical Colleges has approved changes to the MCAT (the exam required for admission to US medical schools). Maybe I just haven't noticed, but as far as I can tell the physics community isn't paying much attention to this. I think we should be, because some standard topics in the introductory physics sequence seem to have moved off the list of topics covered in the exam.

Compare this document describing topics covered on the current exam with this one describing the new one, which is to start in 2015. There are a number of topics on the old list but not the new list, the biggest ones (in my opinion) being magnetic fields and momentum.

I’m not going to comment on whether the loss of these topics will have a deleterious effect on future generations of physicians. I am interested in what effect it will have on the curriculum at universities such as mine. There are some topics on the MCAT list that we don’t cover in great depth in our first-year physics course because we don’t have time (e.g., sound, fluids, geometric optics). Many of the students coming through our introductory course are pre-med students. Will we be expected to dump magnetism and conservation of momentum to make room for these?


Nature really hates psychology

That’s Nature the premier science journal, not the nature that abhors a vacuum. Check out this news feature on problems in the field.

These problems occur throughout the sciences, but psychology has a number of deeply entrenched cultural norms that exacerbate them. It has become common practice, for example, to tweak experimental designs in ways that practically guarantee positive results. And once positive results are published, few researchers replicate the experiment exactly, instead carrying out ‘conceptual replications’ that test similar hypotheses using different methods. This practice, say critics, builds a house of cards on potentially shaky foundations.

I’m not a psychologist, so I can’t be sure how much merit there is in the article’s indictments. But don’t worry — I won’t let that stop me from commenting!

The article seems to me to make three main claims:

  1. “Publication bias” is a serious problem — results that show something positive and surprising are much more likely to be published.
  2. Psychologists often use dishonest statistical methods (perhaps unintentionally).
  3. People don’t replicate previous results exactly.

Let’s take each in turn.

1. Publication bias.

There’s no doubt that experiments that show a positive result are more highly valued than experiments that don’t show one. It’s much better for your career to find something than to find nothing. That’s bound to lead to a certain amount of “publication bias”: the published literature will contain many more false positives than you’d naively expect, because the false positives are more likely to get published.

This problem is exacerbated by the fact that people often use the ridiculously low threshold of 95% confidence to decide whether a result is statistically significant. This means that, even when there is no real effect, about one in every 20 tests will come up positive purely by chance. There are lots of people doing lots of tests, so of course there are lots of false positives. That's why, according to John Ioannidis, "most published [medical] research findings are false."
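To see that arithmetic in action, here's a toy simulation (entirely made-up data, nothing to do with any real study): run a large number of "experiments" in which there is genuinely no effect, test each one at p < 0.05, and count how many come out "significant."

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_per_group = 10_000, 30
false_positives = 0

for _ in range(n_experiments):
    # Both groups come from the same distribution, so the null is true.
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(0.0, 1.0, n_per_group)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(false_positives / n_experiments)   # close to 0.05, i.e. 1 in 20
```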

The Nature piece claims that these problems are worse in psychology than in other fields. I don’t know if that’s true or not. Some evidence from the article:

Psychology and psychiatry, according to other work by Fanelli, are the worst offenders: they are five times more likely to report a positive result than are the space sciences, which are at the other end of the spectrum. The situation is not improving. In 1959, statistician Theodore Sterling found that 97% of the studies in four major psychology journals had reported statistically significant positive results. When he repeated the analysis in 1995, nothing had changed.

One reason for the excess in positive results for psychology is an emphasis on “slightly freak-show-ish” results, says Chris Chambers, an experimental psychologist at Cardiff University, UK. “High-impact journals often regard psychology as a sort of parlour-trick area,” he says. Results need to be exciting, eye-catching, even implausible. Simmons says that the blame lies partly in the review process. “When we review papers, we’re often making authors prove that their findings are novel or interesting,” he says. “We’re not often making them prove that their findings are true.”

Incidentally, the article has a graphic illustrating the fraction of times that papers in different disciplines report positive results, but it doesn't seem to support the claim that psychology is five times worse than space science:


I estimate psychology is at about 93% and space science is at about 70%. That’s not a factor of 5.

2. Dishonest statistics.

Nature doesn’t use a loaded word like “dishonest,” but here’s what they claim:

Many psychologists make on-the-fly decisions about key aspects of their studies, including how many volunteers to recruit, which variables to measure and how to analyse the results. These choices could be innocently made, but they give researchers the freedom to torture experiments and data until they produce positive results.

In a survey of more than 2,000 psychologists, Leslie John, a consumer psychologist from Harvard Business School in Boston, Massachusetts, showed that more than 50% had waited to decide whether to collect more data until they had checked the significance of their results, thereby allowing them to hold out until positive results materialize. More than 40% had selectively reported studies that “worked”. On average, most respondents felt that these practices were defensible. “Many people continue to use these approaches because that is how they were taught,” says Brent Roberts, a psychologist at the University of Illinois at Urbana–Champaign.

If you deliberately choose to report data that lead to positive results and not data that lead to negative results, then you’re just plain lying about your data. The same goes for continuing to gather data, only stopping when “positive results materialize.”
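Peeking at the data and stopping as soon as the result clears the significance bar is especially insidious, because it inflates the false-positive rate well past the nominal 5%. A toy simulation (again, made-up data with no real effect) shows how much:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments, batch, max_n = 2_000, 10, 200
hits = 0

for _ in range(n_experiments):
    a = np.empty(0)
    b = np.empty(0)
    while a.size < max_n:
        # Collect another batch of data for each group (still no real effect).
        a = np.append(a, rng.normal(0.0, 1.0, batch))
        b = np.append(b, rng.normal(0.0, 1.0, batch))
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:    # stop as soon as "positive results materialize"
            hits += 1
            break

print(hits / n_experiments)   # much larger than 0.05
```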

Once again, I am not claiming that psychologists do this. I am claiming that if they do this, as Nature claims, it’s a very serious problem.

3. People don’t replicate previous studies.

The Nature article tells the story of the people who tried to replicate a previous study claiming to show precognition. Journals wouldn’t accept the paper detailing the failed replication. I agree with the author that this is a shame.

But the article’s concerns on this seem quite overwrought to me. They point out that most of the time people don’t try to replicate studies exactly, instead performing “conceptual replications,” in which they do something similar but not identical to what’s been done before. The author seems to think that this is a problem, but I don’t really see why. Here’s his argument:

But to other psychologists, reliance on conceptual replication is problematic. “You can’t replicate a concept,” says Chambers. “It’s so subjective. It’s anybody’s guess as to how similar something needs to be to count as a conceptual replication.” The practice also produces a “logical double-standard”, he says. For example, if a heavy clipboard unconsciously influences people’s judgements, that could be taken to conceptually replicate the slow-walking effect. But if the weight of the clipboard had no influence, no one would argue that priming had been conceptually falsified. With its ability to verify but not falsify, conceptual replication allows weak results to support one another. “It is the scientific embodiment of confirmation bias,” says Brian Nosek, a social psychologist from the University of Virginia in Charlottesville. “Psychology would suffer if it wasn’t practised but it doesn’t replace direct replication. To show that ‘A’ is true, you don’t do ‘B’. You do ‘A’ again.”

It’s true that a “conceptual replication” doesn’t directly falsify a particular result in the same way that a (failed) exact replication would. But I can’t bring myself to care that much. If a result is incorrect, it will gradually become clear that it doesn’t fit in with the pattern built up by many subsequent similar-but-not-identical experiments. The incorrect result will gradually atrophy from lack of attention, even if there’s not a single definitive refutation.

At least, that’s the way I see things working in physics, where direct replications of previous experiments are extremely uncommon.