GPA puzzles

A colleague pointed me to an article by Valen Johnson called An alternative to traditional GPA for evaluating student performance, because the article takes a Bayesian approach, and he knew I liked that sort of thing.

Johnson addresses the problem that a student’s grade point average (GPA), the standard measure of academic quality in US educational institutions, doesn’t necessarily give fair or useful results. Some instructors, and even some entire disciplines, grade higher on average than others, so some students are unfairly penalized or rewarded in their GPAs based on what they choose to study.

To illustrate the problem, Johnson uses an example taken from an earlier paper by Larkey and Caulkins. I’d never seen this before, and I thought it was cute, so I’m passing it on.

Imagine that four students take nine courses, receiving the following grades:

In this scenario, every individual course indicates that the ranking of the students is I, II, III, IV (from best to worst). That is, in every course in which students I and II overlap, I beats II, and similarly for all other pairs. But the students’ GPAs put them in precisely the opposite order.

This is a made-up example, of course, but it illustrates the idea that in the presence of systematic differences in grading standards, you can get anomalous results.

This example tickles my love of math puzzles. If you’d asked me whether it was possible to construct a scenario like this, I think I would have said no.

There are obvious follow-up questions, for those who like this sort of thing. Could you get similar results with fewer courses? If you had a different number of students, how many courses would you need to get this outcome?

I know the answer for the case of two students. If you allow for courses with only one student in them, then it’s easy to get this sort of inversion: have the students get a C+ and a C respectively in one course, and then give student II an A in some other course. If you don’t allow one-student courses, then it’s impossible. But as soon as you go up to three students, I don’t think the answer is obvious at all.
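For concreteness, here is that two-student construction spelled out in a few lines of code. It’s just a sketch of the scenario described above, using the usual 4.0 scale (with C+ counted as 2.3; the exact value doesn’t matter):

```python
# Two students on the standard 4.0 scale: C = 2.0, C+ = 2.3, A = 4.0.
# Student I takes only the shared course; student II also takes a
# one-student course and gets an A in it.
grades = {
    "I": [2.3],        # C+ in the shared course
    "II": [2.0, 4.0],  # C in the shared course, A in the solo course
}

for student, marks in grades.items():
    gpa = sum(marks) / len(marks)
    print(f"Student {student}: GPA = {gpa:.2f}")

# Output:
#   Student I: GPA = 2.30
#   Student II: GPA = 3.00
# Student I beats student II in the only course they share, yet
# student II ends up with the higher GPA.
```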

As I said, I was mostly interested in the puzzle itself, but in case you’re curious, here are a few words about the problem Johnson is addressing. I don’t have much to say about it, because I haven’t studied the paper in enough detail.

Some people have proposed that a student’s transcript should include statistical information about the grade distribution in each of the student’s courses, so that anyone reading the transcript will have some idea of what the grade is worth. For what it’s worth, that strikes me as a sensible thing to do, although getting the details right may be tricky.

That only solves the problem if the person evaluating the student (prospective employer, graduate program, or the like) is going to take the time to look at the transcript in detail. Often, people just look at a summary statistic like GPA. Johnson proposes a way of calculating a quantity that could be considered an average measure of student achievement, taking into account the variation in instructors’ grading habits. Other people have done this before, of course. Johnson’s approach is different in that it’s justified by Bayesian probability calculations from a well-specified underlying model, as opposed to more-or-less ad hoc calculations.

I’m philosophically sympathetic to this approach, although some of the details of Johnson’s calculations seem a bit odd to me. I’d have to study it much more carefully than I intend to before I could say for sure what I think of it.

 

Vaccines are still good for you

People seem to have been talking about some new reports that claim (yet again) a connection between vaccines and autism. The latest versions go further, alleging a cover-up by the CDC. The most important thing to know about this is that the overwhelming scientific consensus remains that vaccines are not linked to autism. They do, on the other hand, prevent vast amounts of suffering due to preventable diseases. The anti-vaccine folks do enormous harm.

(Although I have a few other things to say, the main point of this piece is to link to an excellent post by Allen Downey. The link is below, but it’s mixed in with a bunch of other stuff, so I thought I’d highlight it up here.)

The usual pro-science people (e.g., Phil Plait) have jumped on this most recent story, stating correctly that the new report is bogus. They tend to link to two articles explaining why, but I’d rather steer you toward a piece by my old friend Allen Downey. Unlike the other articles, Allen’s piece explains one specific way in which the new study is wrong.

The error Allen describes is a common one. People often claim that a result is “statistically significant” if it has a “p-value” below 5%. This means that there is only a 5% chance of a false positive — that is, if there is no real effect, you’d be fooled into thinking there was an effect 5% of the time. Now suppose that you do 20 tests. The odds are very high in that case that at least one of them will be “significant” at the 5% level. People often draw attention to these positive results while sweeping under the rug the other tests that didn’t show anything. As far as I can tell, Allen’s got the goods on these guys, demonstrating convincingly that that’s what they did.
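To put a number on it: if the 20 tests are independent and there’s no real effect anywhere, the chance of getting at least one spurious “significant” result is about 64%. Here’s the quick sanity check (the 20-test figure is just the round number from the previous sentence, not a count taken from the study):

```python
# Chance of at least one false positive among n independent tests,
# each run at significance level alpha, when there is no real effect.
alpha = 0.05
n = 20
p_at_least_one = 1 - (1 - alpha) ** n
print(f"{p_at_least_one:.2f}")  # 0.64
```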

The other pieces I’ve read debunking the recent study have tended to focus on the people involved, pointing out (correctly, as far as I know) that they’ve made bogus arguments in the past, that they have no training in statistics or epidemiology, etc. Some people say that you shouldn’t pay any attention to considerations like that: all that matters is the content of the argument, and ad hominem considerations are irrelevant. That’s actually not true. Life is short. If you hear an argument from someone who’s always been wrong before, you might quite rationally decide that it’s not worth your time to figure out why it’s wrong. Combine that with a strong prior belief (tons of other evidence have shown no vaccine-autism link), and perfectly sound Bayesian reasoning (or as I like to call it, “reasoning”) tells you to discount the new claims. So before I saw Allen’s piece, I was pretty convinced that the new results were wrong.

But despite all that, it’s clearly much better if someone is willing to do the public service of figuring out why it’s wrong and explaining it clearly. This is pretty much the reason that I bothered to figure out in detail that evolution doesn’t violate the laws of thermodynamics: there was no doubt about the conclusion, but because the bogus argument continues to get raised, it’s good to be able to point people towards an explanation of exactly why it’s wrong.

So thanks, Allen!

 

 

Joggins Fossil Institute does the right thing

As I wrote a few days ago, I sent a note to the people who run Joggins Fossil Cliffs in Nova Scotia complaining that they distribute pseudoscientific crystal-healing nonsense along with some items for sale in their gift shop. I got a very prompt reply saying

Thank you for the feedback on your experience at the Joggins Fossil Cliffs.

Excellent to hear that you and your wife enjoyed your time here.

We have removed the documentation that you referenced.

Good for them!

As I mentioned before, this place is worth a visit if you’re in the area. There are cool fossils to see, and with this one exception (now apparently fixed), they did a very good job of explaining things.

 

 

 

Someone doesn’t understand probabilities

I know: as headlines go, this one is not exactly Man Bites Dog. Let me be a bit more specific. Either the New York Times or trial lawyers don’t understand probability. (This, incidentally, is a good example of the inclusive “or”.)

The Times has an interactive feature illustrating the process by which lawyers decide whether to allow someone to be seated on a jury. For those who don’t know, in most if not all US courts, lawyers are allowed to have potential jurors stricken from jury pools, either for cause, if there’s evidence that a juror is biased, or using a limited number of “peremptory challenges” to remove people that the lawyer merely suspects will be unfavorable to his or her side. The Times piece asks you a series of questions and indicates how your answers affect the lawyers’ opinion about you in a hypothetical lawsuit by an investor suing her money manager for mismanaging her investments.

The first two questions are about your job and age. As a white-collar worker, I’m told that I’d be more likely to side with the defendant, but the fact that I’m over 30 makes me more likely to favor the plaintiff. A slider at the top of the screen indicates the net effect:

So far so good. Question 3 then asks about my income. Here are the two possible outcomes:

So if I’m high-income, there’s no effect, but if I’m low-income, I’m more likely to side with the plaintiff. This is logically impossible. If one answer shifts the probability one direction, the other answer must shift it the other direction (by some nonzero amount).

Before the lawyers found out the answer, they knew that I was either low-income or high-income. (A waggish mathematician might observe that the possibility that my income is exactly $50,000 is not included in the two possibilities. This is why no one likes a waggish mathematician.) The lawyers’ assessment of me before asking the question must be a weighted average of the two subsequent possibilities, with weights given by their prior beliefs about what my income would turn out to be. For instance, if they thought initially that there was a 70% chance that I’d be in the high-income category, then the initial probability should have been 0.7 times the high-income probability plus 0.3 times the low-income probability.

That means that if one answer to the income question shifts the probability toward the plaintiff, then the other answer must shift the probability in the other direction.
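A toy calculation makes the constraint concrete. The 70/30 split is the one from the example above; the 50% starting probability is a made-up number, purely for illustration:

```python
# The lawyers' prior must be a weighted average of the two posteriors:
#   prior = P(high) * P(plaintiff | high) + P(low) * P(plaintiff | low)
# If the "high income" answer produces no shift at all, consistency forces
# the "low income" answer to produce no shift either.
p_high = 0.7     # prior belief that the juror is high-income (from above)
prior = 0.5      # illustrative prior probability of favoring the plaintiff
post_high = 0.5  # claimed: the high-income answer has no effect

# Solve the weighted-average equation for the low-income posterior.
post_low = (prior - p_high * post_high) / (1 - p_high)
print(f"{post_low:.2f}")  # 0.50 -- equal to the prior, so the low-income
                          # answer can't shift the probability either.
```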

So either the lawyers the reporter talked to are irrational or the reporter has misunderstood them. For what it’s worth, my money is on the first option. Lots of people don’t understand probabilities, but it seems likely to me that the Times reporters would have asked these questions straightforwardly and accurately reported the answers they heard from the lawyers they talked to.

If that’s true, it seems like it should present a money-making opportunity for people with expertise in probability. Lawyers who hired such people as consultants would presumably do a better job at jury selection and win more cases.

Curmudgeonliness

Update: Got a very nice and very prompt note back from the people who run the place. Apparently they’ve removed this material.

My wife and I just got back from a very nice vacation in Nova Scotia, which is very beautiful (and much cooler than Richmond in August). Among other things (such as rafting on the tidal bore in the Bay of Fundy, which I highly recommend), we visited the Joggins Fossil Cliffs, a UNESCO World Heritage Site where, as the name suggests, you can see tons of fossils. The site includes both a museum and a stretch of beach you can walk along and spot fossils in their natural habitat, so to speak. There are guides to show you things and help you figure out what you’re seeing. On the whole, it’s quite interesting and educational. If you’re nearby, it’s definitely worth a visit.

The site is run by a nonprofit educational organization. As usual, they get part of their revenue from a gift shop. Among the things you can buy in the gift shop are pretty polished stones.

So far so good. Now for the curmudgeonliness. The polished stones are accompanied by this pamphlet.

As I’m sure I don’t need to tell anyone who’s reading this, the last sentence of each description is complete nonsense. Stones and crystals do not have any effect on the human psyche.

I understand that the organization needs to raise money, but is it too much to ask that they refrain from actively promoting pseudoscience in doing so? The gift shop does not stock Creationist books that claim the Earth is 6000 years old, presumably because to do so would undermine their educational mission. This may be somewhat different in degree but not at all different in kind.

This sort of thing might seem harmless, but it’s not. People really believe in things like this. If they didn’t, there wouldn’t be Web sites like healingcrystals.com (I’d rather not link to it) that will sell you crystals to cure hundreds of different ailments. Look at this screen shot, for instance.

This is a link to 728 items you can buy that purport to help you if you have cancer but in fact do nothing. People with cancer (among other ailments) are being fleeced and given false hope by this sort of nonsense. For a science educator to give any sort of seal of approval to this is not OK.

As my colleague Matt Trawick pointed out, the last item on the list is particularly interesting, in a Catch-22 sort of way. Suppose that you buy some sodalite and it does in fact cause you “to become logical and rational.” Would you then go ask for your money back?

By the way, I’ve sent a note to the organization that runs the Fossil Cliffs outlining my concern. I’ll post something if I hear anything back. (See Update at the top.)

It’s not rocket science

You may have heard about these NASA engineers who claim to have demonstrated a reactionless drive mechanism — that is, something that can generate thrust without shooting anything out the back end. Such a device would violate one of the most well-established laws of physics, namely conservation of momentum. It would be an incredibly big deal if true.

Of course it’s not true, for all the usual reasons: Extraordinary claims require extraordinary evidence, never believe an experiment until it’s been confirmed by a theory, etc.

You can be confident that this result is wrong by using reasoning, or, as some people like to call it, Bayesian reasoning. To be specific, the new experimental result causes you to update your prior beliefs. Your prior belief was, or at least should have been, that there’s an incredibly high probability that momentum is conserved, particularly in situations like this one that are described by the best-tested theory in the history of science. When your prior is extremely strong (in this case because of centuries’ worth of experimental confirmation), even a very well-done experiment is not enough to dislodge it.
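Here’s a toy version of that update, in odds form, with completely made-up numbers just to show the scale of the problem:

```python
# Bayes' theorem in odds form: posterior odds = prior odds * Bayes factor.
# Suppose (illustratively) you give momentum nonconservation prior odds of
# one in a million, and suppose the experiment, taken at face value, favors
# the claim by a factor of 100 -- a generous reading of any single lab result.
prior_odds = 1e-6
bayes_factor = 100.0
posterior_odds = prior_odds * bayes_factor
print(posterior_odds)  # 0.0001: still ten thousand to one against the claim.
```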

Phil Plait’s post is a reasonable place to go for more details, although he’s much too kind at a couple of points:

I’m not saying it’s wrong, but I am saying it’s very, very likely to be some sort of measurement or experimental error.

This is bizarrely wishy-washy. I, for one, am saying it’s wrong.

Plait also says

The only other way this device could possibly work is if it’s interacting with “virtual particles”, an interesting idea, but a highly speculative one.

Again, this is far too kind. To say that this works by “interacting with virtual particles” means precisely as much as saying that it works by interacting with invisible blue fairies. “Virtual particles,” as the term has been used in physics for nearly a century, would definitely not produce an effect like this. If the authors mean anything by this claim at all, then they are using that term in a way that bears no relation to its usual meaning, but of course there’s no indication at all of what they do think it means. They should just call them invisible blue fairies instead, to avoid confusion.

Despite my complaints, Plait does sound an appropriate, if understated, note of skepticism. He also links to a couple of posts by my old friend John Baez, which treat the subject with an appropriate level of scorn. No euphemisms like “highly speculative” for him:

 “Quantum vacuum virtual plasma” is something you’d say if you failed a course in quantum field theory and then smoked too much weed.

Despite being a mathematician, Baez digs deeper than most people into the experimental details, pointing out one astonishing fact that I haven’t seen mentioned elsewhere: the article describes in detail the workings of the vacuum chamber in which the experiment was performed, but the actual experiment was done “at ambient atmospheric pressure” (i.e., not in a vacuum). This is important because one obvious possible source of error is the production of air currents surrounding the device.

 

 

Another Mathematica bug

I got a reply to my last bug report, acknowledging that it’s a bug and saying they’d send it on to their engineers.

Here’s another:

The first one should only evaluate to zero if it’s zero for all a,b. The second indicates that that’s not the case. This one cost me several hours, as I searched elsewhere for why my results didn’t make sense. (The offending bit was originally in the middle of a larger expression; otherwise, I would have figured it out more quickly.)

Update: The Mathematica people have acknowledged this bug and passed the word on to their developers, presumably to be fixed in the next version.

Update 2: Two years later, I got an email saying that the bug had been fixed. I can confirm that, as of Mathematica version 10.4, the problem is gone. It was still there in 10.0. I didn’t check the versions in between. The previous bug involving the spherical Bessel function is still there.

Many worlds

I just wanted to link to Sean Carroll’s post defending the many-worlds interpretation of quantum mechanics. Sean has a habit of getting this sort of thing right.

He explains that the multiple worlds are not an add-on to the theory but instead are simply what happens naturally when you take the equations of the theory at face value. The standard (“Copenhagen”) interpretation is the one that needs to postulate an ad hoc extra rule. We should simply rename things:

  • The “many-worlds interpretation of quantum mechanics” should henceforth be known as “quantum mechanics.”
  • The Copenhagen interpretation should henceforth be known as “the disappearing-worlds interpretation of quantum mechanics.”

The system works

Peer review, that is.

Remember BICEP2?

They announced a detection of B-mode microwave background polarization, which would be strong evidence that inflation happened in the early universe, but some people expressed doubt about whether they’d adequately eliminated the possibility that what they were seeing was contamination due to more nearby sources of radiation, particularly Galactic dust. (Some other very eminent people then said silly things.)

All of this occurred before the paper describing the results had undergone peer review. The paper has now been accepted for publication in the prestigious journal Physical Review Letters, with significant changes. As of now, the arXiv still has the original version, so it’s easy to compare it with the published version.

The authors have removed all discussion of one of the dust models that they used to argue against the possibility of dust contamination. This is the notorious “DDM2” model, which was based in part on data gleaned from a slide shown in a talk. A footnote explains the removal of this model, saying in part that “we have concluded the information used for the DDM2 model has unquantifiable uncertainty.”

Although the concerns about the DDM2 model got the most attention, people raised a number of other objections to the preprint’s discussion of dust contamination. Presumably the referees agreed, because the published paper is much more cautious in its claims. For instance, take a look at the end of the preprint,

The long search for tensor B-modes is apparently over, and a new era of B-mode cosmology has begun.

and compare it with the published version,

We have pushed into a new regime of sensitivity, and the high-confidence detection of B-mode polarization at degree angular scales brings us to an exciting juncture. If the origin is in tensors, as favored by the evidence presented above, it heralds a new era of B-mode cosmology. However, if these B modes represent evidence of a high-dust foreground, it reveals the scale of the challenges that lie ahead.

This is a case in which peer review clearly improved the quality of the paper: the second version is much more accurate than the first.

Other than the removal of the DDM2 model, I don’t think that the actual results have changed; the difference is all in the description of their significance. This is exactly as it should be. Even those of us who harbor doubts about the interpretation generally agree that this is a very important and exciting data set. The researchers deserve high praise for performing an experimental tour de force.

Some people say that the BICEP team shouldn’t have released their results at all until after peer review. I think that this objection is wrongheaded. There’s no way of knowing, but I bet that the official referees were able to give a much better critique of the paper because lots of other experts had been examining and commenting on it.

One argument is that we shouldn’t publicize unreviewed results because this sort of thing makes us look bad. The media gave a lot of coverage to the original result and are naturally covering subsequent events as a reversal, which is perhaps embarrassing. In this particular case, of course I wish that the earlier reports had emphasized the doubts (which started to appear right away), but in general I can’t get too upset about this problem. I think it’s much better if the media covers science as it actually is — people get exciting results, and then the rest of the community chews them over before deciding what to think about them — instead of a sanitized version. It seems clear to me that the advantages of an open discussion in figuring out the truth far outweigh the disadvantages in (arguably) bad publicity.

There’s one strange thing about the BICEP2 paper. It appeared in Physical Review Letters, which traditionally is limited to very short papers. The limit used to be four pages. It’s now expressed in word count, but it comes to about the same thing. The published paper is at least five times longer than this limit. I don’t know if this has ever happened before.

Here’s another piece of the puzzle. The preprint doesn’t say which journal it was submitted to but is formatted in a style that doesn’t match PRL at all. In particular, the citations are in author-year format, whereas PRL uses numbered citations.

It’s no big deal, but I’m mildly curious about the explanation for these facts.

Strange Mathematica behavior

So far so good. This is the correct result. Now multiply the x by 1.0:

 

That makes Mathematica think that the integral fails to converge.

I found this out when reproducing some calculations from a couple of years ago. It seemed to work back then, so this behavior seems to have been introduced in a recent version of Mathematica.

I know of various reasons why putting in the 1.0 could make a difference (because it forces Mathematica to think in terms of floating-point numbers with finite accuracy, rather than exact integers), but I don’t think any of them should make a difference here. The integral is failing to converge at x=0 (I checked that the problem is there, not at infinity), and the integrand is perfectly well-behaved there, even if you replace the 1.0 by any other complex number.