I for one welcome our new alien overlords

Stephen Hawking says that we shouldn’t try to contact aliens, lest they come and attack us for our resources:

Hawking believes that contact with such a species could be devastating for humanity.

He suggests that aliens might simply raid Earth for its resources and then move on: "We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet. I imagine they might exist in massive ships, having used up all the resources from their home planet. Such advanced aliens would perhaps become nomads, looking to conquer and colonise whatever planets they can reach."

He concludes that trying to make contact with alien races is "a little too risky". He said: "If aliens ever visit us, I think the outcome would be much as when Christopher Columbus first landed in America, which didn't turn out very well for the Native Americans."

I can’t get too worried about this.  It seems to me that any alien civilization with the technology to get here and attack us would also have the technology to search telescopically for planets with useful resources.  We’ll probably be able to do a decent job on that ourselves within the next decade or two.  To be specific, we’ll be able to do spectroscopy on the atmospheres of lots of planets, which would give us a good idea of which ones to go to and mine —  if only we could get there.

For anyone who wants to find us, get to us, and exploit us, finding will be by far the easiest step, so this doesn’t strike me as a good argument for hiding.

Of course, there may be other reasons for not broadcasting our presence to aliens, the most obvious being that it’s a poor use of resources.  It all comes down to a cost-benefit analysis: Hawking doesn’t want to do it because of the potential cost (alien attack); I’m more concerned about the (overwhelmingly likely) lack of benefit.

v is not equal to dx/dt

In a discussion of David Hogg’s and my quixotic quest to convince people that it’s OK to think of the redshifts of distant galaxies as being due to the galaxies’ motion (that is, as a Doppler shift), Phillip Helbig writes

I think we all must agree on the following statement: Using the relativistic Doppler formula to calculate the velocity of an object at high redshift does not yield a meaningful answer in that the velocity so derived is not the temporal derivative of ANY distance used for other purposes in cosmology.

I replied to him in the comments, but I think that this point needs a longer response and might be of more general interest.

I agree with the beginning and end of Phillip’s statement, but not the middle.  To be precise, I agree that the velocity derived from the Doppler formula is not the derivative of a distance, but I don’t agree that that means it’s not a meaningful velocity.

That’s right: I’m saying a velocity is not necessarily the rate of change of a distance.  That sounds crazy: isn’t that the definition of velocity?

Well, sometimes.  But there are other times in astrophysics when a Doppler shift is measured, and nobody objects to calling the resulting quantity a velocity, even though that quantity is not the rate of change of a distance (or more generally of a position).  The clearest example I know of is a binary  star.

Here’s a cartoon spacetime diagram of an observation of a binary star.

[Figure: binary star spacetime diagram]

Time increases upward on this diagram.  The blue curve represents the Earth.  The curve wobbles back and forth as the Earth orbits the Sun.  The red curve represents a star, which is orbiting another star (not shown).  The dashed curve shows the path of a photon going from the star to the observer.

This is a situation that occurs all the time in astronomy.  The observer sees the photon (many photons, actually), measures a redshift, and calls the result the velocity of the star relative to us.

Now riddle me this: What is the position function x(t) such that this velocity is dx/dt?  For that matter, at what t should this derivative be evaluated?

There is no good answer to this question.  The velocity in question is not equal to the time derivative of a position, in any useful sense.  The main reason is that the velocity in question is a relative velocity, relating motion at two different times.

If you insist on describing the measured velocity of the star as a dx/dt, here’s the best way I can think of to do it.  Define an inertial reference frame in which the Earth is at rest at the moment of observation.  Then the measured velocity is dx/dt, where (x,t) are the coordinates of the star in this frame, and the derivative is evaluated at the time of emission.  But this doesn’t meet Phillip’s criterion: the quantity x in this expression is not a “distance used for any other purpose.”  It’s certainly not in any sense the distance from the Earth to the star, for instance: at the time the derivative is evaluated, the Earth was nowhere near the right location for this to be true.

The velocity of the Earth, in some chosen reference frame, is a dx/dt, and the velocity of the star is also a dx/dt.  (Each of these two is represented by an arrow in the picture above.)  But the relative velocity of the two isn’t.  If you’re unwilling to call this quantity a velocity, then I guess you should be unwilling to call the quantity derived from a cosmological redshift a velocity.  But this seems to me a bit of a Humpty Dumpty way to talk.

More on the cosmological redshift

George Musser sent me and David Hogg an email with some questions about the paper Hogg and I wrote about the interpretation of the redshift (which I’ve written about before).  The discussion may help to clarify a bit what Hogg and I are and are not claiming, so here it is (with Musser and Hogg’s permission, of course).

Musser’s original question:

I’m still absorbing your paper from a couple of years ago on the cosmological redshift, being one of those people who has made the distinction with Doppler shift and, more generally, between “expansion of space” and “motion through space”.

If these are equivalent and, in fact, the latter is preferred, then should I think of the big bang as spraying out galaxies through space like a conventional explosion — i.e. the very picture cosmologists have been telling us is wrong all these years? If the rubber-sheet model of space is so problematic, then what picture should I keep in my head?

Also, if the photon only ever sees locally flat spacetime, is that why the cosmological redshift does not entail a loss of energy?

Hogg’s reply:

The only cosmologist saying that the “explosion” picture is wrong is Harrison (who himself is very wrong), although others think it is uncomfortable, like Sean Carroll (who is not wrong). Empirically, there is no difference; what is definitely wrong is the idea that the space is “rubber” or has dynamics of its own. There is no absolute space–the investigator has coordinate freedom, and the empty space has no dynamics, so this rubber sheet picture is very misleading. And no, the photon does not “lose energy” in any sense. It just has different energies for different observers, and we are all different observers on different galaxies.


Musser’s follow-up:

That is helpful, but I am still confused. An explosion goes off at a certain position in space and matter shoots outward in every direction. Is that really a valid picture of the big bang? What do I make of the presence of horizons?

And finally me:

It’s still true that there is no spatial center to the expansion. That is, there is no point in space that is “really” at rest with everything moving away from it. Space is homogeneous, which means that whatever point you pick looks as much like the center as any other point.

One thing that can be said for the expanding-rubber-sheet picture: at least in its form as an expanding balloon, it conveys this idea of homogeneity tolerably well. (Well, except that it’s hard for people to remember that only the surface of the balloon counts as “space” in this metaphor. People always want to think of the center of the balloon’s volume as “where” the Big Bang happened.)

So I’d rather you not think of the Big Bang as an explosion “at a certain position in space”. It’s still true that it happened everywhere rather than somewhere. There’s no preexisting space into which stuff expands. For instance, if we imagine a closed Universe (i.e., one that has a finite  volume today), its volume was smaller in the past, approaching zero as you get closer to the Big Bang. So in that sense space really is expanding.

[A bit of fine print: All of the above is true as applied to the standard model of the Universe, in which homogeneity is assumed. Whether it’s true of our actual Universe is of course an empirical question. The answer is yes, as far as we can tell so far. But there’s no way to tell — and there probably never will be any way to tell — what space is like outside of our horizon. But anyway, this point is independent of the question of interpretation that we’re discussing at the moment. So it’s safe to ignore this point for the present discussion.]

As Hogg says, the main thing we object to is the idea that the rubber sheet has its own dynamics and interacts with the stuff in the Universe — that is, that the stretching of the rubber sheet tends to pull things apart, or that it “stretches” the wavelengths of light. As far as I’m concerned, the main reason for objecting to this language is not because it gives the wrong idea about cosmology, but because it gives the wrong idea about relativity. The most important point about relativity is that space doesn’t have any such powers and abilities. If you’re a small particle whizzing through space, at every moment space looks to you just like ordinary, gravity-free, non-expanding space.

So if you’re going to abandon the heresy of the rubber sheet, what should you replace it with? I don’t have anything as catchy as the rubber sheet, unfortunately. What I visualize when I visualize the expanding Universe is just a bunch of small neighborhoods, each one of which is completely ordinary gravity-free space, but each of which is moving away from its neighbors.

In this picture, the redshift is easy to understand. If a guy in one neighborhood tosses a ball to his neighbor, the speed of the ball as measured by the catcher will be less than the speed as measured by the thrower. That is, the two measure different energies for the ball, not because there’s some phenomenon taking energy away, but just because they’re in different reference frames. If the catcher then turns around and throws again to his neighbor, the same thing happens again, and so on. That’s all the redshift is. It’s not some mysterious “stretching.”
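The ball-tossing picture can be checked numerically in the simplest possible setting. The following is a minimal sketch, not something taken from the paper: it assumes flat spacetime, a chain of observers all moving along one line, and a photon rather than a ball. The function names and the choice of 50 neighbors, each receding from the previous one at 1% of the speed of light, are made up for illustration. The code multiplies the small Doppler factors measured by successive neighbors and compares the product with a single Doppler shift at the relativistically composed velocity.

```python
import math

def doppler_factor(beta):
    """Observed/emitted frequency ratio for an observer receding at speed beta = v/c."""
    return math.sqrt((1.0 - beta) / (1.0 + beta))

def add_velocities(beta1, beta2):
    """Relativistic composition of two collinear velocities (in units of c)."""
    return (beta1 + beta2) / (1.0 + beta1 * beta2)

def chained_shift(betas):
    """Pass a photon along a chain of neighbors.

    betas[i] is the speed of neighbor i+1 relative to neighbor i.
    Returns the accumulated frequency ratio and the velocity of the
    last neighbor relative to the first.
    """
    factor = 1.0
    total_beta = 0.0
    for beta in betas:
        factor *= doppler_factor(beta)            # each hand-off lowers the frequency a little
        total_beta = add_velocities(total_beta, beta)
    return factor, total_beta

# Hypothetical numbers: 50 neighbors, each receding at 1% of c from the previous one.
factor, total_beta = chained_shift([0.01] * 50)
redshift = 1.0 / factor - 1.0

print("accumulated frequency ratio:     ", factor)
print("single Doppler shift at", round(total_beta, 4), "c:", doppler_factor(total_beta))
print("redshift z:", redshift)
```

Under these assumptions the two frequency ratios agree to rounding error, which is the sense in which a long chain of small changes of reference frame adds up to one Doppler shift.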

Good or bad Bayes?

My brother Andy pointed me to this discussion on Tamino’s Open Mind blog of Bayesian vs. frequentist statistical methods.  It’s focused on a nice, clear-cut statistics problem from a textbook by David MacKay, which can be viewed in either a frequentist or Bayesian way:

We are trying to reduce the incidence of an unpleasant disease called microsoftus. Two vaccinations, A and B, are tested on a group of volunteers. Vaccination B is a control treatment, a placebo treatment with no active ingredients. Of the 40 subjects, 30 are randomly assigned to have treatment A and the other 10 are given the control treatment B. We observe the subjects for one year after their vaccinations. Of the 30 in group A, one contracts microsoftus. Of the 10 in group B, three contract microsoftus. Is treatment A better than treatment B?

Tamino reproduces MacKay’s analysis and then proceeds to criticize it in strong terms.  Tamino’s summary:

Let \theta_A be the probability of getting "microsoftus" with treatment A, while \theta_B is the probability with treatment B. He adopts a uniform prior, that all possible values of \theta_A and \theta_B are equally likely (a standard choice and a good one). "Possible" means between 0 and 1, as all probabilities must be.

He then uses the observed data to compute posterior probability distributions for \theta_A and \theta_B. This makes it possible to compute the probability that \theta_A < \theta_B (i.e., that you’re less likely to get the disease with treatment A than with B). He concludes that the probability is 0.990, so there’s a 99% chance that treatment A is superior to treatment B (the placebo).
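As a rough check on the figure quoted above, here is a minimal sketch of the calculation being described. It assumes independent uniform priors on \theta_A and \theta_B and binomial likelihoods, so that the posteriors are Beta distributions, and it evaluates the probability exactly with rational arithmetic. The function names are invented for this example, and this is not necessarily how MacKay or Tamino evaluated the integral.

```python
from fractions import Fraction
from math import comb, factorial

def beta_fn(a, b):
    """Beta function B(a, b) for positive integers a and b, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def prob_a_less_than_b(k_a, n_a, k_b, n_b):
    """P(theta_A < theta_B | data), assuming uniform priors on both probabilities.

    k_a of n_a subjects got the disease under treatment A, k_b of n_b under B.
    The posteriors are then Beta(k + 1, n - k + 1).
    """
    a1, b1 = k_a + 1, n_a - k_a + 1   # posterior parameters for theta_A
    a2, b2 = k_b + 1, n_b - k_b + 1   # posterior parameters for theta_B
    m = a1 + b1 - 1
    # For integer parameters, P(theta_A <= y) = sum_{j=a1}^{m} C(m, j) y^j (1-y)^(m-j).
    # Averaging each term over theta_B ~ Beta(a2, b2) gives a ratio of Beta functions.
    total = sum(comb(m, j) * beta_fn(a2 + j, b2 + m - j) for j in range(a1, m + 1))
    return total / beta_fn(a2, b2)

p = prob_a_less_than_b(k_a=1, n_a=30, k_b=3, n_b=10)
print("P(theta_A < theta_B | data) =", float(p))
```

With the numbers from the problem, this exact evaluation comes out close to 0.99, in line with the quoted result.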

Tamino has a number of objections to this analysis, which I think I agree with, although I’d express things a bit differently.  To me, the problem with the above analysis is precisely the part that Tamino says is “a standard choice and a good one”: the choice of prior.

MacKay’s choice of prior expresses the idea that, before looking at the data, we thought that all possible pairs of probabilities (\theta_A, \theta_B) were equally likely.  That prior is very unlikely to be an accurate reflection of our actual prior state of belief regarding the drug.  Before you looked at the data, you probably thought there was a non-negligible chance that the drug had no significant effect at all, that is, that the two probabilities were exactly (or almost exactly) equal.  So in fact your prior probability was surely not a constant function on the (\theta_A, \theta_B) plane: it had a big ridge running down the line \theta_A = \theta_B.  An analysis that assumes a prior without such a ridge is an analysis that assumes from the beginning that the drug has a significant effect with overwhelming probability.  So the fact that he concludes the drug has an effect with high probability is not at all surprising: it was encoded in his prior from the beginning!

The nicest way to analyze a situation like this from a Bayesian point of view is to compare two different models: one where the drug has no effect and one where it has some effect.    MacKay analyzes the second one.  Tamino goes on to analyze both cases and compare them.   He concludes that the probability of getting the observed data is 0.00096 under the null model (drug has no effect) and 0.00293 under the alternative model (drug has an effect).
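The two probabilities can be reproduced with a short calculation. This is a sketch under assumptions that the text does not spell out: a uniform prior on the single shared probability in the null model, independent uniform priors on \theta_A and \theta_B in the alternative model, and binomial likelihoods with the binomial coefficients included. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def beta_integral(a, b):
    """Exact integral of t**a * (1 - t)**b over [0, 1] for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def binomial_evidence(k, n):
    """P(k cases among n subjects), averaged over a uniform prior on the probability."""
    return comb(n, k) * beta_integral(k, n - k)

N_A, K_A = 30, 1   # treatment A: 1 case among 30 subjects
N_B, K_B = 10, 3   # treatment B (placebo): 3 cases among 10 subjects

# Alternative model: the two groups have independent probabilities.
evidence_alt = binomial_evidence(K_A, N_A) * binomial_evidence(K_B, N_B)

# Null model: both groups share one probability, so the likelihoods multiply
# before the average over the prior is taken.
evidence_null = (comb(N_A, K_A) * comb(N_B, K_B)
                 * beta_integral(K_A + K_B, (N_A - K_A) + (N_B - K_B)))

evidence_ratio = evidence_alt / evidence_null

print("null model:       ", float(evidence_null))
print("alternative model:", float(evidence_alt))
print("evidence ratio:   ", float(evidence_ratio))
```

Under these assumptions the results are about 0.00096 and 0.00293, matching the values quoted above, and their ratio is a little over 3.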

How do you interpret these results?  The ratio of these two probabilities is about 3.  This ratio is sometimes called the Bayesian evidence ratio,  and it tells you how to modify your prior probability for the two models.  To be specific,

Posterior probability ratio = Prior probability ratio x evidence ratio.

For instance, suppose that before looking at the data you thought that there was a 1 in 10 chance that the drug would have an effect.  Then the prior probability ratio was (1/10) / (9/10), or 1/9.  After you look at the data, you “update” your prior probability ratio to get a posterior probability ratio of 1/9 x 3, or 1/3.  So after looking at the data, you now think there’s a 1/4 chance that the drug has an effect and a 3/4 chance that it doesn’t.
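The same update can be written as a few lines of code. This is only an illustration of the arithmetic in the example above; the function name is invented, and the evidence ratio is rounded to 3 as in the text.

```python
from fractions import Fraction

def update_probability(prior_prob, evidence_ratio):
    """Turn a prior probability into a posterior probability via the odds form of Bayes' rule."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * evidence_ratio
    return posterior_odds / (1 + posterior_odds)

prior = Fraction(1, 10)                 # 1 in 10 chance that the drug has an effect
posterior = update_probability(prior, 3)
print("posterior probability of an effect:", posterior)   # prints 1/4
```

Trying other values of the prior shows how strongly the conclusion depends on it: a skeptic and an optimist both multiply their odds by the same factor, but end up in different places.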

Of course, if you had a different prior probability, then you’d have a different posterior probability.  The data can’t tell you what to believe; it can just tell you how to update your previous beliefs.

As Tamino says,

Perhaps the best we can say is that the data enhance the likelihood that the treatment is effective, increasing the odds ratio by about a factor of 3. But, the odds ratio after this increase depends on the odds ratio before the increase, which is exactly the prior we don't really have much information on!

People often make statements like this as if they’re pointing out a flaw in the Bayesian analysis, but this isn’t a bug in the Bayesian analysis; it’s a feature!  You shouldn’t expect the data to tell you the posterior probabilities in a way that’s independent of the prior probabilities.  That’s too much to ask.  Your final state of belief will be determined by both the data and your prior belief, and that’s the way it should be.

Incidentally, my research group’s most recent paper has to do with a problem very much like this situation:  we’re considering whether a particular data set favors a simple model, with no free parameters, or a more complicated one.  We compute Bayesian evidence ratios just like this, in order to tell you how you should update your probabilities for the two hypotheses as a result of the data.  But we can’t tell you which theory to believe — just how much your belief in one should go up or down as a result of the data.