{"id":211,"date":"2010-04-03T19:44:23","date_gmt":"2010-04-04T00:44:23","guid":{"rendered":"http:\/\/blog.richmond.edu\/physicsbunn\/2010\/04\/03\/good-or-bad-bayes\/"},"modified":"2010-04-03T19:44:23","modified_gmt":"2010-04-04T00:44:23","slug":"good-or-bad-bayes","status":"publish","type":"post","link":"https:\/\/blog.richmond.edu\/physicsbunn\/2010\/04\/03\/good-or-bad-bayes\/","title":{"rendered":"Good or bad Bayes?"},"content":{"rendered":"<p>My brother Andy pointed me to <a href=\"http:\/\/tamino.wordpress.com\/2010\/03\/22\/good-bayes-gone-bad\/\">this discussion<\/a> on Tamino&#8217;s Open Mind blog of Bayesian vs. frequentist statistical methods.\u00a0 It&#8217;s focused on a nice, clear-cut statistics problem from a <a href=\"http:\/\/www.inference.phy.cam.ac.uk\/mackay\/itila\/book.html\">textbook by David MacKay<\/a>, which can be viewed in either a frequentist or Bayesian way:<\/p>\n<blockquote><p><em>We are trying to reduce the incidence of an unpleasant disease called microsoftus. Two vaccinations, A and B, are tested on a group of volunteers. Vaccination B is a control treatment, a placebo treatment with no active ingredients. Of the 40 subjects, 30 are randomly assigned to have treatment A and the other 10 are given the control treatment B. We observe the subjects for one year after their vaccinations. Of the 30 in group A, one contracts microsoftus. Of the 10 in group B, three contract microsoftus. 
Is treatment A better than treatment B?<\/em><\/p><\/blockquote>\n<p>Tamino reproduces MacKay&#8217;s analysis and then proceeds to criticize it in strong terms.\u00a0 Tamino&#8217;s summary:<\/p>\n<blockquote><p> Let <img decoding=\"async\" src=\"http:\/\/l.wordpress.com\/latex.php?latex=%5Ctheta_A&amp;bg=ffffff&amp;fg=000000&amp;s=0\" alt=\"\\theta_A\" \/> be the probability of getting &quot;microsoftus&quot; with treatment A, while <img decoding=\"async\" src=\"http:\/\/l.wordpress.com\/latex.php?latex=%5Ctheta_B&amp;bg=ffffff&amp;fg=000000&amp;s=0\" alt=\"\\theta_B\" \/> is the probability with treatment B.  He adopts a uniform prior, that all possible values of <img decoding=\"async\" src=\"http:\/\/l.wordpress.com\/latex.php?latex=%5Ctheta_A&amp;bg=ffffff&amp;fg=000000&amp;s=0\" alt=\"\\theta_A\" \/> and <img decoding=\"async\" src=\"http:\/\/l.wordpress.com\/latex.php?latex=%5Ctheta_B&amp;bg=ffffff&amp;fg=000000&amp;s=0\" alt=\"\\theta_B\" \/> are equally likely (a standard choice and a good one).  &quot;Possible&quot; means between 0 and 1, as all probabilities must be.<\/p>\n<p>He then uses the observed data to compute <em>posterior<\/em> probability distributions for <img decoding=\"async\" src=\"http:\/\/l.wordpress.com\/latex.php?latex=%5Ctheta_A%2C%7E%5Ctheta_B&amp;bg=ffffff&amp;fg=000000&amp;s=0\" alt=\"\\theta_A,~\\theta_B\" \/>.  This makes it possible to compute the probability that <img decoding=\"async\" src=\"http:\/\/l.wordpress.com\/latex.php?latex=%5Ctheta_A+%3C+%5Ctheta_B&amp;bg=ffffff&amp;fg=000000&amp;s=0\" alt=\"\\theta_A &lt; \\theta_B\" \/> (i.e., that you&#8217;re less likely to get the disease with treatment A than with B). 
He concludes that the probability is 0.990, so there&#8217;s a 99% chance that treatment A is superior to treatment B (the placebo).<\/p><\/blockquote>\n<p>Tamino has a number of objections to this analysis, which I think I agree with, although I&#8217;d express things a bit differently.\u00a0 To me, the problem with the above analysis is precisely the part that Tamino says is &#8220;a standard choice and a good one&#8221;: the choice of prior.<\/p>\n<p>MacKay&#8217;s choice of prior expresses the idea that, before looking at the data, we thought that all possible pairs of probabilities (<img decoding=\"async\" src=\"http:\/\/l.wordpress.com\/latex.php?latex=%5Ctheta_A%2C%7E%5Ctheta_B&amp;bg=ffffff&amp;fg=000000&amp;s=0\" alt=\"\\theta_A,~\\theta_B\" \/>) were equally likely.\u00a0\u00a0 That prior is very unlikely to be an accurate reflection of our actual prior state of belief regarding the drug.\u00a0 Before you looked at the data, you probably thought there was a non-negligible chance that the drug had no significant effect at all &#8212; that is, that the two probabilities were exactly (or almost exactly) equal. 
So in fact your prior probability was surely not a constant function on the\u00a0 (<img decoding=\"async\" src=\"http:\/\/l.wordpress.com\/latex.php?latex=%5Ctheta_A%2C%7E%5Ctheta_B&amp;bg=ffffff&amp;fg=000000&amp;s=0\" alt=\"\\theta_A,~\\theta_B\" \/>) plane &#8212; it had a big ridge running down the line <img decoding=\"async\" src=\"http:\/\/l.wordpress.com\/latex.php?latex=%5Ctheta_A&amp;bg=ffffff&amp;fg=000000&amp;s=0\" alt=\"\\theta_A\" \/> = <img decoding=\"async\" src=\"http:\/\/l.wordpress.com\/latex.php?latex=%5Ctheta_B&amp;bg=ffffff&amp;fg=000000&amp;s=0\" alt=\"\\theta_B\" \/>.\u00a0 An analysis that assumes a prior without such a ridge is an analysis that assumes from the beginning that the drug has a significant effect with overwhelming probability.\u00a0 So the fact that he concludes the drug has an effect with high probability is not at all surprising &#8212; it was encoded in his prior from the beginning!<\/p>\n<p>The nicest way to analyze a situation like this from a Bayesian point of view is to compare two different models: one where the drug has no effect and one where it has some effect. \u00a0\u00a0 MacKay analyzes the second one.\u00a0 Tamino goes on to analyze both cases and compare them. 
\u00a0 He concludes that the probability of getting the observed data is 0.00096 under the null model (drug has no effect) and 0.00293 under the alternative model (drug has an effect).<\/p>\n<p>How do you interpret these results?\u00a0 The ratio of these two probabilities is about 3.\u00a0 This ratio is sometimes called the <em>Bayesian evidence ratio,\u00a0 <\/em>and it tells you how to modify your prior probability for the two models.\u00a0 To be specific,<\/p>\n<p>Posterior probability ratio = Prior probability ratio x evidence ratio.<\/p>\n<p>For instance, suppose that before looking at the data you thought that there was a 1 in 10 chance that the drug would have an effect.\u00a0 Then the prior probability ratio was (1\/10) \/ (9\/10), or 1\/9.\u00a0 After you look at the data, you &#8220;update&#8221; your prior probability ratio to get a posterior probability ratio of 1\/9 x 3, or 1\/3.\u00a0 So after looking at the data, you now think there&#8217;s a 1\/4 chance that the drug has an effect and a 3\/4 chance that it doesn&#8217;t.<\/p>\n<p>Of course, if you had a different prior probability, then you&#8217;d have a different posterior probability.\u00a0 The data can&#8217;t tell you what to believe; it can just tell you how to update your previous beliefs.<\/p>\n<p>As Tamino says,<\/p>\n<blockquote><p>Perhaps the best we can say is that the data enhance the likelihood that the treatment is effective, increasing the odds ratio by about a factor of 3. 
But, the odds ratio after this increase depends on the odds ratio before the increase &#8212; which is exactly the prior we don&#39;t really have much information on!<\/p><\/blockquote>\n<p>People often make statements like this as if they&#8217;re pointing out a flaw in the Bayesian analysis, but this isn&#8217;t a bug in the Bayesian analysis &#8212; it&#8217;s a feature!\u00a0 You shouldn&#8217;t expect the data to tell you the posterior probabilities in a way that&#8217;s independent of the prior probabilities.\u00a0 That&#8217;s too much to ask.\u00a0 Your final state of belief will be determined by both the data and your prior belief, and that&#8217;s the way it should be.<\/p>\n<p>Incidentally, my research group&#8217;s <a href=\"http:\/\/blog.richmond.edu\/physicsbunn\/2010\/03\/31\/zheng-bunn-paper-submitted\/\">most recent paper<\/a> has to do with a problem very much like this situation:\u00a0 we&#8217;re considering whether a particular data set favors a simple model, with no free parameters, or a more complicated one.\u00a0 We compute Bayesian evidence ratios just like this, in order to tell you how you should update your probabilities for the two hypotheses as a result of the data.\u00a0 But we can&#8217;t tell you which theory to believe &#8212; just how much your belief in one should go up or down as a result of the data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>My brother Andy pointed me to this discussion on Tamino&#8217;s Open Mind blog of Bayesian vs. 
frequentist statistical methods.\u00a0 It&#8217;s focused on a nice, clear-cut statistics problem from a textbook by David MacKay, which can be viewed in either a frequentist or Bayesian way: We are trying to reduce the incidence of an unpleasant disease &hellip; <a href=\"https:\/\/blog.richmond.edu\/physicsbunn\/2010\/04\/03\/good-or-bad-bayes\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Good or bad Bayes?<\/span><\/a><\/p>\n","protected":false},"author":12,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-211","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/posts\/211","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/comments?post=211"}],"version-history":[{"count":0,"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/posts\/211\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/media?parent=211"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/categories?post=211"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/tags?post=211"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}