{"id":209,"date":"2010-03-22T13:10:39","date_gmt":"2010-03-22T18:10:39","guid":{"rendered":"http:\/\/blog.richmond.edu\/physicsbunn\/2010\/03\/22\/science-news-on-probability-and-statistics\/"},"modified":"2010-03-22T13:10:39","modified_gmt":"2010-03-22T18:10:39","slug":"science-news-on-probability-and-statistics","status":"publish","type":"post","link":"https:\/\/blog.richmond.edu\/physicsbunn\/2010\/03\/22\/science-news-on-probability-and-statistics\/","title":{"rendered":"Science News on probability and statistics"},"content":{"rendered":"<p><a href=\"http:\/\/www.sciencenews.org\/view\/feature\/id\/57091\/title\/Odds_are,_its_wrong\">This piece<\/a> does a good job of explaining some of the dangers in the use and interpretation of statistics in scientific studies. It&#8217;s mostly about ways in which the statistical results quoted in the scientific literature can be misleading (to scientists as well as to the general public).<\/p>\n<p>The article chiefly attacks the use of 95% confidence results (i.e., results that reject the null hypothesis with p=0.05 or less) as indications that something has been scientifically established.\u00a0 It does a good job laying out several related problems with this attitude:<\/p>\n<ul>\n<li>95% isn&#8217;t all that high.\u00a0 If you do a bunch of such tests, you get a false positive one time out of every 20.\u00a0 Lots of scientists are out there doing lots of tests, so there are bound to be lots of false positives.<\/li>\n<li>Sometimes even a single study will perform many comparisons, each of which could yield a positive result.\u00a0 In that case, the probability of getting false positives goes up very rapidly.<\/li>\n<li>Of course we hear about positive results, not the negative ones.\u00a0 The result is that a lot (much more than 5%) of the results you hear about are false positives.<\/li>\n<li>People &#8212; even lots of scientists &#8212; misunderstand what the probabilities here refer to.\u00a0 When a test is done that 
has a p-value of 5% (often referred to as a 95% confidence result), they think that it means that there&#8217;s a 95% chance the hypothesis being tested is correct.\u00a0 In fact, it means that there&#8217;s a 5% chance that the test would have come out the way it did (or more extremely) if the hypothesis were false.\u00a0 That is, they&#8217;re probabilities about the <em>possible results of the test<\/em>, not probabilities about <em>the ideas being tested<\/em>. That distinction seems minor, but it&#8217;s actually hugely important.<\/li>\n<\/ul>\n<p>If you don&#8217;t think that that last distinction matters, imagine the following scenario.\u00a0 Your doctor gives you a test for a deadly disease, and the test is 99% accurate.\u00a0 If you get a positive result, it does <em>not <\/em>mean there&#8217;s a 99% chance you have the disease.\u00a0 Box 4 in the article works through a numerical example of this.<\/p>\n<p>As the article puts it,<\/p>\n<blockquote><p>Correctly phrased, experimental data yielding a P value of .05 means that there is only a 5 percent chance of obtaining the observed (or more extreme) result if no real effect exists (that is, if the no-difference hypothesis is correct). But many explanations mangle the subtleties in that definition. A recent popular book on issues involving science, for example, states a commonly held misperception about the meaning of statistical significance at the .05 level: <em>&quot;This means that it is 95 percent certain that the observed difference between groups, or sets of samples, is real and could not have arisen by chance.&quot;<\/em><\/p>\n<p>That interpretation commits an egregious logical error (technical term: &quot;transposed conditional&quot;): confusing the odds of getting a result (if a hypothesis is true) with the odds favoring the hypothesis if you observe that result. A well-fed dog may seldom bark, but observing the rare bark does not imply that the dog is hungry. 
A dog may bark 5 percent of the time even if it is well-fed all of the time.<\/p><\/blockquote>\n<p>This is exactly right, and it&#8217;s a very important distinction.<\/p>\n<p>The specific cases discussed in the article mostly have to do with medical research.\u00a0 I know very little about the cultural attitudes in that discipline, so it&#8217;s hard for me to judge some things that are said.\u00a0 The article seems (as I read it) to imply that lots of people, including scientists, regard a 95% confidence result as meaning that something is pretty well established as true.\u00a0 If that&#8217;s correct, then lots of people are out there believing lots of wrong things.\u00a0\u00a0 A 95% confidence result should be regarded as an interesting hint that something <em>might <\/em>be true, leading to new hypotheses and experiments that will either confirm or refute it.\u00a0 Something&#8217;s not well-established until its statistical significance is way better than that.<\/p>\n<p>Let me repeat: I have no idea whether medical researchers really do routinely make that error.\u00a0 The article seems to me to suggest that they do, but I have no way of telling whether it&#8217;s right.\u00a0 It certainly is true that science journalism falls into this trap with depressing regularity, though.<\/p>\n<p>Since I don&#8217;t know much about medical research, let me comment on a couple of ways this stuff plays out in physics.<\/p>\n<ul>\n<li>In astrophysics we do quote 95% confidence results quite often, although we also use other confidence levels.\u00a0 Most of the time, I think, other researchers correctly adopt the interesting-hint attitude towards such results.\u00a0 In particle physics, they&#8217;re often quite strict in their use of terminology: a particle physicist would never claim a &#8220;detection&#8221; of a particle based on a mere 95% confidence result.\u00a0 I think that their usual threshold for use of that magic word is either 4 or 5 sigma (for normally 
distributed errors), which means either 99.99% or 99.9999% confidence.<\/li>\n<li>The multiple-tests problem, on the other hand, can be serious in physics.\u00a0 One way, which I&#8217;ve written about <a href=\"http:\/\/blog.richmond.edu\/physicsbunn\/2009\/11\/02\/what-arnold-schwarzenegger-and-the-microwave-background-have-in-common\/\">before<\/a>, is in the various &#8220;anomalies&#8221; that people have claimed to see in the microwave background data.\u00a0 A bunch of these anomalies show up in statistical tests at 95% to 99% confidence.\u00a0 But we don&#8217;t have any good way of assessing how many tests have been done (many, especially those that yield no detection, aren&#8217;t published), so it&#8217;s hard to tell how to interpret these results.<\/li>\n<\/ul>\n<p>Although the Science News article is mostly right, I do have some complaints.\u00a0 The main one is just that it is overwrought from time to time:<\/p>\n<blockquote><p>It&#39;s science&#39;s dirtiest secret: The &quot;scientific method&quot; of testing hypotheses by statistical analysis stands on a flimsy foundation. 
Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions.<\/p><\/blockquote>\n<p>Nonsense.\u00a0 It&#8217;s certainly true that people mis- and overinterpret statistical statements all the time, both in the general-interest press and in the scholarly literature, but that doesn&#8217;t mean that the tools themselves are invalid.\u00a0 If I use a hammer to open a beer bottle, I&#8217;ll get bad results, but it&#8217;s not the hammer&#8217;s fault.<\/p>\n<p>By the way, the &#8220;mutually inconsistent philosophies&#8221; here seem at one point to refer to the quite obscure difference between Fisher&#8217;s approach and that of Neyman and Pearson, and later to the somewhat less obscure difference between frequentists and Bayesians.\u00a0 Either way, &#8220;mutually inconsistent&#8221; and &#8220;offer no meaningful basis&#8221; are huge exaggerations.<\/p>\n<p>(Lots of people seem to think that such clash-of-the-titans language is correct when applied to frequentists vs. Bayesians, but I think that&#8217;s wrong.\u00a0 When it comes to statistical methods, as opposed to the pure philosophy of probability, the two approaches are simply different sets of tools, not irreconcilable ways of viewing the world.\u00a0 People can and do use tools from both boxes.)<\/p>\n<p>The concluding few paragraphs of the article take this hand-wringing to absurd lengths. 
For the record, it is absolutely not true, as a quotation from the author David Salsburg claims, that the coexistence of Bayesian and frequentist attitudes to statistics means that the whole edifice &#8220;may come crashing down from the weight of its own inconsistencies.&quot;\u00a0 The problems described in the article are real, but they&#8217;re cultural problems in the way people communicate and talk about their results, not problems in the philosophy of probability.<\/p>\n<p>One other technical problem: the article suggests that randomized clinical trials have a problem because they don&#8217;t guarantee that all relevant characteristics are equally split between the trial and control groups:<\/p>\n<blockquote><p>Randomization also should ensure that unknown differences among individuals are mixed in roughly the same proportions in the groups being tested. But statistics do not guarantee an equal distribution any more than they prohibit 10 heads in a row when flipping a penny. With thousands of clinical trials in progress, some will not be well randomized.<\/p><\/blockquote>\n<p>This is true but is not a problem.\u00a0 This fact is automatically accounted for in the statistical analysis (i.e., the p-values) that result from the study.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This piece does a good job of explaining some of the dangers in the use and interpretation of statistics in scientific studies. It&#8217;s mostly about ways in which the statistical results quoted in the scientific literature can be misleading (to scientists as well as to the general public). 
The article chiefly attacks the use of &hellip; <a href=\"https:\/\/blog.richmond.edu\/physicsbunn\/2010\/03\/22\/science-news-on-probability-and-statistics\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Science News on probability and statistics<\/span><\/a><\/p>\n","protected":false},"author":12,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-209","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/posts\/209","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/comments?post=209"}],"version-history":[{"count":0,"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/posts\/209\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/media?parent=209"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/categories?post=209"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.richmond.edu\/physicsbunn\/wp-json\/wp\/v2\/tags?post=209"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
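The post's two quantitative points — that a positive result from a "99% accurate" test need not mean a 99% chance of having the disease, and that false positives pile up quickly when many p = 0.05 tests are run — can be checked with a short script. This is a minimal sketch, not code from the post or the Science News article; the 1-in-1000 prevalence is an assumed figure chosen for illustration (the article's Box 4 uses its own numbers).

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """Bayes' rule: P(disease | positive test result)."""
    true_pos = prevalence * sensitivity           # sick and correctly flagged
    false_pos = (1 - prevalence) * (1 - specificity)  # healthy but flagged anyway
    return true_pos / (true_pos + false_pos)

# A "99% accurate" test (sensitivity = specificity = 0.99) for a disease
# with an assumed prevalence of 1 in 1000:
p = prob_disease_given_positive(prevalence=0.001, sensitivity=0.99, specificity=0.99)
print(f"P(disease | positive) = {p:.3f}")  # about 0.090 -- nowhere near 0.99

def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of at least one false positive among n independent tests
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** n_tests

print(f"20 tests:  {prob_at_least_one_false_positive(20):.3f}")
print(f"100 tests: {prob_at_least_one_false_positive(100):.3f}")
```

With these assumed numbers, roughly 64% of 20-test batteries and over 99% of 100-test batteries produce at least one spurious 95%-confidence "detection" even when no real effect exists — which is the post's point about unpublished negative tests making claimed anomalies hard to interpret.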