{"id":947,"date":"2015-08-15T13:57:13","date_gmt":"2015-08-15T12:57:13","guid":{"rendered":"http:\/\/pcool.dyndns.org:8080\/statsbook\/?page_id=947"},"modified":"2025-07-02T23:37:01","modified_gmt":"2025-07-02T22:37:01","slug":"p-value","status":"publish","type":"page","link":"https:\/\/pcool.dyndns.org\/index.php\/p-value\/","title":{"rendered":"P value"},"content":{"rendered":"\n<p>Statistics deals with uncertainty. When we make a statement, we also need to say how certain we are that this statement is correct. We would like this to be 100%, but realise that this is not possible. <\/p>\n\n\n\n<p>The decision to reject a <a href=\"https:\/\/pcool.dyndns.org\/index.php\/hypothesis\/\" data-type=\"page\" data-id=\"942\">null hypothesis<\/a> is made on the basis of a <strong>statistical test<\/strong>. What statistical test is used depends on the distribution of the <strong>outcome variable<\/strong>; the test can be <a href=\"https:\/\/pcool.dyndns.org\/index.php\/non-parametric-tests-2\/\" data-type=\"page\" data-id=\"605\">parametric<\/a> or <a href=\"https:\/\/pcool.dyndns.org\/index.php\/non-parametric-tests\/\" data-type=\"page\" data-id=\"594\">non parametric<\/a>. A parametric test (e.g. the t-test) can only be used if the data are <a href=\"https:\/\/pcool.dyndns.org\/index.php\/normal-distribution\/\" data-type=\"page\" data-id=\"578\">Normally distributed<\/a>. If the data are not Normally distributed, parametric statistics can&#8217;t be used and a non parametric test (e.g. the Wilcoxon test) is needed instead.<\/p>\n\n\n\n<p>A statistical test calculates a <strong>test statistic<\/strong>, which differs from one test to another. 
For example, a <a href=\"https:\/\/pcool.dyndns.org\/index.php\/non-parametric-tests-2\/\" data-type=\"page\" data-id=\"605\">parametric<\/a> t-test calculates the t-value and a <a href=\"https:\/\/pcool.dyndns.org\/index.php\/non-parametric-tests\/\" data-type=\"page\" data-id=\"594\">non parametric<\/a> Wilcoxon test calculates the W value. Assuming the null hypothesis is true, the p value is the <strong>probability that the test statistic takes a value at least as extreme as the one observed<\/strong>. The p value is compared with a threshold (the significance level) to decide whether or not to reject the null hypothesis.<\/p>\n\n\n\n<p><strong>P Value = Probability the test statistic takes a value at least as extreme as the one observed, assuming the null hypothesis is true<\/strong><\/p>\n\n\n\n<p>It is generally accepted (in medical statistics) that something is \u2018proven\u2019, or statistically significant, if this probability is less than 5%. On this basis, the null hypothesis is either rejected or not rejected.<\/p>\n\n\n\n<p><em>In medical statistics we are usually satisfied that something is statistically significant if p is less than 5%. If p is less than 5%, we consider the result unlikely to be due to chance alone and the null hypothesis is rejected in favour of the alternative hypothesis.<\/em><\/p>\n\n\n\n<p><em style=\"font-weight: bold;\">Statistically significant<\/em>: p value &lt; 5%.<\/p>\n\n\n\n<p>Please bear in mind that the p value indicates how incompatible the data are with a specified statistical model (typically one that embodies the null hypothesis). P values do <strong>NOT<\/strong> measure the probability that the studied hypothesis is true and do <strong>NOT<\/strong> measure the size of an effect (the alternative hypothesis is <strong>NOT<\/strong> &#8216;more true&#8217; if the p value is lower)<sup class='sup-ref-note' id='note-zotero-ref-p947-r1-o1'><a class='sup-ref-note' href='#zotero-ref-p947-r1'>1<\/a><\/sup>. 
See also <a href=\"https:\/\/pcool.dyndns.org\/index.php\/p-values-confidence-intervals-their-use-and-abuse\/\" data-type=\"page\" data-id=\"1937\">p values: their use and abuse<\/a>.<\/p>\n\n\n\n<p>It is customary to round the p value to three decimal places. However, when p is less than one in a thousand, it is reported as p &lt; 0.001.<\/p>\n\n\n\n<p>That something is statistically significant doesn\u2019t necessarily mean it is also <strong><em>clinically significant<\/em><\/strong>. A difference can be statistically significant and yet be too small to be of any clinical importance.<\/p>\n\n\n\n<p>Also, if we were unable to demonstrate a statistically significant difference, this doesn\u2019t mean there is no difference. With more patients in the study, a significant difference might well have been demonstrated (<a href=\"https:\/\/pcool.dyndns.org\/index.php\/power-analysis\/\" data-type=\"page\" data-id=\"597\">underpowered study<\/a>, type 2 <a href=\"https:\/\/pcool.dyndns.org\/index.php\/errors\/\" data-type=\"page\" data-id=\"816\">error<\/a>).<\/p>\n\n\n\n<p>If the significance level is set at 5% and the null hypothesis is in fact true, there is a probability of 1 in 20 that we draw the wrong conclusion (incorrectly reject the null hypothesis, type 1 <a href=\"https:\/\/pcool.dyndns.org\/index.php\/errors\/\" data-type=\"page\" data-id=\"816\">error<\/a>). This is generally considered a reasonable trade-off in clinical studies.<\/p>\n\n\n\n<p><strong>P value correction<\/strong><\/p>\n\n\n\n<p>In a large data set, there may be many variables (columns when the data are in <a href=\"https:\/\/vita.had.co.nz\/papers\/tidy-data.pdf\" data-type=\"link\" data-id=\"https:\/\/vita.had.co.nz\/papers\/tidy-data.pdf\">tidy format<\/a>), which creates the opportunity to keep testing until something turns up that is &#8216;significant&#8217;. However, when performing multiple tests, it is important to bear in mind that, at a 5% significance level, on average one in twenty tests of a true null hypothesis will be &#8216;significant&#8217; by chance alone. 
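<\/p>\n\n\n\n<p>As a rough illustration, assume the tests are independent and every null hypothesis is true. The probability of at least one false positive among n tests, each performed at the 5% level, is then 1 - (1 - 0.05)<sup>n<\/sup>. This can be computed in R (the variable names below are illustrative):<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code>alpha = 0.05 # significance level used for each test\nn = c(1, 5, 10, 20, 50) # number of independent tests\nfwer = 1 - (1 - alpha)^n # probability of at least one false positive\nround(fwer, 3)\n&#091;1] 0.050 0.226 0.401 0.642 0.923<\/code><\/pre>\n\n\n\n<p>Under these assumptions, with 20 tests the chance of at least one spurious &#8216;significant&#8217; result is about 64%, not 5%.<\/p>\n\n\n\n<p>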
To address this, p values should be corrected for the number of tests that have been performed. Several correction methods have been described, including Bonferroni&#8217;s and Holm&#8217;s methods, and both are available in R through the p.adjust() function. <\/p>\n\n\n\n<p>For example, when comparing two groups of patients, tests were performed on many variables, including the patient&#8217;s height. The height variable had a p value of 0.001 and was considered &#8216;significant&#8217;. However, this should be evaluated in view of the number of tests performed. To correct the p value using Bonferroni&#8217;s method in R:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f0070f\" class=\"has-inline-color\">p = 0.001<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#0832f0\" class=\"has-inline-color\"> <\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f0071f\" class=\"has-inline-color\"># set the p value<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#0832f0\" class=\"has-inline-color\">\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f00717\" class=\"has-inline-color\">p.adjust(p, method='bonferroni', n=20)<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#0832f0\" class=\"has-inline-color\">\n&#091;1] 0.02<\/mark><\/em>\n<em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f20a29\" class=\"has-inline-color\">p.adjust(p, method='bonferroni', n=50)<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#2609f2\" class=\"has-inline-color\">\n&#091;1] 0.05<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#0832f0\" class=\"has-inline-color\">\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f00727\" class=\"has-inline-color\">p.adjust(p, method='bonferroni', n=100)<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#0832f0\" class=\"has-inline-color\">\n&#091;1] 0.1<\/mark><\/em><\/code><\/pre>\n\n\n\n<p 
class=\"is-style-text-annotation is-style-text-annotation--1\"><strong>n <\/strong>is the number of tests that have been performed.<\/p>\n\n\n\n<p>It can be seen that, if 50 or more tests were performed, the adjusted p value is 0.05 or higher and the result can no longer be regarded as &#8216;significant&#8217;.<\/p>\n\n\n\n<p><strong>Fishing for p values<\/strong><\/p>\n\n\n\n<p>Data should be evaluated from a clinical perspective and should make clinical sense. There is no place for &#8216;fishing for p values&#8217;, that is, running test after test until one turns out &#8216;significant&#8217;. On its own, a p value has little importance<sup class='sup-ref-note' id='note-zotero-ref-p947-r2-o1'><a class='sup-ref-note' href='#zotero-ref-p947-r2'>2<\/a><\/sup>. A p value is only one part of the evaluation of the data; results should be reported with full transparency, together with further evidence to justify the conclusions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Statistics deals with uncertainty. When we make a statement, we also need to say how certain we are that this statement is correct. We would like this to be 100%, but realise that this is not possible. The decision to reject a null hypothesis is made on the basis of a statistical test. 
What statistical [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"inline_featured_image":false,"footnotes":""},"class_list":["post-947","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/947","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/comments?post=947"}],"version-history":[{"count":21,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/947\/revisions"}],"predecessor-version":[{"id":4831,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/947\/revisions\/4831"}],"wp:attachment":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/media?parent=947"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}