{"id":823,"date":"2015-08-10T15:30:14","date_gmt":"2015-08-10T14:30:14","guid":{"rendered":"http:\/\/pcool.dyndns.org:8080\/statsbook\/?page_id=823"},"modified":"2025-07-04T21:08:26","modified_gmt":"2025-07-04T20:08:26","slug":"correlation-coefficient","status":"publish","type":"page","link":"https:\/\/pcool.dyndns.org\/index.php\/correlation-coefficient\/","title":{"rendered":"Correlation Coefficient"},"content":{"rendered":"\n<p><a href=\"https:\/\/pcool.dyndns.org\/index.php\/regression-coefficient\/\" data-type=\"page\" data-id=\"826\">As described<\/a>, a regression line was fitted through 30 data points in the <a href=\"https:\/\/pcool.dyndns.org:\/wp-content\/data_files\/trees30.rda\" target=\"_blank\" rel=\"noreferrer noopener\">trees30.rda<\/a> data set.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/trees30regression-1024x768.png\" alt=\"\" class=\"wp-image-3853\" srcset=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/trees30regression-1024x768.png 1024w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/trees30regression-300x225.png 300w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/trees30regression-768x576.png 768w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/trees30regression.png 1355w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>As can be seen in the graph, the line seems to fit the data well. However, the fit is not always as good as illustrated here. It would be nice to have a measure of how closely the line fits the data. This measure is called the <strong><em>correlation coefficient<\/em><\/strong> and is often denoted by <strong><em>R<\/em><\/strong>. 
It is defined as:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(R = \\frac{SumOfProductsAboutTheMeanOfXAndY}{\\sqrt{SumOfSquaresAboutTheMeanOfX \\cdot SumOfSquaresAboutTheMeanOfY}}\\)<script src=\"https:\/\/pcool.dyndns.org\/wp-includes\/js\/dist\/hooks.min.js?ver=dd5603f07f9220ed27f1\" id=\"wp-hooks-js\"><\/script>\n<script src=\"https:\/\/pcool.dyndns.org\/wp-includes\/js\/dist\/i18n.min.js?ver=c26c3dc7bed366793375\" id=\"wp-i18n-js\"><\/script>\n<script id=\"wp-i18n-js-after\">\nwp.i18n.setLocaleData( { 'text direction\\u0004ltr': [ 'ltr' ] } );\n\/\/# sourceURL=wp-i18n-js-after\n<\/script>\n<script  async src=\"https:\/\/cdnjs.cloudflare.com\/ajax\/libs\/mathjax\/2.7.7\/MathJax.js?config=TeX-MML-AM_CHTML\" id=\"mathjax-js\"><\/script>\n<\/div>\n\n\n\n<p>In practice, the correlation coefficient is calculated with software. In the tree example, it can be found with:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>cor(TreeGirthMass$Girth,TreeGirthMass$Mass,method='pearson')<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>&#091;1] 0.9731369<\/em><\/span>\n<em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#fb0404\" class=\"has-inline-color\">cor.test(TreeGirthMass$Girth,TreeGirthMass$Mass,method='pearson')<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#3405fa\" class=\"has-inline-color\">\n\tPearson's product-moment correlation\n\ndata:  TreeGirthMass$Girth and TreeGirthMass$Mass\nt = 22.366, df = 28, p-value &lt; 2.2e-16\nalternative hypothesis: true correlation is not equal to 0\n95 percent confidence interval:\n 0.9437318 0.9872758\nsample estimates:\n      cor \n0.9731369 <\/mark><\/em><\/code><\/pre>\n\n\n\n<p>The correlation coefficient is therefore 0.9731, with a 95% confidence interval of (0.9437, 0.9873). The p-value for the test of no association is less than 0.001 (p&lt;0.001), which is statistically significant. 
It is concluded that there is an association between the girth and mass of the trees. The strength of this association is indicated by the correlation coefficient.<\/p>\n\n\n\n<p>The correlation coefficient always has a value between \u20131 and 1. A correlation coefficient of 0.97 therefore means that there is an excellent correlation between the girth and mass of a tree (it should be noted that the square of the correlation coefficient is never <strong><em>larger<\/em><\/strong> than the absolute value of the correlation coefficient itself; this is because the square of a number between \u20131 and 1 is never <strong><em>larger<\/em><\/strong> than the absolute value of that number).<\/p>\n\n\n\n<p>If the correlation coefficient is 1, the line fits the data perfectly:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"301\" height=\"204\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/correlation1.png\" alt=\"\" class=\"wp-image-3020\"\/><\/figure>\n\n\n\n<p>A correlation coefficient of zero means that there is no linear correlation whatsoever:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"294\" height=\"204\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/correlation2.png\" alt=\"\" class=\"wp-image-3021\"\/><\/figure>\n\n\n\n<p>In fact, we could have drawn <strong><em>any line<\/em><\/strong> through the data points above! A correlation coefficient of \u20131 means that there is a perfect inverse relation between the two variables: the line still fits the data perfectly, but it slopes downwards.<\/p>\n\n\n\n<p><strong>The correlation coefficient is a measure of how closely the line fits the data. It ranges from \u20131 to +1. A correlation coefficient of zero means that there is no linear correlation \/ association. The closer the value is to \u20131 or +1, the better the line fits the data. A negative value corresponds to an inverse relation.<\/strong><\/p>\n\n\n\n<p><strong>Causation<\/strong><\/p>\n\n\n\n<p>Correlation may be demonstrated statistically. 
However, this does not by itself demonstrate causation. Hill<sup class='sup-ref-note' id='note-zotero-ref-p823-r1-o1'><a class='sup-ref-note' href='#zotero-ref-p823-r1'>1<\/a><\/sup> described criteria for assessing causation:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>strength<\/li>\n\n\n\n<li>consistency<\/li>\n\n\n\n<li>specificity<\/li>\n\n\n\n<li>temporality<\/li>\n\n\n\n<li>biological gradient<\/li>\n\n\n\n<li>plausibility<\/li>\n\n\n\n<li>coherence<\/li>\n\n\n\n<li>experiment<\/li>\n\n\n\n<li>analogy<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>As described, a regression line was fitted through 30 data points in the trees30.rda data set. As can be seen in the graph, the line seems to fit the data well. However, the fit is not always as good as illustrated here. It would be nice to have a measure of how close the line [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"inline_featured_image":false,"footnotes":""},"class_list":["post-823","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/823","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/comments?post=823"}],"version-history":[{"count":2,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/823\/revisions"}],"predecessor-version":[{"id":4922,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/823\/revisions\/4922"}],"wp:attachment":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/media?parent=823"}],"curies
":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}