{"id":541,"date":"2015-08-01T12:51:04","date_gmt":"2015-08-01T11:51:04","guid":{"rendered":"http:\/\/pcool.dyndns.org:8080\/statsbook\/?page_id=541"},"modified":"2025-07-01T10:38:22","modified_gmt":"2025-07-01T09:38:22","slug":"scatterplot","status":"publish","type":"page","link":"https:\/\/pcool.dyndns.org\/index.php\/scatterplot\/","title":{"rendered":"Scatter Plot"},"content":{"rendered":"\n<p>Scatter plots are commonly used in medicine to illustrate the relation between two continuous variables. However, scatter plots can also be used to show discrete numerical and ordinal data.<\/p>\n\n\n\n<p>Download the <a href=\"https:\/\/pcool.dyndns.org:\/wp-content\/data_files\/anscombe.rda\" target=\"_blank\" rel=\"noreferrer noopener\">anscombe.rda<\/a> dataset for this example<sup class='sup-ref-note' id='note-zotero-ref-p541-r1-o1'><a class='sup-ref-note' href='#zotero-ref-p541-r1'>1<\/a><\/sup>.<\/p>\n\n\n\n<p>Anscombe&#8217;s fictional data sets can be shown by:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f90707\" class=\"has-inline-color\">anscombe.quartet\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#3f07fa\" class=\"has-inline-color\">   X1    Y1 X2   Y2 X3    Y3 X4    Y4\n1  10  8.04 10 9.14 10  7.46  8  6.58\n2   8  6.95  8 8.14  8  6.77  8  5.76\n3  13  7.58 13 8.74 13 12.74  8  7.71\n4   9  8.81  9 8.77  9  7.11  8  8.84\n5  11  8.33 11 9.26 11  7.81  8  8.47\n6  14  9.96 14 8.10 14  8.84  8  7.04\n7   6  7.24  6 6.13  6  6.08  8  5.25\n8   4  4.26  4 3.10  4  5.39 19 12.50\n9  12 10.84 12 9.13 12  8.15  8  5.56\n10  7  4.82  7 7.26  7  6.42  8  7.91\n11  5  5.68  5 4.74  5  5.73  8  6.89<\/mark><\/em><\/code><\/pre>\n\n\n\n<p>The four data sets are X1 vs Y1, X2 vs Y2, X3 vs Y3 and X4 vs Y4. 
Across the four data sets, the x variables have identical means, and so do the y variables (to two decimal places):<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f20606\" class=\"has-inline-color\">summary(anscombe.quartet$X1)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#0507f2\" class=\"has-inline-color\">   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. \n    4.0     6.5     9.0     9.0    11.5    14.0 <\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f20606\" class=\"has-inline-color\">\nsummary(anscombe.quartet$X2)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#1205f2\" class=\"has-inline-color\">   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. \n    4.0     6.5     9.0     9.0    11.5    14.0 <\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f20606\" class=\"has-inline-color\">\nsummary(anscombe.quartet$X3)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#2b05f2\" class=\"has-inline-color\">   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. \n    4.0     6.5     9.0     9.0    11.5    14.0 <\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f20606\" class=\"has-inline-color\">\nsummary(anscombe.quartet$X4)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#3b05f2\" class=\"has-inline-color\">   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. \n      8       8       8       9       8      19 <\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f20606\" class=\"has-inline-color\">\nsummary(anscombe.quartet$Y1)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#0507f2\" class=\"has-inline-color\">   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. \n  4.260   6.315   7.580   7.501   8.570  10.840 <\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f20606\" class=\"has-inline-color\">\nsummary(anscombe.quartet$Y2)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#0a05f2\" class=\"has-inline-color\">   Min. 1st Qu.  
Median    Mean 3rd Qu.    Max. \n  3.100   6.695   8.140   7.501   8.950   9.260 <\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f20606\" class=\"has-inline-color\">\nsummary(anscombe.quartet$Y3)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#2b05f2\" class=\"has-inline-color\">   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. \n   5.39    6.25    7.11    7.50    7.98   12.74 <\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f20606\" class=\"has-inline-color\">\nsummary(anscombe.quartet$Y4)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#1b05f2\" class=\"has-inline-color\">   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. \n  5.250   6.170   7.040   7.501   8.190  12.500 <\/mark><\/em><\/code><\/pre>\n\n\n\n<p>and standard deviations:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f60404\" class=\"has-inline-color\">sd(anscombe.quartet$X1)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#1305f5\" class=\"has-inline-color\">&#091;1] 3.316625\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f60404\" class=\"has-inline-color\">sd(anscombe.quartet$X2)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#1b05f5\" class=\"has-inline-color\">&#091;1] 3.316625\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f60404\" class=\"has-inline-color\">sd(anscombe.quartet$X3)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#1b05f5\" class=\"has-inline-color\">&#091;1] 3.316625\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f60404\" class=\"has-inline-color\">sd(anscombe.quartet$X4)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#3c05f5\" class=\"has-inline-color\">&#091;1] 3.316625\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f60404\" class=\"has-inline-color\">sd(anscombe.quartet$Y1)\n<\/mark><mark 
style=\"background-color:rgba(0, 0, 0, 0);color:#2b05f5\" class=\"has-inline-color\">&#091;1] 2.031568\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f60404\" class=\"has-inline-color\">sd(anscombe.quartet$Y2)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#3305f5\" class=\"has-inline-color\">&#091;1] 2.031657\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f60404\" class=\"has-inline-color\">sd(anscombe.quartet$Y3)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#2305f5\" class=\"has-inline-color\">&#091;1] 2.030424\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f60404\" class=\"has-inline-color\">sd(anscombe.quartet$Y4)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#2305f5\" class=\"has-inline-color\">&#091;1] 2.030579<\/mark><\/em><\/code><\/pre>\n\n\n\n<p>It is important to plot the data, rather than relying solely on descriptive statistics, so that the relation between the variables can be appreciated. To plot the first data set (the ggplot2 package must be loaded first):<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>library(ggplot2)<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>ggplot(<span style=\"color: #ff0000;\"><em>data=anscombe.quartet,<\/em><\/span> <span style=\"color: #ff0000;\"><em>aes(x = X1, y = Y1)<\/em><\/span>) +<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>geom_point() +<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>ggtitle(label = 'Anscombe\\'s First Data Set') +<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>theme_bw()<\/em><\/span><\/code><\/pre>\n\n\n\n<p class=\"is-style-text-annotation is-style-text-annotation--1\">The backslash \\ before the &#8216;s escapes the quotation mark, so that it is treated as part of the title rather than as the end of the title&#8217;s text string.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" 
src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/anscombe1-1024x768.png\" alt=\"\" class=\"wp-image-2794\" srcset=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/anscombe1-1024x768.png 1024w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/anscombe1-300x225.png 300w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/anscombe1-768x576.png 768w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/anscombe1.png 1355w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>It is customary to put the independent (explanatory or predictor) variable on the x-axis (abscissa) and the dependent (response or outcome) variable on the y-axis (ordinate). However, it is not always clear which variable is dependent and which is independent.<\/p>\n\n\n\n<p>The second data set:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>ggplot(<span style=\"color: #ff0000;\"><em>data=anscombe.quartet,<\/em><\/span> <span style=\"color: #ff0000;\"><em>aes(x = X2, y = Y2)<\/em><\/span>) +<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>geom_point() +<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>ggtitle(label = 'Anscombe\\'s Second Data Set') +<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>theme_bw()<\/em><\/span><\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"835\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe2-1024x835.png\" alt=\"\" class=\"wp-image-4682\" srcset=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe2-1024x835.png 1024w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe2-300x244.png 300w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe2-768x626.png 768w, 
https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe2.png 1362w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The third data set:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>ggplot(<span style=\"color: #ff0000;\"><em>data=anscombe.quartet,<\/em><\/span> <span style=\"color: #ff0000;\"><em>aes(x = X3, y = Y3)<\/em><\/span>) +<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>geom_point() +<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>ggtitle(label = 'Anscombe\\'s Third Data Set') +<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>theme_bw()<\/em><\/span><\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"774\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe3-1024x774.png\" alt=\"\" class=\"wp-image-4683\" srcset=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe3-1024x774.png 1024w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe3-300x227.png 300w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe3-768x581.png 768w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe3.png 1353w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>And the fourth data set:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>ggplot(<span style=\"color: #ff0000;\"><em>data=anscombe.quartet,<\/em><\/span> <span style=\"color: #ff0000;\"><em>aes(x = X4, y = Y4)<\/em><\/span>) +<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>geom_point() +<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>ggtitle(label = 'Anscombe\\'s Fourth Data Set') +<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>theme_bw()<\/em><\/span><\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" 
width=\"1024\" height=\"774\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe4-1024x774.png\" alt=\"\" class=\"wp-image-4684\" srcset=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe4-1024x774.png 1024w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe4-300x227.png 300w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe4-768x581.png 768w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/07\/anscombe4.png 1353w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Although their means and standard deviations are nearly identical, the four data sets look very different when plotted. This illustrates the importance of plotting your data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Scatter plots are commonly used in medicine to illustrate the relation between two continuous variables. However, scatter plots can also be used to show discrete numerical and ordinal data. Download the anscombe.rda dataset for this example. Anscombe&#8217;s fictional data sets can be shown by: The four data sets are X1 vs Y1, X2 vs Y2, 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"inline_featured_image":false,"footnotes":""},"class_list":["post-541","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/541","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/comments?post=541"}],"version-history":[{"count":3,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/541\/revisions"}],"predecessor-version":[{"id":4686,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/541\/revisions\/4686"}],"wp:attachment":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/media?parent=541"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}