{"id":361,"date":"2015-07-12T09:16:13","date_gmt":"2015-07-12T08:16:13","guid":{"rendered":"http:\/\/pcool.dyndns.org:8080\/statsbook\/?page_id=361"},"modified":"2025-07-03T09:52:53","modified_gmt":"2025-07-03T08:52:53","slug":"descriptive-statistics","status":"publish","type":"page","link":"https:\/\/pcool.dyndns.org\/index.php\/descriptive-statistics\/","title":{"rendered":"Descriptive Statistics"},"content":{"rendered":"\n<p><strong>CENTRAL&nbsp;<\/strong><b>TENDENCY<\/b><\/p>\n\n\n\n<p>Averages: mean, median and mode.<\/p>\n\n\n\n<p>As an example,&nbsp;the first variable (X1) in&nbsp;<a href=\"https:\/\/pcool.dyndns.org:\/wp-content\/data_files\/anscombe.rda\" target=\"_blank\" rel=\"noreferrer noopener\">Anscombe\u2019s first data set<\/a><sup class='sup-ref-note' id='note-zotero-ref-p361-r1-o1'><a class='sup-ref-note' href='#zotero-ref-p361-r1'>1<\/a><\/sup> is used throughout this page.<\/p>\n\n\n\n<p>To show the data values:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>anscombe.quartet$X1<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>&#091;1] 10 &nbsp;8 13 &nbsp;9 11 14 &nbsp;6 &nbsp;4 12 &nbsp;7 &nbsp;5<\/em><\/span><\/code><\/pre>\n\n\n\n<p><strong>Mean<\/strong>: the sum of all observations divided by the number of observations.<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(Mean(X1) = \\frac{1}{n} \\sum_{i=1}^{n} X1_{i} \\)<script src=\"https:\/\/pcool.dyndns.org\/wp-includes\/js\/dist\/hooks.min.js?ver=dd5603f07f9220ed27f1\" id=\"wp-hooks-js\"><\/script>\n<script src=\"https:\/\/pcool.dyndns.org\/wp-includes\/js\/dist\/i18n.min.js?ver=c26c3dc7bed366793375\" id=\"wp-i18n-js\"><\/script>\n<script id=\"wp-i18n-js-after\">\nwp.i18n.setLocaleData( { 'text direction\\u0004ltr': [ 'ltr' ] } );\n\/\/# sourceURL=wp-i18n-js-after\n<\/script>\n<script  async src=\"https:\/\/cdnjs.cloudflare.com\/ajax\/libs\/mathjax\/2.7.7\/MathJax.js?config=TeX-MML-AM_CHTML\" id=\"mathjax-js\"><\/script>\n<\/div>\n\n\n\n<p>Where n is the number of 
observations.<\/p>\n\n\n\n<p>Or in R:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>mean(anscombe.quartet$X1)<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>&#091;1] 9<\/em><\/span><\/code><\/pre>\n\n\n\n<p><strong>Median:<\/strong>&nbsp;the middle value of the ordered data. If the number of observations is odd, the median is one of the observations. If the number of observations is even, the median is the mean of the two central observations.<\/p>\n\n\n\n<p>Or in R:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><span style=\"color: #ff0000;\">median(anscombe.quartet$X1)<\/span><\/em>\n<em><span style=\"color: #0000ff;\">&#091;1] 9<\/span><\/em><\/code><\/pre>\n\n\n\n<p>The <strong>mode<\/strong> is the most frequently occurring value (or category). It is harder to calculate and is often best shown graphically.<\/p>\n\n\n\n<p>Base R has no built-in function for the statistical mode. The following:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><span style=\"color: #ff0000;\">mode(anscombe.quartet$X1)<\/span><\/em>\n<em><span style=\"color: #0000ff;\">&#091;1] \"numeric\"<\/span><\/em><\/code><\/pre>\n\n\n\n<p>returns the type of the R object (its &#8216;mode&#8217; in R terminology) and<strong> not<\/strong> the statistical mode!<\/p>\n\n\n\n<p>The mode can be found with a user-defined <a href=\"https:\/\/pcool.dyndns.org\/index.php\/functions\/\" data-type=\"page\" data-id=\"24\" target=\"_blank\" rel=\"noreferrer noopener\">function<\/a>:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><span style=\"color: #ff0000;\">Mode &lt;- function(x) {<\/span><\/em>\n<em><span style=\"color: #ff0000;\"> ux &lt;- unique(x)  # distinct values, in order of first appearance<\/span><\/em>\n<em><span style=\"color: #ff0000;\"> ux&#091;which.max(tabulate(match(x, ux)))]  # value with the highest count; ties go to the first<\/span><\/em>\n<em><span style=\"color: #ff0000;\">}<\/span><\/em><\/code><\/pre>\n\n\n\n<p>The mode is then calculated by entering:<\/p>\n\n\n\n<pre class=\"wp-block-code 
has-small-font-size\"><code><em><span style=\"color: #ff0000;\">Mode(anscombe.quartet$X1)<\/span><\/em>\n<em><span style=\"color: #0000ff;\">&#091;1] 10<\/span><\/em><\/code><\/pre>\n\n\n\n<p class=\"is-style-text-annotation is-style-text-annotation--1\">Note the capital M as defined in the function! Every value of X1 occurs exactly once, so there is no single most frequent value. In that case the function simply returns the first value, 10.<\/p>\n\n\n\n<p><strong>DISPERSION<\/strong><\/p>\n\n\n\n<p>Indicators of the spread or variability of the data.<\/p>\n\n\n\n<p><strong>Variance:<\/strong>&nbsp;the sum of the squared deviations from the mean, divided by n - 1:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(Variance(X1) = \\frac{1}{n-1} \\sum_{i=1}^{n}(X1_{i} - Mean(X1))^2 \\)<\/div>\n\n\n\n<p>The term<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\( \\sum_{i=1}^{n}(X1_{i} - Mean(X1))^2 \\)<\/div>\n\n\n\n<p>is the sum of the squared deviations about the mean, usually called the sum of squares.<\/p>\n\n\n\n<p>Or in R:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><span style=\"color: #ff0000;\">var(anscombe.quartet$X1)<\/span><\/em>\n<span style=\"color: #0000ff;\"><em>&#091;1] 11<\/em><\/span><\/code><\/pre>\n\n\n\n<p><strong>Standard Deviation:&nbsp;<\/strong>the square root of the variance:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(SD(X1)=\\sqrt{Variance(X1)} \\)<\/div>\n\n\n\n<p>or<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(SD(X1) = \\sqrt{\\frac{1}{n-1} \\sum_{i=1}^{n}(X1_{i} - Mean(X1))^2} \\)<\/div>\n\n\n\n<p>or<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(Variance(X1) = (SD(X1))^2 \\)<\/div>\n\n\n\n<p>Or in R:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>sd(anscombe.quartet$X1)<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>&#091;1] 3.316625<\/em><\/span><\/code><\/pre>\n\n\n\n<p>The square root of the variance gives the same result:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: 
#ff0000;\"><em>sqrt(var(anscombe.quartet$X1))<\/em><\/span>\n<em><span style=\"color: #0000ff;\">&#091;1] 3.316625<\/span><\/em><\/code><\/pre>\n\n\n\n<p><strong>Range:&nbsp;<\/strong>the highest minus the lowest value, that is, the maximum observation minus the minimum observation.<\/p>\n\n\n\n<p>Or in R:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"color: #ff0000;\"><em>range(anscombe.quartet$X1)<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>&#091;1] &nbsp;4 14<\/em><\/span><\/code><\/pre>\n\n\n\n<p>Note that range() returns the minimum and the maximum rather than their difference. The range is therefore 14 - 4 = 10.<\/p>\n\n\n\n<p><strong>Minimum:<\/strong> the lowest value.<\/p>\n\n\n\n<p>Or in R:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>min(anscombe.quartet$X1)<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>&#091;1] 4<\/em><\/span><\/code><\/pre>\n\n\n\n<p><strong>Maximum:<\/strong> the highest value.<\/p>\n\n\n\n<p>Or in R:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>max(anscombe.quartet$X1)<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>&#091;1] 14<\/em><\/span><\/code><\/pre>\n\n\n\n<p><strong>Interquartile range:<\/strong> also called the &#8216;midspread&#8217; or &#8216;middle 50%&#8217;. It is the difference between the upper and lower quartiles.<\/p>\n\n\n\n<p>The quartiles divide the ordered data into four equal parts. Q1 (the 25th percentile) is the lower quartile, Q2 (the 50th percentile) is equal to the median and Q3 (the 75th percentile) is the upper quartile. Q4 (the 100th percentile) is the maximum. The interquartile range is Q3 minus Q1:<\/p>\n\n\n\n<p>IQR = Q3 &#8211; Q1<\/p>\n\n\n\n<p>Or in R:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><span style=\"color: #ff0000;\">IQR(anscombe.quartet$X1)<\/span><\/em>\n<span style=\"color: #0000ff;\"><em>&#091;1] 5<\/em><\/span><\/code><\/pre>\n\n\n\n<p class=\"is-style-text-annotation is-style-text-annotation--2\">In a <a href=\"https:\/\/pcool.dyndns.org\/index.php\/box-plot\/\" data-type=\"page\" data-id=\"501\">box plot<\/a>, the box spans the IQR, from Q1 to Q3.<\/p>\n\n\n\n<p><strong>Normal 
distribution:<\/strong> the mean, median and mode are equal (in sample data, approximately so).<\/p>\n\n\n\n<p>Several descriptive statistics can be obtained in R in one go with summary(). For example, to obtain the descriptive statistics of X1 in <a href=\"https:\/\/pcool.dyndns.org:\/wp-content\/data_files\/anscombe.rda\" target=\"_blank\" rel=\"noreferrer noopener\">Anscombe&#8217;s first data set<\/a>, load the data and enter the following into the console:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f2070f\" class=\"has-inline-color\">summary(anscombe.quartet$X1)<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#2507f3\" class=\"has-inline-color\">\n   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. \n    4.0     6.5     9.0     9.0    11.5    14.0 <\/mark><\/em><\/code><\/pre>\n\n\n\n<p>However, this summary doesn&#8217;t provide the standard deviation. A custom summary that includes it can be obtained with the dplyr<sup class='sup-ref-note' id='note-zotero-ref-p361-r2-o1'><a class='sup-ref-note' href='#zotero-ref-p361-r2'>2<\/a><\/sup> package:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f80707\" class=\"has-inline-color\">library(dplyr)\nanscombe.quartet %&gt;%\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f70202\" class=\"has-inline-color\">  summarise(n = n(), mean = mean(X1), median = median(X1),\n            min = min(X1), max = max(X1), iqr = IQR(X1),\n            sd = sd(X1), var = var(X1),\n            q1 = quantile(X1, 0.25), q2 = quantile(X1, 0.5),\n            q3 = quantile(X1, 0.75), q4 = quantile(X1, 1))\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#0205f7\" class=\"has-inline-color\">   n mean median min max iqr       sd var  q1 q2   q3 q4\n1 11    9      9   4  14   5 3.316625  11 6.5  9 11.5 14<\/mark><\/em>\n<\/code><\/pre>\n\n\n\n<p><strong>Summarising data<\/strong><\/p>\n\n\n\n<p>If the data follow a Normal distribution, they should be summarised by the mean (central 
tendency) and the standard deviation (spread). However, the mean is very sensitive to outliers and is not a good measure of central tendency in skewed distributions. When data do not follow a Normal distribution, they should be summarised by the median and the interquartile range (middle 50% of the data).<\/p>\n\n\n\n<p><strong>Normally distributed data: mean and standard deviation<\/strong><\/p>\n\n\n\n<p><strong>Otherwise: median and interquartile range<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>CENTRAL&nbsp;TENDENCY Averages: mean, median and mode. As an example,&nbsp;the first variable (X1) in&nbsp;Anscombe\u2019s first data set can be used. To show the data values: Mean: add and divide by number of observations. Where n is the number of observations. Or in R: Median:&nbsp;middle value of the ordered data. If the number of observations is odd, [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"inline_featured_image":false,"footnotes":""},"class_list":["post-361","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/361","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/comments?post=361"}],"version-history":[{"count":5,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/361\/revisions"}],"predecessor-version":[{"id":4845,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/361\/revisions\/4845"}],"wp:attachment":[{"href":"https:\/\/pcool.dyndns.org\/index.php
\/wp-json\/wp\/v2\/media?parent=361"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}