{"id":892,"date":"2015-08-12T18:50:32","date_gmt":"2015-08-12T17:50:32","guid":{"rendered":"http:\/\/pcool.dyndns.org:8080\/statsbook\/?page_id=892"},"modified":"2025-07-02T11:20:24","modified_gmt":"2025-07-02T10:20:24","slug":"confidence-intervals","status":"publish","type":"page","link":"https:\/\/pcool.dyndns.org\/index.php\/confidence-intervals\/","title":{"rendered":"Confidence Intervals"},"content":{"rendered":"\n<p>When we want to describe the distribution of a large population, it is impractical or impossible to measure every member \/ item of the population. Therefore, a <strong>random<\/strong> sample is taken to obtain information about the population.<\/p>\n\n\n\n<p>The <strong>sample<\/strong> can be described in terms of the sample mean and sample standard deviation. If the sample is not Normally distributed, further <a href=\"https:\/\/pcool.dyndns.org\/index.php\/descriptive-statistics\/\" data-type=\"page\" data-id=\"361\">descriptive statistics<\/a> can describe the sample. The sample statistics are used to describe (make inferences about) the <strong>whole population.<\/strong><\/p>\n\n\n\n<p>The sample mean is a good (unbiased) estimator of the population mean. However, every time you take a different sample from the population, you will get a different mean. The distribution of these sample means is approximately Normal (<strong>central limit theorem<\/strong>) <strong>even<\/strong> if the population itself is not Normally distributed, provided the sample size is large enough!<\/p>\n\n\n\n<p>This sampling distribution of the mean is centred on the (unknown) population mean, which is estimated by the sample mean. 
The dispersion (spread) is indicated by the <strong>standard error of the mean<\/strong> (SEM), which is estimated as the sample standard deviation divided by the square root of the sample size (n):<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(SEM = \\frac{SD(sample)}{\\sqrt{n}} \\)<script src=\"https:\/\/pcool.dyndns.org\/wp-includes\/js\/dist\/hooks.min.js?ver=dd5603f07f9220ed27f1\" id=\"wp-hooks-js\"><\/script>\n<script src=\"https:\/\/pcool.dyndns.org\/wp-includes\/js\/dist\/i18n.min.js?ver=c26c3dc7bed366793375\" id=\"wp-i18n-js\"><\/script>\n<script id=\"wp-i18n-js-after\">\nwp.i18n.setLocaleData( { 'text direction\\u0004ltr': [ 'ltr' ] } );\n\/\/# sourceURL=wp-i18n-js-after\n<\/script>\n<script  async src=\"https:\/\/cdnjs.cloudflare.com\/ajax\/libs\/mathjax\/2.7.7\/MathJax.js?config=TeX-MML-AM_CHTML\" id=\"mathjax-js\"><\/script>\n<\/div>\n\n\n\n<p>Therefore, the larger the sample, the smaller the dispersion (spread) of the distribution of the mean.<\/p>\n\n\n\n<p>Confidence intervals can be constructed as the mean plus or minus a multiple of the standard error of the mean:<\/p>\n\n\n\n<p><strong>Mean \u00b1 1 <\/strong>\u00d7 <strong>SEM \u2248 68 %<\/strong><\/p>\n\n\n\n<p><strong>Mean \u00b1 2 <\/strong>\u00d7 <strong>SEM \u2248 95 % (more accurately 1.96 times SEM)<\/strong><\/p>\n\n\n\n<p><strong>Mean \u00b1 3 <\/strong>\u00d7 <strong>SEM \u2248 99.7 % (a 99 % interval uses 2.58 times SEM)<\/strong><\/p>\n\n\n\n<p>Please note that a 95 % confidence interval does <strong>NOT<\/strong> mean there is a 95 % probability that the population mean lies within it! The true population mean is a fixed but unknown value that lies either within the confidence interval (probability = 1) or outside it (probability = 0). The 95 % refers to the method: if many random samples were taken and an interval calculated from each, about 95 % of those intervals would contain the true population mean. The width of the interval reflects the precision of the estimate, which depends on the sample size. 
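<\/p>\n\n\n\n<p>A short worked sketch of why sample size matters, assuming the sample standard deviation stays roughly the same as n grows: the half-width of a 95 % interval is 1.96 \u00d7 SD(sample) \/ \u221an. It therefore shrinks with the square root of the sample size rather than in proportion to it, so quadrupling the sample size only halves the width:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(1.96 \\cdot \\frac{SD(sample)}{\\sqrt{4n}} = 1.96 \\cdot \\frac{SD(sample)}{2\\sqrt{n}} = \\frac{1}{2} \\cdot 1.96 \\cdot \\frac{SD(sample)}{\\sqrt{n}} \\)<\/div>\n\n\n\n<p>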
The larger the sample size, the narrower the confidence interval.<\/p>\n\n\n\n<p><strong>For example<\/strong>, the table below shows the heights (in cm) of a sample of 100 men taken at random from the population:<\/p>\n\n\n\n<table id=\"tablepress-8\" class=\"tablepress tablepress-id-8\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">Height (cm)<\/th><th class=\"column-2\">Number of People<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping row-hover\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">168<\/td><td class=\"column-2\">5<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">169<\/td><td class=\"column-2\">25<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">170<\/td><td class=\"column-2\">40<\/td>\n<\/tr>\n<tr class=\"row-5\">\n\t<td class=\"column-1\">171<\/td><td class=\"column-2\">25<\/td>\n<\/tr>\n<tr class=\"row-6\">\n\t<td class=\"column-1\">172<\/td><td class=\"column-2\">5<\/td>\n<\/tr>\n<tr class=\"row-7\">\n\t<td class=\"column-1\">Total<\/td><td class=\"column-2\">100<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n\n\n\n<p>To calculate the <strong>sample<\/strong> mean of the variable height:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(Mean(sample)= \\frac{5 \\cdot 168 + 25 \\cdot 169 + 40 \\cdot 170 + 25 \\cdot 171 + 5 \\cdot 172}{100} = \\frac{17000}{100} = 170 \\)<\/div>\n\n\n\n<p>To calculate the <strong>sample<\/strong> standard deviation of the variable height:<\/p>\n\n\n\n<p>The sum of the squares about the mean is:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\( SumSquares = \\sum_{i=1}^{n} (Height(i) - Mean(Height))^2 \\)\n<\/div>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(SumSquares = 5 \\cdot (168-170)^2 + 25 \\cdot (169-170)^2 + \\)\n                                      \\(40 \\cdot (170-170)^2 + \\)\n                                      \\( 25 \\cdot (171-170)^2 + 5 \\cdot (172-170)^2 \\)<\/div>\n\n\n\n<div 
class=\"wp-block-mathml-mathmlblock\">\\(SumSquares = 5 \\cdot (-2)^2 + 25 \\cdot (-1)^2 + \\)\n                                      \\(40 \\cdot (0)^2 + \\)\n                                      \\( 25 \\cdot (1)^2 + 5 \\cdot (2)^2 \\)\n\n<\/div>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(SumSquares = 5 \\cdot 4 + 25 \\cdot 1 + \\)\n                                      \\(40 \\cdot 0+ \\)\n                                      \\( 25 \\cdot 1+ 5 \\cdot 4 \\)<\/div>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\n\\(SumSquares = 20 + 25 + 0 + 25 + 20 = 90 \\)<\/div>\n\n\n\n<p>So the <strong>sample<\/strong> variance (dividing by n - 1 = 99) is:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(Variance(sample) = \\frac{1}{100-1} \\cdot 90 = \\frac{90}{99} \\approx 0.91 \\)<\/div>\n\n\n\n<p>And the <strong>sample<\/strong> standard deviation is:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(SD(sample) = \\sqrt{0.91} \\approx 0.95 \\)<\/div>\n\n\n\n<p>Using the central limit theorem, the sampling distribution of the mean height has:<\/p>\n\n\n\n<p><strong>Mean<\/strong> (the estimate of the population mean): 170 cm<\/p>\n\n\n\n<p><strong>Standard Error of the Mean<\/strong>:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(SEM = \\frac{0.95}{\\sqrt{100}} = 0.095 \\)<\/div>\n\n\n\n<p>To calculate the 95% confidence interval of the mean:<\/p>\n\n\n\n<p>Mean \u00b1 1.96 \u00d7 SEM = 170 \u00b1 1.96 \u00d7 0.095 \u2248 170 \u00b1 0.19<\/p>\n\n\n\n<p>The 95% confidence interval therefore is:<\/p>\n\n\n\n<p>(169.81, 170.19).<\/p>\n\n\n\n<p><strong>To calculate in R:<\/strong><\/p>\n\n\n\n<p>The data are stored in <a href=\"https:\/\/pcool.dyndns.org:\/wp-content\/data_files\/exampleheights.rda\" target=\"_blank\" rel=\"noreferrer noopener\">exampleheights.rda<\/a>. 
The data frame is called heights and the variable height. After downloading, the file can be loaded with load(\"exampleheights.rda\") (assuming it is in the R working directory).<\/p>\n\n\n\n<p><strong>Sample mean:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>mean(heights$height)<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>&#91;1] 170<\/em><\/span><\/code><\/pre>\n\n\n\n<p><strong>Sample standard deviation:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>sd(heights$height)<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>&#91;1] 0.9534626<\/em><\/span><\/code><\/pre>\n\n\n\n<p><strong>Estimate of the population mean:<\/strong><\/p>\n\n\n\n<p>The sample mean: 170 cm.<\/p>\n\n\n\n<p><strong>Standard error of the mean:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>sd(heights$height)\/sqrt(nrow(heights))<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>&#91;1] 0.09534626<\/em><\/span><\/code><\/pre>\n\n\n\n<p><strong>To estimate the 95% confidence interval<\/strong>, first calculate 1.96 times the SEM:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>1.96*sd(heights$height)\/sqrt(nrow(heights))<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>&#91;1] 0.1868787<\/em><\/span><\/code><\/pre>\n\n\n\n<p>So, the 95% confidence interval is:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>mean(heights$height)-1.96*sd(heights$height)\/sqrt(nrow(heights))<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>&#91;1] 169.8131<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>mean(heights$height)+1.96*sd(heights$height)\/sqrt(nrow(heights))<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>&#91;1] 170.1869<\/em><\/span><\/code><\/pre>\n\n\n\n<p>Or (169.81, 170.19).<\/p>\n\n\n\n<p><strong>Alternatively<\/strong>, perform a one-sample t-test:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f20a19\" class=\"has-inline-color\">t.test(heights$height)<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#2f0af2\" class=\"has-inline-color\">\n\n\tOne Sample t-test\n\ndata:  heights$height\nt = 1783, df = 99, p-value &lt; 2.2e-16\nalternative hypothesis: true mean is not equal to 0\n95 percent confidence interval:\n<strong> 169.8108 170.1892<\/strong>\nsample estimates:\nmean of x \n      170<\/mark><\/em><\/code><\/pre>\n\n\n\n<p>The t-test interval (169.8108, 170.1892) is very slightly wider than (169.8131, 170.1869) because it uses the t distribution with n - 1 = 99 degrees of freedom (critical value qt(0.975, 99) \u2248 1.984) instead of the Normal value 1.96. For large samples the two intervals are almost identical; for small samples the t-based interval is preferred. The p-value in this output tests whether the true mean equals 0, which is not a meaningful question for heights, so only the confidence interval is of interest here.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When we want to describe the distribution of a large population, it is impractical or impossible to measure every member \/ item of the population. Therefore, a random sample is taken to obtain information about the population. The sample can be described in terms of the sample mean and sample standard deviation. 
If the [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"inline_featured_image":false,"footnotes":""},"class_list":["post-892","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/892","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/comments?post=892"}],"version-history":[{"count":6,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/892\/revisions"}],"predecessor-version":[{"id":4810,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/892\/revisions\/4810"}],"wp:attachment":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/media?parent=892"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}