{"id":578,"date":"2015-08-01T13:11:45","date_gmt":"2015-08-01T12:11:45","guid":{"rendered":"http:\/\/pcool.dyndns.org:8080\/statsbook\/?page_id=578"},"modified":"2025-07-03T09:37:16","modified_gmt":"2025-07-03T08:37:16","slug":"normal-distribution","status":"publish","type":"page","link":"https:\/\/pcool.dyndns.org\/index.php\/normal-distribution\/","title":{"rendered":"Normal Distribution"},"content":{"rendered":"\n<p>Suppose we measured the heights of 100 people to the nearest centimetre. The measured variable is called &#8216;Height&#8217; and there are 100 variates (individual measurements). The number of observations is n = 100; the measurements are summarised in the table below:<\/p>\n\n\n\n<table id=\"tablepress-8\" class=\"tablepress tablepress-id-8\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">Height (cm)<\/th><th class=\"column-2\">Number of People<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping row-hover\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">168<\/td><td class=\"column-2\">5<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">169<\/td><td class=\"column-2\">25<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">170<\/td><td class=\"column-2\">40<\/td>\n<\/tr>\n<tr class=\"row-5\">\n\t<td class=\"column-1\">171<\/td><td class=\"column-2\">25<\/td>\n<\/tr>\n<tr class=\"row-6\">\n\t<td class=\"column-1\">172<\/td><td class=\"column-2\">5<\/td>\n<\/tr>\n<tr class=\"row-7\">\n\t<td class=\"column-1\">Total<\/td><td class=\"column-2\">100<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-8 from cache -->\n\n\n<p>The data can also be plotted:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"298\" height=\"225\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/height1.png\" alt=\"\" class=\"wp-image-3193\"\/><\/figure>\n\n\n\n<p><a href=\"http:\/\/pcool.dyndns.org:8080\/statsbook\/wp-content\/uploads\/height1.png\"><\/a>If the data points are connected, a 
distribution plot is obtained:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"298\" height=\"225\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/height2.png\" alt=\"\" class=\"wp-image-3194\"\/><\/figure>\n\n\n\n<p>If the measurements had been taken in millimetres rather than centimetres, the curve would become smooth:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"298\" height=\"249\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/height3.png\" alt=\"\" class=\"wp-image-3195\"\/><\/figure>\n\n\n\n<p>This graph shows the <strong><em>distribution<\/em><\/strong> of the variable Height.<\/p>\n\n\n\n<p>Also note that the y-axis now shows the <strong><em>probability<\/em><\/strong> rather than the actual number of people. This curve is called the <strong><em>Gaussian or \u2018bell-shaped\u2019 curve<\/em><\/strong>.<\/p>\n\n\n\n<p>The curve has the following basic formula:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(f(x) = \\frac{1}{\\sigma \\sqrt{2\\pi} } e^{-\\frac{1}{2}\\left(\\frac{x-\\mu}{\\sigma}\\right)^2}  \\)<script src=\"https:\/\/pcool.dyndns.org\/wp-includes\/js\/dist\/hooks.min.js?ver=dd5603f07f9220ed27f1\" id=\"wp-hooks-js\"><\/script>\n<script src=\"https:\/\/pcool.dyndns.org\/wp-includes\/js\/dist\/i18n.min.js?ver=c26c3dc7bed366793375\" id=\"wp-i18n-js\"><\/script>\n<script id=\"wp-i18n-js-after\">\nwp.i18n.setLocaleData( { 'text direction\\u0004ltr': [ 'ltr' ] } );\n\/\/# sourceURL=wp-i18n-js-after\n<\/script>\n<script  async src=\"https:\/\/cdnjs.cloudflare.com\/ajax\/libs\/mathjax\/2.7.7\/MathJax.js?config=TeX-MML-AM_CHTML\" id=\"mathjax-js\"><\/script>\n<\/div>\n\n\n\n<p class=\"is-style-text-annotation is-style-text-annotation--1\">Where \u03c3 is the standard deviation and \u03bc is the mean.<\/p>\n\n\n\n<p>Further mathematics is beyond the scope of this book. 
However, it should be appreciated that the function is controlled by the mean (\u03bc) and the standard deviation (\u03c3).<\/p>\n\n\n\n<p>When data are distributed according to this formula, the data are <strong><em>Normally distributed<\/em><\/strong>. Any other distribution of data is called <strong><em>not Normal<\/em><\/strong>. If the distribution of the data is according to a described formula (i.e. Normal), <a href=\"https:\/\/pcool.dyndns.org\/index.php\/non-parametric-tests-2\/\" data-type=\"page\" data-id=\"605\">parametric statistical analysis<\/a> can be used to analyse the data. However, if the data are not distributed according to a described distribution, <a href=\"https:\/\/pcool.dyndns.org\/index.php\/non-parametric-tests\/\" data-type=\"page\" data-id=\"594\">non-parametric analysis<\/a> should be used.<\/p>\n\n\n\n<p><strong>Mean<\/strong><\/p>\n\n\n\n<p>Returning to the example, the mean height can be calculated:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(Mean(Height) = \\frac{5 \\cdot 168 + 25 \\cdot 169 + 40 \\cdot 170 + 25 \\cdot 171 + 5 \\cdot 172}{100} = 170 \\)<\/div>\n\n\n\n<p>As can be seen, the mean is at the top of the Gaussian curve.<\/p>\n\n\n\n<p>The curve is symmetrical around the mean.<\/p>\n\n\n\n<p>In general terms, the mean is:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(Mean(x) = \\frac{1}{n} \\sum_{i=1}^{n} x(i) \\)<\/div>\n\n\n\n<p>The mean and average are often used as synonyms. However, they are <strong><em>not<\/em><\/strong> quite the same. 
The mean is an average, but there are <em>other<\/em> averages besides the mean (such as the <a href=\"https:\/\/pcool.dyndns.org\/index.php\/other-distributions\/\" data-type=\"page\" data-id=\"581\">mode and median<\/a>).<\/p>\n\n\n\n<p>A Normal distribution with mean zero and standard deviation of 1:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"244\" height=\"198\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/mean.png\" alt=\"\" class=\"wp-image-3274\"\/><\/figure>\n\n\n\n<p>The same distribution with mean -1:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"244\" height=\"197\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/mean-1.png\" alt=\"\" class=\"wp-image-3272\"\/><\/figure>\n\n\n\n<p>And with mean +1:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"244\" height=\"198\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/mean-11.png\" alt=\"\" class=\"wp-image-3273\"\/><\/figure>\n\n\n\n<p>As can be seen, the distribution curve shifts to the left when the mean decreases and to the right when the mean increases. The shape of the curve, however, remains unchanged: <strong>A different mean shifts the curve along the x-axis, but does not alter its shape.<\/strong><\/p>\n\n\n\n<p><strong>Standard Deviation and Variance<\/strong><\/p>\n\n\n\n<p>In describing Normally distributed data, the standard deviation and variance are used. 
They are a measure of the <strong><em>spread (or variability) <\/em><\/strong>of the data.<\/p>\n\n\n\n<p>The variance is the standard deviation (\u03c3) squared, or:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(Variance = \\sigma^2\\)<\/div>\n\n\n\n<p>or<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(\\sigma = \\sqrt{Variance} \\)<\/div>\n\n\n\n<p>In general terms, for a sample of n observations:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\( Variance(X) = \\frac{1}{n-1} \\sum_{i=1}^{n} (X(i) - Mean(X))^2 \\) <\/div>\n\n\n\n<p>The term<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\( \\sum_{i=1}^{n} (X(i) - Mean(X))^2 \\)<\/div>\n\n\n\n<p>is the sum of the squares about the mean.<\/p>\n\n\n\n<p>Returning to the example (with mean 170 cm), the sum of the squares about the mean is:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(SumSquares = \\sum_{i=1}^{n} (Height(i) - Mean(Height))^2  \\)<\/div>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(SumSquares = 5 \\cdot (168-170)^2 + 25 \\cdot (169-170)^2 + \\)\n\\(40 \\cdot (170-170)^2 + \\)\n\\(25 \\cdot (171-170)^2 + 5 \\cdot (172-170)^2 \\)<\/div>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(SumSquares = 5 \\cdot (-2)^2 + 25 \\cdot (-1)^2 + 40 \\cdot (0)^2 + \\)\n\\(25 \\cdot (1)^2 + 5 \\cdot (2)^2 \\)<\/div>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(SumSquares = 20 + 25 + 0 + 25 + 20 = 90 \\)<\/div>\n\n\n\n<p>So the variance is:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(Variance = \\frac{1}{100-1} \\cdot 90 \\approx 0.91 \\)<\/div>\n\n\n\n<p>And the standard deviation is:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(SD = \\sqrt{0.91} \\approx 0.95 \\)<\/div>\n\n\n\n<p>Again, let&#8217;s look at a Normal distribution with a mean of zero and a standard deviation of 1:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"253\" height=\"198\" 
src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/sd1.png\" alt=\"\" class=\"wp-image-3618\"\/><\/figure>\n\n\n\n<p>The same distribution, but with standard deviation 0.5:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"244\" height=\"194\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/sd0.5.png\" alt=\"\" class=\"wp-image-3617\"\/><\/figure>\n\n\n\n<p>And with standard deviation 2:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"244\" height=\"194\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/sd2.png\" alt=\"\" class=\"wp-image-3619\"\/><\/figure>\n\n\n\n<p>So the standard deviation is a measure of the spread (or variability) of the data. When the standard deviation decreases, the curve becomes narrower and steeper (data closer together). When the standard deviation increases, the curve becomes wider and flatter (data further spread apart).<\/p>\n\n\n\n<p><strong>If the data are Normally distributed, the distribution of data can be described by two parameters: the mean and the standard deviation (or variance).<\/strong><\/p>\n\n\n\n<p>As stated above, the standard deviation is a measure of the spread of data. It can be shown that 68.27% of the data lie in an interval plus or minus one standard deviation from the mean. 
Similarly, 95.45% of the data lie in an interval plus or minus twice the standard deviation from the mean, and 99.73% of the data lie in an interval plus or minus three times the standard deviation from the mean.<\/p>\n\n\n\n<p>Or, rounded:<\/p>\n\n\n\n<p><strong>Mean \u00b1 1 <\/strong>\u00d7 <strong>SD = 68 %<\/strong><\/p>\n\n\n\n<p><strong>Mean \u00b1 2 <\/strong>\u00d7 <strong>SD = 95 % (more accurately, 95 % corresponds to 1.96 times the SD)<\/strong><\/p>\n\n\n\n<p><strong>Mean \u00b1 3 <\/strong>\u00d7 <strong>SD = 99.7 %<\/strong><\/p>\n\n\n\n<p>This is shown in the graphs below.<\/p>\n\n\n\n<p>Mean \u00b1 1 standard deviation (68% of the data in interval):<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"173\" height=\"178\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/oncesd.png\" alt=\"\" class=\"wp-image-3287\"\/><\/figure>\n\n\n\n<p>Mean \u00b1 2 standard deviations (95% of the data in interval):<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"175\" height=\"180\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/twicesd.png\" alt=\"\" class=\"wp-image-3857\"\/><\/figure>\n\n\n\n<p>Mean \u00b1 3 standard deviations (99.7% of the data in interval):<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"175\" height=\"180\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/thricesd.png\" alt=\"\" class=\"wp-image-3813\"\/><\/figure>\n\n\n\n<p>In our example, the mean was 170 cm and the standard deviation was 0.95 cm.<\/p>\n\n\n\n<p>Therefore, assuming the heights are Normally distributed, approximately:<\/p>\n\n\n\n<p>68% of the data are between 169.05 cm and 170.95 cm<\/p>\n\n\n\n<p>95% of the data are between 168.1 cm and 171.9 cm<\/p>\n\n\n\n<p>99.7% of the data are between 167.15 cm and 172.85 cm<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Suppose we measured the heights of 100 people to the nearest centimetre. 
The measured variable is called &#8216;Height&#8217; and there are 100 variates (individual measurements). The number of observations is n = 100 and are summarised in the table below: The data can also be plotted: If the data points are connected, a distribution plot [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"inline_featured_image":false,"footnotes":""},"class_list":["post-578","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/578","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/comments?post=578"}],"version-history":[{"count":4,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/578\/revisions"}],"predecessor-version":[{"id":4839,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/578\/revisions\/4839"}],"wp:attachment":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/media?parent=578"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}