Data and Plots | Stats Book

DATA TYPES

Quantitative
Discrete
Continuous

Qualitative
Nominal
Ordinal
Binary

Variable:

Actual property measured by individual observations

Variate:

Single score or reading of a given variable

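Normal distribution (e.g. height):

$f(Height)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left ( -\frac{(Height-Mean)^2} {2\sigma^2}\right )$

Here Mean and $\sigma$ are the mean and standard deviation defined below. With Mean = 0 and $\sigma$ = 1 this reduces to the standard Normal curve.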
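As a rough illustration, the Normal density can be evaluated directly in Python. This is only a sketch: the function name `normal_pdf` and the height parameters (mean 170 cm, SD 8 cm) are assumptions made up for the example, not values from these notes.

```python
import math

def normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Density of a Normal distribution with the given mean and SD at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# Hypothetical height distribution: mean 170 cm, SD 8 cm.
peak = normal_pdf(170, mean=170, sd=8)     # density at the mean (the peak)
two_sd = normal_pdf(186, mean=170, sd=8)   # density two SDs above the mean
print(peak, two_sd)
```

The density is highest at the mean and symmetric around it, so `normal_pdf(154, 170, 8)` equals `normal_pdf(186, 170, 8)`.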

Mean:

$Mean(x)=\frac{1}{n} \sum_{i=1}^{n} x(i)$

Variance:

$Variance(x)=\frac{1}{n-1}\sum_{i=1}^{n} (x(i)-Mean(x)) ^{2}$

Standard Deviation:

$\sigma=\sqrt{Variance}$
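The mean, variance and standard deviation formulas can be checked with a few lines of Python. This is a minimal sketch using only the standard library. The sample `heights` is invented for illustration. `statistics.variance` and `statistics.stdev` divide by n - 1, matching the variance formula given here.

```python
import math
import statistics

# Hypothetical sample of heights in cm (illustrative values only).
heights = [160.0, 165.0, 170.0, 175.0, 180.0]

n = len(heights)
mean = sum(heights) / n
# Sample variance: sum of squared deviations divided by n - 1.
variance = sum((x - mean) ** 2 for x in heights) / (n - 1)
sd = math.sqrt(variance)

# The standard library functions give the same results.
assert math.isclose(mean, statistics.mean(heights))
assert math.isclose(variance, statistics.variance(heights))
assert math.isclose(sd, statistics.stdev(heights))

print(mean, variance, sd)
```

For this sample the mean is 170, the variance is 62.5 and the standard deviation is about 7.9.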

If the data are Normally distributed, the distribution of data can be described by two parameters:

Mean
Standard deviation (or variance)

Mean ± 1 × SD contains about 68 % of observations

Mean ± 2 × SD contains about 95 % of observations (more accurately 1.96 times SD)

Mean ± 3 × SD contains about 99.7 % of observations
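These coverage figures follow from the Normal distribution itself and can be reproduced with the error function. The helper name `coverage_within` is an assumption for this sketch.

```python
import math

def coverage_within(k: float) -> float:
    """Fraction of a Normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(f"Mean +/- {k} x SD covers {coverage_within(k):.2%}")
```

The results are roughly 68.3 %, 95.0 %, 95.4 % and 99.7 %, which is why 1.96 rather than 2 is the exact multiplier for 95 %.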

A different mean shifts the curve along the x-axis, but does not alter its shape. When the standard deviation decreases, the curve becomes steeper; when the standard deviation increases, the curve becomes flatter.

Non-Normal distribution:

Range:

Interval from lowest to highest value

Interquartile range:

Difference between upper and lower quartiles

Mode:

Most common category

Median:

Middle value, with an equal number of measurements above and below
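The four summaries above can be sketched in Python with the standard library. The sample `values` is invented for illustration. Quartile conventions differ between textbooks and software, so the interquartile range below (using the `"inclusive"` method of `statistics.quantiles`, Python 3.8+) is one reasonable choice rather than the only one.

```python
import statistics

# Hypothetical right-skewed sample (illustrative values only).
values = [1, 2, 2, 3, 4, 5, 9, 20]

# Range: distance from the lowest to the highest value.
data_range = max(values) - min(values)

# quantiles(n=4) returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

mode = statistics.mode(values)      # most common value
median = statistics.median(values)  # middle value

print(data_range, iqr, mode, median)
```

Unlike the mean and standard deviation, the median and interquartile range are barely affected by the single large value 20.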

A skewed curve is skewed to the right if the ‘tail’ is on the right side and skewed to the left if the ‘tail’ is on the left side.

Confidence interval:

The sample mean is an unbiased estimator of the population mean.

Central limit theorem: the distribution of the mean (from different samples) will be approximately Normal when the samples are large enough, even if the samples or population are not Normally distributed.
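A small simulation can make the central limit theorem concrete. This is a sketch under stated assumptions: the population is exponential with mean 1 (strongly right-skewed), the sample size of 50 and the 5000 repetitions are arbitrary choices, and `sample_mean` is a helper name made up for the example.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n: int) -> float:
    """Mean of n draws from an exponential distribution with mean 1."""
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(5000)]

# Theory: the means centre on 1.0 with spread 1 / sqrt(n).
se_theory = 1.0 / math.sqrt(n)
within = sum(abs(m - 1.0) <= 1.96 * se_theory for m in means) / len(means)

print(statistics.mean(means))   # close to the population mean, 1.0
print(statistics.stdev(means))  # close to se_theory, about 0.14
print(within)                   # close to 0.95 if the means are near Normal
```

Although individual exponential values are far from Normal, the sample means cluster symmetrically around the population mean.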

The distribution of the mean is approximately Normal, centred on the population mean (estimated by the sample mean), with the standard error of the mean (SEM) as the measure of dispersion (n is the sample size):

$SEM=\frac{SD(sample)}{\sqrt{n}}$

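Confidence intervals can be constructed as the mean plus or minus a multiple of the standard error of the mean:

Mean ± 1 × SEM gives an approximately 68 % confidence interval

Mean ± 2 × SEM gives an approximately 95 % confidence interval (more accurately 1.96 times SEM)

Mean ± 3 × SEM gives an approximately 99.7 % confidence interval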
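The standard error and an approximate 95 % confidence interval can be computed as follows. This is a minimal sketch: the sample is invented, and the Normal multiplier 1.96 is used as in the text above. With a sample this small, a multiplier from the t distribution would normally be used instead, giving a wider interval.

```python
import math
import statistics

# Hypothetical sample (illustrative values only).
sample = [160.0, 165.0, 170.0, 175.0, 180.0]

n = len(sample)
mean = statistics.mean(sample)
# SEM = sample standard deviation / sqrt(n)
sem = statistics.stdev(sample) / math.sqrt(n)

# Approximate 95 % confidence interval using the Normal multiplier 1.96.
lower = mean - 1.96 * sem
upper = mean + 1.96 * sem

print(f"mean = {mean:.1f}, SEM = {sem:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Here the SEM is about 3.54, so the interval runs from roughly 163.07 to 176.93.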