Statistics deals with uncertainty. When we make a statement; we also need to say how certain we are that this statement is correct. We would like this to be 100%, but realise that this is not possible. It is generally accepted (in medical statistics) that something is ‘proven’ if we are more than 95% certain. This means that there is a probability of 5% that our statement is *incorrect*. Or *p = 5%.*

**P Value Probability that statement is incorrect**

We usually call something ‘proven’ or ** statistically significant** if following a statistical test the p value < 5%.

**Statistically significant p < 5%**

That something is statistically significant doesn’t necessarily mean it is also ** clinically significant**. It might well be that, although statistically there is a difference, it is of no clinical importance.

Also, if we were unable to demonstrate a statistically significant difference; this doesn’t mean there is no difference. After all, if p = 5% there is a probability of 1 in 20 that our statement is incorrect! It might well be that with more patients in our study, we can demonstrate a significant difference (underpowered study).

If p = 5%, there is a probability of 1 in 20 that our statement is incorrect. However, in 19 out of 20 times our statement is correct. In a lot of cases we would like to be more certain than this (when we cross the road or fly in a plane!). Also when we buy a bottle of wine, we wouldn’t accept it if one in 20 bottles has less than 75 cl of wine!

If p = 1%, our statement is incorrect only 1 out of 100; whilst this is 1 in 1000 if p = 0.1%. The *smaller* the p value the *more* certain we are that what we say is true. Consequently, we aim for a *low* p value.

However, the lower the p value, the more patients would be required to demonstrate a statistically significant difference. Furthermore, in biological systems (unlike man made systems) the variation is large. Therefore, it is very unlikely to get p values much below 5% or 1%. If the p value is considerably lower than 1% it is more likely that we are trying to state the obvious or that the variables are dependent.

*In medical statistics we are usually satisfied something is statistically significant if p < 5%. If p < 5%, we feel this is unlikely to be due to chance and the null hypothesis is rejected in favour of the alternate hypothesis.*

For example, 29 patients that attended our outpatient department had their biceps skin fold measured. The results are listed in skinfold.rda.

The data are *not* in order of when the patient attended outpatient clinic, but grouped by diagnosis and increasing thickness of the skin fold. Patients either had ‘No Cancer’ or ‘Cancer’:

Now, we need to decide which ** statistical test** to use.

A statistical test can either be ** parametric** or

**.**

*non parametric*A parametric test can ** only** be used if the data are N

**. If the data are not Normally distributed, we can**

*ormally distributed***use parametric statistics and we will have to use a non parametric test.**

*not*