Suppose we are interested in finding out whether nutritional status differs between patients who present to our outpatient department with cancer and patients who do not have cancer. Our impression is that patients with cancer have a worse nutritional status than other patients. We would like to test whether this impression is correct.
In order to test this statement, we need to formulate a hypothesis. The null hypothesis states that there is no difference between the two groups of patients.
Null hypothesis: No difference between study groups
The counterpart of the null hypothesis is the alternative hypothesis:
Alternative hypothesis: There is a difference between study groups
It may seem a bit strange, but it is common practice in statistical testing to formulate a null hypothesis (i.e. there is no difference) and then try to refute it with a statistical test. We accept the alternative hypothesis if we can demonstrate that our patients with cancer have either a worse or a better nutritional status than the other patients (two-sided testing). With our statistical test we are therefore trying to show that the null hypothesis is incorrect.
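Written out in symbols, the two hypotheses could look as follows. This is only a sketch: it assumes the groups are compared on the mean of some outcome measure, and the symbols \mu_{\text{cancer}} and \mu_{\text{other}} (the population means in the two groups) are our own notation, not something defined in the text above.

```latex
\begin{align*}
H_0 &: \mu_{\text{cancer}} = \mu_{\text{other}}    && \text{(null hypothesis: no difference)} \\
H_1 &: \mu_{\text{cancer}} \neq \mu_{\text{other}} && \text{(alternative hypothesis, two-sided)}
\end{align*}
```

The "not equal" sign is what makes the test two-sided: a difference in either direction counts as evidence against the null hypothesis.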
First, we need to define an outcome measure. We could use several variables for this (such as albumin concentration in peripheral blood, body mass index, etc.). We decided to use the thickness of the biceps skin fold as our outcome measure. The data is in the skinfold.rda data set.
So, the variable we use as our outcome measure is the biceps skin fold thickness. We measure this with a special instrument that measures the thickness of the skin fold in millimetres. The data collected is therefore continuous data.
Next, we need to define how certain we would like to be by setting the significance level, that is, the threshold against which the p value from our test will be compared. At first, one would say 100%! However, nothing is certain (and not even that!).
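To make the role of this threshold concrete, here is a minimal sketch in Python of how a p value is compared against it. Several things are assumptions made purely for illustration: the skin fold values are hypothetical numbers, not the contents of skinfold.rda; the threshold of 0.05 is a conventional placeholder, not a value taken from the text; and the test shown is a simple two-sided permutation test, chosen only because it needs nothing beyond the standard library. The function names permutation_p_value and decide are likewise our own.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Under the null hypothesis the group labels are interchangeable, so we
    shuffle the pooled values and count how often a random split gives a
    difference at least as large as the one actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction: the estimated p value can never be exactly zero.
    return (extreme + 1) / (n_permutations + 1)


def decide(p_value, alpha=0.05):
    """Compare the p value with the chosen threshold alpha (assumed 0.05)."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical biceps skin fold thickness in mm (illustration only).
cancer = [3.2, 4.1, 2.8, 3.5, 3.0, 2.6, 3.9, 3.3]
other = [5.1, 4.8, 6.0, 5.5, 4.4, 5.9, 5.2, 4.7]

p = permutation_p_value(cancer, other)
print(f"p value: {p:.4f} -> {decide(p)}")
```

A smaller threshold means we demand stronger evidence before rejecting the null hypothesis, but no threshold above zero removes the chance of being wrong altogether.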