Power Analysis

As stated, parametric statistics are preferred whenever possible. Parametric tests are more powerful than non-parametric tests, which means that fewer patients are required to demonstrate a statistically significant difference. For example, when analysing the same data, a t-test is more likely to demonstrate a statistically significant difference than a Wilcoxon test. Similarly, the Wilcoxon test is more powerful than the sign test.
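
The difference in power can be illustrated with a short simulation. The sketch below is not part of the original text and all numbers (group size, difference, standard deviation, number of simulated studies) are arbitrary assumptions; it simply applies both tests to the same simulated Normal data and counts how often each rejects the null hypothesis:

  set.seed(1)      # arbitrary seed, for reproducibility
  nsim  <- 2000    # number of simulated studies (assumption)
  n     <- 20      # patients per group (assumption)
  delta <- 10      # true difference between the group means (assumption)
  s     <- 15      # standard deviation (assumption)
  p.t <- p.w <- numeric(nsim)
  for (i in 1:nsim) {
    a <- rnorm(n, mean = 0,     sd = s)
    b <- rnorm(n, mean = delta, sd = s)
    p.t[i] <- t.test(a, b)$p.value        # parametric
    p.w[i] <- wilcox.test(a, b)$p.value   # non-parametric
  }
  mean(p.t < 0.05)   # estimated power of the t-test
  mean(p.w < 0.05)   # estimated power of the Wilcoxon test (typically slightly lower)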

Parametric tests are more powerful than non-parametric tests because the distribution of the data is better known. If we know more about the distribution, we can incorporate this knowledge into the statistical test. The mathematics is not discussed here, but it seems logical that the more we know about the distribution, the more powerful a test can be.

If the test we use makes assumptions about the distribution of our data, we also limit its use to specific conditions. For example, a t-test can only be used if the data conform to a Normal distribution. So, in general, the more powerful a test, the more restricted its use.
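
As an illustrative sketch (the height values below are invented for this example), the Normality assumption can be checked in R before deciding between a t-test and its non-parametric counterpart:

  heights <- c(176, 173, 178, 171, 180, 169, 182, 167, 184, 165)   # hypothetical sample
  shapiro.test(heights)              # Shapiro-Wilk test of Normality
  qqnorm(heights); qqline(heights)   # visual check with a Q-Q plot
  # If Normality is plausible, the (more powerful) one sample t-test can be used;
  # otherwise, fall back to the Wilcoxon signed rank test.
  t.test(heights, mu = 175)
  wilcox.test(heights, mu = 175)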

Statistical power is the probability of correctly rejecting a false null hypothesis. Typically, a power of 80% or higher is accepted in research.
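
The function power.t.test(), used later in this chapter to estimate sample sizes, can also be turned around to calculate the power for a fixed group size. In the sketch below the numbers (25 patients per group, delta = 10, sd = 15) are arbitrary assumptions for illustration:

  pw <- power.t.test(n=25, delta=10, sd=15, sig.level=0.05)$power
  pw       # statistical power (1 - beta)
  1 - pw   # beta: the probability of a type 2 error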

When a statistical test is significant, there is sufficient power. However, a non-significant test could be underpowered (there may be a difference, but it has not been demonstrated by the data: a type 2 error), or there may truly be no difference. In this respect, some journals suggest a post-hoc (rather than a priori) power analysis. However, a post-hoc power analysis will only confirm what is already known and cannot differentiate between an underpowered study and a true absence of difference.

Power analysis is performed a priori (beforehand) to determine the number of patients required in a study to detect a certain difference (effect) between study groups and to minimise the chance of a type 2 error. The exact mathematics is beyond the scope of this book. Furthermore, power calculations depend on what type of statistical analysis will be performed. However, in order to estimate the sample size, an estimate is required of:

  1. Difference desired to detect (effect size)
    • Difference in means
  2. Dispersion / spread of data
    • Standard deviation
  3. Significance level (α)
    • Type 1 error
  4. Statistical power (1 – β)
    • Type 2 error
  5. Type of statistical test
    • Test statistic

    Power Analysis in R:

    Power analysis is performed to estimate how many patients would be required in a study. It is related to the type two error: incorrectly accepting the null hypothesis (β). Obviously, we want to make this chance as small as possible, but for a fixed sample size, the smaller the chance of making a type two error, the greater the chance of making a type one error. Statisticians normally suggest setting β = 0.2. The power of the test is then 1 – β = 0.8, or 80%.

    To calculate an estimated sample size the following information is required:

    • Variance (or standard deviation)
    • Difference desired to detect (δ)
    • Significance level (α): probability of making a type one error, usually 5%
    • Statistical power required: usually set at 80%
    • What type of test (t-test: one sample, two sample, paired, unpaired, etc)

    For example:

    How many patients are required in each of two groups to detect a difference in height of 10 % with a t-test that has a power of 80 % and a significance level of 5 %, given a mean of 175 cm and a standard deviation of 20 cm?

    • The standard deviation is 20 cm (sd = 20 cm)
    • The difference to detect is 10 % of 175 cm: delta = 17.5 cm
    • The significance level is 5 %: sig.level = 0.05
    • Power = 80 % = 0.8

    Enter in the R console:

    power.t.test(sd=20,delta=17.5,sig.level=0.05,power=0.8)
      Two-sample t test power calculation 
                  n = 21.50714
              delta = 17.5
                 sd = 20
          sig.level = 0.05
              power = 0.8
        alternative = two.sided
    NOTE: n is number in *each* group

    The calculated number of patients in each group is normally rounded up. Even if the result of the power analysis had been 21.01 patients in each group, this would have been rounded up to 22.

    The example above suggests 22 patients in each of the two groups (study size 44). Normally, it is recommended to err on the side of caution and increase this number to allow for unforeseen circumstances such as loss to follow-up.
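
    For completeness, here is a small sketch (not part of the original example) of how the rounding and a safety margin could be handled directly in R; the 10 % allowance for loss to follow-up is an arbitrary assumption:

    n.exact <- power.t.test(sd=20,delta=17.5,sig.level=0.05,power=0.8)$n
    n.group <- ceiling(n.exact)     # round up: 22 patients per group
    ceiling(n.group / (1 - 0.10))   # allow for roughly 10% loss to follow-up: 25 per group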

    Similarly, if we wanted to detect a 5 % difference (delta = 8.75):

    power.t.test(sd=20,delta=8.75,sig.level=0.05,power=0.8)
         Two-sample t test power calculation 
                  n = 82.98415
              delta = 8.75
                 sd = 20
          sig.level = 0.05
              power = 0.8
        alternative = two.sided
    NOTE: n is number in *each* group

    Power analysis recommends 83 patients in each group (study size 166).
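
    As a further hedged sketch using the same standard deviation, significance level and power as above, the required group size for several differences can be calculated in one go, which shows how quickly the sample size grows as the difference to detect shrinks:

    deltas <- c(17.5, 8.75, 4.375)   # 10%, 5% and 2.5% of the 175 cm mean
    sapply(deltas, function(d)
      ceiling(power.t.test(sd=20,delta=d,sig.level=0.05,power=0.8)$n))
    # per-group sizes: 22, 83 and roughly 330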

    Another fantastic program that is specifically created for power analysis is G*Power, published by the University of Düsseldorf 1. The program is freely available for Windows and Mac.