Box and whisker plots, or simply box plots, are used to present continuous data that can be grouped into categories. The box represents the interquartile range, and the horizontal line within it marks the median. The whiskers extend to the upper and lower adjacent values, that is, the most extreme observations that lie within 1.5 times the interquartile range of the box edges (although the definition may vary between software programs). Any value outside the whiskers is marked separately as an ‘outlier’.
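The whisker rule can be made concrete with a few lines of base R. The sketch below uses a small made-up vector (x is purely illustrative and not part of the dataset) and the default quartile definition of quantile(); other programs may compute the quartiles slightly differently.

```r
# Hypothetical example values; 30 is deliberately far from the rest
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q1  <- unname(quantile(x, 0.25))  # lower edge of the box
q3  <- unname(quantile(x, 0.75))  # upper edge of the box
iqr <- q3 - q1                    # interquartile range (height of the box)

# Limits beyond which a value is drawn as a separate point
lower_limit <- q1 - 1.5 * iqr
upper_limit <- q3 + 1.5 * iqr

# The whiskers end at the most extreme observations inside the limits
inside        <- x[x >= lower_limit & x <= upper_limit]
lower_whisker <- min(inside)
upper_whisker <- max(inside)

# Everything outside the limits is marked as an outlier
outliers <- x[x < lower_limit | x > upper_limit]
```

With these values the whiskers would run from 1 to 9, and 30 would be drawn as a separate point.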
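Download and open the plotbox.rda dataset for this example (for instance with load('plotbox.rda') if the file is in the working directory). The dataset contains the maximum flexion in 20 patients before and after a manipulation under anaesthesia (MUA). Typing the name of the data frame prints the data: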
plotbox
   Pre Post
1   94  115
2   95  103
3   89  113
4   89  122
5  101  103
6   76   93
7  102  118
8  104  102
9  103   81
10  93  102
11 101  124
12  92  130
13  77  103
14  95  125
15  89  108
16  95  106
17  92  105
18  84   89
19  83  113
20  86   86
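If the plotbox.rda file is not at hand, the same data can be rebuilt from the values printed above. This is only a convenience sketch; it assumes that plotbox is an ordinary data frame with the two numeric columns shown.

```r
# Rebuild the example data from the printed listing
plotbox <- data.frame(
  Pre  = c(94, 95, 89, 89, 101, 76, 102, 104, 103, 93,
           101, 92, 77, 95, 89, 95, 92, 84, 83, 86),
  Post = c(115, 103, 113, 122, 103, 93, 118, 102, 81, 102,
           124, 130, 103, 125, 108, 106, 105, 89, 113, 86)
)

# Check the structure: 20 observations of 2 numeric variables
str(plotbox)
```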
To create a box and whisker plot of the pre-MUA data, load the ggplot2 package and map the Pre column to the y-axis:
library(ggplot2)
pre <- ggplot(data=plotbox, aes(y=Pre,x='PreMUA')) +
geom_boxplot()
pre
Add a black and white theme, a title and axis labels:
pre <- pre +
ggtitle('Preoperative flexion') +
xlab(label='Status') +
ylab(label='max flexion [deg]') +
theme_bw()
pre
To create a similar box and whisker plot of the post-MUA flexion:
post <- ggplot(data=plotbox, aes(y=Post,x='PostMUA')) +
geom_boxplot() +
ggtitle('Postoperative flexion') +
xlab(label='Status') +
ylab(label='max flexion [deg]') +
theme_bw()
post
The pre-MUA plot looks roughly symmetrical, with the median near the middle of the box. The post-MUA box, however, is skewed: the median lies in the lower part of the box, so the longer tail points towards the higher values (right skew). This is also reflected in the descriptive statistics:
summary(plotbox$Pre)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  76.00   88.25   92.50   92.00   96.50  104.00
summary(plotbox$Post)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   81.0   102.0   105.5   107.0   115.8   130.0
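The mean and median are approximately the same pre-MUA (92.0 and 92.5), which is consistent with Normally distributed data, although it does not prove it. Post-MUA the mean (107.0) lies above the median (105.5), so a Normal distribution seems less likely.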
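The position of the median inside the box can also be expressed as a number. One option, which is not part of the original analysis, is the quartile (Bowley) skewness coefficient, (Q3 + Q1 - 2 x median) / (Q3 - Q1): it is 0 when the median sits exactly in the middle of the box and positive when the median lies towards the lower edge. The helper name quartile_skew is made up for this sketch, and the inputs are the rounded quartiles from the summary() output above.

```r
# Quartile (Bowley) skewness from the three quartiles
quartile_skew <- function(q1, med, q3) {
  (q3 + q1 - 2 * med) / (q3 - q1)
}

# Quartiles as printed by summary(plotbox$Pre) and summary(plotbox$Post)
skew_pre  <- quartile_skew(88.25, 92.5, 96.5)
skew_post <- quartile_skew(102.0, 105.5, 115.8)

round(skew_pre, 2)    # -0.03, median almost exactly in the middle
round(skew_post, 2)   # 0.49, median clearly in the lower part of the box
```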
It would be nice to show both plots in one:
both <- ggplot(data=plotbox) +
geom_boxplot(aes(y=Pre,x='Pre MUA')) +
geom_boxplot(aes(y=Post,x='Post MUA')) +
ggtitle('Flexion before and after MUA') +
xlab(label='Status') +
ylab(label='max flexion [deg]')+
theme_bw()
both
This shows both box plots in one figure, but the categories on the x-axis appear in alphabetical order, so ‘Post MUA’ is drawn before ‘Pre MUA’. The easiest way to change this is to reshape the data into tidy (long) format, with one row per measurement. Create a data frame called ‘muapre’ for all preoperative data; the first column is called ‘flexion’ and the second ‘group’ (which is “Pre” in every row). Similarly, create a ‘muapost’ data frame for all postoperative data with the same column names:
muapre<-data.frame(flexion=plotbox$Pre, group='Pre')
muapre
   flexion group
1       94   Pre
2       95   Pre
.....
.....
19      83   Pre
20      86   Pre
muapost<-data.frame(flexion=plotbox$Post, group='Post')
muapost
   flexion group
1      115  Post
2      103  Post
.....
.....
19     113  Post
20      86  Post
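Now bind the rows together with rbind() to create a new data frame called mua. Then convert group to a factor whose levels are listed in the desired plotting order; since R 4.0, data.frame() keeps strings as character vectors, and ggplot2 would again sort those alphabetically:
mua<-rbind(muapre,muapost)
mua$group<-factor(mua$group, levels=c('Pre','Post'))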
mua
   flexion group
1       94   Pre
2       95   Pre
.....
.....
39     113  Post
40      86  Post
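The same long layout can also be produced in a single step. As a hedged alternative to the manual approach above, base R's stack() puts all values in one column and records the source column in a factor whose levels follow the column order. The sketch uses only the first three patients from the listing, and the names demo and mua_long are made up for illustration.

```r
# First three patients from the plotbox listing (illustration only)
demo <- data.frame(Pre = c(94, 95, 89), Post = c(115, 103, 113))

# stack() returns the columns 'values' and 'ind'; rename them to match mua
mua_long <- stack(demo)
names(mua_long) <- c('flexion', 'group')

# The grouping factor keeps the column order, so 'Pre' comes before 'Post'
levels(mua_long$group)
mua_long
```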
Now the ‘mua’ data-frame can be used to create a grouped box and whisker plot:
muaplot <- ggplot(data=mua, aes(y = flexion, x = group)) +
geom_boxplot() +
ggtitle(label='Flexion Following MUA') +
ylab(label='Flexion [deg]') +
theme_bw()
muaplot
