A mosaic plot displays several categorical variables in a single plot and is a graphical representation of a contingency table. The widths and heights of the individual cells represent proportions of the total. Within a column, every cell has the same width, which is proportional to the total count of that column. The height of each cell represents the proportion of that column's observations that fall into the cell's category. In effect, each column of a mosaic plot is a stacked bar, so the area of each cell represents the proportion of that combination of categories in the total.
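To make this geometry concrete, the following minimal sketch computes the quantities a two-variable mosaic plot would draw, using base R only. It assumes the built-in Titanic table collapsed to Class and Survived; the names `tab`, `col_width`, `cell_height` and `cell_area` are illustrative and not part of any package.

```r
# Collapse the 4-way Titanic table to a 2-way Class x Survived table
tab <- margin.table(Titanic, c(1, 4))

# Column widths: each Class's share of all observations
col_width <- prop.table(margin.table(tab, 1))

# Cell heights: share of each Survived level within a Class (each row sums to 1)
cell_height <- prop.table(tab, 1)

# Cell area = width x height; the vector of widths recycles down the rows,
# so row i of cell_height is multiplied by col_width[i]
cell_area <- cell_height * as.vector(col_width)

round(col_width, 3)
round(cell_height, 3)

# The areas reproduce each cell's share of the grand total
all.equal(as.vector(cell_area), as.vector(prop.table(tab)))
```

The final check confirms that width times height gives each combination's proportion of the total, which is why a mosaic plot can be read as a picture of the contingency table.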
To create a mosaic plot, the vcd (Visualizing Categorical Data) package [1] can be used. The example below shows how to create a mosaic plot using the Titanic data set included in R.
library(vcd)
Titanic
, , Age = Child, Survived = No

      Sex
Class  Male Female
  1st     0      0
  2nd     0      0
  3rd    35     17
  Crew    0      0

, , Age = Adult, Survived = No

      Sex
Class  Male Female
  1st   118      4
  2nd   154     13
  3rd   387     89
  Crew  670      3

, , Age = Child, Survived = Yes

      Sex
Class  Male Female
  1st     5      1
  2nd    11     13
  3rd    13     14
  Crew    0      0

, , Age = Adult, Survived = Yes

      Sex
Class  Male Female
  1st    57    140
  2nd    14     80
  3rd    75     76
  Crew  192     20
Note that the data are in contingency table format (not a data frame), which the mosaic function of the vcd package accepts directly. To convert a data frame to a contingency table, use the table function from base R.
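As a minimal sketch of that conversion, the code below first expands the Titanic table into a data frame with one row per person, then turns it back into a contingency table. The names `counts`, `passengers`, `tab` and `tab2` are illustrative; `xtabs` is shown as a base R alternative for data that are already aggregated into counts.

```r
# Long format: one row per combination of categories, with a Freq column
counts <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq

# Repeat each row Freq times to get one row per person (a plain data frame)
passengers <- counts[rep(seq_len(nrow(counts)), counts$Freq), 1:4]

# table() cross-tabulates the factor columns into a contingency table
tab <- table(passengers)

# xtabs() builds the same table from pre-aggregated counts via a formula
tab2 <- xtabs(Freq ~ Class + Sex + Age + Survived, data = counts)

# Both reproduce the original counts
all(tab == Titanic)
all(tab2 == Titanic)
```

Either object can then be passed to mosaic in the same way as the built-in table: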
mosaic(Titanic)
The real strength of a mosaic plot is that it can display the residuals of a chi-square test graphically: each cell is shaded according to the value of its Pearson residual. In effect, the plot becomes a graphical display of a chi-square test. To obtain such a plot, set the shade argument to TRUE:
mosaic(Titanic, shade = TRUE)
It is now very easy to see which categories are under-represented (red) and over-represented (blue) relative to the counts expected under independence.
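The residuals behind the shading can also be inspected numerically. The sketch below uses base R's chisq.test on a two-way table (Class by Survived) and recomputes the Pearson residuals by hand as (observed - expected) / sqrt(expected). This is an illustration on a collapsed table only: mosaic(Titanic, shade = TRUE) works on the full four-way table, so its residuals will differ from these. The names `tab`, `test`, `expected` and `pearson` are illustrative.

```r
# Two-way margin of the Titanic table: Class x Survived
tab <- margin.table(Titanic, c(1, 4))

# Chi-square test of independence; the result stores the Pearson residuals
test <- chisq.test(tab)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residuals computed by hand
pearson <- (tab - expected) / sqrt(expected)

# The hand-computed values agree with those returned by chisq.test
all.equal(as.vector(pearson), as.vector(test$residuals))

round(test$residuals, 2)
```

Positive residuals mark cells with more observations than expected under independence, negative residuals mark cells with fewer.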
In addition, it is possible to create custom mosaic plots with ggplot2. An example function for two categorical variables can be downloaded here and here. Please note that the function requires three arguments: data frame, variable1 and variable2.
For example, using the built-in mtcars dataset:
mosaicGG(mtcars, "cyl", "am")
     4  6  8
  0  3  4 12
  1  8  3  2

        Pearson's Chi-squared test

data:  table(data[[FILL]], data[[X]])
X-squared = 8.7407, df = 2, p-value = 0.01265

  FILL X    residual
1    0 4 -1.38175267
2    1 4  1.67045752
3    0 6 -0.07664242
4    1 6  0.09265616
5    0 8  1.27898721
6    1 8 -1.54622013
Warning message:
In chisq.test(table(data[[FILL]], data[[X]])) :
  Chi-squared approximation may be incorrect
Some of the expected frequencies are less than 5, so the chi-squared approximation may be unreliable, hence the warning message.
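The expected frequencies can be checked directly with base R. This short sketch rebuilds the same am by cyl table and extracts the expected counts from chisq.test; suppressWarnings is used only to keep the output clean, and the names `tab` and `expected` are illustrative.

```r
# Same cross-tabulation as in the output above: am (rows) by cyl (columns)
tab <- table(mtcars$am, mtcars$cyl)

# Expected counts under independence, as computed by chisq.test
expected <- suppressWarnings(chisq.test(tab))$expected
round(expected, 2)

# Number of cells with an expected count below 5
sum(expected < 5)
```

Three of the six cells have an expected count below 5, which is what triggers the warning.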
The resulting plot: