Scatter plots are commonly used in medicine to illustrate the relationship between two continuous variables. However, they can also be used to show discrete numerical and ordinal data.
Download the anscombe.rda dataset for this example 1.
Anscombe’s fictional data sets can be shown by:
anscombe.quartet
   X1    Y1 X2   Y2 X3    Y3 X4    Y4
1  10  8.04 10 9.14 10  7.46  8  6.58
2   8  6.95  8 8.14  8  6.77  8  5.76
3  13  7.58 13 8.74 13 12.74  8  7.71
4   9  8.81  9 8.77  9  7.11  8  8.84
5  11  8.33 11 9.26 11  7.81  8  8.47
6  14  9.96 14 8.10 14  8.84  8  7.04
7   6  7.24  6 6.13  6  6.08  8  5.25
8   4  4.26  4 3.10  4  5.39 19 12.50
9  12 10.84 12 9.13 12  8.15  8  5.56
10  7  4.82  7 7.26  7  6.42  8  7.91
11  5  5.68  5 4.74  5  5.73  8  6.89
The four data sets are X1 vs. Y1, X2 vs. Y2, X3 vs. Y3 and X4 vs. Y4. The X and Y variables have nearly identical means and standard deviations:
descriptive.table(vars = d(X1,X2,X3,X4,Y1,Y2,Y3,Y4), data = anscombe.quartet,
+ func.names = c("Valid N","Mean","Median","St. Deviation"))
$`strata: all cases `
   Valid N     Mean Median St. Deviation
X1      11 9.000000   9.00      3.316625
X2      11 9.000000   9.00      3.316625
X3      11 9.000000   9.00      3.316625
X4      11 9.000000   8.00      3.316625
Y1      11 7.500909   7.58      2.031568
Y2      11 7.500909   8.14      2.031657
Y3      11 7.500000   7.11      2.030424
Y4      11 7.500909   7.04      2.030579
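The same summary can be cross-checked without Deducer. The following is a minimal base R sketch, assuming R's built-in anscombe data frame (whose columns are lowercase x1 to y4) stands in for the anscombe.quartet object loaded from anscombe.rda; the name summary_stats is chosen here for illustration only.

```r
# Base R sketch: summary statistics for Anscombe's quartet.
# Assumption: uses R's built-in `anscombe` data frame (lowercase x1..y4)
# rather than the anscombe.quartet object from anscombe.rda.
data(anscombe)

# One row per variable, one column per statistic
summary_stats <- data.frame(
  valid_n = sapply(anscombe, function(v) sum(!is.na(v))),  # non-missing count
  mean    = sapply(anscombe, mean),
  median  = sapply(anscombe, median),
  sd      = sapply(anscombe, sd)
)

print(round(summary_stats, 4))
```

If the two data sources agree, the means and standard deviations printed here should match the descriptive.table output above to the displayed precision.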
It is important to plot data, rather than relying solely on descriptive statistics, so that the relationship between the variables can be appreciated. To plot the first data set:
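library(ggplot2)  # provides ggplot(), geom_point(), ggtitle() and theme_bw()
ggplot() +
  geom_point(aes(x = X1, y = Y1), data = anscombe.quartet) +  # one point per row
  ggtitle(label = 'Anscombe\'s First Data Set') +             # \' escapes the apostrophe
  theme_bw()                                                  # black-and-white theme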
If you are using a version of ggplot2 older than 0.9.2, the title can be set using: opts(title = 'Anscombe\'s First Data Set')
The backslash \ before the 's is required so that the quotation mark is treated as part of the title rather than as the end of the title's text string.
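R also accepts double quotes as string delimiters, so the escape can be avoided altogether. The short sketch below compares the two spellings; the variable names escaped_title and plain_title are illustrative and not part of the original example.

```r
# Two ways to write a title that contains an apostrophe.
escaped_title <- 'Anscombe\'s First Data Set'  # single quotes, apostrophe escaped
plain_title   <- "Anscombe's First Data Set"   # double quotes, no escape needed

# Both strings hold exactly the same characters
identical(escaped_title, plain_title)
```

Either form can be passed to ggtitle(); which one to use is a matter of taste.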
It is customary to put the independent (explanatory or predictor) variable on the x-axis (abscissa) and the dependent (response or outcome) variable on the y-axis (ordinate). However, it is not always clear which variable is dependent and which independent.
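R's formula notation mirrors this convention: in response ~ predictor, the left-hand side is drawn on the y-axis and the right-hand side on the x-axis. The following is a rough base R sketch that plots all four data sets this way, assuming the built-in anscombe data frame (lowercase x1 to y4) rather than anscombe.quartet, and base graphics rather than ggplot2; panel_formula and old_par are names introduced here for illustration.

```r
# Base R sketch (assumption: built-in `anscombe` data, lowercase x1..y4).
data(anscombe)

old_par <- par(mfrow = c(2, 2))  # draw the panels in a 2 x 2 grid
for (i in 1:4) {
  # reformulate() builds the formula y<i> ~ x<i>, i.e. response ~ predictor
  panel_formula <- reformulate(paste0("x", i), response = paste0("y", i))
  plot(panel_formula, data = anscombe,
       main = paste("Anscombe data set", i))
}
par(old_par)  # restore the previous plot layout
```

Swapping the two sides of the formula would swap the axes, which is a quick way to see how the choice of dependent variable changes the picture.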