Essentially, a regression plot is a scatter plot with a fitted regression line. The fitted line can be linear, quadratic, or a higher-order polynomial, among other forms. The example below shows how to create a linear regression plot for the first of Anscombe’s four data sets. Download the anscombe.rda dataset for this example.¹
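The difference between a linear and a quadratic regression line comes down to the model formula given to lm(). As a minimal sketch, the example below uses the copy of Anscombe’s quartet that ships with base R (the anscombe data frame, whose columns are named x1 to x4 and y1 to y4, unlike the downloaded file) and fits both a straight line and a quadratic to the first data set. The object names linear_fit and quadratic_fit are illustrative.

```r
# Anscombe's quartet ships with base R in the datasets package
data(anscombe)

# Linear fit: y1 = b0 + b1 * x1
linear_fit <- lm(y1 ~ x1, data = anscombe)

# Quadratic fit: y1 = b0 + b1 * x1 + b2 * x1^2
# I() makes ^ mean arithmetic squaring inside a model formula
quadratic_fit <- lm(y1 ~ x1 + I(x1^2), data = anscombe)

print(coef(linear_fit))     # intercept about 3.00, slope about 0.50
print(coef(quadratic_fit))  # three coefficients: intercept, x1, x1^2
```

Either formula can later be handed to a plotting function as the shape of the fitted line; only the linear one is used in the rest of this example.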
As discussed earlier, start by creating a scatter plot. Anscombe’s four fictional data sets can be displayed by typing the name of the data frame:
anscombe.quartet
   X1    Y1 X2   Y2 X3    Y3 X4    Y4
1  10  8.04 10 9.14 10  7.46  8  6.58
2   8  6.95  8 8.14  8  6.77  8  5.76
3  13  7.58 13 8.74 13 12.74  8  7.71
4   9  8.81  9 8.77  9  7.11  8  8.84
5  11  8.33 11 9.26 11  7.81  8  8.47
6  14  9.96 14 8.10 14  8.84  8  7.04
7   6  7.24  6 6.13  6  6.08  8  5.25
8   4  4.26  4 3.10  4  5.39 19 12.50
9  12 10.84 12 9.13 12  8.15  8  5.56
10  7  4.82  7 7.26  7  6.42  8  7.91
11  5  5.68  5 4.74  5  5.73  8  6.89
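If anscombe.rda is not at hand, a data frame with the same values can be rebuilt from the copy of Anscombe’s quartet in base R (datasets::anscombe), whose columns are named x1 to x4 and y1 to y4 and appear in a different order. The sketch below assumes the downloaded object is called anscombe.quartet and has exactly the columns printed above; it reorders and upper-cases the built-in columns to match, so the ggplot2 code that follows should work unchanged.

```r
# Reorder base R's anscombe columns to the printed layout (X1, Y1, X2, Y2, ...)
column_order <- c("x1", "y1", "x2", "y2", "x3", "y3", "x4", "y4")
anscombe.quartet <- datasets::anscombe[, column_order]

# Upper-case the names so the plotting code can refer to X1 and Y1
names(anscombe.quartet) <- toupper(names(anscombe.quartet))

anscombe.quartet
```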
The first data set has X1 on the x-axis and Y1 on the y-axis. To create a scatter plot of it:
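regressionplot <- ggplot(data = anscombe.quartet, aes(x = X1, y = Y1)) +
  geom_point() +
  ggtitle(label = "Anscombe\'s First Data Set") +
  theme_bw()
The backslash before the apostrophe escapes it. Because the title is enclosed in double quotes, the escape is optional here: "Anscombe's First Data Set" gives exactly the same title. It becomes necessary only when the string is delimited by single quotes, as in 'Anscombe\'s First Data Set', where an unescaped apostrophe would mark the end of the string.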
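A quick check at the console (base R only; the variable names are illustrative) shows that the three spellings produce the same string, while a stray space before the apostrophe becomes part of the title:

```r
# Three ways of writing the same title string
double_quoted  <- "Anscombe's First Data Set"   # apostrophe needs no escape
escaped_double <- "Anscombe\'s First Data Set"  # escape allowed but optional
single_quoted  <- 'Anscombe\'s First Data Set'  # escape required here

identical(double_quoted, escaped_double)  # TRUE
identical(double_quoted, single_quoted)   # TRUE

# A space before \' is kept in the text, so this title differs
identical("Anscombe \'s First Data Set", double_quoted)  # FALSE
```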
To add a regression line with a 95% confidence interval:
# x, y and data are inherited from the ggplot() call, so they need not be repeated
regressionplot <- regressionplot +
  geom_smooth(method = 'lm')
regressionplot
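This displays the scatter plot with the fitted line and its shaded 95% confidence interval:
[Figure: Anscombe’s First Data Set, Y1 against X1 with a linear regression line and 95% confidence band]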
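As a base R sketch of what the shaded band represents: geom_smooth(method = 'lm') fits an ordinary least-squares line and, with its defaults se = TRUE and level = 0.95, shades a pointwise 95% confidence interval for the fitted mean. The same quantities can be computed with lm() and predict(). This sketch uses the built-in datasets::anscombe copy (lower-case column names), and the names fit, new_x and band are illustrative.

```r
# Same linear model that geom_smooth(method = 'lm') fits for the first data set
fit <- lm(y1 ~ x1, data = datasets::anscombe)

# Evaluate the fitted line at five points spanning the observed x range (4 to 14)
new_x <- data.frame(x1 = seq(min(datasets::anscombe$x1),
                             max(datasets::anscombe$x1),
                             length.out = 5))

# interval = "confidence" returns the fitted value plus lower and upper 95% limits
band <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)
cbind(new_x, band)
```

The band is narrowest at the mean of x1 (9) and widens toward the ends of the range, which is why the shaded area flares out at the edges of the plot.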
To draw the regression line without the confidence interval, set se = FALSE:
regressionplot2 <- ggplot(data = anscombe.quartet, aes(x = X1, y = Y1)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  ggtitle(label = "Anscombe\'s First Data Set") +
  theme_bw()
regressionplot2
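This displays the same scatter plot and regression line without the shaded band:
[Figure: Anscombe’s First Data Set, Y1 against X1 with a linear regression line and no confidence band]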
It is customary to put the independent (explanatory or predictor) variable on the x-axis (abscissa) and the dependent (response or outcome) variable on the y-axis (ordinate). However, it is not always clear which variable is dependent and which independent.
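The choice matters because least-squares regression of y on x minimises vertical distances, while regressing x on y minimises horizontal ones, so the two fitted lines differ unless the points lie exactly on a line. A minimal sketch on the first Anscombe data set (built-in datasets::anscombe; the variable names are illustrative) expresses both slopes as a change in y per unit of x:

```r
x <- datasets::anscombe$x1
y <- datasets::anscombe$y1

# Regress y on x (minimises vertical distances)
slope_y_on_x <- coef(lm(y ~ x))[["x"]]

# Regress x on y (minimises horizontal distances), then invert the slope
# so that both lines are expressed as dy/dx
slope_x_on_y <- 1 / coef(lm(x ~ y))[["y"]]

c(y_on_x = slope_y_on_x, x_on_y = slope_x_on_y)

# The two slopes differ by a factor of 1 / r^2
all.equal(slope_x_on_y, slope_y_on_x / cor(x, y)^2)  # TRUE
```

With a correlation of roughly 0.82 in this data set, the line from regressing x on y is noticeably steeper than the one drawn by geom_smooth(), which always treats the aesthetic mapped to x as the predictor.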