{"id":2122,"date":"2018-01-01T23:10:39","date_gmt":"2018-01-01T23:10:39","guid":{"rendered":"http:\/\/pcool.dyndns.org:8080\/statsbook\/?page_id=2122"},"modified":"2025-06-30T17:10:14","modified_gmt":"2025-06-30T16:10:14","slug":"linear-discriminant-analysis","status":"publish","type":"page","link":"https:\/\/pcool.dyndns.org\/index.php\/linear-discriminant-analysis\/","title":{"rendered":"Linear Discriminant Analysis"},"content":{"rendered":"\n<p>Linear Discriminant Analysis (LDA) is a linear transformation technique of <strong>supervised<\/strong> machine learning (there needs to be a classifier: a variable holding the known class of each observation). The aim of the technique is to find linear combinations of the variables that separate the classes defined by the classifier as much as possible.<\/p>\n\n\n\n<p>To illustrate the method, R\u2019s built-in data set \u201ciris\u201d is used. The iris data set contains the sepal and petal length and width of 150 plants from three different iris species: Setosa, Versicolor and Virginica. Have a look at the iris data set:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>head(iris)<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>Sepal.Length Sepal.Width Petal.Length Petal.Width Species<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>1 5.1 3.5 1.4 0.2 setosa<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>2 4.9 3.0 1.4 0.2 setosa<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>3 4.7 3.2 1.3 0.2 setosa<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>4 4.6 3.1 1.5 0.2 setosa<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>5 5.0 3.6 1.4 0.2 setosa<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>6 5.4 3.9 1.7 0.4 setosa<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>str(iris)<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>'data.frame': 150 obs. 
of 5 variables:<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>$ Species : Factor w\/ 3 levels \"setosa\",\"versicolor\",..: 1 1 1 1 1 1 1 1 1 1 ...<\/em><\/span><\/code><\/pre>\n\n\n\n<p>Load the MASS <a href=\"https:\/\/pcool.dyndns.org\/index.php\/packages\/\" data-type=\"page\" data-id=\"22\">package<\/a>, which provides the lda() function:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>library(MASS)<\/em><\/span><\/code><\/pre>\n\n\n\n<p>Then perform the linear discriminant analysis, with Species as the classifier and the four measurements as predictors:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>lda_iris &lt;- lda(iris$Species ~ iris&#91;,1] + iris&#91;,2] + iris&#91;,3] + iris&#91;,4])<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>lda_iris<\/em><\/span>\n<em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#0813f2\" class=\"has-inline-color\">Call:\nlda(iris$Species ~ iris&#91;, 1] + iris&#91;, 2] + iris&#91;, 3] + iris&#91;, \n    4])\n\nPrior probabilities of groups:\n    setosa versicolor  virginica \n 0.3333333  0.3333333  0.3333333 \n\nGroup means:\n           iris&#91;, 1] iris&#91;, 2] iris&#91;, 3] iris&#91;, 4]\nsetosa         5.006     3.428     1.462     0.246\nversicolor     5.936     2.770     4.260     1.326\nvirginica      6.588     2.974     5.552     2.026\n\nCoefficients of linear discriminants:\n                 LD1         LD2\niris&#91;, 1]  0.8293776 -0.02410215\niris&#91;, 2]  1.5344731 -2.16452123\niris&#91;, 3] -2.2012117  
0.93192121\niris&#91;, 4] -2.8104603 -2.83918785\n\nProportion of trace:\n   LD1    LD2 \n0.9912 0.0088 <\/mark><\/em><\/code><\/pre>\n\n\n\n<p>Now use the model to predict the species of each plant:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><span style=\"color: #ff0000;\">lda_iris_predict<span class=\"s1\"> &lt;- predict(<\/span>lda_iris<span class=\"s1\">, <\/span>iris<span class=\"s1\">&#91;,1:4])<\/span><\/span><\/em><br><br><\/code><\/pre>\n\n\n\n<p class=\"p1\">Attach the predicted species (stored in lda_iris_predict$class) to the original iris data frame as a new variable called Predict and have a look at the top (head) of the data frame:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f2050d\" class=\"has-inline-color\">iris$Predict &lt;- lda_iris_predict$class<br>head(iris)<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#0628f1\" class=\"has-inline-color\"><br>  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Predict<br>1          5.1         3.5          1.4         0.2  setosa  setosa<br>2          4.9         3.0          1.4         0.2  setosa  setosa<br>3          4.7         3.2          1.3         0.2  setosa  setosa<br>4          4.6         3.1          1.5         0.2  setosa  setosa<br>5          5.0         3.6          1.4         0.2  setosa  setosa<br>6          5.4         3.9          1.7         0.4  setosa  setosa<\/mark><\/em><br><br><\/code><\/pre>\n\n\n\n<p class=\"p1\">The values of the discriminant functions can be found by:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f2020a\" class=\"has-inline-color\">lda_iris_predict$x&#91;,1] # values for the first discriminant function<\/mark><\/em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#030ef2\" class=\"has-inline-color\"><em>\n         1          2          3        
  4          5          6          7          8          9         10         11         12         13         14         15 \n 8.0617998  7.1286877  7.4898280  6.8132006  8.1323093  7.7019467  7.2126176  7.6052935  6.5605516  7.3430599  8.3973865  7.2192969  7.3267960  7.5724707  9.8498430 \n.....<\/em>\n.....<em>\n       136        137        138        139        140        141        142        143        144        145        146        147        148        149        150 \n-6.7967163 -6.5244960 -4.9955028 -3.9398530 -5.2038309 -6.6530868 -5.1055595 -5.5074800 -6.7960192 -6.8473594 -5.6450035 -5.1795646 -4.9677409 -5.8861454 -4.6831543 <\/em><\/mark>\n\n<span style=\"color: #ff0000;\"><em>lda_iris_predict$x&#91;,2] # values for the second discriminant function<\/em><\/span>\n<em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#1d07f7\" class=\"has-inline-color\">           1            2            3            4            5            6            7            8            9           10           11           12           13 \n-0.300420621  0.786660426  0.265384488  0.670631068 -0.514462530 -1.461720967 -0.355836209  0.011633838  1.015163624  0.947319209 -0.647363392  0.109646389  1.072989426 \n .....<\/mark><\/em>\n.....<em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#1d07f7\" class=\"has-inline-color\">\n         144          145          146          147          148          149          150 \n-1.460686950 -2.428950671 -1.677717335  0.363475041 -0.821140550 -2.345090513 -0.332033811 <\/mark><\/em><\/code><\/pre>\n\n\n\n<p class=\"p1\">How good were the predictions? 
Just create a confusion table of the classifier (Species) against the prediction (Predict):<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>table(iris$Species, iris$Predict)<\/em><\/span><br><br><\/code><\/pre>\n\n\n\n<table id=\"tablepress-35\" class=\"tablepress tablepress-id-35\">\n<thead>\n<tr class=\"row-1\">\n\t<td class=\"column-1\"><\/td><th class=\"column-2\">Setosa<\/th><th class=\"column-3\">Versicolor<\/th><th class=\"column-4\">Virginica<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping row-hover\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">Setosa<\/td><td class=\"column-2\">50<\/td><td class=\"column-3\">0<\/td><td class=\"column-4\">0<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">Versicolor<\/td><td class=\"column-2\">0<\/td><td class=\"column-3\">48<\/td><td class=\"column-4\">2<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">Virginica<\/td><td class=\"column-2\">0<\/td><td class=\"column-3\">1<\/td><td class=\"column-4\">49<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n\n\n<p class=\"p1\">The rows show the actual species and the columns the predicted species. So, all Setosa plants were predicted correctly. Two Versicolor plants were incorrectly labelled as Virginica and one Virginica plant was incorrectly labelled as Versicolor. 
Consequently, the accuracy is:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\(Acc = \\frac{50 + 48 + 49}{150} = \\frac{147}{150} = 98\\% \\)<script src=\"https:\/\/pcool.dyndns.org\/wp-includes\/js\/dist\/hooks.min.js?ver=dd5603f07f9220ed27f1\" id=\"wp-hooks-js\"><\/script>\n<script src=\"https:\/\/pcool.dyndns.org\/wp-includes\/js\/dist\/i18n.min.js?ver=c26c3dc7bed366793375\" id=\"wp-i18n-js\"><\/script>\n<script id=\"wp-i18n-js-after\">\nwp.i18n.setLocaleData( { 'text direction\\u0004ltr': [ 'ltr' ] } );\n\/\/# sourceURL=wp-i18n-js-after\n<\/script>\n<script  async src=\"https:\/\/cdnjs.cloudflare.com\/ajax\/libs\/mathjax\/2.7.7\/MathJax.js?config=TeX-MML-AM_CHTML\" id=\"mathjax-js\"><\/script>\n<\/div>\n\n\n\n<p class=\"p1\">To plot the data with the ggplot2 package:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>library(ggplot2) # provides ggplot()<\/em><\/span><br><br><span style=\"color: #ff0000;\"><em>iris_lda_df &lt;- data.frame(first = lda_iris_predict$x&#91;,1], <\/em><\/span><br><span style=\"color: #ff0000;\"><em>second = lda_iris_predict$x&#91;,2], Species = iris$Species, <\/em><\/span><br><span style=\"color: #ff0000;\"><em>Predict = iris$Predict)<\/em><\/span><br><br><span style=\"color: #ff0000;\"><em>ggplot(iris_lda_df, aes(x = first, y = second, colour = Species, <\/em><\/span><br><span style=\"color: #ff0000;\"><em>shape = Predict)) +<\/em><\/span><br><span style=\"color: #ff0000;\"><em>  geom_point(size = 4, alpha = 0.8) +<\/em><\/span><br><span style=\"color: #ff0000;\"><em>  ggtitle(\"Linear Discriminant Analysis\") +<\/em><\/span><br><span style=\"color: #ff0000;\"><em>  scale_x_continuous(\"First Discriminant Function\") +<\/em><\/span><br><span style=\"color: #ff0000;\"><em>  scale_y_continuous(\"Second Discriminant Function\") +<\/em><\/span><br><span style=\"color: #ff0000;\"><em>  theme_bw()<\/em><\/span><br><br><\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"908\" height=\"914\" 
src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/lda.png\" alt=\"\" class=\"wp-image-3262\" srcset=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/lda.png 908w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/lda-298x300.png 298w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/lda-150x150.png 150w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/lda-768x773.png 768w\" sizes=\"auto, (max-width: 908px) 100vw, 908px\" \/><\/figure>\n\n\n\n<p class=\"p1\">The plot shows the separation obtained and the classification: colour indicates the actual species and shape the predicted species. The two green squares were predicted as Virginica, but are actually Versicolor plants. Similarly, the blue triangle was predicted as Versicolor, but is actually a Virginica plant. Overall, a very satisfactory model: only three of the 150 plants are misclassified.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Linear Discriminant Analysis (LDA) is a linear transformation technique of supervised machine learning (there needs to be a classifier: a variable holding the known class of each observation). The aim of the technique is to find linear combinations of the variables that separate the classes defined by the classifier as much as possible. To illustrate the method, R\u2019s built-in data set \u201ciris\u201d is used. 
The iris data set [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"inline_featured_image":false,"footnotes":""},"class_list":["post-2122","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/2122","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/comments?post=2122"}],"version-history":[{"count":2,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/2122\/revisions"}],"predecessor-version":[{"id":4620,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/2122\/revisions\/4620"}],"wp:attachment":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/media?parent=2122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}