{"id":2028,"date":"2017-07-15T10:04:32","date_gmt":"2017-07-15T09:04:32","guid":{"rendered":"http:\/\/pcool.dyndns.org:8080\/statsbook\/?page_id=2028"},"modified":"2025-07-01T20:40:34","modified_gmt":"2025-07-01T19:40:34","slug":"decision-tree-plot","status":"publish","type":"page","link":"https:\/\/pcool.dyndns.org\/index.php\/decision-tree-plot\/","title":{"rendered":"Decision Tree Plot"},"content":{"rendered":"\n<p>A decision tree plot is useful for visualising decision rules for categorical or continuous outcome variables. A popular R package for creating decision trees is rpart<sup class='sup-ref-note' id='note-zotero-ref-p2028-r1-o1'><a class='sup-ref-note' href='#zotero-ref-p2028-r1'>1<\/a><\/sup>. In addition, a separate package, rpart.plot<sup class='sup-ref-note' id='note-zotero-ref-p2028-r2-o1'><a class='sup-ref-note' href='#zotero-ref-p2028-r2'>2<\/a><\/sup>, produces enhanced plots of rpart trees. Both packages should be <a href=\"https:\/\/pcool.dyndns.org\/index.php\/packages\/\" data-type=\"page\" data-id=\"22\">installed<\/a>.<\/p>\n\n\n\n<p><strong>Binary (categorical) outcome<\/strong><\/p>\n\n\n\n<p>The rpart package<sup class='sup-ref-note' id='note-zotero-ref-p2028-r3-o1'><a class='sup-ref-note' href='#zotero-ref-p2028-r3'>3<\/a><\/sup> contains a data frame, kyphosis, with information about children who have had corrective spinal surgery. 
To view the first six observations:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#ed072e\" class=\"has-inline-color\">library(rpart)\nhead(kyphosis)<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#4206ee\" class=\"has-inline-color\">\n  Kyphosis Age Number Start\n1   absent  71      3     5\n2   absent 158      3    14\n3  present 128      4     5\n4   absent   2      5     1\n5   absent   1      4    15\n6   absent   1      2    16<\/mark><\/em><\/code><\/pre>\n\n\n\n<p>The data frame contains 81 observations of 4 variables (str stands for structure):<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>str(kyphosis)<\/em><\/span>\n<span style=\"color: #0000ff;\"><em>'data.frame': 81 obs. of 4 variables:<\/em><\/span>\n<span style=\"color: #0000ff;\"><em> $ Kyphosis: Factor w\/ 2 levels \"absent\",\"present\": 1 1 2 1 1 1 1 1 1 2 ...<\/em><\/span>\n<span style=\"color: #0000ff;\"><em> $ Age : int 71 158 128 2 1 1 61 37 113 59 ...<\/em><\/span>\n<span style=\"color: #0000ff;\"><em> $ Number : int 3 3 4 5 4 2 2 3 2 6 ...<\/em><\/span>\n<span style=\"color: #0000ff;\"><em> $ Start : int 5 14 5 1 15 16 17 16 16 12 ...<\/em><\/span><\/code><\/pre>\n\n\n\n<p>The variable &#8216;Kyphosis&#8217; is categorical and is either &#8216;absent&#8217; or &#8216;present&#8217;. The variable &#8216;Age&#8217; is the age in months. The variable &#8216;Number&#8217; is an integer and shows the number of vertebrae involved. 
The variable &#8216;Start&#8217; is the number of the first (topmost) vertebra operated on.<\/p>\n\n\n\n<p>Fitting a decision tree (a machine learning model) is straightforward:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>model &lt;- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = 'class')<\/em><\/span><\/code><\/pre>\n\n\n\n<p>The variable to the left of the tilde (~) is the outcome (response) variable and the variables to the right of the tilde are the explanatory variables. To show the model:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#eb0c1b\" class=\"has-inline-color\">model<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#100bea\" class=\"has-inline-color\">\nn= 81 \n\nnode), split, n, loss, yval, (yprob)\n      * denotes terminal node\n\n 1) root 81 17 absent (0.79012346 0.20987654)  \n   2) Start>=8.5 62  6 absent (0.90322581 0.09677419)  \n     4) Start>=14.5 29  0 absent (1.00000000 0.00000000) *\n     5) Start&lt; 14.5 33  6 absent (0.81818182 0.18181818)  \n      10) Age&lt; 55 12  0 absent (1.00000000 0.00000000) *\n      11) Age>=55 21  6 absent (0.71428571 0.28571429)  \n        22) Age>=111 14  2 absent (0.85714286 0.14285714) *\n        23) Age&lt; 111 7  3 present (0.42857143 0.57142857) *\n   3) Start&lt; 8.5 19  8 present (0.42105263 0.57894737) *<\/mark><\/em><\/code><\/pre>\n\n\n\n<p>A plot is more informative:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>plot(model)<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>text(model)<\/em><\/span><\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"745\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/Tree-1024x745.png\" alt=\"\" class=\"wp-image-3834\" 
srcset=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/Tree-1024x745.png 1024w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/Tree-300x218.png 300w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/Tree-768x559.png 768w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/Tree-1536x1118.png 1536w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/Tree.png 1646w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Although the text size can be reduced (for example, with cex = 0.8), the plot still doesn&#8217;t look very good. The additional package rpart.plot<sup class='sup-ref-note' id='note-zotero-ref-p2028-r4-o1'><a class='sup-ref-note' href='#zotero-ref-p2028-r4'>4<\/a><\/sup> produces enhanced graphics:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><span style=\"color: #ff0000;\">library(rpart.plot)<\/span><\/em>\n<em><span style=\"color: #ff0000;\">rpart.plot(model)<\/span><\/em><\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"805\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeEnhanced-1024x805.png\" alt=\"\" class=\"wp-image-3839\" srcset=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeEnhanced-1024x805.png 1024w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeEnhanced-300x236.png 300w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeEnhanced-768x604.png 768w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeEnhanced-1536x1208.png 1536w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeEnhanced.png 1740w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><strong>Continuous outcome variable (regression)<\/strong><\/p>\n\n\n\n<p>The rpart package also contains a data frame, cu.summary, with information about different cars. 
To show the data frame&#8217;s first six observations as well as its structure:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>library(rpart)<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>head(cu.summary)<\/em><\/span>\n<em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#3e06fa\" class=\"has-inline-color\">                Price Country Reliability Mileage  Type\nAcura Integra 4 11950   Japan Much better      NA Small\nDodge Colt 4     6851   Japan        &lt;NA>      NA Small\nDodge Omni 4     6995     USA  Much worse      NA Small\nEagle Summit 4   8895     USA      better      33 Small\nFord Escort   4  7402     USA       worse      33 Small\nFord Festiva 4   6319   Korea      better      37 Small<\/mark><\/em>\n\n<span style=\"color: #ff0000;\"><em>str(cu.summary)<\/em><\/span>\n<em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#270af2\" class=\"has-inline-color\">'data.frame':\t117 obs. of  5 variables:\n $ Price      : num  11950 6851 6995 8895 7402 ...\n $ Country    : Factor w\/ 10 levels \"Brazil\",\"England\",..: 5 5 10 10 10 7 5 6 6 7 ...\n $ Reliability: Ord.factor w\/ 5 levels \"Much worse\"&lt;\"worse\"&lt;..: 5 NA 1 4 2 4 NA 5 5 2 ...\n $ Mileage    : num  NA NA NA 33 33 37 NA NA 32 NA ...\n $ Type       : Factor w\/ 6 levels \"Compact\",\"Large\",..: 4 4 4 4 4 4 4 4 4 4 ...<\/mark><\/em><\/code><\/pre>\n\n\n\n<p>To predict a car&#8217;s mileage from Price, Country, Reliability and Type with a regression tree (method &#8216;anova&#8217;, short for analysis of variance):<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><span style=\"color: #ff0000;\">model2 &lt;- rpart(Mileage ~ Price + Country + Reliability + Type, method=\"anova\", data=cu.summary)<\/span><\/em>\n<em><span style=\"color: #ff0000;\">plot(model2)<\/span><\/em>\n<em><span style=\"color: #ff0000;\">text(model2)<\/span><\/em><\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" 
width=\"1024\" height=\"918\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeRegression-1024x918.png\" alt=\"\" class=\"wp-image-3844\" srcset=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeRegression-1024x918.png 1024w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeRegression-300x269.png 300w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeRegression-768x689.png 768w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeRegression.png 1358w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Or for a nicer plot:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><span style=\"color: #ff0000;\"><em>library(rpart.plot)<\/em><\/span>\n<span style=\"color: #ff0000;\"><em>rpart.plot(model2)<\/em><\/span><\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"673\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeRegressionEnhanced-1024x673.png\" alt=\"\" class=\"wp-image-3849\" srcset=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeRegressionEnhanced-1024x673.png 1024w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeRegressionEnhanced-300x197.png 300w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeRegressionEnhanced-768x505.png 768w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeRegressionEnhanced-1536x1010.png 1536w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/TreeRegressionEnhanced.png 1782w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>A decision tree plot can be useful to visualise decision rules for categorical or continuous outcome variables. A popular package to create decision trees in R is rpart. In addition, there is a separate package to produce plots with rpart; rpart.plot. 
Both packages should be installed. Binary (categorical) outcome The rpart package\u00a0 contains a data [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"inline_featured_image":false,"footnotes":""},"class_list":["post-2028","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/2028","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/comments?post=2028"}],"version-history":[{"count":1,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/2028\/revisions"}],"predecessor-version":[{"id":4736,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/2028\/revisions\/4736"}],"wp:attachment":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/media?parent=2028"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}