{"id":67,"date":"2015-04-28T12:09:29","date_gmt":"2015-04-28T11:09:29","guid":{"rendered":"http:\/\/pcool.dyndns.org:8080\/statsbook\/?page_id=67"},"modified":"2025-06-23T23:49:43","modified_gmt":"2025-06-23T22:49:43","slug":"import-spreadsheet","status":"publish","type":"page","link":"https:\/\/pcool.dyndns.org\/index.php\/data-into-r\/import-spreadsheet\/","title":{"rendered":"Import Spreadsheet"},"content":{"rendered":"\n<p>Data are usually collected in a spreadsheet program such as Microsoft Excel or Open Office. Once data collection is complete it is preferable to import the data directly from the spreadsheet into R, rather than entering it <a href=\"https:\/\/pcool.dyndns.org\/index.php\/data-into-r\/enter-manually\/\" data-type=\"page\" data-id=\"63\">manually<\/a>.<\/p>\n\n\n\n<p>The comma separated values is the most versatile file format, but the Excel format is often used.<\/p>\n\n\n\n<p><strong>Comma Separated Values (CSV)<\/strong><\/p>\n\n\n\n<p><a href=\"https:\/\/pcool.dyndns.org:\/wp-content\/data_files\/height10.csv\" target=\"_blank\" rel=\"noreferrer noopener\">Heights of 10 people (csv file)<\/a><\/p>\n\n\n\n<p><strong>Microsoft Excel (XLS)<\/strong><\/p>\n\n\n\n<p><a href=\"https:\/\/pcool.dyndns.org:\/wp-content\/data_files\/height10.xls\" target=\"_blank\" rel=\"noreferrer noopener\">Heights of 10 people (xls file)<\/a><\/p>\n\n\n\n<p><em>There should be <strong>no spaces<\/strong> in variable names. In a spreadsheet, the variable names are stored in the first row. If the variable name contains a space, this space will be converted to a full stop &#8220;.&#8221;.&nbsp; However, it is generally preferable not to use spaces in variable names<\/em> and use one of the conventions as <a href=\"https:\/\/pcool.dyndns.org\/index.php\/data-into-r\/\" data-type=\"page\" data-id=\"61\">described<\/a>.<\/p>\n\n\n\n<p>To import the spreadsheet:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f10606\" class=\"has-inline-color\">library(readxl)<\/mark><\/em>\n<em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f50606\" class=\"has-inline-color\">df_csv &lt;- read_csv('\/path\/to_file\/height10.csv')\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#1b05f5\" class=\"has-inline-color\">Rows: 10 Columns: 1                                            \n\u2500\u2500 Column specification \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nDelimiter: \",\"\ndbl (1): height\n\n\u2139 Use `spec()` to retrieve the full column specification for this data.\n\u2139 Specify the column types or set `show_col_types = FALSE` to quiet this message.<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f50606\" class=\"has-inline-color\">\ndf_csv\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#2305f5\" class=\"has-inline-color\"># A tibble: 10 \u00d7 1\n   height\n    &lt;dbl&gt;\n 1    184\n 2    146\n 3    169\n 4    185\n 5    160\n 6    173\n 7    179\n 8    171\n 9    160\n10    150<\/mark>\n# or for excel files:<mark style=\"background-color:rgba(0, 0, 0, 0);color:#f50606\" class=\"has-inline-color\">\ndf_xls &lt;- read_excel('\/path\/to_file\/height10.xls')                                                            \ndf_xls\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#1b05f5\" class=\"has-inline-color\"># A tibble: 10 \u00d7 1\n   height\n    &lt;dbl&gt;\n 1    184\n 2    146\n 3    169\n 4    185\n 5    160\n 6    173\n 7    179\n 8    171\n 9    160\n10    150<\/mark><\/em><\/code><\/pre>\n\n\n\n<p>To see the structure:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f40404\" class=\"has-inline-color\">str(df_xls)<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#4406f2\" class=\"has-inline-color\">tibble &#091;10 \u00d7 1] (S3: tbl_df\/tbl\/data.frame)\n $ height: num &#091;1:10] 184 146 169 185 160 173 179 171 160 150<\/mark><\/em><\/code><\/pre>\n\n\n\n<p>Before proceeding with statistical analysis, make sure the variables are set of the appropriate type (numerical, factor, date etc) as also discussed in the <a href=\"https:\/\/pcool.dyndns.org\/index.php\/data-into-r\/enter-manually\/\" data-type=\"page\" data-id=\"63\">manual<\/a> section.<\/p>\n\n\n\n<p>To save the data frame as an rda (R data) file:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f40808\" class=\"has-inline-color\"><em>save(df_xls, file = '\/path\/to_file\/df_xls.rda')<\/em><\/mark><\/code><\/pre>\n\n\n\n<p><strong>To import a csv file with base R (without importing a package):<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><em><span style=\"color: #ff0000;\">height10 &lt;- read.table(\"\/path\/to_file\/height10.csv\", header=T, sep=\",\", quote=\"\\\"\")<\/span><\/em><\/code><\/pre>\n\n\n\n<p>This will create a data frame &#8216;height10&#8217; that is available in the Data Viewer <em>(please edit the \/path\/to_file\/ to the appropriate location of the file on your computer)<\/em>.<\/p>\n\n\n\n<p><strong>To import a xlsx file can also be done with the readxl package:<\/strong><\/p>\n\n\n\n<p>For example the <a href=\"https:\/\/pcool.dyndns.org:\/wp-content\/data_files\/SpecGroup.xlsx\" target=\"_blank\" rel=\"noreferrer noopener\">SpecGroup.xlsx<\/a> file:<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code><em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f10505\" class=\"has-inline-color\">library(readxl)<\/mark><\/em>\n<em><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f10505\" class=\"has-inline-color\">library(lubridate)\ndf &lt;- read_excel('\/path\/to_file\/SpecGroup.xlsx')                                                          \ndf\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#2b05f2\" class=\"has-inline-color\"># A tibble: 60 \u00d7 4\n   Month   Group     Speciality Number\n   &lt;chr>   &lt;chr>     &lt;chr>       &lt;dbl>\n 1 01\/2016 New       Oncology        5\n 2 01\/2016 New       Ortho          16\n 3 01\/2016 New       Oncology        7\n 4 01\/2016 Other     Oncology        2\n 5 01\/2016 Other     Ortho           2\n 6 01\/2016 Other     Ortho           4\n 7 01\/2016 Other     Oncology        7\n 8 01\/2016 Resection Ortho           4\n 9 01\/2016 Resection Ortho           3\n10 02\/2016 New       Oncology        8\n# \u2139 50 more rows\n# \u2139 Use `print(n = ...)` to see more rows<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f10505\" class=\"has-inline-color\">\nstr(df)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#2b05f2\" class=\"has-inline-color\">tibble &#091;60 \u00d7 4] (S3: tbl_df\/tbl\/data.frame)\n $ Month     : chr &#091;1:60] \"01\/2016\" \"01\/2016\" \"01\/2016\" \"01\/2016\" ...\n $ Group     : chr &#091;1:60] \"New\" \"New\" \"New\" \"Other\" ...\n $ Speciality: chr &#091;1:60] \"Oncology\" \"Ortho\" \"Oncology\" \"Oncology\" ...\n $ Number    : num &#091;1:60] 5 16 7 2 2 4 7 4 3 8 ...<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f10505\" class=\"has-inline-color\">\ndf$Month = my(df$Month)\ndf$Group = as.factor(df$Group)\ndf$Speciality = as.factor(df$Speciality)\ndf\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#2305f2\" class=\"has-inline-color\"># A tibble: 60 \u00d7 4\n   Month      Group     Speciality Number\n   &lt;date>     &lt;fct>     &lt;fct>       &lt;dbl>\n 1 2016-01-01 New       Oncology        5\n 2 2016-01-01 New       Ortho          16\n 3 2016-01-01 New       Oncology        7\n 4 2016-01-01 Other     Oncology        2\n 5 2016-01-01 Other     Ortho           2\n 6 2016-01-01 Other     Ortho           4\n 7 2016-01-01 Other     Oncology        7\n 8 2016-01-01 Resection Ortho           4\n 9 2016-01-01 Resection Ortho           3\n10 2016-02-01 New       Oncology        8\n# \u2139 50 more rows\n# \u2139 Use `print(n = ...)` to see more rows<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f10505\" class=\"has-inline-color\">\nstr(df)\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#1a05f2\" class=\"has-inline-color\">tibble &#091;60 \u00d7 4] (S3: tbl_df\/tbl\/data.frame)\n $ Month     : Date&#091;1:60], format: \"2016-01-01\" ...\n $ Group     : Factor w\/ 3 levels \"New\",\"Other\",..: 1 1 1 2 2 2 2 3 3 1 ...\n $ Speciality: Factor w\/ 2 levels \"Oncology\",\"Ortho\": 1 2 1 1 2 2 1 2 2 1 ...\n $ Number    : num &#091;1:60] 5 16 7 2 2 4 7 4 3 8 ...<\/mark><\/em><\/code><\/pre>\n\n\n\n<p>The package <strong>readxl<\/strong><sup class='sup-ref-note' id='note-zotero-ref-p67-r1-o1'><a class='sup-ref-note' href='#zotero-ref-p67-r1'>1<\/a><\/sup> should be installed as <a href=\"https:\/\/pcool.dyndns.org\/index.php\/packages\/\" data-type=\"page\" data-id=\"22\">described<\/a>. <em>(please edit the \/path\/to_file\/ to the appropriate location of the file)<\/em><\/p>\n\n\n\n<p>The lubridate<sup class='sup-ref-note' id='note-zotero-ref-p67-r2-o1'><a class='sup-ref-note' href='#zotero-ref-p67-r2'>2<\/a><\/sup> package has several functions to convert dates to <a href=\"https:\/\/www.iso.org\/iso-8601-date-and-time-format.html\">ISO 8601<\/a> standard. It is recommended to use the ISO standard (year-month-day) in statistical analysis. Similar to numbers provide the most significant number (year) first, followed by month and day. This will also result in the dates being ordered correctly. The functions in lubridate are logically named; ie ymd() expects the string to be in &#8216;year-month-day&#8217; order. Here the my() function was used as the string was in month\/year order. The last str(df) call shows that the date has been changed to ISO 8601 format. Other lubridate functions are described on the <a href=\"https:\/\/lubridate.tidyverse.org\/\" data-type=\"link\" data-id=\"https:\/\/lubridate.tidyverse.org\/\">website<\/a>.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data are usually collected in a spreadsheet program such as Microsoft Excel or Open Office. Once data collection is complete it is preferable to import the data directly from the spreadsheet into R, rather than entering it manually. The comma separated values is the most versatile file format, but the Excel format is often used. [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":61,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"inline_featured_image":false,"footnotes":""},"class_list":["post-67","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/67","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/comments?post=67"}],"version-history":[{"count":7,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/67\/revisions"}],"predecessor-version":[{"id":4341,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/67\/revisions\/4341"}],"up":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/61"}],"wp:attachment":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/media?parent=67"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}