Statsbook

Enter Manually

R stores data in vectors and in data frames (or tibbles, the tidyverse version of a data frame). A data frame is a two-dimensional table in which each column can hold a different type of data:

  • Descriptive
    • Character (text)
    • Factor (different groups, categorical variable)
    • Ordinal factor (different groups that are ordered)
  • Numerical
    • Logical (a binary variable: TRUE / FALSE, e.g. yes / no)
    • Integer (whole numbers, discrete numerical data)
    • Double (double-precision floating point, for decimal numbers)
  • Date and Time

Each column (vertical) contains a variable and each row (horizontal) a case or patient.
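As a sketch of the types listed above, the hypothetical data frame below (names and values invented for illustration) holds one column of each type, and sapply() applies class() to every column. Note that R reports a double column as class "numeric"; typeof() shows that it is stored as "double".

```r
# Hypothetical example data: one column of each type listed above
patients <- data.frame(
  name     = c("Ann", "Bob", "Cas"),                                # character
  sex      = factor(c("F", "M", "F")),                              # factor
  severity = factor(c("mild", "severe", "moderate"),
                    levels = c("mild", "moderate", "severe"),
                    ordered = TRUE),                                # ordinal factor
  smoker   = c(TRUE, FALSE, FALSE),                                 # logical
  visits   = c(2L, 5L, 1L),                                         # integer (L suffix)
  weight   = c(70.5, 82.1, 64.0),                                   # double
  admitted = as.Date(c("2025-01-10", "2025-01-11", "2025-01-13")),  # Date
  stringsAsFactors = FALSE                                          # keep text as character
)

# First class of each column (an ordinal factor reports "ordered")
col_classes <- sapply(patients, function(col) class(col)[1])
col_classes
typeof(patients$weight)   # "double"
```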

To enter numerical continuous data (e.g. heights of patients):

heights <- c(184,146,169,185,160,173,179,171,160,150)
heights
 [1] 184 146 169 185 160 173 179 171 160 150

The c() function (combine, or concatenate) joins values or vectors into a single vector.
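A short sketch of c() joining two vectors, using the heights above split into two hypothetical halves. It also shows that c() coerces every element to one common type, so mixing numbers and text gives a character vector.

```r
first_five <- c(184, 146, 169, 185, 160)
last_five  <- c(173, 179, 171, 160, 150)
all_ten    <- c(first_five, last_five)  # second vector appended to the first
all_ten          # 184 146 169 185 160 173 179 171 160 150
length(all_ten)  # 10
class(all_ten)   # "numeric"

# c() coerces all elements to one common type
mixed <- c(184, "group 1")
class(mixed)     # "character"; the number is now the text "184"
```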

To enter a grouping factor:

group <- c(rep('group 1', 5), rep('group 2', 5))
group
 [1] "group 1" "group 1" "group 1" "group 1" "group 1" "group 2"
 [7] "group 2" "group 2" "group 2" "group 2"

The rep() function repeats a value: here it creates one vector of five repetitions of ‘group 1’ and one of five repetitions of ‘group 2’, and c() concatenates the two. The data are categorical and have no order. To convert the character vector to a factor:

group <- as.factor(group)
group
 [1] group 1 group 1 group 1 group 1 group 1 group 2 group 2
 [8] group 2 group 2 group 2
Levels: group 1 group 2
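The same grouping factor can be built more compactly with the each argument of rep(). The sketch below also shows levels() and table() on the factor and, for the ordinal factors mentioned earlier, a hypothetical severity variable created with ordered = TRUE so that its levels can be compared.

```r
# Equivalent to as.factor(c(rep('group 1', 5), rep('group 2', 5)))
group <- factor(rep(c("group 1", "group 2"), each = 5))
levels(group)    # "group 1" "group 2"
table(group)     # 5 cases in each group

# Hypothetical ordinal variable: levels listed from lowest to highest
severity <- factor(c("mild", "severe", "moderate", "mild"),
                   levels = c("mild", "moderate", "severe"),
                   ordered = TRUE)
severity              # Levels: mild < moderate < severe
severity < "severe"   # TRUE FALSE TRUE TRUE
max(severity)         # severe
```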

To enter dates using the lubridate¹ package:

library(lubridate)
dates <- c('2025-01-10', '2025-01-11', '2025-01-13', '2025-01-14', '2025-01-15', '2025-01-16', '2025-01-17', '2025-01-17', '2025-01-19', '2025-01-20')
dates <- ymd(dates)
dates
 [1] "2025-01-10" "2025-01-11" "2025-01-13" "2025-01-14"
 [5] "2025-01-15" "2025-01-16" "2025-01-17" "2025-01-17"
 [9] "2025-01-19" "2025-01-20"

The lubridate² package has several functions that parse date strings into R Date objects, which R prints in the ISO 8601 format (year-month-day). Using the ISO format is recommended in statistical analysis: as with the digits of a number, the most significant part (the year) comes first, followed by month and day, so dates written this way also sort into the correct order. The lubridate functions are named after the order they expect; e.g. ymd() expects the string in ‘year-month-day’ order and dmy() in ‘day-month-year’ order. Other lubridate functions are described on the package website.
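A minimal sketch of why the year-first order matters, using hypothetical dates. It uses base R's as.Date() with an explicit format string in place of lubridate's dmy(), so it runs without extra packages.

```r
# Hypothetical dates, written two ways
iso_text <- c("2025-01-20", "2025-01-09", "2025-02-01")  # year-month-day
dmy_text <- c("20-01-2025", "09-01-2025", "01-02-2025")  # day-month-year, same dates

sort(iso_text)  # "2025-01-09" "2025-01-20" "2025-02-01"  (chronological)
sort(dmy_text)  # "01-02-2025" "09-01-2025" "20-01-2025"  (NOT chronological)

# Parse the day-month-year text into Date objects (lubridate's dmy() does the same job)
dmy_dates <- as.Date(dmy_text, format = "%d-%m-%Y")
sort(dmy_dates) # chronological, printed in ISO 8601 form
```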

There are now three vectors:

  • heights, a numerical vector;
  • group, a factor; and
  • dates, a Date vector.

To combine these three vectors into a single data frame, pass them to data.frame(). The vectors must have the same length and be in the same order, so that the first element of heights belongs to the same patient as the first element of group and of dates, and so on:

df <- data.frame(group, heights, dates)
df
     group heights      dates
1  group 1     184 2025-01-10
2  group 1     146 2025-01-11
3  group 1     169 2025-01-13
4  group 1     185 2025-01-14
5  group 1     160 2025-01-15
6  group 2     173 2025-01-16
7  group 2     179 2025-01-17
8  group 2     171 2025-01-17
9  group 2     160 2025-01-19
10 group 2     150 2025-01-20

To check that each variable has the correct type, use the str() (structure) function:

str(df)
'data.frame':	10 obs. of  3 variables:
 $ group  : Factor w/ 2 levels "group 1","group 2": 1 1 1 1 1 2 2 2 2 2
 $ heights: num  184 146 169 185 160 173 179 171 160 150
 $ dates  : Date, format: "2025-01-10" ...
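If str() shows a wrong type, the column can be converted in place. The sketch below assumes a hypothetical data frame, entered, in which the heights were typed in as text; as.numeric() and as.factor() convert the columns to the intended types.

```r
# Hypothetical data frame in which the heights were typed in as text
entered <- data.frame(
  heights = c("184", "146", "169"),
  group   = c("group 1", "group 1", "group 2"),
  stringsAsFactors = FALSE
)
str(entered)   # both columns are chr (character)

# Convert each column to the intended type
entered$heights <- as.numeric(entered$heights)  # non-numeric text would become NA, with a warning
entered$group   <- as.factor(entered$group)
str(entered)   # heights: num, group: Factor w/ 2 levels
```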

You can now use the data for analysis.
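As a minimal example of analysis, the sketch below rebuilds df (with base R's as.Date() standing in for ymd(), which gives the same Date values for these ISO strings), selects one row and one column, and computes the mean height per group with tapply(). The mean is only an illustrative choice of summary.

```r
# Rebuild the example data (as.Date() stands in for lubridate's ymd() here)
group   <- factor(rep(c("group 1", "group 2"), each = 5))
heights <- c(184, 146, 169, 185, 160, 173, 179, 171, 160, 150)
dates   <- as.Date(c("2025-01-10", "2025-01-11", "2025-01-13", "2025-01-14",
                     "2025-01-15", "2025-01-16", "2025-01-17", "2025-01-17",
                     "2025-01-19", "2025-01-20"))
df <- data.frame(group, heights, dates)

df[1, ]          # one row: all variables for the first patient
df$heights       # one column: the heights variable as a vector

# Mean height per group
group_means <- tapply(df$heights, df$group, mean)
group_means      # group 1: 168.8, group 2: 166.6

# Date columns support date arithmetic
diff(range(df$dates))   # Time difference of 10 days
```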

To save the data frame as an rda (R data) file:

save(df, file='/path/to_file/df.rda')

When the .rda file is loaded in R with load('/path/to_file/df.rda'), the data frame is restored under its original name, df, with all variables keeping their declared types.
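A sketch of the save/load round trip, using a hypothetical small data frame, df_demo, and a temporary file from tempfile() in place of '/path/to_file/df.rda'. load() recreates the object under the name it was saved with and returns that name.

```r
# Hypothetical small data frame, and a temporary file instead of '/path/to_file/df.rda'
df_demo <- data.frame(group = factor(c("group 1", "group 2")), heights = c(184, 173))
path <- tempfile(fileext = ".rda")

save(df_demo, file = path)
rm(df_demo)                 # remove it from the workspace

restored <- load(path)      # recreates df_demo; returns the names of the objects loaded
restored                    # "df_demo"
str(df_demo)                # group is still a factor, heights still numeric
```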