Advanced Concepts

This page discusses advanced concepts for the more experienced user, in particular the manipulation of data and the assigning and arranging of new variables.

To rearrange / group data, it is useful to know several R operators / functions.

R Objects and Operators

Common R Functions

Helpful R Functions in Packages

Defining R Functions

Importing and Cleaning Data in R

Examples of R data manipulation functions:

To select a subset of data, for example the data of two different observers (1 and 2), and create a new data frame with the data of both observers in columns (the original data frame is called 'results' and has the variables 'observer', 'rotation' and 'outcome'):

# Select the outcome and rotation of observer 1 and rename the columns
observer1<-subset(results,observer==1,select=c(outcome,rotation))
names(observer1)[names(observer1)=="outcome"]<-'outcome1'
names(observer1)[names(observer1)=="rotation"]<-'rotation1'
# Repeat for observer 2
observer2<-subset(results,observer==2,select=c(outcome,rotation))
names(observer2)[names(observer2)=="outcome"]<-'outcome2'
names(observer2)[names(observer2)=="rotation"]<-'rotation2'
# Combine side by side; this assumes both observers rated the same cases in the same order
observers<-cbind(observer1,observer2)
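The cbind() step above only pairs the right rows if both observers rated the same cases in the same order. As a hedged alternative, assuming the data also contain a case identifier (the 'case' column below is hypothetical and not part of the original 'results' data frame), merge() can pair the two observers' rows explicitly:

```r
# Hypothetical 'results' data: the 'case' column is an assumption, and
# observer 2's rows are deliberately in a different order from observer 1's
results <- data.frame(
  case     = c(1, 2, 3, 3, 1, 2),
  observer = c(1, 1, 1, 2, 2, 2),
  rotation = c(10, 20, 30, 33, 12, 18),
  outcome  = c("A", "B", "A", "B", "A", "A")
)

# Select each observer's rows, keeping the case identifier
observer1 <- subset(results, observer == 1, select = c(case, outcome, rotation))
observer2 <- subset(results, observer == 2, select = c(case, outcome, rotation))

# merge() pairs rows by case; the suffixes are appended to the shared column names
observers <- merge(observer1, observer2, by = "case", suffixes = c("1", "2"))
observers
```

merge() sorts the result by case and appends the suffixes only to columns that occur in both subsets, so the separate renaming steps are not needed.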

Some of these functions / operators are used in the examples below.

Printing percentages and currency signs:

To print percentages, use R's sprintf function. For help on formatting, type ?sprintf in the console; a full description is outside the scope of this page. In short, the first argument of sprintf is a format string within quotation marks ("). A conversion starts with a % sign (to show that a value is coming), followed by a full stop (the decimal point), the number of decimal places, and an f (for a floating-point number). To print a literal percent sign, write %% (the first % acts as an 'escape' character, as a single % would otherwise start a new conversion). The second argument of the function is the variable the format should be applied to. Therefore, to print a percentage (%) sign after a number:

a<-c(1,2,3,4,5,6,7,8,9)
a
[1] 1 2 3 4 5 6 7 8 9
b<-sprintf("%.0f%%",a)
b
[1] "1%" "2%" "3%" "4%" "5%" "6%" "7%" "8%" "9%"

Similarly, to print percentages with two decimal places:

c<-sprintf("%.2f%%", a)
c
[1] "1.00%" "2.00%" "3.00%" "4.00%" "5.00%" "6.00%" "7.00%" "8.00%" "9.00%"
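If the data are stored as proportions (0 to 1) rather than percentages, a minimal sketch is to multiply by 100 before formatting; the vector p below is made-up example data:

```r
# Hypothetical proportions between 0 and 1
p <- c(0.125, 0.5, 0.987)

# Multiply by 100 first, then format with one decimal place and a literal %
pct <- sprintf("%.1f%%", 100 * p)
pct
# [1] "12.5%" "50.0%" "98.7%"
```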

Finally, to print a currency sign such as £ in front of a number:

d<-sprintf("£ %.2f", a)
d
[1] "£ 1.00" "£ 2.00" "£ 3.00" "£ 4.00" "£ 5.00" "£ 6.00" "£ 7.00" "£ 8.00" "£ 9.00"
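For larger amounts a thousands separator is often wanted as well. One option, sketched here with made-up amounts, is base R's formatC() with big.mark, with the £ sign pasted in front:

```r
# Hypothetical amounts in pounds
amount <- c(1234.5, 25)

# formatC() fixes two decimals and inserts a comma every three digits;
# "\u00a3" is the £ sign written as a Unicode escape
gbp <- paste0("\u00a3", formatC(amount, format = "f", digits = 2, big.mark = ","))
gbp
```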

Manipulating dates:

Create a survival curve from dates: starting from a data frame that contains the date of diagnosis and the date of failure, this example shows how to convert the dates to follow-up time and how to create the censoring variable (use survivaldates.rda with this example).
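The survivaldates.rda file is not reproduced here, so the following is only a sketch with hypothetical column names (diagnosis_date, failure_date) and an assumed study end date. It shows the two steps described above: a censoring variable that is 1 when a failure date is present and 0 otherwise, and the follow-up time in days from diagnosis to failure or, for censored cases, to the end of the study:

```r
# Hypothetical data: column names and dates are assumptions, not taken
# from survivaldates.rda; NA means no failure was observed
patients <- data.frame(
  diagnosis_date = as.Date(c("2010-01-15", "2010-03-01", "2011-06-30")),
  failure_date   = as.Date(c("2012-01-15", NA, "2011-12-31"))
)
study_end <- as.Date("2014-12-31")  # assumed end of follow-up

# Censoring variable: 1 = failure observed, 0 = censored
patients$status <- as.integer(!is.na(patients$failure_date))

# Follow-up ends at the failure date, or at the end of the study if censored
end_date <- patients$failure_date
end_date[is.na(end_date)] <- study_end

# Follow-up time in days from diagnosis to the end of follow-up
patients$time <- as.numeric(end_date - patients$diagnosis_date)
patients
```

The resulting time and status columns can then be passed to survfit(Surv(time, status) ~ 1, data = patients) from the survival package to estimate and plot the survival curve.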

Revalue or map values (factors) with the plyr package [1], for example to obtain month names from the first date of each month:

library(plyr)
month<-mapvalues(activity$ActivityMonth,
  from=c('01/01/2014','01/02/2014','01/03/2014','01/04/2014','01/05/2014','01/06/2014',
         '01/07/2014','01/08/2014','01/09/2014','01/10/2014','01/11/2014','01/12/2014'),
  to=c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'))
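If the dates are stored as 'dd/mm/yyyy' strings, as in the example above, the same mapping can also be sketched in base R without plyr: parse the dates, extract the month number, and index the built-in month.abb constant. The activity_month vector here is hypothetical:

```r
# Hypothetical first-of-month dates stored as "dd/mm/yyyy" strings
activity_month <- c("01/01/2014", "01/05/2014", "01/12/2014")

# Parse the strings into Date objects
dates <- as.Date(activity_month, format = "%d/%m/%Y")

# Use the month number (1-12) to index the English abbreviations in month.abb
month <- month.abb[as.integer(format(dates, "%m"))]
month
# [1] "Jan" "May" "Dec"

# A factor with levels = month.abb keeps calendar order in tables and plots
month_f <- factor(month, levels = month.abb)
```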

Calculating the 'day number' (day of the year) of a date:
datum<-as.Date('31/12/2000',format='%d/%m/%Y')
datum
[1] "2000-12-31"
format(datum,format='%j')
[1] "366"
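Note that format() returns the day number as a character string. If a numeric day number is needed for a whole vector of dates, one option is the yday component of as.POSIXlt(), which counts from 0 and therefore needs 1 added; the dates below are only examples:

```r
# Example dates (2000 is a leap year, 2001 is not)
dates <- as.Date(c("31/12/2000", "31/12/2001", "01/01/2001"), format = "%d/%m/%Y")

# $yday is 0 for 1 January, so add 1 to match format(x, "%j")
day_number <- as.POSIXlt(dates)$yday + 1L
day_number
# [1] 366 365   1
```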

Converting incorrectly formatted data into properly declared variables for subsequent analysis:

A common error is to arrange data by creating a separate variable (column) for each group. Instead, each variable should be in its own column, with the group recorded in a separate variable. An example is shown here.
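As a minimal sketch of the correct layout, assuming a hypothetical data frame with one column per group, base R's stack() collects all measurements into one column and records the group in a second column:

```r
# Hypothetical wide data: one column per group (the layout to avoid)
wide <- data.frame(groupA = c(5.1, 4.8), groupB = c(6.0, 6.3))

# stack() puts all measurements in one column ('values') and the name of the
# originating column in a factor column ('ind')
long <- stack(wide)
names(long) <- c("value", "group")
long
```

In this long format the group can be used directly in formulas, for example aggregate(value ~ group, data = long, FUN = mean) or t.test(value ~ group, data = long).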

Examples:

Please note that the functions also run in R (rather than JGR), but that the ggplot2 library [2] then has to be loaded separately.

To create a faceted bar chart, load the Diag.rda data frame into JGR and run the 'create faceted bar chart' function. This will create the following plot:

[Figure: Diag]

This example shows how to regroup data and create a plot on defined criteria.
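The function for this example is not reproduced on this page. As a rough sketch of the technique only, a faceted bar chart in ggplot2 combines geom_bar(), which counts rows per category, with facet_wrap(), which draws one panel per level of a grouping variable. The data frame and its columns (diagnosis, year) are hypothetical stand-ins for Diag.rda:

```r
library(ggplot2)

# Hypothetical stand-in for Diag.rda: the columns diagnosis and year are assumptions
diag_df <- data.frame(
  diagnosis = c("A", "B", "A", "C", "B", "A"),
  year      = c(2014, 2014, 2014, 2015, 2015, 2015)
)

# geom_bar() counts the rows per diagnosis; facet_wrap() draws one panel per year
p <- ggplot(diag_df, aes(x = diagnosis)) +
  geom_bar() +
  facet_wrap(~ year)
print(p)
```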

To create a stacked and faceted bar plot, load the SpecGroup.rda data frame into JGR and run the 'stacked faceted bar plot 1' function. This will create the following plot:

[Figure: SpecGroup]
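A stacked version, sketched under the same caveat with hypothetical columns standing in for SpecGroup.rda, maps a second variable to fill; geom_bar() then stacks the counts for each fill level within a bar:

```r
library(ggplot2)

# Hypothetical stand-in for SpecGroup.rda: the columns specialty, group and site
# are assumptions
spec_df <- data.frame(
  specialty = c("Surgery", "Surgery", "Oncology", "Oncology", "Surgery", "Oncology"),
  group     = c("New", "Follow-up", "New", "New", "Follow-up", "Follow-up"),
  site      = c("North", "North", "North", "South", "South", "South")
)

# fill = group splits each bar by group; geom_bar() stacks the pieces by default
p <- ggplot(spec_df, aes(x = specialty, fill = group)) +
  geom_bar() +
  facet_wrap(~ site)
print(p)
```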

To create a stacked and faceted bar plot with 'free' axes, in which only labels greater than 1 are displayed, load the TNMstage.rda data frame into JGR and run the 'stacked faceted bar plot 2' function. This will create the following plot:

[Figure: TNMstage]
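For the free-axes version, a possible sketch (again with hypothetical columns and made-up counts standing in for TNMstage.rda) uses scales = "free" in facet_wrap() and sets a label to NA whenever the count is not greater than 1, so that geom_text() drops it:

```r
library(ggplot2)

# Hypothetical stand-in for TNMstage.rda: columns site, stage, grade and the
# pre-computed count n are assumptions, and the counts are made up
tnm_df <- data.frame(
  site  = rep(c("Breast", "Lung"), each = 4),
  stage = rep(c("I", "II"), times = 4),
  grade = rep(rep(c("Low", "High"), each = 2), times = 2),
  n     = c(12, 5, 1, 3, 40, 1, 8, 22)
)

# geom_col() uses n as the bar height; the label is NA unless n > 1, and
# na.rm = TRUE drops those labels without a warning
p <- ggplot(tnm_df, aes(x = stage, y = n, fill = grade)) +
  geom_col() +
  geom_text(aes(label = ifelse(n > 1, n, NA)),
            position = position_stack(vjust = 0.5), na.rm = TRUE) +
  # scales = "free" lets each panel choose its own x and y ranges
  facet_wrap(~ site, scales = "free")
print(p)
```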

1. Wickham H. plyr [Internet]. 2015. Available from: https://cran.r-project.org/web/packages/plyr/index.html
2. Wickham H, Chang W. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics [Internet]. Springer New York; 2016. Available from: http://cran.r-project.org/package=ggplot2