Advanced Concepts

This page covers advanced concepts for the more experienced user, in particular manipulating data and assigning or arranging new variables.

To rearrange or group data, it is useful to know several R operators and functions.

R Objects and Operators

Common R Functions

Helpful R Functions in Packages

Defining R Functions

Importing and Cleaning Data in R

Examples of R data manipulation functions:

To select a subset of data, for example to select different observers (1 and 2) and create a new data frame with the data of both observers in columns (the original data frame is called ‘results’, with the variables ‘observer’, ‘rotation’ and ‘outcome’):

observer1 <- subset(results, observer == 1, select = c(outcome, rotation))
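The line above only extracts observer 1. A minimal sketch of the full step, using a small made-up ‘results’ data frame (the values are hypothetical, as is the assumption that both observers scored the same cases in the same order), might look like this:

```r
# Hypothetical 'results' data frame with the variables named in the text
results <- data.frame(
  observer = rep(c(1, 2), each = 3),
  rotation = rep(c(10, 20, 30), times = 2),
  outcome  = c(1.1, 2.0, 2.9, 1.3, 2.2, 3.1)
)

# One subset per observer, keeping only the columns of interest
observer1 <- subset(results, observer == 1, select = c(outcome, rotation))
observer2 <- subset(results, observer == 2, select = c(outcome, rotation))

# Put the outcomes of both observers side by side.
# Assumption: rows of observer1 and observer2 refer to the same cases, in the same order.
both <- data.frame(
  rotation = observer1$rotation,
  outcome1 = observer1$outcome,
  outcome2 = observer2$outcome
)
both
```

If the rows are not in the same order for both observers, merging on a shared key with merge() would be safer than relying on row position.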

Some of these functions and operators are used in the examples below.

Printing percentages and currency signs:

To print percentages, use R's sprintf function. For help on formatting, type ?sprintf in the console; a full description is outside the scope of this page. The first argument of sprintf is the format string, which must be enclosed in quotation marks ("). It starts with a % sign (to show that a variable follows), then a full stop (the decimal point), then the number of decimal places, then an f (for a floating-point value), and finally %% (the first percent sign is an ‘escape’ character, as a single % would otherwise indicate a variable). The second argument is the variable the format is applied to. Therefore, to print a percentage (%) sign after a number:

[1] 1 2 3 4 5 6 7 8 9
[1] "1%" "2%" "3%" "4%" "5%" "6%" "7%" "8%" "9%"
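The two lines of output above can be reproduced with something like the following sketch. The variable names a and b and the "%.0f%%" format string are assumptions inferred from the description and from the later examples, which reuse a:

```r
# Assumed input: the integers 1 to 9
a <- 1:9
a

# "%.0f" prints zero decimal places; "%%" prints a literal percent sign
b <- sprintf("%.0f%%", a)
b
```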

Similarly, to print a % sign with two decimal places:

c <- sprintf("%.2f%%", a)
[1] "1.00%" "2.00%" "3.00%" "4.00%" "5.00%" "6.00%" "7.00%" "8.00%" "9.00%"

Finally, to print a £ sign (for example):

d <- sprintf("£ %.2f", a)
[1] "£ 1.00" "£ 2.00" "£ 3.00" "£ 4.00" "£ 5.00" "£ 6.00" "£ 7.00" "£ 8.00" "£ 9.00"

Manipulating dates:

Create a survival curve from dates: starting from a data frame that contains the date of diagnosis and the date of failure, this example shows how to convert dates to follow-up time and how to create the censor variable (use survivaldates.rda with this example).
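As a rough sketch of the conversion, using base R only: the data frame below is made up, and the column names (diagnosis, failure) and the study end date used for censoring are assumptions, so survivaldates.rda may well be structured differently.

```r
# Hypothetical data; a missing failure date means no failure was observed
dates <- data.frame(
  diagnosis = c("2010-01-15", "2011-06-01", "2012-03-20"),
  failure   = c("2012-01-15", NA, "2012-09-20"),
  stringsAsFactors = FALSE
)
study_end <- as.Date("2013-01-01")  # assumed date at which follow-up stops

# Convert the character columns to Date objects
dates$diagnosis <- as.Date(dates$diagnosis)
dates$failure   <- as.Date(dates$failure)

# Censor variable: 1 = failure observed, 0 = censored
dates$event <- ifelse(is.na(dates$failure), 0, 1)

# End of observation: failure date if known, otherwise the study end date
end_date <- dates$failure
end_date[is.na(end_date)] <- study_end

# Follow-up time in days
dates$followup_days <- as.numeric(difftime(end_date, dates$diagnosis, units = "days"))
dates
```

The followup_days and event columns are in the form expected by Surv() in the survival package, whose result can then be passed to survfit() to obtain the curve.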

Revalue or map values (factors) with the plyr package [1], for example month names from the first date of the month:
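A minimal sketch of this, assuming the plyr package is installed and using a made-up factor of month start dates (the variable names and dates are hypothetical):

```r
library(plyr)  # assumes plyr is installed

# Hypothetical factor holding the first date of each month
month_start <- factor(c("2015-01-01", "2015-02-01", "2015-03-01", "2015-01-01"))

# revalue() takes a named character vector: names are old levels, values are new levels
month_name <- revalue(month_start, c("2015-01-01" = "January",
                                     "2015-02-01" = "February",
                                     "2015-03-01" = "March"))

# mapvalues() does the same with separate 'from' and 'to' vectors
month_name2 <- mapvalues(month_start,
                         from = c("2015-01-01", "2015-02-01", "2015-03-01"),
                         to   = c("January", "February", "March"))
month_name
```

The month names are written out explicitly here rather than derived with format(), since formatted month names depend on the locale.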


Calculating the ‘day number’ of a date:
[1] "2000-12-31"
[1] "366"
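One way to obtain output like this is the "%j" format code, which gives the day of the year. The variable names below are assumptions:

```r
# Store the date as a Date object
d <- as.Date("2000-12-31")
d

# "%j" returns the day of the year as a three-digit character string
day_number <- format(d, "%j")
day_number
```

The result is a character string; wrap it in as.numeric() if a number is needed for calculations.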

Convert incorrectly formatted data into appropriately declared variables that allow subsequent analysis:

It is a common error to group variables incorrectly by creating a separate variable (column) for each group. Instead, each variable should be in its own column, with the group recorded as a separate variable. An example is shown here.
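To illustrate the idea with made-up numbers (the data frame and its column names are hypothetical), base R's stack() turns one column per group into a value column plus a group column:

```r
# Incorrect layout: one column per group
wide <- data.frame(
  groupA = c(5.1, 4.8, 5.5),
  groupB = c(6.2, 6.0, 5.9)
)

# stack() returns a 'values' column and an 'ind' factor naming the source column
long <- stack(wide)
names(long) <- c("outcome", "group")
long
```

In the long layout, group is an ordinary factor, so it can be used directly in model formulas or for faceting in plots.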


Please note that the functions also run from R (rather than JGR), but the ggplot2 library [2] will have to be loaded separately.

To create a faceted bar chart, load the Diag.rda data frame into JGR and run the create faceted bar chart function. This will create the following plot:


This example shows how to regroup data and create a plot on defined criteria.

To create a stacked and faceted bar plot, load the SpecGroup.rda data frame into JGR and run the stacked faceted bar plot 1 function. This will create the following plot:


To create a stacked and faceted bar plot where the axes are ‘free’ and only labels that are greater than 1 are displayed, load the TNMstage.rda data frame into JGR and run the stacked faceted bar plot 2 function. This will create the following plot:
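A minimal sketch of this kind of plot, assuming ggplot2 is installed. The data frame and its columns (site, stage, grade, n) are hypothetical and are not taken from TNMstage.rda, and this is not necessarily how the function on this page is written:

```r
library(ggplot2)  # assumes ggplot2 is installed

# Hypothetical counts per site, stage and grade
counts <- data.frame(
  site  = rep(c("Site A", "Site B"), each = 4),
  stage = rep(c("I", "II"), times = 4),
  grade = rep(rep(c("Low", "High"), each = 2), times = 2),
  n     = c(5, 1, 3, 8, 12, 1, 7, 2)
)

p <- ggplot(counts, aes(x = stage, y = n, fill = grade)) +
  geom_col() +                                  # bars are stacked by default
  geom_text(
    # Blank out labels of 1 or less; all rows are kept so stacking positions stay correct
    aes(label = ifelse(n > 1, n, ""), group = grade),
    position = position_stack(vjust = 0.5)      # centre each label in its segment
  ) +
  facet_wrap(~ site, scales = "free")           # each panel gets its own axes
p
```

Small labels are blanked rather than dropped from the data, because removing rows from the text layer would shift the stacked label positions relative to the bars.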


1. Wickham H. plyr [Internet]. 2015. Available from:
2. Wickham H, Chang W. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics [Internet]. Springer New York; 2016. Available from: