Programming basics in R - The Basics

Tiago Maié

Fabio Ticconi

Johannes Schöneich

25 October 2023

Intro - Before you start

This tutorial contains different levels of difficulty that build on each other. If you skip a level, you may not manage to finish the next one.
Further, Level 1 must be done BEFORE the class takes place, because the class will start at Level 2. Don’t ruin your chances of getting further in the class, and of learning more about a subject you chose to learn, by skipping Level 1.

Please do Level 1 before the class takes place.

Must know:

  • Programming is simply giving the computer a basic “cooking” recipe to execute.
  • As with a cooking recipe, the steps have to be followed in order. If you were making a simple bread, you wouldn’t bake the water, then add the flour and call the result bread.
  • As with a cooking recipe, if you write each instruction in a readable way, it will be easier to read the “recipe” later on. “Tablespoon” and “tbsp” mean the same thing, but if you’ve never cooked before, or haven’t cooked for a while, the former is easier to understand than the latter.
  • The computer is not smart and (usually) cannot guess what you are trying to say. To keep the cooking analogies going, if you make a mistake such as adding grapes to a pizza (don’t do this, please!), in the end you will either get the wrong result or the computer will stop you (unfortunately no computer was present when they created spaghetti pizza).
  • If you follow the exact same recipe as the person next to you, you both will end up with the same exact result (unless you are working with random numbers).
  • As with cooking, with training, practice and imagination, you can speed up and spice up your recipes. As with cooking, don’t be afraid to get your hands dirty.
  • Cooking (and other) analogies will be used during this tutorial :) brace yourself!

For students being evaluated:

  • Every Exercise section is mandatory and should be submitted for evaluation.
  • The submission should be a single script per class/day (e.g. day1_exercises_[last name]-[first name].R).
  • This script should contain only the code with your solution to each Exercise section, plus comments that clearly state which exercise each solution corresponds to.

1 - Level

At this point you are supposed to have installed R and RStudio on your computer. If you haven’t done that yet, please do it before you move forward. You can install R from cran.rstudio.com and RStudio from rstudio.com (these sites will take you to the correct installation page).

1.1 R and RStudio

R is a (statistical) programming language. RStudio is the environment where you will be typing your code.
Imagine RStudio is like Word: it makes your text look pretty, but if you don’t put anything inside it, it doesn’t do much.
R is great if you want to do data analysis and produce state-of-the-art plots.

If you just want to write a general-purpose program, there are other languages that might be better suited to your needs, such as Python.
R is free for everyone to use. It runs on Windows, Linux and macOS, so it is very portable. It has a very large community that produces code, packages and libraries for you to use, and that can help you when you need it. More often than not, if you want to write some code and don’t know where to start, you can find help and examples online that you can use as the basis for your own code. Seriously: once you start programming, regardless of the language, you will spend most of your time either on the help page of some library you want to use or on Google trying to figure out a solution to some problem.
Websites like RDocumentation, StackOverflow, StatMethods and R-Bloggers will save you countless hours of problem solving.

1.2 The basics

The R console is a direct communication line with the “R computer”. Think of it as talking with R: if you type something in the R console and press Enter, R will hear your message, interpret it and give you a reply. R/RStudio has a very short memory. If you close R/RStudio and open it again at a later date, it will have no recollection of what you talked about last time.

A script is a simple text file that works as your cooking recipe. It will probably have several lines that are executed sequentially. Think of it as texting with R: if you write something in a script and save it, you can always look at it later. When you execute a script, you give the script to R, and R will look at the recipe, check it for basic mistakes, interpret it and give you a reply that follows the sequence in your recipe.

It is preferable that you write your code in a script because:

  • If you want to check, at a later date, which texts you sent to R, you can do so very easily by just opening the script;
  • If you want, at any point, to check what the complex recipe your great-great-grandmother-friendly bioinformatician shared with you does, you can do so very easily by just opening the recipe script and studying it line by line.

1.3 Exercise

Open a script in RStudio, write print("Hello world!") and save it.
Execute the script and check the result.
Congratulations, you wrote your first program! :)

1.4 Exercise submission

If you only had to submit the previous exercise and your name were “Marie Fischer”, your script would be named day1_exercises_Fischer-Marie.R
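Here is a sketch of what that file might contain. The comment style used to label the exercise is only a suggestion, not a required format:

```r
# Exercise 1.3
print("Hello world!")
```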


2 - Level

We will start with a very simple exercise that will show you that programming is basically like using a glorified calculator.

2.1 Values and operations

In this very simple example, a value can be any number: 1 is a value, 42 is a value, 0.9 is a value.
An operation is what you do to combine them. 1 + 1 is an operation, 100 / 0.9 is another operation.

2.2 Variables

In R the value of π (pi) is stored in the variable “pi”.
Type pi in the console, press enter/return and check the result.

In R you can assign a value to your own variable with the = character (you can do the same with <- but you can just use = for simplicity).

Code examples:

weekdays = 5
days_in_november = 30
beer_price = 5
x = 2.5
y = -9

my_first_name = 'Jane'
my_last_name = 'Doe'

is_weather_good = FALSE
is_cloudy_today = TRUE
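This small sketch illustrates the note above about <-. Both assignment styles store the same value. The variable names are made up for this example, and you will often see <- in R code found online:

```r
x_equals = 2.5        # assignment with =
x_arrow <- 2.5        # assignment with <-
x_equals == x_arrow   # both variables hold the same value
## [1] TRUE
```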

In R you have 3 basic types of values: numeric, character and logical.

33        # This is a value of type numeric
## [1] 33
'Three-3' # This is a value of type character
## [1] "Three-3"
TRUE      # this is a value of type logical
## [1] TRUE

A numeric value is any type of number. A character value is any character or combination of characters (like a word or a sentence), as long as it is enclosed in quotes (single ' or double "). A logical value can only take the value TRUE or FALSE.
Note that logical values are commonly known in programming as boolean values.
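If you are ever unsure of a value’s type, the base R function class() will tell you. Here it is applied to the three example values above:

```r
class(33)
## [1] "numeric"
class('Three-3')
## [1] "character"
class(TRUE)
## [1] "logical"
```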

2.3 Operations with variables

Usually, values of different types can NOT be combined with each other, but values of the same type can be combined among themselves. Basically, you can compute 2 + 2, paste “two” and “two” together to get “twotwo”, and have a condition or variable be TRUE or FALSE. However, you definitely can’t do 2 + ‘two’ + TRUE. You can definitely try, though :) Don’t be afraid to get your hands dirty.

Code examples:

2+'two'+TRUE # Errors often tell you what you are doing wrong, don't be scared
## Error in 2 + "two": non-numeric argument to binary operator
days_in_november = 30
days_in_december = 31
days_in_november + days_in_december
## [1] 61

As seen in the previous example, if you try to combine things that shouldn’t be combined, R will give you an error.

2.4 Exercise

Given the basic arithmetic operators below:

  • + Addition
  • - Subtraction
  • * Multiplication
  • / Division
  • ^ Exponentiation

Write the code to:

  1. compute 1 plus 3;
  2. compute 3 minus 1;
  3. compute 2 multiplied by 2;
  4. compute 4 divided by 2;
  5. compute 3 raised to the power of 2;

and write down the results.

Writing down the results is not very practical, so repeat the exercise, but this time save each result to a variable.

Given that:

  • Basket A has 11 red apples and 5 green apples where each, regardless of color, costs €1,
  • Basket B has 20 bananas where each costs €0.5,
  • Basket C has 9 pears and 3 green apples. Pears cost €0.8 each,
  • Basket D has 2 mangos, 3 oranges and 2 green apples. Mangos cost €3 each and oranges €1.5 each,
  • There are 5 baskets but Basket E is empty.
  • Baskets themselves cost €2 each.

You have just learned about assigning a value to a variable, so write the code to:

  • Assign each of the values described above to a variable with a name that makes sense (example: red_apples_A=5, bananas_price=0.5)

As another example, the code below correctly describes one of the baskets, but the variable names are too cryptic to understand.

qwe = 9
asd = 3
zxc = 'green'
rty = 0.8
fgh = 1

Bonus question: Can you imagine a different way to code the color of the apples using a logical value instead of a character value with a naming that makes sense?

Write the code to:

  • compute the mean number of fruits over all baskets;
  • compute the total number of fruits present in the baskets;
  • compute the difference in number of fruits present between baskets C and D;
  • compute the total cost of buying all baskets;
  • compute the average price of all baskets;
  • compute the price of Basket A plus Basket B;
  • create a logical variable for each basket stating whether it contains green apples.

You should know that different types of variables need different operators to be combined.
For logical variables we can use &&, || and !, which translate to AND, OR and NOT, respectively.
If you have A = TRUE; B = TRUE; C = FALSE and D = FALSE then:

  • for && (AND):
    • the result of A && B will be TRUE
    • the result of A && C will be FALSE
    • the result of C && D will be FALSE
  • for || (OR):
    • the result of A || B will be TRUE
    • the result of A || C will be TRUE
    • the result of C || D will be FALSE
  • for ! (NOT):
    • the result of !A will be FALSE
    • the result of !C will be TRUE

Code example:

is_weather_good = FALSE
is_cloudy_today = TRUE
is_weather_good && is_cloudy_today
## [1] FALSE
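Each rule in the list above can be checked directly in the console. This sketch reuses the variable names A, B, C and D from the list:

```r
A = TRUE; B = TRUE; C = FALSE; D = FALSE
A && B   # AND: both are TRUE
## [1] TRUE
A && C   # AND: one is FALSE
## [1] FALSE
C && D
## [1] FALSE
A || C   # OR: at least one is TRUE
## [1] TRUE
C || D   # OR: neither is TRUE
## [1] FALSE
!A       # NOT flips the value
## [1] FALSE
!C
## [1] TRUE
```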

For character variables things are a little more complicated. For now, let’s just say that you can’t combine them with simple operators; you need something a little more complex: a function.
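As a preview of such a function, the base R function paste0() glues character values together; this produces the “twotwo” from earlier. The function paste() does the same but puts a space between the values:

```r
paste0('two', 'two')
## [1] "twotwo"
my_first_name = 'Jane'
my_last_name = 'Doe'
paste(my_first_name, my_last_name)
## [1] "Jane Doe"
```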

2.5 Relationship between values

Even though we can’t (at the moment) combine character values together, we can do comparisons. This can be done, not only between characters, but also between the other types.

To define relationships (or comparisons) between values and variables, we need relational operators. These are:

  • < Less than
  • > Greater than
  • <= Less than or equal to
  • >= Greater than or equal to
  • == Equal to
  • != Not equal to (different from)

When we define a relationship, the result is a logical value.
The relationship defined as 2 > 4 is FALSE.
The relationship (1 + 1) == 2 is TRUE.
The relationship ‘two’ != 2 is TRUE.
The relationship 23 <= 25 is TRUE.

Since these relationships are logical values, we can use logical operators to combine them. The relationship (2 > 4) && (23 <= 25) is FALSE.
The relationship !(2 > 4) && (23 <= 25) is TRUE.

Code examples:

2 == 'two'
## [1] FALSE
2 != 'two'
## [1] TRUE
2 > 4
## [1] FALSE
(1 + 1) == 2
## [1] TRUE
23 <= 25
## [1] TRUE
(2 > 4) && (23 <= 25)
## [1] FALSE
!(2 > 4) && (23 <= 25)
## [1] TRUE
'jane' != 'doe'
## [1] TRUE
'apple' == 'orange'
## [1] FALSE
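The same operators also work on variables, and the logical result can itself be saved to a variable. This sketch reuses the month variables from earlier; is_december_longer is a made-up name:

```r
days_in_november = 30
days_in_december = 31
is_december_longer = days_in_december > days_in_november
is_december_longer
## [1] TRUE
is_december_longer && (days_in_november == 30)
## [1] TRUE
```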

2.6 Exercise

We analyzed two datasets for differential gene expression between cancer and control groups.

The cancer group has 10 patients, the control group has 6 patients.

cancer_group = 10
control_group = 6

With the variables defined above, compute the following relationships:

  • “the control group has more patients than the cancer group”
  • “the control group has fewer patients than the cancer group”
  • “the control group has the same number of patients as the cancer group”
  • “the total number of patients is less than 20”
  • “the total number of patients is more than 10”

3 - Level

Up until this point you have learned how to do arithmetic operations in R, how to define variables of different types and how to compute relationships between them.

We will now add vectors and matrices to what we’ve learned.

It might not have been clear up until now, but there are many ways to do the same task when you are programming or writing code. Sometimes (often!) there is not even a single optimal solution. That being said, we will show you, for example, how to create and select elements in a matrix, but there are other ways to do the same thing. Don’t be afraid to get your hands dirty! It is better to have a correct but ugly solution than no solution at all.

3.1 Vectors and matrices

To put it simply, a vector is a one-dimensional matrix. A matrix has two dimensions and is what people commonly call a table.
A matrix has n rows and m columns (n x m); a vector taken from that matrix will have 1 row and m columns (1 x m) or n rows and 1 column (n x 1). In R, vectors and matrices can only hold elements of a single type. This means that, given what we have learned, a vector or matrix will be composed of either numbers, characters or booleans (logical values), but not a mix of these. Note that if you try to create a vector with different types of values, they will all be converted automatically to a single type.

In R, to create a vector we use the notation c(x,y,z) where “x”, “y” and “z” are the elements (values or variables) inside the vector and “c()” is just the notation used so that R knows that we want to create a vector. If we want to create a vector with numbers 1 through 5 we could do:

c(1, 2, 3, 4, 5)
## [1] 1 2 3 4 5

If we want to create a vector with the first 5 letters of the English alphabet we could do:

c('a', 'b', 'c', 'd', 'e')
## [1] "a" "b" "c" "d" "e"

If we want to create a vector representing the days of the week, starting on Monday, as logical values where weekend days are TRUE and weekdays are FALSE we could do:

c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
## [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
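Here the automatic type conversion mentioned above can be seen in action. This short sketch mixes types inside c():

```r
c(1, 'a', TRUE)     # a character value is present, so everything becomes character: "1", "a", "TRUE"
c(1, TRUE, FALSE)   # numbers and logicals: TRUE becomes 1 and FALSE becomes 0
## [1] 1 1 0
```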

If we want to create a vector of p-values, we could do:

c(0.239, 0.913, 0.051, 0.043, 0.002, 0.115, 0.092)
## [1] 0.239 0.913 0.051 0.043 0.002 0.115 0.092

If we want to create a vector defining whether each of these p-values is below the usual threshold of 0.05, we could do:

c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
## [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
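Typing the TRUE/FALSE vector by hand works, but the relational operators from section 2.5 also work on a whole vector at once, comparing every element. The variable names p_values and is_significant are made up for this sketch:

```r
p_values = c(0.239, 0.913, 0.051, 0.043, 0.002, 0.115, 0.092)
is_significant = p_values < 0.05   # compares each p-value with 0.05
is_significant
## [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
```

To combine logical vectors like this one element by element, use & and | instead of && and ||, which are meant for single values.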

For integers specifically, there is an alternative and practical way to create a vector that will be very useful later on. Placing the character ‘:’ between two integers creates a vector containing every integer from the first to the second, inclusive.
If we write 5:10, the vector 5, 6, 7, 8, 9, 10 is created.

Code example:

c(5, 6, 7, 8, 9, 10)
## [1]  5  6  7  8  9 10
5:10
## [1]  5  6  7  8  9 10

Both examples above mean (basically) the same thing.
This will be very practical when we want to subset vectors/matrices.
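The “(basically)” refers to a small technical difference, sketched below with the base R functions class() and all(). 5:10 produces integer values, while c(5, 6, ...) produces decimal (“numeric”) values. The values themselves are equal, so for subsetting this makes no difference. The ‘:’ notation also counts downwards:

```r
class(5:10)
## [1] "integer"
class(c(5, 6, 7, 8, 9, 10))
## [1] "numeric"
all(5:10 == c(5, 6, 7, 8, 9, 10))   # every element is equal
## [1] TRUE
10:5
## [1] 10  9  8  7  6  5
```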

While you will very likely create vectors yourself, when it comes to matrices it is more likely that you will load one from a file (like an Excel table) than create one from scratch. Regardless, if you do need to create one, you will have to know at least two things: the elements of your matrix, written as a vector, and the number of rows that you want your matrix to have. To create a 2 x 2 matrix with letters ‘a’ through ‘d’ you could do:

my_matrix_elements = c('a','b','c','d')
my_matrix = matrix(data=my_matrix_elements, nrow=2)
my_matrix
##      [,1] [,2]
## [1,] "a"  "c" 
## [2,] "b"  "d"

In the same way, a 3 x 2 matrix would need 6 elements and nrow=3.

To create these matrices we use what we call a function. A function is a specific operation that takes specific arguments and gives us back a result. We will explore functions later on, but for now, to create a matrix you need to provide the argument “data” with the values you want in the matrix and the argument “nrow” with the number of rows you want it to have.
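Here is a concrete sketch of those two arguments: a 3 x 2 matrix built from the numbers 1 through 6. The base R function dim() reports the number of rows and columns:

```r
my_matrix_elements = 1:6
my_matrix = matrix(data=my_matrix_elements, nrow=3)
my_matrix
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
dim(my_matrix)   # number of rows, then number of columns
## [1] 3 2
```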

If we want to create a 3 x 3 matrix with numbers 1 through 9 we could do:

my_matrix_elements = 1:9
my_matrix = matrix(data=my_matrix_elements, nrow=3)
my_matrix
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

If we want to create a 2 x 5 matrix with numbers 1 through 10 we could do:

my_matrix_elements = 1:10
my_matrix = matrix(data=my_matrix_elements, nrow=2)
my_matrix
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10

Note the order of the elements in the matrices in the previous examples. Do they have the order that you would expect?

3.2 Exercise

Given the sequence of numbers 9, 8, 7, 1, 2, 3, 5, 4, 6, write the code to:

  • create a vector with the given sequence
  • create a matrix with 3 rows using the vector created above

You want to create long vectors and large tables, so write the code to:

  • create a vector with numbers 1 through 100 and assign it to a variable
  • create a table with 10 rows using the vector created in the previous line
  • create a vector with numbers 501 through 1000 and assign it to a variable
  • create a matrix with 10 rows using the vector created in the previous line
  • create a vector with numbers 2793 through 3292 and assign it to a variable
  • create a matrix with 50 rows using the vector created in the previous line

To add to what you have learned, there are more arguments that we can use in matrix(). Two notably important ones are “byrow”, which makes the matrix be filled by row instead of by column (the default), and “dimnames”, which allows us to define the row names and the column names of a matrix.
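This sketch combines both arguments; the row and column names are made up for this example. dimnames expects a list() (a base R function that groups several vectors together) with the row names first and the column names second:

```r
my_matrix = matrix(data=1:6, nrow=2, byrow=TRUE,
                   dimnames=list(c('row1', 'row2'), c('A', 'B', 'C')))
my_matrix
##      A B C
## row1 1 2 3
## row2 4 5 6
```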

3.3 Selecting and subsetting

We can perform operations over vectors and matrices as well as select or subset specific elements.
For example you might want to select the first 3 rows of a matrix or the very last element of a vector.

To subset a matrix or vector we can use the indices of the elements we want to select. From this point on, it is also better to almost always save values to variables, as it is easier to perform operations on a variable than on the value itself. In R, to select elements in a vector or matrix we use the notation “[]” (square brackets). This lets R know that we want to select elements of that variable.

3.3.1 Subsetting vectors

Given the following vector variable called “vector_a”:

vector_a = c(1,6,8,2,6,7,1,2,6,7,1,2,58,9,9,5,8,7,16,8,42,6,87,5,69,8,5)

Let us select the first 3 elements (1, 6 and 8) by putting the indices of the elements we want to select inside the square brackets:

vector_a[c(1,2,3)]
## [1] 1 6 8

Remember when we said that an alternative way to create a vector would come in handy? This is it!

vector_a[1:3]
## [1] 1 6 8

We can also combine both notations, for example if we wanted to select the first 10 elements and then the fifteenth:

vector_a[c(1:10,15)]
##  [1] 1 6 8 2 6 7 1 2 6 7 9

Further, we can save the indices we want to select as a variable and then use this variable to subset the original vector, for example if we want to select elements 10 through 15 of vector_a we could do:

keep_elements = 10:15
vector_a[keep_elements]
## [1]  7  1  2 58  9  9
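Selecting the very last element of a vector, mentioned at the start of this section, only requires knowing how many elements the vector has. Here is a sketch using the base R function length(); n_elements is a made-up variable name:

```r
vector_a = c(1,6,8,2,6,7,1,2,6,7,1,2,58,9,9,5,8,7,16,8,42,6,87,5,69,8,5)
n_elements = length(vector_a)   # how many elements vector_a has
n_elements
## [1] 27
vector_a[n_elements]            # the last element
## [1] 5
```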

3.4 Exercise

Given the sequence of characters ‘i’,‘a’,‘m’,‘l’,‘e’,‘a’,‘r’,‘n’,‘i’,‘n’,‘g’, write the code to:

  • count the number of characters in this sequence and save this value to a variable
  • create a vector with the given sequence
  • subset the first 3 elements of the vector created in the previous line
  • subset the elements 4 through to the last element (or in other words, select all but the first 3 elements) of the vector created in the previous line

3.4.1 Subsetting matrices

Subsetting a matrix is a bit more complicated because we need to select rows and columns. We will still use the “[]” notation. However, instead of having a single number inside the square brackets (vector_a[2]) or a sequence of numbers (vector_a[4:8] or vector_a[c(3,2,1)]), we now have several options, depending on what we want to select/subset.

If we want to select:

  • a single value:
    • we will have 2 numbers separated by “,” (my_matrix[2,1] selects the value in the 2nd row, 1st column).
  • a single row:
    • we will have 1 number followed by “,” (my_matrix[2,] selects the 2nd row).
  • a single column:
    • we will have a “,” followed by 1 number (my_matrix[,1] selects the 1st column).
  • two or more rows:
    • we will have a vector followed by “,” (my_matrix[2:3,] selects the 2nd and 3rd rows).
  • two or more columns:
    • we will have a “,” followed by a vector (my_matrix[,c(1,3)] selects the 1st and 3rd columns).
  • two or more rows and two or more columns:
    • we will have a vector followed by “,” followed by another vector (my_matrix[2:3,c(1,3)] gives us a new table made of the 2nd and 3rd rows and the 1st and 3rd columns of my_matrix).

For example if we have:

my_matrix_elements = 1:25
my_matrix = matrix(data=my_matrix_elements, nrow=5)
my_matrix
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    6   11   16   21
## [2,]    2    7   12   17   22
## [3,]    3    8   13   18   23
## [4,]    4    9   14   19   24
## [5,]    5   10   15   20   25

And we want to select the 1st, 3rd, 4th and 5th rows we could do:

my_matrix[c(1,3:5),]
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    6   11   16   21
## [2,]    3    8   13   18   23
## [3,]    4    9   14   19   24
## [4,]    5   10   15   20   25

If we wanted to select the 1st, 3rd, 4th and 5th columns instead, we could do:

my_matrix[,c(1,3:5)]
##      [,1] [,2] [,3] [,4]
## [1,]    1   11   16   21
## [2,]    2   12   17   22
## [3,]    3   13   18   23
## [4,]    4   14   19   24
## [5,]    5   15   20   25

If we want to select the 1st, 3rd, 4th and 5th rows and columns we could do:

my_matrix[c(1,3:5),c(1,3:5)]
##      [,1] [,2] [,3] [,4]
## [1,]    1   11   16   21
## [2,]    3   13   18   23
## [3,]    4   14   19   24
## [4,]    5   15   20   25
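This sketch covers the single-element cases from the list above, using the same 5 x 5 my_matrix. Selecting a single row or column gives back a plain vector rather than a one-row or one-column table:

```r
my_matrix = matrix(data=1:25, nrow=5)
my_matrix[2,1]   # value in the 2nd row, 1st column
## [1] 2
my_matrix[2,]    # the whole 2nd row
## [1]  2  7 12 17 22
my_matrix[,1]    # the whole 1st column
## [1] 1 2 3 4 5
```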

As should be clear by now, we can put these indices in variables so that the code is easier to read:

keep_rows = c(1,3)
keep_cols = 3:5
my_matrix[keep_rows,keep_cols]
##      [,1] [,2] [,3]
## [1,]   11   16   21
## [2,]   13   18   23

3.5 Exercise

Write down which rows and columns we have selected in the previous code example.

Write the code to:

  • create a vector with numbers 1 through 100 and assign it to a variable named my_vector
  • create a table with 10 rows using the vector created in the previous line and assign it to a variable named my_tbl
  • select the 4th row of my_tbl
  • select the 6th column of my_tbl
  • select the rows 5 through 10 and columns 2, 4, 6, 8 and 10 of my_tbl

4 - Level

4.1 Functions

If you remember high-school math, a function is like the simple operations defined above, except that we can give it some input.
For example, given the function f(x) = x + 1, we provide x and the function will give us x + 1. If we said that x = 2, then f(x) will return 3.
This is a very simple example, of course things can get much more complicated but the concept is still the same.
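As a preview (you will learn how to write your own functions properly later), the mathematical function f(x) = x + 1 could be written in R like this. The name f is just for illustration:

```r
f = function(x) {
  x + 1   # the last value computed is what the function returns
}
f(2)
## [1] 3
f(x=41)
## [1] 42
```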

If you have a lengthy recipe that you use often, it might be worth getting a kitchen robot (which happens to work just like a function). You provide it with ingredients, which can be few or many (the parameters of the function), and after executing its task the kitchen robot gives you a meal (returns the result). The cooking analogy breaks down here, because sometimes you will have to build your own kitchen robot. You will build your own robot later.

4.2 Exercise

Given the basic functions below:

  • abs(x) computes the absolute value of x;
  • sqrt(x) computes the square root of x;
  • log10(x) computes the logarithm of base 10 of x;
  • log2(x) computes the logarithm of base 2 of x.

Write the code to:

  • compute the absolute value of -42;
  • compute the square root of 9;
  • compute the logarithm of base 10 of 0.1;
  • compute the logarithm of base 2 of 1024;

and check the results.


In programming in general, a comment is a piece of text that will not be interpreted by the computer.
In R, a comment begins when you type the character #.

In R, when you execute a script, every line is processed, but not every result will necessarily be shown on the screen. To explicitly print a value, use the function print(x), which prints whatever the value of x is.

For example:

abs(x=-42) # The result should be 42
## [1] 42
sqrt(x=25) # The result should be 5
## [1] 5
print(x="Hello world!") # Note to self: world didn't reply
## [1] "Hello world!"

You can also start a line with #, this whole line will not be interpreted.

# This is a comment!
print(x="Hello world!") # This is a comment too!
## [1] "Hello world!"
# World still didn't reply
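This sketch shows in practice the difference between executing a script and typing in the console. It writes a tiny three-line script to a temporary file and runs it with source(), R’s function for executing a script file. The base R functions tempfile(), writeLines() and source() are used here only to simulate a saved script; you do not need them yet. Only the line using print() produces output:

```r
script_file = tempfile(fileext='.R')   # a temporary file to hold the mini script
writeLines(c('days_in_november = 30',
             'days_in_november          # not shown when the script is sourced',
             'print(days_in_november)   # shown'),
           script_file)
source(script_file)
## [1] 30
```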