This tutorial contains several levels of difficulty that build on each other. If you skip a level, you may not be able to finish the next one.
Level 1 should be done BEFORE the class takes place, because the class will start at Level 2. Don't ruin your chances of getting further in the class, and of learning more about the subject you chose to study, by skipping Level 1.
Please do Level 1 before the class takes place.
By now you should have installed R and RStudio on your computer. If you haven't done that yet, please do it before you move forward. You can install R from cran.rstudio.com and RStudio from rstudio.com (the links here will take you to the correct installation page).
R is a (statistical) programming language. RStudio is the environment where you will be typing your code.
Think of RStudio as something like Word: it makes your text look nice, but if you don't put anything in it, it doesn't do much.
R is great if you want to do data analysis and produce state-of-the-art plots.
If you just want to write a general-purpose program, other languages, such as Python, might be better suited to your needs.
R is free for everyone to use. It runs on Windows, Linux and macOS, so it is very portable. It also has a very large community that produces code, packages and libraries for you to use, and that can help you when you need it. More often than not, if you want to write some code and don't know where to start, you can find help and examples online that you can use as the basis for your own code. Seriously: once you start programming, in any language, you will spend most of your time either on the help page of some library you want to use or on Google, trying to figure out a solution to some problem.
Websites like RDocumentation, StackOverflow, StatMethods and R-Bloggers will save you countless hours of problem solving.
The R console is a direct communication line with the “R computer”. Think of it as talking with R: if you type something in the R console and press Enter, R will hear your message, interpret it and give you a reply. R/RStudio has a very short memory, though. If you close R/RStudio and open it again at a later date, it will have no recollection of what you talked about last time.
A script is a simple text file that works as your cooking recipe. It will usually have several lines that are executed sequentially. Think of it as texting with R: if you write something in a script and save it, you can always look at it later. When you want to execute a script, you give it to R, and R will read the recipe, check it for basic mistakes, interpret it and give you a reply that follows the sequence of steps in your recipe.
It is preferable that you write your code in a script because:
Open a script in RStudio, write print("Hello world!") and save it.
Execute the script and check the result.
Congratulations, you wrote your first program! :)
If you only had to submit the previous exercise and your name were “Marie Fischer”, your script file should be named something like day1_exercises_Fischer-Marie.R
We will start with a very simple exercise that will show you that programming is basically like using a glorified calculator.
In this very simple example, a value can be any given number: 1 is a value, 42 is a value, 0.9 is a value.
An operation is what you do to combine them. 1 + 1 is an operation, 100 / 0.9 is another operation.
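Typed into the console, each operation returns its result straight away. A minimal sketch using R's usual arithmetic operators (`+`, `-`, `*`, `/`, and `^` for powers); the numbers are just example values:

```r
1 + 1      # addition: 2
10 - 4     # subtraction: 6
6 * 7      # multiplication: 42
100 / 0.9  # division: printed as 111.1111
2^3        # power: 8
```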
In R the value of π (pi) is stored in the variable “pi”.
Type pi in the console, press enter/return and check the result.
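`pi` behaves like any other value, so you can use it in an operation. As an example (the radius of 2 is an arbitrary choice), the area of a circle with radius 2:

```r
pi          # prints 3.141593
pi * 2^2    # area of a circle with radius 2, prints 12.56637
```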
In R you can assign a value to your own variable with the = character (you can do the same with <- but you can just use = for simplicity).
Code examples:
weekdays = 5
days_in_november = 30
beer_price = 5
x = 2.5
y = -9
my_first_name = 'Jane'
my_last_name = 'Doe'
is_weather_good = FALSE
is_cloudy_today = TRUE
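Since `<-` does the same job as `=`, both styles of assignment below work. Once a value is stored in a variable, you can use the variable anywhere you would use the value (x and y are the example names from above):

```r
x <- 2.5   # assignment with <-
y = -9     # assignment with =
x + y      # -6.5
x * 2      # 5
```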
R has several types of values, but the three basic ones you will use most are: numeric, character and logical.
33 # This is a value of type numeric
## [1] 33
'Three-3' # This is a value of type character
## [1] "Three-3"
TRUE # this is a value of type logical
## [1] TRUE
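If you are ever unsure of the type of a value or variable, the built-in function `class()` tells you (functions are explained further down). A short sketch:

```r
class(33)          # "numeric"
class('Three-3')   # "character"
class(TRUE)        # "logical"
```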
A numeric value is any kind of number. A character value is any character or combination of characters (like a word or a sentence), as long as it is between quotes (single ' or double "). A logical value can only be TRUE or FALSE.
Note that logical values are commonly known in programming as boolean values.
Usually, values of different types can NOT be combined with each other, but values of the same type can. Basically, you can do 2 + 2, paste “two” and “two” together to get “twotwo”, and have a condition or variable be TRUE or FALSE. However, you definitely can't do 2 + 'two' + TRUE. (One exception worth knowing: in arithmetic, R treats TRUE as 1 and FALSE as 0, so TRUE + 2 gives 3.) You can still definitely try :) Don't be afraid to get your hands dirty.
Code examples:
2+'two'+TRUE # Errors often tell you what you are doing wrong, don't be scared
## Error in 2 + "two": non-numeric argument to binary operator
days_in_november = 30
days_in_december = 31
days_in_november + days_in_december
## [1] 61
As seen in the previous example, if you try to combine things that shouldn’t be combined, R will give you an error.
Given the basic arithmetic operators below:
Write the code to:
and write down the results.
Writing down the results is not very practical. Repeat the exercise, but this time save each result to a variable.
Given that:
You have just learned how to assign a value to a variable. Write the code to:
As another example, the code below correctly describes one of the baskets, but the variable names are too cryptic to understand.
qwe = 9
asd = 3
zxc = 'green'
rty = 0.8
fgh = 1
Bonus question: can you think of a different way to code the color of the apples, using a logical value instead of a character value and a variable name that makes sense?
Write the code to:
You should know that different types of variables need different operators to be combined.
For logical variables we can use &&, || and !, which mean AND, OR and NOT, respectively.
If you have A = TRUE; B = TRUE; C = FALSE and D = FALSE then:
is_weather_good = FALSE
is_cloudy_today = TRUE
is_weather_good && is_cloudy_today
## [1] FALSE
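The same operators work for the four variables A, B, C and D defined above. A small sketch of a few combinations:

```r
A = TRUE; B = TRUE; C = FALSE; D = FALSE
A && B   # TRUE: both are TRUE
A && C   # FALSE: C is FALSE
A || C   # TRUE: at least one of them is TRUE
C || D   # FALSE: neither of them is TRUE
!C       # TRUE: NOT FALSE is TRUE
```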
For character variables things are a little more complicated. For now, let's just say that you can't easily combine them with simple operators; you need something a little more complex: a function.
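As a preview, the built-in function `paste0()` glues character values together without a space, and `paste()` does the same with a space in between. This is how you would get “twotwo”:

```r
paste0('two', 'two')   # "twotwo"
paste('two', 'two')    # "two two"
```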
Even though we can’t (at the moment) combine character values together, we can do comparisons. This can be done, not only between characters, but also between the other types.
To define relationships (or comparisons) between values and variables, we need relational operators. These are:
When we define a relationship, the result is a logical value.
The relationship defined as 2 > 4 is FALSE.
The relationship (1 + 1) == 2 is TRUE.
The relationship 'two' != 2 is TRUE.
The relationship 23 <= 25 is TRUE.
Since these relationships are logical values, we can use logical operators to combine them. The relationship (2 > 4) && (23 <= 25) is FALSE.
The relationship !(2 > 4) && (23 <= 25) is TRUE.
Code examples:
2 == 'two'
## [1] FALSE
2 != 'two'
## [1] TRUE
2 > 4
## [1] FALSE
(1 + 1) == 2
## [1] TRUE
23 <= 25
## [1] TRUE
(2 > 4) && (23 <= 25)
## [1] FALSE
!(2 > 4) && (23 <= 25)
## [1] TRUE
'jane' != 'doe'
## [1] TRUE
'apple' == 'orange'
## [1] FALSE
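A detail worth knowing: when you compare a number with a character value, R first converts the number to a character. That is why 2 == 'two' is FALSE, while 2 compared with '2' is TRUE. Character values are compared alphabetically, character by character, so '10' counts as smaller than '9':

```r
2 == '2'     # TRUE: 2 is converted to '2' before comparing
'10' < '9'   # TRUE: '1' comes before '9' alphabetically
10 < 9       # FALSE: numbers are compared as numbers
```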
We analyzed two datasets for differential gene expression between cancer and control groups.
The cancer group has 10 patients, the control group has 6 patients.
cancer_group = 10
control_group = 6
With the variables defined above, compute the following relationships:
Up until this point you have learned how to do arithmetic operations in R, how to define variables of different types and how to compute relationships between them.
We will now add vectors and matrices to what we have learned.
It might not have been clear until now, but there are many ways to do the same task when you are programming. Sometimes (many times, actually) there is not even a single optimal solution. That being said, we will show you, for example, how to create a matrix and select its elements, but there are other ways to do the same thing. Don't be afraid to get your hands dirty! It is better to have a correct but ugly solution than no solution at all.
To put it simply, a vector is a one-dimensional matrix. A matrix has two-dimensions and is what people commonly call a table.
A matrix has n rows and m columns (n x m), a vector taken from that matrix will have 1 row and m columns (1 x m) or n rows and 1 column (n x 1). In R, vectors and matrices can only have elements of a single type. This means that, given what we have learned, a vector or matrix will be composed of either numbers, characters or booleans (logical variables) but not a mix of these. Note that if you try to create a vector with different types of values, they will all be converted automatically to a single type.
In R, to create a vector we use the notation c(x,y,z) where “x”, “y” and “z” are the elements (values or variables) inside the vector and “c()” is just the notation used so that R knows that we want to create a vector. If we want to create a vector with numbers 1 through 5 we could do:
c(1, 2, 3, 4, 5)
## [1] 1 2 3 4 5
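Going back to the note that a vector holds a single type: if you mix types inside c(), R converts everything to one type. In this sketch, character wins over the others, and logical values become 1 and 0 when mixed with numbers:

```r
c(1, 'a', TRUE)     # "1" "a" "TRUE": everything became character
c(1, TRUE, FALSE)   # 1 1 0: TRUE became 1, FALSE became 0
```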
If we want to create a vector with the first 5 letters of the English alphabet, we could do:
c('a', 'b', 'c', 'd', 'e')
## [1] "a" "b" "c" "d" "e"
If we want to create a vector representing the days of the week, starting on Monday, as logical values where weekend days are TRUE and weekdays are FALSE we could do:
c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE
If we want to create a vector of p-values, we could do:
c(0.239, 0.913, 0.051, 0.043, 0.002, 0.115, 0.092)
## [1] 0.239 0.913 0.051 0.043 0.002 0.115 0.092
If we want to create a vector defining whether each of these p-values is below the usual threshold of 0.05, we could do:
c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
## [1] FALSE FALSE FALSE TRUE TRUE FALSE FALSE
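Typing the TRUE/FALSE values by hand is error-prone. Relational operators also work on whole vectors, element by element, so R can compute this vector for you (p_values is just a name chosen for this sketch):

```r
p_values = c(0.239, 0.913, 0.051, 0.043, 0.002, 0.115, 0.092)
p_values < 0.05   # FALSE FALSE FALSE TRUE TRUE FALSE FALSE
```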
For integers specifically, there is an alternative and practical way to create a vector that will be very useful later on. If you put a colon (:) between two integers, R creates a vector with every integer from the first number to the second, both included.
For example, 5:10 creates the vector 5, 6, 7, 8, 9, 10.
Code example:
c(5, 6, 7, 8, 9, 10)
## [1] 5 6 7 8 9 10
5:10
## [1] 5 6 7 8 9 10
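Both examples above give the same values. (Strictly speaking, 5:10 creates integers while c(5, 6, ...) creates decimal numbers, but for now you can treat them as the same.)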
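The colon also counts down when the first number is larger, and for steps other than 1 there is the built-in function `seq()`. A short sketch:

```r
10:5                 # 10 9 8 7 6 5
seq(1, 10, by = 2)   # 1 3 5 7 9
```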
This will be very practical when we want to subset vectors/matrices.
While you will very likely create vectors yourself, when it comes to matrices you are more likely to load one from a file (like an Excel table) than to create one from scratch. Still, if you do need to create one, you have to know at least two things: the elements of your matrix, written as a vector, and the number of rows you want your matrix to have. To create a 2 x 2 matrix with the letters 'a' through 'd', you could do:
my_matrix_elements = c('a','b','c','d')
my_matrix = matrix(data=my_matrix_elements, nrow=2)
my_matrix
## [,1] [,2]
## [1,] "a" "c"
## [2,] "b" "d"
To create a 3 x 2 matrix
To create this matrix we use what is called a function. A function is a specific operation that takes specific arguments and gives us back a result. We will explore functions later on, but for now, to create a matrix you need to provide the argument “data” for the values you want in the matrix and the argument “nrow” for the number of rows you want it to have.
If we want to create a 3 x 3 matrix with numbers 1 through 9 we could do:
my_matrix_elements = 1:9
my_matrix = matrix(data=my_matrix_elements, nrow=3)
my_matrix
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
If we want to create a 2 x 5 matrix with numbers 1 through 10 we could do:
my_matrix_elements = 1:10
my_matrix = matrix(data=my_matrix_elements, nrow=2)
my_matrix
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 5 7 9
## [2,] 2 4 6 8 10
Note the order of the elements in the matrices in the previous examples. Do they have the order that you would expect?
Given the sequence of numbers 9, 8, 7, 1, 2, 3, 5, 4, 6, write the code to:
You want to create long vectors and large tables. Write the code to:
To add to what you have learned, there are more arguments we can use in matrix(). Two particularly important ones are “byrow”, which makes the matrix fill by row instead of by column (the default), and “dimnames”, which lets us define the row names and the column names of the matrix.
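A sketch using both arguments. The matrix name and the row and column names are made up for illustration; dimnames expects a list with the row names first and the column names second:

```r
named_matrix = matrix(data = 1:6, nrow = 2, byrow = TRUE,
                      dimnames = list(c('gene_1', 'gene_2'),                      # row names
                                      c('sample_A', 'sample_B', 'sample_C')))     # column names
named_matrix                          # row gene_1 holds 1 2 3, row gene_2 holds 4 5 6
named_matrix['gene_2', 'sample_C']    # 6: names can be used instead of indices
```

Compare this with the earlier 2 x 5 example: without byrow = TRUE, the first row would have been 1 3 5.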
We can perform operations over vectors and matrices as well as select or subset specific elements.
For example you might want to select the first 3 rows of a matrix or the very last element of a vector.
To subset a matrix or vector, we can use the indices of the elements we want to select. From this point on, it is also a good idea to almost always save values to variables, as it is easier to work with a variable than with the value itself. In R, to select elements of a vector or matrix we use square brackets, “[]”. They tell R that we want to select elements of that variable.
Given the following vector variable called “vector_a”:
vector_a = c(1,6,8,2,6,7,1,2,6,7,1,2,58,9,9,5,8,7,16,8,42,6,87,5,69,8,5)
Let us select the first 3 elements (1, 6 and 8) by putting the indices of the elements we want inside the square brackets:
vector_a[c(1,2,3)]
## [1] 1 6 8
Remember when we said that the alternative way of creating a vector would come in handy? This is it!
vector_a[1:3]
## [1] 1 6 8
We can also combine both notations, for example if we wanted to select the first 10 elements and then the fifteenth:
vector_a[c(1:10,15)]
## [1] 1 6 8 2 6 7 1 2 6 7 9
Further, we can save the indices we want to select as a variable and then use this variable to subset the original vector, for example if we want to select elements 10 through 15 of vector_a we could do:
keep_elements = 10:15
vector_a[keep_elements]
## [1] 7 1 2 58 9 9
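Two more tricks are handy here. The function `length()` tells you how many elements a vector has, which gives you the last element without counting by hand, and a minus sign in front of an index removes that element instead of selecting it:

```r
vector_a = c(1,6,8,2,6,7,1,2,6,7,1,2,58,9,9,5,8,7,16,8,42,6,87,5,69,8,5)  # same as above
length(vector_a)             # 27
vector_a[length(vector_a)]   # 5, the last element
vector_a[-1]                 # every element except the first
```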
Given the sequence of characters 'i', 'a', 'm', 'l', 'e', 'a', 'r', 'n', 'i', 'n', 'g', write the code to:
Subsetting a matrix is a bit more complicated because we need to select rows and columns. We still use the “[]” notation, but instead of a single number inside the square brackets (vector_a[2]) or a sequence of numbers (vector_a[4:8] or vector_a[c(3,2,1)]), we have several options, depending on what we want to select.
If we want to select:
For example if we have:
my_matrix_elements = 1:25
my_matrix = matrix(data=my_matrix_elements, nrow=5)
my_matrix
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 6 11 16 21
## [2,] 2 7 12 17 22
## [3,] 3 8 13 18 23
## [4,] 4 9 14 19 24
## [5,] 5 10 15 20 25
And we want to select the 1st, 3rd, 4th and 5th rows we could do:
my_matrix[c(1,3:5),]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 6 11 16 21
## [2,] 3 8 13 18 23
## [3,] 4 9 14 19 24
## [4,] 5 10 15 20 25
If we wanted to select the 1st, 3rd, 4th and 5th columns instead, we could do:
my_matrix[,c(1,3:5)]
## [,1] [,2] [,3] [,4]
## [1,] 1 11 16 21
## [2,] 2 12 17 22
## [3,] 3 13 18 23
## [4,] 4 14 19 24
## [5,] 5 15 20 25
If we want to select the 1st, 3rd, 4th and 5th rows and columns we could do:
my_matrix[c(1,3:5),c(1,3:5)]
## [,1] [,2] [,3] [,4]
## [1,] 1 11 16 21
## [2,] 3 13 18 23
## [3,] 4 14 19 24
## [4,] 5 15 20 25
As should be clear by now, we can put these indices in variables so that the code is easier to read:
keep_rows = c(1,3)
keep_cols = 3:5
my_matrix[keep_rows,keep_cols]
## [,1] [,2] [,3]
## [1,] 11 16 21
## [2,] 13 18 23
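To pick out a single element, give one row index and one column index. A sketch using the same 5 x 5 matrix as above:

```r
my_matrix = matrix(data = 1:25, nrow = 5)   # same matrix as above
my_matrix[2, 3]   # 12: row 2, column 3
my_matrix[5, ]    # 5 10 15 20 25: the whole 5th row, returned as a vector
```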
Write down which rows and columns we have selected in the previous code example.
Write the code to:
If you remember high-school math, a function is like the operations defined above, except that it takes some input that you provide.
For example, given the function f(x) = x + 1, we provide x and the function will give us x + 1. If we said that x = 2, then f(x) will return 3.
This is a very simple example, of course things can get much more complicated but the concept is still the same.
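Just to connect the math notation to R, here is how f(x) = x + 1 could be written. Don't worry about the details of function() yet; writing your own functions comes later:

```r
f = function(x) {
  x + 1   # the last value computed is what the function returns
}
f(2)        # 3
f(x = 41)   # 42
```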
If you have a lengthy recipe that you use often, it might be worth getting a kitchen robot (which happens to work just like a function). You provide it with ingredients, which can be few or many (the parameters of the function), and after carrying out its task the kitchen robot gives you a meal (returns the result). The cooking analogy breaks down here, because sometimes you will have to build your own kitchen robot. You will build your own robot later.
Given the basic functions below:
Write the code to:
and check the results.
In programming in general, a comment is a piece of text that will not be interpreted by the computer.
In R, a comment begins when you type the character #.
In R, when you execute a script, every line is processed, but not every result will necessarily be shown on the screen. To explicitly print a variable, use the function print(x), which prints the value of x.
For example:
abs(x=-42) # The result should be 42
## [1] 42
sqrt(x=25) # The result should be 5
## [1] 5
print(x="Hello world!") # Note to self: world didn't reply
## [1] "Hello world!"
You can also start a line with #; then the whole line will not be interpreted.
# This is a comment!
print(x="Hello world!") # This is a comment too!
## [1] "Hello world!"
# World still didn't reply