STA1007S Lab 1: R Interface: Getting Started


STA1007S Lab 1: R interface

SUBMISSION INSTRUCTIONS:

Your answers need to be submitted on Vula.

Go into the Submissions section and click on Lab Session 1 to access the submission form. Please note that
the answers get automatically marked and so have to be in the correct format:

ENTER YOUR ANSWERS TO 2 DECIMAL PLACES UNLESS THE ANSWER IS A ZERO OR AN INTEGER
(for example, if the answer is 0 you just enter 0 and not 0.00, or if the answer is 2 you enter 2 and not 2.00).

DO NOT INCLUDE ANY UNITS (e.g. meters, mg).

PROBABILITIES MUST BE BETWEEN 0 AND 1, SO A 50% CHANCE WOULD CORRESPOND TO A
PROBABILITY OF 0.5.

Getting started
R is one of the most general statistical software packages available. You are unlikely to come across a statistical
technique that R cannot do. For this reason, R has become the standard tool for doing statistical analyses in
many fields. Being able to run R will be a distinct advantage when you apply for work as a scientist.
One big advantage of R is that it is free. Whatever you do after you finish your studies, you will have access
to R. In this course, our aim is to empower you to use it. And this is the tricky bit: R is a
command-driven program, so most people find the learning curve steep at first. However, our labs are
designed to give you a gentle introduction to R. By the end of the course, we want you to be comfortable
with the basics of the software and be able to carry out basic statistical calculations with it. You will most
likely be using R again in some of your other subject-specific courses and so the STA1007S labs are meant to
prepare you for your studies and give you a tool that will be very useful whenever you have to work with
data.

What files do you need?


Because R is command driven, it allows you to organise your data-related work in a very efficient way. All
you need for your data analyses is a data file and a file with R code. In the R code file (we call it the R
script), you are going to tell R where to find the data, how to read the data file and what kind of analysis
you want R to carry out. This is all you need to save because you will always be able to reproduce the exact
same analysis by rerunning the code.

First steps with R


Now let’s start using R. If you are working on a computer where R is not yet installed, follow the instructions
for downloading and installing R and RStudio in the file ‘Installing_R_and_RStudio.pdf’ that can be found
on Vula, Resources, Miscellaneous, R.

1. Let’s start by making a new folder in a convenient location on your computer. Give this folder an
informative name – perhaps “MySTA1007analyses”. You will save all relevant files for the lab sessions
in this folder.

2. Once you’ve done this, double-click the R shortcut on your desktop:

This opens R and presents a window called the R console with some information about the version of R you
are using etc.:

The R console is one interface you could use to communicate with R. We are going to use a different interface
in a moment but let’s look at the console first just to get a feeling for how R works. R expects commands
from us. These need to be entered after the prompt (the red ‘>’). To see how it works let’s try some basic
arithmetic. Type the following command and hit enter:
5+2

## [1] 7
You’ve asked R to calculate 5 plus 2 and it returns the answer, 7, in the next line in blue colour. A new
prompt appears and R is ready for the next command.
Hopefully, you will see that 5+2 above is in a grey box, although the box might print out quite light and
difficult to distinguish from the background. You will also notice that the font changes to something like
this. This type setting represents code, which you must type into R, unless otherwise stated. The output
will usually follow, so that you can check that you are getting the right response from R. The output will be
preceded by a ## sign, like the 7 above. The number [1] between the square brackets is just an index that
we don’t have to worry about for now.
Now you can type the following command and hit enter:
sqrt(2)

## [1] 1.414214
You’ve asked R to calculate the square root of 2. sqrt() is what we call an R function. R has many built-in
functions and you may also create new, customized ones. Functions take in arguments (in the previous
example the number 2 would be an argument) and output some result (in the previous example the square
root of 2 is the output). We’ll see plenty of functions during the labs.
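
As a small sketch of functions and their arguments, the lines below use sqrt() together with round(). round() is a built-in function that is not covered in this lab and is shown here only as an illustration. It takes two arguments: the number to round and, through digits, the number of decimal places to keep.

```r
sqrt(16)                      # one argument: the number 16
round(1.414214, digits = 2)   # two arguments, separated by a comma
round(sqrt(2), digits = 2)    # a function inside a function: R works out sqrt(2) first
```

The last line should return 1.41, which is the kind of rounding the submission form asks for.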

Then, we have “operators”. Operators also give instructions to R, but they are simpler to write than functions:
they sit between the values they act on instead of taking arguments in round brackets. For example, we have the common arithmetic operators:
• Addition: +
• Subtraction: -
• Multiplication: *
• Division: /
• Exponentiation: ^
You already know these. Let’s try a new one. Type in:
1:10

## [1] 1 2 3 4 5 6 7 8 9 10
The : operator creates a sequence of numbers; in this case from 1 to 10. This will be very useful for indexing
in coming labs; for example, if you wanted to select the first 10 rows of your data.
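
To give a rough idea of what indexing with : looks like, the sketch below uses letters, a built-in R object holding the 26 lowercase letters, as a stand-in for real data. The square brackets, which are only covered in later labs, pick out the positions listed inside them.

```r
10:1             # a sequence can also count downwards
letters[1:3]     # the first three letters: "a" "b" "c"
letters[24:26]   # the last three letters: "x" "y" "z"
```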
OK, great, so R is a (very complicated) calculator. Let’s make things a bit more interesting. Apart from
calculating things, R can also store things that, in general, we call “objects”. There are many different types of
objects, but for now we will look at only a few basic ones. A very simple object might be just a number. For
example, type in the following command:
x <- 2

Nothing happened? We know that the code was executed because a new prompt appeared under our code
and we didn’t get any error messages from R. We have just created an object x and assigned the
value 2 to it, using the <- operator. Even though R doesn’t say anything, our object x is now stored in R’s
“memory” and we can work with it as if it were a variable with value 2. If we type in the following command:
x + 3

## [1] 5
We obtain the value 5, which is just 2 + 3. In fact, we can just retrieve the value of the variable x by typing:
x

## [1] 2
Now, we can take this further and play around with different variables. For example, type in the following
code (press enter after each line):
my.income <- 15000
my.expenses <- 12000
my.savings <- my.income - my.expenses
my.savings

## [1] 3000
All three variables (my.income, my.expenses and my.savings) are stored in R’s memory and we can call
them at any time by writing their names.
my.income

## [1] 15000
But we must be very careful: if we assign a different value to any of these variables, R will overwrite the old value
without any warning! Type the following lines and press enter after each one of them:
my.expenses <- 15000
my.savings <- my.income - my.expenses
my.savings

## [1] 0
Our savings are gone! We have overwritten the object my.expenses and R didn’t say anything. Well, you
get the idea; for R, variables are just variables and you can assign any value to them at any time. Be careful
when picking names for your objects and try to keep track of them. RStudio is a different interface we use to
communicate with R and it makes it easier to keep track of the objects you create. We’ll see that in a minute.
Also, writing code in the R console is not very effective. Instead of feeding R commands one-by-one, we can
type up a sequence of commands that we’ll pass on to R. This is called an R script and it usually contains all
the commands that we require to conduct some analysis. Writing clean and organized scripts is extremely
important, as we will see shortly. R has an in-built script editor but RStudio makes it easier to write and run
code in an organized way.
We are now moving to this more user-friendly way of communicating with R. Close R and when it asks you
whether you want to save the work space, say ‘no’.

R Scripts
In this course, we will use R through the RStudio interface. RStudio is really just an interface that makes
controlling R easier.

1. Double-click this icon on your desktop to open RStudio:


2. RStudio opens. Go to File → New File and click on ‘R script’.

You should now see something like this:

The RStudio window is divided into four sections (if you only see three sections, go to File → New File → R
Script). The bottom left section should look familiar to you. It is the R console that we just looked at, with
the information on the R version and the command prompt. The top left window currently looks just blank.
This is the script editor and we are going to use it next.

Starting your first R script
An R script is a collection of R commands and we are going to type these commands into the script editor,
i.e. the top left window in RStudio. You will see that writing something here and pressing enter doesn’t
run any code; it just jumps to the next line, like in any other text editor. That’s exactly what the script
is for. We can write several lines of code and run them all at once when we are ready. But before we let
you loose on R commands, we introduce you to another important aspect of scripts: annotation (comments).
Annotation contains information for you (not for R), to remind you what a particular R script does. Any
information you don’t want R to read and process should be preceded by a #. When R sees a #, it ignores
the rest of the line and does not attempt to do anything with it.
It is good practice to start every script with some annotation – information that helps you remember what
the script does. In the script editor (top left window), type something like this (don’t worry about running
the code just yet):
# Amazing R User (your name)
# 23rd August (today's date)
# This script contains some commands I learned in STA1007S Lab 1

You see there is no output for this code chunk. That is because even if we ran the code, R would not do
anything with it: because of the # signs, R treats these lines as plain text, not commands.
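
A # does not have to be at the start of a line. In the sketch below (the object name total is just an example), R runs the code to the left of the # and ignores everything to the right of it.

```r
# A whole-line comment: R skips this line completely
total <- 5 + 2   # an end-of-line comment: only the code before the # is run
total            # prints the value stored in total
```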
Now save this script to the folder you created earlier (“MySTA1007analyses” or whatever you called it) using
File → Save. Call it lab1script.R (or something sensible). Avoid using special characters (#, $, %, &, *, +,
etc) or spaces in the name. This might cause trouble down the line. The file is just a plain text file but giving
it the extension ‘.R’ will tell your operating system to open this file in RStudio in the future.
We’ll now look at a few really useful R functions that you will be using at the start of almost every script.
First, we want to clear R’s “brain” to be sure there are no objects in our work space that might interfere
with what we want to do. Remember when we created my.income, my.expenses and my.savings? We don’t
want to have any of those lying around causing trouble. It’s like cleaning the kitchen table before starting to
prepare your meal. To do this, add the following line to your script file (again, don’t worry about running
the code yet, for now, just type in the commands):
# Clear my work space
rm(list = ls())

We have taken a bit of a leap here. Don’t worry if you don’t fully understand; you will soon enough. This
is nothing more than nesting one function inside another. To explain what this does, let’s read it from the
inside out. The function ls() returns the names of all the objects in R’s memory, and list = ls() passes these
names to rm() through its list argument. Finally, rm() removes all of the objects named. Note the annotation before the actual code.
This will remind us what the next commands are intended to do and why we want to execute them. This is
EXTREMELY useful when we need to understand a script that we haven’t looked at for a while.
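
The two steps can be watched separately in the sketch below. It is only an illustration and makes one assumption beyond this lab: the code is wrapped in local(), a built-in function that runs commands in a temporary work space, so that trying it out does not wipe the objects in your real work space. The name leftover is made up for this example.

```r
leftover <- local({
  my.income <- 15000
  my.expenses <- 12000
  print(ls())        # step 1: ls() returns the names "my.expenses" "my.income"
  rm(list = ls())    # step 2: rm() removes every object named in that list
  ls()               # ask again: no names are left
})
leftover             # character(0) is R's way of showing an empty set of names
```

In your own script you do not need local(); the single line rm(list = ls()) does both steps on your real work space.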

Setting a working directory


Next, we need to be able to tell R where to look for our data and where to save any files. The location where
files are saved on a computer is called a path. When you want to either save a file or access an existing file, R
will look in the path that is set as the working directory. To find out where R is currently looking, we can
use the function getwd(). Now you can type the following lines of code (still don’t run the code, just type as
if it was normal text):
# Find out what directory R is working with
getwd()

Everyone might get a slightly different output from this function. The name of the function getwd stands for
get working directory. This is the path on your computer where R is currently expecting to get things from.
It is usually not the location where your files are actually stored!

Luckily, we can change it easily with the function setwd(), for set working directory. Inside the round
brackets, you need to give R the path to your folder. There are easy ways to get the path without having
to type it. Mistyping the path is one of the most common (and most frustrating) problems when you start
using R. So how do you find the path to a folder or file on your computer?

1. Open Windows Explorer and navigate to the folder you created at the beginning of the lab
(“MySTA1007analyses” or similar).
2. Windows Explorer will display the path in the address bar when you click on the left corner of this bar
(see figure below).
3. You can highlight the path in the address bar with your mouse and copy it with ctrl-c.

In Windows, the path starts with a drive letter and the folders are then separated by backslashes
(e.g. C:\user\MySTA1007analyses). R conforms to the more general convention (shared e.g. with Mac and
Unix operating systems) of separating the folders with a forward slash (e.g. C:/user/MySTA1007analyses).
So, when you copy the path from Windows Explorer, you will need to change the backslashes (\) to forward
slashes (/).
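
If you prefer to let R do the conversion, one possible approach is sketched below, using an example path like the one above. Note that inside an R string a single backslash has to be typed as two, which is why a path pasted straight from Windows Explorer does not work. gsub() is a built-in function that replaces one piece of text with another; it is not covered in this lab, and the object names are made up for this example.

```r
# The Windows path C:\user\MySTA1007analyses, typed as an R string
windows.path <- "C:\\user\\MySTA1007analyses"

# Replace every backslash with a forward slash
# (fixed = TRUE means the pattern is treated as plain text)
r.path <- gsub("\\", "/", windows.path, fixed = TRUE)
r.path
```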
Now change the slashes in the path you copied and type the following command in your R script. Replace the --- with the correct path
to your working directory and enclose the whole path in double quotes:
# Change working directory to STA1007 labs folder
setwd("---/MySTA1007analyses")

You can then again use:


# Confirm that R is now looking in the correct folder
getwd()

to verify that R is now looking in the correct place.
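
The same sequence can be tried without touching your real folders. The sketch below is only an illustration: in place of your own path it uses tempdir(), a built-in function that returns a temporary folder R creates for each session, and the name old.dir is made up for this example.

```r
old.dir <- getwd()    # remember where R is currently looking
setwd(tempdir())      # point R at the temporary folder instead
getwd()               # confirm that the working directory has changed
setwd(old.dir)        # switch back to the original folder
getwd()               # confirm that we are back where we started
```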


Your script should now look something like this:

Make sure you save the script file frequently. Together with the raw data, this is the most important file in
your R-life.
Alternatively, you can also use the RStudio drop down menus to set the working directory. To do this click
on the “Session” option and then select “Set Working Directory” followed by “Choose Directory” and then
browse to the location that you want.

Running your R script


OK, so now we’ve created a script, but we still need to run it! In the console window (the bottom left-hand
one), you can see the prompt (> and a cursor flashing after it). Now go back to the script window (top left).
Select the first few lines of your script (the ones starting with #) with your mouse. This should highlight
them. Then click on the ‘Run’ button just above the top right corner of the script window. Now look at the
console window again. RStudio has sent the highlighted lines of code to the R console, which echoed the
text back to you and otherwise ignored it. We know this because R did not return any errors and is
showing the prompt again. Since these lines contained only annotation, this is what we had expected. R is
ready for the next commands.
Now select the lines of the script with the rm(list=ls()) and the getwd() commands, run them and look
at the console. You’ve asked R to clear its memory and tell you where it is looking, and it returns the path
that it currently uses. Now run the line with setwd() to change where R is looking. You can check that it
has done this by running the getwd() again.
Great! You’ve run your first R script and you are starting to get a feeling for how to steer software via
commands – a very powerful way of communicating with your computer.

RStudio projects
We can anticipate that dealing with working directories might cause some trouble. This is especially true when
we are working on multiple computers and we want our scripts to work smoothly on all of them. So, what
should we do when our scripts are in different directories on different computers? Or in that odd situation
where we are running scripts from a flash drive? We can always change our working directory depending on
the situation but even then, other links in our script might be broken. It turns out that RStudio, once again,
comes to the rescue. The solution to our problems in this case is RStudio projects.
The idea is simple: you create a project in the folder you are going to be working in, and this folder will be
your root directory from now on. This means that the moment you open a project, R will immediately
switch the working directory to this root directory. This also means that when you want to refer to any file,
you can leave out the part of the path that leads up to the root directory. This makes your code much more compact and
clear. And not only this: if you move your project folder to a different location or even to another computer,
the root directory will still be the project folder, as long as the project files are in it. This might sound a bit
confusing, but it is actually quite simple.

1. Click on File → New Project. Alternatively, you can click on the Projects tab (see figure below) on the
top right corner of your screen → New Project.
2. Then click on New Project → Existing directory
3. Now browse to the folder you created at the beginning of the lab, “MySTA1007analyses” or something
similar.
4. Once you are in that folder, click “Create project”.

Now, if you go into your “MySTA1007analyses” folder you will find a new R Project file. This is all you need.
From now on, whenever you want to work on STA1007S labs you can open your project by:

1. Click on File → Open Project... Alternatively, you can click on the Projects tab (see figure below) on
the top right corner of your screen → Open Project...
2. Browse to your project folder and double-click on the R Project file.

R will automatically load your project as you left it last time and it will make the project folder the root
directory. This means that R will first look for any data in this folder and will save any files or objects in this
destination as well. You may copy and paste the project folder anywhere on your computer or even on a
different computer and R will make the new destination the root directory.
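
To see why a root directory keeps code short, consider the sketch below. It is only an illustration under several assumptions: a folder inside tempdir() stands in for your project folder, setwd() imitates roughly what opening a project does, and write.csv(), read.csv(), file.path(), dir.create() and file.exists() are built-in functions that are not covered in this lab. The file name example.csv and the object names are made up.

```r
# A stand-in project folder inside R's temporary folder
project.dir <- file.path(tempdir(), "MySTA1007analyses")
dir.create(project.dir, showWarnings = FALSE)

# setwd() hands back the previous working directory, which we keep for later
old.dir <- setwd(project.dir)

# With the working directory set, a bare file name is enough
write.csv(data.frame(x = 1:3), "example.csv", row.names = FALSE)
file.exists("example.csv")               # TRUE: R looked in the project folder
example.data <- read.csv("example.csv")  # no long path needed

setwd(old.dir)                           # return to where we started
```

Because the code never mentions the full path, it would run unchanged if the folder were moved to another location.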
Now we are completely ready for the fun! But unfortunately, we’ll have to wait for the next session.

And now to the thing you have to hand in...

SUBMISSION:
Vula Question 1 [TRUE or FALSE]
RStudio is just an interface that makes it easier for us to communicate with R. But we could also use R
without RStudio and achieve similar results.
Vula Question 2 [TRUE or FALSE]
Whenever we perform a calculation in R, it will not only display the result in the console but it will also
automatically save the result in its internal memory.
Vula Question 3 [TRUE or FALSE]
The working directory is the folder in which R will first look for data and save information into.
Vula Question 4 [TRUE or FALSE]
To change the working directory we will use the function setwd().
Vula Question 5 [TRUE or FALSE]
To make annotations in our code we will start our command line with the character #.
Vula Question 6 [TRUE or FALSE]
We should start our script with calculations right away and not lose our valuable time making annotations
and comments throughout the script.

The commands you learned today


These are the functions and operators that you learned today. Fill in your own description of what they do.
#
rm()
ls()
getwd()
setwd()
<-
:

Some useful resources for help and further reading


• Quick-R is a web page with lots of useful material and help with R:
http://www.statmethods.net/index.html
• Have a look at the Introstat R companion and associated R script and .RData file on Vula, under
‘Resources’ → ‘Miscellaneous’ → ‘R’. The companion gives you R code and the data related to many of
the examples in Introstat.
