Intro to R
Liana Harutyunyan
Programming for Data Science
April 4, 2024
American University of Armenia
liana.harutyunyan@aua.am
Install R
• First, you need to install R.
• Secondly, you need to install RStudio.
RStudio
In the Console pane, you can type any R command and press
Enter.
For reproducible code, you can open .R or .Rmd files.
• .R files contain plain R code
• .Rmd files (R Markdown) can include both text and
R code, like a Jupyter notebook
RStudio Projects
RStudio Projects make your work easier: each project sets
your working directory and saves your history and open documents.
To create one, go to File -> New Project and choose either:
• a new directory
• an existing directory
Easiest way to do this:
• Create a folder using your OS
• In RStudio, choose "Existing Directory" when creating a
project, and select the empty folder you just created.
• Move all the files you are going to work with into that folder.
RStudio Projects
• Once a project has been created, you can reopen it later with
File -> Open Project.
• In one project folder, you can have multiple .R and .Rmd
files.
RStudio
• To run the current line or selection of your R code, press
Ctrl/Cmd + Enter.
• In the upper-right pane of RStudio (in the default layout), the
Environment tab lists all variables that are currently stored
in memory.
• In the same pane, the History tab stores all the commands
you have run.
RStudio console
You may want to re-execute commands that you previously
entered. The RStudio console supports the ability to recall
previous commands using the arrow keys:
• Up
• Down
R basic functionality
Let’s try some built-in functions:
• log
• factorial
• mean
To read the documentation for a function, type ? followed by
its name in the console, for example ?log.
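A quick sketch of the three functions above, to type into the console. The input values are arbitrary and chosen only for illustration.

```r
# Natural logarithm: log(1) is 0
log(1)

# log() also accepts a base argument: 2^3 = 8, so this gives 3
log(8, base = 2)

# 5! = 5 * 4 * 3 * 2 * 1 = 120
factorial(5)

# Arithmetic mean of a numeric vector: 2.5
mean(c(1, 2, 3, 4))
```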
R packages
• There are a lot of built-in R functions.
• Many packages provide additional functionality.
• To install a package:
install.packages("package_name")
You only need to run this once, in your console. After the
package is installed on your machine, there is no need to do
it again. For that reason, remove the line from your script
after installing, or comment it out (#).
• Once a package is installed, you need to load it before you
can use it in your code. To do so, put this in your code:
library(package_name)
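One possible way to keep the install step in a script without re-installing on every run is to check for the package first. This is only a sketch: the package name "stats" is a stand-in (it ships with R, so nothing is actually downloaded), and the requireNamespace check is an addition that the slides do not cover.

```r
# "stats" is a placeholder chosen because it comes with R;
# replace it with the package you actually need.
pkg <- "stats"

# Install only if the package is missing, so the script is safe to re-run
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package; character.only = TRUE lets library() take the name
# from a string variable instead of a bare name
library(pkg, character.only = TRUE)
```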
Variables
As in Python, everything in R is an object.
• We can assign a value to a variable with: x <- 3
• Variable names follow rules similar to Python's: they
cannot start with a digit. Unlike in Python, a name may
contain dots (my.var) but cannot start with an underscore.
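A short sketch of assignment in practice. The variable names below are made up for illustration.

```r
x <- 3            # assign 3 to x
y <- x + 2        # use x in an expression; y is 5
my.value <- 10    # dots are allowed inside R names
class(x)          # "numeric"
```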
Boolean types
Boolean values in R are TRUE and FALSE (they can also be written T
and F).
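A small sketch of the logical operators, plus one caveat that goes beyond the slide: T and F are ordinary variables that can be reassigned, while TRUE and FALSE are reserved words, so the long spelling is the safer one.

```r
TRUE & FALSE   # logical AND: FALSE
TRUE | FALSE   # logical OR: TRUE
!TRUE          # negation: FALSE

T <- 0         # legal, because T is just a variable that defaults to TRUE
isTRUE(T)      # FALSE now, since T holds 0
rm(T)          # remove our T so the built-in value is visible again
```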
Vectors
R has both vectors and lists.
Vectors consist of elements of the same type and are
created using c().
v1 <- c(1, 2, 3)
v2 <- c(1:5)
v3 <- c("Anna", "Bob")
When a vector is given values of different types, R coerces
them to one common type. You can check the class of a vector
with class(v3).
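To see coercion happen, here is a minimal sketch with two made-up vectors that mix types.

```r
# Numeric, character and logical together: everything becomes character
mixed <- c(1, "a", TRUE)
mixed          # "1" "a" "TRUE"
class(mixed)   # "character"

# Numeric and logical together: TRUE becomes 1, FALSE becomes 0
nums <- c(1, TRUE, FALSE)
nums           # 1 1 0
class(nums)    # "numeric"
```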
Vectors
In R, unlike in most programming languages, indexing starts
from 1.
To get the first element of a vector:
v3[1]
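A few more indexing patterns, sketched on a made-up vector. Negative indices are not covered on the slide; in R they drop elements rather than count from the end.

```r
v <- c("Anna", "Bob", "Carl")   # hypothetical example vector

v[1]           # first element: "Anna"
v[length(v)]   # last element: "Carl"
v[c(1, 3)]     # several elements at once: "Anna" "Carl"
v[-1]          # everything except the first: "Bob" "Carl"
```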
Named vectors
We can assign names to vector members.
v4 <- c("Liana", "Harutyunyan")
names(v4) <- c("first name", "last name")
Then we can retrieve the element by its name.
v4["last name"]
Furthermore, we can reverse the order by indexing with a
character vector of names.
v4[c("last name", "first name")]
Matrices
A matrix is a collection of data elements arranged in a
two-dimensional rectangular layout.
The data elements must all be of the same basic type (if they
are not, R coerces them to a common type).
A = matrix(c(2, 4, 3, 1), nrow=2, ncol=2, byrow = TRUE)
Arguments, in order:
A = matrix(data, nrow = number of rows,
ncol = number of columns, byrow = whether to fill by rows)
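The effect of byrow is easiest to see side by side. The sketch below reuses the numbers from the slide; the names by_row and by_col are made up.

```r
# Fill row by row: first row is 2 4, second row is 3 1
by_row <- matrix(c(2, 4, 3, 1), nrow = 2, ncol = 2, byrow = TRUE)

# byrow defaults to FALSE, so this fills column by column:
# first column is 2 4, second column is 3 1
by_col <- matrix(c(2, 4, 3, 1), nrow = 2, ncol = 2)

by_row[1, ]   # 2 4
by_col[1, ]   # 2 3
dim(by_row)   # 2 2
```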
Matrices
The element in the mth row and nth column of A can be accessed
with the expression A[m, n].
As indexing starts from 1, the element in the 2nd row, 2nd column
is simply
A[2, 2]
The entire second row:
A[2, ]
The entire first column:
A[, 1]
Matrices
You can also build a matrix by combining two vectors.
Functions: rbind (stacks the vectors as rows), cbind (stacks
them as columns)
x <- c(0:10)
y <- c(-5:5)
mat_2 <- rbind(x, y)
mat_3 <- cbind(x, y)
Matrices
If the lengths are not equal, R repeats (recycles) the elements
of the shorter vector to fill the gap, and gives a warning when
the longer length is not a multiple of the shorter one.
Matrices can also have colnames and rownames.
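A sketch of rbind, cbind, recycling, and column names together. The variable names are made up, and the lengths are chosen so that recycling happens without a warning (4 is a multiple of 2).

```r
x <- 0:10
y <- -5:5

m_rows <- rbind(x, y)   # each vector becomes a row: 2 x 11
m_cols <- cbind(x, y)   # each vector becomes a column: 11 x 2
dim(m_rows)             # 2 11
dim(m_cols)             # 11 2
rownames(m_rows)        # "x" "y", taken from the variable names

# Unequal lengths: the shorter vector is recycled to 1 2 1 2
short <- c(1, 2)
rec <- cbind(1:4, short)

# Column names can be set after the matrix is built
colnames(rec) <- c("a", "b")
rec[, "b"]              # 1 2 1 2
```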
Data Frames
A data frame is the most common way of storing data in R.
df <- data.frame(x=1:10, y=11:20)
Try out the head, str, and summary functions.
Data frames also have colnames and rownames.
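The inspection functions mentioned above can be tried like this, on the same small data frame.

```r
df <- data.frame(x = 1:10, y = 11:20)

head(df, 3)    # first three rows
str(df)        # compact overview: dimensions, column names, column types
summary(df)    # min, quartiles, mean and max for each column
colnames(df)   # "x" "y"
nrow(df)       # 10
```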
Data Frames
To access a particular element of a data frame (you can also
modify it this way), give two indices:
the first is for the row, the second for the column.
df[1, "x"]
There is no loc or iloc as in Python's pandas; the same
brackets accept both positions and names.
Try out: df[1, ], df[c(1, 3), ], df[, "x"], df[, c("y",
"x")].
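A sketch of reading and modifying data frame cells with bracket indexing, using the same small data frame. The replacement value 100 is arbitrary.

```r
df <- data.frame(x = 1:10, y = 11:20)

df[1, "x"]                 # row 1, column x: 1
df[1, "x"] <- 100          # the same syntax on the left-hand side modifies the cell

two_rows <- df[c(1, 3), ]  # rows 1 and 3, all columns
x_col <- df[, "x"]         # a single column comes back as a plain vector
swapped <- df[, c("y", "x")]  # several columns come back as a data frame

colnames(swapped)          # "y" "x"
```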
Data Frames
To subset a data frame, keeping only the rows that satisfy a condition:
data <- data[condition, ]
Example: df[df["y"] > 15, ]
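The condition is just a logical vector with one value per row. A minimal sketch follows; the names keep, big_y and both are made up, and the second filter combines two conditions with &.

```r
df <- data.frame(x = 1:10, y = 11:20)

# One TRUE/FALSE per row
keep <- df[, "y"] > 15

# Keep only the rows where the condition is TRUE (rows 6 to 10)
big_y <- df[keep, ]

# Conditions can be combined with & (and) or | (or): rows 6, 7 and 8
both <- df[df[, "y"] > 15 & df[, "x"] < 9, ]
```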
Lists
Unlike vectors and matrices, lists can store elements of
different types.
For example, one element can be a data frame and another a
matrix.
The unlist function flattens a list into a single vector.
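A sketch of a list holding different kinds of objects, and of unlist on a simpler list. All names here are made up for illustration.

```r
# A list whose elements are a vector, a data frame and a matrix
my_list <- list(
  numbers = c(1, 2, 3),
  table   = data.frame(x = 1:2),
  grid    = matrix(1:4, nrow = 2)
)
length(my_list)   # 3
names(my_list)    # "numbers" "table" "grid"

# unlist() flattens a list of vectors into one vector
flat <- unlist(list(a = 1:2, b = c(3, 4)))
flat              # values 1 2 3 4, with names a1 a2 b1 b2
```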
Lists
To extract an element of a list, we use double brackets [[ ]]:
list_1[[1]]
The result can be, for example, a data frame.
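The difference between double and single brackets, sketched on a made-up list: double brackets return the element itself, while single brackets return a shorter list that still wraps it.

```r
list_1 <- list(data.frame(x = 1:3), c("a", "b"))   # hypothetical contents

first <- list_1[[1]]   # the data frame itself
class(first)           # "data.frame"

wrapped <- list_1[1]   # a list of length 1 containing the data frame
class(wrapped)         # "list"
```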
Factors
Factors in R are objects that have a fixed and known set of
possible values.
factor_ex <- factor(c("low", "high", "medium", "low"))
The levels argument is the set of unique values the factor can
take; values outside that set become NA.
factor_ex <- factor(c("low", "high", "medium", "low"),
levels=c("low", "high"))
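A sketch of what the two calls above produce, under made-up variable names. Without levels, R takes the sorted unique values; with levels, any value not listed turns into NA.

```r
sizes <- factor(c("low", "high", "medium", "low"))
levels(sizes)        # sorted unique values: "high" "low" "medium"

restricted <- factor(c("low", "high", "medium", "low"),
                     levels = c("low", "high"))
levels(restricted)   # "low" "high", in the order given
is.na(restricted)    # FALSE FALSE TRUE FALSE: "medium" is not an allowed level
table(restricted)    # counts per level: low 2, high 1
```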
Data Types
• named vectors
• matrices (is.matrix?)
• rbind
• cbind
• lists
• data.frame (rownames, colnames)
• factors