R Module 1 Notes
To get started with R, we first discuss its capabilities, describe how to install it,
illustrate some basic commands, and show how to obtain help.
There are several statistical software packages that provide all sorts of analytical and
data management capabilities:
R (www.r-project.org)
SAS (www.sas.com)
SPSS (www.spss.com)
Stata (www.stata.com)
                     R           SAS    SPSS    Stata
Price                ++ (free)   --     -       +
Command Structure    +           +      --      ++
Support              +           -      --      ++
Ease of Teaching     -           --     +       ++
R vs. commercial packages:
• R: Many different datasets (and other “objects”) can be available at the same time.
  Commercial package: Only one dataset is available at a given time.
• R: One-stop shopping; almost every analytical tool you can think of is available.
  Commercial package: Tends to have limited scope, forcing you to learn additional programs; extra options cost more and/or require you to learn a different language.
• R: R is free and will continue to exist. Nothing can make it go away, and its price will never increase.
  Commercial package: It costs money. There is no guarantee it will continue to exist, but if it does, you can bet that its price will keep increasing.
CAVEAT:
“Using R is a bit akin to smoking. The beginning is difficult, one may get
headaches and even gag the first few times. But in the long run, it
becomes pleasurable and even addictive. Yet, deep down, for those
willing to be honest, there is something not fully healthy in it.”
--Francois Pinard
A. What is R?
B. Installation
Select "install R for the first time", then download the installer and run it to install R on your device.
C. The R Workspace
◦ Toggle through previous commands by using the up and down arrow keys.
Interactive Command Window: commands are typed here.
R Scripts Window: an R script is a text file containing commands that you would otherwise enter on the command line of R. To place a comment in an R script, use a hash mark (#); R ignores everything from the # to the end of that line.
Menu bar
Tool bar
Button functions:
• Open: opens an R file.
• Load Workspace: loads a previously saved workspace.
• Save: saves the current workspace.
• Copy
• Paste
• Copy and Paste
• Stop current computation
• Print
• Functions:
– Almost everything in R is done through functions. Numeric and character
functions are commonly used in creating or recoding variables.
– Note that while the examples here apply functions to individual variables,
many can be applied to vectors and matrices as well.
trunc(x)                  trunc(5.99) is 5
round(x, digits=n)        round(3.475, digits=2) is 3.48 (3.475 has no exact binary representation, so R 4.0.0 and later may return 3.47)
signif(x, digits=n)       signif(3.475, digits=2) is 3.5
cos(x), sin(x), tan(x)    also acos(x), cosh(x), acosh(x), etc.
log(x)                    natural logarithm
log10(x)                  common logarithm
exp(x)                    e^x
seq(from, to, by)         generate a sequence
indices <- seq(1, 10, 2)
# indices is c(1, 3, 5, 7, 9)
rep(x, ntimes)            repeat x n times
y <- rep(1:3, 2)
# y is c(1, 2, 3, 1, 2, 3)
cut(x, n)                 divide a continuous variable into a factor with n levels
x <- c(2, 7, 1, 9, 4, 6)   # example data
y <- cut(x, 5)
y
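The entries in the table above can be checked by running them and comparing against the stated results with stopifnot(), which stops with an error if any condition is FALSE. This is a minimal sketch: the vector x passed to cut() is made-up example data, and round() is left out because 3.475 has no exact binary representation, so its rounded value can depend on the R version.

```r
# Check the numeric helper functions against the documented results
stopifnot(trunc(5.99) == 5)
stopifnot(isTRUE(all.equal(signif(3.475, digits = 2), 3.5)))
stopifnot(isTRUE(all.equal(log(exp(2)), 2)))      # log() undoes exp()
stopifnot(isTRUE(all.equal(log10(1000), 3)))      # common logarithm

indices <- seq(1, 10, 2)
stopifnot(identical(indices, c(1, 3, 5, 7, 9)))

y <- rep(1:3, 2)
stopifnot(identical(y, c(1L, 2L, 3L, 1L, 2L, 3L)))  # 1:3 is integer, so y is integer

# cut() turns a continuous variable into a factor with n intervals (levels)
x <- c(2, 7, 1, 9, 4, 6)   # made-up example data
f <- cut(x, 3)
stopifnot(is.factor(f), nlevels(f) == 3)
print(table(f))
```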
• Matrix arithmetic:
* is element-wise multiplication
%*% is matrix multiplication
• Assignment
To assign a value to a variable, use the arrow "<-" or the equals sign (=).
• Objects can be used in other calculations. To print an object, just enter its name.
• Restrictions for name of object:
Object names cannot contain `strange' symbols like !, +, -, #.
A dot (.) and an underscore (_) are allowed, and a name may also start with a dot (but not with a dot followed by a number).
Object names can contain a number but cannot start with a number.
R is case sensitive: X and x are two different objects, as are temp and temP.
• The assignment operator <-
x <- 25
• assigns the value of 25 to the variable x
y <- 3*x
• assigns the value of 3 times x (75 in this case) to the variable y
r <- 4
area.circle <- pi*r^2
area.circle
• NOTE: R is case-sensitive (y ≠ Y)
• We can evaluate truth or falsity of expressions:
2>1
1>2&2>1
• We can generate sequences (and perform operations on them):
3*(1:5)
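A small script can confirm the naming rules, the assignment examples and the logical expressions above in one go. The names my.value, my_value, .hidden and value2 are hypothetical; make.names() is a base R function that turns an invalid name into a valid one, which shows what the rules forbid.

```r
# Valid names: letters, digits, dots and underscores, not starting with a digit
my.value <- 1
my_value <- 2
.hidden  <- 3      # allowed, but not listed by ls() unless all.names = TRUE
value2   <- 4
stopifnot(!(".hidden" %in% ls()), ".hidden" %in% ls(all.names = TRUE))

# make.names() repairs invalid names, illustrating the restrictions
stopifnot(make.names("2abc") == "X2abc")   # cannot start with a number
stopifnot(make.names("a-b") == "a.b")      # "-" is not allowed

# R is case sensitive: temp and temP are two different objects
temp <- 10
temP <- 20
stopifnot(temp != temP)

# Assignment, then reuse of the objects in later calculations
x <- 25
y <- 3 * x
stopifnot(y == 75)
r <- 4
area.circle <- pi * r^2
stopifnot(isTRUE(all.equal(area.circle, 16 * pi)))

# Logical expressions evaluate to TRUE or FALSE
stopifnot(2 > 1)
stopifnot((1 > 2 & 2 > 1) == FALSE)

# Arithmetic applies to every element of a sequence
stopifnot(identical(3 * (1:5), c(3, 6, 9, 12, 15)))
```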
Workspace
• Objects that you create during an R session are held in memory; the collection of
objects that you currently have is called the workspace.
• This workspace is not saved on disk unless you tell R to do so. This means that your
objects are lost if you close R without saving them, or, worse, if R or your system
crashes during a session.
• During your R session you can also explicitly save the workspace image. Go to the
`File' menu and then select `Save Workspace...', or use the save.image function.
## save to the current working directory
save.image("basicR.RData")
## just checking what the current working directory is
getwd()
## save to a specific file and location
save.image("C:\\Program Files\\R\\R-2.5.0\\bin\\basicR.RData")
• If you saved the workspace image to the default file .RData in the directory where R
starts (for example, by answering yes when R asks whether to save on exit), R restores
it automatically the next time it starts, so all your previously saved objects are
available again. You can also explicitly load a saved workspace: go to the `File' menu
and select `Load Workspace...',
or alternatively:
load("basicR.RData")
• R gets confused if you use a path in your code like c:\mydocuments\myfile.txt,
because R sees "\" as an escape character inside strings. Thus, it is better to use
c:\\mydocuments\\myfile.txt or c:/mydocuments/myfile.txt.
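The save-and-restore cycle described above can be tried without touching your real files by saving to R's temporary directory. A minimal sketch, assuming you run it at the top level of a session (save.image() saves the global environment); the object name and file name are only examples. file.path() joins path pieces with forward slashes, which R accepts on Windows too.

```r
# Create an object, save the whole workspace, lose the object, then restore it
area.circle <- pi * 4^2
ws_file <- file.path(tempdir(), "basicR.RData")   # example file in a temporary folder

save.image(ws_file)               # writes every object in the global environment
rm(area.circle)
stopifnot(!exists("area.circle"))

load(ws_file)                     # brings the saved objects back
stopifnot(exists("area.circle"))

# Inside an R string "\\" is a single backslash character
stopifnot(nchar("\\") == 1)
cat("c:\\mydocuments\\myfile.txt\n")   # prints c:\mydocuments\myfile.txt
```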
• To list the objects that you have in your current R session, use the function ls or the
function objects:
ls()
objects()
• So to run the function ls, we need to enter its name followed by an opening "(" and a
closing ")". Entering only ls will just print the object; that is, you will see the
underlying R code of the function ls.
• Most functions in R accept certain arguments.
• For example, one of the arguments of the function ls is pattern, a regular expression.
To list all objects whose names start with the letter "x" (the ^ anchors the match at
the start of the name):
x2 = 9
y2 = 10
ls(pattern="^x")
• If you assign a value to an object that already exists then the contents of the object
will be overwritten with the new value (without a warning!).
• Use the function rm to remove one or more objects from your session.
rm(x, x2)
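Because pattern is a regular expression, "x" matches a name that contains an x anywhere, while "^x" matches only names that start with x. The sketch below shows the difference, the silent overwriting of an existing object, and removal with rm(); the object name box is hypothetical, chosen because it contains an x without starting with one.

```r
x   <- 25
x2  <- 9
y2  <- 10
box <- 5      # contains "x", but does not start with it

stopifnot("box" %in% ls(pattern = "x"))      # "x" matches anywhere in the name
stopifnot(!("box" %in% ls(pattern = "^x")))  # "^x" anchors the match at the start
stopifnot(all(c("x", "x2") %in% ls(pattern = "^x")))

# Assigning to an existing name silently overwrites it
x2 <- 100
stopifnot(x2 == 100)

# rm() removes one or more objects
rm(x, x2)
stopifnot(!exists("x2"), "y2" %in% ls())
```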
• Let us generate two small vectors with data and a scatterplot.
z2 <- c(1,2,3,4,5,6)
z3 <- c(6,8,3,5,7,1)
plot(z2,z3)
title("My first scatterplot")
◦ R comes with a number of sample datasets that you can experiment with. Type data() to
see the available datasets. The result will depend on which packages you have loaded.
◦ Type help(datasetname) for details on a sample dataset.
data()
help(women)
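Sample datasets can be used straight away once you know their structure. A short sketch with the women dataset from the datasets package (heights and weights of 15 women), which also gives the scatterplot a title and axis labels through plot() arguments instead of a separate title() call.

```r
# Inspect the built-in women dataset
str(women)
head(women, 3)
stopifnot(nrow(women) == 15, identical(names(women), c("height", "weight")))

# Scatterplot with title and axis labels set directly in plot()
plot(women$height, women$weight,
     main = "Weight vs. height (women dataset)",
     xlab = "Height (in)", ylab = "Weight (lb)")
```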
The system allows you to write new functions and package them in a
so-called `R package' (or `R library').
The R package may also contain other R objects, for example data sets or
documentation.
• There is a lively R user community and many R packages have been written and made
available on CRAN for other users.
To name just a few examples, there are packages for portfolio optimization, drawing
maps, exporting objects to HTML, time series analysis, spatial statistics, and the list
goes on and on.
• You can use the function search to see a list of packages that are currently attached to
the system; this list is also called the search path.
search()
• To attach another package to the system, you can use the menu or the library function.
library()          # lists the packages installed on your system
library(MASS)      # attaches the MASS package
shoes              # prints the shoes dataset supplied by MASS
• Or you can use the menu: select "Packages" in the menu bar and then "Load
package...". A list of the packages available on your system will be displayed. Select one
and click "OK"; the package is now attached to your current R session.
• Suppose we want to install a package called Rcmdr:
Choose Packages ► Install package(s)... from the menu and select Rcmdr from the list,
or alternatively run the command: install.packages("Rcmdr")
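In scripts it is common to install a package only when it is missing and then attach it. A minimal sketch using MASS (the package loaded above), assuming internet access if it has to be installed; requireNamespace() checks whether a package is available without attaching it, and the CRAN mirror URL is one possible choice.

```r
# stats is attached by default, so it appears on the search path
stopifnot("package:stats" %in% search())

# Install MASS only if it is not already available, then attach it
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS", repos = "https://cloud.r-project.org")
}
library(MASS)
stopifnot("package:MASS" %in% search())

# The shoes data from MASS is now accessible by name
str(shoes)
```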
D. Getting Help
• R has a very good help system built in.
• If you know which function you want help with, simply type ? followed by the function
name; args prints just the function's arguments. For example, for the functions hist and lm:
?hist
args(hist)
?lm
args(lm)
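Besides ?name and args(name), a few related base functions help when exploring a function. The sketch below uses formals() to check an argument list in code; the keyword searches are left as comments because they open help pages in an interactive session.

```r
# args() prints the argument list of a function
args(lm)

# formals() returns the arguments as a list, so they can be checked in code
lm_args <- names(formals(lm))
stopifnot("formula" %in% lm_args, "data" %in% lm_args)

# If you do not know the function name, search the help system by keyword:
# help.search("histogram")
# ??histogram
# example(hist)   # runs the examples from the hist help page
```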
A. Scalar
B. Vectors
• null vector
# null (empty) vector using the combine function c()
x <- c()
• numeric vector
# numeric vector
a <- c(2,4,-3.6,12) ; a
• character vector
# character vector
b <- c("one","two","three")
• logical vector
#logical vector
c1 <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE)
# replicate
e = rep(NA,5); e
# length
length(d)
• Be careful with assignments using "<-": this is not the same as "< -" (less than, followed by a minus sign).
# assigning a value of 2 to f
f<-2
f
# is f less than negative 2?
f< -2
• We can sort:
a
a <-sort(a, decreasing=TRUE); a
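The vector examples above can be checked with the functions is.numeric(), is.character(), is.logical() and is.null(). The sketch also shows two facts that often surprise beginners: a vector holds only one mode, so mixing numbers and text turns everything into text, and f < -2 is a comparison, not an assignment.

```r
a  <- c(2, 4, -3.6, 12)
b  <- c("one", "two", "three")
c1 <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
x  <- c()

stopifnot(is.numeric(a), is.character(b), is.logical(c1), is.null(x))
stopifnot(sum(c1) == 4)                   # TRUE counts as 1, FALSE as 0

# Mixing modes coerces every element to the most general mode (here character)
stopifnot(is.character(c(1, "two")))

e <- rep(NA, 5)
stopifnot(length(e) == 5, all(is.na(e)))

f <- 2
stopifnot((f < -2) == FALSE)             # compares f with -2
stopifnot(f == 2)                        # f is unchanged

stopifnot(identical(sort(a, decreasing = TRUE), c(12, 4, 2, -3.6)))
```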
EXERCISE
1. Generate a vector e1 of positive even integers less than 100.
2. Remove the values greater than 50 and less than 90, and store these into e2.
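One possible solution to the exercise, reading "remove" as dropping the values from 52 to 88 and keeping what is left in e2 (the wording also allows storing the removed values instead, which would be e1[e1 > 50 & e1 < 90]). This is a sketch, not an official answer.

```r
# 1. Positive even integers less than 100
e1 <- seq(2, 98, by = 2)
stopifnot(length(e1) == 49, all(e1 %% 2 == 0), max(e1) == 98)

# 2. Drop values greater than 50 and less than 90; keep the rest in e2
e2 <- e1[!(e1 > 50 & e1 < 90)]
stopifnot(!any(e2 > 50 & e2 < 90))
e2
```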
C. Matrix
• All columns in a matrix must have the same mode (numeric, character, etc.) and the
same length.
• General format is
mymatrix <- matrix(vector, nrow=r, ncol=c, byrow=FALSE,
  dimnames=list(char_vector_rownames, char_vector_colnames))
byrow=TRUE fills the matrix row by row; the default, byrow=FALSE, fills it column by column.
Or alternatively:
nrow(mat_a)
ncol(mat_a)
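The notes use mat_a without showing how it was created. As a hypothetical stand-in, the sketch below builds a 2 x 3 matrix with the general format above and checks its dimensions; it also shows how the default byrow = FALSE fills the same values column by column.

```r
# Hypothetical 2 x 3 matrix filled row by row, with row and column names
mat_a <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
                dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))
mat_a
#    c1 c2 c3
# r1  1  2  3
# r2  4  5  6

stopifnot(nrow(mat_a) == 2, ncol(mat_a) == 3)
stopifnot(identical(dim(mat_a), c(2L, 3L)))

# The default byrow = FALSE fills column by column instead
m_col <- matrix(1:6, nrow = 2)
stopifnot(identical(as.vector(m_col[1, ]), c(1L, 3L, 5L)))
```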
• Do matrix multiplication
mat_a %*% t(mat_a )
• Extract the first row:
mat_a[1,]
• Extract the 2nd element in the 1st row and the 3rd element in the 2nd row:
c(mat_a[1,2], mat_a[2,3])
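The difference between * and %*% is easiest to see on a concrete matrix. As a hypothetical stand-in for mat_a, the sketch uses the 2 x 3 matrix with rows 1 2 3 and 4 5 6 and checks the product with its transpose, an element-wise product, and the indexing examples.

```r
mat_a <- matrix(1:6, nrow = 2, byrow = TRUE)   # hypothetical: rows 1 2 3 and 4 5 6

# %*% is matrix multiplication: (2 x 3) %*% (3 x 2) gives a 2 x 2 result
p <- mat_a %*% t(mat_a)
stopifnot(identical(dim(p), c(2L, 2L)))
stopifnot(p[1, 1] == 1^2 + 2^2 + 3^2)          # 14
stopifnot(p[1, 2] == 1*4 + 2*5 + 3*6)          # 32

# * is element-wise: both operands must have the same dimensions
sq <- mat_a * mat_a
stopifnot(all(sq == mat_a^2))

# Indexing: first row, then two single elements
stopifnot(identical(mat_a[1, ], 1:3))
stopifnot(identical(c(mat_a[1, 2], mat_a[2, 3]), c(2L, 6L)))
```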
• To stack two vectors, one below the other, use rbind():
mat_b <- rbind(a,a); mat_b
If one vector is shorter than the others, its elements are recycled (repeated) until it has
the required length; R gives a warning if the longer length is not a multiple of the shorter one:
a
d
mat_c = rbind(a,d); mat_c
• To place two vectors next to each other (as columns), use cbind():
mat_d <-cbind(a,a); mat_d
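Since d is not shown at this point in the notes, the sketch below uses a hypothetical d of length 2 to show recycling in rbind(); rows take the names of the vectors. With a length that does not divide the other evenly, R still recycles but issues a warning, which tryCatch() captures here.

```r
a <- c(12, 4, 2, -3.6)     # the sorted vector a from the vectors section
d <- c(1, 2)               # hypothetical shorter vector

mat_b <- rbind(a, a)       # 2 x 4, both rows named "a"
stopifnot(identical(dim(mat_b), c(2L, 4L)))

mat_c <- rbind(a, d)       # d is recycled to 1 2 1 2
stopifnot(identical(unname(mat_c["d", ]), c(1, 2, 1, 2)))

mat_d <- cbind(a, a)       # 4 x 2, both columns named "a"
stopifnot(identical(dim(mat_d), c(4L, 2L)))

# Length 3 does not divide 4: recycling still happens, with a warning
w <- tryCatch(rbind(a, c(1, 2, 3)), warning = function(w) conditionMessage(w))
stopifnot(is.character(w))
```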
• To see how many missing values there are, use the sum and is.na functions:
sum(is.na(mat_e))
• To obtain the element numbers (positions) of the missing values in the matrix, use the
which and is.na functions:
which(is.na(mat_e))
Note: by default, elements are counted down the first column, then down the next columns (column-major order).
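Because mat_e is not shown in the notes, the sketch uses a hypothetical 3 x 3 matrix with two missing values. Setting arr.ind = TRUE in which() returns row and column positions instead of the column-major element numbers.

```r
# Hypothetical 3 x 3 matrix; values fill column by column,
# so the NAs end up at [2, 1] and [3, 2]
mat_e <- matrix(c(1, NA, 3, 4, 5, NA, 7, 8, 9), nrow = 3)
stopifnot(sum(is.na(mat_e)) == 2)

# which() numbers the elements column by column
stopifnot(identical(which(is.na(mat_e)), c(2L, 6L)))

# arr.ind = TRUE gives (row, col) pairs instead
pos <- which(is.na(mat_e), arr.ind = TRUE)
stopifnot(identical(unname(pos[, "row"]), c(2L, 3L)),
          identical(unname(pos[, "col"]), c(1L, 2L)))
```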
• EXERCISE
Find the matrix product of M_A and M_B if
M_A= M_B =
D. Arrays
E. Dataframes
• A data frame is another generalization of a matrix, but different columns may have different
modes (numeric, character, factor, etc.).
d <- 1:5
e <- c("red", NA, "white", "blue", "red")
f <- c(TRUE,TRUE,TRUE,FALSE,TRUE)
mydata <- data.frame(d,e,f)
names(mydata) <- c("ID","Color","Passed") #variable names
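The structure of mydata can be inspected with str(), and rows can be selected with a logical column. A minimal sketch that rebuilds mydata from the code above; note that since R 4.0.0, data.frame() keeps character columns such as Color as character rather than converting them to factors, so the sketch does not test that column's type.

```r
d <- 1:5
e <- c("red", NA, "white", "blue", "red")
f <- c(TRUE, TRUE, TRUE, FALSE, TRUE)
mydata <- data.frame(d, e, f)
names(mydata) <- c("ID", "Color", "Passed")

str(mydata)                          # one line per column, showing its type
stopifnot(nrow(mydata) == 5, ncol(mydata) == 3)
stopifnot(is.integer(mydata$ID), is.logical(mydata$Passed))

# Columns keep their own modes; select rows with a logical condition
passed <- mydata[mydata$Passed, ]
stopifnot(nrow(passed) == 4)
stopifnot(sum(is.na(mydata$Color)) == 1)
```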
F. Lists
G. Factor