RBigData NTL


Course presented at the Congreso Internacional de Estadística

Escuela Superior Politécnica de Chimborazo, Riobamba, Ecuador; October 2019

Introduction to R

Nicholas T. Longford, Imperial College London, United Kingdom


sntlnick@sntl.co.uk

1. Basics
2. How R works
3. What is special about R
4. How to be clever in R
5. R for all your computational needs; programming
6. Graphics in R — to explore, to inform, to impress
7. Expanding your computational skills and experience
First steps. R as a calculator

Start: Click on the R icon


Finish: q() and click on Don’t Save, Cancel, or Save

Save — Saving your work (Workspace)


— objects you created in the session

A session:
Your work in R between Start and Finish

A session consists of a sequence of expressions (commands)


which are executed, if possible
Missing value: NA. Most operations involving NA return NA.
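As a small illustration of how NA propagates, here is a sketch with a made-up vector x (not part of the course material):

```r
# Hypothetical vector with one missing value
x <- c(3, NA, 7)

x + 1                 # arithmetic with NA gives NA in that position
sum(x)                # the total is NA because one element is missing
sum(x, na.rm = TRUE)  # many functions can drop NAs on request
is.na(x)              # logical vector marking the missing elements
```

The na.rm argument is the usual way to ask a summary function to ignore missing values.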

The basic rules of R

Syntax — the expression you type has to be interpretable


Execution — R has its rules for what can be executed

Error and warning messages

A comprehensive help system help(help)

Workspace — storing the objects you created

Objects are created by assignment

The syntax of an assignment:


[New name] <- a valid expression
<- — the symbol for assignment
The basic rules of R. Example

Assignment:                     A <- 5
Check your workspace:           ls()
Display the value of A:         A
Use A in an expression:         (A + 4)^2 / 107.3
Create a new object using A:    B <- log((A + 4)^4 + 17)
Remove A from the workspace:    rm(A)
System-defined objects:         functions (sqrt, exp, round, ...), constants (pi), datasets
User-defined objects:           A
Apply a function:               sqrt(A + 5.2)

R as a calculator. Scalar operations

The standard rules of calculations:


valid representation of numbers
numbers (or objects) separated by symbols for operations
priority indicated by paired parentheses ( )
parentheses can be nested

The object named in an assignment must have a syntactically valid name:
alphanumeric, starting with a letter (e.g., A3B)
The value of an object can be over-written
assignment to an object that has already been defined

From scalars to vectors

System-defined function to concatenate scalars and vectors


A3B <- c(2, 6, 4.15, pi, 1004)

Spaces have no interpretation, except to separate object names and numbers


Use spaces as a ‘cosmetic’ feature of your code

A carriage return indicates the end of an expression


Semicolon ; separates two expressions written on the same line
An expression can be written across more than one line,
if it is incomplete at the end of the line
Try: 17 +
32 - 0.5
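The two rules above can be tried together in a short sketch (the object names total, a and b are made up for illustration):

```r
# An expression that is incomplete at the end of a line continues on the next
total <- 17 +
  32 - 0.5

# Two expressions on one line, separated by a semicolon
a <- 2; b <- a * 3

total
b
```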
From scalars to vectors II

Vectors can be concatenated:


A4B <- c(A3B, 107, -16, -0.03, A3B, round(A3B^2, 1) )

Operations on vectors:
length(A4B); names(A4B)
round — see above

Functions help, args, example

Syntax: func(arg1, arg2, ...)

Examples:
help(round); help(seq); args(floor)
Vectors generated by the system

Regular sequences: seq, rep


seq(5, 15, length=21)
rep(seq(4), 5)

Random numbers
runif(500, 2, 8)
rnorm, rgamma, rbeta, rchisq, rt, . . .
— use help to learn about them

Changing the order of the elements of a vector:


sort(vec); rev(vec); (rank(vec); order(vec))
Combined with other functions: unique(round(vec))
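A minimal sketch of these functions working together. The seed and the vector vec are assumptions made only so that the example is reproducible:

```r
# Regular sequences
s <- seq(5, 15, length.out = 21)   # 21 equally spaced values from 5 to 15
r <- rep(seq(4), 5)                # 1 2 3 4 repeated five times

# Random numbers (seed fixed only for reproducibility of this sketch)
set.seed(1)
vec <- runif(10, 2, 8)             # 10 uniform draws between 2 and 8

# Reordering
sorted <- sort(vec)                # values in increasing order
ord <- order(vec)                  # positions that would sort vec
rev(sorted)                        # decreasing order
rank(vec)                          # rank of each element within vec

# Combining functions
unique(round(vec))                 # distinct values after rounding
```

Note that vec[order(vec)] gives the same result as sort(vec); order is the more general tool because its result can be used to reorder other objects in the same way.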
Arguments of a function

Arguments have (symbolic) names, e.g., runif(n, min=0, max=1)

Mandatory argument: You have to specify a value (e.g., for n)


Optional argument: There is a default;
— it can be overruled by your specification,
e.g., runif(n=500, min=1, max=10)

With the names, the arguments can be presented in any order


Without the names, the order of the arguments has to conform to the definition
Named and unnamed arguments can be mixed, but it is not a good practice

The arguments may be values, objects, or expressions (evaluations)
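The following sketch checks that named arguments can be given in any order. The seed is an assumption, used only so that both calls draw the same random numbers:

```r
# Positional arguments: the order must follow the definition runif(n, min, max)
set.seed(42)
a <- runif(5, 1, 10)

# Named arguments: any order is accepted
set.seed(42)
b <- runif(max = 10, n = 5, min = 1)

identical(a, b)          # both calls are the same request

# An argument may itself be an expression
k <- 3
d <- runif(2 * k, min = 0, max = sqrt(16))
length(d)
```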

Type of objects and their attributes

vectors
functions is.vector, as.vector
scalar is a vector (of length 1)
matrices and arrays — matrix, is.matrix, as.matrix
lists — list, is.list, as.list
collections of objects
functions — function, is.function
data frames — data.frame, is.data.frame
user-defined types

Numeric, character and logical

Three basic types of variables/values

as.numeric, is.numeric
as.character, is.character
as.logical, is.logical

Coercion — forced change from one type to another


chr <- as.character(c(4, 17.4))
cannot use arithmetic on chr

AB * (CC > 0)
CC > 0 is a logical vector,
but in the numeric operation it is interpreted as 0/1
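A short sketch of both kinds of coercion. The values assigned to AB and CC are hypothetical, since the slides do not define them:

```r
# Numeric to character: arithmetic is no longer possible on chr
chr <- as.character(c(4, 17.4))
chr

# Back to numeric: arithmetic works again
num <- as.numeric(chr)
num * 2

# Logical to numeric: TRUE is treated as 1 and FALSE as 0
AB <- c(2, 5, 8)      # hypothetical values
CC <- c(-1, 0, 3)     # hypothetical values
res <- AB * (CC > 0)  # keeps AB where CC is positive, 0 elsewhere
res
```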
Character and logical functions

character strings (words) are in double quotes, e.g. "word"


word — an object’s name

nchar, substring, paste

Logical values: TRUE, FALSE (abbreviated T, F)
Logical operators:
==, &, |, !=, !

character and logical vector


types cannot be mixed in a vector
Try: c(15, "A"); c(4.2, T, 9, F); c(T, F)^2
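What happens in the three expressions to try, together with the character functions listed above, can be sketched as follows (the string in word is a made-up example):

```r
# Mixing types: the whole vector is coerced to a single type
mixed_chr <- c(15, "A")                 # everything becomes character
mixed_num <- c(4.2, TRUE, 9, FALSE)     # logicals become 1 and 0
squared <- c(TRUE, FALSE)^2             # arithmetic on logicals gives numbers

# Character functions
word <- "Riobamba"
nchar(word)                             # number of characters
substring(word, 1, 3)                   # characters 1 to 3
paste(word, "Ecuador", sep = ", ")      # join strings

# Logical operators
(3 > 2) & !(1 == 2)
```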

Naming and subsetting

vector AB;
names(AB) <- c("First", "Second", "Third", "Tabasco",
"Quintana Roo", ...)
AB[seq(6)] — the first 6 elements of AB

Subsetting:
by element numbers: ls()[seq(50)]
by names: AB[c("First", "Second")]
by a logical vector (T: include; F: exclude)
by negative element numbers, which mark the elements to exclude (e.g. vec[-seq(4)])

!! The subset-vector can be an expression (evaluations)
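The four ways of subsetting can be compared on one small vector. The values and names below are hypothetical:

```r
# A short named vector (made-up values and names)
AB <- c(12, 7, 30, 4, 19)
names(AB) <- c("First", "Second", "Third", "Fourth", "Fifth")

first3 <- AB[seq(3)]                # by element numbers
by_name <- AB[c("First", "Third")]  # by names
large <- AB[AB > 10]                # by a logical vector
dropped <- AB[-seq(4)]              # by exclusion: all but the first four

# The subsetting vector can itself be an expression
above_mean <- AB[AB > mean(AB)]
above_mean
```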

A bit of fun

In a party of 15 unrelated persons, what is the probability


that at least two persons have their birthday on the same day?

Probability that 2 persons, 3 persons have birthdays on distinct days:


$$\frac{364}{365}, \quad \frac{364}{365} \times \frac{363}{365}, \quad \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365}, \quad \ldots$$
1 - prod(seq(365, 365 - 2)) / 365^3
Improvement:
Psz <- 5
1 - prod(seq(365, 365 - Psz + 1)) / 365^Psz

Q. Simultaneous calculation for a sequence of party sizes??
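One possible answer to the question, offered as a sketch rather than as the intended solution: cumprod returns all the partial products at once, so a whole sequence of party sizes can be handled in one expression. The names sizes and prob are made up for illustration.

```r
# Party sizes 1, 2, ..., 50
sizes <- seq(50)

# cumprod gives the probability of all-distinct birthdays for each size
prob <- 1 - cumprod((365 - sizes + 1) / 365)
names(prob) <- sizes

round(prob[c(5, 15, 23)], 3)

# Smallest party size for which the probability exceeds 1/2
min(sizes[prob > 0.5])
```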


Graphics

Histogram:
hist(rnorm(20000, 1.7, 4.2))

Study help(hist)

Additional (optional) arguments:


xlab, ylab, xlim, ylim,
main, sub, . . .

Adding to the histogram:


points, lines, segments, polygon
text, legend
— each function with a vast array of its own arguments
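A minimal sketch of a histogram with optional arguments and an added curve. The seed, the labels and the choice of overlaying the normal density are assumptions made for illustration:

```r
# Simulated sample (seed fixed only for reproducibility of this sketch)
set.seed(2019)
x <- rnorm(20000, 1.7, 4.2)

# Histogram on the density scale, with labels and a title
h <- hist(x, freq = FALSE,
          xlab = "Value", ylab = "Density",
          main = "Histogram of a normal sample")

# Add the density of the distribution the sample was drawn from
grid <- seq(min(x), max(x), length.out = 200)
lines(grid, dnorm(grid, 1.7, 4.2), lwd = 2, col = "red")

# Add a legend
legend("topright", legend = "Normal density", lwd = 2, col = "red")
```

hist also returns an object (here h) that holds the bin boundaries and counts, which can be inspected or reused.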
Plots

Function plot

plot(vec)

Additional arguments — the same as for hist


— generic arguments for plotting

Plot types:
argument type=, values: "n", "p", "l", "b"

line width: lwd, symbol size: cex, colour: col


The system-defined palette of colours: colors()
Homework: Study help(plot)
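A short sketch of the plot arguments listed above, using a made-up vector vec:

```r
# Hypothetical values to plot
vec <- cumsum(c(2, -1, 3, 0.5, -2, 4))

# Points joined by lines, with thicker lines, larger symbols and a colour
plot(vec, type = "b", lwd = 2, cex = 1.5, col = "blue",
     xlab = "Index", ylab = "Value", main = "Plot of vec")

# type = "n" draws only the axes; the content is then added step by step
plot(vec, type = "n", xlab = "Index", ylab = "Value")
lines(vec, col = "darkgreen")
points(vec, col = "red")

# The first few names in the system-defined palette
head(colors())
```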
Functions

Examples (functions with a single expression):

## The number of unique elements of a vector


LeUni <- function(vec)
  length(unique(vec))

## The number of missing values (NA) in a vector


sumNA <- function (vec)
  sum(is.na(vec))

## The probability of same-day birthday


BdayPr <- function(k)
1 - prod((365 - seq(k) + 1)/365)
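A usage sketch for the three functions. Their definitions are repeated so that the block runs on its own, and the test vector v is made up:

```r
# Definitions as on the slide
LeUni <- function(vec)
  length(unique(vec))

sumNA <- function(vec)
  sum(is.na(vec))

BdayPr <- function(k)
  1 - prod((365 - seq(k) + 1) / 365)

# Hypothetical test vector
v <- c(1, 1, 2, NA, 5, 5)

LeUni(v)        # NA counts as one of the unique values
sumNA(v)        # number of missing values
BdayPr(15)      # party of 15 persons

# BdayPr handles one party size at a time; sapply applies it to several
pr <- sapply(c(10, 23, 40), BdayPr)
round(pr, 3)
```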
Programming

Loops:
for (i in vec)
{
R code (involving i)
}

Conditional loops:
while (condition)
{
R code
}
Example: Iterative algorithms (e.g., GLM)
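A minimal sketch of both kinds of loop. The iterative example is Newton's method for a square root, chosen here for brevity as a stand-in for a larger iterative algorithm such as fitting a GLM:

```r
# for loop: sum of the squares of 1, ..., 10
total <- 0
for (i in seq(10))
{
  total <- total + i^2
}
total

# while loop: Newton's method for sqrt(2), repeated until convergence
x <- 1          # starting value
iter <- 0       # iteration counter
while (abs(x^2 - 2) > 1e-10)
{
  x <- (x + 2 / x) / 2
  iter <- iter + 1
}
x
iter
```

The while loop stops as soon as its condition is no longer satisfied, so the number of iterations does not have to be known in advance.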

Matrices

— two-dimensional arrays
MAT <- matrix(data=seq(16), nrow=8, ncol=6,
dimnames=list(seq(8), LETTERS[seq(6)]))
diag(vec)

Recycling — re-using a vector if necessary


Coercion — the result has elements of the same type

Submatrices:
MAT[, seq(3)]; MAT[c(3, 7, 3, 8), c("C", "E", "A")]
MAT[sort.list(MAT[, 2]), ]
Repetition, using conditions, exclusion, etc.
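The matrix operations on this slide can be tried in the following sketch. The matrix is defined with dimnames given as a list of row and column names so that the block runs on its own:

```r
# 8 x 6 matrix; the 16 data values are recycled to fill all 48 cells
MAT <- matrix(data = seq(16), nrow = 8, ncol = 6,
              dimnames = list(seq(8), LETTERS[seq(6)]))
dim(MAT)
MAT[1, ]                       # first row

# Diagonal matrix built from a vector
D <- diag(c(1, 2, 3))

# Submatrices
first3 <- MAT[, seq(3)]                           # first three columns
picked <- MAT[c(3, 7, 3, 8), c("C", "E", "A")]    # rows can be repeated
by_col2 <- MAT[sort.list(MAT[, 2]), ]             # rows ordered by column 2
no_row1 <- MAT[-1, ]                              # exclusion

# Coercion: one character element makes the whole matrix character
MCH <- MAT
MCH[1, 1] <- "x"
is.character(MCH)
```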
Working with matrices

matrix multiplication — operator %*%

Function apply — apply a function on every row/column of a matrix


apply(MAT, 1, sum) — the vector of row totals
apply(MAT, 2, LeUni) — the vector of column . . .
1/2 — the dimension (margin)
— can use your own function
The result: a vector if the function’s result is a scalar
a matrix if the function’s result is a vector of fixed length
a list if the function’s result has variable length

A matrix is also a vector: MAT[5 + seq(15)]
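A sketch of apply and matrix multiplication on the matrix MAT. Both MAT and LeUni are redefined here so that the block runs on its own:

```r
MAT <- matrix(data = seq(16), nrow = 8, ncol = 6,
              dimnames = list(seq(8), LETTERS[seq(6)]))
LeUni <- function(vec)
  length(unique(vec))

row_tot <- apply(MAT, 1, sum)      # scalar result per row: a vector
col_uni <- apply(MAT, 2, LeUni)    # your own function, applied to columns
col_rng <- apply(MAT, 2, range)    # result of fixed length 2: a 2 x 6 matrix

# Matrix multiplication: (6 x 8) times (8 x 6) gives a 6 x 6 matrix
XtX <- t(MAT) %*% MAT
dim(XtX)

# A matrix is also a vector, stored column by column
sub <- MAT[5 + seq(15)]            # elements 6 to 20
sub
```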


Input and output

Function scan for inputting a vector

library(foreign) — input of data formatted by other packages


read.csv, read.xport (SAS transport files), read.spss, read.dta

Output:
write.csv, write.foreign (for SAS, SPSS or Stata), write.dta

Save one or a set of R objects


save
Recover the objects in a saved file:
load
System-defined datasets stored in R: package datasets
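A minimal round trip through a csv file and through a saved R object. The data frame df and the temporary file names are made up for this sketch:

```r
# Hypothetical data frame
df <- data.frame(id = seq(3), score = c(2.5, 4, 1.5))

# Write to a csv file and read it back
csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)
df2 <- read.csv(csv_file)

# Save an R object, remove it from the workspace, and recover it
rda_file <- tempfile(fileext = ".RData")
save(df, file = rda_file)
rm(df)
load(rda_file)       # df is back in the workspace
ls()
```

load restores objects under the names they had when they were saved, so it can overwrite objects of the same name in the workspace.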
Lists

List in R is an indexed collection of objects


list(MAT, letters, LeUni, ...)
The elements of the list may be unrelated
— of different types, with different attributes

Example:
LST <- list() ## An empty list
for (i in seq(10))
LST[[i]] <- seq(i)

Operating on a list:
lapply(LST, sum) ## List of the within-element totals
Lists II

as.list: convert to a list


sapply(LST, sum) ## Turn the result to a vector if possible
The function used in lapply or sapply may even be apply
names(LST) ## list with named elements
LST[[3]] ## 3rd element of the list
LST[seq(2,5)] ## sublist comprising elements 2 to 5

Example:
ExtrC <- function(mat, cls)
mat[, cls]
lapply(LST, ExtrC, 4) ## The 4th columns of the elements of LST (which must be matrices)
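The list operations from the last two slides, as one runnable sketch. The list of matrices MLST is a made-up example, introduced because ExtrC needs list elements that are matrices:

```r
# List of vectors, as on the previous slide
LST <- list()
for (i in seq(10))
  LST[[i]] <- seq(i)

tot_list <- lapply(LST, sum)     # a list of totals
tot_vec <- sapply(LST, sum)      # the same totals as a vector
LST[[3]]                         # 3rd element: the vector 1 2 3
sub <- LST[seq(2, 5)]            # sublist of elements 2 to 5

# Extracting a column from every matrix in a list (hypothetical matrices)
MLST <- list(matrix(seq(12), 3, 4), matrix(seq(20), 4, 5))
ExtrC <- function(mat, cls)
  mat[, cls]
col4 <- lapply(MLST, ExtrC, 4)   # extra arguments are passed on to ExtrC
col4
```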
Summary (syntax)

( ) priority in evaluation, delimiting the arguments of a function


[ ] subvector, submatrix, or sublist
[[ ]] element of a list
{ } the scope of a function or loop
, separating the arguments of a function
; separating two expressions in a line
: (integer) from – to (like seq)
= naming the arguments of a function
== ‘equal’ as a logical operator
<- assignment (to an object)

+, -, /, *, %*%, ^, sum, prod, |, &, !, !=, <, >, <=, >=, " "