3rd Class


Computer Programming with R

The apply family of functions

Salma Akter
Lecturer
Department of Statistics
Jagannath University

R has several looping functions known as the apply family of functions. These functions include:

• apply(): Apply a function over the margins of an array


• lapply(): Loop over a list and evaluate a function on each element
• sapply(): Same as lapply() but tries to simplify the result
• tapply(): Apply a function over subsets of a vector
• mapply(): Multivariate version of lapply
apply()

The function apply can be used to apply a function to the rows or columns of a matrix.

Syntax:

apply(X, MARGIN, FUN, ...)

The arguments to apply() are

• X is an array

• MARGIN is an integer vector indicating which margins should be “retained”. When MARGIN = 1, FUN is applied over rows, whereas with MARGIN = 2 it is applied over columns. With MARGIN = c(1, 2), FUN is applied to every individual element (each row and column combination).

• FUN is a function to be applied

• ... is for other arguments to be passed to FUN
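The arguments above can be illustrated with a minimal sketch. The matrix m and its values are made up for illustration; the point is how MARGIN selects rows, columns or single cells, and how an extra argument (here na.rm) travels through ... to FUN.

```r
# Small hypothetical matrix with one missing value (filled column by column)
m <- matrix(c(1, 2, 3, 4, NA, 6), nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3   NA
# [2,]    2    4    6

# na.rm = TRUE is not an argument of apply(); it is passed on to sum() via ...
row_totals <- apply(m, 1, sum, na.rm = TRUE)      # one value per row
col_totals <- apply(m, 2, sum, na.rm = TRUE)      # one value per column

# MARGIN = c(1, 2): FUN is called once per cell, so the result keeps m's shape
cell_sq <- apply(m, c(1, 2), function(v) v^2)

print(row_totals)   # 4 12
print(col_totals)   # 3 7 6
print(dim(cell_sq)) # 2 3
```

Because FUN sees a single number when MARGIN = c(1, 2), this form is mainly useful for functions that are not already vectorised.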



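• Ex:
Create a 20 by 10 matrix of Normal random numbers. Then compute the mean of each column.
x <- matrix(rnorm(200), 20, 10)
apply(x, 2, mean)        # vector of 10 column means

## We can also compute the sum of each row
apply(x, 1, sum)         # vector of 20 row sums

## With MARGIN = c(1, 2), sum() is applied to each cell, so x is returned unchanged
apply(x, c(1, 2), sum)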

Ex:
x <- matrix(runif(50), nrow = 5, ncol = 10)
apply(x, 1, mean)   # mean of each of the 5 rows
apply(x, 2, sum)    # sum of each of the 10 columns

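For the special case of column/row sums and column/row means of matrices, we have some useful shortcuts, which are also typically faster than the apply() equivalents:
• rowSums(x) is equivalent to apply(x, 1, sum)
• rowMeans(x) is equivalent to apply(x, 1, mean)
• colSums(x) is equivalent to apply(x, 2, sum)
• colMeans(x) is equivalent to apply(x, 2, mean)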
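The equivalence of the shortcuts can be checked directly. The following is a small sketch: the seed is arbitrary, and all.equal() is used rather than identical() to allow for tiny floating-point differences between the two ways of computing the result.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
x <- matrix(runif(50), nrow = 5, ncol = 10)

# Each comparison should be TRUE
same_row_sums  <- isTRUE(all.equal(rowSums(x),  apply(x, 1, sum)))
same_row_means <- isTRUE(all.equal(rowMeans(x), apply(x, 1, mean)))
same_col_sums  <- isTRUE(all.equal(colSums(x),  apply(x, 2, sum)))
same_col_means <- isTRUE(all.equal(colMeans(x), apply(x, 2, mean)))

print(c(same_row_sums, same_row_means, same_col_sums, same_col_means))
```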
lapply

• The lapply() function does the following simple series of operations:


1. it loops over a list, iterating over each element in that list
2. it applies a function to each element of the list (a function that we specify)
3. and returns a list.
Syntax:
lapply(X, FUN, ...)
This function takes three arguments:
• X is a list, data frame or vector (anything that is not a list is coerced with as.list())
• FUN is the function to be applied to each element
• ... is for other arguments to be passed to FUN
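• Have a look:
n <- list(pop = 8405837, cities = c("Dhaka", "Cumilla", "Rajshahi"))

# Loop version: print the class of each element
for (info in n) {
  print(class(info))
}

# Same idea with lapply(), which returns the classes as a list
lapply(n, class)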

Ex:
x <- list(a = 1:5, b = rnorm(10))
lapply(x, mean)

Ex:
x <- list(a = 1:4, b= rnorm(20, 1), c = rnorm(100, 5))
lapply(x, mean)

Ex:
x <- c("abc", "defghi", "d", "pqrz")
lapply(x, nchar)   # list with the number of characters in each string
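The examples above all use a built-in function with no extra arguments. As a sketch of the other two common patterns, the code below passes an extra argument through ... and uses an anonymous function; the list scores and its values are hypothetical.

```r
# Hypothetical data: one element contains a missing value
scores <- list(a = c(1, 2, NA), b = c(10, 20, 30))

# na.rm = TRUE is forwarded to mean() for every element
means <- lapply(scores, mean, na.rm = TRUE)

# An anonymous function defined on the spot: spread (max - min) of each element
spreads <- lapply(scores, function(v) max(v, na.rm = TRUE) - min(v, na.rm = TRUE))

str(means)    # list of 2: a = 1.5, b = 20
str(spreads)  # list of 2: a = 1,   b = 20
```

In both cases the result is a list with the same names as the input, whatever FUN returns.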
sapply

• The sapply() function behaves similarly to lapply(); the only real difference is in the return value.
sapply() will try to simplify the result of lapply().

• If the result is a list where every element is length 1, then a vector is returned

• If the result is a list where every element is a vector of the same length (> 1), a matrix
is returned.

• If it can’t figure things out, a list is returned.

Syntax:
sapply(X, FUN, ...)

Ex:

x <- list(a = 1:4, b= rnorm(20, 1), c = rnorm(100, 5))

sapply(x, mean)

Ex:

x <- c("abc", "defghi", "d", "pqrz")

sapply(x, nchar)   # named integer vector instead of a list
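The three simplification rules of sapply() can be seen side by side in a short sketch. The inputs are made up for illustration and chosen so that FUN returns results of length 1, of equal length greater than 1, and of differing lengths.

```r
x <- list(a = 1:4, b = 5:8)

# Every result has length 1 -> a (named) vector
v <- sapply(x, sum)        # a = 10, b = 26

# Every result has the same length (2) -> a matrix with one column per element
m <- sapply(x, range)      # column a: 1 4, column b: 5 8

# Results have lengths 1, 2 and 3 -> cannot be simplified, so a list is returned
l <- sapply(1:3, seq_len)

print(v)
print(m)
print(class(l))
```

When the type of the result matters in later code, it can be safer to call lapply() and simplify explicitly, since the shape returned by sapply() depends on the data.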
tapply

• tapply() is used to apply a function over subsets of a vector. It splits the vector into groups defined by another variable, usually the levels of a factor, and then applies the function to each group.

Syntax:
tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

The arguments to tapply() are as follows:


• X is a vector
• INDEX is a factor or a list of factors (or else they are coerced to factors)
• FUN is a function to be applied
• ... contains other arguments to be passed to FUN
• simplify indicates whether the result should be simplified
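Before turning to a real dataset, the arguments can be tried on a tiny made-up example. The vectors heights and group below are hypothetical; group plays the role of INDEX and defines which elements of heights belong together.

```r
# Hypothetical measurements and the group each one belongs to
heights <- c(150, 160, 170, 180, 175, 170)
group   <- factor(c("F", "F", "M", "M", "M", "F"))

# Mean height within each level of group
group_means <- tapply(heights, group, mean)
print(group_means)   # F = 160, M = 175

# simplify = FALSE keeps the per-group results in a list-like array
group_list <- tapply(heights, group, mean, simplify = FALSE)
print(group_list[["M"]])  # 175
```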

• Ex: Consider the dataset dmbp in the rugarch package.

install.packages("rugarch")
library(rugarch)
data(dmbp)
head(dmbp)

The data set contains the daily percentage nominal returns and a dummy variable that takes the value of 1 on Mondays and other days following no trading in the Deutschemark or British pound/U.S. dollar market during regular European trading hours and 0 otherwise.



• If we want to compute the mean daily percentage nominal returns grouped by days (0 or 1), we can use the tapply function.

ret <- dmbp$V1

The dummy variable should be stored as a factor. We add a vector of labels for the levels (0 = "Not Monday" and 1 = "Monday").

days <- factor(dmbp$V2, labels = c("Not Monday", "Monday"))
tapply(ret, days, mean) # mean daily % return grouped by days


mapply

• The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set of arguments. It applies the specified function to the first element of each argument, then to the second elements, and so on.

Ex:
a <- 1:5
b <- 6:10
mapply(sum, a, b)   # 7 9 11 13 15, i.e. sum(1, 6), sum(2, 7), ...
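A few more variations may help show what "in parallel" means. This is a sketch with made-up inputs; the helper argument k is hypothetical and only illustrates MoreArgs, which holds arguments that stay fixed across all calls instead of being iterated over.

```r
a <- 1:5
b <- 6:10

# Element-wise pairing: f(a[1], b[1]), f(a[2], b[2]), ...
sums <- mapply(function(x, y) x + y, a, b)      # 7 9 11 13 15

# Results of different lengths cannot be simplified, so a list is returned:
# rep(1, 3), rep(2, 2), rep(3, 1)
reps <- mapply(rep, 1:3, 3:1)

# k is the same for every call, so it goes in MoreArgs
scaled <- mapply(function(x, y, k) k * (x + y), a, b, MoreArgs = list(k = 2))

print(sums)
print(reps)
print(scaled)   # 14 18 22 26 30
```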
