Basic R Programming
1 Why R?
• R is a high-level programming language that is popular among biostatisticians
• R is free and open source, and anyone can contribute to it (e.g. by publishing a paper that introduces a new R package)
• Many ready-made packages can be installed in R
4 Some examples
> # hi, I am the comment
> print("Hello world!")
[1] "Hello world!"
[1] 303.3
> a == b # is a equal to b?
[1] FALSE
[1] TRUE
> 5/4
[1] 1.25
> 1/0
[1] Inf
> cos(pi)
[1] -1
> max(4,5)
[1] 5
[1] 2
[1] 2.197225
[1] TRUE
5 Data Structures
5.1 Vectors
Create a vector
> v1 <- -2:5
> v1
[1] -2 -1 0 1 2 3 4 5
[1] 3.0 3.0 2.0 2.0 7.2 7.2 0.9 0.9 100.0 100.0
[1] 3.0 2.0 7.2 0.9 100.0 3.0 2.0 7.2 0.9 100.0
[1] 3.0 2.0 2.0 7.2 7.2 7.2 0.9 0.9 0.9 0.9 100.0 100.0
[13] 100.0 100.0 100.0
> sample(1:10)
[1] 10 8 3 7 2 9 4 6 5 1
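The repeated-value outputs in this block are consistent with calls to rep(). A minimal sketch, assuming v2 was defined as c(3, 2, 7.2, 0.9, 100); that definition is not shown in the transcript and is inferred from the printed values:

```r
# Assumed definition of v2, inferred from the printed output
v2 <- c(3, 2, 7.2, 0.9, 100)

r_each  <- rep(v2, each = 2)     # repeat every element twice, in place
r_times <- rep(v2, times = 2)    # repeat the whole vector twice
r_incr  <- rep(v2, times = 1:5)  # repeat the i-th element i times

print(r_each)
print(r_times)
print(length(r_incr))            # 1 + 2 + 3 + 4 + 5 elements in total
```

The each argument repeats element by element, while times repeats the whole vector or, when given a vector, sets a separate count for every element.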
Reference elements
> v2
> v2[3]
[1] 7.2
> v2[c(2,4)]
[1] 2.0 0.9
> v2[-c(2,4)]
> v2[v2>5]
> v2[2]=88
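The indexing forms used here (positive, negative, and logical indices, then assignment) can be tried in one short script. This is a sketch that assumes v2 is c(3, 2, 7.2, 0.9, 100), a definition inferred from the printed values rather than shown in the transcript:

```r
v2 <- c(3, 2, 7.2, 0.9, 100)   # assumed definition

drop_24 <- v2[-c(2, 4)]   # negative indices drop positions 2 and 4
big     <- v2[v2 > 5]     # a logical index keeps elements where the test is TRUE
v2[2]   <- 88             # replace the second element in place

print(drop_24)
print(big)
print(v2)
```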
Vector operations
> v2
> which(v2>=3)
[1] 1 2 3 5
> which.min(v2)
[1] 4
> length(v2)
[1] 5
> v2*10+1
> (v2*10+1)[2:4][2]
[1] 73
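Arithmetic on a vector is applied element by element, with scalars recycled across the vector. The sketch below assumes v2 holds c(3, 88, 7.2, 0.9, 100) at this point, which is inferred from the outputs shown and not stated in the transcript:

```r
v2 <- c(3, 88, 7.2, 0.9, 100)  # assumed state of v2 after v2[2] = 88

scaled <- v2 * 10 + 1          # the scalars 10 and 1 are recycled across v2
middle <- scaled[2:4]          # elements 2 to 4 of the result

print(scaled)
print(middle[2])               # same as (v2*10+1)[2:4][2]
print(which(v2 >= 3))          # positions where the condition holds
```

Chained brackets are evaluated left to right, so each index refers to the result of the previous subsetting step.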
5.2 Matrices
> d <- sample(1:20)
> d
[1] 3 17 2 16 14 12 4 8 15 20 13 7 6 19 1 10 9 11 18 5
      cols
rows    a  b  c  d  e
  cat   3 17  2 16 14
  dog  12  4  8 15 20
  rat  13  7  6 19  1
  fish 10  9 11 18  5
[1] 15
[1] 15
> m1[c(2,4),3]
dog fish
8 11
> m1[c("dog","fish"),"c"]
dog fish
8 11
> m1[3,]
a b c d e
13 7 6 19 1
> m1["rat",]
a b c d e
13 7 6 19 1
[1] 4 5
[1] 4
[1] 5
[1] 20
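The command that builds m1 is not shown in the transcript. One way to obtain a matrix with this layout is sketched below; the matrix() call and its arguments are an assumption, and d is typed in from the printed values because sample(1:20) is random:

```r
# Values copied from the printed d (sample(1:20) would give a new order)
d <- c(3, 17, 2, 16, 14, 12, 4, 8, 15, 20,
       13, 7, 6, 19, 1, 10, 9, 11, 18, 5)

# Assumed construction: fill row by row and name both dimensions
m1 <- matrix(d, nrow = 4, ncol = 5, byrow = TRUE,
             dimnames = list(rows = c("cat", "dog", "rat", "fish"),
                             cols = c("a", "b", "c", "d", "e")))

print(m1)
print(m1[2, 4])          # element selected by position
print(m1["dog", "d"])    # same element selected by name
print(dim(m1))           # number of rows, then number of columns
```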
> colnames(m1) # column names
[,1] [,2]
[1,] 111 333
[2,] 222 444
          cols
rows        a  b   c  d   e
  cat       3 17   2 16  14
  dog      12  4 111 15 333
  rat      13  7 222 19 444
  elephant 10  9  11 18   5
[,1] [,2]
[1,] 21 24
[2,] 22 25
[3,] 23 26
> m3%*%m5 # matrix multiplication; m3%*%m4 would raise an error because the dimensions are non-conformable
[,1] [,2]
[1,] 202 229
[2,] 268 304
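The definitions of m3, m4 and m5 are not shown in the transcript. The sketch below assumes definitions that reproduce the printed matrices, and contrasts the matrix product %*% with the element-wise product *:

```r
# Assumed definitions, chosen to match the printed matrices
m3 <- matrix(1:6, nrow = 2)                   # 2 x 3, filled by column
m4 <- matrix(11:16, nrow = 2, byrow = TRUE)   # 2 x 3, filled by row
m5 <- matrix(21:26, nrow = 3)                 # 3 x 2, filled by column

prod35      <- m3 %*% m5   # inner dimensions match (3 and 3)
elementwise <- m3 * m4     # element-wise product needs equal shapes

# m3 %*% m4 fails: a 2 x 3 matrix times a 2 x 3 matrix is non-conformable
msg <- tryCatch({ m3 %*% m4; "no error" },
                error = function(e) conditionMessage(e))

print(prod35)
print(msg)
```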
> # several useful functions
> t(m1) # transpose
    rows
cols cat dog rat elephant
   a   3  12  13       10
   b  17   4   7        9
   c   2 111 222       11
   d  16  15  19       18
   e  14 333 444        5
> apply(m1,1,sum) # sum of each row
cat dog rat elephant
52 475 705 53
> apply(m1,1,sqrt) # pay attention to the output dimension
   rows
         cat       dog       rat elephant
  a 1.732051  3.464102  3.605551 3.162278
  b 4.123106  2.000000  2.645751 3.000000
  c 1.414214 10.535654 14.899664 3.316625
  d 4.000000  3.872983  4.358899 4.242641
  e 3.741657 18.248288 21.071308 2.236068
> apply(m1,2, function(x) return(sum(x^2+1)))
a b c d e
426 439 61734 1170 308250
> rbind(m3,m4) # row combination
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[3,] 11 12 13
[4,] 14 15 16
> cbind(m3,m4) # column combination
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 3 5 11 12 13
[2,] 2 4 6 14 15 16
> as.vector(m3)
[1] 1 2 3 4 5 6
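The remark about the output dimension of apply() can be checked on a small matrix. In this sketch, m is a hypothetical 2-by-3 example, not one of the matrices above:

```r
m <- matrix(1:6, nrow = 2)      # small 2 x 3 example matrix

row_tot  <- apply(m, 1, sum)    # one value per row
row_tot2 <- rowSums(m)          # built-in shortcut giving the same values
roots    <- apply(m, 1, sqrt)   # each row's result is stored as a column

print(row_tot)
print(dim(roots))               # 3 x 2, transposed relative to m
print(all.equal(t(roots), sqrt(m)))
```

When the applied function returns a vector, apply() stacks the results as columns, so applying over rows gives a transposed layout; t() restores the original orientation.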
5.3 Arrays
An array is a multi-dimensional generalization of a matrix.
> # create an array
> a <- array(1:24,dim=c(4,3,2),
+ dimnames=list(c("a","b","c","d"),c("x","y","z"),c("old","new") ))
> a
, , old

  x y  z
a 1 5  9
b 2 6 10
c 3 7 11
d 4 8 12

, , new

   x  y  z
a 13 17 21
b 14 18 22
c 15 19 23
d 16 20 24
> # reference elements
> a[2,1,"new"] # pay attention to the dimension change
[1] 14
> a[-2,"x",]
old new
a 1 13
c 3 15
d 4 16
> # operation
> dim(a)
[1] 4 3 2
> apply(a,3,mean)
old new
6.5 18.5
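Subsetting an array drops dimensions of length one by default, and apply() accepts more than one margin. A short sketch using the same array a; the drop = FALSE and c(1, 3) variants are additions for illustration:

```r
a <- array(1:24, dim = c(4, 3, 2),
           dimnames = list(c("a", "b", "c", "d"),
                           c("x", "y", "z"),
                           c("old", "new")))

one_val <- a[2, 1, "new"]                # all indices fixed: a single number
kept    <- a[2, 1, "new", drop = FALSE]  # drop = FALSE keeps a 1 x 1 x 1 array
by_row_layer <- apply(a, c(1, 3), sum)   # sum over columns per row and layer

print(dim(kept))
print(by_row_layer)
```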
5.4 Lists
> # create a list
> l1 <- list(name=c("Peter","Lily","Emma"),c("yes","no"),
+ age=c(20,40,33,rep(18,times=3)),
+ value=matrix(1:6,2,3))
> l1
$name
[1] "Peter" "Lily" "Emma"
[[2]]
[1] "yes" "no"
$age
[1] 20 40 33 18 18 18
$value
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> l1[["age"]]
[1] 20 40 33 18 18 18
> l1$name
> l1[["value"]][2,]
[1] 2 4 6
> length(l1)
[1] 4
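Single and double brackets behave differently on lists, which is a common source of confusion. A minimal sketch using the same l1:

```r
l1 <- list(name = c("Peter", "Lily", "Emma"), c("yes", "no"),
           age = c(20, 40, 33, rep(18, times = 3)),
           value = matrix(1:6, 2, 3))

sub_list <- l1["age"]      # single brackets return a list of length 1
element  <- l1[["age"]]    # double brackets return the element itself
second   <- l1[[2]]        # unnamed elements are reached by position

print(class(sub_list))
print(class(element))
print(names(l1))           # the unnamed element has an empty name
```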
[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 2 2 2 2 2
[2,] 2 2 2 2 2
[3,] 2 2 2 2 2
[4,] 2 2 2 2 2
[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
[[3]]
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
[[1]]
[1] 40
[[2]]
[1] 55
[[3]]
[1] 3
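The commands that produced these two lists are missing from the transcript. The printed values are consistent with applying sum to every element of a list of three matrices via lapply(); the sketch below assumes that reading, and the name l2 is hypothetical:

```r
# Assumed list, chosen to match the three matrices printed above
l2 <- list(matrix(2, 4, 5), matrix(1:10, 2, 5), diag(3))

sums_list <- lapply(l2, sum)   # returns a list, one entry per element
sums_vec  <- sapply(l2, sum)   # simplifies the result to a vector

print(sums_list)
print(sums_vec)
```

lapply() always returns a list, while sapply() tries to simplify the result to a vector or matrix when every entry has the same length.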
5.5 Dataframes
> # create a data frame
> ID <- 100:103
> name <- c("Peter","Lily","Emma","Joe")
> sex <- c("M","F","F","M")
> age <- c(22,30,16,44)
> married <- c(FALSE,TRUE,FALSE,TRUE) # T and F are ordinary variables that can be overwritten
> d1 <- data.frame(ID,name,sex,age,married)
> rownames(d1) <- name
> d1
       ID  name sex age married
Peter 100 Peter   M  22   FALSE
Lily  101  Lily   F  30    TRUE
Emma  102  Emma   F  16   FALSE
Joe   103   Joe   M  44    TRUE
[1] 16
> d1[d1$ID==103,]
> # operation
> d2 <- d1[order(d1[,"age"]),]
> d2
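The selection and sorting commands in this section can be run together as below. The data frame is rebuilt so the script is self-contained; the subset() call at the end is an extra illustration that does not appear in the transcript:

```r
name <- c("Peter", "Lily", "Emma", "Joe")
d1 <- data.frame(ID = 100:103, name = name,
                 sex = c("M", "F", "F", "M"),
                 age = c(22, 30, 16, 44),
                 married = c(FALSE, TRUE, FALSE, TRUE))
rownames(d1) <- name

emma_age <- d1["Emma", "age"]          # one cell, by row and column name
joe_row  <- d1[d1$ID == 103, ]         # rows where the condition is TRUE
d2       <- d1[order(d1[, "age"]), ]   # rows sorted by increasing age
adults   <- subset(d1, age >= 18, select = c(name, age))

print(d2)
print(adults)
```

order() returns the row positions that would sort the column, so using it as a row index reorders the whole data frame.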
6 Control Structures
6.1 if()...else
if (condition) {
  expression 1   # evaluated if condition is TRUE
} else {
  expression 2   # evaluated if condition is FALSE
}
[1] "Congratulations!"
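The code that printed this message is not shown. A minimal sketch of an if()...else block that produces it, where the variable score and the threshold 60 are hypothetical:

```r
score <- 85    # hypothetical value, not from the original example

if (score >= 60) {
  msg <- "Congratulations!"
} else {
  msg <- "Try again."
}
print(msg)
```

At the top level of a script, else has to sit on the same line as the closing brace of the if block; otherwise R treats the if statement as finished.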
6.2 ifelse()
ifelse(condition, value if TRUE, value if FALSE)
[,1] [,2] [,3]
[1,] 2 6 1
[2,] 5 3 4
> y <- ifelse(x>3,1,0)
> y
[,1] [,2] [,3]
[1,] 0 1 0
[2,] 1 0 1
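Unlike if(), ifelse() works element by element and keeps the shape of its input. The sketch below rebuilds a matrix with the same values as x; the nested call and its labels are added for illustration:

```r
x <- matrix(c(2, 5, 6, 3, 1, 4), nrow = 2)   # same values as the printed x

y <- ifelse(x > 3, 1, 0)   # element-wise test; the result has the shape of x
labels <- ifelse(x > 4, "high", ifelse(x > 2, "mid", "low"))  # nested calls

print(y)
print(labels)
```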
6.3 switch()
switch(expression, list of alternatives)
> grade <- "D"
> switch(grade, A="Great job!", B="Not bad", C="So-so", D="Do not cry! I trust you!")
[1] "Do not cry! I trust you!"
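When the value matches none of the names, switch() returns NULL invisibly unless a default is supplied as a final unnamed argument. A small sketch, in which the function name feedback and the default message are hypothetical:

```r
# Hypothetical helper showing a default branch in switch()
feedback <- function(grade) {
  switch(grade,
         A = "Great job!",
         B = "Not bad",
         C = "So-so",
         "Unknown grade")   # unnamed last argument acts as the default
}

print(feedback("B"))
print(feedback("Z"))
```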
[1] 55
> a <- 0
> i <- 0
> while(i<10){
+ i <- i+1
+ a <- a+i
+ }
> a
[1] 55
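The same sum of 1 to 10 can be written with a for loop, which removes the manual counter. This sketch is an equivalent formulation, not necessarily the code used in the original:

```r
a <- 0
for (i in 1:10) {   # i takes the values 1, 2, ..., 10 in turn
  a <- a + i
}
print(a)
print(sum(1:10))    # vectorized equivalent, usually preferred in R
```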
7 Functions
functionName <- function(arguments) {
  body
}
> # example 1:
> my.func <- function(grade){
+ switch(grade, A="Great job!", B="Not bad", C="So-so", D="Fail again?... OK... Just cry!")
+ }
> my.func("D")
[1] 120
> factorial(5)
[1] 120
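The second example is missing from the transcript; the outputs suggest a user-written function compared against the built-in factorial(). One possible version is sketched below, where the name my.fact and the recursive approach are assumptions:

```r
# Hypothetical name; the original example 2 is not shown in the transcript
my.fact <- function(n) {
  if (n <= 1) {
    return(1)             # base case: 0! and 1! are both 1
  }
  n * my.fact(n - 1)      # recursive step; the last value is returned
}

print(my.fact(5))
print(factorial(5))       # built-in function, for comparison
```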
> # example 3: a function with multiple arguments
> my.func2 <- function(x,y){
+ len=length(y)
+ return(x[1:len])
+ }
> my.func2(c(4,3,2,1),c(1,2))
[1] 4 3
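Arguments can also carry default values and be passed by name. A brief sketch with a hypothetical function first.n, which is not part of the original examples:

```r
# Hypothetical function illustrating a default argument value
first.n <- function(x, n = 2) {
  x[seq_len(min(n, length(x)))]   # never index past the end of x
}

print(first.n(c(4, 3, 2, 1)))          # uses the default n = 2
print(first.n(c(4, 3, 2, 1), n = 3))   # argument passed by name
```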
9 Graph
> # arrange the figure to contain four subplots
> par(mfrow=c(2,2)) # 2-by-2 grid of subplots
> # graph 1: plot
> x <- 1:10
> y <- seq(0.1,1,by=0.1) # theoretical value
> z <- y + rnorm(10,mean=0,sd=0.1) # actual value
> plot(x, z, type="p",col="red",main="Plot", xlab="x label", ylab="value")
> lines(x, y)
> # graph 2: histogram
> age <- rnorm(1000,mean=20,sd=3)
> hist(age,main="Histogram",xlab="age",ylab="counts")
> # graph 3: boxplot
> boxplot(age,main="boxplot",ylab="age")
> # graph 4: qq plot
> qqnorm(age,main="qq plot")
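To write plots to a file instead of the screen, a graphics device can be opened before plotting and closed afterwards. A minimal sketch, assuming a PDF file in a temporary location is acceptable; the file path and the qqline() call are additions for illustration:

```r
out <- tempfile(fileext = ".pdf")   # temporary path, chosen for illustration
pdf(out, width = 7, height = 7)     # open a PDF graphics device

old <- par(mfrow = c(2, 2))         # par() returns the previous settings
age <- rnorm(1000, mean = 20, sd = 3)
hist(age, main = "Histogram", xlab = "age", ylab = "counts")
boxplot(age, main = "boxplot", ylab = "age")
qqnorm(age, main = "qq plot")
qqline(age)                         # reference line for the normal QQ plot
par(old)                            # restore the previous layout

dev.off()                           # close the device and write the file
print(file.exists(out))
```

Saving the value returned by par() and restoring it afterwards keeps the 2-by-2 layout from affecting later plots.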
10 At the end
1. R is similar to MATLAB but differs in many details
2. R programming courses
UPitt: BIOST 2094 – STATISTICAL COMPUTING AND DATA ANALYSIS USING R
3. Some useful references
http://cran.r-project.org/doc/manuals/r-release/R-lang.html
http://www.cyclismo.org/tutorial/R/
http://ww2.coastal.edu/kingw/statistics/R-tutorials/
Or just Google it!
[Figure: 2-by-2 panel of the four graphs. "Plot": value against x label; "Histogram": counts against age; "boxplot": age; "qq plot": Sample Quantiles against Theoretical Quantiles.]