
Data Science Essentials

R Programming

J. R. Shrestha

IOE, Pulchowk Campus

November 2016

Basics

The R language is:

Interpreted (not compiled like C, FORTRAN, etc.)
Case sensitive (a and A are different)
Dynamically typed (variables need not be declared; their type comes from the value assigned)

A simple R statement (command) consists of some or all of:

Variable(s)
Constant(s)
Mathematical operator(s) [ + - * / ^ %% (remainder) %/% (integer division) ]
Function(s)
Expression(s)
Assignment [ <- -> = assign( ) ]
Example
r <- 30.5861
2*r -> h
# 2*pi*r*(r+h) is the total surface area of a cylinder with radius r and height h
assign( "area", round(2*pi*r*(r+h), 2) )
area
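The less familiar operators in the list above, %% and %/%, and the four assignment forms can be tried directly at the console. This is a small illustrative sketch; the variable names a, b, d and k are arbitrary.

```r
# Remainder and integer division
print(17 %% 5)     # 2: remainder (modulus)
print(17 %/% 5)    # 3: integer division
print(-17 %% 5)    # 3: the result takes the sign of the divisor
print(2 ^ 10)      # 1024: exponentiation

# The four ways of assigning a value
a <- 1             # leftward assignment (the usual style)
2 -> b             # rightward assignment
d = 3              # = also assigns at the top level
assign("k", 4)     # assign() takes the variable name as a string
print(c(a, b, d, k))   # 1 2 3 4
```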

Some commonly used mathematical functions

sin() cos() tan()            Trigonometric functions
sinh() cosh() tanh()         Hyperbolic functions
asin() acos() atan()         Inverse trigonometric functions
asinh() acosh() atanh()      Inverse hyperbolic functions
exp() log()                  Exponential function and natural logarithm
log10() log2()               Logarithm with base 10 and 2
sqrt()                       Square root
abs()                        Absolute value
round()                      Rounding to a given number of decimal places
floor() ceiling() trunc()    Rounding to whole numbers (the result is still of type double)
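The rounding functions differ mainly for negative numbers, and they return whole-number doubles rather than integers. A short sketch comparing them, plus two other common calls:

```r
print(floor(-2.7))           # -3: rounds down
print(ceiling(-2.7))         # -2: rounds up
print(trunc(-2.7))           # -2: drops the fractional part (toward zero)
print(typeof(floor(2.7)))    # "double": a whole number, but not of integer type
print(round(pi, 3))          # 3.142: round() takes the number of decimal places
print(log(100, base = 10))   # 2: log() also accepts a base argument
print(sqrt(c(4, 9, 16)))     # 2 3 4: applied to each element of a vector
```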
Basic Workspace Functions
To list the objects in current workspace:
ls() or objects()
To remove an object named x:
rm(x) or remove(x)
To remove all the objects:
rm(list=ls())
To get the current working directory:
getwd()
To set the current working directory (use / as path separator):
setwd("path")
To quit from the R environment:
q() or quit()
To clear the console window:
Ctrl + L (a keyboard shortcut in the console, not an R function)
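A minimal session using the workspace functions above; the object names x and y are only for illustration.

```r
x <- 1:3
y <- "hello"
print(ls())            # "x" "y" in a fresh session
rm(x)                  # remove the object named x
print(exists("x"))     # FALSE: x no longer exists
rm(list = ls())        # remove every object in the workspace
print(ls())            # character(0): the workspace is empty
```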
Getting Help

To start the help system (in a browser):

help.start()
To get help on a specific function:
help(function_name) or ?function_name
To search the help documentation for specified keyword(s):
help.search("keywords") or ??"keywords"
To run the examples for a specific function:
example(function_name)
To get help on a package:
help(package = "packagename") or library(help = "packagename")
Data Types in R

Data Types (classes)
Numeric
  Double
  Integer
Complex
Logical
Character
Date/Time

Data Structures
Vectors
Factors
Matrices
Arrays (multidimensional)
Data Frames
Lists

Special Objects
NA (Not Available)
NaN (Not a Number)
Inf (Infinity)
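A quick way to see these types is to create one value of each and ask typeof() or class(); the special values arise from ordinary arithmetic. An illustrative sketch with arbitrary variable names:

```r
d <- 3.14             # double (the default for numbers)
i <- 5L               # integer (L suffix)
z <- 2 + 3i           # complex
b <- TRUE             # logical
s <- "text"           # character
today <- Sys.Date()   # Date (class "Date", stored internally as a double)
print(c(typeof(d), typeof(i), typeof(z), typeof(b), typeof(s)))
print(class(today))   # "Date"

# Special values
print(1/0)                     # Inf
print(0/0)                     # NaN
print(is.na(c(NA, NaN, 1)))    # TRUE TRUE FALSE: NaN also counts as missing
print(is.nan(c(NA, NaN, 1)))   # FALSE TRUE FALSE: NA is not NaN
```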
Data Type: Double
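Used to represent continuous variables
Default data type for numeric values in R
Floating point, double precision (8 bytes = 64 bits)
Magnitude range: approximately $2.2 \times 10^{-308}$ to $1.8 \times 10^{308}$ (positive or negative)
Most real numbers can be represented only approximately in a computer
Check:
(0.5 + 0.25 + 0.125) == 0.875   # TRUE: these fractions are exact in binary
(0.1 + 0.1 + 0.1) == 0.3        # FALSE: 0.1 has no exact binary representation

Example
x <- 87.9
y <- "123.45"
typeof(x); typeof(y)      # "double" "character"
z <- x + as.double(y)     # convert the string to a number before adding
is.double(z)              # TRUE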
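Because doubles are approximate, equality tests on computed values are safer with a tolerance. A sketch using all.equal(), plus the machine limits behind the range stated above:

```r
a <- 0.1 + 0.1 + 0.1
print(a == 0.3)                   # FALSE: exact comparison fails
print(isTRUE(all.equal(a, 0.3)))  # TRUE: equal within a small tolerance
print(sprintf("%.20f", a))        # shows the tiny representation error

# Limits of the double type on this machine
print(.Machine$double.xmax)       # largest finite double, about 1.8e308
print(.Machine$double.xmin)       # smallest positive normalized double, about 2.2e-308
print(.Machine$double.eps)        # spacing of doubles near 1, about 2.2e-16
```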
Data Type: Integer

Used for whole numbers, e.g. counting variables

Default data type for numbers in R is Double
Integers must be created explicitly, using as.integer() or the L suffix (e.g. 20L)
Memory consumption: 4 bytes (32 bits)
Value range: $-(2^{31}-1)$ to $+(2^{31}-1)$

Example 1
m = 20
is.integer(m)     # FALSE
typeof(m)         # "double"

Example 2
n = as.integer(20)
is.integer(n)     # TRUE
typeof(n)         # "integer"
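The integer range is small enough to overflow in practice. A sketch showing the limit, what happens past it, and how as.integer() treats a fractional value:

```r
k <- 20L                          # the L suffix creates an integer directly
print(typeof(k))                  # "integer"
big <- .Machine$integer.max
print(big)                        # 2147483647, i.e. 2^31 - 1
print(typeof(big + 1))            # "double": adding the double 1 promotes the result
res <- suppressWarnings(big + 1L) # integer + integer overflows
print(res)                        # NA (R normally warns about integer overflow)
print(as.integer(3.9))            # 3: as.integer() truncates toward zero
```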
Data Structures: Vector
An ordered collection of elements of the same type
All doubles, or all integers, or all logicals, etc.
A single numeric value in R is regarded as a vector of length one
Can be created by combining values of the same type with the function
c(...), or with alternatives such as seq() and rep()

Examples
x <- c(8.5, 7.3, 6.9, 5.4)
y <- 1:5                                # 1 2 3 4 5
z <- c(rev(x), 0, y)                    # 5.4 6.9 7.3 8.5 0 1 2 3 4 5
s <- seq(from=-3, to=3, by=0.5)         # -3.0 -2.5 ... 2.5 3.0
t <- seq(from=0, to=pi, length.out=17)  # 17 equally spaced values from 0 to pi
r1 <- rep(1:4, 4)                       # 1 2 3 4 repeated 4 times
r2 <- rep(1:4, c(2,1,3,4))              # 1 1 2 3 3 3 4 4 4 4
r3 <- rep(1:4, 4:1)                     # 1 1 1 1 2 2 2 3 3 4
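The same-type rule is enforced by coercion: when c() receives mixed types, every element is converted to the most general type present. A short sketch:

```r
v <- c(1, 2.5, TRUE)      # logical TRUE is coerced to the number 1
print(v)                  # 1.0 2.5 1.0
print(typeof(v))          # "double"
w <- c(1, "a", TRUE)      # everything is coerced to character
print(w)                  # "1" "a" "TRUE"
print(length(42))         # 1: a single number is a vector of length one
print(is.vector(42))      # TRUE
```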
Vector Calculations
Arithmetic on vectors is done element-wise
v1 <- c(10, 20, 30, 40, 50)
v2 <- c(1, 2, 3, 4, 5)
v <- v1 + v2        # 11 22 33 44 55

Aggregate functions (summarize a whole vector; range() returns the minimum and maximum)

sum(v) prod(v) min(v) max(v) range(v)
mean(v) sd(v) var(v) median(v) length(v)

To access specific element(s) of a vector:

v[5]          fifth element of vector v
v[2:4]        second to fourth elements
v[c(1,2,5)]   first, second, and fifth elements
v[-c(2,3)]    all elements except the second and third
v[v>30]       all elements of v greater than 30
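Running the vectors above through a few aggregate functions and index expressions gives concrete results to check against. The last line shows recycling: a length-one value is reused for every element.

```r
v1 <- c(10, 20, 30, 40, 50)
v2 <- c(1, 2, 3, 4, 5)
v <- v1 + v2                 # 11 22 33 44 55 (element-wise sum)

# Aggregate functions
print(sum(v))                # 165
print(mean(v))               # 33
print(range(v))              # 11 55 (two values: min and max)
print(var(v))                # 302.5 (sample variance, divisor n - 1)

# Indexing
print(v[5])                  # 55
print(v[2:4])                # 22 33 44
print(v[-c(2, 3)])           # 11 44 55
print(v[v > 30])             # 33 44 55

# A length-one value is recycled across the whole vector
print(v1 * 2)                # 20 40 60 80 100
```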
Plotting

Year <- c(1800, 1850, 1900, 1950, 2000)
Carbon <- c(8, 54, 534, 1630, 6611)
plot(Carbon~Year, pch=16)

* estimated worldwide totals of carbon emissions that resulted from fossil fuel use
[Marland et al., 2003]

The objects Year and Carbon are vectors, each formed by combining separate numbers with c()
The construct Carbon~Year is a formula; plot() interprets it to mean "plot Carbon as a function of Year"
pch sets the plotting character (point symbol); pch=16 draws filled circles
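The same plot can be drawn with the x, y form instead of a formula, and labelled. Because the values grow roughly exponentially, a logarithmic y axis (log = "y") can make the trend easier to read. A sketch; the axis label is an assumption, since the slide does not state units.

```r
Year <- c(1800, 1850, 1900, 1950, 2000)
Carbon <- c(8, 54, 534, 1630, 6611)

# x, y form with axis labels and a title; points joined by lines
plot(Year, Carbon, pch = 16, type = "b",
     xlab = "Year", ylab = "Carbon emissions (units not stated)",
     main = "Carbon emissions from fossil fuel use")

# Formula form again, with a logarithmic y axis
plot(Carbon ~ Year, pch = 16, log = "y")
```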
Data Structure: Factors
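Used to represent categorical data
Values are stored as integer codes that refer to a set of levels
Also called ‘enumerated type’ or ‘category’

Example
g1 <- c("male", "male", "female", "male", "female")
g2 <- factor(g1)                               # levels sorted alphabetically
g3 <- factor(g1, levels=c("male", "female"))   # levels in the given order
typeof(g1)                        # "character"
typeof(g2); data.class(g2)        # "integer" "factor"
is.integer(g2); is.factor(g2)     # FALSE TRUE
levels(g2); levels(g3)            # "female" "male" / "male" "female"
as.integer(g2); as.integer(g3)    # 2 2 1 2 1 / 1 1 2 1 2

Arithmetic is not meaningful for factors: the line below gives a warning and returns NA values

g4 <- g3 + 5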
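Because a factor stores integer codes, converting a factor of numbers with as.integer() returns the codes, not the values. A sketch of the usual workaround, going through as.character():

```r
f <- factor(c("10", "20", "10", "30"))
print(levels(f))                     # "10" "20" "30"
print(as.integer(f))                 # 1 2 1 3: the internal codes, not the values
print(as.numeric(as.character(f)))   # 10 20 10 30: the actual values
print(table(f))                      # counts per level
print(nlevels(f))                    # 3
```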
Data Structure: Factors
For categorical data on a nominal scale, use factor( )
For categorical data on an ordinal scale, use ordered( ) or
factor(..., ordered=TRUE)

Example
t1 <- c("medium", "high", "low", "high", "low")
t2 <- ordered(t1, levels = c("low", "medium", "high"))
table(t2)     # counts listed in level order: low, medium, high

Printing t2 shows the level order: Levels: low < medium < high

R preserves the ordering of an ordered factor, which controls the order in which
categories appear in tables and charts.
In statistical modeling, R automatically encodes factors and ordered factors in a model
(by default, treatment contrasts for factors and polynomial contrasts for ordered factors).
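Since the levels of an ordered factor are ranked, comparisons and functions such as max() and sort() follow that ranking. A sketch reusing t1 and t2 from the example:

```r
t1 <- c("medium", "high", "low", "high", "low")
t2 <- ordered(t1, levels = c("low", "medium", "high"))
print(t2 > "low")     # TRUE TRUE FALSE TRUE FALSE
print(max(t2))        # high
print(sort(t2))       # low low medium high high
print(table(t2))      # low 2, medium 1, high 2
```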
Data Structure: Data Frames
Tabular data whose columns can be of different data types
The most convenient data structure for data analysis in R
Most statistical modeling routines in R take a data frame as input
R ships with a number of built-in data sets
Example
data()                #list available data sets
data(mtcars)          #load the data set mtcars
D <- mtcars           #assign/copy mtcars to D
rownames(D)           #row names
names(D)              #column names
head(D)               #first few rows (cases)
tail(D)               #last few rows
str(D)                #structure of D
D$mpg or D[["mpg"]]   #the column named mpg, as a vector
D["mpg"]              #the column named mpg, as a one-column data frame
nrow(D)               #number of rows
na.omit(D)            #D with rows containing NAs removed
complete.cases(D)     #TRUE for each row that has no NAs
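Rows and columns of a data frame can be selected together with D[rows, columns], where rows is often a logical condition. A sketch on mtcars; the threshold of 25 mpg is arbitrary.

```r
D <- mtcars
print(dim(D))                   # 32 11: rows and columns
print(class(D["mpg"]))          # "data.frame": single brackets keep a data frame
print(class(D[["mpg"]]))        # "numeric": double brackets return a vector

# Rows where mpg > 25, keeping only two columns
efficient <- D[D$mpg > 25, c("mpg", "cyl")]
print(efficient)

# Mean of selected columns
print(colMeans(D[, c("mpg", "hp")]))
```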
Creating Data Frame
Example 1
height <- c(6.5, 5.5, 6.4, ...)
weight <- c(65.4, 62.3, 59.5, ...)
income <- c(50000, 40000, 35000, ...)
employee <- data.frame(height, weight, income)
rownames(employee) <- c("Ram", "Sita", "Laxman", ... )
employee$Gender <- c("male", "female", "male", ... )
str(employee)
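Example 1 cannot run as written because of the elided values. A runnable sketch that keeps only the three rows actually listed, and stores Gender as a factor:

```r
# Three-row version of Example 1, using only the values listed above
height <- c(6.5, 5.5, 6.4)
weight <- c(65.4, 62.3, 59.5)
income <- c(50000, 40000, 35000)
employee <- data.frame(height, weight, income)
rownames(employee) <- c("Ram", "Sita", "Laxman")
employee$Gender <- factor(c("male", "female", "male"))  # categorical column
str(employee)
print(employee["Sita", "income"])             # 40000: indexed by row and column name
print(employee[employee$Gender == "male", ])  # the rows for Ram and Laxman
```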

Example 2
g1 <- sample(c("male", "female"), size=20, replace=TRUE)
g2 <- factor(g1, levels=c("male", "female"))
s1 <- rnorm(20, mean=50000, sd=10000)
s2 <- trunc(s1)
DF <- data.frame(g2, s2)
names(DF) <- c("gender", "salary")
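Example 2 draws random values, so each run produces a different data frame. Fixing the random seed makes the result reproducible; the seed 42 below is arbitrary. The sketch also names the columns directly in data.frame() and summarizes salary by gender.

```r
set.seed(42)   # arbitrary seed: makes sample() and rnorm() reproducible
g1 <- sample(c("male", "female"), size = 20, replace = TRUE)
g2 <- factor(g1, levels = c("male", "female"))
s1 <- rnorm(20, mean = 50000, sd = 10000)
s2 <- trunc(s1)
DF <- data.frame(gender = g2, salary = s2)   # column names given directly
print(table(DF$gender))                      # number of rows per gender
print(tapply(DF$salary, DF$gender, mean))    # mean salary per gender
```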
Data Structure: Matrix

Two-dimensional array

Generalization of a vector
Can be regarded as a number of equal-length vectors pasted together
All the elements must be of the same data type
Mathematical functions that apply to vectors also apply to
matrices, and are applied to each matrix element
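A short sketch of the points above: a function applied to a matrix acts on every element, elements are addressed by [row, column], and mixing types coerces the whole matrix.

```r
m <- matrix(c(1, 4, 9, 16, 25, 36), nrow = 2)   # filled column by column
print(dim(m))          # 2 3
print(sqrt(m))         # square root of every element
print(m[2, 3])         # 36: row 2, column 3
print(m[, 2])          # 9 16: the whole second column
print(typeof(matrix(c(1, "a"), nrow = 1)))      # "character": mixed types are coerced
```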
Creating Matrices
By changing the dimension using dim()
> x <- 1:8
> dim(x) <- c(2, 4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8

Using the function matrix()

> x <- matrix(1:8, 2, 4, byrow=FALSE)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
Also try byrow=TRUE, which fills the matrix row by row
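With byrow = TRUE the same numbers fill the first row before the second. The sketch also adds optional row and column names (the names are arbitrary), which can then be used for indexing.

```r
x <- matrix(1:8, nrow = 2, ncol = 4, byrow = TRUE)
print(x)
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    5    6    7    8
rownames(x) <- c("a", "b")
colnames(x) <- c("w", "x", "y", "z")
print(x["b", "y"])     # 7
```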
... Creating Matrices

Using cbind() to bind vectors as columns

> x <- cbind(c(1, 2, 3), c(4, 5, 6))
> x
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Using rbind() to bind vectors as rows

> x <- rbind(c(1, 2, 3), c(4, 5, 6))
> x
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
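cbind() and rbind() also accept named arguments, which become column (or row) names, and they can extend an existing matrix. A small sketch with arbitrary names:

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)
x <- cbind(first = a, second = b)   # argument names become column names
print(colnames(x))                  # "first" "second"
print(dim(x))                       # 3 2
y <- rbind(x, c(7, 8))              # append a row to an existing matrix
print(dim(y))                       # 4 2
print(y[4, ])                       # 7 8 (named first, second)
```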
Matrix Operations
+ - * / ^          # element-wise addition, subtraction, multiplication,
                   # division, and exponentiation respectively

C <- A %*% B       # matrix multiplication (A times B)

X <- solve(A, B)   # solve the system AX = B
X <- solve(A)      # X = inverse of A
Y <- t(X)          # Y = transpose of X
D <- det(A)        # D = determinant of A
Z <- eigen(A)      # eigenvalues (Z$values) and eigenvectors (Z$vectors)

rowSums(A)         # sums of each row of A
colSums(A)         # sums of each column of A
rowMeans(A)        # means of each row of A
colMeans(A)        # means of each column of A

rbind(A, B, ...)   # combine matrices/vectors vertically
cbind(A, B, ...)   # combine matrices/vectors horizontally
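A worked example with a concrete 2 x 2 matrix makes the operations above checkable by hand. The values of A and B are chosen only for illustration: A is symmetric, so its eigenvalues are real, and det(A) = 2*3 - 1*1 = 5.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # [2 1; 1 3], filled column by column
B <- c(3, 5)

X <- solve(A, B)          # solves 2x + y = 3 and x + 3y = 5
print(X)                  # 0.8 1.4
print(A %*% X)            # reproduces B (as a 2 x 1 matrix)

Ainv <- solve(A)          # inverse of A
print(Ainv %*% A)         # identity matrix, up to rounding error
print(det(A))             # 5
print(t(A) == A)          # all TRUE: this A is symmetric

e <- eigen(A)
print(e$values)           # (5 + sqrt(5))/2 and (5 - sqrt(5))/2, largest first
print(A * A)              # element-wise product
print(A %*% A)            # matrix product
print(rowSums(A))         # 3 4
```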