Data Analytics using R Programming
By
Dr. K. Sasirekha M.C.A., M.Phil., Ph.D.,
Department of Computer Science
Periyar University
Agenda
• Data Analytics – Basics
• Getting Started with R
• Some basic operations in R
• Packages in R
• Data Analytics with R
Data Analytics - Basics
What is Data?
• Qualitative
• Quantitative
   Discrete (for example, 5)
   Continuous (for example, 3.45)
Data Analytics - Basics
What is Data Analytics?
• Analyzing raw data in order to draw
conclusions from it
Data Analytics - Basics
• Descriptive analytics: What has happened?
• Diagnostic analytics: Why did it happen?
• Predictive analytics: What might happen?
• Prescriptive analytics: What should we do?
Data Analytics - Basics
Applications
• Business Analytics
• Health Analytics
• Web Analytics
• Risk Analytics
Getting Started with R
• R (the language):
   was created in the early 1990s
   is based on the S language
   is a high-level language
   is an interpreted language
Installing R
• To install R, first go to http://www.r-project.org
and follow the CRAN link to the list of mirrors.
• Once you’ve chosen a mirror close to you,
click that link and select your platform.
Choosing an IDE
• If you use R under Windows or Mac OS X, then a graphical
user interface (GUI) is available to you.
• Some of the best GUIs are:
Eclipse/Architect
RStudio
Revolution-R
Live-R
Tinn-R
https://www.rstudio.com/
Variable Assignment
• Assign values to variables with the assignment operator "="
• Note that another form of the assignment operator, "<-", is also in use
> X = 2
> X
[1] 2
> X <- 5
> X
[1] 5
• Comments start with #
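The two operators can be compared in a short sketch. The helper name sum_sq is hypothetical and only serves the illustration; the point is that "=" inside a function call names an argument rather than assigning a variable, and that R is case-sensitive.

```r
# "=" and "<-" behave the same for top-level assignment
a = 2
b <- 2
identical(a, b)            # TRUE

# R is case-sensitive: X and x are different variables
X <- 5
x <- 10
X == x                     # FALSE

# Inside a function call, "=" names an argument; it does not assign a variable
sum_sq <- function(v) sum(v^2)   # hypothetical helper for illustration
total <- sum_sq(v = 1:3)         # 1 + 4 + 9
total                            # 14
```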
Basic Data Types
Numeric
Integer
Complex
Logical
Character
Factor
Date
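Factor is listed above but has no slide of its own, so here is a minimal sketch of how a factor stores categorical data. The sample values are made up for illustration.

```r
# A factor stores categorical data as integer codes plus a set of levels
sizes <- c("small", "large", "medium", "small")   # made-up sample values
f <- factor(sizes, levels = c("small", "medium", "large"))

class(f)         # "factor"
levels(f)        # "small" "medium" "large"
as.integer(f)    # 1 3 2 1  (position of each value in the levels)
table(f)         # counts per level
```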
Numeric
> x = 10.5 # assign a decimal value
> x # print the value of x
[1] 10.5
> class(x) # print the class name of x
[1] "numeric"
Integer
• In order to create an integer variable in R, we invoke the as.integer
function.
For example,
> y = as.integer(3)
> y # print the value of y
[1] 3
> class(y) # print the class name of y
[1] "integer"
> is.integer(y) # is y an integer?
[1] TRUE
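As a complement to as.integer, the sketch below shows the L suffix, which also produces an integer, and what happens when integers are mixed with plain numbers.

```r
# The L suffix creates an integer literal directly
y1 <- 3L
y2 <- as.integer(3)
identical(y1, y2)      # TRUE

class(3)               # "numeric": a plain number is not an integer
as.integer(3.99)       # 3 (the fraction is dropped, not rounded)

# Mixing an integer with a numeric value gives a numeric result
is.integer(y1 + 1L)    # TRUE
is.integer(y1 + 1)     # FALSE
```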
Complex
• A complex value in R is defined via the pure imaginary
value i
• For example,
> z = 1 + 2i # create a complex number
> z # print the value of z
[1] 1+2i
> class(z) # print the class name of z
[1] "complex"
Logical
> x = 1; y = 2 # sample values
> z = x > y # is x larger than y?
> z # print the logical value
[1] FALSE
> class(z) # print the class name of z
[1] "logical"
Character
> x = as.character("hai")
> x # print the character string
[1] "hai"
> class(x) # print the class name of x
[1] "character"
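A few base R functions that work on character values, as a brief sketch:

```r
x <- "hai"

nchar(x)             # 3: number of characters
toupper(x)           # "HAI"
substr(x, 1, 2)      # "ha": characters 1 to 2
paste(x, "there")    # "hai there": joined with a space by default
```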
Date
> temp <- c("12-09-1973")
> z <- as.Date(temp, "%d-%m-%Y")
> z
[1] "1973-09-12"
> class(z)
[1] "Date"
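Once a value has class Date, it supports arithmetic and formatting. A small sketch, where the second date is chosen only for illustration:

```r
d1 <- as.Date("12-09-1973", format = "%d-%m-%Y")
d2 <- as.Date("1973-10-12")     # ISO dates (yyyy-mm-dd) need no format string

d2 - d1                         # difference in days
as.numeric(d2 - d1)             # 30
d1 + 7                          # one week later: "1973-09-19"
format(d1, "%Y")                # "1973"
```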
Data structures
Before you can perform statistical analysis in R, your
data has to be structured in some coherent way. To
store your data, R has the following structures:
Vector
Matrix
Array
Data frame
List
Time-series
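Time-series is the one structure in this list without its own slide. A minimal sketch using the base ts() function follows; the quarterly values are made up for illustration.

```r
# Eight made-up quarterly values starting in the first quarter of 2020
sales <- c(12, 15, 14, 18, 20, 22, 21, 25)
sales_ts <- ts(sales, start = c(2020, 1), frequency = 4)

class(sales_ts)        # "ts"
frequency(sales_ts)    # 4 observations per year
sales_ts               # printed as a year-by-quarter table

# Keep only the observations from 2021 onwards
sales_2021 <- window(sales_ts, start = c(2021, 1))
sales_2021
```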
Vector
A vector is a sequence of data elements of the same basic type.
For example, here is a vector containing three numeric values: 2, 3, 5.
> c(2, 3, 5)
[1] 2 3 5
Here is a vector of logical values.
> c(TRUE, FALSE, TRUE, FALSE, FALSE)
[1] TRUE FALSE TRUE FALSE FALSE
Vector Operations
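# Creating a vector using the ':' operator
a = 1:5; a
b = -3:4; b
# Creating a vector using the seq function
# (named s rather than c, so the c() function is not masked)
s = seq(from = 1, to = 10, by = 2); s
# Accessing elements of a vector: by position, by range, by logical mask
a[3]
a[1:3]
a[c(FALSE, TRUE, TRUE, FALSE, TRUE)]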
Vector Operations Cont’d
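# Performing vector arithmetic (element by element)
a = 1:4; b = 5:8
a
b
res = a + b; res        # 6 8 10 12
res = a - b; res        # -4 -4 -4 -4
res = a * b; res        # 5 12 21 32
res = a / b; res
res = a + b^2; res      # 26 38 52 68
res = 2 + a; res        # 3 4 5 6
res = 2 + 3 * b; res    # 17 20 23 26
res = (2 + 3) * b; res  # 25 30 35 40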
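Expressions such as 2 + a work because R recycles the shorter operand. A short sketch of the recycling rule, with a hypothetical vector named short:

```r
a <- 1:4
short <- c(10, 20)     # hypothetical shorter vector

# short is reused as 10, 20, 10, 20 to match the length of a
recycled <- a + short
recycled               # 11 22 13 24

# A single number is a vector of length one, recycled the same way
a * 2                  # 2 4 6 8

# If the longer length is not a multiple of the shorter one,
# R still computes a result but issues a warning.
```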
Vector Operations Cont’d
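# Vector repetition
e = rep(5, 4); e
# Replace a single element
e[1] = 10
e
# Remove elements by value (drops every element equal to 10)
e = e[e != 10]
e
# Delete a single element by position
e = e[-1]
e
# Empty the vector (use rm(e) to remove the variable itself)
e = NULL
e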
Matrix Operations
#A matrix is a two-dimensional array
#Creating a Matrix
A=matrix(1:9, nrow = 3); A
B=matrix(1:9, nrow=3, byrow=TRUE); B
#Access Elements of a matrix
A[2, 3]
A[2, ]
A[ ,3]
#Combining Matrices
a = matrix(1:9, 3,3); a
b = matrix(10:18, 3,3); b
cbind(a,b)
rbind(a,b)
Matrix Operations Cont’d
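# Matrix arithmetic (element-wise; a and b are the 3x3 matrices created above)
res = a + b; res
res = a - b; res
res = a * b; res    # element-wise product; a %*% b gives the matrix product
res = a / b; res
# Modify matrix elements
a[3,3] = 0; a
a[a > 5] = 0; a     # set every element greater than 5 to 0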
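The * operator multiplies matrices element by element. For the algebraic matrix product, base R provides %*%. A small sketch on 2x2 matrices, chosen so the results are easy to check by hand:

```r
a <- matrix(1:4, nrow = 2)    # columns are (1, 2) and (3, 4)
b <- matrix(5:8, nrow = 2)    # columns are (5, 6) and (7, 8)

elementwise <- a * b          # multiplies matching positions
elementwise                   # rows: 5 21 / 12 32

product <- a %*% b            # rows of a times columns of b
product                       # rows: 23 31 / 34 46

t(a)                          # transpose: rows become columns
```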
Array
• In R, Arrays are generalizations of vectors and
matrices.
> z = array(1:27,dim=c(3,3,3))
> dim(z)
[1] 3 3 3
> print(z)
> z[,,3] # the third 3x3 slice
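Array elements are addressed with one index per dimension, and apply() can summarise along any dimension. A brief sketch on the same 3x3x3 array:

```r
z <- array(1:27, dim = c(3, 3, 3))

# One index per dimension: row 2, column 3, slice 1
z[2, 3, 1]                         # 8

# Sum each 3x3 slice (dimension 3)
slice_totals <- apply(z, 3, sum)
slice_totals                       # 45 126 207
```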
List Operations
# A list can contain elements of different types, such as numbers, strings, and vectors
mylist = list(c(1, 1, 2, 5, 14, 42), month.abb, matrix(c(3, -8, 1, -3),
nrow = 2))
mylist
# Naming list elements
names(mylist) = c("numbers", "months", "matrix")
mylist
# A list's length is the number of top-level elements that it contains
length(mylist)
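Named list elements can be reached in several ways. The sketch below rebuilds a shorter version of the list so that it runs on its own, and contrasts [ ] (which returns a list) with [[ ]] and $ (which return the element itself).

```r
mylist <- list(numbers = c(1, 1, 2, 5, 14, 42), months = month.abb)

mylist$numbers               # the numeric vector itself
mylist[["months"]][1:3]      # "Jan" "Feb" "Mar"

class(mylist["numbers"])     # "list": single brackets keep the list wrapper
class(mylist[["numbers"]])   # "numeric": double brackets extract the element
```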
List Operations Cont’d
# Arithmetic on list elements: extract the vectors with [[ ]] first,
# because arithmetic operators do not work on a list itself
L1 = list(1:5)
L1
L2 = list(6:10)
L2
L1[[1]] + L2[[1]]
L1[[1]] - L2[[1]]
L1[[1]] * L2[[1]]
L1[[1]] / L2[[1]]
Data Frame
# A data frame is a two-dimensional data structure in R
# Its columns can hold different types of data
# A data frame is created with the data.frame() function
# mydata <- data.frame(col1, col2, ..., colN)
# where col1, col2, ..., colN are column vectors of any type
# (such as character, numeric, or logical)
Data Frame Operations
#Creating a Data Frame
patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientID, age, diabetes, status)
patientdata
#Data Frame Properties
nrow(patientdata)
ncol(patientdata)
Data Frame Operations
# Accessing elements in a data frame (here: the first two columns)
patientdata[1:2]
# Modifying elements in a data frame
patientdata[1, "age"] <- 30
patientdata
# Adding a row to a data frame
patientdata <- rbind(patientdata, list(5, 40, "Type2", "Improved"))
patientdata
# Deleting a column from a data frame
patientdata$status <- NULL
patientdata
# Dropping row 5 (returns a copy; patientdata itself is unchanged)
patientdata[-5,]
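Rows of a data frame are often selected by a condition rather than by position. The sketch below recreates the patient data so that it runs on its own; the thresholds are arbitrary illustrations.

```r
patientdata <- data.frame(
  patientID = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52),
  diabetes  = c("Type1", "Type2", "Type1", "Type1"),
  status    = c("Poor", "Improved", "Excellent", "Poor")
)

# A single column, as a vector
patientdata$age

# Rows where a condition holds (note the trailing comma: all columns)
older <- patientdata[patientdata$age > 30, ]
older

# subset() combines a row condition with a column selection
type1 <- subset(patientdata, diabetes == "Type1", select = c(patientID, age))
type1

mean(patientdata$age)    # 34.75
```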
Functions and Control Statements
# Fibonacci series: each number is the sum of the two preceding numbers.
# The series is often written 1, 1, 2, 3, 5, 8, ...; this function starts it at 0.
# Returns the first n terms (n >= 1).
Fibonacci <- function(n)
{
  # if-else statement
  if (n == 1)
  {
    x <- 0
  }
  else
  {
    x <- c(0, 1)
    # while loop
    while (length(x) < n)
    {
      position <- length(x)
      new <- x[position] + x[position - 1]
      x <- c(x, new)
    }
  }
  return(x)
}
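The same series can also be built with a for loop. The sketch below is an alternative written for illustration, not the function from the slide; the name fib_for is hypothetical, and it assumes the series starts at 0 as above.

```r
# Alternative sketch: first n Fibonacci terms using a for loop
fib_for <- function(n) {
  stopifnot(n >= 1)            # guard against invalid input
  x <- numeric(n)              # n zeros, so the first term is already 0
  if (n >= 2) x[2] <- 1
  if (n >= 3) {
    for (i in 3:n) {
      x[i] <- x[i - 1] + x[i - 2]
    }
  }
  x
}

fib_for(1)    # 0
fib_for(8)    # 0 1 1 2 3 5 8 13
```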
Packages
Packages are collections of R functions, compiled code, data,
documentation, and tests, in a well-defined format.
The directory where packages are stored is called the library.
R comes with a standard set of packages.
Others are available for download and installation.
> library() # see all packages installed
> install.packages("class")
> search() # see packages currently loaded
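Installing a package does not load it. A cautious sketch of the usual pattern follows: check that the package is available, then attach it with library(). It assumes the class package may or may not be present, so the check is guarded.

```r
# Where R looks for installed packages (the library directories)
.libPaths()

loaded <- FALSE
# requireNamespace() returns FALSE instead of failing if the package is missing
if (requireNamespace("class", quietly = TRUE)) {
  library(class)                              # attach the package
  loaded <- "package:class" %in% search()     # now on the search path
}
loaded
```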
Packages Cont’d
• Adding Packages
Statistical Operations
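# Take the iris dataset without the Species column (column 5)
dm = iris[, -5]
# Convert the data frame into a numeric matrix
dm = as.matrix(dm)
# Mean, median, and standard deviation over all values in the matrix
meandm = mean(dm)
meandm
mediandm = median(dm)
mediandm
sddm = sd(dm)
sddm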
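mean(dm) pools all four measurements into one number. Per-column statistics are usually more informative; a brief sketch using colMeans() and apply():

```r
dm <- as.matrix(iris[, -5])

# One mean per column
col_means <- colMeans(dm)
col_means

# One standard deviation per column (2 = apply over columns)
col_sds <- apply(dm, 2, sd)
col_sds
```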
Data Exploration Operations
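s = c(50, 80, 90, 25, 70)
maximum = max(s)
minimum = min(s)
total = sum(s)
average = mean(s)    # mean(), not ave(): ave() returns a vector of group averages
squareroot = sqrt(s)
rounded = round(squareroot)    # named rounded so the round() function is not masked
summary(s)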
DATA VISUALIZATION OPERATIONS
#Visualization of Average Rainfall in India for Last 10 Years
Year=c(2009,2010,2011,2012,2013,2014,2015,2016,2017,2018);
Rainfall=c(69.43,43.15,35.23,50.03,60.02,47.62,48.38,38.69,52.48,58.18);
names(Rainfall)=Year
#Pie Chart
pie(Rainfall,col=Year,main="Average Rainfall in India for Last 10 Years")
#Bar Chart
barplot(Rainfall,col=Year, main="Average Rainfall in India for Last 10 Years")
DATA VISUALIZATION OPERATIONS Cont’d
# Histogram
hist(Rainfall, col="yellow", border="blue")
# Line graph
plot(Year, Rainfall, type='o', col="blue", main="Average Rainfall in India for Last 10 Years")
# Scatterplot
plot(Year, Rainfall, col="red", main="Average Rainfall in India for Last 10 Years")
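Plots can also be written to a file instead of the screen. The sketch below reuses the rainfall values from the slides and saves the line graph as a PDF; the file name and the temporary directory are assumptions made for illustration.

```r
Year <- 2009:2018
Rainfall <- c(69.43, 43.15, 35.23, 50.03, 60.02, 47.62, 48.38, 38.69, 52.48, 58.18)

# Hypothetical output location in the session's temporary directory
out_file <- file.path(tempdir(), "rainfall.pdf")

pdf(out_file, width = 7, height = 5)     # open a PDF graphics device
plot(Year, Rainfall, type = "o", col = "blue",
     xlab = "Year", ylab = "Rainfall",
     main = "Average Rainfall in India for Last 10 Years")
dev.off()                                # close the device to finish the file

file.exists(out_file)
```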