Data Analytics Using R Programming
By
Dr. K. Sasirekha M.C.A., M.Phil., Ph.D.,
Department of Computer Science
Periyar University
Agenda
Data Analytics - Basics
What is Data?
Data is either qualitative or quantitative.
Quantitative data is either discrete (for example, 5) or continuous (for example, 3.45).
Data Analytics - Basics
What is Data Analytics?
Data Analytics - Basics
• Descriptive analytics: What has happened?
Data Analytics - Basics
Applications
• Business Analytics
• Health Analytics
• Web Analytics
• Risk Analytics
Getting Started with R
Installing R
• To install R, first go to http://www.r-project.org and follow the download link to the list of CRAN mirrors.
• Once you’ve chosen a mirror close to you, click that link and select your platform.
Choosing an IDE
• If you use R under Windows or Mac OS X, then a graphical
> X <- 5 # assign 5 to X
> X # print the value of X
[1] 5
• Comments start with #
Basic Data Types
Numeric
Integer
Complex
Logical
Character
Factor
Date
Numeric
> x = 10.5 # assign a decimal value
> x # print the value of x
[1] 10.5
Integer
• In order to create an integer variable in R, we invoke the as.integer
function.
For example,
> y = as.integer(3)
> y # print the value of y
[1] 3
> class(y) # print the class name of y
[1] "integer"
> is.integer(y) # is y an integer?
[1] TRUE
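Two related details can be sketched as follows. The variable names `i1` and `i2` are assumptions for illustration and do not come from the slides: the `L` suffix also creates an integer, and `as.integer` drops the fractional part of a decimal value.

```r
# Assumed names i1 and i2 are for illustration only
i1 <- 3L                 # the L suffix marks an integer literal
class(i1)                # "integer"
i2 <- as.integer(3.14)   # the fractional part is dropped
i2                       # 3
is.integer(3)            # FALSE: a plain number is stored as numeric
```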
Complex
• A complex value in R is defined via the pure imaginary value i.
• For example, 1 + 2i is a complex value with real part 1 and imaginary part 2.
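A minimal sketch of working with a complex value; the variable name `z_c` is an assumption chosen for illustration:

```r
# Assumed variable name z_c, for illustration only
z_c <- 1 + 2i            # real part 1, imaginary part 2
class(z_c)               # "complex"
Re(z_c)                  # 1
Im(z_c)                  # 2
sqrt(as.complex(-1))     # 0+1i; sqrt(-1) alone gives NaN with a warning
```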
Logical
[1] FALSE
[1] "logical"
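A logical value typically results from a comparison. One way to obtain output like the above, as a sketch with assumed variables `p`, `q`, and `r` and assumed sample values:

```r
p <- 1; q <- 2     # assumed sample values
r <- p > q         # is p larger than q?
r                  # FALSE
class(r)           # "logical"
r | TRUE           # TRUE (logical OR)
!r                 # TRUE (negation)
```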
Character
[1] "hai"
[1] "character"
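Output of this kind can be produced along the following lines. This is a sketch only; the variable names `s1` and `s2` are assumptions, and the extra calls show a few common character operations.

```r
s1 <- "hai"                 # assumed variable name
s1                          # "hai"
class(s1)                   # "character"
s2 <- as.character(10.5)    # numbers can be converted to text
s2                          # "10.5"
paste(s1, "R")              # "hai R"
nchar(s1)                   # 3
```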
Date
> temp <- c("12-09-1973")
> z <- as.Date(temp, "%d-%m-%Y")
> z
[1] "1973-09-12"
> class(z)
[1] "Date"
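Once a value has class "Date", it can be reformatted and used in arithmetic. A short sketch, where `d1` and `d2` are assumed names and the second date is an arbitrary illustrative value:

```r
d1 <- as.Date("12-09-1973", "%d-%m-%Y")   # same date as in the example above
format(d1, "%d/%m/%Y")                    # "12/09/1973"
format(d1, "%Y")                          # "1973"
d2 <- as.Date("1973-09-20")               # ISO format needs no format string
d2 - d1                                   # Time difference of 8 days
as.numeric(d2 - d1)                       # 8
```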
Data structures
Vector
Matrix
Array
Data frame
List
Time-series
Vector
A vector is a sequence of data elements of the same basic type.
> c(2, 3, 5)
[1] 2 3 5
Here is a vector of logical values.
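A logical vector can be sketched as follows. The values and the names `lv` and `mixed` are assumptions for illustration; the last two lines show what happens when the "same basic type" rule is tested with mixed input.

```r
lv <- c(TRUE, FALSE, TRUE)   # assumed example values
lv
class(lv)                    # "logical"
sum(lv)                      # 2: TRUE counts as 1, FALSE as 0
mixed <- c(1, "a", TRUE)     # mixed types are coerced to one common type
class(mixed)                 # "character"
```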
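#Vector Arithmetic (b is assumed to be a numeric vector defined earlier, e.g. b = c(1, 2, 3))
c = 2+3*b; c # multiplication is done before addition
c = (2+3)*b; c # parentheses are evaluated first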
Vector Operations Cont’d
#Vector Repetition
e=rep(5,4) ; e
#Combining Matrices
a = matrix(1:9, 3,3); a
b = matrix(10:18, 3,3); b
cbind(a,b)
rbind(a,b)
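The effect of the two combining functions is easiest to see in the dimensions of the result. A small sketch, using assumed names `m1` and `m2` with the same shapes as the matrices above:

```r
m1 <- matrix(1:9, 3, 3)      # same shapes as a and b above
m2 <- matrix(10:18, 3, 3)
dim(cbind(m1, m2))           # 3 6: columns are placed side by side
dim(rbind(m1, m2))           # 6 3: rows are stacked
t(m1)                        # transpose of m1
```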
Matrix Operations Cont’d
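#Matrix Arithmetic (all four operators work element by element)
c = a+b; c
c = a-b; c
c = a*b; c # element-wise product; a %*% b gives the matrix product
c = a/b; c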
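In R, `*` multiplies matching elements, while `%*%` performs the algebraic matrix product. A minimal sketch on two small matrices (the names `p1` and `p2` are assumptions for illustration):

```r
p1 <- matrix(1:4, 2, 2)   # filled column by column: rows are (1 3) and (2 4)
p2 <- matrix(5:8, 2, 2)   # rows are (5 7) and (6 8)
p1 * p2                   # element-wise: rows (5 21) and (12 32)
p1 %*% p2                 # matrix product: rows (23 31) and (34 46)
```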
#Array (z is assumed to be a 3 x 3 x 3 array defined earlier)
> dim(z)
[1] 3 3 3
print(z)
z[,,3] # third 3 x 3 layer
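An array with these dimensions could be built as in the sketch below. The name `arr` and the contents 1 to 27 are assumptions for illustration, not values from the slides.

```r
arr <- array(1:27, dim = c(3, 3, 3))   # assumed contents: 1..27
dim(arr)        # 3 3 3
arr[, , 3]      # third 3 x 3 layer, values 19..27
arr[2, 3, 1]    # row 2, column 3, layer 1: 8
```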
List Operations
# A list can contain elements of different types, such as numbers, strings, and vectors
mylist = list(c(1, 1, 2, 5, 14, 42), month.abb, matrix(c(3, -8, 1, -3), nrow = 2))
mylist
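Elements of a list are retrieved with `[[ ]]` or, when they are named, with `$`. A short sketch with an assumed list `lst` whose element names (`nums`, `months`) are illustrative:

```r
lst <- list(nums = c(1, 1, 2, 5, 14, 42), months = month.abb)  # assumed names
lst[[1]]          # the numeric vector itself
lst$months[3]     # "Mar"
class(lst[1])     # "list": single brackets return a sub-list
class(lst[[1]])   # "numeric": double brackets return the element
length(lst)       # 2
```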
List Operations Cont’d
#Arithmetic operations on the vectors stored in lists ([[1]] extracts the first element)
L1 = list(1:5);
L1
L2 = list(6:10);
L2
L1[[1]] + L2[[1]]
L1[[1]] - L2[[1]]
L1[[1]] * L2[[1]]
L1[[1]] / L2[[1]]
Data Frame
#A data frame is a two-dimensional data structure in R; its columns can hold different types
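A minimal sketch of creating and querying a data frame. The records, column names, and the name `df` are hypothetical and serve only as an illustration.

```r
# Hypothetical student records, for illustration only
df <- data.frame(
  name  = c("Anu", "Bala", "Chitra"),
  marks = c(82, 67, 91),
  stringsAsFactors = FALSE
)
dim(df)                  # 3 2: three rows, two columns
df$marks                 # 82 67 91
df[df$marks > 80, ]      # rows for Anu and Chitra
df$grade <- ifelse(df$marks >= 75, "A", "B")   # add a derived column
str(df)                  # compact overview of columns and types
```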
• Adding Packages
Statistical Operations
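#iris ships with R; drop column 5 (Species) to keep the four numeric columns
dm=iris[,-5]
#mean(), median() and sd() expect a vector, so apply them column by column
meandm=colMeans(dm)
meandm
mediandm=sapply(dm,median)
mediandm
sddm=sapply(dm,sd)
sddm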
Data Exploration Operations
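s=c(50,80,90,25,70)
maximum=max(s)
minimum=min(s)
total=sum(s)
average=mean(s) #mean() gives the single average; ave() would return one value per element
squareroot=sqrt(s)
rounded=round(squareroot) #renamed so the variable does not mask round()
summary(s)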
DATA VISUALIZATION OPERATIONS
#Visualization of Average Rainfall in India for Last 10 Years
Year=c(2009,2010,2011,2012,2013,2014,2015,2016,2017,2018);
Rainfall=c(69.43,43.15,35.23,50.03,60.02,47.62,48.38,38.69,52.48,58.18);
names(Rainfall)=Year
#Pie Chart
pie(Rainfall,col=Year,main="Average Rainfall in India for Last 10 Years")
#Bar Chart
barplot(Rainfall,col=Year, main="Average Rainfall in India for Last 10 Years")
DATA VISUALIZATION OPERATIONS Cont’d
#Histograms
hist(Rainfall,col="yellow", border="blue")
#Line Graph
plot(Year,Rainfall,type='o', col="blue", main="Average Rainfall in India for Last 10 Years")
#Scatterplot
plot(Year, Rainfall, col="red", main="Average Rainfall in India for Last 10 Years")