MTH 4407 - Group 2 (Dr. Farid Zamani) - Lecture 2
Interactive Computational
Methods in Data Analysis
(Lecture 2)
Dr. Farid Zamani
Room: 207 | faridzamani@upm.edu.my
Department of Mathematics and Statistics
Faculty of Science
Revision from Lecture 1
Data structure in R
Factors
List
Data frames
Data visualisation Using ggplot2
This chapter will teach you how to visualise your data using ggplot2. R has several systems
for making graphs, but ggplot2 is one of the most elegant and most versatile. ggplot2
implements the grammar of graphics, a coherent system for describing and building
graphs. With ggplot2, you can do more, faster, by learning one system and applying it in
many places.
This chapter focusses on ggplot2, one of the core members of the tidyverse. To
access the datasets, help pages, and functions that we will use in this chapter, load the
tidyverse by running this code:
install.packages("tidyverse")
library(tidyverse)
To build a ggplot, we will use the following basic template that can be used for different
types of plots:
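One common form of this template is ggplot(data = <DATA>) + <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)). The sketch below fills it in with the mpg data frame from ggplot2; the variables displ and hwy and the object name template_plot are illustrative choices, not part of the template itself.

```r
library(ggplot2)

# Template: ggplot(data = <DATA>) + <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
# Filled in with mpg; displ (engine size) and hwy (highway mileage)
# are illustrative choices of variables.
template_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

print(template_plot)
```

Swapping geom_point() for another geom function, or changing the mappings inside aes(), gives a different type of plot from the same template.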
Use the mpg dataset
You can test your answer with the mpg data frame found in
ggplot2, that is, ggplot2::mpg
> mpg
Hypothesis:
A car with a low fuel efficiency consumes more fuel
than a car with a high fuel efficiency when they
travel the same distance.
Example of ggplot2 package in R
# Loading packages
library(dplyr)
library(ggplot2)
Here we map variables in the dataset to aesthetics such as position and color.
# Aesthetic layer
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  labs(title = "MTCars Data Plot")
The geometric layer controls the essential elements of the plot: it determines how the
data are displayed, for example as points, lines, histograms, bars, or boxplots.
# Geometric layer
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  geom_point() +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")
Geometric layer: adding size, color, and shape, and then plotting a histogram
# Adding size
ggplot(data = mtcars, aes(x = hp, y = mpg, size = disp)) +
  geom_point() +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")
# Adding shape and color
ggplot(data = mtcars, aes(x = hp, y = mpg,
                          col = factor(cyl), shape = factor(am))) +
  geom_point() +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")
# Histogram plot
ggplot(data = mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 5) +
  labs(title = "Histogram of Horsepower",
       x = "Horsepower",
       y = "Count")
Facet layer:
The facet layer splits the data into subsets and displays each subset in its own panel of
the same plot. Here we separate rows according to transmission type and columns
according to number of cylinders.
# Facet layer
# Separate rows according to transmission type
p <- ggplot(data = mtcars, aes(x = hp, y = mpg, shape = factor(cyl))) +
  geom_point()
p + facet_grid(am ~ .) +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")
# Separate columns according to cylinders
p <- ggplot(data = mtcars, aes(x = hp, y = mpg, shape = factor(cyl))) +
  geom_point()
p + facet_grid(. ~ cyl) +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")
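The row and column splits can also be combined in a single call. A minimal sketch, assuming we want one panel for each combination of transmission type and number of cylinders; the object name facet_plot is illustrative.

```r
library(ggplot2)

# Base scatter plot of mpg against hp
p <- ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point()

# Rows split by transmission type (am), columns by number of cylinders (cyl)
facet_plot <- p +
  facet_grid(am ~ cyl) +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")

print(facet_plot)
```

In the formula passed to facet_grid(), the variable on the left of ~ defines the rows and the variable on the right defines the columns; a dot means no split in that direction.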
Statistics layer
In this layer, we transform our data before it is drawn, using binning, smoothing,
descriptive, or intermediate statistics.
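Two of these transformations can be sketched with mtcars. The example below is a hedged illustration, not the lecture's own code: it assumes a linear model for the smoother and the mean as the descriptive statistic, and the object names smooth_plot and summary_plot are illustrative.

```r
library(ggplot2)

# Smoothing: fit a linear trend of mpg on hp and draw it over the raw points
smooth_plot <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

# Descriptive summary: mean mpg for each number of cylinders, drawn as bars
summary_plot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  stat_summary(fun = mean, geom = "bar")

print(smooth_plot)
print(summary_plot)
```

Binning is the transformation already used by geom_histogram(), which counts the observations falling in each bin before drawing the bars.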
Activity 2
First install and load the ggcorrplot and ggplot2 packages.
Create the correlation matrix
We will use the built-in USArrests dataset as an example and visualize its
correlation matrix following the above approach.
Syntax :
correlation_matrix <- round(cor(data), 1)
Parameters :
• correlation_matrix : The variable that stores the correlation matrix to be visualized.
• data : The dataset whose correlations we want to visualize.
Syntax :
corrp.mat <- cor_pmat(data)
Parameters :
• corrp.mat : The variable that stores the matrix of correlation p-values.
• data : The dataset used to compute the matrix of correlation p-values.
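Each entry of the p-value matrix indicates whether the corresponding correlation differs significantly from zero. A minimal base-R sketch for a single pair of USArrests variables, assuming a Pearson correlation test (the default of cor.test()); the object name pair_test is illustrative.

```r
# Correlation test for one pair of variables in the built-in USArrests data
pair_test <- cor.test(USArrests$Murder, USArrests$Assault)

# Correlation coefficient: the same value as the Murder/Assault entry of cor()
pair_test$estimate
cor(USArrests)["Murder", "Assault"]

# p-value for the null hypothesis that the true correlation is zero
pair_test$p.value
```

A matrix of p-values repeats this kind of test for every pair of columns.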
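# Installing and loading the ggcorrplot package
install.packages("ggcorrplot")
library(ggcorrplot)
# Correlation matrix and p-value matrix for the built-in USArrests dataset
correlation_matrix <- round(cor(USArrests), 1)
corrp.mat <- cor_pmat(USArrests)
head(correlation_matrix[, 1:4])
head(corrp.mat[, 1:4])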
Visualizing correlation matrix
Now that we have a correlation matrix and the matrix of p-values, we can visualize
the correlation matrix.
The first visualization uses the ggcorrplot() function to plot the correlation matrix
with the square method and with the circle method.
Syntax :
ggcorrplot(correlation_matrix, method = c("square", "circle"))
Parameters :
• correlation_matrix : The correlation matrix used for visualization.
• method : The shape used to draw each cell, either "square" (the default) or "circle".
Visualizing the correlation matrix using different methods
# Visualizing the correlation matrix using
# square and circle methods
ggcorrplot(correlation_matrix, method = "square")
ggcorrplot(correlation_matrix, method = "circle")
Next, we change the layout of the correlogram. In the ggcorrplot() function,
hc.order = TRUE reorders the variables, and type selects the layout: "lower" for
the lower triangle or "upper" for the upper triangle.
ggcorrplot(correlation_matrix,
           hc.order = TRUE,
           type = "upper",
           outline.color = "white")
Reordering the correlation matrix
Syntax :
ggcorrplot(correlation_matrix, hc.order = TRUE, outline.color = "white")
Parameters :
• correlation_matrix : The correlation matrix used for visualization.
• hc.order : If it is TRUE, the correlation matrix is reordered using hierarchical clustering.
• outline.color : The outline color of the squares or circles.
Introducing correlation coefficient
We will now add the correlation coefficients to the plot by calling the ggcorrplot()
function with the correlation matrix and the hc.order, type, and lab arguments.
Syntax :
ggcorrplot(correlation_matrix, hc.order = TRUE, type = "lower", lab = TRUE)
Parameters :
• correlation_matrix : The correlation matrix used for visualization.
• hc.order : If it is TRUE, the correlation matrix is reordered using hierarchical clustering.
• type : The part of the matrix to display: "full", "lower", or "upper".
• lab : A logical value. If it is TRUE, the correlation coefficients are printed on the plot.
# Adding the correlation coefficient
ggcorrplot(correlation_matrix,
           hc.order = TRUE,
           type = "lower",
           lab = TRUE)
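The p-value matrix can also be passed to ggcorrplot() so that non-significant correlations are marked. A minimal sketch, assuming a 5% significance level and that non-significant cells should be left blank; the object name sig_plot is illustrative.

```r
library(ggcorrplot)

# Correlation matrix and p-value matrix for the built-in USArrests dataset
correlation_matrix <- round(cor(USArrests), 1)
corrp.mat <- cor_pmat(USArrests)

# Cells whose p-value exceeds sig.level are left blank
sig_plot <- ggcorrplot(correlation_matrix,
                       hc.order = TRUE,
                       type = "lower",
                       lab = TRUE,
                       p.mat = corrp.mat,
                       sig.level = 0.05,
                       insig = "blank")

print(sig_plot)
```

With insig = "pch" instead of "blank", non-significant cells are marked with a symbol rather than hidden.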
TERIMA KASIH / THANK YOU
www.upm.edu.my