MTH 4407 - Group 2 (Dr. Farid Zamani) - Lecture 2


MTH 4407

Interactive Computational
Methods in Data Analysis
(Lecture 2)
Dr. Farid Zamani
Room: 207 | faridzamani@upm.edu.my
Department of Mathematics and Statistics
Faculty of Science
Revision from Lecture 1
Data structure in R

How does it work?

Vectors – numeric, character

Factors

List

Matrices and Arrays

Data frames
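
As a quick refresher, the structures listed above can each be built in a line or two of base R. This is only a minimal sketch; the object names and values (engine_sizes, car_classes, and so on) are made up for illustration and do not come from Lecture 1.

```r
# A numeric vector and a character vector
engine_sizes <- c(1.8, 2.0, 2.8)
car_classes  <- c("compact", "compact", "midsize")

# A factor stores categorical data with a fixed set of levels
class_factor <- factor(car_classes)

# A list can hold elements of different types and lengths
car_list <- list(sizes = engine_sizes, classes = car_classes, n = 3L)

# A matrix is two-dimensional; an array generalises it to more dimensions
m <- matrix(1:6, nrow = 2, ncol = 3)
a <- array(1:24, dim = c(2, 3, 4))

# A data frame is a table whose columns are equal-length vectors
cars_df <- data.frame(displ = engine_sizes, class = class_factor)

str(cars_df)
```

Calling str() on any of these objects is a convenient way to check which structure you are working with.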

Data visualisation Using ggplot2
This chapter will teach you how to visualise your data using ggplot2. R has several systems
for making graphs, but ggplot2 is one of the most elegant and most versatile. ggplot2
implements the grammar of graphics, a coherent system for describing and building
graphs. With ggplot2, you can do more, and do it faster, by learning one system and applying it in
many places.

This chapter focusses on ggplot2, one of the core members of the tidyverse. To
access the datasets, help pages, and functions that we will use in this chapter, load the
tidyverse by running this code:

install.packages("tidyverse")  # only needed once per machine
library(tidyverse)

To build a ggplot, we will use the following basic template that can be used for different
types of plots:

ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>()
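
To make the template concrete, here is one possible way to fill it in. It is a sketch that assumes the mpg data frame shipped with ggplot2 as the data, the displ and hwy columns as the mappings, and geom_point() as the geom function; any other data frame, columns, or geom could be substituted.

```r
library(ggplot2)

# <DATA> = mpg, <MAPPINGS> = x and y aesthetics, <GEOM_FUNCTION> = geom_point
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

p  # printing the object draws the scatterplot
```

Storing the plot in an object (here p, a name chosen for this example) makes it easy to add further layers later with +.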


Activity 1

Visualizing mpg data using ggplot2

Use the mpg dataset

Let’s use our first graph to answer a question: Do cars
with big engines use more fuel than cars with small
engines? You probably already have an answer, but try
to make your answer precise.
What does the relationship between engine size and
fuel efficiency look like? Is it positive? Negative?
Linear? Nonlinear?

You can test your answer with the mpg data frame found in
ggplot2 (that is, ggplot2::mpg).

> mpg

Among the variables in mpg are:

displ, a car’s engine size, in litres.

hwy, a car’s fuel efficiency on the highway, in miles per gallon (mpg).

Hypothesis:
A car with a low fuel efficiency consumes more fuel
than a car with a high fuel efficiency when they
travel the same distance.
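
The hypothesis is simple arithmetic: fuel used equals distance divided by miles per gallon. The sketch below applies that to the hwy column for a hypothetical 100-mile trip (the trip length and the variable names are assumptions made for this example) and then checks the direction of the relationship with engine size numerically.

```r
library(ggplot2)  # provides the mpg data frame

# Fuel needed to travel a fixed distance: gallons = miles / (miles per gallon)
distance_miles <- 100  # hypothetical trip length
gallons_used <- distance_miles / mpg$hwy

# Correlation between engine size and highway fuel efficiency
r_displ_hwy <- cor(mpg$displ, mpg$hwy)

# Correlation between engine size and fuel used over the fixed distance
r_displ_fuel <- cor(mpg$displ, gallons_used)

r_displ_hwy
r_displ_fuel
```

A negative first value together with a positive second value would support the idea that cars with bigger engines use more fuel. The correlation coefficient only measures linear association, so a scatterplot is still needed to judge whether the relationship is linear or nonlinear.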

Example of ggplot2 package in R

We build visualizations of the mtcars dataset, which contains 32 car models and 11
attributes, using ggplot2 layers.

# Installing the packages (only needed once)
install.packages(c("ggplot2", "dplyr"))

# Loading the packages
library(ggplot2)
library(dplyr)

# Summary of the built-in mtcars dataset
summary(mtcars)

# Data layer only: you will see an empty panel with a title
ggplot(data = mtcars) + labs(title = "MTCars Data Plot")

Here we map variables in the dataset to aesthetics such as position and color. With no geometric layer yet, the plot shows the axes but no points.

# Aesthetic Layer
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  labs(title = "MTCars Data Plot")

The geometric layer controls how the data are displayed, for example as points, lines,
histograms, bars, or boxplots.

# Geometric layer
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  geom_point() +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")

Geometric layer: adding size, color, and shape, then plotting a histogram

# Adding size
ggplot(data = mtcars, aes(x = hp, y = mpg, size = disp)) +
  geom_point() +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")

# Adding shape and color
ggplot(data = mtcars, aes(x = hp, y = mpg, col = factor(cyl),
                          shape = factor(am))) +
  geom_point() +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")

# Histogram plot
ggplot(data = mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 5) +
  labs(title = "Histogram of Horsepower",
       x = "Horsepower",
       y = "Count")
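
To see what the histogram actually computes, the binned counts can be pulled out of the plot with ggplot2's layer_data() function. This is a small sketch; the object names p_hist and bins are chosen here for illustration.

```r
library(ggplot2)

p_hist <- ggplot(data = mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 5)

# layer_data() returns the data computed by the layer's stat (one row per bin)
bins <- layer_data(p_hist, 1)
head(bins[, c("xmin", "xmax", "count")])

# Every car falls into exactly one bin, so the counts add up to the row count
sum(bins$count) == nrow(mtcars)
```

Changing binwidth changes how many rows this table has, which is a quick way to judge whether the bins are too narrow or too wide for 32 observations.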

Facet Layer:

The facet layer splits the data into subsets and draws each subset in its own panel of the
same plot. Here we separate rows according to transmission type
and separate columns according to number of cylinders.
# Facet Layer
# Separate rows according to transmission type
p <- ggplot(data = mtcars, aes(x = hp, y = mpg, shape = factor(cyl))) +
  geom_point()

p + facet_grid(am ~ .) +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")

# Separate columns according to cylinders
p <- ggplot(data = mtcars, aes(x = hp, y = mpg, shape = factor(cyl))) +
  geom_point()

p + facet_grid(. ~ cyl) +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")
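
The two facet specifications can also be combined in a single call, with rows and columns faceted at once. The following is a sketch of that variation rather than part of the original example; p_facets and n_panels are names introduced here.

```r
library(ggplot2)

p_facets <- ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl) +  # rows: transmission type, columns: cylinders
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")

p_facets

# Number of distinct (am, cyl) combinations present in the data
n_panels <- nrow(unique(mtcars[, c("am", "cyl")]))
n_panels
```

In the formula given to facet_grid(), the variable on the left of ~ defines the rows and the variable on the right defines the columns; a dot means "no faceting in this direction".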

Statistics layer

In this layer, we transform the data before drawing it, for example by binning, smoothing,
or computing descriptive summaries.

# Statistics layer: add a fitted linear regression line
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  stat_smooth(method = "lm", col = "red") +
  labs(title = "Miles per Gallon vs Horsepower")
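
With the linear-model method, the smoothing line is a least-squares fit of y on x. Assuming the default formula y ~ x, the same line can be obtained directly with base R's lm(), which is useful when you want the numbers behind the picture. The name fit is chosen for this sketch.

```r
# Fit the same straight line that the smoother draws: mpg as a function of hp
fit <- lm(mpg ~ hp, data = mtcars)

# Intercept and slope of the fitted line
coef(fit)

# Fitted mpg for a hypothetical car with 150 horsepower
predict(fit, newdata = data.frame(hp = 150))
```

A negative slope here corresponds to the downward-sloping red line in the plot: more horsepower goes with fewer miles per gallon.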

Activity 2

Visualizing correlation matrix using ggplot2

First, install and load the ggcorrplot and ggplot2 packages.

Second, after computing the correlation matrix, we will compute
the matrix of correlation p-values using the cor_pmat() function.

Third, we will visualize the correlation matrix with the
ggcorrplot() function, which is built on ggplot2.

Create the correlation matrix
To illustrate the approach, we will use the built-in USArrests dataset and
visualize its correlation matrix following the steps above.

Syntax :
correlation_matrix <- round(cor(data), 1)
Parameters :
• correlation_matrix : Variable holding the correlation matrix to visualize, rounded to one decimal place.
• data : The dataset (numeric columns only) taken for visualization.

Syntax :
corrp.mat <- cor_pmat(data)
Parameters :
• corrp.mat : Variable holding the matrix of correlation p-values.
• data : The dataset whose pairwise correlations are tested.
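
To show what a single entry of each matrix means, the sketch below uses only base R on the USArrests data. It assumes that each p-value comes from a pairwise correlation test of the kind performed by stats::cor.test(); that is an assumption about how cor_pmat() works internally, not something stated in these slides.

```r
data(USArrests)

# Correlation matrix rounded to one decimal place
correlation_matrix <- round(cor(USArrests), 1)

# One entry of the matrix: correlation between Murder and Assault
correlation_matrix["Murder", "Assault"]

# p-value for the same pair, from a correlation test
# (assumed to be the kind of test applied to every pair of columns)
test_murder_assault <- cor.test(USArrests$Murder, USArrests$Assault)
test_murder_assault$p.value
```

A small p-value suggests the observed correlation is unlikely to be due to chance alone; a matrix of such p-values has the same shape as the correlation matrix.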

# Installing and loading the ggcorrplot package
install.packages("ggcorrplot")
library(ggcorrplot)

# Reading the data
data(USArrests)

# Computing correlation matrix
correlation_matrix <- round(cor(USArrests), 1)
head(correlation_matrix[, 1:4])

# Computing correlation matrix with p-values
corrp.mat <- cor_pmat(USArrests)
head(corrp.mat[, 1:4])
Visualizing correlation matrix
Now that we have the correlation matrix and the matrix of p-values, we
can visualize the correlation matrix.

The first visualization uses the ggcorrplot() function to draw the correlation matrix
with either the square or the circle method.

Syntax :
ggcorrplot(correlation_matrix, method = "square")   # or method = "circle"

Parameters :
• correlation_matrix : The correlation matrix for visualization.
• method : A character value giving the visualization method, "square" (the default) or "circle".

Visualizing the correlation matrix using different methods
# Visualizing the correlation matrix using
# square and circle methods
ggcorrplot(correlation_matrix, method = "square")
ggcorrplot(correlation_matrix, method = "circle")

Next, we will look at the correlogram layout types. In the ggcorrplot()
function, set type to "lower" for the lower-triangle layout or "upper" for the
upper-triangle layout; hc.order = TRUE additionally reorders the variables by
hierarchical clustering.

# Visualizing upper and lower triangle layouts
ggcorrplot(correlation_matrix, hc.order = TRUE, type = "lower",
           outline.color = "white")

ggcorrplot(correlation_matrix, hc.order = TRUE, type = "upper",
           outline.color = "white")

Reordering the correlation matrix
Syntax :
ggcorrplot(correlation_matrix, hc.order = TRUE, outline.color = "white")

Parameters :
• correlation_matrix : The correlation matrix used for visualization.
• hc.order : A logical value. If TRUE, the correlation matrix is reordered using hierarchical clustering.
• outline.color : The outline color of the squares or circles.

# Visualizing and reordering the correlation matrix
ggcorrplot(correlation_matrix,
           hc.order = TRUE,
           outline.color = "white")
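
To give a feel for what reordering by hierarchical clustering does, the sketch below reorders the USArrests correlation matrix by hand with base R's hclust(). The distance (1 - r) / 2 and the default linkage are assumptions made for illustration; the exact settings used inside ggcorrplot() may differ.

```r
corr <- cor(USArrests)

# Turn correlations into distances: highly correlated variables are close
# (this particular distance is an assumption made for illustration)
d <- as.dist((1 - corr) / 2)

hc <- hclust(d)   # hierarchical clustering with the default linkage
ord <- hc$order   # a permutation of the column indices

# Apply the same permutation to rows and columns
corr_reordered <- corr[ord, ord]
round(corr_reordered, 1)
```

Reordering puts strongly correlated variables next to each other, which makes blocks of similar variables easier to spot in the plot.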

Introducing correlation coefficient
We will now add the correlation coefficients to the plot by calling the
ggcorrplot() function with the correlation matrix and the
hc.order, type, and lab arguments.

Syntax :
ggcorrplot(correlation_matrix, hc.order = TRUE, type = "lower", lab = TRUE)

Parameters :
• correlation_matrix : The correlation matrix used for visualization.
• hc.order : A logical value. If TRUE, the correlation matrix is reordered using hierarchical clustering.
• type : A character value, "full", "lower", or "upper", selecting which part of the matrix to display.
• lab : A logical value. If TRUE, the correlation coefficients are printed on the plot.

# Adding the correlation coefficient
ggcorrplot(correlation_matrix,
           hc.order = TRUE, type = "lower", lab = TRUE)
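
The matrix of p-values computed earlier with cor_pmat() can also be passed to ggcorrplot() so that non-significant correlations are visually suppressed. The sketch below uses the p.mat, sig.level, and insig arguments of ggcorrplot(); the 5% threshold and the object name p_sig are choices made for this example.

```r
library(ggcorrplot)

correlation_matrix <- round(cor(USArrests), 1)
corrp.mat <- cor_pmat(USArrests)

# Leave blank the cells whose p-value exceeds the 5% significance level
p_sig <- ggcorrplot(correlation_matrix,
                    hc.order = TRUE, type = "lower", lab = TRUE,
                    p.mat = corrp.mat, sig.level = 0.05, insig = "blank")

p_sig
```

Setting insig = "pch" instead marks non-significant cells with a symbol rather than leaving them blank.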

TERIMA KASIH / THANK YOU
www.upm.edu.my
