Session 4: Data Visualization
R for Stata Users
Luiza Andrade, Rob Marty, Rony Rodriguez-Ramirez, Luis Eduardo San Martin, Leonardo Viotti, Marc-
Andrea Fiorina
The World Bank – DIME | WB Github
March 2023
Introduction
2 / 56
Introduction
Initial Setup
If You Attended Session 2 If You Did Not Attend Session 2
1. Go to the dime-r-training-mar2023 folder that you created yesterday, and open the dime-r-training-mar2023 R project
that you created there.
3 / 56
Introduction
Initial Setup
If You Attended Session 2 If You Did Not Attend Session 2
1. Create a folder named dime-r-training-mar2023 in your preferred location in your computer.
2. Go to the OSF page of the course and download the file in: R for Stata Users - 2023 March > Data > dime-r-training-
mar2023.zip .
3. Unzip dime-r-training-mar2023.zip .
4. Open the dime-r-training-mar2023 R project.
3 / 56
Today's session
Exploratory Analysis v. Publication/Reporting
Data, aesthetics, & the grammar of graphics
Aesthetics in extra dimensions, themes, and saving plots
For this session, you’ll use the ggplot2 package from the tidyverse meta-
package.
Similarly to previous sessions, you can find some references at the end of this
presentation that include a more comprehensive discussion on data
visualization.
4 / 56
Introduction
Before we start
Make sure the packages ggplot2 are installed and loaded. You can load it directly using library(tidyverse) or
library(ggplot2)
Load the whr_panel data set (remember to use the here package) we created last week.
# Packages
library(tidyverse)
library(here)
whr_panel <- read_csv(
here(
"DataWork", "DataSets", "Final", "whr_panel.csv"
)
)
## Rows: 470 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): country, region
## dbl (6): year, happiness_rank, happiness_score, economy_gdp_per_capita, heal... 5 / 56
Introduction
In our workflow there are usually two distinct uses for plots:
1. Exploratory analysis: Quickly visualize your data in an insightful way.
Base R can be used to quickly create basic figures
We will also use ggplot2 to quickly create basic figures as well.
2. Publication/Reporting: Make pretty graphs for a presentation, a project report, or papers:
We’ll do this using ggplot2 with more customization. The idea is to create beautiful graphs.
6 / 56
Exploratory Analysis
7 / 56
Plot with Base R
First, we’re going to use base plot, i.e., using Base R default libraries. It is easy to use and can produce useful graphs with very
few lines of code.
Exercise 1: Exploratory Analysis.
(1) Create a vector called vars with the variables: economy_gdp_per_capita , happiness_score ,
health_life_expectancy , and freedom .
(2) Select all the variables from the vector vars in the whr_panel dataset and assign to the object whr_plot .
(3) Use the plot() function: plot(whr_plot)
# Vector of variables
vars <- c("economy_gdp_per_capita", "happiness_score", "health_life_expectancy", "freedom")
# Create a subset with only those variables, let's call this subset whr_plot
whr_plot <- whr_panel %>%
select(all_of(vars))
01:00
8 / 56
Base Plot
plot(whr_plot)
9 / 56
The beauty of ggplot2
1. Consistency with the Grammar of Graphics
This book is the foundation of several data viz applications: ggplot2 , polaris-
tableau , vega-lite
2. Flexibility
3. Layering and theme customization
4. Community
It is a powerful and easy to use tool (once you understand its logic) that produces
complex and multifaceted plots.
10 / 56
ggplot2: basic structure (template)
The basic ggplot structure is:
ggplot(data = DATA) +
GEOM_FUNCTION(mapping = aes(AESTHETIC MAPPINGS))
Mapping data to aesthetics
Think about colors, sizes, x and y references
We are going to learn how we connect our data to the components of a ggplot
11 / 56
ggplot2: full structure
ggplot(data = <DATA>) + 1. Data : The data that you want to visualize
<GEOM_FUNCTION>( 2. Layers : geom_ and stat_ → The geometric shapes and
mapping = aes(<MAPPINGS>), statistical summaries representing the data
stat = <STAT>,
3. Aesthetics : aes() → Aesthetic mappings of the
position = <POSITION>
geometric and statistical objects
) +z
<COORDINATE_FUNCTION> + 4. Scales : scale_ → Maps between the data and the
<FACET_FUNCTION> + aesthetic dimensions
<SCALE_FUNTION> + 5. Coordinate system : coord_ → Maps data into the
<THEME_FUNCTION> plane of the data rectangle
6. Facets : facet_ → The arrangement of the data into a
grid of plots
7. Visual themes : theme() and theme_ → The overall
visual defaults of a plot
12 / 56
ggplot2: decomposition
There are multiple ways to
structure plots with ggplot
For this presentation, I will stick to Thomas Lin
Pedersen's decomposition who is one of most
prominent developers of the ggplot and
gganimate package.
These components can be seen as layers, this is
why we use the + sign in our ggplot syntax.
13 / 56
Exploratory Analysis
Let's start making some plots.
ggplot(data = whr_panel) +
geom_point(mapping = aes(x = happiness_score, y = economy_gdp_per_capita))
14 / 56
Exploratory Analysis
We can also set up our mapping in the ggplot() function.
ggplot(data = whr_panel, aes(x = happiness_score, y = economy_gdp_per_capita)) +
geom_point()
15 / 56
Exploratory Analysis
We can also set up the data outside the ggplot() function as follows:
whr_panel %>%
ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
geom_point()
16 / 56
Exploratory Analysis
I prefer to use the second way of structuring our ggplot.
1. First, setting our data;
2. pipe it;
3. then aesthetics;
4. and finally the geometries.
Both structures will work but this will make a difference if you want to load more
datasets at the same time, and whether you would like to combine more geoms in
the same ggplot. More on this in the following slides.
17 / 56
Exploratory Analysis
Exercise 2: Create a scatter plot with x = freedom and y = economy_gdp_per_capita .
Solution:
whr_panel %>%
ggplot() +
geom_point(aes(x = freedom, y = economy_gdp_per_capita))
01:00
18 / 56
Exploratory Analysis
The most common geoms are:
geom_bar() , geom_col() : bar charts.
geom_boxplot() : box and whiskers plots.
geom_density() : density estimates.
geom_jitter() : jittered points.
geom_line() : line plots.
geom_point() : scatter plots.
If you want to know more about layers, you can refer to this.
19 / 56
Exploratory Analysis
In summary, our basic plots should have the following:
whr_panel %>% The data we want to plot.
ggplot(
aes(
x = happiness_score,
y = economy_gdp_per_capita
)
) +
geom_point()
20 / 56
Exploratory Analysis
In summary, our basic plots should have the following:
whr_panel %>% Columns (variables) to use for x and y
ggplot(
aes(
x = happiness_score,
y = economy_gdp_per_capita
)
) +
geom_point()
21 / 56
Exploratory Analysis
In summary, our basic plots should have the following:
whr_panel %>% How the plot is going to be drawn.
ggplot(
aes(
x = happiness_score,
y = economy_gdp_per_capita
)
) +
geom_point()
22 / 56
Exploratory Analysis
We can also map colors.
whr_panel %>%
ggplot(
aes(
x = happiness_score,
y = economy_gdp_per_capita,
color = region
)
) +
geom_point()
23 / 56
Exploratory Analysis
Let's try to do something different, try, instead of region , adding color = "blue" inside aes() .
What do you think is the problem with this code?
whr_panel %>%
ggplot(
aes(
x = happiness_score,
y = economy_gdp_per_capita,
color = "blue"
)
)
24 / 56
Exploratory Analysis
In ggplot2 , these settings are called aesthetics.
"Aesthetics of the geometric and statistical objects".
We can set up:
position : x, y, xmin, xmax, ymin, ymax, etc.
colors : color and fill.
transparency : alpha.
sizes : size and width.
shapes : shape and linetype.
Notice that it is important to know where we are setting our aesthetics. For example:
geom_point(aes(color = region)) to color points based on the variable region
geom_point(color = "red") to color all points in the same color.
25 / 56
Exploratory Analysis
Let's modify our last plot. Let's add color = "blue" inside geom_point() .
whr_panel %>%
ggplot(
aes(
x = happiness_score,
y = economy_gdp_per_capita
)
) +
geom_point(color = "blue")
26 / 56
Exploratory Analysis
Exercise 3: Map colors per year for the freedom and gdp plot we did before. Keep in mind the type of the variable
year .
Solution:
whr_panel %>%
ggplot(
aes(
x = freedom,
y = economy_gdp_per_capita,
color = year
)
) +
geom_point()
01:00
27 / 56
Exploratory Analysis
How do you think we could solve it?
Change the variable year as: as.factor(year) .
whr_panel %>%
ggplot(
aes(
x = freedom,
y = economy_gdp_per_capita,
color = as.factor(year)
)
) +
geom_point()
28 / 56
ggplot2: settings
29 / 56
ggplot2: settings
Now, let's try to modify our plots. In the following slides, we are going to:
1. Change shapes.
2. Include more geoms.
3. Separate by regions.
4. Pipe and mutate before plotting.
5. Change scales.
6. Modify our theme.
30 / 56
ggplot2: shapes
whr_panel %>%
ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
geom_point(shape = 5)
31 / 56
ggplot2: shapes
32 / 56
ggplot2: including more geoms
whr_panel %>%
ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
geom_point() +
geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
33 / 56
ggplot2: Facets
whr_panel %>%
ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
geom_point() +
facet_wrap(~ region)
34 / 56
ggplot2: Colors and facets
Exercise 4: Use the last plot and add a color aesthetic per region.
Solution:
whr_panel %>%
ggplot(
aes(
x = happiness_score,
y = economy_gdp_per_capita,
color = region
)
) +
geom_point() +
facet_wrap(~ region)
01:00
35 / 56
ggplot2: Pipe and mutate before plotting
Let's imagine now, that we would like to transform a variable before plotting.
R Code Plot
whr_panel %>%
mutate(
latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
) %>%
ggplot(
aes(
x = happiness_score, y = economy_gdp_per_capita,
color = latam)
) +
geom_point()
36 / 56
ggplot2: Pipe and mutate before plotting
Let's imagine now, that we would like to transform a variable before plotting.
R Code Plot
36 / 56
ggplot2: geom's sizes
We can also specify the size of a geom, either by a variable or just a number.
whr_panel %>%
filter(year == 2017) %>%
ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
geom_point(aes(size = economy_gdp_per_capita))
37 / 56
ggplot2: Changing scales
Linear Log
ggplot(data = whr_panel,
aes(x = happiness_score,
y = economy_gdp_per_capita)) +
geom_point()
38 / 56
ggplot2: Changing scales
Linear Log
ggplot(data = whr_panel,
aes(x = happiness_score,
y = economy_gdp_per_capita)) +
geom_point() +
scale_x_log10()
38 / 56
ggplot2: Themes
Let's go back to our plot with the latam dummy.
We are going to do the following to this plot:
1. Filter only for the year 2015.
2. Change our theme.
3. Add correct labels.
4. Add some annotations.
5. Modify our legends.
39 / 56
ggplot2: Labels
R Code Plot
whr_panel %>%
mutate(
latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
) %>%
filter(year == 2015) %>%
ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
color = latam)) +
geom_point() +
labs(
x = "Happiness Score",
y = "GDP per Capita",
title = "Happiness Score vs GDP per Capita, 2015"
)
40 / 56
ggplot2: Labels
R Code Plot
40 / 56
ggplot2: Legends
R Code Plot
whr_panel %>%
mutate(
latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
) %>%
filter(year == 2015) %>%
ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
color = latam)) +
geom_point() +
scale_color_discrete(labels = c("No", "Yes")) +
labs(
x = "Happiness Score",
y = "GDP per Capita",
color = "Country in Latin America\nand the Caribbean",
title = "Happiness Score vs GDP per Capita, 2015"
)
41 / 56
ggplot2: Legends
R Code Plot
41 / 56
ggplot2: Themes
R Code Plot
whr_panel %>%
mutate(
latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
) %>%
filter(year == 2015) %>%
ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
color = latam)) +
geom_point() +
scale_color_discrete(labels = c("No", "Yes")) +
labs(
x = "Happiness Score",
y = "GDP per Capita",
color = "Country in Latin America\nand the Caribbean",
title = "Happiness Score vs GDP per Capita, 2015"
) +
theme_minimal()
42 / 56
ggplot2: Themes
R Code Plot
42 / 56
ggplot2: Themes
The theme() function allows you to modify each aspect of your plot. Some arguments are:
theme(
# Title and text labels
plot.title = element_text(color, size, face),
# Title font color size and face
legend.title = element_text(color, size, face),
# Title alignment. Number from 0 (left) to 1 (right)
legend.title.align = NULL,
# Text label font color size and face
legend.text = element_text(color, size, face),
# Text label alignment. Number from 0 (left) to 1 (right)
legend.text.align = NULL,
)
More about these modification can be found here
43 / 56
ggplot2: Color palettes
We can also add color palettes using other packages such as: RColorBrewer,
viridis or funny ones like the wesanderson package. So, let's add new colors.
First, install the RColorBrewer package.
# install.packages("RColorBrewer")
library(RColorBrewer)
Let's add scale_color_brewer(palette = "Dark2") to our ggplot.
44 / 56
ggplot2: Color palettes
R Code Plot
whr_panel %>%
mutate(
latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
) %>%
filter(year == 2015) %>%
ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
color = latam)) +
geom_point() +
scale_color_brewer(palette = "Dark2", labels = c("No", "Yes")) +
labs(
x = "Happiness Score",
y = "GDP per Capita",
color = "Country in Latin America\nand the Caribbean",
title = "Happiness Score vs GDP per Capita, 2015"
) +
theme_minimal()
45 / 56
ggplot2: Color palettes
R Code Plot
45 / 56
ggplot2: Color palettes
My favorite color palettes packages:
1. ghibli
2. LaCroixColoR
3. NineteenEightyR
4. nord
5. palettetown
6. quickpalette
7. wesanderson
46 / 56
Saving a plot
47 / 56
Saving a plot
Remember that in R we can always assign our functions to an object. In this case, we can assign our ggplot2 code to an
object called fig as follows.
fig <- whr_panel %>%
mutate(
latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
) %>%
filter(year == 2015) %>%
ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
color = latam)) +
geom_point() +
scale_color_discrete(labels = c("No", "Yes")) +
labs(
x = "Happiness Score",
y = "GDP per Capita",
color = "Country in Latin America\nand the Caribbean",
title = "Happiness Score vs GDP per Capita, 2015"
) +
theme_minimal()
48 / 56
Therefore, if you want to plot it again, you can just type fig in the console.
Saving a plot
Exercise 5: Save a ggplot under the name fig_*YOUR INITIALS* . Use the ggsave() function. You can either include
the function after your plot or, save the ggplot first as an object and then save the plot.
The syntax is ggsave(OBJECT, filename = FILEPATH, heigth = ..., width = ..., dpi = ...) .
Solution:
ggsave(
fig,
filename = here("DataWork","Output","Raw","fig_MA.png"),
dpi = 750,
scale = 0.8,
height = 8,
width = 12
)
01:00
49 / 56
And that's it for this session. Join us tomorrow for data
analysis. Remember to submit your feedback!
50 / 56
References and recommendations
51 / 56
References and recommendations
ggplot tricks:
Tricks and Secrets for Beautiful Plots in R by Cédric Scherer: https://github.com/z3tt/outlierconf2021
Websites:
Interactive stuff : http://www.htmlwidgets.org/
The R Graph Gallery: https://www.r-graph-gallery.com/
Gpplot official site: http://ggplot2.tidyverse.org/
Online courses:
Johns Hopkins Exploratory Data Analysis at Coursera: https://www.coursera.org/learn/exploratory-data-analysis
Books:
The grammar of graphics by Leland Wilkinson.
Beautiful Evidence by Edward Tufte.
R Graphics cook book by Winston Chang
R for Data Science by Hadley Wickham andGarrett Grolemund
52 / 56
Appendix: interactive graphs
53 / 56
Interactive graphs
There are several packages to create interactive or dynamic data vizualizations with R. Here are a few:
leaflet - R integration tp one of the most popular open-source libraries for interactive maps.
highcharter - cool interactive graphs.
plotly - interactive graphs with integration to ggplot.
gganimate - ggplot GIFs.
DT - Interactive table
These are generally, html widgets that can be incorporated in to an html document and websites.
54 / 56
Interactive graphs
Now we’ll use the ggplotly() function from the plotly package to create an interactive graph!
Extra exercise: Interactive graphs.
Load the plotly package
Pass that object with the last plot you created to the ggplotly() function
55 / 56
Interactive graphs
R Code Plot
# Load package
library(plotly)
# Use ggplotly to create an interactive plot
ggplotly(fig) %>%
layout(legend = list(orientation = "h", x = 0.4, y = -0.2))
56 / 56
Interactive graphs
R Code Plot
Happiness Score vs GDP per Capita, 2015
1.5
GDP per Capita
1.0
0.5
0.0
3 4 5 6 7
Happiness Score
Country in Latin America FALSE TRUE
and the Caribbean
56 / 56