Scatter plots
I N T R O D U C T I O N T O D ATA V I S U A L I Z AT I O N W I T H G G P L O T 2
Rick Scave a
Founder, Scave a Academy
48 geometries
geom_*
abline contour dotplot ji er pointrange ribbon spoke
area count errorbar label polygon rug step
bar crossbar errorbarh line qq segment text
bin2d curve freqpoly linerange qq_line sf tile
blank density hex map quantile sf_label violin
boxplot density2d histogram path raster sf_text vline
col density_2d hline point rect smooth
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Common plot types
Plot type Possible Geoms
Sca er plots points, ji er, abline, smooth, count
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Scatter plots
Each geom can accept speci c aesthetic ggplot(iris, aes(x = Sepal.Length,
mappings, e.g. geom_point(): y = Sepal.Width)) +
geom_point()
Essential
x,y
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Scatter plots
Each geom can accept speci c aesthetic ggplot(iris, aes(x = Sepal.Length,
mappings, e.g. geom_point(): y = Sepal.Width,
col = Species)) +
Essential Optional
geom_point()
alpha, color, ll, shape, size,
x,y
stroke
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Geom-specific aesthetic mappings
# These result in the same plot!
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
geom_point()
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point(aes(col = Species))
Control aesthetic mappings of each layer independently:
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
head(iris, 3) # Raw data
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 5.1 3.5 1.4 0.2
2 setosa 4.9 3.0 1.4 0.2
3 setosa 4.7 3.2 1.3 0.2
iris %>%
group_by(Species) %>%
summarise_all(mean) -> iris.summary
iris.summary # Summary statistics
# A tibble: 3 x 5
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 5.01 3.43 1.46 0.246
2 versicolor 5.94 2.77 4.26 1.33
3 virginica 6.59 2.97 5.55 2.03
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
# Inherits both data and aes from ggplot()
geom_point() +
# Different data, but inherited aes
geom_point(data = iris.summary, shape = 15, size = 5)
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Shape attribute values
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Example
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
geom_point() +
geom_point(data = iris.summary, shape = 21, size = 5,
fill = "black", stroke = 2)
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
On-the-fly stats by ggplot2
See the second course for the stats layer.
Note: Avoid plo ing only the mean without a measure of spread, e.g. the standard
deviation.
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
position = "jitter"
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
geom_point(position = "jitter")
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
geom_jitter()
A short-cut to geom_point(position = "jitter")
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
geom_jitter()
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Don't forget to adjust alpha
Combine ji ering with alpha-blending if necessary
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
geom_jitter(alpha = 0.6)
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Hollow circles also help
shape = 1 is a. hollow circle.
Not necessary to also use alpha-blending.
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
geom_jitter(shape = 1)
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Let's practice!
I N T R O D U C T I O N T O D ATA V I S U A L I Z AT I O N W I T H G G P L O T 2
Histograms
I N T R O D U C T I O N T O D ATA V I S U A L I Z AT I O N W I T H G G P L O T 2
Rick Scave a
Founder, Scave a Academy
Common plot types
Plot type Possible Geoms
Sca er plots points, ji er, abline, smooth, count
Bar plots histogram, bar, col, errorbar
Line plots line, path
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Histograms
ggplot(iris, aes(x = Sepal.Width)) +
geom_histogram()
A plot of binned values
i.e. a statistical function
`stat_bin()` using `bins = 30`.
Pick better value with `binwidth`.
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Default of 30 even bins
ggplot(iris, aes(x = Sepal.Width)) +
geom_histogram()
A plot of binned values
i.e. a statistical function
# Default bin width:
diff(range(iris$Sepal.Width))/30
[1] 0.08
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Intuitive and meaningful bin widths
ggplot(iris, aes(x = Sepal.Width)) +
geom_histogram(binwidth = 0.1)
Always set a meaningful bin widths for your
data.
No spaces between bars.
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Re-position tick marks
ggplot(iris, aes(x = Sepal.Width)) +
geom_histogram(binwidth = 0.1,
center = 0.05)
Always set a meaningful bin widths for your
data.
No spaces between bars.
X axis labels are between bars.
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Different Species
ggplot(iris, aes(x = Sepal.Width,
fill = Species)) +
geom_histogram(binwidth = .1,
center = 0.05)
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Default position is "stack"
ggplot(iris, aes(x = Sepal.Width,
fill = Species)) +
geom_histogram(binwidth = .1,
center = 0.05,
position = "stack")
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
position = "dodge"
ggplot(iris, aes(x = Sepal.Width,
fill = Species)) +
geom_histogram(binwidth = .1,
center = 0.05,
position = "dodge")
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
position = "fill"
ggplot(iris, aes(x = Sepal.Width,
fill = Species)) +
geom_histogram(binwidth = .1,
center = 0.05,
position = "fill")
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Final Slide
I N T R O D U C T I O N T O D ATA V I S U A L I Z AT I O N W I T H G G P L O T 2
Bar plots
I N T R O D U C T I O N T O D ATA V I S U A L I Z AT I O N W I T H G G P L O T 2
Rick Scave a
Founder, Scave a Academy
Bar Plots, with a categorical X-axis
Use geom_bar() or geom_col()
Geom Stat Action
geom_bar() "count" Counts the number of cases at each x position
geom_col() "identity" Plot actual values
All positions from before are available
Two types
Absolute counts
Distributions
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Bar Plots, with a categorical X-axis
Use geom_bar() or geom_col()
Geom Stat Action
geom_bar() "count" Counts the number of cases at each x position
geom_col() "identity" Plot actual values
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Bar Plots, with a categorical X-axis
Use geom_bar() or geom_col()
Geom Stat Action
geom_bar() "count" Counts the number of cases at each x position
geom_col() "identity" Plot actual values
All positions from before are available
Two types
Absolute counts
Distributions
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Habits of mammals
str(sleep)
'data.frame': 76 obs. of 3 variables:
$ vore : Factor w/ 4 levels "carni","herbi",..: 1 4 2 4 2 2 1 1 2 2 ...
$ total: num 12.1 17 14.4 14.9 4 14.4 8.7 10.1 3 5.3 ...
$ rem : num NA 1.8 2.4 2.3 0.7 2.2 1.4 2.9 NA 0.6 ...
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Bar plot
ggplot(sleep, aes(vore)) +
geom_bar()
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Plotting distributions instead of absolute counts
# Calculate Descriptive Statistics: iris_summ_long
iris %>%
select(Species, Sepal.Width) %>% Species avg stdev
gather(key, value, -Species) %>% setosa 3.43 0.38
group_by(Species) %>%
versicolor 2.77 0.31
summarise(avg = mean(value),
virginica 2.97 0.32
stdev = sd(value))
-> iris_summ_long
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Plotting distributions
ggplot(iris_summ_long, aes(x = Species,
y = avg)) +
geom_col() +
geom_errorbar(aes(ymin = avg - stdev,
ymax = avg + stdev),
width = 0.1)
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Let's practice!
I N T R O D U C T I O N T O D ATA V I S U A L I Z AT I O N W I T H G G P L O T 2
Line plots
I N T R O D U C T I O N T O D ATA V I S U A L I Z AT I O N W I T H G G P L O T 2
Rick Scave a
Founder, Scave a Academy
Common plot types
Plot type Possible Geoms
Sca er plots points, ji er, abline, smooth, count
Bar plots histogram, bar, col, errorbar
Line plots line, path
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Beaver
str(beaver)
'data.frame': 101 obs. of 3 variables:
$ time : POSIXct, format: "2000-01-01 09:30:00" "2000-01-01 09:40:00" "2000-01-01 09:50:00" ...
$ temp : num 36.6 36.7 36.9 37.1 37.2 ...
$ active: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Beaver
ggplot(beaver, aes(x = time, y = temp)) +
geom_line()
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Beaver
ggplot(beaver, aes(x = time, y = temp,
color = factor(active))
) +
geom_line()
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
The fish catch dataset
str(fish)
'data.frame': 427 obs. of 3 variables:
$ Species: Factor w/ 7 levels "Pink","Chum",..: 1 2 3 4 5 6 7 1 2 3 ...
$ Year : int 1950 1950 1950 1950 1950 1950 1950 1951 1951 1951 ...
$ Capture: int 100600 139300 64100 30500 0 23200 10800 259000 155900 51200 ...
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Linetype aesthetic
ggplot(fish, aes(x = Year,
y = Capture,
linetype = Species)) +
geom_line()
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Size aesthetic
ggplot(fish, aes(x = Year,
y = Capture,
size = Species)) +
geom_line()
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Color aesthetic
ggplot(fish, aes(x = Year,
y = Capture,
color = Species)) +
geom_line()
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Aesthetics for categorical variables
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Fill aesthetic with geom_area()
ggplot(fish, aes(x = Year,
y = Capture,
fill = Species)) +
geom_area()
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Using position = "fill"
ggplot(fish, aes(x = Year,
y = Capture,
fill = Species)) +
geom_area(position = "fill")
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
geom_ribbon()
ggplot(fish, aes(x = Year,
y = Capture,
fill = Species)) +
geom_ribbon(aes(ymax = Capture,
ymin = 0),
alpha = 0.3)
INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2
Let's practice!
I N T R O D U C T I O N T O D ATA V I S U A L I Z AT I O N W I T H G G P L O T 2