Introduction to relational plots and subplots
INTRODUCTION TO DATA VISUALIZATION WITH SEABORN
Erin Case
Data Scientist
Questions about quantitative variables
Relational plots
Height vs. weight
Number of school absences vs. final grade
GDP vs. percent literate
Introducing relplot()
Create "relational plots": scatter plots or line plots
Why use relplot() instead of scatterplot()?
relplot() lets you create subplots in a single figure
scatterplot() vs. relplot()
Using scatterplot()
import seaborn as sns
import matplotlib.pyplot as plt
sns.scatterplot(x="total_bill",
                y="tip",
                data=tips)
plt.show()
Using relplot()
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
            y="tip",
            data=tips,
            kind="scatter")
plt.show()
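The practical difference between the two calls shows up in what they return. The sketch below is illustrative only: `tips_demo` is a hypothetical, made-up stand-in for the tips data, and the `Agg` backend is chosen so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data (values are made up).
tips_demo = pd.DataFrame({
    "total_bill": [10.0, 20.5, 15.2, 30.1],
    "tip": [1.5, 3.0, 2.2, 5.0],
})

# scatterplot() draws onto a single matplotlib Axes and returns it.
ax = sns.scatterplot(x="total_bill", y="tip", data=tips_demo)

# relplot() builds its own figure and returns a FacetGrid,
# which is the object that manages subplots.
g = sns.relplot(x="total_bill", y="tip", data=tips_demo, kind="scatter")

print(isinstance(ax, matplotlib.axes.Axes), isinstance(g, sns.FacetGrid))
plt.close("all")
```

Because relplot() hands back a FacetGrid, arguments such as col and row have a grid to lay subplots out on.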
Subplots in columns
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
            y="tip",
            data=tips,
            kind="scatter",
            col="smoker")
plt.show()
Subplots in rows
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
            y="tip",
            data=tips,
            kind="scatter",
            row="smoker")
plt.show()
Subplots in rows and columns
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
            y="tip",
            data=tips,
            kind="scatter",
            col="smoker",
            row="time")
plt.show()
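To see how col and row combine, it can help to inspect the grid that relplot() returns. A minimal sketch, assuming a hypothetical `tips_demo` frame with made-up values that mimics the `smoker` and `time` columns:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data (values are made up).
tips_demo = pd.DataFrame({
    "total_bill": [10.0, 20.5, 15.2, 30.1, 12.3, 25.0, 18.7, 22.4],
    "tip": [1.5, 3.0, 2.2, 5.0, 2.0, 4.1, 3.3, 3.8],
    "smoker": ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner",
             "Lunch", "Lunch", "Dinner", "Dinner"],
})

g = sns.relplot(x="total_bill", y="tip", data=tips_demo,
                kind="scatter", col="smoker", row="time")

# g.axes is a 2-D array of subplots: one row per "time" value,
# one column per "smoker" value.
n_rows, n_cols = g.axes.shape
print(n_rows, n_cols)
plt.close("all")
```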
Subgroups for days of the week
Wrapping columns
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
            y="tip",
            data=tips,
            kind="scatter",
            col="day",
            col_wrap=2)
plt.show()
Ordering columns
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
            y="tip",
            data=tips,
            kind="scatter",
            col="day",
            col_wrap=2,
            col_order=["Thur",
                       "Fri",
                       "Sat",
                       "Sun"])
plt.show()
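One way to confirm that col_order took effect is to read the subplot titles back from the grid. A small sketch, assuming a hypothetical `tips_demo` frame (made-up values) whose `day` column is deliberately listed out of weekday order:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data; days appear out of order on purpose.
tips_demo = pd.DataFrame({
    "total_bill": [10.0, 20.5, 15.2, 30.1, 12.3, 25.0, 18.7, 22.4],
    "tip": [1.5, 3.0, 2.2, 5.0, 2.0, 4.1, 3.3, 3.8],
    "day": ["Sun", "Thur", "Sat", "Fri", "Sun", "Thur", "Sat", "Fri"],
})

g = sns.relplot(x="total_bill", y="tip", data=tips_demo,
                kind="scatter", col="day", col_wrap=2,
                col_order=["Thur", "Fri", "Sat", "Sun"])

# Subplots are filled left to right, wrapping after every 2 columns,
# in the order given by col_order.
titles = [ax.get_title() for ax in g.axes.flat]
print(titles)
plt.close("all")
```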
Let's practice!
Customizing scatter plots
Scatter plot overview
Show relationship between two quantitative variables
We've seen:
Subplots (col and row)
Subgroups with color (hue)
New Customizations:
Subgroups with point size and style
Changing point transparency
Use with both scatterplot() and relplot()
Subgroups with point size
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
            y="tip",
            data=tips,
            kind="scatter",
            size="size")
plt.show()
Point size and hue
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
            y="tip",
            data=tips,
            kind="scatter",
            size="size",
            hue="size")
plt.show()
Subgroups with point style
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
            y="tip",
            data=tips,
            kind="scatter",
            hue="smoker",
            style="smoker")
plt.show()
Changing point transparency
import seaborn as sns
import matplotlib.pyplot as plt
# Set alpha to be between 0 and 1
sns.relplot(x="total_bill",
            y="tip",
            data=tips,
            kind="scatter",
            alpha=0.4)
plt.show()
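The same customizations can be passed to scatterplot() directly. A minimal sketch combining size, style, and alpha in one call, assuming a hypothetical `tips_demo` frame with made-up values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data (values are made up).
tips_demo = pd.DataFrame({
    "total_bill": [10.0, 20.5, 15.2, 30.1, 12.3, 25.0],
    "tip": [1.5, 3.0, 2.2, 5.0, 2.0, 4.1],
    "size": [2, 3, 2, 4, 2, 3],
    "smoker": ["Yes", "No", "Yes", "No", "No", "Yes"],
})

# Point size follows the party size, marker style follows smoker status,
# and alpha makes every point 40% opaque.
ax = sns.scatterplot(x="total_bill", y="tip", data=tips_demo,
                     size="size", style="smoker", alpha=0.4)

# The plotted points live in the first collection on the Axes.
alpha_used = ax.collections[0].get_alpha()
print(alpha_used)
plt.close("all")
```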
Let's practice!
Introduction to line plots
What are line plots?
Two types of relational plots: scatter plots and line plots
Scatter plots
Each plot point is an independent observation
Line plots
Each plot point represents the same "thing", typically tracked over time
Air pollution data
Collection stations throughout city
Air samples of nitrogen dioxide levels
Scatter plot
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2_mean",
            data=air_df_mean,
            kind="scatter")
plt.show()
Line plot
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2_mean",
            data=air_df_mean,
            kind="line")
plt.show()
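The slides do not show how the averaged tables are built. One plausible way, sketched here as an assumption, is a pandas groupby over raw readings. The frame `air_df_demo` and its values are hypothetical; only the column names `hour`, `location`, `NO_2`, and `NO_2_mean` come from the slides.

```python
import pandas as pd

# Hypothetical raw readings: one NO_2 value per station per hour (made up).
air_df_demo = pd.DataFrame({
    "hour": [0, 0, 1, 1, 2, 2],
    "location": ["East", "North", "East", "North", "East", "North"],
    "NO_2": [20.0, 30.0, 24.0, 28.0, 18.0, 22.0],
})

# One row per hour: the mean across all stations.
air_df_mean_demo = (
    air_df_demo.groupby("hour", as_index=False)["NO_2"].mean()
    .rename(columns={"NO_2": "NO_2_mean"})
)

# One row per hour and location: the mean within each station.
air_df_loc_mean_demo = (
    air_df_demo.groupby(["hour", "location"], as_index=False)["NO_2"].mean()
    .rename(columns={"NO_2": "NO_2_mean"})
)

print(air_df_mean_demo)
```

With exactly one value per x position (or per x position and subgroup), a line plot draws a single line through the points and has no spread to shade.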
Subgroups by location
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2_mean",
            data=air_df_loc_mean,
            kind="line",
            style="location",
            hue="location")
plt.show()
Adding markers
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2_mean",
            data=air_df_loc_mean,
            kind="line",
            style="location",
            hue="location",
            markers=True)
plt.show()
Turning off line style
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2_mean",
            data=air_df_loc_mean,
            kind="line",
            style="location",
            hue="location",
            markers=True,
            dashes=False)
plt.show()
Multiple observations per x-value
Scatter plot
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2",
            data=air_df,
            kind="scatter")
plt.show()
Line plot
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2",
            data=air_df,
            kind="line")
plt.show()
Shaded region is the confidence interval
Assumes dataset is a random sample
95% confident that the mean is within this interval
Indicates uncertainty in our estimate
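To make the idea of a 95% interval for the mean concrete, here is a sketch of a percentile bootstrap at a single x-value. It illustrates the general technique and is not seaborn's exact implementation; the readings, the seed, and the number of resamples are made up.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical NO_2 readings from several stations at one hour (made up).
no2_at_one_hour = np.array([20.0, 30.0, 24.0, 28.0, 18.0, 22.0, 26.0, 25.0])

# Resample the readings with replacement many times,
# recording the mean of each resample.
n_boot = 1000
boot_means = np.array([
    rng.choice(no2_at_one_hour, size=no2_at_one_hour.size, replace=True).mean()
    for _ in range(n_boot)
])

# The middle 95% of the resampled means gives the interval bounds.
lower, upper = np.percentile(boot_means, [2.5, 97.5])
sample_mean = no2_at_one_hour.mean()

print(f"mean={sample_mean:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

A wide interval at a given hour signals that the readings there disagree more, so the estimated mean is less certain.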
Replacing confidence interval with standard deviation
import matplotlib.pyplot as plt
import seaborn as sns
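# ci="sd" shades one standard deviation around the mean.
# In seaborn 0.12 and later, ci is deprecated; use errorbar="sd" instead.
sns.relplot(x="hour", y="NO_2",
            data=air_df,
            kind="line",
            ci="sd")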
plt.show()
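The standard-deviation band can be reproduced by hand to see what it represents. This sketch uses pandas' default sample standard deviation (ddof=1); that it matches seaborn's band exactly is an assumption. `air_df_demo` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical raw readings: three stations per hour (made up).
air_df_demo = pd.DataFrame({
    "hour": [0, 0, 0, 1, 1, 1],
    "NO_2": [20.0, 24.0, 28.0, 29.0, 32.0, 35.0],
})

# Mean and standard deviation of the readings at each hour.
summary = air_df_demo.groupby("hour")["NO_2"].agg(["mean", "std"])

# The band spans one standard deviation on each side of the mean.
summary["lower"] = summary["mean"] - summary["std"]
summary["upper"] = summary["mean"] + summary["std"]

print(summary)
```

Unlike the confidence interval, this band describes the spread of the observations themselves rather than the uncertainty of the mean.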
Turning off confidence interval
import matplotlib.pyplot as plt
import seaborn as sns
# ci=None hides the shaded region.
# In seaborn 0.12 and later, use errorbar=None instead.
sns.relplot(x="hour", y="NO_2",
            data=air_df,
            kind="line",
            ci=None)
plt.show()
Let's practice!