Summarizing Graphics
[Figure: histograms of Age (in years); vertical axes: Percent and Frequency (Count); n=92 students]
Analogy
measurement data.
Histogram
• Divide measurement up into equal-sized
categories.
• Determine number (or percentage) of
measurements falling into each category.
• Draw a bar for each category so bars’
heights represent number (or percent)
falling into the categories.
• Label and title appropriately.
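The histogram steps above can be sketched in a few lines of Python. This is only an illustration: the function name `histogram_counts`, the category width, and the ages are made up for the example and are not the class data.

```python
def histogram_counts(values, start, width, n_bins):
    """Count the values falling into each equal-sized category [lo, lo + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # ignore values outside the plotted range
            counts[index] += 1
    return counts

# Hypothetical ages, NOT the n=92 class data.
ages = [18, 19, 19, 20, 20, 20, 21, 22, 24, 27]
counts = histogram_counts(ages, start=18, width=2, n_bins=5)
percents = [100 * c / len(ages) for c in counts]

# Draw one text "bar" per category; bar length is the count.
for i, count in enumerate(counts):
    lo = 18 + 2 * i
    print(f"Age {lo}-{lo + 1}: {'#' * count} ({count}, {percents[i]:.0f}%)")
```

Changing `width` shows the effect of using too few or too many categories on the same data.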
Histogram
[Figure: histogram of Age (in years), horizontal axis marked at 18, 23, 28; vertical axis: Frequency (Count); n=92 students]
Too many categories
[Figure: histogram, "GPAs of Spring 1998 Stat 250 Students"; horizontal axis: GPA; vertical axis: Frequency (Count); n=92 students]
Dot Plot
[Figure: dot plots, "Fastest Ever Driving Speed, 226 Stat 100 Students, Fall '98"; 100 men and 126 women; horizontal axis: Speed, 70 to 160]
Dot Plot
• Summarizes measurement data.
• Horizontal axis represents measurement
scale.
• Plot one dot for each data point.
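As a rough text-only sketch of the idea, the snippet below stacks one dot per data point above a horizontal scale. The helper `dot_plot_rows` and the values are hypothetical, and the sketch assumes whole-number measurements.

```python
from collections import Counter

def dot_plot_rows(values):
    """Return text rows (top row first) with one '.' stacked per data point."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    height = max(counts.values())
    rows = []
    for level in range(height, 0, -1):
        row = "".join("." if counts[v] >= level else " " for v in range(lo, hi + 1))
        rows.append(row.rstrip())
    return rows

# Hypothetical whole-number measurements, not data from the slides.
hours = [5, 6, 6, 7, 7, 7, 8, 8, 10]
rows = dot_plot_rows(hours)
print("\n".join(rows))
print("".join(str(v % 10) for v in range(min(hours), max(hours) + 1)))  # crude axis (last digit)
```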
Stem-and-Leaf Plot
Stem-and-leaf of Shoes N = 139 Leaf Unit = 1.0
12 0 223334444444
63 0 555555555555566666666677777778888888888888999999999
(33) 1 000000000000011112222233333333444
43 1 555555556667777888
25 2 0000000000023
12 2 5557
8 3 0023
4 3
4 4 00
2 4
2 5 0
1 5
1 6
1 6
1 7
1 7 5
Stem-and-Leaf Plot
• Summarizes measurement data.
• Each data point is broken down into a
“stem” and a “leaf.”
• First, “stems” are aligned in a column.
• Then, “leaves” are attached to the stems.
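A minimal sketch of the stem-and-leaf construction follows, assuming non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf. The function name and data are hypothetical. Unlike the display shown earlier, which splits each stem across two lines and adds a cumulative count column, this sketch uses one line per stem.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens and above) to a string of its sorted leaves (ones digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lo, hi = min(stems), max(stems)
    # Keep empty stems so gaps in the data stay visible.
    return {s: "".join(str(d) for d in stems.get(s, [])) for s in range(lo, hi + 1)}

# Hypothetical counts, not the N = 139 data set.
values = [2, 5, 7, 10, 12, 12, 15, 20, 33]
plot = stem_and_leaf(values)
for stem, leaves in plot.items():
    print(f"{stem} | {leaves}")
```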
Box Plot
[Figure: box plot, "Amount of sleep in past 24 hours of Spring 1998 Stat 250 Students"; vertical axis: Hours of sleep, 0 to 10]
Box Plot
• Summarizes measurement data.
• Vertical (or horizontal) axis represents
measurement scale.
• Lines in box represent the 25th percentile
(“first quartile”), the 50th percentile
(“median”), and the 75th percentile (“third
quartile”), respectively.
An aside...
• Roughly speaking:
– The “25th percentile” is the number such that
25% of the data points fall below the number.
– The “median” or “50th percentile” is the
number such that half of the data points fall
below the number.
– The “75th percentile” is the number such that
75% of the data points fall below the number.
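To see why these definitions are only "roughly speaking," the sketch below computes the three quartiles with Python's standard `statistics.quantiles` and then checks what fraction of points actually falls below each one. The data are made up, and the `"inclusive"` method is just one of several conventions; other software may give slightly different values.

```python
import statistics

# Hypothetical hours of sleep, not the class data.
data = [4, 5, 6, 6, 7, 7, 8, 8, 9]

# n=4 asks for the three cut points that split the data into quarters.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

def fraction_below(values, x):
    """Fraction of data points strictly below x."""
    return sum(v < x for v in values) / len(values)

for name, q in [("25th", q1), ("50th", median), ("75th", q3)]:
    print(f"{name} percentile = {q}, fraction below = {fraction_below(data, q):.2f}")
```

With small data sets and tied values, the fraction below a percentile is only approximately 25%, 50%, or 75%.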
Box Plot (cont’d)
• “Whiskers” are drawn to the most extreme
data points that are not more than 1.5 times
the length of the box beyond either quartile.
– Whiskers are useful for identifying outliers.
• “Outliers,” or extreme observations, are
denoted by asterisks.
– Generally, data points falling beyond the
whiskers are considered outliers.
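The whisker rule can be sketched as follows. The function name `box_plot_summary` and the data are hypothetical, and the quartiles come from `statistics.quantiles` with the `"inclusive"` method, which is an assumption; plotting software may compute quartiles slightly differently.

```python
import statistics

def box_plot_summary(values):
    """Return quartiles, whisker ends, and outliers using the 1.5 x box-length rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1
    low_limit = q1 - 1.5 * box_length
    high_limit = q3 + 1.5 * box_length
    inside = [v for v in data if low_limit <= v <= high_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points still within the limits.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < low_limit or v > high_limit],
    }

# Hypothetical hours of sleep, not the class data.
summary = box_plot_summary([1, 6, 6, 7, 7, 7, 8, 8, 9])
print(summary)
```

Here the box runs from 6 to 8, so the limits are 3 and 11; the value 1 falls beyond the lower whisker and would be drawn as an asterisk.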
Using Box Plots to Compare
[Figure: side-by-side box plots, "Fastest Ever Driving Speed, 226 Stat 100 Students, Fall 1998"; vertical axis: Fastest Speed (mph), 60 to 160; horizontal axis: Gender (female, male)]
Which graph to use when?
• Stem-and-leaf plots and dotplots are good
for small data sets, while histograms and
box plots are good for large data sets.
• Boxplots and dotplots are good for
comparing two groups.
• Boxplots are good for identifying outliers.
• Histograms and boxplots are good for
identifying “shape” of data.
Scatter Plots
[Figure: scatter plot, "Foot sizes of Spring 1998 Stat 250 students"; horizontal axis: Left foot (in cm), 22 to 31; vertical axis: Right foot (in cm), 22 to 31; n=88 students]
Scatter Plots
• Summarizes the relationship between two
measurement variables.
• Horizontal axis represents one variable and
vertical axis represents second variable.
• Plot one point for each pair of
measurements.
No relationship
[Figure: scatter plot, "Lengths of left forearms and head circumferences of Spring 1998 Stat 250 Students"; horizontal axis: Head circumference (in cm), 52 to 62; vertical axis: Left forearm (in cm), 22 to 32; n=89 students]
Closing comments
• Many possible types of graphs.
• Use common sense in reading graphs.
• When creating graphs, don’t summarize
your data too much or too little.
• When creating graphs, label everything for
others. Remember you are trying to
communicate something to others!