ch2-22092024-104300am
OUTLINE
Introduction
2–1 Organizing Data
2–2 Histograms, Frequency Polygons, and Ogives
2–3 Other Types of Graphs
Summary
OBJECTIVES
After completing this chapter, you should be able to
1 Organize data using a frequency distribution.
2 Represent data in frequency distributions graphically, using
histograms, frequency polygons, and ogives.
3 Represent data using bar graphs, Pareto charts, time series
graphs, pie graphs, and dotplots.
4 Draw and interpret a stem and leaf plot.
STATISTICS TODAY
Introduction
When conducting a statistical study, the researcher must gather data for the
particular variable under study. For example, if a researcher wishes to study
the number of people who were bitten by poisonous snakes in a specific
geographic area over the past several years, the researcher has to gather the
data from various doctors, hospitals, or health departments.
To describe situations, draw conclusions, or make inferences about events,
the researcher must organize the data in some meaningful way. The most
convenient method of organizing data is to construct a frequency distribution.
After organizing the data, the researcher must present them so they can be
understood by those who will benefit from reading the study. The most useful
method of presenting the data is by constructing statistical charts and graphs.
There are many different types of charts and graphs, and each one has a
specific purpose.
This chapter explains how to organize data by constructing frequency
distributions and how to present the data by constructing charts and graphs.
The charts and graphs illustrated here are histograms, frequency polygons,
ogives, bar graphs, Pareto charts, time series graphs, pie graphs, and
dotplots. A graph that combines the characteristics of a frequency
distribution and a histogram, called a stem and leaf plot, is also explained.
Since little information can be obtained from looking at raw data, the
researcher organizes the data into what is called a frequency distribution.
Unusual Stats
Of Americans 50 years old and over, 23% think their greatest achievements are still ahead
of them.
Now some general observations can be made from looking at the
frequency distribution. For example, it can be stated that the majority of the
wealthy people in the study are 45 years old or older.
The classes in this distribution are 27–35, 36–44, etc. These values are
called class limits. The data values 27, 28, 29, 30, 31, 32, 33, 34, 35 can be
tallied in the first class; 36, 37, 38, 39, 40, 41, 42, 43, 44 in the second class;
and so on.
Two types of frequency distributions that are most often used are the
categorical frequency distribution and the grouped frequency distribution.
The procedures for constructing these distributions are shown now.
SOLUTION
Since the data are categorical, discrete classes are used. They are W, M, J,
C, and T.
Step 1 Make a table as shown.
Step 2 Tally the data and place the results in the second column
labeled Tally.
Step 3 Count the tallies and place the results in the third column labeled
Frequency.
Step 4 Find the percentage of values in each class by using the formula

$$\% = \frac{f}{n} \cdot 100$$

where f = frequency of the class and n = total number of values.
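Steps 2 through 4 can be checked with a short Python sketch. This is an illustration only: the function name is ours, and the responses are hypothetical values coded with the same letters W, M, J, C, and T, not the data from Example 2–1.

```python
from collections import Counter

def categorical_distribution(data):
    """Return (class, frequency, percent) rows in the order classes first appear."""
    counts = Counter(data)              # Steps 2-3: tally and count each class
    n = len(data)                       # total number of values
    return [(category, f, f / n * 100)  # Step 4: % = (f / n) * 100
            for category, f in counts.items()]

# Hypothetical responses coded W, M, J, C, T (illustration only).
responses = list("WMWJCTWWMCJTWCM")
for category, f, percent in categorical_distribution(responses):
    print(f"{category}  {f:2d}  {percent:5.1f}%")
```

The percentages add to 100%, apart from any rounding in the printed values.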
Unusual Stats
Six percent of Americans say they find life dull.
Unusual Stats
One out of every hundred people in the United States is color-blind.
If the data are in tenths, such as 6.2, 7.8, and 12.6, the limits for a class
hypothetically might be 7.8–8.8, and the boundaries for that class would be
7.75–8.85. Find these values by subtracting 0.05 from 7.8 and adding 0.05 to
8.8.
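The rule of subtracting and adding half of one unit in the last recorded decimal place can be written as a small function. This is a sketch under stated assumptions: the function name is ours, and both limits are assumed to be recorded to the same precision as the data. Python's decimal module is used because binary floating point can give a value that is slightly off (for example, for 8.8 + 0.05).

```python
from decimal import Decimal

def class_boundaries(lower_limit: str, upper_limit: str):
    """Return (lower boundary, upper boundary) for one class.

    Limits are passed as strings so the number of recorded decimal places is kept.
    Assumes both limits use the same precision as the data.
    """
    lower, upper = Decimal(lower_limit), Decimal(upper_limit)
    # One unit in the last recorded place: "7.8" -> 0.1, "58" -> 1
    unit = Decimal(1).scaleb(lower.as_tuple().exponent)
    half = unit / 2
    return lower - half, upper + half

print(class_boundaries("7.8", "8.8"))  # data in tenths: 0.05 subtracted and added
print(class_boundaries("58", "64"))    # whole numbers: 0.5 subtracted and added
```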
Class boundaries are not always included in frequency distributions;
however, they give a more formal approach to the procedure of organizing
data, including the fact that sometimes the data have been rounded. You
should be familiar with boundaries since you may encounter them in a
statistical study.
Finally, the class width for a class in a frequency distribution is found by
subtracting the lower (or upper) class limit of one class from the lower (or
upper) class limit of the next class. For example, the class width in the
preceding distribution of blood glucose levels is 7, found from
65 − 58 = 7.
The class width can also be found by subtracting the lower boundary from
the upper boundary for any given class. In this case, 64.5 − 57.5 = 7.
Note: Do not subtract the limits of a single class. It will result in an
incorrect answer.
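Both ways of finding the width can be checked in Python. This is a sketch; the function names are illustrative, and the first function also reports classes of unequal width, which the rules that follow say to avoid.

```python
def width_from_limits(lower_limits):
    """Class width = lower limit of one class minus lower limit of the previous class."""
    widths = {b - a for a, b in zip(lower_limits, lower_limits[1:])}
    if len(widths) != 1:
        raise ValueError(f"classes are not equal in width: {sorted(widths)}")
    return widths.pop()

def width_from_boundaries(lower_boundary, upper_boundary):
    """Class width = upper boundary minus lower boundary of the same class."""
    return upper_boundary - lower_boundary

print(width_from_limits([58, 65]))        # 65 - 58 = 7
print(width_from_boundaries(57.5, 64.5))  # 64.5 - 57.5 = 7.0
# Subtracting the limits of a single class (64 - 58 = 6) would be wrong.
```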
The researcher must decide how many classes to use and the width of each
class. To construct a frequency distribution, follow these rules:
1. There should be between 5 and 20 classes. Although there is no hard-and-
fast rule for the number of classes contained in a frequency distribution, it
is of utmost importance to have enough classes to present a clear
description of the collected data.
2. It is preferable but not absolutely necessary that the class width be an
odd number. This ensures that the midpoint of each class has the same
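place value as the data. The class midpoint Xm is obtained by adding the
lower and upper boundaries and dividing by 2, or adding the lower and
upper limits and dividing by 2:

$$X_m = \frac{\text{lower boundary} + \text{upper boundary}}{2} \quad \text{or} \quad X_m = \frac{\text{lower limit} + \text{upper limit}}{2}$$

For example, the midpoint of the first class in the example with glucose
levels is

$$X_m = \frac{57.5 + 64.5}{2} = 61 \quad \text{or} \quad X_m = \frac{58 + 64}{2} = 61$$

The midpoint is the numeric location of the center of the class. Midpoints
are necessary for graphing (see Section 2–2). If the class width is an even
number, the midpoint is in tenths. For example, if the class width is 6 and
the boundaries are 5.5 and 11.5, the midpoint is

$$X_m = \frac{5.5 + 11.5}{2} = \frac{17}{2} = 8.5$$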
Recall that boundaries are mutually exclusive. For example, when the class
boundaries are 5.5 and 10.5, the data values included in that class are
6 through 10. A data value of 5 goes into the previous class, and a
data value of 11 goes into the next-higher class.
4. The classes must be continuous. Even if there are no values in a class, the
class must be included in the frequency distribution. There should be no
gaps in a frequency distribution. The only exception occurs when the
class with a zero frequency is the first or last class. A class with a zero
frequency at either end can be omitted without affecting the distribution.
5. The classes must be exhaustive. There should be enough classes to
accommodate all the data.
6. The classes must be equal in width. This avoids a distorted view of the
data.
One exception occurs when a distribution has a class that is open-
ended. That is, the first class has no specific lower limit, or the last class
has no specific upper limit. A frequency distribution with an open-ended
class is called an open-ended distribution. Here are two examples of
distributions with open-ended classes.
The frequency distribution for age is open-ended for the last class, which
means that anybody who is 54 years or older will be tallied in the last
class. The distribution for minutes is open-ended for the first class,
meaning that any minute values below 110 will be tallied in that class.
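The midpoint formula in rule 2 reduces to a one-line function. This sketch uses the glucose class with limits 58–64 and boundaries 57.5–64.5 from the width calculations above, and the even-width class with boundaries 5.5 and 11.5.

```python
def midpoint(lower, upper):
    """Class midpoint Xm = (lower + upper) / 2, using either the limits or the boundaries."""
    return (lower + upper) / 2

print(midpoint(58, 64))      # limits     -> 61.0
print(midpoint(57.5, 64.5))  # boundaries -> 61.0 (same midpoint)
print(midpoint(5.5, 11.5))   # even width of 6: midpoint in tenths -> 8.5
```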
Procedure Table
Constructing a Grouped Frequency Distribution
Step 1 Determine the classes.
Find the highest and lowest values.
Find the range.
Select the number of classes desired.
Find the width by dividing the range by the number of classes
and rounding up.
Select a starting point (usually the lowest value or any
convenient number less than the lowest value); add the width
to get the lower limits.
Find the upper class limits.
Find the boundaries.
Step 2 Tally the data.
Step 3 Find the numerical frequencies from the tallies, and find the
cumulative frequencies.
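The procedure table can be sketched in Python for whole-number data. This is an illustration under stated assumptions, not software from the text: the function name and data are hypothetical, upper limits are taken as lower limit + width − 1, and boundaries as the limits minus/plus 0.5. A safeguard (our addition) widens the classes by one unit if the rounded-up width would leave the highest value outside the last class, which can happen when the range divides evenly by the number of classes.

```python
from itertools import accumulate

def grouped_distribution(data, num_classes, start=None):
    """Build a grouped frequency distribution for whole-number data.

    Returns rows of (lower limit, upper limit, lower boundary, upper boundary,
    frequency, cumulative frequency).
    """
    low, high = min(data), max(data)             # Step 1: highest and lowest values
    data_range = high - low                      # range
    width = -(-data_range // num_classes)        # range / classes, rounded up
    start = low if start is None else start      # starting point
    if start > low:
        raise ValueError("starting point must not exceed the lowest value")
    # Safeguard (our addition): keep the classes exhaustive (rule 5).
    while start + width * num_classes <= high:
        width += 1
    lower_limits = [start + i * width for i in range(num_classes)]
    upper_limits = [lo + width - 1 for lo in lower_limits]
    # Step 2: tally each value into the class whose limits contain it
    freqs = [sum(lo <= x <= hi for x in data)
             for lo, hi in zip(lower_limits, upper_limits)]
    cumulative = accumulate(freqs)               # Step 3: running totals
    return [(lo, hi, lo - 0.5, hi + 0.5, f, cf)
            for lo, hi, f, cf in zip(lower_limits, upper_limits, freqs, cumulative)]

# Hypothetical whole-number data (not from the text)
data = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31, 33, 35, 38, 41]
for row in grouped_distribution(data, num_classes=5):
    print("{}-{}  {}-{}  f={}  cf={}".format(*row))
```

Here the range is 41 − 12 = 29, and 29 / 5 = 5.8 rounds up to a width of 6, giving classes 12–17, 18–23, 24–29, 30–35, and 36–41.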
Unusual Stats
America’s most popular beverages are soft drinks. It is estimated that, on average, each
person drinks about 52 gallons of soft drinks per year, compared to 22 gallons of beer.
SOLUTION
Cumulative frequencies are used to show how many data values are
accumulated up to and including a specific class. In Example 2–2, 28 of the
record high temperatures are less than or equal to 114°F, and 48 are less
than or equal to 124°F.
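A cumulative frequency is a running total of the class frequencies, which Python's itertools.accumulate computes directly. The boundaries and frequencies below are hypothetical, not the Example 2–2 data.

```python
from itertools import accumulate

# Hypothetical upper class boundaries and class frequencies (not the Example 2-2 data)
upper_boundaries = [104.5, 109.5, 114.5, 119.5]
frequencies = [3, 7, 12, 5]

cumulative = list(accumulate(frequencies))  # each entry adds the next class's frequency
for boundary, cf in zip(upper_boundaries, cumulative):
    print(f"{cf} values are below {boundary}")
```

The last cumulative frequency always equals the total number of data values.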
After the raw data have been organized into a frequency distribution, the
distribution can be analyzed by looking for peaks and extreme values. The peaks
show which class or classes have the most data values compared to the other
classes. Extreme values, called outliers, are data values that are very large
or very small relative to the other data values.
When the range of the data values is relatively small, a frequency
distribution can be constructed using single data values for each class. This
type of distribution is called an ungrouped frequency distribution and is
shown next.
SOLUTION
Step 1 Determine the number of classes. Since the range is small (10 –
3 = 7), classes consisting of a single data value can be used.
They are 3, 4, 5, 6, 7, 8, 9, 10.
Note: If the data are continuous, class boundaries can be used.
Subtract 0.5 from each class value to get the lower class
boundary, and add 0.5 to each class value to get the upper class
boundary.
Step 2 Tally the data.
Step 3 From the tallies, find the numerical frequencies and cumulative
frequencies. The completed ungrouped frequency distribution is
shown.
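The three steps for an ungrouped frequency distribution can be sketched as follows, assuming whole-number data. The function name is ours, and the hours are hypothetical, not the data from the example.

```python
from collections import Counter
from itertools import accumulate

def ungrouped_distribution(data):
    """Rows of (value, frequency, cumulative frequency), one class per value.

    Assumes whole-number data; values between the lowest and highest that never
    occur are still listed, with frequency 0, so the classes have no gaps.
    """
    counts = Counter(data)                             # Step 2: tally
    classes = list(range(min(data), max(data) + 1))   # Step 1: single-value classes
    freqs = [counts[c] for c in classes]              # Step 3: Counter gives 0 if absent
    return list(zip(classes, freqs, accumulate(freqs)))

# Hypothetical hours worked by 14 students (not the book's data)
hours = [3, 4, 4, 5, 4, 6, 10, 4, 7, 5, 8, 4, 9, 3]
for value, f, cf in ungrouped_distribution(hours):
    print(value, f, cf)
```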
In this case, eight students worked 4 hours, which is the
largest frequency. The cumulative frequencies are
Interesting Fact
Male dogs bite children more often than female dogs do; however, female cats bite
children more often than male cats do.
Applying the Concepts 2–1
Ages of Presidents at Inauguration
The data represent the ages of our Presidents at the time they were first
inaugurated.
1. Were the data obtained from a population or a sample? Explain your
answer.
2. What was the age of the oldest President?
3. What was the age of the youngest President?
4. Construct a frequency distribution for the data. (Use your own
judgment as to the number of classes and class size.)
5. Are there any peaks in the distribution?
6. Identify any possible outliers.
7. Write a brief summary of the nature of the data as shown in the
frequency distribution.
Exercises 2–1
1. List five reasons for organizing data into a frequency distribution.
2. Name the three types of frequency distributions, and explain when
each should be used.
3. How many classes should frequency distributions have? Why should
the class width be an odd number?
4. What are open-ended frequency distributions? Why are they
necessary?
For Exercises 5–8, find the class boundaries, midpoints, and widths for
each class.
5. 58–62
6. 125–131
7. 16.35–18.46
8. 16.3–18.5
13. Household Pets The following data show the household pets that
people own: F = fish, D = dog, B = bird, C = cat, and R = reptile.
Construct a categorical frequency distribution for the data and
summarize the results.
Source: Based on information from the National Pet Owners’ Society
14. Trust in Internet Information A survey was taken on how much trust
people place in the information they read on the Internet. Construct a
categorical frequency distribution for the data. A = trust in all that they
read, M = trust in most of what they read, H = trust in about one-half of
what they read, S = trust in a small portion of what they read. (Based
on information from the UCLA Internet Report.)
16. Ages of Dogs The ages of 20 dogs in a pet shelter are shown. Construct
a frequency distribution using 7 classes.
17. Wind Speed The data show the maximum wind speeds (in miles per
hour) at selected cities in the United States. Construct a frequency
distribution for the data using eight classes. Summarize the results.
Source: The World Almanac and Book of Facts
19. Rabies Virus Cases The data show the ages of 22 people who had
contracted the rabies virus for a certain year. Construct a frequency
distribution for the data and summarize the results. Use five classes.
20. School Districts in States The data show the number of school
districts in each state in the United States. Construct a frequency
distribution using six classes for the data and summarize the results.
21. World Temperatures The following data show the high temperatures
for May 8 in 22 selected cities around the world. Construct a frequency
distribution for the data. Use five classes.
Source: USA TODAY
24. Ranges of Tides The data show the mean ranges for tides in
coastal cities in the United States. The ranges represent the
difference between the mean high water mark and the mean low water mark.
Construct a frequency distribution using six classes. Summarize the
results.
25. Average Wind Speeds A sample of 40 large cities was selected, and
the average of the wind speeds was computed for each city over one
year. Construct a frequency distribution, using 7 classes.
Source: World Almanac and Book of Facts.
33. In a survey, 50 college students were asked what color automobile they
drive to school. The responses were:
34. A researcher decides to survey 30 college students and asks each one
what their favorite pizza topping is. The researcher wants to use a pie
chart to summarize the data. The categories are pepperoni, beef, green
peppers, olives, tomatoes, mushrooms, pineapple, onions, spinach, and
jalapenos. Why might this not be a good idea?
35. A researcher decides to present the results obtained in a frequency
distribution by combining the five frequencies of adjacent classes, thus
using fewer classes. Why is this not a good idea?
Technology
EXCEL
Step by Step
Categorical Frequency Table (Qualitative or Discrete
Data)
1. In an open workbook, select cell A1 and type in all the beverage data
from Example 2–1 down column A.
9. After all the data have been counted, select cell D7 in the worksheet.
10. From the toolbar select Formulas, select the dropdown menu All, then
select Sum and type in D2:D6 to insert the total of the frequencies
into cell D7. If you select each cell, you should see the following
formulas.
After entering data or a heading into a worksheet, you can change
the width of a column to fit the input. To automatically change the
width of a column to fit the data:
1. Select the column or columns that you want to change.
2. On the Home tab, in the Cells group, select Format.
3. Under Cell Size, click AutoFit Column Width.
MINITAB
Step by Step
OBJECTIVE 2
Represent data in frequency distributions graphically, using histograms, frequency polygons,
and ogives.
Statistical graphs can be used to describe the data set or to analyze it.
Graphs are also useful in getting the audience’s attention in a publication or a
speaking presentation. They can be used to discuss an issue, reinforce a
critical point, or summarize a data set. They can also be used to discover a
trend or pattern in a situation over a period of time.
The three most commonly used graphs in research are
1. The histogram.
2. The frequency polygon.
3. The cumulative frequency graph, or ogive (pronounced o-jive).
The steps for constructing the histogram, frequency polygon, and the ogive
are summarized in the procedure table.
Procedure Table
Constructing a Histogram, Frequency Polygon, and Ogive
Step 1 Draw and label the x and y axes.
Step 2 On the x axis, label the class boundaries of the frequency
distribution for the histogram and ogive. Label the midpoints for
the frequency polygon.
Step 3 Plot the frequencies for each class, and draw the vertical bars for
the histogram and the lines for the frequency polygon and ogive.
(Note: Remember that the lines for the frequency polygon begin and end
on the x axis while the lines for the ogive begin on the x axis.)
Historical Note
Karl Pearson introduced the histogram in 1891. He used it to show time concepts of
various reigns of Prime Ministers.
SOLUTION
Step 1 Draw and label the x and y axes. The x axis is always the
horizontal axis, and the y axis is always the vertical axis.
Step 2 Represent the frequency on the y axis and the class boundaries
on the x axis.
Step 3 Using the frequencies as the heights, draw vertical bars for each
class. See Figure 2–1.
FIGURE 2–1 Histogram for Example 2–4
Historical Note
Graphs originated when ancient astronomers drew the position of the stars in the heavens.
Roman surveyors also used coordinates to locate landmarks on their maps.
The development of statistical graphs can be traced to William Playfair (1759–1823), an
engineer and drafter who used graphs to present economic data pictorially.
Step 2 Draw the x and y axes. Label the x axis with the midpoint of
each class, and then use a suitable scale on the y axis for the
frequencies.
Step 3 Using the midpoints for the x values and the frequencies as the y
values, plot the points.
Step 4 Connect adjacent points with line segments. Draw a line back to
the x axis at the beginning and end of the graph, at the same
distance that the previous and next midpoints would be located,
as shown in Figure 2–2.
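Steps 3 and 4 amount to a list of points: one per midpoint, plus an extra point on the x axis one class width before the first midpoint and after the last. A minimal sketch, assuming equal class widths; the function name and values are hypothetical.

```python
def frequency_polygon_points(midpoints, frequencies):
    """Vertices (x, y) of a frequency polygon, closed to the x axis at both ends.

    Assumes equal class widths, so the extra end points sit one class width
    before the first midpoint and after the last midpoint.
    """
    width = midpoints[1] - midpoints[0]
    xs = [midpoints[0] - width, *midpoints, midpoints[-1] + width]
    ys = [0, *frequencies, 0]
    return list(zip(xs, ys))

# Hypothetical midpoints and frequencies
print(frequency_polygon_points([102, 107, 112], [4, 9, 6]))
# [(97, 0), (102, 4), (107, 9), (112, 6), (117, 0)]
```

Any plotting tool that connects points with straight line segments can then draw the polygon from these vertices.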
FIGURE 2–2
Frequency Polygon for Example 2–5
The frequency polygon and the histogram are two different ways to
represent the same data set. The choice of which one to use is left to the
discretion of the researcher.
The Ogive
The third type of graph that can be used represents the cumulative
frequencies for the classes. This type of graph is called the cumulative
frequency graph, or ogive. The cumulative frequency is the sum of the
frequencies accumulated up to the upper boundary of a class in the
distribution.
Step 2 Draw the x and y axes. Label the x axis with the class
boundaries. Use an appropriate scale for the y axis to represent
the cumulative frequencies. (Depending on the numbers in the
cumulative frequency columns, scales such as 0, 1, 2, 3, . . . , or
5, 10, 15, 20, . . . , or 1000, 2000, 3000, . . . can be used. Do not
label the y axis with the numbers in the cumulative frequency
column.) In this example, a scale of 0, 5, 10, 15, . . . will be
used.
Step 3 Plot the cumulative frequency at each upper class boundary, as
shown in Figure 2–3. Upper boundaries are used since the
cumulative frequencies represent the number of data values
accumulated up to the upper boundary of each class.
Step 4 Starting with the first upper class boundary, 104.5, connect
adjacent points with line segments, as shown in Figure 2–4.
Then extend the graph to the first lower class boundary, 99.5, on
the x axis.
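The points plotted in Steps 3 and 4 can be generated from the class boundaries and frequencies. This sketch starts at the first lower class boundary with a height of 0 and then pairs each upper boundary with its cumulative frequency; the function name and the numbers are hypothetical, not the Example 2–6 data.

```python
from itertools import accumulate

def ogive_points(boundaries, frequencies):
    """Points (x, y) for an ogive.

    boundaries lists every class boundary, lowest first, so it has one more entry
    than frequencies. The first point is (lowest lower boundary, 0); each later
    point pairs an upper boundary with the cumulative frequency up to it.
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need one more boundary than there are classes")
    return list(zip(boundaries, [0, *accumulate(frequencies)]))

# Hypothetical classes 99.5-104.5, 104.5-109.5, 109.5-114.5
print(ogive_points([99.5, 104.5, 109.5, 114.5], [3, 7, 12]))
# [(99.5, 0), (104.5, 3), (109.5, 10), (114.5, 22)]
```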
FIGURE 2–3
Plotting the Cumulative Frequency for Example 2–6
FIGURE 2–4
Ogive for Example 2–6
Unusual Stats
Twenty-two percent of Americans sleep 6 hours a day or less.
Cumulative frequency graphs are used to visually represent how many
values are below a certain upper class boundary. For example, to find out
how many record high temperatures are less than 114.5°F, locate 114.5°F on
the x axis, draw a vertical line up until it intersects the graph, and then draw a
horizontal line at that point to the y axis. The y axis value is 28, as shown in
Figure 2–5.
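Reading a value from the ogive, as described above, amounts to following the line segments between plotted points. A minimal sketch with hypothetical points: at an upper class boundary the result is the exact cumulative frequency, while between boundaries it is only an estimate, just as it would be when read from the graph.

```python
def read_ogive(points, x):
    """Height of the ogive at x, following the straight segments between points.

    At an upper class boundary this is the exact cumulative frequency; between
    boundaries it is only an estimate, read the same way as from the graph.
    """
    if x <= points[0][0]:
        return 0
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical ogive points (boundary, cumulative frequency)
points = [(99.5, 0), (104.5, 3), (109.5, 10), (114.5, 22)]
print(read_ogive(points, 109.5))  # 10.0: the cumulative frequency at that boundary
print(read_ogive(points, 112.0))  # 16.0: an estimate between boundaries
```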
FIGURE 2–5
Finding a Specific Cumulative Frequency
SOLUTION
Step 1 Convert each frequency to a proportion or relative frequency by
dividing the frequency for each class by the total number of
observations.
For the class 35.5–42.5, the relative frequency is 0.16. For
the class 42.5–49.5, the relative frequency is 0.28. For the
class 49.5–56.5, the relative frequency is 0.40. For the class
56.5–63.5, the relative frequency is 0.10. For the class
63.5–70.5, the relative frequency is 0.06.
Place these values in the column labeled Relative frequency.
Also, find the midpoints, as shown in Example 2–5, for each
class and place them in the Midpoints column.
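Step 1 divides each class frequency by the total number of observations. A minimal sketch; the frequencies below are hypothetical (n = 20), not the Example 2–7 data.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    n = sum(frequencies)
    return [f / n for f in frequencies]

# Hypothetical frequencies (n = 20)
rel = relative_frequencies([4, 6, 8, 2])
print(rel)  # [0.2, 0.3, 0.4, 0.1]
```

The relative frequencies always sum to 1, apart from rounding.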
Step 3 Draw each graph as shown in Figure 2–6. For the
histogram and ogive, use the class boundaries along the
x axis. For the frequency polygon, use the midpoints on the x axis. For
the scale on the y axis, use proportions.
FIGURE 2–6
Graphs for Example 2–7
Distribution Shapes
When one is describing data, it is important to be able to recognize the shape
of a distribution. In later chapters, you will see that the shape of a
distribution also determines the appropriate statistical methods used to
analyze the data.
When the peak of a distribution is to the left and the data values
taper off to the right, a distribution is said to be positively or
right-skewed. See Figure 2–7(e). When the data values are clustered to the right
and taper off to the left, a distribution is said to be negatively or left-skewed.
See Figure 2–7(f). Skewness will be explained in detail in Chapter 3.
Distributions with one peak, such as those shown in Figure 2–7(a), (e), and
(f), are said to be unimodal. (The highest peak of a distribution indicates
where the mode of the data values is. The mode is the data value that occurs
more often than any other data value. Modes are explained in Chapter 3.)
When a distribution has two peaks of the same height, it is said to be
bimodal. See Figure 2–7(g). Finally, the graph shown in Figure 2–7(h) is a U-
shaped distribution.
Distributions can have other shapes in addition to the ones shown here;
however, these are some of the more common ones that you will encounter in
analyzing data.
When you are analyzing histograms and frequency polygons, look at the
shape of the curve. For example, does it have one peak or two peaks? Is it
relatively flat, or is it U-shaped? Are the data values spread out on the graph,
or are they clustered around the center? Are there data values in the extreme
ends? These may be outliers. (See Section 3–3 for an explanation of outliers.)
Are there any gaps in the histogram, or does the frequency polygon touch the
x axis somewhere other than at the ends? Finally, are the data clustered at one
end or the other, indicating a skewed distribution?
For example, the histogram for the record high temperatures in Figure 2–1
shows a single-peaked distribution, with the class 109.5–114.5 containing the
largest number of temperatures. The distribution has no gaps, and there are
fewer temperatures in the highest class than in the lowest class.
Source: https://www.zillow.com/bradenton-fl/
Exercises 2–2
1. Do Students Need Summer Development? For 108 randomly
selected college applicants, the following frequency distribution for
entrance exam scores was obtained. Construct a histogram, frequency
polygon, and ogive for the data. (The data for this exercise will be used
for Exercise 13 in this section.)
3. Pupils Per Teacher The average number of pupils per teacher in each
state is shown. Construct a grouped frequency distribution with 6
classes. Draw a histogram, frequency polygon, and ogive. Analyze the
distribution.
6. NFL Salaries The salaries (in millions of dollars) for 31 NFL teams
for a specific season are given in this frequency distribution.
Construct a histogram, a frequency polygon, and an ogive for the
data; and comment on the shape of the distribution. (The data for this
exercise will be used for Exercise 16 of this section.)
Source: NFL.com
10. Making the Grade The frequency distributions shown indicate the
percentages of public school students in fourth-grade reading and
mathematics who performed at or above the required proficiency levels
for the 50 states in the United States. Draw histograms for each, and
decide if there is any difference in the performance of the students in
the subjects.
17. Home Runs The data show the greatest number of home runs hit by a
batter in the American League over the last 30 seasons. Construct a
frequency distribution using 5 classes. Draw a histogram, a frequency
polygon, and an ogive for the data, using relative frequencies. Describe
the shape of the histogram.
18. Protein Grams in Fast Food The amount of protein (in grams) for a
variety of fast-food sandwiches is reported here. Construct a frequency
distribution, using 6 classes. Draw a histogram, a frequency polygon,
and an ogive for the data, using relative frequencies. Describe the
shape of the histogram.
20. Using the results from Exercise 19, answer these questions.
a. How many values are in the class 27.5–30.5?
b. How many values fall between 24.5 and 36.5?
c. How many values are below 33.5?
d. How many values are above 30.5?
Technology
TI-84 Plus
Step by Step
Constructing a Histogram
To plot the histogram from raw data:
Example TI2–1
Plot a histogram for the following data from Example 2–2.
Constructing a Histogram
1. Press [Ctrl]-N for a new workbook.
2. Enter the data from Example 2–2 in column A, one number per cell.
Label the column Temperature.
3. Enter the upper boundaries into column B. Label the column Bin.
4. From the toolbar, select the Data tab, then select Data Analysis.
5. In Data Analysis, select Histogram and click [OK].
6. In the Histogram dialog box, type A1:A51 in the Input Range box
and type B1:B8 in the Bin Range box.
7. Check the Labels box. Note: Do not check this box unless your labels
are selected in the input range.
8. Select the radio button next to Output range, then type D1 in the
Output Range box.
9. Check Chart Output. Click [OK].
2. Move the Gap Width slider all the way to the left to change the gap
width of the bars in the histogram to 0.
3. To change the label for the horizontal axis:
a. Left-click the mouse over any part of the histogram.
b. Select the Design tab from the toolbar.
c. Select the Add Chart Element tab, Axis Titles and Primary
Horizontal.
Once the Axis Titles text box is selected, you can type in the name of
the variable represented on the horizontal axis.
Select the legend, then delete it. There is no need for a legend if there
is only one color. You can also change the line border color to see
each bar better. Select the paint bucket icon in the Format Data Series
options. Select Border>Solid Line, then select a different color from
the palette.
Note: Classes with frequency 0 have been added at the beginning and the
end to “anchor” the frequency polygon to the horizontal axis.
3. Press and hold the left mouse button, and drag over the Frequencies
(including the label) from column B.
4. Select the Insert tab from the toolbar and the Line Chart option.
6. We will need to edit the graph so that the midpoints are on the
horizontal axis.
a. Right-click the mouse on any region of the chart.
b. Choose Select Data.
c. Select Edit below the Horizontal (Category) Axis Labels panel on
the right.
d. Press and hold the left mouse button, and drag over the midpoints
(not including the label) for the Axis label range, then click [OK].
e. Click [OK] on the Select Data Source box.
8. Change the chart title so that one can easily see what the graph
represents.
a. Select Chart Elements, Layout from the toolbar.
b. Select Chart Title.
c. Choose one of the options from the Chart Title menu and edit.
Constructing an Ogive
1. To create an ogive, use the upper class boundaries (horizontal axis)
and cumulative frequencies (vertical axis) from the frequency
distribution.
2. Type the upper class boundaries (including a class with frequency 0
before the lowest class to anchor the graph to the horizontal axis) and
corresponding cumulative frequencies into adjacent columns of an
Excel worksheet.
3. Press and hold the left mouse button, and drag over the Cumulative
Frequencies from column B. Select Line Chart, then the 2-D Line
option.
As with the frequency polygon, you can insert labels on the axes and a chart
title for the ogive.
MINITAB
Step by Step
Construct a Histogram
1. Enter the data from Example 2–2, the high temperatures for the 50
states, into C1.
2. Select Graph>Histogram.
3. Select [Simple], then click [OK].
4. Click C1 Temperatures in the Graph variables dialog box, and
label the graph.
5. Click [OK]. A new graph window containing the histogram will open.
6. Click the File menu to print or save the graph.
7. Click File>Exit.
8. Save the project as Example 2-2.mpj.
OBJECTIVE 3
Represent data using bar graphs, Pareto charts, time series graphs, pie graphs, and dotplots.
Bar Graphs
When the data are qualitative or categorical, bar graphs can be used to
represent the data. A bar graph can be drawn using either horizontal or
vertical bars.
The graphs show that first-year college students spend the most on
electronic equipment.
Bar graphs can also be used to compare data for two or more groups. These
types of bar graphs are called compound bar graphs. Consider the following
data for the average times (in hours) that adults in the United States spend
viewing television each week. (Note: It is not necessary to have equal class
sizes in these types of graphs.)
Figure 2–10 shows a bar graph that compares the average time (in hours)
that men and women watch television each week. The comparison is made by
placing the bars for each gender next to each other; then the heights of the
bars can be compared.
The graph shows that people spend more time watching television as they
grow older. Also, it shows that for each age group, women watch slightly
more television than men.
FIGURE 2–10
Example of a Compound Bar Graph
Historical Note
Vilfredo Pareto (1848–1923) was an Italian scholar who developed theories in economics,
statistics, and the social sciences. His contributions to statistics include the development
of a mathematical function used in economics. This function has many statistical
applications and is called the Pareto distribution. In addition, he researched income
distribution, and his findings became known as Pareto’s law.
SOLUTION
Step 1 Arrange the data from largest value to smallest value.
The graph shows that people will pay as much as 44% more for fresh
vegetables and as much as 27% more for cereal bars.
Suggestions for Drawing Pareto Charts
1. Make the bars the same width.
2. Arrange the data from largest to smallest according to frequency.
3. Make the units that are used for the frequency equal in size.
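The ordering rule for a Pareto chart is a sort by frequency, largest first. A minimal sketch with hypothetical category counts; the text bars use one character per unit so that the units are equal in size.

```python
def pareto_order(frequencies):
    """Categories sorted from largest to smallest frequency.

    sorted() is stable even with reverse=True, so tied categories keep their input order.
    """
    return sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

# Hypothetical category counts
counts = {"A": 12, "B": 30, "C": 7, "D": 30}
for category, f in pareto_order(counts):
    print(f"{category} {'#' * f}")   # one '#' per unit, so units are equal in size
```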
Example 2–10 shows the procedure for constructing a time series graph.
Historical Note
Time series graphs are over 1000 years old. The first ones were used to chart the
movements of the planets and the sun.
SOLUTION
Step 1 Draw and label the x and y axes.
Step 2 Label the x axis for months and the y axis for millions of cubic
meters.
Step 3 Plot each point for the values shown.
Step 4 Draw line segments connecting adjacent points. Do not draw a
smooth curve through the points. See Figure 2–12.
Water consumption was highest in July, August, and September, and then
returned to values close to those in January through April.
FIGURE 2–13
Two Time Series Graphs for Comparison
SPEAKING OF STATISTICS
The graph shows the number of murders (in thousands) that have occurred in the United
States since 2014. Based on the graph, do you think the number of murders is increasing,
decreasing, or remaining the same?
Source: FBI
SOLUTION
Step 1 Since there are 360° in a circle, the frequency for each class
must be converted to a proportional part of the circle. This
conversion is done by using the formula

$$\text{Degrees} = \frac{f}{n} \cdot 360^\circ$$

where f = frequency for each class and n = sum of the
frequencies. Hence, the following conversions are obtained. The
degrees should sum to 360°.¹
Step 3 Next, using a protractor and a compass, draw the graph,
using the appropriate degree measures found in Step 1,
and label each section with the name and percentages, as shown
in Figure 2–14.
¹ Note: The degrees column does not always sum to 360° due to rounding.
² Note: The percent column does not always sum to 100% due to rounding.
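The degree conversion in Step 1, together with the percentages used to label each section, can be sketched in Python. The function name and frequencies are hypothetical, not the book's data; multiplying before dividing keeps these small examples exact.

```python
def pie_degrees(frequencies):
    """For each class: (degrees, percent) with degrees = f/n * 360 and percent = f/n * 100."""
    n = sum(frequencies.values())
    return {name: (f * 360 / n, f * 100 / n) for name, f in frequencies.items()}

# Hypothetical frequencies (not the book's data)
for name, (deg, pct) in pie_degrees({"A": 5, "B": 3, "C": 2}).items():
    print(f"{name}: {deg:.0f} degrees, {pct:.0f}%")
```

When the printed values are rounded, the columns may not sum exactly to 360° and 100%, as the notes above point out.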
SOLUTION
Step 1 Find the number of degrees for each shift, using the formula:
Step 3 Using a protractor, graph each section and write its name and
corresponding percentage as shown in Figure 2–15.
To analyze the nature of the data shown in the pie graph, look at the size of
the sections in the pie graph. For example, are any sections relatively large
compared to the rest? Figure 2–15 shows that the numbers of calls for the
three shifts are about equal, although slightly more calls were received on the
evening shift.
Note: Computer programs can construct pie graphs easily, so the
mathematics shown here would only be used if those programs were not
available.
Dotplots
A dotplot uses points or dots to represent the data values. If the data values
occur more than once, the corresponding points are plotted above one
another.
Dotplots are used to show how the data values are distributed and to see if
there are any extremely high or low data values.
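A dotplot can also be printed as plain text. Below is a minimal Python sketch, assuming integer data values. The function name `text_dotplot` and the sample values are hypothetical. Each integer from the lowest to the highest value gets a column, and repeated values are stacked upward, as described above.

```python
from collections import Counter


def text_dotplot(data: list[int], mark: str = "*") -> str:
    """Return a text dotplot with one column per integer from min to max."""
    counts = Counter(data)  # values that never occur count as 0
    values = range(min(data), max(data) + 1)
    width = max(len(str(v)) for v in values) + 1  # column width incl. a space
    rows = []
    for level in range(max(counts.values()), 0, -1):  # top row first
        cells = (mark if counts[v] >= level else " " for v in values)
        rows.append("".join(c.rjust(width) for c in cells).rstrip())
    rows.append("-" * (width * len(values)))  # horizontal axis
    rows.append("".join(str(v).rjust(width) for v in values))  # scale
    return "\n".join(rows)


# Hypothetical data values, not the waste-site data from Example 2–13.
print(text_dotplot([0, 1, 1, 1, 2, 2, 3, 5]))
```

Empty columns (such as 4 in the sample) are kept, so gaps in the data remain visible.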
EXAMPLE 2–13 Federal Waste Sites
The data show the number of federal waste sites in each of the 50 states.
Draw a dotplot for the data and summarize the results.
Step 1 Find the lowest and highest data values, and decide what scale to
use on the horizontal axis. The lowest data value is 0 and the
highest data value is 13, so a scale from 0 to 13 is needed.
Step 2 Draw a horizontal line, and draw the scale on the line.
Step 3 Plot each data value above the line. If the value occurs
more than once, plot each additional point above the previous
one. See Figure 2–16.
FIGURE 2–16
Figure for Example 2–13
The graph shows that most of the states have between zero and three
waste sites. The most frequent value is one waste site, which occurs
for 13 states.
OBJECTIVE 4
Draw and interpret a stem and leaf plot.
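In a stem and leaf plot, each data value is split into a stem (the leading
digit or digits) and a leaf (the trailing digit). For example, a data value
of 34 would have 3 as the stem and 4 as the leaf. A data value of 356
would have 35 as the stem and 6 as the leaf.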
Example 2–14 shows the procedure for constructing a stem and leaf plot.
SOLUTION
Step 1 Arrange the data in order:
02, 13, 14, 20, 23, 25, 31, 32, 32, 32,
32, 33, 36, 43, 44, 44, 45, 51, 52, 57
Note: Arranging the data in order is not essential and can be
cumbersome when the data set is large; however, it is helpful in
constructing a stem and leaf plot. The leaves in the final stem
and leaf plot should be arranged in order.
Step 2 Separate the data according to the first digit, as shown.
Step 3 A display can be made by using the leading digit as the stem and
the trailing digit as the leaf. For example, for the value 32, the
leading digit, 3, is the stem and the trailing digit, 2, is the leaf.
For the value 14, the 1 is the stem and the 4 is the leaf. Now a
plot can be constructed as shown in Figure 2–17.
FIGURE 2–17
Stem and Leaf Plot for Example 2–14
Figure 2–17 shows that the distribution peaks in the center and that there
are no gaps in the data. For 7 of the 20 days, the number of patients receiving
cardiograms was between 31 and 36. The plot also shows that the number
of patients treated by the testing center in any one day ranged from a
minimum of 2 to a maximum of 57.
If there are no data values in a class, you should write the stem number and
leave the leaf row blank. Do not put a zero in the leaf row.
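The procedure in Steps 1 through 3 can be sketched in Python, assuming non-negative whole-number data. The last digit is the leaf, and the remaining digits form the stem. The function `stem_and_leaf` is a hypothetical helper, not part of any statistics package. It sorts the leaves and, following the rule above, prints a stem with a blank leaf row when a class has no values. The call uses the 20 values from Example 2–14.

```python
def stem_and_leaf(data: list[int]) -> list[str]:
    """Return rows of the form 'stem | leaves' for non-negative integers."""
    groups: dict[int, list[int]] = {}
    for x in data:
        # stem = all digits but the last; leaf = the last digit
        groups.setdefault(x // 10, []).append(x % 10)
    rows = []
    for stem in range(min(groups), max(groups) + 1):  # include empty stems
        leaves = sorted(groups.get(stem, []))  # leaves in ascending order
        rows.append(f"{stem} | " + " ".join(str(leaf) for leaf in leaves))
    return [row.rstrip() for row in rows]  # an empty stem prints as "s |"


# Data from Example 2–14 (the value 02 is written as 2).
example_2_14 = [2, 13, 14, 20, 23, 25, 31, 32, 32, 32,
                32, 33, 36, 43, 44, 44, 45, 51, 52, 57]
for row in stem_and_leaf(example_2_14):
    print(row)
```

For these data, the row for stem 3 is `3 | 1 2 2 2 2 3 6`, which holds the seven values from 31 to 36 noted above.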
SOLUTION
Step 1 Arrange the data in order.
50, 51, 51, 52, 53, 53, 55, 55, 56, 57, 57, 58, 59, 62, 63,
65, 65, 66, 66, 67, 68, 69, 69, 72, 73, 75, 75, 77, 78, 79
Step 2 Separate the data according to the classes.
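When each stem is split into two classes, the same idea can be sketched with a class width of 5, so 50–54 and 55–59 both use the stem 5. The class width of 5 is an assumption, because the classes themselves are not reproduced here. The function name `split_stem_and_leaf` is hypothetical. The width should divide 10 evenly (for example, 2 or 5) so that every class falls under a single stem.

```python
def split_stem_and_leaf(data: list[int], width: int = 5) -> list[str]:
    """Stem and leaf rows where each tens stem is split into classes of `width`."""
    if 10 % width != 0:
        raise ValueError("width must divide 10 evenly")
    first = min(data) // width * width  # lower limit of the first class
    last = max(data) // width * width   # lower limit of the last class
    rows = []
    for lower in range(first, last + 1, width):
        leaves = sorted(x % 10 for x in data if lower <= x < lower + width)
        rows.append((f"{lower // 10} | " + " ".join(map(str, leaves))).rstrip())
    return rows


# The 30 ordered values from Step 1 above.
values = [50, 51, 51, 52, 53, 53, 55, 55, 56, 57, 57, 58, 59, 62, 63,
          65, 65, 66, 66, 67, 68, 69, 69, 72, 73, 75, 75, 77, 78, 79]
for row in split_stem_and_leaf(values):
    print(row)
```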
SPEAKING OF STATISTICS
The Federal Reserve estimated that during a recent year, there were 22 billion bills in
circulation. About 28% of them were $1 bills, 3% were $2 bills, 7% were $5 bills, 5% were
$10 bills, 21% were $20 bills, 4% were $50 bills, and 32% were $100 bills. It costs about
3¢ to print each bill.
The average life of a $1 bill is 22 months, a $10 bill 3 years, a $20 bill 4 years, a $50 bill
9 years, and a $100 bill 9 years. What type of graph would you use to represent the
average lifetimes of the bills?
When you analyze a stem and leaf plot, look for peaks and gaps in the
distribution. See if the distribution is symmetric or skewed. Check the
variability of the data by looking at the spread.
Related distributions can be compared by using a back-to-back stem and
leaf plot. The back-to-back stem and leaf plot uses the same digits for the
stems of both distributions, but the digits that are used for the leaves are
arranged in order out from the stems on both sides. Example 2–16 shows a
back-to-back stem and leaf plot.
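A back-to-back plot can be sketched the same way. In the hypothetical helper `back_to_back` below, the left leaves are written in descending order. That way, on both sides, the smallest leaf sits next to the stem and the leaves increase moving outward. The two small data sets are hypothetical, not the data of Example 2–16.

```python
def back_to_back(left: list[int], right: list[int]) -> list[str]:
    """Rows of the form 'left leaves | stem | right leaves' for non-negative integers."""
    all_stems = [x // 10 for x in left + right]
    stems = range(min(all_stems), max(all_stems) + 1)  # shared stems, none skipped
    left_parts, right_parts = [], []
    for s in stems:
        # The left side reads outward from the stem to the left, so sort descending.
        left_leaves = sorted((x % 10 for x in left if x // 10 == s), reverse=True)
        right_leaves = sorted(x % 10 for x in right if x // 10 == s)
        left_parts.append(" ".join(map(str, left_leaves)))
        right_parts.append(" ".join(map(str, right_leaves)))
    pad = max(len(p) for p in left_parts)  # right-align the left leaves
    return [f"{lp.rjust(pad)} | {s} | {rp}".rstrip()
            for lp, s, rp in zip(left_parts, stems, right_parts)]


# Hypothetical data sets for two groups.
for row in back_to_back([12, 15, 23, 38], [11, 24, 26, 27]):
    print(row)
```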
SOLUTION
FIGURE 2–19 Back-to-Back Stem and Leaf Plot for Example 2–16
Stem and leaf plots are part of the techniques called exploratory data
analysis. More information on this topic is presented in Chapter 3.
Misleading Graphs
Graphs give a visual representation that enables readers to analyze and
interpret data more easily than they could simply by looking at numbers.
However, inappropriately drawn graphs can misrepresent the data and lead
the reader to false conclusions. For example, a car manufacturer’s ad stated
that 98% of the vehicles it had sold in the past 10 years were still on the road.
The ad then showed a graph similar to the one in Figure 2–20. The graph
shows the percentage of the manufacturer’s automobiles still on the road and
the percentage of its competitors’ automobiles still on the road. Is there a
large difference? Not necessarily.
FIGURE 2–20
Graph of Automaker’s Claim Using a Scale from 95 to 100%
FIGURE 2–21
Graph in Figure 2–20 Redrawn Using a Scale from 0 to 100%
Notice the scale on the vertical axis in Figure 2–20. It has been cut off (or
truncated) and starts at 95%. When the graph is redrawn using a scale that
goes from 0 to 100%, as in Figure 2–21, there is hardly a noticeable
difference in the percentages. Thus, changing the units at the starting point on
the y axis can convey a very different visual representation of the data.
It is not wrong to truncate an axis of the graph; many times it is necessary
to do so. However, the reader should be aware of this fact and interpret the
graph accordingly. Do not be misled if an inappropriate impression is given.
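The effect of a truncated axis can be quantified by comparing bar heights measured from where the axis starts. The Python sketch below does this. The function name `apparent_ratio` is hypothetical. The competitor's value of 97% is a made-up figure used only for illustration, because the text gives only the manufacturer's 98%.

```python
def apparent_ratio(a: float, b: float, axis_start: float) -> float:
    """Ratio of the drawn bar heights for values a and b when the
    vertical axis starts at `axis_start` instead of 0."""
    if min(a, b) <= axis_start:
        raise ValueError("both values must lie above the start of the axis")
    return (a - axis_start) / (b - axis_start)


manufacturer = 98.0  # percentage from the ad described in the text
competitor = 97.0    # hypothetical value, for illustration only

print(apparent_ratio(manufacturer, competitor, 95.0))  # axis starts at 95%
print(apparent_ratio(manufacturer, competitor, 0.0))   # axis starts at 0%
```

With the axis starting at 95%, the manufacturer's bar is drawn 1.5 times as tall as the competitor's. With the axis starting at 0%, it is only about 1.01 times as tall, even though the percentages are unchanged.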
Let us consider another example. The projected required fuel economy in
miles per gallon for vehicles is shown. In this case, an increase from 14.1 to
16 miles per gallon is projected.
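A quick calculation shows how a two-dimensional picture can exaggerate this change. Assume the picture for 16 miles per gallon is enlarged by the same factor in both height and width relative to the picture for 14.1 miles per gallon. The area then grows by the square of that factor:

$$\frac{16}{14.1} \approx 1.135, \qquad \left(\frac{16}{14.1}\right)^2 \approx 1.288$$

So an increase of about 13.5% in fuel economy would be drawn as an increase of about 28.8% in area.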
Note that it is not wrong to use the graphing techniques of truncating the
scales or representing data by two-dimensional pictures. But when these
techniques are used, the reader should be cautious about the conclusions
drawn on the basis of the graphs.
Another way to misrepresent data on a graph is by omitting labels or units
on the axes of the graph. The graph shown in Figure 2–24 compares the cost
of living, economic growth, population growth, etc., of four main geographic
areas in the United States. However, since there are no numbers on the y axis,
very little information can be gained from this graph, except a crude ranking
of each factor. There is no way to decide the actual magnitude of the
differences.
Finally, all graphs should contain a source for the information presented.
The inclusion of a source for the data will enable you to check the reliability
of the organization presenting the data.
2. Paying Off a College Debt The following data show the ages at which
millennials (ages 20–29) expect to pay off their college debt. Draw a
horizontal bar graph to represent the data.
4. Internet Users The data show the top five nations with the most
Internet users in millions. Draw a Pareto chart for the data.
5. Online Ad Spending The amount spent (in billions of dollars) for ads
online is shown. (The numbers for 2016 through 2019 were projected
numbers.) Draw a time series graph and comment on the trend.
Source: eMarketer.
9. What’s Cooking? A study of 1063 U.S. adults found that 24% used a
microwave oven to prepare a meal, 44% used the stovetop, 25% used
the oven, and 7% used another method. Draw a pie graph for the data
and analyze the graph.
Source: Pieapod Survey
11. Lottery Winners A survey by BMO Harris asked 304 small business
owners if they would sell their businesses if they won the lottery. The
results are shown. Draw a pie graph for the data.
14. Teacher Strikes In Pennsylvania, the numbers of teacher strikes for the
last 14 years are shown. Construct a dotplot for the data. Comment on
the graph.
15. Years of Experience The data show the number of years of experience
the players on the Pittsburgh Steelers football team have at the
beginning of the season. Draw and analyze a dotplot for the data.
16. Commuting Times Fifty off-campus students were asked how long it
takes them to get to school. The times (in minutes) are shown.
Construct a dotplot and analyze the data.
17. 50 Home Run Club There are 43 Major League baseball players (as of
2015) who have hit 50 or more home runs in one season. Construct a
stem and leaf plot and analyze the data.
18. Terrorist Attacks The data show the number of terrorist attacks
against the United States over a recent 16-year period. Draw a stem and
leaf plot for the data.
19. Length of Major Rivers The data show the lengths (in
hundreds of miles) of major rivers in South America and
Europe. Construct a back-to-back stem and leaf plot, and compare the
distributions.
20. Math and Reading Achievement Scores The math and reading
achievement scores from the National Assessment of Educational
Progress for selected states are listed below. Construct a back-to-back
stem and leaf plot with the data, and compare the distributions.
21. State which type of graph (Pareto chart, time series graph, or pie
graph) would most appropriately represent the data.
a. Situations that distract automobile drivers
b. Number of persons in an automobile used for getting to and from
work each day
c. Amount of money spent for textbooks and supplies for one semester
d. Number of people killed by tornadoes in the United States each year
for the last 10 years
e. The number of pets (dogs, cats, birds, fish, etc.) in the United States
this year
f. The average amount of money that a person spent for a significant
other for Christmas for the last 6 years
22. State which graph (Pareto chart, time series graph, or pie graph) would
most appropriately represent the given situation.
a. The number of students enrolled at a local college for each year
during the last 5 years
b. The budget for the student activities department at a certain college
for a specific year
c. The means of transportation the students use to get to school
d. The percentage of votes each of the four candidates received in the
last election
e. The record temperatures of a city for the last 30 years
f. The frequency of each type of crime committed in a city during the
year
23. Credit Scores The following factors contribute to a FICO credit score.
Draw a pie chart and vertical bar graph for the data. Which graph is a
better representation of the data?
Source: Experian
24. Alcohol Poisoning The following data show the ages and the percents
of individuals who have died from alcohol poisoning for a specific
year. Draw a pie chart and a vertical bar graph for the data. Which
graph do you think better illustrates the significance of the data?
Explain.
Source: Addictionresource.com
(Note: Class width here is 10 and was already decided by the researchers, so it can be used
here.)
25. Cost of Milk The graph shows the increase in the price of a quart of
milk. Why might the increase appear to be larger than it really is?
26. Websites The data show the number (in millions) of websites in the
United States from 2012 to 2020. Draw a time series graph for the data.
Technology
TI-84 Plus
Step by Step
To graph a time series, follow the procedure for a frequency polygon from
Section 2–2, using the following data for the number of outdoor drive-in
theaters
EXCEL
Step by Step
5. To change the title of the chart, click on the current title of the
chart.
6. When the text box containing the title is highlighted, click the mouse
in the text box and change the title.
7. Right-click on any bar and select Format Data Series. Change the gap
width to zero.
8. Add labels to the axes and delete the legend.
7. Select Edit from the Horizontal Axis Labels and highlight
the years from column A, then click [OK].
8. Click [OK] on the Select Data Source box.
9. Create a title for your chart, such as Vehicles Using the Pennsylvania
Turnpike. Right-click the mouse on any region of the chart. Select the
Chart Tools tab from the toolbar, then Layout.
10. Select Chart Title and highlight the current title to change the title.
11. Select Axis Titles to change the horizontal and vertical axis labels.
4. Click on any region of the chart. Then select Design from the Chart
Tools tab on the toolbar.
5. Select Add Chart Elements from the Chart Layouts tab on
the toolbar. Under Format Data Labels, check Category Name
and Percentage; uncheck legend.
6. To change the title of the chart, click on the current title of the chart.
7. When the text box containing the title is highlighted, click the mouse
in the text box and change the title.
MINITAB
Step by Step
After the graph is made, right-click over any bar to change the appearance
such as the color of the bars. To change the gap between them, right-click
on the horizontal axis and then choose Edit X scale. To change the y scale
to percents, right-click on the vertical axis and then choose Graph options
and Show Y as a Percent.
The data show the average water consumption in a specific city in millions
of cubic meters for a one-year period (Example 2–10).
Summary
• When data are collected, the values are called raw data. Since very little
knowledge can be obtained from raw data, they must be organized in
some meaningful way. A frequency distribution using classes is the
most common method. (2–1)
• Once a frequency distribution is constructed, graphs can be drawn to
give a visual representation of the data. The most commonly used
graphs in statistics are the histogram, frequency polygon, and ogive. (2–2)
• Other graphs, such as the bar graph, Pareto chart, time series graph, pie
graph, and dotplot, can also be used. Some of these graphs are frequently
seen in newspapers, magazines, and various statistical reports. (2–3)
• A stem and leaf plot uses part of each data value as the stem and the
rest as the leaf. This graph has the advantages of both a frequency
distribution and a histogram. (2–3)
• Finally, graphs can be misleading if they are drawn improperly. For
example, increases and decreases over time in time series graphs can be
exaggerated by truncating the scale on the y axis. One-dimensional
increases or decreases can be exaggerated by using two-dimensional
figures. Also, when labels or units are purposely omitted, there is no
way to judge the magnitude of the differences between the
categories. (2–3)
Important Terms
bar graph 76
categorical frequency distribution 43
class 42
class boundaries 45
class midpoint 45
class width 45
compound bar graphs 78
cumulative frequency 61
cumulative frequency distribution 48
dotplot 84
frequency 42
frequency distribution 42
frequency polygon 59
grouped frequency distribution 44
histogram 59
lower class limit 44
ogive 61
open-ended distribution 46
Pareto chart 79
pie graph 81
raw data 42
relative frequency graph 62
stem and leaf plot 85
time series graph 80
ungrouped frequency distribution 49
upper class limit 44
Important Formulas
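Formula for the percentage of values in each class:
$$\% = \frac{f}{n} \cdot 100$$
where f = frequency of the class and n = number of values
Formula for the degrees for each section of a pie graph:
$$\text{Degrees} = \frac{f}{n} \cdot 360^\circ$$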
Review Exercises
Section 2–1
1. Alcohol Consumption A survey shows the type of alcoholic beverages
that a person consumes. Construct a categorical frequency distribution
for the data: B = beer, W = wine, and S = spirits.
4. Wind Speed The data show the average wind speed for 36 days in a
large city. Construct an ungrouped frequency distribution for the data.
5. Waterfall Heights The data show the heights (in feet) of notable
waterfalls in North America. Organize the data into a grouped
frequency distribution using 6 classes. These data will be used for
Exercises 7, 9, and 11.
6. Ages of the Vice Presidents at the Time of Their Death The ages at
the time of death of those Vice Presidents of the United States who
have passed away are listed below. Use the data to construct a
frequency distribution. Use 6 classes. The data for this exercise will be
used for Exercises 8, 10, and 12.
Section 2–2
7. Find the relative frequency for the frequency distribution for the data in
Exercise 5.
8. Find the relative frequency for the frequency distribution for the data in
Exercise 6.
9. Construct a histogram, frequency polygon, and ogive for the data in
Exercise 5.
10. Construct a histogram, frequency polygon, and ogive for the data in
Exercise 6.
Section 2–3
13. Non-Alcoholic Beverages The data show the yearly consumption (in
gallons) of popular non-alcoholic beverages. Draw a vertical and
horizontal bar graph to represent the data.
Source: U.S. Department of Agriculture
14. Family Size The following data represent the percents of family sizes
of residents in the United States. Draw a vertical bar graph for the data
and summarize the results.
15. Crime The data show the percentage of the types of crimes
commonly committed in the United States. Construct a Pareto
chart for the data.
Source: FBI
16. Hours of Sleep The following data show the recommended number of
hours per night that people need for sleep. Draw a Pareto chart for the
data and summarize the results.
Source: SleepFoundation.org
17. Broadway Stage Engagements The data represent the number of new
shows on Broadway from 2014 to 2020. Draw a time series graph for
the data.
18. Internet Users The data (in hundred millions) show the number of
Internet users worldwide from 2010 to 2019. Draw a time series graph
for the data.
21. Peyton Manning’s Colts Career Peyton Manning played for the
Indianapolis Colts for 14 years. (He did not play in 2011.) The data
show the number of touchdowns he scored for the years 1998–2010.
Construct a dotplot for the data and comment on the graph.
Source: NFL.com
22. Songs on CDs The data show the number of songs on each of 40 CDs
from the author’s collection. Construct a dotplot for the data and
comment on the graph.
23. Weights of Football Players A local football team has 30 players; the
weight of each player is shown. Construct a stem and leaf plot for the
data. Use stems 20__, 21__, 22__, etc.
25. Pain Relief The graph below shows the time it takes Quick
Pain Relief to relieve a person’s pain. The graph below that
shows the time a competitor’s product takes to relieve pain. Why might
these graphs be misleading?
26. Casino Payoffs The graph shows the payoffs obtained from the White
Oak Casino compared to the nearest competitor’s casino. Why is this
graph misleading?
Data Analysis
A Data Bank is found in Appendix B.
1. From the Data Bank located in Appendix B, choose one of the
following variables: age, weight, cholesterol level, systolic pressure,
IQ, or sodium level. Select at least 30 values. For these values,
construct a grouped frequency distribution. Draw a histogram,
frequency polygon, and ogive for the distribution. Describe briefly the
shape of the distribution.
2. From the Data Bank, choose one of the following variables:
educational level, smoking status, or exercise. Select at least 20 values.
Construct an ungrouped frequency distribution for the data. For the
distribution, draw a Pareto chart and describe briefly the nature of the
chart.
3. From the Data Bank, select at least 30 subjects and construct a
categorical distribution for their marital status. Draw a pie graph and
describe briefly the findings.
4. Using the data from Data Set IV in Appendix B, construct a frequency
distribution and draw a histogram. Describe briefly the shape of the
distribution of the tallest buildings in New York City.
5. Using the data from Data Set XI in Appendix B, construct a frequency
distribution and draw a frequency polygon. Describe briefly the shape
of the distribution for the number of pages in statistics books.
6. Using the data from Data Set IX in Appendix B, divide the United
States into four regions, as follows:
Find the total population for each region, and draw a Pareto chart and a
pie graph for the data. Analyze the results. Explain which chart might
be a better representation for the data.
7. Using the data from Data Set I in Appendix B, make a stem and leaf
plot for the record low temperatures in the United States. Describe the
nature of the plot.
STATISTICS TODAY
How Your Identity Can Be Stolen
—Revisited
Data presented in numerical form do not convey an easy-to-interpret conclusion; however,
when data are presented in graphical form, readers can see the visual impact of the
numbers. In the case of identity fraud, the reader can see that the most common type is
credit card fraud, followed by employment or tax-related fraud. These two types of fraud
account for over half (57%) of the identity frauds.
The Federal Trade Commission suggests some ways to protect your identity:
1. Shred all financial documents no longer needed.
2. Protect your Social Security number.
3. Don’t give out personal information on the phone, through the mail, or over the Internet.
4. Never click on links sent in unsolicited emails.
5. Don’t use an obvious password for your computer documents.
6. Keep your personal information in a secure place at home.
Chapter Quiz
Determine whether each statement is true or false. If the
statement is false, explain why.
1. In the construction of a frequency distribution, it is a good idea to have
overlapping class limits, such as 10–20, 20–30, 30–40.
10. What graph should be used to show the relationship between the parts
and the whole?
a. Histogram
b. Pie graph
c. Pareto chart
d. Ogive
11. Except for rounding errors, relative frequencies should add up to what
sum?
a. 0
b. 1
c. 50
d. 100
14. Data such as blood types (A, B, AB, O) can be organized into a(n)
______ frequency distribution.
15. Data collected over a period of time can be graphed using a(n) ______
graph.
21. Construct a histogram, a frequency polygon, and an ogive for the data
in Exercise 20.
23. Construct a histogram, frequency polygon, and ogive for the data in
Exercise 22. Analyze the histogram.
24. Recycled Trash Construct a Pareto chart and a horizontal bar graph for
the number of tons (in millions) of trash recycled per year by
Americans based on an Environmental Protection Agency study.
26. Needless Deaths of Children The New England Journal of
Medicine predicted the number of needless deaths due to
childhood obesity. Draw a time series graph for the data.
27. Museum Visitors The number of visitors to the Historic Museum for
25 randomly selected hours is shown. Construct a stem and leaf plot
for the data.
29. Water Usage The graph shows the average number of gallons of water
a person uses for various activities. Can you see anything misleading
about the way the graph is drawn?
2. Road Rage Here are some statistics on road rage. If you are
giving a presentation on this subject, what type of graph could
you use to emphasize the given statistics? Draw the graph.
Road rage is a serious and dangerous problem that today’s drivers have
to deal with almost daily. Road rage can lead to serious consequences
and even death.
A recent study by the American Automobile Association (AAA)
reported the following information based on the responses of millions
of drivers. The most common responses of drivers who exhibit road
rage are:
Tailgating (104 million or 51%)
Yelling at another driver (95 million or 47%)
Honking the horn (91 million or 45%)
Making angry gestures (67 million or 33%)
Blocking another vehicle on purpose (49 million or 24%)
Cutting off another vehicle (24 million or 12%)
Getting out of the automobile and confronting the other driver (7.6
million or 4%)
Deliberately ramming another vehicle (5.7 million or 3%)
Some causes of road rage are:
Ignoring other drivers because of cell phone use
Keeping high beams on regardless of traffic conditions
Failing to use turn signals when changing lanes
Forgetting to check the blind spot when changing lanes
Causing other drivers to change their speed or use their brakes
The AAA suggests that drivers be tolerant and forgiving of other
drivers and never respond to other drivers’ road rage actions.
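The percentages above add up to more than 100% (51 + 47 + 45 + 33 + 24 + 12 + 4 + 3 = 219), so a driver can fall into several categories. The categories are therefore not parts of a single whole. One possible display is sketched below in Python: a text horizontal bar graph with the bars in descending order, as in a Pareto chart. The function name `text_bar_graph` and the shortened labels are hypothetical. The percentages are those reported above.

```python
def text_bar_graph(data: dict[str, float], unit: float = 1.0, mark: str = "#") -> list[str]:
    """Horizontal text bars, longest first; each `mark` stands for `unit` percent."""
    label_width = max(len(label) for label in data)
    rows = []
    for label, value in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        bar = mark * int(value // unit)  # whole units only; remainders are dropped
        rows.append(f"{label.ljust(label_width)} | {bar} {value:g}%")
    return rows


# Percentages from the AAA figures above; labels shortened for display.
road_rage = {
    "Tailgating": 51, "Yelling": 47, "Honking": 45, "Angry gestures": 33,
    "Blocking": 24, "Cutting off": 12, "Confronting": 4, "Ramming": 3,
}
for row in text_bar_graph(road_rage, unit=2):
    print(row)
```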
Data Projects
Where appropriate, use the TI-84 Plus, Excel, MINITAB, or a
computer program of your choice to complete the following exercises.
1. Business and Finance Consider the 30 stocks listed as the Dow Jones
Industrials. For each, find its earnings per share. Randomly select 30
stocks traded on the NASDAQ. For each, find its earnings per share.
Create a frequency table with 5 categories for each data set. Sketch a
histogram for each. How do the two data sets compare?
2. Sports and Leisure Use systematic sampling to create a sample of 25
National League and 25 American League baseball players from the
most recently completed season. Find the number of home runs for
each player. Create a frequency table with 5 categories for each data
set. Sketch a histogram for each. How do the two leagues compare?
3. Technology Randomly select 50 songs from your music player or
music organization program. Find the length (in seconds) for each
song. Use these data to create a frequency table with 6 categories.
Sketch a frequency polygon for the frequency table. Is the shape of the
distribution of times uniform, skewed, or bell-shaped? Also note the
genre of each song. Create a Pareto chart showing the frequencies of
the various categories. Finally, note the year each song was released.
Create a pie chart organized by decade to show the percentage of songs
from various time periods.
4. Health and Wellness Use information from the Red Cross to page 110
create a pie chart depicting the percentages of Americans with
various blood types. Also find information about blood donations and
the percentage of each type donated. How do the charts compare? Why
is the collection of type O blood so important?
5. Politics and Economics Consider the U.S. Electoral College System.
For each of the 50 states, determine the number of electoral votes it receives.
Create a frequency table with 8 classes. Is this distribution uniform,
skewed, or bell-shaped?
6. Your Class Have each person in class take their pulse and determine
the heart rate (beats in 1 minute). Use the data to create a frequency
table with 6 classes. Then have everyone in the class do 25 jumping
jacks and immediately take the pulse again after the activity. Create a
frequency table for those data as well. Compare the two results. Are
they similarly distributed? How does the range of scores compare?
5. Answers will vary. For the distribution shown, there is a peak at the
49–55 class.
6. The age 78 could be an outlier.
7. The distribution is unimodal with 49–55 being the modal class. It is
somewhat positively skewed.