

2

Frequency Distributions and Graphs


OUTLINE
Introduction
2–1 Organizing Data
2–2 Histograms, Frequency Polygons, and Ogives
2–3 Other Types of Graphs
Summary
OBJECTIVES
After completing this chapter, you should be able to
1 Organize data using a frequency distribution.
2 Represent data in frequency distributions graphically, using
histograms, frequency polygons, and ogives.
3 Represent data using bar graphs, Pareto charts, time series
graphs, pie graphs, and dotplots.
4 Draw and interpret a stem and leaf plot.

STATISTICS TODAY

How Your Identity Can Be Stolen


Identity fraud is a big business today—more than 9 million people are
victims. The total amount of fraud in a recent year was $14.4 million.
Identity fraud occurs every 2 seconds. About 1 in 15 people are
victims of identity theft. It takes 100–200 hours and six months to
correctly identify problems resulting from identity theft. The average
amount of theft is $1343 per victim. The percents for the top six types
of identity theft for a recent year are shown.

Common Types of Identity Theft


Credit Card Fraud 35%
Employment or Tax-Related Fraud 22%
Phone or Utility Fraud 15%
Bank Fraud 13%
Loan or Lease Fraud 8%
Government Documents or Government Benefits Fraud 7%
Looking at a table does not have the impact that presenting
numbers in a well-drawn graph does. See Statistics Today—Revisited
at the end of this chapter for a graph that can better represent these
data. This chapter will show you how to construct appropriate graphs
to represent data and help you to get your point across.

Introduction
When conducting a statistical study, the researcher must gather data for the
particular variable under study. For example, if a researcher wishes to study
the number of people who were bitten by poisonous snakes in a specific
geographic area over the past several years, the researcher has to gather the
data from various doctors, hospitals, or health departments.
To describe situations, draw conclusions, or make inferences about events,
the researcher must organize the data in some meaningful way. The most
convenient method of organizing data is to construct a frequency distribution.
After organizing the data, the researcher must present them so they can be
understood by those who will benefit from reading the study. The most useful
method of presenting the data is by constructing statistical charts and graphs.
There are many different types of charts and graphs, and each one has a
specific purpose.
This chapter explains how to organize data by constructing frequency
distributions and how to present the data by constructing charts and graphs.
The charts and graphs illustrated here are histograms, frequency polygons,
ogives, pie graphs, Pareto charts, and time series graphs. A graph that
combines the characteristics of a frequency distribution and a histogram,
called a stem and leaf plot, is also explained.

2–1 Organizing Data


Suppose a researcher wished to do a study on the ages of the 50 wealthiest
people in the world. The researcher first would have to get the data on the
ages of the people. In this case, these ages are listed in Forbes magazine.
When the data are in original form, they are called raw data and are listed
next.
OBJECTIVE 1
Organize data using a frequency distribution.

Since little information can be obtained from looking at raw data, the
researcher organizes the data into what is called a frequency distribution.

Unusual Stats
Of Americans 50 years old and over, 23% think their greatest achievements are still ahead
of them.

Each raw data value is placed into a quantitative or qualitative category
called a class. The frequency of a class then is the number of data values
contained in a specific class. A frequency distribution is shown for the
preceding data set.

Now some general observations can be made from looking at the
frequency distribution. For example, it can be stated that the majority of the
wealthy people in the study are 45 years old or older.
The classes in this distribution are 27–35, 36–44, etc. These values are
called class limits. The data values 27, 28, 29, 30, 31, 32, 33, 34, 35 can be
tallied in the first class; 36, 37, 38, 39, 40, 41, 42, 43, 44 in the second class;
and so on.
Two types of frequency distributions that are most often used are the
categorical frequency distribution and the grouped frequency distribution.
The procedures for constructing these distributions are shown now.

Categorical Frequency Distributions


The categorical frequency distribution is used for data that can be placed in
specific categories, such as nominal- or ordinal-level data. For example, data
such as political affiliation, religious affiliation, or major field of study would
use categorical frequency distributions.

EXAMPLE 2–1 Breakfast Beverages


Forty people were asked what beverage they drink for breakfast. Construct
a categorical frequency distribution for the data and summarize the results.
Use these classes: W = water, M = milk, J = juice, C = coffee, and T = tea.

Source: Based on U.S. Department of Agriculture

SOLUTION

Since the data are categorical, discrete classes are used. They are W, M, J,
C, and T.
Step 1 Make a table as shown.
Step 2 Tally the data and place the results in the second column
labeled Tally.
Step 3 Count the tallies and place the results in the third column labeled
Frequency.
Step 4 Find the percentage of values in each class by using the formula

\[ \% = \frac{f}{n} \cdot 100 \]

where f = frequency of the class and n = total number of values.


For example, in class W, the percentage is

Percentages are not normally part of a frequency distribution,
but they can be added since they are used in certain types of
graphs such as pie graphs. Also, the decimal equivalent of a
percent is called a relative frequency.
Step 5 Find the totals for the Frequency and Percent columns. The
completed table is shown. It is a good idea to add the Percent
column to make sure it sums to 100%. This column won’t
always sum to 100% because of rounding.
For the sample, most people selected water and the smallest
number selected tea.
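
Steps 2 through 5 can also be carried out by a short program. The following is a minimal sketch in Python, not part of the original example: the function name `categorical_distribution` and the ten responses are made up for illustration and are not the survey data from Example 2–1.

```python
from collections import Counter

def categorical_distribution(values, classes):
    """Return (class, frequency, percent) rows in the given class order."""
    counts = Counter(values)          # Steps 2 and 3: tally and count
    n = len(values)                   # total number of values
    # Step 4: percent = f / n * 100 for each class
    return [(c, counts[c], 100 * counts[c] / n) for c in classes]

# Hypothetical responses, made up for illustration only.
sample = ["W", "C", "M", "W", "J", "C", "W", "T", "C", "W"]
table = categorical_distribution(sample, ["W", "M", "J", "C", "T"])

for label, f, pct in table:
    print(f"{label}  {f:2d}  {pct:5.1f}%")
# Step 5: totals for the Frequency and Percent columns
print("Total", sum(f for _, f, _ in table), sum(p for _, _, p in table))
```

Dividing each percent by 100 gives the relative frequency of the class.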

Grouped Frequency Distributions


When the range of the data is large, the data must be grouped into classes that
are more than one unit in width, in what is called a grouped frequency
distribution. For example, a distribution of the blood glucose levels in
milligrams per deciliter (mg/dL) for 50 randomly selected college students is
shown.

Unusual Stats
Six percent of Americans say they find life dull.

The procedure for constructing the preceding frequency distribution is
given in Example 2–2; however, several things should be noted. In this
distribution, the values 58 and 64 of the first class are called class limits. The
lower class limit is 58; it represents the smallest data value that can be
included in the class. The upper class limit is 64; it represents the largest
data value that can be included in the class. The numbers in the second
column are called class boundaries. These numbers are used to separate the
classes so that there are no gaps in the frequency distribution. The gaps are
due to the limits; for example, there is a gap between 64 and 65.

Students sometimes have difficulty finding class boundaries when
given the class limits. The basic rule of thumb is that the class limits
should have the same decimal place value as the data, but the class
boundaries should have one additional place value and end in a 5. For
example, if the values in the data set are whole numbers, such as 59, 68, and
82, the limits for a class might be 58–64, and the boundaries are 57.5–64.5.
Find the boundaries by subtracting 0.5 from 58 (the lower class limit) and
adding 0.5 to 64 (the upper class limit).

Unusual Stats
One out of every hundred people in the United States is color-blind.

If the data are in tenths, such as 6.2, 7.8, and 12.6, the limits for a class
hypothetically might be 7.8–8.8, and the boundaries for that class would be
7.75–8.85. Find these values by subtracting 0.05 from 7.8 and adding 0.05 to
8.8.
Class boundaries are not always included in frequency distributions;
however, they give a more formal approach to the procedure of organizing
data, including the fact that sometimes the data have been rounded. You
should be familiar with boundaries since you may encounter them in a
statistical study.
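
The rule of thumb for boundaries can be expressed as a small calculation. The sketch below is only an illustration under stated assumptions: the function name `class_boundaries` is hypothetical, the limits are passed as strings, and both limits are assumed to be written with the same number of decimal places. Python's `decimal` module is used so that values such as 7.75 are exact.

```python
from decimal import Decimal

def class_boundaries(lower_limit, upper_limit):
    """Boundaries for one class; limits are strings such as "58" or "7.8"."""
    lower, upper = Decimal(lower_limit), Decimal(upper_limit)
    # Half of one unit in the last decimal place of the limits:
    # 0.5 for whole numbers, 0.05 for tenths, and so on.
    half_unit = Decimal(1).scaleb(lower.as_tuple().exponent) / 2
    return lower - half_unit, upper + half_unit

print(class_boundaries("58", "64"))    # whole-number limits from the text
print(class_boundaries("7.8", "8.8"))  # limits in tenths from the text
```

The two calls reproduce the boundaries 57.5–64.5 and 7.75–8.85 given in the text.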
Finally, the class width for a class in a frequency distribution is found by
subtracting the lower (or upper) class limit of one class from the lower (or
upper) class limit of the next class. For example, the class width in the
preceding distribution on the distribution of blood glucose levels is 7, found
from 65 − 58 = 7.
The class width can also be found by subtracting the lower boundary from
the upper boundary for any given class. In this case, 64.5 − 57.5 = 7.
Note: Do not subtract the limits of a single class. It will result in an
incorrect answer.
The researcher must decide how many classes to use and the width of each
class. To construct a frequency distribution, follow these rules:
1. There should be between 5 and 20 classes. Although there is no hard-and-
fast rule for the number of classes contained in a frequency distribution, it
is of utmost importance to have enough classes to present a clear
description of the collected data.
2. It is preferable but not absolutely necessary that the class width be an
odd number. This ensures that the midpoint of each class has the same
place value as the data. The class midpoint Xm is obtained by adding the
lower and upper boundaries and dividing by 2, or adding the lower and
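upper limits and dividing by 2:

\[ X_m = \frac{\text{lower boundary} + \text{upper boundary}}{2} \]

or

\[ X_m = \frac{\text{lower limit} + \text{upper limit}}{2} \]

For example, the midpoint of the first class in the example with glucose
levels is

\[ X_m = \frac{57.5 + 64.5}{2} = 61 \quad \text{or} \quad X_m = \frac{58 + 64}{2} = 61 \]

The midpoint is the numeric location of the center of the class. Midpoints
are necessary for graphing (see Section 2–2). If the class width is an even
number, the midpoint is in tenths. For example, if the class width is 6 and
the boundaries are 5.5 and 11.5, the midpoint is

\[ X_m = \frac{5.5 + 11.5}{2} = \frac{17}{2} = 8.5 \]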

Rule 2 is only a suggestion, and it is not rigorously followed, especially
when a computer is used to group data.
3. The classes must be mutually exclusive. Mutually exclusive classes have
nonoverlapping class limits so that data cannot be placed into two classes.
Many times, frequency distributions such as this
are found in the literature or in surveys. Into which class should a 40-
year-old person be placed? A better way to construct a frequency
distribution is to use classes such as

Recall that boundaries are mutually exclusive. For example, when a class
boundary is 5.5 to 10.5, the data values that are included in that class are
values from 6 to 10. A data value of 5 goes into the previous class, and a
data value of 11 goes into the next-higher class.
4. The classes must be continuous. Even if there are no values in a class, the
class must be included in the frequency distribution. There should be no
gaps in a frequency distribution. The only exception occurs when the
class with a zero frequency is the first or last class. A class with a zero
frequency at either end can be omitted without affecting the distribution.
5. The classes must be exhaustive. There should be enough classes to
accommodate all the data.
6. The classes must be equal in width. This avoids a distorted view of the
data.
One exception occurs when a distribution has a class that is open-
ended. That is, the first class has no specific lower limit, or the last class
has no specific upper limit. A frequency distribution with an open-ended
class is called an open-ended distribution. Here are two examples of
distributions with open-ended classes.

The frequency distribution for age is open-ended for the last class, which
means that anybody who is 54 years or older will be tallied in the last
class. The distribution for minutes is open-ended for the first class,
meaning that any minute values below 110 will be tallied in that class.

The steps for constructing a grouped frequency distribution are
summarized in the following Procedure Table.

Procedure Table
Constructing a Grouped Frequency Distribution
Step 1 Determine the classes.
Find the highest and lowest values.
Find the range.
Select the number of classes desired.
Find the width by dividing the range by the number of classes
and rounding up.
Select a starting point (usually the lowest value or any
convenient number less than the lowest value); add the width
to get the lower limits.
Find the upper class limits.
Find the boundaries.
Step 2 Tally the data.
Step 3 Find the numerical frequencies from the tallies, and find the
cumulative frequencies.
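
The steps in the Procedure Table can be sketched in Python as follows. This is one possible reading of the procedure, for whole-number data only; the function name `grouped_distribution` is hypothetical, and the 16 data values are made up (chosen so that the lowest and highest values are 100 and 134, as in Example 2–2). They are not the actual record high temperatures.

```python
import math

def grouped_distribution(data, num_classes, start=None):
    """Return (lower limit, upper limit, frequency) for whole-number data."""
    low, high = min(data), max(data)
    data_range = high - low
    # Round UP (not off): any decimal remainder moves to the next whole number.
    width = math.ceil(data_range / num_classes)
    start = low if start is None else start
    lower_limits = [start + i * width for i in range(num_classes)]
    # If the division left no remainder, the highest value does not fit,
    # so one extra class is added to accommodate all the data.
    if lower_limits[-1] + width - 1 < high:
        lower_limits.append(lower_limits[-1] + width)
    classes = [(lo, lo + width - 1) for lo in lower_limits]
    # Tally: count the data values that fall within each pair of limits.
    return [(lo, up, sum(lo <= x <= up for x in data)) for lo, up in classes]

# Hypothetical data, made up for illustration only.
data = [100, 103, 107, 108, 110, 111, 112, 113, 114,
        116, 118, 119, 121, 124, 127, 134]
table = grouped_distribution(data, 7)

for lo, up, f in table:
    # Boundaries are half a unit below and above the limits.
    print(f"{lo}-{up}   {lo - 0.5}-{up + 0.5}   {f}")
```

With a range of 34 and 7 classes, the width is 34 ÷ 7 rounded up to 5, so the classes are 100–104, 105–109, and so on.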

Example 2–2 shows the procedure for constructing a grouped frequency
distribution, i.e., when the classes contain more than one data value.

Unusual Stats
America’s most popular beverages are soft drinks. It is estimated that, on average, each
person drinks about 52 gallons of soft drinks per year, compared to 22 gallons of beer.

EXAMPLE 2–2 Record High Temperatures


These data represent the record high temperatures in degrees Fahrenheit
(°F) for each of the 50 states. Construct a grouped frequency distribution
for the data, using 7 classes.

Source: The World Almanac and Book of Facts.

SOLUTION

The procedure for constructing a grouped frequency distribution for
numerical data follows.
Step 1 Determine the classes.
Find the highest value and lowest value: H = 134 and L = 100.
Find the range: R = highest value − lowest value = H − L, so

\[ R = 134 - 100 = 34 \]

Select the number of classes desired (usually between 5 and 20).
In this case, 7 is arbitrarily chosen.
Find the class width by dividing the range by the number of
classes.

\[ \text{Width} = \frac{R}{\text{number of classes}} = \frac{34}{7} \approx 4.9 \]

Round the answer up to the nearest whole number if
there is a remainder: 4.9 ≈ 5. (Rounding up is different
from rounding off. A number is rounded up if there is any
decimal remainder when dividing. For example, 85 ÷ 6 = 14.167
and is rounded up to 15. Also, 53 ÷ 4 = 13.25 and is rounded up
to 14. Also, after dividing, if there is no remainder, you will
need to add an extra class to accommodate all the data.)
Select a starting point for the lowest class limit. This can be the
smallest data value or any convenient number less than the
smallest data value. In this case, 100 is used. Add the width to
the lowest score taken as the starting point to get the lower limit
of the next class. Keep adding until there are 7 classes, as
shown, 100, 105, 110, etc.
Subtract one unit from the lower limit of the second class to get
the upper limit of the first class. Then add the width to each
upper limit to get all the upper limits.

The first class is 100–104, the second class is 105–109, etc.
Find the class boundaries by subtracting 0.5 from each lower
class limit and adding 0.5 to each upper class limit:
99.5–104.5, 104.5–109.5, etc.

Step 2 Tally the data.


Step 3 Find the numerical frequencies from the tallies.
The completed frequency distribution is

The frequency distribution shows that the class 109.5–114.5
contains the largest number of temperatures (18) followed by
the class 114.5–119.5 with 13 temperatures. Hence, most of the
temperatures (31) fall between 110 and 119°F.
Historical Note
Florence Nightingale, a nurse in the Crimean War in 1854, used statistics to persuade
government officials to improve hospital care of soldiers in order to reduce the death rate
from unsanitary conditions in the military hospitals that cared for the wounded soldiers.

Sometimes it is necessary to use a cumulative frequency distribution. A
cumulative frequency distribution is a distribution that shows the number
of data values less than or equal to a specific value (usually an upper
boundary). The values are found by adding the frequencies of the classes less
than or equal to the upper class boundary of a specific class. This gives an
ascending cumulative frequency. In this example, the cumulative frequency
for the first class is 0 + 2 = 2; for the second class it is 0 + 2 + 8 = 10; for the
third class it is 0 + 2 + 8 + 18 = 28. Naturally, a shorter way to do this would
be to just add the cumulative frequency of the class below to the frequency of
the given class. For example, the cumulative frequency for the number of
data values less than 114.5 can be found by adding 10 + 18 = 28. The
cumulative frequency distribution for the data in this example is as follows:


Cumulative frequencies are used to show how many data values are
accumulated up to and including a specific class. In Example 2–2, of the total
record high temperatures 28 are less than or equal to 114°F. Forty-eight of the
total record high temperatures are less than or equal to 124°F.
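
The running totals can be reproduced with a few lines of Python. This sketch uses only the first four class frequencies quoted in the text (2, 8, 18, and 13) together with their upper class boundaries; the variable names are chosen for illustration.

```python
from itertools import accumulate

upper_boundaries = [104.5, 109.5, 114.5, 119.5]
frequencies = [2, 8, 18, 13]        # first four classes, as quoted in the text

# Each cumulative frequency is the previous total plus the class frequency.
cumulative = list(accumulate(frequencies))

for boundary, total in zip(upper_boundaries, cumulative):
    print(f"Less than {boundary}: {total}")
```

The third line of output shows 28 values less than 114.5, which agrees with 10 + 18 = 28.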
After the raw data have been organized into a frequency distribution, the
distribution is analyzed by looking for peaks and extreme values. The peaks
show which class or classes have the most data values compared to the other
classes. Extreme values, called outliers, are data values that are extremely
large or small relative to the other data values.
When the range of the data values is relatively small, a frequency
distribution can be constructed using single data values for each class. This
type of distribution is called an ungrouped frequency distribution and is
shown next.

EXAMPLE 2–3 Hours Worked by Students


The data show the number of hours 30 students worked part-time on
campus for one specific day last week. Construct an ungrouped frequency
distribution for the data and analyze the distribution.

SOLUTION
Step 1 Determine the number of classes. Since the range is small (10 –
3 = 7), classes consisting of a single data value can be used.
They are 3, 4, 5, 6, 7, 8, 9, 10.
Note: If the data are continuous, class boundaries can be used.
Subtract 0.5 from each class value to get the lower class
boundary, and add 0.5 to each class value to get the upper class
boundary.
Step 2 Tally the data.
Step 3 From the tallies, find the numerical frequencies and cumulative
frequencies. The completed ungrouped frequency distribution is
shown.

In this case, eight students worked 4 hours and this was the
largest frequency. The cumulative frequencies are

When you are constructing a frequency distribution, the guidelines
presented in this section should be followed. However, you can construct
several different but correct frequency distributions for the same data by
using a different class width, a different number of classes, or a different
starting point.

Interesting Fact
Male dogs bite children more often than female dogs do; however, female cats bite
children more often than male cats do.

Furthermore, the method shown here for constructing a frequency
distribution is not unique, and there are other ways of constructing one. Slight
variations exist, especially in computer packages. But regardless of what
methods are used, classes should be mutually exclusive, continuous,
exhaustive, and of equal width.
In summary, the different types of frequency distributions were shown in
this section. The first type, shown in Example 2–1, is used when the data are
categorical (nominal), such as blood type or political affiliation. This type is
called a categorical frequency distribution. The second type of distribution is
used when the range is large and classes several units in width are needed.
This type is called a grouped frequency distribution and is shown in Example
2–2. Another type of distribution is used for numerical data and when the
range of data is small, as shown in Example 2–3. Since each class is only one
unit, this distribution is called an ungrouped frequency distribution.
All the different types of distributions are used in statistics and are helpful
when one is organizing and presenting data.
The reasons for constructing a frequency distribution are as follows:
1. To organize the data in a meaningful, intelligible way.
2. To enable the reader to determine the nature or shape of the distribution.
3. To facilitate computational procedures for measures of average and
spread (shown in Sections 3–1 and 3–2).
4. To enable the researcher to draw charts and graphs for the presentation of
data (shown in Section 2–2).
5. To enable the reader to make comparisons among different data sets.
The factors used to analyze a frequency distribution are essentially the
same as those used to analyze histograms and frequency polygons, which are
shown in Section 2–2.

Applying the Concepts 2–1
Ages of Presidents at Inauguration
The data represent the ages of our Presidents at the time they were first
inaugurated.
1. Were the data obtained from a population or a sample? Explain your
answer.
2. What was the age of the oldest President?
3. What was the age of the youngest President?
4. Construct a frequency distribution for the data. (Use your own
judgment as to the number of classes and class size.)
5. Are there any peaks in the distribution?
6. Identify any possible outliers.
7. Write a brief summary of the nature of the data as shown in the
frequency distribution.

See pages 109–110 for the answers.

Exercises 2–1
1. List five reasons for organizing data into a frequency distribution.
2. Name the three types of frequency distributions, and explain when
each should be used.
3. How many classes should frequency distributions have? Why should
the class width be an odd number?
4. What are open-ended frequency distributions? Why are they
necessary?

For Exercises 5–8, find the class boundaries, midpoints, and widths for
each class.
5. 58–62
6. 125–131
7. 16.35–18.46
8. 16.3–18.5

Exercises 9–12 show frequency distributions that are incorrectly
constructed. State the reasons why they are wrong.

13. Household Pets The following data show the household pets that
people own: F = fish, D = dog, B = bird, C = cat, and R = reptile.
Construct a categorical frequency distribution for the data and
summarize the results.

Source: Based on information from the National Pet Owners’ Society

14. Trust in Internet Information A survey was taken on how much trust
people place in the information they read on the Internet. Construct a
categorical frequency distribution for the data. A = trust in all that they
read, M = trust in most of what they read, H = trust in about one-half of
what they read, S = trust in a small portion of what they read. (Based
on information from the UCLA Internet Report.)

15. Eating at Fast Food Restaurants A survey was taken of 50
individuals. They were asked how many days per week they ate at a
fast-food restaurant. Construct a frequency distribution using 8 classes
(0–7). Based on the distribution, how often did most people eat at a
fast-food restaurant?

16. Ages of Dogs The ages of 20 dogs in a pet shelter are shown. Construct
a frequency distribution using 7 classes.

17. Wind Speed The data show the maximum wind speeds (in miles per
hour) at selected cities in the United States. Construct a frequency
distribution for the data using eight classes. Summarize the results.
Source: The World Almanac and Book of Facts

18. Stories in the World’s Tallest Buildings The number of stories in
each of a sample of the world’s 30 tallest buildings follows. Construct
a grouped frequency distribution and a cumulative frequency
distribution with 7 classes.

Source: New York Times Almanac.

19. Rabies Virus Cases The data show the ages of 22 people who had
contracted the rabies virus for a certain year. Construct a frequency
distribution for the data and summarize the results. Use five classes.

Source: CDC U.S. Government

20. School Districts in States The data show the number of school
districts in each state in the United States. Construct a frequency
distribution using six classes for the data and summarize the results.

Source: The World Almanac and Book of Facts

21. World Temperatures The following data show the high temperatures
for May 8 in 22 selected cities around the world. Construct a frequency
distribution for the data. Use five classes.
Source: USA TODAY

22. Sports-Related Surgery The following data show the ages of 50
patients who had sports-related surgeries. Construct a frequency
distribution for the data using five classes and summarize the results.

23. Days Worked by Medical Doctors A survey of 32 medical doctors
shows the number of days that they worked in a recent year. Construct
a frequency distribution for the data using six classes. Summarize the
results.

24. Ranges of Tides The data show the mean ranges for tides in
coastal cities in the United States. The ranges represent the
difference of the mean high water mark and the mean low water mark.
Construct a frequency distribution using six classes. Summarize the
results.

Source: The World Almanac and Book of Facts

25. Average Wind Speeds A sample of 40 large cities was selected, and
the average of the wind speeds was computed for each city over one
year. Construct a frequency distribution, using 7 classes.
Source: World Almanac and Book of Facts.

26. Percentage of People Who Completed 4 or More Years of College
Listed by state are the percentages of the population who have
completed 4 or more years of a college education. Construct a
frequency distribution with 7 classes.

Source: New York Times Almanac.

Extending the Concepts


27. JFK Assassination A researcher conducted a survey asking people if
they believed more than one person was involved in the assassination
of John F. Kennedy. The results were as follows: 73% said yes, 19%
said no, and 9% had no opinion. Is there anything suspicious about the
results?

28. The Value of Pi The ratio of the circumference of a circle to its
diameter is known as π (pi). The value of π is an irrational number,
which means that the decimal part goes on forever and there is no fixed
sequence of numbers that repeats. People have found the decimal part
of π to over a million places. We can statistically study the number.
Shown here is the value of π to 40 decimal places. Construct an
ungrouped frequency distribution for the digits. Based on the
distribution, do you think each digit appears equally in the number?
29. One hundred randomly selected people were asked, “Do you think our
President is doing a good job on foreign policy making?” Forty-two
people responded, “Yes,” 42 responded, “No,” and 16 didn’t respond at
all. Why should the 16 people be included in the categorical frequency
distribution?

30. In making a categorical frequency distribution, why should you always
include the total number of subjects in the study at the bottom of the
frequency column?

31. In a categorical frequency distribution, sometimes only percents are
used. What is one advantage of using percent?

32. Sometimes in a categorical frequency distribution, a category such as
“other” is used. For example, on a shelf in a college bookstore there
may be 8 algebra books, 3 statistics books, 5 calculus books, and 4
other types of mathematical books. Why must you be cautious when
using an “other” category in a categorical frequency distribution?

33. In a survey, 50 college students were asked what color automobile they
drive to school. The responses were:

What is wrong with the distribution? Did the researcher make a
mistake or is there some other explanation?

34. A researcher decides to survey 30 college students and asks each one
what their favorite pizza topping is. The researcher wants to use a pie
chart to summarize the data. The categories are pepperoni, beef, green
peppers, olives, tomatoes, mushrooms, pineapple, onions, spinach, and
jalapenos. Why might this not be a good idea?
35. A researcher decides to present the results obtained in a frequency
distribution by combining the five frequencies of adjacent classes, thus
using fewer classes. Why is this not a good idea?

Technology
Step by Step

EXCEL
Step by Step
Categorical Frequency Table (Qualitative or Discrete
Data)
1. In an open workbook, select cell A1 and type in all the beverage data
from Example 2–1 down column A.

2. Type in the variable name Breakfast Beverage in cell C1.


3. Select cell C2 and type in the five different beverage types down the
column.
4. Type in the name Count in cell D1.
5. Select cell D2, then select the Formulas tab on the toolbar.
6. Select the Insert Function icon, then select the Statistical category
in the Insert Function dialog box.
7. Select the COUNTIF function from the function name list.
8. In the dialog box, type A1:A40 in the Range box. Type in the
beverage type “W” in quotes in the Criteria box. The count or
frequency of the number of data corresponding to the beverage type
should appear in cell D2. Repeat for the remaining beverage types in
cells D3-D6.

9. After all the data have been counted, select cell D7 in the worksheet.
10. From the toolbar select Formulas, select the dropdown menu All, then
select Sum and type in D2:D6 to insert the total of the frequencies
into cell D7. If you select each cell, you should see the following
formulas.

After entering data or a heading into a worksheet, you can change
the width of a column to fit the input. To automatically change the
width of a column to fit the data:
1. Select the column or columns that you want to change.
2. On the Home tab, in the Cells group, select Format.
3. Under Cell Size, click AutoFit Column Width.

Making a Grouped Frequency Distribution (Quantitative Data)
1. Press [Ctrl]-N for a new workbook.
2. Enter the raw data from Example 2–2 in column A, one number per
cell. Label the column Temperature.
3. Enter the upper class boundaries in column B. Label the column Bin.
4. From the toolbar select the Data tab, then click Data Analysis.
5. In the Analysis Tools, select Histogram and click [OK].
6. In the Histogram dialog box, type A1:A51 in the Input Range box and
type B1:B8 in the Bin Range box.
7. Check the Labels box. Note: Do not check this box unless your labels
are selected in the input range.
8. Select the radio button next to Output range, then type in D1 to the
output range.
9. Check the Cumulative Percentage option. Click [OK]. Note: By
leaving the Chart Output unchecked, a new worksheet will display the
table only.
10. You can change the label for the column containing the upper class
boundaries and expand the width of the columns automatically after
relabeling:
Select the Home tab from the toolbar.
Highlight the columns that you want to change.
Select Format, then AutoFit Column Width.
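
For readers without Excel, the counting that the Histogram tool performs can be approximated in Python. The sketch below assumes that a value is counted in the first bin whose upper boundary is greater than or equal to the value, with one extra slot for values above every boundary; this is an assumption about the tool's behavior, not a documented specification. The function name `bin_counts` is hypothetical, the upper boundaries follow from the classes 100–104 through 130–134, and the 16 data values are made up.

```python
from bisect import bisect_left
from itertools import accumulate

def bin_counts(values, upper_boundaries):
    """Count values per bin; the last slot holds values above all boundaries."""
    counts = [0] * (len(upper_boundaries) + 1)
    for x in values:
        # Index of the first boundary that is >= x.
        counts[bisect_left(upper_boundaries, x)] += 1
    return counts

bins = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
# Hypothetical data, made up for illustration only.
values = [100, 103, 107, 108, 110, 111, 112, 113, 114,
          116, 118, 119, 121, 124, 127, 134]

counts = bin_counts(values, bins)
cumulative_pct = [100 * c / len(values) for c in accumulate(counts)]

for label, f, pct in zip(bins + ["More"], counts, cumulative_pct):
    print(f"{label}   {f}   {pct:.2f}%")
```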

MINITAB
Step by Step

Make a Categorical Frequency Table (Qualitative or Discrete Data)
1. Type in all the beverage types from Example 2–1 down C1 of the
worksheet. Note as you type in text the column header will change to
C1-T to signify this column is text instead of numeric.

2. Click above row 1 and name the column Beverage.


3. Select Stat>Tables>Tally Individual Values.
4. The cursor should be blinking in the Variables dialog box. If not, click
inside the dialog box. Double-click C1 in the Variables list.
5. Check the boxes for the statistics: Counts, Percents, Cumulative
counts, and Cumulative percents.

6. Click [OK]. The results will be displayed in the Session
Window as shown.

Make a Grouped Frequency Distribution (Quantitative Variable)
1. Select File>New>Minitab Worksheet. A new worksheet will be
added to the project.
2. Type the data used in Example 2–2 into C1. Name the column
Temperatures.
3. Use the instructions in the textbook to determine the class limits of
100 to 134 in increments of 5.
In the next step you will create a new column of data, converting the
numeric variable to text categories that can be tallied.
4. Select Data>Code>Numeric to Text.
a) The cursor should be blinking in Code data from columns. If not,
click inside the box, then double-click C1 Temperatures in the
list. Only quantitative variables will be shown in this list.
b) Click inside the Store coded data in columns: box and
type in C2.
c) Press [Tab] to move to the table.
d) Type 100:104 in the original values, press [Tab], type 100–104 in
the New: column.
e) Press [Tab] to move to the row, and type the next category 105:109
and 105–109.
f) Continue to tab to each dialog box, typing the lower endpoint and
upper endpoint and then the category until the last category has
been entered.
The dialog box should look like the one shown.

5. Click [OK]. In the worksheet, a new column of data will be created in


column C2. This new variable will contain the category for each value
in C1. The column C2-T contains alphanumeric data.

6. Click Stat>Tables>Tally Individual Values, then double-click
Recoded Temperatures in the Variables list.
a) Check the boxes for the desired statistics, such as Counts, Percents,
Cumulative counts, and Cumulative percents.
b) Click [OK].
The table will be displayed in the Session Window. Eighteen states have
high temperatures between 110 and 114°F. Eighty-two percent of the
states have record high temperatures less than or equal to 119°F.
7. Click File>Save Project As . . . , and type the name of the project
file, Ch2-1. This will save the two worksheets and the Session
Window.
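
The recode-and-tally steps above can also be sketched outside Minitab. The short Python example below is only an illustration: the function names and the eight sample temperatures are made up for the sketch (they are not the Example 2–2 data), and the class limits 100–104 through 130–134 follow step 3.

```python
from collections import Counter

def make_classes(low, high, width):
    """Return (lower, upper) class limits, e.g. (100, 104), (105, 109), ..."""
    return [(start, start + width - 1) for start in range(low, high + 1, width)]

def recode(value, classes):
    """Convert a numeric value to its text category (like Numeric to Text)."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return f"{lower}-{upper}"
    raise ValueError(f"{value} is outside every class")

def tally(values, classes):
    """Return rows of (category, count, percent, cum. count, cum. percent)."""
    counts = Counter(recode(v, classes) for v in values)
    rows, running = [], 0
    for lower, upper in classes:
        label = f"{lower}-{upper}"
        count = counts.get(label, 0)
        running += count
        rows.append((label, count, 100 * count / len(values),
                     running, 100 * running / len(values)))
    return rows

classes = make_classes(100, 134, 5)
sample = [112, 100, 127, 120, 134, 118, 105, 110]  # hypothetical values
for row in tally(sample, classes):
    print(row)
```

Each row mirrors one line of the Tally output: the category, its count and percent, and the running (cumulative) count and percent.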

2–2 Histograms, Frequency Polygons, and Ogives
After you have organized the data into a frequency distribution, you can
present them in graphical form. The purpose of graphs in statistics is to
convey the data to the viewers in pictorial form. It is easier for most people to
comprehend the meaning of data presented graphically than data presented
numerically in tables or frequency distributions. This is especially true if the
users have little or no statistical knowledge.

OBJECTIVE 2
Represent data in frequency distributions graphically, using histograms, frequency polygons,
and ogives.

Statistical graphs can be used to describe the data set or to analyze it.
Graphs are also useful in getting the audience’s attention in a publication or a
speaking presentation. They can be used to discuss an issue, reinforce a
critical point, or summarize a data set. They can also be used to discover a
trend or pattern in a situation over a period of time.
The three most commonly used graphs in research are
1. The histogram.
2. The frequency polygon.
3. The cumulative frequency graph, or ogive (pronounced o-jive).

The steps for constructing the histogram, frequency polygon, and the ogive
are summarized in the procedure table.

Procedure Table
Constructing a Histogram, Frequency Polygon, and Ogive
Step 1 Draw and label the x and y axes.
Step 2 On the x axis, label the class boundaries of the frequency
distribution for the histogram and ogive. Label the midpoints for
the frequency polygon.
Step 3 Plot the frequencies for each class, and draw the vertical bars for
the histogram and the lines for the frequency polygon and ogive.
(Note: Remember that the lines for the frequency polygon begin and end
on the x axis while the lines for the ogive begin on the x axis.)

The Histogram

Historical Note
Karl Pearson introduced the histogram in 1891. He used it to show time concepts of
various reigns of Prime Ministers.

EXAMPLE 2–4 Record High Temperatures


Construct a histogram to represent the data shown for the record high
temperatures for each of the 50 states (see Example 2–2).

SOLUTION
Step 1 Draw and label the x and y axes. The x axis is always the
horizontal axis, and the y axis is always the vertical axis.
Step 2 Represent the frequency on the y axis and the class boundaries
on the x axis.
Step 3 Using the frequencies as the heights, draw vertical bars for each
class. See Figure 2–1.
FIGURE 2–1 Histogram for Example 2–4

As the histogram shows, the class with the greatest number of
data values (18) is 109.5–114.5, followed by 13 for 114.5–
119.5. The graph also has one peak with the data clustering
around it.
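
For readers who want to check the bar heights without drawing the figure, here is a minimal Python sketch that prints a text version of the histogram. The class limits come from Example 2–2. The frequencies are an assumption: the Example 2–2 table is not reproduced in this section, so the values below are taken to be that table's frequencies and agree with the 18 and 13 quoted above and with the total of 50 states. The function names are invented for the sketch.

```python
# Lower class limits from Example 2-2 (classes 100-104, ..., 130-134).
lower_limits = [100, 105, 110, 115, 120, 125, 130]
# ASSUMPTION: frequencies taken to be those of Example 2-2 (table not shown here).
frequencies = [2, 8, 18, 13, 7, 1, 1]
width = 5

def class_boundaries(lower_limit, width):
    """Return (lower boundary, upper boundary) for integer class limits."""
    return (lower_limit - 0.5, lower_limit + width - 0.5)

def text_histogram(lower_limits, frequencies, width):
    """Build one text bar per class; the bar length equals the frequency."""
    lines = []
    for limit, freq in zip(lower_limits, frequencies):
        low, high = class_boundaries(limit, width)
        lines.append(f"{low:5.1f}-{high:5.1f} | {'#' * freq} ({freq})")
    return lines

for line in text_histogram(lower_limits, frequencies, width):
    print(line)
```

The bars are drawn sideways here, but the idea is the same as in Figure 2–1: class boundaries label the classes and the frequency sets the length of each bar.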

Historical Note
Graphs originated when ancient astronomers drew the position of the stars in the heavens.
Roman surveyors also used coordinates to locate landmarks on their maps.
The development of statistical graphs can be traced to William Playfair (1759–1823), an
engineer and drafter who used graphs to present economic data pictorially.

The Frequency Polygon


Another way to represent the same data set is by using a frequency polygon.

Example 2–5 shows the procedure for constructing a frequency
polygon. Be sure to begin and end on the x axis.
EXAMPLE 2–5 Record High Temperatures
Using the frequency distribution given in Example 2–4, construct a
frequency polygon.
SOLUTION
Step 1 Find the midpoints of each class. Recall that midpoints are found
by adding the upper and lower boundaries and dividing by 2:
(99.5 + 104.5)/2 = 102     (104.5 + 109.5)/2 = 107
and so on. The midpoints are 102, 107, 112, 117, 122, 127, and 132.

Step 2 Draw the x and y axes. Label the x axis with the midpoint of
each class, and then use a suitable scale on the y axis for the
frequencies.
Step 3 Using the midpoints for the x values and the frequencies as the y
values, plot the points.
Step 4 Connect adjacent points with line segments. Draw a line back to
the x axis at the beginning and end of the graph, at the same
distance that the previous and next midpoints would be located,
as shown in Figure 2–2.
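
The four steps can be sketched in Python as a list of points to plot. This is only an illustration: the function names are invented, and the frequencies are assumed to be those of Example 2–2 (the table is not reproduced in this section). The class boundaries run from 99.5 to 134.5 in steps of 5.

```python
def midpoints(boundaries):
    """Midpoint of each class: (lower boundary + upper boundary) / 2."""
    return [(low + high) / 2 for low, high in zip(boundaries, boundaries[1:])]

def polygon_points(boundaries, frequencies):
    """Return (x, y) points for a frequency polygon.

    Extra points with frequency 0 are added one class width before the
    first midpoint and one class width after the last midpoint, so the
    polygon begins and ends on the x axis.
    """
    mids = midpoints(boundaries)
    width = boundaries[1] - boundaries[0]
    points = [(mids[0] - width, 0)]
    points += list(zip(mids, frequencies))
    points.append((mids[-1] + width, 0))
    return points

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]  # ASSUMPTION: Example 2-2 frequencies

for x, y in polygon_points(boundaries, frequencies):
    print(x, y)
```

Connecting the printed points in order with line segments gives the polygon; the first and last points are the anchors on the x axis described in Step 4.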

FIGURE 2–2
Frequency Polygon for Example 2–5

The frequency polygon and the histogram are two different ways to
represent the same data set. The choice of which one to use is left to the
discretion of the researcher.

The Ogive
The third type of graph that can be used represents the cumulative
frequencies for the classes. This type of graph is called the cumulative
frequency graph, or ogive. The cumulative frequency is the sum of the
frequencies accumulated up to the upper boundary of a class in the
distribution.

Example 2–6 shows the procedure for constructing an ogive. Be sure to
start on the x axis.

EXAMPLE 2–6 Record High Temperatures


Construct an ogive for the frequency distribution described in
Example 2–4.
SOLUTION
Step 1 Find the cumulative frequency for each class.

Step 2 Draw the x and y axes. Label the x axis with the class
boundaries. Use an appropriate scale for the y axis to represent
the cumulative frequencies. (Depending on the numbers in the
cumulative frequency columns, scales such as 0, 1, 2, 3, . . . , or
5, 10, 15, 20, . . . , or 1000, 2000, 3000, . . . can be used. Do not
label the y axis with the numbers in the cumulative frequency
column.) In this example, a scale of 0, 5, 10, 15, . . . will be
used.
Step 3 Plot the cumulative frequency at each upper class boundary, as
shown in Figure 2–3. Upper boundaries are used since the
cumulative frequencies represent the number of data values
accumulated up to the upper boundary of each class.
Step 4 Starting with the first upper class boundary, 104.5, connect
adjacent points with line segments, as shown in Figure 2–4.
Then extend the graph to the first lower class boundary, 99.5, on
the x axis.
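
A minimal Python sketch of Steps 1 through 4 follows. It accumulates the frequencies and pairs each cumulative frequency with its upper class boundary, then adds the starting point on the x axis. The function name is invented, and the frequencies are assumed to be those of Example 2–2, whose table is not reproduced in this section.

```python
from itertools import accumulate

def ogive_points(boundaries, frequencies):
    """Return (x, y) points for an ogive.

    The first point sits on the x axis at the first lower class boundary;
    every later point pairs an upper class boundary with the cumulative
    frequency accumulated up to that boundary.
    """
    cumulative = list(accumulate(frequencies))
    return [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]  # ASSUMPTION: Example 2-2 frequencies

points = ogive_points(boundaries, frequencies)
for x, y in points:
    print(x, y)
```

The last cumulative frequency equals the total number of data values, which is a quick check that no class was skipped.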

FIGURE 2–3
Plotting the Cumulative Frequency for Example 2–6

FIGURE 2–4
Ogive for Example 2–6

Unusual Stats
Twenty-two percent of Americans sleep 6 hours a day or less.

Cumulative frequency graphs are used to visually represent how many
values are below a certain upper class boundary. For example, to find out
how many record high temperatures are less than 114.5°F, locate 114.5°F on
the x axis, draw a vertical line up until it intersects the graph, and then draw a
horizontal line at that point to the y axis. The y axis value is 28, as shown in
Figure 2–5.
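
Reading a value off the ogive can be imitated in Python by interpolating along the line segments. The sketch below is an illustration, not part of the textbook procedure: the function name is invented, the plotted points assume the Example 2–2 frequencies, and any value read between two class boundaries is only an estimate, since it assumes the data are spread evenly within the class.

```python
# Ogive points (class boundary, cumulative frequency).
# ASSUMPTION: based on the Example 2-2 frequencies, not shown in this section.
points = [(99.5, 0), (104.5, 2), (109.5, 10), (114.5, 28),
          (119.5, 41), (124.5, 48), (129.5, 49), (134.5, 50)]

def read_ogive(points, x):
    """Cumulative frequency at x, by linear interpolation along the ogive."""
    if x <= points[0][0]:
        return 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return float(points[-1][1])  # x lies beyond the last boundary

print(read_ogive(points, 114.5))  # value at an upper class boundary
```

At a class boundary such as 114.5 the function returns the plotted cumulative frequency itself, which matches the value read from the graph.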

Relative Frequency Graphs


The histogram, the frequency polygon, and the ogive shown previously were
constructed by using frequencies in terms of the raw data. These distributions
can be converted to distributions using proportions instead of raw data as
frequencies. These types of graphs are called relative frequency graphs.
Graphs of relative frequencies instead of frequencies are used when the
proportion of data values that fall into a given class is more important than
the actual number of data values that fall into that class. For example, if you
wanted to compare the age distribution of adults in Philadelphia,
Pennsylvania, with the age distribution of adults of Erie, Pennsylvania, you
would use relative frequency distributions. The reason is that since the
population of Philadelphia is 1,526,006 and the population of Erie is 101,786,
the bars using the actual data values for Philadelphia would be much taller
than those for the same classes for Erie.
To convert a frequency into a proportion or relative frequency, divide the
frequency for each class by the total of the frequencies. The sum of the
relative frequencies will always be 1. These graphs are similar to the ones
that use raw data as frequencies, but the values on the y axis are in terms of
proportions. Example 2–7 shows the three types of relative frequency graphs.

FIGURE 2–5
Finding a Specific Cumulative Frequency


EXAMPLE 2–7 Net Worth of Small Businesses
The following frequency distribution shows the approximate net worth in
thousands of dollars of 50 small businesses in a city. Draw a relative
histogram, relative frequency polygon, and a relative ogive for the data.

SOLUTION
Step 1 Convert each frequency to a proportion or relative frequency by
dividing the frequency for each class by the total number of
observations.
For the class 35.5–42.5, the relative frequency is 8/50 = 0.16. For
the class 42.5–49.5, the relative frequency is 14/50 = 0.28. For the
class 49.5–56.5, the relative frequency is 20/50 = 0.40. For the class
56.5–63.5, the relative frequency is 5/50 = 0.10. For the class
63.5–70.5, the relative frequency is 3/50 = 0.06.
Place these values in the column labeled Relative frequency.
Also, find the midpoints, as shown in Example 2–5, for each
class and place them in the Midpoints column.

Step 2 Find the cumulative relative frequencies. To do this, add the
relative frequency in each class to the cumulative relative frequency
of the preceding class. In this case, 0.00 + 0.16 = 0.16, 0.16 + 0.28 = 0.44, 0.44 +
0.40 = 0.84, 0.84 + 0.10 = 0.94, and 0.94 + 0.06 = 1.00. Place
these values in a column labeled Cumulative relative frequency.
An alternative method would be to change the cumulative
frequencies for the classes to relative frequencies. (Divide each
by the total.)

Step 3 Draw each graph as shown in Figure 2–6. For the
histogram and ogive, use the class boundaries along the
x axis. For the frequency polygon, use the midpoints on the x axis.
For the scale on the y axis, use proportions.
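
The arithmetic in Steps 1 and 2 can be checked with a short Python sketch. The function names are invented for the illustration. The class frequencies are an assumption: the frequency table is not reproduced here, so they were recovered by multiplying each relative frequency in Step 1 by the 50 businesses.

```python
from itertools import accumulate

boundaries = [35.5, 42.5, 49.5, 56.5, 63.5, 70.5]
# ASSUMPTION: recovered as (relative frequency in Step 1) x 50 businesses.
frequencies = [8, 14, 20, 5, 3]

def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

def cumulative_relative_frequencies(frequencies):
    """Alternative method from Step 2: divide each cumulative frequency by the total."""
    total = sum(frequencies)
    return [c / total for c in accumulate(frequencies)]

def midpoints(boundaries):
    """Midpoint of each class, used on the x axis of the frequency polygon."""
    return [(low + high) / 2 for low, high in zip(boundaries, boundaries[1:])]

print(relative_frequencies(frequencies))
print(cumulative_relative_frequencies(frequencies))
print(midpoints(boundaries))
```

The sketch uses the alternative method (cumulative counts divided by the total) because it avoids the small rounding drift that can appear when decimal proportions are added one at a time on a computer.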

FIGURE 2–6
Graphs for Example 2–7

Distribution Shapes
When one is describing data, it is important to be able to recognize the shapes
of the distribution values. In later chapters, you will see that the shape of a
distribution also determines the appropriate statistical methods used to
analyze the data.

FIGURE 2–7
Distribution Shapes

A distribution can have many shapes, and one method of analyzing a
distribution is to draw a histogram or frequency polygon for the distribution.
Several of the most common shapes are shown in Figure 2–7: the bell-shaped
or mound-shaped, the uniform-shaped, the J-shaped, the reverse J-shaped,
the positively or right-skewed shape, the negatively or left-skewed shape, the
bimodal-shaped, and the U-shaped.
Distributions are most often not perfectly shaped, so it is not necessary to
have an exact shape but rather to identify an overall pattern.
A bell-shaped distribution shown in Figure 2–7(a) has a single peak and
tapers off at either end. It is approximately symmetric; i.e., it is roughly the
same on both sides of a line running through the center.
A uniform distribution is basically flat or rectangular. See Figure 2–7(b).
A J-shaped distribution is shown in Figure 2–7(c), and it has a few data
values on the left side and increases as one moves to the right. A reverse J-
shaped distribution is the opposite of the J-shaped distribution. See Figure 2–
7(d).

When the peak of a distribution is to the left and the data values
taper off to the right, a distribution is said to be positively or
right-skewed. See Figure 2–7(e). When the data values are clustered to the right
and taper off to the left, a distribution is said to be negatively or left-skewed.
See Figure 2–7(f). Skewness will be explained in detail in Chapter 3.
Distributions with one peak, such as those shown in Figure 2–7(a), (e), and
(f), are said to be unimodal. (The highest peak of a distribution indicates
where the mode of the data values is. The mode is the data value that occurs
more often than any other data value. Modes are explained in Chapter 3.)
When a distribution has two peaks of the same height, it is said to be
bimodal. See Figure 2–7(g). Finally, the graph shown in Figure 2–7(h) is a U-
shaped distribution.
Distributions can have other shapes in addition to the ones shown here;
however, these are some of the more common ones that you will encounter in
analyzing data.
When you are analyzing histograms and frequency polygons, look at the
shape of the curve. For example, does it have one peak or two peaks? Is it
relatively flat, or is it U-shaped? Are the data values spread out on the graph,
or are they clustered around the center? Are there data values in the extreme
ends? These may be outliers. (See Section 3–3 for an explanation of outliers.)
Are there any gaps in the histogram, or does the frequency polygon touch the
x axis somewhere other than at the ends? Finally, are the data clustered at one
end or the other, indicating a skewed distribution?
For example, the histogram for the record high temperatures in Figure 2–1
shows a single peaked distribution, with the class 109.5–114.5 containing the
largest number of temperatures. The distribution has no gaps, and there are
fewer temperatures in the highest class than in the lowest class.
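
Two of the questions above (how many peaks, and are there gaps) can be asked of a list of class frequencies with a few lines of Python. This is a rough sketch, not a substitute for looking at the graph: the function names are invented, a "peak" here means a class strictly taller than both of its neighbors, so small bumps also count and flat-topped peaks are missed, and the temperature frequencies are assumed to be those of Example 2–2.

```python
def find_peaks(frequencies):
    """Indexes of classes that are strictly taller than both neighbors."""
    padded = [0] + list(frequencies) + [0]
    return [i - 1 for i in range(1, len(padded) - 1)
            if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]]

def find_gaps(frequencies):
    """Indexes of interior classes whose frequency is 0."""
    return [i for i in range(1, len(frequencies) - 1) if frequencies[i] == 0]

temperatures = [2, 8, 18, 13, 7, 1, 1]   # ASSUMPTION: Example 2-2 frequencies
two_peaked = [3, 9, 4, 2, 9, 3]          # hypothetical bimodal data

print(find_peaks(temperatures), find_gaps(temperatures))
print(find_peaks(two_peaked), find_gaps(two_peaked))
```

For the temperature frequencies the sketch reports a single peak at the third class (109.5–114.5) and no gaps, in line with the description of Figure 2–1.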

Applying the Concepts 2–2


Selling Real Estate
Assume you are a realtor in Bradenton, Florida. You have recently obtained
a listing of the selling prices of the homes that have sold in that area in the
last 6 months. You wish to organize those data so you will be able to
provide potential buyers with useful information. Use the following data to
create a histogram, frequency polygon, and cumulative frequency polygon.

Source: https://www.zillow.com/bradenton-fl/

1. What questions could be answered more easily by looking at the
histogram rather than the listing of home prices?
2. What different questions could be answered more easily by looking at
the frequency polygon rather than the listing of home prices?
3. What different questions could be answered more easily by looking at
the cumulative frequency polygon rather than the listing of home
prices?
4. Are there any extremely large or extremely small data values
compared to the other data values?
5. Which graph displays these extremes the best?
6. Is the distribution skewed?

See page 110 for the answers.

Exercises 2–2
1. Do Students Need Summer Development? For 108 randomly
selected college applicants, the following frequency distribution for
entrance exam scores was obtained. Construct a histogram, frequency
polygon, and ogive for the data. (The data for this exercise will be used
for Exercise 13 in this section.)

Applicants who score above 107 need not enroll in a summer
developmental program. In this group, how many students do not have
to enroll in the developmental program?
2. Bear Kills The number of bears killed in 2014 for 56 counties in
Pennsylvania is shown in the frequency distribution. Construct a
histogram, frequency polygon, and ogive for the data. Comment on the
skewness of the distribution. How many counties had 75 or fewer bears
killed? (The data for this exercise will be used for Exercise 14 of this
section.)

Source: Pennsylvania State Game Commission.

3. Pupils Per Teacher The average number of pupils per teacher in each
state is shown. Construct a grouped frequency distribution with 6
classes. Draw a histogram, frequency polygon, and ogive. Analyze the
distribution.

Source: U.S. Department of Education.

4. Number of College Faculty The number of faculty listed for a sample
of private colleges that offer only bachelor’s degrees is listed below.
Use these data to construct a frequency distribution with 7 classes, a
histogram, a frequency polygon, and an ogive. Discuss the shape of
this distribution. What proportion of schools have 180 or more faculty?
Source: World Almanac and Book of Facts.

5. Railroad Crossing Accidents The data show the number of railroad
crossing accidents for the 50 states of the United States for a specific
year. Construct a histogram, frequency polygon, and ogive for the data.
Comment on the skewness of the distribution. (The data in this exercise
will be used for Exercise 15 in this section.)

Source: Federal Railroad Administration.

6. NFL Salaries The salaries (in millions of dollars) for 31 NFL teams
for a specific season are given in this frequency distribution.
Construct a histogram, a frequency polygon, and an ogive for the
data; and comment on the shape of the distribution. (The data for this
exercise will be used for Exercise 16 of this section.)

Source: NFL.com

7. Suspension Bridges Spans The following frequency distribution
shows the length (in feet) of the main spans of the longest suspension
bridges in the United States. Construct a histogram, frequency polygon,
and ogive for the distribution. Describe the shape of the distribution.


Source: U.S. Department of Transportation.

8. Time Spent Looking for a Parking Space The following frequency
distribution shows the amount of time (in hours) that workers in a large
city spend each year trying to find a parking space. Draw a histogram,
frequency polygon, and ogive for the data. Summarize the results.
9. Debts of Millennials The following frequency distribution shows the
amount of debt (in thousands of dollars) that millennials have for a
recent year. Draw a histogram, frequency polygon, and ogive for the
data and summarize the results.

10. Making the Grade The frequency distributions shown indicate the
percentages of public school students in fourth-grade reading and
mathematics who performed at or above the required proficiency levels
for the 50 states in the United States. Draw histograms for each, and
decide if there is any difference in the performance of the students in
the subjects.

Source: National Center for Educational Statistics.

11. Places Visited During a Day The following frequency distribution
shows the number of places (stores, service stations, etc.) a person
visits on an average day. Construct a histogram, frequency polygon,
and ogive for the data. Comment on the data.

12. Federal Crime Sentence Lengths The frequency distribution shows
the lengths in months of a random sample of federal racketeering and
extortion sentences given to the guilty persons in a U.S. court. Draw a
histogram, frequency polygon, and ogive for the data.

Source: Based on information in the Sourcebook of Federal Sentencing.

13. Construct a histogram, frequency polygon, and ogive, using relative
frequencies for the data in Exercise 1 of this section.

14. Construct a histogram, frequency polygon, and ogive, using relative
frequencies for the data in Exercise 2 of this section.

15. Construct a histogram, frequency polygon, and ogive, using relative
frequencies for the data in Exercise 5 of this section.

16. Construct a histogram, frequency polygon, and ogive, using relative
frequencies for the data in Exercise 6 of this section.

17. Home Runs The data show the greatest number of home runs hit by a
batter in the American League over the last 30 seasons. Construct a
frequency distribution using 5 classes. Draw a histogram, a frequency
polygon, and an ogive for the data, using relative frequencies. Describe
the shape of the histogram.


Source: World Almanac and Book of Facts.

18. Protein Grams in Fast Food The amount of protein (in grams) for a
variety of fast-food sandwiches is reported here. Construct a frequency
distribution, using 6 classes. Draw a histogram, a frequency polygon,
and an ogive for the data, using relative frequencies. Describe the
shape of the histogram.

Source: The Doctor’s Pocket Calorie, Fat, and Carbohydrate Counter.

Extending the Concepts


19. Using the histogram shown here, do the following.

a. Construct a frequency distribution; include class limits, class
frequencies, midpoints, and cumulative frequencies.
b. Construct a frequency polygon.
c. Construct an ogive.

20. Using the results from Exercise 19, answer these questions.
a. How many values are in the class 27.5–30.5?
b. How many values fall between 24.5 and 36.5?
c. How many values are below 33.5?
d. How many values are above 30.5?

21. Math SAT Scores Shown is an ogive depicting the cumulative
frequency of the average mathematics SAT scores by state. Use it to
construct a histogram and a frequency polygon.

Technology
TI-84 Plus
Step by Step

Constructing a Histogram
To plot the histogram from raw data:
Example TI2–1
Plot a histogram for the following data from Example 2–2.

1. Enter the data in L1.


2. Make sure Window values are appropriate for the histogram.

To display the graphs on the screen, enter the appropriate
values in the calculator, using the WINDOW menu. The
default values are Xmin = −10, Xmax = 10, Ymin = −10, and Ymax = 10.
The Xscl changes the distance between the tick marks on the x axis
and can be used to change the class width for the histogram.
To change the values in the WINDOW:
a. Press WINDOW.
b. Move the cursor to the value that needs to be changed. Then type
in the desired value and press ENTER.
c. Continue until all values are appropriate.
d. Press [2nd] [QUIT] to leave the WINDOW menu.
3. Press [2nd] [STAT PLOT] ENTER.
4. Press ENTER to turn the plot 1 on, if necessary.
5. Move cursor to the Histogram symbol and press ENTER, if
necessary. The histogram is the third option.
6. Make sure Xlist is L1.
7. Make sure Freq is 1.
8. Press GRAPH to display the histogram.
9. To obtain the frequency (number of data values in each class), press
the TRACE key, followed by ◂ or ▸ keys.

To plot the histogram from grouped data:


Example TI2–2
Plot a histogram for the data from Examples 2–4 and 2–5.

1. Enter the midpoints into L1.


2. Enter the frequencies into L2.
3. Make sure Window values are appropriate for the histogram.
4. Press [2nd] [STAT PLOT] ENTER.
5. Press ENTER to turn the plot on, if necessary.
6. Move cursor to the histogram symbol, and press ENTER, if
necessary.
7. Make sure Xlist is L1.
8. Make sure Freq is L2.
9. Press GRAPH to display the histogram.

To plot a frequency polygon from grouped data:


1. Follow the same steps as for the histogram for grouped data, except
change the graph type from histogram (third graph) to a line graph
(second graph).

To plot an ogive from grouped data:


1. Enter the first lower class boundary for the first group, followed by all
the upper class boundaries into L1.
2. In L2, enter 0 for the first value, followed by the cumulative
frequencies.

3. Press [2nd] [STAT PLOT] ENTER.


4. Change the graph type to line (second graph).
5. Press WINDOW. Change the Ymax from the Window menu to the
sample size.

6. Press GRAPH to display the ogive.


EXCEL
Step by Step

Constructing a Histogram
1. Press [Ctrl]-N for a new workbook.
2. Enter the data from Example 2–2 in column A, one number per cell.
Label the column Temperature.
3. Enter the upper boundaries into column B. Label the column Bin.
4. From the toolbar, select the Data tab, then select Data Analysis.
5. In Data Analysis, select Histogram and click [OK].
6. In the Histogram dialog box, type A1:A51 in the Input Range box
and type B1:B8 in the Bin Range box.

7. Check the Labels box. Note: Do not check this box unless your labels
are selected in the input range.
8. Select the radio button next to Output range, then type in D1 to the
output range.
9. Check Chart Output. Click [OK].

Editing the Histogram

To move the vertical bars of the histogram closer together:


1. Right-click one of the bars of the histogram, and select Format Data
Series.

2. Move the Gap Width slider all the way to the left to change the gap
width of the bars in the histogram to 0.
3. To change the label for the horizontal axis:
a. Left-click the mouse over any part of the histogram.
b. Select the Design tab from the toolbar.
c. Select the Add Chart Element tab, Axis Titles and Primary
Horizontal.

Once the Axis Titles text box is selected, you can type in the name of
the variable represented on the horizontal axis.
Select the legend then delete. There is no need for a legend if there
is only one color. You can also change the line border color to see
each bar better. Select the paint bucket icon in the format data series
options. Select Border>Solid Line, then select a different color from
the palette.

Right-click on the label More on the horizontal axis. Choose
the Select Data option. Under the Horizontal Axis Labels,
uncheck the More category. Click [OK].

Here is the finished histogram.

Constructing a Frequency Polygon


1. Press [CTRL]-N for a new workbook.
2. Enter the midpoints of the data from Example 2–2 into column A and
the frequencies into column B, including labels.

Note: Classes with frequency 0 have been added at the beginning and the
end to “anchor” the frequency polygon to the horizontal axis.
3. Press and hold the left mouse button, and drag over the Frequencies
(including the label) from column B.
4. Select the Insert tab from the toolbar and the Line Chart option.

5. Select the 2-D line chart type.

6. We will need to edit the graph so that the midpoints are on the
horizontal axis.
a. Right-click the mouse on any region of the chart.
b. Choose Select Data.
c. Select Edit below the Horizontal (Category) Axis Labels panel on
the right.
d. Press and hold the left mouse button, and drag over the midpoints
(not including the label) for the Axis label range, then click [OK].
e. Click [OK] on the Select Data Source box.

7. Insert labels on both axes.


a. Click the mouse on any region of the graph.
b. Select Chart Elements and then Layout on the toolbar.
c. Select Axis Titles to open the horizontal and vertical axis text
boxes. Then manually type in labels for the axes.

8. Change the chart title so that one can easily see what the graph
represents.
a. Select Chart Elements, Layout from the toolbar.
b. Select Chart Title.
c. Choose one of the options from the Chart Title menu and edit.

Constructing an Ogive
1. To create an ogive, use the upper class boundaries (horizontal axis)
and cumulative frequencies (vertical axis) from the frequency
distribution.
2. Type the upper class boundaries (including a class with frequency 0
before the lowest class to anchor the graph to the horizontal axis) and
corresponding cumulative frequencies into adjacent columns of an
Excel worksheet.
3. Press and hold the left mouse button, and drag over the Cumulative
Frequencies from column B. Select Line Chart, then the 2-D Line
option.

As with the frequency polygon, you can insert labels on the axes and a chart
title for the ogive.

MINITAB
Step by Step

Construct a Histogram
1. Enter the data from Example 2–2, the high temperatures for the 50
states, into C1.
2. Select Graph>Histogram.
3. Select [Simple], then click [OK].
4. Click C1 Temperatures in the Graph variables dialog box, and
label the graph.
5. Click [OK]. A new graph window containing the histogram will open.
6. Click the File menu to print or save the graph.
7. Click File>Exit.
8. Save the project as Example 2-2.mpj.

2–3 Other Types of Graphs


In addition to the histogram, the frequency polygon, and the ogive, several
other types of graphs are often used in statistics. They are the bar graph,
Pareto chart, time series graph, pie graph, and the dotplot. Figure 2–8 shows
an example of each type of graph.

OBJECTIVE 3
Represent data using bar graphs, Pareto charts, time series graphs, pie graphs, and dotplots.

Bar Graphs
When the data are qualitative or categorical, bar graphs can be used to
represent the data. A bar graph can be drawn using either horizontal or
vertical bars.

EXAMPLE 2–8 College Spending for First-Year Students
The table shows the average money spent by first-year college students.
Draw a horizontal and vertical bar graph for the data.

Source: The National Retail Federation.

FIGURE 2–8 Other Types of Graphs Used in Statistics


SOLUTION
Step 1 Draw and label the x and y axes. For the horizontal bar graph
place the frequency scale on the x axis, and for the vertical bar
graph place the frequency scale on the y axis.
Step 2 Draw the bars corresponding to the frequencies. See Figure 2–9.

FIGURE 2–9 Bar Graphs for Example 2–8

The graphs show that first-year college students spend the most on
electronic equipment.

Bar graphs can also be used to compare data for two or more groups. These
types of bar graphs are called compound bar graphs. Consider the following
data for the average times (in hours) that adults in the United States spend
viewing television each week. (Note: It is not necessary to have equal class
sizes in these types of graphs.)

Figure 2–10 shows a bar graph that compares the average time (in hours)
that men and women watch television each week. The comparison is made by
placing the bars for each gender next to each other; then the heights of the
bars can be compared.
The graph shows that people spend more time watching television as they
grow older. Also, it shows that for each age group, women watch slightly
more television than men.

FIGURE 2–10
Example of a Compound Bar Graph

Pareto Charts


When the variable displayed on the horizontal axis is qualitative or
categorical, a Pareto chart can also be used to represent the data.

Historical Note
Vilfredo Pareto (1848–1923) was an Italian scholar who developed theories in economics,
statistics, and the social sciences. His contributions to statistics include the development
of a mathematical function used in economics. This function has many statistical
applications and is called the Pareto distribution. In addition, he researched income
distribution, and his findings became known as Pareto’s law.

EXAMPLE 2–9 Organic Foods


A study found that people are willing to pay at least 20% more for organic
food than they do for regular food. The excess amount spent for organic
foods is shown. Draw a Pareto chart for the data.

Source: Hartman Group

SOLUTION
Step 1 Arrange the data from largest value to smallest value.

Step 2 Draw and label the x and y axes.


Step 3 Draw vertical bars to represent the percents. See Figure 2–11.

FIGURE 2–11 Pareto Chart for Example 2–9

The graph shows that people will pay as much as 44% more for fresh
vegetables and as much as 27% more for cereal bars.
Suggestions for Drawing Pareto Charts
1. Make the bars the same width.
2. Arrange the data from largest to smallest according to frequency.
3. Make the units that are used for the frequency equal in size.
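
The ordering step is the part of a Pareto chart that is easiest to get wrong by hand, so here is a small Python sketch of it. The function names are invented for the illustration. Because the organic food table is not reproduced in this section, the sketch uses the identity theft percentages from the Statistics Today table at the start of the chapter, entered in scrambled order.

```python
def pareto_order(data):
    """Sort (category, value) pairs from largest value to smallest."""
    return sorted(data.items(), key=lambda item: item[1], reverse=True)

def text_pareto(data):
    """One equal-width text bar per category, tallest category first."""
    return [f"{name:<40} {'#' * value} {value}%"
            for name, value in pareto_order(data)]

# Percentages from the Statistics Today table, deliberately out of order.
identity_theft = {
    "Bank Fraud": 13,
    "Credit Card Fraud": 35,
    "Loan or Lease Fraud": 8,
    "Phone or Utility Fraud": 15,
    "Government Documents or Benefits Fraud": 7,
    "Employment or Tax-Related Fraud": 22,
}

for line in text_pareto(identity_theft):
    print(line)
```

Every bar uses one character per percentage point, which keeps the units equal in size as suggestion 3 requires.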

When you analyze a Pareto chart, make comparisons by looking at
the heights of the bars.

The Time Series Graph


When data are collected over a period of time, they can be represented by a
time series graph.

Example 2–10 shows the procedure for constructing a time series graph.

EXAMPLE 2–10 Water Consumption


The data show the average water consumption in a specific city in millions
of cubic meters for a one-year period. Draw a time series graph for the
data and describe the results.

Historical Note
Time series graphs are over 1000 years old. The first ones were used to chart the
movements of the planets and the sun.

SOLUTION
Step 1 Draw and label the x and y axes.
Step 2 Label the x axis for months and the y axis for millions of cubic
meters.
Step 3 Plot each point for the values shown.
Step 4 Draw line segments connecting adjacent points. Do not draw a
smooth curve through the points. See Figure 2–12.

FIGURE 2–12 Water Consumption for Example 2–10

Water consumption was highest in July, August, and September, and then
returned to values close to those in January through April.

FIGURE 2–13
Two Time Series Graphs for Comparison

Source: Bureau of Census, U.S. Department of Commerce.


When you analyze a time series graph, look for a trend or pattern that
occurs over the time period. For example, is the line ascending (indicating an
increase over time) or descending (indicating a decrease over time)? Another
thing to look for is the slope, or steepness, of the line. A line that is steep over
a specific time period indicates a rapid increase or decrease over that period.
Two or more data sets can be compared on the same graph, called a
compound time series graph, by drawing a separate line for each data set, as
shown in Figure 2–13. This graph shows the percentage of elderly men and
women in the U.S. labor force from 1960 to 2020. It shows that the percentage
of elderly men decreased significantly from 1960 to 1990 and then increased
slightly after that. For elderly women, the percentage decreased slightly from
1960 to 1980 and then increased from 1980 to 2020.
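
The ideas of trend and steepness can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the readings are hypothetical values for six consecutive time periods (not data from this text), and `segment_changes` and `describe_trend` are names invented for the example. The change between adjacent points plays the role of the slope of each line segment.

```python
def segment_changes(values):
    """Change between each pair of adjacent points (slope per time step)."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def describe_trend(values):
    """Label each line segment as ascending, descending, or flat."""
    labels = []
    for change in segment_changes(values):
        if change > 0:
            labels.append("ascending")
        elif change < 0:
            labels.append("descending")
        else:
            labels.append("flat")
    return labels


# Hypothetical readings for six consecutive time periods (not data from the text).
readings = [12, 15, 15, 24, 20, 13]

changes = segment_changes(readings)
# The steepest segment is the one with the largest absolute change.
steepest = max(range(len(changes)), key=lambda i: abs(changes[i]))

print(changes)
print(describe_trend(readings))
print(f"Steepest segment: period {steepest + 1} to period {steepest + 2}")
```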

The Pie Graph


Pie graphs are used extensively in statistics. The purpose of the pie graph is
to show the relationship of the parts to the whole by visually comparing the
sizes of the sections. Percentages or proportions can be used. The variable is
nominal or categorical.

Example 2–11 shows the procedure for constructing a pie graph.


SPEAKING OF STATISTICS
The graph shows the number of murders (in thousands) that have occurred in the United
States since 2014. Based on the graph, do you think the number of murders is increasing,
decreasing, or remaining the same?

Murders in the United States

Source: FBI

EXAMPLE 2–11 Super Bowl Snack Foods


This frequency distribution shows the number of pounds of each snack
food eaten during the Super Bowl. Construct a pie graph for the data.

Source: USA TODAY Weekend.

SOLUTION
Step 1 Since there are 360° in a circle, the frequency for each class
must be converted to a proportional part of the circle. This
conversion is done by using the formula
$$\text{Degrees} = \frac{f}{n} \cdot 360^\circ$$
where f = frequency for each class and n = sum of the
frequencies. Hence, the following conversions are obtained. The
degrees should sum to 360° (see Note 1).


Step 2 Each frequency must also be converted to a percentage. Recall
from Example 2–1 that this conversion is done by using the
formula
$$\% = \frac{f}{n} \cdot 100$$
Hence, the following percentages are obtained. The percentages
should sum to 100% (see Note 2).

Step 3 Next, using a protractor and a compass, draw the graph,
using the appropriate degree measures found in Step 1,
and label each section with the name and percentages, as shown
in Figure 2–14.

FIGURE 2–14 Pie Graph for Example 2–11

Note 1: The degrees column does not always sum to 360° due to rounding.
Note 2: The percent column does not always sum to 100% due to rounding.
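
The two conversions used for a pie graph, and the rounding effect mentioned in the notes, can be checked with a short Python sketch. The frequencies below are hypothetical (they are not the Super Bowl data), and `pie_sections` is a name invented for this illustration. Degrees are rounded to whole numbers and percentages to one decimal place, which is an assumption about the rounding rule.

```python
def pie_sections(frequencies):
    """Convert each class frequency f to degrees (f/n * 360) and percent (f/n * 100)."""
    n = sum(frequencies.values())  # n = sum of the frequencies
    sections = {}
    for name, f in frequencies.items():
        degrees = round(f / n * 360)
        percent = round(f / n * 100, 1)
        sections[name] = (degrees, percent)
    return sections


# Hypothetical frequencies chosen for illustration (not the Super Bowl data).
counts = {"A": 7, "B": 5, "C": 3, "D": 3}

sections = pie_sections(counts)
for name, (degrees, percent) in sections.items():
    print(f"{name}: {degrees} degrees, {percent}%")

print("Total degrees:", sum(d for d, _ in sections.values()))
print("Total percent:", round(sum(p for _, p in sections.values()), 1))
```

With these particular frequencies the degrees total exactly 360, while the rounded percentages total 100.1, which is the kind of small discrepancy the notes describe.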

EXAMPLE 2–12 Police Calls


Construct and analyze a pie graph for the calls received each shift by a
local municipality for a recent year. (Data obtained by author.)

SOLUTION
Step 1 Find the number of degrees for each shift, using the formula:
$$\text{Degrees} = \frac{f}{n} \cdot 360^\circ$$

For each shift, the following results are obtained:

Step 2 Find the percentages:

Step 3 Using a protractor, graph each section and write its name and
corresponding percentage as shown in Figure 2–15.

FIGURE 2–15 Figure for Example 2–12

To analyze the nature of the data shown in the pie graph, look at the size of
the sections in the pie graph. For example, are any sections relatively large
compared to the rest? Figure 2–15 shows that the numbers of calls for the
three shifts are about equal, although slightly more calls were received on the
evening shift.
Note: Computer programs can construct pie graphs easily, so the
mathematics shown here would only be used if those programs were not
available.

Dotplots
A dotplot uses points or dots to represent the data values. If the data values
occur more than once, the corresponding points are plotted above one
another.

Dotplots are used to show how the data values are distributed and to see if
there are any extremely high or low data values.
EXAMPLE 2–13 Federal Waste Sites
The data show the number of federal waste sites in each of the 50 states.
Draw a dotplot for the data and summarize the results.

Step 1 Find the lowest and highest data values, and decide what scale to
use on the horizontal axis. The lowest data value is 0 and the
highest data value is 13, so a scale from 0 to 13 is needed.
Step 2 Draw a horizontal line, and draw the scale on the line.

Step 3 Plot each data value above the line. If the value occurs
more than once, plot the other point above the first
point. See Figure 2–16.

FIGURE 2–16
Figure for Example 2–13

The graph shows that most of the states have between zero and three
waste sites, with 13 states having one waste site. This is the largest
frequency.
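
The three steps for a dotplot can be sketched in Python as a text display. This is only an illustration: the data values are hypothetical (they are not the waste-site data from Example 2–13), and `text_dotplot` is a name invented here. Asterisks stand in for the dots, and repeated values are stacked above one another over a horizontal scale.

```python
from collections import Counter


def text_dotplot(values):
    """Return rows of a text dotplot: stacked dots above a horizontal scale."""
    counts = Counter(values)
    low, high = min(values), max(values)   # Step 1: lowest and highest values
    scale = range(low, high + 1)
    rows = []
    # Step 3: build from the tallest stack down so repeated values stack upward.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(" * " if counts[x] >= level else "   " for x in scale)
        rows.append(row.rstrip())
    # Step 2: the scale drawn along the horizontal line.
    rows.append("".join(f"{x:^3}" for x in scale).rstrip())
    return rows


# Hypothetical data values (not the waste-site data from Example 2-13).
sample = [0, 1, 1, 1, 2, 2, 3, 5, 5, 9]

for row in text_dotplot(sample):
    print(row)
```

The sketch assumes small non-negative integer values so that each scale label fits in a three-character column.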

Stem and Leaf Plots


The stem and leaf plot is a method of organizing data and is a combination of
sorting and graphing. It has the advantage over a grouped frequency
distribution of retaining the actual data while showing them in graphical
form.

OBJECTIVE 4
Draw and interpret a stem and leaf plot.
A stem and leaf plot uses part of each data value as the stem and part as the
leaf. For example, a data value of 34 would have 3 as the stem and 4 as the
leaf. A data value of 356 would have 35 as the stem and 6 as the leaf.
Example 2–14 shows the procedure for constructing a stem and leaf plot.

EXAMPLE 2–14 Outpatient Cardiograms


At an outpatient testing center, the number of cardiograms performed each
day for 20 days is shown. Construct a stem and leaf plot for the data.

SOLUTION
Step 1 Arrange the data in order:
02, 13, 14, 20, 23, 25, 31, 32, 32, 32,
32, 33, 36, 43, 44, 44, 45, 51, 52, 57
Note: Arranging the data in order is not essential and can be
cumbersome when the data set is large; however, it is helpful in
constructing a stem and leaf plot. The leaves in the final stem
and leaf plot should be arranged in order.

Step 2 Separate the data according to the first digit, as shown.

Step 3 A display can be made by using the leading digit as the stem and
the trailing digit as the leaf. For example, for the value 32, the
leading digit, 3, is the stem and the trailing digit, 2, is the leaf.
For the value 14, the 1 is the stem and the 4 is the leaf. Now a
plot can be constructed as shown in Figure 2–17.

FIGURE 2–17
Stem and Leaf Plot for Example 2–14
Figure 2–17 shows that the distribution peaks in the center and that there
are no gaps in the data. For 7 of the 20 days, the number of patients receiving
cardiograms was between 31 and 36. The plot also shows that the testing
center treated from a minimum of 2 patients to a maximum of 57 patients in
any one day.
If there are no data values in a class, you should write the stem number and
leave the leaf row blank. Do not put a zero in the leaf row.
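
The construction in Example 2–14 can be sketched in Python. This is a minimal illustration, not the author's procedure: `stem_and_leaf` is a hypothetical name, and the sketch assumes non-negative whole numbers whose last digit serves as the leaf. It uses the 20 data values listed in Step 1 of the example, and it writes a stem with a blank leaf row when a class has no data values.

```python
def stem_and_leaf(values):
    """Group values by leading digit(s) (stem) and list trailing digits (leaves) in order."""
    leaves_by_stem = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # e.g. 32 -> stem 3, leaf 2
        leaves_by_stem.setdefault(stem, []).append(leaf)
    rows = []
    # Include every stem in the range so an empty class shows a blank leaf row.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


# Data from Example 2-14 (number of cardiograms per day for 20 days).
cardiograms = [2, 13, 14, 20, 23, 25, 31, 32, 32, 32,
               32, 33, 36, 43, 44, 44, 45, 51, 52, 57]

for row in stem_and_leaf(cardiograms):
    print(row)
```

Calling `stem_and_leaf([12, 34])` shows the empty-class rule: the row for stem 2 contains the stem and no leaves.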

EXAMPLE 2–15 Number of Car Thefts in a Large City

An insurance company researcher conducted a survey on the number of
car thefts in a large city for a period of 30 days last summer. The raw data
are shown. Construct a stem and leaf plot by using classes 50–54, 55–59,
60–64, 65–69, 70–74, and 75–79.

SOLUTION
Step 1 Arrange the data in order.
50, 51, 51, 52, 53, 53, 55, 55, 56, 57, 57, 58, 59, 62, 63,
65, 65, 66, 66, 67, 68, 69, 69, 72, 73, 75, 75, 77, 78, 79
Step 2 Separate the data according to the classes.

Step 3 Plot the data as shown here.

The graph for this plot is shown in Figure 2–18.


FIGURE 2–18
Stem and Leaf Plot for Example 2–15
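
When each stem is split into two classes, as in Example 2–15, the same idea can be sketched in Python. The function name `split_stem_plot` and its `width` parameter are hypothetical, and the sketch assumes the class width divides 10 evenly (5 here). It uses the ordered data listed in Step 1 of the example, so each stem appears twice: once for leaves 0 through 4 and once for leaves 5 through 9.

```python
def split_stem_plot(values, width=5):
    """Stem and leaf rows where each stem is split into classes of the given width."""
    rows = {}
    for value in sorted(values):
        class_start = value - value % width      # lower limit of the class
        rows.setdefault(class_start, []).append(value % 10)
    lines = []
    # Step through every class in range, so an empty class keeps a blank leaf row.
    for class_start in range(min(rows), max(rows) + 1, width):
        leaves = " ".join(str(leaf) for leaf in rows.get(class_start, []))
        lines.append(f"{class_start // 10} | {leaves}".rstrip())
    return lines


# Ordered data from Example 2-15 (car thefts per day for 30 days).
thefts = [50, 51, 51, 52, 53, 53, 55, 55, 56, 57, 57, 58, 59, 62, 63,
          65, 65, 66, 66, 67, 68, 69, 69, 72, 73, 75, 75, 77, 78, 79]

for line in split_stem_plot(thefts):
    print(line)
```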


SPEAKING OF STATISTICS
The Federal Reserve estimated that during a recent year, there were 22 billion bills in
circulation. About 28% of them were $1 bills, 3% were $2 bills, 7% were $5 bills, 5% were
$10 bills, 21% were $20 bills, 4% were $50 bills, and 32% were $100 bills. It costs about
3¢ to print each bill.
The average life of a $1 bill is 22 months, a $10 bill 3 years, a $20 bill 4 years, a $50 bill
9 years, and a $100 bill 9 years. What type of graph would you use to represent the
average lifetimes of the bills?

How Much Paper Money Is in Circulation Today?


When you analyze a stem and leaf plot, look for peaks and gaps in the
distribution. See if the distribution is symmetric or skewed. Check the
variability of the data by looking at the spread.
Related distributions can be compared by using a back-to-back stem and
leaf plot. The back-to-back stem and leaf plot uses the same digits for the
stems of both distributions, but the digits that are used for the leaves are
arranged in order out from the stems on both sides. Example 2–16 shows a
back-to-back stem and leaf plot.

EXAMPLE 2–16 Number of Stories in Tall Buildings

The numbers of stories in two randomly selected samples of tall buildings
in Miami and Houston are shown. Construct a back-to-back stem and leaf
plot for the data and compare the distributions.
Source: World Almanac and Book of Facts

SOLUTION

Step 1 Arrange the data for both data sets in order.


Step 2 Construct a stem and leaf plot, using the same digits as stems.
Place the digits for the leaves for Miami on the left side of the
stem and the digits for the leaves for Houston on the right side,
as shown. See Figure 2–19.

FIGURE 2–19 Back-to-Back Stem and Leaf Plot for Example 2–16

Step 3 Compare the distributions. Miami’s buildings are somewhat
taller than those in Houston. The distribution for the buildings
in Miami peaks at 40 to 49 stories and 50 to 59 stories, while
the distribution for those in Houston peaks at 40 to 49 stories.
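
A back-to-back stem and leaf plot can also be sketched in Python. The story counts below are hypothetical (they are not the Miami and Houston data, which are not reproduced here), and `back_to_back` is a name invented for this illustration. The left-hand leaves are reversed so that, on both sides, the leaves increase outward from the shared stems.

```python
def back_to_back(left_values, right_values):
    """Rows of a back-to-back stem and leaf plot sharing one column of stems."""
    def leaves_by_stem(values):
        grouped = {}
        for value in sorted(values):
            grouped.setdefault(value // 10, []).append(value % 10)
        return grouped

    left, right = leaves_by_stem(left_values), leaves_by_stem(right_values)
    stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)
    rows = []
    for stem in stems:
        # Left leaves are reversed so that both sides increase outward from the stem.
        left_text = " ".join(str(leaf) for leaf in reversed(left.get(stem, [])))
        right_text = " ".join(str(leaf) for leaf in right.get(stem, []))
        rows.append((left_text, stem, right_text))
    pad = max(len(left_text) for left_text, _, _ in rows)
    return [f"{left_text:>{pad}} | {stem} | {right_text}".rstrip()
            for left_text, stem, right_text in rows]


# Hypothetical story counts for two cities (not the Miami and Houston data).
city_a = [34, 38, 41, 45, 45, 52, 57, 63]
city_b = [28, 31, 36, 40, 42, 44, 47, 51]

for row in back_to_back(city_a, city_b):
    print(row)
```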

Stem and leaf plots are part of the techniques called exploratory data
analysis. More information on this topic is presented in Chapter 3.

Misleading Graphs
Graphs give a visual representation that enables readers to analyze and
interpret data more easily than they could simply by looking at numbers.
However, inappropriately drawn graphs can misrepresent the data and lead
the reader to false conclusions. For example, a car manufacturer’s ad stated
that 98% of the vehicles it had sold in the past 10 years were still on the road.
The ad then showed a graph similar to the one in Figure 2–20. The graph
shows the percentage of the manufacturer’s automobiles still on the road and
the percentage of its competitors’ automobiles still on the road. Is there a
large difference? Not necessarily.

FIGURE 2–20
Graph of Automaker’s Claim Using a Scale from 95 to 100%
FIGURE 2–21
Graph in Figure 2–20 Redrawn Using a Scale from 0 to 100%

Notice the scale on the vertical axis in Figure 2–20. It has been cut off (or
truncated) and starts at 95%. When the graph is redrawn using a scale that
goes from 0 to 100%, as in Figure 2–21, there is hardly a noticeable
difference in the percentages. Thus, changing the units at the starting point on
the y axis can convey a very different visual representation of the data.
It is not wrong to truncate an axis of the graph; many times it is necessary
to do so. However, the reader should be aware of this fact and interpret the
graph accordingly. Do not be misled if an inappropriate impression is given.
Let us consider another example. The projected required fuel economy in
miles per gallon for vehicles is shown. In this case, an increase from 14.1 to
16 miles per gallon is projected.

Source: National Highway Traffic Safety Administration.


When you examine the graph shown in Figure 2–22(a), you see a slight
increase from 14.1 to 16, but when you spread out the scale as shown in
Figure 2–22(b), you see a much larger increase for the same data values.
Again, by manipulating the scale, you can change the visual presentation.
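
The effect of the axis starting point can be quantified with a small calculation. The Python sketch below is illustrative only: `drawn_height_ratio` is a hypothetical helper, and the starting value of 12 for the truncated axis is an assumption, since the scales used in Figure 2–22 are not reproduced here. It compares how tall the 16 mpg value appears relative to the 14.1 mpg value when heights are measured from the bottom of the axis.

```python
def drawn_height_ratio(first, second, axis_start=0.0):
    """Ratio of the drawn heights of two values when the axis begins at axis_start."""
    return (second - axis_start) / (first - axis_start)


# Values from the text: projected fuel economy rises from 14.1 to 16 mpg.
low_mpg, high_mpg = 14.1, 16.0

full_scale = drawn_height_ratio(low_mpg, high_mpg)                   # axis starts at 0
truncated = drawn_height_ratio(low_mpg, high_mpg, axis_start=12.0)  # assumed start of 12

print(f"Axis from 0:  second value is drawn {full_scale:.2f} times as tall")
print(f"Axis from 12: second value is drawn {truncated:.2f} times as tall")
```

The data values are identical in both calls; only the assumed starting point of the axis changes the visual ratio.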
Another misleading graphing technique sometimes used involves
exaggerating a one-dimensional increase by showing it in two dimensions.
For example, the average cost of a 30-second Super Bowl commercial has
increased from $42,000 in 1967 to $5.6 million in 2020.
The increase shown by the graph in Figure 2–23(a) represents the change
by a comparison of the heights of the two bars in one dimension. The same
data are shown two-dimensionally with circles in Figure 2–23(b). Notice that
the difference seems much larger because the eye is comparing the areas of
the circles rather than the lengths of the diameters.
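
The exaggeration can be estimated numerically. The Python sketch below is a rough illustration under one assumption: each circle's diameter is drawn proportional to the cost, which may not be exactly how Figure 2–23(b) was constructed. The function names are hypothetical; the two costs are the ones given in the text.

```python
import math


def length_ratio(old, new):
    """Ratio seen when the values are compared by bar height (one dimension)."""
    return new / old


def circle_area_ratio(old, new):
    """Ratio of circle areas when each diameter is drawn proportional to the value."""
    old_area = math.pi * (old / 2) ** 2
    new_area = math.pi * (new / 2) ** 2
    return new_area / old_area


# Costs from the text: $42,000 in 1967 and $5.6 million in 2020.
cost_1967, cost_2020 = 42_000, 5_600_000

print(f"Height ratio: {length_ratio(cost_1967, cost_2020):.1f}")
print(f"Area ratio:   {circle_area_ratio(cost_1967, cost_2020):.1f}")
```

Because area grows with the square of the diameter, the area ratio is the square of the height ratio, which is why the two-dimensional picture looks so much more dramatic.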

FIGURE 2–22
Projected Miles per Gallon

Note that it is not wrong to use the graphing techniques of truncating the
scales or representing data by two-dimensional pictures. But when these
techniques are used, the reader should be cautious of the conclusion drawn on
the basis of the graphs.
Another way to misrepresent data on a graph is by omitting labels or units
on the axes of the graph. The graph shown in Figure 2–24 compares the cost
of living, economic growth, population growth, etc., of four main geographic
areas in the United States. However, since there are no numbers on the y axis,
very little information can be gained from this graph, except a crude ranking
of each factor. There is no way to decide the actual magnitude of the
differences.
Finally, all graphs should contain a source for the information presented.
The inclusion of a source for the data will enable you to check the reliability
of the organization presenting the data.

FIGURE 2–23
Comparison of Costs for a 30-Second Super Bowl Commercial
FIGURE 2–24
A Graph with No Units on the y Axis

Applying the Concepts 2–3


Causes of Accidental Deaths in the United States, 1999–2009
The graph shows the number of deaths in the United States due to
accidents. Answer the following questions about the graph.

Source: National Safety Council

1. Name the variables used in the graph.
2. Are the variables qualitative or quantitative?
3. What type of graph is used here?
4. Which variable shows a decrease in the number of deaths over the
years?
5. Which variable or variables show an increase in the number of deaths
over the years?
6. The number of deaths in which variable remains about the same over
the years?
7. List the approximate number of deaths for each category for the year
2001.
8. In 1999, which variable accounted for the most deaths? In 2009, which
variable accounted for the most deaths?
9. In what year were the numbers of deaths from poisoning and falls
about the same?

See page 110 for the answers.


Exercises 2–3
1. Controlled Substances Prescription Guidelines The National Safety
Council has set 6 recommendations for states to follow in prescribing
controlled substances. The data show the number of states that have
laws in that particular area. Draw a vertical bar graph to illustrate the
data.

Source: TD Ameritrade Survey

2. Paying Off a College Debt The following data show the ages at which
millennials (20–29) expect to pay off their college debt. Draw a
horizontal bar graph to represent the data.

Source: TD Ameritrade Survey

3. Snack Food The following data represent the amount of money in
billions of dollars spent by United States residents on snack foods for a
recent year. Draw a Pareto chart for the data.

Source: Consumer Goods and FMCG

4. Internet Users The data show the top five nations with the most
Internet users in millions. Draw a Pareto chart for the data.

Source: Computer Industry Almanac

5. Online Ad Spending The amount spent (in billions of dollars) for ads
online is shown. (The numbers for 2016 through 2019 were projected
numbers.) Draw a time series graph and comment on the trend.

Source: eMarketer.

6. Medical Prescriptions The number of medical prescriptions dispensed
over the years from 2015 to 2019 is shown. Draw a time series graph
and comment on the nature of the situation.
7. World Population The data (in millions) for the population of the
world for specific years are shown. Draw a time series graph for the
data.

8. Automobile Sales The data show the number of automobiles
sold by an automobile dealership. Draw a time series graph for
the data and analyze the results.

9. What’s Cooking? A study of 1063 U.S. adults found that 24% used a
microwave oven to prepare a meal, 44% used the stove-top, 25% used
the oven, and 7% used another method. Draw a pie graph for the data
and analyze the graph.
Source: Pieapod Survey

10. FEMA Help for Hurricane Victims The following survey by
Morning Consult/Politico asked whether or not the Federal Emergency
Management Agency (FEMA) had done enough for Texas and
Louisiana residents after Hurricane Harvey made landfall. Assume that
they surveyed 1000 people. Draw a pie graph for the information.

11. Lottery Winners A survey by BMO Harris asked 304 small business
owners if they would sell their businesses if they won the lottery. The
results are shown. Draw a pie graph for the data.

12. Children’s Living Arrangements The following data represent the
living arrangement for children with single parents. Draw two pie
graphs and compare the results.

Source: U.S. Census Bureau


13. Prior Prison Sentences The following data show the number of prior
prison sentences that persons have after being convicted of a crime.
Construct a dot plot for the data.

14. Teacher Strikes In Pennsylvania the numbers of teacher strikes for the
last 14 years are shown. Construct a dotplot for the data. Comment on
the graph.

Source: School Leader News.

15. Years of Experience The data show the number of years of experience
the players on the Pittsburgh Steelers football team have at the
beginning of the season. Draw and analyze a dot plot for the data.

16. Commuting Times Fifty off-campus students were asked how long it
takes them to get to school. The times (in minutes) are shown.
Construct a dotplot and analyze the data.

17. 50 Home Run Club There are 43 Major League baseball players (as of
2015) who have hit 50 or more home runs in one season. Construct a
stem and leaf plot and analyze the data.

Source: The World Almanac and Book of Facts.

18. Terrorist Attacks The data show the number of terrorist attacks
against the United States over a recent 16-year period. Draw a stem and
leaf plot for the data.

Source: Global Terrorism Data Base

19. Length of Major Rivers The data show the lengths (in
hundreds of miles) of major rivers in South America and
Europe. Construct a back-to-back stem and leaf plot, and compare the
distributions.

Source: The World Almanac and Book of Facts.

20. Math and Reading Achievement Scores The math and reading
achievement scores from the National Assessment of Educational
Progress for selected states are listed below. Construct a back-to-back
stem and leaf plot with the data, and compare the distributions.

Source: World Almanac.

21. State which type of graph (Pareto chart, time series graph, or pie
graph) would most appropriately represent the data.
a. Situations that distract automobile drivers
b. Number of persons in an automobile used for getting to and from
work each day
c. Amount of money spent for textbooks and supplies for one semester
d. Number of people killed by tornados in the United States each year
for the last 10 years
e. The number of pets (dogs, cats, birds, fish, etc.) in the United States
this year
f. The average amount of money that a person spent for a significant
other for Christmas for the last 6 years

22. State which graph (Pareto chart, time series graph, or pie graph) would
most appropriately represent the given situation.
a. The number of students enrolled at a local college for each year
during the last 5 years
b. The budget for the student activities department at a certain college
for a specific year
c. The means of transportation the students use to get to school
d. The percentage of votes each of the four candidates received in the
last election
e. The record temperatures of a city for the last 30 years
f. The frequency of each type of crime committed in a city during the
year

23. Credit Scores The following factors contribute to a FICO credit score.
Draw a pie chart and vertical bar graph for the data. Which graph is a
better representation of the data?

Source: Experian

24. Alcohol Poisoning The following data show the ages and the percents
of individuals who have died from alcohol poisoning for a specific
year. Draw a pie chart and a vertical bar graph for the data. Which
graph do you think better illustrates the significance of the data?
Explain.

Source: Addictionresource.com
(Note: Class width here is 10 and was already decided by the researchers, so it can be used
here.)

25. Cost of Milk The graph shows the increase in the price of a quart of
milk. Why might the increase appear to be larger than it really is?

26. Websites The data show the number (in millions) of websites in the
United States from 2012 to 2020. Draw a time series graph for the data.

27. Chicago Homicides The data represent the number of
homicides that occurred in Chicago from 2013 to 2020. Draw a
time series graph for the data.
28. Trip Reimbursements The average amount requested for business trip
reimbursement is itemized below. Illustrate the data with an
appropriate graph. Do you have any questions regarding the data?

Source: USA TODAY.

Technology Step by Step

TI-84 Plus
Step by Step
To graph a time series, follow the procedure for a frequency polygon from
Section 2–2, using the following data for the number of outdoor drive-in
theaters.

EXCEL
Step by Step

Constructing a Pareto Chart


To make a Pareto chart:
1. Enter the snack food categories from Example 2–11 into column A of
a new worksheet.
2. Enter the corresponding frequencies in column B. The data should be
entered in descending order according to frequency.
3. Highlight the data from columns A and B, and select the Insert tab
from the toolbar.
4. Select the Column Chart type.

5. To change the title of the chart, click on the current title of the
chart.
6. When the text box containing the title is highlighted, click the mouse
in the text box and change the title.
7. Right-click on any bar and select Format Data Series. Change the gap
width to zero.
8. Add labels to the axes and delete the legend.

Constructing a Time Series Chart


Example

*Vehicles (in millions) that used the Pennsylvania Turnpike.


Source:
https://www.paturnpike.com/pdfs/business/finance/AuditorGeneralsPeformanceAuditMar2019.pdf

To make a time series chart:


1. Enter the years 2009 through 2018 from the example in column A of a
new Excel worksheet.
2. Enter the corresponding frequencies in column B.
3. Highlight the data from column B and select the Insert tab from the
toolbar.
4. Select the Line chart type.

5. Right-click the mouse on any region of the graph.


6. Select the Select Data option.

7. Select Edit from the Horizontal Axis Labels and highlight
the years from column A, then click [OK].
8. Click [OK] on the Select Data Source box.
9. Create a title for your chart, such as Vehicles Using the Pennsylvania
Turnpike. Right-click the mouse on any region of the chart. Select the
Chart Tools tab from the toolbar, then Layout.
10. Select Chart Title and highlight the current title to change the title.
11. Select Axis Titles to change the horizontal and vertical axis labels.

Constructing a Pie Chart


To make a pie chart:
1. Enter the shifts from Example 2–12 into column A of a new
worksheet.
2. Enter the frequencies corresponding to each shift in column B.
3. Highlight the data in columns A and B and select Insert from the
toolbar; then select the Pie chart type.

4. Click on any region of the chart. Then select Design from the Chart
Tools tab on the toolbar.

5. Select Add Chart Elements from the chart Layouts tab on
the toolbar. Under Format Data Labels, check Category Name
and Percentage; uncheck legend.
6. To change the title of the chart, click on the current title of the chart.
7. When the text box containing the title is highlighted, click the mouse
in the text box and change the title.

MINITAB
Step by Step

Construct a Pie Chart


1. Enter the summary data for snack foods and frequencies from
Example 2–11 into C1 and C2.
2. Name them Snack and Pounds in Millions.
3. Select Graph>Pie Chart.
a) Click the option for Chart values from a table.
b) Press [Tab] to move to Categorical variable, then double-click
C1 to select it.
c) Press [Tab] to move to Summary variables, then double-click C2
to select it.
4. Click the [Labels] tab, then Titles/Footnotes.
a) Type in the title: Super Bowl Snacks in Millions of Pounds.
b) Click the Slice Labels tab, then the options for Category name
and Percent.
c) Click the option to Draw a line from label to slice.
d) Click [OK] twice to create the chart.

Construct a Bar Chart


The procedure for constructing a bar chart is similar to that for the pie chart.
1. Select Graph>Bar Chart.
a) Click on the drop-down list in Bars Represent: and then select
values from a table.
b) Click on the Simple chart, then click [OK]. The dialog box will be
similar to the Pie Chart Dialog Box.
2. Select the frequency column C2 Pounds in Millions for Graph
variables: and C1 Snack for the Categorical variable.
3. Click on [Labels], then type the title in the Titles/Footnote tab:
Super Bowl Snacks in Millions of Pounds.
4. Click the tab for Data Labels, then click the option to Use labels
from column: and select C1 Snacks.
5. Click [OK] twice.

After the graph is made, right-click over any bar to change the appearance
such as the color of the bars. To change the gap between them, right-click
on the horizontal axis and then choose Edit X scale. To change the y Scale
to percents, right-click on the vertical axis and then choose Graph options
and Show Y as a Percent.

Construct a Pareto Chart


Pareto charts are a quality control tool. They are similar to a bar chart with
no gaps between the bars, and the bars are arranged by frequency.
1. Select Stat>Quality Tools>Pareto.
2. Click in the box for Defects or attribute in: and select C1 Snack.
3. Click in the box Frequencies in: and select C2 Pounds in Millions.

4. Click on [Options]. Type in the title, Super Bowl Snacks in
Millions of Pounds.
5. Click [OK] twice. The chart is completed.

Construct a Time Series Plot

The data show the average water consumption in a specific city in millions
of cubic meters for a one-year period (Example 2–10).

1. Add a blank worksheet to the project by selecting File>New>New-Minitab
Worksheet.
2. Type the months from January to December in C1; label the column
Month.
3. Type water consumption numbers in column C2. Label the column
Water Consumption.
4. To make the graph, select Graph>Time series plot, then Simple,
and press [OK].
a) For Series select Water Consumption.
b) Click [Time/scale], select the Stamp option and select Month for
the Stamp column.
c) Click the Gridlines tab and select all three boxes, Y major, Y
minor, and X major.
d) Click [OK] twice. A new window will open that contains the graph.
e) To change the title, double-click the title in the graph window. A
dialog box will open, allowing you to change the text to Average
Water Consumption in Millions of Cubic Meters.

Construct a Stem and Leaf Plot


1. Type in the data for Example 2–15. Label the column Car Thefts.
2. Select Graph>Stem-and-Leaf.
3. Double-click on C1 Car Thefts in the column list.
4. Click in the Increment text box, and enter the class width of 5.
5. Click [OK]. This character graph will be displayed in the session
window.

Summary
• When data are collected, the values are called raw data. Since very little
knowledge can be obtained from raw data, they must be organized in
some meaningful way. A frequency distribution using classes is the
common method that is used. (2–1)
• Once a frequency distribution is constructed, graphs can be drawn to
give a visual representation of the data. The most commonly used
graphs in statistics are the histogram, frequency polygon, and ogive. (2–2)
• Other graphs such as the bar graph, Pareto chart, time series graph, pie
graph and dotplot can also be used. Some of these graphs are frequently
seen in newspapers, magazines, and various statistical reports. (2–3)
• A stem and leaf plot uses part of the data values as stems and part of the
data values as leaves. This graph has the advantages of both a frequency
distribution and a histogram. (2–3)
• Finally, graphs can be misleading if they are drawn improperly. For
example, increases and decreases over time in time series graphs can be
exaggerated by truncating the scale on the y axis. One-dimensional
increases or decreases can be exaggerated by using two-dimensional
figures. Finally, when labels or units are purposely omitted, there is no
actual way to decide the magnitude of the differences between the
categories. (2–3)

Important Terms
bar graph
categorical frequency distribution
class
class boundaries
class midpoint
class width
compound bar graphs
cumulative frequency
cumulative frequency distribution
dotplot
frequency
frequency distribution
frequency polygon
grouped frequency distribution
histogram
lower class limit
ogive
open-ended distribution
Pareto chart
pie graph
raw data
relative frequency graph
stem and leaf plot
time series graph
ungrouped frequency distribution
upper class limit

Important Formulas
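Formula for the percentage of values in each class:
$$\% = \frac{f}{n} \cdot 100$$
where
f = frequency of the class
n = total number of values

Formula for the range:
$$R = \text{highest value} - \text{lowest value}$$

Formula for the class width:
$$\text{Class width} = \text{upper boundary} - \text{lower boundary}$$

Formula for the class midpoint:
$$X_m = \frac{\text{lower boundary} + \text{upper boundary}}{2}$$
or
$$X_m = \frac{\text{lower limit} + \text{upper limit}}{2}$$

Formula for the degrees for each section of a pie graph:
$$\text{Degrees} = \frac{f}{n} \cdot 360^\circ$$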

Review Exercises
Section 2–1
1. Alcohol Consumption A survey shows the type of alcoholic beverages
that a person consumes. Construct a categorical frequency distribution
for the data: B = beer, W = wine, and S = spirits.

2. Takeout Food In a survey, 30 people were asked what type of takeout
food they usually ordered. The responses were hot dogs (D),
hamburgers (H), pizza (P), hoagies (G), and salads (S). Construct a
frequency distribution for the data.

3. BUN Count The blood urea nitrogen (BUN) count of 20 randomly
selected patients is given here in milligrams per deciliter (mg/dl).
Construct an ungrouped frequency distribution for the data.

4. Wind Speed The data show the average wind speed for 36 days in a
large city. Construct an ungrouped frequency distribution for the data.

5. Waterfall Heights The data show the heights (in feet) of notable
waterfalls in North America. Organize the data into a grouped
frequency distribution using 6 classes. These data will be used for
Exercises 7, 9, and 11.

Source: National Geographic Society

6. Ages of the Vice Presidents at the Time of Their Death The ages at
the time of death of those Vice Presidents of the United States who
have passed away are listed below. Use the data to construct a
frequency distribution. Use 6 classes. The data for this exercise will be
used for Exercises 8, 10, and 12.

Source: World Almanac and Book of Facts.

Section 2–2
7. Find the relative frequency for the frequency distribution for the data in
Exercise 5.
8. Find the relative frequency for the frequency distribution for the data in
Exercise 6.
9. Construct a histogram, frequency polygon, and ogive for the data in
Exercise 5.

10. Construct a histogram, frequency polygon, and ogive for the data in
Exercise 6.

11. Construct a histogram, frequency polygon, and ogive, using relative
frequencies for the data in Exercise 5.

12. Construct a histogram, frequency polygon, and ogive, using relative
frequencies for the data in Exercise 6.

Section 2–3
13. Non-Alcoholic Beverages The data show the yearly consumption (in
gallons) of popular non-alcoholic beverages. Draw a vertical and
horizontal bar graph to represent the data.
Source: U.S. Department of Agriculture

14. Family Size The following data represent the percents of family sizes
of residents in the United States. Draw a vertical bar graph for the data
and summarize the results.

Source: U.S. Census Bureau

15. Crime The data show the percentage of the types of crimes
commonly committed in the United States. Construct a Pareto
chart for the data.

Source: FBI

16. Hours of Sleep The following data show the recommended number of
hours per night that people need for sleep. Draw a Pareto graph for the
data and summarize the results.

Source: SleepFoundation.org

17. Broadway Stage Engagements The data represent the number of new
shows on Broadway from 2014 to 2020. Draw a time series graph for
the data.

18. Internet Users The data (in hundred millions) show the number of
Internet users worldwide from 2010 to 2019. Draw a time series graph
for the data.

19. Spending of First-Year College Students The average amounts spent
by first-year college students for school items are shown. Construct a
pie graph for the data.

Source: National Retail Federation.


20. Smart Phone Insurance Construct and analyze a pie graph for the
people who did or did not buy insurance for their smart phones at the
time of purchase.

Source: Based on information from Anderson Analytics

21. Peyton Manning’s Colts Career Peyton Manning played for the
Indianapolis Colts for 14 years. (He did not play in 2011.) The data
show the number of touchdowns he scored for the years 1998–2010.
Construct a dotplot for the data and comment on the graph.

Source: NFL.com

22. Songs on CDs The data show the number of songs on each of 40 CDs
from the author’s collection. Construct a dotplot for the data and
comment on the graph.

23. Weights of Football Players A local football team has 30 players; the
weight of each player is shown. Construct a stem and leaf plot for the
data. Use stems 20__, 21__, 22__, etc.

24. Public Libraries The numbers of public libraries in operation for
selected states are listed below. Organize the data with a stem and leaf
plot.

Source: World Almanac.

25. Pain Relief The graph below shows the time it takes Quick Pain
Relief to relieve a person’s pain. The graph below that shows the time
a competitor’s product takes to relieve pain. Why might these graphs
be misleading?
26. Casino Payoffs The graph shows the payoffs obtained from the White
Oak Casino compared to the nearest competitor’s casino. Why is this
graph misleading?

Data Analysis
A Data Bank is found in Appendix B.
1. From the Data Bank located in Appendix B, choose one of the
following variables: age, weight, cholesterol level, systolic pressure,
IQ, or sodium level. Select at least 30 values. For these values,
construct a grouped frequency distribution. Draw a histogram,
frequency polygon, and ogive for the distribution. Describe briefly the
shape of the distribution.
2. From the Data Bank, choose one of the following variables:
educational level, smoking status, or exercise. Select at least 20 values.
Construct an ungrouped frequency distribution for the data. For the
distribution, draw a Pareto chart and describe briefly the nature of the
chart.
3. From the Data Bank, select at least 30 subjects and construct a
categorical distribution for their marital status. Draw a pie graph and
describe briefly the findings.
4. Using the data from Data Set IV in Appendix B, construct a frequency
distribution and draw a histogram. Describe briefly the shape of the
distribution of the tallest buildings in New York City.
5. Using the data from Data Set XI in Appendix B, construct a frequency
distribution and draw a frequency polygon. Describe briefly the shape
of the distribution for the number of pages in statistics books.
6. Using the data from Data Set IX in Appendix B, divide the United
States into four regions, as follows:
Find the total population for each region, and draw a Pareto chart and a
pie graph for the data. Analyze the results. Explain which chart might
be a better representation for the data.
7. Using the data from Data Set I in Appendix B, make a stem and leaf
plot for the record low temperatures in the United States. Describe the
nature of the plot.

STATISTICS TODAY
How Your Identity Can Be Stolen
—Revisited
Data presented in numerical form do not convey an easy-to-interpret conclusion; however,
when data are presented in graphical form, readers can see the visual impact of the
numbers. In the case of identity fraud, the reader can see that most of the identity fraud is
credit card fraud, followed by employment or tax-related fraud. These two types of fraud
account for over half (57%) of the identity frauds.
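
The 57% figure can be checked with a running total. The short Python sketch below is only an illustration, not part of the original text: it uses the six percents listed in the chapter opener, while the function name `cumulative_percents` is made up for this example. Sorting the categories from largest to smallest and keeping a running total is also the ordering a Pareto chart of these data would use.

```python
# Percents for the top six identity theft types, as listed in the chapter opener.
theft_percents = {
    "Credit Card Fraud": 35,
    "Employment or Tax-Related Fraud": 22,
    "Phone or Utility Fraud": 15,
    "Bank Fraud": 13,
    "Loan or Lease Fraud": 8,
    "Government Documents or Government Benefits Fraud": 7,
}


def cumulative_percents(percents):
    """Return (category, percent, running total) rows, largest category first."""
    rows = []
    running = 0
    ordered = sorted(percents.items(), key=lambda kv: kv[1], reverse=True)
    for name, pct in ordered:
        running += pct
        rows.append((name, pct, running))
    return rows


rows = cumulative_percents(theft_percents)
for name, pct, running in rows:
    print(f"{name:<52}{pct:>4}%{running:>5}%")
```

The running total on the second row is 35 + 22 = 57, which matches the statement that the two largest categories account for over half of the identity frauds.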

The Federal Trade Commission suggests some ways to protect your identity:
1. Shred all financial documents no longer needed.
2. Protect your Social Security number.
3. Don’t give out personal information on the phone, through the mail, or over the Internet.
4. Never click on links sent in unsolicited emails.
5. Don’t use an obvious password for your computer documents.
6. Keep your personal information in a secure place at home.

Chapter Quiz
Determine whether each statement is true or false. If the
statement is false, explain why.
1. In the construction of a frequency distribution, it is a good idea to have
overlapping class limits, such as 10–20, 20–30, 30–40.

2. Bar graphs can be drawn by using vertical or horizontal bars.


3. It is not important to keep the width of each class the same in a
frequency distribution.
4. Frequency distributions can aid the researcher in drawing charts and
graphs.
5. The type of graph used to represent data is determined by the type of
data collected and by the researcher’s purpose.
6. In construction of a frequency polygon, the class limits are used for the
x axis.
7. Data collected over a period of time can be graphed by using a pie
graph.

Select the best answer.


8. What is another name for the ogive?
a. Histogram
b. Frequency polygon
c. Cumulative frequency graph
d. Pareto chart

9. What are the boundaries for 8.6–8.8?
a. 8–9
b. 8.5–8.9
c. 8.55–8.85
d. 8.65–8.75

10. What graph should be used to show the relationship between the parts
and the whole?
a. Histogram
b. Pie graph
c. Pareto chart
d. Ogive

11. Except for rounding errors, relative frequencies should add up to what
sum?
a. 0
b. 1
c. 50
d. 100

Complete these statements with the best answers.


12. The three types of frequency distributions are ______, ______, and
______.

13. In a frequency distribution, the number of classes should be between
______ and ______.

14. Data such as blood types (A, B, AB, O) can be organized into a(n)
______ frequency distribution.

15. Data collected over a period of time can be graphed using a(n) ______
graph.

16. A statistical device used in exploratory data analysis that is a
combination of a frequency distribution and a histogram is called a(n)
______.

17. On a Pareto chart, the frequencies should be represented on the ______
axis.

18. Housing Arrangements A questionnaire on housing arrangements
showed this information obtained from 25 respondents. Construct a
frequency distribution for the data (H = house, A = apartment, M =
mobile home, C = condominium). These data will be used in Exercise
19.
19. Construct a pie graph for the data in Exercise 18.

20. Items Purchased at a Convenience Store When 30 randomly selected
customers left a convenience store, they were asked the number of
items they had purchased. Construct an ungrouped frequency
distribution for the data. These data will be used in Exercise 21.

21. Construct a histogram, a frequency polygon, and an ogive for the data
in Exercise 20.

22. Coal Consumption The following data represent the energy
consumption of coal (in billions of Btu) by each of the 50 states and the
District of Columbia. Use the data to construct a frequency distribution
and a relative frequency distribution with 7 classes.

Source: Time Almanac.

23. Construct a histogram, frequency polygon, and ogive for the data in
Exercise 22. Analyze the histogram.

24. Recycled Trash Construct a Pareto chart and a horizontal bar graph for
the number of tons (in millions) of trash recycled per year by
Americans based on an Environmental Protection Agency study.

Source: USA TODAY.

25. Identity Thefts The results of a survey of 84 people whose identities
were stolen using various methods are shown. Draw a pie chart for the
information.

Source: Javelin Strategy and Research.

26. Needless Deaths of Children The New England Journal of Medicine
predicted the number of needless deaths due to childhood obesity.
Draw a time series graph for the data.

27. Museum Visitors The number of visitors to the Historic Museum for
25 randomly selected hours is shown. Construct a stem and leaf plot
for the data.

28. Parking Meter Revenue In a small city the number of quarters
collected from the parking meters is shown. Construct a dotplot for the
data.

29. Water Usage The graph shows the average number of gallons of water
a person uses for various activities. Can you see anything misleading
about the way the graph is drawn?

Critical Thinking Challenges


1. The Great Lakes Shown are various statistics about the Great Lakes.
Answer the following questions and decide which graph would best
illustrate the answers.
a. What is the volume of the largest of the Great Lakes?
b. What is the total U.S. shoreline of the Great Lakes?
c. Which is the widest lake?
d. What percentage of the total area of the Great Lakes are Lake
Superior and Lake Michigan combined?
e. Which is the deepest of the Great Lakes? Which is the shallowest?
f. What is the relationship between the areas of Lake Ontario and Lake
Superior?
g. Which is the smallest of the Great Lakes? What did you use to reach
this conclusion?

Source: The World Almanac and Book of Facts.

2. Road Rage Here are some statistics on road rage. If you are giving a
presentation on this subject, what type of graph could you use to
emphasize the given statistics? Draw the graph.
Road rage is a serious and dangerous problem that today’s drivers have
to deal with almost daily. Road rage can lead to serious consequences
and even death.
A recent study by the American Automobile Association (AAA)
reported the following information based on the responses of millions
of drivers. The most common responses of drivers who exhibit road
rage are
Tailgating (104 million or 51%)
Yelling at another driver (95 million or 47%)
Honking the horn (91 million or 45%)
Making angry gestures (67 million or 33%)
Blocking another vehicle on purpose (49 million or 24%)
Cutting off another vehicle (24 million or 12%)
Getting out of the automobile and confronting the other driver (7.6
million or 4%)
Deliberately ramming another vehicle (5.7 million or 3%)
Some causes for road rage are
Ignoring other drivers because of cell phone use
Keeping high beams on regardless of traffic conditions
Failing to use turn signals when changing lanes
Forgetting to check the blind spot when changing lanes
Causing other drivers to change their speed or use their brakes
The AAA suggests that drivers be tolerant and forgiving of other
drivers and never respond to other drivers’ road rage actions.
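
One possible way to start on the road rage exercise is to order the responses and check what the percents add up to. The Python sketch below is only an illustration: the counts and percents are the ones quoted above, while the shortened label for the "confronting" response and the helper name `text_bar_graph` are choices made for this example. Because the percents total well over 100%, the categories presumably overlap (one driver may report several behaviors), which would point toward a bar graph or Pareto chart rather than a pie graph.

```python
# Road rage responses from the AAA study quoted above:
# label -> (millions of drivers, percent of drivers).
responses = {
    "Tailgating": (104, 51),
    "Yelling at another driver": (95, 47),
    "Honking the horn": (91, 45),
    "Making angry gestures": (67, 33),
    "Blocking another vehicle on purpose": (49, 24),
    "Cutting off another vehicle": (24, 12),
    "Confronting the other driver": (7.6, 4),  # label shortened for this sketch
    "Deliberately ramming another vehicle": (5.7, 3),
}

# If the categories were mutually exclusive, this would be about 100.
total_percent = sum(pct for _, pct in responses.values())


def text_bar_graph(data):
    """Build one text bar per category, largest percent first, one '#' per percent."""
    ordered = sorted(data.items(), key=lambda kv: kv[1][1], reverse=True)
    return [f"{name:<38}{'#' * pct} {pct}%" for name, (_, pct) in ordered]


bars = text_bar_graph(responses)
print("\n".join(bars))
print("Sum of percents:", total_percent)
```

The text bars are a stand-in for a horizontal bar graph; the same ordered data could be passed to any graphing tool.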

Data Projects
Where appropriate, use the TI-84 Plus, Excel, MINITAB, or a
computer program of your choice to complete the following exercises.
1. Business and Finance Consider the 30 stocks listed as the Dow Jones
Industrials. For each, find its earnings per share. Randomly select 30
stocks traded on the NASDAQ. For each, find its earnings per share.
Create a frequency table with 5 categories for each data set. Sketch a
histogram for each. How do the two data sets compare?
2. Sports and Leisure Use systematic sampling to create a sample of 25
National League and 25 American League baseball players from the
most recently completed season. Find the number of home runs for
each player. Create a frequency table with 5 categories for each data
set. Sketch a histogram for each. How do the two leagues compare?
3. Technology Randomly select 50 songs from your music player or
music organization program. Find the length (in seconds) for each
song. Use these data to create a frequency table with 6 categories.
Sketch a frequency polygon for the frequency table. Is the shape of the
distribution of times uniform, skewed, or bell-shaped? Also note the
genre of each song. Create a Pareto chart showing the frequencies of
the various categories. Finally, note the year each song was released.
Create a pie chart organized by decade to show the percentage of songs
from various time periods.

4. Health and Wellness Use information from the Red Cross to create a
pie chart depicting the percentages of Americans with
various blood types. Also find information about blood donations and
the percentage of each type donated. How do the charts compare? Why
is the collection of type O blood so important?
5. Politics and Economics Consider the U.S. Electoral College System.
For each of the 50 states, determine the number of delegates received.
Create a frequency table with 8 classes. Is this distribution uniform,
skewed, or bell-shaped?
6. Your Class Have each person in class take their pulse and determine
the heart rate (beats in 1 minute). Use the data to create a frequency
table with 6 classes. Then have everyone in the class do 25 jumping
jacks and immediately take the pulse again after the activity. Create a
frequency table for those data as well. Compare the two results. Are
they similarly distributed? How does the range of scores compare?
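
For the projects above that ask for a frequency table with a fixed number of classes, a computer program can do the tallying. The Python sketch below is one minimal approach under stated assumptions, not a prescribed method: it assumes whole-number data, starts the first class at the lowest value, and takes the class width as the range divided by the number of classes, rounded up. The function name `grouped_frequency_table` and the heart rate values are invented for illustration.

```python
import math


def grouped_frequency_table(values, num_classes):
    """Return (lower limit, upper limit, frequency) rows for whole-number data."""
    low, high = min(values), max(values)
    # Class width: range divided by the number of classes, rounded up.
    # floor(...) + 1 also widens the classes when the division is exact,
    # so the highest value always falls in the last class.
    width = math.floor((high - low) / num_classes) + 1
    rows = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1
        freq = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, freq))
    return rows


# Hypothetical resting heart rates (beats per minute), invented for this sketch.
heart_rates = [62, 70, 75, 68, 80, 91, 66, 72, 74, 85,
               77, 69, 73, 88, 64, 79, 71, 76, 83, 67]

table = grouped_frequency_table(heart_rates, 6)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

Running the same function on the "before" and "after" pulse data would give two tables to compare, although the class limits will differ unless both tables are built from a common lowest value and width.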

Answers to Applying the Concepts


Section 2–1 Ages of Presidents at Inauguration
1. The data were obtained from the population of all Presidents at the
time this text was written.
2. The oldest inauguration age was 78 years old.
3. The youngest inauguration age was 42 years old.
4. Answers will vary. One possible answer is

5. Answers will vary. For the distribution shown, there is a peak at the
49–55 class.
6. The age 78 could be an outlier.
7. The distribution is unimodal with 49–55 being the modal class. It is
somewhat positively skewed.

Section 2–2 Selling Real Estate


1. A histogram of the data gives price ranges and the counts of homes in
each price range. We can also talk about how the data are distributed by
looking at a histogram.
2. A frequency polygon shows increases or decreases in the number of
homes sold around given price values.
3. A cumulative frequency polygon shows the number of homes sold at or
below a given price.
4. The house that sold for $1,350,000 is an extreme value in this data set.
5. Answers will vary. One possible answer is that the histogram displays
the outlier well since there is a gap in the prices of the homes sold.
6. The distribution of the data is skewed to the right.

Section 2–3 Causes of Accidental Deaths in the United States
1. The variables in the graph are the year, cause of death, and number of
deaths in thousands.
2. The cause of death is qualitative, while the year and number of deaths
are quantitative.
3. A time series graph is used here.
4. The motor vehicle accidents showed a slight increase from 1999 to
2007, and then a decrease.
5. The number of deaths due to poisoning and falls is increasing.
6. The number of deaths due to drowning remains about the same over
the years.
7. For 2001, about 44,000 people died in motor vehicle accidents, about
15,000 people died from falls, about 14,000 people died from
poisoning, and about 3000 people died from drowning.
8. In 1999, motor vehicle accidents claimed the most lives, while in 2009,
poisoning claimed the most lives.
9. Around 2002, the numbers of deaths from falls and from poisoning were
about the same.
