Module Part 2 Frequency Distribution and Graphs

Chapter 2 focuses on organizing data through frequency distributions and presenting it using various types of graphs such as histograms, frequency polygons, and ogives. It outlines the importance of categorizing raw data into classes, calculating frequencies, and constructing different types of distributions, including cumulative frequency distributions. The chapter emphasizes the utility of graphical representations in making data more comprehensible and facilitating analysis.

Uploaded by

dyuhniz
Copyright
© All Rights Reserved

CHAPTER 2 FREQUENCY DISTRIBUTION AND GRAPHS

OUTLINE
Introduction
2–1 Organizing Data
2–2 Histograms, Frequency Polygons, and Ogives
2–3 Other Types of Graphs
Summary

OBJECTIVES
1. Organize data using a frequency distribution.
2. Represent data in frequency distributions graphically, using histograms, frequency polygons, and ogives.
3. Represent data using bar graphs, Pareto charts, time series graphs, pie graphs, and dot plots.
4. Draw and interpret a stem and leaf plot.

Introduction
When conducting a statistical study, the researcher must gather data for the particular variable under
study. For example, if a researcher wishes to study the number of people who were bitten by poisonous snakes
in a specific geographic area over the past several years, he or she has to gather the data from various doctors,
hospitals, or health departments.
To describe situations, draw conclusions, or make inferences about events, the researcher must
organize the data in some meaningful way. The most convenient method of organizing data is to construct a
frequency distribution.
After organizing the data, the researcher must present them so they can be understood by those who
will benefit from reading the study. The most useful method of presenting the data is by constructing statistical
charts and graphs. There are many different types of charts and graphs, and each one has a specific purpose.
This chapter explains how to organize data by constructing frequency distributions and how to present
the data by constructing charts and graphs. The charts and graphs illustrated here are histograms, frequency
polygons, ogives, pie graphs, Pareto charts, and time series graphs. A graph that combines the characteristics of
a frequency distribution and a histogram, called a stem and leaf plot, is also explained.

2-1 Organizing Data


Suppose a researcher wished to do a study on the ages of the 50 wealthiest people in the world. The
researcher first would have to get the data on the ages of the people. In this case, these ages are listed in Forbes
Magazine. When the data are in original form, they are called raw data and are listed next.

Since little information can be obtained from looking at raw data, the researcher organizes the data into what is
called a frequency distribution.
A frequency distribution is the organization of raw data in table form, using classes and frequencies.

Each raw data value is placed into a quantitative or qualitative category called a class. The frequency of a class
then is the number of data values contained in a specific class. A frequency distribution is shown for the
preceding data set.

Now some general observations can be made from looking at the frequency distribution. For example, it can be
stated that the majority of the wealthy people in the study are 45 years old or older.
The classes in this distribution are 27–35, 36–44, etc. These values are called class limits. The data
values 27, 28, 29, 30, 31, 32, 33, 34, 35 can be tallied in the first class; 36, 37, 38, 39, 40, 41, 42, 43, 44 in the
second class; and so on. Two types of frequency distributions that are most often used are the categorical
frequency distribution and the grouped frequency distribution. The procedures for constructing these
distributions are shown now.

Categorical Frequency Distributions


The categorical frequency distribution is used for data that can be placed in specific categories, such as
nominal- or ordinal-level data. For example, data such as political affiliation, religious affiliation, or major
field of study would use categorical frequency distributions.
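
The tallying behind a categorical frequency distribution can be sketched in a few lines of Python. This is only an illustration: the list of majors is hypothetical (it does not come from the text), and the helper name categorical_distribution is an assumption made for this sketch.

```python
from collections import Counter

def categorical_distribution(values):
    """Return (category, frequency, percent) rows for nominal or ordinal data."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, f, 100 * f / total) for cat, f in sorted(counts.items())]

# Hypothetical responses (not from the text): major field of study for 10 students.
majors = ["Math", "Biology", "Math", "History", "Biology",
          "Math", "History", "Math", "Biology", "Math"]

for category, frequency, percent in categorical_distribution(majors):
    print(f"{category:<8} {frequency:>2} {percent:5.1f}%")
```

Each row pairs a class (here, a major) with its frequency, and the percent column shows the share of all responses in that class.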
Grouped Frequency Distributions
When the range of the data is large, the data must be grouped into classes that are more than one unit in
width, in what is called a grouped frequency distribution. For example, a distribution of the blood glucose
levels in milligrams per deciliter (mg/dL) for 50 randomly selected college students is shown.
The procedure for constructing the preceding frequency distribution is given in Example 2–2; however, several
things should be noted. In this distribution, the values 58 and 64 of the first class are called class limits. The
lower class limit is 58; it represents the smallest data value that can be included in the class. The upper class
limit is 64; it represents the largest data value that can be included in the class. The numbers in the second
column are called class boundaries. These numbers are used to separate the classes so that there are no gaps in
the frequency distribution. The gaps are due to the limits; for example, there is a gap between 64 and 65.
Students sometimes have difficulty finding class boundaries when given the class limits. The basic rule
of thumb is that the class limits should have the same decimal place value as the data, but the class boundaries
should have one additional place value and end in a 5. For example, if the values in the data set are whole
numbers, such as 59, 68, and 82, the limits for a class might be 58–64, and the boundaries are 57.5–64.5. Find
the boundaries by subtracting 0.5 from 58 (the lower class limit) and adding 0.5 to 64 (the upper class limit).

Lower limit – 0.5 = 58 − 0.5 = 57.5 = lower boundary


Upper limit + 0.5 = 64 + 0.5 = 64.5 = upper boundary

If the data are in tenths, such as 6.2, 7.8, and 12.6, the limits for a class hypothetically might be 7.8–
8.8, and the boundaries for that class would be 7.75–8.85. Find these values by subtracting 0.05 from 7.8 and
adding 0.05 to 8.8. Class boundaries are not always included in frequency distributions; however, they give a
more formal approach to the procedure of organizing data, including the fact that sometimes the data have been
rounded. You should be familiar with boundaries since you may encounter them in a statistical study. Finally,
the class width for a class in a frequency distribution is found by subtracting the lower (or upper) class limit of
one class from the lower (or upper) class limit of the next class. For example, the class width in the preceding
distribution on the distribution of blood glucose levels is 7, found from 65 − 58 = 7. The class width can also
be found by subtracting the lower boundary from the upper boundary for any given class. In this case, 64.5 −
57.5 = 7. Note: Do not subtract the limits of a single class. It will result in an incorrect answer. The researcher
must decide how many classes to use and the width of each class.
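
The boundary and width arithmetic described above can be checked with a short Python sketch. The function names class_boundaries and class_width are assumptions made for illustration. Limits are passed as strings and handled with exact decimals so that values such as 7.75 and 8.85 are not disturbed by floating-point rounding, and both limits of a class are assumed to have the same number of decimal places.

```python
from decimal import Decimal

def class_boundaries(lower_limit, upper_limit):
    """Return (lower boundary, upper boundary) for limits given as strings.

    Half of one unit in the last decimal place of the limits is subtracted
    from the lower limit and added to the upper limit.
    """
    lower, upper = Decimal(lower_limit), Decimal(upper_limit)
    # The exponent is 0 for whole numbers, -1 for tenths, and so on.
    unit = Decimal(1).scaleb(lower.as_tuple().exponent)
    half = unit / 2
    return lower - half, upper + half

def class_width(lower_limit, next_lower_limit):
    """Width = lower limit of the next class minus lower limit of this class."""
    return Decimal(next_lower_limit) - Decimal(lower_limit)

print(class_boundaries("58", "64"))    # boundaries 57.5 and 64.5
print(class_boundaries("7.8", "8.8"))  # boundaries 7.75 and 8.85
print(class_width("58", "65"))         # width 7, from the limits of adjacent classes

low, high = class_boundaries("58", "64")
print(high - low)                      # the same width, from the boundaries of one class
```

Subtracting the limits of a single class (64 − 58 = 6) would give the wrong width, which is why the sketch uses either adjacent lower limits or the boundaries.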
To construct a frequency distribution, follow these rules:
1. There should be between 5 and 20 classes. Although there is no hard-and-fast rule for the number
of classes contained in a frequency distribution, it is of utmost importance to have enough classes
to present a clear description of the collected data.
2. It is preferable but not absolutely necessary that the class width be an odd number. This ensures
that the midpoint of each class has the same place value as the data. The class midpoint Xm is
obtained by adding the lower and upper boundaries and dividing by 2, or adding the lower and
upper limits and dividing by 2:

Xm = (lower boundary + upper boundary) / 2   or   Xm = (lower limit + upper limit) / 2

The midpoint is the numeric location of the center of the class. Midpoints are necessary for
graphing (see Section 2–2). If the class width is an even number, the midpoint is in tenths. For
example, if the class width is 6 and the boundaries are 5.5 and 11.5, the midpoint is

(5.5 + 11.5) / 2 = 17 / 2 = 8.5

Rule 2 is only a suggestion, and it is not rigorously followed, especially when a computer is used
to group data.
3. The classes must be mutually exclusive. Mutually exclusive classes have nonoverlapping class
limits so that data cannot be placed into two classes. Many times, frequency distributions such as
this
Age
10–20
20–30
30–40
40–50

are found in the literature or in surveys. If a person is 40 years old, into which class should she or
he be placed? A better way to construct a frequency distribution is to use classes such as
Age
10–20
21–31
32–42
43–53
Recall that boundaries are mutually exclusive. For example, when a class boundary is 5.5 to 10.5,
the data values that are included in that class are values from 6 to 10. A data value of 5 goes into
the previous class, and a data value of 11 goes into the next-higher class.
4. The classes must be continuous. Even if there are no values in a class, the class must be included in
the frequency distribution. There should be no gaps in a frequency distribution. The only exception
occurs when the class with a zero frequency is the first or last class. A class with a zero frequency
at either end can be omitted without affecting the distribution.
5. The classes must be exhaustive. There should be enough classes to accommodate all the data.

6. The classes must be equal in width. This avoids a distorted view of the data. One exception occurs
when a distribution has a class that is open-ended. That is, the first class has no specific lower
limit, or the last class has no specific upper limit. A frequency distribution with an open-ended
class is called an open-ended distribution. Here are two examples of distributions with open-
ended classes.

The frequency distribution for age is open-ended for the last class, which means that anybody who is 54 years
or older will be tallied in the last class. The distribution for minutes is open-ended for the first class, meaning
that any minute values below 110 will be tallied in that class.
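
Some of the rules above lend themselves to a quick mechanical check. The following Python sketch tests a set of class limits for overlap (rule 3) and for equal width (rule 6), and computes a class midpoint (rule 2). The function names are assumptions made for illustration, and open-ended classes are not handled.

```python
def midpoint(lower, upper):
    """Class midpoint: (lower + upper) / 2, using boundaries or limits."""
    return (lower + upper) / 2

def overlapping_classes(classes):
    """Return pairs of adjacent classes whose limits overlap (rule 3)."""
    ordered = sorted(classes)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] <= a[1]]

def equal_width(classes):
    """True when each class starts the same distance after the previous one (rule 6)."""
    ordered = sorted(classes)
    widths = {b[0] - a[0] for a, b in zip(ordered, ordered[1:])}
    return len(widths) <= 1

survey = [(10, 20), (20, 30), (30, 40), (40, 50)]   # overlapping classes from rule 3
better = [(10, 20), (21, 31), (32, 42), (43, 53)]   # the suggested replacement

print(overlapping_classes(survey))  # every adjacent pair shares a limit
print(overlapping_classes(better))  # []
print(equal_width(better))          # True
print(midpoint(5.5, 11.5))          # 8.5
```

With the overlapping classes, a 40-year-old could be tallied in either 30–40 or 40–50, and the check reports exactly those shared limits.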

The steps for constructing a grouped frequency distribution are summarized in the following Procedure Table.

Example 2–2 shows the procedure for constructing a grouped frequency distribution, i.e., when the classes
contain more than one data value.
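
One common way to carry out such a construction can be sketched in Python. The steps used here (find the range, divide by the chosen number of classes, round the width up, start at the lowest value, then tally) are an assumption about the procedure, the data values are hypothetical, and the function name grouped_distribution is invented for this sketch. It handles whole-number data only.

```python
import math

def grouped_distribution(data, num_classes):
    """Build (lower limit, upper limit, frequency) rows for whole-number data."""
    low, high = min(data), max(data)
    # Width: range divided by the number of classes, rounded up to a whole number.
    width = math.ceil((high - low) / num_classes)
    # If the range divides evenly, the highest value would fall just outside
    # the last class, so widen each class by one unit.
    if low + width * num_classes <= high:
        width += 1
    rows = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1   # the next class starts one unit higher
        frequency = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, frequency))
    return rows

# Hypothetical whole-number data, invented for illustration only.
values = [58, 61, 64, 65, 66, 70, 71, 72, 72, 75, 79, 80, 83, 88, 92]
for lower, upper, frequency in grouped_distribution(values, 5):
    print(f"{lower}-{upper}: {frequency}")
```

For these values the range is 92 − 58 = 34, so five classes need a width of 7, giving the classes 58–64, 65–71, and so on. The frequencies add up to the number of data values, which is a useful check on the tally.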

Sometimes it is necessary to use a cumulative frequency distribution. A cumulative frequency
distribution is a distribution that shows the number of data values less than or equal to a specific value
(usually an upper boundary). The values are found by adding the frequencies of the classes less than or
equal to the upper class boundary of a specific class. This gives an ascending cumulative frequency. In
this example, the cumulative frequency for the first class is 0 + 2 = 2; for the second class it is 0 + 2 +
8 = 10; for the third class it is 0 + 2 + 8 + 18 = 28. Naturally, a shorter way to do this would be to just
add the cumulative frequency of the class below to the frequency of the given class. For example, the
cumulative frequency for the number of data values less than 114.5 can be found by adding 10 + 18 =
28. The cumulative frequency distribution for the data in this example is as follows:
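
The running total described above can be reproduced with a short Python sketch. Only the frequencies of the first three classes quoted in the text (2, 8, and 18) are used; the variable names are assumptions made for illustration.

```python
from itertools import accumulate

# Frequencies of the first three classes, as quoted in the text.
frequencies = [2, 8, 18]

# Each cumulative frequency is the previous cumulative frequency plus the
# frequency of the current class.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [2, 10, 28]
```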

Cumulative frequencies are used to show how many data values are accumulated up to and including a
specific class. In Example 2–2, 28 of the total record high temperatures are less than or equal to 114°F.
Forty-eight of the total record high temperatures are less than or equal to 124°F.
After the raw data have been organized into a frequency distribution, the distribution is analyzed by
looking for peaks and extreme values. The peaks show which class or classes have the most data values
compared to the other classes. Extreme values, called outliers, are data values that are extremely large or
small relative to the other data values.
When the range of the data values is relatively small, a frequency distribution can be
constructed using single data values for each class. This type of distribution is called an ungrouped
frequency distribution and is shown next.


All the different types of distributions are used in statistics and are helpful when one is organizing and
presenting data. The reasons for constructing a frequency distribution are as follows:
1. To organize the data in a meaningful, intelligible way.
2. To enable the reader to determine the nature or shape of the distribution.
3. To facilitate computational procedures for measures of average and spread.
4. To enable the researcher to draw charts and graphs for the presentation of data.
5. To enable the reader to make comparisons among different data sets.
The factors used to analyze a frequency distribution are essentially the same as those used to analyze
histograms and frequency polygons, which are shown in Section 2–2.

Applying the Concepts

Ages of Presidents at Inauguration


The data represent the ages of all the presidents of a certain country at the time they were first
inaugurated.

1. Were the data obtained from a population or a sample? Explain your answer.
2. What was the age of the oldest President?
3. What was the age of the youngest President?
4. Construct a frequency distribution for the data. (Use your own judgment as to the number of classes
and class size.)
5. Are there any peaks in the distribution?
6. Identify any possible outliers.
7. Write a brief summary of the nature of the data as shown in the frequency distribution.

2-2 Histograms, Frequency Polygons, and Ogives

After you have organized the data into a frequency distribution, you can present them in graphical
form. The purpose of graphs in statistics is to convey the data to the viewers in pictorial form. It is easier for
most people to comprehend the meaning of data presented graphically than data presented numerically in tables
or frequency distributions. This is especially true if the users have little or no statistical knowledge.
Statistical graphs can be used to describe the data set or to analyze it. Graphs are also useful in getting
the audience’s attention in a publication or a speaking presentation. They can be used to discuss an issue,
reinforce a critical point, or summarize a data set. They can also be used to discover a trend or pattern in a
situation over a period of time.
The three most commonly used graphs in research are
1. The histogram.
2. The frequency polygon.
3. The cumulative frequency graph, or ogive (pronounced o-jive).
The steps for constructing the histogram, frequency polygon, and the ogive are summarized in the
procedure table.

The Histogram
The histogram is a graph that displays the data by using contiguous vertical bars (unless the frequency of a
class is 0) of various heights to represent the frequencies of the classes.
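
As a rough illustration, a histogram can be approximated in plain text with Python. This sketch draws horizontal rather than vertical bars for simplicity, and uses only the first three classes of the record high temperature data as inferred from the figures quoted in this chapter (frequencies 2, 8, and 18, with boundaries 5 units apart). The function name text_histogram is an assumption.

```python
def text_histogram(boundaries, frequencies):
    """Return one line per class: the class boundaries followed by a bar of '*'."""
    return [f"{lo:>5}-{hi:<5} {'*' * f}"
            for (lo, hi), f in zip(boundaries, frequencies)]

# First three classes only, inferred from figures quoted in the text.
boundaries = [(99.5, 104.5), (104.5, 109.5), (109.5, 114.5)]
frequencies = [2, 8, 18]

lines = text_histogram(boundaries, frequencies)
for line in lines:
    print(line)
```

Because each class begins at the boundary where the previous one ends, the bars are contiguous, and the length of each bar represents the frequency of its class.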
The Frequency Polygon
Another way to represent the same data set is by using a frequency polygon.

The frequency polygon is a graph that displays the data by using lines that connect points plotted for the
frequencies at the midpoints of the classes. The frequencies are represented by the heights of the points.

Example 2–5 shows the procedure for constructing a frequency polygon. Be sure to begin and end on the
x-axis.
The frequency polygon and the histogram are two different ways to represent the same data set. The choice of
which one to use is left to the discretion of the researcher.
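
The points of a frequency polygon can be computed with a short Python sketch. The class boundaries and frequencies below are hypothetical, and the function name polygon_points is an assumption. The sketch closes the polygon on the x axis by adding a zero-frequency point one class width before the first midpoint and one class width after the last, which assumes equal class widths.

```python
def polygon_points(boundaries, frequencies):
    """Return (x, y) points: class midpoints paired with frequencies,
    closed on the x axis one class width before and after the data."""
    mids = [(lo + hi) / 2 for lo, hi in boundaries]
    width = boundaries[0][1] - boundaries[0][0]
    points = [(mids[0] - width, 0)]
    points += list(zip(mids, frequencies))
    points.append((mids[-1] + width, 0))
    return points

# Hypothetical distribution, invented for illustration.
boundaries = [(0.5, 5.5), (5.5, 10.5), (10.5, 15.5)]
frequencies = [3, 7, 4]
print(polygon_points(boundaries, frequencies))
```

Connecting these points in order with straight lines gives the polygon; the heights of the interior points are the class frequencies.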

The Ogive
The third type of graph that can be used represents the cumulative frequencies for the classes. This type of
graph is called the cumulative frequency graph, or ogive. The cumulative frequency is the sum of the
frequencies accumulated up to the upper boundary of a class in the distribution.

The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.

Example 2–6 shows the procedure for constructing an ogive. Be sure to start on the x-axis.
Cumulative frequency graphs are used to visually represent how many values are below a certain
upper class boundary. For example, to find out how many record high temperatures are less than 114.5°F,
locate 114.5°F on the x-axis, draw a vertical line up until it intersects the graph, and then draw a horizontal
line at that point to the y-axis. The y-axis value is 28, as shown in Figure 2–5.
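
The points plotted on an ogive can be sketched in Python. The example uses only the first three classes of the record high temperature data, as inferred from the figures quoted in this chapter (frequencies 2, 8, and 18, with boundaries 5 units apart starting at 99.5), so it covers only the lower part of the graph. The function name ogive_points is an assumption.

```python
from itertools import accumulate

def ogive_points(lower_boundary, width, frequencies):
    """Return (boundary, cumulative frequency) points, starting on the
    x axis at the lowest lower boundary with a cumulative frequency of 0."""
    cumulative = [0] + list(accumulate(frequencies))
    return [(lower_boundary + i * width, c) for i, c in enumerate(cumulative)]

# First three classes only, inferred from figures quoted in the text.
points = ogive_points(99.5, 5, [2, 8, 18])
print(points)

lookup = dict(points)
print(lookup[114.5])  # 28 values are less than 114.5
```

Reading the graph at an upper class boundary is the same as looking up that boundary in the list of points.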

Relative Frequency Graphs


The histogram, the frequency polygon, and the ogive shown previously were constructed by using
frequencies in terms of the raw data. These distributions can be converted to distributions using proportions
instead of raw data as frequencies. These types of graphs are called relative frequency graphs.
Graphs of relative frequencies instead of frequencies are used when the proportion of data values
that fall into a given class is more important than the actual number of data values that fall into that class.
For example, if you wanted to compare the age distribution of adults in Philadelphia, Pennsylvania, with the
age distribution of adults of Erie, Pennsylvania, you would use relative frequency distributions. The reason is
that since the population of Philadelphia is 1,526,006 and the population of Erie is 101,786, the bars using
the actual data values for Philadelphia would be much taller than those for the same classes for Erie.
To convert a frequency into a proportion or relative frequency, divide the frequency for each class
by the total of the frequencies. The sum of the relative frequencies will always be 1. These graphs are similar
to the ones that use raw data as frequencies, but the values on the y axis are in terms of proportions. Example
2–7 shows the three types of relative frequency graphs.
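
The conversion to relative frequencies can be sketched in Python. The class frequencies here are hypothetical, and the function name relative_frequencies is an assumption. Exact fractions are used so that the sum is exactly 1; when the proportions are rounded to a few decimal places, the sum may differ slightly from 1.

```python
from fractions import Fraction

def relative_frequencies(frequencies):
    """Divide each class frequency by the total of the frequencies."""
    total = sum(frequencies)
    return [Fraction(f, total) for f in frequencies]

# Hypothetical class frequencies, invented for illustration.
freqs = [1, 2, 3, 5, 4, 3, 2]
rel = relative_frequencies(freqs)

print([float(r) for r in rel])
print(sum(rel) == 1)  # True: the exact fractions sum to 1
```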
Distribution Shapes
When one is describing data, it is important to be able to recognize the shapes of the distribution values. In
later chapters, you will see that the shape of a distribution also determines the appropriate statistical methods
used to analyze the data.
A distribution can have many shapes, and one method of analyzing a distribution is to draw a
histogram or frequency polygon for the distribution. Several of the most common shapes are shown in Figure
2–7: the bell-shaped or mound-shaped, the uniform-shaped, the J-shaped, the reverse J-shaped, the
positively or right-skewed shape, the negatively or left-skewed shape, the bimodal-shaped, and the U-shaped.
Distributions are most often not perfectly shaped, so it is not necessary to have an exact shape but
rather to identify an overall pattern.
A bell-shaped distribution shown in Figure 2–7(a) has a single peak and tapers off at either end. It is
approximately symmetric; i.e., it is roughly the same on both sides of a line running through the center.
A uniform distribution is basically flat or rectangular. See Figure 2–7(b).
A J-shaped distribution is shown in Figure 2–7(c), and it has a few data values on the left side and
increases as one moves to the right. A reverse J-shaped distribution is the opposite of the J-shaped
distribution. See Figure 2–7(d).
When the peak of a distribution is to the left and the data values taper off to the right, a distribution
is said to be positively or right-skewed. See Figure 2–7(e). When the data values are clustered to the right and
taper off to the left, a distribution is said to be negatively or left-skewed. See Figure 2–7(f). Skewness will be
explained in detail in Chapter 3. Distributions with one peak, such as those shown in Figure 2–7(a), (e),
and (f), are said to be unimodal. (The highest peak of a distribution indicates where the mode of the data
values is. The mode is the data value that occurs more often than any other data value. Modes are explained
in Chapter 3.) When a distribution has two peaks of the same height, it is said to be bimodal. See Figure 2–
7(g). Finally, the graph shown in Figure 2–7(h) is a U-shaped distribution.
Distributions can have other shapes in addition to the ones shown here; however, these are some of
the more common ones that you will encounter in analyzing data.
When you are analyzing histograms and frequency polygons, look at the shape of the curve. For
example, does it have one peak or two peaks? Is it relatively flat, or is it U-shaped? Are the data values
spread out on the graph, or are they clustered around the center? Are there data values in the extreme ends?
These may be outliers. (See Section 3–3 for an explanation of outliers.) Are there any gaps in the histogram,
or does the frequency polygon touch the x axis somewhere other than at the ends? Finally, are the data
clustered at one end or the other, indicating a skewed distribution? For example, the histogram for the record
high temperatures in Figure 2–1 shows a single peaked distribution, with the class 109.5–114.5 containing
the largest number of temperatures. The distribution has no gaps, and there are fewer temperatures in the
highest class than in the lowest class.

Applying the Concepts

Selling Real Estate


Assume you are a realtor in Bradenton, Florida. You have recently obtained a listing of the selling prices of
the homes that have sold in that area in the last 6 months. You wish to organize those data so you will be
able to provide potential buyers with useful information. Use the following data to create a histogram,
frequency polygon, and cumulative frequency polygon.
142,000 127,000 99,600 162,000 89,000 93,000 99,500
73,800 135,000 119,500 67,900 156,300 104,500 108,650
123,000 91,000 205,000 110,000 156,300 104,000 133,900
179,000 112,000 147,000 321,550 87,900 88,400 180,000
159,400 205,300 144,400 163,000 96,000 81,000 131,000
114,000 119,600 93,000 123,000 187,000 96,000 80,000
231,000 189,500 177,600 83,400 77,000 132,300 166,000
1. What questions could be answered more easily by looking at the histogram rather than the listing of home
prices?
2. What different questions could be answered more easily by looking at the frequency polygon rather than
the listing of home prices?
3. What different questions could be answered more easily by looking at the cumulative frequency polygon
rather than the listing of home prices?
4. Are there any extremely large or extremely small data values compared to the other data values?
5. Which graph displays these extremes the best?
6. Is the distribution skewed?

2-3 Other Types of Graphs

In addition to the histogram, the frequency polygon, and the ogive, several other types of graphs are often
used in statistics. They are the bar graph, Pareto chart, time series graph, pie graph, and the dotplot. Figure
2–8 shows an example of each type of graph.
Bar Graphs
When the data are qualitative or categorical, bar graphs can be used to represent the data.

A bar graph can be drawn using either horizontal or vertical bars. A bar graph represents the data by using
vertical or horizontal bars whose heights or lengths represent the frequencies of the data.

Bar graphs can also be used to compare data for two or more groups. These types of bar graphs are called
compound bar graphs. Consider the following data for the number (in millions) of never married adults in
the United States.
Figure 2–10 shows a bar graph that compares the number of never married males with the number
of never married females for the years shown. The comparison is made by placing the bars next to each other
for the specific years. The heights of the bars can be compared. This graph shows that there have consistently
been more never married males than never married females and that the difference in the two groups has
increased slightly over the last 50 years.

Pareto Charts
When the variable displayed on the horizontal axis is qualitative or categorical, a Pareto chart can also be
used to represent the data.

A Pareto chart is used to represent a frequency distribution for a categorical variable, and the frequencies
are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
When you analyze a Pareto chart, make comparisons by looking at the heights of the bars.
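
The ordering step that distinguishes a Pareto chart from an ordinary bar graph can be sketched in Python. The categories and counts are hypothetical, the function name pareto_order is an assumption, and the bars are drawn horizontally in text for simplicity.

```python
def pareto_order(frequencies):
    """Sort (category, frequency) pairs from highest to lowest frequency."""
    return sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

# Hypothetical categorical data, invented for illustration.
complaints = {"Late delivery": 12, "Damaged item": 30, "Wrong item": 7, "Billing": 18}

for category, count in pareto_order(complaints):
    print(f"{category:<14} {'#' * count} {count}")
```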

The Time Series Graph


When data are collected over a period of time, they can be represented by a time series graph.

A time series graph represents data that occur over a specific period of time.

Example 2–10 shows the procedure for constructing a time series graph.
When you analyze a time series graph, look for a trend or pattern that occurs over the time period. For
example, is the line ascending (indicating an increase over time) or descending (indicating a decrease over
time)? Another thing to look for is the slope, or steepness, of the line. A line that is steep over a specific time
period indicates a rapid increase or decrease over that period.
Two or more data sets can be compared on the same graph, called a compound time series graph, if
two or more lines are used, as shown in Figure 2–13. This graph shows the percentage of elderly males and
females in the U.S. labor force from 1960 to 2010. It shows that the percentage of elderly men decreased
significantly from 1960 to 1990 and then increased slightly after that. For the elderly females, the percentage
decreased slightly from 1960 to 1980 and then increased from 1980 to 2010.

The Pie Graph


Pie graphs are used extensively in statistics. The purpose of the pie graph is to show the relationship of the
parts to the whole by visually comparing the sizes of the sections. Percentages or proportions can be used.
The variable is nominal or categorical.

A pie graph is a circle that is divided into sections or wedges according to the percentage of frequencies in
each category of the distribution.

Example 2–11 shows the procedure for constructing a pie graph.


To analyze the nature of the data shown in the pie graph, look at the size of the sections in the pie
graph. For example, are any sections relatively large compared to the rest? Figure 2–15 shows that the
numbers of calls for the three shifts are about equal, although slightly more calls were received on the evening
shift.
Note: Computer programs can construct pie graphs easily, so the mathematics shown here would
only be used if those programs were not available.
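
The arithmetic behind a pie graph can be sketched in Python. Since a circle has 360°, each section's angle is its frequency divided by the total, multiplied by 360; the percentage is the same fraction multiplied by 100. The categories and counts are hypothetical, and the function name pie_sections is an assumption.

```python
def pie_sections(frequencies):
    """Return {category: (degrees, percent)} for a pie graph.

    degrees = f / n * 360 and percent = f / n * 100, where n is the total.
    """
    n = sum(frequencies.values())
    return {cat: (f / n * 360, f / n * 100) for cat, f in frequencies.items()}

# Hypothetical counts, invented for illustration.
counts = {"Category A": 25, "Category B": 50, "Category C": 25}

for category, (degrees, percent) in pie_sections(counts).items():
    print(f"{category}: {degrees:.0f} degrees, {percent:.0f}%")
```

The angles should total 360° and the percentages 100%, apart from small rounding differences.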

Dotplots
A dotplot uses points or dots to represent the data values. If the data values occur more than once, the
corresponding points are plotted above one another.

A dotplot is a statistical graph in which each data value is plotted as a point (dot) above the horizontal axis.
Dotplots are used to show how the data values are distributed and to see if there are any extremely high or
low data values.
Stem and Leaf Plots
The stem and leaf plot is a method of organizing data and is a combination of sorting and graphing. It has the
advantage over a grouped frequency distribution of retaining the actual data while showing them in graphical
form.

A stem and leaf plot is a data plot that uses part of the data value as the stem and part of the data value as
the leaf to form groups or classes. For example, a data value of 34 would have 3 as the stem and 4 as the leaf.
A data value of 356 would have 35 as the stem and 6 as the leaf.

Example 2–14 shows the procedure for constructing a stem and leaf plot.
Figure 2–17 shows that the distribution peaks in the center and that there are no gaps in the data. For 7 of the
20 days, the number of patients receiving cardiograms was between 31 and 36. The plot also shows that the
testing center treated from a minimum of 2 patients to a maximum of 57 patients in any one day.
If there are no data values in a class, you should write the stem number and leave the leaf row
blank. Do not put a zero in the leaf row.
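
The stem and leaf construction can be sketched in Python for non-negative whole numbers. The data values are hypothetical, and the function name stem_and_leaf is an assumption. The last digit is taken as the leaf and the leading digits as the stem, and a stem with no data values is kept with an empty leaf row.

```python
def stem_and_leaf(data):
    """Return {stem: sorted leaves}; the last digit is the leaf and the
    leading digits are the stem. Stems with no data keep an empty row."""
    stems = {}
    for value in sorted(data):
        stems.setdefault(value // 10, []).append(value % 10)
    # Include every stem between the smallest and largest, even if empty.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}

# Hypothetical whole-number data, invented for illustration.
values = [34, 2, 57, 31, 36, 13, 32, 51, 14, 33]

for stem, leaves in stem_and_leaf(values).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

In this sketch the stems 2 and 4 have no data values, so their rows are printed with the stem number and no leaves.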
When you analyze a stem and leaf plot, look for peaks and gaps in the distribution. See if the distribution is
symmetric or skewed. Check the variability of the data by looking at the spread.
Related distributions can be compared by using a back-to-back stem and leaf plot. The back-to-back
stem and leaf plot uses the same digits for the stems of both distributions, but the digits that are used for the
leaves are arranged in order out from the stems on both sides. Example 2–16 shows a back-to-back stem and
leaf plot.
Stem and leaf plots are part of the techniques called exploratory data analysis. More information on this
topic is presented in Chapter 3.

Misleading Graphs
Graphs give a visual representation that enables readers to analyze and interpret data more easily than they
could simply by looking at numbers. However, inappropriately drawn graphs can misrepresent the data and
lead the reader to false conclusions. For example, a car manufacturer’s ad stated that 98% of the vehicles it
had sold in the past 10 years were still on the road. The ad then showed a graph similar to the one in
Figure 2–20. The graph shows the percentage of the manufacturer’s automobiles still on the road and the
percentage of its competitors’ automobiles still on the road. Is there a large difference? Not necessarily.
Notice the scale on the vertical axis in Figure 2–20. It has been cut off (or truncated) and starts at
95%. When the graph is redrawn using a scale that goes from 0 to 100%, as in Figure 2–21, there is hardly a
noticeable difference in the percentages. Thus, changing the units at the starting point on the y axis can
convey a very different visual representation of the data.
It is not wrong to truncate an axis of the graph; many times it is necessary to do so. However, the reader
should be aware of this fact and interpret the graph accordingly. Do not be misled if an inappropriate
impression is given.
Let us consider another example. The projected required fuel economy in miles per gallon for
General Motors vehicles is shown. In this case, an increase from 21.9 to 23.2 miles per gallon is projected.

When you examine the graph shown in Figure 2–22(a), using a scale of 0 to 25 miles per gallon, the graph
shows a slight increase. However, when the scale is changed to 21 to 24 miles per gallon, the graph shows a
much larger increase even though the data remain the same. See Figure 2–22(b). Again, by changing the
units or starting point on the y axis, one can change the visual representation.
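
The effect of the starting point can be quantified with a short Python sketch. It compares how tall the two values appear above the bottom of the axis under each scale. The function name plotted_height_ratio is an assumption; the values 21.9 and 23.2 and the axis starting points 0 and 21 come from the text.

```python
def plotted_height_ratio(first, second, axis_start):
    """Ratio of the plotted heights of two values when the vertical axis
    begins at axis_start."""
    return (second - axis_start) / (first - axis_start)

print(round(plotted_height_ratio(21.9, 23.2, 0), 2))   # about 1.06
print(round(plotted_height_ratio(21.9, 23.2, 21), 2))  # about 2.44
```

On the full scale the second value looks about 6% taller than the first, but on the truncated scale it looks nearly two and a half times as tall, although the data are identical.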
Another misleading graphing technique sometimes used involves exaggerating a one-dimensional increase
by showing it in two dimensions. For example, the average cost of a 30-second Super Bowl commercial has
increased from $42,000 in 1967 to $4.5 million in 2015 (Source: USA TODAY).
The increase shown by the graph in Figure 2–23(a) represents the change by a comparison of the
heights of the two bars in one dimension. The same data are shown two-dimensionally with circles in Figure
2–23(b). Notice that the difference seems much larger because the eye is comparing the areas of the circles
rather than the lengths of the diameters.
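
The size of this exaggeration can be estimated with a short Python sketch. It assumes the circles are drawn with diameters proportional to the two costs, which may not match the figure exactly. Under that assumption the area ratio is the square of the diameter ratio.

```python
cost_1967 = 42_000
cost_2015 = 4_500_000

# Ratio of bar heights, or of circle diameters if the diameters are
# drawn in proportion to the costs (an assumption).
length_ratio = cost_2015 / cost_1967
# The area of a circle grows with the square of its diameter.
area_ratio = length_ratio ** 2

print(round(length_ratio, 1))  # about 107.1
print(round(area_ratio))       # about 11480
```

The cost rose by a factor of roughly 107, but the larger circle would cover more than 11,000 times the area of the smaller one, which is why the two-dimensional picture overstates the change.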
Note that it is not wrong to use the graphing techniques of truncating the scales or representing
data by two-dimensional pictures. But when these techniques are used, the reader should be cautious of the
conclusion drawn on the basis of the graphs.
Another way to misrepresent data on a graph is by omitting labels or units on the axes of the graph.
The graph shown in Figure 2–24 compares the cost of living,
economic growth, population growth, etc., of four main geographic areas in the United States.
However, since there are no numbers on the y axis, very little information can be gained from this
graph, except a crude ranking of each factor. There is no way to decide the actual magnitude of the
differences.
Finally, all graphs should contain a source for the information presented. The inclusion of a source
for the data will enable you to check the reliability of the organization presenting the data.
Applying the Concepts

Causes of Accidental Deaths in the United States, 1999–2009
The graph shows the number of deaths in the United States due to accidents. Answer the following
questions about the graph.

1. Name the variables used in the graph.


2. Are the variables qualitative or quantitative?
3. What type of graph is used here?
4. Which variable shows a decrease in the number of deaths over the years?
5. Which variable or variables show an increase in the number of deaths over the years?
6. The number of deaths in which variable remains about the same over the years?
7. List the approximate number of deaths for each category for the year 2001.
8. In 1999, which variable accounted for the most deaths? In 2009, which variable accounted for the most
deaths?
9. In what year were the numbers of deaths from poisoning and falls about the same?
SECTION 2 EXERCISES

Determine whether each statement is true or false. If the statement is false, explain why.
1. In the construction of a frequency distribution, it is a good idea to have overlapping class limits, such
as 10–20, 20–30, 30–40.
2. Bar graphs can be drawn by using vertical or horizontal bars.
3. It is not important to keep the width of each class the same in a frequency distribution.
4. Frequency distributions can aid the researcher in drawing charts and graphs.
5. The type of graph used to represent data is determined by the type of data collected and by the
researcher’s purpose.
6. In construction of a frequency polygon, the class limits are used for the x axis.
7. Data collected over a period of time can be graphed by using a pie graph.

Select the best answer.
8. What is another name for the ogive?
a. Histogram b. Frequency polygon c. Cumulative frequency graph d. Pareto chart
9. What are the boundaries for 8.6–8.8?
a. 8–9 b. 8.5–8.9 c. 8.55–8.85 d. 8.65–8.75
10. What graph should be used to show the relationship between the parts and the whole?
a. Histogram b. Pie graph c. Pareto chart d. Ogive
11. Except for rounding errors, relative frequencies should add up to what sum?
a. 0 b. 1 c. 50 d. 100

Complete these statements with the best answers.


12. The three types of frequency distributions are ____, ___, and ___.
13. In a frequency distribution, the number of classes should be between ___ and ___.
14. Data such as blood types (A, B, AB, O) can be organized into a(n) ___ frequency distribution.
15. Data collected over a period of time can be graphed using a(n) ___ graph.
16. A statistical device used in exploratory data analysis that is a combination of a frequency distribution
and a histogram is called a(n) ___.
17. On a Pareto chart, the frequencies should be represented on the ___ axis.
18. Housing Arrangements A questionnaire on housing arrangements showed this information obtained
from 25 respondents. Construct a frequency distribution for the data (H = house, A = apartment, M = mobile
home, C = condominium). These data will be used in Exercise 19.
H C H M H A C A M C M C A
M A C C M C C H A H H M
19. Construct a pie graph for the data in Exercise 18.
20. Items Purchased at a Convenience Store When 30 randomly selected customers left a convenience
store, each was asked the number of items he or she purchased. Construct an ungrouped frequency
distribution for the data. These data will be used in Exercise 21.
2 9 4 3 6
6 2 8 6 5
7 5 3 8 6
6 2 3 2 4
6 9 9 8 9
4 2 1 7 4
21. Construct a histogram, a frequency polygon, and an ogive for the data in Exercise 20.
22. Coal Consumption The following data represent the energy consumption of coal (in billions of Btu)
by each of the 50 states and the District of Columbia. Use the data to construct a frequency distribution and a
relative frequency distribution with 7 classes.
631 723 267 60 372 15 19 92 306 38
413 8 736 156 478 264 1015 329 679 1498
52 1365 142 423 365 350 445 776 1267 0
26 356 173 373 335 34 937 250 33 84
0 253 84 1224 743 582 2 33 0 426
474
Source: Time Almanac.
23. Construct a histogram, frequency polygon, and ogive for the data in Exercise 22. Analyze the histogram.
24. Recycled Trash Construct a Pareto chart and a horizontal bar graph for the number of tons (in millions)
of trash recycled per year by Americans based on an Environmental Protection Agency study.
Type Amount
Paper 320.0
Iron/steel 292.0
Aluminum 276.0
Yard waste 242.4
Glass 196.0
Plastics 41.6
Source: USA TODAY.
25. Identity Thefts The results of a survey of 84 people whose identities were stolen using various methods
are shown. Draw a pie chart for the information.
Lost or stolen wallet, checkbook, or credit card 38
Retail purchases or telephone transactions 15
Stolen mail 9
Computer viruses or hackers 8
Phishing 4
Other 10
Total 84
Source: Javelin Strategy and Research.
26. Needless Deaths of Children The New England Journal of Medicine predicted the number of
needless deaths due to childhood obesity. Draw a time series graph for the data.
Year 2020 2025 2030 2035
Deaths 130 550 1500 3700
27. Museum Visitors The number of visitors to the Historic Museum for 25 randomly selected hours is
shown. Construct a stem and leaf plot for the data.
15 53 48 19 38 86 63 98 79 38
62 89 67 39 26 28 35 54 88 76
31 47 53 41 68
28. Parking Meter Revenue In a small city the number of quarters collected from the parking meters is
shown. Construct a dotplot for the data.
13 12 11 7 16 10 16 15 7 11
3 5 14 3 6 8 3 10 9 3
5 7 8 9 9 9 2 6 4 11
7 4 2 8 10 7 17 4 11 8
2 5 5 14 6 3 9 3 12 3
29. Water Usage The graph shows the average number of gallons of water a person uses for various
activities. Can you see anything misleading about the way the graph is drawn?
