Lesson 2 Frequency Distributions


FT T319

Applied Statistics

LESSON 2: Frequency Distributions

Introduction
When conducting a statistical study, the researcher must gather data for the particular variable
under study. For example, if a researcher wishes to study the number of people who were bitten by
poisonous snakes in a specific geographic area over the past several years, he or she has to gather the
data from various doctors, hospitals, or health departments.
To describe situations, draw conclusions, or make inferences about events, the researcher must
organize the data in some meaningful way. The most convenient method of organizing data is to
construct a frequency distribution.
After organizing the data, the researcher must present them so they can be understood by
those who will benefit from reading the study. The most useful method of presenting the data is by
constructing statistical charts and graphs. There are many different types of charts and graphs, and each
one has a specific purpose.

Organizing Data
Wealthy People
Suppose a researcher wished to do a study on the ages of the top 50 wealthiest people in the
world. The researcher first would have to get the data on the ages of the people. In this case, these ages
are listed in Forbes Magazine. When the data are in original form, they are called raw data and are listed
next.

Since little information can be obtained from looking at raw data, the researcher organizes the
data into what is called a frequency distribution. A frequency distribution consists of classes and their
corresponding frequencies. Each raw data value is placed into a quantitative or qualitative category
called a class. The frequency of a class then is the number of data values contained in a specific class. A
frequency distribution is shown for the preceding data set.
Now some general observations can be made from looking at the frequency distribution. For
example, it can be stated that the majority of the wealthy people in the study are over 55 years old.

A frequency distribution is the organization of raw data in table form, using classes and
frequencies.
The classes in this distribution are 35–41, 42–48, etc. These values are called class limits. The
data values 35, 36, 37, 38, 39, 40, 41 can be tallied in the first class; 42, 43, 44, 45, 46, 47, 48 in the
second class; and so on.

Two Types of Frequency Distributions


*Categorical frequency distribution
*Grouped frequency distribution

A categorical frequency distribution is used for data that can be placed in specific categories, such as
nominal- or ordinal-level data. For example, data such as political affiliation, religious affiliation, or
major field of study would use categorical frequency distributions.

Example:
Distribution of Blood Types
Twenty-five army inductees were given a blood test to determine their blood type. The data set
is

Construct a frequency distribution for the data.

Solution
Since the data are categorical, discrete classes can be used. There are four blood types: A, B, O,
and AB. These types will be used as the classes for the distribution. The procedure for constructing a
frequency distribution for categorical data is given next.

Step 1 Make a table as shown.

Step 2 Tally the data and place the results in column B.


Step 3 Count the tallies and place the results in column C.
Step 4 Find the percentage of values in each class by using the formula:
$$\% = \frac{f}{n} \cdot 100$$
where f = frequency of the class and n = total number of values.


For example, in the class of type A blood, the percentage is:

Percentages are not normally part of a frequency distribution, but they can be added since they
are used in certain types of graphs such as pie graphs. Also, the decimal equivalent of a percent is called
a relative frequency.

Step 5 Find the totals for columns C (frequency) and D (percent). The completed table is shown.

For the sample, more people have type O blood than any other type.
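The five steps above can be sketched in Python. This is a minimal illustration, not part of the original lesson; the function name `categorical_distribution` and the ten blood types in `sample` are hypothetical, since the inductees' data set is not reproduced here.

```python
from collections import Counter

def categorical_distribution(values, categories):
    """Return (class, frequency, percent) rows, one per category in the given order."""
    counts = Counter(values)              # Steps 2-3: tally and count each category
    n = len(values)                       # n = total number of values
    rows = []
    for category in categories:
        f = counts[category]              # 0 if the category never occurs
        rows.append((category, f, 100 * f / n))   # Step 4: % = (f / n) * 100
    return rows

# Hypothetical sample of 10 blood types (not the lesson's 25 inductees)
sample = ["A", "O", "B", "O", "AB", "O", "A", "B", "O", "A"]
rows = categorical_distribution(sample, ["A", "B", "O", "AB"])
for category, f, pct in rows:
    print(f"{category:<3}{f:>3}{pct:>7.1f}%")
# Step 5: totals for the frequency and percent columns
print("Total", sum(row[1] for row in rows), sum(row[2] for row in rows))
```

Dividing f by n without multiplying by 100 would give the relative frequency instead of the percentage.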

Grouped Frequency Distributions


When the range of the data is large, the data must be grouped into classes that are more than
one unit in width, in what is called a grouped frequency distribution. For example, a distribution of the
number of hours that boat batteries lasted is the following.

In this distribution, the values 24 and 30 of the first class are called class limits. The lower class
limit is 24; it represents the smallest data value that can be included in the class. The upper class limit is
30; it represents the largest data value that can be included in the class. The numbers in the second
column are called class boundaries. These numbers are used to separate the classes so that there are no
gaps in the frequency distribution. The gaps are due to the limits; for example, there is a gap between
30 and 31.
Students sometimes have difficulty finding class boundaries when given the class limits. The
basic rule of thumb is that the class limits should have the same decimal place value as the data, but the
class boundaries should have one additional place value and end in a 5.
Example:
If the values in the data set are whole numbers, such as 24, 32, and 18, the limits for a class
might be 31–37, and the boundaries are 30.5–37.5. Find the boundaries by subtracting 0.5 from 31 (the
lower class limit) and adding 0.5 to 37 (the upper class limit).
Lower limit – 0.5 = 31 – 0.5 = 30.5 = lower boundary
Upper limit + 0.5 = 37 + 0.5 = 37.5 = upper boundary
If the data are in tenths, such as 6.2, 7.8, and 12.6, the limits for a class hypothetically might be
7.8–8.8, and the boundaries for that class would be 7.75–8.85. Find these values by subtracting 0.05
from 7.8 and adding 0.05 to 8.8.
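The boundary rule can be checked with a short Python sketch. It is an illustration under one assumption: limits are passed as strings so that `Decimal` preserves their decimal places exactly (ordinary floats could introduce rounding noise). The helper name `class_boundaries` is hypothetical.

```python
from decimal import Decimal

def class_boundaries(lower_limit: str, upper_limit: str):
    """Return (lower boundary, upper boundary) for one class.

    Half of one unit in the last decimal place of the limits is subtracted
    from the lower limit and added to the upper limit, so the boundaries
    carry one extra decimal place and end in 5.
    """
    lo, hi = Decimal(lower_limit), Decimal(upper_limit)
    # Decimal places used by the limits: 0 for "31", 1 for "7.8"
    places = max(-lo.as_tuple().exponent, -hi.as_tuple().exponent, 0)
    half_unit = Decimal("0.5") / (10 ** places)   # 0.5, 0.05, 0.005, ...
    return lo - half_unit, hi + half_unit

print(class_boundaries("31", "37"))    # (Decimal('30.5'), Decimal('37.5'))
print(class_boundaries("7.8", "8.8"))  # (Decimal('7.75'), Decimal('8.85'))
```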
Finally, the class width for a class in a frequency distribution is found by subtracting the lower
(or upper) class limit of one class from the lower (or upper) class limit of the next class.
Example:
The class width in the preceding distribution on the duration of boat batteries is 7, found from
31 – 24 = 7.
The class width can also be found by subtracting the lower boundary from the upper boundary
for any given class. In this case: 30.5 – 23.5 = 7.
Note: Do not subtract the limits of a single class. It will result in an incorrect answer.
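Both ways of finding the class width can be compared in a small Python sketch (an illustration only; the function names are hypothetical). It uses the boat-battery class 24–30, whose next class starts at 31, and also shows why subtracting the limits of a single class gives the wrong answer.

```python
from decimal import Decimal

def width_from_limits(lower_limit: str, next_lower_limit: str) -> Decimal:
    """Width = lower limit of the next class minus lower limit of this class."""
    return Decimal(next_lower_limit) - Decimal(lower_limit)

def width_from_boundaries(lower_boundary: str, upper_boundary: str) -> Decimal:
    """Width = upper boundary minus lower boundary of the same class."""
    return Decimal(upper_boundary) - Decimal(lower_boundary)

print(width_from_limits("24", "31"))          # 7
print(width_from_boundaries("23.5", "30.5"))  # 7.0
# The pitfall from the note: subtracting the two limits of ONE class
print(Decimal("30") - Decimal("24"))          # 6, which is not the class width
```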

The researcher must decide how many classes to use and the width of each class. To construct a
frequency distribution, follow these rules:
1. There should be between 5 and 20 classes.
2. It is preferable but not absolutely necessary that the class width be an odd number.
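The class midpoint $X_m$ is obtained by adding the lower and upper boundaries and dividing by 2, or
adding the lower and upper limits and dividing by 2:
$$X_m = \frac{\text{lower boundary} + \text{upper boundary}}{2} \quad \text{or} \quad X_m = \frac{\text{lower limit} + \text{upper limit}}{2}$$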
Example:
The midpoint of the first class in the example with boat batteries is
$$X_m = \frac{24 + 30}{2} = 27 \quad \text{or} \quad X_m = \frac{23.5 + 30.5}{2} = 27$$
The midpoint is the numeric location of the center of the class. Midpoints are necessary for
graphing. If the class width is an even number, the midpoint is in tenths. For example, if the class width
is 6 and the boundaries are 5.5 and 11.5, the midpoint is
$$X_m = \frac{5.5 + 11.5}{2} = \frac{17}{2} = 8.5$$

Note: Rule 2 is only a suggestion, and it is not rigorously followed, especially when a computer is used to
group data.
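The two forms of the midpoint formula give the same result, which a short Python sketch can confirm. This is an illustration, not the lesson's own method; `midpoint` is a hypothetical helper, and `Decimal` is used so values such as 8.5 stay exact.

```python
from decimal import Decimal

def midpoint(lower: str, upper: str) -> Decimal:
    """X_m = (lower + upper) / 2, using either both limits or both boundaries."""
    return (Decimal(lower) + Decimal(upper)) / 2

print(midpoint("24", "30"))      # 27    (limits of the first boat-battery class)
print(midpoint("23.5", "30.5"))  # 27.0  (boundaries of the same class)
print(midpoint("5.5", "11.5"))   # 8.5   (even width 6, so the midpoint is in tenths)
```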

3. The classes must be mutually exclusive. Mutually exclusive classes have nonoverlapping class limits
so that data cannot be placed into two classes. Many times, frequency distributions such as

are found in the literature or in surveys. If a person is 40 years old, into which class should she or he be
placed? A better way to construct a frequency distribution is to use classes such as
4. The classes must be continuous. Even if there are no values in a class, the class must be included in
the frequency distribution. There should be no gaps in a frequency distribution. The only exception
occurs when the class with a zero frequency is the first or last class. A class with a zero frequency at
either end can be omitted without affecting the distribution.
5. The classes must be exhaustive. There should be enough classes to accommodate all the data.
6. The classes must be equal in width. This avoids a distorted view of the data. One exception occurs
when a distribution has a class that is open-ended. That is, the class has no specific beginning value or
no specific ending value. A frequency distribution with an open-ended class is called an open-ended
distribution. Here are two examples of distributions with open-ended classes.

The frequency distribution for age is open-ended for the last class, which means that anybody
who is 54 years or older will be tallied in the last class. The distribution for minutes is open-ended for
the first class, meaning that any minute values below 110 will be tallied in that class.
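Rules 3 to 6 can be expressed as simple checks on a list of class limits. The sketch below is a hypothetical helper, not taken from the lesson; it assumes closed classes given as (lower limit, upper limit) pairs of whole numbers (or `Decimal` values with a matching `unit`), and it does not handle open-ended classes. The age classes used are invented for illustration.

```python
def check_classes(classes, data, unit=1):
    """Check rules 3-6 for classes given as (lower limit, upper limit) pairs.

    `unit` is one unit in the last decimal place of the data (1 for whole numbers).
    Returns a dict mapping each rule to True or False.
    """
    pairs = list(zip(classes, classes[1:]))
    widths = [lo2 - lo1 for (lo1, _), (lo2, _) in pairs]
    return {
        # Rule 3: no overlap, so a value cannot be placed into two classes
        "mutually_exclusive": all(hi1 < lo2 for (_, hi1), (lo2, _) in pairs),
        # Rule 4: no gaps, each class starts one unit after the previous one ends
        "continuous": all(lo2 == hi1 + unit for (_, hi1), (lo2, _) in pairs),
        # Rule 5: every data value falls into some class
        "exhaustive": all(any(lo <= x <= hi for lo, hi in classes) for x in data),
        # Rule 6: consecutive lower limits are equally spaced
        "equal_width": len(set(widths)) <= 1,
    }

# Hypothetical age classes: overlapping limits versus non-overlapping limits
overlapping = [(10, 20), (20, 30), (30, 40), (40, 50)]
better = [(10, 19), (20, 29), (30, 39), (40, 49)]
print(check_classes(overlapping, [40]))
print(check_classes(better, [12, 25, 40, 49]))
```

With the overlapping classes, a value of 40 fits two classes, so the mutual-exclusivity check fails.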

Example:
Record High Temperatures
These data represent the record high temperatures in degrees Fahrenheit (°F) for each of the 50
states. Construct a grouped frequency distribution for the data using 7 classes.

Solution
The procedure for constructing a grouped frequency distribution for numerical data follows.
Step 1 Determine the classes.
*Find the highest value and lowest value: H = 134 and L = 100.
*Find the range: R = highest value – lowest value = H - L, so
R = 134 – 100 = 34
*Select the number of classes desired (usually between 5 and 20). In this case, 7 is arbitrarily
chosen.
*Find the class width by dividing the range by the number of classes.
$$\text{Width} = \frac{R}{\text{number of classes}} = \frac{34}{7} \approx 4.9$$
*Round the answer up to the nearest whole number if there is a remainder: 4.9 ≈ 5. (Rounding
up is different from rounding off. A number is rounded up if there is any decimal remainder when
dividing. For example, 85 ÷ 6 ≈ 14.167 and is rounded up to 15. Also, 53 ÷ 4 = 13.25 and is rounded up
to 14. Also, after dividing, if there is no remainder, you will need to add an extra class to accommodate
all the data.)
*Select a starting point for the lowest class limit. This can be the smallest data value or any
convenient number less than the smallest data value. In this case, 100 is used. Add the width to the
lowest score taken as the starting point to get the lower limit of the next class. Keep adding until there
are 7 classes, as shown, 100, 105, 110, etc.
*Subtract one unit from the lower limit of the second class to get the upper limit of the first
class. Then add the width to each upper limit to get all the upper limits.
105 – 1 = 104
The first class is 100–104; the second class is 105–109, etc.
*Find the class boundaries by subtracting 0.5 from each lower class limit and adding 0.5 to each
upper class limit:
99.5–104.5, 104.5–109.5, etc.
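The whole of Step 1 can be sketched in Python for whole-number data. This is an illustration under stated assumptions: the helper name `build_classes` is hypothetical, the starting point defaults to the lowest value, and the "no remainder means add a class" rule is followed as the lesson states it.

```python
def build_classes(low, high, num_classes, start=None):
    """Sketch of Step 1 for whole-number data.

    Returns (width, rows), where each row is
    (lower limit, upper limit, lower boundary, upper boundary).
    """
    r = high - low                                # range R = H - L
    width, remainder = divmod(r, num_classes)
    if remainder:
        width += 1                                # round UP whenever there is a remainder
    else:
        num_classes += 1                          # no remainder: add a class so H fits
    start = low if start is None else start       # starting point for the first lower limit
    rows = []
    for i in range(num_classes):
        lower = start + i * width                 # 100, 105, 110, ...
        upper = lower + width - 1                 # next lower limit minus one unit
        rows.append((lower, upper, lower - 0.5, upper + 0.5))
    return width, rows

# Record high temperatures: H = 134, L = 100, 7 classes
width, rows = build_classes(low=100, high=134, num_classes=7)
print("width:", width)
for row in rows:
    print(*row)
```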
Step 2 Tally the data.
Step 3 Find the numerical frequencies from the tallies.
The completed frequency distribution is:

The frequency distribution shows that the class 109.5–114.5 contains the largest number of
temperatures (18) followed by the class 114.5–119.5 with 13 temperatures. Hence, most of the
temperatures (31) fall between 109.5 and 119.5°F.
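Steps 2 and 3 (tallying and counting) can be sketched as follows. The 50 record highs are not reproduced in this lesson, so the readings below are hypothetical, and `grouped_frequencies` is an illustrative helper for whole-number data only. Empty classes are kept with frequency 0, as rule 4 requires.

```python
from collections import Counter

def grouped_frequencies(data, start, width, num_classes):
    """Tally whole-number data into classes and return (boundaries, frequency) pairs."""
    end = start + width * num_classes - 1                  # upper limit of the last class
    if any(x < start or x > end for x in data):
        raise ValueError("classes are not exhaustive for this data")
    counts = Counter((x - start) // width for x in data)   # class index of each value
    table = []
    for i in range(num_classes):
        lower = start + i * width
        label = f"{lower - 0.5}-{lower + width - 0.5}"     # class boundaries
        table.append((label, counts[i]))                   # 0 if no value fell here
    return table

# Hypothetical whole-number readings (not the lesson's temperature data)
readings = [101, 103, 106, 108, 110, 111, 112, 113, 117, 118, 122, 131]
table = grouped_frequencies(readings, start=100, width=5, num_classes=7)
for label, f in table:
    print(label, f)
peak = max(table, key=lambda row: row[1])                  # class with the most values
print("Peak class:", peak[0])
```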

Sometimes it is necessary to use a cumulative frequency distribution. A cumulative frequency
distribution is a distribution that shows the number of data values less than or equal to a specific value
(usually an upper boundary). The values are found by adding the frequencies of the classes less than or
equal to the upper class boundary of a specific class. This gives an ascending cumulative frequency. In
this example, the cumulative frequency for the first class is 0 + 2 = 2; for the second class it is 0 + 2 + 8
= 10; for the third class it is 0 + 2 + 8 + 18 = 28. A shorter way is to add the cumulative frequency
of the class below to the frequency of the given class. For example, the cumulative frequency for the
number of data values less than 114.5 can be found by adding 10 + 18 = 28. The cumulative frequency
distribution for the data in this example is as follows:

Cumulative frequencies are used to show how many data values are accumulated up to and
including a specific class. In the previous example, 28 of the total record high temperatures are less than
or equal to 114°F. Forty-eight of the total record high temperatures are less than or equal to 124°F.
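The running-total shortcut can be sketched with `itertools.accumulate`. The frequencies below come from the text: 2, 8, and 18 for the first three classes, 13 for 114.5–119.5, and 7 for 119.5–124.5 (found as 48 − 41). The last two classes are omitted because their frequencies are not given here.

```python
from itertools import accumulate

# Record-high-temperature classes whose frequencies are recoverable from the lesson
upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5]
freqs = [2, 8, 18, 13, 7]

# Each running total = cumulative frequency of the class below + this class's frequency
cumulative = list(accumulate(freqs))

for ub, cf in zip(upper_boundaries, cumulative):
    print(f"Less than {ub}: {cf}")
```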
After the raw data have been organized into a frequency distribution, the distribution can be analyzed by
looking for peaks and extreme values. The peaks show which class or classes have the most data values
compared to the other classes. Extreme values, called outliers, are data values that are unusually large or
small relative to the other data values.

Ungrouped Frequency Distribution


When the range of the data values is relatively small, a frequency distribution can be
constructed using single data values for each class. This type of distribution is called an ungrouped
frequency distribution.
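An ungrouped distribution uses one class per data value, from the smallest value to the largest. The Python sketch below illustrates this along with boundaries (value ± 0.5) and cumulative frequencies; the helper `ungrouped_distribution` and the `mpg` readings are hypothetical, not the 30 SUVs from the example that follows.

```python
from collections import Counter
from itertools import accumulate

def ungrouped_distribution(data):
    """Rows of (value, lower boundary, upper boundary, frequency, cumulative frequency)."""
    counts = Counter(data)
    classes = range(min(data), max(data) + 1)   # every value in the range, even if unused
    freqs = [counts[v] for v in classes]
    cumulative = list(accumulate(freqs))
    return [(v, v - 0.5, v + 0.5, f, cf) for v, f, cf in zip(classes, freqs, cumulative)]

# Hypothetical whole-number mpg readings
mpg = [12, 14, 15, 15, 16, 16, 16, 17, 19, 15]
rows = ungrouped_distribution(mpg)
for row in rows:
    print(*row)
```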

Example:
MPGs for SUVs
The data shown here represent the number of miles per gallon (mpg) that 30 selected four-wheel-drive
sports utility vehicles obtained in city driving. Construct a frequency distribution, and analyze the
distribution.

Solution
Step 1 Determine the classes. Since the range of the data set is small (19 – 12 = 7), classes consisting of a
single data value can be used. They are 12, 13, 14, 15, 16, 17, 18, 19.
Note: If the data are continuous, class boundaries can be used. Subtract 0.5 from each class
value to get the lower class boundary, and add 0.5 to each class value to get the upper class boundary.
Step 2 Tally the data.
Step 3 Find the numerical frequencies from the tallies, and find the cumulative frequencies.
The completed ungrouped frequency distribution is:

In this case, almost one-half (14) of the vehicles get 15 or 16 miles per gallon. The cumulative
frequencies are
Reasons for Constructing a Frequency Distribution
1. To organize the data in a meaningful, intelligible way.
2. To enable the reader to determine the nature or shape of the distribution.
3. To facilitate computational procedures for measures of average and spread.
4. To enable the researcher to draw charts and graphs for the presentation of data.
5. To enable the reader to make comparisons among different data sets.

Learning Task/Activity:

Name: _______________________________ Date: _____________


Course & Year: ________________________ Instructor: Dr. ANJIN PLEIADESS P. CABRERA

Exercise 2
Frequency Distributions

A. Find the class boundaries, midpoints, and widths for each class.
1. 32–38
Class Boundaries: _______
Midpoints: ____________
Class Width: ___________
2. 86–104
Class Boundaries: _______
Midpoints: ____________
Class Width: ___________
3. 895–905
Class Boundaries: _______
Midpoints: ____________
Class Width: ___________
4. 12.3–13.5
Class Boundaries: _______
Midpoints: ____________
Class Width: ___________
5. 3.18–4.96
Class Boundaries: _______
Midpoints: ____________
Class Width: ___________

B. Grams per Food Serving


The data shown are the number of grams per serving of 30 selected brands of cakes. Construct a
frequency distribution using 5 classes.
