Organization of Data


ORGANIZATION OF DATA
STATISTICAL METHODS
Learning Outcomes:
At the end of this unit, you will be able to:
• Organize and present data using arrays and frequency distribution tables.
• Apply techniques of data organization and presentation in real life.
A. Raw data and array
Raw data – data in their original form.
Array – an ordered arrangement of data according to magnitude.
Raw data and arrays are referred to as ungrouped data.
Table 1. Exam scores of 110 students in Stat 1 (Raw Data)
82 82 83 79 72 71 84 59 77 50 87
83 82 63 75 50 85 76 79 68 69 62
79 69 74 53 73 71 50 76 57 81 62
72 88 84 80 68 50 74 84 71 73 68
71 80 72 60 81 89 94 80 84 81 50
84 76 75 82 76 53 91 69 60 89 79
59 62 79 82 72 81 60 84 68 66 94
77 78 87 75 86 82 74 73 72 84 51
50 69 75 70 77 87 86 77 75 96 66
87 73 84 68 86 62 87 92 69 52 65
Table 2. Exam scores of 110 students in Stat 1 (Array; read down each column)
50 57 63 69 72 74 77 80 82 84 87
50 59 65 69 72 75 77 80 82 84 87
50 59 66 69 72 75 77 80 82 85 88
50 60 66 69 72 75 77 81 83 86 89
50 60 68 70 73 75 78 81 83 86 89
50 60 68 71 73 75 79 81 84 86 91
51 62 68 71 73 76 79 81 84 87 92
52 62 68 71 73 76 79 82 84 87 94
53 62 68 71 74 76 79 82 84 87 94
53 62 69 72 74 76 79 82 84 87 96
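As a minimal sketch of how raw data become an array, the snippet below sorts the first row of Table 1 by magnitude. The variable names are illustrative only, and ascending order is assumed because that is how Table 2 is arranged.

```python
# First row of Table 1 (raw data), copied from the table above.
raw_scores = [82, 82, 83, 79, 72, 71, 84, 59, 77, 50, 87]

# An array is the same data arranged according to magnitude
# (ascending order is assumed here, matching Table 2).
array = sorted(raw_scores)

print(array)
```

Sorting does not add or remove observations; it only changes their order, so the array has the same length as the raw data.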
B. Frequency distribution
• is a way of summarizing data by showing the number of
observations that belong in the different categories or
classes
• also referred to as grouped data.
Two General Forms of Frequency Distribution Table
1. Single-value grouping – a frequency distribution where the classes are the distinct values of the variable. This is applicable for data with only a few unique values.
2. Grouping by class intervals – a frequency distribution where the classes are intervals of values.
Example:
Suppose we have data on the number of children of 50
married women using any modern contraceptive method.
Construct its frequency distribution.
Solution: This is an example of a single-value grouping frequency distribution table.
Example:
Refer to the data in Table 2. The frequency distribution table for these data is an example of grouping by class intervals.
Example:
This is also an example of a grouping by class intervals
frequency distribution table.
Definition of Terms
• Class interval - range of values that belong in the class
or category.
• Class Frequency - the number of observations that belong in
a class interval
• Class Limits – the end numbers used to define the class
interval.
• Lower class limit (LCL) - is the lower end number
• Upper class limit (UCL) – is the upper end number
• Class boundaries – the true class limits.
• Lower class boundary (LCB) – halfway between the lower class limit of the class and the upper class limit of the preceding class.
• Upper class boundary (UCB) – halfway between the upper class limit of the class and the lower class limit of the next class.
• Class size – the size of the class interval.
- difference between the upper class boundaries of the class and the preceding class
- difference between the lower class boundaries of the next class and the class
- We can also use the class limits in place of the class boundaries.
• Class mark – the midpoint of a class interval.
- It is the average of the lower class limit and the upper class limit, or the average of the lower class boundary and the upper class boundary, of a class interval.
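The definitions above can be sketched in a few lines of Python. The two classes 50–55 and 56–61 are illustrative class limits, the function names are hypothetical, and the `unit` parameter is an assumption that the data are recorded to the nearest whole number, so half a unit separates a limit from its boundary.

```python
def class_boundaries(lcl, ucl, unit=1):
    """Return (LCB, UCB) for a class with limits lcl..ucl.

    `unit` is the assumed measurement unit of the data (1 for whole
    numbers); the boundary lies half a unit beyond each limit.
    """
    half_unit = unit / 2
    return lcl - half_unit, ucl + half_unit


def class_mark(lcl, ucl):
    """Midpoint of the class: average of the two class limits."""
    return (lcl + ucl) / 2


# Two adjacent, illustrative classes.
first = (50, 55)
second = (56, 61)

lcb1, ucb1 = class_boundaries(*first)
lcb2, ucb2 = class_boundaries(*second)

# Class size: difference between the lower class boundaries of the
# next class and the class, or the same difference using class limits.
size_from_boundaries = lcb2 - lcb1
size_from_limits = second[0] - first[0]

mark = class_mark(*first)

print(lcb1, ucb1, size_from_boundaries, size_from_limits, mark)
```

Note that the upper boundary of the first class coincides with the lower boundary of the second, which is why boundaries leave no gaps between classes.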
Steps in Constructing a Frequency Distribution
1. Make an array for the given data.
2. Determine the number of classes (K). There are no precise rules concerning the optimal number of classes, but Sturges’ formula can be used as a first approximation.
• Sturges’ formula: K = 1 + 3.322 log n, where n = number of observations and log is the base-10 logarithm.
• Round off K to the nearest integer.
3. Determine the approximate class size (C). Whenever possible, all classes should be of the same size.
• Solve for the range: R = max – min
• Compute C = R ÷ K, then round it up to the next whole number.
4. Determine the lowest class limit. The first class must include the smallest value in the data set.
5. Determine the upper limit of the lowest class using the formula:
UCL = LCL + C – 1
6. Determine the class boundaries using the following formulas (for data recorded as whole numbers):
LCB = LCL – 1/2
UCB = UCL + 1/2
7. Determine all class limits by adding the class size C to the limits of the previous class.
8. Tally the frequencies for each class. Sum the frequencies and check against the total number of observations.
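Steps 2 and 3 can be sketched as two small helper functions. The function names are hypothetical; the logarithm is taken as base 10, and "round up" is implemented with `math.ceil`, which leaves a quotient that is already a whole number unchanged (an assumption about how that edge case should be handled).

```python
import math


def sturges_classes(n):
    """Step 2: number of classes K from Sturges' formula, rounded
    to the nearest integer."""
    k = 1 + 3.322 * math.log10(n)
    return round(k)


def approximate_class_size(minimum, maximum, k):
    """Step 3: class size C = R / K, rounded up to a whole number."""
    data_range = maximum - minimum
    return math.ceil(data_range / k)


# n = 110 observations with scores from 50 to 96, as in Tables 1 and 2.
k_110 = sturges_classes(110)
c_110 = approximate_class_size(50, 96, k_110)

print(k_110, c_110)
```

Sturges' formula is only a first approximation, so the resulting K may still be adjusted by judgment.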
Example: The data below show the midterm exam scores of 30 students in Stat 1. Construct a grouped frequency distribution for the data.
82 82 83 79 72 71 84 59 77 50
83 82 63 75 50 85 76 79 68 69
79 69 74 53 73 71 50 76 57 81
Solution:
1. Make an array for the given data (read down each column).
50 53 63 69 72 75 77 79 82 83
50 57 68 71 73 76 79 81 82 84
50 59 69 71 74 76 79 82 83 85
2. Determine the number of classes (K).
K = 1 + 3.322 log n, where n = 30
K = 1 + 3.322 log (30) = 5.91 ≈ 6
3. Determine the approximate class size (C).
R = max – min = 85 – 50 = 35
C = R ÷ K = 35 ÷ 6 = 5.8 ≈ 6
4. Determine the lowest class limit.
LCL = 50
5. Determine the upper limit of the lowest class using the formula:
UCL = LCL + C – 1
UCL = 50 + 6 – 1 = 55
6. Determine the class boundaries using the following formulas:
LCB = LCL – ½
LCB = 50 – ½ = 49.5
UCB = UCL + ½
UCB = 55 + ½ = 55.5
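7. Determine all class limits by adding the class size C to the limits of the previous class.
8. Tally the frequencies for each class. Sum the frequencies and check against the total number of observations.
Class limits   Class boundaries   Frequency
50 – 55        49.5 – 55.5        4
56 – 61        55.5 – 61.5        2
62 – 67        61.5 – 67.5        1
68 – 73        67.5 – 73.5        7
74 – 79        73.5 – 79.5        8
80 – 85        79.5 – 85.5        8
Total                             30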
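The whole worked example can be reproduced with a short script. This is a sketch under the same assumptions as the steps above (whole-number scores, base-10 logarithm, lowest class limit equal to the smallest score); the function name `frequency_distribution` is hypothetical.

```python
import math

# The 30 midterm exam scores from the example.
scores = [
    82, 82, 83, 79, 72, 71, 84, 59, 77, 50,
    83, 82, 63, 75, 50, 85, 76, 79, 68, 69,
    79, 69, 74, 53, 73, 71, 50, 76, 57, 81,
]


def frequency_distribution(data):
    """Return rows of (LCL, UCL, LCB, UCB, frequency)."""
    n = len(data)
    k = round(1 + 3.322 * math.log10(n))            # step 2
    c = math.ceil((max(data) - min(data)) / k)      # step 3
    rows = []
    lcl = min(data)                                 # step 4
    # Keep adding classes until the largest value is covered; this
    # can yield one class more than K for some data sets.
    while lcl <= max(data):
        ucl = lcl + c - 1                           # step 5
        freq = sum(lcl <= x <= ucl for x in data)   # step 8 (tally)
        rows.append((lcl, ucl, lcl - 0.5, ucl + 0.5, freq))  # step 6
        lcl += c                                    # step 7
    return rows


table = frequency_distribution(scores)
for lcl, ucl, lcb, ucb, freq in table:
    print(f"{lcl}-{ucl}  {lcb}-{ucb}  {freq}")

# Check against the total number of observations.
total = sum(row[4] for row in table)
print("Total:", total)
```

The final check mirrors step 8: the class frequencies must add up to the number of observations, here 30.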
Variations of Frequency Distribution
1. Relative Frequency (RF) Distribution and Relative Frequency Percentage (RFP)
RF = class frequency ÷ no. of observations
RFP = RF × 100%
2. Cumulative Frequency Distribution (CFD) - shows the accumulated frequencies of successive classes, beginning at either end of the distribution.
• Greater than CFD (> CFD) - shows the no. of observations greater than the LCB
• Less than CFD (< CFD) - shows the no. of observations less than the UCB
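A minimal sketch of these variations follows. The class frequencies used here are my own tally of the 30 midterm scores into the classes 50–55, 56–61, 62–67, 68–73, 74–79 and 80–85, so treat them as an assumption rather than figures given in the text.

```python
from itertools import accumulate

# Assumed class frequencies (own tally of the 30 midterm scores).
frequencies = [4, 2, 1, 7, 8, 8]
n = sum(frequencies)

# Relative frequency and relative frequency percentage per class.
relative = [f / n for f in frequencies]
relative_pct = [100 * rf for rf in relative]

# Less than CFD: accumulate from the first class downward.
less_than_cf = list(accumulate(frequencies))

# Greater than CFD: accumulate from the last class upward.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

for f, pct, lt, gt in zip(frequencies, relative_pct, less_than_cf, greater_than_cf):
    print(f"{f:>2}  {pct:6.2f}%  <CF={lt:>2}  >CF={gt:>2}")
```

The last entry of the less than CFD and the first entry of the greater than CFD both equal n, which is a quick consistency check.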
Graphical Presentations of the Frequency Distribution
Frequency Histogram
• shows the overall picture of the distribution of the observed values in the data set.
• It displays the class boundaries on the horizontal axis and the class frequencies on the vertical axis.
• We represent each class frequency by a vertical bar, whose height is equal to the frequency of the class interval and whose width represents the class size.
• We plot the sides of the bars at the class boundaries, rather than the class limits, because the boundaries are the true class limits.
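As a rough, text-only sketch of the idea, the snippet below draws one bar per class whose length equals the class frequency. It is not a real histogram: the bars are horizontal rather than vertical, bar width is not shown, and the boundaries and frequencies are my own tally of the 30 midterm scores (an assumption). The function name is hypothetical.

```python
def text_histogram(boundaries, frequencies, symbol="#"):
    """Return one text line per class: 'LCB-UCB | bar (frequency)'.

    Each bar is drawn sideways, with one symbol per observation.
    """
    lines = []
    for (lcb, ucb), freq in zip(boundaries, frequencies):
        lines.append(f"{lcb}-{ucb} | {symbol * freq} ({freq})")
    return lines


# Assumed class boundaries and frequencies for the 30 midterm scores.
boundaries = [
    (49.5, 55.5), (55.5, 61.5), (61.5, 67.5),
    (67.5, 73.5), (73.5, 79.5), (79.5, 85.5),
]
frequencies = [4, 2, 1, 7, 8, 8]

lines = text_histogram(boundaries, frequencies)
print("\n".join(lines))
```

Labelling the bars with class boundaries rather than class limits keeps adjacent bars touching, as in a proper histogram.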
MEASURES OF CENTRAL TENDENCY
Learning Outcomes: At the end of this lesson, you will be able to
• Identify characteristics of the measures of central tendency
• Calculate the mean, median and mode of given data
Summary measures
- a single value that we compute from a collection of measurements in order to describe one of the collection’s particular characteristics
Measures of Central Tendency
- any single value that is used to identify the “center”
or the typical value of a data set
- often referred to as the average
- precise yet simple
- most representative value of the data
A. Summation
• denotes the sum of numerical values
• represented by the capital Greek letter sigma (∑)
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$$
read as “summation of X sub i, where i is from 1 to n”
where:
$X_i$ = value of the variable for the ith observation
i = the index of summation
1 = lower limit of the summation
n = upper limit of the summation
• The upper and lower limits of the summation are used to identify the set of values that the index i will take on. This set is also called the index set.
• Index set – the collection of all integers from the lower limit of the summation up to the upper limit of the summation
Example:
The weights in pounds of five students are as follows: 110, 90, 105, 120 and 115. Express the formula for the total weight of these students using the summation notation.
Solution: Let $X_i$ = weight of the ith student in pounds, with n = 5.
• $X_1$ represents the weight of the first student = 110 lbs
• $X_2$ represents the weight of the second student = 90 lbs
• $X_3$ represents the weight of the third student = 105 lbs
• $X_4$ represents the weight of the fourth student = 120 lbs
• $X_5$ represents the weight of the fifth student = 115 lbs
Total weight: $\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 110 + 90 + 105 + 120 + 115 = 540$ lbs
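The summation over the index set 1 to n maps directly onto a loop over the observations. The sketch below spells the index out explicitly to mirror the notation; in everyday Python one would simply call `sum(weights)`. Variable names are illustrative.

```python
# Weights (in pounds) of the five students, in the order given.
weights = [110, 90, 105, 120, 115]
n = len(weights)

# Sum of X_i for i = 1, 2, ..., n. Python lists are 0-based, so the
# ith observation X_i is stored at position i - 1.
total = sum(weights[i - 1] for i in range(1, n + 1))

print(total)
```

Here `range(1, n + 1)` plays the role of the index set: it produces every integer from the lower limit 1 up to the upper limit n.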
Example:
Notes on summation
1. The index (as indicated by the letter below ∑) may be any letter, but the letters i, j, k are the most common.
2. The lower limit of the summation may start with any number.
Example: $\sum_{i=3}^{5} X_i = X_3 + X_4 + X_5$
3. The index of the summation will not necessarily appear as a subscript in the terms of the summation.
Example: $\sum_{i=1}^{4} i = 1 + 2 + 3 + 4 = 10$
Weighted mean
The weighted mean is a modification of the usual mean that assigns weights (or measures of relative importance) to the observations to be averaged. If each observation $X_i$ is assigned a weight $W_i$, where i = 1, 2, 3, ..., n, the weighted mean is given by:
$$\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$$
Example: Suppose a teacher assigns the following weights to the various course requirements:
Requirement     Weight
Assignment      15%
Project         25%
Midterm Exam    20%
Final Exam      40%
The maximum score a student may obtain for each component is 100. Jeffrey obtains marks of 83 for assignments, 72 for the project, 41 for the midterm exam, and 47 for the final exam. Find his mean mark/grade for the course.
Solution:
Weighted mean = [15(83) + 25(72) + 20(41) + 40(47)] ÷ (15 + 25 + 20 + 40) = 5745 ÷ 100 = 57.45
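The computation can be checked with a small function that divides the sum of weight times value by the sum of the weights. The function name `weighted_mean` is hypothetical, and the weights are entered as whole-number percentages.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Jeffrey's marks and the teacher's weights (in percent), in the order
# assignment, project, midterm exam, final exam.
marks = [83, 72, 41, 47]
weights = [15, 25, 20, 40]

grade = weighted_mean(marks, weights)
print(grade)
```

Because the formula divides by the sum of the weights, it gives the same answer whether the weights are written as 15, 25, 20, 40 or as 0.15, 0.25, 0.20, 0.40.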
The median
• the positional middle of the arrayed data
• divides the observations into two equal parts
- If the number of observations is odd, the median is the
middle number.
- If the number of observations is even, the median is
the average of the 2 middle numbers.
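• Let $X_{(i)}$ be the ith observation in the array, i = 1, 2, ..., n. Then the median (Md) is
$$Md = \begin{cases} X_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\[4pt] \dfrac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even} \end{cases}$$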
Example: Given the following heights (in inches) of gumamela plants: 71, 72, 75, 75 and 67. Find the median height.
Solution:
Array: 67 71 72 75 75
n = 5 (odd)
Median = 3rd observation in the array = 72 inches
Example: Given the following scores: 1, 7, 3, 3, 6, 5, 4, 3, find the median of the scores.
Solution: n = 8 (even)
Array: 1, 3, 3, 3, 4, 5, 6, 7
Median = average of the 4th and 5th observations = (3 + 4) ÷ 2 = 3.5
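Both cases (odd and even n) can be sketched in one function: arrange the data as an array, then take the middle observation or the average of the two middle observations. The function name is illustrative; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(data):
    """Positional middle of the arrayed data."""
    array = sorted(data)          # make the array first
    n = len(array)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th observation (0-based index mid)
        return array[mid]
    # n even: average of the (n / 2)th and (n / 2 + 1)th observations
    return (array[mid - 1] + array[mid]) / 2


heights = [71, 72, 75, 75, 67]
scores = [1, 7, 3, 3, 6, 5, 4, 3]

print(median(heights))
print(median(scores))
```

The sort step matters: taking the middle of the unsorted heights would pick 75 instead of the true median.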
Approximating the median from grouped data
- possible only when the values of the observations falling in the median class can be assumed to be evenly spaced throughout the class.
- The median can be calculated for frequency distributions, even those that contain open-ended intervals, unless the median falls into an open-ended class.
Note: The median class is the class containing the median.
Procedure:
Step 1. Construct the less than cumulative frequency distribution (< CFD).
Step 2. Starting from the top, locate the first class whose less than cumulative frequency is greater than or equal to n/2. This class is the median class.
Step 3. Approximate the median using the following formula:
$$Md = LCB_{md} + c\left(\frac{\frac{n}{2} - {<}CF_{md-1}}{f_{md}}\right)$$
where $LCB_{md}$ = the lower class boundary of the median class
c = class size of the median class
n = the total no. of observations in the distribution
${<}CF_{md-1}$ = less than cumulative freq. of the class preceding the median class
$f_{md}$ = freq. of the median class
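The three steps can be sketched as follows. The approximation used is the standard interpolation: lower class boundary of the median class, plus the class size times (n/2 minus the less than cumulative frequency of the preceding class) divided by the frequency of the median class. The class boundaries and frequencies are my own tally of the 30 midterm scores (an assumption), and the function name is hypothetical.

```python
def grouped_median(lower_boundaries, class_size, frequencies):
    """Approximate the median of a frequency distribution with
    equal class sizes."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the distribution has no observations")
    cumulative = 0  # less than CF of the classes already passed
    for lcb, f in zip(lower_boundaries, frequencies):
        # Step 2: first class whose less than CF reaches n / 2.
        if cumulative + f >= n / 2:
            # Step 3: interpolate inside the median class.
            return lcb + class_size * (n / 2 - cumulative) / f
        cumulative += f  # Step 1, built up as we go


# Assumed distribution for the 30 midterm scores.
lower_boundaries = [49.5, 55.5, 61.5, 67.5, 73.5, 79.5]
frequencies = [4, 2, 1, 7, 8, 8]

approx_median = grouped_median(lower_boundaries, 6, frequencies)
print(approx_median)
```

For this distribution the median class is 74–79, and the approximation lands close to the median of the ungrouped scores (74.5), as expected when values are spread fairly evenly within the class.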
The mode
- the observed value that occurs most frequently
- locates the point where the observation values occur with the greatest density
- nominal average
- generally a less popular measure than the mean or the median
- determined by counting the frequency of each value and finding the value with the highest frequency of occurrence
Example: Give the mode.
a. 2, 5, 2, 3, 5, 2, 1, 4, 2, 2, 2, 1, 2, 2, 2, 3, 2, 2, 2, 2
Mo = 2
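Counting the frequency of each value is a one-liner with `collections.Counter` from the Python standard library. This sketch only reports a single most frequent value; ties are not handled here.

```python
from collections import Counter

data = [2, 5, 2, 3, 5, 2, 1, 4, 2, 2, 2, 1, 2, 2, 2, 3, 2, 2, 2, 2]

# Count how often each distinct value occurs.
counts = Counter(data)

# most_common(1) returns a list with the (value, frequency) pair of
# the value that has the highest frequency.
value, frequency = counts.most_common(1)[0]

print(value, frequency)
```

The value 2 occurs 13 times out of 20 observations, far more often than any other value, so it is the mode.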
Approximating the mode from grouped data
Step 1. Locate the modal class (the class with the highest frequency).
Step 2. Approximate the mode using the following formula:
$$Mo = LCB_{mo} + c\left(\frac{d_1}{d_1 + d_2}\right)$$
where $LCB_{mo}$ = lower class boundary of the modal class
c = class size of the modal class
$f_{mo}$ = frequency of the modal class
$d_1$ = frequency of the modal class – frequency of the preceding class
$d_2$ = frequency of the modal class – frequency of the next class
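A sketch of the two steps follows, using the common interpolation: lower class boundary of the modal class plus the class size times d1 / (d1 + d2), where d1 and d2 are the differences between the modal class frequency and the frequencies of the preceding and next classes. The names `d1`, `d2` and `grouped_mode` are my own labels, the frequency table is hypothetical (made up for illustration, not taken from the text), and a missing neighbouring class is assumed to have frequency 0.

```python
def grouped_mode(lower_boundaries, class_size, frequencies):
    """Approximate the mode of a frequency distribution with equal
    class sizes. If several classes tie for the highest frequency,
    the first one is used."""
    # Step 1: locate the modal class.
    i = frequencies.index(max(frequencies))
    f_mo = frequencies[i]
    # Assumption: a neighbour that does not exist has frequency 0.
    f_prev = frequencies[i - 1] if i > 0 else 0
    f_next = frequencies[i + 1] if i + 1 < len(frequencies) else 0
    d1 = f_mo - f_prev
    d2 = f_mo - f_next
    if d1 + d2 == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    # Step 2: interpolate inside the modal class.
    return lower_boundaries[i] + class_size * d1 / (d1 + d2)


# Hypothetical distribution, for illustration only.
lower_boundaries = [49.5, 55.5, 61.5, 67.5, 73.5, 79.5]
frequencies = [4, 2, 1, 7, 10, 6]

approx_mode = grouped_mode(lower_boundaries, 6, frequencies)
print(round(approx_mode, 2))
```

The approximation is pulled toward whichever neighbouring class has the higher frequency: here the preceding class (7) outweighs the next class (6), so the mode falls slightly below the class mark of the modal class.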
Characteristics of the Mode
- It does not always exist; and if it does, it may not be
unique.
• unimodal – if there is only one mode
• bimodal – if there are two modes
• trimodal – if there are three modes
- It is not affected by extreme values.
- It can be used for qualitative as well as quantitative
data.
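These characteristics can be illustrated with `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every value tied for the highest frequency. The small data sets below are made up purely for illustration.

```python
from statistics import multimode

unimodal = [1, 2, 2, 3]
bimodal = [1, 1, 2, 2, 3]
colors = ["red", "blue", "red", "green"]   # qualitative data
all_distinct = [1, 2, 3]

print(multimode(unimodal))      # one mode
print(multimode(bimodal))       # two modes
print(multimode(colors))        # works for categories too
# Every value occurs once, so all of them tie; in the sense used
# above, this data set has no mode.
print(multimode(all_distinct))

# An extreme value does not change the mode.
with_outlier = unimodal + [1000]
print(multimode(with_outlier))
```

Unlike the mean, the result for `with_outlier` is the same as for `unimodal`, since a single extreme value does not alter which value occurs most often.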
