Organization of Data
STATISTICAL METHODS
Learning Outcomes:
At the end of this unit, you will be able to
• Organize and present data using arrays and frequency
distribution tables.
• Apply techniques of data organization and
presentation in real-life situations.
A. Raw data and array
Raw data – data in their original form
Array – is an ordered arrangement of data according to
magnitude.
Raw data and arrays are referred to as
ungrouped data.
Table 1. Exam scores of 110 students in Stat 1 (Raw Data)
82 82 83 79 72 71 84 59 77 50 87
83 82 63 75 50 85 76 79 68 69 62
79 69 74 53 73 71 50 76 57 81 62
72 88 84 80 68 50 74 84 71 73 68
71 80 72 60 81 89 94 80 84 81 50
84 76 75 82 76 53 91 69 60 89 79
59 62 79 82 72 81 60 84 68 66 94
77 78 87 75 86 82 74 73 72 84 51
50 69 75 70 77 87 86 77 75 96 66
87 73 84 68 86 62 87 92 69 52 65
Table 2. Exam scores of 110 students in Stat 1 (Array)
50 57 63 69 72 74 77 80 82 84 87
50 59 65 69 72 75 77 80 82 84 87
50 59 66 69 72 75 77 80 82 85 88
50 60 66 69 72 75 77 81 83 86 89
50 60 68 70 73 75 78 81 83 86 89
50 60 68 71 73 75 79 81 84 86 91
51 62 68 71 73 76 79 81 84 87 92
52 62 68 71 73 76 79 82 84 87 94
53 62 68 71 74 76 79 82 84 87 94
53 62 69 72 74 76 79 82 84 87 96
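As a small illustration of turning raw data into an array, the sketch below sorts only the first row of Table 1. Ascending order is assumed here; an array may equally be arranged in descending order.

```python
# First row of Table 1 (raw data), copied as-is.
raw_row = [82, 82, 83, 79, 72, 71, 84, 59, 77, 50, 87]

# An array is the same data arranged according to magnitude (ascending here).
array_row = sorted(raw_row)

print(array_row)
print("min =", array_row[0], "max =", array_row[-1])
```

Once the data are in an array, the smallest and largest values can be read directly from its two ends.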
B. Frequency distribution
• is a way of summarizing data by showing the number of
observations that belong in the different categories or
classes
• also referred to as grouped data.
Two General Forms of Frequency
Distribution Table
1. Single-value grouping is a frequency distribution
where the classes are the distinct values of the variable.
This is applicable for data with only a few unique values.
2. Grouping by class intervals – is a frequency
distribution where the classes are the intervals.
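A minimal sketch of single-value grouping, where each distinct value is its own class. The ten values below are hypothetical and chosen only for illustration; they are not data from this unit.

```python
from collections import Counter

# Hypothetical number of children for 10 women (illustrative values only).
children = [0, 1, 2, 2, 3, 1, 2, 0, 1, 2]

# Count how many observations fall on each distinct value.
counts = Counter(children)
table = sorted(counts.items())   # list of (value, frequency) pairs

print("No. of children | Frequency")
for value, freq in table:
    print(f"{value:>15} | {freq}")
print("Total:", sum(counts.values()))
```

This form works here because the variable takes only a few unique values. With many distinct values, as in the exam scores, grouping by class intervals is usually the more readable choice.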
Example:
Suppose we have data on the number of children of 50
married women using any modern contraceptive method.
Construct its frequency distribution.
Solution: This is an example of a single-value
grouping frequency distribution table.
Example:
Refer to the data in Table 2. This is an example of a
grouping by class intervals frequency distribution table.
Example:
This is also an example of a grouping by class intervals
frequency distribution table.
Definition of Terms
• Class interval - range of values that belong in the class
or category.
• Class Frequency - the number of observations that belong in
a class interval
• Class Limits – the end numbers used to define the class
interval.
• Lower class limit (LCL) - is the lower end number
• Upper class limit (UCL) – is the upper end number
• Class boundaries are the true class limits.
• Lower class boundary (LCB) is halfway
between the lower class limit of the class and
the upper class limit of the preceding class.
• Upper class boundary (UCB) is halfway
between the upper class limit of the class and
the lower class limit of the next class.
• Class size is the size of the class interval. It can be computed as
- the difference between the upper class boundaries of the class and the
preceding class, or
- the difference between the lower class boundaries of the next class and
the class.
- The class limits can also be used in place of the class boundaries.
• Class mark is the midpoint of a class interval.
- It is the average of the lower class limit and the upper class limit or
the average of the lower class boundary and upper class boundary of a
class interval.
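The definitions above can be checked with a short sketch. The three consecutive classes 50-55, 56-61 and 62-67 are assumed here purely for illustration, and the terms are computed for the middle class.

```python
# Three consecutive classes as (LCL, UCL) pairs; assumed for illustration.
classes = [(50, 55), (56, 61), (62, 67)]
prev_cls, cls, next_cls = classes

# LCB: halfway between this class's LCL and the preceding class's UCL.
lcb = (cls[0] + prev_cls[1]) / 2
# UCB: halfway between this class's UCL and the next class's LCL.
ucb = (cls[1] + next_cls[0]) / 2

# Class size from boundaries, and from the limits of adjacent classes.
size_from_boundaries = ucb - lcb
size_from_limits = next_cls[0] - cls[0]

# Class mark: average of the limits, or equivalently of the boundaries.
mark_from_limits = (cls[0] + cls[1]) / 2
mark_from_boundaries = (lcb + ucb) / 2

print("LCB =", lcb, "UCB =", ucb)
print("Class size =", size_from_boundaries)
print("Class mark =", mark_from_limits)
```

Both routes to the class size and both routes to the class mark agree, which is why the class limits can stand in for the class boundaries in these computations.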
Steps in Constructing a Frequency
Distribution
1. Make an array for the given data.
2. Determine the number of classes (K). There are no
precise rules concerning the optimal number of classes but
Sturges’ formula can be used as a first approximation.
• Sturges’ formula: K = 1 + 3.322 log n, where n =
number of observations and log is the base-10 logarithm.
Round off K to the nearest integer.
3. Determine the approximate class size (C). Whenever
possible, all classes should be of the same size.
• Solve for the range R = max - min
• Compute C = R ÷ K, then round it up to the next whole
number.
4. Determine the lowest class limit. The first class must
include the smallest value in the data set.
5. Determine the upper limit of the lowest class using the
formula:
UCL = LCL + C – 1
6. Determine the class boundaries using the following
formulas (for data recorded as whole numbers):
LCB = LCL - 1/2
UCB = UCL + 1/2
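7. Determine all class limits by adding the class size C
to the limits of the previous class.
8. Tally the frequencies for each class. Sum the frequencies
and check against the total number of observations.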
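The steps for the number of classes, the class size, the limits and the boundaries can be sketched as a few helper functions. The function names are hypothetical, the logarithm is taken as base 10, and rounding C up with a ceiling is one reading of "round it up". The helpers are applied to Table 2 (n = 110, smallest score 50, largest score 96); the resulting classes are only one reasonable choice.

```python
import math

def sturges_k(n):
    """Number of classes K from Sturges' formula, rounded to the nearest integer."""
    k = 1 + 3.322 * math.log10(n)   # "log" is taken as base 10
    return math.floor(k + 0.5)      # round half up

def class_size(minimum, maximum, k):
    """Approximate class size C: range divided by K, rounded up."""
    data_range = maximum - minimum
    return math.ceil(data_range / k)

def class_limits(lowest_lcl, c, k):
    """(LCL, UCL) pairs for k classes of size c, using UCL = LCL + C - 1."""
    return [(lowest_lcl + i * c, lowest_lcl + i * c + c - 1) for i in range(k)]

def class_boundaries(limits):
    """(LCB, UCB) pairs for whole-number data: LCB = LCL - 1/2, UCB = UCL + 1/2."""
    return [(lcl - 0.5, ucl + 0.5) for lcl, ucl in limits]

# Values read from Table 2: n = 110, minimum 50, maximum 96.
k = sturges_k(110)
c = class_size(50, 96, k)
limits = class_limits(50, c, k)
boundaries = class_boundaries(limits)

print("K =", k, "C =", c)
print(limits)
print(boundaries)
```

It is worth confirming that the last class reaches the largest value. If it does not, one more class can be added or the class size increased.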
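Example:
Construct a frequency distribution for the following 30 exam
scores.
82 82 83 79 72 71 84 59 77 50
83 82 63 75 50 85 76 79 68 69
79 69 74 53 73 71 50 76 57 81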
Solution:
1. Make an array for the given data.
50 53 63 69 72 75 77 79 82 83
50 57 68 71 73 76 79 81 82 84
50 59 69 71 74 76 79 82 83 85
2. Determine the number of classes (K).
K = 1 + 3.322 log n, where n = 30
K = 1 + 3.322 log (30) = 5.91 ≈ 6
3. Determine the approximate class size (C).
R = max – min = 85 – 50 = 35
C = R ÷ K = 35 ÷ 6 = 5.8 ≈ 6
4. Determine the lowest class limit.
LCL = 50
5. Determine the upper limit of the lowest class using the formula:
UCL = LCL + C – 1
UCL = 50 + 6 – 1 = 55
6. Determine the class boundaries using the following formulas:
LCB = LCL – ½
LCB = 50 – ½ = 49.5
UCB = UCL + ½
UCB = 55 + ½ = 55.5
7. Determine all class limits by adding the class
size C to the limits of the previous class.
Class limits: 50–55, 56–61, 62–67, 68–73, 74–79, 80–85
8. Tally the frequencies for each class. Sum
the frequencies and check against the total
number of observations.
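Steps 7 and 8 for the 30 scores can be sketched as follows. The values LCL = 50, C = 6 and K = 6 come from the solution above; the variable names and the printed layout are illustrative only.

```python
# The 30 scores from the array in step 1 of the solution.
scores = [
    50, 53, 63, 69, 72, 75, 77, 79, 82, 83,
    50, 57, 68, 71, 73, 76, 79, 81, 82, 84,
    50, 59, 69, 71, 74, 76, 79, 82, 83, 85,
]

lowest_lcl, c, k = 50, 6, 6   # values obtained in steps 2 to 4

# Step 7: class limits, each class starting C above the previous one.
limits = [(lowest_lcl + i * c, lowest_lcl + i * c + c - 1) for i in range(k)]

# Step 8: tally each score into the class that contains it.
frequencies = [0] * k
for x in scores:
    frequencies[(x - lowest_lcl) // c] += 1

print("Class limits | Class boundaries | Frequency")
for (lcl, ucl), f in zip(limits, frequencies):
    print(f"{lcl}-{ucl} | {lcl - 0.5}-{ucl + 0.5} | {f}")
print("Total:", sum(frequencies))

# Check against the total number of observations.
assert sum(frequencies) == len(scores)
```

The integer division works only because all classes have the same size and the scores are whole numbers. With unequal class sizes, each score would have to be compared against the class limits directly.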
Variations of Frequency
Distribution
1. Relative Frequency (RF) Distribution and Relative Frequency
Percentage (RFP)
RF = class frequency/ no. of observations
RFP = RF × 100%
2. Cumulative Frequency Distribution (CFD) - shows the
accumulated frequencies of successive classes, beginning at
either end of the distribution.
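A brief sketch of both variations, using class frequencies tallied by hand from the 30-score array for the classes 50-55, 56-61, 62-67, 68-73, 74-79 and 80-85. The labels "less than" and "greater than" for the two directions of accumulation are common names assumed here, not terms defined in this unit.

```python
from itertools import accumulate

# Class frequencies for classes 50-55, 56-61, ..., 80-85 (tallied by hand).
frequencies = [4, 2, 1, 7, 8, 8]
n = sum(frequencies)

# Relative frequency and relative frequency percentage of each class.
rf = [f / n for f in frequencies]
rfp = [100 * r for r in rf]

# Cumulative frequencies, accumulated from the lowest class upward...
less_than_cf = list(accumulate(frequencies))
# ...and from the highest class downward.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

for f, r, p, lt, gt in zip(frequencies, rf, rfp, less_than_cf, greater_than_cf):
    print(f"{f:>2} | {r:.4f} | {p:6.2f}% | {lt:>2} | {gt:>2}")
```

Two quick checks are available: the relative frequencies should sum to 1, and the last "less than" value and the first "greater than" value should both equal n.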
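The median
The median (Md) is the middle value of an array X(1), X(2), ..., X(n):
Md = X((n + 1)/2), if n is odd
Md = [X(n/2) + X(n/2 + 1)] ÷ 2, if n is even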
Example: Given the following heights (in inches) of
gumamela plants: 71, 72, 75, 75 and 67. Find the median
height.
Solution:
Array: 67 71 72 75 75
n = 5 (odd)
The median is the 3rd value in the array: Md = 72 inches.
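The odd and even cases can be sketched as one small function: for odd n the median is the middle value of the array, and for even n it is the average of the two middle values. The function name is hypothetical, and the even-sized list in the second call is simply the same heights with one 75 dropped, used only to exercise that branch.

```python
def median(values):
    """Median of a list: the middle value of the array, or the
    average of the two middle values when n is even."""
    array = sorted(values)          # make the array first
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        return array[mid]           # the ((n + 1)/2)-th value, counting from 1
    return (array[mid - 1] + array[mid]) / 2   # average of the (n/2)-th and (n/2 + 1)-th values

heights = [71, 72, 75, 75, 67]
print(median(heights))              # odd n
print(median([71, 72, 75, 67]))     # even n
```

Python's standard library offers `statistics.median`, which follows the same rule and may be preferable in practice.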