GE 104 Module 4


Chapter 4 –Statistics

Learning Objectives:
• Use a variety of statistical tools to process and manage numerical data
• Advocate the use of statistical data in making important decisions

Introduction
Statistics is a branch of applied mathematics that deals with gathering, organizing, presenting,
analysing, and interpreting data. It has two branches: descriptive statistics and inferential
statistics. Descriptive statistics involves collecting, organizing, describing, summarizing and
presenting gathered data in a meaningful and informative way, while inferential statistics refers to
the process of drawing conclusions and making decisions about a population based on evidence obtained
from a sample. Inferential statistics includes estimation and hypothesis testing.

Performing these processes requires the application of statistical tools and techniques.
Statistical tools derived from mathematics are useful in processing and managing numerical
data in order to describe a phenomenon and predict values.

The first steps, gathering and organizing, prepare the data to be analysed and interpreted. A large
data set can be organized into a frequency distribution (grouped data), while a small set can be kept
as a series of values (ungrouped data). Measures of central tendency, namely the mean, the median and
the mode, give a central value that describes the general or overall performance of a group of values.
Measures of dispersion, such as the range, the variance and the standard deviation, show how close to
or far from each other the values are, that is, whether they are scattered and spread out or clustered
together. Measures of relative position, which include percentiles, quartiles and deciles, locate a
value within the distribution.

“The tool which serves as intermediary between theory and practice, between thought and observation, is Mathematics; it is mathematics
which builds the linking bridges and gives the ever more reliable forms.” - David Hilbert

Key Concepts
Gathering and organizing Data
Data (Asaad, 2004) are the quantities (numbers) or qualities (attributes) measured or observed
that are to be collected and/or analysed. A collection of data is called a data set. The two categories
of data are categorical and continuous data. Categorical data are measured on nominal and ordinal
scales, while continuous data are measured on interval and ratio scales.
Nominal Scale – consists of a finite set of possible values having no particular order. (Example: gender,
mode of transportation, nationality, occupation and civil status).
Ordinal Scale – consists of a set of possible values having a specific order. (Example: pain level,
social status, attitude towards a subject)
Continuous Scale – comprises the interval and ratio scales.
Interval Scale – values are measured on a continuum, and differences between any two numbers on the
scale are of known size. (Example: temperature, tons of garbage, number of arrests, income and age).
Strictly speaking, all of these examples except temperature in degrees Celsius or Fahrenheit also have
a true zero point, which makes them ratio-scale measurements.
The scales need to be distinguished because the appropriate statistical method depends on the type of
data. Categorical data use non-parametric statistics while continuous data use parametric statistics.

A variable is a property that can take on different values or categories which cannot be
predicted with certainty. The three common types of variables are independent variables (X), also
called explanatory variables, which may be continuous, nominal or ordinal; dependent variables (Y),
also called response variables; and control variables (Z). Variables can also be classified as
qualitative or quantitative.
Quantitative variable – one that can be measured and ordered according to quantity. A quantitative
variable may be discrete or continuous.
Discrete variable – takes a finite or countably infinite set of values.
Continuous variable – takes any value in an interval of the real number line.
Qualitative variable – one whose values serve simply as labels that distinguish one group from another.
The data gathered should be presented, analysed and interpreted in a way that the reader can easily
understand. Data may be presented in textual, tabular or graphical form, or in a combination of these.
Textual presentation – uses statements with numerals, in expository form, to describe the data. Its
purpose is to discuss the data together with the information and interpretation they carry.
Tabular presentation – uses a statistical table to directly display the quantities or values collected as data.
Graphical presentation – illustrates data in the form of graphs, helping readers to understand the text easily.
A graph is an attractive, effective and convincing way to present data. Common types include the bar
graph, circle graph, line graph and pictograph.

Measures of Central Tendency

A measure of central tendency, or measure of central location, is a summary measure that
describes a whole set of data with a single value that represents the middle or center of the
distribution, the point around which the data cluster. In short, it is a measure that tells
where the center of a data set is located.

The most commonly used measures of central tendency are the mean, median, and
mode. These are used when the general or overall performance of a class is compared with that of other classes.

MEAN

The mean, Mn, is also called the arithmetic mean or average. It can be affected by extreme scores.
It is stable and varies less from sample to sample than the other measures. It is used when the most
reliable measure is desired and the data contain few extremely high or extremely low values. The mean
is the balance point of a score distribution.

How to compute for the mean?

A. Ungrouped Data

a. Mean, Mn = (sum of the values) / (number of values)
Examples:
1. The ages of five contestants in a Statistics Quiz Bee are the following: 18, 17, 18, 19, and 18.
Find their average age.

Solution:
Mn = (18 + 17 + 18 + 19 + 18) / 5     Add all the values (ages), then divide the sum by 5.
Mn = 90 / 5 = 18
Thus, the mean age of the contestants is 18.

2. Six employees are working as call center agents. Their salaries are as follows: P23,500,
P24,300, P25,800, P23,900, P24,100, and P24,950. What is the average salary of the
employees?
Solution:
Mn = (23500 + 24300 + 25800 + 23900 + 24100 + 24950) / 6
Mn = 146550 / 6 = 24425

Thus, the mean salary of the employees is P24,425. Employees who are getting salaries below
P24,425 are receiving less than the average salary, while those who are getting salaries
above P24,425 are receiving more than the average salary.
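
The ungrouped mean can be checked with a few lines of Python. This is only a minimal sketch; the function name `mean` and the variable names are illustrative and not part of the module.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Data from the two worked examples above.
ages = [18, 17, 18, 19, 18]
salaries = [23500, 24300, 25800, 23900, 24100, 24950]

print(mean(ages))       # 18.0
print(mean(salaries))   # 24425.0
```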

b. Weighted Mean
The weighted mean is a mean calculated by giving values in a data set more influence according to
some attribute of the data. It is an average in which each quantity to be averaged is assigned a weight, and
these weights determine the relative importance of each quantity in the average. A weight of w is the
equivalent of having w like items with the same value involved in the average.
The formula for the weighted mean is WMn = ∑wx / ∑w, where w is the weight of each value and x is the
matching value.
Examples:
1. Xandra bought different fruits for New Year. She bought 3 apples at P10 each, 5 ponkans at P5
each, 3 pears at P15 each, 4 pieces of chico at P25 each. What is the average price of each fruit
that Xandra bought?
Solution:
WMn = ∑wx / ∑w = [3(10) + 5(5) + 3(15) + 4(25)] / (3 + 5 + 3 + 4)

WMn = (30 + 25 + 45 + 100) / 15 = 200 / 15 = 13.33
Thus, the average price of each fruit bought by Xandra is P13.33.

2. At MJR fitness and health society, 60% of the members are women and 40% are men. What is
the average age of all the members if the average age of the women is 35 and the average age
of the men is 30?

Solution:
Given: average age of women = 35
Average age of men = 30
Condition: 60% of the members are women and 40% are men.
• To solve the problem, multiply each average age by its corresponding weight:
35(0.60) = 21
30(0.40) = 12
• Then add the products. Since the weights sum to 0.60 + 0.40 = 1, no further division is needed:
21 + 12 = 33
Hence, the average age of all the members of the society is 33.
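
The weighted mean formula, ∑wx divided by ∑w, can be sketched in Python as follows. The function name `weighted_mean` and the variable names are illustrative assumptions; the data are taken from the two examples above.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of (weight * value) divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Example 1: prices weighted by the number of pieces bought.
fruit_price = weighted_mean([10, 5, 15, 25], [3, 5, 3, 4])
# Example 2: average ages weighted by the share of members.
member_age = weighted_mean([35, 30], [0.60, 0.40])

print(round(fruit_price, 2))  # 13.33
print(round(member_age, 2))   # 33.0
```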

B. Grouped Data

One method that can be used to find the mean of grouped data is the class mark or midpoint
method.
Class mark or Midpoint Method
In this method, the class mark of each interval is determined and then multiplied by the
corresponding frequency of that class interval. The formula for the mean using this method is

Mn = ∑fx / N

Where:

Mn = Mean
f = frequency
x = class mark or midpoint
N = total number of observations

Examples:
1. Consider the frequency distribution below:
CI        f
75 – 79   5
70 – 74   7
65 – 69   8
60 – 64   10
55 – 59   7
50 – 54   9
45 – 49   4
N = 50
Determine the mean of the distribution
Solution:
First, get the midpoint or class mark of each class interval. Next, multiply the frequency of each
class by the corresponding midpoint or class mark. Then, get the sum of the products. The table is shown
below:
CI        f    x    fx
75 – 79   5    77   385
70 – 74   7    72   504
65 – 69   8    67   536
60 – 64   10   62   620
55 – 59   7    57   399
50 – 54   9    52   468
45 – 49   4    47   188
N = 50              ∑fx = 3100
From the values in the table above, we can now compute the mean by substituting
N = 50 and ∑fx = 3100 in the formula.

Mn = ∑fx / N = 3100 / 50 = 62

Thus, the mean of the data is 62.

2. The heights of 40 grade 6 pupils in a certain grade school are presented in a frequency distribution as
shown below:
Height of a class of 40 students
CI f
48 – 52 4
53 – 57 7
58 – 62 7
63 – 67 8
68 - 72 6
73 – 77 6
78 – 82 2
N = 40
Determine the average height of the students using the midpoint method
Solution:
First, get the midpoint or class mark of each class interval. Next, multiply the frequency of each
class by the corresponding midpoint or class mark. Then, get the sum of the products. The table is shown
below:

Height of a class of 40 students
CI        f    x    fx
48 – 52   4    50   200
53 – 57   7    55   385
58 – 62   7    60   420
63 – 67   8    65   520
68 – 72   6    70   420
73 – 77   6    75   450
78 – 82   2    80   160
N = 40              ∑fx = 2555
From the values above, we now compute the mean by substituting N = 40 and ∑fx = 2555
in the formula.
Mn = ∑fx / N = 2555 / 40 = 63.88
Therefore, the mean height of the students is 63.88 cm.
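
The class mark method can be sketched in Python as below. The function name `grouped_mean` and the tuple layout (lower limit, upper limit, frequency) are assumptions made for illustration; the frequencies are those used in the two solution tables above.

```python
def grouped_mean(classes):
    """Mean of grouped data by the class mark (midpoint) method.

    classes is a list of (lower_limit, upper_limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    # Class mark x = (lower limit + upper limit) / 2; accumulate f * x.
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / n

scores = [(75, 79, 5), (70, 74, 7), (65, 69, 8), (60, 64, 10),
          (55, 59, 7), (50, 54, 9), (45, 49, 4)]
heights = [(48, 52, 4), (53, 57, 7), (58, 62, 7), (63, 67, 8),
           (68, 72, 6), (73, 77, 6), (78, 82, 2)]

print(grouped_mean(scores))   # 62.0
print(grouped_mean(heights))  # 63.875
```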

MEDIAN

The median, Md, is the value that divides an arranged (ascending or descending) set of scores
into two equal parts. It is the midpoint or middlemost value of a distribution of scores: fifty percent
of the scores fall above it and 50% fall below it. It is also known as the 50th percentile. It is not
affected by extreme scores, so it is used when the distribution of scores is skewed.

How to compute for the median?

A. Ungrouped Data

The median is obtained by inspecting the middlemost value of the distribution after it has been
arranged in ascending or descending order. Its position can also be found using the formula (N+1)/2:
the median is the [(N+1)/2]th score of the arranged data.

Examples:

1. Find the median of the following scores:


50, 55, 60, 65, 12, 35, 48
Solution: (arranging the scores in an ascending order)
12, 35, 48, 50, 55, 60, 65 N = 7

Therefore:
Md = [(N+1)/2]th score = [(7+1)/2]th score = 4th score
Md = 50

2. Find the median of the following weights in kilos.


101, 107, 115, 120, 111, 105
Solution: (arranging the numbers in an ascending order)
101, 105, 107, 111, 115, 120 N=6
Therefore:
Md = [(N+1)/2]th score
Md = [(6+1)/2]th score
= (7/2)th score = 3.5th score, that is, midway between the 3rd and the 4th scores.
Md = (107 + 111)/2 = 109
Md = 109
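
The two cases above (odd and even N) can be sketched in Python. The function name `median` is illustrative; Python's standard library also provides `statistics.median`, which gives the same results.

```python
def median(values):
    """Median of ungrouped data: the [(N+1)/2]th score of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd N: the position (N+1)/2 is a whole number.
        return ordered[middle]
    # Even N: the position falls midway between two scores, so average them.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([50, 55, 60, 65, 12, 35, 48]))      # 50
print(median([101, 107, 115, 120, 111, 105]))    # 109.0
```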

B. Grouped Data

In computing the median of grouped data, first determine the median class, the class that contains
the (N/2)th score according to the less-than cumulative frequency (˂cf) column. To solve for the
median, we use the formula:

Md = XLB + [(N/2 – cfb) / fm] i

where: Md= median
XLB = the lower boundary or true lower limit of the median class
N = total frequency
cfb = cumulative frequency before the median class
fm = frequency of the median class
i = size of the class interval
Examples:

1. Solve for the median for the following data.

Statistics Test Results

Class interval   frequency   ˂cf
28-29            1           60
26-27            3           59
24-25            3           56
22-23            3           53
20-21            6           50
18-19            6           44
16-17            8           38
14-15            6           30    (median class; fm = 6)
12-13            10          24    (cfb = 24)
10-11            14          14
N = 60

Solution:
(N/2)th score = (60/2)th score
= 30th score
The median class is 14-15, since its cumulative frequency (30) is the first to reach the 30th score.
XLB = 13.5
cfb = 24
fm = 6
i = 2
Using the formula for the median, we have:
Md = XLB + [(N/2 – cfb) / fm] i
Md = 13.5 + [(60/2 – 24) / 6] (2)
= 13.5 + [(30 – 24) / 6] (2)
= 13.5 + (6/6)(2)
= 13.5 + (1)(2)
= 13.5 + 2
Md = 15.5 ….Answer

This means that 50% of the students got a score below 15.5. If the passing score is 50% of the
total number of items, almost half of the class failed the test.

2. Find the median of the following distribution


Class interval frequency ˂cf
25-29 3 3
30-34 2 5
35-39 5 10
40-44 8 18
45-49 8 26
50-54 8 34
55-59 9 43
60-64 6 49
65-69 6 55
70-74 3 58
75-79 3 61
80-84 3 64
N = 64

Solution:
(N/2)th score = (64/2)th score
= 32nd score
Looking at the ˂cf column, the first cumulative frequency that reaches 32 is 34. The class interval
that corresponds to 34 is 50-54. Therefore, the median class, which contains the 32nd score, is the
interval 50-54.
XLB = 49.5
cfb = 26
fm = 8
i = 5

Using the formula for the median, we have:

Md = XLB + [(N/2 – cfb) / fm] i
Md = 49.5 + [(64/2 – 26) / 8] (5)
= 49.5 + [(32 – 26) / 8] (5)
= 49.5 + (6/8)(5)
= 49.5 + (0.75)(5)
= 49.5 + 3.75
Md = 53.25 ….Answer
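
The grouped median formula can be sketched in Python as below. The function name `grouped_median` and the tuple layout are illustrative assumptions, and the sketch assumes integer class limits, so that the lower boundary is the lower limit minus 0.5 and the class size is upper limit minus lower limit plus 1.

```python
def grouped_median(classes):
    """Median of grouped data: Md = XLB + [(N/2 - cfb) / fm] * i.

    classes is a list of (lower_limit, upper_limit, frequency) tuples,
    ordered from the lowest class to the highest. Assumes integer limits.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the frequencies must not all be zero")
    half = n / 2
    cf_before = 0  # cumulative frequency before the current class (cfb)
    for lower, upper, f in classes:
        if f > 0 and cf_before + f >= half:
            lower_boundary = lower - 0.5      # XLB
            class_size = upper - lower + 1    # i
            return lower_boundary + (half - cf_before) / f * class_size
        cf_before += f

test_results = [(10, 11, 14), (12, 13, 10), (14, 15, 6), (16, 17, 8), (18, 19, 6),
                (20, 21, 6), (22, 23, 3), (24, 25, 3), (26, 27, 3), (28, 29, 1)]
second_example = [(25, 29, 3), (30, 34, 2), (35, 39, 5), (40, 44, 8), (45, 49, 8),
                  (50, 54, 8), (55, 59, 9), (60, 64, 6), (65, 69, 6), (70, 74, 3),
                  (75, 79, 3), (80, 84, 3)]

print(grouped_median(test_results))    # 15.5
print(grouped_median(second_example))  # 53.25
```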

MODE
The mode is the value with the largest frequency. It is the value that occurs most frequently in the
distribution. This is used when the quickest estimate of typical performance is wanted. If no number in the
list is repeated, then there is no mode in the distribution. However, it is also possible to have more than
one mode for the same distribution of data. A distribution can be unimodal, with one mode;
bimodal, with two modes; or trimodal, with three modes.

How to find the mode?

A. Ungrouped Data

The mode of ungrouped data is found by mere inspection.

Examples:
1. Find the mode of the following discounts.
13%, 4%, 7%, 7%, 9%, 7%, 11%, 11%, 8%, 10%, 8%
Solution: (Arrange the data set in order)
4%, 7%, 7%, 7%, 8%, 8%, 9%, 10%, 11%, 11%, 13%
By inspection, the mode is 7% since it has the largest frequency (unimodal).

2. Find the mode of the following distribution.


7, 9, 10, 10, 10, 15, 20, 20, 20
Solution: (By inspection; the data are already arranged in ascending order.)
The modes are 10 and 20 (bimodal).
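
Finding the mode by inspection amounts to counting how often each value occurs. A minimal Python sketch follows; the function name `modes` is illustrative, and returning an empty list when no value repeats follows the convention stated above.

```python
from collections import Counter

def modes(values):
    """Return all modes of ungrouped data, or [] if no value is repeated."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no number is repeated, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

discounts = [13, 4, 7, 7, 9, 7, 11, 11, 8, 10, 8]   # in percent
print(modes(discounts))                              # [7]
print(modes([7, 9, 10, 10, 10, 15, 20, 20, 20]))     # [10, 20]
print(modes([1, 2, 3]))                              # []
```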
B. Grouped Data

To find the mode of grouped data, first determine the modal class. In a frequency distribution,
the modal class can be easily determined by inspection, as it is the class with the highest frequency.
Then use the formula

Mo = XLB + [d1 / (d1 + d2)] i

where:
Mo = mode
XLB = the lower boundary or true lower limit of the modal class
d1 = difference between the frequency of the modal class and the frequency of the class before
the modal class
d2 = difference between the frequency of the modal class and the frequency of the class after
the modal class
i = size of the class interval

Example:
1. Calculate the modal score from the distribution of scores in a mathematics quiz.

Class interval   frequency
10-14            2
15-19            5
20-24            8     (frequency before)
25-29            12    (modal class)
30-34            6     (frequency after)
35-39            4
40-44            3
N = 40

Solution:
a. By inspection, the highest frequency is 12 and the corresponding class interval is 25-29. This
means that the modal class is the interval 25-29.
XLB = 24.5
d1 = 12 – 8 = 4
d2 = 12 – 6 = 6
i=5

b. Substitute the values into the formula.

Mo = XLB + [d1 / (d1 + d2)] i
Mo = 24.5 + [4 / (4 + 6)] (5)
Mo = 24.5 + (4/10)(5)
Mo = 24.5 + 0.4(5)
Mo = 24.5 + 2
Mo = 26.5
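
The grouped mode formula can be sketched in Python as follows. The function name `grouped_mode` is illustrative. Two choices in this sketch are assumptions not stated in the module: if several classes share the highest frequency, only the first is used, and a missing neighbouring class (when the modal class is the first or last) is treated as having frequency 0.

```python
def grouped_mode(classes):
    """Mode of grouped data: Mo = XLB + [d1 / (d1 + d2)] * i.

    classes is a list of (lower_limit, upper_limit, frequency) tuples,
    ordered from the lowest class to the highest. Assumes integer limits.
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))            # position of the modal class
    lower, upper, f_modal = classes[k]
    f_before = freqs[k - 1] if k > 0 else 0                 # assumption: 0 if absent
    f_after = freqs[k + 1] if k < len(freqs) - 1 else 0     # assumption: 0 if absent
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    lower_boundary = lower - 0.5           # XLB
    class_size = upper - lower + 1         # i
    return lower_boundary + d1 / (d1 + d2) * class_size

quiz = [(10, 14, 2), (15, 19, 5), (20, 24, 8), (25, 29, 12),
        (30, 34, 6), (35, 39, 4), (40, 44, 3)]
print(round(grouped_mode(quiz), 2))  # 26.5
```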
Assessment Task ( 8 )

Name: ____________________________Course, Year & Section: _________________Score: _____

Multiple Choice Tests


DIRECTIONS: Choose the letter of the correct answer and write it on the blank provided before the
number.

_____1. Which of the following is the use of statistics?


a. It can give precise description of the data
b. It can predict the behavior of individuals
c. It can be used to test a hypothesis
d. All of these
_____2. The following are uses of statistics in business, except for one. Which is it?
a. Used in forecasting business trends
b. Used in sales forecasting
c. Used in formulation of national policies
d. Used in management and control
_____3. Each of the following is a process used in the collection of data except for one. Which is it?
a. experiment b. tables c. test d. interview
_____4. This refers to the process of gathering relevant information from the population.
a. collection of data
b. organization of data
c. analysis of data
d. interpretation of data
_____5. This refers to the process of deducing relevant information from the given data so that
numerical description can be formulated.
a. collection of data
b. organization of data
c. analysis of data
d. interpretation of data

Test II – Problem Solving.

1. The sizes of pants sold during one business day in a department store are;
32, 38, 34, 42, 36, 34, 40, 44, 32, 34. Find the average size of the pants sold.

2. It was recorded that 4 brands of ball pen with tag prices of 17.50, 24.50, 28.50, and 32.50 were
bought by 7, 12, 4, and 2 students respectively. Find the mean sale.

3. Find the median for the following set of scores;


15, 28, 25, 48, 22, 43, 39, 44, 43, 49, 34, 22, 33, 27, 25, 22, 30

4. Find the mode for the following data set.

121, 110, 120, 119, 112, 121, 118, 115, 107, 115

5. Find the mode for the following data set.

7, 5, 10, 22, 20, 6, 11, 21


Assessment Task ( 9 )

Name: ____________________________Course, Year & Section: _________________Score: _____

Problem Solving.

1. The heights of 40 grade 6 pupils in a certain grade school are presented in a frequency distribution
as shown below.

Height of a class of 40 students


CI f
48-52 4
53-57 7
58-62 7
63-67 8
68-72 6
73-77 6
78-82 2
N = 40
Find the mean height of the students.

2. Compute the median from the distribution of scores in a Mathematics Quiz.

CI f
10-14 2
15-19 5
20-24 8
25-29 12
30-34 6
35-39 4
40-44 3
N= 40

3. Compute the mode from the distribution of Grades in Mathematics.

Grade (%) Classes    No. of Examinees (f)
90-94 10
85-89 10
80-84 15
75-79 22
70-74 18
65-69 12
N = 87
Measures of Relative Position

Just as the median divides the set of scores into two equal parts, there are other measures that divide
the distribution into one hundred, four, or ten equal parts. These are the other measures of position:
the percentiles, the quartiles, and the deciles.

Computation of Quartiles, Deciles, and Percentiles for Ungrouped Data

1. Arrange the scores according to magnitude or size.
2. Compute the position of the given Quartile, Decile, or Percentile in the distribution using the
formula:

Qk = [k(N+1)/4]th item     Dk = [k(N+1)/10]th item     Pk = [k(N+1)/100]th item

where Qk = desired quartile, Dk = desired decile, Pk = desired percentile, and
N = number of cases
3. Starting from the lowest score, locate the score corresponding to the obtained position in the
distribution.
4. Interpolate to get the score if the obtained position from step 2 is not exact.
Examples:
1. Consider the following scores of students in a Math test.
35, 40, 8, 10, 15, 32, 30, 28, 25, 22, 20, 18
Find:
1. Q2 =
2. D7 =
3. P30 =

Solution:

For Q2:

1. Arrange the scores according to magnitude or size:
8, 10, 15, 18, 20, 22, 25, 28, 30, 32, 35, 40. There are 12 scores/observations, so N = 12.
2. Compute the position of Q2:
Q2 = [2(N+1)/4]th item
= 2(12 + 1)/4
= 2(13)/4
= 26/4
Q2 = 6.5th item. This means that Q2 is located between the 6th item (22) and the 7th item (25).
3. Since the obtained position of Q2 is not a whole number, we need to interpolate.
4. Interpolation:
25 – 22 = 3
3(0.5) = 1.5
22 + 1.5 = 23.5
Q2 = 23.5 → This means that 50% of the scores lie below 23.5.

For D7:
1. Arrange the scores according to magnitude or size:
8, 10, 15, 18, 20, 22, 25, 28, 30, 32, 35, 40. There are 12 scores/observations, so N = 12.
2. Compute the position of D7:
D7 = [7(N+1)/10]th item
= 7(12 + 1)/10
= 7(13)/10
= 91/10
D7 = 9.1th item. This means that D7 is located between the 9th item (30) and the 10th item (32).
3. Since the obtained position of D7 is not a whole number, we need to interpolate.
4. Interpolation:
32 – 30 = 2
2(0.1) = 0.2
30 + 0.2 = 30.2
D7 = 30.2 → This implies that 70% of the scores lie below 30.2.

For P30:
1. Arrange the scores according to magnitude or size:
8, 10, 15, 18, 20, 22, 25, 28, 30, 32, 35, 40. There are 12 scores/observations, so N = 12.
2. Compute the position of P30:
P30 = [30(N+1)/100]th item
= 30(12 + 1)/100
= 30(13)/100
= 390/100
P30 = 3.9th item. This means that P30 is located between the 3rd item (15) and the 4th item (18).
3. Since the obtained position of P30 is not a whole number, we need to interpolate.
4. Interpolation:
18 – 15 = 3
3(0.9) = 2.7
15 + 2.7 = 17.7
P30 = 17.7 → This shows that 30% of the scores lie below 17.7.
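
The four steps above (sort, compute the position k(N+1)/parts, locate, interpolate) can be sketched in one Python function. The name `quantile` and its parameters are illustrative; clamping positions that fall before the first or after the last score is an assumption of this sketch, since the module does not cover that case.

```python
import math

def quantile(values, k, parts):
    """k-th quantile using the position k(N+1)/parts with linear interpolation.

    parts is 4 for quartiles, 10 for deciles and 100 for percentiles.
    """
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / parts      # 1-based position in the sorted list
    whole = math.floor(position)
    fraction = position - whole
    if whole < 1:                       # assumption: clamp to the lowest score
        return ordered[0]
    if whole >= n:                      # assumption: clamp to the highest score
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)

scores = [35, 40, 8, 10, 15, 32, 30, 28, 25, 22, 20, 18]
print(round(quantile(scores, 2, 4), 2))     # 23.5  (Q2)
print(round(quantile(scores, 7, 10), 2))    # 30.2  (D7)
print(round(quantile(scores, 30, 100), 2))  # 17.7  (P30)
```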

Computation of Quartiles, Deciles, and Percentiles for Grouped Data

Quartiles:

Qk = L + [(kN/4 – F) / fk] i

where:

Qk = kth Quartile
L = exact lower limit of the interval containing the kth quartile
F = cumulative frequency before the kth quartile class
fk = frequency of the kth quartile class
N = total number of observations
i = size of the class interval
Deciles:

Dk = L + [(kN/10 – F) / fk] i

where:

Dk = kth Decile
L = exact lower limit of the interval containing the kth decile
F = cumulative frequency before the kth decile class
fk = frequency of the kth decile class
N = total number of observations
i = size of the class interval

Percentile:

Pk = L + [(kN/100 – F) / fk] i

where:

Pk = kth Percentile
L = exact lower limit of the interval containing the kth percentile
F = cumulative frequency before the kth percentile class
fk = frequency of the kth percentile class
N = total number of observations
i = size of the class interval

Example:

1. Consider the frequency distribution of scores of 30 students in a Math Test

Scores   f   F
70-79    2   30
60-69    3   28
50-59    2   25
40-49    7   23   (P68 class)
30-39    9   16   (Q1 and D3 class)
20-29    7   7
N = 30

Find:
a. Q1
b. D3
c. P68

Solution:

For Q1:
1. First compute the less-than cumulative frequency (˂F) of the data.
2. Next, find the first quartile class:
kN/4 = 1(30)/4 = 7.5 → the 7.5th score belongs to the class interval 30-39
3. Then solve using the formula for quartiles:

Qk = L + [(kN/4 – F) / fk] i
Q1 = 29.5 + [(1(30)/4 – 7) / 9] (10)
= 29.5 + [(7.5 – 7) / 9] (10)
= 29.5 + (0.5/9)(10)
= 29.5 + (0.056)(10)
= 29.5 + 0.56
Q1 = 30.06 ≈ 30.1 → This means that 25% of the students scored below 30.1

For D3 :
1. First compute the less-than cumulative frequency (˂F) of the data.
2. Next, find the 3rd decile class:
kN/10 = 3(30)/10 = 90/10 = 9 → the 9th score is located in the class interval 30-39
3. Then solve using the formula for deciles:

Dk = L + [(kN/10 – F) / fk] i
D3 = 29.5 + [(3(30)/10 – 7) / 9] (10)
= 29.5 + [(9 – 7) / 9] (10)
= 29.5 + (2/9)(10)
= 29.5 + (0.22)(10)
= 29.5 + 2.2
D3 = 31.7 → This implies that 30% of the students scored below 31.7

For P68 :
1. First compute the less-than cumulative frequency (˂F) of the data.
2. Next, find the 68th percentile class:
kN/100 = 68(30)/100 = 2040/100 = 20.4 → the 20.4th score is located in the class interval 40-49
3. Then solve using the formula for percentiles:

Pk = L + [(kN/100 – F) / fk] i
P68 = 39.5 + [(68(30)/100 – 16) / 7] (10)
= 39.5 + [(20.4 – 16) / 7] (10)
= 39.5 + (4.4/7)(10)
= 39.5 + (0.63)(10)
= 39.5 + 6.3
P68 = 45.8 → This means that 68% of the students scored below 45.8
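
Because the quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), one Python function can cover all three. This is a sketch under the assumption of integer class limits; the name `grouped_quantile` and the tuple layout are illustrative.

```python
def grouped_quantile(classes, k, parts):
    """k-th quantile of grouped data: L + [(kN/parts - F) / fk] * i.

    classes is a list of (lower_limit, upper_limit, frequency) tuples,
    ordered from the lowest class to the highest. Assumes integer limits.
    parts is 4 for quartiles, 10 for deciles and 100 for percentiles.
    """
    n = sum(f for _, _, f in classes)
    target = k * n / parts
    cf_before = 0  # F: cumulative frequency before the quantile class
    for lower, upper, f in classes:
        if f > 0 and cf_before + f >= target:
            exact_lower_limit = lower - 0.5   # L
            class_size = upper - lower + 1    # i
            return exact_lower_limit + (target - cf_before) / f * class_size
        cf_before += f
    raise ValueError("k must not exceed parts")

math_test = [(20, 29, 7), (30, 39, 9), (40, 49, 7),
             (50, 59, 2), (60, 69, 3), (70, 79, 2)]
print(round(grouped_quantile(math_test, 1, 4), 1))     # 30.1  (Q1)
print(round(grouped_quantile(math_test, 3, 10), 1))    # 31.7  (D3)
print(round(grouped_quantile(math_test, 68, 100), 1))  # 45.8  (P68)
```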
Assessment Task ( 10 )

Name: ____________________________Course, Year & Section: _________________Score: _____

A. Find Q3, D9, and P77 of the following data. 20, 27, 23, 28, 23, 25

B. In a class of 50, Jason got a percentile rank of 65.

1. What does this percentile rank imply?


2. How many students rank below Jason?
3. How many students rank above Jason?

C. Find Q1, D6, and P58 of the following group data.

Weights in lbs f
129-136 2
121-128 7
113-120 6
105-112 5
97-104 10
89-96 12
81-88 8
Measures of Variability

What is variability? Variability refers to how spread apart the values/observation are or how much
the values/observations vary from each other.

There are four measures of variability: The Range, Mean Absolute Deviation, Variance and
Standard Deviation

Four Measures of Variability

1. Range (R). The range is the difference between the highest value and the lowest value. It is the
simplest measure of variability to calculate. The formula is as follows:

R = Highest – Lowest

Examples:

1. Find the range of the following group of numbers: 10, 12, 5, 16, 7, 13, 4.
Solution:
By inspection, the highest number is 16, and the lowest number is 4.
So, 16-4 = 12
Range = 12
2. Given the frequency distribution table below, compute for Range.
Scores of 40 students in a 60 item Quiz
Class Interval f
53-58 3
47-52 4
41-46 1
35-40 2
29-34 10
23-28 11
17-22 4
11-16 3
5-10 2
N=40

Solution:
First, determine the class boundaries. So we have
Class Interval   f    Class Boundaries
53-58            3    52.5 – 58.5
47-52            4    46.5 – 52.5
41-46            1    40.5 – 46.5
35-40            2    34.5 – 40.5
29-34            10   28.5 – 34.5
23-28            11   22.5 – 28.5
17-22            4    16.5 – 22.5
11-16            3    10.5 – 16.5
5-10             2    4.5 – 10.5
N = 40
Next, get the difference between the highest class boundary and the lowest class boundary. Then
we will have,
R = HCB – LCB
R = 58.5 – 4.5
R = 54 → Thus the range of the data set is 54 points
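
Both range computations can be sketched in Python. The function names are illustrative, and the grouped version assumes integer class limits, so that each class boundary lies 0.5 beyond the corresponding limit.

```python
def data_range(values):
    """Range of ungrouped data: highest value minus lowest value."""
    return max(values) - min(values)

def grouped_range(classes):
    """Range of grouped data: highest class boundary minus lowest class boundary.

    classes is a list of (lower_limit, upper_limit) tuples with integer limits.
    """
    highest_boundary = max(upper for _, upper in classes) + 0.5
    lowest_boundary = min(lower for lower, _ in classes) - 0.5
    return highest_boundary - lowest_boundary

print(data_range([10, 12, 5, 16, 7, 13, 4]))  # 12

quiz_classes = [(53, 58), (47, 52), (41, 46), (35, 40), (29, 34),
                (23, 28), (17, 22), (11, 16), (5, 10)]
print(grouped_range(quiz_classes))            # 54.0
```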
2. Mean Absolute Deviation (MAD) is the average distance between each observation and the mean.
It gives us an idea about the variability in a data set.

Steps in calculating the Mean Absolute Deviation (MAD).

1. Calculate the mean.
2. Calculate how far away each value/observation is from the mean using positive distances. These
are called absolute deviations.
3. Add those deviations together.
4. Divide the sum by the number of values/observations.

Formulas:

Ungrouped Data:
MAD = ∑│x – ẍ│ / N
Where:
MAD = mean absolute deviation
x = raw score
ẍ = mean score
N = number of observations

Grouped Data:
MAD = ∑f│x – ẍ│ / N
Where:
MAD = mean absolute deviation
f = frequency
x = class mark
ẍ = mean score
N = number of observations
Examples:
a. A group of mountaineers went on hiking to Mt. Mayon to study the different species of plants
existing in that area. The ages of the mountaineers are 34, 35, 45, 46, 49 and 32. What is the
MAD of their ages?
Solution:
Mean Age (ẍ) = (34 + 35 + 45 + 46 + 49 + 32) / 6 = 241 / 6 = 40.17

x     x – ẍ                  │x – ẍ│
34    34 – 40.17 = -6.17     6.17
35    35 – 40.17 = -5.17     5.17
45    45 – 40.17 = 4.83      4.83
46    46 – 40.17 = 5.83      5.83
49    49 – 40.17 = 8.83      8.83
32    32 – 40.17 = -8.17     8.17
                             ∑│x – ẍ│ = 39

MAD = ∑│x – ẍ│ / N = 39 / 6 = 6.5

Therefore, the mean absolute deviation of the ages is 6.5 years, or about 7 years when rounded to
the nearest whole year.
b. Given the frequency distribution table below, compute for the MAD.

Scores of 40 students in a 60 item Quiz

Class Interval f
53-58 3
47-52 4
41-46 1
35-40 2
29-34 10
23-28 11
17-22 4
11-16 3
5-10 2
N=40
Solution:
Scores of 40 students in a 60 item Quiz

Class interval   f    x      fx      │x – ẍ│   f│x – ẍ│
53 - 58          3    55.5   166.5   25.2      75.6
47 - 52          4    49.5   198     19.2      76.8
41 - 46          1    43.5   43.5    13.2      13.2
35 - 40          2    37.5   75      7.2       14.4
29 - 34          10   31.5   315     1.2       12
23 - 28          11   25.5   280.5   4.8       52.8
17 - 22          4    19.5   78      10.8      43.2
11 - 16          3    13.5   40.5    16.8      50.4
5 - 10           2    7.5    15      22.8      45.6
N = 40                ∑fx = 1212               ∑f│x – ẍ│ = 384

ẍ = ∑fx / N = 1212 / 40 = 30.3

MAD = ∑f│x – ẍ│ / N
= 384 / 40
= 9.6 ≈ 10 → Therefore, the mean absolute deviation of the scores of the students is
approximately 10.
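
The ungrouped and grouped MAD formulas differ only in the frequency weights, so a single Python function can sketch both. The name `mean_absolute_deviation` and the optional `freqs` parameter are illustrative assumptions; for grouped data the values passed in are the class marks.

```python
def mean_absolute_deviation(values, freqs=None):
    """MAD = sum of f * |x - mean| divided by N.

    For ungrouped data leave freqs as None (every frequency is 1).
    For grouped data pass the class marks as values and the frequencies as freqs.
    """
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n

ages = [34, 35, 45, 46, 49, 32]
class_marks = [55.5, 49.5, 43.5, 37.5, 31.5, 25.5, 19.5, 13.5, 7.5]
frequencies = [3, 4, 1, 2, 10, 11, 4, 3, 2]

print(round(mean_absolute_deviation(ages), 2))                      # 6.5
print(round(mean_absolute_deviation(class_marks, frequencies), 2))  # 9.6
```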

3. Variance is the average of the squared deviations of the set of observations from the mean. It
measures how far a data set is spread out.

Ungrouped Data
Population Variance: σ² = ∑(x – µ)² / N
Where:
σ² = population variance
x = raw score
µ = population mean
N = number of observations
Sample Variance: s² = ∑(x – ẍ)² / (n – 1)
Where:
s² = sample variance
x = raw score
ẍ = sample mean
n = number of observations

Grouped Data

Population Variance: σ² = ∑f(x – µ)² / N
Where:
σ² = population variance
f = frequency
x = class mark
µ = population mean
N = number of observations

Sample Variance: s² = [n∑fx² – (∑fx)²] / [n(n – 1)]
Where:
s² = sample variance
f = frequency
x = class mark
n = number of observations

Examples:
a. A group of mountaineers went on hiking to Mt. Mayon to study the different species of plants
existing in that area. The ages of the mountaineers are 34, 35, 45, 46, 49 and 32. What is the
variance of their ages?
Solution:

x     x – ẍ                  (x – ẍ)²
34    34 – 40.17 = -6.17     38.07
35    35 – 40.17 = -5.17     26.73
45    45 – 40.17 = 4.83      23.33
46    46 – 40.17 = 5.83      33.99
49    49 – 40.17 = 8.83      77.97
32    32 – 40.17 = -8.17     66.75
N = 6                        ∑(x – ẍ)² = 266.83
ẍ = 40.17

Population Variance: σ² = ∑(x – µ)² / N
= 266.83 / 6
= 44.4717 ≈ 44.47
Sample Variance: s² = ∑(x – ẍ)² / (n – 1)
= 266.83 / 5
= 53.366 ≈ 53.37

b. Given the frequency distribution table below, compute for the variance.

Scores of 40 students in a 60 item Quiz

Class Interval f
53-58 3
47-52 4
41-46 1
35-40 2
29-34 10
23-28 11
17-22 4
11-16 3
5-10 2
N=40
Solution:
Scores of 40 students in a 60 item Quiz

Class interval   f    x      (x – ẍ)²   f(x – ẍ)²
53 - 58          3    55.5   635.04     1905.12
47 - 52          4    49.5   368.64     1474.56
41 - 46          1    43.5   174.24     174.24
35 - 40          2    37.5   51.84      103.68
29 - 34          10   31.5   1.44       14.4
23 - 28          11   25.5   23.04      253.44
17 - 22          4    19.5   116.64     466.56
11 - 16          3    13.5   282.24     846.72
5 - 10           2    7.5    519.84     1039.68
N = 40                                  ∑f(x – ẍ)² = 6278.4

ẍ = 30.3
Population Variance: σ² = ∑f(x – µ)² / N
= 6278.4 / 40
= 156.96

Sample Variance: s² = [n∑fx² – (∑fx)²] / [n(n – 1)]
= [(40 x 43002) – (1212)²] / (40 x 39)
= (1,720,080 – 1,468,944) / 1,560
= 251,136 / 1,560
= 160.9846 ≈ 160.98
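
The population and sample variance can be sketched in one Python function. The name `variance` and its parameters are illustrative. The sketch uses the definitional form (sum of squared deviations divided by N or n – 1), which is algebraically equal to the shortcut formula used for the grouped sample variance above.

```python
def variance(values, freqs=None, sample=False):
    """Variance from the squared deviations about the mean.

    For grouped data pass the class marks as values and the frequencies as freqs.
    sample=False divides by N (population); sample=True divides by n - 1.
    """
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    squared_deviations = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return squared_deviations / (n - 1 if sample else n)

ages = [34, 35, 45, 46, 49, 32]
class_marks = [55.5, 49.5, 43.5, 37.5, 31.5, 25.5, 19.5, 13.5, 7.5]
frequencies = [3, 4, 1, 2, 10, 11, 4, 3, 2]

print(round(variance(ages), 2))                                    # 44.47
print(round(variance(ages, sample=True), 2))                       # 53.37
print(round(variance(class_marks, frequencies), 2))                # 156.96
print(round(variance(class_marks, frequencies, sample=True), 2))   # 160.98
```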
4. Standard Deviation is a measure of the dispersion of a set of data from its mean. It is determined
by calculating the positive square root of the variance. A large standard deviation indicates that
the data points are far from the mean (heterogeneous) and a small standard deviation indicates that
they are clustered closely around the mean (homogeneous).
Population Standard Deviation: σ = √σ²

Sample Standard Deviation: s = √s²

Examples:
a. A group of mountaineers went on hiking to Mt. Mayon to study the different species of plants
existing in that area. The ages of the mountaineers are 34, 35, 45, 46, 49 and 32. What is the
standard deviation of their ages? Is the data homogeneous or heterogeneous?
Solution:
σ = √σ²
= √44.47 = 6.668583 ≈ 6.67
s = √s²
= √53.37 = 7.305477 ≈ 7.31
• The data is homogeneous.

b. Given the frequency distribution table, compute for the standard deviation.

Scores of 40 students in a 60 item Quiz

Class Interval f
53-58 3
47-52 4
41-46 1
35-40 2
29-34 10
23-28 11
17-22 4
11-16 3
5-10 2
N=40
Is the data homogeneous or heterogeneous?
Solution:
σ = √σ² = √156.96 = 12.52839 ≈ 12.53
Comparing the value of 2i (twice the class size) with the computed standard deviation, we can
determine the homogeneity or heterogeneity of the given data set. If 2i > σ, then the data is
homogeneous. If 2i < σ, then the data is heterogeneous. So we have
2i = 2(6) = 12 and σ = 12.53
12 < 12.53 → this shows that the data is heterogeneous.
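
The standard deviations above can be checked with Python's standard `statistics` module, whose `pstdev` and `stdev` functions compute the population and sample standard deviation. The helper `spread_label` is a hypothetical name for the 2i comparison described in this module; how to label the case 2i = σ is not stated in the text, so the sketch arbitrarily calls it heterogeneous.

```python
import math
import statistics

ages = [34, 35, 45, 46, 49, 32]
population_sd = statistics.pstdev(ages)   # square root of the population variance
sample_sd = statistics.stdev(ages)        # square root of the sample variance
print(round(population_sd, 2))  # 6.67
print(round(sample_sd, 2))      # 7.31

def spread_label(sd, class_size):
    """Apply the module's rule: homogeneous if 2i > sd, otherwise heterogeneous."""
    # Assumption: the tie 2i == sd is labeled heterogeneous.
    return "homogeneous" if 2 * class_size > sd else "heterogeneous"

quiz_sd = math.sqrt(156.96)     # grouped population variance from the example
print(round(quiz_sd, 2))        # 12.53
print(spread_label(quiz_sd, 6)) # heterogeneous
```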

Assessment Task ( 11 )
Name: ____________________________Course, Year & Section: _________________Score: _____

1. Solve for the Range, MAD, Variance and Standard Deviation of the following data set
12, 18, 24, 27, 13, 17, 18, 20

2. Find the Range, MAD, Variance and Standard Deviation of the following grouped data.
Weight of 50 Women in a Fitness Club

Weights in lbs f
129 - 136 2
121 - 128 7
113 - 120 6
105 - 112 5
97 - 104 10
89 - 96 12
81 - 88 8
N = 50

Normal Distribution
Data can be “distributed” (spread out) in different ways. It can be spread out more on the left or
more on the right, or it can be all jumbled up.

But there are many cases where the data tend to cluster around a central value with no bias left or
right, and the distribution comes close to a “Normal Distribution”.

[Figure: bell-shaped normal curve. Source: Google Image]

The Normal Distribution is represented by a bell-shaped curve called the normal curve, which shows the
probability distribution of a continuous random variable. Some examples that follow a normal distribution
are heights of people, sizes of things produced by machines, blood pressures, and many more.

The normal curve has the following characteristics:


1. The curve is symmetric
2. The values of the mean, median, and the mode are the same
3. The curve represents a unimodal distribution
4. The area under the normal curve is 1 or the probability under the curve is 100%
5. The tails are asymptotic to the horizontal line and they extend to infinity.

The empirical rule for a normal distribution states that:

[Figure: empirical rule on the normal curve. Source: Google Image]

• About 68% of the data will fall within 1 standard deviation of the mean
• About 95% of the data will fall within 2 standard deviations of the mean
• Almost all (about 99.7%) of the data will fall within 3 standard deviations of the mean

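The percentages in the empirical rule can be reproduced from the standard normal curve. The sketch below uses `statistics.NormalDist` from Python's standard library (available from Python 3.8); the function name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The rounded figures 68%, 95% and 99.7% quoted in the rule are approximations of these areas.
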
REFERENCES

Aufmann, R., J. Lockwood, R. Nation, and D. Clegg. 2018. Mathematics in the Modern World,
14th Edition. Cengage Learning.

Daligdig, Romeo. 2019. Mathematics in the Modern World. Lorimar Publishing Inc.

Rodriguez, MJ, I. Salvador, F. Ragma, E. Torres, E. Manalang, N. Oredina, and J. Ogoy. 2018.
Mathematics in the Modern World. Nieme Publishing House Co. Ltd. Philippine
Copyright.
