Sta301 Lec08
Lecture No. 8
h = class interval = 3
In this example, n = 30 and n/2 = 30/2 = 15.
Thus the third class is the median class. The
median lies somewhere between 35.95 and 38.95.
Applying the above formula, we obtain
$$\tilde{X} = 35.95 + \frac{3}{14}(15 - 6) = 35.95 + 1.93 = 37.88 \approx 37.9$$
Interpretation
Frequency Distribution of
Child-Care Managers Age
Class Interval Frequency
20 – 29 6
30 – 39 18
40 – 49 11
50 – 59 11
60 – 69 3
70 – 79 1
Total 50
Now, the median is given by
$$\tilde{X} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$
where
l = lower class boundary of the median class
h = class interval size of the median class
f = frequency of the median class
n = Σf (the total number of observations)
c = cumulative frequency of the class preceding the
median class
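The formula can also be written as a small function. The following is only a minimal sketch: the function name `grouped_median` is an illustrative choice, and the numbers passed to it are the ones from the mileage-ratings example worked earlier in this lecture (l = 35.95, h = 3, f = 14, n = 30, c = 6).

```python
def grouped_median(l, h, f, n, c):
    """Median of grouped data: l + (h / f) * (n / 2 - c)."""
    return l + (h / f) * (n / 2 - c)


# Values from the mileage-ratings example:
# l = 35.95, h = 3, f = 14, n = 30, c = 6.
median_mileage = grouped_median(l=35.95, h=3, f=14, n=30, c=6)
print(round(median_mileage, 2))
```

The function simply evaluates the formula; choosing the median class, and hence l, h, f and c, remains a separate step.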
First of all, we construct the column of class boundary
as well as the column of cumulative frequencies.
Class limits   Class Boundaries   Frequency f   Cumulative Frequency c.f.
20 – 29        19.5 – 29.5              6                  6
30 – 39        29.5 – 39.5             18                 24
40 – 49        39.5 – 49.5             11                 35
50 – 59        49.5 – 59.5             11                 46
60 – 69        59.5 – 69.5              3                 49
70 – 79        69.5 – 79.5              1                 50
Total                                  50
Now, first of all we have to determine the median class
(i.e. that class for which the cumulative frequency is
just in excess of n/2).
In this example,
n = 50
implying that
n/2 = 50/2 = 25
Class limits   Class Boundaries   Frequency f   Cumulative Frequency c.f.
20 – 29        19.5 – 29.5              6                  6
30 – 39        29.5 – 39.5             18                 24
40 – 49        39.5 – 49.5             11                 35    (median class)
50 – 59        49.5 – 59.5             11                 46
60 – 69        59.5 – 69.5              3                 49
70 – 79        69.5 – 79.5              1                 50
Total                                  50
Hence,
l = 39.5
h = 10
f = 11
and
c = 24
Substituting these values in the formula, we obtain:
$$\tilde{X} = 39.5 + \frac{10}{11}(25 - 24) = 39.5 + 0.9 = 40.4$$
Interpretation
Thus, we conclude that the median age is 40.4
years.
In other words, 50% of the managers are
younger than this age, and 50% are older.
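The whole procedure for this example (cumulate the frequencies, pick the median class, read off l, h, f and c, then substitute) can be sketched in a few lines of Python. This is an illustrative sketch rather than a prescribed method; the variable names are arbitrary. It takes the median class to be the first class whose cumulative frequency reaches n/2, which gives the same median as "just in excess of n/2" even when a cumulative frequency equals n/2 exactly.

```python
from itertools import accumulate

# Class boundaries and frequencies of the child-care managers' ages.
boundaries = [(19.5, 29.5), (29.5, 39.5), (39.5, 49.5),
              (49.5, 59.5), (59.5, 69.5), (69.5, 79.5)]
frequencies = [6, 18, 11, 11, 3, 1]

cumulative = list(accumulate(frequencies))   # running totals of f
n = cumulative[-1]                           # total number of observations

# Index of the first class whose cumulative frequency reaches n/2.
median_index = next(i for i, cf in enumerate(cumulative) if cf >= n / 2)

l, upper = boundaries[median_index]          # lower and upper class boundary
h = upper - l                                # class interval size
f = frequencies[median_index]                # frequency of the median class
c = cumulative[median_index - 1] if median_index > 0 else 0

median_age = l + (h / f) * (n / 2 - c)
print(l, h, f, c, round(median_age, 1))
```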
Example
WAGES OF WORKERS IN A FACTORY
Monthly Income (in Rupees)   No. of Workers
Less than 2000/-                  100
2000/- to 2999/-                  300
3000/- to 3999/-                  500
4000/- to 4999/-                  250
5000/- and above                   50
Total                            1200
In this example, both the first class and the last class are
open-ended classes, because we do not have exact figures to begin
the first class or to end the last class. The advantage of computing
the median in the case of an open-ended frequency distribution is
that, except in the unlikely event of the median falling within an
open-ended class, there is no need to estimate the missing upper or
lower boundary.
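For the wage data the median can be computed without touching the open-ended classes, as the following sketch suggests. Two things here are assumptions that do not come from the table itself: "Less than 2000/-" is read as an upper class limit of 1999, and the class boundaries are taken halfway between adjacent class limits (for example 2999.5 and 3999.5), in the same way as for the age data.

```python
from itertools import accumulate

# (lower limit, upper limit) in rupees; None marks an open end.
classes = [(None, 1999), (2000, 2999), (3000, 3999), (4000, 4999), (5000, None)]
workers = [100, 300, 500, 250, 50]

cumulative = list(accumulate(workers))       # running totals
n = cumulative[-1]                           # total number of workers

# First class whose cumulative frequency reaches n/2.
k = next(i for i, cf in enumerate(cumulative) if cf >= n / 2)

lower_limit, upper_limit = classes[k]
if lower_limit is None or upper_limit is None:
    raise ValueError("median falls in an open-ended class")

# Assumed class boundaries: halfway between adjacent class limits.
l = lower_limit - 0.5
h = (upper_limit + 0.5) - l
f = workers[k]
c = cumulative[k - 1] if k > 0 else 0

median_wage = l + (h / f) * (n / 2 - c)
print(classes[k], median_wage)
```

Under these assumptions the median class is 3000/- to 3999/-, so neither open-ended class enters the calculation.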
EMPIRICAL RELATION BETWEEN THE MEAN,
MEDIAN AND THE MODE
In a perfectly symmetrical unimodal distribution, the mean,
median and mode coincide:
Mean = Median = Mode
But in the case of a skewed distribution, the mean,
median and mode do not all lie on the same point.
They are pulled apart from each other, and the
empirical relation explains the way in which this
happens. Experience tells us that in a unimodal
curve of moderate skewness, the median is usually
sandwiched between the mean and the mode.
The second point is that, in the case of many
real-life data sets, it has been observed that the
distance between the mode and the median is
approximately double the distance between the
median and the mean, as shown below:
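[Figure: positively skewed frequency curve (f against X) with the mode, median and mean marked in that order along the X-axis]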
This diagrammatic picture is equivalent to the
following algebraic expression:
$$\text{Median} - \text{Mode} \approx 2\,(\text{Mean} - \text{Median}) \qquad (1)$$
The above-mentioned point can also be expressed in
the following way:
$$\text{Mean} - \text{Mode} \approx 3\,(\text{Mean} - \text{Median}) \qquad (2)$$
or, equivalently,
$$\text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}$$
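The steps connecting these three expressions are not spelled out; one way to fill them in is shown below, using only ordinary algebra on relation (1).

```latex
\begin{align*}
\text{Median} - \text{Mode} &\approx 2(\text{Mean} - \text{Median})
  && \text{relation (1)} \\
(\text{Mean} - \text{Median}) + (\text{Median} - \text{Mode})
  &\approx (\text{Mean} - \text{Median}) + 2(\text{Mean} - \text{Median})
  && \text{add } \text{Mean} - \text{Median} \text{ to both sides} \\
\text{Mean} - \text{Mode} &\approx 3(\text{Mean} - \text{Median})
  && \text{relation (2)} \\
\text{Mode} &\approx \text{Mean} - 3\,\text{Mean} + 3\,\text{Median}
  = 3\,\text{Median} - 2\,\text{Mean}
\end{align*}
```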
An exactly similar situation holds in case of a
moderately negatively skewed distribution.
An important point to note is that this empirical
relation does not hold in case of a
J-shaped or an extremely skewed distribution.
Let us try to verify this relation for the
data of EPA Mileage Ratings that we
have been considering for the past
few lectures.
Frequency Distribution for EPA Mileage Ratings
[Figure: frequency (vertical axis) plotted against mileage rating X (horizontal axis)]
Mean: $\bar{X} = 37.85$
Median: $\tilde{X} = 37.88$
Mode: $\hat{X} = 37.825$
Interesting Observation
The close proximity of the three measures of
central tendency provides a strong indication
of the fact that this particular distribution is
indeed very slightly skewed.
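A quick numerical check of the empirical relation on these three values can be sketched as follows. The variable names are illustrative; the only inputs are the mean, median and mode quoted above.

```python
mean, median, mode = 37.85, 37.88, 37.825   # values quoted above

# Mode suggested by the empirical relation: Mode ≈ 3 Median − 2 Mean.
empirical_mode = 3 * median - 2 * mean

print(f"mode from the empirical relation: {empirical_mode:.3f}")
print(f"mode computed from the data:      {mode:.3f}")
print(f"difference:                       {abs(empirical_mode - mode):.3f}")
```

The relation gives 37.94 against a computed mode of 37.825, a gap of about 0.1, so the approximation holds reasonably well for this nearly symmetrical distribution.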
[Figure: frequency curve with the median dividing the total area into two parts of 50% each]
A further split to produce quarters, tenths or
hundredths of the total area under the frequency polygon
is equally possible, and may be extremely useful for
analysis. (We are often interested in the highest 10% of
some group of values or the middle 50% of another.)
QUARTILES
The quartiles, together with the median, achieve the
division of the total area into four equal parts.
The first, second and third quartiles are given by
the formulae:
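First quartile:
$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right)$$
Second quartile (i.e. median):
$$Q_2 = l + \frac{h}{f}\left(\frac{2n}{4} - c\right) = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$
Third quartile:
$$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right)$$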
It is clear from the formula of the second
quartile that the second quartile is the same as the
median.
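Similarly, the first percentile is given by
$$P_1 = l + \frac{h}{f}\left(\frac{n}{100} - c\right)$$
The formulae for the subsequent percentiles are
$$P_2 = l + \frac{h}{f}\left(\frac{2n}{100} - c\right)$$
$$P_3 = l + \frac{h}{f}\left(\frac{3n}{100} - c\right)$$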
and so on.
Again, it is easily seen that the 50th percentile is the same as
the median, the 25th percentile is the same as the 1st quartile, the 75th
percentile is the same as the 3rd quartile, the 40th percentile is the
same as the 4th decile, and so on.
All these measures, i.e. the median, quartiles, deciles and percentiles,
are collectively called quantiles or fractiles.
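Because every one of these formulae has the same shape, l + (h/f)(pn − c) with p = 1/4, 6/10, 17/100 and so on, a single function can cover all quantiles. The sketch below is illustrative only: the name `grouped_quantile` and its argument layout are assumptions, and the data are the child-care managers' ages used in this lecture.

```python
from itertools import accumulate


def grouped_quantile(boundaries, frequencies, p):
    """Quantile of grouped data for a fraction p, with 0 < p <= 1.

    Evaluates l + (h / f) * (p * n - c), where l, h and f belong to the
    first class whose cumulative frequency reaches p * n.
    """
    cumulative = list(accumulate(frequencies))
    n = cumulative[-1]
    target = p * n
    k = next(i for i, cf in enumerate(cumulative) if cf >= target)
    l, upper = boundaries[k]
    c = cumulative[k - 1] if k > 0 else 0
    return l + (upper - l) / frequencies[k] * (target - c)


# Child-care managers' ages: class boundaries and frequencies.
boundaries = [(19.5, 29.5), (29.5, 39.5), (39.5, 49.5),
              (49.5, 59.5), (59.5, 69.5), (69.5, 79.5)]
frequencies = [6, 18, 11, 11, 3, 1]

median = grouped_quantile(boundaries, frequencies, 1 / 2)
q2 = grouped_quantile(boundaries, frequencies, 2 / 4)
p50 = grouped_quantile(boundaries, frequencies, 50 / 100)
print(median, q2, p50)   # three names for the same quantity
```

Passing p = 1/4, 6/10 or 17/100 gives the first quartile, the 6th decile and the 17th percentile in the same way.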
The question is, “What is the significance of this concept of
partitioning? Why is it that we wish to divide our frequency
distribution into two, four, ten or hundred parts?”
The answer to the above questions is:
In certain situations, we may be interested in describing the relative
quantitative location of a particular measurement within a data set.
Quantiles provide us with an easy way of achieving this. Out
of these various quantiles, one of the most frequently used is
percentile ranking.
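Percentile ranking runs the percentile formula in reverse: given a measurement, estimate the percentage of observations lying below it. The sketch below is one possible way to do this for grouped data. It assumes that observations are spread evenly within each class (the same assumption that underlies the quantile formulae), and the function name `percentile_rank` is illustrative. The data are the child-care managers' ages used in this lecture.

```python
from itertools import accumulate


def percentile_rank(boundaries, frequencies, x):
    """Estimated percentage of observations below x (grouped data)."""
    cumulative = [0] + list(accumulate(frequencies))
    n = cumulative[-1]
    if x <= boundaries[0][0]:
        return 0.0
    if x >= boundaries[-1][1]:
        return 100.0
    for k, (lower, upper) in enumerate(boundaries):
        if lower <= x < upper:
            # Assume observations are spread evenly within the class.
            below = cumulative[k] + frequencies[k] * (x - lower) / (upper - lower)
            return 100.0 * below / n
    raise ValueError("class boundaries must be contiguous")


boundaries = [(19.5, 29.5), (29.5, 39.5), (39.5, 49.5),
              (49.5, 59.5), (59.5, 69.5), (69.5, 79.5)]
frequencies = [6, 18, 11, 11, 3, 1]

rank_of_45 = percentile_rank(boundaries, frequencies, 45)
print(round(rank_of_45, 1))
```

For a manager aged 45 this gives roughly 60, i.e. about 60% of the managers are estimated to be younger.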
FREQUENCY DISTRIBUTION OF
CHILD-CARE MANAGERS AGE
Class Interval Frequency
20 – 29 6
30 – 39 18
40 – 49 11
50 – 59 11
60 – 69 3
70 – 79 1
Total 50
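Suppose we wish to determine the first quartile, the 6th decile
and the 17th percentile of this distribution.
The first quartile is given by
$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right)$$
where l, h and f pertain to the class that contains the
first quartile.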
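In this example, n/4 = 50/4 = 12.5, so the first quartile lies in
the class 29.5 – 39.5, for which l = 29.5, h = 10, f = 18 and c = 6.
Hence
$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right) = 29.5 + \frac{10}{18}(12.5 - 6) = 29.5 + 3.6 = 33.1$$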
Interpretation
One-fourth of the managers are younger than
33.1 years, and three-fourths are older than this
age.
The 6th Decile is given by
$$D_6 = l + \frac{h}{f}\left(\frac{6n}{10} - c\right)$$
In this example,
6n/10 = 6(50)/10 = 30
Class Boundaries   Frequency f   Cumulative Frequency c.f.
19.5 – 29.5              6                  6
29.5 – 39.5             18                 24
39.5 – 49.5             11                 35    (class containing D6)
49.5 – 59.5             11                 46
59.5 – 69.5              3                 49
69.5 – 79.5              1                 50
Total                   50
Hence,
l = 39.5
h = 10
f = 11
and
c = 24
$$D_6 = l + \frac{h}{f}\left(\frac{6n}{10} - c\right) = 39.5 + \frac{10}{11}(30 - 24) = 39.5 + 5.45 = 44.95$$
Interpretation
Six-tenths, i.e. 60%, of the managers are younger than
44.95 years, and four-tenths are older than this
age.
The 17th Percentile is given by
$$P_{17} = l + \frac{h}{f}\left(\frac{17n}{100} - c\right)$$
In this example,
17n/100 = 17(50)/100 = 8.5
Class Boundaries   Frequency f   Cumulative Frequency c.f.
19.5 – 29.5              6                  6
29.5 – 39.5             18                 24    (class containing P17)
39.5 – 49.5             11                 35
49.5 – 59.5             11                 46
59.5 – 69.5              3                 49
69.5 – 79.5              1                 50
Total                   50
Hence,
l = 29.5
h = 10
f = 18
and
c = 6
Hence, the 17th percentile is given by
$$P_{17} = l + \frac{h}{f}\left(\frac{17n}{100} - c\right) = 29.5 + \frac{10}{18}(8.5 - 6) = 29.5 + 1.4 = 30.9$$
Interpretation
17% of the managers are younger than age 30.9 years,
and 83% are older than this age.
EXAMPLE:
If oil company ‘A’ reports that its yearly sales are at
the 90th percentile of all companies in the industry, the
implication is that 90% of all oil companies have yearly
sales less than company A’s, and only 10% have yearly
sales exceeding company A’s:
[Figure: relative frequency curve of yearly sales, with Company A's sales at the 90th percentile; area 0.90 to the left and 0.10 to the right]
EXAMPLE
[Figure: graphic location of the median, $\tilde{X} \approx 37.9$]
In a similar way, we can
locate the quartiles, deciles and
percentiles.
To obtain the first quartile,
the horizontal line will be
drawn against the value n/4,
and for the third quartile, the
horizontal line will be drawn
against the value 3n/4.
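Cumulative Frequency Polygon or OGIVE
[Figure: ogive with horizontal lines drawn at n/4 and 3n/4 on the cumulative frequency axis, meeting the curve above Q1 and Q3 on the X-axis]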
For the deciles, the horizontal lines will be drawn
against the values n/10, 2n/10, 3n/10, and so on.
And for the percentiles, the horizontal lines
will be drawn against the values n/100, 2n/100, 3n/100, and
so on. The graphic location of the quartiles as
well as of a few deciles and percentiles for the
data set of the EPA mileage ratings may be taken up as an
exercise.
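Reading a quantile off the ogive amounts to linear interpolation along the cumulative frequency polygon, so the graphic method can be imitated numerically. The sketch below is an illustration only: it uses the child-care managers' data (the full mileage-ratings table is not reproduced in this part of the lecture), the helper name `x_at_height` is an assumption, and `accumulate(..., initial=0)` requires Python 3.8 or later.

```python
from itertools import accumulate


def x_at_height(points, height):
    """X-value at which the ogive (a piecewise-linear curve) reaches `height`."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < height <= y1:
            return x0 + (x1 - x0) * (height - y0) / (y1 - y0)
    raise ValueError("height must lie between 0 (exclusive) and n (inclusive)")


# Ogive points: (class boundary, cumulative frequency up to that boundary).
class_boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
frequencies = [6, 18, 11, 11, 3, 1]
points = list(zip(class_boundaries, accumulate(frequencies, initial=0)))

n = points[-1][1]
q1 = x_at_height(points, n / 4)       # horizontal line at n/4
q3 = x_at_height(points, 3 * n / 4)   # horizontal line at 3n/4
print(round(q1, 1), round(q3, 1))
```

The value obtained for Q1 agrees with the 33.1 found from the formula, which is expected because the formula is the algebraic form of the same interpolation.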
IN TODAY’S LECTURE, YOU LEARNT:
• Median in case of the frequency distribution of a continuous variable
• Median in case of an open-ended frequency distribution
• Empirical relation between the mean, median and the mode
• Quantiles
• Graphic location of quantiles
IN THE NEXT LECTURE, YOU WILL LEARN:
• Geometric mean
• Harmonic mean
• Relation between the arithmetic, geometric and harmonic means
• Some other measures of central tendency
• Concept of dispersion