Data Analysis: Kulwant Singh Kapoor
With $\sum f = N$, the total frequency, the median is the size of the $\frac{N+1}{2}$th item or observation. In this case the use of a cumulative frequency (c.f.) distribution facilitates the calculations.
EXAMPLE 4
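MARKS OF 10 STUDENTS ARE: 4 7 6 8 9 4 3 2 7 8
IN ORDER: 2 3 4 4 6 7 7 8 8 9
MEDIAN = (6 + 7)/2 = 6.5 (n is even, so the median is the mean of the 5th and 6th items)
MARKS OF 11 STUDENTS ARE: 4 7 6 8 9 4 3 2 7 8 4
IN ORDER: 2 3 4 4 4 6 7 7 8 8 9
MEDIAN = 6 (n is odd, so the median is the (11 + 1)/2 = 6th item)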
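The ungrouped rule used in Example 4 (sort, then take the middle item, or the mean of the two middle items when n is even) can be sketched in a few lines of Python. The function name `median_ungrouped` is hypothetical; it is only an illustration of the rule, not part of any library.

```python
def median_ungrouped(values):
    """Median of raw (ungrouped) observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th item, which is index n // 2 counting from 0.
        return ordered[mid]
    # Even n: mean of the (n / 2)-th and (n / 2 + 1)-th items.
    return (ordered[mid - 1] + ordered[mid]) / 2


marks_10 = [4, 7, 6, 8, 9, 4, 3, 2, 7, 8]
marks_11 = marks_10 + [4]

print(median_ungrouped(marks_10))  # 6.5
print(median_ungrouped(marks_11))  # 6
```

Python's standard `statistics.median` applies the same odd/even rule.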
8 COINS ARE TOSSED AND THE NUMBER OF HEADS IS NOTED.
THE EXPERIMENT IS REPEATED 256 TIMES.
No. OF HEADS (x)   FREQUENCY (f)   CF    xf
0                  1               1     0
1                  9               10    9
2                  26              36    52
3                  59              95    177
4                  72              167   288
5                  52              219   260
6                  29              248   174
7                  7               255   49
8                  1               256   8
TOTAL              N = 256               Σxf = 1017
N/2 = 128; the first CF greater than 128 is 167, at x = 4.
MEDIAN = 4    MEAN = 1017/256 = 3.972656
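The cumulative-frequency lookup in the coin-tossing table can be sketched as below. The helper `discrete_median` is a hypothetical name; it mirrors the table by returning the first x whose CF exceeds N/2, and the mean is computed as Σxf/N.

```python
from itertools import accumulate

# Coin-toss table from the example: x = number of heads, f = frequency.
heads = [0, 1, 2, 3, 4, 5, 6, 7, 8]
freq = [1, 9, 26, 59, 72, 52, 29, 7, 1]


def discrete_median(xs, fs):
    """Return the first x whose cumulative frequency exceeds N/2,
    mirroring the N/2 and CF columns worked out in the table."""
    half = sum(fs) / 2
    for x, cf in zip(xs, accumulate(fs)):
        if cf > half:
            return x
    raise ValueError("empty distribution")


cum_freq = list(accumulate(freq))      # [1, 10, 36, 95, 167, 219, 248, 255, 256]
n_total = cum_freq[-1]                 # N = 256
mean = sum(x * f for x, f in zip(heads, freq)) / n_total   # 1017 / 256

print(discrete_median(heads, freq))    # 4
print(round(mean, 6))                  # 3.972656
```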
Case III: Continuous distribution:
i. Compute the cumulative frequencies (c.f.).
ii. Find N/2.
iii. Locate the c.f. just greater than N/2.
iv. The corresponding class contains the median value and is called the median class.
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - C\right)$$
where l is the lower limit of the median class,
f is the frequency of the median class,
h is the magnitude (width) of the median class,
N is the total frequency, and
C is the c.f. of the class preceding the median class.
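The steps and formula above can be sketched in Python. No continuous example is worked for the median here, so the sketch borrows, purely for illustration, the frequency table of Example 7 (classes 10-20 to 100-110, h = 10). The function name `grouped_median` is hypothetical and the sketch assumes classes of equal width.

```python
from itertools import accumulate


def grouped_median(lowers, h, freqs):
    """Median = l + (h / f) * (N/2 - C) for classes of equal width h.

    lowers: lower limits of the classes, in order
    freqs:  class frequencies
    """
    half = sum(freqs) / 2
    cum = list(accumulate(freqs))
    for k, cf in enumerate(cum):
        if cf > half:                       # first c.f. just greater than N/2 -> median class
            l, f = lowers[k], freqs[k]
            c = cum[k - 1] if k > 0 else 0  # c.f. of the preceding class
            return l + (h / f) * (half - c)
    raise ValueError("empty distribution")


# Frequency table of Example 7 (classes 10-20, 20-30, ..., 100-110).
lowers = list(range(10, 110, 10))
freqs = [4, 6, 5, 10, 20, 22, 21, 6, 2, 1]
print(grouped_median(lowers, 10, freqs))   # median class 60-70: 60 + (10/22)(48.5 - 45)
```

With N = 97, N/2 = 48.5 falls in the class 60-70 (c.f. 67), whose preceding c.f. is 45.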
Merits:
i. It is rigidly defined
ii. It is easy to understand and calculate, even for a non-mathematical person.
iii. It is not affected by extreme observations and as such is very
useful in the case of skewed distributions
iv. It can be computed when dealing with a distribution with open-end classes.
v. It can sometimes be located by simple inspection and can
also be computed graphically
vi. It is the only average to be used while dealing with qualitative characteristics which cannot be measured quantitatively but can still be arranged in ascending or descending order of magnitude.
Merits And Demerits
Demerits:
i. In the case of an even number of observations of ungrouped data, it cannot be determined exactly; it is taken as the mean of the two middle values.
ii. It is not based on each and every item of the
distribution.
iii. It is not suitable for further mathematical
treatment.
iv. It is relatively less stable than the mean, particularly for small samples.
Quartile
The values which divide the given data into four equal parts are known as quartiles. Therefore, there will be only three such points, $Q_1$, $Q_2$ and $Q_3$, such that $Q_1 \le Q_2 \le Q_3$, termed as the three quartiles.
$Q_1$, known as the lower or first quartile, is the value which has 25% of the items of the distribution below it and consequently 75% of the items above it. $Q_2$, the second quartile, coincides with the median and has an equal number of observations above and below it. $Q_3$, the upper or third quartile, has 75% of the observations below it and consequently 25% of the observations above it.
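$$Q_1 = l + \frac{h}{f}\left(\frac{N}{4} - C\right)$$
$$Q_3 = l + \frac{h}{f}\left(\frac{3N}{4} - C\right)$$
where l, f, h and C now refer to the quartile class, i.e. the class whose c.f. is just greater than N/4 (for $Q_1$) or 3N/4 (for $Q_3$).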
Percentile
Percentiles are the values which divide the series into 100 equal parts. So, there are 99 percentiles $P_1, P_2, \ldots, P_{99}$ such that $P_1 \le P_2 \le \cdots \le P_{99}$. The $i$th percentile value is:
$$P_i = l + \frac{h}{f}\left(\frac{iN}{100} - C\right)$$
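Since the quartile formulas are the percentile formula with i = 25 and i = 75 (and the median is i = 50), one general sketch covers all three. The function name `grouped_percentile` is hypothetical, equal class widths are assumed, and the data are borrowed from the frequency table of Example 7 purely for illustration.

```python
from itertools import accumulate


def grouped_percentile(i, lowers, h, freqs):
    """P_i = l + (h / f) * (i*N/100 - C), for equal class width h.

    The class whose c.f. is just greater than i*N/100 plays the role of
    the median class in the median formula.
    """
    if not 0 < i < 100:
        raise ValueError("i must lie strictly between 0 and 100")
    target = i * sum(freqs) / 100
    cum = list(accumulate(freqs))
    for k, cf in enumerate(cum):
        if cf > target:
            c = cum[k - 1] if k > 0 else 0   # c.f. of the preceding class
            return lowers[k] + (h / freqs[k]) * (target - c)
    raise ValueError("empty distribution")


# Illustration on the frequency table of Example 7 (classes 10-20, ..., 100-110).
lowers = list(range(10, 110, 10))
freqs = [4, 6, 5, 10, 20, 22, 21, 6, 2, 1]

q1 = grouped_percentile(25, lowers, 10, freqs)      # P25 = Q1
median = grouped_percentile(50, lowers, 10, freqs)  # P50 = median
q3 = grouped_percentile(75, lowers, 10, freqs)      # P75 = Q3
print(q1, median, q3)
```

Here N = 97, so $Q_1$ uses 24.25 (class 40-50), the median uses 48.5 (class 60-70) and $Q_3$ uses 72.75 (class 70-80).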
MODE
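Mode is the value which has the greatest frequency density.
Mode for a continuous distribution is given by
$$\text{Mode} = l + \frac{h\,(f_1 - f_0)}{(f_1 - f_0) - (f_2 - f_1)}$$
where l is the lower limit of the modal class, h is its magnitude, $f_1$ is its frequency, and $f_0$ and $f_2$ are the frequencies of the classes preceding and succeeding it.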
EXAMPLE 7
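CLASS     f    x (MID-VALUE)   xf
10-20     4    15              60
20-30     6    25              150
30-40     5    35              175
40-50     10   45              450
50-60     20   55              1100
60-70     22   65              1430
70-80     21   75              1575
80-90     6    85              510
90-100    2    95              190
100-110   1    105             105
TOTAL     97                   5745
mean = 5745/97 = 59.2268
Modal class 60-70: l = 60, h = 10, f1 = 22, f0 = 20, f2 = 21
mode = 60 + 10(22 - 20)/((22 - 20) - (21 - 22)) = 60 + 20/3 = 66.6667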
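The mean and mode of Example 7 can be checked with a short Python sketch. The names `grouped_mean` and `grouped_mode` are hypothetical. The sketch assumes equal class widths (so the modal class is simply the class with the largest frequency) and, as an assumption not stated in the notes, treats a missing neighbour of an end class as having frequency 0.

```python
def grouped_mean(lowers, h, freqs):
    """Mean = sum(x * f) / N, with x the mid-value of each class."""
    mids = [l + h / 2 for l in lowers]
    return sum(x * f for x, f in zip(mids, freqs)) / sum(freqs)


def grouped_mode(lowers, h, freqs):
    """Mode = l + h(f1 - f0) / ((f1 - f0) - (f2 - f1)).

    Assumes equal class widths, so the modal class is the class with the
    largest frequency; a missing neighbour of an end class is taken as 0.
    """
    k = max(range(len(freqs)), key=lambda j: freqs[j])
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    return lowers[k] + h * (f1 - f0) / ((f1 - f0) - (f2 - f1))


lowers = list(range(10, 110, 10))          # 10-20, 20-30, ..., 100-110
freqs = [4, 6, 5, 10, 20, 22, 21, 6, 2, 1]
print(round(grouped_mean(lowers, 10, freqs), 4))   # 59.2268
print(round(grouped_mode(lowers, 10, freqs), 4))   # 66.6667
```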
Measures of Dispersion
Range
Quartile deviation
Mean Deviation
Variance
Standard deviation
RANGE
$$\text{Range} = X_{\max} - X_{\min}$$
Range is the difference between the two extreme observations of the distribution
OR
It is the difference between the greatest (maximum) and the
smallest (minimum) observation of the distribution.
It is the simplest but a crude measure of dispersion. It is rigidly defined, readily comprehensible and the easiest to compute, requiring very little calculation.
EXAMPLE
MARKS OF STUDENTS
ROLL NO.   MARKS   MARKS (SORTED)
123        98      52
125        95      56
126        96      56
127        87      66
128        56      78
134        52      87
135        89      89
136        78      95
137        56      96
138        66      98
RANGE = 98 - 52 = 46
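A minimal Python sketch of the range computation for the marks above. It also illustrates one of the demerits listed for range: the list `altered` is made-up data, invented only for this illustration, that keeps the same maximum and minimum while changing every other value.

```python
marks = [98, 95, 96, 87, 56, 52, 89, 78, 56, 66]   # marks from the table, in roll-number order

data_range = max(marks) - min(marks)
print(data_range)   # 46

# Hypothetical data: same extremes (98 and 52), all other values replaced.
altered = [98, 60, 61, 62, 63, 52, 64, 65, 66, 67]
print(max(altered) - min(altered))   # still 46
```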
Merits and Demerits of Range
It is not based on the entire set of data.
Its value varies very widely from sample to
sample.
If $X_{\max}$ and $X_{\min}$ remain unaltered and all the other values are replaced by a different set of observations, the range of the distribution remains the same.
It cannot be used when dealing with open-end classes.
It is not suitable for mathematical treatment.
It is very sensitive to the size of the sample.
It is too indefinite to be used as a practical
measure of dispersion.
QUARTILE DEVIATION
$$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2}$$
It is a measure of dispersion based on the upper quartile $Q_3$ and the lower quartile $Q_1$.
Inter-quartile range $= Q_3 - Q_1$
Quartile deviation is obtained by dividing the inter-quartile range by 2.
Merits and Demerits of Quartile Deviation
Merits:
It is quite easy to understand & calculate.
It makes use of 50% of the data and as such is a better measure than the range.
As it ignores 25% of the data from the beginning and 25% from the top end, it is not affected at all by extreme observations.
It can be computed from a frequency distribution with open-end classes.
Demerits:
It is not based on all observations.
It is affected considerably by
fluctuations of sampling.
It is not suitable for further
mathematical treatment.
EXAMPLE
DISTRIBUTION OF MONTHLY EARNINGS
MONTH   EARNING
1       10239
2       10250
3       10251
4       10251
5       10257
6       10258
7       10260
8       10261
9       10262
10      10262
11      10273
12      10275
Q1 = 10251
Q3 = 10262
QUARTILE DEVIATION = (10262 - 10251)/2 = 5.5
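The notes do not state which positional convention they use for quartiles of ungrouped data. Assuming the common (n+1)/4 and 3(n+1)/4 item positions, Python's standard `statistics.quantiles` with `method="exclusive"` reproduces the figures above; this is a sketch under that assumption, not necessarily the author's procedure.

```python
import statistics

earnings = [10239, 10250, 10251, 10251, 10257, 10258,
            10260, 10261, 10262, 10262, 10273, 10275]

# method="exclusive" places the quartiles at the (n+1)/4-th and 3(n+1)/4-th items,
# interpolating between neighbouring items when the position is fractional.
q1, q2, q3 = statistics.quantiles(earnings, n=4, method="exclusive")
quartile_deviation = (q3 - q1) / 2
print(q1, q3, quartile_deviation)   # 10251.0 10262.0 5.5
```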
MEAN DEVIATION
For ungrouped data:
$$\text{Mean Deviation} = \frac{1}{n}\sum_{i=1}^{n} \left|X_i - \bar{X}\right|$$
For a frequency distribution, with $N = \sum f_i$:
$$\text{Mean Deviation} = \frac{1}{N}\sum_{i} f_i \left|X_i - \bar{X}\right|$$
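Both mean-deviation formulas can be sketched in Python. No mean-deviation example is worked in the notes, so the coin-tossing table is borrowed purely for illustration; the function names are hypothetical. Expanding the frequency table into its 256 raw observations and applying the ungrouped formula should give the same value as the frequency formula.

```python
heads = [0, 1, 2, 3, 4, 5, 6, 7, 8]
freq = [1, 9, 26, 59, 72, 52, 29, 7, 1]


def mean_deviation_grouped(xs, fs):
    """(1/N) * sum f_i |x_i - mean|, deviations taken from the arithmetic mean."""
    n_total = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / n_total
    return sum(f * abs(x - mean) for x, f in zip(xs, fs)) / n_total


def mean_deviation_ungrouped(values):
    """(1/n) * sum |X_i - mean| for raw observations."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


md = mean_deviation_grouped(heads, freq)
# Expanding the table into its 256 raw observations must give the same answer.
raw = [x for x, f in zip(heads, freq) for _ in range(f)]
print(md, mean_deviation_ungrouped(raw))
```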
$$\text{Standard Deviation } \sigma = \sqrt{\frac{1}{N}\sum_{i} f_i \left(X_i - \bar{X}\right)^2}$$
$$\text{Variance } \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$$
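A minimal Python sketch of the frequency-weighted variance and standard deviation, again borrowing the coin-tossing table for illustration only (the notes do not compute its dispersion). Variance is the mean squared deviation from the mean and σ is its square root.

```python
import math

heads = [0, 1, 2, 3, 4, 5, 6, 7, 8]
freq = [1, 9, 26, 59, 72, 52, 29, 7, 1]

n_total = sum(freq)                                              # N = 256
mean = sum(x * f for x, f in zip(heads, freq)) / n_total        # 1017/256
variance = sum(f * (x - mean) ** 2 for x, f in zip(heads, freq)) / n_total
std_dev = math.sqrt(variance)
print(round(variance, 4), round(std_dev, 4))
```

The same variance follows from the shortcut $\frac{1}{N}\sum f_i X_i^2 - \bar{X}^2 = \frac{4547}{256} - \left(\frac{1017}{256}\right)^2$.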
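Coefficient of correlation:
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$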
EXAMPLE
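ADVERTISING EXPENSES (x) AND SALES (y)
x    y    dx=x-mx   dy=y-my   dx^2   dy^2   dxdy
39   47   -26       -19       676    361    494
65   53   0         -13       0      169    0
62   58   -3        -8        9      64     24
90   86   25        20        625    400    500
82   62   17        -4        289    16     -68
75   68   10        2         100    4      20
25   60   -40       -6        1600   36     240
98   91   33        25        1089   625    825
36   51   -29       -15       841    225    435
78   84   13        18        169    324    234
Totals: Σx = 650, Σy = 660, Σdx = 0, Σdy = 0, Σdx^2 = 5398, Σdy^2 = 2224, Σdxdy = 2704
mx = 650/10 = 65
my = 660/10 = 66
$r = \dfrac{\sum dx\,dy}{\sqrt{\sum dx^2 \sum dy^2}} = \dfrac{2704}{\sqrt{5398 \times 2224}} \approx 0.78$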
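A short Python sketch that computes r for the advertising/sales data both ways: with the raw-score formula stated before the example, and with the deviation form used in the table. The two are algebraically equal, since $n\sum xy - \sum x \sum y = n\sum dx\,dy$ and likewise for the squared terms.

```python
import math

x = [39, 65, 62, 90, 82, 75, 25, 98, 36, 78]   # advertising expenses
y = [47, 53, 58, 86, 62, 68, 60, 91, 51, 84]   # sales
n = len(x)

# Raw-score formula: r = [n*sum(xy) - sum(x)*sum(y)] /
#                        sqrt([n*sum(x^2) - sum(x)^2] * [n*sum(y^2) - sum(y)^2])
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)
r_raw = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Deviation form used in the table: r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
mx, my = sx / n, sy / n
dx = [a - mx for a in x]
dy = [b - my for b in y]
r_dev = sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
    sum(a * a for a in dx) * sum(b * b for b in dy))

print(round(r_raw, 2), round(r_dev, 2))   # 0.78 0.78
```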