Statistics

• It is the science of assembling, analyzing, characterizing, and interpreting collections of data.
• Data are generally characterized by:
1. Measure of Central Tendency: the data show a tendency to concentrate around certain values.
2. Measure of Dispersion: the data vary about a measure of central tendency.
Measures of Central Tendency

• Mean

• Mode

• Median
Arithmetic Mean
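The arithmetic mean of a set of numbers X1, X2, X3, ..., XN is denoted by x̅ and is defined as
$$\bar{x} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{1}{N}\sum_{i=1}^{N} X_i$$
Weighted Arithmetic Mean:
• Direct Method:
$$\bar{x} = \frac{\sum f\,m}{\sum f}$$
where f is the frequency (weight) attached to each value or class and m is the value or class mid value.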
Example: Calculate the arithmetic mean number of users per working day from the DIRC monthly user statistics of the University Library
Month       No. of Working Days   Total Users
Sep-2011    24                    11618
Oct-2011    21                    8857
Nov-2011    23                    11459
Dec-2011    25                    8841
Jan-2012    24                    5478
Feb-2012    23                    10811
Total       140                   57064
Mean users per working day = 57064 / 140 = 407.6
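The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the figures are copied from the table above.

```python
# (month, working days, total users), copied from the table above.
monthly_stats = [
    ("Sep-2011", 24, 11618),
    ("Oct-2011", 21, 8857),
    ("Nov-2011", 23, 11459),
    ("Dec-2011", 25, 8841),
    ("Jan-2012", 24, 5478),
    ("Feb-2012", 23, 10811),
]

total_days = sum(days for _, days, _ in monthly_stats)
total_users = sum(users for _, _, users in monthly_stats)

# Mean number of users per working day over the six months.
mean_users_per_day = total_users / total_days
print(total_days, total_users, round(mean_users_per_day, 1))
```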
Example: Find the mean from the following data
Marks class   Frequency (f)   Mid value (m)   mf
0-10          2               5               10
10-20         18              15              270
20-30         30              25              750
30-40         17              35              595
40-50         3               45              135
Sum           70                              1760

$$\text{Mean} = \frac{\sum mf}{\sum f} = \frac{1760}{70} = 25.14$$
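The direct method used in this example can be sketched in Python as follows. The function name `grouped_mean` and the `(lower, upper, frequency)` tuple layout are assumptions made for illustration; the data are the marks classes from the table.

```python
def grouped_mean(classes):
    """Mean of grouped data by the direct method: sum(m * f) / sum(f).

    classes is a list of (lower limit, upper limit, frequency) tuples;
    the mid value m of each class is (lower + upper) / 2.
    """
    total_f = sum(f for _, _, f in classes)
    total_mf = sum((lo + hi) / 2 * f for lo, hi, f in classes)
    return total_mf / total_f


marks = [(0, 10, 2), (10, 20, 18), (20, 30, 30), (30, 40, 17), (40, 50, 3)]
print(round(grouped_mean(marks), 2))  # 25.14
```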
Advantages of Mean
• It is easy to understand and simple to calculate.
• It is based on all the values.
• It is rigidly defined.
• It is not based on the position in the series.
Disadvantages of Mean
• It is affected by extreme values.
• It cannot be located graphically.
• It can give deceptive (misleading) conclusions, for example when a few extreme values dominate the data.
Geometric Mean
• It finds application in cases like populations
where we are concerned with a quantity
whose changes tend to be directly
proportional to the quantity itself.
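• Raw data:
$$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} \quad\Rightarrow\quad GM = \operatorname{antilog}\left(\frac{\sum \log x_i}{n}\right)$$
• Frequency distribution:
$$GM = \operatorname{antilog}\left(\frac{\sum f_i \log x_i}{\sum f_i}\right)$$
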
Harmonic Mean
The harmonic mean is useful in limited
situations where time, rate or prices are
involved.

• Raw data:
$$HM = \frac{n}{\sum \frac{1}{x}}$$
• Frequency distribution:
$$HM = \frac{\sum f}{\sum \frac{f}{x}}$$
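The geometric and harmonic mean formulas can be sketched in Python as follows. The function names, the optional `freqs` argument, and the sample numbers are assumptions for illustration and do not come from the slides. Passing no frequencies treats the input as raw data (every frequency equal to 1).

```python
import math


def geometric_mean(values, freqs=None):
    """GM = antilog(sum(f * log x) / sum(f)); all values must be positive."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs positive values")
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / sum(freqs))


def harmonic_mean(values, freqs=None):
    """HM = sum(f) / sum(f / x); values must be non-zero."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


# Hypothetical data: growth factors of 2 and 8 average to a factor of 4.
print(round(geometric_mean([2, 8]), 6))
# Hypothetical data: 40 km/h and 60 km/h over equal distances average to 48 km/h.
print(round(harmonic_mean([40, 60]), 6))
```

The harmonic mean example shows why it suits rates: the plain arithmetic mean of 40 and 60 would overstate the average speed over equal distances.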
Median
Median is a central value of the distribution,
or the value which divides the distribution in
equal parts, each part containing equal number
of items.
Calculation of Median – Discrete series:
i. Raw data: arrange the data in ascending or descending order. The median of this ordered set is the value at the (n+1)/2-th position if n is odd, and the average of the values at the (n/2)-th and ((n/2)+1)-th positions if n is even.
ii. Discrete distribution: the median is the value whose cumulative frequency first reaches or exceeds (N+1)/2.
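Both rules for a discrete series can be sketched in Python. The function names and the small data sets are hypothetical and serve only to illustrate the positions described above.

```python
def median_raw(values):
    """Median of raw data: middle item, or mean of the two middle items."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the (n+1)/2-th item in 1-based counting
    # n even: average of the (n/2)-th and (n/2 + 1)-th items
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_discrete(values, freqs):
    """Value whose cumulative frequency first reaches (N + 1) / 2."""
    target = (sum(freqs) + 1) / 2
    cumulative = 0
    for x, f in sorted(zip(values, freqs)):
        cumulative += f
        if cumulative >= target:
            return x
    raise ValueError("empty distribution")


print(median_raw([3, 1, 2]))                   # odd n
print(median_raw([4, 1, 3, 2]))                # even n
print(median_discrete([1, 2, 3], [2, 5, 3]))   # N = 10, target position 5.5
```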
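Calculation of median – Continuous series
For a continuous frequency distribution, the median is calculated with the following formula:
$$M = L + \frac{\frac{N}{2} - cf}{f} \times i$$
where L is the lower limit of the median class, N is the total frequency, cf is the cumulative frequency of the class preceding the median class, f is the frequency of the median class, and i is the class width.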
Example: Median of grouped data for a distribution of respondents by age
Age Group   Frequency (f)   Cumulative frequency (cf)
0-20        15              15
20-40       32              47
40-60       54              101
60-80       30              131
80-100      19              150
Total       150
N/2 = 75, so the median class is 40-60 (L = 40, cf = 47, f = 54, i = 20).
Median (M) = 40 + ((75 - 47) / 54) × 20
= 40 + (28 / 54) × 20
= 40 + 0.52 × 20
= 40 + 10.37
= 50.37
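The grouped-data calculation above can be sketched in Python. The function name `grouped_median` and the tuple layout are assumptions for illustration; the classes are the age groups from the example.

```python
def grouped_median(classes):
    """Median of a continuous frequency distribution.

    classes is a list of (lower limit, upper limit, frequency) tuples in
    ascending order. The median class is the first class whose cumulative
    frequency reaches N / 2.
    """
    total = sum(f for _, _, f in classes)
    if total == 0:
        raise ValueError("empty distribution")
    half = total / 2
    cf_before = 0  # cumulative frequency of the classes before the current one
    for lo, hi, f in classes:
        if cf_before + f >= half:
            return lo + (half - cf_before) / f * (hi - lo)
        cf_before += f


ages = [(0, 20, 15), (20, 40, 32), (40, 60, 54), (60, 80, 30), (80, 100, 19)]
print(round(grouped_median(ages), 2))  # 50.37
```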
Advantages of Median:
• The median is easy to understand, even for non-specialists.
• The median can be determined even when the data contain extreme items, since it is not affected by them.
• It can be located graphically.
• It is most useful when dealing with qualitative data that can be ranked.


Disadvantages of Median:
• It is not based on all the values.
• It is not capable of further mathematical treatment.
Mode
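➢ Mode is the most frequent value or score in the distribution.
➢ It is denoted by the capital letter Z.
➢ It corresponds to the highest point of the frequency curve.
For grouped data, the value of the mode can be obtained from the following formula:
$$Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$$
where L1 is the lower limit of the modal class, f1 is the frequency of the modal class, f0 and f2 are the frequencies of the classes before and after it, and i is the class width.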
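Example: Calculate the mode for the distribution of monthly rent paid by libraries in Karnataka
Monthly rent (Rs)   Number of Libraries (f)
500-1000            5
1000-1500           10
1500-2000           8
2000-2500           16
2500-3000           14
3000 & Above        12
Total               65
The modal class is 2000-2500 (f1 = 16, f0 = 8, f2 = 14, i = 500).
Z = 2000 + ((16 - 8) / (2 × 16 - 8 - 14)) × 500
Z = 2000 + (8 / 10) × 500
Z = 2000 + 0.8 × 500 = 2000 + 400
Z = 2400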
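The mode calculation for grouped data can be sketched in Python. The function name `grouped_mode` and the tuple layout are assumptions for illustration. The open-ended class "3000 & Above" is given an infinite upper limit here, which is harmless because it is not the modal class.

```python
def grouped_mode(classes):
    """Mode of grouped data: L1 + (f1 - f0) / (2*f1 - f0 - f2) * i.

    classes is a list of (lower limit, upper limit, frequency) tuples in
    ascending order. A missing neighbour class counts as frequency 0.
    The formula is undefined when 2*f1 - f0 - f2 is zero.
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))          # position of the modal class
    lower, upper, f1 = classes[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


rents = [
    (500, 1000, 5),
    (1000, 1500, 10),
    (1500, 2000, 8),
    (2000, 2500, 16),
    (2500, 3000, 14),
    (3000, float("inf"), 12),  # open-ended class "3000 & Above"
]
print(round(grouped_mode(rents), 2))
```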
Advantages of Mode
• The mode is readily understandable and easily calculated.
• It is not affected at all by extreme values.
• The value of the mode can also be determined graphically.
Disadvantages of Mode
• It is not based on all observations
• It is not capable of further mathematical
manipulation
• Mode is affected to a great extent by
sampling fluctuations
• Choice of grouping has great influence
on the value of mode
Measure of Dispersion
• Range

• Mean Deviation

• Variance

• Standard Deviation
Range
• Range is the difference of the greatest and the
least values in the distribution.

• Simplest but crude measure of dispersion


Mean Deviation
• Mean Deviation (M.D.):
$$M.D. = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$
In a frequency distribution,
$$M.D. = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - \bar{x}|, \qquad N = \sum_{i=1}^{n} f_i$$
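The mean deviation about the mean can be sketched in Python as follows. The function name, the optional `freqs` argument, and the sample numbers are hypothetical. With no frequencies the raw-data formula applies.

```python
def mean_deviation(values, freqs=None):
    """Mean absolute deviation about the arithmetic mean.

    With freqs, computes (1/N) * sum(f_i * |x_i - mean|) where N = sum(f_i).
    """
    if freqs is None:
        freqs = [1] * len(values)
    total_f = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total_f
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / total_f


# Hypothetical raw data: mean is 5, deviations are 3, 1, 1, 3.
print(mean_deviation([2, 4, 6, 8]))
# Hypothetical frequency distribution: value 1 occurs 3 times, value 3 once.
print(mean_deviation([1, 3], [3, 1]))
```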
Variance
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
In a frequency distribution,
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i\,(x_i - \bar{x})^2, \qquad N = \sum_{i=1}^{n} f_i$$
Standard Deviation
• The standard deviation is the positive square root of the variance.
Sample S.D.
• Any experimental data may be considered a sample of the population.
• The statistics of a sample are used to express the variability of that subset and to supply an estimate of the standard deviation of the population.
• This estimate is known as the sample standard deviation and is denoted by 's'.
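For Discrete Data
$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad n \geq 100$$
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad n < 100$$
For Frequency Distribution
$$s = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i\,(x_i - \bar{x})^2}, \qquad N \geq 100$$
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{n} f_i\,(x_i - \bar{x})^2}, \qquad N < 100$$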
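The variance and standard deviation formulas can be sketched in Python. The function names and the `sample` flag are assumptions for illustration: `sample=True` divides by N - 1 (the form these slides use for smaller samples), and `sample=False` divides by N. The data are hypothetical.

```python
import math


def variance(values, freqs=None, sample=False):
    """Variance of raw data or of a frequency distribution.

    sample=False divides by N; sample=True divides by N - 1.
    """
    if freqs is None:
        freqs = [1] * len(values)
    total_f = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total_f
    squared = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    divisor = total_f - 1 if sample else total_f
    return squared / divisor


def std_dev(values, freqs=None, sample=False):
    """Positive square root of the variance."""
    return math.sqrt(variance(values, freqs, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical values, mean 5
print(variance(data), std_dev(data))  # divisor N
print(std_dev(data, sample=True))     # divisor N - 1
```

The same result follows from the frequency form, for example `variance([2, 4, 5, 7, 9], [1, 3, 2, 1, 1])`, since that is the same data set written as values with frequencies.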
Notes:
• When a statistician selects a sample and makes a single measurement, he or she obtains at least a rough estimate of the mean of the parent population. This single observation, however, gives no hint as to the degree of variability in the population.
• When a second measurement is taken, a first basis for estimating the population variability is obtained. The statistician states this fact by saying that two observations supply one degree of freedom, three observations supply two, and so on.
Examples:
1. Find the standard deviation of the IQ of 50 boys from the following table:
I.Q. (X)          0-20   20-40   40-60   60-80   80-100   100-120   120-140   140-160
No. of Boys (f)   3      4       3       4       13       12        8         3

Class     Frq. (fi)   Xi    Xi*fi   Xi-Mean   (Xi-Mean)^2   fi*(Xi-Mean)^2
0-20      3           10    30      -81.2     6593.44       19780.32
20-40     4           30    120     -61.2     3745.44       14981.76
40-60     3           50    150     -41.2     1697.44       5092.32
60-80     4           70    280     -21.2     449.44        1797.76
80-100    13          90    1170    -1.2      1.44          18.72
100-120   12          110   1320    18.8      353.44        4241.28
120-140   8           130   1040    38.8      1505.44       12043.52
140-160   3           150   450     58.8      3457.44       10372.32
Total     50                4560                            68328.00

• Mean = 4560 / 50 = 91.2
• S.D. = sqrt(68328 / 49) = 37.34 (N - 1 is used because N = 50 is less than 100)
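The table for this example can be reproduced with a short Python script. This is a sketch with illustrative variable names; it uses the class mid values and the N - 1 divisor, which is the form that gives the stated result of 37.34.

```python
import math

# (lower limit, upper limit, number of boys), copied from the table.
iq_classes = [
    (0, 20, 3), (20, 40, 4), (40, 60, 3), (60, 80, 4),
    (80, 100, 13), (100, 120, 12), (120, 140, 8), (140, 160, 3),
]

total_f = sum(f for _, _, f in iq_classes)
mean = sum(f * (lo + hi) / 2 for lo, hi, f in iq_classes) / total_f

# Sum of fi * (Xi - Mean)^2 over all classes, with Xi the class mid value.
sum_sq = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in iq_classes)

# N = 50 is below 100, so the divisor is N - 1.
sd = math.sqrt(sum_sq / (total_f - 1))
print(total_f, round(mean, 1), round(sum_sq, 2), round(sd, 2))
```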
2. Calculate the mean, standard deviation, variance, coefficient of variation, range, and median of the following blood pressure measurements:
100, 98, 101, 94, 104, 102, 108, 108
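One way to work this exercise is with Python's standard `statistics` module. This is a sketch under one assumption: because there are only 8 readings, the sample (n - 1) forms `statistics.stdev` and `statistics.variance` are used, in line with the small-sample rule given earlier.

```python
import statistics

bp = [100, 98, 101, 94, 104, 102, 108, 108]

mean = statistics.mean(bp)
sd = statistics.stdev(bp)            # sample S.D., divisor n - 1
var = statistics.variance(bp)        # sample variance, divisor n - 1
cv = sd / mean * 100                 # coefficient of variation, in percent
data_range = max(bp) - min(bp)
median = statistics.median(bp)

print(f"mean={mean} sd={sd:.3f} variance={var:.3f}")
print(f"cv={cv:.2f}% range={data_range} median={median}")
```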
3. Verify that the standard deviation of the values 1.19, 1.20 and 1.21 is ___. What is the standard deviation of the numbers 2.19, 2.20 and 2.21? Explain the result of the two calculations above.
Solution:
If a constant is added to each value, the S.D. is unchanged. The S.D. depends on the differences among the values, not on their absolute magnitude.
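The property stated in the solution can be checked numerically. The sketch below assumes the sample (n - 1) form via `statistics.stdev`; the same conclusion holds for the divisor-n form.

```python
import math
import statistics

original = [1.19, 1.20, 1.21]
shifted = [x + 1 for x in original]   # 2.19, 2.20, 2.21

sd_original = statistics.stdev(original)
sd_shifted = statistics.stdev(shifted)

# Adding a constant moves the mean but leaves every deviation unchanged,
# so the two standard deviations agree up to floating-point rounding.
print(math.isclose(sd_original, sd_shifted, rel_tol=1e-9))
```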
4. An analysis of monthly wages paid to workers in two firms A and B belonging to the same industry gave the following results.
                                    Firm A   Firm B
No. of wage earners                 986      548
Average monthly wages               52.5     47.5
Variance of distribution of wages   100      121

(a) Which firm pays out the larger total amount in monthly wages?
(b) In which firm is there greater variability in individual wages?
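A possible way to approach both questions is sketched below in Python. It assumes that total wages equal the number of earners times the average wage, and that variability is compared through the coefficient of variation (S.D. divided by mean), since the two firms have different averages. The dictionary layout is illustrative.

```python
import math

firms = {
    "A": {"earners": 986, "mean_wage": 52.5, "variance": 100},
    "B": {"earners": 548, "mean_wage": 47.5, "variance": 121},
}

results = {}
for name, firm in firms.items():
    total_wages = firm["earners"] * firm["mean_wage"]
    # Coefficient of variation in percent: S.D. relative to the mean wage.
    cv = math.sqrt(firm["variance"]) / firm["mean_wage"] * 100
    results[name] = {"total_wages": total_wages, "cv": cv}
    print(f"Firm {name}: total wages = {total_wages}, CV = {cv:.2f}%")

larger_payer = max(results, key=lambda n: results[n]["total_wages"])
more_variable = max(results, key=lambda n: results[n]["cv"])
print(larger_payer, more_variable)
```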
Thank You
