Stat-231 Practical Manual
STAT-231
1) Histogram : In a histogram, data are plotted as a series of rectangles. Class intervals are shown on the X-axis and the frequencies on the Y-axis.
The height of each rectangle represents the frequency of the class interval. Adjacent rectangles share their sides so as to give a continuous picture. Such a graph is also called a staircase or block diagram.
Marks      No. of students
21-30      6
31-40      15
41-50      22
51-60      31
61-70      17
71-80      9
2) Frequency curve : If the mid points of the upper sides of the rectangles of a histogram are joined by a smooth freehand curve, the resulting diagram is called a frequency curve.
1) The ‘less than ogive’ method- In this method we start with the upper limits of
the classes and go on adding the frequencies. When these frequencies are plotted,
we get a rising curve.
2) The ‘more than ogive’ method- In this method we start with the lower limits of the classes and, from the total frequency, successively subtract the frequency of each class. When these frequencies are plotted, we get a declining curve.
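The following Python sketch is not part of the original manual. It prepares the marks table for plotting. It computes the true class boundaries, the mid points used as x-positions for a frequency curve, and the cumulative frequencies for both ogives. Turning 21-30 into 20.5-30.5 assumes integer marks with a gap of 1 between classes.

```python
# Marks table (inclusive class limits) and number of students per class.
classes = [(21, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80)]
freq = [6, 15, 22, 31, 17, 9]

# True (exclusive) class boundaries, assuming a gap of 1 between classes.
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in classes]
# Mid points: x-positions for the frequency curve.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# 'Less than' ogive: keep adding frequencies; plot against upper boundaries.
less_than = []
running = 0
for f in freq:
    running += f
    less_than.append(running)

# 'More than' ogive: start from the total and subtract each class frequency;
# plot against lower boundaries.
more_than = []
remaining = sum(freq)
for f in freq:
    more_than.append(remaining)
    remaining -= f

for (lo, hi), lt, mt in zip(boundaries, less_than, more_than):
    print(f"less than {hi}: {lt:3d}    more than {lo}: {mt:3d}")
```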
*************
Exercise No -2 Date :
A measure of central tendency is the central point, or point of equilibrium, about which the whole data set is balanced.
Arithmetic mean or the mean of a variable is defined as the sum of the observations
divided by the number of observations.
If the variable X assumes n values X1, X2, X3, ..., Xn, then the mean X̄ is given by
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Problem no -1) Find the arithmetic mean of the following data
2,4,6,8,10
Problem no 2) The weights of 5 ear heads of sorghum are 100, 102, 118, 124 and 126 g. Find the mean weight.
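This is a small Python check of Problems 1 and 2 using the definition above: the sum of the observations divided by their number. The helper name `arithmetic_mean` is illustrative and does not come from the manual.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

problem_1 = [2, 4, 6, 8, 10]
problem_2 = [100, 102, 118, 124, 126]  # ear-head weights in g

print(arithmetic_mean(problem_1))  # 6.0
print(arithmetic_mean(problem_2))  # 114.0
```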
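For a frequency distribution, the mean is
$$\bar{X} = \frac{\sum_{i} f_i x_i}{n}$$
Where,
f = frequency of the individual class
n = sum of all frequencies (n = ∑ f)
m = mid point of the class (used in place of x for continuous class intervals)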
Median : The value which divides the whole distribution or series into two equal parts, after the data are arranged in ascending or descending order, is called the median.
Mode : The value which occurs most frequently in the distribution or series is called the mode.
Problem no 5) Find the mean and median for the paddy variety from the following
data.
Yield ( q/ha) : 217.5 , 358.2 , 573.5 , 332.5 , 287.0 , 875.5 , 788.3 , 881.3 , 828.3
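A short Python sketch for Problem 5 follows. The median is found as the definition describes: sort the data and take the middle value, or the average of the two middle values when n is even. The standard-library `statistics.median` is used only as a cross-check.

```python
import statistics

yields = [217.5, 358.2, 573.5, 332.5, 287.0, 875.5, 788.3, 881.3, 828.3]  # q/ha

mean_yield = sum(yields) / len(yields)

# Arrange in ascending order and take the middle value
# (average of the two middle values when n is even).
ordered = sorted(yields)
n = len(ordered)
mid = n // 2
median_yield = ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

assert median_yield == statistics.median(yields)  # cross-check
print(round(mean_yield, 2), median_yield)
```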
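Harmonic Mean : The H.M. of n observations is the reciprocal of the arithmetic mean of their reciprocals.
$$H.M. = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$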
5, 10, 17, 4, 30
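This Python sketch applies the H.M. formula to the data line 5, 10, 17, 4, 30. It assumes that this line belongs to a harmonic-mean problem, since the problem statement did not survive. `statistics.harmonic_mean` serves only as a cross-check.

```python
import statistics

values = [5, 10, 17, 4, 30]

# H.M. = n / (sum of the reciprocals of the observations)
hm = len(values) / sum(1 / x for x in values)

assert abs(hm - statistics.harmonic_mean(values)) < 1e-9  # cross-check
print(round(hm, 3))
```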
Geometric Mean :
The G.M. of a series containing n observations is the nth root of the product of the values.
$$G.M. = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$$
$$G.M. = \text{Antilog}\left(\frac{\sum_{i=1}^{n} \log x_i}{n}\right)$$
2, 4, 6, 8, 10, 12
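Here is a Python sketch showing that both G.M. formulas agree for the series 2, 4, 6, 8, 10, 12. The "Antilog" step is taken as 10 raised to the mean of the base-10 logarithms. `math.prod` and `statistics.geometric_mean` need Python 3.8 or later.

```python
import math
import statistics

values = [2, 4, 6, 8, 10, 12]
n = len(values)

# Direct definition: nth root of the product of the values.
gm_root = math.prod(values) ** (1 / n)

# Log method: Antilog( sum(log10 x) / n ).
gm_log = 10 ** (sum(math.log10(x) for x in values) / n)

assert abs(gm_root - gm_log) < 1e-9
assert abs(gm_root - statistics.geometric_mean(values)) < 1e-9
print(round(gm_log, 3))
```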
******************
Exercise No -3 Date:
(Grouped data)
Grouped Data- In a grouped distribution, each value (or class) is associated with a frequency. Grouping can take the form of a discrete frequency distribution or a continuous frequency distribution.
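Discrete distribution:
$$\bar{X} = \frac{\sum_{i} f_i x_i}{n}$$
where f_i is the frequency of the value x_i and n = ∑ f_i.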
Marks                 5    10   15   20   25   30
Number of students    8    18   12   9    7    6
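This is a minimal Python sketch of the discrete-distribution mean ∑ f x / n for the marks table above.

```python
marks = [5, 10, 15, 20, 25, 30]
students = [8, 18, 12, 9, 7, 6]  # frequencies f

n = sum(students)  # total frequency
mean = sum(f * x for f, x in zip(students, marks)) / n
print(n, round(mean, 2))  # 60 15.58
```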
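Continuous distribution:
$$\bar{X} = \frac{\sum_{i} f_i m_i}{n}$$
where m_i is the mid point of the i-th class and n = ∑ f_i.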
Computation of arithmetic mean, mode, median, quartiles, deciles & percentiles (Grouped data)
Cumulative Frequency (cf) – The cumulative frequency of each class is the sum of the frequency of that class and the frequencies of all the previous classes, i.e. it is obtained by adding the frequencies successively, so that the last cumulative frequency gives the total number of items.
Number of members x    4    5    6    7    8    9    10
Frequency f            6    10   13   9    5    4    3
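A Python sketch of the cumulative frequencies for the table above follows. It also locates the median of this discrete series. The rule "median = size of the ((N + 1)/2)-th item" is the usual textbook convention for discrete series. It is an assumption here, because the manual states only the continuous formula.

```python
members = [4, 5, 6, 7, 8, 9, 10]   # x
freq = [6, 10, 13, 9, 5, 4, 3]     # f

# Cumulative frequency: add the frequencies successively.
cf = []
running = 0
for f in freq:
    running += f
    cf.append(running)
N = cf[-1]  # the last cumulative frequency is the total number of items

# Assumed discrete-series rule: median = size of the ((N + 1) / 2)-th item,
# i.e. the first x whose cumulative frequency reaches that position.
position = (N + 1) / 2
median = next(x for x, c in zip(members, cf) if c >= position)
print(cf, N, median)
```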
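$$\text{Median} = l + \frac{\frac{n}{2} - cf}{f} \times i$$
Where,
l = lower limit of the median class
n = total frequency
cf = cumulative frequency of the class preceding the median class
f = frequency of the median class
i = width of the class interval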
Note: If the class intervals are of the inclusive type, convert them into the exclusive type (the true class intervals) and take l as the lower limit of the true class interval.
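This Python sketch applies the median formula to the marks table at the start of this manual. Following the note, it converts the inclusive classes into true class limits. Subtracting 0.5 from each lower limit assumes integer marks with a gap of 1 between classes.

```python
lower_limits = [21, 31, 41, 51, 61, 71]  # inclusive lower limits
freq = [6, 15, 22, 31, 17, 9]
width = 10  # class interval i
gap = 1     # 30 -> 31; half of this gap gives the true lower limit

n = sum(freq)
half = n / 2
cf_before = 0  # cumulative frequency of the classes preceding the median class
median = None
for lower, f in zip(lower_limits, freq):
    if cf_before + f >= half:  # first class whose cf reaches n/2
        l_true = lower - gap / 2
        median = l_true + (half - cf_before) / f * width
        break
    cf_before += f

print(round(median, 2))
```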
Mode (Mo) : The mode is the value in a distribution which occurs most frequently. It is an actual value which has the highest concentration of items in and around it.
Continuous Distribution: The class interval with the highest frequency is called the modal class.
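$$Mo = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$$
Where,
l = lower limit of the modal class
f1 = frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class
i = width of the class interval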
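A Python sketch of the grouped mode formula follows, again using the marks table at the start of the manual. Two choices here are assumptions. A missing neighbouring class counts as frequency 0. The true lower limit is taken as the inclusive limit minus 0.5.

```python
lower_limits = [21, 31, 41, 51, 61, 71]
freq = [6, 15, 22, 31, 17, 9]
width = 10

k = freq.index(max(freq))                        # modal class: highest frequency
f1 = freq[k]
f0 = freq[k - 1] if k > 0 else 0                 # preceding class (0 if none, assumed)
f2 = freq[k + 1] if k + 1 < len(freq) else 0     # succeeding class (0 if none, assumed)
l_true = lower_limits[k] - 0.5                   # inclusive -> exclusive lower limit

mode = l_true + (f1 - f0) / (2 * f1 - f0 - f2) * width
print(round(mode, 2))
```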
For a moderately asymmetrical distribution, Karl Pearson gave the following empirical relationship between the mean, median and mode:
$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$
Problem no 6) The mean and median of a moderately asymmetrical series are 26.8 and 27.9 respectively. What would be its most probable mode?
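Karl Pearson's relation reduces Problem 6 to one line of arithmetic. This Python sketch evaluates it.

```python
mean, median = 26.8, 27.9
mode = 3 * median - 2 * mean  # Mode = 3 Median - 2 Mean
print(round(mode, 2))  # 30.1
```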
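Harmonic mean (grouped data):
$$H.M. = \frac{n}{\sum_{i} \frac{f_i}{x_i}}$$
Geometric mean (grouped data):
$$G.M. = \text{Antilog}\left(\frac{\sum_{i} f_i \log x_i}{n}\right)$$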
Problem no7) Calculate the A.M., G.M. and H.M of the following frequency
distribution.
X              2    4    6    8    10   12   14
Frequency f    3    5    8    12   15   4    3
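This Python sketch computes the three grouped-data means for Problem 7 with the frequency-weighted formulas. Base-10 logarithms are assumed for the Antilog step. As a sanity check, A.M. ≥ G.M. ≥ H.M. must hold.

```python
import math

x = [2, 4, 6, 8, 10, 12, 14]
f = [3, 5, 8, 12, 15, 4, 3]
n = sum(f)

am = sum(fi * xi for fi, xi in zip(f, x)) / n
gm = 10 ** (sum(fi * math.log10(xi) for fi, xi in zip(f, x)) / n)  # Antilog(∑ f log x / n)
hm = n / sum(fi / xi for fi, xi in zip(f, x))                       # n / ∑(f / x)

assert am >= gm >= hm
print(round(am, 3), round(gm, 3), round(hm, 3))
```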
Problem no 8) Calculate the A.M., G.M. and H.M of the following frequency
distribution.
*****************
Exercise No -4 Date:
Measures of Dispersion
A) Absolute measures        B) Relative measures
Range                       Coefficient of Range
Quartile Deviation          Coefficient of Quartile Deviation
Mean Deviation              Coefficient of Mean Deviation
Standard Deviation          Coefficient of Standard Deviation
Definition of Range : It is the difference between the largest and the smallest values of the variable in the distribution.
$$\text{Range} = L - S$$
where L = largest value and S = smallest value.
Coefficient of Range :
$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$
Problem No 1) Calculate the Range and Coefficient of Range for the following series.
X 3 5 6 7 8 9
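This Python sketch applies the Range and Coefficient of Range formulas to the series in Problem 1.

```python
x = [3, 5, 6, 7, 8, 9]
L, S = max(x), min(x)               # largest and smallest values
value_range = L - S
coeff_range = (L - S) / (L + S)
print(value_range, coeff_range)  # 6 0.5
```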
Quartile Deviation :
$$Q.D. = \frac{Q_3 - Q_1}{2}$$
Coefficient of Quartile Deviation :
$$\text{Coefficient of } Q.D. = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Problem No 2) Find out the value of Q.D. and its coefficient from the following data
Marks 20 28 40 12 30 40 60
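This is a Python sketch for Problem 2. It uses the positional rule Q1 = ((N + 1)/4)th item and Q3 = (3(N + 1)/4)th item, which is stated in Exercise 5 of this manual. Linear interpolation between neighbouring items when a position is fractional is an assumed convention. The helper name `quartile` is illustrative.

```python
def quartile(sorted_x, q):
    """Value of the (q * (N + 1) / 4)-th item of an ascending list,
    interpolating between items when the position is fractional (assumed)."""
    pos = q * (len(sorted_x) + 1) / 4
    pos = min(max(pos, 1), len(sorted_x))  # keep the position inside the data
    k = int(pos)                           # whole part of the 1-based position
    frac = pos - k
    if frac == 0:
        return sorted_x[k - 1]
    return sorted_x[k - 1] + frac * (sorted_x[k] - sorted_x[k - 1])

marks = sorted([20, 28, 40, 12, 30, 40, 60])
q1, q3 = quartile(marks, 1), quartile(marks, 3)
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)
print(q1, q3, qd, round(coeff_qd, 3))  # 20 40 10.0 0.333
```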
There are two methods of calculating the Standard Deviation for an individual series.
Step 1: Calculate the arithmetic mean X̄ of the series.
Step 2: Find the deviation of each value from the mean (x = X − X̄).
Step 3: Square the deviations and take the total of the squared deviations, ∑x².
Step 4: Divide the total of the squared deviations by the number of observations, ∑x²/n.
The square root of ∑x²/n is the Standard Deviation.
Thus
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$
OR
$$S = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$$
5. Coefficient of Variation :
$$C.V. = \frac{S}{\bar{X}} \times 100$$
X 8 15 20 17 11 3 9
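This Python sketch applies the individual-series measures to the values 8, 15, 20, 17, 11, 3, 9. It checks that the two standard deviation formulas agree and then computes the mean deviation about the mean and the coefficient of variation. The divisor is n, as in the formulas above.

```python
import math

x = [8, 15, 20, 17, 11, 3, 9]
n = len(x)
mean = sum(x) / n

# Deviations from the actual mean: S = sqrt( ∑(X - X̄)² / n )
s_dev = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)

# Direct formula: S = sqrt( ∑X²/n - (∑X/n)² )
s_direct = math.sqrt(sum(xi ** 2 for xi in x) / n - (sum(x) / n) ** 2)

assert math.isclose(s_dev, s_direct)

md = sum(abs(xi - mean) for xi in x) / n  # mean deviation about the mean
cv = s_dev / mean * 100                    # coefficient of variation, in %
print(round(mean, 3), round(md, 3), round(s_dev, 3), round(cv, 2))
```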
******************
Exercise No -5 Date:
$$\text{Range} = L - S$$
Coefficient of Range :
$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$
$$Q.D. = \frac{Q_3 - Q_1}{2}$$
$$\text{Coefficient of } Q.D. = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Quartile Deviation
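$$Q_1 = \text{size of the } \left(\frac{N + 1}{4}\right)\text{th item}$$
$$Q_3 = \text{size of the } \left(\frac{3(N + 1)}{4}\right)\text{th item}$$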
Marks              10   20   30   40   50   60
No. of students    4    7    15   8    7    2
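Mean Deviation :
$$M.D. = \frac{\sum f\,|D|}{N}$$
where |D| is the deviation of each value from the average selected (mean or median), with signs ignored, and N = ∑ f.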
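This Python sketch works the marks table above as a discrete series. The quartiles use the positional rule just stated, with each position located through the cumulative frequencies. The mean deviation is taken about the median here. That choice is an assumption, because the manual allows either average.

```python
marks = [10, 20, 30, 40, 50, 60]
students = [4, 7, 15, 8, 7, 2]

cf = []
running = 0
for f in students:
    running += f
    cf.append(running)
N = cf[-1]

def item(position):
    """Size of the item at a position: the first x whose cf reaches it."""
    return next(x for x, c in zip(marks, cf) if c >= position)

q1 = item((N + 1) / 4)
q3 = item(3 * (N + 1) / 4)
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)

# Mean deviation about the median (size of the ((N + 1)/2)-th item).
median = item((N + 1) / 2)
md = sum(f * abs(x - median) for x, f in zip(marks, students)) / N
print(q1, q3, qd, median, md)  # 20 40 10.0 30 10.0
```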
In a continuous series, we have to find the mid points of the various classes and take the deviations of these mid points from the average selected.
$$M.D. = \frac{\sum f\,|D|}{N}$$
where D = m − average (signs ignored) and m = mid point of the class.
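There are three methods for calculating the Standard Deviation in a discrete series.
1) Actual mean method:
$$S = \sqrt{\frac{\sum f d^2}{N}}, \quad d = x - \bar{X}$$
2) Direct method:
$$S = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2}$$
If the actual mean is in fractions, calculating the deviations d = x − X̄ takes a lot of time and labour, so the actual mean method is rarely used in practice.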
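This Python sketch checks that the actual mean method and the direct method give the same standard deviation. It reuses the marks table of the quartile-deviation example above (marks 10 to 60). With N = 43 the mean is a fraction, which is the situation the note describes.

```python
import math

x = [10, 20, 30, 40, 50, 60]
f = [4, 7, 15, 8, 7, 2]
N = sum(f)
mean = sum(fi * xi for fi, xi in zip(f, x)) / N

# Actual mean method: d = x - X̄, S = sqrt( ∑ f d² / N )
s_actual = math.sqrt(sum(fi * (xi - mean) ** 2 for fi, xi in zip(f, x)) / N)

# Direct method: S = sqrt( ∑ f x² / N - (∑ f x / N)² )
s_direct = math.sqrt(sum(fi * xi ** 2 for fi, xi in zip(f, x)) / N - mean ** 2)

assert math.isclose(s_actual, s_direct)
print(round(mean, 3), round(s_actual, 3))
```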
In a continuous series, the mid values (m) of the class intervals are used in place of x.
$$\sigma = \sqrt{\frac{\sum f (m - \bar{X})^2}{N}}$$
or
$$\sigma = \sqrt{\frac{\sum f m^2}{N} - \left(\frac{\sum f m}{N}\right)^2}$$
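This Python sketch evaluates the continuous-series formulas on the marks table at the start of this manual (classes 21-30 to 71-80), with the class mid points standing in for x.

```python
import math

classes = [(21, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80)]
f = [6, 15, 22, 31, 17, 9]
m = [(lo + hi) / 2 for lo, hi in classes]  # mid values of the class intervals
N = sum(f)

mean = sum(fi * mi for fi, mi in zip(f, m)) / N
sd_dev = math.sqrt(sum(fi * (mi - mean) ** 2 for fi, mi in zip(f, m)) / N)
sd_direct = math.sqrt(sum(fi * mi ** 2 for fi, mi in zip(f, m)) / N - mean ** 2)

assert math.isclose(sd_dev, sd_direct)
print(round(mean, 2), round(sd_dev, 3))
```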
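Coefficient of Variation: The standard deviation is expressed in the units in which the original figures are collected and stated. The coefficient of variation, C.V. = (S / X̄) × 100, is a relative measure free of those units, so it can be used to compare the variability of different series.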
X 8 15 20 17 11 3 9
X    D = (X − X̄)    |D|    (X − X̄)²
8
15
20
17
11
3
9
∑
Problem no 4 ) Calculate the Mean, M.D , S.D & C.V of the following data
( Grouped Data )
X 5 3 8 4 5 9
f 1 3 5 7 2 1
X    f    fx    D = (X − X̄)    |D|    f|D|    (X − X̄)²    f(X − X̄)²
5
3
8
4
5
9
∑
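This Python sketch works Problem 4 with the frequency-weighted formulas, taking the mean deviation about the mean. The rows are used exactly as given, including the value 5 that appears twice in the X row.

```python
import math

x = [5, 3, 8, 4, 5, 9]
f = [1, 3, 5, 7, 2, 1]
N = sum(f)

mean = sum(fi * xi for fi, xi in zip(f, x)) / N
md = sum(fi * abs(xi - mean) for fi, xi in zip(f, x)) / N          # about the mean
sd = math.sqrt(sum(fi * (xi - mean) ** 2 for fi, xi in zip(f, x)) / N)
cv = sd / mean * 100                                                # in %

print(round(mean, 3), round(md, 3), round(sd, 3), round(cv, 2))
```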
*************
Exercise No -6 Date:
Selection of random sample using simple random sampling
2) Adequacy : The size of sample should be adequate, otherwise it may not represent the
characteristics of the universe.
4) Homogeneity : There should be no difference between the nature of the units of the universe and that of the units of the sample.
Sampling
• Purposive sampling
• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling
• Multistage Sampling
Purposive sampling
• The selection of units entirely depends on the choice of the investigator.
• This type of sampling is adopted when it is not possible to adopt any random
procedure for selection of sampling units.
Systematic Sampling
Types of correlation
$$r = \frac{\operatorname{Cov}(x, y)}{SD(x)\,SD(y)}$$
Exercise No -7
X    Y    X²    Y²    XY
15 17
16 18
17 19
18 20
19 20
20 21
21 21
∑X =    ∑Y =    ∑X² =    ∑Y² =    ∑XY =
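This Python sketch fills in the column totals of the worksheet and computes r. Multiplying the numerator and denominator of Cov(x, y) / (SD(x) SD(y)) by n² gives the computing form used below. That form is an algebraic rearrangement, not a formula stated in the manual.

```python
import math

X = [15, 16, 17, 18, 19, 20, 21]
Y = [17, 18, 19, 20, 20, 21, 21]
n = len(X)

sx, sy = sum(X), sum(Y)
sxx = sum(x * x for x in X)
syy = sum(y * y for y in Y)
sxy = sum(x * y for x, y in zip(X, Y))

# r = Cov(x, y) / (SD(x) SD(y)), rewritten in terms of the column totals
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(sx, sy, sxx, syy, sxy, round(r, 4))
```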
**************
Exercise No -8 Date:
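The individuals in the group can be arranged in order, thereby obtaining for each individual a number showing his/her rank in the group.
$$r_c = 1 - \frac{6\sum D^2}{n^3 - n}$$
where D = difference between the ranks of each pair and n = number of pairs.
When ranks are tied, a correction factor (m³ − m)/12 is added to ∑D² for each group of tied ranks, where m is the number of items sharing that rank:
$$r_c = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \cdots\right]}{n^3 - n}$$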
Problem no 1) Calculate the rank correlation coefficient for the following data.
X    Y    Rank of X    Rank of Y    D = (Rank of X − Rank of Y)    D²
80 69
85 21
83 71
81 48
78 57
50 29
X    Y    Rank of X    Rank of Y    D = (Rank of X − Rank of Y)    D²
40 11
25 11
30 20
7 5
9 15
9 1
45 18
20 8
9 5
42 16
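This Python sketch computes the rank correlation coefficient for both tables. Tied values receive the average of the ranks they occupy, and (m³ − m)/12 is added for every group of m tied values in X and in Y. Rank 1 is given to the largest value. Ranking in the other direction gives the same D². The function names are illustrative.

```python
def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of their ranks."""
    order = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1          # first rank position held by v
        count = order.count(v)              # number of items tied at v
        ranks[v] = first + (count - 1) / 2  # average of the tied positions
    return [ranks[v] for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    counts = (values.count(v) for v in set(values))
    return sum((m ** 3 - m) / 12 for m in counts if m > 1)

def rank_correlation(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * (sum_d2 + tie_correction(x) + tie_correction(y)) / (n ** 3 - n)

# Problem 1 (no ties)
r1 = rank_correlation([80, 85, 83, 81, 78, 50], [69, 21, 71, 48, 57, 29])
# Second table (tied values in both X and Y)
r2 = rank_correlation([40, 25, 30, 7, 9, 9, 45, 20, 9, 42],
                      [11, 11, 20, 5, 15, 1, 18, 8, 5, 16])
print(round(r1, 4), round(r2, 4))
```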
******************
Ex no 9 & 10 : Regression : Fitting of Simple linear regression equation with test of
significance of regression coefficient
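Regression equation of Y on X:
$$Y = a + b_{yx} X$$
Y = dependent variable
X = independent variable
a = intercept (constant)
b_yx = regression coefficient of Y on X
Regression equation of X on Y:
$$X = a + b_{xy} Y$$
$$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
$$b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2}$$
$$a = \bar{Y} - b_{yx}\bar{X} \quad \text{(Y on X)}$$
$$a = \bar{X} - b_{xy}\bar{Y} \quad \text{(X on Y)}$$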
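This Python sketch fits both regression lines with the formulas above. The yield row of Problem 1 did not survive extraction, so the sketch uses the X, Y data of the correlation exercise (Exercise No -7) instead. A useful check is that b_yx × b_xy equals r² for the same data.

```python
X = [15, 16, 17, 18, 19, 20, 21]
Y = [17, 18, 19, 20, 20, 21, 21]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))  # ∑(X-X̄)(Y-Ȳ)
sxx = sum((x - x_bar) ** 2 for x in X)                       # ∑(X-X̄)²
syy = sum((y - y_bar) ** 2 for y in Y)                       # ∑(Y-Ȳ)²

b_yx = sxy / sxx
a_yx = y_bar - b_yx * x_bar  # Y on X: Y = a + b_yx X
b_xy = sxy / syy
a_xy = x_bar - b_xy * y_bar  # X on Y: X = a + b_xy Y

print(f"Y = {a_yx:.4f} + {b_yx:.4f} X")
print(f"X = {a_xy:.4f} + {b_xy:.4f} Y")
```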
Problem no 1) The following data are for the amount of water supplied (inches) and the yield (tonnes per acre). Find the regression equation of yield on water.
Water( X) 12 18 24 30 36 42 48
************
Exercise No -11 Date:
Test of Significance : Problems on One Sample, Two Sample and Paired t-test
Problem no 1) A sample of 9 items gives the values 45, 47, 50, 52, 48, 47, 49, 53, 51.
Does the mean of these 9 items differ significantly from the population mean of 47.5?
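This Python sketch computes the one-sample statistic t = (x̄ − µ) / (s / √n) with n − 1 degrees of freedom, where s uses the divisor n − 1. The manual lists this test but does not print the formula. The standard textbook form is assumed here, and the comparison with the table t value is left to the reader.

```python
import math
import statistics

sample = [45, 47, 50, 52, 48, 47, 49, 53, 51]
mu0 = 47.5  # hypothesised population mean
n = len(sample)

x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
t = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1

# Compare |t| with the table t value for df degrees of freedom (e.g. at 5 %).
print(round(x_bar, 3), round(s, 3), round(t, 3), df)
```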
Problem no 2) Two independent samples of 8 and 7 items gave the following data. Test whether the difference between the means of the samples is significant.
Sample (I) (X)    Sample (II) (Y)    X²    Y²
9                 10
11                12
13                10
11                14
15                09
12                10
09                08
14                --
∑X =    ∑Y =    ∑X² =    ∑Y² =
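This Python sketch fills in the two-sample comparison. It assumes the usual equal-variance (pooled) t-test, with t = (x̄ − ȳ) / √(s²(1/n1 + 1/n2)) and n1 + n2 − 2 degrees of freedom, because the manual does not print the formula.

```python
import math

x = [9, 11, 13, 11, 15, 12, 9, 14]  # Sample I
y = [10, 12, 10, 14, 9, 10, 8]      # Sample II
n1, n2 = len(x), len(y)
x_bar, y_bar = sum(x) / n1, sum(y) / n2

# Pooled estimate of the common population variance.
ss_x = sum((v - x_bar) ** 2 for v in x)
ss_y = sum((v - y_bar) ** 2 for v in y)
s2 = (ss_x + ss_y) / (n1 + n2 - 2)

t = (x_bar - y_bar) / math.sqrt(s2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
print(round(t, 3), df)
```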
Problem no 3) An IQ test was administered to 5 persons before and after they were trained. The results are given below (IQ before and IQ after).
**************
Exercise No -12 Date:
Suppose we want to test whether two normal populations have the same variance.
Let X1, X2, X3, ..., X_{n1} be a random sample of size n1 from the first population with variance σ1², and let Y1, Y2, Y3, ..., Y_{n2} be a random sample of size n2 from the second population with variance σ2².
Null Hypothesis :
$$H_0 : \sigma_1^2 = \sigma_2^2 = \sigma^2$$
i.e. the population variances are the same.
In other words H0 is that the two independent estimates of the common population
variance do not differ significantly.
$$f_0 = \frac{S_x^2}{S_y^2}$$
where
$$S_x^2 = \frac{1}{n_1 - 1}\sum (x - \bar{x})^2, \qquad S_y^2 = \frac{1}{n_2 - 1}\sum (y - \bar{y})^2$$
Note: The larger of the two variance estimates is always taken as the numerator of the ratio, so that f0 ≥ 1.
If f0 > fe (calculated F > table F), reject H0.
If f0 < fe (calculated F < table F), accept H0.
Problem no 1) Two random samples drawn from two normal populations gave the following data. Obtain unbiased estimates of the variances of the two populations and test whether the populations have the same variance.
Sample A 20 16 26 27 23 22 18 24 15 19
Sample B 27 33 42 35 32 34 38 28 41 43
Problem no 2) Test the equality of variance of two samples given below and write
the conclusion
X 18 12 16 21 16 14 19 12
Y 9 11 12 15 14 16 15 14
F table value at 5 % level of significance 3.79
Problem no 3) Test the equality of variance of two samples given below and write
the conclusion
X 1 4 7 3 2 9 11
Y 11 15 14 12 13 12
F table value at 5 % level of significance 5.59
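This Python sketch runs the variance-ratio test on all three problems. It places the larger unbiased variance estimate in the numerator, as the note requires, and compares f0 with the table values quoted in the problems. Problem 1 gives no table value, so the sketch only reports F and its degrees of freedom. The helper names are illustrative.

```python
def unbiased_variance(sample):
    """S² = ∑(x - x̄)² / (n - 1)"""
    n = len(sample)
    mean = sum(sample) / n
    return sum((v - mean) ** 2 for v in sample) / (n - 1)

def f_test(a, b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    va, vb = unbiased_variance(a), unbiased_variance(b)
    if va >= vb:
        return va / vb, len(a) - 1, len(b) - 1
    return vb / va, len(b) - 1, len(a) - 1

problems = {
    "Problem 1": ([20, 16, 26, 27, 23, 22, 18, 24, 15, 19],
                  [27, 33, 42, 35, 32, 34, 38, 28, 41, 43], None),
    "Problem 2": ([18, 12, 16, 21, 16, 14, 19, 12],
                  [9, 11, 12, 15, 14, 16, 15, 14], 3.79),
    "Problem 3": ([1, 4, 7, 3, 2, 9, 11],
                  [11, 15, 14, 12, 13, 12], 5.59),
}

for name, (a, b, f_table) in problems.items():
    f0, v1, v2 = f_test(a, b)
    if f_table is None:
        verdict = "compare with table F"
    else:
        verdict = "reject H0" if f0 > f_table else "accept H0"
    print(f"{name}: F = {f0:.3f} with ({v1}, {v2}) d.f. -> {verdict}")
```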
**************
Exercise No – 13 & 14 Date:
Chi Square Test of Goodness of Fit, Chi square of independence of Attributes for
2 X 2 contingency table
Chi-square : The chi-square test is one of the simplest and most widely used non-parametric tests in statistical work.
The chi-square (χ²) test was first used by Karl Pearson in 1900. Chi-square describes the magnitude of the discrepancy between theory and observation.
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
Where O = Observed frequency
E = Expected frequency
2 X 2 Contingency table:
a        b        a+b
c        d        c+d
a+c      b+d      N
Under the null hypothesis of independence of attributes, the value of chi-square for a 2 X 2 contingency table is
$$\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$$
with 1 degree of freedom.
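This Python sketch checks that the 2 X 2 shortcut formula gives the same value as the general ∑(O − E)²/E, with each expected frequency equal to row total × column total / N. It uses the Affected / Unaffected counts that appear in this exercise (a = 12, b = 28, c = 13, d = 7). No Yates correction is applied, since the manual does not mention one.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut formula: N (ad - bc)² / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi_square(observed, expected):
    """General formula: ∑ (O - E)² / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

a, b, c, d = 12, 28, 13, 7
n = a + b + c + d
# Expected frequency of a cell = (row total × column total) / N
expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
            (c + d) * (a + c) / n, (c + d) * (b + d) / n]

chi_short = chi_square_2x2(a, b, c, d)
chi_general = chi_square([a, b, c, d], expected)
print(round(chi_short, 3), round(chi_general, 3))  # 6.72 6.72
```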
Problem no 1) From the following data regarding the colour of eyes of fathers and sons, test whether the colour of the son's eyes is associated with that of the father's.
Black 51 471
Total
Affected 12 28
Unaffected 13 07
Total
**************
Exercise No -15 Date :
Analysis of Variance : one way and two way classification
Analysis of variance separates the total variation into the portions caused by different sources. These sources are known as components of variation.
Role of ANOVA-
Analysis of Variance :
Sum of Squares – It is the sum of the squares of the deviations of the variates from their mean.
Degree of freedom – The number of degrees of freedom is one less than the number of variates in the sample concerned.
1. If the number of variates is n, then d.f. is (n – 1).
2. If the number of treatments is t, then d.f. is (t – 1).
Variance - It is obtained by dividing the sum of squares by the corresponding degrees of freedom.
$$\text{Variance} = \frac{\text{Sum of Squares}}{\text{Degrees of freedom}}$$
Assumption of ANOVA-
In one way classification, the data are classified according to only one criterion.
$$H_0 : \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$
i.e. the arithmetic means of the populations from which the k samples were randomly drawn are equal to one another.
$$F = \frac{S_1^2}{S_2^2}$$
4. Compare the calculated value of F with the table value of F for the corresponding degrees of freedom at a chosen level of significance. Generally we take the 5 % level of significance.
5. If calculated F > table F, the difference is significant.
Note – The procedure for ANOVA is applicable to both equal and unequal sample sizes.
Source of variation | SS (Sum of Squares) | v (Degrees of Freedom) | MS (Mean Square) | Variance ratio F
Between samples     | SSC                 | v1 = c − 1             | MSC = SSC / (c − 1) | MSC / MSE
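This Python sketch of the one way classification takes c samples of possibly unequal size. It computes the between-samples sum of squares SSC and the within-samples sum of squares SSE. The within-samples row is missing from the table above, so MSE = SSE / (n − c) is the standard form and is assumed here. The data are the two samples of Problem no 2 in Exercise No -11. With two groups, F must equal the square of the pooled t statistic, which makes a convenient check.

```python
def one_way_anova(groups):
    """Return (SSC, SSE, v1, v2, F) for a one way classification."""
    all_values = [v for g in groups for v in g]
    n, c = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    # Between-samples sum of squares
    ssc = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-samples (error) sum of squares, assumed standard form
    sse = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    v1, v2 = c - 1, n - c
    msc, mse = ssc / v1, sse / v2
    return ssc, sse, v1, v2, msc / mse

sample_1 = [9, 11, 13, 11, 15, 12, 9, 14]
sample_2 = [10, 12, 10, 14, 9, 10, 8]
ssc, sse, v1, v2, f = one_way_anova([sample_1, sample_2])
print(round(ssc, 3), round(sse, 3), v1, v2, round(f, 3))
```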
In two way classification, the procedure for analysis of variance is somewhat different from the one followed for problems of one way classification.
In two way classification the ANOVA table takes the following form:
Residual (error) Sum of Squares = Total Sum of Squares − Sum of Squares between columns − Sum of Squares between rows
$$F(v_1, v_2) = \frac{MSC}{MSE}, \quad v_1 = c - 1,\; v_2 = (c - 1)(r - 1)$$
$$F(v_1, v_2) = \frac{MSR}{MSE}, \quad v_1 = r - 1,\; v_2 = (c - 1)(r - 1)$$
It should be carefully noted that v1 is not the same in both cases: for columns v1 = (c − 1) and for rows v1 = (r − 1).
*************