STAT-231 Manual
Histogram
A histogram is a bar chart or graph showing the frequency of occurrence of each value of the
variable being analysed.
In a histogram, data are plotted as a series of rectangles. Class intervals are shown on the
X-axis and frequencies on the Y-axis.
The height of each rectangle represents the frequency of the class interval. Each rectangle
adjoins the next so as to give a continuous picture. Such a graph is also called a
staircase or block diagram.
However, we cannot construct a histogram for a distribution with open-end classes. A histogram
is also quite misleading if the distribution has unequal class intervals and suitable
adjustments in the frequencies are not made.
Solve the following problems
Problem No-1) Draw a histogram for the following data.
Number of Workers 8 16 27 19 10 8
No of Students 6 15 22 31 17 9
Frequency Curve
If the midpoints of the upper boundaries of the rectangles of a histogram are connected by a
smooth freehand curve, the diagram is called a frequency curve.
Problem No -3) Draw a frequency curve for the following data
No of Family 21 35 56 74 63 40 29 14
Frequency Polygon
If we mark the midpoints of the top horizontal sides of the rectangles in a histogram and join
them by straight lines, the figure so formed is called a frequency polygon.
This is done under the assumption that the frequencies in a class interval are evenly
distributed throughout the class. The area of the polygon is equal to the area of the
histogram, because the area left outside is just equal to the area included in it.
Problem No-4) Draw a frequency polygon for the following data.
Students No 4 7 10 18 14 8 3
Class Interval 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100
Frequency 4 6 13 25 32 19 8 3
٭٭٭
EXERCISE NO. 02
MEASURES OF CENTRAL TENDENCY: COMPUTATION OF ARITHMETIC MEAN, MODE,
MEDIAN, GM, HM, QUARTILES, DECILES & PERCENTILES (UNGROUPED DATA)
5 8
10 18
15 12
20 9
25 7
30 6
0-10 12
10-20 18
20-30 27
30-40 20
40-50 17
50-60 6
Median:
The value which divides the whole distribution or series into two equal parts, after the data
have been arranged in ascending or descending order, is called the median.
Mode:
The value which occurs most frequently in the distribution or series is called the mode.
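The definitions above can be checked quickly with Python's standard `statistics` module. This is only a sketch: the observations are illustrative and are not taken from the problems in this manual.

```python
from statistics import mean, median, multimode

# Illustrative observations (hypothetical, not from the manual's problems).
values = [12, 7, 3, 7, 9, 15, 7, 9]

avg = mean(values)          # sum of the values divided by their number
mid = median(values)        # middle value after sorting; with an even count,
                            # the module averages the two middle values
modes = multimode(values)   # list of the most frequent value(s)

print(avg, mid, modes)
```

If `multimode` returns more than one value, the series has several modes; if every value occurs once, all values are returned, which corresponds to a series with no distinct mode.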
Problem no 5) Find the mean and median for the paddy variety from the following data.
Yield ( q/ha) : 217.5 , 358.2 , 573.5 , 332.5 , 287.0 , 875.5 , 788.3 , 881.3 , 828.3
Problem no 6) Calculate the mean for the following data
2, 4, 6, 8, 10
Problem no 7) Find the median for the following data
25, 18, 27, 10, 8, 30, 42, 20 , 53
Problem no 8) Calculate the mode for the following data
2 ,7, 18, 15, 10, 17, 8, 10, 2
Problem no 9 ) Calculate the mode for the following series
1, 12, 10, 15, 24, 30
Problem no 10) Calculate the mode of the following data
2, 7,10, 15, 12, 7, 14, 24, 10, 7, 20, 10
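Harmonic Mean: For a set of observations X1, X2, X3, ..., Xn, the harmonic mean is defined as
the reciprocal of the arithmetic average of the reciprocals of the given values.
$$H.M. = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
Geometric Mean: It is the antilog of the arithmetic average of the logarithms of the given values.
$$G.M. = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log X_i\right)$$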
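As a rough illustration of the two formulas, the sketch below computes the harmonic mean as n divided by the sum of reciprocals, and the geometric mean as the antilog (base 10) of the average logarithm. The function names and the sample values are assumptions made for this example; all values must be positive.

```python
import math

def harmonic_mean(xs):
    """n divided by the sum of the reciprocals of the values."""
    return len(xs) / sum(1 / x for x in xs)

def geometric_mean(xs):
    """Antilog (base 10) of the arithmetic mean of the logs of the values."""
    mean_log = sum(math.log10(x) for x in xs) / len(xs)
    return 10 ** mean_log

values = [1, 2, 4]  # illustrative data, not from the manual
print(harmonic_mean(values))   # 3 / (1 + 1/2 + 1/4)
print(geometric_mean(values))  # cube root of 1 * 2 * 4
```

Using base-10 logs mirrors working with log tables; any base gives the same result as long as the antilog uses the same base.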
Problem no-12) Calculate G.M for the following series
2, 4, 6, 8, 10, 12
٭٭٭
EXERCISE NO. 03
COMPUTATION OF ARITHMETIC MEANS, MODE, MEDIAN, QUARTILES, DECILES &
PERCENTILES (GROUPED DATA)
Grouped Data:
In a grouped distribution, values are associated with frequencies. Grouping can take the form
of a discrete frequency distribution or a continuous frequency distribution.
Arithmetic Mean for Grouped data or Frequency distribution:
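$$\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{N}$$
Where
x_i = the value of the variable (the class mid-point in a continuous series)
f_i = the frequency of the individual value or class
n = the number of distinct values or classes
N = the sum of the frequencies or total frequency (N = ∑f)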
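The formula can be sketched in a few lines of Python. The function name and the small data set are illustrative assumptions; for a continuous series, `x` would hold the class mid-points.

```python
def grouped_mean(x, f):
    """Arithmetic mean of a frequency distribution: sum(f * x) / sum(f)."""
    total_frequency = sum(f)                      # N
    weighted_sum = sum(fi * xi for fi, xi in zip(f, x))
    return weighted_sum / total_frequency

# Illustrative distribution (hypothetical): values 10, 20, 30 with frequencies 1, 2, 1
print(grouped_mean([10, 20, 30], [1, 2, 1]))
```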
Problem no 1) Given the following frequency distribution, calculate the arithmetic mean
(Discrete Series).
Marks 5 10 15 20 25 30
Number of Students 8 18 12 9 7 6
Number of Students 5 7 12 15 13 8
The cumulative frequency just greater than 25.5 is 29 and the value of x corresponding to 29
is 6. Hence the median size is 6 members per family.
Note: The median is an appropriate measure here because a fractional value given by the mean
does not sensibly indicate the average number of members in a family.
Problem no 3) The following data pertain to the number of members in a family. Find the
median size of the family.
Number of members x 4 5 6 7 8 9 10
Frequency f 6 10 13 9 5 4 3
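The cumulative-frequency rule described in the worked text above can be sketched as follows. The function name is an assumption; the code locates position (N + 1)/2 and returns the first value whose cumulative frequency reaches it. With the data of this problem it reproduces the median of 6 stated in the text.

```python
from itertools import accumulate

def discrete_median(x, f):
    """Median of a discrete frequency distribution (x in ascending order).

    Follows the manual's rule: take the value whose cumulative frequency
    is the first to reach (N + 1) / 2.
    """
    position = (sum(f) + 1) / 2
    for value, cumulative in zip(x, accumulate(f)):
        if cumulative >= position:
            return value

members = [4, 5, 6, 7, 8, 9, 10]
frequency = [6, 10, 13, 9, 5, 4, 3]
print(discrete_median(members, frequency))  # N = 50, position 25.5, cumulative 6, 16, 29
```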
Frequency f 5 8 10 12 7 6 3 2
Mode (Mo): The mode is the value in a distribution which occurs most frequently.
It is an actual value which has the highest concentration of items in and around it.
Discrete Data: The value of X corresponding to the highest frequency is the mode.
Continuous Distribution: The class interval with the highest frequency is called the modal
class, and the mode is located within that class by the formula
$$\text{Mode} = M_o = l + \left[\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right] \times i$$
Where ,
l= Lower limit of the modal class
f1= frequency of modal class
f0= frequency of the class preceding the modal class
f2= frequency of the class succeeding the modal class
i = Width of class interval
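A minimal sketch of the modal-class formula, assuming equal class widths and taking the first class with the highest frequency as the modal class. The function name and the frequencies are illustrative, not from the manual's problems.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * i for the modal class."""
    k = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                   # preceding class (0 if none)
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0      # succeeding class (0 if none)
    return lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width

# Illustrative classes 0-10, 10-20, 20-30, 30-40, 40-50
lower = [0, 10, 20, 30, 40]
freq = [5, 9, 15, 13, 6]
print(grouped_mode(lower, freq, 10))  # modal class 20-30: 20 + (6 / 8) * 10
```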
Problem no 5) Calculate the mode for the following frequency distribution
Frequency f 5 8 12 7 5 3
Problem no 7) Calculate the A.M., G.M. and H.M. of the following frequency distribution.
X 2 4 6 8 10 12 14
Frequency f 3 5 8 12 15 4 3
Problem no 8) Calculate the A.M., G.M. and H.M of the following frequency distribution.
Class interval 0-10 10-20 20-30 30-40 40-50 50-60 60-70
Frequency f 4 6 10 2 15 8 5
٭٭٭
EXERCISE NO . 04
MEASURES OF DISPERSION: COMPUTATION OF RANGE, MEAN DEVIATION, QUARTILE
DEVIATION, STANDARD DEVIATION AND VARIANCE AND
RESPECTIVE RELATIVE MEASURES (UNGROUPED DATA)
Dispersion:
Dispersion means the degree of scatter of the observations about a central value.
Measures of Dispersion
A) Absolute measures B) Relative Measures
Range Co-efficient of Range
Quartile Deviation Co-efficient of Quartile Deviation
Mean Deviation Co-efficient of Mean Deviation
Standard Deviation Co-efficient of Standard Deviation
1) Range and coefficient of Range:
Definition of Range : It is the difference between the largest and smallest values of variable
included in the distribution.
Range for individual observations and discrete series:
$$\text{Range} = L - S$$
Where
L= Largest value or Upper boundary of the highest class
S= Smallest value or Lower boundary of the lowest class
Coefficient of Range:
$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$
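A short sketch of both measures for individual observations, taking the coefficient of range as (L - S) divided by (L + S). The function name and the data are illustrative assumptions.

```python
def range_and_coefficient(xs):
    """Return (range, coefficient of range) for individual observations."""
    largest, smallest = max(xs), min(xs)            # L and S
    value_range = largest - smallest                # L - S
    coefficient = value_range / (largest + smallest)  # (L - S) / (L + S)
    return value_range, coefficient

print(range_and_coefficient([4, 10, 6, 16]))  # L = 16, S = 4
```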
Problem No 1) calculate the Range and Coefficient of Range for the following series.
X 3 5 6 7 8 9
Marks 20 28 40 12 30 40 60
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$
OR
$$S = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}}$$
Where X = values of the variable
n = number of observations in the series
5) Coefficient of Variation
Definition of Coefficient of Variation : The standard deviation must be converted into a
relative measure of dispersion for the purpose of comparison.
The relative measure is known as the coefficient of variation.
The coefficient of variation is obtained by dividing the standard deviation by the mean and
multiplying the result by 100.
$$\text{Coefficient of Variation (CV)} = \frac{\text{Standard Deviation (SD)}}{\text{Mean}} \times 100$$
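The sketch below computes the standard deviation of ungrouped data in two ways (from deviations about the mean, and from the sums of X and X squared) and then the coefficient of variation. Both use the divisor n, as in the formulas of this exercise. The function names and data are illustrative assumptions.

```python
import math

def sd_from_deviations(xs):
    """sqrt( sum((X - mean)^2) / n )"""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def sd_from_sums(xs):
    """sqrt( (sum(X^2) - (sum X)^2 / n) / n ), the same quantity without deviations."""
    n = len(xs)
    return math.sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / n)

def coefficient_of_variation(xs):
    """SD divided by the mean, multiplied by 100."""
    return sd_from_deviations(xs) / (sum(xs) / len(xs)) * 100

data = [2, 4, 4, 4, 5, 5, 7, 9]          # illustrative data: mean 5, variance 4
print(sd_from_deviations(data), sd_from_sums(data), coefficient_of_variation(data))
```

The variance is simply the square of the standard deviation, so it can be read off either function before the square root is taken.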
Problem no 3) Calculate the Range, M.D. (Mean Deviation), S.D. (Standard Deviation),
Variance and C.V. (Coefficient of Variation).
X 8 15 20 17 11 3 9
٭٭٭
EXERCISE NO. 05
MEASURES OF DISPERSION: COMPUTATION OF RANGE, MEAN DEVIATION, QUARTILE &
STANDARD DEVIATION AND VARIANCE AND RESPECTIVE RELATIVE MEASURES (GROUPED
DATA)
Marks 10 20 30 40 50 60
No of students 4 7 15 8 7 2
Frequency 12 18 5 10 9 6
$$M.D. = \frac{\sum f|D|}{N}$$
Mean Deviation for Continuous Series
The method of calculating mean deviation in a continuous series is the same as in a
discrete series.
In a continuous series we have to find the mid-points of the various classes and
take the deviations of these points from the average selected.
$$M.D. = \frac{\sum f|D|}{N}$$
Where
D = m - average
m = mid-point of the class
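A minimal sketch of the mean deviation for a continuous series, assuming the average selected is the arithmetic mean (the median could be used instead). The function name and the classes are illustrative.

```python
def mean_deviation(midpoints, freqs):
    """M.D. = sum(f * |m - average|) / N, with the arithmetic mean as the average."""
    total = sum(freqs)                                              # N
    average = sum(f * m for f, m in zip(freqs, midpoints)) / total
    return sum(f * abs(m - average) for f, m in zip(freqs, midpoints)) / total

# Illustrative classes 0-10, 10-20, 20-30 with mid-points 5, 15, 25
print(mean_deviation([5, 15, 25], [2, 6, 2]))  # mean is 15, so |D| = 10, 0, 10
```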
Standard Deviation ( S.D )
Standard Deviation for Discrete Series
There are three methods for calculating Standard Deviation in discrete series.
i) Actual Mean Method
ii) Assumed Mean Method
iii) Step Deviation Method
$$S = \sqrt{\frac{\sum f d^2}{N}}$$
$$S = \sqrt{\frac{\sum f X^2 - \frac{(\sum f X)^2}{N}}{N}}$$
If the actual mean is a fraction, the calculation takes a lot of time and labour. So, this
method is rarely used in practice.
ii) Assumed Mean Method:
Here deviations are taken not from an actual mean but from an assumed mean. This
method is used, if the given variable values are not in equal intervals.
iii) Step Deviation Method:
This method is adopted when the variable values are in equal intervals.
Standard Deviation – Continuous Series
In a continuous series, the method of calculating S.D. is almost the same as in a discrete series.
But in a continuous series, the mid-values of the class intervals are to be found first.
$$\sigma = \sqrt{\frac{\sum f (x - \bar{X})^2}{N}}$$
$$\sigma = \sqrt{\frac{\sum f x^2 - \frac{(\sum f x)^2}{N}}{N}}$$
Where,
x = the mid-point of the class interval
N = the total sum of frequencies (N = ∑f)
f = the frequency of the respective class interval
Coefficient of Variation:
The Standard deviation is an absolute measure of dispersion.
It is expressed in terms of units in which the original figures are collected and stated. The
relative measure is known as the coefficient of Variation.
$$CV = \frac{S.D.}{\text{Mean}} \times 100$$
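For grouped data the same quantities are weighted by the class frequencies. The sketch below, with assumed function names and illustrative classes, computes the standard deviation from class mid-points using both forms of the formula, and then the coefficient of variation.

```python
import math

def grouped_sd(midpoints, freqs):
    """sqrt( sum(f * (x - mean)^2) / N ) with x the class mid-points."""
    total = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / total
    return math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / total)

def grouped_sd_from_sums(midpoints, freqs):
    """sqrt( (sum(f*x^2) - (sum(f*x))^2 / N) / N ), algebraically the same value."""
    total = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return math.sqrt((sum_fx2 - sum_fx ** 2 / total) / total)

def grouped_cv(midpoints, freqs):
    """S.D. divided by the mean, multiplied by 100."""
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)
    return grouped_sd(midpoints, freqs) / mean * 100

mid = [5, 15, 25]      # illustrative classes 0-10, 10-20, 20-30
freq = [2, 6, 2]
print(grouped_sd(mid, freq), grouped_sd_from_sums(mid, freq), grouped_cv(mid, freq))
```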
Problem no -3) Calculate Range, M.D , S.D , Variance and C.V of the following data (
Ungrouped data )
X 8 15 20 17 11 3 9
X D |D| ( X-X̄ )2
8
15
20
17
11
3
9
∑
Problem no 4 ) Calculate the Mean, M.D , S.D & C.V of the following data ( Grouped Data )
X 5 3 8 4 5 9
f 1 3 5 7 2 1
٭٭٭
EXERCISE NO. 06
SELECTION OF RANDOM SAMPLE USING SIMPLE RANDOM SAMPLING
Correlation :
It is the degree of relationship between variables.
Correlation between two or more variables:
It is an analysis of the covariation between two or more variables (A. M. Tuttle).
Types of correlation
i) Positive and negative correlation
ii) Simple, partial and multiple correlation
iii) Linear and non- linear correlation
Karl Pearson's coefficient of correlation:
$$r = \frac{Cov(x, y)}{SD(x) \cdot SD(y)}$$
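The coefficient can be built from the column totals ∑X, ∑Y, ∑X2, ∑Y2 and ∑XY, which is how a working table for this exercise is usually filled in. This is a sketch with an assumed function name and illustrative data; covariance and standard deviations all use the divisor n.

```python
import math

def pearson_r(xs, ys):
    """r = Cov(x, y) / (SD(x) * SD(y)), computed from column totals."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    cov = sum_xy / n - (sum_x / n) * (sum_y / n)
    sd_x = math.sqrt(sum_x2 / n - (sum_x / n) ** 2)
    sd_y = math.sqrt(sum_y2 / n - (sum_y / n) ** 2)
    return cov / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]      # illustrative data, not from the manual
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))
```

A value close to +1 indicates strong positive correlation, a value close to -1 strong negative correlation, and a value near 0 little or no linear relationship.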
Problem no 1) Find Karl Pearson’s Coefficient of Correlation from the following data
between height of the father ( X ) and son ( Y ) . Comment on the result
X Y X2 Y2 XY
15 17
16 18
17 19
18 20
19 20
20 21
21 21
∑X= ∑Y= ∑ X2 = ∑Y2 = ∑ XY=
٭٭٭
EXERCISE NO. 08
SPEARMAN’S RANK CORRELATION
Rank Correlation: It is studied when no assumption is made about the parameters of the
population.
This method is based on ranks. It is useful for studying qualitative attributes such as
honesty, colour, beauty and intelligence.
The individuals in the group are arranged in order, thereby obtaining for each
individual a number showing his/her rank in the group.
This method was developed by Charles Edward Spearman in 1904.
$$r_c = 1 - \frac{6 \sum D^2}{n^3 - n}$$
Where
rc - rank correlation coefficient
n - Number of pairs of observation
∑ D2 - Sum of the squares of differences between the pairs of ranks.
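rc is positive when the two sets of ranks tend to agree (the direction is the same), and
rc = +1 when there is complete agreement in the order of ranks.
rc is negative when the two sets of ranks tend to disagree (the direction is opposite), and
rc = -1 when there is complete disagreement in the order of ranks.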
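Also, when some ranks are tied,
$$r_c = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots\right]}{n^3 - n}$$
Where m is the number of items whose ranks are common; one correction term is added for
each group of tied ranks.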
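The two formulas can be sketched together, since the tie correction is zero when no ranks are shared. The function names are assumptions; rank 1 is given to the largest value here, and tied values share the average of the ranks they occupy.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; tied values receive the average of their ranks."""
    first_position = {}
    for position, v in enumerate(sorted(values, reverse=True), start=1):
        first_position.setdefault(v, position)
    counts = Counter(values)
    # A group of m tied values starting at position p gets rank p + (m - 1) / 2.
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def rank_correlation(xs, ys):
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))   # D = rank of X - rank of Y
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

# Illustrative data with one tie in X (hypothetical values)
print(rank_correlation([10, 20, 20, 30], [1, 2, 3, 4]))
```

Ranking from the smallest value instead gives the same coefficient, provided both series are ranked in the same direction.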
Problem no 1) Calculate the rank correlation coefficient for the following data
X Y Rank of X Rank of Y D = (Rank of X – Rank of Y) D2
80 69
85 21
83 71
81 48
78 57
50 29
Problem no 2) Calculate the rank correlation coefficient for the following data
X Y Rank of X Rank of Y D = (Rank of X – Rank of Y) D2
40 11
25 11
30 20
7 5
9 15
9 1
45 18
20 8
9 5
42 16
٭٭٭
EXERCISE NO. 9 & 10
REGRESSION : FITTING OF SIMPLE LINEAR REGRESSION EQUATION WITH TEST OF
SIGNIFICANCE OF REGRESSION COEFFICIENT
Water( X) 12 18 24 30 36 42 48
٭٭٭
EXERCISE NO. 11
TEST OF SIGNIFICANCE : PROBLEMS ON ONE SAMPLE, TWO SAMPLE AND PAIRED T-TEST
PROBLEMS ON ONE SAMPLE
45 47 50 52 48 47 49 53 51
Does the mean of 9 items differ significantly from population mean of 47.5?
Problem on Two Sample
Problem no 2) Two independent samples of 8 and 7 items gave the following data. Test whether
the difference between the means of the samples is significant.
Sample(I) X Sample(II) Y X2 Y2
9 10
11 12
13 10
11 14
15 09
12 10
09 08
14 -- --
∑X = ∑Y = ∑X2 = ∑Y2 =
٭٭٭
EXERCISE NO. 12
F TEST FOR EQUALITY OF VARIANCE
If F0 > Fe, i.e. Cal F > Table F: Reject H0
If F0 < Fe, i.e. Cal F < Table F: Accept H0
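The decision rule above can be sketched as follows. This is an illustration under stated assumptions: the variances are estimated with the divisor n - 1, the larger estimate is placed in the numerator (a common convention that this manual does not state explicitly), and the table value of F must be looked up by the reader and passed in. The function names and data are hypothetical.

```python
from statistics import variance

def f_statistic(sample_a, sample_b):
    """Return (Cal F, numerator d.f., denominator d.f.), larger variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)   # divisor n - 1
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

def decide(cal_f, table_f):
    """Apply the rule: Cal F > Table F rejects H0, otherwise H0 is accepted."""
    return "Reject H0" if cal_f > table_f else "Accept H0"

a = [1, 2, 3, 4, 5]       # illustrative samples, not from the manual
b = [2, 4, 6, 8, 10]
cal_f, df1, df2 = f_statistic(a, b)
print(cal_f, df1, df2)    # compare cal_f with the table value for (df1, df2)
```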
Problem no 1) Two random samples drawn from two normal populations gave the following data.
Obtain line-based estimates of the variances of the two populations and test whether the
populations have the same variance.
Sample A 20 16 26 27 23 22 18 24 15 19
Sample B 27 33 42 35 32 34 38 28 41 43
Problem no 2) Test the equality of variance of two samples given below and write the
conclusion
X 18 12 16 21 16 14 19 12
Y 9 11 12 15 14 16 15 14
Problem no 3) Test the equality of variance of two samples given below and write the
conclusion
X 1 4 7 3 2 9 11
Y 11 15 14 12 13 12
Chi-square :
The chi-square test is one of the simplest and most widely used non-parametric tests in
statistical work.
The chi-square (χ²) test was first used by Karl Pearson in 1900. Chi-square describes
the magnitude of the discrepancy between theory and observation.
$$\chi^2 = \sum_{i=1}^{n} \left[\frac{(O_i - E_i)^2}{E_i}\right]$$
Where ,
O is an Observed frequency
E is an Expected frequency
2 × 2 Contingency Table:
Under the null hypothesis of independence of attributes, the value of chi-square for a 2 × 2
contingency table is
$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}$$
2 × 2 Contingency Table:
a        b        a+b
c        d        c+d
a+c      b+d      N = a+b+c+d
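As a check on the two formulas, the sketch below computes chi-square for a 2 × 2 table in both ways: from observed and expected frequencies, and from the shortcut whose denominator is the product of the four marginal totals (a+b)(c+d)(a+c)(b+d). Function names and cell counts are illustrative assumptions.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def expected_2x2(a, b, c, d):
    """Expected frequencies under independence: row total * column total / N."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [r * col / n for r in row_totals for col in col_totals]  # order a, b, c, d

def chi_square_2x2(a, b, c, d):
    """Shortcut: N (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

a, b, c, d = 10, 20, 30, 40      # illustrative counts, not from the manual
general = chi_square([a, b, c, d], expected_2x2(a, b, c, d))
shortcut = chi_square_2x2(a, b, c, d)
print(general, shortcut)         # the two values agree up to rounding
```

A 2 × 2 table has (2 - 1)(2 - 1) = 1 degree of freedom, so the calculated value is compared with the table value of chi-square for 1 d.f.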
Problem no -1) From the following data regarding colour of eyes of father and son , test
whether the colour of son’s eyes is associated with that of father’s.
Affected 12 28
Unaffected 13 07
Total
Analysis of Variance :
✓ Sum of Squares: It means the sum of the squares of the deviations of the variates from
their mean.
✓ Degree of freedom: The number of degrees of freedom is one less than the number of
variates in the sample concerned.
• If the number of variates is n, then the d.f. is (n – 1)
• If the number of treatments is t, then the d.f. is (t – 1)
✓ Variance: It is obtained by dividing the Sum of Squares by the corresponding degrees of
freedom.
$$\text{Variance} = \frac{\text{Sum of Squares}}{\text{Degrees of Freedom}}$$
Assumption of ANOVA :
• Normality: The values in each group are normally distributed.
• Homogeneity: The variance within each group should be equal for all groups:
$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_c^2$
• Independent error: The errors should be independent for each value.
One Way Classification:
In one-way classification the data are classified according to only one criterion. The null
hypothesis is
$$H_0 : \mu_1 = \mu_2 = \mu_3 = \mu_4 = \dots = \mu_k$$
i.e. the arithmetic means of the populations from which the k samples were randomly drawn are
equal to one another.
The steps in carrying out the analysis are
▪ Calculate the variance between the sample.
▪ Calculate the variance within the sample.
▪ Calculate the ratio of F
$$F = \frac{\text{Between-column variance}}{\text{Within-column variance}}$$
$$F = \frac{S_1^2}{S_2^2}$$
▪ Compare the calculated value of F with the table value of F for the given degrees of freedom
at a chosen level of significance. Generally, we take the 5 % level of significance.
▪ If Cal F > Table F, the difference is significant
▪ If Cal F < Table F, the difference is not significant
Note: The procedure for ANOVA is applicable to both equal and unequal sample sizes.
Where
SST- Total Sum of Square variation
SSC- Sum of Squares between the samples.( Columns)
SSE-Sum of Square within the samples ( Rows)
MSC-Mean Sum of Squares between the samples
MSE -Mean Sum of Squares within the samples.
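The quantities defined above can be sketched for a one-way classification as follows. The function name and the samples are illustrative assumptions; each inner list is one sample (column), the between-sample degrees of freedom are k - 1 and the within-sample degrees of freedom are N - k, where N is the total number of observations.

```python
def one_way_anova(samples):
    """Return SST, SSC, SSE, MSC, MSE and F for a list of samples (columns)."""
    values = [x for sample in samples for x in sample]
    n_total, k = len(values), len(samples)
    grand_mean = sum(values) / n_total
    sst = sum((x - grand_mean) ** 2 for x in values)              # total variation
    ssc = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2        # between samples
              for s in samples)
    sse = sst - ssc                                               # within samples
    msc = ssc / (k - 1)
    mse = sse / (n_total - k)
    return {"SST": sst, "SSC": ssc, "SSE": sse, "MSC": msc, "MSE": mse, "F": msc / mse}

result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])   # illustrative samples
print(result)   # compare result["F"] with the table value for (k - 1, N - k) d.f.
```

Because each sample contributes its own size to SSC, the same function works for unequal sample sizes.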
Analysis of Variance in two-way classification model
In two-way classification the data are classified according to two different criteria or factors.
The procedure for analysis of variance is somewhat different from the one followed
while dealing with problems of one-way classification.
In two-way classification the ANOVA table takes the following form.
Source of variation    SS (Sum of Squares)    Degrees of Freedom    MS (Mean Sum of Squares)          Variance Ratio (F)
Between columns        SSC                    c - 1                 MSC = SSC / (c - 1)               MSC / MSE
Between rows           SSR                    r - 1                 MSR = SSR / (r - 1)               MSR / MSE
Residual or error      SSE                    (c - 1)(r - 1)        MSE = SSE / [(c - 1)(r - 1)]
Total                  SST                    cr - 1
Where
SSC - Sum of Squares between columns
SSR - Sum of Squares between rows
SSE - Sum of Squares due to error
Total number of degree of freedom ( cr-1 )
Where c refers to number of columns
r refers to number of rows
No of d.f between column ( c -1)
No of d.f between rows ( r – 1)
No of d.f. for residual ( c – 1 ) ( r – 1)
Residual or error Sum of Squares:
It is the Total Sum of Squares minus the Sum of Squares between columns minus the Sum of
Squares between rows.
$$SSE = SST - SSC - SSR$$
$$F(v_1, v_2) = \frac{MSC}{MSE}$$
Where
v1 = (c – 1)
v2 = (c – 1)(r – 1)
$$F(v_1, v_2) = \frac{MSR}{MSE}$$
Where
v1 = (r – 1)
v2 = (c – 1)(r – 1)
It should be carefully noted that v1 may not be the same in both cases: in one case
v1 = (c – 1) and in the other case v1 = (r – 1).
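A minimal sketch of the two-way computation, assuming one observation per cell of an r × c table. The function name and the data are illustrative; the sums of squares and the two F ratios follow the definitions given above, with the error sum of squares obtained by subtraction.

```python
def two_way_anova(table):
    """table[i][j] is the single observation in row i and column j."""
    r, c = len(table), len(table[0])
    values = [x for row in table for x in row]
    grand_mean = sum(values) / (r * c)
    col_means = [sum(row[j] for row in table) / r for j in range(c)]
    row_means = [sum(row) / c for row in table]
    sst = sum((x - grand_mean) ** 2 for x in values)            # total
    ssc = r * sum((m - grand_mean) ** 2 for m in col_means)     # between columns
    ssr = c * sum((m - grand_mean) ** 2 for m in row_means)     # between rows
    sse = sst - ssc - ssr                                       # residual or error
    msc = ssc / (c - 1)
    msr = ssr / (r - 1)
    mse = sse / ((c - 1) * (r - 1))
    return {"SSC": ssc, "SSR": ssr, "SSE": sse,
            "F_columns": msc / mse,   # d.f. (c - 1), (c - 1)(r - 1)
            "F_rows": msr / mse}      # d.f. (r - 1), (c - 1)(r - 1)

anova = two_way_anova([[1, 2, 3],
                       [2, 4, 6],
                       [3, 6, 9]])    # illustrative 3 x 3 table
print(anova)
```

Each F ratio is compared with its own table value, since the first degree of freedom differs between the column test and the row test.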
٭٭٭