
STAT-231 Manual

Uploaded by

Shaikh Sahil

Ex. No.   Title   Date   Sign

1         Graphical presentation: Histogram, Frequency curve, Frequency polygon, Cumulative frequency curve (Ogive curve)
2         Measures of Central Tendency: Computation of arithmetic mean, mode, median, GM, HM, quartiles, deciles & percentiles (Ungrouped data)
3         Computation of arithmetic mean, mode, median, quartiles, deciles & percentiles (Grouped data)
4         Measures of Dispersion: Computation of range, mean deviation, quartile deviation, standard deviation and variance and respective relative measures (Ungrouped data)
5         Measures of Dispersion: Computation of range, mean deviation, quartile deviation, standard deviation and variance and respective relative measures (Grouped data)
6         Selection of random sample using simple random sampling
7         Correlation: Computation of Karl Pearson’s Coefficient of Correlation with its test of significance
8         Spearman’s rank Correlation
9 & 10    Regression: Fitting of Simple Linear Regression equation with test of significance of regression coefficient
11        Test of Significance: Problems on One Sample, Two Sample and Paired t-test
12        F test for equality of variance
13 & 14   Chi Square Test of Goodness of Fit, Chi square test of independence of Attributes for 2 X 2 contingency table
15        Analysis of Variance: Analysis of Variance one-way and two-way classification

EXERCISE NO. 01
GRAPHICAL PRESENTATION: HISTOGRAM, FREQUENCY CURVE, FREQUENCY POLYGON,
CUMULATIVE FREQUENCY CURVE ( OGIVE CURVE)

Histogram
A histogram is a graph of adjoining bars showing the frequency of occurrence of each value (or class) of the variable being analysed.
In a histogram, data are plotted as a series of rectangles. Class intervals are shown on the X-axis and the frequencies on the Y-axis.
The height of each rectangle represents the frequency of the class interval. Each rectangle adjoins the next so as to give a continuous picture. Such a graph is also called a staircase or block diagram.
However, we cannot construct a histogram for a distribution with open-end classes. It is also quite misleading if the distribution has unequal class intervals and suitable adjustments in the frequencies are not made.
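The adjustment for unequal class intervals is usually made by plotting frequency density (frequency divided by class width) as the bar height. The sketch below illustrates this under stated assumptions: the classes and frequencies are hypothetical, and the helper name `frequency_density` is not part of the manual.

```python
def frequency_density(classes, frequencies):
    """Return frequency per unit of class width for each (lower, upper) class."""
    return [f / (upper - lower) for (lower, upper), f in zip(classes, frequencies)]


# Hypothetical distribution with unequal class widths (illustration only).
classes = [(0, 10), (10, 20), (20, 40), (40, 80)]
frequencies = [5, 8, 12, 8]

densities = frequency_density(classes, frequencies)
for (lower, upper), height in zip(classes, densities):
    print(f"{lower}-{upper}: bar height = {height}")
```

With equal class widths the densities are proportional to the frequencies, so the raw frequencies can be plotted directly.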
Solve the following problems
Problem No-1) Draw a histogram for the following data.

Daily Wages 0-50 50-100 100-150 150-200 200-250 250-300

Number of Workers 8 16 27 19 10 8

Problem No-2) For the following data, draw a histogram

Marks 21-30 31-40 41-50 51-60 61-70 71-80

No of Students 6 15 22 31 17 9

Frequency Curve
If the midpoints of the upper sides of the rectangles of a histogram are connected by a smooth freehand curve, the resulting diagram is called a frequency curve.
Problem No -3) Draw a frequency curve for the following data

Monthly wages (Rs) 0-1000 1000-2000 2000-3000 3000-4000 4000-5000 5000-6000 6000-7000 7000-8000

No of Family 21 35 56 74 63 40 29 14

Frequency Polygon
If we mark the midpoints of the top horizontal sides of the rectangles in a histogram and join
them by straight lines, the figure so formed is called a Frequency Polygon.
This is done under the assumption that the frequencies in a class interval are evenly
distributed throughout the class. The area of the polygon is equal to the area of the
histogram, because the area left outside is just equal to the area included in it.
Problem No-4) Draw a frequency polygon for the following data.

Weight ( Kg) 30-35 35-40 40-45 45-50 50-55 55-60 60-65

Students No 4 7 10 18 14 8 3

Cumulative Frequency Curve (Ogive Curve):


For a set of observations, we know how to construct a frequency distribution. In
some cases, we may require the number of observations less than a given value or more than a
given value. This is obtained by accumulating (adding) the frequencies up to (or above)
the given value. This accumulated frequency is called the cumulative frequency.
These cumulative frequencies, listed in a table, form a cumulative frequency
table. The curve obtained by plotting the cumulative frequencies is called a cumulative
frequency curve or an ogive.
There are two methods of constructing ogive namely:
i) The ‘less than ogive’ method
ii) The ‘more than ogive’ method
The ‘less than ogive’ method- In this method we start with the upper limits of the classes
and go on adding the frequencies. When these frequencies are plotted, we get a rising
curve.
The ‘more than ogive’ method- In this method we start with the lower limits of the classes
and from the total frequencies we subtract the frequency of each class. When these
frequencies are plotted, we get a declining curve.
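The two accumulations can be sketched as follows, using the class limits and frequencies of Problem No-5. The variable names are illustrative assumptions; only the points to be plotted are computed, not the graph itself.

```python
from itertools import accumulate

lower_limits = [20, 30, 40, 50, 60, 70, 80, 90]
upper_limits = [30, 40, 50, 60, 70, 80, 90, 100]
frequencies = [4, 6, 13, 25, 32, 19, 8, 3]

# 'Less than' ogive: running totals, plotted against the upper limits.
less_than = list(accumulate(frequencies))

# 'More than' ogive: total frequency minus the frequencies of all earlier
# classes, plotted against the lower limits.
total = sum(frequencies)
more_than = [total - before for before in [0] + less_than[:-1]]

for upper, cf in zip(upper_limits, less_than):
    print(f"less than {upper}: {cf}")
for lower, cf in zip(lower_limits, more_than):
    print(f"more than {lower}: {cf}")
```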
Problem No-5) Draw the Ogives for the following data.

Class Interval 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100

Frequency 4 6 13 25 32 19 8 3

‫٭٭٭‬
EXERCISE NO. 02
MEASURES OF CENTRAL TENDENCY: COMPUTATION OF ARITHMETIC MEAN, MODE,
MEDIAN, GM, HM , QUARTILE, DECILES& PERCENTILE (UNGROUPED DATA)

Measures of Central Tendency:


The collected data have to be presented in a concise form, and their representative values
expressed numerically. These representative values are called measures.
A measure of central tendency is the point of centre or equilibrium around which the whole
data set is balanced.

Measures of Central Tendency
Simple Averages: Mean, Median, Mode
Special Averages: Geometric Mean, Harmonic Mean

Definition of Arithmetic Mean : It is defined as the sum of the observations divided by the
number of observations.
Arithmetic Mean for Raw data or Ungrouped data or discrete series:
If the variable X assumes n values X1, X2, X3, ..., Xn, then the mean X̄ is given by
\[ \bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \]
Problem no -1) Find the arithmetic mean of the following data
2,4,6,8,10
Problem no-2) If the weights of 5 ear heads of sorghum are 100,102,118, 124, 126 g then the
mean weight is ?
Arithmetic Mean for discrete and continuous frequency distribution
a) For Discrete Frequency Distribution
\[ \bar{X} = \frac{f_1 X_1 + f_2 X_2 + f_3 X_3 + \dots + f_k X_k}{n} = \frac{\sum_{i=1}^{k} f_i X_i}{n} \]
Where,
f = Frequency of individual class
n = Sum of all frequencies
k = Number of classes
b) For Continuous Frequency Distribution
\[ \bar{X} = \frac{f_1 m_1 + f_2 m_2 + f_3 m_3 + \dots + f_k m_k}{n} = \frac{\sum_{i=1}^{k} f_i m_i}{n} \]
Where,
f = Frequency of individual class
n = Sum of all frequencies
k = Number of classes
m = midpoint of class
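Both formulas are the same weighted average; only the values being weighted differ (X for a discrete distribution, class midpoints m for a continuous one). A minimal sketch, using the data of Problem no -3 and Problem no -4; the helper name `weighted_mean` is an illustrative assumption.

```python
def weighted_mean(values, frequencies):
    """Sum of f * value divided by the sum of the frequencies."""
    return sum(f * x for x, f in zip(values, frequencies)) / sum(frequencies)


# Discrete frequency distribution (Problem no -3).
marks = [5, 10, 15, 20, 25, 30]
students = [8, 18, 12, 9, 7, 6]
discrete_mean = weighted_mean(marks, students)

# Continuous frequency distribution (Problem no -4): use class midpoints.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
class_frequencies = [12, 18, 27, 20, 17, 6]
midpoints = [(lower + upper) / 2 for lower, upper in classes]
continuous_mean = weighted_mean(midpoints, class_frequencies)

print(discrete_mean, continuous_mean)
```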
Problem no -3) Find the arithmetic mean of the following data

Marks ( Xi) No of students ( fi) fiXi

5 8

10 18

15 12

20 9

25 7

30 6

Problem no -4) Calculate the arithmetic mean of the following data

Marks ( Xi) No of students ( fi) mi ( Midpoint) fimi

0-10 12

10-20 18

20-30 27

30-40 20

40-50 17

50-60 6

Median :
The value which divides the whole distribution or series into two equal parts, after arranging
the data in ascending or descending order, is called the median.
Mode :
The value which occurs most frequently in the distribution or series is called the mode.
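The two definitions can be sketched with the data of Problem no 7 and Problem no 10. The even-n rule in the comment (average of the two middle items) is the usual convention and is an addition to the text; `statistics.multimode` returns every value that shares the highest count, so a series may have more than one mode.

```python
from statistics import median, multimode

data = [25, 18, 27, 10, 8, 30, 42, 20, 53]

# Arrange in ascending order and pick the middle item.
ordered = sorted(data)
n = len(ordered)
if n % 2 == 1:
    manual_median = ordered[(n - 1) // 2]          # the ((n + 1) / 2)-th item
else:
    manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Mode: the value(s) occurring most frequently.
series = [2, 7, 10, 15, 12, 7, 14, 24, 10, 7, 20, 10]
modes = multimode(series)

print(manual_median, median(data), modes)
```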
Problem no 5) Find the mean and median for the paddy variety from the following data.
Yield ( q/ha) : 217.5 , 358.2 , 573.5 , 332.5 , 287.0 , 875.5 , 788.3 , 881.3 , 828.3
Problem no 6) calculate the mean for
2, 4, 6, 8, 10
Problem no 7) Find the median for the following data
25, 18, 27, 10, 8, 30, 42, 20 , 53
Problem no 8) Calculate the mode for the following data
2 ,7, 18, 15, 10, 17, 8, 10, 2
Problem no 9 ) Calculate the mode for the following series
1, 12, 10, 15, 24, 30
Problem no 10) Calculate the mode of the following data
2, 7,10, 15, 12, 7, 14, 24, 10, 7, 20, 10
Harmonic Mean : For a set of observations X1, X2, X3, ..., Xn, it is defined as the reciprocal
of the arithmetic mean of the reciprocals of the given values.
\[ H.M. = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} \]

Problem no 11) From the given data calculate Harmonic Mean


5, 10, 17, 4, 30
Geometric Mean :
The G.M. of a series containing n observations is the nth root of the product of the values X1,
X2, X3, ..., Xn.
\[ GM = \sqrt[n]{X_1 \cdot X_2 \cdot X_3 \cdots X_n} = (X_1 \cdot X_2 \cdot X_3 \cdots X_n)^{1/n} \]
Taking log on both sides
\[ \log GM = \frac{1}{n} \log (X_1 \cdot X_2 \cdot X_3 \cdots X_n) \]
\[ \log GM = \frac{1}{n} (\log X_1 + \log X_2 + \dots + \log X_n) = \frac{1}{n} \sum_{i=1}^{n} \log X_i \]
\[ GM = \text{antilog} \left( \frac{\sum_{i=1}^{n} \log X_i}{n} \right) \]
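The log route for the geometric mean, and the reciprocal route for the harmonic mean, can be sketched as below. The function names are illustrative assumptions; base-10 logs are used so that the antilog is a power of 10, as with log tables. The series are those of Problem no 11 and Problem no-12.

```python
import math


def geometric_mean(values):
    """Antilog of the mean of the logs (all values must be positive)."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log


def harmonic_mean(values):
    """Number of items divided by the sum of the reciprocals."""
    return len(values) / sum(1 / x for x in values)


gm = geometric_mean([2, 4, 6, 8, 10, 12])
hm = harmonic_mean([5, 10, 17, 4, 30])
print(round(gm, 4), round(hm, 4))
```

For any series of positive values, the harmonic mean does not exceed the geometric mean, which in turn does not exceed the arithmetic mean; this gives a quick check on hand calculations.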
Problem no-12) Calculate G.M for the following series
2, 4, 6, 8, 10, 12
‫٭٭٭‬
EXERCISE NO. 03
COMPUTATION OF ARITHMETIC MEANS, MODE, MEDIAN, QUARTILES, DECILES &
PERCENTILES (GROUPED DATA)

Grouped Data:
In grouped distribution values are associated with frequency. Grouping can be in the form of
discrete distribution or continuous frequency distribution.
Arithmetic Mean for Grouped data or Frequency distribution (Discrete Series):
\[ \bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{N} \]
Where
x = the value of the variable
f = the frequency of the individual class
N = the sum of the frequencies or total frequency
k = the number of classes
Problem no 1) Given the following frequency distribution. Calculate the arithmetic mean (
Discrete Series)

Marks 5 10 15 20 25 30

Number of Students 8 18 12 9 7 6

Arithmetic Mean for Grouped data or Frequency distribution (Continuous Series):

\[ \bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{N} \]
Where
m = the mid point of the individual class
f = the frequency of the individual class
N = the sum of the frequencies or total frequency
k = the number of classes
Problem no 2) Given the following frequency distribution. Calculate the arithmetic mean (
Continuous Series)

Marks 0-10 10-20 20-30 30-40 40-50 50-60

Number of Students 5 7 12 15 13 8

Median ( Grouped Data )


Cumulative Frequency (cf) :
The cumulative frequency of each class is the sum of the frequency of that class and the
frequencies of all previous classes, i.e. the frequencies are added successively, so that the last
cumulative frequency gives the total number of items.
Median (Discrete Series)
Step 1 - Find the cumulative frequencies.
Step 2 - Find (n + 1)/2, where n is the total frequency.
Step 3 - Locate in the cumulative frequencies the value just greater than (n + 1)/2.
Step 4 - The corresponding value of x is the median.
Formula for Median (Discrete Series)
\[ \text{Median} = \text{size of the } \left( \frac{n + 1}{2} \right)^{th} \text{ item} \]
For example, for the family-size data of Problem no 3 below (n = 50):
\[ \text{Median} = \text{size of the } \left( \frac{50 + 1}{2} \right)^{th} \text{ item} = \text{size of the } 25.5^{th} \text{ item} \]

The cumulative frequency just greater than 25.5 is 29 and the value of x corresponding to 29
is 6. Hence the median size is 6 members per family.
Note: The median is an appropriate average here because a fractional value given by the mean
does not indicate the average number of members in a family.
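The steps above can be sketched as a short routine. The function name `discrete_median` is an illustrative assumption; the data are those of Problem no 3. The comparison uses "greater than or equal to", so that a position which coincides exactly with a cumulative frequency still selects that class.

```python
from itertools import accumulate


def discrete_median(values, frequencies):
    """Value of x whose cumulative frequency first reaches the (n + 1)/2-th item."""
    position = (sum(frequencies) + 1) / 2
    for x, cf in zip(values, accumulate(frequencies)):
        if cf >= position:
            return x


members = [4, 5, 6, 7, 8, 9, 10]
families = [6, 10, 13, 9, 5, 4, 3]

median_size = discrete_median(members, families)
print(median_size)
```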
Problem no 3) The following data pertaining to the number of members in a family. Find the
median size of the family.

Number of members x 4 5 6 7 8 9 10

Frequency f 6 10 13 9 5 4 3

Median (Continuous Series)


Step 1 - Find the cumulative frequencies.
Step 2 - Find the (n/2)th item.
Step 3 - Locate in the cumulative frequencies the value just greater than n/2. The corresponding
class interval is called the median class.
Formula for Median (Continuous Series)
\[ \text{Median} = l + \frac{\frac{n}{2} - cf}{f} \times i \]
Where l = Lower limit of the median class
cf = Cumulative frequency of the class preceding the median class
i = Width of the median class
f = Frequency of the median class
n = Total frequency
Note: If the class intervals are of the inclusive type, convert them into the exclusive type
(true class intervals) and take the lower limit of the true median class as l.
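A sketch of the continuous-series calculation, including the inclusive-to-exclusive conversion described in the note. The function name and the `gap` parameter (the distance between one class's upper limit and the next class's lower limit, e.g. 1 for 0-4, 5-9) are illustrative assumptions; the data are those of Problem no 4.

```python
def grouped_median(classes, frequencies, gap=0):
    """Median = l + ((n/2 - cf) / f) * i, using true class limits."""
    n = sum(frequencies)
    cf_before = 0                      # cumulative frequency preceding the class
    for (lower, upper), f in zip(classes, frequencies):
        if cf_before + f >= n / 2:     # this is the median class
            true_lower = lower - gap / 2
            width = (upper + gap / 2) - true_lower
            return true_lower + (n / 2 - cf_before) / f * width
        cf_before += f


classes = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
frequencies = [5, 8, 10, 12, 7, 6, 3, 2]

median_value = grouped_median(classes, frequencies, gap=1)
print(round(median_value, 2))
```

Here the median class is 15-19 (true limits 14.5-19.5), with cf = 23, f = 12 and i = 5.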
Problem no 4) calculate the median from the following data

X 0-4 5-9 10-14 15-19 20-24 25-29 30-34 35-39

Frequency f 5 8 10 12 7 6 3 2

Mode ( Mo) : The mode refers to that value in a distribution , which occur most frequently .
It is an actual value which has the highest concentration of items in and around it.
Discrete Data : The highest frequency and corresponding value of X is mode.
Continuous Distribution: The class interval with the highest frequency is called the modal
class, and the mode is obtained as
\[ \text{Mode} = M_o = l + \left[ \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \right] \times i \]

Where ,
l= Lower limit of the modal class
f1= frequency of modal class
f0= frequency of the class preceding the modal class
f2= frequency of the class succeeding the modal class
i = Width of class interval
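A sketch of the modal-class formula with the data of Problem no 5. The function name is an illustrative assumption, and so is the treatment of a modal class at either end of the table (the missing neighbour's frequency is taken as 0), which the manual does not cover.

```python
def grouped_mode(classes, frequencies):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * i for the modal class."""
    k = frequencies.index(max(frequencies))      # first class with the highest frequency
    f1 = frequencies[k]
    f0 = frequencies[k - 1] if k > 0 else 0      # assumption: 0 if no preceding class
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    lower, upper = classes[k]
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [5, 8, 12, 7, 5, 3]

mode_value = grouped_mode(classes, frequencies)
print(round(mode_value, 2))
```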
Problem no 5) calculate the mode for the following frequency distribution

Class interval 0-10 10-20 20-30 30-40 40-50 50-60

Frequency f 5 8 12 7 5 3

Empirical relationship between Averages


In a symmetrical distribution the three simple averages coincide:
\[ \text{Mean} = \text{Median} = \text{Mode} \]
For a moderately asymmetrical distribution, the relationship between them is given by
Karl Pearson's empirical formula:
\[ \text{Mode} = 3\,\text{Median} - 2\,\text{Mean} \]
Problem no 6) If the mean and median of a moderately asymmetrical series are 26.8 and
27.9 respectively. What would be its most probable mode?
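Harmonic Mean (H.M.) for grouped data
\[ H.M. = \frac{N}{\sum_{i=1}^{k} \frac{f_i}{x_i}} \]
Geometric Mean (G.M.) for grouped data
\[ G.M. = \text{antilog} \left( \frac{\sum_{i=1}^{k} f_i \log x_i}{N} \right) \]
Where N is the total frequency (∑f) and k is the number of classes.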

Problem no7) Calculate the A.M., G.M. and H.M of the following frequency distribution.

X 2 4 6 8 10 12 14

Frequency f 3 5 8 12 15 4 3

Problem no 8) Calculate the A.M., G.M. and H.M of the following frequency distribution.

Class interval 0-10 10-20 20-30 30-40 40-50 50-60 60-70

Frequency f 4 6 10 2 15 8 5

‫٭٭٭‬
EXERCISE NO . 04
MEASURES OF DISPERSION: COMPUTATION OF RANGE, MEAN DEVIATION, QUARTILE
DEVIATION, STANDARD DEVIATION AND VARIANCE AND
RESPECTIVE RELATIVE MEASURES (UNGROUPED DATA)

Dispersion:
Dispersion means the degree of scatter (spread) of the observations.
Measures of Dispersion
A) Absolute measures      B) Relative measures
Range                     Co-efficient of Range
Quartile Deviation        Co-efficient of Quartile Deviation
Mean Deviation            Co-efficient of Mean Deviation
Standard Deviation        Co-efficient of Standard Deviation
1) Range and coefficient of Range:
Definition of Range : It is the difference between the largest and smallest values of variable
included in the distribution.
Range for the individual observations and discrete series
\[ \text{Range} = L - S \]
Where
L = Largest value or Upper boundary of the highest class
S = Smallest value or Lower boundary of the lowest class
Coefficient of Range :
\[ \text{Coefficient of Range} = \frac{L - S}{L + S} \]

Problem No 1) calculate the Range and Coefficient of Range for the following series.

X 3 5 6 7 8 9

2) Quartile Deviation and Coefficient of Quartile Deviation


Definition of Quartile Deviation : Quartile Deviation is half of the difference between
the third and first quartiles. Hence, it is called the Semi Inter Quartile Range.
\[ Q.D. = \frac{Q_3 - Q_1}{2} \]
Definition of Coefficient of Quartile Deviation:
\[ \text{Coefficient of } Q.D. = \frac{Q_3 - Q_1}{Q_3 + Q_1} \]
Individual series or Ungrouped
Problem No 2) Find out the value of Q.D. and its coefficient from the following data

Marks 20 28 40 12 30 40 60

3) Mean Deviation and Coefficient of Mean Deviation:


Definition of Mean Deviation : It is the arithmetic mean of the deviations of a series
computed from any measure of central tendency, i.e. mean, median or mode, with all the
deviations taken as positive, i.e. signs are ignored.
Coefficient of Mean Deviation :
\[ \text{Coefficient of Mean Deviation} = \frac{\text{Mean Deviation}}{\text{Mean or Median or Mode}} \]
Computation of Mean Deviation - Individual Series
▪ Calculate the average (mean, median or mode) of the series.
▪ Take the deviations of the items from the average, ignoring signs, and denote these deviations
by |D|.
▪ Compute the total of these deviations, i.e. ∑|D|.
▪ Divide this total by the number of items.
Symbolically,
\[ M.D. = \frac{\sum |D|}{n} \]
Where ∑|D| = ∑|X - X̄| is the sum of the moduli (absolute values) of the deviations (X - X̄),
i.e. with the negative signs ignored.
4) Standard Deviation and Co-efficient of Standard Deviation
Karl Pearson introduced the concept of Standard deviation in 1893. Standard deviation is
also called Root-Mean-Square Deviation.
Definition of Standard Deviation: It is defined as the positive square root of the arithmetic
mean of the squares of the deviations of the given observations from their arithmetic mean.
The Standard Deviation is denoted by the Greek letter σ (sigma).
Standard Deviation for Individual Series
There are two methods of calculating Standard Deviation for Individual Series
▪ Deviations taken from Actual Mean
▪ Deviations taken from Assumed Mean
Deviations taken from Actual Mean
This method is adopted when the mean is a whole number.
Step 1: Find out the actual mean of the series (X̄)
Step 2: Find out the deviation of each value from the mean (x = X - X̄)
Step 3: Square the deviations and take the total of the squared deviations, ∑x2
Step 4: Divide the total of the squared deviations (∑x2) by the number of observations, ∑x2/n
The square root of ∑x2/n is the Standard Deviation.
Thus

\[ \sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} \]
OR

\[ \sigma = \sqrt{\frac{1}{n} \left[ \sum X^2 - \frac{(\sum X)^2}{n} \right]} \]
Where X = Values of the variable
n = Number of observations of the series
5) Coefficient of Variation
Definition of Coefficient of Variation : The standard deviation must be converted into a
relative measure of dispersion for the purpose of comparison.
This relative measure is known as the coefficient of variation.
The coefficient of variation is obtained by dividing the standard deviation by the mean and
multiplying by 100.
\[ \text{Coefficient of Variation (CV)} = \frac{\text{Standard Deviation (SD)}}{\text{Mean}} \times 100 \]
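The absolute and relative measures defined in this exercise can be computed together. The sketch below uses the series of Problem no 3 and takes deviations from the mean with divisor n, as in the definitions above; the variable names are illustrative assumptions. The shortcut form of the standard deviation is computed as a cross-check.

```python
import math

data = [8, 15, 20, 17, 11, 3, 9]
n = len(data)
mean = sum(data) / n

# Range and its coefficient.
data_range = max(data) - min(data)
coeff_range = data_range / (max(data) + min(data))

# Mean deviation about the mean: signs of the deviations are ignored.
mean_deviation = sum(abs(x - mean) for x in data) / n

# Variance and standard deviation (divisor n).
variance = sum((x - mean) ** 2 for x in data) / n
std_dev = math.sqrt(variance)

# Shortcut form: sqrt((sum X^2 - (sum X)^2 / n) / n).
shortcut_sd = math.sqrt((sum(x * x for x in data) - sum(data) ** 2 / n) / n)

# Coefficient of variation in percent.
cv = std_dev / mean * 100

print(data_range, round(mean_deviation, 3), round(std_dev, 3), round(variance, 3), round(cv, 2))
```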

Problem no 3) Calculate the Range, M.D. (Mean Deviation), S.D. (Standard Deviation),
Variance and C.V. (Coefficient of Variation).

X 8 15 20 17 11 3 9

‫٭٭٭‬
EXERCISE NO. 05
MEASURES OF DISPERSION: COMPUTATION OF RANGE, MEAN DEVIATION, QUARTILE &
STANDARD DEVIATION AND VARIANCE AND RESPECTIVE RELATIVE MEASURES (GROUPED
DATA)

Range for The Continuous Series


𝑅𝑎𝑛𝑔𝑒 = 𝐿 − 𝑆
Method 1:
Where L= Upper boundary of the highest class
S= Lower boundary of the lowest class
Method 2:
Where L= Mid value of the highest class
S= Mid value of the lowest class
Coefficient of Range :
\[ \text{Coefficient of Range} = \frac{L - S}{L + S} \]

Quartile Deviation and Coefficient of Quartile Deviation


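Definition of Quartile Deviation : Quartile Deviation is half of the difference between
the third and first quartiles. Hence, it is called the Semi Inter Quartile Range.
\[ Q.D. = \frac{Q_3 - Q_1}{2} \]
Definition of Coefficient of Quartile Deviation:
\[ \text{Coefficient of } Q.D. = \frac{Q_3 - Q_1}{Q_3 + Q_1} \]
Quartiles for computing the Quartile Deviation
\[ Q_1 = \text{size of the } \left( \frac{N + 1}{4} \right)^{th} \text{ item} \]
\[ Q_3 = \text{size of the } \left( \frac{3(N + 1)}{4} \right)^{th} \text{ item} \]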
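For a discrete series the quartile positions are located in the cumulative frequencies, in the same way as the median. A minimal sketch with the data of Problem no 1 of this exercise; the helper name `item_at` is an illustrative assumption.

```python
from itertools import accumulate


def item_at(position, values, frequencies):
    """Value whose cumulative frequency first reaches the given item position."""
    for x, cf in zip(values, accumulate(frequencies)):
        if cf >= position:
            return x


marks = [10, 20, 30, 40, 50, 60]
students = [4, 7, 15, 8, 7, 2]
N = sum(students)

q1 = item_at((N + 1) / 4, marks, students)        # size of the (N + 1)/4-th item
q3 = item_at(3 * (N + 1) / 4, marks, students)    # size of the 3(N + 1)/4-th item

quartile_deviation = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)
print(q1, q3, quartile_deviation, round(coeff_qd, 3))
```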
For Discrete Series or Grouped data
Problem no 1) calculate Q.D. and its coefficient of the following distribution.

Marks 10 20 30 40 50 60

No of students 4 7 15 8 7 2

For Continuous series


Problem No 2) Calculate Q.D. and its coefficient of the following distribution.

Class Interval 10-20 20-30 30-40 40-50 50-60 60-70

Frequency 12 18 5 10 9 6

Mean Deviation for Discrete Series


Step 1 - Find out an average (mean, median or mode).
Step 2 - Find out the deviations of the variable values from the average, ignoring signs, and
denote them by |D|.
Step 3 - Multiply the deviation of each value by its respective frequency and find the total,
∑f|D|.
Step 4 - Divide the total ∑f|D| by the total frequency N.

\[ M.D. = \frac{\sum f |D|}{N} \]
Mean Deviation for Continuous Series
The method of calculating mean deviation in a continuous series is the same as in the
discrete series.
In a continuous series we have to find the mid points of the various classes and
take the deviations of these points from the average selected.

\[ M.D. = \frac{\sum f |D|}{N} \]
Where
|D| = |m - average|
m = Mid-point of the class
Standard Deviation ( S.D )
Standard Deviation for Discrete Series
There are three methods for calculating Standard Deviation in discrete series.
i) Actual Mean Method
ii) Assumed Mean Method
iii) Step Deviation Method

i) Actual Mean Method:


Steps :
✓ Calculate the mean of the series.
✓ Find the deviations of the various items from the mean, i.e. X – X̄ = d
✓ Square the deviations (d2) and multiply by the respective frequencies (f) to get
fd2
✓ Total the products (∑fd2), then apply the formula

\[ \sigma = \sqrt{\frac{\sum f d^2}{N}} \]
or, equivalently,
\[ \sigma = \sqrt{\frac{\sum f X^2 - \frac{(\sum f X)^2}{N}}{N}} \]

If the actual mean is a fraction, the calculation takes a lot of time and labour. So, this method is
rarely used in practice.
ii) Assumed Mean Method:
Here deviations are taken not from an actual mean but from an assumed mean. This
method is used, if the given variable values are not in equal intervals.
iii) Step Deviation Method:
This method is adopted when the variable values are in equal intervals.
Standard Deviation – Continuous Series
In a continuous series, the method of calculating S.D. is almost the same as in a discrete series.
But in a continuous series, the mid values of the class intervals are to be found.

\[ \sigma = \sqrt{\frac{\sum f (m - \bar{X})^2}{N}} \]
or, equivalently,
\[ \sigma = \sqrt{\frac{\sum f m^2 - \frac{(\sum f m)^2}{N}}{N}} \]

Where,
m = the midpoint of the class interval
N = the total sum of frequencies (N = ∑f)
f = the frequency of the respective class interval
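A sketch of the continuous-series calculation, showing that the deviation form and the shortcut form agree. It uses the class intervals and frequencies of Problem No 2 of this exercise as sample input; the variable names are illustrative assumptions.

```python
import math

classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
frequencies = [12, 18, 5, 10, 9, 6]

midpoints = [(lower + upper) / 2 for lower, upper in classes]
N = sum(frequencies)
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / N

# Deviation form: sqrt(sum f (m - mean)^2 / N).
variance = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints)) / N
sd = math.sqrt(variance)

# Shortcut form: sqrt((sum f m^2 - (sum f m)^2 / N) / N).
sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))
sum_fm2 = sum(f * m * m for f, m in zip(frequencies, midpoints))
sd_shortcut = math.sqrt((sum_fm2 - sum_fm ** 2 / N) / N)

cv = sd / mean * 100
print(round(mean, 2), round(sd, 2), round(cv, 2))
```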
Coefficient of Variation:
The Standard deviation is an absolute measure of dispersion.
It is expressed in terms of the units in which the original figures are collected and stated. The
corresponding relative measure is known as the coefficient of Variation.
\[ CV = \frac{S.D.}{\text{Mean}} \times 100 \]
Problem no -3) Calculate Range, M.D , S.D , Variance and C.V of the following data (
Ungrouped data )

X 8 15 20 17 11 3 9

X D |D| ( X-X̄ )2
8
15
20
17
11
3
9

Problem no 4 ) Calculate the Mean, M.D , S.D & C.V of the following data ( Grouped Data )

X 5 3 8 4 5 9

f 1 3 5 7 2 1

X f fX D = ( X – X̄ ) |D| f|D| ( X – X̄ )2 f( X – X̄ )2

‫٭٭٭‬
EXERCISE NO. 06
SELECTION OF RANDOM SAMPLE USING SIMPLE RANDOM SAMPLING

Definitions of Different Concepts


i) Population: It is the aggregate of all the individual units.
ii) Sample: It is a small part of the population which reflects almost all the characteristics of
that population.
iii) Parameter: An unknown constant of the population is known as a parameter.
iv) Statistic: A statistic is a function of observable random variables and does not involve
any unknown parameter. A statistic is itself a random variable, e.g. the sample mean and the sample variance (S2).
Need of Sampling
i) Representativeness: A sample should be so selected that it truly represents the
universe otherwise the results obtained may be misleading. To ensure the
representativeness the random method of selection should be used.
ii) Adequacy: The size of sample should be adequate, otherwise it may not represent
the characteristics of the universe.
iii) Independence: All items of the sample should be selected independently of one
another and all items of the universe should have the same chance of being selected
in the sample.
iv) Homogeneity: There is no difference in the nature of units of the universe and that of
sample.
Sampling
✓ Purposive sampling
✓ Simple Random Sampling
✓ Systematic Sampling
✓ Stratified Sampling
✓ Multistage Sampling
Purposive sampling
▪ The selection of units entirely depends on the choice of the investigator.
▪ This type of sampling is adopted when it is not possible to adopt any random
procedure for selection of sampling units.
▪ In this sampling procedure there is no involvement of probability. It is also called as
subjective sampling.
Simple Random Sampling
▪ The basic probability sampling method is simple random sampling.
▪ It is the simplest of all the probability sampling methods.
▪ It is used when the population is homogeneous.
Systematic Sampling
▪ Systematic sampling is a simpler and quicker method compared to other methods.
▪ Suppose that the population of size N is numbered from 1 to N.
▪ Let the desired sample size be n.
▪ The population can be divided into subgroups.
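Simple random and systematic selection can be sketched with Python's standard `random` module. Everything numeric here is a hypothetical illustration: a population of N = 50 numbered units, a sample of n = 5, and a fixed seed so that the run is repeatable. The systematic rule shown (sampling interval k = N/n with a random start between 1 and k) is the usual textbook procedure and is an assumption, since the manual does not spell it out.

```python
import random

population = list(range(1, 51))     # hypothetical units numbered 1..N, N = 50
n = 5                               # desired sample size

rng = random.Random(231)            # fixed seed, only to make the sketch repeatable

# Simple random sampling without replacement: every unit has the same chance.
simple_random_sample = rng.sample(population, n)

# Systematic sampling: one random start, then every k-th unit.
k = len(population) // n            # sampling interval
start = rng.randint(1, k)           # random start between 1 and k (inclusive)
systematic_sample = [start + j * k for j in range(n)]

print(sorted(simple_random_sample))
print(systematic_sample)
```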
‫٭٭٭‬
EXERCISE NO. 07
CORRELATION : COMPUTATION OF KARL PEARSON’S COEFFICIENT OF CORRELATION WITH
ITS TEST OF SIGNIFICANCE

Correlation :
It is the degree of relationship between variables.
Correlation between two or more variable :
It is an analysis of covariation between two or more variables- By A. M. Tuttle
Types of correlation
i) Positive and negative correlation
ii) Simple, partial and multiple correlation
iii) Linear and non- linear correlation

Methods of studying correlation


i) Scatter diagram / Graphical method.
ii) Karl Pearson’s coefficient of correlation or Covariance method or Product moment
correlation coefficient.
iii) Spearman’s rank correlation coefficient.

i) Karl Pearson’s coefficient of correlation:


Karl Pearson, a great biometrician and statistician, suggested a mathematical
method for measuring the magnitude of the linear relationship between two variables.
It is the most widely used method in practice and is known as Pearson’s Coefficient of
Correlation. It is denoted by ‘r’.

\[ r = \frac{Cov(x, y)}{SD(x)\,SD(y)} \]
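A sketch of the covariance method with the data of Problem no 1 below. The function name is an illustrative assumption. The manual names a test of significance for r but does not state its formula; the statistic used here, t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom, is the standard one and is flagged as an addition.

```python
import math


def pearson_r(x, y):
    """r = Cov(x, y) / (SD(x) * SD(y)), all with divisor n."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)


father = [15, 16, 17, 18, 19, 20, 21]
son = [17, 18, 19, 20, 20, 21, 21]

r = pearson_r(father, son)

# Assumption (not stated in the manual): standard t statistic for H0: rho = 0.
n = len(father)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(r, 4), round(t, 3))
```

The calculated t is compared with the table value of t for n − 2 degrees of freedom at the chosen level of significance.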

Problem no 1) Find Karl Pearson’s Coefficient of Correlation from the following data
between height of the father ( X ) and son ( Y ) . Comment on the result

X Y X2 Y2 XY

15 17
16 18
17 19
18 20
19 20
20 21
21 21
∑X= ∑Y= ∑ X2 = ∑Y2 = ∑ XY=

‫٭٭٭‬
EXERCISE NO. 08
SPEARMAN’S RANK CORRELATION

Rank Correlation : It is studied when no assumption is made about the parameters of the
population.
This method is based on ranks. It is useful for studying qualitative attributes such as
honesty, colour, beauty and intelligence.
The individuals in the group are arranged in order, thereby obtaining for each
individual a number showing his / her rank in the group.
This method was developed by Charles Edward Spearman in 1904.
\[ r_c = 1 - \frac{6 \sum D^2}{n^3 - n} \]
Where
rc - rank correlation coefficient
n - Number of pairs of observations
∑ D2 - Sum of the squares of the differences between the pairs of ranks.

rc = +1 when there is complete agreement in the order of the ranks and the direction is the
same.
rc = -1 when there is complete disagreement in the order of the ranks and the direction
is opposite.
When some ranks are tied,
\[ r_c = 1 - \frac{6 \left[ \sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots \right]}{n^3 - n} \]
Where m is the number of items whose ranks are common; one correction term is added for each group of tied ranks.
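A sketch of the rank method including the correction for tied ranks, using the data of Problem no 2 below. The function names are illustrative assumptions, and so is the ranking convention (rank 1 for the largest value, tied values sharing the average of their ranks).

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their ranks."""
    first_position = {}
    for position, v in enumerate(sorted(values, reverse=True), start=1):
        first_position.setdefault(v, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def spearman(x, y):
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y)))
    # One (m^3 - m)/12 term per group of m tied values; it is 0 when m = 1.
    tie_correction = sum(
        (m ** 3 - m) / 12 for series in (x, y) for m in Counter(series).values()
    )
    return 1 - 6 * (sum_d2 + tie_correction) / (n ** 3 - n)


x = [40, 25, 30, 7, 9, 9, 45, 20, 9, 42]
y = [11, 11, 20, 5, 15, 1, 18, 8, 5, 16]

r_c = spearman(x, y)
print(round(r_c, 4))
```

When no ranks are tied the correction is zero and the routine reduces to the first formula.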
Problem no 1) Calculate the rank correlation coefficient for the following data

X Y Rank of X Rank of Y D = ( Rank of X – Rank of Y ) D2

80 69

85 21

83 71

81 48

78 57

50 29
Problem no 2) Calculate the rank correlation coefficient for the following data

X Y Rank of X Rank of Y D = ( Rank of X – Rank of Y ) D2
40 11
25 11
30 20
7 5
9 15
9 1
45 18
20 8
9 5
42 16

‫٭٭٭‬
EXERCISE NO. 9 & 10
REGRESSION : FITTING OF SIMPLE LINEAR REGRESSION EQUATION WITH TEST OF
SIGNIFICANCE OF REGRESSION COEFFICIENT

Regression: The term was coined by the British biometrician Sir Francis Galton.

Regression gives a mathematical relationship between two variables. The mathematical
expression of the regression line is
\[ Y = a + b_{yx} X \]
Where,
Y = dependent variable
X = independent variable
a = intercept (constant)
byx = regression coefficient (constant)
Line of regression of ‘Y’ on ‘X’
\[ Y = a + b_{yx} X \]
Line of regression of ‘X’ on ‘Y’
\[ X = a + b_{xy} Y \]
byx = Regression coefficient of Y on X
bxy = Regression coefficient of X on Y
Regression Coefficient
\[ b_{yx} = \frac{Cov(x, y)}{Var(x)} \qquad b_{xy} = \frac{Cov(x, y)}{Var(y)} \]

Regression ‘Y’ on ‘X’

Coefficient of regression of ‘Y’ on ‘X’
\[ b_{yx} = \frac{Cov(x, y)}{Var(x)} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} \]
Coefficient of regression of ‘X’ on ‘Y’
\[ b_{xy} = \frac{Cov(x, y)}{Var(y)} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} \]
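Fitting the line of Y on X can be sketched as below with the water and yield data of Problem no 1. The function name is an illustrative assumption, and the intercept is obtained from a = Ȳ − byx·X̄, which holds because the fitted line passes through the point of means (a step the manual does not write out).

```python
def fit_y_on_x(x, y):
    """Return (a, b_yx) for the line Y = a + b_yx * X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    b_yx = sum_xy / sum_xx
    a = mean_y - b_yx * mean_x      # the line passes through (mean_x, mean_y)
    return a, b_yx


water = [12, 18, 24, 30, 36, 42, 48]
crop_yield = [5.3, 5.7, 6.7, 7.2, 8.2, 8.7, 8.4]

a, b_yx = fit_y_on_x(water, crop_yield)
print(f"Y = {a:.4f} + {b_yx:.4f} X")
```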
Problem no 1) The following data are for the amount of water supplied (in inches) and the yield
(in tonnes per acre). Find the regression equation of yield on water.

Water( X) 12 18 24 30 36 42 48

Yield ( Y ) 5.3 5.7 6.7 7.2 8.2 8.7 8.4

‫٭٭٭‬
EXERCISE NO. 11
TEST OF SIGNIFICANCE : PROBLEMS ON ONE SAMPLE, TWO SAMPLE AND PAIRED T-TEST
PROBLEMS ON ONE SAMPLE

Problem no 1) The 9 items of sample had following values

45 47 50 52 48 47 49 53 51

Does the mean of 9 items differ significantly from population mean of 47.5?
Problem on Two Sample
Problem no 2) Two independent samples of 8 and 7 items had the following data. Test whether
the difference between the means of the samples is significant.

Sample(I) X Sample(II) Y X2 Y2
9 10
11 12
13 10
11 14
15 09
12 10
09 08
14 -- --
∑X = ∑Y = ∑X2 = ∑Y2 =

Problem on Paired t- test


Problem no 3) An IQ test was administered to 5 people before and after they were trained.
The results (IQ before and IQ after training) are given below.

IQ before training (X) IQ after training (Y) d = X - Y d2
110 120
120 118
123 125
132 136
125 121
∑X= ∑Y = ∑d= ∑ d2 =

‫٭٭٭‬
EXERCISE NO. 12
F TEST FOR EQUALITY OF VARIANCE

Testing the ratio of variance :


Suppose we are interested in testing whether two normal populations have the same variance
or not.
Let X1, X2, X3, ..., Xn1 be a random sample of size n1 from the first population with variance
σ1², and
Y1, Y2, Y3, ..., Yn2 be a random sample of size n2 from the second population with
variance σ2².
Null Hypothesis :
\[ H_0 : \sigma_1^2 = \sigma_2^2 = \sigma^2 \]

i.e. the population variances are the same.


In other words, H0 is that the two independent estimates of the common population
variance do not differ significantly.
\[ F_0 = \frac{S_x^2}{S_y^2} \]
\[ S_x^2 = \frac{1}{n_1 - 1} \sum (x - \bar{x})^2 \]
\[ S_y^2 = \frac{1}{n_2 - 1} \sum (y - \bar{y})^2 \]
Note: The larger of the two variance estimates is always taken as the numerator of the ratio.
v1 = d.f. for the sample having the larger variance
v2 = d.f. for the sample having the smaller variance

If F0 > Fe, i.e. Cal F > Table F: Reject H0
If F0 < Fe, i.e. Cal F < Table F: Accept H0
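The procedure can be sketched with the data of Problem no 2 below and its stated table value of 3.79. The function name is an illustrative assumption; `statistics.variance` uses the divisor n − 1, which matches the estimates defined above.

```python
from statistics import variance


def f_ratio(sample_a, sample_b):
    """Return (F, v1, v2) with the larger variance estimate in the numerator."""
    var_a, var_b = variance(sample_a), variance(sample_b)   # divisor n - 1
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


x = [18, 12, 16, 21, 16, 14, 19, 12]
y = [9, 11, 12, 15, 14, 16, 15, 14]
table_f = 3.79                      # 5 % table value given in Problem no 2

cal_f, v1, v2 = f_ratio(x, y)
decision = "Reject H0" if cal_f > table_f else "Accept H0"
print(round(cal_f, 3), v1, v2, decision)
```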
Problem no 1) Two random samples drawn from two normal populations gave the following data.
Obtain estimates of the variances of the two populations and test whether the populations have
the same variance.
Sample A 20 16 26 27 23 22 18 24 15 19

Sample B 27 33 42 35 32 34 38 28 41 43

Problem no 2) Test the equality of variance of two samples given below and write the
conclusion

X 18 12 16 21 16 14 19 12

Y 9 11 12 15 14 16 15 14

F table value at 5 % level of significance 3.79

Problem no 3) Test the equality of variance of two samples given below and write the
conclusion

X 1 4 7 3 2 9 11

Y 11 15 14 12 13 12

F table value at 5 % level of significance 5.59


‫٭٭٭‬
EXERCISE NO. 13 & 14
CHI SQUARE TEST OF GOODNESS OF FIT, CHI SQUARE OF INDEPENDENCE OF ATTRIBUTES
FOR 2 × 2 CONTINGENCY TABLE

Chi-square :
The chi square test is one of the simplest and most widely used non-parametric tests in statistical
work.
The chi square (χ²) test was first used by Karl Pearson in 1900. Chi square describes
the magnitude of the discrepancy between theory and observation.
\[ \chi^2 = \sum_{i=1}^{n} \left[ \frac{(O_i - E_i)^2}{E_i} \right] \]

Where ,
O is an Observed frequency
E is an Expected frequency

2 × 2 Contingency Table:
Under the null hypothesis of independence of attributes, the value of chi square for a 2 × 2
contingency table is

\[ \chi^2 = \frac{N (ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)} \]
2×2 Contingency Table:

a b a+b

c d c+d

a+c b+d total (N)
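The 2 × 2 shortcut can be checked against the general ∑(O − E)²/E form. The sketch below uses the eye-colour counts of Problem no -1; the function names are illustrative assumptions, and the expected frequency E = (row total × column total) / N is the standard rule under independence, which the manual does not state explicitly.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut: N (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_general(table):
    """Sum of (O - E)^2 / E with E = row total * column total / N (assumed rule)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total


shortcut = chi_square_2x2(230, 148, 51, 471)
general = chi_square_general([[230, 148], [51, 471]])
table_value = 3.84                  # 5 % value for 1 d.f., as given in the problems

print(round(shortcut, 2), round(general, 2), shortcut > table_value)
```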

Problem no -1) From the following data regarding colour of eyes of father and son , test
whether the colour of son’s eyes is associated with that of father’s.

                        Eye colour of son
Eye colour of father    Brown    Black    Total
Brown                   230      148
Black                   51       471
Total
Chi square table value of ( 2 -1) ( 2- 1) d.f. is 3.84 . Write an inference.
Problem no -2 ) In an experiment on reimmunization of cattle . The following results were
obtained

Affected Unaffected Total

Affected 12 28

Unaffected 13 07

Total

Chi square table value of ( 2 -1) ( 2- 1) d.f. is 3.84 . Write an inference.


‫٭٭٭‬
EXERCISE NO . 15
ANALYSIS OF VARIANCE : ANALYSIS OF VARIANCE ONE WAY AND TWO-WAY
CLASSIFICATION

When we record observations from an experiment on yield or on any other character,
we find that the observations vary greatly from one another. This
variation is due to a number of factors known as sources of variation.
The portions of the variation caused by the different sources are known as
components of variation.
Role of ANOVA :
▪ It sorts and estimates the variance component
▪ It provides the test of significance.

Analysis of Variance :
✓ Sum of Squares: It means the sum of squares of the deviations of the variates from
their mean.
✓ Degree of freedom: The number of degrees of freedom is one less than the number of
variates in the sample concerned.
• If the number of variates is n, then d.f. is (n – 1)
• If the number of treatments is t, then d.f. is (t – 1)
✓ Variance: It is obtained by dividing the Sum of Square by corresponding degree of
freedom.
𝑆𝑢𝑚 𝑜𝑓 𝑆𝑞𝑢𝑎𝑟𝑒
𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 =
𝐷𝑒𝑔𝑟𝑒𝑒 𝑜𝑓 𝐹𝑟𝑒𝑒𝑑𝑜𝑚
Assumptions of ANOVA :
• Normality: The values in each group are normally distributed.
• Homogeneity: The variance within each group should be equal for all groups:
\[ \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_c^2 \]
• Independent error: The error should be independent for each value.
One Way Classification:
In one-way classification the data are classified according to only one criterion. The null
hypothesis is
𝐻0 ∶ µ1 = µ2 = µ3 = µ4 = … … … µ𝑘

i.e. A.M of population from which the k samples were randomly drawn are equal to one
another.
The steps in carrying out the analysis are
▪ Calculate the variance between the sample.
▪ Calculate the variance within the sample.
▪ Calculate the F ratio:
\[ F = \frac{\text{Between-Column Variance}}{\text{Within-Column Variance}} = \frac{S_1^2}{S_2^2} \]

▪ Compare the calculated value of F with table value of F for the degree of freedom to
certain critical level. Generally, we take 5 % level of significance.
▪ If Cal F > table F – difference is significant
▪ If Cal F < table F- difference is not significant
Note: The procedure for ANOVA is applicable to both equal and unequal sample sizes.

Source of variation    SS (Sum of Squares)    v (Degrees of Freedom)    MS (Mean Square)       Variance Ratio F
Between the samples    SSC                    v1 = c - 1                MSC = SSC / (c - 1)    MSC / MSE
Within the samples     SSE                    v2 = n - c                MSE = SSE / (n - c)

Where
SST- Total Sum of Square variation
SSC- Sum of Squares between the samples.( Columns)
SSE-Sum of Square within the samples ( Rows)
MSC-Mean Sum of Squares between the samples
MSE -Mean Sum of Squares within the samples.
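The one-way table can be filled in with a few lines of code. The three samples below are hypothetical values chosen only so that the arithmetic is easy to follow by hand, and the function name is an illustrative assumption; the routine also works for unequal sample sizes.

```python
def one_way_anova(samples):
    """Return (SSC, SSE, F) for a list of samples (columns)."""
    n = sum(len(s) for s in samples)            # total number of observations
    c = len(samples)                            # number of samples (columns)
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]

    ssc = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    msc = ssc / (c - 1)                         # v1 = c - 1
    mse = sse / (n - c)                         # v2 = n - c
    return ssc, sse, msc / mse


# Hypothetical yields for three treatments (illustration only).
samples = [[4, 6, 8], [7, 9, 11], [10, 12, 14]]

ssc, sse, f_value = one_way_anova(samples)
print(ssc, sse, f_value)
```

The calculated F is then compared with the table value of F for (c − 1, n − c) degrees of freedom, here (2, 6).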
Analysis of Variance in two-way classification model
In two-way classification the data are classified according to two different criteria or factors.
The procedure for analysis of variance is somewhat different than the one followed
while dealing with the problems of one-Way classification.
In two-way classification the ANOVA table takes the following form.
Source of variation    SS (Sum of Squares)    Degrees of Freedom    MS (Mean Sum of Squares)         Variance Ratio F
Between columns        SSC                    c - 1                 MSC = SSC / (c - 1)              MSC / MSE
Between rows           SSR                    r - 1                 MSR = SSR / (r - 1)              MSR / MSE
Residual or error      SSE                    (c - 1)(r - 1)        MSE = SSE / ((c - 1)(r - 1))

Where
SSC - Sum of Squares between columns
SSR - Sum of Square between rows
SSE-Sum of Square due to Error
Total number of degree of freedom ( cr-1 )
Where c refers to number of columns
r refers to number of rows
No of d.f between column ( c -1)
No of d.f between rows ( r – 1)
No of d.f. for residual ( c – 1 ) ( r – 1)
Residual or error Sum of Squares
It is the Total Sum of Squares - Sum of Squares between columns - Sum of Squares
between rows.
\[ F(v_1, v_2) = \frac{MSC}{MSE} \]
Where
v1 = (c – 1), v2 = (c – 1)(r – 1)
\[ F(v_1, v_2) = \frac{MSR}{MSE} \]
Where
v1 = (r – 1), v2 = (c – 1)(r – 1)
It should be carefully noted that v1 may not be the same in both cases: in one case v1 = (c – 1)
and in the other case v1 = (r – 1).
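The two-way decomposition can be sketched in the same way, with one observation per cell. The 3 × 3 table below is hypothetical (rows and columns could stand for, say, blocks and treatments), and the function name is an illustrative assumption.

```python
def two_way_anova(table):
    """Return (F for columns, F for rows) for an r x c table, one value per cell."""
    r, c = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(col) / r for col in zip(*table)]

    sst = sum((x - grand_mean) ** 2 for row in table for x in row)
    ssr = c * sum((m - grand_mean) ** 2 for m in row_means)
    ssc = r * sum((m - grand_mean) ** 2 for m in col_means)
    sse = sst - ssc - ssr                       # residual or error sum of squares

    mse = sse / ((c - 1) * (r - 1))
    f_columns = (ssc / (c - 1)) / mse           # v1 = c - 1
    f_rows = (ssr / (r - 1)) / mse              # v1 = r - 1
    return f_columns, f_rows


# Hypothetical observations (illustration only).
table = [
    [5, 7, 9],
    [6, 9, 12],
    [10, 11, 12],
]

f_columns, f_rows = two_way_anova(table)
print(f_columns, f_rows)
```

For this table SST = 52, SSC = 24, SSR = 24 and SSE = 4, so both ratios are tested with v2 = (c − 1)(r − 1) = 4 degrees of freedom.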
‫٭٭٭‬
