Stat-231 Practical Manual


PRACTICAL MANUAL

STAT-231
INDEX

Sr. no. | Exercise no. | Name of the topic | Date | Sign
1 | 1 | Graphical presentation: Histogram, frequency curve, frequency polygon, cumulative frequency curve (ogive curve) | |
2 | 2 | Measures of Central Tendency: Computation of arithmetic mean, mode, median, GM, HM, quartiles, deciles & percentiles (Ungrouped data) | |
3 | 3 | Computation of arithmetic mean, mode, median, quartiles, deciles & percentiles (Grouped data) | |
4 | 4 | Measures of Dispersion: Computation of range, mean deviation, quartile deviation, standard deviation and variance and respective relative measures (Ungrouped data) | |
5 | 5 | Measures of Dispersion: Computation of range, mean deviation, quartile deviation, standard deviation and variance and respective relative measures (Grouped data) | |
6 | 6 | Selection of random sample using simple random sampling | |
7 | 7 | Correlation: Computation of Karl Pearson’s Coefficient of Correlation with its test of significance | |
8 | 8 | Spearman’s rank correlation | |
9 | 9 & 10 | Regression: Fitting of simple linear regression equation with test of significance of regression coefficient | |
10 | 11 | Test of Significance: Problems on one sample, two sample and paired t-test | |
11 | 12 | F test for equality of variance | |
12 | 13 & 14 | Chi-square test of goodness of fit, chi-square test of independence of attributes for 2 × 2 contingency table | |
13 | 15 | Analysis of Variance: one way and two way classification | |

CERTIFICATE
This is to certify that Shri/Miss.........................................................
Registration No.............................. a student of III Semester of B.Sc ( Hons)
( Agri) degree has completed all the practical exercises of the
Course STAT-231 (Statistical Methods) satisfactorily during the
Year 20 - 20

Place:

Date: / / 202 Signature of Course Teacher


Exercise No -1 Date :

Graphical presentation: Histogram, Frequency curve, Frequency


polygon, Cumulative Frequency Curve ( Ogive Curve)

1) Histogram: A histogram is a bar chart or graph showing the frequency of occurrence of each value (or class) of the variable being analysed.

In a histogram, data are plotted as a series of rectangles. Class intervals are shown on the X-axis and the frequencies on the Y-axis.

The height of each rectangle represents the frequency of the class interval. Each rectangle is joined to the next so as to give a continuous picture. Such a graph is also called a staircase or block diagram.

However, we cannot construct a histogram for a distribution with open-end classes. It is also quite misleading if the distribution has unequal class intervals and suitable adjustments in frequencies are not made.

Solve the following problems

Problem No-1) Draw a histogram for the following data.

Daily Wages Number of Workers


0-50 8
50-100 16
100-150 27
150-200 19
200-250 10
250-300 8

Problem No-2) For the following data, draw a histogram

Marks No of Students
21-30 6
31-40 15
41-50 22
51-60 31
61-70 17
71-80 9

2) Frequency curve: If the midpoints of the upper boundaries of the rectangles of a histogram are connected by a smooth freehand curve, the resulting diagram is called a frequency curve.

Problem No -3) Draw a frequency curve for the following data

Monthly wages ( Rs) No of Family


0-1000 21
1000-2000 35
2000-3000 56
3000-4000 74
4000-5000 63
5000-6000 40
6000-7000 29
7000-8000 14

3) Frequency polygon: If we mark the midpoints of the top horizontal sides of the rectangles in a histogram and join them by straight lines, the figure so formed is called a frequency polygon.
This is done under the assumption that the frequencies in a class interval are evenly distributed throughout the class. The area of the polygon is equal to the area of the histogram, because the area left outside is just equal to the area included in it.

Problem No-4) Draw a frequency polygon for the following data.

Weight ( Kg) Students No


30-35 04
35-40 07
40-45 10
45-50 18
50-55 14
55-60 08
60-65 03

4) Cumulative Frequency Curve (Ogive Curve): For a set of observations, we know how to construct a frequency distribution. In some cases we may require the number of observations less than a given value or more than a given value. This is obtained by accumulating (adding) the frequencies up to (or above) the given value. This accumulated frequency is called the cumulative frequency.
These cumulative frequencies, listed in a table, form a cumulative frequency table. The curve obtained by plotting the cumulative frequencies is called a cumulative frequency curve or an ogive.
There are two methods of constructing an ogive, namely:

1) The ‘less than ogive’ method: In this method we start with the upper limits of the classes and go on adding the frequencies. When these frequencies are plotted, we get a rising curve.
2) The ‘more than ogive’ method: In this method we start with the lower limits of the classes and, from the total frequency, go on subtracting the frequency of each class. When these frequencies are plotted, we get a declining curve.
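
The two cumulative frequency tables described above can be sketched in Python as follows. This is only an illustration: the function name `ogive_tables` and the data are hypothetical and are not taken from the problems in this manual.

```python
from itertools import accumulate

def ogive_tables(classes, freqs):
    """Return the 'less than' and 'more than' cumulative frequency tables.

    classes: list of (lower limit, upper limit); freqs: class frequencies.
    """
    # 'Less than' ogive: running totals paired with the upper limits.
    less_cf = list(accumulate(freqs))
    less_than = [(upper, cf) for (_, upper), cf in zip(classes, less_cf)]

    # 'More than' ogive: start from the total frequency and subtract the
    # frequency of each class in turn, pairing with the lower limits.
    more_than = []
    remaining = sum(freqs)
    for (lower, _), f in zip(classes, freqs):
        more_than.append((lower, remaining))
        remaining -= f
    return less_than, more_than

# Hypothetical distribution, for illustration only.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [3, 7, 6, 4]
less_than, more_than = ogive_tables(classes, freqs)
print(less_than)
print(more_than)
```

Plotting the first table gives the rising curve and plotting the second gives the declining curve.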

Problem No-5) Draw the Ogives for the following data

Class Interval Frequency


20-30 4
30-40 6
40-50 13
50-60 25
60-70 32
70-80 19
80-90 8
90-100 3

*************
Exercise No -2 Date :

Measures of Central Tendency: Computation of arithmetic mean, mode, median, GM,


HM , quartile, deciles& percentile (Ungrouped data)

Measures of Central Tendency: The collected data must be presented in a precise form, so its representative values are expressed numerically. These representative values are called measures.

A measure of central tendency is the central point, or point of equilibrium, around which the whole data set is balanced.

Measures of Central Tendency

A) Simple Averages: 1) Mean 2) Median 3) Mode
B) Special Averages: 1) Geometric Mean 2) Harmonic Mean

Definition of Arithmetic Mean: It is defined as the sum of the observations divided by the number of observations.

Arithmetic Mean for Raw data or Ungrouped data or discrete series:

If the variable X assumes n values X1, X2, X3, ..., Xn, then the mean X̄ is given by

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Problem no -1) Find the arithmetic mean of the following data

2,4,6,8,10

Problem no-2) If the weights of 5 ear heads of sorghum are 100,102,118, 124, 126 g
then the mean weight is ?

Arithmetic Mean for discrete and continuous frequency distribution

a) For discrete frequency distribution

$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + f_3 X_3 + \dots + f_k X_k}{n} = \frac{1}{n}\sum_{i=1}^{k} f_i X_i$$

Where,
f = Frequency of individual class
n = Sum of all frequencies
k = Number of classes (distinct values)

b) For continuous frequency distribution

$$\bar{X} = \frac{f_1 m_1 + f_2 m_2 + f_3 m_3 + \dots + f_k m_k}{n} = \frac{1}{n}\sum_{i=1}^{k} f_i m_i$$

Where,
f = Frequency of individual class
n = Sum of all frequencies
m = Mid point of class
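
The three arithmetic mean formulas of this exercise can be sketched in Python as below. The function names and all data are hypothetical, chosen only to illustrate the calculation.

```python
def mean_raw(values):
    """Arithmetic mean of raw (ungrouped) data: sum / number of observations."""
    return sum(values) / len(values)

def mean_discrete(xs, fs):
    """Mean of a discrete frequency distribution: sum(f * x) / sum(f)."""
    n = sum(fs)
    return sum(f * x for x, f in zip(xs, fs)) / n

def mean_continuous(classes, fs):
    """Mean of a continuous distribution, using class mid points m."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    return mean_discrete(mids, fs)

# Hypothetical data, for illustration only.
raw = mean_raw([3, 5, 7, 9])
disc = mean_discrete([1, 2, 3], [2, 3, 5])
cont = mean_continuous([(0, 10), (10, 20), (20, 30)], [2, 3, 5])
print(raw, disc, cont)
```

The continuous case reuses the discrete formula because the mid point of each class simply plays the role of X.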

Problem no -3) Find the arithmetic mean of the following data


Marks ( Xi) No of students ( fi) fiXi
5 8
10 18
15 12
20 9
25 7
30 6

Problem no -4) Calculate the arithmetic mean of the following data


Marks ( Xi) No of students ( fi) mi ( Mid point) fimi
0-10 12
10-20 18
20-30 27
30-40 20
40-50 17
50-60 6

Median: The value that divides the whole distribution or series into two equal parts, after the data have been arranged in ascending or descending order, is called the median.

Mode: The value that occurs most frequently in the distribution or series is called the mode.
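
A minimal Python sketch of these two definitions for ungrouped data is given below. Two points are assumptions rather than statements of this manual: for an even number of observations the median is taken as the average of the two middle values, and a series in which every value occurs once is reported as having no mode. The function names and data are hypothetical.

```python
from collections import Counter

def median_ungrouped(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

def modes_ungrouped(values):
    """Values with the highest frequency; empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical data, for illustration only.
med_odd = median_ungrouped([7, 3, 9, 5, 1])
med_even = median_ungrouped([4, 8, 6, 2])
mode_one = modes_ungrouped([3, 5, 3, 8, 5, 3])
mode_none = modes_ungrouped([1, 2, 3])
print(med_odd, med_even, mode_one, mode_none)
```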

Problem no 5) Find the mean and median for the paddy variety from the following
data.

Yield ( q/ha) : 217.5 , 358.2 , 573.5 , 332.5 , 287.0 , 875.5 , 788.3 , 881.3 , 828.3

Problem no 6) calculate the mean for 2, 4, 6, 8, 10

Problem no 7) Find the median for the following data

25, 18, 27, 10, 8, 30, 42, 20 , 53



Problem no 8) Calculate the mode for the following data

2 ,7, 18, 15, 10, 17, 8, 10, 2

Problem no 9 ) Calculate the mode for the following series

1, 12, 10, 15, 24, 30

Problem no 10) Calculate the mode of the following data

2, 7,10, 15, 12, 7, 14, 24, 10, 7, 20, 10

Harmonic Mean: For a set of n observations x1, x2, x3, ..., xn, the harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of the given values.

$$\mathrm{H.M.} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Problem no 11) From the given data calculate Harmonic Mean

5, 10, 17, 4, 30

Geometric Mean:

The G.M. of a series containing n observations is the nth root of the product of the values x1, x2, x3, ..., xn.

$$\mathrm{G.M.} = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n} = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$$

Taking log on both sides,

$$\log \mathrm{G.M.} = \log (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$$

$$\log \mathrm{G.M.} = \frac{1}{n} \log (x_1 \cdot x_2 \cdot x_3 \cdots x_n)$$

$$\log \mathrm{G.M.} = \frac{1}{n} (\log x_1 + \log x_2 + \dots + \log x_n)$$

$$\log \mathrm{G.M.} = \frac{1}{n} \sum_{i=1}^{n} \log x_i$$

$$\mathrm{G.M.} = \text{Antilog}\left(\frac{1}{n} \sum_{i=1}^{n} \log x_i\right)$$

Problem no-12) Calculate G.M. for the following series

2, 4, 6, 8, 10, 12
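
The harmonic mean and geometric mean formulas of this exercise can be sketched in Python as follows. The geometric mean is computed through logarithms, mirroring the antilog form derived above. The function names and data are hypothetical, and all values are assumed to be positive.

```python
import math

def harmonic_mean(values):
    """n divided by the sum of reciprocals (values must be non-zero)."""
    return len(values) / sum(1 / x for x in values)

def geometric_mean(values):
    """Antilog of the mean of log10(x); values must be positive."""
    n = len(values)
    log_sum = sum(math.log10(x) for x in values)
    return 10 ** (log_sum / n)   # 10 ** y is the antilog of y to base 10

# Hypothetical data, for illustration only.
hm = harmonic_mean([1, 2, 4])
gm = geometric_mean([1, 10, 100])
print(hm, gm)
```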

******************
Exercise No -3 Date:

Computation of arithmetic means, mode, median, quartiles, deciles & percentiles

(Grouped data)

Grouped Data- In grouped distribution values are associated with frequency. Grouping
can be in the form of discrete distribution or continuous frequency distribution.

Arithmetic Mean for Grouped data or Frequency distribution (Discrete Series):

$$\bar{X} = \frac{\sum_{i} f_i x_i}{N}$$

Where
x = the values of the variable
f = the frequency of individual class
N = the sum of the frequencies or total frequency

Problem no 1) Given the following frequency distribution. Calculate the arithmetic


mean ( Discrete Series)

Marks 5 10 15 20 25 30
Number of Students 8 18 12 9 7 6

Arithmetic Mean for Grouped data or Frequency distribution (Continuous Series):

$$\bar{X} = \frac{\sum_{i} f_i m_i}{N}$$

Where
m = the mid point of individual class
f = the frequency of individual class
N = the sum of the frequencies or total frequency

Problem no 2) Given the following frequency distribution. Calculate the arithmetic


mean ( Continuous Series)

Marks 0-10 10-20 20-30 30-40 40-50 50-60


Number of Students 5 7 12 15 13 8

Median ( Grouped data )

Cumulative Frequency (cf):

The cumulative frequency of each class is the sum of the frequency of that class and the frequencies of the previous classes, i.e. the frequencies are added successively, so that the last cumulative frequency gives the total number of items.

Median (Discrete Series)

Step 1 - Find the cumulative frequencies.
Step 2 - Find the position of the median, (n + 1)/2, where n is the total frequency.
Step 3 - Find the cumulative frequency just greater than (n + 1)/2.
Step 4 - The corresponding value of x is the median.

Formula for Median (Discrete Series)

$$\text{Median} = \text{Size of } \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ item}$$

Problem no 3) The following data pertain to the number of members in a family. Find the median size of the family.

Number of members x 4 5 6 7 8 9 10
Frequency f 6 10 13 9 5 4 3

Here n = 50 and the cumulative frequencies are 6, 16, 29, 38, 43, 47, 50.

Median = Size of ((50 + 1)/2)th item
= Size of (51/2)th item
= Size of 25.5th item

The cumulative frequency just greater than 25.5 is 29 and the value of x corresponding to 29 is 6. Hence the median size is 6 members per family.
Note: The median is an appropriate measure here because a fractional value given by the mean does not indicate the average number of members in a family.

Median (Continuous Series)

Step 1 - Find the cumulative frequencies.
Step 2 - Find the (n/2)th item.
Step 3 - Find the cumulative frequency just greater than n/2; the corresponding class interval is called the median class.

Formula for Median (Continuous Series)

$$\text{Median} = l + \frac{\frac{n}{2} - cf}{f} \times i$$

Where l = Lower limit of the median class
n = Total frequency
cf = Cumulative frequency of the class preceding the median class
i = Width of the median class
f = Frequency of the median class

Note: If the class intervals are of the inclusive type, convert them into the exclusive type (true class intervals) and take the lower limit from these.
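
The median formula for a continuous series can be sketched in Python as below. For illustration the sketch reuses the marks distribution given in Problem no 2 of this exercise (which asks only for the mean). The function name is hypothetical, and taking the median class as the first class whose cumulative frequency reaches n/2 is an assumption about how ties at exactly n/2 are handled.

```python
from itertools import accumulate

def grouped_median(classes, freqs):
    """Median = l + ((n/2 - cf) / f) * i for exclusive class intervals."""
    n = sum(freqs)
    cum = list(accumulate(freqs))
    # Median class: first class whose cumulative frequency reaches n/2.
    idx = next(k for k, cf_here in enumerate(cum) if cf_here >= n / 2)
    lower, upper = classes[idx]
    cf_prev = cum[idx - 1] if idx > 0 else 0   # cf of the preceding class
    return lower + (n / 2 - cf_prev) / freqs[idx] * (upper - lower)

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [5, 7, 12, 15, 13, 8]
median = grouped_median(classes, freqs)
print(median)
```

Here n = 60, the median class is 30-40, and the preceding cumulative frequency is 24.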

Problem no 4) calculate the median from the following data

X 0-4 5-9 10-14 15-19 20-24 25-29 30-34 35-39


Frequency f 5 8 10 12 7 6 3 2

Mode (Mo): The mode is the value in a distribution which occurs most frequently. It is an actual value which has the highest concentration of items in and around it.

Discrete Data: The value of X corresponding to the highest frequency is the mode.

Continuous Distribution: The class interval corresponding to the highest frequency is called the modal class.

$$\text{Mode} = M_o = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$$

Where,

l = Lower limit of the modal class

f1 = Frequency of the modal class

f0 = Frequency of the class preceding the modal class

f2 = Frequency of the class succeeding the modal class

i = Width of class interval
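
A short Python sketch of the mode formula for a continuous distribution follows. The function name and data are hypothetical. Two assumptions are made that this manual does not state: the highest frequency is taken to be unique, and a missing neighbouring class (when the modal class is the first or last) is given frequency 0.

```python
def grouped_mode(classes, freqs):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * i."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # index of modal class
    lower, upper = classes[k]
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                   # preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0      # succeeding class
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

# Hypothetical distribution, for illustration only.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 15, 10, 5]
mode = grouped_mode(classes, freqs)
print(round(mode, 2))
```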



Problem no 5) calculate the mode for the following frequency distribution

Class interval 0-10 10-20 20-30 30-40 40-50 50-60


Frequency f 5 8 12 7 5 3

Empirical relationship between Averages

In a symmetrical distribution the three simple averages coincide:

Mean = Median = Mode

For a moderately asymmetrical distribution, the relationship between them was given by Karl Pearson:

Mode = 3 Median – 2 Mean

Problem no 6) If the mean and median of a moderately asymmetrical series are 26.8 and 27.9 respectively, what would be its most probable mode?

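Harmonic Mean (H.M.) for grouped data

$$\mathrm{H.M.} = \frac{N}{\sum_{i} \frac{f_i}{x_i}}$$

Geometric Mean (G.M.) for grouped data

$$\mathrm{G.M.} = \text{Antilog}\left(\frac{\sum_{i} f_i \log x_i}{N}\right)$$

where N = ∑f is the total frequency.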

Problem no7) Calculate the A.M., G.M. and H.M of the following frequency
distribution.

X 2 4 6 8 10 12 14
Frequency f 3 5 8 12 15 4 3

Problem no 8) Calculate the A.M., G.M. and H.M of the following frequency
distribution.

Class interval 0-10 10-20 20-30 30-40 40-50 50-60 60-70


Frequency f 4 6 10 2 15 8 5

*****************
Exercise No -4 Date:

Measures of Dispersion: Computation of range, mean deviation, quartile deviation, standard deviation and variance and respective relative measures (Ungrouped data)

Dispersion: Dispersion means the degree of scatter (spread) of the observations.

Measures of Dispersion

A) Absolute measures | B) Relative measures
Range | Coefficient of Range
Quartile Deviation | Coefficient of Quartile Deviation
Mean Deviation | Coefficient of Mean Deviation
Standard Deviation | Coefficient of Standard Deviation

1. Range and coefficient of Range:

Definition of Range : It is the difference between the largest and smallest values of
variable included in the distribution.

A) Range for the individual observations and discrete series

Range = L – S

Where L = Largest value or upper boundary of the highest class

S = Smallest value or lower boundary of the lowest class

Coefficient of Range:

$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$

Problem No 1) calculate the Range and Coefficient of Range for the following series.

X 3 5 6 7 8 9

2. Quartile Deviation and Coefficient of Quartile Deviation

Definition of Quartile Deviation: Quartile Deviation is half of the difference between the third and first quartiles. Hence it is also called the Semi Inter Quartile Range.

$$\mathrm{Q.D.} = \frac{Q_3 - Q_1}{2}$$

Definition of Coefficient of Quartile Deviation:

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

A) Individual series or Ungrouped

Problem No 2) Find out the value of Q.D. and its coefficient from the following data

Marks 20 28 40 12 30 40 60

3. Mean Deviation and Coefficient of Mean Deviation:

Definition of Mean Deviation: It is the arithmetic mean of the deviations of a series computed from any measure of central tendency, i.e. mean, median or mode. All the deviations are taken as positive, i.e. signs are ignored.

Coefficient of Mean Deviation:

$$\text{Coefficient of Mean Deviation} = \frac{\text{Mean Deviation}}{\text{Mean or Median or Mode}}$$

Computation of Mean Deviation - Individual Series

1. Calculate the average (mean, median or mode) of the series.
2. Take the deviations of the items from the average, ignoring signs, and denote these deviations by |D|.
3. Compute the total of these deviations, i.e. ∑|D|.
4. Divide this total by the number of items.
5. Symbolically, when the deviations are taken from the mean,

$$\mathrm{M.D.} = \frac{\sum |D|}{n} = \frac{\sum |X - \bar{X}|}{n}$$

Where |X − X̄| represents the modulus or absolute value of the deviation (X − X̄), i.e. the negative signs are ignored.

4. Standard Deviation and Co-efficient of Standard Deviation

Karl Pearson introduced the concept of Standard deviation in 1893.

Standard deviation is also called as Root- Mean Square Deviation.

Definition of Standard Deviation: It is defined as the positive square root of the arithmetic mean of the squares of the deviations of the given observations from their arithmetic mean.

The Standard Deviation is denoted by the Greek letter σ (sigma).

Standard Deviation for Individual Series

There are two methods of calculating Standard Deviation for Individual Series:

a) Deviations taken from Actual Mean
b) Deviations taken from Assumed Mean

a) Deviations taken from Actual Mean

This method is adopted when the mean is a whole number.

Step 1: Find out the actual mean of the series (X̄).

Step 2: Find out the deviation of each value from the mean (x = X − X̄).

Step 3: Square the deviations and take the total of the squared deviations, ∑x².

Step 4: Divide the total of the squared deviations (∑x²) by the number of observations, giving ∑x²/n.

The square root of ∑x²/n is the Standard Deviation. Thus

$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$

OR

$$\sigma = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$$

Where X = Values of the variable

n = Number of observations of the series




5. Coefficient of Variation

Definition of Coefficient of Variation: For the purpose of comparison, the standard deviation must be converted into a relative measure of dispersion. This relative measure is known as the coefficient of variation.

The coefficient of variation is obtained by dividing the standard deviation by the mean and multiplying the result by 100.

$$\text{Coefficient of Variation (C.V.)} = \frac{\text{Standard Deviation (S.D.)}}{\text{Mean}} \times 100$$

Problem no 3) Calculate the Range, M.D. (Mean Deviation), S.D. (Standard Deviation), Variance and C.V. (Coefficient of Variation) for the following data.

X 8 15 20 17 11 3 9
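
The measures defined in this exercise can be sketched together in Python as below. The data are hypothetical (not the series of Problem no 3), the function name is an illustration, the mean deviation is taken about the mean, and the variance uses the divisor n as in the formulas above.

```python
import math

def dispersion_ungrouped(values):
    """Range, mean deviation, variance, S.D. and C.V. for an individual series."""
    n = len(values)
    mean = sum(values) / n
    largest, smallest = max(values), min(values)
    variance = sum((x - mean) ** 2 for x in values) / n   # divisor n
    sd = math.sqrt(variance)
    return {
        "range": largest - smallest,
        "coeff_range": (largest - smallest) / (largest + smallest),
        "mean_deviation": sum(abs(x - mean) for x in values) / n,
        "variance": variance,
        "sd": sd,
        "cv": sd / mean * 100,
    }

# Hypothetical data, for illustration only.
result = dispersion_ungrouped([2, 4, 4, 4, 5, 5, 7, 9])
print(result)
```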

******************
Exercise No -5 Date:

Measures of Dispersion: Computation of range, mean deviation, quartile deviation, standard deviation and variance and respective relative measures (Grouped data)

1) Range for the continuous series

Range = L – S

Method 1: Where L = Upper boundary of the highest class
S = Lower boundary of the lowest class

Method 2: Where L = Mid value of the highest class
S = Mid value of the lowest class

Coefficient of Range:

$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$

2. Quartile Deviation and Coefficient of Quartile Deviation

Definition of Quartile Deviation: Quartile Deviation is half of the difference between the third and first quartiles. Hence it is also called the Semi Inter Quartile Range.

$$\mathrm{Q.D.} = \frac{Q_3 - Q_1}{2}$$

Definition of Coefficient of Quartile Deviation:

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

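Quartile Deviation

$$Q_1 = \text{Size of } \left(\frac{N + 1}{4}\right)^{\text{th}} \text{ item}$$

$$Q_3 = \text{Size of } \left(\frac{3(N + 1)}{4}\right)^{\text{th}} \text{ item}$$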

For Discrete Series or Grouped data

Problem no 1) calculate Q.D. and its coefficient of the following distribution.

Marks 10 20 30 40 50 60
No of students 4 7 15 8 7 2
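
For a discrete series, the quartile positions (N + 1)/4 and 3(N + 1)/4 can be located through the cumulative frequencies, in the same way as the median. The Python sketch below illustrates this with hypothetical data (not the distribution of Problem no 1). The function names are illustrative, and taking the first cumulative frequency that reaches the position is an assumption.

```python
from itertools import accumulate

def item_at(position, xs, cum):
    """Value of x whose cumulative frequency first reaches the position."""
    for x, cf in zip(xs, cum):
        if cf >= position:
            return x
    return xs[-1]

def quartile_deviation_discrete(xs, fs):
    """Return Q1, Q3, Q.D. and its coefficient; xs must be in ascending order."""
    n = sum(fs)
    cum = list(accumulate(fs))
    q1 = item_at((n + 1) / 4, xs, cum)
    q3 = item_at(3 * (n + 1) / 4, xs, cum)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

# Hypothetical distribution: N = 19, so Q1 is the 5th and Q3 the 15th item.
q1, q3, qd, coeff_qd = quartile_deviation_discrete([1, 2, 3, 4, 5], [2, 4, 5, 6, 2])
print(q1, q3, qd, coeff_qd)
```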

For Continuous series

Problem No 2) Calculate Q.D. and its coefficient of the following distribution.

Class Interval 10-20 20-30 30-40 40-50 50-60 60-70


Frequency 12 18 5 10 9 6
Mean Deviation for Discrete Series

Step 1 - Find out an average (mean, median or mode).
Step 2 - Find out the deviations of the variable values from the average, ignoring signs, and denote them by |D|.
Step 3 - Multiply the deviation of each value by its respective frequency and find out the total ∑f|D|.
Step 4 - Divide the total ∑f|D| by the total frequency N.

$$\mathrm{M.D.} = \frac{\sum f|D|}{N}$$

Mean Deviation for Continuous Series


The method of calculating mean deviation in a continuous series is same as in the
discrete series.

In continuous series we have to find out the mid points of the various classes and
take deviation of these points from the average selected.

$$\mathrm{M.D.} = \frac{\sum f|D|}{N}$$

where |D| = |m − average|

m = Mid point of the class

Standard Deviation ( S.D )

Standard Deviation for Discrete Series

There are three methods for calculating Standard Deviation in discrete series.

1) Actual Mean Method


2) Assumed Mean Method
3) Step Deviation Method

1) Actual Mean Method:

Steps - 1) Calculate the mean of the series.
2) Find the deviations of the various items from the mean, i.e. d = X − X̄.
3) Square the deviations (d²) and multiply them by the respective frequencies (f) to get fd².
4) Total the products (∑fd²) and then apply the formula

$$\sigma = \sqrt{\frac{\sum f d^2}{N}}$$

or, equivalently,

$$\sigma = \sqrt{\frac{\sum f X^2}{N} - \left(\frac{\sum f X}{N}\right)^2}$$

If the actual mean is a fraction, the calculation takes a lot of time and labour, so this method is rarely used in practice.

2) Assumed Mean Method:


Here deviations are taken not from an actual mean but from an assumed mean.
This method is used, if the given variable values are not in equal intervals.

3) Step Deviation Method :


This method is adopted when the variable values are in equal intervals.

a) Standard Deviation – Continuous Series


In continuous series , the method of calculating S.D is almost same as in a
discrete series.


But in a continuous series, the mid values of the class intervals are to be found.

$$\sigma = \sqrt{\frac{\sum f (m - \bar{X})^2}{N}}$$

$$\sigma = \sqrt{\frac{\sum f m^2}{N} - \left(\frac{\sum f m}{N}\right)^2}$$

Where m = the mid point of the class interval

N = the total sum of frequencies (N = ∑f)

f = the frequency of the respective class interval

Coefficient of Variation:

The standard deviation is an absolute measure of dispersion. It is expressed in terms of the units in which the original figures are collected and stated.

The corresponding relative measure is known as the coefficient of variation.

$$\text{Coefficient of Variation (C.V.)} = \frac{\text{S.D.}}{\text{Mean}} \times 100$$
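
The grouped-data standard deviation and coefficient of variation can be sketched in Python as follows. The sketch computes the variance both from deviations about the mean and from the shortcut form, so the two formulas can be compared. The function name and data are hypothetical. For a continuous series the class mid points are passed as the values.

```python
import math

def grouped_sd(values, freqs):
    """Return mean, S.D., C.V. and the shortcut-formula variance."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    # Direct form: sum of f * (x - mean)^2 divided by N.
    var_direct = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    # Shortcut form: sum(f x^2)/N - (sum(f x)/N)^2.
    var_short = sum(f * x * x for x, f in zip(values, freqs)) / n - mean ** 2
    sd = math.sqrt(var_direct)
    return mean, sd, sd / mean * 100, var_short

# Hypothetical classes 0-10, 10-20, 20-30 with mid points 5, 15, 25.
mean, sd, cv, var_short = grouped_sd([5, 15, 25], [1, 2, 1])
print(mean, round(sd, 4), round(cv, 2), var_short)
```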
Problem no -3) Calculate Range, M.D , S.D , Variance and C.V of the following data
( Ungrouped data )

X 8 15 20 17 11 3 9

X    D = X − X̄    |D|    (X − X̄)²
8
15
20
17
11
3
9

Problem no 4 ) Calculate the Mean, M.D , S.D & C.V of the following data

( Grouped Data )

X 5 3 8 4 5 9
f 1 3 5 7 2 1

X    f    fX    D = X − X̄    |D|    f|D|    (X − X̄)²    f(X − X̄)²
5
3
8
4
5
9

*************
Exercise No -6 Date:

Selection of random sample using simple random sampling

Definitions of different concepts


1) Population: It is the aggregate of all the individual units.
2) Sample: It is a small part of the population which reflects almost all the characteristics of that population.
3) Parameter: An unknown constant of the population is known as a parameter.
4) Statistic: A statistic is a function of observable random variables that does not involve any unknown parameter. A statistic is itself a random variable, e.g. the sample mean (x̄) and the sample variance (s²).
Need of Sampling

1) Representativeness : A sample should be so selected that it truly represents the


universe otherwise the results obtained may be misleading.

To ensure the representativeness the random method of selection should be used.

2) Adequacy : The size of sample should be adequate, otherwise it may not represent the
characteristics of the universe.

3) Independence : All items of the sample should be selected independently of one


another and all items of the universe should have the same chance of being selected in the
sample.

4) Homogeneity : There is no difference in the nature of units of the universe and that of
sample.

Sampling

• Purposive sampling
• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling
• Multistage Sampling

Purposive sampling
• The selection of units entirely depends on the choice of the investigator.

• This type of sampling is adopted when it is not possible to adopt any random
procedure for selection of sampling units.

• In this sampling procedure there is no involvement of probability. It is also called


as subjective sampling.

Simple Random Sampling

• The basic probability sampling method is simple random sampling.

• It is the simplest of all the probability sampling methods.

• It is used when the population is homogeneous.
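
As an illustration of simple random sampling, the Python sketch below numbers the population units from 1 to N and draws n of them without replacement, so that every unit has the same chance of selection. The function name, the sizes and the seed are hypothetical; a random number table, as commonly used in practicals, serves the same purpose.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)               # a seed makes the draw repeatable
    units = range(1, population_size + 1)   # units numbered 1 to N
    return sorted(rng.sample(units, sample_size))  # without replacement

# Hypothetical population of 50 units and sample of 5 units.
sample = simple_random_sample(population_size=50, sample_size=5, seed=231)
print(sample)
```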


Systematic Sampling

Systematic sampling is a simpler and quicker method compared to other methods.

Suppose that the population of size N is numbered from 1 to N.

Let the desired sample size be n.

The population can be divided into subgroups .


****************
Exercise No -7 Date:

Correlation : Computation of Karl Pearson’s Coefficient of Correlation with its test of


significance

Correlation: It is the degree of relationship between variables.

Correlation between two or more variables: It is an analysis of covariation between two or more variables (A. M. Tuttle).

Types of correlation

1) Positive and negative correlation


2) Simple , partial and multiple correlation
3) Linear and non- linear correlation

Methods of studying correlation

1) Scatter diagram / Graphical method
2) Karl Pearson’s coefficient of correlation, also called the covariance method or product moment correlation coefficient
3) Spearman’s rank correlation coefficient

1) Karl Pearson’s coefficient of correlation:

Karl Pearson, a great biometrician and statistician, suggested a mathematical method for measuring the magnitude of the linear relationship between two variables.

It is the most widely used method in practice and is known as Pearson’s Coefficient of Correlation. It is denoted by ‘r’.

$$r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}} = \frac{\mathrm{Cov}(x, y)}{\mathrm{SD}(x)\,\mathrm{SD}(y)}$$
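
The coefficient can be computed from the column totals ∑X, ∑Y, ∑X², ∑Y² and ∑XY, as in the Python sketch below. The data and function name are hypothetical. The test statistic t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom is the usual one for testing the significance of r; it is not written out in this manual, so it is included here as an assumption, and the calculated t still has to be compared with the table value.

```python
import math

def pearson_r(xs, ys):
    """Return r and the t statistic for testing r = 0 (n - 2 d.f.)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    r = numerator / denominator
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # undefined if |r| = 1
    return r, t

# Hypothetical data, for illustration only.
r, t = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(t, 4))
```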

Problem no 1) Find Karl Pearson’s Coefficient of Correlation from the following


data between height of the father ( X ) and son ( Y ) . Comment on the result

X Y X2 Y2 XY
15 17
16 18
17 19
18 20
19 20
20 21
21 21
∑X= ∑Y= ∑ X2 = ∑Y2 = ∑ XY=

**************
Exercise No -8 Date:

Spearman’s rank Correlation

Rank Correlation: It is studied when no assumption is made about the parameters of the population.

This method is based on ranks. It is useful for studying qualitative attributes like honesty, colour, beauty and intelligence.

The individuals in the group can be arranged in order, thereby obtaining for each individual a number showing his/her rank in the group.

This method was developed by Charles Edward Spearman in 1904.

$$r_c = 1 - \frac{6 \sum D^2}{n^3 - n}$$

Where r_c = Rank correlation coefficient
n = Number of pairs of observations
∑D² = Sum of the squares of the differences between the pairs of ranks

r_c is +1 when there is complete agreement in the order of the ranks and the direction is the same.

r_c is −1 when there is complete disagreement in the order of the ranks and the direction is opposite.

Also, when some ranks are tied,

$$r_c = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots\right]}{n^3 - n}$$

Where m is the number of items whose ranks are common; one correction term is added for each group of tied ranks.


Problem no 1) Calculate the rank correlation coefficient for the following data
X    Y    Rank of X (Rx)    Rank of Y (Ry)    D = Rx − Ry    D²
80 69
85 21
83 71
81 48
78 57
50 29

Problem no 2) Calculate the rank correlation coefficient for the following data

X    Y    Rank of X (Rx)    Rank of Y (Ry)    D = Rx − Ry    D²
40 11
25 11
30 20
7 5
9 15
9 1
45 18
20 8
9 5
42 16

******************
Ex no 9 & 10 : Regression : Fitting of Simple linear regression equation with test of
significance of regression coefficient

Regression: The term is coined by British Biometrician Sir Francis Galton.

Regression gives a mathematical relationship between two variables.

The mathematical expression of the regression line is

Y = a + byx X

Where Y = dependent variable

X = independent variable

a = intercept (constant)

byx = regression coefficient (constant)

Line of regression of ‘Y’ on ‘X’

Y = a + byx X

Line of regression of ‘X’ on ‘Y’

X = a′ + bxy Y

byx = Regression coefficient of Y on X
bxy = Regression coefficient of X on Y
a, a′ = Intercepts of the two lines

Coefficient of regression of ‘Y’ on ‘X’

$$b_{yx} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$

Coefficient of regression of ‘X’ on ‘Y’

$$b_{xy} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(y)} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2}$$

Intercepts

$$a = \bar{Y} - b_{yx}\,\bar{X}$$

$$a' = \bar{X} - b_{xy}\,\bar{Y}$$

Problem no 1) The following data are for the amount of water supplied in inches and the yield in tonnes per acre. Find the regression equation of yield on water.

Water( X) 12 18 24 30 36 42 48

Yield ( Y ) 5.3 5.7 6.7 7.2 8.2 8.7 8.4
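
Fitting the line of regression of Y on X can be sketched in Python as below, using the deviation form of the regression coefficient. The data and the function name are hypothetical (not the water and yield data of Problem no 1), and the test of significance of the regression coefficient is not covered by this sketch.

```python
def fit_y_on_x(xs, ys):
    """Return (a, b_yx) for the line Y = a + b_yx * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products and sum of squares of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b_yx = sxy / sxx
    a = y_bar - b_yx * x_bar
    return a, b_yx

# Hypothetical data, for illustration only.
a, b_yx = fit_y_on_x([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"Y = {a:.2f} + {b_yx:.2f} X")
```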

************
Exercise No -11 Date:

Test of Significance : Problems on One Sample, Two Sample and Paired t-test

• Problems on One Sample

Problem no 1) The 9 items of a sample had the following values:

45 47 50 52 48 47 49 53 51
Does the mean of the 9 items differ significantly from the population mean of 47.5?

 Problem on Two Sample

Problem no 2) Two independent samples of 8 and 7 items had the following data. Test whether the difference between the means of the samples is significant.

Sample (I) (X)    Sample (II) (Y)    X²        Y²
9                 10
11                12
13                10
11                14
15                09
12                10
09                08
14                --                           --
∑X =              ∑Y =               ∑X² =     ∑Y² =
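
The two sample calculation can be sketched in Python using the same column totals as the table. The sketch assumes the pooled-variance form of the test (equal population variances), with t = (x̄ - ȳ) / √(s²(1/n1 + 1/n2)) and n1 + n2 - 2 degrees of freedom; the variable names are illustrative.

```python
import math

x = [9, 11, 13, 11, 15, 12, 9, 14]     # Sample I
y = [10, 12, 10, 14, 9, 10, 8]         # Sample II

n1, n2 = len(x), len(y)
mean_x = sum(x) / n1
mean_y = sum(y) / n2

# Sums of squared deviations from the table columns: sum(X^2) - (sum X)^2 / n
ss_x = sum(v ** 2 for v in x) - sum(x) ** 2 / n1
ss_y = sum(v ** 2 for v in y) - sum(y) ** 2 / n2

pooled_var = (ss_x + ss_y) / (n1 + n2 - 2)          # pooled estimate of variance
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2)) # standard error of difference
t_two = (mean_x - mean_y) / se_diff
df_two = n1 + n2 - 2

print(f"sum X = {sum(x)}, sum Y = {sum(y)}")
print(f"t = {t_two:.3f}, d.f. = {df_two}")
```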

• Problem on Paired t-test

Problem no 3) An IQ test was administered to 5 persons before and after they were
trained. The results (IQ before and IQ after training) are given below.

IQ before training (X)    IQ after training (Y)    d = X – Y    d²
110                       120
120                       118
123                       125
132                       136
125                       121
∑X =                      ∑Y =                     ∑d =         ∑d² =
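
The paired calculation can be sketched in Python. The sketch assumes the usual statistic t = d̄ / (s_d / √n) with n - 1 degrees of freedom, where d = X - Y and s_d is the standard deviation of the differences with divisor n - 1; the variable names are illustrative.

```python
import math

before = [110, 120, 123, 132, 125]   # X, IQ before training
after = [120, 118, 125, 136, 121]    # Y, IQ after training

d = [b - a for b, a in zip(before, after)]        # d = X - Y
n = len(d)
mean_d = sum(d) / n
sum_d2 = sum(v ** 2 for v in d)                   # sum of d squared
ss_d = sum_d2 - sum(d) ** 2 / n                   # sum of squares of deviations of d
s_d = math.sqrt(ss_d / (n - 1))
t_paired = mean_d / (s_d / math.sqrt(n))
df_paired = n - 1

print(f"sum d = {sum(d)}, sum d^2 = {sum_d2}")
print(f"t = {t_paired:.4f}, d.f. = {df_paired}")
```

The sign of t depends on the direction of subtraction; for a two-sided test the absolute value of t is compared with the table value.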

**************
Exercise No -12 Date:

F test for equality of variance

Testing the ratio of variances:

Suppose we want to test whether two normal populations have the same variance or not.

Let X1, X2, X3, ..., Xn1 be a random sample of size n1 from the first population
with variance σ1² and

Y1, Y2, Y3, ..., Yn2 be a random sample of size n2 from the second population with
variance σ2².

Null Hypothesis :

H0 : σ1² = σ2² = σ²
i.e. the population variances are the same.

In other words, H0 states that the two independent estimates of the common population
variance do not differ significantly.

$$ f_0 = \frac{S_x^2}{S_y^2} $$

$$ S_x^2 = \frac{1}{n_1 - 1} \sum (x - \bar{x})^2 $$

$$ S_y^2 = \frac{1}{n_2 - 1} \sum (y - \bar{y})^2 $$

Note: The larger of the two variance estimates is always taken as the numerator of the
ratio, so the calculated F is never less than 1.

V1 = d.f. for the sample having the larger variance

V2 = d.f. for the sample having the smaller variance



If f0 > fe (Cal F > table F) : Reject H0

If f0 < fe (Cal F < table F) : Accept H0

Problem no 1) Two random samples drawn from two normal populations gave the
following data. Obtain unbiased estimates of the variances of the two populations and test
whether the populations have the same variance.

Sample A 20 16 26 27 23 22 18 24 15 19

Sample B 27 33 42 35 32 34 38 28 41 43

Problem no 2) Test the equality of variance of two samples given below and write
the conclusion

X 18 12 16 21 16 14 19 12

Y 9 11 12 15 14 16 15 14
F table value at 5 % level of significance 3.79
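
The calculation for Problem no 2 can be sketched in Python. The sketch assumes variance estimates with divisor n - 1, places the larger estimate in the numerator, and uses the table value 3.79 stated in the problem; the variable names are illustrative.

```python
import statistics

x = [18, 12, 16, 21, 16, 14, 19, 12]
y = [9, 11, 12, 15, 14, 16, 15, 14]
f_table = 3.79                       # 5 % table value given in the problem

var_x = statistics.variance(x)       # divisor n1 - 1
var_y = statistics.variance(y)       # divisor n2 - 1

# The larger variance goes in the numerator; v1 and v2 follow the same order
if var_x >= var_y:
    f_cal, v1, v2 = var_x / var_y, len(x) - 1, len(y) - 1
else:
    f_cal, v1, v2 = var_y / var_x, len(y) - 1, len(x) - 1

decision = "reject H0" if f_cal > f_table else "do not reject H0"

print(f"Var(X) = {var_x:.3f}, Var(Y) = {var_y:.3f}")
print(f"Cal F = {f_cal:.3f} with ({v1}, {v2}) d.f.: {decision}")
```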

Problem no 3) Test the equality of variance of two samples given below and write
the conclusion

X 1 4 7 3 2 9 11
Y 11 15 14 12 13 12
F table value at 5 % level of significance 5.59

**************
Exercise No – 13 & 14 Date:

Chi Square Test of Goodness of Fit, Chi square of independence of Attributes for

2 X 2 contingency table

Chi-square : The chi-square test is one of the simplest and most widely used non-parametric
tests in statistical work.

The chi-square (χ²) test was first used by Karl Pearson in 1900. Chi-square describes
the magnitude of the discrepancy between theory and observation.

$$ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} $$

Where O = Observed frequency
E = Expected frequency

2 X 2 Contingency table:
Under the null hypothesis of independence of attributes, the value of chi-square for a
2 X 2 contingency table is

$$ \chi^2 = \frac{N (ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} $$

where a, b, c and d are the cell frequencies of the 2 X 2 contingency table:

a        b        a+b

c        d        c+d

a+c      b+d      Total ( N )



Problem no -1) From the following data regarding colour of eyes of father and son ,
test whether the colour of son’s eyes is associated with that of father’s.

                          Eye colour of son

Eye colour of father      Brown      Black      Total

Brown                     230        148

Black                     51         471

Total

Chi square table value of ( 2 -1) ( 2- 1) d.f. is 3.84 . Write an inference.
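
The calculation for Problem no -1 can be sketched in Python. The sketch computes chi-square twice, once with the 2 X 2 short-cut formula N(ad - bc)² / ((a+b)(c+d)(a+c)(b+d)) and once with the general formula ∑(O - E)² / E, assuming each expected frequency is (row total × column total) / N; the variable names are illustrative.

```python
# 2 x 2 table for Problem no -1 (rows: father, columns: son)
a, b = 230, 148      # father brown: son brown, son black
c, d = 51, 471       # father black: son brown, son black
chi_table = 3.84     # table value for 1 d.f. given in the problem

n = a + b + c + d

# Short-cut formula for a 2 x 2 contingency table
chi_short = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# General formula: sum of (O - E)^2 / E, with E = row total * column total / N
observed = [[a, b], [c, d]]
row_totals = [a + b, c + d]
col_totals = [a + c, b + d]
chi_general = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        chi_general += (observed[i][j] - expected) ** 2 / expected

associated = chi_short > chi_table   # True means H0 of independence is rejected

print(f"N = {n}, chi-square = {chi_short:.2f} (general formula: {chi_general:.2f})")
print("Attributes associated" if associated else "Attributes independent")
```

Both formulas give the same value for a 2 X 2 table, which is a useful check on the arithmetic.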

Problem no -2 ) In an experiment on reimmunization of cattle, the following
results were obtained

Affected Unaffected Total

Affected 12 28

Unaffected 13 07

Total

Chi square table value of ( 2 -1) ( 2- 1) d.f. is 3.84 . Write an inference.

**************
Exercise No -15 Date :

Analysis of Variance : Analysis of Variance one way and two way classification

When we record observations from an experiment on yield or on the measurement of
any other character, we find that the observations vary greatly from one another. This
variation is due to a number of factors, known as sources of variation.

The portions of the total variation caused by the different sources are known as
components of variation.

Role of ANOVA-

A) It sorts and estimates the variance components.

B) It provides the test of significance.

• Analysis of Variance :
• Sum of Squares – It is the sum of the squares of the deviations of the variates
from their mean.
• Degrees of freedom – The number of degrees of freedom is one less than the
number of variates in the sample concerned.
1. If the number of variates is n, then the d.f. is ( n – 1 )
2. If the number of treatments is t, then the d.f. is ( t – 1 )
• Variance – It is obtained by dividing the Sum of Squares by the corresponding
degrees of freedom

Variance = Sum of Squares / Degrees of freedom

Assumptions of ANOVA-

1) Normality : The values in each group are normally distributed.

2) Homogeneity : The variance within each group should be equal for all groups
( σ1² = σ2² = σ3² = ................... = σc² )
3) Independent errors : The error should be independent for each
value.
One Way Classification:

In one way classification the data are classified according to only one criterion.

The null hypothesis

H0 : µ1 = µ2 = µ3 = µ4 = ..................... = µk

i.e. the arithmetic means of the populations from which the k samples were randomly drawn
are equal to one another.

The steps in carrying out the analysis are

1. Calculate the variance between the samples.
2. Calculate the variance within the samples.
3. Calculate the ratio F

F = Between-column variance / Within-column variance

F = S1² / S2²

4. Compare the calculated value of F with the table value of F for the corresponding degrees
of freedom at a certain critical level. Generally we take the 5 % level of significance.
5. If Cal F > table F – difference is significant

If Cal F < table F – difference is not significant

Note – The procedure for ANOVA is applicable for both equal and
unequal sample sizes.
Source of variation     SS (Sum of Squares)    V (Degrees of Freedom)    MS (Mean Square)        Variance Ratio F
Between the samples     SSC                    V1 = c – 1                MSC = SSC / (c – 1)     MSC / MSE
Within the samples      SSE                    V2 = n – c                MSE = SSE / (n – c)

Where

SST - Total Sum of Squares
SSC - Sum of Squares between the samples ( Columns )
SSE - Sum of Squares within the samples ( Rows )
MSC - Mean Sum of Squares between the samples
MSE - Mean Sum of Squares within the samples
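
The one way analysis can be sketched in Python. The data below are hypothetical and are not taken from this manual; they are chosen only to show the arithmetic. The sketch takes c as the number of samples and n as the total number of observations, and it also works when the samples are of unequal size.

```python
# Hypothetical data: three samples (columns); sizes need not be equal.
samples = [
    [8, 10, 12],
    [7, 9, 11],
    [14, 16, 18],
]

c = len(samples)                               # number of samples
n = sum(len(s) for s in samples)               # total number of observations
grand_mean = sum(sum(s) for s in samples) / n

# SSC: sum of squares between the samples
ssc = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
# SSE: sum of squares within the samples
sse = sum((v - sum(s) / len(s)) ** 2 for s in samples for v in s)
# SST: total sum of squares (should equal SSC + SSE)
sst = sum((v - grand_mean) ** 2 for s in samples for v in s)

msc = ssc / (c - 1)          # mean square between samples, d.f. = c - 1
mse = sse / (n - c)          # mean square within samples, d.f. = n - c
f_ratio = msc / mse

print(f"SSC = {ssc:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}")
print(f"F = {f_ratio:.2f} with ({c - 1}, {n - c}) d.f.")
```

The calculated F is then compared with the table value of F for (c - 1, n - c) degrees of freedom.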

• Analysis of Variance in two way classification model

In two way classification the data are classified according to two different criteria or
factors.

The procedure for analysis of variance is somewhat different from the one followed
while dealing with problems of one way classification.

In two way classification the ANOVA table takes the following form

Source of variation    SS (Sum of Squares)    Degrees of Freedom    MS (Mean Sum of Squares)          Variance Ratio F
Between columns        SSC                    c – 1                 MSC = SSC / (c – 1)               MSC / MSE
Between rows           SSR                    r – 1                 MSR = SSR / (r – 1)               MSR / MSE
Residual or error      SSE                    (c – 1)(r – 1)        MSE = SSE / ((r – 1)(c – 1))
Where
SSC - Sum of Squares between columns
SSR - Sum of Squares between rows
SSE - Sum of Squares due to error
Total number of degrees of freedom = ( cr – 1 )
Where c refers to the number of columns
r refers to the number of rows
No of d.f. between columns ( c – 1 )

No of d.f. between rows ( r – 1 )

No of d.f. for residual ( c – 1 ) ( r – 1 )

• Residual or error Sum of Squares

= Total Sum of Squares – Sum of Squares between columns – Sum of Squares between
rows

F ( v1, v2 ) = MSC / MSE

Where v1 = ( c – 1 ), v2 = ( c – 1 ) ( r – 1 )

F ( v1, v2 ) = MSR / MSE
Where v1 = ( r – 1 ), v2 = ( c – 1 ) ( r – 1 )

It should be carefully noted that v1 may not be the same in both cases: in one case v1 = ( c – 1 )
and in the other case v1 = ( r – 1 ).
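
The two way analysis can be sketched in Python. The table below is hypothetical and is not taken from this manual; it has one observation per cell. The sketch uses the correction factor (grand total)² / (rc) to obtain the sums of squares, which is one common way of organising the arithmetic and is an assumption here.

```python
# Hypothetical r x c table with one observation per cell.
table = [
    [6, 8, 10],
    [7, 9, 14],
    [8, 13, 15],
]

r = len(table)                                   # number of rows
c = len(table[0])                                # number of columns
total = sum(sum(row) for row in table)
cf = total ** 2 / (r * c)                        # correction factor

sst = sum(v ** 2 for row in table for v in row) - cf                       # total SS
ssr = sum(sum(row) ** 2 for row in table) / c - cf                         # between rows
ssc = sum(sum(row[j] for row in table) ** 2 for j in range(c)) / r - cf    # between columns
sse = sst - ssc - ssr                                                      # residual

msc = ssc / (c - 1)
msr = ssr / (r - 1)
mse = sse / ((c - 1) * (r - 1))
f_columns = msc / mse        # v1 = c - 1, v2 = (c - 1)(r - 1)
f_rows = msr / mse           # v1 = r - 1, v2 = (c - 1)(r - 1)

print(f"SSC = {ssc:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}, SST = {sst:.1f}")
print(f"F (columns) = {f_columns:.2f}, F (rows) = {f_rows:.2f}")
```

Each F is compared with the table value of F for its own pair of degrees of freedom.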

*************
