

Introduction to Sampling Theory

Lecture 33
Cluster Sampling

Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Slides can be downloaded from


http://home.iitk.ac.in/~shalab/sp
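Estimation of Population Mean:
Consider the mean of the n sampled cluster means as an estimator of the population mean:

$$\bar{y}_{cl} = \frac{1}{n}\sum_{i=1}^{n} \bar{y}_i .$$

It is unbiased, since

$$E(\bar{y}_{cl}) = \frac{1}{n}\sum_{i=1}^{n} E(\bar{y}_i) = \bar{Y}.$$

Its variance is

$$Var(\bar{y}_{cl}) = E(\bar{y}_{cl} - \bar{Y})^2 = \frac{N-n}{Nn} S_b^2, \qquad S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2,$$

and an unbiased estimator of this variance is

$$\widehat{Var}(\bar{y}_{cl}) = \frac{N-n}{Nn} s_b^2, \qquad s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n} (\bar{y}_i - \bar{y}_{cl})^2 .$$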
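The estimator and its variance estimate can be computed directly from the sampled clusters. The sketch below assumes the sample is supplied as a list of n equal-size clusters (each a list of its M observed values) drawn by SRSWOR from N clusters; the helper name `cluster_mean_estimate` and the toy numbers are hypothetical, chosen only for illustration.

```python
from statistics import mean, variance

def cluster_mean_estimate(sample_clusters, N):
    """Return (y_cl, estimated Var(y_cl)) for n clusters drawn by SRSWOR from N.

    Hypothetical helper: sample_clusters is a list of n lists, one per sampled cluster.
    """
    n = len(sample_clusters)
    if n < 2:
        raise ValueError("need at least two sampled clusters to compute s_b^2")
    cluster_means = [mean(c) for c in sample_clusters]  # y_bar_i for each sampled cluster
    y_cl = mean(cluster_means)                           # mean of the cluster means
    s_b2 = variance(cluster_means)                       # divisor n - 1
    var_hat = (N - n) / (N * n) * s_b2                   # (N - n)/(Nn) * s_b^2
    return y_cl, var_hat

# Two clusters of M = 3 units drawn from a hypothetical population of N = 4 clusters
y_cl, var_hat = cluster_mean_estimate([[2, 4, 6], [1, 5, 9]], N=4)
print(y_cl, var_hat)  # 4.5 0.125
```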

Comparison with SRS:
If an equivalent sample of nM units were to be selected from the population of NM units by SRSWOR, the variance of the mean per element would be

$$Var(\bar{y}_{nM}) = \frac{NM - nM}{NM}\cdot\frac{S^2}{nM} = \frac{f}{n}\cdot\frac{S^2}{M},$$

where

$$f = \frac{N-n}{N} \quad\text{and}\quad S^2 = \frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 .$$

Also

$$Var(\bar{y}_{cl}) = \frac{N-n}{Nn} S_b^2 = \frac{f}{n} S_b^2 .$$
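Consider

$$
\begin{aligned}
(NM-1)S^2 &= \sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 \\
&= \sum_{i=1}^{N}\sum_{j=1}^{M} \left[ (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{Y}) \right]^2 \\
&= \sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{N}\sum_{j=1}^{M} (\bar{y}_i - \bar{Y})^2 \\
&= N(M-1)S_w^2 + M(N-1)S_b^2 ,
\end{aligned}
$$

where the cross-product term vanishes because $\sum_{j=1}^{M} (y_{ij} - \bar{y}_i) = 0$ for every cluster $i$. Here

$$S_w^2 = \frac{1}{N}\sum_{i=1}^{N} S_i^2$$

is the mean sum of squares within clusters in the population, and

$$S_i^2 = \frac{1}{M-1}\sum_{j=1}^{M} (y_{ij} - \bar{y}_i)^2$$

is the mean sum of squares for the $i$th cluster.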
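A quick numerical check of the decomposition $(NM-1)S^2 = N(M-1)S_w^2 + M(N-1)S_b^2$ on a small hypothetical population of N = 4 clusters of M = 3 units. The data and the helper name `variance_components` are illustrative assumptions, not part of the lecture.

```python
from statistics import mean, variance

def variance_components(population):
    """Return (S2, Sb2, Sw2) for a list of N equal-size clusters (hypothetical helper)."""
    units = [y for cluster in population for y in cluster]
    S2 = variance(units)                             # divisor NM - 1
    Sb2 = variance([mean(c) for c in population])    # divisor N - 1
    Sw2 = mean([variance(c) for c in population])    # average of S_i^2 (each divisor M - 1)
    return S2, Sb2, Sw2

population = [[2, 4, 6], [3, 5, 7], [1, 5, 9], [4, 4, 4]]  # hypothetical data
N, M = len(population), len(population[0])
S2, Sb2, Sw2 = variance_components(population)

lhs = (N * M - 1) * S2
rhs = N * (M - 1) * Sw2 + M * (N - 1) * Sb2
print(lhs, rhs)  # both sides agree up to floating-point rounding
```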
The efficiency of cluster sampling over SRSWOR is

$$E = \frac{Var(\bar{y}_{nM})}{Var(\bar{y}_{cl})} = \frac{S^2}{M S_b^2} = \frac{1}{NM-1}\left[ \frac{N(M-1)}{M}\cdot\frac{S_w^2}{S_b^2} + (N-1) \right].$$

Thus the relative efficiency increases when $S_w^2$ is large and $S_b^2$ is small. So cluster sampling will be efficient if the clusters are formed so that the variation between the cluster means is as small as possible while the variation within the clusters is as large as possible.
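The two expressions for E can be compared numerically. This sketch takes $S_w^2$, $S_b^2$, N and M as inputs, rebuilds $S^2$ from the decomposition, and evaluates both forms; the helper name and the input values (summaries of a small hypothetical population) are assumptions for illustration only.

```python
def relative_efficiency(Sw2, Sb2, N, M):
    """Return E = S^2 / (M S_b^2) computed two ways (hypothetical helper)."""
    S2 = (N * (M - 1) * Sw2 + M * (N - 1) * Sb2) / (N * M - 1)  # from the decomposition
    direct = S2 / (M * Sb2)
    expanded = (N * (M - 1) / M * Sw2 / Sb2 + (N - 1)) / (N * M - 1)
    return direct, expanded

# Hypothetical summaries: N = 4 clusters of M = 3 units
direct, expanded = relative_efficiency(Sw2=6.0, Sb2=1/3, N=4, M=3)
print(direct, expanded)  # both about 4.64 (= 51/11)

# Clusters that are internally more homogeneous (smaller S_w^2) give a smaller E
print(relative_efficiency(Sw2=1.0, Sb2=1/3, N=4, M=3)[0])  # about 1.0
```

Holding $S_b^2$ fixed and shrinking $S_w^2$ lowers E, which is the point made in the text about forming heterogeneous clusters.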
Efficiency in Terms of Intraclass Correlation:
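The intraclass correlation between the elements within a cluster is given by

$$
\begin{aligned}
\rho &= \frac{E(y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{E(y_{ij} - \bar{Y})^2}, \qquad -\frac{1}{M-1} \le \rho \le 1 \\
&= \frac{\dfrac{1}{MN(M-1)}\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k(\ne j)=1}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{\dfrac{1}{MN}\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{Y})^2} \\
&= \frac{\dfrac{1}{MN(M-1)}\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k(\ne j)=1}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{\left(\dfrac{MN-1}{MN}\right) S^2} \\
&= \frac{\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k(\ne j)=1}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{(MN-1)(M-1)S^2}.
\end{aligned}
$$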
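The last expression for $\rho$ can be evaluated directly by looping over all ordered pairs $(j, k)$, $k \ne j$, within each cluster. This is a brute-force sketch on a hypothetical population; the helper name `intraclass_correlation` is an assumption for illustration.

```python
from statistics import mean, variance

def intraclass_correlation(population):
    """rho from the triple-sum formula; population is N clusters of M units (hypothetical helper)."""
    N, M = len(population), len(population[0])
    units = [y for cluster in population for y in cluster]
    Y_bar = mean(units)        # population mean
    S2 = variance(units)       # divisor MN - 1
    cross = 0.0
    for cluster in population:
        for j in range(M):
            for k in range(M):
                if k != j:     # ordered pairs of distinct units in the same cluster
                    cross += (cluster[j] - Y_bar) * (cluster[k] - Y_bar)
    return cross / ((M * N - 1) * (M - 1) * S2)

population = [[2, 4, 6], [3, 5, 7], [1, 5, 9], [4, 4, 4]]  # hypothetical data
M = len(population[0])
rho = intraclass_correlation(population)
print(rho, -1 / (M - 1) <= rho <= 1)  # rho lies within its admissible range
```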
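Consider

$$
\begin{aligned}
\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2 &= \sum_{i=1}^{N}\left[ \frac{1}{M}\sum_{j=1}^{M} (y_{ij} - \bar{Y}) \right]^2 \\
&= \sum_{i=1}^{N}\left[ \frac{1}{M^2}\sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 + \frac{1}{M^2}\sum_{j=1}^{M}\sum_{k(\ne j)=1}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) \right],
\end{aligned}
$$

so that

$$\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k(\ne j)=1}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) = M^2\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2 - \sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{Y})^2$$

or

$$\rho\,(MN-1)(M-1)S^2 = M^2(N-1)S_b^2 - (NM-1)S^2$$

or

$$S_b^2 = \frac{MN-1}{M^2(N-1)}\left[ 1 + (M-1)\rho \right] S^2 .$$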
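The final relation between $S_b^2$, $\rho$ and $S^2$ can be checked numerically. This sketch computes the within-cluster cross products with the identity $\sum_j\sum_{k\ne j} d_j d_k = (\sum_j d_j)^2 - \sum_j d_j^2$ (with $d_j = y_{ij} - \bar{Y}$), which is the step used in the derivation, and then compares $S_b^2$ computed directly with the right-hand side. The population is hypothetical.

```python
from statistics import mean, variance

population = [[2, 4, 6], [3, 5, 7], [1, 5, 9], [4, 4, 4]]  # hypothetical data
N, M = len(population), len(population[0])
units = [y for cluster in population for y in cluster]
Y_bar, S2 = mean(units), variance(units)

# Within-cluster cross products: (sum of deviations)^2 minus sum of squared deviations
cross = sum(sum(y - Y_bar for y in c) ** 2 - sum((y - Y_bar) ** 2 for y in c)
            for c in population)
rho = cross / ((M * N - 1) * (M - 1) * S2)

Sb2_direct = variance([mean(c) for c in population])
Sb2_via_rho = (M * N - 1) / (M ** 2 * (N - 1)) * (1 + (M - 1) * rho) * S2
print(Sb2_direct, Sb2_via_rho)  # agree up to floating-point rounding
```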

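The variance of $\bar{y}_{cl}$ now becomes

$$Var(\bar{y}_{cl}) = \frac{N-n}{Nn} S_b^2 = \frac{N-n}{Nn}\cdot\frac{MN-1}{N-1}\cdot\frac{S^2}{M^2}\left[ 1 + (M-1)\rho \right].$$

For large $N$, $\dfrac{MN-1}{MN} \approx 1$, $\dfrac{N-n}{N} \approx 1$ and $\dfrac{N}{N-1} \approx 1$, and so

$$Var(\bar{y}_{cl}) \approx \frac{S^2}{nM}\left[ 1 + (M-1)\rho \right].$$

The variance of the sample mean under SRSWOR for large $N$ is

$$Var(\bar{y}_{nM}) \approx \frac{S^2}{nM}.$$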
The relative efficiency for large $N$ is now given by

$$E = \frac{Var(\bar{y}_{nM})}{Var(\bar{y}_{cl})} = \frac{\dfrac{S^2}{nM}}{\dfrac{S^2}{nM}\left[ 1 + (M-1)\rho \right]} = \frac{1}{1 + (M-1)\rho}, \qquad -\frac{1}{M-1} \le \rho \le 1 .$$
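The large-N efficiency is a one-line function of $\rho$ and $M$. In this sketch (hypothetical helper name), the endpoint $\rho = -1/(M-1)$ is rejected because E is unbounded there, and $M = 1$ is treated as plain SRS.

```python
def efficiency_large_n(rho, M):
    """Large-N relative efficiency E = 1 / (1 + (M - 1) rho) (hypothetical helper)."""
    if M == 1:
        return 1.0  # each cluster is a single unit, so cluster sampling is SRS
    if not (-1 / (M - 1) < rho <= 1):
        raise ValueError("need -1/(M-1) < rho <= 1 for a finite E")
    return 1 / (1 + (M - 1) * rho)

for rho in (-0.05, 0.0, 0.1):
    print(rho, efficiency_large_n(rho, M=11))
```

With $M = 11$, even a small positive $\rho = 0.1$ halves the efficiency, while $\rho = -0.05$ doubles it.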

- If $M = 1$, then $E = 1$, i.e., SRS and cluster sampling are equally efficient: each cluster consists of a single unit, so cluster sampling reduces to SRS.
- If $M > 1$, then cluster sampling is more efficient than SRS when $E > 1$, i.e., when $(M-1)\rho < 0$, i.e., when $\rho < 0$.
- If $\rho = 0$, then $E = 1$, i.e., there is no correlation between the units within a cluster, which means that the units in each cluster are arranged randomly, so the sample is heterogeneous.

- In practice, $\rho$ is usually positive, and $\rho$ decreases as $M$ increases, but much more slowly than $M$ increases; so $(M-1)\rho$, and with it the loss of efficiency relative to SRS, generally grows with $M$.
- The situation $\rho > 0$ arises when nearby units are grouped together to form clusters that are then completely enumerated.
- There are also situations in which $\rho < 0$.

Estimation of Relative Efficiency:
The relative efficiency of cluster sampling relative to an equivalent SRSWOR is

$$E = \frac{S^2}{M S_b^2}.$$

An estimator of E can be obtained by substituting the estimates of $S^2$ and $S_b^2$.

Since $\bar{y}_{cl} = \frac{1}{n}\sum_{i=1}^{n} \bar{y}_i$ is the mean of $n$ means $\bar{y}_i$ drawn by SRSWOR from the population of $N$ means $\bar{y}_i,\ i = 1, 2, \ldots, N$, it follows from the theory of SRSWOR that

$$E(s_b^2) = E\left[ \frac{1}{n-1}\sum_{i=1}^{n} (\bar{y}_i - \bar{y}_{cl})^2 \right] = \frac{1}{N-1}\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2 = S_b^2 .$$

Thus $s_b^2$ is an unbiased estimator of $S_b^2$.
Since $s_w^2 = \frac{1}{n}\sum_{i=1}^{n} S_i^2$ is the mean of the $n$ mean sums of squares $S_i^2$ (known exactly, because each sampled cluster is completely enumerated) drawn by SRSWOR from the population of $N$ mean sums of squares $S_i^2,\ i = 1, 2, \ldots, N$, it follows from the theory of SRSWOR that

$$E(s_w^2) = E\left( \frac{1}{n}\sum_{i=1}^{n} S_i^2 \right) = \frac{1}{N}\sum_{i=1}^{N} S_i^2 = S_w^2 .$$

Thus $s_w^2$ is an unbiased estimator of $S_w^2$.
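Because SRSWOR makes every subset of n clusters equally likely, both expectations can be checked exactly on a tiny population by averaging over all $\binom{N}{n}$ possible samples. The population and the choice n = 2 below are hypothetical.

```python
from itertools import combinations
from statistics import mean, variance

population = [[2, 4, 6], [3, 5, 7], [1, 5, 9], [4, 4, 4]]  # hypothetical data
N, n = len(population), 2

Sb2 = variance([mean(c) for c in population])  # population S_b^2
Sw2 = mean([variance(c) for c in population])  # population S_w^2

# Every SRSWOR sample of n clusters is equally likely, so E(.) is a plain average
samples = list(combinations(population, n))
E_sb2 = mean([variance([mean(c) for c in s]) for s in samples])  # average of s_b^2
E_sw2 = mean([mean([variance(c) for c in s]) for s in samples])  # average of s_w^2
print(len(samples), E_sb2, Sb2, E_sw2, Sw2)
```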

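Consider

$$S^2 = \frac{1}{MN-1}\sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{Y})^2$$

or

$$
\begin{aligned}
(MN-1)S^2 &= \sum_{i=1}^{N}\sum_{j=1}^{M} \left[ (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{Y}) \right]^2 \\
&= \sum_{i=1}^{N}\sum_{j=1}^{M} \left[ (y_{ij} - \bar{y}_i)^2 + (\bar{y}_i - \bar{Y})^2 \right] \\
&= \sum_{i=1}^{N} (M-1)S_i^2 + M(N-1)S_b^2 \\
&= N(M-1)S_w^2 + M(N-1)S_b^2 .
\end{aligned}
$$

Since $s_w^2$ and $s_b^2$ are unbiased for $S_w^2$ and $S_b^2$, an unbiased estimator of $S^2$ can be obtained as

$$\hat{S}^2 = \frac{1}{MN-1}\left[ N(M-1)s_w^2 + M(N-1)s_b^2 \right].$$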
$$\widehat{Var}(\bar{y}_{cl}) = \frac{N-n}{Nn}\, s_b^2,$$

$$\widehat{Var}(\bar{y}_{nM}) = \frac{N-n}{Nn}\cdot\frac{\hat{S}^2}{M},$$

where $s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n} (\bar{y}_i - \bar{y}_{cl})^2$.

An estimate of the efficiency $E = \dfrac{S^2}{M S_b^2}$ is

$$\hat{E} = \frac{N(M-1)s_w^2 + M(N-1)s_b^2}{M(NM-1)s_b^2}.$$
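A minimal sketch of $\hat{S}^2$ and $\hat{E}$ computed from n completely enumerated clusters, assuming equal cluster sizes M and a known number N of clusters in the population. The helper name and the sample values are hypothetical.

```python
from statistics import mean, variance

def estimate_efficiency(sample_clusters, N):
    """Return (S2_hat, E_hat) from n completely enumerated clusters (hypothetical helper)."""
    M = len(sample_clusters[0])
    s_b2 = variance([mean(c) for c in sample_clusters])  # between clusters, divisor n - 1
    s_w2 = mean([variance(c) for c in sample_clusters])  # average within-cluster S_i^2
    if s_b2 == 0:
        raise ValueError("s_b^2 = 0: the efficiency estimate is undefined")
    S2_hat = (N * (M - 1) * s_w2 + M * (N - 1) * s_b2) / (M * N - 1)
    E_hat = S2_hat / (M * s_b2)  # same as [N(M-1)s_w^2 + M(N-1)s_b^2] / [M(NM-1)s_b^2]
    return S2_hat, E_hat

# Two clusters of M = 3 units from a hypothetical population of N = 4 clusters
S2_hat, E_hat = estimate_efficiency([[2, 4, 6], [1, 5, 9]], N=4)
print(S2_hat, E_hat)
```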

If $N$ is large so that $M(N-1) \approx MN$ and $MN - 1 \approx MN$, then

$$E \approx \frac{1}{M} + \left( \frac{M-1}{M} \right)\frac{S_w^2}{M S_b^2}$$

and its estimate is

$$\hat{E} \approx \frac{1}{M} + \left( \frac{M-1}{M} \right)\frac{s_w^2}{M s_b^2}.$$
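To see how quickly the large-N form approaches the exact expression, this sketch evaluates both for fixed, hypothetical values $S_w^2 = 6$, $S_b^2 = 1/3$, $M = 3$ and increasing N. The function names are illustrative assumptions.

```python
def efficiency_exact(Sw2, Sb2, N, M):
    """E = [N(M-1) S_w^2 + M(N-1) S_b^2] / [M(NM-1) S_b^2]."""
    return (N * (M - 1) * Sw2 + M * (N - 1) * Sb2) / (M * (N * M - 1) * Sb2)

def efficiency_approx(Sw2, Sb2, M):
    """Large-N form: 1/M + ((M-1)/M) * S_w^2 / (M S_b^2)."""
    return 1 / M + (M - 1) / M * Sw2 / (M * Sb2)

# Hypothetical variance components; the gap shrinks as N grows
for N in (10, 100, 10_000):
    print(N, efficiency_exact(6.0, 1/3, N, 3), efficiency_approx(6.0, 1/3, 3))
```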

