sp-sampling-lect-33
Lecture 33
Cluster Sampling
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
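\[
\bar{y}_{cl} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i
\]
\[
E(\bar{y}_{cl}) = \frac{1}{n}\sum_{i=1}^{n}E(\bar{y}_i) = \bar{Y}.
\]
\[
\operatorname{Var}(\bar{y}_{cl}) = E(\bar{y}_{cl}-\bar{Y})^2 = \frac{N-n}{Nn}S_b^2,
\qquad S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2
\]
\[
\widehat{\operatorname{Var}}(\bar{y}_{cl}) = \frac{N-n}{Nn}s_b^2,
\qquad s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i-\bar{y}_{cl})^2
\]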
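As a numerical illustration of the estimator and its estimated variance, the following Python sketch computes the mean of the sampled cluster means, the between-cluster mean square and the estimated variance. The function name `cluster_mean_estimate` and the data are hypothetical and not part of the lecture; equal cluster sizes M and SRSWOR of clusters are assumed.

```python
def cluster_mean_estimate(sample_clusters, N):
    """Estimate the population mean per element from n sampled clusters.

    sample_clusters: list of n clusters, each a list of M observations
                     (equal cluster sizes assumed).
    N: number of clusters in the population.
    Returns (ybar_cl, s_b2, var_hat).
    """
    n = len(sample_clusters)
    # Mean of each sampled cluster
    cluster_means = [sum(c) / len(c) for c in sample_clusters]
    # Estimator: mean of the n cluster means
    ybar_cl = sum(cluster_means) / n
    # Between-cluster mean square in the sample
    s_b2 = sum((m - ybar_cl) ** 2 for m in cluster_means) / (n - 1)
    # Estimated variance: (N - n) / (N n) * s_b^2
    var_hat = (N - n) / (N * n) * s_b2
    return ybar_cl, s_b2, var_hat


# Made-up data: n = 3 clusters of M = 4 elements drawn from N = 10 clusters.
sample = [[2, 4, 6, 8], [1, 3, 5, 7], [5, 7, 9, 11]]
ybar_cl, s_b2, var_hat = cluster_mean_estimate(sample, N=10)
print(ybar_cl, s_b2, var_hat)
```

For these made-up values the cluster means are 5, 4 and 8, so the estimate is 17/3 and the between-cluster mean square is 13/3.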
Comparison with SRS:
If an equivalent sample of nM units were to be selected from the
population of NM units by SRSWOR, the variance of the mean per
element would be
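\[
\operatorname{Var}(\bar{y}_{nM}) = \frac{NM-nM}{NM}\cdot\frac{S^2}{nM} = \frac{f}{n}\cdot\frac{S^2}{M},
\]
where
\[
f = \frac{N-n}{N}
\quad\text{and}\quad
S^2 = \frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2 .
\]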
Also
\[
\operatorname{Var}(\bar{y}_{cl}) = \frac{N-n}{Nn}S_b^2 = \frac{f}{n}S_b^2 .
\]
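Consider
\[
\begin{aligned}
(NM-1)S^2 &= \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2 \\
&= \sum_{i=1}^{N}\sum_{j=1}^{M}\left[(y_{ij}-\bar{y}_i)+(\bar{y}_i-\bar{Y})\right]^2 \\
&= \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2 + \sum_{i=1}^{N}\sum_{j=1}^{M}(\bar{y}_i-\bar{Y})^2 \\
&= N(M-1)\bar{S}_w^2 + M(N-1)S_b^2
\end{aligned}
\]
where
\[
\bar{S}_w^2 = \frac{1}{N}\sum_{i=1}^{N}S_i^2
\]
is the mean sum of squares within clusters in the population, and
\[
S_i^2 = \frac{1}{M-1}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2
\]
is the mean sum of squares for the $i$th cluster.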
The efficiency of cluster sampling over SRSWOR is
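\[
E = \frac{\operatorname{Var}(\bar{y}_{nM})}{\operatorname{Var}(\bar{y}_{cl})}
= \frac{S^2}{M S_b^2}
= \frac{1}{(NM-1)}\left[\frac{N(M-1)}{M}\cdot\frac{\bar{S}_w^2}{S_b^2} + (N-1)\right].
\]
Thus the relative efficiency increases when $\bar{S}_w^2$ is large and $S_b^2$ is small.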
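To see how the efficiency responds to the within-cluster and between-cluster mean squares, here is a minimal Python sketch that computes them for a complete population and evaluates E. The function names and the two toy populations are illustrative assumptions; the same nine values are grouped once into internally similar clusters and once into internally mixed clusters.

```python
def population_components(pop):
    """pop: list of N clusters, each a list of M values (equal sizes assumed).

    Returns (S2, Sb2, Sw2_bar): total, between-cluster and mean
    within-cluster mean squares as defined in the text.
    """
    N, M = len(pop), len(pop[0])
    Y = sum(sum(c) for c in pop) / (N * M)      # population mean per element
    means = [sum(c) / M for c in pop]           # cluster means
    S2 = sum((y - Y) ** 2 for c in pop for y in c) / (N * M - 1)
    Sb2 = sum((m - Y) ** 2 for m in means) / (N - 1)
    Si2 = [sum((y - m) ** 2 for y in c) / (M - 1) for c, m in zip(pop, means)]
    Sw2_bar = sum(Si2) / N
    return S2, Sb2, Sw2_bar


def efficiency(pop):
    """Efficiency of cluster sampling over SRSWOR: E = S^2 / (M * Sb^2)."""
    M = len(pop[0])
    S2, Sb2, _ = population_components(pop)
    return S2 / (M * Sb2)


# Hypothetical populations built from the same values 1..9 (N = 3, M = 3).
homogeneous = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # similar units together
mixed = [[1, 5, 9], [2, 4, 8], [3, 6, 7]]         # each cluster spans the range

E_homogeneous = efficiency(homogeneous)
E_mixed = efficiency(mixed)
print(E_homogeneous, E_mixed)
```

In this toy setting the mixed grouping has a large within-cluster mean square and a small between-cluster mean square, so its efficiency exceeds 1, while the homogeneous grouping gives an efficiency below 1.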
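Let $\rho$ denote the intraclass correlation coefficient between the elements within a cluster, defined as
\[
\begin{aligned}
\rho &= \frac{E(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})}{E(y_{ij}-\bar{Y})^2} \\
&= \frac{\dfrac{1}{MN(M-1)}\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k(\ne j)=1}^{M}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})}
{\dfrac{1}{MN}\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2} \\
&= \frac{\dfrac{1}{MN(M-1)}\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k(\ne j)=1}^{M}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})}
{\dfrac{MN-1}{MN}S^2} \\
&= \frac{\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k(\ne j)=1}^{M}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})}{(MN-1)(M-1)S^2}.
\end{aligned}
\]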
Efficiency in Terms of Intraclass Correlation:
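Consider
\[
\begin{aligned}
\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2
&= \sum_{i=1}^{N}\left[\frac{1}{M}\sum_{j=1}^{M}(y_{ij}-\bar{Y})\right]^2 \\
&= \sum_{i=1}^{N}\left[\frac{1}{M^2}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2
+ \frac{1}{M^2}\sum_{j=1}^{M}\sum_{k(\ne j)=1}^{M}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})\right]
\end{aligned}
\]
\[
\Rightarrow \sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k(\ne j)=1}^{M}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})
= M^2\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2 - \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2
\]
or
\[
\rho\,(MN-1)(M-1)S^2 = M^2(N-1)S_b^2 - (NM-1)S^2
\]
or
\[
S_b^2 = \frac{(MN-1)}{M^2(N-1)}\left[1+\rho(M-1)\right]S^2 .
\]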
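The relation between the between-cluster mean square and the intraclass correlation can be checked numerically. The sketch below computes the intraclass correlation directly from its definition for a small made-up population and compares the between-cluster mean square obtained from the cluster means with the value implied by the identity; the function name and the data are hypothetical.

```python
def intraclass_correlation(pop):
    """pop: list of N clusters, each a list of M values (equal sizes assumed).

    Returns (rho, S2, Sb2), with rho computed from its definition as the
    sum of within-cluster cross products over (MN - 1)(M - 1) S^2.
    """
    N, M = len(pop), len(pop[0])
    Y = sum(sum(c) for c in pop) / (N * M)
    S2 = sum((y - Y) ** 2 for c in pop for y in c) / (N * M - 1)
    # Cross products over all ordered pairs j != k within each cluster
    cross = sum((c[j] - Y) * (c[k] - Y)
                for c in pop for j in range(M) for k in range(M) if k != j)
    rho = cross / ((M * N - 1) * (M - 1) * S2)
    Sb2 = sum((sum(c) / M - Y) ** 2 for c in pop) / (N - 1)
    return rho, S2, Sb2


# Hypothetical population: N = 3 clusters of M = 3 similar units each.
population = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
N, M = len(population), len(population[0])
rho, S2, Sb2 = intraclass_correlation(population)

# Between-cluster mean square implied by the identity derived in the text
Sb2_from_rho = (M * N - 1) / (M ** 2 * (N - 1)) * (1 + rho * (M - 1)) * S2
print(rho, Sb2, Sb2_from_rho)
```

For this population the intraclass correlation is 0.85, and both routes give a between-cluster mean square of 9 up to floating-point rounding.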
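The variance of $\bar{y}_{cl}$ now becomes
\[
\operatorname{Var}(\bar{y}_{cl}) = \frac{N-n}{Nn}S_b^2
= \frac{N-n}{Nn}\cdot\frac{MN-1}{N-1}\cdot\frac{S^2}{M^2}\left[1+(M-1)\rho\right].
\]
For large $N$, $\dfrac{MN-1}{MN}\approx 1$, $N-1\approx N$, $\dfrac{N-n}{N}\approx 1$ and so
\[
\operatorname{Var}(\bar{y}_{cl}) \approx \frac{1}{n}\cdot\frac{S^2}{M}\left[1+(M-1)\rho\right].
\]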
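\[
E = \frac{\operatorname{Var}(\bar{y}_{nM})}{\operatorname{Var}(\bar{y}_{cl})}
\approx \frac{\dfrac{S^2}{nM}}{\dfrac{S^2}{nM}\left[1+(M-1)\rho\right]}
= \frac{1}{1+(M-1)\rho}; \qquad -\frac{1}{(M-1)} \le \rho \le 1.
\]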
If $M = 1$, then $E = 1$, i.e., SRS and cluster sampling are equally efficient. Each cluster consists of one unit, so cluster sampling reduces to SRS.
If $M > 1$, then cluster sampling is more efficient when
$E > 1$
or $(M-1)\rho < 0$
or $\rho < 0$.
If $\rho = 0$, then $E = 1$, i.e., cluster sampling is neither more nor less efficient than SRS. This means that the units in each cluster are arranged randomly. So the sample is heterogeneous.
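The cases above can be illustrated with a short Python sketch of the large-N approximation. The function name `approx_efficiency` and the numerical values are illustrative assumptions. The lower endpoint of the admissible range of the intraclass correlation is excluded in the check only because the approximation divides by zero there.

```python
def approx_efficiency(M, rho):
    """Large-N approximation E = 1 / (1 + (M - 1) * rho)."""
    if M < 1:
        raise ValueError("cluster size M must be at least 1")
    # Assumption of this sketch: reject rho outside (-1/(M-1), 1]
    if M > 1 and not (-1.0 / (M - 1) < rho <= 1.0):
        raise ValueError("rho must lie in (-1/(M-1), 1]")
    return 1.0 / (1.0 + (M - 1) * rho)


e_single = approx_efficiency(1, 0.3)      # M = 1: cluster sampling is SRS
e_negative = approx_efficiency(5, -0.1)   # rho < 0: E > 1
e_zero = approx_efficiency(5, 0.0)        # rho = 0: E = 1
e_positive = approx_efficiency(5, 0.1)    # rho > 0: E < 1
print(e_single, e_negative, e_zero, e_positive)
```

With clusters of size 5, a negative intraclass correlation of -0.1 gives an efficiency of about 1.67, while a positive value of 0.1 gives about 0.71.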
In practice, $\rho$ is usually positive and decreases as $M$ increases,
but the rate of decrease in $\rho$ is much lower in comparison to
the rate of increase in $M$.
Estimation of Relative Efficiency:
The relative efficiency of cluster sampling relative to an
equivalent SRSWOR is obtained as
\[
E = \frac{S^2}{M S_b^2}.
\]
An estimator of $E$ can be obtained by substituting the
estimates of $S^2$ and $S_b^2$.
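Since $\bar{y}_{cl} = \dfrac{1}{n}\sum_{i=1}^{n}\bar{y}_i$ is the mean of $n$ means $\bar{y}_i$ from a
population of $N$ means $\bar{y}_i$, $i = 1, 2, \ldots, N$, which are drawn by SRSWOR, it follows from the theory of SRSWOR that
\[
E(s_b^2) = S_b^2 .
\]
Thus $s_b^2$ is an unbiased estimator of $S_b^2$.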
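Since $\bar{s}_w^2 = \dfrac{1}{n}\sum_{i=1}^{n}S_i^2$ is the mean of $n$ mean sums of squares
$S_i^2$ drawn from the population of $N$ mean sums of squares $S_i^2$, $i = 1, 2, \ldots, N$, it follows from the theory of SRSWOR that
\[
E(\bar{s}_w^2) = \frac{1}{N}\sum_{i=1}^{N}S_i^2 = \bar{S}_w^2 .
\]
Thus $\bar{s}_w^2$ is an unbiased estimator of $\bar{S}_w^2$.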
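Consider
\[
S^2 = \frac{1}{MN-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2
\]
or
\[
\begin{aligned}
(MN-1)S^2 &= \sum_{i=1}^{N}\sum_{j=1}^{M}\left[(y_{ij}-\bar{y}_i)+(\bar{y}_i-\bar{Y})\right]^2 \\
&= \sum_{i=1}^{N}\sum_{j=1}^{M}\left[(y_{ij}-\bar{y}_i)^2+(\bar{y}_i-\bar{Y})^2\right] \\
&= \sum_{i=1}^{N}(M-1)S_i^2 + M(N-1)S_b^2 \\
&= N(M-1)\bar{S}_w^2 + M(N-1)S_b^2 .
\end{aligned}
\]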
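\[
\widehat{\operatorname{Var}}(\bar{y}_{nM}) = \frac{N-n}{Nn}\cdot\frac{\hat{S}^2}{M},
\]
where $\hat{S}^2$ is the estimate of $S^2$ obtained by substituting $\bar{s}_w^2$ and $s_b^2$ for $\bar{S}_w^2$ and $S_b^2$ in the expression for $(MN-1)S^2$, and
\[
s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i-\bar{y}_{cl})^2 .
\]
An estimate of the efficiency $E = \dfrac{S^2}{M S_b^2}$ is
\[
\hat{E} = \frac{N(M-1)\bar{s}_w^2 + M(N-1)s_b^2}{M(NM-1)s_b^2}.
\]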
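For large $N$, so that $N-1 \approx N$ and $NM-1 \approx NM$,
\[
E \approx \frac{1}{M} + \frac{M-1}{M}\cdot\frac{\bar{S}_w^2}{M S_b^2}
\]
and the corresponding estimate is
\[
\hat{E} = \frac{1}{M} + \frac{M-1}{M}\cdot\frac{\bar{s}_w^2}{M s_b^2}.
\]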
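The estimation steps can be sketched in Python as follows. The function name `estimate_efficiency` and the sample data are hypothetical. The sketch assumes equal cluster sizes M, SRSWOR of n clusters out of N, and complete enumeration of each sampled cluster, so that the within-cluster mean squares of the sampled clusters are known.

```python
def estimate_efficiency(sample_clusters, N):
    """Estimate the efficiency of cluster sampling relative to SRSWOR.

    sample_clusters: list of n clusters, each a list of M observations.
    N: number of clusters in the population.
    Returns (E_hat, E_hat_large_N).
    """
    n, M = len(sample_clusters), len(sample_clusters[0])
    means = [sum(c) / M for c in sample_clusters]
    ybar_cl = sum(means) / n
    # Between-cluster mean square in the sample
    s_b2 = sum((m - ybar_cl) ** 2 for m in means) / (n - 1)
    # Mean of the within-cluster mean squares of the sampled clusters
    s_w2_bar = sum(
        sum((y - m) ** 2 for y in c) / (M - 1)
        for c, m in zip(sample_clusters, means)
    ) / n
    # Estimate with finite N
    E_hat = (N * (M - 1) * s_w2_bar + M * (N - 1) * s_b2) / (M * (N * M - 1) * s_b2)
    # Large-N approximation
    E_hat_large_N = 1 / M + (M - 1) / M * s_w2_bar / (M * s_b2)
    return E_hat, E_hat_large_N


# Made-up data: n = 3 clusters of M = 4 elements drawn from N = 10 clusters.
sample = [[2, 4, 6, 8], [1, 3, 5, 7], [5, 7, 9, 11]]
E_hat, E_hat_large_N = estimate_efficiency(sample, N=10)
print(E_hat, E_hat_large_N)
```

For these made-up values the two estimates are close (about 0.53 and 0.54), and the gap between them shrinks as N grows.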