Chapter11 Sampling Systematic Sampling
Systematic Sampling
The systematic sampling technique is operationally more convenient than simple random sampling. At
the same time, it ensures that each unit has an equal probability of inclusion in the sample. In this
method of sampling, the first unit is selected with the help of random numbers, and the remaining units
are selected automatically according to a predetermined pattern. This method is known as systematic
sampling.
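Suppose the $N$ units in the population are numbered 1 to $N$ in some order. Suppose further that $N$ is
expressible as a product of two integers $n$ and $k$, so that $N = nk$.
To draw a sample of size $n$, select a random number $i$ between 1 and $k$ and take the unit with serial
number $i$ as the first unit. Then take every $k$th unit after it, so that the sample consists of the units
with serial numbers $i, i + k, i + 2k, \ldots, i + (n-1)k$.
So the first unit is selected at random and the other units are selected systematically. This systematic sample
is called a $k$th systematic sample, and $k$ is termed the sampling interval. This is also known as linear
systematic sampling.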
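The selection rule can be sketched in a few lines of Python. This is a minimal illustration, assuming $N = nk$ exactly. The function name `linear_systematic_sample` and the `rng` argument are hypothetical and chosen only for this example.

```python
import random

def linear_systematic_sample(N, n, rng=random):
    """Return the 1-based serial numbers of a linear systematic sample.

    Assumes N = n * k exactly, where k is the sampling interval.
    """
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n
    start = rng.randint(1, k)            # random start i with 1 <= i <= k
    return [start + j * k for j in range(n)]

# Example: N = 20 units, sample of size n = 5, so k = 4.
sample = linear_systematic_sample(N=20, n=5, rng=random.Random(0))
```

Only one random number is drawn. Every later unit is fixed once the start is known, which is what makes the method operationally convenient.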
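The observations in the systematic sampling are arranged as in the following table:
\[
\begin{array}{llccccccc}
\text{Systematic sample number} & & 1 & 2 & 3 & \cdots & i & \cdots & k \\
\hline
\text{Sample composition} & 1 & y_1 & y_2 & y_3 & \cdots & y_i & \cdots & y_k \\
 & 2 & y_{k+1} & y_{k+2} & y_{k+3} & \cdots & y_{k+i} & \cdots & y_{2k} \\
 & \vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots \\
 & n & y_{(n-1)k+1} & y_{(n-1)k+2} & y_{(n-1)k+3} & \cdots & y_{(n-1)k+i} & \cdots & y_{nk} \\
\hline
\text{Probability} & & \frac{1}{k} & \frac{1}{k} & \frac{1}{k} & \cdots & \frac{1}{k} & \cdots & \frac{1}{k} \\
\text{Sample mean} & & \bar{y}_1 & \bar{y}_2 & \bar{y}_3 & \cdots & \bar{y}_i & \cdots & \bar{y}_k
\end{array}
\]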
- Select the $(i, j)$th unit, i.e., the $j$th unit in the $i$th row, as the first unit.
- Then the rows to be selected are
\[ i,\ i + l,\ i + 2l,\ \ldots,\ i + (m-1)l \]
and the columns to be selected are
\[ j,\ j + k,\ j + 2k,\ \ldots,\ j + (n-1)k, \]
where $l$ and $k$ are the sampling intervals for rows and columns, respectively.
- The points at which the $m$ selected rows and $n$ selected columns intersect determine the positions
of the $mn$ selected units in the sample.
Such a sample is called an aligned sample.
Under certain conditions, an unaligned sample is often superior to an aligned sample as well as to a
stratified random sample.
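The aligned selection can be sketched as follows. This is only an illustration under stated assumptions: the population is taken to be laid out in $ml$ rows and $nk$ columns, the starting pair satisfies $i \le l$ and $j \le k$, and the name `l` for the row interval, like the function name `aligned_sample`, is chosen here for the example.

```python
import random

def aligned_sample(m, l, n, k, rng=random):
    """Positions (row, column) of an aligned two-dimensional systematic sample.

    Assumes m*l rows and n*k columns; l and k are the row and column
    sampling intervals (l is a name assumed for this sketch).
    """
    i = rng.randint(1, l)    # random starting row, 1 <= i <= l
    j = rng.randint(1, k)    # random starting column, 1 <= j <= k
    rows = [i + a * l for a in range(m)]
    cols = [j + b * k for b in range(n)]
    # Selected units sit where selected rows and selected columns intersect.
    return [(r, c) for r in rows for c in cols]

positions = aligned_sample(m=3, l=2, n=4, k=5, rng=random.Random(1))
```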
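Let $y_{ij}$ denote the value of the study variable for the $j$th unit in the $i$th systematic sample,
$i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n$.
Suppose the drawn random number is $i \le k$.
The sample then consists of the $i$th column (in the earlier table).
Consider the sample mean given by
\[ \bar{y}_{sy} = \bar{y}_i = \frac{1}{n} \sum_{j=1}^{n} y_{ij} \]
as an estimator of the population mean given by
\[ \bar{Y} = \frac{1}{nk} \sum_{i=1}^{k} \sum_{j=1}^{n} y_{ij} = \frac{1}{k} \sum_{i=1}^{k} \bar{y}_i . \]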
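The probability of selecting the $i$th column as the systematic sample is $\frac{1}{k}$.
So
\[ E(\bar{y}_{sy}) = \frac{1}{k} \sum_{i=1}^{k} \bar{y}_i = \bar{Y} . \]
Further,
\[ \operatorname{Var}(\bar{y}_{sy}) = \frac{1}{k} \sum_{i=1}^{k} (\bar{y}_i - \bar{Y})^2 . \]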
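Consider the population mean square $S^2 = \frac{1}{N-1} \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{Y})^2$. Then
\[
\begin{aligned}
(N-1) S^2 &= \sum_{i=1}^{k} \sum_{j=1}^{n} \left[ (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{Y}) \right]^2 \\
&= \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2 + n \sum_{i=1}^{k} (\bar{y}_i - \bar{Y})^2 \\
&= k(n-1) S_{wsy}^2 + n \sum_{i=1}^{k} (\bar{y}_i - \bar{Y})^2
\end{aligned}
\]
where
\[ S_{wsy}^2 = \frac{1}{k(n-1)} \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2 \]
is the variation among the units that lie within the same systematic sample. Thus
\[
\begin{aligned}
\operatorname{Var}(\bar{y}_{sy}) &= \frac{N-1}{N} S^2 - \frac{k(n-1)}{N} S_{wsy}^2 \\
&= \frac{N-1}{N} S^2 - \frac{n-1}{n} S_{wsy}^2
\end{aligned}
\]
with $N = nk$. The first term measures the variation of the population as a whole, and the second is the
pooled within variation of the $k$ systematic samples. This expression indicates that when the within
variation $S_{wsy}^2$ is large, $\operatorname{Var}(\bar{y}_{sy})$ becomes smaller. Thus higher heterogeneity within
the systematic samples makes the estimator more efficient, and such heterogeneity is to be
expected in a systematic sample.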
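The unbiasedness of $\bar{y}_{sy}$ and the variance decomposition above can be checked numerically by enumerating all $k$ possible systematic samples. The sketch below uses a small made-up population (the values in `y` are hypothetical) and exact rational arithmetic, so the identities hold without rounding error.

```python
from fractions import Fraction

def mean(xs):
    """Exact mean of a list of integers or Fractions."""
    return Fraction(sum(xs), len(xs))

y = [3, 7, 1, 8, 2, 9, 4, 6, 5, 10, 12, 11]   # hypothetical population, N = 12
k = 3                                          # sampling interval
cols = [y[i::k] for i in range(k)]             # the k possible systematic samples
n = len(cols[0])
N = n * k

Y_bar = mean(y)
col_means = [mean(c) for c in cols]

# Each sample has probability 1/k, so E and Var are plain averages over samples.
expected = mean(col_means)
var_sy = sum((m - Y_bar) ** 2 for m in col_means) / k

# Population mean square and pooled within-sample mean square.
S2 = sum((v - Y_bar) ** 2 for v in y) / Fraction(N - 1)
S2_wsy = sum((v - m) ** 2 for c, m in zip(cols, col_means) for v in c) / Fraction(k * (n - 1))

var_from_decomposition = Fraction(N - 1, N) * S2 - Fraction(n - 1, n) * S2_wsy
```

Here `expected` equals `Y_bar` and `var_sy` equals `var_from_decomposition` exactly.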
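The intraclass correlation between the pairs of units that are in the same systematic sample is
\[
\rho_w = \frac{\dfrac{1}{nk(n-1)} \displaystyle\sum_{i=1}^{k} \sum_{j(\ne l)=1}^{n} \sum_{l=1}^{n} (y_{ij} - \bar{Y})(y_{il} - \bar{Y})}{\dfrac{nk-1}{nk} S^2} .
\]
Now
\[
\operatorname{Var}(\bar{y}_{sy}) = \frac{1}{kn^2} \sum_{i=1}^{k} \Big[ \sum_{j=1}^{n} (y_{ij} - \bar{Y}) \Big]^2
= \frac{1}{kn^2} \Big[ (nk-1) S^2 + \sum_{i=1}^{k} \sum_{j(\ne l)=1}^{n} \sum_{l=1}^{n} (y_{ij} - \bar{Y})(y_{il} - \bar{Y}) \Big] .
\]
So substituting
\[
\sum_{i=1}^{k} \sum_{j(\ne l)=1}^{n} \sum_{l=1}^{n} (y_{ij} - \bar{Y})(y_{il} - \bar{Y}) = (n-1)(nk-1) \rho_w S^2
\]
in $\operatorname{Var}(\bar{y}_{sy})$ gives
\[
\begin{aligned}
\operatorname{Var}(\bar{y}_{sy}) &= \frac{nk-1}{nk} \cdot \frac{S^2}{n} \left[ 1 + \rho_w (n-1) \right] \\
&= \frac{N-1}{N} \cdot \frac{S^2}{n} \left[ 1 + \rho_w (n-1) \right] .
\end{aligned}
\]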
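As a numerical check of the intraclass-correlation form of the variance, the sketch below computes $\rho_w$ directly from its definition for a small made-up population (the values in `y` are hypothetical) and compares the resulting variance with the direct value $\frac{1}{k} \sum_i (\bar{y}_i - \bar{Y})^2$.

```python
from fractions import Fraction

y = [3, 7, 1, 8, 2, 9, 4, 6, 5, 10, 12, 11]   # hypothetical population
k = 3
cols = [y[i::k] for i in range(k)]             # the k systematic samples
n, N = len(cols[0]), len(y)

Y_bar = Fraction(sum(y), N)
S2 = sum((v - Y_bar) ** 2 for v in y) / (N - 1)

# Sum of cross products over ordered pairs j != l within each sample.
cross = sum(
    (c[j] - Y_bar) * (c[l] - Y_bar)
    for c in cols
    for j in range(n)
    for l in range(n)
    if j != l
)
rho_w = (cross / (N * (n - 1))) / (Fraction(N - 1, N) * S2)

var_direct = sum((Fraction(sum(c), n) - Y_bar) ** 2 for c in cols) / k
var_rho = Fraction(N - 1, N) * S2 / n * (1 + rho_w * (n - 1))
```

The two variances agree exactly. Because a variance cannot be negative, the computed $\rho_w$ can never fall below $-1/(n-1)$.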
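Comparing with SRSWOR, for which $\operatorname{Var}(\bar{y}_{SRS}) = \frac{N-n}{Nn} S^2$, we have
\[ \operatorname{Var}(\bar{y}_{SRS}) - \operatorname{Var}(\bar{y}_{sy}) = \frac{n-1}{n} \left( S_{wsy}^2 - S^2 \right) . \]
Thus $\bar{y}_{sy}$ is
- more efficient than $\bar{y}_{SRS}$ when $S_{wsy}^2 > S^2$.
- less efficient than $\bar{y}_{SRS}$ when $S_{wsy}^2 < S^2$.
- equally efficient as $\bar{y}_{SRS}$ when $S_{wsy}^2 = S^2$.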
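The relative efficiency of systematic sampling relative to SRSWOR is
\[
\begin{aligned}
RE &= \frac{\operatorname{Var}(\bar{y}_{SRS})}{\operatorname{Var}(\bar{y}_{sy})} \\
&= \frac{\dfrac{N-n}{Nn} S^2}{\dfrac{N-1}{Nn} S^2 \left[ 1 + \rho_w (n-1) \right]} \\
&= \frac{N-n}{N-1} \cdot \frac{1}{1 + \rho_w (n-1)} \\
&= \frac{n(k-1)}{(nk-1) \left[ 1 + \rho_w (n-1) \right]} ; \qquad -\frac{1}{n-1} \le \rho_w \le 1 .
\end{aligned}
\]
The lower bound on $\rho_w$ follows from $\operatorname{Var}(\bar{y}_{sy}) \ge 0$, i.e., $1 + \rho_w (n-1) \ge 0$.
Thus $\bar{y}_{sy}$ is
- more efficient than $\bar{y}_{SRS}$ when $\rho_w < -\dfrac{1}{nk-1}$
- less efficient than $\bar{y}_{SRS}$ when $\rho_w > -\dfrac{1}{nk-1}$
- equally efficient as $\bar{y}_{SRS}$ when $\rho_w = -\dfrac{1}{nk-1}$.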
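The threshold $-\frac{1}{nk-1}$ can be illustrated with a small helper. The function name `relative_efficiency` and the chosen values of $n$, $k$ and $\rho_w$ are hypothetical and serve only to show the three cases.

```python
from fractions import Fraction

def relative_efficiency(n, k, rho_w):
    """RE of systematic sampling relative to SRSWOR for N = n*k."""
    return Fraction(n * (k - 1)) / ((n * k - 1) * (1 + rho_w * (n - 1)))

n, k = 4, 3
threshold = Fraction(-1, n * k - 1)

re_at_threshold = relative_efficiency(n, k, threshold)         # equally efficient
re_below = relative_efficiency(n, k, Fraction(-1, 5))          # rho_w below the threshold
re_above = relative_efficiency(n, k, Fraction(0))              # rho_w above the threshold
```

With these inputs the relative efficiency is exactly 1 at the threshold, greater than 1 below it, and less than 1 above it.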
To compare with stratified sampling, view the table of systematic samples as a stratified set-up in which
each row is a stratum. We then have
- Number of strata = $n$
- Size of each stratum = $k$ (row size)
- Sample size to be drawn from each stratum = 1
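With $y_j$ denoting the unit selected from the $j$th stratum, the stratified sample mean is
\[
\bar{y}_{st} = \frac{1}{nk} \sum_{j=1}^{n} k\, y_j = \frac{1}{n} \sum_{j=1}^{n} y_j ,
\]
and
\[
\begin{aligned}
\operatorname{Var}(\bar{y}_{st}) &= \frac{1}{n^2} \sum_{j=1}^{n} \operatorname{Var}(y_j) \\
&= \frac{1}{n^2} \sum_{j=1}^{n} \frac{k-1}{k \cdot 1} S_j^2 \qquad \Big( \text{using } \operatorname{Var}(\bar{y}_{SRS}) = \frac{N-n}{Nn} S^2 \Big) \\
&= \frac{k-1}{kn^2} \sum_{j=1}^{n} S_j^2 \\
&= \frac{k-1}{nk} S_{wst}^2 \\
&= \frac{N-n}{Nn} S_{wst}^2
\end{aligned}
\]
where
\[
S_j^2 = \frac{1}{k-1} \sum_{i=1}^{k} (y_{ij} - \bar{y}_j)^2 ,
\qquad
S_{wst}^2 = \frac{1}{n} \sum_{j=1}^{n} S_j^2 = \frac{1}{n(k-1)} \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_j)^2 .
\]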
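The intraclass correlation between pairs of units in the same systematic sample, with deviations measured
from their stratum means, is
\[
\rho_{wst} = \frac{\dfrac{1}{nk(n-1)} \displaystyle\sum_{i=1}^{k} \sum_{j(\ne l)=1}^{n} \sum_{l=1}^{n} (y_{ij} - \bar{y}_j)(y_{il} - \bar{y}_l)}{\dfrac{1}{nk} \displaystyle\sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_j)^2}
\]
so that
\[
\sum_{i=1}^{k} \sum_{j(\ne l)=1}^{n} \sum_{l=1}^{n} (y_{ij} - \bar{y}_j)(y_{il} - \bar{y}_l) = (N-n)(n-1) \rho_{wst} S_{wst}^2 .
\]
So, since $\bar{y}_i - \bar{Y} = \frac{1}{n} \sum_{j=1}^{n} (y_{ij} - \bar{y}_j)$,
\[
\begin{aligned}
\operatorname{Var}(\bar{y}_{sy}) &= \frac{1}{n^2 k} \left[ (N-n) S_{wst}^2 + (N-n)(n-1) \rho_{wst} S_{wst}^2 \right] \\
&= \frac{N-n}{Nn} S_{wst}^2 \left[ 1 + (n-1) \rho_{wst} \right] . \qquad (\text{using } N = nk)
\end{aligned}
\]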
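Thus
\[
\operatorname{Var}(\bar{y}_{sy}) - \operatorname{Var}(\bar{y}_{st}) = \frac{N-n}{Nn} (n-1) \rho_{wst} S_{wst}^2
\]
and the relative efficiency of systematic sampling relative to equivalent stratified sampling is given by
\[
RE = \frac{\operatorname{Var}(\bar{y}_{st})}{\operatorname{Var}(\bar{y}_{sy})} = \frac{1}{1 + (n-1) \rho_{wst}} .
\]
So systematic sampling is
- more efficient than the corresponding equivalent stratified sample when $\rho_{wst} < 0$
- less efficient than the corresponding equivalent stratified sample when $\rho_{wst} > 0$
- equally efficient as the corresponding equivalent stratified sample when $\rho_{wst} = 0$.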
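The comparison with stratified sampling can be checked on a small made-up population (the values in `y` are hypothetical). The sketch treats each block of $k$ consecutive units as a stratum, computes $S_{wst}^2$ and $\rho_{wst}$ from their definitions, and compares the resulting variance with the direct variance of $\bar{y}_{sy}$.

```python
from fractions import Fraction

y = [3, 7, 1, 8, 2, 9, 4, 6, 5, 10, 12, 11]   # hypothetical population
k = 3
N = len(y)
n = N // k

strata = [y[j * k:(j + 1) * k] for j in range(n)]      # rows of the table
stratum_means = [Fraction(sum(s), k) for s in strata]
# dev[i][j]: deviation of the j-th unit of systematic sample i from its stratum mean
dev = [[strata[j][i] - stratum_means[j] for j in range(n)] for i in range(k)]

within_ss = sum(d ** 2 for row in dev for d in row)
S2_wst = within_ss / (n * (k - 1))
cross = sum(row[j] * row[l] for row in dev
            for j in range(n) for l in range(n) if j != l)
rho_wst = (cross / (N * (n - 1))) / (within_ss / N)

Y_bar = Fraction(sum(y), N)
var_sy = sum((Fraction(sum(y[i::k]), n) - Y_bar) ** 2 for i in range(k)) / k
var_st = Fraction(N - n, N * n) * S2_wst
var_sy_from_rho = var_st * (1 + (n - 1) * rho_wst)
```

The sign of `rho_wst` then tells which of the two designs has the smaller variance for this particular population.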
Suppose the values of successive units in the population increase in accordance with a linear model, so that
\[ y_i = a + b i , \qquad i = 1, 2, \ldots, N . \]
Now we determine the variances of $\bar{y}_{SRS}$, $\bar{y}_{sy}$ and $\bar{y}_{st}$ under this linear trend.
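Under SRSWOR
\[ \operatorname{Var}(\bar{y}_{SRS}) = \frac{N-n}{Nn} S^2 . \]
Here $N = nk$,
\[
\bar{Y} = a + b \frac{1}{N} \sum_{i=1}^{N} i = a + b \frac{1}{N} \cdot \frac{N(N+1)}{2} = a + b \frac{N+1}{2} ,
\]
\[
\begin{aligned}
S^2 &= \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{Y})^2 \\
&= \frac{1}{N-1} \sum_{i=1}^{N} \left[ a + b i - a - b \frac{N+1}{2} \right]^2 \\
&= \frac{b^2}{N-1} \sum_{i=1}^{N} \left( i - \frac{N+1}{2} \right)^2 \\
&= \frac{b^2}{N-1} \left[ \sum_{i=1}^{N} i^2 - N \left( \frac{N+1}{2} \right)^2 \right] \\
&= \frac{b^2}{N-1} \left[ \frac{N(N+1)(2N+1)}{6} - \frac{N(N+1)^2}{4} \right] \\
&= b^2 \frac{N(N+1)}{12} ,
\end{aligned}
\]
\[
\begin{aligned}
\operatorname{Var}(\bar{y}_{SRS}) &= \frac{nk - n}{nk \cdot n} \, b^2 \frac{nk(nk+1)}{12} \\
&= \frac{b^2}{12} (k-1)(nk+1) .
\end{aligned}
\]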
Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur
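Under systematic sampling
Earlier $y_{ij}$ denoted the value of the study variable for the $j$th unit in the $i$th systematic sample. Now $y_{ij}$
is the value of the $[i + (j-1)k]$th unit of the population, so
\[ y_{ij} = a + b \left[ i + (j-1)k \right] , \qquad i = 1, 2, \ldots, k ; \; j = 1, 2, \ldots, n , \]
\[ \bar{y}_{sy} = \bar{y}_i = \frac{1}{n} \sum_{j=1}^{n} \left( a + b \left[ i + (j-1)k \right] \right) = a + b \left( i + \frac{n-1}{2} k \right) . \]
Then
\[
\begin{aligned}
\sum_{i=1}^{k} (\bar{y}_i - \bar{Y})^2 &= \sum_{i=1}^{k} \left[ a + b \left( i + \frac{n-1}{2} k \right) - a - b \frac{nk+1}{2} \right]^2 \\
&= b^2 \sum_{i=1}^{k} \left( i - \frac{k+1}{2} \right)^2 \\
&= b^2 \left[ \sum_{i=1}^{k} i^2 + k \left( \frac{k+1}{2} \right)^2 - (k+1) \sum_{i=1}^{k} i \right] \\
&= b^2 \left[ \frac{k(k+1)(2k+1)}{6} + \frac{k(k+1)^2}{4} - (k+1) \frac{k(k+1)}{2} \right] \\
&= \frac{b^2}{12} k (k^2 - 1) ,
\end{aligned}
\]
\[
\begin{aligned}
\operatorname{Var}(\bar{y}_{sy}) &= \frac{1}{k} \cdot \frac{b^2}{12} k (k^2 - 1) \\
&= \frac{b^2}{12} (k^2 - 1) .
\end{aligned}
\]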
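Under stratified sampling
\[
\operatorname{Var}(\bar{y}_{st}) = \frac{N-n}{Nn} S_{wst}^2 = \frac{k-1}{nk} S_{wst}^2
\]
where
\[
\begin{aligned}
S_{wst}^2 &= \frac{1}{n} \sum_{j=1}^{n} S_j^2 \\
&= \frac{1}{n(k-1)} \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_j)^2 \\
&= \frac{1}{n(k-1)} \sum_{i=1}^{k} \sum_{j=1}^{n} \left[ a + b \{ i + (j-1)k \} - a - b \left\{ \frac{k+1}{2} + (j-1)k \right\} \right]^2 \\
&= \frac{b^2}{n(k-1)} \sum_{i=1}^{k} \sum_{j=1}^{n} \left( i - \frac{k+1}{2} \right)^2 \\
&= \frac{b^2}{n(k-1)} \cdot \frac{nk(k^2-1)}{12} \\
&= b^2 \frac{k(k+1)}{12} ,
\end{aligned}
\]
\[
\begin{aligned}
\operatorname{Var}(\bar{y}_{st}) &= \frac{k-1}{nk} \, b^2 \frac{k(k+1)}{12} \\
&= \frac{b^2}{12} \cdot \frac{k^2 - 1}{n} .
\end{aligned}
\]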
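If $k$ is large, so that $\frac{1}{k}$ is negligible, then comparing $\operatorname{Var}(\bar{y}_{st})$, $\operatorname{Var}(\bar{y}_{sy})$ and $\operatorname{Var}(\bar{y}_{SRS})$,
\[
\begin{aligned}
\operatorname{Var}(\bar{y}_{st}) : \operatorname{Var}(\bar{y}_{sy}) : \operatorname{Var}(\bar{y}_{SRS})
&= \frac{k^2-1}{n} : k^2 - 1 : (k-1)(1+nk) \\
&= \frac{k+1}{n} : k+1 : nk+1 \\
&= \frac{k+1}{n(k+1)} : \frac{k+1}{k+1} : \frac{nk+1}{k+1} \\
&\approx \frac{1}{n} : 1 : n .
\end{aligned}
\]
Thus
\[
\operatorname{Var}(\bar{y}_{st}) : \operatorname{Var}(\bar{y}_{sy}) : \operatorname{Var}(\bar{y}_{SRS}) :: \frac{1}{n} : 1 : n \quad \text{(approximately)} .
\]
So stratified sampling is best for a population with a linear trend. Systematic sampling is next best, and simple random sampling is the least efficient of the three.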
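The three closed-form variances under the linear trend can be verified by brute force. The sketch below builds the population $y_i = a + bi$, computes each variance from its definition, and leaves the comparison with the formulas to the reader or a test. The function name `trend_variances` and the values of $a$, $b$, $n$, $k$ are hypothetical.

```python
from fractions import Fraction

def trend_variances(a, b, n, k):
    """Exact variances of the three sample means when y_i = a + b*i and N = n*k."""
    N = n * k
    y = [Fraction(a + b * i) for i in range(1, N + 1)]
    Y_bar = sum(y) / N
    S2 = sum((v - Y_bar) ** 2 for v in y) / (N - 1)
    var_srs = Fraction(N - n, N * n) * S2
    # Systematic sampling: average squared deviation of the k sample means.
    var_sy = sum((sum(y[i::k]) / n - Y_bar) ** 2 for i in range(k)) / k
    # Stratified sampling: strata are blocks of k consecutive units, one unit each.
    strata = [y[j * k:(j + 1) * k] for j in range(n)]
    S2_wst = sum((v - sum(s) / k) ** 2 for s in strata for v in s) / (n * (k - 1))
    var_st = Fraction(N - n, N * n) * S2_wst
    return var_st, var_sy, var_srs

a, b, n, k = 2, 3, 5, 4
var_st, var_sy, var_srs = trend_variances(a, b, n, k)
```

For these inputs the ordering `var_st < var_sy < var_srs` holds, in line with the ratio $\frac{1}{n} : 1 : n$.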
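Treating the systematic sample as if it were a simple random sample, an estimate of the variance is
\[
\widehat{\operatorname{Var}}(\bar{y}_{sy}) = \left( \frac{1}{n} - \frac{1}{nk} \right) s_{wc}^2
\]
where
\[
s_{wc}^2 = \frac{1}{n-1} \sum_{j=0}^{n-1} (y_{i+jk} - \bar{y}_i)^2 .
\]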
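4. The interpenetrating subsamples can be utilized by dividing the sample into $c$ groups, each of
size $\frac{n}{c}$. Then the group means are $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_c$. Now find
\[
\bar{y} = \frac{1}{c} \sum_{t=1}^{c} \bar{y}_t ,
\qquad
\widehat{\operatorname{Var}}(\bar{y}_{sy}) = \frac{1}{c(c-1)} \sum_{t=1}^{c} (\bar{y}_t - \bar{y})^2 .
\]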
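Both variance estimates given above can be sketched in Python. The function names are hypothetical, and the way the sample is split into $c$ groups (every $c$th sampled unit goes to the same group) is an assumption made for this example, since the text does not fix the grouping.

```python
from statistics import mean, variance

def var_est_as_srs(sample, k):
    """Estimate that treats the systematic sample as a simple random sample."""
    n = len(sample)
    s2_wc = variance(sample)                      # sample variance, divisor n - 1
    return (1 / n - 1 / (n * k)) * s2_wc

def var_est_interpenetrating(sample, c):
    """Estimate from c subsample means; assumes c divides the sample size."""
    if len(sample) % c != 0:
        raise ValueError("this sketch assumes c divides n")
    groups = [sample[t::c] for t in range(c)]     # assumed grouping rule
    group_means = [mean(g) for g in groups]
    overall = mean(group_means)
    return sum((m - overall) ** 2 for m in group_means) / (c * (c - 1))

sample = [4.0, 6.0, 5.0, 9.0, 7.0, 8.0]           # hypothetical sample, n = 6
est_srs = var_est_as_srs(sample, k=3)
est_ips = var_est_interpenetrating(sample, c=2)
```

The two estimates generally differ, because they rest on different approximations to a variance that a single systematic sample cannot estimate directly.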
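Suppose $N = nk + p$ with $p < k$, so that the first $p$ systematic samples contain $n + 1$ units and the
remaining $k - p$ contain $n$ units. Then consider the following sample mean as an estimator of the population mean:
\[
\bar{y}_{sy} = \bar{y}_i =
\begin{cases}
\dfrac{1}{n+1} \displaystyle\sum_{j=1}^{n+1} y_{ij} & \text{if } i \le p \\[2ex]
\dfrac{1}{n} \displaystyle\sum_{j=1}^{n} y_{ij} & \text{if } i > p .
\end{cases}
\]
In this case
\[
E(\bar{y}_i) = \frac{1}{k} \left[ \sum_{i=1}^{p} \frac{1}{n+1} \sum_{j=1}^{n+1} y_{ij} + \sum_{i=p+1}^{k} \frac{1}{n} \sum_{j=1}^{n} y_{ij} \right] \ne \bar{Y} ,
\]
so $\bar{y}_{sy}$ is a biased estimator of $\bar{Y}$.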
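An unbiased estimator of $\bar{Y}$ is
\[
\bar{y}_{sy}^* = \frac{k}{N} \sum_{j} y_{ij} = \frac{k}{N} C_i
\]
where $C_i = \sum_j y_{ij}$ (equal to $n \bar{y}_i$ for a column of $n$ units) is the total of the values of the $i$th column.
\[
\begin{aligned}
E(\bar{y}_{sy}^*) &= \frac{k}{N} E(C_i) \\
&= \frac{k}{N} \cdot \frac{1}{k} \sum_{i=1}^{k} C_i \\
&= \bar{Y} ,
\end{aligned}
\]
\[
\operatorname{Var}(\bar{y}_{sy}^*) = \frac{k^2}{N^2} \cdot \frac{k-1}{k} S_c^{*2}
\]
where
\[
S_c^{*2} = \frac{1}{k-1} \sum_{i=1}^{k} \left( n \bar{y}_i - \frac{N \bar{Y}}{k} \right)^2 .
\]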
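The bias of the plain sample mean and the unbiasedness of $\bar{y}_{sy}^*$ can be seen on a small made-up population with $N = 14$ and $k = 4$ (so $n = 3$ and $p = 2$). The values in `y` are hypothetical, and the column total is computed as the plain sum of the sampled values.

```python
from fractions import Fraction

y = [3, 7, 1, 8, 2, 9, 4, 6, 5, 10, 12, 11, 13, 15]   # hypothetical, N = 14
k = 4                                                  # N = 3*4 + 2: n = 3, p = 2
N = len(y)
Y_bar = Fraction(sum(y), N)

cols = [y[i::k] for i in range(k)]     # first p samples have n + 1 units
sizes = [len(c) for c in cols]
totals = [sum(c) for c in cols]        # column totals C_i

# Each sample has probability 1/k.
E_plain = sum(Fraction(t, len(c)) for t, c in zip(totals, cols)) / k
E_star = sum(Fraction(k, N) * t for t in totals) / k

# Variance of the unbiased estimator, directly and from the S_c*^2 formula.
var_star = sum((Fraction(k, N) * t - Y_bar) ** 2 for t in totals) / k
S_c2 = sum((t - Fraction(sum(y), k)) ** 2 for t in totals) / (k - 1)
var_formula = Fraction(k * k, N * N) * Fraction(k - 1, k) * S_c2
```

Here `E_star` equals the population mean while `E_plain` does not, and the two variance computations agree.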
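When the population size $N$ is not expressible as the product of $n$ and $k$, then let
\[ N = nq + r . \]
Then take the sampling interval as
\[
k =
\begin{cases}
q & \text{if } r \le \dfrac{n}{2} \\[1.5ex]
q + 1 & \text{if } r > \dfrac{n}{2} .
\end{cases}
\]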
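Let $\left[ \frac{M}{g} \right]$ denote the largest integer contained in $\frac{M}{g}$.
If $k = q^*$ ($= q$ or $q + 1$), then the number of units expected in the sample is
\[
n^* =
\begin{cases}
\left[ \dfrac{N}{q^*} \right] & \text{with probability } \left[ \dfrac{N}{q^*} \right] + 1 - \dfrac{N}{q^*} \\[2ex]
\left[ \dfrac{N}{q^*} \right] + 1 & \text{with probability } \dfrac{N}{q^*} - \left[ \dfrac{N}{q^*} \right] .
\end{cases}
\]
If $q = q^*$, then we get
\[
n^* =
\begin{cases}
n + \left[ \dfrac{r}{q} \right] & \text{with probability } \left[ \dfrac{r}{q} \right] + 1 - \dfrac{r}{q} \\[2ex]
n + \left[ \dfrac{r}{q} \right] + 1 & \text{with probability } \dfrac{r}{q} - \left[ \dfrac{r}{q} \right] .
\end{cases}
\]
Similarly, if $q^* = q + 1$, then
\[
n^* =
\begin{cases}
n - \left[ \dfrac{n-r}{q+1} \right] & \text{with probability } \left[ \dfrac{n-r}{q+1} \right] + 1 - \dfrac{n-r}{q+1} \\[2ex]
n - \left[ \dfrac{n-r}{q+1} \right] - 1 & \text{with probability } \dfrac{n-r}{q+1} - \left[ \dfrac{n-r}{q+1} \right] .
\end{cases}
\]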
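The distribution of the realised sample size can be checked by enumerating every possible random start. The sketch below assumes a linear systematic sample with a start between 1 and the interval. The function name `sample_size_distribution` and the values $N = 14$, $n = 4$ are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def sample_size_distribution(N, interval):
    """Probability of each realised sample size for a random start in 1..interval."""
    counts = Counter(
        len(range(start, N + 1, interval))      # units start, start+interval, ... <= N
        for start in range(1, interval + 1)
    )
    return {size: Fraction(c, interval) for size, c in counts.items()}

N, n = 14, 4
q, r = divmod(N, n)                              # N = n*q + r with q = 3, r = 2
dist_q = sample_size_distribution(N, q)          # interval q* = q
dist_q1 = sample_size_distribution(N, q + 1)     # interval q* = q + 1
```

With interval $q = 3$ the sample has $n + [r/q] = 4$ units with probability $1/3$ and 5 units with probability $2/3$. With interval $q + 1 = 4$ it has 4 or 3 units, each with probability $1/2$.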
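We now prove the following theorem, which shows how to obtain an unbiased estimator of the population
mean when $N \ne nk$.
Theorem: In systematic sampling with sampling interval $k$ from a population with size $N \ne nk$, an
unbiased estimator of the population mean $\bar{Y}$ is given by
\[
\hat{\bar{Y}} = \frac{k}{N} \Big( \sum^{n'} y \Big)_i
\]
where $i$ stands for the $i$th systematic sample, $i = 1, 2, \ldots, k$, $n'$ denotes the size of the $i$th systematic
sample, and $\big( \sum^{n'} y \big)_i$ is the total of the $y$ values in that sample.
Proof. Each systematic sample has a probability $\frac{1}{k}$. Hence
\[
E(\hat{\bar{Y}}) = \sum_{i=1}^{k} \frac{1}{k} \cdot \frac{k}{N} \Big( \sum^{n'} y \Big)_i
= \frac{1}{N} \sum_{i=1}^{k} \Big( \sum^{n'} y \Big)_i .
\]
Now, each unit occurs in only one of the $k$ possible systematic samples. Hence
\[
\sum_{i=1}^{k} \Big( \sum^{n'} y \Big)_i = \sum_{i=1}^{N} Y_i ,
\]
which, on substitution, gives $E(\hat{\bar{Y}}) = \bar{Y}$ and proves the theorem.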
Example:
Let $N = 14$ and $n = 5$. Then $k$ = nearest integer to $\frac{14}{5}$ = 3. Let the first number selected at random from
1 to 14 be 7. Then the circular systematic sample consists of units with serial numbers
7, 10, 13, 16 - 14 = 2, 19 - 14 = 5.
This procedure is illustrated diagrammatically in the following figure.
[Figure: the 14 population units arranged around a circle, illustrating the circular systematic sample.]
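The example can be reproduced with a short sketch of circular systematic sampling. The function name `circular_systematic_sample` is hypothetical, and rounding $N/n$ half up is an assumption for the case of ties, which the example does not cover.

```python
import random

def circular_systematic_sample(N, n, start=None, rng=random):
    """Serial numbers (1..N) of a circular systematic sample of size n."""
    k = int(N / n + 0.5)                  # nearest integer to N/n (ties rounded up)
    if start is None:
        start = rng.randint(1, N)         # random start anywhere in 1..N
    # Step k units at a time, wrapping around past N back to 1.
    return [(start - 1 + j * k) % N + 1 for j in range(n)]

sample = circular_systematic_sample(14, 5, start=7)
```

With `start=7` this returns the units 7, 10, 13, 2, 5, as in the example.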
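If $i$ is the number selected at random, the circular systematic sample mean is
\[
\bar{y} = \frac{1}{n} \Big( \sum^{n} y \Big)_i
\]
where $\big( \sum^{n} y \big)_i$ denotes the total of $y$ values in the $i$th circular systematic sample, $i = 1, 2, \ldots, N$. We note
here that in circular systematic sampling, there are $N$ circular systematic samples, each having
probability $\frac{1}{N}$ of its selection. Hence,
\[
E(\bar{y}) = \sum_{i=1}^{N} \frac{1}{n} \Big( \sum^{n} y \Big)_i \cdot \frac{1}{N}
= \frac{1}{Nn} \sum_{i=1}^{N} \Big( \sum^{n} y \Big)_i .
\]
Clearly, each unit of the population occurs in $n$ of the $N$ possible circular systematic sample means.
Hence,
\[
\sum_{i=1}^{N} \Big( \sum^{n} y \Big)_i = n \sum_{i=1}^{N} Y_i ,
\]
which, on substitution, gives $E(\bar{y}) = \bar{Y}$.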
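The unbiasedness argument can be checked by averaging the sample mean over all $N$ equally likely starts. The population values in `y` are hypothetical; $N = 14$, $n = 5$ and $k = 3$ follow the earlier example.

```python
from fractions import Fraction

y = [3, 7, 1, 8, 2, 9, 4, 6, 5, 10, 12, 11, 13, 15]   # hypothetical, N = 14
N, n, k = len(y), 5, 3

def circular_mean(start):
    """Mean of the circular systematic sample with the given 1-based start."""
    units = [(start - 1 + j * k) % N for j in range(n)]   # 0-based positions
    return Fraction(sum(y[u] for u in units), n)

# Each of the N starts has probability 1/N.
expected = sum(circular_mean(s) for s in range(1, N + 1)) / N
Y_bar = Fraction(sum(y), N)
```

Because every unit appears in exactly $n$ of the $N$ samples, `expected` equals `Y_bar` exactly.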
What to do when $N \ne nk$
One of the following possible procedures may be adopted when $N \ne nk$.
(i) Drop one unit at random if the sample has $(n + 1)$ units.
(ii) Eliminate some units so that $N = nk$.
(iii) Adopt the circular systematic sampling scheme.
(iv) Round off the fractional interval $k$.