Generic TST Protocol Distributed Annexes KNCV


Sample size calculation for obtaining a statistically significant difference between prevalences in two surveys at 5% significance level and with 90% power

see: Sample size determination in health studies, a practical manual. SK Lwanga and S Lemeshow. WHO, Geneva, 1991

$$N = 10.5 \times \frac{P_1(1-P_1) + P_2(1-P_2)}{(P_1 - P_2)^2}$$

where the factor 10.5 ≈ (1.96 + 1.28)² corresponds to a two-sided 5% significance level and 90% power, and V = P1(1-P1) + P2(1-P2) in the table below.

P1 P2 P1-P2 V N
0.4 0.1 0.3 0.33 39
0.4 0.2 0.2 0.4 105
0.4 0.25 0.15 0.4275 200
0.4 0.3 0.1 0.45 473
0.4 0.35 0.05 0.4675 1964

0.3 0.1 0.2 0.3 79
0.3 0.15 0.15 0.3375 158
0.3 0.2 0.1 0.37 389
0.3 0.25 0.05 0.3975 1670
0.3 0.275 0.025 0.409375 6878

0.2 0.1 0.1 0.25 263
0.2 0.125 0.075 0.269375 503
0.2 0.15 0.05 0.2875 1208
0.2 0.175 0.025 0.304375 5113
0.2 0.19 0.01 0.3139 32960
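The table above can be reproduced with a short script. This is a minimal Python sketch, assuming the rounded multiplier 10.5 from the formula and rounding N up to whole participants; the function names `power_multiplier` and `n_per_survey` are illustrative, not part of the protocol.

```python
import math
from statistics import NormalDist

def power_multiplier(alpha=0.05, power=0.90):
    """(z_{1-alpha/2} + z_{power})^2; about 10.51 for alpha = 0.05 and 90% power."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2

def n_per_survey(p1, p2, k=10.5):
    """Sample size N from the formula above, using the rounded multiplier 10.5."""
    v = p1 * (1 - p1) + p2 * (1 - p2)   # V column in the table
    n = k * v / (p1 - p2) ** 2
    # round(..., 6) removes floating-point noise before rounding up
    return math.ceil(round(n, 6))

for p1, p2 in [(0.4, 0.1), (0.4, 0.3), (0.3, 0.2), (0.2, 0.1)]:
    print(p1, p2, n_per_survey(p1, p2))
```

Passing `k=power_multiplier()` uses the unrounded multiplier (about 10.51) and can add one participant in some rows, e.g. 106 instead of 105 for P1 = 0.4, P2 = 0.2. For P1 = 0.2, P2 = 0.175 the exact value is 5113.5, which this sketch rounds up to 5114 while the table shows 5113.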
Sample size calculation for estimating the difference between prevalences in two surveys with absolute precision
see: Sample size determination in health studies, a practical manual. SK Lwanga and S Lemeshow. WHO, Geneva, 1991

$$N = z^2 \times \frac{P_1(1-P_1) + P_2(1-P_2)}{(\text{absolute precision})^2}$$

where z² = 1.96² = 3.8416 for 95% confidence.

P1 P2 P1-P2 V precision z-square N
0.4 0.3 0.1 0.45 0.01 3.8416 17287
0.4 0.3 0.1 0.45 0.02 3.8416 4322
0.4 0.3 0.1 0.45 0.03 3.8416 1921
0.4 0.3 0.1 0.45 0.04 3.8416 1080
0.4 0.3 0.1 0.45 0.05 3.8416 691

0.3 0.2 0.1 0.37 0.01 3.8416 14214
0.3 0.2 0.1 0.37 0.02 3.8416 3553
0.3 0.2 0.1 0.37 0.03 3.8416 1579
0.3 0.2 0.1 0.37 0.04 3.8416 888
0.3 0.2 0.1 0.37 0.05 3.8416 569

0.2 0.1 0.1 0.25 0.01 3.8416 9604
0.2 0.1 0.1 0.25 0.02 3.8416 2401
0.2 0.1 0.1 0.25 0.03 3.8416 1067
0.2 0.1 0.1 0.25 0.04 3.8416 600
0.2 0.1 0.1 0.25 0.05 3.8416 384
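The N column can be checked with a similar sketch. It assumes z² = 3.8416 (95% confidence) and rounds to the nearest whole number, which is what the table values correspond to; rounding up instead would be the more conservative choice. The name `n_precision` is illustrative.

```python
Z_SQUARE_95 = 3.8416  # 1.96 ** 2, i.e. 95% confidence, as in the table

def n_precision(p1, p2, precision, z_square=Z_SQUARE_95):
    """Sample size N for estimating P1 - P2 with the given absolute precision."""
    v = p1 * (1 - p1) + p2 * (1 - p2)   # V column in the table
    # The table rounds to the nearest whole number
    return round(z_square * v / precision ** 2)

for d in (0.01, 0.02, 0.03, 0.04, 0.05):
    print(d, n_precision(0.4, 0.3, d))
```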
DESIGN EFFECT:
The design effect can be calculated as the variance around the outcome variable in the cluster design divided by the variance that would be obtained if it were a simple random sample of the same size.

STATA
Using data on an individual basis, the design effect (deff) is calculated by default in STATA with the procedures svymean, svytotal, svyratio and svyprop.
Example of syntax:
use [dataset]
svymean [variable name], strata([name of district variable])
Using aggregated data, the design effect (deff) can be calculated in STATA with blogit, by running the model with and without the cluster option.
Example of syntax:
blogit nrpositive nrtotal, cluster(cluster)
blogit nrpositive nrtotal
(where nrpositive is the number of participants who were positive, and nrtotal is the total number of participants in the survey)
(the design effect is the ratio of the variances, i.e. the square of the robust standard error from the run with cluster() divided by the square of the standard error from the run without it)
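A minimal Python sketch of the calculation in the last note, assuming the two standard errors have been read from the output of the two blogit runs. The function name is illustrative and the values 0.02 and 0.01 are hypothetical, not survey results.

```python
def deff_from_standard_errors(se_cluster, se_simple):
    """Design effect = variance with clustering / variance without clustering.

    Each variance is the square of the corresponding standard error.
    """
    return (se_cluster / se_simple) ** 2

# Hypothetical standard errors, for illustration only
print(deff_from_standard_errors(0.02, 0.01))  # 4.0
```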

SPSS
In SPSS, the design effect (deff) is calculated with Analyze --> Complex Samples. Request the design effect with the Statistics button.

Excel
In Excel, the design effect (deff) can be calculated approximately as follows:

Step 1: Calculation of the variance around the prevalence in a simple random sample (SRS)

district sample size no. positive prevalence of TST infection
a 1200 98 0.0817
b 1100 81 0.0736
c 1050 72 0.0686
d 950 108 0.1137
e 1000 101 0.1010
f 900 87 0.0967
g 1150 83 0.0722
h 950 78 0.0821
i 1000 111 0.1110
j 900 98 0.1089
total 10200 917 0.0899

$$\text{Variance (SRS)} = \frac{\text{mean prevalence} \times (1 - \text{mean prevalence})}{\text{sample size} - 1}$$

Variance (SRS) = 8.0223157399205E-06
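Step 1 can be sketched in Python as follows, using the totals from the table (917 positives out of 10200 children). The function name `srs_variance` is illustrative; the formula is the one given above.

```python
def srs_variance(n_positive, n_total):
    """Variance of the prevalence under simple random sampling (Step 1)."""
    p = n_positive / n_total              # mean (pooled) prevalence
    return p * (1 - p) / (n_total - 1)

print(srs_variance(917, 10200))  # about 8.0223e-06
```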

Step 2: Calculation of the variance around the prevalence in the cluster design
district sample size no. positive prevalence of TST infection square of (district prevalence - mean prevalence)
a 1200 98 0.081666666667 0.00006782
b 1100 81 0.073636363636 0.00026457
c 1050 72 0.068571428571 0.00045499
d 950 108 0.113684210526 0.00056560
e 1000 101 0.101000000000 0.00012317
f 900 87 0.096666666667 0.00004576
g 1150 83 0.072173913043 0.00031428
h 950 78 0.082105263158 0.00006079
i 1000 111 0.111000000000 0.00044513
j 900 98 0.108888888889 0.00036050
total 10200 917

number of districts = 10
mean prevalence = 0.089901960784
sum of squares = 0.00270261

$$\text{Cluster variance} = \frac{\sum_{\text{districts}} (\text{prevalence} - \text{mean prevalence})^2}{(\text{number of districts} - 1)^2}$$

Cluster variance = 0.00002703

Step 3: Calculation of the design effect (DEFF)

$$\text{Design effect (DEFF)} = \frac{\text{Cluster variance}}{\text{SRS variance}}$$

Design effect (DEFF) = 3.36886181366873
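Steps 2 and 3 can be combined in a short Python sketch that starts from the district counts. It assumes, as in Step 1, that the mean prevalence is the pooled prevalence (917/10200) rather than the average of the district prevalences. The `divisor` parameter is an illustrative addition: with the formula as written in Step 2, dividing by (number of districts - 1)² = 81, the cluster variance is about 3.34 × 10⁻⁵ and DEFF about 4.16; the printed values (0.00002703 and 3.369) are obtained when the sum of squares is divided by (number of districts)² = 100 instead.

```python
# District data from the example: district -> (sample size, no. positive)
DISTRICTS = {
    "a": (1200, 98), "b": (1100, 81), "c": (1050, 72), "d": (950, 108),
    "e": (1000, 101), "f": (900, 87), "g": (1150, 83), "h": (950, 78),
    "i": (1000, 111), "j": (900, 98),
}

def design_effect(districts, divisor="k-1"):
    """Return (sum of squares, cluster variance, DEFF) following Steps 1-3.

    divisor="k-1" divides the sum of squares by (k - 1)**2 as in the Step 2
    formula; divisor="k" divides by k**2, which matches the printed numbers.
    """
    if divisor not in ("k-1", "k"):
        raise ValueError("divisor must be 'k-1' or 'k'")
    n_total = sum(n for n, _ in districts.values())
    x_total = sum(x for _, x in districts.values())
    mean_p = x_total / n_total                          # pooled prevalence
    srs_var = mean_p * (1 - mean_p) / (n_total - 1)     # Step 1
    k = len(districts)
    ss = sum((x / n - mean_p) ** 2 for n, x in districts.values())
    d = k - 1 if divisor == "k-1" else k
    cluster_var = ss / d ** 2                           # Step 2
    return ss, cluster_var, cluster_var / srs_var       # Step 3

for divisor in ("k-1", "k"):
    ss, cv, deff = design_effect(DISTRICTS, divisor)
    print(divisor, round(ss, 8), f"{cv:.4e}", round(deff, 3))
```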


Example of how to perform a TWO-STAGE CLUSTER SAMPLE
with districts sampled proportional to size, and random sampling of schools within the districts
see: Guidelines for conducting TST surveys in high prevalence countries. Arnadottir et al, Tubercle and Lung Disease 1996;77:Suppl 1-20

Step 0: Estimate the required number of children to be included in the survey, taking into account the design effect, exclusions and, if necessary, BCG vaccination coverage.
Assume 10,000 children will be included in the study.

Step 1: list all the districts (administrative areas) in alphabetical order, with their population size, and calculate the cumulative population.
District District population Cumulative population
A 556000 556000
B 125000 681000
C 245000 926000
D 73000 999000
E 156000 1155000
F 468000 1623000
G 74000 1697000
H 356000 2053000
I 64000 2117000
J 231000 2348000
K 639000 2987000
L 123000 3110000
M 54000 3164000
N 185000 3349000
O 354000 3703000
P 568000 4271000
Q 34000 4305000
Total 4305000

Step 2: decide on the number of clusters that will be sampled in the whole country
No. of clusters that will be sampled (the recommended minimum is 30): 30

Step 3: calculate the sampling interval (automatically calculated)

Sampling interval = total population / no. of clusters = 4305000 / 30 = 143500

Step 4: assign a random number


Type '=rand()' in an empty cell and press Enter.
Multiply this random number (between 0 and 1) by the sampling interval to obtain the random start number, e.g.:

0.84241947305981 × 143500 = 120887

If we assume that the random number is 0.165547, then the random start number is 0.165547 × 143500 = 23756.
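Steps 3 and 4 can be sketched in Python; `random.random()` plays the role of Excel's RAND() here, and the function names are illustrative. The start number is rounded to the nearest whole number, which matches 0.165547 × 143500 ≈ 23756.

```python
import random

def sampling_interval(total_population, n_clusters):
    """Step 3: total population divided by the number of clusters."""
    return total_population / n_clusters

def random_start(interval, u=None):
    """Step 4: random number between 0 and 1 multiplied by the interval.

    u can be supplied to reproduce a given draw; otherwise it is drawn
    with random.random(), which returns a float in [0, 1).
    """
    if u is None:
        u = random.random()
    return round(u * interval)

interval = sampling_interval(4305000, 30)   # 143500.0
start = random_start(interval, u=0.165547)  # 23756
```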

Step 5: list the random start number and the sampling interval; subsequent selection numbers are obtained by repeatedly adding the sampling interval to the random start number:
random start number: 23756
sampling interval: 143500
Step 6: allocate the districts included in the sample on the basis of the random start number and sampling interval
random start number: 23756
random start number + sampling interval: 167256
random start number + 2*sampling interval 310756
random start number + 3*sampling interval 454256
random start number + 4*sampling interval 597756
random start number + 5*sampling interval 741256
random start number + 6*sampling interval 884756
random start number + 7*sampling interval 1028256
random start number + 8*sampling interval 1171756
random start number + 9*sampling interval 1315256
random start number + 10*sampling interval 1458756
random start number + 11*sampling interval 1602256
etc.

Cluster 1 will be in the first district whose cumulative population is equal to or greater than 23756
Cluster 2 will be in the first district whose cumulative population is equal to or greater than 167256
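The selection numbers of Steps 5 and 6 can be generated with a small Python helper (the name `selection_numbers` is illustrative). With 30 clusters the last number stays below the total population of 4305000.

```python
def selection_numbers(start, interval, n_clusters):
    """Random start number plus 0, 1, 2, ... times the sampling interval."""
    return [start + i * interval for i in range(n_clusters)]

numbers = selection_numbers(23756, 143500, 30)
print(numbers[:4])   # [23756, 167256, 310756, 454256]
print(numbers[-1])   # 4185256
```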

Step 7: apply the random start number to the cumulative population to determine the number of times a district will be sampled:
District District population Cumulative population Sampling order No. of samples per district
A 556000 556000 1, 2, 3, 4 4
B 125000 681000 5 1
C 245000 926000 6, 7 2
D 73000 999000 0
E 156000 1155000 8 1
F 468000 1623000 9, 10, 11, 12 4
G 74000 1697000 etc. etc.
H 356000 2053000
I 64000 2117000
J 231000 2348000
K 639000 2987000
L 123000 3110000
M 54000 3164000
N 185000 3349000
O 354000 3703000
P 568000 4271000
Q 34000 4305000
Total 4305000
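Step 7 amounts to finding, for each selection number, the first district whose cumulative population is equal to or greater than that number. A minimal Python sketch using the standard `bisect` module, with the district data from the table; the function name `allocate_clusters` is illustrative.

```python
from bisect import bisect_left
from itertools import accumulate

# District populations from the table above, in the listed order
DISTRICTS = [
    ("A", 556000), ("B", 125000), ("C", 245000), ("D", 73000),
    ("E", 156000), ("F", 468000), ("G", 74000), ("H", 356000),
    ("I", 64000), ("J", 231000), ("K", 639000), ("L", 123000),
    ("M", 54000), ("N", 185000), ("O", 354000), ("P", 568000),
    ("Q", 34000),
]

def allocate_clusters(districts, numbers):
    """Count how many selection numbers fall in each district."""
    names = [name for name, _ in districts]
    cumulative = list(accumulate(pop for _, pop in districts))
    counts = {name: 0 for name in names}
    for x in numbers:
        # first index whose cumulative population is >= x
        i = bisect_left(cumulative, x)
        if i == len(cumulative):
            raise ValueError(f"selection number {x} exceeds the total population")
        counts[names[i]] += 1
    return counts

numbers = [23756 + i * 143500 for i in range(30)]
counts = allocate_clusters(DISTRICTS, numbers)
print({name: counts[name] for name in "ABCDEF"})
```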

Step 8: randomly select the number of schools in district A needed to include the required number per cluster
If a total of 10,000 children will be included in the survey and 30 clusters are sampled, each cluster should deliver approximately 10,000/30 = 333 children.

District A is included 4 times, so 4*333=1332 children should be included in the survey in district A.
If the average number of eligible children in a school is 100 in district A, at least 14 schools should be sampled.

District B is included in the sample one time, so 333 children should be included in the survey in district B.
If the average number of eligible children in a school is 80 in district B, at least 5 schools should be sampled.

etc. etc.
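The arithmetic of Step 8 can be sketched as below, assuming integer division for the children per cluster (10000 // 30 = 333, as in the text) and rounding the number of schools up. The function names are illustrative, and the average school sizes are the example values given above.

```python
import math

def children_per_cluster(total_children, n_clusters):
    """Children each cluster should deliver (rounded down, as in the text)."""
    return total_children // n_clusters

def schools_needed(times_selected, per_cluster, avg_children_per_school):
    """Children required in a district and minimum number of schools to sample."""
    children = times_selected * per_cluster
    return children, math.ceil(children / avg_children_per_school)

per_cluster = children_per_cluster(10000, 30)       # 333
print(schools_needed(4, per_cluster, 100))          # district A: (1332, 14)
print(schools_needed(1, per_cluster, 80))           # district B: (333, 5)
```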
Example of BUDGET with most common budget components
Insert other components if necessary

Currency (US Dollar, Euro, etc.)


Human Resources
Salary field team (team leader, team members, chauffeur)
Salary investigator(s) and data manager
Allowance trainer
Allowance epidemiologist/statistician

Travelling expenses
Vehicle costs including maintenance and insurance
Fuel for planning visits
Fuel for field team during training
Fuel for field team during survey
Travel costs for supervision visits by investigator(s)
Travel costs for shifting camp between regions
Travel costs trainer
Travel costs epidemiologist/statistician
Travel costs to test smear-positives

Accommodation and food (per diem)


Per diem for field team during training
Per diem for field team during survey
Per diem for field team to test smear-positives
Per diem trainer
Per diem epidemiologist/statistician

Supplies
Tuberculin vials (training and survey)
Disposable 1 ml syringes (training and survey)
Transparent rulers
Cotton wool, alcohol, plasters
Vaccine carriers and icepacks for transportation of tuberculin
Waste buckets
Containers for used needles
Calculator
T-shirts and caps for survey teams
Information leaflets (teachers and parents)
Stationery, printing and copy costs, stamps, etc.
Other office supplies (pens, staplers, envelopes, etc.)
Personal computer(s)
Printer(s)
Data storage devices (e.g. external hard disk, rewritable CDs, removable disk)
Incentives for children / schools
Refrigerator (plus gas cylinders)

Other costs
Costs of follow-up of children with induration (medicine, health care if not provided by the NTP)
Insurance for field team and study population
Dissemination of results (meeting, printing of report, conference visit)

Total
Currency (US Dollar, Euro, etc.)
