Pooled estimates of indicators
Achille Lemmi, Vijay Verma, Gianni Betti, Laura Neri, Francesca Gagliardi,
Giulio Tarditi, Caterina Ferretti¹
¹ University of Siena, email: lemmi@unisi.it, verma@unisi.it, betti2@unisi.it,
neri@unisi.it, gagliardi10@unisi.it, giuliotarditi@gmail.com,
caterinaferretti@libero.it.
Abstract
Reliable indicators of poverty and social exclusion are an essential monitoring tool.
Policy research and application increasingly require statistics disaggregated to lower
levels and smaller subpopulations. This paper addresses some statistical aspects relating
to improving the sampling precision of such indicators for subnational regions, in
particular through the cumulation of data.
Keywords: sample design and estimation, longitudinal data analysis, measuring poverty
and inequality.
1. Context and scope
Reliable indicators of poverty and social exclusion are an essential monitoring tool. In the
EU-wide context, these indicators are most useful when they are comparable across
countries and over time for monitoring trends. Furthermore, policy research and
application increasingly require statistics disaggregated to lower levels and smaller
subpopulations. Direct, one-time estimates from surveys designed primarily to meet
national needs tend to be insufficiently precise for meeting these new policy needs. This
is particularly true in the domain of poverty and social exclusion, the monitoring of
which requires complex distributional statistics – statistics necessarily based on intensive
and relatively small-scale surveys of households and persons.
This paper addresses some statistical aspects relating to improving the sampling precision
of such indicators for subnational regions in EU countries (Verma et al., 2006), in
particular through the cumulation of data over rounds of regularly repeated national
surveys (Verma et al., 2009). The reference data for this purpose are based on EU
Statistics on Income and Living Conditions (EU-SILC), which is the major source of
comparative statistics on income and living conditions in Europe. EU-SILC covers data
and data sources of various types: cross-sectional and longitudinal; household-level and
person-level; on income and social conditions; and from registers and interview surveys
depending on the country. A standard integrated design has been adopted by nearly all
EU countries. It involves a rotational panel in which a new sample of households and
persons is introduced each year to replace one quarter of the existing sample. Persons
enumerated in each new sample are followed-up in the survey for four years. The design
yields each year a cross-sectional sample, as well as longitudinal samples of various
durations. Two types of measures can be so constructed at the regional level by
aggregating information on individual elementary units: average measures such as totals,
means, rates and proportions constructed by aggregating or averaging individual values;
and distributional measures, such as measures of variation or dispersion among
households and persons in the region. Average measures are often more easily
constructed or are available from alternative sources. Distributional measures tend to be
more complex and are less readily available from sources other than complex surveys; at
the same time, such measures are more pertinent to the analysis of poverty and social
exclusion. An important point to note is that, more than at the national level, many
measures of averages can also serve as indicators of disparity and deprivation when seen
in the regional context: the dispersion of regional means is of direct relevance in the
identification of geographical disparity. Survey data such as those from EU-SILC can be
used in several ways to construct regional indicators:
(1) Direct estimation from survey data – in the same way as done normally at the national
level – provided that the regional sample sizes are adequate for the purpose.
(2) Constructing alternative (but with a substantively similar meaning) indicators which
utilise the available survey data more intensively.
(3) Cumulation of data over survey waves to increase precision of the direct estimates.
(4) Using survey data in conjunction with data from other (especially administrative)
sources – which are larger in size but less detailed in content than survey data – in order
to produce improved estimates using small area estimation (SAE) techniques.
(5) Going altogether beyond the survey by exploiting administrative and other sources.
2. Cumulation over waves in a rotational panel design
Illustrations from European social surveys
The two most important regular social surveys in the EU are the Labour Force Survey
(EU-LFS) and Statistics on Income and Living Conditions (EU-SILC). The EU-LFS was
initiated at EU level in 1960, with a systematic common framework adopted from 1983.
It is a large sample survey, conducted in all EU countries on a continuous basis,
providing quarterly and annual results on labour participation along with socio-demographic and educational variables. Annual ad hoc modules dedicated to specific
topics supplement the core survey. The EU-SILC was launched starting from 2003 in
some countries; it covered 27 EU and EFTA countries by 2005, and all 30 by 2008. In
each country it involves an annual survey with a rotational panel design. Its content is
comprehensive, focusing on income, poverty and living conditions.
Both EU-LFS and EU-SILC involve comprehensiveness in the substantive dimension
(coverage of different topics), in space (coverage of different countries), and in time
(regular waves or rounds). EU-LFS involves diverse types of rotational designs; a simple
and common one is illustrated below on the left hand side. In this example, a sample
address stays in the survey for 5 consecutive quarters before being dropped. The
subsamples contributing to a particular year have been identified in the central part of the
diagram. As for EU-SILC most countries use the standard rotational household panel
design shown below on the right. Here the survey is annual, and each panel stays in the
survey for four consecutive years.
Pooling of data versus pooling of estimates
When two or more data sources contain – for the same type of units such as households
or persons – a set of variables measured in a comparable way, then the information may
be pooled either (a) by combining estimates from the different sources, or (b) by pooling
data at the micro level. Technical details and relative efficiencies of the procedures
depend on the situation. The two approaches may give numerically identical results, or
one may provide more accurate estimates than the other; in certain cases, only one of
the two approaches may be appropriate or feasible.
Consider for instance the common case of pooling results across countries in a
multi-country survey programme such as EU-SILC or EU-LFS. For linear statistics such as
totals, pooling individual country estimates, say $\phi_i$, with some appropriate weights
$P_i$ gives the same result as pooling data at the micro level with unit weights $w_{ij}$
rescaled as
\[ w'_{ij} = w_{ij} \cdot \frac{P_i}{\sum_j w_{ij}} . \]
For ratios of the form
\[ \phi_i = \frac{\sum_j w_{ij}\, v_{ij}}{\sum_j w_{ij}\, u_{ij}} , \]
the two forms give very similar but not identical results, corresponding respectively to
the ‘separate’ and ‘combined’ types of ratio estimate.
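The two pooling routes can be compared numerically. The sketch below uses made-up data for two hypothetical countries; the arrays, the unit weights and the pooling weights P are illustrative assumptions, not EU-SILC values. It checks that, for a weighted mean, pooling country estimates coincides with pooling micro data under the rescaled weights, while for a ratio the ‘separate’ and ‘combined’ forms differ slightly.

```python
import numpy as np

# Hypothetical micro data: unit weights w and two variables v, u per country.
countries = {
    "A": {"w": np.array([1.0, 2.0, 1.5]), "v": np.array([10.0, 12.0, 9.0]),
          "u": np.array([4.0, 5.0, 3.0])},
    "B": {"w": np.array([3.0, 1.0]), "v": np.array([20.0, 18.0]),
          "u": np.array([6.0, 7.0])},
}
P = {"A": 0.3, "B": 0.7}  # assumed pooling weights, e.g. population shares


def pooled_mean_from_estimates(data, P):
    """Route (a): combine country-level weighted means of v with weights P_i."""
    total = sum(P[c] * np.sum(d["w"] * d["v"]) / np.sum(d["w"]) for c, d in data.items())
    return total / sum(P.values())


def pooled_mean_from_microdata(data, P):
    """Route (b): pool units after rescaling w'_ij = w_ij * P_i / sum_j w_ij."""
    w_all = np.concatenate([d["w"] * P[c] / np.sum(d["w"]) for c, d in data.items()])
    v_all = np.concatenate([d["v"] for d in data.values()])
    return np.sum(w_all * v_all) / np.sum(w_all)


def separate_ratio(data, P):
    """Weighted combination of the country-level ratios sum(w v) / sum(w u)."""
    total = sum(P[c] * np.sum(d["w"] * d["v"]) / np.sum(d["w"] * d["u"])
                for c, d in data.items())
    return total / sum(P.values())


def combined_ratio(data, P):
    """Ratio of pooled numerator to pooled denominator under the rescaled weights."""
    num = sum(P[c] * np.sum(d["w"] * d["v"]) / np.sum(d["w"]) for c, d in data.items())
    den = sum(P[c] * np.sum(d["w"] * d["u"]) / np.sum(d["w"]) for c, d in data.items())
    return num / den


print(pooled_mean_from_estimates(countries, P), pooled_mean_from_microdata(countries, P))
print(separate_ratio(countries, P), combined_ratio(countries, P))
```

The first line prints the same value twice; the second prints two close but distinct values.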
This paper is concerned with a different but equally common type of problem, namely
pooling of different sources pertaining to the same population or largely overlapping and
similar populations. In particular, the interest is in pooling over survey waves in a
national survey in order to increase the precision of regional estimates. Estimates from
samples from the same population are most efficiently pooled with weights in inverse
proportion to their variances (meaning, with similar designs, in direct proportion to their
sample sizes). Alternatively, the samples may be pooled at the micro level, with unit weights
inversely proportional to their probabilities of appearing in any of the samples. This latter
procedure may be more efficient (e.g., O’Muircheartaigh and Pedlow, 2002), but may be
impossible to apply as it requires information, for every unit in the pooled sample, on its
probability of selection into each of the samples irrespective of whether or not the unit
appears in the particular sample (Wells, 1998). Another serious difficulty in pooling
samples is that, in the presence of complex sampling designs, the structure of the
resulting pooled sample can become too complex or even unknown to permit proper
variance estimation. In any case, different waves of a survey like EU-SILC or EU-LFS do
not correspond to exactly the same population. The problem is akin to that of combining
samples selected from multiple frames, for which it has been noted that micro level
pooling is generally not the most efficient method (Lohr and Rao, 2000).
For the above reasons, pooling of wave-specific estimates rather than of micro data sets is
generally the appropriate approach to aggregation over time from surveys such as EU-SILC and EU-LFS.
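As a minimal sketch of pooling at the level of estimates, the function below applies the textbook inverse-variance rule. It assumes independent estimates of the same quantity, which is a simplification: waves of a rotational panel overlap and are correlated, the case treated in Section 3. The function name and the input values are hypothetical.

```python
import numpy as np


def pool_estimates(estimates, variances):
    """Inverse-variance pooling of independent estimates of one quantity."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    weights = (1.0 / var) / np.sum(1.0 / var)   # weights sum to one
    pooled = float(np.sum(weights * est))
    pooled_var = float(1.0 / np.sum(1.0 / var))  # valid only under independence
    return pooled, pooled_var, weights


# Two hypothetical wave-specific estimates with equal variances.
pooled, pooled_var, weights = pool_estimates([20.0, 22.0], [1.0, 1.0])
print(pooled, pooled_var, weights)
```

With equal variances the rule reduces to a simple average, and the pooled variance is half the single-estimate variance.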
3. Gain in precision from cumulation over survey waves
Consider that for each wave, a person’s poverty status is determined based on the income
distribution of that wave separately, and the proportion poor at each wave is computed.
These proportions are then averaged over a number of consecutive waves. The issue is to
quantify the gain in sampling precision from such pooling, given that data from different
waves of a rotational panel are highly correlated. Variance for the pooled estimators can
be estimated on the following lines, using for instance the Jackknife Repeated
Replication (JRR) procedure (see Section 4). The total sample of interest is formed by the
union of all the cross-sectional samples being compared or aggregated. Using as basis the
common structure of this total sample, a set of JRR replications is defined in the usual
way. Each replication is formed such that when a unit is to be excluded in its
construction, it is excluded simultaneously from every wave where the unit appears. For
each replication, the required measure is constructed for each of the cross-sectional
samples involved, and these measures are used to obtain the required averaged measure
for the replication, from which variance is then estimated in the usual way (Betti et al.,
2007).
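The replication scheme described above can be sketched as follows. This is a simplified delete-one-PSU jackknife under stated assumptions (a single stratum, no finite population correction, synthetic data), not the authors' implementation; all names are hypothetical. The key step is that a dropped PSU is removed from every wave at once before the wave-specific rates are recomputed and averaged.

```python
import numpy as np


def averaged_hcr(wave, weight, poor):
    """Weighted poverty rate within each wave, then a simple average over waves."""
    rates = []
    for t in np.unique(wave):
        in_wave = wave == t
        rates.append(np.sum(weight[in_wave] * poor[in_wave]) / np.sum(weight[in_wave]))
    return float(np.mean(rates))


def jrr_variance(psu, wave, weight, poor, statistic=averaged_hcr):
    """Delete-one-PSU jackknife variance (single stratum, no fpc: assumptions)."""
    groups = np.unique(psu)
    n_groups = len(groups)
    replicates = np.empty(n_groups)
    for i, g in enumerate(groups):
        keep = psu != g  # the PSU is dropped from every wave in which it appears
        replicates[i] = statistic(wave[keep], weight[keep], poor[keep])
    return float((n_groups - 1) / n_groups * np.sum((replicates - replicates.mean()) ** 2))


# Synthetic two-wave panel: 40 hypothetical PSUs of 25 persons, present in both waves.
rng = np.random.default_rng(0)
n_psu, size = 40, 25
psu_one_wave = np.repeat(np.arange(n_psu), size)
poor_w1 = (rng.random(n_psu * size) < 0.20).astype(float)
# Poverty status persists with probability 0.8, otherwise it is redrawn.
redraw = (rng.random(n_psu * size) < 0.20).astype(float)
poor_w2 = np.where(rng.random(n_psu * size) < 0.8, poor_w1, redraw)

psu = np.concatenate([psu_one_wave, psu_one_wave])
wave = np.concatenate([np.zeros(n_psu * size, int), np.ones(n_psu * size, int)])
weight = np.ones(2 * n_psu * size)
poor = np.concatenate([poor_w1, poor_w2])

se_avg = jrr_variance(psu, wave, weight, poor) ** 0.5
print(f"averaged HCR = {averaged_hcr(wave, weight, poor):.3f}, JRR s.e. = {se_avg:.4f}")
```

Because each replication recomputes the statistic from scratch, the same loop works for any measure; in a real application the poverty line itself would also be recomputed within each replication.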
Table 1: Gain from cumulation over two waves: cross-sectional and persistent
poverty rates. Poland EU-SILC 2005-2006

Sample base   Poverty rate        Est.   n persons   %se* actual
CS-2006       HCR 2006            19.1   45,122      0.51
CS-2005       HCR 2005            20.6   49,044      0.45
LG 05-06      HCR 2006            18.5   32,820
LG 05-06      HCR 2005            20.2   32,820
LG 05-06      Persistent ‘05-06   12.5   32,820

Row    Mean income   HCR: national poverty line   HCR: regional poverty line
(1)    0.42          0.34                         0.40
(2)    1.31          1.18                         1.18
(3)    0.55          0.40                         0.47
(4)    0.60          0.48                         0.56
(5)    14%           30%                          30%

In terms of the quantities $V_1$, $V_2$, $b$, $n$ and $n_H$ defined in this section, rows
(1)-(5) of Table 1 are as follows.
(1) Standard error of average HCR over two years (assuming independent samples):
    $(1) = \tfrac{1}{2}\,(V_1 + V_2)^{1/2}$
(2) Factor by which standard error is increased due to positive correlation between waves:
    $(2) = \left[1 + b\,(n / n_H)\right]^{1/2}$
(3) Standard error of average HCR over two years (given correlated samples):
    $(3) = (1)\cdot(2) = V^{1/2}$
(4) Average standard error over a single year:
    $(4) = \tfrac{1}{2}\left(V_1^{1/2} + V_2^{1/2}\right)$
(5) Average gain in precision (variance reduction, or increase in effective sample size,
    over a single year sample):
    $(5) = 1 - \left[(3)/(4)\right]^2$
In place of the full JRR application, it is more illuminating to provide here the following
simplified procedure for quantifying the gain in precision from averaging over waves of
the rotational panel. It illustrates the statistical mechanism of how the gain is achieved.
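Indicating by $p_j$ and $p'_j$ the (1, 0) indicators of poverty of individual j over the two
adjacent waves, we have the following for the population variances, with E[.] denoting
the mean over individuals in the population:
\[ \mathrm{var}(p_j) = \mathrm{E}\big[(p_j - p)^2\big] = p\,(1 - p) = v ; \quad
   \text{similarly,} \quad \mathrm{var}(p'_j) = p'\,(1 - p') = v' \]
\[ \mathrm{cov}(p_j, p'_j) = \mathrm{E}\big[(p_j - p)\,(p'_j - p')\big] = a - p\,p' = c_1 , \text{ say,} \]
where ‘a’ is the persistent poverty rate over the two years. For the simple case where the
two waves completely overlap and $p' = p$, variance $v_A$ for the averaged measure is:
\[ v_A = \frac{v}{2}\,(1 + b) , \quad \text{with correlation} \quad
   b = \frac{c_1}{v} = \frac{a - p^2}{p - p^2} . \]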
The correlation between two periods is
expected to decline as the two become more widely separated. Consider, for example, the
case when the correlation between two points k waves apart can be approximated as
$(c_k / v) = (c_1 / v)^k$. In a set of K periods there are (K-k) pairs exactly k periods
apart, k=1 to (K-1). It follows that variance $v_K$ of an average over K periods relates
to variance v of the estimate from a single wave as:
\[ f_c = \frac{v_K}{v / K} = 1 + 2 \sum_{k=1}^{K-1} \frac{K - k}{K} \left(\frac{c_1}{v}\right)^{k} ,
   \qquad \frac{c_1}{v} = \frac{a - p^2}{p - p^2} , \]
where a, the persistent poverty between pairs of adjacent waves, and p, the cross-sectional
poverty rate, are averages over the waves involved. For application to pairs of
waves in EU-SILC, it is necessary to allow for variations in cross-sectional sample sizes
and partial overlaps. The result is:
\[ V = \frac{V_1 + V_2}{4} \left(1 + b\,\frac{n}{n_H}\right) , \]
where $V_1$ and $V_2$ are the sampling variances, b the correlation coefficient over the two
cross-sections, n is the overlap between the cross-sectional samples, and $n_H$ is the
harmonic mean of their sample sizes $n_1$ and $n_2$.
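The K-period factor $f_c$ given earlier in this section is easy to evaluate. The sketch below is a direct transcription under the stated geometric-decay assumption for the correlation; the function names and the input rates are illustrative choices, not values computed by the authors.

```python
def persistence_correlation(a, p):
    """Correlation c_1 / v between adjacent waves from the persistent rate a and rate p."""
    return (a - p ** 2) / (p - p ** 2)


def correlation_factor(K, rho):
    """f_c = 1 + 2 * sum_{k=1}^{K-1} ((K - k) / K) * rho**k, with rho = c_1 / v."""
    return 1.0 + 2.0 * sum((K - k) / K * rho ** k for k in range(1, K))


# Illustrative inputs: persistent poverty of 12.5% and a cross-sectional rate of 19%.
rho = persistence_correlation(0.125, 0.19)
for K in (1, 2, 3, 4):
    f_c = correlation_factor(K, rho)
    # v_K / v = f_c / K is the variance of the K-wave average relative to one wave.
    print(K, round(f_c, 3), round(f_c / K, 3))
```

For K = 2 the factor reduces to 1 + rho, in agreement with the expression for $v_A$ under complete overlap.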
The methodology described above was applied to the 2005-2006 cross-sectional and
longitudinal EU-SILC samples for Poland. Table 1 shows some results at the national
level. Averaging the poverty rate (head count ratio, HCR) over two waves leads to a
variance of this averaged estimator that is 30% less than the variance of the HCR
estimated from just a single wave.
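The arithmetic behind rows (1)-(5) of Table 1 can be retraced with a short script. The standard errors, sample sizes and poverty rates are taken from Table 1; computing b from the longitudinal-sample rates as $(a - p\,p') / \sqrt{v\,v'}$ is an assumption of this sketch, and the authors' own computation may differ in detail.

```python
import math


def two_wave_rows(se1, se2, b, n_overlap, n1, n2):
    """Rows (1)-(5) of Table 1 from two single-wave standard errors."""
    n_h = 2.0 * n1 * n2 / (n1 + n2)               # harmonic mean of the sample sizes
    row1 = 0.5 * math.sqrt(se1 ** 2 + se2 ** 2)   # s.e. of the average, independence
    row2 = math.sqrt(1.0 + b * n_overlap / n_h)   # inflation due to correlation
    row3 = row1 * row2                            # s.e. of the average, correlated
    row4 = 0.5 * (se1 + se2)                      # average single-year s.e.
    row5 = 1.0 - (row3 / row4) ** 2               # variance reduction
    return row1, row2, row3, row4, row5


# HCR with the national poverty line, Poland 2005-2006 (inputs from Table 1).
p, p_prime, a = 0.185, 0.202, 0.125               # LG 05-06 rates, persistent rate
b = (a - p * p_prime) / math.sqrt(p * (1 - p) * p_prime * (1 - p_prime))
rows = two_wave_rows(0.51, 0.45, b, 32820, 45122, 49044)
print([round(x, 2) for x in rows])
```

With these inputs the five values come out close to the HCR (national poverty line) column of Table 1: about 0.34, 1.18, 0.40, 0.48 and a gain of roughly 30%.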
Reduction from averaging over rounds in a rotational design
Consider a rotational sample in which each unit stays in the sample for n consecutive
periods, with the required estimate being the average over Q consecutive periods, such as
Q=4 for a four-year average. The case n=1 corresponds simply to independent samples
each period. In the general case, the total sample involved in the estimation consists of
(n+Q-1) independent subsamples. With some simplifying but reasonable assumptions, it
can be proved (Verma, Gagliardi and Ferretti, 2009) that the variance of the pooled
estimate is approximately
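\[ V_a^2 = \frac{V^2}{n\,Q} \cdot F(R) , \qquad
   F(R) = \frac{m_1\,(m_2 - m_1 + 1)\,[1 + f(m_1)] + 2 \sum_{m=1}^{m_1 - 1} m\,[1 + f(m)]}{n\,Q} , \]
\[ f(m) = \frac{2}{m} \left[(m-1)\,R + (m-2)\,R^2 + \dots + R^{m-1}\right] , \]
where $m_1 = \min(n, Q)$ and $m_2 = \max(n, Q)$, so that $m_1 m_2 = nQ$, and R is the
correlation between adjacent periods. The first factor is the variance which would be
obtained in the absence of correlations between waves; F(R) is the increase over that as
a result of correlations.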
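A minimal sketch of the factor F(R) follows. It assumes that m1 and m2 denote the smaller and the larger of n and Q, and that the correlation between observations of the same subsample k periods apart is R**k; both are readings of the formula rather than statements taken from the cited working paper. Function names are hypothetical.

```python
def f(m, R):
    """f(m) = (2 / m) * [(m - 1) R + (m - 2) R^2 + ... + R^(m - 1)]."""
    return 2.0 / m * sum((m - k) * R ** k for k in range(1, m))


def rotation_factor(n, Q, R):
    """Variance inflation F(R) for an average over Q periods, units staying n periods."""
    m1, m2 = min(n, Q), max(n, Q)
    # Subsamples observed in m1 of the Q periods (there are m2 - m1 + 1 of them).
    full = m1 * (m2 - m1 + 1) * (1.0 + f(m1, R))
    # Subsamples entering or leaving the window, observed m = 1, ..., m1 - 1 times.
    partial = 2.0 * sum(m * (1.0 + f(m, R)) for m in range(1, m1))
    return (full + partial) / (n * Q)


for n, Q in [(1, 4), (2, 2), (4, 2), (4, 4)]:
    print(n, Q, round(rotation_factor(n, Q, 0.5), 3))
```

Two checks follow from the formula: F equals 1 when R = 0 or n = 1, and for the EU-SILC design (n = 4) averaged over Q = 2 years it reduces to 1 + 0.75 R, in line with the two-wave expression when three quarters of the sample overlaps.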
4. Variance and design effects
The issues addressed in this paper concern the efficiency of cumulating information over
consecutive waves of a survey such as EU-SILC, involving complex statistics based on
complex sample designs. Estimates are required for the whole population and also for
subpopulations of different types. Both cross-sectional and longitudinal statistics are
involved. Comparisons and cumulation over correlated cross-sections, with which this
paper is concerned, add another layer of complexity.
Jackknife Repeated Replication (JRR) provides a versatile and straightforward technique
for variance estimation in these situations. It belongs to the class of variance estimation
methods based on comparisons among replications generated through repeated resampling of the same parent sample. Once the set of replications has been appropriately
defined for any complex design, the same variance estimation algorithm can be applied to
a statistic of any complexity. We have extended and applied this method for estimating
variances for subpopulations (including regions and other geographical domains),
longitudinal measures such as persistent poverty rates, and measures of net changes and
averages over cross-sections in the rotational panel design of EU-SILC (Verma and Betti,
2007). Appropriate coding of the sample structure, in the survey micro-data and
accompanying documentation, is an essential requirement in order to compute sampling
errors taking into account the actual sample design. Lack of information on the sample
structure in survey data files is a long-standing and persistent problem in survey work,
and unfortunately affects EU-SILC as well. Indeed, the major problem in computing
sampling errors for EU-SILC is the lack of sufficient information for this purpose in the
micro-data available to researchers. We have developed approximate procedures in
order to overcome these limitations at least partially, and used them to produce useful
estimates of sampling errors (Verma et al., 2010). Use has been made of these results in
this paper, but it is not possible here to go into detail concerning them.
A most useful concept for the computation, analysis and interpretation of sampling errors
concerns ‘design effect’ (Kish, 1995). Design effect is the ratio of the variance (v) under
the given sample design, to the variance (v0) under a simple random sample of the same
size: $d^2 = v / v_0$, $d = se / se_0$. Proceeding from estimates of sampling error to estimates of design
effects is essential for understanding the patterns of variation in and the determinants of
magnitude of the error, for smoothing and extrapolating the results of computations, and
for evaluating the performance of the sampling design.
Analysis of design effects into components is also needed in order to understand from
where inefficiencies of the sample arise, to identify patterns of variation, and through that
to extend the results to other statistics, designs and situations. And most importantly, with
JRR (and other replication methods) the total design effect can only be estimated by
estimating (some of) its components separately (Verma and Betti, 2010). In applications for
EU-SILC, there is in addition a most important and special reason for decomposing the
total design effect into its components. Because of the limited information on sample
structure included in the micro-data available to researchers, direct and complete
computation of variances cannot be done in many cases. Decomposition of variances and
design effects identifies more ‘portable’ components, which may be more easily imputed
(carried over) from a situation where they can be computed with the given information, to
another situation where such direct computations are not possible. On this basis valid
estimates of variances can be produced for a wider range of statistics, thus at least partly
overcoming the problem due to lack of information on sample structure. We may
decompose total variance v (for the actual design) into the components or factors as
$v = v_0 \cdot d^2 = v_0 \cdot (d_W \cdot d_H \cdot d_D \cdot d_X)^2$, where dW is the effect of sample weights, dH of clustering of
individual persons into households, dD of clustering of households into dwellings, and dX
that of other complexities of the design, mainly clustering and stratification. All factors
other than dX do not involve clusters or strata, but depend only on individual elements
(households, persons etc.), and the sample weight associated with each such element in
the sample. Parameter dW depends on variability of sample weights, and secondly also on
the correlation between the weights and the variable being estimated; dH is determined by
the number of and correlation among relevant individuals in the household, and similarly
dD by the number of households per dwelling in a sample of the latter. By contrast, factor
dX represents the effect on sampling error of various complexities of the design such as
multiple stages and stratification. Hence unlike other components, dX requires
information on the sample structure linking elementary units to higher stage units and
strata. This effect can be estimated as follows using the JRR procedures. We compute
variance under two assumptions about structure of the design: variance v under the actual
design, and vR computed by assuming the design to be (weighted) simple random
sampling of the ultimate units (addresses, households, persons as the case may be). This
can be estimated from a ‘randomised sample’ created from the actual sample by
completely disregarding its structure other than the weights attached to individual
elements. This gives $d_X^2 = v / v_R$, with $v_R = v_0 \cdot (d_W \cdot d_H \cdot d_D)^2$.
Table 2 gives standard error, design effect and components of design effect for the cross-sectional 2006 EU-SILC sample for Poland. The sample was a two stage stratified sample
of dwellings containing 45,122 individual persons. With “%se” (3rd and last column) we
mean: for mean statistics e.g. equivalised disposable income – standard error expressed as
percentage of the mean value; for proportions and rates (e.g. poverty rates) – standard
error given as absolute percent points. Terms (%se actual) and (%se SRS) relate,
respectively, to the variances v and v0 in the text. Parameter dD cannot be estimated
separately because of lack of information, but its effect is small and is, in any case,
already incorporated into overall design effect d.
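The multiplicative decomposition can be checked against the published components. In the sketch below, the component values are those reported in Table 2; the weighting factor uses Kish's well-known approximation based on the coefficient of variation of the weights, which ignores the correlation between weights and the study variable and is offered only as an illustration, not as the procedure used by the authors. Setting dD to 1 is likewise an assumption made because it is not reported separately.

```python
import numpy as np


def kish_weight_effect(weights):
    """Approximate d_W as sqrt(n * sum(w^2) / (sum w)^2), Kish's weighting factor."""
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt(len(w) * np.sum(w ** 2) / np.sum(w) ** 2))


def total_design_effect(d_x, d_w, d_h, d_d=1.0):
    """d = d_X * d_W * d_H * d_D on the standard-error scale."""
    return d_x * d_w * d_h * d_d


def se_srs(se_actual, d):
    """Standard error under simple random sampling implied by se_actual and d."""
    return se_actual / d


# Components for the HCR with the national poverty line, as reported in Table 2.
d = total_design_effect(d_x=1.02, d_w=1.09, d_h=1.74)
print(round(d, 3), round(se_srs(0.51, d), 3))
print(kish_weight_effect([1.0, 1.0, 1.0]), kish_weight_effect([1.0, 3.0]))
```

The product of the rounded components agrees with the reported overall d up to rounding, and dividing the actual standard error by d recovers the SRS column.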
Table 2: Estimation of variance and design effects at the national level. Cross-sectional sample. Poland EU-SILC 2006

                                                                               %se       Design effect                %se
                                                                       Est.    actual    dX     dW     dH     d       SRS
(1) Mean equivalised disposable income                                 3,704   0.57      0.94   1.22   1.74   1.99    0.29
(2) HCR – ‘head count’ or poverty rate, using national poverty line    19.1    0.51      1.02   1.09   1.74   1.94    0.26
(3) HCR – ‘head count’ or poverty rate, using regional (NUTS1)
    poverty line                                                       19.0    0.61      1.05   1.09   1.74   1.99    0.30
Table 2 gives poverty rates defined with respect to two different ‘levels’ of poverty line:
country level and NUTS1 level. By this we mean the population level to which the
income distribution is pooled for the purpose of defining the poverty line. Conventionally
poverty rates are defined in terms of the country poverty line (as 60% of the national
median income). The income distribution is considered at the country level, in relation to
which a poverty line is defined and the number (and proportion) of poor computed. It is
also useful to consider poverty lines at other levels. Especially useful for constructing
regional indicators is the use of regional poverty lines, i.e. a poverty line defined for each
region based only on the income distribution within that region. The numbers of poor
persons identified with these lines can then be used to estimate regional poverty rates.
They can also be aggregated upwards to give an alternative national poverty rate – but
which still remains based on the regional poverty lines. So defined, the poverty measures
are not affected by disparities in the mean levels of income among the regions. The
measures are therefore more purely relative.
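The difference between the two ‘levels’ of poverty line can be illustrated on a toy data set. The sketch below assumes a poverty line at 60% of the weighted median, a lower weighted median without interpolation, and two hypothetical regions; none of the numbers are EU-SILC values.

```python
import numpy as np


def weighted_median(values, weights):
    """Lower weighted median: first value at which cumulated weight reaches one half."""
    order = np.argsort(values, kind="stable")
    v = np.asarray(values, dtype=float)[order]
    cum = np.cumsum(np.asarray(weights, dtype=float)[order])
    return float(v[np.searchsorted(cum, 0.5 * cum[-1])])


def head_count_ratio(income, weight, line):
    """Weighted share of persons below the line (scalar or person-specific)."""
    return float(np.sum(weight * (income < line)) / np.sum(weight))


def hcr_by_region(income, weight, region, share=0.6):
    """HCR under the national line and under region-specific lines."""
    national_line = share * weighted_median(income, weight)
    regional_line = np.empty_like(income, dtype=float)
    by_region = {}
    for g in np.unique(region):
        in_g = region == g
        line_g = share * weighted_median(income[in_g], weight[in_g])
        regional_line[in_g] = line_g
        by_region[g] = (head_count_ratio(income[in_g], weight[in_g], national_line),
                        head_count_ratio(income[in_g], weight[in_g], line_g))
    # National rates: national line, and regional lines aggregated upwards.
    national = (head_count_ratio(income, weight, national_line),
                head_count_ratio(income, weight, regional_line))
    return national, by_region


income = np.array([100.0, 100.0, 100.0, 50.0, 40.0, 40.0, 40.0, 20.0])
region = np.array(["rich"] * 4 + ["poor"] * 4)
weight = np.ones(8)
national, by_region = hcr_by_region(income, weight, region)
print(national, by_region)
```

In this toy example the rates based on regional lines are the same in both regions, whereas the rates based on the national line differ sharply, which mirrors the point that regional lines remove the effect of disparities in mean income between regions.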
5. Illustrative applications of cumulation at the regional level
Table 3 shows results for the estimation of variance and design effect for the cross-sectional 2006 and 2005 Poland datasets. The results at national level for the three
measures considered have been already presented in previous sections. Here we present
the results at NUTS1 regional level. All the values, except “%se SRS” and dX, are
computed at regional level in the same manner as the national level. All factors other than
dX do not involve clusters or strata, but essentially depend only on individual elements
and the associated sample weights. Hence normally they are well estimated, even for
quite small regions. Factor $d_{X(G)}$ for a region (G) may be estimated in relation to $d_{X(C)}$
estimated at the country (C) level on the following lines. For large regions, each with a
large enough number of PSUs (say over 25 or 30), we may estimate the variance and
hence $d_{X(G)}$ directly at the regional level. Sometimes a region involves a SRS of elements,
even if the national sample is multi-stage in other parts; here obviously, $d_{X(G)} = 1$. If the
sample design in the region is the same or very similar to that for the country as a whole
– which is quite often the case – we can take $d_{X(G)} = d_{X(C)}$. It is common that the main
difference between the regional and the total samples is the average cluster size (b). In
this case we use
\[ d_{X(G)}^2 = 1 + \left(d_{X(C)}^2 - 1\right) \cdot \frac{b_{(G)}}{b_{(C)}} . \]
The last-mentioned model concerns the effect of clustering and hence is meaningful only
if $d_{X(C)} \ge 1$, which is often but not always the case in actual computations. Values
smaller than 1.0 may arise when the effect of stratification is stronger than that of
clustering, when units within clusters are negatively correlated (which is rare, but not
impossible), or simply as a result of random variability in the empirical results. In any
case, if $d_{X(C)} < 1$, the above equation should be replaced by $d_{X(G)} = d_{X(C)}$.
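The rule for carrying dX from the country to a region can be written as a small function. This is a sketch of the cluster-size model only, with hypothetical argument names and made-up cluster sizes; the fallback for country values below 1 follows the text.

```python
import math


def regional_dx(dx_country, b_region, b_country):
    """Regional d_X from the country value, adjusting for average cluster size."""
    if dx_country < 1.0:
        # The clustering model is not meaningful; carry the country value over.
        return dx_country
    return math.sqrt(1.0 + (dx_country ** 2 - 1.0) * b_region / b_country)


print(regional_dx(0.94, 5.0, 10.0))             # country value below 1: unchanged
print(round(regional_dx(1.20, 5.0, 10.0), 3))   # smaller clusters: smaller effect
print(round(regional_dx(1.20, 10.0, 10.0), 3))  # same cluster size: same effect
```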
The quantity (%se* SRS) can be directly computed at the regional level as was
done for the national level in Table 2. However, a very good approximation can usually be
obtained very simply without involving JRR computations of variance. The following
model has been used in Table 3. For means (such as equivalised income) over very
similar populations, assumption of a constant coefficient of variation is reasonable. The
region-to-country ratio of relative standard errors (expressed as percentage of the mean
value as in Table 3) under simple random sampling is inversely proportional to the
square-root of their respective sample sizes:
\[ (\%se^{*}\,SRS)_G^2 = (\%se^{*}\,SRS)_C^2 \cdot \frac{n_C}{n_G} . \]
For proportions (p, with q=100-p), with standard error expressed in absolute percent points
as in Table 3, we can take:
\[ (\%se^{*}\,SRS)_G^2 = (\%se^{*}\,SRS)_C^2 \cdot \frac{p_G\, q_G}{p_C\, q_C} \cdot \frac{n_C}{n_G} . \]
A poverty rate may be treated as a proportion for the purpose of applying the above. We
see from Table 3 that the (%se* actual) at regional level is generally, for all the three
measures, 2 to 3 times larger than that at the national level.
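The approximation for the regional SRS standard error can be tried on the Table 3 figures for region PL1. The function names are hypothetical; the inputs are the national values and the regional estimates and sample sizes as printed in Table 3, so small differences from the tabulated results are to be expected because the national inputs are rounded.

```python
import math


def se_srs_region_mean(se_c, n_c, n_g):
    """Regional %se under SRS for a mean, assuming a constant coefficient of variation."""
    return se_c * math.sqrt(n_c / n_g)


def se_srs_region_proportion(se_c, n_c, n_g, p_c, p_g):
    """Regional %se under SRS for a proportion; p in percent, q = 100 - p."""
    q_c, q_g = 100.0 - p_c, 100.0 - p_g
    return se_c * math.sqrt((p_g * q_g) / (p_c * q_c) * n_c / n_g)


# Region PL1 against Poland, 2006 (n = 8,728 and 45,122 persons).
mean_pl1 = se_srs_region_mean(0.29, 45122, 8728)
hcr_pl1 = se_srs_region_proportion(0.26, 45122, 8728, p_c=19.1, p_g=17.1)
print(round(mean_pl1, 2), round(hcr_pl1, 2))
```

Both results lie within about 0.01 of the corresponding (%se* SRS) entries for PL1 in Table 3.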
Table 3: Estimation of variance and design effects at the regional (NUTS1) level.
Full cross-sectional dataset

          2006                                                    2005
          Est.    n persons   %se* SRS   dX     d      %se* actual   Est.    n persons   %se* actual
Mean equivalised disposable income
Poland    3,704   45,122      0.29       0.94   1.99   0.57          3,040   49,044      0.62
PL1       4,236    8,728      0.65       0.94   2.06   1.34          3,455    9,871      1.32
PL2       3,889    9,273      0.63       0.94   1.78   1.13          3,143   10,181      1.22
PL3       3,162    9,079      0.64       0.94   2.00   1.28          2,618    9,674      1.32
PL4       3,530    6,912      0.73       0.94   1.90   1.39          2,977    7,195      1.84
PL5       3,906    4,538      0.90       0.94   1.96   1.77          3,164    5,066      1.85
PL6       3,419    6,592      0.75       0.94   1.90   1.43          2,816    7,057      1.58
At-risk-of-poverty rate, national poverty line
Poland    19.1    45,122      0.26       1.02   1.94   0.51          20.6    49,044      0.45
PL1       17.1     8,728      0.57       1.02   1.85   1.06          19.1     9,871      0.92
PL2       14.7     9,273      0.52       1.02   1.86   0.97          16.4    10,181      0.87
PL3       25.2     9,079      0.64       1.02   2.09   1.34          25.2     9,674      1.13
PL4       18.7     6,912      0.66       1.02   1.98   1.32          20.2     7,195      1.19
PL5       18.6     4,538      0.82       1.02   1.91   1.56          20.2     5,066      1.43
PL6       21.4     6,592      0.71       1.02   1.95   1.40          23.7     7,057      1.26
At-risk-of-poverty rate, regional poverty lines
Poland    19.0    45,122      0.30       1.05   1.99   0.61          20.5    49,044      0.51
PL1       19.8     8,728      0.70       1.04   1.90   1.34          20.9     9,871      1.07
PL2       18.5     9,273      0.67       1.04   1.91   1.27          19.0    10,181      1.05
PL3       18.6     9,079      0.68       1.06   2.14   1.45          20.8     9,674      1.21
PL4       17.5     6,912      0.76       1.05   2.04   1.54          20.1     7,195      1.35
PL5       20.9     4,538      1.00       1.04   1.97   1.96          22.2     5,066      1.68
PL6       19.1     6,592      0.80       1.05   2.00   1.60          21.3     7,057      1.37
Regional HCR estimates based on the national poverty line are quite different from those
based on the regional ones. Also, while individual regional estimates of HCR using the
regional poverty line are quite close to the national estimate (19.0 for 2006), the ones
using the national poverty line are more variable (from 14.7 to 25.2 for 2006).
From Table 4 below it can be seen that generally for the HCR measures, both for country
and NUTS1 level poverty lines, cumulating the estimates over two waves leads to a
reduction of 30% in variance compared to that for a single wave. This reduction of the
variance is smaller for mean equivalised income due to a higher correlation between
incomes for the two years – generally the coefficient of correlation of the equivalised
income between waves exceeds 0.70.
Table 4: Gain in precision from averaging over correlated samples. Poland NUTS1
regions

Mean equivalised income
          (1)    (2)    (3)    (4)    (5)
Country   0.42   1.31   0.55   0.60   14%
PL1       0.94   1.33   1.26   1.33   11%
PL2       0.83   1.30   1.08   1.17   15%
PL3       0.92   1.31   1.20   1.30   14%
PL4       1.15   1.27   1.47   1.62   18%
PL5       1.28   1.32   1.70   1.81   12%
PL6       1.07   1.32   1.41   1.51   12%

HCR national poverty line
          (1)    (2)    (3)    (4)    (5)
Country   0.34   1.18   0.40   0.48   30%
PL1       0.70   1.18   0.83   0.99   29%
PL2       0.65   1.17   0.76   0.92   31%
PL3       0.88   1.18   1.03   1.24   30%
PL4       0.89   1.18   1.05   1.26   30%
PL5       1.06   1.17   1.23   1.50   32%
PL6       0.94   1.19   1.12   1.33   29%

HCR regional poverty line
          (1)    (2)    (3)    (4)    (5)
Country   0.40   1.18   0.47   0.56   30%
PL1       0.86   1.18   1.02   1.21   29%
PL2       0.83   1.18   0.98   1.16   29%
PL3       0.94   1.17   1.10   1.33   31%
PL4       1.03   1.18   1.21   1.45   30%
PL5       1.29   1.17   1.51   1.82   31%
PL6       1.05   1.18   1.24   1.49   31%

Rows (1) – (5) have been defined in Table 1.
References
Betti, G., Gagliardi, F., Nandi, T.: Jackknife variance estimation of differences and
averages of poverty measures. Working Paper no. 68/2007, DMQ, Università di Siena
(2007).
Kish, L.: Methods for design effects. Journal of Official Statistics, 11, 55-77 (1995).
Lohr, S. L., Rao, J. N. K.: Inference from dual frame surveys. Journal of the American
Statistical Association, 95, 271-280 (2000).
O’Muircheartaigh, C., Pedlow, S.: Combining samples vs. cumulating cases: a comparison
of two weighting strategies in NLSY97. American Statistical Association Proceedings
of the Joint Statistical Meetings, pp. 2557-2562 (2002).
Verma, V., Betti, G.: Cross-sectional and Longitudinal Measures of Poverty and
Inequality: Variance Estimation using Jackknife Repeated Replication. Conference
2007 ‘Statistics under one Umbrella’, Bielefeld University (2007).
Verma, V., Betti, G.: Taylor linearization sampling errors and design effects for poverty
measures and other complex statistics. Journal of Applied Statistics (2010), on-line
first.
Verma, V., Betti, G., Gagliardi, F.: An assessment of survey errors in EU-SILC. Eurostat
Methodologies and Working Papers, Eurostat, Luxembourg (2010).
Verma, V., Betti, G., Natilli, M., Lemmi, A.: Indicators of social exclusion and poverty in
Europe’s regions. Working Paper no. 59/2006, DMQ, Università di Siena (2006).
Verma, V., Gagliardi, F., Ferretti, C.: On pooling of data and measures. Working Paper
no. 84/2009, DMQ, Università di Siena (2009).
Wells, J. E.: Oversampling through households or other clusters: comparison of methods
for weighting the oversample elements. Australian and New Zealand Journal of
Statistics, 40, 269-277 (1998).