
Pooled estimates of indicators

Achille Lemmi, Vijay Verma, Gianni Betti, Laura Neri, Francesca Gagliardi, Giulio Tarditi, Caterina Ferretti¹

¹ University of Siena, email: lemmi@unisi.it, verma@unisi.it, betti2@unisi.it, neri@unisi.it, gagliardi10@unisi.it, giuliotarditi@gmail.com, caterinaferretti@libero.it.

Abstract
Reliable indicators of poverty and social exclusion are an essential monitoring tool. Policy research and application increasingly require statistics disaggregated to lower levels and smaller subpopulations. This paper addresses some statistical aspects of improving the sampling precision of such indicators for subnational regions, in particular through the cumulation of data.

Keywords: sample design and estimation, longitudinal data analysis, measuring poverty and inequality.

1. Context and scope
Reliable indicators of poverty and social exclusion are an essential monitoring tool. In the EU-wide context, these indicators are most useful when they are comparable across countries and over time, so that trends can be monitored. Furthermore, policy research and application increasingly require statistics disaggregated to lower levels and smaller subpopulations. Direct, one-time estimates from surveys designed primarily to meet national needs tend to be insufficiently precise for these new policy needs. This is particularly true in the domain of poverty and social exclusion, the monitoring of which requires complex distributional statistics, necessarily based on intensive and relatively small-scale surveys of households and persons. This paper addresses some statistical aspects of improving the sampling precision of such indicators for subnational regions in EU countries (Verma et al., 2006), in particular through the cumulation of data over rounds of regularly repeated national surveys (Verma et al., 2009).
The reference data for this purpose are based on EU Statistics on Income and Living Conditions (EU-SILC), the major source of comparative statistics on income and living conditions in Europe. EU-SILC covers data and data sources of various types: cross-sectional and longitudinal; household-level and person-level; on income and social conditions; and, depending on the country, from registers and from interview surveys. A standard integrated design has been adopted by nearly all EU countries. It involves a rotational panel in which a new sample of households and persons is introduced each year to replace one quarter of the existing sample. Persons enumerated in each new sample are followed up in the survey for four years. The design yields each year a cross-sectional sample, as well as longitudinal samples of various durations.

Two types of measures can be constructed at the regional level by aggregating information on individual elementary units: average measures, such as totals, means, rates and proportions, constructed by aggregating or averaging individual values; and distributional measures, such as measures of variation or dispersion among households and persons in the region. Average measures are often more easily constructed or are available from alternative sources. Distributional measures tend to be more complex and are less readily available from sources other than complex surveys; at the same time, such measures are more pertinent to the analysis of poverty and social exclusion. An important point to note is that, more than at the national level, many measures of averages can also serve as indicators of disparity and deprivation when seen in the regional context: the dispersion of regional means is of direct relevance in identifying geographical disparity.

Survey data such as those from EU-SILC can be used in different ways to construct regional indicators:
(1) Direct estimation from survey data, in the same way as is normally done at the national level, provided that the regional sample sizes are adequate for the purpose.
(2) Constructing alternative indicators (with a substantively similar meaning) which use the available survey data more intensively.
(3) Cumulation of data over survey waves to increase the precision of the direct estimates.
(4) Using survey data in conjunction with data from other (especially administrative) sources, which are larger in size but less detailed in content than survey data, in order to produce improved estimates using small area estimation (SAE) techniques.
(5) Going altogether beyond the survey by exploiting administrative and other sources.

2. Cumulation over waves in a rotational panel design

Illustrations from European social surveys
The two most important regular social surveys in the EU are the Labour Force Survey (EU-LFS) and Statistics on Income and Living Conditions (EU-SILC). The EU-LFS was initiated at EU level in 1960, with a systematic common framework adopted from 1983. It is a large sample survey, conducted in all EU countries on a continuous basis, providing quarterly and annual results on labour force participation along with sociodemographic and educational variables. Ad hoc modules dedicated to specific topics supplement the core survey each year. EU-SILC was launched in some countries in 2003; it covered 27 EU and EFTA countries by 2005, and all 30 by 2008. In each country it involves an annual survey with a rotational panel design. Its content is comprehensive, focusing on income, poverty and living conditions. Both EU-LFS and EU-SILC are comprehensive in the substantive dimension (coverage of different topics), in space (coverage of different countries), and in time (regular waves or rounds). EU-LFS uses diverse types of rotational design; a simple and common one is illustrated in the left-hand part of the diagram below.
In this example, a sample address stays in the survey for 5 consecutive quarters before being dropped. The subsamples contributing to a particular year are identified in the central part of the diagram. For EU-SILC, most countries use the standard rotational household panel design shown on the right. Here the survey is annual, and each panel stays in the survey for four consecutive years.

[Diagram: quarterly EU-LFS rotation pattern (left) and annual EU-SILC four-year rotational panel (right); not reproduced.]

Pooling of data versus pooling of estimates
When two or more data sources contain, for the same type of units such as households or persons, a set of variables measured in a comparable way, the information may be pooled either (a) by combining estimates from the different sources, or (b) by pooling data at the micro level. Technical details and the relative efficiency of the procedures depend on the situation. The two approaches may give numerically identical results, or one or the other may provide more accurate estimates; in certain cases, only one of the two approaches may be appropriate or feasible.

Consider for instance the common case of pooling results across countries in a multi-country survey programme such as EU-SILC or EU-LFS. For linear statistics such as totals and means, pooling individual country estimates, say the weighted means $\bar{y}_i = \sum_j w_{ij} y_{ij} / \sum_j w_{ij}$, with some appropriate weights $P_i$ (that is, forming $\sum_i P_i \bar{y}_i$) gives the same result as pooling data at the micro level with the unit weights $w_{ij}$ rescaled as
$$w'_{ij} = w_{ij} \cdot \frac{P_i}{\sum_j w_{ij}}.$$
For ratios of the form
$$r_i = \frac{\sum_j w_{ij}\, v_{ij}}{\sum_j w_{ij}\, u_{ij}},$$
the two forms give very similar but not identical results, corresponding respectively to the 'separate' and 'combined' types of ratio estimate.

This paper is concerned with a different but equally common type of problem, namely pooling of different sources pertaining to the same population, or to largely overlapping and similar populations. In particular, the interest is in pooling over survey waves in a national survey in order to increase the precision of regional estimates.
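The equivalence for means, and the small discrepancy between 'separate' and 'combined' ratios, can be checked on a toy example. The sketch below is illustrative only: the two 'countries', their weights $w_{ij}$, values and pooling weights $P_i$ are invented, and the helper names are hypothetical.

```python
# Toy check of pooling estimates versus pooling micro data (hypothetical data).
# Each country has unit weights w_ij and values y_ij (for a mean) or
# v_ij / u_ij (numerator and denominator of a ratio); P_i are pooling weights.


def weighted_ratio(w, num, den):
    """sum_j w_j * num_j / sum_j w_j * den_j."""
    return sum(a * b for a, b in zip(w, num)) / sum(a * b for a, b in zip(w, den))


def rescaled(w, p_i):
    """Micro-level weights w'_ij = w_ij * P_i / sum_j w_ij."""
    total = sum(w)
    return [wij * p_i / total for wij in w]


def stack(countries, P, key):
    """Concatenate one variable across countries, with rescaled weights."""
    w_all, x_all = [], []
    for c, p_i in zip(countries, P):
        w_all += rescaled(c["w"], p_i)
        x_all += c[key]
    return w_all, x_all


def mean_from_estimates(countries, P):
    # (a) combine the country weighted means with weights P_i
    means = [weighted_ratio(c["w"], c["y"], [1.0] * len(c["y"])) for c in countries]
    return sum(p * m for p, m in zip(P, means)) / sum(P)


def mean_from_micro(countries, P):
    # (b) one weighted mean over the stacked unit records
    w_all, y_all = stack(countries, P, "y")
    return weighted_ratio(w_all, y_all, [1.0] * len(y_all))


def separate_ratio(countries, P):
    # 'separate' ratio: pool the country ratios r_i with weights P_i
    r = [weighted_ratio(c["w"], c["v"], c["u"]) for c in countries]
    return sum(p * ri for p, ri in zip(P, r)) / sum(P)


def combined_ratio(countries, P):
    # 'combined' ratio: a single ratio over the stacked records
    w_all, v_all = stack(countries, P, "v")
    _, u_all = stack(countries, P, "u")
    return weighted_ratio(w_all, v_all, u_all)


countries = [
    {"w": [1.0, 2.0, 1.0], "y": [4.0, 5.0, 2.0], "v": [2.0, 3.0, 1.0], "u": [4.0, 5.0, 2.0]},
    {"w": [1.0, 1.0], "y": [2.0, 10.0], "v": [1.0, 3.0], "u": [2.0, 10.0]},
]
P = [0.6, 0.4]

if __name__ == "__main__":
    print(mean_from_estimates(countries, P), mean_from_micro(countries, P))  # both about 4.8
    print(separate_ratio(countries, P), combined_ratio(countries, P))  # about 0.471 vs 0.448
```

With a single country the separate and combined ratios coincide; the gap between them comes from differences between the country ratios.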
Estimates from samples from the same population are most efficiently pooled with weights inversely proportional to their variances (meaning, with similar designs, in direct proportion to their sample sizes). Alternatively, the samples may be pooled at the micro level, with unit weights inversely proportional to their probabilities of appearing in any of the samples. This latter procedure may be more efficient (e.g., O'Muircheartaigh and Pedlow, 2002), but may be impossible to apply, as it requires information, for every unit in the pooled sample, on its probability of selection into each of the samples, irrespective of whether or not the unit appears in the particular sample (Wells, 1998). Another serious difficulty in pooling samples is that, with complex sampling designs, the structure of the resulting pooled sample can become too complex, or even unknown, to permit proper variance estimation. In any case, different waves of a survey like EU-SILC or EU-LFS do not correspond to exactly the same population. The problem is akin to that of combining samples selected from multiple frames, for which it has been noted that micro-level pooling is generally not the most efficient method (Lohr and Rao, 2000). For the above reasons, pooling of wave-specific estimates rather than of micro data sets is generally the appropriate approach to aggregation over time from surveys such as EU-SILC and EU-LFS.

3. Gain in precision from cumulation over survey waves
Suppose that for each wave a person's poverty status is determined from the income distribution of that wave alone, and the proportion poor at each wave is computed. These proportions are then averaged over a number of consecutive waves. The issue is to quantify the gain in sampling precision from such pooling, given that data from different waves of a rotational panel are highly correlated.
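As a minimal sketch of this averaging, the code below judges poverty in each wave against that wave's own line and then averages the wave-specific head count ratios. The poverty line is assumed to be 60% of the weighted median, the conventional national line referred to later in the paper; the simple 'lower' weighted median, the toy incomes and the function names are our own assumptions.

```python
# Wave-specific poverty lines and the average head count ratio (HCR) over waves.
# Hypothetical data; the poverty line is assumed to be 60% of the weighted
# median equivalised income of the same wave.


def weighted_median(values, weights):
    """Lower weighted median: first value at which cumulative weight reaches half."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cum = 0.0
    for value, weight in pairs:
        cum += weight
        if cum >= half:
            return value
    return pairs[-1][0]


def head_count_ratio(incomes, weights, share=0.6):
    """Weighted proportion with income below share * median of this wave."""
    line = share * weighted_median(incomes, weights)
    poor = sum(w for x, w in zip(incomes, weights) if x < line)
    return poor / sum(weights)


def averaged_hcr(waves):
    """waves: list of (incomes, weights); each wave uses its own poverty line."""
    rates = [head_count_ratio(x, w) for x, w in waves]
    return sum(rates) / len(rates), rates


waves = [
    ([5, 8, 10, 12, 15, 20, 30], [1] * 7),  # wave 1: median 12, line about 7.2
    ([4, 6, 11, 13, 16, 22, 28], [1] * 7),  # wave 2: median 13, line about 7.8
]

if __name__ == "__main__":
    avg, rates = averaged_hcr(waves)
    print(rates, avg)
```

The averaging itself is trivial; the point of the rest of this section is that the two rates are computed largely on the same persons, so their errors are correlated and the variance of the average is not simply half that of one wave.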
Variance for the pooled estimators can be estimated along the following lines, using for instance the Jackknife Repeated Replication (JRR) procedure (see Section 4). The total sample of interest is formed by the union of all the cross-sectional samples being compared or aggregated. Using the common structure of this total sample as the basis, a set of JRR replications is defined in the usual way. Each replication is formed such that when a unit is to be excluded in its construction, it is excluded simultaneously from every wave in which the unit appears. For each replication, the required measure is constructed for each of the cross-sectional samples involved, and these measures are used to obtain the required averaged measure for the replication, from which the variance is then estimated in the usual way (Betti et al., 2007).

In place of the full JRR application, it is more illuminating to present here the following simplified procedure for quantifying the gain in precision from averaging over waves of the rotational panel. It illustrates the statistical mechanism by which the gain is achieved. Denoting by $p_j$ and $p'_j$ the (1, 0) indicators of poverty of individual $j$ in two adjacent waves, we have for the population variances
$$\mathrm{var}(p_j) = \overline{(p_j - p)^2} = p(1-p) = v; \qquad \text{similarly} \qquad \mathrm{var}(p'_j) = p'(1-p') = v',$$
$$\mathrm{cov}(p_j, p'_j) = \overline{(p_j - p)(p'_j - p')} = a - p\,p' = c_1, \ \text{say},$$
where $a$ is the persistent poverty rate over the two years (the proportion poor in both waves). For the simple case where the two waves completely overlap and $p' = p$, the variance $v_A$ for the averaged measure is
$$v_A = \frac{v}{2}\,(1 + b), \qquad \text{with correlation} \qquad b = \frac{c_1}{v} = \frac{a - p^2}{p(1-p)}.$$

The correlation between two periods is expected to decline as the periods become more widely separated. Consider, for example, the case where the correlation between two points $k$ waves apart can be approximated as $c_k / v = (c_1 / v)^k$. In a set of $K$ periods there are $(K - k)$ pairs exactly $k$ periods apart, $k = 1, \ldots, K-1$. It follows that the variance $v_K$ of an average over $K$ periods relates to the variance $v$ of the estimate from a single wave as
$$\frac{v_K}{v} = \frac{1}{K}\left[1 + 2\sum_{k=1}^{K-1} \frac{K-k}{K}\left(\frac{c_1}{v}\right)^k\right], \qquad \frac{c_1}{v} = \frac{a - p^2}{p(1-p)},$$
where $a$, the persistent poverty rate between pairs of adjacent waves, and $p$, the cross-sectional poverty rate, are averages over the waves involved. For $K = 2$ this reduces to $v_A / v = (1+b)/2$.

For application to pairs of waves in EU-SILC, it is necessary to allow for variations in cross-sectional sample sizes and for partial overlaps. The result is
$$V = \frac{V_1 + V_2}{4}\left(1 + b\,\frac{n}{n_H}\right),$$
where $V_1$ and $V_2$ are the sampling variances, $b$ the correlation coefficient between the two cross-sections, $n$ the overlap between the cross-sectional samples, and $n_H$ the harmonic mean of their sample sizes $n_1$ and $n_2$.

The methodology described above was applied to the 2005-2006 cross-sectional and longitudinal EU-SILC samples for Poland. Table 1 shows some results at the national level. Averaging the poverty rate (head count ratio, HCR) over two waves gives an averaged estimator whose variance is 30% less than the variance of the HCR estimated from a single wave.

Table 1: Gain from cumulation over two waves: cross-sectional and persistent poverty rates. Poland, EU-SILC 2005-2006 (CS: cross-sectional sample; LG: longitudinal sample)

Sample base     CS-2006    CS-2005    LG 05-06   LG 05-06   LG 05-06
Poverty rate    HCR 2006   HCR 2005   HCR 2006   HCR 2005   Persistent '05-06
Est.            19.1       20.6       18.5       20.2       12.5
n persons       45,122     49,044     32,820     32,820     32,820
%se* actual     0.51       0.45

                              (1)    (2)    (3)    (4)    (5)
Mean income                   0.42   1.31   0.55   0.60   14%
HCR, national poverty line    0.34   1.18   0.40   0.48   30%
HCR, regional poverty line    0.40   1.18   0.47   0.56   30%

In terms of the quantities defined above, columns (1)-(5) of Table 1 are as follows.
(1) Standard error of the average HCR over two years, assuming independent samples: $(1) = \frac{1}{2}\left(V_1 + V_2\right)^{1/2}$
(2) Factor by which the standard error is increased due to positive correlation between waves: $(2) = \left(1 + b\,\frac{n}{n_H}\right)^{1/2}$
(3) Standard error of the average HCR over two years, given correlated samples: $(3) = (1)\cdot(2) = V^{1/2}$
(4) Average standard error over a single year: $(4) = \frac{1}{2}\left(V_1^{1/2} + V_2^{1/2}\right)$
(5) Average gain in precision (variance reduction, or increase in effective sample size, over a single-year sample): $(5) = 1 - \left[\frac{(3)}{(4)}\right]^2$

Reduction from averaging over rounds in a rotational design
Consider a rotational sample in which each unit stays in the sample for $n$ consecutive periods, with the required estimate being the average over $Q$ consecutive periods, such as $Q = 4$ for a four-year average. The case $n = 1$ corresponds simply to independent samples in each period. In the general case, the total sample involved in the estimation consists of $(n + Q - 1)$ independent subsamples. With some simplifying but reasonable assumptions, it can be shown (Verma, Gagliardi and Ferretti, 2009) that the variance of the pooled estimate is approximately
$$V_a^2 = \frac{V^2}{Q}\,F(R),$$
$$F(R) = \frac{1}{n\,Q}\left[m_1\,(m_2 - m_1 + 1)\,f(m_1) + \sum_{m=1}^{m_1 - 1} 2m\,f(m)\right], \qquad m_1 = \min(n, Q), \quad m_2 = \max(n, Q),$$
$$f(m) = 1 + \frac{2}{m}\left[(m-1)R + (m-2)R^2 + \cdots + R^{m-1}\right],$$
with $V^2$ the variance of the estimate for a single period and $R$ the correlation between values for the same unit in consecutive periods (taken to decline as $R^k$ at lag $k$). The first factor, $V^2/Q$, is the variance that would be obtained in the absence of correlations between waves; $F(R)$ is the increase over that as a result of correlations.

4. Variance and design effects
The issues addressed in this paper concern the efficiency of cumulating information over consecutive waves of a survey such as EU-SILC, involving complex statistics based on complex sample designs. Estimates are required for the whole population and also for subpopulations of different types. Both cross-sectional and longitudinal statistics are involved.
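The efficiency gains at issue can be made concrete by evaluating the approximate variance formulas of Section 3. The sketch below codes the $K$-wave ratio $v_K/v$ under geometric decay of correlations, the two-wave formula with partial overlap, and the rotational factor $F(R)$ with $m_1 = \min(n,Q)$ and $m_2 = \max(n,Q)$. A brute-force count over the $n+Q-1$ rotation groups is included as a cross-check on the closed form; all function names and numerical inputs are illustrative assumptions.

```python
# Sketch of the approximate variance formulas of Section 3 (illustrative inputs).


def ratio_k_waves(b, K):
    """v_K / v for an average over K waves of one panel, lag-k correlation b**k."""
    s = sum((K - k) / K * b ** k for k in range(1, K))
    return (1.0 + 2.0 * s) / K


def var_two_waves(V1, V2, b, n, n1, n2):
    """V = (V1 + V2)/4 * (1 + b*n/n_H), with n_H the harmonic mean of n1 and n2."""
    n_H = 2.0 / (1.0 / n1 + 1.0 / n2)
    return (V1 + V2) / 4.0 * (1.0 + b * n / n_H)


def f(m, R):
    """f(m) = 1 + (2/m) * [(m-1)R + (m-2)R^2 + ... + R^(m-1)]."""
    return 1.0 + 2.0 / m * sum((m - k) * R ** k for k in range(1, m))


def F_closed(n, Q, R):
    """Variance inflation F(R) for a Q-period average; units stay n periods."""
    m1, m2 = min(n, Q), max(n, Q)
    total = m1 * (m2 - m1 + 1) * f(m1, R) + sum(2 * m * f(m, R) for m in range(1, m1))
    return total / (n * Q)


def F_bruteforce(n, Q, R):
    """Same quantity, enumerating the n+Q-1 rotation groups touching the window."""
    total = 0.0
    for start in range(-(n - 1), Q):  # group is in sample for periods start .. start+n-1
        m = min(start + n, Q) - max(start, 0)  # periods it spends inside 0 .. Q-1
        total += m * f(m, R)
    return total / (n * Q)


if __name__ == "__main__":
    print(ratio_k_waves(0.5, 2))  # (1 + b)/2 for two fully overlapping waves
    for n, Q in [(1, 4), (4, 4), (4, 2), (2, 6)]:
        print(n, Q, F_closed(n, Q, 0.5), F_bruteforce(n, Q, 0.5))
    # Four-year average under a four-year rotation, assuming R = 0.5:
    print(F_closed(4, 4, 0.5) / 4)  # variance relative to a single-period estimate
```

Setting $n = 1$ gives $F(R) = 1$ for any $R$, which is the independent-samples case noted in the text.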
Comparisons and cumulation over correlated cross-sections, with which this paper is concerned, add another layer of complexity.

Jackknife Repeated Replication (JRR) provides a versatile and straightforward technique for variance estimation in these situations. It belongs to the class of variance estimation methods based on comparisons among replications generated through repeated resampling of the same parent sample. Once the set of replications has been appropriately defined for any complex design, the same variance estimation algorithm can be applied to a statistic of any complexity. We have extended and applied this method to estimate variances for subpopulations (including regions and other geographical domains), for longitudinal measures such as persistent poverty rates, and for measures of net change and averages over cross-sections in the rotational panel design of EU-SILC (Verma and Betti, 2007).

Appropriate coding of the sample structure, in the survey micro-data and accompanying documentation, is an essential requirement for computing sampling errors that take the actual sample design into account. Lack of information on the sample structure in survey data files is a long-standing and persistent problem in survey work, and unfortunately affects EU-SILC as well. Indeed, the major problem in computing sampling errors for EU-SILC is the lack of sufficient information for this purpose in the micro-data available to researchers. We have developed approximate procedures to overcome these limitations at least partially, and used them to produce useful estimates of sampling errors (Verma et al., 2010). These results are used in this paper, but it is not possible to go into detail about them here.

A most useful concept for the computation, analysis and interpretation of sampling errors is the 'design effect' (Kish, 1995).
The design effect is the ratio of the variance ($v$) under the given sample design to the variance ($v_0$) under a simple random sample of the same size:
$$d^2 = \frac{v}{v_0}, \qquad d = \frac{se}{se_0}.$$
Proceeding from estimates of sampling error to estimates of design effects is essential for understanding the patterns of variation in, and the determinants of the magnitude of, the error; for smoothing and extrapolating the results of computations; and for evaluating the performance of the sampling design. Analysis of design effects into components is also needed in order to understand where inefficiencies in the sample arise, to identify patterns of variation, and through that to extend the results to other statistics, designs and situations. Most importantly, with JRR (and other replication methods) the total design effect can only be estimated by estimating (some of) its components separately (Verma and Betti, 2010).

In applications to EU-SILC there is, in addition, a special and important reason for decomposing the total design effect into its components. Because of the limited information on sample structure included in the micro-data available to researchers, direct and complete computation of variances is not possible in many cases. Decomposition of variances and design effects identifies more 'portable' components, which may be more easily imputed (carried over) from a situation where they can be computed with the given information to another situation where such direct computation is not possible. On this basis, valid estimates of variances can be produced for a wider range of statistics, at least partly overcoming the problem due to lack of information on sample structure.
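A minimal sketch of the JRR procedure for an averaged measure, as outlined in Section 3, is given below, together with the design factor $d = (v/v_0)^{1/2}$. It assumes a hypothetical data layout (one record per person, carrying stratum, PSU, weight and a per-wave 0/1 poverty indicator, with None where the person is absent from a wave) and uses the common delete-one-PSU variant centred on the full-sample estimate. Because each record holds all waves for one person, dropping a PSU removes its persons from every wave at once, which is how the between-wave correlation enters the variance. The SRS variance $v_0 \approx p(1-p)/n$ for a single-wave rate is a simple approximation; in a real application the statistic would also recompute each wave's poverty line within every replication.

```python
# Delete-one-PSU JRR for a measure averaged over waves (hypothetical data layout).
import math
from collections import defaultdict


def replicate_weight(r, h, dropped, a):
    """Weight of record r in the replication that drops PSU `dropped` of stratum h."""
    if r["stratum"] != h:
        return r["w"]
    if r["psu"] == dropped:
        return 0.0
    return r["w"] * a / (a - 1)  # re-weight the remaining PSUs of stratum h


def jrr_variance(records, statistic):
    """Return (full-sample estimate, JRR variance); statistic(records, weights) -> float."""
    full = statistic(records, [r["w"] for r in records])
    psus = defaultdict(set)
    for r in records:
        psus[r["stratum"]].add(r["psu"])
    v = 0.0
    for h, units in psus.items():
        a = len(units)
        if a < 2:
            continue  # a single-PSU stratum gives no replication in this sketch
        for dropped in units:
            w = [replicate_weight(r, h, dropped, a) for r in records]
            v += (a - 1) / a * (statistic(records, w) - full) ** 2
    return full, v


def wave_rate(records, weights, t):
    """Weighted proportion poor in wave t among persons present in that wave."""
    num = sum(w for r, w in zip(records, weights) if r["poor"][t] == 1)
    den = sum(w for r, w in zip(records, weights) if r["poor"][t] is not None)
    return num / den


def averaged_rate(records, weights):
    """Average of the wave-specific rates."""
    K = len(records[0]["poor"])
    return sum(wave_rate(records, weights, t) for t in range(K)) / K


def person(stratum, psu, poor, w=1.0):
    return {"stratum": stratum, "psu": psu, "w": w, "poor": poor}


sample = [
    person(1, 1, [1, 1]), person(1, 1, [0, 0]),
    person(1, 2, [1, 0]), person(1, 2, [0, None]),
    person(2, 3, [0, 0]), person(2, 3, [None, 1]),
    person(2, 4, [0, 0]), person(2, 4, [1, 1]),
]

if __name__ == "__main__":
    est, v = jrr_variance(sample, averaged_rate)
    p1, v1 = jrr_variance(sample, lambda recs, w: wave_rate(recs, w, 0))
    n1 = sum(1 for r in sample if r["poor"][0] is not None)
    v0 = p1 * (1 - p1) / n1  # approximate SRS variance of a proportion
    print(est, v, math.sqrt(v1 / v0))  # averaged rate, its JRR variance, design factor d
```

Passing the statistic as a function keeps the replication scheme separate from the measure, which is what allows the same algorithm to serve for statistics of any complexity.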
We may decompose the total variance $v$ (for the actual design) into components or factors as
$$v = v_0\,d^2 = v_0\,d_W^2\,d_H^2\,d_D^2\,d_X^2,$$
where $d_W$ is the effect of sample weights, $d_H$ that of clustering of individual persons into households, $d_D$ that of clustering of households into dwellings, and $d_X$ that of other complexities of the design, mainly clustering and stratification. All factors other than $d_X$ do not involve clusters or strata, but depend only on individual elements (households, persons etc.) and the sample weight associated with each such element in the sample. Parameter $d_W$ depends on the variability of sample weights, and also on the correlation between the weights and the variable being estimated; $d_H$ is determined by the number of, and correlation among, relevant individuals in the household, and similarly $d_D$ by the number of households per dwelling in a sample of dwellings. By contrast, factor $d_X$ represents the effect on sampling error of various complexities of the design, such as multiple stages and stratification. Hence, unlike the other components, $d_X$ requires information on the sample structure linking elementary units to higher-stage units and strata.

This effect can be estimated as follows using the JRR procedure. We compute the variance under two assumptions about the structure of the design: variance $v$ under the actual design, and $v_R$ computed by assuming the design to be (weighted) simple random sampling of the ultimate units (addresses, households or persons, as the case may be). The latter can be estimated from a 'randomised sample' created from the actual sample by completely disregarding its structure other than the weights attached to individual elements. This gives
$$d_X^2 = \frac{v}{v_R}, \qquad \text{with} \qquad v_R = v_0\,d_W^2\,d_H^2\,d_D^2.$$

Table 2 gives the standard error, design effect and components of the design effect for the cross-sectional 2006 EU-SILC sample for Poland. The sample was a two-stage stratified sample of dwellings containing 45,122 individual persons.
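Because the factors multiply, $d \approx d_X\,d_W\,d_H$ (with $d_D$ absorbed) and $\%se_{actual} \approx \%se_{SRS}\cdot d$. The short check below applies these identities to the national figures reported in Table 2 for Poland; the small discrepancies, of the order of 0.01 to 0.02, are consistent with rounding of the published two-decimal values. The weighting factor at the end, $d_W^2 \approx 1 + cv^2(w)$, is Kish's well-known approximation; it ignores the correlation between weights and the variable, so it is only a rough guide to $d_W$ as defined here. Function names are illustrative.

```python
# Consistency check of the design-effect decomposition on Table 2 (Poland, 2006).
import math

# (label, %se actual, dX, dW, dH, d, %se SRS) as published in Table 2
TABLE2 = [
    ("mean equivalised income", 0.57, 0.94, 1.22, 1.74, 1.99, 0.29),
    ("HCR, national line", 0.51, 1.02, 1.09, 1.74, 1.94, 0.26),
    ("HCR, regional lines", 0.61, 1.05, 1.09, 1.74, 1.99, 0.30),
]


def product_of_components(dX, dW, dH):
    """d = dX * dW * dH (dD is not separable here and is absorbed in the others)."""
    return dX * dW * dH


def implied_dX(se_actual, se_srs, dW, dH):
    """d_X = sqrt(v / v_R) with v_R = v0 * dW^2 * dH^2, i.e. (se / se0) / (dW * dH)."""
    return (se_actual / se_srs) / (dW * dH)


def kish_weighting_factor(weights):
    """Approximate d_W = sqrt(1 + cv^2 of weights) = sqrt(n * sum(w^2) / sum(w)^2)."""
    n = len(weights)
    return math.sqrt(n * sum(w * w for w in weights) / sum(weights) ** 2)


if __name__ == "__main__":
    for label, se, dX, dW, dH, d, se0 in TABLE2:
        print(f"{label}: d={d}, dX*dW*dH={product_of_components(dX, dW, dH):.3f}, "
              f"se0*d={se0 * d:.3f} vs {se}, implied dX={implied_dX(se, se0, dW, dH):.3f} vs {dX}")
    print(kish_weighting_factor([1, 1, 2, 2]))  # about 1.054
```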
In Table 2, '%se' (third and last columns) means: for means (e.g. equivalised disposable income), the standard error expressed as a percentage of the mean value; for proportions and rates (e.g. poverty rates), the standard error in absolute percentage points. The terms '%se actual' and '%se SRS' relate, respectively, to the variances $v$ and $v_0$ in the text. Parameter $d_D$ cannot be estimated separately because of lack of information, but its effect is small and is, in any case, already incorporated into the overall design effect $d$.

Table 2: Estimation of variance and design effects at the national level. Cross-sectional sample. Poland, EU-SILC 2006

                                                       Design effect
                                  Est.    %se actual   dX     dW     dH     d      %se SRS
(1) Mean equivalised disposable
    income                        3,704   0.57         0.94   1.22   1.74   1.99   0.29
(2) HCR ('head count' or poverty
    rate), national poverty line  19.1    0.51         1.02   1.09   1.74   1.94   0.26
(3) HCR, regional (NUTS1)
    poverty line                  19.0    0.61         1.05   1.09   1.74   1.99   0.30

Table 2 gives poverty rates defined with respect to two different 'levels' of poverty line: country level and NUTS1 level. By this we mean the population level at which the income distribution is pooled for the purpose of defining the poverty line. Conventionally, poverty rates are defined in terms of the country poverty line (60% of the national median income): the income distribution is considered at the country level, a poverty line is defined in relation to it, and the number (and proportion) of poor is computed. It is also useful to consider poverty lines at other levels. Especially useful for constructing regional indicators are regional poverty lines, i.e. a poverty line defined for each region based only on the income distribution within that region. The numbers of poor persons identified with these lines can then be used to estimate regional poverty rates.
They can also be aggregated upwards to give an alternative national poverty rate, which nevertheless remains based on the regional poverty lines. So defined, the poverty measures are not affected by disparities in mean income levels among the regions; they are therefore more purely relative.

5. Illustrative applications of cumulation at the regional level
Table 3 shows results for the estimation of variance and design effects for the 2006 and 2005 cross-sectional datasets for Poland. The results at national level for the three measures considered have already been presented in previous sections; here we present the results at the NUTS1 regional level. All the values except '%se* SRS' and $d_X$ are computed at the regional level in the same manner as at the national level. All factors other than $d_X$ do not involve clusters or strata, but essentially depend only on individual elements and the associated sample weights; hence they are normally well estimated, even for quite small regions.

Factor $d_X(G)$ for a region $G$ may be estimated in relation to $d_X(C)$ estimated at the country ($C$) level along the following lines. For large regions, each with a large enough number of PSUs (say over 25 or 30), we may estimate the variance, and hence $d_X(G)$, directly at the regional level. Sometimes a region involves an SRS of elements, even if the national sample is multi-stage in other parts; here, obviously, $d_X(G) = 1$. If the sample design in the region is the same as, or very similar to, that for the country as a whole, which is quite often the case, we can take $d_X(G) = d_X(C)$. Commonly, the main difference between the regional and the total samples is the average cluster size ($b$). In this case we use
$$d_X^2(G) - 1 = \left[d_X^2(C) - 1\right]\cdot\frac{b_G}{b_C}.$$
The last-mentioned model concerns the effect of clustering and hence is meaningful only if $d_X(C) > 1$, which is often but not always the case in actual computations.
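The choice among these rules can be written as a small decision function. In this sketch the PSU threshold of 25 (from the 'say over 25 or 30' guideline), the argument names and the order in which the rules are tried are our assumptions; it also applies the fallback $d_X(G) = d_X(C)$ when $d_X(C) \le 1$, as the text recommends, since the cluster-size model only describes clustering.

```python
import math


def regional_dX(dX_C, *, srs_in_region=False, n_psu_G=0, dX_G_direct=None,
                same_design=False, b_C=None, b_G=None, min_psu=25):
    """Hypothetical helper choosing d_X(G) for a region G from the country value d_X(C)."""
    if srs_in_region:
        return 1.0  # simple random sample of elements within the region
    if dX_G_direct is not None and n_psu_G >= min_psu:
        return dX_G_direct  # enough PSUs to estimate d_X(G) directly
    if same_design or b_C is None or b_G is None or dX_C <= 1.0:
        return dX_C  # same design, no cluster sizes, or clustering model not meaningful
    # design differs mainly in mean cluster size b: d_X^2(G) - 1 = (d_X^2(C) - 1) * b_G / b_C
    return math.sqrt(1.0 + (dX_C ** 2 - 1.0) * b_G / b_C)


if __name__ == "__main__":
    print(regional_dX(1.20, b_C=4.0, b_G=2.0))  # smaller clusters: d_X(G) closer to 1
    print(regional_dX(0.94, b_C=4.0, b_G=2.0))  # d_X(C) < 1: carried over unchanged
```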
Values smaller than 1.0 may arise when the effect of stratification is stronger than that of clustering, when units within clusters are negatively correlated (which is rare, but not impossible), or simply as a result of random variability in the empirical results. In any case, if $d_X(C) < 1$, the above equation should be replaced by $d_X(G) = d_X(C)$.

The quantity '%se* SRS' can be computed directly at the regional level, as was done for the national level in Table 2. However, a very good approximation can usually be obtained very simply, without JRR computations of variance. The following model has been used in Table 3. For means (such as equivalised income) over very similar populations, the assumption of a constant coefficient of variation is reasonable. The region-to-country ratio of relative standard errors (expressed as a percentage of the mean value, as in Table 3) under simple random sampling is then inversely proportional to the square root of their respective sample sizes:
$$\left[\%se^*_{SRS}(G)\right]^2 = \left[\%se^*_{SRS}(C)\right]^2 \cdot \frac{n_C}{n_G}.$$
For proportions ($p$, with $q = 100 - p$), with the standard error expressed in absolute percentage points as in Table 3, we can take
$$\left[\%se^*_{SRS}(G)\right]^2 = \left[\%se^*_{SRS}(C)\right]^2 \cdot \frac{n_C}{n_G} \cdot \frac{p_G\,q_G}{p_C\,q_C}.$$
A poverty rate may be treated as a proportion for the purpose of applying the above. We see from Table 3 that '%se* actual' at the regional level is generally, for all three measures, 2 to 3 times larger than at the national level.

Table 3: Estimation of variance and design effects at the regional (NUTS1) level. Full cross-sectional datasets, Poland

         2006                                                      2005
         Est.    n persons   %se* SRS   dX     d      %se* actual   Est.    n persons   %se* actual
Mean equivalised disposable income
Poland   3,704   45,122      0.29       0.94   1.99   0.57          3,040   49,044      0.62
PL1      4,236    8,728      0.65       0.94   2.06   1.34          3,455    9,871      1.32
PL2      3,889    9,273      0.63       0.94   1.78   1.13          3,143   10,181      1.22
PL3      3,162    9,079      0.64       0.94   2.00   1.28          2,618    9,674      1.32
PL4      3,530    6,912      0.73       0.94   1.90   1.39          2,977    7,195      1.84
PL5      3,906    4,538      0.90       0.94   1.96   1.77          3,164    5,066      1.85
PL6      3,419    6,592      0.75       0.94   1.90   1.43          2,816    7,057      1.58
At-risk-of-poverty rate, national poverty line
Poland   19.1    45,122      0.26       1.02   1.94   0.51          20.6    49,044      0.45
PL1      17.1     8,728      0.57       1.02   1.85   1.06          19.1     9,871      0.92
PL2      14.7     9,273      0.52       1.02   1.86   0.97          16.4    10,181      0.87
PL3      25.2     9,079      0.64       1.02   2.09   1.34          25.2     9,674      1.13
PL4      18.7     6,912      0.66       1.02   1.98   1.32          20.2     7,195      1.19
PL5      18.6     4,538      0.82       1.02   1.91   1.56          20.2     5,066      1.43
PL6      21.4     6,592      0.71       1.02   1.95   1.40          23.7     7,057      1.26
At-risk-of-poverty rate, regional poverty lines
Poland   19.0    45,122      0.30       1.05   1.99   0.61          20.5    49,044      0.51
PL1      19.8     8,728      0.70       1.04   1.90   1.34          20.9     9,871      1.07
PL2      18.5     9,273      0.67       1.04   1.91   1.27          19.0    10,181      1.05
PL3      18.6     9,079      0.68       1.06   2.14   1.45          20.8     9,674      1.21
PL4      17.5     6,912      0.76       1.05   2.04   1.54          20.1     7,195      1.35
PL5      20.9     4,538      1.00       1.04   1.97   1.96          22.2     5,066      1.68
PL6      19.1     6,592      0.80       1.05   2.00   1.60          21.3     7,057      1.37

Regional HCR estimates based on the national poverty line are quite different from those based on the regional lines. While the individual regional estimates of HCR using the regional poverty lines are quite close to the national estimate (from 17.5 to 20.9, against 19.0 nationally, for 2006), those using the national poverty line are more variable (from 14.7 to 25.2 for 2006). Table 4 below shows that, for the HCR measures with both country-level and NUTS1-level poverty lines, cumulating the estimates over two waves generally reduces the variance by about 30% compared with a single wave.
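Columns (1) to (5) of Table 4 can be reproduced, up to rounding, from the single-wave standard errors in Table 3 and the correlation factor in column (2), using the definitions given with Table 1. The sketch below does this for region PL3. The function names are ours, and the factor (2) is taken from Table 4 rather than recomputed from $b$, $n$ and $n_H$, which are not reported by region.

```python
import math


def correlation_factor(b, n, n1, n2):
    """Column (2): (1 + b * n / n_H) ** 0.5, with n_H the harmonic mean of n1 and n2."""
    n_H = 2.0 / (1.0 / n1 + 1.0 / n2)
    return math.sqrt(1.0 + b * n / n_H)


def cumulation_gain(se1, se2, factor):
    """Columns (1), (3), (4), (5) from single-wave standard errors and column (2)."""
    col1 = 0.5 * math.sqrt(se1 ** 2 + se2 ** 2)  # independent samples
    col3 = col1 * factor                          # allowing for correlation
    col4 = (se1 + se2) / 2.0                      # average single-wave s.e.
    col5 = 1.0 - (col3 / col4) ** 2               # proportional variance reduction
    return col1, col3, col4, col5


# PL3: %se* actual for 2006 and 2005 (Table 3) and factor (2) (Table 4)
PL3 = {
    "mean equivalised income": (1.28, 1.32, 1.31),
    "HCR, national line": (1.34, 1.13, 1.18),
    "HCR, regional lines": (1.45, 1.21, 1.17),
}

if __name__ == "__main__":
    for label, (se06, se05, factor) in PL3.items():
        c1, c3, c4, c5 = cumulation_gain(se06, se05, factor)
        print(f"{label}: {c1:.2f} {factor:.2f} {c3:.2f} {c4:.2f} {c5:.0%}")
```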
This reduction in variance is smaller for mean equivalised income because of the higher correlation between incomes in the two years: the correlation coefficient of equivalised income between waves generally exceeds 0.70.

Table 4: Gain in precision from averaging over correlated samples. Poland, NUTS1 regions

Mean equivalised income
           (1)    (2)    (3)    (4)    (5)
Country    0.42   1.31   0.55   0.60   14%
PL1        0.94   1.33   1.26   1.33   11%
PL2        0.83   1.30   1.08   1.17   15%
PL3        0.92   1.31   1.20   1.30   14%
PL4        1.15   1.27   1.47   1.62   18%
PL5        1.28   1.32   1.70   1.81   12%
PL6        1.07   1.32   1.41   1.51   12%

HCR, national poverty line
           (1)    (2)    (3)    (4)    (5)
Country    0.34   1.18   0.40   0.48   30%
PL1        0.70   1.18   0.83   0.99   29%
PL2        0.65   1.17   0.76   0.92   31%
PL3        0.88   1.18   1.03   1.24   30%
PL4        0.89   1.18   1.05   1.26   30%
PL5        1.06   1.17   1.23   1.50   32%
PL6        0.94   1.19   1.12   1.33   29%

HCR, regional poverty line
           (1)    (2)    (3)    (4)    (5)
Country    0.40   1.18   0.47   0.56   30%
PL1        0.86   1.18   1.02   1.21   29%
PL2        0.83   1.18   0.98   1.16   29%
PL3        0.94   1.17   1.10   1.33   31%
PL4        1.03   1.18   1.21   1.45   30%
PL5        1.29   1.17   1.51   1.82   31%
PL6        1.05   1.18   1.24   1.49   31%

Columns (1)-(5) are as defined for Table 1.

References
Betti, G., Gagliardi, F., Nandi, T.: Jackknife variance estimation of differences and averages of poverty measures. Working Paper no. 68/2007, DMQ, Università di Siena (2007).
Kish, L.: Methods for design effects. Journal of Official Statistics, 11, 55-77 (1995).
Lohr, S. L., Rao, J. N. K.: Inference from dual frame surveys. Journal of the American Statistical Association, 95, 271-280 (2000).
O'Muircheartaigh, C., Pedlow, S.: Combining samples vs. cumulating cases: a comparison of two weighting strategies in NLS97. American Statistical Association Proceedings of the Joint Statistical Meetings, pp. 2557-2562 (2002).
Verma, V., Betti, G.: Cross-sectional and longitudinal measures of poverty and inequality: variance estimation using Jackknife Repeated Replication. Conference 2007 'Statistics under one Umbrella', Bielefeld University (2007).
Verma, V., Betti, G.: Taylor linearization sampling errors and design effects for poverty measures and other complex statistics. Journal of Applied Statistics (2010), online first.
Verma, V., Betti, G., Gagliardi, F.: An assessment of survey errors in EU-SILC. Eurostat Methodologies and Working Papers, Eurostat, Luxembourg (2010).
Verma, V., Betti, G., Natilli, M., Lemmi, A.: Indicators of social exclusion and poverty in Europe's regions. Working Paper no. 59/2006, DMQ, Università di Siena (2006).
Verma, V., Gagliardi, F., Ferretti, C.: On pooling of data and measures. Working Paper no. 84/2009, DMQ, Università di Siena (2009).
Wells, J. E.: Oversampling through households or other clusters: comparison of methods for weighting the oversample elements. Australian and New Zealand Journal of Statistics, 40, 269-277 (1998).