Stata Swiid
Frederick Solt
Associate Professor of Political Science
University of Iowa
frederick-solt@uiowa.edu
The Standardized World Income Inequality Database (SWIID) takes a Bayesian approach to
standardizing observations collected from the OECD Income Distribution Database, the Socio-
Economic Database for Latin America and the Caribbean generated by CEDLAS and the World
Bank, Eurostat, the World Bank’s PovcalNet, the UN Economic Commission for Latin America and
the Caribbean, national statistical offices around the world, and many other sources. Luxembourg
Income Study data serves as the standard.
As described in Solt (2020), the SWIID maximizes the comparability of available income
inequality data for the broadest possible sample of countries and years. But incomparability remains,
and it is sometimes substantial. This remaining incomparability is reflected in the standard errors
of the SWIID estimates, so it is often crucial to take this uncertainty into account when making
comparisons across countries or over time (Solt 2009, 238; Solt 2016, 14; Solt 2020). It was once
the case that incorporating the standard errors into an analysis required considerable effort. It is
now straightforward.
In version 9.6 of the SWIID, the inequality estimates and their associated uncertainty are
represented by 100 draws from the posterior distribution: for any given observation, the differences
across these imputations capture the uncertainty in the estimate. The swiid9_6.zip archive includes the
file swiid9_6.dta, which is pre-formatted to facilitate taking this uncertainty into account. The
following sections describe how to subset the data, merge in additional variables, and do analyses.
1 Getting Started
The swiid9_6.dta file is pre-formatted for use with Stata’s tools for analyzing multiply imputed
data. Estimates of each of four inequality measures and their associated uncertainty are represented
by a placeholder variable (which has the measure’s name but only missing data for all observations)
plus 100 separate variables (prefixed with _1_, _2_, etc.): for any given observation, the differences
across these 100 variables capture the uncertainty in the estimate.
The four measures are:
gini_disp: Estimate of Gini index of inequality in equivalized (square root scale) household
disposable (post-tax, post-transfer) income, using Luxembourg Income Study data as the
standard.
gini_mkt: Estimate of Gini index of inequality in equivalized (square root scale) household
market (pre-tax, pre-transfer) income, using Luxembourg Income Study data as the standard.
abs_red: Estimated absolute redistribution, the number of Gini-index points by which market-income
inequality is reduced due to taxes and transfers: the difference between gini_mkt and
gini_disp.
rel_red: Estimated relative redistribution, the percentage reduction in market-income
inequality due to taxes and transfers: the difference between gini_mkt and gini_disp,
divided by gini_mkt, multiplied by 100.
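The layout described above can be inspected directly once the file is loaded. The lines below are a minimal sketch, assuming swiid9_6.dta has been unzipped into Stata's current working directory; the variable names follow the _1_, _2_ prefix pattern described above.

```stata
// Load the SWIID (assumes swiid9_6.dta is in the working directory)
use swiid9_6.dta, clear

// Report how the data are mi set: style, number of imputations, registered variables
mi describe

// The placeholder variable should be entirely missing ...
count if !missing(gini_disp)

// ... while the prefixed variables hold the individual posterior draws
summarize _1_gini_disp _2_gini_disp _100_gini_disp
```

If the file is formatted as described, the count is zero and each of the summarized variables contains one draw per country-year.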
2 Adding Variables
Generating new variables from the SWIID estimates requires a bit of care. To preserve Stata’s
recognition of how the SWIID is formatted for analysis, the mi passive: prefix must be used. Suppose
we wanted to generate a variable for the log of gini_disp. For this new variable to take into account
the uncertainty in the SWIID estimates, instead of simply typing gen ln_gini_disp = ln(gini_disp),
we need to preface that command with the mi passive: prefix:
mi passive: gen ln_gini_disp = ln(gini_disp)
The result is a placeholder variable for the new measure ln(gini_disp), plus 100 separate
variables prefixed with _1_, _2_, etc. that together represent the uncertainty in our new measure.
Note that there is no need to use mi passive: to create variables in the dataset that are not based
on the SWIID estimates.
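The contrast between the two kinds of new variables can be sketched as follows. The names gini_ratio and decade are hypothetical and chosen only for illustration; the sketch assumes swiid9_6.dta is in the working directory.

```stata
use swiid9_6.dta, clear

// Built from SWIID estimates, so the mi passive: prefix is needed;
// Stata then computes the ratio separately within each of the 100 draws
mi passive: generate gini_ratio = gini_disp / gini_mkt

// Not built from SWIID estimates, so an ordinary generate is enough
generate decade = 10 * floor(year / 10)

// Check that the passive variable was created alongside its draws
describe gini_ratio _1_gini_ratio _2_gini_ratio
```

Computing the ratio within each draw, rather than from point estimates, is what lets the uncertainty in both gini_disp and gini_mkt carry through to the new variable.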
3 Merging
To merge the SWIID and additional data, simply merge the other dataset into the SWIID dataset.
Note that this means that the SWIID should be the ‘master’ file in the merge, the other data should
be the ‘using’ file.
Suppose we wanted to do a (simplified) replication of Solt, Habel, and Grant’s (2011) analysis
of World Values Survey data on religiosity. As our measure of religiosity, we will use the WVS
item on respondents’ self-report of the importance of God to their lives, which is measured on a
ten-point scale. Given secularization theory, we will need to control for GDP per capita, which
we will calculate from information from the Penn World Tables (Feenstra, Inklaar and Timmer
2015). Below we first load the PWT dataset and use it to generate a dataset of GDP per capita (in
thousands of dollars). Then we load the WVS data, generate our variables of interest, and merge
in our PWT data. Finally, we merge these data into the SWIID.
// Get GDP per capita data from the Penn World Tables, Version 9.1 (Feenstra et al. 2015)
// download from https://www.rug.nl/ggdc/docs/pwt91.dta
// create gdppc and save as .dta
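One way the GDP per capita step might look is sketched below. It assumes pwt91.dta has already been downloaded to the working directory and that it contains the PWT variables rgdpe (expenditure-side real GDP, in millions of dollars) and pop (population, in millions). The output file name pwt91_gdppc.dta is hypothetical, and country names in the PWT may need to be harmonized with those in the SWIID before merging.

```stata
// Assumes pwt91.dta was downloaded to the working directory
use pwt91.dta, clear

// rgdpe and pop are both in millions, so their ratio is dollars per person;
// dividing by 1000 gives GDP per capita in thousands of dollars
generate gdppc = rgdpe / pop / 1000

// Keep only the merge keys and the new variable
keep country year gdppc
save pwt91_gdppc.dta, replace   // hypothetical file name
```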
// generate variables of interest, merge in the PWT data, and save
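The final step, merging the prepared survey data into the SWIID, might be sketched as below. This is an illustration rather than the original code: wvs_pwt.dta is a hypothetical name for the saved WVS-plus-PWT file, and country and year are assumed to be the shared key variables. The sketch uses mi merge, Stata's version of merge for mi data, with the SWIID as the master file.

```stata
// The SWIID is the master file (one row per country-year)
use swiid9_6.dta, clear

// The survey file is the using file (many respondents per country-year);
// keep(match) retains only country-years present in both files
mi merge 1:m country year using wvs_pwt.dta, keep(match)

// Confirm the merged data are still recognized as mi data
mi describe
```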
4 Analyzing
Once any additional variables are created or merged in, we may proceed to analysis. Continuing
with our example, we estimate a three-level linear mixed-effects model of individual responses nested
in country-years nested in countries using mixed. To take the uncertainty in the SWIID estimates
into account, we construct our model command as usual, but precede it with the mi estimate: prefix
to perform it on each of the 100 variables that report the uncertainty in the SWIID estimates. Note
that performing an analysis 100 times can be time-consuming.
mi estimate: mixed religiosity gini_disp gdppc age educ male || country: || country_year:
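Because the full run can be slow, it may help to check first that the model specification works on a handful of draws. The sketch below assumes the merged data from the previous section are in memory; it uses the nimputations() and dots options of mi estimate, and the choice of 10 draws is arbitrary.

```stata
// Trial run on the first 10 draws only, printing a dot as each one finishes;
// rerun without nimputations() for the final results based on all 100 draws
mi estimate, nimputations(10) dots: mixed religiosity gini_disp gdppc age educ male || country: || country_year:
```

Results from the trial run understate how many draws inform the estimates, so they are best treated as a check on the specification rather than as findings.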
5 Mean-plus-standard-error Summary Format
The format described above facilitates taking the uncertainty in the SWIID estimates into account
when conducting analyses. It does not, however, lend itself easily to tasks such as plotting. The
mean-plus-standard-error summary format is much better suited to such purposes. The SWIID is
presented in this format in the swiid9_6_summary.csv file.
// A silly example
gen name_length = length(country)
gen first_letter = substr(country, 1, 1)
keep if year==2010 & first_letter=="S" /* 2010 for Senegal, Serbia, ... */
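A complete version of such a plot might be sketched as follows. It assumes the summary file sits in the working directory under the name swiid9_6_summary.csv and contains columns named country, year, gini_disp, and gini_disp_se; the column names and the new variable names gini_disp_lb and gini_disp_ub are assumptions made for illustration.

```stata
// Assumed file name and column names; adjust to match the downloaded file
import delimited using swiid9_6_summary.csv, clear

// Same silly example: country-name length for countries starting with "S" in 2010
generate name_length = length(country)
keep if year == 2010 & substr(country, 1, 1) == "S"

// Bounds of an approximate 95% interval from the mean and standard error
generate gini_disp_lb = gini_disp - 1.96 * gini_disp_se
generate gini_disp_ub = gini_disp + 1.96 * gini_disp_se

// Point estimates with capped spikes marking the intervals
twoway (rcap gini_disp_lb gini_disp_ub name_length) ///
       (scatter gini_disp name_length, mlabel(country)), ///
       legend(off) xtitle("Length of country name") ///
       ytitle("Gini index, disposable income")
```

The 1.96 multiplier relies on a normal approximation to the posterior, which is a simplification suitable for plotting but not a substitute for the multiply imputed format in analyses.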
Figure 1: A Scatterplot with Confidence Intervals
References
Feenstra, Robert C., Robert Inklaar and Marcel P. Timmer. 2015. “The Next Generation of the
Penn World Table.” American Economic Review 105(10):3150–3182.
Solt, Frederick. 2009. “Standardizing the World Income Inequality Database.” Social Science
Quarterly 90(2):231–242.
Solt, Frederick. 2016. “The Standardized World Income Inequality Database.” Social Science
Quarterly 97(5):1267–1281.
Solt, Frederick. 2020. “Measuring Income Inequality Across Countries and Over Time: The
Standardized World Income Inequality Database.” Social Science Quarterly 101(3):1183–1199. SWIID
Version 9.6, December 2023.
Solt, Frederick, Philip Habel and J. Tobin Grant. 2011. “Economic Inequality, Relative Power, and
Religiosity.” Social Science Quarterly 92(2):447–465.