RJwrapper
RJwrapper
RJwrapper
Abstract Multiple imputation is a tool for parameter estimation and inference with partially observed
data, which is used increasingly widely in medical and social research. When the data to be imputed
are correlated or have a multilevel structure — repeated observations on patients, school children
nested in classes within schools within educational districts — the imputation model needs to include
this structure. Here we introduce our joint modelling package for multiple imputation of multilevel
data, jomo, which uses a multivariate normal model fitted by Markov Chain Monte Carlo (MCMC).
Compared to previous packages for multilevel imputation, e.g. pan, jomo adds the facility to (i) handle
and impute categorical variables using a latent normal structure, (ii) impute level-2 variables, and (iii)
allow for cluster-specific covariance matrices, including the option to give them an inverse-Wishart
distribution at level 2. The package uses C routines to speed up the computations and has been
extensively validated in simulation studies both by ourselves and others.
Introduction
Missing data are ubiquitous in clinical and social research. The most straightforward way to deal with
missing data is to exclude all observations with any item missing from the analysis, i.e. a complete
records analysis. However this strategy is at best inefficient; further unless — given covariates — the
probability of a complete record does not depend on the outcome (dependent) variable, it will lead to
biased results.
Rubin (1976) described different mechanisms causing missing data: Missing Completely At
Random (probability of missingness unrelated to observed and unobserved values, MCAR), Missing
At Random (given observed data, occurrence of missing values is independent of the actual values,
MAR) and Missing Not At Random (given observed data, occurrence of missing values still depends
on the actual values, MNAR).
Multiple Imputation (MI) is a very flexible, practical, tool to deal with missing data. It consists
of imputing missing data several times, creating multiple imputed data sets. Then, the substantive
model is directly fitted to each of the imputed data sets; the results are then combined for inference
using Rubin’s rules (Rubin, 1987). An appropriately specified multiple imputation model gives valid,
efficient inference if data are MAR (Carpenter and Kenward, 2013, Ch. 2; Little and Rubin, 2002). Key
attractions of MI are that it separates the imputation of missing data from the analysis, thus allowing
(a) use of the substantive analysis model that we intended to use with fully observed data and (b)
inclusion of auxiliary variables in the imputation model — which provide information about the
missing values — without having to include them in the substantive analysis model.
There are several models and associated algorithms which can be used to impute missing data;
Schafer (1997) proposed a joint multivariate normal model, fitted by MCMC. The main assumption of
this method is that the partially observed data follow a joint multivariate normal distribution; given
this, a Gibbs sampler uses the proper conditional distributions to update the parameters of the model
and impute the missing data.
One of the advantages of the joint modelling approach is that it extends naturally to multi-
level/hierarchical data structures. Such structures arise commonly, for example, when we have
repeated observations (level 1) on individuals (level 2), or students (level 1) nested in schools (level 2).
A number of joint modelling multiple imputation packages have been written: norm (Novo and
Schafer, 2013; Schafer and Olsen, 2000) assumes a multivariate normal model for imputation of single-
level normal data, cat (Harding et al., 2012) a log-linear model to impute categorical data, and mix
(Schafer., 2010) uses the general location model (Olkin and Tate, 1961) to impute a mix of continuous
and categorical data (Schafer, 1997, Ch. 9). pan (Zhao and Schafer, 2013; Schafer and Yucel, 2002) uses
a multilevel multivariate normal model for imputation of multilevel normal data.
As far as we are aware, jomo is the first R package to extend this to allow for a mix of multilevel
(clustered) continuous and categorical data. It is derived from, but also extends the functionality of
REALCOM (Carpenter et al., 2011), a standalone program written in Matlab. In REALCOM, binary and
categorical variables are handled through a latent normal variables approach presented in Carpenter
and Kenward (2013, Ch. 5). The aim of the jomo package is to provide an efficient implementation of
the REALCOM imputation model in R, while both (i) speeding up the processes through the use of C
sub-routines and (ii) adding the flexibility to specify level-2 specific covariance matrices and level-2
random covariance matrices, the latter following the proposal of Yucel (2011).
This paper is organized as follows: after briefly introducing joint modelling multiple imputation
and the principal functions within jomo, we present a series of short tutorials, in which we explain
how to impute (i) single-level continuous and categorical data; (ii) clustered homoscedastic data, (iii)
clustered heteroscedastic data and (iv) level-2 variables. We then present functions to help check
the convergence of the underlying MCMC algorithms and outline the suggested workflow for using
the package. The next section introduces another R package, mitml, which provides both a different
interface to jomo and a number of useful tools to manage and investigate the properties of imputed
data sets. The penultimate section provides an overview of both the simulation studies and the
applications where the package was used. We conclude with a short discussion.
Type of variables
Model type Missing data Continuous Binary/categorical Mixed
Single-level Level 1 jomo1con jomo1cat jomo1mix
Two-level Level 1 jomo1rancon jomo1rancat jomo1ranmix
Both levels jomo2com jomo2com jomo2com
Two-level, heteroscedastic Level 1 jomo1ranconhr jomo1rancathr jomo1ranmixhr
Both levels jomo2hr jomo2hr jomo2hr
Table 1: Summary of subfunctions used by the main "umbrella function" jomo, given the type of
imputation model, the level at which missing data occur, and the type of the variables with missing
data.
More general random effects structures can be modelled as well, as we will show in a later section.
It is similarly possible to consider a joint imputation model for variables at level 2, e.g. patient-
level variables in longitudinal studies (Carpenter and Kenward, 2013, p. 212), or allowing for level-1
heteroscedasticity, which can be particularly appealing in individual patient data (IPD) meta-analyses
among other applications.
Package structure
The jomo package can be used to impute missing data in continuous and categorical variables
in single-level and multilevel data. The main interface to the jomo package is the jomo function,
which automatically selects the correct imputation method for the data, depending on (a) the model
specification (e.g., single-level vs. multilevel) and (b) the variables in the data set (e.g., continuous
variables of type numeric vs. categorical variables as factor vs. a mixture of them). In addition, the
sub-functions called by jomo can be called individually; a list of all sub-functions is given below and
in Table 1.
1. jomo1con, jomo1cat and jomo1mix: these impute single-level continuous, categorical and mixed
data sets. jomo1con is very similar to the imp.norm function of the norm package. However,
jomo1cat and jomo1mix used the latent normal variables approach described in Carpenter and
Kenward (2013, Ch. 5) to impute categorical variables and a mix of continuous and categorical
variables, respectively.
2. jomo1rancon, jomo1rancat and jomo1ranmix: these impute clustered continuous, categorical,
and mixed data, respectively. jomo1rancon is very similar to pan, whereas jomo1rancat and
jomo1ranmix use the latent normal model for the categorical variables. All these functions have
a fixed, common covariance matrix across all the clusters (level-2 units) in the imputation model;
3. jomo1ranconhr, jomo1rancathr and jomo1ranmixhr: these functions extend the above to allow
for either cluster (level-2 unit) specific covariance matrices, or random covariance matrices,
where the covariance matrices follow an inverse-Wishart distribution across level-2 units, as
described by Yucel (2011) and Quartagno and Carpenter (2016).
4. jomo2com and jomo2hr: these functions impute missing values in level-2 variables, and can be
used in the same manner as those in groups (2) and (3) above.
We next illustrate the use of jomo in each of the above situations. Throughout, we assume that the data
are MAR.
library(jomo)
> summary(JSPmiss)
school id sex fluent ravens english
48 : 76 280 : 1 Min. :0.0000 0 : 32 Min. : 4.00 Min. : 0.00
42 : 52 281 : 1 1st Qu.:0.0000 1 : 29 1st Qu.:22.00 1st Qu.:24.00
31 : 44 282 : 1 Median :1.0000 2 :823 Median :26.00 Median :40.00
8 : 43 283 : 1 Mean :0.5103 NA's:235 Mean :25.35 Mean :41.36
33 : 43 284 : 1 3rd Qu.:1.0000 3rd Qu.:30.00 3rd Qu.:56.00
5 : 39 285 : 1 Max. :1.0000 Max. :36.00 Max. :98.00
(Other):822 (Other):1113 NA's :246 NA's :236
behaviour cons
lowerquarter:248 Min. :1
upper :871 1st Qu.:1
Median :1
Mean :1
3rd Qu.:1
Max. :1
This shows the data set has (248 + 871) = 1119 observations, of which 236 have missing data on
the outcome english, 235 have missing data on fluent and so on and so forth. For the purpose of
this example, we ignore clustering by school and take as the substantive analysis model the following
linear regression:
To illustrate the use of jomo with continuous variables, let i = 1, . . . , 1119 index observations and use
# Set the seed (so we can replicate the results exactly if desired)
set.seed(1569)
# Run jomo and store the imputed data sets in a new data frame, imp
imp <- jomo(Y = Y, X = X, nburn = 1000, nbetween = 1000, nimp = 5)
Running this code, with the above seed, the following output is shown on screen:
The first sentence informs us that, as we did not pass any clustering indicator to the function, jomo is
using single-level imputation. The second sentence is telling us which of the sub-functions was used.
As we have two numeric dependent variables, jomo1con has been chosen.
Then, the software prints a ‘.’ for each 10 burn-in updates of the MCMC sampler, followed by a
notification that it has created the first imputed data set. It then prints a ‘.’ for each 10 further updates
of the MCMC sampler, before imputations 2, 3, 4, and 5. The default values for the burn-in and
between-imputation updates are both 1000, resulting in 100 dots printed in each case.
Finally, the software prints the estimated posterior mean of the regression coefficients for english
(β 0,e , β 1,e ,), ravens (β 0,r , β 1,r ), and the elements of the covariance matrix:
ee,i 458.40 64.93
∼ N 0, Ω = .
er,i 64.93 36.42
The same results can be obtained by running jomo1con in place of jomo with exactly the same argu-
ments. In general, to exactly replicate the results, you will have to (i) have the data sorted in the same
order and (ii) use the same seed. Sometimes we will wish to suppress the output generated by jomo;
the option output = 0 controls this.
Running jomo creates the object imp, which is a data frame:
class(imp)
#[1] "data.frame"
# and dimension:
dim(imp)
# [1] 6714 6
The names include the original variable names, inherited from JSPmiss, and a new variable Imputation,
which indexes the original data (0) and the imputed data sets. We have five imputed data sets. Let’s
look at the top of the original data, and the top of the fifth imputation:
# View last imputation (the left most column is the row number):
head(imp[imp$Imputation == 5,])
To impute missing data, jomo fits a Bayesian model using MCMC. Therefore, we need to provide
starting values and priors for each parameter in the model. The default starting values are a matrix
of zeros for the fixed effects parameter β, and the identity matrix for the covariance matrix. In the
majority of situations, changing the starting values will not change the results materially. Nevertheless,
good starting values may considerably reduce the number of iterations needed for the algorithm to
converge.
In order to represent the maximum uncertainty and to give the greatest weight to the data, jomo
assumes flat improper priors for all the parameters in the model, except the covariance matrix. For
this, an inverse-Wishart prior is used, with degrees of freedom set to the minimum possible, i.e.
the dimension of the covariance matrix; this represents the greatest uncertainty (least information).
Changing the scale matrix of the inverse-Wishart prior may have some impact when we have a very
small number of observations. In our example, with 1119 observations and only two outcomes, the
impact is immaterial.
We now illustrate how to set the starting values for all the parameters, the scale matrix of the
inverse-Wishart prior for the covariance matrix as well as the number of burn-in iterations for the
MCMC sampler, the number of iterations between imputations, and the number of imputations:
# Set scale matrix of the inverse-Wishart prior for the covariance matrix:
l1cov.prior <- diag(2, 2);
# Impute:
# [Note for new R users: the inputs set above have the same names as their
# corresponding options in the function. Hence, when we set <option> = <object
# name>, we have the same string on both sides of '=']
We see no material change from the previous results. In simple problems, the default burn-in of nburn
= 1000 is often enough for the sampler to converge. Further below, we show how to visually check
whether the stationary distribution has been reached.
Recall our substantive linear regression model above. Once we have created our imputed data sets,
we follow the usual multiple imputation rules and fit our substantive model to each imputed data
set, before summarising the results for final inference using Rubin’s rules, which are implemented in
mitools:
# Use the object imp which we created with the original run above. First
# convert the data frame of results imp to a list of imputations
imp.list <- imputationList(split(imp, imp$Imputation)[-1])
There are multiple alternative implementations of Rubin’s rules in R. These include pool from mice,
runMI from semTools, particularly appealing with Structural Equation Models, MI.inference from
BaBooN (Meinfelder, 2011), and testEstimates from mitml, which we present more in depth in the
penultimate section of this paper. Other packages with their own implementation of Rubin’s rules
include Amelia, mi, and lavaan.survey.
Categorical variables
Fully observed binary covariates can be included in the X matrix of the imputation model as type
numeric, exactly as with sex in this example. To include fully observed categorical covariates with
three or more categories, appropriate dummy variables have to be created. For this purpose, we might
use the R package dummies (Brown, 2012) or the function constrasts in base R.
jomo also readily imputes a mix of binary, categorical and continuous variables. This is done
using a latent normal model (see Goldstein et al., 2009; Carpenter and Kenward, 2013, Ch. 4). To
illustrate this, we continue to use the data set JSPmiss but now also impute the partially observed
fluency level (3 categories). The underlying joint imputation model remains a multivariate normal
model, but fluent is represented by two latent normal variables:
where:
Pr (Y f lu,i = 1) = Pr Y f∗lu,1,i = max Y f∗lu,j,i and Y f∗lu,1,i > 0
j=1,2
(6)
Pr (Y f lu,i = 2) = Pr Y f∗lu,2,i = max Y f∗lu,j,i and Y f∗lu,2,i > 0
j=1,2
Pr (Y f lu,i = 3) = Pr Y f∗lu,j,i < 0 for j = 1, 2 ,
ee,i
er,i
e f 1,i ∼ N (0, Ω) (7)
e f 2,i
As highlighted above, in order for this model to be estimable, we need to constrain the variance-
covariance matrix of (e f 1,i , e f 2,i ) T , (i.e. the bottom right 2 × 2 submatrix of Ω) to be
1 0.5
.
0.5 1
# Define the data frame with the dependent (outcome) variables for the imputation model:
Y <- JSPmiss[, c("english", "ravens", "fluent")]
# Define the data.frame with the (fully observed) covariates of the imputation model
X <- JSPmiss[, c("cons", "sex")]
This will take a little longer than the previous examples, and returns the following output:
The output illustrates that jomo recognized that fluent was a factor variable and therefore used the
function for the imputation of mixed data types.
The matrix posterior mean of the fixed effect estimates links directly to (5). Specifically, β̂ 0,e =
38.26, β̂ 1,e = 6.36, . . . , β̂ 0, f 2 = −1.84, β̂ 1, f 2 = 0.18. Likewise, the posterior means of the covariance
matrix terms are labelled, and correspond directly to (5). Specifically,
ee,i 458.16 64.44 −11.38 −8.89
d er,i = 64.44
36.74 −2.14 −1.50
Var e f 1,i −11.38 −2.14
.
1 0.5
e f 2,i −8.89 −1.50 0.5 1
Calling the relevant sub-function (jomo1mix) is possible again but a bit more complex, because contin-
uous and categorical outcomes must be passed as separate arguments.
We can specify starting values explicitly if we wish. To specify all the starting values, we need
to specify n − 1 starting β values for each n-category variable and a proper starting value for the
covariance matrix. For example, in the present case, fluent has 3 categories, which are modelled with
2 latent normal variables. As a result, β is a 2 × 4 matrix of regression coefficients, and the covariance
matrix is of size 4 × 4 (i.e., two predictors with two continuous and two latent dependent variables).
So, continuing with Y and X defined as above:
# Starting value for covariance matrix; the software disregards impossible values:
l1cov.start <- diag(2, 4)
set.seed(1569)
imp <- jomo(Y = Y, X = X, beta.start = beta.start, l1cov.start = l1cov.start)
As our starting values are different from the default, we get slightly different estimates of the posterior
means.
While the software is designed for unordered categorical data, it can be used for ordinal data
too. Our simulation results show that if variables are truly ordinal it gives good results with only a
marginal loss in efficiency (Quartagno and Carpenter, 2019).
Option (2) requires sufficient data within each level-2 unit to estimate the covariance matrix; option (3)
is a practical choice when we suspect there is heterogeneity across level-2 units, but there is insufficient
information within each level-2 unit for option (2).
We illustrate the software again with the JSPmiss data set, distributed with the package. Let j
index school and i students within schools. Our substantive model is:
u j ∼ N (0, σu2 )
(9)
ei,j ∼ N (0, σe2 ).
We now use multilevel imputation for the missing data. This is done by extending the joint
imputation model described above to the multilevel setting. As before, the variables english, ravens
and fluent are responses, and the fully observed variable sex a covariate. The imputation model has
random intercepts at level 2, and a common level-1 covariance matrix (approach 1 above).
where:
Pr (Y f lu,i,j = 1) = Pr Y f∗lu,1,i,j = max Y f∗lu,k,i,j and Y f∗lu,1,i,j > 0
k=1,2
(11)
∗ ∗ ∗
Pr (Y f lu,i,j = 2) = Pr Y f lu,2,i,j = max Y f lu,k,i,j and Y f lu,2,i,j > 0
k=1,2
∗
Pr (Y f lu,i,j = 3) = Pr Y f lu,k,i,j < 0 for k = 1, 2
ue,j
ee,i,j
er,i,j ur,j
ei,j e f 1,i,j ∼ N (0, Ωe )
= u f 1,j ∼ N (0, Ωu )
uj = (12)
e f 2,i,j u f 2,j
This is the same as model (5), apart from the addition of the covariance matrix Ωu for the vector of
random intercepts, u j .
Apart from specifying the level-two identifier, imputing the missing values proceeds the same as
before:
each of the j = 1, . . . , 50 schools. For example, for school 2, the posterior means are
ue,2 −0.67
ur,2 0.09
u f 1,2 = 0.16
u f 2,2 −0.14
2. we obtain the estimated level-2 variance covariance matrix of the random intercepts (common
across all ten cities):
ue,j
67.79 13.58 −4.14 −2.21
d ur,j = 13.58 4.17 −1.01 −0.57
Var u f 1,j −4.14 −1.01
.
0.61 0.34
u f 2,j −2.21 −0.57 0.34 0.46
To fit the substantive multilevel model to each imputed data set we proceed as before:
In order to get multiple imputation inference for the random coefficients, we recommend using the
mitml package, as described in the penultimate section below.
We now show how to specify additional random effects in the imputation model, apart from the
random intercept that is included by default. This is done by specifying the design matrix of the
random effects, Z. When this is not specified, it defaults to a random intercept. When it is specified by
the user, the random intercept has to be included (if desired).
Z <- JSPmiss[, c("cons", "sex")] # intercept and sex have random effects
The starting values of the sampling algorithm can again be overridden by the user, thus potentially
leading to better sampling behaviour and faster convergence. In comparison with the single-level
case, we now have two additional sets of parameters: (i) the matrix of random effects, whose rows
contain the random effects vectors for all level-2 units (u.start), and (ii) the level-2 covariance matrix
(l2cov.start). Note that with small numbers of level-2 units, the impact of the scale matrix of the
prior for the level-2 covariance matrix can be substantial. We proceed as follows:
set.seed(1569)
imp <- jomo(Y = Y, X = X, clus = clus, beta.start = beta.start, u.start = u.start,
l1cov.start = l1cov.start, l2cov.start = l2cov.start,
l2cov.prior = l2cov.prior, nburn = 2000, nbetween = 1000, nimp = 5)
# Output omitted; as these are not all the default values, the posterior means differ
# from the previous results by Monte Carlo error.
In some cases, it is implausible that all of the clusters share the same level-1 covariance matrix. For
example, when aggregating individual patient data from different studies to perform a meta-analysis,
it is often reasonable to assume that covariance matrices are different across studies.
Continuing to use (10) as an example, the only difference from before is that now the level-1
covariance matrix is not modelled as constant but as different across level-2 units. Thus instead of Ωe ,
we have Ωe,j , j = 1, . . . , 50. We fit this model by specifying the additional argument meth = "fixed":
Note that the output is now considerably longer, as the posterior mean for each of the level 1 covariance
matrices is reported. Compared to the previous models, this model is more complex and requires
estimation of a larger number of parameters.
There are several reasons why we may wish to go beyond the setting above and allow the covariance
matrices to be random across level-2 units. For example, we may have reason to believe that different
level-2 units have different covariance matrices but that the number of observations on some of these
level-2 units is insufficient to estimate level-2 specific covariance matrices reliably. In this case, sharing
information across level-2 units is desirable. Another situation is when some variable is fully missing
from some clusters, and therefore it is necessary to share information with clusters where it was
observed through the specification of a hierarchical distribution for the covariance matrices.
Continuing to use (10) as an example, the cluster-specific covariance matrices are now assumed to
follow a specific distribution. Thus instead of Ωe,j , we have Ωe,j ∼ IW ( a, S) j = 1, . . . , 10. Here, a and
S are the degrees of freedom and scale matrix of the inverse-Wishart distribution, respectively.
We can simply fit this model by specifying the option meth = "random":
imp3 <- jomo(Y = Y, X = X, clus = clus, meth = "random")
and then analyse the imputed data sets in the usual way. For full details on the algorithm used to
fit the random covariance matrices algorithm initially proposed by Yucel (2011), see the appendix of
Quartagno and Carpenter (2016).
The sub-function called by jomo for this type of data is jomo1ranmixhr, which can be called directly
but with a slightly more complex syntax.
With random covariance matrices, we have one further parameter, a, denoting the degrees of
freedom of the inverse-Wishart distribution for the cluster-specific covariance matrices. The default
starting value for this parameter, a.start, is the minimum possible, i.e., the dimension of the level-1
covariance matrix. This is also the default for the hyperparameter a.prior of the chi-square prior
distribution for a.
With random level-1 covariance matrices we can also specify starting values for the nclus covariance
matrices. Below, we show how to do this by (i) first creating the matrix for the first level 2 unit, a
4-by-4 diagonal matrix with all entries ‘2’, using diag(2,4) then (ii) stacking 50 copies of this:
# Starting values for the 5 by 5 level-1 covariance matrix for the first level-2 unit
l1cov.start.1 <- diag(2, 4)
# Stack 10 copies of this matrix (one for each of the level-2 units)
l1cov.start <- matrix(l1cov.start.1, nrow = 4 * nlevels(JSPmiss$school), ncol = 4,
byrow = TRUE)
# Run jomo
imp <- jomo(Y = Y, X = X, clus = clus, l1cov.start = l1cov.start, a = a.start,
meth = "random")
(1)
Ynormexam,i,j = β 0,n + un,j + en,i,j
(1)
Ystandlrt,i,j = β 0,s + us,j + es,i,j (13)
(2)
Yavslrt,j = β0, a + u a,j
(1)
u
n,j
en,i,j (1)
ei,j = ∼ N (0, Ωe ) us,j ∼ N (0, Ωu ),
uj = (14)
es,i,j
(2)
u a,j
where the superscripts (1) and (2) indicate the level-2 random effect for a level-1 and level-2
covariate, respectively (level-2 covariates have no level-1 residuals).
Fitting this model is very similar to fitting previous models; we simply have to define the level-2
variables.
# Run jomo
set.seed(1569)
imp <- jomo(Y = Y, Y2 = Y2, clus = clus)
As above, we can specify the starting values for all the parameters in the model, and in particular the
parameter of the level-2 variable β2 with the input l2.beta.start. As we noted above, with small
cluster sizes, the scale matrix for the prior of the level 2 covariance matrix, l2cov.prior, may have a
non-negligible impact on the results.
The sub-functions for imputation of level 2 variables are jomo2com and jomo2hr. Cluster-specific
covariance matrices can be specified as before by setting meth = "common" or meth = "random".
500
55
450
βnormexam,0
ω2n,1,1
400
50
350
45
300
0 1000 3000 5000 0 1000 3000 5000
2
Figure 1: MCMC chain for β e,0 (left panel) and ωe,1,1 (right panel).
# Run jomo.MCMCchain
imp <- jomo.MCMCchain(Y = Y, X = X, nburn = 5000)
This updates the sampler nburn times, but does not create any imputed data sets. Instead, the output
of this function is a list containing three elements:
• finimp: the final state of the data set, which would be the first imputation if we ran the jomo
function with nburn burn-in iterations;
• collectbeta: a three-dimensional array containing the fixed effect parameter draws at each of
the nburn iterations;
• collectomega: a three-dimensional array containing the level-1 covariance matrix draws at each
of the nburn iterations;
When running the corresponding .MCMCchain functions for multilevel imputation we will also have:
• collectu: a three-dimensional array containing the random effects draws at each of the nburn
iterations;
• collectcovu: a three-dimensional array containing the level-2 covariance matrix draws at each
of the nburn iterations;
We can then check the convergence of the sampler by looking at the trace plot for each parameter
value. For example, in Figure 1 (left panel), we can see the plot for β e,0 , which we obtain by running:
In this case, we can see that a burn in of 100–500 is reasonable; the sampler clearly converges very
quickly.
Plots for elements of the covariance matrix updated though Metropolis-Hastings steps may look
different, because these chains have higher auto-correlation (as they are not guaranteed to be updated
at each iteration). The right panel of Figure 1 gives an example; this was obtained by running the
following commands:
Note there is little point in plotting the constrained elements of the covariance matrix — these will
always give a straight line!
In practice, we typically need to use finimp.latnorm, together with the last value of the fixed parame-
ters and the covariance matrix at level 1 (and at level 2 if present). The following code illustrates the
approach:
# Define data frames for outcomes and covariates of imputation model and
# convert "fluent" to factor
JSPmiss <- within(JSPmiss, fluent <- factor(fluent))
Y <- JSPmiss[, c("english", "ravens", "fluent")]
X <- JSPmiss[, c("cons", "sex")]
# Capture the state of the sampler as starting values for the second set of iterations:
beta.start <- imp1$collectbeta[,,1000] # capture the fixed parameter values
l1cov.start <- imp1$collectomega[,,1000] # capture the level-1 covariance matrix values
start.imp <- imp1$finimp.latnorm # capture the final imputed data set (with
# latent normals for categorical variables)
In practice, it often works well to use this function to find plausible initial values for the scale matrices
of the level-1 and level-2 covariance matrix priors. This allows us to provide ‘weakly informative’
priors consistent with the data, and avoids imputations being unnecessarily variable.
To do this, we run jomo.MCMCchain first, using the default prior. We retain the last draw (or the
posterior mean of the latter part of the chain) as the covariance matrix prior. We use these to assign
values to l1cov.prior or l2cov.prior and then we apply jomo as usual. Specifically:
# Run jomo
imp <- jomo(Y = Y, X = X, l1cov.prior = l1cov.prior)
However, if a proper prior guess for the value of all the parameters was available, this would
be preferable, as it would avoid using the same data twice, to fit the model and to estimate the
hyperparameters of the priors.
When using jomo, we recommended the following workflow:
1. Before running the imputation model (which may take some time), perform a "dry run", to check
the software is fitting the model we intended. We can do this using the .MCMCchain function
with nburn = 2 and checking the output.
2. Re-run the same function for a larger number of iterations (e.g. 5000) and analyse the resulting
trace and autocorrelation plots to choose a sensible number of burn-in and between-imputation
iterations for the final imputation process.
3. Run the jomo function for the chosen number of iterations.
4. Fit the substantive model on the imputed data sets and apply Rubin’s rules.
The main interface to jomo is provided by the function jomoImpute, which offers two convenient
ways of specifying the imputation model. The first option uses a formula-based syntax similar to
packages for multilevel modelling such as lme4, nlme (Pinheiro et al., 2017), and others. The following
operators can be used to define such a formula:
~ : separates the dependent variables (left-hand side) and predictor variables (right-hand side) of
the imputation model;
+ : adds dependent and predictor variables to the model;
* : adds interactions of two or more predictors to the model;
| : specifies the cluster indicator and adds cluster-specific random effects to the model (e.g.,
1|school), and
I() : defines additional transformation of predictor variables to be included in the model.
For example, to fit the imputation model in (10) for the substantive model in (8) with JSPmiss, the
model formula can be specified as:
The imputation is then run with jomoImpute by specifying the incomplete data, the imputation
model, the number of imputations (m), and the number of iterations for burn-in (n.burn) and between
imputations (n.iter). Like jomo, jomoImpute requires that categorical variables are formatted as
factors.
# Run imputation
imp <- jomoImpute(data = JSPmiss, formula = fml, n.burn = 1000, n.iter = 1000, m = 5,
seed = 1569)
In addition, imputation models can be run independently in subsets of the data. For this purpose,
jomoImpute includes an optional group argument, denoting the name of a variable in the data set. If
specified, the imputation model is run separately for each level of group.
As an alternative to specifying a formula, the imputation model can be specified with the type
argument of jomoImpute. The type argument is an integer vector denoting the role of each variable in
the imputation model. The following values are supported:
In applications with missing data at both level 1 and 2, formula and type are specified as a list of two
formulas or type vectors, denoting the imputation model for variables at level 1 and 2, respectively.
The mitml package can also be used to manage, visualise, and analyse the imputed data. For example,
the summary and plot methods display information about the imputed data object and the convergence
of the MCMC algorithm.
# Call:
The function mitmlComplete can be used to extract a list of imputed data sets. Each data set can
be transformed and analysed with the functions with and within similar to base R. For example, the
following code extracts the imputed data and fits the model in (8) to each of the data sets:
Finally, mitml allows pooling the results obtained from the imputed data sets. For example,
testEstimates can be used to pool the estimates of individual parameters such as fixed effects and
variance components (Rubin’s rules).
# Call:
#
# testEstimates(model = fit.imp, var.comp = TRUE)
# Final parameter estimates and inferences obtained from 5 imputed data sets.
# Estimate Std.Error t.value df P(>|t|) RIV FMI
# (Intercept) -16.133 3.737 -4.317 139.189 0.000 0.204 0.181
# ravens 1.623 0.120 13.552 21.626 0.000 0.755 0.476
# sex 6.837 1.145 5.970 142.006 0.000 0.202 0.179
# fluent1 5.001 4.241 1.179 94.051 0.241 0.260 0.223
# fluent2 14.345 3.139 4.570 56.201 0.000 0.364 0.292
#
# Estimate
# Intercept~~Intercept|school 32.231
# Residual~~Residual 291.538
# ICC|school 0.099
# Unadjusted hypothesis test as appropriate in larger samples.
Many different pooling methods are supported by mitml, including Rubin’s rules with and without
correction for smaller samples (testEstimates), pooled Wald and likelihood-ratio tests (LRTs) for
multiple parameters and model comparisons (testModels, anova), and tests of constraints on the
model parameters via the “delta method” (testConstraints, see Casella and Berger, 2002).
Note that, in order to use mitml directly with jomo, the imputed data must be converted to the
mitml.list format. This conversion can be achieved with the function jomo2mitml.list.
• We should check that the stationary distribution has been reached, before acting on our results.
As described above, the package provides tools to facilitate this.
• With many (level-1) variables and relatively few observations, a careful choice of the prior for
the level-1 covariance matrix is important. We recommend a weakly informative prior and, in
particular, following the strategy described above, running the .MCMCchain functions to find a
sensible choice for the scale matrices for the inverse-Wishart priors.
• Like most imputation software, ours assumes that data are MAR. If data are MNAR, the results
of analyses under MAR may be biased.
In this paper we present functions for multilevel imputation, which make better and more efficient
use of all the available data compared to ad-hoc strategies, like imputing including cluster as a fixed
effect or imputing separately by cluster. The first approach has been discussed in various publications
(Lüdtke et al., 2017; Audigier et al., 2018; Drechsler, 2015) which broadly concluded that the approach
is unsatisfactory for small clusters and low intra-cluster correlation. Additionally, under this approach
dealing with random slopes is problematic and it is not possible to impute systematically missing
variables. Similar considerations are likely to hold for the second strategy consisting in imputing
separately by cluster, with the additional complication that level 2 variables cannot be imputed with
this method.
If we are imputing a variable which has a random slope in the substantive model (Grund et al.,
2016a), then (i) as usual, this variable will be a response in the imputation model and (ii) we should also
allow its association with the outcome to be cluster-specific in the imputation model by allowing the
level-1 covariance matrix to be random across level-2 units. However, although this approach performs
better than a simpler one using a common covariance matrix, it is not a perfectly compatible approach
(Quartagno and Carpenter, 2018; Enders et al., 2018), and functions for substantive model compatible
imputation (Goldstein et al., 2014) should be preferred to impute missing data in those settings, when
possible. These have been recently added to jomo and they will be presented in a second paper
soon. When interactions or non-linear terms are present in the model of interest, ignoring them in the
imputation model may lead to bias; instead, they should be included as covariates (Carpenter and
Kenward, 2013, p. 130). When these terms involve partially observed variables, the solution consists
again in using substantive model compatible functions.
When using functions for random cluster-specific covariance matrices, users should note that
this specifies an inverse-Wishart distribution matrix for the level-1 covariance matrices across the
level-2 units. Our simulations (Quartagno and Carpenter, 2016) suggest when this assumption is not
appropriate there will be some (usually immaterial) loss of efficiency. In principle, jomo could be
extended to incorporate other distributions.
All the illustrated functions make use of either Gibbs or Metropolis-Hastings sampling; however,
other sampling algorithms such as Hamiltonian Monte-Carlo may provide interesting alternatives in
the future.
Future updates and additions to the package will be advertised on www.missingdata.org.uk,
together with an up-to-date list of publications related to the package. We hope the package is useful
to readers and welcome their feedback.
Sources of funding
Matteo Quartagno was supported by funding from the European Community’s Seventh Framework
Programme FP7/2011: Marie Curie Initial Training Network MEDIASRES ("Novel Statistical Method-
ology for Diagnostic/Prognostic and Therapeutic Studies and Systematic Reviews"; www.mediasres-
itn.eu) with the Grant Agreement Number 290025.
James Carpenter is supported by the MRC grant MC_UU_12023/21
Aknowledgements
The authors of the package would like to thank Christopher Charlton and Professor Harvey Goldstein
from Bristol University for their help in creating the package. We would like to thank Alexander
Robitzsch, Vincent Audigier, Anower Hossain, Manuel Gomes, Nicole Erler and all the other people
that found bugs and imperfections in the code.
Bibliography
R. R. Andridge. Quantifying the impact of fixed effects modeling of clusters in multiple imputation
for cluster randomized trials. Biom J, 53(1):57–74, Feb 2011. [p10]
C. Brown. dummies: Create dummy/indicator variables flexibly and efficiently, 2012. URL http://CRAN.R-
project.org/package=dummies. R package version 1.5.6. [p8]
J. R. Carpenter and M. G. Kenward. Multiple Imputation and its Application. Wiley, 2013. ISBN:
978-0-470-74052-1. [p1, 2, 3, 4, 8, 14, 21]
G. Casella and R. L. Berger. Statistical inference. Duxbury Press, 2nd edition, 2002. [p20]
C. K. Enders, T. Hayes, and H. Du. A comparison of multilevel imputation schemes for random coeffi-
cient models: Fully conditional specification and joint model imputation with random covariance
matrices. Multivariate Behavioral Research, 53(5):695–713, 2018. doi: 10.1080/00273171.2018.1477040.
URL https://doi.org/10.1080/00273171.2018.1477040. PMID: 30693802. [p21]
H. Goldstein, J. R. Carpenter, and W. J. Browne. Fitting multilevel multivariate models with missing
data in responses and covariates that may include interactions and non-linear terms. Journal of the
Royal Statistical Society: Series A., 177(2):553–564, 2014. DOI: 10.1111/rssa.12022. [p21]
S. Grund, O. Lüdtke, and A. Robitzsch. Multiple imputation of missing covariate values in multilevel
models with random slopes: a cautionary note. Behav Res Methods, 48(2):640–649, Jun 2016a. [p21]
S. Grund, O. Lüdtke, and A. Robitzsch. Multiple imputation of multilevel missing data: An introduc-
tion to the r package pan. SAGE Open, 6(4):2158244016668220, 2016b. doi: 10.1177/2158244016668220.
URL https://doi.org/10.1177/2158244016668220. [p4]
S. Grund, A. Robitzsch, and O. Lüdtke. mitml: Tools for Multiple Imputation in Multilevel Modeling, 2016c.
URL https://CRAN.R-project.org/package=mitml. R package version 0.3-0. [p18]
S. Grund, O. Lüdtke, and A. Robitzsch. Multiple imputation of missing data at level 2: A comparison
of fully conditional and joint modeling in multilevel designs. Journal of Educational and Behavioral
Statistics, page 1076998617738087, 2018a. doi: 10.3102/1076998617738087. URL https://doi.org/
10.3102/1076998617738087. [p20]
S. Grund, O. Lüdtke, and A. Robitzsch. Multiple imputation of missing data for multilevel models:
Simulations and recommendations. Organizational Research Methods, 21(1):111–149, 2018b. doi:
10.1177/1094428117703686. URL https://doi.org/10.1177/1094428117703686. [p20]
T. Harding, F. Tusell, and J. L. Schafer. cat: Analysis of categorical-variable datasets with missing values,
2012. URL http://CRAN.R-project.org/package=cat. R package version 0.0-6.5. [p1]
A. Hossain, K. Diaz-Ordaz, and J. W. Bartlett. Missing continuous outcomes under covariate dependent
missingness in cluster randomised trials. Statistical Methods in Medical Research, 26(3):1543–1562,
2017a. doi: 10.1177/0962280216648357. URL https://doi.org/10.1177/0962280216648357. PMID:
27177885. [p20]
O. Lüdtke, A. Robitzsch, and S. Grund. Multiple imputation of missing data in multilevel designs: A
comparison of different strategies. Psychol Methods, 22(1):141–165, Mar 2017. [p10, 21]
R. J. A. Little and D. B. Rubin. Bayes and Multiple Imputation, pages 200–220. John Wiley & Sons, Inc.,
2002. ISBN 9781119013563. doi: 10.1002/9781119013563.ch10. URL http://dx.doi.org/10.1002/
9781119013563.ch10. [p1]
F. Meinfelder. BaBooN: Bayesian Bootstrap Predictive Mean Matching - Multiple and single imputation for
discrete data, 2011. URL http://CRAN.R-project.org/package=BaBooN. R package version 0.1-6.
[p8]
X.-L. Meng. Multiple-imputation inferences with uncongenial sources of input. Statistical Science, 9(4):
538–558, 11 1994. doi: 10.1214/ss/1177010269. URL http://dx.doi.org/10.1214/ss/1177010269.
[p2]
P. Mortimore, P. Sammons, L. Stoll, D. Lewis, and R. Ecob. School Matters. Wells: Open Books., 1988.
[p4]
A. A. Novo and J. L. Schafer. norm: Analysis of multivariate normal datasets with missing values, 2013.
URL http://CRAN.R-project.org/package=norm. R package version 1.0-9.5. [p1]
I. Olkin and R. F. Tate. Multivariate correlation models with mixed discrete and continuous variables.
The Annals of Mathematical Statistics, 32(2):448–465, 06 1961. doi: 10.1214/aoms/1177705052. URL
http://dx.doi.org/10.1214/aoms/1177705052. [p1]
J. Pinheiro, D. Bates, S. DebRoy, D. Sarkar, and R Core Team. nlme: Linear and Nonlinear Mixed Effects
Models, 2017. URL https://CRAN.R-project.org/package=nlme. R package version 3.1-131. [p18]
M. Quartagno and J. R. Carpenter. Multiple imputation for ipd meta-analysis: allowing for hetero-
geneity and studies with missing covariates. Statistics in Medicine, 35(17):2938–2954, 2016. ISSN
1097-0258. doi: 10.1002/sim.6837. URL http://dx.doi.org/10.1002/sim.6837. sim.6837. [p3, 4,
10, 13, 20, 21]
M. Quartagno and J. R. Carpenter. Multiple imputation for discrete data: an evaluation of the joint
latent normal model. Biometrical Journal, In press, 2019. [p9, 20]
J. Rasbash, F. Steele, W. Browne, and H. Goldstein. A user’s guide to mlwin, v3.00. Centre for Multilevel
Modelling, University of Bristol., 2017. [p4]
D. B. Rubin. Multiple Imputation for Nonresponse in Surveys. Wiley, New York, 1987. [p1]
J. L. Schafer. Analysis of incomplete multivariate data. Chapman & Hall., 1997. ISBN 0-412-04061-1. [p1]
J. L. Schafer. mix: Estimation/multiple Imputation for Mixed Categorical and Continuous Data, 2010. URL
http://CRAN.R-project.org/package=mix. R package version 1.0-8. [p1]
J. L. Schafer and M. K. Olsen. Multiple imputation for multivariate missing-data problems: A data
analyst’s perspective. Multivariate Behavioral Research, 33, 07 2000. [p1]
J. L. Schafer and R. M. Yucel. Computational strategies for multivariate linear mixed-effects models
with missing values. Journal of Computational and Graphical Statistics, 11(2):437–457, 2002. doi:
10.1198/106186002760180608. URL https://doi.org/10.1198/106186002760180608. [p1]
S. R. Seaman and R. H. Keogh. Handling missing data in matched case-control studies using multiple
imputation. Biometrics, 71(4):1150–1159, 2015. ISSN 1541-0420. doi: 10.1111/biom.12358. URL
http://dx.doi.org/10.1111/biom.12358. [p20]
Z. Zhang, R. Parker, C. Charlton, G. Leckie, and W. Browne. R2mlwin: A package to run mlwin
from within r. Journal of Statistical Software, Articles, 72(10):1–43, 2016. ISSN 1548-7660. doi:
10.18637/jss.v072.i10. URL https://www.jstatsoft.org/v072/i10. [p4]
J. H. Zhao and J. L. Schafer. pan: Multiple Imputation for multivariate panel or clustered data. R Foundation
for Statistical Computing, 2013. R Package, version 0.9. [p1]
Matteo Quartagno
MRC Clinical Trials Unit at UCL
90 High Holborn, London WC1V 6LC
United Kingdom
m.quartagno@ucl.ac.uk
Simon Grund
IPN - Leibniz Institute for Science and Mathematics Education at Kiel University
Olshausenstraße 62, D-24118 Kiel
Germany
grund@ipn.uni-kiel.de
James Carpenter
London School of Hygiene and Tropical Medicine
Keppel Street, Bloomsbury, London WC1E 7HT
United Kingdom
james.carpenter@lshtm.ac.uk