Mixed-Effects Models in S and S-PLUS
Preface
Mixed-effects models provide a flexible and powerful tool for the analysis of
grouped data, which arise in many areas as diverse as agriculture, biology,
economics, manufacturing, and geophysics. Examples of grouped data include longitudinal data, repeated measures, blocked designs, and multilevel
data. The increasing popularity of mixed-effects models is explained by the
flexibility they offer in modeling the within-group correlation often present
in grouped data, by the handling of balanced and unbalanced data in a
unified framework, and by the availability of reliable and efficient software
for fitting them.
This book provides an overview of the theory and application of linear and nonlinear mixed-effects models in the analysis of grouped data.
A unified model-building strategy for both linear and nonlinear models is
presented and applied to the analysis of over 20 real datasets from a wide variety of areas, including pharmacokinetics, agriculture, and manufacturing.
A strong emphasis is placed on the use of graphical displays at the various
phases of the model-building process, starting with exploratory plots of the
data and concluding with diagnostic plots to assess the adequacy of a fitted
model. Over 170 figures are included in the book.
The class of mixed-effects models considered in this book assumes that
both the random effects and the errors follow Gaussian distributions. These
models are intended for grouped data in which the response variable is (at
least approximately) continuous. This covers a large number of practical
applications of mixed-effects models, but does not include, for example,
generalized linear mixed-effects models (Diggle, Liang and Zeger, 1994).
The balanced mix of real data examples, modeling software, and theory
makes this book a useful reference for practitioners who use, or intend to
use, mixed-effects models in their data analyses. It can also be used as a text
for a one-semester graduate-level applied course in mixed-effects models.
Researchers in statistical computing will also find this book appealing for
its presentation of novel and efficient computational methods for fitting
linear and nonlinear mixed-effects models.
The nlme library we developed for analyzing mixed-effects models in implementations of the S language, including S-PLUS and R, provides the
underlying software for implementing the methods presented in the text;
it is described and illustrated in detail throughout the book. All analyses
included in the book were produced using version 3.1 of nlme with S-PLUS
3.4 running on an Iris 5.4 Unix platform. Because of platform dependencies, the analysis results may be expected to vary slightly with different
computers or operating systems and with different implementations of S.
Furthermore, the current version of the nlme library for R does not support
the same range of graphics presentations as does the S-PLUS version. The
latest version of nlme and further information on the NLME project can
be obtained at
http://nlme.stat.wisc.edu or
http://cm.bell-labs.com/stat/NLME.
Errata and updates of the material in the book will be made available
on-line at the same sites.
The book is divided into two parts. Part I, comprising five chapters, is dedicated to the linear mixed-effects (LME) model and Part II, comprising
three chapters, covers the nonlinear mixed-effects (NLME) model. Chapter 1 gives an overview of LME models, introducing some examples of
grouped data and the type of analyses that apply to them. The theory
and computational methods for LME models are the topics of Chapter
2. Chapter 3 describes the structure of grouped data and the many facilities available in the nlme library to display and summarize such data.
The model-building approach we propose is described and illustrated in
detail in the context of LME models in Chapter 4. Extensions of the basic LME model to include variance functions and correlation structures for
the within-group errors are considered in Chapter 5. The second part of
the book follows an organization similar to the first. Chapter 6 provides
an overview of NLME models and some of the analysis tools available for
them in nlme. The theory and computational methods for NLME models
are described in Chapter 7. The final chapter is dedicated to model building
in the context of NLME models and to illustrating in detail the nonlinear
modeling facilities available in the nlme library.
Even though the material covered in the book is, for the most part,
self-contained, we assume that the reader has some familiarity with linear
regression models, say at the level of Draper and Smith (1998). Although
enough theory is covered in the text to understand the strengths and weaknesses
but some blank lines have been removed without indication. The S output
was generated using the options settings
> options( width = 68, digits = 5 )
Venables. Finally, we would like to thank our editor, John Kimmel, for his
continuous encouragement and support.
José C. Pinheiro
Douglas M. Bates
March 2000
Contents
Preface
References
A Data Used in Examples and Exercises
A.1 Alfalfa: Split-Plot Experiment on Varieties of Alfalfa
A.2 Assay: Bioassay on Cell Culture Plate
A.3 BodyWeight: Body Weight Growth in Rats
A.4 Cefamandole: Pharmacokinetics of Cefamandole
A.5 CO2: Carbon Dioxide Uptake
A.6 Dialyzer: High-Flux Hemodialyzer
A.7 DNase: Assay Data for the Protein DNase
A.8 Earthquake: Earthquake Intensity
A.9 ergoStool: Ergometrics Experiment with Stool Types
A.10 Glucose2: Glucose Levels Following Alcohol Ingestion
A.11 IGF: Radioimmunoassay of IGF-I Protein
A.12 Indometh: Indomethacin Kinetics
A.13 Loblolly: Growth of Loblolly Pine Trees
A.14 Machines: Productivity Scores for Machines and Workers
A.15 Oats: Split-plot Experiment on Varieties of Oats
A.16 Orange: Growth of Orange Trees
A.17 Orthodont: Orthodontic Growth Data
Part I
Linear Mixed-Effects Models
1
Linear Mixed-Effects Models: Basic Concepts and Examples
FIGURE 1.1. Travel time in nanoseconds for ultrasonic head-waves in a sample of six railroad rails. The times shown are the result of subtracting 36,100 nanoseconds from the original observation.
y_ij = β + ε_ij,    i = 1, . . . , M,    j = 1, . . . , n_i,    (1.1)

where y_ij is the observed travel time for observation j on rail i, β is the
mean travel time across the population of rails being sampled, and the ε_ij
are independent N(0, σ²) error terms. The number of rails is M and the
number of observations on rail i is n_i. In this case M = 6 and n_1 = n_2 =
· · · = n_6 = 3. The total number of observations is N = Σ_{i=1}^{M} n_i = 18.
The lm function is used to fit the single-mean model (1.1) in S. Its first
argument is a formula describing the model and its second argument is a
data frame containing the variables named in the model formula.
> fm1Rail.lm <- lm( travel ~ 1, data = Rail )
> fm1Rail.lm
Call:
lm(formula = travel ~ 1, data = Rail)
Coefficients:
(Intercept)
66.5
Degrees of freedom: 18 total; 17 residual
Residual standard error: 23.645
FIGURE 1.2. Boxplots of residuals by rail number for the lm fit of the single-mean model (1.1) to the data from the rail experiment.
the classification factor when modeling grouped data: the group effects
are incorporated into the residuals (which, in this case, have identical signs
for each rail), leading to an inflated estimate of the within-rail variability.

The rail effects indicated in Figure 1.2 may be incorporated into the
model for the travel times by allowing the mean of each rail to be represented by a separate parameter. This fixed-effects model for the one-way
classification is written
y_ij = β_i + ε_ij,    i = 1, . . . , M,    j = 1, . . . , n_i,    (1.2)

where the β_i represents the mean travel time of rail i and, as in (1.1), the
errors ε_ij are assumed to be independently distributed as N(0, σ²). We can
again use lm to fit (1.2).
> fm2Rail.lm <- lm( travel ~ Rail - 1, data = Rail )
> fm2Rail.lm
Call:
lm(formula = travel ~ Rail - 1, data = Rail)
Coefficients:
  Rail2 Rail5 Rail1  Rail6  Rail3 Rail4
 31.667    50    54 82.667 84.667    96
Degrees of freedom: 18 total; 12 residual
Residual standard error: 4.0208
FIGURE 1.3. Boxplots of residuals by rail number for the lm fit of the fixed-effects model (1.2) to the data from the rail experiment.
by rail number, shown in Figure 1.3. The residuals are now centered around
zero and have considerably smaller magnitudes than those in Figure 1.2.
Even though the fixed-effects model (1.2) accounts for the rail effects, it
does not provide a useful representation of the rails data. Its basic problem
is that it only models the specific sample of rails used in the experiment,
while the main interest is in the population of rails from which the sample
was drawn. In particular, fm2Rail.lm does not provide an estimate of the
between-rail variability, which is one of the central quantities of interest in
the rails experiment. Another drawback of this fixed-effects model is that
the number of parameters in the model increases linearly with the number
of rails.
A random-effects model circumvents these problems by treating the rail
effects as random variations around a population mean. The following reparameterization of model (1.2) helps motivate the random-effects model
for the rails data. We write

y_ij = β̄ + (β_i − β̄) + ε_ij,    (1.3)

where β̄ = Σ_{i=1}^{6} β_i / 6 represents the average travel time for the rails in the
experiment. The random-effects model replaces β̄ by the mean travel time
over the population of rails and replaces the deviations β_i − β̄ by random
variables whose distribution is to be estimated.
A random-effects model for the one-way classification used in the rails
experiment is written

y_ij = β + b_i + ε_ij,    (1.4)

where β is the mean travel time across the population of rails being sampled, b_i is a random variable representing the deviation from the population
mean of the mean travel time for the ith rail, and ε_ij is a random variable
representing the deviation in travel time for observation j on rail i from
the mean travel time for rail i. The b_i and ε_ij are assumed independent,
with

b_i ~ N(0, σ_b²),    ε_ij ~ N(0, σ²).    (1.5)
The first argument indicates that the response is travel and that there is
a single fixed effect, the intercept. The second argument indicates that the
data will be found in the object named Rail. The third argument indicates
that there is a single random effect for each group and that the grouping
is given by the variable Rail. Note that the data frame named Rail contains
a variable or column that is also named Rail. Because no estimation
method is specified, the default, "REML", is used.
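Putting the three arguments just described together, the fit referred to below as fm1Rail.lme is produced by a call of the following form (a sketch consistent with that description, not verbatim book code):

```r
library( nlme )   # provides lme() and the Rail data

## intercept-only fixed effects; one random intercept per rail,
## estimated by the default "REML" method
fm1Rail.lme <- lme( travel ~ 1, data = Rail, random = ~ 1 | Rail )
```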
We can query the fitted lme object, fm1Rail.lme, using different accessor
functions, also described in detail in Appendix B. One of the most useful
of these is the summary function
> summary( fm1Rail.lme )
Linear mixed-effects model fit by REML
 Data: Rail
     AIC    BIC  logLik
  128.18 130.68 -61.089

Random effects:
 Formula: ~ 1 | Rail
        (Intercept) Residual
StdDev:      24.805   4.0208

Fixed effects: travel ~ 1
            Value Std.Error DF t-value p-value
(Intercept)  66.5    10.171 12  6.5382  <.0001

Standardized Within-Group Residuals:
     Min       Q1      Med      Q3    Max
 -1.6188 -0.28218 0.035693 0.21956 1.6144

Number of Observations: 18
Number of Groups: 6
We see that the REML estimates for the parameters have been calculated
as

β̂ = 66.5,    σ̂_b = 24.805,    σ̂ = 4.0208.
FIGURE 1.4. Standardized residuals versus the fitted values for the REML fit of a random-effects model to the data from the rail experiment.
The standardized residuals, shown on the vertical axis in Figure 1.4, are
the raw residuals, e_ij = y_ij − β̂ − b̂_i, divided by the estimated standard
deviation, σ̂, of the ε_ij.
In this plot we are looking for a systematic increase (or, less commonly,
a systematic decrease) in the variance of the ε_ij as the level of the response
increases. If this is present, the residuals on the right-hand side of the plot
will have a greater vertical spread than those on the left, forming a horizontal wedge-shaped pattern. Such a pattern is not evident in Figure 1.4.
With more complicated models there are other diagnostic plots that we
may want to examine, as discussed in Chapter 4.
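A display like Figure 1.4 is the default plot method for a fitted lme object, so it can be reproduced with a single call (a sketch; the exact panel options used for the book figure are not shown in this excerpt):

```r
library( nlme )

fm1Rail.lme <- lme( travel ~ 1, data = Rail, random = ~ 1 | Rail )

## default plot method for lme fits: standardized residuals vs. fitted values
plot( fm1Rail.lme )
```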
We should also examine numerical summaries of the model. A basic summary is a set of confidence intervals on the parameters β, σ_b, and σ, as
produced by the intervals function.
> intervals( fm1Rail.lme )
Approximate 95% confidence intervals

 Fixed effects:
             lower est.  upper
(Intercept) 44.339 66.5 88.661

 Random Effects:
  Level: Rail
                 lower   est.  upper
sd((Intercept)) 13.274 24.805 46.354

 Within-group standard error:
 lower   est.  upper
 2.695 4.0208 5.9988
In this case, the fixed-effects model is so simple that the analysis of variance
is trivial. The hypothesis being tested here is β = 0. The p-value, which
is the probability of observing data as unusual as these or even more so
when β actually is 0, is so small as to rule out this possibility. Regardless
of the p-value, the hypothesis β = 0 is of no practical interest here because
the data have been shifted by subtracting 36,100 nanoseconds from each
measurement.
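The analysis of variance being discussed comes from applying anova to the fitted object; a minimal sketch:

```r
library( nlme )

fm1Rail.lme <- lme( travel ~ 1, data = Rail, random = ~ 1 | Rail )

## Wald F-test of the fixed effects; here the only hypothesis is beta = 0
anova( fm1Rail.lme )
```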
FIGURE 1.5. Effort required (Borg scale) to arise from a stool for nine different subjects each using four different types of stools. Different symbols, shown in the key at the top of the plot, are used for the different types of stools.
the nlme library are from an ergometrics experiment that has a randomized
block design. The experimenters recorded the effort required by each of
nine different subjects to arise from each of four types of stools. We want
to compare these four particular types of stools so we use fixed effects
for the Type factor. The nine different subjects represent a sample from
the population about which we wish to make inferences so we use random
effects to model the Subject factor.

From Figure 1.5 it appears that there are systematic differences between
stool types on this measurement. For example, the T2 stool type required
the greatest effort from each subject while the T1 stool type was consistently
one of the low-effort types. The subjects also exhibited variability in their
scoring of the effort, but we would expect this. We say that Subject is
a blocking factor because it represents a known source of variability in the
experiment. Type is said to be an experimental factor because the purpose
of the experiment is to determine if there are systematic differences in the
level of effort to arise from the different types of stools.
We can visually compare the magnitude of the effects of the Type and
Subject factors using a design plot

> plot.design( ergoStool )

This plot is produced by averaging the responses at each level of each factor
and plotting these averages. We see that the variability associated with the
Type factor is comparable to the variability associated with the Subject
factor. We also see that the average effort according to stool type is in the
order T1 < T4 < T3 < T2.
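The ordering read off the design plot can be checked numerically by averaging effort within each stool type; a sketch:

```r
library( nlme )   # the ergoStool data

## mean effort by stool type; T1 should be lowest and T2 highest
with( ergoStool, tapply( effort, Type, mean ) )
```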
FIGURE 1.6. Design plot for the data in the stool ergometric experiment. The mean of the response (effort) is plotted for each level of each of the factors Type and Subject.
y_ij = β_j + b_i + ε_ij,    i = 1, . . . , 9,    j = 1, . . . , 4,
b_i ~ N(0, σ_b²),    ε_ij ~ N(0, σ²),    (1.6)

or, equivalently,

y_i = X_i β + Z_i b_i + ε_i,    i = 1, . . . , 9,
b_i ~ N(0, σ_b²),    ε_i ~ N(0, σ² I),

where, for i = 1, . . . , 9,

y_i = [y_i1, y_i2, y_i3, y_i4]ᵀ,    X_i = I_4 (the 4 × 4 identity matrix),
Z_i = 1 = [1, 1, 1, 1]ᵀ,    ε_i = [ε_i1, ε_i2, ε_i3, ε_i4]ᵀ.
(In R the default contrasts for an unordered factor are the treatment
contrasts, which are described below.)
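In R the contrasts in effect can be queried and changed through options(); the sketch below switches unordered factors to the Helmert coding that S-PLUS uses by default (check this against your own session defaults):

```r
## current contrast functions for unordered and ordered factors
options( "contrasts" )

## use Helmert contrasts for unordered factors, as S-PLUS does by default
options( contrasts = c("contr.helmert", "contr.poly") )
```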
The X_i matrices for a given set of contrasts can be displayed with the
model.matrix function. To save space we show the X_1 matrix only.
> ergoStool1 <- ergoStool[ ergoStool$Subject == "1", ]
> model.matrix( effort ~ Type, ergoStool1 )   # X matrix for Subject 1
  (Intercept) Type1 Type2 Type3
1           1    -1    -1    -1
2           1     1    -1    -1
3           1     0     2    -1
4           1     0     0     3
            DF t-value p-value
(Intercept) 24  21.331  <.0001
Type1       24   7.498  <.0001
Type2       24   0.618  0.5421
Type3       24  -3.236  0.0035
Although the individual parameter estimates for the Type factor are different between the two fits, the anova results are the same. The difference
in the parameter estimates simply reflects the fact that different contrasts
are being estimated. The similarity of the anova results indicates that the
overall variability attributed to the Type factor does not change. In each
case, the row labelled Type in the analysis of variance table represents a
test of the hypothesis

H₀ : β₂ = β₃ = β₄ = 0,

which is equivalent to reducing model (1.6) to

y_i = β₁ 1 + Z_i b_i + ε_i,    i = 1, . . . , 9,
b_i ~ N(0, σ_b²),    ε_i ~ N(0, σ² I).
and the fitted model is now expressed in terms of the mean effort for each
stool type over the population

> fm3Stool <-
+   lme( effort ~ Type - 1, data = ergoStool, random = ~ 1 | Subject )
> summary( fm3Stool )
Linear mixed-effects model fit by REML
 Data: ergoStool
     AIC    BIC  logLik
  133.13 141.93 -60.565

Random effects:
 Formula: ~ 1 | Subject
        (Intercept) Residual
StdDev:      1.3325   1.1003

Fixed effects: effort ~ Type - 1
       Value Std.Error DF t-value p-value
Type1  8.556   0.57601 24  14.853  <.0001
Type2 12.444   0.57601 24  21.604  <.0001
Type3 10.778   0.57601 24  18.711  <.0001
Type4  9.222   0.57601 24  16.010  <.0001
 Correlation:
      Type1 Type2 Type3
Type2 0.595
Type3 0.595 0.595
Type4 0.595 0.595 0.595
This change in the fixed-effects structure does change the anova results.

> anova( fm3Stool )
      numDF denDF F-value p-value
Type      4    24  130.52  <.0001
In this parameterization the row labelled Type tests the hypothesis
H₀ : β₁ = β₂ = β₃ = β₄ = 0, which corresponds to the model

y_i = Z_i b_i + ε_i,    i = 1, . . . , 9,
b_i ~ N(0, σ_b²),    ε_i ~ N(0, σ² I).

That is, the hypothesis H₀ completely eliminates the fixed-effects parameters from the model so the mean response across the population would be
zero. This hypothesis is not meaningful in the context of this experiment.
To reiterate, some general principles to keep in mind regarding fixed-effects terms for factors are:

- The overall effect of the factor should be assessed with anova, not
  by examining the t-values or p-values associated with the fixed-effects
  parameters. The anova output does not depend on the choice
  of contrasts as long as the intercept term is retained in the model.
- Interpretation of the parameter estimates for a fixed-effects term depends on the contrasts being used.
- For REML estimation, likelihood-ratio tests or comparisons of AIC
  or BIC require the same fixed-effects structure and the same choice
  of contrasts in all models.
- The cell means parameters can be estimated by adding -1 to a
  model formula but this will usually make the results of anova meaningless.
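The first of these principles can be verified directly: refitting the stool model under two different contrast codings changes the coefficient estimates but leaves the anova table for Type unchanged. A sketch:

```r
library( nlme )

## the same model under two contrast codings for Type
fmHelmert <- lme( effort ~ Type, data = ergoStool, random = ~ 1 | Subject,
                  contrasts = list( Type = "contr.helmert" ) )
fmTreat   <- lme( effort ~ Type, data = ergoStool, random = ~ 1 | Subject,
                  contrasts = list( Type = "contr.treatment" ) )

fixef( fmHelmert )   # individual estimates differ ...
fixef( fmTreat )

anova( fmHelmert )   # ... but the F-test for Type is identical
anova( fmTreat )
```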
FIGURE 1.7. Standardized residuals versus the fitted values for the REML fit of a random-effects model to the data in the ergometric experiment on types of stools.
 Fixed effects:
                  est.    upper
(Intercept) 10.250000 11.24175
Type1        1.944444  2.47970
Type2        0.092593  0.40162
Type3       -0.342593 -0.12408

 Random Effects:
  Level: Subject
                  lower   est.  upper
sd((Intercept)) 0.74923 1.3325 2.3697

 Within-group standard error:
   lower   est.  upper
 0.82894 1.1003 1.4605
FIGURE 1.8. Standardized residuals versus the fitted values by Subject for the REML fit of a random-effects model to the data in the ergometric experiment on types of stools.
FIGURE 1.9. Productivity scores for three types of machines as used by six different workers. Scores take into account the number and the quality of components produced.
We note that there is very little variability in the productivity score for the
same worker using the same machine.

As we did for the experiment on the types of stools, we will model the
subject or Worker factor with random effects and the type or Machine factor
with fixed effects. The replications in this experiment will allow us to assess
the presence of interactions between worker and machine. That is, we can
address the question of whether the effect of changing from one type of
machine to another is different for different workers.

The comparative dotplot in Figure 1.9 allows us to see patterns across
the workers and to see differences between machines within each worker.
However, the possibility of interactions is not easy to assess in this plot. An
alternative plot, called an interaction plot, shows the potential interactions
more clearly. It is produced by averaging the scores for each worker on each
machine, plotting these averages versus the machine type, and joining the
points for each worker. The function interaction.plot in S creates such a
plot. It is most easily called after attaching the data frame with the data
so the variables in the data frame can be accessed by name.

> attach( Machines )   # make variables in Machines available by name
> interaction.plot( Machine, Worker, score, las = 1 )   # Figure 1.10
> detach()             # undo the effect of attach( Machines )
FIGURE 1.10. An interaction plot for the productivity scores for six different workers using three different machine types.
y_ijk = β_j + b_i + ε_ijk,    i = 1, . . . , 6,    j = 1, . . . , 3,    k = 1, . . . , 3,
b_i ~ N(0, σ_b²),    ε_ijk ~ N(0, σ²).    (1.7)

There is a fixed effect for each type of machine and a random effect for
each worker. As before, the fixed effects for the machines will be re-coded
as an intercept and a set of contrasts when we fit this model as

> fm1Machine <-
+   lme( score ~ Machine, data = Machines, random = ~ 1 | Worker )
> fm1Machine
Linear mixed-effects model fit by REML
  Data: Machines
  Log-restricted-likelihood: -145.23
  Fixed: score ~ Machine
 (Intercept) Machine1 Machine2
       59.65   3.9833   3.3111

Random effects:
 Formula: ~ 1 | Worker
        (Intercept) Residual
StdDev:      5.1466   3.1616

Number of Observations: 54
Number of Groups: 6
A model with a random interaction term for each combination of worker
i and machine j, i = 1, . . . , 6, j = 1, . . . , 3, is

y_ijk = β_j + b_i + b_ij + ε_ijk,    i = 1, . . . , 6,    j = 1, . . . , 3,    k = 1, . . . , 3,
b_i ~ N(0, σ₁²),    b_ij ~ N(0, σ₂²),    ε_ijk ~ N(0, σ²).
This model has random effects at two levels: the effects b_i for the worker
and the effects b_ij for the type of machine within each worker. In a call
to lme we can express this nesting as Worker/Machine in the formula for
the random effects. This expression is read as "Worker and Machine within
Worker". We can update the previous model with a new specification for
the random effects.
> fm2Machine <- update( fm1Machine, random = ~ 1 | Worker/Machine )
> fm2Machine
Linear mixed-effects model fit by REML
  Data: Machines
  Log-restricted-likelihood: -109.64
  Fixed: score ~ Machine
 (Intercept) Machine1 Machine2
       59.65   3.9833   3.3111

Random effects:
 Formula: ~ 1 | Worker
        (Intercept)
StdDev:      4.7814

 Formula: ~ 1 | Machine %in% Worker
        (Intercept) Residual
StdDev:      3.7294  0.96158

Number of Observations: 54
Number of Groups:
 Worker Machine %in% Worker
      6                  18
The likelihood ratio statistic comparing the more general model (fm2Machine)
to the more specific model (fm1Machine) is huge and the p-value for the test
is essentially zero, so we prefer fm2Machine.
The anova function with multiple arguments also reproduces the values
of the AIC and the BIC criteria for each model. As described in Section 1.1.1,
these criteria can be used to decide which model to prefer. Because the
preference is according to "smaller is better," both these criteria show a
strong preference for fm2Machine over fm1Machine.
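The comparison being described is obtained by passing both fits to anova; a sketch:

```r
library( nlme )

fm1Machine <- lme( score ~ Machine, data = Machines, random = ~ 1 | Worker )
fm2Machine <- update( fm1Machine, random = ~ 1 | Worker/Machine )

## multi-argument form: reports AIC, BIC, logLik and a likelihood-ratio test
anova( fm1Machine, fm2Machine )
```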
Number of Groups:
 Worker Machine %in% Worker
      6                  18
> intervals( fm1MachinesU )
Approximate 95% confidence intervals

 Fixed effects:
              lower    est.   upper
(Intercept) 55.2598 59.6476 64.0353
Machine1     1.5139  3.9812  6.4485
Machine2     1.8940  3.3123  4.7307

 Random Effects:
  Level: Worker
                 lower   est.  upper
sd((Intercept)) 2.2162 4.7388 10.132
  Level: Machine
                 lower   est.  upper
sd((Intercept)) 2.4091 3.7728 5.9084

 Within-group standard error:
   lower   est.  upper
 0.71202 0.9332 1.2231
> intervals( fm4Stool )
 Fixed effects:
                  est.    upper
(Intercept) 10.250000 11.24175
Type1        1.944444  2.47970
Type2        0.092593  0.40162
Type3       -0.342593 -0.12408

 Random Effects:
  Level: Subject
                  lower   est.  upper
sd((Intercept)) 0.74952 1.3325 2.3688
  Level: Type
                  lower    est.  upper
sd((Intercept)) 0.05386 0.99958 18.551

 Within-group standard error:
      lower    est.  upper
 4.3603e-07 0.45988 485050
Apparently the standard deviations σ₂ and σ could vary over twelve orders
of magnitude!

If we write this model for these data, taking into account that each
subject only tries each type of stool once, we would have

y_ij = β_i + b_j + b_ij + ε_ij,    i = 1, . . . , 4,    j = 1, . . . , 9,
b_j ~ N(0, σ₁²),    b_ij ~ N(0, σ₂²),    ε_ij ~ N(0, σ²).

We can see that the b_ij are totally confounded with the ε_ij so we cannot
estimate separate standard deviations for these two random terms. In fact,
the estimates reported for σ and σ₂ in this model give a combined variance
that corresponds to σ̂² from fm1Stool.

> (fm1Stool$sigma)^2
[1] 1.2106
> (fm4Stool$sigma)^2 + 0.79621^2
[1] 1.2107

The lesson here is that it is always a good idea to check the confidence
intervals on the variance components after fitting a model. Having abnormally wide intervals usually indicates problems with the model definition.
In particular, a model with nested interaction terms can only be fit when
there are replications available in the data.
The third model allows a separate random effect for each machine type
within each worker, with a general positive-definite covariance matrix Ψ
for these random effects:

y_i = X_i β + Z_i b_i + ε_i,    i = 1, . . . , 6,
b_i ~ N(0, Ψ),    ε_i ~ N(0, σ² I).
> summary( fm3Machine )
Linear mixed-effects model fit by REML
  Data: Machines
     AIC    BIC  logLik
  231.89 251.21 -105.95

Random effects:
 Formula: ~ Machine - 1 | Worker
 Structure: General positive-definite
          StdDev   Corr
MachineA 4.07928 MachnA MachnB
MachineB 8.62529  0.803
MachineC 4.38948  0.623  0.771
Residual 0.96158

Fixed effects: score ~ Machine
             Value Std.Error DF t-value p-value
(Intercept) 59.650    2.1447 46  27.813  <.0001
Machine1     3.983    1.2104 46   3.291  0.0019
Machine2     3.311    0.5491 46   6.030  <.0001
 Correlation:
         (Intr) Machn1
Machine1  0.811
Machine2 -0.540 -0.453

Standardized Within-Group Residuals:
     Min       Q1      Med      Q3    Max
 -2.3935 -0.51378 0.026908 0.47245 2.5334

Number of Observations: 54
Number of Groups: 6
This model can be compared to the previous two models using the multi-argument form of anova.

> anova( fm1Machine, fm2Machine, fm3Machine )
           Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm1Machine     1  5 300.46 310.12 -145.23
fm2Machine     2  6 231.27 242.86 -109.64 1 vs 2  71.191  <.0001
fm3Machine     3 10 231.89 251.21 -105.95 2 vs 3   7.376  0.1173
Because the p-value for the test comparing models 2 and 3 is about 12%,
we would conclude that the fit fm3Machine is not significantly better than
fm2Machine, taking into account the fact that fm3Machine requires four additional parameters in the model.

The AIC criterion is nearly the same for models 2 and 3, indicating that
there is no strong preference between these models. The BIC criterion does
indicate a strong preference for model 2 relative to model 3. In general BIC
puts a heavier penalty than does AIC on having more parameters in the
model. Because there are a total of ten parameters in model 3 compared
to six parameters in model 2, the BIC criterion will tend to prefer model 2
unless model 3 provides a substantially better fit.
y_ij = β + b_i + ε_ij,    i = 1, . . . , M,    j = 1, . . . , n_i,
b_i ~ N(0, σ_b²),    ε_ij ~ N(0, σ²).    (1.8)
FIGURE 1.11. Distance from the pituitary to the pterygomaxillary fissure versus age for a sample of 16 boys (subjects M01 to M16) and 11 girls (subjects F01 to F11). The aspect ratio for the panels has been chosen to facilitate comparison of the slope of the lines.
From Figure 1.11 it appears that there are qualitative differences between
boys and girls in their growth patterns for this measurement. In Chapter 4
we will model some of these differences, but for now it is easier to restrict
our modeling to the data from the female subjects only. To extract the
data for the females only we first check on the names of the variables in
the Orthodont object, then check for the names of the levels of the variable
Sex, then extract only those rows for which the Sex variable has the value
"Female".
> names( Orthodont )
[1] "distance" "age"      "Subject"  "Sex"
> levels( Orthodont$Sex )
[1] "Male"   "Female"
> OrthoFem <- Orthodont[ Orthodont$Sex == "Female", ]
Figure 1.11 indicates that, for most of the female subjects, the orthodontic measurement increases with age and that the growth is approximately
linear over this range of ages. It appears that the intercepts, and possibly
the slopes, of these growth curves may differ between girls. For example,
subject 10 has considerably smaller measurements than does subject 11,
and the growth rate for subjects 2 and 3 is considerably greater than that
for subjects 5 and 8.

To explore this potential linear relationship further, we fit separate linear
regression models for each girl using the lmList function.
> fm1OrthF.lis <- lmList( distance ~ age, data = OrthoFem )
> coef( fm1OrthF.lis )
    (Intercept)   age
F10       13.55 0.450
F09       18.10 0.275
F06       17.00 0.375
F01       17.25 0.375
F05       19.60 0.275
F08       21.45 0.175
F07       16.95 0.550
F02       14.20 0.800
F03       14.40 0.850
F04       19.65 0.475
F11       18.95 0.675
> intervals( fm1OrthF.lis )
, , (Intercept)
      lower  est.  upper
F10  10.071 13.55 17.029
F09  14.621 18.10 21.579
F06  13.521 17.00 20.479
F01  13.771 17.25 20.729
F05  16.121 19.60 23.079
F07  13.471 16.95 20.429
F02  10.721 14.20 17.679
F08  17.971 21.45 24.929
F03  10.921 14.40 17.879
F04  16.171 19.65 23.129
F11  15.471 18.95 22.429

, , age
       lower  est.  upper
F10   0.1401 0.450 0.7599
F09  -0.0349 0.275 0.5849
F06   0.0651 0.375 0.6849
F01   0.0651 0.375 0.6849
F05  -0.0349 0.275 0.5849
F07   0.2401 0.550 0.8599
F02   0.4901 0.800 1.1099
F08  -0.1349 0.175 0.4849
F03   0.5401 0.850 1.1599
F04   0.1651 0.475 0.7849
F11   0.3651 0.675 0.9849

> plot( intervals( fm1OrthF.lis ) )     # produces Figure 1.12

[Figure 1.12 appears here: individual confidence intervals on (Intercept) and
age for each female subject, with subjects ordered by average distance.]
Figure 1.12 is of interest as much for what it does not show as for what
it does show. First, consider what the figure does show. We notice that
the intervals for the intercepts are all the same width, as are the intervals
for the slope with respect to age. This is a consequence of having balanced
data; that is, all the subjects were observed the same number of times and
at the same ages. We also notice that there is considerable overlap in the
set of intervals for the slope with respect to age. It may be feasible to use
a model with a common slope.
The surprising thing about Figure 1.12 is that it does not show the
substantial differences in the intercepts that Figure 1.11 would lead us to
expect. Furthermore, even though we have ordered the groups from the
one with the smallest average distance (subject F10) to the one with the
largest average distance (subject F11), this ordering is not reflected in the
intercepts. Finally, we see that the pattern across subjects in the intervals
for the intercepts is nearly a reflection of the pattern in the intervals for
the slopes.
Those with experience analyzing regression models may already have
guessed why this reflection of the pattern occurs. It occurs because all the
data were collected between age 8 and age 14, but the intercept represents
a distance at age 0. The extrapolation back to age 0 will result in a high
negative correlation (about −0.98) between the estimates of the slopes and
their corresponding intercept estimates.
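For simple linear regression this correlation is determined by the design alone: the correlation between the estimated intercept and slope is −x̄/√(Sxx/n + x̄²), where x̄ is the mean of the covariate values and Sxx their sum of squared deviations. A quick sketch with the four observation ages reproduces the value quoted above and shows that centering removes it:

```python
import math

ages = [8, 10, 12, 14]          # ages at which each girl was measured
n = len(ages)
xbar = sum(ages) / n
sxx = sum((x - xbar) ** 2 for x in ages)

# corr(intercept-hat, slope-hat) = -xbar / sqrt(Sxx / n + xbar^2)
corr = -xbar / math.sqrt(sxx / n + xbar ** 2)   # about -0.98

# Centering at age 11 makes the covariate mean zero, so the
# correlation between the two estimates vanishes.
centered = [x - 11 for x in ages]
cbar = sum(centered) / n
corr_centered = -cbar / math.sqrt(sxx / n + cbar ** 2)
```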
We will remove this correlation if we center the data. In this case, we
would fit the distance as a linear function of age - 11 so the two coefficients
being estimated are the distance at 11 years of age and the slope, or growth
rate. If we fit this revised model and plot the confidence intervals
> fm2OrthF.lis <- update( fm1OrthF.lis, distance ~ I( age - 11 ) )
> plot( intervals( fm2OrthF.lis ) )     # produces Figure 1.13
then these intervals (Figure 1.13) show the expected trend in the (Intercept)
term, which now represents the fitted distance at 11 years.
To continue with the analysis of these data we could fit a regression model
to the centered data with a common growth rate but separate intercepts for
each girl. Before doing that we should consider what we could infer from
such a model. We could use such a model to make inferences about the
growth rate for this sample of girls. Also, we could make inferences about
the expected distance for each girl at 11 years of age. Using combinations
of the parameters we could make inferences about the expected distance
for each of these girls at other ages. The key point is that we are in some
ways restricting ourselves to the distances that we have or could observe
on these particular girls.
By fitting a mixed-effects model to these data we allow ourselves to make
inferences about the fixed effects, which represent average characteristics of
the population represented by these subjects, and the variability amongst
subjects. A call to lme to fit linear growth curves with common slopes but
randomly distributed shifts to the girls' orthodontic data is
> fm1OrthF <-
+     lme( distance ~ age, data = OrthoFem, random = ~ 1 | Subject )
> summary( fm1OrthF )
Linear mixed-effects model fit by REML
 Data: OrthoFem
       AIC    BIC  logLik
    149.22 156.17 -70.609

Random effects:
 Formula: ~ 1 | Subject
        (Intercept) Residual
StdDev:      2.0685  0.78003

Fixed effects: distance ~ age
             Value Std.Error DF t-value p-value
(Intercept) 17.373   0.85874 32  20.230  <.0001
        age  0.480   0.05259 32   9.119  <.0001
 Correlation:
    (Intr)
age -0.674

Standardized Within-Group Residuals:
     Min       Q1     Med      Q3    Max
 -2.2736 -0.70902 0.17282 0.41221 1.6325

Number of Observations: 44
Number of Groups: 11
We could also fit a model with the formula distance ~ I(age - 11) but,
because of the requirement of a common slope, for model-building purposes
the properties of the centered model are essentially equivalent to those of
the uncentered model. Using the uncentered model makes it easier to
compare with other models described below.
yi = Xi β + Zi bi + εi ,    bi ∼ N(0, σb²),    εi ∼ N(0, σ² I),    i = 1, . . . , 11,

with matrices

X1 = ... = X11 =
  | 1  8 |
  | 1 10 |
  | 1 12 |
  | 1 14 | ,

Z1 = ... = Z11 =
  | 1 |
  | 1 |
  | 1 |
  | 1 | .

The parameter estimates are

σ̂b = 2.0685,    σ̂ = 0.78003,    β̂1 = 17.373,    β̂2 = 0.480.
Notice that, to the accuracy printed here, the estimates of the fixed-effects
parameters are the same for ML and REML. The ML estimates of the
standard deviations, σ̂b = 1.9699 and σ̂ = 0.76812, are smaller than the
corresponding REML estimates. This is to be expected: the REML criterion
was created to compensate for the downward bias of the maximum
likelihood estimates of variance components, so it should produce larger
estimates.
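The simplest version of this bias is the residual variance in ordinary regression: the ML estimator divides the residual sum of squares by n, while the REML estimator divides by n − p, the classical unbiased denominator. A small sketch with made-up data:

```python
# Fit a simple linear regression by least squares and compare the two
# variance estimators.  The data are made up for illustration.
xs = [8, 10, 12, 14]
ys = [21.0, 21.5, 23.0, 25.5]

n, p = len(xs), 2
xbar = sum(xs) / n
ybar = sum(ys) / n
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
intercept = ybar - slope * xbar
rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

sigma2_ml = rss / n          # ML estimate: biased downward
sigma2_reml = rss / (n - p)  # REML-style estimate: larger, unbiased
```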
We have made the assumption of a common slope, or growth rate, for all
the subjects. To test this we can fit a model with random effects for both
the intercept and the slope.
> fm2OrthF <- update( fm1OrthF, random = ~ age | Subject )
The predictions from this model are shown in Figure 1.14. We compare the
two models with the anova function.
> anova( fm1OrthF, fm2OrthF )
         Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm1OrthF     1  4 149.22 156.17 -70.609
fm2OrthF     2  6 149.43 159.85 -68.714 1 vs 2  3.7896  0.1503
Because the p-value for the second model versus the first is about 15%, we
conclude that the simpler model, fm1OrthF, is adequate.
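The entries in this table can be checked by hand. The likelihood ratio statistic is twice the difference in log-likelihoods, and the two models differ by 6 − 4 = 2 parameters; for a chi-squared distribution with 2 degrees of freedom the upper-tail probability has the closed form exp(−x/2). A sketch using the log-likelihoods printed above:

```python
import math

loglik_fm1 = -70.609   # fm1OrthF, 4 parameters
loglik_fm2 = -68.714   # fm2OrthF, 6 parameters

# likelihood ratio statistic: twice the gain in log-likelihood
lratio = 2 * (loglik_fm2 - loglik_fm1)

# chi-squared upper-tail probability with 2 degrees of freedom
p_value = math.exp(-lratio / 2)
```

This reproduces the L.Ratio and p-value columns up to rounding. (The chi-squared reference distribution is only an approximation for tests involving variance components, a point taken up in Chapter 2.)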
FIGURE 1.14. Original data and fitted linear relationships from a mixed-effects
model for the girls' orthodontic data. This model incorporates random effects for
both the slope and the intercept.
The coefficients function (or its shorter form coef) is used to extract
the coefficients of the fitted lines for each subject. For the fitted model
fm1OrthF the intercept of the fitted line for subject i is β̂1 + b̂i and the slope
is β̂2.
> coef( fm1OrthF )
    (Intercept)     age
F10      13.367 0.47955
F09      15.902 0.47955
F06      15.902 0.47955
F01      16.144 0.47955
F05      17.351 0.47955
F07      17.713 0.47955
F02      17.713 0.47955
F08      18.075 0.47955
F03      18.437 0.47955
F04      19.524 0.47955
F11      20.972 0.47955
Looking back at the BLUPs of the random effects for the ML and REML
fits, we can see that they are very similar. The same will be true of the
coefficients for each subject and hence for the fitted lines themselves. To
show this we can plot either the estimated BLUPs or the estimated
coefficients. The compareFits function is helpful here because it allows us to put
both sets of coefficients on the same panels.
> plot( compareFits( coef(fm1OrthF), coef(fm1OrthFM) ) )   # Figure 1.15
In Figure 1.15 each line corresponds to one subject. In the left panel the
estimated intercepts from the REML fit are shown as open circles while
those from the ML fit are shown as +'s. The two estimates for each subject
are essentially identical. In the right panel the estimates for the coefficient
with respect to age are shown. Because there is no random effect associated
with this coefficient, the estimates do not vary between subjects. Again, the
ML estimates and the REML estimates are essentially identical.
We may also want to examine the predictions for each subject from the
fitted model. The augPred function produces predictions of the response for
each of the groups over the observed range of the covariate (i.e., the range
8 to 14 for age). These predictions are augmented with the original data to
produce a plot of the predicted lines for each subject superposed on the
original data as in Figure 1.16.
> plot( augPred(fm1OrthF), aspect = "xy", grid = T )   # Fig. 1.16
Further diagnostic plots, such as plots of residuals versus the fitted values
by subject (not shown), did not indicate any serious deficiencies in this
model.
FIGURE 1.15. A comparison of the coefficients of the fitted lines for each female
subject in the orthodontic example. The two sets of coefficients are from the
restricted maximum likelihood fit (fm1OrthF) and the maximum likelihood fit
(fm1OrthFM).
FIGURE 1.16. Original data and fitted growth curves for each female subject
in the orthodontic example. The fitted curves are from a restricted maximum
likelihood fit of the analysis of covariance model.
There is considerable variability between dogs in this pattern. Within the same dog, the
left and the right side generally follow the same pattern over time but often
with a vertical shift between sides.
We will start with a quadratic model with respect to the day covariate
so we can model the pattern of reaching a peak. We use random effects for
both the intercept and the linear term at the Dog level and a single random
effect for the intercept at the Side within Dog level. This allows the overall
pattern to vary between dogs in terms of the location of the peak, but not
in terms of the curvature at the peak. The only difference between sides
for the same dog will be a shift in the intercept.
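To see why a random effect on the linear term moves the peak but not the curvature, note that a quadratic β1 + β2 d + β3 d² peaks at d = −β2/(2β3), while its curvature is the constant 2β3. A sketch using the estimated coefficients from the intervals output below (the dog-level shifts here are hypothetical, for illustration only):

```python
beta2 = 6.12960    # estimated linear coefficient for day
beta3 = -0.36735   # estimated quadratic coefficient for day^2

# peak of beta1 + beta2 * d + beta3 * d^2 falls at d = -beta2 / (2 * beta3)
peak_day = -beta2 / (2 * beta3)   # about 8.3 days for a typical dog

# A dog-level random effect b2 on the linear term shifts the peak to
# -(beta2 + b2) / (2 * beta3); the curvature 2 * beta3 is unchanged.
shifts = [-1.0, 0.0, 1.0]         # hypothetical dog-level perturbations
peaks = [-(beta2 + b2) / (2 * beta3) for b2 in shifts]
```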
[A figure appears here: pixel intensity versus day for each dog.]
> intervals( fm1Pixel )
Approximate 95% confidence intervals

 Fixed effects:
                  lower        est.      upper
(Intercept)   1053.0968  1073.33914  1093.5814
day              4.3797     6.12960     7.8795
I(day^2)        -0.4349    -0.36735    -0.2998

 Random Effects:
  Level: Dog
                         lower      est.    upper
sd((Intercept))       15.92849  28.36994 50.52918
sd(day)                1.08085   1.84375  3.14514
cor((Intercept),day)  -0.89452  -0.55472  0.19138
  Level: Side
                 lower   est.  upper
sd((Intercept)) 10.417 16.824 27.173

 Within-group standard error:
  lower   est.  upper
 7.6345 8.9896 10.585
> plot( augPred( fm1Pixel ) )
If we write the pixel intensity of the jth side on the ith dog at the kth
occasion as yijk , i = 1, . . . , 10; j = 1, 2; k = 1, . . . , nij , and the time of the
kth scan on the ith dog as dik , the model being fit can be expressed as

yijk = β1 + β2 dik + β3 dik² + bi,1 + bi,2 dik + bij + εijk ,
    i = 1, . . . , 10;  j = 1, 2;  k = 1, . . . , nij .             (1.9)
X81 = X82 =
  | 1  4  16 |
  | 1  6  36 |
  | 1 10 100 |
  | 1 14 196 | ,

Z8,1 = Z8,2 =
  | 1  4 |
  | 1  6 |
  | 1 10 |
  | 1 14 | ,

Z81 = Z82 =
  | 1 |
  | 1 |
  | 1 |
  | 1 | .
bij ∼ N(0, σ2²).
The p-value is extremely small, indicating that the more general model,
fm1Pixel, is definitely superior. The AIC and BIC values confirm this.
We can also check if the random effect for day at the Dog level is
warranted. If we eliminate this term the only random effects will be a random
effect for the intercept for each dog and for each side of each dog. We fit
this model and compare it to fm1Pixel with
> fm3Pixel <- update( fm1Pixel, random = ~ 1 | Dog/Side )
> anova( fm1Pixel, fm3Pixel )
         Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm1Pixel     1  8 841.21 861.97 -412.61
fm3Pixel     2  6 876.84 892.41 -432.42 1 vs 2  39.629  <.0001
Again, the likelihood-ratio test, and the AIC and BIC criteria, all strongly
favor the more general model, fm1Pixel.
Earlier we stated that there does not appear to be a systematic difference
between the left and the right sides of the dogs. For some dogs the left side
produces higher pixel densities while for other dogs the right side does. We
can check that this indeed is the case by adding a term for Side to the fixed
effects.
> fm4Pixel <- update( fm1Pixel, pixel ~ day + day^2 + Side )
> summary( fm4Pixel )
...
Fixed effects: pixel ~ day + I(day^2) + Side
              Value Std.Error DF t-value p-value
(Intercept) 1073.3    10.171  80  105.53  <.0001
day            6.1     0.879  80    6.97  <.0001
I(day^2)      -0.4     0.034  80  -10.83  <.0001
Side          -4.6     3.813   9   -1.21  0.2576
...
With a p-value of over 25%, the fixed-effects term for Side would not be
considered significant.
Finally, we would examine residual plots such as Figure 1.19 for
deficiencies in the model. There are no alarming patterns in this figure.
FIGURE 1.19. Standardized residuals versus fitted values by dog for a multilevel
mixed-effects model fit to the pixel data.
FIGURE 1.20. Yield in bushels/acre of three different varieties of oats at four
different concentrations of nitrogen (hundredweight/acre). The experimental units
were arranged into six blocks, each with three whole-plots subdivided into four
subplots. One variety of oats was used in each whole-plot with all four
concentrations of nitrogen, one concentration in each of the four subplots. The panels
correspond to the blocks.
The model with fixed effects for both experimental factors and for their
interaction and with random effects for both the Block factor and the
Variety (whole-plot) factor is fit with
> fm1Oats <- lme( yield ~ ordered(nitro) * Variety, data = Oats,
+                 random = ~ 1 | Block/Variety )
> anova( fm1Oats )
                       numDF denDF F-value p-value
(Intercept)                1    45  245.15  <.0001
ordered(nitro)             3    45   37.69  <.0001
Variety                    2    10    1.49  0.2724
ordered(nitro):Variety     6    45    0.30  0.9322
The anova results indicate that nitro is a significant factor, but that neither
Variety nor the interaction between Variety and nitro is significant.
If we drop the interaction term and refit, we obtain essentially the same
results for the two main effects, Variety and nitro.
> fm2Oats <- update( fm1Oats, yield ~ ordered(nitro) + Variety )
> anova( fm2Oats )
               numDF denDF F-value p-value
(Intercept)        1    51  245.14  <.0001
ordered(nitro)     3    51   41.05  <.0001
Variety            2    10    1.49  0.2724
> summary( fm2Oats )
...
Random effects:
 Formula: ~ 1 | Block
        (Intercept)
StdDev:      14.645
...
In this model there is a random effect for Variety %in% Block as well as
a fixed effect for Variety. These terms model different characteristics of the
response. The random-effects term, as a nested random effect, is allowing
for different intercepts at the level of plots within blocks. The fact that
each plot is planted with one variety means that we can use the Variety
factor to indicate the plot as long as we have Variety nested within Block.
As seen in Figure 1.20, the yields in one of the plots within a block may be
greater than those on another plot in the same block for all levels of nitro.
For example, in block III the plot that was planted with the Marvellous
variety had greater yields than the other two plots at each level of nitro.
The random effect at the level of Variety %in% Block allows shifts like this
that may be related to the fertility of the soil in that plot, for example.
On the other hand, the fixed-effects term for Variety is used to model a
systematic difference in the yields that would be due to the variety of oats
planted in the plot. There do not appear to be such systematic differences.
For example, even though the plot planted with the Marvellous variety is
the highest yielding plot in block III, the Marvellous plot is one of the
lowest yielding in block V.
Because the fixed effect for Variety and the random effect for Variety
%in% Block are modeling different types of behavior, it makes sense to
remove the fixed effect while retaining the random effect
> fm3Oats <- update( fm1Oats, yield ~ ordered( nitro ) )
> summary( fm3Oats )
...
Random effects:
 Formula: ~ 1 | Block
        (Intercept)
StdDev:      14.506

 Formula: ~ 1 | Variety %in% Block
        (Intercept) Residual
StdDev:      11.039    12.75

Fixed effects: yield ~ ordered(nitro)
                   Value Std.Error DF t-value p-value
(Intercept)       103.97    6.6406 51  15.657  <.0001
ordered(nitro).L   32.94    3.0052 51  10.963  <.0001
ordered(nitro).Q   -5.17    3.0052 51  -1.719  0.0916
ordered(nitro).C   -0.45    3.0052 51  -0.149  0.8823
We see that the estimates for the random-effects variances and the
fixed effects for nitro have changed very little, if at all.
We can now examine the effect of nitrogen in more detail. We notice that
the linear term, ordered(nitro).L, is highly significant, but the quadratic
and cubic terms (.Q and .C extensions) are not. To remove the cubic and
quadratic terms from the model, we simply revert to using nitro as a numeric
variable.
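The .L, .Q, and .C terms are orthonormal polynomial contrasts over the equally spaced nitrogen concentrations. The sketch below builds such contrasts by Gram–Schmidt orthonormalization of the powers of the levels; it mimics, but is not, the S construction for ordered factors:

```python
import math

def poly_contrasts(levels):
    # Orthonormal polynomial contrasts for the given factor levels,
    # built by Gram-Schmidt on the columns 1, x, x^2, ...
    k = len(levels)
    cols = [[x ** p for x in levels] for p in range(k)]
    ortho = []
    for col in cols:
        v = col[:]
        for u in ortho:
            proj = sum(a * b for a, b in zip(u, col))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        ortho.append([a / norm for a in v])
    return ortho[1:]   # drop the constant column: linear, quadratic, cubic

# contrasts for the four nitrogen concentrations in the Oats data
lin, quad, cub = poly_contrasts([0.0, 0.2, 0.4, 0.6])
# lin is proportional to (-3, -1, 1, 3): the straight-line trend across
# the four doses, which is the pattern ordered(nitro).L measures
```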
> fm4Oats <-
+     lme( yield ~ nitro, data = Oats, random = ~ 1 | Block/Variety )
> summary( fm4Oats )
...
Random effects:
 Formula: ~ 1 | Block
        (Intercept)
StdDev:      14.506

 Formula: ~ 1 | Variety %in% Block
        (Intercept) Residual
StdDev:      11.005   12.867

Fixed effects: yield ~ nitro
             Value Std.Error DF t-value p-value
(Intercept) 81.872    6.9453 53  11.788  <.0001
      nitro 73.667    6.7815 53  10.863  <.0001
 Correlation:
      (Intr)
nitro -0.293

Standardized Within-Group Residuals:
     Min       Q1      Med      Q3   Max
 -1.7438 -0.66475 0.017104 0.54299 1.803

Number of Observations: 72
Number of Groups:
 Block Variety %in% Block
     6                 18
We can see that the random effects at the Block and plot levels account
for a substantial amount of the variability in the response. Although the
standard deviations of these random effects are not estimated very precisely,
it does not appear reasonable that they could be zero. To check this we
would fit models without these random effects and use likelihood ratio tests
to compare them to fm4Oats. We do not show that here.
FIGURE 1.21. Observed and predicted yields in bushels/acre for three different
varieties of oats at four different concentrations of nitrogen (hundredweight/acre)
by block and variety. Although the model has a random effect for variety, the
whole-plot factor, there is no fixed effect for variety.
The predictions shown in Figure 1.21 do not show any systematic lack of fit. (The extra arguments in the plot
call are used to enhance the appearance of the plot. They are described in
§3.3.)
We could (and did) examine other common diagnostic plots to check for
inadequacies in this model, but did not find any. We now have a simple,
adequate model to explain the dependence of the response on both the levels
of nitrogen and the random effects of blocks and of plots within blocks.
Exercises
1. The PBIB data (Appendix A.22) are from an agricultural experiment
that was laid out as a partially balanced incomplete block design. This
is described in more detail in §2.4.2. The roles of the variables in these
data are indicated by the names: response, Treatment, and Block. The
structure is similar to that of the ergoStool data but, because there
are only four observations in each block and there are 15 levels of
Treatment, each block receives only a subset of the treatments.
(a) Plot the PBIB data with plot(PBIB). Do there appear to be
systematic differences between blocks?
(b) Create a design plot, like Figure 1.6 (p. 14), for the PBIB data.
(c) Fit a linear mixed-effects model, with fixed effects for Treatment
and random effects for the intercept by Block, to the PBIB data.
The call to lme would be like that used to fit fm1Stool in §1.2.
(d) Apply anova to the fitted model. Is the Treatment term
significant? Describe the hypothesis being tested.
(e) Create a plot of the residuals versus the fitted values for this
fitted model. This plot is like Figure 1.4. Does this plot indicate
greater variance in the residuals at higher levels of the response?
(f) Create a plot of the standardized residuals versus the fitted
values by Block. This plot is like Figure 1.8 (p. 21). Does this plot
indicate systematic patterns in the residuals by Block?
We will discuss the anova results for this fitted model in more detail
in §2.4.2.
2. The Oxboys data described in §3.1 and Appendix A.19 consist of the
heights of 26 boys from Oxford, England, each measured on nine
different occasions. The structure is similar to that of the OrthoFem
data of §1.4.
(a) Plot the data (using plot(Oxboys)) and verify that a simple
linear regression model gives a suitable representation of the boys'
growth patterns. Do there appear to be significant differences in
the individual growth patterns?
(b) Fit a simple linear regression model to height versus age using
the lm function, ignoring the Subject effects. Obtain the boxplots
of the residuals by Subject with bwplot(Subject ~ resid(object),
Oxboys), where object should be replaced with the name of the
fitted lm object. Explain the observed pattern.
(c) Use the lmList function to fit separate simple linear regression
models for each Subject, using a call similar to the one used to
produce fm1OrthF.lis in §1.4. Compare the boxplots of the
residuals by Subject for the lmList fit (obtained with plot(object,
Subject ~ resid(.)), with object replaced with the name of the
lmList object) to those obtained for the lm fit. Compare also the
residual standard errors from the two fits and comment.
(d) Plot the individual confidence intervals on the parameters
estimated in the lmList fit and verify that both the intercept and
the slope vary significantly with Subject.
(e) Use the lme function to fit an LME model to the data with
random effects for both the intercept and the slope, using a call
similar to the one used to obtain fm1OrthF in §1.4. Examine the
boxplots of the residuals by Subject, comparing them to those
obtained for the lm and lmList fits.
(f) Produce the plot of the standardized residuals versus fitted
values (plot(object)) and the normal plot of the standardized
residuals (qqnorm(object)). (In both cases object should be
replaced with the name of the lme object.) Can you identify any
departures from the model's assumptions?
(g) Plot the augmented predictions for the lme fit (obtained with
plot(augPred(object))). Do the linear models for each subject
appear adequate?
(h) Another way of assessing the linear models for each subject is
to plot the residuals versus age by Subject (use plot(object,
resid(.) ~ age | Subject), replacing object with the name of
the lme object). Several subjects have a noticeable "scooping"
pattern in their residuals, indicating the need for a model with
curvature.
(i) Use the lmList function to fit separate quadratic models for each
subject. A quadratic model in age, as shown in fm1Pixel of §1.5,
would be fit with lmList(height ~ age + age^2, Oxboys).
(j) Examine a plot of the confidence intervals on coefficients from
this second lmList fit. Are there indications that the coefficients
differ between subjects? Are the quadratic coefficients significantly
different from zero for some subjects?
(k) Fit the full mixed-effects model corresponding to the last lmList
fit. The model will have linear and quadratic terms for age in the
fixed effects and the random effects. A simple way to describe
this model is lme(object), replacing object with the name of the
lmList fit.
(l) Check residual plots and numerical summaries for this lme model.
Do there appear to be deficiencies in the fit? Do there appear to
be terms in the model that could be eliminated?
3. The LME model used for the Pixel data in §1.5 uses random effects for
the intercept and the slope at the Dog level and a single random effect
for the intercept at the Side within Dog level. We did not discuss there
how that random-effects model was chosen. The lmList function can
be used with multilevel data to investigate which terms in an LME
model require random effects.
(a) Use lmList to fit a separate quadratic model in day for each Dog.
Print the fitted object and examine the estimated coefficients.
Can you explain the error message printed in the lmList fit? Notice
that lmList was able to recover from the error and proceed
to normal completion.
(b) Plot the individual confidence intervals for the coefficients in the
lmList fit. Verify that only the intercept and the linear coefficient
seem to vary significantly with Dog.
(c) Use the level argument to lmList to fit separate quadratic
models in day for each Side within Dog (use Dog/Side as the grouping
expression and set level = 2). Print the fitted object using summary
and explain the missing values (NA) for the standard errors of Dog
10.
(d) Plot the individual confidence intervals for the coefficients in
the lmList fit by Side within Dog and verify that there is more
variation among the intercepts and the linear coefficients than
among the quadratic coefficients.
(e) Fit an LME model with random effects for the intercept and the
linear term at both levels of grouping. Compare the resulting
lme fit to the fm1Pixel object in §1.5 using anova. Which model
should be preferred?
4. The Alfalfa data described in Appendix A.1 are another example of a
split-plot experiment. The structure is similar to that of the Oats data
of §1.6: a 3 × 4 full factorial on varieties of alfalfa and date of third
cutting is used with 6 blocks each subdivided into 4 plots according
to a split-plot arrangement. The whole-plot treatments are given by
the varieties and the subplot treatments by the date of third cutting.
(a) Plot the data (using plot(Alfalfa)). Do there appear to be
cutting dates that are consistently worse/better than the others?
What can you say about the block-to-block variation in the
yields?
(b) Use lme to fit a two-level LME model with grouping
structure Block/Variety, including a single random intercept for each
level of grouping (i.e., random = ~ 1 | Block/Variety). Assume
a full factorial structure with main effects and interactions for
the fixed effects (i.e., fixed = Yield ~ Date * Variety). Use the
treatment contrasts (options(contrasts = c("contr.treatment",
"contr.poly"))) to get more interpretable coefficients for the
fixed effects.
(c) Examine the significance of the terms in the model using anova,
verifying that there are no significant differences between
varieties and no significant interactions between varieties and cutting
dates.
(d) Because the data are balanced, a similar ANOVA model can
be fit using aov and the Error function (use aov(Yield ~ Date
* Variety + Error(Block/Variety), Alfalfa)). Compare the
results from the aov and lme fits, in particular the F-values and
p-values for testing the terms in the fixed-effects model (these are
obtained for the aov object using the summary function). In this
case, because of the balanced structure of the data, the REML
fit (obtained with lme) and the ANOVA fit (obtained with aov)
are identical.
(e) Refit the LME model using fixed effects for Date only (a simple
way to do this is to use update(object, Yield ~ Date), where
object should be replaced with the name of the previous lme
object). Print the resulting object using summary and investigate
the differences between the cutting dates (recall that, for the
treatment contrasts, the coefficients represent differences with
respect to the cutting date labelled None). Can you identify a
trend in the effect of cutting date on yield?
(f) Examine the plot of the residuals versus fitted values and the
normal plot of the residuals. Can you identify any departures
from the LME model's assumptions?
2
Theory and Computational Methods
for Linear Mixed-Effects Models
In this chapter we present the theory for the linear mixed-effects model
introduced in Chapter 1. A general formulation of LME models is presented
and illustrated with examples. Estimation methods for LME models,
based on the likelihood or the restricted likelihood of the parameters,
are described, together with the computational methods used to implement
them in the lme function. Asymptotic results on the distribution of
the maximum likelihood estimators and the restricted maximum likelihood
estimators are used to derive confidence intervals and hypothesis tests for
the model's parameters.
The purpose of this chapter is to present an overview of the theoretical
and computational aspects of LME models that allows the evaluation of the
strengths and limitations of such models for practical applications. It is not
the purpose of this chapter to present a thorough theoretical description
of LME models. Such a comprehensive treatment of the theory of linear
mixed-effects models can be found, for example, in Searle, Casella and
McCulloch (1992) or in Vonesh and Chinchilli (1997).
Readers who are more interested in the applications of LME models
and the use of the functions and methods in the nlme library to fit such
models can, without loss of continuity, skip this chapter and go straight
to Chapter 3. If you decide to skip this chapter at a first reading, it is
recommended that you return to it (especially §2.1) at a later time to get
a good understanding of the LME model formulation and its assumptions
and limitations.
yi = Xi β + Zi bi + εi ,
    bi ∼ N(0, Ψ),    εi ∼ N(0, σ² I),    i = 1, . . . , M.          (2.1)
Xi =
  | 1 -1 -1 -1 |
  | 1  1 -1 -1 |
  | 1  0  2 -1 |
  | 1  0  0  3 | ,    i = 1, . . . , 6.

The random-effects regression matrices Zi and the relative precision
factor Δ are the same as in the rail example.
Orthodontic Growth Curve in Girls
The orthodontic growth curve data for females presented in §1.4.1 are also
balanced, with M = 11, ni = 4, i = 1, . . . , 11. For the LME model with
random effects for both the intercept and the slope, used to fit the fm2OrthF
object in §1.4.1, we have p = q = 2 and the fixed- and random-effects
regressor matrices are identical and given by

Xi = Zi =
  | 1  8 |
  | 1 10 |
  | 1 12 |
  | 1 14 | ,    i = 1, . . . , 11.

Any square root of the 2 × 2 matrix σ² Ψ⁻¹ can be used as a relative
precision factor Δ in this case.
yij = Xij β + Zi,j bi + Zij bij + εij ,
    bi ∼ N(0, Ψ1),    bij ∼ N(0, Ψ2),    εij ∼ N(0, σ² I),
    i = 1, . . . , M,    j = 1, . . . , Mi ,                        (2.2)

and, with a third level of nesting,

yijk = Xijk β + Zi,jk bi + Zij,k bij + Zijk bijk + εijk ,
    bi ∼ N(0, Ψ1),    bij ∼ N(0, Ψ2),    bijk ∼ N(0, Ψ3),    εijk ∼ N(0, σ² I),
    i = 1, . . . , M,    j = 1, . . . , Mi ,    k = 1, . . . , Mij .
Note that the distinction between, say, the kth horizontal section of the
regressor matrix for the level-2 random effect bij , written Zij,k , and the
jkth horizontal section of the regressor matrix for the level-1 random effect
bi , written Zi,jk , is the position of the comma in the subscripts.
As with a single level of random effects, we will express the variance–
covariance matrices, Ψq , q = 1, . . . , Q, in terms of relative precision factors
Δq .
In this book, we only consider mixed-effects models with a multivariate
normal (or Gaussian) distribution for the random effects and the within-group
errors. Generally we assume that the variance–covariance matrix
Ψq for the level-q random effects can be any positive-definite, symmetric
matrix. In some models we will further restrict the form of Ψq , say by
requiring that it be diagonal or that it be a multiple of the identity.
Those familiar with the multilevel modeling literature (Bryk and Raudenbush,
1992; Goldstein, 1995) may notice that we count levels differently.
In that literature the model (2.1) is called a two-level model because there
are two levels of random variation. Similarly, the model (2.2) is called a
three-level model. We prefer the terminology from the experimental design
literature and count the number of levels as the number of nested levels
of random effects.
    X_ij = [ 1  0.0
             1  0.2
             1  0.4
             1  0.6 ],   Z_i,j = Z_ij = [ 1
                                          1
                                          1
                                          1 ],   i = 1, …, 6,   j = 1, …, 3.
Because all the random effects are scalars, the precision factors are uniquely defined (up to changes in sign) as

    Δ₁ = √(σ²/σ₁²)   and   Δ₂ = √(σ²/σ₂²).
The likelihood is the marginal density of the observed data,

    L(β, θ, σ² | y) = ∏_{i=1}^{M} ∫ p(y_i | b_i, β, σ²) p(b_i | θ, σ²) db_i,          (2.3)

where the conditional density of y_i and the marginal density of b_i are

    p(y_i | b_i, β, σ²) = exp{ −‖y_i − X_i β − Z_i b_i‖² / 2σ² } / (2πσ²)^{n_i/2},    (2.4)

    p(b_i | θ, σ²) = exp{ −‖Δ b_i‖² / 2σ² } abs|Δ| / (2πσ²)^{q/2},                    (2.5)
where |A| denotes the determinant of the matrix A. Substituting (2.4) and (2.5) into (2.3) provides the likelihood as

    L(β, θ, σ² | y)
      = ∏_{i=1}^{M} ∫ exp{ −[ ‖y_i − X_i β − Z_i b_i‖² + ‖Δ b_i‖² ] / 2σ² } abs|Δ|
                        / [ (2πσ²)^{n_i/2} (2πσ²)^{q/2} ] db_i
      = ∏_{i=1}^{M} ∫ exp{ −‖ỹ_i − X̃_i β − Z̃_i b_i‖² / 2σ² } abs|Δ|
                        / [ (2πσ²)^{n_i/2} (2πσ²)^{q/2} ] db_i,                       (2.6)

where

    ỹ_i = [ y_i ],   X̃_i = [ X_i ],   Z̃_i = [ Z_i ]
          [  0  ]           [  0  ]           [  Δ  ]                                 (2.7)

are augmented data vectors and model matrices. This approach of changing the contribution of the marginal distribution of the random effects into extra rows for the response and the design matrices is called a pseudo-data approach because it creates the effect of the marginal distribution by adding pseudo-observations.
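The pseudo-data device can be checked numerically: appending the rows of Δ to Z_i, and matching zeros to the response, turns the penalized problem into an ordinary least-squares problem. A Python sketch with an assumed toy design (not the book's S code):

```python
import numpy as np

rng = np.random.default_rng(0)
ni, q = 8, 2
Zi = np.column_stack([np.ones(ni), np.arange(ni)])   # assumed toy random-effects design
Delta = np.array([[1.2, 0.0],
                  [0.0, 0.8]])                       # assumed precision factor
yi = rng.normal(size=ni)                             # stands in for y_i - X_i beta

# Pseudo-data: append q zero "observations" to y and the rows of Delta to Z.
y_aug = np.concatenate([yi, np.zeros(q)])
Z_aug = np.vstack([Zi, Delta])

# Ordinary least squares on the augmented problem ...
b_aug, *_ = np.linalg.lstsq(Z_aug, y_aug, rcond=None)

# ... equals the penalized (ridge-type) solution (Z'Z + Delta'Delta)^{-1} Z'y.
b_pen = np.linalg.solve(Zi.T @ Zi + Delta.T @ Delta, Zi.T @ yi)
assert np.allclose(b_aug, b_pen)
```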
The exponent in the integral of (2.6) is in the form of a squared norm or, more specifically, a residual sum-of-squares. We can determine the conditional modes of the random effects given the data, written b̂_i, by minimizing this residual sum-of-squares. This is a standard least-squares problem for which we could write the solution as

    b̂_i = ( Z̃_iᵀ Z̃_i )⁻¹ Z̃_iᵀ ( ỹ_i − X̃_i β ).
The squared norm can then be expressed as

    ‖ỹ_i − X̃_i β − Z̃_i b_i‖²
      = ‖ỹ_i − X̃_i β − Z̃_i b̂_i‖² + ‖Z̃_i ( b_i − b̂_i )‖²
      = ‖ỹ_i − X̃_i β − Z̃_i b̂_i‖² + ( b_i − b̂_i )ᵀ Z̃_iᵀ Z̃_i ( b_i − b̂_i ).        (2.8)
The first term in (2.8) does not depend on b_i, so its exponential can be factored out of the integral in (2.6). Integrating the exponential of the second term in (2.8) is equivalent, up to a constant, to integrating a multivariate normal density function. Note that

    ∫ exp{ −( b_i − b̂_i )ᵀ Z̃_iᵀ Z̃_i ( b_i − b̂_i ) / 2σ² } √|Z̃_iᵀ Z̃_i| / (2πσ²)^{q/2} db_i = 1,

so that

    ∫ exp{ −( b_i − b̂_i )ᵀ Z̃_iᵀ Z̃_i ( b_i − b̂_i ) / 2σ² } / (2πσ²)^{q/2} db_i
      = 1 / √|Z̃_iᵀ Z̃_i| = 1 / √|Z_iᵀ Z_i + Δᵀ Δ|.                                   (2.9)
By combining (2.8) and (2.9) we can express the integral in (2.6) as

    ∫ exp{ −‖ỹ_i − X̃_i β − Z̃_i b_i‖² / 2σ² } / (2πσ²)^{q/2} db_i
      = exp{ −‖ỹ_i − X̃_i β − Z̃_i b̂_i‖² / 2σ² } / √|Z̃_iᵀ Z̃_i|

to give

    L(β, θ, σ² | y) = (2πσ²)^{−N/2}
        exp{ −∑_{i=1}^{M} ‖ỹ_i − X̃_i β − Z̃_i b̂_i‖² / 2σ² }
        ∏_{i=1}^{M} abs|Δ| / √|Z̃_iᵀ Z̃_i|.                                           (2.10)
where

    X_e = [ Z̃₁   0   ⋯   0    X̃₁
             0   Z̃₂  ⋯   0    X̃₂
             ⋮    ⋮   ⋱   ⋮    ⋮
             0    0   ⋯  Z̃_M  X̃_M ]   and   y_e = [ ỹ₁
                                                      ỹ₂
                                                      ⋮
                                                      ỹ_M ].                          (2.11)
For a fixed value of θ, the conditional maximum likelihood estimate of σ² is

    σ̂²(θ) = ∑_{i=1}^{M} ‖ỹ_i − X̃_i β̂(θ) − Z̃_i b̂_i‖² / N.                          (2.12)

Notice that the maximum likelihood estimate of σ² is the residual sum-of-squares divided by N, not by N − p.
Substituting these conditional estimates back into (2.10) provides the profiled likelihood

    L(θ) = L( β̂(θ), θ, σ̂²(θ) )
         = exp(−N/2) / [ 2π σ̂²(θ) ]^{N/2} ∏_{i=1}^{M} abs|Δ| / √|Z̃_iᵀ Z̃_i|.        (2.13)
We do not actually need to calculate the values of b̂₁, …, b̂_M or β̂(θ) to evaluate the profiled likelihood. We only need to know the norm of the residual from the augmented least-squares problem. The decomposition methods described in §2.2.2 provide us with fast, convenient methods of calculating this.
An alternative is to work with the marginal distribution of the responses directly. The model can be written

    y_i = X_i β + ε_i*,   i = 1, …, M,                                                (2.14)

where ε_i* = Z_i b_i + ε_i. Because the ε_i* are the sum of two independent multivariate normal random vectors, they are independently distributed as multivariate normal vectors with mean 0 and variance–covariance matrix σ² Σ_i, where Σ_i = I + Z_i Ψ Z_iᵀ / σ². It then follows from (2.14) that the y_i are independent multivariate normal random vectors with mean X_i β and variance–covariance matrix σ² Σ_i. That is,

    p(y_i | β, θ, σ²) = (2πσ²)^{−n_i/2} |Σ_i|^{−1/2}
                        exp{ −( y_i − X_i β )ᵀ Σ_i⁻¹ ( y_i − X_i β ) / 2σ² }.
For a given value of θ, the values of β and σ² that maximize the likelihood could be written as

    β̂(θ) = ( ∑_{i=1}^{M} X_iᵀ Σ_i⁻¹ X_i )⁻¹ ∑_{i=1}^{M} X_iᵀ Σ_i⁻¹ y_i,

    σ̂²(θ) = ∑_{i=1}^{M} ( y_i − X_i β̂(θ) )ᵀ Σ_i⁻¹ ( y_i − X_i β̂(θ) ) / N.
Computationally these expressions are much more difficult than (2.11) and (2.12). Using these expressions for β̂(θ) and σ̂²(θ) we could derive the profiled likelihood or log-likelihood.

We present these expressions for completeness only. We prefer to use the expressions from the pseudo-data representation for computation, especially when the pseudo-data representation is combined with the orthogonal-triangular decompositions described in the next section.
    X_i = [ 1   8
            1  10
            1  12
            1  14 ],   i = 1, …, 11.
We can generate such a matrix in S and create its decomposition by

> Xmat <- matrix( c(1, 1, 1, 1, 8, 10, 12, 14), ncol = 2 )
> Xmat
     [,1] [,2]
[1,]    1    8
[2,]    1   10
[3,]    1   12
[4,]    1   14
> Xqr <- qr( Xmat )              # creates a QR structure
> qr.R( Xqr )                    # returns R
     [,1]     [,2]
[1,]   -2 -22.0000
[2,]    0  -4.4721
> qr.Q( Xqr )                    # returns Q-truncated
     [,1]     [,2]
[1,] -0.5  0.67082
[2,] -0.5  0.22361
[3,] -0.5 -0.22361
[4,] -0.5 -0.67082
> qr.Q( Xqr, complete = TRUE )   # returns the full Q
     [,1]     [,2]      [,3]  ...
[1,] -0.5  0.67082  0.023607  ...
[2,] -0.5  0.22361 -0.439345  ...
[3,] -0.5 -0.22361  0.807869  ...
[4,] -0.5 -0.67082 -0.392131  ...
Writing the decomposition as

    X = Q [ R
            0 ],

the squared residual norm in the least-squares problem can be expressed as

    ‖y − X β‖² = ‖Qᵀ y − Qᵀ X β‖² = ‖c₁ − R β‖² + ‖c₂‖²,

where c = [c₁ᵀ, c₂ᵀ]ᵀ = Qᵀ y is the rotated residual vector. The components c₁ and c₂ are of lengths p and n − p, respectively. Because X has rank p, the p × p matrix R is nonsingular and upper triangular. The least-squares solution β̂ is easily evaluated as the solution to

    R β̂ = c₁,

and the residual sum-of-squares is ‖c₂‖². Notice that the residual sum-of-squares can be evaluated without having to calculate β̂.
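A numerical sketch of this device in Python (the design matrix and response are random stand-ins): the rotated residual component c₂ gives the residual sum-of-squares before β̂ is ever formed.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 12, 3
X = rng.normal(size=(n, p))   # assumed full-rank design
y = rng.normal(size=n)

# Full QR: X = Q [R; 0] with Q n x n orthogonal, R p x p upper triangular.
Q, R = np.linalg.qr(X, mode="complete")
c = Q.T @ y
c1, c2 = c[:p], c[p:]

# Least-squares solution from R beta = c1 ...
beta = np.linalg.solve(R[:p], c1)
assert np.allclose(beta, np.linalg.lstsq(X, y, rcond=None)[0])

# ... and the residual sum-of-squares is ||c2||^2, available without beta.
rss = np.sum((y - X @ beta) ** 2)
assert np.allclose(rss, np.sum(c2 ** 2))
```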
We form the orthogonal-triangular decomposition of the augmented random-effects regressor matrix

    Z̃_i = Q_(i) [ R_11(i)
                     0    ],

where Q_(i) is (n_i + q) × (n_i + q) and R_11(i) is q × q. Then

    ‖ỹ_i − X̃_i β − Z̃_i b_i‖² = ‖Q_(i)ᵀ ( ỹ_i − X̃_i β − Z̃_i b_i )‖²
      = ‖c_1(i) − R_10(i) β − R_11(i) b_i‖² + ‖c_0(i) − R_00(i) β‖²,

where the q × p matrix R_10(i), the n_i × p matrix R_00(i), the q-vector c_1(i), and the n_i-vector c_0(i) are defined by

    [ R_10(i) ] = Q_(i)ᵀ X̃_i   and   [ c_1(i) ] = Q_(i)ᵀ ỹ_i.
    [ R_00(i) ]                       [ c_0(i) ]
Another way of thinking of these matrices is as components in an orthogonal-triangular decomposition of an augmented matrix

    [ Z̃_i  X̃_i  ỹ_i ] = Q_(i) [ R_11(i)  R_10(i)  c_1(i)
                                    0      R_00(i)  c_0(i) ],

where the reduction to triangular form is halted after the first q columns. (The peculiar numbering scheme for the submatrices and subvectors is designed to allow easy extension to more than one level of random effects.)
Returning to the integral in (2.6), we can now remove a constant factor and reduce it to

    ∫ exp{ −‖ỹ_i − X̃_i β − Z̃_i b_i‖² / 2σ² } / (2πσ²)^{q/2} db_i
      = exp{ −‖c_0(i) − R_00(i) β‖² / 2σ² }
        ∫ exp{ −‖c_1(i) − R_10(i) β − R_11(i) b_i‖² / 2σ² } / (2πσ²)^{q/2} db_i.     (2.15)

Because R_11(i) is nonsingular, we can perform a change of variable to φ_i = ( c_1(i) − R_10(i) β − R_11(i) b_i ) / σ, with differential dφ_i = σ^{−q} abs|R_11(i)| db_i, and write the integral as

    ∫ exp{ −‖c_1(i) − R_10(i) β − R_11(i) b_i‖² / 2σ² } / (2πσ²)^{q/2} db_i
      = ( 1 / abs|R_11(i)| ) ∫ exp{ −‖φ_i‖² / 2 } / (2π)^{q/2} dφ_i
      = 1 / abs|R_11(i)|.                                                            (2.16)
A final orthogonal-triangular decomposition of the stacked matrix

    [ R_00(1)  c_0(1) ]
    [    ⋮       ⋮    ]  =  Q [ R_00  c_0
    [ R_00(M)  c_0(M) ]          0    c_−1 ]                                         (2.17)

produces the reduced form

    L(β, θ, σ² | y) = (2πσ²)^{−N/2}
        exp{ −[ ‖c_0 − R_00 β‖² + ‖c_−1‖² ] / 2σ² }
        ∏_{i=1}^{M} abs( |Δ| / |R_11(i)| ).                                          (2.18)
The conditional estimates are then

    β̂(θ) = R_00⁻¹ c_0   and   σ̂²(θ) = ‖c_−1‖² / N,                                  (2.19)
giving the profiled likelihood

    L(θ | y) = [ N / ( 2π ‖c_−1‖² ) ]^{N/2} exp( −N/2 )
               ∏_{i=1}^{M} abs( |Δ| / |R_11(i)| )                                    (2.20)

and the profiled log-likelihood

    ℓ(θ | y) = (N/2) [ log N − log(2π) − 1 ] − N log ‖c_−1‖
               + ∑_{i=1}^{M} log abs( |Δ| / |R_11(i)| ).                             (2.21)
The profiled log-likelihood (2.21) is maximized with respect to θ, producing the maximum likelihood estimate θ̂. The maximum likelihood estimates β̂ and σ̂² are then obtained by setting θ = θ̂ in (2.19).
Although technically the random effects b_i are not parameters for the statistical model, they do behave in some ways like parameters and often we want to estimate their values. The conditional modes of the random effects, evaluated at the conditional estimate of β, are the Best Linear Unbiased Predictors, or BLUPs, of the b_i, i = 1, …, M. They can be evaluated, using the matrices from the orthogonal-triangular decompositions, as

    b̂_i(θ) = R_11(i)⁻¹ [ c_1(i) − R_10(i) β̂(θ) ].                                   (2.22)

In practice, the unknown vector θ is replaced by its maximum likelihood estimate θ̂, producing estimated BLUPs b̂_i(θ̂).
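Formula (2.22) can be checked against the direct normal-equations solution of the augmented problem. A Python sketch with assumed toy values for Δ and β:

```python
import numpy as np

rng = np.random.default_rng(3)
ni, p, q = 9, 2, 2
Xi = np.column_stack([np.ones(ni), rng.normal(size=ni)])   # assumed designs
Zi = np.column_stack([np.ones(ni), np.arange(ni)])
yi = rng.normal(size=ni)
Delta = np.array([[1.0, 0.2],
                  [0.0, 0.7]])   # assumed precision factor
beta = np.array([0.5, -0.3])     # assumed fixed-effects value

# Augmented arrays and the QR decomposition of Z_tilde.
Zt = np.vstack([Zi, Delta])
Xt = np.vstack([Xi, np.zeros((q, p))])
yt = np.concatenate([yi, np.zeros(q)])
Q, R = np.linalg.qr(Zt, mode="complete")
R11 = R[:q]
R10, _ = np.split(Q.T @ Xt, [q])
c1, _ = np.split(Q.T @ yt, [q])

# BLUP via (2.22) ...
b_qr = np.linalg.solve(R11, c1 - R10 @ beta)

# ... agrees with the direct normal-equations solution on the augmented data.
b_ne = np.linalg.solve(Zt.T @ Zt, Zt.T @ (yt - Xt @ beta))
assert np.allclose(b_qr, b_ne)
```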
The decomposition (2.17) is equivalent to calculating the QR decomposition of the potentially huge matrix X_e defined in (2.11). If we determined the least-squares solution to (2.11) using an orthogonal-triangular decomposition

    X_e = Q_e [ R_e
                 0  ],

the triangular part of the decomposition and the leading part of the rotated, augmented response vector would be

    R_e = [ R_11(1)     0     ⋯     0      R_10(1)
               0     R_11(2)  ⋯     0      R_10(2)
               ⋮        ⋮     ⋱     ⋮         ⋮
               0        0     ⋯  R_11(M)   R_10(M)
               0        0     ⋯     0      R_00    ]   and   c₁ = [ c_1(1)
                                                                    c_1(2)
                                                                    ⋮
                                                                    c_1(M)
                                                                    c_0    ].

Thus, the β̂(θ) and σ̂²(θ) from (2.19) are the same as those from (2.17) and (2.12). The vector c_−1 is the residual vector in the coordinate system determined by Q_e. Because Q_e is orthogonal, ‖c_−1‖² is the residual sum-of-squares for the least-squares problem defined by X_e and y_e.
The profiled log-likelihood (2.21) has the same form as the logarithm of (2.13). It consists of three additive components: a constant, a scaled logarithm of the residual sum-of-squares, and a sum of logarithms of ratios of determinants. In the next section we examine these terms in detail.
1. The constant (N/2)[ log N − log(2π) − 1 ], which can be neglected for the purposes of optimization.

2. −N log ‖c_−1‖, a multiple of the logarithm of the norm of the residual vector from the penalized least-squares fit for β given the X_i, Z_i, and ỹ_i.

3. ∑_{i=1}^{M} log( abs|Δ| / abs|R_11(i)| ) = ∑_{i=1}^{M} log √( |ΔᵀΔ| / |Z_iᵀZ_i + ΔᵀΔ| ). In the general case this is the sum of the logarithms of the ratios of determinants.
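For a random-intercept model the whole profiled log-likelihood (2.21) can be evaluated two ways and compared: through the pseudo-data least-squares problem, and through the marginal formulation (2.14). A Python sketch on an assumed toy data set; for a scalar random effect, abs|R_11(i)| = √(n_i + Δ²).

```python
import numpy as np

rng = np.random.default_rng(4)
M, ni, p = 5, 4, 2                # assumed small balanced design
N = M * ni
X = [np.column_stack([np.ones(ni), rng.normal(size=ni)]) for _ in range(M)]
y = [rng.normal(size=ni) for _ in range(M)]
Delta = 1.3                       # fixed value of the scalar precision factor

# Pseudo-data least squares: block-diagonal random-intercept columns (each
# augmented with a Delta pseudo-row) plus the shared fixed-effects columns.
A = np.zeros((N + M, M + p))
r = np.zeros(N + M)
for i in range(M):
    rows = slice(i * ni, (i + 1) * ni)
    A[rows, i] = 1.0              # Z_i block (column of ones)
    A[rows, M:] = X[i]
    r[rows] = y[i]
    A[N + i, i] = Delta           # pseudo-observation row [Delta, 0 | 0]
sol, *_ = np.linalg.lstsq(A, r, rcond=None)
rss = float(np.sum((A @ sol - r) ** 2))   # ||c_{-1}||^2
ll_pseudo = (N / 2) * (np.log(N) - np.log(2 * np.pi) - 1) \
    - (N / 2) * np.log(rss) \
    + M * np.log(Delta / np.sqrt(ni + Delta ** 2))

# Same quantity from the marginal formulation: y_i ~ N(X_i beta, sigma2 * Sigma_i).
Sigma = np.eye(ni) + np.ones((ni, ni)) / Delta ** 2
W = np.linalg.inv(Sigma)
XtWX = sum(Xi.T @ W @ Xi for Xi in X)
XtWy = sum(Xi.T @ W @ yi for Xi, yi in zip(X, y))
beta = np.linalg.solve(XtWX, XtWy)
sigma2 = sum((yi - Xi @ beta) @ W @ (yi - Xi @ beta) for Xi, yi in zip(X, y)) / N
ll_marginal = -(N / 2) * np.log(2 * np.pi * sigma2) \
    - (M / 2) * np.log(np.linalg.det(Sigma)) - N / 2
assert np.allclose(ll_pseudo, ll_marginal)
```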
In Figure 2.1 we show the two nonconstant terms and the resulting log-likelihood as a function of Δ for the rails example. The shapes of the curves in Figure 2.1 indicate that it would be better to optimize the profiled log-likelihood with respect to θ = log Δ instead of Δ. This transformation will also help to ensure that Δ does not become negative during the course of the iterations of whatever optimization routine we use. In Figure 2.2 we show the components and the log-likelihood as a function of θ. We can see that the log-likelihood is closer to a quadratic with respect to θ than with respect to Δ.
There are patterns in Figure 2.2 that will hold in general for linear mixed-effects models. The log of the norm of the residual is an increasing sigmoidal, or S-shaped, function with respect to θ. As Δ → 0 (or θ → −∞), this log-norm approaches a horizontal asymptote at a value that corresponds to the log residual norm from an unpenalized regression of the form

        [ y₁  ]   [ Z₁  0  ⋯  0    X₁  ] [ b₁  ]
    y = [ y₂  ] = [ 0  Z₂  ⋯  0    X₂  ] [  ⋮  ] + ε.
        [  ⋮  ]   [ ⋮   ⋮  ⋱  ⋮    ⋮   ] [ b_M ]
        [ y_M ]   [ 0   0  ⋯  Z_M  X_M ] [  β  ]

As Δ → ∞ (or θ → ∞), the log-norm approaches a horizontal asymptote corresponding to the log residual norm from the regression with the random effects removed,

        [ y₁  ]   [ X₁  ]
    y = [  ⋮  ] = [  ⋮  ] β + ε.
        [ y_M ]   [ X_M ]
In the ratio of determinants term, very large values of Δ will dominate Z_iᵀ Z_i in the denominator, so the ratios approach |ΔᵀΔ| / |ΔᵀΔ| = 1 and the sum of the
FIGURE 2.1. The profiled log-likelihood as a function of Δ for the rails example. Two of the components of the log-likelihood, log ‖c_−1‖, the log of the length of the residual, and ∑_{i=1}^{M} log( Δ / abs|R_11(i)| ), the log of the determinant ratios, are shown on the same scale. The panels plot log(likelihood), log(residual norm), and log(determinant ratio) against Delta.
FIGURE 2.2. The profiled log-likelihood as a function of θ = log(Δ) for the rails example. Two of the components of the log-likelihood, log ‖c_−1‖, the log of the length of the residual, and ∑_{i=1}^{M} log( Δ / abs|R_11(i)| ), the log of the determinant ratios, are shown on the same scale. The panels plot log(likelihood), log(residual norm), and log(determinant ratio) against theta.
logarithms approaches zero. Very small values of Δ will have little effect on the denominator, so the term has the form

    ∑_{i=1}^{M} log( abs|Δ| / √|Z_iᵀ Z_i| ) = M log abs|Δ| − (1/2) ∑_{i=1}^{M} log |Z_iᵀ Z_i|.
The restricted likelihood is obtained by integrating the likelihood with respect to the fixed effects,

    L_R(θ, σ² | y) = ∫ L(β, θ, σ² | y) dβ,
from which the log-restricted-likelihood can be expressed as

    ℓ_R(θ, σ² | y) = −[ (N − p)/2 ] log(2πσ²) − ‖c_−1‖² / 2σ²
        − log abs|R_00| + ∑_{i=1}^{M} log abs( |Δ| / |R_11(i)| ).
This produces the conditional estimate σ̂²_R(θ) = ‖c_−1‖² / (N − p) for σ², from which we obtain the profiled log-restricted-likelihood

    ℓ_R(θ | y) = ℓ_R( θ, σ̂²_R(θ) | y )
        = [ (N − p)/2 ] [ log(N − p) − log(2π) − 1 ] − (N − p) log ‖c_−1‖
          − log abs|R_00| + ∑_{i=1}^{M} log abs( |Δ| / |R_11(i)| ).                  (2.23)
The components of the profiled log-restricted-likelihood in (2.23) are similar to those in the profiled log-likelihood (2.21), except that the log of the norm of the residual vector has a different multiplier and there is an extra determinant term of −log abs|R_00| = −log | ∑_{i=1}^{M} X_iᵀ Σ_i⁻¹ X_i | / 2. A plot of the components of the profiled log-restricted-likelihood versus θ for the rails example would be similar in shape to Figure 2.2.
The evaluation of the restricted maximum likelihood estimates is done by optimizing the profiled log-restricted-likelihood (2.23) with respect to θ only, and using the resulting REML estimate θ̂_R to obtain the REML estimate of σ², σ̂²_R(θ̂_R). Similarly, the REML estimated BLUPs of the random effects are obtained by replacing θ with θ̂_R in (2.22).

In some ways, it is blurring the definition of the REML criterion to speak of the REML estimate of β. The REML criterion only depends on θ and σ². However, it is still useful, and perhaps even sensible, to evaluate the best guess at β from (2.19) once θ̂_R has been determined using the REML criterion.
An important difference between the likelihood function and the restricted likelihood function is that the former is invariant to one-to-one reparameterizations of the fixed effects (e.g., a change in the contrasts representing a categorical variable), while the latter is not. Changing the X_i matrices results in a change in log abs|R_00| and a corresponding change in ℓ_R(θ | y). As a consequence, LME models with different fixed-effects structures fit using REML cannot be compared on the basis of their restricted likelihoods. In particular, likelihood ratio tests are not valid under these circumstances.
With two nested levels of random effects, the likelihood involves nested integrals,

    L(β, θ, σ² | y) = ∏_{i=1}^{M} ∫ [ ∏_{j=1}^{M_i} ∫ p( y_ij | b_i, b_ij, β, σ² )
        p( b_ij | θ₂, σ² ) db_ij ] p( b_i | θ₁, σ² ) db_i.                           (2.24)
As with a single level of random effects, we can simplify the integrals in (2.24) if we augment the Z_ij matrices with Δ₂ and form orthogonal-triangular decompositions of these augmented arrays. This allows us to evaluate the inner integrals. To evaluate the outer integrals we iterate this process.
That is, we first form and decompose the arrays

    [ Z_ij  Z_i,j  X_ij  y_ij ] = Q_(ij) [ R_22(ij)  R_21(ij)  R_20(ij)  c_2(ij)
    [  Δ₂     0     0     0   ]             0        R_11(ij)  R_10(ij)  c_1(ij) ],

    i = 1, …, M,   j = 1, …, M_i.                                                    (2.25)
We then stack the results for each first-level group, augment with Δ₁, and decompose again:

    [ R_11(i1)     R_10(i1)     c_1(i1)   ]
    [    ⋮            ⋮            ⋮      ]  =  Q_(i) [ R_11(i)  R_10(i)  c_1(i)
    [ R_11(iM_i)  R_10(iM_i)  c_1(iM_i)   ]              0       R_00(i)  c_0(i) ],
    [    Δ₁           0            0      ]

    i = 1, …, M.                                                                     (2.26)
The final decomposition, which produces R_00, c_0, and c_−1, is the same as that in (2.17). Using the matrices and vectors produced in (2.25), (2.26), and (2.17), and following the same steps as for the single level of nesting, we can express
k = 1, . . . , Q.
A = U Λ Uᵀ,
Any iterative optimization algorithm requires initial values for the parameters. Because we can express both the profiled log-likelihood and the profiled log-restricted-likelihood as functions of θ, we only need to formulate starting values for θ when performing iterative optimization for LME models. These may be obtained from a previous fit for similar data, or derived from the current data. A general procedure for deriving initial values for θ from the data being fit is described in Bates and Pinheiro (1998) and is implemented in the lme function.
Individual iterations of the EM algorithm are quickly and easily computed. Although the EM iterations generally bring the parameters into the region of the optimum very quickly, progress toward the optimum tends to be slow when near the optimum. Newton–Raphson iterations, on the other hand, are individually more computationally intensive than the EM iterations, and they can be quite unstable when far from the optimum. However, close to the optimum they converge very quickly.

We therefore recommend a hybrid approach of starting with an initial θ⁽⁰⁾, performing a moderate number of EM iterations, then switching to Newton–Raphson iterations. Essentially the EM iterations can be regarded as refining the starting estimates before beginning the more general optimization routine. The lme function implements such a hybrid optimization scheme. It begins by calculating initial estimates of the parameters, then uses several EM iterations to get near the optimum, then switches to Newton–Raphson iterations to complete the convergence to the optimum. By default 25 EM iterations are performed before switching to Newton–Raphson iterations.
When fitting an LME model, it is often helpful to monitor the progress of the Newton–Raphson iterations to identify possible convergence problems. This is done by including an optional control argument in the call to lme. The value of control should be a list that can contain any of several flags or settings for the optimization algorithm. One of these flags is msVerbose. When it is set to TRUE or T, diagnostic output on the progress of the Newton–Raphson iterations in the indirect call of the ms function (Bates and Chambers, 1992, §10.2) is produced.

If we set this flag in the first fit for the rails example of §1.1, the diagnostic output is not very interesting because the EM iterations leave the parameter estimates so close to the optimum that convergence of the Newton–Raphson iterations is declared almost immediately.
> fm1Rail.lme <- lme( travel ~ 1, data = Rail, random = ~ 1 | Rail,
+                     control = list( msVerbose = TRUE ) )
Iteration: 0
Parameters:
[1] -1.8196
Iteration: 1 , ... function calls, F= 61.049
Parameters:
[1] -1.8196

The algorithm converged to a slightly different value of θ, but with essentially the same value of the log-likelihood.
The maximum likelihood estimates are consistent and asymptotically normal, with approximate distributions

    β̂ ≈ N( β, σ² R_00⁻¹ R_00⁻ᵀ ),

    [ θ̂₁   ]      [ θ₁   ]
    [  ⋮   ] ≈ N( [  ⋮   ],  I⁻¹( θ₁, …, θ_Q, log σ ) ),
    [ θ̂_Q  ]      [ θ_Q  ]
    [log σ̂ ]      [log σ ]                                                           (2.27)
where

    I( θ₁, …, θ_Q, log σ ) = − [ ∂²ℓ/∂θ₁∂θ₁ᵀ       ⋯   ∂²ℓ/∂θ₁∂ log σ
                                      ⋮            ⋱         ⋮
                                 ∂²ℓ/∂ log σ ∂θ₁ᵀ  ⋯   ∂²ℓ/∂(log σ)²  ],

with ℓ = ℓ( θ₁, …, θ_Q, log σ ) now denoting the log-likelihood function profiled on the fixed effects, I denoting the empirical information matrix, and R_00 defined as in (2.17). We use log σ in place of σ² in (2.27) to give an unrestricted parameterization for which the normal approximation tends to be more accurate.
As shown by Pinheiro (1994), the REML estimates in an LME model also are consistent and asymptotically normal, with approximate distributions identical to (2.27) but with ℓ replaced by the log-restricted-likelihood ℓ_R defined in §2.2.5.

In practice, the unknown parameters θ₁, …, θ_Q and σ² are replaced by their respective ML or REML estimates in the expressions for the approximate variance–covariance matrices in (2.27). The approximate distributions for the maximum likelihood estimates and REML estimates are used to produce hypothesis tests and confidence intervals for the LME model parameters, as described in §2.4.
model or to compare how well one model fits the data relative to another model. This section presents approximate hypothesis tests and confidence intervals for the parameters in an LME model.

The anova function also displays the values of the Akaike Information Criterion (AIC) (Sakamoto et al., 1986) and the Bayesian Information Criterion (BIC) (Schwarz, 1978). As mentioned in §1.1.1, these are model
comparison criteria evaluated as

    AIC = −2 ℓ( θ̂ | y ) + 2 n_par,
    BIC = −2 ℓ( θ̂ | y ) + n_par log(N)                                              (2.28)

for each model, where n_par denotes the number of parameters in the model. Under these definitions, smaller is better. That is, if we are using AIC to compare two or more models for the same data, we prefer the model with the lowest AIC. Similarly, when using BIC we prefer the model with the lowest BIC. The REML versions of the AIC and the BIC simply replace ℓ( θ̂ | y ) by ℓ_R( θ̂ | y ) in (2.28).
We will generally use likelihood-ratio tests to evaluate the significance of terms in the random-effects structure. That is, we fit different nested models in which the random-effects structure changes and apply likelihood-ratio tests. Stram and Lee (1994), using the results of Self and Liang (1987), argued that tests on the random-effects structure conducted in this way can be conservative. That is, the p-value calculated from the χ²_{k₂−k₁} distribution is greater than it should be. As Stram and Lee (1994) explain, changing from the more general model to the more specific model involves setting the variance of certain components of the random effects to zero, which is on the boundary of the parameter region. The asymptotic results for likelihood ratio tests have to be adjusted for boundary conditions. In the next section we use simulations to demonstrate the effect of these adjustments.
Simulating Likelihood Ratio Test Statistics

One way to check on the distribution of the likelihood ratio test statistic under the null hypothesis is through simulation. The simulate.lme function takes two model specifications, the null model and the alternative model. These may be given as lme objects corresponding to each model, or as lists of arguments used to produce such fits. In the latter case, only those characteristics that change between the two models need to be specified in the argument list for the alternative model.

For example, in the analysis of the OrthoFem data presented in §1.4.1, the fm1OrthF fit to the OrthoFem data has the specification
> fm1OrthF <- lme( distance ~ age, data = OrthoFem,
+
random = ~ 1 | Subject )
while fm2OrthF is t as
> fm2OrthF <- update( fm1OrthF, random = ~ age | Subject )
In fm1OrthF the Z_i matrices are

    Z₁ = Z₂ = ⋯ = Z₁₁ = [ 1
                          1
                          1
                          1 ]

and the b_i are one-dimensional random vectors with variance Ψ = σ₁². In fm2OrthF the Z_i matrices are

    Z₁ = Z₂ = ⋯ = Z₁₁ = [ 1   8
                          1  10
                          1  12
                          1  14 ].
FIGURE 2.3. Plots of the nominal versus empirical p-values for the likelihood ratio test statistic comparing two models for the orthodontic data, female subjects only. The null model, fm1OrthF, has a random effect for the intercept only. The alternative model, fm2OrthF, has random effects for both the intercept and the slope. The null model was simulated 1000 times, both models were fit to the simulated data, and the likelihood ratio test statistic was calculated for both maximum likelihood and REML estimates. The panels correspond to ML and REML estimation under reference distributions with 1 df, the Mix(1,2) mixture, and 2 df. In each panel, the nominal p-values for the LRT statistics under the corresponding distribution are plotted versus the empirical p-values.
The nominal p-values, for both ML and REML estimation, are plotted versus the empirical p-values, obtained from the empirical distribution of the simulated LRT statistics. For both REML and ML estimates, the nominal p-values for the LRT statistics under a χ² distribution with 2 degrees of freedom are much greater than the empirical p-values. This is the sense in which the likelihood ratio test using χ²₂ for the reference distribution will be conservative: the actual p-value is smaller than the p-value that is reported. Stram and Lee (1994) suggest a 0.5χ²₁ + 0.5χ²₂ mixture as a reference distribution, which is confirmed in Figure 2.3, for both ML and REML estimation. A χ²₁ appears to be anti-conservative in the sense that the nominal p-values are smaller than the empirical p-values.
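The mixture p-value suggested by Stram and Lee is just a weighted average of chi-squared tail probabilities. A small Python sketch (the LRT value 3.84 is an illustrative assumption); for the χ²₀/χ²₁ case the mixture p-value is half the naive χ²₁ p-value, which is one way to see that the naive test is conservative:

```python
import math

def chi2_sf(x, df):
    """Upper tail of a chi-squared distribution (df = 0, 1, or 2 suffice here)."""
    if df == 0:
        return 0.0 if x > 0 else 1.0        # point mass at zero
    if df == 1:
        return math.erfc(math.sqrt(x / 2))  # P(chi2_1 > x)
    if df == 2:
        return math.exp(-x / 2)             # P(chi2_2 > x)
    raise ValueError(df)

def mixture_pvalue(lrt, df_small, df_large):
    """Equal-weight chi-squared mixture p-value, as suggested by Stram and Lee."""
    return 0.5 * chi2_sf(lrt, df_small) + 0.5 * chi2_sf(lrt, df_large)

# Assumed illustrative LRT statistic for a test with the null on the boundary.
p_mix = mixture_pvalue(3.84, 0, 1)
p_naive = chi2_sf(3.84, 1)
assert p_mix < p_naive  # the naive chi2_1 p-value is larger (conservative)
```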
The adjustment suggested by Stram and Lee (1994) is not always this successful. According to this adjustment, the null distribution of the likelihood ratio test statistic for comparing fm1Machine to fm2Machine should have approximately a 0.5χ²₀ + 0.5χ²₁ mixture distribution, where χ²₀ represents a distribution with a point mass at 0. When simulated
> machineLRTsim <- simulate.lme(fm1Machine, fm2Machine, nsim= 1000)
FIGURE 2.4. Plots of the nominal versus empirical p-values for the likelihood ratio test statistic comparing two models for the Machines data. The null model, fm1Machine, has a random effect for Worker only. The alternative model, fm2Machine, has random effects for the Worker and a random interaction for Machine %in% Worker. Both models were fit to 1000 sets of data simulated from the null model and the likelihood ratio test statistics were calculated. The panels show ML and REML estimation under the Mix(0,1) mixture and 1 df reference distributions.
it produces a distribution for the LRT statistics that closely agrees with the equal-weight mixture in the REML case, but which resembles a 0.65χ²₀ + 0.35χ²₁ mixture in the ML case.
It is difficult to come up with general rules for approximating the distribution of the LRT statistic for such nested mixed-effects models. The naive approach of using a χ² distribution, with the number of degrees of freedom determined by the difference in the number of nonredundant parameters in the models, as the reference is easily implemented and tends to be conservative. This is the reference distribution we use to calculate the p-values quoted in the multiargument form of anova. One should be aware that these p-values may be conservative. That is, the reported p-value may be greater than the true p-value for the test and, in some cases, it may be much greater.
FIGURE 2.5. Plots of the nominal versus empirical p-values for the likelihood ratio test statistic comparing two models for the ergoStool data. The alternative model has a fixed effect for Type but the null model does not. The random-effects specifications are the same. Both models were fit to 1000 sets of data simulated from the null model and the likelihood ratio test statistics from the maximum likelihood estimates were calculated. The panels show ML estimation under reference distributions with 3 df, the Mix(3,4) mixture, and 4 df.
We can see from Figure 2.5 that, at 3 degrees of freedom, which is the difference in the number of parameters in the two models, the χ² distribution gives p-values that are anticonservative. At 4 degrees of freedom the p-values will be conservative. The nominal p-values for the equal-weight mixture of χ²₃ and χ²₄ distributions, represented in the middle panel of Figure 2.5, are in close agreement with the empirical p-values.

In this case the slight anticonservative nature of the reported p-values may not be too alarming. However, as the number of parameters being removed from the fixed effects becomes large, compared to the total number of observations, this inaccuracy in the reported p-values can be substantial.
For example, Littell, Milliken, Stroup and Wolfinger (1996, §1.5) provide analyses of data from a partially balanced incomplete block (PBIB) design. The design is similar to the randomized block design in the ergometric experiment described in §1.2 except that not every level of the treatment
FIGURE 2.6. Plots of the nominal versus empirical p-values for the likelihood ratio test statistic comparing two models for the PBIB data. The alternative model has a fixed effect for Type, but the null model does not. The random-effects specifications are the same. Both models were fit to 1000 sets of data simulated from the null model, and the likelihood ratio test statistics from the maximum likelihood estimates were calculated. The panels show ML estimation under reference distributions with 14, 16, and 18 df.
appears with every level of the blocking factor. This is the sense in which it is an incomplete block design. It is partially balanced because every pair of treatments occurs together in a block the same number of times. These data are described in greater detail in Appendix A.22 and are given as an object called PBIB that is available with the nlme library.

The important point with regard to the likelihood ratio tests is that there are 15 levels of the Treatment factor and only 60 observations in total. The blocking factor also has 15 levels. If we simulate the likelihood ratio test and plot the p-values calculated from the χ²₁₄ distribution,
> pbibLRTsim <-
+   simulate.lme( m1 = list( fixed = response ~ 1, data = PBIB,
+                            random = ~ 1 | Block ),
+                 m2 = list( fixed = response ~ Treatment ),
+                 method = "ML", nsim = 1000 )
> plot( pbibLRTsim, df = c(14,16,18), weights = FALSE )   # Figure 2.6
we can see, from Figure 2.6, that the p-values calculated using χ²₁₄ as the reference distribution are seriously anticonservative.

Another, perhaps more conventional, approach to performing hypothesis tests involving terms in the fixed-effects specification is to condition on the estimates of the random-effects variance–covariance parameters, θ̂. As described in §2.2.1, for a fixed value of θ, the conditional estimates of the fixed effects, β̂(θ), are determined as standard least-squares estimates. The approximate distribution of the maximum likelihood or the REML estimates of the fixed effects in (2.27) is exact for the conditional estimates β̂(θ).
Conditional tests for the significance of a term in the fixed-effects specification are given by the usual F-tests or t-tests for linear regression models, based on the usual (REML) conditional estimate of the variance

    σ̂²_R(θ) = s² = ‖c_−1‖² / (N − p) = RSS / (N − p).
We will compare this result, a p-value of 15.8%, with that from the likelihood ratio test. Because a likelihood ratio test for terms in the fixed-effects specification must be done on ML fits, we first re-fit fm1PBIB using maximum likelihood, then modify the model.

> fm2PBIB <- update( fm1PBIB, method = "ML" )
> fm3PBIB <- update( fm2PBIB, response ~ 1 )
> anova( fm2PBIB, fm3PBIB )
        Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm2PBIB     1 17 56.571 92.174 -11.285
fm3PBIB     2  3 52.152 58.435 -23.076 1 vs 2  23.581  0.0514
The simulation illustrated in Figure 2.6 shows that the 15.8% p-value from the conditional F-test is much more realistic than the 5.1% p-value from the likelihood ratio test. For this reason, we prefer the conditional F-tests and t-tests for assessing the significance of terms in the fixed effects.
These conditional tests for fixed-effects terms require denominator degrees of freedom. In the case of the conditional F-tests, the numerator degrees of freedom are also required, being defined by the term itself. The denominator degrees of freedom are determined by the grouping level at which the term is estimated. A term is called inner relative to a grouping factor if its value can change within a given level of the grouping factor. A term is outer to a grouping factor if its value does not change within levels of the grouping factor. A term is said to be estimated at level i if it is inner to the (i − 1)st grouping factor and outer to the ith grouping factor. For example, the term Machine in the fm2Machine model is outer to Machine %in% Worker and inner to Worker, so it is estimated at level 2 (Machine %in% Worker). If a term is inner to all Q grouping factors in a model, it is estimated at the level of the within-group errors, which we denote as the (Q + 1)st level.

The intercept, which is the parameter corresponding to the column of 1s in the model matrices X_i, is treated differently from all the other parameters, when it is present. As a parameter it is regarded as being estimated at level 0 because it is outer to all the grouping factors. However, its denominator degrees of freedom are calculated as if it were estimated at level Q + 1. This is because the intercept is the one parameter that pools information from all the observations at a level even when the corresponding column in X_i doesn't change with the level.
Letting m_i denote the total number of groups in level i (with the conventions that m₀ = 1 when the fixed-effects model includes an intercept and 0 otherwise, and m_{Q+1} = N) and p_i denote the sum of the degrees of freedom corresponding to the terms estimated at level i, the ith-level denominator degrees of freedom are defined as

    denDF_i = m_i − ( m_{i−1} + p_i ),   i = 1, …, Q + 1.

This definition coincides with the classical decomposition of degrees of freedom in balanced, multilevel ANOVA designs and gives a reasonable approximation for more general mixed-effects models.

For example, in the fm2Machine model, Q = 2, m₀ = 1, m₁ = 6, m₂ = 18, m₃ = 54, p₀ = 1, p₁ = 0, p₂ = 2, and p₃ = 0, giving denDF₁ = 5, denDF₂ = 10, and denDF₃ = 36.
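The definition can be applied mechanically. A small Python sketch using the counts quoted above for fm2Machine recovers denDF₁ = 5, denDF₂ = 10, and denDF₃ = 36:

```python
def den_df(m, p):
    """denDF_i = m_i - (m_{i-1} + p_i) for i = 1, ..., Q+1.

    m = [m_0, m_1, ..., m_{Q+1}] group counts per level;
    p = [p_0, p_1, ..., p_{Q+1}] degrees of freedom estimated at each level.
    """
    return [m[i] - (m[i - 1] + p[i]) for i in range(1, len(m))]

# Counts quoted for the fm2Machine model.
m = [1, 6, 18, 54]
p = [1, 0, 2, 0]
assert den_df(m, p) == [5, 10, 36]
```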
> anova( fm2Machine )
            numDF denDF F-value p-value
(Intercept)     1    36  773.57  <.0001
Machine         2    10   20.58   3e-04
Confidence intervals on the variance–covariance components for the random effects are a bit trickier to obtain. In practice, one is interested in getting confidence intervals on the original scale of the elements of Ψ and not in the scale of the unconstrained parameters used in the optimization. For some simple forms of Ψ, such as a diagonal structure or a multiple of the identity structure, it is easy to transform the confidence intervals obtained in the unconstrained scale (the logarithm of the standard deviations in the two examples mentioned) back to the original parameter scale (by exponentiating the confidence limits in the case of the diagonal and multiple of the identity structures).

In the case of a general positive-definite Ψ, however, usually it is not possible to transform back to the original scale the confidence intervals obtained for the unconstrained parameter used in the optimization. This is true, for example, for the matrix logarithm parameterization described in §2.2.7, when the dimension of Ψ is greater than one.

The approach used in lme is to consider a different parameterization for general positive-definite Ψ when calculating confidence intervals. This parameterization, which we call the natural parameterization, uses the logarithm of the standard deviations and the generalized logits of the correlations. For a given correlation parameter −1 < ρ < 1, its generalized logit is log[(1 + ρ)/(1 − ρ)], which can take any value in the real line. We denote by η the parameter vector determining the natural parameterization. The elements of η are individually unconstrained, but not jointly so. Therefore, the natural parameterization cannot be used for optimization. However, the elements of η can be individually transformed into meaningful parameters in the original scale, so it is a useful parameterization for constructing confidence intervals.
If φ_j corresponds to the logarithm of a standard deviation in the natural parameterization, and [I⁻¹]_jj denotes its associated diagonal element in the inverse empirical information matrix, an approximate level 1 − α confidence interval for the corresponding standard deviation is

    [ exp( φ̂_j − z_{1−α/2} √[I⁻¹]_jj ),  exp( φ̂_j + z_{1−α/2} √[I⁻¹]_jj ) ].

If φ_j corresponds instead to the generalized logit of a correlation, an approximate level 1 − α confidence interval for that correlation is obtained by applying the inverse generalized logit to the interval endpoints:

    [ ( exp( φ̂_j − z_{1−α/2} √[I⁻¹]_jj ) − 1 ) / ( exp( φ̂_j − z_{1−α/2} √[I⁻¹]_jj ) + 1 ),
      ( exp( φ̂_j + z_{1−α/2} √[I⁻¹]_jj ) − 1 ) / ( exp( φ̂_j + z_{1−α/2} √[I⁻¹]_jj ) + 1 ) ].    (2.29)
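The back-transformation from the natural scale can be sketched as follows (Python rather than S; the estimates and standard errors are made-up illustrative values, not output from a fitted lme model):

```python
import math

Z975 = 1.959964  # standard normal quantile z_{0.975}, for 95% intervals

def sd_interval(phi_hat, se, z=Z975):
    """CI for a standard deviation, given the estimate and standard error
    of its logarithm (the natural-parameterization scale)."""
    return (math.exp(phi_hat - z * se), math.exp(phi_hat + z * se))

def corr_interval(phi_hat, se, z=Z975):
    """CI for a correlation, given the estimate and standard error of its
    generalized logit log[(1 + rho) / (1 - rho)]."""
    inv = lambda t: (math.exp(t) - 1.0) / (math.exp(t) + 1.0)
    return (inv(phi_hat - z * se), inv(phi_hat + z * se))

# Illustrative (made-up) natural-scale estimates:
print(sd_interval(0.0, 0.1))    # brackets exp(0) = 1
print(corr_interval(0.8, 0.5))  # endpoints stay inside (-1, 1)
```

Because exp and the inverse generalized logit are strictly increasing, the endpoints transform monotonically, which is what makes this elementwise back-transformation legitimate.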
Predicted values at the kth level of nesting estimate the conditional expectation of the response, given the random effects at levels ≤ k. For example, letting z_h(i) denote a vector of covariates corresponding to random effects associated with the ith group at the first level of nesting, the level-1 predictions estimate

    E[ y_h(i) | b_i ] = x_hᵀ β + z_h(i)ᵀ b_i.    (2.30)

Similarly, letting z_h(i, j) denote a covariate vector associated with the jth level-2 group within the ith level-1 group, the level-2 predicted values estimate

    E[ y_h(i, j) | b_i, b_ij ] = x_hᵀ β + z_h(i)ᵀ b_i + z_h(i, j)ᵀ b_ij.    (2.31)
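In other words, a level-k prediction adds the estimated random-effects contributions for levels 1, . . . , k to the population prediction x_hᵀ β̂. A small numeric sketch (Python, with made-up coefficient values) shows the nesting:

```python
# Level-k predictions add the random-effects contributions for levels <= k
# to the population (fixed-effects) prediction.
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

beta = [150.0, 6.5]    # fixed effects (made-up values)
b_i = [2.0, 0.3]       # estimated random effects, level-1 group i
b_ij = [-1.0, 0.1]     # estimated random effects, level-2 group (i, j)
x_h = [1.0, 0.5]       # fixed-effects covariate vector (with intercept)
z_h_i = [1.0, 0.5]     # covariates for the level-1 random effects
z_h_ij = [1.0, 0.5]    # covariates for the level-2 random effects

pred0 = dot(x_h, beta)            # population prediction
pred1 = pred0 + dot(z_h_i, b_i)   # level-1 prediction, cf. (2.30)
pred2 = pred1 + dot(z_h_ij, b_ij) # level-2 prediction, cf. (2.31)
print(round(pred0, 2), round(pred1, 2), round(pred2, 2))  # -> 153.25 155.4 154.45
```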
Ware formulation. For a single grouping factor that divides the observations into M groups of n_i, i = 1, . . . , M, observations, the model is written

    y_i = X_i β + Z_i b_i + ε_i,    i = 1, . . . , M,
    b_i ∼ N(0, Ψ),    ε_i ∼ N(0, σ² I).

The parameters in the model are the p-dimensional fixed effects, β, the q × q variance–covariance matrix, Ψ, for the random effects, and the variance σ² of the noise ε_i. We estimate these parameters by maximum likelihood (ML) or by restricted maximum likelihood (REML).
Although the random effects b_i, i = 1, . . . , M, are not formally parameters in the model, we will often want to formulate our best guess for these values given the data. We use the Best Linear Unbiased Predictors (BLUPs) for this.
For computational purposes the variance–covariance matrix Ψ is re-expressed in terms of the relative precision factor Δ, which satisfies

    Δᵀ Δ = σ² Ψ⁻¹,

and the matrix Δ is expressed as a function of an unconstrained parameter vector θ.

The profiled log-likelihood function with respect to θ can be easily calculated using matrix decompositions. That is, the log-likelihood corresponding to the conditionally best estimates β̂(θ) and σ̂²(θ) can be evaluated as a function of θ alone. This simplifies the problem of optimizing the likelihood to get maximum likelihood estimates, because it reduces the dimension of the optimization. The same simplification applies to REML estimation.
We describe approximate distributions for the maximum likelihood estimates and the REML estimates using results from asymptotic theory for linear mixed-effects models.

We compare models that differ in the random-effects specification by likelihood ratio tests or by simulation-based parametric bootstrap evaluations.

We assess the significance of terms in the fixed-effects specification by standard linear regression tests conditional on the value of θ̂. These tests include t-tests for individual coefficients or F-tests for more complicated terms or linear combinations of coefficients. The degrees of freedom for a t-test (or the denominator degrees of freedom for an F-test) depend on whether the factor being considered is inner to the grouping factor (changes within levels of the grouping factor) or outer to the grouping factor (is invariant within levels of the grouping factor).

Approximate confidence intervals for the fixed effects and the variance–covariance parameters are produced from the approximate distributions of the maximum likelihood estimates and REML estimates.
Exercises

1. The simulation results presented in Figure 2.4 (p. 87) indicate that the null distribution of the REML likelihood ratio test statistic, comparing a null model with a single level of scalar random effects to an alternative model with nested levels of scalar random effects, is approximately an equally weighted mixture of a χ²_0 and a χ²_1.

Confirm this result by simulating a LRT statistic on the Oats data, considered in §1.6. The preferred model for those data, fm4Oats, was defined with random = ~ 1 | Block/Variety. Re-fit this model with random = ~ 1 | Block. Using this fit as the null model and fm4Oats as the alternative model, obtain a set of simulated LRT statistics with simulate.lme. Plot these simulated LRT statistics setting df = c(0, 1) to obtain a plot like Figure 2.4. Are the conclusions from this simulation similar to those from the simulation shown in Figure 2.4?

Note that simulate.lme must fit both models to nsim simulated sets of data. By default nsim = 1000, which could tie up your computer for a long time. You may wish to set a lower value of nsim if the default number of simulations will take too long.
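The claimed reference distribution can also be examined numerically without fitting any models: draw from an equal mixture of χ²_0 (a point mass at zero) and χ²_1. This Python sketch (not part of the book's S code) illustrates the 50% point mass at zero that such a mixture implies for the LRT statistic:

```python
import random

random.seed(0)

def mixture_draw():
    """One draw from an equal mixture of chi^2_0 and chi^2_1."""
    if random.random() < 0.5:
        return 0.0                       # chi^2_0: point mass at zero
    return random.gauss(0.0, 1.0) ** 2   # chi^2_1: squared standard normal

draws = [mixture_draw() for _ in range(100000)]
prop_zero = sum(d == 0.0 for d in draws) / len(draws)
print(round(prop_zero, 2))  # roughly 0.5: half the statistics vanish
```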
3
Describing the Structure of Grouped Data
Notice that there is no primary covariate in the Rail data, so we use the constant expression 1 in that position in the formula. In data with multiple, nested grouping factors, such as the Pixel data, the grouping factors are separated by /. Factors that are nested within other factors appear further to the right, so an expression like Dog/Side indicates that Side is nested within Dog.
The formula of a grouped data object has the same pattern as the formula used in a call to a trellis graphics function, such as xyplot. This is
intentional. Because such a formula is available with the data, the plot
method for objects in the groupedData class can produce an informative
trellis display from the object alone. It may, in fact, be best to think of the
formula stored with the data as a display formula for the data because it
provides a meaningful default graphical display method for the data.
The formula function shown above is an example of an extractor function for this class. It returns some property of the object (the display formula in this case) without requiring the user to be aware of how that property is stored. We provide other extractor functions for each of the components of the display formula. The getGroups extractor returns the value of the grouping factor. A companion function, getGroupsFormula, returns the formula that is evaluated to produce the grouping factor. The extractors for the other components of the display formula are getResponse and getCovariate, respectively.
[Figure 3.1 here: trellis display of height (cm) versus centered age, one panel per boy.]
FIGURE 3.1. Heights of 26 boys from Oxford, England, each measured on nine
occasions. The ages have been centered and are in an arbitrary unit.
The Oxboys data give the heights of 26 boys from Oxford, England (see Appendix A.19 for more detail). We could use the table function on the grouping factor,
> table( Oxboys$Subject )
10 26 25 9 2 6 7 17 16 15 8 20 1 18 5 23 11 21 3 24 22 12 13
9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
14 19 4
9 9 9
Because there are exactly nine observations for each subject, the data are balanced with respect to the number of observations. However, if we also check for balance in the covariate values, we find they are not balanced.
> unique( table( getCovariate( Oxboys ), getGroups( Oxboys ) ) )
[1] 1 0
> length( unique( getCovariate( Oxboys ) ) )
[1] 16
Further checking reveals that there are 16 unique values of the covariate age. The boys are measured at approximately the same ages, but not exactly the same ages. This imbalance could affect some analysis methods for repeated measures data. It does not affect the methods described in this book.
The isBalanced function in the nlme library can be used to check a
groupedData object for balance with respect to the grouping factor(s) or
with respect to the groups and the covariate. It is built from calls to
getGroups and table like those above.
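The logic of such a balance check can be sketched directly (Python, not the nlme implementation of isBalanced; the function name is illustrative): tabulate the observations, then ask whether every group shows the same pattern of counts.

```python
# Balanced w.r.t. groups: every group has the same number of observations.
# Balanced w.r.t. groups and covariate: every group has the same pattern of
# counts over covariate values, mirroring the table() checks in the text.
from collections import Counter

def is_balanced(groups, covariate=None):
    if covariate is None:
        return len(set(Counter(groups).values())) == 1
    per_group = {}
    for g, x in zip(groups, covariate):
        per_group.setdefault(g, Counter())[x] += 1
    patterns = {tuple(sorted(c.items())) for c in per_group.values()}
    return len(patterns) == 1

groups = ["A"] * 3 + ["B"] * 3
ages = [1, 2, 3, 1, 2, 4]           # equal counts, unequal covariate values
print(is_balanced(groups))          # -> True
print(is_balanced(groups, ages))    # -> False, like the Oxboys ages
```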
When applied to data with multiple, nested grouping factors, the getGroups extractor takes an optional argument level. Levels are counted from the outside inward, so in the Pixel data, where the grouping is Dog/Side, Dog is the first level and Side is the second level. When the argument level specifies a single level, the result is returned as a vector
> unique( getGroups(Pixel, level = 1) )
[1] 1 2 3 4 5 6 7 8 9 10
> unique( getGroups(Pixel, level = 2) )
 [1] 1/R  2/R  3/R  4/R  5/R  6/R  7/R  8/R  9/R  10/R 1/L  ...
1/R < 2/R < 3/R < 4/R < 5/R < ...
Notice that the groups at level = 2, the Side within Dog factor, are
coerced to an ordered factor with distinct levels for each combination of
Side within Dog.
If we extract the groups for multiple levels the result is returned as a
data frame with one column for each level. Any inner grouping factors are
preserved in their original form in this frame rather than being coerced to
an ordered factor with distinct levels as above. For example,
> Pixel.groups <- getGroups( Pixel, level = 1:2 )
> class( Pixel.groups )
[1] "data.frame"
> names( Pixel.groups )
[1] "Dog"
"Side"
> unique( Pixel.groups[["Side"]] )
[1] R L
formed into a data frame. One of the simplest ways is using the read.table function on data stored in an external file (Venables and Ripley, 1999, §2.4).
For example, if the Oxford boys' height data are stored in a text file named oxboys.dat of the form

Subject      age  height
      1  -1.0000  140.50
      1  -0.7479  143.40
      1  -0.4630  144.80
      1  -0.1643  147.10
      1  -0.0027  147.70
      1   0.2466  150.20
      1   0.5562  151.70
      1   0.7781  153.30
      1   0.9945  155.80
      2  -1.0000  136.90
    ...
     26  -0.0027  138.40
     26   0.2466  138.90
     26   0.5562  141.80
     26   0.7781  142.60
     26   1.0055  143.10
The argument header = TRUE in the call to read.table indicates that the first line of the file is to be used to create the names for the variables in the frame.
The result of read.table is of the data.frame class. It has two dimensions:
the number of rows (cases) and the number of columns (variables).
A function to create objects of a given class is called a constructor for that class. The primary constructor function for a class is often given the same name as the class itself. Thus the default constructor for the groupedData class is the groupedData function. Its required arguments are a formula and a data frame. Optional arguments include labels, where display labels for the response and the primary covariate can be given, and units, where the units of these variables can be given. The default axis labels for data plots are constructed by pasting together components of labels and units. The reason for separating the units from the rest of the display label is to permit propagation of the units to derived quantities, such as the residuals from a fitted model.
The first value of Subject in the data is 1, but the value of Subject with the smallest maximum height is 10, so this subject's data occupy the lower left panel in Figure 3.1.
The labels and units arguments are optional. We recommend using them
because this makes it easier to create more informative trellis plots.
[Figure 3.2 here: trellis display of body weight versus time (days), one panel per rat.]
FIGURE 3.2. Weight versus time of rats on three different diets. The first group of eight rats is on the control diet. There were four rats each in the two experimental diets.
FIGURE 3.3. Weight of rats versus time for three different diets.
imental diet, and four rats were given another experimental diet. The Diet factor is an experimental factor that is outer to the grouping factor Rat. One benefit of specifying outer factors to the constructor is that the constructor will modify the way in which the groups are ordered. Reordering of the groups is permitted only within the same level of an outer factor (or within the same combination of levels of outer factors, when there is more than one). This ensures that groups at the same levels of all outer factors will be plotted in a set of adjacent panels.
The plot method for the groupedData class allows an optional argument outer that can be either a logical value or a formula. When this argument is used, the panels are determined by the factor or combination of factors given in the outer formula. A logical value of TRUE or T can be used instead of a formula, indicating that the outer formula stored with the data should be used to determine the panels in the plot. For example, we can get a stronger visual comparison of the differences between the diets for the BodyWeight data with

> plot( BodyWeight, outer = ~ Diet, aspect = 3 )    # Figure 3.3
The aspect argument, discussed in §3.3.1, is used to enhance the visualization of patterns in the plot. Because this outer formula was stored with the BodyWeight data, we would produce the same plot with

> plot( BodyWeight, outer = TRUE, aspect = 3 )
arranges panels of the same Variety on the same row, making it easy to compare the results for each variety across years. Conversely, the plot produced by

> plot( Soybean, outer = ~ Variety * Year )

(not shown) arranges panels of the same Year on the same row, making it easy to compare varieties within each year.
When specifying outer factors in the constructor or in a plot call we
should ensure that they are indeed constant or invariant within each level
of the grouping factor. The gsummary function with the optional argument
invariantsOnly = TRUE allows us to check this. It returns the values of only
those variables that are invariant within each level of the grouping factor.
These values are returned as a data frame with one row for each level of
the grouping factor. For the BodyWeight data we get
> gsummary( BodyWeight, invar = TRUE )
   Rat Diet
2    2    1
3    3    1
4    4    1
1    1    1
8    8    1
5    5    1
6    6    1
7    7    1
11  11    2
9    9    2
10  10    2
12  12    2
13  13    3
15  15    3
14  14    3
16  16    3
two treatments; once with the 5-HT3 antagonist MDL 72222 and once with a placebo. The change in blood pressure was measured for each dose on each occasion.

In Figure 3.4, produced by

> plot( PBG, inner = ~ Treatment, scales = list(x = list(log = 2)) )

the lines in each panel join points with the same Treatment. (The scales argument to the plot call will be described in §3.3.) This plot provides a strong indication that the effect of the PBG treatment is to shift the dose–response curve to the right. We will discuss methods for modeling this in Chapter 7.
The PBG data are similar in structure to the Pixel data. In both these data sets there is a continuous response, a continuous covariate (day for Pixel and dose for PBG), a major grouping factor corresponding to the experimental animal (Dog for Pixel and Rabbit for PBG), and a factor that varies within this major grouping factor (Side for Pixel and Treatment for PBG). In the case of the Pixel data we used nested grouping factors to represent this structure. In the case of the PBG data we used a single grouping factor with an inner treatment factor. These two structures are quite similar; in fact, for the purposes of plotting the data, they are essentially equivalent. The choice of one structure or the other is more an indication of how we think the inner factor should be modeled. For the Pixel data we modeled the effect of the Side factor as a random effect, because the Side just represents a random selection of lymph node for each Dog. In other experiments there may be important physiological differences between the left side and the right side of the animal, so we would model it as a fixed effect. In the PBG data the Treatment factor is a factor with fixed, repeatable levels, and we model it as a fixed effect.

If we decide that an inner factor should be modeled as a random effect, we should specify it as part of a nested grouping structure. If it should be a fixed effect, we specify it as an inner factor. These choices can be overridden when constructing models.
> ergoStool.mat
T1 T2 T3 T4
8 7 11 8 7
5 8 11 8 7
4 7 11 10 9
9 9 13 10 8
6 9 11 11 10
3 7 14 13 9
7 8 12 12 11
1 12 15 12 10
2 10 14 13 12
The asTable function, which can only be used with balanced and unreplicated data, produces a table of the responses in the form of a matrix where columns correspond to the unique values of the primary covariate and rows correspond to groups. The dimnames of the matrix are the unique levels of the grouping factor and the covariate.

This table provides a compact representation of balanced data. Often the data from a balanced experiment are provided in the form of a table like this. The balancedGrouped function converts data from a table like this to a groupedData object.
> ergoStool.new <- balancedGrouped( effort ~ Type | Subject,
+                                   data = ergoStool.mat )
Warning messages:
  4 missing values generated coercing from character to numeric
  in: as.numeric(dn[[1]])
> ergoStool.new
Grouped Data: effort ~ Type | Subject
   Type Subject effort
1    T1       8      7
2    T2       8     11
3    T3       8      8
4    T4       8      7
5    T1       5      8
...
36   T4       2     12
It is a common practice to label the levels of a factor like the Type factor as 1, 2, . . . , which would result in its being coerced to a numeric variable. Unless this is detected and the numeric variable is explicitly converted to a factor, models fit to such data will be nonsensical. It is always a good idea to check that the variables in a groupedData object have the expected classes. We describe how to do this in §3.4.
The optional labels and units arguments can be used with balancedGrouped just as in the groupedData constructor.
As seen in the example, the balancedGrouped constructor produces an object like any other groupedData object. The matrix of response values is converted to a vector, and both the primary covariate and the grouping factor are expanded to have the same length as the response vector. Later manipulations of the object, or plots created from the object, do not rely on its having been generated from balanced data. This is intentional. Although there is a certain amount of redundancy in storing multiple copies of the same covariate or grouping factor values, it is offset by the flexibility of the data.frame structure in dealing with missing data and with general, unbalanced data.
Many methods for the analysis of longitudinal data or repeated measurements data depend on having balanced data. The analysis techniques
described in this book do not.
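The wide-to-long expansion that balancedGrouped performs can be sketched in a few lines (Python here, with a hypothetical helper name and a two-subject slice of the ergoStool table; the real constructor also attaches the display formula and labels):

```python
# Expand a matrix of responses (rows = groups, columns = covariate values)
# into long format: one (covariate, group, response) triple per cell.
def balanced_to_long(values, row_names, col_names):
    rows = []
    for group, row in zip(row_names, values):
        for covariate, response in zip(col_names, row):
            rows.append((covariate, group, response))
    return rows

effort = [[7, 11, 8, 7],    # Subject 8 (values from ergoStool.mat above)
          [8, 11, 8, 7]]    # Subject 5
long = balanced_to_long(effort, ["8", "5"], ["T1", "T2", "T3", "T4"])
print(long[0], long[5])  # -> ('T1', '8', 7) ('T2', '5', 11)
```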
with a continuous primary covariate, there is one panel for each level of the grouping factor. The horizontal axis in the panel is the primary covariate, the vertical axis is the response, and the data are represented both as points and by a connecting line. If the primary covariate is a factor, such as in the Machine data, or if there is no primary covariate, such as in the Rail data, the plot is a dotplot with one row for each level of the grouping factor. In this case the response is on the horizontal axis.
For numeric covariates the aspect ratio of each panel, which is the ratio of the physical size of the vertical axis to that of the horizontal axis, is determined by the 45-degree banking rule described in Cleveland (1994, §3.1). We have found that this rule produces appealing and informative aspect ratios in a wide variety of cases. If you wish to override this choice of aspect ratio, you can give a numerical value as the optional aspect argument in the call to plot. A value greater than 1.0 produces tall, narrow panels, while a value between 0.0 and 1.0 produces short, wide panels.
The arrangement of panels in rows and columns on the page is calculated to make the specified number of panels of the chosen aspect ratio fill as much as possible of the available area. This does not always create a good arrangement for comparing patterns across outer factors. For example, the grouping factor Plant in the CO2 data, described in Appendix A.5, has twelve different levels. The plants themselves come from one of two types and have been subjected to one of two treatments. Because the aspect ratio chosen by the banking rule creates panels that are taller than they are wide, a more-or-less square plot area will be filled with three rows of four panels each (Figure 3.5).
For some combinations of grass type and treatment, the panels are spread across more than one row in Figure 3.5. It would be better to keep these combinations on the same row so we can more easily compare treatments and grass types. That is, we would prefer the panels to be arranged in four rows of three or, perhaps, two rows of six. If we have four rows of three, we may wish to indicate visually that the lower two rows represent one type of plant (Quebec) and the upper two rows represent the other type (Mississippi). We can do this by specifying a list as the optional between argument. A component named x in this list indicates the sizes of gaps to leave between columns, while a component named y specifies gaps between rows. (When forming a between argument for the rows, remember that the trellis convention is to count from the bottom to the top, not from the top to the bottom.) The gaps are given in units of character heights. Generally a gap of 0.5 is sufficient to distinguish groups without using too much space.

An arrangement of the CO2 data in two rows of six panels with a gap between the third and fourth columns, shown in Figure 3.6, is produced by

> plot( CO2, layout = c(6,2), between = list(x = c(0,0,0.5,0,0)) )   # Fig 3.6
[Figure 3.5 here: CO2 data plotted with the default layout of three rows of four panels.]
single page plot may result in the panels being too small to be informative. In these cases a third component can be added to the layout argument, causing the plot to be spread over several pages.

If the number of groups at some level of an outer factor does not fit exactly into the rectangular array, an optional skip argument can be used to skip over selected panel locations.

The use of both of these arguments is illustrated in the code for the figures of the spruce tree growth data (Appendix A.28). There were four groves of trees; two exposed to an ozone-rich atmosphere, and two exposed to a normal atmosphere. In the first and second groves 27 trees were measured, but in the third and fourth groves only 12 and 13 trees were measured, respectively.
[Figure 3.6 here: CO2 data in two rows of six panels, with a gap between the third and fourth columns.]
On the first two pages of this plot the array would be filled. On the third page there would be a gap in the middle of the array to separate the panels corresponding to the two different groves of trees exposed to a normal atmosphere.
Both the facts that the unique values of the DNase concentration are,
for the most part, logarithmically spaced
> unique( getCovariate(DNase) )
[1] 0.048828 0.195312 0.390625 0.781250 1.562500 3.125000
[7] 6.250000 12.500000
> log( unique(getCovariate(DNase)), 2 )
[1] -4.35614 -2.35614 -1.35614 -0.35614 0.64386 1.64386 2.64386
[8] 3.64386
The first two lines of this function draw the background grid and place symbols at the data values. The actual symbol that is drawn is determined by the trellis device that is active when the plot is displayed. It is usually an open circle.

The last four lines of the panel function add a line through the data values. Some care needs to be taken when doing this. In the DNase assay data, for example, there are duplicate observations at each concentration in each run. Rather than "joining the dots," it makes more sense to draw the line through the average response in each set of replicates. In the default panel function, xvals is defined to be the unique values of x, and y.avg is calculated as the average of the y values at these distinct x values. Finally, the xvals vector is put into increasing order and the line is drawn with the points in this order.
This panel function can be overridden with an explicit panel argument to the plot call if, for example, you want to omit the background grid. If you do override the default panel function, it would be a good idea to follow the general pattern of this function. In particular, you should draw the grid first (if you choose to do so), then add any points or lines. Also, be careful to handle replicates or unusual ordering of the (x, y) pairs gracefully.
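The replicate-averaging step described above can be sketched outside of trellis (Python rather than S; the function name is illustrative, and the variable names mirror the xvals and y.avg of the default panel function):

```python
# Average the response at each distinct covariate value, then sort by x,
# so a connecting line passes through the replicate means in order.
def line_through_means(x, y):
    sums = {}
    for xi, yi in zip(x, y):
        total, count = sums.get(xi, (0.0, 0))
        sums[xi] = (total + yi, count + 1)
    xvals = sorted(sums)                           # unique x, increasing
    y_avg = [sums[xi][0] / sums[xi][1] for xi in xvals]
    return xvals, y_avg

# Duplicate observations at each covariate value, deliberately unordered:
x = [2.0, 1.0, 1.0, 2.0]
y = [1.5, 0.0, 0.5, 1.0]
print(line_through_means(x, y))  # -> ([1.0, 2.0], [0.25, 1.25])
```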
# Figure 3.9
A more meaningful trellis display of these data, using Dog as the display level, is obtained with

> plot( Pixel, displayLevel = 1 )    # Figure 3.10

In general there is a trend for the standard deviation about the curve to be greater when the response is greater. This is a common occurrence. A less obvious effect is that the standard deviation is greater in the wafers with overall higher current intensity, even though the differences in the current intensity are small.
FIGURE 3.11. Current versus voltage for the Wafer data. The panels correspond to wafers. Within each wafer the current was measured at eight different sites.
FIGURE 3.12. Mean current versus voltage at the wafer level for the Wafer data.
Each point on each curve is the average of the current for that voltage at eight
sites on the wafer.
FIGURE 3.13. Standard deviation of current versus voltage at the wafer level for
the Wafer data. Each point on each curve is the standard deviation of the current
at that voltage at eight sites on the wafer.
3.4 Summaries

In addition to creating graphical presentations of the data, we may wish to summarize the data numerically, either by group or across groups. In §3.1 we demonstrated the use of the table, gsummary, and unique functions to summarize data by group. In this section we expand on the usage of these functions and introduce another groupwise summary function, gapply.
The gapply function is part of the nlme library. It joins several standard
S functions in the apply family. These include apply, lapply, tapply, and
sapply. They all apply a function to subsections of some data structure
and gather the results in some way. Both lapply and sapply can be used
to apply a function to the components of a list. The result of lapply is
always a list; sapply will create a more compact result if possible. Because
the groupedData class inherits from the data.frame class and a data frame
can be treated as a list whose components are the columns of the frame, we
can apply a function to the columns of a groupedData object. For example,
we can use sapply to check the data.class of the columns of a groupedData
object by
> sapply( ergoStool, data.class )
   effort     Type   Subject
"numeric" "factor" "ordered"
We see that effort, the response, is a numeric variable; Type, the covariate, is a factor; and Subject, the grouping factor, is an ordered factor. We could replace sapply with lapply and get the same information, but the result would be returned as a list and would not print as compactly.

Checking the data.class of all variables in a data.frame or a groupedData object is an important first step in any data analysis. Because factor levels are often coded as integers, it is a common mistake to leave what should be a factor as a numeric variable. Any linear models using such a factor will be meaningless, because the factor will be treated as a single numeric variable instead of being expanded into a set of contrasts. Another way of checking the data.class of variables in a frame is to use the summary function shown later in this section.
The table and unique functions are not solely intended for use with grouped data, but often are useful when working with grouped data. The gsummary function, however, is specifically designed for use with groupedData objects. In §3.2.1 we used gsummary with the optional argument invariantsOnly = TRUE to extract only those variables in the groupedData object that are invariant within levels of the grouping factor. These could be experimental factors, like Diet in the BodyWeight data, or they could simply be additional characteristics of the groups, like Sex in the Orthodont data. The Theoph data are another example of a medical study where the grouping is by Subject. They are from a study of the pharmacokinetics of the drug theophylline, which is used to treat asthma. Each subject's weight
within each group. When multiple modes are present, the first element of the sorted modes is returned.
> gsummary( Theoph )
   Subject   Wt Dose     time     conc
6        6 80.0 4.00 5.888182 3.525455
7        7 64.6 4.95 5.865455 3.910909
8        8 70.5 4.53 5.890000 4.271818
11      11 65.0 4.92 5.871818 4.510909
3        3 70.5 4.53 5.907273 5.086364
2        2 72.4 4.40 5.869091 4.823636
4        4 72.7 4.40 5.940000 4.940000
9        9 86.4 3.10 5.868182 4.893636
12      12 60.5 5.30 5.876364 5.410000
10      10 58.2 5.50 5.915455 5.930909
1        1 79.6 4.02 5.950000 6.439091
5        5 54.6 5.86 5.893636 5.782727
By giving a numeric summary function, which is a function that calculates a single numerical value from a numeric vector, as the argument
FUN, we can produce other summaries. For example, we can check that
the Theoph data are sorted according to increasing values of the maximum
response with
> gsummary( Theoph, FUN = max, omit = TRUE )
Wt Dose time conc
6 80.0 4.00 23.85 6.44
7 64.6 4.95 24.22 7.09
8 70.5 4.53 24.12 7.56
11 65.0 4.92 24.08 8.00
3 70.5 4.53 24.17 8.20
2 72.4 4.40 24.30 8.33
4 72.7 4.40 24.65 8.60
9 86.4 3.10 24.43 9.03
12 60.5 5.30 24.15 9.75
10 58.2 5.50 23.70 10.21
1 79.6 4.02 24.37 10.50
5 54.6 5.86 24.35 11.40
This ordering of the subjects does indeed give increasing maximum concentration of theophylline.
The FUN argument to gsummary is applied only to numeric variables in the grouped data object. Any non-numeric variables are represented by their modes within each group. A variable that is invariant within each group is represented by the (single) value that it assumes within each group. In other words, the value returned for each variable is determined according to:

If the variable is an invariant, its value within each group is returned.
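The rule just described can be sketched as a small groupwise summary (Python, not the nlme implementation; the mode computation follows the earlier remark that the first of the sorted modes is returned):

```python
# Groupwise summary rule: invariants keep their value, numeric columns get
# FUN, and non-numeric columns are represented by their (first sorted) mode.
from collections import Counter

def summarize_column(values, fun):
    if len(set(values)) == 1:                 # invariant within the group
        return values[0]
    if all(isinstance(v, (int, float)) for v in values):
        return fun(values)                    # numeric: apply FUN
    counts = Counter(values)                  # non-numeric: modal value
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)[0]

group = {"Diet": ["a", "a", "a"],             # invariant
         "conc": [1.0, 3.0, 5.0],             # numeric
         "flag": ["y", "n", "n"]}             # non-numeric, mode "n"
row = {k: summarize_column(v, max) for k, v in group.items()}
print(row)  # -> {'Diet': 'a', 'conc': 5.0, 'flag': 'n'}
```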
described in Appendix A.25, is from a study where thirteen different variables were recorded on 136 subjects.
> Quin.sum <- gsummary( Quinidine, omit = TRUE, FUN = mean )
> dim( Quin.sum )
[1] 136 13
we see some unusual results. The summarized values of conc, dose, and interval are always recorded as missing values (NA). A less obvious peculiarity in the data is an apparent inconsistency in the numeric values of height and weight; for some reason height was recorded in inches while weight was recorded in kilograms.
Returning to the question of the missing values in conc, dose, and interval, the data from a single subject indicate the reason for this
Each observation is either a record of a dosage or a record of a concentration measurement but never both. Thus, whenever dose is present, conc is
missing and whenever conc is present, both dose and interval are missing.
Because any subject in the experiment must have at least one dosage record
and at least one concentration record, every subject has missing data in
the conc, dose, and interval variables.
The default behavior of most summary functions in S is to return the
value NA if the input vector contains any missing values. The mean function
behaves like this and returns NA for every subject in each of these three
variables. The behavior can be overridden in mean (and in several other
summary functions) by giving the optional argument na.rm = TRUE. The
default value of FUN in gsummary is mean with na.rm = TRUE.
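The difference is easy to see on a small vector (a quick illustration, not taken from the Quinidine data):

```r
x <- c(2.2, 1.2, NA, 3.4)
mean(x)                # NA: the default propagates missing values
mean(x, na.rm = TRUE)  # 2.266667: missing values are dropped first
```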
> Quin.sum1 <- gsummary( Quinidine, omit = TRUE )
> Quin.sum1[1:10, 1:7]
       time   conc   dose interval    Age Height  Weight
1    92.817 2.2000 268.71   6.0000 60.000     69 106.000
2    12.200 1.2000 166.00       NA 58.000     69  85.000
3  2090.015 3.1333 201.00   7.3333 67.000     69  65.294
4   137.790 3.3667 236.39   7.3333 88.000     66  95.185
5    24.375 0.7000 301.00       NA 62.000     71  66.000
6     3.625 2.6000 166.00   6.0000 76.000     71  93.000
7  1187.320 2.7833 256.22   6.0000 60.097     66  85.484
8    20.019 2.5000 166.00       NA 52.000     71  75.000
9    69.200 3.9500 498.00   6.0000 68.000     70  79.000
10 1717.261 3.1667 201.00   8.0000 73.154     69  79.462
Notice that there are still some NAs in the groupwise summary for interval. For these subjects every value of interval was missing.
The function summary can be used with any data frame to summarize the
columns according to their class. In particular, we can use it on Quin.sum1 to
obtain some summary statistics for each variable with each group (subject)
counted only once.
> summary( Quin.sum1 )
      time              conc           dose        interval
 Min.   :   0.065   Min.   :0.50   Min.   : 83   Min.   : 5.00
 1st Qu.:  19.300   1st Qu.:1.70   1st Qu.:198   1st Qu.: 6.00
 Median :  47.200   Median :2.24   Median :201   Median : 6.00
 Mean   : 251.000   Mean   :2.36   Mean   :224   Mean   : 6.99
 3rd Qu.: 171.000   3rd Qu.:2.92   3rd Qu.:249   3rd Qu.: 8.00
 Max.   :5360.000   Max.   :5.77   Max.   :498   Max.   :12.00
                                                 NAs    :29.00
      Age            Height          Weight             Race
 Min.   :42.0   Min.   :60.0   Min.   : 41.0   Caucasian:91
 1st Qu.:61.0   1st Qu.:67.0   1st Qu.: 67.8   Latin    :35
 Median :66.0   Median :70.0   Median : 79.2   Black    :10
 Mean   :66.9   Mean   :69.6   Mean   : 79.2
 3rd Qu.:73.0   3rd Qu.:72.0   3rd Qu.: 88.2
 Max.   :92.0   Max.   :79.0   Max.   :119.0
 Smoke     Ethanol          Heart       Creatinine       glyco
 no :94   none   :90   No/Mild :55   < 50 : 36    Min.   :0.390
 yes:42   current:16   Moderate:40   >= 50:100    1st Qu.:0.885
          former :30   Severe  :41                Median :1.170
                                                  Mean   :1.210
                                                  3rd Qu.:1.450
                                                  Max.   :2.990
> summary( Quinidine )
 . . .
 yes: 447   current:191   Moderate:375   >= 50:1053
            former :289   Severe  :498
      glyco
 Min.   :0.39
 1st Qu.:0.93
 Median :1.23
 Mean   :1.28
 3rd Qu.:1.54
 Max.   :3.16
The first summary tells us that there are 94 nonsmokers in the study and 42 smokers, while the second tells us that there are 1024 total observations on the nonsmokers and 447 on the smokers.
Both summaries are useful to us in understanding these data. Because
NAs in conc and dose are mutually exclusive, we can see that there are
at most 1110 dosage records and at most 443 concentration measurements.
Because there are 136 subjects in the study, this means there are on average
fewer than three concentration measurements per subject.
We can get an exact count of the number of concentrations by counting
the number of nonmissing values in the entire conc variable. One way to
do this is
> sum( ifelse(is.na(Quinidine[["conc"]]), 0, 1) )
[1] 361
because the logical values TRUE and FALSE are interpreted as 1 and 0 in
arithmetic expressions.
A similar expression
> sum( !is.na(Quinidine[["dose"]]) )
[1] 1028
tells us that there are 1028 dosage events. The 164 rows that are neither
dosage records nor concentration measurements are cases where values of
other variables changed for the subject.
With only 361 concentration measurements for 136 subjects there are
fewer than three concentration measurements per subject. To explore this
further, we would like to determine the distribution of the number of concentration measurements per subject. We need to divide the data by subject
(or by group in our general terminology) and count the number of nonmissing values in the conc variable for each subject. The function gapply is
available to perform calculations such as this. It applies another function
to some or all of the variables in a grouped data object by group.
The result can be a bit confusing when printed in this way. It is a named
vector of the counts where the names are the values of the Subject variable.
Thus, subjects 124 and 125 both had only a single concentration measurement. To obtain the distribution of the measurements, we apply the table
function to this vector
> table( gapply(Quinidine, "conc", function(x) sum(!is.na(x))) )
1 2 3 4 5 6 7 10 11
46 33 31 9 3 8 2 1 3
We see that most of the subjects in the study have very few measurements
of the response. A total of 110 out of the 136 subjects have fewer than four
response measurements. This is not uncommon in such routine clinical data
(data that are collected in the routine course of treatment of patients). A
common consequence of having so few observations for most of the subjects
is that the information gained from such a study is imprecise compared to
that gained from a controlled experiment.
The second argument to gapply is the name or index of the variable in
the groupedData object to which we will apply the function. When only a
single name or index is given, the value passed to the function to be applied
is in the form of a vector. If more than one variable is to be passed, they
are passed as a data.frame. If this argument is missing, all the variables in the groupedData object are passed as a data.frame, so the effect is to select subsets of the rows corresponding to unique values of the grouping factor from the groupedData object.
To illustrate this, let us determine those subjects for whom there are
records that are neither dosage records nor concentration records. For each
subject we must determine whether there are any records with both the
conc variable and the dose variable missing.
> changeRecords <- gapply( Quinidine, FUN = function(frm)
+      any(is.na(frm[["conc"]]) & is.na(frm[["dose"]])) )
> changeRecords
 109  70  23  92 111   5  18  24   2  88  91 117 120  13  89  27  53 122 129 132
   F   F   F   F   F   F   T   F   F   F   F   F   F   F   F   F   F   F   F   T
 . . .
  78  25  61   3  64  60  59  10  69   4  81  54  41  74  28  51
   F   F   T   T   T   F   F   T   F   T   T   T   T   F   T   F
we can see that there are no changes in any of the variables except for time, so this row is redundant. We did not cull such redundant rows from the data set because we wanted to be able to compare results based on these data directly with analyses done by others.
The data for subject 4 provide a better example.
> Quinidine[Quinidine[["Subject"]] == 4, ]
Grouped Data: conc ~ time | Subject
   Subject   time conc dose interval Age Height Weight  Race Smoke
45       4   0.00   NA  332       NA  88     66    103 Black   yes
46       4   7.00   NA  332       NA  88     66    103 Black   yes
47       4  13.00   NA  332       NA  88     66    103 Black   yes
48       4  19.00   NA  332       NA  88     66    103 Black   yes
49       4  21.50  3.1   NA       NA  88     66    103 Black   yes
50       4  85.00   NA  249        6  88     66    103 Black   yes
51       4  91.00  5.8   NA       NA  88     66    103 Black   yes
52       4  91.08   NA  249       NA  88     66    103 Black   yes
53       4  97.00   NA  249       NA  88     66    103 Black   yes
54       4 103.00   NA  249       NA  88     66    103 Black   yes
55       4 105.00   NA   NA       NA  88     66     92 Black   yes
56       4 109.00   NA  249       NA  88     66     92 Black   yes
57       4 115.00   NA  249       NA  88     66     92 Black   yes
58       4 145.00   NA  166       NA  88     66     92 Black   yes
59       4 151.00   NA  166       NA  88     66     92 Black   yes
60       4 156.00  3.1   NA       NA  88     66     92 Black   yes
61       4 157.00   NA  166       NA  88     66     92 Black   yes
62       4 163.00   NA  166       NA  88     66     92 Black   yes
63       4 169.00   NA  166       NA  88     66     92 Black   yes
64       4 174.75   NA  201       NA  88     66     92 Black   yes
65       4 177.00   NA   NA       NA  88     66     92 Black   yes
66       4 181.50  3.1   NA       NA  88     66     92 Black   yes
67       4 245.00   NA  201        8  88     66     92 Black   yes
68       4 249.00   NA   NA       NA  88     66     86 Black   yes
69       4 252.50  3.2   NA       NA  88     66     86 Black   yes
70       4 317.00   NA  201        8  88     66     86 Black   yes
71       4 326.00  1.9   NA       NA  88     66     86 Black   yes
   Ethanol  Heart Creatinine glyco
45    none Severe      >= 50  1.48
46    none Severe      >= 50  1.48
47    none Severe      >= 50  1.48
48    none Severe      >= 50  1.48
49    none Severe      >= 50  1.48
50    none Severe      >= 50  1.61
51    none Severe      >= 50  1.61
52    none Severe      >= 50  1.61
53    none Severe      >= 50  1.61
54    none Severe      >= 50  1.61
55    none Severe      >= 50  1.61
56    none Severe      >= 50  1.61
57    none Severe      >= 50  1.61
58    none Severe      >= 50  1.88
59    none Severe      >= 50  1.88
60    none Severe      >= 50  1.88
61    none Severe      >= 50  1.88
62    none Severe      >= 50  1.88
63    none Severe      >= 50  1.88
64    none Severe      >= 50  1.88
65    none Severe      >= 50  1.68
66    none Severe      >= 50  1.68
67    none Severe      >= 50  1.87
68    none Severe      >= 50  1.87
69    none Severe      >= 50  1.87
70    none Severe      >= 50  1.83
71    none Severe      >= 50  1.83
Here rows 55, 65, and 68 are in fact change rows. Both rows 55 and 68 record changes in the subject's weight. Row 65 records a change in the glyco variable (i.e., the serum concentration of alpha-1-acid glycoprotein).
Exercises
1. In Figure 3.6 (p. 113), the twelve panels in the plot of the CO2 data
were laid out as two rows of six columns. Create a plot of these
data arranged as four rows of three columns. You should insert some
space between the second and third rows to separate the panels for
the Mississippi plants from those for the Quebec plants.
2. Use the outer argument to the plot function to produce a plot of the CO2 data similar to Figure 8.15 (p. 369) where each panel displays the data for all three plants at some combination of Variety and Treatment. Produce two such plots of these data, each laid out as two rows by two columns. One plot should have the rows determined by Variety and the columns by Treatment. This arrangement makes it easy to assess the effect of Treatment within a Variety. The other plot should have rows determined by Treatment and columns by Variety, allowing easy assessment of the effect of Variety within Treatment.
4
Fitting Linear Mixed-Effects Models
where formula specifies the linear model to be fitted and data gives a data frame in which the variables in formula are to be evaluated. Several other arguments to lm are available and are described in detail in Chambers and Hastie (1992, Chapter 4) and also in Venables and Ripley (1999, Chapter 6).
The formula language used in the formula argument gives lm great flexibility in specifying linear models. The formulas use a version of the syntax defined by Wilkinson and Rogers (1973), which translates a linear model like y = β0 + β1 x1 + β2 x2 + ε into the S expression
y ~ 1 + x1 + x2
The ~ is read "is modeled as." The expression on the left-hand side of the ~ specifies the response to be modeled. The expression on the right-hand side describes the covariates used and the ways they will be combined to form the model. The expression does not include the coefficients (the βs). They are implicit.
The constant term 1 is included by default and does not need to be given
explicitly in the model. If a model without an intercept term is desired, a
-1 must be included in the formula. The covariates can be factors, ordered
factors, continuous variables, matrices, or data frames. Any function of a
covariate that returns one of these types of covariate can also be used on
the right-hand side of the formula. Function calls are also allowed on the
left-hand side of the formula. For example,
log(y) ~ exp(x1) + cos(x2)
Diagnostic plots for assessing the quality of the fit are obtained using the
method for the plot generic function
> par( mfrow=c(3,2) )
> plot( fm1Orth.lm )
FIGURE 4.1. Diagnostic plots for the simple linear regression model fit of the orthodontic growth curve data.
with Sex representing a binary variable taking values −1 for boys and 1 for girls. The parameters β1 and β3 represent, respectively, the intercept and slope gender effects. We can fit this model in S with another call to lm or by using update on the previous fitted model and redefining the formula.
> fm2Orth.lm <- update( fm1Orth.lm, formula = distance ~ Sex*age )
The expression Sex*age in a linear model formula crosses the Sex and age factors. This means it generates the main effects for these factors and their interaction. It is equivalent to Sex + age + Sex:age.
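This expansion can be verified directly from the formula itself (a quick check, not part of the original analysis):

```r
## The crossed formula generates both main effects and the interaction
attr( terms( distance ~ Sex * age ), "term.labels" )
## "Sex"     "age"     "Sex:age"
```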
The summary method displays the results in more detail. In particular, it provides information about the marginal significance of the parameter estimates.
> summary( fm2Orth.lm )
Call: lm(formula = distance ~Sex + age + Sex:age, data = Orthodont)
Residuals:
Min
1Q Median
3Q Max
-5.62 -1.32 -0.168 1.33 5.25
Coefficients:
(Intercept)
Sex
age
Sex:age
 . . .
The p-values for the Sex and Sex:age coefficients suggest that there is no gender effect on the orthodontic measurement growth. Because the t-test is only measuring the marginal significance of each term in the model, we should proceed with caution and delete one term at a time from the model. Deleting first the least significant term, Sex, we get:
> fm3Orth.lm <- update( fm2Orth.lm, formula = . ~ . - Sex )
> summary( fm3Orth.lm )
. . .
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept) 16.761      1.086   15.432    0.000
        age  0.640      0.097    6.613    0.000
    Sex:age -0.107      0.020   -5.474    0.000
 . . .
By convention, the .~. expression represents the formula in the object being updated and the - operator is used to delete terms from the model.
The Sex:age coefficient now becomes very significant, indicating that the growth patterns are different for boys and girls. Because the lm fit is not adequate for these data, we will postpone further discussion of these model-building issues until the linear mixed-effects model has been described.
The grouped nature of these data, with repeated measures on each subject at four different years, violates the basic assumption of independence
that underlies the statistical methods used in lm. Boxplots of the fm2Orth.lm
residuals by subject show this.
> bwplot( getGroups(Orthodont)~resid(fm2Orth.lm) )
# Figure 4.2
The most important feature observed in Figure 4.2 is that residuals corresponding to the same subject tend to have the same sign. This indicates the need for a "subject effect" in the model, which is precisely the motivation for linear mixed-effects models.
where the right-hand side of the formula consists of two parts separated by the | operator. The first part specifies the linear model to be fitted to each subset of data; the second part specifies the grouping factor. Any linear formula allowed in lm can also be used as a model formula with lmList. The data argument gives the data frame in which to find the variables used in formula.
Continuing with the analysis of the orthodontic data, we see from a Trellis
plot of these data (Figure 1.11, page 31) that a simple linear regression
model of distance as a function of age may be suitable. We fit this by
> fm1Orth.lis <- lmList( distance ~ age | Subject, Orthodont )
If data is a groupedData object (see Chapter 3), the grouping variable can
be omitted from formula, being extracted from the group formula in data.
> getGroupsFormula( Orthodont )
~ Subject
we can obtain the same fitted model object with the simpler call
> fm1Orth.lis <- lmList( Orthodont )
Objects returned by lmList are of class lmList, for which several display
and plot methods are available. Table 4.1 lists some of the most important
methods for class lmList. We illustrate the use of some of these methods
below.
The print method displays minimal information about the fitted object.
> fm1Orth.lis
Call:
Model: distance ~ age | Subject
Data: Orthodont
Coefficients:
    (Intercept)   age
M16       16.95 0.550
 . . .
F11       18.95 0.675
Degrees of freedom: 108 total; 54 residual
Residual standard error: 1.31
The residual standard error given in the output is the pooled estimate of the standard error calculated from the individual lm fits by group. More detailed output can be obtained using the summary method.
> summary( fm1Orth.lis )
Call:
Model: distance ~ age | Subject
Data: Orthodont
Coefficients:
   (Intercept)
    Value Std. Error t value   Pr(>|t|)
M16 16.95     3.2882 5.15484 3.6952e-06
 . . .
FIGURE 4.3. Pairs plot of the individual coefficient estimates from the fm1Orth.lis fit.
FIGURE 4.4. Pairs plot for fm2Orth.lis with ages centered at 11 years.
age - 11. The two quantities being estimated then are the distance at 11
years of age and the slope, or growth rate. We fit this revised model with
> fm2Orth.lis <- update( fm1Orth.lis, distance ~ I(age-11) )
The corresponding pairs plot (Figure 4.4) does not suggest any correlation
between the intercept estimates and the slope estimates. It is clear that the
orthodontic distance for subject M13 has grown at an unusually fast rate, but
his orthodontic distance at age 11 was about average. Both intercept and
slope estimates seem to vary with individual, but to see how significantly
they vary among subjects we need to consider the precision of the lmList
estimates. This can be evaluated with the intervals method.
> intervals( fm2Orth.lis )
, , (Intercept)
     lower   est.  upper
M16 21.687 23.000 24.313
M05 21.687 23.000 24.313
M02 22.062 23.375 24.688
 . . .
F04 23.562 24.875 26.188
F11 25.062 26.375 27.688

, , I(age - 11)
        lower  est.  upper
M16 -0.037297 0.550 1.1373
M05  0.262703 0.850 1.4373
 . . .
FIGURE 4.5. Ninety-five percent confidence intervals on intercept and slope for each subject in the orthodontic distance growth data.
FIGURE 4.7. Ninety-five percent confidence intervals on intercept and slope for each lot in the IGF data.
   (Intercept)        age
1       5.4929 -0.0077901
10      6.0516 -0.0473282
2       5.4764 -0.0144271
8       5.5922  0.0060638
5       5.3732 -0.0095140
4       5.5768 -0.0166578
3       5.2788  0.0100830
7       5.2069  0.0093136
Because of the imbalance in the data, these confidence intervals have very different lengths. There is little indication of lot-to-lot variation in either the intercept or the slope estimates, since all confidence intervals overlap. A fixed-effects model seems adequate to represent the IGF data.
> fm1IGF.lm <- lm( conc ~ age, data = IGF )
> summary( fm1IGF.lm )
. . .
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept)  5.351      0.104   51.584    0.000
        age -0.001      0.004   -0.170    0.865

Residual standard error: 0.833 on 235 degrees of freedom
. . .
There does not appear to be a significant tracer decay within the 50-day
period over which the data were collected. A linear mixed-eects model for
the IGF data will be considered in the next section.
The first argument is a two-sided linear formula specifying the fixed effects in the model. The third argument is typically given as a one-sided linear formula, specifying the random effects and the grouping structure in the
model. For the orthodontic data, with the ages centered at 11 years, these
formulas are:
fixed = distance ~ I(age-11), random = ~ I(age-11) | Subject
Note that the response is specified only in the fixed formula. If the random formula is omitted, its default value is taken as the right-hand side of the fixed formula. This describes a model in which every fixed effect has an associated random effect. To use this default, data must be a groupedData
object, so the formula for the grouping structure can be obtained from the
display formula.
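For example, with the Orthodont data stored as a groupedData object grouped by Subject, the following two calls describe the same model (a sketch restating the default just described):

```r
library( nlme )
## random omitted: defaults to the right-hand side of fixed, with the
## grouping factor taken from the groupedData display formula
fm <- lme( distance ~ I(age - 11), data = Orthodont )
## equivalent explicit form
fm <- lme( distance ~ I(age - 11), data = Orthodont,
           random = ~ I(age - 11) | Subject )
```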
The argument data specifies a data frame in which the variables named in fixed and random can be evaluated. When data inherits from class groupedData, the expression defining the grouping structure can be omitted in random.
A simple call to lme to fit the orthodontic data model is
> fm1Orth.lme <- lme( distance ~ I(age-11), data = Orthodont,
+      random = ~ I(age-11) | Subject )
Because the lme function is generic, the model can be described in several different ways. For example, there is an lme method for lmList objects. When an lmList object, such as fm2Orth.lis in §4.1.1, is given as the first argument to lme, it provides default values for fixed, random, and data. We can create the same fitted model with the simple call
> fm1Orth.lme <- lme( fm2Orth.lis )
One advantage of this method is that initial estimates for the parameters in the profiled (restricted-)likelihood of the mixed-effects model are automatically calculated from the lmList object.
The fitted object is of the lme class, for which several methods are available to display, plot, update, and further explore the estimation results. Table 4.2 lists the most important methods for class lme. We illustrate the use of these methods through the examples in the next sections.
Orthodontic Growth Curve
As for all the classes of objects representing fitted models, the print method for the lme class returns a brief description of the estimation results. It prints the estimates of the standard deviations and the correlations of the random effects, the within-group standard error, and the fixed effects. For the fm1Orth.lme object it gives
> fm1Orth.lme
Linear mixed-effects model fit by REML
Data: Orthodont
Log-restricted-likelihood: -221.32
Fixed: distance ~ I(age - 11)
(Intercept) I(age - 11)
     24.023     0.66019
Random effects:
Formula: ~ I(age - 11) | Subject
Structure: General positive-definite
             StdDev  Corr
(Intercept) 2.13433 (Inter
I(age - 11) 0.22643 0.503
   Residual 1.31004
Number of Observations: 108
Number of Groups: 27
One of the questions of interest for the orthodontic growth data is whether boys and girls have different growth patterns. We can assess this by fitting the model
> fm2Orth.lme <- update(fm1Orth.lme,fixed = distance~Sex*I(age-11))
Note that lmList cannot be used to test for gender differences in the orthodontic growth data, as it estimates individual coefficients for each subject. In general, we will not be able to use lmList to test for differences due to factors that are invariant with respect to the groups.
Some more detailed output is supplied by summary.
> summary( fm2Orth.lme )
Linear mixed-effects model fit by REML
Data: Orthodont
   AIC    BIC  logLik
451.35 472.51 -217.68
Random effects:
Formula: ~ I(age - 11) | Subject
Structure: General positive-definite
             StdDev  Corr
(Intercept) 1.83033 (Inter
I(age - 11) 0.18035 0.206
   Residual 1.31004
Fixed effects: distance ~ Sex + I(age - 11) + Sex:I(age - 11)
(Intercept)
Sex
I(age - 11)
Sex:I(age - 11)
 . . .
 Correlation:
                 (Intrc    Sex I(-11)
            Sex   0.185
    I(age - 11)   0.102  0.019
Sex:I(age - 11)   0.019  0.102  0.185
Standardized Within-Group Residuals:
     Min       Q1       Med      Q3    Max
 -3.1681 -0.38594 0.0071041 0.44515 3.8495
Number of Observations: 108
Number of Groups: 27
The small p-values associated with Sex and Sex:I(age-11) in the summary output indicate that boys and girls have significantly different orthodontic growth patterns.
The fitted method is used to extract the fitted values from the lme object, using the methodology described in §1.4.2. By default, the within-group fitted values, that is, the fitted values corresponding to the individual coefficient estimates, are produced. Population fitted values, based on the fixed-effects estimates alone, are obtained by setting the level argument to 0 (zero). Both types of fitted values can be simultaneously obtained with
> fitted( fm2Orth.lme, level = 0:1 )
fixed Subject
1 22.616 24.846
2 24.184 26.576
3 25.753 28.307
. . .
Residuals are extracted with the resid method, which also takes a level
argument.
> resid( fm2Orth.lme, level = 1 )
    M01     M01     M01     M01     M02      M02    M02     M02
 1.1543 -1.5765 0.69274 0.96198 0.22522 -0.29641 -1.318 0.66034
 . . .
    F10     F10      F10      F10     F11      F11    F11     F11
-1.2233 0.44296 -0.39073 -0.72443 0.28277 -0.37929 1.4587 0.29661
attr(, "label"):
[1] "Residuals (mm)"
Predicted values are obtained with the predict method. For example, to
predict the orthodontic distance for boy M11 and girl F03 at ages 16, 17, and 18, we first define a data frame with the relevant information
> newOrth <- data.frame( Subject = rep(c("M11","F03"), c(3, 3)),
+       Sex = rep(c("Male", "Female"), c(3, 3)),
+       age = rep(16:18, 2) )
> compOrth <- compareFits( coef(fm2Orth.lis), coef(fm1Orth.lme) )
> compOrth
 . . .
, , I(age - 11)
    coef(fm2Orth.lis) coef(fm1Orth.lme)
M16             0.550           0.59133
M05             0.850           0.68579
M02             0.775           0.67469
 . . .
M13             1.950           1.07385
 . . .
F04             0.475           0.63032
F11             0.675           0.74338
FIGURE 4.8. Individual estimates from an lmList fit and from an lme fit of the orthodontic distance growth data.
> plot( compOrth, mark = fixef(fm1Orth.lme) )
# Figure 4.8
The mark argument to the plot method indicates points on the horizontal axis where dashed vertical lines should be drawn.
The plots in Figure 4.8 indicate that the individual estimates from the lme fit tend to be "pulled toward" the fixed-effects estimates, when compared to the lmList estimates. This is typical of linear mixed-effects estimation: the individual coefficient estimates from the lme fit represent a compromise between the coefficients from the individual fits corresponding to the lmList fit and the fixed-effects estimates, associated with the population averages. For this reason, these estimates are often called "shrinkage estimates," in the sense that they shrink the individual estimates toward the population average.
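For a balanced random-intercept model this compromise has a simple closed form: the predicted group intercept is the population mean plus the group's raw deviation scaled by λ = σ_b²/(σ_b² + σ²/nᵢ). The sketch below (illustrative only; not how lme computes its estimates, and the function name is hypothetical) uses standard deviations of the size estimated for the orthodontic data:

```r
## Shrinkage of a group mean toward the population mean: lambda lies
## between 0 and 1, so the group estimate is pulled toward mu.
shrunkMean <- function(ybar.i, mu, n.i, sigma.b, sigma) {
  lambda <- sigma.b^2 / (sigma.b^2 + sigma^2 / n.i)
  mu + lambda * (ybar.i - mu)
}
shrunkMean( ybar.i = 28, mu = 24, n.i = 4, sigma.b = 2.13, sigma = 1.31 )
## about 27.65: only mildly shrunk, because the between-subject
## standard deviation dominates the within-subject standard error
```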
The shrinkage toward the fixed effects is particularly noticeable for the slope estimate of subject M13. As pointed out in §4.2, this subject has an outlying orthodontic growth pattern, which leads to an abnormally high estimated slope in the lm fit. The pooling of subjects in the lme estimation gives a certain amount of robustness to individual outlying behavior. This feature is better illustrated by the comparison of the predicted values from the two fits, which is obtained with the comparePred function.
> plot( comparePred(fm2Orth.lis, fm1Orth.lme, length.out = 2) )  # Figure 4.9
FIGURE 4.9. Individual predicted values from separate lm fits and from an lme fit of the orthodontic distance growth data.
The length.out argument specifies the number of predictions for each fitted object. In this case, because the model is a straight line, only two points are needed. The plot of the individual predictions for the lmList and lme fits, shown in Figure 4.9, clearly indicates the greater sensitivity of the individual lm fits to extreme observations.
It is also interesting to compare the fm2Orth.lme and the fm2Orth.lmeM objects, corresponding, respectively, to REML and ML fits of the same model. We compare the estimated random effects for each fit with the compareFits function.
> plot( compareFits(ranef(fm2Orth.lme), ranef(fm2Orth.lmeM)),
+      mark = c(0, 0) )
# Figure 4.10
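The ML fit fm2Orth.lmeM used here is not defined in this excerpt; a refit of this kind is obtained by switching the estimation method (a sketch under that assumption):

```r
## Refit the REML model by maximum likelihood
fm2Orth.lmeM <- update( fm2Orth.lme, method = "ML" )
```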
FIGURE 4.10. Individual random-effects estimates from restricted maximum likelihood (REML) and maximum likelihood (ML) fits of the orthodontic distance growth data.
The pointwise estimates of the fixed effects are almost identical, but their standard errors are quite different. The lm fit has smaller standard errors for the (Intercept) and Sex fixed effects and larger standard errors for the fixed effects involving age. This is because the model used in lm ignores the group structure of the data and incorrectly combines the between-group and the within-group variation in the residual standard error. Fixed effects that are associated with invariant factors (factors that do not vary within groups) are actually estimated with less precision than suggested by the lm output, because the contribution of the between-group variation to their standard error is larger than that included in the lm residual standard error. Conversely, the precision of the fixed effects related to variables that vary within a group is less affected by the between-group variation. In the terminology of split-plot experiments, (Intercept) and Sex are associated
In this case, as evidenced by the low p-value for the likelihood ratio test,
the linear mixed-effects model provides a much better description of the
data than the linear regression model.
Radioimmunoassays of IGF-I
The linear mixed-effects model corresponding to the simple linear regression
of the estimated concentration of IGF-I (yij ) in the jth tracer sample within
the ith lot on the age of the tracer sample (xij ) is
yij = (β0 + b0i) + (β1 + b1i) xij + εij,
bi = (b0i, b1i)ᵀ ~ N(0, Ψ),    εij ~ N(0, σ²),                    (4.2)
where β0 and β1 are, respectively, the fixed effects for the intercept and the slope; the bi are the random-effects vectors, assumed to be independent for different lots; and the εij are the independent, identically distributed within-group errors, assumed to be independent of the random effects.
We fit the linear mixed-effects model (4.2) with
> fm1IGF.lme <- lme( fm1IGF.lis )
> fm1IGF.lme
Linear mixed-effects model fit by REML
Data: IGF
Log-restricted-likelihood: -297.18
Fixed: conc ~ age
(Intercept)        age
      5.375 -0.0025337
Random effects:
Formula: ~ age | Lot
Structure: General positive-definite
             StdDev    Corr
(Intercept) 0.0823594 (Inter
        age 0.0080862 -1
   Residual 0.8206310
Number of Observations: 237
Number of Groups: 10
The fixed-effects estimates are similar to the ones obtained with the single lm fit of §4.1.1. The within-group standard errors are also similar in the two fits, which suggests that not much is gained by incorporating random effects into the model. The estimated correlation between the random effects (−1) gives a clear indication that the estimated random-effects covariance matrix is ill-conditioned, suggesting that the model may be overparameterized. The confidence intervals for the standard deviations and correlation coefficient reinforce the indication of overparameterization.
> intervals( fm1IGF.lme )
Approximate 95% confidence intervals
 Fixed effects:
                 lower       est.     upper
(Intercept)  5.163178  5.3749606 5.5867427
        age -0.012471 -0.0025337 0.0074039

 Random Effects:
  Level: Lot
                          lower      est.    upper
     sd((Intercept))  0.0011710 0.0823594 5.792715
             sd(age)  0.0013177 0.0080862 0.049623
cor((Intercept),age) -1.0000000 -0.9999640 1.000000

 Within-group standard error:
  lower    est.   upper
 0.7212 0.82063 0.93377
The 95% confidence interval for the correlation coefficient covers all possible values for this parameter. There is also evidence of large variability in the estimates of the (Intercept) and age standard deviations. These issues are explored in more detail in §4.3.
The primary question of interest in the IGF-I study is whether the tracer
decays with age. We can investigate it with the summary method.
> summary( fm1IGF.lme )
. . .
Fixed effects: conc ~ age
               Value Std.Error  DF t-value p-value
(Intercept)  5.3750    0.10748 226  50.011  <.0001
        age -0.0025    0.00504 226  -0.502  0.6159
. . .
FIGURE 4.11. Individual estimates of (Intercept) and age from the fm1IGF.lis and fm1IGF.lme fits, by lot.
toward the population mean pattern for the individual lme coefficients. The IGF-I data contain several outlying observations, and the dramatic shrinkage in the coefficient estimates observed for some of the lots reflects the greater robustness of the lme fit. This is better illustrated by comparing the individual predictions under each fit, as presented in Figure 4.12. The differences in the predicted values for the two fits are particularly dramatic for lots 6 and 10, both of which have observations only over a very limited time range. For lot 6, a single low observation at one of the earliest times causes a dramatic change in the estimates of both the slope and intercept when this lot is fit by itself. When it is combined with the other lots in a mixed-effects model, the effect of this single observation is diminished. Also notice that the outlying observations for lots 4, 3, and 7 have very little effect on the parameter estimates, because in each of these lots there are several other observations at times both above and below the times of the aberrant observations.
FIGURE 4.12. Individual predicted values from separate lm fits and from an lme fit of the IGF-I radioimmunoassay data.
pdBlocked    block-diagonal
pdCompSymm   compound-symmetry structure
pdDiag       diagonal
pdIdent      multiple of an identity
pdSymm       general positive-definite matrix
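For reference, each structure in the list above corresponds to a pdMat constructor in the nlme library. The calls below are a minimal sketch; the formula ~ age is just an illustrative random-effects model, not a specific fit from this chapter:

```r
## pdMat constructors corresponding to the variance-covariance structures
## above; ~ age is an arbitrary illustrative one-sided formula.
library(nlme)

pdBlocked(list(pdIdent(~ 1), pdIdent(~ age - 1)))  # block-diagonal
pdCompSymm(~ age)                                  # compound symmetry
pdDiag(~ age)                                      # diagonal
pdIdent(~ age)                                     # multiple of an identity
pdSymm(~ age)                                      # general positive-definite
```

Each call returns an uninitialized pdMat object that lme initializes from the data when the model is fit.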
With the exception of the standard deviation for the (Intercept) random effect, all estimates are similar to the ones in fm1IGF.lme. We can compare the two fits with
> anova( fm1IGF.lme, fm2IGF.lme )
           Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm1IGF.lme     1  6 606.37 627.12 -297.18
fm2IGF.lme     2  5 604.80 622.10 -297.40 1 vs 2 0.43436  0.5099
The large p-value for the likelihood ratio test and the smaller AIC and BIC values for the simpler model fm2IGF.lme indicate that it should be preferred.
Because IGF is a groupedData object, the grouping structure does not need to be given explicitly in random. When both the grouping structure and a pdMat class are to be declared in random, we use a named list, with the name specifying the grouping factor.
> update( fm1IGF.lme, random = list(Lot = pdDiag(~ age)) )
> pd2
Positive definite matrix structure of class pdDiag representing
     [,1] [,2]
[1,]    1    0
[2,]    0    1
> formula( pd2 )
~ age
This can be used to provide initial values for the scaled variance-covariance matrix of the random effects, $D/\sigma^2$, in the lme call.
> lme( conc ~ age, IGF, pdDiag(diag(2), ~age) )
(4.3)
where $i$ indexes the Blocks, $j$ indexes the Varieties, and $k$ indexes the Nitrogen concentrations $N_k$. The intercept is represented by $\beta_0$, the Nitrogen slope by $\beta_1$, and the yield by $y_{ijk}$. The $b_i$ denote the Block random effects, the $b_{i,j}$ denote the Variety-within-Block random effects, and the $\epsilon_{ijk}$ denote the within-group errors. This is an example of a two-level mixed-effects model, with the $b_{i,j}$ random effects nested within the $b_i$ random effects.
The multilevel model capabilities of lme were used in §1.6 to fit (4.3). We recommend fitting the model this way, as it uses efficient computational algorithms designed specifically for this type of model. Nevertheless, to further illustrate the use of the pdMat classes, we consider equivalent single-level representations of the same model.
By defining
$$
y_i = \begin{bmatrix} y_{i11} \\ y_{i12} \\ \vdots \\ y_{i34} \end{bmatrix}, \quad
\epsilon_i = \begin{bmatrix} \epsilon_{i11} \\ \epsilon_{i12} \\ \vdots \\ \epsilon_{i34} \end{bmatrix}, \quad
X_i = \mathbf{1}_3 \otimes \begin{bmatrix} 1 & N_1 \\ 1 & N_2 \\ 1 & N_3 \\ 1 & N_4 \end{bmatrix}, \quad
b_i^{*} = \begin{bmatrix} b_i + b_{i,1} \\ b_i + b_{i,2} \\ b_i + b_{i,3} \end{bmatrix}, \quad
Z_i = I_3 \otimes \mathbf{1}_4,
$$
with $\otimes$ denoting the Kronecker product, we can rewrite (4.3) as the single-level model
$$
y_i = X_i \beta + Z_i b_i^{*} + \epsilon_i, \qquad b_i^{*} \sim N(0, \Psi), \qquad \epsilon_i \sim N\!\left(0, \sigma^2 I\right), \tag{4.4}
$$
where
$$
\Psi = \begin{bmatrix}
\sigma_1^2 + \sigma_2^2 & \sigma_1^2 & \sigma_1^2 \\
\sigma_1^2 & \sigma_1^2 + \sigma_2^2 & \sigma_1^2 \\
\sigma_1^2 & \sigma_1^2 & \sigma_1^2 + \sigma_2^2
\end{bmatrix}.
$$
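The compound-symmetry structure of $\Psi$ can be checked numerically. The variance values below are arbitrary illustrative choices, not estimates from the Oats fit:

```r
## Marginal covariance of b*_i = (b_i + b_{i,1}, b_i + b_{i,2}, b_i + b_{i,3}),
## with Var(b_i) = s1sq and Var(b_{i,j}) = s2sq (illustrative values only):
s1sq <- 2.0
s2sq <- 0.5
Psi <- s1sq * matrix(1, 3, 3) + s2sq * diag(3)  # sigma1^2 * J_3 + sigma2^2 * I_3
Psi
```

Every diagonal entry equals $\sigma_1^2 + \sigma_2^2$ and every off-diagonal entry equals $\sigma_1^2$, matching the matrix displayed above.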
$\rho = \sigma_1^2 / (\sigma_1^2 + \sigma_2^2)$. Therefore, $\sigma_1^2 = \rho \sigma_b^2$ and $\sigma_2^2 = \sigma_b^2 - \sigma_1^2$. We can then derive the REML estimates of $\sigma_1$ and $\sigma_2$ from fm4OatsB
Defining
$$
\tilde b_i = \begin{bmatrix} b_i \\ b_{i,1} \\ b_{i,2} \\ b_{i,3} \end{bmatrix}, \quad
W_i = \begin{bmatrix} \mathbf{1}_{12} & I_3 \otimes \mathbf{1}_4 \end{bmatrix}, \quad
\tilde\Psi = \begin{bmatrix}
\sigma_1^2 & 0 & 0 & 0 \\
0 & \sigma_2^2 & 0 & 0 \\
0 & 0 & \sigma_2^2 & 0 \\
0 & 0 & 0 & \sigma_2^2
\end{bmatrix},
$$
and writing
$$
y_i = X_i \beta + W_i \tilde b_i + \epsilon_i, \qquad \tilde b_i \sim N(0, \tilde\Psi), \qquad \epsilon_i \sim N\!\left(0, \sigma^2 I\right). \tag{4.5}
$$
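The Kronecker-product design matrices above can be sketched numerically. The nitrogen concentrations used here (0, 0.2, 0.4, 0.6 cwt, the values in the Oats data) are cited only for illustration:

```r
## Single-block design matrices for the single-level Oats representation:
N  <- c(0, 0.2, 0.4, 0.6)                # nitrogen concentrations (assumed)
Xi <- kronecker(rep(1, 3), cbind(1, N))  # 12 x 2: 1_3 (x) [1, N]
Zi <- kronecker(diag(3), rep(1, 4))      # 12 x 3: I_3 (x) 1_4
Wi <- cbind(1, Zi)                       # 12 x 4: [1_12, Z_i]
```

Each row of Zi selects exactly one variety-within-block random effect, while the leading column of ones in Wi attaches the common block effect to every observation.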
        (Intercept)
StdDev:      14.505
[Figure 4.13: log-optical density versus dilution for samples a-f, by Block.]
measure of the number of cells that are alive and healthy. These data are described in detail in Appendix A.2 and are included in the nlme library as the groupedData object Assay.
The plot of the log-optical densities, displayed in Figure 4.13, indicates that the response increases with dilution and is generally lower for treatments a and e. There do not appear to be any interactions between sample and dilution.
A full factorial model is used to represent the fixed effects, and three random effects are used to account for block, row, and column effects, with the last two random effects nested within block, but crossed with each other. The corresponding mixed-effects model for the log-optical density $y_{ijk}$ in the $j$th row, $k$th column of the $i$th block, for $i = 1, 2$, $j = 1, \dots, 6$, $k = 1, \dots, 5$, is
$$
y_{ijk} = \mu + \tau_j + \gamma_k + (\tau\gamma)_{jk} + b_i + r_{ij} + c_{ik} + \epsilon_{ijk},
$$
$$
b_i \sim N\!\left(0, \sigma_1^2\right), \quad r_{ij} \sim N\!\left(0, \sigma_2^2\right), \quad c_{ik} \sim N\!\left(0, \sigma_3^2\right), \quad \epsilon_{ijk} \sim N\!\left(0, \sigma^2\right). \tag{4.6}
$$
The fixed effects in (4.6) are $\mu$, the grand mean, $\tau_j$ and $\gamma_k$, the sample and dilution main effects, and $(\tau\gamma)_{jk}$, the sample-dilution interaction. To ensure identifiability of the fixed effects, the convention $\tau_1 = \gamma_1 = (\tau\gamma)_{1k} = (\tau\gamma)_{j1} = 0$ is adopted.
$$
\operatorname{Var}(\tilde b_i) = \begin{bmatrix}
\sigma_1^2 & 0 & 0 \\
0 & \sigma_2^2 I & 0 \\
0 & 0 & \sigma_3^2 I
\end{bmatrix}.
$$
That is, $\tilde b_i$ has a block-diagonal variance-covariance matrix, with diagonal blocks given by multiples of the identity matrix. This type of variance-covariance structure is represented in S by a pdBlocked object with pdIdent elements. We fit the linear mixed-effects model (4.6) with lme as
> ## establishing the desired parameterization for contrasts
> options( contrasts = c("contr.treatment", "contr.poly") )
> fm1Assay <- lme( logDens ~ sample * dilut, Assay,
+    random = pdBlocked(list(pdIdent(~ 1), pdIdent(~ sample - 1),
+                            pdIdent(~ dilut - 1))) )
> fm1Assay
Linear mixed-effects model fit by REML
  Data: Assay
  Log-restricted-likelihood: 38.536
  Fixed: logDens ~ sample * dilut
 (Intercept)  sampleb samplec sampled   samplee  samplef  dilut2
    -0.18279 0.080753 0.13398  0.2077 -0.023672 0.073569 0.20443
  dilut3  dilut4  dilut5 samplebdilut2 samplecdilut2
 0.40586 0.57319 0.72064     0.0089389    -0.0084953
 sampleddilut2 sampleedilut2 samplefdilut2 samplebdilut3
     0.0010793     -0.041918      0.019352     -0.025066
 samplecdilut3 sampleddilut3 sampleedilut3 samplefdilut3
      0.018645     0.0039886     -0.027713      0.054316
 samplebdilut4 samplecdilut4 sampleddilut4 sampleedilut4
      0.060789     0.0052598     -0.016486      0.049799
 samplefdilut4 samplebdilut5 samplecdilut5 sampleddilut5
      0.063372     -0.045762     -0.072598      -0.17776
 sampleedilut5 samplefdilut5
      0.013611     0.0040234

Random effects:
 Composite Structure: Blocked

 Block 1: (Intercept)
 Formula:  ~ 1 | Block
        (Intercept)
StdDev:   0.0098084
(4.7)
where the lot random effects $b_i$ are assumed to be independent for different $i$, the wafer-within-lot random effects $b_{i,j}$ are assumed to be independent for different $i$ and $j$ and to be independent of the $b_i$, and the within-group errors $\epsilon_{ijk}$ are assumed to be independent for different $i$, $j$, and $k$ and to be independent of the random effects.
The most general form of the argument random, when lme is used to fit a multilevel model, is a named list in which the names define the grouping factors and the formulas describe the random-effects models at each level. The order of nesting is taken to be the order of the elements in the list, with the outermost level appearing first. In the case of (4.7) we write

random = list( Lot = ~ 1, Wafer = ~ 1 )

When the random-effects formulas are the same for all levels of grouping, we can replace the named list by a single one-sided formula specifying the common random-effects model.
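For the Oxide example used below, the two forms can be sketched as follows, assuming the Oxide data from the nlme library; both describe intercept random effects for Lot and for Wafer within Lot:

```r
library(nlme)

## Named-list form: names give the grouping factors, outermost first.
fmA <- lme(Thickness ~ 1, data = Oxide,
           random = list(Lot = ~ 1, Wafer = ~ 1))

## Equivalent one-sided formula form with the common model ~ 1:
fmB <- lme(Thickness ~ 1, data = Oxide, random = ~ 1 | Lot/Wafer)
```

Both calls converge to the same REML fit, so the choice is purely one of notational convenience.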
[Figure 4.14: oxide layer thickness by Lot.]
The REML estimates of the variance components are the squares of the standard deviations in the fm1Oxide output: $\hat\sigma_1^2 = 11.398^2 = 129.91$, $\hat\sigma_2^2 = 5.9888^2 = 35.866$, and $\hat\sigma^2 = 3.5453^2 = 12.569$. We can assess the variability in these estimates with the intervals method.
> intervals( fm1Oxide, which = "var-cov" )
Approximate 95% confidence intervals

 Random Effects:
  Level: Lot
                  lower   est.  upper
sd((Intercept))  5.0277 11.398 25.838
  Level: Wafer
                  lower   est.  upper
sd((Intercept))  3.4615 5.9888 10.361

 Within-group standard error:
  lower   est.  upper
 2.6719 3.5453 4.7044
All intervals are bounded well away from zero, indicating that the two random effects should be kept in (4.7). We can test, for example, whether the wafer-within-lot random effect can be eliminated from the model with

> fm2Oxide <- update( fm1Oxide, random = ~ 1 | Lot)
> anova( fm1Oxide, fm2Oxide )
         Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm1Oxide     1  4 462.02 471.07 -227.01
fm2Oxide     2  3 497.13 503.92 -245.57 1 vs 2   37.11  <.0001
The very high value of the likelihood ratio test statistic confirms the significance of that term in the model.
As with single-level fits, estimated BLUPs of the individual coefficients are obtained using coef but, because of the multiple grouping levels, a level argument is used to specify the desired grouping level. For example, to get the estimated average oxide layer thicknesses by lot, we use
> coef( fm1Oxide, level = 1 )
  (Intercept)
1      1996.7
2      1988.9
. . .
while the estimated average thicknesses per wafer are obtained with

> coef( fm1Oxide, level = 2 )
    (Intercept)
1/1      2003.2
1/2      1984.7
1/3      2001.1
. . .
8/2      1995.2
8/3      1990.7
The level argument is used similarly with the methods fitted, predict, ranef, and resid, with the difference that multiple levels can be specified simultaneously. For example, to get the estimated random effects at both grouping levels, we use
> ranef( fm1Oxide, level = 1:2 )
Level: Lot
  (Intercept)
1    -3.46347
2   -11.22164
. . .
8    -6.38538

Level: Wafer %in% Lot
    (Intercept)
1/1    6.545993
1/2  -11.958939
. . .
8/3   -3.074863
These methods are further illustrated in §4.3, where we describe tools for assessing the adequacy of fitted models.

Manufacturing of Analog MOS Circuits

The Wafer data, introduced in §3.3.4 and shown in Figure 4.15, provide another example of multilevel data in IC manufacturing and are used here to illustrate the capabilities of lme when covariates are used in a multilevel model. As described in Appendix A.30, these data come from an experiment conducted at the Microelectronics Division of Lucent Technologies to study different sources of variability in the manufacturing of analog MOS circuits. The intensity of current (in mA) at 0.8, 1.2, 1.6, 2.0, and 2.4
FIGURE 4.15. Current versus voltage curves for each site, by wafer.
$$
b_i = \begin{bmatrix} b_{0i} \\ b_{1i} \\ b_{2i} \end{bmatrix} \sim N(0, \Psi_1), \quad
b_{i,j} = \begin{bmatrix} b_{0i,j} \\ b_{1i,j} \\ b_{2i,j} \end{bmatrix} \sim N(0, \Psi_2), \quad
\epsilon_{ijk} \sim N\!\left(0, \sigma^2\right). \tag{4.8}
$$
The parameters $\beta_0$, $\beta_1$, and $\beta_2$ are the fixed effects in the quadratic model, $b_i$ is the wafer-level random-effects vector, $b_{i,j}$ is the site-within-wafer-level random-effects vector, and $\epsilon_{ijk}$ is the within-group error. As usual, the $b_i$ are assumed to be independent for different $i$, the $b_{i,j}$ are assumed to be independent for different $i, j$ and independent of the $b_i$, and the $\epsilon_{ijk}$ are assumed to be independent for different $i, j, k$ and independent of the random effects.
in the lme call to produce fm1Wafer. In this case, the object specified in random is repeated for all levels of grouping.
The very small estimated standard deviations for the (Intercept) random effect at both levels of grouping, and for the voltage^2 random effect at the Site %in% Wafer level, suggest that these terms could be eliminated from (4.8). Before pursuing this any further, we should assess the adequacy of the fitted model. This is considered in detail in §4.3, which reveals that important terms are omitted from the fixed-effects model in (4.8). We therefore postpone this discussion until §4.3 and proceed with the analysis of fm1Wafer in this section to further illustrate the use of lme methods with multilevel objects.
As with single-level objects, the fitted method is used to extract the fitted values, with the level argument being used to specify the desired level(s) of grouping. For example, to get the population-level fitted values, we use
> fitted( fm1Wafer, level = 0 )
     1      1      1      1      1      1      1      1      1
1.0106 4.3083 7.9805 12.027 16.448 1.0106 4.3083 7.9805 12.027
. . .
    10     10     10     10
4.3083 7.9805 12.027 16.448
attr(, "label"):
[1] "Fitted values (mA)"
Similarly, residuals are extracted using the resid method. For example, the Wafer and Site %in% Wafer residuals are obtained with
> resid( fm1Wafer, level = 1:2 )
         Wafer       Site
1    0.0615008  0.0680629
2   -0.1898559 -0.1800129
. . .
399  0.0051645  0.1187074
400 -0.2076543 -0.0714028
. . .
  Wafer predict.fixed predict.Wafer
2     1        7.0273        6.7207
3     1       23.7826       23.2314
4     1       30.5381       29.9192
Note that, because no predictions were desired at the Site %in% Wafer level, Site did not need to be specified in newWafer. If we are interested in getting predictions for a specific site, say 3, within Wafer 1, we can use

> newWafer2 <- data.frame( Wafer = rep(1, 4), Site = rep(3, 4),
+                          voltage = c(1, 1.5, 3, 3.5) )
> predict( fm1Wafer, newWafer2, level = 0:2 )
  Wafer Site predict.fixed predict.Wafer predict.Site
1     1  1/3        2.6126        2.4014       2.4319
2     1  1/3        7.0273        6.7207       6.7666
3     1  1/3       23.7826       23.2314      23.3231
4     1  1/3       30.5381       29.9192      30.0261
# Figure 4.16
[Figure 4.16: boxplots of residuals (mm) by Subject.]
($\operatorname{Var}(\epsilon_{ij}) = \sigma^2$), and are independent of the group levels. Figure 4.16 indicates that the residuals are centered at zero, but that the variability changes with group. Because there are only four observations per subject, we cannot rely too much on the individual boxplots for inference about the within-group variances. We observe an outlying observation for subject M13 and large residuals for subject M09. A pattern suggested by the individual boxplots is that there is more variability among boys (the lower 16 boxplots) than among girls (the upper 11 boxplots). We can get a better feeling for this pattern by examining the plot of the standardized residuals versus fitted values by gender, shown in Figure 4.17.

> plot( fm2Orth.lme, resid(., type = "p") ~ fitted(.) | Sex,
+       id = 0.05, adj = -0.3 )
# Figure 4.17

The type = "p" argument to the resid method specifies that the standardized residuals should be used. The id argument specifies a critical value for identifying observations in the plot: standardized residuals greater than the 1 - id/2 standard normal quantile in absolute value are identified in the plot. By default, the group labels are used to identify the observations. The argument adj controls the position of the identifying labels. It is clear from Figure 4.17 that the variability in the orthodontic distance measurements is greater among boys than among girls. Within each gender the variability seems to be constant. The outlying observations for subjects M09 and M13 are evident.
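The cutoff implied by the id argument can be computed directly; for id = 0.05 it is the familiar two-sided normal critical value:

```r
## Observations with |standardized residual| above the 1 - id/2 standard
## normal quantile are labeled; for id = 0.05 this is the usual 1.96:
qnorm(1 - 0.05/2)
```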
FIGURE 4.17. Scatter plots of standardized residuals versus fitted values for fm2Orth.lme by gender.
FIGURE 4.18. Scatter plots of standardized residuals versus fitted values for the heteroscedastic fit fm3Orth.lme by gender.
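The heteroscedastic fit discussed below can be sketched as follows. This assumes the Orthodont data from the nlme library; the fixed- and random-effects formulas reconstruct the chapter's fm2Orth.lme model and may differ in detail from the original:

```r
library(nlme)

## Homoscedastic model: linear growth in age (centered at 11), by Sex
## (a reconstruction of fm2Orth.lme, assumed from context).
fm2Orth.lme <- lme(distance ~ Sex * I(age - 11), data = Orthodont,
                   random = ~ I(age - 11) | Subject)

## varIdent gives each Sex stratum its own error variance.
fm3Orth.lme <- update(fm2Orth.lme, weights = varIdent(form = ~ 1 | Sex))
```

The heteroscedastic model has the same fixed effects, so its restricted log-likelihood is directly comparable to that of the homoscedastic fit.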
The parameters for varIdent give the ratio of the stratum standard errors to the within-group standard error. To ensure identifiability of the parameters, the within-group standard error is set equal to the first stratum standard error. For the orthodontic data, the standard error for the girls is about 41% of that for the boys. The remaining estimates are very similar to the ones in the homoscedastic fit fm2Orth.lme. We can assess the adequacy of the heteroscedastic fit by re-examining plots of the standardized residuals versus the fitted values by gender, shown in Figure 4.18. The standardized residuals in each gender now have about the same variability. We can still identify the outlying observations corresponding to subjects M09 and M13. Overall, the standardized residuals are small, suggesting that the linear mixed-effects model was successful in explaining the orthodontic growth curves. This is better seen by looking at a plot of the observed responses versus the within-group fitted values.

> plot( fm3Orth.lme, distance ~ fitted(.),
+       id = 0.05, adj = -0.3 )
# Figure 4.19

The fm3Orth.lme fitted values are in close agreement with the observed orthodontic distances, except for the three extreme observations on subjects M09 and M13.
The need for a heteroscedastic model for the orthodontic growth data can be formally tested with the anova method.
> anova( fm2Orth.lme, fm3Orth.lme )
            Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm2Orth.lme     1  8 451.35 472.51 -217.68
fm3Orth.lme     2  9 432.30 456.09 -207.15 1 vs 2  21.059  <.0001
[Figure 4.19: observed distances versus fitted values for the fm3Orth.lme fit.]
The very small p-value of the likelihood ratio statistic confirms that the heteroscedastic model explains the data significantly better than the homoscedastic model.
The assumption of normality for the within-group errors can be assessed with the normal probability plot of the residuals, produced by the qqnorm method. A typical call to qqnorm is of the form

qqnorm( object, formula )
# Figure 4.20

Once again, we observe the three outlying points, but for the rest of the observations the normality assumption seems plausible.

Radioimmunoassays of IGF-I

We initially consider the plot of the standardized residuals versus fitted values by Lot for the fm2IGF.lme object, obtained with
[Figure 4.20: normal plot of residuals (mm) by gender.]
The residuals are centered around zero and seem to have similar variability across lots. There are some outliers in the data, most noticeably for Lots 3 and 7.
We assess the normality of the within-group errors with a qqnorm plot of the residuals.

> qqnorm( fm2IGF.lme, ~ resid(.),
+         id = 0.05, adj = -0.75 )
# Figure 4.22

The normal plot in Figure 4.22 suggests that the distribution of the within-group errors has heavier tails than expected under normality, but is also symmetric around zero. Perhaps a mixture of normal distributions or a t-distribution with a moderate number of degrees of freedom would model the distribution of the within-group error more adequately. However, as the heavier tails seem to be distributed symmetrically, the estimates of the fixed effects should not change substantially under either a mixture model or a t-model. The heavier tails tend to inflate the estimate of the within-group standard error under the Gaussian model, leading to more conservative tests for the fixed effects, but, because the p-value for the hypothesis that the decay of tracer activity with age is zero is quite high (0.673), the main conclusion should remain unchanged under either a mixture or a t-model.
Thickness of Oxide Coating on a Semiconductor

The plot of the within-group standardized residuals (level = 2 in this case) versus the within-group fitted values is the default display produced by the plot method. Therefore,
FIGURE 4.21. Scatter plots of standardized residuals versus fitted values for the fm2IGF.lme fit, by lot.
[Figure 4.22: normal plot of residuals (ng/ml) for the fm2IGF.lme fit.]
FIGURE 4.23. Scatter plot of the standardized within-group residuals versus the within-group fitted values for the fm1Oxide fit.
# Figure 4.23
results in the plot shown in Figure 4.23, which does not indicate any departures from the within-group error assumptions: the residuals are symmetrically distributed around zero, with approximately constant variance.
By default, the qqnorm method produces a normal plot of the within-group standardized residuals. Hence,

> qqnorm( fm1Oxide )
# Figure 4.24

gives the normal plot in Figure 4.24, which indicates that the assumption of normality for the within-group errors is plausible.

Manufacturing of Analog MOS Circuits

The plot of the within-group residuals versus voltage by wafer, shown in Figure 4.25, reveals a clear periodic pattern in the residuals.

> plot( fm1Wafer, resid(.) ~ voltage | Wafer )
# Figure 4.25

# Figure 4.26

The panel argument to the plot method overrides the default panel function, allowing customized displays.
FIGURE 4.25. Scatter plots of within-group residuals versus voltage by wafer for the fm1Wafer fit.
FIGURE 4.26. Scatter plots of within-group residuals versus voltage by wafer for the fm1Wafer fit. A loess smoother has been added to each panel to enhance the visualization of the residual pattern.
The same periodic pattern appears in all panels of Figure 4.26, with a period $T$ of approximately 1.5 V. Noting that the residuals are centered around zero, this periodic pattern can be represented by the cosine wave
$$
\beta_3 \cos(\omega v) + \beta_4 \sin(\omega v), \tag{4.9}
$$
where $\beta_3$ and $\beta_4$ determine the amplitude $\left(= \sqrt{\beta_3^2 + \beta_4^2}\right)$ and $\omega$ is the frequency of the cosine wave. We can incorporate this pattern into model (4.8) by rewriting the fixed-effects model for the expected value of $y_{ijk}$ as
$$
\mathrm{E}[y_{ijk}] = \beta_0 + \beta_1 v_k + \beta_2 v_k^2 + \beta_3 \cos(\omega v_k) + \beta_4 \sin(\omega v_k). \tag{4.10}
$$
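A self-contained fit of the form (4.10) can be sketched as follows, assuming the Wafer data from the nlme library. The value ω = 4.5679 is the frequency estimate appearing in the intervals output below, and the pdDiag random-effects structure at both levels is an assumption consistent with the diagonal structure reported there:

```r
library(nlme)

## Quadratic in voltage plus the cosine wave of (4.10); omega = 4.5679 is
## the estimated frequency reported in the intervals output.  The pdDiag
## structure at both grouping levels is an assumption, not the book's
## exact call.
fm2Wafer.sketch <- lme(current ~ voltage + I(voltage^2) +
                         cos(4.5679 * voltage) + sin(4.5679 * voltage),
                       data = Wafer,
                       random = list(Wafer = pdDiag(~ voltage + I(voltage^2)),
                                     Site  = pdDiag(~ voltage + I(voltage^2))))
```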
The .~. in the fixed formula is an abbreviated form for the fixed-effects formula in the original lme object, fm1Wafer. This convention is also available in other S modeling functions, such as lm and aov. The random argument was included in the call to update to prevent the estimated random-effects parameters from fm1Wafer from being used as initial estimates (these give bad initial estimates in this case and may lead to convergence problems).
The very high t-values for the sine and cosine terms in the summary output indicate a significant increase in the quality of the fit when these terms are included in the fixed-effects model. The estimated standard deviations for the random effects are quite different from the ones in fm1Wafer and now suggest that there is significant wafer-to-wafer and site-to-site variation in all random effects in the model. The estimated within-group standard deviation for fm2Wafer is about ten times smaller than that of fm1Wafer, giving further evidence of the greater adequacy of (4.10). We assess the variability in the estimates with
> intervals( fm2Wafer )
Approximate 95% confidence intervals

 Fixed effects:
                          lower      est.     upper
          (Intercept) -4.338485 -4.255388 -4.172292
              voltage  5.397744  5.622357  5.846969
         I(voltage^2)  1.225147  1.258512  1.291878
cos(4.5679 * voltage) -0.097768 -0.095557 -0.093347
sin(4.5679 * voltage)  0.101388  0.104345  0.107303

 Random Effects:
  Level: Wafer
                      lower     est.   upper
     sd((Intercept)) 0.065853 0.128884 0.25225
         sd(voltage) 0.174282 0.348651 0.69747
    sd(I(voltage^2)) 0.023345 0.049074 0.10316
  Level: Site
                      lower     est.    upper
     sd((Intercept)) 0.017178 0.039675 0.091635
         sd(voltage) 0.175311 0.234373 0.313332
    sd(I(voltage^2)) 0.035007 0.047541 0.064564

 Within-group standard error:
     lower     est.    upper
 0.0085375 0.011325 0.015023
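From the fixed-effects estimates above, the amplitude of the fitted cosine wave, $\sqrt{\beta_3^2 + \beta_4^2}$, can be computed directly:

```r
## Amplitude sqrt(beta3^2 + beta4^2) using the cos and sin coefficient
## estimates from the intervals output above:
sqrt((-0.095557)^2 + (0.104345)^2)
```

The result is roughly 0.14 mA, an order of magnitude larger than the within-group standard error, which explains the very high t-values for the sine and cosine terms.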
[Figure 4.27: within-group residuals (mA) versus voltage by wafer for the fm2Wafer fit.]
# Figure 4.28
and shown in Figure 4.28, does not indicate any violations of the assumption of normality for the within-group errors.
# Figure 4.29
# Figure 4.30
FIGURE 4.29. Normal plot of estimated random effects for the homoscedastic fm2Orth.lme fit.
FIGURE 4.30. Scatter plot of estimated random effects for the homoscedastic fm2Orth.lme fit.
FIGURE 4.31. Normal plot of estimated random effects for the heteroscedastic fm3Orth.lme fit.
FIGURE 4.32. Normal plot of estimated random effects for the homoscedastic fm2IGF.lme fit.
The pairs plots by gender for fm3Orth.lme, not included here, do not suggest any departures from the assumption of homogeneity of the random-effects distribution.

Radioimmunoassays of IGF-I

The normal plots of the estimated random effects for the fm2IGF.lme fit, shown in Figure 4.32, do not indicate any departures from normality or any outlying subjects. They do, however, suggest that there is very little variability in the (Intercept) random effect, as the estimated random effects are on the order of $10^{-7}$.
The relative variability of each random effect with respect to the corresponding fixed-effects estimate can be calculated as
> fm2IGF.lme
. . .
Fixed: conc ~ age
(Intercept)        age
      5.369 -0.0019301

Random effects:
 Formula:  ~ age | Lot
 Structure: Diagonal
        (Intercept)       age Residual
StdDev:  0.00031074 0.0053722   0.8218
. . .
> c( 0.00031074, 0.0053722 )/abs( fixef(fm2IGF.lme) )
(Intercept)     age
 5.7876e-05  2.7833
FIGURE 4.33. Normal plot of estimated Lot random effects for the fm1Oxide fit.
The high p-value for the likelihood ratio test indicates that the random effect for the intercept does not contribute significantly to the fit of the IGF data.
The normal plot of the estimated random effects for the reduced model fm3IGF.lme, not included here, does not suggest any violations of the normality assumption and does not show any outlying values.

Thickness of Oxide Coating on a Semiconductor

Normal probability plots of the estimated random effects must be examined at each level of grouping when assessing the adequacy of a multilevel model fit. The normal plot of the estimated Lot random effects for the fm1Oxide fit, shown in Figure 4.33, is obtained with

> qqnorm( fm1Oxide, ~ranef(., level = 1), id = 0.10 )
# Figure 4.33

The level argument to the ranef method is required in this case because, by default, ranef returns a list with the estimated random effects at each level
FIGURE 4.34. Normal plot of estimated Wafer %in% Lot random effects for the fm1Oxide fit.
of grouping, which will cause an error in qqnorm. Because there are only eight random effects at the Lot level, it is difficult to identify any patterns in Figure 4.33. Lot 6 is indicated as a potential outlier, which is consistent with the plot of the data in Figure 4.14, where this lot is shown to have the thickest oxide layers.
The normal plot of the Wafer %in% Lot random effects, shown in Figure 4.34 and obtained with

> qqnorm( fm1Oxide, ~ranef(., level = 2), id = 0.10 )
# Figure 4.34

does not indicate any departures from normality. There is some mild evidence that Wafers 1/2 and 6/1 may be outliers.
Manufacturing of Analog MOS Circuits

The pairs plot of the estimated Wafer random effects for the fm2Wafer fit, shown in Figure 4.35, suggests that the random effects at that level are correlated.
The fm2Wafer fit uses a diagonal structure for the variance-covariance matrix of the Wafer random effects, which is equivalent to assuming that these random effects are independent. We test the independence assumption by fitting the model with a general positive-definite structure for the variance-covariance matrix of the Wafer random effects and comparing it to the fm2Wafer fit using the anova method.

> fm3Wafer <- update( fm2Wafer,
FIGURE 4.35. Scatter plot of estimated Wafer random effects for the fm2Wafer fit.
+        . . . )
> fm3Wafer
. . .
Random effects:
 Formula:  ~ voltage + voltage^2 | Wafer
 Structure: General positive-definite
                StdDev   Corr
 (Intercept)  0.131622 (Intr) voltag
     voltage  0.359244 -0.967
I(voltage^2)  0.051323  0.822 -0.940

 Formula:  ~ voltage + voltage^2 | Site %in% Wafer
 Structure: Diagonal
        (Intercept) voltage I(voltage^2) Residual
StdDev:    0.033511 0.21831     0.045125 0.011832
. . .
> anova( fm2Wafer, fm3Wafer )
         Model df     AIC     BIC logLik   Test L.Ratio p-value
fm2Wafer     1 12 -1232.6 -1184.9 628.31
fm3Wafer     2 15 -1267.0 -1207.3 648.50 1 vs 2  40.378  <.0001
There is a very significant increase in the log-restricted-likelihood, as evidenced by the large value of the likelihood ratio test statistic, indicating that the more general model represented by fm3Wafer gives a better fit.
FIGURE 4.36. Scatter plot of estimated random effects at the Site %in% Wafer level for the fm3Wafer fit.
The pairs plot of the estimated Site %in% Wafer random effects corresponding to fm3Wafer, in Figure 4.36, indicates that there is a strong negative correlation between the voltage and voltage^2 random effects, but no substantial correlation between either of these random effects and the (Intercept) random effect. A block-diagonal matrix can be used to represent such a covariance structure, with the (Intercept) random effect corresponding to one block and the voltage and voltage^2 random effects corresponding to another block.

> fm4Wafer <- update( fm3Wafer,
+        random = list(Wafer = ~ voltage + voltage^2,
+                      Site = pdBlocked(list(~1, ~voltage+voltage^2 - 1))))
> fm4Wafer
. . .
Random effects:
 Formula:  ~ voltage + voltage^2 | Wafer
 Structure: General positive-definite
                StdDev   Corr
 (Intercept)  0.131807 (Intr) voltag
     voltage  0.354746 -0.967
I(voltage^2)  0.049957  0.814 -0.935

 Composite Structure: Blocked

 Block 1: (Intercept)
 Formula:  ~ 1 | Site %in% Wafer
        (Intercept)
StdDev:    0.066562
The small p-value for the likelihood ratio test indicates that the fm4Wafer model fits the data significantly better than the model represented by fm3Wafer. This will be the final model considered in this chapter for the Wafer data. Later, in Chapter 8, we revisit this example, fitting a nonlinear mixed-effects model to the data.
The normal plot of the estimated Site %in% Wafer random effects corresponding to fm4Wafer, shown in Figure 4.37, does not suggest any significant departure from the assumption of normality for these random effects. There is some moderate evidence that Sites 1/6, 7/3, and 8/1 may be outliers.

> qqnorm( fm4Wafer, ~ranef(., level = 2), id = 0.05,
+         cex = 0.7, layout = c(3, 1) )
# Figure 4.37
FIGURE 4.37. Normal plot of estimated Site %in% Wafer random effects for the fm4Wafer fit.
Exercises
1. In §1.3.3 (p. 27) we fit the model
> fm3Machine <- update(fm1Machine, random = ~Machine-1|Worker)
(a) Refit the model as, say, fm2Assay, with the interaction term removed from the fixed effects. Are the estimates of the other parameters changed noticeably? Can you compare this fitted model to fm1Assay with anova and assess the significance of the interaction term?
(b) Use VarCorr to extract the estimates of the variance components from fm2Assay. Notice that the estimated variance component for the columns on the plate, indexed by dilut within Block, is very small. Refit the model as, say, fm3Assay, with this variance component eliminated. You can express the random argument for this reduced model in at least three ways: (i) using pdBlocked and pdIdent as in fm1Assay, (ii) using pdCompSymm as in fm4OatsB, or (iii) using nested random effects for Block and for sample within Block. Experiment with these different representations. Demonstrate that the fitted models for these three representations are indeed equivalent. For which of the three representations does lme converge most easily? Compare one of these fitted models to fm2Assay using anova. Is the variance component for the columns significant?
(c) Extract the estimates of the variance components from the fitted model fm3Assay and examine the confidence intervals on the standard deviations of the random effects. Do the variance components appear to be significantly greater than zero? Fit a model, say fm4Assay, with a single variance component for the Block. Compare it to fm3Assay with anova.
(d) Can the random effects be eliminated entirely? Fit a model, say fm5Assay, using just the fixed-effects terms from fm4Assay. Because this model does not have any random-effects terms, you will need to use lm or gls to fit it.
(e) Compare models fm2Assay, fm3Assay, fm4Assay, and fm5Assay with anova. Which model provides the simplest adequate representation for these data?
(f) Notice that the dilut factor represents serial dilutions that will be equally spaced on the logarithm scale. Does converting dilut to an ordered factor, so that the contrasts used for this factor will be polynomial contrasts, suggest further simplifications of the model?
5
Extending the Basic Linear Mixed-Effects Model
presented and have their use illustrated in §5.3. In §5.4, the gls function is
described and illustrated through examples.
y_i = X_i β + Z_i b_i + ε_i,   b_i ~ N(0, Ψ),   ε_i ~ N(0, σ²Λ_i),   i = 1, …, M,   (5.1)

where the Λ_i are positive-definite matrices parametrized by a fixed, generally small, set of parameters λ. As in the basic linear mixed-effects model of
§2.1, the within-group errors ε_i are assumed to be independent for different
i and independent of the random effects b_i. The σ² is factored out of the
Λ_i for computational reasons (it can then be eliminated from the profiled
likelihood function).
Similarly, the extended two-level linear mixed-effects model generalizes
the basic two-level model (2.2) described in §2.1.2 by letting

ε_ij ~ N(0, σ²Λ_ij),   i = 1, …, M,   j = 1, …, M_i,

where the Λ_ij are positive-definite matrices parametrized by a fixed vector λ. This readily generalizes to a multilevel model with Q levels of random
effects. For simplicity, we concentrate for the remainder of this section on
the extended single-level model (5.1), but the results we obtain are easily
generalizable to multilevel models with an arbitrary number of levels of
random effects.
Because the Λ_i are positive-definite, they admit invertible square roots Λ_i^{1/2} such that

Λ_i = (Λ_i^{1/2})ᵀ Λ_i^{1/2}   and   Λ_i^{-1} = Λ_i^{-1/2} Λ_i^{-T/2},

where Λ_i^{-T/2} denotes (Λ_i^{-1/2})ᵀ. Letting

y_i* = Λ_i^{-T/2} y_i,   X_i* = Λ_i^{-T/2} X_i,   Z_i* = Λ_i^{-T/2} Z_i,   ε_i* = Λ_i^{-T/2} ε_i,   (5.2)

so that

ε_i* ~ N(Λ_i^{-T/2} 0, σ² Λ_i^{-T/2} Λ_i Λ_i^{-1/2}) = N(0, σ² I),

we can rewrite (5.1) as

y_i* = X_i* β + Z_i* b_i + ε_i*,   b_i ~ N(0, Ψ),   ε_i* ~ N(0, σ² I),   i = 1, …, M.   (5.3)
Because the differential of the transformed response is simply dy_i* = |Λ_i|^{-1/2} dy_i, the likelihood function L corresponding to the
extended linear mixed-effects model (5.1) is expressed as

L(β, θ, σ², λ | y) = ∏ᵢ₌₁ᴹ p(y_i | β, θ, σ², λ)
  = ∏ᵢ₌₁ᴹ p(y_i* | β, θ, σ², λ) |Λ_i|^{-1/2}
  = L*(β, θ, σ², λ | y*) ∏ᵢ₌₁ᴹ |Λ_i|^{-1/2},   (5.4)

where L* denotes the likelihood of the basic linear mixed-effects model (5.3) evaluated at the transformed data y*. The corresponding restricted likelihood is

L_R(θ, σ², λ | y) = ∫ L(β, θ, σ², λ | y) dβ = L_R*(θ, σ², λ | y*) ∏ᵢ₌₁ᴹ |Λ_i|^{-1/2}.

The function L_R*(θ, σ², λ | y*) corresponds to a restricted likelihood function of a basic linear mixed-effects model. Hence, the results in §2.2.5 can
be used to obtain a numerically efficient representation of the profiled log-restricted-likelihood.
has two components that can be used to model heteroscedasticity and correlation: a random-effects component, given by Z_i D Z_iᵀ, and a within-group
component, given by Λ_i. In practice, these two components may compete with each other in the model specification, in the sense that similar
Σ_i matrices may result from a more complex random-effects component
being added to a simpler within-group component (say Λ_i = I), or a simpler random-effects component (say Z_i D Z_iᵀ = σ_b² 11ᵀ) being added to a
more complex within-group component. There will generally be a trade-off
between the complexity of the two components of Σ_i and some care must
be exercised to prevent nonidentifiability, or near nonidentifiability, of the
parameters in the model.
In some applications, one may wish to avoid incorporating random effects
in the model to account for dependence among observations, choosing to
use the within-group component Λ_i to directly model the variance–covariance
structure of the response. This results in the simplified version of the extended linear mixed-effects model (5.1)

y_i = X_i β + ε_i,   ε_i ~ N(0, σ²Λ_i),   i = 1, …, M.   (5.5)

Estimation under this model has been studied extensively in the linear
regression literature (Draper and Smith, 1998; Thisted, 1988), usually assuming that the Λ_i are known, being referred to as the generalized least
squares problem.
Using the same transformations as in (5.2), we can re-express (5.5) as a
classic linear regression model,

y_i* = X_i* β + ε_i*,   ε_i* ~ N(0, σ²I),   i = 1, …, M.   (5.6)
Hence, for fixed λ, the maximum likelihood estimators of β and σ² are
obtained by solving an ordinary least-squares problem. Letting X* denote
the matrix obtained by stacking up the X_i* matrices, and y* the vector obtained by stacking the y_i* vectors, the conditional MLEs
of β and σ² are

β̂(λ) = [(X*)ᵀ X*]⁻¹ (X*)ᵀ y*,
σ̂²(λ) = ‖y* − X* β̂(λ)‖² / N.   (5.7)

The profiled log-likelihood corresponding to (5.5), which is a function of λ
only, is obtained by replacing β and σ² in the full log-likelihood by their
conditional MLEs (5.7), giving

ℓ(λ | y) = const − N log‖y* − X* β̂(λ)‖ − (1/2) Σᵢ₌₁ᴹ log|Λ_i|.   (5.8)
The corresponding profiled log-restricted-likelihood is

ℓ_R(λ | y) = const − (N − p) log‖y* − X* β̂(λ)‖ − (1/2) log|(X*)ᵀ X*| − (1/2) Σᵢ₌₁ᴹ log|Λ_i|,

where p denotes the dimension of β. The within-group variance–covariance matrices may be decomposed into a variance component and a correlation component,

Λ_i = V_i C_i V_i,   (5.9)

where V_i is diagonal and C_i is a correlation matrix, that is, a positive-definite matrix with all diagonal elements equal to one. The matrix V_i
in (5.9) is not uniquely defined, as we can multiply any number of its rows
by −1 and still get the same decomposition. To ensure uniqueness, we
require that all the diagonal elements of V_i be positive.
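The decomposition Λ_i = V_i C_i V_i, and the sign ambiguity that motivates the positive-diagonal convention, can be illustrated numerically. This Python sketch (an illustration only, with made-up numbers) builds Λ_i from a diagonal standard-deviation matrix and a correlation matrix, then checks that flipping the sign of a diagonal entry of V_i (with the matching sign flips in C_i) leaves Λ_i unchanged.

```python
import numpy as np

# Hypothetical within-group standard deviations and correlations.
V = np.diag([1.0, 2.0, 0.5])
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.0]])

Lam = V @ C @ V                     # Lambda_i = V_i C_i V_i

# Flipping the sign of one diagonal entry of V (and the matching
# row/column signs of C, which is still a valid correlation matrix)
# reproduces the same Lambda_i, so V_i needs a sign convention.
S = np.diag([1.0, -1.0, 1.0])
assert np.allclose((S @ V) @ (S @ C @ S) @ (V @ S), Lam)
```

Requiring a positive diagonal for V_i picks out one representative from these sign-equivalent factorizations.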
It is easy to verify that Var(ε_ij) = σ² [V_i]²ⱼⱼ, so that the diagonal elements of V_i describe the within-group standard deviations (up to the scale factor σ), while C_i describes the within-group correlations.

Variance functions are used to model the within-group variance structure. The single-level variance function model is

Var(ε_ij | b_i) = σ² g²(μ_ij, v_ij, δ),   i = 1, …, M,   j = 1, …, n_i,   (5.10)

where μ_ij = E[y_ij | b_i], v_ij is a vector of variance covariates, δ is a vector of variance parameters, and g(·) is the variance function.
The variance function in this case is g(x, y) = |x|^y and the covariate v_ij
can be the expected value μ_ij.
The single-level variance function model (5.10) can be generalized to
multilevel models. For example, the variance function model for a two-level
model is

Var(ε_ijk | b_i, b_ij) = σ² g²(μ_ijk, v_ijk, δ),   i = 1, …, M,   j = 1, …, M_i,   k = 1, …, n_ij,

where μ_ijk = E[y_ijk | b_i, b_ij]. We concentrate, for the remainder of this
section, on the single-level model (5.1), but all results presented here easily
generalize to multilevel models.
The variance function formulation (5.10) is very flexible and intuitive,
because it allows the within-group variance to depend on the fixed effects,
β, and the random effects, b_i, through the expected values, μ_ij. However, as
discussed in Davidian and Giltinan (1995, Ch. 4), it poses some theoretical
and computational difficulties, as the within-group errors and the random
effects can no longer be assumed to be independent. Under the assumption
that E[ε_i | b_i] = 0, it is easy to verify that Var(ε_ij) = E[Var(ε_ij | b_i)], so
that the dependence on the unobserved random effects can be avoided by
integrating them out of the variance model. Because the variance function
g is generally nonlinear in b_i, integrating the random effects out of the variance model (5.10) does not lead to a computationally feasible optimization
Var(ε_ij) ≈ σ² g²(μ̂_ij, v_ij, δ),   i = 1, …, M,   j = 1, …, n_i,   (5.11)

where μ̂_ij denotes a predictor (BLUP) of μ_ij. Under this approximation, the within-group errors are assumed independent of the random effects, as in (5.1), and the results in §5.1.1 can still be
used. Note that, if the conditional variance model (5.10) does not depend
on μ_ij, (5.11) gives the exact marginal variance and no approximation is
required.
When the conditional variance model (5.10) depends on μ_ij, the optimization algorithm follows an iteratively reweighted scheme: for given
β⁽ᵗ⁾, θ⁽ᵗ⁾, λ⁽ᵗ⁾, the corresponding BLUPs μ̂_ij⁽ᵗ⁾ are obtained and held fixed
while the objective function is optimized to produce new estimates β⁽ᵗ⁺¹⁾,
θ⁽ᵗ⁺¹⁾, λ⁽ᵗ⁺¹⁾ which, in turn, give updated BLUPs μ̂_ij⁽ᵗ⁺¹⁾, with the process
iterating until convergence. The resulting estimates approximate the (restricted) maximum likelihood estimates. When the variance model does not
involve μ_ij, the (restricted) likelihood can be directly optimized, producing
the exact (restricted) maximum likelihood estimates.
Variance functions for the extended linear model (5.5) are similarly defined, but, because no random effects are present, the model for the marginal
variance does not involve any approximations, being expressed as

Var(ε_ij) = σ² g²(μ_ij, v_ij, δ),   i = 1, …, M,   j = 1, …, n_i.   (5.12)
TABLE 5.1. Standard varFunc classes in the nlme library.

varFixed        fixed variance
varIdent        different variances per stratum
varPower        power of covariate
varExp          exponential of covariate
varConstPower   constant plus power of covariate
varComb         combination of variance functions
The fitted object may be referenced in form by the symbol ., so, for example,
form = ~ fitted(.)
If the ratio between the Female and Male standard deviations, given by δ,
is to be kept fixed at 0.5 and not allowed to vary during the optimization,
we can use instead
> vf2Ident <- varIdent( form = ~ 1 | Sex, fixed = c(Female = 0.5))
> vf2Ident <- initialize( vf2Ident, Orthodont )
> varWeights( vf2Ident )
 Male Male Male Male Male Male Male Male Male Male Male Male
    1    1    1    1    1    1    1    1    1    1    1    1
 . . .
 Female Female
      2      2
variables are pasted together and a different δ is used for each combination
of levels. For example, to specify a variance function with different variances
for each age and Sex combination we can use
> vf3Ident <- varIdent( form = ~ 1 | Sex * age )
> vf3Ident <- initialize( vf3Ident, Orthodont )
> varWeights( vf3Ident )
 Male*8 Male*10 Male*12 Male*14 Male*8 Male*10 Male*12 Male*14
      1       1       1       1      1       1       1       1
 . . .
 Female*12 Female*14
         1         1
g(v_ij, δ) = |v_ij|^δ,   (5.14)
which is a power of the absolute value of the variance covariate. The parameter δ is unrestricted (i.e., it may take any value on the real line), so (5.14)
can model cases where the variance increases or decreases with the absolute
value of the variance covariate. Note that, when v_ij = 0 and δ > 0, the
variance function is 0 and the variance weight is undefined. Therefore, this
class of variance functions should not be used with variance covariates that
may assume the value 0.
The main arguments to the varPower constructor are value and form,
which specify, respectively, an initial value for δ, when this is allowed to vary
in the optimization, and a one-sided formula with the variance covariate. By
default, value = 0, corresponding to equal variance weights of 1, and form =
~fitted(.), corresponding to a variance covariate given by the fitted values.
For example, to specify a variance model with the parameter δ initially set
to 1, and allowed to vary in the optimization, and the fitted values as the
variance covariate, we use
> vf1Power <- varPower( 1 )
> formula( vf1Power )
~ fitted(.)
The fixed argument can be used to set δ to a fixed value, which does
not change in the optimization. For example,
> vf2Power <- varPower( fixed = 0.5 )
specifies a model in which the variance increases linearly with the fitted
values.
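The varPower arithmetic is easy to verify by hand. As a numeric illustration (in Python, outside S-PLUS, with made-up covariate values): under Var(ε_ij) = σ²|v_ij|^{2δ} the variance weights used to standardize residuals are 1/|v_ij|^δ, and δ = 0.5 indeed makes the variance proportional to the covariate.

```python
# varPower arithmetic sketch: Var = sigma^2 * |v|^(2*delta),
# variance weight = |v|^(-delta); hypothetical covariate values.
delta = 0.5
v = [1.0, 4.0, 9.0]

variances = [abs(x) ** (2 * delta) for x in v]   # with delta = 0.5: equal to v
weights = [abs(x) ** (-delta) for x in v]        # 1/sqrt(v)

assert all(abs(a - b) < 1e-12 for a, b in zip(variances, [1.0, 4.0, 9.0]))
assert all(abs(a - b) < 1e-12 for a, b in zip(weights, [1.0, 0.5, 1.0 / 3.0]))
```

The same weights are what varWeights reports for an initialized varPower object.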
An optional stratification variable, or several stratification variables separated by *, may be included in the form argument, with a different δ being used for each stratum. This corresponds to the following generalization
of (5.14):

Var(ε_ij) = σ² |v_ij|^{2δ_{s_ij}},   g(v_ij, δ) = |v_ij|^{δ_{s_ij}},

where s_ij denotes the stratum of the jth observation in the ith group.
varExp
The variance model represented by this class is
Var(ε_ij) = σ² exp(2δ v_ij),
(5.15)
varConstPower
The variance model represented by this class is

Var(ε_ij) = σ² (δ₁ + |v_ij|^{δ₂})²,   (5.16)

corresponding to the variance function

g(v_ij, δ) = δ₁ + |v_ij|^{δ₂}.
The argument fixed is given as a list with components const and power and
may be used to set either, or both, of the variance parameters to a fixed
value. For example, to specify a variance function with δ₁ fixed at the value
1, with δ₂ allowed to vary but initialized to 0.5, and the fitted values as the
variance covariate, we use
> vf1ConstPower <- varConstPower( power = 0.5,
+                                 fixed = list(const = 1) )
An optional stratification variable, or several stratification variables separated by *, may be included in the form argument, with different δ₁ and
δ₂ being used for each stratum, giving g(v_ij, δ) = δ_{1,s_ij} + |v_ij|^{δ_{2,s_ij}}.

varComb

The varComb class represents variance models defined as products of variance functions. For example, combining a varIdent model in a stratification variable s_ij with a varExp model in a variance covariate v_ij gives the variance model

Var(ε_ij) = σ² g₁²(s_ij, δ₁) g₂²(v_ij, δ₂),   g₁(s_ij, δ₁) = δ_{1,s_ij},   g₂(v_ij, δ₂) = exp(δ₂ v_ij).   (5.17)
The varComb constructor can take any number of varFunc objects as arguments. For example, to represent (5.17), with Sex as the stratification
variable and age as the variance covariate, we use
> vf1Comb <- varComb( varIdent(c(Female = 0.5), ~ 1 | Sex),
+                     varExp(1, ~ age) )
> vf1Comb <- initialize( vf1Comb, Orthodont )
> varWeights( vf1Comb )
     1   2        3        4     5   6        7        8     9
 0.125 0.1 0.083333 0.071429 0.125 0.1 0.083333 0.071429 0.125
 . . .
    98      99     100  101 102     103     104  105 106     107
   0.2 0.16667 0.14286 0.25 0.2 0.16667 0.14286 0.25 0.2 0.16667
   108
0.14286
are given by 1/age_ij for Male (observations 1–64) and 2/age_ij for Female
(observations 65–108).
New varFunc classes, representing user-defined variance functions, can be
added to the set of standard classes in Table 5.1 and used with the modeling functions in the nlme library. For this, one must specify a constructor
function, generally with the same name as the class, and, at a minimum,
methods for the functions coef, coef<-, and initialize. The varPower constructor and methods can serve as templates for these.
FIGURE 5.1. Hemodialyzer ultrafiltration rates (in ml/hr) measured at 7 different transmembrane pressures (in dmHg) on 20 high-flux dialyzers. In vitro evaluation of dialyzers based on bovine blood flow rates of 200 dl/min and 300 dl/min.
is

y_ij = (β₀ + γ₀ Q_i + b₀ᵢ) + (β₁ + γ₁ Q_i + b₁ᵢ) x_ij + (β₂ + γ₂ Q_i + b₂ᵢ) x²_ij
       + (β₃ + γ₃ Q_i) x³_ij + (β₄ + γ₄ Q_i) x⁴_ij + ε_ij,   (5.18)

b_i = (b₀ᵢ, b₁ᵢ, b₂ᵢ)ᵀ ~ N(0, Ψ),   ε_ij ~ N(0, σ²),

where Q_i is a binary variable taking the value −1 for 200 dl/min hemodialyzers
and 1 for 300 dl/min hemodialyzers; β₀, β₁, β₂, β₃, and β₄ are, respectively,
the intercept, linear, quadratic, cubic, and quartic fixed effects averaged
over the levels of Q; γᵢ is the blood flow effect associated with the fixed effect
βᵢ; b_i is the vector of random effects, assumed independent for different i;
and ε_ij is the within-group error, assumed independent for different i, j and
independent of the random effects.
We fit the homoscedastic linear mixed-effects model (5.18) with
> fm1Dial.lme <-
+   lme(rate ~ (pressure + pressure^2 + pressure^3 + pressure^4)*QB,
+       Dialyzer, ~ pressure + pressure^2)
> fm1Dial.lme
Linear mixed-effects model fit by REML
Data: Dialyzer
Log-restricted-likelihood: -326.39
Fixed: rate ~(pressure + pressure^2 + pressure^3 + pressure^4)*QB
and displayed in Figure 5.2, confirms that the within-group variability increases with transmembrane pressure.
Because of its flexibility, the varPower variance function is a common
choice for modeling monotonic heteroscedasticity when the variance covariate is bounded away from zero (transmembrane pressure varies in the
data between 0.235 dmHg and 3.030 dmHg). The corresponding model
differs from (5.18) only in that the within-group errors are allowed to be
heteroscedastic, with variance model

Var(ε_ij) = σ² x_ij^{2δ},   (5.19)
and we fit it with
> fm2Dial.lme <- update( fm1Dial.lme,
+                        weights = varPower(form = ~ pressure) )
> fm2Dial.lme
Linear mixed-effects model fit by REML
Data: Dialyzer
Log-restricted-likelihood: -309.51
Fixed: rate ~(pressure + pressure^2 + pressure^3 + pressure^4)*QB
(Intercept) pressure I(pressure^2) I(pressure^3) I(pressure^4)
     -17.68   93.711       -49.186        12.245       -1.2426
FIGURE 5.2. Plot of residuals versus transmembrane pressure for the homoscedastic fitted object fm1Dial.lme.
The anova method can be used to test the significance of the heteroscedastic model (5.19).
As expected, there is a highly significant increase in the log-likelihood associated with the inclusion of the varPower variance function. The plot of
the standardized residuals, defined as r_ij = (y_ij − ŷ_ij)/(σ̂ x_ij^{δ̂}) in this case,
versus the variance covariate is used to graphically assess the adequacy of
the variance model.
> plot( fm2Dial.lme, resid(., type = "p") ~ pressure,
+       abline = 0 )                           # Figure 5.3
The resulting plot, displayed in Figure 5.3, reveals a reasonably homogeneous pattern of variability for the standardized residuals, indicating that
the varPower model successfully describes the within-group variance.
We assess the variability of the variance parameter estimate with the
intervals method.
> intervals( fm2Dial.lme )
. . .
Variance function:
        lower    est.  upper
power  0.4079 0.74923 1.0906
. . .
FIGURE 5.4. Plot of raw residuals (ml/hr) versus transmembrane pressure for fm2Dial.lme, by blood flow level (200 dl/min and 300 dl/min).
The raw residuals, displayed in Figure 5.4, show that the pattern of increasing variability
is still present, as we did not transform the data, but, instead, incorporated the within-group heteroscedasticity in the model. The
heteroscedastic patterns seem the same for both blood flow levels. We can
test this formally using
> fm3Dial.lme <- update( fm2Dial.lme,
+                        weights = varPower(form = ~ pressure | QB) )
> fm3Dial.lme
. . .
Variance function:
Structure: Power of variance covariate, different strata
Formula: ~ pressure | QB
Parameter estimates:
     200     300
 0.64775 0.83777
. . .
> anova( fm2Dial.lme, fm3Dial.lme )
            Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm2Dial.lme     1 18 655.01 706.63 -309.51
fm3Dial.lme     2 19 656.30 710.78 -309.15 1 vs 2 0.71091  0.3991
FIGURE 5.5. Plot of predicted ultrafiltration rates versus transmembrane pressure by subject corresponding to the fitted object fm2Dial.lme.
The nearly identical log-likelihood values indicate that there is no need for
the extra parameter associated with varConstPower and that the simpler
varPower model should be maintained.
A final assessment of the heteroscedastic version of the linear mixed-effects model (5.18) with within-group variance model (5.19) is provided
by the plot of the augmented predictions by subject, obtained with
> plot( augPred(fm2Dial.lme), grid = T )       # Figure 5.5
and displayed in Figure 5.5. The predicted values closely match the observed ultrafiltration rates, attesting to the adequacy of the model.
One of the questions of interest for the hemodialyzer data is whether the
ultrafiltration characteristics differ with the evaluation blood flow rates,
which is suggested by the plots in Figure 5.1. The anova method can be
used to test the significance of the terms associated with the evaluation
blood flow rates, in the order they were entered in the model (a sequential
type of test).
> anova( fm2Dial.lme )
. . .
                 numDF denDF F-value p-value
(Intercept)          1   112   552.9  <.0001
pressure             1   112  2328.6  <.0001
I(pressure^2)        1   112  1174.6  <.0001
I(pressure^3)        1   112   359.9  <.0001
I(pressure^4)        1   112    12.5  0.0006
QB                   1    18     4.8  0.0414
pressure:QB          1   112    80.1  <.0001
I(pressure^2):QB     1   112     1.4  0.2477
I(pressure^3):QB     1   112     2.2  0.1370
I(pressure^4):QB     1   112     0.2  0.6840
. . .
The large p-values associated with terms of degree greater than or equal to
2 involving the variable QB suggest that they are not needed in the model.
We can verify their joint significance with
> anova( fm2Dial.lme, Terms = 8:10 )
F-test for: I(pressure^2):QB, I(pressure^3):QB, I(pressure^4):QB
  numDF denDF F-value p-value
1     3   112  1.2536  0.2939
The large p-value for the F-test confirms that these terms could be eliminated from the model.
Body Weight Growth in Rats
As a second example to illustrate the use of variance functions with lme, we
revisit the BodyWeight data introduced in §3.2.1 and described in Hand and
Crowder (1996, Table A.1), on the body weights of rats measured over 64
days. The body weights of the rats (in grams) are measured on day 1 and
every seven days thereafter, until day 64, with an extra measurement on
day 44. There are three groups of rats, each on a different diet. These data
are also described in Appendix A.3 and are included in the nlme library as
the groupedData object BodyWeight.
The plots of the body weights versus time by diet, shown in Figure 5.6,
indicate strong differences among the three diet groups. There is also evidence of a rat in diet group 2 with an unusually high initial body weight.
The body weights appear to grow linearly with time, possibly with different intercepts and slopes for each diet, and with intercept and slope
222
20
40
60
600
500
400
300
20
40
60
20
40
60
Time (days)
FIGURE 5.6. Body weights of rats measured over a period of 64 days. The rats
are divided into three groups on dierent diets.
FIGURE 5.7. Plot of standardized residuals versus fitted values for the homoscedastic fitted object fm1BW.lme.
> fm1BW.lme
Linear mixed-effects model fit by REML
Data: BodyWeight
Log-restricted-likelihood: -575.86
Fixed: weight ~ Time * Diet
(Intercept)    Time  Diet2  Diet3 Time:Diet2 Time:Diet3
     251.65 0.35964 200.67 252.07    0.60584    0.29834
Random effects:
 Formula: ~ Time | Rat
 Structure: General positive-definite
               StdDev   Corr
(Intercept) 36.93907 (Inter
Time         0.24841 -0.149
Residual     4.44361
Number of Observations: 176
Number of Groups: 16
The plot of the standardized residuals versus the fitted values, displayed
in Figure 5.7, gives a clear indication of within-group heteroscedasticity.
Because the fitted values are bounded away from zero, we can use the
varPower variance function to model the heteroscedasticity.
> fm2BW.lme <- update( fm1BW.lme, weights = varPower() )
> fm2BW.lme
Linear mixed-effects model fit by REML
Data: BodyWeight
Log-restricted-likelihood: -570.96
Fixed: weight ~ Time * Diet
(Intercept)    Time  Diet2  Diet3 Time:Diet2 Time:Diet3
      251.6 0.36109 200.78 252.17    0.60182    0.29523
Random effects:
 Formula: ~ Time | Rat
 Structure: General positive-definite
               StdDev   Corr
(Intercept) 36.89887 (Inter
Time         0.24373 -0.147
Residual     0.17536
Variance function:
 Structure: Power of variance covariate
 Formula: ~ fitted(.)
 Parameter estimates:
   power
 0.54266
Number of Observations: 176
Number of Groups: 16
Note that the form argument did not need to be specified in the call to
varPower, because its default value, ~fitted(.), corresponds to the desired
variance covariate.
The plot of the standardized residuals versus fitted values for the heteroscedastic fit corresponding to fm2BW.lme, displayed in Figure 5.8, indicates that the varPower variance function adequately represents the within-group heteroscedasticity.
We can test the significance of the variance parameter in the varPower
model using the anova method, which, as expected, strongly rejects the
assumption of homoscedasticity (i.e., δ = 0).
> anova( fm1BW.lme, fm2BW.lme )
          Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm1BW.lme     1 10 1171.7 1203.1 -575.86
fm2BW.lme     2 11 1163.9 1198.4 -570.96 1 vs 2  9.7984  0.0017
The primary question of interest for the BodyWeight data is whether the
growth rates differ significantly among diets. Because of the parametrization used in (5.20), the summary method only provides tests for differences
between Diets 1 and 2 (β₁₂ in (5.20)) and between Diets 1 and 3 (β₁₃
in (5.20)).
> summary( fm2BW.lme )
. . .
Fixed effects: weight ~ Time * Diet
              Value Std.Error  DF t-value p-value
(Intercept) 251.60     13.068 157  19.254  <.0001
FIGURE 5.8. Plot of standardized residuals versus fitted values for the varPower fitted object fm2BW.lme.
Time          0.36      0.088 157   4.084  0.0001
Diet2       200.78     22.657  13   8.862  <.0001
Diet3       252.17     22.662  13  11.127  <.0001
Time:Diet2    0.60      0.155 157   3.871  0.0002
Time:Diet3    0.30      0.156 157   1.893  0.0602
. . .
cor(ε_ijk, ε_ijk′) = h[d(p_ijk, p_ijk′), ρ],   i = 1, …, M,   j = 1, …, M_i,   k, k′ = 1, …, n_ij.

Note that the correlation model applies to within-group errors within the
same innermost level of grouping. We concentrate, for the remainder of
this section, on the single-level correlation model (5.21), but all correlation
structures presented here can be easily extended to multilevel models.
without random effects have been extensively studied by Box, Jenkins and
Reinsel (1994). In the context of linear mixed-effects models, they are described in detail in Jones (1993).
We make the simplifying isotropy assumption that the serial correlation model depends on the one-dimensional positions p_ij and p_ij′ only through
their absolute difference. The general serial correlation model is then defined as

cor(ε_ij, ε_ij′) = h(|p_ij − p_ij′|, ρ).
In the context of time-series data, the correlation function h(·) is referred to as the autocorrelation function. The empirical autocorrelation
function (Box et al., 1994, §3), a nonparametric estimate of the autocorrelation function, provides a useful tool for investigating serial correlation in
time-series data. Let r_ij = (y_ij − ŷ_ij)/σ̂_ij denote the standardized residuals from a fitted mixed-effects model, with σ̂²_ij = V̂ar(ε_ij). The empirical
autocorrelation at lag l is defined as

ρ̂(l) = [ Σᵢ₌₁ᴹ Σⱼ₌₁^{nᵢ−l} r_ij r_{i(j+l)} / N(l) ] / [ Σᵢ₌₁ᴹ Σⱼ₌₁^{nᵢ} r²_ij / N(0) ],   (5.22)

where N(l) is the number of residual pairs used in the summation defining
the numerator of ρ̂(l).
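Formula (5.22) is straightforward to compute directly. Here is a small Python sketch (an illustration only, for a single group of hypothetical standardized residuals):

```python
def empirical_acf(resids, lag):
    """Empirical autocorrelation in the spirit of (5.22), for one group
    of standardized residuals: the average of products of residual pairs
    `lag` apart, scaled by the lag-0 average of squared residuals."""
    pairs = [resids[j] * resids[j + lag] for j in range(len(resids) - lag)]
    num = sum(pairs) / len(pairs)                   # N(l) pairs
    den = sum(r * r for r in resids) / len(resids)  # N(0) terms
    return num / den

r = [1.0, -1.0, 1.0, -1.0, 1.0]       # perfectly alternating residuals
assert empirical_acf(r, 0) == 1.0
assert empirical_acf(r, 1) == -1.0    # strongest possible negative lag-1 ACF
```

In nlme, the ACF function computes this quantity over all groups; the sketch above keeps to one group to show the arithmetic.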
Serial correlation structures typically require that the data be observed
at integer time points and do not easily generalize to continuous position
vectors. We describe below some of the most common serial correlation
structures used in practice, all of which are implemented in the nlme library.
Compound Symmetry
This is the simplest serial correlation structure, which assumes equal correlation among all within-group errors pertaining to the same group. The
corresponding correlation model is

cor(ε_ij, ε_ij′) = ρ,   j ≠ j′,   or, equivalently,   h(k, ρ) = ρ,   k = 1, 2, …,   (5.23)
model can only take values between 0 and 1, while (5.23) allows ρ to take
negative values (to have a positive-definite compound symmetry correlation
structure, it is only required that ρ > −1/[max_{i≤M}(n_i) − 1]).
The compound symmetry correlation model tends to be too simplistic
for practical applications involving time-series data, as, in general, it is
more realistic to assume a model in which the correlation between two
observations decreases, in absolute value, with their distance. It is a useful
model for applications involving short time series per group, or when all
observations within a group are collected at the same time, as in split-plot
experiments.
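The positive-definiteness bound ρ > −1/(n_i − 1) can be checked numerically: a compound-symmetry correlation matrix has eigenvalues 1 + (n − 1)ρ (once) and 1 − ρ (repeated). A small Python illustration (outside S-PLUS):

```python
import numpy as np

def compsymm(n, rho):
    """n x n compound-symmetry correlation matrix: 1 on the diagonal,
    rho everywhere else."""
    return np.full((n, n), rho) + (1 - rho) * np.eye(n)

n = 4                                    # bound: rho > -1/(n - 1) = -1/3
ok  = np.linalg.eigvalsh(compsymm(n, -0.30))
bad = np.linalg.eigvalsh(compsymm(n, -0.40))

assert ok.min() > 0    # -0.30 > -1/3: positive-definite
assert bad.min() < 0   # -0.40 < -1/3: eigenvalue 1 + 3*rho = -0.2 < 0
```

This is why mildly negative common correlations are admissible, with the admissible range shrinking as the group size grows.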
General
This structure represents the other extreme in complexity to the compound
symmetry structure. Each correlation in the data is represented by a different parameter, corresponding to the correlation function

h(k, ρ) = ρ_k,   k = 1, 2, ….   (5.24)
Autoregressive models express the current observation as a linear function of previous observations plus a noise term,

ε_t = φ₁ ε_{t−1} + ⋯ + φ_p ε_{t−p} + a_t.   (5.25)

The number of previous observations included, p, is called the order of the autoregressive model, denoted AR(p).
The AR(1) model is the simplest (and one of the most useful) autoregressive model. Its correlation function decreases in absolute value exponentially with lag,

h(k, φ) = φᵏ,   k = 0, 1, ….   (5.26)
The continuous-time AR(1) model, denoted CAR(1), generalizes this correlation function to continuous, non-negative lags,

h(s, φ) = φˢ,   s ≥ 0,   φ ≥ 0.   (5.27)
Moving average models express the within-group errors as linear combinations of independent noise terms a_t,

ε_t = θ₁ a_{t−1} + ⋯ + θ_q a_{t−q} + a_t.   (5.28)
The number of noise terms included in the linear model (5.28), q, is called
the order of the moving average model, which is denoted by MA(q). There
are q correlation parameters in an MA(q) model, given by θ = (θ₁, …, θ_q).
The correlation function for an MA(q) model is

h(k, θ) = (θ_k + θ₁θ_{k+1} + ⋯ + θ_{q−k}θ_q) / (1 + θ₁² + ⋯ + θ_q²),   k = 1, …, q,
h(k, θ) = 0,   k = q + 1, q + 2, ….

Observations more than q time units apart are uncorrelated, as they do not
share any common noise terms a_t.
Strategies for estimating the order of autoregressive and moving average
models in time series applications are discussed in Box et al. (1994, §3).
Mixed autoregressive-moving average models, called ARMA models, are
obtained by combining an autoregressive model and a moving
average model,

ε_t = Σᵢ₌₁ᵖ φᵢ ε_{t−i} + Σⱼ₌₁^q θⱼ a_{t−j} + a_t.
There are p + q correlation parameters in an ARMA(p, q) model, corresponding to the combination of the p autoregressive parameters φ =
(φ₁, …, φ_p) with the q moving average parameters θ = (θ₁, …, θ_q).
The semivariogram γ of two within-group errors ε_x and ε_y whose positions are a distance s apart is

γ(s, ρ) = ½ Var(ε_x − ε_y) = ½ E[(ε_x − ε_y)²],   (5.29)

with the last equality following from E[ε_x] = E[ε_y] = 0. The within-group errors can be standardized to have unit variance, without changing
The classical semivariogram estimator, based on the standardized residuals r_ij, averages the squared differences of residual pairs a distance s apart,

γ̂(s) = [1 / 2N(s)] Σᵢ₌₁ᴹ Σ (r_ij − r_ij′)²,   (5.30)

where the inner sum is over the N(s) residual pairs at distance s. A more robust version, based on square-root absolute differences, is

γ̄(s) = [1 / 2N(s)] Σᵢ₌₁ᴹ Σ |r_ij − r_ij′|^{1/2}.   (5.31)
Some Isotropic Variogram Models
Cressie (1993, §2.3.1) describes an extensive collection of isotropic variogram models and gives conditions for their validity. The single-parameter
models in Table 5.2 are a subset of the collection in Cressie (1993), with
the Linear variogram model modified so that it is bounded in s. I(s < ρ)
denotes a binary variable taking value 1 when s < ρ and 0 otherwise. Most
TABLE 5.2. Some isotropic variogram models for spatial correlation structures.

Exponential          γ(s, ρ) = 1 − exp(−s/ρ)
Gaussian             γ(s, ρ) = 1 − exp[−(s/ρ)²]
Linear               γ(s, ρ) = 1 − (1 − s/ρ) I(s < ρ)
Rational quadratic   γ(s, ρ) = (s/ρ)² / [1 + (s/ρ)²]
Spherical            γ(s, ρ) = 1 − [1 − 1.5(s/ρ) + 0.5(s/ρ)³] I(s < ρ)
of the models in Table 5.2 are also described in Littell et al. (1996, §9.3.1).
The correlation parameter ρ is generally referred to as the range in the
spatial statistics literature (Littell et al., 1996, §9.3).
For one-dimensional position vectors, the exponential spatial correlation
structure is equivalent to the CAR(1) structure (5.27). This is easily verified
by defining φ = exp(−1/ρ) and noting that the correlation function associated
with the exponential structure is expressed as h(s, ρ) = φˢ. The exponential
correlation model can be regarded as a multivariate generalization of the
CAR(1) model.
Correlation functions for the structures in Table 5.2 may be obtained
using the relation h(s, ρ) = 1 − γ(s, ρ). A nugget effect c₀ may be added to
any of the variogram models, using

γ_nugg(s, c₀, ρ) = c₀ + (1 − c₀) γ(s, ρ)   for s > 0,   γ_nugg(s, c₀, ρ) = 0   for s = 0.
Figure 5.9 displays plots of the semivariogram models in Table 5.2,
corresponding to a range of ρ = 1 and a nugget effect of c₀ = 0.1. The
semivariograms increase monotonically with distance and vary between 0
and 1, corresponding to non-negative correlation functions that decrease
monotonically with distance. All of the spatial correlation models in Table 5.2 are implemented in the nlme library as corStruct classes, described
in the next section.
FIGURE 5.9. Plots of semivariogram versus distance for the isotropic spatial correlation models in Table 5.2 with range = 1 and nugget effect = 0.1.
the second is a one-sided formula specifying the position vector and, optionally, a grouping variable for the data; observations in different groups
are assumed independent. For example, to specify age as a position variable
and to have Subject defining the grouping of the data, we use
form = ~ age | Subject
TABLE 5.3. Standard corStruct classes in the nlme library.

corCompSymm   compound symmetry
corSymm       general
corAR1        autoregressive of order 1
corCAR1       continuous-time AR(1)
corARMA       autoregressive-moving average
corExp        exponential
corGaus       Gaussian
corLin        linear
corRatio      rational quadratic
corSpher      spherical
Because the compound symmetry correlation model does not depend on the positions of the observations, but
just on the group to which they belong, the form argument is used only to
specify the grouping structure. For example,
> cs1CompSymm <- corCompSymm( value = 0.3, form = ~ 1 | Subject )
Typically, initialize is only called from within the modeling function using
the corStruct object.
corSymm
This class implements the general correlation structure (5.24). The argument value is used to initialize the correlation parameters, being given as
a numeric vector with the lower diagonal elements of the largest correlation matrix represented by the corSymm object stacked columnwise. For
example, to represent the correlation matrix

     1.0  0.2  0.1 -0.1
     0.2  1.0  0.0  0.2
     0.1  0.0  1.0  0.0        (5.32)
    -0.1  0.2  0.0  1.0

we use

value = c( 0.2, 0.1, -0.1, 0, 0.2, 0 )
The correlations specified in value must define a positive-definite correlation matrix. By default, value = numeric(0), which leads to initial values
of 0 being assigned to all correlations in the initialize method.
The argument form specifies a one-sided formula with the position variable and, optionally, a grouping variable. The position variable defines the
indices of the correlation parameters for each observation and must evaluate to an integer vector, with nonrepeated values per group, such that
its unique values, when sorted, form a sequence of consecutive integers. By
default, the position variable in form is 1, in which case the order of the
observations within the group is used to index the correlation parameters.
For example, to specify a general correlation structure with
initial correlation matrix as in (5.32), observation order within the group
as the position variable, and grouping variable Subject, we use
> cs1Symm <- corSymm( value = c(0.2, 0.1, -0.1, 0, 0.2, 0),
+                     form = ~ 1 | Subject )
> cs1Symm <- initialize( cs1Symm, data = Orthodont )
> corMatrix( cs1Symm )
$M01:
[,1] [,2] [,3] [,4]
[1,] 1.0 0.2 0.1 -0.1
[2,] 0.2 1.0 0.0 0.2
[3,] 0.1 0.0 1.0 0.0
[4,] -0.1 0.2 0.0 1.0
corAR1
This class implements an autoregressive correlation structure of order 1, for integer position vectors. The argument value initializes the single correlation parameter φ, which takes values between −1 and 1 and, by default, is set to 0. The argument form is a one-sided formula specifying the position variable and, optionally, a grouping variable. The position variable must evaluate to an integer vector, with nonrepeated values per group, but its values are not required to be consecutive, so that missing time points are naturally accommodated. By default, form = ~1, implying that the order of the observations within the group be used as the position variable.
> spatDat
     x    y
1 0.00 0.00
2 0.25 0.25
3 0.50 0.50
4 0.75 0.75
5 1.00 1.00
An exponential spatial correlation structure based on the Euclidean distance between x and y, with range equal to 1, and no nugget effect is constructed and initialized with
> cs1Exp <- corExp( 1, form = ~ x + y )
> cs1Exp <- initialize( cs1Exp, spatDat )
> corMatrix( cs1Exp )
        [,1]    [,2]    [,3]    [,4]    [,5]
[1,] 1.00000 0.70219 0.49307 0.34623 0.24312
[2,] 0.70219 1.00000 0.70219 0.49307 0.34623
[3,] 0.49307 0.70219 1.00000 0.70219 0.49307
[4,] 0.34623 0.49307 0.70219 1.00000 0.70219
[5,] 0.24312 0.34623 0.49307 0.70219 1.00000
A Manhattan distance metric can be requested through the metric argument. Note that because partial matches are used on the value of the metric argument, only the first three characters of "manhattan" need be given in the call (e.g., metric = "man").
A nugget effect of 0.2 is added to the correlation structure using
> cs3Exp <- corExp( c(1, 0.2), form = ~ x + y, nugget = T )
> cs3Exp <- initialize( cs3Exp, spatDat )
> corMatrix( cs3Exp )
        [,1]    [,2]    [,3]    [,4]    [,5]
[1,] 1.00000 0.56175 0.39445 0.27698 0.19449
[2,] 0.56175 1.00000 0.56175 0.39445 0.27698
[3,] 0.39445 0.56175 1.00000 0.56175 0.39445
[4,] 0.27698 0.39445 0.56175 1.00000 0.56175
[5,] 0.19449 0.27698 0.39445 0.56175 1.00000
New corStruct classes, representing user-defined correlation structures, can be used with the modeling functions in the nlme library. For this, one must specify a constructor function, generally with the same name as the class, and, at a minimum, methods for the functions coef, corMatrix, and initialize. The corAR1 constructor and methods can serve as templates for these.
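As a sketch of what such a constructor might look like, the hypothetical corClone class below (the name and its single-parameter form are invented for illustration) simply stores the initial parameter values and the form formula as attributes and sets the class, so that inherited corStruct methods are dispatched:

   corClone <- function(value = numeric(0), form = ~ 1)
   {
     ## store the parameters and the formula as attributes,
     ## following the conventions of the standard constructors
     object <- value
     attr(object, "formula") <- form
     ## inherit from corStruct so that default methods apply
     class(object) <- c("corClone", "corStruct")
     object
   }

Methods coef.corClone, corMatrix.corClone, and initialize.corClone would then be written by analogy with their corAR1 counterparts.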
    y_ij = (β0 + b0i) + (β1 + b1i) sin(2π t_ij) + β2 cos(2π t_ij) + ε_ij,    (5.33)

where β0, β1, and β2 are the fixed effects, bi = (b0i, b1i) is the random effects vector, assumed independent for different mares, and ε_ij is the within-group error, assumed independent for different i, j and independent of the random effects. The random effects b0i and b1i are assumed to be independent with variances σ0² and σ1², respectively.
The observations in the Ovary data were collected at equally spaced calendar times. When the calendar time was converted to the ovulation cycle scale, the intervals between observations remained very similar, but no
longer identical. Therefore, when considered in the scale of the within-group
observation order, the Ovary data provides an example of time-series data.
We use it here to illustrate the modeling of serial correlation structures in
lme.
The ACF method for the lme class obtains the empirical autocorrelation
function (5.22) from the residuals of an lme object.
> ACF( fm1Ovar.lme )
   lag       ACF
1    0  1.000000
2    1  0.379480
3    2  0.179722
4    3  0.035693
5    4  0.059779
6    5  0.002097
7    6  0.064327
8    7  0.071635
9    8  0.048578
10   9  0.027782
11  10 -0.034276
12  11 -0.077204
13  12 -0.161132
14  13 -0.196030
15  14 -0.289337
# Figure 5.11
The argument alpha specifies the significance level for approximate two-sided critical bounds for the autocorrelations (Box et al., 1994), given by ±z(1 − α/2)/√N(l), with z(p) denoting the standard normal quantile of order p and N(l) defined as in (5.22).
The empirical autocorrelations in Figure 5.11 are significantly different from 0 at the first two lags, decrease approximately exponentially for the first four lags, and stabilize at nonsignificant levels for larger lags. This suggests that an AR(1) model may be suitable for the within-group correlation, and we fit it with
> fm2Ovar.lme <- update( fm1Ovar.lme, correlation = corAR1() )
Linear mixed-effects model fit by REML
Data: Ovary
FIGURE 5.11. Empirical autocorrelation function corresponding to the standardized residuals of the fm1Ovar.lme object.
Log-restricted-likelihood: -774.72
Fixed: follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time)
 (Intercept) sin(2 * pi * Time) cos(2 * pi * Time)
      12.188            -2.9853           -0.87776

Random effects:
 Formula: ~ sin(2 * pi * Time) | Mare
 Structure: Diagonal
        (Intercept) sin(2 * pi * Time) Residual
StdDev:      2.8585              1.258   3.5071
Correlation Structure: AR(1)
Formula: ~ 1 | Mare
Parameter estimate(s):
Phi
0.5722
Number of Observations: 308
Number of Groups: 11
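The likelihood ratio test discussed next compares this fit to the independent-errors fit fm1Ovar.lme. Its output is not shown here, but, assuming the usual anova method for nested lme fits, it would be produced by a call along these lines:

   > anova( fm1Ovar.lme, fm2Ovar.lme )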
The very significant p-value for the likelihood ratio test indicates that the AR(1) model provides a substantially better fit to the data than the independent-errors model (5.33), suggesting that within-group serial correlation is present in the data.
Consistently with the likelihood ratio test results, the confidence interval on φ indicates that it is significantly different from 0.
The autocorrelation pattern in Figure 5.11 is also consistent with that of an MA(2) model, in which only the first two lags have nonzero correlations. We fit this model with
> fm3Ovar.lme <- update(fm1Ovar.lme, correlation = corARMA(q = 2))
> fm3Ovar.lme
. . .
Correlation Structure: ARMA(0,2)
Formula: ~ 1 | Mare
Parameter estimate(s):
Theta1 Theta2
0.47524 0.25701
. . .
The AR(1) and MA(2) models are not nested and, therefore, cannot be
compared through a likelihood ratio test. They can, however, be compared
via their information criterion statistics.
> anova( fm2Ovar.lme, fm3Ovar.lme, test = F )
            Model df    AIC    BIC  logLik
fm2Ovar.lme     1  7 1563.4 1589.5 -774.72
fm3Ovar.lme     2  8 1571.2 1601.0 -777.62
Even though it has one fewer parameter than the MA(2) model, the AR(1)
model is associated with a larger log-restricted-likelihood, which translates
into smaller AIC and BIC, making it the preferred model of the two.
Because the fixed- and random-effects models in (5.33) use a continuous time scale, we investigate if a continuous-time AR(1) model would provide a better representation of the within-group correlation, using the corCAR1 class.
> fm4Ovar.lme <- update( fm1Ovar.lme,
+                        correlation = corCAR1(form = ~Time) )
> anova( fm2Ovar.lme, fm4Ovar.lme, test = F )
            Model df    AIC    BIC  logLik
fm2Ovar.lme     1  7 1563.4 1589.5 -774.72
fm4Ovar.lme     2  7 1565.5 1591.6 -775.77
The low p-value for the likelihood ratio test indicates that the ARMA(1, 1) model provides a better fit to the data.
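The fm5Ovar.lme object used below is the ARMA(1, 1) fit referred to here. Assuming the corARMA class with autoregressive order p and moving-average order q, it would be obtained with a call along these lines:

   > fm5Ovar.lme <- update( fm1Ovar.lme,
   +                        correlation = corARMA(p = 1, q = 1) )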
We can assess the adequacy of the ARMA(1, 1) model using the empirical
autocorrelation function of the normalized residuals.
> plot( ACF(fm5Ovar.lme, maxLag = 10, resType = "n"),
+       alpha = 0.01 )                       # Figure 5.12
FIGURE 5.12. Empirical autocorrelation function corresponding to the normalized residuals of the fm5Ovar.lme object.
The Variogram method for the lme class estimates the sample semivariogram from the residuals of the lme object. The arguments resType
and robust control, respectively, what type of residuals should be used
("pearson" or "response") and whether the robust algorithm (5.31) or the
classical algorithm (5.30) should be used to estimate the semivariogram.
The defaults are resType = "pearson" and robust = FALSE, so that classical
estimates of the semivariogram are obtained from the standardized residuals. The argument form is a one-sided formula specifying the position vector
to be used for the semivariogram calculations.
> Variogram( fm2BW.lme, form = ~ Time )
    variog dist n.pairs
1  0.34508    1      16
2  0.99328    6      16
3  0.76201    7     144
4  0.68496    8      16
5  0.68190   13      16
6  0.95118   14     128
7  0.89959   15      16
8  1.69458   20      16
9  1.12512   21     112
10 1.08820   22      16
11 0.89693   28      96
12 0.93230   29      16
13 0.85144   35      80
14 0.75448   36      16
15 1.08220   42      64
16 1.56652   43      16
17 0.64378   49      48
18 0.67350   56      32
19 0.58663   63      16
FIGURE 5.13. Sample semivariogram estimates corresponding to the standardized residuals of the fm2BW.lme object. A loess smoother is added to the plot to enhance the visualization of patterns in the semivariogram.
The columns in the data frame returned by Variogram represent, respectively, the sample semivariogram, the distance, and the number of residual
pairs used in the estimation. Because of the imbalance in the time measurements, the number of residual pairs used at each distance varies considerably, making some semivariogram estimates more reliable than others. In
general, the number of residual pairs used in the semivariogram estimation
decreases with distance, making the values at large distances unreliable.
We can control the maximum distance for which semivariogram estimates
should be calculated using the argument maxDist.
A graphical representation of the sample semivariogram is obtained with
the plot method for class Variogram.
> plot( Variogram(fm2BW.lme, form = ~ Time,
+                 maxDist = 42) )            # Figure 5.13
The resulting plot, shown in Figure 5.13, includes a loess smoother (Cleveland et al., 1992) to enhance the visualization of semivariogram patterns. The semivariogram seems to increase with distance up to 20 days and then stabilize around 1. We initially use an exponential spatial correlation model for the within-group errors, fitting it with
> fm3BW.lme <- update( fm2BW.lme, corr = corExp(form = ~ Time) )
> fm3BW.lme
. . .
Correlation Structure: Exponential spatial correlation
The confidence interval is bounded away from zero, suggesting that the spatial correlation model produces a significantly better fit. We can also test this using the anova method.
> anova( fm2BW.lme, fm3BW.lme )
          Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm2BW.lme     1 11 1163.9 1198.4 -570.96
fm3BW.lme     2 12 1145.1 1182.8 -560.57 1 vs 2  20.781  <.0001
The likelihood ratio test also indicates that the corExp model fits the data significantly better than the independent-errors model corresponding to fm2BW.lme.
The semivariogram plot in Figure 5.13 gives some indication that a nugget effect may be present in the data. We can test it with
> fm4BW.lme <- update( fm3BW.lme,
+                      corr = corExp(form = ~ Time, nugget = T) )
> anova( fm3BW.lme, fm4BW.lme )
          Model df    AIC    BIC  logLik   Test    L.Ratio p-value
fm3BW.lme     1 12 1145.1 1182.8 -560.57
fm4BW.lme     2 13 1147.1 1187.9 -560.57 1 vs 2 0.00043111  0.9834
The nearly identical log-likelihood values indicate that a nugget effect is, in fact, not needed.
FIGURE 5.14. Sample semivariogram estimates corresponding to the standardized residuals of the fm2BW.lme object. The fitted semivariogram corresponding to fm3BW.lme is added to the plot.
The fitted semivariogram agrees reasonably well with the sample semivariogram estimates.
We can also assess the adequacy of the exponential spatial correlation
model by investigating the sample semivariogram for the normalized residuals.
> plot( Variogram(fm3BW.lme, form = ~ Time, maxDist = 42,
+                 resType = "n", robust = T) )        # Figure 5.15
FIGURE 5.15. Sample semivariogram estimates corresponding to the normalized residuals of the fm3BW.lme object.
The corExp fit has the smallest AIC and BIC and seems the most adequate within-group correlation model for the BodyWeight data, among the spatial correlation models considered.
The first argument, model, is a two-sided linear formula specifying the model for the expected value of the response. The correlation and weights arguments are used, as in lme, to define, respectively, the correlation model and the variance function model for the error term. The data argument specifies a data frame in which the variables named in model, correlation, and weights can be evaluated.
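To make the argument list concrete, a typical call combining all three arguments might look like the following sketch, in which the response y, covariate x, grouping variable g, and data frame someData are hypothetical names:

   > fm <- gls( y ~ x, data = someData,
   +            correlation = corAR1(form = ~ 1 | g),
   +            weights = varPower() )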
The fitted object returned by gls inherits from class gls, for which several methods are available to display, plot, update, and further explore the estimation results. Table 5.4 lists the most important methods for class gls. The use of the gls function and its associated methods is described and illustrated through the examples in the next sections.
Orthodontic Growth Curve
The Orthodont data were analyzed in §4.2 and §4.3 using a linear mixed-effects model. We describe here an alternative analysis based on the extended linear model (5.5).
    ε_i = (ε_i1, ε_i2, ε_i3, ε_i4)ᵀ ~ N(0, Λ_i),

          | σ1²  σ12  σ13  σ14 |
    Λ_i = | σ12  σ2²  σ23  σ24 |        (5.34)
          | σ13  σ23  σ3²  σ34 |
          | σ14  σ24  σ34  σ4² |

with Sex representing a binary variable taking values +1 for boys and −1 for girls. The parameters β1 and β3 represent, respectively, the intercept and slope gender effects. The variance–covariance matrix Λ_i is assumed the same for all subjects.
We fit (5.34) with gls using a combination of the corSymm correlation class and the varIdent variance function class.
> fm1Orth.gls <- gls( distance ~ Sex * I(age - 11), Orthodont,
+                     correlation = corSymm(form = ~ 1 | Subject),
+                     weights = varIdent(form = ~ 1 | age) )
In this case, because Orthodont is a groupedData object with grouping variable Subject, the argument form could be omitted in the call to corSymm. The print method gives some basic information about the fit.
> fm1Orth.gls
Generalized least squares fit by REML
Model: distance ~ Sex * I(age - 11)
Data: Orthodont
Log-restricted-likelihood: -213.66
Coefficients:
 (Intercept)    Sex I(age - 11) Sex:I(age - 11)
      23.801 -1.136      0.6516        -0.17524
Correlation Structure: General
Formula: ~ 1 | age
Parameter estimate(s):
Correlation:
      1     2     3
2 0.568
3 0.659 0.581
4 0.522 0.725 0.740
Variance function:
Structure: Different standard deviations per stratum
Formula: ~ 1 | age
Parameter estimates:
     8     10     12      14
     1 0.8792 1.0747 0.95872
Degrees of freedom: 108 total; 104 residual
Residual standard error: 2.329
The correlation estimates are similar, suggesting that a compound symmetry structure may be a suitable correlation model. We explore this further
with the intervals method.
> intervals( fm1Orth.gls )
Approximate 95% confidence intervals

 Coefficients:
                    lower      est.     upper
    (Intercept) 23.06672  23.80139 24.536055
            Sex -1.87071  -1.13605 -0.401380
    I(age - 11)  0.52392   0.65160  0.779289
Sex:I(age - 11) -0.30292  -0.17524 -0.047552

 Correlation structure:
            lower    est.   upper
cor(1,2) 0.098855 0.56841 0.83094
cor(1,3) 0.242122 0.65878 0.87030
cor(1,4) 0.021146 0.52222 0.81361
cor(2,3) 0.114219 0.58063 0.83731
cor(2,4) 0.343127 0.72510 0.90128
cor(3,4) 0.382248 0.73967 0.90457

 Variance function:
     lower    est.  upper
10 0.55728 0.87920 1.3871
12 0.71758 1.07468 1.6095
14 0.61253 0.95872 1.5005

 Residual standard error:
 lower  est. upper
1.5985 2.329 3.3933
All confidence intervals for the correlation parameters overlap, corroborating the compound symmetry assumption. We can test it formally by updating the fitted object and using the anova method.
> fm2Orth.gls <-
+    update(fm1Orth.gls, corr = corCompSymm(form = ~ 1 | Subject))
> anova( fm1Orth.gls, fm2Orth.gls )
            Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm1Orth.gls     1 14 455.32 492.34 -213.66
fm2Orth.gls     2  9 452.74 476.54 -217.37 1 vs 2  7.4256  0.1909
The large p-value for the likelihood ratio statistic reported in the anova output confirms the compound symmetry model.
The confidence intervals for the variance function parameters corresponding to fm2Orth.gls,
> intervals( fm2Orth.gls )
. . .
 Variance function:
     lower    est.  upper
10 0.56377 0.86241 1.3193
12 0.68132 1.03402 1.5693
14 0.60395 0.92045 1.4028
. . .
all include the value 1, suggesting that the variability does not change with age. The high p-value for the associated likelihood ratio test confirms this assumption.
> fm3Orth.gls <- update( fm2Orth.gls, weights = NULL )
> anova( fm2Orth.gls, fm3Orth.gls )
            Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm2Orth.gls     1  9 452.74 476.54 -217.37
fm3Orth.gls     2  6 448.53 464.40 -218.26 1 vs 2  1.7849  0.6182
As with other modeling functions, the plot method is used to assess the
assumptions in the model. Its syntax is identical to the plot method for
class lme. For example, to examine if the error variance is the same for boys
and girls we may examine the plot of the normalized residuals versus age
by gender, obtained with
> plot( fm3Orth.gls, resid(., type = "n") ~ age | Sex ) # Fig. 5.16
and displayed in Figure 5.16. It is clear that there is more variability among
boys than girls, which we can represent in the model with the varIdent
variance function class.
> fm4Orth.gls <- update( fm3Orth.gls,
+                        weights = varIdent(form = ~ 1 | Sex) )
> anova( fm3Orth.gls, fm4Orth.gls )
            Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm3Orth.gls     1  6 448.53 464.40 -218.26
fm4Orth.gls     2  7 438.96 457.47 -212.48 1 vs 2  11.569   7e-04
As expected, the likelihood ratio test gives strong evidence in favor of the
heteroscedastic model.
The qqnorm method is used to assess the assumption of normality for the
errors. Its syntax is identical to the corresponding lme method. The normal
plots of the normalized residuals by gender, obtained with
> qqnorm( fm4Orth.gls, ~resid(., type = "n") )        # Figure 5.17
FIGURE 5.16. Scatter plots of normalized residuals versus age by gender for the fm3Orth.gls fitted object.
and displayed in Figure 5.17, do not indicate serious departures from normality and confirm that the variance function model included in fm4Orth.gls was successful in accommodating the error heteroscedasticity.
It is interesting, at this point, to compare the gls fit corresponding to fm4Orth.gls to the lme fit corresponding to fm3Orth.lme, obtained in §4.3.1. Because the corresponding models are not nested, a likelihood ratio test is nonsensical. However, the information criteria can be compared, as the fixed-effects models are identical for the two fits. The anova method can be used to compare gls and lme objects.
> anova( fm3Orth.lme, fm4Orth.gls, test = F )
            Model df    AIC    BIC  logLik
fm3Orth.lme     1  9 432.30 456.09 -207.15
fm4Orth.gls     2  7 438.96 457.47 -212.48
The lme fit has the smallest AIC and BIC and, therefore, seems to give a better representation of the Orthodont data.
The choice between an lme model and a gls model should take into account more than just information criteria and likelihood tests. A mixed-effects model has a hierarchical structure which, in many applications, provides a more intuitive way of accounting for within-group dependency than the direct modeling of the marginal variance–covariance structure of the response in the gls approach. Furthermore, the mixed-effects estimation gives, as a byproduct, estimates for the random effects, which may be of interest in themselves. The gls model focuses on marginal inference and is more appealing when a hierarchical structure for the data is not believed to be present.
FIGURE 5.17. Normal plots of normalized residuals by gender for the fm4Orth.gls fitted object.
FIGURE 5.18. Plot of residuals versus transmembrane pressure for the homoscedastic fitted object fm1Dial.gls.
Because no variance functions and correlation structures are used, the gls fit is equivalent to an lm fit in this case.
The plot of the residuals versus transmembrane pressure, obtained with
> plot( fm1Dial.gls, resid(.) ~ pressure,
+       abline = 0 )                        # Figure 5.18
and shown in Figure 5.18, displays the same pattern of heteroscedasticity observed for the within-group residuals of the lme object fm1Dial.lme,
presented in Figure 5.2.
As in the lme analysis of §5.2.2, we choose the flexible variance function class varPower to model the heteroscedasticity in the response, and test its significance using the anova method.
> fm2Dial.gls <- update( fm1Dial.gls,
+                        weights = varPower(form = ~ pressure) )
> anova( fm1Dial.gls, fm2Dial.gls )
            Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm1Dial.gls     1 11 768.10 799.65 -373.05
fm2Dial.gls     2 12 738.22 772.63 -357.11 1 vs 2  31.882  <.0001
FIGURE 5.19. Plot of standardized residuals versus transmembrane pressure for the fm2Dial.gls fitted object.
The empirical ACF values indicate that the within-group observations are
correlated, and that the correlation decreases with lag. As usual, it is more
informative to look at a plot of the empirical ACF, displayed in Figure 5.20
and obtained with
> plot( ACF( fm2Dial.gls, form = ~ 1 | Subject),
+       alpha = 0.01 )                      # Figure 5.20
FIGURE 5.20. Empirical autocorrelation function corresponding to the standardized residuals of the fm2Dial.gls object.
The ACF pattern observed in Figure 5.20 suggests that an AR(1) model
may be appropriate to describe it. The corAR1 class is used to represent it.
> fm3Dial.gls <- update( fm2Dial.gls,
+                        corr = corAR1(0.771, form = ~ 1 | Subject) )
> fm3Dial.gls
Generalized least squares fit by REML
Model: rate ~ (pressure + I(pressure^2) + I(pressure^3) + I(pressure^4)) * QB
Data: Dialyzer
Log-restricted-likelihood: -308.34

Coefficients:
 (Intercept) pressure I(pressure^2) I(pressure^3) I(pressure^4)
     -16.818   92.334       -49.265          11.4       -1.0196
      QB pressure:QB I(pressure^2):QB I(pressure^3):QB
 -1.5942      1.7054           2.1268          0.47972
 I(pressure^4):QB
         -0.22064
Correlation Structure: AR(1)
Formula: ~ 1 | Subject
Parameter estimate(s):
Phi
0.75261
Variance function:
Structure: Power of variance covariate
Formula: ~ pressure
Parameter estimates:
power
0.51824
Degrees of freedom: 140 total; 130 residual
Residual standard error: 3.0463
The initial value for the AR(1) parameter is given by the lag-1 empirical
autocorrelation. The form argument is used to specify the grouping variable
Subject.
The intervals method is used to assess the variability in the estimates.
> intervals( fm3Dial.gls )
Approximate 95% confidence intervals

 Coefficients:
                      lower      est.      upper
     (Intercept) -18.8968 -16.81845 -14.740092
        pressure  81.9144  92.33423 102.754073
   I(pressure^2) -63.1040 -49.26515 -35.426279
   I(pressure^3)   4.5648  11.39967  18.234526
   I(pressure^4)  -2.1248  -1.01964   0.085557
              QB  -4.7565  -1.59419   1.568141
     pressure:QB -13.6410   1.70544  17.051847
I(pressure^2):QB -17.9484   2.12678  22.201939
I(pressure^3):QB  -9.3503   0.47972  10.309700
I(pressure^4):QB  -1.8021  -0.22064   1.360829

 Correlation structure:
      lower    est.   upper
Phi 0.56443 0.75261 0.86643

 Variance function:
        lower    est.   upper
power 0.32359 0.51824 0.71289

 Residual standard error:
  lower   est.  upper
 2.3051 3.0463 4.0259
No significant correlations are observed in the plot of the empirical autocorrelation function for the normalized residuals of fm3Dial.gls, displayed in Figure 5.21, indicating that the AR(1) model adequately represents the within-subject dependence.
The gls model corresponding to fm3Dial.gls may be compared to the best lme model for the Dialyzer data in §5.2.2, corresponding to the fm2Dial.lme object. As the models are not nested, only the information criterion statistics can be compared.
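As in the earlier non-nested comparisons, the information criteria would be placed side by side with the anova method, using test = F to suppress the (inapplicable) likelihood ratio test; the output itself is not reproduced here.

   > anova( fm2Dial.lme, fm3Dial.gls, test = F )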
FIGURE 5.21. Empirical autocorrelation function corresponding to the normalized residuals of the fm3Dial.gls object.
The two log-likelihoods are very similar, suggesting that the models give equivalent representations of the data. Because the gls model has five fewer parameters than the lme model, its information criterion statistics take smaller values, suggesting it is a better model. However, as pointed out in the previous example, the choice between a gls and an lme model should take other factors into consideration, besides the information criteria.
Wheat Yield Trials
Stroup and Baenziger (1994) describe an agronomic experiment to compare the yields of 56 different varieties of wheat planted in four blocks arranged according to a randomized complete block design. All 56 varieties of wheat were used in each block. The latitude and longitude of each experimental unit in the trial were also recorded. These data are described in greater detail in Appendix A.31, being included in the nlme library as the groupedData object Wheat2.
The plot of the wheat yields for each variety by block, shown in Figure 5.22, suggests that a block effect is present in the data. As pointed out by Littell et al. (1996, §9.6.2), the large number of plots within each block makes the assumption of within-block homogeneity unrealistic. A better representation of the dependence among the experimental units may be obtained via spatial correlation structures that use the information on their latitude and longitude. The corresponding extended linear model for the ith wheat variety yield in the jth block, y_ij, for i = 1, . . . , 56, j = 1, . . . , 4,
FIGURE 5.22. Yields of 56 different varieties of wheat for each block of a randomized complete block design.
is given by

    y_ij = τ_i + ε_ij,        ε ~ N(0, σ²Λ),        (5.36)

where τ_i denotes the average yield for variety i and ε_ij denotes the error term, assumed to be normally distributed with mean 0 and with variance–covariance matrix σ²Λ.
To explore the structure of Λ, we initially fit the linear model (5.36) with the errors assumed independent and homoscedastic, i.e., Λ = I.
> fm1Wheat2 <- gls( yield ~ variety - 1, Wheat2 )
As described in §5.3.2, the sample semivariogram of the standardized residuals is the primary tool for investigating spatial correlation in the errors. The Variogram method for class gls is used to obtain the sample semivariogram for the residuals of a gls object. Its syntax is identical to that of the Variogram method for lme objects.
> Variogram( fm1Wheat2, form = ~ latitude + longitude )
    variog    dist n.pairs
1  0.36308  4.3000    1212
2  0.40696  5.6080    1273
3  0.45366  8.3863    1256
4  0.51639  9.3231    1245
5  0.57271 10.5190    1254
6  0.58427 12.7472    1285
7  0.63854 13.3929    1175
8  0.65123 14.7635    1288
9  0.73590 16.1818    1290
10 0.73797 17.3666    1187
11 0.75081 18.4567    1298
FIGURE 5.23. Sample semivariogram estimates corresponding to the standardized residuals of the fm1Wheat2 object. A loess smoother is added to the plot to enhance the visualization of patterns in the semivariogram.
12 0.88098 20.2428    1226
13 0.81019 21.6335    1281
14 0.86199 22.6736    1181
15 0.86987 24.6221    1272
16 0.85818 26.2427    1223
17 0.97145 28.4542    1263
18 0.98778 30.7877    1228
19 1.09617 34.5879    1263
20 1.34146 39.3641    1234
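The fm2Wheat2 object printed next is a fit with a spherical spatial correlation structure; its printout reports the estimated range and nugget. A call along these lines, with illustrative initial values for the range and nugget parameters, would produce it:

   > fm2Wheat2 <- update( fm1Wheat2,
   +      corr = corSpher(c(28, 0.2),
   +                      form = ~ latitude + longitude, nugget = T) )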
> fm2Wheat2
Generalized least squares fit by REML
Model: yield ~ variety - 1
Data: Wheat2
Log-restricted-likelihood: -533.93
Coefficients:
varietyARAPAHOE varietyBRULE varietyBUCKSKIN varietyCENTURA
         26.659        25.85          34.848         25.095
. . .
varietySIOUXLAND varietyTAM107 varietyTAM200 varietyVONA
          25.656         22.77        18.764      24.782
Correlation Structure: Spherical spatial correlation
Formula: ~ latitude + longitude
Parameter estimate(s):
range nugget
27.457 0.20931
Degrees of freedom: 224 total; 168 residual
Residual standard error: 7.4106
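The output fragment that follows belongs to fm3Wheat2, a fit using the rational quadratic (corRatio) spatial correlation structure discussed next. It would be produced by a call along these lines, with illustrative initial values for the range and nugget parameters:

   > fm3Wheat2 <- update( fm1Wheat2,
   +      corr = corRatio(c(12.5, 0.2),
   +                      form = ~ latitude + longitude, nugget = T) )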
Parameter estimate(s):
range nugget
13.461 0.1936
Degrees of freedom: 224 total; 168 residual
Residual standard error: 8.8463
> anova( fm2Wheat2, fm3Wheat2 )
          Model df    AIC    BIC  logLik
fm2Wheat2     1 59 1185.9 1370.2 -533.93
fm3Wheat2     2 59 1183.3 1367.6 -532.64
The smaller AIC and BIC values for the rational quadratic model indicate that it gives a better representation of the correlation in the data than the spherical model. We can test the significance of the spatial correlation parameters by comparing the fm3Wheat2 fit to the fit with independent errors corresponding to fm1Wheat2.
> anova( fm1Wheat2, fm3Wheat2 )
          Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm1Wheat2     1 57 1354.7 1532.8 -620.37
fm3Wheat2     2 59 1183.3 1367.6 -532.64 1 vs 2  175.46  <.0001
The large value of the likelihood ratio test statistic gives strong evidence against the assumption of independence.
We can verify the adequacy of the corRatio model by examining the plot
of the sample semivariogram for the normalized residuals of fm3Wheat2.
> plot( Variogram(fm3Wheat2, resType = "n") )
# Figure 5.24
No patterns are observed in the plot of the sample semivariogram, suggesting that the rational quadratic model is adequate.
FIGURE 5.25. Scatter plot of normalized residuals versus fitted values for the fitted object fm3Wheat2.
The normalized residuals are also useful for investigating heteroscedasticity and departures from normality. For example, the plot of the normalized residuals versus the fitted values, displayed in Figure 5.25 and obtained with
> plot( fm3Wheat2, resid(., type = "n") ~ fitted(.),
+       abline = 0 )                        # Figure 5.25
does not indicate any heteroscedastic patterns. The normal plot of the
normalized residuals in Figure 5.26 is obtained with
> qqnorm( fm3Wheat2, ~ resid(., type = "n") )         # Figure 5.26
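The F-test for variety referred to below would be obtained with the single-argument form of the anova method, which reports the Wald F-table for the model terms; its output is not reproduced here.

   > anova( fm3Wheat2 )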
The small p-value of the F-test for variety indicates that there are significant differences between varieties. We can test specific contrasts using the
FIGURE 5.26. Normal plot of normalized residuals for the fitted object fm3Wheat2.
argument L to anova, with the original cell-means parametrization. For example, to test the difference between the first and the third wheat varieties we use
> anova( fm3Wheat2, L = c(-1, 0, 1) )
Denom. DF: 168
F-test for linear combination(s)
varietyARAPAHOE varietyBUCKSKIN
             -1               1
  numDF F-value p-value
1     1  7.6966  0.0062
The small p-value for the F-test, combined with the coefficient estimates displayed previously, indicates that the BUCKSKIN variety has significantly higher yields than the ARAPAHOE variety. Similar analyses can be obtained for other linear contrasts of the model coefficients.
several classes of correlation structures to represent serial and spatial correlation, and describe how variance functions and correlation structures can be combined to flexibly model the within-group variance–covariance structure.
We illustrate, through several examples, how the lme function is used to fit the extended linear mixed-effects model and describe a suite of S classes and methods to implement variance functions (varFunc) and correlation structures (corStruct). Any of these classes, or others defined by users, can be used with lme to fit extended linear mixed-effects models.
An extended linear model with heteroscedastic, correlated errors is introduced and a new modeling function to fit it, gls, is described. This extended linear model can be thought of as an extended linear mixed-effects model with no random effects, and any of the varFunc and corStruct classes available with lme can also be used with gls. Several examples are used to illustrate the use of gls and its associated methods.
Exercises
1. The within-group heteroscedasticity observed for the fm1BW.lme fit of the BodyWeight data in §5.2.2 was modeled using the power variance function (varPower) with the fitted values as the variance covariate. An alternative approach, which is explored in this exercise, is to allow different variances for each Diet.
(a) Plot the residuals of fm1BW.lme versus Diet (use plot(fm1BW.lme, resid(.) ~ as.integer(Diet), abline = 0)). Note that the variability for Diets 1 and 2 is similar, but the residuals for Diet 3 have larger variability.
(b) Update the fm1BW.lme fit allowing different variances per Diet (use weights = varIdent(form = ~1|Diet)). To get a fit that can be compared to fm1BW.lme, remember to set the contrasts parameterization to "contr.helmert". Obtain confidence intervals on the variance function coefficients using intervals. Do they agree with the conclusions from the plot of the residuals versus Diet? Explain.
(c) Compare the fit with the varIdent variance function to fm1BW.lme using anova. Compare it also to fm2BW.lme, the fit with the varPower variance function. Which variance function model is preferable? Why?
(d) Use the gls function described in §5.4 to fit a model with a varPower variance function in the fitted values and a corCAR1 correlation structure in Time, but with no random effects. Com-
We use a variance function based on a linear combination of covariates, which we denote varReg. Letting v denote a vector of variance covariates and δ the variance parameters, the varReg variance model and variance function are defined as

    Var(ε) = σ² exp(2 vᵀδ),    g(v, δ) = exp(vᵀδ),

where, as in §5.2, ε represents the random variable whose variance is being modeled. Note that the variance parameters δ are unconstrained and, for identifiability, the linear model in the variance function cannot have an intercept (it would be confounded with σ). The varReg class extends the varExp class defined in §5.2 by allowing more than one variance covariate to be used. The varReg class is frequently used in the analysis of dispersion effects in robust designs (Wolfinger and Tobias, 1998).
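Although the exercise asks for S code, the variance model itself can first be checked numerically. The sketch below (in Python rather than S, purely as an illustration of the formulas) evaluates g(v, δ) = exp(vᵀδ) and the corresponding weights exp(−vᵀδ).

```python
import math

def var_fun(v, delta):
    """Variance function g(v, delta) = exp(v' delta) for the varReg model."""
    return math.exp(sum(vk * dk for vk, dk in zip(v, delta)))

def weight(v, delta):
    """The fitting functions work with weights 1/g(v, delta) = exp(-v' delta)."""
    return math.exp(-sum(vk * dk for vk, dk in zip(v, delta)))

# delta = 0 corresponds to the homoscedastic model: g = 1 for every v
v = [1.2, -0.5]
print(var_fun(v, [0.0, 0.0]))                              # 1.0
print(var_fun(v, [0.3, 0.1]) * weight(v, [0.3, 0.1]))      # 1.0 (g times 1/g)
```

With δ initialized to 0, as suggested in part (b) below, every weight equals 1, which is exactly the homoscedastic starting point.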
(a) Write a constructor for the varReg class. It should contain at least two arguments: value, with initial values for the variance parameters (set the default to numeric(0) to indicate noninitialized structures), and form, with a linear formula defining the variance covariates. To simplify things, assume that the fitted object cannot be used to define a variance covariate for this class, that no stratification of parameters will be available, and that no parameters can be made fixed in the estimation. You can use the varExp constructor as a template.
(b) Next you need to write an initialize method. This takes two required arguments: object, the varReg object, and data, a data frame in which to evaluate the variance covariates. For consistency with the generic function, include a ... at the end of the argument list. The initialize method should obtain the model matrix corresponding to formula(object), evaluated on data, and save it as an attribute for later calculations. As mentioned before, the model matrix should not have an (Intercept) column; check if one is present and remove it if necessary. You should make sure that the parameters are initialized (if no starting values are given in the constructor, initialize them to 0, corresponding to a homoscedastic variance model). Additionally, the "logLik" and "weights" attributes of the returned object need to be initialized. Note that the weights are simply exp(-modMat %*% coefs), where modMat represents the model matrix for the variance covariates and coefs the initialized coefficients. You can use the initialize.varExp method as a template.
(c) The coef method for the varReg class is very simple: for consistency with other varFunc coef methods, it takes three arguments: object (the varReg object), unconstrained, and allCoef. Because
Part II

Nonlinear Mixed-Effects Models

6
Nonlinear Mixed-Effects Models: Basic Concepts and Motivating Examples
    h(t) = φ1 / {1 + exp[−(t − φ2)/φ3]}      (6.1)

[Figures 6.1 and 6.2: comparison of the logistic model (6.1) and a polynomial model, h(t) plotted against t.]
# Figure 6.3
FIGURE 6.3. Concentration of indomethacin over time for six subjects following intravenous injection.
    y_j = φ1 exp(−φ2 t_j) + φ3 exp(−φ4 t_j) + ε_j,    φ2 > 0, φ4 > 0,      (6.2)
    y_j = φ1 exp[−exp(φ2) t_j] + φ3 exp[−exp(φ4) t_j] + ε_j.      (6.3)
[Figure 6.4: boxplots of residuals by Subject for the nls fit of the indomethacin data; the residuals range from about −0.6 to 0.6.]
[Coefficient t-values from the nls fit: 10.9522, 3.9882, 2.2713, −2.6705.]
understanding the true structure of the data and from considering different sources of variability that are of interest in themselves. For example, in the indomethacin study, an important consideration in determining an adequate therapeutic regime for the drug is knowing how the concentration profiles vary among individuals.
To fit a separate biexponential model to each subject, thus allowing the individual effects to be incorporated in the parameter estimates, we express the model as

    y_ij = φ1i exp[−exp(φ2i) t_j] + φ3i exp[−exp(φ4i) t_j] + ε_ij,      (6.4)

where, as before, the ε_ij are independent N(0, σ²) errors, and use the nlsList function

> fm1Indom.lis <- nlsList( conc ~ SSbiexp(time, A1, lrc1, A2, lrc2),
+                          data = Indometh )
> fm1Indom.lis
Call:
  Model: conc ~ SSbiexp(time, A1, lrc1, A2, lrc2) | Subject
   Data: Indometh
Coefficients:
      A1    lrc1      A2     lrc2
1 2.0293 0.57938 0.19154 -1.78783
4 2.1979 0.24249 0.25481 -1.60153
2 2.8277 0.80143 0.49903 -1.63508
5 3.5663 1.04095 0.29170 -1.50594
6 3.0022 1.08811 0.96840 -0.87324
3 5.4677 1.74968 1.67564 -0.41226
We can see that there is considerable variability in the individual parameter estimates and that the residual standard error is less than one-half that from the nls fit. The boxplots of the residuals by subject, shown in Figure 6.5, indicate that the individual effects have been accounted for in the fitted nlsList model.
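The biexponential mean function in (6.4), with log rate constants as in SSbiexp, can be sketched numerically (in Python rather than S, purely for illustration); the parameter values below are subject 1's estimates from the fm1Indom.lis output.

```python
import math

def biexp(t, A1, lrc1, A2, lrc2):
    """Biexponential model with log rate constants, as in SSbiexp:
    A1*exp(-exp(lrc1)*t) + A2*exp(-exp(lrc2)*t)."""
    return A1 * math.exp(-math.exp(lrc1) * t) + A2 * math.exp(-math.exp(lrc2) * t)

# subject 1's estimates from the nlsList fit above
A1, lrc1, A2, lrc2 = 2.0293, 0.57938, 0.19154, -1.78783
print(biexp(0.0, A1, lrc1, A2, lrc2))   # A1 + A2 = 2.22084 at time 0
```

At t = 0 the two exponentials both equal 1, so the predicted concentration is A1 + A2; the curve then decays monotonically, with the lrc1 ("elimination") term vanishing much faster than the lrc2 ("terminal") term.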
The nlsList model is at the other extreme of the flexibility spectrum compared to the nls model: it uses 24 coefficients to represent the individual concentration profiles and does not take into account the obvious similarities among the individual curves, indicated in Figure 6.3. The nlsList model is useful when one is interested in modeling the behavior of a particular, fixed set of individuals, but it is not adequate when the observed individuals are to be treated as a sample from a population of similar individuals, which constitutes the majority of applications involving grouped data. In this case, the interest is in estimating the average behavior of an
[Figure 6.5: boxplots of residuals (mcg/ml) by Subject for the fm1Indom.lis fit.]
[Figure 6.6: individual confidence intervals on A1, lrc1, A2, and lrc2 for each Subject in the fm1Indom.lis fit.]
individual in the population and the variability among and within individuals, which is precisely what mixed-effects models are designed to do.
The plot of the individual confidence intervals for the coefficients in the nlsList model, shown in Figure 6.6, gives a better idea about their variability among subjects.

> plot( intervals(fm1Indom.lis) )          # Figure 6.6

The terminal phase log-rate constants, φ4i, do not seem to vary substantially among individuals, but the remaining parameters do.
Recall that in lmList fits to balanced data, the lengths of the confidence intervals on a parameter were the same for all the groups (see Figure 1.12, p. 33, or Figure 1.13, p. 34). This does not occur in an nlsList fit because the approximate standard errors used to produce the confidence intervals in a nonlinear least squares fit depend on the parameter estimates (Seber and Wild, 1989, §5.1).
The nlme function extracts the information about the model to fit, the parameters to estimate, and the starting estimates for the fixed effects from the fm1Indom.lis object.
The near-zero estimate for the standard deviation of the lrc2 random effect suggests that this term could be dropped from the model. The remaining estimated standard deviations suggest that the other random effects should be kept in the model. We can test whether the lrc2 random effect can be removed from the model by updating the fit and using anova.
> fm2Indom.nlme <- update( fm1Indom.nlme,
+        random = pdDiag(A1 + lrc1 + A2 ~ 1) )
> anova( fm1Indom.nlme, fm2Indom.nlme )
              Model df     AIC     BIC logLik   Test    L.Ratio p-value
fm1Indom.nlme     1  9 -91.185 -71.478 54.592
fm2Indom.nlme     2  8 -93.185 -75.668 54.592 1 vs 2 6.2637e-06   0.998
The two fits give nearly identical log-likelihoods, confirming that lrc2 can be treated as a purely fixed effect.
To further explore the variance–covariance structure of the random effects that are left in fm2Indom.nlme, we update the fit using a general positive-definite matrix.

> fm3Indom.nlme <- update( fm2Indom.nlme, random = A1+lrc1+A2 ~ 1 )
> fm3Indom.nlme
. . .
Random effects:
 Formula: list(A1 ~ 1, lrc1 ~ 1, A2 ~ 1)
 Level: Subject
 Structure: General positive-definite
           StdDev Corr
A1       0.690406 A1    lrc1
lrc1     0.179030 0.932
A2       0.153669 0.471 0.118
Residual 0.078072
. . .
The large correlation between the A1 and lrc1 random effects and the small correlations between these random effects and the A2 random effect suggest that a block-diagonal Ψ could be used to represent the variance–covariance structure of the random effects.

> fm4Indom.nlme <- update( fm3Indom.nlme,
+        random = pdBlocked(list(A1 + lrc1 ~ 1, A2 ~ 1)) )
> anova( fm3Indom.nlme, fm4Indom.nlme )
              Model df     AIC     BIC logLik   Test L.Ratio p-value
fm3Indom.nlme     1 11 -94.945 -70.859 58.473
fm4Indom.nlme     2  9 -97.064 -77.357 57.532 1 vs 2  1.8809  0.3904
The large p-value for the likelihood ratio test and the smaller values for the AIC and BIC corroborate the block-diagonal variance–covariance structure. Allowing the A1 and lrc1 random effects to be correlated causes a significant improvement in the log-likelihood.
> anova( fm2Indom.nlme, fm4Indom.nlme )
              Model df     AIC     BIC logLik   Test L.Ratio p-value
fm2Indom.nlme     1  8 -93.185 -75.668 54.592
fm4Indom.nlme     2  9 -97.064 -77.357 57.532 1 vs 2  5.8795  0.0153
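The anova p-value can be reproduced by hand: the likelihood ratio statistic is twice the difference in log-likelihoods, referred to a chi-squared distribution with degrees of freedom equal to the difference in parameter counts (9 − 8 = 1 here). A quick check (in Python rather than S, using only the printed log-likelihoods):

```python
import math

def lrt_pvalue(loglik0, loglik1, df_diff):
    """Likelihood ratio test: 2*(loglik1 - loglik0) ~ chi^2(df_diff).
    For df_diff == 1 the chi-squared survival function is erfc(sqrt(stat/2))."""
    stat = 2.0 * (loglik1 - loglik0)
    assert df_diff == 1, "this sketch handles one degree of freedom only"
    return stat, math.erfc(math.sqrt(stat / 2.0))

# fm2Indom.nlme (logLik 54.592) versus fm4Indom.nlme (logLik 57.532)
stat, p = lrt_pvalue(54.592, 57.532, 1)
print(round(stat, 3), round(p, 4))   # 5.88 0.0153
```

The tiny difference from the printed L.Ratio (5.8795 versus 5.88) is just rounding of the displayed log-likelihoods.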
The plot of the standardized residuals versus the fitted values corresponding to fm4Indom.nlme, presented in Figure 6.7, does not indicate any departures from the NLME model assumptions, except for two possible outlying observations for Individual 2.

> plot( fm4Indom.nlme, id = 0.05, adj = -1 )      # Figure 6.7

No significant departures from the assumption of normality for the within-group errors are observed in the normal probability plot of the standardized residuals of fm4Indom.nlme, shown in Figure 6.8.
FIGURE 6.7. Scatter plot of standardized residuals versus fitted values for fm4Indom.nlme.
# Figure 6.8
# Figure 6.9
Note that the within-group predictions are in close agreement with the observed concentrations, illustrating that the NLME model can accommodate individual effects.
We conclude that fm4Indom.nlme provides a good representation of the concentration profiles in the indomethacin data. Its summary
> summary( fm4Indom.nlme )
Nonlinear mixed-effects model fit by maximum likelihood
Model: conc ~ SSbiexp(time, A1, lrc1, A2, lrc2)
Data: Indometh
     AIC     BIC logLik
 -97.064 -77.357 57.532
Random effects:
Composite Structure: Blocked
Block 1: A1, lrc1
Formula: list(A1 ~ 1, lrc1 ~ 1)
FIGURE 6.8. Normal plot of standardized residuals for the fm4Indom.nlme fit.
[Figure 6.9: observed concentrations with fixed-effects and within-subject predicted curves versus time for each Subject in the fm4Indom.nlme fit.]
 Level: Subject
 Structure: General positive-definite
        StdDev Corr
A1     0.69496 A1
lrc1   0.17067 0.905
Block 2: A2
 Formula: A2 ~ 1 | Subject
             A2 Residual
StdDev: 0.18344 0.078226

Fixed effects: list(A1 ~ 1, lrc1 ~ 1, A2 ~ 1, lrc2 ~ 1)
       Value Std.Error DF t-value p-value
A1    2.8045   0.31493 57  8.9049  <.0001
lrc1  0.8502   0.11478 57  7.4067  <.0001
A2    0.5887   0.13321 57  4.4195  <.0001
lrc2 -1.1029   0.16954 57 -6.5054  <.0001
. . .
shows that the fixed-effects estimates are similar to the parameter estimates in the nls fit fm1Indom.nls. The approximate standard errors for the fixed effects are substantially different from, and, except for A1, considerably smaller than, those from the nls fit. The estimated within-group standard error is slightly larger than the residual standard error in the nlsList fit fm1Indom.lis.
FIGURE 6.10. Average leaf weight per plant of two genotypes of soybean versus time since planting, over three different planting years. Within each year, data were obtained on eight plots of each variety of soybean.
  Plot Variety . . .
1 1988F1     F . . .
2 1988F1     F . . .
3 1988F1     F . . .
> plot( Soybean, . . . )          # Figure 6.10
The average leaf weight per plant in each plot is measured the same number of times, but at different times, making the data unbalanced.
There is considerable variation in the growth curves among plots, but the same overall S-shaped pattern is observed for all plots. This nonlinear growth pattern is well described by the three-parameter logistic model (6.1), introduced in §6.1. The self-starting function SSlogis, described in Appendix C.7, can be used to automatically generate starting estimates for the parameters in an nlsList fit.

> fm1Soy.lis <- nlsList( weight ~ SSlogis(Time, Asym, xmid, scal),
+                        data = Soybean )
Error in nls(y ~ 1/(1 + exp((xmid - x)/scal)..: singular gradient
  matrix
Dumped
> fm1Soy.lis
Call:
  Model: weight ~ SSlogis(Time, Asym, xmid, scal) | Plot
   Data: Soybean
Coefficients:
          Asym   xmid    scal
1988F4 15.1513 52.834  5.1766
1988F2 19.7455 56.575  8.4067
. . .
1989P8      NA     NA      NA
. . .
1990P5 19.5438 51.148  7.2920
1990P2 25.7873 62.360 11.6570
The error message from nls indicates that convergence was not attained for one of the plots, 1989P8. The nlsList function is able to recover from such nonconvergence problems and carry on with subsequent nls fits. Missing values (NA) are assigned to the coefficients of the nonconverging fits. The coefficients in fm1Soy.lis are related to the logistic model parameters as follows: φ1 = Asym, φ2 = xmid, and φ3 = scal.
Analysis of the individual confidence intervals for fm1Soy.lis suggests that random effects are needed for all of the parameters in the logistic model. The corresponding nonlinear mixed-effects model for the average leaf weight per plant y_ij in plot i at t_ij days after planting is
    y_ij = φ1i / {1 + exp[−(t_ij − φ2i)/φ3i]} + ε_ij,

    φ_i = [φ1i, φ2i, φ3i]ᵀ = [β1, β2, β3]ᵀ + [b1i, b2i, b3i]ᵀ = β + b_i,      (6.7)

    b_i ~ N(0, Ψ),    ε_ij ~ N(0, σ²).
The fixed effects β represent the mean values of the parameters φ_i in the population and the random effects b_i represent the deviations of the φ_i from their mean values. The random effects are assumed to be independent for different plots and the within-group errors ε_ij are assumed to be independent for different i, j and to be independent of the random effects.
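Model (6.7) is straightforward to simulate, which is a useful check on one's understanding of its two stages. The sketch below (in Python rather than S) uses hypothetical population values, not estimates from the text, and a diagonal Ψ for simplicity.

```python
import math, random

def logistic(t, asym, xmid, scal):
    """Three-parameter logistic model: asym / (1 + exp(-(t - xmid)/scal))."""
    return asym / (1.0 + math.exp(-(t - xmid) / scal))

def simulate_plot(t_days, beta, sd_b, sigma, rng):
    """One plot's trajectory under model (6.7): phi_i = beta + b_i with
    independent random effects (diagonal Psi) and N(0, sigma^2) errors.
    beta and sd_b are illustrative values only."""
    phi = [b + rng.gauss(0.0, s) for b, s in zip(beta, sd_b)]
    return [logistic(t, *phi) + rng.gauss(0.0, sigma) for t in t_days]

rng = random.Random(0)
beta = (18.0, 52.0, 7.5)           # hypothetical Asym, xmid, scal means
y = simulate_plot(range(20, 90, 10), beta, (2.0, 1.0, 0.5), 0.3, rng)
print(len(y))   # 7 simulated observations for this plot
```

Note that at t = xmid the logistic curve is exactly half its asymptote, which is why xmid is interpreted below as the time at which half of the asymptotic leaf weight is attained.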
Because the number of plots in the soybean data, 48, is large compared to the number of random effects in (6.7), we use a general positive-definite Ψ for the initial NLME model. Because we can extract information about the model, the parameters to estimate, and the starting values for the fixed effects from the fm1Soy.lis object, we can fit model (6.7) with the simple call
FIGURE 6.11. Scatter plot of standardized residuals versus fitted values for fm1Soy.nlme.
The plot of the standardized residuals versus the fitted values in Figure 6.11 shows a pattern of increasing variability for the within-group errors. We model the within-group heteroscedasticity using the power variance function, described in §5.2.1 and represented in S by the varPower class.

> fm2Soy.nlme <- update( fm1Soy.nlme, weights = varPower() )
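The varPower model corresponds to Var(ε_ij) = σ² |μ_ij|^{2δ}, with μ_ij the fitted value and δ estimated from the data. A minimal numerical illustration (in Python rather than S):

```python
def power_variance(fitted, sigma2, delta):
    """varPower model: Var(eps) = sigma^2 * |fitted|^(2*delta).
    delta = 0 recovers the homoscedastic model; delta = 1 means the
    standard deviation grows proportionally to the fitted value."""
    return sigma2 * abs(fitted) ** (2.0 * delta)

print(power_variance(10.0, 0.04, 0.0))   # 0.04, constant variance
print(power_variance(10.0, 0.04, 1.0))   # 4.0, sd proportional to fitted value
```

The estimate of the power reported below (close to 1) says that for the soybean data the within-group standard deviation is nearly proportional to the fitted leaf weight.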
FIGURE 6.12. Standardized residuals versus fitted values in the nlme fit of the soybean data, with heteroscedastic error.
# Figure 6.13
In Figure 6.13 all three parameters seem to vary with year and variety. It appears that the asymptote (Asym) and the scale (scal) are larger for the P variety than for the F variety and that this difference is more pronounced in 1989. The time at which half of the asymptotic leaf weight is attained (xmid) appears to be smaller for the P variety than for the F variety.
The fixed argument to nlme allows linear modeling of parameters with respect to covariates. For example, we can model the dependence of all three parameters on Year with
> soyFix <- fixef( fm2Soy.nlme )
FIGURE 6.13. Estimates of the random effects by Year and Variety in the nlme fit of the soybean data.
. . .
Random effects:
 Formula: list(Asym ~ 1, xmid ~ 1, scal ~ 1)
 Level: Plot
 Structure: General positive-definite
                     StdDev Corr
Asym.(Intercept) 2.3686896 Asy(I) xmd(I)
xmid.(Intercept) 0.5863454 -0.997
scal.(Intercept) 0.0043059 -0.590  0.652
Residual         0.2147634

Variance function:
 Structure: Power of variance covariate
 Formula: ~ fitted(.)
 Parameter estimates:
   power
 0.95187
. . .
As suggested by Figure 6.13, Year has a very significant effect on the growth pattern of the soybean plants.
The estimated standard deviation for the scal random effect in the fm3Soy.nlme fit is only 0.004, corresponding to an estimated coefficient of variation with respect to the scal.(Intercept) fixed effect of only 0.05%. This suggests that scal can be treated as a purely fixed effect. When we do refit the model dropping the scal random effect, we get a p-value of 0.99 in the likelihood ratio test. It often happens that creating a better-fitting model for the fixed effects, by including their dependence on covariates, reduces the need for random-effects terms. In these cases, the between-group parameter variation is mostly being explained by the covariates included in the model.
Proceeding sequentially in the model-building process by examining plots of the estimated random effects against the experimental factors, testing for the inclusion of covariates and for the elimination of random effects, we end up with the following model, in which the only random effect is that for Asym.
> summary( fm4Soy.nlme )
. . .
    AIC    BIC  logLik
 616.32 680.66 -292.16

Random effects:
 Formula: Asym ~ 1 | Plot
        Asym.(Intercept) Residual
StdDev:           1.0349  0.21804

Variance function:
 Structure: Power of variance covariate
 Formula: ~ fitted(.)
 Parameter estimates:
  power
 0.9426
Fixed effects: list(Asym ~ Year * Variety, xmid ~ Year + Variety,
  scal ~ Year)
                       Value Std.Error  DF t-value p-value
Asym.(Intercept)      19.434    0.9537 352  20.379  <.0001
Asym.Year1989         -8.842    1.0719 352  -8.249  <.0001
Asym.Year1990         -3.707    1.1768 352  -3.150  0.0018
Asym.Variety           1.623    1.0380 352   1.564  0.1188
Asym.Year1989Variety   5.571    1.1704 352   4.760  <.0001
Asym.Year1990Variety   0.147    1.1753 352   0.125  0.9004
xmid.(Intercept)      54.815    0.7548 352  72.622  <.0001
xmid.Year1989         -2.238    0.9718 352  -2.303  0.0218
xmid.Year1990         -4.970    0.9743 352  -5.101  <.0001
xmid.Variety          -1.297    0.4144 352  -3.131  0.0019
scal.(Intercept)       8.064    0.1472 352  54.762  <.0001
scal.Year1989         -0.895    0.2013 352  -4.447  <.0001
scal.Year1990         -0.673    0.2122 352  -3.172  0.0016
. . .
The residual plots for fm4Soy.nlme do not indicate any violations of the NLME model assumptions. An overall assessment of the adequacy of the model is provided by the plot of the augmented predictions in Figure 6.14, which indicates that the fm4Soy.nlme model describes the individual growth patterns of the soybean plots well.
[Figure 6.14: observed average leaf weight per plant and augmented predictions versus time since planting for each of the 48 soybean plots in the fm4Soy.nlme fit.]
FIGURE 6.15. Serum concentrations of phenobarbital in 59 newborn infants under varying dosage regimens versus time after birth.

    c(t) = Σ_{d: t_d < t} (D_d / V) exp[−(Cl/V)(t − t_d)].      (6.8)
Model (6.8) can also be expressed in recursive form (Grasela and Donn, 1985). To ensure that the estimates of V and Cl are positive, we reparameterize (6.8) using the logarithms of these parameters: lV = log V and lCl = log Cl. The function phenoModel in the nlme library implements the reparameterized version of model (6.8) in S. Because phenoModel is not self-starting, initial values need to be provided for the parameters when this function is used for estimation in S.
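Model (6.8) is a superposition of single-dose exponential decays. The sketch below (in Python rather than S, and not the nlme phenoModel function itself) evaluates it for a hypothetical dosing history.

```python
import math

def pheno_conc(t, doses, V, Cl):
    """One-compartment open model with IV bolus dosing, model (6.8):
    c(t) = sum over doses D_d given at times t_d < t of
           (D_d / V) * exp(-(Cl / V) * (t - t_d))."""
    return sum(D / V * math.exp(-(Cl / V) * (t - td))
               for td, D in doses if td < t)

# illustrative dosing history (time, dose), not data from the study
doses = [(0.0, 25.0), (12.0, 12.5)]
V, Cl = 1.2, 0.005
print(round(pheno_conc(1.0, doses, V, Cl), 3))
```

Only doses given strictly before t contribute, so the concentration jumps by D/V at each dosing time and decays at rate Cl/V in between; this is what makes the recursive form mentioned above possible.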
Because of the small number of concentrations recorded for each individual, the usual model-building approach of beginning the analysis with an nlsList fit cannot be used with the phenobarbital data and we must go directly to an NLME fit. The nonlinear mixed-effects model corresponding to (6.8), representing the phenobarbital concentration y_ij measured at time t_ij on individual i, following intravenous injections of doses D_id at times t_id, is expressed as

    y_ij = Σ_{d: t_id < t_ij} [D_id / exp(lV_i)] exp[−exp(lCl_i − lV_i)(t_ij − t_id)] + ε_ij,

    [lCl_i, lV_i]ᵀ = [β1, β2]ᵀ + [b1i, b2i]ᵀ = β + b_i,      (6.9)

    b_i ~ N(0, Ψ),    ε_ij ~ N(0, σ²).
[Figure 6.16: estimated lCl and lV random effects from fm1Pheno.nlme plotted against birth weight (Wt) and against the Apgar score indicator (ApgarInd: < 5 versus >= 5).]
Starting values for the fixed effects are obtained from Davidian and Giltinan (1995, §6.6). The na.action argument in the nlme call is used to preserve those rows with dose information. (These rows contain NA for the concentration.) The naPattern argument is used to remove these rows from the calculation of the objective function in the optimization algorithm.
One of the questions of interest for the phenobarbital data is the possible relationship between the pharmacokinetic parameters in (6.9) and the additional covariates available on the infants: birth weight and 5-minute Apgar score. For the purposes of modeling the pharmacokinetic parameters, the 5-minute Apgar score is converted to a binary indicator of whether the score is < 5 or >= 5, represented by the column ApgarInd in the Phenobarb data frame.
Figure 6.16 contains plots of the estimated random effects from fm1Pheno.nlme versus birth weight (Wt) and the 5-minute Apgar score indicator (ApgarInd). It is produced by

> fm1Pheno.ranef <- ranef( fm1Pheno.nlme, augFrame = T )
> plot( fm1Pheno.ranef, form = lCl ~ Wt + ApgarInd )
> plot( fm1Pheno.ranef, form = lV ~ Wt + ApgarInd )
The plots in Figure 6.16 clearly indicate that both clearance and volume of distribution increase with birth weight. A linear model seems adequate to represent the increase in lV with birth weight. For birth weights less than 2.5 kg, the increase in lCl seems linear. Because there are few infants with birth weights greater than 2.5 kg in this data set, it is unclear whether the linear relationship between lCl and Wt extends beyond this limit, but we will assume it does. The Apgar score does not seem to have any relationship with clearance and is not included in the model for the lCl fixed effect. It is unclear whether the Apgar score and the volume of distribution are related, so we include ApgarInd in the model for lV to test for a possible relationship.
The updated fit with covariates included in the fixed-effects model is then obtained with
> options( contrasts = c("contr.treatment", "contr.poly") )
> fm2Pheno.nlme <- update( fm1Pheno.nlme,
+        fixed = list(lCl ~ Wt, lV ~ Wt + ApgarInd),
+        start = c(-5.0935, 0, 0.34259, 0, 0),
+        control = list(pnlsTol = 1e-6) )
> # pnlsTol reduced to prevent convergence problems in PNLS step
> summary( fm2Pheno.nlme )
. . .
Random effects:
 Formula: list(lCl ~ 1, lV ~ 1)
 Level: Subject
 Structure: Diagonal
        lCl.(Intercept) lV.(Intercept) Residual
StdDev:         0.21599        0.17206   2.7374

Fixed effects: list(lCl ~ Wt, lV ~ Wt + ApgarInd)
                  Value Std.Error DF t-value p-value
lCl.(Intercept) -5.9574   0.12425 92 -47.947  <.0001
lCl.Wt           0.6197   0.07569 92   8.187  <.0001
lV.(Intercept)  -0.4744   0.07258 92  -6.537  <.0001
lV.Wt            0.5325   0.04141 92  12.859  <.0001
lV.ApgarInd     -0.0228   0.05131 92  -0.444  0.6577
. . .
The likelihood ratio tests for dropping either of the random effects in fm3Pheno.nlme have very significant p-values (< 0.0001), indicating that both random effects are needed in the model to account for individual effects.
A plot of the augmented predictions is not meaningful for the phenobarbital data, due to the small number of observations per individual, but we can still assess the adequacy of the fit with the plot of the observed concentrations against the within-group fitted values, produced with

> plot( fm3Pheno.nlme, conc ~ fitted(.), abline = c(0,1) )

and displayed in Figure 6.17. The good agreement between the observations and the predictions attests to the adequacy of the fm3Pheno.nlme model.
[Figure 6.17: observed phenobarbital concentrations versus within-group fitted values for the fm3Pheno.nlme fit.]
The purpose of this chapter is to present the motivation for using NLME models with grouped data and to set the stage for the following two chapters in the book, dealing with the theory and computational methods for NLME models (Chapter 7) and the nonlinear modeling facilities in the nlme library (Chapter 8).
Exercises
1. The Loblolly data described in Appendix A.13 consist of the heights of 14 Loblolly pine trees planted in the southern United States, measured at 3, 5, 10, 15, 20, and 25 years of age. An asymptotic regression model, available in nlme as the self-starting function SSasymp (Appendix C.1), seems adequate to explain the observed growth pattern.
(a) Plot the data (using plot(Loblolly)) and verify that the same growth pattern is observed for all trees. The similarity can be emphasized by putting all the curves into a single panel using plot(Loblolly, outer = ~1). What is the most noticeable difference among the curves?
(b) Fit a separate asymptotic regression model to each tree using nlsList and SSasymp. Notice that no starting values are needed for the parameters, as they are automatically produced by SSasymp.
not attained for a model with a general Ψ (try it). Use a diagonal structure for Ψ (random = pdDiag(A1+lrc1+A2+lrc2~1)) and examine the estimated standard errors for the random effects. Can any of the random effects be dropped from the model? Does this agree with your conclusions from the plot of the confidence intervals for the nlsList fit?
(d) Update the nlme fit in the previous item according to your previous conclusions for the random-effects model. Use anova to compare the two models. Produce the confidence intervals for the parameters in the updated model. Do you think any further random effects can be dropped from the model? If so, update the fit and compare it to the previous model using anova.
(e) Plot the standardized residuals versus the fitted values for the final model obtained in the previous item. Do you observe any patterns that contradict the model's assumptions? Plot the observed concentrations versus the fitted values by subject. Does the fitted model produce sensible predictions?
3. Data on the intensity of 23 large earthquakes in western North America between 1940 and 1980 were reported by Joyner and Boore (1981). The data, included in the object Earthquake, are described in more detail in Appendix A.8. The objective of the study was to predict the maximum horizontal acceleration (accel) at a given location during a large earthquake based on the magnitude of the quake (Richter) and its distance from the epicenter (distance). These data are analyzed in Davidian and Giltinan (1995, §11.4, pp. 319–326). The model proposed by Joyner and Boore (1981) can be written as

    log10(accel) = φ1 + φ2 Richter − log10 √(distance² + exp(φ3)) − φ4 √(distance² + exp(φ3)).

(a) Plot the data and verify that acceleration measurements are sparse or noisy for most of the quakes. No common attenuation pattern is evident from the plot.
(b) No self-starting function is available in nlme for the Earthquake data model. The estimates reported in Davidian and Giltinan (1995) (φ1 = −0.764, φ2 = 0.218, φ3 = 1.845, and φ4 = 5.657 × 10⁻³) can be used as initial estimates for an nlsList fit (use start = c(phi1 = -0.764, phi2 = 0.218, phi3 = 1.845, phi4 = 0.005657)). However, due to the sparse and noisy nature of the data, convergence is not attained for any of the quakes in the nlsList fit (verify it).
(c) Fit an NLME model to the data with random effects for all coefficients and a diagonal Ψ, using as starting values for the
7
Theory and Computational Methods for Nonlinear Mixed-Effects Models
This chapter presents the theory for the nonlinear mixed-effects model introduced in Chapter 6. A general formulation of NLME models is presented and illustrated with examples. Estimation methods for fitting NLME models, based on approximations to the likelihood function, are described and discussed. The computational methods used in the nlme function to fit NLME models are also described. An extended class of nonlinear regression models with heteroscedastic, correlated errors, but with no random effects, is presented.
The objective of this chapter is to give an overall description of the theoretical and computational aspects of NLME models so as to allow one to evaluate the strengths and limitations of such models in practice. It is not the purpose of this chapter to present a thorough theoretical description of NLME models. Such a comprehensive treatment of the theory of nonlinear mixed-effects models can be found, for example, in Davidian and Giltinan (1995) and in Vonesh and Chinchilli (1997).
Readers who are more interested in the applications of NLME models and the use of the functions and methods in the nlme library to fit such models can, without loss of continuity, skip this chapter and go straight to Chapter 8. If you decide to skip this chapter at a first reading, it is recommended that you return to it (especially §7.1) at a later time to get a good understanding of the NLME model formulation and its assumptions and limitations.
    y_ij = f(φ_ij, v_ij) + ε_ij,    i = 1, …, M,  j = 1, …, n_i,      (7.1)

    φ_ij = A_ij β + B_ij b_i,    b_i ~ N(0, Ψ).      (7.2)
The single-level model can be written in matrix form as

    y_i = f_i(φ_i, v_i) + ε_i,    φ_i = A_i β + B_i b_i,      (7.3)

for i = 1, …, M, where

    y_i = [y_i1 ⋯ y_in_i]ᵀ,  φ_i = [φ_i1 ⋯ φ_in_i]ᵀ,  ε_i = [ε_i1 ⋯ ε_in_i]ᵀ,
    f_i(φ_i, v_i) = [f(φ_i1, v_i1) ⋯ f(φ_in_i, v_in_i)]ᵀ,      (7.4)
    v_i = [v_i1 ⋯ v_in_i]ᵀ,  A_i = [A_i1ᵀ ⋯ A_in_iᵀ]ᵀ,  B_i = [B_i1ᵀ ⋯ B_in_iᵀ]ᵀ.
For the indomethacin model fm4Indom.nlme, the second stage of the model is

    φ_i = [φ1i, φ2i, φ3i, φ4i]ᵀ = A_ij β + B_ij b_i,  with
    A_ij = I (the 4 × 4 identity),  β = [β1, β2, β3, β4]ᵀ,
    B_ij = [1 0 0; 0 1 0; 0 0 1; 0 0 0],  b_i = [b1i, b2i, b3i]ᵀ,

    b_i ~ N(0, Ψ),  Ψ = [ψ11 ψ12 0; ψ12 ψ22 0; 0 0 ψ33],  ε_ij ~ N(0, σ²).

In this case, the individual coefficients φ_ij and the design matrices A_ij = I and B_ij do not vary with time. The Ψ matrix for the random effects is block-diagonal.
For the soybean model fm4Soy.nlme, the model function is the logistic

    y_ij = φ1i / {1 + exp[−(t_ij − φ2i)/φ3i]} + ε_ij,

and the second stage is

    φ_i = [φ1i, φ2i, φ3i]ᵀ = A_ij β + B_ij b_i,

where β = [β1, …, β13]ᵀ collects the fixed effects for the Year and Variety terms, A_ij is the 3 × 13 matrix of indicator variables determined by the plot's Year and Variety (for example, A_ij = [1 0 1 1 0 1 0 0 0 0 0 0 0; 0 0 0 0 0 0 1 0 1 1 0 0 0; 0 0 0 0 0 0 0 0 0 0 1 0 1] for one combination of Year and Variety), B_ij = [1, 0, 0]ᵀ, and b_i = b1i is the scalar Asym random effect, with

    b_i ~ N(0, Ψ),    ε_ij | φ_i ~ N(0, σ² [E(y_ij | φ_i)]^{2δ}).
For the phenobarbital model fm3Pheno.nlme, with doses D_id given at times t_id, the model function is expressed as

    y_ij = Σ_{d: t_id < t_ij} [D_id / exp(lV_i)] exp[−exp(lCl_i − lV_i)(t_ij − t_id)] + ε_ij,
and the second stage is

    [lCl_i, lV_i]ᵀ = A_ij β + B_ij b_i,
    A_ij = [1 w_i 0 0; 0 0 1 w_i],  β = [β1, β2, β3, β4]ᵀ,
    B_ij = [1 0; 0 1],  b_i = [b1i, b2i]ᵀ,

    b_i ~ N(0, [ψ11 0; 0 ψ22]),    ε_ij ~ N(0, σ²),

where w_i denotes the birth weight of the ith infant.
The correspondence between the fixed effects, β, and the coefficient names in the fm3Pheno.nlme object is: β1 = lCl.(Intercept), β2 = lCl.Wt, β3 = lV.(Intercept), and β4 = lV.Wt. A diagonal Ψ matrix is used to represent the independence between the random effects.
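The second-stage relation φ = A_ij β + B_ij b_i above is plain matrix arithmetic. The sketch below (in Python rather than S) evaluates it for a hypothetical birth weight, borrowing the fixed-effects estimates printed for the fm2Pheno.nlme fit in Chapter 6 for illustration (that fit's lV model also included ApgarInd, which is dropped here).

```python
def second_stage(A, B, beta, b):
    """phi = A beta + B b, the second stage of the NLME model."""
    def matvec(M, x):
        return [sum(m * v for m, v in zip(row, x)) for row in M]
    return [p + q for p, q in zip(matvec(A, beta), matvec(B, b))]

w_i = 1.4   # hypothetical birth weight in kg
A = [[1.0, w_i, 0.0, 0.0],    # row for lCl: beta1 + beta2 * Wt
     [0.0, 0.0, 1.0, w_i]]    # row for lV:  beta3 + beta4 * Wt
B = [[1.0, 0.0],
     [0.0, 1.0]]
beta = [-5.9574, 0.6197, -0.4744, 0.5325]   # estimates from fm2Pheno.nlme
phi = second_stage(A, B, beta, [0.0, 0.0])
print([round(p, 4) for p in phi])   # population lCl_i and lV_i
```

At zero random effects this returns the population values lCl_i ≈ −5.09 and lV_i ≈ 0.27 for a 1.4 kg infant; a nonzero b_i shifts each log-parameter away from its population line.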
    y_ijk = f(φ_ijk, v_ijk) + ε_ijk,    i = 1, …, M,  j = 1, …, M_i,  k = 1, …, n_ij,      (7.5)
where M is the number of first-level groups, M_i is the number of second-level groups within the ith first-level group, n_ij is the number of observations on the jth second-level group of the ith first-level group, and ε_ijk is a normally distributed within-group error term. As in the single-level model, f is a general, real-valued, differentiable function of a group-specific parameter vector φ_ijk and a covariate vector v_ijk. It is nonlinear in at least one component of φ_ijk. The second stage of the model expresses φ_ijk as
    φ_ijk = A_ijk β + B_{i,jk} b_i + B_ijk b_ij,    b_i ~ N(0, Ψ1),    b_ij ~ N(0, Ψ2).      (7.6)
310
q1 -dimensional vectors with variancecovariance matrix 1 . The secondlevel random eects bij are q2 -dimensional independently distributed vectors with variancecovariance matrix 2 , assumed to be independent of
the rst-level random eects. The random eects design matrices B i,jk
and B ijk depend on rst- and second-level groups and possibly on the
values of some covariates at the kth observation. The within-group errors
ijk are independently distributed as N (0, 2 ) and are independent of the
random eects. The assumption of independence and homoscedasticity for
the within-group errors can be relaxed, as shown in 7.4.
We can express (7.5) and (7.6) in matrix form as

    y_ij = f_ij(φ_ij, v_ij) + ε_ij,
    φ_ij = A_ij β + B_{i,j} b_i + B_ij b_ij,      (7.7)

for i = 1, …, M, j = 1, …, M_i, where

    y_ij = [y_ij1 ⋯ y_ijn_ij]ᵀ,  φ_ij = [φ_ij1 ⋯ φ_ijn_ij]ᵀ,  ε_ij = [ε_ij1 ⋯ ε_ijn_ij]ᵀ,
    f_ij(φ_ij, v_ij) = [f(φ_ij1, v_ij1) ⋯ f(φ_ijn_ij, v_ijn_ij)]ᵀ,  v_ij = [v_ij1 ⋯ v_ijn_ij]ᵀ,
    A_ij = [A_ij1ᵀ ⋯ A_ijn_ijᵀ]ᵀ,  B_{i,j} = [B_{i,j1}ᵀ ⋯ B_{i,jn_ij}ᵀ]ᵀ,  B_ij = [B_ij1ᵀ ⋯ B_ijn_ijᵀ]ᵀ.
Extensions of the NLME model to more than two levels of nesting are straightforward. For example, with three levels of nesting the second-stage model for the group-specific coefficients is

    φ_ijkl = A_ijkl β + B_{i,jkl} b_i + B_{ij,kl} b_ij + B_ijkl b_ijk,
    b_i ~ N(0, Ψ1),    b_ij ~ N(0, Ψ2),    b_ijk ~ N(0, Ψ3).
Writing the random-effects variance–covariance parameters through the precision factor Δ, the likelihood can be expressed as

    L(β, σ², Δ) = |Δ|^M (2πσ²)^{−(N+Mq)/2} Π_{i=1}^{M} ∫ exp{ −[ ‖y_i − f_i(β, b_i)‖² + ‖Δ b_i‖² ] / (2σ²) } db_i,      (7.9)

where f_i(β, b_i) = f_i[φ_i(β, b_i), v_i].
Because the model function f can be nonlinear in the random effects, the integral in (7.8) generally does not have a closed-form expression. To make the numerical optimization of the likelihood function a tractable problem, different approximations to (7.8) have been proposed. Some of these methods consist of taking a first-order Taylor expansion of the model function f around the expected value of the random effects (Sheiner and Beal, 1980; Vonesh and Carter, 1992), or around the conditional (on β) modes of the random effects (Lindstrom and Bates, 1990). Gaussian quadrature rules have also been used (Davidian and Gallant, 1992).
We describe three different methods for approximating the likelihood function in the NLME model. The first, proposed by Lindstrom and Bates (1990), approximates (7.8) by the likelihood of a linear mixed-effects model. We call this the LME approximation. It is the basis of the estimation algorithm currently implemented in the nlme function. The second method uses a Laplacian approximation to the likelihood function, and
the last method uses an adaptive Gaussian quadrature rule to improve the Laplacian approximation. The LME, Laplacian, and adaptive Gaussian approximations have increasing degrees of accuracy, at the cost of increasing computational complexity. The three approximations to the NLME likelihood are discussed and compared in Pinheiro and Bates (1995).
Lindstrom and Bates Algorithm
The estimation algorithm described by Lindstrom and Bates (1990) alternates between two steps, a penalized nonlinear least squares (PNLS) step and a linear mixed effects (LME) step, as described below. We initially consider the alternating algorithm for the single-level NLME model (7.1).
In the PNLS step, the current estimate of Δ (the precision factor) is held fixed, and the conditional modes of the random effects b_i and the conditional estimates of the fixed effects β are obtained by minimizing the penalized nonlinear least squares objective function

\sum_{i=1}^M \left[ \|y_i - f_i(\beta, b_i)\|^2 + \|\Delta b_i\|^2 \right].   (7.10)
We denote the resulting estimates by \hat\beta^{(w)} and \hat{b}_i^{(w)}, respectively. Letting

\hat{X}_i^{(w)} = \left.\frac{\partial f_i}{\partial \beta^T}\right|_{\hat\beta^{(w)}, \hat{b}_i^{(w)}}, \qquad
\hat{Z}_i^{(w)} = \left.\frac{\partial f_i}{\partial b_i^T}\right|_{\hat\beta^{(w)}, \hat{b}_i^{(w)}},   (7.11)

\hat{w}_i^{(w)} = y_i - f_i(\hat\beta^{(w)}, \hat{b}_i^{(w)}) + \hat{X}_i^{(w)}\hat\beta^{(w)} + \hat{Z}_i^{(w)}\hat{b}_i^{(w)},   (7.12)

the LME step updates the estimate of Δ by maximizing the approximate log-likelihood

\ell_{\mathrm{LME}}(\beta, \sigma^2, \Delta \mid y) = -\frac{N}{2}\log(2\pi\sigma^2)
 - \frac{1}{2}\sum_{i=1}^M \left\{ \log|\Sigma_i(\Delta)|
 + \sigma^{-2}\left[\hat{w}_i^{(w)} - \hat{X}_i^{(w)}\beta\right]^T \Sigma_i^{-1}(\Delta)\left[\hat{w}_i^{(w)} - \hat{X}_i^{(w)}\beta\right] \right\},

where \Sigma_i(\Delta) = I + \hat{Z}_i^{(w)}\Delta^{-1}\Delta^{-T}\hat{Z}_i^{(w)T}. This log-likelihood is identical to that of a linear mixed-effects model in which the response vector is given by \hat{w}_i^{(w)} and the fixed- and random-effects design matrices are given by \hat{X}_i^{(w)} and \hat{Z}_i^{(w)}, respectively. Using the results in §2.2, one can express the optimal values of β and σ² as functions of Δ and work with the profiled log-likelihood of Δ, greatly simplifying the optimization problem.
Lindstrom and Bates (1990) also proposed a restricted maximum likelihood estimation method for Δ, which consists of replacing the log-likelihood in the LME step of the alternating algorithm by the log-restricted-likelihood

\ell^R_{\mathrm{LME}}(\sigma^2, \Delta \mid y) = \ell_{\mathrm{LME}}\!\left(\hat\beta(\Delta), \sigma^2, \Delta \mid y\right)
 - \frac{1}{2}\log\left|\sigma^{-2}\sum_{i=1}^M \hat{X}_i^{(w)T}\Sigma_i^{-1}(\Delta)\hat{X}_i^{(w)}\right|.   (7.13)
Note that, because \hat{X}_i^{(w)} and \hat{w}_i^{(w)} depend on both \hat\beta^{(w)} and \hat{b}_i^{(w)}, changes in either the fixed-effects model or the random-effects model imply changes in the penalty factor for the log-restricted-likelihood (7.13). Therefore, log-restricted-likelihoods from NLME models with different fixed- or random-effects models are not comparable.
The algorithm alternates between the PNLS and LME steps until a convergence criterion is met. Such alternating algorithms tend to be more efficient when the estimates of the variance–covariance components (Δ and σ²) are not highly correlated with the estimates of the fixed effects (β). Pinheiro (1994) has shown that, in the linear mixed-effects model, the maximum likelihood estimates of Δ and σ² are asymptotically independent of the maximum likelihood estimates of β. These results have not yet been extended to the nonlinear mixed-effects model (7.1).
Lindstrom and Bates (1990) only use the LME step to update the estimate of Δ. However, the LME step also produces updated estimates of β and the conditional modes of b_i. Thus, one can iterate LME steps by re-evaluating (7.11) and (7.12) (or (7.13) for the log-restricted-likelihood) at the updated estimates of β and b_i, as described in Wolfinger (1993). Because the updated estimates correspond to the values obtained in the first iteration of a Gauss–Newton algorithm for the PNLS step, iterated LME steps will converge to the same values as the alternating algorithm, though possibly not as quickly.
Wolfinger (1993) also shows that, when a flat prior is assumed for β, the LME approximation to the log-restricted-likelihood (7.13) is equivalent to a Laplacian approximation (Tierney and Kadane, 1986) to the integral (7.9). The alternating algorithm and the LME approximation to the NLME log-likelihood can be extended to multilevel models. For example, for an NLME model with two levels of nesting, the PNLS step consists of minimizing the penalized nonlinear least-squares function

\sum_{i=1}^M \left\{ \sum_{j=1}^{M_i} \left[ \|y_{ij} - f_{ij}(\beta, b_i, b_{ij})\|^2 + \|\Delta_2 b_{ij}\|^2 \right] + \|\Delta_1 b_i\|^2 \right\}   (7.14)

to obtain estimates for the fixed effects β and the conditional (on Δ_1 and Δ_2) modes of the random effects b_i and b_{ij}.
Letting

\hat{X}_{ij}^{(w)} = \left.\frac{\partial f_{ij}}{\partial \beta^T}\right|_{\hat\beta^{(w)},\hat{b}_i^{(w)},\hat{b}_{ij}^{(w)}}, \qquad
\hat{Z}_{i,j}^{(w)} = \left.\frac{\partial f_{ij}}{\partial b_i^T}\right|_{\hat\beta^{(w)},\hat{b}_i^{(w)},\hat{b}_{ij}^{(w)}}, \qquad
\hat{Z}_{ij}^{(w)} = \left.\frac{\partial f_{ij}}{\partial b_{ij}^T}\right|_{\hat\beta^{(w)},\hat{b}_i^{(w)},\hat{b}_{ij}^{(w)}},

\hat{w}_{ij}^{(w)} = y_{ij} - f_{ij}(\hat\beta^{(w)}, \hat{b}_i^{(w)}, \hat{b}_{ij}^{(w)})
 + \hat{X}_{ij}^{(w)}\hat\beta^{(w)} + \hat{Z}_{i,j}^{(w)}\hat{b}_i^{(w)} + \hat{Z}_{ij}^{(w)}\hat{b}_{ij}^{(w)},

\hat{X}_i^{(w)} = \begin{bmatrix} \hat{X}_{i1}^{(w)} \\ \vdots \\ \hat{X}_{iM_i}^{(w)} \end{bmatrix}, \quad
\hat{w}_i^{(w)} = \begin{bmatrix} \hat{w}_{i1}^{(w)} \\ \vdots \\ \hat{w}_{iM_i}^{(w)} \end{bmatrix}, \quad
\hat{Z}_i^{(w)} = \begin{bmatrix}
 \hat{Z}_{i,1}^{(w)} & \hat{Z}_{i1}^{(w)} & & 0 \\
 \vdots & & \ddots & \\
 \hat{Z}_{i,M_i}^{(w)} & 0 & & \hat{Z}_{iM_i}^{(w)}
\end{bmatrix},   (7.15)
the approximate log-likelihood function used to estimate Δ_1 and Δ_2 in the two-level NLME model is

\ell_{\mathrm{LME}}(\beta, \sigma^2, \Delta_1, \Delta_2 \mid y) = -\frac{N}{2}\log(2\pi\sigma^2)
 - \frac{1}{2}\sum_{i=1}^M \left\{ \log|\Sigma_i(\Delta_1,\Delta_2)|
 + \sigma^{-2}\left[\hat{w}_i^{(w)} - \hat{X}_i^{(w)}\beta\right]^T \Sigma_i^{-1}(\Delta_1,\Delta_2)\left[\hat{w}_i^{(w)} - \hat{X}_i^{(w)}\beta\right] \right\},

where

\Sigma_i(\Delta_1, \Delta_2) = I
 + \hat{Z}_{i,\cdot}^{(w)}\Delta_1^{-1}\Delta_1^{-T}\hat{Z}_{i,\cdot}^{(w)T}
 + \bigoplus_{j=1}^{M_i} \hat{Z}_{ij}^{(w)}\Delta_2^{-1}\Delta_2^{-T}\hat{Z}_{ij}^{(w)T},

with \hat{Z}_{i,\cdot}^{(w)} denoting the matrix obtained by stacking the \hat{Z}_{i,j}^{(w)}, and \oplus denoting the direct sum operator. The corresponding log-restricted-likelihood is
\ell^R_{\mathrm{LME}}(\sigma^2, \Delta_1, \Delta_2 \mid y) = \ell_{\mathrm{LME}}\!\left(\hat\beta(\Delta_1,\Delta_2), \sigma^2, \Delta_1, \Delta_2 \mid y\right)
 - \frac{1}{2}\log\left|\sigma^{-2}\sum_{i=1}^M \hat{X}_i^{(w)T}\Sigma_i^{-1}(\Delta_1,\Delta_2)\hat{X}_i^{(w)}\right|.
Laplacian Approximation
For the single-level NLME model (7.1), write

g(\beta, \Delta, y_i, b_i) = \|y_i - f_i(\beta, b_i)\|^2 + \|\Delta b_i\|^2,

so that the likelihood (7.9) involves integrals of \exp\left[-g(\beta,\Delta,y_i,b_i)/(2\sigma^2)\right]. Let

g'(\beta, \Delta, y_i, b_i) = \frac{\partial g(\beta, \Delta, y_i, b_i)}{\partial b_i}, \qquad
g''(\beta, \Delta, y_i, b_i) = \frac{\partial^2 g(\beta, \Delta, y_i, b_i)}{\partial b_i \, \partial b_i^T},   (7.16)

and let \hat{b}_i = \hat{b}_i(\beta, \Delta, y_i) denote the value of b_i minimizing g (the conditional mode).
A second-order Taylor expansion of g around \hat{b}_i,

g(\beta, \Delta, y_i, b_i) \approx g(\beta, \Delta, y_i, \hat{b}_i)
 + \tfrac{1}{2}\left(b_i - \hat{b}_i\right)^T g''(\beta, \Delta, y_i, \hat{b}_i)\left(b_i - \hat{b}_i\right),   (7.17)

gives the Laplacian approximation to the likelihood:

p(y \mid \beta, \sigma^2, \Delta) \approx (2\pi\sigma^2)^{-N/2}\,|\Delta|^M
 \exp\!\left[-\frac{1}{2\sigma^2}\sum_{i=1}^M g(\beta, \Delta, y_i, \hat{b}_i)\right]
 \prod_{i=1}^M \left| g''(\beta, \Delta, y_i, \hat{b}_i)/2 \right|^{-1/2}.
The Hessian

g''(\beta, \Delta, y_i, \hat{b}_i)/2 = \left.\left\{
 \left[\frac{\partial f_i(\beta, b_i)}{\partial b_i^T}\right]^T \frac{\partial f_i(\beta, b_i)}{\partial b_i^T}
 - \sum_{k=1}^{n_i} \left[y_i - f_i(\beta, b_i)\right]_k \frac{\partial^2 f_{ik}(\beta, b_i)}{\partial b_i\,\partial b_i^T}
\right\}\right|_{\hat{b}_i} + \Delta^T\Delta

contains a term in the second derivatives of the model function, whose contribution
is usually negligible compared to that of \left[\partial f_i(\beta,b_i)/\partial b_i^T\right]^T\left[\partial f_i(\beta,b_i)/\partial b_i^T\right] (Bates and Watts, 1980). Therefore, we use the approximation

g''(\beta, \Delta, y_i, \hat{b}_i)/2 \approx G(\beta, \Delta, y_i)
 = \left.\left[\frac{\partial f_i(\beta, b_i)}{\partial b_i^T}\right]^T \frac{\partial f_i(\beta, b_i)}{\partial b_i^T}\right|_{\hat{b}_i} + \Delta^T\Delta.   (7.18)
The modified Laplacian approximation to the log-likelihood is then

\ell_{\mathrm{LA}}(\beta, \sigma^2, \Delta \mid y) = -\frac{N}{2}\log(2\pi\sigma^2) + M\log|\Delta|
 - \frac{1}{2}\sum_{i=1}^M \left[ \log|G(\beta, \Delta, y_i)| + \sigma^{-2}\,g(\beta, \Delta, y_i, \hat{b}_i) \right].   (7.19)
Because \hat{b}_i does not depend on σ², for given β and Δ the maximum likelihood estimate of σ² (based upon \ell_{\mathrm{LA}}) is

\hat\sigma^2(\beta, \Delta, y) = \sum_{i=1}^M g(\beta, \Delta, y_i, \hat{b}_i)/N.

Substituting \hat\sigma^2 into (7.19) gives a profiled log-likelihood \ell_{\mathrm{LAp}}(\beta, \Delta \mid y), a function of β and Δ only.
If the model function f is linear in the random effects, then the modified Laplacian approximation is exact, because the second-order Taylor expansion in (7.17) is exact when f_i(\beta, b_i) = f_i(\beta) + Z_i(\beta)\,b_i.
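This claim can be checked directly; the short derivation below, in the notation used above, is our verification rather than text from the book:

```latex
\text{With } f_i(\beta, b_i) = f_i(\beta) + Z_i(\beta) b_i, \text{ the function } g \text{ is quadratic in } b_i:
\qquad
g(\beta, \Delta, y_i, b_i) = \|y_i - f_i(\beta) - Z_i(\beta) b_i\|^2 + \|\Delta b_i\|^2,
\qquad
g''/2 = Z_i^T(\beta) Z_i(\beta) + \Delta^T \Delta = G(\beta, \Delta, y_i).

\text{Hence the expansion (7.17) has no remainder and the integral is Gaussian:}
\qquad
\int \exp\!\left[-\frac{g(\beta, \Delta, y_i, b_i)}{2\sigma^2}\right] db_i
 = (2\pi\sigma^2)^{q/2}\,|G(\beta, \Delta, y_i)|^{-1/2}
   \exp\!\left[-\frac{g(\beta, \Delta, y_i, \hat b_i)}{2\sigma^2}\right],

\text{which, substituted into the likelihood, reproduces (7.19) exactly.}
```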
There does not yet seem to be a straightforward generalization of the concept of restricted maximum likelihood to NLME models. The difficulty is that restricted maximum likelihood depends heavily upon the linearity of the fixed effects in the model function, which does not occur in nonlinear models. Lindstrom and Bates (1990) circumvented that problem by using the restricted log-likelihood (7.13) of the approximating linear mixed-effects model.
For multilevel models, let

b_i^{\mathrm{aug}} = \begin{bmatrix} b_i \\ b_{i1} \\ \vdots \\ b_{iM_i} \end{bmatrix},
\qquad i = 1, \ldots, M,

denote the augmented random-effects vector for the ith first-level group, containing the first-level random effects b_i and all the second-level random effects b_{ij} pertaining to first-level group i. The two-level NLME likelihood can then be expressed as

p(y \mid \beta, \sigma^2, \Delta_1, \Delta_2) = \prod_{i=1}^M (2\pi\sigma^2)^{-(n_i + q_1 + M_i q_2)/2}\,|\Delta_1|\,|\Delta_2|^{M_i}
 \int \exp\!\left[-\frac{g(\beta, \Delta_1, \Delta_2, y_i, b_i^{\mathrm{aug}})}{2\sigma^2}\right] db_i^{\mathrm{aug}},

where g(\beta, \Delta_1, \Delta_2, y_i, b_i^{\mathrm{aug}}) = \|y_i - f_i(\beta, b_i^{\mathrm{aug}})\|^2 + \|\Delta_1 b_i\|^2 + \sum_{j=1}^{M_i}\|\Delta_2 b_{ij}\|^2. Letting

g'(\beta, \Delta_1, \Delta_2, y_i, b_i^{\mathrm{aug}}) = \frac{\partial g(\beta, \Delta_1, \Delta_2, y_i, b_i^{\mathrm{aug}})}{\partial b_i^{\mathrm{aug}}}, \qquad
g''(\beta, \Delta_1, \Delta_2, y_i, b_i^{\mathrm{aug}}) = \frac{\partial^2 g(\beta, \Delta_1, \Delta_2, y_i, b_i^{\mathrm{aug}})}{\partial b_i^{\mathrm{aug}}\,\partial (b_i^{\mathrm{aug}})^T},

and denoting by \hat{b}_i^{\mathrm{aug}} the conditional modes minimizing g, the second-order Taylor expansion analogous to (7.17) is

g(\beta, \Delta_1, \Delta_2, y_i, b_i^{\mathrm{aug}}) \approx g(\beta, \Delta_1, \Delta_2, y_i, \hat{b}_i^{\mathrm{aug}})
 + \tfrac{1}{2}\left(b_i^{\mathrm{aug}} - \hat{b}_i^{\mathrm{aug}}\right)^T g''(\beta, \Delta_1, \Delta_2, y_i, \hat{b}_i^{\mathrm{aug}})\left(b_i^{\mathrm{aug}} - \hat{b}_i^{\mathrm{aug}}\right).
We note that \partial^2 g(\beta, \Delta_1, \Delta_2, y_i, b_i^{\mathrm{aug}})/\partial b_{ij}\,\partial b_{ik}^T = 0 for any j \neq k, and use the same reasoning as in (7.18) to approximate g''(\beta, \Delta_1, \Delta_2, y_i, \hat{b}_i^{\mathrm{aug}})/2 by
G(\beta, \Delta_1, \Delta_2, y_i) = \begin{bmatrix} G_1 & G_2 \\ G_2^T & G_3 \end{bmatrix}, \quad \text{where}

G_1 = \left.\left[\frac{\partial f_i(\beta, b_i^{\mathrm{aug}})}{\partial b_i^T}\right]^T \frac{\partial f_i(\beta, b_i^{\mathrm{aug}})}{\partial b_i^T}\right|_{\hat{b}_i^{\mathrm{aug}}} + \Delta_1^T\Delta_1,

G_2 = \left.\left[\frac{\partial f_i(\beta, b_i^{\mathrm{aug}})}{\partial b_i^T}\right]^T
 \left[\bigoplus_{j=1}^{M_i} \frac{\partial f_{ij}(\beta, b_i, b_{ij})}{\partial b_{ij}^T}\right]\right|_{\hat{b}_i^{\mathrm{aug}}},

G_3 = \bigoplus_{j=1}^{M_i} \left\{ \left.\left[\frac{\partial f_{ij}(\beta, b_i, b_{ij})}{\partial b_{ij}^T}\right]^T \frac{\partial f_{ij}(\beta, b_i, b_{ij})}{\partial b_{ij}^T}\right|_{\hat{b}_i, \hat{b}_{ij}} + \Delta_2^T\Delta_2 \right\},

f_i(\beta, b_i^{\mathrm{aug}}) = \begin{bmatrix} f_{i1}(\beta, b_i, b_{i1}) \\ \vdots \\ f_{iM_i}(\beta, b_i, b_{iM_i}) \end{bmatrix}.
The modified, profiled Laplacian approximation to the log-likelihood of the two-level NLME model is then given by

\ell_{\mathrm{LAp}}(\beta, \Delta_1, \Delta_2) = -\frac{N}{2}\left[1 + \log(2\pi) + \log\hat\sigma^2\right] + M\log|\Delta_1|
 + \sum_{i=1}^M M_i \log|\Delta_2| - \frac{1}{2}\sum_{i=1}^M \log|G(\beta, \Delta_1, \Delta_2, y_i)|,

where \hat\sigma^2 = \sum_{i=1}^M g(\beta, \Delta_1, \Delta_2, y_i, \hat{b}_i^{\mathrm{aug}})/N. This formulation can be extended to multilevel NLME models with an arbitrary number of levels.
The Laplacian approximation generally gives more accurate estimates than the alternating algorithm, as it uses an expansion around the estimated random effects only, while the LME approximation in the alternating algorithm uses an expansion around the estimated fixed and random effects. Because it requires solving a different penalized nonlinear least-squares problem for each group in the data, and because its objective function cannot be profiled on the fixed effects, the Laplacian approximation is more computationally intensive than the alternating algorithm. The algorithm for calculating the Laplacian approximation can be easily parallelized, because the individual PNLS problems are optimized independently.
Adaptive Gaussian Approximation
Gaussian quadrature rules are used to approximate integrals of functions with respect to a given kernel by a weighted average of the integrand evaluated at predetermined abscissas. The weights and abscissas used in Gaussian quadrature rules for the most common kernels can be obtained from the tables of Abramowitz and Stegun (1964) or by using an algorithm proposed by Golub (1973) (see also Golub and Welsch (1969)).
In the adaptive Gaussian quadrature rule, the grid of abscissas is centered at the conditional modes \hat{b}_i and scaled by G(\beta, \Delta, y_i). Writing b_i = \hat{b}_i + \sigma\,[G(\beta, \Delta, y_i)]^{-1/2} z,

\int \exp\!\left[-g(\beta, \Delta, y_i, b_i)/(2\sigma^2)\right] db_i
 = \sigma^q\,|G(\beta, \Delta, y_i)|^{-1/2}
 \int \exp\!\left\{-g\!\left[\beta, \Delta, y_i, \hat{b}_i + \sigma G^{-1/2}(\beta, \Delta, y_i)\,z\right]/(2\sigma^2) + \|z\|^2/2\right\}
 \exp\!\left(-\|z\|^2/2\right) dz

\approx \sigma^q\,|G(\beta, \Delta, y_i)|^{-1/2}
 \sum_{j_1=1}^{N_{GQ}} \cdots \sum_{j_q=1}^{N_{GQ}}
 \exp\!\left\{-g\!\left[\beta, \Delta, y_i, \hat{b}_i + \sigma G^{-1/2}(\beta, \Delta, y_i)\,z_j\right]/(2\sigma^2) + \|z_j\|^2/2\right\}
 \prod_{k=1}^q w_{j_k},

where [G(\beta, \Delta, y_i)]^{-1/2} denotes the inverse of a square root of G(\beta, \Delta, y_i), z_j = (z_{j_1}, \ldots, z_{j_q})^T, and the z_{j_k} and w_{j_k} are, respectively, the abscissas and weights of the one-dimensional Gaussian quadrature rule with N_{GQ} points based on the N(0,1) kernel.
The adaptive Gaussian approximation to the log-likelihood function in the single-level NLME model is then

\ell_{\mathrm{AGQ}}(\beta, \sigma^2, \Delta \mid y) = -\frac{N}{2}\log(2\pi\sigma^2) + M\log|\Delta|
 - \frac{1}{2}\sum_{i=1}^M \log|G(\beta, \Delta, y_i)|
 + \sum_{i=1}^M \log\left[\sum_j \exp\!\left\{-g\!\left[\beta, \Delta, y_i, \hat{b}_i + \sigma G^{-1/2}(\beta, \Delta, y_i)\,z_j\right]/(2\sigma^2) + \|z_j\|^2/2\right\} \prod_{k=1}^q w_{j_k}\right].
The one-point (i.e., N_{GQ} = 1) adaptive Gaussian quadrature approximation is simply the modified Laplacian approximation (7.19), because in this case z_1 = 0 and w_1 = 1. The adaptive Gaussian quadrature also gives the exact log-likelihood when the model function f is linear in the random effects b.
The adaptive Gaussian approximation can be made arbitrarily accurate by increasing the number of abscissas, N_{GQ}. However, because N_{GQ}^q grid points are used to calculate the adaptive Gaussian quadrature for each group, it quickly becomes prohibitively computationally intensive as the number of abscissas increases. In practice N_{GQ} \leq 7 generally suffices, and N_{GQ} = 1 often provides a reasonable approximation (Pinheiro and Bates, 1995).
The adaptive Gaussian approximation can be generalized to multilevel NLME models, using the same steps as in the multilevel Laplacian approximation. For example, the adaptive Gaussian approximation to the log-likelihood of a two-level NLME model is

\ell_{\mathrm{AGQ}}(\beta, \sigma^2, \Delta_1, \Delta_2 \mid y) = -\frac{N}{2}\log(2\pi\sigma^2) + M\log|\Delta_1|
 + \sum_{i=1}^M M_i \log|\Delta_2|
 - \frac{1}{2}\sum_{i=1}^M \log|G(\beta, \Delta_1, \Delta_2, y_i)|
 + \sum_{i=1}^M \log\left[\sum_j \exp\!\left\{-g\!\left[\beta, \Delta_1, \Delta_2, y_i, \hat{b}_i^{\mathrm{aug}} + \sigma G^{-1/2}(\beta, \Delta_1, \Delta_2, y_i)\,z_j\right]/(2\sigma^2) + \|z_j\|^2/2\right\} \prod_{k=1}^{q_1 + M_i q_2} w_{j_k}\right],

where z_j = (z_{j_1}, \ldots, z_{j_{q_1 + M_i q_2}})^T. In this case, the number of grid points for the ith first-level group is N_{GQ}^{q_1 + M_i q_2}, so that the computational complexity of the calculations increases exponentially with the number of second-level groups.
Inference
Inference on the parameters of an NLME model estimated via the alternating algorithm is based on the LME approximation to the log-likelihood function, defined in §7.2.1. Using this approximation at the estimated values of the parameters, together with the asymptotic results for LME models described in §2.3, we obtain standard errors, confidence intervals, and hypothesis tests for the parameters in the NLME model. We use the single-level NLME model of §7.1.1 to illustrate the inference results derived from the LME approximation to the log-likelihood. Extensions to multilevel NLME models are straightforward.
Under the LME approximation, the distribution of the (restricted) maximum likelihood estimators \hat\beta of the fixed effects is

\hat\beta \sim N\!\left(\beta,\; \sigma^2 \left[\sum_{i=1}^M \hat{X}_i^T \Sigma_i^{-1} \hat{X}_i\right]^{-1}\right),   (7.20)

where \Sigma_i = I + \hat{Z}_i \Delta^{-1}\Delta^{-T} \hat{Z}_i^T, with \hat{X}_i and \hat{Z}_i defined as in (7.11).
The standard errors included in the summary method for nlme objects are obtained from the approximate variance–covariance matrix in (7.20). The t and F tests reported in the summary method and in the anova method with a single argument are also based on (7.20). The degrees-of-freedom for t and F tests are calculated as described in §2.4.2.
Now let θ denote an unconstrained set of parameters that determine the precision factor Δ. The LME approximation is also used to provide an approximate distribution for the (RE)ML estimators (\hat\theta, \log\hat\sigma)^T. We use \log\sigma in place of σ to give an unrestricted parameterization, for which the normal approximation tends to be more accurate:

\begin{bmatrix} \hat\theta \\ \log\hat\sigma \end{bmatrix} \sim
 N\!\left(\begin{bmatrix} \theta \\ \log\sigma \end{bmatrix},\; \mathcal{I}^{-1}(\theta, \sigma)\right), \qquad
\mathcal{I}(\theta, \sigma) = -\begin{bmatrix}
 \partial^2 \ell_{\mathrm{LMEp}}/\partial\theta\,\partial\theta^T & \partial^2 \ell_{\mathrm{LMEp}}/\partial\theta\,\partial\log\sigma \\
 \partial^2 \ell_{\mathrm{LMEp}}/\partial\log\sigma\,\partial\theta^T & \partial^2 \ell_{\mathrm{LMEp}}/\partial\log\sigma^2
\end{bmatrix},   (7.21)
where \ell_{\mathrm{LMEp}} = \ell_{\mathrm{LMEp}}(\theta, \sigma) denotes the LME approximation to the log-likelihood, profiled on the fixed effects, and \mathcal{I} denotes the empirical information matrix. The same approximate distribution is valid for the REML estimators, with \ell_{\mathrm{LMEp}} replaced by the log-restricted-likelihood \ell^R_{\mathrm{LME}} defined in (7.13).
In practice, θ and σ² are replaced by their respective (RE)ML estimates in the expressions for the approximate variance–covariance matrices in (7.20) and (7.21). The approximate distributions for the (RE)ML estimators are used to produce the confidence intervals reported in the intervals method for nlme objects.
The LME approximate log-likelihood is also used to compare nested NLME models through likelihood ratio tests, as described in §2.4.1. In the case of REML estimation, only models with identical fixed- and random-effects structures can be compared, because the \hat{X}_i matrices depend on both \hat\beta and the \hat{b}_i. The recommendations stated in §2.4.1 on the use of likelihood ratio tests for comparing LME models remain valid for likelihood ratio tests (based on the LME approximate log-likelihood) for comparing NLME models. Hypotheses on the fixed effects should be tested using t and F tests, because likelihood ratio tests tend to be anticonservative. Likelihood ratio tests for variance–covariance parameters tend to be somewhat conservative, but are generally used to compare NLME models with nested random-effects structures. Information criterion statistics, for example, AIC and BIC, based on the LME approximate log-likelihood are also used to compare NLME models.
The inference results for NLME models based on the LME approximation to the log-likelihood are approximate and asymptotic, making them less reliable than the asymptotic inference results for LME models described in §2.3.
Predictions
As with LME models, fitted values and predictions for NLME models may be obtained at different levels of nesting, or at the population level. Population-level predictions estimate the expected response when the random effects are equal to their mean value, 0. For example, letting x_h represent a vector of fixed-effects covariates and v_h a vector of other model covariates, the population prediction for the corresponding response y_h estimates f(x_h^T\beta, v_h).
Predicted values at the kth level of nesting estimate the conditional expectation of the response given the random effects at levels 1 through k, with the random effects at higher levels of nesting set to zero. For example, letting z_h(i) denote a vector of covariates corresponding to random effects associated with the ith group at the first level of nesting, the level-1 predictions estimate f(x_h^T\beta + z_h(i)^T b_i, v_h). Similarly, letting z_h(i,j) denote a covariate vector associated with the jth level-2 group within the ith level-1 group, the level-2 predicted values estimate f(x_h^T\beta + z_h(i)^T b_i + z_h(i,j)^T b_{ij}, v_h). This extends naturally to an arbitrary level of nesting.
The (RE)ML estimates of the fixed effects and the conditional modes of the random effects, which are the estimated best linear unbiased predictors (BLUPs) of the random effects in the LME approximate log-likelihood, are used to obtain predicted values for the response. For example, the population, level-1, and level-2 predictions for y_h are

\hat{y}_h = f(x_h^T\hat\beta, v_h),
\qquad \hat{y}_h(i) = f(x_h^T\hat\beta + z_h(i)^T\hat{b}_i, v_h),
\qquad \hat{y}_h(i,j) = f(x_h^T\hat\beta + z_h(i)^T\hat{b}_i + z_h(i,j)^T\hat{b}_{ij}, v_h).
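A toy illustration of these three prediction levels (code and all numbers are ours, not the book's): take a logistic model with a random effect on the asymptote only, so the asymptote for a given group is the fixed effect plus the estimated random effects, and plug the estimates into f:

```python
import math

def logistic(asym, xmid, scal, x):
    return asym / (1.0 + math.exp(-(x - xmid) / scal))

# hypothetical estimates: fixed effects and conditional modes of the random effects
beta = {"Asym": 200.0, "xmid": 700.0, "scal": 350.0}
b_i = {1: 30.0, 2: -15.0}                 # level-1 effects on the asymptote
b_ij = {(1, 1): 5.0, (1, 2): -3.0}        # level-2 effects on the asymptote

def predict(x, i=None, j=None):
    """Population (i=None), level-1 (i given), or level-2 (i and j given) prediction."""
    asym = beta["Asym"]
    if i is not None:
        asym += b_i[i]                    # add the level-1 random effect
        if j is not None:
            asym += b_ij[(i, j)]          # add the level-2 random effect
    return logistic(asym, beta["xmid"], beta["scal"], x)

x = 1000.0
print(predict(x), predict(x, i=1), predict(x, i=1, j=1))
```

Only the parameter that carries random effects changes between the levels; the model function f itself is evaluated the same way each time.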
We now consider computational methods for the PNLS step. For fixed Δ, this step minimizes, over β and the b_i, the objective function

\sum_{i=1}^M \left[ \|y_i - f_i(\beta, b_i)\|^2 + \|\Delta b_i\|^2 \right].   (7.22)

The penalized problem can be converted into an ordinary nonlinear least-squares problem by augmenting the data. Letting

\tilde{y}_i = \begin{bmatrix} y_i \\ 0 \end{bmatrix}, \qquad
\tilde{f}_i(\beta, b_i) = \begin{bmatrix} f_i(\beta, b_i) \\ \Delta b_i \end{bmatrix},

the objective function (7.22) can be rewritten as

\sum_{i=1}^M \|\tilde{y}_i - \tilde{f}_i(\beta, b_i)\|^2.   (7.23)
The Gauss–Newton algorithm for a nonlinear least-squares problem with parameter vector β and model function f is based on the linear approximation

f(\beta) \approx f(\beta^{(w)}) + \left.\frac{\partial f}{\partial \beta^T}\right|_{\beta^{(w)}} \left(\beta - \beta^{(w)}\right).

The parameter increment \delta^{(w+1)} = \beta^{(w+1)} - \beta^{(w)} for the wth iteration is calculated as the solution of the least-squares problem

\min_{\delta}\; \left\| y - f(\beta^{(w)}) - \left.\frac{\partial f}{\partial \beta^T}\right|_{\beta^{(w)}} \delta \right\|^2.
Step-halving is used at each Gauss–Newton iteration to ensure that the updated parameter estimates result in a decrease of the objective function. That is, the new estimate is initially set to \beta^{(w+1)} = \beta^{(w)} + \delta^{(w+1)}, and the value of the objective function at \beta^{(w+1)} is calculated. If it is less than the value at \beta^{(w)}, the new value is retained and the algorithm proceeds to the next iteration, or declares convergence. Otherwise, the new estimate is set to \beta^{(w+1)} = \beta^{(w)} + \delta^{(w+1)}/2 and the procedure is repeated, with the increment being halved until a decrease in the objective function is observed or some predetermined minimum step size is reached.
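The iteration just described is easy to state in code. The sketch below is illustrative only; the one-parameter exponential model and all numbers are invented, not taken from the book. It runs a scalar Gauss–Newton iteration with step-halving on noiseless data generated from y = exp(−0.7 x):

```python
import math

xs = [0.5 * k for k in range(1, 9)]            # design points
beta_true = 0.7
ys = [math.exp(-beta_true * x) for x in xs]    # noiseless responses

def sse(beta):
    """Sum-of-squares objective for the model f(beta) = exp(-beta*x)."""
    return sum((y - math.exp(-beta * x)) ** 2 for x, y in zip(xs, ys))

def gauss_newton(beta, tol=1e-12, min_step=1e-10):
    while True:
        # residuals and Jacobian of the model function
        r = [y - math.exp(-beta * x) for x, y in zip(xs, ys)]
        J = [-x * math.exp(-beta * x) for x in xs]
        delta = sum(j * ri for j, ri in zip(J, r)) / sum(j * j for j in J)
        if abs(delta) < tol:                   # increment negligible: converged
            return beta
        step, old = delta, sse(beta)
        # step-halving: shrink the increment until the objective decreases
        while sse(beta + step) >= old and abs(step) > min_step:
            step /= 2.0
        beta += step

print(gauss_newton(0.1))   # converges to 0.7
```

Without the halving loop, a full Gauss–Newton step can overshoot and increase the objective when the starting value is poor; the halving guarantees monotone descent.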
The Gauss–Newton algorithm is used to estimate β and the b_i in the PNLS step of the alternating algorithm. Because of the loosely coupled structure of the PNLS problem (Soo and Bates, 1992), efficient nonlinear least-squares algorithms can be employed.
The derivative matrices for the Gauss–Newton optimization of (7.23) are, for i = 1, \ldots, M,

\tilde{X}_i^{(w)} = \left.\frac{\partial \tilde{f}_i(\beta, b_i)}{\partial \beta^T}\right|_{\beta^{(w)}, b_i^{(w)}} = \begin{bmatrix} \hat{X}_i^{(w)} \\ 0 \end{bmatrix}, \qquad
\tilde{Z}_i^{(w)} = \left.\frac{\partial \tilde{f}_i(\beta, b_i)}{\partial b_i^T}\right|_{\beta^{(w)}, b_i^{(w)}} = \begin{bmatrix} \hat{Z}_i^{(w)} \\ \Delta \end{bmatrix},

with \hat{X}_i^{(w)} and \hat{Z}_i^{(w)} defined as in (7.11). The least-squares problem to be solved at each Gauss–Newton iteration is

\min_{\beta, b_i}\; \sum_{i=1}^M \left\| \tilde{y}_i - \tilde{f}_i(\beta^{(w)}, b_i^{(w)}) - \tilde{X}_i^{(w)}\left(\beta - \beta^{(w)}\right) - \tilde{Z}_i^{(w)}\left(b_i - b_i^{(w)}\right) \right\|^2

or, equivalently,

\min_{\beta, b_i}\; \sum_{i=1}^M \left\| \tilde{w}_i^{(w)} - \tilde{X}_i^{(w)}\beta - \tilde{Z}_i^{(w)} b_i \right\|^2,
\qquad \tilde{w}_i^{(w)} = \begin{bmatrix} \hat{w}_i^{(w)} \\ 0 \end{bmatrix},   (7.24)

with \hat{w}_i^{(w)} defined as in (7.12). An orthogonal-triangular decomposition is formed for each group,

\begin{bmatrix} \tilde{Z}_i^{(w)} & \tilde{X}_i^{(w)} & \tilde{w}_i^{(w)} \end{bmatrix}
 = Q_{1(i)} \begin{bmatrix} R_{11(i)} & R_{10(i)} & c_{1(i)} \\ 0 & R_{00(i)} & c_{0(i)} \end{bmatrix},   (7.25)

where the reduction to triangular form is halted after the first q columns. The numbering scheme used for the components in (7.25) is the same introduced for the LME model in §2.2.3. Because Δ is assumed to be of full rank, so is the upper-triangular matrix R_{11(i)} in (7.25).
Forming another orthogonal-triangular decomposition

\begin{bmatrix} R_{00(1)} & c_{0(1)} \\ \vdots & \vdots \\ R_{00(M)} & c_{0(M)} \end{bmatrix}
 = Q_0 \begin{bmatrix} R_{00} & c_0 \\ 0 & c_{-1} \end{bmatrix}   (7.26)

and noticing that the Q_{1(i)} and Q_0 are orthogonal matrices, we can rewrite (7.24) as

\sum_{i=1}^M \left\| c_{1(i)} - R_{11(i)} b_i - R_{10(i)}\beta \right\|^2 + \left\| c_0 - R_{00}\beta \right\|^2 + \left\| c_{-1} \right\|^2.   (7.27)

The least-squares estimates are therefore

\hat\beta = R_{00}^{-1} c_0, \qquad
\hat{b}_i = R_{11(i)}^{-1}\left(c_{1(i)} - R_{10(i)}\hat\beta\right), \quad i = 1, \ldots, M.   (7.28)
In the two-level case, the PNLS step minimizes

\sum_{i=1}^M \left\{ \sum_{j=1}^{M_i} \left[ \|y_{ij} - f_{ij}(\beta, b_i, b_{ij})\|^2 + \|\Delta_2 b_{ij}\|^2 \right] + \|\Delta_1 b_i\|^2 \right\}.   (7.29)

At each Gauss–Newton iteration, this is approximated by the linear least-squares problem

\sum_{i=1}^M \left\{ \sum_{j=1}^{M_i} \left[ \|\hat{w}_{ij}^{(w)} - \hat{X}_{ij}^{(w)}\beta - \hat{Z}_{i,j}^{(w)} b_i - \hat{Z}_{ij}^{(w)} b_{ij}\|^2 + \|\Delta_2 b_{ij}\|^2 \right] + \|\Delta_1 b_i\|^2 \right\},   (7.30)

with \hat{w}_{ij}^{(w)}, \hat{X}_{ij}^{(w)}, \hat{Z}_{i,j}^{(w)}, and \hat{Z}_{ij}^{(w)} defined as in (7.15). To solve it efficiently, we first consider the orthogonal-triangular decomposition
we rst consider the orthogonal-triangular decomposition
8
7
(w)
(w) X
&(w) w
(w) Z
R
R
c
R
22(ij)
21(ij)
20(ij)
2(ij)
Z
ij
ij
i,j
ij
,
= Q2(ij)
0
R11(i) R10(i) c1(i)
2
0
0
0
where the reduction to triangular form is halted after the rst q2 columns.
Because 2 is assumed of full rank, so is R22(ij) . We then form a second
orthogonal-triangular decomposition for each rst-level group
R10(i1)
c1(i1)
R11(i1)
..
..
..
R11(i) R10(i) c1(i)
.
.
.
,
= Q1(i)
0
R00(i) c0(i)
R
R
c
11(iMi )
10(iMi )
1(iMi )
where the reduction to triangular form is stopped after the rst q1 columns.
The 1 matrix is assumed of full rank and, as a result, so is R11(i) . A nal
orthogonal-decomposition, identical to (7.26), is then formed.
Because the matrices Q_{2(ij)}, Q_{1(i)}, and Q_0 are orthogonal, (7.30) can be re-expressed as

\sum_{i=1}^M \left\{ \sum_{j=1}^{M_i} \left\| c_{2(ij)} - R_{20(ij)}\beta - R_{21(ij)} b_i - R_{22(ij)} b_{ij} \right\|^2
 + \left\| c_{1(i)} - R_{10(i)}\beta - R_{11(i)} b_i \right\|^2 \right\}
 + \left\| c_0 - R_{00}\beta \right\|^2 + \left\| c_{-1} \right\|^2,
giving the least-squares estimates

\hat\beta = R_{00}^{-1} c_0,
\qquad \hat{b}_i = R_{11(i)}^{-1}\left(c_{1(i)} - R_{10(i)}\hat\beta\right), \quad i = 1, \ldots, M,
\qquad \hat{b}_{ij} = R_{22(ij)}^{-1}\left(c_{2(ij)} - R_{21(ij)}\hat{b}_i - R_{20(ij)}\hat\beta\right), \quad i = 1, \ldots, M, \; j = 1, \ldots, M_i.   (7.31)
The Gauss–Newton increments are then obtained as the difference between the least-squares estimates (7.31) and the current estimates \beta^{(w)}, b_i^{(w)}, and b_{ij}^{(w)}, with step-halving used to ensure that the objective function (7.29) decreases. This extends naturally to multilevel NLME models with an arbitrary number of levels.
The efficiency of the Gauss–Newton algorithm described in this section derives from the fact that, at each iteration, the orthogonal-triangular decompositions are performed separately for each group and then once for the fixed effects. This allows efficient memory allocation for storing intermediate results and reduces the numerical complexity of the decompositions. Also, the matrix inversions required to calculate the Gauss–Newton increments involve only upper-triangular matrices of small dimension, which are easy to invert. (In fact, although (7.28) and (7.31) are written in terms of matrix inverses, such as R_{00}^{-1}, the actual calculation performed is the solution of the triangular system of equations R_{00}\hat\beta = c_0, which is even simpler.)
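The augmentation trick that underlies these decompositions, appending the rows [Δ 0] to the design matrix and zeros to the response so that the penalty becomes ordinary least-squares rows, can be checked numerically. The small Python sketch below is our illustration, with arbitrary made-up data; it solves a penalized problem both via augmented ordinary least-squares normal equations and via the penalized normal equations, and confirms they agree:

```python
# 2-dimensional random effect b, arbitrary small data set
Z = [[1.0, 0.5], [0.8, -0.3], [1.2, 0.9], [0.4, 1.1]]
w = [2.0, 1.1, 2.9, 1.7]
Delta = [[2.0, 0.0], [0.5, 1.5]]          # precision factor (full rank)

def solve2(A, rhs):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(rhs[0] * A[1][1] - A[0][1] * rhs[1]) / det,
            (A[0][0] * rhs[1] - rhs[0] * A[1][0]) / det]

def normal_eqs(rows, resp):
    """Return A^T A and A^T y for a design matrix stored as a list of rows."""
    AtA = [[sum(r[a] * r[b] for r in rows) for b in range(2)] for a in range(2)]
    Aty = [sum(r[a] * y for r, y in zip(rows, resp)) for a in range(2)]
    return AtA, Aty

# (i) augmented ordinary least squares: stack [Z; Delta] against [w; 0]
b_aug = solve2(*normal_eqs(Z + Delta, w + [0.0, 0.0]))

# (ii) penalized normal equations (Z^T Z + Delta^T Delta) b = Z^T w
ZtZ, Ztw = normal_eqs(Z, w)
DtD, _ = normal_eqs(Delta, [0.0, 0.0])
A = [[ZtZ[a][b] + DtD[a][b] for b in range(2)] for a in range(2)]
b_pen = solve2(A, Ztw)

print(b_aug, b_pen)
```

In practice one would solve the augmented system by an orthogonal-triangular (QR) decomposition rather than normal equations, exactly as the text describes, but the algebraic equivalence is the same.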
In the extended single-level NLME model, the within-group errors are allowed to be correlated and/or to have unequal variances:

y_i = f_i(\phi_i, v_i) + \epsilon_i,
\qquad \phi_i = A_i\beta + B_i b_i,
\qquad b_i \sim N(0, \Psi), \quad \epsilon_i \sim N(0, \sigma^2\Lambda_i),   (7.32)

where the \Lambda_i are positive-definite matrices parameterized by a fixed, generally small, set of parameters λ.
Because \Lambda_i is positive-definite, it admits an invertible square-root factor \Lambda_i^{1/2} such that \Lambda_i = \Lambda_i^{T/2}\Lambda_i^{1/2} and \Lambda_i^{-1} = \Lambda_i^{-1/2}\Lambda_i^{-T/2}. Letting

y_i^* = \Lambda_i^{-T/2} y_i, \qquad
f_i^*(\phi_i, v_i) = \Lambda_i^{-T/2} f_i(\phi_i, v_i), \qquad
\epsilon_i^* = \Lambda_i^{-T/2}\epsilon_i,   (7.33)

and noting that \epsilon_i^* \sim N\!\left(\Lambda_i^{-T/2}\,0,\; \sigma^2\Lambda_i^{-T/2}\Lambda_i\Lambda_i^{-1/2}\right) = N(0, \sigma^2 I), we can rewrite (7.32) as

y_i^* = f_i^*(\phi_i, v_i) + \epsilon_i^*,
\qquad \phi_i = A_i\beta + B_i b_i,
\qquad b_i \sim N(0, \Psi), \quad \epsilon_i^* \sim N(0, \sigma^2 I).

That is, y_i^* is described by a basic NLME model.
Because the differential of the linear transformation y_i^* = \Lambda_i^{-T/2} y_i is simply dy_i^* = |\Lambda_i|^{-1/2} dy_i, the log-likelihood function corresponding to the extended NLME model (7.32) is expressed as

\ell(\beta, \sigma^2, \Psi, \lambda \mid y) = \sum_{i=1}^M \log p\!\left(y_i \mid \beta, \sigma^2, \Psi, \lambda\right)
 = \sum_{i=1}^M \log p^*\!\left(y_i^* \mid \beta, \sigma^2, \Psi\right) - \frac{1}{2}\sum_{i=1}^M \log|\Lambda_i|
 = \ell^*(\beta, \sigma^2, \Psi, \lambda \mid y^*) - \frac{1}{2}\sum_{i=1}^M \log|\Lambda_i|.

The log-likelihood function \ell^*(\beta, \sigma^2, \Psi, \lambda \mid y^*) corresponds to a basic NLME model with model functions f_i^* and, therefore, the approximations presented in §7.2.1 can be applied to it. The inference results described in §7.2.2 also remain valid.
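The transformation to a basic model is easy to verify numerically in the simplest case, a diagonal Λ. The Python fragment below, an illustration of the general idea rather than code from the book, checks that ordinary least squares on the transformed data reproduces generalized least squares on the original data:

```python
import math

x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.3, 2.8, 4.4]
lam = [1.0, 4.0, 0.25, 2.0]     # diagonal of Lambda: relative error variances

# generalized least squares for the single-parameter model y = beta*x + eps
beta_gls = sum(xi * yi / li for xi, yi, li in zip(x, y, lam)) / \
           sum(xi * xi / li for xi, li in zip(x, lam))

# transform: Lambda^{-T/2} is elementwise division by sqrt(lam), then ordinary LS
xs = [xi / math.sqrt(li) for xi, li in zip(x, lam)]
ys = [yi / math.sqrt(li) for yi, li in zip(y, lam)]
beta_ols = sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)

print(beta_gls, beta_ols)
```

For a general (non-diagonal) Λ, the elementwise division is replaced by multiplication with an inverse square-root factor, but the equivalence is the same.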
Alternating Algorithm
The PNLS step of the alternating algorithm for the extended NLME model consists of minimizing, over β and b_i, i = 1, \ldots, M, the penalized nonlinear least-squares function

\sum_{i=1}^M \left[ \|y_i^* - f_i^*(\beta, b_i)\|^2 + \|\Delta b_i\|^2 \right]
 = \sum_{i=1}^M \left\{ \left\| \Lambda_i^{-T/2}\left[y_i - f_i(\beta, b_i)\right] \right\|^2 + \|\Delta b_i\|^2 \right\}.
The working vectors and derivative matrices of the LME step are, in this case,

\hat{X}_i^{*(w)} = \Lambda_i^{-T/2}\hat{X}_i^{(w)}, \qquad
\hat{Z}_i^{*(w)} = \Lambda_i^{-T/2}\hat{Z}_i^{(w)},

\hat{w}_i^{*(w)} = y_i^* - f_i^*(\hat\beta^{(w)}, \hat{b}_i^{(w)}) + \hat{X}_i^{*(w)}\hat\beta^{(w)} + \hat{Z}_i^{*(w)}\hat{b}_i^{(w)} = \Lambda_i^{-T/2}\hat{w}_i^{(w)},

with \hat{X}_i^{(w)}, \hat{Z}_i^{(w)}, and \hat{w}_i^{(w)} defined as in (7.11) and (7.12).
The Gauss–Newton algorithm for the PNLS step is identical to the algorithm described in §7.3, with \hat{X}_i^{(w)}, \hat{Z}_i^{(w)}, and \hat{w}_i^{(w)} replaced, respectively, by \hat{X}_i^{*(w)}, \hat{Z}_i^{*(w)}, and \hat{w}_i^{*(w)}. The LME approximation to the log-likelihood function of the extended single-level NLME model is

\ell_{\mathrm{LME}}(\beta, \sigma^2, \Delta, \lambda \mid y) = \ell^*_{\mathrm{LME}}(\beta, \sigma^2, \Delta \mid y^*) - \frac{1}{2}\sum_{i=1}^M \log|\Lambda_i|,

which has the same form as the log-likelihood of the extended single-level LME model described in §5.1. The log-restricted-likelihood for the extended NLME model is similarly defined.
For the extended NLME model, with g now defined as g(\beta, \Delta, \lambda, y_i, b_i) = \|\Lambda_i^{-T/2}[y_i - f_i(\beta, b_i)]\|^2 + \|\Delta b_i\|^2, the approximate Hessian of (7.18) becomes

G(\beta, \Delta, \lambda, y_i) = \left.\left[\frac{\partial f_i(\beta, b_i)}{\partial b_i^T}\right]^T \Lambda_i^{-1}\,\frac{\partial f_i(\beta, b_i)}{\partial b_i^T}\right|_{\hat{b}_i} + \Delta^T\Delta.
The modified Laplacian approximation to the log-likelihood of the extended single-level NLME model is then given by

\ell_{\mathrm{LA}}(\beta, \sigma^2, \Delta, \lambda \mid y) = -\frac{N}{2}\log(2\pi\sigma^2) + M\log|\Delta|
 - \frac{1}{2}\sum_{i=1}^M \left[ \log|G(\beta, \Delta, \lambda, y_i)| + \sigma^{-2}\,g(\beta, \Delta, \lambda, y_i, \hat{b}_i) \right]
 - \frac{1}{2}\sum_{i=1}^M \log|\Lambda_i|.
The corresponding adaptive Gaussian approximation is

\ell_{\mathrm{AGQ}}(\beta, \sigma^2, \Delta, \lambda \mid y) = -\frac{N}{2}\log(2\pi\sigma^2) + M\log|\Delta|
 - \frac{1}{2}\sum_{i=1}^M \log|G(\beta, \Delta, \lambda, y_i)|
 + \sum_{i=1}^M \log\left[\sum_j \exp\!\left\{-g\!\left[\beta, \Delta, \lambda, y_i, \hat{b}_i + \sigma G^{-1/2}(\beta, \Delta, \lambda, y_i)\,z_j\right]/(2\sigma^2) + \|z_j\|^2/2\right\} \prod_{k=1}^q w_{j_k}\right]
 - \frac{1}{2}\sum_{i=1}^M \log|\Lambda_i|.
The same comments and conclusions presented in §5.2 for the case when the within-group variance function depends on the fixed effects and/or the random effects also apply to the extended NLME model. As in the LME case, to keep the optimization problem feasible, an iteratively reweighted scheme is used to approximate the variance function. The fixed and random effects used in the variance function are replaced by their current estimates and held fixed during the log-likelihood optimization. New estimates for the fixed and random effects are then produced, and the procedure is repeated until convergence. In the case of the alternating algorithm, the estimates for the fixed and random effects obtained in the PNLS step are used to calculate the variance function weights in the LME step. If the variance function does not depend on either the fixed effects or the random effects, then no approximation is necessary.
The generalized nonlinear least squares (GNLS) model is the extended nonlinear regression model without random effects:

y_{ij} = f(\phi_{ij}, v_{ij}) + \epsilon_{ij},
\qquad \phi_{ij} = A_{ij}\beta,
\qquad i = 1, \ldots, M, \quad j = 1, \ldots, n_i.   (7.34)
For fixed β and λ, the log-likelihood (7.35) is maximized over σ² by

\hat\sigma^2(\beta, \lambda) = \sum_{i=1}^M \left\| \Lambda_i^{-T/2}\left[y_i - f_i(\beta)\right] \right\|^2 / N,   (7.36)

so that the profiled log-likelihood, obtained by replacing σ² with \hat\sigma^2(\beta, \lambda) in (7.35), is

\ell_p(\beta, \lambda \mid y) = -\frac{N}{2}\left[\log(2\pi) + 1 + \log\hat\sigma^2(\beta, \lambda)\right] - \frac{1}{2}\sum_{i=1}^M \log|\Lambda_i|.   (7.37)
A Gauss–Seidel algorithm (Thisted, 1988, §3.11.2) is used with the profiled log-likelihood (7.37) to obtain the maximum likelihood estimates of β and λ. Given the current estimate \hat\beta^{(w)} of β, a new estimate \hat\lambda^{(w)} for λ is produced by maximizing \ell_p(\hat\beta^{(w)}, \lambda). A new estimate of β is then obtained as \hat\beta^{(w+1)} = \hat\beta(\hat\lambda^{(w)}), where

\hat\beta(\lambda) = \arg\min_\beta \sum_{i=1}^M \left\| \Lambda_i^{-T/2}\left[y_i - f_i(\beta)\right] \right\|^2,

for which we can use a standard Gauss–Newton algorithm. If k is the iteration counter for this algorithm and \hat\beta^{(k)} = \hat\beta^{(k)}(\hat\lambda^{(w)}) is the current estimate of β, then the derivative matrices

\hat{X}_i^{(k)} = \left.\frac{\partial f_i(\beta)}{\partial \beta^T}\right|_{\hat\beta^{(k)}}, \qquad
\hat{X}_i^{*(k)} = \Lambda_i^{-T/2}\hat{X}_i^{(k)},
provide the Gauss–Newton increment \delta^{(k+1)} for β as the (ordinary) least-squares solution of

\min_\delta\; \sum_{i=1}^M \left\| \hat{w}_i^{*(k)} - \hat{X}_i^{*(k)}\delta \right\|^2,

where \hat{w}_i^{*(k)} = \Lambda_i^{-T/2}\left[y_i - f_i(\hat\beta^{(k)})\right]. Orthogonal-triangular decomposition methods similar to the ones described in §7.3 can be used to obtain a compact and numerically efficient implementation of the Gauss–Newton algorithm for estimating β. The derivation is left to the reader as an exercise.
Inference on the parameters of the GNLS model generally relies on classical asymptotic theory for maximum likelihood estimation (Cox and Hinkley, 1974, §9.2), which states that, for large N, the MLEs are approximately normally distributed with mean equal to the true parameter values and variance–covariance matrix given by the inverse of the information matrix. Because E\left[\partial^2\ell(\beta, \sigma^2, \lambda)/\partial\beta\,\partial\lambda^T\right] = 0 and E\left[\partial^2\ell(\beta, \sigma^2, \lambda)/\partial\beta\,\partial\sigma^2\right] = 0, the expected information matrix for the GNLS likelihood is block-diagonal, and the MLE of β is asymptotically uncorrelated with the MLEs of λ and σ².
The approximate distributions for the MLEs in the GNLS model, which are used for constructing confidence intervals and hypothesis tests, are

\hat\beta \sim N\!\left(\beta,\; \sigma^2\left[\sum_{i=1}^M \hat{X}_i^T\Lambda_i^{-1}\hat{X}_i\right]^{-1}\right),

\begin{bmatrix} \hat\lambda \\ \log\hat\sigma \end{bmatrix} \sim
 N\!\left(\begin{bmatrix} \lambda \\ \log\sigma \end{bmatrix},\; \mathcal{I}^{-1}(\lambda, \sigma)\right), \qquad
\mathcal{I}(\lambda, \sigma) = -\begin{bmatrix}
 \partial^2\ell_p/\partial\lambda\,\partial\lambda^T & \partial^2\ell_p/\partial\lambda\,\partial\log\sigma \\
 \partial^2\ell_p/\partial\log\sigma\,\partial\lambda^T & \partial^2\ell_p/\partial\log\sigma^2
\end{bmatrix},   (7.38)

with the estimate of σ² given by

\hat\sigma^2 = \sum_{i=1}^M \left\| \hat\Lambda_i^{-T/2}\left[y_i - f_i(\hat\beta)\right] \right\|^2 / (N - p),

where p denotes the dimension of β.
8
Fitting Nonlinear Mixed-Effects Models
Because all trees are measured on the same occasions, these are balanced, longitudinal data. It is clear from Figure 8.1 that a tree effect is present, and we will take this into account when fitting nlme or nlsList models. To illustrate some of the details of fitting nls models, we will temporarily ignore the tree effect and fit a single logistic model to all the data. Recall from §6.1 that this model expresses the trunk circumference y_{ij} of tree i at age x_{ij}, for i = 1, \ldots, 5, \; j = 1, \ldots, 7, as

y_{ij} = \frac{\phi_1}{1 + \exp\left[-(x_{ij} - \phi_2)/\phi_3\right]} + \epsilon_{ij},   (8.1)
FIGURE 8.1. Circumference of five orange trees from a grove in southern California over time. The measurement is probably the "circumference at breast height" commonly used by foresters. Points corresponding to the same tree are connected by lines.
where Asym = φ_1, xmid = φ_2, and scal = φ_3. Unlike in the linear case, the parameters must be declared explicitly in a nonlinear model formula, and an intercept is not assumed by default.
An alternative approach is to write an S function representing the logistic model as, say,

> logist <-
+   function(x, Asym, xmid, scal) Asym/(1 + exp(-(x - xmid)/scal))
A version of logist that also evaluates the gradient can be created with deriv:

> logist <- deriv( ~ Asym/(1 + exp(-(x - xmid)/scal)),
+    c("Asym", "xmid", "scal"), function(x, Asym, xmid, scal){} )
> Asym <- 180; xmid <- 700; scal <- 300
> logist( Orange$age[1:7], Asym, xmid, scal )
[1] 22.617 58.931 84.606 132.061 153.802 162.681 170.962
attr(, "gradient"):
Asym
xmid
scal
[1,] 0.12565 -0.065916 0.127878
[2,] 0.32739 -0.132124 0.095129
[3,] 0.47004 -0.149461 0.017935
[4,] 0.73367 -0.117238 -0.118802
[5,] 0.85446 -0.074616 -0.132070
[6,] 0.90378 -0.052175 -0.116872
[7,] 0.94979 -0.028614 -0.084125
As mentioned in §6.1, one important difference between linear and nonlinear regression is that nonlinear models require starting estimates for the parameters. Determining reasonable starting estimates for a nonlinear regression problem is something of an art, but some general recommendations are available (Bates and Watts, 1988, §3.2). We return to this issue in §8.1.2, where we describe the selfStart class of model functions.
Because the parameters in the logistic model (8.1) have a graphical interpretation, we can determine initial estimates from a plot of the data. In Figure 8.1 it appears that the mean asymptotic trunk circumference is around 170 mm and that the trees attain half of their asymptotic trunk circumference at about 700 days of age. Therefore, we use the initial estimates φ_1 = 170 for the asymptotic trunk circumference and φ_2 = 700 for the location of the inflection point. To obtain an initial estimate for φ_3, we note that the logistic curve reaches approximately 3/4 of its asymptotic value when x = φ_2 + φ_3. Inspection of Figure 8.1 suggests that the trees attain 3/4 of their final trunk circumference at about 1200 days, giving an initial estimate of φ_3 = 500.
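The graphical recipe just described can be automated. This Python sketch is ours, not the book's S code; it applies the same heuristic (asymptote from the maximum response, xmid from the 50% point, scal from the spacing between the 50% and 75% points) to data simulated from a logistic curve with Asym = 200, xmid = 700, scal = 350:

```python
import math

def logistic(x, asym, xmid, scal):
    return asym / (1.0 + math.exp(-(x - xmid) / scal))

def logist_init(xs, ys):
    """Heuristic starting values for the logistic model (8.1)."""
    pts = sorted(zip(xs, ys))
    asym = max(y for _, y in pts)
    def x_at(target):
        # linear interpolation of the x where the response reaches `target`
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if y0 <= target <= y1:
                return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
        return pts[-1][0]
    xmid = x_at(0.5 * asym)
    scal = x_at(0.75 * asym) - xmid
    return asym, xmid, scal

xs = [100.0 * k for k in range(1, 16)]          # ages 100, 200, ..., 1500
ys = [logistic(x, 200.0, 700.0, 350.0) for x in xs]
print(logist_init(xs, ys))
```

Because the largest observed response underestimates the true asymptote on a finite age range, the heuristic deliberately returns rough values; the nonlinear optimizer then refines them.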
We combine all this information in the following call to nls:

> fm1Oran.nls <- nls(circumference ~ logist(age, Asym, xmid, scal),
+        data = Orange, start = c(Asym = 170, xmid = 700, scal = 500) )

Our initial estimates are reasonable and the call converges. Following the usual framework for modeling functions in S, the object fm1Oran.nls produced by the call to nls is of class nls, for which several methods are available to display, plot, update, and extract components from a fitted object. For example, the summary method provides information about the estimated parameters.
> summary( fm1Oran.nls )
Formula: circumference ~ logist(age, Asym, xmid, scal)

Parameters:
      Value Std. Error t value
Asym 192.68     20.239  9.5203
xmid 728.71    107.272  6.7931
scal 353.49     81.460  4.3395

FIGURE 8.2. Scatter plot of standardized residuals versus fitted values for fm1Oran.nls, a nonlinear least-squares fit of the logistic growth model to the entire orange tree dataset.
The final estimates are close to the initial values derived from Figure 8.1. The standard errors for the parameter estimates are relatively large, suggesting that there is considerable variability in the data.
The plot method for nls objects (which is included with the nlme library) has a syntax similar to the lme and gls plot methods described in §4.3.1 and §5.4. By default, the plot of the standardized residuals versus fitted values, shown in Figure 8.2, is produced.

> plot( fm1Oran.nls )                            # Figure 8.2
The variability in the residuals increases with the fitted values but, in this case, the wedge-shaped pattern is due to the correlation among observations in the same tree, not to heteroscedastic errors. We can get a better understanding of the problem by looking at the plot of the residuals by tree presented in Figure 8.3.

> plot( fm1Oran.nls, Tree ~ resid(.), abline = 0 )   # Figure 8.3
FIGURE 8.3. Residuals by tree for fm1Oran.nls.
The residuals are mostly negative for trees 1 and 3 and mostly positive for trees 2 and 4, giving strong evidence that a tree effect should be included in the model.
A simple approach to accounting for a tree effect is to allow different parameters for each tree, resulting in separate nls fits. This is the approach used in the nlsList function, described in §8.1.3, which provides a valuable tool for model building, but usually produces overparameterized models. As illustrated in Chapter 6, nonlinear mixed-effects models strike a balance between the simple nls model and the overparameterized nlsList model by allowing random effects to account for among-group variation in some of the parameters, while preserving a moderate number of parameters in the model.
Especially when the same model will be applied to several similar sets of
data, as is the case in many of the examples we consider here, we do not
want to have to repeat manually the same series of steps in determining
starting estimates. A more sensible approach is to encapsulate the steps
used to derive initial estimates for a given nonlinear model into a function
that can be used to generate initial estimates from any dataset. Self-starting
nonlinear regression models are S functions that contain an auxiliary function to calculate the initial parameter estimates. They are represented in
S as selfStart objects.
The S objects of the selfStart class are composed of two functions: one
that evaluates the nonlinear regression model itself and an auxiliary function, called the initial attribute, that determines starting estimates for the
model's parameters from a set of data. When a selfStart object for a model
is available, there is no need to determine starting values for the parameters. The user can simply specify the formula for the model and the data to
which it should be applied. From a user's point of view, fitting self-starting
nonlinear regression models is nearly as easy as fitting linear regression
models.
We illustrate the construction and use of self-starting models by building
one for the logistic model. The basic steps in the calculation of
initial estimates for the logistic model (8.1) from the Orange dataset are:
1. Sort/average: sort the data according to the x variable and obtain
the average response y for each unique x.
2. Asymptote: use the maximum y as an initial value φ1 for the asymptote.
3. Inflection point: use the x corresponding to 0.5 φ1 as an initial value
φ2 for the inflection point.
4. Scale: use the difference between the x corresponding to 0.75 φ1 and
φ2 as an initial value φ3 for the growth scale.
Step 1, sort/average, was carried out implicitly in our graphical derivation of initial values for the orange trees example, but is now explicitly
incorporated in the algorithm.
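As a concrete illustration, the four steps can be carried out directly in S on the Orange data. The sketch below uses our own object names (xy, Asym0, xmid0, scal0), which do not appear in the text:

```r
## Step 1 (sort/average): average circumference at each unique age
xy <- aggregate( circumference ~ age, data = Orange, FUN = mean )
xy <- xy[order(xy$age), ]

## Step 2 (asymptote): the maximum average response
Asym0 <- max( xy$circumference )

## Step 3 (inflection point): the age whose average response is
## closest to 0.5 * Asym0
xmid0 <- xy$age[ which.min(abs(xy$circumference - 0.5 * Asym0)) ]

## Step 4 (growth scale): distance from xmid0 to the age whose average
## response is closest to 0.75 * Asym0
scal0 <- xy$age[ which.min(abs(xy$circumference - 0.75 * Asym0)) ] - xmid0
```

The self-starting machinery described next packages this kind of calculation so it can be reused with any dataset.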
Two auxiliary functions, sortedXyData and NLSstClosestX, included in the
nlme library, are particularly useful for constructing self-starting models.
The sortedXyData function performs the sort/average step. It takes the
arguments x, y, and data and returns a data.frame with two columns: y,
the average y for each unique value of x, and x, the unique values of x,
sorted in ascending order. The arguments x and y can be numeric vectors,
or they can be expressions or strings to be evaluated in data. The returned
object is of class sortedXyData. For example, the pointwise averages of the
growth curves of the orange trees are obtained with
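a call along the following lines (a sketch; in R both helpers are also available in the stats package):

```r
## Pointwise averages of the orange tree growth curves, sorted by age
xy <- sortedXyData( expression(age), expression(circumference), Orange )
xy

## NLSstClosestX interpolates within xy to find the x value at which
## the average response is closest to a target y, here half the maximum
NLSstClosestX( xy, 0.5 * max(xy$y) )
```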
the formal parameters are x, Asym, xmid, and scal. If the function is called
as
logist(age, A, xmid, scal)
then mCall will have components x = age, Asym = A, xmid = xmid, and scal
= scal. As described above, the names of these components are the formal
parameters in the model function definition. The values of these components are the names (or, more generally, the expressions) that are the actual
arguments in the call to the model function.
The LHS argument is the expression on the left-hand side of the model
formula in the call to nls. It determines the response vector. The data
argument gives a data.frame in which to find the variables named in the
other two arguments. The function logistInit below implements a slightly
more general version of the algorithm described above for calculating initial
estimates in the logistic model, using the required syntax.
> logistInit
function(mCall, LHS, data)
{
    xy <- sortedXyData(mCall[["x"]], LHS, data)
    if (nrow(xy) < 3) {
        stop("Too few distinct input values to fit a logistic")
    }
    Asym <- max(abs(xy[, "y"]))
    if (Asym != max(xy[, "y"])) Asym <- -Asym   # negative asymptote
    xmid <- NLSstClosestX(xy, 0.5 * Asym)
    scal <- NLSstClosestX(xy, 0.75 * Asym) - xmid
    value <- c(Asym, xmid, scal)
    names(value) <- mCall[c("Asym", "xmid", "scal")]
    value
}
Alternatively, it can be called with a one-sided formula defining the nonlinear model, the function for the initial attribute, and a character vector
giving the parameter names.

> logist <- selfStart( ~ Asym/(1 + exp(-(x - xmid)/scal)),
+       initial = logistInit, parameters = c("Asym", "xmid", "scal") )

When selfStart is called like this, the model function is produced by applying deriv to the right-hand side of the model formula.
The getInitial function is used to extract the initial parameter estimates from a given dataset when using a selfStart model function. It takes
two arguments: a two-sided model formula, which must include a selfStart
function on its right-hand side, and a data.frame in which to evaluate the
variables in the model formula.
TABLE 8.1. Self-starting model functions available in the nlme library.

SSasymp      asymptotic regression
SSasympOff   asymptotic regression with an offset
SSasympOrig  asymptotic regression through the origin
SSbiexp      biexponential
SSfol        first-order compartment
SSfpl        four-parameter logistic
SSlogis      logistic
SSmicmen     Michaelis-Menten
The nlme library includes several self-starting model functions that can
be used to fit nonlinear regression models in S without specifying starting
estimates for the parameters. They are listed in Table 8.1 and are described
in detail in Appendix B. The SSlogis function is a more sophisticated version of our simple logist self-starting model, but with the same argument
sequence. It uses several techniques, such as the algorithm for partially
linear models, to refine the starting estimates, so the returned values are
actually the converged estimates.
> getInitial( circumference ~ SSlogis(age, Asym, xmid, scal), Orange )
  Asym   xmid  scal
192.68 728.72 353.5

> nls( circumference ~ SSlogis(age, Asym, xmid, scal), Orange )
Residual sum of squares : 17480
parameters:
  Asym   xmid  scal
192.68 728.72 353.5
formula: circumference ~ SSlogis(age, Asym, xmid, scal)
35 observations
We can see that selfStart model objects relieve the user of much of the
effort required for a nonlinear regression analysis. If a versatile, effective
strategy for determining starting estimates is represented carefully in the
object, it can make the use of nonlinear models nearly as simple as the
use of linear models. If you frequently use a nonlinear model not included
in Table 8.1, you should consider writing your own self-starting model to
represent it. The selfStart functions in Table 8.1 and the logist function
described in this section can be used as templates.
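For instance, a compact self-starting logistic can be assembled in a few lines. The sketch below targets recent versions of R, where the helpers live in the stats package and the initial function receives mCall, data, and LHS as named arguments; logist2 and logistInit2 are our names, not the text's:

```r
## Initial attribute: the four-step algorithm described above
logistInit2 <- function(mCall, data, LHS, ...) {
    xy <- sortedXyData( mCall[["x"]], LHS, data )     # sort/average step
    Asym <- max( xy$y )                               # asymptote
    xmid <- NLSstClosestX( xy, 0.50 * Asym )          # inflection point
    scal <- NLSstClosestX( xy, 0.75 * Asym ) - xmid   # growth scale
    value <- c( Asym, xmid, scal )
    names(value) <- mCall[ c("Asym", "xmid", "scal") ]
    value
}

logist2 <- selfStart( ~ Asym/(1 + exp(-(x - xmid)/scal)),
                      initial = logistInit2,
                      parameters = c("Asym", "xmid", "scal") )

getInitial( circumference ~ logist2(age, Asym, xmid, scal), data = Orange )
```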
Because Orange is a groupedData object, the grouping factor Tree could have
been omitted from the model formula. When a selfStart function depends
on only one covariate, as does SSlogis, and data is a groupedData object, the
entire form of the model formula can be inferred from the display formula
stored with the groupedData object. In this case, only the selfStart function
and the groupedData object need to be passed to the nlsList function. For
example, we could use
> fm1Oran.lis <- nlsList( SSlogis, Orange )
. . .
    Asym   xmid   scal
. . .
1 154.15 627.12 362.50
5 207.27 861.35 379.99
2 218.99 700.32 332.47
4 225.30 710.69 303.13
FIGURE 8.4. Ninety-five percent confidence intervals on the logistic model parameters for each tree in the orange trees data.
. . .
scal
   Value Std. Error t value
3 400.95     94.776  4.2306
1 362.50     81.185  4.4652
5 379.99     66.761  5.6917
2 332.47     49.381  6.7327
4 303.13     41.608  7.2853
. . .
Although the estimates for all the parameters vary with tree, there appears to be relatively more variability in the Asym estimates. We can investigate this better with the intervals method.

> plot( intervals( fm1Oran.lis ), layout = c(3,1) )   # Figure 8.4
As mentioned in §6.2, confidence intervals for the same parameter in different groups within an nlsList fit do not necessarily have the same length,
even with balanced data. This is evident in Figure 8.4.
The only parameter for which all the confidence intervals do not overlap
in Figure 8.4 is Asym, suggesting that it is the only parameter for which
random effects are needed to account for variation among trees.
The same plot method used for lmList objects is used to obtain diagnostic
plots for an nlsList object. The boxplots of the residuals by tree, obtained
with

> plot( fm1Oran.lis, Tree ~ resid(.), abline = 0 )   # Figure 8.5

and displayed in Figure 8.5, no longer indicate the tree effect observed
in Figure 8.3. The basic drawback of the nlsList model is that it uses 15
parameters to account for the individual tree effects. A more parsimonious
representation is provided by the nonlinear mixed-effects model discussed
in §8.2.
As a second example to illustrate the use of the nlsList function, we
consider the theophylline data, which we used in §3.4. Recall that these
data, displayed in Figure 8.6, are the serum concentrations of theophylline
FIGURE 8.5. Boxplots of residuals (mm) by tree for fm1Oran.lis.
ct = [D ke ka / (Cl (ka − ke))] [exp(−ke t) − exp(−ka t)] .   (8.2)

The parameters in the model are the elimination rate constant ke, the
absorption rate constant ka, and the clearance Cl. For the model to be
meaningful, all three parameters must be positive. To assure ourselves of
positive estimates while keeping the optimization problem unconstrained,
we reparameterize model (8.2) in terms of the logarithms of the clearance
and the rate constants.
FIGURE 8.6. Serum concentrations of theophylline versus time since oral administration of the drug in twelve subjects.
The selfStart function SSfol, described in Appendix C.5, provides a self-starting implementation of (8.3). Because two covariates, dose and time,
are present in (8.3), and hence also in the argument sequence of SSfol, we
must specify the full model formula when calling nlsList.

> fm1Theo.lis <- nlsList( conc ~ SSfol(Dose, Time, lKe, lKa, lCl),
+                         data = Theoph )
> fm1Theo.lis
Call:
  Model: conc ~ SSfol(Dose, Time, lKe, lKa, lCl) | Subject
  Data: Theoph

Coefficients:
       lKe      lKa     lCl
6  -2.3074  0.15171 -2.9733
7  -2.2803 -0.38617 -2.9643
8  -2.3863  0.31862 -3.0691
11 -2.3215  1.34779 -2.8604
3  -2.5081  0.89755 -3.2300
2  -2.2862  0.66417 -3.1063
4  -2.4365  0.15834 -3.2861
9  -2.4461  2.18201 -3.4208
12 -2.2483 -0.18292 -3.1701
10 -2.6042 -0.36309 -3.4283
1  -2.9196  0.57509 -3.9158
5  -2.4254  0.38616 -3.1326

FIGURE 8.7. Ninety-five percent confidence intervals on the first-order compartment model parameters lKe, lKa, and lCl for each subject in the theophylline data.
The individual confidence intervals in Figure 8.7 indicate that there is substantial subject-to-subject variation in the absorption rate constant and
moderate variation in the clearance. The elimination rate constant does
not seem to vary significantly with subject.
The main purpose of the preliminary analysis provided by nlsList is
to suggest a structure for the random effects to be used in a nonlinear
mixed-effects model. We must decide which random effects to include in
the model (intervals and its associated plot method are often useful for
that) and what covariance structure these random effects should have. The
pairs method, which is the same for lmList and nlsList objects, provides
one view of the random-effects covariance structure.

> pairs( fm1Theo.lis, id = 0.1 )   # Figure 8.8
FIGURE 8.8. Pairs plot for the random effects estimates corresponding to
fm1Theo.lis.
The scatter plots in Figure 8.8 suggest that Subject 1 has an unusually
low elimination rate constant and clearance and that Subject 9 has an
unusually high absorption rate constant. Overall, there do not appear to
be significant correlations between the individual parameter estimates.
As with nls fits, there is an advantage to encapsulating the model expression in an S function when fitting an nlme model, in that it allows
analytic derivatives of the model function to be passed to nlme and used
in the optimization algorithm. The S function deriv can be used to create
model functions that return the value of the model and its derivatives as
a gradient attribute. If the value returned by the model function does not
have a gradient attribute, numerical derivatives are used in the optimization.
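For example, a model function with analytic derivatives for the logistic model can be generated directly (a sketch; logistMod is our name for the result):

```r
## deriv() builds a function that returns the model value with a
## "gradient" attribute holding the partial derivatives with respect
## to the named parameters
logistMod <- deriv( ~ Asym/(1 + exp(-(x - xmid)/scal)),
                    c("Asym", "xmid", "scal"),
                    function(x, Asym, xmid, scal) {} )

val <- logistMod( c(500, 700), Asym = 200, xmid = 700, scal = 350 )
attr( val, "gradient" )   # a 2 x 3 matrix of partial derivatives
```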
The arguments fixed and random are formulas, or lists of formulas, defining the structures of the fixed and random effects in the model. In these
formulas, a 1 on the right-hand side of a formula indicates that a single
parameter is associated with the effect, but any linear formula in S can
be used. This gives considerable flexibility to the model, as time-dependent
parameters can be incorporated easily (e.g., when a formula in the fixed list
involves a covariate that changes with time). Each parameter in the model
will usually have an associated fixed effect, but it may, or may not, have an
associated random effect. Because the nlme model assumes that all random
effects have expected value zero, the inclusion of a random effect without a
corresponding fixed effect would be unusual. Any covariates defined in the
fixed and random formulas can, alternatively, be directly incorporated in
the model formula. However, declaring the covariates in fixed and random
allows for more efficient calculation of derivatives and is useful for update
methods. The fixed argument is required when model is declared as a formula. By default,
when random is omitted, all fixed effects in the model are assumed to have
an associated random effect.
In the theophylline example, to have random effects only for the log of
the absorption rate constant, lKa, and the log-clearance, lCl, we use

fixed = list(lKe ~ 1, lKa ~ 1, lCl ~ 1),
random = list(lKa ~ 1, lCl ~ 1)
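A full call along these lines might look as follows. This is a sketch, not a fit reported in the text: with the self-starting SSfol model no start values are needed, and we use the diagonal pdDiag structure for the random effects (introduced later in this section) to keep them independent:

```r
library(nlme)

## Theoph is a groupedData object, so grouping by Subject is inferred
fmSketch <- nlme( conc ~ SSfol(Dose, Time, lKe, lKa, lCl),
                  data = Theoph,
                  fixed = list(lKe ~ 1, lKa ~ 1, lCl ~ 1),
                  random = pdDiag(lKa + lCl ~ 1) )
fixef( fmSketch )   # three fixed effects: lKe, lKa, lCl
```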
Note that there would be two fixed effects associated with lCl in this case:
an intercept and a slope with respect to Wt.
The data argument names a data frame in which any variables used in model, fixed,
random, and groups are to be evaluated. By default, data is set to the environment from which nlme was called.
The groups argument is a one-sided formula, or an S expression, that,
when evaluated in data, returns a factor with the group label of each observation. This argument does not need to be specified when object is an
nlsList object, when data is a groupedData object, or when random is a
named list (in which case the name is used as the grouping factor).
The start argument provides a list, or a vector, of starting values for the
iterative algorithm. When given as a vector, it is used as starting estimates
for the fixed effects only. It is only required when model is given as a formula
and the model function is not a selfStart object; in this case, starting values
for the fixed effects must be specified. Starting estimates for the remaining
parameters are generated automatically. By default, the random effects are
initialized to zero.
Objects returned by nlme are of class nlme, which inherits from class lme.
As a result, all the methods available for lme objects can also be applied to
an nlme object. In fact, most methods are common to both classes. Table 8.3
lists the most important methods for class nlme. We illustrate the use of
these methods through the examples in the next sections.
Growth of Orange Trees
The nonlinear mixed-effects model corresponding to the logistic model (8.1),
with random effects for all parameters, is

yij = φ1i / {1 + exp[−(tij − φ2i)/φ3i]} + εij ,

φi = (φ1i, φ2i, φ3i)ᵀ = (β1, β2, β3)ᵀ + (b1i, b2i, b3i)ᵀ = β + bi ,

bi ∼ N(0, Ψ) ,   εij ∼ N(0, σ²) .   (8.4)
trees and the random effects, bi, represent the deviations of the φi from
their population average. The random effects are assumed to be independent for different trees, and the within-group errors εij are assumed to be
independent for different i, j and to be independent of the random effects.
The nonlinear mixed-effects model (8.4) uses ten parameters to represent
the fixed effects (three parameters), the random-effects variance-covariance
structure (six parameters), and the within-group variance (one parameter).
These numbers remain unchanged if we increase the number of trees being
analyzed. In comparison, the number of parameters in the corresponding
nlsList model, described in §8.1.3, is equal to three times the number of
trees (15, in the orange trees example).
A nonlinear mixed-effects fit of model (8.4) can be obtained with

> ## no need to specify groups, as Orange is a groupedData object
> ## random is omitted - by default it is equal to fixed
> fm1Oran.nlme <-
+     nlme( circumference ~ SSlogis(age, Asym, xmid, scal),
+           data = Orange,
+           fixed = Asym + xmid + scal ~ 1,
+           start = fixef(fm1Oran.lis) )
The fm1Oran.lis object stores information about the model function, the
parameters in the model, the groups formula, and the data used to fit the
model. These are retrieved by nlme, allowing a more compact call. Another
important advantage of using an nlsList object as the first argument to
nlme is that it automatically provides initial estimates for the fixed effects,
the random effects, and the random-effects covariance matrix.
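For example, such a compact call can be sketched as follows (self-contained, so the nlsList fit is recreated first):

```r
library(nlme)

fm1Oran.lis  <- nlsList( SSlogis, Orange )   # separate logistic fits by tree
fm1Oran.nlme <- nlme( fm1Oran.lis )          # initial values from the nlsList fit
```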
We can now use the nlme methods to display the results and to assess
the quality of the fit. As with lme objects, the print method gives some
brief information about the fitted object.
> fm1Oran.nlme
Nonlinear mixed-effects model fit by maximum likelihood
  Model: circumference ~ SSlogis(age, Asym, xmid, scal)
  Data: Orange
  Log-likelihood: -129.99
  Fixed: list(Asym ~ 1, xmid ~ 1, scal ~ 1)
  Asym   xmid   scal
192.12 727.74 356.73

Random effects:
 Formula: list(Asym ~ 1, xmid ~ 1, scal ~ 1)
 Level: Tree
 Structure: General positive-definite
          StdDev Corr
Asym     27.0302 Asym   xmid
xmid     24.3761 -0.331
scal     36.7363 -0.992  0.447
Residual  7.3208

Number of Observations: 35
Number of Groups: 5
Random effects:
 Formula: list(Asym ~ 1, xmid ~ 1, scal ~ 1)
 Level: Tree
 Structure: General positive-definite
          StdDev Corr
Asym     27.0302 Asym   xmid
xmid     24.3761 -0.331
scal     36.7363 -0.992  0.447
Residual  7.3208

Fixed effects: list(Asym ~ 1, xmid ~ 1, scal ~ 1)
      Value Std.Error DF t-value p-value
Asym 192.12    14.045 28  13.679  <.0001
xmid 727.74    34.618 28  21.022  <.0001
scal 356.73    30.537 28  11.682  <.0001
 Correlation:
       Asym  xmid
xmid  0.275
scal -0.194 0.666
. . .
The fixed-effects estimates are similar, but the standard errors are much
smaller in the nlme fit. The estimated within-group standard error is also
considerably smaller in the nlme fit. This is because the between-group
variability is not incorporated in the nls model, being absorbed in the
standard error. This pattern is generally observed when comparing mixed-effects versus fixed-effects fits.
In the fm1Oran.nlme fit, the estimated correlation of −0.992 between Asym
and scal suggests that the estimated variance-covariance matrix is ill-conditioned and that the random-effects structure may be over-parameterized. The scatter-plot matrix of the estimated random effects, produced
by the pairs method, provides a useful diagnostic plot for assessing over-parameterization problems.

> pairs( fm1Oran.nlme )   # Figure 8.9
FIGURE 8.9. Scatter-plot matrix of the estimated random effects for fm1Oran.nlme.
The nearly perfect alignment between the Asym random effects and the scal
random effects further indicates that the model is over-parameterized.
The individual confidence intervals for the parameters in the nlsList
model described by fm1Oran.lis, displayed in Figure 8.4 and discussed
in §8.1.3, suggested that Asym was the only parameter requiring a random effect to account for its variation among trees. We can fit the corresponding model using the update method and compare it to the full model
fm1Oran.nlme using the anova method.
> fm2Oran.nlme <- update( fm1Oran.nlme, random = Asym ~ 1 )
> anova( fm1Oran.nlme, fm2Oran.nlme )
             Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm1Oran.nlme     1 10 279.98 295.53 -129.99
fm2Oran.nlme     2  5 273.17 280.95 -131.58 1 vs 2  3.1896  0.6708
The large p-value for the likelihood ratio test confirms that the xmid and
scal random effects are not needed in the nlme model and that the simpler
model fm2Oran.nlme is to be preferred.
As in the linear case, we must check whether the assumptions underlying the
nonlinear mixed-effects model appear valid for the model fitted to the data.
The two basic distributional assumptions are the same as for the lme model:
Assumption 1: the within-group errors are independent and identically normally distributed, with mean zero and variance σ², and they
are independent of the random effects.
FIGURE 8.10. Scatter plot of standardized residuals versus fitted values for
fm1Oran.nlme.
Figure 8.10 shows that the residuals are distributed symmetrically around zero, with
an approximately constant variance. It does not indicate any violations of
the assumptions for the within-group error.
The adequacy of the fitted model is better visualized by displaying the
fitted and observed values in the same plot. The augPred method, which
produces the augmented predictions, and its associated plot method are
used for that. For comparison, both the population predictions (obtained
by setting the random effects to zero) and the within-group predictions
(using the estimated random effects) are displayed in Figure 8.11, produced
with

> ## level = 0:1 requests fixed (0) and within-group (1) predictions
> plot( augPred(fm2Oran.nlme, level = 0:1),
+       layout = c(5,1) )                         # Figure 8.11
362
200
fixed
600
Tree
1000
1400
200
150
100
50
200
600
1000
1400
200
600
1000
1400
200
600
1000
1400
Because there are only five trees in the data, with just seven observations each, we cannot reliably test assumptions about the random-effects
distribution and the independence of the within-group errors.
FIGURE 8.12. Normal probability plot of the standardized residuals from the nonlinear mixed-effects model fit fm1Oran.nlme to the orange data.
Theophylline Kinetics
The nonlinear mixed-effects version of the first-order open-compartment
model (8.3), with all parameters as mixed effects, is

cij = [Di exp(lKei + lKai − lCli) / (exp(lKai) − exp(lKei))]
      × {exp[−exp(lKei) tij] − exp[−exp(lKai) tij]} + εij ,

φi = (lKei, lKai, lCli)ᵀ = (β1, β2, β3)ᵀ + (b1i, b2i, b3i)ᵀ = β + bi ,

bi ∼ N(0, Ψ) ,   εij ∼ N(0, σ²) .   (8.5)
The estimated correlation between the lKe and lCl random effects is
near one, indicating that the estimated random-effects variance-covariance
matrix is ill-conditioned. We can investigate the precision of the correlation
estimates with the intervals method.
. . .
   upper
 0.29600
 1.02287
 0.40246
 0.41903
 1.00000
 0.37746
. . .
All three confidence intervals for the correlations include zero. The interval
on the correlation between lKe and lCl shows that this quantity is not estimated
with any precision whatsoever: it must lie in the interval [−1, 1], and the
confidence interval is essentially that complete range.
As a first attempt at simplifying model (8.5), we investigate the assumption that the random effects are independent, that is, that the matrix Ψ
is diagonal. Structured random-effects variance-covariance matrices are
specified in nlme in the same way as in lme: by using a pdMat constructor to
specify the desired class of positive-definite matrix. The pdMat classes and
methods are described in §4.2.2, and the standard pdMat classes available in
the nlme library are listed in Table 4.3. By default, when no pdMat class is
specified, a general positive-definite matrix (pdSymm class) is used to represent the random-effects variance-covariance structure. Alternative pdMat
classes are specified by calling the corresponding constructor with the random-effects formula, or list of formulas, as its first argument. For example,
we specify a model with independent random effects for the theophylline
data with either
> fm2Theo.nlme <- update( fm1Theo.nlme,
+       random = pdDiag(list(lKe ~ 1, lKa ~ 1, lCl ~ 1)) )

or

> fm2Theo.nlme <- update( fm1Theo.nlme,
+       random = pdDiag(lKe + lKa + lCl ~ 1) )
> fm2Theo.nlme
. . .
  Log-likelihood: -177.02
  Fixed: list(lKe ~ 1, lKa ~ 1, lCl ~ 1)
     lKe     lKa     lCl
 -2.4547 0.46554 -3.2272

Random effects:
 Formula: list(lKe ~ 1, lKa ~ 1, lCl ~ 1)
 Level: Subject
 Structure: Diagonal
               lKe     lKa     lCl Residual
StdDev: 1.9858e-05 0.64382 0.16692  0.70923
. . .
The very small estimated standard deviation for lKe suggests that the corresponding random effect could be omitted from the model.

> fm3Theo.nlme <-
+     update( fm2Theo.nlme, random = pdDiag(lKa + lCl ~ 1) )
We use the anova method to test the equivalence of the different nlme
models used so far for the theophylline data. By including fm3Theo.nlme as
the second model, we obtain the p-values comparing this model with each
of the other two.
> anova( fm1Theo.nlme, fm3Theo.nlme, fm2Theo.nlme )
             Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm1Theo.nlme     1 10 366.64 395.47 -173.32
fm3Theo.nlme     2  6 366.04 383.34 -177.02 1 vs 2  7.4046  0.1160
fm2Theo.nlme     3  7 368.05 388.23 -177.02 2 vs 3  0.0024  0.9611
The simpler fm3Theo.nlme model, with two independent random effects for
lKa and lCl, has the smallest AIC and BIC. Also, the large p-values for the
likelihood ratio tests comparing it to the other two models indicate that it
should be preferred.
The plot of the standardized residuals versus the fitted values in Figure 8.13 gives some indication that the within-group variability increases
with the drug concentration.

> plot( fm3Theo.nlme )   # Figure 8.13

The use of variance functions in nlme to model within-group heteroscedasticity is described in §8.3.1. We postpone further investigation of the within-group error assumptions in the theophylline data until that section.
The qqnorm method is used to investigate the normality of the random
effects.

> qqnorm( fm3Theo.nlme, ~ ranef(.) )   # Figure 8.14

Figure 8.14 does not indicate any violations of the assumption of normality
for the random effects.
FIGURE 8.13. Scatter plot of standardized residuals versus fitted values for
fm3Theo.nlme.
FIGURE 8.14. Normal probability plots of the estimated lKa and lCl random effects for fm3Theo.nlme.
It is clear from Figure 8.15 that the CO2 uptake rates of Quebec plants
are greater than those of Mississippi plants and that chilling the plants
reduces their CO2 uptake rates.
An asymptotic regression model with an offset is used in Potvin et al.
(1990) to represent the expected CO2 uptake rate U(c) as a function of the
ambient CO2 concentration c:

U(c) = φ1 {1 − exp[−exp(φ2)(c − φ3)]} ,   (8.6)
FIGURE 8.15. CO2 uptake versus ambient CO2 by treatment and type for
Echinochloa crus-galli plants, 6 from Quebec and 6 from Mississippi. Half the
plants of each type were chilled overnight before the measurements were taken.
where φ1 is the asymptotic CO2 uptake rate, φ2 is the logarithm of the rate
constant, and φ3 is the maximum ambient concentration of CO2 at which
there is no uptake. The logarithm of the rate constant is used to enforce the
positivity of the estimated rate constant, while keeping the optimization
problem unconstrained.
The selfStart function SSasympOff gives a self-starting implementation of
model (8.6), which is used to automatically generate starting estimates for
the parameters in an nlsList fit.
> fm1CO2.lis <- nlsList( SSasympOff, CO2 )
> fm1CO2.lis
Call:
  Model: uptake ~ SSasympOff(conc, Asym, lrc, c0) | Plant
  Data: CO2

Coefficients:
      Asym     lrc      c0
Qn1 38.140 -4.3807  51.221
Qn2 42.872 -4.6658  55.856
. . .
Mc3 18.535 -3.4654  67.843
Mc1 21.787 -5.1422 -20.395

Degrees of freedom: 84 total; 48 residual
Residual standard error: 1.7982
asymptote Asym and only moderate variation in the log-rate lrc and the
offset c0. We initially consider a nonlinear mixed-effects version of the CO2
uptake model (8.6) with all parameters as mixed effects and no treatment
covariates. The corresponding model for the CO2 uptake uij of plant i at
ambient CO2 concentration cij is

uij = φ1i {1 − exp[−exp(φ2i)(cij − φ3i)]} + εij ,

φi = (φ1i, φ2i, φ3i)ᵀ = (β1, β2, β3)ᵀ + (b1i, b2i, b3i)ᵀ = β + bi ,

bi ∼ N(0, Ψ) ,   εij ∼ N(0, σ²) .   (8.7)
The very high correlation between Asym and c0 suggests that the random-effects model is over-parameterized. The scatter plot matrix of the estimated random effects (not shown) confirms that Asym and c0 are in almost
perfect linear alignment. A simpler model, with just Asym and lrc as random
effects, gives an equivalent fit of the data.

FIGURE 8.16. Scatter plot of standardized residuals versus fitted values for
fm2CO2.nlme.
> fm2CO2.nlme <- update( fm1CO2.nlme, random = Asym + lrc ~ 1 )
> fm2CO2.nlme
. . .
Random effects:
 Formula: list(Asym ~ 1, lrc ~ 1)
 Level: Plant
 Structure: General positive-definite
          StdDev Corr
Asym     9.65939 Asym
lrc      0.19951 -0.777
Residual 1.80792
. . .
> anova( fm1CO2.nlme, fm2CO2.nlme )
            Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm1CO2.nlme     1 10 422.62 446.93 -201.31
fm2CO2.nlme     2  7 419.52 436.53 -202.76 1 vs 2  2.8961  0.4079
The plot of the standardized residuals versus the fitted values in Figure 8.16 does not indicate any violations of the assumptions on the
within-group error. The residuals are distributed symmetrically around
zero, with uniform variance. Two large standardized residuals are observed
for plant Qc3 and one for plant Qc2.

> plot( fm2CO2.nlme, id = 0.05, cex = 0.8, adj = -0.5 )   # Figure 8.16

The normal plot of the within-group residuals, not shown, does not indicate
violations in the normality of the within-group errors.
The primary question of interest for the CO2 data is the effect of plant
type and chilling treatment on the individual model parameters φi. The
random effects accommodate individual deviations from the fixed effects.
Plotting the estimated random effects against the candidate covariates provides useful information for choosing covariates to include in the model.
First, we need to extract the estimated random effects from the fitted
model and combine them with the covariates. The ranef method accomplishes that.
> fm2CO2.nlmeRE <- ranef( fm2CO2.nlme, augFrame = T )
> fm2CO2.nlmeRE
         Asym        lrc        Type  Treatment conc uptake
Qn1   6.17160  0.0483563      Quebec nonchilled  435 33.229
Qn2  10.53264 -0.1728531      Quebec nonchilled  435 35.157
Qn3  12.21810 -0.0579930      Quebec nonchilled  435 37.614
Qc1   3.35213 -0.0755880      Quebec    chilled  435 29.971
Qc3   7.47431 -0.1924203      Quebec    chilled  435 32.586
Qc2   7.92855 -0.1803391      Quebec    chilled  435 32.700
Mn3  -4.07333  0.0334485 Mississippi nonchilled  435 24.114
Mn2  -0.14198  0.0056463 Mississippi nonchilled  435 27.343
Mn1   0.24056 -0.1938500 Mississippi nonchilled  435 26.400
Mc2 -18.79914  0.3193732 Mississippi    chilled  435 12.143
Mc3 -13.11688  0.2994393 Mississippi    chilled  435 17.300
Mc1 -11.78655  0.1667798 Mississippi    chilled  435 18.000
> class( fm2CO2.nlmeRE )
[1] "ranef.lme" "data.frame"
The augFrame argument, when TRUE, indicates that summary values for all
the variables in the data frame should be returned along with the random
effects. The summary values are calculated as in the gsummary function
(§3.4). When a covariate is constant within a group, such as Treatment and
Type in the CO2 data, its unique values per group are returned. Otherwise,
if the covariate varies within the group and is numeric, such as conc and
uptake in CO2, the group means are returned; if it is a categorical variable
(factor or ordered), the most frequent values (modes) within each group are
returned.
The plot method for the ranef.lme class is the most useful tool for identifying relationships between individual parameters and covariates. The form
argument is used to specify the desired covariates and plot type. A one-sided formula on the right-hand side, with covariates separated by the *
operator, results in a dotplot of the estimated random effects versus all
combinations of the unique values of the variables named in the formula.
This plot is particularly useful for a moderate number of categorical covariates (factor or ordered variables) with a relatively small number of levels,
as in the CO2 example.

> plot( fm2CO2.nlmeRE, form = ~ Type * Treatment )   # Figure 8.17
FIGURE 8.17. Dotplots of the estimated Asym and lrc random effects in fm2CO2.nlme versus all combinations of plant type and chilling treatment.
Asym_i = β1 + γ1 x1i + γ2 x2i + γ3 x1i x2i + b1i,      (8.8)

where β1 represents the average asymptotic uptake rate, γ1 and γ2 represent, respectively, the plant type and chilling treatment main effects,
and γ3 represents the plant type by chilling treatment interaction. The parameterization used for x1i and x2i in (8.8) is consistent with the default
parameterization for factors in S.
> contrasts( CO2$Type )
            [,1]
Quebec        -1
Mississippi    1
> contrasts( CO2$Treatment )
           [,1]
nonchilled   -1
chilled       1
The update method is used to fit the model with the covariate terms,
which are specified through the fixed argument.
Because the fixed-effects model has been reformulated, new starting values
must be provided. We use the previous estimates for β1, β2, and β3 and set
the initial values for γ1, γ2, and γ3 to zero. The fixed effects are represented
internally in nlme in the same order they appear in fixed.
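The update call itself is lost to the page break in this extraction; a sketch of what it plausibly looks like follows. The starting values are taken from the fm2CO2.nlme estimates programmatically rather than quoting the book's literal numbers, which are not recoverable here:

```r
## Hypothetical reconstruction -- the exact call in the book may differ.
fm2CO2.fix <- fixef( fm2CO2.nlme )            # estimates of beta1, beta2, beta3
fm3CO2.nlme <- update( fm2CO2.nlme,
    fixed = list(Asym ~ Type * Treatment, lrc + c0 ~ 1),
    start = c(fm2CO2.fix[1], 0, 0, 0,         # beta1, then gamma1..gamma3 at zero
              fm2CO2.fix[2:3]) )              # beta2 (lrc) and beta3 (c0)
```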
The summary method gives information about the significance of the individual fixed effects.
> summary( fm3CO2.nlme )
 . . .
     AIC    BIC  logLik
  393.68 417.98 -186.84

Random effects:
 Formula: list(Asym ~ 1, lrc ~ 1)
 Level: Plant
 Structure: General positive-definite
                  StdDev   Corr
Asym.(Intercept) 2.92980 Asym.(
             lrc 0.16373 -0.906
        Residual 1.84957

Fixed effects: list(Asym ~ Type * Treatment, lrc + c0 ~ 1)
                     Value Std.Error DF t-value p-value
   Asym.(Intercept) 32.447    0.9359 67  34.670  <.0001
          Asym.Type -7.108    0.5981 67 -11.885  <.0001
     Asym.Treatment -3.815    0.5884 67  -6.483  <.0001
Asym.Type:Treatment -1.197    0.5884 67  -2.033   0.046
                lrc -4.589    0.0848 67 -54.108  <.0001
                 c0 49.479    4.4569 67  11.102  <.0001
 . . .
The names of the fixed-effects terms include the parameter name. All fixed
effects introduced in the model to explain the variability in Asym are significantly different from zero at the 5% level, confirming the previous conclusions from Figure 8.17. The joint significance of the fixed effects introduced
in the model can be tested with the anova method.
> anova( fm3CO2.nlme, Terms = 2:4 )
F-test for: Asym.Type, Asym.Treatment, Asym.Type:Treatment
  numDF denDF F-value p-value
1     3    67  54.835  <.0001
As expected, the approximate F-test indicates that the added terms are
highly significant.
The inclusion of the experimental factors in the model resulted in a
reduction in the estimated standard deviation for the Asym random effects.
FIGURE 8.18. Dotplots of the estimated random effects in fm3CO2.nlme versus all combinations of plant type and chilling treatment.
(The elided output lists estimates for the expanded set of fixed effects:
Asym.(Intercept), Asym.Type, Asym.Treatment, Asym.Type:Treatment,
lrc.(Intercept), lrc.Type, lrc.Treatment, lrc.Type:Treatment, and c0.)
 . . .
The large p-value for the likelihood ratio test and the smaller AIC and BIC
values for fm5CO2.nlme indicate that no random effects are needed for lrc.
To test if a random effect is needed for the asymptotic uptake rate,
we need to fit a nonlinear fixed-effects model to the CO2 data. The nls
function can be used for that, though it is not designed to efficiently handle
parameters that are expressed as linear combinations of covariates. (The
gnls function, described in §8.3.3, is better suited for this type of model.)
To use nls, we must first create variables representing the contrasts of
interest.
> CO2$type <- 2 * (as.integer(CO2$Type) - 1.5)
> CO2$treatment <- 2 * (as.integer(CO2$Treatment) - 1.5)
FIGURE 8.19. CO2 uptake versus ambient CO2 concentration by plant, with the population predictions (level 0) and within-plant predictions (level 1) from fm5CO2.nlme.
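The nls fit itself falls in the elided portion of the page; a plausible sketch is shown below. It restates (8.8) with the hand-coded type and treatment contrasts. The parameter names and numeric starting values are illustrative placeholders in the neighborhood of the preceding nlme estimates, not values quoted from the book:

```r
## Hypothetical reconstruction -- names and starting values illustrative only.
fm1CO2.nls <- nls( uptake ~ SSasympOff(conc,
        Asym.Intercept + Asym.Type * type + Asym.Treatment * treatment +
        Asym.TypeTreatment * type * treatment,   # Asym as linear combination
        lrc, c0),
    data = CO2,
    start = c(Asym.Intercept = 32.4, Asym.Type = -7.1,
              Asym.Treatment = -3.8, Asym.TypeTreatment = -1.2,
              lrc = -4.6, c0 = 49.5) )
```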
The anova method can then be used to compare the models. (Note that the
nlme object must appear first in the calling sequence to anova, so that the
correct method is invoked.)
> anova( fm5CO2.nlme, fm1CO2.nls )
            Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm5CO2.nlme     1 11 387.06 413.79 -182.53
fm1CO2.nls      2 10 418.34 442.65 -199.17 1 vs 2  33.289  <.0001
The very significant p-value for the likelihood ratio test indicates that the
Asym.(Intercept) random effect is still needed in the model.
A final assessment of the quality of the fitted model is provided by the
plot of the augmented predictions included in Figure 8.19.
> plot( augPred(fm5CO2.nlme, level = 0:1),
+       layout = c(6,2) )                    # Figure 8.19
Note that the population average predictions vary with plant type
and chilling treatment.
Age (yr)                                  42-92
Glycoprotein concentration (mg/100 dL)    0.39-3.16
Body weight (kg)                          41-119
Congestive heart failure                  no/mild, moderate, severe
Creatinine clearance (ml/min)             < 50, >= 50
Ethanol abuse                             none, current, former
Height (in.)                              60-79
Race                                      Caucasian, Latin, Black
Smoking status                            no, yes
For a dosage time t at which a dose d is received, with t' denoting the previous dosage time, the expected quinidine concentration Ct and the apparent concentration Ca_t in the absorption compartment are modeled recursively as

Ct = Ct' exp[-ke (t - t')] + Ca_t' [ka / (ka - ke)] {exp[-ke (t - t')] - exp[-ka (t - t')]},      (8.9)

Ca_t = Ca_t' exp[-ka (t - t')] + d/V.      (8.10)

Finally, for a between-dosages time t, the model for the expected concentration Ct, given that the last dose was received at time t', is identical
to (8.9).
Using the fact that the elimination rate constant ke is equal to the ratio
between the clearance (Cl) and the volume of distribution (V), we can
reparameterize models (8.9) and (8.10) in terms of V, ka, and Cl.
To ensure that the estimates of V, ka, and Cl are positive, we can rewrite
models (8.9) and (8.10) in terms of lV = log(V), lKa = log(ka), and
lCl = log(Cl).
The initial conditions for the recursive models (8.9) and (8.10) are C0 = 0
and Ca0 = d0/V, with d0 denoting the initial dose received by the patient. It is assumed in the models' definition that the bioavailability of the
drug (the percentage of the administered dose that reaches the measurement compartment) is equal to one.
The function quinModel in the nlme library implements the recursive
models (8.9) and (8.10) in S, parameterized in terms of lV, lKa, and lCl.
This is not a self-starting model, so initial values for the fixed effects need
to be provided when calling nlme. We used values reported in the literature
as starting estimates for the fixed effects.
Preliminary analyses of the data, without using any covariates to explain intersubject variation, indicate that only lCl and lV need random
effects to account for their variability in the patient population, and that
the corresponding random effects can be assumed to be independent. The
corresponding model for the fixed and random effects is
lCl_i = β1 + b1i,   lV_i = β2 + b2i,   lKa_i = β3,

b_i = [b1i, b2i]ᵀ ~ N(0, diag(σ1², σ2²)).      (8.11)
The na.action and naPattern arguments in this call to nlme are described
in §6.4.
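The nlme call referred to here is elided by the page break; a sketch consistent with model (8.11) follows. The quinModel argument order and the starting values are assumptions for illustration, not the book's exact call:

```r
## Hypothetical reconstruction -- argument order and starting values
## are illustrative only.
fm1Quin.nlme <- nlme( conc ~ quinModel(Subject, time, conc, dose,
                                       interval, lV, lKa, lCl),
    data = Quinidine,
    fixed = lV + lKa + lCl ~ 1,
    random = pdDiag(lV + lCl ~ 1),      # independent random effects, as in (8.11)
    groups = ~ Subject,
    start = list(fixed = c(5, -0.5, 2)),  # placeholder literature-style values
    na.action = na.include,
    naPattern = ~ !is.na(conc) )
```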
To investigate which covariates may account for patient-to-patient variation in the pharmacokinetic parameters, we first extract the estimated
random effects, augmented with summary values for the available covariates (the modal value is used for time-varying factors and the mean for
time-varying numeric variables).
> fm1Quin.nlmeRE <- ranef( fm1Quin.nlme, aug = T )
> fm1Quin.nlmeRE[1:3,]
            lV        lCl  time    conc dose interval Weight      Race
109  0.0005212 -0.0028369 61.58 0.50000   NA       NA     58 Caucasian
70   0.0362214  0.3227614  1.50 0.60000   NA       NA     75 Caucasian
23  -0.0254211  0.4402551 91.14 0.56667   NA       NA    108 Caucasian
    Smoke Ethanol   Heart Creatinine Age Height   glyco
109    no    none No/Mild      >= 50  70     67 0.46000
70     no  former No/Mild      >= 50  68     69 1.15000
23    yes    none No/Mild      >= 50  75     72 0.83667
The dotplot displays used to visualize the relationships between the estimated random effects and the covariates in the CO2 example do not scale
up to the quinidine data's larger set of covariates, several of which are continuous; scatter plots of the estimated random effects against each covariate are used instead.
The resulting plot, shown in Figure 8.20, indicates that clearance decreases with glycoprotein concentration and age, and increases with creatinine clearance and weight. There is also evidence that clearance decreases
with severity of congestive heart failure and is smaller in Blacks than in
both Caucasians and Latins. The glycoprotein concentration is clearly the
most important covariate for explaining the lCl interindividual variation.
A straight line seems adequate to model the observed relationship.
Figure 8.21 presents the plots of the estimated lV random effects versus
the available covariates. None of the covariates seems helpful in explaining
the variability of this random effect and we do not pursue the modeling of
its variability any further.
Initially, only the glycoprotein concentration is included in the model to
explain the lCl random-effect variation according to a linear model. This
modification of model (8.11) is accomplished by writing

lCl_ij = (β1 + b1i) + β2 glyco_ij.      (8.12)
Because the glycoprotein concentration may change with time on the same
patient, the random effects for lCl need to be indexed by both patient i
and time j. We fit the mixed-effects model corresponding to (8.12) with
> fm1Quin.fix <- fixef( fm1Quin.nlme )
> fm2Quin.nlme <- update( fm1Quin.nlme,
+       fixed = list(lCl ~ glyco, lKa + lV ~ 1),
+       start = c(fm1Quin.fix[3], 0, fm1Quin.fix[2:1]) )
> summary( fm2Quin.nlme )
 . . .
FIGURE 8.20. Estimated lCl random effects from fm1Quin.nlme versus each of the quinidine covariates (glyco, Creatinine, Heart, Weight, Race, Height, Age, Smoke, Ethanol).
Random effects:
 Formula: list(lV ~ 1, lCl ~ 1)
 Level: Subject
 Structure: Diagonal
             lV lCl.(Intercept) Residual
StdDev: 0.26764         0.27037  0.63746

Fixed effects: list(lCl ~ glyco, lKa + lV ~ 1)
                  Value Std.Error  DF t-value p-value
lCl.(Intercept)  3.1067   0.06473 222  47.997  <.0001
      lCl.glyco -0.4914   0.04263 222 -11.527  <.0001
            lKa -0.6662   0.30251 222  -2.202  0.0287
             lV  5.3085   0.10244 222  51.818  <.0001
 . . .
FIGURE 8.21. Estimated lV random effects from fm1Quin.nlme versus each of the quinidine covariates.
The estimated lCl.glyco fixed effect is very significant, indicating that the
glycoprotein concentration should be kept in the model.
To search for further variables to include in the model, we consider the
plots of the estimated lCl.(Intercept) random effects from the fm2Quin.nlme
fit versus the covariates, presented in Figure 8.22.
These plots indicate that the estimated lCl.(Intercept) random effects
increase with creatinine clearance, weight, and height, decrease with age
and severity of congestive heart failure, and are smaller in Blacks than
in Caucasians and Latins. The most relevant variable appears to be the
creatinine clearance, which is included in the model as a binary variable
taking value 0 when creatinine is < 50 and 1 when creatinine is >= 50.
> options( contrasts = c("contr.treatment", "contr.poly") )
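The fit adding the creatinine clearance indicator is elided by the page break; a sketch of a plausible call follows. The object name fm3Quin.nlme matches the stepwise narrative, but the start construction is an assumption, not quoted from the book:

```r
## Hypothetical reconstruction of the elided fit: add the creatinine
## clearance indicator to the clearance model, starting its coefficient
## at zero and the rest at the previous estimates.
fm2Quin.fix <- fixef( fm2Quin.nlme )
fm3Quin.nlme <- update( fm2Quin.nlme,
    fixed = list(lCl ~ glyco + Creatinine, lKa + lV ~ 1),
    start = c(fm2Quin.fix[1:2], 0, fm2Quin.fix[3:4]) )
```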
FIGURE 8.22. Estimated lCl.(Intercept) random effects from fm2Quin.nlme versus each of the quinidine covariates.
The final model produced by this stepwise model-building approach includes an extra term for the patient's body weight to explain the clearance
variation. The corresponding model for the log-clearance is expressed as

lCl_ij = (β1 + b1i) + β4 glyco_ij + β5 Creatinine_ij + β6 Weight_ij      (8.13)

and is fit in S with
> fm3Quin.fix <- fixef( fm3Quin.nlme )
> fm4Quin.nlme <- update( fm3Quin.nlme,
+       fixed = list(lCl ~ glyco + Creatinine + Weight, lKa + lV ~ 1),
+       start = c(fm3Quin.fix[1:3], 0, fm3Quin.fix[4:5]) )
> summary( fm4Quin.nlme )
 . . .
Random effects:
 Formula: list(lV ~ 1, lCl ~ 1)
 Level: Subject
 Structure: Diagonal
             lV lCl.(Intercept) Residual
StdDev: 0.28154         0.24128  0.63083

Fixed effects: list(lCl ~ glyco + Creatinine + Weight, lKa + lV ~ 1)
                   Value Std.Error  DF t-value p-value
lCl.(Intercept)   2.7883   0.15167 220  18.384  <.0001
      lCl.glyco  -0.4645   0.04100 220 -11.328  <.0001
 lCl.Creatinine   0.1373   0.03264 220   4.207  <.0001
     lCl.Weight   0.0031   0.00180 220   1.749  0.0816
            lKa  -0.7974   0.29959 220  -2.662  0.0083
             lV   5.2833   0.10655 220  49.587  <.0001
 . . .
The lCl.Weight coefficient is not significant at the 5% level, but it is significant at a less conservative 10% level. Given the high level of noise and the
small number of observations per patient in the quinidine data, we considered a p-value of 8.2% to be small enough to justify the inclusion of Weight
in the model.
As reported in previous analyses of the quinidine data (Davidian and
Giltinan, 1995, §9.3), there is evidence that the variability in the concentration measurements increases with the quinidine concentration. We
postpone the investigation of heteroscedasticity in the quinidine data until
§8.3.1, when we describe the use of variance functions in nlme.
we describe how to fit and analyze them with the nlme library. The Wafer
example of §4.2.3 is used to illustrate the various methods available in nlme
for multilevel NLME models.
As in the linear case, the only difference between a single-level and a
multilevel fit in nlme is in the specification of the random argument. In the
multilevel case, random must provide information about the nested grouping
structure and the random-effects model at each level of grouping. This is
generally accomplished by specifying random as a named list, with names
corresponding to the grouping factors. The order of nesting is assumed to
be the same as the order of the elements in the list, with the outermost
grouping factor appearing first and the innermost grouping factor appearing last. Each element of the random list has the same structure as random
in a single-level call: it can be a formula, a list of formulas, a pdMat object,
or a list of pdMat objects. Most of the nlme extractors, such as resid and
ranef, include a level argument indicating the desired level(s) of grouping.
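As an illustration of this named-list convention, a two-level call might look like the following sketch; the model function f and the data object are hypothetical, not from the book:

```r
## Hypothetical two-level fit: Wafer is the outermost grouping factor,
## Site is nested within Wafer; list names identify the levels.
fit <- nlme( y ~ f(x, A, B),
    fixed = A + B ~ 1,
    random = list( Wafer = A + B ~ 1,            # general pos.-definite at wafer level
                   Site  = pdDiag(A + B ~ 1) ),  # independent effects within wafer
    data = exampleData )
```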
Manufacturing of Analog MOS Circuits
A multilevel linear mixed-effects analysis of the Wafer data is presented in
§4.2.3 to illustrate the multilevel capabilities of lme. The final multilevel
model obtained in that section to represent the intensity of current y_ijk at
the kth level of voltage v_k in the jth site within the ith wafer is expressed,
for i = 1, ..., 10, j = 1, ..., 8, and k = 1, ..., 5, as
y_ijk = (β0 + b0i + b0i,j) + (β1 + b1i + b1i,j) v_k + (β2 + b2i + b2i,j) v_k²
        + β3 cos(ω v_k) + β4 sin(ω v_k) + ε_ijk,

b_i = [b0i, b1i, b2i]ᵀ ~ N(0, Ψ1),   b_i,j = [b0i,j, b1i,j, b2i,j]ᵀ ~ N(0, Ψ2),

ε_ijk ~ N(0, σ²),      (8.14)
where β0, β1, and β2 are the fixed effects in the quadratic model, β3 and
β4 are the fixed effects for the cosine wave of frequency ω, b_i is the wafer-level random-effects vector, b_i,j is the site-within-wafer-level random-effects
vector, and ε_ijk is the within-group error. The b_i are assumed to be independent for different i, the b_i,j are assumed to be independent for different i, j and independent of the b_i, and the ε_ijk are assumed to be independent for different i, j, k and independent of the random effects. The
wafer-level variance-covariance matrix Ψ1 is general positive-definite and
the site-within-wafer-level matrix Ψ2 is block-diagonal, with a 1 × 1 block
corresponding to the variance of b0i,j and a 2 × 2 block corresponding to the
variance-covariance matrix of [b1i,j, b2i,j]ᵀ.
Because its model function is nonlinear in the frequency ω, model (8.14)
is actually an example of a multilevel nonlinear mixed-effects model. It
(8.16)

b_i = [b0i, b1i, b2i, b3i, b4i]ᵀ ~ N(0, Ψ1),
b_i,j = [b0i,j, b1i,j, b2i,j, b3i,j]ᵀ ~ N(0, Ψ2),
ε_ijk ~ N(0, σ²).      (8.17)
Because of the large number of random effects at each grouping level, to
make the estimation problem more numerically stable, we make the simplifying assumption that the random effects are independent. That is, we
assume that Ψ1 and Ψ2 are diagonal matrices.
Even though models (8.15) and (8.16) are not nested, they can be compared
using information criterion statistics. The anova method can be used for
that, but we must first obtain a maximum likelihood fit of model (8.15).
> fm1Wafer.nlme <- update( fm1Wafer.nlmeR, method = "ML" )
> anova( fm1Wafer.nlme, fm2Wafer.nlme, test = F )
              Model df     AIC     BIC logLik
fm1Wafer.nlme     1 16 -1503.9 -1440.1 767.96
fm2Wafer.nlme     2 15 -1502.9 -1443.0 766.44
The more conservative BIC favors the model with fewer parameters,
fm2Wafer.nlme, while the more liberal AIC favors the model with the larger
log-likelihood, fm1Wafer.nlme.
The intervals method is used to obtain confidence intervals on the fixed
effects and the variance components.
> intervals( fm2Wafer.nlme )
Approximate 95% confidence intervals
Fixed effects:
                  lower     est.    upper
A.(Intercept)  -4.35017 -4.26525 -4.18033
A.voltage       5.40994  5.63292  5.85589
A.I(voltage^2)  1.22254  1.25601  1.28948
B              -0.14403 -0.14069 -0.13735
w               4.58451  4.59366  4.60281
Random Effects:
  Level: Wafer
                        lower      est.    upper
sd(A.(Intercept))   0.0693442 0.1331979 0.255849
sd(A.voltage)       0.1715436 0.3413426 0.679214
sd(A.I(voltage^2))  0.0222685 0.0482433 0.104516
sd(B)               0.0022125 0.0048037 0.010430
sd(w)               0.0078146 0.0146290 0.027385
  Level: Site
                        lower      est.     upper
sd(A.(Intercept))   0.0664952 0.0840825 0.1063214
sd(A.voltage)       0.2442003 0.3087244 0.3902973
sd(A.I(voltage^2))  0.0532046 0.0672589 0.0850257
sd(B)               0.0053128 0.0067264 0.0085162
The fixed effects and the within-group standard error are estimated with
more relative precision than the random-effects variance components. Among
the random-effects variance components, the site-within-wafer standard
deviations are estimated with greater precision than the wafer standard
deviations.
The plot of the within-group residuals versus voltage by wafer, displayed
in Figure 8.23, and produced with
> plot( fm2Wafer.nlme, resid(.) ~ voltage | Wafer,
+
panel = function(x, y, ...) {
+
panel.grid()
+
panel.xyplot(x, y)
+
panel.loess(x, y, lty = 2)
+
panel.abline(0, 0)
+
} )
# Figure 8.23
does not reveal any periodic patterns as observed, for example, in Figure 4.27, indicating that the inclusion of a random effect for ω successfully
accounted for variations in frequency among wafers.
FIGURE 8.23. Within-group residuals (mA) versus voltage (V) by wafer for fm2Wafer.nlme.
The normal plots of the within-group residuals and of the estimated site-within-wafer random effects, not shown here, do not indicate any violations
of the NLME model assumptions.
example of §8.2.1 and the quinidine example of §8.2.2 to illustrate the use
of variance functions in nlme.
Theophylline Kinetics
The plot of the standardized residuals versus the fitted values for the fitted
object fm3Theo.nlme, displayed in Figure 8.13, suggests that the within-group variance increases with the concentration of theophylline.
The definition of the first-order open-compartment model (8.2) implies
that the fitted value for the concentration at time t = 0 is ĉ0 = 0. Therefore, a power variance function, a natural candidate for this type of heteroscedastic pattern, cannot be used in this example, as the corresponding
weights are undefined at t = 0. (Davidian and Giltinan (1995, §5.5, p. 145)
argue that the observations at t = 0 do not add any information for the
model and should be omitted from the data. We retain them here for illustration.) The constant plus power variance function, described in §5.2 and
represented in the nlme library by the varConstPower class, accommodates
the problem with ĉ0 = 0 by adding a constant to the power of the fitted
value. In the theophylline example, the varConstPower variance function is
expressed as g(ĉ_ij, δ) = δ1 + ĉ_ij^δ2. We incorporate it in the nlme fit using
> fm4Theo.nlme <- update( fm3Theo.nlme,
+
weights = varConstPower(power = 0.1) )
> fm4Theo.nlme
Nonlinear mixed-effects model fit by maximum likelihood
  Model: conc ~ SSfol(Dose, Time, lKe, lKa, lCl)
  Data: Theoph
  Log-likelihood: -167.68
  Fixed: list(lKe ~ 1, lKa ~ 1, lCl ~ 1)
      lKe     lKa     lCl
  -2.4538 0.43348 -3.2275

Random effects:
 Formula: list(lKa ~ 1, lCl ~ 1)
 Level: Subject
 Structure: Diagonal
           lKa     lCl Residual
StdDev: 0.6387 0.16979   0.3155

Variance function:
 Structure: Constant plus power of variance covariate
 Formula: ~ fitted(.)
 Parameter estimates:
   const   power
 0.71966 0.31408
An initial value for the power parameter (δ2) is specified in the call to the
varConstPower constructor to avoid convergence problems associated with
the default value power = 0 in this example.
FIGURE 8.24. Scatter plot of standardized residuals versus fitted values for fm4Theo.nlme.
The small p-value for the likelihood ratio test indicates that the incorporation of the variance function in the model produced a significant increase in
the log-likelihood. Both the AIC and the BIC also favor the fm4Theo.nlme
fit. The plot of the standardized residuals versus the fitted values, shown in
Figure 8.24, confirms the adequacy of the varConstPower variance function.
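The likelihood ratio test referred to here is the usual anova comparison of the two nested fits; its output is elided in the source, so only the call is sketched:

```r
## Compare the homoscedastic fit with the varConstPower fit.
anova( fm3Theo.nlme, fm4Theo.nlme )
```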
> plot( fm4Theo.nlme )
# Figure 8.24
FIGURE 8.25. Scatter plot of standardized residuals versus fitted values for fm4Quin.nlme.
the calculation of weights for the power variance function and we choose
the varPower class to model the within-group heteroscedasticity.
> fm5Quin.nlme <- update( fm4Quin.nlme, weights = varPower() )
> summary( fm5Quin.nlme )
. . .
Random effects:
 Formula: list(lV ~ 1, lCl ~ 1)
 Level: Subject
 Structure: Diagonal
             lV lCl.(Intercept) Residual
StdDev: 0.32475         0.25689  0.25548

Variance function:
 Structure: Power of variance covariate
 Formula: ~ fitted(.)
 Parameter estimates:
   power
 0.96616

Fixed effects: list(lCl ~ glyco + Creatinine + Weight, lKa + lV ~ 1)
                   Value Std.Error  DF t-value p-value
lCl.(Intercept)   2.7076   0.15262 220  17.741  <.0001
      lCl.glyco  -0.4110   0.04487 220  -9.161  <.0001
 lCl.Creatinine   0.1292   0.03494 220   3.696  0.0003
     lCl.Weight   0.0033   0.00179 220   1.828  0.0689
            lKa  -0.4269   0.25518 220  -1.673  0.0958
             lV   5.3700   0.08398 220  63.941  <.0001
 . . .
FIGURE 8.26. Scatter plot of standardized residuals versus fitted values for fm5Quin.nlme.
The incorporation of the power variance function in the NLME model for
the quinidine data produced a significant increase in the log-likelihood, as
evidenced by the small p-value for the likelihood ratio statistic. The plot
of the standardized residuals versus fitted values, displayed in Figure 8.26,
gives further evidence of the adequacy of the variance function model.
> plot( fm5Quin.nlme, xlim = c(0, 6.2) )
# Figure 8.26
y_ij = φ0i + φ1i sin(2π φ2i t_ij) + φ3i cos(2π φ2i t_ij) + ε_ij,      (8.18)

where y_ij represents the number of follicles observed for mare i at time t_ij;
φ0i, φ1i, and φ3i represent the intercept and the terms defining the amplitude and the phase of the cosine wave for mare i; φ2i is the frequency of the
cosine wave for mare i; and ε_ij is the within-group error. We initially assume
that the within-group errors are independently distributed as N(0, σ²).
The final LME model used to fit the Ovary data in §5.3.4 used independent random effects for the φ0i and φ1i coefficients in model (8.18). We
use this as a starting point for the random-effects model, incorporating
an extra independent random effect for the frequency φ2i. The fixed- and
random-effects models corresponding to (8.18) are then expressed as

φ0i = β0 + b0i,   φ1i = β1 + b1i,   φ2i = β2 + b2i,   φ3i = β3,

b_i = [b0i, b1i, b2i]ᵀ ~ N(0, Ψ).      (8.19)
The estimated fixed effects for fm5Ovar.lme are used as initial values for
β0, β1, and β3.
As mentioned in §5.3.4, the observations in the Ovary data were collected
at equally spaced calendar times, and then converted to an ovulation cycle
scale. Therefore, the empirical ACF can be used to investigate the correlation at different lags. The ACF method can also be used with nlme objects.
> ACF( fm1Ovar.nlme )
   lag        ACF
1    0  1.0000000
2    1  0.3110027
3    2  0.0887701
4    3 -0.0668554
5    4 -0.0314934
6    5 -0.0810381
7    6 -0.0010647
8    7  0.0216463
9    8  0.0137578
10   9  0.0097497
11  10 -0.0377027
12  11 -0.0741284
13  12 -0.1504872
14  13 -0.1616297
15  14 -0.2395797
> plot( ACF(fm1Ovar.nlme, maxLag = 10),
+       alpha = 0.05 )                       # Figure 8.27
Figure 8.27 shows that only the lag-1 autocorrelation is significant at the
5% level, but the lag-2 autocorrelation, which is approximately equal to the
square of the lag-1 autocorrelation, is nearly significant. This suggests two
different candidate correlation structures for modeling the within-group
error covariance structure: AR(1) and MA(2). The two correlation models
are not nested, but can be compared using the information criteria
provided by the anova method, AIC and BIC. The empirical lag-1 autocorrelation is used as a starting value for the corAR1 coefficient.
> fm2Ovar.nlme <- update( fm1Ovar.nlme, corr = corAR1(0.311) )
> fm3Ovar.nlme <- update( fm1Ovar.nlme, corr = corARMA(p=0, q=2) )
> anova( fm2Ovar.nlme, fm3Ovar.nlme, test = F )
             Model df    AIC    BIC  logLik
fm2Ovar.nlme     1  9 1568.3 1601.9 -775.15
fm3Ovar.nlme     2 10 1572.1 1609.4 -776.07
FIGURE 8.27. Empirical autocorrelation function corresponding to the standardized residuals of the fm1Ovar.nlme fitted object.
The AR(1) model uses one fewer parameter than the MA(2) model to attain
a larger log-likelihood and hence is preferred by both AIC and BIC.
The approximate 95% confidence intervals for the variance components
in fm2Ovar.nlme, obtained with

> intervals( fm2Ovar.nlme )
 . . .
Random Effects:
  Level: Mare
           lower      est.      upper
sd(A) 1.5465e+00 3.3083316 7.0772e+00
sd(B) 3.4902e-01 1.4257894 5.8245e+00
sd(w) 2.4457e-89 0.0020967 1.7974e+83
 . . .
The high p-value for the likelihood ratio test suggests that the two models
give essentially equivalent fits, so the simpler model is preferred.
FIGURE 8.28. Empirical autocorrelation function corresponding to the normalized residuals of the fm5Ovar.nlme fitted object.
The ARMA(1, 1) model gives a significantly better representation of the
within-group correlation, as indicated by the small p-value for the likelihood ratio
test.
The plot of the empirical ACF of the normalized residuals, displayed
in Figure 8.28, attests to the adequacy of the ARMA(1, 1) model for the
Ovary data. No significant autocorrelations are detected, indicating that the
normalized residuals behave like uncorrelated noise, as expected under the
appropriate correlation model.
> plot( ACF(fm5Ovar.nlme, maxLag = 10),
+       alpha = 0.05 )                       # Figure 8.28
                          lower    est.   upper
       sd((Intercept)) 0.988120 2.45659  6.1074
sd(sin(2 * pi * Time)) 0.071055 0.85199 10.2158
 . . .
The wide confidence interval for the standard deviation of the random
effect corresponding to sin(2*pi*Time) indicates that the fit is not very
sensitive to the value of this coefficient and perhaps it could be eliminated
from the model. We test this assumption using the likelihood ratio test.
> fm6Ovar.lmeML <- update( fm5Ovar.lmeML, random = ~1 )
> anova( fm5Ovar.lmeML, fm6Ovar.lmeML )
              Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm5Ovar.lmeML     1  8 1562.7 1592.6 -773.37
fm6Ovar.lmeML     2  7 1561.0 1587.1 -773.51 1 vs 2 0.28057  0.5963
The large p-value for the test indicates that the two models are essentially
equivalent, so the simpler model with a single random intercept is preferred.
The LME model represented by fm6Ovar.lmeML is nested within the NLME
model represented by fm5Ovar.nlme, corresponding to the case β2 = 1.
Hence, we can test the assumption that the frequency of the ovulation cycle
is equal to 1 using the likelihood ratio test.
> anova( fm6Ovar.lmeML, fm5Ovar.nlme )
              Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm6Ovar.lmeML     1  7 1561.0 1587.1 -773.51
fm5Ovar.nlme      2  8 1562.1 1592.0 -773.07 1 vs 2 0.87881  0.3485
There is no significant evidence that β2 differs from 1. This conclusion is also supported by the approximate confidence interval for β2, which contains 1.
> intervals( fm5Ovar.nlme, which = "fixed" )
Approximate 95% confidence intervals

 Fixed effects:
     lower     est.     upper
A 10.37917 12.15566 13.932151
B -3.91347 -2.87191 -1.830353
C -3.07594 -1.56879 -0.061645
w  0.81565  0.93111  1.046568
an asymptotic regression model with an offset, identical to the one used for
the CO2 uptake data in §8.2.2. The model for the expected ultrafiltration
rate y at transmembrane pressure x is written as

E[y] = φ1 {1 - exp[-exp(φ2)(x - φ3)]}.      (8.20)

The coefficient estimates suggest that Asym (φ1) and lrc (φ2) depend on
the blood flow rate level, but c0 (φ3) does not. The plot of the individual
FIGURE 8.29. Ninety-five percent confidence intervals on the asymptotic regression model parameters for each level of blood flow rate (QB) in the dialyzer data.
confidence intervals in Figure 8.29 confirms that only Asym and lrc vary
with blood flow level.
> plot( intervals(fm1Dial.lis) )
# Figure 8.29
The ultrafiltration rate y_ij at the jth transmembrane pressure x_ij for
the ith subject is represented by the nonlinear model

y_ij = (φ1 + γ1 Q_i) {1 - exp[-exp(φ2 + γ2 Q_i)(x_ij - φ3)]} + ε_ij,      (8.21)

where Q_i is a binary variable taking value -1 for 200 dl/min hemodialyzers and 1 for 300 dl/min hemodialyzers; φ1, φ2, and φ3 are, respectively,
the asymptotic ultrafiltration rate, the log-transport rate, and the transmembrane pressure offset averaged over the levels of Q; γi is the blood flow
effect associated with the coefficient φi; and the ε_ij are the error terms, initially
assumed to be independently distributed N(0, σ²) random variables.
The nonlinear model (8.21) can be fitted with nls, but it is easier to
express the dependency of the asymptote and the log-rate on the blood
flow rate using gnls. The averages of the fm1Dial.lis coefficients are used
as the initial estimates for φ1, φ2, and φ3, while half the differences between
the first two coefficients are used as initial estimates for γ1 and γ2.
> fm1Dial.gnls <- gnls( rate ~ SSasympOff(pressure, Asym, lrc, c0),
+
data = Dialyzer, params = list(Asym + lrc ~ QB, c0 ~ 1),
+
start = c(53.6, 8.6, 0.51, -0.26, 0.225) )
> fm1Dial.gnls
Generalized nonlinear least squares fit
Model: rate ~ SSasympOff(pressure, Asym, lrc, c0)
Data: Dialyzer
Log-likelihood: -382.65
Coefficients:
 Asym.(Intercept) Asym.QB lrc.(Intercept)   lrc.QB      c0
           53.606    8.62         0.50874 -0.25684 0.22449
Degrees of freedom: 140 total; 135 residual
Residual standard error: 3.7902
 . . .
                 t value
Asym.(Intercept) 75.9937
Asym.QB          12.6906
lrc.(Intercept)   9.2105
lrc.QB           -5.7047
c0               21.1318
 . . .
21.1318
indicates that the error variability increases with the transmembrane pressure. This heteroscedastic pattern is also observed in the linear model ts
of the hemodialyzer data, presented in 5.2.2 and 5.4.
As in the previous analyses of the hemodialyzer data presented in §5.2.2
and §5.4, the power variance function, represented in nlme by the varPower
class, is used to model the heteroscedasticity in the ultrafiltration rates.
> fm2Dial.gnls <- update( fm1Dial.gnls,
+
weights = varPower(form = ~ pressure) )
> anova( fm1Dial.gnls, fm2Dial.gnls )
             Model df    AIC    BIC  logLik   Test L.Ratio p-value
fm1Dial.gnls     1  6 777.29 794.94 -382.65
fm2Dial.gnls     2  7 748.47 769.07 -367.24 1 vs 2  30.815  <.0001
FIGURE 8.30. Plot of residuals (ml/hr) versus transmembrane pressure for the homoscedastic fitted object fm1Dial.gnls.
The plot of the standardized residuals versus pressure, shown in Figure 8.31, indicates that the power variance
function successfully models the heteroscedasticity in the data.
The hemodialyzer ultrafiltration rate measurements made sequentially
on the same subject are correlated. Random effects can be used in an
NLME model to account for the within-group correlation, but we choose
here to model the within-subject dependence directly by incorporating a
correlation structure for the error term in the gnls model. Because the
measurements are equally spaced in time, the empirical autocorrelation
function can be used to investigate the within-subject correlation. The ACF
method is used to obtain the empirical ACF, with the time covariate and
the grouping factor specified via the form argument.
> ACF( fm2Dial.gnls, form = ~ 1 | Subject )
  lag      ACF
1   0  1.00000
2   1  0.71567
3   2  0.50454
4   3  0.29481
5   4  0.20975
6   5  0.13857
7   6 -0.00202
The empirical ACF values confirm the within-group correlation and indicate that the correlation decreases with lag. As usual, it is more informative to look at a plot of the empirical ACF, displayed in Figure 8.32 and
obtained with
> plot( ACF(fm2Dial.gnls, form = ~ 1 | Subject) )        # Figure 8.32

FIGURE 8.31. Standardized residuals versus transmembrane pressure for the fm2Dial.gnls fitted object.
FIGURE 8.32. Empirical autocorrelation function corresponding to the standardized residuals of the fm2Dial.gnls fitted object.
The lag-1 empirical autocorrelation is used as the initial value for the corAR1 coefficient.
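The fit itself is elided here; a sketch consistent with the surrounding text follows, where the 0.716 starting value is the lag-1 empirical autocorrelation from the ACF output above:

```r
## Hypothetical reconstruction of the elided fit: add an AR(1)
## within-subject correlation structure to the heteroscedastic model.
fm3Dial.gnls <- update( fm2Dial.gnls,
    corr = corAR1(0.716, form = ~ 1 | Subject) )
```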
The variability in the estimates is assessed with the intervals method.
> intervals( fm3Dial.gnls )
. . .
Correlation structure:
      lower   est.   upper
Phi 0.55913 0.7444 0.85886
. . .
The plot of the empirical ACF for the normalized residuals corresponding to fm3Dial.gnls, displayed in Figure 8.33, does not show any significant correlations, indicating that the AR(1) adequately represents the within-subject dependence in the gnls model for the hemodialyzer data.
The plot of the standardized residuals versus the transmembrane pressure in Figure 8.34 suggests a certain lack of fit for the asymptotic regression model (8.21): the residuals for the highest two transmembrane pressures are predominantly negative. This is consistent with the plot of the hemodialyzer data, shown in Figure 5.1, and also with Figure 3 in Vonesh and Carter (1992), which indicate that, for many subjects, the ultrafiltration rates decrease for the highest two transmembrane pressures. The asymptotic regression model is monotonically increasing in the transmembrane pressure and cannot properly accommodate the nonmonotonic behavior of the ultrafiltration rates.
[Figure 8.33 appears here: empirical autocorrelation function of the normalized residuals from fm3Dial.gnls. Figure 8.34 appears here: standardized residuals versus transmembrane pressure for fm3Dial.gnls.]
As the models are not nested, only the information criterion statistics can
be compared.
> anova(
The more conservative BIC favors the fm3Dial.gnls model because of the smaller number of parameters it uses; the more liberal AIC favors fm3Dial.glsML because of its larger log-likelihood value. In practice, the choice of the best model should take into account other factors besides the information criteria, such as the interpretability of the parameters.
Exercises
1. Plots of the DNase data in §3.3.2 and in Appendix A.7 suggest a sigmoidal relationship between the density and log(conc). Because there are no data points at sufficiently high DNase concentrations to define the upper part of the curve, the sigmoidal relationship is only suggested and is not definite.
A common model for this type of assay data is the four-parameter
logistic model available as SSfpl (Appendix C.6).
(a) Create separate fits of the SSfpl model to each run using nlsList. Note that the display formula, density ~ conc | Run, defines the primary covariate as conc but the model should be fit as a function of log(conc). You will either need to give explicit arguments to the SSfpl function or define a new groupedData object with a formula based on log(conc).
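For example, the second option could be set up along these lines (the object names DNase.log and fm1DNase.lis are illustrative, not from the text):

> DNase.log <- groupedData( density ~ log(conc) | Run,
+                           data = as.data.frame(DNase) )
> fm1DNase.lis <- nlsList( SSfpl, data = DNase.log )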
(b) Examine the plot of the residuals versus the fitted values from this nlsList fit. Does the plot indicate nonconstant variance?
(c) Fit an NLME model to these data using random effects for each model parameter and a general positive-definite Ψ. Perform the usual checks on the model using diagnostic plots, summaries, and intervals. Can the confidence intervals on the variance-covariance parameters be evaluated?
(d) Davidian and Giltinan (1995, §5.2.4) conclude from a similar analysis that the variance of the εij increases with increasing optical density. They estimate a variance function that corresponds to the varPower variance function. Update the previous fit by adding weights = varPower(). Does this updated model converge? If not, change the definition of the random effects so that Ψ is diagonal.
(e) Compare the model fit with the varPower variance function to one fit without it. Note that if you have fit the model with the variance function by imposing a diagonal Ψ, you should refit the model without the variance function but with a diagonal Ψ before comparing. Does the addition of the varPower variance function make a significant contribution to the model?
(f) Examine the confidence intervals on your best-fitting model to this point. Are there parameters that can be modeled as purely fixed effects? (In our analysis the standard deviation of the random effect for the xmid parameter turned out to be negligible.) If you determine that some random effects are no longer needed, remove them and refit the model.
(g) If you have modified the model to reduce the number of random effects, refit with a general positive-definite Ψ rather than a diagonal Ψ. Does this fit converge? If so, compare the fits with diagonal and general positive-definite Ψ.
(h) Write a report on the analysis, including data plots and diagnostic plots where appropriate.
2. As shown in Figure 3.4 (p. 107), the relationship between deltaBP and log(dose) in the phenylbiguanide data PBG is roughly sigmoidal. There is a strong indication that the effect of the Treatment is to shift the curve to the right.
(a) Fit separate four-parameter logistic models (SSfpl, Appendix C.6) to the data from each Treatment within each Rabbit using nlsList. Recall from §3.2.1 that the primary covariate in the display formula for PBG is dose but we want the model to be fit as a function of log(dose). You will either need to specify the model explicitly or to redefine the display formula for the data. Also note that the grouping formula should be ~ Rabbit/Treatment, but the grouping in the display formula is ~ Rabbit.
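For example, the display formula could be redefined along these lines (the object names PBG.log and fm1PBG.lis are illustrative):

> PBG.log <- groupedData( deltaBP ~ log(dose) | Rabbit/Treatment,
+                         data = as.data.frame(PBG) )
> fm1PBG.lis <- nlsList( SSfpl, data = PBG.log )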
(b) Plot the confidence intervals on the coefficients. Which parameters appear to be constant across all Rabbit/Treatment combinations? Does there appear to be a systematic shift in the xmid parameter according to Treatment?
(c) Fit a two-level NLME model with a fixed effect for Treatment on the xmid parameter and with random effects for Rabbit on the B parameter and for Treatment within Rabbit on the B and xmid parameters. Begin with a diagonal Ψ2 matrix. If that model fit converges, update to a general positive-definite Ψ2 matrix and compare the two fitted models with anova. Which model is preferred?
(d) Summarize your preferred model. Is the fixed effect for Treatment on xmid significant? Also check using the output from intervals.
(e) Plot the augmented predictions from your preferred model. Remember that if the display formula for the data has dose as the primary covariate then you will need to use scales = list(x = list(log = 2)) to get the symmetric shape of the logistic curve. Adjust the layout argument for the plot so the panels are aligned by Rabbit. Do these plots indicate deficiencies in the model?
(f) Examine residual plots for other possible deficiencies in the model. Also check plots of the random effects versus covariates to see if important fixed effects have been omitted.
(g) Is it necessary to use a four-parameter logistic curve? Experiment with fitting the three-parameter logistic model (SSlogis, Appendix C.7) to see if a comparable fit can be obtained. Check residual plots from your fitted models and comment.
3. The Glucose2 data described in Appendix A.10 consist of blood glucose levels measured 14 times over a period of 5 hours on 7 volunteers who took alcohol at time 0. The same experiment was repeated on a second occasion with the same subjects but with a dietary additive used for all subjects. These data are analyzed in Hand and Crowder (1996, Example 8.4, pp. 118-120), where the following empirical model relating the expected glucose level to Time is proposed.

glucose = φ1 + φ2 Time³ exp(-φ3 Time)
Note that there are two levels of grouping in the data: Subject and
Date within Subject.
(a) Plot the data at the Subject display level (use plot(Glucose2, display = 1)). Do there appear to be systematic differences between the two dates on which the experiments were conducted (which could be associated with the dietary supplement)?
(b) There is no self-starting function representing the model for the glucose level included in the nlme library. Use nlsList with starting values start = c(phi1=5, phi2=-1, phi3=1) (derived from Hand and Crowder (1996)) to fit separate models for each Subject and for each Date within Subject. Plot the individual confidence intervals for each of the two nlsList fits. Verify that phi1 and phi2 seem to vary significantly for both levels of grouping, but phi3 does not. (There is an unusual estimate of phi3 for Subject 6, Date 1 but all other confidence intervals overlap.)
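The fit by Date within Subject could be sketched as follows (the object name fm1Gluc.lis is illustrative):

> fm1Gluc.lis <- nlsList(
+    glucose ~ phi1 + phi2 * Time^3 * exp(-phi3 * Time) | Subject/Date,
+    data = Glucose2, start = c(phi1 = 5, phi2 = -1, phi3 = 1) )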
(c) Fit a two-level NLME model with random effects for phi1 and phi2, using as starting values for the fixed effects the estimates from either of the nlsList fits (start = fixef(object), with object replaced with the name of the nlsList object). Examine the confidence intervals on the variance-covariance components; what can you say about the precision of the estimated correlation coefficients?
(d) Refit the NLME model using diagonal Ψ matrices for both grouping levels and compare the new fit to the previous one using anova. Investigate if there are random effects that can be dropped from the model using intervals. If so, refit the model
(a) Use the gnls function described in §8.3.3 to fit the SSfol model to the theophylline concentrations with no random effects. Compare this fit to fm3Theo.nlme using anova. Obtain the boxplots of the residuals by Subject (plot(object, Subject ~ resid(.))) and comment on the observed pattern.
(b) Print and plot the ACF of the standardized residuals for the gnls fit (use form = ~ 1 | Subject to specify the grouping structure). The decrease in the ACF with lag suggests that an AR1 model may be adequate.
(c) Update the gnls fit incorporating an AR1 correlation structure, using the lag-1 autocorrelation from the ACF output as an initial estimate for the correlation parameter (corr = corAR1(0.725, form = ~ 1 | Subject)). Compare this fit to the previous gnls fit using anova. Is there significant evidence of autocorrelation? Examine the plot of the ACF of the normalized residuals. Does the AR1 model seem adequate?
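The update in part (c) could take a form like the following, where fm1Theo.gnls and fm2Theo.gnls are illustrative names for the fits from parts (a) and (c):

> fm2Theo.gnls <- update( fm1Theo.gnls,
+       corr = corAR1(0.725, form = ~ 1 | Subject) )
> anova( fm1Theo.gnls, fm2Theo.gnls )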
(d) Compare the gnls fit with AR1 correlation structure to the fm3Theo.nlme fit of §8.2.1 using anova with the argument test set to FALSE (why?). Which model seems better?
References
Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover, New York.
Bates, D. M. and Chambers, J. M. (1992). Nonlinear models, in Chambers and Hastie (1992), Chapter 10, pp. 421-454.
Bates, D. M. and Pinheiro, J. C. (1998). Computational methods for multilevel models, Technical Memorandum BL0112140-980226-01TM, Bell Labs, Lucent Technologies, Murray Hill, NJ.
Bates, D. M. and Watts, D. G. (1980). Relative curvature measures of nonlinearity, Journal of the Royal Statistical Society, Ser. B 42: 1-25.
Bates, D. M. and Watts, D. G. (1988). Nonlinear Regression Analysis and Its Applications, Wiley, New York.
Beal, S. and Sheiner, L. (1980). The NONMEM system, American Statistician 34: 118-119.
Becker, R. A., Cleveland, W. S. and Shyu, M.-J. (1996). The visual design and control of trellis graphics displays, Journal of Computational and Graphical Statistics 5(2): 123-156.
Bennett, J. E. and Wakefield, J. C. (1993). Markov chain Monte Carlo for nonlinear hierarchical models, Technical Report TR-93-11, Statistics Section, Imperial College, London.
Appendix A
Data Used in Examples and Exercises
We have used several sets of data in our examples and exercises. In this appendix we list all the data sets that are available as the NLMEDATA library included with the nlme 3.1 distribution, and we describe in greater detail the data sets referenced in the text.
The title of each section in this appendix gives the name of the corresponding groupedData object from the nlme library, followed by a short description of the data. The formula stored with the data and a short description of each of the columns are also given.
We have adopted certain conventions for the ordering and naming of columns in these descriptions. The first column provides the response, the second column is the primary covariate, if present, and the next column is the primary grouping factor. Other covariates and grouping factors, if present, follow. Usually we use lowercase for the names of the response and the primary covariate. One exception to this rule is the name Time for a covariate. We try to avoid using the name time because it conflicts with a standard S function.
Table A.1 lists the groupedData objects in the NLMEDATA library that is
part of the nlme distribution.
TABLE A.1: Data sets included with the nlme library distribution. The
data sets whose names are shown in bold are described in this appendix.
Alfalfa
Assay
BodyWeight
CO2
Cefamandole
ChickWeight
Dialyzer
DNase
Earthquake
ergoStool
Fatigue
Gasoline
Glucose
Glucose2
Gun
IGF
Indometh
Loblolly
Machines
MathAchSchool
MathAchieve
Meat
Milk
Muscle
Nitrendipene
Oats
Orange
Orthodont
Ovary
Oxboys
Oxide
PBG
PBIB
Phenobarb
Pixel
Quinidine
Rail
RatPupWeight
Relaxin
Remifentanil
Soybean
425
Spruce
Tetracycline1
Tetracycline2
Theoph
Wafer
Wheat
Wheat2
Other data sets may be included with later versions of the library, which
will be made available at http://nlme.stat.wisc.edu.
[Figure A.1 appears here: yield for each Block/Variety combination.]
FIGURE A.1. Plot of yields in a split-plot experiment on alfalfa varieties and dates of third cutting.
from top to bottom and the 12 columns on the plate are labeled 1-12 from left to right. Only the central 60 wells of the plate are used for the bioassay (the intersection of rows B-G and columns 2-11). There are two blocks in the design: Block 1 contains columns 2-6 and Block 2 contains columns 7-11. Within each block, six samples are assigned randomly to rows and five (serial) dilutions are assigned randomly to columns. The response variable is the logarithm of the optical density. The cells are treated with a compound that they metabolize to produce the stain. Only live cells can make the stain, so the optical density is a measure of the number of cells that are alive and healthy. The data are displayed in Figure 4.13 (p. 164).
Columns
The display formula for these data is
logDens ~ 1 | Block
sample: a factor indicating the sample corresponding to the well, varying from a to f.
427
dilut: a factor indicating the dilution applied to the well, varying from
1 to 5.
A.4 Cefamandole: Pharmacokinetics of Cefamandole
Davidian and Giltinan (1995, §1.1, p. 2) describe data, shown in Figure A.2, obtained during a pilot study to investigate the pharmacokinetics of the drug cefamandole. Plasma concentrations of the drug were measured on six healthy volunteers at 14 time points following an intravenous dose of 15 mg/kg body weight of cefamandole.
Columns
The display formula for these data is
conc ~ Time | Subject
428
100
200
300
250
200
150
100
50
0
2
250
200
150
100
50
0
0
100
200
300
100
200
300
Models
Davidian and Giltinan (1995) use the biexponential model SSbiexp (§C.4, p. 514) with these data.
Models
Potvin et al. (1990) suggest using a modified form of the asymptotic regression model SSasymp (§C.1, p. 511), which we have coded as SSasympOff (§C.2, p. 512).
Columns
The display formula for these data is
density ~ conc | Run
Models
Davidian and Giltinan (1995) use the four-parameter logistic model, SSfpl (§C.6, p. 517), with these data, modeling the optical density as a logistic function of the logarithm of the concentration.
Richter: the intensity of the earthquake on the Richter scale.
soil: soil condition at the measuring station, either soil or rock.
[Figure A.3 appears here: acceleration (g) versus distance for each earthquake, on logarithmic scales.]
FIGURE A.4. Blood glucose levels of seven subjects measured over a period of 5 hours on two different occasions. On both dates the subjects took alcohol at time 0, but on the second occasion a dietary additive was used.
Date: a factor indicating the occasion on which the experiment was conducted.
[Figure appears here: data for each subject plotted against time.]
Models
Davidian and Giltinan (1995) use the biexponential model SSbiexp (§C.4, p. 514) with these data.
Models
The logistic growth model, SSlogis (§C.7, p. 519), provides a reasonable fit to these data.
Sex: a factor indicating if the subject is male or female.
Models:
Based on the relationship shown in Figure 1.11 we begin with a simple
linear relationship between distance and age
Time: time in the estrus cycle. The data were recorded daily from 3 days
before ovulation until 3 days after the next ovulation. The measurement times for each mare are scaled so that the ovulations for each
mare occur at times 0 and 1.
Mare: a factor indicating the mare on which the measurement is made.
ment
Occasion: an ordered factor, the result of converting age from a continuous covariate.
Models
The form of the response suggests a logistic model, SSlogis (§C.7, p. 519), for the change in blood pressure as a function of the logarithm of the concentration of PBG.
[Figure appears here: response for each treatment within each of the 15 blocks.]
dose: dose of drug administered (µg/kg).
Models
A one-compartment open model with intravenous administration and first-order elimination, described in §6.4, is used for these data
drawn. This is measured from the time the patient entered the study.
Subject: a factor identifying the patient on whom the data were col-
lected.
dose: dose of drug administered (mg). Although there were two dif-
[Figure appears here: a panel for each subject, plotted against time since entering the study.]
no or yes.
Ethanol: a factor giving ethanol (alcohol) abuse status at the time of
The value given is the original travel time minus 36,100 nanoseconds.
Rail: a factor giving the number of the rail on which the measurement
was made.
Models
The form of the response suggests a logistic model, SSlogis (§C.7, p. 519).
[Figures appear here: log(Size) over time for each tree in plots O1, O2, N1, and N2.]
Models:
Both Boeckmann et al. (1994) and Davidian and Giltinan (1995) use a two-compartment open pharmacokinetic model, which we code as SSfol (§C.5, p. 516), for these data.
variety: a factor giving the unique identier for each wheat variety.
Block: a factor giving a unique identier for each block in the experi-
ment.
latitude: latitude of the experimental unit.
longitude: longitude of the experimental unit.
Appendix B
S Functions and Classes
There are over 300 different functions and classes defined in the nlme library. In this appendix we reproduce the on-line documentation for those functions and classes that are most frequently used in the examples in the text. The documentation for all the functions and classes in the library is available with the library.
Autocorrelation Function
ACF
ACF(object, maxLag, ...)
Arguments
object
maxLag
...
Description
This function is generic; method functions can be written to handle specific classes of objects. Classes that already have methods for this function include gls and lme.
Value
Will depend on the method function used; see the appropriate documentation.
See Also
ACF.gls, ACF.lme
ACF.lme
maxLag
resType
Description
This method function calculates the empirical autocorrelation function (Box et al., 1994) for the within-group residuals from an lme fit. The autocorrelation values are calculated using pairs of residuals within the innermost group level. The autocorrelation function is useful for investigating serial correlation models for equally spaced data.
Value
A data frame with columns lag and ACF representing, respectively, the
lag between residuals within a pair and the corresponding empirical
autocorrelation. The returned value inherits from class ACF.
See Also
ACF.gls, plot.ACF
Examples
fm1 <- lme(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), Ovary,
random = ~ sin(2*pi*Time) | Mare)
ACF(fm1, maxLag = 11)
anova.lme
...
test
An optional logical value controlling whether likelihood ratio tests should be used to compare the fitted models represented by object and the objects in .... Defaults to TRUE.
type
adjustSigma
454
Terms
verbose
An optional logical value. If TRUE, the calling sequences for each fitted model object are printed with the rest of the output, being omitted if verbose = FALSE. Defaults to FALSE.
Description
When only one fitted model object is present, a data frame is returned with the sums of squares, numerator degrees of freedom, denominator degrees of freedom, F-values, and p-values for Wald tests for the terms in the model (when Terms and L are NULL), a combination of model terms (when Terms is not NULL), or linear combinations of the model coefficients (when L is not NULL). Otherwise, when multiple fitted objects are being compared, a data frame with the degrees of freedom, the (restricted) log-likelihood, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC) of each object is returned. If test = TRUE, whenever two consecutive objects have different numbers of degrees of freedom, a likelihood ratio statistic with the associated p-value is included in the returned data frame.
Value
A data frame inheriting from class anova.lme.
Note
Likelihood comparisons are not meaningful for objects fit using restricted maximum likelihood and with different fixed effects.
See Also
gls, gnls, nlme, lme, AIC, BIC, print.anova.lme
Examples
fm1 <- lme(distance ~ age, Orthodont, random = ~ age | Subject)
anova(fm1)
fm2 <- update(fm1, random = pdDiag(~age))
anova(fm1, fm2)
coef.lme
augFrame
level
An optional positive integer giving the level of grouping to be used in extracting the coefficients from an object with multiple nested grouping levels. Defaults to the highest or innermost level of grouping.
data
which
An optional positive integer or character vector specifying which columns of data should be used in the
augmentation of the returned data frame. Defaults
to all columns in data.
456
Description
The estimated coefficients at level i are obtained by adding together the fixed-effects estimates and the corresponding random-effects estimates at grouping levels less than or equal to i. The resulting estimates are returned as a data frame, with rows corresponding to groups and columns to coefficients. Optionally, the returned data frame may be augmented with covariates summarized over groups.
Value
A data frame inheriting from class coef.lme with the estimated coefficients at level level and, optionally, other covariates summarized over groups. The returned object also inherits from classes ranef.lme and data.frame.
See Also
lme, fixef.lme, ranef.lme, plot.ranef.lme, gsummary
Examples
fm1 <- lme(distance ~ age, Orthodont, random = ~ age | Subject)
coef(fm1)
coef(fm1, augFrame = TRUE)
coef.lmList
data
which
FUN
458
omitGroupingFactor
An optional logical value. When TRUE the grouping
factor itself will be omitted from the groupwise summary of data but the levels of the grouping factor
will continue to be used as the row names for the
returned data frame. Defaults to FALSE.
Description
The coefficients of each lm object in the object list are extracted and organized into a data frame, with rows corresponding to the lm components and columns corresponding to the coefficients. Optionally, the returned data frame may be augmented with covariates summarized over the groups associated with the lm components.
Value
A data frame inheriting from class coef.lmList with the estimated coefficients for each lm component of object and, optionally, other covariates summarized over the groups corresponding to the lm components. The returned object also inherits from classes ranef.lmList and data.frame.
See Also
lmList, fixed.effects.lmList, ranef.lmList,
plot.ranef.lmList, gsummary
Examples
fm1 <- lmList(distance ~ age|Subject, data = Orthodont)
coef(fm1)
coef(fm1, augFrame = TRUE)
fitted.lme
level
An optional integer vector giving the level(s) of grouping to be used in extracting the fitted values from object. Level values increase from outermost to innermost grouping, with level zero corresponding to the population fitted values.
Description
The fitted values at level i are obtained by adding together the population fitted values (based only on the fixed-effects estimates) and the estimated contributions of the random effects to the fitted values at grouping levels less than or equal to i. The resulting values estimate the best linear unbiased predictions (BLUPs) at level i.
Value
If a single level of grouping is specified in level, the returned value is either a list with the fitted values split by groups (asList = TRUE) or a vector with the fitted values (asList = FALSE); else, when multiple grouping levels are specified in level, the returned object is a data frame with columns given by the fitted values at different levels and the grouping factors.
See Also
lme, residuals.lme
Examples
fm1 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)
fitted(fm1, level = 0:1)
fixef
fixef(object, ...)
fixed.effects(object, ...)
Arguments
object
...
Description
This function is generic; method functions can be written to handle specific classes of objects. Classes that already have methods for this function include lmList and lme.
Value
Will depend on the method function used; see the appropriate documentation.
See Also
fixef.lmList, fixef.lme
gapply
An object to which the function will be applied, usually a groupedData object or a data.frame. Must
inherit from class data.frame.
which
An optional character or positive integer vector specifying which columns of object should be used with
FUN. Defaults to all columns in object.
FUN
form
level
An optional positive integer giving the level of grouping to be used in an object with multiple nested
grouping levels. Defaults to the highest or innermost
level of grouping.
groups
...
Optional additional arguments to the summary function FUN. Often it is helpful to specify na.rm = TRUE.
Description
Applies the function to the distinct sets of rows of the data frame defined by groups.
Value
Returns a data frame with as many rows as there are levels in the
groups argument.
See Also
gsummary
Examples
## Find number of nonmissing "conc" observations for each Subject
gapply( Quinidine, FUN = function(x) sum(!is.na(x$conc)) )
getGroups
level
data
Any object.
An optional formula with a conditioning expression
on its right hand side (i.e., an expression involving
the | operator). Defaults to formula(object).
A positive integer vector with the level(s) of grouping
to be used when multiple nested levels of grouping are
present. This argument is optional for most methods
of this generic function and defaults to all levels of
nesting.
A data frame in which to interpret the variables named
in form. Optional for most methods.
Description
This function is generic; method functions can be written to handle specific classes of objects. Classes that already have methods for this function include corStruct, data.frame, gls, lme, lmList, and varFunc.
Value
Will depend on the method function used; see the appropriate documentation.
See Also
getGroupsFormula, getGroups.data.frame, getGroups.gls,
getGroups.lmList, getGroups.lme
gls
data
correlation
An optional corStruct object describing the within-group correlation structure. See the documentation of corClasses for a description of the available corStruct classes. If a grouping variable is to be used, it must be specified in the form argument to the corStruct constructor. Defaults to NULL, corresponding to uncorrelated errors.
weights
An optional varFunc object or one-sided formula describing the within-group heteroscedasticity structure. If given as a formula, it is used as the argument to varFixed, corresponding to fixed variance weights. See the documentation on varClasses for a description of the available varFunc classes. Defaults to NULL, corresponding to homoscedastic errors.
subset
method
A character string. If "REML" the model is fit by maximizing the restricted log-likelihood. If "ML" the log-likelihood is maximized. Defaults to "REML".
na.action
control
verbose
Description
This function fits a linear model using generalized least squares. The errors are allowed to be correlated and/or have unequal variances.
Value
An object of class gls representing the linear model fit. Generic functions such as print, plot, and summary have methods to show the results of the fit. See glsObject for the components of the fit. The functions resid, coef, and fitted can be used to extract some of its components.
References
The different correlation structures available for the correlation argument are described in Box et al. (1994), Littell et al. (1996), and Venables and Ripley (1999). The use of variance functions for linear and nonlinear models is presented in detail in Carroll and Ruppert (1988) and Davidian and Giltinan (1995).
See Also
glsControl, glsObject, varFunc, corClasses, varClasses
Examples
# AR(1) errors within each Mare
fm1 <- gls(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), Ovary,
correlation = corAR1(form = ~ 1 | Mare))
# variance increases as a power of the absolute fitted values
fm2 <- gls(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), Ovary,
weights = varPower())
gnls
data
params
start
correlation
An optional corStruct object describing the within-group correlation structure. See the documentation of corClasses for a description of the available corStruct classes. If a grouping variable is to be used, it must be specified in the form argument to the corStruct constructor. Defaults to NULL, corresponding to uncorrelated errors.
weights
An optional varFunc object or one-sided formula describing the within-group heteroscedasticity structure. If given as a formula, it is used as the argument to varFixed, corresponding to fixed variance weights. See the documentation on varClasses for a description of the available varFunc classes. Defaults to NULL, corresponding to homoscedastic errors.
subset
na.action
naPattern
An expression or formula object, specifying which returned values are to be regarded as missing.
control
verbose
Description
This function fits a nonlinear model using generalized least squares. The errors are allowed to be correlated and/or have unequal variances.
Value
An object of class gnls, also inheriting from class gls, representing the nonlinear model fit. Generic functions such as print, plot, and summary have methods to show the results of the fit. See gnlsObject for the components of the fit. The functions resid, coef, and fitted can be used to extract some of its components.
References
The different correlation structures available for the correlation argument are described in Box et al. (1994), Littell et al. (1996), and Venables and Ripley (1999). The use of variance functions for linear and nonlinear models is presented in detail in Carroll and Ruppert (1988) and Davidian and Giltinan (1995).
groupedData
data
order.groups
An optional logical value, or list of logical values, indicating if the grouping factors should be converted
to ordered factors according to the function FUN applied to the response from each group. If multiple
levels of grouping are present, this argument can be
either a single logical value (which will be repeated
for all grouping levels) or a list of logical values. If
no names are assigned to the list elements, they are
assumed in the same order as the group levels (outermost to innermost grouping). Ordering within a level of grouping is done within the levels of the grouping factors which are outer to it. Changing the grouping factor to an ordered factor does not affect the ordering of the rows in the data frame, but it does affect the order of the panels in a trellis display of the data or models fitted to the data. Defaults to TRUE.
FUN
outer
inner
labels
Description
An object of the groupedData class is constructed from the formula and data by attaching the formula as an attribute of the data, along with any of outer, inner, labels, and units that are given. If order.groups is TRUE the grouping factor is converted to an ordered factor with the ordering determined by FUN. Depending on the number of grouping levels and the type of primary covariate, the returned object will be of one of three classes: nfnGroupedData (numeric covariate, single level of nesting); nffGroupedData (factor covariate, single level of nesting); and nmGroupedData (multiple levels of nesting). Several modeling and plotting functions can use the formula stored with a groupedData object to construct default plots and models.
Value
An object of one of the classes nfnGroupedData, nffGroupedData, or nmGroupedData, also inheriting from classes groupedData and data.frame.
See Also
formula, gapply, gsummary, lme
Examples
Orth.new <- # create a new copy of the groupedData object
groupedData( distance ~ age | Subject,
data = as.data.frame( Orthodont ),
FUN = mean,
outer = ~ Sex,
labels = list(x = "Age",
y = "Distance from pituitary to pterygomaxillary fissure"),
units = list( x = "(yr)", y = "(mm)") )
plot( Orth.new )
# trellis plot by Subject
formula( Orth.new )
# extractor for the formula
gsummary( Orth.new )
# apply summary by Subject
fm1 <- lme( Orth.new )
# fixed and groups formulae extracted
# from object
gsummary
Summarize by Groups
form
level
An optional positive integer giving the level of grouping to be used in an object with multiple nested
invariantsOnly
An optional logical value. When TRUE only those covariates that are invariant within each group will
be summarized. The summary value for the group
is always the unique value taken on by that covariate within the group. The columns in the summary
are of the same class as the corresponding columns
in object. By definition, the grouping factor itself must be an invariant. When combined with omitGroupingFactor = TRUE, this option can be used to discover if there are invariant covariates in the data frame. Defaults to FALSE.
...
Optional additional arguments to the summary functions that are invoked on the variables by group.
Often it is helpful to specify na.rm = TRUE.
Description
Provide a summary of the variables in a data frame by groups of rows.
This is most useful with a groupedData object to examine the variables
by group.
Value
A data.frame with one row for each level of the grouping factor. The
number of columns is at most the number of columns in object.
See Also
summary, groupedData, getGroups
Examples
gsummary( Orthodont ) # default summary by Subject
## gsummary with invariantsOnly = TRUE and
## omitGroupingFactor = TRUE determines whether there
## are covariates like Sex that are invariant within
## the repeated observations on the same Subject.
gsummary( Orthodont, inv = TRUE, omit = TRUE )
intervals
level
...
Description
Confidence intervals on the parameters associated with the model represented by object are obtained. This function is generic; method functions can be written to handle specific classes of objects. Classes which
already have methods for this function include: gls, lme, and lmList.
Value
Will depend on the method function used; see the appropriate documentation.
See Also
intervals.gls, intervals.lme, intervals.lmList
intervals.lme
level
which
Description
Approximate confidence intervals for the parameters in the linear mixed-effects model represented by object are obtained, using a normal approximation to the distribution of the (restricted) maximum likelihood estimators (the estimators are assumed to have a normal distribution centered at the true parameter values and with covariance matrix equal to the negative inverse Hessian matrix of the (restricted) log-likelihood evaluated at the estimated parameters). Confidence intervals are obtained in an unconstrained scale first, using the normal approximation, and, if necessary, transformed to the constrained scale. The pdNatural parametrization is used for general positive-definite matrices.
Value
A list with components given by data frames with rows corresponding to parameters and columns lower, est., and upper representing, respectively, lower confidence limits, the estimated values, and upper confidence limits for the parameters. Possible components are:
fixed
reStruct
corStruct
varFunc
sigma
See Also
lme, print.intervals.lme, pdNatural
Examples
fm1 <- lme(distance ~ age, Orthodont, random = ~ age | Subject)
intervals(fm1)
intervals.lmList
level
pool
Description
Confidence intervals on the linear model coefficients are obtained for each lm component of object and organized into a three-dimensional array. The first dimension corresponds to the names of the object components. The second dimension is given by lower, est., and upper, corresponding, respectively, to the lower confidence limit, estimated coefficient, and upper confidence limit. The third dimension is given by the coefficient names.
Value
A three-dimensional array with the confidence intervals and estimates for the coefficients of each lm component of object.
See Also
lmList, plot.intervals.lmList
Examples
fm1 <- lmList(distance ~ age | Subject, Orthodont)
intervals(fm1)
lme
A two-sided linear formula object describing the fixed-effects part of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right; an lmList object; or a groupedData object. The method functions lme.lmList and lme.groupedData are documented separately.
data
random
Optionally, any of the following: (i) a one-sided formula of the form ~x1+...+xn | g1/.../gm, with x1+...+xn specifying the model for the random effects and g1/.../gm the grouping structure (m may be equal to 1, in which case no / is required). The random-effects formula will be repeated for all levels of grouping, in the case of multiple levels of grouping; (ii) a list of one-sided formulas of the form ~x1+...+xn | g, with possibly different random-effects models for each grouping level. The order of nesting will be assumed the same as the order of the elements in the list; (iii) a one-sided formula of the form ~x1+...+xn, or a pdMat object with a formula (i.e., a non-NULL value for formula(object)), or a list of such formulas or pdMat objects. In this case, the grouping structure formula will be derived from the data used to fit the linear mixed-effects model, which should inherit from class groupedData; (iv) a named list of formulas or pdMat objects as in (iii), with the grouping factors as names. The order of nesting will be assumed the same as the order of the elements in the list; (v) an reStruct object. See the documentation on pdClasses for a description of the available pdMat classes. Defaults to a formula consisting of the right-hand side of fixed.
correlation
An optional corStruct object describing the within-group correlation structure. See the documentation of corClasses for a description of the available corStruct classes. Defaults to NULL, corresponding to no within-group correlations.
weights
An optional varFunc object or one-sided formula describing the within-group heteroscedasticity structure. If given as a formula, it is used as the argument to varFixed, corresponding to fixed variance weights. See the documentation on varClasses for a description of the available varFunc classes. Defaults to NULL, corresponding to homoscedastic within-group errors.
subset
method
A character string. If "REML" the model is fit by maximizing the restricted log-likelihood. If "ML" the log-likelihood is maximized. Defaults to "REML".
na.action
control
Description
This generic function fits a linear mixed-effects model in the formulation described in Laird and Ware (1982), but allowing for nested random effects. The within-group errors are allowed to be correlated and/or have unequal variances.
Value
An object of class lme representing the linear mixed-effects model fit. Generic functions such as print, plot and summary have methods to show the results of the fit. See lmeObject for the components of the fit. The functions resid, coef, fitted, fixef, and ranef can be used to extract some of its components.
See Also
lmeControl, lme.lmList, lme.groupedData, lmeObject, lmList, reStruct, varFunc, pdClasses, corClasses, varClasses
Examples
fm1 <- lme(distance ~ age, data = Orthodont) # random is ~ age
fm2 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)
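The correlation and weights arguments described above can be combined in a single fit. A hedged sketch using the Ovary data (mirroring the gls example earlier in this appendix; the exact model is an illustration, not taken from this page):

```r
library(nlme)  # assumed: library providing lme and the Ovary data
fm3 <- lme(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), data = Ovary,
           random = ~ 1 | Mare,
           correlation = corAR1(),  # AR(1) within-group errors
           weights = varPower())    # variance as a power of fitted values
```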
lmeControl
maxIter
Maximum number of iterations for the lme optimization algorithm. Default is 50.
msMaxIter
Maximum number of iterations for the ms optimization step inside the lme optimization. Default is 50.
tolerance
niterEM
msTol
msScale
msVerbose
returnObject
gradHess
A logical value indicating whether numerical gradient vectors and Hessian matrices of the log-likelihood function should be used in the ms optimization. This option is only available when the correlation structure (corStruct) and the variance function structure (varFunc) have no varying parameters and the pdMat classes used in the random-effects structure are pdSymm (general positive-definite), pdDiag (diagonal), pdIdent (multiple of the identity), or pdCompSymm (compound symmetry). Default is TRUE.
apVar
.relStep
natural
Description
The values supplied in the function call replace the defaults and a list
with all possible arguments is returned. The returned list is used as
the control argument to the lme function.
Value
A list with components for each of the possible arguments.
See Also
lme, ms, lmeScale
Examples
# decrease the maximum number of iterations in the ms call and
# request that information on the evolution of the ms iterations
# be printed
lmeControl(msMaxIter = 20, msVerbose = TRUE)
lmList
data
level
na.action
pool
An optional logical value that is preserved as an attribute of the returned value. This will be used as
the default for pool in calculations of standard deviations or standard errors for summaries.
Description
Data is partitioned according to the levels of the grouping factor g and individual lm fits are obtained for each data partition, using the model defined in object.
Value
A list of lm objects with as many components as the number of groups defined by the grouping factor. Generic functions such as coef, fixef,
lme, pairs, plot, predict, ranef, summary, and update have methods
that can be applied to an lmList object.
See Also
lm, lme.lmList.
Examples
fm1 <- lmList(distance ~ age | Subject, Orthodont)
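The generic functions listed under Value can then be applied to the fit; for example (a sketch):

```r
library(nlme)  # assumed: library providing lmList and the Orthodont data
fm1 <- lmList(distance ~ age | Subject, data = Orthodont)
coef(fm1)       # per-Subject coefficient estimates
intervals(fm1)  # confidence intervals (pooled standard errors by default)
```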
Extract Log-Likelihood
logLik
logLik(object, ...)
Arguments
object
...
Description
This function is generic; method functions can be written to handle
specific classes of objects. Classes which already have methods for this
function include: corStruct, gls, lm, lme, lmList, lmeStruct, reStruct, and
varFunc.
Value
Will depend on the method function used; see the appropriate documentation.
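This help entry has no Examples section in this copy; a minimal hedged sketch:

```r
library(nlme)  # assumed
fm1 <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)
logLik(fm1)                          # restricted log-likelihood (REML fit)
logLik(update(fm1, method = "ML"))   # log-likelihood of the ML refit
```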
nlme
Arguments
model
data
fixed
random
An optional one-sided formula of the form ~g1 (single level of nesting) or ~g1/.../gQ (multiple levels of nesting), specifying the partitions of the data over which the random effects vary. g1,...,gQ must evaluate to factors in data. The order of nesting, when multiple levels are present, is taken from left to right (i.e., g1 is the first level, g2 the second, etc.).
start
correlation
An optional corStruct object describing the within-group correlation structure. See the documentation
of corClasses for a description of the available
corStruct classes. Defaults to NULL, corresponding to
no within-group correlations.
weights
An optional varFunc object or one-sided formula describing the within-group heteroscedasticity structure. If given as a formula, it is used as the argument to varFixed, corresponding to fixed variance weights. See the documentation on varClasses for a description of the available varFunc classes. Defaults to NULL, corresponding to homoscedastic within-group errors.
subset
method
A character string. If "REML" the model is fit by maximizing the restricted log-likelihood. If "ML" the log-likelihood is maximized. Defaults to "ML".
na.action
naPattern
An expression or formula object, specifying which returned values are to be regarded as missing.
control
verbose
Description
This generic function fits a nonlinear mixed-effects model in the formulation described in Lindstrom and Bates (1990), but allowing for nested random effects. The within-group errors are allowed to be correlated and/or have unequal variances.
Value
An object of class nlme representing the nonlinear mixed-effects model fit. Generic functions such as print, plot and summary have methods to show the results of the fit. See nlmeObject for the components of the fit. The functions resid, coef, fitted, fixef, and ranef can be used to extract some of its components.
See Also
nlmeControl, nlme.nlsList, nlmeObject, nlsList, reStruct,
varFunc, pdClasses, corClasses, varClasses
Examples
## all parameters as fixed and random effects
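The code following the comment above appears to have been lost in extraction. A sketch consistent with that comment (the Loblolly data and the starting values are assumptions, not taken from this page):

```r
library(nlme)  # assumed
fm1 <- nlme(height ~ SSasymp(age, Asym, R0, lrc), data = Loblolly,
            fixed = Asym + R0 + lrc ~ 1,  # all parameters as fixed effects
            # random defaults to the fixed-effects formula, so all three
            # parameters also receive random effects
            start = c(Asym = 103, R0 = -8.5, lrc = -3.3))
```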
nlmeControl
maxIter
Maximum number of iterations for the nlme optimization algorithm. Default is 50.
pnlsMaxIter
Maximum number of iterations for the PNLS optimization step inside the nlme optimization. Default
is 7.
msMaxIter
Maximum number of iterations for the ms optimization step inside the nlme optimization. Default is 50.
minScale
tolerance
niterEM
pnlsTol
msTol
msScale
returnObject
msVerbose
gradHess
A logical value indicating whether numerical gradient vectors and Hessian matrices of the log-likelihood function should be used in the ms optimization. This option is only available when the correlation structure (corStruct) and the variance function structure (varFunc) have no varying parameters and the pdMat classes used in the random-effects structure are pdSymm (general positive-definite), pdDiag (diagonal), pdIdent (multiple of the identity), or pdCompSymm (compound symmetry). Default is TRUE.
apVar
.relStep
natural
A logical value indicating whether the pdNatural parameterization should be used for general positive-definite matrices (pdSymm) in reStruct, when the approximate covariance matrix of the estimators is calculated. Default is TRUE.
Description
The values supplied in the function call replace the defaults and a list
with all possible arguments is returned. The returned list is used as
the control argument to the nlme function.
Value
A list with components for each of the possible arguments.
See Also
nlme, ms, nlmeStruct
Examples
# decrease the maximum number of iterations in the ms call and
# request that information on the evolution of the ms iterations
# be printed
nlmeControl(msMaxIter = 20, msVerbose = TRUE)
nlsList
data
start
control
level
na.action
pool
Description
Data is partitioned according to the levels of the grouping factor defined in model and individual nls fits are obtained for each data partition, using the model defined in model.
Value
A list of nls objects with as many components as the number of groups defined by the grouping factor. Generic functions such as coef, fixef,
lme, pairs, plot, predict, ranef, summary, and update have methods
that can be applied to an nlsList object.
See Also
nls, nlme.nlsList.
Examples
fm1 <- nlsList(uptake ~ SSasympOff(conc, Asym, lrc, c0),
data = CO2, start = c(Asym = 30, lrc = -4.5, c0 = 52))
fm1
pairs.lme
form
id
grid
...
Description
Diagnostic plots for the linear mixed-effects fit are obtained. The form argument gives considerable flexibility in the type of plot specification. A conditioning expression (on the right side of a | operator) always implies that different panels are used for each level of the conditioning factor, according to a trellis display. The expression on the right-hand side of the formula, before a | operator, must evaluate to a data frame with at least two columns. If the data frame has two columns, a scatter plot of the two variables is displayed (the trellis function xyplot is used). Otherwise, if more than two columns are present, a scatter plot matrix with pairwise scatter plots of the columns in the data frame is displayed (the trellis function splom is used).
Value
A diagnostic trellis plot.
See Also
lme, xyplot, splom
Examples
fm1 <- lme(distance ~ age, Orthodont, random = ~ age | Subject)
# scatter plot of coefficients by gender, identifying
# unusual subjects
pairs(fm1, ~coef(., augFrame = T) | Sex, id = 0.1, adj = -0.5)
# scatter plot of estimated random effects
pairs(fm1, ~ranef(.))
plot.lme
form
abline
id
idLabels
Description
Diagnostic plots for the linear mixed-effects fit are obtained. The form argument gives considerable flexibility in the type of plot specification. A conditioning expression (on the right side of a | operator) always implies that different panels are used for each level of the conditioning factor, according to a trellis display. If form is a one-sided formula, histograms of the variable on the right-hand side of the formula, before a | operator, are displayed (the trellis function histogram is used). If form is two-sided and both its left- and right-hand side variables are numeric, scatter plots are displayed (the trellis function xyplot is used). Finally, if form is two-sided and its left-hand side variable is a factor, boxplots of the right-hand side variable by the levels of the left-hand side variable are displayed (the trellis function bwplot is used).
Value
A diagnostic trellis plot.
See Also
lme, xyplot, bwplot, histogram
Examples
fm1 <- lme(distance ~ age, Orthodont, random = ~ age | Subject)
# standardized residuals versus fitted values by gender
plot(fm1, resid(., type = "p") ~ fitted(.) | Sex, abline = 0)
# box-plots of residuals by Subject
plot(fm1, Subject ~ resid(.))
# observed versus fitted values by Subject
plot(fm1, distance ~ fitted(.) | Subject, abline = c(0,1))
plot.nfnGroupedData
An object inheriting from class nfnGroupedData, representing a groupedData object with a numeric primary covariate and a single grouping level.
outer
An optional logical value or one-sided formula, indicating covariates that are outer to the grouping factor, which are used to determine the panels of the
trellis plot. If equal to TRUE, attr(object, "outer")
is used to indicate the outer covariates. An outer covariate is invariant within the sets of rows defined
by the grouping factor. Ordering of the groups is
done in such a way as to preserve adjacency of groups
with the same value of the outer variables. Defaults
to NULL, meaning that no outer covariates are to be
used.
inner
An optional logical value or one-sided formula, indicating a covariate that is inner to the grouping factor,
which is used to associate points within each panel
of the trellis plot. If equal to TRUE, attr(object,
"inner") is used to indicate the inner covariate. An
inner covariate can change within the sets of rows defined by the grouping factor. Defaults to NULL, meaning that no inner covariate is present.
innerGroups
xlab, ylab
strip
aspect
An optional character string indicating the aspect ratio for the plot passed as the aspect argument to the
xyplot function. Default is "xy" (see trellis.args).
panel
An optional function used to generate the individual panels in the trellis display, passed as the panel
argument to the xyplot function.
key
grid
...
Description
A trellis plot of the response versus the primary covariate is generated. If outer variables are specified, the combination of their levels are used to determine the panels of the trellis display. Otherwise, the levels of the grouping variable determine the panels. A scatter plot of the response versus the primary covariate is displayed in each panel, with observations corresponding to the same inner group joined by line segments. The trellis function xyplot is used.
Value
A trellis plot of the response versus the primary covariate.
See Also
groupedData, xyplot
Examples
# different panels per Subject
plot(Orthodont)
# different panels per gender
plot(Orthodont, outer = TRUE)
plot.nmGroupedData
An object inheriting from class nmGroupedData, representing a groupedData object with multiple grouping factors.
collapseLevel
An optional positive integer or character string indicating the grouping level to use when collapsing the data. Level values increase from outermost to innermost grouping. Default is the highest or innermost level of grouping.
displayLevel
An optional positive integer or character string indicating the grouping level to use for determining the
panels in the trellis display, when outer is missing.
Default is collapseLevel.
outer
An optional logical value or one-sided formula, indicating covariates that are outer to the displayLevel
grouping factor, which are used to determine the panels of the trellis plot. If equal to TRUE, the displayLevel element attr(object, "outer") is used to
indicate the outer covariates. An outer covariate is
invariant within the sets of rows defined by the grouping factor. Ordering of the groups is done in such a
way as to preserve adjacency of groups with the same
value of the outer variables. Defaults to NULL, meaning that no outer covariates are to be used.
inner
An optional logical value or one-sided formula, indicating a covariate that is inner to the displayLevel
grouping factor, which is used to associate points
within each panel of the trellis plot. If equal to TRUE,
attr(object, "inner") is used to indicate the inner covariate. An inner covariate can change within the sets of rows defined by the grouping factor. Defaults to NULL, meaning that no inner covariate is
present.
preserve
An optional one-sided formula indicating a covariate whose levels should be preserved when collapsing the data.
FUN
subset
grid
...
Description
The groupedData object is summarized by the values of the displayLevel grouping factor (or the combination of its values and the values
of the covariate indicated in preserve, if any is present). The collapsed
data is used to produce a new groupedData object, with grouping
factor given by the displayLevel factor, which is plotted using the
appropriate plot method for groupedData objects with a single level of
grouping.
Value
A trellis display of the data collapsed over the values of the collapseLevel grouping factor and grouped according to the displayLevel
grouping factor.
See Also
groupedData, collapse.groupedData, plot.nfnGroupedData,
plot.nffGroupedData
Examples
# no collapsing, panels by Dog
plot(Pixel, display = "Dog", inner = ~Side)
# collapsing by Dog, preserving day
plot(Pixel, collapse = "Dog", preserve = ~day)
plot.Variogram
smooth
showModel
An optional logical value controlling whether the semivariogram corresponding to a "modelVariog" attribute of object, if any is present, should be added to the plot. Defaults to TRUE when the "modelVariog" attribute is present.
sigma
span
An optional numeric value with the smoothing parameter for the loess fit. Default is 0.6.
xlab,ylab
Optional character strings with the x- and y-axis labels. Default respectively to "Distance" and "Semivariogram".
type
ylim
...
Description
An xyplot of the semivariogram versus the distances is produced. If smooth = TRUE, a loess smoother is added to the plot. If showModel = TRUE and object includes a "modelVariog" attribute, the corresponding semivariogram is added to the plot.
Value
An xyplot trellis plot.
See Also
Variogram, xyplot, loess
Examples
fm1 <- lme(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), Ovary)
plot(Variogram(fm1, form = ~ Time | Mare, maxDist = 0.7))
predict.lme
newdata
level
An optional integer vector giving the level(s) of grouping to be used in obtaining the predictions. Level values increase from outermost to innermost grouping,
with level zero corresponding to the population predictions. Defaults to the highest or innermost level of
grouping.
asList
na.action
Description
The predictions at level i are obtained by adding together the population predictions (based only on the fixed-effects estimates) and the estimated contributions of the random effects to the predictions at grouping levels less than or equal to i. The resulting values estimate the best linear unbiased predictions (BLUPs) at level i. If group values not included in the original grouping factors are present in newdata, the corresponding predictions will be set to NA for levels greater than or equal to the level at which the unknown groups occur.
Value
If a single level of grouping is specied in level, the returned value is
either a list with the predictions split by groups (asList = TRUE) or
a vector with the predictions (asList = FALSE); else, when multiple
grouping levels are specied in level, the returned object is a data
frame with columns given by the predictions at different levels and the
grouping factors.
See Also
lme, fitted.lme
Examples
fm1 <- lme(distance ~ age, Orthodont, random = ~ age | Subject)
newOrth <- data.frame(Sex = c("Male","Male","Female","Female",
"Male","Male"),
age = c(15, 20, 10, 12, 2, 4),
Subject = c("M01","M01","F30","F30","M04",
"M04"))
predict(fm1, newOrth, level = 0:1)
qqnorm.lme
form
abline
id
idLabels
grid
...
Description
Diagnostic plots for assessing the normality of residuals and random effects in the linear mixed-effects fit are obtained. The form argument gives considerable flexibility in the type of plot specification. A conditioning expression (on the right side of a | operator) always implies that different panels are used for each level of the conditioning factor, according to a trellis display.
Value
A diagnostic trellis plot for assessing normality of residuals or random effects.
See Also
lme, plot.lme
Examples
fm1 <- lme(distance ~ age, Orthodont, random = ~ age | Subject)
# normal plot of standardized residuals by gender
qqnorm(fm1, ~ resid(., type = "p") | Sex, abline = c(0, 1))
# normal plots of random effects
qqnorm(fm1, ~ranef(.))
ranef
ranef(object, ...)
Arguments
object
...
Description
This function is generic; method functions can be written to handle
specific classes of objects. Classes that already have methods for this
function include lmList and lme.
Value
Will depend on the method function used; see the appropriate documentation.
See Also
ranef.lmList, ranef.lme
ranef.lme
augFrame
level
An optional vector of positive integers giving the levels of grouping to be used in extracting the random effects from an object with multiple nested grouping levels. Defaults to all levels of grouping.
data
which
FUN
standard
An optional logical value indicating whether the estimated random effects should be standardized (i.e., divided by the corresponding estimated standard error). Defaults to FALSE.
omitGroupingFactor
An optional logical value. When TRUE, the grouping factor itself will be omitted from the groupwise summary of data, but the levels of the grouping factor will continue to be used as the row names for the returned data frame. Defaults to FALSE.
Description
The estimated random effects at level i are represented as a data frame with rows given by the different groups at that level and columns given by the random effects. If a single level of grouping is specified, the returned object is a data frame; else, the returned object is a list of such data frames. Optionally, the returned data frame(s) may be augmented with covariates summarized over groups.
Value
A data frame, or list of data frames, with the estimated random effects at the grouping level(s) specified in level and, optionally, other covariates summarized over groups.
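No Examples entry for ranef.lme survives in this copy; a hedged sketch:

```r
library(nlme)  # assumed
fm1 <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)
ranef(fm1)                   # estimated random effects, one row per Subject
ranef(fm1, augFrame = TRUE)  # augmented with group-summarized covariates
```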
ranef.lmList
augFrame
data
which
An optional positive integer or character vector specifying which columns of the data frame used to produce object should be used in the augmentation of
the returned data frame. Defaults to all variables in
the data.
FUN
standard
An optional logical value indicating whether the estimated random effects should be standardized (i.e., divided by the corresponding estimated standard error). Defaults to FALSE.
omitGroupingFactor
An optional logical value. When TRUE, the grouping factor itself will be omitted from the groupwise summary of data, but the levels of the grouping factor will continue to be used as the row names for the returned data frame. Defaults to FALSE.
Description
A data frame containing the differences between the coefficients of the individual lm fits and the average coefficients.
Value
A data frame with the differences between the individual lm coefficients in object and their average. Optionally, the returned data frame may be augmented with covariates summarized over groups, or the differences may be standardized.
See Also
lmList, fixef.lmList
Examples
fm1 <- lmList(distance ~ age | Subject, Orthodont)
ranef(fm1)
ranef(fm1, standard = TRUE)
ranef(fm1, augFrame = TRUE)
residuals.lme
level
An optional integer vector giving the level(s) of grouping to be used in extracting the residuals from object.
Level values increase from outermost to innermost
grouping, with level zero corresponding to the population residuals. Defaults to the highest or innermost
level of grouping.
type
asList
Description
The residuals at level i are obtained by subtracting the fitted values at that level from the response vector (and dividing by the estimated within-group standard error, if type="pearson"). The fitted values at level i are obtained by adding together the population fitted values (based only on the fixed-effects estimates) and the estimated contributions of the random effects to the fitted values at grouping levels less than or equal to i.
Value
If a single level of grouping is specied in level, the returned value
is either a list with the residuals split by groups (asList = TRUE)
or a vector with the residuals (asList = FALSE); else, when multiple
grouping levels are specied in level, the returned object is a data
frame with columns given by the residuals at different levels and the
grouping factors.
See Also
lme, fitted.lme
Examples
fm1 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)
residuals(fm1, level = 0:1)
selfStart
selfStart.default
Description
A method for the generic function selfStart for function objects.
Value
A function object of class selfStart, corresponding to a self-starting nonlinear model function. An initial attribute (defined by the initial argument) is added to the function to calculate starting estimates for the parameters in the model automatically.
See Also
selfStart.formula
Examples
# first.order.log.model is a function object defining a first
# order compartment model
# first.order.log.initial is a function object which calculates
# initial values for the parameters in first.order.log.model
# self-starting first order compartment model
SSfol <- selfStart(first.order.log.model,
first.order.log.initial)
selfStart.formula
initial
parameters
A character vector specifying the terms on the righthand side of model for which initial estimates should
be calculated. Passed as the namevec argument to
the deriv function.
template
Description
A method for the generic function selfStart for formula objects.
Value
A function object of class selfStart, obtained by applying deriv to the right-hand side of the model formula. An initial attribute (defined by the initial argument) is added to the function to calculate starting estimates for the parameters in the model automatically.
See Also
selfStart.default, deriv
Examples
## self-starting logistic model
SSlogis <- selfStart(~ Asym/(1 + exp((xmid - x)/scal)),
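The call above is truncated in this copy. A self-contained sketch of how the remaining arguments might look (the initial-value function below is a crude illustration under assumed heuristics, not the library's actual starting-value algorithm):

```r
# hypothetical completion; crude starting values, for illustration only
SSlogis2 <- selfStart(~ Asym/(1 + exp((xmid - x)/scal)),
  initial = function(mCall, data, LHS) {
    xy <- sortedXyData(mCall[["x"]], LHS, data)   # sorted (x, y) averages
    # plateau slightly above the maximum response, midpoint where the
    # response crosses half the plateau, unit scale parameter
    val <- c(max(xy$y) * 1.05, NLSstClosestX(xy, max(xy$y)/2), 1)
    names(val) <- mCall[c("Asym", "xmid", "scal")]
    val
  },
  parameters = c("Asym", "xmid", "scal"))
```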
Variogram
Calculate Semivariogram
Variogram.lme
Arguments
distance
An optional numeric vector with the distances between residual pairs. If a grouping variable is present, only the distances between residual pairs within the same group should be given. If missing, the distances are calculated based on the values of the arguments form, data, and metric, unless object includes a corSpatial element, in which case the associated covariate (obtained with the getCovariate method) is used.
form
An optional one-sided formula specifying the covariate(s) to be used for calculating the distances between residual pairs and, optionally, a grouping factor for partitioning the residuals (which must appear to the right of a | operator in form). Default is ~1, implying that the observation order within the groups is used to obtain the distances.
resType
data
An optional data frame in which to interpret the variables in form. By default, the same data used to fit object is used.
na.action
A function that indicates what should happen when the data contain NAs. The default action (na.fail) causes an error message to be printed and the function to terminate, if there are any incomplete observations.
maxDist
length.out
collapse
nint
robust
An optional logical value specifying if a robust semivariogram estimator should be used when collapsing
the individual values. If TRUE the robust estimator is
used. Defaults to FALSE.
breaks
metric
Description
This method function calculates the semivariogram for the within-group residuals from an lme fit. The semivariogram values are calculated for pairs of residuals within the same group. If collapse is different from "none", the individual semivariogram values are collapsed using either a robust estimator (robust = TRUE) defined in Cressie (1993), or the average of the values within the same distance interval. The semivariogram is useful for modeling the error term correlation structure.
Value
A data frame with columns variog and dist representing, respectively, the semivariogram values and the corresponding distances. If the semivariogram values are collapsed, an extra column, n.pairs, with the number of residual pairs used in each semivariogram calculation, is included in the returned data frame. If object includes a corSpatial element, a data frame with its corresponding semivariogram is included in the returned value, as an attribute "modelVariog". The returned value inherits from class Variogram.
See Also
lme, Variogram.default, Variogram.gls, plot.Variogram
Examples
fm1 <- lme(weight ~ Time * Diet, BodyWeight, ~ Time | Rat)
Variogram(fm1, form = ~ Time | Rat, nint = 10, robust = TRUE)
Appendix C
A Collection of Self-Starting Nonlinear
Regression Models
The asymptotic regression model is

    y(x) = φ1 + (φ2 − φ1) exp[−exp(φ3) x].    (C.1)

C.1.1 Starting Estimates for SSasymp

Starting values for the asymptotic regression model are obtained by first calculating an initial estimate φ1^(0) of the asymptote.

FIGURE C.1. The asymptotic regression model showing the parameters φ1, the asymptotic response as x → ∞, φ2, the response at x = 0, and t0.5, the half-life.
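The self-starting function corresponding to this model in the nlme library is SSasymp. As a brief usage sketch (the Loblolly data and the subscripting by Seed are illustrative assumptions, not taken from this appendix):

```s
# usage sketch: SSasymp supplies its own starting estimates, so nls
# needs no start argument; Asym, R0, and lrc play the roles of
# phi1, phi2, and phi3 above
Lob.329 <- Loblolly[Loblolly$Seed == "329", ]  # a single tree
fm1 <- nls(height ~ SSasymp(age, Asym, R0, lrc), data = Lob.329)
summary(fm1)
```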
The asymptotic regression model with an offset is

    y(x) = φ1 {1 − exp[−exp(φ2)(x − φ3)]}.    (C.2)

C.2.1 Starting Estimates for SSasympOff

FIGURE C.2. The asymptotic regression model with an offset showing the parameters φ1, the asymptote as x → ∞, t0.5, the half-life, and φ3, the value of x for which y = 0.
The asymptotic regression model through the origin is

    y(x) = φ1 {1 − exp[−exp(φ2) x]}.    (C.3)

C.3.1 Starting Estimates for SSasympOrig

FIGURE C.3. The asymptotic regression model through the origin showing the parameters φ1, the asymptote as x → ∞, and t0.5, the half-life.

Starting values are obtained from an initial estimate φ1^(0) of the asymptote, from which

    φ2^(0) = log | −Σ_{i=1}^{n} [log(1 − yi/φ1^(0)) / xi] / n |.
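The averaging formula for φ2^(0) can be written directly in S. This is an illustrative sketch, not the library's internal code; asympOrigStart is a made-up name and phi1.0 is assumed to be an initial estimate of the asymptote:

```s
# sketch: starting value for phi2 in the asymptotic regression
# through the origin, given data vectors x, y and an initial
# asymptote estimate phi1.0
asympOrigStart <- function(x, y, phi1.0)
{
    # each observation implies exp(phi2) = -log(1 - y/phi1.0)/x;
    # average these and take the logarithm of the absolute value
    log(abs(mean(-log(1 - y/phi1.0)/x)))
}
```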
The biexponential model is

    y(x) = φ1 exp[−exp(φ2) x] + φ3 exp[−exp(φ4) x].    (C.4)

FIGURE C.4. A biexponential model showing the linear combination of the exponentials (solid line) and its constituent exponential curves (dashed line and dotted line). The dashed line is 3.5 exp(−4x) and the dotted line is 1.5 exp(−x).

C.4.1 Starting Estimates for SSbiexp

The starting estimates for the biexponential model are determined by curve peeling, which involves:

1. Choosing half the data with the largest x values and fitting the simple linear regression model

    log |y| = a + bx.
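Step 1 of the curve-peeling procedure can be sketched in S as follows (peel1 is a hypothetical helper, and x and y stand for the data vectors):

```s
# sketch of curve peeling, step 1: simple linear regression of
# log(abs(y)) on x, using the half of the data with largest x
peel1 <- function(x, y)
{
    keep <- x > median(x)
    fit <- lm(log(abs(y)) ~ x, subset = keep)
    coef(fit)    # intercept a and slope b
}
```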
The first-order open-compartment model, for a dose D, is

    y(x) = D exp(φ1) exp(φ2) / (exp(φ3) [exp(φ2) − exp(φ1)]) × {exp[−exp(φ1) x] − exp[−exp(φ2) x]}.    (C.5)

C.5.1 Starting Estimates for SSfol

The starting estimates for the SSfol model are also determined by curve peeling. The steps end by setting φ3 = φ1 + φ2 − log k. These estimates are the final nonlinear regression estimates.
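A usage sketch for the self-starting form (the Theoph data and the single-subject subset are illustrative assumptions):

```s
# usage sketch: SSfol(Dose, Time, lKe, lKa, lCl) computes its own
# starting estimates by the curve-peeling steps described above
Theoph.1 <- Theoph[Theoph$Subject == 1, ]
fm1 <- nls(conc ~ SSfol(Dose, Time, lKe, lKa, lCl), data = Theoph.1)
```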
The four-parameter logistic model is

    y(x) = φ1 + (φ2 − φ1) / {1 + exp[(φ3 − x)/φ4]}.    (C.6)

FIGURE C.6. The four-parameter logistic model. The parameters are the horizontal asymptote φ1 as x → −∞, the horizontal asymptote φ2 as x → ∞, the x value at the inflection point (φ3), and a scale parameter φ4.

C.6.1 Starting Estimates for SSfpl

The steps in determining starting estimates for the SSfpl model are based on fits of the partially linear curve

    y = A + B / {1 + exp[(φ3 − x)/exp(θ)]},

in which A and B enter linearly (θ here stands for the logarithm of the scale parameter, so the scale stays positive during the initial fits).
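A usage sketch for the self-starting four-parameter logistic model (the DNase assay data and the single-run subset are illustrative assumptions):

```s
# usage sketch: SSfpl with A, B, xmid, scal playing the roles of
# phi1, phi2, phi3, phi4 above
DNase.1 <- DNase[DNase$Run == "1", ]
fm1 <- nls(density ~ SSfpl(log(conc), A, B, xmid, scal),
           data = DNase.1)
```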
FIGURE C.7. The simple logistic model showing the parameters φ1, the horizontal asymptote as x → ∞, φ2, the value of x for which y = φ1/2, and φ3, a scale parameter on the x-axis. If φ3 < 0 the curve will be monotone decreasing instead of monotone increasing and φ1 will be the horizontal asymptote as x → −∞.

The simple logistic model is

    y(x) = φ1 / {1 + exp[(φ2 − x)/φ3]}.    (C.7)

For this model we do not require that the scale parameter φ3 be positive. If φ3 > 0 then φ1 is the horizontal asymptote as x → ∞ and 0 is the horizontal asymptote as x → −∞. If φ3 < 0, these roles are reversed. The parameter φ2 is the x value at which the response is φ1/2. It is the inflection point of the curve. The scale parameter φ3 represents the distance on the x-axis between this inflection point and the point where the response is φ1/(1 + e^−1) ≈ 0.731 φ1. These parameters are shown in Figure C.7.
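As a usage sketch, the self-starting form SSlogis can be supplied directly to nls; the Orange tree data are an illustrative assumption:

```s
# usage sketch: SSlogis(x, Asym, xmid, scal) corresponds to
# phi1, phi2, phi3 in (C.7)
fm1 <- nls(circumference ~ SSlogis(age, Asym, xmid, scal),
           data = Orange)
# the automatically computed starting estimates alone:
getInitial(circumference ~ SSlogis(age, Asym, xmid, scal),
           data = Orange)
```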
C.7.1 Starting Estimates for SSlogis

The steps use initial fits of the logistic curve φ1/{1 + exp[(φ2 − x)/φ3]} to produce the starting estimates φ^(0).

The Michaelis–Menten model is

    y(x) = φ1 x / (φ2 + x).    (C.8)

FIGURE C.8. The Michaelis–Menten model used in enzyme kinetics. The parameters are φ1, the horizontal asymptote as x → ∞, and φ2, the value of x at which the response is φ1/2.

C.8.1 Starting Estimates for SSmicmen

The estimates are refined by fitting the partially linear form

    y = φ1 x / (φ2 + x),

in which φ1 enters linearly.
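A usage sketch for the self-starting Michaelis–Menten model (the Puromycin data and the treated-run subset are illustrative assumptions):

```s
# usage sketch: SSmicmen(x, Vm, K) with Vm = phi1 and K = phi2
PurTrt <- Puromycin[Puromycin$state == "treated", ]
fm1 <- nls(rate ~ SSmicmen(conc, Vm, K), data = PurTrt)
summary(fm1)
```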
Index
generalized nonlinear least squares,
333
getGroups, 100, 461
getInitial, 345
gls, 205, 249–267, 462
methods, 250
with correlation structures, 251
with variance functions, 251
GLS model, 203–205
Glucose2, see datasets
gnls, 332, 401–409, 464
methods, 402
GNLS model, 333
approximate distributions of
estimates, 335
gradient attribute, 339
grouped data, 97
balanced, 99
groupedData
balancedGrouped, 109
constructor, 101, 108
display formula, 98
inner factor, 107
outer factor, 104
groupedData, 466
growth curve data, 30
gsummary, 106, 121, 469
heteroscedasticity, 178, 201, 291
IGF, see datasets
Indometh, see datasets
information matrix, 82, 323
initial attribute, 344
intervals, 471
gnls objects, 403
lmList objects, 142, 473
lme objects, 156, 471
nlme objects, 363
nlsList objects, 281, 350
isotropic correlation, 226, 231
Laird-Ware model, see linear mixed-effects model
likelihood
components of, 71
extended LME model, 203
extended NLME model
lmeControl, 476
lmList, 32, 139–146, 478
confidence intervals, 142
methods, 140
Loblolly, see datasets
logistic model, 274, 338, 519
logLik, 479
MA, see moving average model
Machines, see datasets
Manhattan distance, 230
matrix logarithm, 78
maximum likelihood estimators
approximate distribution, 81
LME model, 66
mechanistic model, 274
Michaelis–Menten model, 520
MLE, see maximum likelihood
estimators
model
asymptotic regression, 301, 511
with an offset, 368
asymptotic regression through the
origin, 513
asymptotic regression with an
offset, 512
biexponential, 278, 514
empirical, 274
first-order open-compartment, 351, 516
four-parameter logistic, 410, 517
logistic, 274, 289, 338
logistic regression, 519
mechanistic, 274
Michaelis–Menten, 520
one-compartment, 295
one-compartment open with first-order absorption, 378
two-compartment, 278
moving average model, 229
multilevel model
likelihood, 77
linear mixed-effects, 40, 60
lme fit, 167
nonlinear mixed-effects, 309
naPattern argument to nlme, 298,
380
Newton–Raphson algorithm, 79
nlme library
obtaining, viii
nlme, 283, 479
fixed and random, 355
covariate modeling, 365–385
extended NLME model, 391
maximum likelihood estimation, 358
methods, 357
multilevel, 385–391
REML estimation, 387
single-level, 354–365
with nlsList object, 357
with variance functions, 391
NLME model, see nonlinear mixed-effects model
nlmeControl, 483
nls, 279, 338–342
nlsList, 280, 347–354, 485
methods, 349
nonidentifiability, 204
nonlinear least squares, 278
nonlinear mixed-effects model, 282
approximate distributions of
estimates, 322
Bayesian hierarchical, 311
compared to LME model, 273–277
extended, 328–332
likelihood estimation, 312
multilevel, 309–310
nonparametric maximum likelihood, 311
single-level, 306–309
nonlinear regression model, 278
NONMEM software, 310
normal plot
of random effects, 188
of residuals, 179, 180
nugget effect, 231
Oats, see datasets
one-compartment open model with first-order absorption, 378
one-way ANOVA, 411
fixed-effects model, 6
random-effects model, 7
Orange, see datasets
Orthodont, see datasets
orthogonal-triangular decomposition,
see QR decomposition
Ovary, see datasets
Oxboys, see datasets
Oxide, see datasets
pairs
lmList objects, 141
lme objects, 188, 190, 486
nlme objects, 359
partially linear models, 342
PBG, see datasets
PBIB, see datasets
pdMat, 157
classes, 158
pdBlocked, 162
pdCompSymm, 161
pdDiag, 158, 283, 364
pdIdent, 164
peeling, 278
penalized nonlinear least squares, 313
Phenobarb, see datasets
Pixel, see datasets
plot
Variogram objects, 494
gnls objects, 404
groupedData object, 492
groupedData objects, 105, 490
lme objects, 175, 488
lm objects, 135
nls objects, 341
nlsList objects, 350
positive-definite matrix, see variance–covariance
precision factor, 313
predictions
lme objects, 150, 495
augmented, 39, 361
BLUP, 37, 71, 94
multilevel model, 174
NLME model, 323
random effects, 37
response, 37, 94
pseudo-likelihood, 207
qqnorm
gls objects, 253
lme objects, 179, 180, 497
nlme objects, 361
random effects, 188
QR decomposition, 66, 326
Quasi-Newton algorithm, 79
Quinidine, see datasets
Rail, see datasets
random effects
crossed, 163
multilevel, 60
overparameterization, 156
single-level, 58
randomized block design, 12–21
ranef, 498
lmList objects, 501
lme objects, 499
rate constant, 278, 351, 379
relative precision factor, 59
parameterization for, 78
REML, see restricted maximum
likelihood
residuals
normalized, 239
Pearson, 149
response, 149
residuals S function, 503
restricted maximum likelihood
LME model, 75
NLME model, 314
SBC, see BIC
scatter-plot matrix, 359
Schwarz's Bayesian Criterion, see BIC
self-starting models, 342–347
available in nlme library, 346
SSasymp, 301, 511
SSasympOff, 369, 512
SSasympOrig, 513
SSbiexp, 279, 514
SSfol, 352, 516
SSfpl, 410, 517
SSlogis, 288, 347, 519
SSmicmen, 520
selfStart, 343
constructor, 504, 505
formula objects, 506
functions, 346
semivariogram, 230
robust, 231
serial correlation, 226–230
shrinkage estimates, 152
Soybean, see datasets
varExp, 211
varFixed, 208
varIdent, 209
varPower, 210, 217, 290
varReg, 268
variance covariate, 206
variance functions, 206–225
with gls, 251
with lme, 214
with nlme, 391
variance weights, 208
variance–covariance
components, 93
of random eects, 58
of response, 66
of within-group error, 202
pdMat classes, 157
Variogram, 245, 264, 507
lme objects, 508
varWeights, 208
volume of distribution, 295
Wafer, see datasets
Wheat2, see datasets
within-group error
assumptions, 58
correlation, 202
heteroscedasticity, 202