Modelling Survival Data in Medical Research describes the modelling approach to the analysis of survival data, using a wide range of examples from biomedical research. This third edition contains new chapters on frailty models and their applications, competing risks, non-proportional hazards, and dependent censoring. It also describes techniques for modelling the occurrence of multiple events and event history analysis.
Earlier chapters are now expanded to include new material on a number of topics, including measures of predictive ability and flexible parametric models. Many new data sets and examples are included to illustrate how these techniques are used in modelling survival data.
Features
• Presents an accessible introduction to statistical methods for handling survival data
• Includes modern statistical techniques for survival analysis and key references
• Contains real data examples with many new data sets
• Provides additional data sets that can be used for coursework
Bibliographic notes and suggestions for further reading are provided
at the end of each chapter.
Additional data sets, which can be used to obtain a fuller appreciation of the methodology or as student exercises, are provided in the appendix.
All data sets used in this book are also available in electronic format
online.
This book is an invaluable resource for statisticians in the pharmaceutical industry, professionals in medical research institutes, scientists and clinicians who are analysing their own data, and students following undergraduate or postgraduate courses in survival analysis.
David Collett
www.crcpress.com
CHAPMAN & HALL/CRC
Texts in Statistical Science Series
Series Editors
Francesca Dominici, Harvard School of Public Health, USA
Julian J. Faraway, University of Bath, UK
Martin Tanner, Northwestern University, USA
Jim Zidek, University of British Columbia, Canada
Modelling
Survival Data in
Medical Research
Third Edition
David Collett
NHS Blood and Transplant
Bristol, UK
First edition published in 1994 by Chapman and Hall.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmit-
ted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
Preface xv
1 Survival analysis 1
1.1 Special features of survival data 1
1.1.1 Censoring 2
1.1.2 Independent censoring 3
1.1.3 Study time and patient time 3
1.2 Some examples 5
1.3 Survivor, hazard and cumulative hazard functions 10
1.3.1 The survivor function 10
1.3.2 The hazard function 12
1.3.3 The cumulative hazard function 13
1.4 Computer software for survival analysis 14
1.5 Further reading 15
2.7 Comparison of three or more groups of survival data 50
2.8 Stratified tests 52
2.9 Log-rank test for trend 54
2.10 Further reading 56
Bibliography 499
This book describes and illustrates the modelling approach to the analysis
of survival data, using a wide range of examples from biomedical research.
My experience in presenting many lectures and courses on this subject, at
both introductory and advanced levels, as well as in providing advice on the
analysis of survival data, has had a big influence on its content. The result
is a comprehensive practical account of survival analysis at an intermediate
level, which I hope will continue to meet the needs of statisticians in the phar-
maceutical industry or medical research institutes, scientists and clinicians
who are analysing their own data, and students following undergraduate or
postgraduate courses in survival analysis.
In preparing this new edition, my aim has been to incorporate extensions
to the basic models that dramatically increase their scope, while updating
the text to take account of the wider availability of computer software for
implementing these techniques. This edition therefore contains new chapters
covering frailty models, non-proportional hazards, competing risks, multiple
events, event history analysis and dependent censoring. Additional material
on variable selection, non-linear models, measures of explained variation and
flexible parametric models has also been included in earlier chapters.
The main part of the book is formed by Chapters 1 to 7. After an intro-
duction to survival analysis in Chapter 1, Chapter 2 describes methods for
summarising survival data, and for comparing two or more groups of survival
times. The modelling approach is introduced in Chapter 3, where the Cox
regression model is presented in detail. This is followed by a chapter that de-
scribes methods for checking the adequacy of a fitted model. Parametric pro-
portional hazards models are covered in Chapter 5, with an emphasis on the
Weibull model for survival data. Chapter 6 describes parametric accelerated
failure time models, including a detailed account of their log-linear represen-
tation that is used in most computer software packages. Flexible parametric
models are also described and illustrated in this chapter, while model-checking
diagnostics for parametric models are presented in Chapter 7.
The remaining chapters describe a number of extensions to the basic mod-
els. The use of time-dependent variables is covered in Chapter 8, and the
analysis of interval-censored data is considered in Chapter 9. Frailty mod-
els that allow differences between individuals, or groups of individuals, to be
modelled using random effects, are described in Chapter 10. Chapter 11 sum-
marises techniques that can be used when the assumption of proportional
hazards cannot be made, and shows how these models can be used in compar-
ing survival outcomes across a number of institutions. Competing risk mod-
els that accommodate different causes of death are presented in Chapter 12,
while extensions of the Cox regression model to cope with multiple events of
the same or different types, including event history analysis, are described in
Chapter 13. Chapter 14 summarises methods for analysing data when there
is dependent censoring, and Chapter 15 shows how to determine the sample
size requirements of a study where the outcome variable is a survival time.
All of the techniques that have been described can be implemented in
many software packages for survival analysis, including the freeware package
R. However, sufficient methodological details have been included to convey a
sound understanding of the techniques and the assumptions on which they are
based, and to help in adapting the methodology to deal with non-standard
problems. Some examples in the earlier chapters are based on fewer observa-
tions than would normally be encountered in medical research programmes.
This enables the methods of analysis to be illustrated more easily, as well as
allowing tabular presentations of the results to be compared with output ob-
tained from computer software. Some additional data sets that may be used
to obtain a fuller appreciation of the methodology, or as student exercises, are
given in an Appendix. All of the data sets used in this book are available in
electronic form from the publisher’s web site at http://www.crcpress.com/.
In writing this book, I have assumed that the reader has a basic knowledge
of statistical methods, and has some familiarity with linear regression analysis.
Matrix algebra is used on occasions, but an understanding of linear algebra is
not an essential requirement. Bibliographic notes and suggestions for further
reading are given at the end of each chapter, but so as not to interrupt the
flow, references in the text itself have been kept to a minimum. Some sections
contain more mathematical details than others, and these have been denoted
with an asterisk. These sections can be omitted without loss of continuity.
I am indebted to Doug Altman, Alan Kimber, Mike Patefield, Anne White-
head and John Whitehead for their help in the preparation of the current and
earlier editions of the book, and to NHS Blood and Transplant for permission
to use data from the UK Transplant Registry in a number of the examples.
I also thank James Gallagher and staff of the Statistical Services Centre,
University of Reading, and my colleagues in the Statistics and Clinical Stud-
ies section of NHS Blood and Transplant, for giving me the opportunity to
rehearse the new material through courses and seminars. I am particularly
grateful to all those who took the trouble to let me know about errors in ear-
lier editions. Although these have been corrected, I would be very pleased to
be informed (d.collett@btinternet.com) of any further errors, ambiguities
and omissions in this edition. Finally, I would like to thank my wife Janet for
her support and encouragement over the period that this book was written.
David Collett
September, 2014
Chapter 1
Survival analysis
Survival analysis is the phrase used to describe the analysis of data in the
form of times from a well-defined time origin until the occurrence of some
particular event or end-point. In medical research, the time origin will often
correspond to the recruitment of an individual into an experimental study,
such as a clinical trial to compare two or more treatments. This in turn may
coincide with the diagnosis of a particular condition, the commencement of
a treatment regimen or the occurrence of some adverse event. If the end-
point is the death of a patient, the resulting data are literally survival times.
However, data of a similar form can be obtained when the end-point is not
fatal, such as the relief of pain, or the recurrence of symptoms. In this case, the
observations are often referred to as time to event data, and the methods for
analysing survival data that are presented in this book apply equally to data
on the time to these end-points. The methods can also be used in the analysis
of data from other application areas, such as the survival times of animals in
an experimental study, the time taken by an individual to complete a task in
a psychological experiment, the storage times of seeds held in a seed bank or
the lifetimes of industrial or electronic components. The focus of this book is
on the application of survival analysis to data arising from medical research,
and for this reason much of the general discussion will be phrased in terms of
the survival time of an individual patient from entry to a study until death.
1.1 Special features of survival data
The main feature of survival data that renders standard methods inappro-
priate is that survival times are frequently censored. Censoring is described in
the next section.
1.1.1 Censoring
The survival time of an individual is said to be censored when the end-point
of interest has not been observed for that individual. This may be because the
data from a study are to be analysed at a point in time when some individuals
are still alive. Alternatively, the survival status of an individual at the time
of the analysis might not be known because that individual has been lost
to follow-up. As an example, suppose that after being recruited to a clinical
trial, a patient moves to another part of the country, or to a different country,
and can no longer be traced. The only information available on the survival
experience of that patient is the last date on which he or she was known to
be alive. This date may well be the last time that the patient reported to a
clinic for a regular check-up.
An actual survival time can also be regarded as censored when death is
from a cause that is known to be unrelated to the treatment. However, it can
be difficult to be sure that the death is not related to a particular treatment
that the patient is receiving. For example, consider a patient in a clinical
trial to compare alternative therapies for prostatic cancer who experiences a
fatal road traffic accident. The accident could have resulted from an attack
of dizziness, which might be a side effect of the treatment to which that
patient has been assigned. If so, the death is not unrelated to the treatment.
In circumstances such as these, the survival time until death from all causes,
or the time to death from causes other than the primary condition for which
the patient is being treated, might also be subjected to a survival analysis.
In each of these situations, a patient who entered a study at time t0 dies
at time t0 + t. However, t is unknown, either because the individual is still
alive or because he or she has been lost to follow-up. If the individual was
last known to be alive at time t0 + c, the time c is called a censored survival
time. This censoring occurs after the individual has been entered into a study,
that is, to the right of the last known survival time, and is therefore known as
right censoring. The right-censored survival time is then less than the actual,
but unknown, survival time. Right censoring that occurs when the observation
period of a study ends is often termed administrative censoring.
Another form of censoring is left censoring, which is encountered when the
actual survival time of an individual is less than that observed. To illustrate
this form of censoring, consider a study in which interest centres on the time
to recurrence of a particular cancer following surgical removal of the primary
tumour. Three months after their operation, the patients are examined to
determine if the cancer has recurred. At this time, some of the patients may
be found to have a recurrence. For such patients, the actual time to recurrence
is less than three months, and the recurrence times of these patients are left-censored. Left censoring occurs far less commonly than right censoring, and
so the emphasis of this book will be on the analysis of right-censored survival
data.
Yet another type of censoring is interval censoring. Here, individuals are
known to have experienced an event within an interval of time. Consider again
the example concerning the time to recurrence of a tumour used in the above
discussion of left censoring. If a patient is observed to be free of the disease
at three months, but is found to have had a recurrence when examined six
months after surgery, the actual recurrence time of that patient is known to
be between three months and six months. The observed recurrence time is
then said to be interval-censored. We will return to interval censoring later,
in Chapter 9.
Figure 1.1 Study time for eight patients in a survival study, with the status of each patient at the end of follow-up marked D (dies), L (lost to follow-up) or A (alive).
This figure shows that individuals 1, 4, 5 and 8 die (D) during the course
of the study, individuals 2 and 7 are lost to follow-up (L), and individuals 3
and 6 are still alive (A) at the end of the observation period.
As far as each patient is concerned, the trial begins at some time t0 .
The corresponding survival times for the eight individuals depicted in Fig-
ure 1.1 are shown in order in Figure 1.2. The period of time that a patient
spends in the study, measured from that patient’s time origin, is often re-
ferred to as patient time. The period of time from the time origin to the
death of a patient (D) is then the survival time, and this is recorded for in-
dividuals 1, 4, 5 and 8. The survival times of the remaining individuals are
right-censored (C).
In practice, the actual data recorded will be the date on which each indi-
vidual enters the study, and the date on which each individual dies or was last
known to be alive. The survival time in days, weeks or months, whichever is
the most appropriate, can then be calculated. Most computer software pack-
ages for survival analysis have facilities for performing this calculation from
input data in the form of dates.
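For instance, in R, one of the packages discussed later in Section 1.4, the calculation can be sketched as follows; the dates here are invented purely for illustration.

```r
# A minimal sketch of the date calculation, using invented dates
entry <- as.Date(c("2012-01-10", "2012-03-05"))  # dates of entry to the study
last  <- as.Date(c("2013-06-01", "2012-11-20"))  # dates of death or last follow-up
as.numeric(last - entry)                         # survival times in days
```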
Figure 1.2 Survival times of eight patients, ordered by patient time, with death (D) or censoring (C) marked at the end of each follow-up period.
In this study, the event of interest was discontinuation of the IUD because of menstrual bleeding problems.
desire for pregnancy, or because they had no further need for a contracep-
tive, while others were simply lost to follow-up. These reasons account for the
censored discontinuation times of 13, 18, 23, 38, 54 and 56 weeks. The study
protocol called for the menstrual bleeding experience of each woman to be
documented for a period of two years from the time origin. For practical rea-
sons, each woman could not be examined exactly two years after recruitment
to determine if they were still using the IUD, and this is why there are three
discontinuation times greater than 104 weeks that are right-censored.
One objective in an analysis of these data would be to summarise the
distribution of discontinuation times. We might then wish to estimate the
median time to discontinuation of the IUD, or the probability that a woman
will stop using the device after a given period of time. Indeed, a graph of this
estimated probability, as a function of time, will provide a useful summary of
the observed data.
In the analysis of the data from this study, we will be particularly interested
in whether or not there is a difference in the survival experience of the two
groups of women. If there were evidence that those women with negative
HPA staining tended to live longer after surgery than those with positive
staining, we would conclude that the prognosis for a breast cancer patient
was dependent on the result of the staining procedure.
The probability that the survival time is less than some value $t$ is given by the distribution function of $T$,
$$F(t) = P(T < t) = \int_0^t f(u)\,du, \qquad (1.1)$$
where $f(u)$ is the probability density function of the survival times. This function is also called the cumulative incidence function, since it summarises the cumulative probability of death occurring before time $t$.
Table 1.4 Survival times of prostatic cancer patients in a clinical trial to compare
two treatments.
Patient Treatment Survival Status Age Serum Size of Gleason
number time haem. tumour index
1 1 65 0 67 13.4 34 8
2 2 61 0 60 14.6 4 10
3 2 60 0 77 15.6 3 8
4 1 58 0 64 16.2 6 9
5 2 51 0 65 14.1 21 9
6 1 51 0 61 13.5 8 8
7 1 14 1 73 12.4 18 11
8 1 43 0 60 13.6 7 9
9 2 16 0 73 13.8 8 9
10 1 52 0 73 11.7 5 9
11 1 59 0 77 12.0 7 10
12 2 55 0 74 14.3 7 10
13 2 68 0 71 14.5 19 9
14 2 51 0 65 14.4 10 9
15 1 2 0 76 10.7 8 9
16 1 67 0 70 14.7 7 9
17 2 66 0 70 16.0 8 9
18 2 66 0 70 14.5 15 11
19 2 28 0 75 13.7 19 10
20 2 50 1 68 12.0 20 11
21 1 69 1 60 16.1 26 9
22 1 67 0 71 15.6 8 8
23 2 65 0 51 11.8 2 6
24 1 24 0 71 13.7 10 9
25 2 45 0 72 11.0 4 8
26 2 64 0 74 14.2 4 6
27 1 61 0 75 13.7 10 12
28 1 26 1 72 15.3 37 11
29 1 42 1 57 13.9 24 12
30 2 57 0 72 14.6 8 10
31 2 70 0 72 13.8 3 9
32 2 5 0 74 15.1 3 9
33 2 54 0 51 15.8 7 8
34 1 36 1 72 16.4 4 9
35 2 70 0 71 13.6 2 10
36 2 67 0 73 13.8 7 8
37 1 23 0 68 12.5 2 8
38 1 62 0 63 13.2 3 8
The survivor function, $S(t)$, is defined to be the probability that the survival time is greater than or equal to $t$, and so from Equation (1.1),
$$S(t) = P(T \geq t) = 1 - F(t). \qquad (1.2)$$
The survivor function can therefore be used to represent the probability that an individual survives beyond any given time.
The hazard function, $h(t)$, represents the instantaneous risk of death at time $t$, conditional on survival to that time, and is defined by
$$h(t) = \lim_{\delta t \to 0} \left\{ \frac{P(t \leq T < t + \delta t \mid T \geq t)}{\delta t} \right\}. \qquad (1.3)$$
The function $h(t)$ is also referred to as the hazard rate, the instantaneous death rate, the intensity rate or the force of mortality.
From the definition of the hazard function in Equation (1.3), h(t) is the
event rate at time t, conditional on the event not having occurred before t.
Specifically, if the survival time is measured in days, h(t) is the approximate
probability that an individual, who is at risk of the event occurring at the
start of day t, experiences the event during that day. The hazard function at
time t can also be regarded as the expected number of events experienced by
an individual in unit time, given that the event has not occurred before then,
and assuming that the hazard is constant over that time period.
The definition of the hazard function in Equation (1.3) leads to some use-
ful relationships between the survivor and hazard functions. According to a
standard result from probability theory, the probability of an event A, condi-
tional on the occurrence of an event B, is given by P(A | B) = P(AB)/P(B),
where P(AB) is the probability of the joint occurrence of A and B. Using this
result, the conditional probability in the definition of the hazard function in
Equation (1.3) is
$$\frac{P(t \leq T < t + \delta t)}{P(T \geq t)},$$
which is equal to
$$\frac{F(t + \delta t) - F(t)}{S(t)},$$
where $F(t)$ is the distribution function of $T$. Then,
$$h(t) = \lim_{\delta t \to 0} \left\{ \frac{F(t + \delta t) - F(t)}{\delta t} \right\} \frac{1}{S(t)}.$$
Now,
$$\lim_{\delta t \to 0} \left\{ \frac{F(t + \delta t) - F(t)}{\delta t} \right\}$$
is the definition of the derivative of $F(t)$ with respect to $t$, which is $f(t)$, and so
$$h(t) = \frac{f(t)}{S(t)}. \qquad (1.4)$$
Taken together, Equations (1.1), (1.2) and (1.4) show that from any one of
the three functions, f (t), S(t), and h(t), the other two can be determined.
In particular,
$$h(t) = -\frac{d}{dt}\left\{ \log S(t) \right\}, \qquad (1.5)$$
and so
$$S(t) = \exp\{-H(t)\}, \qquad (1.6)$$
where
$$H(t) = \int_0^t h(u)\,du. \qquad (1.7)$$
The function $H(t)$ features widely in survival analysis, and is called the integrated or cumulative hazard function. From Equation (1.6), the cumulative hazard function can also be obtained from the survivor function, since $H(t) = -\log S(t)$.
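As a quick numerical check of these relationships, the following R sketch verifies Equations (1.4) and (1.6) for an exponential distribution, whose hazard is constant; the distribution and its rate parameter are arbitrary choices, not from the text.

```r
# Sketch: checking h(t) = f(t)/S(t) and S(t) = exp{-H(t)} for an
# exponential distribution with an arbitrary rate of 0.5
t <- seq(0.1, 5, by = 0.1)
S <- pexp(t, rate = 0.5, lower.tail = FALSE)  # survivor function S(t)
f <- dexp(t, rate = 0.5)                      # density function f(t)
h <- f / S                                    # Equation (1.4): constant hazard 0.5
H <- -log(S)                                  # cumulative hazard, H(t) = 0.5t
all.equal(h, rep(0.5, length(t)))             # TRUE
all.equal(H, 0.5 * t)                         # TRUE
```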
1.4 Computer software for survival analysis
Most of the techniques for analysing survival data that will be presented in
this book require suitable computer software for their implementation. Many
computer packages for survival analysis are now available, but of the commer-
cially available software packages, SAS (SAS Institute Inc.), S-PLUS (TIBCO
Software Inc.) and Stata (StataCorp) have the most extensive range of fa-
cilities. In addition, the R statistical computing environment (R Core Team,
2013) is free software, distributed under the terms of the GNU General Public
License. Both S-PLUS and R are modern implementations of the S statisti-
cal programming language, and include a comprehensive range of modules for
survival analysis. Any of these four packages can be used to carry out the
analyses described in subsequent chapters of this book.
In this book, the data sets used to illustrate the different methods of sur-
vival analysis have been analysed using SAS 9.4 (SAS Institute, Cary NC),
mainly using the procedures lifetest, lifereg and phreg. Where published
SAS macros have been used for more specialised analyses, these are docu-
mented in the ‘Further reading’ section of each chapter.
In some circumstances, numerical results in the output produced by soft-
ware packages may differ. This is often due to different default methods of
calculation being used. A particularly important example of this occurs when
a data set includes two or more individuals with the same survival times. In
this case, the SAS phreg procedure and the R package survival (Therneau,
2014) default to different methods of handling these tied observations, leading
to differences in the output. The default settings can of course be changed,
and the treatment of tied survival times is described in Section 3.3.2 of Chap-
ter 3. Differences in numerical values may also result from different settings
being used for parameters that control the convergence of certain iterative
procedures, and different methods being used for numerical optimisation.
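For example, in R's survival package the tied-data method can be set explicitly when fitting a Cox regression model, anticipating Chapter 3. The sketch below assumes a hypothetical data frame dat with variables time, status and treatment; coxph() defaults to the Efron method, whereas the SAS phreg procedure defaults to the Breslow method.

```r
library(survival)
# Sketch: matching SAS phreg's default treatment of ties by requesting
# the Breslow method explicitly ('dat' is a hypothetical data frame)
fit <- coxph(Surv(time, status) ~ treatment, data = dat, ties = "breslow")
```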
1.5 Further reading
Chapter 2

Some non-parametric procedures

The empirical survivor function is given by
$$\hat{S}(t) = \frac{\text{number of individuals with survival times} \geq t}{\text{number of individuals in the study}}. \qquad (2.1)$$
Notice that the empirical survivor function
is equal to unity for values of t before the first death time, and zero after the
final death time.
The estimated survivor function Ŝ(t) is assumed to be constant between
two adjacent death times, and so a plot of Ŝ(t) against t is a step-function.
The function decreases immediately after each observed survival time.
The ordered survival times of the 11 individuals, in months, are as follows:

11  13  13  13  13  13  14  14  15  15  17
Using Equation (2.1), the estimated values of the survivor function at times
11, 13, 14, 15 and 17 months are 1.000, 0.909, 0.455, 0.273 and 0.091. The
estimated value of the survivor function is unity from the time origin until 11
months, and zero after 17 months. A graph of the estimated survivor function
is given in Figure 2.1.
Figure 2.1 Estimated survivor function for the data from Example 2.1.
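The estimates in Example 2.1 can be reproduced directly from Equation (2.1); a minimal R sketch:

```r
# Empirical survivor function of Equation (2.1) for the Example 2.1 data
times <- c(11, 13, 13, 13, 13, 13, 14, 14, 15, 15, 17)
t_j   <- sort(unique(times))
s_hat <- sapply(t_j, function(t) mean(times >= t))
round(s_hat, 3)  # 1.000 0.909 0.455 0.273 0.091, as given in the text
```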
Table 2.1 Life-table estimate of the survivor function for the data from
Example 1.3.
Interval Time period dj cj nj n′j (n′j − dj )/n′j S ∗ (t)
1 0– 16 4 48 46.0 0.6522 1.0000
2 12– 10 4 28 26.0 0.6154 0.6522
3 24– 1 0 14 14.0 0.9286 0.4013
4 36– 3 1 13 12.5 0.7600 0.3727
5 48– 2 2 9 8.0 0.7500 0.2832
6 60– 4 1 5 4.5 0.1111 0.2124
Figure 2.2 Life-table estimate of the survivor function for the data from Example 1.3.
The form of the estimated survivor function obtained using this method
is sensitive to the choice of the intervals used in its construction, just as the
shape of a histogram depends on the choice of the class intervals. On the other
hand, the life-table estimate is particularly well suited to situations in which
the actual death times are unknown, and the only available information is the
number of deaths and the number of censored observations that occur in a
series of consecutive time intervals. In practice, such interval-censored survival
data occur quite frequently.
When the actual survival times are known, the life-table estimate can still
be used, as in Example 2.2, but the grouping of the survival times does result
in some loss of information. Alternative methods for estimating the survivor
function are then more appropriate, such as that leading to the Kaplan-Meier
estimate.
The Kaplan-Meier estimate of the survivor function is given by
$$\hat{S}(t) = \prod_{j=1}^{k} \left( \frac{n_j - d_j}{n_j} \right), \qquad (2.4)$$
for $t_{(k)} \leq t < t_{(k+1)}$, $k = 1, 2, \ldots, r$, with $\hat{S}(t) = 1$ for $t < t_{(1)}$, and where
t(r+1) is taken to be ∞. If the largest observation is a censored survival time,
t∗ , say, Ŝ(t) is undefined for t > t∗ . On the other hand, if the largest observed
survival time, t(r) , is an uncensored observation, nr = dr , and so Ŝ(t) is zero
for t > t(r) . A plot of the Kaplan-Meier estimate of the survivor function is
a step-function, in which the estimated survival probabilities are constant
between adjacent death times and decrease at each death time.
Equation (2.4) shows that, as for the life-table estimate of the survivor
function in Equation (2.3), the Kaplan-Meier estimate is formed as a product
of a series of estimated probabilities. In fact, the Kaplan-Meier estimate is
the limiting value of the life-table estimate in Equation (2.3) as the number
of intervals tends to infinity and their width tends to zero. For this reason,
the Kaplan-Meier estimate is also known as the product-limit estimate of the
survivor function.
Note that if there are no censored survival times in the data set, nj − dj =
nj+1 , j = 1, 2, . . . , k, in Equation (2.4), and on expanding the product we get
$$\hat{S}(t) = \frac{n_2}{n_1} \times \frac{n_3}{n_2} \times \cdots \times \frac{n_{k+1}}{n_k}.$$
This reduces to $n_{k+1}/n_1$, for $k = 1, 2, \ldots, r-1$, with $\hat{S}(t) = 1$ for $t < t_{(1)}$ and
Ŝ(t) = 0 for t > t(r) . Now, n1 is the number of individuals at risk just before
the first death time, which is the number of individuals in the sample, and nk+1
is the number of individuals with survival times greater than or equal to t(k+1) .
Consequently, in the absence of censoring, Ŝ(t) is simply the empirical survivor
function defined in Equation (2.1). The Kaplan-Meier estimate is therefore a
generalisation of the empirical survivor function that accommodates censored
observations.
The estimated survivor function, Ŝ(t), is plotted in Figure 2.4. Note that since the largest discontinuation time of 107 weeks is censored, Ŝ(t) is not defined beyond t = 107.
Figure 2.4 Kaplan-Meier estimate of the survivor function for the data from Exam-
ple 1.1.
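This estimate can be obtained with the survival package in R. In the sketch below, the data vector is pieced together from Example 1.1 and Table 2.4; the exact values of the censored times above 104 weeks are assumptions, so treat the data as illustrative.

```r
library(survival)
# Kaplan-Meier estimation for the IUD discontinuation times (reconstructed;
# status 1 = discontinuation, 0 = censored)
time   <- c(10, 13, 18, 19, 23, 30, 36, 38, 54, 56, 59, 75, 93, 97,
            104, 107, 107, 107)
status <- c(1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0)
fit <- survfit(Surv(time, status) ~ 1)
summary(fit)  # estimates and standard errors, cf. Table 2.4
plot(fit, xlab = "Discontinuation time", ylab = "Estimated survivor function")
```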
An alternative estimate of the survivor function, the Nelson-Aalen estimate, is given by
$$\tilde{S}(t) = \prod_{j=1}^{k} \exp(-d_j / n_j). \qquad (2.5)$$
Now,
$$e^{-x} = 1 - x + \frac{x^2}{2} - \frac{x^3}{6} + \cdots,$$
which is approximately equal to 1 − x when x is small. It then follows that
exp(−dj /nj ) ≈ 1 − (dj /nj ) = (nj − dj )/nj , so long as dj is small relative
to nj , which it will be except at the longest survival times. Consequently,
the Kaplan-Meier estimate, Ŝ(t), in Equation (2.4), approximates the Nelson-
Aalen estimate, S̃(t), in Equation (2.5).
The Nelson-Aalen estimate of the survivor function, also known as Altshuler's estimate, will always be greater than the Kaplan-Meier estimate at any given time, since $e^{-x} \geq 1 - x$ for all values of $x$. Although the Nelson-Aalen estimate has been shown to perform better than the Kaplan-Meier estimate in small samples, in many circumstances the estimates will be very similar, particularly at the earlier survival times. Since the Kaplan-Meier estimate is a generalisation of the empirical survivor function, the Kaplan-Meier estimate has much to commend it.
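A direct implementation of Equation (2.5) is straightforward; this sketch, with a function name of my own choosing, assumes individual-level data with a status indicator (1 = event, 0 = censored).

```r
# Nelson-Aalen estimate of the survivor function, Equation (2.5)
nelson_aalen <- function(time, status) {
  dt <- sort(unique(time[status == 1]))                       # ordered death times
  nj <- sapply(dt, function(t) sum(time >= t))                # numbers at risk
  dj <- sapply(dt, function(t) sum(time == t & status == 1))  # deaths at each time
  data.frame(time = dt, surv = exp(-cumsum(dj / nj)))
}
```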
From this table we see that the Kaplan-Meier and Nelson-Aalen estimates
of the survivor function differ by less than 0.04. However, when we consider
the precision of these estimates, which we do in Section 2.2, we see that a
difference of 0.04 is of no practical importance.
The Kaplan-Meier estimate of the survivor function can be written as
$$\hat{S}(t) = \prod_{j=1}^{k} \hat{p}_j,$$
for k = 1, 2, . . . , r, where p̂j = (nj −dj )/nj is the estimated probability that an
individual survives through the time interval that begins at t(j) , j = 1, 2, . . . , r.
Taking logarithms,
$$\log \hat{S}(t) = \sum_{j=1}^{k} \log \hat{p}_j,$$
and so the variance of $\log \hat{S}(t)$ is the sum of the variances of the $\log \hat{p}_j$, that is,
$$\text{var}\{\log \hat{S}(t)\} = \sum_{j=1}^{k} \text{var}\{\log \hat{p}_j\}. \qquad (2.6)$$
Now, the number of individuals who survive through the interval beginning
at t(j) can be assumed to have a binomial distribution with parameters nj
and pj , where pj is the true probability of survival through that interval. The
observed number who survive is nj −dj , and using the result that the variance
of a binomial random variable with parameters n, p is np(1 − p), the variance
of nj − dj is given by
var (nj − dj ) = nj pj (1 − pj ).
Since $\hat{p}_j = (n_j - d_j)/n_j$, the variance of $\hat{p}_j$ is $\text{var}(n_j - d_j)/n_j^2$, that is, $p_j(1 - p_j)/n_j$. The variance of $\hat{p}_j$ may then be estimated by
$$\hat{p}_j(1 - \hat{p}_j)/n_j. \qquad (2.7)$$
In order to obtain the variance of log p̂j , we make use of a general result
for the approximate variance of a function of a random variable. According to
this result, the variance of a function g(X) of the random variable X is given
by
$$\text{var}\{g(X)\} \approx \left\{ \frac{dg(X)}{dX} \right\}^2 \text{var}(X). \qquad (2.8)$$
This is known as the Taylor series approximation to the variance of a function of a random variable. Using Equation (2.8), the approximate variance of $\log \hat{p}_j$ is $\text{var}(\hat{p}_j)/\hat{p}_j^2$, and using Expression (2.7), the approximate estimated variance of $\log \hat{p}_j$ is $(1 - \hat{p}_j)/(n_j \hat{p}_j)$, which on substitution for $\hat{p}_j$ reduces to
$$\frac{d_j}{n_j(n_j - d_j)}. \qquad (2.9)$$
Summing these terms over the death times up to $t$ then gives
$$\text{var}\{\log \hat{S}(t)\} \approx \sum_{j=1}^{k} \frac{d_j}{n_j(n_j - d_j)}, \qquad (2.10)$$
so that
$$\text{var}\{\hat{S}(t)\} \approx [\hat{S}(t)]^2 \sum_{j=1}^{k} \frac{d_j}{n_j(n_j - d_j)}. \qquad (2.11)$$
Finally, the standard error of the Kaplan-Meier estimate of the survivor func-
tion, defined to be the square root of the estimated variance of the estimate,
is given by
$$\text{se}\{\hat{S}(t)\} \approx \hat{S}(t) \left\{ \sum_{j=1}^{k} \frac{d_j}{n_j(n_j - d_j)} \right\}^{1/2}, \qquad (2.12)$$
a result that is known as Greenwood's formula.
In the special case where there are no censored observations, $d_j = n_j - n_{j+1}$, and the sum in Equation (2.11) becomes
$$\sum_{j=1}^{k} \frac{n_j - n_{j+1}}{n_j n_{j+1}} = \sum_{j=1}^{k} \left( \frac{1}{n_{j+1}} - \frac{1}{n_j} \right) = \frac{n_1 - n_{k+1}}{n_1 n_{k+1}},$$
since $\hat{S}(t) = n_{k+1}/n_1$ for $t_{(k)} \leq t < t_{(k+1)}$, $k = 1, 2, \ldots, r-1$, in the absence
of censoring. Hence, from Equation (2.11), the estimated variance of Ŝ(t) is
Ŝ(t)[1 − Ŝ(t)]/n1 . This is an estimate of the variance of the empirical sur-
vivor function, given in Equation (2.1), on the assumption that the number
of individuals at risk at time t has a binomial distribution with parameters
n1 , S(t).
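Putting Equations (2.4) and (2.12) together gives a compact computation; the sketch below mirrors the formulae above, with a function name of my own choosing.

```r
# Kaplan-Meier estimate with Greenwood standard errors, Equations (2.4), (2.12)
km_greenwood <- function(time, status) {
  dt <- sort(unique(time[status == 1]))
  nj <- sapply(dt, function(t) sum(time >= t))
  dj <- sapply(dt, function(t) sum(time == t & status == 1))
  S  <- cumprod((nj - dj) / nj)                  # Equation (2.4)
  se <- S * sqrt(cumsum(dj / (nj * (nj - dj))))  # Equation (2.12)
  data.frame(time = dt, surv = S, se = se)
}
```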
Figure 2.5 Upper and lower α/2-points of the standard normal distribution.
The standard error of $\log\{-\log \hat{S}(t)\}$ is the square root of this quantity. This leads to $100(1-\alpha)\%$ limits of the form
$$\hat{S}(t)^{\exp[\pm z_{\alpha/2}\, \text{se}(\log\{-\log \hat{S}(t)\})]},$$
where $z_{\alpha/2}$ is the upper $\alpha/2$-point of the standard normal distribution.
Table 2.4 Standard error of Ŝ(t) and confidence intervals for S(t)
for the data from Example 1.1.
Time interval Ŝ(t) se {Ŝ(t)} 95% confidence interval
0– 1.0000 0.0000
10– 0.9444 0.0540 (0.839, 1.000)
19– 0.8815 0.0790 (0.727, 1.000)
30– 0.8137 0.0978 (0.622, 1.000)
36– 0.7459 0.1107 (0.529, 0.963)
59– 0.6526 0.1303 (0.397, 0.908)
75– 0.5594 0.1412 (0.283, 0.836)
93– 0.4662 0.1452 (0.182, 0.751)
97– 0.3729 0.1430 (0.093, 0.653)
107 0.2486 0.1392 (0.000, 0.522)
From this table we see that, in general, the standard error of the esti-
mated survivor function increases with the discontinuation time. The reason
for this is that estimates of the survivor function at later times are based on
fewer individuals. A graph of the estimated survivor function, with the 95%
confidence limits shown as dashed lines, is given in Figure 2.6.
It is important to observe that the confidence limits for a survivor func-
tion, illustrated in Figure 2.6, are only valid for any given time. Different
methods are needed to produce confidence bands that are such that there is
a given probability, 0.95 for example, that the survivor function is contained
in the band for all values of t. These bands will tend to be wider than the
band formed from the pointwise confidence limits. Details will not be included,
but references to these methods are given in the final section of this chapter.
Notice also that the width of these intervals is very much greater than the
difference between the Kaplan-Meier and Nelson-Aalen estimates of the sur-
vivor function, shown in Tables 2.2 and 2.3. Similar calculations lead to con-
fidence limits based on life-table and Nelson-Aalen estimates of the survivor
function.
Figure 2.6 Estimated survivor function and 95% confidence limits for S(t).
Figure 2.7 Life-table estimate of the hazard function for the data from Example 1.3.
Figure 2.8 Kaplan-Meier type estimate of the hazard function for the data from
Example 1.1.
Figure 2.8 shows a plot of the estimated hazard function. From this figure,
there is some evidence that the longer the IUD is used, the greater is the risk of
discontinuation, but the picture is not very clear. The approximate standard
errors of the estimated hazard function at different times are of little help in
interpreting this plot.
In practice, estimates of the hazard function obtained in this way will often
tend to be rather irregular. For this reason, plots of the hazard function may be
‘smoothed’, so that any pattern can be seen more clearly. There are a number
of ways of smoothing the hazard function, each of which leads to a weighted average of values of the estimated hazard ĥ(t) at death times in the neighbourhood of t.
For example, a kernel smoothed estimate of the hazard function, based on the
r ordered death times, t(1) , t(2) , . . . , t(r) , with dj deaths and nj at risk at time
t(j) , can be found from
$$h^{\dagger}(t) = b^{-1} \sum_{j=1}^{r} 0.75 \left\{ 1 - \left( \frac{t - t_{(j)}}{b} \right)^2 \right\} \frac{d_j}{n_j},$$
where the value of b needs to be chosen. The function h† (t) is defined for
all values of t in the interval from b to t(r) − b, where t(r) is the greatest
death time. For any value of t in this interval, the death times in the interval
(t − b, t + b) will contribute to the weighted average. The parameter b is known
as the bandwidth and its value controls the shape of the plot; the larger the
value of b, the greater the degree of smoothing. There are formulae that lead
to ‘optimal’ values of b, but these tend to be rather cumbersome. Fuller details
can be found in the references provided in the final section of this chapter.
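The kernel estimate above translates directly into code. In this sketch the function name is my own, and the restriction of the kernel to the interval (t − b, t + b), described in the text, is made explicit.

```r
# Kernel smoothed estimate of the hazard function at a single time t;
# dt, dj, nj are the ordered death times, deaths and numbers at risk,
# and b is the chosen bandwidth
smooth_hazard <- function(t, dt, dj, nj, b) {
  u <- (t - dt) / b
  sum(0.75 * (1 - u^2) * (abs(u) <= 1) * dj / nj) / b  # kernel zero outside (t-b, t+b)
}
```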
In this book, the use of a modelling approach to the analysis of survival data
is advocated, and so model-based estimates of the hazard function will be
considered in subsequent chapters.
If the Kaplan-Meier estimate of the survivor function is used, the estimated cumulative hazard function, $\hat{H}(t) = -\log \hat{S}(t)$, is given by
$$\hat{H}(t) = -\sum_{j=1}^{k} \log\left( \frac{n_j - d_j}{n_j} \right),$$
for $t_{(k)} \leq t < t_{(k+1)}$, $k = 1, 2, \ldots, r$, where $t_{(1)}, t_{(2)}, \ldots, t_{(r)}$ are the $r$ ordered death times, with $t_{(r+1)} = \infty$.
If the Nelson-Aalen estimate of the survivor function is used, the estimated
cumulative hazard function, H̃(t) = − log S̃(t), is given by
$$\tilde{H}(t) = \sum_{j=1}^{k} \frac{d_j}{n_j}.$$
This is the cumulative sum of the estimated probabilities of death from the
first to the kth time interval, k = 1, 2, . . . , r, and so this quantity has imme-
diate intuitive appeal as an estimate of the cumulative hazard.
An estimate of the cumulative hazard function also leads to an estimate
of the corresponding hazard function, since the differences between adjacent
values of the estimated cumulative hazard function provide estimates of the
underlying hazard, after dividing by the time interval. In particular, differences
in adjacent values of the Nelson-Aalen estimate of the cumulative hazard lead
directly to the hazard function estimate in Section 2.3.2.
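For example, using the first few death times of the IUD data, with the d_j and n_j values reconstructed from Table 2.4 and the Kaplan-Meier calculations:

```r
# Hazard estimates from increments of the Nelson-Aalen cumulative hazard
dt <- c(10, 19, 30, 36)   # first four ordered death times of the IUD data
dj <- c(1, 1, 1, 1)       # deaths at each time
nj <- c(18, 15, 13, 12)   # numbers at risk at each time
incr  <- dj / nj                         # increments of the cumulative hazard
h_hat <- incr[-length(incr)] / diff(dt)  # divide by the inter-death-time widths
```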
The $p$th percentile of the survival time distribution, $t(p)$, is the value such that $F\{t(p)\} = p/100$, that is, $S\{t(p)\} = 1 - (p/100)$; the median and the quartiles are obtained by taking $p$ to be 50, 25 and 75, respectively. Using the estimated survivor function, the estimated $p$th percentile is the smallest observed survival time, $\hat{t}(p)$, for which $\hat{S}\{\hat{t}(p)\} < 1 - (p/100)$.
It sometimes happens that the estimated survivor function is greater than
0.5 for all values of t. In such cases, the median survival time cannot be
estimated. It would then be natural to summarise the data in terms of other
percentiles of the distribution of survival times, or the estimated survival
probabilities at particular time points.
Estimates of the dispersion of a sample of survival data are not widely
used, but should such an estimate be required, the semi-interquartile range
(SIQR) can be calculated. This is defined to be half the difference between
the 75th and 25th percentiles of the distribution of survival times. Hence,
$$\text{SIQR} = \frac{1}{2}\left\{ t(75) - t(25) \right\},$$
where t(25) and t(75) are the 25th and 75th percentiles of the survival time
distribution. These two percentiles are also known as the first and third quar-
tiles, respectively. The corresponding sample-based estimate of the SIQR is
{t̂(75) − t̂(25)}/2. Like the variance, the larger the value of the SIQR, the
more dispersed is the survival time distribution.
The variance of the estimated $p$th percentile can be obtained from the Taylor series approximation in Equation (2.8), according to which
$$\text{var}[\hat{S}\{t(p)\}] \approx \left\{ \frac{d\hat{S}\{t(p)\}}{dt(p)} \right\}^2 \text{var}\{t(p)\}, \qquad (2.14)$$
where $t(p)$ is the $p$th percentile of the distribution and $\hat{S}\{t(p)\}$ is the Kaplan-Meier estimate of the survivor function at $t(p)$. Now,
$$-\frac{d\hat{S}\{t(p)\}}{dt(p)} = \hat{f}\{t(p)\},$$
an estimate of the probability density function of the survival times at $t(p)$, and on rearranging Equation (2.14), we get
$$\text{var}\{t(p)\} = \left( \frac{1}{\hat{f}\{t(p)\}} \right)^2 \text{var}[\hat{S}\{t(p)\}].$$
The standard error of t̂(p), the estimated pth percentile, is therefore given by
$$\text{se}\{\hat{t}(p)\} = \frac{1}{\hat{f}\{\hat{t}(p)\}}\, \text{se}[\hat{S}\{\hat{t}(p)\}]. \qquad (2.15)$$
The standard error of Ŝ{t̂(p)} is found using Greenwood’s formula for the
standard error of the Kaplan-Meier estimate of the survivor function, given in
Equation (2.12), while an estimate of the probability density function at t̂(p)
is
$$\hat{f}\{\hat{t}(p)\} = \frac{\hat{S}\{\hat{u}(p)\} - \hat{S}\{\hat{l}(p)\}}{\hat{l}(p) - \hat{u}(p)},$$
where
$$\hat{u}(p) = \max\left\{ t_{(j)} \;\middle|\; \hat{S}(t_{(j)}) \geq 1 - \frac{p}{100} + \epsilon \right\},$$
and
$$\hat{l}(p) = \min\left\{ t_{(j)} \;\middle|\; \hat{S}(t_{(j)}) \leq 1 - \frac{p}{100} - \epsilon \right\},$$
for $j = 1, 2, \ldots, r$, and small values of $\epsilon$. In many cases, taking $\epsilon = 0.05$ will be satisfactory, but a larger value of $\epsilon$ will be needed if $\hat{u}(p)$ and $\hat{l}(p)$ turn out to be equal. In particular, from Equation (2.15), the standard error of the median survival time is given by
$$\text{se}\{\hat{t}(50)\} = \frac{1}{\hat{f}\{\hat{t}(50)\}}\, \text{se}[\hat{S}\{\hat{t}(50)\}], \qquad (2.16)$$
where $\hat{f}\{\hat{t}(50)\}$ can be found from
$$\hat{f}\{\hat{t}(50)\} = \frac{\hat{S}\{\hat{u}(50)\} - \hat{S}\{\hat{l}(50)\}}{\hat{l}(50) - \hat{u}(50)}. \qquad (2.17)$$
In this expression, $\hat{u}(50)$ is the largest survival time for which the Kaplan-Meier estimate of the survivor function exceeds 0.55, and $\hat{l}(50)$ is the smallest survival time for which the survivor function is less than or equal to 0.45.
Once the standard error of the estimated $p$th percentile has been found, a $100(1-\alpha)\%$ confidence interval for $t(p)$ has limits of
$$\hat{t}(p) \pm z_{\alpha/2}\, \text{se}\{\hat{t}(p)\},$$
where $z_{\alpha/2}$ is the upper (one-sided) $\alpha/2$-point of the standard normal distribution.
This interval estimate is only approximate, in the sense that the probability
that the interval includes the true percentile will not be exactly 1 − α. A
number of methods have been proposed for constructing confidence intervals
for the median with superior properties, although these alternatives are more
difficult to compute than the interval estimate derived in this section.
For the data from Example 1.1 on the discontinuation times of IUD users, the estimated median is $\hat{t}(50) = 93$ weeks and, from Table 2.4, the Kaplan-Meier estimate at this time is 0.4662, with standard error 0.1452. Also,
$$\hat{u}(50) = \max\{t_{(j)} \mid \hat{S}(t_{(j)}) \geq 0.55\} = 75,$$
and
$$\hat{l}(50) = \min\{t_{(j)} \mid \hat{S}(t_{(j)}) \leq 0.45\} = 97,$$
so that, from Equation (2.17), $\hat{f}\{\hat{t}(50)\} = (0.5594 - 0.3729)/(97 - 75) = 0.00848$. From Equation (2.16), the standard error of the estimated median is $0.1452/0.00848 = 17.13$, and a 95% confidence interval for the median has limits of
$$93 \pm 1.96 \times 17.13,$$
and so the required interval estimate for the median ranges from 59 to 127 weeks.
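The arithmetic in this example can be checked in a few lines of R, with the values taken from Table 2.4:

```r
# Sketch reproducing the confidence interval for the median
se_s   <- 0.1452                         # Greenwood SE of S(t) at t = 93
f_hat  <- (0.5594 - 0.3729) / (97 - 75)  # Equation (2.17)
se_med <- se_s / f_hat                   # Equation (2.16): about 17.13
93 + c(-1, 1) * qnorm(0.975) * se_med    # approximately (59, 127)
```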
Figure 2.9 Kaplan-Meier estimate of the survivor functions for women with tumours
that were positively stained (—) and negatively stained (·······).
This figure shows that the estimated survivor function for those women
with negatively stained tumours is always greater than that for women with
positively stained tumours. This means that at any time t, the estimated
probability of survival beyond t is greater for women with negative staining,
suggesting that the result of the HPA staining procedure might be a useful
prognostic indicator. In particular, those women whose tumours are positively
stained appear to have a poorer prognosis than those with negatively stained
tumours.
There are two possible explanations for an observed difference between two
estimated survivor functions, such as those in Example 2.11. One explanation
is that there is a real difference between the survival times of the two groups
of individuals, so that those in one group have a different survival experience
from those in the other. An alternative explanation is that there are no real
differences between the survival times in each group, and that the difference
that has been observed is merely the result of chance variation. To help dis-
tinguish between these two possible explanations, we use a procedure known
as the hypothesis test. Because the concept of the hypothesis test has a central
role in the analysis of survival data, the underlying basis for this procedure is
described in detail in the following section.
The hypothesis test is a procedure that enables us to assess the extent to which
an observed set of data are consistent with a particular hypothesis, known as
the working or null hypothesis. A null hypothesis generally represents a sim-
plified view of the data-generating process, and is typified by hypotheses that
specify that there is no difference between two groups of survival data, or that
there is no relationship between survival time and explanatory variables such
as age or serum cholesterol level. The null hypothesis is then the hypothesis
that will be adopted, and subsequently acted upon, unless the data indicate
that it is untenable.
The next step is to formulate a test statistic that measures the ex-
tent to which the observed data depart from the null hypothesis. In gen-
eral, the test statistic is so constructed that the larger the value of the
statistic, the greater the departure from the null hypothesis. Hence, if the
null hypothesis is that there is no difference between two groups, relatively
large values of the test statistic will be interpreted as evidence against this
null hypothesis.
Once the value of the test statistic has been obtained from the observed
data, we calculate the probability of obtaining a value as extreme or more
extreme than the observed value, when the null hypothesis is true. This quan-
tity summarises the strength of the evidence in the sample data against the
null hypothesis, and is known as the probability value, or P-value for short.
If the P -value is large, we would conclude that it is quite likely that the ob-
served data would have been obtained when the null hypothesis was true, and
that there is no evidence to reject the null hypothesis. On the other hand, if
the P -value is small, this would be interpreted as evidence against the null
hypothesis; the smaller the P -value, the stronger the evidence.
In order to obtain the P -value for a hypothesis test, the test statistic must
have a probability distribution that is known, or at least approximately known,
when the null hypothesis is true. This probability distribution is referred to
as the null distribution of the test statistic. More specifically, consider a test
statistic, W , which is such that the larger the observed value of the test
statistic, w, the greater the deviation of the observed data from that expected
under the null hypothesis. If W has a continuous probability distribution,
the P -value is then P(W > w) = 1 − F (w), where F (w) is the distribution
function of W , under the null hypothesis, evaluated at w.
In some applications, the most natural test statistic is one for which large
positive values correspond to departures from the null hypothesis in one di-
rection, while large negative values correspond to departures in the opposite
direction. For example, suppose that patients suffering from a particular ill-
ness have been randomised to receive either a standard treatment or a new
treatment, and their survival times are recorded. In this situation, a null hy-
pothesis of interest will be that there is no difference in the survival experience
of the patients in the two treatment groups. The extent to which the data are
consistent with this null hypothesis might then be summarised by a test statis-
tic for which positive values indicate that the new treatment is superior to the
standard, while negative values indicate that the standard treatment is supe-
rior. When departures from the null hypothesis in either direction are equally
important, the null hypothesis is said to have a two-sided alternative, and the
hypothesis test itself is referred to as a two-sided test.
If W is a test statistic for which large positive or large negative observed
values lead to rejection of the null hypothesis, a new test statistic, such as
|W | or W 2 , can be defined, so that only large positive values of the new
statistic indicate that there is evidence against the null hypothesis. For ex-
ample, suppose that W is a test statistic that under the null hypothesis has
a standard normal distribution. If w is the observed value of W , the appro-
priate P-value is P(W ≤ −|w|) + P(W ≥ |w|), which in view of the symmetry of the standard normal distribution, is 2P(W ≥ |w|). Alternatively, we can make use of the result that if W has a standard normal distribution, W² has a chi-squared distribution on one degree of freedom, written χ₁². Thus, a P-value for the two-sided hypothesis test based on the statistic W is the probability that a χ₁² random variable exceeds w². The required P-value
can therefore be found from the standard normal or chi-squared distribution
functions.
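Both routes to the P-value are available through the standard distribution functions in R; here w is an arbitrary illustrative value of the test statistic.

```r
w <- 2.1                                 # an arbitrary illustrative value
2 * pnorm(-abs(w))                       # two-sided P-value from the normal distribution
pchisq(w^2, df = 1, lower.tail = FALSE)  # the same P-value via chi-squared on 1 df
```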
When interest centres on departures in a particular direction, the hypoth-
esis test is said to be one-sided. For example, in comparing the survival times
of two groups of patients where one group receives a standard treatment and
the other group a new treatment, it might be argued that the new treatment
cannot possibly be inferior to the standard. Then, the only relevant alternative
to the null hypothesis of no treatment difference is that the new treatment is
superior. If positive values of the test statistic W reflect the superiority of the
new treatment, the P -value is then P(W > w). If W has a standard normal
distribution, this P -value is half of that which would have been obtained for
the corresponding two-sided alternative hypothesis.
A one-sided hypothesis test can only be appropriate when there is no inter-
est whatsoever in departures from the null hypothesis in the opposite direction
to that specified in the one-sided alternative. For example, consider again the
comparison of a new treatment with a standard treatment, and suppose that
the observed value of the test statistic is either positive or negative, depending
on whether the new treatment is superior or inferior to the standard. If the
alternative to the null hypothesis of no treatment difference is that the new
treatment is superior, a large negative value of the test statistic would not be
regarded as evidence against the null hypothesis. Instead, it would be assumed
that this large negative value is simply the result of chance variation. Gen-
erally speaking, the use of one-sided tests can rarely be justified in medical
research, and so two-sided tests will be used throughout this book.
If a P -value is smaller than some value α, we say that the hypothesis is
rejected at the 100α% level of significance. The observed value of the test
statistic is then said to be significant at this level. But how do we decide on
the basis of the P -value whether or not a null hypothesis should actually be
rejected? Traditionally, P -values of 0.05 or 0.01 have been used in reaching a
decision about whether or not a null hypothesis should be rejected, so that if
P < 0.05, for example, the null hypothesis is rejected at the 5% significance
level. Guidelines such as these are not hard-and-fast rules and should not be
interpreted rigidly. For example, there is no practical difference between a
P -value of 0.046 and 0.056, even though only the former indicates that the
observed value of the test statistic is significant at the 5% level.
Instead of reporting that a null hypothesis is rejected or not rejected at
some specified significance level, a more satisfactory policy is to report the
actual P -value. This P -value can then be interpreted as a measure of the
strength of evidence against the null hypothesis, using a vocabulary that de-
pends on the range within which the P -value lies. Thus, if P > 0.1, there is
said to be no evidence to reject the null hypothesis; if 0.05 < P ≤ 0.1, there is slight evidence against the null hypothesis; if 0.01 < P ≤ 0.05, there is moderate evidence against the null hypothesis; if 0.001 < P ≤ 0.01, there is strong evidence against the null hypothesis, and if P ≤ 0.001, the evidence against
the null hypothesis is overwhelming.
An alternative to quoting the exact P -value associated with a hypothesis
test is to compare the observed value of the test statistic with those values
that would correspond to particular P -values, when the null hypothesis is
true. Values of the test statistic that lead to rejection of the null hypothesis at
particular levels of significance can be found from percentage points of the null
distribution of that statistic. In particular, if W is a test statistic that has a
standard normal distribution, for a two-sided test, the upper α/2-point of the
distribution, depicted in Figure 2.5, is the value of the test statistic for which
the P -value is α. For example, values of the test statistic of 1.96, 2.58 and 3.29
correspond to P -values of 0.05, 0.01 and 0.001. Thus, if the observed value of
W were between 1.96 and 2.58, we would declare that 0.01 < P < 0.05. On
the other hand, if the null distribution of W is chi-squared on one degree of
freedom, the upper α-point of the distribution is the value of the test statistic
which would give a P -value of α. Then, values of the test statistic of 3.84,
6.64 and 10.83 correspond to P -values of 0.05, 0.01 and 0.001, respectively.
Notice that these values are simply the squares of those for the standard
normal distribution, which they must be in view of the fact that the square
of a standard normal random variable has a chi-squared distribution on one
degree of freedom.
For commonly encountered probability distributions, such as the normal
and chi-squared, percentage points are tabulated in many introductory text
books on statistics, or in statistical tables such as those of Lindley and Scott
(1984). Statistical software packages used in computer-based statistical analy-
ses of survival data usually provide the exact P -values associated with hypoth-
esis tests as a matter of course. Note that when these are rounded off to, say,
three decimal places, a P -value of 0.000 should be interpreted as P < 0.001.
In deciding on a course of action, such as whether or not to reject the
hypothesis that there is no difference between two treatments, the statistical
evidence summarised in the P -value for the hypothesis test will be just one
ingredient of the decision-making process. In addition to the statistical evi-
dence, there will also be scientific evidence to consider. This may, for example,
concern whether the size of the treatment effect is clinically important. In par-
ticular, in a large trial, a difference between two treatments that is significant
at, say, the 5% level may be found when the magnitude of the treatment effect
is so small that it does not indicate a major scientific breakthrough. On the
other hand, a new formulation of a treatment may prolong life by a factor of
two, and yet, because of small sample sizes used in the study, may not appear
to be significantly different from the standard.
Rather than report findings in terms of the results of a hypothesis testing
procedure, it is more informative to provide an estimate of the size of any
treatment difference, supported by a confidence interval for this difference.
Unfortunately, the non-parametric approaches to the analysis of survival data
being considered in this chapter do not lend themselves to this approach. We
will therefore return to this theme in subsequent chapters when we consider
models for survival data.
In the comparison of two groups of survival data, there are a number of
methods that can be used to quantify the extent of between-group differences.
Two non-parametric procedures will now be considered, namely the log-rank
test and the Wilcoxon test.
Table 2.7 Number of deaths at the jth death time in each of two
groups of individuals.
Group Number of Number surviving Number at risk
deaths at t(j) beyond t(j) just before t(j)
I d1j n1j − d1j n1j
II d2j n2j − d2j n2j
Total dj nj − dj nj
Now consider the null hypothesis that there is no difference in the survival
experience of the individuals in the two groups. One way of assessing the
validity of this hypothesis is to consider the extent of the difference between
the observed number of individuals in the two groups who die at each of the
death times, and the numbers expected under the null hypothesis. Information
about the extent of these differences can then be combined over each of the
death times.
If the marginal totals in Table 2.7 are regarded as fixed, and the null
hypothesis that survival is independent of group is true, the four entries in
this table are solely determined by the value of d1j , the number of deaths at
t(j) in Group I. We can therefore regard d1j as a random variable, which can
take any value in the range from 0 to the minimum of dj and n1j . In fact,
d1j has a distribution known as the hypergeometric distribution, according to
which the probability that the random variable associated with the number
of deaths in the first group takes the value d1j is
$$\frac{\dbinom{d_j}{d_{1j}} \dbinom{n_j - d_j}{n_{1j} - d_{1j}}}{\dbinom{n_j}{n_{1j}}}. \qquad (2.18)$$
In this formula, the expression $\binom{d_j}{d_{1j}}$ represents the number of different ways in which $d_{1j}$ times can be chosen from $d_j$ times and is read as '$d_j$ C $d_{1j}$'. It is given by
$$\binom{d_j}{d_{1j}} = \frac{d_j!}{d_{1j}!\,(d_j - d_{1j})!},$$
where $d_j!$, read as '$d_j$ factorial', is such that
$$d_j! = d_j \times (d_j - 1) \times \cdots \times 2 \times 1.$$
The other two terms in Expression (2.18) are interpreted in a similar manner.
The mean of the hypergeometric random variable $d_{1j}$ is given by
$$e_{1j} = n_{1j}\, d_j / n_j, \qquad (2.19)$$
so that $e_{1j}$ is the expected number of individuals who die at time $t_{(j)}$ in
Group I. This value is intuitively appealing, since under the null hypothesis
that the probability of death at time t(j) does not depend on the group that
an individual is in, the probability of death at t(j) is dj /nj . Multiplying this
by n1j , gives e1j as the expected number of deaths in Group I at t(j) .
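As an aside, this mean (and, anticipating Equation (2.21), the variance) of the hypergeometric distribution is easily checked numerically. The short Python sketch below is illustrative only: the function name is mine, and the check uses the counts for the first death time in Table 2.8.

```python
from scipy.stats import hypergeom

def expected_deaths_group1(d_j, n_1j, n_j):
    """Mean of the hypergeometric distribution of d_1j:
    e_1j = n_1j * d_j / n_j, the expected deaths in Group I at t_(j)."""
    return n_1j * d_j / n_j

# First death time in Table 2.8: d_j = 1 death, n_1j = 13, n_j = 45.
print(expected_deaths_group1(1, 13, 45))   # 0.2889, as in Table 2.8
# Numerical check: population n_j = 45, d_j = 1 'death', n_1j = 13 drawn.
print(hypergeom(M=45, n=1, N=13).mean())   # agrees
```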
The next step is to combine the information from the individual 2 × 2
tables for each death time to give an overall measure of the deviation of the
observed values of d1j from their expected values. The most straightforward
way of doing this is to sum the differences d1j − e1j over the total number of
death times, r, in the two groups. The resulting statistic is given by
$$U_L = \sum_{j=1}^{r} (d_{1j} - e_{1j}). \qquad (2.20)$$
Notice that this is $\sum_j d_{1j} - \sum_j e_{1j}$, which is the difference between the total
observed and expected numbers of deaths in Group I. This statistic will have
zero mean, since E (d1j ) = e1j . Moreover, since the death times are indepen-
dent of one another, the variance of UL is simply the sum of the variances of
the d1j . Now, since d1j has a hypergeometric distribution, the variance of d1j
is given by
$$v_{1j} = \frac{n_{1j}\, n_{2j}\, d_j (n_j - d_j)}{n_j^2 (n_j - 1)}, \qquad (2.21)$$
so that the variance of UL is
$$\mbox{var}\,(U_L) = \sum_{j=1}^{r} v_{1j} = V_L, \qquad (2.22)$$
and, when the number of death times is not too small, $U_L$ has an approximate normal distribution, so that
$$\frac{U_L}{\sqrt{V_L}} \sim N(0, 1),$$
where the symbol ‘∼’ is read as ‘is distributed as’. The square of a standard
normal random variable has a chi-squared distribution on one degree of freedom, denoted $\chi_1^2$, and so we have that
$$\frac{U_L^2}{V_L} \sim \chi_1^2. \qquad (2.23)$$
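The whole log-rank calculation is easily scripted. The following is a minimal sketch, in which the per-death-time counts are assumed to be held in NumPy arrays and the function name is illustrative; applied to the columns of Table 2.8 below, it reproduces the totals in that table and the chi-squared value of 3.515 found for these data in Example 2.12.

```python
import numpy as np

def log_rank(d1, n1, d2, n2):
    """Log-rank test for two groups of survival data.

    d1, n1: deaths and numbers at risk in Group I at each death time t_(j);
    d2, n2: the corresponding quantities for Group II.
    Returns U_L, V_L and W_L, from Equations (2.20) to (2.23)."""
    d, n = d1 + d2, n1 + n2
    e1 = n1 * d / n                                 # Equation (2.19)
    v1 = n1 * n2 * d * (n - d) / (n**2 * (n - 1))   # Equation (2.21)
    U_L = np.sum(d1 - e1)                           # Equation (2.20)
    V_L = np.sum(v1)                                # Equation (2.22)
    return U_L, V_L, U_L**2 / V_L                   # W_L ~ chi-squared(1)
```

For the counts in Table 2.8, this gives UL = 5 − 9.5652 = −4.565, VL = 5.929, and hence WL = 3.515.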
Table 2.8 Calculation of the log-rank statistic for the data from Example 1.2.
Death time d1j n1j d2j n2j dj nj e1j v1j
5 0 13 1 32 1 45 0.2889 0.2054
8 0 13 1 31 1 44 0.2955 0.2082
10 0 13 1 30 1 43 0.3023 0.2109
13 0 13 1 29 1 42 0.3095 0.2137
18 0 13 1 28 1 41 0.3171 0.2165
23 1 13 0 27 1 40 0.3250 0.2194
24 0 12 1 27 1 39 0.3077 0.2130
26 0 12 2 26 2 38 0.6316 0.4205
31 0 12 1 24 1 36 0.3333 0.2222
35 0 12 1 23 1 35 0.3429 0.2253
40 0 12 1 22 1 34 0.3529 0.2284
41 0 12 1 21 1 33 0.3636 0.2314
47 1 12 0 20 1 32 0.3750 0.2344
48 0 11 1 20 1 31 0.3548 0.2289
50 0 11 1 19 1 30 0.3667 0.2322
59 0 11 1 18 1 29 0.3793 0.2354
61 0 11 1 17 1 28 0.3929 0.2385
68 0 11 1 16 1 27 0.4074 0.2414
69 1 11 0 15 1 26 0.4231 0.2441
71 0 9 1 15 1 24 0.3750 0.2344
113 0 6 1 10 1 16 0.3750 0.2344
118 0 6 1 8 1 14 0.4286 0.2449
143 0 6 1 7 1 13 0.4615 0.2485
148 1 6 0 6 1 12 0.5000 0.2500
181 1 5 0 4 1 9 0.5556 0.2469
Total 5 9.5652 5.9289
The Wilcoxon test is based on the statistic
$$U_W = \sum_{j=1}^{r} n_j (d_{1j} - e_{1j}),$$
where, as in the previous section, $d_{1j}$ is the number of deaths at time $t_{(j)}$ in the first group and $e_{1j}$ is as defined in Equation (2.19). The difference
between UW and UL is that in the Wilcoxon test, each difference d1j − e1j is
weighted by nj , the total number of individuals at risk at time t(j) . The effect
of this is to give less weight to differences between d1j and e1j at those times
when the total number of individuals who are still alive is small, that is, at
the longest survival times. This statistic is therefore less sensitive than the
log-rank statistic to deviations of d1j from e1j in the tail of the distribution
of survival times.
The variance of the Wilcoxon statistic $U_W$ is given by
$$V_W = \sum_{j=1}^{r} n_j^2\, v_{1j},$$
where $v_{1j}$ is given in Equation (2.21), and so the Wilcoxon test statistic is
$$W_W = U_W^2 / V_W,$$
which has a chi-squared distribution on one degree of freedom when the null
hypothesis is true. The Wilcoxon test is therefore conducted in the same man-
ner as the log-rank test.
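In code, the only change from the log-rank sketch given earlier is the weighting of each difference and each variance term by nj; again, this is an illustrative sketch rather than a definitive implementation.

```python
import numpy as np

def wilcoxon_test(d1, n1, d2, n2):
    """Wilcoxon test for two groups: weight each d1j - e1j by n_j."""
    d, n = d1 + d2, n1 + n2
    e1 = n1 * d / n
    v1 = n1 * n2 * d * (n - d) / (n**2 * (n - 1))
    U_W = np.sum(n * (d1 - e1))        # n_j-weighted differences
    V_W = np.sum(n**2 * v1)            # variance of U_W
    return U_W, V_W, U_W**2 / V_W      # W_W ~ chi-squared(1)
```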
If the two hazard functions are proportional, so that $h_1(t) = \psi\, h_2(t)$, where $\psi$ is a constant, and $S_1(t)$ and $S_2(t)$ are the survivor functions for the two groups of survival data, then, from Equation (2.24),
$$S_1(t) = \{S_2(t)\}^{\psi}.$$
Since the survivor function takes values between zero and unity, this result
shows that S1 (t) is greater than or less than S2 (t), according to whether ψ is
less than or greater than unity, at any time t. This means that if two hazard
functions are proportional, the true survivor functions do not cross. This is a
necessary, but not a sufficient condition for proportional hazards.
An informal assessment of the likely validity of the proportional hazards
assumption can be made from a plot of the estimated survivor functions for
two groups of survival data, such as that shown in Figure 2.9. If the two esti-
mated survivor functions do not cross, the assumption of proportional hazards
may be justified, and the log-rank test is appropriate. Of course, sample-based
estimates of survivor functions may cross even though the corresponding true
hazard functions are proportional, and so some care is needed in the interpre-
tation of such graphs. A more satisfactory graphical method for assessing the
validity of the proportional hazards assumption is described in Section 4.4.1
of Chapter 4.
In summary, unless a plot of the estimated survival functions, or previous
data, indicate that there is good reason to doubt the proportional hazards
assumption, the log-rank test should be used to test the hypothesis of equality
of two survivor functions.
Example 2.14 Prognosis for women with breast cancer
From the graph of the two estimated survivor functions in Figure 2.9, we see
that the survivor function for the negatively stained women always lies above
that for the positively stained women. This suggests that the proportional
hazards assumption is appropriate, and that the log-rank test is more appro-
priate than the Wilcoxon test. However, in this example, there is very little
difference between the results of the two hypothesis tests.
for k, k ′ = 1, 2, . . . , g − 1.
Finally, in order to test the null hypothesis of no group differences, we
make use of the result that the test statistic $U_L' V_L^{-1} U_L$, or $U_W' V_W^{-1} U_W$,
has a chi-squared distribution on (g − 1) degrees of freedom, when the null
hypothesis is true.
Statistical software for the analysis of survival data usually incorporates
this methodology, and because the interpretation of the resulting chi-squared
statistic is straightforward, an example will not be given here.
2.8 Stratified tests
In many circumstances, there is a need to compare two or more sets of survival
data, after taking account of additional variables recorded on each individual.
As an illustration, consider a multicentre clinical trial in which two forms of
chemotherapy are to be compared in terms of their effect on the survival times
of lung cancer patients. Information on the survival times of patients in each
treatment group will be available from each centre. The resulting data are
then said to be stratified by centre.
Individual log-rank or Wilcoxon tests based on the data from each centre
will be informative, but a test that combines information about the treatment
difference in each centre provides a more precise summary of the treatment
effect. A similar situation would arise in attempting to test for treatment
differences when patients are stratified according to variables such as age
group, sex, performance status and other potential risk factors for the disease
under study.
In situations such as those described above, a stratified version of the log-
rank or Wilcoxon test may be employed. Essentially, this involves calculating
the values of the U - and V -statistics for each stratum, and then combining
these values over the strata. In this section, the stratified log-rank test will
be described, but a stratified version of the Wilcoxon test can be obtained in
a similar manner. An equivalent analysis, based on a model for the survival
times, is described in Section 11.2 of Chapter 11.
Let ULk be the value of the log-rank statistic for comparing two treatment
groups, computed from the kth of s strata using Equation (2.20). Also, denote
the variance of the statistic for the kth stratum by VLk , where VLk would be
computed for each stratum using Equation (2.22). The stratified log-rank test
is then based on the statistic
$$W_S = \frac{\left( \sum_{k=1}^{s} U_{Lk} \right)^2}{\sum_{k=1}^{s} V_{Lk}}, \qquad (2.25)$$
which has a chi-squared distribution on one degree of freedom (1 d.f.) under the
null hypothesis that there is no treatment difference. Comparing the observed
value of this statistic with percentage points of the chi-squared distribution
enables the hypothesis of no overall treatment difference to be tested.
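Computationally, the stratified test is a one-line combination of the per-stratum results; the sketch below assumes that the ULk and VLk values have already been obtained, stratum by stratum, for example from the log-rank sketch given earlier.

```python
def stratified_log_rank(U_strata, V_strata):
    """Stratified log-rank statistic of Equation (2.25):
    W_S = (sum of U_Lk over strata)^2 / (sum of V_Lk over strata),
    referred to a chi-squared distribution on 1 d.f."""
    return sum(U_strata)**2 / sum(V_strata)
```

With the per-stratum sums 1.2374 and 2.2246 obtained from Table 2.10 in the example below, this function returns 0.688.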
These data are analysed by first computing the log-rank statistics for com-
paring the survival times of patients in the two treatment groups, separately
for each age group. The resulting values of the U -, V - and W -statistics, found
using Equations (2.20), (2.22) and (2.23), are summarised in Table 2.10.
The values of the WL -statistic are quite similar for the three age groups,
suggesting that the treatment effect is consistent over these groups. Moreover,
none of them are significantly large at the 10% level.
To carry out a stratified log-rank test on these data, we calculate the
WS -statistic defined in Equation (2.25). Using the results in Table 2.10,
$$W_S = \frac{1.2374^2}{2.2246} = 0.688.$$
The observed value of WS is not significant when compared with percentage
points of the chi-squared distribution on 1 d.f. We therefore conclude that after
allowing for the different age groups, there is no significant difference between
the survival times of patients treated with the BCG vaccine and those treated
with C. parvum.
For comparison, when the division of the patients into the different age
groups is ignored, the log-rank test for comparing the two groups of patients
leads to WL = 0.756. The fact that this is so similar to the value that allows for
age group differences suggests that it is not necessary to stratify the patients
by age.
The stratified log-rank test can be extended to compare more than two treat-
ment groups. The resulting formulae render it unsuitable for hand calculation,
but the methodology can be implemented using computer software for sur-
vival analysis. However, this method of taking account of additional variables
is not as flexible as that based on a modelling approach, introduced in the
next chapter, and so further details are not given here.
are the observed and expected numbers of deaths in the kth group, where
the summation is over the rk death times in that group. Note that the dot
subscript in the notation dk. and ek. stands for summation over the subscript
that the dot replaces. The codes are often taken to be equally spaced to
correspond to a linear trend across the groups. For example, if there are three
groups, the codes might be taken to be 1, 2 and 3, although the equivalent
choice of −1, 0 and 1 does simplify the calculations somewhat. The variance
of UT is given by
$$V_T = \sum_{k=1}^{g} (w_k - \bar{w})^2\, e_{k.}, \qquad (2.27)$$
where $\bar{w}$ is a weighted sum of the quantities $w_k$, in which the expected numbers of deaths, $e_{k.}$, are the weights, that is,
$$\bar{w} = \frac{\sum_{k=1}^{g} w_k\, e_{k.}}{\sum_{k=1}^{g} e_{k.}}.$$
The statistic $W_T = U_T^2 / V_T$ then has a chi-squared distribution on 1 d.f. under the hypothesis of no trend across the g groups.
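Although Equation (2.26) itself is not reproduced above, the form of the variance in Equation (2.27) indicates that the trend statistic is a weighted sum of differences between observed and expected numbers of deaths. A minimal sketch under that assumption, with illustrative names, is as follows.

```python
import numpy as np

def log_rank_trend(w, d_obs, e_exp):
    """Log-rank test for trend across g ordered groups.

    w: codes w_k; d_obs, e_exp: observed (d_k.) and expected (e_k.)
    numbers of deaths in each group. Assumes Equation (2.26) has the
    form U_T = sum of w_k (d_k. - e_k.)."""
    w, d_obs, e_exp = map(np.asarray, (w, d_obs, e_exp))
    U_T = np.sum(w * (d_obs - e_exp))            # assumed Equation (2.26)
    w_bar = np.sum(w * e_exp) / np.sum(e_exp)    # weighted mean of codes
    V_T = np.sum((w - w_bar)**2 * e_exp)         # Equation (2.27)
    return U_T, V_T, U_T**2 / V_T                # W_T ~ chi-squared(1)
```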
The log-rank test for trend is based on the statistic $U_T$ in Equation (2.26), computed from the observed and expected numbers of deaths in each of the g groups.
Using the values of the expected numbers of deaths in each group, given in
Table 2.11, the weighted mean of the wk ’s is given by
$$\bar{w} = \frac{e_{3.} - e_{1.}}{e_{1.} + e_{2.} + e_{3.}} = 0.5138.$$
The three values of $(w_k - \bar{w})^2$ are 0.2364, 0.2640 and 2.2917, and, from Equation (2.27), $V_T = 2.4849$. Finally, the test statistic is
$$W_T = \frac{U_T^2}{V_T} = 2.656,$$
which is just about significant at the 10% level (P = 0.103) when judged
against a chi-squared distribution on 1 d.f. We therefore conclude that there
is slight evidence of a linear trend across the age groups.
An alternative method of examining whether there is a trend across the
levels of an ordered categorical variable, based on a modelling approach to
the analysis of survival data, is described and illustrated in Section 3.8.1 of
the next chapter.
2.10 Further reading
The life-table, which underpins the calculation of the life-table estimate of
the survivor function, is widely used in the analysis of data from epidemio-
logical studies. Fuller details of this application can be found in Armitage,
Berry and Matthews (2002), and books on statistical methods in demography
and epidemiology, such as Pollard, Yusuf and Pollard (1990) and Woodward
(2014).
The product-limit estimate of the survivor function has been in use since
the early 1900s. Kaplan and Meier (1958) derived the estimate using the
method of maximum likelihood, which is why the estimate now bears their
name. The properties of the Kaplan-Meier estimate of the survivor function
have been further explored by Breslow and Crowley (1974) and Meier (1975).
The Nelson-Aalen estimate is due to Altshuler (1970), Nelson (1972) and
Aalen (1978b).
The expression for the standard error of the Kaplan-Meier estimate was
first given by Greenwood (1926), but an alternative result is given by Aalen
and Johansen (1978). Expressions for the variance of the Nelson-Aalen esti-
mate of the cumulative hazard function are compared by Klein (1991). Al-
though Section 2.2.3 shows how a confidence interval for the value of the sur-
vivor function at particular times can be found using Greenwood’s formula,
alternative procedures are needed for the construction of confidence bands
for the complete survivor function. Hall and Wellner (1980) and Efron (1981)
have shown how such bands can be computed, and these procedures are also
described by Harris and Albert (1991).
Methods for constructing confidence intervals for the median survival time
are described by Brookmeyer and Crowley (1982), Emerson (1982), Nair
(1984), Simon and Lee (1982) and Slud, Byar and Green (1984). Simon (1986)
emphasises the importance of confidence intervals in reporting the results of
clinical trials, and includes an illustration of a method described in Slud, Byar
and Green (1984). Klein and Moeschberger (2005) include a comprehensive
review of kernel-smoothed estimates of the hazard function.
The formulation of the hypothesis testing procedure in the frequentist
approach to inference is covered in many statistical texts. See, for example,
Altman (1991) and Armitage, Berry and Matthews (2002) for non-technical
presentations of the ideas in a medical context.
The log-rank test results from the work of Mantel and Haenszel (1959),
Mantel (1966) and Peto and Peto (1972). See Lawless (2002) for details of the
rank test formulation. A thorough review of the hypergeometric distribution,
used in the derivation of the log-rank test in Section 2.6.2, is included in
Johnson, Kemp and Kotz (2005). The log-rank test for trend is derived from
the test for trend in a 2 × k contingency table, given in Armitage, Berry and
Matthews (2002).
Chapter 3
The Cox Regression Model
to influence survival time, and so it will be important to take account of these
variables when assessing the extent of any treatment difference.
In the analysis of survival data, interest centres on the risk or hazard of
death at any time after the time origin of the study. As a consequence, the
hazard function, defined in Section 1.3.2 of Chapter 1, is modelled directly in
survival analysis. The resulting models are somewhat different in form from
linear models encountered in regression analysis and in the analysis of data
from designed experiments, where the dependence of the mean response, or
some function of it, on certain explanatory variables is modelled. However,
many of the principles and procedures used in linear modelling carry over to
the modelling of survival data.
There are two broad reasons for modelling survival data. One objective of
the modelling process is to determine which combination of potential explana-
tory variables affect the form of the hazard function. In particular, the effect
that the treatment has on the hazard of death can be studied, as can the ex-
tent to which other explanatory variables affect the hazard function. Another
reason for modelling the hazard function is to obtain an estimate of the hazard
function itself for an individual. This may be of interest in its own right, but
in addition, from the relationship between the survivor function and hazard
function described by Equation (1.6), an estimate of the survivor function
can be found. This will in turn lead to an estimate of quantities such as the
median survival time, which will be a function of the explanatory variables in
the model. The median survival time could then be estimated for current or
future patients with particular values of these explanatory variables. The re-
sulting estimate could be particularly useful in devising a treatment regimen,
or in counselling the patient about their prognosis.
The model for survival data to be described in this chapter is based on the
assumption of proportional hazards, introduced in Section 2.6.4 of Chapter 2,
and is called a proportional hazards model. We first develop the model for the
comparison of the hazard functions for individuals in two groups.
Notice that there is no constant term in the linear component of this propor-
tional hazards model. If a constant term β0 , say, were included, the baseline
hazard function could simply be rescaled by dividing h0 (t) by exp(β0 ), and
the constant term would cancel out. The model in Equation (3.3) can also be
re-expressed in the form
$$\log \left\{ \frac{h_i(t)}{h_0(t)} \right\} = \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi},$$
to give a linear model for the logarithm of the hazard ratio.
The model in Equation (3.3), in which no assumptions are made about
the actual form of the baseline hazard function h0 (t), was introduced by Cox
(1972) and has come to be known as the Cox regression model or the Cox pro-
portional hazards model. Since no particular form of probability distribution is
assumed for the survival times, the Cox regression model is a semi-parametric
model, and Section 3.3 will show how the β-coefficients in this model can be
estimated. Of course, we will often need to estimate h0 (t) itself, and we will
see how this can be done in Section 3.10. Models in which specific assump-
tions are made about the form of the baseline hazard function, h0 (t), will be
described in Chapters 5 and 6.
In models such as this, the baseline hazard function, h0 (t), is the hazard
function for an individual for whom all the variates included in the model
take the value zero.
The term αj can be incorporated in the linear part of the Cox regression
model by including the a − 1 explanatory variables X2 , X3 , . . . , Xa with coef-
ficients α2 , α3 , . . . , αa . In other words, the term αj in the model is replaced
by α2 x2 + α3 x3 + · · · + αa xa , where xj is the value of Xj for an individual
for whom A is at level j, j = 2, 3, . . . , a. There are then a − 1 parameters
associated with the main effect of the factor A, and A is said to have a − 1
degrees of freedom.
$$(\alpha\beta)_{22}\, u_2 v_2 + (\alpha\beta)_{23}\, u_2 v_3.$$
There are therefore two parameters associated with the interaction between A
and B. In general, if A and B have a and b levels, respectively, the two-factor
interaction AB has (a−1)(b−1) parameters associated with it, in other words
AB has (a − 1)(b − 1) degrees of freedom. Furthermore, the term (αβ)jk is
equal to zero whenever either A or B are at the first level, that is, when either
j = 1 or k = 1.
Let the coefficients of the values of the products U2 X and U3 X be α′2 and α′3, respectively, and let the coefficient of the value of the variate X in the model be β. Then, the model contains the terms βx + α′2 (u2 x) + α′3 (u3 x). From Table 3.3, u2 = 0 and u3 = 0 for individuals at level 1 of A, and so the coefficient of x for these individuals is just β. For those at level 2 of A, u2 = 1 and u3 = 0, and the coefficient of x is β + α′2. Similarly, at level 3 of A, u2 = 0 and u3 = 1, and the coefficient of x is β + α′3.
Notice that if the term βx is omitted from the model, the coefficient of x for individuals at the first level of A would be zero. There would then be no information
about the relationship between the hazard function and the variate X for
individuals at the first level of the factor A.
The manipulation described in the preceding paragraphs can be avoided
by defining the indicator variables in a different way. If a factor A has a levels,
and it is desired to include the term αj x in a model, without necessarily
including the term βx, a indicator variables Z1 , Z2 , . . . , Za can be defined for
A, where Zj = 1 at level j of A and zero otherwise. The values of the products Zj X for an individual, z1 x, z2 x, . . . , za x, are then included in the
model with coefficients α1 , α2 , . . . , αa . These are the coefficients of x for each
level of A.
Now, if the variate X is included in the model, along with the a products of the form Zj X, there will be a + 1 terms associated with the a coefficients of x. It will not then be possible to obtain unique estimates of each of these α-coefficients, and the model is said to be overparameterised. This overpa-
rameterisation can be dealt with by forcing one of the a + 1 coefficients to be
zero. In particular, taking α1 = 0 would be equivalent to a redefinition of the
indicator variables, in which Z1 is taken to be zero. This then leads to the
same formulation of the model that has already been discussed.
The application of these ideas in the analysis of actual data sets will be
illustrated in Section 3.4, after we have seen how the Cox regression model
can be fitted.
in which x(j) is the vector of covariates for the individual who dies at the jth
ordered death time, t(j) . The summation in the denominator of this likelihood
function is the sum of the values of exp(β ′ x) over all individuals who are at
risk at time t(j) . Notice that the product is taken over the individuals for whom
death times have been recorded. Individuals for whom the survival times are
censored do not contribute to the numerator of the log-likelihood function,
but they do enter into the summation over the risk sets at death times that
occur before a censored time.
The likelihood function that has been obtained is not a true likelihood,
since it does not make direct use of the actual censored and uncensored sur-
vival times. For this reason it is referred to as a partial likelihood function.
The likelihood function in Equation (3.4) depends only on the ranking of the
death times, since this determines the risk set at each death time. Conse-
quently, inferences about the effect of explanatory variables on the hazard
function depend only on the rank order of the survival times.
Now suppose that the data consist of n observed survival times, denoted
by t1 , t2 , . . . , tn , and that δi is an event indicator, which is zero if the ith
survival time ti , i = 1, 2, . . . , n, is right-censored, and unity otherwise. The
partial likelihood function in Equation (3.4) can then be expressed in the form
$$\prod_{i=1}^{n} \left\{ \frac{\exp(\beta' x_i)}{\sum_{l \in R(t_i)} \exp(\beta' x_l)} \right\}^{\delta_i}, \qquad (3.5)$$
where R(ti ) is the risk set at time ti . From Equation (3.5), the corresponding
partial log-likelihood function is given by
$$\log L(\beta) = \sum_{i=1}^{n} \delta_i \left\{ \beta' x_i - \log \sum_{l \in R(t_i)} \exp(\beta' x_l) \right\}. \qquad (3.6)$$
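For concreteness, a direct transcription of Equation (3.6) into Python follows. This is a minimal sketch, assuming no tied death times; the risk set R(ti) is formed from the condition tl ⩾ ti, and the names are illustrative.

```python
import numpy as np

def log_partial_likelihood(beta, times, delta, X):
    """Partial log-likelihood of Equation (3.6).

    times: n survival times; delta: event indicators (1 = death,
    0 = censored); X: n x p matrix of explanatory variables."""
    eta = X @ beta                       # linear predictors beta' x_i
    loglik = 0.0
    for i in range(len(times)):
        if delta[i] == 1:
            in_risk_set = times >= times[i]          # risk set R(t_i)
            loglik += eta[i] - np.log(np.sum(np.exp(eta[in_risk_set])))
    return loglik
```

Maximising this function over β, for example with a general-purpose optimiser, gives the maximum partial likelihood estimates β̂.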
P(individual with variables x(j) dies at t(j) | one death at t(j) ). (3.7)
Next, from the result that the probability of an event A, given that an event B has occurred, is given by $\mbox{P}(A \mid B) = \mbox{P}(AB)/\mbox{P}(B)$, the conditional probability in Expression (3.7) is
$$\frac{\mbox{P(individual with variables } x_{(j)} \mbox{ dies at } t_{(j)})}{\mbox{P(one death at } t_{(j)})}. \qquad (3.8)$$
Since the death times are assumed to be independent of one another, the
denominator of this expression is the sum of the probabilities of death at
time t(j) over all individuals who are at risk of death at that time. If these
individuals are indexed by l, with $R(t_{(j)})$ denoting the set of individuals who are at risk at time $t_{(j)}$, Expression (3.8) becomes
$$\frac{\mbox{P(individual with variables } x_{(j)} \mbox{ dies at } t_{(j)})}{\sum_{l \in R(t_{(j)})} \mbox{P(individual } l \mbox{ dies at } t_{(j)})}. \qquad (3.9)$$
The probabilities of death at time $t_{(j)}$, in Expression (3.9), are now replaced by probabilities of death in the interval $(t_{(j)}, t_{(j)} + \delta t)$, and dividing both the numerator and denominator of Expression (3.9) by $\delta t$, we get
$$\frac{\mbox{P(individual with variables } x_{(j)} \mbox{ dies in } (t_{(j)}, t_{(j)} + \delta t))/\delta t}{\sum_{l \in R(t_{(j)})} \mbox{P(individual } l \mbox{ dies in } (t_{(j)}, t_{(j)} + \delta t))/\delta t}.$$
The limiting value of this expression as δt → 0 is then the ratio of the prob-
abilities in Expression (3.9). But from Equation (1.3), this limit is also the
ratio of the corresponding hazards of death at time t(j) , that is,
If it is the ith individual who dies at t(j) , the hazard function in the numerator
of this expression can be written hi (t(j) ). Similarly, the denominator is the sum
of the hazards of death at time t(j) over all individuals who are at risk of death
at this time. This is the sum of the values hl (t(j) ) over those individuals in
the risk set at time t(j) , R(t(j) ). Consequently, the conditional probability in
Expression (3.7) becomes
$$\frac{h_i(t_{(j)})}{\sum_{l \in R(t_{(j)})} h_l(t_{(j)})}.$$
On using Equation (3.3), the baseline hazard function in the numerator and
denominator cancels out, and we are left with
$$\frac{\exp(\beta' x_{(j)})}{\sum_{l \in R(t_{(j)})} \exp(\beta' x_l)}.$$
Finally, taking the product of these conditional probabilities over the r death
times gives the partial likelihood function in Equation (3.4).
In order to throw more light on the structure of the partial likelihood,
consider a sample of survival data from five individuals, numbered from 1
to 5. The survival data are illustrated in Figure 3.1. The observed survival
times of individuals 2 and 5 will be taken to be right-censored, and the three
ordered death times are denoted t(1) < t(2) < t(3) . Then, t(1) is the death time
of individual 3, t(2) is that of individual 1, and t(3) that of individual 4.
The risk set at each of the three ordered death times consists of the in-
dividuals who are alive and uncensored just prior to each death time. Hence,
Figure 3.1 Survival times of five individuals, in which D denotes a death and C a censored survival time.
the risk set R(t(1) ) consists of all five individuals, risk set R(t(2) ) consists of
individuals 1, 2 and 4, while risk set R(t(3) ) only includes individual 4. Now
write ψ(i) = exp(β ′ xi ), i = 1, 2, . . . , 5, where xi is the vector of explanatory
variables for the ith individual. The numerators of the partial likelihood func-
tion for times t(1) , t(2) and t(3) , respectively, are ψ(3), ψ(1) and ψ(4), since
individuals 3, 1 and 4, respectively, die at the three ordered death times. The
partial likelihood function over the three death times is then
$$\frac{\psi(3)}{\psi(1) + \psi(2) + \psi(3) + \psi(4) + \psi(5)} \times \frac{\psi(1)}{\psi(1) + \psi(2) + \psi(4)} \times \frac{\psi(4)}{\psi(4)}.$$
It turns out that standard results used in maximum likelihood estimation
carry over without modification to maximum partial likelihood estimation. In
particular, the results given in Appendix A for the variance-covariance matrix
of the estimates of the βs can be used, as can distributional results associated
with likelihood ratio testing, to be discussed in Section 3.5.
$$\prod_{j=1}^{r} \frac{\exp(\beta' s_j)}{\left\{ \sum_{l \in R(t_{(j)})} \exp(\beta' x_l) \right\}^{d_j}}. \qquad (3.10)$$
$$\prod_{j=1}^{r} \frac{\exp(\beta' s_j)}{\prod_{k=1}^{d_j} \left[ \sum_{l \in R(t_{(j)})} \exp(\beta' x_l) - (k-1)\, d_j^{-1} \sum_{l \in D(t_{(j)})} \exp(\beta' x_l) \right]}. \qquad (3.11)$$
$$-\frac{\partial^2 \log L(\beta)}{\partial \beta_j\, \partial \beta_k}.$$
Generally speaking, a confidence interval for the true hazard ratio will be
more informative than the standard error of the estimated hazard ratio. A
100(1 − α)% confidence interval for the true hazard ratio, ψ, can be found
simply by exponentiating the confidence limits for β. An interval estimate
obtained in this way is preferable to one found using ψ̂ ± zα/2 se (ψ̂). This is
because the distribution of the logarithm of the estimated hazard ratio will be
more closely approximated by a normal distribution than that of the hazard
ratio itself.
The construction of a confidence interval for a hazard ratio is illustrated
in Example 3.1 below. Fuller details on the interpretation of the parameters
in the linear component of a Cox regression model are given in Section 3.9.
We can go further and construct a confidence interval for this hazard ratio.
The first step is to obtain a confidence interval for the logarithm of the hazard
ratio, β. For example, a 95% confidence interval for β is the interval from
β̂ − 1.96 se (β̂) to β̂ + 1.96 se (β̂), that is, the interval from −0.074 to 1.890.
Exponentiating these confidence limits gives (0.93, 6.62) as a 95% confidence
interval for the hazard ratio itself. Notice that this interval barely includes
unity, suggesting that there is evidence that the two groups of women have a
different survival experience.
of the remaining six might be different from those in Table 3.4. For example,
if Bun is omitted, the estimated coefficients of the six remaining explanatory
variables, Age, Sex, Ca, Hb, Pcells and Protein, turn out to be −0.009, −0.301,
−0.036, −0.140, −0.001 and −0.420, respectively. Comparison with the values
shown in Table 3.4 shows that there are differences in the estimated coefficients
of each of these six variables, although in this case the differences are not very
great.
In general, to determine on which of the seven explanatory variables the
hazard function depends, a number of different models will need to be fitted,
and the results compared. Methods for comparing the fit of alternative models,
and strategies for model building are considered in subsequent sections of this
chapter.
Model (2) then contains the q additional explanatory variables Xp+1 , Xp+2 ,
. . . , Xp+q . Because Model (2) has a larger number of terms than Model (1),
Model (2) must be a better fit to the observed data. The statistical problem
is then to determine whether the additional q terms in Model (2) significantly
improve the explanatory power of the model. If not, they might be omitted,
and Model (1) would be deemed to be adequate.
In the discussion of Example 3.2, we saw that when there are a number
of explanatory variables of possible relevance, the effect of each term cannot
be studied independently of the others. The effect of any given term therefore
depends on the other terms currently included in the model. For example,
in Model (1), the effect of any of the p explanatory variables on the hazard
function depends on the p − 1 variables that have already been fitted, and
the effect of Xp is said to be adjusted for the remaining p − 1 variables. In
particular, the effect of Xp is adjusted for X1 , X2 , . . . , Xp−1 , but we also speak
of the effect of Xp eliminating or allowing for X1 , X2 , . . . , Xp−1 . Similarly,
when the q variables Xp+1 , Xp+2 , . . . , Xp+q are added to Model (1), the effect
of these variables on the hazard function is said to be adjusted for the p
variables that have already been fitted, X1 , X2 , . . . , Xp .
The difference between the values of −2 log L̂ for the null model and the
model that contains X can be used to assess the significance of the difference
between the hazard functions for the two groups of women. Since one model
contains one more β-parameter than the other, the difference in the values of
−2 log L̂ has a chi-squared distribution on one degree of freedom. The differ-
ence in the two values of −2 log L̂ is 173.968 − 170.096 = 3.872, which is just
significant at the 5% level (P = 0.049). We may therefore conclude that there
is evidence, significant at the 5% level, that the hazard functions for the two
groups of women are different.
In Example 2.12, the extent of the difference between the survival times
of the two groups of women was investigated using the log-rank test. The chi-
squared value for this test was found to be 3.515 (P = 0.061). This value is
not very different from the figure of 3.872 (P = 0.049) obtained above. The
similarity of these two P -values means that essentially the same conclusions
are drawn about the extent to which the data provide evidence against the
null hypothesis of no group difference. From the practical viewpoint, the fact
that one result is just significant at the 5% level, while the other is not quite
significant at that level, is immaterial.
Although the model-based approach used in this example is operationally
different from the log-rank test, the two procedures are in fact closely related.
This relationship will be explored in greater detail in Section 3.13.
terms αj and νk may then be included in Cox regression models for hi (t), the
hazard function for the ith individual in the study. The five possible models
are then as follows:
the model. The explanatory variables fitted, and the values of −2 log L̂ for
each of the five models under consideration, are shown in Table 3.8. When
computer software for modelling survival data enables factors to be included
in a model without having to define appropriate indicator variables, the values
of −2 log L̂ in Table 3.8 can be obtained directly.
Table 3.8 Values of −2 log L̂ on fitting five models to the data in Table 3.6.
Model Terms in model Variables in model −2 log L̂
(1) null model none 177.667
(2) αj A2 , A3 172.172
(3) νk N 170.247
(4) αj + νk A2 , A3 , N 165.508
(5) αj + νk + (αν)jk A2 , A3 , N, A2 N, A3 N 162.479
There are two further steps in the modelling approach to the analysis of
survival data. First, we will need to critically examine the fit of a model to the
observed data in order to ensure that the fitted Cox regression model is indeed
appropriate. Second, we will need to interpret the parameter estimates in the
chosen model, in order to quantify the effect that the explanatory variables
have on the hazard function. Interpretation of parameters in a fitted model is
considered in Section 3.9, while methods for assessing the adequacy of a fitted
model will be considered in Chapter 4. But first, possible strategies for model
selection are discussed.
where q is the number of unknown parameters in the fitted model and d is the
number of uncensored observations in the data set. The BIC statistic is also
known as the Schwarz Bayesian Criterion and denoted SBC. The Bayesian
Information Criterion is an adjusted value of the −2 log L̂ statistic that takes
account of both the number of unknown parameters in the fitted model and
the number of observations to which the model has been fitted. As for the
AIC statistic, smaller values of BIC are obtained for better models.
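Both criteria are simple functions of the −2 log L̂ statistic. The sketch below assumes that the AIC statistic introduced earlier has the form −2 log L̂ + 2q; with d = 6, a value consistent with the entries of Table 3.10 in the example that follows, it reproduces the Age row of that table.

```python
import math

def aic(minus2loglik, q):
    """AIC: -2 log L-hat plus twice the number of parameters
    (assumed form, consistent with Table 3.10)."""
    return minus2loglik + 2 * q

def bic(minus2loglik, q, d):
    """BIC: q unknown parameters, d uncensored observations."""
    return minus2loglik + q * math.log(d)

print(aic(36.269, 1))      # 38.269, as in the Age row of Table 3.10
print(bic(36.269, 1, 6))   # 38.061
```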
Of course, some terms may be identified as alternatives to those in a par-
ticular model, leading to subsets that are equally suitable. The decision on
which of these subsets is the most appropriate should not then rest on sta-
tistical grounds alone. When there are no subject matter grounds for model
choice, the model chosen for initial consideration from a set of alternatives
might be the one for which the value of −2 log L̂, AIC or BIC is a minimum.
It will then be important to confirm that the model does fit the data using
the methods for model checking described in Chapter 4.
In some applications, information might be recorded on a number of vari-
ables, all of which relate to the same general feature. For example, the variables
height, weight, body mass index (weight/height2 ), head circumference, arm
length and so on, are all concerned with the size of an individual. In view
of inter-relationships between these variables, a model for the survival times
of these individuals may not need to include each of them. It would then be
appropriate to determine which variables from this group should be included
in the model, although it may not matter exactly which variables are chosen.
When the number of variables is relatively large, it can be computation-
ally expensive to fit all possible models. In particular, if there is a pool of p
potential explanatory variables, there are 2p possible combinations of terms,
so that if p > 10, there are more than a thousand possible combinations of
explanatory variables. In this situation, automatic routines for variable selec-
tion that are available in many software packages might seem an attractive
prospect. These routines are based on forward selection, backward elimination
or a combination of the two known as the stepwise procedure.
In forward selection, variables are added to the model one at a time. At
each stage in the process, the variable added is the one that gives the largest
decrease in the value of −2 log L̂ on its inclusion. The process ends when the
next candidate for inclusion in the model does not reduce the value of −2 log L̂
by more than a prespecified amount. This is known as the stopping rule. This
rule is often couched in terms of the significance level of the difference in the
values of −2 log L̂ when a variable is added to a model, so that the selection
process ends when the next term for inclusion ceases to be significant at a
preassigned level.
In backward elimination, a model that contains the largest number of
variables under consideration is first fitted. Variables are then excluded one
at a time. At each stage, the variable omitted is the one that increases the
value of −2 log L̂ by the smallest amount on its exclusion. The process ends
when the next candidate for deletion increases the value of −2 log L̂ by more
than a prespecified amount.
The stepwise procedure operates in the same way as forward selection.
However, a variable that has been included in the model can be considered
for exclusion at a later stage. Thus, after adding a variable to the model, the
procedure then checks whether any previously included variable can now be
deleted. These decisions are again made on the basis of prespecified stopping
rules.
These automatic routines have a number of disadvantages. Typically, they
lead to the identification of one particular subset, rather than a set of equally
good ones. The subsets found by these routines often depend on the variable
selection process that has been used, that is, whether it is forward selection,
backward elimination or the stepwise procedure, and generally tend not to
take any account of the hierarchic principle. They also depend on the stop-
ping rule that is used to determine whether a term should be included in
or excluded from a model. For all these reasons, these automatic routines
have a limited role in model selection, and should certainly not be used
uncritically.
Instead of using automatic variable selection procedures, the following gen-
eral strategy for model selection is recommended.
1. The first step is to fit models that contain each of the variables one at a
time. The values of −2 log L̂ for these models are then compared with that
for the null model to determine which variables on their own significantly
reduce the value of this statistic.
2. The variables that appear to be important from Step 1 are then fitted
together. In the presence of certain variables, others may cease to be im-
portant. Consequently, those variables that do not significantly increase
the value of −2 log L̂ when they are omitted from the model can now be
discarded. We therefore compute the change in the value of −2 log L̂ when
each variable on its own is omitted from the set. Only those that lead to a
significant increase in the value of −2 log L̂ are retained in the model. Once
a variable has been dropped, the effect of omitting each of the remaining
variables in turn should be examined.
3. Variables that were not important on their own, and so were not under
consideration in Step 2, may become important in the presence of others.
These variables are therefore added to the model from Step 2, one at a
time, and any that reduce −2 log L̂ significantly are retained in the model.
This process may result in terms in the model determined at Step 2 ceasing
to be significant.
4. A final check is made to ensure that no term in the model can be omitted
without significantly increasing the value of −2 log L̂, and that no term not
included significantly reduces −2 log L̂.
The first step is to fit the null model and models that contain each of the
seven explanatory variables on their own. Of these variables, Bun leads to the
largest reduction in −2 log L̂, reducing the value of the statistic from 215.940
to 207.453. This reduction of 8.487 is significant at the 1% level (P = 0.004)
when compared with percentage points of the chi-squared distribution on 1 d.f.
The reduction in −2 log L̂ on adding Hb to the null model is 4.872, which is
also significant at the 5% level (P = 0.027). The only other variable that on
its own has some explanatory power is Protein, which leads to a reduction
in −2 log L̂ that is nearly significant at the 15% level (P = 0.152). Although
this P -value is relatively high, we will for the moment keep Protein under
consideration for inclusion in the model.
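The P-values quoted in this example come from referring changes in −2 log L̂ to chi-squared distributions; a short sketch using scipy makes the calculation explicit.

```python
from scipy.stats import chi2

def p_value(change_in_minus2loglik, df=1):
    """Upper tail area of a chi-squared distribution on df degrees
    of freedom, for a change in -2 log L-hat."""
    return chi2.sf(change_in_minus2loglik, df)

print(round(p_value(8.487), 3))   # 0.004, on adding Bun to the null model
print(round(p_value(4.872), 3))   # 0.027, on adding Hb to the null model
```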
The next step is to fit the model that contains Bun, Hb and Protein,
which leads to a value of −2 log L̂ of 200.503. The effect of omitting each of
the three variables in turn from this model is shown in Table 3.9. In particular,
when Bun is omitted, the increase in −2 log L̂ is 9.326, when Hb is omitted
the increase is 3.138, and when Protein is omitted it is 2.435. Each of these
changes in the value of −2 log L̂ can be compared with percentage points of a
chi-squared distribution on 1 d.f. Since Protein does not appear to be needed
in the model, in the presence of Hb and Bun, this variable will not be further
considered for inclusion.
If either Hb or Bun is excluded from the model that contains both of these
variables, the increase in −2 log L̂ is 4.515 and 8.130, respectively. Both of
these increases are significant at the 5% level, and so neither Hb nor Bun can
be excluded from the model without significantly increasing the value of the
−2 log L̂ statistic.
Finally, we look to see if any of variables Age, Sex, Ca and Pcells should be
included in the model that contains Bun and Hb. Table 3.9 shows that when
any of these four variables is added, the reduction in −2 log L̂ is less than 0.5,
and so none of them need to be included in the model. We therefore conclude
that the most satisfactory model is that containing Bun and Hb.
1. The important prognostic variables are first selected, ignoring the treatment
effect. Models with all possible combinations of the variables can be fitted
when their number is not too large. Alternatively, the variable selection
process might follow similar lines to those described previously in Steps 1
to 4.
2. The treatment effect is then included in the model. In this way, any differ-
ences between the two groups that arise as a result of differences between
the distributions of the prognostic variables in each treatment group, are
not attributed to the treatment.
3. If the possibility of interactions between the treatment and other explana-
tory variables has not been discounted, these must be considered before the
treatment effect can be interpreted.
Table 3.10 Values of −2 log L̂, AIC and BIC for models fitted
to the data from Example 1.4.
Variables in model −2 log L̂ AIC BIC
none 36.349 36.349 36.349
Age 36.269 38.269 38.061
Shb 36.196 38.196 37.988
Size 29.042 31.042 30.834
Index 29.127 31.127 30.919
Age + Shb 36.151 40.151 39.735
Age + Size 28.854 32.854 32.438
Age + Index 28.760 32.760 32.344
Shb + Size 29.019 33.019 32.603
Shb + Index 27.981 31.981 31.565
Size + Index 23.533 27.533 27.117
Age + Shb + Size 28.852 34.852 34.227
Age + Shb + Index 27.893 33.893 33.268
Age + Size + Index 23.269 29.269 28.644
Shb + Size + Index 23.508 29.508 28.883
Age + Shb + Size + Index 23.231 31.231 30.398
which is maximised subject to the constraint that $\sum_{j=1}^{p} |\beta_j| \leqslant s$. The quantity $\sum_{j=1}^{p} |\beta_j|$ is called the $L_1$-norm of the vector $\beta$, and is usually denoted $||\beta||_1$.
The estimates β̂ that maximise the constrained partial likelihood function are
also the values that maximise
$$L_\lambda(\beta) = L(\beta) - \lambda \sum_{j=1}^{p} |\beta_j|, \qquad (3.15)$$
An optimal value of the lasso parameter, λ, is then determined, for example by maximising a cross-validated likelihood criterion, and the variables selected for inclusion in the model are those with non-zero coefficients at this value of λ.
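A sketch of the penalised objective in Equation (3.15) follows, reusing the log_partial_likelihood function sketched earlier in this chapter; the optimisation over β for each value of λ, and the choice of λ itself, would be handled by generic routines and are not shown here.

```python
import numpy as np

def penalised_log_likelihood(beta, times, delta, X, lam):
    """Equation (3.15): partial log-likelihood minus an L1 penalty.
    For a given lasso parameter lam, maximising this over beta
    shrinks some coefficients exactly to zero.
    Uses log_partial_likelihood as sketched in Section 3.3."""
    penalty = lam * np.sum(np.abs(beta))
    return log_partial_likelihood(beta, times, delta, X) - penalty
```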
Following use of the lasso procedure, standard errors of the parameter es-
timates, or functions of these estimates such as hazard ratios, are not usually
presented. One reason for this is that they are difficult to compute, but the
main reason is that the parameter estimates will be biased towards zero. As a
result, standard errors are not very meaningful and will tend to underestimate
the precision of the estimates. If standard errors are required, the lasso pro-
cedure could be used to determine the variables that should be included in a
model, that is those with non-zero coefficients at an optimal value of the lasso
parameter. A standard Cox regression model that contains these variables is
then fitted, but the advantages of shrinkage are then lost.
Figure 3.2 Trace of the estimated coefficients of the explanatory variables as a function of the lasso parameter, λ.
[Figure: a criterion for choosing the lasso parameter, plotted against the value of λ.]
When a model containing Bun, Hb2, Hb3 and Hb4 is fitted, the value of
−2 log L̂ is 200.417. The change in the value of this statistic on adding the
indicator variables Hb2, Hb3 and Hb4 to the model that contains Bun alone
is 7.036 on 3 d.f., which is significant at the 10% level (P = 0.071). However,
it is difficult to identify any pattern across the factor levels.
A linear trend across the levels of the factor corresponding to haemoglo-
bin level can be modelled by fitting the variate X, which takes values 1, 2,
3, 4, according to the factor level. When the model containing Bun and X
is fitted, −2 log L̂ is 203.891, and the change in the value of −2 log L̂ due to
any non-linearity is 203.891 − 200.417 = 3.474 on 2 d.f. This is not significant
when compared with percentage points of the chi-squared distribution on 2 d.f.
(P = 0.176). We therefore conclude that the effect of haemoglobin level on
the hazard of death in this group of patients is adequately modelled by using
the linear term Hb.
and
$$h_i(t) = \exp(\beta_1 \mathit{Hb}_i^{\,p_1} + \beta_2 \mathit{Hb}_i^{\,p_2} + \beta_3\, \mathit{Bun}_i)\, h_0(t),$$
are shown in Table 3.13.
From this table, the best models with just one power of Hb, that is for
m = 1, are those with a linear or a quadratic term, and of these, the model
with Hb alone is the simplest. When models with two powers of Hb are fitted, those with $p_1 = -2$ and $p_2 = -1$ or $-0.5$ lead to the smallest values of $-2 \log \hat{L}$,
but neither leads to a significant improvement on the model with just one
power of Hb. If another power of Hb was to be added to the model that
includes Hb alone, we would add Hb−2 , but again there is no need to do this
and so β̂ in the fitted Cox regression model is the estimated change in the
logarithm of the hazard ratio when the value of X is increased by one unit.
Using a similar argument, the estimated change in the log-hazard ratio
when the value of the variable X is increased by r units is rβ̂, and the cor-
responding estimate of the hazard ratio is exp(rβ̂). The standard error of the
estimated log-hazard ratio will be r se (β̂), from which confidence intervals for
the true hazard ratio can be derived.
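These calculations are readily packaged up; in the sketch below, the values of β̂ and its standard error are hypothetical placeholders, and a change of r = 5 units mirrors the five-year age difference discussed next.

```python
import math

def hazard_ratio_for_change(beta_hat, se_beta, r, z=1.96):
    """Hazard ratio, with a 95% confidence interval, for an increase
    of r units in X: exp(r * beta_hat), with standard error
    r * se(beta_hat) on the log scale."""
    log_hr, se = r * beta_hat, r * se_beta
    return (math.exp(log_hr),
            math.exp(log_hr - z * se),
            math.exp(log_hr + z * se))

# Hypothetical values: beta_hat = 0.05 per year of age, se = 0.02.
print(hazard_ratio_for_change(0.05, 0.02, r=5))
```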
The above argument shows that when a continuous variable X is included
in a Cox regression model, the hazard ratio when the value of X is changed
by r units does not depend on the actual value of X. For example, if X refers
to the age of an individual, the hazard ratio for an individual aged 70, relative
to one aged 65, would be the same as that for an individual aged 20, relative
to one aged 15. This feature is a direct result of fitting X as a linear term in
the Cox regression model. If there is doubt about the assumption of linearity,
this can be checked using the procedure described in Section 3.8.1. Fractional
polynomials in X or a non-linear transformation of X might then be used in
the modelling process.
where γj is the effect due to the jth level of the factor, and h0 (t) is the baseline
hazard function. This model is overparameterised, and so, as in Section 3.2.2,
we take γ1 = 0. The baseline hazard function then corresponds to the hazard
of death at time t for an individual in the first group. The ratio of the hazards at time t for an individual in the jth group, j ⩾ 2, relative to an individual in the first group, is then exp(γj). Consequently, the parameter γj is the logarithm of this relative hazard, that is,
$$\gamma_j = \log \{ h_j(t)/h_1(t) \},$$
where $h_j(t) = \exp(\gamma_j) h_0(t)$ is the hazard function for an individual in the jth group.
This model can be fitted by defining two indicator variables, A2 and A3 , where
A2 is unity if the patient is aged between 60 and 70, and A3 is unity if the
patient is more than 70 years of age, as in Example 3.4. This corresponds to
taking α1 = 0.
The value of −2 log L̂ for the null model is 128.901, and when the term αj
is added, the value of this statistic reduces to 122.501. This reduction of 6.400
on 2 d.f. is significant at the 5% level (P = 0.041), and so we conclude that
the hazard function does depend on which age group the patient is in.
The coefficients of the indicator variables A2 and A3 are estimates of α2
and α3 , respectively, and are given in Table 3.14. Since the constraint α1 = 0
has been used, α̂1 = 0.
The hazard ratio for a patient aged 60–70, relative to one aged less than
60, is e−0.065 = 0.94, while that for a patient whose age is greater than 70,
relative to one aged less than 60, is e1.824 = 6.20. These results suggest that
the hazard of death at any given time is greatest for patients who are older
than 70, but that there is little difference in the hazard functions for patients
in the other two age groups.
The standard error of the parameter estimates in Table 3.14 can be used to
obtain a confidence interval for the true hazard ratios. A 95% confidence inter-
val for the log-hazard ratio for a patient whose age is between 60 and 70, rela-
tive to one aged less than 60, is the interval with limits −0.065±(1.96×0.498),
that is, the interval (−1.041, 0.912). The corresponding 95% confidence inter-
val for the hazard ratio itself is (0.35, 2.49). This confidence interval includes
unity, which suggests that the hazard function for an individual whose age is
between 60 and 70 is similar to that for a patient aged less than 60. Similarly,
a 95% confidence interval for the hazard for a patient aged greater than 70,
relative to one aged less than 60, is found to be (1.63, 23.59). This interval
does not include unity, and so an individual whose age is greater than 70 has
a significantly greater hazard of death, at any given time, than patients aged
less than 60.
In some applications, the hazard ratio relative to the level of a factor other
than the first may be required. In these circumstances, the levels of the factor,
and associated indicator variables, could be redefined so that some other level
of the factor corresponds to the required baseline level, and the model re-
fitted. The required estimates can also be found directly from the estimates
obtained when the first level of the original factor is taken as the baseline,
although this is more difficult.
The hazard functions for individuals at levels j and j ′ of the factor are
exp(αj )h0 (t) and exp(αj ′ )h0 (t), respectively, and so the hazard ratio for an
individual at level j, relative to one at level j ′ , is exp(αj −αj ′ ). The log-hazard
ratio is then αj − αj ′ , which is estimated by α̂j − α̂j ′ . To obtain the standard
error of this estimate, we use the result that the variance of the difference
α̂j − α̂j ′ is given by
var (α̂j − α̂j ′ ) = var (α̂j ) + var (α̂j ′ ) − 2 cov (α̂j , α̂j ′ ).
In view of this, an estimate of the covariance between α̂j and α̂j ′ , as well as
estimates of their variance, will be needed to compute the standard error of
(α̂j − α̂j ′ ). The calculations are illustrated in Example 3.11.
Example 3.11 Treatment of hypernephroma
Consider again the subset of the data from Example 3.4, corresponding to
those patients who have had a nephrectomy. Suppose that an estimate of the
hazard ratio for an individual aged greater than 70, relative to one aged be-
tween 60 and 70, is required. Using the estimates in Table 3.14, the estimated
log-hazard ratio is α̂3 − α̂2 = 1.824 + 0.065 = 1.889, and so the estimated
hazard ratio is e1.889 = 6.61. This suggests that the hazard of death at any
given time for someone aged greater than 70 is more than six and a half times
that for someone aged between 60 and 70.
The variance of $\hat{\alpha}_3 - \hat{\alpha}_2$ is
$$\mbox{var}\,(\hat{\alpha}_3) + \mbox{var}\,(\hat{\alpha}_2) - 2\, \mbox{cov}\,(\hat{\alpha}_2, \hat{\alpha}_3),$$
and the variance-covariance matrix of the parameter estimates gives the required variances and covariance. This matrix can be obtained from statistical software used to fit the Cox regression model, and, with rows and columns corresponding to $A_2$ and $A_3$, is found to be
$$\begin{pmatrix} 0.2484 & 0.0832 \\ 0.0832 & 0.4649 \end{pmatrix},$$
from which $\mbox{var}\,(\hat{\alpha}_2) = 0.2484$, $\mbox{var}\,(\hat{\alpha}_3) = 0.4649$ and $\mbox{cov}\,(\hat{\alpha}_2, \hat{\alpha}_3) = 0.0832$.
Of course, the variances are simply the squares of the standard errors in
Table 3.14. It then follows that
var (α̂3 − α̂2 ) = 0.4649 + 0.2484 − (2 × 0.0832) = 0.5469,
and so the standard error of $\hat{\alpha}_3 - \hat{\alpha}_2$ is 0.740. Consequently, a 95% confidence interval for the log-hazard ratio is (0.440, 3.338), and that for the hazard ratio itself is (1.55, 28.18).
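The same calculation can be expressed as the quadratic form a′Va, where a is a vector of contrast coefficients and V the variance-covariance matrix; a sketch using the matrix quoted above is given below.

```python
import numpy as np

V = np.array([[0.2484, 0.0832],
              [0.0832, 0.4649]])      # vcov of (alpha2-hat, alpha3-hat)
a = np.array([-1.0, 1.0])             # contrast for alpha3-hat - alpha2-hat

var_diff = a @ V @ a                  # 0.5469, as above
se_diff = np.sqrt(var_diff)           # 0.740
log_hr = 1.889                        # estimated log-hazard ratio
ci = np.exp([log_hr - 1.96 * se_diff, log_hr + 1.96 * se_diff])
print(se_diff, ci)                    # CI approximately (1.55, 28.18)
```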
An easier way of obtaining the estimated value of the hazard ratio for an
individual who is aged greater than 70, relative to one aged between 60 and
70, and the standard error of the estimate, is to redefine the levels of the
factor associated with age group. Suppose that the data are now arranged so
that the first level of the factor corresponds to the age range 60–70, level 2
corresponds to patients aged greater than 70 and level 3 to those aged less
than 60. Choosing indicator variables to be such that the effect due to the
first level of the redefined factor is set equal to zero leads to the variables B2
and B3 defined in Table 3.15.
while that for an individual at the jth level of the factor is exp(αj)h0(t), for j ⩾ 2. The ratio of the hazards for an individual in group j, j ⩾ 2, relative to that of an individual in the first group, is then
$$\exp(\alpha_j + \alpha_2 + \alpha_3 + \cdots + \alpha_a).$$
Each of the terms in this expression can be found from the variance-
covariance matrix of the parameter estimates after fitting a Cox regression
model, and a confidence interval for the hazard ratio obtained. However, this
particular coding of the indicator variables does make it much more compli-
cated to interpret the individual parameter estimates in a fitted model.
The non-zero parameter estimates are $\hat{\alpha}_2 = 0.005$, $\hat{\alpha}_3 = 0.065$, $\hat{\nu}_2 = -1.943$, $\widehat{(\alpha\nu)}_{22} = -0.051$, and $\widehat{(\alpha\nu)}_{32} = 2.003$, and the estimated hazard ratios are summarised in Table 3.18.
Inclusion of the combination of factor levels for which the estimated hazard
ratio is 1.000 in tables such as Table 3.18, emphasises that the hazards are
relative to those for individuals in the first age group who have not had a
nephrectomy. This table shows that individuals aged less than or equal to 70,
who have had a nephrectomy, have a much reduced hazard of death, compared
to those in the other age group and those who have not had a nephrectomy.
Confidence intervals for the corresponding true hazard ratios can be found
using the method described in Section 3.9.2. As a further illustration, a con-
fidence interval will be obtained for the hazard ratio for individuals who have
had a nephrectomy in the second age group relative to those in the first. The
log-hazard ratio is now $\hat{\alpha}_2 + \widehat{(\alpha\nu)}_{22}$, and so the estimated hazard ratio is 0.955. The variance of this estimate is given by
$$\mbox{var}\,(\hat{\alpha}_2) + \mbox{var}\,\{\widehat{(\alpha\nu)}_{22}\} + 2\, \mbox{cov}\,\{\hat{\alpha}_2, \widehat{(\alpha\nu)}_{22}\}.$$
$$\hat{S}_0(t) = \prod_{j=1}^{k} \hat{\xi}_j, \qquad (3.21)$$
and the corresponding estimate of the baseline cumulative hazard function is
$$\hat{H}_0(t) = -\log \hat{S}_0(t) = -\sum_{j=1}^{k} \log \hat{\xi}_j, \qquad (3.22)$$
for $t_{(k)} \leqslant t < t_{(k+1)}$, $k = 1, 2, \ldots, r - 1$, with $\hat{H}_0(t) = 0$ for $t < t_{(1)}$.
The estimates of the baseline hazard, survivor and cumulative hazard func-
tions in Equations (3.20), (3.21) and (3.22) can be used to obtain the corre-
sponding estimates for an individual with a vector of explanatory variables
xi . In particular, from Equation (3.17), the hazard function is estimated by
exp(β̂ ′ xi )ĥ0 (t). Next, integrating both sides of Equation (3.17), we get
$$\int_0^t \hat{h}_i(u)\, du = \exp(\hat{\beta}' x_i) \int_0^t \hat{h}_0(u)\, du, \qquad (3.23)$$
so that the estimated cumulative hazard function for the ith individual is
given by
Ĥi (t) = exp(β̂ ′ xi )Ĥ0 (t). (3.24)
The corresponding estimate of the survivor function is
$$\hat{S}_i(t) = \left\{ \hat{S}_0(t) \right\}^{\exp(\hat{\beta}' x_i)}, \qquad (3.25)$$
for $t_{(k)} \leqslant t < t_{(k+1)}$, $k = 1, 2, \ldots, r - 1$. Note that once the estimated sur-
vivor function, Ŝi (t), has been obtained, an estimate of the cumulative hazard
function is simply − log Ŝi (t).
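Given the baseline estimates at each event time, the adjustment for an individual's explanatory variables is immediate. The sketch below uses Equations (3.24) and (3.25); the array names are illustrative.

```python
import numpy as np

def individual_estimates(S0, H0, beta_hat, x_i):
    """Survivor and cumulative hazard functions for an individual
    with covariates x_i, given baseline estimates S0 and H0 evaluated
    at each event time: Equations (3.24) and (3.25)."""
    risk_score = np.exp(beta_hat @ x_i)       # exp(beta' x_i)
    return S0 ** risk_score, risk_score * H0
```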
3.10.1 The special case of no covariates
When there are no covariates, so that we have just a single sample of survival
times, Equation (3.19) becomes
$$\frac{d_j}{1 - \hat{\xi}_j} = n_j,$$
from which
$$\hat{\xi}_j = \frac{n_j - d_j}{n_j}.$$
Then, the estimated baseline hazard function at time $t_{(j)}$ is $1 - \hat{\xi}_j$, which is $d_j/n_j$. The corresponding estimate of the survivor function from Equation (3.21) is $\prod_{j=1}^{k} \hat{\xi}_j$, that is,
$$\prod_{j=1}^{k} \left( \frac{n_j - d_j}{n_j} \right),$$
which is the Kaplan-Meier estimate of the survivor function given in Chapter 2.
In the general case, the quantity $\hat{\xi}_j^{\exp(\hat{\beta}' x_l)}$ in Equation (3.19) can be written as $\exp\{ e^{\hat{\beta}' x_l} \log \hat{\xi}_j \}$, and taking the first two terms in the expansion of the exponent gives
$$\exp \left\{ e^{\hat{\beta}' x_l} \log \hat{\xi}_j \right\} \approx 1 + e^{\hat{\beta}' x_l} \log \hat{\xi}_j.$$
Writing $1 - \tilde{\xi}_j$ for the estimated baseline hazard at time $t_{(j)}$, obtained using this approximation, and substituting $1 + e^{\hat{\beta}' x_l} \log \tilde{\xi}_j$ for $\hat{\xi}_j^{\exp(\hat{\beta}' x_l)}$ in Equation (3.19), we find that $\tilde{\xi}_j$ is such that
$$-\sum_{l \in D(t_{(j)})} \frac{1}{\log \tilde{\xi}_j} = \sum_{l \in R(t_{(j)})} \exp(\hat{\beta}' x_l).$$
Therefore,

−dj / log ξ̃j = ∑_{l∈R(t(j))} exp(β̂′xl),

since dj is the number of deaths at the jth ordered death time, t(j), and so

ξ̃j = exp{ −dj / ∑_{l∈R(t(j))} exp(β̂′xl) }.  (3.26)
H̃0(t) = − log S̃0(t) = ∑_{j=1}^{k} dj / ∑_{l∈R(t(j))} exp(β̂′xl),  (3.28)
since nj is the number of individuals at risk at time t(j). This is the Nelson-Aalen estimate of the survivor function given in Equation (2.5) of Chapter 2, and the corresponding estimate of the baseline cumulative hazard function is ∑_{j=1}^{k} dj/nj, as in Section 2.3.3 of Chapter 2.
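The estimate in Equation (3.28) is straightforward to compute directly from the data. A minimal Python sketch, with hypothetical names, assuming the estimated coefficients β̂ are already available, is as follows.

import numpy as np

def breslow_baseline_cumhaz(times, events, X, beta_hat):
    # Baseline cumulative hazard of Equation (3.28), evaluated at each
    # distinct death time; the denominator sums are over the risk set
    # R(t(j)), and tied deaths are accumulated as in Breslow's method.
    times = np.asarray(times, dtype=float)
    events = np.asarray(events)
    risk = np.exp(np.asarray(X, dtype=float) @ np.asarray(beta_hat, dtype=float))
    death_times = np.unique(times[events == 1])
    H0, running_total = [], 0.0
    for t in death_times:
        d_j = int(np.sum((times == t) & (events == 1)))  # deaths at t(j)
        denom = risk[times >= t].sum()                   # sum over R(t(j))
        running_total += d_j / denom
        H0.append(running_total)
    return death_times, np.array(H0)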
A further approximation is found from noting that the expression

−dj / ∑_{l∈R(t(j))} exp(β̂′xl),
in the exponent of Equation (3.26), will tend to be small, unless there are large numbers of ties at particular death times. Taking the first two terms of the expansion of this exponent, and denoting this new approximation to ξj by ξj∗, gives

ξj∗ = 1 − dj / ∑_{l∈R(t(j))} exp(β̂′xl).
Adapting Equation (3.20), the estimated baseline hazard function in the interval from t(j) to t(j+1) is then given by

h0∗(t) = dj / {(t(j+1) − t(j)) ∑_{l∈R(t(j))} exp(β̂′xl)},  (3.30)

and the corresponding estimate of the baseline cumulative hazard function,

H̃0(t) = ∑_{j=1}^{k} (t(j+1) − t(j)) h0∗(t(j)),

recovers the estimate in Equation (3.28). Expressions similar to Equations (3.24) and (3.25) can then be used to estimate the cumulative hazard and survivor functions for an individual whose vector of explanatory variables is xi.
In practice, it will often be computationally advantageous to use either
S̃0 (t) or S0∗ (t) in place of Ŝ0 (t). When the number of tied survival times is
small, all three estimates will tend to be very similar. Moreover, since the esti-
mates are generally used as descriptive summaries of the survival data, small
differences between the estimates are unlikely to be of practical importance.
Once an estimate of the survivor function has been obtained, the median
and other percentiles of the survival time distribution can be found from tab-
ular or graphical displays of the function for individuals with particular values
of explanatory variables. The method used is very similar to that described
in Section 2.4, and is illustrated in the following example.
Example 3.14 Treatment of hypernephroma
In Example 3.4, a Cox regression model was fitted to the data on the survival
times of patients with hypernephroma. The hazard function was found to
depend on the age group of a patient, and whether or not a nephrectomy had
been performed. The estimated hazard function for the ith patient was found
to be
ĥi (t) = exp{0.013 A2i + 1.342 A3i − 1.412 Ni }ĥ0 (t),
where A2i is unity if the patient is aged between 60 and 70 and zero otherwise,
A3i is unity if the patient is aged over 70 and zero otherwise, and Ni is unity if
the patient has had a nephrectomy and zero otherwise. The estimated baseline
hazard function is therefore the estimated hazard of death at time t, for an
individual whose age is less than 60 and who has not had a nephrectomy.
In Table 3.19, the estimated baseline hazard function, ĥ0 (t), cumulative
hazard function, Ĥ0 (t), and survivor function, Ŝ0 (t), obtained using Equa-
tions (3.18), (3.22) and (3.21), respectively, are tabulated.
From this table, we see that the general trend is for the estimated baseline
hazard function to increase with time. From the manner in which the esti-
mated baseline hazard function has been computed, the estimates only apply
at the death times of the patients in the study. However, if the assumption
of a constant hazard in each time interval is made, by dividing the estimated
hazard by the corresponding time interval, the risk of death per unit time
can be found. This leads to the estimate in Equation (3.20). A graph of this
hazard function is shown in Figure 3.4.
Figure 3.4 Estimated baseline hazard function, per unit time, assuming constant
hazard between adjacent death times.
This graph shows that the risk of death per unit time is roughly con-
stant over the duration of the study. Table 3.19 also shows that the values of
ĥ0 (t) are very similar to differences in the values of Ĥ0 (t) between successive
observations, as would be expected.
We now consider the estimation of the median survival time, defined as the
smallest observed survival time for which the estimated survivor function is
less than 0.5. From Table 3.19, the estimated median survival time for patients
aged less than 60 who have not had a nephrectomy is 12 months.
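A minimal Python sketch of this look-up, with hypothetical names, assuming the estimated survivor function has been tabulated at the ordered event times, as in Table 3.19, is shown below.

def percentile_from_survivor(event_times, survivor, p=50):
    # Smallest event time at which the tabulated survivor function
    # falls below 1 - p/100, as in the definition used in Section 2.4;
    # returns None if the function never drops that far.
    target = 1.0 - p / 100.0
    for t, s in zip(event_times, survivor):
        if s < target:
            return t
    return None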
By raising the estimate of the baseline survivor function to a suitable
power, the estimated survivor functions for patients in other age groups,
and for patients who have had a nephrectomy, can be obtained using Equa-
tion (3.25). Thus, the estimated survivor function for the ith individual is
given by
Ŝi(t) = {Ŝ0(t)}^{exp{0.013 A2i + 1.342 A3i − 1.412 Ni}}.
This function is plotted in Figure 3.5, together with the estimated baseline
survivor function, which is for an individual in the same age group but who
has not had a nephrectomy.
Figure 3.5 Estimated survivor functions for patients aged less than 60, with (—) and
without (·······) a nephrectomy.
This figure shows that the probability of surviving beyond any given time is
greater for those who have had a nephrectomy, confirming that a nephrectomy
improves the prognosis for patients with hypernephroma.
Note that because of the assumption of proportional hazards, the two esti-
mated survivor functions in Figure 3.5 cannot cross. Moreover, the estimated
survivor function for those who have had a nephrectomy lies above that of
those on whom a nephrectomy has not been performed. This is a direct con-
sequence of the estimated hazard ratio for those who have had the operation,
relative to those who have not, being less than unity.
An estimate of the median survival time for this type of patient can be
obtained from the tabulated values of the estimated survivor function, or
from the graph in Figure 3.5. We find that the estimated median survival
time for a patient aged less than 60 who has had a nephrectomy is 36 months.
Other percentiles of the distribution of survival times can be estimated using
a similar approach.
In a similar manner, the survivor functions for patients in the different
age groups can be compared, either for those who have had or not had a
nephrectomy. For example, for patients who have had a nephrectomy, the
estimated survivor functions for patients in the three age groups are respectively {Ŝ0(t)}^{exp(−1.412)}, {Ŝ0(t)}^{exp(−1.412+0.013)} and {Ŝ0(t)}^{exp(−1.412+1.342)}.
These estimated survivor functions are shown in Figure 3.6, which clearly
shows that patients aged over 70 have a poorer prognosis than those in the
other two age groups.
Figure 3.6 Estimated survivor functions for patients aged less than 60 (—), between
60 and 70 (·······) and greater than 70 (- - -), who have had a nephrectomy.
The average, or risk adjusted, estimate of the survivor function at time t is

Ŝ(t) = (1/n) ∑_{i=1}^{n} Ŝi(t),  (3.31)

where

Ŝi(t) = {Ŝ0(t)}^{exp(β̂′xi)}
is the estimated survivor function for the ith individual. Risk adjusted esti-
mates of survival rates, the median and other percentiles of the survival time
distribution can then be obtained from Ŝ(t).
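A minimal Python sketch of the averaging in Equation (3.31), with hypothetical names, assuming the baseline survivor function has been tabulated at each event time, is as follows.

import numpy as np

def risk_adjusted_survivor(S0, X, beta_hat):
    # Equation (3.31): average the n individual survivor functions
    # {S0(t)}^exp(beta'x_i) at each event time.
    eta = np.asarray(X, dtype=float) @ np.asarray(beta_hat, dtype=float)
    # One column per individual: S0(t) raised to the power exp(eta_i).
    S_individual = np.power.outer(np.asarray(S0, dtype=float), np.exp(eta))
    return S_individual.mean(axis=1)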
For the multiple myeloma data, the risk score, η̂i, is given by η̂i = −0.134 Hbi + 0.019 Buni. The estimated
survivor function is then obtained at each of the event times in the data set,
for each of the 48 patients. Averaging the estimates across the 48 patients, for
each event time, leads to the risk adjusted survivor function. This is plotted
in Figure 3.7, together with the unadjusted Kaplan-Meier estimate of the
survivor function. The unadjusted and risk adjusted estimates of the survivor
function are very close, so that in this example, the risk adjustment process
makes very little difference to estimates of survival rates or the median survival
time.
Figure 3.7 Unadjusted (·······) and risk adjusted (—) estimated survivor functions for
the data on the survival of multiple myeloma patients.
variables were not evenly distributed across the two treatment groups. In
this situation, the unadjusted estimates of the two survivor functions may be
misleading. A risk adjustment process will then be needed to take account of
any imbalance between the characteristics of individuals in the two groups,
before conclusions can be drawn about the treatment effect.
If differences between the treatment groups can be assumed to be inde-
pendent of time, the group effect can be added to the survival model, and the
risk adjusted survivor function for the individuals in each treatment group
can be calculated using Equation (3.31). However, in many applications, such
as when comparing survival rates between institutions, to be considered in
Chapter 11, this assumption cannot be made. The Cox regression model is
then extended to have a different baseline hazard function for each group.
Suppose that in a study to compare the survival rates of individuals in g
groups, a Cox regression model containing relevant explanatory variables has
been fitted, excluding the group effect. The model for the hazard of death for
the ith individual, i = 1, 2, . . . , nj , in the jth group, j = 1, 2, . . . , g, at time t
is then
hij (t) = exp(β ′ xij )h0j (t),
where β ′ xij = β1 x1ij + β2 x2ij + · · · + βp xpij , x1ij , x2ij , . . . , xpij are the values
of p explanatory variables measured on each individual, and h0j (t) is the
baseline hazard function for the jth group. In this model, the coefficients of
the p explanatory variables, β1 , β2 , . . . , βp , are constant over the g groups, but
there is a different baseline hazard function for each group. This is a stratified
Cox regression model in which the g groups define the separate strata. These
models are considered in greater detail in Chapter 11.
On fitting the stratified model, the corresponding estimated survivor func-
tion for the ith patient in the jth group is
Ŝij(t) = {Ŝ0j(t)}^{exp(β̂′xij)},
where Ŝ0j (t) is the estimated baseline survivor function for individuals in
the jth group. If the groups can be assumed to act proportionately on the
hazard function, a common baseline hazard would be fitted and a group effect
included in the model. Then, h0j (t) is replaced by exp(gj )h0 (t), where gj is
the effect of the jth group. In general, it is less restrictive to allow the group
effects to vary over time by using a stratified model.
The risk adjusted survivor function for each group can be found from the
stratified model by averaging the values of the estimated survivor functions
at each of the event times, across the individuals in each group. The average
risk adjusted survivor function at time t is then
Ŝj(t) = (1/nj) ∑_{i=1}^{nj} Ŝij(t),
where Ŝij(t) = {Ŝ0j(t)}^{exp(η̂ij)}, the estimated risk score is η̂ij = 0.0673 Sizeij + 0.6532 Indexij, and Ŝ0j(t) is the estimated baseline survivor function for the jth treatment
group. Averaging the estimated survivor functions at each event time over
the patients in each group gives the risk adjusted survivor functions shown
in Figure 3.8. Also shown in this figure are the unadjusted Kaplan-Meier
estimates of the survivor functions for each group.
Figure 3.8 (i) Unadjusted and (ii) risk adjusted survivor functions for prostatic can-
cer patients on DES (·······) and placebo (—).
This figure shows how the risk adjustment process has diminished the
treatment difference. Of course, this is also seen by comparing the unadjusted
hazard ratio for a patient on DES relative to placebo (0.14) with the corre-
sponding value adjusted for tumour size and Gleason index (0.33), although
this latter analysis assumes proportional hazards for the two treatments.
The Total SS can be partitioned into the Model SS and the Residual SS, ∑i (yi − ŷi)², which represents the unexplained variation. The larger the value
of R2 , the greater the proportion of variation in the response variable that is
accounted for by the model.
The R2 statistic for the general linear model can also be expressed in the
form
R² = V̂M / (V̂M + σ̂²),  (3.32)
where V̂M = Model SS/(n − 1) is an estimate of the variation in the data
explained by the model, and σ̂ 2 is an estimate of the residual variation. Note
that in this formulation, σ̂ 2 = Residual SS/(n − 1), rather than the usual
unbiased estimator with divisor n − p − 1. The quantity V̂M in Equation (3.32)
can be expressed in matrix form as β̂ ′ S β̂, where β̂ is the vector of estimated
coefficients for the p explanatory variables in the fitted regression model, and
S is the variance-covariance matrix of the explanatory variables. This matrix
is formed from the sample variance of each of the p explanatory variables, (n − 1)⁻¹ ∑i (xij − x̄j)², on its diagonal, and the sample covariance terms, (n − 1)⁻¹ ∑i (xij − x̄j)(xij′ − x̄j′), as the off-diagonal terms, for i = 1, 2, . . . , n and j, j′ = 1, 2, . . . , p with j ≠ j′, and where x̄j is the sample mean of the jth
variable.
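As a minimal illustration of this formulation, the following Python sketch, with hypothetical names, computes the R² statistic in Equation (3.32) from the fitted coefficients, the matrix of explanatory variables and an estimate of the residual variation.

import numpy as np

def r_squared_linear(beta_hat, X, sigma2_hat):
    # Equation (3.32) with V_M = beta'S beta, where S is the sample
    # variance-covariance matrix of the p explanatory variables,
    # formed with divisor n - 1 as described in the text.
    S = np.cov(np.asarray(X, dtype=float), rowvar=False)  # divisor n - 1
    V_M = np.asarray(beta_hat) @ S @ np.asarray(beta_hat)
    return V_M / (V_M + sigma2_hat)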
In the analysis of survival data, a large number of different measures of the
proportion of variation in the data that is explained by a fitted Cox regression
model have been proposed. However, published reviews of these measures,
based on extensive simulation studies, suggest that three particular statistics
have desirable properties. All of them take values between 0 and 1, are largely
independent of the degree of censoring, are not affected by the scale of the
survival data, and increase in value as explanatory variables are added to the
model. These are described in the next section.
3.12.1 Measures of explained variation
Consider a Cox regression model for the hazard of death at time t in the
ith individual, hi (t) = exp(β ′ xi )h0 (t), where xi is the vector of values of
p explanatory variables, β is the vector of their unknown coefficients, and
h0 (t) is the baseline hazard function. One of the earliest suggestions for an
R2 type statistic, due to Kent and O’Quigley (1988), is similar in form to the
R2 statistic for linear regression analysis in Equation (3.32). This statistic is
defined by
R²P = V̂P / (V̂P + π²/6),
where the indicator function I(·) is such that I{X < 0} = 1 if X < 0 and zero
otherwise. In addition, the standard error of K̂ can be obtained, so that the
precision of K̂ can be assessed, although details will not be given here.
L(β) = ∏_{j=1}^{r} { exp(βx(j)) / ∑_{l=1}^{nj} exp(βxl) },  (3.33)
since there are nj = n1j + n2j individuals in the risk set, R(t(j) ), at time t(j) ,
and the corresponding log-likelihood function is
log L(β) = ∑_{j=1}^{r} βx(j) − ∑_{j=1}^{r} log{ ∑_{l=1}^{nj} exp(βxl) }.
Since x(j) is zero for individuals in Group II, the first summation in this expression is over the death times in Group I, and so is simply d1β, where d1 = ∑_{j=1}^{r} d1j is the total number of deaths in Group I. Also,
∑_{l=1}^{nj} exp(βxl) = n1j e^β + n2j,
and so

log L(β) = d1β − ∑_{j=1}^{r} log{ n1j e^β + n2j }.  (3.34)
i(β) = −∂² log L(β) / ∂β²
is Fisher’s (observed) information function. Under the null hypothesis that
β = 0, u²(0)/i(0) has a chi-squared distribution on one degree of freedom.
Now, from Equation (3.34),

∂ log L(β)/∂β = ∑_{j=1}^{r} { d1j − n1j e^β / (n1j e^β + n2j) },
so that the efficient score evaluated at β = 0 is

u(0) = ∑_{j=1}^{r} { d1j − n1j / (n1j + n2j) },

and

i(0) = ∑_{j=1}^{r} n1j n2j / (n1j + n2j)².
These are simply the expressions for UL and VL given in Equations (2.20) and
(2.22) of Chapter 2, for the special case where there are no ties, that is, where
dj = 1 for j = 1, 2, . . . , r.
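In this special case of no tied death times, the score statistic u²(0)/i(0) can be computed directly from the numbers at risk in each group at each death time. A minimal Python sketch, with hypothetical names, is as follows.

import numpy as np

def logrank_score_statistic(n1, n2, d1):
    # u(0)^2 / i(0) for the two-group model with no tied death times,
    # using the expressions for the score and information given above.
    # n1, n2: numbers at risk in Groups I and II at each death time;
    # d1: 1 if the death at that time occurred in Group I, 0 otherwise.
    n1, n2, d1 = (np.asarray(a, dtype=float) for a in (n1, n2, d1))
    n = n1 + n2
    u0 = np.sum(d1 - n1 / n)       # efficient score at beta = 0
    i0 = np.sum(n1 * n2 / n**2)    # information at beta = 0
    return u0**2 / i0              # compare with chi-squared on 1 d.f.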
When there are tied observations, the likelihood function in Equa-
tion (3.33) has to be replaced by one that allows for ties. In particular, if the
likelihood function in Equation (3.12) is used, the efficient score and informa-
tion function are exactly those given in Equations (2.20) and (2.22). Hence,
when there are tied survival times, the log-rank test corresponds to using the
score test for the discrete proportional hazards model due to Cox (1972). In
practice, the P -value that results from this score test will not usually differ
much from that obtained from comparing the values of the statistic −2 log L̂
for the models with and without a term corresponding to the treatment effect.
This was noted in the discussion of Example 3.3. Of course, one advantage of
using the Cox regression model in the analysis of such data is that it leads
directly to an estimate of the hazard ratio.
After a model has been fitted to an observed set of survival data, the adequacy
of the model needs to be assessed. Indeed, the use of diagnostic procedures
for model checking is an essential part of the modelling process.
In some situations, careful inspection of an observed set of data may lead to
the identification of certain features, such as individuals with unusually large
or small survival times. However, unless there are only one or two explanatory
variables, a visual examination of the data may not be very revealing. The sit-
uation is further complicated by censoring, in that the occurrence of censored
survival times makes it difficult to judge aspects of model adequacy, even in
the simplest of situations. Visual inspection of the data should therefore be
supplemented by diagnostic procedures for detecting inadequacies in a fitted
model.
Once a model has been fitted, there are a number of aspects of the fit
that need to be studied. For example, the model must include an
appropriate set of explanatory variables from those measured in the study, and
we will need to check that the correct functional form of these variables has
been used. It might be important to identify observed survival times that are
greater than would have been anticipated, or individuals whose explanatory
variables have an undue impact on particular hazard ratios. Also, some means
of checking the assumption of proportional hazards might be required.
Many model-checking procedures are based on quantities known as resid-
uals. These are values that can be calculated for each individual in the study,
and have the feature that their behaviour is known, at least approximately,
when the fitted model is satisfactory. A number of residuals have been pro-
posed for use in connection with the Cox regression model, and this chapter
begins with a review of some of these. The use of residuals in assessing specific
aspects of model adequacy is then discussed in subsequent sections.
are right-censored. We further suppose that a Cox regression model has been
fitted to the survival times, and that the linear component of the model con-
tains p explanatory variables, X1 , X2 , . . . , Xp . The fitted hazard function for
the ith individual, i = 1, 2, . . . , n, is therefore

ĥi(t) = exp(β̂′xi)ĥ0(t),

where β̂′xi = β̂1x1i + β̂2x2i + · · · + β̂pxpi is the value of the risk score for that individual and ĥ0(t) is the estimated baseline hazard function.
The Cox-Snell residual for the ith individual is

rCi = exp(β̂′xi)Ĥ0(ti),  (4.1)

where Ĥ0(ti) is an estimate of the baseline cumulative hazard function at time ti, the observed survival time of that individual. In practice, the Nelson-Aalen estimate given in Equation (3.28) is generally used. Note that from Equation (3.24), the Cox-Snell residual, rCi, is the value of Ĥi(ti) = − log Ŝi(ti), where Ĥi(ti) and Ŝi(ti) are the estimated values of the cumulative hazard and survivor functions of the ith individual at ti.
This residual can be derived from a general result in mathematical statis-
tics on the distribution of a function of a random variable. According to this
result, if T is the random variable associated with the survival time of an
individual, and S(t) is the corresponding survivor function, then the random
variable Y = − log S(T ) has an exponential distribution with unit mean, irre-
spective of the form of S(t). The proof of this result is outlined in the following
paragraph, which can be omitted without loss of continuity.
From a general result, if fX(x) is the probability density function of the random variable X, the density of the random variable Y = g(X) is given by

fY(y) = fX{g⁻¹(y)} / |dy/dx|.

Applying this result to the random variable Y = − log S(T) leads to

fY(y) = e⁻ʸ,
where rCi is the Cox-Snell residual for the ith observation, defined in Equa-
tion (4.1). It now remains to identify a suitable value for ∆. For this, we use
the lack of memory property of the exponential distribution.
To demonstrate this property, suppose that the random variable T has
an exponential distribution with mean λ−1 , and consider the probability that
T exceeds t0 + t1 , t1 > 0, conditional on T being at least equal to t0 . From
the standard result for conditional probability given in Section 3.3.1, this probability is

P(T > t0 + t1 | T > t0) = P(T > t0 + t1 and T > t0) / P(T > t0).

The numerator of this expression is simply P(T > t0 + t1), and so the required
probability is the ratio of the probability of survival beyond t0 +t1 to the prob-
ability of survival beyond t0 , that is S(t0 + t1 )/S(t0 ). The survivor function
for the exponential distribution is given by S(t) = e−λt , as in Equation (5.2)
of Chapter 5, and so
P(T > t0 + t1 | T > t0) = exp{−λ(t0 + t1)} / exp(−λt0) = e^{−λt1},
However, if the proportion of censored observations is not too great, the sets of
modified residuals from Equations (4.3) and (4.5) will not appear too different.
where rM i is the martingale residual for the ith individual, and the function
sgn(·) is the sign function. This is the function that takes the value +1 if
its argument is positive and −1 if negative. Thus, sgn(rM i ) ensures that the
deviance residuals have the same sign as the martingale residuals.
The original motivation for these residuals is that they are components of
the deviance. The deviance is a statistic that is used to summarise the extent
to which the fit of a model of current interest deviates from that of a model
which is a perfect fit to the data. This latter model is called the saturated
or full model, and is a model in which the β-coefficients are allowed to be
different for each individual. The statistic is given by
D = −2{ log L̂c − log L̂f },
where L̂c is the maximised partial likelihood under the current model and
L̂f is the maximised partial likelihood under the full model. The smaller the
value of the deviance, the better the model. The deviance can be regarded as a
generalisation of the residual sum of squares used in modelling normal data to
the analysis of non-normal data, and features prominently in generalised linear
modelling. Note that differences in deviance between two alternative models
are the same as differences in the values of the statistic −2 log L̂ introduced in Chapter 3. The deviance residuals are then such that D = ∑ rDi², so that
observations that correspond to relatively large deviance residuals are those
that are not well fitted by the model.
Another way of viewing the deviance residuals is that they are martingale
residuals that have been transformed to produce values that are symmetric
about zero when the fitted model is appropriate. To see this, first recall that
the martingale residuals rM i can take any value in the interval (−∞, 1). For
large negative values of rM i , the term in square brackets in Equation (4.7) is
dominated by rM i . Taking the square root of this quantity has the effect of
bringing the residual closer to zero. Thus, martingale residuals in the range
(−∞, 0) are shrunk toward zero. Now consider martingale residuals in the
interval (0, 1). The term δi log(δi − rM i ) in Equation (4.7) will only be non-
zero for uncensored observations, and will then have the value log(1 − rM i ).
As rM i gets closer to unity, 1 − rM i gets closer to zero and log(1 − rM i ) takes
large negative values. The quantity in square brackets in Equation (4.7) is
then dominated by this logarithmic term, and so the deviance residuals are
expanded toward +∞ as the martingale residual reaches its upper limit of
unity.
One final point to note is that although these residuals can be expected to
be symmetrically distributed about zero when an appropriate model has been
fitted, they do not necessarily sum to zero.
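A minimal Python sketch of the transformation from martingale to deviance residuals described above, with hypothetical names, is as follows.

import numpy as np

def deviance_residuals(r_M, delta):
    # Transform martingale residuals r_M into deviance residuals using
    # the signed square-root expression in Equation (4.7); delta is the
    # event indicator, so the logarithmic term delta*log(delta - r_M)
    # vanishes for censored observations.
    r_M = np.asarray(r_M, dtype=float)
    delta = np.asarray(delta)
    log_term = np.zeros_like(r_M)
    deaths = delta == 1
    log_term[deaths] = np.log(1.0 - r_M[deaths])  # log(delta - r_M) with delta = 1
    return np.sign(r_M) * np.sqrt(-2.0 * (r_M + log_term))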
∂ log L(β)/∂βj = ∑_{i=1}^{n} δi{xji − aji},  (4.10)

where

aji = ∑_l xjl exp(β′xl) / ∑_l exp(β′xl).  (4.11)
The ith term in this summation, evaluated at β̂, is then the Schoenfeld residual
for Xj, given in Equation (4.8). Since the estimates of the β's are such that ∂ log L(β̂)/∂βj = 0,
the Schoenfeld residuals must sum to zero. These residuals also have the prop-
erty that, in large samples, the expected value of rSji is zero, and they are
uncorrelated with one another.
It turns out that a scaled version of the Schoenfeld residuals, proposed
by Grambsch and Therneau (1994), is more effective in detecting departures
from the assumed model. Let the vector of Schoenfeld residuals for the ith
individual be denoted rSi = (rS1i, rS2i, . . . , rSpi)′. The scaled, or weighted, Schoenfeld residuals, r∗Sji, are then the components of the vector

r∗Si = d var(β̂) rSi,
where d is the number of deaths among the n individuals, and var (β̂) is the
variance-covariance matrix of the parameter estimates in the fitted Cox re-
gression model. These scaled Schoenfeld residuals are therefore quite straight-
forward to compute.
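Indeed, a minimal Python sketch of this calculation, with hypothetical names, requires only a single matrix product.

import numpy as np

def scaled_schoenfeld(r_S, var_beta, d):
    # r*_Si = d var(beta-hat) r_Si for each of the d death times; r_S is
    # a d x p matrix with one row of Schoenfeld residuals per death, and
    # var_beta is the p x p variance-covariance matrix of the estimates.
    return d * (np.asarray(r_S, dtype=float) @ np.asarray(var_beta, dtype=float))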
The derivative in Equation (4.10) can also be expressed as

∑_{i=1}^{n} [ δi{xji − aji} + exp(β′xi) ∑_{tr ≤ ti} (ajr − xji)δr / ∑_{l∈R(tr)} exp(β′xl) ],  (4.12)
where xji is the ith value of the jth explanatory variable, δi is the event indi-
cator which is zero for censored observations and unity otherwise, aji is given
in Equation (4.11), and R(tr ) is the risk set at time tr . In this formulation,
the contribution of the ith observation to the derivative only depends on in-
formation up to time ti . In other words, if the study was actually concluded
at time ti , the ith component of the derivative would be unaffected. Residuals
are then obtained as the estimated value of the n components of the deriva-
tive. From Appendix A, the first derivative of the logarithm of the partial
likelihood function, with respect to βj , is the efficient score for βj , written
u(βj ). These residuals are therefore known as score residuals, and are denoted
by rU ji .
From Equation (4.12), the ith score residual, i = 1, 2, . . . , n, for the jth
explanatory variable in the model, Xj , is given by
rUji = δi(xji − âji) + exp(β̂′xi) ∑_{tr ≤ ti} (âjr − xji)δr / ∑_{l∈R(tr)} exp(β̂′xl),
which shows that the score residuals are modifications of the Schoenfeld resid-
uals. As for the Schoenfeld residuals, the score residuals sum to zero, but will
not necessarily be zero when an observation is censored.
In this section, a number of residuals have been defined. We conclude with
an example that illustrates the calculation of these different types of residual
and that shows similarities and differences between them. This example will be
used in many illustrations in this chapter, mainly because the relatively small
number of observations allows the values of the residuals and other diagnostics
to be readily tabulated. However, the methods of this chapter are generally
more informative in larger data sets.
When a Cox regression model is fitted to these data, the estimated hazard
function for the ith patient, i = 1, 2, . . . , 13, is found to be
ĥi (t) = exp {0.030 Age i − 2.711 Sex i } ĥ0 (t), (4.14)
where Age i and Sex i refer to the age and sex of the ith patient.
The variable Sex is certainly important, since when Sex is added to the
model that contains Age alone, the decrease in the value of the −2 log L̂ statis-
tic is 6.445 on 1 d.f. This change is highly significant (P = 0.011). On the
other hand, there is no statistical evidence for including the variable Age in
the model, since the change in the value of the −2 log L̂ statistic on adding
Age to the model that contains Sex is 1.320 on 1 d.f. (P = 0.251). However, it
can be argued that from a clinical viewpoint, the hazard of infection may well
depend on age. Consequently, both variables will be retained in the model.
The values of different types of residual for the model in Equation (4.14)
are displayed in Table 4.2. In this table, rCi , rM i and rDi are the Cox-Snell
residuals, martingale residuals and deviance residuals, respectively. Also rS1i
and rS2i are the values of Schoenfeld residuals for the variables Age and Sex,
respectively, r∗S1i and r∗S2i are the corresponding scaled Schoenfeld residuals,
and rU 1i , rU 2i are the score residuals.
The values in this table were computed using the Nelson-Aalen estimate
of the baseline cumulative hazard function given in Equation (3.28). Had the
estimate Ĥ0 (t), in Equation (3.22), been used, different values for all but the
Table 4.2 Different types of residual after fitting a Cox regression model.
Patient   rCi      rMi      rDi       rS1i      rS2i     r∗S1i     r∗S2i     rU1i      rU2i
1 0.280 0.720 1.052 −1.085 −0.242 0.033 −3.295 −0.781 −0.174
2 0.072 0.928 1.843 14.493 0.664 0.005 7.069 13.432 0.614
3 1.214 −0.214 −0.200 3.129 −0.306 0.079 −4.958 −0.322 0.058
4 0.084 0.916 1.765 −10.222 0.434 −0.159 8.023 −9.214 0.384
5 1.506 −0.506 −0.439 −16.588 −0.550 −0.042 −5.064 9.833 0.130
6 0.265 −0.265 −0.728 – – – – −3.826 −0.145
7 0.235 0.765 1.168 −17.829 0.000 −0.147 3.083 −15.401 −0.079
8 0.484 0.516 0.648 −7.620 0.000 −0.063 1.318 −7.091 −0.114
9 1.438 −0.438 −0.387 17.091 0.000 0.141 −2.955 −15.811 −0.251
10 1.212 −0.212 −0.199 10.239 0.000 0.085 −1.770 1.564 −0.150
11 1.187 −0.187 −0.176 2.857 0.000 0.024 −0.494 6.575 −0.101
12 1.828 −0.828 −0.670 5.534 0.000 0.046 −0.957 4.797 −0.104
13 2.195 −1.195 −0.904 0.000 0.000 0.000 0.000 16.246 −0.068
We now consider how residuals obtained after fitting a Cox regression model
can be used to throw light on the extent to which the fitted model provides
an appropriate description of the observed data. We will then be in a position
to study the residuals obtained in Example 4.1 in greater detail.
4.2 Assessment of model fit
A number of plots based on residuals can be used in the graphical assessment
of the adequacy of a fitted model. Unfortunately, many graphical procedures
that are analogues of residual plots used in linear regression analysis have not
proved to be very helpful. This is because plots of residuals against quantities
such as the observed survival times, or the rank order of these times, often
exhibit a definite pattern, even when the correct model has been fitted. Tra-
ditionally, plots of residuals have been based on the Cox-Snell residuals, or
adjusted versions of them described in Section 4.1.2. The use of these residuals
is therefore reviewed in the next section, and this is followed by a description
of how some other types of residuals may be used in the graphical assessment
of the fit of a model.
The relatively small number of observations in this data set makes it diffi-
cult to interpret plots of residuals. However, the plotted points in Figure 4.1
are fairly close to a straight line through the origin, which has approximately
unit slope. This could suggest that the model fitted to the data given in Ta-
ble 4.1 is satisfactory.
On the face of it, this procedure would appear to have some merit, but
cumulative hazard plots of the Cox-Snell residuals have not proved to be
very useful in practice. In an earlier section it was argued that since the val-
ues − log S(ti ) have a unit exponential distribution, the Cox-Snell residuals,
which are estimates of these quantities, should have an approximate unit ex-
ponential distribution when the fitted model is correct. This result is then
used when interpreting a cumulative hazard plot of the residuals. Unfortunately, …

Figure 4.1 Cumulative hazard plot of the Cox-Snell residuals.
Figure 4.2 Index plots of the martingale and deviance residuals.
The plots are quite similar, but the distribution of the deviance residuals
is seen to be more symmetric. The plots also show that there are no patients
who have residuals that are unusually large in absolute value. Figure 4.3 gives
a plot of the deviance residuals against the risk scores, that are found from
the values of 0.030 Age i − 2.711 Sex i , for i = 1, 2, . . . , 13.
Figure 4.3 Plot of the deviance residuals against the values of the risk score.
This figure shows that patients with the largest deviance residuals have
low risk scores. This indicates that these patients are at relatively low risk
of an early catheter removal, and yet their removal time is sooner than
expected.
Figure 4.4 Plot of the martingale residuals for the null model against Age, with a
smoothed curve superimposed.
There is too little data to say much about this graph, but the smoothed
curve indicates that there is no need for anything other than a linear term in
Age. In fact, the age effect is not actually significant, and so it is not surprising
that the smoothed curve is roughly horizontal.
Figure 4.5 Cumulative hazard plot of the Cox-Snell residuals.

As pointed out in Section 4.2.1, this plot is not very sensitive to departures from the fitted model.
To further assess the fit of the model, the deviance residuals are plotted
against the corresponding risk scores in Figure 4.6.
Figure 4.6 Plot of the deviance residuals against the values of the risk score.
Figure 4.7 Plot of the martingale residuals for the null model against the values of
Hb, with a smoothed curve superimposed.
The plots for Hb and Bun confirm that linear terms in each variable are
required in the model. Note that the slope of the plot for Hb in Figure 4.7 is
negative, corresponding to the negative coefficient of Hb in the fitted model,
while the plot for Bun in Figure 4.8 has a positive slope.
In this data set, the values of Bun range from 6 to 172, and the distribution
of their values across the 48 subjects is positively skewed. In order to guard
against the extreme values of this variate having an undue impact on the co-
efficient of Bun, logarithms of this variable might be used in the modelling
process. Although there is no suggestion of this in Figure 4.8, for illustrative
purposes, we will use this type of plot to investigate whether a model contain-
ing log Bun rather than Bun is acceptable. Figure 4.9 shows the martingale
residuals for the null model plotted against the values of log Bun.
Figure 4.8 Plot of the martingale residuals for the null model against the values of
Bun, with a smoothed curve superimposed.
Figure 4.9 Plot of the martingale residuals for the null model against the values of
log Bun, with a smoothed curve superimposed.
The smoothed curve in this figure does suggest that it is not appropriate
to use a linear term in log Bun. Indeed, if it were decided to use log Bun in the
model, Figure 4.9 indicates that a quadratic term in log Bun may be needed.
In fact, adding this quadratic term to a model that includes Hb and log Bun
leads to a significant reduction in the value of −2 log L̂, but the resulting value
of this statistic, 201.458, is then only slightly less than the corresponding value
for the model containing Hb and Bun, which is 202.938. This analysis confirms
that the model should contain linear terms in the variables Hb and Bun.
The change in the vector of parameter estimates on omitting the ith observation can be approximated by

r′Ui var(β̂),

var(β̂) being the variance-covariance matrix of the vector of parameter estimates in the fitted Cox regression model. The jth element of this vector, which is called a delta-beta, will be denoted by ∆iβ̂j, so that ∆iβ̂j ≈ β̂j − β̂j(i). Use
of this approximation means that the values of ∆i β̂j can be computed from
quantities available after fitting the model to the full data set.
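A minimal Python sketch of this approximation, with hypothetical names, assuming the score residuals and the variance-covariance matrix are available from the fitted model, is as follows.

import numpy as np

def delta_betas(r_U, var_beta, standardise=False):
    # Approximate delta-betas r'_Ui var(beta-hat): row i holds the
    # approximate change in each parameter estimate when the ith
    # observation is omitted; r_U is the n x p matrix of score
    # residuals. Setting standardise=True divides each column by the
    # standard error of the corresponding estimate.
    db = np.asarray(r_U, dtype=float) @ np.asarray(var_beta, dtype=float)
    if standardise:
        db = db / np.sqrt(np.diag(np.asarray(var_beta, dtype=float)))
    return db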
Observations that influence a particular parameter estimate, the jth say,
will be such that the values of ∆i β̂j , the delta-betas for these observations, are
larger in absolute value than for other observations in the data set. Index plots
of the delta-betas for each explanatory variable in the model will then reveal
whether there are observations that have an undue impact on the parameter
estimate for any particular explanatory variable. In addition, a plot of the
values of ∆i β̂j against the rank order of the survival times yields information
about the relation between survival time and influence.
The delta-betas may be standardised by dividing ∆i β̂j by the standard
error of β̂j to give a standardised delta-beta. The standardised delta-beta can
be interpreted as the change in the value of the statistic β̂/ se (β̂), on omitting
the ith observation. Since this statistic can be used in assessing whether a par-
ticular parameter has a value significantly different from zero (see Section 3.4
of Chapter 3), the standardised delta-beta can be used to provide information
on how the significance of the parameter estimate is affected by the removal
of the ith observation from the database. Again, an index plot is the most
useful way of displaying the standardised delta-betas.
The statistic ∆i β̂j is an approximation to the actual change in the pa-
rameter estimate when the ith observation is omitted from the fit. The ap-
proximation is generally adequate in the sense that observations that have
an influence on a parameter estimate will be highlighted. However, the ac-
tual effect of omitting any particular observation on model-based inferences
will need to be studied. The agreement between the actual and approximate
delta-betas in a particular situation is illustrated in Example 4.6.
The largest delta-beta for Age occurs for patient number 13, but there
are other delta-betas with similar values. The actual change in the parameter
estimate on omitting the data for this patient is 0.0195, and so omission of
this observation reduces the hazard of infection relative to the baseline hazard.
The standard error of the parameter estimate for Age in the full data set is
0.026, and so the maximum amount by which this estimate is changed when
one observation is deleted is about three-quarters of a standard error. When
the data from patient 13 is omitted, the age effect becomes less significant,
but the difference is unlikely to be of practical importance.
There are two large delta-betas for Sex that are quite close to one an-
other. These correspond to the observations from patients 2 and 4. The actual
change in the parameter estimate when each observation is omitted in turn is
0.820 and 0.818, and so the approximate delta-betas underestimate the actual
change. The standard error of the estimated coefficient of Sex in the full data
set is 1.096, and so again the change in the estimate on deleting an observa-
tion is less than one standard error. The effect of deleting either of these two
observations is to increase the hazard for males relative to females, so that
the sex effect is slightly more significant.
The approximate delta-betas can be compared with the actual values. In
this example, the agreement is generally quite good, although there is a ten-
dency for the actual changes in the parameter estimates to be underestimated
by the approximation. The largest difference between the actual and approxi-
mate value of the delta-beta for Age is 0.010, which occurs for patient number
8. That for Sex is 0.276, which occurs for patient number 2. These differ-
ences are about a quarter of the value of the standard error of each parameter
estimate.
The observations that most affect the value of the maximised log-likelihood
when they are omitted are those corresponding to patients 2 and 4. The value
of the likelihood displacement diagnostic is also quite large for patient number
13. This means that the set of parameter estimates is most affected by the removal of any one of these three patients from the database.
The fourth element of |lmax |, |lmax |4 , is the largest in absolute value, and
indicates that omitting the data from patient number 4 has the greatest effect
on the pair of parameter estimates. The elements corresponding to patients 2
and 13 are also large relative to the other values, suggesting that the data for
these patients are also influential. The sum of the squares of elements 2, 4 and
13 of |lmax| is 0.70. The total sum of squares of the elements is 1.00, and so cases 2, 4 and 13 account for nearly three-quarters of the variability in the values |lmax|i. Note that the analysis of the delta-betas in Example 4.6 showed
that the observations from patients 2 and 4 most influence the parameter
estimate for Sex, while the observation for patient 13 has a greater effect on
the estimate for Age.
In summary, the observations from patients 2, 4 and 13 affect the form
of the hazard function to the greatest extent. Omitting each of these in turn
gives the following estimates of the linear component in the hazard functions
for the ith individual.
For comparison, the linear component for the full data set is
0.030 Age i − 2.711 Sex i .
To illustrate the magnitude of the change in estimated hazard ratios, consider
the relative hazard of infection at time t for a patient aged 50 years relative
to one aged 40 years. For the full data set, this is e0.304 = 1.355. This value
increases to 1.365 and 1.564 when patients 2 and 4, respectively, are omitted,
and decreases to 1.114 when patient 13 is omitted. The effect on the haz-
ard function of removing these patients from the database is therefore not
particularly marked.
In the same way, the hazard of infection at time t for a male patient
(Sex = 1) relative to a female (Sex = 2) is e^{2.711}, that is, 15.04 for the full
data set. When observations 2, 4, and 13 are omitted in turn, the hazard ratio
for males relative to females is 4.138, 4.097 and 9.334, respectively. Omission
of the data from patient number 13 appears to have a great effect on the
estimated hazard ratio. However, some caution is needed in interpreting this
result. Since there are very few males in the data set, the estimated hazard
ratio is imprecisely estimated. In fact, a 95% confidence interval for the hazard
ratio, when the data from patient 13 are omitted, ranges from 0.012 to 82.96!
Figure 4.10 Plot of the delta-betas for Hb against rank order of survival time.
From Figure 4.10, no one observation stands out as having a delta-beta for
Hb that is different from the rest. However, Figure 4.11 shows that the two
observations with the shortest survival times have relatively large positive or
large negative delta-betas for Bun. These correspond to patients 32 and 38 in
the data given in Table 1.3. Patient 32 has a survival time of just one month,
and the second largest value of Bun. Deletion of this observation from the
database decreases the parameter estimate for Bun. Patient number 38 also
survived for just one month after trial entry, but has a value of Bun that is
Figure 4.11 Plot of the delta-betas for Bun against rank order of survival time.
rather low for someone surviving for such a short time. If the data from this
patient are omitted, the coefficient of Bun in the model is increased.
To identify observations that influence the set of parameter estimates, a
plot of the absolute values of the elements of the diagnostic lmax against the
rank order of the survival times is shown in Figure 4.12.
The observation with the largest value of |lmax | corresponds to patient 13.
This patient has an unusually small value of Hb, and a value of Bun that
is a little high, for someone who has survived as long as 65 months. If this
observation is omitted from the data set, the coefficient of Bun remains the
same, but that of Hb is reduced from −0.134 to −0.157. The effect of Hb on
the hazard of death is then a little more significant. In summary, the record
for patient 13 has little effect on the form of the estimated hazard function.
Figure 4.12 Plot of the absolute values of the elements of lmax against rank order of
survival time.
Under proportional hazards, Hi(t) = exp(β′xi)H0(t), where Hi(t) and H0(t) are the cumulative hazard functions. Taking logarithms of each side of this equation, we get log Hi(t) = β′xi + log H0(t), so that log-cumulative hazard plots for groups of individuals with different values of the explanatory variables will be parallel.
Figure 4.13 Log-cumulative hazard plot for multiple myeloma patients in four groups defined by Hb ≤ 7, 7 < Hb ≤ 10, 10 < Hb ≤ 13 and Hb > 13.
Figure 4.14 Plot of scaled Schoenfeld residuals for Age and Sex.
the null hypothesis that the slope is zero, this statistic has a χ2 distribution
on 1 d.f., and significantly large values of Expression (4.18) lead to rejection
of the proportional hazards assumption for the jth explanatory variable.
An overall or global test of the proportional hazards assumption across all
the p explanatory variables included in a Cox regression model is obtained by
aggregating the individual test statistics in Expression (4.18). This leads to
the statistic
(τ − τ̄ )′ S var (β̂)S ′ (τ − τ̄ )
∑d , (4.19)
i=1 (τi − τ̄ ) /d
2
where τ = (τ1, τ2, . . . , τd)′ is the vector formed from the d event times and S is the d × p matrix whose jth column contains the (unscaled) Schoenfeld residuals for the jth explanatory variable, that is, (rSj1, rSj2, . . . , rSjd)′, and var(β̂)
is the variance-covariance matrix of the estimated coefficients of the explana-
tory variables in the fitted Cox regression model. The test statistic in Expres-
sion (4.19) has a χ2 distribution on p d.f. when the assumption of proportional
hazards across all p explanatory variables is true. This test is known as the
Grambsch and Therneau test of proportional hazards, and is sometimes more
enigmatically referred to as the zph test.
The test statistics in Expressions (4.18) and (4.19) can be adapted to
other time-scales by replacing the τi , i = 1, 2, . . . , d, by transformed values
of the death times. For example, using logarithms of the death times, rather
than the times themselves, would allow linearity in the coefficient of Xj to
be assessed on a logarithmic scale. The τi can also be replaced by the rank
order of the death times, or by the Kaplan-Meier estimate of the survivor
function at each event time. Plots of scaled Schoenfeld residuals against time,
discussed in Section 4.4.2, may indicate which of these possible options is the
most appropriate.
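A minimal Python sketch of the global test statistic in Expression (4.19), with hypothetical names, is shown below; the chi-squared tail probability uses the scipy library.

import numpy as np
from scipy.stats import chi2

def global_zph_test(tau, S, var_beta):
    # Expression (4.19): tau holds the d event times, S is the d x p
    # matrix of unscaled Schoenfeld residuals, and the statistic is
    # referred to a chi-squared distribution on p degrees of freedom.
    tau = np.asarray(tau, dtype=float)
    S = np.asarray(S, dtype=float)
    var_beta = np.asarray(var_beta, dtype=float)
    c = tau - tau.mean()
    numerator = c @ S @ var_beta @ S.T @ c
    denominator = np.sum(c**2) / len(tau)
    statistic = numerator / denominator
    return statistic, chi2.sf(statistic, df=S.shape[1])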
Suppose that the hazard of death at time t for the ith individual is

hi(t) = exp(β1x1i)h0(t),  (4.20)

where x1i is the value of an indicator variable X1 that is zero for the standard treatment and unity for the new treatment.
any time for a patient on the new treatment, relative to one on the standard,
is then eβ1 , which is independent of the survival time.
Now define a time-dependent explanatory variable X2 , where X2 = X1 t.
If this variable is added to the model in Equation (4.20), the hazard of death
at time t for the ith individual becomes

hi(t) = exp(β1x1i + β2x2i)h0(t),  (4.21)

where x2i = x1it is the value of X1t for the ith individual. The relative hazard
at time t is now
exp(β1 + β2 t), (4.22)
since X2 = t under the new treatment, and zero otherwise. This hazard ratio
depends on t, and the model in Equation (4.21) is no longer a proportional
hazards model. In particular, if β2 < 0, the relative hazard decreases with
time. This means that the hazard of death on the new treatment, relative
to that on the standard, decreases with time. If β1 < 0, the superiority of
the new treatment becomes more apparent as time goes on. On the other
hand, if β2 > 0, the relative hazard of death on the new treatment increases
with time, reflecting an increasing risk of death on the new treatment relative
to the standard. In the particular case where β2 = 0, the relative hazard is
constant at eβ1 . This means that a test of the hypothesis that β2 = 0 is a
test of the assumption of proportional hazards. The situation is illustrated in
Figure 4.15.
In order to aid both the computation and interpretation of the parameters
in the model of Equation (4.21), the variable X2 can be defined in terms of
the deviation from some time, t0 . The estimated values of β1 and β2 will then
tend to be less highly correlated, and the numerical algorithm for maximising
the appropriate likelihood function will be more stable.

Figure 4.15 Plot of the relative hazard, exp(β1 + β2t), against t, for different values of β2.

If X2 is taken to be X1(t − t0), the relative hazard at time t becomes

exp{β1 + β2(t − t0)}.
In the model of Equation (4.21), with x2i = x1i (t − t0 ), the quantity eβ1 is the
hazard of death at time t0 for an individual on the new treatment relative to
one on the standard. In practical applications, t0 will generally be chosen to
provide a convenient interpretation for the time at which this relative hazard
is applicable. For example, taking t0 to be the mean or median survival time
means that exp(β̂1 ) is the estimated relative hazard of death at this time.
A similar model can be used to detect whether the coefficient of a continuous variate depends on time. Suppose that X is
such a variate, and we wish to examine whether there is any evidence that
the coefficient of X is linearly dependent on time. To do this, the term Xt is
added to the model that includes X. The hazard of death at time t for the
ith individual is then

hi(t) = exp(β1xi + β2xit)h0(t),

where xi is the value of X for that individual. The hazard of death at time
t for an individual for whom X = xi + 1, relative to an individual for whom
X = xi , is then exp(β1 + β2 t), as in Equation (4.22).
The time-dependent variables considered in this section are such that their
coefficients are linearly dependent on time. A similar approach can be used
when a coefficient that is a non-linear function of time is anticipated. For ex-
ample, log t might be used in place of t in the definition of the time-dependent
variable X2 , used in Equation (4.21). In this version of the model, a test of
the hypothesis that β2 = 0 is a test of proportional hazards, where the al-
ternative hypothesis is that the hazard ratio is dependent on the logarithm
of time. Using log t in the definition of a time-dependent variable is also
helpful when the numerical values of the survival times are large, such as
when survival in a long-term study is measured in days. There may then be
computational problems associated with calculating the value of exp(β2 x2i )
in Equation (4.21), which are resolved by using log t in place of t in the
definition of X2 .
Models that include the time-dependent variable X2 cannot be fitted by
treating X2 in the same manner as other explanatory variables in the model.
The reason for this is that this variable will have different values at different
death times, complicating the calculation of the denominator of the partial
likelihood function in Equation (3.4). Full details on the fitting process will be
deferred to Chapter 8. However, inferences about the effect of time-dependent
variables on the hazard function can be evaluated as for other variables. In
particular, the change in the value of the −2 log L̂ statistic can be compared
to percentage points of the chi-squared distribution to test the significance of
the variable. This is therefore a formal test of proportional hazards.
4.5 Recommendations
When the Cox regression model is used in the analysis of survival data, there
is no need to assume a particular form of probability distribution for the sur-
vival times. As a result, the hazard function is not restricted to a specific
functional form, and the model has flexibility and widespread applicability.
On the other hand, if the assumption of a particular probability distribution
for the data is valid, inferences based on such an assumption will be more pre-
cise. In particular, estimates of quantities such as relative hazards and median
survival times will tend to have smaller standard errors than they would in
the absence of a distributional assumption. Models in which a specific proba-
bility distribution is assumed for the survival times are known as parametric
models, and parametric versions of the proportional hazards model, described
in Chapter 3, are the subject of this chapter.
A probability distribution that plays a central role in the analysis of sur-
vival data is the Weibull distribution, introduced by W. Weibull in 1951 in the
context of industrial reliability testing. Indeed, this distribution is as central
to the parametric analysis of survival data as the normal distribution is in lin-
ear modelling. Proportional hazards models based on the Weibull distribution
are therefore considered in some detail.
and

h(t) = f(t)/S(t) = −(d/dt){log S(t)},
where f (t) is the probability density function of the survival times. These
relationships were derived in Section 1.3. An alternative approach is to specify
a functional form for the hazard function, from which the survivor function
and probability density functions can be determined from the equations

S(t) = exp{−H(t)}

and

f(t) = h(t)S(t) = −dS(t)/dt,

where

H(t) = ∫₀ᵗ h(u) du

is the cumulative hazard function.
In the simplest model for survival data, the hazard function is assumed to be constant over time, so that

h(t) = λ,

for 0 ≤ t < ∞; this constant hazard corresponds to the exponential distribution.
Figure 5.1 Hazard functions for exponential distributions with λ = 1.0, 0.1 and 0.01.
Figure 5.2 Probability density functions for exponential distributions with λ = 1.0,
0.1 and 0.01.
f(t) = λγt^{γ−1} exp(−λt^γ),

for 0 ≤ t < ∞, which is the density of a random variable that has a Weibull
distribution with scale parameter λ and shape parameter γ. This distribution
will be denoted W (λ, γ). The right-hand tail of this distribution is longer than
the left-hand one, and so the distribution is positively skewed.
The mean, or expected value, of a random variable T that has a W(λ, γ) distribution can be shown to be given by

E(T) = λ^{−1/γ} Γ(1 + γ⁻¹),

where Γ(x) is the gamma function, defined by the integral

Γ(x) = ∫₀^∞ u^{x−1} e^{−u} du.

The value of this integral is (x − 1)!, and so for integer values of x it can easily
be calculated. The function is also defined for non-integer values of x, and can then be evaluated as a standard function in many software packages.

Figure 5.3 The form of the Weibull hazard function, h(t) = λγt^{γ−1}, for different values of γ.
Since the Weibull distribution is skewed, a more appropriate, and more
tractable, summary of the location of the distribution is the median survival
time. This is the value t(50) such that S{t(50)} = 0.5, so that

t(50) = {(1/λ) log 2}^{1/γ}.
More generally, the pth percentile of the Weibull distribution, t(p), is such that

t(p) = {(1/λ) log(100/(100 − p))}^{1/γ}.  (5.6)
The median and other percentiles of the Weibull distribution are therefore
much simpler to compute than the mean of the distribution.
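A minimal Python sketch of Equation (5.6), with hypothetical names, is as follows.

import numpy as np

def weibull_percentile(lam, gamma, p):
    # Equation (5.6): pth percentile of the W(lam, gamma) distribution.
    return (np.log(100.0 / (100.0 - p)) / lam) ** (1.0 / gamma)

For instance, weibull_percentile(0.0078, 1.5, 50) returns a median close to 20, matching the parameter values quoted below.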
The hazard function and corresponding probability density function for
Weibull distributions with a median of 20, and shape parameters γ = 0.5, 1.5
and 3.0, are shown in Figures 5.4 and 5.5, respectively. The corresponding
value of the scale parameter, λ, for these three Weibull distributions is 0.15,
0.0078 and 0.000087, respectively.
Since the Weibull hazard function can take a variety of forms, depending on
the value of the shape parameter, γ, and appropriate summary statistics can
Figure 5.4 Hazard functions for a Weibull distribution with a median of 20 and γ =
0.5, 1.5 and 3.0.
Figure 5.5 Probability density functions for a Weibull distribution with a median of
20 and γ = 0.5, 1.5 and 3.0.
be easily obtained, this distribution is widely used in the parametric analysis
of survival data.
Figure 5.6 Log-cumulative hazard plot for the data from Example 1.1.
The plot indicates that there is a straight line relationship between the
log-cumulative hazard and log t, confirming that the Weibull distribution is
an appropriate model for the discontinuation times. From the graph, the in-
tercept of the line is approximately −6.0 and the slope is approximately 1.25.
Approximate estimates of the parameters of the Weibull distribution are there-
fore λ∗ = exp(−6.0) = 0.002 and γ ∗ = 1.25. The estimated value of γ, the
shape parameter of the Weibull distribution, is quite close to unity, suggesting
that the discontinuation times might be adequately modelled by an exponen-
tial distribution.
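A plot of this kind is easily produced from a Kaplan-Meier estimate of the survivor function. The sketch below is a minimal illustration in Python; the times and censoring indicators are assumed here to be the Example 1.1 data, and the Kaplan-Meier step is coded directly rather than taken from a survival library.

```python
import numpy as np
import matplotlib.pyplot as plt

# Discontinuation times and event indicators (1 = discontinuation observed,
# 0 = censored); assumed to be the data of Example 1.1.
t = np.array([10, 13, 18, 19, 23, 30, 36, 38, 54, 56, 59, 75, 93, 97, 104, 107, 107, 107])
d = np.array([ 1,  0,  0,  1,  0,  1,  1,  0,  0,  0,  1,  1,  1,  1,   0,   1,   0,   0])

# Kaplan-Meier estimate of the survivor function at each distinct event time
s, times, surv = 1.0, [], []
for u in np.unique(t[d == 1]):
    n_risk = np.sum(t >= u)                  # number at risk just before u
    n_event = np.sum((t == u) & (d == 1))    # events occurring at u
    s *= (n_risk - n_event) / n_risk
    times.append(u)
    surv.append(s)
times, surv = np.array(times), np.array(surv)

# Under a Weibull model, log{-log S(t)} is linear in log t, with slope gamma
# and intercept log lambda.
x, y = np.log(times), np.log(-np.log(surv))
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)      # rough estimates of gamma and log lambda

plt.plot(x, y, "o")
plt.xlabel("Log of discontinuation time")
plt.ylabel("Log-cumulative hazard")
plt.show()
```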
If all n survival times are observed, the likelihood of the data is
\[
\prod_{i=1}^{n} f(t_i).
\]
When some of the observations are censored, each censored time t* contributes a factor S(t*), and the likelihood becomes
\[
\prod_{j=1}^{r} f(t_j) \prod_{l=1}^{n-r} S(t_l^*), \tag{5.8}
\]
in which the first product is taken over the r death times and the second over
the n − r censored survival times.
More compactly, suppose that the data are regarded as n pairs of obser-
vations, where the pair for the ith individual is (ti , δi ), i = 1, 2, . . . , n. In this
notation, δi is an indicator variable that takes the value zero when the sur-
vival time ti is censored, and unity when ti is an uncensored survival time.
The likelihood function can then be written as
\[
\prod_{i=1}^{n} \{f(t_i)\}^{\delta_i} \{S(t_i)\}^{1-\delta_i}. \tag{5.9}
\]
Since f(t) = h(t)S(t), this likelihood can equivalently be written as
\[
\prod_{i=1}^{n} \{h(t_i)\}^{\delta_i}\, S(t_i). \tag{5.10}
\]
This version of the likelihood function is particularly useful when the probabil-
ity density function has a complicated form, as it often does. Estimates of the
unknown parameters in this likelihood function are then found by maximising
the logarithm of the likelihood function.
so that
\[
\mathrm{P}(\tau_i = t, \delta_i = 0) = f_{C_i}(t)\, S_{T_i}(t).
\]
The likelihood of the n observations is then
\[
\prod_{i=1}^{n} \{f_{T_i}(t_i)\, S_{C_i}(t_i)\}^{\delta_i} \{f_{C_i}(t_i)\, S_{T_i}(t_i)\}^{1-\delta_i},
\]
and if the distribution of the censoring times does not depend on the parameters of the survival time distribution, the likelihood is proportional to
\[
\prod_{i=1}^{n} f_{T_i}(t_i)^{\delta_i}\, S_{T_i}(t_i)^{1-\delta_i},
\]
where δi is zero if the survival time of the ith individual is censored and unity
otherwise. After some simplification,
\[
L(\lambda) = \prod_{i=1}^{n} \lambda^{\delta_i} e^{-\lambda t_i},
\]
We now need to identify the value λ̂, for which the log-likelihood function is
a maximum. Differentiation with respect to λ gives
\[
\frac{d \log L(\lambda)}{d\lambda} = \frac{r}{\lambda} - \sum_{i=1}^{n} t_i,
\]
where r is the number of death times.
This result could be used to obtain a confidence interval for the mean survival
time. In particular, the limits of a 100(1 − α)% confidence interval for λ are
λ̂ ± zα/2 se (λ̂), where zα/2 is the upper α/2-point of the standard normal
distribution.
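In code, the exponential fit amounts to two lines: λ̂ = r/Σ ti and se(λ̂) = λ̂/√r, the latter being the result in Equation (5.12). A sketch, again assuming the Example 1.1 data:

```python
import numpy as np
from scipy.stats import norm

t = np.array([10, 13, 18, 19, 23, 30, 36, 38, 54, 56, 59, 75, 93, 97, 104, 107, 107, 107])
d = np.array([ 1,  0,  0,  1,  0,  1,  1,  0,  0,  0,  1,  1,  1,  1,   0,   1,   0,   0])

r = d.sum()                      # number of uncensored discontinuation times
lam_hat = r / t.sum()            # maximum likelihood estimate of lambda
se_lam = lam_hat / np.sqrt(r)    # se(lambda-hat) = lambda-hat / sqrt(r)

z = norm.ppf(0.975)              # upper 2.5% point of the standard normal
print(lam_hat, se_lam)                             # about 0.0086 and 0.0029
print(lam_hat - z * se_lam, lam_hat + z * se_lam)  # 95% limits for lambda
```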
In presenting the results of a survival analysis, the estimated survivor and
hazard functions, and the median and other percentiles of the distribution of
survival times, are useful. Once an estimate of λ has been found, all these func-
tions can be estimated using the results given in Section 5.1.1. In particular,
under the assumed exponential distribution, the estimated hazard function is
ĥ(t) = λ̂ and the estimated survivor function is Ŝ(t) = exp(−λ̂t). In addition,
the estimated pth percentile is given by
\[
\hat t(p) = \frac{1}{\hat\lambda}\log\left(\frac{100}{100-p}\right), \tag{5.13}
\]
and on substituting for se (λ̂) from Equation (5.12) and t̂(p) from Equa-
tion (5.13), we find
\[
\mathrm{se}\{\hat t(p)\} = \hat t(p)/\sqrt{r}. \tag{5.16}
\]
In particular, the standard error of the estimated median survival time is
\[
\mathrm{se}\{\hat t(50)\} = \hat t(50)/\sqrt{r}. \tag{5.17}
\]
Confidence intervals for a true percentile are best obtained from exponenti-
ating the confidence limits for the logarithm of the percentile. This procedure
ensures that confidence limits for the percentile will be non-negative. Again
making use of the result in Equation (5.15), the standard error of log t̂(p) is
given by
\[
\mathrm{se}\{\log \hat t(p)\} = \hat t(p)^{-1}\, \mathrm{se}\{\hat t(p)\},
\]
and after substituting for se {t̂(p)} from Equation (5.16), this standard error
becomes
\[
\mathrm{se}\{\log \hat t(p)\} = 1/\sqrt{r}.
\]
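These results make interval estimation for any percentile of the exponential model mechanical: compute t̂(p) from Equation (5.13), attach se{log t̂(p)} = 1/√r, and exponentiate. A sketch, using the values of λ̂ and r obtained above for the discontinuation data:

```python
import numpy as np
from scipy.stats import norm

lam_hat, r = 0.0086, 9    # estimate and event count from the previous sketch

def exp_percentile_ci(p, lam_hat, r, level=0.95):
    """Estimated pth percentile under an exponential model, Equation (5.13),
    with confidence limits exponentiated from the log scale, where
    se{log t-hat(p)} = 1/sqrt(r)."""
    t_p = np.log(100.0 / (100.0 - p)) / lam_hat
    half = norm.ppf(0.5 + level / 2.0) / np.sqrt(r)
    return t_p, t_p * np.exp(-half), t_p * np.exp(half)

print(exp_percentile_ci(50, lam_hat, r))
# roughly (80.6, 42, 155), agreeing with the median interval quoted below
```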
[Figure: estimated hazard function plotted against discontinuation time.]
[Figure: estimated survivor function plotted against discontinuation time.]
and so the interval is from 42 days to 155 days. Confidence intervals for other
percentiles can be calculated in a similar manner.
and so, from Expression (5.9), the likelihood of the n survival times is
\[
\prod_{i=1}^{n} \left\{\lambda\gamma t_i^{\gamma-1} \exp(-\lambda t_i^{\gamma})\right\}^{\delta_i} \left\{\exp(-\lambda t_i^{\gamma})\right\}^{1-\delta_i},
\]
where δi is zero if the ith survival time is censored and unity otherwise. Equiv-
alently, from Expression (5.10), the likelihood function is
\[
\prod_{i=1}^{n} \left\{\lambda\gamma t_i^{\gamma-1}\right\}^{\delta_i} \exp(-\lambda t_i^{\gamma}).
\]
Setting the derivatives of the log-likelihood with respect to λ and γ equal to zero gives the equations
\[
\frac{r}{\hat\lambda} - \sum_{i=1}^{n} t_i^{\hat\gamma} = 0, \tag{5.18}
\]
and
\[
\frac{r}{\hat\gamma} + \sum_{i=1}^{n} \delta_i \log t_i - \hat\lambda \sum_{i=1}^{n} t_i^{\hat\gamma} \log t_i = 0. \tag{5.19}
\]
From Equation (5.18),
\[
\hat\lambda = \frac{r}{\sum_{i=1}^{n} t_i^{\hat\gamma}}, \tag{5.20}
\]
and on substituting for λ̂ in Equation (5.19), we find that
\[
\frac{r}{\hat\gamma} + \sum_{i=1}^{n} \delta_i \log t_i - \frac{r}{\sum_i t_i^{\hat\gamma}} \sum_{i=1}^{n} t_i^{\hat\gamma} \log t_i = 0. \tag{5.21}
\]
This is a non-linear equation in γ̂, which can only be solved numerically using
an iterative procedure. Once the estimate, γ̂, which satisfies Equation (5.21),
has been found, Equation (5.20) can be used to obtain λ̂.
In practice, a numerical method, such as the Newton-Raphson procedure,
is used to find the values λ̂ and γ̂ which maximise the likelihood function
simultaneously. This procedure was described in Section 3.3.3 of Chapter 3,
in connection with fitting the Cox regression model. In that section it was
noted that an important by-product of the Newton-Raphson procedure is an
approximation to the variance-covariance matrix of the parameter estimates,
from which their standard errors can be obtained.
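As a concrete illustration of this simultaneous maximisation, the sketch below codes the Weibull log-likelihood directly and hands it to a general-purpose optimiser; working with log λ and log γ keeps both parameters positive. This is a minimal sketch, again assuming the Example 1.1 data, rather than any particular package's fitting routine.

```python
import numpy as np
from scipy.optimize import minimize

t = np.array([10, 13, 18, 19, 23, 30, 36, 38, 54, 56, 59, 75, 93, 97, 104, 107, 107, 107])
d = np.array([ 1,  0,  0,  1,  0,  1,  1,  0,  0,  0,  1,  1,  1,  1,   0,   1,   0,   0])

def neg_loglik(par):
    """Negative Weibull log-likelihood; par holds (log lambda, log gamma).
    Events contribute log f(t) and censored times contribute log S(t)."""
    lam, gam = np.exp(par)
    return -np.sum(d * (np.log(lam * gam) + (gam - 1.0) * np.log(t)) - lam * t**gam)

fit = minimize(neg_loglik, x0=[np.log(0.01), 0.0], method="Nelder-Mead")
lam_hat, gam_hat = np.exp(fit.x)
print(lam_hat, gam_hat)   # should be close to 0.000454 and 1.676 (Example 5.3)
```

Standard errors would be obtained from the inverse of the observed information matrix, as noted above.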
Once estimates of the parameters λ and γ have been found from fitting the
Weibull distribution to the observed data, percentiles of the survival time dis-
tribution can be estimated using Equation (5.6). The estimated pth percentile
of the distribution is
\[
\hat t(p) = \left\{\frac{1}{\hat\lambda}\log\left(\frac{100}{100-p}\right)\right\}^{1/\hat\gamma}. \tag{5.22}
\]
The standard error of t̂(p), obtained using the delta method, is given by
\[
\mathrm{se}\{\hat t(p)\} = \frac{\hat t(p)}{\hat\lambda \hat\gamma^{2}} \left\{ \hat\gamma^{2}\,\mathrm{var}(\hat\lambda) + \hat\lambda^{2} (c_p - \log\hat\lambda)^{2}\, \mathrm{var}(\hat\gamma) + 2\hat\lambda\hat\gamma (c_p - \log\hat\lambda)\, \mathrm{cov}(\hat\lambda, \hat\gamma) \right\}^{\frac{1}{2}}, \tag{5.27}
\]
where c_p = log log{100/(100 − p)}.
Note that for the special case of the exponential distribution, where the shape parameter, γ, is equal to unity, the standard error of the estimated pth percentile from Equation (5.27) reduces to
\[
\frac{\hat t(p)}{\hat\lambda}\, \mathrm{se}(\hat\lambda).
\]
Now, using Equation (5.12),
\[
\mathrm{se}(\hat\lambda) = \hat\lambda/\sqrt{r},
\]
so that se{t̂(p)} = t̂(p)/√r, as in Equation (5.16).
A 100(1 − α)% confidence interval for the pth percentile of a Weibull dis-
tribution is found from the corresponding confidence limits for log t(p). These
limits are
\[
\log \hat t(p) \pm z_{\alpha/2}\, \mathrm{se}\{\log \hat t(p)\},
\]
where se{log t̂(p)} is, from Equation (5.26), given by
\[
\mathrm{se}\{\log \hat t(p)\} = \frac{1}{\hat t(p)}\, \mathrm{se}\{\hat t(p)\}, \tag{5.28}
\]
and z_{α/2} is the upper α/2-point of the standard normal distribution. The corresponding 100(1 − α)% confidence interval for the pth percentile, t(p), is then t̂(p) exp[±z_{α/2} se{log t̂(p)}].
Example 5.3 Time to discontinuation of the use of an IUD
In Example 5.1, it was found that an exponential distribution provides a
satisfactory model for the data on the discontinuation times of 18 IUD users.
For comparison, a Weibull distribution will be fitted to the same data set. This
distribution is fitted using computer software, and from the resulting output,
the estimated scale parameter of the distribution is found to be λ̂ = 0.000454,
while the estimated shape parameter is γ̂ = 1.676. The standard errors of these
estimates are given by se (λ̂) = 0.000965 and se (γ̂) = 0.460, respectively.
Note that approximate confidence limits for the shape parameter, γ, found
using γ̂±1.96 se (γ̂), include unity, suggesting that the exponential distribution
would provide a satisfactory model for the discontinuation times.
The estimated hazard and survivor functions are obtained by substituting
these estimates into Equations (5.4) and (5.5), whence
\[
\hat h(t) = \hat\lambda\hat\gamma t^{\hat\gamma-1},
\]
and
\[
\hat S(t) = \exp\left(-\hat\lambda t^{\hat\gamma}\right).
\]
These two functions are shown in Figures 5.9 and 5.10.
[Figure 5.9: estimated hazard function plotted against discontinuation time.]
[Figure 5.10: estimated survivor function plotted against discontinuation time.]
The estimated median discontinuation time under the fitted Weibull model is t̂(50) = 79.272 days, and the standard error of this estimate, from Equation (5.27), is, after much arithmetic, found to be
\[
\mathrm{se}\{\hat t(50)\} = 15.795.
\]
In order to obtain a 95% confidence interval for the median discontinuation time, the standard error of log t̂(50) is required. From Equation (5.28),
\[
\mathrm{se}\{\log \hat t(50)\} = \frac{15.795}{79.272} = 0.199,
\]
and so the required confidence limits for the log median discontinuation
time are log 79.272 ± 1.96 × 0.199, that is, (3.982, 4.763). The correspond-
ing interval estimate for the true median discontinuation time, found from
exponentiating these limits, is (53.64, 117.15). This means that there is
a 95% chance that the interval from 54 days to 117 days includes the
true value of the median discontinuation time. This interval is rather
wide because of the small number of actual discontinuation times in the
data set.
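The delta-method calculation in Equation (5.27) is tedious by hand but mechanical in code. The sketch below implements it for a general percentile; the covariance of λ̂ and γ̂ is not quoted in the text, so the value used in the final line is a hypothetical placeholder, and the printed standard error is illustrative only.

```python
import numpy as np

def se_weibull_percentile(p, lam, gam, var_lam, var_gam, cov_lg):
    """Standard error of the estimated pth percentile of a W(lam, gam)
    distribution, following Equation (5.27)."""
    t_p = (np.log(100.0 / (100.0 - p)) / lam) ** (1.0 / gam)
    c_p = np.log(np.log(100.0 / (100.0 - p)))
    inner = (gam**2 * var_lam
             + lam**2 * (c_p - np.log(lam))**2 * var_gam
             + 2.0 * lam * gam * (c_p - np.log(lam)) * cov_lg)
    return t_p / (lam * gam**2) * np.sqrt(inner)

# Estimates and standard errors from Example 5.3; the covariance value
# below is hypothetical, chosen for illustration only.
print(se_weibull_percentile(50, 0.000454, 1.676, 0.000965**2, 0.460**2, -0.0004))
```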
It is interesting to compare these results with those found in Exam-
ple 5.2, where the discontinuation times were modelled using an exponen-
tial distribution. The estimated median survival times are very similar, at
80.6 days for the exponential and 79.3 days for the Weibull model. How-
ever, the standard error of the estimated median survival time is 26.8 days
when the times are assumed to have an exponential distribution, and only
15.8 days under the Weibull model. The median is therefore estimated more
precisely when the discontinuation times are assumed to have a Weibull
distribution.
Other percentiles of the discontinuation time distribution, and accompany-
ing standard errors and confidence intervals, can be found in a similar fashion.
For example, the 90th percentile, that is, the time beyond which 10% of those
in the study continue with the use of the IUD, is 162.23 days, and 95% confi-
dence limits for the true percentile are from 95.41 to 275.84 days. Notice that
the width of this confidence interval is larger than that for the median discon-
tinuation time, reflecting the fact that the median is more precisely estimated
than other percentiles.
For two groups of survival data, the model can be written in terms of an indicator variable X as
\[
h_i(t) = e^{\beta x_i} h_0(t), \tag{5.29}
\]
where xi is the value of X for the ith individual. Consequently, the hazard
at time t for an individual in Group I is h0 (t), and that for an individual in
Group II is ψh0 (t), where ψ = exp(β). The quantity β is then the logarithm of
the ratio of the hazard for an individual in Group II, to that of an individual
in Group I.
We will now make the additional assumption that the survival times for
the individuals in Group I have a Weibull distribution with scale parame-
ter λ and shape parameter γ. Using Equation (5.29), the hazard function for
the individuals in this group is h0 (t), where h0 (t) = λγtγ−1 . Now, also from
Equation (5.29), the hazard function for those in Group II is ψh0 (t), that is,
ψλγtγ−1 . This is the hazard function for a Weibull distribution with scale
parameter ψλ and shape parameter γ. We therefore have the result that if the
survival times of individuals in one group have a Weibull distribution with
shape parameter γ, and the hazard of death at time t for an individual in the
second group is proportional to that of an individual in the first, the survival
times of those in the second group will also have a Weibull distribution with
shape parameter γ. The Weibull distribution is then said to have the propor-
tional hazards property. This property is another reason for the importance of
the Weibull distribution in the analysis of survival data.
Figure 5.11 Log-cumulative hazard plot for women with tumours that were positively stained (∗) and negatively stained (•).
In this figure, the lines corresponding to the two staining groups are rea-
sonably straight. This means that the assumption of Weibull distributions for
the survival times of the women in each group is quite plausible. Moreover,
the gradients of the two lines are very similar, which means that the propor-
tional hazards model is valid. The vertical separation of the two lines provides
an estimate of the log relative hazard. From Figure 5.11, the vertical distance
between the two straight lines is approximately 1.0, and so a rough estimate of
the hazard ratio is e^{1.0} = 2.72. Women in the positively stained group would
appear to have nearly three times the risk of death at any time compared to
those in the negatively stained group. More accurate estimates of the relative
hazard will be obtained from fitting exponential and Weibull models to the
data of this example, in Examples 5.5 and 5.6.
For those in Group II, the hazard function is ψλ, and the probability density function and survivor function are ψλe^{−ψλt} and e^{−ψλt}, respectively. The likelihood of the n1 observations in Group I and the n2 observations in Group II then simplifies to
\[
\prod_{i=1}^{n_1} \lambda^{\delta_{i1}} e^{-\lambda t_{i1}} \prod_{i'=1}^{n_2} (\psi\lambda)^{\delta_{i'2}} e^{-\psi\lambda t_{i'2}}.
\]
If the numbers of actual death times in the two groups are r1 and r2, respectively, then r1 = Σi δi1 and r2 = Σi′ δi′2, and the log-likelihood function is
given by
\[
\log L(\psi, \lambda) = r_1 \log\lambda - \lambda \sum_{i=1}^{n_1} t_{i1} + r_2 \log(\psi\lambda) - \psi\lambda \sum_{i'=1}^{n_2} t_{i'2}.
\]
Now write T1 and T2 for the total known time survived by the individuals in Groups I and II, respectively. Then, T1 and T2 are the totals of uncensored and censored survival times in each group, so that the log-likelihood function becomes
\[
\log L(\psi, \lambda) = r_1 \log\lambda - \lambda T_1 + r_2 \log(\psi\lambda) - \psi\lambda T_2.
\]
In order to obtain the values ψ̂, λ̂ for which this function is a maximum,
we differentiate with respect to ψ and λ, and set the derivatives equal to zero.
The resulting equations that are satisfied by ψ̂, λ̂ are
\[
\frac{r_2}{\hat\psi} - \hat\lambda T_2 = 0, \tag{5.30}
\]
\[
\frac{r_1 + r_2}{\hat\lambda} - (T_1 + \hat\psi T_2) = 0. \tag{5.31}
\]
From Equation (5.30),
\[
\hat\lambda = \frac{r_2}{\hat\psi T_2},
\]
and on substituting for λ̂ in Equation (5.31) we get
\[
\hat\psi = \frac{r_2 T_1}{r_1 T_2}. \tag{5.32}
\]
Then, from Equation (5.30),
\[
\hat\lambda = r_1/T_1.
\]
Both of these estimates have an intuitive justification. The estimated value
of λ is the reciprocal of the average time survived by individuals in Group
I, while the estimated relative hazard, ψ̂, is the ratio of the average times
survived by the individuals in the two groups.
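These closed-form estimates are immediate to compute; the short sketch below uses hypothetical event counts and follow-up totals for illustration.

```python
def two_group_exponential(r1, T1, r2, T2):
    """Closed-form estimates for the two-group exponential model:
    lambda-hat = r1/T1 and psi-hat = r2*T1/(r1*T2), from Equations (5.30)-(5.32)."""
    lam_hat = r1 / T1
    psi_hat = (r2 * T1) / (r1 * T2)
    return lam_hat, psi_hat

# Hypothetical values: 10 deaths in 500 units of follow-up in Group I,
# 15 deaths in 400 units in Group II.
print(two_group_exponential(10, 500.0, 15, 400.0))   # (0.02, 1.875)
```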
The asymptotic variance-covariance matrix of the parameter estimates is
the inverse of the information matrix, whose elements are found from the
second derivatives of the log-likelihood function; see Appendix A. We have
that
\[
\frac{\partial^2 \log L(\psi,\lambda)}{\partial\psi^2} = -\frac{r_2}{\psi^2}, \qquad \frac{\partial^2 \log L(\psi,\lambda)}{\partial\lambda^2} = -\frac{r_1+r_2}{\lambda^2}, \qquad \frac{\partial^2 \log L(\psi,\lambda)}{\partial\lambda\,\partial\psi} = -T_2,
\]
and the information matrix is the matrix of negative expected values of these
partial derivatives. The only second derivative for which expectations need
to be obtained is the derivative with respect to λ and ψ, for which E(T2 )
is required. This is straightforward when the survival times have an expo-
nential distribution, but as shown in Section 5.1.2, the expected value of a
survival time that has a Weibull distribution is much more difficult to calcu-
late. For this reason, the information matrix is usually approximated by using
the observed values of the negative second partial derivatives. The observed
information matrix is thus
\[
I(\psi, \lambda) = \begin{pmatrix} r_2/\psi^2 & T_2 \\ T_2 & (r_1 + r_2)/\lambda^2 \end{pmatrix},
\]
and the inverse of this matrix is
\[
\frac{1}{(r_1 + r_2)r_2 - T_2^2 \psi^2 \lambda^2} \begin{pmatrix} (r_1 + r_2)\psi^2 & -T_2 \psi^2 \lambda^2 \\ -T_2 \psi^2 \lambda^2 & r_2 \lambda^2 \end{pmatrix}.
\]
\[
\mathrm{se}\{\log \hat t(50)\} = \frac{1}{\hat t(50)}\, \mathrm{se}\{\hat t(50)\}.
\]
Confidence limits for log t(50) are then exponentiated to give the correspond-
ing confidence limits for t(50) itself.
In this example, 95% confidence intervals for the true median survival times
of the two groups of women are (91.4, 608.9) and (54.8, 138.3), respectively.
Notice that the confidence interval for the median survival time of patients
with positive staining is much narrower than that for women with negative
staining. This is due to there being a relatively small number of uncensored
survival times in the women whose tumours were negatively stained.
The likelihood of the n observed survival times is, by analogy with Expression (5.10),
\[
\prod_{i=1}^{n} \{h_i(t_i)\}^{\delta_i}\, S_i(t_i). \tag{5.38}
\]
The logarithm of the likelihood function, rather than the likelihood itself,
is maximised with respect to the unknown parameters, and from Expres-
sion (5.38), this is
\[
\sum_{i=1}^{n} \left[\delta_i \log h_i(t_i) + \log S_i(t_i)\right].
\]
On substituting for hi(ti) and Si(ti) from Equations (5.36) and (5.37), the log-likelihood becomes
\[
\sum_{i=1}^{n} \left[\delta_i\{\beta' x_i + \log(\lambda\gamma) + (\gamma - 1)\log t_i\} - \lambda \exp(\beta' x_i)\, t_i^{\gamma}\right], \tag{5.39}
\]
and, apart from the term Σ δi log ti, which does not involve any unknown parameters, this is equivalent to
\[
\sum_{i=1}^{n} \left[\delta_i\{\beta' x_i + \log(\lambda\gamma) + \gamma\log t_i\} - \lambda \exp(\beta' x_i)\, t_i^{\gamma}\right], \tag{5.40}
\]
which differs from that obtained from the full log-likelihood, given in Expression (5.39), by the value of Σ δi log ti. When computer software is used
to fit the Weibull proportional hazards model, the log-likelihood is generally
computed from Expression (5.40). This expression will also be used in the
examples given in this book.
Computer software for fitting parametric proportional hazards models gen-
erally includes the standard errors of the parameter estimates, from which
confidence intervals for relative hazards and the median and other percentiles
of the survival time distribution can be found. Specifically, suppose that the es-
timates of the parameters in the model of Equation (5.36) are β̂1 , β̂2 , . . . , β̂p , λ̂
and γ̂. The estimated survivor function for the ith individual in the study, for
whom the values of the explanatory variables in the model are x1i , x2i , . . . , xpi ,
is then
\[
\hat S_i(t) = \exp\left\{-\exp(\hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \cdots + \hat\beta_p x_{pi})\, \hat\lambda t^{\hat\gamma}\right\},
\]
and the corresponding estimated hazard function is
\[
\hat h_i(t) = \exp(\hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \cdots + \hat\beta_p x_{pi})\, \hat\lambda\hat\gamma t^{\hat\gamma - 1}.
\]
Both of these functions can be estimated and plotted against t, for individuals
with particular values of the explanatory variables in the model.
Generalising the result in Equation (5.22) to the situation where the
Weibull scale parameter is λ exp(β ′ xi ), the estimated pth percentile of the
survival time distribution for an individual, whose vector of explanatory vari-
ables is xi , is
\[
\hat t(p) = \left\{\frac{1}{\hat\lambda \exp(\hat\beta' x_i)}\log\left(\frac{100}{100-p}\right)\right\}^{1/\hat\gamma}. \tag{5.41}
\]
The standard error of t̂(p) and corresponding interval estimates for t(p), are
derived in Section 5.6.2.
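Equation (5.41) is straightforward to code once the parameter estimates are available. The sketch below uses hypothetical values for the estimates and the explanatory variables, purely for illustration.

```python
import numpy as np

def weibull_ph_percentile(p, lam, gam, beta, x):
    """Estimated pth percentile under a Weibull proportional hazards model,
    Equation (5.41), for an individual with explanatory variables x."""
    eta = np.dot(beta, x)    # linear predictor beta-hat' x_i
    return (np.log(100.0 / (100.0 - p)) / (lam * np.exp(eta))) ** (1.0 / gam)

# Hypothetical fitted values:
lam_hat, gam_hat = 0.002, 1.2
beta_hat = np.array([0.5, -0.8])
x_i = np.array([1.0, 0.0])
print(weibull_ph_percentile(50, lam_hat, gam_hat, beta_hat, x_i))
```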
\[
\frac{\partial g}{\partial \hat\alpha} = -\frac{1}{\hat\sigma}, \qquad \frac{\partial g}{\partial \hat\sigma} = \frac{\hat\alpha}{\hat\sigma^2},
\]
and so using Expression (5.48),
\[
\mathrm{var}\left(-\frac{\hat\alpha}{\hat\sigma}\right) \approx \left(-\frac{1}{\hat\sigma}\right)^{2} \mathrm{var}(\hat\alpha) + \left(\frac{\hat\alpha}{\hat\sigma^2}\right)^{2} \mathrm{var}(\hat\sigma) + 2\left(-\frac{1}{\hat\sigma}\right)\left(\frac{\hat\alpha}{\hat\sigma^2}\right) \mathrm{cov}(\hat\alpha, \hat\sigma),
\]
which simplifies to
\[
\frac{1}{\hat\sigma^4}\left\{\hat\sigma^2\, \mathrm{var}(\hat\alpha) + \hat\alpha^2\, \mathrm{var}(\hat\sigma) - 2\hat\alpha\hat\sigma\, \mathrm{cov}(\hat\alpha, \hat\sigma)\right\}. \tag{5.49}
\]
In this example, the estimate of β is
\[
\hat\beta = -\frac{-0.9967}{1.0668} = 0.9343.
\]
The corresponding hazard ratio is 2.55, as in Example 5.6.
The standard errors of α̂ and σ̂ are generally included in standard computer
output, and are 0.5441 and 0.1786, respectively. The estimated variances of α̂
and σ̂ are therefore 0.2960 and 0.0319, respectively. The covariance between
α̂ and σ̂ can be found from computer software, although it is not usually part
of the default output. It is found to be −0.0213.
Substituting for α̂, σ̂ and their variances and covariance in Expres-
sion (5.49), we get
var (β̂) ≈ 0.2498,
and so the standard error of β̂ is given by se (β̂) = 0.4998. This can be used
in the construction of confidence intervals for the corresponding true hazard
ratio.
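The arithmetic of this example can be reproduced by coding Expression (5.49) directly, as in the sketch below.

```python
import numpy as np

def var_beta_hat(alpha, sigma, var_a, var_s, cov_as):
    """Approximate variance of beta-hat = -alpha-hat/sigma-hat, Expression (5.49)."""
    return (sigma**2 * var_a + alpha**2 * var_s - 2.0 * alpha * sigma * cov_as) / sigma**4

v = var_beta_hat(-0.9967, 1.0668, 0.2960, 0.0319, -0.0213)
print(v, np.sqrt(v))   # approximately 0.250 and 0.500, as quoted above
```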
5.6.4 Exploratory analyses
In Sections 5.2 and 5.5.1, we saw how a log-cumulative hazard plot could be
used to assess whether survival data can be modelled by a Weibull distribution,
and whether the proportional hazards assumption is valid. These procedures
work perfectly well when we are faced with a single sample of survival data,
or data where the number of groups is small and there is a reasonably large
number of individuals in each group. But in situations where there are a small
number of death times distributed over a relatively large number of groups,
it may not be possible to estimate the survivor function, and hence the log-
cumulative hazard function, for each group.
As an example, consider the data on the survival times of patients with
hypernephroma, given in Table 3.6. Here, individuals are classified according
to age group and whether or not a nephrectomy has been performed, giving
six combinations of age group and nephrectomy status. To examine the as-
sumption of a Weibull distribution for the survival times in each group, and
the assumption of proportional hazards across the groups, a log-cumulative
hazard plot would be required for each group. The number of patients in each
age group who have not had a nephrectomy is so small that the survivor
function cannot be properly estimated in these groups. If there were more
individuals in the study who had died and not had a nephrectomy, it would
be possible to construct a log-cumulative hazard plot. If this plot featured six
parallel straight lines, the Weibull proportional hazards model is likely to be
satisfactory.
When a model contains continuous variables, their values will first need
to be grouped before a log-cumulative hazard plot can be obtained. This may
also result in there being insufficient numbers of individuals in some groups
to enable the log-cumulative hazard function to be estimated.
The only alternative to using each combination of factor levels in construct-
ing a log-cumulative hazard plot is to ignore some of the factors. However, the
resulting plot can be very misleading. For example, suppose that patients are
classified according to the levels of two factors, A and B. The log-cumulative
hazard plot obtained by grouping the individuals according to the levels of A
ignoring B, or according to the levels of B ignoring A, may not give cause
to doubt the Weibull or proportional hazards assumptions. However, if the
log-cumulative hazard plot is obtained for individuals at each combination
of levels of A and B, the plot may not feature a series of four parallel lines.
By the same token, the log-cumulative hazard plot obtained when either A
or B is ignored may not show sets of parallel straight lines, but when a plot
is obtained for all combinations of A and B, parallel lines may result. This
feature is illustrated in the following example.
The log-cumulative hazard plot shown in Figure 5.12 is derived from the
individuals classified according to the two levels of A, ignoring the level of
factor B. The plot in Figure 5.13 is from individuals classified according to
the two levels of B, ignoring the level of factor A.
Figure 5.12 Log-cumulative hazard plot for individuals for whom A = 1 (∗) and A = 2 (•).
Figure 5.13 Log-cumulative hazard plot for individuals for whom B = 1 (∗) and B = 2 (•).
The lines in the plot shown as Figure 5.13 strongly suggest that the hazards are not proportional
when individuals are classified according to the levels of B. A different picture
emerges when the 37 survival times are classified according to the levels of
both A and B. The log-cumulative hazard plot based on the four groups is
shown in Figure 5.14. The four parallel lines show that there is no doubt about
the validity of the proportional hazards assumption across the groups.
In this example, the reason why the log-cumulative hazard plot for B
ignoring A is misleading is that there is an interaction between A and B. An
examination of the data reveals that, on average, the difference in the survival
times of patients for whom B = 1 and B = 2 is greater when A = 2 than
when A = 1.
Even when a log-cumulative hazard plot gives no reason to doubt the as-
sumption of a Weibull proportional hazards model, the validity of the fitted
model will need to be examined using the methods to be described in Chap-
ter 7. When it is not possible to use a log-cumulative hazard plot to explore
whether a Weibull distribution provides a reasonable model for the survival
times, a procedure based on the Cox regression model, described in Chapter
3, might be helpful. Essentially, a Cox regression model that includes all the
relevant explanatory variables is fitted, and the baseline hazard function is
estimated, using the procedure described in Section 3.10. A plot of this func-
tion may suggest whether or not the assumption of a Weibull distribution is
tenable. In particular, if the estimated baseline hazard function in the Cox
model is increasing or decreasing, the Weibull model may provide a more concise summary of the baseline hazard function than the Cox regression model.

Figure 5.14 Log-cumulative hazard plot for individuals in the groups defined by the four combinations of levels of A and B.
Because the estimated baseline hazard function for a fitted Cox model can
be somewhat irregular, comparing the estimated baseline cumulative hazard
or the baseline survivor function, under the fitted Cox regression model, with
that of the Weibull model may be more fruitful.
\[
\text{Model (1):}\quad h_i(t) = \exp\{\hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \cdots + \hat\beta_p x_{pi}\}\, \hat\lambda\hat\gamma t^{\hat\gamma-1},
\]
\[
\text{Model (2):}\quad h_i(t) = \exp\{\hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \cdots + \hat\beta_{p+q} x_{p+q,i}\}\, \hat\lambda\hat\gamma t^{\hat\gamma-1},
\]
where x1i , x2i , . . . , xp+q,i are the values of the p + q explanatory variables for
the ith individual. The maximised likelihoods under Model (1) and Model
(2) will be denoted by L̂1 and L̂2 , respectively. The difference between the
values of −2 log L̂1 and −2 log L̂2 , that is, −2{log L̂1 − log L̂2 }, then has an
approximate chi-squared distribution with q degrees of freedom, under the
null hypothesis that the coefficients of the additional q variates in Model (2)
are all equal to zero. If the difference between the values of −2 log L̂ for these
two models is significantly large when compared with percentage points of the
chi-squared distribution, we would deduce that the extra q terms are needed
in the model, in addition to the p that are already included. Since differences
between values of −2 log L̂ are used in comparing models, it does not matter
whether the maximised log-likelihood, used in computing the value of −2 log L̂,
is based on Expression (5.39) or (5.40).
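In code, such a comparison reduces to referring the change in −2 log L̂ to a chi-squared distribution. A minimal sketch, illustrated with the −2 log L̂ values for the treatment term quoted in a later example in this chapter:

```python
from scipy.stats import chi2

def lr_test(minus2loglik_small, minus2loglik_large, q):
    """Compare nested models: the reduction in -2 log L-hat on adding q terms
    is referred to a chi-squared distribution on q degrees of freedom."""
    stat = minus2loglik_small - minus2loglik_large
    return stat, chi2.sf(stat, q)

# Adding Treat alone to the null model (values from a later example):
print(lr_test(59.534, 58.355, 1))   # statistic 1.179, P about 0.28
```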
The description of the modelling process in Sections 3.5–3.8 applies equally
well to models based on the Weibull proportional hazards model, and so will
not be repeated here. However, the variable selection strategy will be illus-
trated using two examples.
The values of the −2 log L̂ statistic in Table 5.2, and other examples in this book, have been computed using the log-likelihood in Expression (5.40). Accordingly, these values may differ from the values given by some computer software packages by an amount equal to 2 Σ δi log ti, which in this case has the value 136.3733.
The reduction in the value of −2 log L̂ on adding the interaction term to
Model (4) is 4.69 on two degrees of freedom. This reduction is just about
significant at the 10% level (P = 0.096) and so there is some suggestion of an
interaction between age group and nephrectomy status. For comparison, note
that when the Cox regression model was fitted in Example 3.4, the interaction
was not significant (P = 0.220).
The interaction can be investigated in greater detail by examining the haz-
ard ratios under the model. Under Model (5), the estimated hazard function
for the ith individual is
\[
\hat h_i(t) = \exp\{\hat\alpha_j + \hat\nu_k + \widehat{(\alpha\nu)}_{jk}\}\, \hat h_0(t),
\]
where
\[
\hat h_0(t) = \hat\lambda\hat\gamma t^{\hat\gamma-1}
\]
is the estimated baseline hazard function. The logarithm of the hazard ratio
for an individual in the jth age group, j = 1, 2, 3, and kth level of nephrectomy
status, k = 1, 2, relative to an individual in the youngest age group who has
not had a nephrectomy, is therefore
\[
\hat\alpha_j + \hat\nu_k + \widehat{(\alpha\nu)}_{jk} - \hat\alpha_1 - \hat\nu_1 - \widehat{(\alpha\nu)}_{11}. \tag{5.50}
\]
Since the estimates corresponding to the first level of each factor are set equal to zero, this reduces to
\[
\hat\alpha_j + \hat\nu_k + \widehat{(\alpha\nu)}_{jk},
\]
for j = 1, 2, 3, k = 1, 2. Table 5.4 gives the hazards for the individuals, relative
to the baseline hazard. The baseline hazard corresponds to an individual in
the youngest age group who has not had a nephrectomy, and so a hazard ratio
of unity for these individuals is recorded in Table 5.4.
When the model containing the interaction term is fitted to the data,
the estimated values of the parameters in the baseline hazard function are
λ̂ = 0.0188 and γ̂ = 1.5538. Table 5.5 gives the estimated median survival
times, in months, for individuals with each combination of age group and
nephrectomy status.
This table shows that a nephrectomy leads to more than a fourfold increase
in the median survival time in patients aged up to 70 years. The median
survival time of patients aged over 70 is not much affected by the performance
of a nephrectomy.
We end this example with a note of caution. For some combinations of
age group and nephrectomy status, particularly the groups of individuals who
have not had a nephrectomy, the estimated hazard ratios and median survival
times are based on small numbers of survival times. As a result, the standard
errors of estimates of such quantities, which have not been given here, will be
large.
The data, which were obtained from Therneau (1986), are given in Table 5.6.
In modelling these data, the factors Treat, Rdisease and Perf each have two
levels, and will be fitted as variates that take the values given in Table 5.6.
This does of course mean that the baseline hazard function is not directly
interpretable, since there can be no individual for whom the values of all these
variates are zero. From both a computational and interpretive viewpoint, it is
more convenient to relocate the values of the variables Age, Rdisease, Perf and
Treat. If the variable Age − 50 is used in place of Age, and unity is subtracted
from Rdisease, Perf and Treat, the baseline hazard then corresponds to the
hazard for an individual of age 50 with incomplete residual disease, good
performance status, and who has been allocated to the cyclophosphamide
group. However, the original variables will be used in this example.
We begin by identifying which prognostic factors are associated with the
survival times of the patients. The values of the statistic −2 log L̂ on fitting a
range of models to these data are given in Table 5.7.
When Weibull models that contain just one of Age, Rdisease and Perf are
fitted, we find that both Age and Rdisease lead to reductions in the value of
−2 log L̂ that are significant at the 5% level. After fitting Age, the variables
Rdisease and Perf further reduce −2 log L̂ by 1.903 and 0.048, respectively,
neither of which is significant at the 10% level. Also, when Age is added to the
model that already includes Rdisease, the reduction in −2 log L̂ is 13.719 on
1 d.f., which is highly significant (P < 0.001). This leads us to the conclusion
that Age is the only prognostic variable that needs to be incorporated in the
model.
The term associated with the treatment effect is now added to the model.
The value of −2 log L̂ is then reduced by 2.440 on 1 d.f. This reduction of 2.440
is not quite large enough for it to be significant at the 10% level (P = 0.118).
There is therefore only very slight evidence of a difference in the effect of the
two chemotherapy treatments on the hazard of death.
For comparison, when Treat alone is added to the null model, the value
of −2 log L̂ is reduced from 59.534 to 58.355. This reduction of 1.179 is cer-
tainly not significant when compared to percentage points of the chi-squared
distribution on 1 d.f. Ignoring Age therefore leads to an underestimate of the
magnitude of the treatment effect.
To explore whether the treatment difference is consistent over age, the
interaction term formed as the product of Age and Treat is added to the model.
On doing so, −2 log L̂ is only reduced by 1.419. This reduction is nowhere near
being significant and so there is no need to include an interaction term in the
model.
The variable Treat will be retained in the model, since interest centres
on the magnitude of the treatment effect. The fitted model for the hazard of
death at time t for the ith individual is then found to be
In the linear regression model, the proportion of variation in the data that is explained by the fitted model is
\[
R^2 = \frac{\hat V_M}{\hat V_M + \hat\sigma^2},
\]
where V̂M is an estimate of the variation in the data due to the fitted model and σ̂² is the residual variation. Now, V̂M can be expressed as β̂′Sβ̂, where S is
the variance-covariance matrix of the explanatory variables; see Section 3.12
of Chapter 3. It then follows that R² is a sample estimate of the quantity
\[
\rho^2 = \frac{\beta' S \beta}{\beta' S \beta + \sigma^2},
\]
in which σ² is the variance of the response variable, Y.
We now adapt this measure for use in the analysis of survival data, which
requires the replacement of σ 2 by a suitable quantity. To do this, we take σ 2
to be the variance of the error term, ϵi , in the log-linear form of the Weibull
model given in Equation (5.47) of Section 5.6.3. In the particular case of
Weibull survival times, ϵi has a distribution which is such that the variance of
ϵi is π²/6; further details are given in Section 6.5.1 of Chapter 6. This leads
to the statistic
\[
R_P^2 = \frac{\hat\beta' S \hat\beta}{\hat\beta' S \hat\beta + \pi^2/6},
\]
that was introduced in Section 3.12.1 of Chapter 3. This measure of explained
variation can be generally recommended for use with both the Cox and Weibull
proportional hazards models.
The R_D^2 statistic, also described in Section 3.12.1 of Chapter 3, can be adapted for use with parametric survival models in a similar manner. This leads to
\[
R_D^2 = \frac{D^2}{D^2 + \pi^2/6},
\]
where D is the scaled coefficient for the regression of the ordered values of the
risk score on normal scores as before.
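Both statistics are one-line computations once the fitted coefficients and the sample variance-covariance matrix of the explanatory variables are available. A sketch with hypothetical values of β̂, S and D:

```python
import numpy as np

def r2_p(beta, S):
    """R-squared-P statistic: beta'S beta / (beta'S beta + pi^2/6)."""
    v_m = beta @ S @ beta
    return v_m / (v_m + np.pi**2 / 6.0)

def r2_d(D):
    """R-squared-D statistic: D^2 / (D^2 + pi^2/6)."""
    return D**2 / (D**2 + np.pi**2 / 6.0)

# Hypothetical coefficients, covariate covariance matrix and D value:
beta = np.array([0.5, -0.3])
S = np.array([[1.2, 0.1], [0.1, 0.8]])
print(r2_p(beta, S), r2_d(0.9))
```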
Example 5.11 Survival of multiple myeloma patients
To illustrate the use of measures of explained variation in a Weibull model
for survival data, consider the data on the survival times of patients with
multiple myeloma, for which the values of three R² statistics on fitting Cox regression models were given in Example 3.17. For a Weibull model containing the variables Hb and Bun, R_P^2 = 0.25 and R_D^2 = 0.23. For the model that contains all 7 variables, R_P^2 = 0.30 and R_D^2 = 0.28. These values are quite
similar to those obtained on fitting corresponding Cox regression models, so
that the Cox and Weibull models have similar explanatory power.
The Gompertz distribution has hazard function
\[
h(t) = \lambda e^{\theta t},
\]
for 0 ≤ t < ∞, and λ > 0. In the particular case where θ = 0, the hazard func-
tion has a constant value, λ, and the survival times then have an exponential
distribution. The parameter θ determines the shape of the hazard function,
positive values leading to a hazard function that increases with time. The haz-
ard function can also be expressed as h(t) = exp(α +θt), which shows that the
log-hazard function is linear in t. On the other hand, from Equation (5.4), the
Weibull log-hazard function is linear in log t. Like the Weibull hazard function,
the Gompertz hazard increases or decreases monotonically.
The survivor function of the Gompertz distribution is given by
\[
S(t) = \exp\left\{\frac{\lambda}{\theta}\left(1 - e^{\theta t}\right)\right\},
\]
and the corresponding density function is
\[
f(t) = \lambda e^{\theta t} \exp\left\{\frac{\lambda}{\theta}\left(1 - e^{\theta t}\right)\right\}.
\]
The pth percentile is such that
\[
t(p) = \frac{1}{\theta}\log\left\{1 - \frac{\theta}{\lambda}\log\left(\frac{100-p}{100}\right)\right\},
\]
from which the median survival time is
\[
t(50) = \frac{1}{\theta}\log\left\{1 + \frac{\theta}{\lambda}\log 2\right\}.
\]
A plot of the Gompertz hazard function for distributions with a median of
20 and θ = −0.2, 0.02 and 0.05 is shown in Figure 5.15. The corresponding
values of λ are 0.141, 0.028 and 0.020.
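These values of λ follow from rearranging the median formula: e^{θt(50)} = 1 + (θ/λ) log 2, so that λ = θ log 2/(e^{θt(50)} − 1). A quick numerical check:

```python
import numpy as np

def gompertz_scale_for_median(median, theta):
    """Scale parameter lambda of a Gompertz distribution with a given median,
    from rearranging t(50) = (1/theta) log{1 + (theta/lambda) log 2}."""
    return theta * np.log(2.0) / (np.exp(theta * median) - 1.0)

for theta in (-0.2, 0.02, 0.05):
    print(theta, gompertz_scale_for_median(20.0, theta))
# approximately 0.141, 0.028 and 0.020, as quoted above
```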
It is straightforward to see that the Gompertz distribution has the propor-
tional hazards property, described in Section 5.5.1, since if we take h0 (t) =
λe^{θt}, then ψh0(t) is also a Gompertz hazard function with parameters ψλ and θ.
The general Gompertz proportional hazards model for the hazard of death at time t for the ith of n individuals is expressed as
\[
h_i(t) = \exp(\beta' x_i)\, \lambda e^{\theta t}.
\]
Figure 5.15 Hazard functions for a Gompertz distribution with a median of 20 and θ = −0.2, 0.02 and 0.05.
proportional hazards model that contained the variables Age and Treat was
fitted. For comparison, a Gompertz proportional hazards model that contains
these two variables is now fitted. Under this model, the fitted hazard function
for the ith patient is
where λ̂ = 1.706 × 10^{−6} and θ̂ = 0.00138. The change in the value of −2 log L̂
on adding Treat to the Gompertz proportional hazards model that contains
Age alone is now 1.686 (P = 0.184). The hazard ratio for the treatment effect,
which is now exp(0.848) = 2.34, is therefore smaller and less significant under
this model than it was for the Weibull model.
6.1.1 The log-logistic distribution
One limitation of the Weibull hazard function is that it is a monotonic function
of time. However, situations in which the hazard function changes direction
can arise. For example, following a heart transplant, a patient faces an increas-
ing hazard of death over the first ten days or so after the transplant, while the
body adapts to the new organ. The hazard then decreases with time as the
patient recovers. In situations such as this, a unimodal hazard function may
be appropriate.
A particular form of unimodal hazard is the function
\[
h(t) = \frac{e^{\theta}\kappa t^{\kappa-1}}{1 + e^{\theta}t^{\kappa}}, \tag{6.1}
\]
for 0 ≤ t < ∞, κ > 0. This hazard function decreases monotonically if κ ≤ 1, but if κ > 1, the hazard has a single mode. The survivor function corresponding to the hazard function in Equation (6.1) is given by
\[
S(t) = \left\{1 + e^{\theta}t^{\kappa}\right\}^{-1}, \tag{6.2}
\]
and the probability density function is
\[
f(t) = \frac{e^{\theta}\kappa t^{\kappa-1}}{(1 + e^{\theta}t^{\kappa})^2}.
\]
The median of this distribution is t(50) = e^{−θ/κ}.
Figure 6.1 Hazard functions for a log-logistic distribution with a median of 20 and κ = 0.5, 2.0 and 5.0.
A random variable T is said to have a lognormal distribution with parameters µ and σ if log T has a normal distribution with mean µ and variance σ². The probability density function of T is given by
\[
f(t) = \frac{1}{\sigma t \sqrt{2\pi}}\exp\left\{-(\log t - \mu)^2/2\sigma^2\right\},
\]
for 0 ≤ t < ∞, σ > 0, from which the survivor and hazard functions can be derived. The survivor function of the lognormal distribution is
\[
S(t) = 1 - \Phi\left(\frac{\log t - \mu}{\sigma}\right), \tag{6.3}
\]
where Φ(·) is the standard normal distribution function. The quantity Φ^{−1}(p/100), the pth percentile of the standard normal distribution, is sometimes called the probit of p/100. In particular, the median survival time under this distribution is simply t(50) = e^µ.
The hazard function can be found from the relation h(t) = f (t)/S(t). This
function is zero when t = 0, increases to a maximum and then decreases to
zero as t tends to infinity. The fact that the survivor and hazard functions
can only be expressed in terms of integrals limits the usefulness of this model.
Moreover, in view of the similarity of the normal and logistic distributions,
the lognormal model will tend to be very similar to the log-logistic model.
\[
f(t) = \frac{\lambda^{\rho} t^{\rho-1} e^{-\lambda t}}{\Gamma(\rho)}, \tag{6.4}
\]
for 0 ≤ t < ∞, λ > 0, and ρ > 0. As for the lognormal distribution, the survivor function of the gamma distribution can only be expressed as an integral, and we write
\[
S(t) = 1 - \Gamma_{\lambda t}(\rho),
\]
where Γ_{λt}(ρ) is known as the incomplete gamma function, given by
\[
\Gamma_{\lambda t}(\rho) = \frac{1}{\Gamma(\rho)}\int_0^{\lambda t} u^{\rho-1} e^{-u}\, du.
\]
The hazard function for the gamma distribution is then h(t) = f (t)/S(t). This
hazard function increases monotonically if ρ > 1 and decreases if ρ < 1, and
tends to λ as t tends to ∞.
When ρ = 1, the gamma distribution reduces to the exponential distri-
bution described in Section 5.1.1, and so this distribution, like the Weibull
distribution, includes the exponential distribution as a special case. Indeed,
the gamma distribution is quite similar to the Weibull, and inferences based
on either model will often be very similar.
A generalisation of the gamma distribution is actually more useful than
the gamma distribution itself, since it includes the Weibull and lognormal
distributions as special cases. This model, known as the generalised gamma
distribution, may therefore be used to discriminate between alternative para-
metric models for survival data.
The probability density function of the generalised gamma distribution is an extension of the gamma density in Equation (6.4), that includes an additional parameter, θ, where θ > 0, and is defined by
\[
f(t) = \frac{\theta\lambda^{\theta\rho} t^{\theta\rho - 1} \exp\{-(\lambda t)^{\theta}\}}{\Gamma(\rho)},
\]
for 0 ≤ t < ∞. The survivor function for this distribution is again defined in terms of the incomplete gamma function and is given by
\[
S(t) = 1 - \Gamma_{(\lambda t)^{\theta}}(\rho).
\]
For the lognormal distribution, from Equation (6.3),
\[
\Phi^{-1}\{1 - S(t)\} = \frac{\log t - \mu}{\sigma},
\]
and so a plot of Φ^{−1}{1 − Ŝ(t)} against log t should give a straight line, if the lognormal model is appropriate. The slope and intercept of this line provide estimates of σ^{−1} and −µ/σ, respectively.
Figure 6.2 A plot of the estimated log-odds of discontinuation after t against log t for the data from Example 1.1.
From this plot, it appears that the relationship between the estimated
log-odds of discontinuing use of the contraceptive after time t, and log t, is
reasonably straight. This suggests that a log-logistic model could be used to
model the observed data.
Notice that there is very little difference in the extent of departures from
linearity in the plots of Figures 5.6 and 6.2. This means that either the Weibull
distribution or the log-logistic distribution is likely to be satisfactory, even
though the estimated hazard function under these two distributions may be
quite different. Indeed, when survival data are obtained for a relatively small
number of individuals, as in this example, there will often be little to choose
between alternative distributional models for the data. The model that is the
most convenient for the purpose in hand will then be adopted.
6.3 The accelerated failure time model for comparing two groups
The accelerated failure time model is a general model for survival data, in
which explanatory variables measured on an individual are assumed to act
multiplicatively on the time-scale, and so affect the rate at which an individual
proceeds along the time axis. This means that the models can be interpreted
in terms of the speed of progression of a disease, an interpretation that has
immediate intuitive appeal. Before the general form of the model is presented
in Section 6.4, the model for comparing the survival times of two groups of
patients is described in detail.
Suppose that patients are randomised to receive one of two treatments, a
standard treatment, S, or a new treatment, N . Under an accelerated failure
time model, the survival time of an individual on the new treatment is taken to
be a multiple of the survival time for an individual on the standard treatment.
Thus, the effect of the new treatment is to ‘speed up’ or ‘slow down’ the
passage of time. Under this assumption, the probability that an individual on
the new treatment survives beyond time t is the probability that an individual
on the standard treatment survives beyond time t/ϕ, where ϕ is an unknown
positive constant.
Now let SS (t) and SN (t) be the survivor functions for individuals in the
two treatment groups. Then, the accelerated failure time model specifies that
SN (t) = SS (t/ϕ),
for any value of the survival time t. One interpretation of this model is that
the lifetime of an individual on the new treatment is ϕ times the lifetime
that the individual would have experienced under the standard treatment.
The parameter ϕ therefore reflects the impact of the new treatment on the
baseline time-scale. When the end-point of concern is the death of a patient,
values of ϕ less than unity correspond to an acceleration in the time to death
of an individual assigned to the new treatment, relative to an individual on the
standard treatment. The standard treatment would then be the more suitable
in terms of promoting longevity. On the other hand, when the end-point is
the recovery from some disease state, values of ϕ less than unity would be
found when the effect of the new treatment is to speed up the recovery time.
In these circumstances, the new treatment would be superior to the standard.
The quantity ϕ^{−1} is therefore termed the acceleration factor.
The acceleration factor can also be interpreted in terms of the median
survival times of patients on the new and standard treatments, tN (50) and
tS (50), say. These values are such that SN {tN (50)} = SS {tS (50)} = 0.5.
Now, under the accelerated failure time model, SN {tN (50)} = SS {tN (50)/ϕ},
and so it follows that tN (50) = ϕtS (50). In other words, under the accelerated
failure time model, the median survival time of a patient on the new treatment
is ϕ times that of a patient on the standard treatment. In fact, the same
argument can be used for any percentile of the survival time distribution.
This means that the pth percentile of the survival time distribution for a
patient on the new treatment, tN (p), is such that tN (p) = ϕtS (p), where tS (p)
is the pth percentile for the standard treatment. This interpretation of the
acceleration factor is particularly appealing to clinicians.
From the relationship between the survivor function, probability density
function and hazard function given in Equation (1.4), the relationship between
the density and hazard functions for individuals in the two treatment groups
is
\[
f_N(t) = \phi^{-1} f_S(t/\phi),
\]
and
\[
h_N(t) = \phi^{-1} h_S(t/\phi).
\]
Now let X be an indicator variable that takes the value zero for an individual
in the group receiving the standard treatment, and unity for one who receives
the new treatment. The hazard function for the ith individual can then be
expressed as
\[
h_i(t) = \phi^{-x_i} h_0(t/\phi^{x_i}), \tag{6.5}
\]
where xi is the value of X for the ith individual in the study. Putting xi = 0
in this expression shows that the function h0 (t) is the hazard function for an
individual on the standard treatment. This is again referred to as the baseline
hazard function. The hazard function for an individual on the new treatment
is then ϕ^{−1} h0(t/ϕ).
The parameter ϕ must be non-negative, and so it is convenient to set ϕ = e^α. The accelerated failure time model in Equation (6.5) then becomes
\[
h_i(t) = e^{-\alpha x_i}\, h_0(t/e^{\alpha x_i}),
\]
so that the hazard function for an individual on the new treatment is now e^{−α} h0(t/e^α).
so that
\[
h_P(t) = \psi h_0(t),
\]
and
\[
h_A(t) = \phi^{-1} h_0(t/\phi),
\]
for the two hazard functions. Using the result S(t) = exp{−∫₀ᵗ h(u) du}, the baseline survivor function is
\[
S_0(t) = \begin{cases} e^{-0.5t} & \text{if } t \leqslant 1, \\ e^{-0.5-(t-1)} & \text{if } t > 1. \end{cases}
\]
Since S0(1) = e^{−0.5} ≈ 0.61, the median occurs in the second part of the survivor function, and satisfies exp{−0.5 − (t − 1)} = 0.5. The median survival time for those in Group I is therefore 1.19 months.
The survivor functions for the individuals in Group II under the two models
are
\[
S_P(t) = [S_0(t)]^{\psi},
\]
and
\[
S_A(t) = S_0(t/\phi),
\]
respectively.
To illustrate the difference between the hazard functions under propor-
tional hazards and accelerated failure time models, consider the particular case
where ψ = ϕ−1 = 2.0. The median survival time for individuals in Group II
is 0.69 months under the proportional hazards model, and 0.60 months under
the accelerated failure time model. The hazard functions for the two groups
under both models are shown in Figure 6.3 and the corresponding survivor
functions are shown in Figure 6.4.
Under the accelerated failure time model, the increase in the hazard for
Group II from 1.0 to 2.0 occurs sooner than under the proportional hazards
model. The ‘kink’ in the survivor function also occurs earlier under the accel-
erated failure time model.
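The three medians quoted in this illustration can be verified numerically by solving S(t) = 0.5 for each model. A sketch using a root-finder, under the piecewise baseline survivor function given above and ψ = ϕ^{−1} = 2:

```python
import numpy as np
from scipy.optimize import brentq

def s0(t):
    """Baseline survivor function of Group I (piecewise exponential)."""
    return np.exp(-0.5 * t) if t <= 1.0 else np.exp(-0.5 - (t - 1.0))

psi, phi = 2.0, 0.5    # psi = 1/phi = 2 in this illustration

median_I  = brentq(lambda t: s0(t) - 0.5, 0.01, 10.0)          # Group I
median_PH = brentq(lambda t: s0(t)**psi - 0.5, 0.01, 10.0)     # proportional hazards
median_AF = brentq(lambda t: s0(t / phi) - 0.5, 0.01, 10.0)    # accelerated failure time
print(median_I, median_PH, median_AF)   # about 1.19, 0.69 and 0.60 months
```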
Figure 6.3 The hazard functions for individuals in Group I, h0(t), and in Group II under (i) a proportional hazards model (—) and (ii) an accelerated failure time model (· · ·).
Figure 6.4 The survivor functions for individuals in Group I, S0(t), and in Group II, under (i) a proportional hazards model (—) and (ii) an accelerated failure time model (· · ·).
6.3.2 The percentile-percentile plot
The percentile-percentile plot, also known as the quantile-quantile plot or the
Q-Q plot, provides an exploratory method for assessing the validity of an
accelerated failure time model for two groups of survival data. Recall that
the pth percentile of a distribution is the value t(p), which is such that the
estimated survivor function at time t(p) is 1 − (p/100), for any value of p in
the interval (0, 100). The pth percentile is therefore such that
\[
t(p) = S^{-1}\left(\frac{100-p}{100}\right).
\]
Now let t0 (p) and t1 (p) be the pth percentiles estimated from the survivor
functions of the two groups of survival data. The values of p might be taken
to be 10, 20, . . . , 90, so long as the number of observations in each of the two
groups is not too small. The percentiles of the two groups may therefore be
expressed as
( ) ( )
100 − p 100 − p
t0 (p) = S0−1 , t1 (p) = S1−1 ,
100 100
where S0 (t) and S1 (t) are the survivor functions for the two groups. It then
follows that
S1 {t1 (p)} = S0 {t0 (p)} , (6.7)
for any given value of p.
Under the accelerated failure time model, S1(t) = S0(t/ϕ), and so the pth percentile for the second group, t1(p), is such that
\[
S_1\{t_1(p)\} = S_0\{t_1(p)/\phi\},
\]
and hence
\[
t_0(p) = \phi^{-1} t_1(p).
\]
Now let t̂0 (p), t̂1 (p) be the estimated percentiles in the two groups, so that
\[
\hat t_0(p) = \hat S_0^{-1}\left(\frac{100-p}{100}\right), \qquad \hat t_1(p) = \hat S_1^{-1}\left(\frac{100-p}{100}\right).
\]
A plot of the quantity t̂0 (p) against t̂1 (p), for suitably chosen values of p, should
give a straight line through the origin if the accelerated failure time model is
appropriate. The slope of this line will be an estimate of the acceleration
factor, ϕ−1 . This plot may therefore be used in an exploratory assessment
of the adequacy of the accelerated failure time model. In this sense, it is an
analogue of the log-cumulative hazard plot, used in Section 4.4.1 to examine
the validity of the proportional hazards model.
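Once percentiles have been estimated from the two survivor functions, the acceleration factor can be estimated as the slope of a least-squares line through the origin. A sketch, with hypothetical percentile estimates:

```python
import numpy as np

def pp_slope(t0, t1):
    """Slope of the least-squares line through the origin in a plot of
    t0-hat(p) against t1-hat(p); this estimates the acceleration factor."""
    t0, t1 = np.asarray(t0, float), np.asarray(t1, float)
    return np.sum(t0 * t1) / np.sum(t1**2)

# Hypothetical percentile estimates for the two groups:
t1 = [12.0, 20.0, 31.0, 48.0]     # group with the shorter survival times
t0 = [35.0, 62.0, 89.0, 145.0]    # group with the longer survival times
print(pp_slope(t0, t1))           # roughly 3, the estimated acceleration factor
```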
The relatively small numbers of death times, and the censoring pattern in
the data from the two groups of women, mean that not all of the percentiles
can be estimated. The percentile-percentile plot will therefore have just four
pairs of points. For illustration, this is shown in Figure 6.5. The points fall on
a line that is reasonably straight, suggesting that the accelerated failure time
model would not be inappropriate. However, this conclusion must be regarded
with some caution in view of the limited number of points in the graph.
The slope of a straight line drawn through the points in Figure 6.5 is
approximately equal to 3, which is a rough estimate of the acceleration factor.
The interpretation of this is that for women whose tumours were positively
stained, the disease process is speeded up by a factor of three, relative to
those whose tumours were negatively stained. We can also say that the median
survival time for women with negatively stained tumours is estimated to be
three times that of women with positively stained tumours.
Figure 6.5 Percentile-percentile plot for the data on the survival times of breast cancer patients.
The general accelerated failure time model is such that the hazard function for the ith individual is
\[
h_i(t) = e^{-\eta_i} h_0(t/e^{\eta_i}), \tag{6.8}
\]
where
\[
\eta_i = \alpha_1 x_{1i} + \alpha_2 x_{2i} + \cdots + \alpha_p x_{pi}
\]
is the linear component of the model, in which xji is the value of the jth
explanatory variable, Xj , j = 1, 2, . . . , p, for the ith individual, i = 1, 2, . . . , n.
As in the proportional hazards model, the baseline hazard function, h0 (t),
is the hazard of death at time t for an individual for whom the values of
the p explanatory variables are all equal to zero. The corresponding survivor
function for the ith individual is
\[
S_i(t) = \mathrm{P}\{\exp(\mu + \alpha' x_i + \sigma\epsilon_i) \geqslant t\},
\]
and the baseline survivor function, S0(t), the survivor function of an individual for whom x = 0, is
\[
S_0(t) = \mathrm{P}\{\exp(\mu + \sigma\epsilon_i) \geqslant t\}.
\]
It then follows that
\[
S_i(t) = S_0\{t/\exp(\alpha' x_i)\}, \tag{6.10}
\]
which is the general form of the survivor function for the ith individual in an
accelerated failure time model. In this version of the model, the acceleration
factor is exp(−α′ xi ) for the ith individual. The corresponding relationship
between the hazard functions is obtained using Equation (1.5) of Chapter 1. Specifically, taking logarithms of both sides of Equation (6.10), multiplying by −1, and differentiating with respect to t, leads to
\[
h_i(t) = \exp(-\alpha' x_i)\, h_0\{t/\exp(\alpha' x_i)\}.
\]
If we now write Sϵi (ϵ) for the survivor function of the random variable ϵi in the
log-linear model of Equation (6.9), the survivor function of the ith individual
can, from Equation (6.11), be expressed as
\[
S_i(t) = S_{\epsilon_i}\left(\frac{\log t - \mu - \alpha_1 x_{1i} - \alpha_2 x_{2i} - \cdots - \alpha_p x_{pi}}{\sigma}\right). \tag{6.12}
\]
This result shows how the survivor function for Ti can be found from the
survivor function of the distribution of ϵi . The result also demonstrates that
an accelerated failure time model can be derived from many probability dis-
tributions for ϵi , although some are more tractable than others.
A general expression for the pth percentile of the distribution of survival times also follows from the results in this section. The pth percentile for the ith individual, ti(p), is given by
\[
S_i\{t_i(p)\} = \frac{100-p}{100},
\]
and using Equation (6.11),
\[
\mathrm{P}\left(\epsilon_i > \frac{\log t_i(p) - \mu - \alpha_1 x_{1i} - \alpha_2 x_{2i} - \cdots - \alpha_p x_{pi}}{\sigma}\right) = \frac{100-p}{100}.
\]
Consequently, if ϵi(p) is the pth percentile of the distribution of ϵi,
\[
t_i(p) = \exp\{\sigma \epsilon_i(p) + \mu + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \cdots + \alpha_p x_{pi}\} \tag{6.13}
\]
is the pth percentile of the distribution of survival times for the ith individual.
Note that the percentile in Equation (6.13) can be written in the form
\[
t_i(p) = \exp(\alpha_1 x_{1i} + \alpha_2 x_{2i} + \cdots + \alpha_p x_{pi})\, t_0(p),
\]
where t0(p) is the pth percentile for a baseline individual for whom all explanatory variables take the value zero. This confirms that the α-coefficients
can be interpreted in terms of the effect of the explanatory variables on the
percentiles of the distribution of survival times.
The cumulative hazard function of the distribution of Ti is given by Hi (t) =
− log Si (t), and from Equation (6.12),
\[
H_i(t) = -\log S_{\epsilon_i}\left(\frac{\log t - \mu - \alpha_1 x_{1i} - \alpha_2 x_{2i} - \cdots - \alpha_p x_{pi}}{\sigma}\right) = H_{\epsilon_i}\left(\frac{\log t - \mu - \alpha_1 x_{1i} - \alpha_2 x_{2i} - \cdots - \alpha_p x_{pi}}{\sigma}\right), \tag{6.14}
\]
where Hϵi (ϵ) = − log Sϵi (ϵ) is the cumulative hazard function of ϵi . The cor-
responding hazard function, found by differentiating Hi (t) in Equation (6.14)
with respect to t, is
\[
h_i(t) = \frac{1}{\sigma t}\, h_{\epsilon_i}\left(\frac{\log t - \mu - \alpha_1 x_{1i} - \alpha_2 x_{2i} - \cdots - \alpha_p x_{pi}}{\sigma}\right), \tag{6.15}
\]
The survivor function of Ti in the Weibull accelerated failure time model can then be expressed as
\[
S_i(t) = \exp\left(-\lambda_i t^{1/\sigma}\right), \tag{6.16}
\]
where
\[
\lambda_i = \exp\{-(\mu + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \cdots + \alpha_p x_{pi})/\sigma\},
\]
which, from Equation (5.5) of Chapter 5, is the survivor function of a Weibull distribution with scale parameter λi and shape parameter σ^{−1}. Consequently,
Equation (6.16) is the accelerated failure time representation of the survivor
function of the Weibull model described in Section 5.6 of Chapter 5.
The cumulative hazard and hazard functions for the Weibull accelerated
failure time model can be found directly from the survivor function in Equa-
tion (6.16), or from Hϵi (ϵ) and hϵi (ϵ), using the general results in Equa-
tions (6.14) and (6.15). We find that the cumulative hazard function is
\[
H_i(t) = -\log S_i(t) = \exp\left(\frac{\log t - \mu - \alpha_1 x_{1i} - \alpha_2 x_{2i} - \cdots - \alpha_p x_{pi}}{\sigma}\right),
\]
which can also be expressed as λi t^{1/σ}, and the hazard function is given by
\[
h_i(t) = \frac{1}{\sigma t}\exp\left(\frac{\log t - \mu - \alpha_1 x_{1i} - \alpha_2 x_{2i} - \cdots - \alpha_p x_{pi}}{\sigma}\right), \tag{6.17}
\]
or h_i(t) = λi σ^{−1} t^{σ^{−1} − 1}.
We now reconcile this form of the model with that for the Weibull proportional hazards model. From Equation (5.37) of Chapter 5, the survivor function for the ith individual is
\[
S_i(t) = \exp\left\{-\exp(\beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi})\, \lambda t^{\gamma}\right\}, \tag{6.18}
\]
in which λ and γ are the parameters of the Weibull baseline hazard function. There is a direct correspondence between Equation (6.16) and Equation (6.18), in the sense that
\[
\lambda = \exp(-\mu/\sigma), \qquad \gamma = \sigma^{-1}, \qquad \beta_j = -\alpha_j/\sigma, \quad j = 1, 2, \ldots, p.
\]
In the log-linear formulation,
\[
\log T_i = \frac{1}{\gamma}\left\{-\log\lambda - \beta_1 x_{1i} - \beta_2 x_{2i} - \cdots - \beta_p x_{pi} + \epsilon_i\right\},
\]
and the general result in Equation (6.13) leads directly to Equation (6.19).
The survivor function and hazard function of the Weibull model follow
from Equations (6.16) and (6.17), and Equation (6.19) enables percentiles to
be estimated directly.
6.5.2 The log-logistic accelerated failure time model
Now suppose that the survival times have a log-logistic distribution. If the
baseline hazard function in the general accelerated failure time model in Equa-
tion (6.8) is derived from a log-logistic distribution with parameters θ, κ, this
function is given by
eθ κtκ−1
h0 (t) = .
1 + eθ t κ
Under the accelerated failure time model, the hazard of death at time t for
the ith individual is
\[
h_i(t) = e^{-\eta_i} h_0(e^{-\eta_i} t),
\]
where ηi = α1 x1i + α2 x2i + · · · + αp xpi is a linear combination of the values
of p explanatory variables for the ith individual. Consequently,
\[
h_i(t) = \frac{e^{-\eta_i}\, e^{\theta}\kappa (e^{-\eta_i} t)^{\kappa-1}}{1 + e^{\theta}(e^{-\eta_i} t)^{\kappa}},
\]
that is,
\[
h_i(t) = \frac{e^{\theta - \kappa\eta_i}\,\kappa t^{\kappa-1}}{1 + e^{\theta-\kappa\eta_i}t^{\kappa}}.
\]
It then follows that the survival time for the ith individual also has a log-
logistic distribution with parameters θ−κηi and κ. The log-logistic distribution
therefore has the accelerated failure time property. However, this distribution
does not have the proportional hazards property.
The log-linear form of the accelerated failure time model in Equation (6.9)
also provides a representation of the log-logistic distribution. Suppose that
in this formulation, ϵi now has a logistic distribution with zero mean and
variance π 2 /3, so that the survivor function of ϵi is
\[
S_{\epsilon_i}(\epsilon) = \frac{1}{1 + e^{\epsilon}}.
\]
Using Equation (6.12), the survivor function of Ti is then
\[
S_i(t) = \left\{1 + \exp\left(\frac{\log t - \mu - \alpha_1 x_{1i} - \alpha_2 x_{2i} - \cdots - \alpha_p x_{pi}}{\sigma}\right)\right\}^{-1}. \tag{6.20}
\]
where µ and σ are two unknown parameters. Under the accelerated failure time model, the survivor function for the ith individual is then
\[
S_i(t) = 1 - \Phi\left(\frac{\log t - \mu - \eta_i}{\sigma}\right),
\]
which is the survivor function for an individual whose survival times have a
lognormal distribution with parameters µ + ηi and σ. The lognormal distri-
bution therefore has the accelerated failure time property.
In the log-linear formulation of the model, the random variable associated
with the survival time of the ith individual has a lognormal distribution if
log Ti is normally distributed. We therefore take ϵi in Equation (6.9) to have
a standard normal distribution. The survivor function and hazard function of ϵi are then
$$S_{\epsilon_i}(\epsilon) = 1 - \Phi(\epsilon)$$
and
$$h_{\epsilon_i}(\epsilon) = \frac{f_{\epsilon_i}(\epsilon)}{S_{\epsilon_i}(\epsilon)},$$
respectively, where fϵi (ϵ) is the density function of a standard normal random
variable, given by
$$f_{\epsilon_i}(\epsilon) = \frac{1}{\sqrt{2\pi}}\exp\left(-\epsilon^2/2\right).$$
The random variable Ti , in the general accelerated failure time model, then
has a lognormal distribution with parameters µ + α′ xi and σ. The survivor
function of Ti is as given in Equation (6.23), and the hazard function is found
from Equation (6.15).
The pth percentile of the distribution of Ti , from Equation (6.13), is
$$t_i(p) = \exp\left\{\sigma\Phi^{-1}(p/100) + \mu + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \cdots + \alpha_p x_{pi}\right\},$$
and, in particular, ti(50) = exp(µ + α′xi) is the median survival time for the
ith individual.
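As a small worked illustration of this percentile formula, the sketch below evaluates ti(p) for a lognormal model using scipy's standard normal quantile function; the parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical estimates for a lognormal AFT model with one covariate
mu, sigma, alpha1 = 3.5, 0.6, -0.4   # illustrative values only
x1 = 2.0

def t_percentile(p):
    # t_i(p) = exp(sigma * Phi^{-1}(p/100) + mu + alpha1 * x1)
    return np.exp(sigma * norm.ppf(p / 100) + mu + alpha1 * x1)

print(t_percentile(50))                 # median: Phi^{-1}(0.5) = 0
print(t_percentile(10), t_percentile(90))
```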
6.5.4 Summary
It is convenient to summarise the models and results that have been described
in this section, so that the different parameterisations of the distributions used
in accelerated failure time models can clearly be seen.
The general accelerated failure time model for the survival time of the ith
of n individuals, for whom x1i , x2i , . . . , xpi are the values of p explanatory
variables, X1 , X2 , . . . , Xp , is such that the random variable associated with
the survival time, Ti, can be expressed in the form
$$\log T_i = \mu + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \cdots + \alpha_p x_{pi} + \sigma\epsilon_i.$$
Particular distributions for Ti are derived from assumptions about the dis-
tribution of ϵi in this model. The survivor function and hazard function of
the distributions of ϵi , that lead to commonly used accelerated failure time
models for the survival times, are summarised in Table 6.2.
Table 6.2 Survivor function, hazard function and pth percentile of the distributions of ϵi used in accelerated failure time models.
Distribution of Ti    Sϵi(ϵ)           hϵi(ϵ)                               ϵi(p)
Weibull               exp(−e^ϵ)        e^ϵ                                  log[log{100/(100 − p)}]
Log-logistic          (1 + e^ϵ)^{−1}   e^ϵ/(1 + e^ϵ)                        log{p/(100 − p)}
Lognormal             1 − Φ(ϵ)         exp(−ϵ²/2)/[{1 − Φ(ϵ)}√(2π)]         Φ^{−1}(p/100)
The cumulative hazard function of ϵi is found from Hϵi (ϵ) = − log Sϵi (ϵ),
and, if desired, the density function of ϵi is fϵi (ϵ) = hϵi (ϵ) Sϵi (ϵ). From the
survivor and hazard function of ϵi , the survivor and hazard function of Ti can
be found from
$$S_i(t) = S_{\epsilon_i}\left(\frac{\log t - \mu - \alpha_1 x_{1i} - \alpha_2 x_{2i} - \cdots - \alpha_p x_{pi}}{\sigma}\right),$$
and
$$h_i(t) = \frac{1}{\sigma t}\, h_{\epsilon_i}\left(\frac{\log t - \mu - \alpha_1 x_{1i} - \alpha_2 x_{2i} - \cdots - \alpha_p x_{pi}}{\sigma}\right),$$
results that were first given in Equations (6.12) and (6.15), respectively.
The pth percentile of the distribution of ϵi is also given in Table 6.2, from
which ti (p), the pth percentile of the survival times for the ith individual, can
be found from
$$t_i(p) = \exp\left\{\sigma\epsilon_i(p) + \mu + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \cdots + \alpha_p x_{pi}\right\}.$$
6.6 Fitting and comparing accelerated failure time models
Accelerated failure time models are fitted using the method of maximum likelihood. The likelihood of the n observed survival times, t1, t2, . . . , tn, is given by
$$L(\alpha, \mu, \sigma) = \prod_{i=1}^{n} \{f_i(t_i)\}^{\delta_i}\{S_i(t_i)\}^{1-\delta_i},$$
where fi (ti ) and Si (ti ) are the density and survivor functions for the ith
individual at ti , and δi is the event indicator for the ith observation, so that
δi is unity if the ith observation is an event and zero if it is censored. Now,
from Equation (6.12),
Si (ti ) = Sϵi (zi ) ,
where zi = (log ti − µ − α1 x1i − α2 x2i − · · · − αp xpi )/σ, and differentiation
with respect to t gives
$$f_i(t_i) = \frac{1}{\sigma t_i}\, f_{\epsilon_i}(z_i).$$
The likelihood function can then be expressed in terms of the survivor and
density functions of ϵi , giving
$$L(\alpha, \mu, \sigma) = \prod_{i=1}^{n} (\sigma t_i)^{-\delta_i}\{f_{\epsilon_i}(z_i)\}^{\delta_i}\{S_{\epsilon_i}(z_i)\}^{1-\delta_i}.$$
The corresponding log-likelihood function is then
$$\log L(\alpha, \mu, \sigma) = \sum_{i=1}^{n}\left\{-\delta_i\log(\sigma t_i) + \delta_i\log f_{\epsilon_i}(z_i) + (1-\delta_i)\log S_{\epsilon_i}(z_i)\right\}, \qquad (6.24)$$
and the maximum likelihood estimates of the p + 2 unknown parameters, µ, σ
and α1 , α2 , . . . , αp , are found by maximising this function using the Newton-
Raphson procedure, described in Section 3.3.3.
Note that the expression for the log-likelihood function in Equation (6.24) includes the term $-\sum_{i=1}^{n}\delta_i\log t_i$, which does not involve any unknown parameters. This term may therefore be omitted from the log-likelihood function, as noted in Section 5.6.1 of Chapter 5, in the context of the Weibull proportional hazards model. Indeed, log-likelihood values given by most computer software for accelerated failure time modelling do not include the value of $-\sum_{i=1}^{n}\delta_i\log t_i$.
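To illustrate the fitting process, the following sketch maximises the log-likelihood in Equation (6.24) for the Weibull model, in which ϵi has a Gumbel distribution, using a general-purpose optimiser rather than the Newton-Raphson procedure. The data are simulated, and all names in the code are ours rather than the book's.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, t, delta, X):
    # Log-likelihood (6.24) for the Weibull model, where eps is Gumbel:
    # log f(z) = z - e^z and log S(z) = -e^z
    mu, log_sigma = params[0], params[1]
    alpha = params[2:]
    sigma = np.exp(log_sigma)                 # keep sigma positive
    z = (np.log(t) - mu - X @ alpha) / sigma
    ll = np.sum(-delta * np.log(sigma * t)    # includes -sum(delta*log t),
                + delta * (z - np.exp(z))     # which some software omits
                + (1 - delta) * (-np.exp(z)))
    return -ll

# Simulated data: one covariate, Weibull survival times, random censoring
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 1))
true_mu, true_sigma, true_alpha = 2.0, 0.5, 0.7
eps = np.log(rng.exponential(size=n))         # Gumbel via log of Exp(1)
t_event = np.exp(true_mu + true_alpha * X[:, 0] + true_sigma * eps)
c = rng.exponential(scale=np.exp(2.5), size=n)
t = np.minimum(t_event, c)
delta = (t_event <= c).astype(float)

res = minimize(neg_loglik, x0=np.zeros(3), args=(t, delta, X), method="BFGS")
mu_hat, sigma_hat, alpha_hat = res.x[0], np.exp(res.x[1]), res.x[2:]
print(mu_hat, sigma_hat, alpha_hat, "-2logL =", 2 * res.fun)
```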
After fitting a model, the value of the statistic −2 log L̂ can be computed,
and used in making comparisons between nested models, just as for the pro-
portional hazards model. Specifically, to compare two nested models, the dif-
ference in the values of the statistic −2 log L̂ for the two models is calculated,
and compared with percentage points of the chi-squared distribution, with
degrees of freedom equal to the difference in the number of α-parameters
included in the linear component of the model.
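As a sketch of such a comparison, with placeholder values of −2 log L̂ for two nested models that differ by one α-parameter:

```python
from scipy.stats import chi2

# Placeholder values of -2 log Lhat for two nested models; the larger
# model has one additional alpha-parameter
m2ll_reduced, m2ll_full = 150.0, 146.2
lr_stat = m2ll_reduced - m2ll_full      # difference in -2 log Lhat
p_value = chi2.sf(lr_stat, df=1)        # chi-squared tail on 1 d.f.
print(lr_stat, p_value)
```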
Once a suitable model has been identified, estimates of the survivor and
hazard functions may be obtained and plotted. The fitted model can be inter-
preted in terms of the estimated value of the acceleration factor for particular
individuals, or in terms of the median and other percentiles of the distribution
of survival times. In particular, the estimated pth percentile of the distribution
of survival times, for an individual whose vector of values of the explanatory
variables is xi , is, from Equation (6.13), given by
t̂i (p) = exp{σ̂ϵi (p) + µ̂ + α̂1 x1i + α̂2 x2i + · · · + α̂p xpi }.
Figure 6.6 Estimated hazard functions under the Weibull accelerated failure time
model for women with positively stained (—) and negatively stained (· · ·) tumours.
which is the hazard function for women with negatively stained tumours, and
hence
$$h_i(t) = e^{\beta x_i}\lambda\gamma t^{\gamma-1},$$
from which the estimated median survival time for a woman with negative
staining is 235 days, while that for women with positive staining is 75 days.
These values are very close to those obtained under the Weibull accelerated
failure time model.
The estimated hazard function for the ith woman is now
$$\hat{h}_i(t) = \frac{1}{\hat{\sigma}t}\left[1 + \exp\left\{-\left(\frac{\log t - \hat{\mu} - \hat{\alpha} x_i}{\hat{\sigma}}\right)\right\}\right]^{-1}.$$
A graph of this function for the two groups of women is shown in Figure 6.7.
This can be compared with the graph in Figure 6.6.
Figure 6.7 Estimated hazard functions under the log-logistic accelerated failure time
model for women with positively stained (—) and negatively stained (· · ·) tumours.
The hazard functions for those with negative staining are quite similar
under the two models. However, the hazard function for those with positive
staining under the log-logistic model is different from that under the Weibull
model. The values of the statistic −2 log L̂ for the fitted Weibull and log-
logistic models are 121.77 and 118.495. On this basis, the log-logistic model
is a slightly better fit. An analysis of residuals, to be discussed in Chapter
7, may help in choosing between these two models, although with this small
data set, such an analysis is unlikely to be very informative.
Finally, in terms of the parameterisation of the model given in Section
6.1.1, the baseline hazard function is
$$h_0(t) = \frac{e^{\theta}\kappa t^{\kappa-1}}{1 + e^{\theta}t^{\kappa}},$$
and so the hazard function for the ith woman in the study is
$$h_i(t) = \frac{e^{\theta-\kappa\alpha x_i}\kappa t^{\kappa-1}}{1 + e^{\theta-\kappa\alpha x_i}t^{\kappa}}.$$
As in Example 3.6, the variables Size and Index are the ones that are
needed in the model. When either of these variables is omitted, the corre-
sponding increase in the value of −2 log L̂ is significant, and neither Age nor
Shb reduce −2 log L̂ by a significant amount when they are added to the model.
When the term corresponding to the treatment effect, Treat, is added to the
model that contains Size and Index, −2 log L̂ decreases to 21.245. When this
reduction of 1.867 is compared with percentage points of a chi-squared distri-
bution on 1 d.f., the reduction is not significant at the 10% level (P = 0.172).
There is no evidence of any interaction between Treat and the prognostic vari-
ables Size and Index, and so the conclusion is that there is no statistically
significant treatment effect.
The magnitude of the treatment effect can be assessed by calculating the
acceleration factor. According to the log-linear form of the model, the random
variable associated with the survival time of the ith patient, Ti, is such that
$$\log T_i = \mu + \alpha_1\, Size_i + \alpha_2\, Index_i + \alpha_3\, Treat_i + \sigma\epsilon_i,$$
in which ϵi has a logistic distribution, Size i and Index i are the values of the
tumour size and Gleason index for the ith individual, and Treat i is zero if the
ith individual is in the placebo group and unity if in the treated group. The
maximum likelihood estimates of the unknown parameters in this model are
given by µ̂ = 7.661, σ̂ = 0.338, α̂1 = −0.029, α̂2 = −0.293 and α̂3 = 0.573.
The values of the α’s suggest that the survival time tends to be shorter for
larger values of the tumour size and tumour index, and longer for individuals
assigned to the active treatment.
Using Equation (6.20), the fitted survivor function for the ith patient is
$$\hat{S}_i(t) = \left[1 + \exp\left\{\frac{\log t - \hat{\mu} - \hat{\alpha}_1\, Size_i - \hat{\alpha}_2\, Index_i - \hat{\alpha}_3\, Treat_i}{\hat{\sigma}}\right\}\right]^{-1},$$
which can also be written as $\hat{S}_i(t) = \{1 + e^{\hat{\zeta}_i}\, t^{1/\hat{\sigma}}\}^{-1}$, where
$$\hat{\zeta}_i = \frac{1}{\hat{\sigma}}\left\{-\hat{\mu} - \hat{\alpha}_1\, Size_i - \hat{\alpha}_2\, Index_i - \hat{\alpha}_3\, Treat_i\right\},$$
that is, $\hat{\zeta}_i = \hat{\theta} - \hat{\kappa}\hat{\eta}_i$, where
$$\hat{\eta}_i = -0.029\, Size_i - 0.293\, Index_i + 0.573\, Treat_i,$$
and from Equation (6.1),
$$\hat{h}_0(t) = \frac{e^{\hat{\theta}}\hat{\kappa} t^{\hat{\kappa}-1}}{1 + e^{\hat{\theta}}t^{\hat{\kappa}}}.$$
The estimated parameters in this form of the estimated baseline hazard func-
tion, ĥ0 (t), are given by θ̂ = −22.644 and κ̂ = 2.956. A graph of this function
is shown in Figure 6.8.
This figure indicates that the baseline hazard is increasing over time. Com-
parison with the baseline hazard function for a fitted Weibull model, also
shown in this figure, indicates that under the log-logistic model, the estimated
baseline hazard function does not increase quite so rapidly.
Figure 6.8 Estimated baseline hazard function for the fitted log-logistic model (—)
and a fitted Weibull model (· · ·).
Under the proportional odds model, the odds of the ith individual surviving beyond time t are such that
$$\frac{S_i(t)}{1 - S_i(t)} = e^{\eta_i}\,\frac{S_0(t)}{1 - S_0(t)}, \qquad (6.25)$$
where ηi = β′xi and S0(t) is the baseline survivor function. Rearranging this expression gives
$$S_i(t) = \frac{S_0(t)}{e^{-\eta_i} + (1 - e^{-\eta_i})S_0(t)}. \qquad (6.26)$$
Using the general result from Equation (1.5), the hazard function is
$$h_i(t) = -\frac{d}{dt}\log S_i(t),$$
and so
$$h_i(t) = h_0(t) - \frac{(1 - e^{-\eta_i})f_0(t)}{e^{-\eta_i} + (1 - e^{-\eta_i})S_0(t)},$$
after differentiating both sides of Equation (6.26) with respect to t, where
f0 (t) is the baseline probability density function. After some rearrangement,
this equation becomes
f0 (t)
hi (t) = h0 (t) − . (6.27)
(eηi − 1)−1 + S0 (t)
From Equation (1.4), we also have that h0 (t) = f0 (t)/S0 (t) and substituting
for f0 (t) in Equation (6.27) gives
$$h_i(t) = h_0(t)\left\{1 - \frac{S_0(t)}{(e^{\eta_i} - 1)^{-1} + S_0(t)}\right\}.$$
Now suppose that the survival times have a log-logistic distribution, with baseline survivor function
$$S_0(t) = \left\{1 + e^{\theta}t^{\kappa}\right\}^{-1},$$
where θ and κ are unknown parameters. The baseline odds of survival beyond time t are then given by
$$\frac{S_0(t)}{1 - S_0(t)} = e^{-\theta}t^{-\kappa}.$$
The odds of the ith individual surviving beyond time t are therefore
$$\frac{S_i(t)}{1 - S_i(t)} = e^{\eta_i - \theta}t^{-\kappa},$$
and so the survival time of the ith individual has a log-logistic distribution
with parameters θ − ηi and κ. The log-logistic distribution therefore has the
proportional odds property, and the distribution is the natural one to use in
conjunction with the proportional odds model. In fact, this is the only distribu-
tion to share both the accelerated failure time property and the proportional
odds property.
This result also means that the estimated β-coefficients in the linear com-
ponent of the proportional odds model in Equation (6.25) can be obtained by
multiplying the α-coefficients in the log-logistic accelerated failure time model
of Equation (6.20) by κ̂ = σ̂ −1 , where σ̂ is the estimated value of the param-
eter σ. The coefficients of the explanatory variables under the proportional
odds model can then be obtained from those under the accelerated failure
time model, and vice versa. The results of a survival analysis based on a pro-
portional odds model can therefore be interpreted in terms of an acceleration
factor or the ratio of the odds of survival beyond some time, whichever is the
more convenient.
As for other models for survival data, the proportional odds model can be
fitted using the method of maximum likelihood. Alternative models may then
be compared on the basis of the statistic −2 log L̂.
In a two-group study, a preliminary examination of the likely suitability
of the model can easily be undertaken. The log-odds of the ith individual
surviving beyond time t are
$$\log\left\{\frac{S_i(t)}{1 - S_i(t)}\right\} = \beta x_i - \theta - \kappa\log t,$$
where xi is the value of an indicator variable that takes the value zero if an
individual is in one group and unity if in the other. The Kaplan-Meier estimate
of the survivor function is then obtained for the individuals in each group, and the estimated log-odds of survival beyond time t, log{Ŝi(t)/[1 − Ŝi(t)]}, are
plotted against log t. If the plot shows two parallel straight lines, this would
indicate that the log-logistic model was appropriate. If the lines were straight
but not parallel, this would suggest that the parameter κ in the model was
not the same for each treatment group. Parallel curves in this plot suggest
that although the proportional odds assumption is valid, the survival times
cannot be taken to have a log-logistic distribution.
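A sketch of this preliminary check, assuming the Python lifelines package for the Kaplan-Meier estimate; the function and variable names are our own.

```python
import numpy as np
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

def log_odds_plot(times, events, group, ax):
    # Plot estimated log-odds of survival beyond t against log t,
    # separately for each of two groups
    for g, marker in zip(np.unique(group), ("o", "x")):
        kmf = KaplanMeierFitter()
        kmf.fit(times[group == g], event_observed=events[group == g])
        s = kmf.survival_function_.iloc[:, 0].values
        t = kmf.survival_function_.index.values
        keep = (s > 0) & (s < 1) & (t > 0)   # avoid log(0) and t = 0
        ax.plot(np.log(t[keep]), np.log(s[keep] / (1 - s[keep])),
                marker, label=f"group {g}")
    ax.set_xlabel("Log of survival time")
    ax.set_ylabel("Log-odds of survival")
    ax.legend()

# Usage, given arrays times, events (0/1) and group (0/1):
# fig, ax = plt.subplots(); log_odds_plot(times, events, group, ax); plt.show()
```

Two parallel straight lines in the resulting plot would support the log-logistic model, as described above.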
Figure 6.9 Estimated values of the log-odds of survival beyond t plotted against log t
for women with positively stained (∗) and negatively stained (•) tumours.
$$h(t) = \theta - \beta e^{-\gamma t},$$
where θ > 0, β > 0 and γ > 0. This is essentially a Gompertz hazard function,
defined in Section 5.9, with an additional constant. The general shape of this
function is depicted in Figure 6.10. This function has a value of θ − β when
t = 0 and increases to a horizontal asymptote at a hazard of θ. Similarly the
function
$$h(t) = \theta + \beta e^{-\gamma t},$$
where θ > 0, β > 0 and γ > 0, could be used to model a hazard which
decreases from θ + β to a horizontal asymptote at θ.
Using Equation (1.6), the corresponding survivor function can be found,
from which the probability density function can be obtained. The probability
distribution corresponding to this specification of the hazard function is known
as the Gompertz-Makeham distribution.
Figure 6.10 General shape of the hazard function h(t) = θ − βe^{−γt}, plotted against time.
Consider now a Weibull proportional hazards model, in which the hazard of death at time t for the ith individual is hi(t) = exp(β′xi)h0(t), where the baseline hazard function is h0(t) = λγt^{γ−1}, λ and γ are unknown
parameters that determine the scale and shape of the underlying Weibull
distribution, and xi is the vector of values of p explanatory variables for the
ith of n individuals. The corresponding cumulative hazard function is
$$H_i(t) = \int_0^t h_i(u)\,du = \exp(\beta' x_i)\,\lambda t^{\gamma},$$
so that, on setting y = log t and ηi = β′xi, the log-cumulative hazard is the linear function
$$\log H_i(t) = \gamma_0 + \gamma_1 y + \eta_i,$$
where γ0 = log λ and γ1 = γ.
This linear function of y can be extended to a restricted cubic spline. With one internal knot k1 and boundary knots kmin and kmax, the linear term γ0 + γ1y is replaced by
$$\gamma_0 + \gamma_1 y + \gamma_2\nu_1(y),$$
where
$$\nu_1(y) = (y - k_1)^3_+ - \lambda_1(y - k_{\min})^3_+ - (1 - \lambda_1)(y - k_{\max})^3_+,$$
with
$$(y - a)^3_+ = \max\{0, (y - a)^3\},$$
Figure 6.11 Restricted cubic spline with an internal knot at k1 and boundary knots
at kmin and kmax .
which confirms that log Hi (t) is a linear function of y for y < kmin and y >
kmax , and cubic functions of y for values of y between kmin and k1 , and between
k1 and kmax .
The flexibility of the parametric model for log Hi (t) can be increased by
increasing the number of internal knots. The greater the number of knots,
the more complex the curve. In general, for a model with m internal knots,
non-linear terms ν1 (y), ν2 (y), . . . , νm (y) are defined, so that for a model with
m knots,
$$\log H_i(t) = \gamma_0 + \gamma_1 y + \gamma_2\nu_1(y) + \cdots + \gamma_{m+1}\nu_m(y) + \eta_i, \qquad (6.28)$$
where
$$\nu_j(y) = (y - k_j)^3_+ - \lambda_j(y - k_{\min})^3_+ - (1 - \lambda_j)(y - k_{\max})^3_+, \qquad (6.29)$$
and
$$\lambda_j = \frac{k_{\max} - k_j}{k_{\max} - k_{\min}}.$$
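The basis functions in Equation (6.29) are straightforward to compute directly; a minimal sketch in Python, with names of our own choosing:

```python
import numpy as np

def nu(y, kj, kmin, kmax):
    # Restricted cubic spline basis function of Equation (6.29):
    # nu_j(y) = (y-kj)^3_+ - lam_j (y-kmin)^3_+ - (1-lam_j)(y-kmax)^3_+
    lam = (kmax - kj) / (kmax - kmin)
    cube = lambda u: np.maximum(u, 0.0) ** 3
    return cube(y - kj) - lam * cube(y - kmin) - (1 - lam) * cube(y - kmax)

def log_H0(y, gammas, knots):
    # log H0(t) for a model with one internal knot, on the scale y = log t
    kmin, k1, kmax = knots
    g0, g1, g2 = gammas
    return g0 + g1 * y + g2 * nu(y, k1, kmin, kmax)
```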
The model defined by Equation (6.28) is the Royston and Parmar model.
The extended parametric form of the baseline hazard function means that
the survival times no longer have a Weibull distribution under this model,
nor any other recognisable distribution, although the model still assumes pro-
portional hazards amongst the explanatory variables. The model in Equa-
tion (6.28) can also be expressed in terms of a baseline cumulative hazard
function, H0(t), by writing Hi(t) = exp(ηi)H0(t), where
$$\log H_0(t) = \gamma_0 + \gamma_1 y + \gamma_2\nu_1(y) + \cdots + \gamma_{m+1}\nu_m(y). \qquad (6.30)$$
The corresponding survivor function for the ith individual is Si(t) = exp{−Hi(t)}, so that
$$S_i(t) = \exp\{-\exp(\eta_i)H_0(t)\}. \qquad (6.31)$$
The hazard function is found by differentiating Hi(t) with respect to t, which gives
$$h_i(t) = t^{-1}\exp(\eta_i)H_0(t)\left\{\gamma_1 + \gamma_2\nu_1'(y) + \cdots + \gamma_{m+1}\nu_m'(y)\right\}, \qquad (6.32)$$
where
$$\nu_j'(y) = \begin{cases} 0, & y \leqslant k_{\min}, \\ -3\lambda_j(y - k_{\min})^2, & y \in (k_{\min}, k_j), \\ 3(y - k_j)^2 - 3\lambda_j(y - k_{\min})^2, & y \in (k_j, k_{\max}), \\ 3(y - k_j)^2 - 3\lambda_j(y - k_{\min})^2 - 3(1 - \lambda_j)(y - k_{\max})^2, & y > k_{\max}. \end{cases}$$
The model can be fitted by maximising the likelihood function
$$\prod_{i=1}^{n}\{h_i(t_i)\}^{\delta_i} S_i(t_i),$$
where the survivor and hazard functions for the ith individual at time ti ,
Si (ti ) and hi (ti ), are given in Equations (6.31) and (6.32), and δi is the event
indicator. The logarithm of this likelihood function can be maximised using
standard optimisation routines, and this leads to fitted survivor and hazard
functions that are smooth functions of the survival time t. The fitting process
also leads to standard errors of the parameter estimates and functions of them,
such as estimates of the hazard and survivor functions at any given time.
Figure 6.12 Log-cumulative hazard plot for the women not on tamoxifen (•) and
those in the tamoxifen group (∗).
At this stage, a variable selection process may be used to determine which
of the other explanatory factors, Age, Men, Size, Grade, Nodes, Prog, and
Oest, are needed in the model in addition to Treat, but in this example, all of
them will be included. Royston and Parmar models with increasing numbers
of knots are then fitted. Table 6.6 gives the value of the AIC statistic for the
models fitted, where the model with zero knots is the standard Weibull model.
The fitted cumulative hazard function for the ith patient is
$$\hat{H}_i(t) = \exp(\hat{\beta}_1 Treat_i + \hat{\beta}_2 Age_i + \hat{\beta}_3 Men_i + \hat{\beta}_4 Size_i + \hat{\beta}_5 Grade_i + \hat{\beta}_6 Nodes_i + \hat{\beta}_7 Prog_i + \hat{\beta}_8 Oest_i)\hat{H}_0(t),$$
where
$$\hat{H}_0(t) = \exp\{\hat{\gamma}_0 + \hat{\gamma}_1 y + \hat{\gamma}_2\nu_1(y) + \hat{\gamma}_3\nu_2(y) + \hat{\gamma}_4\nu_3(y)\}$$
is the baseline cumulative hazard function for a model with 3 knots, y = log t
and the functions ν1 (y), ν2 (y), ν3 (y) are defined in Equation (6.29).
A visual summary of the fit of the Royston and Parmar models is shown in
Figure 6.13. This figure shows the adjusted baseline survivor function on fit-
ting a Cox regression model that contains the 8 explanatory variables, shown
as a step-function. In addition, the adjusted baseline survivor function for a
Weibull model, and models with one and three internal knots, is shown. This
figure confirms that the underlying risk adjusted baseline survivor function
from the Cox model is not well fitted by a Weibull model. A Royston and Par-
mar model with one knot tracks the estimated Cox baseline survivor function
much more closely, and that with three knots gives an improved performance
at the longer survival times.
The estimated values of the parameters and their standard errors in the
Royston and Parmar model with three knots, are given in Table 6.7. Also
Figure 6.13 Risk adjusted survivor functions for a fitted Cox regression model (—),
Weibull model ( ) and Royston-Parmar models with 1 (·······) and 3 (- - -) knots.
Table 6.7 Parameter estimates and their standard errors for a Royston and Parmar
model with 3 knots and a Cox regression model.
Parameter Royston and Parmar model Cox regression model
Estimate se (Estimate) Estimate se (Estimate)
β1 −0.3386 0.1290 −0.3372 0.1290
β2 −0.0096 0.0093 −0.0094 0.0093
β3 0.2753 0.1831 0.2670 0.1833
β4 0.0077 0.0040 0.0077 0.0039
β5 0.2824 0.1059 0.2801 0.1061
β6 0.0497 0.0074 0.0499 0.0074
β7 −0.0022 0.0006 −0.0022 0.0006
β8 0.0002 0.0004 0.0002 0.0004
γ0 −20.4691 3.2958
γ1 2.9762 0.5990
γ2 −0.4832 0.5873
γ3 1.4232 0.8528
γ4 −0.9450 0.4466
shown in this table are the estimated β-parameters in a fitted Cox model and
their standard errors. The estimates and their standard errors for the two
models are very similar. The adjusted hazard ratio for a patient on tamoxifen
relative to one who is not, is 0.71 under both models, so that the hazard of
recurrence of cancer or death is lower for patients on tamoxifen.
The Royston and Parmar model has the advantage of providing a paramet-
ric estimate of the baseline hazard. The fitted baseline hazard function, ad-
justed for the 8 explanatory variables, for the Weibull model and the Royston
and Parmar spline models with one and three knots, is shown in Figure 6.14.
The Royston and Parmar model indicates that the underlying hazard function
is unimodal, and so it is not surprising that the Weibull model is a poor fit.
Figure 6.14 Adjusted baseline hazard functions for a fitted Weibull model (—) and
a Royston-Parmar model with 1 (·······) and 3 knots (- - -).
For the log-logistic model, the baseline log-odds of survival beyond time t can be written as
$$\log\left(\frac{S_0(t)}{1 - S_0(t)}\right) = -\theta - \kappa\log t = \gamma_0 + \gamma_1 y,$$
where γ0 = −θ, γ1 = −κ, and the model is linear in y = log t. Extending this
model to incorporate non-linear terms to give a restricted cubic spline with
m internal knots, and using the same notation as in Equation (6.28), we get
$$\log\left(\frac{S_0(t)}{1 - S_0(t)}\right) = \gamma_0 + \gamma_1 y + \gamma_2\nu_1(y) + \cdots + \gamma_{m+1}\nu_m(y),$$
which is analogous to the expression for the cumulative baseline hazard func-
tion in Equation (6.30). The proportional odds model that includes spline
terms can then be expressed as
$$\log\left(\frac{S_i(t)}{1 - S_i(t)}\right) = \gamma_0 + \gamma_1 y + \gamma_2\nu_1(y) + \cdots + \gamma_{m+1}\nu_m(y) + \eta_i,$$
where ηi = β ′ xi .
This flexible proportional odds model is used in just the same way as the
flexible proportional hazards model, and is particularly suited to situations
where the hazard function is unimodal. The model can also be expressed in
terms of the odds of an event occurring before time t, log[F (t)/{1 − F (t)}],
and as this is simply − log[S(t)/{1 − S(t)}], the resulting parameter estimates
will only differ in sign.
The model can be fitted using the method of maximum likelihood, as
described in Section 6.9.3, and leads to an estimate of the survivor function.
Estimates of the corresponding hazard and cumulative hazard functions can
then straightforwardly be obtained.
In this example, the flexible proportional odds model for the odds of survival beyond time t that contains three knots is the best fit, although the fit of this
model is barely distinguishable from the model with just one knot.
For the model with three knots, the estimated parameter associated with
the treatment effect is 0.5287. The ratio of the odds of surviving beyond time
t for a patient on tamoxifen, relative to one who is not, is exp(0.5287) = 1.70,
and so the odds of a patient on tamoxifen surviving beyond any given time are
1.7 times that for a patient who has not received that treatment. This result
is entirely consistent with the corresponding hazard ratio for the treatment
effect, given in Example 6.7.
As for other models described in this chapter, the model that incorporates a
cured fraction can be fitted using the method of maximum likelihood. Suppose
that the data consist of n survival times t1 , t2 , . . . , tn , and that δi is the event
indicator for the ith individual so that δi = 1 if the ith individual dies and
zero otherwise. From Equation (5.38), the likelihood function is
$$L(\beta, \phi, \lambda, \gamma) = \prod_{i=1}^{n}\{h_i(t_i)\}^{\delta_i} S_i(t_i),$$
where hi (ti ) and Si (ti ) are found by substituting hni (ti ) and Sni (ti ) from
Equations (6.35) and (6.36) into Equations (6.33) and (6.34).
The corresponding log-likelihood function, log L(β, ϕ, λ, γ) can then be
maximised using computer software for numerical optimisation. This process
leads to estimates β̂, ϕ̂, λ̂, γ̂ of the unknown parameters and their standard
errors. In addition, models with different explanatory variables in either the
model for the probability of cure or the survival models for non-cured indi-
viduals, can be compared using the values of the statistic −2 log L(β̂, ϕ̂, λ̂, γ̂)
in the usual way.
A number of extensions to this model are possible. For example, an ac-
celerated failure time model can be used instead of a proportional hazards
model for the non-cured individuals. A Royston and Parmar model can also
be used to provide a more flexible model for the baseline hazard function in
the non-cured individuals.
Model checking in parametric models
The standardised residuals for a fitted accelerated failure time model are defined by
$$r_{Si} = \left\{\log t_i - \hat{\mu} - \hat{\alpha}_1 x_{1i} - \hat{\alpha}_2 x_{2i} - \cdots - \hat{\alpha}_p x_{pi}\right\}/\hat{\sigma}, \qquad (7.1)$$
where ti is the observed survival time of the ith individual, and µ̂, σ̂, α̂j ,
j = 1, 2, . . . , p, are the estimated parameters in the fitted accelerated fail-
ure time model. This residual has the appearance of a quantity of the form
‘observation − fitted value’, and would be expected to have the same distri-
bution as that of ϵi in the accelerated failure time model, if the model were
correct. For example, if a Weibull distribution is adopted for Ti , the rSi would
be expected to behave as if they were a possibly censored sample from a Gum-
bel distribution, if the fitted model is correct. The estimated survivor function
of the residuals would then be similar to the survivor function of ϵi , that is,
Sϵi (ϵ). Using the general result in Section 4.1.1 of Chapter 4, − log Sϵi (ϵ) has
a unit exponential distribution, and so it follows that − log Sϵi (rSi ) will have
an approximate unit exponential distribution, if the fitted model is appro-
priate. This provides the basis for a diagnostic plot that may be used in the
assessment of model adequacy, described in Section 7.2.4.
The estimated survivor function for the ith individual is then
$$\hat{S}_i(t) = S_{\epsilon_i}\left(\frac{\log t - \hat{\mu} - \hat{\alpha}_1 x_{1i} - \hat{\alpha}_2 x_{2i} - \cdots - \hat{\alpha}_p x_{pi}}{\hat{\sigma}}\right), \qquad (7.2)$$
where Sϵi(ϵ) is the survivor function of ϵi in the accelerated failure time model,
α̂j is the estimated coefficient of xji , j = 1, 2, . . . , p, and µ̂, σ̂ are the estimated
values of µ and σ. The form of Sϵi (ϵ) for some commonly used distributions
for Ti was summarised in Table 6.2 of Chapter 6.
The Cox-Snell residuals for a parametric model are defined by
$$r_{Ci} = \hat{H}_i(t_i) = -\log\hat{S}_i(t_i), \qquad (7.3)$$
where Ĥi (ti ) is the estimated cumulative hazard function, and Ŝi (ti ) is the es-
timated survivor function in Equation (7.2), evaluated at ti . As in the context
of the Cox regression model, these residuals can be taken to have a unit ex-
ponential distribution when the correct model has been fitted, with censored
observations leading to censored residuals; see Section 4.1.1 for details.
The Cox-Snell residuals in Equation (7.3) are very closely related to the
standardised residuals in Equation (7.1), since from Equation (7.2), we see
that rCi = − log Sϵi (rSi ). Assessment of whether the standardised residuals
have a particular distribution is therefore equivalent to assessing whether the
corresponding Cox-Snell residuals have a unit exponential distribution.
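A sketch of these residual calculations in Python, with the three forms of −log Sϵi(rSi) taken from Table 6.2; the function names are our own.

```python
import numpy as np
from scipy.stats import norm

def standardised_residuals(t, X, mu_hat, sigma_hat, alpha_hat):
    # r_Si from Equation (7.1)
    return (np.log(t) - mu_hat - X @ alpha_hat) / sigma_hat

def cox_snell(r_s, model="weibull"):
    # r_Ci = -log S_eps(r_Si) for each distribution of eps
    if model == "weibull":       # Gumbel: S(eps) = exp(-e^eps)
        return np.exp(r_s)
    if model == "loglogistic":   # logistic: S(eps) = 1/(1 + e^eps)
        return np.log1p(np.exp(r_s))
    if model == "lognormal":     # normal: S(eps) = 1 - Phi(eps)
        return -np.log(norm.sf(r_s))
    raise ValueError(model)
```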
The score residuals are obtained from the derivatives of the log-likelihood function in Equation (6.24) with respect to µ, σ and the αj, which can be written in the form
$$\frac{\partial\log L}{\partial\mu} = \sigma^{-1}\sum_{i=1}^{n} g(z_i), \qquad \frac{\partial\log L}{\partial\sigma} = \sigma^{-1}\sum_{i=1}^{n}\{z_i g(z_i) - \delta_i\}, \qquad \frac{\partial\log L}{\partial\alpha_j} = \sigma^{-1}\sum_{i=1}^{n} x_{ji}\, g(z_i),$$
where $g(z_i) = (1-\delta_i)h_{\epsilon_i}(z_i) - \delta_i\, d\log f_{\epsilon_i}(z_i)/dz_i$. Evaluating the ith term in each of these sums at the maximum likelihood estimates of the parameters gives the ith score residuals for µ, σ and αj:
$$\hat{\sigma}^{-1}g(r_{Si}), \qquad \hat{\sigma}^{-1}\{r_{Si}\, g(r_{Si}) - \delta_i\}, \qquad r_{Uji} = \hat{\sigma}^{-1}x_{ji}\, g(r_{Si}).$$
Of these, the score residuals for Xj are the most important, and as in Section
4.1.6 are denoted rU ji . Specific expressions for these residuals are given in
the sequel for some particular parametric models. Because the sums of score
residuals are the derivatives of the log-likelihood function at its maximum,
these residuals must sum to zero.
The standardised residuals are then as given in Equation (7.1), and if an ap-
propriate model has been fitted, these will be expected to behave as a possibly
censored sample from a Gumbel distribution. This is equivalent to assessing
whether the Cox-Snell residuals, defined below, have a unit exponential dis-
tribution.
The Cox-Snell residuals, rCi = − log Sϵi (rSi ), are, from Equation (7.6),
simply the exponentiated standardised residuals, that is rCi = exp(rSi ). These
residuals lead immediately to the martingale and deviance residuals for the
Weibull model, using Equations (7.4) and (7.5).
The score residuals for the Weibull model are found from the general results
in Section 7.1.5. In particular, the ith score residual for the jth explanatory
variable in the model, Xj , is
$$r_{Uji} = \hat{\sigma}^{-1}x_{ji}\left(e^{r_{Si}} - \delta_i\right),$$
where rSi is the ith standardised residual and δi the event indicator. We also
note that the ith score residual for µ is σ̂^{−1}(e^{rSi} − δi), which is σ̂^{−1}(rCi − δi). Since these score residuals sum to zero, it follows that the sum of the
martingale residuals, defined in Equation (7.4), must be zero in the Weibull
model.
that is,
rCi = log {1 + exp (rSi )} ,
where rSi is the ith standardised residual. The score residuals are found from
the general results in Section 7.1.5, and we find that the ith score residual for
the jth explanatory variable in the model is
$$r_{Uji} = \hat{\sigma}^{-1}x_{ji}\left\{\frac{\exp(r_{Si}) - \delta_i}{1 + \exp(r_{Si})}\right\}.$$
For the lognormal model, the Cox-Snell residuals are rCi = −log{1 − Φ(rSi)}, where, as usual, rSi is the ith standardised residual in Equation (7.1). Again
the martingale and deviance residuals are obtained from these, and the score
residuals are obtained from the results in Section 7.1.5. Specifically, the ith
score residual for Xj , is
$$r_{Uji} = \hat{\sigma}^{-1}x_{ji}\left\{\frac{(1 - \delta_i)f_{\epsilon_i}(r_{Si})}{1 - \Phi(r_{Si})} + \delta_i r_{Si}\right\},$$
where fϵi (rSi ) is the standard normal density function at rSi , and Φ(rSi ) is
the corresponding distribution function.
for i = 1, 2, . . . , 26, and these are given in Table 7.1. Also given are the values
of the Cox-Snell residuals, which for the Weibull model, are such that rCi =
exp(rSi ).
Figure 7.1 Cumulative hazard plot of the Cox-Snell residuals.
Figure 7.2 Plot of the martingale residuals against rank order of survival time.
Figure 7.3 Plot of the deviance residuals against rank order of survival time.
Figure 7.4 Score residuals plotted against rank order of survival time for Age and
Treat.
$$\bar{S}_j(t) = \frac{1}{n_j}\sum_{i=1}^{n_j}\hat{S}_{ij}(t),$$
where nj is the number of observations in the jth group. The value of S̄j (t)
would be obtained for a range of t values, so that a plot of the values of
S̄j (t) against t, for each value of j, yields a smooth curve. The corresponding
observed survivor function for a particular group is the Kaplan-Meier estimate
of the survivor function for the individuals in that group. Superimposing these
two sets of estimates gives a visual representation of the agreement between
the observed and fitted survivor functions. This procedure is analogous to that
described in Section 3.11.1 for the Cox regression model.
Using this approach, it is often easier to detect departures from the fitted
model, than from plots based on residuals. However, the procedure can be
criticised for using the same fitted model to define the groups, and to obtain
the estimated survivor function for each group. If the database is sufficiently
large, the survivor function could be estimated from half of the data, and the
fit of the model evaluated on the remaining half. Also, since the method is
based on the values of the risk score, no account is taken of differences between
individuals who have different sets of values of the explanatory variables, but
just happen to have the same value of the risk score.
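A minimal sketch of the averaging step, assuming that the fitted survivor functions Ŝij(t) have already been evaluated on a common grid of times and stored as the rows of a matrix; the names here are our own. The Kaplan-Meier estimate for each group, from any survival analysis package, would then be overlaid on the same axes.

```python
import numpy as np

def average_fitted_survivor(surv_matrix, groups):
    """Average the fitted survivor functions within each risk group.

    surv_matrix: (n_individuals, n_times) array of fitted S_i(t) values
    groups: length-n array of risk-group labels
    Returns a dict mapping each group label to its averaged curve.
    """
    surv_matrix = np.asarray(surv_matrix)
    groups = np.asarray(groups)
    return {g: surv_matrix[groups == g].mean(axis=0) for g in np.unique(groups)}
```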
Example 7.2 Chemotherapy in ovarian cancer patients
In this example, we examine the fit of a Weibull proportional hazards model
to the data on the survival times of 26 women, following treatment for ovarian
cancer. A Weibull model that contains the variables Age and Treat is fitted,
as in Example 5.10, so that the fitted survivor function for the ith individual
is
$$\hat{S}_i(t) = \exp\left\{-e^{\hat{\eta}_i}\hat{\lambda}t^{\hat{\gamma}}\right\}, \qquad (7.7)$$
where η̂i = 0.144 Age i − 1.023 Treat i is the risk score, i = 1, 2, . . . , 26. This
is equivalent to the accelerated failure time representation of the model, used
in Example 7.1. The values of η̂i are then arranged in ascending order and
divided into three groups, as shown in Table 7.2.
Table 7.2 Values of the risk score, with the patient number in parentheses, for
three groups of ovarian cancer patients.
Group Risk score
1 (low risk) 4.29 (14) 4.45 (2) 4.59 (20) 5.15 (22) 5.17 (5)
5.17 (19) 5.31 (17) 5.59 (4) 5.59 (12)
2 (medium risk) 5.87 (16) 6.02 (13) 6.16 (8) 6.18 (18) 6.31 (26)
6.45 (6) 6.45 (9) 6.45 (11) 7.03 (25)
3 (high risk) 7.04 (15) 7.04 (24) 7.17 (7) 8.19 (23)
8.48 (1) 9.34 (3) 9.63 (10) 9.63 (21)
The next step is to obtain the average survivor function for each group by
averaging the values of the estimated survivor function, in Equation (7.7), for
the patients in the three groups. This is done for t = 0, 1, . . . , 1230, and the
three average survivor functions are shown in Figure 7.5. The Kaplan-Meier
estimate of the survivor function for the individuals in each of the three groups
shown in Table 7.2 is then calculated, and this is also shown in Figure 7.5.
From this plot, we see that the model is a good fit to the patients in the
high-risk group. For those in the middle group, the agreement between the
observed and fitted survivor functions is not that good, as the fitted model
leads to estimates of the survivor function that are a little too high. In fact, the
patients in this group have the largest values of the martingale residuals, which
also indicates that the death times of these individuals are not adequately
summarised by the fitted model. There is only one death among the individuals
in the low-risk group, and so little can be said about the fit of the model to
this set of patients.
Figure 7.5 Plot of the observed and fitted survivor functions for patients of low (·······),
medium (- - -) and high (—) risk. The observed survivor function is the step-function.
n×(p+2) matrix whose ith row is the transpose of the vector of score residuals,
u′i . An alternative measure of the influence of the ith observation on the set
of parameter estimates is the statistic
The statistics Fi and Ci will typically have values that are quite different
from each other. However, in each case a relatively large value of the statistic
will indicate that the corresponding observation is influential. Exactly how
such observations influence the estimates would need to be investigated by
omitting that observation from the data set and refitting the model.
Figure 7.6 Plot of the delta-betas for Age against rank order of survival time.
Figure 7.7 Plot of the delta-betas for Treat against rank order of survival time.
The values of the statistics Fi and Ci are plotted against the rank order of the survival times in Figures 7.8 and 7.9.
Figure 7.8 clearly shows that the observation corresponding to patient 5 is
influential, and that the influence of patients 1, 4, 14 and 26 should be in-
vestigated in greater detail. Figure 7.9 strongly suggests that the data from
patients 5 and 26 are influential.
Figure 7.8 Plot of the F -statistic against rank order of survival time.
Figure 7.9 Plot of the C-statistic against rank order of survival time.
The linear component of the fitted hazard function in the model fitted to
all 26 patients is
0.144 Age i − 1.023 Treat i ,
while that on omitting each of observations 1, 4, 5, 14 and 26 in turn is as
follows:
Omitting patient number 1: 0.142 Age i − 1.016 Treat i
These results show that the effect of omitting the data from patient 1 on
the parameter estimates is small. When the data from patient 4 are omitted,
the estimated coefficient of Age is most affected, whereas when the data from
patient 14 are omitted, the coefficient of Treat is changed the most. On leaving
out the data from patients 5 and 26, both estimates are considerably affected.
The hazard ratio for a patient on the combined treatment (Treat = 2),
relative to one on the single treatment (Treat = 1), is estimated by e^{−1.023} =
0.36, when the model is fitted to all 26 patients. When the observations from
patients 1, 4, 5, 14 and 26 are omitted in turn, the estimated age-adjusted
hazard ratios are 0.36, 0.30, 0.49, 0.27 and 0.50, respectively. The data from
patients 5 and 26 clearly have the greatest effect on the estimated hazard ratio;
in each case the estimate is increased, and the magnitude of the treatment
effect is diminished. Omission of the data from patients 4 or 14 decreases the
estimated hazard ratio, thereby increasing the estimated treatment difference.
Time-dependent variables
side effect at any given time, is a further example of an internal variable. In
each case, such variables reflect the condition of the patient and their values
may well be associated with the survival time of the patient.
On the other hand, external variables are time-dependent variables that
do not necessarily require the survival of a patient for their existence. One
type of external variable is a variable that changes in such a way that its
value will be known in advance at any future time. The most obvious example
is the age of a patient, in that once the age at the time origin is known,
that patient’s age at any future time will be known exactly. However, there
are other examples, such as the dose of a drug that is to be varied in a
predetermined manner during the course of a study, or planned changes to
the type of immunosuppressant to be used following organ transplantation.
Another type of external variable is one that exists totally independently of
any particular individual, such as the level of atmospheric sulphur dioxide, or
air temperature. Changes in the values of such quantities may well have an
effect on the lifetime of individuals, as in studies concerning the management
of patients with certain types of respiratory disease.
Time-dependent variables also arise in situations where the coefficient of
a time-constant explanatory variable is a function of time. In Section 3.9 of
Chapter 3, it was explained that the coefficient of an explanatory variable
in the Cox proportional hazards model is a log-hazard ratio, and so under
this model, the hazard ratio is constant over time. If this ratio were in fact a
function of time, then the coefficient of the explanatory variable that varies
with time is referred to as a time-varying coefficient. In this case, the log-
hazard ratio is not constant and so we no longer have a proportional hazards
model. More formally, suppose that the coefficient of an explanatory variable,
X, is a linear function of time, t, so that we may write the term as βtX. This
means that the corresponding log-hazard ratio is a linear function of time.
This was precisely the sort of term introduced into the model in order to test
the assumption of proportional hazards in Section 4.4.3 of Chapter 4. This
term can also be written as βX(t), where X(t) = Xt is a time-dependent
variable. In general, suppose that a model includes the explanatory variable,
X, with a time-varying coefficient of the form β(t). The corresponding term in
the model would be β(t)X, which can be expressed as βX(t). In other words,
a term that involves a time-varying coefficient can be expressed as a time-
dependent variable with a constant coefficient. However, if β(t) is a non-linear
function of one or more unknown parameters, for example β0 exp(β1 t), the
term is not so easily fitted in a model.
All these different types of time-dependent variables can be introduced into
the Cox regression model, in the manner described in the following section.
In this model, the baseline hazard function, h0 (t), is interpreted as the hazard
function for an individual for whom all the variables are zero at the time
origin, and remain at this same value through time.
Since the values of the variables xji (t) in the model given in Equation (8.1)
depend on the time t, the relative hazard hi (t)/h0 (t) is also time-dependent.
This means that the hazard of death at time t is no longer proportional to the
baseline hazard, and the model is no longer a proportional hazards model.
To provide an interpretation of the β-parameters in this model, consider
the ratio of the hazard functions at time t for two individuals, the rth and
sth, say. This is given by
$$\frac{h_r(t)}{h_s(t)} = \exp\left[\beta_1\{x_{r1}(t) - x_{s1}(t)\} + \cdots + \beta_p\{x_{rp}(t) - x_{sp}(t)\}\right].$$
The β-parameters in the model can be estimated by maximising a partial likelihood function of the form
$$\prod_{i=1}^{n}\left[\frac{\exp\{\beta' x_i(t_i)\}}{\sum_{l\in R(t_i)}\exp\{\beta' x_l(t_i)\}}\right]^{\delta_i}, \qquad (8.2)$$
in which R(ti) is the risk set at time ti, the death time of the ith individual in
the study, i = 1, 2, . . . , n, and δi is an event indicator that is zero if the survival
time of the ith individual is censored and unity otherwise. This expression can
then be maximised to give estimates of the β-parameters.
In order to use Equation (8.1) in this maximisation process, the values of
each of the variables in the model must be known at each death time for all
individuals in the risk set at time ti . This is no problem for external variables
whose values are preordained, but it may be a problem for external variables
that exist independently of the individuals in a study, and certainly for internal
variables.
To illustrate the problem, consider a trial of two maintenance therapies for
patients who have suffered a myocardial infarct. The serum cholesterol level
of such patients may well be measured at the time when a patient is admitted
to the study, and at regular intervals of time thereafter. This variable is then
a time-dependent variable, and will be denoted X(t). It is then plausible that
the hazard of death for any particular patient, the ith, say, at time t, hi (t), is
more likely to be influenced by the value of the explanatory variable X(t) at
time t, than its value at the time origin, where t = 0.
Now suppose that the ith individual dies at time ti and that there are
two other individuals, labelled r and s, in the risk set at time ti . We further
suppose that individual r dies at time tr , where tr > ti , and that the survival
time of individual s, ts , is censored at some time after tr . The situation is
illustrated graphically in Figure 8.1. In this figure, the vertical dotted lines
refer to points in patient time when the value of X(t) is measured.
Figure 8.1 The survival times of individuals i, r and s, where D denotes a death and C a censored survival time.
If individuals r and s are the only two in the risk set at time ti , and X
is the only explanatory variable that is measured, the contribution of the ith
individual to the log-likelihood function in Expression (8.2) will be
$$\beta x_i(t_i) - \log\sum_{l}\exp\{\beta x_l(t_i)\},$$
where xi (ti ) is the value of X(t) for the ith individual at their death time,
ti , and l in the summation takes the values i, r, and s. This expression is
therefore equal to
$$\beta x_i(t_i) - \log\left\{e^{\beta x_i(t_i)} + e^{\beta x_r(t_i)} + e^{\beta x_s(t_i)}\right\}.$$
This shows that the value of the time-dependent variable X(t) is needed at
the death time of the ith individual, and at time ti for individuals r and s. In
addition, the value of the variable X(t) will be needed for individuals r and s
at tr , the death time of individual r.
For terms in a model that are explicit functions of time, such as interac-
tions between time and a variable or factor measured at baseline, there is no
difficulty in obtaining the values of the time-dependent variables at any time
for any individual. Indeed, it is usually straightforward to incorporate such
variables in the Cox model when using statistical software that has facilities
for dealing with time-dependent variables. For other variables, such as serum
cholesterol level, the values of the time-dependent variable at times other than
that at which it was measured has to be approximated. There are then several
possibilities.
One option is to use the last recorded value of the variable before the
time at which the value of the variable is needed. When the variable has
been recorded for an individual before and after the time when the value is
required, the value closest to that time might be used. Another possibility is to
use linear interpolation between consecutive values of the variable. Figure 8.2
illustrates these approximations.
In this figure, the continuous curve depicts the actual value of a time-
dependent variable at any time, and the dotted vertical lines signify times
when the variable is actually measured. If the value of the variable is required
at time t in this figure, we could use either the value at P, the last recorded
value of the variable, the value at R, the value closest to t, or the value at Q,
the linearly interpolated value between P and R.
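These three approximations are simple to implement; a sketch in Python, with hypothetical function and argument names. Linear interpolation would of course be skipped for categorical variables, as noted below.

```python
import numpy as np

def td_value(t, meas_times, meas_values, method="locf"):
    # Approximate the value of a time-dependent variable at time t from
    # measurements taken at meas_times (sorted ascending)
    meas_times = np.asarray(meas_times, dtype=float)
    meas_values = np.asarray(meas_values, dtype=float)
    if method == "locf":       # last recorded value before t (point P)
        idx = np.searchsorted(meas_times, t, side="right") - 1
        return meas_values[max(idx, 0)]
    if method == "nearest":    # measurement closest in time to t (point R)
        return meas_values[np.argmin(np.abs(meas_times - t))]
    if method == "interp":     # linear interpolation (point Q)
        return np.interp(t, meas_times, meas_values)
    raise ValueError(method)
```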
Linear interpolation is clearly not an option when a time-dependent vari-
able is a categorical variable. In addition, some categorical variables may be
such that individuals can only progress through the levels of the variable in a
particular direction. For example, the performance status of an individual may
only be expected to deteriorate, so that the value of this categorical variable
might only change from ‘good’ to ‘fair’ and from ‘fair’ to ‘poor’. As another
example, following a biopsy, a variable associated with the occurrence of a
tumour will take one of two values, corresponding to absence and presence. It
might then be very unlikely for the status to change from ‘present’ to ‘absent’
in consecutive biopsies.
Figure 8.2 Approximate values of a time-dependent variable at time t: P, the last recorded value; Q, the linearly interpolated value; R, the value closest to t.
After a Cox regression model that includes time-dependent variables has been
fitted, the baseline hazard function, h0 (t) and the corresponding baseline sur-
vivor function, S0 (t), can be estimated. This involves an adaptation of the
results given in Section 3.10 of Chapter 3 to cope with the additional com-
plication of time-dependent variables, in which the values of the explanatory
variables need to be updated with their time-specific values. In particular, the
Nelson-Aalen estimate of the baseline cumulative hazard function becomes
Figure 8.3 Biopsy results for two individuals, A and B, plotted against the time of biopsy.
$$\tilde{H}_0(t) = -\log\tilde{S}_0(t) = \sum_{j=1}^{k}\frac{d_j}{\sum_{l\in R(t_{(j)})}\exp\{\hat{\beta}' x_l(t_{(j)})\}}, \qquad (8.3)$$
where H̃0 (t) is the estimated baseline cumulative hazard function obtained
on fitting the Cox regression model with p possibly time-dependent variables
with values xji (t), j = 1, 2, . . . , p, for the ith individual, i = 1, 2, . . . , n, and
β̂j is the estimated coefficient of the jth time-dependent variable. This result
was given by Altman and De Stavola (1994).
Corresponding estimates of the conditional probability of an event in the
interval (t, t + h) are 1 − P̃i (t, t + h), and these quantities can be used to
obtain an estimate of the expected number of events in each of a number
of successive intervals of width h. Comparing these values with the observed
number of events in these intervals leads to an informal assessment of model
adequacy.
where xi (ti ) is the vector of values of explanatory variables for the ith in-
dividual, which may be time-dependent, evaluated at ti , the event time of
that individual. Also, β̂ is the vector of coefficients, δi is the event indicator
that takes the value unity if ti is an event and zero otherwise, and H̃0 (ti ) is
the estimated baseline cumulative hazard function at ti , obtained from Equa-
tion (8.3). The deviance residuals may also be computed from the martingale
residuals, using Equation (4.7) of Chapter 4.
The plots described in Section 4.2.2 of Chapter 4 will often be helpful.
In particular, an index plot of the martingale residuals will enable outlying
observations to be identified. However, diagnostic plots for assessing the func-
tional form of covariates, described in Section 4.2.3, turn out to be not so
useful when a time-dependent variable is being studied. This is because there
will then be a number of values of the time-dependent covariate for any one
individual, and it is not clear what the martingale residuals for the null model
should be plotted against.
For detecting influential values, the delta-betas, introduced in Section 4.3.1
of Chapter 4, provide a helpful means of investigating the effect of each obser-
vation on individual parameter estimates. Changes in the value of the −2 log L̂
statistic, on omitting each observation in turn, can give valuable information
about the effect of each observation on the set of parameter estimates.
Figure 8.4 The hazard ratio exp{ηi + β1 + β2 e−β3 (t−t1 ) }, t > t1 , for individual i who
receives a transplant at t1 .
so that P late(t) = 0 for all t when a patient dies before platelet recovery.
We first fit a Cox proportional hazards model that contains the variables
associated with the age of the patient and donor, Page and Dage. When either
of these variables is added on their own or in the presence of the other, there
is no significant reduction in the value of the −2 log L̂ statistic.
The time-dependent variable P late(t) is now added to the null model. The
value of −2 log L̂ is reduced from 67.13 to 62.21, a reduction of 4.92 on 1 d.f.,
which is significant at the 5% level (P = 0.026). This suggests that time to
platelet recovery does affect survival. After allowing for the effects of this
variable, there is still no evidence that the hazard of death is dependent on
the age of the patient or donor.
The estimated coefficient of P late(t) in the model that contains this vari-
able alone is −2.696, and the fact that this is negative indicates that there is a
greater hazard of death at any given time for a patient whose platelets are not
at a normal level. The hazard ratio at any given time is exp(−2.696) = 0.067,
and so a patient whose platelets have recovered to normal at a given time
has about one-fifteenth the risk of death at that time. However, a 95%
confidence interval for the corresponding true hazard ratio is (0.006, 0.751),
which shows that the point estimate of the relative risk is really quite
imprecise.
To quantify the effect of disease group on survival, the change in −2 log L̂
when the factor Group is added to the model that contains the time-
dependent variable P late(t) is 6.49 on 2 d.f., which is significant at the 5%
level (P = 0.039). The parameter estimates associated with disease group
show that the hazard of death is much greater for those suffering from ALL
and those in the high-risk group of AML sufferers. The hazard ratios for an
ALL patient relative to a low-risk AML patient is 7.97 and that for a high-risk
AML patient relative to a low-risk one is 11.77.
For the model that contains the factor Group and the time-dependent vari-
able P late(t), the estimated baseline cumulative hazard and survivor functions
are given in Table 8.2. These have been obtained using the estimate of the
baseline cumulative hazard function given in Equation (8.3).
In this table, H̃0 (t) and S̃0 (t) are the estimated cumulative hazard and
survivor functions for an individual with AML and for whom the platelet re-
covery indicator, Plate(t), remains at zero throughout the study. Also given in
this table are the values of the estimated survivor function for an individual
with ALL, but for whom Plate(t) = 1 for all values of t, denoted S̃1 (t). Since
the value of Plate(t) is zero for each patient at the start of the study, and
for most patients this changes to unity at some later point in time, these two
estimated survivor functions illustrate the effect of platelet recovery at any
specific time. For example, the probability of an ALL patient surviving be-
yond 97 days is only 0.52 if their platelets have not recovered to a normal level
by this time. On the other hand, if such a patient has experienced platelet
recovery by this time, they would have an estimated survival probability of
0.94. The estimated survivor function for an ALL patient whose platelet re-
covery status changes at some time t0 from 0 to 1 can also be obtained from
Table 8.2, since this will be S̃0(t) for t ⩽ t0 and S̃1(t) for t > t0. Estimates of
the survivor function may also be obtained for individuals in the other disease
groups.
In this illustration, the data from two patients who died before their
platelet count had reached a normal level have a substantial impact on in-
ferences about the effect of platelet recovery. If patients 7 and 21 are omitted
from the database, the time-dependent variable is no longer significant when
added to the null model (P = 0.755). The conclusion about the effect of
platelet recovery time on survival is therefore dramatically influenced by the
data for these two patients.
ĥi (t) = exp{0.216 Age i − 0.664 Treat i − 0.0002 Age i t}h0 (t).
Under this model, the hazard of death at t for a patient of a given age on the
combined treatment (Treat = 2), relative to one of the same age on the single
treatment (Treat = 1), is exp(−0.664) = 0.52, which is not very different from
the value of 0.45 found using the model that does not contain the variable
Tage. However, the log-hazard ratio for a patient aged a2 years, relative to
one aged a1 years, is
0.216(a2 − a1 ) − 0.0002(a2 − a1 )t
at time t. This model therefore allows the log-hazard ratio for Age to be
linearly dependent on survival time.
The value of −2 log L̂ for the model that contains Age, Treat and Tage is
53.613. The change in −2 log L̂ on adding the variable Tage to a model that
contains Age and Treat is therefore 0.53, which is not significant (P = 0.465).
We therefore conclude that the time-dependent variable Tage is not in fact
needed in the model.
Example 8.3 Data from a cirrhosis study
Although the data to be used in this example are artificial, it is useful to
provide a background against which these data can be considered. Suppose
therefore that 12 patients have been recruited to a study on the treatment of
cirrhosis of the liver. The patients are randomised to receive either a placebo or
a new treatment that will be referred to as Liverol. Six patients are allocated
to Liverol and six to the placebo. At the time when the patient is entered
into the study, the age and baseline value of the patient’s bilirubin level are
recorded. The natural logarithm of the bilirubin value (in µmol/l) will be used
in this analysis, and the variables measured are summarised below:
Patients are supposed to return to the clinic three, six and twelve months
after the commencement of treatment, and yearly thereafter. On these occa-
sions, the bilirubin level is again measured and recorded. Data are therefore
available on how the bilirubin level changes in each patient throughout the du-
ration of the study. Table 8.4 gives the values of the logarithm of the bilirubin
value at each time in the follow-up period for each patient.
In taking log(bilirubin) to be a time-dependent variable, the value of the
variate is that recorded at the most recent follow-up visit, for each patient.
In this calculation, the change to a new value will be assumed to take place
immediately after the reading was taken, so that patient 1, for example, is
assumed to have a log(bilirubin) value of 3.2 for any time t when t ⩽ 47,
Both Age and Lbr appear to be needed in the model, although the evidence
for including Age as well as Lbr is not very strong. When Treat is added to
the model that contains Age and Lbr, the reduction in the value of −2 log L̂ is
5.182 on 1 d.f. This is significant at the 5% level (P = 0.023). The coefficient
of Treat is −3.052, indicating that the drug Liverol is effective in reducing the
hazard of death. Indeed, other things being equal, Liverol reduces the hazard
of death by a factor of 0.047.
We now analyse these data, taking the log(bilirubin) values to be time-
dependent. Let Lbrt be the time-dependent variate formed from the values of
log(bilirubin). The values of −2 log L̂ on fitting Cox regression models to the
data are then given in Table 8.6.
It is clear from this table that the hazard function depends on the time-
dependent variable Lbrt, and that after allowing for this, the effect of Age is
slight. We therefore add the treatment effect Treat to the model that contains
Lbrt alone. The effect of this is that −2 log L̂ is reduced from 12.050 to 10.676,
a reduction of 1.374 on 1 d.f. This reduction is not significant (P = 0.241)
leading to the conclusion that after taking account of the dependence of the
hazard of death on the evolution of the log(bilirubin) values, no treatment
effect is discernible.
The estimated hazard function for the ith individual is given by
where Lbr i (t) is the value of log(bilirubin) for this patient at time t. The
estimated ratio of the hazard of death at time t for two individuals on the same
treatment who have values of Lbr that differ by 0.1 units at t is e^{0.3605} = 1.43.
This means that the individual whose log(bilirubin) value is 0.1 units greater
has close to a 50% increase in the hazard of death at time t.
One possible explanation for the difference between the results of these
two analyses is that the effect of the treatment is to change the values of
the bilirubin level, so that after changes in these values over time have been
allowed for, no treatment effect is visible.
The baseline cumulative hazard function may now be estimated for the
model that contains the time-dependent variable Lbrt and Treat. The esti-
mated values of this function are tabulated in Table 8.7.
Figure 8.5 Estimated survivor function for a patient with Lbr = 3, for all t, who is
on placebo (—) or Liverol (·······).
P̃i (t, t+360) is obtained from Equation (8.5). The full set of results will not be
given here, but as an example, the estimated approximate conditional prob-
abilities of surviving through consecutive intervals of 360 days, for patients 1
and 7, are shown in Table 8.8.
Table 8.9 Data for the first patient in the cirrhosis study in the
counting process format.
Time interval Start Stop Status Treat Age Lbrt
(0, 47] 0 47 0 0 46 3.2
(47, 184] 47 184 0 0 46 3.8
(184, 251] 184 251 0 0 46 4.9
(251, 281] 251 281 1 0 46 5.0
The database now has four lines of data for Patient 1, and we proceed in
a similar manner for the data from the remaining patients. A Cox regression
model with the variables Age, Treat and the time-dependent variable Lbrt is
then fitted to the extended data set, which leads to the same results as those
given in Example 8.3.
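As an illustration of how such an extended data set can be analysed, the following sketch uses the Python lifelines library; the data frame layout mirrors Table 8.9, and the column names are chosen for illustration rather than taken from the book.

# A minimal sketch, not from the book: a Cox regression model with the
# time-dependent variable Lbrt, fitted to counting process data laid out
# as in Table 8.9 (one row per (start, stop] interval).
import pandas as pd
from lifelines import CoxTimeVaryingFitter

# The four records for patient 1 in the cirrhosis study
records = pd.DataFrame({
    "id":     [1, 1, 1, 1],
    "start":  [0, 47, 184, 251],
    "stop":   [47, 184, 251, 281],
    "status": [0, 0, 0, 1],          # 1 only in the interval in which death occurs
    "treat":  [0, 0, 0, 0],
    "age":    [46, 46, 46, 46],
    "lbrt":   [3.2, 3.8, 4.9, 5.0],  # most recently recorded log(bilirubin)
})

# With the records for all 12 patients assembled in this way:
# ctv = CoxTimeVaryingFitter()
# ctv.fit(all_records, id_col="id", event_col="status",
#         start_col="start", stop_col="stop")
# ctv.print_summary()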
Interval-censored survival data

In many studies where the response variable is a survival time, the exact time
of the event of interest will not be known. Instead, the event will be known
to have occurred during a particular interval of time. Data in this form are
known as grouped or interval-censored survival data.
Interval-censored data commonly arise in studies where there is a non-
lethal end-point, such as the recurrence of a disease or condition. However,
most survival analyses are based on interval-censored data, in the sense that
the survival times are often recorded only to the nearest day, week or month. In this
chapter, some methods for analysing interval-censored data will be described
and illustrated. Models in which specific assumptions are made about the form
of the underlying hazard function are considered in Sections 9.1 to 9.4, and
fully parametric models are discussed in Section 9.5.
However, the data set used in this analysis will be based on a mixture of
recurrences detected at scheduled screening times, known as screen-detected
recurrences, and recurrences diagnosed following the occurrence of symptoms,
known as interval-detected recurrences. This leads to a difficulty in interpreting the
results of the analysis.
To illustrate the problem, consider a study to compare two treatments for
suppressing the recurrence of an ulcer, a new and a standard treatment, say.
Also suppose that both treatments have exactly the same effect on the recur-
rence time, but that the new treatment suppresses symptoms. The recurrence
of an ulcer in a patient on the new treatment will then tend to be detected
later than that in a patient on the standard treatment. Therefore, interval-
detected recurrences will be identified sooner in a patient on the standard
treatment. The interval-detected recurrence times will then be shorter for this
group of patients, indicating an apparent advantage of the new treatment over
the standard.
If the time interval between successive screenings is short, relative to the
average time to recurrence, there will be few interval-detected recurrences.
Standard methods for survival analysis may then be used.
Table 9.1 Data on the recurrence of an ulcer following treatment for the
primary disease.
Patient Age Duration Treatment Time of last visit Result
1 48 2 B 7 2
2 73 1 B 12 1
3 54 1 B 12 1
4 58 2 B 12 1
5 56 1 A 12 1
6 49 2 A 12 1
7 71 1 B 12 1
8 41 1 A 12 1
9 23 1 B 12 1
10 37 1 B 5 2
11 38 1 B 12 1
12 76 2 B 12 1
13 38 2 A 12 1
14 27 1 A 6 2
15 47 1 B 6 2
16 54 1 A 6 1
17 38 1 B 10 2
18 27 2 B 7 2
19 58 2 A 12 1
20 75 1 B 12 1
21 25 1 A 12 1
22 58 1 A 12 1
23 63 1 B 12 1
24 41 1 A 12 1
25 47 1 B 12 1
26 58 1 A 3 2
27 74 2 A 2 2
28 75 2 A 6 1
29 72 1 A 12 1
30 59 1 B 12 2
31 52 1 B 12 1
32 75 1 B 12 2
33 76 1 A 12 1
34 34 2 A 6 1
35 36 1 B 12 1
36 59 1 B 12 1
37 44 1 A 12 2
38 28 2 B 12 1
39 62 1 B 12 1
40 23 1 A 12 1
41 49 1 B 12 1
42 61 1 A 12 1
43 33 2 B 12 1
been omitted from the data set on the grounds that there is no information
about whether an ulcer has recurred in the first six months of the study. This
means that those patients in Table 9.1 whose last visit was greater than 6
months after randomisation would have had a negative endoscopy at 6 months.
In modelling the data from this study, duration of disease is denoted by
an indicator variable Dur, which is zero when the duration is less than 5 years
and unity otherwise. The treatment effect is denoted by a variable Treat,
which takes the value zero if an individual is on treatment A and unity if on
treatment B. The patient’s age is reflected in the continuous variate Age.
We first analyse the recurrence times in Table 9.1 ignoring the interval
censoring. The recurrence times of those patients who have not had a detected
recurrence by the time of their last visit are taken to be censored, and a Cox
regression model is fitted.
From the values of the −2 log L̂ statistic for different models, given in
Table 9.2, it is clear that neither age nor duration of disease is an important
prognostic factor. Moreover, the reduction in −2 log L̂ on adding the treat-
ment effect to the model, adjusted or unadjusted for the prognostic factors, is
nowhere near significant.
The estimated coefficient of Treat in the model that contains Treat alone
is 0.189, and the standard error of this estimate is 0.627. The estimated haz-
ard of a recurrence under treatment B (Treat = 1), relative to that under
treatment A (Treat = 0), is therefore exp(0.189) = 1.21. The standard error
of the estimated hazard ratio is found using Equation (3.13) in Chapter 3,
and is 0.758. The fact that the estimated hazard ratio is greater than unity
gives a slight indication that treatment A is superior to treatment B, but not
significantly so.
and so

log[−log{1 − p_i(t_s)}] = η_i + log{−log S_0(t_s)}.

Writing β_0 = log{−log S_0(t_s)}, the model can be expressed as

log[−log{1 − p_i(t_s)}] = β_0 + η_i,

which is a linear model for the complementary log-log transformation of p_i(t_s).
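As a concrete sketch of how this model can be fitted, a generalised linear model for a binary response with a complementary log-log link can be used; the code below is an assumed illustration in Python's statsmodels, not code from the book, and the data frame df with its 0–1 recurrence indicator y is hypothetical.

# A minimal sketch, not from the book: the linear model for the complementary
# log-log transformation of the recurrence probability by t_s = 12 months.
import statsmodels.api as sm

def fit_cloglog(df):
    # df: one row per patient, with y = 1 if a recurrence is detected by
    # 12 months, and the variates Age, Dur and Treat of Example 9.1
    X = sm.add_constant(df[["Age", "Dur", "Treat"]])
    family = sm.families.Binomial(link=sm.families.links.CLogLog())
    return sm.GLM(df["y"], X, family=family).fit()

# The fitted constant estimates beta_0 = log{-log S_0(12)}, and the
# coefficient of Treat is the log-hazard ratio for treatment B relative to A.
# (In older versions of statsmodels the link class is named cloglog.)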
In this example, the effects of age, duration of disease and treatment group
have been modelled using the variates Age, Dur, and Treat, defined in Example
9.1. However, factors corresponding to duration and treatment could have been
used in conjunction with packages that allow factors to be included directly.
This would not make any difference to the deviances in Table 9.3, but it may
have an effect on the interpretation of the parameter estimates. See Sections
3.2 and 3.9 for fuller details.
It is clear from Table 9.3 that no variable reduces the deviance by a signif-
icant amount. For example, the change in the deviance on adding Treat to the
model that only contains a constant is 0.371, which is certainly not significant
when compared to percentage points of the chi-squared distribution on 1 d.f.
Approximately the same change in deviance is found when Treat is added to
the model that contains Age and Dur, showing that the treatment effect is of
a similar magnitude after allowing for these two variables. Moreover, there is
no evidence whatsoever of an interaction between treatment and the variables
Age and Dur.
On fitting a model that contains Treat alone, the estimated coefficient of
Treat is 0.378, with a standard error of 0.629. Thus, the ratio of the hazard
of a recurrence before 12 months in a patient on treatment B (Treat = 1),
relative to that for a patient on treatment A (Treat = 0), is exp(0.378) =
1.46. The risk of a recurrence in the year following randomisation is thus
greater under treatment B than it is under treatment A, but not significantly
so. This hazard ratio is not too different from the value of 1.21 obtained in
Example 9.1. The standard error of the estimated hazard ratio, again found
using Equation (3.13) in Chapter 3, is 0.918, which is also very similar to that
found in Example 9.1.
A 95% confidence interval for the log-hazard ratio has limits of 0.378 ±
1.96 × 0.629, and so the corresponding interval estimate for the hazard ratio
itself is (0.43, 5.01). Notice that this interval includes unity, a result which was
foreshadowed by the non-significant treatment effect.
The estimated constant term in this fitted model is −1.442. This is an
estimate of log{−log S_0(12)}, where S_0(12) is the survivor function at 12 months for a patient
on treatment A. The estimated probability of no recurrence in the first 12 months,
that is, of a recurrence after 12 months, for a patient on treatment A is therefore
exp(−e^{−1.442}) = 0.79. The corresponding
value for a patient on treatment B is 0.79^{exp(0.378)} = 0.71. The probabilities of
a recurrence in the first 12 months are therefore 0.21 for a patient on treatment
A, and 0.29 for a patient on treatment B. This again shows that patients on
treatment B have a slightly higher probability of the recurrence of an ulcer in
the year following randomisation.
and
π_{ij} = P(t_{j−1} ⩽ T_i < t_j | T_i ⩾ t_{j−1}),
for j = 1, 2, . . . , k.
We now consider individuals who have not had a detected recurrence by
the last examination time, tk . For these individuals, we define Ti to be the
random variable associated with the time to either a recurrence or death, and
the corresponding probability of a recurrence or death in the interval from
time t_k is given by

p_{i,k+1} = P(T_i > t_k) = 1 − \sum_{j=1}^{k} p_{ij}.
The sample likelihood of the n(k + 1) values r_{ij} is

\prod_{i=1}^{n} \prod_{j=1}^{k+1} p_{ij}^{r_{ij}},
and on substituting for p_{ij} from Equation (9.3), the likelihood function becomes

\prod_{i=1}^{n} \prod_{j=1}^{k+1} \{(1 − π_{i1})(1 − π_{i2}) \cdots (1 − π_{i,j−1})\, π_{ij}\}^{r_{ij}},
which reduces to

\prod_{i=1}^{n} π_{i,k+1}^{r_{i,k+1}} \prod_{j=1}^{k} π_{ij}^{r_{ij}} (1 − π_{ij})^{s_{ij}}.   (9.4)
This is the likelihood function for nk observations rij from a binomial dis-
tribution with response probability πij , and where the binomial denominator
is rij + sij . This denominator is equal to unity when a patient is at risk of
having a detected recurrence after time tj , and zero otherwise. In fact, the
denominator is zero when both rij and sij are equal to zero, and the like-
lihood function in Expression (9.5) is unaffected by observations for which
rij + sij = 0. Data records for which the binomial denominator is zero are
therefore uninformative, and so they can be omitted from the data set. If
there are m observations remaining after these deletions, so that m ⩽ nk, the
likelihood function in Expression (9.5) is that of m observations from binomial
distributions with parameters 1 and πij , in other words, m observations from
a Bernoulli distribution.
The next step is to note that for the ith patient,

π_{ij} = \frac{S_i(t_{j−1}) − S_i(t_j)}{S_i(t_{j−1})},

so that

1 − π_{ij} = \frac{S_i(t_j)}{S_i(t_{j−1})}.
Adopting a proportional hazards model for the recurrence times, the hazard
of a recurrence being detected at time tj in the ith individual can be expressed
as
hi (tj ) = exp(ηi )h0 (tj ),
where h0 (tj ) is the baseline hazard at tj , and ηi is the risk score for the ith
individual. Notice that this assumption means that the hazards need only
be proportional at the scheduled screening times tj , and not at intermediate
times. This is less restrictive than the usual proportional hazards assumption,
which requires that hazards be proportional at every time.
Using the result in Equation (9.1),

1 − π_{ij} = \left\{ \frac{S_0(t_j)}{S_0(t_{j−1})} \right\}^{\exp(η_i)}.
Consequently,

log\{−log(1 − π_{ij})\} = η_i + log[−log\{S_0(t_j)/S_0(t_{j−1})\}] = η_i + γ_j,

say, where γ_j = log[−log\{S_0(t_j)/S_0(t_{j−1})\}]. For the ulcer recurrence data, the
model fitted is

log\{−log(1 − π_{ij})\} = γ_j + β Treat_i,

where γ_j is the effect of the jth period, j = 1, 2, and Treat_i is the value of
the indicator variable Treat for the ith individual. This variable is zero if that
patient is on treatment A and unity otherwise.
Table 9.4 Modified data on the recurrence of an ulcer in two periods, for
the first 18 patients.
Patient Age Duration Treatment Time of last visit Result Period R
1 48 2 B 7 2 1 0
1 48 2 B 7 2 2 1
2 73 1 B 12 1 1 0
2 73 1 B 12 1 2 0
3 54 1 B 12 1 1 0
3 54 1 B 12 1 2 0
4 58 2 B 12 1 1 0
4 58 2 B 12 1 2 0
5 56 1 A 12 1 1 0
5 56 1 A 12 1 2 0
6 49 2 A 12 1 1 0
6 49 2 A 12 1 2 0
7 71 1 B 12 1 1 0
7 71 1 B 12 1 2 0
8 41 1 A 12 1 1 0
8 41 1 A 12 1 2 0
9 23 1 B 12 1 1 0
9 23 1 B 12 1 2 0
10 37 1 B 5 2 1 1
11 38 1 B 12 1 1 0
11 38 1 B 12 1 2 0
12 76 2 B 12 1 1 0
12 76 2 B 12 1 2 0
13 38 2 A 12 1 1 0
13 38 2 A 12 1 2 0
14 27 1 A 6 2 1 1
15 47 1 B 6 2 1 1
16 54 1 A 6 1 1 0
17 38 1 B 10 2 1 0
17 38 1 B 10 2 2 1
18 27 2 B 7 2 1 0
18 27 2 B 7 2 2 1
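The construction of records like those in Table 9.4, and the fit of the model with period effects, can be sketched in code; the function below is an assumed illustration with hypothetical column names, not code from the book.

# A minimal sketch, not from the book: expanding the ulcer data of Table 9.1
# into one record per patient per period at risk, as in Table 9.4, and fitting
# log{-log(1 - pi_ij)} = gamma_j + beta * Treat_i as a binary GLM.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def expand_to_periods(df):
    # df: one row per patient, with columns patient, treat, last_visit, result;
    # result appears to be coded 1 for no detected recurrence, 2 for a recurrence
    rows = []
    for _, p in df.iterrows():
        for period, end in ((1, 6), (2, 12)):
            if p.last_visit < end and p.result == 1:
                break                   # censored before the end of this period
            recurred = int(p.result == 2 and p.last_visit <= end)
            rows.append({"patient": p.patient, "treat": p.treat,
                         "period": period, "r": recurred})
            if recurred:
                break                   # no longer at risk after a recurrence
    return pd.DataFrame(rows)

# expanded = expand_to_periods(ulcer)
# fit = smf.glm("r ~ C(period) + treat", data=expanded,
#               family=sm.families.Binomial(link=sm.families.links.CLogLog())).fit()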
The estimated coefficient of Treat in this model is 0.195 and the standard
error of this estimate is 0.626. The hazard of a recurrence on treatment B at
any given time, relative to that on treatment A, is exp(0.195) = 1.21. Since
this exceeds unity, there is the suggestion that the risk of recurrence is less
on treatment A than on treatment B, but the evidence for this is not statis-
tically significant. The standard error of the estimated hazard ratio is 0.757.
For comparison, from Example 9.2, the estimated hazard ratio at 12 months
was found to be 1.46, with a standard error of 0.918, values that are broadly
similar to those obtained in this example. Moreover, the results of analyses
and from this, π̂A1 = 0.104. Other fitted probabilities can be calculated in a
similar manner, and the results of these calculations are shown in Table 9.6.
The corresponding observed proportions of individuals with a recurrence for
each combination of treatment and period are also displayed. The agreement
between the observed and fitted probabilities is good, which indicates that the
model is a good fit.
P{recurrence in (0, 6)} + P{recurrence in (6, 12) and no recurrence in (0, 6)}.
The joint probability of a recurrence in (6, 12) and no recurrence in (0, 6) can
be expressed as
P{recurrence in (6, 12) | no recurrence in (0, 6)} × P{no recurrence in (0, 6)},
The likelihood function based on the l left-censored, r right-censored and
n − l − r confined observations is then

\prod_{i=1}^{l} \{1 − S_i(b_i)\} \prod_{i=l+1}^{l+r} S_i(a_i) \prod_{i=l+r+1}^{n} \{S_i(a_i) − S_i(b_i)\},

which may be written as

\prod_{i=1}^{l} \{1 − S_i(b_i)\} \prod_{i=l+1}^{l+r} S_i(a_i) \prod_{i=l+r+1}^{n} S_i(a_i)\{1 − S_i(b_i)/S_i(a_i)\}.   (9.8)
For the l left-censored observations, taking y_i = 1 with response probability
p_i = 1 − S_i(b_i) gives

\prod_{i=1}^{l} p_i = \prod_{i=1}^{l} \{1 − S_i(b_i)\},

which is the first term in Expression (9.8). For the r right-censored observations,
taking y_i = 0 with p_i = 1 − S_i(a_i) gives

\prod_{i=l+1}^{l+r} (1 − p_i) = \prod_{i=l+1}^{l+r} S_i(a_i),
and this is the second term in Expression (9.8). The situation is a little more
complicated for an observation that is confined to the interval (ai , bi ], since
two binary observations are needed to give the required component of Expres-
sion (9.8). One of these is taken to have yi = 0, pi = 1−Si (ai ), while the other
is such that yc+i = 1, pc+i = 1−{Si (bi )/Si (ai )}, for i = l+r+1, l+r+2, . . . , n.
Combining these two terms leads to a component of the likelihood in Expres-
sion (9.9) of the form
\prod_{i=l+r+1}^{n} (1 − p_i)\, p_{c+i},
which corresponds to
\prod_{i=l+r+1}^{n} S_i(a_i)\{1 − S_i(b_i)/S_i(a_i)\}
in Expression (9.8).
This shows that by suitably defining a set of n + c binary observations,
with response probabilities expressed in terms of the survivor functions for the
three possible forms of interval-censored observation, the likelihood function
in Expression (9.9) is equivalent to that in Expression (9.8). Accordingly,
maximisation of the log-likelihood function for n + c binary observations is
equivalent to maximising the log-likelihood for the interval-censored data.
Under a proportional hazards model, the survivor function for the ith individual is

S_i(t) = \{S_0(t)\}^{\exp(β′x_i)},   (9.10)

where S_0(t) is the baseline survivor function and x_i is the vector of values of
p explanatory variables for the ith individual, i = 1, 2, . . . , n, with coefficients
that make up the vector β.
The baseline survivor function will be modelled as a step-function, where
the steps occur at the k ordered censoring times, t(1) , t(2) , . . . , t(k) , where 0 <
t(1) < t(2) < · · · < t(k) , which are a subset of the times at which observations
are interval-censored. This means that the t(j) , j = 1, 2, . . . , k, are a subset
of the values of ai and bi , i = 1, 2, . . . , n. Exactly how these times are chosen
will be described later in Section 9.4.3.
We now define

θ_j = log \left\{ \frac{S_0(t_{(j−1)})}{S_0(t_{(j)})} \right\},

where t_{(0)} = 0, so that θ_j ⩾ 0, and at the times t_{(j)},

S_0(t_{(j)}) = \exp(−θ_j)\, S_0(t_{(j−1)}),   (9.11)

for j = 1, 2, . . . , k.
Since the first step in the baseline survivor function occurs at t_{(1)}, S_0(t) = 1
for 0 ⩽ t < t_{(1)}. From time t_{(1)}, the baseline survivor function, using Equa-
tion (9.11), has the value S_0(t_{(1)}) = exp(−θ_1)S_0(t_{(0)}), which, since t_{(0)} = 0,
means that S_0(t) = exp(−θ_1), for t_{(1)} ⩽ t < t_{(2)}. Similarly, from time t_{(2)},
the survivor function is exp(−θ_2)S_0(t_{(1)}), that is S_0(t) = exp{−(θ_1 + θ_2)},
for t_{(2)} ⩽ t < t_{(3)}, and so on, until S_0(t) = exp{−(θ_1 + θ_2 + · · · + θ_k)}, for t ⩾ t_{(k)}.
Consequently,

S_0(t) = \exp\left(−\sum_{r=1}^{j} θ_r\right),   (9.12)
for t_{(j)} ⩽ t < t_{(j+1)}, and so the baseline survivor function, at any time t_i, is
given by

S_0(t_i) = \exp\left(−\sum_{j=1}^{k} θ_j d_{ij}\right),   (9.13)

where d_{ij} = 1 if t_{(j)} ⩽ t_i, and d_{ij} = 0 if t_{(j)} > t_i,
for j = 1, 2, . . . , k. The quantities dij will be taken to be the values of k
indicator variables, D1 , D2 , . . . , Dk , for the ith observation in the augmented
data set. Note that the values of the Dj , j = 1, 2, . . . , k, will differ at each
observation time, ti .
Combining the results in Equations (9.10) and (9.13), the survivor function
for the ith individual, at the times a_i and b_i, can now be obtained. In particular,

S_i(a_i) = \{S_0(a_i)\}^{\exp(β′x_i)} = \exp\left\{−\exp(β′x_i) \sum_{j=1}^{k} θ_j d_{ij}\right\},

where d_{ij} = 1 if t_{(j)} ⩽ a_i, and d_{ij} = 0 otherwise. An expression for S_i(b_i) can
be obtained in a similar way. The response probability for a confined observation
is then the ratio

p_{c+i} = 1 − \frac{S_i(b_i)}{S_i(a_i)} = 1 − \exp\left\{−\exp(β′x_i)\left(\sum_{j=1}^{k} θ_j d^{(1)}_{ij} − \sum_{j=1}^{k} θ_j d^{(2)}_{ij}\right)\right\},

where the values d^{(1)}_{ij} in the numerator of the ratio are equal to unity if
t_{(j)} ⩽ b_i, and zero otherwise, and the values d^{(2)}_{ij} in the denominator are
equal to unity if t_{(j)} ⩽ a_i, and zero otherwise. Consequently, the θ-terms in the
numerator for which t_{(j)} ⩽ a_i cancel with those in the denominator, and this
leaves

p_{c+i} = 1 − \exp\left\{−\exp(β′x_i) \sum_{j=1}^{k} θ_j d_{ij}\right\},

where d_{ij} = 1 if t_{(j)} is in the interval A_i, and d_{ij} = 0 otherwise,
for j = 1, 2, . . . , k, and the intervals Ai are as shown in Table 9.7.
Type of observation   y_i   Interval A_i
Confined              0     (0, a_i], i = l + r + 1, l + r + 2, . . . , n
Confined              1     (a_{i−c}, b_{i−c}], i = n + 1, n + 2, . . . , n + c
The estimated baseline survivor function is then

Ŝ_0(t) = \exp\left(−\sum_{r=1}^{j} θ̂_r\right),   (9.15)

for t_{(j)} ⩽ t < t_{(j+1)}, j = 1, 2, . . . , k, where t_{(k+1)} = ∞, and θ̂_j is the estimated
value of θ_j. The estimated survivor function for the ith individual follows from

Ŝ_i(t) = \{Ŝ_0(t)\}^{\exp(β̂′x_i)} = \exp\left\{−\exp(β̂′x_i) \sum_{r=1}^{j} θ̂_r\right\},   (9.16)

for t_{(j)} ⩽ t < t_{(j+1)}.
Table 9.8 Data on the time in months to breast retraction in patients with
breast cancer.
Radiotherapy Radiotherapy and Chemotherapy
(45, ∗ ] (25, 37] (37, ∗ ] (8, 12] (0, 5] (30, 34]
(6, 10] (46, ∗ ] (0, 5] (0, 22] (5, 8] (13, ∗ ]
(0, 7] (26, 40] (18, ∗ ] (24, 31] (12, 20] (10, 17]
(46, ∗ ] (46, ∗ ] (24, ∗ ] (17, 27] (11, ∗ ] (8, 21]
(46, ∗ ] (27, 34] (36, ∗ ] (17, 23] (33, 40] (4, 9]
(7, 16] (36, 44] (5, 11] (24, 30] (31, ∗ ] (11, ∗ ]
(17, ∗ ] (46, ∗ ] (19, 35] (16, 24] (13, 39] (14, 19]
(7, 14] (36, 48] (17, 25] (13, ∗ ] (19, 32] (4, 8]
(37, 44] (37, ∗ ] (24, ∗ ] (11, 13] (34, ∗ ] (34, ∗ ]
(0, 8] (40, ∗ ] (32, ∗ ] (16, 20] (13, ∗ ] (30, 36]
(4, 11] (17, 25] (33, ∗ ] (18, 25] (16, 24] (18, 24]
(15, ∗ ] (46, ∗ ] (19, 26] (17, 26] (35, ∗ ] (16, 60]
(11, 15] (11, 18] (37, ∗ ] (32, ∗ ] (15, 22] (35, 39]
(22, ∗ ] (38, ∗ ] (34, ∗ ] (23, ∗ ] (11, 17] (21, ∗ ]
(46, ∗ ] (5, 12] (36, ∗ ] (44, 48] (22, 32] (11, 20]
(46, ∗ ] (14, 17] (10, 35] (48, ∗ ]
Some patients had not experienced breast retraction by the time of their final visit.
For these patients, the upper limit of the time interval is shown
as an asterisk (∗) in Table 9.8. The observations for these patients are therefore
right-censored, and so r = 38. The remaining c = 51 patients experienced breast
retraction within confined time intervals, and the total number of observations
is n = l + c + r = 94.
The first step in fitting a model to these arbitrarily interval-censored data
is to expand the data set by adding a further 51 lines of data, repeating that
for the patients whose intervals are confined, so that the revised database has
n + c = 145 observations. The values, yi , of the binary response variable, Y ,
are then added. These are such that Y = 1 for a left-censored observation, and
Y = 0 for a right-censored observation. For confined observations, where the
data are duplicated, one of the pairs of observations has Y = 0 and the other
Y = 1. The treatment effect will be represented by the value of a variable
labelled Treat, which will be zero for a patient on radiotherapy and unity for
a patient on radiotherapy and chemotherapy. For illustration, the values of
the binary response variable, Y, are shown for the first three patients treated
with radiotherapy alone, in Table 9.9.
Table 9.10 Database for binary data analysis, for the first three patients on
radiotherapy.
Patient Treat Y D1 D2 D3 D4 D5 D6 D7 D8 D9
1 0 0 1 1 1 1 1 1 1 1 0
2 0 0 1 0 0 0 0 0 0 0 0
2 0 1 0 1 0 0 0 0 0 0 0
3 0 1 1 0 0 0 0 0 0 0 0
We now have the database to which a non-linear model for the binary
response data in Y is fitted. The model is such that we take Y to have a
Bernoulli distribution with response probability as given in Equation (9.14),
that is a binomial distribution with parameters 1, pi . Here, the SAS proce-
dure proc nlmixed has been used to fit the non-linear model to the binary
data.
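For readers without access to SAS, the same likelihood can be maximised directly; the following Python sketch is an assumed reconstruction, with array names matching the layout of Table 9.10, and is not the book's code.

# A minimal sketch, not the book's SAS code: maximum likelihood for the binary
# model with p_i = 1 - exp{-exp(beta * Treat_i) * sum_j theta_j * d_ij}, as in
# Equation (9.14). D is the (n + c) x 9 matrix of D-variables from Table 9.10;
# optimising log(theta_j) keeps each theta_j positive.
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, y, treat, D):
    theta = np.exp(params[:-1])          # theta_1, ..., theta_9
    beta = params[-1]                    # coefficient of Treat
    p = 1.0 - np.exp(-np.exp(beta * treat) * (D @ theta))
    p = np.clip(p, 1e-12, 1 - 1e-12)     # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# start = np.append(np.log(np.full(9, 0.1)), 0.0)
# fit = minimize(neg_log_lik, start, args=(y, treat, D), method="BFGS")
# 2 * fit.fun is then the -2 log L statistic quoted below.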
On fitting the null model, that is the model that contains all 9 D-variables,
but not the treatment effect, the value of the statistic −2 log L̂ is 285.417. On
adding Treat to the model, the value of −2 log L̂ is reduced to 276.983. This
reduction of 8.43 on 1 d.f. is significant at the 1% level (P = 0.0037), and
so we conclude that the interval-censored data do provide strong evidence
of a treatment effect. The estimated coefficient of Treat is 0.8212, with a
standard error of 0.2881. The corresponding hazard ratio for a patient on
the combination of radiotherapy and chemotherapy, relative to a patient on
radiotherapy alone, is exp(0.8212) = 2.27. The interpretation of this is that
patients on the combined treatment have just over twice the risk of breast
retraction, compared to patients on radiotherapy. A 95% confidence interval
for the corresponding true hazard ratio has limits exp(0.8212 ± 1.96 × 0.2881),
which leads to the interval (1.29, 4.00).
The minimal subset of times at which the baseline survivor function is
estimated can be enlarged by adding further censoring times from the data set.
However, there are no additional times that lead to a significant reduction in
the value of the −2 log L̂ statistic while keeping the estimated θ-parameters
positive.
The estimated values of the coefficients of the D-variables are the values θ̂j ,
for j = 1, 2, . . . , 9, and these can be used to provide an estimate of the survivor
function for the two treatment groups. Equation (9.15) gives the form of the
estimated baseline survivor function, which is the estimated survivor function
for the patients on radiotherapy alone. The corresponding estimate for the pa-
tients who receive adjuvant chemotherapy is obtained using Equation (9.16),
and is just {Ŝ0 (t(j) )}exp(β̂) , with β̂ = 0.8212. On fitting the model that con-
tains the treatment effect and the 9 D-variables, the estimated values of θj ,
for j = 1, 2, . . . , 9, are 0.0223, 0.0603, 0.0524, 0.0989, 0.1620, 0.0743, 0.1098,
0.2633 and 0.4713, respectively. From these values, the baseline survivor func-
tion, at the times t(j) , j = 1, 2, . . . , 9, can be estimated, and this estimate
is given as Ŝ0 (t(j) ) in Table 9.11. Also given in this table is the estimated
survivor function for patients on the combined treatment, denoted Ŝ1 (t(j) ).
The survivor functions for the two groups of patients are shown in Figure 9.1.
From the estimated survivor functions, the median time to breast retraction
for patients on radiotherapy is estimated to be 39 months, while that for
patients who received adjuvant chemotherapy is 23 months. More precise esti-
mates of these median times could be obtained if a greater number of censoring
times were used in the analysis.
Figure 9.1 Estimated survivor functions for a patient on radiotherapy (—) and the
combination of radiotherapy and chemotherapy (·······).
\prod_{i=1}^{l} \{1 − S_i(b_i)\} \prod_{i=l+1}^{l+r} S_i(a_i) \prod_{i=l+r+1}^{n} \{S_i(a_i) − S_i(b_i)\}.   (9.17)
If a parametric model for the survival times is assumed, then Si (t) has a fully
parametric form. For example, if the survival times have a Weibull distribu-
tion with scale parameter λ and shape parameter γ, from Equation (5.37) in
Section 5.6 of Chapter 5, the survivor function for the ith individual is
S_i(t) = \exp\{−\exp(β′x_i)\, λt^γ\},
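A direct way to maximise this likelihood is sketched below; it is an assumed illustration in Python, not code from the book, with a and b holding the interval limits (a = 0 for a left-censored time and b = ∞ for a right-censored one).

# A minimal sketch, not from the book: fitting a Weibull proportional hazards
# model to arbitrarily interval-censored data by maximising the likelihood in
# Expression (9.17), with S_i(t) = exp{-exp(beta'x_i) * lam * t**gam}.
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, a, b, x):
    lam, gam = np.exp(params[0]), np.exp(params[1])   # keep lam, gam positive
    beta = params[2:]
    risk = np.exp(x @ beta)
    def S(t):
        return np.where(t <= 0, 1.0,
                        np.where(np.isinf(t), 0.0,
                                 np.exp(-risk * lam * t ** gam)))
    lik = S(a) - S(b)        # P(a < T <= b) covers all three censoring patterns
    return -np.sum(np.log(np.clip(lik, 1e-300, None)))

# fit = minimize(neg_log_lik, x0=np.zeros(2 + x.shape[1]),
#                args=(a, b, x), method="BFGS")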
Frailty models
The survival time of a patient with cancer will depend on many factors,
including the type of cancer and stage of the tumour, characteristics of the
patient such as their age, weight and lifestyle, and
the manner in which the cancer is treated. A group of patients who have the
same values of each measured explanatory variable may nevertheless be ob-
served to have different survival times. This variation or heterogeneity between
individuals may arise because some individuals are not as strong as others in
ways that cannot be summarised in terms of a relatively small number of
known variables. Indeed, we can never aspire to know what all the factors are
that may have an effect on the survival of cancer patients, let alone be able
to measure them, but we can take account of them in a modelling process.
Variation between the survival times of a group of individuals can be de-
scribed in terms of some individuals being more frail than others. Those who
have higher values of a frailty term tend to die sooner than those who are
less frail. However, the extent of an individual’s frailty cannot be measured
directly; if it could, we might attempt to include it in a model for survival
times. Instead, we only observe the impact that the frailty effects have on the
observable survival times.
Figure 10.1 Observed and fitted Weibull survivor functions for women with positive
(·······) and negative (—) staining.
The results in Equations (10.4) and (10.5) show how the mean and variance
of a lognormal frailty distribution can be found from the variance, σ_u², of the
corresponding normally distributed random effect. An alternative approach is
to take U_i to have a normal distribution with mean −σ_u²/2 and variance σ_u².
Although Zi would then have a mean of unity, this formulation is not widely
used in practice.
A distribution that is very similar to the lognormal distribution is the
gamma distribution, described in Section 6.1.3, and so this is an alternative
distribution for the frailty random variable, Zi . This frailty distribution is
sometimes more convenient to work with than the lognormal model, and can
be used to investigate some of the consequences of introducing a frailty effect,
which we do in Section 10.3.
If we take Zi to have a gamma distribution with both unknown parameters
equal to θ, and density

f(z_i) = \frac{θ^θ z_i^{θ−1} e^{−θz_i}}{Γ(θ)},   θ > 0,
for zi > 0, so that Zi ∼ Γ(θ, θ), this frailty distribution has a mean of unity
and a variance of 1/θ. The larger the value of θ, the smaller the frailty variance,
and as θ → ∞, the frailty variance tends to zero, corresponding to the case
where the zi are all equal to unity and there is no frailty. The corresponding
distribution of U_i = log Z_i has density

f(u_i) = \frac{θ^θ e^{θu_i} \exp(−θe^{u_i})}{Γ(θ)},   θ > 0,   (10.6)
for −∞ < ui < ∞, and Ui is said to have an exp-gamma distribution. This
distribution is also referred to as the log-gamma distribution, but this nomen-
clature is inconsistent with the definition of a lognormal distribution. The
modal value of Ui is zero, but the distribution is asymmetric and the mean
and variance of Ui are expressed in terms of digamma and trigamma functions.
Specifically,

E(U_i) = Ψ(θ) − log θ,   (10.7)

where the digamma function, Ψ(θ), can be obtained from the series expansion

Ψ(θ) = −λ + \sum_{j=0}^{∞} \frac{θ − 1}{(1 + j)(θ + j)},

in which λ denotes Euler's constant.
The unconditional survivor function is found by integrating over the frailty
distribution, so that

S_i^*(t) = \int_0^{∞} S_i(t \mid z_i)\, f(z_i)\, dz_i,   (10.10)

where f(z_i) is the density function of the frailty distribution. The quantity
S_i^*(t) is the unconditional or observable survivor function, and Equa-
tion (10.10) defines an unconditional model for the survival times.
Once Si∗ (t) has been obtained for a particular model, the corresponding
observable hazard function, h∗i (t), can be found using the relationship in Equa-
tion (1.5) of Chapter 1, so that

h_i^*(t) = −\frac{d}{dt}\, \log S_i^*(t).   (10.11)
In general the integral in Equation (10.10) has to be evaluated numerically, but
as we shall see in the following section, this can be accomplished analytically
in the special case of gamma frailty effects.
where y_i = \{θ + e^{β′x_i} H_0(t)\} z_i. Then, from the definition of a gamma function,

\int_0^{∞} y_i^{θ−1} e^{−y_i}\, dy_i = Γ(θ),

and so

S_i^*(t) = \{1 + θ^{−1} e^{β′x_i} H_0(t)\}^{−θ}.   (10.12)
Equation (10.11) leads to the observable hazard function, given by

h_i^*(t) = \frac{e^{β′x_i} h_0(t)}{1 + θ^{−1} e^{β′x_i} H_0(t)}.   (10.13)
In the special case where there are no covariates and the baseline hazard is a
constant, h_0(t) = λ, the observable hazard function is

h^*(t) = \frac{θλ}{θ + λt},
which declines non-linearly from the value λ when t = 0 to zero. The presence
of frailty effects therefore provides an alternative explanation for a decreasing
hazard function. Specifically, the overall hazard of an event might actually be
constant, but heterogeneity between the survival times of the individuals in
the study will mean that a decline in hazard is observed.
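This selection effect is easy to demonstrate by simulation; the short sketch below is an illustration under assumed parameter values, not an example from the book.

# A minimal sketch, not from the book: each subject has a constant individual
# hazard z * lam, with z ~ Gamma(theta, theta), yet the hazard observed in the
# population declines over time as theta*lam/(theta + lam*t).
import numpy as np

rng = np.random.default_rng(1)
theta, lam, n, dt = 2.0, 1.0, 200_000, 0.05
z = rng.gamma(shape=theta, scale=1.0 / theta, size=n)  # mean 1, variance 1/theta
t = rng.exponential(scale=1.0 / (z * lam))             # constant hazard given z

for t0 in (0.0, 1.0, 2.0):
    at_risk = (t > t0).sum()
    deaths = ((t > t0) & (t <= t0 + dt)).sum()
    print(f"t = {t0}: empirical hazard {deaths / (at_risk * dt):.3f}, "
          f"theoretical {theta * lam / (theta + lam * t0):.3f}")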
Similar features are found if we assume that the underlying baseline hazard
is dependent upon time. In the case where the underlying baseline hazard is
Weibull, with h_0(t) = λγt^{γ−1}, and where there are no covariates,

h^*(t) = \frac{θλγt^{γ−1}}{θ + λt^γ}.
Figure 10.2 Observable hazard function for Weibull baseline hazards with h_0(t) = 3t²
and gamma frailty distributions with θ = 0.5, 2, 5 and ∞.
The observable hazard ratio, ψ^*(t), is then a non-linear function of t.
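The form of this ratio follows directly from Equation (10.13): for two individuals whose linear predictors differ only in a treatment effect β, and writing ψ = e^β, the observable hazard ratio is

ψ^*(t) = \frac{ψ\{1 + θ^{−1}H_0(t)\}}{1 + θ^{−1}ψH_0(t)} = \frac{ψ\{θ + H_0(t)\}}{θ + ψH_0(t)},

so that ψ^*(0) = ψ and ψ^*(t) tends to unity as H_0(t) increases; this derivation is a sketch of the result that Equation (10.14), referred to below, expresses.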
To illustrate the dependence of the observable hazard ratio on time, Fig-
ure 10.3 shows the hazard ratio ψ ∗ (t) for a Weibull baseline hazard with λ = 1,
γ = 3, ψ = eβ = 3 and various values of θ.
Figure 10.3 Observable hazard ratio for a Weibull baseline hazard with h_0(t) = 3t²,
β = log 3, and gamma frailty distributions with θ = 0.5, 2, 5 and ∞.
The hazard ratio has a constant value of 3 when there is no frailty, that is
when θ = ∞, but when the frailty variance exceeds zero, the observable hazard
ratio declines over time. This dependence of the hazard ratio on time could
be used to account for any observed non-proportionality in the hazards.
These results show that the inclusion of a random effect in a Weibull pro-
portional hazards regression model can lead to hazards not being proportional,
and to a non-monotonic hazard function. Although this has been illustrated
using a Weibull model with gamma frailty, the conclusions drawn apply more
generally. This means that models that include random effects provide an al-
ternative way of modelling data where the hazard function is unimodal, or
where the hazards are not proportional.
10.4 Fitting parametric frailty models
Fitting the model in Equation (10.3) entails estimating the values of the co-
efficients of the explanatory variables, the variance of the frailty distribution,
and the baseline hazard function. Models in which h_0(t) is fully specified, such as
the Weibull proportional hazards model or an accelerated failure time model,
can be fitted using the method of maximum likelihood. Denote the observed
survival data by the pairs (ti , δi ), i = 1, 2, . . . , n, where ti is the survival time
and δi is an event indicator, which takes the value zero for a censored obser-
vation and unity for an event. If the random effects ui in Equation (10.3) had
known values, the likelihood function would be
\prod_{i=1}^{n} \{h_i(t_i)\}^{δ_i} S_i(t_i),   (10.15)
Since the random effects are unobserved, they are integrated out of Expres-
sion (10.15), so that the likelihood function becomes

\prod_{i=1}^{n} \int_{−∞}^{∞} \{h_i(t_i)\}^{δ_i} S_i(t_i)\, f(u_i)\, du_i,   (10.17)

or equivalently, in terms of the frailties z_i,

\prod_{i=1}^{n} \int_{0}^{∞} \{h_i(t_i)\}^{δ_i} S_i(t_i)\, f(z_i)\, dz_i.   (10.18)
Estimates of the random effects can be obtained using Bayes' theorem, which
for two events A and B states that

P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}.   (10.19)
We now write L(ti | ui ) for the likelihood of the ith event time ti when the
random effect ui is regarded as fixed, which from combining Equation (10.16)
with Expression (10.15), is
L(ti | ui ) = {exp(β ′ xi + ui )h0 (ti )}δi exp{− exp(β ′ xi + ui )H0 (ti )}.
The posterior density of u_i is then

π(u_i \mid t_i) = \frac{L(t_i \mid u_i)\, f(u_i)}{P(t_i)},   (10.20)
For gamma frailty, the likelihood function is

\prod_{i=1}^{n} \frac{θ^θ}{Γ(θ)}\, \{e^{β′x_i} h_0(t_i)\}^{δ_i} \int_0^{∞} z_i^{θ+δ_i−1} \exp\{−[θ + e^{β′x_i} H_0(t_i)] z_i\}\, dz_i.
Now, a gamma random variable with parameters r and θ has density

f(y) = \frac{θ^r y^{r−1} e^{−θy}}{Γ(r)},   (10.22)

and so

\int_0^{∞} y^{r−1} e^{−θy}\, dy = \frac{Γ(r)}{θ^r},
from which

\int_0^{∞} z_i^{θ+δ_i−1} \exp\{−[θ + e^{β′x_i} H_0(t_i)] z_i\}\, dz_i = \frac{Γ(θ + δ_i)}{\{θ + e^{β′x_i} H_0(t_i)\}^{θ+δ_i}}.
The likelihood function is therefore

\prod_{i=1}^{n} \frac{θ^θ}{Γ(θ)}\, \{e^{β′x_i} h_0(t_i)\}^{δ_i}\, \frac{Γ(θ + δ_i)}{\{θ + e^{β′x_i} H_0(t_i)\}^{θ+δ_i}}.
The corresponding log-likelihood function is

\sum_{i=1}^{n} \{θ \log θ − \log Γ(θ) + \log Γ(θ + δ_i) + δ_i[β′x_i + \log h_0(t_i)]\}
 − \sum_{i=1}^{n} (θ + δ_i) \log\{θ + e^{β′x_i} H_0(t_i)\}.   (10.23)
In terms of z_i, the product of the likelihood contribution of t_i and the frailty
density, which is the numerator of Equation (10.20), is

[z_i e^{β′x_i} h_0(t_i)]^{δ_i} \exp\{−z_i e^{β′x_i} H_0(t_i)\}\, \frac{θ^θ z_i^{θ−1} e^{−θz_i}}{Γ(θ)}.

Ignoring terms that do not involve z_i, the posterior density of Z_i, π(z_i \mid t_i), is
proportional to

z_i^{θ+δ_i−1} \exp\{−[θ + e^{β′x_i} H_0(t_i)] z_i\}.
From the general form of a two-parameter gamma density function, shown in
Equation (10.22), it follows that the posterior distribution of Z_i is a gamma
distribution with parameters θ + δ_i and θ + e^{β′x_i} H_0(t_i). Then, since the
expected value of a gamma random variable with parameters r, θ is r/θ, the
expectation of Z_i given the data is

E(Z_i \mid t_i) = \frac{θ + δ_i}{θ + e^{β′x_i} H_0(t_i)}.
From this, an estimate of the frailty effect for the ith individual is

ẑ_i = \frac{θ̂ + δ_i}{θ̂ + e^{β̂′x_i} Ĥ_0(t_i)},
where Ĥ_0(t) is the estimated baseline cumulative hazard function. Similarly,
the variance of a gamma random variable with parameters r, θ is r/θ², and so

var(Z_i \mid t_i) = \frac{θ + δ_i}{\{θ + e^{β′x_i} H_0(t_i)\}^2}.
The estimated variance of Z_i reduces to ẑ_i²/(θ̂ + δ_i), and so the standard error
of ẑ_i is given by

se(ẑ_i) = \frac{ẑ_i}{\sqrt{θ̂ + δ_i}}.
Interval estimates for the frailty terms can then be found, and the ratio of
ẑ_i to its standard error can be compared to percentage points of a standard
normal distribution to give a P-value for a test of the hypothesis that the ith
frailty effect is zero. However, the results of a series of unplanned hypothesis
tests about frailty effects must be adjusted to allow for repeated significance
testing, for example by using the Bonferroni correction. With this correction,
the significance level against which each of the n P-values is judged is divided
by n.
As this analysis shows, working with gamma frailties is mathematically
straightforward and leads to closed form estimates of many useful quantities.
In other situations, numerical methods are needed to evaluate the summary
statistics of the conditional distribution of the frailty given the data.
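The closed-form estimates just derived are simple to compute; the helper below is an assumed illustration in Python, with hypothetical argument names, and is not code from the book.

# A minimal sketch, not from the book: gamma frailty estimates and standard
# errors, z_hat = (theta + delta) / (theta + exp(beta'x) * H0(t)) and
# se(z_hat) = z_hat / sqrt(theta + delta).
import numpy as np

def gamma_frailty_estimates(theta_hat, delta, risk_score, H0_t):
    # delta: 0/1 event indicators; risk_score: exp(beta_hat' x_i);
    # H0_t: estimated baseline cumulative hazard at each t_i
    z_hat = (theta_hat + delta) / (theta_hat + risk_score * H0_t)
    se = z_hat / np.sqrt(theta_hat + delta)
    return z_hat, se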
Example 10.2 Prognosis for women with breast cancer
When a frailty term with a gamma distribution is included in a Weibull model
for the survival times of women with negatively and positively stained tu-
mours, outlined in Example 10.1, the resulting hazard function is hi (t) =
z_i exp(βx_i)λγt^{γ−1}, and the corresponding fitted survivor function is

Ŝ_i(t) = \exp\{−ẑ_i \exp(β̂x_i)\, λ̂t^{γ̂}\}.

The variance of the underlying gamma distribution for the frailty is θ̂^{−1} =
3.40, and estimates of the frailty effects follow from the method outlined in
this section.
In the presence of frailty, β̂ = 2.298, and the hazard ratio for a positively
stained woman relative to one who is negatively stained is exp(2.298) = 9.95. However,
the 95% confidence interval for this estimate ranges from 0.86 to 115.54, re-
flecting the much greater uncertainty about the staining effect when account
is taken of frailty. It is also important to note that the estimated hazard ratio
is conditional on frailty, and so refers to a woman with a specific frailty value.
The revised estimates of λ and γ are 0.00008 and 1.9091, respectively, and
the individual frailty values are shown in Table 10.1, alongside the observed
survival times. Note that women with the shortest survival times have the
largest estimated frailty effects.
Table 10.1 Survival times of women with tumours that were negatively or positively
stained and corresponding gamma frailties.
Negative staining Positive staining
Survival time Frailty Survival time Frailty Survival time Frailty
23 3.954 5 4.147 68 0.444
47 3.050 8 3.827 71 0.412
69 2.290 10 3.579 76* 0.083
70* 0.514 13 3.192 105* 0.047
71* 0.507 18 2.581 107* 0.045
100* 0.348 24 1.981 109* 0.044
101* 0.344 26 1.816 113 0.179
148 0.888 26 1.816 116* 0.039
181 0.646 31 1.472 118 0.166
198* 0.127 35 1.254 143 0.116
208* 0.117 40 1.038 154* 0.023
212* 0.113 41 1.001 162* 0.021
224* 0.103 48 0.788 188* 0.016
50 0.739 212* 0.013
59 0.564 217* 0.012
61 0.534 225* 0.011
* Censored survival times.
The individual estimates of the survivor functions are shown in Figure 10.4.
This figure illustrates the extent of variation in the estimated survivor func-
tions that stems from the frailty effect, and shows that there is some separation
in the estimates for those women whose tumours were positively or negatively
stained.

Figure 10.4 Fitted Weibull survivor functions for women with positive (·······) and
negative (—) staining.
The median survival times, found using Equation (10.21), vary from 55
to 372 months for women in the negatively stained group, and from 16 to
355 months for those in the positively stained group. Of the 32 women with
positively stained tumours, 18 have estimated median survival times that are
less than any of those in the negatively stained group. This confirms that once
allowance is made for frailty effects, differences between survival times in the
two groups of women are not as pronounced.
The observable survivor function under this model, from Equation (10.12),
can be estimated by \{1 + θ̂^{−1}Ĥ_0(t)\}^{−θ̂} for a woman with a negatively stained
tumour, and by \{1 + θ̂^{−1}e^{β̂}Ĥ_0(t)\}^{−θ̂} for one with a positively stained tumour.
These functions are shown in Figure 10.5, superimposed on the corresponding
Kaplan-Meier estimate of the survivor functions. Comparing this with Fig-
ure 10.1, we see that once allowance is made for frailty, the Weibull model is
a much better fit to the observed survival times. This figure also provides a
visual confirmation of the suitability of the fitted model.
The observable, or unconditional, hazard ratio, for a woman with a pos-
itively stained tumour relative to one whose tumour was negatively stained,
derived from Equation (10.14), is given in Figure 10.6. This figure shows how
the observable hazard ratio varies over time; the hazard is much greater at
earlier times, but declines quite rapidly. This is due to the selection effect of
frailty, whereby women who are more frail die sooner. There are also many
more early deaths in the positively stained group, which is why the observed
hazard ratio varies in this way.

Figure 10.5 Fitted survivor functions for the Weibull gamma frailty model, with the
corresponding observed survivor functions, for women with positive (·······) and negative
(—) staining.

Figure 10.6 Ratio of the hazard of death for a woman with a positively stained tumour
relative to one whose tumour is negatively stained.
where δi is the event indicator and R(ti ) is the set of patients at risk of death
at time t_i. The random effects, u_i, are the observed values of the random
variables U_i = log Z_i, where the frailty Z_i is usually taken to have either a
lognormal or a gamma distribution.
where trace(I_u^{−1}) is the sum of the diagonal elements of the inverse of I_u.
The standard error of this estimate is

se(σ̂_u²) = \sqrt{2}\, σ̂_u² \left[ n + \frac{1}{σ̂_u^4}\, \mathrm{trace}(I_u^{−1} I_u^{−1}) − \frac{2}{σ̂_u²}\, \mathrm{trace}(I_u^{−1}) \right]^{−1/2}.
Maximum likelihood estimates of variances tend to be biased, and instead
estimates based on the method of restricted maximum likelihood (REML) es-
timation are preferred. For example, in the case of estimating the variance
of a single sample of observations, x_1, x_2, . . . , x_n, from a normal distribution,
the maximum likelihood estimate of the variance is the biased estimate
n^{−1} \sum_i (x_i − x̄)², whereas the corresponding REML estimate turns out to be
the usual unbiased estimate (n − 1)^{−1} \sum_i (x_i − x̄)².
REML estimates are obtained from a likelihood function that is indepen-
dent of β, and the REML estimate of the variance of the random frailty term
is

σ̃_u² = n^{−1} \left\{ \sum_{i=1}^{n} ũ_i² + \mathrm{trace}(Ṽ_u) \right\},
where Ṽ_u is the estimated variance-covariance matrix of the REML estimates,
ũ_i, of the u_i. The trace of this matrix is just the sum of the estimated vari-
ances of the ũ_i. This is the preferred estimate of the variance of a normally
distributed random frailty effect. The standard error of σ̃_u² can be found from

se(σ̃_u²) = \sqrt{2}\, σ̃_u² \left[ n + \frac{1}{σ̃_u^4}\, \mathrm{trace}(Ṽ_u Ṽ_u) − \frac{2}{σ̃_u²}\, \mathrm{trace}(Ṽ_u) \right]^{−1/2}.
Both σ̃_u² and its standard error generally feature in the output of software
packages that have the facility for fitting Cox regression models with lognormal
frailty.
For fully parametric frailty models, the cumulative baseline hazard func-
tion and the corresponding survivor function can be estimated using the max-
imum likelihood estimates of the unknown parameters. However, in semi-
parametric frailty models, estimates of the baseline hazard and cumulative
hazard cannot be extended to take account of the frailty terms, and so esti-
mated survivor functions cannot easily be obtained.
log L_p(β̂) − \sum_{i=1}^{n} δ_i,

where log L_p(β̂) is the maximised partial log-likelihood for the Cox regression
model when there is no frailty. By taking the marginal log-likelihood to be
log L_m(θ) + \sum_{i=1}^{n} δ_i, the maximised marginal log-likelihood in the presence of
frailty is then directly comparable to that for the model with no frailty. This
is helpful in comparing models, as we will see in Section 10.6.
Table 10.2 Survival times of 20 patients from listing for a lung transplant.
Patient Survival time Status Age Gender BMI Disease
1 2324 1 59 1 29.6 COPD
2 108 1 28 1 22.6 Suppurative
3 2939 0 55 1 32.1 Fibrosis
4 1258 0 62 1 30.0 Fibrosis
5 2904 0 51 1 30.4 Fibrosis
6 444 1 59 1 26.9 Fibrosis
7 158 1 55 2 24.6 COPD
8 1686 1 53 2 26.8 COPD
9 142 1 47 1 32.2 Fibrosis
10 1624 1 53 2 15.7 COPD
11 16 1 62 1 26.4 Fibrosis
12 2929 0 50 1 29.0 COPD
13 1290 1 55 2 17.1 COPD
14 2854 0 47 1 20.0 Other
15 237 1 23 1 15.9 Suppurative
16 136 1 65 1 16.0 COPD
17 2212 1 24 2 19.5 Suppurative
18 371 1 54 1 28.9 Fibrosis
19 683 1 24 2 20.2 Suppurative
20 290 1 53 1 25.2 Fibrosis
Figure 10.7 Fitted probability density functions for the normal (—) and exp-gamma
(·······) random effects.

Figure 10.8 Fitted probability density functions for the lognormal (—) and gamma
(·······) frailty effects.
Figure 10.7 compares the fitted normal and exp-gamma densities for the
random effect. Figure 10.8 shows that
the estimated density functions of the frailty effect are quite similar when
zi exceeds 0.2. However, the density of the fitted gamma distribution tends
to ∞ as zi tends to zero, whereas the fitted lognormal distribution is unimodal.
Also, the mean of the fitted gamma distribution is unity, whereas from Equa-
tion (10.4), the fitted lognormal distribution has a mean of 3.09, which is why
the lognormal frailty variance is so much greater than that for the gamma
distribution.
To compare the parameter estimates for the various terms and their stan-
dard errors, Table 10.3 shows the estimates for the different fitted Weibull
models, including the estimated variances of the random effects, u_i, and the
frailty effects, z_i, denoted v̂ar(u_i) and v̂ar(z_i), respectively. Note that for
a normal random effect, the estimated variance is v̂ar(u_i) = σ̂_u², while for
gamma frailty, v̂ar(z_i) = 1/θ̂. Equation (10.5) is then used to obtain the vari-
ance of a lognormal frailty term, and Equation (10.8) gives the variance of the
random effect corresponding to gamma frailty.
This table shows that there is a general tendency for the parameter esti-
mates to be further from zero when a frailty term is added, and the standard
errors are also larger. However, inferences about the impact of the factors on
patient survival from listing are not much affected by the inclusion of a frailty
term, and there is little difference in the results obtained when either a lognor-
mal or gamma frailty effect is included. The only factor to have any effect on
patient survival is primary disease, with the hazard of death in patients with
COPD being less than that for patients with other diseases. Also, on the basis
of the −2 log L̂ statistic, the model with gamma frailty might be preferred to
that with lognormal frailty.
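Assuming that Equations (10.4), (10.5) and (10.8) take the standard forms for these distributions, the conversions used in compiling the table are, for a normal random effect U_i with variance σ_u²,

E(Z_i) = \exp(σ_u²/2),   var(Z_i) = \exp(σ_u²)\{\exp(σ_u²) − 1\},

and, for gamma frailty with parameter θ, var(U_i) = Ψ′(θ), where Ψ′ is the trigamma function. For example, a fitted lognormal frailty mean of 3.09, as quoted for the Weibull model, corresponds to σ̂_u² = 2 log 3.09 ≈ 2.26.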
To avoid making specific assumptions about the form of the underlying
baseline hazard function, we fit a Cox regression model that contains age,
gender, body mass index and primary disease, and that also includes a log-
normal or gamma frailty term. These models are fitted by maximising the
penalised log-likelihood functions in Equations (10.26) and (10.28), respec-
tively. To test the hypothesis that all frailty effects are zero in the pres-
ence of the four explanatory variables, we compare the value of the −2 log L̂
statistic for the fitted Cox model, which is 1159.101, with values of the
maximised marginal log-likelihood statistic, −2 log L̂m , for Cox models with
frailty.
When a lognormal frailty term is added, the value of the maximised
marginal log-likelihood statistic, −2 log L_m(σ̂_u²), from Equation (10.27), is
1157.477, and this leads to a change in the value of the −2 log L̂_m statistic of
1.62, which is not significant (P = 0.203).
On fitting models with gamma frailty, the value of the log Lm (θ̂) statistic
from Equation (10.29) is −699.653, and adding the observed number of deaths,
123, to this and multiplying by −2, gives 1153.306. This can be compared
with the value of −2 log L̂ for the Cox regression model that contains the
same explanatory variables, but no frailty effects, 1159.101. The reduction in
the value of this test statistic is 5.80, which is significant when compared to
percentage points of a χ2 distribution on 1 d.f. (P = 0.016). This now shows
a significant frailty effect after allowing for survival differences due to primary
disease and the other factors.
The variances of the normal and exp-gamma random effects are smaller in
the Cox model than in the corresponding Weibull model, and the frailty effect
is less significant. This suggests that the fitted Cox regression model explains
more of the variation in survival times from listing for a transplant than a
Weibull model.
To compare the parameter estimates for the various terms in a Cox re-
gression model and their standard errors, Table 10.4 shows the estimates for
the different models fitted, and the estimated variances of the random effects,
v̂ar(u_i), and frailty effects, v̂ar(z_i). For lognormal frailty, REML estimates
are given. Equations (10.5) and (10.8) have again been used to estimate the
variance of the frailty effects for lognormal frailty and the variance of the
random effects for gamma frailty.
As with the Weibull model, some parameter estimates are larger when a
frailty term is added and the standard errors are larger, but the parameter
estimates in Tables 10.3 and 10.4 are broadly similar. As for the Weibull
model, the gamma frailty model leads to the smallest value of the −2 log L̂
statistic and is a better fit to the data.
Estimates of frailty effects can be obtained using the approach outlined in
Section 10.4. For example, from estimates of the frailty terms, ẑi , for the Cox
model with lognormal frailty, there are 7 patients with values of ẑi greater
than 3, namely patients 11, 36, 69, 70, 87, 113 and 163. The survival times
of these patients are 16, 21, 3, 4, 38, 22 and 35 days respectively, and so the
patients with largest frailty are those whose survival is shortest, as expected.
where f(u_i) is the probability density function of U_i. Over the g groups, the
likelihood function is

L(β) = \prod_{i=1}^{g} \int_0^{∞} L_i(β)\, f(u_i)\, du_i.
As in the case of individual frailty effects, the integration can only be carried
out analytically if the frailty effects have a Γ(θ, θ) distribution. In this case,
the likelihood function for the ith group is
\prod_{j=1}^{n_i} \{e^{β′x_{ij}} h_0(t_{ij})\}^{δ_{ij}}\; \frac{θ^θ\, Γ(θ + d_i)}{Γ(θ)\{θ + \sum_j \exp(β′x_{ij}) H_0(t_{ij})\}^{θ+d_i}},
where d_i = \sum_j δ_{ij} is the number of deaths in the ith group. The corresponding
log-likelihood over the g groups is
log L(β) = \sum_{i=1}^{g} \{\log Γ(θ + d_i) − \log Γ(θ) − d_i \log θ\}
 − \sum_{i=1}^{g} (θ + d_i) \log\left\{1 + θ^{−1} \sum_{j=1}^{n_i} \exp(β′x_{ij}) H_0(t_{ij})\right\}
 + \sum_{i=1}^{g} \sum_{j=1}^{n_i} δ_{ij}\, [β′x_{ij} + \log h_0(t_{ij})].
By analogy with the case of individual frailty effects, the estimated frailty
effect for the ith group is

ẑ_i = \frac{θ̂ + d_i}{θ̂ + \sum_{j=1}^{n_i} \exp(β̂′x_{ij})\, Ĥ_0(t_{ij})}.
On fitting a Weibull model containing cold ischaemic time, and the age and
diabetic status of the recipient, the value of the −2 log L̂ statistic is 1352.24.
To take account of donor effects, a lognormal shared frailty effect is added,
and the −2 log L̂ statistic decreases slightly to 1351.60. This reduction of 0.64
on 1 d.f. is not significant (P = 0.42) when compared to a χ² distribution on
1 d.f., and remains non-significant (P = 0.21) when compared to the percentage
points of the 0.5(χ₀² + χ₁²) mixture distribution.
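The two P-values quoted here can be reproduced with a few lines of code; the sketch below is an illustration, not taken from the book.

# A quick check of the quoted P-values: when a variance component is tested
# on the boundary of its parameter space, the reference distribution is the
# 50:50 mixture 0.5(chi2_0 + chi2_1) rather than chi-squared on 1 d.f.
from scipy.stats import chi2

stat = 0.64                       # change in -2 log L on adding the frailty
p_chi1 = chi2.sf(stat, df=1)      # naive chi-squared(1) P-value: 0.42
p_mixture = 0.5 * p_chi1          # mixture P-value: 0.21
print(round(p_chi1, 2), round(p_mixture, 2))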
When a lognormal frailty is added to a Cox regression model that con-
tains the same explanatory variables, the maximised marginal log-likelihood,
multiplied by −2, is 831.469, whereas for the corresponding Cox regression
model without frailty, −2 log L̂ = 831.868. Again, the reduction on adding a
frailty effect is not significant. Very similar results are obtained when donor
effects are modelled using a shared gamma frailty term, and we conclude that
there is no reason to include donor effects in the model. The estimated coef-
ficients of the three explanatory variables are very similar under a Cox and
Weibull model, and are hardly affected by including random donor effects.
We conclude that taking proper account of the donor effects has not materi-
ally affected inferences about the effect of explanatory factors on short-term
survival of these transplant recipients.
10.8 Some other aspects of frailty modelling

A number of other issues that arise in connection with frailty modelling are
discussed briefly in this section.
10.8.1 Model checking

The techniques for model checking described in Chapters 4 and 7 may reveal
outlying observations, influential values and inadequacies in the functional
form of covariates. In addition, an informal way of assessing the adequacy
of a parametric model with gamma frailty is to compare the observed sur-
vivor function with the observable function derived from a frailty model, as
in Example 10.2. In more complex problems, the baseline survivor function
estimated from fitting a Cox model with the same explanatory variables as
the frailty model, can be compared to the observable survivor function for
an individual with all explanatory variables set to zero. For other choices of
frailty distribution the observable survivor function can only be determined
numerically, and so this approach is not so useful.
In frailty modelling, the validity of the chosen frailty distribution may
also be critically examined. Two particular distributional models have been
described in this chapter, the gamma and lognormal distributions, but there
are a number of other possibilities. Which of these frailty distributions is
chosen is often guided by mathematical convenience and the availability of
statistical software. One approach to discriminating between alternative frailty
distributions is to consider a general family that includes specific distributions
as special cases. The power variance function (PVF) distribution is widely
used in this context. This distribution has a density function that is a function
of three parameters, α, δ and θ, and whose mean and variance are given by

E(Z_i) = δθ^{α−1},   var(Z_i) = δ(1 − α)θ^{α−2},

for θ > 0, 0 < α ⩽ 1, δ > 0. Setting δ = θ^{1−α} gives a distribution with unit
mean and variance (1 − α)/θ. When α = 0, the distribution reduces to a
gamma density with variance θ−1 and when α = 0.5 to an inverse Gaussian
distribution. Tests of hypotheses about the value of α can then inform model
choice.
Another approach to model checking is to assess the sensitivity of key
model-based inferences to the specific choice of frailty model. Comparing haz-
ard ratios from a fully parametric model with those from a Cox regression
model that has the same frailty distribution can also be valuable.
10.8.2 Correlated frailty models
An important extension of shared frailty models is to the situation where the
frailties of individuals within a group are not identical as in the shared frailty
model, but are merely correlated. Such models are particularly relevant when
interest centres on the association between event times, as might be the case
when studying event times of paired organs, or amongst twins for example. In
the bivariate frailty case, correlated frailty can be modelled by extending the
model in Equation (10.30) so that each member of the ith pair has a separate,
but correlated, frailty term in the hazard function.
Non-proportional hazards

Figure 11.1 Survivor functions for two treatment groups, in which the treatment
difference is roughly constant after two years.
some particular time. For example, in the study leading to the survivor func-
tions illustrated in Figure 11.1, the treatment difference is roughly constant
after two years. The dependence of the probability of survival beyond two
years on prognostic variables and treatment might therefore be modelled. This
approach was discussed in connection with the analysis of interval-censored
survival data in Section 9.2. As shown in that section, there are advantages
in using a linear model for the complementary log-log transformation of the
survival probability. In particular, the coefficients of the explanatory variables
in the linear component of the model can be interpreted as logarithms of haz-
ard ratios. The disadvantages of this approach are that all patients must be
followed until the point in time when the survival rates are to be analysed,
and that the death data cannot be used until this time. Moreover, faith in
the long-term benefits of one or other of the two treatments will be needed to
ensure that the trial is not stopped early because of excess mortality in one
treatment group.
Strictly speaking, an analysis based on the survival probability at a partic-
ular time is only valid when that time is specified at the outset of the study,
which may be difficult to do. If the data are used to suggest end-points such
as the probability of survival beyond two years, some caution will be needed
in interpreting the results of a significance test.
In the study that leads to the survivor functions shown in Figure 11.1,
it is clear that an analysis of the two-year survival rate will be appropriate.
Now consider a study to compare the use of chemotherapy in addition to
surgery with surgery alone, in which the survivor functions are as shown in
Figure 11.2.

Figure 11.2 Short-term advantage of chemotherapy and surgery (—) over surgery
alone (·······).

Here, the short-term benefit of the chemotherapy may certainly be
worthwhile, but an analysis of the two-year survival rates will fail to establish
a treatment difference. The fact that the difference between the two survival
rates is not constant makes it difficult to use an analysis based on survival
rates at a given time. However, it might be reasonable to assume that the
hazards are proportional over the first year of the study, and to carry out a
survival analysis at that time.
Figure 11.3 Hazard functions for individuals on a new drug (—) and a standard
drug (·······) in two centres, Centre A and Centre B.
On fitting this model, the estimated value of β is the log-hazard ratio for a
patient on the new treatment, relative to one on the standard, in each centre.
This model for stratified proportional hazards is easily fitted using stan-
dard software packages for survival analysis, and nested models can be com-
pared using the −2 log L̂ statistic. Apart from the fact that the stratifying
variable cannot be included in the linear part of the model, no new principles
are involved. When two or more groups of survival data are being compared,
the stratified proportional hazards model is in fact equivalent to the stratified
log-rank test described in Section 2.8 of Chapter 2.
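In practice, most survival analysis software allows a stratifying variable to be declared directly. The following minimal sketch, using the lifelines package in Python with an invented data frame and column names, fits a proportional hazards model stratified by centre.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Invented data: time, status (1 = death), treat (1 = new drug)
# and the stratifying variable, centre.
df = pd.DataFrame({
    "time":   [5, 8, 12, 3, 9, 14, 7, 11, 4, 10],
    "status": [1, 1, 0, 1, 0, 1, 1, 0, 1, 1],
    "treat":  [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "centre": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
})

# Stratifying on centre gives each centre its own baseline hazard,
# while a common log-hazard ratio is estimated for treatment.
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="status", strata=["centre"])
cph.print_summary()  # exp(coef) is the within-centre hazard ratio
```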
11.2.1 Non-proportional hazards between treatments
If there are non-proportional hazards between two treatments, misleading in-
ferences can result from ignoring this phenomenon. To illustrate this point,
suppose that the hazard functions for two groups of individuals, on a new and a standard treatment, are as shown in Figure 11.4 (i). If a proportional hazards
model were fitted, the resulting fitted hazard functions are likely to be as
shown in Figure 11.4 (ii). Incorrect conclusions would then be drawn about
the relative merit of the two treatments.
Figure 11.4 (i) Non-proportional hazards and (ii) the result of fitting a propor-
tional hazards model for individuals on a new treatment (—) and a standard treat-
ment (·······).
Suppose that the hazard of death at time t for the ith individual is
hi(t) = exp{β1 xi + β2 x2i(t) + β3 x3i(t)}h0(t),   (11.1)
where xi is the value of X for the ith individual, and x2i(t) and x3i(t) are the values of the two time-dependent variables for the ith individual at t.
Under this model, the log-hazard ratio for an individual on the new treatment,
relative to one on the standard, is then β1 for t ∈ (0, t1 ], β1 + β2 for t ∈ (t1 , t2 ]
and β1 + β3 for t ∈ (t2 , t3 ]. This model can be fitted in the manner described
in Chapter 8.
The model in Equation (11.1) allows the assumption of proportional haz-
ards to be tested by adding the variables x2i (t) and x3i (t) to the model that
contains xi alone. A significant decrease in the value of the −2 log L̂ statistic
would indicate that the hazard ratio for the new treatment (X = 1) relative
to the standard (X = 0) was not constant. An equivalent formulation of the
model in Equation (11.1) is obtained by defining x1i(t) to be the value of
X1(t) = 1 if t ∈ (0, t1] and X = 1, and X1(t) = 0 otherwise,
for the ith individual, and fitting a model containing x1i (t), x2i (t) and x3i (t).
The coefficients of the three time-dependent variables in this model are then
the log-hazard ratios for the new treatment relative to the standard in each of
the three intervals. Confidence intervals for the hazard ratio can be obtained
directly from this version of the model.
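One way of fitting a model with interval-specific treatment effects of this kind is to split each individual's follow-up at the cut points and fit a Cox model to the resulting start-stop records. The sketch below is one possible implementation in Python with the lifelines package; the subjects, the cut points t1 = 1 and t2 = 2, and the column names are all invented for illustration.

```python
import pandas as pd
from lifelines import CoxTimeVaryingFitter

# Split each subject's follow-up at the cut points t1 = 1 and t2 = 2,
# switching on x1, x2, x3 in successive intervals for treated subjects.
def split_subject(sid, time, status, x):
    rows, start = [], 0.0
    for j, cut in enumerate([1.0, 2.0, float("inf")]):
        stop = min(time, cut)
        if stop <= start:
            break
        rows.append({"id": sid, "start": start, "stop": stop,
                     "event": int(status == 1 and stop == time),
                     "x1": int(x == 1 and j == 0),
                     "x2": int(x == 1 and j == 1),
                     "x3": int(x == 1 and j == 2)})
        start = stop
    return rows

# (id, time, status, treatment) for seven invented subjects
raw = [(1, 0.7, 1, 1), (2, 1.5, 1, 1), (3, 2.8, 1, 1), (4, 0.9, 1, 0),
       (5, 2.2, 1, 0), (6, 3.0, 0, 0), (7, 1.8, 1, 0)]
long = pd.DataFrame([row for s in raw for row in split_subject(*s)])

# The coefficients of x1, x2 and x3 are the log-hazard ratios for the
# new treatment in the three intervals.
ctv = CoxTimeVaryingFitter()
ctv.fit(long, id_col="id", event_col="event",
        start_col="start", stop_col="stop")
ctv.print_summary()
```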
Figure 11.5 Estimates of the survivor functions for gastric cancer patients on
chemotherapy alone (—) or a combination of chemotherapy and radiotherapy (·······).
Figure 11.6 Log-cumulative hazard plot for gastric cancer patients on chemotherapy
alone (—) or a combination of chemotherapy and radiotherapy (·······).
on adding the treatment effect to a null model is 0.64. We would then have
concluded there was no evidence of a treatment effect, but since the hazards
of death on the two treatments are not proportional, this analysis is incorrect.
A more appropriate summary of the treatment effect is obtained using a
piecewise Cox regression model, where the treatment effect is assumed con-
stant in each of a number of separate intervals, but differs across the intervals.
Four time intervals will be used in this analysis, namely 1–360, 361–720, 721–
1080 and 1081– days. A time-dependent treatment effect is then set up by
defining four variables, X1 (t), X2 (t), X3 (t), X4 (t), where Xj (t) = 1 when t is
within the jth time interval for a patient on the combined treatment, and
zero otherwise, for j = 1, 2, 3, 4. This is equivalent to fitting an interaction between a time-dependent variable associated with the four intervals and the treatment effect. If xji(t) is the value of the jth variable for the ith individual
at time t, the Cox regression model for the hazard of death at time t is
hi (t) = exp {β1 x1i (t) + β2 x2i (t) + β3 x3i (t) + β4 x4i (t)} h0 (t),
where h0 (t) is the baseline hazard function. In this model, the four β-
coefficients are the log-hazard ratios for the combined treatment relative to
chemotherapy alone in the four intervals, and h0 (t) is the hazard function for
a patient on the chemotherapy treatment.
On fitting all four time-dependent variables, the value of −2 log L̂ is
602.372. For the model in which a treatment effect alone is fitted, on the
assumption of proportional hazards, the hazard of death for the ith patient
at time t is hi (t) = exp(βxi )h0 (t), where xi is the value of a variable X
for the ith patient, where X = 0 for a patient on the chemotherapy treat-
ment and X = 1 for a patient on the combined treatment. On fitting this
model, −2 log L̂ = 614.946. The two models are nested and the difference in
their −2 log L̂ values provides a test of proportional hazards. This difference
of 12.57 on 3 d.f. is highly significant, P = 0.005, confirming that there is
clear evidence of non-proportionality in the hazard ratio for the treatment ef-
fect. On fitting the four time-dependent variables, the hazard ratios and 95%
confidence intervals are as shown in Table 11.2.
This table summarises how the treatment effect varies over the four time
intervals. In the first year, patients on the combined treatment have over
twice the risk of death at any time, compared to those on the chemotherapy
treatment alone. In subsequent years, patients on the combined treatment have
a reduced hazard of death, although the three interval estimates all include
unity, suggesting that these hazard ratios are not significantly different from 1.
so that
∫_0^{t0} u f(u) du = ∫_0^{t0} S(u) du − t0 S(t0).
The restricted mean survival time is therefore the area under the estimated
survivor function to t0 , and is an easily understood summary statistic. For
example, if time is measured in months, µ(24) is the average number of months
survived over a 24-month period, and so is the two-year life expectancy. This
statistic can also be used to summarise the effect of a treatment parameter or
other explanatory variables on life expectancy over a defined period of time.
The restricted mean survival can be determined from the Kaplan-Meier
estimate of the survivor function. For example, the estimated restricted mean
at the rth ordered event time, t(r) , is
μ̂(t(r)) = Σ_{j=1}^{r} Ŝ(t(j))(t(j) − t(j−1)),
where
Aj = Σ_{i=j}^{r−1} Ŝ(t(i))(t(i+1) − t(i)).
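As a small numerical sketch, the summation formula for μ̂(t(r)) above is easily evaluated from a Kaplan-Meier estimate; the event times and survivor function values below are invented.

```python
import numpy as np

# Invented Kaplan-Meier estimate: ordered event times t_(1) < ... < t_(r)
# and the estimated survivor function at each of them.
event_times = np.array([3.0, 6.0, 10.0, 17.0])
km_surv = np.array([0.90, 0.75, 0.60, 0.45])

def restricted_mean(times, surv):
    # mu(t_(r)) = sum_{j=1}^{r} S(t_(j)) * (t_(j) - t_(j-1)), with
    # t_(0) = 0, following the formula in the text.
    widths = np.diff(np.concatenate(([0.0], times)))
    return float(np.sum(surv * widths))

print(restricted_mean(event_times, km_surv))  # area under the KM curve
```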
at a given time, where the observed failure rate is obtained from the Kaplan-
Meier estimate of the survivor function for a specific institution at a given
time, and the overall failure rate at that time is estimated from the survivor
function fitted to the data across all institutions, ignoring differences between
them. The expected failure rate for an institution can be estimated from the
risk adjusted survivor function, defined in Section 3.11 of Chapter 3, which is
the average of the estimated survivor functions for individuals within an insti-
tution, at a given time, based on a risk adjustment model. Estimates of each
failure rate at a specified time are obtained by subtracting the corresponding
value of the estimated survivor function from unity. Once the RAFR has been
obtained, the analogous risk adjusted survival rate, or RASR, can simply be
found from RASR = 1− RAFR.
There are 1439 patients in the data set, and data for the first 30 patients
transplanted in the time period are shown in Table 11.3. The transplanted
kidney did not function in patient 7, and so Tsurv = 0 for this patient.
Of particular interest is the transplant failure rate at one year, and so the
eight transplant centres will be compared using this metric. We first obtain
the unadjusted Kaplan-Meier estimate of the survivor function across all 1439
patients, from which the overall one-year failure rate is 1 − 0.904 = 0.096. The
centre-specific one-year failure rates are similarly obtained from the Kaplan-
Meier estimate of the survivor function for patients in each centre. Next,
the risk adjusted survival rate in each centre is calculated using the method
described in Section 3.11.1 of Chapter 3. A Cox regression model that contains
the variables Dage, Dtype, Rage, Diab, and CIT, is fitted to the data, from
which the estimated survivor function for the ith patient in the jth centre is
Ŝij (t) = {Ŝ0 (t)}exp(η̂ij ) , (11.4)
where
η̂ij = 0.023 Dage ij + 0.191 Dtype ij + 0.002 Rage ij − 0.133 Diab ij + 0.016 CIT ij ,
and Ŝ0 (t) is the estimated baseline survivor function. The average survivor
function at time t in the jth centre, j = 1, 2, . . . , 8, is
Ŝj(t) = (1/nj) Σ_{i=1}^{nj} Ŝij(t),
where nj is the number of patients in the jth centre, and the expected trans-
plant failure rate at one year can then be estimated from this survivor function.
The unadjusted and adjusted one-year survival rates are shown in Table 11.4,
together with the RAFR.
To illustrate the calculation, for Centre 1, the unadjusted and adjusted one-year survival rates are 0.9138 and 0.8920, respectively, and since the overall failure rate is 0.096, the RAFR is
{(1 − 0.9138)/(1 − 0.8920)} × 0.096 = 0.077.
The corresponding risk adjusted transplant survival rate for this centre is
0.923. Across the 8 centres, the risk adjusted one-year failure rates vary between 6% and 17%.
Table 11.4 Values of RAFR found from risk adjusted failure rates.
Centre   Patients   Unadjusted survival   Adjusted survival   RAFR
1 267 0.9138 0.8920 0.0768
2 166 0.8887 0.9084 0.1170
3 148 0.8986 0.9150 0.1148
4 255 0.9254 0.9061 0.0764
5 164 0.9024 0.9002 0.0941
6 160 0.9438 0.9068 0.0581
7 102 0.8922 0.8947 0.0985
8 177 0.8475 0.9128 0.1682
Table 11.5 Values of RAFR found from the observed and expected numbers of
transplant failures.
Centre   Patients   Observed failures   Expected failures   RAFR
1 267 23 28.96 0.0764
2 166 18 14.93 0.1160
3 148 15 12.48 0.1157
4 255 19 24.23 0.0755
5 164 16 16.68 0.0923
6 160 9 15.38 0.0563
7 102 11 10.67 0.0992
8 177 27 14.66 0.1772
expected number is 28.96. The overall failure rate is 0.096, and so the RAFR
for this centre is
RAFR = (23/28.96) × 0.096 = 0.076.
The RAFR values in this table are very similar to those given in Table 11.4.
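Both versions of this calculation are easily reproduced; the following Python sketch uses the Centre 1 figures quoted above, and any small differences from the tabulated values are due to rounding of the inputs.

```python
# Risk adjusted failure rate (RAFR) for Centre 1, calculated in the
# two equivalent ways described above.
overall_failure_rate = 0.096

# (i) From the unadjusted and risk adjusted one-year survival rates:
unadjusted_surv, adjusted_surv = 0.9138, 0.8920
rafr_from_rates = ((1 - unadjusted_surv) / (1 - adjusted_surv)) * overall_failure_rate
print(round(rafr_from_rates, 3))   # 0.077

# (ii) From the observed and expected numbers of transplant failures:
observed, expected = 23, 28.96
rafr_from_counts = (observed / expected) * overall_failure_rate
print(round(rafr_from_counts, 3))  # 0.076
```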
Σ_{k=0}^{y} e^{−µ}µ^k/k! = 1 − ∫_0^{µ} e^{−x}x^y/Γ(y + 1) dx.   (11.7)
P(Y ≥ y) = 1 − P(Y ≤ y − 1) = 1 − Σ_{k=0}^{y−1} e^{−yL}yL^k/k! = ∫_0^{yL} e^{−x}x^{y−1}/Γ(y) dx.
This means that yL is the lower 2.5% point of a gamma random variable
with shape parameter y and unit scale parameter, and so can be obtained
from the inverse cumulative distribution function for the gamma distribution.
Similarly, the upper limit of the 95% confidence interval, yU, is the expected value of a Poisson random variable for which P(Y ≤ y) = 0.025. Again using Equation (11.7),
P(Y ≤ y) = Σ_{k=0}^{y} e^{−yU}yU^k/k! = 1 − ∫_0^{yU} e^{−x}x^y/Γ(y + 1) dx,
so that
∫_0^{yU} e^{−x}x^y/Γ(y + 1) dx = 0.975,
and yU is the upper 2.5% point of a gamma distribution with shape parameter
y + 1 and unit scale parameter.
As an illustration, suppose that the observed number of events in a partic-
ular institution is y = 9, as it is for Centre 6 in the data on kidney transplant
failure rates in Example 11.3. The lower 2.5% point of a gamma distribu-
tion with shape parameter 9 and unit scale parameter is yL = 4.12, and so
P(Y ≥ 9) = 0.025 when Y has a Poisson distribution with mean µ = 4.12.
Also, the upper 2.5% point of a gamma distribution with shape parameter
10 and unit scale parameter is yU = 17.08, and so P(Y ≤ 9) = 0.025 when
Y has a Poisson distribution with mean 17.08. These two distributions are
shown in Figure 11.7. The tail area to the right of y = 9 in the distribution
with mean 4.12, and the tail area to the left of y = 9 in the distribution with
mean 17.08, are both equal to 0.025. An exact 95% interval estimate for the
observed number of events is then (4.12, 17.08).
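These gamma percentage points are available from the inverse cumulative distribution function in standard software; for example, using scipy in Python:

```python
from scipy.stats import gamma

y = 9  # observed number of events, as for Centre 6

# y_L is the lower 2.5% point of a Gamma(y, 1) distribution and
# y_U the upper 2.5% point of a Gamma(y + 1, 1) distribution.
y_lower = gamma.ppf(0.025, a=y)
y_upper = gamma.ppf(0.975, a=y + 1)
print(y_lower, y_upper)  # approximately 4.12 and 17.08
```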
Figure 11.7 Poisson distributions with means 4.12 and 17.08 (- - -) for deriving exact confidence limits for an observed value of 9 (·······).
Once an interval estimate for the number of failures, (yL , yU ), has been
obtained using either the approximate or exact method, corresponding limits
for the RAFR are
(yL/ej) × overall failure rate   and   (yU/ej) × overall failure rate,
where ej is the estimated number of failures in the jth institution, obtained using Equation (11.6), but taken to have negligible error.
Example 11.5 Comparisons between kidney transplant centres
For the data on transplant outcomes in eight kidney transplant centres, given
in Example 11.3, approximate and exact 95% confidence limits for the RAFR
are shown in Table 11.6.
Table 11.6 Approximate and exact 95% interval estimates for the RAFR.
Centre   RAFR   Approx. lower   Approx. upper   Exact lower   Exact upper
1 0.0764 0.051 0.115 0.048 0.115
2 0.1160 0.073 0.184 0.069 0.183
3 0.1157 0.070 0.192 0.065 0.191
4 0.0755 0.048 0.118 0.045 0.118
5 0.0923 0.057 0.151 0.053 0.150
6 0.0563 0.029 0.108 0.026 0.107
7 0.0992 0.055 0.179 0.050 0.178
8 0.1772 0.122 0.258 0.117 0.258
To illustrate how these interval estimates are calculated, consider the data
for Centre 1, for which y1 = 23 and the corresponding expected number of
deaths is, from Example 11.4, estimated to be 28.96. The standard error of the logarithm of the number of transplant failures in this centre is 1/√y1 = 1/√23 = 0.209. A 95% confidence interval for y1 is then exp(log 23 ± 1.96 × 0.209), that is (15.28, 34.61), and the corresponding interval for the RAFR is
(15.28/28.96 × 0.096, 34.61/28.96 × 0.096),
that is (0.051, 0.115). Exact limits for the number of failures in this centre
can be found from the lower 2.5% point of a Γ(23, 1) random variable and
the upper 2.5% point of a Γ(24, 1) random variable. This leads to the interval
(14.58, 34.51), and corresponding limits for the RAFR are
(14.58/28.96 × 0.096, 34.51/28.96 × 0.096).
In this model, the term log{ej /(overall failure rate)} is a variate with a known
coefficient of unity, called an offset. When the log-linear model in Equa-
tion (11.9) is fitted to the observed number of failures in each institution,
yj , the model has the same number of unknown parameters as there are ob-
servations. It will therefore be a perfect fit to the data, and so the fitted values
µ̂j will be equal to the observed number of failures. The parameter estimates
ĉj will be the fitted values of log E (Fj ), so that the RAFR for the jth centre is
exp(ĉj ). A 95% confidence interval for the RAFR is then exp{ĉj ±1.96 se (ĉj )},
where ĉj and its standard error can be obtained from computer output from
fitting the log-linear model.
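A sketch of this fit using the statsmodels package in Python, with the observed and expected numbers of failures from Table 11.5, is shown below; small discrepancies from Table 11.7 arise because the inputs here are rounded.

```python
import numpy as np
import statsmodels.api as sm

# Observed and expected numbers of transplant failures (Table 11.5)
# and the overall failure rate.
y = np.array([23, 18, 15, 19, 16, 9, 11, 27])
e = np.array([28.96, 14.93, 12.48, 24.23, 16.68, 15.38, 10.67, 14.66])
overall_rate = 0.096

# One indicator per centre gives a saturated model, and the offset
# log{e_j / (overall failure rate)} has a known coefficient of unity,
# so that exp(c_j) estimates the RAFR for the jth centre.
X = np.eye(len(y))
offset = np.log(e / overall_rate)
fit = sm.GLM(y, X, family=sm.families.Poisson(), offset=offset).fit()

print(np.round(np.exp(fit.params), 4))      # RAFR estimates
print(np.round(np.exp(fit.conf_int()), 3))  # 95% limits for the RAFR
```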
Table 11.7 The RAFR and interval estimates from fitting a log-linear model.
Centre   ĉj   se(ĉj)   RAFR (eĉj)   Lower 95% limit   Upper 95% limit
1 −2.571 0.209 0.0764 0.051 0.115
2 −2.154 0.236 0.1160 0.073 0.184
3 −2.157 0.258 0.1157 0.070 0.192
4 −2.584 0.229 0.0755 0.048 0.118
5 −2.382 0.250 0.0923 0.057 0.151
6 −2.877 0.333 0.0563 0.029 0.108
7 −2.310 0.302 0.0992 0.055 0.179
8 −1.730 0.192 0.1772 0.122 0.258
In a random effects formulation, the fixed centre effects cj in Equation (11.9) are replaced by α + cj, where now the cj have a N(0, σc²) distribution, and α is the overall value of the logarithm of the RAFR, so that eα is the overall failure rate.
Using random rather than fixed effects has two consequences. First, the
estimated institution effects, c̃j , are ‘shrunk’ towards the overall rate, α̃, and
the more extreme the RAFR, the greater the shrinkage. Second, interval esti-
mates for institution effects in the random effects model will be shorter than
when fixed effects are used, increasing the precision of prediction for future
patients. Both these features are desirable when using centre rates to guide
patient choice. The concept of shrinkage was referred to in Section 3.7 of
Chapter 3, when describing the lasso method for variable selection.
Table 11.8 The RAFR and interval estimates using random centre ef-
fects in a log-linear model.
Centre   c̃j   se(c̃j)   RAFR (ec̃j)   Lower 95% limit   Upper 95% limit
1 −2.471 0.170 0.084 0.061 0.118
2 −2.252 0.186 0.105 0.073 0.151
3 −2.261 0.194 0.104 0.071 0.152
4 −2.468 0.179 0.085 0.060 0.120
5 −2.360 0.181 0.094 0.066 0.135
6 −2.537 0.228 0.079 0.051 0.124
7 −2.330 0.199 0.097 0.066 0.144
8 −2.000 0.232 0.135 0.086 0.214
The estimates of the RAFR values on using a random effects model are
similar to those found with the fixed effects model shown in Table 11.7, but
closer to the overall rate of 0.096. This is particularly noticeable for Centre 8,
which has the largest RAFR. Also, the values of se (c̃j ) are generally smaller,
which in turn means that the corresponding interval estimates are narrower.
These two features illustrate the shrinkage effect caused by using random
centre effects.
Competing risks
In studies where the outcome is death, individuals may die from one of a
number of different causes. For example, in a study to compare two or more
treatments for prostatic cancer, patients may succumb to a stroke, myocar-
dial infarction or the cancer itself. In some cases, an analysis of death from all
causes may be appropriate, and standard methods for survival analysis can
be used. More commonly, there will be interest in how the hazard of death
from different causes depends on treatment effects and other explanatory vari-
ables. Of course, death from any one of a number of possible causes precludes
its occurrence from any other cause, and this feature has implications for the
analysis of data of this kind. In this chapter, we review methods for summaris-
ing data on survival times for different causes of death, and describe models
for cause-specific survival data.
of competing risks. Estimates of the effect of such factors on the hazard of
death from a particular cause, allowing for possible competing causes, may also
be needed. In other situations, it will be of interest to compare survival times
across different causes of death to identify those causes that lead to earlier
or later failure times. An assessment of the consistency of estimated hazard
ratios for certain factors across different end-points may also be needed. An
example of a data set with multiple end-points follows.
Table 12.1 Graft survival times for 20 liver transplant recipients and the cause of
graft failure.
Patient Age Gender Primary disease Time Status Cause of failure
1 55 1 ALD 2906 0 0
2 63 1 PBC 4714 0 0
3 67 1 PBC 4673 0 0
4 58 1 ALD 27 0 0
5 59 1 PBC 4720 0 0
6 35 2 PBC 4624 0 0
7 51 2 PBC 18 1 1 Rejection
8 61 2 PSC 294 1 1 Rejection
9 51 2 ALD 4673 0 0
10 59 2 ALD 51 0 0
11 53 1 PSC 8 1 2 Thrombosis
12 56 1 ALD 4592 0 0
13 55 2 ALD 4679 0 0
14 44 1 ALD 1487 0 0
15 61 2 PBC 427 1 4 Other
16 59 2 PBC 4604 0 0
17 52 2 PSC 4083 1 3 Recurrent disease
18 61 1 PSC 559 1 3 Recurrent disease
19 57 1 PSC 4708 0 0
20 49 2 ALD 957 1 4 Other
from graft rejection and thrombosis are as follows, where an asterisk (∗) denotes a censored observation. For failure from rejection, the graft survival times are:
18, 294, 8∗, 27∗, 51∗, 427∗, 559∗, 957∗, 1487∗, 2906∗, 4083∗, 4592∗, 4604∗, 4624∗, 4673∗, 4673∗, 4679∗, 4708∗, 4714∗, 4720∗,
and for failure from thrombosis, the graft survival times are:
8, 18∗, 27∗, 51∗, 294∗, 427∗, 559∗, 957∗, 1487∗, 2906∗, 4083∗, 4592∗, 4604∗, 4624∗, 4673∗, 4673∗, 4679∗, 4708∗, 4714∗, 4720∗.
Although Sj†(t) = exp{−Hj(t)} has the form of a survivor function, Sj†(t) is not an observable survivor function. This is because we can never know the cause of a death that may occur after time t, when there is more than one possible cause.
Survival studies where there are competing risks can also be formulated
in terms of m random variables, T1 , T2 , . . . , Tm , that are associated with the
times to the m possible causes of failure. These random variables cannot be
observed directly, since only one event can occur, and so they are referred to
as latent random variables. In practice, we observe the earliest of the m events
to occur, and the random variable associated with the time to that event, T , is
such that T = min(T1 , T2 , . . . , Tm ). If the different causes are independent, the
hazard function in Equation (12.2) is the marginal hazard function associated
with Tj , the random variable for the jth event type, which is
lim_{δt→0} P(t ≤ Tj ≤ t + δt | Tj ≥ t)/δt.
Unfortunately, the joint distribution of the random variables associated with
times to the different causes, (T1 , T2 , . . . , Tm ), cannot be uniquely determined,
and it is not possible to use the competing risks data to test the assumption of
independence of the different causes. Consequently, this formulation of com-
peting risks will not be considered further.
Figure 12.1 Cumulative incidence of the four causes of graft failure, rejection (—),
thrombosis ( ), recurrent disease (·······) and other reasons (- - -).
ALD, are shown in Figure 12.2. The incidence of thrombosis is quite similar
in patients with PBC and PSC, and greater than that for patients with ALD.
However, a more formal analysis is needed to determine the significance of
these differences.
The cumulative incidence functions provide descriptive summaries of com-
peting risks data, but they can be supplemented by analogues of the log-rank
test for comparing two or more groups, in the presence of competing risks.
These tests include Gray’s test (Gray, 1988) and the method due to Pepe and
Mori (1993). Further details are not given here as an alternative procedure
is available through a modelling approach to the analysis of competing risks
data, and possible models are described in Sections 12.4 and 12.5.
where πj is the probability of the jth cause occurring. Also, the probability
of death from cause j when death has occurred before time t is
P(C = j | T < t) = Fj(t)/F(t),
Figure 12.2 Cumulative incidence of thrombosis for recipients with PBC (—), PSC
(·······) or ALD (- - -) at the time of transplant.
where F(t) = 1 − S(t), and S(t) is the overall survivor function. Estimates of each of these probabilities can be readily obtained from the estimated cumulative incidence function.
where h0j (t) is the baseline hazard for the jth cause, xi is the vector of values
of the explanatory variables for the ith individual, and β j is the vector of
their coefficients for the jth cause.
From results given in the sequel, separate models can be developed for
each cause of death from the cause-specific survival data, illustrated in Ex-
ample 12.2. To do this, a set of survival data is produced for each cause in
turn, where death from that cause is an event and the death times for all
other causes are regarded as censored. Inferences about the impact of each
explanatory variable on the cause-specific hazard function can then be based
on hazard ratios in the usual way, using either a Cox regression model or a
parametric model.
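This amounts to looping over the causes, recoding the event indicator for each, and fitting a standard model every time. The following sketch uses the lifelines package in Python with a small invented data set; the times, causes and the single covariate are illustrative only.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Invented competing risks data: cause = 0 for a censored time,
# and 1, 2, ... for the different causes of death.
df = pd.DataFrame({
    "time":  [18, 294, 8, 427, 559, 957, 1487, 2906, 4083, 4592],
    "cause": [1, 1, 2, 3, 3, 3, 0, 0, 0, 0],
    "age":   [51, 61, 53, 61, 61, 49, 44, 55, 52, 56],
})

# For each cause j, deaths from that cause are events and all other
# times (censored or competing causes) are treated as censored.
for j in sorted(set(df["cause"]) - {0}):
    cs = df.assign(status=(df["cause"] == j).astype(int))
    cph = CoxPHFitter()
    cph.fit(cs[["time", "status", "age"]],
            duration_col="time", event_col="status")
    print(f"cause {j}:", cph.hazard_ratios_.to_dict())
```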
In modelling cause-specific hazards, the event times of individuals who
experience a competing risk are censored, and so are treated as if there is
the possibility of the event of interest occurring in the future. Consequently,
the estimated hazard ratios correspond to the situation where other causes of
death are removed or assumed not to occur. This can lead to the hazard of a
particular cause of failure being overestimated. Also, models for cause-specific
survival data are based on the usual assumption of independent censoring. If
the competing events do not occur independently of the event of interest, this
assumption is not valid. Unfortunately the assumption of independent compet-
ing risks cannot be tested using the observed data. Despite these drawbacks,
this approach may be warranted when interest centres on how the explanatory
variables directly influence the hazard associated with a particular cause of
death, ignoring deaths from other causes.
When there is just one event type, the survivor function, and hence the
cumulative incidence function, can be obtained from the hazard function using
Equations (1.7) and (1.6) of Chapter 1. The impact of a change in the value
of an explanatory variable on the hazard function can then be interpreted
in terms of the effect of this change on the cumulative incidence function.
However, in the presence of competing risks, the cumulative incidence function
for any cause depends on the hazard of occurrence of each potential cause
of death, as indicated in Equation (12.5). This means that we cannot infer
how explanatory variables affect the cumulative incidence of each cause from
analysing cause-specific survival data. For this, we need to model the Fj (t)
directly, which we return to in Section 12.5.
exp(βj′xi) / Σ_{l∈R(ti)} exp(βj′xl),   (12.6)
where R(ti ) is the risk set at time ti , that is the set of individuals who are
alive and uncensored just prior to ti . Setting δij = 1 if the ith individual
dies from the jth cause, and zero otherwise, the partial likelihood function in
Expression (12.6) can be written as
∏_{j=1}^{m} { exp(βj′xi) / Σ_{l∈R(ti)} exp(βj′xl) }^{δij},
which is the partial likelihood function for the cause-specific survival data
corresponding to the jth cause. This means that m separate Cox regression
models can be fitted to the cause-specific survival data to determine how the
explanatory variables affect the hazard of death from each cause.
When h0j (t) is fully specified, we have a parametric model that can be fit-
ted using standard maximum likelihood methods. Again, the likelihood func-
tion factorises into a product of likelihoods for the cause-specific survival data.
To show this, the contribution to the likelihood function for an individual who
dies from cause ci at ti is fci (ti ), where ci = 1, 2, . . . , m, for the ith individ-
ual, i = 1, 2, . . . , n. A censored survival time, for which the value of the cause
variable ci is zero, contains no information about the possible future cause
of death, and so the corresponding contribution to the likelihood function is
the overall survivor function, S(ti ). Ignoring covariates for the moment, the
likelihood function for data from n individuals is then
L = ∏_{i=1}^{n} fci(ti)^{δi} S(ti)^{1−δi},   (12.7)
where δi = 0 if the ith individual has a censored survival time and unity oth-
erwise. Using the result in Equation (12.2), and writing hci (t) for the hazard
function for the ith individual who experiences cause ci ,
L = ∏_{i=1}^{n} {hci(ti)}^{δi} S(ti).   (12.8)
Now, from Equation (12.3), S(ti) = ∏_{j=1}^{m} exp{−Hj(ti)}, where the cumulative hazard function for the jth cause at time ti is obtained from the corresponding cause-specific hazard function, hj(ti), and given by
Hj(ti) = ∫_0^{ti} hj(u) du.
Also, setting δij = 1 if ci = j, and zero otherwise, j = 1, 2, . . . , m, the likelihood in Equation (12.8) can be expressed solely in terms of the hazard functions for each of the m causes, where
L = ∏_{i=1}^{n} ∏_{j=1}^{m} hj(ti)^{δij} exp{−Hj(ti)}.   (12.9)
The factor of this likelihood that corresponds to the jth cause is ∏_{i=1}^{n} hj(ti)^{δij} exp{−Hj(ti)}, and by comparing this expression with Equation (12.8), we see that this is the
likelihood function for the jth cause when the event times of all other causes
are taken to be censored. Consequently, the cause-specific hazard functions
can be estimated by fitting separate parametric models to the cause-specific
survival data. For example, if the baseline hazard function for the jth cause,
h0j (t), is taken to have a Weibull form, so that h0j (t) = λj γj tγj −1 , the param-
eters λj and γj in this baseline hazard function, together with the coefficients
of explanatory variables in the model, can be obtained by fitting separate
Weibull models to the cause-specific survival times. Standard methods can
then be used to draw inferences about the impact of the explanatory variables
on the hazard of death from each cause.
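To illustrate the parametric route, the sketch below fits a Weibull cause-specific hazard of this form by direct maximisation of the corresponding log-likelihood, using scipy in Python. The data are invented, there are no covariates for brevity, and the parameterisation follows h(t) = λγt^(γ−1) as above.

```python
import numpy as np
from scipy.optimize import minimize

# Invented cause-specific data: status = 1 for deaths from the cause
# of interest, 0 for censored times and deaths from other causes.
t = np.array([18.0, 294.0, 8.0, 427.0, 559.0, 957.0, 1487.0, 2906.0])
status = np.array([1, 1, 0, 0, 0, 0, 0, 0])

def neg_log_lik(params):
    # Weibull hazard h(t) = lam * gam * t**(gam - 1), so that
    # H(t) = lam * t**gam; log-likelihood = sum(d*log h) - sum(H),
    # as in the likelihood factorisation above.
    lam, gam = np.exp(params)  # work on the log scale
    log_h = np.log(lam * gam) + (gam - 1.0) * np.log(t)
    return -(np.sum(status * log_h) - np.sum(lam * t ** gam))

start = [np.log(status.sum() / t.sum()), 0.0]  # exponential fit
res = minimize(neg_log_lik, x0=start, method="Nelder-Mead")
lam_hat, gam_hat = np.exp(res.x)
print(lam_hat, gam_hat)
```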
Table 12.5 Cause-specific hazard ratios and their 95% confidence intervals for the
four causes of graft failure in liver transplant recipients.
Variable Cause of graft failure
Rejection Thrombosis Recurrent disease Other
Age (linear) 0.97 0.98 0.98 1.03
(0.93, 1.00) (0.96, 1.00) (0.95, 1.01) (1.01, 1.05)
λj(t) = πj θj e^{−θj t} / Σ_j πj e^{−θj t}.
This cause-specific hazard function is not constant, even though the condi-
tional incidence function has an exponential form.
When a parametric model is adopted for the cumulative incidence of the
jth cause, Fj (t), models can be fitted by maximising the likelihood function
in Equation (12.7), from which the likelihood function for the n observations
(ti , ci ) is
∏_{i=1}^{n} fci(ti)^{δi} S(ti)^{1−δi},
λj(t) = −(d/dt) log{1 − Fj(t)} = {1/(1 − Fj(t))} dFj(t)/dt,   (12.10)
for the jth cause.
Now, 1 − Fj(t) is the probability that a person either survives beyond time t or has previously died from a cause other than the jth, and as in Section 1.3 of Chapter 1,
dFj(t)/dt = lim_{δt→0} {Fj(t + δt) − Fj(t)}/δt.
It then follows that the subdistribution hazard function, λj (t), can be ex-
pressed as
λj(t) = lim_{δt→0} P(t ≤ T ≤ t + δt, C = j | T ≥ t or {T ≤ t and C ≠ j})/δt.
This is the instantaneous death rate at time t from cause j, given that an
individual has not previously died from cause j. Since the definition of this
hazard function includes those who have died from a cause other than j before
time t, this subdistribution hazard function is different from the cause-specific
hazard in Equation (12.1) in both definition and interpretation.
To model the cause-specific cumulative incidence function, a Cox regression
model is assumed for the subhazard function for the jth cause. The hazard of
cause j at time t for the ith of n individuals is then
λij(t) = exp(βj′xi)λ0j(t),   (12.11)
where λ0j(t) is the baseline subdistribution hazard function for cause j, xi
is the vector of values of p explanatory variables for the ith individual, and
the vector β j contains their coefficients for the jth cause. In this model, the
subdistribution hazard functions are assumed to be proportional.
The model in Equation (12.11) is fitted by adapting the usual partial
likelihood in Equation (3.4) of Chapter 3 to include a weighted combination
of values in the risk set. The resulting partial likelihood function for the jth
of m causes is
∏_{h=1}^{rj} { exp(βj′xh) / Σ_{l∈R(t(h))} whl exp(βj′xl) },   (12.12)
where the product is over the rj individuals who die from cause j at event
times t(1) < t(2) < · · · < t(rj ) , and xh is the vector of values of the explanatory
variables for an individual who dies from cause j at time t(h) , h = 1, 2, . . . , rj .
The risk set R(t(h) ) is the set of all those who have not experienced an event
before the hth event time t(h) , for whom the survival time is greater than
or equal to t(h) , and those who have experienced a competing risk by t(h) ,
for whom the survival time is less than or equal to t(h) . This risk set is not
straightforward to interpret, since an individual who has died from a cause
other than the jth before time t is no longer at risk at t, but nevertheless they
do feature in the risk set for this model. The weights in Expression (12.12)
are defined as
whl = Ŝc(t(h)) / Ŝc(min{t(h), tl}),
where Ŝc (t) is the Kaplan-Meier estimate of the survivor function for the
censoring times. This is obtained by regarding all event times, of any type,
in the data set as censored times, and likewise all censored times as event
times, and calculating the Kaplan-Meier estimate from the resulting data.
The weight whl will be 1.0 when tl ≥ t(h), that is, for those in the risk set who have not had an event before t(h), and less than 1.0 otherwise. The effect of this weighting function is that individuals who die from a cause other than the jth remain in the risk set and are given a censoring time that exceeds all
event times. Also, the weights become smaller with increasing time between
the occurrence of a competing risk and the event time being considered, so
that earlier deaths from a competing risk have a diminishing impact on the
results.
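The estimate Ŝc(t) and the weights whl are straightforward to compute. In the Python sketch below, with invented times and causes, the censoring distribution is estimated by interchanging the roles of event and censored times in a Kaplan-Meier fit, and the weight is then evaluated exactly as defined above.

```python
import numpy as np
from lifelines import KaplanMeierFitter

# Invented competing risks data: cause = 0 for a censored time and
# 1, 2, ... for the different event types.
time = np.array([2, 3, 5, 6, 8, 9, 11, 12])
cause = np.array([1, 0, 2, 1, 0, 2, 1, 0])

# Kaplan-Meier estimate of the censoring distribution S_c(t): censored
# times are treated as events, and event times of any type as censored.
kmf = KaplanMeierFitter()
kmf.fit(time, event_observed=(cause == 0).astype(int))

def s_c(t):
    return float(kmf.survival_function_at_times(t).iloc[0])

def weight(t_h, t_l):
    # w_hl = S_c(t_h) / S_c(min(t_h, t_l)), as defined in the text
    return s_c(t_h) / s_c(min(t_h, t_l))

print(weight(6, 8))   # 1.0 for a subject still event-free at t = 6
print(weight(11, 5))  # < 1 for a competing-cause failure at t = 5
```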
The partial likelihood in Expression (12.12) is maximised to obtain esti-
mates of the β-parameters for a given cause. Since the weights used in this
partial likelihood may vary over the survival time of a particular individual,
the data must first be assembled in the counting process format. This was
outlined in Section 8.6 of Chapter 8. This also makes it straightforward to
include time-dependent variables in this model.
The subdistribution hazard function is difficult to interpret, and the fitted
model is best interpreted in terms of the effect of the explanatory variables
on the cause-specific cumulative incidence function for the jth cause, which
from Equation (12.10), can be estimated by
F̂ij(t) = 1 − exp{−Λ̂ij(t)},
for the ith individual, where Λ̂ij(t) is an estimate of the cumulative subdistribution hazard function, Λij(t). This estimate is given by
Λ̂ij(t) = exp(β̂j′xi)Λ̂0j(t),
where Λ̂0j(t) is the baseline cumulative subdistribution hazard function for
the jth event type. This function can be estimated using an adaptation of
the Nelson-Aalen estimate of the baseline cumulative hazard function, given
in Equation (3.28) of Chapter 3, where
Λ̂0j(t) = Σ_{t(h) ≤ t} { dh / Σ_{l∈R(t(h))} whl exp(β̂j′xl) },
and F0j (t) = 1 − exp{−Λ0j (t)} is the baseline cumulative incidence function
for the jth cause. This emphasises that a quantity of direct interest is being
modelled.
Example 12.6 Survival of liver transplant recipients
In this example, the Fine and Gray model for the cumulative incidence func-
tion is used to model the data given in Example 12.1. Models for the cu-
mulative incidence of graft rejection, thrombosis, recurrent disease and other
Table 12.6 Subhazard ratios and their 95% confidence intervals in the Fine and Gray
model for the four causes of graft failure in liver transplant recipients.
Variable Cause of graft failure
Rejection Thrombosis Recurrent disease Other
Age (linear) 0.97 0.98 0.98 1.03
(0.93, 1.00) (0.96, 1.00) (0.95, 1.01) (1.01, 1.05)
causes of failure are fitted in turn. Subhazard ratios and their corresponding
95% confidence intervals for these four failure types are shown in Table 12.6.
The subhazard ratios in this table summarise the direct effect of each
variable on the incidence of different causes of graft failure, in the presence of
competing risks. However, their values are very similar to the hazard ratios
shown in Table 12.5, which suggests that there is little association between
the competing causes of graft failure. In a later example, Example 12.7 in
Section 12.6, this is not the case.
Figure 12.3 Cumulative incidence functions for mice raised in a standard environ-
ment (—) and a germ-free environment (·······) for each cause of death.
These plots suggest that the incidence of thymic lymphoma is greater for
mice raised in the germ-free environment, but that reticulum cell sarcoma has
greater incidence for mice kept in the standard environment. The incidence of
death from other causes also differs between the two environments.
Table 12.7 Survival times and causes of death for two groups of irradiated mice.
Thymic lymphoma Reticulum cell sarcoma Other causes
Standard Germ-free Standard Germ-free Standard Germ-free
159 158 317 430 40 136
189 192 318 590 42 246
191 193 399 606 51 255
198 194 495 638 62 376
200 195 525 655 163 421
207 202 536 679 179 565
220 212 549 691 206 616
235 215 552 693 222 617
245 229 554 696 228 652
250 230 557 747 249 655
256 237 558 752 252 658
261 240 571 760 282 660
265 244 586 778 324 662
266 247 594 821 333 675
280 259 596 986 341 681
343 300 605 366 734
356 301 612 385 736
383 321 621 407 737
403 337 628 420 757
414 415 631 431 769
428 434 636 441 777
432 444 643 461 800
485 647 462 807
496 648 482 825
529 649 517 855
537 661 517 857
624 663 524 864
707 666 564 868
800 670 567 870
695 586 870
697 619 873
700 620 882
705 621 895
712 622 910
713 647 934
738 651 942
748 686 1015
753 761 1019
763
We next fit a Cox regression model to the cause-specific survival times.
For a mouse reared in the ith environment, i = 1, 2, the model for the jth
cause-specific hazard, j = 1, 2, 3, is
hij(t) = exp(βj xi)h0j(t),
where xi = 1 for a mouse reared in the germ-free environment and zero other-
wise, so that βj is the log-hazard ratio of death from cause j at any time for a
mouse raised in the germ-free environment, relative to one from the standard
environment. The estimated hazard ratio, together with the corresponding
95% confidence limits and the P -value for testing the hypothesis that the
hazard ratio is 1.0, for each cause of death, are shown in Table 12.8.
Table 12.8 Hazard ratios, confidence limits and P -value for the three causes
of death in a cause-specific hazards model.
Cause of death Hazard ratio 95% confidence interval P -value
Thymic lymphoma 1.36 (0.78, 2.38) 0.28
Reticulum cell sarcoma 0.13 (0.07, 0.26) < 0.001
Other causes 0.31 (0.17, 0.55) < 0.001
This table shows that the hazard of death from thymic lymphoma is not
significantly affected by the environment in which a mouse is reared, but that
mice raised in a germ-free environment have a significantly lower hazard of
death from reticulum cell sarcoma or other causes.
This analysis shows how the type of environment in which the mice were
raised influences the occurrence of each of the three causes of death in circum-
stances where the other two possible causes cannot occur. Since the cumulative
incidence of any particular cause of death depends on the hazard of all possible
causes, we cannot draw any conclusions about the effect of environment on
the cause-specific incidence functions from modelling cause-specific hazards.
For this, we fit a Fine and Gray model for the cumulative incidence of thymic
lymphoma, reticulum cell sarcoma and other causes. This enables the effect
of environment on each of the three causes of death to be modelled, in the
presence of competing risks.
From Equation (12.11), the model for the subhazard function for a mouse
reared in the ith environment that dies from the jth cause is
λij(t) = exp(βj xi)λ0j(t).   (12.13)
The corresponding model for the cumulative incidence of death from cause j
is
Fij (t) = 1 − exp{−eβj xi Λ0j (t)},
where Λ0j (t) is the baseline cumulative subhazard function and xi = 1 for
the germ-free environment and zero otherwise. The estimated ratio of the
subhazard functions for mice raised in the germ-free environment relative to
Table 12.9 Subhazard ratios, confidence limits and P -value for the three
causes of death in the Fine and Gray model.
Cause of death Hazard ratio 95% confidence interval P -value
Thymic lymphoma 1.68 (0.97, 2.92) 0.066
Reticulum cell sarcoma 0.39 (0.22, 0.70) 0.002
Other causes 1.01 (0.64, 1.57) 0.98
those in the standard environment, P -values and 95% confidence limits, are
given in Table 12.9.
This table suggests that, in the presence of competing risks of death, the
environment has some effect on the subhazard of death from thymic lym-
phoma, a highly significant effect on death from reticulum cell sarcoma, but
no impact on death from other causes. The germ-free environment increases
the subhazard of thymic lymphoma and reduces that for reticulum cell sar-
coma.
At first sight the subhazard ratios in Table 12.9 appear quite surprising
when compared to the hazard ratios in Table 12.8. Although this feature
could result from the effect of competing risks on the incidence of death from
thymic lymphoma or other causes, an alternative explanation is suggested
by the estimates of the cumulative incidence functions, which were shown
in Figure 12.3. For death from other causes, the incidence of death in the
standard environment exceeds that in the germ-free environment at times
when there are relatively few events. At later times, where there are more
events, the incidence functions are closer together, and beyond 800 days, the
incidence of death in the germ-free environment is greater. The assumption
that the subhazard functions in the Fine and Gray model are proportional is
therefore doubtful.
To investigate this further, techniques described in Section 4.4 of Chapter 4
are used. First a plot of the scaled Schoenfeld residuals from the model for the
subhazard function for each cause is obtained, with a smoothed curve super-
imposed. The plot for deaths from other causes is shown in Figure 12.4. This
figure shows clearly that the smoothed curve is not horizontal, and strongly
suggests that the environment effect is time-dependent.
To further examine the assumption of proportional subhazards, the time-
dependent variable xi log t, with a cause-dependent coefficient βj1 , is added to
the model in Equation (12.13). The change in the value of −2 log L̂ on adding
this term to the subhazard model is not significant for death from thymic lym-
phoma (P = 0.16), but significant for reticulum cell sarcoma (P = 0.016), and
highly significant for death from other causes (P < 0.001). This demonstrates
that the effect of environment on the subhazard of death from reticulum cell
sarcoma and other causes is not independent of time. To illustrate this for
death from other causes, where j = 3, Figure 12.5 shows the time-dependent
subhazard ratio, exp{β3 + β31 log t}, plotted against log t, together with 95%
confidence bands. The hazard ratio is significantly less than 1.0 for survival
Figure 12.4 Plot of the scaled Schoenfeld residuals against log survival time on fitting
a Fine and Gray model to data on mice that die from other causes.
Figure 12.5 Time-dependent subhazard ratio (—), with 95% confidence bands (- - -),
for mice that die from other causes.
times less than 240 days and significantly greater than 1.0 for survival times
greater than 650 days.
Another feature of these data is that there are four unusually short times to
death from other causes in mice reared in the standard environment. However,
these observations have little influence on the results.
Individuals observed over time may experience multiple events of the same
type, or a number of events of different types. Studies in which there are
repeated occurrences of the same type of event, such as a headache or asth-
matic attack, are frequently encountered and techniques are needed to model
the dependence of the rate at which such events occur on characteristics of the
individuals or exposure factors. Multiple event data also arise when an indi-
vidual may experience a number of outcomes of different types, with interest
centering on factors affecting the time to each of these outcomes. Where the
occurrence of one event precludes the occurrence of all others, we have the
competing risks situation considered in Chapter 12, but in this chapter we
consider situations where this is not necessarily the case.
In modelling the course of a disease, individuals may pass through a num-
ber of phases or states that correspond to particular stages of the disease. For
example, a patient diagnosed with chronic kidney disease may be on dialy-
sis before receiving a kidney transplant, and this may be followed by phases
where chronic rejection occurs, or where the transplant fails. A patient may
experience some or all of these events during the follow-up period from the
time of diagnosis, and the sequence of events is termed an event history. Mul-
tistate models can be used to describe the movement of patients among the
various states, and to investigate the effect of explanatory variables on the
rate at which they transfer from one state to another.
Multiple event and event history data can be analysed using an extension
of the Cox regression model to the situation where more than one event can
occur. This extension involves the development of a probabilistic model for
events that occur over time, known as a counting process, and so this chapter
begins with an introduction to counting processes.
realisation of the counting process Ni (t) is then a step-function that starts at
0 and increases in steps of one unit.
We next define Yi (t), t > 0, to be a process where Yi (t) = 1 when the ith
individual is at risk of an event occurring at time t, and zero otherwise, so
that Yi (t) is sometimes called the at-risk process. The value of Yi (t) must be
known at a time t−, the time immediately before t, and when this is the case,
Yi (t) is said to be a predictable process.
The counting process Ni (t) has an associated intensity, which is the rate
at which events occur. Formally, the intensity of a counting process is the
probability that Ni (t) increases by one step in unit time, conditional on the
history of the process up to time t. The history or filtration of the process
up to but not including time t is written H(t−), and is determined from the
set of values {Ni (s), Yi (s)} for all values of s up to time t. If we denote the
intensity process by λi (t), t > 0, then, in infinitesimal notation, where dt is an
infinitely small interval of time, we have
λi(t) = (1/dt) P{Ni(t) increases by one step in an interval of length dt | H(t−)}
      = (1/dt) P{Ni(t + dt) − Ni(t) = 1 | H(t−)}.
If we set dNi (t) = Ni (t + dt) − Ni (t) to be the change in Ni (t) over an
infinitesimal time interval of length dt, λi (t) can be expressed as
λi(t) = (1/dt) P{dNi(t) = 1 | H(t−)}.   (13.1)
Since dNi (t) is either 0 or 1, it follows that
λi(t) = (1/dt) E{dNi(t) | H(t−)},
and so the intensity of the counting process is the expected number of events
in unit time, conditional on the history of the process.
We can also define the cumulative intensity, or integrated intensity, Λi (t),
where
Λi(t) = ∫_0^t λi(u) du
is the cumulative expected number of events in the time interval (0, t], that is
Λi (t) = E {Ni (t)}.
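A counting process and its at-risk process are simple to represent directly. The short sketch below constructs Ni(t) and Yi(t) for a single hypothetical subject with events at t = 4 and t = 9 and follow-up censored at t = 12.

```python
import numpy as np

# Counting process N(t) and at-risk process Y(t) for one hypothetical
# subject with events at t = 4 and t = 9, censored at t = 12.
event_times, censor_time = np.array([4.0, 9.0]), 12.0

def N(t):
    # number of events observed in (0, t]
    return int(np.sum(event_times <= t))

def Y(t):
    # 1 while the subject is under observation and at risk of an event
    return int(0 < t <= censor_time)

for t in [2, 4, 7, 9, 12, 15]:
    print(t, N(t), Y(t))
```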
where dNi (t) = 1 if Ni (t) increases by one unit at time t, and zero otherwise.
This function is then maximised with respect to unknown parameters in the
intensity function, leading to estimates of the βs.
A large body of theory has been developed for the study of counting pro-
cesses, and this enables the asymptotic properties of parameter estimates to
be determined. This theory is based on the properties of a type of stochastic
process with zero mean known as a martingale. In fact, the process defined by
Mi(t) = Ni(t) − Λi(t)   (13.3)
is such that E{Mi(t)} = 0 for all t, and is a martingale. Theoretical details will
not be given here, but note that Equation (13.3) is the basis of the martingale
residuals defined in Equation (4.6) of Chapter 4.
In the Andersen and Gill (AG) model, the intensity of the recurrent event process for the ith individual is
hi(t) = Yi(t) exp{β′xi(t)}h0(t),
where Yi(t) denotes whether or not the ith individual is at risk of an event
at time t, β ′ xi (t) is a linear combination of p possibly time-dependent ex-
planatory variables, and h0 (t) is a baseline hazard function. Patients who
may experience multiple events remain at risk, and so Yi (t) remains at unity
unless an individual temporarily ceases to be at risk in some time period, or
until the follow-up time is censored. The risk set at time t is then the set of all
individuals who are still being followed up at time t, just as in the standard
Cox model.
To fit this model, the recurrent event data are expressed in counting process
format, where the jth time interval, (tj−1 , tj ], is the time between the (j −1)th
and jth recurrences, j = 1, 2, . . . , where t0 = 0. The status variable denotes
whether or not an event occurred at the end-point of the interval, that is the
jth recurrence time, and is unity if tj was an event time and zero otherwise.
An individual who experiences no events contributes a single time interval
where the end-point is censored, so that the status variable is zero. Similarly,
the status variable will be zero at the time marking the end of the follow-up
period, unless an event is observed at that time. The format of the data that
is required to fit the AG model is illustrated later in Example 13.2.
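Constructing this format is mechanical; the sketch below reproduces the counting process rows of Table 13.3 for a rat with recurrences at 70, 74, 85 and 92 days and follow-up to 122 days. The helper function and column names are illustrative only.

```python
import pandas as pd

# Arrange one subject's recurrence times into counting process format:
# one row per interval (t_{j-1}, t_j], with status = 1 if the interval
# ends in a recurrence and 0 if it ends in censoring.
def to_counting_process(sid, recurrences, followup, covariates):
    times = [0] + list(recurrences) + [followup]
    rows = []
    for j in range(1, len(times)):
        rows.append({"id": sid, "start": times[j - 1], "stop": times[j],
                     "status": int(j <= len(recurrences)), **covariates})
    return rows

df = pd.DataFrame(to_counting_process(1, [70, 74, 85, 92], 122,
                                      {"treat": 1}))
print(df)
```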
The assumption of independent recurrence times can be relaxed to some
extent by including terms in the model that correspond to the number of
preceding events, or the time from the origin. Since any association between
recurrence times will usually result in the model-based standard errors of
the estimated β-parameters being too small, the robust form of the variance-
covariance matrix of the estimated β-parameters, described in Section 13.1.4,
can be used.
This approach may not properly account for within-subject dependence,
and so alternatively, and preferably, association between the recurrence times
can be accommodated by adding a random frailty effect for the ith subject,
as described in Chapter 10. The addition of a random subject effect, ui , to
the AG model leads to the model
hi (t) = Yi (t) exp{β ′ xi (t) + ui }h0 (t),
in which ui may be assumed to have a N (0, σu2 ) distribution. Fitting the AG
model with and without a frailty effect allows the extent of within-subject
correlation in the recurrence times to be assessed, using the methods described
in Section 10.6 of Chapter 10.
Once a model has been fitted, hazard ratios and their corresponding in-
terval estimates can be determined. In addition, the recurrence times can be
summarised using the unadjusted or adjusted cumulative intensity (or hazard)
function and the cumulative incidence function, estimated from the comple-
ment of the survivor function. This will be illustrated in Example 13.2.
hi(t) = Yi(t) exp(βxi)h0(t),
where Yi(t) = 1 when a rat is at risk of tumour development and zero other-
wise, xi = 1 if the ith rat is in the treated group and zero otherwise, and β
is the log-hazard ratio for a rat on the retinoid treatment, relative to one in
Table 13.3 Representation of the recurrence times of one rat for the
AG and PWP models.
Model Event Interval Status Stratum Treatment
AG 1 (0, 70] 1 1 1
2 (70, 74] 1 1 1
3 (74, 85] 1 1 1
4 (85, 92] 1 1 1
5 (92, 122] 0 1 1
the control group. A standard Cox regression model is then fitted to the stop
times, that is the end-point of each interval, in the data arranged as illustrated
in Table 13.3.
Using the robust standard error of the treatment effect, β̂ = −0.801 and
se (β̂) = 0.198. This estimate is significantly less than zero (P < 0.001), and
the overall hazard of a tumour occurring in a rat in the retinoid group, relative
to one in the control group, at any given time, is 0.45, with a 95% confidence
interval of (0.30, 0.66). The tumour occurrence rate for rats in the retinoid
treatment group is less than half that of rats in the control group.
The treatment difference can be summarised using the estimated cumu-
lative intensity, or hazard, of tumour recurrences for rats in each treatment
group. This is exp(β̂x)Ĥ0 (t), where Ĥ0 (t) is the estimated baseline cumula-
tive intensity function, and x = 1 for rats exposed to the retinoid treatment
and x = 0 for rats in the control group. This is also the cumulative expected
number of tumour recurrences over time. The cumulative intensity for each
treatment is shown in Figure 13.1. This shows that rats exposed to the retinoid
treatment have a lower intensity of tumour occurrence than rats in the control
group. At 45 days, it is estimated that one tumour would have occurred in a
rat in the treated group, but two tumours would have occurred by that time
in a rat in the control group.
The cumulative incidence of a recurrence can be obtained from 1 −
Ŝ0 (t)exp(β̂x) , for x = 0 or 1, where Ŝ0 (t) = exp{−Ĥ0 (t)} is the estimated
baseline survivor function in the fitted AG model. The two incidence functions
are shown in Figure 13.2. Again, this figure confirms that tumour incidence
is substantially greater for rats in the control group. The median tumour re-
currence time is 31 days for rats in the treated group and 17 days for rats in
the control group.
Instead of using the robust standard error of the treatment effect, associ-
ation between the recurrence times within a rat can be modelled by adding
a normally distributed random effect to the AG model. This random effect
Figure 13.1 Cumulative intensity of recurrent events for rats on the retinoid treat-
ment (—) or a control (·······).
Figure 13.2 Cumulative incidence of recurrent events for rats on the retinoid treat-
ment (—) or a control (·······).
differs for each rat but will be constant for the recurrence times within a rat.
When this is done, the rat effect is highly significant and the estimated vari-
ance of the random effect is 0.25. In this model, the estimated overall hazard
ratio is 0.46 with a 95% confidence interval of (0.31, 0.71). These estimates
are similar to those obtained using the robust standard error.
A limitation of the AG model is that the recurrence rate does not depend
on the number of preceding recurrences, and so we next fit the PWP model.
In this model, the hazard rate for the jth recurrence, j = 1, 2, . . . , in the ith
rat, i = 1, 2, . . . , 48, is
hij (t) = Yij (t) exp(βj xi )h0j (t),
where Yij (t) = 1 until the time of the (j − 1)th recurrence and zero there-
after, xi = 1 if the ith rat is in the treated group and zero otherwise, and βj
measures the treatment effect for the jth recurrence. On fitting this model,
estimates β̂j , the hazard ratios exp(β̂j ) and their standard errors, for the first
four recurrences and five or more recurrences, are as shown in Table 13.4. Both
the standard model-based standard errors and those obtained using the sand-
wich estimator are shown in this table. Unusually, those based on the robust
estimator are smaller, but the sandwich estimator will be used in subsequent
analyses.
The hazard ratio for the first recurrence is exp(−0.686) = 0.50, which is
identical to that from just modelling the time to the first tumour occurrence,
as it should be. The hazard ratio for a second and fourth recurrence are close
to unity, and not significant, P = 0.697, 0.839, respectively, while the hazard
ratios for three and five or more recurrences are significantly less than unity,
P = 0.029, 0.002, respectively. However, the value of −2 log L̂ for the fitted
PWP model is 812.82, and on constraining all the βs to be equal, the value of
this statistic increases to 816.95. This increase of 4.13 is not significant (P =
0.39) when compared to percentage points of the χ24 distribution, and so, on
an overall basis, there is no evidence of a difference between the hazard ratios
for different numbers of recurrences. The common estimate of β is −0.523, so
that the hazard ratio for any recurrence is 0.59, with a 95% confidence interval
of (0.46, 0.77). This is very similar to that found using the AG model, but
here the recurrence times have different baseline hazards.
The hazard functions, hij (t), can be constrained to be proportional by fit-
ting an unstratified model that includes a factor associated with the recurrence
number, so that
hij (t) = Yij (t) exp(βj xi + ζj )h0 (t),
where ζj is the effect of the jth recurrence. If the ζj are all zero, the model re-
duces to the AG model. The stratified and unstratified models are not nested,
and so the two models cannot be directly compared. However, assuming pro-
portional hazards, the overall hazard rate is 0.59, with a 95% confidence in-
terval of (0.45, 0.77), which is very similar to that found using the stratified
model.
In the Wei, Lin and Weissfeld (WLW) model, the hazard function for the jth event in the ith individual has the same form,
hij(t) = Yij(t) exp(βj′xi)h0j(t),
but now Yij(t) is unity until the jth event occurs and 0 thereafter. In this
model, the risk set at time t consists of all individuals who were being followed
up at time t and in whom the jth event has not occurred. This model has
the same form as a Cox regression model for the jth event, except that in the
WLW model, the βs are jointly estimated from the times to all events. The
WLW model is termed a marginal model since each event time is treated as a
separate event, and the time origin for each event is the start of the follow-up
time for each individual. Hazard ratios may vary across the different event
types, and the model also allows differences in the underlying intensity of
each event to be accommodated.
To fit this model, the total number of possible events across all the individ-
uals in the study, r, is determined, and the time to each event is expressed as
a series of intervals from the time origin, (0, tj ], where tj is the time of the jth
event, j = 1, 2, . . . , r. A robust variance-covariance matrix may again be used,
or a frailty term can be incorporated to specifically allow for any correlation
between the event times within an individual. Also, the baseline hazards could
be taken to be proportional and constraints on the β-parameters may also be
introduced, as for the PWP model in Section 13.2.2.
The WLW model was originally proposed as a model for recurrent events,
but there is some controversy surrounding its use in this area. In particular,
when the WLW model is used to model recurrent events, the definition of the
risk set is such that an individual who has experienced just one recurrence is at
risk of not only a second recurrence, but also a third, fourth, fifth recurrence,
and so on. This makes it difficult to interpret the coefficients of the explanatory
variables. Because of this inconsistency, the model is only recommended for
use in situations where it is natural to consider times to separate events from
the time origin.
These data will be analysed using the WLW model, and the first step is to
organise the data into the required format. Although no patient experienced
more than three of the five possible events, the revised database will have
five rows for each patient, corresponding respectively to time to local relapse,
axillary relapse, distant relapse, second malignancy and death. The values of
the explanatory variables are repeated in each row.
To illustrate this rearrangement, the event times and censoring indicators
for four patients, one with no events, one with one event, one with two events
and one with three events, are shown in Table 13.6. In this table, Nevents is
the number of events and the explanatory variables have been omitted.
Table 13.6 Data for four patients in the tamoxifen trial who experience 0, 1, 2 and
3 events.
Id Nevents Lsurv Ls Asurv As Dsurv Ds Msurv Ms Tsurv Ts
1 0 3019 0 3019 0 3019 0 3019 0 3019 0
2 1 2255 0 2255 0 493 1 2255 0 2255 0
14 2 3428 0 3428 0 2888 1 344 1 3428 0
349 3 949 1 2117 1 2775 1 3399 0 3399 0
The data for these four patients, in the required format, are given in Ta-
ble 13.7. Here, the variable Time is the time to the event concerned and Status
is the event status, where zero corresponds to a censored time and unity to
an event. In this table, the variable Event is coded as 1 for local relapse, 2
for axillary relapse, 3 for distant relapse, 4 for second malignancy and 5 for
death.
Table 13.7 Rearranged data for four patients in the tamoxifen trial with 0, 1, 2 and
3 events.
Id Treat Age Size Hist HR Hb ANdis Event Time Status
1 0 51 1.0 1 1 140 1 1 3019 0
1 0 51 1.0 1 1 140 1 2 3019 0
1 0 51 1.0 1 1 140 1 3 3019 0
1 0 51 1.0 1 1 140 1 4 3019 0
1 0 51 1.0 1 1 140 1 5 3019 0
2 0 74 0.5 1 1 138 1 1 2255 0
2 0 74 0.5 1 1 138 1 2 2255 0
2 0 74 0.5 1 1 138 1 3 493 1
2 0 74 0.5 1 1 138 1 4 2255 0
2 0 74 0.5 1 1 138 1 5 2255 0
14 0 51 1.7 1 1 140 1 1 3428 0
14 0 51 1.7 1 1 140 1 2 3428 0
14 0 51 1.7 1 1 140 1 3 2888 1
14 0 51 1.7 1 1 140 1 4 344 1
14 0 51 1.7 1 1 140 1 5 3428 0
349 1 74 1.3 1 1 149 1 1 949 1
349 1 74 1.3 1 1 149 1 2 2117 1
349 1 74 1.3 1 1 149 1 3 2775 1
349 1 74 1.3 1 1 149 1 4 3399 0
349 1 74 1.3 1 1 149 1 5 3399 0
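A rearrangement of this kind is easily programmed. The following sketch, in Python with pandas, assumes a hypothetical file containing the wide-format data of Table 13.6, with one row per patient; the column names are those used in the two tables.

```python
# A sketch of rearranging the wide-format data of Table 13.6 into the
# long format of Table 13.7; the file name is hypothetical.
import pandas as pd

wide = pd.read_csv("tamoxifen.csv")  # hypothetical file

pairs = [("Lsurv", "Ls"), ("Asurv", "As"), ("Dsurv", "Ds"),
         ("Msurv", "Ms"), ("Tsurv", "Ts")]
covariates = ["Treat", "Age", "Size", "Hist", "HR", "Hb", "ANdis"]

blocks = []
for event, (time_col, status_col) in enumerate(pairs, start=1):
    block = wide[["Id"] + covariates].copy()
    block["Event"] = event              # 1 = local relapse, ..., 5 = death
    block["Time"] = wide[time_col]      # time to this event or censoring
    block["Status"] = wide[status_col]  # 1 = event observed, 0 = censored
    blocks.append(block)

long = pd.concat(blocks).sort_values(["Id", "Event"]).reset_index(drop=True)
```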
Table 13.8 Adjusted hazard ratio for the treatment effect for each event and 95%
confidence intervals on fitting the WLW model.
Event Hazard ratio P -value 95% confidence interval
1: local relapse 10.30 < 0.001 (3.68, 28.82)
2: axillary relapse 3.14 0.092 (0.83, 11.91)
3: distant relapse 0.84 0.588 (0.45, 1.57)
4: second malignancy 1.19 0.502 (0.72, 1.98)
5: death 0.79 0.416 (0.45, 1.39)
From this table, it is estimated that there is more than 10 times the risk
of a local relapse if tamoxifen is used without radiotherapy, an effect that is
highly significant (P < 0.001). There is also some evidence, significant at the
10% level, that the absence of radiotherapy increases the risk of an axillary
relapse, but the estimated hazard ratios for all other events do not differ
significantly from unity. Further analysis of this data set would determine the
extent to which the hazard rates are proportional for different event types,
and which explanatory factors were relevant to each outcome.
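A fit of this general form might be obtained with the lifelines package in Python, as in the sketch below. It assumes the rearranged data frame 'long' from the previous sketch, and for simplicity it estimates a common treatment effect across event types rather than the event-specific effects of Table 13.8.

```python
# A sketch of fitting a WLW-type marginal model. Stratifying on 'Event'
# gives each event type its own baseline hazard, and clustering on 'Id'
# gives sandwich (robust) standard errors for the jointly estimated
# coefficients, allowing for correlation between event times.
from lifelines import CoxPHFitter

cols = ["Time", "Status", "Event", "Id",
        "Treat", "Age", "Size", "Hist", "HR", "Hb", "ANdis"]
cph = CoxPHFitter()
cph.fit(long[cols], duration_col="Time", event_col="Status",
        strata=["Event"], cluster_col="Id")
cph.print_summary()
# Event-specific treatment effects, as in Table 13.8, can be obtained by
# adding interactions between Treat and indicator variables for Event.
```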
[Figure: a two-state model in which the transition from the state 'Alive' to the state 'Dead' occurs with hazard h(t).]

[Figure: a four-state model with states 'Transplant', 'Graft failure', 'Retransplant' and 'Death'. The transition from transplant to graft failure has hazard h_{TF}(t), that from graft failure to retransplant has hazard h_{FR}(t), and the transitions from transplant, graft failure and retransplant to death have hazards h_{TD}(t), h_{FD}(t) and h_{RD}(t), respectively.]
For the transitions from graft failure or retransplant to death, the set of patients at risk of death at any time
consists of those who have had graft failure or a retransplant, respectively, and
are still alive. Patients who have not yet had a graft failure or retransplant
cannot be in either risk set. Fortunately, by expressing the data in counting
process format, a Cox regression model can be used to model all four tran-
sitions between the states in this four-state model. Moreover, this approach
can be extended to more complex multistate models.
In this model, the intensity of a transition from state j to state k at time t, for the ith of n individuals, can be expressed as

h_{ijk}(t) = Y_{ijk}(t) \exp(\beta_{jk}' x_i) h_{0jk}(t),

where Y_{ijk}(t) = 1 if the ith individual is in state j and at risk of entering state k at time t, and Y_{ijk}(t) = 0 otherwise. As for other models described in
this chapter, the explanatory variables may also be time-dependent. Time is
generally measured from the point of entry into an initial state, which marks
the time origin.
When separate baseline hazards and separate regression coefficients are
assumed for each transition, stratified Cox regression models can be used to
model the different transition rates. Proportional hazard rates may also be
assumed, and modelled by including a factor in the model that corresponds
to the different states. In general, the counting process formulation of survival
data will be needed for this, as the risk set definition will depend on the
transition of interest. The data for a particular individual relating to the
transition from state j to state k will then consist of a record giving the time
of entry into state j, the time of exit from state j and a status indicator that
is unity if the transition is from state j to state k and zero otherwise. The
process is illustrated in Example 13.4.
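As a sketch of how such records might be constructed, the following Python code builds one (start, stop, status) record per possible transition for a three-state model of the kind considered in Example 13.4. The data frame and all column names are hypothetical.

```python
# Constructing counting process records for a three-state model. In the
# hypothetical data frame 'bmt', 'rtime'/'rstat' are the time and indicator
# of platelet recovery, and 'dtime'/'dstat' those of relapse or death
# (dstat = 0 means the patient was censored at dtime).
import pandas as pd

bmt = pd.read_csv("bmt.csv")  # hypothetical file

rows = []
for _, p in bmt.iterrows():
    recovered = p["rstat"] == 1 and p["rtime"] <= p["dtime"]
    exit1 = p["rtime"] if recovered else p["dtime"]  # exit from state 1
    # Transition 1 -> 2: at risk from transplant until platelet recovery,
    # relapse/death or censoring, whichever occurs first.
    rows.append(dict(id=p["id"], trans="1->2", start=0, stop=exit1,
                     status=int(recovered)))
    # Transition 1 -> 3: the same risk period, but the terminating event
    # is relapse or death before platelet recovery.
    rows.append(dict(id=p["id"], trans="1->3", start=0, stop=exit1,
                     status=int(not recovered and p["dstat"] == 1)))
    # Transition 2 -> 3: only patients whose platelets recovered are at risk.
    if recovered:
        rows.append(dict(id=p["id"], trans="2->3", start=p["rtime"],
                         stop=p["dtime"], status=int(p["dstat"] == 1)))

ms = pd.DataFrame(rows)
# A Cox regression model stratified by 'trans' can then be fitted to 'ms'.
```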
[Figure: a three-state model for the bone marrow transplant data, with states 'Transplant', 'Platelet recovery' and 'Relapse or death'. The transition from transplant to platelet recovery has hazard h_{12}(t), that from transplant to relapse or death has hazard h_{13}(t), and that from platelet recovery to relapse or death has hazard h_{23}(t).]
The transitions between the three states are modelled by fitting a Cox
regression model to the data in the format shown in Table 13.10, stratified by
transition type. A stratified model that allows the parameters associated with
the four explanatory factors, Leukaemia, Age, Match and Tcell, to depend
on the transition type is first fitted, by including interactions between the
strata and these four factors. Comparing alternative models using the −2 log L̂
statistic shows that there is no evidence that the factors Tcell and Match vary
over the three transition types, and moreover, none of the three transitions
depend on Match. A reduced model therefore contains the main effects of
Leukaemia, Age and Tcell, together with interactions between Leukaemia and
Transition, and Age and Transition. For this model, the hazard ratios, and
their associated 95% confidence intervals, are shown in Table 13.11. In this
example, a robust estimate of variance is used, but the resulting standard
errors differ little from the model-based estimates.
Table 13.11 Hazard ratios and 95% confidence intervals for each transition.
Factor Transition 1→2 Transition 1→3 Transition 2→3
Leukaemia
AML 1.00 1.00 1.00
ALL 0.96 (0.83, 1.11) 1.29 (0.98, 1.69) 1.16 (0.86, 1.56)
CML 0.74 (0.65, 0.85) 1.02 (0.82, 1.26) 1.31 (1.04, 1.64)
Age
≤ 20 1.00 1.00 1.00
21–40 0.86 (0.74, 0.99) 1.29 (0.96, 1.72) 1.02 (0.76, 1.38)
> 40 0.92 (0.78, 1.08) 1.69 (1.25, 2.29) 1.68 (1.23, 2.30)
Tcell
No 1.00 1.00 1.00
Yes 1.42 (1.27, 1.59) 1.42 (1.27, 1.59) 1.42 (1.27, 1.59)
The effect of both type of leukaemia and age group differs for the three
transitions. Patients with CML progress to platelet recovery at a slower rate
than those with the other two types of disease, and are more likely to relapse
after platelet recovery has occurred. Patients with ALL have a greater hazard
of relapse or death before platelet recovery than those with AML or CML.
Patients aged 21–40 experience platelet recovery at a slower rate than others,
while those aged over 40 have an increased hazard of relapse or death, whether
or not platelet recovery has occurred. T-cell depletion leads to significantly
greater rates of transition to other states, with no evidence that these rates
differ between transition types.
The three baseline cumulative hazard and incidence rates, that is the rates
for patients in the ≤ 20 age group with AML and no T-cell depletion, are
shown in Figures 13.6 and 13.7. The baseline cumulative hazard plot shows
that the transition to platelet recovery occurs at a much faster rate than
transitions to relapse or death. The cumulative incidence functions have a
very similar pattern, and from Figure 13.7, the one-year cumulative incidence
of relapse or death from either of the two possible states is about one-third
that of platelet recovery, for patients in the baseline group.
[Figure 13.6: baseline cumulative hazard of transitions 1→2, 1→3 and 2→3, plotted against time since transplant (years).]

Figure 13.7 Baseline incidence of transitions 1→2 (—), 1→3 (- - -) and 2→3 (·······), plotted against time since transplant (years).
Figure 13.6 suggests that the baseline hazard functions for the transitions
to relapse or death from either of the other two states may well be propor-
tional, although the hazard rate for the transition to a state of platelet recovery
has a different shape from the other two. Some further simplification of the
model is therefore possible, but this has little effect on the resulting inferences.
Chapter 14

Dependent censoring
The methods described in this book for the analysis of censored survival data
are only valid if the censoring is independent or non-informative. Essentially,
this means that the censoring is not associated with the actual survival time,
so that individuals with censored survival times are representative of all oth-
ers who are at risk at that time, and who have the same values of measured
explanatory variables. For example, censoring would be considered to be in-
dependent if a censored time occurs because a patient has withdrawn from a
study on relocating to a different part of the country, or when survival data
are analysed at a prespecified time. On the other hand, if patients are with-
drawn because they experience life-threatening side effects from a particular
treatment, or if patients that do not respond to an active treatment are given
rescue medication, the censoring time is no longer independent of the survival
time. The censoring is then said to be dependent or informative. This chap-
ter shows how dependent censoring can be taken account of when modelling
survival data.
In one study of this kind, a number of patients had survival times that were subject to independent censoring. In addition, 283 discontinued treatment because of treatment toxicity,
and 380 left the study at the request of either the patient or the investigator.
It might be that patients in poor health would be more likely to experience
toxicity, and for these patients, their event times may be shorter. Similarly,
patients may have left the study to seek alternative treatments, and again
these patients may not be representative of the patient group as a whole. The
possibility of dependent censoring would therefore need to be considered when
analysing times to the primary end-point.
Statistical methods can be used to examine the assumption of independent
censoring in a number of ways. One approach is to plot observed survival times
against the values of explanatory variables, where the censored observations
are distinguished from the uncensored. If a pattern is exhibited in the censor-
ing, such as there being more censored observations at an earlier time on a
particular treatment, or if there is a greater proportion of censored survival
times in patients with a particular range of values of explanatory variables,
dependent censoring is suggested. More formally, a model could be used to
examine whether the probability of censoring is related to the explanatory
variables in the model. In particular, a linear logistic model could be used in
modelling a binary response variable that takes the value unity if an observed
survival time is censored and zero otherwise. If the probability of censoring is
found to depend on the values of certain explanatory variables, the assumption
of independent censoring may have been violated.
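A minimal sketch of this check, using a hypothetical data set and the statsmodels package in Python, is as follows.

```python
# A sketch of the logistic-model check for dependent censoring; the file
# and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("survdata.csv")          # hypothetical file
y = 1 - df["status"]                      # 1 = censored, 0 = event observed
X = sm.add_constant(df[["treatment", "age", "bmi"]])

fit = sm.Logit(y, X).fit()
print(fit.summary())
# Explanatory variables with significant coefficients suggest that the
# probability of censoring depends on them, so censoring may be dependent.
```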
Even though dependent censoring is present, this feature may not necessarily
affect inferences of primary interest. It is therefore important to determine the
sensitivity of such inferences to the possibility of dependent censoring, and this
can be examined using two complementary analyses. In the first, we assume
that individuals who contribute censored observations are actually those at
high risk of an event. We therefore suppose that individuals for whom the
survival time is censored would have experienced the event immediately after
the censoring time, and designate all censored times as event times. In the
second analysis, we assume that the censored individuals are those at low risk
of an event. We then suppose that individuals with censored times experience
the event only after the longest observed survival time amongst all others in
the data set. The censored times are therefore replaced by the longest event
time. The impact of these two assumptions on the results of the analysis can
then be studied in detail. If essentially the same conclusions arise from the
original analysis and the two supplementary analyses, it will be safe to assume
that the results are not sensitive to the possibility of dependent censoring.
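The two supplementary data sets are straightforward to construct, as in the following sketch; the data frame and column names are hypothetical.

```python
# A sketch of the two complementary sensitivity analyses. The column
# 'status' is 1 for an event and 0 for a censored time.
import pandas as pd

df = pd.read_csv("survdata.csv")  # hypothetical file

# Analysis 1: censored individuals assumed to be at high risk, so every
# censored time is designated an event time.
high_risk = df.copy()
high_risk["status"] = 1

# Analysis 2: censored individuals assumed to be at low risk, so each
# censored time is replaced by the longest observed event time.
low_risk = df.copy()
longest_event = df.loc[df["status"] == 1, "time"].max()
low_risk.loc[low_risk["status"] == 0, "time"] = longest_event

# Both data sets are then analysed in the usual way, and the resulting
# inferences compared with those from the original analysis.
```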
Another way to determine the potential impact of dependent censoring
is to obtain bounds on the survivor function in the presence of dependent
censoring. This has led to a number of proposals, but many of these lead to
bounds that are too wide to be of practical value, and so this approach will
not be considered further.
Since the occurrence of dependent censoring cannot be identified without
additional information, it is useful to analyse the extent to which quantities
such as the risk score or median survival time are sensitive to the introduction
of some association between the time to failure and the time to censoring. A
sensitivity analysis for dependent censoring in parametric proportional haz-
ards models, described by Siannis, Copas and Lu (2005), will be outlined in
the next section.
In a Weibull proportional hazards model, the estimated survivor function for the ith individual is \{\hat{S}_0(t)\}^{\exp(\hat{\beta}' x_i)}, where \hat{S}_0(t) = \exp(−\hat{\lambda} t^{\hat{\gamma}}) is the estimated baseline survivor function. Using Equation (14.1), the estimated survivor function when there is dependent censoring
is approximately

\{\hat{S}_0(t)\}^{\exp\{\hat{\beta}' x_i + B(x_i)\}},
from which the impact of dependent censoring on survival rates can be deter-
mined, for a given value of ϕ. Similarly, the estimated median survival time
of an individual with vector of explanatory variables xi in the Weibull model
is

\hat{t}(50) = \left\{ \frac{\log 2}{\hat{\lambda} \exp(\hat{\beta}' x_i)} \right\}^{1/\hat{\gamma}},
and again using Equation (14.1), the estimated median when there is a degree
of dependence, ϕ, between the survival and censoring times is
\hat{t}_\phi(50) = \left\{ \frac{\log 2}{\hat{\lambda} \exp[\hat{\beta}' x_i + B(x_i)]} \right\}^{1/\hat{\gamma}}.
An approximation to the relative reduction in the median survival time for
this individual is then
\frac{\hat{t}(50) − \hat{t}_\phi(50)}{\hat{t}(50)} = 1 − \exp\{−\hat{\gamma}^{-1} B(x_i)\}, \qquad (14.2)
for some value of ϕ. A plot of this quantity against a range of possible values
of the censoring score, β̂c′ xi , shows how the median survival time for individ-
uals with different censoring scores might be affected by dependent censoring.
The robustness of the estimated median to different amounts of dependent
censoring can also be explored by using a range of ϕ-values.
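The following sketch illustrates a calculation based on Equation (14.2). The form of B(x_i) depends on Equation (14.1) and on the fitted models, so the values used below are placeholders only, and do not reproduce the form given by Siannis, Copas and Lu (2005).

```python
# A sketch of Equation (14.2): the approximate relative reduction in the
# median survival time, plotted against the censoring score.
import numpy as np
import matplotlib.pyplot as plt

gamma_hat = 1.1  # hypothetical Weibull shape parameter estimate

def relative_median_reduction(b, gamma=gamma_hat):
    """Relative reduction in the median survival time, Equation (14.2)."""
    return 1.0 - np.exp(-np.asarray(b, dtype=float) / gamma)

cens_score = np.linspace(-6.5, -4.5, 50)           # hypothetical range
b_values = 0.1 * (cens_score - cens_score.min())   # placeholder for B(x_i)

plt.plot(cens_score, 100 * relative_median_reduction(b_values))
plt.xlabel("Censoring score")
plt.ylabel("Reduction in median survival time (%)")
plt.show()
```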
14.2.2 Impact of dependent censoring
With independent censoring, individuals who are censored are representative
of the individuals at risk at the time of censoring, and estimated hazard ra-
tios and survivor functions obtained from a standard analysis of the time to
event data will be unbiased. On the other hand, if there is dependent cen-
soring, but it is assumed to be independent, model-based estimates may be
biased. The direction of this bias depends on whether there is positive or
negative association between the time to event and time to censoring vari-
ables. If there is a positive association between the two variables, those with
censored event times would be expected to experience a shorter event time
than those who remain at risk. Similarly, if there is a negative association,
individuals with censored event times may be those who would otherwise
have had a longer time before the occurrence of the event of interest. Stan-
dard methods for survival analysis would then lead to an overestimate or
underestimate of the survivor function, respectively, and the extent of the
bias will tend to increase as the number of dependently censored observations
increases.
Table 14.1 Time from registration for a liver transplant until death while waiting.
Patient Time Status Age Gender BMI UKELD
1 1 0 60 0 24.24 60
2 2 1 66 0 30.53 67
3 3 0 71 0 26.56 61
4 3 0 65 1 23.15 63
5 3 0 62 0 22.55 64
6 4 1 56 0 36.39 73
7 5 0 52 0 24.77 57
8 5 0 65 0 33.87 49
9 5 1 58 0 27.55 75
10 5 1 57 0 22.10 64
11 6 0 62 0 21.60 55
12 7 0 56 0 25.69 66
13 7 0 52 0 32.39 59
14 8 1 45 0 28.98 66
15 9 0 50 0 31.67 60
16 9 0 65 0 24.67 57
17 9 0 44 0 24.34 64
18 10 0 67 0 22.65 61
19 12 0 67 0 26.18 57
20 13 0 57 0 22.23 53
Livers are generally allocated on the basis of need, and so tend to be offered
to those patients who are more seriously ill. As a result, patients who get a
transplant tend to be those who are nearer to death. The time to censoring
will then be associated with the time to death, and so the time from listing
until a transplant is dependent on the time from listing until death without a
transplant.
To illustrate the sensitivity analysis described in Section 14.2.1, consider a
Weibull model for the hazard of death at time t that contains the explanatory
variables that denote the age, gender, BMI and UKELD value of a patient
on the liver registration list. From a log-cumulative hazard plot, the Weibull
distribution fits well to the unadjusted survival times. The times to censoring
are also well fitted by a Weibull distribution, and the same four explanatory
variables will be included in this model. From these fitted models, the risk
score, β̂ ′ xi , and the censoring score, β̂c′ xi , can be obtained for each of the
281 patients, and a plot of the risk score against the censoring score for each
patient is given in Figure 14.1.
This figure shows that the risk score is positively correlated with the cen-
soring score, and so individuals that have a greater hazard of death, that is
those with a higher risk score, are more likely to be censored. This indicates
that there is dependent censoring in this data set.
To explore how this dependent censoring might affect the median survival
time, the relative change in the median is obtained for a moderate value of
ϕ. The approximate percentage reduction in the median survival time, plotted against the censoring score when ϕ = 0.3, is shown in Figure 14.2.
Figure 14.1 A plot of the values of the risk score, β̂ ′ xi against the censoring score,
β̂c′ xi , for each patient on the liver transplant waiting list.
Figure 14.2 Approximate percentage reduction in the median survival time from list-
ing for a liver transplant, as a function of the censoring score, when ϕ = 0.3.
The Weibull model for the censoring times can be expressed in log-linear form, in which µ_c is the 'intercept', σ_c the 'scale' and α_{cj} is the coefficient of the jth
explanatory variable, with λ_c = \exp(−µ_c/σ_c), γ_c = 1/σ_c and β_{cj} = −α_{cj}/σ_c, for
j = 1, 2, . . . , p.
To handle the time dependence of the weights, the data need to be ex-
pressed in counting process format, using the (start, stop, status) notation
that was described in Section 13.1.3 of Chapter 13. The stop times are taken
to be the event times in the data set.
The weights, wi (t), calculated from Equation (14.3), can get quite large
when the probability of censoring beyond t is small, and this can lead to com-
putational problems. It can then be more efficient to use stabilised weights,
wi∗ (t), calculated using wi∗ (t) = ŜKM (t)/Ŝci (t), where ŜKM (t) is the Kaplan-
Meier estimate of the probability of censoring after t. Using the term ŜKM (t)
in the numerator of the weights has no effect on parameter estimates, since
ŜKM (t) is independent of explanatory variables in the model and cancels out
in the numerator and denominator of the partial likelihood function in Equa-
tion (14.4). However, it does lead to greater stability in the model fitting
process.
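A sketch of this calculation in Python, using the lifelines package, is given below. The counting process data frame and its column names are hypothetical; the column 'cens_prob' is assumed to hold the model-based censoring probabilities Ŝ_ci(t) at the stop times, obtained from the fitted Weibull censoring model.

```python
# A sketch of the stabilised weights w*_i(t) = S_KM(t) / S_ci(t).
import pandas as pd
from lifelines import KaplanMeierFitter

cp = pd.read_csv("counting_process.csv")  # hypothetical (start, stop] data

# Kaplan-Meier estimate of the censoring distribution, computed from the
# final record of each individual with the status indicator reversed, so
# that 'events' are censorings.
last = cp.sort_values(["id", "stop"]).groupby("id").tail(1)
kmf = KaplanMeierFitter()
kmf.fit(last["stop"], event_observed=1 - last["status"])

s_km = kmf.survival_function_at_times(cp["stop"]).values
cp["sweight"] = s_km / cp["cens_prob"]  # stabilised weight at each stop time
```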
Finally, to account for the additional uncertainty in the specification of
the model, a robust estimate of the variance-covariance matrix of the param-
eter estimates is recommended, such as the sandwich estimate introduced in
Section 13.1.4 of Chapter 13.
Next, the data in Table 14.1 are expressed in the counting process format,
and weights are calculated as the reciprocals of the estimated censoring probabilities, Ŝ_{ci}(t)
in Equation (14.5), at the 'stop' times. Data for the first 10 patients from Table 14.1 are shown
in the counting process format in Table 14.3, together with the censoring
probabilities and weights that are used in fitting a Cox regression model to
the survival times.
A weighted Cox regression model that contains the same four explanatory
variables is then fitted, and the estimated hazard of death at time t for the
ith patient is
ĥi (t) = exp{β̂1 x1i + β̂2 x2i + β̂3 x3i + β̂4 x4i }ĥ0 (t),
where ĥ0 (t) is the estimated baseline hazard function. The parameter esti-
mates and their standard errors in the weighted Cox regression model that
Table 14.3 Data from the first 10 patients in Table 14.1 in the counting process
format.
Patient Start time Stop time Status Censoring probability Weight
1 0 1 0 0.9955 1.0045
2 0 2 1 0.9888 1.0114
3 0 2 0 0.9915 1.0085
3 2 3 0 0.9872 1.0129
4 0 2 0 0.9873 1.0128
4 2 3 0 0.9809 1.0195
5 0 2 0 0.9885 1.0116
5 2 3 0 0.9827 1.0176
6 0 2 0 0.9851 1.0151
6 2 4 1 0.9701 1.0308
7 0 2 0 0.9919 1.0081
7 2 4 0 0.9837 1.0166
7 4 5 0 0.9796 1.0208
8 0 2 0 0.9960 1.0040
8 2 4 0 0.9919 1.0081
8 4 5 0 0.9899 1.0102
9 0 2 0 0.9806 1.0198
9 2 4 0 0.9610 1.0405
9 4 5 1 0.9513 1.0511
10 0 2 0 0.9880 1.0121
10 2 4 0 0.9758 1.0248
10 4 5 1 0.9698 1.0312
takes account of dependent censoring are shown in Table 14.4. Also shown
are the corresponding unweighted estimates when dependent censoring is not
taken into account. The sandwich estimate of the variance-covariance matrix
of the parameter estimates has been used in both cases, but this only makes
a very small difference to the standard errors.
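A weighted fit of this kind might be obtained as in the following sketch, which uses the lifelines package on the counting process data frame of the earlier sketch, with the stabilised weights computed there; all column names are hypothetical.

```python
# A sketch of a weighted Cox regression fit on counting process data.
from lifelines import CoxTimeVaryingFitter

ctv = CoxTimeVaryingFitter()
ctv.fit(cp[["id", "start", "stop", "status", "sweight",
            "age", "sex", "bmi", "ukeld"]],
        id_col="id", start_col="start", stop_col="stop",
        event_col="status", weights_col="sweight")
# A sandwich (robust) variance estimate, as in Section 13.1.4, is
# recommended for the standard errors when weights are used.
ctv.print_summary()
```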
Table 14.4 Parameter estimates and their standard errors in a weighted and un-
weighted Cox regression model.
Variable Parameter Weighted Unweighted
Estimate se (Estimate) Estimate se (Estimate)
Age β̂1 0.0118 0.0316 0.0774 0.0200
Sex β̂2 −0.9895 0.6549 −0.1471 0.4944
BMI β̂3 −0.0218 0.0492 0.0236 0.0210
UKELD β̂4 0.1559 0.0427 0.2162 0.0276
The two sets of estimates are somewhat different, which shows that
the adjustment for dependent censoring has affected the hazard ratios. In
the unweighted analysis, both age and UKELD score are highly significant
(P < 0.001), whereas age ceases to be significant in the weighted analysis.
From the hazard ratio for UKELD after adjustment for dependent censor-
ing, a unit increase in the UKELD leads to a 17% increase in the hazard
of death.
To illustrate the effect of the adjustment for dependent censoring, Fig-
ure 14.3 shows the estimated survivor functions for a female patient aged 50
with a UKELD score of 60 and a BMI of 25, from a weighted and unweighted
Cox regression model for the hazard of death.
[Figure 14.3 plot: estimated survivor function against days from registration.]
Figure 14.3 Estimated survivor functions in a weighted (·······) and unweighted (—)
Cox regression model for a 50-year-old female with a UKELD score of 60 and a BMI
of 25.
This figure shows how survival rates are overestimated if account is not
taken of dependent censoring. In particular, if no allowance is made for de-
pendent censoring, the survival rate at six months is estimated to be 77%,
but after taking account of dependent censoring, the estimate is 65%. In ad-
dition, the time at which the survival rate falls to 80% is overestimated by nearly two months. Fail-
ure to take account of the dependent censoring can therefore result in mis-
leading estimates of the waiting list mortality for patients awaiting a liver
transplant.
In an extension to this analysis, separate models could be entertained for
different causes of censoring, namely transplant and removal from the regis-
tration list because of a deteriorating condition, as well as for any independent
censoring. Additionally, variation in the UKELD score over time could also be
incorporated so that changes in disease severity over the registration period
can be accounted for.
14.4 Further reading
Bounds for the survivor function in the presence of dependent censoring have
been described by a number of authors, including Peterson (1976), Slud and
Rubinstein (1983) and Klein and Moeschberger (1988). Tsiatis (1975) explains
why the extent of association between event times and censoring times cannot
be estimated from observed data, and Siannis (2004) and Siannis, Copas and
Lu (2005) have described methods for determining the sensitivity of infer-
ences to dependent censoring in parametric models. This approach has been
extended to the Cox proportional hazards model in Siannis (2011), although
the method is computationally intensive.
Models for data with dependent censoring have been described by Wu
and Carroll (1988) and Schluchter (1992). Inverse probability of censoring
weighted estimators were introduced by Robins and Rotnitzky (1992) and
Robins (1993). Robins and Finkelstein (2000) showed how these estimators
could be used to adjust Kaplan-Meier estimates to take account of dependent
censoring. Satten, Datta and Robins (2001) and Scharfstein and Robins (2002)
showed how to estimate the survivor function for a Cox regression model in
the presence of dependent censoring.
Chapter 15

Sample size requirements for a survival study
There are many aspects of the design of a medical research programme that
need to be considered when the response variable of interest is a survival time.
These include factors such as the inclusion and exclusion criteria for study
participants, the unambiguous definition of the time origin and the end-point
of the study, and the duration of patient follow-up. In a clinical trial, the
specification of treatments, the method of randomisation to be employed in
allocating patients to treatment group, and the use of blinding must also be
specified. Consideration might also be given to whether the study should be
based on a fixed number of patients, or whether a sequential design should
be adopted, in which the study continues until there is a sufficient number
of events to be able to distinguish between treatments. The need for interim
analyses, or adaptive designs that allow planned modifications to be made to
the sample size or allocated treatment as data accumulate, also needs to be
discussed.
Many of these considerations are not unique to studies where survival is
the outcome of interest, and are discussed in a number of texts on the design
and analysis of clinical trials, such as Friedman, Furberg and DeMets (2010),
Matthews (2006) and Pocock (1983). However, there is one matter in the de-
sign of fixed sample size studies that will be discussed here. This is the crucial
issue of the number of patients that are required in a survival study. If too
few patients are recruited, there may be insufficient information available in
the data to enable a treatment difference to be pronounced significant. On the
other hand, it is unethical to waste resources in studies that are unnecessarily
large. Sample size calculations for survival data are presented in this chapter.
Suppose that in this study, there are two groups of patients, and that the
standard treatment is allocated to the patients in Group I, while the new
treatment is allocated to those in Group II. Assuming a proportional hazards
model for the survival times, the hazard of death at time t for a patient on
the new treatment, h_N(t), can be written as

h_N(t) = \psi h_S(t),

where h_S(t) is the hazard function at t for a patient on the standard treatment
and ψ is the unknown hazard ratio. We will also define θ = log ψ to be the log-
hazard ratio. If θ is zero, there is no treatment difference. On the other hand,
negative values of θ indicate that survival is longer under the new treatment,
while positive values of θ indicate that patients survive longer on the standard
treatment.
In order to test the null hypothesis that θ = 0, the log-rank test described
in Section 2.6 can be used. As was shown in Section 3.13, this is equivalent
to using the score test of the null hypothesis of equal hazards in the Cox
regression model. In this chapter, sample size requirements will be based on
the log-rank test statistic, but the formulae presented can also be used when
an analysis based on the Cox regression model is envisaged.
In a survival study, the occurrence of censoring means that it is not usu-
ally possible to measure the actual survival times of all patients in the study.
However, it is the number of actual deaths that is important in the analysis,
rather than the total number of patients. Accordingly, the first step in deter-
mining the number of patients in a study is to calculate the number of deaths
that must be observed. We then go on to determine the required number of
patients.
The required number of deaths is then

d = \frac{4(z_{\alpha/2} + z_\beta)^2}{\theta_R^2} = \frac{4\, c(\alpha, \beta)}{\theta_R^2}, \qquad (15.1)

where c(\alpha, \beta) = (z_{\alpha/2} + z_\beta)^2, z_{\alpha/2} and z_\beta are the upper \alpha/2- and \beta-points of the standard normal distribution, respectively, and \theta_R is the log-hazard ratio that it is important to be able to detect. The values of c(\alpha, \beta) for commonly chosen values of the significance level \alpha
and power 1 − \beta are given in Table 15.1. More generally, if a proportion \pi of the patients is to be allocated to one of the two treatment groups, the required number of deaths becomes

d = \frac{c(\alpha, \beta)}{\pi(1 − \pi)\theta_R^2}. \qquad (15.2)
The log-rank statistic, described in Section 2.6, is

U = \sum_{j=1}^{r} (d_{1j} − e_{1j}),

where d_{1j} is the number of deaths in Group I at the jth ordered death time, t_{(j)}, j = 1, 2, . . . , r, and e_{1j} is the expected number of deaths in Group I at t_{(j)}, given by e_{1j} = n_{1j} d_j / n_j. Here, n_{1j} and n_{2j} are the numbers at risk in Groups I and II at t_{(j)}, n_j = n_{1j} + n_{2j}, and d_j is the total number of deaths at t_{(j)}. The variance of the log-rank statistic is

V = \sum_{j=1}^{r} \frac{n_{1j} n_{2j} d_j (n_j − d_j)}{n_j^2 (n_j − 1)}. \qquad (15.3)
When using the log-rank test, the null hypothesis that θ = 0 is rejected
if the absolute value of U is sufficiently large, that is, if |U | > k, say, where
k > 0 is a constant. We therefore require that

P(|U| > k; θ = 0) = α,

and

P(|U| > k; θ = θ_R) = 1 − β,
for a two-sided 100α% significance test to have power 1 − β.
We now quote without proof a result given in Sellke and Siegmund (1983),
according to which the log-rank statistic, U , has an approximate normal dis-
tribution with mean θV and variance V , for small values of θ. Indeed, the
result that U ∼ N (0, V ) under the null hypothesis θ = 0, is used as a basis
for the log-rank test. Then, since the event |U| > k occurs when either U < −k or U > k,

P(|U| > k; θ = θ_R) = P(U < −k; θ = θ_R) + P(U > k; θ = θ_R).
For the sort of values of k that are likely to be used in the hypothesis test,
either P(U < −k; θ = θR ) or P(U > k; θ = θR ) will be negligible. For example,
if the new treatment is expected to increase survival so that θR is taken to be
less than zero, the probability of U having a value in excess of k, k > 0, will
be small. So without loss of generality we will take

P(U < −k; θ = θ_R) = 1 − β.

Also, since the distribution of U is symmetric about zero when θ = 0,

P(U > k; θ = 0) = α/2. \qquad (15.5)
We now denote the upper 100p% point of the standard normal distribu-
tion by zp . Then Φ(zp ) = 1 − p, where Φ(·) stands for the standard normal
distribution function. The quantity Φ(zp ) therefore represents the area under
a standard normal density function to the left of the value zp . Now, since
U ∼ N(0, V) when θ = 0,

P(U > k; θ = 0) = 1 − P(U ⩽ k; θ = 0) = 1 − \Phi\left( \frac{k}{\sqrt{V}} \right),

and using Equation (15.5) we have that

\Phi\left( \frac{k}{\sqrt{V}} \right) = 1 − (α/2).

Therefore,

\frac{k}{\sqrt{V}} = z_{α/2},

where z_{α/2} is the upper α/2-point of the standard normal distribution, and so k can be expressed as

k = z_{α/2} \sqrt{V}. \qquad (15.6)
In a similar manner, since U ∼ N(θ_R V, V) when θ = θ_R,

P(U < −k; θ = θ_R) = \Phi\left( \frac{−k − θ_R V}{\sqrt{V}} \right) ≈ 1 − β,

and so we take

\frac{−k − θ_R V}{\sqrt{V}} = z_β,
where z_β is the upper β-point of the standard normal distribution. If we now
substitute for k from Equation (15.6), we get

−z_{α/2} \sqrt{V} − θ_R V = z_β \sqrt{V},

so that

V = (z_{α/2} + z_β)^2 / θ_R^2. \qquad (15.7)
An approximation to V can now be found. When the number of deaths at each death time is small relative to the corresponding number at risk, the variance in Equation (15.3) is approximately

\sum_{j=1}^{r} \frac{n_{1j} n_{2j} d_j}{n_j^2}. \qquad (15.8)

Moreover, if the two treatment groups are of roughly equal size at each death time, n_{1j} n_{2j} / n_j^2 ≈ 1/4. Then, V is given by

V ≈ \sum_{j=1}^{r} d_j / 4 = d/4,

where d = \sum_{j=1}^{r} d_j is the total number of deaths among the individuals in
the study.
Finally, using Equation (15.7), we now require d to be such that
\frac{d}{4} = \frac{(z_{α/2} + z_β)^2}{θ_R^2},

which leads to the required number of deaths being that given in Equation (15.1).
At later death times, that is, when the values of j in Expression (15.8) are
close to r, the numbers of individuals at risk in the two groups will be small.
This is likely to mean that n1j and n2j will be quite different at the later
death times, and so n_{1j} n_{2j}/n_j^2 will be less than 0.25. This in turn means that
V < d/4 and so the required number of deaths will tend to be underestimated.
Figure 15.1 Estimated survivor function for patients receiving a standard treatment
for hepatitis.
Suppose that survival data are available for a group of patients who have received the standard therapy. The Kaplan-Meier estimate of the
survivor function derived from such data is shown in Figure 15.1.
From this estimate of the survivor function, the median survival time is
3.3 years, and the survival rates at two, four and six years can be taken to be
given by S(2) = 0.70, S(4) = 0.45, and S(6) = 0.25.
The new treatment is expected to increase the survival rate at five years
from 0.41, the value under the standard treatment, to 0.60. This information
can be used to calculate a value for θR . To do this, we use the result that if
the hazard functions are assumed to be proportional, the survivor function
for an individual on the new treatment at time t is
S_N(t) = \{S_S(t)\}^{\psi}, \qquad (15.9)
where SS (t) is the survivor function for an individual on the standard treat-
ment at t and ψ is the hazard ratio. Therefore,
\psi = \frac{\log S_N(t)}{\log S_S(t)},

and so the value of ψ corresponding to an increase in S(t) from 0.41 to 0.60 is

\psi_R = \frac{\log(0.60)}{\log(0.41)} = 0.57.
With this information, the survivor function for a patient on the new treat-
ment is [SS (t)]ψR , and so SN (2) = 0.82, SN (4) = 0.63, and SN (6) = 0.45. A
plot of the two survivor functions is shown in Figure 15.2.
Figure 15.2 Estimated survivor functions for individuals on the standard treatment
(—) and the new treatment (·······).
The median survival time under the new treatment can be determined
from this estimate of the survivor function. Using Figure 15.2, the median
survival time under the new treatment is estimated to be about six years. A
hazard ratio of 0.57 therefore implies an increase in median survival time from
3.3 years on the standard treatment to 6 years on the new treatment.
To calculate the number of deaths that would be required in a study to
compare the two treatments, we will take α = 0.05 and 1 − β = 0.90. With
these values of α and β, the value of the function c(α, β) from Table 15.1
is 10.51. Substituting for c(0.05, 0.1) in Equation (15.2) and taking θR =
log ψR = log(0.57) = −0.562, the number of deaths required to have a 90%
chance of detecting a hazard ratio of 0.57 to be significant at the 5% level is
given by
d = \frac{4 \times 10.51}{0.562^2} = 133.
Allowing for possible underestimation, this can be rounded up to 140 deaths
in total. This means that approximately 70 deaths would need to be observed
in each treatment group.
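This calculation is easily reproduced, as in the following sketch in Python; only the scipy package is assumed.

```python
# A sketch of the sample size calculation in Equations (15.1) and (15.2).
from math import log
from scipy.stats import norm

def required_deaths(hazard_ratio, alpha=0.05, power=0.90, prop=0.5):
    """Deaths needed for a two-sided level-alpha log-rank test with the
    given power; 'prop' is the proportion allocated to one group."""
    theta = log(hazard_ratio)                              # theta_R
    c = (norm.ppf(1 - alpha / 2) + norm.ppf(power)) ** 2   # c(alpha, beta)
    return c / (prop * (1 - prop) * theta ** 2)

print(round(required_deaths(0.57)))  # 133, as in the worked example
print(round(required_deaths(0.78)))  # about 680, for the smaller effect
```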
The treatment difference that it is required to detect may also be expressed
in terms of the desired absolute or relative change in the median survival time.
The corresponding log-hazard ratio, for use in Equation (15.1), can then be
found by reversing the preceding calculation. For example, suppose that an
increase in the median survival time from 3.3 years on the standard treatment
to just 5 years on the new treatment is anticipated. The survivor function
on the new treatment is then 0.5 when t = 5, and using Equation (15.9),
SN (5) = {SS (5)}ψR = 0.5. Consequently, ψR log{SS (5)} = log 0.5, and since
SS (5) = 0.41, ψR = 0.78. This reflects a less optimistic view of the treatment
effect than when ψR is taken to be 0.57. The corresponding number of deaths
that would need to be observed for this hazard ratio to be declared significantly
different from unity at the 5% level, with 90% power, is then around 680. This
is considerably greater than the number needed to identify a hazard ratio of
0.57 as significant.
The calculations described above are only going to be of direct use when a
study is to be continued until a given number of those entering the study have
died. Most trials will be designed on the basis of the number of patients to be
recruited and so we must now examine how this number can be obtained.
The required number of patients, n, must then be such that

n = \frac{d}{P(\text{death})}, \qquad (15.10)

where d is the required number of deaths found from Equation (15.2). Ac-
cording to a result derived in the next section, the probability of death can
be taken as
P(\text{death}) = 1 − \frac{1}{6}\left\{ \bar{S}(f) + 4\bar{S}(0.5a + f) + \bar{S}(a + f) \right\}, \qquad (15.11)

where

\bar{S}(t) = \frac{S_S(t) + S_N(t)}{2},
and SS (t) and SN (t) are the estimated values of the survivor functions for
individuals on the standard and new treatments, respectively, at time t.
The above result shows how the required number of patients can be cal-
culated for a trial with an accrual period of a and a follow-up period of f . Of
course, the duration of the accrual period and follow-up period will depend
on the recruitment rate. So suppose that the recruitment rate is expected to
be m patients per month and that d deaths are required. If n patients are
to be entered into the study over a period of a months, this means that n/a
patients need to be recruited in each month. In practice, information is likely to be
available on the accrual rate, m, that can be expected. The number recruited
in an accrual period of length a is then ma and so the expected number of
deaths in the study is
ma × P(death).
Values of a and f which make this value close to the number of deaths required
can then be found numerically, for example, by trying out different values of
a and f . This algorithm could be computerised and an optimisation method
used to find the value of a that makes

ma × P(\text{death}) − d \qquad (15.12)

close to zero, for a range of values of f. Alternatively, the value of f that yields
the result in Equation (15.12) for a range of values of a can be found. A two-
way table giving the required number of patients for different combinations of
values of a and f will be particularly useful in planning a study.
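The following sketch illustrates how such a table might be constructed. The survivor functions, the recruitment rate and the required number of deaths are hypothetical values chosen purely for illustration.

```python
# A sketch of choosing accrual (a) and follow-up (f) periods: Equation
# (15.11) gives P(death), and m*a*P(death) is compared with the required
# number of deaths d. Time is measured in years.
from math import exp

def S_standard(t):
    return exp(-0.21 * t)           # hypothetical: median about 3.3 years

def S_new(t):
    return S_standard(t) ** 0.57    # proportional hazards with psi = 0.57

def p_death(a, f):
    sbar = lambda t: 0.5 * (S_standard(t) + S_new(t))
    return 1 - (sbar(f) + 4 * sbar(0.5 * a + f) + sbar(a + f)) / 6

d, m = 140, 60                       # required deaths; patients per year
for a in (2, 3, 4, 5):               # accrual period in years
    for f in (0, 1, 2, 3):           # follow-up period in years
        n, expected = m * a, m * a * p_death(a, f)
        print(f"a={a}, f={f}: n={n}, expected deaths={expected:.0f}")
```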
The following section gives details underlying the derivation of the result
in Equation (15.11), and can again be omitted without loss of continuity.
If patient entry times are assumed to be uniformly distributed over the accrual period (0, a), then

P(\text{death}) = 1 − \frac{1}{a} \int_0^a P(\text{survival} \mid \text{entry at } t)\, dt.
A patient entering the study at time t who survives for the duration of the
study, that is, to time a + f , must have been alive for a period of length
a + f − t after entry. The conditional probability P(survival | entry at t) is
therefore the probability of survival beyond a + f − t. This probability is
the value of the survivor function for that individual at a + f − t, that is,
S(a + f − t). Consequently,
P(\text{death}) = 1 − \frac{1}{a} \int_0^a S(a + f − t)\, dt,
so that, on setting u = a + f − t,

P(\text{death}) = 1 − \frac{1}{a} \int_f^{a+f} S(u)\, du. \qquad (15.15)

The integral can be approximated using Simpson's rule, giving

\int_f^{a+f} S(u)\, du ≈ \frac{a}{6}\left\{ S(f) + 4S(0.5a + f) + S(a + f) \right\},
and hence, using Equation (15.15), the probability of death during the study
is given by

P(\text{death}) = 1 − \frac{1}{6}\left\{ S(f) + 4S(0.5a + f) + S(a + f) \right\}.
From this result, the approximate probability of death for an individual in
Group I, for whom the survivor function is SS (t), is
P(\text{death; Group I}) = 1 − \frac{1}{6}\left\{ S_S(f) + 4S_S(0.5a + f) + S_S(a + f) \right\},
and similarly that for an individual in Group II is
P(\text{death; Group II}) = 1 − \frac{1}{6}\left\{ S_N(f) + 4S_N(0.5a + f) + S_N(a + f) \right\}.
On the assumption that there is an equal probability of an individual being
assigned to either of the two treatment groups, the overall probability of death
is the average of these two probabilities, which gives the result in Equation (15.11). If the two groups are of unequal size, a correspondingly weighted average can be used,
and the result for the overall probability of death given in Equation (15.11)
can be modified accordingly.
If, instead, each patient is to be followed up for a fixed period of length τ, the overall probability of death is approximately

1 − \frac{S_S(\tau) + S_N(\tau)}{2}.

Using Equation (15.10), the required number of patients becomes

n = \frac{2d}{2 − S_S(\tau) − S_N(\tau)}.
15.4 Further reading

Full details on the issues to be considered when designing a clinical trial are
given in Friedman, Furberg and DeMets (2010), Matthews (2006) and Pocock
(1983) and the more general text of Altman (1991). Whitehead (1997) de-
scribes how a sequential clinical trial can be designed when the outcome vari-
able of interest is a survival time, and Jennison and Turnbull (2000) describe
group sequential methods.
Extensive tables of sample size requirements in studies involving different
types of response variable, including survival times, are provided by Machin
et al. (2009) and Julious (2010). A number of commercially available soft-
ware packages for sample size calculations, including PASS (Power Analysis
and Sample Size program) and nQuery Advisor, also implement methods for
calculating the required number of patients in a survival study.
The formula for the required number of deaths in Equation (15.1) appears
in many papers, including Bernstein and Lagakos (1978), Schoenfeld (1981),
Schoenfeld and Richter (1982) and Schoenfeld (1983), although the assump-
tions on which the result is based are different. Bernstein and Lagakos (1978)
obtain Equation (15.1) on the assumption that the survival times in each group
have exponential distributions. Lachin (1981), Rubinstein, Gail and Santner
(1981) and Lachin and Foulkes (1986) also discuss sample size requirements
in trials where the survival times are assumed to be exponentially distributed.
See also the earlier work of George and Desu (1974).
Schoenfeld (1981) obtains the same result as Bernstein and Lagakos (1978)
and others when the log-rank test is used to compare treatments, without
making the additional assumption of exponentiality. Schoenfeld (1983) shows
that Equation (15.1) holds when information on the values of explanatory
variables is allowed for.
The formulae for the required number of patients in Section 15.3 are based
on Schoenfeld (1983). When the assumption of exponential survival times is
made, these formulae simplify to the results of Schoenfeld and Richter (1982).
Although the resulting formulae are easier to use, it is dangerous to conduct
sample size calculations on the basis of restrictive assumptions about survival
time distributions.
A variant on the formula for the required number of deaths is given
by Freedman (1982). Freedman's result has \{(1 + ψ)/(1 − ψ)\}^2 in place of
4/(\log ψ)^2 in Equation (15.1). However, for small values of \log ψ,

\left\{ \frac{1 + ψ}{1 − ψ} \right\}^2 ≈ \frac{4}{(\log ψ)^2},
and so the two expressions will tend to give similar results. The approximate
formula for the required number of patients, given in Section 15.3.2, is also
due to Freedman (1982).
Lakatos (1988) presented a method for estimating the required number of
patients to compare two treatments which can accommodate matters such as
staggered entry, non-compliance, loss to follow-up and non-proportional haz-
ards. Lakatos and Lan (1992) show that the Lakatos method procedure per-
forms well in a variety of circumstances. This approach is based on a Markov
model, and requires a computer program for its implementation; a SAS macro
has been given by Shih (1995).
Approximate sample size formulae for use in modelling the subdistribution
hazard function in the presence of competing risks, described in Section 12.5
of Chapter 12, have been given by Latouche, Porcher and Chevret (2004).
Appendix A

Maximum likelihood estimation
or from the equivalent formula,

i(\beta) = E\left[ \left\{ \frac{d \log L(\beta)}{d\beta} \right\}^2 \right].
The reciprocal of this function, evaluated at β̂, is then the approximate variance of β̂ given in Equation (A.2), that is,

\text{var}(\hat{\beta}) ≈ \frac{1}{i(\hat{\beta})}.

The standard error of β̂, that is, the square root of the estimated variance of
β̂, is found from

\text{se}(\hat{\beta}) = \frac{1}{\sqrt{i(\hat{\beta})}}.
This standard error can be used to construct confidence intervals for β.
In order to test the null hypothesis that β = 0, three alternative test statis-
tics can be used. The likelihood ratio test statistic is the difference between the
values of −2 log L(β̂) and −2 log L(0). The Wald test is based on the statistic
\hat{\beta}^2\, i(\hat{\beta}),

and the score test statistic is

\frac{\{u(0)\}^2}{i(0)}.
Each of these statistics has an asymptotic chi-squared distribution on 1 d.f.,
under the null hypothesis that β = 0. Note that the Wald statistic is equivalent
to the statistic

\frac{\hat{\beta}}{\text{se}(\hat{\beta})},
which has an asymptotic standard normal distribution.
A.2 Inference about a vector of unknown parameters

When a model contains p unknown parameters, β_1, β_2, . . . , β_p, the maximum likelihood estimates β̂_1, β̂_2, . . . , β̂_p are found by solving the equations

\frac{\partial \log L(\beta)}{\partial \beta_j} \bigg|_{\beta = \hat{\beta}} = 0,

for j = 1, 2, . . . , p, simultaneously.
The vector formed from β̂1 , β̂2 , . . . , β̂p is denoted β̂, and so the maximised
likelihood is L(β̂). The efficient score for βj , j = 1, 2, . . . , p, is
u(\beta_j) = \frac{\partial \log L(\beta)}{\partial \beta_j},
and the (j, k)th element of the observed information matrix, I(β̂), is

−\frac{\partial^2 \log L(\hat{\beta})}{\partial \beta_j \partial \beta_k},

for j, k = 1, 2, . . . , p. The inverse of this matrix, I^{-1}(\hat{\beta}), is the approximate variance-covariance matrix of the parameter estimates.
The square root of the (j, j)th element of this matrix can be taken to be the
standard error of β̂j , for j = 1, 2, . . . , p.
The test statistics given in Section A.1 can be generalised to the multi-
parameter situation. Consider the test of the null hypothesis that all the
β-parameters in a fitted model are equal to zero. The likelihood ratio test
statistic is the value of

2\left\{ \log L(\hat{\beta}) − \log L(0) \right\},

the Wald test statistic is \hat{\beta}' I(\hat{\beta})\, \hat{\beta}, and the score test statistic is

u'(0)\, I^{-1}(0)\, u(0).
Each of these statistics has a chi-squared distribution on p d.f. under the null
hypothesis that β = 0.
In comparing alternative models, interest centres on the hypothesis that
some of the β-parameters in a model are equal to zero. To test this hypoth-
esis, the likelihood ratio test is the most suitable, and so we only consider
this procedure here. Suppose that a model that contains p + q parameters,
β1 , β2 , . . . , βp , βp+1 , . . . , βp+q , is to be compared with a model that only con-
tains the p parameters β1 , β2 , . . . , βp . This amounts to testing the null hy-
pothesis that the q parameters βp+1 , βp+2 , . . . , βp+q in the model with p + q
unknown parameters are all equal to zero. Let β̂ 1 denote the vector of es-
timates under the model with p + q parameters and β̂ 2 that for the model
with just p parameters. The likelihood ratio test of the null hypothesis that
βp+1 = βp+2 = · · · = βp+q = 0 in the model with p + q parameters is then
based on the statistic
2\left\{ \log L(\hat{\beta}_1) − \log L(\hat{\beta}_2) \right\},
which has a chi-squared distribution on q d.f., under the null hypothesis. This
test forms the basis for comparing alternative models, and was described in
greater detail in Section 3.5 of Chapter 3.
Appendix B

Additional data sets
This appendix contains a number of data sets, together with some suggestions
for analyses that could be carried out. These data sets may be downloaded
from the publishers’s web site, at the location given in the preface.
Summarise the data in terms of the estimated survivor function for each
treatment group. Compare the groups using the log-rank and Wilcoxon tests.
Fit Cox and Weibull proportional hazards models to determine the significance
of the treatment effect. Compare the results from these different analyses in
terms of the significance of the treatment effect, and, for the model-based
analyses, the estimated hazard ratio and corresponding 95% confidence limits.
Obtain a log-cumulative hazard plot of the data, and comment on which
method of analysis is the most appropriate.
The values of each of these variables for the ducks in the study are shown
in Table B.3.
Using a proportional hazards model, examine the effect of age, weight and
length on survival time. Investigate whether the coefficients of the explanatory
variables, Weight and Length, differ for the ducks in each age group. Is there
any evidence that non-linear functions of the variables Weight and Length are
needed in the model?
B.4 Bone marrow transplantation
Following the treatment of leukaemia, patients often undergo a bone marrow
transplant in order to help bring their blood cells back to a normal level. A
potentially fatal side effect of this is graft-versus-host disease, in which the
transplanted cells attack the host cells. In a study described by Bagot et
al. (1988), 37 patients who were in complete remission from acute myeloid
leukaemia (AML) or acute lymphocytic leukaemia (ALL), or in the chronic
phase of chronic myeloid leukaemia (CML), received a non-depleted allogeneic
bone marrow transplant. The age of the bone marrow donor, and whether or
not the donor had previously been pregnant, was recorded, together with
the age of the recipient, their type of leukaemia, an index of mixed epidermal
lymphocyte reactions, and whether or not the recipient developed graft-versus-
host disease. The variables in this data set are as follows:
The data, which were also given in Altman (1991), are presented in Ta-
ble B.4.
Using a Weibull accelerated failure time model, investigate the dependence
of the survival times on the prognostic variables. Estimate and plot the base-
line survivor function. Fit log-logistic and lognormal models that contain the
same explanatory variables, and estimate the baseline survivor function under
these models. Compare these estimates with the estimated baseline survivor
function obtained from fitting a Cox proportional hazards model. Hence com-
ment on which parametric model is the most appropriate, and further examine
the adequacy of this model using model-checking diagnostics.
Table B.4 Survival times of leukaemia patients who received a bone marrow
transplant.
Patient Time Status Rage Dage Type Preg Index Gvhd
1 95 1 27 23 2 0 0.27 0
2 1385 0 13 18 2 0 0.31 0
3 465 1 19 19 1 0 0.39 0
4 810 1 21 22 2 0 0.48 0
5 1497 0 28 38 2 0 0.49 0
6 1181 1 22 20 2 0 0.50 0
7 993 0 19 19 2 0 0.81 0
8 138 1 20 23 2 0 0.82 0
9 266 1 33 36 1 0 0.86 0
10 579 0 18 19 1 0 0.92 0
11 600 0 17 20 2 0 1.10 0
12 1182 0 31 21 3 0 1.52 0
13 841 0 23 38 2 0 1.88 0
14 1364 0 27 15 2 0 2.01 0
15 695 0 26 16 2 0 2.40 0
16 1378 0 28 25 1 0 2.45 0
17 736 0 24 21 1 1 2.60 0
18 1504 0 18 20 2 0 2.64 0
19 849 0 24 25 1 1 3.78 0
20 1266 0 20 24 3 0 4.72 0
21 186 1 23 35 1 1 1.10 1
22 41 1 21 35 2 1 1.16 1
23 667 0 21 23 3 0 1.45 1
24 112 1 33 43 3 0 1.50 1
25 572 0 29 24 3 1 1.85 1
26 45 1 42 35 2 1 2.30 1
27 1019 0 27 31 3 0 2.34 1
28 479 1 43 29 2 1 2.44 1
29 190 1 22 20 1 0 3.70 1
30 100 1 35 39 1 1 3.73 1
31 177 1 16 14 1 0 4.13 1
32 80 1 39 35 2 1 4.52 1
33 142 1 28 25 3 1 4.52 1
34 1105 0 29 32 3 0 4.71 1
35 803 0 23 19 3 0 5.07 1
36 1126 0 33 34 3 0 9.00 1
37 114 1 19 20 1 0 10.11 1
B.5 Chronic granulomatous disease

Chronic granulomatous disease (CGD) is an inherited disorder of the immune system. In a placebo-controlled trial of interferon in patients with this disease, 128 patients were randomised to receive interferon or placebo. Information was recorded on a number of prognostic variables, and the end-point of interest was the time
to the first serious infection following randomisation. The database, described
in Therneau and Grambsch (2000), contains the following variables:
Patient: Patient number (1–128)
Time: Time to first infection in days
Status: Status of patient (0 = censored, 1 = infection)
Centre: Centre (1 = Harvard Medical School
2 = Scripps Institute, California
3 = Copenhagen
4 = National Institutes of Health, Maryland
5 = Los Angeles Children’s Hospital
6 = Mott Children’s Hospital, Michigan
7 = University of Utah
8 = Children’s Hospital of Philadelphia, Pennsylvania
9 = University of Washington
10 = University of Minnesota
11 = University of Zurich
12 = Texas Children’s Hospital
13 = Amsterdam
14 = Mount Sinai Medical Center)
Treat: Treatment group (0 = placebo, 1 = interferon)
Age: Age in years
Sex: Sex (1 = male, 2 = female)
Height: Height in cm
Weight: Weight in kg
Pattern: Pattern of inheritance (1 = X-linked, 2 = autosomal recessive)
Cort: Use of corticosteroids at trial entry (1 = used, 2 = not used)
Anti: Use of antibiotics at trial entry (1 = used, 2 = not used)
Identify a suitable model for the times to first infection, and investigate the
adequacy of the model. Is there any evidence that the treatment effect is not
consistent across the different centres? Summarise the treatment difference in
terms of a relevant point estimate and corresponding confidence interval.
Now suppose that the actual infection times are interval-censored, and
that the infection status of any individual can only be recorded at 90, 180,
270 and 360 days. Construct a new observation from the survival times, that
gives the interval within which an infection, or censoring, occurs. The obser-
vations from individuals who do not experience an infection, or who develop
an infection after 360 days, are regarded as censored. For this constructed
database, analyse the interval-censored data using the methods described in
Chapter 9. Compare the estimate of the treatment effect with that found from
the original data.
Fit parametric and Cox regression models with shared frailty to these data,
where Centre is a random frailty effect. Compare and contrast the two sets of
results and comment on the extent of between centre variability.
Determine the number of patients who develop a first infection in the first
180 days at each centre. By fitting a Cox regression model to the first infection
times, obtain an estimate of the 180 day risk adjusted first infection rate at
each centre, using the methods described in Chapter 11. By treating centres
as either fixed or random, calculate corresponding interval estimates for this
rate, commenting on the extent of any differences.
Investigate the impact of dependent censoring in these data. First, fit a
Weibull model for the probability of censoring, and using this model, obtain
weights that are inversely proportional to censoring. After expressing the in-
fection time data in the counting process format, fit a weighted Cox regression
model to allow for any dependent censoring. Compare the results with those
obtained from an unweighted Cox regression analysis.
Bibliography
Andersen, P.K. (1992) Repeated assessment of risk factors in survival analysis.
Statistical Methods in Medical Research, 1, 297–315.
Andersen, P.K. and Borgan, Ø. (1985) Counting process models for life history
data: A review. Scandinavian Journal of Statistics, 12, 97–158.
Andersen, P.K. and Gill, R.D. (1982) Cox’s regression model for counting
processes: A large sample study. Annals of Statistics, 10, 1100–1120.
Andersen, P.K. and Keiding, N. (2002) Multi-state models for event history
analysis. Statistical Methods in Medical Research, 11, 91–115.
Andersen, P.K. and Perme, M.P. (2010) Pseudo-observations in survival anal-
ysis. Statistical Methods in Medical Research, 19, 71–99.
Andersen, P.K., Borgan, Ø., Gill, R.D. and Keiding, N. (1993) Statistical
Methods Based on Counting Processes, Springer, New York.
Andersen, P.K., Hansen, M.G. and Klein, J.P. (2004) Regression analysis of
restricted mean survival time based on pseudo-observations. Lifetime Data
Analysis, 10, 335–350.
Andrews, D.F. and Herzberg, A.M. (1985) Data, Springer, New York.
Arjas, E. (1988) A graphical method for assessing goodness of fit in Cox’s pro-
portional hazards model. Journal of the American Statistical Association,
83, 204–212.
Armitage, P., Berry, G. and Matthews, J.N.S. (2002) Statistical Methods in
Medical Research, 4th ed., Blackwells Science Ltd, Oxford.
Atkinson, A.C. (1985) Plots, Transformations and Regression, Clarendon
Press, Oxford.
Bagot, M., Mary, J.Y., Heslan, M., Kuentz, M., Cordonnier, C., Vernant, J.P.,
Dubertret, L. and Levy, J.P. (1988) The mixed epidermal cell lymphocyte-
reaction is the most predictive factor of acute graft-versus-host disease in
bone marrow graft recipients. British Journal of Haematology, 70, 403–409.
Barlow, W.E. and Prentice, R.L. (1988) Residuals for relative risk regression.
Biometrika, 75, 65–74.
Barnett, V. (1999) Comparative Statistical Inference, 3rd ed., Wiley, Chich-
ester.
Becker, N.G. and Melbye, M. (1991) Use of a log-linear model to compute
the empirical survival curve from interval-censored data, with application
to data on tests for HIV positivity. Australian Journal of Statistics, 33,
125–133.
Bennett, S. (1983a) Analysis of survival data by the proportional odds model.
Statistics in Medicine, 2, 273–277.
Bennett, S. (1983b) Log-logistic regression models for survival data. Applied
Statistics, 32, 165–171.
Bernstein, D. and Lagakos, S.W. (1978) Sample size and power determination
for stratified clinical trials. Journal of Statistical Computation and Simu-
lation, 8, 65–73.
Beyersmann, J., Allignol, A. and Schumacher, M. (2012) Competing Risks and
Multistate Models with R, Springer, New York.
Box, G.E.P. and Tidwell, P.W. (1962) Transformation of the independent
variables. Technometrics, 4, 531–550.
Box-Steffensmeier, J.M. and Jones, B.S. (2004) Event History Modeling: A
Guide for Social Scientists, Cambridge University Press, Cambridge.
Breslow, N.E. (1972) Contribution to the discussion of a paper by D.R. Cox.
Journal of the Royal Statistical Society, B, 34, 216–217.
Breslow, N.E. (1974) Covariance analysis of censored survival data. Biomet-
rics, 30, 89–100.
Breslow, N.E. and Crowley, J. (1974) A large sample study of the life table
and product limit estimates under random censorship. Annals of Statistics,
2, 437–453.
Breslow, N.E. and Day, N.E. (1987) Statistical Methods in Cancer Research.
2: The Design and Analysis of Cohort Studies, I.A.R.C., Lyon, France.
Brookmeyer, R. and Crowley, J. (1982) A confidence interval for the median
survival time. Biometrics, 38, 29–41.
Broström, G. (2012) Event History Analysis with R, Chapman & Hall/CRC,
Boca Raton, Florida.
Brown, H. and Prescott, R.J. (2000) Applied Mixed Models in Medicine, Wiley,
Chichester.
Burdette, W.J. and Gehan, E.A. (1970) Planning and Analysis of Clinical
Studies, Charles C. Thomas, Springfield, Illinois.
Byar, D.P. (1982) Analysis of survival data: Cox and Weibull models with co-
variates. In: Statistics in Medical Research (eds. V. Mike and K.E. Stanley),
Wiley, New York.
Cain, K.C. and Lange, N.T. (1984) Approximate case influence for the pro-
portional hazards regression model with censored data. Biometrics, 40,
493–499.
Chatfield, C. (1995) Problem Solving: A Statistician's Guide, 2nd ed., Chap-
man & Hall/CRC, London.
Chatfield, C. (2004) The Analysis of Time Series, 6th ed., Chapman &
Hall/CRC, London.
Chhikara, R.S. and Folks, J.L. (1989) The Inverse Gaussian Distribution, Marcel
Dekker, New York.
Choodari-Oskooei, B., Royston, P. and Parmar, M.K.B. (2012) A simulation
study of predictive ability measures in a survival model I: Explained vari-
ation measures. Statistics in Medicine, 31, 2627–2643.
Choodari-Oskooei, B., Royston, P. and Parmar, M.K.B. (2012) A simulation
study of predictive ability measures in a survival model II: Explained ran-
domness and predictive accuracy. Statistics in Medicine, 31, 2644–2659.
Christensen, E. (1987) Multivariate survival analysis using Cox’s regression
model. Hepatology, 7, 1346–1358.
Christensen, E., Schlichting, P., Andersen, P.K., Fauerholdt, L., Schou, G.,
Pedersen, B.V., Juhl, E., Poulsen, H., Tygstrup, N. and Copenhagen Study
Group for Liver Diseases (1986) Updating prognosis and therapeutic ef-
fect evaluation in cirrhosis with Cox’s multiple regression model for time-
dependent variables. Scandinavian Journal of Gastroenterology, 21, 163–
174.
Ciampi, A. and Etezadi-Amoli, J. (1985) A general model for testing the pro-
portional hazards and the accelerated failure time hypotheses in the analy-
sis of censored survival data with covariates. Communications in Statistics,
A, 14, 651–667.
Claeskens, G., Nguti, R. and Janssen, P. (2008) One-sided tests in shared
frailty models. Test, 17, 69–82.
Cleveland, W.S. (1979) Robust locally weighted regression and smoothing
scatterplots. Journal of the American Statistical Association, 74, 829–836.
Cleves, M., Gould, W., Gutierrez, R.G. and Marchenko, Y.V. (2010) An In-
troduction to Survival Analysis Using Stata, 3rd ed., Stata Press.
Cohen, A. and Barnett, O. (1995) Assessing goodness of fit of parametric re-
gression models for lifetime data–graphical methods. Statistics in Medicine,
14, 1785–1795.
Collett, D. (2003) Modelling Binary Data, 2nd ed., Chapman & Hall/CRC,
Boca Raton, Florida.
Commenges, D. (1999) Multi-state models in epidemiology. Lifetime Data
Analysis, 5, 315–327.
Cook, R.D. (1986) Assessment of local influence (with discussion). Journal of
the Royal Statistical Society, B, 48, 133–169.
Cook, R.D. and Weisberg, S. (1982) Residuals and Influence in Regression,
Chapman & Hall/CRC, London.
Cook, R.J. and Lawless, J.F. (2007) The Statistical Analysis of Recurrent
Events, Springer, New York.
Copas, J.B. and Heydari, F. (1997) Estimating the risk of reoffending by
using exponential mixture models. Journal of the Royal Statistical Society,
A, 160, 237–252.
Cox, D.R. (1972) Regression models and life tables (with discussion). Journal
of the Royal Statistical Society, B, 34, 187–220.
Cox, D.R. (1975) Partial likelihood. Biometrika, 62, 269–276.
Cox, D.R. (1979) A note on the graphical analysis of survival data. Biometrika,
66, 188–190.
Cox, D.R. and Hinkley, D. V. (1974) Theoretical Statistics, Chapman &
Hall/CRC, London.
Cox, D.R. and Oakes, D. (1984) Analysis of Survival Data, Chapman &
Hall/CRC, London.
Cox, D.R. and Snell, E.J. (1968) A general definition of residuals (with dis-
cussion). Journal of the Royal Statistical Society, B, 30, 248–275.
Cox, D.R. and Snell, E.J. (1981) Applied Statistics: Principles and Examples,
Chapman & Hall/CRC, London.
Cox, D.R. and Snell, E.J. (1989) Analysis of Binary Data, 2nd ed., Chapman
& Hall/CRC, London.
Crawley, M.J. (2013) The R Book, 2nd ed., John Wiley & Sons Ltd, Chichester.
Crowder, M.J. (2001) Classical Competing Risks, Chapman & Hall/CRC,
Boca Raton, Florida.
Crowder, M.J. (2012) Multivariate Survival Analysis and Competing Risks,
Chapman & Hall/CRC, Boca Raton, Florida.
Crowder, M.J., Kimber, A.C., Smith, R.L. and Sweeting, T.J. (1991) Statis-
tical Analysis of Reliability Data, Chapman & Hall/CRC, London.
Crowley, J. and Hu, M. (1977) Covariance analysis of heart transplant survival
data. Journal of the American Statistical Association, 72, 27–36.
Crowley, J. and Storer, B.E. (1983) Comment on a paper by M. Aitkin et al.
Journal of the American Statistical Association, 78, 277–281.
Dalgaard, P. (2008) Introductory Statistics with R, 2nd ed., Springer, New
York.
David, H.A. and Moeschberger, M.L. (1978) The Theory of Competing Risks,
Lubrecht & Cramer Ltd, New York.
Davis, H. and Feldstein, M. (1979) The generalized Pareto law as a model for
progressively censored survival data. Biometrika, 66, 299–306.
Day, L. (1985) Residual analysis for Cox’s proportional hazards model. In:
Proceedings of STATCOM-MEDSTAT ‘85’, Macquarie University, Sydney.
Demidenko, E. (2013), Mixed Models: Theory and Applications with R, 2nd
ed., John Wiley & Sons, Hoboken, New Jersey.
Der, G. and Everitt, B.S. (2013) Applied Medical Statistics Using SAS, Chap-
man & Hall/CRC, Boca Raton, Florida.
Dinse, G.E. (1991) Constant risk differences in the analysis of animal tumouri-
genicity data. Biometrics, 47, 681–700.
Dobson, A.J. (2001) An Introduction to Generalized Linear Models, 2nd ed.,
Chapman & Hall/CRC, Boca Raton, Florida.
Draper, N.R. and Smith, H. (1998) Applied Regression Analysis, 3rd ed., Wi-
ley, New York.
Duchateau, L. and Janssen, P. (2008) The Frailty Model, Springer, New York.
Durrleman, S. and Simon, R. (1989) Flexible regression models with cubic
splines. Statistics in Medicine, 8, 551–561.
Edmunson, J.H., Fleming, T.R., Decker, D.G., Malkasian, G.D., Jorgen-
son, E.O., Jeffries, J.A., Webb, M.J. and Kvols, L.K. (1979) Different
chemotherapeutic sensitivities and host factors affecting prognosis in ad-
vanced ovarian carcinoma versus minimal residual disease. Cancer Treat-
ment Reports, 63, 241–247.
Efron, B. (1977) The efficiency of Cox’s likelihood function for censored data.
Journal of the American Statistical Association, 72, 557–565.
Efron, B. (1981) Censored data and the bootstrap. Journal of the American
Statistical Association, 76, 312–319.
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004) Least angle
regression. Annals of Statistics, 32, 407–451.
Elashoff, J.D. (1983) Surviving proportional hazards. Hepatology, 3, 1031–
1035.
Emerson, J.D. (1982) Nonparametric confidence intervals for the median in
the presence of right censoring. Biometrics, 38, 17–27.
Escobar, L.A. and Meeker, W.Q. (1988) Using the SAS system to assess lo-
cal influence in regression analysis with censored data. Proceedings of the
Annual SAS User’s Group Conference, 1036–1041.
Escobar, L.A. and Meeker, W.Q. (1992) Assessing influence in regression anal-
ysis with censored data. Biometrics, 48, 507–528.
Everitt, B.S. (1987) Introduction to Optimisation Methods, Chapman &
Hall/CRC, London.
Everitt, B.S. and Rabe-Hesketh, S. (2001) Analyzing Medical Data Using S-
PLUS, Springer, New York.
Farewell, V.T. (1982) The use of mixture models for the analysis of survival
data with long-term survivors. Biometrics, 38, 1041–1046.
Farrington, C.P. (1996) Interval censored survival data: A generalized linear
modelling approach. Statistics in Medicine, 15, 283–292.
Farrington, C.P. (2000) Residuals for proportional hazards models with
interval-censored data. Biometrics, 56, 473–482.
Fine, J.P. and Gray, R.J. (1999) A proportional hazards model for the sub-
distribution of a competing risk. Journal of the American Statistical Asso-
ciation, 94, 496–509.
Finkelstein, D.M. (1986) A proportional hazards model for interval-censored
failure time data. Biometrics, 42, 845–854.
Finkelstein, D.M. and Wolfe, R.A. (1985) A semiparametric model for regres-
sion analysis of interval-censored failure time data. Biometrics, 41, 933–945.
Fisher, L.D. (1992) Discussion of a paper by L.J. Wei. Statistics in Medicine,
11, 1881–1885.
Fleming, T.R. and Harrington, D.P. (2005) Counting Processes and Survival
Analysis, John Wiley & Sons, Hoboken, New Jersey.
Ford, I., Norrie, J. and Ahmadi, S. (1995) Model inconsistency, illustrated by
the Cox proportional hazards model. Statistics in Medicine, 14, 735–746.
Freedman, L.S. (1982) Tables of the number of patients required in clinical
trials using the logrank test. Statistics in Medicine, 1, 121–129.
Friedman, L.M., Furberg, C.D. and DeMets, D.L. (2010) Fundamentals of
Clinical Trials, 4th ed., Springer, New York.
Fyles, A.W., McCready, D.R., Manchul, L.A., Trudeau, M.E., Merante, P.,
Pintilie, M., Weir, L. and Olivotto, I.A. (2004) Tamoxifen with or without
breast irradiation in women 50 years of age or older with early breast cancer.
New England Journal of Medicine, 351, 963–970.
Gail, M.H., Santner, T.J. and Brown, C.C. (1980) An analysis of comparative
carcinogenesis experiments based on multiple times to tumor. Biometrics,
36, 255–266.
Gaver, D.P. and Acar, M. (1979) Analytical hazard representations for use in
reliability, mortality and simulation studies. Communications in Statistics,
8, 91–111.
Geerdens, C., Claeskens, G. and Janssen, P. (2013) Goodness-of-fit tests for
the frailty distribution in proportional hazards models with shared frailty.
Biostatistics, 14, 433–446.
Gehan, E.A. (1969) Estimating survival functions from the life table. Journal
of Chronic Diseases, 21, 629–644.
George, S.L. and Desu, M.M. (1974) Planning the size and duration of a
clinical trial studying the time to some critical event. Journal of Chronic
Diseases, 27, 15–24.
Gill, R.D. (1984) Understanding Cox’s regression model: A martingale ap-
proach. Journal of the American Statistical Association, 79, 441–447.
Gill, R.D. and Schumacher, M. (1987) A simple test of the proportional haz-
ards assumption. Biometrika, 74, 289–300.
Glidden, D.V. (1999) Checking the adequacy of the gamma frailty model for
multivariate failure times. Biometrika, 86, 381–393.
Goeman, J., Meijer, R. and Chaturvedi, N. (2013) L1 (lasso and fused lasso)
and L2 (ridge) penalized estimation in GLMs and in the Cox model. R
package version 0.9-42, URL http://www.msbi.nl/goeman.
Goeman, J.J. (2010) L1 penalized estimation in the Cox proportional hazards
model. Biometrical Journal, 52, 70–84.
Goldstein, H. and Spiegelhalter, D.J. (1996) League tables and their limita-
tions: statistical issues in comparisons of institutional performance. Journal
of the Royal Statistical Society, A, 159, 385–443.
Gönen, M. and Heller, G. (2005) Concordance probability and discriminatory
power in proportional hazards regression. Biometrika, 92, 965–970.
Gore, S.M., Pocock, S.J. and Kerr, G.R. (1984) Regression models and nonpro-
portional hazards in the analysis of breast cancer survival. Applied Statis-
tics, 33, 176–195.
Grambsch, P.M. and Therneau, T.M. (1994) Proportional hazards tests and
diagnostics based on weighted residuals. Biometrika, 81, 515–526.
Gray, R. (1990) Some diagnostic methods for Cox regression models through
hazard smoothing. Biometrics, 46, 93–102.
Gray, R. (2013) Subdistribution analysis of competing risks. R package version
2.2-6, URL http://www.r-project.org.
Gray, R.J. (1988) A class of k-sample tests for comparing the cumulative
incidence of a competing risk. Annals of Statistics, 16, 1141–1154.
Gray, R.J. (1992) Flexible methods for analyzing survival data using splines,
with applications to breast cancer prognosis. Journal of the American Sta-
tistical Association, 87, 942–951.
Greenwood, M. (1926) The errors of sampling of the survivorship tables. Re-
ports on Public Health and Statistical Subjects, number 33, Appendix 1,
HMSO, London.
Grønnesby, J.K. and Borgan, Ø. (1996) A method for checking regression
models in survival analysis based on the risk score. Lifetime Data Analysis,
2, 315–328.
Hall, W.J. and Wellner, J.A. (1980) Confidence bands for a survival curve
from censored data. Biometrika, 67, 133–143.
Hall, W.J., Rogers, W.H. and Pregibon, D. (1982) Outliers matter in sur-
vival analysis. Rand Corporation Technical Report P–6761, Santa Monica,
California.
Hammer, S.M., Katzenstein, D.A., Hughes, M.D., Gundacker, H., Schooley,
R.T., Haubrich, R.H., Henry, W.K., Lederman, M.M., Phair, J.P., Niu,
M., Hirsch, M.S. and Merigan, T.C. (1996) A trial comparing nucleoside
monotherapy with combination therapy in HIV-infected adults with CD4
cell counts from 200 to 500 per cubic millimeter. New England Journal of
Medicine, 335, 1081–1090.
Hand, D.J., Daly, F., Lunn, A.D., McConway, K.J. and Ostrowski, E. (1994)
A Handbook of Small Data Sets, Chapman & Hall/CRC, London.
Harrell, F.E. (2001) Regression Modelling Strategies, with Applications to Lin-
ear Models, Logistic Regression, and Survival Analysis, Springer-Verlag,
New York.
Harrell, F.E., Lee, K.L. and Mark, D.B. (1996) Multivariable prognostic mod-
els: issues in developing models, evaluating assumptions and adequacy, and
measuring and reducing errors. Statistics in Medicine, 15, 361–387.
Harris, E.K. and Albert, A. (1991) Survivorship Analysis for Clinical Studies,
Marcel Dekker, New York.
Hastie, T. and Tibshirani, R. (1990) Generalized Additive Models, Chapman
& Hall/CRC, London.
Heinzl, H. (2000) Using SAS to calculate the Kent and O’Quigley measure
of dependence for Cox proportional hazards regression model. Computer
Methods and Programs in Biomedicine, 63, 71–76.
Henderson, R. and Milner, A. (1991) On residual plots for relative risk regres-
sion. Biometrika, 78, 631–636.
Hielscher, T., Zucknick, M., Werft, W. and Benner, A. (2010) On the prognos-
tic value of survival models with application to gene expression signatures.
Statistics in Medicine, 29, 818–829.
Hinchcliffe, S.R. and Lambert, P.C. (2013) Flexible parametric modelling of
cause-specific hazards to estimate cumulative incidence functions. BioMed
Central Medical Research Methodology, 13, 1–14.
Hinkley, D.V., Reid, N. and Snell, E.J. (1991) Statistical Theory and Mod-
elling, Chapman & Hall/CRC, London.
Hjorth, U. (1980) A reliability distribution with increasing, decreasing and
bathtub-shaped failure rate. Technometrics, 22, 99–107.
Hoel, D.G. (1972) A representation of mortality data by competing risks.
Biometrics, 28, 475–488.
Hollander, M. and Proschan, F. (1979) Testing to determine the underlying
distribution using randomly censored data. Biometrics, 35, 393–401.
Hosmer, D.W. and Lemeshow, S. (2000) Applied Logistic Regression, 2nd ed.,
John Wiley & Sons, Hoboken, New Jersey.
Hosmer, D.W., Lemeshow, S. and May, S. (2008) Applied Survival Analysis:
Regression Modeling of Time to Event Data, 2nd ed., John Wiley & Sons,
Hoboken, New Jersey.
Hougaard, P. (1995) Frailty models for survival data. Lifetime Data Analysis,
1, 255–273.
Hougaard, P. (1999) Multi-state models: A review. Lifetime Data Analysis, 5,
239–264.
Hougaard, P. (2000) Analysis of Multivariate Survival Data, Springer, New
York.
Hougaard, P. and Madsen, E.B. (1985) Dynamic evaluation of short-term
prognosis of myocardial infarction. Statistics in Medicine, 4, 29–38.
Jennison, C. and Turnbull, B.W. (2000) Group Sequential Methods with Ap-
plications to Clinical Trials, Chapman & Hall/CRC, Boca Raton, Florida.
Jeong, J.-H. and Fine, J.P. (2007) Parametric regression on cumulative inci-
dence function. Biostatistics, 8, 184–196.
Johnson, N.L. and Kotz, S. (1994) Distributions in Statistics: Continuous
Univariate Distributions, Volume 1, John Wiley & Sons, Hoboken, New
Jersey.
Johnson, N.L., Kemp, A.W. and Kotz, S. (2005) Distributions in Statistics:
Discrete Distributions, John Wiley & Sons, Hoboken, New Jersey.
Julious, S.A. (2010) Sample Sizes for Clinical Trials, Chapman & Hall/CRC,
Boca Raton, Florida.
Kalbfleisch, J.D. and Prentice, R.L. (1972) Contribution to the discussion of a
paper by D.R. Cox. Journal of the Royal Statistical Society, B, 34, 215–216.
Kalbfleisch, J.D. and Prentice, R.L. (1973) Marginal likelihoods based on
Cox’s regression and life model. Biometrika, 60, 267–278.
Kalbfleisch, J.D. and Prentice, R.L. (2002) The Statistical Analysis of Failure
Time Data, 2nd ed., Wiley, New York.
Kaplan, E.L. and Meier, P. (1958) Nonparametric estimation from incomplete
observations. Journal of the American Statistical Association, 53, 457–481.
Kay, R. (1977) Proportional hazard regression models and the analysis of
censored survival data. Applied Statistics, 26, 227–237.
Kay, R. (1984) Goodness of fit methods for the proportional hazards model.
Revue d'Épidémiologie et de Santé Publique, 32, 185–198.
Keiding, N., Klein, J.P. and Horowitz, M.M. (2001) Multistate models and
outcome prediction in bone marrow transplantation. Statistics in Medicine,
20, 1871–1885.
Kelly, P.J. and Lim, L.L.-Y. (2000) Survival analysis for recurrent event data:
An application to childhood infectious diseases. Statistics in Medicine, 19,
13–33.
Kent, J.T. and O’Quigley, J. (1988) Measures of dependence for censored
survival data. Biometrika, 75, 525–534.
Kirk, A.P., Jain, S., Pocock, S., Thomas, H.C. and Sherlock, S. (1980) Late re-
sults of the Royal Free Hospital prospective controlled trial of prednisolone
therapy in hepatitis B surface antigen negative chronic active hepatitis.
Gut, 21, 78–83.
Klein, J.P. (1991) Small-sample moments of some estimators of the variance
of the Kaplan-Meier and Nelson-Aalen estimators. Scandinavian Journal
of Statistics, 18, 333–340.
Klein, J.P. (1992) Semiparametric estimation of random effects using the Cox
model based on the EM algorithm. Biometrics, 48, 795–806.
Klein, J.P. and Moeschberger, M.L. (1988) Bounds on net survival probabili-
ties for dependent competing risks. Biometrics, 44, 529–538.
Klein, J.P. and Moeschberger, M.L. (2005) Survival Analysis: Techniques for
Censored and Truncated Data, 2nd ed., Springer, New York.
Klein, J.P. and Shu, Y. (2002) Multi-state models for bone marrow transplan-
tation studies. Statistical Methods in Medical Research, 11, 117–139.
Klein, J.P., Gerster, M., Andersen, P.K., Tarima, S. and Perme, M.P. (2008)
SAS and R functions to compute pseudo-values for censored data regression.
Computer Methods and Programs in Biomedicine, 89, 289–300.
Kleinbaum, D.G. and Klein, J.P. (2012) Survival Analysis: A Self-Learning
Text, 3rd ed., Springer, New York.
Kodell, R.L. and Nelson, C.J. (1980) An illness-death model for the study
of the carcinogenic process using survival/sacrifice data. Biometrics, 36,
267–277.
Kohl, M. and Heinze, G. (2012) PSHREG: A SAS macro for proportional
and non proportional subdistribution hazards regression with competing risk
data, Technical Report 08/2012, Section for Clinical Biometrics, Medical
University of Vienna.
Krall, J.M., Uthoff, V.A. and Harley, J.B. (1975) A step-up procedure for
selecting variables associated with survival. Biometrics, 31, 49–57.
Kuk, A.Y.C. and Chen, C. (1992) A mixture model combining logistic regres-
sion with proportional hazards regression. Biometrika, 79, 531–541.
Lachin, J.M. (1981) Introduction to sample size determination and power
analysis for clinical trials. Controlled Clinical Trials, 2, 93–113.
Lachin, J.M. and Foulkes, M.A. (1986) Evaluation of sample size and power
for analyses of survival with allowance for nonuniform patient entry, losses
to follow-up, noncompliance, and stratification. Biometrics, 42, 507–519.
Lagakos, S.W. (1981) The graphical evaluation of explanatory variables in
proportional hazards models. Biometrika, 68, 93–98.
Lakatos, E. (1988) Sample sizes based on the log-rank statistic in complex
clinical trials. Biometrics, 44, 229–241.
Lakatos, E. and Lan, K.K.G. (1992) A comparison of sample size methods for
the log-rank statistic. Statistics in Medicine, 11, 179–191.
Lambert, P., Collett, D., Kimber, A. and Johnson, R. (2004) Parametric ac-
celerated failure time models with random effects and an application to
kidney transplant survival. Statistics in Medicine, 23, 3177–3192.
Latouche, A., Beyersmann, J. and Fine, J.P. (2007) Letter to the editor: Com-
ments on ‘Analysing and interpreting competing risk data’ by M. Pintilie.
Statistics in Medicine, 26, 3676–3680.
Latouche, A., Porcher, R. and Chevret, S. (2004) Sample size formula for
proportional hazards modelling of competing risks. Statistics in Medicine,
23, 3263–3274.
Lawless, J.F. (2002) Statistical Models and Methods for Lifetime Data, 2nd
ed., Wiley, New York.
Leathem, A.J. and Brooks, S.A. (1987) Predictive value of lectin binding on
breast-cancer recurrence and survival. The Lancet, 329, 1054–1056.
Lee, E.T. and Wang, J.W. (2013) Statistical Methods for Survival Data Anal-
ysis, 4th ed., Wiley, New York.
Lin, D.Y. (1997) Non-parametric inference for cumulative incidence functions
in competing risks studies. Statistics in Medicine, 16, 901–910.
Lin, D.Y. and Wei, L.J. (1989) The robust inference for the Cox proportional
hazards model. Journal of the American Statistical Association, 84, 1074–
1078.
Lin, D.Y. and Wei, L.J. (1991) Goodness-of-fit tests for the general Cox re-
gression model. Statistica Sinica, 1, 1–17.
Lindley, D.V. and Scott, W.F. (1984) New Cambridge Elementary Statistical
Tables, Cambridge University Press, Cambridge.
Lindsey, J.C. and Ryan, L.M. (1993) A three state multiplicative model for
rodent tumourigenicity experiments. Applied Statistics, 42, 283–300.
Lindsey, J.C. and Ryan, L.M. (1998) Methods for interval-censored data.
Statistics in Medicine, 17, 219–238.
Lindsey, J.K. (1998) A study of interval censoring in parametric regression
models. Lifetime Data Analysis, 4, 329–354.
Machin, D., Campbell, M.J., Tan, S.B. and Tan, S.H. (2009) Sample Sizes for
Clinical Trials, 3rd ed., John Wiley & Sons Ltd, Chichester.
Machin, D., Cheung, Y.B. and Parmar, M.K.B. (2006) Survival Analysis: A
Practical Approach, Wiley, New York.
Maller, R.A. and Zhao, X. (2002) Analysis of parametric models for competing
risks. Statistica Sinica, 12, 725–750.
Mantel, N. (1966) Evaluation of survival data and two new rank order statis-
tics arising in its consideration. Cancer Chemotherapy Reports, 50, 163–
170.
Mantel, N. and Haenszel, W. (1959) Statistical aspects of the analysis of data
from retrospective studies of disease. Journal of the National Cancer Insti-
tute, 22, 719–748.
Marubini, E. and Valsecchi, M.G. (1995) Analysing Survival Data from Clin-
ical Trials and Observational Studies, Wiley, New York.
Matthews, J.N.S. (2006) Introduction to Randomized Controlled Clinical Tri-
als, 2nd ed., Chapman & Hall/CRC, Boca Raton, Florida.
May, S. and Hosmer, D.W. (1998) A simplified method of calculating an overall
goodness-of-fit test for the Cox proportional hazards model. Lifetime Data
Analysis, 4, 109–120.
McCrink, L.M., Marshall, A.H. and Cairns, K.J. (2013) Advances in joint
modelling: a review of recent developments with application to the survival
of end stage renal disease patients. International Statistical Review, 81,
249–269.
McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models, 2nd ed.,
Chapman & Hall/CRC, London.
McCulloch, C.E., Searle, S.R. and Neuhaus, J.M. (2008) Generalized, Linear,
and Mixed Models, 2nd ed., John Wiley & Sons, Hoboken, New Jersey.
McGilchrist, C.A. and Aisbett, C.W. (1991) Regression with frailty in survival
analysis. Biometrics, 47, 461–466.
McKnight, B. and Crowley, J. (1984) Tests for differences in tumour incidence
based on animal carcinogenesis experiments. Journal of the American Sta-
tistical Association, 79, 639–648.
Meier, P. (1975) Estimation of a distribution function from incomplete obser-
vations. In: Perspectives in Probability and Statistics (ed. J. Gani), Aca-
demic Press, London, pp. 67–87.
Meira-Machado, L., de Uña-Alvarez, J., Cadarso-Suárez, C. and Andersen,
P.K. (2009) Multi-state models for the analysis of time-to-event data. Sta-
tistical Methods in Medical Research, 18, 195–222.
Metcalfe, C.R. and Thompson, S.G. (2007) Wei, Lin and Weissfeld’s marginal
analysis of multivariate failure time data: should it be applied to a recurrent
events outcome? Statistical Methods in Medical Research, 16, 103–122.
Miller, A.J. (2002) Subset Selection in Regression, 2nd ed., Chapman &
Hall/CRC, Boca Raton, Florida.
Moeschberger, M.L. and Klein, J.P. (1995) Statistical methods for dependent
competing risks. Lifetime Data Analysis, 1, 195–204.
Montgomery, D.C., Peck, E.A. and Vining, G. (2012) Introduction to Linear
Regression Analysis, 5th ed., Wiley, New York.
Moreau, T., O’Quigley, J. and Mesbah, M. (1985) A global goodness-of-fit
statistic for the proportional hazards model. Applied Statistics, 34, 212–
218.
Morgan, B.J.T. (1992) Analysis of Quantal Response Data, Chapman &
Hall/CRC, London.
Nagelkerke, N.J.D., Oosting, J. and Hart, A.A.M. (1984) A simple test for
goodness of fit of Cox’s proportional hazards model. Biometrics, 40, 483–
486.
Nair, V.N. (1984) Confidence bands for survival functions with censored data:
a comparative study. Technometrics, 26, 265–275.
Nardi, A. and Schemper, M. (1999) New residuals for Cox regression and their
application to outlier screening. Biometrics, 55, 523–529.
Nelder, J.A. (1977) A reformulation of linear models (with discussion). Journal
of the Royal Statistical Society, A, 140, 48–77.
Nelson, K.P., Lipsitz, S.R., Fitzmaurice, G.M., Ibrahim, J., Parzen, M. and
Strawderman, R. (2006) Use of the probability integral transformation to
fit nonlinear mixed-effects models with nonnormal random effects. Journal
of Computational and Graphical Statistics, 15, 39–57.
Nelson, W. (1972) Theory and applications of hazard plotting for censored
failure data. Technometrics, 14, 945–965.
Neuberger, J., Altman, D.G., Christensen, E., Tygstrup, N. and Williams, R.
(1986) Use of a prognostic index in evaluation of liver transplantation for
primary biliary cirrhosis. Transplantation, 4, 713–716.
Nieto, F.J. and Coresh, J. (1996) Adjusting survival curves for confounders: A
review and a new method. American Journal of Epidemiology, 143, 1059–
1068.
O’Quigley, J. and Pessione, F. (1989) Score tests for homogeneity of regression
effect in the proportional hazards model. Biometrics, 45, 135–144.
Ohlssen, D., Sharples, L.D. and Spiegelhalter, D.J. (2007) A hierarchical
modelling framework for identifying unusual performance in health care
providers. Journal of the Royal Statistical Society, A, 170, 865–890.
Pan, W. (2000) A two-sample test with interval censored data via multiple
imputation. Statistics in Medicine, 19, 1–11.
Peng, Y. and Dear, K.B.G. (2000) A nonparametric mixture model for cure
rate estimation. Biometrics, 56, 237–243.
Pepe, M.S. and Mori, M. (1993) Kaplan-Meier, marginal or conditional prob-
ability curves in summarizing competing risks failure time data? Statistics
in Medicine, 12, 737–751.
Petersen, T. (1986) Fitting parametric survival models with time-dependent
covariates. Applied Statistics, 35, 281–288.
Peterson, A.V. (1976) Bounds for a joint distribution function with fixed sub-
distribution functions: application to competing risks. Proceedings of the
National Academy of Sciences, 73, 11–13.
Peto, R. (1972) Contribution to the discussion of a paper by D.R. Cox. Journal
of the Royal Statistical Society, B, 34, 205–207.
Peto, R. and Peto, J. (1972) Asymptotically efficient rank invariant proce-
dures. Journal of the Royal Statistical Society, A, 135, 185–207.
Peto, R., Pike, M.C., Armitage, P., Breslow, N.E., Cox, D.R., Howard, S.V.,
Mantel, N., McPherson, K., Peto, J. and Smith, P.G. (1977) Design and
analysis of randomized clinical trials requiring prolonged observation of
each patient. II. Analysis and examples. British Journal of Cancer, 35,
1–39.
Pettitt, A.N. and Bin Daud, I. (1989) Case-weighted measures of influence for
proportional hazards regression. Applied Statistics, 38, 51–67.
Pettitt, A.N. and Bin Daud, I. (1990) Investigating time dependence in Cox’s
proportional hazards model. Applied Statistics, 39, 313–329.
Pintilie, M. (2006) Competing Risks: A Practical Perspective, John Wiley &
Sons, Chichester.
Pintilie, M. (2007a) Analysing and interpreting competing risk data. Statistics
in Medicine, 26, 1360–1367.
Pintilie, M. (2007b) Author's reply to letter to the editor from A. Latouche,
J. Beyersmann and J.P. Fine. Statistics in Medicine, 26, 3679–3680.
Pocock, S.J. (1983) Clinical Trials: A Practical Approach, Wiley, Chichester.
Pollard, A.H., Yusuf, F. and Pollard, G.N. (1990) Demographic Techniques,
3rd ed., Pergamon Press, Sydney.
Prentice, R.L. and Gloeckler, L.A. (1978) Regression analysis of grouped sur-
vival data with application to breast cancer data. Biometrics, 34, 57–67.
Prentice, R.L. and Kalbfleisch, J.D. (1979) Hazard rate models with covari-
ates. Biometrics, 35, 25–39.
Prentice, R.L., Williams, B.J. and Peterson, A.V. (1981) On the regression
analysis of multivariate failure time data. Biometrika, 68, 373–379.
Putter, H. (2011) Special issue about competing risks and multi-state models.
Journal of Statistical Software, 38, 1–4.
Putter, H., Fiocco, M. and Geskus, R.B. (2007) Tutorial in biostatistics: Com-
peting risks and multi-state models. Statistics in Medicine, 26, 2389–2430.
Quantin, C., Moreau, T., Asselain, B., Maccario, J. and Lellouch, J. (1996) A
regression survival model for testing the proportional hazards hypothesis.
Biometrics, 52, 874–885.
R Core Team (2013) R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria.
URL http://www.R-project.org/.
Rabe-Hesketh, S. and Everitt, B.S. (2007) A Handbook of Statistical Analyses
Using Stata, 4th ed., Chapman & Hall/CRC, Boca Raton, Florida.
Ramlau-Hansen, H. (1983) Smoothing counting process intensities by means
of kernel functions. Annals of Statistics, 11, 453–466.
Rancel, M.M.S. and Sierra, M.A.G. (2001) Regression diagnostic using local
influence: A review. Communications in Statistics–Theory and Methods,
30, 799–813.
Reid, N. and Crépeau, H. (1985) Influence functions for proportional hazards
regression. Biometrika, 72, 1–9.
Ripatti, S. and Palmgren, J. (2000) Estimation of multivariate frailty models
using penalised partial likelihood. Biometrics, 56, 1016–1022.
Rizopoulos, D. (2012) Joint Models for Longitudinal and Time-to-Event Data:
with Applications in R, Chapman & Hall/CRC, Boca Raton, Florida.
Robins, J.M. (1993) Information recovery and bias adjustment in proportional
hazards regression analysis of randomized trials using surrogate markers.
In Proceedings of the Biopharmaceutical Section, American Statistical As-
sociation, Virginia, 24–33.
Robins, J.M. and Finkelstein, D.M. (2000) Correcting for non-compliance and
dependent censoring in an AIDS clinical trial with inverse probability of
censoring weighted (IPCW) log rank tests. Biometrics, 56, 779–788.
Robins, J.M. and Rotnitzky, A. (1992) Recovery of information and ad-
justment for dependent censoring using surrogate markers. In AIDS
Epidemiology–Methodological Issues (eds. N. Jewell, K. Dietz and V.T.
Farewell), Birkhäuser, Boston, 297–331.
Royston, P. (2006) Explained variation for survival models. The Stata Journal,
6, 83–96.
Royston, P. and Altman, D.G. (1994) Regression using fractional polynomials
of continuous covariates: parsimonious parametric modelling (with discus-
sion). Applied Statistics, 43, 429–467.
Royston, P. and Lambert, P.C. (2011) Flexible Parametric Survival Analysis
Using Stata: Beyond the Cox Model, Stata Press, Texas.
Royston, P. and Parmar, M.K.B. (2002) Flexible proportional-hazards and
proportional-odds models for censored survival data, with application to
prognostic modelling and estimation of treatment effects. Statistics in
Medicine, 21, 2175–2197.
Royston, P. and Parmar, M.K.B. (2011) The use of restricted mean survival
time to estimate the treatment effect in randomized clinical trials when the
proportional hazards assumption is in doubt. Statistics in Medicine, 30,
2409–2421.
Royston, P. and Sauerbrei, W. (2004) A new measure of prognostic separation
in survival data. Statistics in Medicine, 23, 723–748.
Rubinstein, L.V., Gail, M.H. and Santner, T.J. (1981) Planning the dura-
tion of a comparative clinical trial with loss to follow-up and a period of
continued observation. Journal of Chronic Diseases, 34, 469–479.
Satten, G.A., Datta, S. and Robins, J.M. (2001) Estimating the marginal
survival function in the presence of time dependent covariates. Statistics
and Probability Letters, 54, 397–403.
Sauerbrei, W. and Royston, P. (1999) Building multivariate prognostic and
diagnostic models: transformations of the predictors by using fractional
polynomials. Journal of the Royal Statistical Society, A, 162, 71–94.
Scharfstein, D.O. and Robins, J.M. (2002) Estimation of the failure time
distribution in the presence of informative censoring. Biometrika, 89, 617–
634.
Schemper, M. (1992) Cox analysis of survival data with non-proportional haz-
ard functions. The Statistician, 41, 455–465.
Schemper, M. and Stare, J. (1996) Explained variation in survival analysis.
Statistics in Medicine, 15, 1999–2012.
Schemper, M., Wakounig, S. and Heinze, G. (2009) The estimation of average
hazard ratios by weighted Cox regression. Statistics in Medicine, 28, 2473–
2489.
Schluchter, M.D. (1992) Methods for the analysis of informatively censored
longitudinal data. Statistics in Medicine, 11, 1861–1870.
Schoenfeld, D.A. (1980) Chi-squared goodness of fit tests for the proportional
hazards regression model. Biometrika, 67, 145–153.
Schoenfeld, D.A. (1981) The asymptotic properties of comparative tests for
comparing survival distributions. Biometrika, 68, 316–319.
Schoenfeld, D.A. (1982) Partial residuals for the proportional hazards regres-
sion model. Biometrika, 69, 239–241.
Schoenfeld, D.A. (1983) Sample-size formula for the proportional-hazards re-
gression model. Biometrics, 39, 499–503.
Schoenfeld, D.A. and Richter, J.R. (1982) Nomograms for calculating the
number of patients needed for a clinical trial with survival as an endpoint.
Biometrics, 38, 163–170.
Schumacher, M., Bastert, G., Bojar, H., Hübner, K., Olschewski, M., Sauer-
brei, W., Schmoor, C., Beyerle, C., Neumann, R.L.A. and Rauschecker,
H.F. (1994) Randomized 2 × 2 trial evaluating hormonal treatment and the
duration of chemotherapy in node-positive breast cancer patients. Journal
of Clinical Oncology, 12, 2086–2093.
Searle, S.R., Casella, G. and McCulloch, C.E. (2006) Variance Components,
John Wiley & Sons, Hoboken, New Jersey.
Sellke, T. and Siegmund, D. (1983) Sequential analysis of the proportional
hazards model. Biometrika, 70, 315–326.
Shih, J.H. (1995) Sample size calculation for complex clinical trials with sur-
vival endpoints. Controlled Clinical Trials, 16, 395–407.
Shih, J.H. (1998) A goodness-of-fit test for association in a bivariate survival
model. Biometrika, 85, 189–200.
Shih, J.H. and Louis, T.A. (1995) Inferences on the association parameter in
copula models for bivariate survival data. Biometrics, 51, 1384–1399.
Siannis, F. (2004) Applications of a parametric model for informative censor-
ing. Biometrics, 60, 704–714.
Siannis, F. (2011) Sensitivity analysis for multiple right censoring processes:
Investigating mortality in psoriatic arthritis. Statistics in Medicine, 30,
356–367.
Siannis, F., Copas, J. and Lu, G. (2005) Sensitivity analysis for informative
censoring in parametric survival models. Biostatistics, 6, 77–91.
Simon, R. (1986) Confidence intervals for reporting results of clinical trials.
Annals of Internal Medicine, 105, 429–435.
Simon, R. and Lee, Y.J. (1982) Nonparametric confidence limits for survival
probabilities and median survival time. Cancer Treatment Reports, 66, 37–
42.
Slud, E.V. and Rubinstein, L.V. (1983) Dependent competing risks and sum-
mary survival curves. Biometrika, 70, 643–649.
Slud, E.V., Byar, D.P. and Green, S.B. (1984) A comparison of reflected ver-
sus test-based confidence intervals for the median survival time based on
censored data. Biometrics, 40, 587–600.
Spiegelhalter, D.J. (2005) Funnel plots for comparing institutional perfor-
mance. Statistics in Medicine, 24, 1185–1202.
Spiegelhalter, D.J., Aylin, P., Best, N.G., Evans, S.J.W. and Murray, G.D.
(2002) Commissioned analysis of surgical performance using routine data:
lessons from the Bristol inquiry (with discussion). Journal of the Royal
Statistical Society, A, 165, 191–231.
Spiegelhalter, D.J., Sherlaw-Johnson, C., Bardsley, M., Blunt, I., Wood, C.
and Grigg, O. (2012) Statistical methods for healthcare regulation: rating,
screening and surveillance (with discussion). Journal of the Royal Statistical
Society, A, 175, 1–47.
Stablein, D.M. and Koutrouvelis, I.A. (1985) A two-sample test sensitive to
crossing hazards in uncensored and singly censored data. Biometrics, 41,
643–652.
Stablein, D.M., Carter, W.H., Jr. and Novak, J.W. (1981) Analysis of survival
data with nonproportional hazard functions. Controlled Clinical Trials, 2,
148–159.
Stare, J., Perme, M.P. and Henderson, R. (2011) A measure of explained
variation for event history data. Biometrics, 67, 750–759.
Storer, B.E. and Crowley, J. (1985) A diagnostic for Cox regression and general
conditional likelihoods. Journal of the American Statistical Association, 80,
139–147.
Stroup, W. (2013) Generalized Linear Mixed Models: Modern Concepts, Meth-
ods and Applications, Chapman & Hall/CRC, Boca Raton, Florida.
Sun, J., Ono, Y. and Takeuchi, Y. (1996) A simple method for calculating the
exact confidence interval of the standardized mortality ratio with an SAS
function. Journal of Occupational Health, 38, 196–197.
Sun, L., Liu, J., Sun, J. and Zhang, M-J. (2006) Modeling the subdistribution
of a competing risk. Statistica Sinica, 16, 1367–1385.
Sy, J.P. and Taylor, J.M.G. (2000) Estimation in a Cox proportional hazards
cure model. Biometrics, 56, 227–236.
Tableman, M. and Kim, J.S. (2004) Survival Analysis Using S: Analysis of
Time-to-Event Data, Chapman & Hall/CRC, Boca Raton, Florida.
Tai, B-C., Machin, D., White, I. and Gebski, V. (2001) Competing risks
analysis of patients with osteosarcoma: A comparison of four different ap-
proaches. Statistics in Medicine, 20, 661–684.
Taylor, J.M.G. (1995) Semi-parametric estimation in failure time mixture
models. Biometrics, 51, 899–907.
Therneau, T. (2014) A package for survival analysis in S. R package version
2.37-7, URL http://CRAN.R-project.org/package=survival.
Therneau, T.M. (1986) The COXREGR Procedure. In: SAS SUGI Supple-
mental Library User’s Guide, version 5 ed., SAS Institute Inc., Cary, North
Carolina.
Therneau, T.M. and Grambsch, P.M. (2000) Modelling Survival Data: Extend-
ing the Cox Model, Springer, New York.
Therneau, T.M., Grambsch, P.M. and Fleming, T.R. (1990) Martingale-based
residuals for survival models. Biometrika, 77, 147–160.
Therneau, T.M., Grambsch, P.M. and Pankratz, V.S. (2003) Penalized sur-
vival models and frailty. Journal of Computational and Graphical Statistics,
12, 156–175.
Thisted, R.A. (1988) Elements of Statistical Computing, Chapman &
Hall/CRC, London.
Thomas, N., Longford, N.T. and Rolph, J.E. (1994) Empirical Bayes methods
for estimating hospital-specific mortality rates. Statistics in Medicine, 13,
889–903.
Thompson, R. (1981) Survival data and GLIM. Letter to the editor of Applied
Statistics, 30, 310.
Thomsen, B.L., Keiding, N. and Altman, D.G. (1991) A note on the calcu-
lation of expected survival, illustrated by the survival of liver transplant
patients. Statistics in Medicine, 10, 733–738.
Tibshirani, R. (1982) A plain man’s guide to the proportional hazards model.
Clinical and Investigative Medicine, 5, 63–68.
Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. Journal
of the Royal Statistical Society, B, 58, 267–288.
Tibshirani, R. (1997) The lasso method for variable selection in the Cox model.
Statistics in Medicine, 16, 385–395.
Tseng, Y.K., Hsieh, F. and Wang, J.-L. (2005) Joint modeling of accelerated
failure time and longitudinal data. Biometrika, 92, 587–603.
Tsiatis, A.A. (1975) A nonidentifiability aspect of the problem of competing
risks. Proceedings of the National Academy of Sciences, 72, 20–22.
Tsiatis, A.A. and Davidian, M. (2004) Joint modeling of longitudinal and
time-to-event data: An overview. Statistica Sinica, 14, 809–834.
Venables, W.N. and Ripley, B.D. (2002) Modern Applied Statistics with S, 4th
ed., Springer, New York.
Venables, W.N. and Smith, D.M. (2009) An Introduction to R, 2nd ed., Net-
work Theory Limited, Bristol.
Verweij, P.J.M., van Houwelingen, H.C. and Stijnen, T. (1998) A goodness-of-
fit test for Cox’s proportional hazards model based on martingale residuals.
Biometrics, 54, 1517–1526.
Volinsky, C.T. and Raftery, A.E. (2000) Bayesian information criterion for
censored survival models. Biometrics, 56, 256–262.
WHO Special Programme of Research, Development and Research Training
in Human Reproduction (1987) Vaginal bleeding patterns–The problem
and an example data set. Applied Stochastic Models and Data Analysis, 3,
27–35.
Wei, L.J. (1992) The accelerated failure time model: a useful alternative to
the Cox regression model in survival analysis. Statistics in Medicine, 11,
1871–1879.
Wei, L.J. and Glidden, D.V. (1997) An overview of statistical methods for
multiple failure time data in clinical trials. Statistics in Medicine, 16, 833–
839.
Wei, L.J., Lin, D.Y. and Weissfeld, L. (1989) Regression-analysis of multi-
variate incomplete failure time data by modeling marginal distributions.
Journal of the American Statistical Association, 84, 1065–1073.
Weissfeld, L.A. (1990) Influence diagnostics for the proportional hazards
model. Statistics and Probability Letters, 10, 411–417.
Weissfeld, L.A. and Schneider, H. (1990) Influence diagnostics for the Weibull
model fit to censored data. Statistics and Probability Letters, 9, 67–73.
Weissfeld, L.A. and Schneider, H. (1994) Residual analysis for parametric
models fit to censored data. Communications in Statistics–Theory and
Methods, 23, 2283–2297.
West, B.T., Welch, K.B. and Galecki, A.T. (2007) Linear Mixed Models: A
Practical Guide Using Statistical Software, Chapman & Hall/CRC, Boca
Raton, Florida.
White, I.R., Royston, P. and Wood, A.M. (2011) Multiple imputation using
chained equations: Issues and guidance for practice. Statistics in Medicine,
30, 377–399.
Whitehead, J.R. (1989) The analysis of relapse clinical trials, with application
to a comparison of two ulcer treatments. Statistics in Medicine, 8, 1439–
1454.
Whitehead, J.R. (1997) The Design and Analysis of Sequential Clinical Trials,
2nd ed., Wiley, Chichester.
Wienke, A. (2011) Frailty Models in Survival Analysis, Chapman & Hall/CRC,
Boca Raton, Florida.
Wolbers, M. and Koller, M.T. (2007) Letter to the editor: Comments on
‘Analysing and interpreting competing risk data’ by M. Pintilie (original
article and author’s reply). Statistics in Medicine, 26, 3521–3523.
Wolbers, M., Koller, M.T., Witteman, J.C.M. and Steyerberg, E.W. (2009)
Prognostic models with competing risks: Methods and application to coro-
nary risk prediction. Epidemiology, 20, 555–561.
Woodward, M. (2014) Epidemiology: Study Design and Data Analysis, 3rd
ed., Chapman & Hall/CRC, Boca Raton, Florida.
Wu, M.C. and Carroll, R.J. (1988) Estimation and comparison of changes
in the presence of informative right censoring by modelling the censoring
process. Biometrics, 44, 175–188.
Yang, S. and Prentice, R.L. (1999) Semiparametric inference in the propor-
tional odds regression model. Journal of the American Statistical Associa-
tion, 94, 124–136.
Yuan, M. and Lin, Y. (2006) Model selection and estimation in regression with
grouped variables. Journal of the Royal Statistical Society, B, 68, 49–67.
Zahl, P.H. (1997) Letter to the editor: Comments on ‘Adjusting survival curves
for confounders: A review and a new method’ by F.J. Nieto and J. Coresh.
American Journal of Epidemiology, 146, 605.
Zhang, X., Loberiza, F.R., Klein, J.P. and Zhang, M.-J. (2007) A SAS macro
for estimation of direct adjusted survival curves based on a stratified Cox
regression model. Computer Methods and Programs in Biomedicine, 88,
95–101.