Whether They Have A Natural Order - Whether They Are Recurrences of The Same Types of Events
The counting process approach to survival analysis
A general approach to survival analysis was introduced by
Andersen & Gill (1982) where each subject is considered as a
counting process (counting events):
• $N_i^{(k)}(t)$ is the total number of events of type k experienced by
subject i up to time t
• $Y_i^{(k)}(t)$ is an at-risk indicator, with $Y_i^{(k)}(t) = 1$ if subject i
is at risk at time t for an event of type k, and 0 otherwise
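As a small illustration of these two processes, consider a hypothetical subject (not taken from the data used later) who is followed for 18 months and has events of a single type at months 12 and 16. Assuming the subject stays at risk after each event, the two processes would look like this:

```latex
% Hypothetical subject: events at t = 12 and t = 16, follow-up ends at t = 18.
% A single event type is assumed, so the superscript (k) is dropped.
\[
N_i(t) =
\begin{cases}
0, & 0 \le t < 12 \\
1, & 12 \le t < 16 \\
2, & 16 \le t \le 18
\end{cases}
\qquad
Y_i(t) =
\begin{cases}
1, & 0 < t \le 18 \\
0, & t > 18
\end{cases}
\]
```

Changing when $Y_i(t)$ switches on and off is what distinguishes the different risk-set definitions.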
By judicious choice of the components of the process defined above,
the counting process approach can handle all kinds of survival data,
including multiple failures per subject, whether unordered or ordered.
Unordered failures
Failures of the same type include, for example, repeated lung
infections with pseudomonas in children with cystic fibrosis, or the
development of breast cancer in genetically predisposed families.
Ordered failures
Ordered events may result from a study that records the time to
first myocardial infarction (MI), second MI, and so on. These are
ordered events in the sense that the second event cannot occur
before the first event. Unordered events, on the other hand, can
occur in any sequence. For example, in a study of liver disease
patients, a panel of seven liver function laboratory tests can
become abnormal in a specific order for one patient and in a
different order for another patient. The order in which the tests
become abnormal (fail) is random.
Two main approaches to modeling these data have gained
popularity over the last few years:
• Variance-corrected models.
In this approach the dependencies between failure times are not
included in the models. Instead, the covariance matrix of the
estimators is adjusted to account for the additional correlation.
These models are easily estimated in Stata.
Brief mathematical detail and definitions
Let $T_i^{(k)}$ and $U_i^{(k)}$ be the failure and censoring times of the kth
failure type ($k = 1, \dots, K$) in the ith subject ($i = 1, \dots, m$), and
let $Z_i^{(k)}$ be a p-vector of possibly time-dependent covariates for the
ith subject with respect to the kth failure type.
Assume that $T_i^{(k)}$ and $U_i^{(k)}$ are independent, conditional on the
covariate vector $Z_i^{(k)}$.
Define $X_i^{(k)} = \min(T_i^{(k)}, U_i^{(k)})$ and
$\delta_i^{(k)} = I(T_i^{(k)} \le U_i^{(k)})$, where $I(\cdot)$ is the
indicator function, and let $\beta$ be a p-vector of unknown
regression coefficients. Under the proportional hazards assumption,
the hazard function of the ith subject for the kth failure type is
$$\lambda^{(k)}(t; Z_i^{(k)}) = \lambda_0(t)\, e^{Z_i^{(k)} \beta}$$
Maximum likelihood estimates of β for the above models are obtained
from Cox’s partial likelihood function, L(β), assuming
independence of failure times. The estimator β̂ has been shown to
be consistent for β and asymptotically normal as long as the
marginal models are correctly specified (Lin 1994).
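For reference, the partial likelihood for a single failure type k can be sketched as below. Several things here are assumptions rather than statements from the text: there are no tied failure times, the at-risk indicator is taken to be $Y_j^{(k)}(t) = I(X_j^{(k)} \ge t)$, $\delta_i^{(k)}$ denotes the event indicator $I(T_i^{(k)} \le U_i^{(k)})$, and time-dependent covariates are evaluated at the failure time.

```latex
% Sketch of the Cox partial likelihood for failure type k (no ties assumed).
% Y_j^{(k)}(t) = I(X_j^{(k)} >= t) is an assumed form of the at-risk indicator.
\[
L_k(\beta) \;=\; \prod_{i=1}^{m}
\left[
\frac{\exp\!\left( Z_i^{(k)} \beta \right)}
     {\sum_{j=1}^{m} Y_j^{(k)}\!\left(X_i^{(k)}\right) \exp\!\left( Z_j^{(k)} \beta \right)}
\right]^{\delta_i^{(k)}}
\]
```

If the failure types are treated as separate strata, as in the stratified fits later in these notes, the overall partial likelihood would be the product of the $L_k(\beta)$ over k.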
Sandwich estimators
Lin and Wei (1989) proposed a modification to the naive variance
estimate that is appropriate when the Cox model is misspecified. The
resulting robust variance-covariance matrix is estimated as
$$V = I^{-1} U' U I^{-1} = D' D$$
Sandwich estimators with clustered survival data
When observations are not independent but can be divided into m
independent groups ($G_1, G_2, \dots, G_m$), the robust covariance
matrix takes the form
$$V = I^{-1} G' G I^{-1}$$
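The symbols in the two variance formulas are not defined in the text. The sketch below uses the definitions usually given for the robust Cox variance, which are assumed here rather than taken from these notes: $I^{-1}$ is the naive (model-based) variance estimate, $U$ is the n × p matrix of efficient score residuals with one row $u_j'$ per observation, and the matrix $G$ is obtained by summing those rows within each group.

```latex
% Assumed definitions (not spelled out in the text):
%   I^{-1} : naive model-based variance of \hat\beta
%   u_j    : p-vector of efficient score residuals for observation j
%   D      : U I^{-1}
\[
U = \begin{pmatrix} u_1' \\ \vdots \\ u_n' \end{pmatrix},
\qquad
D = U I^{-1},
\qquad
V = I^{-1} U' U I^{-1} = D' D
\]
% Clustered version: one row per independent group G_s, s = 1, ..., m.
\[
g_s = \sum_{j \in G_s} u_j,
\qquad
G = \begin{pmatrix} g_1' \\ \vdots \\ g_m' \end{pmatrix},
\qquad
V = I^{-1} G' G I^{-1}
\]
```

Under these definitions, when every group contains exactly one observation the rows of G equal the rows of U, and the clustered formula reduces to the unclustered one.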
Implementation and examples
Implementing any variance-adjusted model involves three steps:
setting up the data (mainly specifying the time intervals correctly),
defining the risk sets correctly (by setting up $Y^{(k)}(t)$), and
taking care with the estimation method. All of the following models
can be handled:
1. Unordered failure events
(a) Unordered failure events of the same type
(b) Unordered failure events of different types (competing risk)
We will focus on ordered failure-time models:
2. The WLW model
A second model, proposed by Wei, Lin, and Weissfeld (1989), is
based on the idea of marginal risk sets. For this analysis, the
data are treated like a set of unordered failures, so each event
has its own stratum and each patient appears in all strata.
3. The PWP model
A third method proposed by Prentice, Williams, and Peterson
(1981) is known as the conditional risk set model. The data are
set up as for Andersen and Gill’s counting processes method,
except that the analysis is stratified by failure order. The
assumption made is that a subject is not at risk of a second
event until the first event has occurred and so on.
There are two variations to this approach: Time from entry and
time from previous event (the so-called “gap-time model”).
The bladder cancer data
The four models for ordered failures are illustrated using the
bladder cancer data published in Wei, Lin & Weissfeld (1989).
. list in 1/9, noobs
+-----------------------------------------------------------+
| id     group   futime   number   size   r1   r2   r3   r4 |
|-----------------------------------------------------------|
|  1   placebo        1        1      3    0    0    0    0 |
|  2   placebo        4        2      1    0    0    0    0 |
|  3   placebo        7        1      1    0    0    0    0 |
|  4   placebo       10        5      1    0    0    0    0 |
|  5   placebo       10        4      1    6    0    0    0 |
|-----------------------------------------------------------|
|  6   placebo       14        1      1    0    0    0    0 |
|  7   placebo       18        1      1    0    0    0    0 |
|  8   placebo       18        1      3    5    0    0    0 |
|  9   placebo       18        1      1   12   16    0    0 |
+-----------------------------------------------------------+
This dataset includes data on 86 subjects with bladder cancer, with
follow-up between 0 and 64 months. The data for the first subject,
who had zero follow-up, have been excluded, leaving data on 85
subjects. The listing above shows the first nine observations in the data.
1. The Andersen-Gill model
To show how each of the four models requires a different data
layout, we consider the data for subject #25 under each of the
four models.
Under the A-G model, the data from this subject are as follows:
+------------------------------------------------------------+
| id   group   number   size   rec   status   tstart   tstop |
|------------------------------------------------------------|
| 25       1        2      1     1        1        0       3 |
| 25       1        2      1     2        1        3       6 |
| 25       1        2      1     3        1        6       8 |
| 25       1        2      1     4        1        8      12 |
| 25       1        2      1     5        0       12      30 |
+------------------------------------------------------------+
2. The Wei, Lin & Weissfeld model
Under the WLW model, each patient is simultaneously at risk
for all failures (thus the clock starts at time zero for every
failure). Once the fourth failure has been experienced, the subject
is no longer at risk for another failure (unlike the A-G model
above), so the data for subject #25 become, under the WLW
model:
+------------------------------------------------------------+
| id   group   number   size   rec   status   tstart   tstop |
|------------------------------------------------------------|
| 25       1        2      1     1        1        0       3 |
| 25       1        2      1     2        1        0       6 |
| 25       1        2      1     3        1        0       8 |
| 25       1        2      1     4        1        0      12 |
+------------------------------------------------------------+
3. The Prentice, Williams and Peterson model
In the time-since-entry PWP model, the data are set up much as
for the A-G model, but the model takes the ordering of the
failures into account. In addition, time starts from entry
for each interval.
(a) The total time model
Under the PWP total time model the above data will be given
as
+------------------------------------------------------------+
| id   group   number   size   rec   status   tstart   tstop |
|------------------------------------------------------------|
| 25       1        2      1     1        1        0       3 |
| 25       1        2      1     2        1        0       6 |
| 25       1        2      1     3        1        0       8 |
| 25       1        2      1     4        1        0      12 |
+------------------------------------------------------------+
(b) The gap-time model
Under the gap-time model the clock starts at the end of the
previous failure, so the data for the same subject are given by
+-------------------------------------------------+
| id   group   number   size   rec   status   gap |
|-------------------------------------------------|
| 25       1        2      1     1        1     3 |
| 25       1        2      1     2        1     3 |
| 25       1        2      1     3        1     2 |
| 25       1        2      1     4        1     4 |
+-------------------------------------------------+
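The gap values for subject #25 are the lengths of the intervals in the A-G listing (3-0, 6-3, 8-6, 12-8). A minimal sketch of how such a variable might be derived follows. It assumes the data are held in the counting-process (A-G) layout with the variable names shown in that listing; it is not necessarily how the original dataset was built.

```stata
* Sketch only: assumes the A-G layout, one row per interval (tstart, tstop],
* with variables id, rec, status, tstart and tstop as in the A-G listing.
generate gap = tstop - tstart      // time since the previous event (or entry)

* Inspect one subject to check the result against the listing.
list id rec status tstart tstop gap if id == 25, noobs
```

Rows with gap equal to zero carry no time at risk, so they would be expected to drop out when the data are declared as survival-time data.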
Implementing the Andersen-Gill model
To implement the Andersen and Gill model using the results from
the bladder cancer study, the data are set up as follows: for each
patient there must be one observation per event or time interval.
The data for the nine subjects listed above are
. list if id<=10, noobs
+--------------------------------------------------------+
| id     group   tstart   tstop   status   number   size |
|--------------------------------------------------------|
|  1   placebo        0       1        0        1      3 |
|  2   placebo        0       4        0        2      1 |
|  3   placebo        0       7        0        1      1 |
|  4   placebo        0      10        0        5      1 |
|  5   placebo        0       6        1        4      1 |
|--------------------------------------------------------|
|  5   placebo        6      10        0        4      1 |
|  6   placebo        0      14        0        1      1 |
|  7   placebo        0      18        0        1      1 |
|  8   placebo        0       5        1        1      3 |
|  8   placebo        5      18        0        1      3 |
|--------------------------------------------------------|
|  9   placebo        0      12        1        1      1 |
|  9   placebo       12      16        1        1      1 |
|  9   placebo       16      18        0        1      1 |
+--------------------------------------------------------+
In the original data, subjects 1 through 4 had no tumor recurrences;
thus, each of these 4 patients has only one censored (status==0)
observation spanning from tstart=0 to the end of follow-up.
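Because this layout depends on the time intervals being specified correctly, it can be worth checking them before declaring the data. The checks below are a sketch, not part of the original analysis. They assume every subject enters at time 0 and that consecutive intervals for a subject should be contiguous, as in the listing above.

```stata
* Sketch only: sanity checks on a counting-process layout with one row per
* interval (tstart, tstop]. Variable names follow the listing above.
sort id tstart
by id: assert tstart == 0 if _n == 1            // assumed: everyone enters at t = 0
by id: assert tstart == tstop[_n-1] if _n > 1   // intervals are contiguous
assert tstop > tstart                           // no zero-length or negative spans
```

Each assert stops with an error if any observation violates the condition, which makes layout mistakes visible before any model is fit.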
The data are set up as follows:
. stset tstop , fail(status) exit(time .) id(id) enter(tstart)
id: id
failure event: status != 0 & status < .
obs. time interval: (tstop[_n-1], tstop]
enter on or after: time tstart
exit on or before: time .
------------------------------------------------------------------
190 total obs.
0 exclusions
------------------------------------------------------------------
190 obs. remaining, representing
85 subjects
112 failures in multiple failure-per-subject data
2711 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 64
The Andersen-Gill Cox model is fit as follows:
. stcox group size number, nohr nolog
------------------------------------------------------------------------------
          _t |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |  -.4070966   .2000726    -2.03   0.042    -.7992317   -.0149615
        size |  -.0400877   .0702575    -0.57   0.568    -.1777899    .0976146
      number |   .1606478   .0480081     3.35   0.001     .0665536    .2547419
------------------------------------------------------------------------------
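Because the nohr option is specified, the table reports coefficients rather than hazard ratios. As a worked illustration, exponentiating the coefficients above gives the corresponding hazard ratios (rounded to two decimals):

```latex
% Hazard ratio = exp(coefficient), using the A-G estimates printed above.
\[
e^{-0.4070966} \approx 0.67 \ (\text{group}), \qquad
e^{-0.0400877} \approx 0.96 \ (\text{size}), \qquad
e^{0.1606478} \approx 1.17 \ (\text{number})
\]
```

Read this way, each additional initial tumor (number) is associated with roughly a 17% higher recurrence hazard under this model, holding the other covariates fixed.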
The marginal risk set model (Wei, Lin, and Weissfeld)
The marginal risk set model ignores the ordering of events and treats
each failure as a different type of failure ($k = 1, \dots, 4$). The
resulting data for the first two subjects are as follows:
. list if id<=2, noobs
+----------------------------------------------------+
| id   group   futime   number   size   rec   status |
|----------------------------------------------------|
|  1       1        1        1      3     1        0 |
|  1       1        1        1      3     2        0 |
|  1       1        1        1      3     3        0 |
|  1       1        1        1      3     4        0 |
|  2       1        4        2      1     1        0 |
|----------------------------------------------------|
|  2       1        4        2      1     2        0 |
|  2       1        4        2      1     3        0 |
|  2       1        4        2      1     4        0 |
+----------------------------------------------------+
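One way this long layout could be produced from the wide listing of the bladder data shown earlier is sketched below. It assumes the wide variables are named r1-r4, that a value of 0 in them means the recurrence did not occur, and that none of them is missing. These are assumptions read off the listing, not a documented recipe from the original analysis.

```stata
* Sketch only: assumes the wide layout (id group futime number size r1-r4),
* with 0 in r1-r4 meaning "no such recurrence" and no missing values.
reshape long r, i(id) j(rec)         // one row per subject per failure order
generate byte status = (r > 0)       // 1 if the rec-th recurrence was observed
replace futime = r if status == 1    // event rows end at the recurrence time
drop r

sort id rec
list id group futime number size rec status if id <= 2, noobs
```

Rows for recurrences that never happened keep the subject's full follow-up time with status 0, which is what places every subject in all four strata.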
The data are set up as follows:
. stset futime, failure(status)
The Cox model is fitted with the sandwich estimator, clustering on
each subject and stratifying on each failure type.
. stcox group size number, nohr strata(rec) cluster(id) nolog
The conditional risk set model (time from entry)
As previously mentioned, there are two variations of the
conditional risk set model. The first variation in which time to each
event is measured from entry is illustrated in this section.
The data are set up as for Andersen and Gill’s method; however, a
variable indicating the failure order is included. The analysis is
then stratified by this variable. The resulting observations for the
first five subjects are
. list if id<=5, noobs
+------------------------------------------------------------+
| id   group   tstart   tstop   status   number   size   rec |
|------------------------------------------------------------|
|  1       1        0       1        0        1      3     1 |
|  2       1        0       4        0        2      1     1 |
|  3       1        0       7        0        1      1     1 |
|  4       1        0      10        0        5      1     1 |
|  5       1        0       6        1        4      1     1 |
|------------------------------------------------------------|
|  5       1        0      10        0        4      1     2 |
+------------------------------------------------------------+
The resulting dataset is identical to that used to fit Andersen and
Gill’s model except that the rec variable identifies the failure risk
group for each time span.
For the first 4 individuals, who have not had a recurrence, the rec
value is one so that they are at risk for a first recurrence the whole
follow-up time. The last individual, id==5, was at risk for a first
recurrence for 6 months (rec==1) and at risk of a second recurrence
(rec==2) from 6 months to the end of follow-up at 10 months.
The data are set up as follows:
. stset tstop, fail(status) exit(time .) enter(tstart)
------------------------------------------------------------------
183 total obs.
0 exclusions
------------------------------------------------------------------
183 obs. remaining, representing
112 failures in single record/single failure data
3907 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 59
The total-time PWP model is
. stcox group size number, nohr nolog strata(rec)
------------------------------------------------------------------------------
          _t |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |  -.4897246   .2092469    -2.34   0.019    -.8998411   -.0796082
        size |  -.0377304   .0675414    -0.56   0.576    -.1701092    .0946484
      number |   .1102692   .0510491     2.16   0.031     .0102149    .2103235
------------------------------------------------------------------------------
Stratified by rec
A robust (sandwich) estimate of the variance can be added:
. stcox group size number, nohr nolog robust strata(rec)
Gap time model
The gap-time PWP model measures the time to each event from the time of the
previous event. The clock is reset to zero after each failure, so analysis
time runs from zero to the length of the gap between successive failures.
The data are set up as follows:
. stset gap, failure(status)
------------------------------------------------------------------
183 total obs.
5 obs. end on or before enter()
------------------------------------------------------------------
178 obs. remaining, representing
112 failures in single record/single failure data
2480 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 59
The corresponding gap-time model is
. stcox group size number, nohr nolog strata(rec)
------------------------------------------------------------------------------
          _t |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |  -.2695213   .2076622    -1.30   0.194    -.6765318    .1374892
        size |   .0068402   .0700105     0.10   0.922    -.1303777    .1440582
      number |   .1535334   .0521059     2.95   0.003     .0514077     .255659
------------------------------------------------------------------------------
Stratified by rec
Clustering by id and robust variance estimation is done as follows:
. stcox group size number, nohr nolog robust strata(rec) cluster(id)
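In more recent Stata releases, cluster-robust variance estimation is usually requested through the vce() option. A sketch of the same request in that form follows, assuming the data have already been declared with stset as above and using the same variables:

```stata
* Sketch only: same gap-time fit, with the cluster-robust (sandwich) variance
* requested through vce(); assumes the data were stset on gap beforehand.
stcox group size number, nohr nolog strata(rec) vce(cluster id)
```

Clustering on id treats all rows belonging to one subject as a single independent group when the sandwich variance is formed.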
References
Andersen, P. K. and R. D. Gill. 1982. Cox’s regression model for counting
processes: A large sample study. Annals of Statistics 10: 1100-1120.
Lee, E. W., L. J. Wei, and D. Amato. 1992.
Lin, D. Y. 1994. Cox regression analysis of multivariate failure time data: The
marginal approach. Statistics in Medicine 13: 2233-2247.
Lin, D. Y. and L. J. Wei. 1989. The robust inference for the Cox proportional
hazards model. Journal of the American Statistical Association 84: 1074-1078.