Analysis of Censored Data
IMS Lecture Notes - Monograph Series (1995) Volume 27
Nonparametric estimators for interval censoring
problems*
Piet Groeneboom
Delft University of Technology
Abstract
We study weighted least squares estimators for the distribution
function of observations which are only visible via interval censoring,
i.e., in the situation where one only has information about an interval to
which the variable of interest belongs and where one cannot observe
it directly. The least squares estimators are shown to be closely related
to nonparametric maximum likelihood estimators (NPMLE's) and to
coincide with these in certain cases. New algorithms for computing the
estimators are presented and it is shown that they converge from any
starting point (in contrast with the EM-algorithm in this situation).
Finally, the estimation of non-smooth and smooth functionals of the
model is considered; for the latter case, we discuss $\sqrt{n}$-consistency and
efficiency of the NPMLE.
1
Introduction
An extensive statistical theory exists for treating right censored data. Much
less is known about more general types of censorship. This paper considers estimators for data subject to interval censoring. In this situation one
only has information about an interval to which the observation of interest
belongs; so only indirect information about the observation of interest is
available.
Most of the time the interval will be a time interval, but the following
interesting spatial version of this situation was brought to our attention by
professor Dietz. In examinations of skin tissue, possibly affected by skin
cancer, successive (roughly) circular incisions are made to determine the
region of affected tissue; in this case one tries to estimate the smallest "safe"
radius determining the region on which the operation should take place.
On the one hand one tries to minimize the number of incisions, but on the
*AMS 1991 subject classifications. 60F17, 62E20, 62G05, 62G20, 45A05.
Key words and phrases, nonparametric maximum likelihood, empirical processes, asymptotic distributions, asymptotic efficiency, Fredholm integral equations.
other hand making too few incisions might result in an estimate which is too
rough. Clearly statistical information about the estimates based on interval
censoring could be very valuable here.
AIDS research provides other important examples of interval censoring;
usually the time of onset of a certain stage of the disease is unknown, but
often indirect information about this is available.
In this paper we will concentrate on the following two cases of interval
censoring:
Case 1. For each individual we make one observation and observe whether
or not the event of interest has occurred before the time of observation. Such
data arise for instance in cross-sectional studies.
Case 2. Two examinations at particular times are made so that it is known
whether the event happened before the first observation (left censored), between the two observations (interval censored) or after the second observation (right censored).
AYER ET AL. (1955) derived the nonparametric maximum likelihood estimator (NPMLE) of the distribution function for Case 1 and proved that it
is consistent. In this case the NPMLE can be calculated in a finite number
of steps using the "pool adjacent violators" algorithm.
PETO (1973) considers the NPMLE for the more general Case 2. He suggests that pointwise standard errors for the survival curve can be estimated
from the inverse of the Fisher information, which, however, is not correct.
Turnbull in TURNBULL (1974) and TURNBULL (1976) proposes the use of
an EM algorithm to compute the NPMLE in interval censored problems. On
the other hand, it is shown in GROENEBOOM AND WELLNER (1992), Chapter 1,
Part II, that the "self-consistency" equation is a necessary but not a sufficient
condition for the NPMLE. The EM-algorithm may therefore converge to
some inconsistent estimator. Further, even if the starting function is such
that the algorithm will converge to the NPMLE, the rate of convergence is
generally very slow. Finally, the self-consistency equations have not been
successful in developing distribution theory. For these reasons we turn to
another approach, based on isotonic regression theory. This theory gives
necessary and sufficient conditions, yields efficient algorithms for computing
the NPMLE and leads us either directly to distribution theory or to rather
specific conjectures about the asymptotic behavior.
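The gap between self-consistency and maximum likelihood can be seen on a toy Case 1 data set. The sketch below is an illustration only, not the EM implementation of the cited papers: the data, the support points (including a point at infinity for mass beyond the largest observation) and the names `em_step`, `log_lik` and `run` are all assumptions made here. Started from a distribution that puts no mass on one of the points where the NPMLE has mass, the iteration stays at a self-consistent point with a strictly smaller likelihood.

```python
import math

def em_step(p, support, T, delta):
    """One self-consistency (EM) update of the masses p on the support."""
    n = len(T)
    new = [0.0] * len(p)
    for t_i, d_i in zip(T, delta):
        # Support points compatible with observation i:
        # s <= T_i if delta_i = 1, and s > T_i if delta_i = 0.
        idx = [j for j, s in enumerate(support) if (s <= t_i) == bool(d_i)]
        total = sum(p[j] for j in idx)  # assumed positive (nonzero likelihood)
        for j in idx:
            new[j] += p[j] / total / n
    return new

def log_lik(p, support, T, delta):
    """Log likelihood (1) for the discrete distribution with masses p."""
    ll = 0.0
    for t_i, d_i in zip(T, delta):
        F = sum(pj for pj, s in zip(p, support) if s <= t_i)
        ll += math.log(F) if d_i else math.log(1.0 - F)
    return ll

# Hypothetical toy data: three observation times with their indicators.
T = [1.0, 2.0, 3.0]
delta = [1, 0, 1]
support = [1.0, 2.0, 3.0, math.inf]

def run(p, steps=500):
    for _ in range(steps):
        p = em_step(p, support, T, delta)
    return p

p_good = run([0.25, 0.25, 0.25, 0.25])   # start with mass everywhere
p_stuck = run([0.5, 0.0, 0.0, 0.5])      # start with no mass at 2 and 3
```

With the uniform start the masses approach (1/2, 0, 1/2, 0), which is the isotonic regression solution for these data; with the second start the iteration is fixed at (2/3, 0, 0, 1/3), because a point that starts with zero mass never receives any.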
Furthermore, the relation between NPMLE's and nonparametric least
squares estimators will be discussed: these estimators actually coincide for
interval censoring, case 1, but have a rather different behavior for interval
censoring, case 2.
2
Interval censoring, case 1
We first discuss the following case of interval censoring.
Case 1. Let $(X_1, T_1), \ldots, (X_n, T_n)$ be a sample of random variables in $\mathbb{R}_+^2$, where $X_i$ and $T_i$ are independent (non-negative) random variables with distribution functions $F_0$ and $G$, respectively. The only observations which are available are $T_i$ ("observation time") and $\delta_i = \{X_i \le T_i\}$. Here we denote the indicator of an event $A$ (such as $\{X_i \le T_i\}$) just by $A$, instead of $1_A$.
The log likelihood for $F_0$ is given by the function
$$F \mapsto \sum_{i=1}^n \{\delta_i \log F(T_i) + (1 - \delta_i) \log(1 - F(T_i))\}, \qquad (1)$$
where $F$ is a right-continuous distribution function.
The (conditional) log likelihood, divided by $n$, can be written in the following way:
$$\phi(F) \stackrel{\mathrm{def}}{=} \int_{\mathbb{R}^2} \{1_{\{x \le t\}} \log F(t) + 1_{\{x > t\}} \log(1 - F(t))\}\, dP_n(x, t), \qquad (2)$$
where $P_n$ is the empirical probability measure of the pairs $(X_i, T_i)$, $1 \le i \le n$.
The nonparametric maximum likelihood estimator (NPMLE) $F_n$ of $F_0$ is a (right-continuous) distribution function $F$, maximizing (2).
Remark 2.1. Note that only the values of $F_n$ at the observation points matter for the maximization problem. To avoid trivialities, we will take as "the" NPMLE a distribution function which is piecewise constant, and only has jumps at the observation points. It may happen that the likelihood function is maximized by a function $F$ such that $F(t) < 1$, at each observation point $t$. In this case we do not specify the location of the remaining mass to the right of the biggest observation point. Under these conventions, the NPMLE is uniquely determined, both in case 1 and case 2 of the interval censoring problem.
It turns out that in case 1 the NPMLE $F_n$ coincides with the least squares estimator, obtained by minimizing the function
$$F \mapsto \sum_{i=1}^n (F(T_i) - \delta_i)^2$$
over the set of all distribution functions $F$ (Remark 2.1 ensures uniqueness over the restricted class of dfs, having jumps only at the observation points).
Therefore the NPMLE is a straightforward solution of an isotonic regression problem; a fact that has already been used in the paper by AYER ET AL. (1955).
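The isotonic regression solution for case 1 can be sketched in a few lines. This is a minimal illustration, assuming distinct observation times and unit weights; the function names `pava` and `npmle_case1` and the toy data are hypothetical and not taken from the paper.

```python
def pava(y, w):
    """Pool adjacent violators: weighted isotonic (nondecreasing) fit to y."""
    blocks = []  # each block: [mean, weight, number of points]
    for yk, wk in zip(y, w):
        blocks.append([yk, wk, 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

def npmle_case1(T, delta):
    """Values of the case 1 estimator at the ordered observation times.

    Assumes the observation times are distinct; tied times would have to
    be pooled before calling pava.
    """
    order = sorted(range(len(T)), key=lambda i: T[i])
    times = [T[i] for i in order]
    y = [float(delta[i]) for i in order]
    return times, pava(y, [1.0] * len(y))

# Hypothetical toy data: indicators 0, 1, 0, 1, 1 after sorting by time.
times, F_hat = npmle_case1([3.0, 1.0, 5.0, 2.0, 4.0], [0, 0, 1, 1, 1])
```

For these data the violating pair (1, 0) at times 2 and 3 is pooled to 1/2, so the fitted values at the ordered times are 0, 1/2, 1/2, 1, 1.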
The pointwise asymptotic behavior of the NPMLE is studied in GROENEBOOM (1987) and the result is given again in GROENEBOOM AND WELLNER (1992) as Theorem 5.1:
Theorem 5.1 in Groeneboom and Wellner (1992). Let $t_0$ be such that $0 < F_0(t_0) < 1$, $0 < G(t_0) < 1$, and let $F_0$ and $G$ be differentiable at $t_0$, with strictly positive derivatives $f_0(t_0)$ and $g(t_0)$, respectively. Furthermore, let $F_n$ be the NPMLE of $F_0$. Then we have, as $n \to \infty$,
$$n^{1/3}\{F_n(t_0) - F_0(t_0)\} \big/ \{\tfrac12 F_0(t_0)(1 - F_0(t_0)) f_0(t_0)/g(t_0)\}^{1/3} \xrightarrow{\mathcal{D}} 2Z,$$
where $\xrightarrow{\mathcal{D}}$ denotes convergence in distribution, and where $Z$ is the last time where standard two-sided Brownian motion minus the parabola $y(t) = t^2$ reaches its maximum.
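As a worked instance (an illustration added here, taking the normalizing constant of the theorem to be $\{\tfrac12 F_0(t_0)(1 - F_0(t_0)) f_0(t_0)/g(t_0)\}^{1/3}$), let $F_0$ and $G$ both be the uniform distribution function on $[0,1]$ and take $t_0 = 1/2$. Then $F_0(t_0) = 1/2$ and $f_0(t_0) = g(t_0) = 1$, so that
$$\{\tfrac12 \cdot \tfrac12 \cdot \tfrac12 \cdot 1\}^{1/3} = (1/8)^{1/3} = \tfrac12,$$
and the limit statement reduces to
$$n^{1/3}\{F_n(\tfrac12) - \tfrac12\} \xrightarrow{\mathcal{D}} Z.$$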
This shows that, under the conditions of the theorem, the NPMLE converges locally at the $n^{1/3}$ rate. A minimax result showing that the $n^{1/3}$ rate is the correct rate here and that the part of the constant in the minimax lower bound, depending on the underlying distribution, is correctly represented in the asymptotic variance of the NPMLE, is also shown in GROENEBOOM (1987) (in fact, two approaches are given there; one based on Assouad's Lemma and one based on the theory of limiting experiments, leading to slightly different universal constants in the lower bounds for the minimax risk). Still another proof of the minimax lower bound is sketched in the exercises of Chapter 2 of Part I of GROENEBOOM AND WELLNER (1992).
The minimax result was also recently reconsidered by GILL AND LEVIT (1992). Their approach is based on the van Trees inequality (VAN TREES (1968)). They recover the $n^{1/3}$ rate, but obtain a different type of constant, due to the fact that they use a (local) uniform Lipschitz condition on the underlying df (in contrast to the approach in GROENEBOOM (1987) and GROENEBOOM AND WELLNER (1992)).
As can be expected from the general theory on differentiable functionals (see e.g., VAN DER VAART (1991)), efficient estimators of smooth functionals like the mean
$$\mu_{F_0} = \int t\, dF_0(t)$$
should have $\sqrt{n}$-behavior. Suppose that the support of $P_{F_0}$ is a bounded interval $I = [0, M]$, and that $F_0$ and $G$ have densities $f_0$ and $g$, respectively, satisfying
$$g(t) \ge \delta > 0, \quad \text{and} \quad f_0(t) \ge \delta > 0, \quad \text{if } t \in I,$$
for some $\delta > 0$. Further assume that $g$ has a bounded derivative on $I$. An example of this situation is the case where $F_0$ and $G$ are both the uniform distribution function on $[0,1]$. Then we have the following result, proved in GROENEBOOM AND WELLNER (1992), Chapter 5 of Part II.
Theorem 5.5 in Groeneboom and Wellner (1992). Let $F_0$ and $G$ satisfy the conditions listed above, and let $F_n$ be the NPMLE of $F_0$. Then
$$\sqrt{n}\,(\mu_{F_n} - \mu_{F_0}) \xrightarrow{\mathcal{D}} U,$$
where $U$ has a normal distribution with mean zero and variance
$$\sigma^2 = \int \frac{F_0(t)(1 - F_0(t))}{g(t)}\, dt.$$
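As a worked instance (an illustration added here, assuming the limiting variance has the form $\int F_0(t)(1 - F_0(t))/g(t)\, dt$), take $F_0$ and $G$ both uniform on $[0,1]$, so that $F_0(t) = t$ and $g(t) = 1$ on $[0,1]$. Then
$$\sigma^2 = \int_0^1 t(1 - t)\, dt = \tfrac12 - \tfrac13 = \tfrac16,$$
which is twice the variance $1/12$ that the sample mean would have if the $X_i$ were observed directly.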
The proof uses a rather involved exponential martingale argument in order to give an upper bound to the probability that the maximum distance between successive jumps of $F_n$ is bigger than $n^{-1/3} \log n$. This in turn is used to show that the supremum distance between $F_n$ and $F_0$ is of order $n^{-1/3} \log n$. A different shorter proof, avoiding the upper bound argument for the supremum distance between $F_n$ and $F_0$ and also treating more general functionals than the mean, is given in HUANG AND WELLNER (1995A).
The asymptotic variance of the above estimator of the mean is in fact the efficient asymptotic variance (i.e., coincides with the information lower bound) in this situation. Interestingly enough, the information lower bound calculation (done by Jon Wellner) preceded the result on the asymptotic variance of the estimator of the mean, based on the NPMLE. The lower bound calculation is given in VAN DER VAART (1991).
In the example on Hepatitis A in Bulgaria, given in KEIDING (1991), a quantity of interest is the transmission potential (i.e., the expected number of people infected by a person having the disease), which can be considered to be a smooth functional for a restricted class of distribution functions. In the model, used by KEIDING (1991), this quantity should be estimable at rate $n^{1/2}$ under smoothness conditions on the underlying distributions. Preliminary results on this are reported in HANSEN (1991). An intriguing aspect of the estimation of these global types of functionals is that the optimal bandwidth choice is quite different from the optimal bandwidth choice for the pointwise estimates.
3
Interval censoring, case 2
3.1
Characterization of the estimators
We now turn to the second case of interval censoring, mentioned in the introduction. From a mathematical (and possibly also practical) point of
view this case is much more interesting than interval censoring, case 1. Much less is known, however, and the theory is still in its beginning stage. We consider the following model.
Interval censoring, Case 2. Let $(X_1, T_1, U_1), \ldots, (X_n, T_n, U_n)$ be a sample of random variables in $\mathbb{R}_+^3$, where $X_i$ is a (non-negative) random variable with continuous distribution function $F_0$, and where $T_i$ and $U_i$ are (non-negative) random variables, independent of $X_i$, with a joint continuous distribution function $H$ and such that $T_i < U_i$ with probability one. The only observations which are available are $(T_i, U_i)$ (the "observation times") and
$$\delta_i = \{X_i \le T_i\}, \qquad \gamma_i = \{X_i \in (T_i, U_i]\}.$$
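For a change, we start with discussing least squares estimators. A least squares estimator $F_n$ of $F_0$ is defined as a minimizer of the function
$$F \mapsto \sum_{i=1}^n \bigl\{ w_{i1}(F(T_i) - \delta_i)^2 + w_{i2}(F(U_i) - F(T_i) - \gamma_i)^2 + w_{i3}(1 - F(U_i) - (1 - \delta_i - \gamma_i))^2 \bigr\}, \qquad (3)$$
where the weights $w_{ij}$ can be chosen in several different ways, to be discussed below. In different notation, we have to minimize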
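$$\phi(F) \stackrel{\mathrm{def}}{=} \int_{\mathbb{R}^3} \phi_F(x, t, u)\, dP_n(x, t, u), \qquad (4)$$
where
$$\phi_F(x, t, u) = w_1(t, u)(F(t) - 1_{\{x \le t\}})^2 + w_2(t, u)(F(u) - F(t) - 1_{\{t < x \le u\}})^2 + w_3(t, u)(1 - F(u) - 1_{\{x > u\}})^2, \qquad (5)$$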
and $P_n$ is the empirical probability measure of the triples $(X_i, T_i, U_i)$, $1 \le i \le n$; the weight functions $w_j$, $j = 1, 2, 3$, only have to be defined at the points $(T_i, U_i)$ by
$$w_j(T_i, U_i) = w_{ij}, \qquad i = 1, \ldots, n;\ j = 1, 2, 3,$$
where $w_{ij}$ is defined as in (5).
Remark 3.1. Note that again (as in the preceding section) only the values of $F_n$ at the observation points $T_i$ and $U_i$ matter for the minimization problem. We will take as "the" least squares estimator a distribution function which is piecewise constant, and only has jumps at the observation points $T_i$ and $U_i$. It may again happen that the function $\phi$ is minimized by a function $F$ such that $F(t) < 1$, at each observation point $t$. In this case we do not specify the location of the remaining mass to the right of the biggest observation point. We shall show that, under these conventions, the least squares estimator is uniquely determined.
We start by characterizing the least squares estimator, under the conventions of Remark 3.1. To this end, we introduce the following processes.
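Definition 3.1 Let $F$ be a distribution function on $[0, \infty)$. Then the process $W_F$ is defined by
$$\begin{aligned} W_F(t) ={}& \int_{t' \in [0,t]} w_1(t', u)\{1_{\{x \le t'\}} - F(t')\}\, dP_n(x, t', u) \\ &- \int_{t' \in [0,t]} w_2(t', u)\{1_{\{t' < x \le u\}} - (F(u) - F(t'))\}\, dP_n(x, t', u) \\ &+ \int_{u \in [0,t]} w_2(t', u)\{1_{\{t' < x \le u\}} - (F(u) - F(t'))\}\, dP_n(x, t', u) \\ &- \int_{u \in [0,t]} w_3(t', u)\{1_{\{x > u\}} - (1 - F(u))\}\, dP_n(x, t', u), \qquad \text{for } t \ge 0, \end{aligned} \qquad (6)$$
where $P_n$ is the empirical probability measure of the points $(X_i, T_i, U_i)$, $i = 1, \ldots, n$.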
The following proposition characterizes the least squares estimator.
Proposition 1 Let $\mathcal{F}$ be the set of discrete distribution functions, with mass concentrated at the observation points and possibly some extra mass at the right of the biggest observation point. Then $F_n$ minimizes the right-hand side of (3) over all $F \in \mathcal{F}$ if and only if
$$\int_{[t, \infty)} dW_{F_n}(t') \le 0, \qquad \forall\, t \ge 0, \qquad (7)$$
and
$$\int F_n(t)\, dW_{F_n}(t) = 0, \qquad (8)$$
where $W_F$ is defined by (6). Moreover, $F_n$ is uniquely determined by (7) and (8).
The proof is quite similar to the proof of Proposition 1.3 in Chapter 1, Part II, of GROENEBOOM AND WELLNER (1992), but slightly easier, since we don't have to worry about the endpoints, which caused some extra work in the characterization of the NPMLE. In order to describe an algorithm for computing the least squares estimator, we introduce a "time scale process" similar to (but different from) the time scale process $G_F$, defined by (1.29) in Chapter 1, Part II, of GROENEBOOM AND WELLNER (1992).
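Definition 3.2. Let $F$ be a distribution function on $[0, \infty)$ and let $H_n$ be the empirical distribution function of the pairs $(T_i, U_i)$. Then the processes $G$ and $V_F$ are defined by
$$G(t) = \int_{t' \in [0,t]} \{w_1(t', u) + w_2(t', u)\}\, dH_n(t', u) + \int_{u \in [0,t]} \{w_2(t', u) + w_3(t', u)\}\, dH_n(t', u), \qquad t \ge 0, \qquad (9)$$
and
$$V_F(t) = W_F(t) + \int_{[0,t]} F(t')\, dG(t'), \qquad t \ge 0. \qquad (10)$$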
The processes $G$ and $V_F$ have similar motivation and properties as the processes $G_F$ and $V_F$ on page 49 of GROENEBOOM AND WELLNER (1992).
The following proposition characterizes $F_n$ as the slope of the convex minorant of a self-induced cumulative sum diagram.
Proposition 2 Let the class of distribution functions $\mathcal{F}$ be defined as in Proposition 1. Then $F_n$ minimizes the right-hand side of (3) over $\mathcal{F}$ if and only if $F_n$ is the left derivative of the convex minorant of the "cumulative sum (cusum) diagram", consisting of the points
$$P_j = \bigl(G(T_{(j)}), V_{F_n}(T_{(j)})\bigr),$$
where $P_0 = (0, 0)$ and $T_{(j)}$, $j = 1, 2, \ldots, 2n$, are the ordered observation times.
This suggests a simple iterative procedure for computing the least squares estimator: starting with an arbitrary (sub)distribution function, one computes at the $(m+1)$th iteration step the convex minorant of the cusum diagram, consisting of the points
$$P_0 = (0, 0), \qquad P_j^{(m)} = \bigl(G(T_{(j)}), V_{F^{(m)}}(T_{(j)})\bigr), \quad j = 1, \ldots, 2n,$$
and uses the left derivative $F^{(m+1)}$ of the convex minorant in the process $V_{F^{(m+1)}}$, defining the cusum diagram in the next iteration. We will show in the next section that this procedure will converge to the solution from any starting distribution.
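The iteration just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's program: the weights are taken to be three constants rather than functions of $(t, u)$, the jumps of $W_F$ and $G$ at an observation point are taken to be the half-gradient and diagonal curvature of the sum of squares in (3), and the left derivative of the convex minorant is computed as a weighted isotonic regression of the ratios of increments of $V_F$ and $G$. The names `pava` and `ls_case2` are hypothetical.

```python
def pava(y, w):
    """Weighted isotonic regression; equals the left derivative of the
    convex minorant of the cusum diagram with increments (w_k, w_k * y_k)."""
    blocks = []  # each block: [mean, weight, number of points]
    for yk, wk in zip(y, w):
        blocks.append([yk, wk, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

def ls_case2(T, U, delta, gamma, w=(1.0, 1.0, 1.0), iters=200):
    """Iterative convex minorant sketch for the case 2 least squares
    estimator with constant positive weights (an assumption made here)."""
    n = len(T)
    times = sorted(set(T) | set(U))          # ordered observation times
    pos = {t: k for k, t in enumerate(times)}
    iT = [pos[t] for t in T]
    iU = [pos[u] for u in U]
    w1, w2, w3 = w
    # Jumps of the time scale process G (it does not depend on F).
    dG = [0.0] * len(times)
    for a, b in zip(iT, iU):
        dG[a] += (w1 + w2) / n
        dG[b] += (w2 + w3) / n
    F = [0.5] * len(times)                   # arbitrary starting values
    for _ in range(iters):
        # Jumps of W_F at the observation points for the current F.
        dW = [0.0] * len(times)
        for a, b, d, g in zip(iT, iU, delta, gamma):
            mid = g - (F[b] - F[a])
            dW[a] += (w1 * (d - F[a]) - w2 * mid) / n
            dW[b] += (w2 * mid - w3 * (F[b] - d - g)) / n
        # Increments of V_F divided by increments of G, then the slopes
        # of the convex minorant of the resulting cusum diagram.
        y = [(dW[k] + F[k] * dG[k]) / dG[k] for k in range(len(times))]
        F = pava(y, dG)
    return times, F

# One observation with X in (T, U]: the fit puts F(T) = 0 and F(U) = 1.
times1, F1 = ls_case2([1.0], [2.0], [0], [1])
```

In the one-observation example the iterates oscillate around the solution with an error that halves at each step, in line with the contraction constant 1/2 for unit weights. Intermediate iterates may leave the interval $[0, 1]$; the sketch does not clip them.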
The NPMLE can in this case be characterized as a least squares estimator with "self-induced weights". In fact, the NPMLE is characterized by Proposition 1, but with the weights $w_i$ in the process $W_F$ in (6) defined by
$$w_1(t, u) = 1/F(t), \quad w_2(t, u) = 1/(F(u) - F(t)), \quad \text{and} \quad w_3(t, u) = 1/(1 - F(u)). \qquad (11)$$
If a denominator in (11) equals zero, the corresponding weight is defined to
be infinite and the corresponding squared distance in (5) is equal to zero
in that case. Using the convention $0 \cdot \infty = 0$, the corresponding weighted
square gives no contribution to the total sum of squares in (3). In practice,
one actually performs a preliminary reduction of the problem, excluding
these terms from the minimization problem.
So in this case the weights are defined by the solution itself, a situation
somewhat reminiscent of the "self- consistency equations". In an iterative
convex minorant algorithm, the weights are adjusted in an iterative procedure in such a way that the solution and the weights match at the end of
the iteration.
3.2
Algorithms
We show that the iterative convex minorant algorithm, based on Proposition
2, corresponds to a contraction mapping for a suitably chosen norm on $\mathcal{F}$,
with a contraction constant depending on the weight function. Since there
is only one fixed point, the algorithm will converge from any starting point.
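We define the $L_2$-distance $\|\cdot\|$ on $\mathcal{F}$ by
$$\|F_1 - F_2\|^2 = \int (F_1(t) - F_2(t))^2\, dG(t),$$
where $G$ is defined by (9). Let the function $dV_F/dG$ be defined by
$$\frac{dV_F}{dG}(t) = \begin{cases} \dfrac{V_F(t) - V_F(t-)}{G(t) - G(t-)}, & \text{if } G(t) - G(t-) > 0, \\ 0, & \text{otherwise.} \end{cases}$$
We define $F^{(m+1)}$ at the $(m+1)$th iteration step as the distribution function in $\mathcal{F}$ that minimizes
$$\Bigl\| F - \frac{dV_{F^{(m)}}}{dG} \Bigr\|.$$
Let the mapping $T : F \mapsto TF$, $F \in \mathcal{F}$, be defined by
$$TF = \operatorname*{argmin}_{F' \in \mathcal{F}} \Bigl\| F' - \frac{dV_F}{dG} \Bigr\|.$$
Then, by Theorem 8.2.5 in ROBERTSON, WRIGHT AND DYKSTRA (1988),
$$\|TF_1 - TF_2\| \le \Bigl\| \frac{dV_{F_1}}{dG} - \frac{dV_{F_2}}{dG} \Bigr\|. \qquad (12)$$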
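But the square of the term at the right-hand side of (12) can be written as an integral with respect to $H_n$ and is bounded by
$$c\, \|F_1 - F_2\|^2, \qquad (13)$$
where the constant $c$ satisfies an upper bound given by a maximum, over the observation points $(T_i, U_i)$, of a ratio formed from the weights $w_j(T_i, U_i)$.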
As an example, if $w_i(t, u) = 1$, $i = 1, 2, 3$, we get
$$\|F^{(m+1)} - F^{(m)}\| \le \tfrac12 \|F^{(m)} - F^{(m-1)}\|.$$
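To make the speed of convergence explicit (a short derivation added here, using only the contraction inequality $\|F^{(m+1)} - F^{(m)}\| \le \tfrac12 \|F^{(m)} - F^{(m-1)}\|$ for unit weights), iterate the inequality $m$ times:
$$\|F^{(m+1)} - F^{(m)}\| \le 2^{-m}\, \|F^{(1)} - F^{(0)}\|.$$
Writing $F^{(\infty)}$ for the limit of the iterates and summing the geometric series gives
$$\|F^{(m)} - F^{(\infty)}\| \le \sum_{k \ge m} \|F^{(k+1)} - F^{(k)}\| \le 2^{1-m}\, \|F^{(1)} - F^{(0)}\|,$$
so every ten further iterations reduce the bound on the distance to the fixed point by a factor $2^{10} = 1024$.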
For finding the NPMLE one could carry out the iteration procedure above repeatedly, for example starting with equal weights. This amounts to a repeated weighted least squares procedure, where the weights are determined by the preceding step. At the start of each iteration after the initial iteration one takes the weights as in (5), but with $F$ defined as the solution of the least squares problem in the preceding step. A program for doing this (using some "buffers", preventing the iterative estimates from leaving the allowed region) has been developed and seems to work fine. Another (simpler) iterative convex minorant algorithm for computing the NPMLE is discussed in GROENEBOOM AND WELLNER (1992), Chapter 3 of Part II. It is shown in JONGBLOED (1995A) and JONGBLOED (1995B) that a slight modification of the latter algorithm will always converge.
However, the original motivation for developing these algorithms was an attempt to derive distribution theory. We will turn to this in the next section.
3.3
Local distribution theory for case 2
For interval censoring, case 1, we have the result that the NPMLE converges at rate $n^{1/3}$. Interestingly enough, in case 2 there exist estimators which have a faster rate of convergence. First of all, a minimax calculation shows that the rate of convergence should not be $n^{1/3}$ but $(n \log n)^{1/3}$. The lower bound calculation is given in BARKER (1988). GILL AND LEVIT (1992) also derive a lower bound of order $(n \log n)^{-1/3}$. A simple histogram-type estimator has been constructed by Lucien Birgé (personal communication), which can easily be shown to attain the rate $(n \log n)^{1/3}$ at $t_0$. The trouble with the least squares estimator with constant weights is that observations lying in
smaller intervals do not get more weight; they should get more weight in
order to obtain the faster rate of convergence!
It is conjectured that the least squares estimator with weights, inversely
proportional to the lengths of the observation intervals, converges locally at
rate $(n \log n)^{1/3}$. Computer experiments also point in this direction. What
in our view is actually more interesting is that the NPMLE seems to behave
asymptotically as a least squares estimator with weights $w_i$ defined by
In fact there now exists a group of connected conjectures about the behavior of the NPMLE, all pointing in the direction of the following conjecture.
Conjecture. Let $F_0$ and $H$ be continuously differentiable at $t_0$ and $(t_0, t_0)$, respectively, with strictly positive derivatives $f_0(t_0)$ and $h(t_0, t_0)$. By continuous differentiability of $H$ at $(t_0, t_0)$ is meant that the density $h(t, u)$ is continuous in $(t, u)$ if $t < u$ and $(t, u)$ is sufficiently close to $(t_0, t_0)$ and that $h(t, t)$, defined by
$$h(t, t) = \lim_{u \downarrow t} h(t, u),$$
is continuous in $t$, for $t$ in a neighborhood of $t_0$. Let $0 < F_0(t_0), H(t_0, t_0) < 1$, and let $F_n$ be the NPMLE. Then
where $Z$ is the last time where standard two-sided Brownian motion minus
the parabola $y(t) = t^2$ reaches its maximum.
The conjecture is discussed in Part II, Chapter 5, section 2, of GROENEBOOM AND WELLNER (1992), where a result of this type is proved for an estimator, obtained after one step of an iterative convex minorant algorithm, starting with the underlying distribution. Of course, for practical purposes the latter result is useless; the study of its behavior was only motivated by the belief that its behavior is the clue to the behavior of the NPMLE.
4
Estimation of smooth functionals
4.1
Information lower bounds
As was remarked earlier, one can expect that smooth functionals of the model
can be estimated at $\sqrt{n}$-rate. The theory on the estimation of smooth functionals for case 2 is rather complicated, though, and intimately connected
with certain Fredholm integral equations for which solutions can only be
given implicitly. We will give a sketch of the present situation of the theory below, relying mostly on the exposition in GESKUS AND GROENEBOOM (1995A, B, C).
For a more complete and more general treatise on the relation between pathwise differentiability of functionals and asymptotic efficiency, we refer to Part I of GROENEBOOM AND WELLNER (1992) or BICKEL et al. (1993). We give some key concepts below.
Let the unknown distribution $P$ on the space $(\mathcal{Y}, \mathcal{B})$ be contained in some class of probability measures $\mathcal{P}$, which is dominated by a $\sigma$-finite measure $\mu$. Let $P$ have density $p$ with respect to $\mu$. Since we are interested in estimation of some real-valued function of $P$, we introduce the functional $\Theta : \mathcal{P} \to \mathbb{R}$. Let, for some $\delta > 0$, the collection $\{P_t\}$ with $t \in (0, \delta)$ be a one-dimensional parametric submodel which is smooth in the following sense:
$$\int \bigl[t^{-1}(\sqrt{p_t} - \sqrt{p}) - \tfrac12 a \sqrt{p}\bigr]^2\, d\mu \to 0 \quad \text{as } t \downarrow 0, \text{ for some } a \in L_2(P).$$
Such a submodel is called Hellinger differentiable and $a$ is called the score function or score. The following result is well-known.
Proposition 3 Each score belonging to some Hellinger differentiable submodel is contained in
$$L_2^0(P) = \Bigl\{a \in L_2(P) : \int a\, dP = 0\Bigr\}.$$
Proof: See GESKUS AND GROENEBOOM (1995C).
In our situation the collection of scores $a$, obtained by considering all possible one-dimensional Hellinger-differentiable parametric submodels, is a linear space. This space is called the tangent space at $P$, denoted by $T(P)$. Note that $T(P) \subset L_2^0(P)$.
Now $\Theta : \mathcal{P} \to \mathbb{R}$ is pathwise differentiable at $P$ if for each Hellinger differentiable path $\{P_t\}$, with corresponding score $a$, we have
$$\lim_{t \downarrow 0} t^{-1}(\Theta(P_t) - \Theta(P)) = \Theta'_P(a),$$
with $\Theta'_P : T(P) \to \mathbb{R}$ continuous and linear.
$\Theta'_P$ can be written in an inner product form. Since $T(P)$ is a subspace of the Hilbert space $L_2(P)$, the continuous linear functional $\Theta'_P$ can be extended to a continuous linear functional $\bar\Theta'_P$ on $L_2(P)$. By the Riesz representation theorem, to $\bar\Theta'_P$ belongs a unique $\theta_P \in L_2(P)$, called the gradient, satisfying
$$\bar\Theta'_P(h) = \langle \theta_P, h \rangle_P \quad \text{for all } h \in L_2(P).$$
One gradient is playing a special role, which is obtained by extending $T(P)$ to the Hilbert space $\overline{T(P)}$. Then, the extension of $\Theta'_P$ is unique, yielding the
canonical gradient or efficient influence function $\tilde\theta_P \in \overline{T(P)}$. This canonical gradient is also obtained by taking the orthogonal projection of any gradient $\theta_P$, obtained after extension of $\Theta'_P$, into $\overline{T(P)}$. Hence $\tilde\theta_P$ is the gradient with minimal norm among all gradients and we have
$$\|\theta_P\|_P^2 = \|\tilde\theta_P\|_P^2 + \|\theta_P - \tilde\theta_P\|_P^2.$$
The so-called convolution theorem now says that the smallest asymptotic variance we can get for a regular estimator of $\Theta(P)$ is $\|\tilde\theta_P\|_P^2$. An asymptotically efficient estimator is a regular estimator which has an asymptotic distribution with this (minimal) variance.
The interval censoring model is an example of a model with information
loss, in which the distribution P is induced by a transformation. In these
models the functional to be estimated is implicitly defined. The lower bound
theory for such implicitly defined functionals is treated in VAN DER VAART
(1991) and BICKEL et al. (1993). This theory will be applied to case 2 of the interval censoring model. We start with the formulation of the model for case 2. The loss of information is expressed by the fact that, instead of a sample $(X_1, \ldots, X_n)$, we observe $(T_1, U_1, \Delta_1, \Gamma_1), \ldots, (T_n, U_n, \Delta_n, \Gamma_n)$ with $\Delta_i = 1_{\{X_i \le T_i\}}$ and $\Gamma_i = 1_{\{T_i < X_i \le U_i\}}$. We suppose:
(M1) $X_i$ is a non-negative absolutely continuous random variable with distribution function $F$. Let $S > 0$. $F$ is contained in the class
$$\mathcal{F}_S := \{F \mid \mathrm{support}(F) \subset [0, S];\ F \ll \lambda,\ \lambda \text{ being Lebesgue measure}\}.$$
$F$ is the distribution on which we want to obtain information; however, we do not observe $X_i$ directly.
(M2) Instead, we observe the pairs $(T_i, U_i)$, with distribution function $H$. $H$ is contained in $\mathcal{H}$, the collection of all two-dimensional distributions on $\{(t, u) \mid 0 \le t < u\}$, absolutely continuous with respect to two-dimensional Lebesgue measure and such that each $H$ is independent of each $F$. Let $h$ denote the density of $(T_i, U_i)$, with marginal densities and distribution functions $h_1, H_1$ and $h_2, H_2$ for $T_i$ and $U_i$ respectively.
(M3) If both $H_1$ and $H_2$ put zero mass on some set $A$, then $F$ has zero mass on $A$ as well, so $F \ll H_1 + H_2$. This means that $F$ does not have mass on sets in which no observations can occur.
Condition (M3) is needed to ensure consistency. Moreover, without this assumption the functionals we are interested in are not well-defined. So discrete $F$ should be excluded from $\mathcal{F}_S$.
Note that what we do observe can be seen as a measurable transformation $S$ of what we would observe if there were no censoring:
$$(T, U, \Delta, \Gamma) = S(X, T, U) = (T, U, 1_{\{X \le T\}}, 1_{\{T < X \le U\}}),$$
with domain $\{(x, t, u) \mid 0 \le x,\ 0 \le t < u\}$. This domain will be called the hidden space, and the image space will be called the observation space. In our model $P$ is induced by $F$ and $H$, and is from now on written as $Q_{F,H}$, having density
$$q_{F,H}(t, u, \delta, \gamma) = h(t, u)\, F(t)^{\delta} (F(u) - F(t))^{\gamma} (1 - F(u))^{1 - \delta - \gamma}$$
with respect to $\lambda_2 \otimes \nu_2$, where $\nu_2$ denotes the counting measure on the set $\{(0, 1), (1, 0), (0, 0)\}$.
We are interested in estimation of some functional $K(F)$ of $F$. However, $K(F)$ is only implicitly defined as $\Theta(Q_{F,H})$, with $H$ acting as a nuisance parameter. In particular, we will be concerned with the problem whether the NPMLE $\Theta_n$ of $\Theta(Q_{F,H})$ satisfies
$$\sqrt{n}\,(\Theta_n - \Theta(Q_{F,H})) \xrightarrow{\mathcal{D}} N\bigl(0, \|\tilde\theta_{Q_{F,H}}\|^2\bigr).$$
All Hellinger differentiable submodels at $Q_{F,H}$ that can be formed, together with the corresponding score functions, are induced by the Hellinger differentiable paths of densities on the hidden space, according to the following theorem:
Theorem 4.1 Let $\mathcal{P} \ll \mu$ be a class of probability measures on the hidden space $(\mathcal{Y}, \mathcal{B})$. $P \in \mathcal{P}$ is induced by the random vector $Y$. Suppose that the path $\{P_t\}$ to $P$ satisfies
$$\int \bigl[t^{-1}(\sqrt{p_t} - \sqrt{p}) - \tfrac12 a \sqrt{p}\bigr]^2\, d\mu \to 0 \quad \text{as } t \downarrow 0,$$
for some $a \in L_2^0(P)$. Let $S : (\mathcal{Y}, \mathcal{B}) \to (\mathcal{Z}, \mathcal{C})$ be a measurable mapping. Suppose that the induced measures $Q_t = P_t S^{-1}$ and $Q = P S^{-1}$ on $(\mathcal{Z}, \mathcal{C})$ are absolutely continuous with respect to $\mu S^{-1}$, with densities $q_t$ and $q$. Then the path $\{Q_t\}$ is also Hellinger differentiable, satisfying
$$\int \bigl[t^{-1}(\sqrt{q_t} - \sqrt{q}) - \tfrac12 \bar a \sqrt{q}\bigr]^2\, d\mu S^{-1} \to 0 \quad \text{as } t \downarrow 0,$$
with $\bar a(z) = E_P(a(Y) \mid S = z)$.
Proof: See BICKEL et al. (1993).
Note that $\bar a \in L_2^0(Q)$. The relation between the scores $a$ in the hidden tangent space $T(P)$ and the induced scores $\bar a$ is expressed by the mapping
$$A_P : a(\cdot) \mapsto E_P(a(Y) \mid S = \cdot).$$
This mapping is called the score operator. It is continuous and linear. Its range is the induced tangent space, which is contained in $L_2^0(Q)$.
Now Theorem 4.1 yields the tangent space $T(Q_{F,H})$ of the induced Hellinger differentiable paths $\{Q_t\}$ at $Q_{F,H}$ with score operator $A_{F,H} : L_2^0(F) \oplus L_2^0(H) \to T(Q_{F,H})$ given by:
$$[A_{F,H}(a + e)](t, u, \delta, \gamma) = E_{F,H}\{a(X) + e(T, U) \mid (T, U, \Delta, \Gamma) = (t, u, \delta, \gamma)\}.$$
Having specified the Hellinger differentiable paths in the observation space, we can also determine differentiability of the functional $\Theta(Q_{F,H})$. Note that $\Theta(Q_{F,H})$ is defined unambiguously by condition (M3).
In our censoring model, differentiability of $\Theta(Q_{F,H})$ along the induced Hellinger differentiable paths in the observation space can be proved by looking at the structure of the adjoint $A^*_{F,H}$ of the map $A_{F,H}$ according to Theorem 4.2 below, which was first proved in VAN DER VAART (1991) in a more general setting, allowing for Banach space valued functions as estimand. Then the proof is slightly more elaborate.
Recall that the adjoint of a continuous linear mapping $A : D \to E$, with $D$ and $E$ Hilbert spaces, is the unique continuous linear mapping $A^* : E \to D$ satisfying
$$\langle Ag, h \rangle_E = \langle g, A^* h \rangle_D \qquad \forall\, g \in D,\ h \in E.$$
The score operator from Theorem 4.1 is playing the role of $A$. Its adjoint can be written as a conditional expectation as well. If $Z \sim P S^{-1}$, then:
$$[A^*_P b](y) = E_P(b(Z) \mid Y = y) \qquad \text{a.e.-}[P].$$
Theorem 4.2 Let $\mathcal{Q} = \mathcal{P} S^{-1}$ be a class of probability measures on the image space of the measurable transformation $S$. Suppose the functional $\Theta : \mathcal{Q} \to \mathbb{R}$ can be written as $\Theta(Q_P) = K(P)$ with $K$ pathwise differentiable at $P$ in the hidden space, having canonical gradient $\tilde\kappa_P$.
Then $\Theta$ is differentiable at $Q_P \in \mathcal{Q}$ along the collection of induced paths in the observation space obtained via Theorem 4.1 if and only if
$$\tilde\kappa_P \in \mathcal{R}(A^*_P). \qquad (14)$$
If (14) holds, then the canonical gradients $\tilde\theta_{Q_P}$ of $\Theta$ and $\tilde\kappa_P$ of $K$ are related by
$$\tilde\kappa_P = A_P^*\,\tilde\theta_{Q_P}.$$
Proof: See VAN DER VAART (1991) or GESKUS AND GROENEBOOM (1995c). $\Box$
Now $K(F)$ is only implicitly defined as $\Theta(Q_{F,H})$, with $H$ acting as a nuisance parameter. Note that $\Theta(Q_{F,H})$ is defined unambiguously by condition (M3). The key condition that is needed is the following:
$$\tilde\kappa_F \in \mathcal R(L_1^*),$$
and if this holds, then the canonical gradient is the unique element $\tilde\theta$ in $\overline{\mathcal R(L_1)}$ satisfying
$$L_1^*\,\tilde\theta = \tilde\kappa_F. \tag{15}$$
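The operators $L_1$ and $L_2$ have the following form:
$$[L_1 a](u, v, \delta, \gamma) = \delta\,\frac{\int_0^u a\,dF}{F(u)} + \gamma\,\frac{\int_u^v a\,dF}{F(v) - F(u)} + (1 - \delta - \gamma)\,\frac{\int_v^M a\,dF}{1 - F(v)} \qquad \text{a.e.-}[Q_{F,H}],$$
$$[L_2 e](u, v, \delta, \gamma) = e(u, v) \qquad \text{a.e.-}[Q_{F,H}]. \tag{16}$$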
The adjoint of $L_1$ can be written as $[L_1^* b](x) = E\big(b(U, V, \Delta, \Gamma) \mid X = x\big)$ and we get
$$[L_1^* b](x) = \int_{t=x}^{M}\int_{u=t}^{M} b(t,u,1,0)\,h(t,u)\,du\,dt + \int_{t=0}^{x}\int_{u=x}^{M} b(t,u,0,1)\,h(t,u)\,du\,dt + \int_{t=0}^{x}\int_{u=t}^{x} b(t,u,0,0)\,h(t,u)\,du\,dt. \tag{17}$$
Many functionals that are pathwise differentiable in the model without censoring lose this property in the interval censoring model. Any functional $K$ with a canonical gradient that is not a.e. equal to a continuous function cannot be obtained under $L_1^*$. So not all linear functionals remain pathwise differentiable. For example, $K(F) = F(t_0)$, with canonical gradient $1_{[0,t_0]}(\cdot) - F(t_0)$, loses this property. This is in correspondence with $F(t_0)$ not being estimable at $\sqrt n$-rate. However, functionals of the form $K(F) = \int c(x)\,dF(x)$, with $c$ sufficiently smooth, can be shown to remain differentiable under censoring. Hence for these functionals the above information lower bound theory holds.
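We will be concerned with the question whether the NPMLE $\hat\Theta_n$ of $\Theta(Q_{F,H})$ attains this lower bound, that is, whether it satisfies
$$\sqrt n\,\big(\hat\Theta_n - \Theta(Q_{F,H})\big) \xrightarrow{\ \mathcal D\ } N\big(0, \|\tilde\theta\|^2_{Q_{F,H}}\big) \qquad \text{as } n \to \infty.$$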
In the interval censoring model, both case 1 and case 2, the function
$$\phi(x) := \int_x^M a(t)\,dF(t), \qquad \text{with } a \in L_2^0(F),$$
appears explicitly in the score operator $L_1$. Therefore it plays an important role. It is called the integrated score function. From its definition we know that $\phi$ satisfies $\phi(0) = \phi(M) = 0$ and that $\phi$ is continuous for $F \in \mathcal F_s$.
We now investigate solvability of the equation
$$\tilde\kappa_F = L_1^* L_1 a$$
in the variable $a \in L_2^0(F)$. By the structure of the score operator $L_1$ this can be reformulated as an equation in $\phi$:
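$$\tilde\kappa_F(x) = \int_{t=0}^{x}\int_{u=t}^{x} \frac{\phi(u)}{1 - F(u)}\,h(t,u)\,du\,dt - \int_{t=0}^{x}\int_{u=x}^{M} \frac{\phi(u) - \phi(t)}{F(u) - F(t)}\,h(t,u)\,du\,dt - \int_{t=x}^{M}\int_{u=t}^{M} \frac{\phi(t)}{F(t)}\,h(t,u)\,du\,dt \qquad \text{a.e.-}[F]. \tag{18}$$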
The support of F may consist of several disjoint intervals. However, (18) is
not defined on intervals where F does not put mass, and these intervals do
not play any role. So without loss of generality we may assume the support
of F to consist of one interval [0, M].
Unlike case 1, differentiating equation (18) on both sides does not yield
an explicit formula for φ . Instead, we get the following integral equation:
$$\phi(x) + d_F(x)\left\{\int_{t=0}^{x} \frac{\phi(x) - \phi(t)}{F(x) - F(t)}\,h(t,x)\,dt - \int_{t=x}^{M} \frac{\phi(t) - \phi(x)}{F(t) - F(x)}\,h(x,t)\,dt\right\} = k(x)\,d_F(x), \tag{19}$$
with $d_F(x)$ being the function
$$d_F(x) = \frac{F(x)[1 - F(x)]}{h_1(x)[1 - F(x)] + h_2(x)F(x)},$$
writing $k(x)$ instead of $\tilde\kappa_F'(x)$, the derivative that appears when (18) is differentiated. Although $k$ may depend on the underlying distribution, we do not explicitly express this dependence. Apart from the model conditions (M1) to (M3), some extra conditions will have to be introduced.
(S1) $h_1$ and $h_2$ are continuous, with $h_1(x) + h_2(x) > 0$ for all $x \in [0, M]$.
(S2) $h(t, u)$ is continuous.
(S3) $\mathrm{Prob}\{U - T < \varepsilon_0\} = 0$ for some $\varepsilon_0$ with $0 < \varepsilon_0 < \tfrac12 M$, so $H$ does not have mass close to the diagonal.
(S4) $F$ is either a continuous distribution function with support $[0, M]$, or a piecewise constant distribution function with a finite number of jumps, all in $[0, M]$; $F$ satisfies
$$F(u) - F(t) \ge c > 0, \qquad \text{if } u - t \ge \varepsilon_0.$$
(S5) $k$ is continuous.
The integral equation for $\phi$ belongs to a well-known and extensively studied family of integral equations, the Fredholm integral equations of the second kind. Using this theory, it is proved that equation (19) has a unique solution. If we impose some extra smoothness conditions, we can derive some smoothness properties of the solution. These smoothness properties also imply solvability of $\tilde\kappa_F = L_1^* L_1 a$ for the unknown absolutely continuous distribution function $F$. The extra smoothness conditions are:
(L1) The partial derivatives $\Delta_1(t, x) = \frac{\partial}{\partial x} h(t, x)$ and $\Delta_2(t, x) = \frac{\partial}{\partial x} h(x, t)$ exist, except for at most a countable number of points $x$, where left and right derivatives exist. The derivatives are bounded, uniformly over $t$ and $x$.
(L2) $k$ is differentiable, except for at most a countable number of points $x$, where left and right derivatives exist. The derivative is bounded, uniformly over $x$.
We can now specify the structure of the canonical gradient $\tilde\theta_F$:
$$\tilde\theta_F(t, u, \delta, \gamma) = -\delta\,\frac{\phi_F(t)}{F(t)} - \gamma\,\frac{\phi_F(u) - \phi_F(t)}{F(u) - F(t)} + (1 - \delta - \gamma)\,\frac{\phi_F(u)}{1 - F(u)}, \tag{20}$$
where $\phi_F$ satisfies the integral equation (19).
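To make the role of the integral equation (19) concrete, here is a minimal numerical sketch. It rests on illustrative assumptions that are not taken from the text: $M = 1$, $F$ uniform on $[0, 1]$, $h$ uniform on the region $\{0 \le t,\ t + \varepsilon_0 \le u \le 1\}$ with $\varepsilon_0 = 0.25$, and $k \equiv 1$. The equation is assumed to have the form in which both integrals carry difference quotients of $\phi$ and $F$, so that they combine into $\sum_j K_{ij}(\phi_i - \phi_j)$ with a nonnegative kernel. The integrals are discretized with the trapezoid rule, and the resulting strictly diagonally dominant linear system is solved by Gauss-Seidel sweeps. This is a swapped-in elementary solver, not the method of the cited papers.

```python
def discretize(n_grid=40, eps=0.25):
    """Grid, d_F values and kernel weights for F(x) = x on [0, 1] (illustrative setup)."""
    xs = [i / n_grid for i in range(n_grid + 1)]
    w = [1.0 / n_grid] * (n_grid + 1)          # trapezoid weights
    w[0] = w[-1] = 0.5 / n_grid
    c = 2.0 / (1.0 - eps) ** 2                 # h = c on {t + eps <= u}, zero elsewhere
    h1 = [c * max(1.0 - eps - x, 0.0) for x in xs]   # first marginal density of h
    h2 = [c * max(x - eps, 0.0) for x in xs]         # second marginal density of h
    d = [x * (1.0 - x) / (a * (1.0 - x) + b * x) for x, a, b in zip(xs, h1, h2)]
    # K_ij = w_j * h / |F(x_i) - F(x_j)| when the two points are at least eps apart
    kern = [[w[j] * c / abs(xs[i] - xs[j]) if abs(xs[i] - xs[j]) >= eps - 1e-9 else 0.0
             for j in range(n_grid + 1)] for i in range(n_grid + 1)]
    return xs, d, kern

def solve_phi(n_grid=40, eps=0.25, sweeps=300, k=lambda x: 1.0):
    """Gauss-Seidel sweeps for phi_i (1 + d_i sum_j K_ij) - d_i sum_j K_ij phi_j = k_i d_i."""
    xs, d, kern = discretize(n_grid, eps)
    row = [sum(r) for r in kern]
    phi = [0.0] * len(xs)
    for _ in range(sweeps):
        for i, x in enumerate(xs):
            coupling = sum(kij * pj for kij, pj in zip(kern[i], phi))
            phi[i] = d[i] * (k(x) + coupling) / (1.0 + d[i] * row[i])
    return xs, phi

def residual(phi, n_grid=40, eps=0.25, k=lambda x: 1.0):
    """Left side minus right side of the discretized equation at every grid point."""
    xs, d, kern = discretize(n_grid, eps)
    return [phi[i] + d[i] * sum(kij * (phi[i] - pj) for kij, pj in zip(kern[i], phi))
            - k(x) * d[i] for i, x in enumerate(xs)]

xs, phi = solve_phi()
print(max(abs(r) for r in residual(phi)))
```

In this setup $d_F$ vanishes at both endpoints, so the computed solution satisfies $\phi(0) = \phi(M) = 0$, as an integrated score function should. Plugging the grid values of $\phi$ into (20) would then give an approximation of the canonical gradient under the same assumptions.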
4.2
Asymptotic efficiency of the NPMLE
In this section, we will denote the underlying distribution function by $F_0$. Under uniqueness, proposition 1.3 in GROENEBOOM AND WELLNER (1992) gives an alternative criterion which is necessary and sufficient for the NPMLE.
Given a sample $(U_1, V_1, \Delta_1, \Gamma_1), \ldots, (U_n, V_n, \Delta_n, \Gamma_n)$, let $\mathcal F$ be the class of distribution functions $F$ satisfying
$$\begin{aligned} F(U_i) &> 0, && \text{if } X_i \le U_i,\\ F(V_i) - F(U_i) &> 0, && \text{if } U_i < X_i \le V_i,\\ 1 - F(V_i) &> 0, && \text{if } X_i > V_i, \end{aligned}$$
and having mass concentrated on the set of observation points augmented with an extra point bigger than all observation points. It is easily seen that $\hat F_n$ belongs to this class. For distribution functions $F \in \mathcal F$, the following process $t \mapsto W_F(t)$ is properly defined:
$$\begin{aligned} W_F(t) ={}& \int_{u \in [0,t]} \delta\,F(u)^{-1}\,dQ_n(u, v, \delta, \gamma) - \int_{u \in [0,t]} \gamma\,\{F(v) - F(u)\}^{-1}\,dQ_n(u, v, \delta, \gamma)\\ &+ \int_{v \in [0,t]} \gamma\,\{F(v) - F(u)\}^{-1}\,dQ_n(u, v, \delta, \gamma) - \int_{v \in [0,t]} (1 - \delta - \gamma)\,\{1 - F(v)\}^{-1}\,dQ_n(u, v, \delta, \gamma), \end{aligned}$$
for $t \ge 0$,
where $Q_n$ is the empirical probability measure of the points $(U_i, V_i, \Delta_i, \Gamma_i)$, $i = 1, \ldots, n$.
Let $J_i = [\tau_{i-1}, \tau_i)$, $i = 1, \ldots, k + 1$, where $\tau_0 = 0$, $\tau_{k+1} = M$ and $\tau_i$ is a point of jump of $\hat F_n$, $i = 1, \ldots, k$. So $\tau_1$ and $\tau_k$ are the first and last point of jump of $\hat F_n$ respectively. Restriction to a compact interval $[0, M]$ is only needed to obtain the efficiency result Theorem 4.3, but not needed for Proposition 4, Corollary 4.1 and the consistency result (24).
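The process $W_F$ is a finite sum over the sample, so it is easy to evaluate for a given candidate $F$. The sketch below assumes that $W_F(t)$ is the sum of four integrals with respect to $Q_n$: the observations with $U_i \le t$ contribute $\delta_i / F(U_i) - \gamma_i / \{F(V_i) - F(U_i)\}$, and those with $V_i \le t$ contribute $\gamma_i / \{F(V_i) - F(U_i)\} - (1 - \delta_i - \gamma_i) / \{1 - F(V_i)\}$. The three observations and the step function `F` are hypothetical and chosen so that `F` belongs to the class $\mathcal F$. The helper names are likewise illustrative.

```python
from bisect import bisect_right

def step_cdf(jumps, cum):
    """Right-continuous step distribution function with given jump points and cumulative values."""
    def F(x):
        idx = bisect_right(jumps, x)
        return cum[idx - 1] if idx > 0 else 0.0
    return F

def W(t, sample, F):
    """Evaluate W_F(t) for a sample of tuples (u, v, delta, gamma)."""
    total = 0.0
    for u, v, delta, gamma in sample:
        rest = 1 - delta - gamma
        if u <= t:                      # contributions located at the point u
            if delta:
                total += 1.0 / F(u)
            if gamma:
                total -= 1.0 / (F(v) - F(u))
        if v <= t:                      # contributions located at the point v
            if gamma:
                total += 1.0 / (F(v) - F(u))
            if rest:
                total -= 1.0 / (1.0 - F(v))
    return total / len(sample)

# Hypothetical data: X <= 1; 2 < X <= 4; X > 2.
sample = [(1, 3, 1, 0), (2, 4, 0, 1), (1, 2, 0, 0)]
# Mass at the observation points 1, 2, 4 and at an extra point 5 beyond all of them.
F = step_cdf([1, 2, 4, 5], [0.25, 0.5, 0.75, 1.0])

for t in (0.5, 1, 2, 3, 4):
    print(t, W(t, sample, F))
```

Terms whose indicator is zero are skipped rather than evaluated, so no division by zero occurs as long as the candidate `F` lies in $\mathcal F$.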
Now proposition 1.3 in GROENEBOOM AND WELLNER (1992) says
Proposition 4 The function $\hat F_n$ maximizes the likelihood over all $F \in \mathcal F$ if and only if
$$\int_{[t, \infty)} dW_{\hat F_n}(t') \le 0, \qquad \forall\, t \ge \tau_1, \tag{22}$$
and
$$\int_{[\tau_1, \tau_k]} \hat F_n(t)\,dW_{\hat F_n}(t) = 0. \tag{23}$$
Moreover, $\hat F_n$ is uniquely determined by (22) and (23).
Note that there may be observation points before $\tau_1$ and beyond $\tau_k$. However, there the NPMLE should be 0 and 1 respectively. (See the discussion before proposition 1.3 in GROENEBOOM AND WELLNER (1992).) Now the following corollary, proved in GESKUS AND GROENEBOOM (1995b), is an immediate consequence.
Corollary 4.1 Any function $\sigma$ that is constant on the same intervals as $\hat F_n$ satisfies
$$\int_{J_i} \sigma(t)\,dW_{\hat F_n}(t) = 0 \qquad \text{for } i = 2, \ldots, k.$$
Remark. In fact corollary 4.1 follows from Fenchel duality theory (see e.g. ROCKAFELLAR (1970), theorem 28.3).
Moreover we have uniform consistency of the NPMLE of $F_0$ (see GROENEBOOM AND WELLNER (1992), part II, section 4.3):
$$\mathrm{Prob}\Big\{\lim_{n \to \infty} \|\hat F_n - F_0\|_\infty = 0\Big\} = 1. \tag{24}$$
Another result that will be needed can be deduced from VAN DE GEER (1993).
Lemma 4.1 For $i = 1, 2$,
$$\|\hat F_n - F_0\|_{H_i} = O_p\big(n^{-1/3}(\log n)^{1/6}\big) \qquad \text{as } n \to \infty,$$
where $H_1$ and $H_2$ are the first and second marginal distribution function of $H$, respectively.
In order to be able to use Lemma 4.1, one further specification is made of the kind of functionals that are allowed:
(D1)
$$K(G) - K(F_0) = \int \tilde\kappa_{F_0}(x)\,d(G - F_0)(x) + O\big(\|G - F_0\|_2^2\big),$$
for all distribution functions $G$ with support contained in $[0, M]$, and where $\|G - F_0\|_2$ is the $L_2$-distance between the distribution functions $G$ and $F_0$ w.r.t. Lebesgue measure on $\mathbb R$.
We also make the following assumption:
(D2) The underlying distribution function $F_0$ has a density bounded away from zero.
By condition (D2) and the strong consistency of the NPMLE, there exists a constant $c > 0$ such that
$$\hat F_n(u) - \hat F_n(t) \ge c, \qquad \text{if } u - t \ge \varepsilon_0, \tag{25}$$
if n is sufficiently large.
Combining all preceding results we then obtain the following theorem (Theorem 2.1 in GESKUS AND GROENEBOOM (1995b)), showing efficiency of the NPMLE:
Theorem 4.3 Let the following conditions on $F_0$, $H$ and $\tilde\kappa_{F_0}$ be satisfied: (M1) to (M3), (S1) to (S5), (L1) and (L2) of the preceding section, and (D1) and (D2).
Then we have
$$\sqrt n\,\big(K(\hat F_n) - K(F_0)\big) \xrightarrow{\ \mathcal D\ } N\big(0, \|\tilde\theta_{F_0}\|^2_{Q_{F_0}}\big) \qquad \text{as } n \to \infty. \tag{26}$$
Sketch of proof:
The proof boils down to proving the following relation
$$\sqrt n\,\big(K(\hat F_n) - K(F_0)\big) = \sqrt n \int \tilde\theta_{F_0}\,d(Q_n - Q_{F_0}) + o_p(1). \tag{27}$$
Then an application of the central limit theorem yields that the NPMLE of $K(F_0)$ has the desired asymptotically optimal behavior. The proof consists of the following steps.
I. By conditions (S1) and (D1), and lemma 4.1 we have
$$\sqrt n\,\big(K(\hat F_n) - K(F_0)\big) = \sqrt n \int \tilde\kappa_{F_0}\,d(\hat F_n - F_0) + o_p(1).$$
II. For $F \in \mathcal F$, one can define a function $\phi_F$ as a solution to the integral equation (19). This solution can be used to extend definition (20) to $\tilde\theta_F$ for $F \in \mathcal F$, where $\phi_F(u)/F(u)$ and $\phi_F(v)/(1 - F(v))$ are defined to be zero if $F(u) = 0$ or if $F(v) = 1$, respectively. Note that $\tilde\theta_F$ no longer has an interpretation as canonical gradient. In lemma 2.2 in GESKUS AND GROENEBOOM (1995b) the following is shown for $\tilde\theta_{\hat F_n}$:
$$\int \tilde\kappa_{F_0}\,d(\hat F_n - F_0) = -\int \tilde\theta_{\hat F_n}\,dQ_{F_0}.$$
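III. Corollary 4.1 implies
$$\int \bar\theta_{\hat F_n}\,dQ_n = 0,$$
where $\bar\theta_{\hat F_n}$ denotes the function defined in (20), but with the function $\phi_{\hat F_n}$ replaced by $\bar\phi_{\hat F_n}$, which is constant on the intervals of constancy of the NPMLE (and equals $\phi_{\hat F_n}$ at one point of the interval). We then get
$$-\sqrt n \int \tilde\theta_{\hat F_n}\,dQ_{F_0} = \sqrt n \int \bar\theta_{\hat F_n}\,d(Q_n - Q_{F_0}) + \sqrt n \int \big(\bar\theta_{\hat F_n} - \tilde\theta_{\hat F_n}\big)\,dQ_{F_0}.$$
The second term can be shown to be $o_p(1)$.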
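IV. The first term is further split into
$$\sqrt n \int \bar\theta_{\hat F_n}\,d(Q_n - Q_{F_0}) = \sqrt n \int \tilde\theta_{F_0}\,d(Q_n - Q_{F_0}) + \sqrt n \int \big(\bar\theta_{\hat F_n} - \tilde\theta_{F_0}\big)\,d(Q_n - Q_{F_0}).$$
The last term can be shown to be $o_p(1)$, using a Donsker property of the class of functions under consideration.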
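The four steps can be collected into a single chain of equalities, and the rate in Lemma 4.1 can be checked against the quadratic remainder in (D1). The derivation below is a sketch under two assumptions that are not spelled out in the text: $\|\cdot\|_{H_i}$ is read as the $L_2(H_i)$-norm, and $c' > 0$ denotes a lower bound for $h_1 + h_2$ on $[0, M]$, which exists by (S1) because a continuous positive function on a compact interval is bounded away from zero.

```latex
% Remainder in step I: compare the Lebesgue L_2-norm with the L_2(H_i)-norms via (S1),
% then insert the rate of Lemma 4.1.
\begin{aligned}
\|\hat F_n - F_0\|_2^2
  &\le \frac{1}{c'} \int (\hat F_n - F_0)^2 (h_1 + h_2)\,dx
   = \frac{1}{c'} \Big( \|\hat F_n - F_0\|_{H_1}^2 + \|\hat F_n - F_0\|_{H_2}^2 \Big), \\
\sqrt{n}\,\|\hat F_n - F_0\|_2^2
  &= \sqrt{n}\; O_p\big(n^{-2/3} (\log n)^{1/3}\big)
   = O_p\big(n^{-1/6} (\log n)^{1/3}\big) = o_p(1).
\end{aligned}

% Chain of steps I to IV leading to (27).
\begin{aligned}
\sqrt{n}\,\big(K(\hat F_n) - K(F_0)\big)
  &= \sqrt{n} \int \tilde\kappa_{F_0}\,d(\hat F_n - F_0) + o_p(1)
     && \text{(step I)} \\
  &= -\sqrt{n} \int \tilde\theta_{\hat F_n}\,dQ_{F_0} + o_p(1)
     && \text{(step II)} \\
  &= \sqrt{n} \int \bar\theta_{\hat F_n}\,d(Q_n - Q_{F_0}) + o_p(1)
     && \text{(step III)} \\
  &= \sqrt{n} \int \tilde\theta_{F_0}\,d(Q_n - Q_{F_0}) + o_p(1)
     && \text{(step IV)}.
\end{aligned}
```

The first display shows why a rate only slightly better than $n^{-1/4}$ would already suffice for step I: the remainder is quadratic in the distance, so the $n^{-1/3}$ rate leaves room to spare.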
References
AYER, M., BRUNK, H.D., EWING, G.M., REID, W.T., SILVERMAN, E. (1955). An empirical distribution function for sampling with incomplete information, Ann. Math. Statist., vol. 26, 641-647.
BARKER, D. (1988). Nonparametric maximum likelihood estimation of the distribution function of interval censored observations, Master's thesis, University of Amsterdam.
BARLOW, R.E., BARTHOLOMEW, D.J., BREMNER, J.M., BRUNK, H.D. (1972). Statistical Inference under Order Restrictions, Wiley, New York.
BEGUN, J.M., HALL, W.J., HUANG, W.M., AND WELLNER, J.A. (1983). Information and asymptotic efficiency in parametric-nonparametric models, Ann. Statist., vol. 11, 432-452.
BICKEL, P.J., KLAASSEN, C.A.J., RITOV, Y. AND WELLNER, J.A. (1993). Efficient and adaptive estimation in semiparametric models, Johns Hopkins University Press, Baltimore.
BIRMAN, M.S., SOLOMJAK, M.Z. (1967). Piecewise-polynomial approximations of functions in the classes $W_p^\alpha$. Math. Sbornik, vol. 73, 295-317.
DINSE, G.E. AND LAGAKOS, S.W. (1982). Nonparametric estimation of lifetime and disease onset distributions from incomplete observations. Biometrics, vol. 38, 921-932.
GEER, S. VAN DE (1993). Rates of convergence for the maximum likelihood estimator in mixture models, Technical Report TW 93-09, University of Leiden.
GESKUS, R.B. (1992). Efficient estimation of the mean for interval censoring case II, Technical Report 92-83, Delft University of Technology.
GESKUS, R.B. AND GROENEBOOM, P. (1995a). Asymptotically optimal estimation of smooth functionals for interval censoring, part 1. To appear in Statistica Neerlandica (jubilee issue).
GESKUS, R.B. AND GROENEBOOM, P. (1995b). Asymptotically optimal estimation of smooth functionals for interval censoring, part 2. Submitted to Statistica Neerlandica.
GESKUS, R.B. AND GROENEBOOM, P. (1995c). Asymptotically optimal estimation of smooth functionals for interval censoring, case 2; observation times arbitrarily close, Technical Report, Delft University of Technology, to appear.
GILL, R.D. AND LEVIT, B.Y. (1992). Applications of the van Trees inequality: a Bayesian Cramér-Rao bound. Preprint Nr. 773, Department of Mathematics, University Utrecht.
GROENEBOOM, P. (1987). Asymptotics for interval censored observations. Technical Report 87-18, Department of Mathematics, University of Amsterdam.
GROENEBOOM, P. (1989). Brownian motion with a parabolic drift and Airy functions. Probability Theory and Related Fields, vol. 81, 79-109.
GROENEBOOM, P. (1991). Discussion on: Age-specific incidence and prevalence: a statistical perspective, by Niels Keiding. J. R. Statist. Soc. A, vol. 154, 400-401.
GROENEBOOM, P. AND WELLNER, J.A. (1992). Information bounds and nonparametric maximum likelihood estimation, Birkhäuser Verlag.
HANSEN, B.E. (1991). Nonparametric estimation of functionals for interval censored observations. Master's thesis, Delft University of Technology and Copenhagen University.
HUANG, J. AND WELLNER, J.A. (1995a). Asymptotic normality of the NPMLE of linear functionals for interval censored data, case 1, to appear in Statistica Neerlandica.
HUANG, J. AND WELLNER, J.A. (1995b). Efficient estimation for the proportional hazards model with "Case 2" interval censoring, submitted.
JONGBLOED, G. (1995). Three statistical inverse problems. Ph.D. thesis, Delft University of Technology.
JONGBLOED, G. (1995). The iterative convex minorant algorithm for nonparametric estimation, Technical Report, Delft University of Technology, to appear.
KEIDING, N. (1991). Age-specific incidence and prevalence: a statistical perspective (with discussion). J. R. Statist. Soc. A, vol. 154, 371-412.
KIM, J., POLLARD, D. (1990). Cube root asymptotics. Ann. Statist., vol. 18, 191-219.
KRESS, R. (1989). Linear integral equations, Applied Mathematical Sciences vol. 82, Springer Verlag, New York.
PETO, R. (1973). Experimental survival curves for interval-censored data, Appl. Statist., vol. 22, 86-91.
ROBERTSON, T., WRIGHT, F.T., DYKSTRA, R.L. (1988). Order Restricted Statistical Inference. Wiley, New York.
ROCKAFELLAR, R.T. (1970). Convex analysis, Princeton University Press.
SHEEHY, A. AND WELLNER, J. (1992). Uniform Donsker Classes of Functions, Ann. Prob., vol. 20, 1983-2030.
TURNBULL, B.W. (1974). Nonparametric estimation of a survivorship function with doubly censored data. J. Amer. Statist. Assoc., vol. 69, 169-173.
TURNBULL, B.W. (1976). The empirical distribution function with arbitrarily grouped censored and truncated data. J. R. Statist. Soc. B, vol. 38, 290-295.
TURNBULL, B.W. AND MITCHELL, T.J. (1984). Nonparametric estimation of the distribution of time to onset for specific diseases in survival/sacrifice experiments. Biometrics, vol. 40, 41-50.
VAN TREES, H.L. (1968). Detection, Estimation and Modulation Theory, Part 1. Wiley, New York.
VAART, A.W. VAN DER (1988). Statistical estimation in large parameter spaces, CWI Tract, vol. 44, Centrum voor Wiskunde en Informatica, Amsterdam.
VAART, A.W. VAN DER (1991). On differentiable functionals, Ann. Statist., vol. 19, 178-204.