
Nonparametric estimators for interval censoring problems

1995

We study weighted least squares estimators for the distribution function of observations which are only visible via interval censoring, i.e., in the situation where one only has information about an interval to which the variable of interest belongs and where one cannot observe it directly. The least squares estimators are shown to be closely related to nonparametric maximum likelihood estimators (NPMLEs) and to coincide with these in certain cases. New algorithms for computing the estimators are presented, and it is shown that they converge from any starting point (in contrast with the EM algorithm in this situation). Finally, the estimation of non-smooth and smooth functionals of the model is considered; for the latter case, we discuss $\sqrt{n}$-consistency and efficiency of the NPMLE. *AMS 1991 subject classifications: 60F17, 62E20, 62G05, 62G20, 45A05.

Analysis of Censored Data, IMS Lecture Notes - Monograph Series (1995), Volume 27

Piet Groeneboom, Delft University of Technology

Key words and phrases: nonparametric maximum likelihood, empirical processes, asymptotic distributions, asymptotic efficiency, Fredholm integral equations.

1 Introduction

An extensive statistical theory exists for treating right censored data. Much less is known about more general types of censorship. This paper considers estimators for data subject to interval censoring. In this situation one only has information about an interval to which the observation of interest belongs, so only indirect information about the observation of interest is available. Most of the time the interval will be a time interval, but the following interesting spatial version of this situation was brought to our attention by Professor Dietz. In examinations of skin tissue possibly affected by skin cancer, successive (roughly) circular incisions are made to determine the region of affected tissue; in this case one tries to estimate the smallest "safe" radius determining the region on which the operation should take place. On the one hand one tries to minimize the number of incisions, but on the other hand making too few incisions might result in an estimate which is too rough. Clearly, statistical information about the estimates based on interval censoring could be very valuable here. AIDS research provides other important examples of interval censoring: usually the time of onset of a certain stage of the disease is unknown, but often indirect information about it is available.

In this paper we will concentrate on the following two cases of interval censoring:

Case 1. For each individual we make one observation and observe whether or not the event of interest has occurred before the time of observation. Such data arise for instance in cross-sectional studies.

Case 2. Two examinations at particular times are made, so that it is known whether the event happened before the first observation (left censored), between the two observations (interval censored) or after the second observation (right censored).

AYER ET AL. (1955) derived the nonparametric maximum likelihood estimator (NPMLE) of the distribution function for Case 1 and proved that it is consistent. In this case the NPMLE can be calculated in a finite number of steps using the "pool adjacent violators" algorithm. PETO (1973) considers the NPMLE for the more general Case 2. He suggests that pointwise standard errors for the survival curve can be estimated from the inverse of the Fisher information, which, however, is not correct. TURNBULL (1974, 1976) proposes the use of an EM algorithm to compute the NPMLE in interval censored problems. On the other hand, it is shown in GROENEBOOM AND WELLNER (1992), Chapter 1, Part II, that the "self-consistency" equation is a necessary but not a sufficient condition for the NPMLE. The EM algorithm may therefore converge to some inconsistent estimator.
Further, even if the starting function is such that the algorithm converges to the NPMLE, the rate of convergence is generally very slow. Finally, the self-consistency equations have not been successful in developing distribution theory. For these reasons we turn to another approach, based on isotonic regression theory. This theory gives necessary and sufficient conditions, yields efficient algorithms for computing the NPMLE, and leads us either directly to distribution theory or to rather specific conjectures about the asymptotic behavior. Furthermore, the relation between NPMLEs and nonparametric least squares estimators will be discussed: these estimators actually coincide for interval censoring, case 1, but behave rather differently for interval censoring, case 2.

2 Interval censoring, case 1

We first discuss the following case of interval censoring.

Case 1. Let $(X_1, T_1), \ldots, (X_n, T_n)$ be a sample of random variables in $\mathbb{R}_+^2$, where $X_i$ and $T_i$ are independent (nonnegative) random variables with distribution functions $F_0$ and $G$, respectively. The only observations available are $T_i$ (the "observation time") and $\delta_i = \{X_i \le T_i\}$. Here we denote the indicator of an event $A$ (such as $\{X_i \le T_i\}$) just by $A$, instead of $1_A$.

The log likelihood for $F_0$ is given by the function
$$\sum_{i=1}^n \bigl\{\delta_i \log F(T_i) + (1-\delta_i)\log\bigl(1-F(T_i)\bigr)\bigr\}, \tag{1}$$
where $F$ is a right-continuous distribution function. The (conditional) log likelihood, divided by $n$, can be written in the following way:
$$\Phi(F) = \int_{\mathbb{R}_+^2} \bigl\{1_{\{x\le t\}}\log F(t) + 1_{\{x>t\}}\log\bigl(1-F(t)\bigr)\bigr\}\,dP_n(x,t), \tag{2}$$
where $P_n$ is the empirical probability measure of the pairs $(X_i, T_i)$, $1 \le i \le n$. The nonparametric maximum likelihood estimator (NPMLE) $F_n$ of $F_0$ is a (right-continuous) distribution function $F$ maximizing (2).

Remark 2.1.
Note that only the values of $F_n$ at the observation points matter for the maximization problem. To avoid trivialities, we will take as "the" NPMLE a distribution function which is piecewise constant and only has jumps at the observation points. It may happen that the likelihood function is maximized by a function $F$ such that $F(t) < 1$ at each observation point $t$. In this case we do not specify the location of the remaining mass to the right of the biggest observation point. Under these conventions, the NPMLE is uniquely determined, both in case 1 and case 2 of the interval censoring problem.

It turns out that in case 1 the NPMLE $F_n$ coincides with the least squares estimator, obtained by minimizing the function
$$F \mapsto \sum_{i=1}^n \bigl(F(T_i) - \delta_i\bigr)^2$$
over the set of all distribution functions $F$ (Remark 2.1 ensures uniqueness over the restricted class of distribution functions having jumps only at the observation points). Therefore the NPMLE is a straightforward solution of an isotonic regression problem, a fact that was already used in the paper by AYER ET AL. (1955). The pointwise asymptotic behavior of the NPMLE is studied in GROENEBOOM (1987), and the result is given again in GROENEBOOM AND WELLNER (1992) as Theorem 5.1:

Theorem 5.1 in Groeneboom and Wellner (1992). Let $t_0$ be such that $0 < F_0(t_0) < 1$ and $0 < G(t_0) < 1$, and let $F_0$ and $G$ be differentiable at $t_0$, with strictly positive derivatives $f_0(t_0)$ and $g(t_0)$, respectively. Furthermore, let $F_n$ be the NPMLE of $F_0$. Then, as $n \to \infty$,
$$\frac{n^{1/3}\{F_n(t_0) - F_0(t_0)\}}{\bigl\{\tfrac12 F_0(t_0)\bigl(1-F_0(t_0)\bigr)f_0(t_0)/g(t_0)\bigr\}^{1/3}} \xrightarrow{\;\mathcal{D}\;} 2Z,$$
where $\xrightarrow{\mathcal{D}}$ denotes convergence in distribution, and where $Z$ is the last time where standard two-sided Brownian motion minus the parabola $y(t) = t^2$ reaches its maximum.
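Because the case 1 NPMLE is the isotonic regression of the indicators $\delta_i$ on the ordered observation times, it can be computed with the pool adjacent violators algorithm. The Python sketch below is our own illustration, not code from the paper; it assumes distinct observation times (no ties), and the names `pava`, `current_status_npmle` and `log_lik` are hypothetical. It also evaluates the log likelihood (1) so that the isotonic fit can be compared with other monotone candidates.

```python
import math

def pava(y, w=None):
    """Weighted isotonic (nondecreasing) least squares fit by pool adjacent violators."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool the last two blocks while they violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit

def current_status_npmle(T, delta):
    """Values of the case 1 NPMLE F_n at the sorted observation times."""
    order = sorted(range(len(T)), key=lambda i: T[i])
    times = [T[i] for i in order]
    return times, pava([delta[i] for i in order])

def log_lik(F_values, deltas):
    """Log likelihood (1) at given values F(T_i); -inf if an observed outcome gets probability 0."""
    total = 0.0
    for p, d in zip(F_values, deltas):
        q = p if d else 1.0 - p
        if q <= 0.0:
            return -math.inf
        total += math.log(q)
    return total

# Toy data: observation times and indicators delta_i = 1{X_i <= T_i}
times, F_hat = current_status_npmle([3.0, 1.0, 4.0, 2.0], [0, 0, 1, 1])
sorted_deltas = [0, 1, 0, 1]  # the indicators re-ordered by observation time
```

Here `F_hat` is `[0.0, 0.5, 0.5, 1.0]`: the violating pair at the second and third times is pooled into their average, and no other nondecreasing choice of values gives a larger value of `log_lik(F_hat, sorted_deltas)`.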
This shows that, under the conditions of the theorem, the NPMLE converges locally at the $n^{1/3}$ rate. A minimax result showing that $n^{1/3}$ is the correct rate here, and that the part of the constant in the minimax lower bound that depends on the underlying distribution is correctly represented in the asymptotic variance of the NPMLE, is also given in GROENEBOOM (1987). In fact, two approaches are given there: one based on Assouad's lemma and one based on the theory of limiting experiments, leading to slightly different universal constants in the lower bounds for the minimax risk. Still another proof of the minimax lower bound is sketched in the exercises of Chapter 2 of Part I of GROENEBOOM AND WELLNER (1992). The minimax result was also recently reconsidered by GILL AND LEVIT (1992). Their approach is based on the van Trees inequality (VAN TREES (1968)). They recover the $n^{1/3}$ rate, but obtain a different type of constant, due to the fact that they use a (local) uniform Lipschitz condition on the underlying distribution function (in contrast to the approach in GROENEBOOM (1987) and GROENEBOOM AND WELLNER (1992)).

As can be expected from the general theory on differentiable functionals (see, e.g., VAN DER VAART (1991)), efficient estimators of smooth functionals like the mean $\mu_{F_0} = \int t\,dF_0(t)$ should have $\sqrt{n}$-behavior. Suppose that the support of $P_{F_0}$ is a bounded interval $I = [0, M]$, and that $F_0$ and $G$ have densities $f_0$ and $g$, respectively, satisfying $g(t) \ge \delta > 0$ and $f_0(t) \ge \delta > 0$ for $t \in I$, for some $\delta > 0$. Further assume that $g$ has a bounded derivative on $I$. An example of this situation is the case where $F_0$ and $G$ are both the uniform distribution function on $[0,1]$. Then we have the following result, proved in GROENEBOOM AND WELLNER (1992), Chapter 5 of Part II.
Theorem 5.5 in Groeneboom and Wellner (1992). Let $F_0$ and $G$ satisfy the conditions listed above, and let $F_n$ be the NPMLE of $F_0$. Then
$$\sqrt{n}\Bigl(\int t\,dF_n(t) - \int t\,dF_0(t)\Bigr) \xrightarrow{\;\mathcal{D}\;} U,$$
where $U$ has a normal distribution with mean zero and variance
$$\sigma^2 = \int_0^M \frac{F_0(t)\bigl(1-F_0(t)\bigr)}{g(t)}\,dt.$$

The proof uses a rather involved exponential martingale argument to give an upper bound for the probability that the maximum distance between successive jumps of $F_n$ is bigger than $n^{-1/3}\log n$. This in turn is used to show that the supremum distance between $F_n$ and $F_0$ is of order $n^{-1/3}\log n$. A different, shorter proof, avoiding the upper bound argument for the supremum distance between $F_n$ and $F_0$ and also treating more general functionals than the mean, is given in HUANG AND WELLNER (1995a). The asymptotic variance of the above estimator of the mean is in fact the efficient asymptotic variance (i.e., it coincides with the information lower bound) in this situation. Interestingly enough, the information lower bound calculation (done by Jon Wellner) preceded the result on the asymptotic variance of the estimator of the mean based on the NPMLE. The lower bound calculation is given in VAN DER VAART (1991). In the example on hepatitis A in Bulgaria, given in KEIDING (1991), a quantity of interest is the transmission potential (i.e., the expected number of people infected by a person having the disease), which can be considered to be a smooth functional for a restricted class of distribution functions. In the model used by KEIDING (1991), this quantity should be estimable at rate $n^{1/2}$ under smoothness conditions on the underlying distributions. Preliminary results on this are reported in HANSEN (1991). An intriguing aspect of the estimation of these global types of functionals is that the optimal bandwidth choice is quite different from the optimal bandwidth choice for the pointwise estimates.
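For the uniform example mentioned before Theorem 5.5 ($F_0(t) = t$ and $g(t) = 1$ on $[0,1]$), the limiting variance reduces to $\int_0^1 t(1-t)\,dt = 1/6$. A short numerical check with the midpoint rule follows; the function name and the grid size `m` are our own choices.

```python
def mean_functional_variance(F0, g, M, m=10_000):
    """Midpoint-rule approximation of sigma^2 = int_0^M F0(t) (1 - F0(t)) / g(t) dt."""
    h = M / m
    total = 0.0
    for k in range(m):
        t = (k + 0.5) * h  # midpoint of the k-th subinterval
        total += F0(t) * (1.0 - F0(t)) / g(t)
    return total * h

# Uniform F0 and uniform G on [0, 1]
sigma2 = mean_functional_variance(lambda t: t, lambda t: 1.0, 1.0)
```

With $n$ observations, the theorem then suggests an approximate standard error $\sqrt{\sigma^2/n}$ for $\int t\,dF_n(t)$, about 0.041 for $n = 100$ in this example.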
3 Interval censoring, case 2

3.1 Characterization of the estimators

We now turn to the second case of interval censoring mentioned in the introduction. From a mathematical (and possibly also practical) point of view this case is much more interesting than interval censoring, case 1. Much less is known, however, and the theory is still in its beginning stage. We consider the following model.

Interval censoring, Case 2. Let $(X_1, T_1, U_1), \ldots, (X_n, T_n, U_n)$ be a sample of random variables in $\mathbb{R}_+^3$, where $X_i$ is a (nonnegative) random variable with continuous distribution function $F_0$, and where $T_i$ and $U_i$ are (nonnegative) random variables, independent of $X_i$, with a joint continuous distribution function $H$ and such that $T_i < U_i$ with probability one. The only observations available are $(T_i, U_i)$ (the "observation times") and
$$\delta_i = \{X_i \le T_i\}, \qquad \gamma_i = \{X_i \in (T_i, U_i]\}.$$

For a change, we start by discussing least squares estimators. A least squares estimator $F_n$ of $F_0$ is defined as a minimizer of the function
$$F \mapsto \sum_{i=1}^n \Bigl\{ w_{i1}\bigl(F(T_i)-\delta_i\bigr)^2 + w_{i2}\bigl(F(U_i)-F(T_i)-\gamma_i\bigr)^2 + w_{i3}\bigl(1-F(U_i)-(1-\delta_i-\gamma_i)\bigr)^2 \Bigr\}, \tag{3}$$
where the weights $w_{ij}$ can be chosen in several different ways, to be discussed below. In different notation, we have to minimize
$$\Phi(F) \stackrel{\rm def}{=} \int \phi_F(x,t,u)\,dP_n(x,t,u), \tag{4}$$
where
$$\phi_F(x,t,u) = w_1(t,u)\bigl(F(t)-1_{\{x\le t\}}\bigr)^2 + w_2(t,u)\bigl(F(u)-F(t)-1_{\{t<x\le u\}}\bigr)^2 + w_3(t,u)\bigl(1-F(u)-1_{\{x>u\}}\bigr)^2, \tag{5}$$
and $P_n$ is the empirical probability measure of the triples $(X_i, T_i, U_i)$, $1 \le i \le n$; the weight functions $w_j$, $j = 1,2,3$, only have to be defined at the points $(T_i, U_i)$, by $w_j(T_i, U_i) = w_{ij}$, $i = 1, \ldots, n$, $j = 1,2,3$, where $w_{ij}$ is defined as in (3).

Remark 3.1. Note that again (as in the preceding section) only the values of $F_n$ at the observation points $T_i$ and $U_i$ matter for the minimization problem. We will take as "the" least squares estimator a distribution function which is piecewise constant and only has jumps at the observation points $T_i$ and $U_i$.
It may again happen that the function $\Phi$ is minimized by a function $F$ such that $F(t) < 1$ at each observation point $t$. In this case we do not specify the location of the remaining mass to the right of the biggest observation point. We shall show that, under these conventions, the least squares estimator is uniquely determined.

We start by characterizing the least squares estimator under the conventions of Remark 3.1. To this end, we introduce the following processes.

Definition 3.1. Let $F$ be a distribution function on $[0,\infty)$. Then the process $W_F$ is defined by
$$\begin{aligned} W_F(t) = {}& \int_{t'\in[0,t]} w_1(t',u)\bigl\{1_{\{x\le t'\}} - F(t')\bigr\}\,dP_n(x,t',u) - \int_{t'\in[0,t]} w_2(t',u)\bigl\{1_{\{t'<x\le u\}} - \bigl(F(u)-F(t')\bigr)\bigr\}\,dP_n(x,t',u)\\ &+ \int_{u\in[0,t]} w_2(t',u)\bigl\{1_{\{t'<x\le u\}} - \bigl(F(u)-F(t')\bigr)\bigr\}\,dP_n(x,t',u) - \int_{u\in[0,t]} w_3(t',u)\bigl\{1_{\{x>u\}} - \bigl(1-F(u)\bigr)\bigr\}\,dP_n(x,t',u), \end{aligned} \tag{6}$$
for $t \ge 0$, where $P_n$ is the empirical probability measure of the points $(X_i, T_i, U_i)$, $i = 1, \ldots, n$.

The following proposition characterizes the least squares estimator.

Proposition 1. Let $\mathcal{F}$ be the set of discrete distribution functions with mass concentrated at the observation points and possibly some extra mass to the right of the biggest observation point. Then $F_n$ minimizes the right-hand side of (3) over all $F \in \mathcal{F}$ if and only if
$$\int_{[t,\infty)} dW_{F_n}(t') \le 0, \quad \forall\, t \ge 0, \tag{7}$$
and
$$\int_{[0,\infty)} F_n(t)\,dW_{F_n}(t) = 0, \tag{8}$$
where $W_F$ is defined by (6). Moreover, $F_n$ is uniquely determined by (7) and (8).

The proof is quite similar to the proof of Proposition 1.3 in Chapter 1, Part II, of GROENEBOOM AND WELLNER (1992), but slightly easier, since we do not have to worry about the endpoints, which caused some extra work in the characterization of the NPMLE. In order to describe an algorithm for computing the least squares estimator, we introduce a "time scale process" similar to (but different from) the time scale process $G_F$ defined by (1.29) in Chapter 1, Part II, of GROENEBOOM AND WELLNER (1992).
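Proposition 1 characterizes the minimizer of (3). As a concrete reference point, criterion (3) can be evaluated directly for any candidate distribution function. This is a minimal sketch in which `F` is any Python callable and, as a hypothetical default, all weights are one (the constant-weights estimator); the function name is ours.

```python
def case2_ls_criterion(F, T, U, delta, gamma, weights=None):
    """Weighted least squares criterion (3) for interval censoring, case 2.

    weights: optional list of triples (w_i1, w_i2, w_i3); all ones if omitted.
    """
    if weights is None:
        weights = [(1.0, 1.0, 1.0)] * len(T)
    total = 0.0
    for t, u, d, g, (w1, w2, w3) in zip(T, U, delta, gamma, weights):
        total += w1 * (F(t) - d) ** 2                       # left-censored part
        total += w2 * (F(u) - F(t) - g) ** 2                # interval part
        total += w3 * (1.0 - F(u) - (1 - d - g)) ** 2       # right-censored part
    return total

# One observation with T = 0.2, U = 0.6 and X in (T, U], candidate F(t) = t
value = case2_ls_criterion(lambda s: s, [0.2], [0.6], [0], [1])
```

Here `value` is $0.2^2 + 0.6^2 + 0.4^2 = 0.56$ up to rounding. Passing the self-induced weights (11) instead of ones would give the weighted criterion that the NPMLE is related to.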
Definition 3.2. Let $F$ be a distribution function on $[0,\infty)$ and let $H_n$ be the empirical distribution function of the pairs $(T_i, U_i)$. Then the processes $G$ and $V_F$ are defined by
$$G(t) = \int_{t'\in[0,t]} \bigl\{w_1(t',u) + w_2(t',u)\bigr\}\,dH_n(t',u) + \int_{u\in[0,t]} \bigl\{w_2(t',u) + w_3(t',u)\bigr\}\,dH_n(t',u) \tag{9}$$
and
$$V_F(t) = W_F(t) + \int_{[0,t]} F(t')\,dG(t'), \quad t \ge 0. \tag{10}$$

The processes $G$ and $V_F$ have similar motivation and properties as the processes $G_F$ and $V_F$ on page 49 of GROENEBOOM AND WELLNER (1992). The following proposition characterizes $F_n$ as the slope of the convex minorant of a self-induced cumulative sum diagram.

Proposition 2. Let the class of distribution functions $\mathcal{F}$ be defined as in Proposition 1. Then $F_n$ minimizes the right-hand side of (3) over $\mathcal{F}$ if and only if $F_n$ is the left derivative of the convex minorant of the "cumulative sum (cusum) diagram" consisting of the points
$$P_j = \bigl(G(T_{(j)}),\, V_{F_n}(T_{(j)})\bigr), \quad j = 1, \ldots, 2n,$$
and $P_0 = (0,0)$, where $T_{(j)}$, $j = 1, 2, \ldots, 2n$, are the ordered observation times (the $T_i$ and the $U_i$ together).

This suggests a simple iterative procedure for computing the least squares estimator: starting with an arbitrary (sub)distribution function $F^{(0)}$, one computes at the $(m+1)$th iteration step the convex minorant of the cusum diagram consisting of the points
$$P_0 = (0,0), \qquad P_j = \bigl(G(T_{(j)}),\, V_{F^{(m)}}(T_{(j)})\bigr), \quad j = 1, \ldots, 2n,$$
and uses the left derivative $F^{(m+1)}$ of the convex minorant in the process $V_{F^{(m+1)}}$ defining the cusum diagram in the next iteration. We will show in the next section that this procedure converges to the solution from any starting distribution.

The NPMLE can in this case be characterized as a least squares estimator with "self-induced weights". In fact, the NPMLE is characterized by Proposition 1, but with the weights $w_i$ in the process $W_F$ in (6) defined by
$$w_1(t,u) = \frac{1}{F(t)}, \qquad w_2(t,u) = \frac{1}{F(u)-F(t)}, \qquad w_3(t,u) = \frac{1}{1-F(u)}. \tag{11}$$
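The iterative procedure based on Proposition 2 can be sketched as follows for constant weights $(w_1, w_2, w_3)$; data-dependent weights such as (11) would have to be recomputed from the current iterate. This is our own illustration under stated assumptions: all $2n$ observation times are distinct, the factor $1/n$ in $P_n$ and $H_n$ is dropped because it cancels in the slopes, and each new iterate is truncated to $[0,1]$ (truncating an isotonic fit is the usual way to impose these bounds). The names `gcm_left_derivative` and `icm_case2` are hypothetical.

```python
def gcm_left_derivative(x, y):
    """Left derivatives at x[1], ..., x[N] of the greatest convex minorant of the
    points (x[j], y[j]), j = 0, ..., N, with x strictly increasing."""
    hull = [0]  # indices of the vertices of the convex minorant
    for j in range(1, len(x)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # drop b if slope(a, b) >= slope(b, j), i.e. b is on or above the chord a-j
            if (y[b] - y[a]) * (x[j] - x[b]) >= (y[j] - y[b]) * (x[b] - x[a]):
                hull.pop()
            else:
                break
        hull.append(j)
    slopes = []
    for a, b in zip(hull, hull[1:]):
        slopes.extend([(y[b] - y[a]) / (x[b] - x[a])] * (b - a))
    return slopes

def icm_case2(T, U, delta, gamma, w=(1.0, 1.0, 1.0), start=0.5, tol=1e-12, max_iter=500):
    """Iterative convex minorant sketch for the case 2 least squares estimator.
    Returns the pooled sorted observation times and the fitted values of F there."""
    n = len(T)
    pts = sorted([(T[i], i, 0) for i in range(n)] + [(U[i], i, 1) for i in range(n)])
    pos = [[0, 0] for _ in range(n)]  # pos[i] = [index of T_i, index of U_i]
    for j, (_, i, k) in enumerate(pts):
        pos[i][k] = j
    w1, w2, w3 = w
    F = [start] * (2 * n)  # arbitrary starting (sub)distribution function
    for _ in range(max_iter):
        dG, dV = [0.0] * (2 * n), [0.0] * (2 * n)
        for i in range(n):
            a, b = pos[i]
            r1 = delta[i] - F[a]                          # 1{x <= t} - F(t)
            r2 = gamma[i] - (F[b] - F[a])                 # 1{t < x <= u} - (F(u) - F(t))
            r3 = (1 - delta[i] - gamma[i]) - (1 - F[b])   # 1{x > u} - (1 - F(u))
            # increments of G in (9) and of V_F = W_F + int F dG in (10)
            dG[a] += w1 + w2
            dV[a] += w1 * r1 - w2 * r2 + (w1 + w2) * F[a]
            dG[b] += w2 + w3
            dV[b] += w2 * r2 - w3 * r3 + (w2 + w3) * F[b]
        x, y = [0.0], [0.0]  # cusum diagram, starting at P_0 = (0, 0)
        for j in range(2 * n):
            x.append(x[-1] + dG[j])
            y.append(y[-1] + dV[j])
        new_F = [min(1.0, max(0.0, s)) for s in gcm_left_derivative(x, y)]
        change = max(abs(p - q) for p, q in zip(new_F, F))
        F = new_F
        if change < tol:
            break
    return [p[0] for p in pts], F
```

Running `icm_case2` from the starting values 0 and 1 on the same data should give the same fitted values, in line with the convergence from any starting point established in Section 3.2.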
If a denominator in (11) equals zero, the corresponding weight is defined to be infinite and the corresponding squared distance in (5) is equal to zero in that case. Using the convention $0 \cdot \infty = 0$, the corresponding weighted square gives no contribution to the total sum of squares in (3). In practice, one actually performs a preliminary reduction of the problem, excluding these terms from the minimization problem. So in this case the weights are defined by the solution itself, a situation somewhat reminiscent of the "self-consistency equations". In an iterative convex minorant algorithm, the weights are adjusted in an iterative procedure in such a way that the solution and the weights match at the end of the iteration.

3.2 Algorithms

We show that the iterative convex minorant algorithm based on Proposition 2 corresponds to a contraction mapping for a suitably chosen norm on $\mathcal{F}$, with a contraction constant depending on the weight function. Since there is only one fixed point, the algorithm will converge from any starting point. We define the $L_2$-distance $\|\cdot\|$ on $\mathcal{F}$ by
$$\|F_1 - F_2\|^2 = \int \bigl(F_1(t) - F_2(t)\bigr)^2\,dG(t),$$
where $G$ is defined by (9). Let the function $dV_F/dG$ be defined by
$$\frac{dV_F}{dG}(t) = \begin{cases} \dfrac{V_F(t) - V_F(t-)}{G(t) - G(t-)}, & \text{if } G(t) - G(t-) > 0,\\[1ex] 0, & \text{otherwise.} \end{cases}$$
We define $F^{(m+1)}$ at the $(m+1)$th iteration step as the distribution function in $\mathcal{F}$ that minimizes
$$\Bigl\| F - \frac{dV_{F^{(m)}}}{dG} \Bigr\|.$$
Let the mapping $T: F \mapsto TF$, $F \in \mathcal{F}$, be defined by
$$TF = \mathop{\rm argmin}_{F' \in \mathcal{F}} \Bigl\| F' - \frac{dV_F}{dG} \Bigr\|.$$
Then, by Theorem 8.2.5 in ROBERTSON, WRIGHT AND DYKSTRA (1988),
$$\bigl\|F^{(m+1)} - F^{(m)}\bigr\| = \bigl\|TF^{(m)} - TF^{(m-1)}\bigr\| \le \Bigl\| \frac{dV_{F^{(m)}}}{dG} - \frac{dV_{F^{(m-1)}}}{dG} \Bigr\|. \tag{12}$$
But the square of the term on the right-hand side of (12) can be written as
$$\int \Bigl(\frac{dV_{F^{(m)}}}{dG} - \frac{dV_{F^{(m-1)}}}{dG}\Bigr)^2 dG \le c\,\bigl\|F^{(m)} - F^{(m-1)}\bigr\|^2, \tag{13}$$
where the constant $c$ satisfies
$$c \le \max_{1\le i\le n} \frac{w_2(T_i,U_i)^2}{\bigl(w_1(T_i,U_i)+w_2(T_i,U_i)\bigr)\bigl(w_2(T_i,U_i)+w_3(T_i,U_i)\bigr)}.$$
As an example, if $w_i(t,u) = 1$, $i = 1,2,3$, we get
$$\bigl\|F^{(m+1)} - F^{(m)}\bigr\| \le \tfrac12\,\bigl\|F^{(m)} - F^{(m-1)}\bigr\|.$$

For finding the NPMLE one could carry out the iteration procedure above repeatedly, for example starting with equal weights. This amounts to a repeated weighted least squares procedure, where the weights are determined by the preceding step: at the start of each iteration after the initial one, one takes the weights as in (11), but with $F$ defined as the solution of the least squares problem in the preceding step. A program for doing this (using some "buffers" that prevent the iterative estimates from leaving the allowed region) has been developed and seems to work well. Another (simpler) iterative convex minorant algorithm for computing the NPMLE is discussed in GROENEBOOM AND WELLNER (1992), Chapter 3 of Part II. It is shown in JONGBLOED (1995a) and JONGBLOED (1995b) that a slight modification of the latter algorithm always converges. However, the original motivation for developing these algorithms was an attempt to derive distribution theory. We turn to this in the next section.

3.3 Local distribution theory for case 2

For interval censoring, case 1, we have the result that the NPMLE converges at rate $n^{1/3}$. Interestingly enough, in case 2 there exist estimators which have a faster rate of convergence. First of all, a minimax calculation shows that the rate of convergence should not be $n^{1/3}$ but $(n\log n)^{1/3}$. The lower bound calculation is given in BARKER (1988). GILL AND LEVIT (1992) also derive a lower bound of order $(n\log n)^{-1/3}$.
A simple histogram-type estimator has been constructed by Lucien Birgé (personal communication), which can easily be shown to attain the rate $(n\log n)^{1/3}$ at $t_0$. The trouble with the least squares estimator with constant weights is that observations lying in smaller intervals do not get more weight; they should get more weight in order to obtain the faster rate of convergence! It is conjectured that the least squares estimator with weights inversely proportional to the lengths of the observation intervals converges locally at rate $(n\log n)^{1/3}$. Computer experiments also point in this direction. What in our view is actually more interesting is that the NPMLE seems to behave asymptotically as a least squares estimator with weights $w_i$ defined by [...]. In fact, there now exists a group of connected conjectures about the behavior of the NPMLE, all pointing in the direction of the following conjecture.

Conjecture. Let $F_0$ and $H$ be continuously differentiable at $t_0$ and $(t_0,t_0)$, respectively, with strictly positive derivatives $f_0(t_0)$ and $h(t_0,t_0)$. By continuous differentiability of $H$ at $(t_0,t_0)$ is meant that the density $h(t,u)$ is continuous in $(t,u)$ if $t < u$ and $(t,u)$ is sufficiently close to $(t_0,t_0)$, and that $h(t,t)$, defined by $h(t,t) = \lim_{u\downarrow t} h(t,u)$, is continuous in $t$ for $t$ in a neighborhood of $t_0$. Let $0 < F_0(t_0), H(t_0,t_0) < 1$, and let $F_n$ be the NPMLE. Then
$$\frac{(n\log n)^{1/3}\{F_n(t_0) - F_0(t_0)\}}{\bigl\{\tfrac34 f_0(t_0)^2/h(t_0,t_0)\bigr\}^{1/3}} \xrightarrow{\;\mathcal{D}\;} 2Z,$$
where $Z$ is the last time where standard two-sided Brownian motion minus the parabola $y(t) = t^2$ reaches its maximum.

The conjecture is discussed in Part II, Chapter 5, Section 2, of GROENEBOOM AND WELLNER (1992), where a result of this type is proved for an estimator obtained after one step of an iterative convex minorant algorithm, starting with the underlying distribution. Of course, for practical purposes the latter result is useless; its study was only motivated by the belief that its behavior is the clue to the behavior of the NPMLE.
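The limit variable $Z$, which appears both in Theorem 5.1 and in the conjecture, has no elementary closed form, but it can be approximated by simulation. A minimal sketch, assuming that restricting time to $[-c, c]$ and discretizing Brownian motion on a grid are acceptable approximations; the defaults `c = 3` and `m = 6000` and the name `simulate_Z` are arbitrary choices of ours.

```python
import random

def simulate_Z(c=3.0, m=6000, rng=None):
    """One approximate draw of Z, the location of the maximum of W(t) - t**2, where W is
    standard two-sided Brownian motion with W(0) = 0; t is restricted to a grid on [-c, c]."""
    rng = rng or random.Random()
    half = m // 2
    step = c / half
    sd = step ** 0.5  # standard deviation of a Brownian increment over one grid step
    # Independent Brownian paths to the right and to the left of 0
    right, left = [0.0], [0.0]
    for _ in range(half):
        right.append(right[-1] + rng.gauss(0.0, sd))
        left.append(left[-1] + rng.gauss(0.0, sd))
    best_t, best_v = 0.0, float("-inf")
    for k in range(-half, half + 1):  # scan the grid from left to right
        w = right[k] if k >= 0 else left[-k]
        t = k * step
        v = w - t * t
        if v >= best_v:  # ">=" keeps the last maximizer in case of ties
            best_t, best_v = t, v
    return best_t

z = simulate_Z(rng=random.Random(1))
```

Averaging many such draws gives Monte Carlo approximations of moments of the limit distributions; the grid step and the truncation at $\pm c$ introduce small biases.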
4 Estimation of smooth functionals

4.1 Information lower bounds

As was remarked earlier, one can expect that smooth functionals of the model can be estimated at $\sqrt{n}$-rate. The theory on the estimation of smooth functionals for case 2 is rather complicated, though, and intimately connected with certain Fredholm integral equations for which solutions can only be given implicitly. We give a sketch of the present state of the theory below, relying mostly on the exposition in GESKUS AND GROENEBOOM (1995a, b, c). For a more complete and more general treatment of the relation between pathwise differentiability of functionals and asymptotic efficiency, we refer to Part I of GROENEBOOM AND WELLNER (1992) or BICKEL ET AL. (1993). We give some key concepts below.

Let the unknown distribution $P$ on the space $(\mathcal{Y}, \mathcal{B})$ be contained in some class of probability measures $\mathcal{P}$, which is dominated by a $\sigma$-finite measure $\mu$. Let $P$ have density $p$ with respect to $\mu$. Since we are interested in estimation of some real-valued function of $P$, we introduce the functional $\Theta: \mathcal{P} \to \mathbb{R}$. Let, for some $\epsilon > 0$, the collection $\{P_t\}$ with $t \in (0, \epsilon)$ be a one-dimensional parametric submodel which is smooth in the following sense:
$$\int \Bigl[ t^{-1}\bigl(\sqrt{p_t} - \sqrt{p}\bigr) - \tfrac12 a\sqrt{p} \Bigr]^2 d\mu \to 0, \quad \text{as } t \downarrow 0,$$
for some $a \in L_2(P)$. Such a submodel is called Hellinger differentiable, and $a$ is called the score function or score. The following result is well known.

Proposition 3. Each score belonging to some Hellinger differentiable submodel is contained in
$$L_2^0(P) = \Bigl\{ a \in L_2(P) : \int a\,dP = 0 \Bigr\}.$$

Proof: See GESKUS AND GROENEBOOM (1995c).

In our situation the collection of scores $a$, obtained by considering all possible one-dimensional Hellinger differentiable parametric submodels, is a linear space. This space is called the tangent space at $P$, denoted by $T(P)$. Note that $T(P) \subset L_2^0(P)$.
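On a finite sample space the definition of Hellinger differentiability can be checked numerically. For the hypothetical path $p_t = p(1 + t a)$ with a bounded $a$ satisfying $\sum_k p_k a_k = 0$ (so that each $p_t$ is again a probability vector, in line with Proposition 3), the remainder in the definition is of order $t^2$. The probabilities and the score below are arbitrary illustrative choices.

```python
import math

def hellinger_remainder(p, a, t):
    """sum_k [ (sqrt(p_t[k]) - sqrt(p[k])) / t - a[k] * sqrt(p[k]) / 2 ]^2 for p_t = p (1 + t a),
    i.e. the integral in the definition with mu the counting measure."""
    total = 0.0
    for pk, ak in zip(p, a):
        ptk = pk * (1.0 + t * ak)
        r = (math.sqrt(ptk) - math.sqrt(pk)) / t - 0.5 * ak * math.sqrt(pk)
        total += r * r
    return total

p = [0.2, 0.3, 0.5]
a = [1.0, 1.0, -1.0]  # bounded score with sum(p * a) = 0
remainders = [hellinger_remainder(p, a, t) for t in (1e-1, 1e-2, 1e-3)]
```

A Taylor expansion of $\sqrt{1 + ta}$ shows that the remainder behaves like $t^2 \sum_k p_k a_k^4 / 64$, so it drops by roughly a factor 100 each time $t$ is divided by 10.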
Now $\Theta: \mathcal{P} \to \mathbb{R}$ is pathwise differentiable at $P$ if for each Hellinger differentiable path $\{P_t\}$, with corresponding score $a$, we have
$$\lim_{t\downarrow 0} t^{-1}\bigl(\Theta(P_t) - \Theta(P)\bigr) = \Theta'_P(a),$$
with $\Theta'_P: T(P) \to \mathbb{R}$ continuous and linear. $\Theta'_P$ can be written in an inner product form. Since $T(P)$ is a subspace of the Hilbert space $L_2(P)$, the continuous linear functional $\Theta'_P$ can be extended to a continuous linear functional $\bar\Theta'_P$ on $L_2(P)$. By the Riesz representation theorem, to $\bar\Theta'_P$ belongs a unique $\theta_P \in L_2(P)$, called the gradient, satisfying $\bar\Theta'_P(h) = \langle \theta_P, h\rangle_P$ for all $h \in L_2(P)$. One gradient plays a special role; it is obtained by extending $T(P)$ to the Hilbert space $\overline{T(P)}$, its closure. Then the extension of $\Theta'_P$ is unique, yielding the canonical gradient or efficient influence function $\tilde\theta_P \in \overline{T(P)}$. This canonical gradient is also obtained by taking the orthogonal projection of any gradient $\theta_P$, obtained after extension of $\Theta'_P$, onto $\overline{T(P)}$. Hence $\tilde\theta_P$ is the gradient with minimal norm among all gradients, and we have
$$\|\theta_P\|_P^2 = \|\tilde\theta_P\|_P^2 + \|\theta_P - \tilde\theta_P\|_P^2.$$
The so-called convolution theorem now says that the smallest asymptotic variance we can get for a regular estimator of $\Theta(P)$ is $\|\tilde\theta_P\|_P^2$. An asymptotically efficient estimator is a regular estimator which has an asymptotic distribution with this (minimal) variance.

The interval censoring model is an example of a model with information loss, in which the distribution $P$ is induced by a transformation. In these models the functional to be estimated is implicitly defined. The lower bound theory for such implicitly defined functionals is treated in VAN DER VAART (1991) and BICKEL ET AL. (1993). This theory will be applied to case 2 of the interval censoring model. We start with the formulation of the model for case 2. The loss of information is expressed by the fact that, instead of a sample $(X_1, \ldots, X_n)$, we observe $(T_1, U_1, \Delta_1, \Gamma_1), \ldots$
, $(T_n, U_n, \Delta_n, \Gamma_n)$, with $\Delta_i = 1_{\{X_i \le T_i\}}$ and $\Gamma_i = 1_{\{T_i < X_i \le U_i\}}$. We suppose:

(M1) $X_i$ is a nonnegative absolutely continuous random variable with distribution function $F$. Let $S > 0$. $F$ is contained in the class $\mathcal{F}_S := \{F : \operatorname{support}(F) \subset [0,S],\ F \ll \lambda\}$, $\lambda$ being Lebesgue measure. $F$ is the distribution on which we want to obtain information; however, we do not observe $X_i$ directly.

(M2) Instead, we observe the pairs $(T_i, U_i)$, with distribution function $H$. $H$ is contained in $\mathcal{H}$, the collection of all two-dimensional distributions on $\{(t,u) : 0 \le t < u\}$, absolutely continuous with respect to two-dimensional Lebesgue measure and such that each $H$ is independent of each $F$. Let $h$ denote the density of $(T_i, U_i)$, with marginal densities and distribution functions $h_1, H_1$ and $h_2, H_2$ for $T_i$ and $U_i$, respectively.

(M3) If both $H_1$ and $H_2$ put zero mass on some set $A$, then $F$ has zero mass on $A$ as well, so $F \ll H_1 + H_2$. This means that $F$ does not have mass on sets in which no observations can occur.

Condition (M3) is needed to ensure consistency. Moreover, without this assumption the functionals we are interested in are not well defined. So discrete $F$ should be excluded from $\mathcal{F}_S$.

Note that what we do observe can be seen as a measurable transformation $S$ of what we would observe if there were no censoring:
$$S: (x,t,u) \mapsto \bigl(t, u, 1_{\{x\le t\}}, 1_{\{t<x\le u\}}\bigr),$$
with domain $\{(x,t,u) : x \ge 0,\ 0 \le t < u\}$. This domain will be called the hidden space, and the image space will be called the observation space. In our model $P$ is induced by $F$ and $H$, and is from now on written as $Q_{F,H}$, having density
$$q_{F,H}(t,u,\delta,\gamma) = h(t,u)\,F(t)^{\delta}\bigl(F(u)-F(t)\bigr)^{\gamma}\bigl(1-F(u)\bigr)^{1-\delta-\gamma}$$
with respect to $\lambda_2 \otimes \nu_2$, where $\nu_2$ denotes the counting measure on the set $\{(0,1), (1,0), (0,0)\}$. We are interested in estimation of some functional $K(F)$ of $F$. However, $K(F)$ is only implicitly defined as $\Theta(Q_{F,H})$, with $H$ acting as a nuisance parameter.
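The density $q_{F,H}$ splits $h(t,u)$ over the three possible values of $(\delta, \gamma)$, so summing over them recovers $h(t,u)$. A small sketch, using as a hypothetical example $F(t) = t$ on $[0,1]$ and $h \equiv 2$ on the triangle $0 \le t < u \le 1$; the function name is ours.

```python
def q_density(t, u, delta, gamma, F, h):
    """q_{F,H}(t, u, delta, gamma) = h(t,u) F(t)^delta (F(u)-F(t))^gamma (1-F(u))^(1-delta-gamma)."""
    if (delta, gamma) not in {(1, 0), (0, 1), (0, 0)}:
        raise ValueError("(delta, gamma) must be (1, 0), (0, 1) or (0, 0)")
    return (h(t, u) * F(t) ** delta * (F(u) - F(t)) ** gamma
            * (1.0 - F(u)) ** (1 - delta - gamma))

F = lambda s: s                                        # uniform distribution on [0, 1]
h = lambda t, u: 2.0 if 0.0 <= t < u <= 1.0 else 0.0   # uniform density on the triangle
cells = [q_density(0.2, 0.6, d, g, F, h) for d, g in [(1, 0), (0, 1), (0, 0)]]
```

The three entries of `cells` are $2 \cdot 0.2$, $2 \cdot 0.4$ and $2 \cdot 0.4$, and they add up to $h(0.2, 0.6) = 2$.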
In particular, we will be concerned with the problem whether the NPMLE $\hat\Theta_n$ of $\Theta(Q_{F,H})$ satisfies
$$\sqrt{n}\bigl(\hat\Theta_n - \Theta(Q_{F,H})\bigr) \xrightarrow{\;\mathcal{D}\;} N\bigl(0, \|\tilde\theta_{Q_{F,H}}\|^2\bigr).$$
All Hellinger differentiable submodels at $Q_{F,H}$ that can be formed, together with the corresponding score functions, are induced by the Hellinger differentiable paths of densities on the hidden space, according to the following theorem.

Theorem 4.1. Let $\mathcal{P} \ll \mu$ be a class of probability measures on the hidden space $(\mathcal{Y}, \mathcal{B})$. $P \in \mathcal{P}$ is induced by the random vector $Y$. Suppose that the path $\{P_t\}$ to $P$ satisfies
$$\int \Bigl[ t^{-1}\bigl(\sqrt{p_t} - \sqrt{p}\bigr) - \tfrac12 a\sqrt{p} \Bigr]^2 d\mu \to 0, \quad \text{as } t \downarrow 0,$$
for some $a \in L_2(P)$. Let $S: (\mathcal{Y}, \mathcal{B}) \to (\mathcal{Z}, \mathcal{C})$ be a measurable mapping. Suppose that the induced measures $Q_t = P_t S^{-1}$ and $Q = P S^{-1}$ on $(\mathcal{Z}, \mathcal{C})$ are absolutely continuous with respect to $\mu S^{-1}$, with densities $q_t$ and $q$. Then the path $\{Q_t\}$ is also Hellinger differentiable, satisfying
$$\int \Bigl[ t^{-1}\bigl(\sqrt{q_t} - \sqrt{q}\bigr) - \tfrac12 \bar a\sqrt{q} \Bigr]^2 d\mu S^{-1} \to 0, \quad \text{as } t \downarrow 0,$$
with $\bar a(z) = E_P\bigl(a(Y) \mid S = z\bigr)$.

Proof: See BICKEL ET AL. (1993).

Note that $\bar a \in L_2^0(Q)$. The relation between the scores $a$ in the hidden tangent space $T(P)$ and the induced scores $\bar a$ is expressed by the mapping
$$A_P: a(\cdot) \mapsto E_P\bigl(a(Y) \mid S = \cdot\,\bigr).$$
This mapping is called the score operator. It is continuous and linear. Its range is the induced tangent space, which is contained in $L_2^0(Q)$. Now Theorem 4.1 yields the tangent space $T(Q_{F,H})$ of the induced Hellinger differentiable paths $\{Q_t\}$ at $Q_{F,H}$, with score operator $A_{F,H}: L_2^0(F) \oplus L_2^0(H) \to T(Q_{F,H})$ given by
$$[A_{F,H}(a + e)](t,u,\delta,\gamma) = E_{F,H}\bigl\{a(X) + e(T,U) \mid (T,U,\Delta,\Gamma) = (t,u,\delta,\gamma)\bigr\}.$$
Having specified the Hellinger differentiable paths in the observation space, we can also determine differentiability of the functional $\Theta(Q_{F,H})$. Note that $\Theta(Q_{F,H})$ is defined unambiguously by condition (M3).
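The score operator is a conditional expectation, so on a finite hidden space it can be computed by grouping hidden points according to their image under $S$. The sketch below is a discrete toy version of our own (all names and numbers are hypothetical). It computes $A_{F,H}(a + e)$ and, for the $F$-part, the reverse conditional expectation $b \mapsto E\{b(S(X,T,U)) \mid X = x\}$, which is the adjoint of $a \mapsto E\{a(X) \mid S\}$; the two inner products `lhs` and `rhs` illustrate the adjoint identity.

```python
from collections import defaultdict

def observe(x, t, u):
    """The map S(x, t, u) = (t, u, 1{x <= t}, 1{t < x <= u})."""
    return (t, u, int(x <= t), int(t < x <= u))

def score_operator(a, e, xs, px, tus, ptu):
    """Return ({z: E[a(X) + e(T,U) | S = z]}, {z: Q(z)}) for independent discrete X and (T,U).
    All probabilities in px and ptu are assumed to be positive."""
    num, q = defaultdict(float), defaultdict(float)
    for x, p1 in zip(xs, px):
        for (t, u), p2 in zip(tus, ptu):
            z = observe(x, t, u)
            num[z] += p1 * p2 * (a(x) + e(t, u))
            q[z] += p1 * p2
    return {z: num[z] / q[z] for z in q}, dict(q)

def adjoint_F_part(b, x, tus, ptu):
    """E[b(T, U, Delta, Gamma) | X = x], integrating out (T, U): a discrete analogue of L_1^*."""
    return sum(p2 * b(observe(x, t, u)) for (t, u), p2 in zip(tus, ptu))

# Toy example (arbitrary numbers): X on three points, (T, U) on three pairs
xs, px = [0.5, 1.5, 2.5], [0.2, 0.5, 0.3]
tus, ptu = [(1.0, 2.0), (2.0, 3.0), (0.2, 1.0)], [0.3, 0.3, 0.4]
mean_x = sum(x * p for x, p in zip(xs, px))
a = lambda x: x - mean_x              # a score in L_2^0(F)
e = lambda t, u: 0.0                  # no perturbation of H
b = lambda z: z[0] + 2 * z[2] - z[3]  # an arbitrary function on the observation space
Aa, q = score_operator(a, e, xs, px, tus, ptu)
lhs = sum(q[z] * Aa[z] * b(z) for z in q)                                # <A a, b>_Q
rhs = sum(p * a(x) * adjoint_F_part(b, x, tus, ptu) for x, p in zip(xs, px))  # <a, A* b>_F
```

Both inner products equal $E\{a(X)\,b(S(X,T,U))\}$, and the induced score keeps mean zero under $Q$, as Theorem 4.1 requires.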
In our censoring model, differentiability of $\Theta(Q_{F,H})$ along the induced Hellinger differentiable paths in the observation space can be proved by looking at the structure of the adjoint $A_{F,H}^*$ of the map $A_{F,H}$, according to Theorem 4.2 below. This theorem was first proved in VAN DER VAART (1991) in a more general setting, allowing for Banach space valued functions as estimand; in that setting the proof is slightly more elaborate. Recall that the adjoint of a continuous linear mapping $A: D \to E$, with $D$ and $E$ Hilbert spaces, is the unique continuous linear mapping $A^*: E \to D$ satisfying
$$\langle Ag, h\rangle_E = \langle g, A^*h\rangle_D \quad\text{for all } g\in D,\ h\in E.$$
The score operator from Theorem 4.1 plays the role of $A$. Its adjoint can be written as a conditional expectation as well: if $Z \sim PS^{-1}$, then
$$[A_P^*b](y) = E_P\big(b(Z)\mid Y = y\big) \quad\text{a.e. }[P].$$

Theorem 4.2 Let $\mathcal Q = \mathcal P S^{-1}$ be a class of probability measures on the image space of the measurable transformation S. Suppose the functional $\Theta: \mathcal Q\to\mathbb R$ can be written as $\Theta(Q_P) = \kappa(P)$, with $\kappa$ pathwise differentiable at $P$ in the hidden space, having canonical gradient $\tilde\kappa_P$. Then $\Theta$ is differentiable at $Q_P \in \mathcal Q$ along the collection of induced paths in the observation space obtained via Theorem 4.1 if and only if
$$\tilde\kappa_P \in \mathcal R(A_P^*). \quad (14)$$
If (14) holds, then the canonical gradients $\tilde\theta_{Q_P}$ of $\Theta$ and $\tilde\kappa_P$ of $\kappa$ are related by
$$\tilde\kappa_P = A_P^*\tilde\theta_{Q_P}.$$
Proof: See VAN DER VAART (1991) or GESKUS AND GROENEBOOM (1995c). □

Now $\kappa(F)$ is only implicitly defined as $\Theta(Q_{F,H})$, with $H$ acting as a nuisance parameter. Note that $\Theta(Q_{F,H})$ is defined unambiguously by condition (M3). The key condition that is needed is
$$\tilde\kappa_F \in \mathcal R(L_1^*), \quad (15)$$
and if this holds, then the canonical gradient is the unique element $\tilde\theta_{Q_{F,H}}$ of $\overline{\mathcal R(L_1)}$ satisfying $L_1^*\tilde\theta_{Q_{F,H}} = \tilde\kappa_F$. The operators $L_1$ and $L_2$ (the restrictions of $A_{F,H}$ to $L_2^0(F)$ and $L_2^0(H)$) have the following form:
$$[L_1a](u,v,\delta,\gamma) = \delta\,\frac{\int_{[0,u]}a\,dF}{F(u)} + \gamma\,\frac{\int_{(u,v]}a\,dF}{F(v)-F(u)} + (1-\delta-\gamma)\,\frac{\int_{(v,M]}a\,dF}{1-F(v)} \quad\text{a.e. }[Q_{F,H}],$$
$$[L_2e](u,v,\delta,\gamma) = e(u,v) \quad\text{a.e. }[Q_{F,H}]. \quad (16)$$
The adjoint of $L_1$ can be written as $[L_1^*b](x) = E_{F,H}\big(b(U,V,\Delta,\Gamma)\mid X = x\big)$, and we get
$$[L_1^*b](x) = \int_{t=x}^{M}\int_{u=t}^{M} b(t,u,1,0)\,h(t,u)\,du\,dt + \int_{t=0}^{x}\int_{u=x}^{M} b(t,u,0,1)\,h(t,u)\,du\,dt + \int_{t=0}^{x}\int_{u=t}^{x} b(t,u,0,0)\,h(t,u)\,du\,dt. \quad (17)$$
Many functionals that are pathwise differentiable in the model without censoring lose this property in the interval censoring model. Since, by (17), $L_1^*b$ is a continuous function of $x$, any functional $\kappa$ with a canonical gradient that is not a.e. equal to a continuous function cannot satisfy (15). So not all linear functionals remain pathwise differentiable. For example, $\kappa(F) = F(t_0)$, with canonical gradient $1_{[0,t_0]}(\cdot) - F(t_0)$, loses this property. This corresponds to the fact that $F(t_0)$ cannot be estimated at $\sqrt n$-rate. However, functionals of the form $\kappa(F) = \int c(x)\,dF(x)$, with $c$ sufficiently smooth, can be shown to remain differentiable under censoring. Hence for these functionals the above information lower bound theory holds.

In the interval censoring model, both case 1 and case 2, the function
$$\phi(x) := \int_x^M a(t)\,dF(t), \qquad a\in L_2^0(F),$$
appears explicitly in the score operator $L_1$. Therefore it plays an important role. It is called the integrated score function. From its definition, $\phi$ satisfies $\phi(0) = \phi(M) = 0$, and $\phi$ is continuous for $F\in\mathcal F_s$. We now investigate solvability of the equation $\tilde\kappa_F = L_1^*L_1a$ in the variable $a\in L_2^0(F)$.
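The adjoint relation between (16) and (17) can be checked numerically in a fully discrete version of the model. The sketch below assumes hypothetical discrete distributions for X and for (U, V) with U < V, and the case 2 convention $\Delta = 1\{X \le U\}$, $\Gamma = 1\{U < X \le V\}$, so that the double integrals in (17) become sums over pairs. It computes $L_1a$ cell by cell as the conditional expectation in (16), compares it with its expression through the integrated score $\phi$, computes $L_1^*b$ as $E(b(U,V,\Delta,\Gamma)\mid X = x)$, and verifies $\langle L_1a, b\rangle_{Q} = \langle a, L_1^*b\rangle_F$. The grids, random weights and the test function `b` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete model: X lives on xs with probabilities f,
# (U, V) lives on pairs u < v of a time grid with joint probabilities h.
xs = np.linspace(0.05, 0.95, 10)
f = rng.dirichlet(np.ones(xs.size))
times = np.linspace(0.0, 1.0, 8)
pairs = [(u, v) for u in times for v in times if u < v]
h = rng.dirichlet(np.ones(len(pairs)))

a = rng.normal(size=xs.size)
a -= np.dot(f, a)                        # a in L_2^0(F)

def b(u, v, delta, gamma):
    """An arbitrary (hypothetical) function on the observation space."""
    return np.sin(3 * u + 5 * v + 2 * delta - gamma)

def cells(u, v):
    """The three values of (Delta, Gamma) given (U, V) = (u, v), with the X-values in each."""
    return [(1, 0, xs <= u), (0, 1, (xs > u) & (xs <= v)), (0, 0, xs > v)]

def phi(t):
    """Integrated score: the integral of a dF over (t, M]."""
    return np.sum((a * f)[xs > t])

# <L_1 a, b>_Q, with L_1 a computed as the conditional expectation in (16).
lhs = 0.0
for (u, v), w in zip(pairs, h):
    for delta, gamma, mask in cells(u, v):
        mass = f[mask].sum()              # P(Delta = delta, Gamma = gamma | U = u, V = v)
        if mass == 0.0:
            continue                      # this cell has Q-probability zero
        L1a = np.sum((a * f)[mask]) / mass
        # The same value written through the integrated score phi.
        via_phi = (-phi(u) / mass if delta else
                   (phi(u) - phi(v)) / mass if gamma else
                   phi(v) / mass)
        assert np.isclose(L1a, via_phi)
        lhs += w * mass * L1a * b(u, v, delta, gamma)

# <a, L_1^* b>_F, with L_1^* b(x) = E(b(U, V, Delta, Gamma) | X = x), the discrete form of (17).
def L1_star_b(x):
    return sum(w * b(u, v, int(x <= u), int(u < x <= v)) for (u, v), w in zip(pairs, h))

rhs = sum(fi * ai * L1_star_b(xi) for xi, fi, ai in zip(xs, f, a))
print(np.isclose(lhs, rhs))  # True
```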
By the structure of the score operator $L_1$, the equation $\tilde\kappa_F = L_1^*L_1a$ can be reformulated as an equation in $\phi$:
$$\tilde\kappa_F(x) = \int_{t=0}^{x}\int_{u=t}^{x}\frac{\phi(u)}{1-F(u)}\,h(t,u)\,du\,dt + \int_{t=0}^{x}\int_{u=x}^{M}\frac{\phi(t)-\phi(u)}{F(u)-F(t)}\,h(t,u)\,du\,dt - \int_{t=x}^{M}\int_{u=t}^{M}\frac{\phi(t)}{F(t)}\,h(t,u)\,du\,dt \quad\text{a.e. }[F]. \quad (18)$$
The support of $F$ may consist of several disjoint intervals. However, (18) is not defined on intervals where $F$ puts no mass, and these intervals do not play any role. So without loss of generality we may assume the support of $F$ to consist of one interval $[0,M]$. Unlike case 1, differentiating both sides of equation (18) does not yield an explicit formula for $\phi$. Instead, we get the following integral equation:
$$\phi(x) + d_F(x)\left\{\int_{t=0}^{x}\frac{\phi(x)-\phi(t)}{F(x)-F(t)}\,h(t,x)\,dt - \int_{t=x}^{M}\frac{\phi(t)-\phi(x)}{F(t)-F(x)}\,h(x,t)\,dt\right\} = k(x)\,d_F(x), \quad (19)$$
with $d_F(x)$ being the function
$$d_F(x) = \frac{F(x)[1-F(x)]}{h_1(x)[1-F(x)] + h_2(x)F(x)},$$
where $h_1$ and $h_2$ are the marginal densities of $U$ and $V$, and writing $k(x)$ instead of $\tilde\kappa_F'(x)$. Although $k$ may depend on the underlying distribution, we do not explicitly express this dependence. Apart from the model conditions (M1) to (M3), some extra conditions have to be introduced:

(S1) $h_1$ and $h_2$ are continuous, with $h_1(x) + h_2(x) > 0$ for all $x\in[0,M]$.
(S2) $h(t,u)$ is continuous.
(S3) $P\{V - U < \varepsilon_0\} = 0$ for some $\varepsilon_0$ with $0 < \varepsilon_0 < \tfrac12 M$, so $h$ does not have mass close to the diagonal.
(S4) $F$ is either a continuous distribution function with support $[0,M]$, or a piecewise constant distribution function with a finite number of jumps, all in $[0,M]$; $F$ satisfies $F(u) - F(t) \ge c > 0$ if $u - t \ge \varepsilon_0$.
(S5) $k$ is continuous.

The integral equation for $\phi$ belongs to a well-known and extensively studied family of integral equations, the Fredholm integral equations of the second kind. Using this theory, it is proved that equation (19) has a (unique) solution. If we impose some extra smoothness conditions, we can derive some smoothness properties of the solution.
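A standard way to solve a Fredholm equation of the second kind numerically is the Nyström method: replace the integrals by a quadrature rule and solve the resulting linear system. The sketch below applies it to (19) in a hypothetical setting chosen so that (S1) to (S4) hold: $M = 1$, $F$ uniform, and $h$ uniform on the triangle $\{t + \varepsilon \le u \le 1\}$ with $\varepsilon < 1/2$. The function name `solve_eq19`, the grid size and the trapezoidal rule are illustrative assumptions, not the method used in the paper. Because of (S3) the kernel vanishes near the diagonal, so the denominators $F(x) - F(t)$ cause no singularity.

```python
import numpy as np

def solve_eq19(k, eps=0.2, n=401):
    """Nystrom (trapezoidal-rule) sketch for the integral equation (19).

    Hypothetical setting: M = 1, F uniform on [0, 1], and h uniform on the
    triangle {(t, u): 0 <= t, t + eps <= u <= 1}, so (S1)-(S4) hold with eps_0 = eps.
    """
    x = np.linspace(0.0, 1.0, n)
    w = np.full(n, x[1] - x[0])
    w[[0, -1]] /= 2.0                                # trapezoidal weights
    F = x                                            # F(x) = x
    c = 2.0 / (1.0 - eps) ** 2                       # value of h on its support
    h1 = c * np.clip(1.0 - eps - x, 0.0, None)       # density of U
    h2 = c * np.clip(x - eps, 0.0, None)             # density of V
    d = F * (1.0 - F) / (h1 * (1.0 - F) + h2 * F)    # d_F(x); zero at x = 0 and x = 1
    # With F uniform, both integrals in (19) combine into
    #   int K(x, t) (phi(x) - phi(t)) dt,  K(x, t) = c / |x - t| for |x - t| >= eps,
    # and K vanishes near the diagonal because of (S3).
    gap = np.abs(x[:, None] - x[None, :])
    K = np.where(gap >= eps, c / np.maximum(gap, eps), 0.0)
    # Discretised (19): phi_i + d_i * sum_j w_j K_ij (phi_i - phi_j) = k_i d_i.
    A = np.diag(1.0 + d * (K @ w)) - d[:, None] * K * w[None, :]
    rhs = k(x) * d
    return x, np.linalg.solve(A, rhs), A, rhs

# Example: kappa(F) = int x^2 dF(x), whose gradient has derivative k(x) = 2x.
x, phi, A, rhs = solve_eq19(lambda t: 2.0 * t)
print(phi[0], phi[-1])   # both vanish up to rounding, since d_F(0) = d_F(1) = 0
```

Each row of the discretised system is strictly diagonally dominant (the diagonal exceeds the sum of the absolute off-diagonal entries by exactly 1), so the linear system is always solvable, a discrete counterpart of the unique solvability of (19). The zero values at the endpoints reproduce $\phi(0) = \phi(M) = 0$.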
The smoothness properties of the solution of (19) also imply solvability of $\tilde\kappa_F = L_1^*L_1a$ for an absolutely continuous distribution function $F$. The extra smoothness conditions are:

(L1) The partial derivatives $\frac{\partial}{\partial x}h(t,x)$ and $\frac{\partial}{\partial x}h(x,t)$ exist, except at at most a countable number of points $x$, where left and right derivatives exist. The derivatives are bounded, uniformly over $t$ and $x$.
(L2) $k$ is differentiable, except at at most a countable number of points $x$, where left and right derivatives exist. The derivative is bounded, uniformly over $x$.

We can now specify the structure of the canonical gradient $\tilde\theta_F$:
$$\tilde\theta_F(u,v,\delta,\gamma) = -\delta\,\frac{\phi_F(u)}{F(u)} + \gamma\,\frac{\phi_F(u)-\phi_F(v)}{F(v)-F(u)} + (1-\delta-\gamma)\,\frac{\phi_F(v)}{1-F(v)}, \quad (20)$$
where $\phi_F$ satisfies the integral equation (19).

4.2 Asymptotic efficiency of the NPMLE

In this section, we denote the underlying distribution function by $F_0$. Under uniqueness, Proposition 1.3 in GROENEBOOM AND WELLNER (1992) gives an alternative criterion that is necessary and sufficient for $\hat F_n$ to be the NPMLE. Given a sample $(U_1,V_1,\Delta_1,\Gamma_1),\dots,(U_n,V_n,\Delta_n,\Gamma_n)$, let $\mathcal F$ be the class of distribution functions $F$ satisfying
$$F(U_i) > 0 \ \text{ if } X_i \le U_i,\qquad F(V_i) - F(U_i) > 0 \ \text{ if } U_i < X_i \le V_i,\qquad 1 - F(V_i) > 0 \ \text{ if } X_i > V_i,$$
and having mass concentrated on the set of observation points augmented with an extra point bigger than all observation points. It is easily seen that $\hat F_n$ belongs to this class.
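The three positivity conditions defining $\mathcal F$ are exactly the conditions under which every term of the case 2 log-likelihood $\sum_i \{\Delta_i\log F(U_i) + \Gamma_i\log(F(V_i)-F(U_i)) + (1-\Delta_i-\Gamma_i)\log(1-F(V_i))\}$ is finite. The sketch below makes this concrete; the helper names `log_likelihood` and `step_cdf` and the three observations are hypothetical, and the code does not check the support restriction on $F$.

```python
import math

def log_likelihood(F, data):
    """Case 2 log-likelihood: sum of
    delta*log F(u) + gamma*log(F(v) - F(u)) + (1 - delta - gamma)*log(1 - F(v)).

    Returns -inf as soon as one of the three positivity conditions fails."""
    total = 0.0
    for u, v, delta, gamma in data:
        if delta:                      # X <= U
            p = F(u)
        elif gamma:                    # U < X <= V
            p = F(v) - F(u)
        else:                          # X > V
            p = 1.0 - F(v)
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total

def step_cdf(points, masses):
    """Distribution function putting the given masses at the given points."""
    return lambda t: sum(m for s, m in zip(points, masses) if s <= t)

# Hypothetical observations (u, v, delta, gamma).
data = [(0.2, 0.5, 1, 0), (0.3, 0.7, 0, 1), (0.4, 0.6, 0, 0)]
F_in = step_cdf([0.2, 0.7, 1.0], [1/3, 1/3, 1/3])   # 1.0 is the extra point beyond all observations
F_out = step_cdf([0.3, 1.0], [0.5, 0.5])            # F(0.2) = 0 violates the first condition
print(log_likelihood(F_in, data), log_likelihood(F_out, data))
```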
For distribution functions $F \in \mathcal F$, the following process $t\mapsto W_F(t)$ is properly defined:
$$W_F(t) = \int_{u\in[0,t]}\left\{\frac{\delta}{F(u)} - \frac{\gamma}{F(v)-F(u)}\right\}dQ_n(u,v,\delta,\gamma) + \int_{v\in[0,t]}\left\{\frac{\gamma}{F(v)-F(u)} - \frac{1-\delta-\gamma}{1-F(v)}\right\}dQ_n(u,v,\delta,\gamma), \quad t \ge 0, \quad (21)$$
where $Q_n$ is the empirical probability measure of the points $(U_i,V_i,\Delta_i,\Gamma_i)$, $i = 1,\dots,n$. Let $J_i = [\tau_{i-1},\tau_i)$, $i = 1,\dots,k+1$, with $\tau_0 = 0$, $\tau_{k+1} = M$, where $\tau_i$, $i = 1,\dots,k$, are the points of jump of $\hat F_n$. So $\tau_1$ and $\tau_k$ are the first and last points of jump of $\hat F_n$, respectively. Restriction to a compact interval $[0,M]$ is only needed to obtain the efficiency result, Theorem 4.3; it is not needed for Proposition 4, Corollary 4.1 and the consistency result (24). Now Proposition 1.3 in GROENEBOOM AND WELLNER (1992) says:

Proposition 4 The function $\hat F_n$ maximizes the likelihood over all $F\in\mathcal F$ if and only if
$$\int_{[t,\tau_k]} dW_{\hat F_n}(t') \le 0, \quad \forall t \ge \tau_1, \quad (22)$$
and
$$\int_{[\tau_1,\tau_k]} \hat F_n(t)\,dW_{\hat F_n}(t) = 0. \quad (23)$$
Moreover, $\hat F_n$ is uniquely determined by (22) and (23).

Note that there may be observation points before $\tau_1$ and beyond $\tau_k$. However, there the NPMLE should be 0 and 1, respectively. (See the discussion before Proposition 1.3 in GROENEBOOM AND WELLNER (1992).) The following corollary, proved in GESKUS AND GROENEBOOM (1995b), is an immediate consequence.

Corollary 4.1 Any function $\sigma$ that is constant on the same intervals as $\hat F_n$ satisfies
$$\int_{J_i}\sigma(t)\,dW_{\hat F_n}(t) = 0 \quad\text{for } i = 2,\dots,k.$$

Remark. In fact, Corollary 4.1 follows from Fenchel duality theory (see e.g. ROCKAFELLAR (1970), Theorem 28.3).
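The process (21) is a pure jump process: each observation contributes one point mass at $U_i$ and one at $V_i$. The sketch below, using hypothetical simulated data and a hypothetical $F$, computes these jumps and checks the algebraic identity $\int\sigma\,dW_F = -\int\theta_\sigma\,dQ_n$, where $\theta_\sigma$ denotes the right-hand side of (20) with a function $\sigma$ in place of $\phi_F$. The identity is purely algebraic, so any $F$ with non-vanishing denominators will do (the uniform $F$ used here is not the NPMLE). Taking $\sigma$ constant on the intervals of constancy of $\hat F_n$, this identity is the link between Corollary 4.1 and the statement $\int\bar\theta_{\hat F_n}\,dQ_n = 0$ in step III of the proof of Theorem 4.3.

```python
import numpy as np

def W_jumps(F, data):
    """Point masses of W_F in (21): observation (u, v, delta, gamma) contributes
    {delta/F(u) - gamma/(F(v)-F(u))}/n at u and
    {gamma/(F(v)-F(u)) - (1-delta-gamma)/(1-F(v))}/n at v."""
    n = len(data)
    jumps = []
    for u, v, delta, gamma in data:
        if delta:
            at_u, at_v = 1.0 / F(u), 0.0
        elif gamma:
            r = 1.0 / (F(v) - F(u))
            at_u, at_v = -r, r
        else:
            at_u, at_v = 0.0, -1.0 / (1.0 - F(v))
        jumps += [(u, at_u / n), (v, at_v / n)]
    return jumps

def W(t, jumps):
    """W_F(t): total mass of the jumps located in [0, t]."""
    return sum(m for s, m in jumps if s <= t)

def theta(sigma, F, u, v, delta, gamma):
    """Right-hand side of (20) with a general function sigma in place of phi_F."""
    if delta:
        return -sigma(u) / F(u)
    if gamma:
        return (sigma(u) - sigma(v)) / (F(v) - F(u))
    return sigma(v) / (1.0 - F(v))

# Hypothetical data with V - U >= 0.2, and a hypothetical F with non-vanishing denominators.
rng = np.random.default_rng(1)
n = 200
X = rng.uniform(size=n)
U = rng.uniform(0.0, 0.6, size=n)
V = U + rng.uniform(0.2, 0.39, size=n)
data = [(u, v, int(x <= u), int(u < x <= v)) for x, u, v in zip(X, U, V)]
F = lambda t: min(max(t, 0.0), 1.0)
sigma = np.sin

jumps = W_jumps(F, data)
lhs = sum(sigma(s) * m for s, m in jumps)                  # int sigma dW_F
rhs = -np.mean([theta(sigma, F, *obs) for obs in data])    # -int theta_sigma dQ_n
print(np.isclose(lhs, rhs))  # True
```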
Moreover, we have uniform consistency of the NPMLE of $F_0$ (see GROENEBOOM AND WELLNER (1992), Part II, Section 4.3):
$$P\Big\{\lim_{n\to\infty}\|\hat F_n - F_0\|_\infty = 0\Big\} = 1. \quad (24)$$
Another result that will be needed can be deduced from VAN DE GEER (1993).

Lemma 4.1 For $i = 1, 2$,
$$\|\hat F_n - F_0\|_{H_i} = O_p\big(n^{-1/3}(\log n)^{1/6}\big) \quad\text{as } n\to\infty,$$
where $H_1$ and $H_2$ are the first and second marginal distribution functions of $H$, respectively, and $\|\cdot\|_{H_i}$ denotes the $L_2$-norm with respect to $H_i$.

In order to be able to use Lemma 4.1, one further restriction is placed on the kind of functionals that are allowed:

(D1) $\kappa(G) - \kappa(F_0) = \int\tilde\kappa_{F_0}(x)\,d(G - F_0)(x) + O\big(\|G - F_0\|_2^2\big)$ for all distribution functions $G$ with support contained in $[0,M]$, where $\|G - F_0\|_2$ is the $L_2$-distance between the distribution functions $G$ and $F_0$ with respect to Lebesgue measure on $\mathbb R$.

We also make the following assumption:

(D2) The underlying distribution function $F_0$ has a density bounded away from zero.

By condition (D2) and the strong consistency of the NPMLE, there exists a constant $c > 0$ such that
$$\hat F_n(u) - \hat F_n(t) \ge c \quad\text{if } u - t \ge \varepsilon_0, \quad (25)$$
if $n$ is sufficiently large. Combining all preceding results, we obtain the following theorem (Theorem 2.1 in GESKUS AND GROENEBOOM (1995b)), showing efficiency of the NPMLE:

Theorem 4.3 Let the following conditions on $F_0$, $H$ and $\kappa$ be satisfied: (M1) to (M3), (S1) to (S5), (L1) and (L2) of the preceding section, and (D1) and (D2). Then
$$\sqrt n\big(\kappa(\hat F_n) - \kappa(F_0)\big) \xrightarrow{\mathcal D} N\big(0, \|\tilde\theta_{F_0}\|^2_{Q_{F_0}}\big) \quad\text{as } n\to\infty. \quad (26)$$

Sketch of proof: The proof boils down to proving the relation
$$\sqrt n\big(\kappa(\hat F_n) - \kappa(F_0)\big) = \sqrt n\int\tilde\theta_{F_0}\,d(Q_n - Q_{F_0}) + o_p(1). \quad (27)$$
An application of the central limit theorem then shows that the NPMLE of $\kappa(F_0)$ has the desired asymptotically optimal behavior. The proof consists of the following steps.

I. By conditions (S1) and (D1), and Lemma 4.1, we have
$$\sqrt n\big(\kappa(\hat F_n) - \kappa(F_0)\big) = \sqrt n\int\tilde\kappa_{F_0}\,d(\hat F_n - F_0) + o_p(1).$$
Indeed, by (S1) the continuous function $h_1 + h_2$ is bounded below by a positive constant on $[0,M]$, so $\|\hat F_n - F_0\|_2^2$ is bounded by a constant times $\|\hat F_n - F_0\|_{H_1}^2 + \|\hat F_n - F_0\|_{H_2}^2$, and by Lemma 4.1 the remainder term in (D1) is $O_p\big(n^{-2/3}(\log n)^{1/3}\big) = o_p(n^{-1/2})$.

II. For $F \in \mathcal F$, one can define a function $\phi_F$ as a solution of the integral equation (19). This solution can be used to extend definition (20) to $\tilde\theta_F$ for $F \in \mathcal F$, where $\phi_F(u)/F(u)$ and $\phi_F(v)/(1-F(v))$ are defined to be zero if $F(u) = 0$ or if $F(v) = 1$, respectively. Note that $\tilde\theta_F$ no longer has an interpretation as a canonical gradient. In Lemma 2.2 of GESKUS AND GROENEBOOM (1995b) the following is shown for $\tilde\theta_{\hat F_n}$:
$$\int\tilde\kappa_{F_0}\,d(\hat F_n - F_0) = -\int\tilde\theta_{\hat F_n}\,dQ_{F_0}.$$

III. Corollary 4.1 implies
$$\int\bar\theta_{\hat F_n}\,dQ_n = 0,$$
where $\bar\theta_{\hat F_n}$ denotes the function defined in (20), but with the function $\phi_{\hat F_n}$ replaced by a function $\bar\phi_{\hat F_n}$ that is constant on the intervals of constancy of the NPMLE (and equals $\phi_{\hat F_n}$ at one point of each such interval). We then get
$$-\sqrt n\int\tilde\theta_{\hat F_n}\,dQ_{F_0} = \sqrt n\int\bar\theta_{\hat F_n}\,d(Q_n - Q_{F_0}) + \sqrt n\int\big(\bar\theta_{\hat F_n} - \tilde\theta_{\hat F_n}\big)\,dQ_{F_0}.$$
The second term can be shown to be $o_p(1)$.

IV. The first term is further split into
$$\sqrt n\int\bar\theta_{\hat F_n}\,d(Q_n - Q_{F_0}) = \sqrt n\int\tilde\theta_{F_0}\,d(Q_n - Q_{F_0}) + \sqrt n\int\big(\bar\theta_{\hat F_n} - \tilde\theta_{F_0}\big)\,d(Q_n - Q_{F_0}).$$
The last term can be shown to be $o_p(1)$, using a Donsker property of the class of functions under consideration.

References

AYER, M., BRUNK, H.D., EWING, G.M., REID, W.T. AND SILVERMAN, E. (1955). An empirical distribution function for sampling with incomplete information. Ann. Math. Statist., vol. 26, 641-647.
BARKER, D. (1988). Nonparametric maximum likelihood estimation of the distribution function of interval censored observations. Master's thesis, University of Amsterdam.
BARLOW, R.E., BARTHOLOMEW, D.J., BREMNER, J.M. AND BRUNK, H.D. (1972). Statistical Inference under Order Restrictions. Wiley, New York.
BEGUN, J.M., HALL, W.J., HUANG, W.M. AND WELLNER, J.A. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Ann. Statist., vol. 11, 432-452.
BICKEL, P.J., KLAASSEN, C.A.J., RITOV, Y. AND WELLNER, J.A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, Baltimore.
BIRMAN, M.S. AND SOLOMJAK, M.Z. (1967). Piecewise-polynomial approximations of functions in the classes $W_p^\alpha$. Math. Sbornik, vol. 73, 295-317.
DINSE, G.E. AND LAGAKOS, S.W. (1982). Nonparametric estimation of lifetime and disease onset distributions from incomplete observations. Biometrics, vol. 38, 921-932.
GEER, S. VAN DE (1993). Rates of convergence for the maximum likelihood estimator in mixture models. Technical Report TW 93-09, University of Leiden.
GESKUS, R.B. (1992). Efficient estimation of the mean for interval censoring case II. Technical Report 92-83, Delft University of Technology.
GESKUS, R.B. AND GROENEBOOM, P. (1995a). Asymptotically optimal estimation of smooth functionals for interval censoring, part 1. To appear in Statistica Neerlandica (jubilee issue).
GESKUS, R.B. AND GROENEBOOM, P. (1995b). Asymptotically optimal estimation of smooth functionals for interval censoring, part 2. Submitted to Statistica Neerlandica.
GESKUS, R.B. AND GROENEBOOM, P. (1995c). Asymptotically optimal estimation of smooth functionals for interval censoring, case 2; observation times arbitrarily close. Technical Report, Delft University of Technology, to appear.
GILL, R.D. AND LEVIT, B.Y. (1992). Applications of the van Trees inequality: a Bayesian Cramér-Rao bound. Preprint Nr. 773, Department of Mathematics, University of Utrecht.
GROENEBOOM, P. (1987). Asymptotics for interval censored observations. Technical Report 87-18, Department of Mathematics, University of Amsterdam.
GROENEBOOM, P. (1989). Brownian motion with a parabolic drift and Airy functions. Probability Theory and Related Fields, vol. 81, 79-109.
GROENEBOOM, P. (1991). Discussion on: Age-specific incidence and prevalence: a statistical perspective, by Niels Keiding. J. R. Statist. Soc. A, vol. 154, 400-401.
GROENEBOOM, P. AND WELLNER, J.A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation. Birkhäuser Verlag.
HANSEN, B.E. (1991). Nonparametric estimation of functionals for interval censored observations. Master's thesis, Delft University of Technology and Copenhagen University.
HUANG, J. AND WELLNER, J.A. (1995a). Asymptotic normality of the NPMLE of linear functionals for interval censored data, case 1. To appear in Statistica Neerlandica.
HUANG, J. AND WELLNER, J.A. (1995b). Efficient estimation for the proportional hazards model with "case 2" interval censoring. Submitted.
JONGBLOED, G. (1995). Three statistical inverse problems. Ph.D. thesis, Delft University of Technology.
JONGBLOED, G. (1995). The iterative convex minorant algorithm for nonparametric estimation. Technical Report, Delft University of Technology, to appear.
KEIDING, N. (1991). Age-specific incidence and prevalence: a statistical perspective (with discussion). J. R. Statist. Soc. A, vol. 154, 371-412.
KIM, J. AND POLLARD, D. (1990). Cube root asymptotics. Ann. Statist., vol. 18, 191-219.
KRESS, R. (1989). Linear Integral Equations. Applied Mathematical Sciences vol. 82, Springer Verlag, New York.
PETO, R. (1973). Experimental survival curves for interval-censored data. Appl. Statist., vol. 22, 86-91.
ROBERTSON, T., WRIGHT, F.T. AND DYKSTRA, R.L. (1988). Order Restricted Statistical Inference. Wiley, New York.
ROCKAFELLAR, R.T. (1970). Convex Analysis. Princeton University Press.
SHEEHY, A. AND WELLNER, J.A. (1992). Uniform Donsker classes of functions. Ann. Probab., vol. 20, 1983-2030.
TURNBULL, B.W. (1974). Nonparametric estimation of a survivorship function with doubly censored data. J. Amer. Statist. Assoc., vol. 69, 169-173.
TURNBULL, B.W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. J. R. Statist. Soc. B, vol. 38, 290-295.
TURNBULL, B.W. AND MITCHELL, T.J. (1984). Nonparametric estimation of the distribution of time to onset for specific diseases in survival/sacrifice experiments. Biometrics, vol. 40, 41-50.
VAN TREES, H.L. (1968). Detection, Estimation and Modulation Theory, Part 1. Wiley, New York.
VAART, A.W. VAN DER (1988). Statistical estimation in large parameter spaces. CWI Tract, vol. 44, Centrum voor Wiskunde en Informatica, Amsterdam.
VAART, A.W. VAN DER (1991). On differentiable functionals. Ann. Statist., vol. 19, 178-204.