… random vector with independent components $X_i \in \mathcal{X}$, such that …

3.1. Fisher Information

The FIM is important for many reasons. First, it characterizes the asymptotic variance of the MLE, as we show in Section 3.2; second, via Cramér-Rao [16], it can be used to provide a fundamental lower bound on some performance metrics, such as the mean square error and variance. Stoica et al. [10] analyzed the FIM for censored data in the context of linear models across arbitrary distributions. In the following proposition, we extend the characterization of the FIM to the GLMs discussed in Section 2.

Proposition 1. The FIM of estimating the true parameter vector $\theta_0$ from the censored data $\{B_i\}_{i=1}^n$ is given by

$\mathbf{J}_n = \sum_{i=1}^{n} V_i^T \, \mathrm{Cov}\left(\mathbb{E}[T_{X_i} \mid B_i]\right) V_i. \quad (6)$

Proof. The proof is provided in Appendix B.

It is interesting to compare the FIM in (6) to the FIM of estimating $\theta_0$ from the uncensored data $\{X_i\}_{i=1}^n$, which, following computations similar to those in Appendix B, is given by

$\mathbf{I}_n = \sum_{i=1}^{n} V_i^T \, \mathrm{Cov}\left(T_{X_i}\right) V_i. \quad (7)$

One can establish an inequality between the two FIMs. Specifically, $\mathbf{I}_n \succeq \mathbf{J}_n$, which follows from the data-processing inequality of the Fisher information [17] using the Markov chain $\theta_0 \to X_i \to B_i$ for $i \in [1:n]$. This suggests a loss of information, as expected, when using censored data. The magnitude of this loss will depend on the considered distribution and the choice of the threshold $\tau_i$ for $i \in [1:n]$. Note that choosing a set of optimal $\tau_i$'s will help to maximize the FIM; however, this choice often relies on unknown parameters that need to be estimated. In Section 4, we will provide a choice of the optimal thresholds for the Gaussian distribution.

3.2. Consistency and Asymptotic Normality of the MLE

The next theorem, whose proof is provided in Appendix C, states the regularity conditions under which the MLE in (4) is consistent and asymptotically normal.

Theorem 1. Provided that the following conditions hold: …

… select one of these possible choices.

2. $\sqrt{n}\,(\hat{\theta}_n - \theta_0) \to \mathcal{N}(\mathbf{0}_k, \mathbf{J}^{-1})$.

The first two conditions in Theorem 1 ensure a valid Taylor series expansion of the log-likelihood function and enable one to apply the weak law of large numbers, while the third condition ensures the existence of the asymptotic covariance $\mathbf{J}^{-1}$. From Theorem 1, it is clear that $\mathbf{J}^{-1}$ is the key to understanding the performance of the MLE.

4. EXAMPLES

In this section, we consider two different densities $p_{X_i}$ that belong to the exponential family. In particular, in Section 4.1 we focus on the Gaussian distribution, whereas in Section 4.2 we consider the Poisson distribution.

4.1. Example 1: Gaussian Distribution

We consider the following model:

$X_i = w_i \alpha + C_i, \quad i \in [1:n], \quad (8)$

where: (i) $\{w_i\}_{i=1}^n$ is a set of known constants; (ii) $\alpha$ is an unknown deterministic scalar (e.g., the unknown signal in a wireless sensor network [1], the amplitude of the radar cross section [7]); and (iii) $C_i \sim \mathcal{N}(0, \sigma^2)$. The model in (8) is widely used in estimation from censored data [1, 8, 15]. The probability density function of $X_i$ in (8) is given by

$p_{X_i}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{x^2}{2\sigma^2} + \frac{x w_i \alpha}{\sigma^2} - \frac{w_i^2 \alpha^2}{2\sigma^2} \right). \quad (9)$

We now consider three different cases depending on which quantities we want to estimate.

• Case 1: Unknown mean and known variance. With reference to (1), we have that $T_x = x$ and

$h(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{x^2}{2\sigma^2} \right), \quad \phi(\eta_{i,\theta}) = \frac{w_i^2 \alpha^2}{2\sigma^2} = \frac{\sigma^2 \eta_{i,\theta}^2}{2}, \quad \eta_{i,\theta} = v_i \theta \ \text{ where } \ v_i = \frac{w_i}{\sigma^2} \ \text{ and } \ \theta = \alpha.$

For this case, the FIM in (6) is given by (see Appendix D.1)

$\mathbf{J}_n = \sum_{i=1}^{n} \frac{w_i^2 \, p_{X_i}^2(\tau_i)}{F_{X_i}(\tau_i)\left(1 - F_{X_i}(\tau_i)\right)}. \quad (11)$
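For concreteness, the following short Python sketch evaluates (11) and compares it with the uncensored FIM in (7) for Case 1. It is a minimal illustration, not code from the paper: it assumes $B_i = \mathrm{sign}(X_i - \tau_i)$, and the function and variable names are illustrative.

import numpy as np
from scipy.stats import norm

def censored_fim_case1(w, tau, alpha, sigma):
    # J_n in (11): each 1-bit measurement contributes
    # w_i^2 * p_{X_i}(tau_i)^2 / (F_{X_i}(tau_i) * (1 - F_{X_i}(tau_i))).
    w, tau = np.asarray(w, float), np.asarray(tau, float)
    p = norm.pdf(tau, loc=w * alpha, scale=sigma)   # p_{X_i}(tau_i)
    F = norm.cdf(tau, loc=w * alpha, scale=sigma)   # F_{X_i}(tau_i)
    return np.sum(w**2 * p**2 / (F * (1.0 - F)))

def uncensored_fim_case1(w, sigma):
    # I_n in (7) for this model reduces to sum_i w_i^2 / sigma^2.
    return np.sum(np.asarray(w, float)**2) / sigma**2

w = np.ones(1000)
alpha, sigma = 1.3, 0.8
J_n = censored_fim_case1(w, tau=w * alpha, alpha=alpha, sigma=sigma)
I_n = uncensored_fim_case1(w, sigma)
print(J_n / I_n)   # thresholds at the means give ~ 2/pi of the uncensored information

Placing each threshold at the mean $w_i \alpha$ maximizes each summand in (11) for the Gaussian case; the printed ratio then approaches $2/\pi \approx 0.64$, a concrete instance of the information loss $\mathbf{I}_n \succeq \mathbf{J}_n$ noted after Proposition 1.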
Appendices

A. PROOF OF (5)

From the definition of the log-likelihood function,

$\ell_n(\theta; \{b_i\}_{i=1}^n) = \sum_{i=1}^{n} \log\left(P_{B_i}(b_i; \theta)\right) \quad (17)$

$= \sum_{i=1}^{n} \left[ \log\left(P_{B_i}(b_i; \theta)\right) + \phi(\eta_{i,\theta}) - \phi(\eta_{i,\theta}) \right]$

$\overset{(a)}{=} \sum_{i=1}^{n} \left[ \log \int_{\mathcal{X}(b_i)} p_{X_i}(x; \theta)\, dx + \phi(\eta_{i,\theta}) - \phi(\eta_{i,\theta}) \right]$

$\overset{(b)}{=} \sum_{i=1}^{n} \left[ \log \int_{\mathcal{X}(b_i)} h(x) \exp\left( \langle \eta_{i,\theta}, T_x \rangle - \phi(\eta_{i,\theta}) \right) dx + \phi(\eta_{i,\theta}) - \phi(\eta_{i,\theta}) \right]$

$= \sum_{i=1}^{n} \left[ \log \int_{\mathcal{X}(b_i)} h(x) \exp\left( \langle \eta_{i,\theta}, T_x \rangle \right) dx - \phi(\eta_{i,\theta}) \right]$

$\overset{(c)}{=} \sum_{i=1}^{n} \left[ \phi(\eta_{i,\theta}; b_i) - \phi(\eta_{i,\theta}) \right], \quad (18)$

where the labeled equalities follow from: (a) substituting (3); (b) using (1); and (c) letting $\phi(\eta_{i,\theta}; b_i)$ be the log-partition function of $p_{X_i|B_i}(x|b_i)$ (note that $\phi(\eta_{i,\theta}; b_i)$ is a normalization quantity that ensures that $p_{X_i|B_i}(x|b_i)$ is a valid density).

B. PROOF OF PROPOSITION 1

…

where the labeled equalities follow from: (a) the fact that the $B_i$'s are independent; hence, when $i \neq j$, the expected value of the product can be written as the product of the expected values, and the expected value of the score is zero (see Lemma 1 below); (b) applying Lemma 1 below; and (c) letting $V_i(s)$ denote the $s$-th column of $V_i$. This concludes the proof of Proposition 1.

Lemma 1. For every $i \in [1:n]$ and $r \in [1:k]$, the score function is given by

$\frac{\partial \log P_{B_i}}{\partial \theta_r} = \sum_{j=1}^{d} \left( \mathbb{E}[T_{X_i}(j) \mid B_i] - \mathbb{E}[T_{X_i}(j)] \right) V_i(j, r), \quad (20)$

where $T_{X_i}(j)$ is the $j$-th element of $T_{X_i}$ and $V_i(j, r)$ is the $(j, r)$-th element of $V_i$.

Proof. From (18) we have $\log\left(P_{B_i}(b_i; \theta)\right) = \phi(\eta_{i,\theta}; b_i) - \phi(\eta_{i,\theta})$, and by applying the chain rule we have that

$\frac{\partial \log P_{B_i}(b; \theta)}{\partial \theta_r} = \frac{\partial}{\partial \theta_r} \left[ \phi(\eta_{i,\theta}; b) - \phi(\eta_{i,\theta}) \right]$

$= \sum_{j=1}^{d} \left[ \frac{\partial \phi(\eta_{i,\theta}; b)}{\partial \eta_{i,\theta}(j)} - \frac{\partial \phi(\eta_{i,\theta})}{\partial \eta_{i,\theta}(j)} \right] \frac{\partial \eta_{i,\theta}(j)}{\partial \theta_r}$

$= \sum_{j=1}^{d} \left[ \frac{\partial \phi(\eta_{i,\theta}; b)}{\partial \eta_{i,\theta}(j)} - \frac{\partial \phi(\eta_{i,\theta})}{\partial \eta_{i,\theta}(j)} \right] V_i(j, r)$

$= \sum_{j=1}^{d} \left( \mathbb{E}[T_{X_i}(j) \mid B_i = b] - \mathbb{E}[T_{X_i}(j)] \right) V_i(j, r), \quad (21)$

where the last equality follows from [19]. Note also that, by the law of total expectation, we have $\mathbb{E}\left[ \frac{\partial \log P_{B_i}}{\partial \theta_r} \right] = 0$. This concludes the proof of Lemma 1.
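As a quick sanity check of Lemma 1, the following Python sketch (not code from the paper; it assumes the Gaussian Case 1 model of Section 4.1 with $B_i = \mathrm{sign}(X_i - \tau_i)$, and all names are illustrative) compares the right-hand side of (20), which here reduces to $(\mathbb{E}[X_i \mid B_i = b] - w_i \alpha)\, w_i / \sigma^2$, against a finite-difference derivative of $\log P_{B_i}(b; \alpha)$.

import numpy as np
from scipy.stats import norm

w, alpha, sigma, tau = 1.7, 0.4, 1.2, 0.9          # illustrative values

def z(a):
    return (tau - w * a) / sigma

def log_P(b, a):
    # log P_{B_i}(b; alpha) for B_i = sign(X_i - tau), X_i ~ N(w*alpha, sigma^2)
    return np.log(norm.sf(z(a)) if b == 1 else norm.cdf(z(a)))

def score_lemma1(b, a):
    # Right-hand side of (20) with V_i = w / sigma^2, using the
    # truncated-Gaussian conditional means E[X_i | B_i = b].
    zz = z(a)
    if b == 1:
        cond_mean = w * a + sigma * norm.pdf(zz) / norm.sf(zz)
    else:
        cond_mean = w * a - sigma * norm.pdf(zz) / norm.cdf(zz)
    return (cond_mean - w * a) * w / sigma**2

eps = 1e-6
for b in (1, -1):
    fd = (log_P(b, alpha + eps) - log_P(b, alpha - eps)) / (2 * eps)  # numerical score
    print(b, fd, score_lemma1(b, alpha))           # the two values should match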
C. PROOF OF THEOREM 1

In [20], the authors developed a theory for the MLE when the observations are independent and come from distinct, yet related, populations, i.e., with some parameters in common. The authors referred to such populations as associated. In particular, the authors derived regularity conditions under which the MLE of parameters in associated populations is shown to be consistent and asymptotically normal. In what follows, we tailor these conditions to our GLM with 1-bit measurements described in Section 2.

1) We require the existence of the following partial derivatives to ensure that the Taylor series expansion of the log-likelihood function in (17) exists [20, conditions I(i)]:

$\frac{\partial \log P_{B_i}}{\partial \theta_r}, \quad \frac{\partial^2 \log P_{B_i}}{\partial \theta_r \partial \theta_s}, \quad \text{and} \quad \frac{\partial^3 \log P_{B_i}}{\partial \theta_r \partial \theta_s \partial \theta_t} \quad (22)$

for all $(r, s, t) \in [1:k]^3$ and $i \in [1:n]$. We start with the first order partial derivative. From Lemma 1,

$\frac{\partial \log P_{B_i}}{\partial \theta_r} = \sum_{j=1}^{d} \left( \mathbb{E}[T_{X_i}(j) \mid B_i] - \mathbb{E}[T_{X_i}(j)] \right) V_i(j, r).$

Thus, condition 1 and condition 2 in Theorem 1 ensure the existence of $\partial \log P_{B_i} / \partial \theta_r$ for all $r \in [1:k]$. Similarly, for the second order partial derivative, by applying the chain rule, we arrive at an analogous expression in terms of the conditional and unconditional covariances of $T_{X_i}$ (cf. (47)). For the third order partial derivative, by applying the chain rule, we have that

$\frac{\partial^3 \log P_{B_i}}{\partial \theta_r \partial \theta_s \partial \theta_t} = \frac{\partial^3}{\partial \theta_r \partial \theta_s \partial \theta_t} \left[ \phi(\eta_{i,\theta}; b) - \phi(\eta_{i,\theta}) \right]$

$= \sum_{j=1}^{d} \sum_{\ell=1}^{d} \sum_{k=1}^{d} \left[ \frac{\partial^3 \phi(\eta_{i,\theta}; b)}{\partial \eta_{i,\theta}(k) \partial \eta_{i,\theta}(\ell) \partial \eta_{i,\theta}(j)} - \frac{\partial^3 \phi(\eta_{i,\theta})}{\partial \eta_{i,\theta}(k) \partial \eta_{i,\theta}(\ell) \partial \eta_{i,\theta}(j)} \right] V_i(k, t) V_i(\ell, s) V_i(j, r)$

$= \sum_{j=1}^{d} \sum_{\ell=1}^{d} \sum_{k=1}^{d} \left[ \kappa_{T_{X_i}|B_i=b}(j, \ell, k) - \kappa_{T_{X_i}}(j, \ell, k) \right] V_i(k, t) V_i(\ell, s) V_i(j, r), \quad (24)$

where the last equality follows from [19], and where

$\kappa_{T_{X_i}|B_i=b}(j, \ell, k) = \mathbb{E}\left[ \prod_{u \in \{j, \ell, k\}} \left( T_{X_i}(u) - \mathbb{E}[T_{X_i}(u) \mid B_i = b] \right) \,\Big|\, B_i = b \right], \quad (25a)$

$\kappa_{T_{X_i}}(j, \ell, k) = \mathbb{E}\left[ \prod_{u \in \{j, \ell, k\}} \left( T_{X_i}(u) - \mathbb{E}[T_{X_i}(u)] \right) \right]. \quad (25b)$

Thus, condition 1 and condition 2 in Theorem 1 ensure the existence of $\partial^3 \log P_{B_i} / \partial \theta_r \partial \theta_s \partial \theta_t$ for all $(r, s, t) \in [1:k]^3$.

2) Now we check [20, conditions I(ii)], which consist of two parts. First, we require the convergence of the first and second order partial derivatives, which allows one to interchange differentiation and summation. This is indeed satisfied, as from (2) we have

$\sum_{b \in \{1, -1\}} \frac{\partial P_{B_i}(b; \theta)}{\partial \theta_r} = \frac{\partial P_{B_i}(1; \theta)}{\partial \theta_r} + \frac{\partial P_{B_i}(-1; \theta)}{\partial \theta_r}$ …
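The practical content of these conditions is the asymptotic behavior stated in Theorem 1. The following Monte Carlo sketch (Python; a hedged illustration rather than code from the paper, assuming the Gaussian Case 1 model of Section 4.1 with $B_i = \mathrm{sign}(X_i - \tau_i)$ and illustrative names throughout) checks that the variance of the 1-bit MLE of $\alpha$ is close to $\mathbf{J}_n^{-1}$ from (11).

import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

rng = np.random.default_rng(0)
n, alpha0, sigma = 4000, 0.5, 1.0
w = rng.uniform(0.5, 1.5, n)                        # known weights w_i
tau = 0.4 * w                                        # known thresholds tau_i

def score(a, b):
    # d/d(alpha) of the 1-bit log-likelihood: sum over i of the Lemma 1 terms.
    z = (tau - w * a) / sigma
    lam = np.where(b == 1, norm.pdf(z) / norm.sf(z), -norm.pdf(z) / norm.cdf(z))
    return np.sum(w / sigma * lam)

# J_n from (11), evaluated at the true parameter.
z0 = (tau - w * alpha0) / sigma
J_n = np.sum(w**2 * norm.pdf(z0)**2 / (norm.cdf(z0) * norm.sf(z0))) / sigma**2

estimates = []
for _ in range(300):
    x = w * alpha0 + sigma * rng.standard_normal(n)
    b = np.where(x > tau, 1, -1)
    estimates.append(brentq(score, -5.0, 5.0, args=(b,)))  # MLE = root of the score
print(np.var(estimates), 1.0 / J_n)                  # the two values should be close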
…

$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \sum_{b \in D_{4i}} -\frac{\partial^2 \log P_{B_i}(b; \theta)}{\partial \theta_r \partial \theta_s} P_{B_i}(b; \theta) \overset{(a)}{=} \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} -\mathbb{E}\left[ \frac{\partial^2 \log P_{B_i}(b; \theta)}{\partial \theta_r \partial \theta_s} \right]$

$= \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{d} \sum_{\ell=1}^{d} \left( \mathrm{Cov}\left(T_{X_i}(j), T_{X_i}(\ell)\right) - \mathbb{E}\left[ \mathrm{Cov}\left(T_{X_i}(j), T_{X_i}(\ell) \mid B_i\right) \right] \right) V_i(\ell, s) V_i(j, r)$

$\overset{(b)}{=} \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{d} \sum_{\ell=1}^{d} \mathrm{Cov}\left( \mathbb{E}[T_{X_i}(j) \mid B_i], \mathbb{E}[T_{X_i}(\ell) \mid B_i] \right) V_i(j, r) V_i(\ell, s). \quad (47)$

From (46) and (47), it follows that the two limits in [20, eq. 13] are the same. Moreover, from (47) (or, equivalently, (46)), we have that

$\mathbf{J} = \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} V_i^T \, \mathrm{Cov}\left( \mathbb{E}[T_{X_i} \mid B_i] \right) V_i, \quad (48)$

which needs to be positive definite with finite determinant. This is ensured by condition 3 in Theorem 1.

5) This condition is [20, condition II(iii)] and, as the two above, it is needed to apply the weak law of large numbers for independent random variables [21, p. 174]. In order to check this condition, we first need to find the following set for all $i \in [1:n]$ …

… $\sum_{i=1}^{n} M_i = K, \quad (54)$

which is a finite positive constant.

6) This condition is [20, condition III] and it is needed to ensure asymptotic normality. In particular, we need

$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \sum_{b \in D_{7i}} \sum_{r=1}^{k} \left( \frac{\partial \log P_{B_i}(b; \theta)}{\partial \theta_r} \right)^2 P_{B_i}(b; \theta) = 0, \quad (56)$

where $D_{7i}$, for all $i \in [1:n]$, is defined as follows:

$D_{7i} = \left\{ b \in \{-1, 1\} : \left[ \sum_{r=1}^{k} \left( \frac{\partial \log P_{B_i}(b; \theta)}{\partial \theta_r} \right)^2 \right]^{\frac{1}{2}} > \epsilon \sqrt{n} \right\}.$

Note that, from Lemma 1,

$\left( \sum_{r=1}^{k} \left( \frac{\partial \log P_{B_i}(b; \theta)}{\partial \theta_r} \right)^2 \right)^{\frac{1}{2}} = \left( \sum_{r=1}^{k} \left( \sum_{j=1}^{d} \left( \mathbb{E}[T_{X_i}(j) \mid B_i = b] - \mathbb{E}[T_{X_i}(j)] \right) V_i(j, r) \right)^2 \right)^{\frac{1}{2}}. \quad (55)$

Thus, for all $i \in [1:n]$ and $r \in [1:k]$, there exists some positive constant $C$ such that

$\left| \sum_{j=1}^{d} \left( \mathbb{E}[T_{X_i}(j) \mid B_i = b] - \mathbb{E}[T_{X_i}(j)] \right) V_i(j, r) \right| \le \sqrt{C}, \quad (57)$

which leads to

$\left[ \sum_{r=1}^{k} \left( \frac{\partial \log P_{B_i}(b; \theta)}{\partial \theta_r} \right)^2 \right]^{\frac{1}{2}} \le \left[ \sum_{r=1}^{k} C \right]^{\frac{1}{2}} = \sqrt{Ck}. \quad (58)$

Thus, for a sufficiently large $n$, there will be no $b \in \{-1, 1\}$ such that

$\left[ \sum_{r=1}^{k} \left( \frac{\partial \log P_{B_i}(b)}{\partial \theta_r} \right)^2 \right]^{\frac{1}{2}} > \epsilon \sqrt{n}, \quad (59)$

for any $\epsilon > 0$. Hence, for large $n$, the set $D_{7i}$ will be empty for any $\epsilon > 0$, i.e., $D_{7i} = \emptyset$, so that (56) is trivially satisfied.

D. DERIVATION OF THE FIM FOR THE GAUSSIAN DISTRIBUTION

D.1. FIM in Case 1

To compute the FIM in Proposition 1, we need to derive the following variance,

$\mathrm{Var}\left( \mathbb{E}[T_{X_i} \mid B_i] \right) = \mathbb{E}\left[ \left( \mathbb{E}[X_i \mid B_i] \right)^2 \right] - \left( \mathbb{E}[X_i] \right)^2$

$= \left( \mathbb{E}[X_i \mid B_i = 1] \right)^2 \Pr(B_i = 1) + \left( \mathbb{E}[X_i \mid B_i = -1] \right)^2 \Pr(B_i = -1) - \left( \mathbb{E}[X_i] \right)^2$

$= \sigma^4 \, \frac{p_{X_i}^2(\tau_i)}{F_{X_i}(\tau_i)\left(1 - F_{X_i}(\tau_i)\right)}, \quad (61)$

where the last equality follows by evaluating the conditional means $\mathbb{E}[X_i \mid B_i = \pm 1]$ of the corresponding truncated Gaussian distributions.
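As a numerical check of (61), the following Python sketch (a hedged illustration assuming $B_i = \mathrm{sign}(X_i - \tau_i)$; the names are illustrative) compares the closed form against a Monte Carlo estimate of $\mathrm{Var}\left(\mathbb{E}[X_i \mid B_i]\right)$.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
w_i, alpha, sigma, tau_i = 1.4, 0.7, 1.1, 0.2        # illustrative values
mu = w_i * alpha

# Closed form (61): sigma^4 * p_{X_i}(tau_i)^2 / (F_{X_i}(tau_i) * (1 - F_{X_i}(tau_i)))
p = norm.pdf(tau_i, mu, sigma)
F = norm.cdf(tau_i, mu, sigma)
closed_form = sigma**4 * p**2 / (F * (1 - F))

# Monte Carlo: draw X_i, reduce it to the bit B_i, and compute the variance of
# the conditional mean E[X_i | B_i] over the two possible bit values.
x = rng.normal(mu, sigma, 2_000_000)
b = x > tau_i
cond_mean = np.where(b, x[b].mean(), x[~b].mean())   # E[X_i | B_i], one value per sample
print(closed_form, cond_mean.var())                  # the two numbers should agree closely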
D.2. FIM in Case 2

To compute the FIM in Proposition 1, we need to derive the following variance,

…

By substituting (65) inside (6) with $v_i = -1/2$, we obtain (12).

D.3. FIM in Case 3

To compute the FIM in Proposition 1, we need to derive $\mathrm{Cov}\left( \mathbb{E}[T_{X_i} \mid B_i] \right)$. In what follows, we let $T_{X_i}(j)$, $j \in [1:2]$, denote the $j$-th component of $T_{X_i}$. We start by noting that, from Case 1 in Appendix D.1, we have that

…

$\mathrm{Cov}\left( \mathbb{E}[T_{X_i}(1) \mid B_i], \mathbb{E}[T_{X_i}(2) \mid B_i] \right) = \mathrm{Cov}\left( \mathbb{E}[X_i \mid B_i], \mathbb{E}[X_i^2 \mid B_i] \right)$

…

It is now a simple exercise to show that if the $\tau_i$'s, with $i \in [1:n]$, are chosen i.i.d. from some absolutely continuous distribution, then (70) is satisfied almost surely. This concludes the proof of Proposition 2.

F. DERIVATION OF THE FIM FOR THE POISSON DISTRIBUTION

The probability mass function of $X_i$ in (15) is given by …