
GENERALIZED LINEAR MODELS WITH 1-BIT MEASUREMENTS:

ASYMPTOTICS OF THE MAXIMUM LIKELIHOOD ESTIMATOR

Jaimin Shah∗, Martina Cardone∗, Cynthia Rush†, Alex Dytso⋆

∗ University of Minnesota, Minneapolis, MN 55404, USA
† Columbia University, New York, NY 10025, USA
⋆ Qualcomm Flarion Technology, Inc., Bridgewater, NJ 08807

arXiv:2501.04937v1 [math.ST] 9 Jan 2025

ABSTRACT

This work establishes regularity conditions for consistency and asymptotic normality of the multiple parameter maximum likelihood estimator (MLE) from censored data, where the censoring mechanism is in the form of 1-bit measurements. The underlying distribution of the uncensored data is assumed to belong to the exponential family, with natural parameters expressed as a linear combination of the predictors, known as a generalized linear model (GLM). As part of the analysis, the Fisher information matrix is also derived for both censored and uncensored data, which helps to quantify the impact of censoring and assess the performance of the MLE. The choice of GLM allows one to consider a variety of practical examples where 1-bit estimation is of interest. In particular, it is shown how the derived results can be used to analyze two practically relevant scenarios: the Gaussian model with both unknown mean and variance, and the Poisson model with an unknown mean.

Index Terms— 1-bit measurements, censored data, exponential family, maximum likelihood, generalized linear model, Fisher information, radar, laser communications, Gaussian noise, Poisson noise.

1. INTRODUCTION

In many engineering and scientific fields, parameter estimation from censored data is a critical challenge due to the inherently limited information available from measurements or observations. Such scenarios frequently arise in many applications, such as: (i) sensor fusion, where power and bandwidth constraints force sensors to quantize observations to a single bit, effectively censoring the data [1]; (ii) radar and wireless systems, where the use of 1-bit analog-to-digital converters at the receiver end offers a power-efficient solution for future wireless systems that handle large signal bandwidths or numerous radio frequency chains [2]; and (iii) survival analysis, where the event of interest (e.g., system failure or patient mortality) may not be observed within the study period [3].

In several applications where censoring mechanisms are applied or advantageous, the underlying distributions often belong to the exponential family. For instance, Gaussian noise is widely used in numerous communication systems. Similarly, Poisson noise, commonly called shot noise, is frequently encountered in photon-limited imaging systems, optical communication, and various counting processes [4]. The exponential distribution is broadly used in survival analysis due to its suitability for modeling time-to-event data [5].

While the maximum likelihood estimator (MLE) has been widely employed for parameter estimation from censored data, previous studies have often been limited in scope, focusing on specific distributions tailored to particular applications and estimating a single unknown parameter. Furthermore, there has been a lack of theoretical analysis regarding the (necessary or sufficient) regularity conditions to ensure consistency and asymptotic normality. This work aims to address this gap by providing a comprehensive study of the asymptotic behavior of the MLE under a rather general model of censored data from an exponential family, where the censoring mechanism is in the form of 1-bit measurements.

In the signal processing domain, censored, or 1-bit, data is often studied under an additive noise model (i.e., the ordinary linear regression model). In this work, we take a more general and unifying approach by adopting a generalized linear model (GLM) [6]. The key idea behind the adopted GLM is that it assumes that the natural parameters of the exponential family distribution are described by a linear combination of the predictors, hence making it a versatile tool in these contexts.

1.1. Relevant literature

Parameter estimation with 1-bit data has been widely studied, and a comprehensive review is beyond this paper's scope. Instead, we focus on highlighting key relevant works.

Estimating the mean from 1-bit quantized signals has been extensively explored in various domains, including sensor fusion, radar, and communication systems. Previous studies, such as those in [1, 7], employed the MLE to estimate the unknown mean under the assumption of Gaussian noise. Notably, these works primarily focused on single parameter estimation where the variance is known. The GLM approach taken in this paper allows one to also study models where both
the mean and variance are unknown. A rather general and complete answer to the estimation of the mean from censored data under an additive noise model was provided in [8]. In particular, the authors of [8] derived asymptotically optimal estimators and the rate of convergence for a censored model with additive noise coming from a log-concave distribution. Additionally, the authors of [8] provided these results under various cooperation models between the 1-bit compressors.

The focus on more general parameter estimation that extends beyond mean estimation has received less attention. The authors of [9] provided a more in-depth treatment of the MLE for an additive noise model with multiple parameters, allowing correlation across noise samples. In [10], the authors analyzed the Fisher information matrix (FIM) for censored data in the context of linear models across arbitrary distributions, providing important theoretical foundations for parameter estimation in these settings. Estimation of parameters in auto-regressive processes with additive noise from censored data was considered in [11, 12]. Kalman and particle filter methods for estimating signals from censored data were explored in works like [13, 14]. Recent approaches, such as [13, 15], employed deep neural networks and reinforcement learning under Gaussian noise assumptions.

1.2. Our Contributions and Paper Outline

The paper contribution and outline are as follows:

1. Section 2 presents the censored GLM and the MLE.

2. Section 3 presents our main results. Proposition 1, in Section 3.1, characterizes the structure of the FIM under the censored GLM. Theorem 1, in Section 3.2, provides a set of conditions, which can be considered mild, that guarantee the consistency and asymptotic normality of the MLE.

3. Section 4 demonstrates the versatility of our approach by focusing on several practically relevant examples. We analyze the optimal asymptotic performance of the MLE for two common noise distributions: Gaussian and Poisson. Notably, we analyze the case of unknown mean and variance for Gaussian noise, which is practically important (e.g., 1-bit radar) but has received limited attention. For both the Gaussian and Poisson models, we establish simple conditions for consistency and asymptotic normality.

Notation. For any k ∈ N, we define [1 : k] = {1, 2, . . . , k}; logarithms are in base e; 0_k is the column vector of dimension k of all zeros; ‖x‖ is the L2 norm of x and ‖x‖_∞ is the L-infinity norm of x. For two nonnegative definite matrices A and B of equal size, we let A ⪰ B indicate that A − B is nonnegative definite. We use the notion of weak consistency.

2. PROBLEM FORMULATION

Let X = [X_1 X_2 . . . X_n]^T be an n-dimensional random vector with independent components X_i ∈ X, such that its density is p_X(x; θ) = ∏_{i=1}^n p_{X_i}(x_i; θ), where

    p_{X_i}(x; θ) = h(x) exp(⟨η_{i,θ}, T_x⟩ − φ(η_{i,θ})),    (1)

that is, each X_i has a density that belongs to the exponential family. With reference to (1), we have that: (i) h(x) : X → (0, +∞) is the base measure; (ii) T_x : X → R^d is the d-dimensional sufficient statistic vector; (iii) φ : X → R is the log-partition function; and (iv) η_{i,θ} is the d-dimensional natural parameter vector, where in our model η_{i,θ} = V_i θ with θ ∈ Θ being a k-dimensional vector that is common to η_{i,θ} for all i ∈ [1 : n] and needs to be estimated, and V_i is a known d × k matrix. We will refer to this setting as a GLM.

We assume that X is not observed directly, and instead the measurements are given by a 1-bit censoring mechanism:

    B_i = 1 if X_i ≤ τ_i,  and  B_i = −1 if X_i > τ_i,    (2)

where τ_i for i ∈ [1 : n] is a fixed threshold and an interior point of X. Therefore, for b ∈ {−1, 1}, we have that (see Footnote 1)

    P_{B_i}(b; θ) = Pr(B_i = b) = ∫_{X(b)} p_{X_i}(x; θ) dx,    (3)

where X(b) is a subset of X determined by b; specifically, X(1) = {x ∈ X : x ≤ τ_i} and X(−1) = {x ∈ X : x > τ_i}.

Footnote 1: This should be understood in the sense of a Lebesgue integral where dx is some appropriate dominating measure. In particular, for discrete distributions, it should be understood as a summation.

In many applications, as discussed in Section 1, one can only observe the censored data {b_i}_{i=1}^n, which are realizations of {B_i}_{i=1}^n. From these observed data samples we seek to estimate the k-dimensional vector θ using the MLE method. In particular, the MLE is defined as follows (see Footnote 2),

    θ̂_n = arg max_{θ∈Θ} ℓ_n(θ; {b_i}_{i=1}^n),    (4)

where ℓ_n(θ; {b_i}_{i=1}^n) = ∑_{i=1}^n log(P_{B_i}(b_i; θ)) is the log-likelihood function. By using (3), it is not difficult to show that (see Appendix A for the detailed computation),

    ℓ_n(θ; {b_i}_{i=1}^n) = ∑_{i=1}^n [φ(η_{i,θ}; b_i) − φ(η_{i,θ})],    (5)

where φ(η_{i,θ}; b_i) is the log-partition function of p_{X_i|B_i}(x|b_i).

Footnote 2: We note that the MLE may not be unique. In this case, we randomly select one of these possible choices.
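As a concrete illustration of (4)-(5), the following minimal sketch (Python with NumPy/SciPy; not part of the paper, and all variable names and parameter values are illustrative assumptions) simulates 1-bit measurements from the Gaussian model with known variance that is analyzed later in Section 4.1, Case 1, and recovers the unknown mean parameter by numerically maximizing the log-likelihood of the censored data. Here each P_{B_i}(b; θ) in (3) reduces to a Gaussian CDF evaluated at the threshold, so the log-likelihood is a smooth function of the single unknown α.

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    # Model of Section 4.1, Case 1: X_i = w_i * alpha + C_i, C_i ~ N(0, sigma^2),
    # observed only through B_i = 1{X_i <= tau_i} (coded as +1 / -1), cf. (2).
    n, alpha_true, sigma = 2000, 2.0, 1.0
    w = rng.uniform(0.5, 1.5, size=n)          # known predictors
    tau = np.full(n, 2.0)                      # known thresholds
    x = w * alpha_true + sigma * rng.standard_normal(n)
    b = np.where(x <= tau, 1, -1)              # 1-bit measurements

    def neg_log_lik(alpha):
        # Pr(B_i = 1) = F_{X_i}(tau_i) = Phi((tau_i - w_i*alpha)/sigma); see (3)
        p1 = norm.cdf((tau - w * alpha) / sigma)
        p = np.where(b == 1, p1, 1.0 - p1)
        return -np.sum(np.log(np.clip(p, 1e-300, None)))

    alpha_hat = minimize_scalar(neg_log_lik, bounds=(-10, 10), method="bounded").x
    print(f"true alpha = {alpha_true}, MLE from 1-bit data = {alpha_hat:.3f}")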
3. MAIN RESULT

Here, we present our main result. We start by characterizing the FIM, which plays a significant role in our analysis.

3.1. Fisher Information

The FIM is important for many reasons. First, it characterizes the asymptotic variance of the MLE, as we show in Section 3.2; second, via Cramér-Rao [16], it can be used to provide a fundamental lower bound on some performance metrics, such as the mean square error and variance. Stoica et al. [10] analyzed the FIM for censored data in the context of linear models across arbitrary distributions. In the following proposition, we extend the characterization of the FIM for GLMs, as discussed in Section 2.

Proposition 1. The FIM of estimating the true parameter vector θ_0 from the censored data {B_i}_{i=1}^n is given by

    J_n = ∑_{i=1}^n V_i^T Cov(E[T_{X_i} | B_i]) V_i.    (6)

Proof. The proof is provided in Appendix B.

It is interesting to compare the FIM in (6) to the FIM of estimating θ_0 from the uncensored data {X_i}_{i=1}^n, which following similar computations as in Appendix B is given by

    I_n = ∑_{i=1}^n V_i^T Cov(T_{X_i}) V_i.    (7)

One can establish an inequality between the two FIMs. Specifically, I_n ⪰ J_n, which follows from the data-processing inequality of the Fisher information [17] using the Markov chain θ_0 → X_i → B_i for i ∈ [1 : n]. This suggests a loss of information, as expected, when using censored data. The magnitude of this loss will depend on the considered distribution and choice of threshold τ_i for i ∈ [1 : n]. Note that choosing a set of optimal τ_i's will help to maximize the FIM; however, this choice often relies on unknown parameters that need to be estimated. In Section 4, we will provide a choice of the optimal thresholds for the Gaussian distribution.

3.2. Consistency and Asymptotic Normality of the MLE

The next theorem, whose proof is provided in Appendix C, states the regularity conditions under which the MLE in (4) is consistent and asymptotically normal.

Theorem 1. Provided that the following conditions hold:

1. lim_{n→∞} max_{i∈[1:n]} E[‖T_{X_i}‖^3] < ∞,

2. lim_{n→∞} max_{i∈[1:n]} ‖V_i‖_∞ < ∞, and

3. J := lim_{n→∞} (1/n) J_n exists and is positive definite with finite determinant, where J_n is the FIM in Proposition 1.

Then, the following two results are true:

1. θ̂_n is a consistent estimator of θ_0, and

2. √n (θ̂_n − θ_0) → N(0_k, J^{−1}).

The first two conditions in Theorem 1 ensure a valid Taylor series expansion of the log-likelihood function and enable one to apply the weak law of large numbers, while the third condition ensures the existence of the asymptotic covariance J^{−1}. From Theorem 1, it is clear that J^{−1} is the key to understanding the performance of the MLE.
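To see the asymptotic covariance J^{−1} of Theorem 1 emerge numerically, here is a small Monte Carlo sketch (Python; an illustration with arbitrary parameter values, not from the paper) for the scalar Gaussian model of Section 4.1, Case 1, with identical sensors, where the MLE has a closed form. The empirical variance of the MLE, scaled by n, is compared against 1/J, with J computed from the per-sample FIM in (11).

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)

    # Scalar Gaussian model of Section 4.1, Case 1, with identical sensors:
    # X_i = w*alpha + C_i, C_i ~ N(0, sigma^2), B_i = 1{X_i <= tau}.
    w, alpha, sigma, tau, n, trials = 1.0, 2.0, 1.0, 1.5, 1000, 2000
    F = norm.cdf(tau, loc=w * alpha, scale=sigma)          # Pr(B_i = 1)
    pdf_tau = norm.pdf(tau, loc=w * alpha, scale=sigma)    # p_{X_i}(tau)
    J1 = w**2 * pdf_tau**2 / (F * (1 - F))                 # per-sample FIM, cf. (11)

    # For equal w_i and tau_i the MLE is alpha_hat = (tau - sigma*Phi^{-1}(mean(B_i=1)))/w.
    est = np.empty(trials)
    for t in range(trials):
        b = rng.random(n) < F                              # B_i = 1 with probability F
        p_hat = np.clip(b.mean(), 1 / n, 1 - 1 / n)
        est[t] = (tau - sigma * norm.ppf(p_hat)) / w

    print("empirical   n*Var(alpha_hat):", n * est.var())
    print("theoretical 1/J (Theorem 1) :", 1 / J1)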
4. EXAMPLES

In this section, we consider two different densities p_{X_i} that belong to the exponential family. In particular, in Section 4.1 we focus on the Gaussian distribution, whereas in Section 4.2 we consider the Poisson distribution.

4.1. Example 1: Gaussian Distribution

We consider the following model:

    X_i = w_i α + C_i,  i ∈ [1 : n],    (8)

where: (i) {w_i}_{i=1}^n is a set of known constants; (ii) α is an unknown deterministic scalar (e.g., the unknown signal in a wireless sensor network [1], the amplitude of the radar cross section [7]); and (iii) C_i ∼ N(0, σ^2). The model in (8) is widely used in estimation from censored data [1, 8, 15]. The probability density function of X_i in (8) is given by

    p_{X_i}(x) = (1/√(2πσ^2)) exp(−x^2/(2σ^2) + x w_i α/σ^2 − w_i^2 α^2/(2σ^2)).    (9)

We now consider three different cases depending on which quantities we want to estimate.

• Case 1: Unknown mean and known variance. With reference to (1), we have that T_x = x and

    h(x) = (1/√(2πσ^2)) exp(−x^2/(2σ^2)),  φ(η_{i,θ}) = w_i^2 α^2/(2σ^2) = σ^2 η_{i,θ}^2/2,
    η_{i,θ} = v_i θ  where  v_i = w_i/σ^2  and  θ = α.

For this case, the FIM in (6) is given by (see Appendix D.1)

    J_n = ∑_{i=1}^n w_i^2 p_{X_i}^2(τ_i) / (F_{X_i}(τ_i)(1 − F_{X_i}(τ_i))),    (11)

where F_{X_i} denotes the cumulative distribution function. This recovers a well-known result in [8]. Thus, for Theorem 1 to hold, it is sufficient that: (i) lim_{n→∞} max_{i∈[1:n]} w_i < ∞, which often holds in practice; and (ii) lim_{n→∞} (1/n) J_n (with J_n defined in (11)) exists and is positive, which also holds as long as the w_i's are chosen non-trivially.

The maximum value of the FIM in (11) occurs at τ_i = w_i α for i ∈ [1 : n], and it is given by J_n = (2/(πσ^2)) ∑_{i=1}^n w_i^2 (to show this, it suffices to find the first and second derivatives of each term in (11) with respect to τ_i). Similarly, the FIM in (7) of estimating α from the uncensored data {X_i}_{i=1}^n is given by I_n = (1/σ^2) ∑_{i=1}^n w_i^2. Thus, the censoring mechanism requires a fraction π/2 (≈ 1.6) of additional data to match the FIM performance of uncensored data. This result extends to sensor fusion, where n is the number of sensors: if n sensors achieve a target FIM with uncensored data, then ⌈nπ/2⌉ sensors are required for the same performance with censored data [1].
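The claims in the previous paragraph are easy to check numerically; the sketch below (Python with SciPy; illustrative parameter values, not from the paper) evaluates the per-sample censored FIM from (11) over a grid of thresholds, confirming that the maximum is attained at τ_i = w_i α with value (2/(πσ^2)) w_i^2, and that the uncensored FIM w_i^2/σ^2 exceeds it by exactly the factor π/2.

    import numpy as np
    from scipy.stats import norm

    # Per-sample censored FIM for Case 1, cf. (11): w^2 p_X(tau)^2 / (F(tau)(1 - F(tau))).
    w, alpha, sigma = 1.3, 2.0, 0.8
    mu = w * alpha

    def fim_censored(tau):
        p = norm.pdf(tau, loc=mu, scale=sigma)
        F = norm.cdf(tau, loc=mu, scale=sigma)
        return w**2 * p**2 / (F * (1 - F))

    taus = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 2001)
    tau_star = taus[np.argmax(fim_censored(taus))]

    print("argmax over tau           :", tau_star, "(should be w*alpha =", mu, ")")
    print("max censored FIM          :", fim_censored(mu))
    print("2/(pi*sigma^2) * w^2      :", 2 / (np.pi * sigma**2) * w**2)
    print("uncensored FIM w^2/sigma^2:", w**2 / sigma**2,
          "-> ratio:", (w**2 / sigma**2) / fim_censored(mu))   # ratio is pi/2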
• Case 2: Known mean and unknown variance. With reference to (1), we have that T_x = (x − µ_i)^2 with µ_i = w_i α, and

    η_{i,θ} = v_i θ  where  v_i = −1/2  and  θ = 1/σ^2,
    h(x) = (2π)^{−1/2},  φ(η_{i,θ}) = log(|σ|) = −(1/2) log(|2η_{i,θ}|).

For this case, the FIM in (6) is given by (see Appendix D.2)

    J_n = ∑_{i=1}^n (σ^4/4) (τ_i − w_i α)^2 p_{X_i}^2(τ_i) / (F_{X_i}(τ_i)(1 − F_{X_i}(τ_i))).    (12)

Thus, for Theorem 1 to hold, similar to Case 1, it suffices that: (i) lim_{n→∞} max_{i∈[1:n]} w_i < ∞, which holds in practice; and (ii) lim_{n→∞} (1/n) J_n (with J_n defined in (12)) exists and is positive, which, for example, is satisfied almost surely if the τ_i's are chosen i.i.d. from an absolutely continuous distribution.

• Case 3: Unknown mean and unknown variance. With reference to (1), we have T_x = [x  x^2]^T, h(x) = 1/√(2π), and

    η_{i,θ} = V_i θ  where  V_i = [[w_i, 0], [0, −1/2]]  and  θ = [α/σ^2, 1/σ^2]^T,
    φ(η_{i,θ}) = w_i^2 α^2/(2σ^2) + log |σ|.

For this case, the FIM in (6) is given by (see Appendix D.3)

    J_n = σ^4 ∑_{i=1}^n [p_{X_i}^2(τ_i) / (F_{X_i}(τ_i)(1 − F_{X_i}(τ_i)))]
          × [[w_i^2, −(w_i/2)(τ_i + w_i α)], [−(w_i/2)(τ_i + w_i α), (τ_i + w_i α)^2/4]].    (14)

Remark 1. Each matrix in the sum in (14) is singular. However, this does not necessarily imply that J_n is singular. To see this, consider the following simple example: n = 2, α = 1, w_i = 1, i ∈ [1 : 2], σ = 1, τ_1 = −1, and τ_2 = 2. In this case, even if both matrices in the sum in (14) are singular, the determinant of J_2 in (14) is 0.1294, i.e., J_2 is not singular.
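The determinant reported in Remark 1 can be reproduced with a few lines of code; the following sketch (Python with SciPy, using the parameters stated in the remark) builds J_2 from (14) and evaluates its determinant.

    import numpy as np
    from scipy.stats import norm

    # Numerical check of Remark 1: n = 2, alpha = 1, w_i = 1, sigma = 1, tau = (-1, 2).
    alpha, sigma = 1.0, 1.0
    w = np.array([1.0, 1.0])
    tau = np.array([-1.0, 2.0])
    mu = w * alpha

    J = np.zeros((2, 2))
    for wi, ti, mi in zip(w, tau, mu):
        p = norm.pdf(ti, loc=mi, scale=sigma)
        F = norm.cdf(ti, loc=mi, scale=sigma)
        c = sigma**4 * p**2 / (F * (1 - F))                       # scalar factor in (14)
        M = np.array([[wi**2,               -wi * (ti + mi) / 2],
                      [-wi * (ti + mi) / 2, (ti + mi)**2 / 4   ]])  # rank-1 (singular) summand
        J += c * M

    print("det(J_2) =", np.linalg.det(J))   # about 0.1294, matching Remark 1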
in low-power laser communications [18], where exp(vi θ) re-
Fig. 1 offers an illustration of the performance of the MLE
lates to the applied current, and in photonics applications like
for Case 3 when Xi ∼ N (2, 1) for all i ∈ [1 : n]. For
image acquisition with binary Poisson statistics [4], where vi
Case 1, the optimal choice of the τi ’s would be τi = 2, for all
represents the detector efficiency and θ is the light source in-
i ∈ [1 : n], whereas for Case 2 the optimal choice of the τi ’s
tensity. The probability mass function of Xi belongs to the
would be τi = 0.42 for all i ∈ [1 : n]. The dashed curve in
Fig. 1 corresponds to the case when the τi ’s in Case 3 are cho- 3 Deriving an optimal choice of {τ }n
i i=1 for Case 3 is an interesting open
sen to be either 2 or 0.42 with equal probability. Similarly, the problem, which is worth of further investigation.
exponential family and the FIM in (6) is given by B. PROOF OF PROPOSITION 1
n 2
X (FXi (τi )−FXi (τi − 1)) For brevity, let θr denote the rth element of θ for all r ∈ [1 :
Jn = vi2 exp(2vi θ) , (16)
i=1
FXi (τi ) (1 − FXi (τi )) k]. The (r, s)th element of the FIM, indicated as Jr,s , is
which is computed in Appendix F. For Theorem 1 to hold, n X n
hX ∂ log PBi (Bi ; θ) ∂ log PBj (Bj ; θ) i
it suffices that: (i) limn→∞ maxi∈[1:n] vi < ∞, which holds Jr,s = E
∂θr ∂θs
in practice; and (ii) limn→∞ n1 Jn (with Jn defined in (16)) i=1 j=1
exists and is positive, which also holds as long as the τi ≥ 0 (a)
hXn
∂ log PBi (Bi ; θ) ∂ log PBi (Bi ; θ) i
for all i ∈ [1 : n] and the vi ’s are chosen non-trivially. = E
i=1
∂θr ∂θs
n d
5. CONCLUSION (b) X h X 
= E (E [TXi (j)|Bi ] − E [TXi (j)]) Vi (j, r)
i=1 j=1
This work has established regularity conditions for the con-
d
sistency and asymptotic normality of the MLE of multiple pa- X i
rameters from a censored GLM model. The derived condi- (E [TXi (ℓ)|Bi ] − E [TXi (ℓ)]) Vi (ℓ, s)
tions can be considered mild. The FIM under both censored ℓ=1
n X
d X
d
and uncensored data scenarios has been characterized, which X
is crucial for quantifying the information loss due to censor- = Cov (E [TXi (j)|Bi ] , E [TXi (ℓ)|Bi ])
i=1 j=1 ℓ=1
ing. Finally, the asymptotic performance of the MLE has been
analyzed for two common noise distributions: Gaussian, with × Vi (j, r)Vi (ℓ, s)
both unknown mean and variance, and Poisson, with an un- (c)
n
X
known mean. = ViT (r)Cov (E [TXi |Bi ]) Vi (s), (19)
i=1

where the labeled equalities follow from: (a) the fact that
Appendices the Bi ’s are independent; hence, when i 6= j, the expected
value of the product can be written as the product of the ex-
A. PROOF OF (5) pected values and the expected value of the score is zero (see
Lemma 1 below); (b) applying Lemma 1 below; and (c) let-
From the definition of the log-likelihood function, ting Vi (s) denote the s-th column of Vi . This concludes the
n proof of Proposition 1.
X
ℓn (θ; {bi }ni=1 ) = log (PBi (bi ; θ)) (17) Lemma 1. For every i ∈ [1 : n] and r ∈ [1 : k], the score
i=1
n
function is given by
X  
= log (PBi (bi ; θ))+φ(η i,θ )−φ(η i,θ ) d
∂ log PBi X
i=1 = (E[TXi (j)|Bi ]−E [TXi (j)])Vi (j, r), (20)
n h  Z ∂θr j=1
(a) X  i
= log pXi (x; θ) dx + φ(η i,θ ) − φ(η i,θ )
i=1
X (bi )
where TXi (j) is the j-th element of TXi and Vi (j, r) is the
n h  Z (j, r)-th element of Vi .
(b) X 
= log h(x) exp(hη i,θ , Tx i − φ(η i,θ )) dx Proof. From (18) we have log (PBi (bi ; θ)) = φ(η i,θ ; bi ) −
i=1
X (bi ) φ(η i,θ ) and by applying the chain rule, we have that
i
+ φ(η i,θ ) − φ(η i,θ ) ∂ log PBi (b; θ) ∂ 
n h
= φ(η i,θ ; b) − φ(η i,θ )
X  Z  i ∂θr ∂θr
= log h(x) exp(hη i,θ , Tx i) dx −φ(ηi,θ ) d
X ∂φ(η i,θ ; b) ∂φ(η i,θ )  ∂η i,θ (j)

i=1
X (bi ) = −
j=1
∂ηi,θ (j) ∂η i,θ (j) ∂θr
n
(c) X  
= φ(η i,θ ; bi ) − φ(η i,θ ) , (18) d 
X ∂φ(η i,θ ; b) ∂φ(η i,θ )

i=1 = − Vi (j, r)
j=1
∂ηi,θ (j) ∂η i,θ (j)
where the labeled equalities follow from: (a) substituting (3);
(b) using (1); and (c) letting φ(η i,θ ; bi ) be the log-partition d
X
function of pXi |Bi (x|bi ) (note that φ(η i,θ ; bi ) is a normaliza- = (E [TXi (j)|Bi = b] − E [TXi (j)]) Vi (j, r), (21)
tion quantity that ensures that pXi |Bi (x|bi ) is a valid density). j=1
where the last equality follows from [19]. Note also that by the law of total expectation, we have E[∂ log P_{B_i}/∂θ_r] = 0. This concludes the proof of Lemma 1.
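The score identity (20) can also be checked numerically for a concrete member of the family; the sketch below (Python with SciPy; illustrative values, using the Gaussian Case 1 parameterization of Section 4.1) compares a finite-difference derivative of log P_{B_i}(b; θ) with the right-hand side of (20).

    import numpy as np
    from scipy.stats import norm

    # Check the score identity (20) for the Gaussian Case 1 of Section 4.1:
    # d/d(alpha) log P_{B_i}(b) should equal (E[X_i|B_i=b] - E[X_i]) * v_i with v_i = w_i/sigma^2.
    w, alpha, sigma, tau, b = 1.4, 0.8, 1.3, 1.0, -1

    def log_pb(a):
        F = norm.cdf(tau, loc=w * a, scale=sigma)
        return np.log(F if b == 1 else 1 - F)

    eps = 1e-6
    score_fd = (log_pb(alpha + eps) - log_pb(alpha - eps)) / (2 * eps)   # finite difference

    F = norm.cdf(tau, loc=w * alpha, scale=sigma)
    p = norm.pdf(tau, loc=w * alpha, scale=sigma)
    cond_mean = w * alpha - b * sigma**2 * p / (F if b == 1 else 1 - F)  # cf. (62)
    score_cf = (cond_mean - w * alpha) * (w / sigma**2)                  # right-hand side of (20)

    print(score_fd, score_cf)   # should agree to about 1e-6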
C. PROOF OF THEOREM 1

In [20], the authors developed a theory for the MLE when the observations are independent and come from distinct, yet related populations, i.e., with some parameters in common. The authors referred to such populations as associated. In particular, the authors derived regularity conditions under which the MLE of parameters in associated populations is shown to be consistent and asymptotically normal. In what follows, we tailor these conditions to our GLM with 1-bit measurements described in Section 2.

1) We require the existence of the following partial derivatives to ensure that the Taylor series expansion of the log-likelihood function in (17) exists [20, conditions I(i)],

    ∂ log P_{B_i}/∂θ_r,  ∂^2 log P_{B_i}/∂θ_r ∂θ_s,  and  ∂^3 log P_{B_i}/∂θ_r ∂θ_s ∂θ_t    (22)

for all (r, s, t) ∈ [1 : k]^3 and i ∈ [1 : n]. We start with the first order partial derivative. From Lemma 1,

    ∂ log P_{B_i}/∂θ_r = ∑_{j=1}^d (E[T_{X_i}(j)|B_i] − E[T_{X_i}(j)]) V_i(j, r).

Thus, condition 1 and condition 2 in Theorem 1 ensure the existence of ∂ log P_{B_i}/∂θ_r for all r ∈ [1 : k].

Similarly, for the second order partial derivative, by applying the chain rule, we arrive at

    ∂^2 log P_{B_i}/∂θ_r ∂θ_s = ∂^2/∂θ_r ∂θ_s [φ(η_{i,θ}; b) − φ(η_{i,θ})]
    = ∑_{j=1}^d ∑_{ℓ=1}^d [∂^2 φ(η_{i,θ}; b)/∂η_{i,θ}(ℓ) ∂η_{i,θ}(j) − ∂^2 φ(η_{i,θ})/∂η_{i,θ}(ℓ) ∂η_{i,θ}(j)] V_i(ℓ, s) V_i(j, r)
    = ∑_{j=1}^d ∑_{ℓ=1}^d [Cov(T_{X_i}(j), T_{X_i}(ℓ)|B_i = b) − Cov(T_{X_i}(j), T_{X_i}(ℓ))] V_i(ℓ, s) V_i(j, r),    (23)

where the last equality follows from [19]. Thus, condition 1 and condition 2 in Theorem 1 ensure the existence of ∂^2 log P_{B_i}/∂θ_r ∂θ_s for all (r, s) ∈ [1 : k]^2.

Finally, for the third order partial derivative, by applying the chain rule, we have that

    ∂^3 log P_{B_i}/∂θ_r ∂θ_s ∂θ_t = ∂^3/∂θ_r ∂θ_s ∂θ_t [φ(η_{i,θ}; b) − φ(η_{i,θ})]
    = ∑_{j=1}^d ∑_{ℓ=1}^d ∑_{k=1}^d [∂^3 φ(η_{i,θ}; b)/∂η_{i,θ}(k) ∂η_{i,θ}(ℓ) ∂η_{i,θ}(j) − ∂^3 φ(η_{i,θ})/∂η_{i,θ}(k) ∂η_{i,θ}(ℓ) ∂η_{i,θ}(j)] V_i(k, t) V_i(ℓ, s) V_i(j, r)
    = ∑_{j=1}^d ∑_{ℓ=1}^d ∑_{k=1}^d [κ_{T_{X_i}|B_i=b}(j, ℓ, k) − κ_{T_{X_i}}(j, ℓ, k)] V_i(k, t) V_i(ℓ, s) V_i(j, r),    (24)

where the last equality follows from [19], where

    κ_{T_{X_i}|B_i=b}(j, ℓ, k) = E[ ∏_{u∈{j,ℓ,k}} (T_{X_i}(u) − E[T_{X_i}(u)|B_i = b]) | B_i = b ],    (25a)
    κ_{T_{X_i}}(j, ℓ, k) = E[ ∏_{u∈{j,ℓ,k}} (T_{X_i}(u) − E[T_{X_i}(u)]) ].    (25b)

Thus, condition 1 and condition 2 in Theorem 1 ensure the existence of ∂^3 log P_{B_i}/∂θ_r ∂θ_s ∂θ_t for all (r, s, t) ∈ [1 : k]^3.

2) Now we check [20, conditions I(ii)], which consist of two parts. First, we require the convergence of the first and second order partial derivatives, which will allow one to interchange the differentiation and summation. This is indeed satisfied, as from (2) we have

    ∑_{b∈{1,−1}} ∂P_{B_i}(b; θ)/∂θ_r = ∂P_{B_i}(1; θ)/∂θ_r + ∂P_{B_i}(−1; θ)/∂θ_r
    = ∂F_{X_i}(τ_i)/∂θ_r + ∂(1 − F_{X_i}(τ_i))/∂θ_r = 0,    (26)

and similarly we have that

    ∑_{b∈{1,−1}} ∂^2 P_{B_i}(b; θ)/∂θ_r ∂θ_s = ∂^2 F_{X_i}(τ_i)/∂θ_r ∂θ_s + ∂^2 (1 − F_{X_i}(τ_i))/∂θ_r ∂θ_s = 0.    (27)

Hence, both the first order and second order partial derivatives converge.

Second, we need to check that the third order derivative is finite. From (24), we have that

    ∂^3 log P_{B_i}/∂θ_r ∂θ_s ∂θ_t = ∑_{j=1}^d ∑_{ℓ=1}^d ∑_{k=1}^d [κ_{T_{X_i}|B_i=b}(j, ℓ, k) − κ_{T_{X_i}}(j, ℓ, k)] V_i(k, t) V_i(ℓ, s) V_i(j, r).    (28)
Under condition 1 and condition 2 in Theorem 1, there exists some finite positive constant K such that

    | ∑_{j=1}^d ∑_{ℓ=1}^d ∑_{k=1}^d [κ_{T_{X_i}|B_i=b}(j, ℓ, k) − κ_{T_{X_i}}(j, ℓ, k)] V_i(k, t) V_i(ℓ, s) V_i(j, r) | ≤ K,    (29)

for all (r, s, t) ∈ [1 : k]^3 and i ∈ [1 : n]. In other words, with reference to [20, condition I(ii)], we have that H_{irst}(b) = K for all (r, s, t) ∈ [1 : k]^3, b ∈ {−1, 1}, and i ∈ [1 : n]. Now, it holds that

    ∑_{b∈{1,−1}} H_{irst}(b) P_{B_i}(b; θ) = K ∑_{b∈{1,−1}} P_{B_i}(b; θ) = K,    (30)

for all θ ∈ Θ, (r, s, t) ∈ [1 : k]^3, and i ∈ [1 : n]. Hence, for all i ∈ [1 : n], with reference to [20, condition I(ii)], we can set M_i = K.

3) This condition is [20, condition II(i)] and it is needed to apply the weak law of large numbers for independent random variables [21, p.174]. In order to check this condition we first need to find the set D_{1i}, which for all i ∈ [1 : n] and for all r ∈ [1 : k] is given by

    D_{1i} = {b ∈ {−1, 1} : |∂ log P_{B_i}(b; θ)/∂θ_r| > n}.    (31)

Under condition 1 and condition 2 in Theorem 1, the derivative ∂ log P_{B_i}/∂θ_r is finite for all i ∈ [1 : n] and r ∈ [1 : k]. Thus, for a sufficiently large n, there will be no b ∈ {−1, 1} such that |∂ log P_{B_i}(b; θ)/∂θ_r| > n. Hence, for large n, the set D_{1i} will be empty, i.e.,

    D_{1i} = ∅ for all i ∈ [1 : n].    (32)

Hence,

    ∑_{i=1}^n ∑_{b∈D_{1i}} P_{B_i}(b; θ) = 0,    (33)

which satisfies the condition ∑_{i=1}^n ∑_{b∈D_{1i}} P_{B_i}(b; θ) = o(1) [20, conditions II(i)]. Similarly, we can find the set D_{2i} for all i ∈ [1 : n] and r ∈ [1 : k] as follows,

    D_{2i} = {b ∈ {−1, 1} : |∂ log P_{B_i}(b; θ)/∂θ_r| < n}.    (34)

Using the same reasoning as above, D_{2i} = {−1, 1} for a sufficiently large n whenever condition 1 and condition 2 in Theorem 1 hold. Now, we need to check the following condition [20, condition II(i)],

    ∑_{i=1}^n ∑_{b∈D_{2i}} (∂ log P_{B_i}(b; θ)/∂θ_r)^2 P_{B_i}(b; θ) = o(n^2).    (35)

From Lemma 1, we have that

    ∂ log P_{B_i}/∂θ_r = ∑_{j=1}^d (E[T_{X_i}(j)|B_i] − E[T_{X_i}(j)]) V_i(j, r).    (36)

Hence, whenever condition 1 and condition 2 in Theorem 1 hold, there exists some constant C such that

    | ∑_{j=1}^d (E[T_{X_i}(j)|B_i] − E[T_{X_i}(j)]) V_i(j, r) | ≤ √C,    (37)

for b ∈ {−1, 1} and for all i ∈ [1 : n] and r ∈ [1 : k]. Therefore, the left-hand side of (35) can be written as

    ∑_{i=1}^n ∑_{b∈{−1,1}} ( ∑_{j=1}^d (E[T_{X_i}(j)|B_i = b] − E[T_{X_i}(j)]) V_i(j, r) )^2 P_{B_i}(b; θ)
    ≤ ∑_{i=1}^n ∑_{b∈{−1,1}} C P_{B_i}(b; θ) = Cn.    (38)

Thus, (35) is satisfied for all r ∈ [1 : k].

4) This condition is [20, condition II(ii)] and, as the one above, it is needed to apply the weak law of large numbers for independent random variables [21, p.174]. In order to check this condition we first need to find the set D_{3i} which for all i ∈ [1 : n], is given by

    D_{3i} = {b ∈ {−1, 1} : |∂^2 log P_{B_i}(b; θ)/∂θ_r ∂θ_s| > n}.    (39)

Whenever condition 1 and condition 2 in Theorem 1 hold, the derivative ∂^2 log P_{B_i}/∂θ_r ∂θ_s is finite for all i ∈ [1 : n] (see our analysis of (23)). Thus, for a sufficiently large n, there will be no b ∈ {−1, 1} such that |∂^2 log P_{B_i}(b; θ)/∂θ_r ∂θ_s| > n. Hence, for large n, the set D_{3i} will be empty, i.e.,

    D_{3i} = ∅ for all i ∈ [1 : n].    (40)

Therefore,

    ∑_{i=1}^n ∑_{b∈D_{3i}} P_{B_i}(b; θ) = 0 = o(1),    (41)

as required by [20, condition II(ii)]. Similarly, we can find the set D_{4i} for all i ∈ [1 : n] as follows,

    D_{4i} = {b ∈ {−1, 1} : |∂^2 log P_{B_i}(b; θ)/∂θ_r ∂θ_s| < n}.    (42)

Using the same reasoning as before we have, D_{4i} = {−1, 1} for a sufficiently large n. Now, we need to
check the following condition [20, condition II(ii)],

    ∑_{i=1}^n ∑_{b∈D_{4i}} (∂^2 log P_{B_i}(b; θ)/∂θ_r ∂θ_s)^2 P_{B_i}(b; θ) = o(n^2),    (43)

where ∂^2 log P_{B_i}/∂θ_r ∂θ_s is given in (23). Hence, whenever condition 1 and condition 2 in Theorem 1 hold, there exists some constant C̃ such that

    | ∑_{j=1}^d ∑_{ℓ=1}^d [Cov(T_{X_i}(j), T_{X_i}(ℓ)|B_i = b) − Cov(T_{X_i}(j), T_{X_i}(ℓ))] V_i(ℓ, s) V_i(j, r) | ≤ √C̃,    (44)

for all i ∈ [1 : n] and (r, s) ∈ [1 : k]^2. Therefore,

    ∑_{i=1}^n ∑_{b∈D_{4i}} (∂^2 log P_{B_i}(b; θ)/∂θ_r ∂θ_s)^2 P_{B_i}(b; θ) ≤ ∑_{i=1}^n ∑_{b∈{−1,1}} C̃ P_{B_i}(b; θ) = C̃ n.    (45)

Thus, (43) is satisfied. Still with reference to [20, condition II(ii)], we also need to verify that the two limits in [20, eq. 13] are the same. In particular, the limit on the right-hand side of [20, eq. 13] is given in (46), whereas the limit on the left-hand side of [20, eq. 13] is given in (47), both displayed next, where the equality in (a) follows since, as proved above, for a sufficiently large n, we have that D_{4i} = {−1, 1}; and the equality in (b) is due to the law of total covariance.

    lim_{n→∞} (1/n) ∑_{i=1}^n ∑_{b∈{1,−1}} (∂ log P_{B_i}(b; θ)/∂θ_r) (∂ log P_{B_i}(b; θ)/∂θ_s) P_{B_i}(b; θ)
    = lim_{n→∞} (1/n) ∑_{i=1}^n E[(∂ log P_{B_i}(B_i; θ)/∂θ_r) (∂ log P_{B_i}(B_i; θ)/∂θ_s)]
    = lim_{n→∞} (1/n) ∑_{i=1}^n E[ ( ∑_{j=1}^d (E[T_{X_i}(j)|B_i] − E[T_{X_i}(j)]) V_i(j, r) ) ( ∑_{j=1}^d (E[T_{X_i}(j)|B_i] − E[T_{X_i}(j)]) V_i(j, s) ) ]
    = lim_{n→∞} (1/n) ∑_{i=1}^n ∑_{j=1}^d ∑_{ℓ=1}^d Cov(E[T_{X_i}(j)|B_i], E[T_{X_i}(ℓ)|B_i]) V_i(j, r) V_i(ℓ, s)    (46)

    lim_{n→∞} (1/n) ∑_{i=1}^n ∑_{b∈D_{4i}} (−∂^2 log P_{B_i}(b; θ)/∂θ_r ∂θ_s) P_{B_i}(b; θ)
    (a)= lim_{n→∞} (1/n) ∑_{i=1}^n (−E[∂^2 log P_{B_i}(b; θ)/∂θ_r ∂θ_s])
    = lim_{n→∞} (1/n) ∑_{i=1}^n ∑_{j=1}^d ∑_{ℓ=1}^d (Cov(T_{X_i}(j), T_{X_i}(ℓ)) − E[Cov(T_{X_i}(j), T_{X_i}(ℓ)|B_i = b)]) V_i(ℓ, s) V_i(j, r)
    (b)= lim_{n→∞} (1/n) ∑_{i=1}^n ∑_{j=1}^d ∑_{ℓ=1}^d Cov(E[T_{X_i}(j)|B_i], E[T_{X_i}(ℓ)|B_i]) V_i(j, r) V_i(ℓ, s)    (47)

From (46) and (47), it follows that the two limits in [20, eq. 13] are the same. Moreover, from (47) (or equivalently (46)), we have that

    J = lim_{n→∞} (1/n) ∑_{i=1}^n V_i^T Cov(E[T_{X_i}|B_i]) V_i,    (48)

which needs to be positive definite with finite determinant. This is ensured by condition 3 in Theorem 1.

5) This condition is [20, condition II(iii)] and, as the two above, it is needed to apply the weak law of large numbers for independent random variables [21, p.174]. In order to check this condition we first need to find the following set for all i ∈ [1 : n]

    D_{5i} = {b ∈ {−1, 1} : H_{irst}(b) > n}.

As discussed above in 2), whenever condition 1 and condition 2 in Theorem 1 hold, we have that H_{irst} is finite for all i ∈ [1 : n]. Thus, for a sufficiently large n, there will be no b ∈ {−1, 1} such that H_{irst}(b) > n for all i ∈ [1 : n]. Hence, for large n, the set D_{5i} is empty, i.e.,

    D_{5i} = ∅ for all i ∈ [1 : n].    (49)

Hence,

    ∑_{i=1}^n ∑_{b∈D_{5i}} P_{B_i}(b; θ) = 0.    (50)

This satisfies the condition ∑_{i=1}^n ∑_{b∈D_{5i}} P_{B_i}(b; θ) = o(1) [20, condition II(iii)]. Similarly, we can find the set D_{6i} for all i ∈ [1 : n] as follows,

    D_{6i} = {b ∈ {−1, 1} : H_{irst}(b) < n}.    (51)

Using the same reasoning as above we have that D_{6i} = {−1, 1} for a sufficiently large n. Now, we need to check the following condition [20, condition II(iii)],

    ∑_{i=1}^n ∑_{b∈D_{6i}} H_{irst}^2(b) P_{B_i}(b; θ) = o(n^2).    (52)

From 2) above, we have that H_{irst}(b) = K for all (r, s, t) ∈ [1 : k]^3, b ∈ {−1, 1}, and i ∈ [1 : n]. Hence,

    ∑_{i=1}^n ∑_{b∈D_{6i}} H_{irst}^2(b) P_{B_i}(b; θ) ≤ K^2 ∑_{i=1}^n ∑_{b∈D_{6i}} P_{B_i}(b; θ) = K^2 n = o(n^2),    (53)

and

    (1/n) ∑_{i=1}^n M_i = K,    (54)

which is a finite positive constant.

6) This condition is [20, condition III] and it is needed to ensure asymptotic normality. In particular, we need

    lim_{n→∞} (1/n) ∑_{i=1}^n ∑_{b∈D_{7i}} ∑_{r=1}^k (∂ log P_{B_i}(b; θ)/∂θ_r)^2 P_{B_i}(b; θ) = 0,    (56)

where D_{7i} for all i ∈ [1 : n] is defined as follows,

    D_{7i} = {b ∈ {−1, 1} : [∑_{r=1}^k (∂ log P_{B_i}(b; θ)/∂θ_r)^2]^{1/2} > ε√n},

for every ε > 0. By using Lemma 1, we have that (55), displayed next, holds.

    [∑_{r=1}^k (∂ log P_{B_i}(b; θ)/∂θ_r)^2]^{1/2} = [∑_{r=1}^k ( ∑_{j=1}^d (E[T_{X_i}(j)|B_i = b] − E[T_{X_i}(j)]) V_i(j, r) )^2]^{1/2}    (55)

Moreover, as shown above in 1), whenever condition 1 and condition 2 in Theorem 1 hold, the derivative ∂ log P_{B_i}/∂θ_r is finite for all i ∈ [1 : n] and
r ∈ [1 : k]. Thus, for all i ∈ [1 : n] and r ∈ [1 : k], there exists some positive constant C such that

    | ∑_{j=1}^d (E[T_{X_i}(j)|B_i = b] − E[T_{X_i}(j)]) V_i(j, r) | ≤ √C,    (57)

which leads to

    [∑_{r=1}^k (∂ log P_{B_i}(b; θ)/∂θ_r)^2]^{1/2} ≤ [∑_{r=1}^k C]^{1/2} = √(Ck).    (58)

Thus, for a sufficiently large n, there will be no b ∈ {−1, 1} such that

    [∑_{r=1}^k (∂ log P_{B_i}(b)/∂θ_r)^2]^{1/2} > ε√n,    (59)

for any ε > 0. Hence, for large n, the set D_{7i} will be empty for any ε > 0, i.e.,

    D_{7i} = ∅, for all i ∈ [1 : n].    (60)

Thus, the condition in (56) will be satisfied. This concludes the proof of Theorem 1.

D. DERIVATION OF THE FIM FOR THE GAUSSIAN DISTRIBUTION

D.1. FIM in Case 1

To compute the FIM in Proposition 1, we need to derive the following variance,

    Var(E[T_{X_i}|B_i]) = E[(E[X_i|B_i])^2] − (E[X_i])^2
    = (E[X_i|B_i = 1])^2 Pr(B_i = 1) + (E[X_i|B_i = −1])^2 Pr(B_i = −1) − (E[X_i])^2
    = σ^4 p_{X_i}^2(τ_i) / (F_{X_i}(τ_i)(1 − F_{X_i}(τ_i))),    (61)

where the last equality follows since

    E[X_i|B_i = b] = µ_i − b σ^2 p_{X_i}(τ_i)/Pr(B_i = b),  b ∈ {−1, 1}.    (62)

By substituting (61) inside (6) with v_i = w_i/σ^2, we obtain (11).
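The closed form (62) can be verified numerically; the following sketch (Python with SciPy; arbitrary illustrative values, not from the paper) compares it against the conditional means obtained by direct numerical integration of the Gaussian density.

    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import quad

    # Check the conditional mean formula (62) for X_i ~ N(mu_i, sigma^2) given B_i.
    mu, sigma, tau = 1.3, 0.7, 1.0
    F = norm.cdf(tau, loc=mu, scale=sigma)          # Pr(B_i = 1)
    p = norm.pdf(tau, loc=mu, scale=sigma)          # p_{X_i}(tau_i)

    for b, lo, hi, prob in [(1, -np.inf, tau, F), (-1, tau, np.inf, 1 - F)]:
        numeric = quad(lambda x: x * norm.pdf(x, loc=mu, scale=sigma), lo, hi)[0] / prob
        closed = mu - b * sigma**2 * p / prob       # right-hand side of (62)
        print(b, numeric, closed)                   # the two columns should agree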
D.2. FIM in Case 2

To compute the FIM in Proposition 1, we need to derive the following variance,

    Var(E[T_{X_i}|B_i]) = E[(E[T_{X_i}|B_i])^2] − (E[T_{X_i}])^2
    = E[(E[(X_i − µ_i)^2|B_i])^2] − (E[(X_i − µ_i)^2])^2
    = (E[(X_i − µ_i)^2|B_i = 1])^2 Pr(B_i = 1) + (E[(X_i − µ_i)^2|B_i = −1])^2 Pr(B_i = −1) − (E[(X_i − µ_i)^2])^2
    = σ^4 (τ_i − µ_i)^2 p_{X_i}^2(τ_i) / (F_{X_i}(τ_i)(1 − F_{X_i}(τ_i))),    (63)

where the last equality follows from (62) and the fact that for b ∈ {−1, 1} we have that

    E[X_i^2|B_i = b] = σ^2 + µ_i^2 − b σ^2 (τ_i + µ_i) p_{X_i}(τ_i)/Pr(B_i = b).    (64)

Thus, we have that

    E[(X_i − µ_i)^2|B_i = b] = σ^2 + µ_i^2 − b σ^2 (τ_i + µ_i) p_{X_i}(τ_i)/Pr(B_i = b)
    − 2µ_i (µ_i − b σ^2 p_{X_i}(τ_i)/Pr(B_i = b)) + µ_i^2
    = σ^2 − b σ^2 (τ_i − µ_i) p_{X_i}(τ_i)/Pr(B_i = b).    (65)

By substituting (65) inside (6) with v_i = −1/2, we obtain (12).

D.3. FIM in Case 3

To compute the FIM in Proposition 1, we need to derive Cov(E[T_{X_i}|B_i]). In what follows, we let T_{X_i}(j), j ∈ [1 : 2], denote the j-th component of T_{X_i}. We start by noting that from Case 1 in Appendix D.1, we have that

    Var(E[T_{X_i}(1)|B_i]) = σ^4 p_{X_i}^2(τ_i) / (F_{X_i}(τ_i)(1 − F_{X_i}(τ_i))).    (66)

Similarly, from Case 2 (see (63)) in Appendix D.2, we obtain

    Var(E[T_{X_i}(2)|B_i]) = σ^4 (τ_i + µ_i)^2 p_{X_i}^2(τ_i) / (F_{X_i}(τ_i)(1 − F_{X_i}(τ_i))),    (67)

where µ_i = w_i α. Finally, we have that

    Cov(E[T_{X_i}(1)|B_i], E[T_{X_i}(2)|B_i]) = Cov(E[X_i|B_i], E[X_i^2|B_i])
    = E[E[X_i|B_i] E[X_i^2|B_i]] − E[E[X_i|B_i]] E[E[X_i^2|B_i]]
    = E[X_i|B_i = 1] E[X_i^2|B_i = 1] Pr(B_i = 1) + E[X_i|B_i = −1] E[X_i^2|B_i = −1] Pr(B_i = −1) − E[X_i] E[X_i^2]
    = µ_i σ^2 + µ_i^3 + σ^4 (τ_i + µ_i) p_{X_i}^2(τ_i) / (F_{X_i}(τ_i)(1 − F_{X_i}(τ_i))) − µ_i (σ^2 + µ_i^2)
    = σ^4 (τ_i + µ_i) p_{X_i}^2(τ_i) / (F_{X_i}(τ_i)(1 − F_{X_i}(τ_i))),    (68)

where we have used (62) and (64) to find E[X_i|B_i = b] E[X_i^2|B_i = b] for b ∈ {−1, 1}. By substituting (68) inside (6) with V_i = [[w_i, 0], [0, −1/2]], we obtain (14).

E. PROOF OF PROPOSITION 2

We need to make sure that

    [u  v] J [u  v]^T > 0    (69)

for all vectors [u  v] ≠ 0_2. By using J_n in (14), the above condition can be equivalently written as

    lim_{n→∞} (1/n) ∑_{i=1}^n c_i (w_i u − ((τ_i + w_i α)/2) v)^2 > 0,    (70)

where c_i denotes the nonnegative scalar factor multiplying the matrix in the i-th term of (14). It is now a simple exercise to show that if the τ_i's, with i ∈ [1 : n], are chosen i.i.d. from some absolutely continuous distribution, then (70) is satisfied almost surely. This concludes the proof of Proposition 2.

F. DERIVATION OF THE FIM FOR THE POISSON DISTRIBUTION

The probability mass function of X_i in (15) is given by

    p_{X_i}(x) = exp(v_i θ x − exp(v_i θ)) / x!.    (71)

With reference to (1), we have that T_x = x and

    h(x) = 1/x!,  η_{i,θ} = v_i θ,  φ(η_{i,θ}) = exp(v_i θ) = exp(η_{i,θ}).
Now, to compute the FIM in Proposition 1 for the model in (15), we need to derive the following variance,

    Var(E[T_{X_i}|B_i]) = E[(E[X_i|B_i])^2] − (E[X_i])^2
    = (E[X_i|B_i = 1])^2 Pr(B_i = 1) + (E[X_i|B_i = −1])^2 Pr(B_i = −1) − (E[X_i])^2
    = exp(2 v_i θ) (F_{X_i}(τ_i) − F_{X_i}(τ_i − 1))^2 / (F_{X_i}(τ_i)(1 − F_{X_i}(τ_i))),    (73)

where the last equality follows since for b ∈ {−1, 1} it holds that

    E[X_i|B_i = b] = e^{v_i θ} (1 − F_{X_i}(τ_i − 1))^{(1−b)/2} (F_{X_i}(τ_i − 1))^{(1+b)/2} / [(1 − F_{X_i}(τ_i))^{(1−b)/2} (F_{X_i}(τ_i))^{(1+b)/2}].    (74)

By substituting (73) inside (6), we obtain (16).
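The conditional mean (74) is easy to validate by direct summation over the Poisson probability mass function; a minimal sketch (Python with SciPy; illustrative values, not from the paper) follows.

    import numpy as np
    from scipy.stats import poisson

    # Check the closed form (74) for E[X_i | B_i = b] against direct summation.
    lam, tau = 3.5, 4                      # lam = exp(v_i * theta)
    x = np.arange(0, 300)
    pmf = poisson.pmf(x, lam)
    F, Fm1 = poisson.cdf(tau, lam), poisson.cdf(tau - 1, lam)

    for b in (1, -1):
        direct = np.sum(x * pmf * ((x <= tau) if b == 1 else (x > tau)))
        direct /= F if b == 1 else 1 - F
        closed = lam * (1 - Fm1)**((1 - b) / 2) * Fm1**((1 + b) / 2) \
                 / ((1 - F)**((1 - b) / 2) * F**((1 + b) / 2))     # (74)
        print(b, direct, closed)           # the two columns should match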
G. REFERENCES

[1] J. Fang, Y. Liu, H. Li, and S. Li, "One-Bit Quantizer Design for Multisensor GLRT Fusion," IEEE Signal Processing Letters, vol. 20, no. 3, pp. 257–260, 2013.

[2] C. Studer and G. Durisi, "Quantized Massive MU-MIMO-OFDM Uplink," IEEE Transactions on Communications, vol. 64, no. 6, pp. 2387–2399, 2016.

[3] N. P. Jewell and M. van der Laan, "Current Status Data: Review, Recent Developments and Open Problems," Handbook of Statistics, vol. 23, pp. 625–642, 2003.

[4] F. Yang, Y. M. Lu, L. Sbaiz, and M. Vetterli, "Bits from Photons: Oversampled Image Acquisition Using Binary Poisson Statistics," IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1421–1436, 2011.

[5] O. Alzeley, E. M. Almetwally, A. M. Gemeay, H. M. Alshanbari, E. H. Hafez, and M. H. Abu-Moussa, "Statistical Inference under Censored Data for the New Exponential-X Fréchet Distribution: Simulation and Application to Leukemia Data," Computational Intelligence and Neuroscience, vol. 2021, no. 1, 2021.

[6] R. H. Myers, D. C. Montgomery, G. G. Vining, and T. J. Robinson, Generalized Linear Models: With Applications in Engineering and the Sciences. John Wiley & Sons, 2012.

[7] Z. Cheng, Z. He, and B. Liao, "Target Detection Performance of Collocated MIMO Radar with One-Bit ADCs," IEEE Signal Processing Letters, vol. 26, no. 12, pp. 1832–1836, 2019.

[8] A. Kipnis and J. C. Duchi, "Mean Estimation from One-Bit Measurements," IEEE Transactions on Information Theory, vol. 68, no. 9, pp. 6276–6296, 2022.

[9] O. Dabeer and E. Masry, "Multivariate Signal Parameter Estimation under Dependent Noise from 1-Bit Dithered Quantized Data," IEEE Transactions on Information Theory, vol. 54, no. 4, pp. 1637–1654, 2008.

[10] P. Stoica, X. Shang, and Y. Cheng, "The Cramér–Rao Bound for Signal Parameter Estimation from Quantized Data [Lecture Notes]," IEEE Signal Processing Magazine, vol. 39, no. 1, pp. 118–125, 2021.

[11] V. Krishnamurthy and I. Mareels, "Estimation of Noisy Quantized Gaussian AR Time-Series with Randomly Varying Observation Coefficient," IEEE Transactions on Signal Processing, vol. 43, no. 5, pp. 1285–1290, 1995.

[12] V. Krishnamurthy and H. V. Poor, "Asymptotic Analysis of an Algorithm for Parameter Estimation and Identification of 1-B Quantized AR Time Series," IEEE Transactions on Signal Processing, vol. 44, no. 1, pp. 62–73, 1996.

[13] Y.-S. Jeon, N. Lee, and H. V. Poor, "Reinforcement-Learning-Aided Detector for Time-Varying MIMO Systems with One-Bit ADCs," in 2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1–6.

[14] Z. Duan, V. P. Jilkov, and X. R. Li, "State Estimation with Quantized Measurements: Approximate MMSE Approach," in 2008 11th International Conference on Information Fusion, 2008, pp. 1–6.

[15] S. Khobahi, N. Naimipour, M. Soltanalian, and Y. C. Eldar, "Deep Signal Recovery with One-Bit Quantization," in 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 2987–2991.

[16] H. Cramér, Mathematical Methods of Statistics. Princeton University Press, 1999, vol. 26.

[17] R. Zamir, "A Proof of the Fisher Information Inequality via a Data Processing Argument," IEEE Transactions on Information Theory, vol. 44, no. 3, pp. 1246–1250, 1998.

[18] A. Dytso and H. V. Poor, "Estimation in Poisson Noise: Properties of the Conditional Mean Estimator," IEEE Transactions on Information Theory, vol. 66, no. 7, pp. 4304–4323, 2020.

[19] A. DasGupta, "The Exponential Family and Statistical Applications," Probability for Statistics and Machine Learning: Fundamentals and Advanced Topics, pp. 583–612, 2011.

[20] R. A. Bradley and J. J. Gart, "The Asymptotic Properties of ML Estimators When Sampling from Associated Populations," Biometrika, vol. 49, no. 1/2, pp. 205–214, 1962.

[21] H. Cramér, "Problems in Probability Theory," The Annals of Mathematical Statistics, vol. 18, no. 2, pp. 165–193, 1947.