Mixing Properties of Nonstationary Multivariate Count Processes
Quebec, Canada
E-mail: debz01@uqat.ca
Michael H. Neumann
Friedrich-Schiller-Universität Jena
Institut für Mathematik
Ernst-Abbe-Platz 2
D – 07743 Jena
Germany
E-mail: michael.neumann@uni-jena.de
Lionel Truquet
ENSAI
Campus de Ker Lann
51 Rue Blaise Pascal
BP 37203
35172 BRUZ Cedex
France
E-mail: lionel.truquet@ensai.fr
Abstract
We prove absolute regularity (β-mixing) for nonstationary and multivariate versions
of two popular classes of integer-valued processes. We show how this result can be
used to prove asymptotic normality of a least squares estimator of the model
parameter.
1. Introduction
Stationary time series models of counts have been extensively studied in the litera-
ture. In particular, many autoregressive models have been developed. See for instance
Al-Osh and Alzaid (1987) and Al-Osh and Alzaid (1990) for INAR processes, Zeger
and Qaqish (1988), or Davis, Dunsmuir and Streett (2003), Ferland, Latour, and
Oraichi (2006) or Fokianos, Rahbek, and Tjøstheim (2009) for Poisson autoregressive
models and the negative binomial INGARCH introduced in Zhu (2011). Though
most of the models discussed in the literature are univariate, a few contributions also
considered multivariate time series of counts. See in particular Latour (1997) for
a multivariate extension of INAR processes, Jørgensen et al. (1999) for state-space
models or Fokianos et al. (2020) for a multivariate extension of Poisson autoregressive
models. See also Debaly and Truquet (2019) for some conditions ensuring existence
of stationary solutions for some of these models. We refer the reader to Fokianos
(2021) for a survey about some existing attempts to model multivariate time series
of counts. In a nonstationary framework, see also the recent contribution of Wang
and Wang (2018) for a multivariate time series model with common factors.
More recent contributions considered some stationary versions of the models dis-
cussed above but defined conditionally on some exogenous covariates. See for instance
Agosto et al. (2016), Aknouche and Francq (2021) or Debaly and Truquet (2021) in
the context of univariate time series models of counts and Debaly and Truquet (2023)
for generic multivariate time series models. See also Doukhan, Neumann, Truquet
(2020) for very general model specifications with strictly exogenous covariates.
Most of the references mentioned above focus on stationarity and ergodicity prop-
erties of the corresponding autoregressive models, since these key stability properties
are often sufficient to derive consistency and asymptotic normality of maximum
likelihood estimators. However, such a stationary modeling framework is only relevant
when the covariates themselves exhibit stationary behavior. When those covariates are
not time-stationary, possibly with explosive behavior such as a polynomial trend, the
model is inherently non-stationary, with possibly explosive moments. Statistical inference
of autoregressive parameters is then more complicated in this setting. Recently,
Doukhan, Leucht, and Neumann (2022) derived mixing properties of a Poisson IN-
GARCH model with explosive covariates and applied their results to statistical testing
for existence of a trend in the intensity of the process. We complement their results
about mixing properties by considering INGARCH and INAR models, both in the
univariate and in the multivariate case. Additionally, we apply our mixing proper-
ties to study the asymptotic properties of the least squares estimator in an INAR(1)
model in which an additive explosive sequence of Poissonian covariates is incorporated
in the dynamics. Non-standard fast rates of convergence are obtained,
depending on the growth of the intensity of the Poisson covariate. Our contribution
justifies the importance of studying non-stationary models in the context of count data.
Indeed, in contrast to the case of continuous data modelled by linear models, it is not
possible to detrend the data by differencing or to decompose the response as the sum
of a non-stationary and a stationary component. In this sense, working with
a non-stationary count process requires specific attention and a careful analysis of
the asymptotic properties of some classical estimators. Our contribution is one step
in this direction.
The paper is organized as follows. In Section 2, we give our main results for
absolute regularity of two multivariate time series models of counts, a multivariate
version of INGARCH model and a multivariate INAR model with non-stationary
covariates. We apply our results to statistical inference in a non-stationary INAR
model in Section 3, followed by a numerical analysis and an application to real
data. Proofs of our results are postponed to the last section of the paper.
2. Main results
2.1. Two models for multivariate count processes. We consider the following
two classes of processes:
1) Multivariate Poisson-INGARCH(1,1)
We assume that (Xt )t∈N0 is a d-variate process on (Ω, F, P ) obeying the model equa-
tions
Xt ∣ Ft−1 ∼ Poi(λt ), (2.1a)
λt = Aλt−1 + BXt−1 + Zt−1 (2.1b)
for all t ∈ N, where Fs = σ(λ0 , X0 , Z0 , . . . , Xs , Zs ) and (Zt )t∈N0 is a sequence of inde-
pendent covariates, Zt being independent of Ft−1 and Xt . For λt = (λt,1 , . . . , λt,d )T ,
Poi(λt ) is a d-dimensional Poisson distribution with independent components and
respective intensities λt,1 , . . . , λt,d . A and B are (d × d)-matrices with non-negative
entries.
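For concreteness, the recursion (2.1a)-(2.1b) can be simulated directly. The following Python sketch generates one trajectory; the matrices A and B (chosen so that ρ(√A + 2√B) < 1, with entrywise square roots) and the trending Poisson covariate are illustrative choices of ours, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ingarch(A, B, Z, lam0, rng):
    """One trajectory of (2.1a)-(2.1b):
    X_t | F_{t-1} ~ Poi(lambda_t),  lambda_t = A lambda_{t-1} + B X_{t-1} + Z_{t-1}."""
    T, d = Z.shape
    lam = np.empty((T, d))
    X = np.empty((T, d), dtype=np.int64)
    lam[0] = lam0
    X[0] = rng.poisson(lam[0])
    for t in range(1, T):
        lam[t] = A @ lam[t - 1] + B @ X[t - 1] + Z[t - 1]
        X[t] = rng.poisson(lam[t])          # independent components given lam[t]
    return X, lam

d, T = 2, 200
# illustrative parameters, small enough that rho(sqrt(A) + 2 sqrt(B)) < 1
A = np.array([[0.05, 0.01],
              [0.00, 0.05]])
B = np.array([[0.02, 0.00],
              [0.01, 0.02]])
# trending covariate, e.g. Z_{t,i} ~ Poi(gamma_{t,i}) with gamma_{t,i} = 1 + t/50
Z = rng.poisson(1.0 + np.arange(T)[:, None] / 50.0, size=(T, d))
X, lam = simulate_ingarch(A, B, Z, lam0=np.ones(d), rng=rng)
```

The trend enters only through the exogenous sequence Z, exactly as in the discussion below equation (2.1b).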
2) Multivariate GINAR(1)
Multivariate generalized INAR (GINAR) processes were introduced by Latour (1997).
In this case we assume that (X_t)_{t∈N₀} is a d-variate process whose components
follow the equations

X_{t,i} = ∑_{j=1}^d ∑_{s=1}^{X_{t−1,j}} Y^{i,j}_{t,s} + Z_{t−1,i},   i = 1, . . . , d,  t ∈ N,   (2.2a)
(λ0,1 , . . . , λ0,d )T , initial values of the count variables X0,1 , . . . , X0,d , and an initial
vector of innovations Z0 , respectively.
We intend to prove absolute regularity for the count process (Xt )t∈N0 , where we
allow in particular an arbitrarily strong trend. We make use of some sort of con-
traction in the transition mechanism which will follow if the entries of the respective
matrices A and B are sufficiently small. In contrast, the exogenous variables Zt,i may
cause a trend. Natural choices for their distributions are
Zt,i ∼ Bin(Nt,i , p) or Zt,i ∼ Poi(γt,i ) or simply Zt,i = Nt,i ,
where the constants Nt,i and γt,i may increase without bound as t tends to infinity. In
the special case of a univariate Poisson-INGARCH(1,1) process with trend, mixing
properties were derived in Doukhan, Leucht, and Neumann (2022, Section 3.1).
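The thinning mechanism in (2.2a) is easy to simulate in the special case of Bernoulli counting sequences, where the inner sum reduces to a binomial thinning Bin(X_{t−1,j}, B_{ij}). The matrix B and the covariate law in this sketch are illustrative choices, not those used later in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ginar(B, Z, X0, rng):
    """GINAR(1) with Bernoulli counting sequences:
    X_{t,i} = sum_j Bin(X_{t-1,j}, B_ij) + Z_{t-1,i}."""
    T, d = Z.shape
    X = np.empty((T, d), dtype=np.int64)
    X[0] = X0
    for t in range(1, T):
        # rng.binomial broadcasts: entry (i, j) is Bin(X[t-1, j], B[i, j])
        X[t] = rng.binomial(X[t - 1], B).sum(axis=1) + Z[t - 1]
    return X

d, T = 2, 300
B = np.array([[0.3, 0.1],
              [0.0, 0.4]])           # illustrative, rho(B) < 1
Z = rng.poisson(2.0, size=(T, d))    # non-explosive covariate for this demo
X = simulate_ginar(B, Z, X0=np.array([5, 5]), rng=rng)
```

With a non-explosive covariate as here, the time average of X settles near the stationary mean (I − B)^{−1} E Z; replacing Z by a trending sequence produces the explosive regime studied below.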
Suppose that (X̃_t)_{t∈N₀} and (X̃′_t)_{t∈N₀}, defined on a common probability space (Ω̃, F̃, P̃),
are any two versions of (X_t)_{t∈N₀}. Then

β^X(k, n) ≤ Ẽ[ sup_{C∈σ(C)} ∣ P̃((X̃_{k+n}, X̃_{k+n+1}, . . .) ∈ C ∣ X̃_0, . . . , X̃_k)
        − P̃((X̃′_{k+n}, X̃′_{k+n+1}, . . .) ∈ C ∣ X̃′_0, . . . , X̃′_k) ∣ ]
  ≤ P̃(X̃_{k+n+r} ≠ X̃′_{k+n+r} for some r ∈ N₀)
  = P̃(X̃_{k+n} ≠ X̃′_{k+n})
    + ∑_{r=1}^∞ P̃(X̃_{k+n+r} ≠ X̃′_{k+n+r}, X̃_{k+n+r−1} = X̃′_{k+n+r−1}, . . . , X̃_{k+n} = X̃′_{k+n}).   (2.4)
(In the second line of this display, σ(C) denotes the σ-algebra generated by the
cylinder sets.) In the special case when (Xt )t∈N0 is a Markov chain, we obtain the
following simpler estimate for the mixing coefficient:
β^X(k, n) = β(σ(X_k), σ(X_{k+n})) ≤ P̃(X̃_{k+n} ≠ X̃′_{k+n}).   (2.5)
Phase 2:
Note that the first term on the right-hand side of (2.4) can be bounded by
Ẽ d_{TV}(P̃^{X̃_{k+n} ∣ λ̃_{k+n}}, P̃^{X̃′_{k+n} ∣ λ̃′_{k+n}}) = Ẽ d_{TV}(Poi(λ̃_{k+n}), Poi(λ̃′_{k+n})). Furthermore, since the
components of X_{k+n} are conditionally independent, we may use the estimate

d_{TV}(Poi(λ̃_{k+n}), Poi(λ̃′_{k+n})) ≤ ∑_{i=1}^d d_{TV}(Poi(λ̃_{k+n,i}), Poi(λ̃′_{k+n,i})).
Specifically, for λ, λ′ ≥ 0 we use the bound

d_{TV}(Poi(λ), Poi(λ′)) ≤ √(2/e) ∣√λ − √λ′∣,   (2.7)

where e is Euler's number; see Roos (2003, formula (5)) or Exercise 9.3.5(b) in Daley
and Vere-Jones (1988, page 300). It turns out that this estimate is also suitable in
case of a strong trend and it is therefore used in our proofs. In view of (2.7), we shall
couple the variables X̃_k, Z̃_k, . . . , X̃_{k+n−1}, Z̃_{k+n−1} with their corresponding counterparts
X̃′_k, Z̃′_k, . . . , X̃′_{k+n−1}, Z̃′_{k+n−1} such that Ẽ∥√λ̃_{k+n} − √λ̃′_{k+n}∥₁ gets small as n grows. Since
the covariate Z_t is independent of F_{t−1}, we choose, for t = k, . . . , k + n − 1, Z̃_t and Z̃′_t such
that they are equal. For the count variables we apply a (step-wise) maximal coupling;
see Lemma 5.1 below. This implies in particular that the difference between X̃_t and
X̃′_t also gets small,

Ẽ∥√λ̃_{k+n} − √λ̃′_{k+n}∥₁ ≤ ∥(√A + 2√B)^n∥₁ Ẽ∥√λ̃_k − √λ̃′_k∥₁ = O(∥(√A + 2√B)^n∥₁),

and it follows from our spectral radius assumption ρ(√A + 2√B) < 1 that the right-hand
side decays exponentially fast.
Phase 3:
In the third phase, the focus is clearly on making P̃(X̃_{k+n+r} ≠ X̃′_{k+n+r} for some r ≥ 0)
small. For given λ̃_{k+n} and λ̃′_{k+n}, we apply a maximal coupling to the corresponding
components of X̃_{k+n} and X̃′_{k+n}, and we obtain

P̃(X̃_{k+n} ≠ X̃′_{k+n} ∣ λ̃_{k+n}, λ̃′_{k+n}) ≤ ∑_{i=1}^d P̃(X̃_{k+n,i} ≠ X̃′_{k+n,i} ∣ λ̃_{k+n}, λ̃′_{k+n}),

and so

P̃(X̃_{k+n} ≠ X̃′_{k+n}) ≤ ∑_{i=1}^d √(2/e) Ẽ∣√λ̃_{k+n,i} − √λ̃′_{k+n,i}∣ = O(∥(√A + 2√B)^n∥₁ Ẽ∥√λ̃_k − √λ̃′_k∥₁).

Similarly, for r ≥ 1,

P̃(X̃_{k+n+r,i} ≠ X̃′_{k+n+r,i}, X̃_{k+n} = X̃′_{k+n}, . . . , X̃_{k+n+r−1} = X̃′_{k+n+r−1})
  = O(∥(√A + 2√B)^n∥₁ ∥(√A)^r∥₁ Ẽ∥√λ̃_k − √λ̃′_k∥₁).

To summarize, we obtain that

P̃(X̃_{k+n} ≠ X̃′_{k+n}) + ∑_{r=1}^∞ P̃(X̃_{k+n+r} ≠ X̃′_{k+n+r}, X̃_{k+n} = X̃′_{k+n}, . . . , X̃_{k+n+r−1} = X̃′_{k+n+r−1})
  = O(∥(√A + 2√B)^n∥₁ ∑_{r=0}^∞ ∥(√A)^r∥₁)
  = O(∥(√A + 2√B)^n∥₁).
In view of this discussion, we impose the following condition. In what follows, we
denote by ρ(C) the spectral radius of a square matrix C.
(A1) (i) ρ(√A + 2√B) < 1,
     (ii) E√λ_{0,i} < ∞ (i = 1, . . . , d),
     (iii) sup_t E∣√Z_{t,i} − E√Z_{t,i}∣ < ∞ (i = 1, . . . , d).
Now we are in a position to state our first main result.
Theorem 2.1. Suppose that (2.1a), (2.1b), and (A1) are fulfilled. Then the count
process (Xt )t∈N0 is absolutely regular and the coefficients of absolute regularity satisfy
β^X(n) = O(κ^n),

for any κ > ρ(√A + 2√B).
estimate of the total variation between two such distributions in terms of the square
roots of the intensities. Moreover, our sufficient condition for the mixing properties
of this model remains stronger than the optimal stationarity condition ρ(A + B) < 1
given in Debaly and Truquet (2019). These restrictions are mainly due to our proof
and assumptions allowing explosive covariates in the model. When the covariates
are non-stationary but with a non-explosive mean, one can derive mixing properties
of the model without conditional independence or parameter restrictions. We give a
result below.
In the following result, we still consider model (2.1b) but now, for λ ∈ R^d_+, Poi(λ)
coincides with the probability distribution of the vector (N^{(1)}_{λ₁}, . . . , N^{(d)}_{λ_d}) where, for
i = 1, . . . , d, (N^{(i)}_y)_{y≥0} is a Poisson process with intensity 1. We then follow the general
construction of a multivariate count distribution with Poisson marginals presented in
Fokianos et al. (2020).
Theorem 2.2. Suppose that ρ(A + B) < 1, Eλ0,i < ∞ and supt∈N E∣Zt,i ∣ < ∞ for
i = 1, . . . , d. Then the count process (Xt )t∈N0 is absolutely regular and the coefficients
of absolute regularity satisfy
β X (n) = O(κn ),
for any κ > ρ (A + B).
Remark 3. One can show directly that the condition ρ(A + B) < 1 is weaker than
(A1)(i). Recalling Gelfand's formula for the spectral radius, ρ(C) = lim_{n→∞} ∥C^n∥₁^{1/n},
when C and D are two square matrices of size d × d with non-negative coefficients,
we have ρ(C) ≤ ρ(D) as soon as C_{ij} ≤ D_{ij} for 1 ≤ i, j ≤ d. Moreover, with √C denoting
the entrywise square root,

√ρ(C) = lim_{n→∞} ∥C^n∥₁^{1/2n} ≤ lim_{n→∞} ∥(√C)^n∥₁^{1/n} = ρ(√C).

For the matrices A and B defining our model, we then deduce that

√ρ(A + B) ≤ ρ(√A + √B) ≤ ρ(√A + 2√B).

When the covariates are integrable and non-explosive, we thus obtain absolute
regularity of the model under much less restrictive assumptions.
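These spectral-radius comparisons are easy to sanity-check numerically. The sketch below (our own illustration, with entrywise square roots and random non-negative matrices of arbitrary size 3) verifies the chain √ρ(A+B) ≤ ρ(√A + √B) ≤ ρ(√A + 2√B):

```python
import numpy as np

rng = np.random.default_rng(2)

def rho(M):
    """Spectral radius of a square matrix."""
    return np.abs(np.linalg.eigvals(M)).max()

# Check  sqrt(rho(A+B)) <= rho(sqrt(A)+sqrt(B)) <= rho(sqrt(A)+2*sqrt(B))
# on random non-negative matrices; square roots are taken entrywise.
for _ in range(100):
    A = rng.uniform(0.0, 0.3, size=(3, 3))
    B = rng.uniform(0.0, 0.3, size=(3, 3))
    lhs = np.sqrt(rho(A + B))
    mid = rho(np.sqrt(A) + np.sqrt(B))
    rhs = rho(np.sqrt(A) + 2.0 * np.sqrt(B))
    assert lhs <= mid + 1e-9 and mid <= rhs + 1e-9
```

The monotonicity of ρ in the entries (Perron–Frobenius) is what makes the second inequality immediate.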
Now we turn to the GINAR model (2.2a), (2.2b). In this case, the count process
(Xt )t∈N0 is Markovian, which means that the coefficients of absolute regularity can
be estimated according to (2.5). Here we impose the following condition.
(A2) (i) ρ(√B) < 1/2,
     (ii) E√X_{0,i} < ∞ (i = 1, . . . , d),
     (iii) sup_t E∣√Z_{t,i} − E√Z_{t,i}∣ < ∞ (i = 1, . . . , d).
Now we can state our second main result.
Theorem 2.3. Suppose that (2.2a), (2.2b), and (A2) are fulfilled. Then the count
process (Xt )t∈N0 is absolutely regular and the coefficients of absolute regularity satisfy
β^X(n) = O(κ^n),

for any κ > 2ρ(√B).
As for the Poisson autoregressive model, we also give a result for non-explosive
covariates with a less restrictive condition on the parameter space.
Theorem 2.4. Suppose that ρ(B) < 1 and, for 1 ≤ i ≤ d, E∣X_{0,i}∣ < ∞ and sup_{t≥0} E∣Z_{t,i}∣ < ∞.
Then, for any κ > ρ(B),

β^X(n) = O(κ^n).
Proofs of Theorems 2.1 to 2.4 are given in Subsections 4.1 to 4.4, respectively.
3. An application in statistics
In this section we apply our results to prove asymptotic normality of a least squares
estimator of the parameter in non-stationary Poisson-INARCH(1) and GINAR(1)
models. To simplify matters and in order to avoid any kind of high-level assumptions,
we focus on the special cases where in (2.1b) A = 0_{d×d} and in (2.1b) and (2.2b)
B = Diag(b₁, . . . , b_d). This means that the parameter b_i can be estimated on the basis
of the ith components of the processes (Xt )t∈N0 and (Zt )t∈N0 alone. The simplification
allows us to switch to the univariate case.
Suppose that the random variables X0 , Z0 , X1 , Z1 , . . . follow the model equations
X_t = b ○ X_{t−1} + Z_{t−1},   t ∈ N,   (3.1)

where b ○ X_{t−1} can be represented as ∑_{s=1}^{X_{t−1}} Y_{t,s}, and {Y_{t,s} ∶ t, s ∈ N} is a double array
Now we can draw a conclusion about the growth of the count variables Xt . Let
εt = b ○ Xt−1 − b Xt−1 . A repeated application of our model equation (3.1) leads to
X_t = b X_{t−1} + ε_t + Z_{t−1}
    = b {b X_{t−2} + ε_{t−1} + Z_{t−2}} + ε_t + Z_{t−1}
    = . . . = b^t X₀ + ∑_{s=0}^{t−1} b^s γ_{t−s−1} + ∑_{s=0}^{t−1} b^s (ε_{t−s} + Z_{t−s−1} − γ_{t−s−1}).   (3.2)

Therefore,

E X_t = b^t X₀ + ∑_{s=0}^{t−1} b^s γ_{t−s−1}
      = b^t X₀ + γ_t { (1 − b^t)/(1 − b) − ∑_{s=0}^{t−1} b^s (1 − γ_{t−s−1}/γ_t) }
      = γ_t/(1 − b) (1 + o(1)),   (3.3)

that is, the growth of the X_t is determined by the sequence (γ_t)_{t∈N₀}.
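The first-moment recursion behind (3.3), E X_t = b E X_{t−1} + γ_{t−1}, can be iterated numerically. With the illustrative choices b = 0.3, X₀ = 1 and trend γ_t = t² (ours, for demonstration only), the ratio E X_t /(γ_t/(1−b)) indeed approaches 1:

```python
# First-moment recursion E X_t = b E X_{t-1} + gamma_{t-1} implied by (3.2),
# iterated with illustrative values b = 0.3, X_0 = 1 and trend gamma_t = t^2.
b = 0.3
m = 1.0                      # m holds E X_t, starting from X_0 = 1
T = 2000
for t in range(1, T + 1):
    m = b * m + float(t - 1) ** 2

ratio = m / (float(T) ** 2 / (1.0 - b))   # (3.3) predicts ratio -> 1
```

The correction terms b^t X₀ and ∑ b^s (1 − γ_{t−s−1}/γ_t) are of lower order, which is why the ratio converges.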
i.e., the Lindeberg condition is satisfied. Using (3.6) to (3.8) we obtain by a central
limit theorem for sums of martingale differences (see e.g. Corollary 3.8 in McLeish
(1974) or Theorem 2 in conjunction with Lemma 2 in Brown (1971)) that

(1/√s_n) ∑_{t=1}^n X_{t−1} ε_t = ∑_{t=1}^n Z_{n,t} →^d N(0, ν/(1 − b)³).   (3.9)

(3.5a) and (3.9) lead to the following result.
We see that the rate of convergence of b̂_n is √s_n/r_n. In the special cases mentioned
in the remark above, it is n^{−(κ+1)/2} if γ_{t−1} = d t^κ and (n ln(n))^{−1/2} if γ_{t−1} = d ln(t). In
order to establish an asymptotic confidence interval for b, it remains to estimate the
norming constants r_n and s_n. (3.5a), (3.5b), and Proposition 3.1 imply that

(1 − b̂_n)² ∑_{t=1}^n X²_{t−1} →^P r_n   and   (1 − b̂_n)³ ∑_{t=1}^n X³_{t−1} →^P s_n.

This yields the asymptotic confidence interval

[b̂_n − Φ^{−1}(1 − α/2) K_n, b̂_n + Φ^{−1}(1 − α/2) K_n],

where K_n = √(b̂_n) √(∑_{t=1}^n X³_{t−1}) / ∑_{t=1}^n X²_{t−1} in case of the Poisson-INGARCH model and
K_n = √(b̂_n (1 − b̂_n)) √(∑_{t=1}^n X³_{t−1}) / ∑_{t=1}^n X²_{t−1} for the Binomial-INARCH model.
3.2. Numerical studies. We present a small numerical study to illustrate the asymptotic
properties of the least squares estimator for the st-CAR model provided by Proposition
3.1. To do so, we consider three values of the parameter b: 0.1, 0.16 and 0.23. For each
value, we consider three different growth rates, γ_{t−1} = t, γ_{t−1} = t² and γ_{t−1} = ln t, and
simulate trajectories of length n = 50, n = 100 and n = 1000 for the Poisson-INGARCH
(abbreviated as INGARCH) and Binomial-INARCH (abbreviated as INARCH) models.
We repeat this process B = 1,000 times and compute the least squares estimator of b,
its average value (line LSE in Table 1), and the average value of K_n for the
Poisson-INGARCH (line K_n-INGARCH in Table 1) and Binomial-INARCH (line
K_n-BINARCH in Table 1) models. The simulation results look very promising, even
for the moderate sample size n = 50. As expected, the value of K_n is smaller for γ_{t−1} = t²
than for γ_{t−1} = t, which is itself smaller than for γ_{t−1} = ln t, regardless of the
conditional distribution of the count process.
3.3. Real data example. As an illustration, we analyse the number of questions
about scrapy, a Python framework for web scraping. We consider the number
of questions about natural language processing (NLP) as a predictor. NLP
entered a new era around 2010 with the development of neural network models.
BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art
model for NLP developed by Google in 2018. More and more NLP projects are based
on internet content, such as sentiment analysis, due to the heavy use of social media.
For data harvesting and processing purposes, many libraries such as Requests or
BeautifulSoup have emerged in Python. Beyond these, scrapy represents a whole
framework, meaning it comes with a set of rules and conventions that allow common
web scraping problems to be solved efficiently.
Table 1: Average least squares estimate (LSE) of b and average values of K_n over B = 1,000 replications.

                     INGARCH                  INARCH
γ_{t−1}           t      t²     ln t       t      t²     ln t
n = 50 b = 0.1 LSE 0.0996 0.1000 0.0975 0.0998 0.0999 0.0989
Kn -INGARCH 0.0090 0.0016 0.0262 0.0090 0.0016 0.0263
Kn -BINARCH 0.0085 0.0015 0.0249 0.0085 0.0015 0.0250
b = 0.16 LSE 0.1597 0.1600 0.1567 0.1597 0.1600 0.1571
Kn -INGARCH 0.0110 0.0019 0.0320 0.0110 0.0019 0.0321
Kn -BINARCH 0.0101 0.0018 0.0293 0.0101 0.0018 0.0294
b = 0.23 LSE 0.2299 0.2300 0.2253 0.2295 0.2301 0.2289
Kn -INGARCH 0.0126 0.0022 0.0365 0.0126 0.0022 0.0368
Kn -BINARCH 0.0111 0.0020 0.0321 0.0111 0.0020 0.0322
n = 100 b = 0.1 LSE 0.100 0.1000 0.0999 0.1000 0.1000 0.0994
Kn -INGARCH 0.0045 0.0006 0.0170 0.0045 0.0006 0.0169
Kn -BINARCH 0.0043 0.0005 0.0161 0.0043 0.0005 0.0160
b = 0.16 LSE 0.1599 0.1600 0.1585 0.1600 0.1600 0.1592
Kn -INGARCH 0.0055 0.0007 0.0206 0.0055 0.0007 0.0206
Kn -BINARCH 0.0050 0.0006 0.0189 0.0050 0.0006 0.0189
b = 0.23 LSE 0.2298 0.2300 0.2276 0.2300 0.2300 0.2292
Kn -INGARCH 0.0063 0.0008 0.0235 0.0063 0.0008 0.0235
Kn -BINARCH 0.0055 0.0007 0.0207 0.0055 0.0007 0.0206
n = 1000 b = 0.1 LSE 0.1000 0.1000 0.1002 0.1000 0.1000 0.0999
Kn -INGARCH 0.0005 0.0000 0.0041 0.0005 0.0000 0.0041
Kn -BINARCH 0.0004 0.0000 0.0039 0.0004 0.0000 0.0039
b = 0.16 LSE 0.1600 0.1600 0.1598 0.1600 0.1600 0.1600
Kn -INGARCH 0.0006 0.0000 0.0050 0.0006 0.0000 0.0050
Kn -BINARCH 0.0005 0.0000 0.0046 0.0005 0.0000 0.0046
b = 0.23 LSE 0.2300 0.2300 0.2298 0.2300 0.2300 0.2300
Kn -INGARCH 0.0006 0.0000 0.0058 0.0006 0.0000 0.0058
Kn -BINARCH 0.0006 0.0000 0.0051 0.0006 0.0000 0.0051
For this small data analysis study, we downloaded the monthly numbers of various
questions about NLP and scrapy on stackoverflow, the largest online community
for programmers to learn and share their knowledge. The data are available
at https://www.kaggle.com/datasets/aishu200023/stackindex. They were col-
lected between January 2009 and December 2019 (see Fig. 1). The estimated value of
the parameter b is 0.23269, with K_n-INGARCH = 0.00453 and K_n-BINARCH = 0.00397.
Proof of Lemma 4.1. Recall that λ_{t,i} = ∑_{j=1}^d A_{ij} λ_{t−1,j} + ∑_{j=1}^d B_{ij} X_{t−1,j} + Z_{t−1,i}. For t ∈ N,
i ∈ {1, . . . , d}, we split up

∣√λ̃_{t,i} − √λ̃′_{t,i}∣
  ≤ ∑_{j=1}^d ∣√((A_{ij} + B_{ij}) λ̃_{t−1,j}) − √((A_{ij} + B_{ij}) λ̃′_{t−1,j})∣
    + ∑_{j=1}^d { ∣√(A_{ij} λ̃_{t−1,j} + B_{ij} X̃_{t−1,j}) − √((A_{ij} + B_{ij}) λ̃_{t−1,j})∣
        + ∣√(A_{ij} λ̃′_{t−1,j} + B_{ij} X̃′_{t−1,j}) − √((A_{ij} + B_{ij}) λ̃′_{t−1,j})∣ }
    + ∣√Z̃_{t−1,i} − √Z̃′_{t−1,i}∣
  ≤ ∑_{j=1}^d (√A_{ij} + √B_{ij}) ∣√λ̃_{t−1,j} − √λ̃′_{t−1,j}∣
    + ∑_{j=1}^d √B_{ij} (∣√X̃_{t−1,j} − √λ̃_{t−1,j}∣ + ∣√X̃′_{t−1,j} − √λ̃′_{t−1,j}∣)
    + ∣√Z̃_{t−1,i} − √Z̃′_{t−1,i}∣
  =∶ R^{(i)}_{t,1} + R^{(i)}_{t,2} + R^{(i)}_{t,3},   (4.1)
say. Since, for X ∼ Poi(λ), E∣√X − √λ∣ ≤ E∣X − λ∣/√λ ≤ √(E(X − λ)²/λ) = 1, we
obtain

Ẽ(R^{(i)}_{t,2} ∣ λ̃_{t−1}, λ̃′_{t−1})
  ≤ ∑_{j=1}^d √B_{ij} { Ẽ(∣√X̃_{t−1,j} − √λ̃_{t−1,j}∣ ∣ λ̃_{t−1}, λ̃′_{t−1}) + Ẽ(∣√X̃′_{t−1,j} − √λ̃′_{t−1,j}∣ ∣ λ̃_{t−1}, λ̃′_{t−1}) }
  ≤ 2 ∑_{j=1}^d √B_{ij}.   (4.2)
Finally, we have

Ẽ(R^{(i)}_{t,3} ∣ Z̃_{t−1}, Z̃′_{t−1}) = Ẽ∣√Z̃_{t−1,i} − √Z̃′_{t−1,i}∣ ≤ 2 E∣√Z_{t−1,i} − E√Z_{t−1,i}∣.   (4.3)
Lemma 4.2. Suppose that (2.1a), (2.1b), and (A1) are fulfilled. Then there exists a
coupling such that

(i) Ẽ∥√λ̃_{k+n} − √λ̃′_{k+n}∥₁ ≤ ∥(√A + 2√B)^n∥₁ Ẽ∥√λ̃_k − √λ̃′_k∥₁
and
P̃(X̃_{k+n} ≠ X̃′_{k+n}) = O(∥(√A + 2√B)^n∥₁).

(ii) For r ≥ 1,

Ẽ[∥√λ̃_{k+n+r} − √λ̃′_{k+n+r}∥₁ 1(X̃_{k+n} = X̃′_{k+n}, . . . , X̃_{k+n+r−1} = X̃′_{k+n+r−1})]
  ≤ ∥(√A)^r∥₁ ∥(√A + 2√B)^n∥₁ Ẽ∥√λ̃_k − √λ̃′_k∥₁
and
P̃(X̃_{k+n+r} ≠ X̃′_{k+n+r}, X̃_{k+n} = X̃′_{k+n}, . . . , X̃_{k+n+r−1} = X̃′_{k+n+r−1})
  = O(∥(√A)^r∥₁ ∥(√A + 2√B)^n∥₁).
if λ̃_{t,j} ≤ λ̃′_{t,j}; see (ii) of Lemma 5.1 below. This allows us to apply Lemma 5.2 and we
obtain

Ẽ(∣√(A_{ij} λ̃_{t,j} + B_{ij} X̃_{t,j}) − √(A_{ij} λ̃′_{t,j} + B_{ij} X̃′_{t,j})∣ ∣ λ̃_t, λ̃′_t, λ̃_{t,j} ≥ λ̃′_{t,j})
  = Ẽ(√(A_{ij} λ̃_{t,j} + B_{ij} X̃_{t,j}) − √(A_{ij} λ̃′_{t,j} + B_{ij} X̃′_{t,j}) ∣ λ̃_t, λ̃′_t, λ̃_{t,j} ≥ λ̃′_{t,j})
  ≤ √A_{ij} (√λ̃_{t,j} − √λ̃′_{t,j}) + √B_{ij} Ẽ(√X̃_{t,j} − √X̃′_{t,j} ∣ λ̃_t, λ̃′_t, λ̃_{t,j} ≥ λ̃′_{t,j})
  ≤ (√A_{ij} + 2√B_{ij}) ∣√λ̃_{t,j} − √λ̃′_{t,j}∣

and, analogously,

Ẽ(∣√(A_{ij} λ̃_{t,j} + B_{ij} X̃_{t,j}) − √(A_{ij} λ̃′_{t,j} + B_{ij} X̃′_{t,j})∣ ∣ λ̃_t, λ̃′_t, λ̃_{t,j} ≤ λ̃′_{t,j})
  ≤ (√A_{ij} + 2√B_{ij}) ∣√λ̃_{t,j} − √λ̃′_{t,j}∣.
4.2. Proof of Theorem 2.2. Let k ∈ N. To define our coupling we consider a se-
quence of i.i.d. random variables J̃ ∶= (N^{(t)}, Z̃_t)_{t≥0} such that N^{(t)} = (N^{(t,1)}, . . . , N^{(t,d)})
is a multivariate point process taking values in N₀^d and such that, for 1 ≤ i ≤ d,
N^{(t,i)} = (N^{(t,i)}_z)_{z≥0} is a Poisson process with intensity 1. Moreover, Z̃_t is a copy of Z_t.
For λ ∈ R^d_+, we set N^{(t)}_λ = (N^{(t,1)}_{λ₁}, . . . , N^{(t,d)}_{λ_d}). We also consider an independent copy
J̃′ ∶= (N′^{(t)}, Z̃′_t)_{t≥0} of J̃. Finally, we consider two independent copies λ̃₀ and λ̃′₀ of λ₀,
independent of the pair (J̃, J̃′) and, for 0 ≤ t ≤ k, we set X̃_t = N^{(t)}_{λ̃_t}, X̃′_t = N′^{(t)}_{λ̃′_t} with,
for 1 ≤ t ≤ k,

λ̃_t = A λ̃_{t−1} + B X̃_{t−1} + Z̃_{t−1},   λ̃′_t = A λ̃′_{t−1} + B X̃′_{t−1} + Z̃′_{t−1}.
Setting b_i = sup_{t≥0} E Z_{t,i}, one can apply Lemma 5.5, which ensures that sup_{t≥1} ∥v_t∥₁ <
∞. The same property holds true if v_t is replaced with v′_t, where v′_{t,i} = Ẽ λ̃′_{t,i}, 1 ≤ i ≤ d.
We then get sup_{t≥0} E∥λ̃_t − λ̃′_t∥₁ < ∞. Moreover, for t ≥ k + 1, we have

P(X̃_t ≠ X̃′_t) ≤ ∑_{i=1}^d P(X̃_{t,i} ≠ X̃′_{t,i})
  ≤ ∑_{i=1}^d E∣X̃_{t,i} − X̃′_{t,i}∣
  = ∑_{i=1}^d E∣λ̃_{t,i} − λ̃′_{t,i}∣.
Setting wt,i = E ∣̃
λt,i − ̃
λ′t,i ∣ for 1 ≤ i ≤ d and t ≥ k + 1, we have
From Lemma 5.5, if κ ∈ (ρ(A + B), 1), there exists a positive constant α such that
∥wt ∥1 ≤ ακt−k ∥wk ∥1 . Since (ws )s≥0 is a bounded sequence, we get
4.3. Proof of Theorem 2.3. The proof of Theorem 2.3 is based on the following
two lemmas.
Lemma 4.3. Suppose that (2.2a), (2.2b), and (A2) are fulfilled. Let (X̃_k)_{k∈N₀} and
(X̃′_k)_{k∈N₀} be two independent copies of the count process. Then

sup_{k∈N₀} Ẽ∥√X̃_k − √X̃′_k∥₁ < ∞.
Proof of Lemma 4.3. Recall that X_{t,i} = ∑_{j=1}^d Y^{i,j}_t + Z_{t−1,i}, where Y^{i,j}_t = ∑_{s=1}^{X_{t−1,j}} Y^{i,j}_{t,s}. Fur-
thermore, let λ^{i,j}_t = E(Y^{i,j}_t ∣ X_{t−1}) = B_{ij} X_{t−1,j}. Then

∣√X̃_{t,i} − √X̃′_{t,i}∣ ≤ ∣√(∑_{j=1}^d λ̃^{i,j}_t) − √(∑_{j=1}^d λ̃^{i,j′}_t)∣
    + ∣√(∑_{j=1}^d Ỹ^{i,j}_t) − √(∑_{j=1}^d λ̃^{i,j}_t)∣ + ∣√(∑_{j=1}^d Ỹ^{i,j′}_t) − √(∑_{j=1}^d λ̃^{i,j′}_t)∣
    + ∣√Z̃_{t−1,i} − √Z̃′_{t−1,i}∣
  ≤ ∑_{j=1}^d ∣√λ̃^{i,j}_t − √λ̃^{i,j′}_t∣
    + ∑_{j=1}^d { ∣√Ỹ^{i,j}_t − √λ̃^{i,j}_t∣ + ∣√Ỹ^{i,j′}_t − √λ̃^{i,j′}_t∣ }
    + ∣√Z̃_{t−1,i} − √Z̃′_{t−1,i}∣
  = S^{(i)}_{t,1} + S^{(i)}_{t,2} + S^{(i)}_{t,3},   (4.4)

say.
We have

E(S^{(i)}_{t,1} ∣ X̃_{t−1}, X̃′_{t−1}) = ∑_{j=1}^d ∣√λ̃^{i,j}_t − √λ̃^{i,j′}_t∣ = ∑_{j=1}^d √B_{ij} ∣√X̃_{t−1,j} − √X̃′_{t−1,j}∣.   (4.5)
Since, for Y ∼ Bin(n, p), E∣√Y − √(np)∣ ≤ E∣Y − np∣/√(np) ≤ √(E(Y − np)²/np) = √(1 − p),
we obtain that

Ẽ(S^{(i)}_{t,2} ∣ X̃_{t−1}, X̃′_{t−1})
  ≤ ∑_{j=1}^d { Ẽ(∣√Ỹ^{i,j}_t − √λ̃^{i,j}_t∣ ∣ X̃_{t−1}, X̃′_{t−1}) + Ẽ(∣√Ỹ^{i,j′}_t − √λ̃^{i,j′}_t∣ ∣ X̃_{t−1}, X̃′_{t−1}) }
  ≤ 2 d.   (4.6)

Finally, we have

Ẽ(S^{(i)}_{t,3} ∣ Z̃_{t−1}, Z̃′_{t−1}) = Ẽ∣√Z̃_{t−1,i} − √Z̃′_{t−1,i}∣ ≤ 2 E∣√Z_{t−1,i} − E√Z_{t−1,i}∣.   (4.7)
Lemma 4.4. Suppose that (2.2a), (2.2b), and (A2) are fulfilled. Then there exists a
coupling such that

Ẽ∥√X̃_{k+n−1} − √X̃′_{k+n−1}∥₁ ≤ ∥(2√B)^{n−1}∥₁ Ẽ∥√X̃_k − √X̃′_k∥₁.

Proof of Lemma 4.4. We apply a step-wise maximal coupling, that is, we choose Z̃_t
and Z̃′_t such that they are equal, and Ỹ^{i,j}_t and Ỹ^{i,j′}_t are coupled such that
P̃(Ỹ^{i,j}_t ≠ Ỹ^{i,j′}_t ∣ X̃_{t−1}, X̃′_{t−1}) = d_{TV}(P^{Y^{i,j}_t ∣ X_{t−1} = X̃_{t−1}}, P^{Y^{i,j}_t ∣ X_{t−1} = X̃′_{t−1}}). As a by-product of the max-
imal coupling we obtain that Ỹ^{i,j}_t ≥ Ỹ^{i,j′}_t if λ̃^{i,j}_t ≥ λ̃^{i,j′}_t and, vice versa, Ỹ^{i,j}_t ≤ Ỹ^{i,j′}_t if
λ̃^{i,j}_t ≤ λ̃^{i,j′}_t; see (ii) of Lemma 5.1 below. Then, by Lemma 5.3,

Ẽ(∣√X̃_{t,i} − √X̃′_{t,i}∣ ∣ X̃_{t−1}, X̃′_{t−1}) ≤ ∑_{j=1}^d Ẽ(∣√Ỹ^{i,j}_t − √Ỹ^{i,j′}_t∣ ∣ X̃_{t−1}, X̃′_{t−1})
  ≤ ∑_{j=1}^d 2√B_{ij} ∣√X̃_{t−1,j} − √X̃′_{t−1,j}∣.

After integration, one can apply Lemma 5.5 with b = 0, C = 2√B and
ṽ_{t,i} = Ẽ(∣√X̃_{t,i} − √X̃′_{t,i}∣) to conclude. □
independent given X_{t−1}, and that ∑_{s=1}^{X_{t−1,j}} Y^{i,j}_{t,s} ∣ X_{t−1} ∼ Bin(X_{t−1,j}, B_{ij}). Then we
obtain from (2.5), Lemma 4.3, Lemma 4.4, and Lemma 5.4,

β^X(k, n) ≤ P̃(X̃_{k+n} ≠ X̃′_{k+n})
  ≤ ∑_{i=1}^d P̃(X̃_{k+n,i} ≠ X̃′_{k+n,i})
  ≤ O(Ẽ∥√X̃_{k+n−1} − √X̃′_{k+n−1}∥₁)
  = O(∥(2√B)^{n−1}∥₁).

We conclude by using the last assertion of Lemma 5.5. □
w_{t,i} ≤ ∑_{j=1}^d B_{ij} w_{t−1,j},

which leads to ∥w_t∥₁ ≤ ∥B^{t−k}∥₁ ∥w_k∥₁. Moreover, the sequence (w_k)_{k∈N₀} is bounded.
We conclude that

P(X̃_{k+n} ≠ X̃′_{k+n}) ≤ ∑_{i=1}^d P(X̃_{k+n,i} ≠ X̃′_{k+n,i})
  ≤ ∥w_{k+n}∥₁
  = O(∥B^n∥₁).

The end of the proof is similar to that of Theorem 2.2. □
First we derive upper estimates for the third term on the right-hand side of this
equation which show in particular that Xt is dominated by its non-stochastic part.
We have that

∥∑_{s=0}^{t−1} b^s (ε_{t−s} + Z_{t−s−1} − γ_{t−s−1})∥₂ = O(√γ_t).
Therefore, we obtain

∑_{t=1}^n E[X²_{t−1}] = X₀² + ∑_{t=1}^{n−1} (E X_t)² + O(∑_{t=1}^{n−1} γ_t)
  = r_n/(1 − b)² (1 + o(1)),

i.e., (3.4a) is proved.
Regarding higher moments, note first that, for X ∼ Bin(n, p),

E[(X − EX)⁴] = 3(np(1 − p))² + np(1 − p)(1 − 6p(1 − p))

(see e.g. Johnson, Kotz, and Kemp (1992, Ch. 3.2, p. 107)) and, for X ∼ Poi(λ),

E[(X − EX)⁴] = 3λ² + λ

(see e.g. Johnson, Kotz, and Kemp (1992, Ch. 4.3, p. 157)). This implies that

∥∑_{s=0}^{t−1} b^s (ε_{t−s} + Z_{t−s−1} − γ_{t−s−1})∥₄ = O(√γ_t).
Recall that the β-mixing coefficients serve as an upper bound for the α-mixing coeffi-
cients. By the covariance inequality for α-mixing random variables (see e.g. Doukhan
(1994, Thm. 3, Sect. 1.2.2)) we obtain, for arbitrary τ > 0,

E[(∑_{t=1}^n X̄²_{t−1} − ∑_{t=1}^n E X̄²_{t−1})²] = ∑_{s,t=1}^n cov(X̄²_{s−1}, X̄²_{t−1})
  ≤ 8 ∑_{s,t=1}^n α(∣s − t∣)^{τ/(2+τ)} ∥X̄²_{s−1}∥_{2+τ} ∥X̄²_{t−1}∥_{2+τ}
  = O(∑_{s=1}^n (γ²_{s−1} + γ_{s−1} n^{2δ}) (γ²_{n−1} + γ_{n−1} n^{2δ}))
(Note that our definition of the total variation norm differs from that in Lind-
vall (1992) by the factor 2.) To avoid trivialities, let P ≠ Q, which implies that
∆ ∶= d_{TV}(P, Q) > 0. We obtain the desired result of

P̃(X ≠ Y) = 1 − ∑_{k=0}^∞ P̃(X = Y = k) = ∆

if and only if

P̃((X, Y) = (k, k)) = p(k) ∧ q(k)   ∀k ∈ N₀,   (5.1a)

which is therefore defined this way. One possible choice of the remaining probabilities
is given by

P̃((X, Y) = (k, l)) = (p(k) − p(k) ∧ q(k)) (q(l) − p(l) ∧ q(l)) / ∆   ∀k ≠ l.   (5.1b)

Then

P̃(X ≥ K₀ > Y ∣ X ≠ Y) = 1,

and part (ii) follows. □
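The maximal coupling (5.1a)/(5.1b) is easy to implement for finitely supported distributions. This sketch (with illustrative three-point distributions and the convention d_TV = ½ ∑ |p − q|) builds the joint pmf matrix and checks that P̃(X ≠ Y) equals the total variation distance:

```python
import numpy as np

def maximal_coupling_matrix(p, q):
    """Joint pmf of (X, Y) with X ~ p, Y ~ q, built as in (5.1a)/(5.1b):
    diagonal mass p(k) ∧ q(k), off-diagonal mass proportional to the excesses."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = np.minimum(p, q)
    delta = 0.5 * np.abs(p - q).sum()          # d_TV(p, q), probability scale
    J = np.diag(m)
    if delta > 0:
        J += np.outer(p - m, q - m) / delta    # (5.1b)
    return J

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
J = maximal_coupling_matrix(p, q)
dtv = 0.5 * np.abs(p - q).sum()
mismatch = J.sum() - np.trace(J)               # P(X != Y)
```

By construction the row and column sums of J recover p and q, and the off-diagonal mass equals d_TV exactly, which is the defining property of a maximal coupling.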
Lemma 5.2. Let 0 ≤ λ < λ′, and let X ∼ Poi(λ), X′ ∼ Poi(λ′). Then

E√X′ − E√X ≤ 2(√λ′ − √λ).

Proof of Lemma 5.2. Let (X_ν)_{ν>0} be a collection of Poisson variates with respective
intensities ν. Then the function ν ↦ E√X_ν is differentiable and it holds that

d/dν E√X_ν = ∑_{k=1}^∞ √k d/dν {e^{−ν} ν^k/k!}
  = ∑_{k=1}^∞ √k (k/ν − 1) e^{−ν} ν^k/k!
  = ∑_{k=1}^∞ √k e^{−ν} ν^{k−1}/(k − 1)! − ∑_{k=1}^∞ √k e^{−ν} ν^k/k!
  = E√(X_ν + 1) − E√X_ν.   (5.2a)

Since the function x ↦ √(x + 1) is concave we obtain by Jensen's inequality that

E√(X_ν + 1) ≤ √(ν + 1).   (5.2b)

Furthermore, since E√X_ν = ∑_{k=1}^∞ √k e^{−ν} ν^k/k! = ν ∑_{l=0}^∞ (1/√(l + 1)) e^{−ν} ν^l/l! = ν E[1/√(X_ν + 1)]
and since the function x ↦ 1/√(x + 1) is convex we obtain again by Jensen's inequality
that

E√X_ν ≥ ν/√(E X_ν + 1) = ν/√(ν + 1).   (5.2c)

It follows from (5.2a) to (5.2c) that

d/dν E√X_ν ≤ √(ν + 1) − ν/√(ν + 1) = 1/√(ν + 1).

This implies

E√X′ − E√X = ∫_λ^{λ′} (d/du) E√X_u du ≤ ∫_λ^{λ′} (1/√u) du = 2(√λ′ − √λ). □
Furthermore,

E[1/√(X_n + 1)] = ∑_{k=0}^n (1/√(k + 1)) n!/(k! (n − k)!) p^k (1 − p)^{n−k}
  = ∑_{k=0}^n √(k + 1) n!/((n − k)! (k + 1)!) p^k (1 − p)^{(n+1)−(k+1)}
  = (1/(p(n + 1))) ∑_{l=1}^{n+1} √l (n + 1)!/(l! (n + 1 − l)!) p^l (1 − p)^{n+1−l}
  ≤ (1/(p(n + 1))) E√X_{n+1} ≤ 1/√(p(n + 1)),
k ↦ f(k; m, p)/f(k; n, p) = (m(m − 1)⋯(m − k + 1))/(n(n − 1)⋯(n − k + 1)) (1 − p)^{m−n}

and

d_{TV}(Bin(n, p), Bin(m, p)) = (1/2) ∑_{k=0}^n ∣f(k; n, p) − f(k; m, p)∣
  = ∑_{k∶ f(k;n,p)>f(k;m,p)} (f(k; n, p) − f(k; m, p))
  = ∑_{k=k₀(p)}^n (f(k; n, p) − f(k; m, p)).
Note that, for fixed m, n and k, the mapping p ↦ f(k; m, p)/f(k; n, p) is non-
decreasing, which implies that p ↦ k₀(p) is a non-decreasing and piecewise con-
stant function. Denote the discontinuity points of this function by p₁, . . . , p_K. For
p ∉ {p₁, . . . , p_K}, we have that

d/dp ∑_{k=k₀(p)}^n f(k; n, p)
  = ∑_{k=k₀(p)}^{n−1} { n!/((n − k)! (k − 1)!) p^{k−1} (1 − p)^{n−k} − n!/((n − (k + 1))! k!) p^k (1 − p)^{n−(k+1)} } + n p^{n−1}
  = (k₀(p)/p) (n!/((n − k₀(p))! k₀(p)!)) p^{k₀(p)} (1 − p)^{n−k₀(p)}

and, analogously,

d/dp ∑_{k=k₀(p)}^n f(k; m, p) = (k₀(p)/p) (m!/((m − k₀(p))! k₀(p)!)) p^{k₀(p)} (1 − p)^{m−k₀(p)}.

Hence,

d/dp d_{TV}(Bin(n, p), Bin(m, p)) = (k₀(p)/p) (f(k₀(p); n, p) − f(k₀(p); m, p)) > 0   ∀p ∉ {p₁, . . . , p_K}.
Then

v_t ⪯ ∑_{s=0}^{t−1} C^s b + C^t v₀,

where ⪯ denotes the coordinatewise ordering on R^d (i.e. v ⪯ v′ means that v_j ≤ v′_j for
1 ≤ j ≤ d). As a consequence, if the spectral radius ρ(C) < 1, then for any κ ∈ (ρ(C), 1)
there exists α > 0 such that

∥v_t∥₁ ≤ ∑_{s=0}^∞ ∥C^s∥₁ ∥b∥₁ + ∥C^t∥₁ ∥v₀∥₁ ≤ α (∑_{s=0}^∞ κ^s ∥b∥₁ + κ^t ∥v₀∥₁).

Proof. Using non-negativity of the coefficients and iterating the bound for the v_{t,i}'s, the
bound for v_t in the coordinatewise ordering is straightforward. The bound for ∥v_t∥₁
follows from the triangle inequality and Gelfand's formula ρ(C) = lim_{s→∞} ∥C^s∥₁^{1/s}.
□
References
Agosto, A., Cavaliere, G., Kristensen, D. and Rahbek, A. (2016). Modeling
corporate defaults: Poisson autoregressions with exogenous covariates (PARX).
Journal of Empirical Finance 38, 640–663.
Ahmad, A. and Francq, C. (2016). Poisson QMLE of count time series models.
Journal of Time Series Analysis 37(3), 291–314.
Aknouche, A. and Francq, C. (2021). Count and duration time series with equal
conditional stochastic and mean orders. Econometric Theory 37(2), 248–280.
Al-Osh, M. A. and Alzaid, A. A. (1987). First-order integer-valued autoregressive
(INAR(1)) process. Journal of Time Series Analysis 8(3), 261–275.
Kedem, B. and Fokianos, K. (2005). Regression Models for Time Series Analysis.
Wiley, New York.
Latour, A. (1997). The multivariate GINAR(p) process. Advances in Applied Prob-
ability 29(1), 228–248.
Lindvall, T. (1992). Lectures on the Coupling Method. Wiley, New York.
McLeish, D. L. (1974). Dependent central limit theorems and invariance principles.
Annals of Probability 2(4), 620–628.
Roos, B. (2003). Improvements in the Poisson approximation of mixed Poisson
distributions. Journal of Statistical Planning and Inference 113, 467–483.
Taniguchi, M. and Kakizawa, Y. (2012). Asymptotic Theory of Statistical In-
ference for Time Series. Springer Science & Business Media.
Teicher, H. (1954). On the multivariate Poisson distribution. Scandinavian Actu-
arial Journal 1954(1), 1–9.
Wang, F. and Wang, H. (2018). Modelling nonstationary multivariate time series
of counts via common factors. Journal of the Royal Statistical Society: Series B
(Statistical Methodology) 80(4), 769–791.
Weiß, C. H. (2018). An Introduction to Discrete-Valued Time Series. Wiley.
Zeger, S. L. (1988). A regression model for time series of counts. Biometrika 75(4),
621–629.
Zeger, S. L. and Qaqish, B. (1988). Markov regression models for time series: a
quasi-likelihood approach. Biometrics 44(4), 1019–1031.
Zhu, F. (2011). A negative binomial integer-valued GARCH model. Journal of Time
Series Analysis 32(1), 54–67.