Mixing Properties of Nonstationary Multivariate Count Processes


Mixing properties of nonstationary multivariate count

processes

Zinsou Max Debaly


Université du Québec en Abitibi-Témiscamingue, GREMA.
341 Rue Principale N,
Amos, QC J9T 2L8

Quebec, Canada
E-mail: debz01@uqat.ca

Michael H. Neumann
Friedrich-Schiller-Universität Jena
Institut für Mathematik
Ernst-Abbe-Platz 2
D – 07743 Jena
Germany
E-mail: michael.neumann@uni-jena.de

Lionel Truquet
ENSAI
Campus de Ker Lann
51 Rue Blaise Pascal
BP 37203
35172 BRUZ Cedex
France
E-mail: lionel.truquet@ensai.fr

Abstract
We prove absolute regularity (β-mixing) for nonstationary and multivariate versions
of two popular classes of integer-valued processes. We show how this result can be
used to prove asymptotic normality of a least squares estimator of an involved model
parameter.

2010 Mathematics Subject Classification: Primary 60G10; secondary 60J05.


Keywords and Phrases: absolute regularity, count processes, integer-valued processes,
mixing, multivariate processes, nonstationary processes.
Short title: Nonstationary multivariate count processes.

version: November 20, 2023



1. Introduction
Stationary time series models of counts have been extensively studied in the litera-
ture. In particular, many autoregressive models have been developed. See for instance
Al-Osh and Alzaid (1987) and Al-Osh and Alzaid (1990) for INAR processes; Zeger
and Qaqish (1988), Davis, Dunsmuir and Streett (2003), Ferland, Latour, and
Oraichi (2006), or Fokianos, Rahbek, and Tjøstheim (2009) for Poisson autoregressive
models; and Zhu (2011) for the negative binomial INGARCH. Though
most of the models discussed in the literature are univariate, a few contributions also
considered multivariate time series of counts. See in particular Latour (1997) for
a multivariate extension of INAR processes, Jørgensen et al. (1999) for state-space
models or Fokianos et al. (2020) for a multivariate extension of Poisson autoregressive
models. See also Debaly and Truquet (2019) for some conditions ensuring existence
of stationary solutions for some of these models. We refer the reader to Fokianos
(2021) for a survey about some existing attempts to model multivariate time series
of counts. In a nonstationary framework, see also the recent contribution of Wang
and Wang (2018) for a multivariate time series model with common factors.
More recent contributions considered some stationary versions of the models dis-
cussed above but defined conditionally on some exogenous covariates. See for instance
Agosto et al. (2016), Aknouche and Francq (2021) or Debaly and Truquet (2021) in
the context of univariate time series models of counts and Debaly and Truquet (2023)
for generic multivariate time series models. See also Doukhan, Neumann, Truquet
(2020) for very general model specifications with strictly exogenous covariates.
Most of the references mentioned above focus on stationarity and ergodicity prop-
erties of the corresponding autoregressive models, since these key stability properties
are often sufficient to derive consistency and asymptotic normality of maximum like-
lihood estimators. However, such a stationary modeling is only relevant when the
covariates themselves exhibit a stationary behavior. When those covariates are not
time-stationary, with a possibly explosive behavior such as a polynomial trend, the
model is by nature non-stationary with possibly explosive moments. Statistical inference
of autoregressive parameters is then more complicated in this setting. Recently,
Doukhan, Leucht, and Neumann (2022) derived mixing properties of a Poisson IN-
GARCH model with explosive covariates and applied their results to statistical testing
for existence of a trend in the intensity of the process. We complement their results
about mixing properties by considering INGARCH and INAR models, both in the
univariate and in the multivariate case. Additionally, we apply our mixing proper-
ties to study the asymptotic properties of the least squares estimator in an INAR(1)
model in which an additive explosive sequence of Poissonian covariates is incorpo-
rated in the dynamics. Non-standard fast rates of convergence are obtained,
depending on the growth of the intensity of the Poisson covariate. Our contribution
justifies the importance of studying non-stationary models in the context of count data.
Indeed, contrary to the case of continuous data modelled by linear models, it is not
possible to detrend the data by differencing or to decompose the response as a sum
of a non-stationary and a stationary component. In this sense, working with
a non-stationary count process requires specific attention and a careful analysis of

the asymptotic properties of some classical estimators. Our contribution is one step
in this direction.
The paper is organized as follows. In Section 2, we give our main results for
absolute regularity of two multivariate time series models of counts, a multivariate
version of the INGARCH model and a multivariate INAR model with non-stationary
covariates. We apply our results to statistical inference in a non-stationary INAR
model in Section 3, followed by a numerical analysis and an application to real
data. Proofs of our results are postponed to the last section of the paper.

2. Main results
2.1. Two models for multivariate count processes. We consider the following
two classes of processes:
1) Multivariate Poisson-INGARCH(1,1)
We assume that (Xt )t∈N0 is a d-variate process on (Ω, F, P ) obeying the model equa-
tions
Xt ∣ Ft−1 ∼ Poi(λt ), (2.1a)
λt = Aλt−1 + BXt−1 + Zt−1 (2.1b)
for all t ∈ N, where Fs = σ(λ0 , X0 , Z0 , . . . , Xs , Zs ) and (Zt )t∈N0 is a sequence of inde-
pendent covariates, Zt being independent of Ft−1 and Xt . For λt = (λt,1 , . . . , λt,d )T ,
Poi(λt ) is a d-dimensional Poisson distribution with independent components and
respective intensities λt,1 , . . . , λt,d . A and B are (d × d)-matrices with non-negative
entries.
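
For concreteness, the following minimal Python sketch simulates a bivariate path of (2.1a)-(2.1b); the matrices $A$ and $B$ and the covariate trend are hypothetical choices (picked so that the spectral radius condition (A1)(i) below holds, with $\rho(\sqrt{A}+2\sqrt{B}) \approx 0.51$), not values taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 2, 200
A = np.array([[0.05, 0.02],
              [0.00, 0.05]])          # autoregressive part of the intensity
B = 0.02 * np.eye(d)                  # feedback of the counts
lam = np.ones(d)                      # initial intensity lambda_0
X = rng.poisson(lam)                  # X_0 | lambda_0 ~ Poi(lambda_0), independent components
path = [X]
for t in range(1, T):
    Z = rng.poisson(0.5 * np.log(1.0 + t) * np.ones(d))  # slowly trending exogenous covariate
    lam = A @ lam + B @ X + Z         # equation (2.1b)
    X = rng.poisson(lam)              # equation (2.1a)
    path.append(X)
path = np.array(path)                 # T x d array of counts
```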
2) Multivariate GINAR(1)
Multivariate generalized INAR (GINAR) processes were introduced by Latour (1997).
In this case we assume that (Xt )t∈N0 is a d-variate process, where its components
follow the equations
d Xt−1,j
i,j
Xt,i = ∑ ∑ Yt,s + Zt−1,i , i = 1, . . . , d, t ∈ N, (2.2a)
j=1 s=1

where (Zt )t∈N0 is a sequence of independent random vectors, Zt = (Zt,1 , . . . , Zt,d )T .


This sequence is assumed to be independent of the collection of count variables
i,j
{Yt,s ∶ (t, s, i, j) ∈ N2 × {1, . . . , d}2 } which itself is composed of independent random
variables such that
i,j
Yt,s ∼ Bin(1, Bij ). (2.2b)
With B ∶= ((Bij ))i,j=1,...,d , it is customary to use the compact notation
Xt = B ○ Xt−1 + Zt−1 , t ∈ N, (2.2c)
where ○ is a (d × d)-dimensional version of the thinning operator.
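
The thinning operation in (2.2c) can be implemented directly from (2.2a)-(2.2b): component $i$ of $B \circ X_{t-1}$ is a sum of independent $\mathrm{Bin}(1, B_{ij})$ variables, $X_{t-1,j}$ of them for each $j$, i.e. one $\mathrm{Bin}(X_{t-1,j}, B_{ij})$ draw per pair $(i,j)$. A minimal Python sketch with hypothetical values of $B$, $X_{t-1}$ and $Z_{t-1}$ follows.

```python
import numpy as np

rng = np.random.default_rng(1)

def thinning(B, X):
    # (B o X)_i = sum_j Bin(X_j, B_ij), by (2.2a)-(2.2b)
    return np.array([sum(rng.binomial(int(Xj), Bij) for Bij, Xj in zip(row, X))
                     for row in B])

B = np.array([[0.04, 0.01],
              [0.02, 0.04]])          # rho(sqrt(B)) ~ 0.32 < 1/2, so (A2)(i) below holds
X = np.array([7, 3])                  # current counts X_{t-1}
Z = rng.poisson([2.0, 1.5])           # innovation vector Z_{t-1}
X_next = thinning(B, X) + Z           # one step of (2.2c)
```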
In the Poisson-INGARCH model, the vector λt = (λt,1 , . . . , λt,d )T describes the
current state of the process at time t. For the GINAR model, this role is taken by the
vector of count variables Xt−1 = (Xt−1,1 , . . . , Xt−1,d )T . Since we are mainly interested
in nonstationary processes, there is no gain by considering two-sided versions (Xt )t∈Z .
Rather, the processes are started at time 0 with an initial intensity parameter $\lambda_0 = (\lambda_{0,1}, \ldots, \lambda_{0,d})^T$, initial values of the count variables $X_{0,1}, \ldots, X_{0,d}$, and an initial
vector of innovations Z0 , respectively.
We intend to prove absolute regularity for the count process (Xt )t∈N0 , where we
allow in particular an arbitrarily strong trend. We make use of some sort of con-
traction in the transition mechanism which will follow if the entries of the respective
matrices A and B are sufficiently small. In contrast, the exogenous variables Zt,i may
cause a trend. Natural choices for their distributions are
Zt,i ∼ Bin(Nt,i , p) or Zt,i ∼ Poi(γt,i ) or simply Zt,i = Nt,i ,
where the constants Nt,i and γt,i may increase without bound as t tends to infinity. In
the special case of a univariate Poisson-INGARCH(1,1) process with trend, mixing
properties were derived in Doukhan, Leucht, and Neumann (2022, Section 3.1).

2.2. A general method for proving absolute regularity. Let (Ω, A, P ) be a


probability space and A1 , A2 be two sub-σ-algebras of A. Then the coefficient of
absolute regularity is defined as
β(A1 , A2 ) = E[ sup{∣P (B ∣ A1 ) − P (B)∣∶ B ∈ A2 }].
For a process (Xt )t∈N0 on (Ω, F, P ), the coefficients of absolute regularity at the
point k are defined as
β X (k, n) = β(σ(X0 , X1 , . . . , Xk ), σ(Xk+n , Xk+n+1 , . . .))
and the (global) coefficients of absolute regularity as
β X (n) = sup{β X (k, n)∶ k ∈ N0 }.
The intended approach of proving absolute regularity is inspired by the fact that one can construct, on a suitable probability space $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P)$, two versions $(\tilde X_t)_{t\in\mathbb{N}_0}$ and $(\tilde X'_t)_{t\in\mathbb{N}_0}$ of the process $(X_t)_{t\in\mathbb{N}_0}$ such that $(\tilde X_0, \ldots, \tilde X_k)$ and $(\tilde X'_0, \ldots, \tilde X'_k)$ are independent and
$$\beta^X(k,n) = \tilde P\big(\tilde X_{k+n+r} \ne \tilde X'_{k+n+r} \text{ for some } r \ge 0\big). \tag{2.3}$$
Since such an optimal coupling seems to be out of reach in our context we confine ourselves to constructing a "reasonably good" coupling. Actually, if $(\tilde X_t)_{t\in\mathbb{N}_0}$ and $(\tilde X'_t)_{t\in\mathbb{N}_0}$, defined on a common probability space $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P)$, are any two versions of $(X_t)_{t\in\mathbb{N}_0}$ such that $(\tilde X_0, \ldots, \tilde X_k)$ and $(\tilde X'_0, \ldots, \tilde X'_k)$ are independent, then
$$\begin{aligned}
\beta^X(k,n)
&\le \tilde E\Big[\sup_{C\in\sigma(\mathcal{C})} \big\{\big|\tilde P\big((\tilde X_{k+n}, \tilde X_{k+n+1}, \ldots) \in C \mid \tilde X_0, \ldots, \tilde X_k\big) - \tilde P\big((\tilde X'_{k+n}, \tilde X'_{k+n+1}, \ldots) \in C \mid \tilde X'_0, \ldots, \tilde X'_k\big)\big|\big\}\Big] \\
&\le \tilde P\big(\tilde X_{k+n+r} \ne \tilde X'_{k+n+r} \text{ for some } r \in \mathbb{N}_0\big) \\
&= \tilde P\big(\tilde X_{k+n} \ne \tilde X'_{k+n}\big) + \sum_{r=1}^{\infty} \tilde P\big(\tilde X_{k+n+r} \ne \tilde X'_{k+n+r},\; \tilde X_{k+n+r-1} = \tilde X'_{k+n+r-1}, \ldots, \tilde X_{k+n} = \tilde X'_{k+n}\big).
\end{aligned} \tag{2.4}$$

(In the second line of this display, $\sigma(\mathcal{C})$ denotes the $\sigma$-algebra generated by the cylinder sets.) In the special case when $(X_t)_{t\in\mathbb{N}_0}$ is a Markov chain, we obtain the following simpler estimate for the mixing coefficient:
$$\beta^X(k,n) = \beta\big(\sigma(X_k), \sigma(X_{k+n})\big) \le \tilde P\big(\tilde X_{k+n} \ne \tilde X'_{k+n}\big). \tag{2.5}$$

2.3. Absolute regularity of multivariate count processes. First we set some notation. For a generic $(d\times d)$-matrix $M$, we denote by $\|M\|_1 = \max_j \{\sum_{i=1}^d |M_{ij}|\}$ its maximum absolute column sum norm. Analogously, for a vector $x$, $\|x\|_1 = \sum_{i=1}^d |x_i|$ denotes its $L_1$ norm. Furthermore, for a matrix $M$ and a vector $x$ with non-negative components, $\sqrt{M}$ and $\sqrt{x}$ denote the corresponding objects with respective components $\sqrt{M_{ij}}$ and $\sqrt{x_i}$. $d_{TV}(P,Q) = (1/2)\sum_{k\in\mathbb{N}_0} |P(\{k\}) - Q(\{k\})|$ denotes the total variation norm between two generic distributions $P$ and $Q$ on $(\mathbb{N}_0, 2^{\mathbb{N}_0})$.
To give some insight into details of our approach, we consider first the multivariate
Poisson-INGARCH model (2.1a), (2.1b). In this case, the construction leading to a
good estimate (2.4) may be divided into three phases and we sketch in what follows
how this is accomplished.
Phase 1:
First we assume that $(\tilde X_t)_{t\in\mathbb{N}_0}$ and $(\tilde X'_t)_{t\in\mathbb{N}_0}$ are independent versions of the process $(X_t)_{t\in\mathbb{N}_0}$. We shall show that
$$\sup_k \Big\{\tilde E\big\|\sqrt{\tilde\lambda_k} - \sqrt{\tilde\lambda'_k}\big\|_1\Big\} < \infty. \tag{2.6}$$

Phase 2:
Note that the first term on the right-hand side of (2.4) can be made as small as $\tilde E\, d_{TV}\big(\tilde P^{\tilde X_{k+n} \mid \tilde\lambda_{k+n}}, \tilde P^{\tilde X'_{k+n} \mid \tilde\lambda'_{k+n}}\big) = \tilde E\, d_{TV}\big(\mathrm{Poi}(\tilde\lambda_{k+n}), \mathrm{Poi}(\tilde\lambda'_{k+n})\big)$. Furthermore, since the components of $X_{k+n}$ are conditionally independent we may use the estimate
$$d_{TV}\big(\mathrm{Poi}(\tilde\lambda_{k+n}), \mathrm{Poi}(\tilde\lambda'_{k+n})\big) \le \sum_{i=1}^d d_{TV}\big(\mathrm{Poi}(\tilde\lambda_{k+n,i}), \mathrm{Poi}(\tilde\lambda'_{k+n,i})\big).$$
A good guideline for our construction is given by the upper estimate
$$d_{TV}\big(\mathrm{Poi}(\tilde\lambda_{k+n,i}), \mathrm{Poi}(\tilde\lambda'_{k+n,i})\big) \le \sqrt{2/e}\,\big|\sqrt{\tilde\lambda_{k+n,i}} - \sqrt{\tilde\lambda'_{k+n,i}}\big|, \tag{2.7}$$
where $e$ is Euler's number; see Roos (2003, formula (5)) or Exercise 9.3.5(b) in Daley and Vere-Jones (1988, page 300). It turns out that this estimate is also suitable in case of a strong trend and it is therefore used in our proofs. In view of (2.7), we shall couple the variables $\tilde X_k, \tilde Z_k, \ldots, \tilde X_{k+n-1}, \tilde Z_{k+n-1}$ with their corresponding counterparts $\tilde X'_k, \tilde Z'_k, \ldots, \tilde X'_{k+n-1}, \tilde Z'_{k+n-1}$ such that $\tilde E\|\sqrt{\tilde\lambda_{k+n}} - \sqrt{\tilde\lambda'_{k+n}}\|_1$ gets small as $n$ grows. Since the covariate $Z_t$ is independent of $\mathcal{F}_{t-1}$ we choose, for $t = k, \ldots, k+n-1$, $\tilde Z_t$ and $\tilde Z'_t$ such that they are equal. For the count variables we apply a (step-wise) maximal coupling; see Lemma 5.1 below. This implies in particular that the difference between $\tilde X_t$ and $\tilde X'_t$ also gets small,
$$\tilde E\big\|\sqrt{\tilde\lambda_{k+n}} - \sqrt{\tilde\lambda'_{k+n}}\big\|_1 \le \big\|(\sqrt{A} + 2\sqrt{B})^n\big\|_1\, \tilde E\big\|\sqrt{\tilde\lambda_k} - \sqrt{\tilde\lambda'_k}\big\|_1 = O\big(\big\|(\sqrt{A} + 2\sqrt{B})^n\big\|_1\big),$$
and it follows from our spectral radius assumption $\rho(\sqrt{A} + 2\sqrt{B}) < 1$ that the right-hand side decays exponentially fast.
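
The estimate (2.7) can also be checked numerically; the following illustrative Python sketch verifies it for a few intensity pairs, with the Poisson pmfs truncated far in the tail.

```python
import numpy as np
from scipy.stats import poisson

def dtv_poisson(lam1, lam2, kmax=10_000):
    # total variation distance between Poi(lam1) and Poi(lam2), pmfs truncated at kmax
    k = np.arange(kmax)
    return 0.5 * np.abs(poisson.pmf(k, lam1) - poisson.pmf(k, lam2)).sum()

for lam1, lam2 in [(1.0, 2.0), (50.0, 55.0), (1000.0, 1010.0)]:
    lhs = dtv_poisson(lam1, lam2)
    rhs = np.sqrt(2 / np.e) * abs(np.sqrt(lam1) - np.sqrt(lam2))
    assert lhs <= rhs            # the estimate (2.7)
```

Note that the bound depends on the intensities only through their square roots, which is what makes it usable under an arbitrarily strong trend.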
Phase 3:
In the third phase, the focus is clearly on getting $\tilde P(\tilde X_{k+n+r} \ne \tilde X'_{k+n+r} \text{ for some } r \ge 0)$ small. For given $\tilde\lambda_{k+n}$ and $\tilde\lambda'_{k+n}$, we apply a maximal coupling to the corresponding components of $\tilde X_{k+n}$ and $\tilde X'_{k+n}$, and we obtain
$$\tilde P\big(\tilde X_{k+n} \ne \tilde X'_{k+n} \mid \tilde\lambda_{k+n}, \tilde\lambda'_{k+n}\big) \le \sum_{i=1}^d \tilde P\big(\tilde X_{k+n,i} \ne \tilde X'_{k+n,i} \mid \tilde\lambda_{k+n}, \tilde\lambda'_{k+n}\big),$$
and so
$$\tilde P\big(\tilde X_{k+n} \ne \tilde X'_{k+n}\big) \le \sum_{i=1}^d \sqrt{2/e}\,\tilde E\big|\sqrt{\tilde\lambda_{k+n,i}} - \sqrt{\tilde\lambda'_{k+n,i}}\big| = O\big(\|(\sqrt{A}+2\sqrt{B})^n\|_1\, \tilde E\|\sqrt{\tilde\lambda_k} - \sqrt{\tilde\lambda'_k}\|_1\big).$$
For $r > 0$, we focus on $\tilde P(\tilde X_{k+n+r,i} \ne \tilde X'_{k+n+r,i},\; \tilde X_{k+n} = \tilde X'_{k+n}, \ldots, \tilde X_{k+n+r-1} = \tilde X'_{k+n+r-1})$, i.e. we only have to consider the case of $\tilde X_{k+n} = \tilde X'_{k+n}, \ldots, \tilde X_{k+n+r-1} = \tilde X'_{k+n+r-1}$. We obtain that
$$\big\|\sqrt{\tilde\lambda_{k+n+r}} - \sqrt{\tilde\lambda'_{k+n+r}}\big\|_1\, \mathbb{1}\big(\tilde X_{k+n} = \tilde X'_{k+n}, \ldots, \tilde X_{k+n+r-1} = \tilde X'_{k+n+r-1}\big) \le \big\|(\sqrt{A})^r\big\|_1\, \big\|\sqrt{\tilde\lambda_{k+n}} - \sqrt{\tilde\lambda'_{k+n}}\big\|_1.$$
We apply again a maximal coupling to $\tilde X_{k+n+r,i}$ and $\tilde X'_{k+n+r,i}$, which leads to
$$\tilde P\big(\tilde X_{k+n+r,i} \ne \tilde X'_{k+n+r,i},\; \tilde X_{k+n} = \tilde X'_{k+n}, \ldots, \tilde X_{k+n+r-1} = \tilde X'_{k+n+r-1}\big) = O\big(\|(\sqrt{A}+2\sqrt{B})^n\|_1\, \|(\sqrt{A})^r\|_1\, \tilde E\|\sqrt{\tilde\lambda_k} - \sqrt{\tilde\lambda'_k}\|_1\big).$$
To summarize, we obtain that
$$\begin{aligned}
&\tilde P\big(\tilde X_{k+n} \ne \tilde X'_{k+n}\big) + \sum_{r=1}^\infty \tilde P\big(\tilde X_{k+n+r} \ne \tilde X'_{k+n+r},\; \tilde X_{k+n} = \tilde X'_{k+n}, \ldots, \tilde X_{k+n+r-1} = \tilde X'_{k+n+r-1}\big) \\
&\qquad = O\Big(\|(\sqrt{A}+2\sqrt{B})^n\|_1 \sum_{r=0}^\infty \|(\sqrt{A})^r\|_1\Big) = O\big(\|(\sqrt{A}+2\sqrt{B})^n\|_1\big).
\end{aligned}$$
In view of this discussion, we impose the following condition. In what follows, we denote by $\rho(C)$ the spectral radius of a square matrix $C$.

(A1) (i) $\rho(\sqrt{A} + 2\sqrt{B}) < 1$,
(ii) $E\sqrt{\lambda_{0,i}} < \infty$ $(i = 1, \ldots, d)$,
(iii) $\sup_t E\big|\sqrt{Z_{t,i}} - E\sqrt{Z_{t,i}}\big| < \infty$ $(i = 1, \ldots, d)$.
Now we are in a position to state our first major result.

Theorem 2.1. Suppose that (2.1a), (2.1b), and (A1) are fulfilled. Then the count process $(X_t)_{t\in\mathbb{N}_0}$ is absolutely regular and the coefficients of absolute regularity satisfy
$$\beta^X(n) = O(\kappa^n)$$
for any $\kappa > \rho(\sqrt{A} + 2\sqrt{B})$.

Remark 1. The spectral radius condition (A1)(i) is satisfied as soon as there exists a matrix norm $\|\cdot\|$ such that $\|\sqrt{A} + 2\sqrt{B}\| < 1$. In particular, when $\|\cdot\|$ denotes the 1-norm, this means that the maximum of the absolute column sums is less than one, but other norms can also be considered. Let us recall that for a matrix $C$, $\rho(C)$ is the infimum of $\|C\|$ over all possible matrix norms $\|\cdot\|$.
Remark 2. The result of Theorem 2.1 can be easily generalized to some classes of Poisson-INGARCH processes where the components conditioned on the past are not necessarily independent. Suppose that $(X_t)_{t\in\mathbb{N}_0}$ is a $d$-variate process with components conditioned on the past being independent, and that (2.1a), (2.1b), and (A1) are fulfilled. Suppose further that $H$ is an invertible $(d\times d)$-matrix with entries $H_{ij}$ being either 0 or 1, and let
$$Y_t = (Y_{t,1}, \ldots, Y_{t,d})^T = HX_t \quad \forall t \in \mathbb{N}_0.$$
Then
$$Y_{t,i} \mid \mathcal{F}_{t-1} \sim \mathrm{Poi}(\lambda^Y_{t,i}),$$
where $\lambda^Y_t = (\lambda^Y_{t,1}, \ldots, \lambda^Y_{t,d})^T = H\lambda_t$, and
$$\mathrm{Cov}(Y_t \mid \mathcal{F}_{t-1}) = H\,\mathrm{Diag}(\lambda_{t,1}, \ldots, \lambda_{t,d})\,H^T.$$
It follows that the intensity process $(\lambda^Y_t)_{t\in\mathbb{N}}$ also satisfies an equation similar to (2.1b),
$$\lambda^Y_t = HAH^{-1}\lambda^Y_{t-1} + HBH^{-1}Y_{t-1} + HZ_{t-1} \quad \forall t \in \mathbb{N}.$$
Since Yt is a function of Xt we have that σ(Y0 , . . . , Yk ) ⊆ σ(X0 , . . . , Xk ) and σ(Yk+n , Yk+n+1 , . . .) ⊆
σ(Xk+n , Xk+n+1 , . . .). This implies that
β Y (k, n) ≤ β X (k, n) ∀k, n ∈ N0 ,
and so
β Y (n) ≤ β X (n) ∀n ∈ N0 ,
that is, the process (Yt )t∈N0 inherits the property of absolute regularity from the
underlying process (Xt )t∈N0 . A special case is given by a matrix H with entries
$H_{ij} = 1$ if $i = j$ or $j = d$, and $H_{ij} = 0$ if $i \ne j$ and $j < d$. It follows from $\det(H) = 1$ that this matrix is invertible. We obtain for the first $d - 1$ components of the vector $Y_t$ that
$$Y_{t,i} = X_{t,i} + X_{t,d}.$$
This is a popular and simple method to define a multivariate distribution with the
property that the marginal distribution of each variable is Poisson which goes back
to Campbell (1934) and Teicher (1954). One limitation of this approach is that it
permits only non-negative dependencies between the components. This follows from
the fact that all components are sums of independent Poisson random variables, and
hence, covariances are governed by the non-negative intensity parameters λt,i .
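
The following short Python sketch illustrates this common-summand construction in the static case, with hypothetical intensities: adding one shared Poisson variable to independent Poisson variables produces Poisson marginals whose covariance equals the shared intensity, and is therefore necessarily non-negative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu1, mu2, nu = 100_000, 3.0, 5.0, 2.0
U1 = rng.poisson(mu1, n)              # independent Poisson components
U2 = rng.poisson(mu2, n)
W = rng.poisson(nu, n)                # common summand
Y1, Y2 = U1 + W, U2 + W               # marginals Poi(mu1 + nu) and Poi(mu2 + nu)
print(np.cov(Y1, Y2)[0, 1])           # ~ nu >= 0: only non-negative dependence
```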
A more general method to construct multivariate Poisson distributions with Poisson marginal distributions is by pairing a copula distribution with Poisson marginal distributions; see e.g. Fokianos et al. (2020). We were not able to prove absolute regularity of explosive variants of such processes since we could not find an efficient estimate of the total variation between two such distributions in terms of the square roots of the intensities. Moreover, our sufficient condition for the mixing properties of this model remains stronger than the optimal stationarity condition $\rho(A+B) < 1$ given in Debaly and Truquet (2019). These restrictions are mainly due to our proof and assumptions allowing explosive covariates in the model. When the covariates are non-stationary but with a non-explosive mean, one can derive mixing properties of the model without conditional independence or parameter restrictions. We give a result below.
In the following result, we still consider model (2.1b) but now, for $\lambda \in \mathbb{R}^d_+$, $\mathrm{Poi}(\lambda)$ coincides with the probability distribution of the vector $(N^{(1)}_{\lambda_1}, \ldots, N^{(d)}_{\lambda_d})$ where, for $i = 1, \ldots, d$, $(N^{(i)}_y)_{y\ge 0}$ is a Poisson process with intensity 1. We then follow the general construction of a multivariate count distribution with Poisson marginals presented in Fokianos et al. (2020).

Theorem 2.2. Suppose that $\rho(A+B) < 1$, $E\lambda_{0,i} < \infty$ and $\sup_{t\in\mathbb{N}} E|Z_{t,i}| < \infty$ for $i = 1, \ldots, d$. Then the count process $(X_t)_{t\in\mathbb{N}_0}$ is absolutely regular and the coefficients of absolute regularity satisfy
$$\beta^X(n) = O(\kappa^n)$$
for any $\kappa > \rho(A+B)$.
Remark 3. One can show directly that the condition $\rho(A+B) < 1$ is weaker than (A1)(i). Remembering Gelfand's formula for the spectral radius, $\rho(C) = \lim_{n\to\infty}\|C^n\|_1^{1/n}$, when $C$ and $D$ are two square matrices of size $d\times d$ with non-negative coefficients, we have $\rho(C) \le \rho(D)$ as soon as $C_{ij} \le D_{ij}$ for $1 \le i,j \le d$. Moreover,
$$\sqrt{\rho(C)} = \lim_{n\to\infty}\|C^n\|_1^{1/(2n)} \le \lim_{n\to\infty}\big\|\big(\sqrt{C}\big)^n\big\|_1^{1/n} = \rho\big(\sqrt{C}\big).$$
For the matrices $A$ and $B$ defining our model, we then deduce that
$$\sqrt{\rho(A+B)} \le \rho\big(\sqrt{A} + \sqrt{B}\big) \le \rho\big(\sqrt{A} + 2\sqrt{B}\big).$$
When the covariates are integrable and non-explosive, we thus obtain absolute regularity of the model under much less restrictive assumptions.
Now we turn to the GINAR model (2.2a), (2.2b). In this case, the count process $(X_t)_{t\in\mathbb{N}_0}$ is Markovian, which means that the coefficients of absolute regularity can be estimated according to (2.5). Here we impose the following condition.

(A2) (i) $\rho(\sqrt{B}) < 1/2$,
(ii) $E\sqrt{X_{0,i}} < \infty$ $(i = 1, \ldots, d)$,
(iii) $\sup_t E\big|\sqrt{Z_{t,i}} - E\sqrt{Z_{t,i}}\big| < \infty$ $(i = 1, \ldots, d)$.
Now we can state our second main result.
Theorem 2.3. Suppose that (2.2a), (2.2b), and (A2) are fulfilled. Then the count process $(X_t)_{t\in\mathbb{N}_0}$ is absolutely regular and the coefficients of absolute regularity satisfy
$$\beta^X(n) = O(\kappa^n)$$
for any $\kappa > 2\rho(\sqrt{B})$.

As for the Poisson autoregressive model, we also give a result for non-explosive
covariates with a less restrictive condition on the parameter space.

Theorem 2.4. Suppose that $\rho(B) < 1$ and, for $1 \le i \le d$, $E|X_{0,i}| < \infty$ and $\sup_{t\ge 0} E|Z_{t,i}| < \infty$. Then for any $\kappa > \rho(B)$,
$$\beta^X(n) = O(\kappa^n).$$
Proofs of Theorems 2.1 to 2.4 are given in Subsections 4.1 to 4.4, respectively.

3. An application in statistics
In this section we apply our results to prove asymptotic normality of a least squares estimator of the parameter in non-stationary Poisson-INARCH(1) and GINAR(1) models. To simplify matters and in order to avoid any kind of high-level assumptions we focus on the special cases where in (2.1b) $A = 0_{d\times d}$ and where in (2.1b) and (2.2c) $B = \mathrm{Diag}(b_1, \ldots, b_d)$. This means that the parameter $b_i$ can be estimated on the basis of the $i$th components of the processes $(X_t)_{t\in\mathbb{N}_0}$ and $(Z_t)_{t\in\mathbb{N}_0}$ alone. This simplification allows us to switch to the univariate case.
Suppose that the random variables $X_0, Z_0, X_1, Z_1, \ldots$ follow the model equations
$$X_t = b \circ X_{t-1} + Z_{t-1}, \qquad t \in \mathbb{N}, \tag{3.1}$$
where $b \circ X_{t-1}$ can be represented as $\sum_{s=1}^{X_{t-1}} Y_{t,s}$. Here $\{Y_{t,s} \colon t,s \in \mathbb{N}\}$ is a double array of independent and identically distributed random variables such that $\{Y_{t,s} \colon s \in \mathbb{N}\}$ is also independent of $\mathcal{F}_{t-1}$. We have two special cases in mind: If $Y_{t,s} \sim \mathrm{Poi}(b)$, then $b \circ X_{t-1} \mid \mathcal{F}_{t-1} \sim \mathrm{Poi}(bX_{t-1})$ and we obtain a Poisson-INGARCH(1) model. If $Y_{t,s} \sim \mathrm{Bin}(1,b)$, then $b \circ X_{t-1} \mid \mathcal{F}_{t-1} \sim \mathrm{Bin}(X_{t-1}, b)$ and we obtain a GINAR(1) model. We shall assume that $0 < b < 1/4$ and that $X_0$ is non-random. Furthermore, we suppose that $Z_t$ follows a Poisson distribution with a non-random intensity $\gamma_t$. The trend of the count process $(X_t)_{t\in\mathbb{N}_0}$ is determined by that of $(Z_t)_{t\in\mathbb{N}_0}$; see Remark 4 below. We will then refer to this model as the stochastic trend count autoregressive model, hereafter st-CAR.
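
A minimal simulation sketch of the st-CAR model (3.1) in Python, covering both thinning mechanisms, reads as follows; the parameter value $b = 0.16$ and the linear trend $\gamma_{t-1} = t$ are illustrative choices matching the numerical study in Section 3.2.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_stcar(b, n, gamma, thinning="poisson"):
    # X_t = b o X_{t-1} + Z_{t-1}, t = 1, ..., n, with X_0 = 0 non-random
    X = np.zeros(n + 1, dtype=np.int64)
    Z = rng.poisson(gamma)                    # Z_t ~ Poi(gamma_t), t = 0, ..., n-1
    for t in range(1, n + 1):
        if thinning == "poisson":             # b o X | F ~ Poi(b X): Poisson-INGARCH case
            inc = rng.poisson(b * X[t - 1])
        else:                                 # b o X | F ~ Bin(X, b): GINAR case
            inc = rng.binomial(X[t - 1], b)
        X[t] = inc + Z[t - 1]
    return X, Z

n = 200
gamma = np.arange(1, n + 1, dtype=float)      # gamma_{t-1} = t
X, Z = simulate_stcar(b=0.16, n=n, gamma=gamma)
```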

3.1. Asymptotic normality of the least squares estimator of the st-CAR model. Before we proceed we impose some regularity conditions on the intensity sequence $(\gamma_t)_{t\in\mathbb{N}_0}$.

(A3) (i) The sequence $(\gamma_t)_{t\in\mathbb{N}_0}$ is monotonically non-decreasing and $\gamma_t \to \infty$ as $t \to \infty$.
(ii) $\gamma_{t+1}/\gamma_t \to 1$ as $t \to \infty$ and there exists some $C < \infty$ such that $\gamma_{2t}/\gamma_t \le C$ $\forall t \in \mathbb{N}$.

Let $r_n = \sum_{t=1}^n \gamma_{t-1}^2$ and $s_n = \sum_{t=1}^n \gamma_{t-1}^3$.
Remark 4. We show below that the sequence (γt )t∈N0 determines the growth rate of
the count process and that the sequences (rn )n∈N and (sn )n∈N together govern the
rate of convergence of a least squares estimator of the parameter b. Here are two
examples:

(i) (polynomial growth)
If $\gamma_{t-1} = d\,t^\kappa$, $d > 0$ and $\kappa \in \mathbb{N}$, then we obtain from $\int_0^n u^k\,du = \frac{n^{k+1}}{k+1} \le \sum_{t=1}^n t^k \le \int_0^{n+1} u^k\,du = \frac{(n+1)^{k+1}}{k+1}$ that
$$\sum_{t=1}^n \gamma_t^2 = d^2\,\frac{n^{2\kappa+1}}{2\kappa+1} + O(n^{2\kappa})$$
and
$$\sum_{t=1}^n \gamma_t^3 = d^3\,\frac{n^{3\kappa+1}}{3\kappa+1} + O(n^{3\kappa}),$$
i.e. $r_n = d^2 n^{2\kappa+1}/(2\kappa+1)\,(1+o(1))$ and $s_n = d^3 n^{3\kappa+1}/(3\kappa+1)\,(1+o(1))$.
(ii) (logarithmic growth)
If $\gamma_{t-1} = d\ln(t)$, $d > 0$, then the rates of growth of the sequences $(r_n)_{n\in\mathbb{N}}$ and $(s_n)_{n\in\mathbb{N}}$ deviate from $n$ only by logarithmic factors. We have that, for $k = 2, 3$,
$$\sum_{t=1}^n (\ln(t))^k = \sum_{t=1}^n \sum_{s=2}^t \big((\ln(s))^k - (\ln(s-1))^k\big) = \sum_{s=2}^n (n-s+1)\big((\ln(s))^k - (\ln(s-1))^k\big) = n(\ln(n))^k - \sum_{s=2}^n (s-1)\big((\ln(s))^k - (\ln(s-1))^k\big).$$
Since $(\ln(s))^2 - (\ln(s-1))^2 = (\ln(s)+\ln(s-1))(\ln(s)-\ln(s-1)) \le 2\ln(n)/(s-1)$ and $(\ln(s))^3 - (\ln(s-1))^3 = \big((\ln(s))^2 + \ln(s)\ln(s-1) + (\ln(s-1))^2\big)(\ln(s)-\ln(s-1)) \le 3(\ln(n))^2/(s-1)$, we obtain that
$$\sum_{t=1}^n (\ln(t))^k = n(\ln(n))^k + O\big(n(\ln(n))^{k-1}\big).$$
Hence, $r_n = d^2 n(\ln(n))^2(1+o(1))$ and $s_n = d^3 n(\ln(n))^3(1+o(1))$.
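
These asymptotics are easy to verify numerically; the following illustrative sketch checks the constants in the polynomial-growth case (i).

```python
import numpy as np

d_, kappa, n = 2.0, 1, 100_000
gamma = d_ * np.arange(1, n + 1, dtype=float) ** kappa  # gamma_0, ..., gamma_{n-1}
rn = (gamma ** 2).sum()                                  # r_n = sum_t gamma_{t-1}^2
sn = (gamma ** 3).sum()                                  # s_n = sum_t gamma_{t-1}^3
print(rn / (d_**2 * n**(2 * kappa + 1) / (2 * kappa + 1)))  # -> 1
print(sn / (d_**3 * n**(3 * kappa + 1) / (3 * kappa + 1)))  # -> 1
```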

Now we can draw a conclusion about the growth of the count variables $X_t$. Let $\varepsilon_t = b \circ X_{t-1} - bX_{t-1}$. A repeated application of our model equation (3.1) leads to
$$X_t = bX_{t-1} + \varepsilon_t + Z_{t-1} = b\{bX_{t-2} + \varepsilon_{t-1} + Z_{t-2}\} + \varepsilon_t + Z_{t-1} = \ldots = b^t X_0 + \sum_{s=0}^{t-1} b^s\gamma_{t-s-1} + \sum_{s=0}^{t-1} b^s(\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1}). \tag{3.2}$$
Therefore,
$$EX_t = b^t X_0 + \sum_{s=0}^{t-1} b^s\gamma_{t-s-1} = b^t X_0 + \gamma_t\Big\{\frac{1-b^t}{1-b} - \sum_{s=0}^{t-1} b^s\big(1 - \gamma_{t-s-1}/\gamma_t\big)\Big\} = \frac{\gamma_t}{1-b}\,(1+o(1)), \tag{3.3}$$
that is, the growth of the $X_t$ is determined by the sequence $(\gamma_t)_{t\in\mathbb{N}_0}$.

Suppose now that $X_0, \ldots, X_n$ and $Z_0, \ldots, Z_{n-1}$ are observed. Since
$$X_t - Z_{t-1} = bX_{t-1} + \varepsilon_t, \qquad t = 1, \ldots, n,$$
we can easily estimate the parameter $b$. The ordinary least squares estimator is given as $\hat b_n = \arg\min_b \sum_{t=1}^n (X_t - Z_{t-1} - bX_{t-1})^2 = \sum_{t=1}^n X_{t-1}(X_t - Z_{t-1})/\sum_{t=1}^n X_{t-1}^2$ and it follows that
$$\frac{r_n}{\sqrt{s_n}}\big(\hat b_n - b\big) = \frac{r_n}{\sum_{t=1}^n X_{t-1}^2}\,\sum_{t=1}^n X_{t-1}\varepsilon_t\big/\sqrt{s_n}.$$
We analyze the terms $\sum_{t=1}^n X_{t-1}^2$ and $\sum_{t=1}^n X_{t-1}\varepsilon_t/\sqrt{s_n}$ separately.
We shall show in the course of the proof of Proposition 3.1 that
$$\sum_{t=1}^n EX_{t-1}^2 = \frac{r_n}{(1-b)^2}\,(1+o(1)) \tag{3.4a}$$
and
$$\sum_{t=1}^n EX_{t-1}^3 = \frac{s_n}{(1-b)^3}\,(1+o(1)). \tag{3.4b}$$
Using the mixing property stated in Theorems 2.1 and 2.3 together with an appropriate bound for the moments of the $X_t$ we shall also show that
$$\sum_{t=1}^n X_{t-1}^2 = \frac{r_n}{(1-b)^2} + o_P(r_n) \tag{3.5a}$$
and
$$\sum_{t=1}^n X_{t-1}^3 = \frac{s_n}{(1-b)^3} + o_P(s_n). \tag{3.5b}$$
Let $Z_{n,t} = X_{t-1}\varepsilon_t/\sqrt{s_n}$. Then
$$E(Z_{n,t} \mid \mathcal{F}_{t-1}) = 0 \tag{3.6}$$
and, using $\sum_{t=1}^n X_{t-1}^3/s_n \xrightarrow{P} 1/(1-b)^3$,
$$\sum_{t=1}^n E(Z_{n,t}^2 \mid \mathcal{F}_{t-1}) = \sum_{t=1}^n X_{t-1}^2\,E(\varepsilon_t^2 \mid \mathcal{F}_{t-1})/s_n = \nu\sum_{t=1}^n X_{t-1}^3/s_n \xrightarrow{P} \frac{\nu}{(1-b)^3}, \tag{3.7}$$
where $\nu = b$ in case of the Poisson-INGARCH model and $\nu = b(1-b)$ in case of the Binomial-INARCH (i.e. GINAR) model. Finally, we shall show that
$$\sum_{t=1}^n E\big[Z_{n,t}^{8/3}\big] \underset{n\to\infty}{\longrightarrow} 0, \tag{3.8}$$
which implies, for $\epsilon > 0$, that
$$\sum_{t=1}^n E\big[Z_{n,t}^2\,\mathbb{1}(|Z_{n,t}| > \epsilon)\big] \le \sum_{t=1}^n E\big[Z_{n,t}^{8/3}\big]\big/\epsilon^{2/3} \underset{n\to\infty}{\longrightarrow} 0,$$
i.e., the Lindeberg condition is satisfied. Using (3.6) to (3.8) we obtain by a central limit theorem for sums of martingale differences (see e.g. Corollary 3.8 in McLeish (1974) or Theorem 2 in conjunction with Lemma 2 in Brown (1971)) that
$$\frac{1}{\sqrt{s_n}}\sum_{t=1}^n X_{t-1}\varepsilon_t = \sum_{t=1}^n Z_{n,t} \xrightarrow{d} N\Big(0, \frac{\nu}{(1-b)^3}\Big). \tag{3.9}$$
(3.5a) and (3.9) lead to the following result.

Proposition 3.1. Suppose that (A3) is fulfilled. Then
$$\frac{r_n}{\sqrt{s_n}}\big(\hat b_n - b\big) \xrightarrow{d} N\big(0, \nu(1-b)\big).$$


We see that the rate of convergence of $\hat b_n$ is $\sqrt{s_n}/r_n$. In the special cases mentioned in the remark above, it is $n^{-(\kappa+1)/2}$ if $\gamma_{t-1} = dt^\kappa$ and $(n\ln(n))^{-1/2}$ if $\gamma_{t-1} = d\ln(t)$. In order to establish an asymptotic confidence interval for $b$, it remains to estimate the norming constants $r_n$ and $s_n$. (3.5a), (3.5b), and Proposition 3.1 imply that
$$(1-\hat b_n)^2 \sum_{t=1}^n X_{t-1}^2\big/r_n \xrightarrow{P} 1 \quad\text{and}\quad (1-\hat b_n)^3 \sum_{t=1}^n X_{t-1}^3\big/s_n \xrightarrow{P} 1.$$
Therefore, an asymptotic confidence interval with a nominal coverage probability of $1-\alpha$ is given by
$$\big[\hat b_n - \Phi^{-1}(1-\alpha/2)\,K_n,\; \hat b_n + \Phi^{-1}(1-\alpha/2)\,K_n\big],$$
where $K_n = \sqrt{\hat b_n \sum_{t=1}^n X_{t-1}^3}\big/\sum_{t=1}^n X_{t-1}^2$ in case of the Poisson-INGARCH model and $K_n = \sqrt{\hat b_n(1-\hat b_n)\sum_{t=1}^n X_{t-1}^3}\big/\sum_{t=1}^n X_{t-1}^2$ for the Binomial-INARCH model.

3.2. Numerical studies. We present a small numerical study to illustrate the asymptotic properties of the least squares estimator for the st-CAR model provided by Proposition 3.1. To do so, we consider three values of the parameter $b$: 0.1, 0.16 and 0.23. For each value, we consider three different growth rates, $\gamma_{t-1} = t$, $\gamma_{t-1} = t^2$ and $\gamma_{t-1} = \ln t$, and simulate trajectories of length $n = 50$, $n = 100$ and $n = 1000$ for the Poisson-INGARCH (abbreviated as INGARCH) and Binomial-INARCH (abbreviated as INARCH) models. We repeat this process $B = 1{,}000$ times, compute the least squares estimator of $b$, and report its average value (line LSE in Table 1) together with the average value of $K_n$ for the Poisson-INGARCH (line $K_n$-INGARCH in Table 1) and Binomial-INARCH (line $K_n$-BINARCH in Table 1) models. The simulation results look very promising, even for the moderate sample size $n = 50$. As expected, the value of $K_n$ is smaller for $\gamma_{t-1} = t^2$ than for $\gamma_{t-1} = t$, which is itself smaller than that for $\gamma_{t-1} = \ln t$, regardless of the conditional distribution of the count process.

3.3. Real data example. As an illustration, we analyse the monthly number of questions about scrapy, a Python framework for web scraping. We consider the number of questions about natural language processing (NLP) as a predictor. Indeed, NLP entered a new era around 2010 with the development of neural network models; BERT (Bidirectional Encoder Representations from Transformers), developed by Google in 2018, is a state-of-the-art model for NLP. Owing to the heavy use of social media, there are more and more NLP projects based on internet content, such as sentiment analysis. For data harvesting and processing purposes, many libraries such as Requests or BeautifulSoup came up in Python. Besides these, scrapy represents a whole framework, meaning it comes with a set of rules and conventions that allow us to efficiently solve common web scraping problems.

                                     INGARCH                     INARCH
 γ_{t-1}                         t      t²     ln t          t      t²     ln t
 n = 50    b = 0.1   LSE         0.0996 0.1000 0.0975        0.0998 0.0999 0.0989
                     Kn-INGARCH  0.0090 0.0016 0.0262        0.0090 0.0016 0.0263
                     Kn-BINARCH  0.0085 0.0015 0.0249        0.0085 0.0015 0.0250
           b = 0.16  LSE         0.1597 0.1600 0.1567        0.1597 0.1600 0.1571
                     Kn-INGARCH  0.0110 0.0019 0.0320        0.0110 0.0019 0.0321
                     Kn-BINARCH  0.0101 0.0018 0.0293        0.0101 0.0018 0.0294
           b = 0.23  LSE         0.2299 0.2300 0.2253        0.2295 0.2301 0.2289
                     Kn-INGARCH  0.0126 0.0022 0.0365        0.0126 0.0022 0.0368
                     Kn-BINARCH  0.0111 0.0020 0.0321        0.0111 0.0020 0.0322
 n = 100   b = 0.1   LSE         0.100  0.1000 0.0999        0.1000 0.1000 0.0994
                     Kn-INGARCH  0.0045 0.0006 0.0170        0.0045 0.0006 0.0169
                     Kn-BINARCH  0.0043 0.0005 0.0161        0.0043 0.0005 0.0160
           b = 0.16  LSE         0.1599 0.1600 0.1585        0.1600 0.1600 0.1592
                     Kn-INGARCH  0.0055 0.0007 0.0206        0.0055 0.0007 0.0206
                     Kn-BINARCH  0.0050 0.0006 0.0189        0.0050 0.0006 0.0189
           b = 0.23  LSE         0.2298 0.2300 0.2276        0.2300 0.2300 0.2292
                     Kn-INGARCH  0.0063 0.0008 0.0235        0.0063 0.0008 0.0235
                     Kn-BINARCH  0.0055 0.0007 0.0207        0.0055 0.0007 0.0206
 n = 1000  b = 0.1   LSE         0.1000 0.1000 0.1002        0.1000 0.1000 0.0999
                     Kn-INGARCH  0.0005 0.0000 0.0041        0.0005 0.0000 0.0041
                     Kn-BINARCH  0.0004 0.0000 0.0039        0.0004 0.0000 0.0039
           b = 0.16  LSE         0.1600 0.1600 0.1598        0.1600 0.1600 0.1600
                     Kn-INGARCH  0.0006 0.0000 0.0050        0.0006 0.0000 0.0050
                     Kn-BINARCH  0.0005 0.0000 0.0046        0.0005 0.0000 0.0046
           b = 0.23  LSE         0.2300 0.2300 0.2298        0.2300 0.2300 0.2300
                     Kn-INGARCH  0.0006 0.0000 0.0058        0.0006 0.0000 0.0058
                     Kn-BINARCH  0.0006 0.0000 0.0051        0.0006 0.0000 0.0051

Table 1. Simulation results of the least squares estimator for the st-CAR model.

For this small data analysis, we download the monthly numbers of questions about NLP and scrapy on stackoverflow, the largest online community for programmers to learn and share their knowledge. The data are available at https://www.kaggle.com/datasets/aishu200023/stackindex. They were collected between January 2009 and December 2019 (see Fig. 1). The estimated value of the parameter $b$ is 0.23269, with $K_n$-INGARCH = 0.00453 and $K_n$-BINARCH = 0.00397.

4. Proofs of the main results


4.1. Proof of Theorem 2.1. The proof of Theorem 2.1 is based on the following
two lemmas.
Lemma 4.1. Suppose that (2.1a), (2.1b), and (A1) are fulfilled. Let $(\tilde\lambda_k)_{k\in\mathbb{N}_0}$ and $(\tilde\lambda'_k)_{k\in\mathbb{N}_0}$ be two independent copies of the intensity process. Then
$$\sup_{k\in\mathbb{N}_0} \tilde E\big\|\sqrt{\tilde\lambda_k} - \sqrt{\tilde\lambda'_k}\big\|_1 < \infty.$$

Figure 1. Number of questions asked about NLP (blue line) and scrapy (red line) on stackoverflow.

Proof of Lemma 4.1. Recall that $\lambda_{t,i} = \sum_{j=1}^d A_{ij}\lambda_{t-1,j} + \sum_{j=1}^d B_{ij}X_{t-1,j} + Z_{t-1,i}$. For $t \in \mathbb{N}$, $i \in \{1,\ldots,d\}$, we split up
$$\begin{aligned}
\big|\sqrt{\tilde\lambda_{t,i}} - \sqrt{\tilde\lambda'_{t,i}}\big|
&\le \sum_{j=1}^d \Big|\sqrt{(A_{ij}+B_{ij})\tilde\lambda_{t-1,j}} - \sqrt{(A_{ij}+B_{ij})\tilde\lambda'_{t-1,j}}\Big| \\
&\quad + \sum_{j=1}^d \Big\{\Big|\sqrt{A_{ij}\tilde\lambda_{t-1,j} + B_{ij}\tilde X_{t-1,j}} - \sqrt{(A_{ij}+B_{ij})\tilde\lambda_{t-1,j}}\Big| + \Big|\sqrt{A_{ij}\tilde\lambda'_{t-1,j} + B_{ij}\tilde X'_{t-1,j}} - \sqrt{(A_{ij}+B_{ij})\tilde\lambda'_{t-1,j}}\Big|\Big\} \\
&\quad + \Big|\sqrt{\tilde Z_{t-1,i}} - \sqrt{\tilde Z'_{t-1,i}}\Big| \\
&\le \sum_{j=1}^d \big(\sqrt{A_{ij}} + \sqrt{B_{ij}}\big)\Big|\sqrt{\tilde\lambda_{t-1,j}} - \sqrt{\tilde\lambda'_{t-1,j}}\Big| + \sum_{j=1}^d \sqrt{B_{ij}}\Big(\Big|\sqrt{\tilde X_{t-1,j}} - \sqrt{\tilde\lambda_{t-1,j}}\Big| + \Big|\sqrt{\tilde X'_{t-1,j}} - \sqrt{\tilde\lambda'_{t-1,j}}\Big|\Big) + \Big|\sqrt{\tilde Z_{t-1,i}} - \sqrt{\tilde Z'_{t-1,i}}\Big| \\
&=: R^{(i)}_{t,1} + R^{(i)}_{t,2} + R^{(i)}_{t,3}, \tag{4.1}
\end{aligned}$$
say. Since, for $X \sim \mathrm{Poi}(\lambda)$, $E|\sqrt{X} - \sqrt{\lambda}| \le E|X-\lambda|/\sqrt{\lambda} \le \sqrt{E(X-\lambda)^2/\lambda} = 1$, we obtain
$$\tilde E\big(R^{(i)}_{t,2} \mid \tilde\lambda_{t-1}, \tilde\lambda'_{t-1}\big) \le \sum_{j=1}^d \sqrt{B_{ij}}\Big\{\tilde E\Big(\Big|\sqrt{\tilde X_{t-1,j}} - \sqrt{\tilde\lambda_{t-1,j}}\Big|\,\Big|\, \tilde\lambda_{t-1}, \tilde\lambda'_{t-1}\Big) + \tilde E\Big(\Big|\sqrt{\tilde X'_{t-1,j}} - \sqrt{\tilde\lambda'_{t-1,j}}\Big|\,\Big|\, \tilde\lambda_{t-1}, \tilde\lambda'_{t-1}\Big)\Big\} \le 2\sum_{j=1}^d \sqrt{B_{ij}}. \tag{4.2}$$
Finally, we have
$$\tilde E\, R^{(i)}_{t,3} = \tilde E\big|\sqrt{\tilde Z_{t-1,i}} - \sqrt{\tilde Z'_{t-1,i}}\big| \le 2E\big|\sqrt{Z_{t-1,i}} - E\sqrt{Z_{t-1,i}}\big|. \tag{4.3}$$
It follows from (4.1) to (4.3) that, for $1 \le i \le d$,
$$\tilde E\big|\sqrt{\tilde\lambda_{t,i}} - \sqrt{\tilde\lambda'_{t,i}}\big| \le \sum_{j=1}^d C_{ij}\,\tilde E\big|\sqrt{\tilde\lambda_{t-1,j}} - \sqrt{\tilde\lambda'_{t-1,j}}\big| + b_i,$$
where $C_{ij} = \sqrt{A_{ij}} + \sqrt{B_{ij}}$ and $b_i = 2\sum_{j=1}^d \sqrt{B_{ij}} + 2\sup_{t\ge 0} E|\sqrt{Z_{t,i}} - E\sqrt{Z_{t,i}}|$. Since $\rho(C) < 1$, an application of Lemma 5.5 yields the result. □

Lemma 4.2. Suppose that (2.1a), (2.1b), and (A1) are fulfilled. Then there exists a coupling such that
(i) $\tilde E\big\|\sqrt{\tilde\lambda_{k+n}} - \sqrt{\tilde\lambda'_{k+n}}\big\|_1 \le \big\|(\sqrt{A}+2\sqrt{B})^n\big\|_1\,\tilde E\big\|\sqrt{\tilde\lambda_k} - \sqrt{\tilde\lambda'_k}\big\|_1$
and
$$\tilde P\big(\tilde X_{k+n} \ne \tilde X'_{k+n}\big) = O\big(\|(\sqrt{A}+2\sqrt{B})^n\|_1\big).$$
(ii) For $r \ge 1$,
$$\tilde E\big[\big\|\sqrt{\tilde\lambda_{k+n+r}} - \sqrt{\tilde\lambda'_{k+n+r}}\big\|_1\,\mathbb{1}\big(\tilde X_{k+n} = \tilde X'_{k+n}, \ldots, \tilde X_{k+n+r-1} = \tilde X'_{k+n+r-1}\big)\big] \le \big\|(\sqrt{A})^r\big\|_1\,\big\|(\sqrt{A}+2\sqrt{B})^n\big\|_1\,\tilde E\big\|\sqrt{\tilde\lambda_k} - \sqrt{\tilde\lambda'_k}\big\|_1$$
and
$$\tilde P\big(\tilde X_{k+n+r} \ne \tilde X'_{k+n+r},\; \tilde X_{k+n} = \tilde X'_{k+n}, \ldots, \tilde X_{k+n+r-1} = \tilde X'_{k+n+r-1}\big) = O\big(\|(\sqrt{A})^r\|_1\,\|(\sqrt{A}+2\sqrt{B})^n\|_1\big).$$

Proof of Lemma 4.2.
(i) Let $k \le t < k+n$. Given $\tilde\lambda_t$ and $\tilde\lambda'_t$, we apply a maximal coupling for $\tilde X_{t,i}$ and $\tilde X'_{t,i}$. Furthermore, we couple $\tilde Z_t$ and $\tilde Z'_t$ such that they are equal. As a by-product of the maximal coupling we obtain that $\tilde X_{t,j} \ge \tilde X'_{t,j}$ if $\tilde\lambda_{t,j} \ge \tilde\lambda'_{t,j}$ and, vice versa, $\tilde X_{t,j} \le \tilde X'_{t,j}$ if $\tilde\lambda_{t,j} \le \tilde\lambda'_{t,j}$; see (ii) of Lemma 5.1 below. This allows us to apply Lemma 5.2 and we obtain
$$\begin{aligned}
\tilde E\Big(\Big|\sqrt{A_{ij}\tilde\lambda_{t,j} + B_{ij}\tilde X_{t,j}} - \sqrt{A_{ij}\tilde\lambda'_{t,j} + B_{ij}\tilde X'_{t,j}}\Big|\,\Big|\, \tilde\lambda_t, \tilde\lambda'_t, \tilde\lambda_{t,j} \ge \tilde\lambda'_{t,j}\Big)
&= \tilde E\Big(\sqrt{A_{ij}\tilde\lambda_{t,j} + B_{ij}\tilde X_{t,j}} - \sqrt{A_{ij}\tilde\lambda'_{t,j} + B_{ij}\tilde X'_{t,j}}\,\Big|\, \tilde\lambda_t, \tilde\lambda'_t, \tilde\lambda_{t,j} \ge \tilde\lambda'_{t,j}\Big) \\
&\le \sqrt{A_{ij}}\big(\sqrt{\tilde\lambda_{t,j}} - \sqrt{\tilde\lambda'_{t,j}}\big) + \sqrt{B_{ij}}\,\tilde E\big(\sqrt{\tilde X_{t,j}} - \sqrt{\tilde X'_{t,j}}\,\big|\, \tilde\lambda_t, \tilde\lambda'_t, \tilde\lambda_{t,j} \ge \tilde\lambda'_{t,j}\big) \\
&\le \big(\sqrt{A_{ij}} + 2\sqrt{B_{ij}}\big)\big|\sqrt{\tilde\lambda_{t,j}} - \sqrt{\tilde\lambda'_{t,j}}\big|
\end{aligned}$$
and, analogously,
$$\tilde E\Big(\Big|\sqrt{A_{ij}\tilde\lambda_{t,j} + B_{ij}\tilde X_{t,j}} - \sqrt{A_{ij}\tilde\lambda'_{t,j} + B_{ij}\tilde X'_{t,j}}\Big|\,\Big|\, \tilde\lambda_t, \tilde\lambda'_t, \tilde\lambda_{t,j} \le \tilde\lambda'_{t,j}\Big) \le \big(\sqrt{A_{ij}} + 2\sqrt{B_{ij}}\big)\big|\sqrt{\tilde\lambda_{t,j}} - \sqrt{\tilde\lambda'_{t,j}}\big|.$$
Therefore, and since $\tilde Z_t = \tilde Z'_t$, we obtain that
$$\tilde E\big(\big|\sqrt{\tilde\lambda_{t+1,i}} - \sqrt{\tilde\lambda'_{t+1,i}}\big|\,\big|\, \tilde\lambda_t, \tilde\lambda'_t\big) \le \sum_{j=1}^d \tilde E\Big(\Big|\sqrt{A_{ij}\tilde\lambda_{t,j} + B_{ij}\tilde X_{t,j}} - \sqrt{A_{ij}\tilde\lambda'_{t,j} + B_{ij}\tilde X'_{t,j}}\Big|\,\Big|\, \tilde\lambda_t, \tilde\lambda'_t\Big) \le \sum_{j=1}^d \big(\sqrt{A_{ij}} + 2\sqrt{B_{ij}}\big)\big|\sqrt{\tilde\lambda_{t,j}} - \sqrt{\tilde\lambda'_{t,j}}\big|.$$
Taking expectation on both sides of this inequality we see that
$$\tilde E\big|\sqrt{\tilde\lambda_{t+1,i}} - \sqrt{\tilde\lambda'_{t+1,i}}\big| \le \sum_{j=1}^d \big(\sqrt{A_{ij}} + 2\sqrt{B_{ij}}\big)\,\tilde E\big|\sqrt{\tilde\lambda_{t,j}} - \sqrt{\tilde\lambda'_{t,j}}\big|.$$
From Lemma 5.5 with $b_i = 0$, we obtain
$$\tilde E\big\|\sqrt{\tilde\lambda_{k+n}} - \sqrt{\tilde\lambda'_{k+n}}\big\|_1 \le \big\|(\sqrt{A}+2\sqrt{B})^n\big\|_1\,\tilde E\big\|\sqrt{\tilde\lambda_k} - \sqrt{\tilde\lambda'_k}\big\|_1.$$
(ii) For $t > k+n$ and $1 \le i \le d$, we have on the event $H := \{\tilde X_{k+n} = \tilde X'_{k+n}, \ldots, \tilde X_{t-1} = \tilde X'_{t-1}\}$,
$$\big|\sqrt{\tilde\lambda_{t,i}} - \sqrt{\tilde\lambda'_{t,i}}\big| \le \sum_{j=1}^d \big(\sqrt{A}\big)_{ij}\big|\sqrt{\tilde\lambda_{t-1,j}} - \sqrt{\tilde\lambda'_{t-1,j}}\big|.$$
Iterating the previous bound, as in Lemma 5.5, we get
$$\big\|\sqrt{\tilde\lambda_t} - \sqrt{\tilde\lambda'_t}\big\|_1 \le \big\|(\sqrt{A})^{t-(k+n)}\big\|_1\,\big\|\sqrt{\tilde\lambda_{k+n}} - \sqrt{\tilde\lambda'_{k+n}}\big\|_1$$
on the event $H$. In both cases (i) and (ii), the upper estimates for the probabilities $\tilde P(\tilde X_{k+n} \ne \tilde X'_{k+n})$ and $\tilde P(\tilde X_{k+n+r} \ne \tilde X'_{k+n+r},\; \tilde X_{k+n} = \tilde X'_{k+n}, \ldots, \tilde X_{k+n+r-1} = \tilde X'_{k+n+r-1})$ follow from the maximal coupling and the fact that $d_{TV}(\mathrm{Poi}(\lambda), \mathrm{Poi}(\lambda')) \le \sqrt{2/e}\,|\sqrt{\lambda} - \sqrt{\lambda'}|$. □
Now we turn to the proof of our first major result.



Proof of Theorem 2.1. Let $k \in \mathbb{N}_0$, and let $(\tilde X_t)_{t\in\mathbb{N}_0}$ and $(\tilde X'_t)_{t\in\mathbb{N}_0}$ be two versions of the count process, where $(\tilde X_0, \ldots, \tilde X_k)$ and $(\tilde X'_0, \ldots, \tilde X'_k)$ are independent, and where $\tilde X_{k+1}, \tilde X_{k+2}, \ldots$ are coupled with their respective counterparts $\tilde X'_{k+1}, \tilde X'_{k+2}, \ldots$ as described in the proof of Lemma 4.2. Then we obtain from (2.4), Lemma 4.1, and Lemma 4.2 that
$$\begin{aligned}
\beta^X(k,n) &\le \tilde P\big(\tilde X_{k+n+r} \ne \tilde X'_{k+n+r} \text{ for some } r \in \mathbb{N}_0\big) \\
&\le \sum_{r=0}^\infty \tilde P\big(\tilde X_{k+n+r} \ne \tilde X'_{k+n+r},\; \tilde X_{k+n} = \tilde X'_{k+n}, \ldots, \tilde X_{k+n+r-1} = \tilde X'_{k+n+r-1}\big) \\
&\le \sqrt{2/e}\,\sum_{r=0}^\infty \tilde E\big[\big\|\sqrt{\tilde\lambda_{k+n+r}} - \sqrt{\tilde\lambda'_{k+n+r}}\big\|_1\,\mathbb{1}\big(\tilde X_{k+n} = \tilde X'_{k+n}, \ldots, \tilde X_{k+n+r-1} = \tilde X'_{k+n+r-1}\big)\big] \\
&= O\big(\|(\sqrt{A}+2\sqrt{B})^n\|_1\big).
\end{aligned}$$
We conclude the proof using the second assertion of Lemma 5.5. □

4.2. Proof of Theorem 2.2. Let $k \in \mathbb{N}$. To define our coupling we consider a sequence of i.i.d. random variables $\tilde J := (N^{(t)}, \tilde Z_t)_{t\ge 0}$ such that $N^{(t)} = (N^{(t,1)}, \ldots, N^{(t,d)})$ is a multivariate point process taking values in $\mathbb{N}^d_0$ and such that, for $1 \le i \le d$, $N^{(t,i)} = (N^{(t,i)}_z)_{z\ge 0}$ is a Poisson process with intensity 1. Moreover, $\tilde Z_t$ is a copy of $Z_t$. For $\lambda \in \mathbb{R}^d_+$, we set $N^{(t)}_\lambda = (N^{(t,1)}_{\lambda_1}, \ldots, N^{(t,d)}_{\lambda_d})$. We also consider an independent copy $\tilde J' := (N'^{(t)}, \tilde Z'_t)_{t\ge 0}$ of $\tilde J$. Finally, we consider two independent copies $\tilde\lambda_0$ and $\tilde\lambda'_0$ of $\lambda_0$, independent of the pair $(\tilde J, \tilde J')$, and, for $0 \le t \le k$, we set $\tilde X_t = N^{(t)}_{\tilde\lambda_t}$, $\tilde X'_t = N'^{(t)}_{\tilde\lambda'_t}$ with, for $1 \le t \le k$,
$$\tilde\lambda_t = A\tilde\lambda_{t-1} + B\tilde X_{t-1} + \tilde Z_{t-1}, \qquad \tilde\lambda'_t = A\tilde\lambda'_{t-1} + B\tilde X'_{t-1} + \tilde Z'_{t-1}.$$
For $t \ge k+1$, we set $\tilde X_t = N^{(t)}_{\tilde\lambda_t}$ and $\tilde X'_t = N^{(t)}_{\tilde\lambda'_t}$, with the same recursion formula for $\tilde\lambda_t$ and with $\tilde Z'_{t-1}$ replaced by $\tilde Z_{t-1}$ in the recursive formula for $\tilde\lambda'_t$, so that the two paths share the same inputs after time $k$. We have, for any $t \ge 1$ and $1 \le i \le d$,
$$v_{t,i} := E\tilde\lambda_{t,i} \le EZ_{t-1,i} + \sum_{j=1}^d (A_{ij} + B_{ij})\,v_{t-1,j}.$$
Setting $b_i = \sup_{t\ge 0} EZ_{t,i}$, one can apply Lemma 5.5, which ensures that $\sup_{t\ge 1}\|v_t\|_1 < \infty$. The same property holds true if $v_t$ is replaced with $v'_t$, where $v'_{t,i} = E\tilde\lambda'_{t,i}$, $1 \le i \le d$. We then get $\sup_{t\ge 0} E\|\tilde\lambda_t - \tilde\lambda'_t\|_1 < \infty$. Moreover, for $t \ge k+1$, the two count vectors are generated from the same Poisson processes, so that
$$P\big(\tilde X_t \ne \tilde X'_t\big) \le \sum_{i=1}^d P\big(\tilde X_{t,i} \ne \tilde X'_{t,i}\big) \le \sum_{i=1}^d E\big|\tilde X_{t,i} - \tilde X'_{t,i}\big| = \sum_{i=1}^d E\big|\tilde\lambda_{t,i} - \tilde\lambda'_{t,i}\big|.$$
Setting $w_{t,i} = E|\tilde\lambda_{t,i} - \tilde\lambda'_{t,i}|$ for $1 \le i \le d$, we have, for $t \ge k+2$,
$$w_{t,i} \le \sum_{j=1}^d (A_{ij} + B_{ij})\,w_{t-1,j}.$$
From Lemma 5.5, if $\kappa \in (\rho(A+B), 1)$, there exists a positive constant $\alpha$ such that $\|w_t\|_1 \le \alpha\kappa^{t-k-1}\|w_{k+1}\|_1$. Since $(w_s)_{s\ge 0}$ is a bounded sequence, we get
$$\beta^X(k,n) \le \sum_{t\ge n+k} P\big(\tilde X_t \ne \tilde X'_t\big) \le \alpha\sup_{s\ge 0}\|w_s\|_1 \sum_{t\ge n+k} \kappa^{t-k-1} = O(\kappa^n). \qquad\Box$$

4.3. Proof of Theorem 2.3. The proof of Theorem 2.3 is based on the following two lemmas.

Lemma 4.3. Suppose that (2.2a), (2.2b), and (A2) are fulfilled. Let $(\tilde X_k)_{k\in\mathbb{N}_0}$ and $(\tilde X'_k)_{k\in\mathbb{N}_0}$ be two independent copies of the count process. Then
$$\sup_{k\in\mathbb{N}_0} \tilde E\big\|\sqrt{\tilde X_k} - \sqrt{\tilde X'_k}\big\|_1 < \infty.$$

Proof of Lemma 4.3. Recall that $X_{t,i} = \sum_{j=1}^d Y^{i,j}_t + Z_{t-1,i}$, where $Y^{i,j}_t = \sum_{s=1}^{X_{t-1,j}} Y^{i,j}_{t,s}$. Furthermore, let $\lambda^{i,j}_t = E(Y^{i,j}_t \mid X_{t-1}) = B_{ij}X_{t-1,j}$. Then
$$\begin{aligned}
\big|\sqrt{\tilde X_{t,i}} - \sqrt{\tilde X'_{t,i}}\big|
&\le \bigg|\sqrt{\textstyle\sum_{j=1}^d \tilde\lambda^{i,j}_t} - \sqrt{\textstyle\sum_{j=1}^d \tilde\lambda^{i,j\,\prime}_t}\bigg| + \bigg|\sqrt{\textstyle\sum_{j=1}^d \tilde Y^{i,j}_t} - \sqrt{\textstyle\sum_{j=1}^d \tilde\lambda^{i,j}_t}\bigg| + \bigg|\sqrt{\textstyle\sum_{j=1}^d \tilde Y^{i,j\,\prime}_t} - \sqrt{\textstyle\sum_{j=1}^d \tilde\lambda^{i,j\,\prime}_t}\bigg| + \big|\sqrt{\tilde Z_{t-1,i}} - \sqrt{\tilde Z'_{t-1,i}}\big| \\
&\le \sum_{j=1}^d \big|\sqrt{\tilde\lambda^{i,j}_t} - \sqrt{\tilde\lambda^{i,j\,\prime}_t}\big| + \sum_{j=1}^d \Big\{\big|\sqrt{\tilde Y^{i,j}_t} - \sqrt{\tilde\lambda^{i,j}_t}\big| + \big|\sqrt{\tilde Y^{i,j\,\prime}_t} - \sqrt{\tilde\lambda^{i,j\,\prime}_t}\big|\Big\} + \big|\sqrt{\tilde Z_{t-1,i}} - \sqrt{\tilde Z'_{t-1,i}}\big| \\
&=: S^{(i)}_{t,1} + S^{(i)}_{t,2} + S^{(i)}_{t,3}, \tag{4.4}
\end{aligned}$$
say. We have
$$\tilde E\big(S^{(i)}_{t,1} \mid \tilde X_{t-1}, \tilde X'_{t-1}\big) = \sum_{j=1}^d \big|\sqrt{\tilde\lambda^{i,j}_t} - \sqrt{\tilde\lambda^{i,j\,\prime}_t}\big| = \sum_{j=1}^d \sqrt{B_{ij}}\,\big|\sqrt{\tilde X_{t-1,j}} - \sqrt{\tilde X'_{t-1,j}}\big|. \tag{4.5}$$
Since, for $Y \sim \mathrm{Bin}(n,p)$, $E|\sqrt{Y} - \sqrt{np}| \le E|Y-np|/\sqrt{np} \le \sqrt{E(Y-np)^2/np} = \sqrt{1-p} \le 1$, we obtain that
$$\tilde E\big(S^{(i)}_{t,2} \mid \tilde X_{t-1}, \tilde X'_{t-1}\big) \le \sum_{j=1}^d \Big\{\tilde E\big(\big|\sqrt{\tilde Y^{i,j}_t} - \sqrt{\tilde\lambda^{i,j}_t}\big|\,\big|\, \tilde X_{t-1}, \tilde X'_{t-1}\big) + \tilde E\big(\big|\sqrt{\tilde Y^{i,j\,\prime}_t} - \sqrt{\tilde\lambda^{i,j\,\prime}_t}\big|\,\big|\, \tilde X_{t-1}, \tilde X'_{t-1}\big)\Big\} \le 2d. \tag{4.6}$$
Finally, we have
$$\tilde E\, S^{(i)}_{t,3} = \tilde E\big|\sqrt{\tilde Z_{t-1,i}} - \sqrt{\tilde Z'_{t-1,i}}\big| \le 2E\big|\sqrt{Z_{t-1,i}} - E\sqrt{Z_{t-1,i}}\big|. \tag{4.7}$$
It follows from (4.4) to (4.7) that
$$\tilde E\big|\sqrt{\tilde X_{t,i}} - \sqrt{\tilde X'_{t,i}}\big| \le \sum_{j=1}^d \sqrt{B_{ij}}\,\tilde E\big|\sqrt{\tilde X_{t-1,j}} - \sqrt{\tilde X'_{t-1,j}}\big| + b_i,$$
where $b_i = 2d + 2\sup_{t\ge 0} E|\sqrt{Z_{t,i}} - E\sqrt{Z_{t,i}}|$. One can now apply Lemma 5.5 with $C = \sqrt{B}$ and $v_{t,i} = \tilde E|\sqrt{\tilde X_{t,i}} - \sqrt{\tilde X'_{t,i}}|$ to conclude. □

Lemma 4.4. Suppose that (2.2a), (2.2b), and (A2) are fulfilled. Then there exists a coupling such that
$$\tilde E\big\|\sqrt{\tilde X_{k+n-1}} - \sqrt{\tilde X'_{k+n-1}}\big\|_1 \le \big\|(2\sqrt{B})^{n-1}\big\|_1\,\tilde E\big\|\sqrt{\tilde X_k} - \sqrt{\tilde X'_k}\big\|_1.$$

Proof of Lemma 4.4. We apply a step-wise maximal coupling, that is, we choose $\tilde Z_t$ and $\tilde Z'_t$ such that they are equal, and $\tilde Y^{i,j}_t$ and $\tilde Y^{i,j\,\prime}_t$ are coupled such that
$$\tilde P\big(\tilde Y^{i,j}_t \ne \tilde Y^{i,j\,\prime}_t \mid \tilde X_{t-1}, \tilde X'_{t-1}\big) = d_{TV}\big(P^{Y^{i,j}_t \mid X_{t-1} = \tilde X_{t-1}}, P^{Y^{i,j}_t \mid X_{t-1} = \tilde X'_{t-1}}\big).$$
As a by-product of the maximal coupling we obtain that $\tilde Y^{i,j}_t \ge \tilde Y^{i,j\,\prime}_t$ if $\tilde\lambda^{i,j}_t \ge \tilde\lambda^{i,j\,\prime}_t$ and, vice versa, $\tilde Y^{i,j}_t \le \tilde Y^{i,j\,\prime}_t$ if $\tilde\lambda^{i,j}_t \le \tilde\lambda^{i,j\,\prime}_t$; see (ii) of Lemma 5.1 below. Then, by Lemma 5.3,
$$\tilde E\big(\big|\sqrt{\tilde X_{t,i}} - \sqrt{\tilde X'_{t,i}}\big|\,\big|\, \tilde X_{t-1}, \tilde X'_{t-1}\big) \le \sum_{j=1}^d \tilde E\big(\big|\sqrt{\tilde Y^{i,j}_t} - \sqrt{\tilde Y^{i,j\,\prime}_t}\big|\,\big|\, \tilde X_{t-1}, \tilde X'_{t-1}\big) \le \sum_{j=1}^d 2\sqrt{B_{ij}}\,\big|\sqrt{\tilde X_{t-1,j}} - \sqrt{\tilde X'_{t-1,j}}\big|.$$
After integration, one can apply Lemma 5.5 with $b = 0$, $C = 2\sqrt{B}$ and $v_{t,i} = \tilde E\big(\big|\sqrt{\tilde X_{t,i}} - \sqrt{\tilde X'_{t,i}}\big|\big)$ to conclude. □

These two lemmas allow us to prove our second major result.


Proof of Theorem 2.3. Let $k \in \mathbb{N}_0$, and let $(\tilde X_t)_{t\in\mathbb{N}_0}$ and $(\tilde X'_t)_{t\in\mathbb{N}_0}$ be two versions of the count process, where $(\tilde X_0, \ldots, \tilde X_k)$ and $(\tilde X'_0, \ldots, \tilde X'_k)$ are independent, and where $\tilde X_{k+1}, \tilde X_{k+2}, \ldots$ are coupled with their respective counterparts $\tilde X'_{k+1}, \tilde X'_{k+2}, \ldots$ as described in the proof of Lemma 4.4. Recall that the random variables $Y^{i,j}_{t,s}$ are independent, which implies that the terms $\sum_{s=1}^{X_{t-1,j}} Y^{i,j}_{t,s}$ $(i = 1, \ldots, d)$ are conditionally independent given $X_{t-1}$, and that $\sum_{s=1}^{X_{t-1,j}} Y^{i,j}_{t,s} \mid X_{t-1} \sim \mathrm{Bin}(X_{t-1,j}, B_{ij})$. Then we obtain from (2.5), Lemma 4.3, Lemma 4.4, and Lemma 5.4 that
$$\beta^X(k,n) \le \tilde P\big(\tilde X_{k+n} \ne \tilde X'_{k+n}\big) \le \sum_{i=1}^d \tilde P\big(\tilde X_{k+n,i} \ne \tilde X'_{k+n,i}\big) = O\big(\tilde E\big\|\sqrt{\tilde X_{k+n-1}} - \sqrt{\tilde X'_{k+n-1}}\big\|_1\big) = O\big(\big\|(2\sqrt{B})^{n-1}\big\|_1\big).$$
We conclude by using the last assertion of Lemma 5.5. □

4.4. Proof of Theorem 2.4. We consider two independent copies $\tilde J := \{(\tilde Y_t, \tilde Z_t) \colon t \in \mathbb{N}_0\}$ and $\tilde J' := \{(\tilde Y'_t, \tilde Z'_t) \colon t \in \mathbb{N}_0\}$ of $\{(Y_t, Z_t) \colon t \in \mathbb{N}_0\}$, with $Y_t = \{Y^{i,j}_{t,s} \colon 1 \le i,j \le d,\; s \in \mathbb{N}_0\}$ for any $t \in \mathbb{N}_0$. We then define a process $(\tilde X_t)_{t\in\mathbb{N}_0}$ from the recursion (2.2c) and the inputs $\tilde J$, and another process $(\tilde X'_t)_{t\in\mathbb{N}_0}$ also satisfying (2.2c), driven by the inputs $\tilde J'$ but with $(\tilde Y'_t, \tilde Z'_t)$ replaced by $(\tilde Y_t, \tilde Z_t)$ as soon as $t \ge k+1$, so that the two paths share the same inputs after time $k$. For $t \in \mathbb{N}_0$ and $1 \le i \le d$, set $v_{t,i} = E|\tilde X_{t,i}|$ and $b_i = \sup_{s\ge 0} E|Z_{s,i}|$. Since
$$v_{t,i} \le \sum_{j=1}^d B_{ij}\,v_{t-1,j} + b_i,$$
an application of Lemma 5.5 entails that $\sup_{t\ge 0}\|v_t\|_1 < \infty$. Moreover, if $t \ge k+1$, setting $w_{t,i} = E|\tilde X_{t,i} - \tilde X'_{t,i}|$, we get
$$w_{t,i} \le \sum_{j=1}^d B_{ij}\,w_{t-1,j},$$
which leads to $\|w_t\|_1 \le \|B^{t-k}\|_1\|w_k\|_1$. Moreover, the sequence $(w_k)_{k\in\mathbb{N}_0}$ is bounded. We conclude that
$$P\big(\tilde X_{k+n} \ne \tilde X'_{k+n}\big) \le \sum_{i=1}^d P\big(\tilde X_{k+n,i} \ne \tilde X'_{k+n,i}\big) \le \|w_{k+n}\|_1 = O(\|B^n\|_1).$$
The end of the proof is similar to that of Theorem 2.2. □

4.5. Proof of Proposition 3.1.

Proof of Proposition 3.1. It remains to prove (3.4a), (3.4b), (3.5a), (3.5b), and (3.8). Recall that (3.2) provides the representation
$$X_t = b^t X_0 + \sum_{s=0}^{t-1} b^s\gamma_{t-s-1} + \sum_{s=0}^{t-1} b^s(\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1}).$$
First we derive upper estimates for the third term on the right-hand side of this equation, which show in particular that $X_t$ is dominated by its non-stochastic part. We have that
$$E\big[(\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1})^2\big] = E\big[E\big((\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1})^2 \mid \mathcal{F}_{t-s-1}\big)\big] = E\big[\mathrm{var}(\varepsilon_{t-s} \mid \mathcal{F}_{t-s-1}) + \mathrm{var}(Z_{t-s-1} \mid \mathcal{F}_{t-s-1})\big] = E[\nu X_{t-s-1} + \gamma_{t-s-1}] = \big(\nu/(1-b) + 1\big)\gamma_{t-s-1}\,(1+o(1)),$$
which leads to
$$\Big\|\sum_{s=0}^{t-1} b^s(\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1})\Big\|_2 = O\big(\sqrt{\gamma_t}\big).$$
Therefore, we obtain
$$\sum_{t=1}^n E[X_{t-1}^2] = X_0^2 + \sum_{t=1}^{n-1}(EX_t)^2 + O\Big(\sum_{t=1}^{n-1}\gamma_t\Big) = \frac{r_n}{(1-b)^2}\,(1+o(1)),$$
i.e., (3.4a) is proved.
Regarding higher moments, note first that, for $X \sim \mathrm{Bin}(n,p)$,
$$E[(X-EX)^4] = 3\big(np(1-p)\big)^2 + np(1-p)\big(1 - 6p(1-p)\big)$$
(see e.g. Johnson, Kotz, and Kemp (1992, Ch. 3.2, p. 107)) and, for $X \sim \mathrm{Poi}(\lambda)$,
$$E[(X-EX)^4] = 3\lambda^2 + \lambda$$
(see e.g. Johnson, Kotz, and Kemp (1992, Ch. 4.3, p. 157)). This implies that
$$E\big[(\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1})^4\big] = E\big[E\big((\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1})^4 \mid \mathcal{F}_{t-s-1}\big)\big] = O(\gamma_{t-s-1}^2)$$
and, therefore,
$$\Big\|\sum_{s=0}^{t-1} b^s(\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1})\Big\|_4 = O\big(\sqrt{\gamma_t}\big).$$
This implies that
$$\sum_{t=1}^n E[X_{t-1}^3] = X_0^3 + \sum_{t=1}^{n-1}(EX_t)^3 + O\Big(\sum_{t=1}^{n-1}\gamma_t^{9/4}\Big) = \frac{s_n}{(1-b)^3}\,(1+o(1)),$$
i.e., (3.4b) is proved.
In order to prove (3.5a) and (3.5b), we first truncate the $X_t$. Let
$$\bar X_t := X_t\,\mathbb{1}\big(X_t \le EX_t + \sqrt{\gamma_t}\,n^\delta\big),$$
where $0 < \delta < 1/4$. We obtain that
$$\begin{aligned}
E[X_t^2 - \bar X_t^2]
&= E\Big[X_t^2\,\mathbb{1}\Big(\sum_{s=0}^{t-1} b^s(\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1}) > \sqrt{\gamma_t}\,n^\delta\Big)\Big] \\
&\le 2(EX_t)^2\,P\Big(\sum_{s=0}^{t-1} b^s(\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1}) > \sqrt{\gamma_t}\,n^\delta\Big) \\
&\quad + 2E\Big[\Big(\sum_{s=0}^{t-1} b^s(\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1})\Big)^2\,\mathbb{1}\Big(\sum_{s=0}^{t-1} b^s(\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1}) > \sqrt{\gamma_t}\,n^\delta\Big)\Big] \\
&= O\Big(\gamma_t^2\,E\Big[\Big(\sum_{s=0}^{t-1} b^s(\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1})\Big)^4\Big]\Big/(\gamma_t^2 n^{4\delta})\Big) + 2E\Big[\Big(\sum_{s=0}^{t-1} b^s(\varepsilon_{t-s} + Z_{t-s-1} - \gamma_{t-s-1})\Big)^4\Big]\Big/(\gamma_t n^{2\delta}) \\
&= O(\gamma_t^2 n^{-4\delta} + \gamma_t n^{-2\delta}) = O(\gamma_t^2 n^{-2\delta}),
\end{aligned}$$
and therefore
$$\sum_{t=1}^n EX_{t-1}^2 - \sum_{t=1}^n E\bar X_{t-1}^2 = O\Big(n^{-2\delta}\sum_{t=1}^n \gamma_{t-1}^2\Big) = O(r_n n^{-2\delta}). \tag{4.8a}$$
Since the $X_t - \bar X_t$ are non-negative, we also obtain that
$$\sum_{t=1}^n X_{t-1}^2 - \sum_{t=1}^n \bar X_{t-1}^2 = O_P(r_n n^{-2\delta}). \tag{4.8b}$$
Recall that the $\beta$-mixing coefficients serve as an upper bound for the $\alpha$-mixing coefficients. By the covariance inequality for $\alpha$-mixing random variables (see e.g. Doukhan (1994, Thm. 3, Sect. 1.2.2)) we obtain, for arbitrary $\tau > 0$,
$$E\Big[\Big(\sum_{t=1}^n \bar X_{t-1}^2 - \sum_{t=1}^n E\bar X_{t-1}^2\Big)^2\Big] = \sum_{s,t=1}^n \mathrm{cov}\big(\bar X_{s-1}^2, \bar X_{t-1}^2\big) \le 8\sum_{s,t=1}^n \alpha(|s-t|)^{\tau/(2+\tau)}\,\big\|\bar X_{s-1}^2\big\|_{2+\tau}\,\big\|\bar X_{t-1}^2\big\|_{2+\tau} = O\Big(\sum_{s=1}^n \big(\gamma_{s-1}^2 + \gamma_{s-1}n^{2\delta}\big)\big(\gamma_{n-1}^2 + \gamma_{n-1}n^{2\delta}\big)\Big) = O(r_n^2 n^{4\delta-1}). \tag{4.8c}$$
(The last equality follows from the fact that $n\gamma_n^2 = O(r_n)$.) From (4.8a) to (4.8c) we obtain that
$$\sum_{t=1}^n X_{t-1}^2 - \sum_{t=1}^n EX_{t-1}^2 = o_P(r_n),$$
which proves, in conjunction with (3.4a), relation (3.5a). (3.5b) can be proved by analogous arguments.
Finally, we have to show that (3.8) is satisfied. Note that, for $X \sim \mathrm{Bin}(n,p)$, $E[(X-EX)^8] = O((np)^4)$ and, for $X \sim \mathrm{Poi}(\lambda)$, $E[(X-EX)^8] = O(\lambda^4)$; see Johnson, Kotz, and Kemp (1992, Ch. 3.2 and Ch. 4.3). This implies that $E[\varepsilon_t^8] = E[E(\varepsilon_t^8 \mid \mathcal{F}_{t-1})] = O(E[X_{t-1}^4 + \gamma_{t-1}^4]) = O(\gamma_{t-1}^4)$. Therefore we obtain, by Hölder's inequality,
$$\sum_{t=1}^n E\big[Z_{n,t}^{8/3}\big] = \sum_{t=1}^n E\big[X_{t-1}^{8/3}\varepsilon_t^{8/3}\big]\,s_n^{-4/3} \le \sum_{t=1}^n \big(E[X_{t-1}^4]\big)^{2/3}\big(E[\varepsilon_t^8]\big)^{1/3}\,s_n^{-4/3} = \sum_{t=1}^n O\big(\gamma_{t-1}^4\,s_n^{-4/3}\big) = O(n^{-1/3}). \qquad\Box$$

5. A few auxiliary results

In this section we collect a few well-known facts which are used in the proofs of our main results.

Lemma 5.1. Let $P$ and $Q$ be two probability distributions on $(\mathbb{N}_0, 2^{\mathbb{N}_0})$, and let $p$ and $q$ be the corresponding probability mass functions. Then
(i) There exist random variables $X \sim P$ and $Y \sim Q$ on a common probability space $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P)$ such that
a) $\tilde P(X = k) = p(k)$ and $\tilde P(Y = k) = q(k)$ $\forall k \in \mathbb{N}_0$,
b) $\tilde P(X \ne Y) = d_{TV}(P, Q)$.
(ii) Suppose in addition that $p(k) > q(k)$ if and only if $k \ge K_0$ for some $K_0 \in \mathbb{N}$. Then, for each coupling satisfying (i),
$$\tilde P(X \ge Y) = 1.$$
Proof. Part (i) of this lemma is a well-known result and can be found e.g. in Theorem 5.2 in Lindvall (1992, Chapter I). In view of our proof of part (ii), we sketch how such a coupling can be achieved. Recall that
$$d_{TV}(P,Q) = (1/2)\sum_{k=0}^\infty |p(k) - q(k)| = 1 - \sum_{k=0}^\infty p(k)\wedge q(k).$$
(Note that our definition of the total variation norm differs from that in Lindvall (1992) by the factor 2.) To avoid trivialities, let $P \ne Q$, which implies that $\Delta := d_{TV}(P,Q) > 0$. We obtain the desired result of
$$\tilde P(X \ne Y) = 1 - \sum_{k=0}^\infty \tilde P(X = Y = k) = \Delta$$
if and only if
$$\tilde P(X = k, Y = k) = p(k)\wedge q(k) \quad \forall k \in \mathbb{N}_0, \tag{5.1a}$$
and we therefore define the diagonal probabilities this way. One possible choice of the remaining probabilities is given by
$$\tilde P(X = k, Y = l) = \frac{\big(p(k) - p(k)\wedge q(k)\big)\big(q(l) - p(l)\wedge q(l)\big)}{\Delta} \quad \forall k \ne l, \tag{5.1b}$$
which means that $X$ and $Y$ are independent conditioned on $\{X \ne Y\}$. Note that $p(k) - p(k)\wedge q(k) > 0$ implies that $q(k) = p(k)\wedge q(k)$, and so $\sum_{l\ne k}\big(q(l) - p(l)\wedge q(l)\big) = \sum_{l=0}^\infty \big(q(l) - p(l)\wedge q(l)\big) = \Delta$. Hence,
$$\tilde P(X = k) = \tilde P(X = k, Y = k) + \sum_{l\ne k}\tilde P(X = k, Y = l) = p(k)$$
is actually fulfilled. $\tilde P(Y = k) = q(k)$ follows analogously. Therefore, the coupling defined by (5.1a) and (5.1b) satisfies the requirements in (i).
Suppose now that $\tilde P$ describes an arbitrary coupling of $X$ and $Y$. If (i) is fulfilled, then $\tilde P$ must put as much mass as possible on the diagonal, that is, it satisfies (5.1a). Since $p(k) \le q(k)$ if $k < K_0$, $\tilde P$ must be such that $\tilde P(X = k, Y = l) = 0$ for all pairs $(k,l)$ with $k < K_0$ and $k \ne l$. Conversely, since $q(l) < p(l)$ if $l \ge K_0$, $\tilde P$ must be such that $\tilde P(X = k, Y = l) = 0$ for all pairs $(k,l)$ with $l \ge K_0$ and $k \ne l$. This implies that
$$\tilde P(X \ge K_0 > Y \mid X \ne Y) = 1,$$
and part (ii) follows. □
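
For illustration, here is a short Python sketch of the coupling defined by (5.1a) and (5.1b), for two hypothetical pmfs on a finite support: the diagonal carries the mass $p \wedge q$, and conditionally on $\{X \ne Y\}$ the two coordinates are drawn independently from the normalized excess masses.

```python
import numpy as np

rng = np.random.default_rng(5)

def maximal_coupling(p, q):
    common = np.minimum(p, q)
    delta = 1.0 - common.sum()                       # = d_TV(P, Q)
    if rng.random() < common.sum():                  # diagonal part (5.1a)
        k = rng.choice(len(p), p=common / common.sum())
        return k, k
    x = rng.choice(len(p), p=(p - common) / delta)   # off-diagonal part (5.1b)
    y = rng.choice(len(q), p=(q - common) / delta)
    return x, y

p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.3, 0.3, 0.2, 0.2])
draws = [maximal_coupling(p, q) for _ in range(10_000)]
print(np.mean([x != y for x, y in draws]))           # ~ d_TV(P, Q) = 0.3
```

With these pmfs, $p(k) > q(k)$ exactly for $k \ge 2$, so by part (ii) every draw satisfies $X \ge Y$.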
Lemma 5.2. Let $0 \le \lambda < \lambda'$, and let $X \sim \mathrm{Poi}(\lambda)$, $X' \sim \mathrm{Poi}(\lambda')$. Then
$$E\sqrt{X'} - E\sqrt{X} \le 2\big(\sqrt{\lambda'} - \sqrt{\lambda}\big).$$

Proof of Lemma 5.2. Let $(X_\nu)_{\nu>0}$ be a collection of Poisson variates with respective intensities $\nu$. Then the function $\nu \mapsto E\sqrt{X_\nu}$ is differentiable and it holds that
$$\frac{d}{d\nu}E\sqrt{X_\nu} = \sum_{k=1}^\infty \sqrt{k}\,\frac{d}{d\nu}\Big\{e^{-\nu}\frac{\nu^k}{k!}\Big\} = \sum_{k=1}^\infty \sqrt{k}\,\Big(\frac{k}{\nu} - 1\Big)e^{-\nu}\frac{\nu^k}{k!} = \sum_{k=1}^\infty \sqrt{k}\,e^{-\nu}\frac{\nu^{k-1}}{(k-1)!} - \sum_{k=1}^\infty \sqrt{k}\,e^{-\nu}\frac{\nu^k}{k!} = E\sqrt{X_\nu + 1} - E\sqrt{X_\nu}. \tag{5.2a}$$
Since the function $x \mapsto \sqrt{x+1}$ is concave we obtain by Jensen's inequality that
$$E\sqrt{X_\nu + 1} \le \sqrt{\nu+1}. \tag{5.2b}$$
Furthermore, since $E\sqrt{X_\nu} = \sum_{k=1}^\infty \sqrt{k}\,e^{-\nu}\nu^k/k! = \nu\sum_{l=0}^\infty (1/\sqrt{l+1})\,e^{-\nu}\nu^l/l! = \nu E\big[1/\sqrt{X_\nu+1}\big]$ and since the function $x \mapsto 1/\sqrt{x+1}$ is convex we obtain again by Jensen's inequality that
$$E\sqrt{X_\nu} \ge \frac{\nu}{\sqrt{EX_\nu + 1}} = \frac{\nu}{\sqrt{\nu+1}}. \tag{5.2c}$$
It follows from (5.2a) to (5.2c) that
$$\frac{d}{d\nu}E\sqrt{X_\nu} \le \sqrt{\nu+1} - \frac{\nu}{\sqrt{\nu+1}} = \frac{1}{\sqrt{\nu+1}}.$$
This implies
$$E\sqrt{X'} - E\sqrt{X} = \int_\lambda^{\lambda'}\frac{d}{du}E\sqrt{X_u}\,du \le \int_\lambda^{\lambda'}\frac{1}{\sqrt{u}}\,du = 2\big(\sqrt{\lambda'} - \sqrt{\lambda}\big). \qquad\Box$$
Lemma 5.3. Let $X_n \sim \mathrm{Bin}(n,p)$, where $n \in \mathbb{N}$, $p \in [0,1]$. Then
$$E\sqrt{X_{n+1}} - E\sqrt{X_n} \le 2\sqrt{p}\,\big(\sqrt{n+1} - \sqrt{n}\big) \quad \forall n \in \mathbb{N}.$$

Proof. We have that
$$E\sqrt{X_{n+1}} - E\sqrt{X_n} = p\,E\big[\sqrt{X_n+1} - \sqrt{X_n}\big] = p\,E\Big[\frac{1}{\sqrt{X_n+1} + \sqrt{X_n}}\Big] \le p\,E\Big[\frac{1}{\sqrt{X_n+1}}\Big].$$
Furthermore,
$$E\Big[\frac{1}{\sqrt{X_n+1}}\Big] = \sum_{k=0}^n \frac{1}{\sqrt{k+1}}\binom{n}{k}p^k(1-p)^{n-k} = \sum_{k=0}^n \sqrt{k+1}\,\frac{n!}{(n-k)!\,(k+1)!}\,p^k(1-p)^{(n+1)-(k+1)} = \frac{1}{p(n+1)}\sum_{l=1}^{n+1}\sqrt{l}\binom{n+1}{l}p^l(1-p)^{n+1-l} = \frac{1}{p(n+1)}\,E\sqrt{X_{n+1}} \le \frac{1}{\sqrt{p(n+1)}},$$
which implies that
$$E\sqrt{X_{n+1}} - E\sqrt{X_n} \le \frac{\sqrt{p}}{\sqrt{n+1}} \le \sqrt{p}\,\frac{2}{\sqrt{n+1} + \sqrt{n}} = 2\sqrt{p}\,\big(\sqrt{n+1} - \sqrt{n}\big). \qquad\Box$$
Lemma 5.4. If $p_0 < 1$, then
$$\sup_{p\le p_0} d_{TV}\big(\mathrm{Bin}(n,p), \mathrm{Bin}(m,p)\big) = O\big(|\sqrt{n} - \sqrt{m}|\big). \tag{5.3}$$

Proof. Let, w.l.o.g., $n > m$. We denote by $f(\cdot\,; N, p)$ the probability mass function of a binomial distribution with parameters $N$ and $p$, i.e. $f(k; N,p) = \binom{N}{k}p^k(1-p)^{N-k}$ for $k = 0, 1, \ldots, N$ and $f(k; N,p) = 0$ otherwise. Then, for fixed $p$, the mapping
$$k \mapsto \frac{f(k; m,p)}{f(k; n,p)} = \frac{m(m-1)\cdots(m-k+1)}{n(n-1)\cdots(n-k+1)}\,(1-p)^{m-n}$$
is non-increasing on $\{0, 1, \ldots, n\}$, where $f(0; m,p)/f(0; n,p) = (1-p)^{m-n} > 1$ and $f(n; m,p)/f(n; n,p) = 0$. Let $k_0(p) := \min\{k \colon f(k; m,p)/f(k; n,p) < 1\}$. Then
$$d_{TV}\big(\mathrm{Bin}(n,p), \mathrm{Bin}(m,p)\big) = \frac{1}{2}\sum_{k=0}^n |f(k; n,p) - f(k; m,p)| = \sum_{k\colon f(k;n,p)>f(k;m,p)}\big(f(k; n,p) - f(k; m,p)\big) = \sum_{k=k_0(p)}^n \big(f(k; n,p) - f(k; m,p)\big).$$
To handle the supremum we next show that
$$\sup_{p\le p_0} d_{TV}\big(\mathrm{Bin}(n,p), \mathrm{Bin}(m,p)\big) = d_{TV}\big(\mathrm{Bin}(n,p_0), \mathrm{Bin}(m,p_0)\big). \tag{5.4}$$
Note that, for fixed $m$, $n$ and $k$, the mapping $p \mapsto f(k; m,p)/f(k; n,p)$ is non-decreasing, which implies that $p \mapsto k_0(p)$ is a non-decreasing and piecewise constant function. Denote the discontinuity points of this function by $p_1, \ldots, p_K$. For $p \notin \{p_1, \ldots, p_K\}$, we have that
$$\frac{d}{dp}\sum_{k=k_0(p)}^n f(k; n,p) = \sum_{k=k_0(p)}^{n-1}\Big\{\frac{n!}{(n-k)!\,(k-1)!}\,p^{k-1}(1-p)^{n-k} - \frac{n!}{(n-(k+1))!\,k!}\,p^k(1-p)^{n-(k+1)}\Big\} + \binom{n}{n}np^{n-1} = \frac{k_0(p)}{p}\binom{n}{k_0(p)}p^{k_0(p)}(1-p)^{n-k_0(p)}$$
and, analogously,
$$\frac{d}{dp}\sum_{k=k_0(p)}^n f(k; m,p) = \frac{k_0(p)}{p}\binom{m}{k_0(p)}p^{k_0(p)}(1-p)^{m-k_0(p)},$$
which implies that
$$\frac{d}{dp}\,d_{TV}\big(\mathrm{Bin}(n,p), \mathrm{Bin}(m,p)\big) = \frac{k_0(p)}{p}\big(f(k_0(p); n,p) - f(k_0(p); m,p)\big) > 0 \quad \forall p \notin \{p_1, \ldots, p_K\}.$$
(Note that $k_0(p) > 0$.) Hence, (5.4) holds true.
For $N \in \mathbb{N}$, let $S_N \sim \mathrm{Bin}(N, p_0)$. Then we obtain from the Berry-Esseen inequality that
$$\sup_x \Big|P\Big(\frac{S_N - Np_0}{\sqrt{Np_0(1-p_0)}} \le x\Big) - \Phi(x)\Big| = O\Big(\frac{1}{\sqrt{N}}\Big).$$
Using this approximation we obtain that
$$\begin{aligned}
d_{TV}\big(\mathrm{Bin}(n,p_0), \mathrm{Bin}(m,p_0)\big)
&= \big|P(S_n \le k_0(p_0) - 1) - P(S_m \le k_0(p_0) - 1)\big| \\
&= \Big|\Phi\Big(\frac{(k_0(p_0)-1) - np_0}{\sqrt{np_0(1-p_0)}}\Big) - \Phi\Big(\frac{(k_0(p_0)-1) - mp_0}{\sqrt{mp_0(1-p_0)}}\Big)\Big| + O\Big(\frac{1}{\sqrt{n}} + \frac{1}{\sqrt{m}}\Big) \\
&\le \frac{1}{\sqrt{2\pi}}\Big|\frac{(k_0(p_0)-1) - np_0}{\sqrt{np_0(1-p_0)}} - \frac{(k_0(p_0)-1) - mp_0}{\sqrt{mp_0(1-p_0)}}\Big| + O\Big(\frac{1}{\sqrt{n}} + \frac{1}{\sqrt{m}}\Big) \\
&= O\big(|\sqrt{n} - \sqrt{m}|\big). \tag{5.5}
\end{aligned}$$
The assertion of the lemma follows from (5.4) and (5.5). □
Lemma 5.5. Suppose that $\{v_{t,i} \colon t \ge 0, 1 \le i \le d\}$ is a family of non-negative real numbers such that there exist a matrix $C$ of size $d \times d$ with non-negative entries and a vector $b \in \mathbb{R}^d$ with non-negative entries such that
$$v_{t,i} \le \sum_{j=1}^d C_{ij}\,v_{t-1,j} + b_i, \qquad 1 \le i \le d,\; t \ge 1.$$
Then
$$v_t \preceq \sum_{s=0}^{t-1} C^s b + C^t v_0,$$
where $\preceq$ denotes the coordinatewise ordering on $\mathbb{R}^d$ (i.e. $v \preceq v'$ means that $v_j \le v'_j$ for $1 \le j \le d$). As a consequence, if the spectral radius $\rho(C) < 1$, then for any $\kappa \in (\rho(C), 1)$ there exists $\alpha > 0$ such that
$$\|v_t\|_1 \le \sum_{s=0}^{t-1}\|C^s\|_1\|b\|_1 + \|C^t\|_1\|v_0\|_1 \le \alpha\Big(\sum_{s=0}^{t-1}\kappa^s\|b\|_1 + \kappa^t\|v_0\|_1\Big).$$

Proof. Using non-negativity of the coefficients and iterating the bound for the $v_{t,i}$'s, the bound for $v_t$ in the coordinatewise ordering is straightforward. The bound for $\|v_t\|_1$ follows from the triangle inequality and Gelfand's formula $\rho(C) = \lim_{s\to\infty}\|C^s\|_1^{1/s}$. □

References
Agosto, A., Cavaliere, G., Kristensen, D. and Rahbek, A. (2016). Modeling
corporate defaults: Poisson autoregressions with exogenous covariates (PARX).
Journal of Empirical Finance 38, 640–663.
Ahmad, A. and Francq, C. (2016). Poisson QMLE of count time series models.
Journal of Time Series Analysis 37(3), 291–314.
Aknouche, A. and Francq, C. (2021). Count and duration time series with equal
conditional stochastic and mean orders. Econometric Theory 37(2), 248–280.
Al-Osh, M. A. and Alzaid, A. A. (1987). First-order integer-valued autoregressive
(INAR(1)) process. Journal of Time Series Analysis 8(3), 261–275.

Al-Osh, M. A. and Alzaid, A. A. (1990). An integer-valued pth-order autoregressive structure (INAR(p)) process. Journal of Applied Probability 27(2), 314–324.
Brockwell, P. J. and Davis, R. A. (2016). Introduction to Time Series and
Forecasting. Springer.
Brown, B. M. (1971). Martingale central limit theorems. Annals of Mathematical
Statistics 42(1), 59–66.
Campbell, J. T. (1934). The Poisson correlation function. Proceedings of the Edin-
burgh Mathematical Society 4(1), 18–26.
Cox, D. R., Gudmundsson, G., Lindgren. G., Bondesson, L., Harsaae, E.,
Laake, P., Juselius, K., and Lauritzen, S. L. (1981). Statistical analysis of
time series: Some recent developments [with discussion and reply]. Scandinavian
Journal of Statistics 8(2). 93–115.
Daley, D. J. and Vere-Jones, D. (1988). An Introduction to the Theory of Point
Processes. Springer, New York.
Davis, R. A., Dunsmuir, W. T. M. and Streett, S. B. (2003). Observation-driven models for Poisson counts. Biometrika 90(4), 777–790.
Davydov, Y. A. (1968). Convergence of distributions generated by stationary sto-
chastic processes. Theory of Probability and Its Applications 13(4), 691–696.
Debaly, Z.-M., and Truquet, L. (2021). Iterations of dependent random maps
and exogeneity in nonlinear dynamics. Econometric Theory 37(6), 1135–1172.
Debaly, Z.-M., and Truquet, L. (2019). Stationarity and moment properties of
some multivariate count autoregressions. arXiv preprint arXiv:1909.11392.
Debaly, Z.-M. and Truquet, L. (2023). Multivariate time series models for mixed
data. Bernoulli 29(1), 669–695.
Doukhan, P. (1994). Mixing: Properties and Examples. Lecture Notes in Statistics
85, Springer, New York.
Doukhan, P., Leucht, A., and Neumann, M. H. (2022). Mixing properties of
non-stationary INGARCH(1,1) processes. Bernoulli 28(1), 663–688.
Doukhan, P., Neumann, M. H. and Truquet, L. (2020). Stationarity and
ergodic properties for some observation-driven models in random environments.
arXiv preprint arXiv:2007.07623.
Ferland, R., Latour, A., and Oraichi, D. (2006). Integer-valued GARCH
process. Journal of Time Series Analysis 27(6), 923–942.
Fokianos, K. (2021). Multivariate count time series modelling. Econometrics and
Statistics https://doi.org/10.1016/j.ecosta.2021.11.006.
Fokianos, K., Rahbek, A., and Tjøstheim, D. (2009). Poisson autoregression.
Journal of the American Statistical Association 104(488), 1430–1439.
Fokianos, K., Støve, B., Tjøstheim, D., and Doukhan, P. (2020). Multi-
variate count autoregression. Bernoulli 26(1), 471–499.
Francq, C. and Zakoïan, J.-M. (2019). GARCH Models: Structure, Statistical Inference and Financial Applications. John Wiley & Sons Ltd., Chichester.
Jørgensen, B., Lundbye-Christensen, S., Song, P. K., and Sun, L. (1999).
A state space model for multivariate longitudinal count data. Biometrika 86(1),
169–181.
Johnson, N. L., Kotz, S., and Kemp, A. W.(1992). Univariate Discrete Distri-
butions. (2nd ed.). Wiley.

Kedem, B. and Fokianos, K. (2005). Regression Models for Time Series Analysis.
Wiley, New York.
Latour, A. (1997). The multivariate GINAR(p) process. Advances in Applied Prob-
ability 29(1), 228–248.
Lindvall, T. (1992). Lectures on the Coupling Method. Wiley, New York.
McLeish, D. L. (1974). Dependent central limit theorems and invariance principles.
Annals of Probability 2(4), 620–628.
Roos, B. (2003). Improvements in the Poisson approximation of mixed Poisson
distributions. Journal of Statistical Planning and Inference 113, 467–483.
Taniguchi, M. and Kakizawa, Y. (2012). Asymptotic Theory of Statistical Inference for Time Series. Springer Science & Business Media.
Teicher, H. (1954). On the multivariate Poisson distribution. Scandinavian Actuarial Journal 1954(1), 1–9.
Wang, F. and Wang, H. (2018). Modelling nonstationary multivariate time series
of counts via common factors. Journal of the Royal Statistical Society: Series B
(Statistical Methodology) 80(4), 769–791.
Weiß, C. H. (2018). An Introduction to Discrete-Valued Time Series. Wiley.
Zeger, S. L. (1988). A regression model for time series of counts. Biometrika 75(4),
621–629.
Zeger, S. L. and Qaqish, B. (1988). Markov regression models for time series: a
quasi-likelihood approach. Biometrics 44(4), 1019–1031.
Zhu, F. (2011). A negative binomial integer-valued GARCH model. Journal of Time Series Analysis 32(1), 54–67.
