EPJ manuscript No.
(will be inserted by the editor)
arXiv:math-ph/0612022v1 7 Dec 2006
Random Recurrent Neural Networks Dynamics.
M. Samuelides¹ and B. Cessac²,³,⁴
1 Ecole Nationale Supérieure de l'Aéronautique et de l'Espace and ONERA/DTIM, 2 Av. E. Belin, 31055 Toulouse, France.
2 INRIA, 2004 Route des Lucioles, 06902 Sophia-Antipolis, France.
3 INLN, 1361 Route des Lucioles, 06560 Valbonne, France.
4 Université de Nice, Parc Valrose, 06000 Nice, France.
Abstract. This paper is a review dealing with the study of large-size random recurrent neural networks. The connection weights vary according to a probability law, and it is possible to predict the network dynamics at a macroscopic scale using an averaging principle. After a first introductory section, section 2 reviews the various models from the points of view of the single-neuron dynamics and of the global network dynamics. A summary of notations is presented, which is quite helpful for the sequel. In section 3, the mean-field dynamics is developed. The probability distribution characterizing the global dynamics is computed. In section 4, some applications of mean-field theory to the prediction of the chaotic regime for Analog Formal Random Recurrent Neural Networks (AFRRNN) are displayed. The case of an AFRRNN with a homogeneous population of neurons is studied in section 4.1. Then, a two-population model is studied in section 4.2. The occurrence of a cyclo-stationary chaos is displayed using the results of [16]. In section 5, an insight into the application of mean-field theory to IF networks is given using the results of [9].
1 Introduction
Recurrent neural networks were introduced to improve the biological plausibility of artificial neural networks such as perceptrons, since they display internal dynamics. They are useful for implementing associative recall. The first models were endowed with symmetric connection weights, which induced relaxation dynamics and equilibrium states [28]. Asymmetric connection weights were introduced later on, enabling the observation of complex dynamics and chaotic attractors. The role of chaos in cognitive functions was first discussed by W. Freeman and C. Skarda in seminal papers such as [37]. The practical importance of such dynamics comes from the use of on-line Hebbian learning to store dynamical patterns. For a review see for instance [26].
The nature of the dynamics depends on the connection weights. When considering large-size neural networks, it is impossible to study the dynamics as a function of the many parameters involved: parameters defining the state of the neuron, such as the sodium and potassium conductances in Hodgkin-Huxley models; parameters defining the structure of the synapses; parameters attached to the environment; external inputs; etc. One may assume that the connection weights share only a few values, but this does not allow one to study the effect of their variability. Henceforth, one often considers random models where the connection weights form a random sample of a probability distribution. These models are called "Random Recurrent Neural Networks" (RRNN).
In this context, the parameters of interest are those defining the probability distribution, i.e. the statistical parameters (introduced as "macroscopic parameters" in the first paper of this review, referred to, from now on, as paper I). Then a study of the dynamics in terms of relevant
dynamical quantities, called order parameters¹, can be performed via "Dynamic Mean-Field Equations" (DMFE). This terminology and method are inherited from statistical physics and quantum field theory, though they have to be adapted to the present context. Mean-field equations were introduced for neural networks by Amari [2] and later on by Crisanti and Sompolinsky [38]. Their results were extended in [12] and proved in a rigorous way in [34]. In [34] the authors used a "Large Deviation Principle" (LDP) coming from rigorous statistical mechanics [7]. Simultaneously, mean-field equations were successfully used to predict the dynamics of spiking recurrent neural networks [9,40].
This paper intends to provide a bridge between the detailed computation of the asymptotic regime and the rigorous aspects of MFE theory. We shall introduce the mathematical basis of dynamic mean-field theory in sections 2 and 3, and apply it to several prominent examples. Note however that our approach is different from the standard ones, based either on the computation of a functional generating the cumulants, or on an ad hoc approximation replacing the sum of incoming synaptic potentials by a Gaussian random variable. Instead we use large deviations techniques (detailed in the appendix). They have the advantage of being rigorous, and they allow one to prove convergence results stronger than those of the usual techniques.
In section 2, the various models are stated from the points of view of the single-neuron dynamics and of the global network dynamics. A summary of notations is presented, which is quite helpful for the sequel. In section 3, the mean-field dynamics is developed. The probability distribution characterizing the global dynamics is computed. The mathematical tools used there are detailed (without any proof) in the appendix. In section 4, some applications of mean-field theory to the prediction of the chaotic regime for analog formal random recurrent neural networks (AFRRNN) are displayed. The dynamical equation of a homogeneous AFRRNN, which is studied in paper I, is derived from the random network model in section 4.1. Moreover, a two-population model is studied in section 4.2 and the occurrence of a cyclo-stationary chaos is displayed using the results of [16]. In section 5, an insight into the application of mean-field theory to IF networks is given using the results of [9]. The model of this section is a continuous-time model, following the authors of the original paper. Hence the theoretical framework of the beginning of the paper has to be enlarged to support this extension of mean-field theory, and this work still has to be done. However, we sketch a parallel between the two models to prompt further research.
2 Dynamics of Random Recurrent Neural Networks.
2.1 Defining dynamic state variables
The state of an individual neuron i at time t is described by an instantaneous individual
variable, the membrane potential ui (t). In stochastic models, such as the ones considered here,
all variables (including the ui(t)'s) are (real) random variables² which take their values in IR.
In this section, we consider discrete-time dynamics and restrict ourselves to a finite time horizon, i.e. we consider time t as an integer belonging to the time interval {0, 1, ..., T}, where T is finite. Thus an individual state trajectory ui := (ui(t))t∈{0,1,...,T} takes its values in F = IR^{0,1,...,T}.
Though one is also interested in long-time behaviour and stationary regime (if any), rigorous
proofs of convergence of large-size networks only exist for finite time.
We shall study the probability distribution of trajectories instead of the probability of an instantaneous state. Actually, the latter can easily be obtained from the former. The first and second order moments of an individual trajectory ui are its expectation E(ui) ∈ F and its covariance matrix Cov(ui) ∈ F ⊗ F.
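As a concrete illustration (not from the paper), these trajectory-level moments can be estimated numerically by sampling trajectories; the i.i.d. standard Gaussian trajectories below are hypothetical stand-ins for draws of ui:

```python
import numpy as np

# Illustrative sketch (not from the paper): estimating the first and second
# order moments of an individual state trajectory u_i = (u_i(t))_{t=0..T}
# from sampled trajectories. The i.i.d. standard Gaussian trajectories are
# hypothetical stand-ins for draws of u_i.
rng = np.random.default_rng(0)
T = 5                                    # finite time horizon
n_samples = 100_000                      # number of sampled trajectories
u = rng.normal(size=(n_samples, T + 1))  # each row: one trajectory in F = IR^{T+1}

mean_u = u.mean(axis=0)                  # E(u_i), an element of F
cov_u = np.cov(u, rowvar=False)          # Cov(u_i), an element of F ⊗ F
print(mean_u.shape, cov_u.shape)
```

For these stand-in trajectories mean_u is close to the zero vector and cov_u close to the identity, as expected for i.i.d. N(0, 1) coordinates.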
1 This terminology comes from statistical physics and was introduced by the physicist L. Landau. Prominent examples of order parameters are the magnetization in the Ising model or the Edwards-Anderson parameter in spin-glasses.
2 Following the standard physicist's habit, the random variables will not be denoted by capital letters.
Our aim is to study the coupled dynamics of N interacting neurons that constitute a neural network; N is the size of the neural network. The global state trajectory u := (ui)i∈{1,...,N} is a random vector in F^N. The probability distribution³ of the random vector u, denoted by QN, depends on N. We shall compute it for various neuron models in this section. We shall first focus on the case of homogeneous neural networks; then more realistic cases, such as several-population networks, will be considered.
As detailed in paper I, the dynamics of the neuron models that we study here depends on three crucial points:
– how the neuron activation depends on the membrane potential,
– how the other neurons contribute to the synaptic potential, which completely summarizes the influence of the network on the target neuron,
– how the synaptic potential is used to update the membrane potential.
We shall now detail these points in the models that are considered further on.
2.2 Spike firing modeling
As introduced in paper I, the neuron is considered active, and emits a spike, whenever its membrane potential exceeds the activation threshold. So neuron i is active at time t when ui(t) ≥ θ, where θ is the activation threshold. We consider here that θ is constant during the evolution and is the same for all neurons. Actually this hypothesis may be relaxed and random thresholds may be considered, but the notation and the framework of the dynamical study would be more complicated (see [34]).
For spiking neuron models we define an activation variable xi(t) which is equal to 1 if neuron i emits a spike at time t and to 0 otherwise. Hence we have

xi(t) = f[ui(t) − θ]   (1)

where f, called the transfer function of the neuron, is here the Heaviside function. Actually, to alleviate notations, we shift ui by θ, which allows us to replace equation (1) by

xi(t) = f[ui(t)]   (2)

The threshold will further on be taken into account in the updating equation.
Two spiking neuron models are considered here: the Binary Formal neuron (BF), which is the original model of McCulloch and Pitts [32], and the Integrate and Fire neuron (IF), which is generally used nowadays to model the dynamics of large spiking neural networks [22].
In these models, the neuron activation generally takes two values: 0 and 1. This is true for most models of neurons. However, many research works prefer to take into account the average firing rate of the model instead of the detailed instants of firing (see section 5.1, paper I). This point of view simplifies the model since it deals with smooth functions, which are easier to handle from a mathematical point of view. In this case, equation (2) is still valid but xi(t) takes its values in the interval [0, 1] and the transfer function f is a smooth sigmoid function, for instance f(x) = e^x/(1 + e^x). Since the activation of the neuron is represented by a real value that varies continuously, the model is called Analog Formal neuron (AF).
The AF model is still widely dominant when artificial neural networks are considered for applications, since gradients are easy to compute. For biological purposes, it was widely believed that the relevant information was stored in the firing rate; in that case more precise modeling would not be so useful, at least from a functional point of view.
3 This term is defined in the appendix, definition 7.
The three models are studied in this paper, and we attempt to give a unified presentation of the mean-field equations for these three models. Note that, in the sequel, the state of the neuron will be the membrane potential ui(t) and not the activation xi(t). This is due to the updating definition in the IF model. We shall return to that point later on.
2.3 The synaptic potential of RRNN
The spikes are used to transmit information to other neurons through the synapses. We shall adopt here the very rough, but classical, description of the synapse. Namely, the synaptic connection from neuron j to neuron i is denoted by Jij. It can be positive (excitatory) or negative (inhibitory). Let us denote by J = (Jij) the matrix of synaptic weights. At time 0, the dynamical system is initialized and the synaptic potentials are set to zero⁴.
The synaptic potential of neuron i of a network of N neurons at time t + 1 is expressed⁵ as a function of J and u(t) ∈ IR^N by

vi(J, u)(t + 1) = Σ_{j=1}^{N} Jij xj(t) = Σ_{j=1}^{N} Jij f[uj(t)]   (3)

(The notation vi(J, u)(t + 1) will be explained below.)
As discussed in the introduction, we consider here random models where the connection weights form a random sample of a probability distribution ("Random Recurrent Neural Networks", RRNN). In that case, the parameters of interest are the statistical parameters defining the probability distribution. A standard example considered in this paper is Gaussian connection weights. In this case, the statistical parameters are denoted by⁶ J̄ and J², so that J is a random matrix with independent components distributed according to the normal law N(J̄/N, J²/N).
Note that the assumption of independence is crucial in the approach described below. Unfortunately, the more realistic case where correlations between the Jij's exist (e.g. after Hebbian learning) is currently out of reach for all the mean-field methods that we know of. We shall first consider Gaussian synaptic weights, but we shall later extend the RRNN model properties to a more general setting where the weights are non-Gaussian and depend on the neuron class, in a several-population model as in [16].
We have already dealt with the dynamical properties of RRNN such as (3) in paper I, considered from the dynamical-system point of view, where we fix a realization of the Jij's and consider the evolution of trajectories of this dynamical system. We then averaged over the Jij distribution in order to get information about the evolution of averaged quantities. In the present paper we shall start with a complementary point of view. Namely, assume that we fix the trajectory ui of each neuron (resp. the trajectory u of the network). Then, at each time step the variable Σ_{j=1}^{N} Jij f[uj(t)] is a Gaussian random variable whose probability distribution is induced by the distribution of the Jij's. Of course, this distribution depends on the trajectory (for example E[Σ_{j=1}^{N} Jij f[uj(t)]] = (J̄/N) Σ_{j=1}^{N} f[uj(t)]). To emphasize this dependence we shall denote by vi(J, u) = (vi(J, u)(t)) ∈ F the trajectory of the synaptic potential, as in (3).
With this line of reasoning one can show that the vi(·, u) are (conditionally on u) Gaussian, identically distributed and independent random vectors in F (see appendix, proposition 16). The distribution of vi is therefore⁷ defined by its mean mu and its covariance matrix cu (depending on u).
4 This may for example correspond to a rest state set to zero without loss of generality (since this corresponds to changing the voltage reference for the membrane potential; see paper I).
5 See section 5 in paper I.
6 We use here the same notation as in paper I. Recall that the scaling with 1/N allows one to have a synaptic potential whose mean and variance are independent of N.
We have

mu(t + 1) = (J̄/N) Σ_{j=1}^{N} f[uj(t)]   (4)

and

cu(s + 1, t + 1) = (J²/N) Σ_{j=1}^{N} f[uj(s)] f[uj(t)]   (5)
Notice that these quantities, called order parameters in the sequel, are invariant under any permutation of the neuron membrane potentials. Actually, they depend only on the empirical distribution⁸ µu associated with u.
Definition 1 The empirical measure is an application from F^N to P(F), the set of probability measures on F. It is defined by

µu(A) = (1/N) Σ_{i=1}^{N} δ_{ui}(A)   (6)

where δ_{ui} is the Dirac mass: δ_{ui}(A) = 1 if ui belongs to the set A and 0 otherwise.
Using this formalism provides a useful way to perform an average over a probability distribution on the trajectories u. For example, the average (1/N) Σ_{j=1}^{N} g[uj(t)], where g is some function, writes⁹ ∫ g(η(t)) dµu(η). More generally, assume that we are given a probability distribution µ on the space of trajectories F. Then, one can perform a generic construction of a Gaussian probability on F.
Definition 2 For any µ ∈ P(F) the Gaussian probability distribution gµ on IR^T, with moments mµ and cµ, is defined by:

mµ(t + 1) = J̄ ∫ f[η(t)] dµ(η)
cµ(s + 1, t + 1) = J² ∫ f[η(s)] f[η(t)] dµ(η)   (7)
Then, it is easy to reformulate the previous computation as:
Proposition 1 The common probability distribution of the individual synaptic potential trajectories vi (., u) is the normal distribution gµu where µu is the empirical distribution of the
network potential trajectory u.
This framework is useful to compute the large-size limit of the common probability distribution of the potential trajectories.
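Proposition 1 can be checked numerically. The following sketch fixes a network trajectory u, draws many realizations of one row of J, and compares the Monte Carlo moments of v1(J, u) with the empirical moments (4) and (5); the transfer function, the values of J̄, J² and the trajectory itself are illustrative choices, not the paper's:

```python
import numpy as np

# Numerical check of Proposition 1 (a sketch; f, Jbar, J2 and the fixed
# trajectory u are illustrative choices). Conditionally on u,
# v_1(J, u)(t+1) = sum_j J_1j f[u_j(t)] should be Gaussian with the empirical
# moments m_u and c_u of equations (4) and (5).
rng = np.random.default_rng(1)
N, T = 200, 4
Jbar, J2 = 1.0, 2.0
f = lambda x: 1.0 / (1.0 + np.exp(-x))        # sigmoid transfer function

u = rng.normal(size=(N, T + 1))               # one fixed network trajectory
x = f(u[:, :-1])                              # f[u_j(t)], t = 0..T-1

m_u = (Jbar / N) * x.sum(axis=0)              # eq. (4): mean m_u(t+1)
c_u = (J2 / N) * (x.T @ x)                    # eq. (5): covariance c_u(s+1, t+1)

# Monte Carlo over realizations of row 1 of J:
n_draws = 20_000
J_row = rng.normal(Jbar / N, np.sqrt(J2 / N), size=(n_draws, N))
v1 = J_row @ x                                # v_1(t+1) for each realization of J

print(np.abs(v1.mean(axis=0) - m_u).max())           # small: mean matches (4)
print(np.abs(np.cov(v1, rowvar=False) - c_u).max())  # small: covariance matches (5)
```

Note that, conditionally on u, the covariance of v1 equals c_u exactly here, since Cov(Σj J1j xj(s), Σj J1j xj(t)) = (J²/N) Σj xj(s) xj(t) for independent weights.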
2.4 Dynamical models of the membrane potential
We shall now detail the updating rule of the membrane potential. Various neural dynamics
have been detailed in paper I. We focus here on Analog Formal (AF), Binary Formal (BF), and
Integrate and Fire neuron (IF).
In any case, the network is initialized with independent, identically distributed membrane potentials according to a probability distribution m0 ∈ P(IR). It is useful, on technical grounds, to add to each neuron a small amount of noise. Thus we introduce, for each neuron i, a sequence (wi(t))t∈{1,...,T} of i.i.d. centered Gaussian variables of variance σ². This sequence is called the synaptic noise of neuron i. The synaptic noise plays an important part in the mathematical proofs, but the order parameter σ may be taken as small as necessary, so this model is not very restrictive. The synaptic noise is added to the synaptic potential. On biological grounds it may account for different effects such as the diffusion of neurotransmitters involved in the synaptic transmission, the degrees of freedom neglected by the model, external perturbations, etc. Though it is not evident that the "real noise" is Brownian, using this kind of perturbation has the advantage of providing a tractable model where standard theorems of the theory of stochastic processes or methods of non-equilibrium statistical physics (e.g. Fokker-Planck equations) can be applied.
In some papers, the synaptic noise is called thermal noise or annealed noise, by contrast with the random variables J = (Jij), which are called quenched variables as they are fixed once and for all and do not change with time (we do not consider learning in this paper).
7 This distribution actually does not depend on i since the vi's are identically distributed.
8 This concept is introduced in more detail in the appendix, definition 18.
9 Note that the integration variable η corresponds to a trajectory of the network, and that η(t) corresponds to the state of the network at time t (i.e. ηi(t) is the state of neuron i at time t).
The formal neuron updates its membrane potential according to
ui(t + 1) = vi(t + 1) + wi(t + 1) − θ   (8)
The IF neuron takes into account its present membrane potential while updating. Its evolution equation is

ui(t + 1) = ϕ[ui(t) + θ] + vi(t + 1) + wi(t + 1) − θ   (9)

where
– ϕ is defined by

ϕ(u) = γu if ϑ/γ < u < θ, and ϕ(u) = ϑ otherwise   (10)

– γ ∈ ]0, 1[ is the leak (damping coefficient),
– ϑ is the reset potential, with ϑ < 0 < θ.
The following table summarizes the main properties of the three models we investigate:

                      Heaviside transfer   sigmoidal transfer
Formal model          BF                   AF
Integrate and Fire    IF
Assume for a moment that we remove the neural coupling; then the individual neuron state trajectories are independent, identically distributed random vectors in F (whose randomness is induced by the Brownian noise). The corresponding dynamics is called the free dynamics. Let us denote by P the common distribution of the neuron trajectories in the uncoupled case. The probability distribution of the corresponding neural network trajectory is therefore P⊗N.
In the case of formal neurons, the free dynamics equation is

ui(0) ∼ m0,  ui(t + 1) = wi(t + 1) − θ   (11)

where ∼ means "distributed according to". So P = m0 ⊗ N(−θ, σ²)^⊗T. In the case of IF neurons, P is not explicit. It is the image of m0 ⊗ N(−θ, σ²)^⊗T by the diffusive¹⁰ dynamics

ui(0) ∼ m0,  ui(t + 1) = ϕ[ui(t) + θ] + wi(t + 1) − θ   (12)
When coupling the neurons, the trajectory of the system of neurons is still a random vector.
Its probability distribution, denoted by QN , has a density with respect to P ⊗N that can be
explicitly computed. This is the main topic of the next subsection.
10 This is indeed a discrete-time stochastic difference equation with drift ϕ[ui(t) + θ] − θ and diffusion wi(t + 1). Incidentally, we shall come back to stochastic differential equations for neural networks in section 5 and we shall consider the related Fokker-Planck equation.
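Before turning to the distribution QN itself, the coupled dynamics defined by (3) and (8) can be simulated directly. The sketch below iterates the AF update rule for one realization of the quenched weights; the parameter values and the sigmoid transfer function are illustrative choices:

```python
import numpy as np

# Sketch of the coupled discrete-time dynamics (3) + (8) for the AF model:
#   u_i(t+1) = sum_j J_ij f[u_j(t)] + w_i(t+1) - theta,
# with quenched weights J_ij i.i.d. N(Jbar/N, J2/N) and synaptic noise
# w_i(t) i.i.d. N(0, sigma^2). All parameter values are illustrative.
rng = np.random.default_rng(2)
N, T = 500, 50
Jbar, J2, sigma, theta = 0.0, 4.0, 0.1, 0.0
f = lambda x: 1.0 / (1.0 + np.exp(-x))

J = rng.normal(Jbar / N, np.sqrt(J2 / N), size=(N, N))  # one quenched realization
u = np.empty((T + 1, N))
u[0] = rng.normal(size=N)                 # initial potentials drawn from m_0
for t in range(T):
    w = rng.normal(0.0, sigma, size=N)    # synaptic noise w_i(t+1)
    u[t + 1] = J @ f(u[t]) + w - theta    # update rule (8)

print(u.shape)                            # one network trajectory in F^N
```

Keeping J fixed while iterating over t corresponds to the quenched point of view described above; redrawing J at each run corresponds to sampling different networks from the same statistical ensemble.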
2.5 Computation of the probability distribution of network trajectories.
This section is devoted to the computation of the probability distribution QN . The result
shows that the density of QN with respect to the free dynamics probability P ⊗N depends on
the trajectory variable u only through the empirical measure µu . To achieve this computation
we shall use a key result of stochastic process theory, the Girsanov theorem [23,25], which gives
the density of the new distribution of a diffusion when the drift is changed. Actually, since the
time set is finite, the version of Girsanov theorem that we use is different from the original one
[25] and may be recovered by elementary Gaussian computation. Its derivation is detailed in
the appendix in theorem 19. A similar result may be obtained for continuous time dynamics
using the classical Girsanov theorem (see [7]).
Let us state the finite-time Girsanov theorem
Theorem 2 Let m0 be a probability measure on IR^d and let N(α, K) be a regular Gaussian probability on IR^d with mean α and covariance matrix K. Let T be a positive integer and ET = (IR^d)^{0,...,T} be the space of finite-time trajectories in IR^d. Let w be a Gaussian random vector in ET with distribution m0 ⊗ N(α, K)^⊗T. Let φ and ψ be two measurable applications of IR^d into IR^d. Then we define the random vectors x and y in ET by:

x(0) = w(0),  x(t + 1) = φ[x(t)] + w(t + 1)   (13)

y(0) = w(0),  y(t + 1) = ψ[y(t)] + w(t + 1)   (14)

Let P and Q be the respective probability distributions on ET of x and y; then we have:

dQ/dP (η) = exp Σ_{t=0}^{T−1} [ −(1/2) {ψ[η(t)] − φ[η(t)]}^t K⁻¹ {ψ[η(t)] − φ[η(t)]} + {ψ[η(t)] − φ[η(t)]}^t K⁻¹ {η(t + 1) − α − φ[η(t)]} ]   (15)

namely, Q is absolutely continuous with respect to P.
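Since the time set is finite, formula (15) can be verified directly against the ratio of the Gaussian transition densities of the two chains (13) and (14). The one-dimensional sketch below does this for arbitrary (hypothetical) choices of φ, ψ, α, K and a test trajectory:

```python
import numpy as np

# One-dimensional check of the finite-time Girsanov density, eq. (15): the
# claimed density dQ/dP(eta) should equal the ratio of the Gaussian transition
# densities of the two Markov chains (13) and (14) (the law of eta(0) is the
# same under P and Q). phi, psi, alpha, K and the test trajectory are
# arbitrary illustrative choices.
alpha, K = 0.3, 0.7
phi = lambda z: np.tanh(z)
psi = lambda z: 0.5 * z + 1.0
rng = np.random.default_rng(3)
T = 6
eta = rng.normal(size=T + 1)              # a test trajectory eta(0), ..., eta(T)

def normal_pdf(z, mean, var):
    return np.exp(-(z - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

ratio = 1.0                               # product of transition-density ratios
for t in range(T):
    ratio *= normal_pdf(eta[t + 1], psi(eta[t]) + alpha, K) \
           / normal_pdf(eta[t + 1], phi(eta[t]) + alpha, K)

d = psi(eta[:-1]) - phi(eta[:-1])         # drift difference psi - phi
density = np.exp(np.sum(-0.5 * d ** 2 / K
                        + d * (eta[1:] - alpha - phi(eta[:-1])) / K))

print(ratio, density)                     # the two values coincide
```

Expanding the squared terms in the log-ratio of the two Gaussian transition densities reproduces exactly the quadratic and cross terms of (15), which is what the check confirms numerically.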
We shall use this theorem to prove the following:
Theorem 3 The density of the distribution of the network membrane potential QN with respect to P⊗N is given by

dQN/dP⊗N (u) = exp[ N Γ(µu) ]   (16)

where the functional Γ is defined on P(F) by:

Γ(µ) = ∫ log { ∫ exp[ (1/σ²) Σ_{t=0}^{T−1} ( −ξ(t + 1)²/2 + Φ_{t+1}(η) ξ(t + 1) ) ] dgµ(ξ) } dµ(η)   (17)

with:
– for the AF and BF models: Φ_{t+1}(η) = η(t + 1) + θ
– for the IF model: Φ_{t+1}(η) = η(t + 1) + θ − ϕ[η(t) + θ]

Remark 1 Let us recall that the Gaussian measure gµ has been defined previously (definition 2).
PROOF OF THEOREM: Call QN(J) the conditional distribution of the network state trajectory given J, the matrix of synaptic weights. We shall apply the finite-time Girsanov theorem 2 to express dQN(J)/dP⊗N. To apply the theorem we notice that:
– The difference of the two drift terms, ψ[η(t)] − φ[η(t)] in the theorem, is here the synaptic potential (vi(t)). The synaptic potentials vi are functions of the ui's according to (3):

vi(J, u)(t + 1) = Σ_{j=1}^{N} Jij f[uj(t)]

– The expression of the synaptic noise wi(t + 1), as a function of the state trajectory ui in the free dynamics, is given by Φ_{t+1}(ui). The explicit form of Φ depends on the neuron model (formal or IF).

We thus have

dQN(J)/dP⊗N (u) = exp[ (1/σ²) Σ_{t=0}^{T−1} Σ_{i=1}^{N} ( −vi(J, u)(t + 1)²/2 + vi(J, u)(t + 1) Φ_{t+1}(ui) ) ]   (18)

dQN(J)/dP⊗N (u) = Π_{i=1}^{N} exp[ (1/σ²) Σ_{t=0}^{T−1} ( −vi(J, u)(t + 1)²/2 + vi(J, u)(t + 1) Φ_{t+1}(ui) ) ]   (19)
We have thus obtained the probability distribution of the membrane potentials in the coupled neuron models, but for a fixed realization of the Jij's (conditional probability).
Let us now consider the probability of the quenched variables J = (Jij). We observed previously, when we introduced the synaptic potential model, that under the configuration distribution of J the random vectors vi(J, u) are independent, identically distributed according to the normal distribution gµu. To compute the density of QN with respect to P⊗N one has thus to average the conditional density dQN(J)/dP⊗N over the distribution of J. Since the Jij's are independent, the integration separates into products and one gets from (19) the following:
dQN/dP⊗N (u) = exp[ Σ_{i=1}^{N} log ∫ exp( (1/σ²) Σ_{t=0}^{T−1} ( −ξ(t + 1)²/2 + Φ_{t+1}(ui) ξ(t + 1) ) ) dgµu(ξ) ]
The sum over i is equivalent to an integration over the empirical measure µu (6), so we have

dQN/dP⊗N (u) = exp[ N ∫ log { ∫ exp( (1/σ²) Σ_{t=0}^{T−1} ( −ξ(t + 1)²/2 + Φ_{t+1}(η) ξ(t + 1) ) ) dgµu(ξ) } dµu(η) ]
Remark. These equations are reminiscent of the generating functional approach derived e.g. by Sompolinsky et al. [38,13] or Molgedey et al. [31], which allows one to compute the moments of the ui(t)'s. However, the present approach provides a stronger result. While the generating functional method deals with weak convergence (convergence of the generating function), the method developed here allows one to obtain directly the probability distribution of the ui(t)'s. Moreover, by using large deviations techniques one is able to establish almost-sure convergence results (valid for one typical sample).
Let us now state an important corollary of this theorem.
Corollary 4 The empirical measure µu is a random measure governed by QN. It has a density with respect to the distribution of the empirical measure of the free model (that is, governed by P⊗N), given by:

µ ∈ P(F) → exp[N Γ(µ)]
2.6 Summary of notations
Let us recall the notations of this section. They will be extensively used in the following sections:

Notation            Interpretation
i ∈ {1, ..., N}     individual neuron in an N-neuron population
t ∈ {0, ..., T}     time label of the discrete-time dynamics at horizon T
ui(t)               membrane potential of neuron i at time t
xi(t)               activation state of neuron i at time t
vi(t)               synaptic potential of neuron i at time t
wi(t)               synaptic summation noise of neuron i at time t
ui ∈ F              membrane potential trajectory of neuron i from time 0 to time T
u ∈ F^N             network membrane potentials trajectory (network trajectory) from time 0 to time T
xi ∈ E              activation state trajectory of neuron i from time 0 to time T
x ∈ F^N             network activation state trajectory from time 0 to time T
θ                   common firing threshold of the individual neurons
σ                   common standard deviation of the synaptic noise
γ                   leak current factor for the Integrate and Fire (IF) neuron model
f                   neuron transfer function converting membrane potential into activation state
Jij                 synaptic weight from neuron j to neuron i (real random variable)
J = (Jij)           synaptic weight matrix (random N × N matrix)
J̄/N                 expectation of the synaptic weights Jij
J²/N                variance of the synaptic weights Jij
µ ∈ P(F)            generic probability distribution of an individual membrane potential trajectory
η ∈ F               random vector which takes its values in F under probability distribution µ
P ∈ P(F)            probability distribution of an individual membrane potential trajectory for the free dynamics
gµ ∈ P(F)           synaptic potential distribution obtained from µ ∈ P(F) through the central limit approximation
QN ∈ P(F^N)         probability distribution of the network membrane potential trajectory u
3 The mean-field dynamics
3.1 Introduction to mean-field theory
The aim of this section is to describe the evolution of a typical neuron in the limit of large-size networks. This is done by summarizing, in a single term, the effect of the interactions of this neuron with the other neurons of the network. A mean-field theory provides evolution equations of the type of the free dynamics equation, involving single-neuron dynamics, but where an effective interaction term remains. This term, or "mean field", is precisely the average effect of all the interactions of the other neurons with the neuron of interest. So the mean-field dynamics is intermediate between the detailed dynamics, which takes into account all the detailed interactions between neurons, and the free dynamics, which neglects all interactions.
To derive mean-field equations in a direct way, one can replace vi(t) by an approximation which depends only on the statistical distribution of the uj(t)'s. This approximation takes advantage of the large number of synapses Jij to postulate the vanishing of individual correlations between neurons, or between neurons and configuration variables. This is the hypothesis of "local chaos" of Amari ([1],[2]), or of "vanishing correlations", which is usually invoked to support mean-field equations. In the present context, it can be stated as follows: in the large-size limit, the uj's are asymptotically independent, and they are also independent from the configuration parameters.
This approach is very similar to Boltzmann's "molecular chaos" hypothesis¹¹ (1872), tailored to neural network dynamics.
From the central limit theorem we are then allowed to state that the random variable ζ(t + 1) = Σ_{j=1}^{N} Jij f(uj(t)) is a large sum of approximately independent, identically distributed variables, and thus that it has approximately a Gaussian distribution. Thus, we just have to derive its first and second order moments from the common probability distribution of ui = (ui(t)) to know completely the distribution of ζ. Henceforth, from a probability distribution on F which is supposed to be the common probability distribution of the ui's, we are able to derive the distribution of ζ, and then the distribution of the resulting potential trajectory and of the state trajectory of a generic neuron of the network.
The assumption on which the mean-field approximation is based may look entirely wrong at first glance. However, in the present context it gives exactly the same results as more elaborate methods such as the generating functional method, or the large deviations approach developed below. Moreover, it is supported by the "propagation of chaos" result proved in section 3.4. Note however that in models with correlated interactions Jij (such as spin-glasses, where Jij = Jji), the "local chaos" hypothesis leads to wrong results (at low temperature), while generating functional methods [39,15,14] and large deviations techniques [7] can still be used.
3.2 Mean-field propagation operator and mean-field equation
We now define an evolution operator L on the set P(F) of probability distributions on F, that we call the mean-field propagation operator (or mean-field propagator). Let µ ∈ P(F) be a probability measure on F. Let us compute the moments of

ζ(t + 1) = Σ_{j=1}^{N} Jij f[uj(t)],   ∀t ∈ {0, 1, ..., T − 1}

where the uj's are independent, identically distributed random vectors with probability distribution µ, also independent from the configuration parameters Jij.
Since E[Jij] = J̄/N and Var[Jij] = J²/N, we have

E[ζ(t + 1)] = J̄ ∫F f[η(t)] dµ(η)
Cov[ζ(s + 1), ζ(t + 1)] = J² ∫F f[η(s)] f[η(t)] dµ(η)   (20)

Notice that the expression of the covariance is asymptotic, since the sum of the squares of the expectations of the synaptic weights may be neglected. So ζ is a Gaussian random vector in F with probability distribution gµ (see definition 2).
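The moments (20) can be checked by Monte Carlo under the local-chaos hypothesis, drawing the uj's i.i.d. from µ and independently of the Jij's. In the sketch below µ = N(0, 1), f = tanh, and the values of J̄, J² are illustrative choices:

```python
import numpy as np

# Monte Carlo check of the moments (20) under the local-chaos hypothesis:
# the u_j(t) are drawn i.i.d. from mu and independently of the J_ij.
# Here mu = N(0, 1), f = tanh, and Jbar, J2 are illustrative choices.
rng = np.random.default_rng(4)
N, T = 400, 3
Jbar, J2 = 1.0, 2.0
f = np.tanh

n_draws = 10_000
zeta = np.empty((n_draws, T))
for k in range(n_draws):
    u = rng.normal(size=(N, T))                        # u_j(t) i.i.d. ~ mu
    J = rng.normal(Jbar / N, np.sqrt(J2 / N), size=N)  # one row (J_ij), j=1..N
    zeta[k] = J @ f(u)                                 # zeta(t+1), t = 0..T-1

# Predictions from (20), with the integrals over mu estimated by Monte Carlo:
m = rng.normal(size=1_000_000)
mean_pred = Jbar * f(m).mean()            # Jbar * int f[eta(t)] dmu
var_pred = J2 * (f(m) ** 2).mean()        # J2 * int f[eta(t)]^2 dmu (case s = t)

print(zeta.mean(), mean_pred)
print(zeta.var(), var_pred)
```

The empirical mean and variance of ζ agree with the predictions up to Monte Carlo error and a correction of order 1/N, which is the asymptotic neglect mentioned above.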
Definition 3 Let µ be a probability distribution on F such that the distribution of the first component is m0. Let u, w, v be three independent random vectors with the following distributions:
– the distribution of u is µ,
– the distribution of w is N(0, σ²IT),
– the distribution of v is gµ.
11 The word "chaos" is somewhat confusing here, especially because we also dealt with deterministic chaos in paper I. Actually, "deterministic chaos" and the related exponential correlation decay can be invoked, in statistical physics, to obtain (deterministic) equations for the mean value and the Gaussian fluctuations of relevant observables. In the present case, the basic reason that makes the mean-field approach "work" is however different. It is the mere fact that the model is fully connected and that the Jij's are independent and vanish in the limit N → ∞. This is a standard result in statistical physics models such as the Curie-Weiss model, but obtaining it for the trajectories of a dynamical model with quenched disorder requires more elaborate techniques.
Then L(µ) is the probability distribution on F of the random vector ϑ which is defined by

ϑ(0) = u(0)
ϑ(t + 1) = v(t + 1) + w(t + 1) − θ   (21)

for the formal neuron models (BF and AF), and by

ϑ(0) = u(0)
ϑ(t + 1) = ϕ[u(t) + θ] + v(t + 1) + w(t + 1) − θ   (22)

for the IF neuron model. The operator L, defined on P(F), is called the mean-field propagation operator.
Definition 4 The following equation on µ ∈ P(F)

L(µ) = µ   (23)

is called the mean-field equation (MFE).

Remark 2 The mean-field equation is the culmination of the mean-field approach. To determine the distribution of an individual trajectory, we suppose that this distribution governs the interaction of all the units onto the selected one. The resulting distribution of the selected unit has to be the same as the generic distribution. This is summarized in the mean-field equation L(µ) = µ.

Equations (21) (resp. (22) for the IF model), with the specification of the probability distributions, define the mean-field dynamics. Actually, the distribution L(µ) is just the convolution of the probability distribution P and the Gaussian distribution gµ. More precisely, applying the discrete-time Girsanov theorem 19 of the appendix, we have:
Theorem 5 L(µ) is absolutely continuous with respect to P and its density is given by

dL(µ)/dP (η) = ∫ exp{ (1/σ²) Σ_{t=0}^{T−1} [ −(1/2) ξ(t + 1)² + Φ_{t+1}(η) ξ(t + 1) ] } dgµ(ξ)   (24)
PROOF : The proof is essentially a simplified version of the application of the finite-time
Girsanov theorem which was used to prove theorem (3). The conditioning is done here with
respect to v which is the difference between the drift terms of the free dynamics and of the
mean-field dynamics.
Remark 3 We have to notice for further use that

Γ(µ) = ∫ log[ dL(µ)/dP (η) ] dµ(η)   (25)
In all the cases, for 0 < t < T, the projection of the distribution L(µ) on the t + 1 first time steps depends only on the projection of µ on the t first instants. Since the projection of µ on the initial instant is always m0, the projection of L(µ) on the two first instants {0, 1} depends only on m0 and, similarly, the projection of L^t(µ) on the t + 1 first instants {0, 1, ..., t} depends only on m0. Eventually µT = L^T(µ) = L^T(P) depends only on m0 and it is the only fixed point of the mean-field propagation operator L.
So we have shown the following:

Theorem 6 The probability measure µT = L^T(P) is the only solution of the mean-field equation with initial condition m0.
3.3 Large Deviation Principle for RRNN mean-field theory
In this section, we fully use the computation results of the previous section to show the rigorous
foundations of mean-field theory for RRNN. The approach is the following:
(a) The empirical measure µu of the network dynamics satisfies a large deviation principle (LDP) under P⊗N with a good rate function µ ∈ P(F) → I(µ, P) ∈ IR+, the relative entropy between µ and P. Actually, when the size of the network tends to infinity, the empirical measure converges in distribution exponentially fast towards P. The definition of an LDP and its consequences are outlined in the appendix in definition 3.3. Sanov's theorem is stated in the appendix, theorem 24.
(b) According to corollary 4, the density of the new distribution of µu with respect to the original one, when we switch from P⊗N, which governs the free dynamics, to QN, which governs the RRNN dynamics, is exp(NΓ(µ)).
(c) Combining (a) and (b), one obtains that under QN the sequence µu satisfies an LDP with the good rate function

H(µ) = I(µ, P) − Γ(µ)   (26)

This kind of result is used in statistical physics under the name of the Gibbs variational principle [21]. The functional H is called, in statistical physics, a thermodynamic potential (e.g. free energy or Gibbs potential). Notice that the classical statistical mechanics framework deals with equilibrium probability distributions on the space of microscopic states; here it is applied to trajectories. For that reason, this approach is called dynamic mean-field theory [38]. Supporting it rigorously is rather technical: one has to show that H is lower semi-continuous and is a good rate function (see Varadhan's theorem 23 of the appendix). To reduce the size of the paper we admit the following result (see [7] for a general approach and [34] for the proof for the AFRRNN model):

Theorem 7 Under the respective distributions QN, the family of empirical measures (µN) of P(F) satisfies a full large deviation principle with a good rate function H given by (26).

(d) It is clear from remark 3 that H(µT) = 0, where µT is the unique solution of the MFE with initial condition m0, i.e. the fixed point of L. Thus µT is a minimum of H.
The basic computation is the following: first we apply definition 19 of the relative entropy, given in the appendix:

I(µT, P) = ∫ log[ dµT/dP (η) ] dµT(η)

Since µT is the solution of the MFE, we have

dµT/dP (η) = dL(µT)/dP (η)

Then we apply the previous remark 3, which states

Γ(µT) = ∫ log[ dL(µT)/dP (η) ] dµT(η)

to check

I(µT, P) = Γ(µT) ⇒ H(µT) = 0
(e) To obtain the exponential convergence of the sequence of empirical measures µu under QN when N → ∞, one eventually has to show that H(µ) = 0 ⇒ µ = µT. This point is technical too. It is proved in a similar, still more general framework (continuous time) in [7] using a Taylor expansion. The same method is applied to show uniqueness for the AFRRNN model in [34].

Thus, we have the main result of this section:

Theorem 8 When the size N of the network goes to infinity, the sequence of empirical measures (µu) converges in probability exponentially fast towards µT, which is the unique solution of the mean-field equation L(µ) = µ.
3.4 Main results of RRNN mean-field theory
First notice that theorem 8 may be extended to RRNN with fast decreasing connection weights
distribution. More precisely, assume that the common distribution νN of the connexion weights
satisfies the following:
Hypothesis 9 (H) For all N, the common probability law νN of the connexion weights satisfies:

(i) ∫ w dνN(w) = J̄/N
(ii) ∫ w² dνN(w) = J²/N + J̄²/N²
(iii) ∃a > 0, ∃D > 0 such that ∫ exp(aN w²) dνN(w) ≤ D

Then the family (νN) is said to satisfy hypothesis (H).
Then, from theorem 8 two important results can be deduced rigorously. The first one is a
”propagation of chaos” result which supports the basic intuition of mean field theory about
the asymptotic independence of finite subsets of individuals when the population size grows to
infinity.
Theorem 10 Let k be a positive integer and (fi)i∈{1,...,k} be k continuous bounded functions on F. When the size N of the network goes to infinity, then

∫ Π_{i=1}^k fi(ui) dQN(u) → Π_{i=1}^k ∫ fi(η) dµ0(η)
PROOF : The idea of the proof is due to Sznitman [41].
First, a straightforward consequence of theorem 8 is that, when we apply the sequence of random measures (µN) to the test function F on P(F) defined by F(µ) = Π_{i=1}^k ∫ fi(u) dµ(u), we get the convergence

lim_{N→∞} ∫ Π_{i=1}^k [ (1/N) Σ_{j=1}^N fi(uj) ] dQN(u) = Π_{i=1}^k ∫ fi(η) dµ0(η)

Thus it remains to compare ∫ Π_{i=1}^k [ (1/N) Σ_{j=1}^N fi(uj) ] dQN(u) and ∫ Π_{i=1}^k fi(ui) dQN(u). From the symmetry property of QN, it is clear that for any subset {j1, ..., jk} of k neurons among N, we have

∫ Π_{i=1}^k fi(u_{ji}) dQN(u) = ∫ Π_{i=1}^k fi(ui) dQN(u)

If we develop ∫ Π_{i=1}^k [ (1/N) Σ_{j=1}^N fi(uj) ] dQN(u), we get

∫ Π_{i=1}^k [ (1/N) Σ_{j=1}^N fi(uj) ] dQN(u) = (1/N^k) Σ_{(j1,...,jk)} ∫ Π_{i=1}^k fi(u_{ji}) dQN(u)   (27)

The sum in (27) runs over all maps from {1, ..., k} into {1, ..., N}. The equality would be proved if the sum ran only over the injections of {1, ..., k} into {1, ..., N}, since the corresponding terms are all equal. But when N goes to infinity the proportion of injections, which is N!/[(N − k)! N^k], goes to 1, and thus the contribution of repeated k-tuples is negligible when N is large. Therefore

lim_{N→∞} [ ∫ Π_{i=1}^k ( (1/N) Σ_{j=1}^N fi(uj) ) dQN(u) − ∫ Π_{i=1}^k fi(ui) dQN(u) ] = 0
Still, this propagation of chaos result holds when the expectation of the test function is taken with respect to the connection distribution. Thus, it does not say anything precise about the observations relative to a single large-sized network.
Actually, since exponentially fast convergence in probability implies almost-sure convergence by the Borel-Cantelli lemma, we are able to infer the following statement from theorem 8. Recall that we denote (as in the proof of theorem 3) by QN(J) the conditional distribution of the network state trajectory given the system of synaptic weights J, and that we define µN(u) = (1/N) Σ_{i=1}^N δ_{ui} for the empirical measure on F associated to a network trajectory u ∈ F^N.
Theorem 11 Let F be a bounded continuous functional on P(F). We have, almost surely in J,

lim_{N→∞} ∫ F[µN(u)] dQN(J)(u) = F(µT)
Note that we cannot use this theorem to infer a "quenched" propagation of chaos result similar to theorem 10, which was an "annealed" propagation of chaos result (i.e. averaged over the connection weight distribution). This is not possible because, for a given network configuration J, QN(J) is no longer symmetric with respect to the individual neurons. Nevertheless, we obtain the following crucial result by applying theorem 11 to the case where F is the linear form F(µ) = ∫ f dµ.
Theorem 12 Let f be a bounded continuous function on F. We have, almost surely in J,

lim_{N→∞} (1/N) Σ_{i=1}^N ∫ f(ui) dQN(J)(u) = ∫ f(η) dµT(η)
The consequences of these results are developed in the next section.
4 Mean-field dynamics for analog networks
We are interested in the stationary dynamics of large random recurrent neural networks. Moreover, since we want to study the meaning of the oscillations and of the (deterministic) chaos observed in finite-sized models (see paper I), the regime of low noise is especially interesting, since the oscillations are practically canceled if the noise is too strong. For these reasons, we cannot in practice be satisfied with obtaining the limit µT of the empirical measures: we shall extract from µT dynamical information on the asymptotics of the network trajectories. Notice that the connexion weight distribution is not necessarily Gaussian, as long as it satisfies hypothesis (H:9).
4.1 Mean-field dynamics of homogeneous networks
4.1.1 General mean-field equations for moments
Recall that in section 2 of this chapter (definition 2) we defined, for any probability measure µ ∈ P(F), the first two moments of µ, mµ and cµ. Let us recall these notations:

mµ(t + 1) = J̄ ∫ f[η(t)] dµ(η)
cµ(s + 1, t + 1) = J² ∫ f[η(s)] f[η(t)] dµ(η)
qµ(t + 1) = cµ(t + 1, t + 1)

where f is the sigmoid function f(x) = e^x / (1 + e^x).
In this section, in order to alleviate notations, we write m, c, q instead of mµT, cµT, qµT, where µT is the asymptotic probability distribution that was shown in the last section to be a fixed point of the mean-field evolution operator L. By expressing that µT is a fixed point of L, we shall derive autonomous evolution dynamics for the moments m, c, q.
More precisely, we have from the definition of L (see definition 3 in section 3) that the law of η(t) under µT is a Gaussian law of mean m(t) − θ and of variance q(t) + σ² (see equations (20) and (21)). So we have

m(t + 1) = J̄ ∫_IR f[√(q(t) + σ²) ξ + m(t) − θ] dγ(ξ)
q(t + 1) = J² ∫ f[√(q(t) + σ²) ξ + m(t) − θ]² dγ(ξ)   (28)

where γ is the standard Gaussian probability on IR: dγ(ξ) = (1/√(2π)) exp(−ξ²/2) dξ.
Moreover, the covariance of (η(s), η(t)) under µT is c(s, t) if s ≠ t. Thus, in this case, considering the standard integration formula for a 2-dimensional Gaussian vector,

E[f(X)g(Y)] = ∫∫ f[ √( (Var(X)Var(Y) − Cov(X,Y)²) / Var(Y) ) ξ1 + (Cov(X,Y)/√Var(Y)) ξ2 + E(X) ] g[ √Var(Y) ξ2 + E(Y) ] dγ(ξ1) dγ(ξ2)

we obtain the following evolution equation for the covariance:

c(s + 1, t + 1) = J² ∫∫ f[ √( ([q(s)+σ²][q(t)+σ²] − c(s,t)²) / (q(t)+σ²) ) ξ1 + (c(s,t)/√(q(t)+σ²)) ξ2 + m(s) − θ ] f[ √(q(t)+σ²) ξ2 + m(t) − θ ] dγ(ξ1) dγ(ξ2)   (29)
The dynamics of the mean-field system (28,29) can be studied as a function of the parameters:
– the mean J̄ of the connexion weights,
– the variance J² of the connexion weights,
– the firing threshold θ of the neurons.
Notice that the time and size limits do not necessarily commute. Therefore, a result on the long-time dynamics of the mean-field system may not be an exact prediction of the large-size limit of the stationary dynamics of random recurrent networks. However, for our model, extensive numerical simulations have shown ([12], [17] and chapter I) that the time asymptotics of the mean-field system is informative about the stationary dynamics of moderately large random recurrent networks (from sizes of a few hundred neurons).
More precisely, in the low noise limit (σ << 1), two points of view are interesting:
– the ensemble stationary dynamics, given by the study of the time asymptotics of the dynamical system

m(t + 1) = J̄ ∫ f[√q(t) ξ + m(t) − θ] dγ(ξ)
q(t + 1) = J² ∫ f[√q(t) ξ + m(t) − θ]² dγ(ξ)   (30)

– the synchronization of the individual neuron trajectories. Actually, m(t) and q(t) may converge, when t → ∞, towards limits m* and q* (stable equilibria of the dynamical system (30)) with a great variety of dynamical behaviors. Each individual trajectory may converge to a fixed point, in which case (m*, q*) are the statistical moments of the fixed-point empirical distribution. Another case is provided by individual chaotic oscillations around m*, where q* measures the amplitude of the oscillations.
The discrimination between these two situations, which are very different from the point of view of neuron dynamics, is given by the study of the mean quadratic distance, outlined in the next paragraph.
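As a concrete illustration (not from the paper; all parameter values are assumptions), the low-noise system (30) can be iterated numerically, approximating the Gaussian integrals by Gauss-Hermite quadrature:

```python
import numpy as np

def f(x):
    # sigmoid transfer function f(x) = e^x / (1 + e^x), overflow-safe form
    return 0.5 * (1.0 + np.tanh(0.5 * x))

# Gauss-Hermite nodes/weights for the standard Gaussian measure gamma
nodes, weights = np.polynomial.hermite_e.hermegauss(80)
weights = weights / np.sqrt(2.0 * np.pi)

def mean_field_step(m, q, J_bar, J2, theta):
    # one iteration of system (30) in the low-noise limit
    arg = np.sqrt(max(q, 0.0)) * nodes + m - theta
    m_next = J_bar * np.sum(weights * f(arg))
    q_next = J2 * np.sum(weights * f(arg) ** 2)
    return m_next, q_next

# illustrative parameters: balanced case J_bar = 0, variance J2 = 4, theta = 0
m, q = 0.0, 0.5
for _ in range(200):
    m, q = mean_field_step(m, q, J_bar=0.0, J2=4.0, theta=0.0)
# with J_bar = 0 the mean stays at 0 and q converges to a fixed point q*
```

With J̄ = 0 the mean equation is trivially zero and the variance recurrence converges to its fixed point, which is the situation studied in section 4.1.3.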
4.1.2 Study of the mean quadratic distance
The concept of mean quadratic distance was introduced by Derrida and Pomeau in [19] to study the chaotic dynamics of extremely diluted large-size networks. The method was originally devised to check the sensitivity of the dynamical system to initial conditions. The idea is the following: let us consider two network trajectories u(1) and u(2) of the same network configuration, given by the synaptic weight matrix (Jij). Their mean quadratic distance is defined by

d_{1,2}(t) = (1/N) Σ_{i=1}^N [u_i^{(1)}(t) − u_i^{(2)}(t)]²
For a given configuration, if the network trajectory converges towards a stable equilibrium or towards a limit cycle (synchronous individual trajectories), then the mean quadratic distance between closely initialized trajectories goes to 0 as time goes to infinity. On the contrary, when this distance stays away from 0, for instance converges towards a non-zero limit however close the initial conditions are, the network dynamics presents in some sense "sensitivity to initial conditions", and this behavior of the mean quadratic distance can be considered symptomatic of chaos. We applied this idea in [11] to characterize the instability of random recurrent neural networks.
In the context of the large-deviation-based mean-field theory, the trajectories u(1) and u(2) are submitted to independent synaptic noises and the mean quadratic distance is defined by

d_{1,2}(t) = (1/N) Σ_{i=1}^N ∫ [u_i^{(1)}(t) − u_i^{(2)}(t)]² dQ_N^{(1,2)}(u^{(1)}, u^{(2)})   (31)

where Q_N^{(1,2)} is the joint probability law on F^{2N} of the network trajectories (u^{(1)}, u^{(2)}) over the time interval {0, ..., T}. Following the same lines as in the last sections, it is easy to show a large deviation principle for the empirical measure of the sample (u_i^{(1)}, u_i^{(2)})_{i∈{1,...,N}} under Q_N^{(1,2)} when N → ∞. Then we get the almost sure convergence theorem

lim_{N→∞} (1/N) Σ_{i=1}^N ∫ f1(u_i^{(1)}) f2(u_i^{(2)}) dQ_N^{(1,2)}(J)(u) = ∫ f1(η1) f2(η2) dµ_T^{(1,2)}(η1, η2)

where µ_T^{(1,2)} is the fixed point of the mean-field evolution operator L^{(1,2)} of the joint trajectories, which is defined on the probability measure set P(F × F) exactly in the same way as L was defined previously in definition 3.
Then we define the instantaneous covariance between two trajectories:

Definition 5 The instantaneous cross covariance between the two trajectories under their joint probability law is defined by

c_{1,2}(t) = ∫ η1(t) η2(t) dµ_T^{(1,2)}(η1, η2)   (32)

where µ_T^{(1,2)} is the fixed point measure of the joint evolution operator L^{(1,2)} defined from an initial condition µ_init^{(1,2)}.

We can then follow the argument that was already used for the covariance evolution equation (29). Thus we obtain the following evolution equation for the instantaneous cross covariance:
c_{1,2}(t + 1) = J² ∫∫ f[ √( ([q1(t)+σ²][q2(t)+σ²] − c_{1,2}(t)²) / (q2(t)+σ²) ) ξ1 + (c_{1,2}(t)/√(q2(t)+σ²)) ξ2 + m1(t) − θ ] f[ √(q2(t)+σ²) ξ2 + m2(t) − θ ] dγ(ξ1) dγ(ξ2)   (33)
The proof is detailed in [33].
The evolution of the mean quadratic distance now follows from the expansion of the square:

Proposition 13 The mean quadratic distance obeys the relation

d_{1,2}(t) = q1(t) + q2(t) − 2c_{1,2}(t) + [m1(t) − m2(t)]²   (34)
4.1.3 Study of the special case of balanced inhibition
In order to show how the previous equations are used, we shall display the special case of balanced inhibition and excitation. The study of the corresponding discrete-time 1-dimensional dynamical system for various parameters was addressed in paper I. See also [12] and [17] for more details.
We choose in the previous model the special case where J̄ = 0. This choice simplifies the study of the evolution considerably, since ∀t, m(t) = 0 and the recurrence over q(t) is autonomous. So we just have to study the attractors of a single real function.
Moreover, the interpretation of J̄ = 0 is that there is a general balance in the network between inhibitory and excitatory connections. Of course, the model is still far from biological plausibility, since the generic neuron is endowed with both excitatory and inhibitory functions. In the next section, a model with several populations will be addressed. Nevertheless, the case J̄ = 0 is of special interest. In the limit of low noise, the mean-field dynamical system amounts to the recurrence equation:
q(t + 1) = J² ∫ f[√q(t) ξ − θ]² dγ(ξ)   (35)

We can scale q(t) by J² and we obtain

q(t + 1) = ∫ f[√(J² q(t)) ξ − θ]² dγ(ξ) = h_{J²,θ}[q(t)]   (36)

where the function h_{J²,θ} from IR+ into IR+ is defined by

h_{J²,θ}(q) = ∫ f[√(J² q) ξ − θ]² dγ(ξ)
This function is positive, increasing, and tends to 0.5 when q tends to infinity. The recurrence (36) admits on IR+ a single stable fixed point q*(J², θ). This fixed point is increasing in J² and decreasing in θ. Figure 1 represents the diagram of the variations of the function q*(J², θ). It is obtained from a numerical simulation, with hJ²,θ computed by a Monte-Carlo method.
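The fixed point q*(J², θ) can also be obtained by directly iterating h_{J²,θ}; the sketch below (with assumed, illustrative parameter values, and Gauss-Hermite quadrature in place of the Monte-Carlo evaluation used for figure 1) checks the monotonicity in J² and θ stated above:

```python
import numpy as np

nodes, weights = np.polynomial.hermite_e.hermegauss(80)
weights = weights / np.sqrt(2.0 * np.pi)

def f(x):
    # sigmoid e^x / (1 + e^x), overflow-safe form
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def h(q, J2, theta):
    # h_{J^2,theta}(q) = integral of f(sqrt(J^2 q) xi - theta)^2 dgamma(xi)
    return float(np.sum(weights * f(np.sqrt(J2 * q) * nodes - theta) ** 2))

def q_star(J2, theta, iters=1000):
    # direct fixed-point iteration of the recurrence (36)
    q = 1.0
    for _ in range(iters):
        q = h(q, J2, theta)
    return q

qa = q_star(4.0, 0.0)
qb = q_star(9.0, 0.0)   # larger J^2, expect a larger q*
qc = q_star(4.0, 1.0)   # larger theta, expect a smaller q*
```

Since h is increasing and bounded, the iteration converges monotonically, which is why plain iteration suffices here.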
Let us now consider the stability of the network dynamics by studying the covariance and the mean quadratic distance evolution equations. The covariance evolution equation (29), in the low noise limit and when t → ∞, amounts to

c(s + 1, t + 1) = J² ∫∫ f[ √( (q*² − c(s,t)²) / q* ) ξ1 + (c(s,t)/√q*) ξ2 − θ ] f(√q* ξ2 − θ) dγ(ξ1) dγ(ξ2)   (37)

Scaling the covariance by J², we obtain the recurrence

c(s + 1, t + 1) = H_{J²,θ,q}[c(s, t)]

with

H_{J²,θ,q}(c) = ∫∫ f[ √( J²(q² − c²)/q ) ξ1 + √(J²/q) c ξ2 − θ ] f[ √(J² q) ξ2 − θ ] dγ(ξ1) dγ(ξ2)   (38)
Fig. 1. Variations of the fixed point q ∗ (J 2 , θ) as a function of the network configuration parameters
Fig. 2. Variations of q ∗ − c∗ as a function of the network configuration parameters J 2 and θ
It is clear from comparison with equation (35) that q* is a fixed point of H_{J²,θ,q*}. To study the stability of this fixed point, a standard computation shows that

dH_{J²,θ,q*}/dc (q*) = J² ∫ f′[√(J² q*) ξ − θ]² dγ(ξ)   (39)

Then, as stated in paper I, the condition dH_{J²,θ,q*}/dc (q*) ≤ 1 is a necessary and sufficient condition for the stability of q*. A detailed and rigorous proof for θ = 0 is provided in [33]. Then two cases occur.
– In the first case, where dH_{J²,θ,q*}/dc (q*) ≤ 1, the stationary limit of c(s + τ, t + τ) when τ → ∞ does not depend on t − s and is c* = q*. The stationary limit of the mean-field Gaussian process is a random point. Its variance is increasing in J² and decreasing in θ.
– In the second case, where dH_{J²,θ,q*}/dc (q*) > 1, the stationary limit does not depend on t − s when t − s ≠ 0 and is equal to c* < q*. The stationary limit of the Gaussian process is the sum of a random point and of a white noise. From the dynamical system point of view, this corresponds to a chaotic regime (with infinitely many degrees of freedom). The signature of chaos is given by the evolution of the mean quadratic distance. The instantaneous covariance also converges towards c*. Therefore the mean quadratic distance converges towards a non-zero limit, which is independent of the initial condition distance. As shown in [11], the transition from fixed point to chaos is given by an explicit equation, which is the same as the equation of the de Almeida-Thouless line [5] in spin-glass models. The analogy between these two systems is further developed in [6,11].
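As a numerical sketch (parameter values are illustrative assumptions, and a fine grid replaces Monte-Carlo because the integrand becomes sharply peaked at large J²), the stability index below evaluates J² ∫ f′(√(J² q*) ξ − θ)² dγ(ξ) at the fixed point q*, with values above 1 signaling the chaotic regime:

```python
import numpy as np

# fine grid for the Gaussian integrals (integrand is narrow when J^2 is large)
xi = np.linspace(-12.0, 12.0, 24001)
dx = xi[1] - xi[0]
phi = np.exp(-0.5 * xi ** 2) / np.sqrt(2.0 * np.pi)

def f(x):
    return 0.5 * (1.0 + np.tanh(0.5 * x))   # sigmoid e^x / (1 + e^x)

def fprime(x):
    s = f(x)
    return s * (1.0 - s)

def gauss_mean(g):
    # integral of g(xi) against the standard Gaussian measure
    return float(np.sum(g * phi) * dx)

def q_star(J2, theta, iters=500):
    q = 1.0
    for _ in range(iters):
        q = gauss_mean(f(np.sqrt(J2 * q) * xi - theta) ** 2)
    return q

def stability_index(J2, theta):
    # dH/dc at c = q*: J^2 * int f'(sqrt(J^2 q*) xi - theta)^2 dgamma(xi)
    q = q_star(J2, theta)
    return J2 * gauss_mean(fprime(np.sqrt(J2 * q) * xi - theta) ** 2)

weak = stability_index(4.0, 0.0)      # small variance: stable fixed point expected
strong = stability_index(400.0, 0.0)  # large variance: chaotic regime expected
```

Scanning this index over a (J², θ) grid is one way to reproduce the transition line of figure 2 numerically.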
Figures 1 and 2 show the evolution of q* and q* − c* as a function of J² and θ. When J² is small, there is no bifurcation. When J² is larger, a transition to chaos occurs as θ decreases; as J² grows, the transition to chaos occurs for increasing values of θ. Figure 31 of paper I shows how varying the input (which is equivalent to varying the threshold) allows one to postpone the occurrence of chaos.
4.2 Mean-field dynamics of 2-population AFRRNN
4.2.1 2-population AFRRNN model
As announced previously, the assumption of a homogeneous connexion weight model is not plausible. Besides, RRNN models with several neuron populations have been studied in the literature as early as 1977 [2] and have been thoroughly investigated in the last decade (see for instance [27]). The heterogeneity of the neuron population induces interesting and complex dynamical phenomena such as synchronization. Actually, the mean-field theory that was developed in the previous sections may be extended without major difficulty to several neuron populations. To give a practical idea of what can be obtained from such extensions, we consider here two populations with respectively N1 = λN and N2 = (1 − λ)N neurons, where λ ∈ ]0, 1[ and N → ∞.
Four random connexion matrices have to be considered in this model, J11, J12, J21, J22, where Jij is the matrix of connexion weights from population j to population i. The random matrix Jij is an (Ni × Nj) random matrix with independent identically distributed entries. Their distribution is governed by the statistical parameters (J̄ij, J²ij) and obeys hypothesis (9). The four matrices are mutually independent.
However, the technical hypothesis (H) does not allow us to give the connexion weights a rigorously constant sign, which would permit distinguishing between inhibitory and excitatory neurons. Indeed, there is no probability distribution on the positive (resp. negative) real numbers having a mean and a variance scaling respectively as J̄/N and J²/N. Thus, the positivity of the support would induce on the other side of the distribution a heavy tail which does not respect assumption (iii) of hypothesis (H). However, it is possible to consider probability distributions which satisfy hypothesis (H) and which load the negative numbers (or alternatively the positive ones) with arbitrarily small probability.
We consider here a 2-population model with a population of excitatory neurons and a
population of inhibitory neurons (up to the above restriction).
4.2.2 General mean-field equations for moments
A large deviation principle may be obtained for the 2-population model with Gaussian connexion weights. So the convergence in finite time to the mean-field dynamics is shown in the present model with the same proof as in the previous 1-population model. See [33] for a rigorous proof and [16] for a more practical statement of the results. The limit of the empirical measure is the law of a Gaussian vector taking its values in F × F. Each factor describes the distribution of one neural population. Note that the two components are independent. As for the 1-population model, we denote by mk(t), qk(t), ck(s, t) the mean, variance and covariance at given times of the empirical measure of population k (k ∈ {1, 2}). The mean-field evolution equations for these moments are described by the following system:
m_k(t + 1) = Σ_{j∈{1,2}} J̄_kj ∫ f[√(q_j(t) + σ²) ξ + m_j(t) − θ_j] dγ(ξ)

q_k(t + 1) = Σ_{j∈{1,2}} J²_kj ∫ f[√(q_j(t) + σ²) ξ + m_j(t) − θ_j]² dγ(ξ)

c_k(s + 1, t + 1) = Σ_{j∈{1,2}} J²_kj ∫∫ f[ √( ([q_k(s)+σ²][q_k(t)+σ²] − c_k(s,t)²) / (q_k(t)+σ²) ) ξ1 + (c_k(s,t)/√(q_k(t)+σ²)) ξ2 + m_k(s) − θ_k ] × f[√(q_k(t)+σ²) ξ2 + m_k(t) − θ_k] dγ(ξ1) dγ(ξ2)   (40)
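The mean and variance lines of this system are straightforward to iterate numerically (the covariance line is analogous). The sketch below uses the parameter structure of (41) with illustrative values of g, d, σ² and the thresholds θ1 = 0, θ2 = 0.3; all numerical values are assumptions, not a reproduction of the simulations of [16]:

```python
import numpy as np

nodes, weights = np.polynomial.hermite_e.hermegauss(80)
weights = weights / np.sqrt(2.0 * np.pi)

def f(x):
    return 0.5 * (1.0 + np.tanh(0.5 * x))   # sigmoid e^x / (1 + e^x)

g, d, sigma2 = 4.0, 2.0, 0.01                # illustrative parameters
J_bar = np.array([[g * d, -2.0 * g * d],
                  [g * d,  0.0]])            # means J_bar[k, j], population j -> k
J2    = np.array([[g ** 2, (2.0 * g) ** 2],
                  [g ** 2, 0.0]])            # variances J2[k, j]
theta = np.array([0.0, 0.3])

def step(m, q):
    # one iteration of the mean / variance lines of system (40)
    m_next, q_next = np.zeros(2), np.zeros(2)
    for k in range(2):
        for j in range(2):
            arg = np.sqrt(q[j] + sigma2) * nodes + m[j] - theta[j]
            m_next[k] += J_bar[k, j] * np.sum(weights * f(arg))
            q_next[k] += J2[k, j] * np.sum(weights * f(arg) ** 2)
    return m_next, q_next

m, q = np.zeros(2), np.ones(2)
for _ in range(100):
    m, q = step(m, q)
# since 0 < f < 1, at every step |m_k| <= sum_j |J_bar[k,j]| and 0 <= q_k <= sum_j J2[k,j]
```

Depending on g and d, the iterates may settle to a fixed point or oscillate, which is precisely the regime distinction discussed in the next section.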
4.2.3 Results and discussion
As far as numerical studies are concerned, we choose the following values for the statistical parameters:

J̄11 = gd,    J11 = g
J̄12 = −2gd,  J12 = 2g
J̄21 = gd,    J21 = g
J̄22 = 0,     J22 = 0   (41)
In this study, following a standard biological scheme, excitatory neurons project both to excitatory and inhibitory neurons, while inhibitory neurons project only to excitatory neurons. Moreover, the number of parameters is reduced to allow numerical exploration of the synchronization regimes. We keep two independent parameters:
– g stands for the nonlinearity of the transfer function,
– d stands for the differentiation of the two populations (inhibitory vs. excitatory).
As previously, there is no variation of the individual firing thresholds. The excitatory neuron threshold θ1 is chosen equal to 0 and the inhibitory neuron threshold θ2 is chosen equal to 0.3, because the activation potential of inhibitory neurons is always positive.
In the bifurcation map of figure 3 (extracted from [16]), several dynamical regimes are displayed together with the corresponding ranges of the parameters d and g. Notice that the theoretical predictions of the mean-field equations (40) and large-scale simulations of the behavior of large-sized networks are consistent.
As in the homogeneous case, there is a transition to chaos for weak d. When the differentiation parameter d is sufficiently large (about 2), the fixed point loses its stability through a Hopf bifurcation and gives rise to synchronous oscillations as g grows. There is then a succession of bifurcations leading to chaos (see paper I).
Moreover, a new phenomenon occurs. For large g, there is a significant transition regime between stationary chaos and synchronized oscillations, named "cyclo-stationary chaos". In that regime, the statistical parameters exhibit regular periodic oscillations, though individual trajectories diverge from one another with a mean quadratic distance behavior characteristic of chaos.
5 MFT-based oscillation analysis in IF networks.
In this section we would like to give an interesting application of mean-field approaches for spiking neurons, developed in [9]. This paper is part of a line of research which studies the occurrence of synchronized oscillations in recurrent spiking neural networks [4,3,8], in order to account for the spatio-temporal synchronization effects observed in many situations in neural systems [24,36,10,35].
Fig. 3. Bifurcation map of the 2-population model
5.1 IFRRNN continuous-time model
The model of [9] has continuous time. There is no synaptic noise but the neurons are submitted to a random external input. So equation (9) has to be replaced by

u(t) < θ ⇒ τ u̇(t) = −u(t) + vnet(t) + vext(t)
u(t − 0) = θ ⇒ u(t + 0) = ϑ   (42)

where
– τ is the characteristic time of the neuron,
– vnet is the synaptic input from the network,
– vext is the external input,
– ϑ is the reset potential, with 0 < ϑ < θ. Note that u(t − 0) and u(t + 0) are respectively the left and right limits of u at firing time t. Thus, the refractory period is assumed to be zero.
This model of continuous-time neuron dynamics was introduced in paper I, section 2.2.4.
Moreover, since the inputs are modeled by continuous-time stochastic processes, equation (42) is a stochastic differential equation of the type

τ du(t) = −u(t)dt + dVt   (43)

with dV(t) = dVext(t) + dVnet(t).
We shall now make these stochastic processes explicit, in order to obtain the Fokker-Planck equation of the network dynamics in a mean-field approximation.
5.2 Modeling the external input
The network is a recurrent inhibitory network and we study its reaction to random excitatory synaptic inputs. We suppose that each neuron in the network receives excitations from Cext external neurons connected via constant excitatory synapses Jext. The corresponding external current is a Poisson process with emission frequency νext.
22
Will be inserted by the editor
Let us examine the effect of a superposition of a large number C of independent identically distributed low-rate ν Poisson processes. Put

I(t) = J Σ_{i=1}^C Ni(t)

where the Ni(t) are i.i.d. Poisson processes with firing rate ν. Then I(t) is a stochastic process with independent stationary increments such that E[I(t)] = µt = JCνt and Var[I(t)] = σ²t = J²Cνt. Thus µ = JCν and σ = J√(Cν).
We are interested in studying such processes when they reach the firing threshold θ, which is far larger than the elementary increment J. In typical neural applications, J = 0.1 mV and θ = 20 mV. At this level, operating a classical time-space rescaling, I(t) appears like a Gaussian process with independent increments and the same moments. We have

dI(t) ∼ µdt + σdBt

where (Bt) is the standard Brownian motion. Applying this to the external synaptic input, we get the following modeling in the limit of large size and low rate:

dVext(t) = µext dt + σext dB(t)

with µext = Jext Cext νext and σext = Jext √(Cext νext).
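The diffusion approximation above can be checked by direct simulation (the values of J, C, ν, dt below are illustrative assumptions): over a window dt, the increment of I(t) is J times a Poisson count with parameter Cνdt, whose mean and variance should match µdt and σ²dt.

```python
import numpy as np

rng = np.random.default_rng(0)
J, C, nu, dt = 0.1, 1000, 5.0, 0.01      # mV, count, Hz, s (illustrative)
n_windows = 200_000

# the total spike count of C independent rate-nu Poisson processes over a
# window dt is Poisson with parameter C*nu*dt; the increment of I is J times it
counts = rng.poisson(C * nu * dt, size=n_windows)
increments = J * counts

mu_theory = J * C * nu * dt              # = mu * dt
var_theory = J ** 2 * C * nu * dt        # = sigma^2 * dt
mean_emp, var_emp = increments.mean(), increments.var()
```

With these values the empirical mean and variance of the increments agree with µdt = 5 mV and σ²dt = 0.5 mV² to well within the sampling error.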
5.3 Mean-field approximation of the internal input
In the framework of continuous-time modeling, the synaptic input definition of vnet for IF neuron i, which was, according to equation (3),

vi(J, u)(t + 1) = Σ_{j=1}^N Jij xj(t),

has to be replaced by

vi(J, u)(t) = τ Σ_{j=1}^N Jij Σ_k δ[t − Tj^k(u) − D]   (44)

where
– δ is the Dirac distribution,
– the Tj^k(u) are the successive firing times of neuron j during the network trajectory u,
– D is the synaptic transmission delay.
In the present study, the network is supposed to be sparsely connected. All the connexion weights are equal to −J as soon as they are non-zero. Each neuron is connected to C neurons randomly drawn from the network, where C is a fixed integer with C << N and N is the total number of neurons. Another model, considered further on, takes the connection weights to be independent random variables equal to −J with probability C/N and to 0 otherwise. We shall focus here on the first model.
In the previous sections, the mean-field approximation in the finite time set framework consisted in finding a fixed point of the mean-field propagation operator L. Namely:
– approximating the random vectors vi by Gaussian vectors with a probability distribution gµ, where µ is a probability law on the individual neuron potential trajectory space (a finite-dimensional vector space);
– finding µ as the probability law of the neuron dynamical equation with this approximation for the synaptic input.
The mean-field approximation in [9] follows the same logic.
The first step of the mean-field approximation consists, for a given rate function ν, in defining the non-stationary Gaussian process
dVnet(t) = µnet(t)dt + σnet(t)dB(t)   (45)
where
– the drift µnet is given by
µnet(t) = −CJν(t − D)τ   (46)
– and the diffusion coefficient σnet is given by
σnet(t)² = J²Cν(t − D)τ   (47)
The second step consists in considering the following diffusion with ”tunneling effect”:
u(t) < θ ⇒ τ du(t) = −u(t)dt + dVnet(t) + dVext(t)   (48)
u(t − 0) = θ ⇒ u(t + 0) = ϑ
The terminology “tunneling effect”, referring to quantum mechanics, is somewhat curious
here. It has its roots in the following remark. Whenever the membrane potential reaches θ it is
reset to ϑ. If we interpret eq. (48) in the context of a random particle motion, the “particle”
is “instantaneously transported” from the point θ to ϑ. This analogy is not only formal. The
“tunneling effect” induces a specific behavior for the probability current at the boundary u = θ.
In the present model, this current is directly related to the firing rate (see next section).
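The diffusion with ”tunneling effect” can be simulated directly by an Euler-Maruyama scheme: integrate (48) between spikes and reset the potential from θ to ϑ whenever the threshold is reached. The sketch below lumps the internal and external inputs into a single drift µ and diffusion coefficient σ; all parameter values are hypothetical and dimensionless:

```python
import math
import random

random.seed(7)
# Hypothetical dimensionless parameters: time constant, threshold, reset
tau, theta, vtheta = 1.0, 1.0, 0.0
mu, sigma = 0.8, 0.5       # lumped drift and diffusion of dV_net + dV_ext
dt, T = 1e-3, 200.0

u, t, spikes = vtheta, 0.0, 0
while t < T:
    # Euler-Maruyama step for tau*du = -(u - mu)*dt + sigma*dB
    u += (-(u - mu) / tau) * dt \
         + (sigma / tau) * math.sqrt(dt) * random.gauss(0.0, 1.0)
    if u >= theta:   # "tunneling": instantaneous transport from theta to vtheta
        u = vtheta
        spikes += 1
    t += dt

rate = spikes / T          # empirical firing rate (re-injected current)
```

Counting the resets per unit time gives the empirical firing rate, the quantity the boundary current j(θ, t) measures in the Fokker-Planck description of the next section.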
5.4 Fokker-Planck equation.
5.4.1 Closed form equation
Denote by p(u, t) the probability density of the solution u(t) of (48). Define
µ(t) = µnet(t) + µext(t)
σ(t)² = σnet(t)² + σext(t)²
Then p(u, t) is a solution of the Fokker-Planck equation for diffusion processes, for u < θ and u ≠ ϑ:
∂p/∂t (u, t) = σ(t)²/2 ∂²p/∂u² (u, t) + ∂/∂u [(u − µ(t))p(u, t)]   (49)
The tunneling effect from θ to ϑ is taken into account in the following boundary conditions:
p(θ, t) = 0   (50)
∂p/∂u (ϑ + 0, t) = ∂p/∂u (ϑ − 0, t) + ∂p/∂u (θ − 0, t)
This corresponds to a re-injection of the outgoing probability current j(θ, t) at u = ϑ, where j = ∂p/∂u. Thus j(ϑ + 0, t) = j(ϑ − 0, t) + j(θ, t). The outgoing (re-injected) current is, by definition, the average firing rate
ν(t) = ∂p/∂u (θ − 0, t)   (51)
5.4.2 Stationary solution
It is easy to find the stationary solution of the previous equation, i.e. such that ∂p/∂t (u, t) = 0. Suppose a given constant firing rate ν0, then set
µ0 = −CJν0 τ + µext
σ0² = CJ²ν0 τ + σext²   (52)
and plug it into the second-order differential equation
σ0²/2 d²p/du² + d/du [(u − µ0)p(u)] = 0   (53)
with the following boundary conditions:
p(θ) = 0
dp/du (ϑ + 0) = dp/du (ϑ − 0) + dp/du (θ − 0)   (54)
One obtains easily the following stationary distribution:
For u < ϑ: p(u) = (2ν0 τ/σ0) e^{−yu²} ∫_{yϑ}^{yθ} e^{y²} dy
For u ≥ ϑ: p(u) = (2ν0 τ/σ0) e^{−yu²} ∫_{yu}^{yθ} e^{y²} dy
where yu = (u − µ0)/σ0, yϑ = (ϑ − µ0)/σ0 and yθ = (θ − µ0)/σ0.
Then the normalization condition ∫_{−∞}^{+∞} p(u)du = 1 allows one to infer
1/(ν0 τ) = ∫_0^{+∞} e^{−y²} (e^{2yθ y} − e^{2yϑ y})/y dy   (55)
The relations (52,55) allow one to compute ν0 numerically. Equation (55) can be approximately solved in the situation where the fluctuations σ0 are weak (i.e. yθ >> 1, which means that the spiking events are rare). In this case:
ν0 τ ≈ (yθ/√π) e^{−yθ²}   (56)
This asymptotic expression can be compared to the escape rate of a particle in a parabolic potential well with minimum µ0, submitted to a Brownian excitation:
τ dVt = −(Vt − µ0)dt + σ0 dBt
The rate at which the level V = θ is reached is then given by the Arrhenius time
ν0 τ ∼ e^{−yθ²}
Numerical values of ν0 inferred from equations (55) and (56) are compared in [9] to the results of numerical simulations of the network; there is a good agreement between theoretical predictions and simulated firing rates.
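The self-consistency expressed by (52) and (55) is easy to solve numerically. The map ν → ντ I(ν) − 1, where I is the integral in (55), is increasing in ν (more activity means more inhibition, hence a higher effective threshold), so a bisection brackets the rate. The sketch below uses hypothetical dimensionless parameters (they are assumptions, not the values of [9]) and a trapezoidal quadrature handling the removable singularity of the integrand at y = 0:

```python
import math

# Hypothetical dimensionless parameters (illustrative, not from [9]):
TAU, THETA, VTHETA = 1.0, 1.0, 0.0     # time constant, threshold, reset
C, J = 1000, 0.001                     # connectivity, inhibitory weight
MU_EXT, SIG_EXT = 0.8, 0.2             # external drift and dispersion

def tail_integral(y_theta, y_vtheta, upper=15.0, n=20000):
    """Trapezoidal value of the integral in (55); the integrand has a
    removable singularity at y = 0 with limit 2*(y_theta - y_vtheta)."""
    h = upper / n
    total = 0.5 * 2.0 * (y_theta - y_vtheta)   # half-weight endpoint at y = 0
    for k in range(1, n + 1):
        y = k * h
        w = 0.5 if k == n else 1.0
        total += w * math.exp(-y * y) * (
            math.exp(2.0 * y_theta * y) - math.exp(2.0 * y_vtheta * y)) / y
    return total * h

def residual(nu):
    """nu*tau*I(nu) - 1: the self-consistent rate nu0 is its root."""
    mu0 = -C * J * nu * TAU + MU_EXT                        # eq. (52)
    sig0 = math.sqrt(C * J * J * nu * TAU + SIG_EXT ** 2)   # eq. (52)
    y_t = (THETA - mu0) / sig0
    y_v = (VTHETA - mu0) / sig0
    return nu * TAU * tail_integral(y_t, y_v) - 1.0

lo, hi = 1e-6, 1.0          # bracket with residual(lo) < 0 < residual(hi)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if residual(mid) < 0.0:
        lo = mid
    else:
        hi = mid
nu0 = 0.5 * (lo + hi)       # self-consistent stationary firing rate
```

The returned ν0 satisfies ν0 τ I(ν0) = 1 up to quadrature error, which is the numerical counterpart of the procedure described in the text.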
Fig. 4. Sketch of the bifurcation diagram of the model (10,44) when varying the parameters µext , σext
controlling the Poisson process of external excitation. SS means Stationary State, while OS means
Oscillatory State. The solid line represents the instability line for D = 0.1τ . (Drawn by hand from [9])
5.4.3 Stability analysis.
The stability analysis for the stationary solution uses normal form techniques similar to those described in paper I, but in an infinite-dimensional space. The Fokker-Planck equation is rescaled and expanded around the steady-state solution. This intricate computation is fully detailed in [9]. We simply focus on the results.
The authors find that there is a bifurcation of Hopf type for the stationary solution. Thus,
for a certain parameter range, the system exhibits synchronized oscillations of the neurons.
A sketch of the bifurcation map is given in figure 4 when varying the parameters µext , σext
controlling the external excitation.
One can see from this bifurcation diagram that the bifurcation occurs when the drift of the external input increases. Conversely, an increase of the dispersion of the external input stabilizes the steady state. If the external input consists in the superposition of i.i.d. Poisson processes as detailed above, then an increase of their common frequency νext induces the occurrence of an oscillatory regime. There is again a good agreement between the predictions of mean-field theory and the results of simulations.
5.5 Conclusion
Thus, the conclusion is that in this model of a neural network of sparsely connected inhibitory integrate-and-fire neurons, submitted to an external excitatory Poisson process and emitting spikes irregularly at a low rate, there is, in the thermodynamic limit, a sharp transition between a regime where the global activity is constant and a regime where the neurons are weakly synchronized and the global activity oscillates. The activity becomes oscillatory when the inhibitory feedback is strong enough. Note that the period of the global oscillations depends on the synaptic transmission delay, which cannot be neglected.
Finally, let us mention that the authors performed a finite-size analysis of the model and found that global oscillations of finite coherence time generically exist above and below the critical inhibition threshold.
6 Appendix about probability theory
This paper makes intensive use of classical notations and concepts from probability theory. The proofs are omitted, but some of the results follow from advanced results of this theory, and it is not possible to recall here all the necessary prerequisites; there are excellent books about probability theory for physicists and engineers, such as [30]. We just want to recall here some notations and some results from convergence theory. We detail the proof of the ”finite-time Girsanov theorem” since it is a crucial result for the paper.
6.1 Elementary Notations
The classical, and shortest, point of view for considering random phenomena, dating back to the 19th century, is to consider a random variable x in a space E via its probability law on that space. All the moments can be computed by integration over the probability law of the random variable. For instance, if µ is the probability law of the real random variable x, one has
E(x) = ∫_E x dµ(x)
E(x²) = ∫_E x² dµ(x)
and more generally, for any bounded continuous function φ of x,
E[φ(x)] = ∫_E φ(x) dµ(x)
where E is the mathematical expectation operator. The expectation of a random variable is a vector in a topological vector space F. The mathematical expectation operator is linear.
Moreover, for a random vector x ∈ IR^d the expectation E(x) ∈ IR^d is defined by
∀i ∈ {1, ..., d}, {E(x)}i = E(xi) = ∫_{IR^d} xi dµ(x)
and the symmetric (d, d)−covariance matrix is given by
Cov(x)ij = E(xi xj) − E(xi)E(xj)
where µ is the probability law of x.
Actually, this point of view cannot be used when we have to consider an infinite set of random variables or when we want to operate a change of variables. Hence, we must adopt a more general point of view, which was initiated by Kolmogorov in 1933. This approach relies basically upon the consideration of a very large state space Ω which describes all the possible outcomes or states of the world. Then a rich family A of subsets of Ω is defined, such that all the random events of interest belong to A. Eventually, a probability measure is defined on A, which associates to any random event A ∈ A its probability P(A). The triple (Ω, A, P) is called a probability space.
Later on, we shall have to work on infinite-dimensional spaces, so let us fix a general framework.
Definition 6 A Polish space F is a complete (every Cauchy sequence converges) and separable (there is a countable dense subset) metric space. The σ−algebra B of Borel subsets of a Polish space F is the smallest σ−algebra that contains the open sets. Given a probability measure µ on the Borel subsets of F, it is possible to integrate any bounded continuous function φ on F, and the integral is noted ∫_F φ(ξ)dµ(ξ). The integral may be extended to a wider class of functions. These functions are called integrable with respect to µ.
In this new framework, let us define random variables in F.
Definition 7 Let (Ω, A, P) be a probability space and (F, B) a Polish space endowed with its Borel σ−algebra. A random variable x ∈ F is a state function from Ω into F such that for any open set B in F, the subset of Ω defined by
(x ∈ B) = {ω ∈ Ω such that x(ω) ∈ B}
belongs to A, so its probability P(x ∈ B) is well defined.
The probability law of a random variable x ∈ F is the probability law on F which associates to any Borel subset B ⊂ F the probability P(x ∈ B).
The probability law of x is noted x.P or Px. This definition also holds for more general measures than probability laws, such as volume measures. More generally, we have
Definition 8 Let (Ω, A, P) be a measure space and x a mapping from Ω to F such that
∀B ∈ B, (x ∈ B) = {ω ∈ Ω such that x(ω) ∈ B} ∈ A
Then we define a measure on (F, B), noted x.P or Px, by
B ∈ B → x.P(B) = Px(B) = P(x ∈ B)
This measure is called the image of the measure P by the mapping x.
This definition is completed by the following transfer theorem, which shows that the mathematical expectation can be computed on the state space Ω or on the value space F.
Theorem 14 For any function φ defined on F and integrable for the probability law Px, we have
E[φ(x)] = ∫_Ω φ[x(ω)]dP(ω) = ∫_F φ(ξ)dPx(ξ)
The transfer theorem is very useful in theory and in practice. It allows one to define the mathematical expectation of a random variable without any ambiguity.
Kolmogorov’s framework allows one to define independent random variables through the following equivalent properties.
Definition 9 For i ∈ {1, ..., n} let xi ∈ Fi be random variables; they are said to be independent if the law Px of the random variable x = (x1, ..., xn) ∈ F1 × ... × Fn is the product of the Pxi, which is expressed in the following equivalent properties:
P(x ∈ B1 × ... × Bn) = Px1(B1)...Pxn(Bn)
E[φ1(x1)...φn(xn)] = E[φ1(x1)]...E[φn(xn)]
6.2 Density and Gaussian random vectors
Definition 10 Let (Ω, A, m) be a measure space and h an integrable positive function on Ω such that ∫_Ω h(ω)dm(ω) = 1. Then we can define a probability measure Q on (Ω, A) by
Q(A) = ∫_Ω 1A(ω)h(ω)dm(ω)
Q is said to be absolutely continuous with respect to m, h is called the density of Q with respect to m, and we can compute the integral for Q by using the formula
∫_Ω φ(ω)dQ(ω) = ∫_Ω φ(ω)h(ω)dm(ω)
We write dQ/dm (ω) = h(ω) or dQ(ω) = h(ω)dm(ω).
Of course, the density functions are commonly used in elementary probability. An important
class of probability measures is the Gaussian probability family.
Definition 11 Let a ∈ IR and σ² ∈ IR+. The Gaussian probability measure γ = N(a, σ²) is defined by its density with respect to the Lebesgue measure λ on IR, which is
dγ/dλ (ξ) = (1/√(2πσ²)) exp(−(ξ − a)²/(2σ²))
Similarly, this definition can be extended to d-dimensional vector spaces and even to infinite-dimensional Hilbert spaces. Here, we just need the following.
Definition 12 Let θ ∈ IR^d and K be a d × d symmetric positive matrix; then there exists one and only one probability measure on IR^d, called the Gaussian probability γ = N(θ, K), such that if γ is the probability law of the random vector x ∈ IR^d then ∀u ∈ IR^d, the law of the random variable u^t x is N(u^t θ, u^t Ku).
Proposition 15 Let x be a random vector with regular Gaussian probability γ = N(θ, K); then we have
E(x) = ∫ ξdγ(ξ) = θ
Cov(x) = E(xx^t) − E(x)E(x)^t = K
So a Gaussian law is completely determined by its expectation and its covariance matrix.
Definition 13 With the previous notations, if K is invertible, γ is said to be regular, and the density of γ with respect to the Lebesgue measure λ is
dγ/dλ (ξ) = (1/√((2π)^d Det(K))) exp(−(ξ − θ)^t K^{−1} (ξ − θ)/2)   (57)
A common property of the Gaussian family is its stability under linear transforms and translations. More precisely, we have
Proposition 16 Let x be a Gaussian random vector which takes its values in the vector space E and Λ a linear mapping of E into F. Then y = Λx is a Gaussian random vector in F and
E(y) = ΛE(x)
Cov(y) = ΛCov(x)Λ^t   (58)
Proposition 17 Let x be a Gaussian random vector which takes its values in the vector space E and a ∈ E. Then y = x + a is a Gaussian random vector in E and
E(y) = E(x) + a
Cov(y) = Cov(x)   (59)
Corollary 18 Let x be a random vector with regular Gaussian probability γ = N(θ, K) and let a ∈ IR^d; then the law γa of x + a is the regular Gaussian law N(θ + a, K) and its density with respect to γ can be written as follows:
dγa/dγ (ξ) = exp(a^t K^{−1}(ξ − θ) − ½ a^t K^{−1} a)   (60)
PROOF: The formula is checked by an easy and straightforward computation from the expression of the Gaussian density.
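Corollary 18 can also be checked numerically in dimension one (K = σ² a scalar), comparing the ratio of the densities of N(θ + a, K) and N(θ, K) with the closed form (60); the parameter values below are arbitrary:

```python
import math

def npdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) \
        / math.sqrt(2.0 * math.pi * var)

theta, K, a = 0.3, 2.0, 1.1        # arbitrary 1-d parameters
for xi in (-2.0, 0.0, 0.7, 3.5):
    ratio = npdf(xi, theta + a, K) / npdf(xi, theta, K)           # dγa/dγ at ξ
    formula = math.exp(a * (xi - theta) / K - a * a / (2.0 * K))  # eq. (60)
    assert math.isclose(ratio, formula, rel_tol=1e-9)
```

The normalization constants cancel in the ratio, which is why (60) contains no determinant factor.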
It is interesting to note that it is possible to define Gaussian probabilities on an infinite-dimensional vector space, though it is not possible to define a Lebesgue measure there. However, in this paper we just use finite-dimensional Gaussian probabilities. An interesting property of the Gaussian measure, which is crucial in this paper, is the following finite-dimensional version of the Girsanov theorem [23].
12 u^t is the transpose of the column vector u, so u^t x is the scalar product of the vectors u and x.
Theorem 19 Let m0 be a probability measure on IR^d and let N(α, K) be a regular Gaussian probability on IR^d. Let T be a positive integer and ET = (IR^d)^{0,...,T} the space of finite-time trajectories in IR^d. Let w be a Gaussian random vector in ET with law m0 ⊗ N(α, K)^T. Let φ and ψ be two measurable mappings of IR^d into IR^d. Then we define the random vectors x and y in ET by
x(0) = w(0), x(t + 1) = φ[x(t)] + w(t + 1)
y(0) = w(0), y(t + 1) = ψ[y(t)] + w(t + 1)
Let P and Q be the respective probability laws of x and y on ET; then Q is absolutely continuous with respect to P and we have
dQ/dP (η) = exp Σ_{t=0}^{T−1} [ −½ {ψ[η(t)] − φ[η(t)]}^t K^{−1} {ψ[η(t)] − φ[η(t)]} + {ψ[η(t)] − φ[η(t)]}^t K^{−1} {η(t + 1) − α − φ[η(t)]} ]   (61)
PROOF: The proof is a recursion on T. It is easy to check (61) for T = 1. To shorten the expressions, let us write
y_0^T = (y(0), ..., y(T)), η_0^T = (η(0), ..., η(T))
and
ΘT(η_0^T) = Σ_{t=0}^{T−1} [ −½ {ψ[η(t)] − φ[η(t)]}^t K^{−1} {ψ[η(t)] − φ[η(t)]} + {ψ[η(t)] − φ[η(t)]}^t K^{−1} {η(t + 1) − α − φ[η(t)]} ]
Suppose (61) is true up to T and let us compute the density of y up to T + 1. Let h be a bounded continuous test function defined on ET+1. By conditioning with respect to y_0^T, we have
E[h(y(T + 1), y_0^T)] = ∫ E[h(w(T + 1) + ψ[η(T)], η_0^T)] dQ(η_0^T)
where the inner expectation is taken with respect to w(T + 1), which is independent from y_0^T. Let us make the Gaussian law N(α, K) explicit and use the recursion hypothesis:
E[h(y(T + 1), y_0^T)] = (1/CK) ∫∫ h(ω + ψ[η(T)], η_0^T) exp[−½ (ω − α)^t K^{−1} (ω − α)] exp ΘT(η_0^T) dω dP(η_0^T)
where CK is the classical normalization constant of the Gaussian law. Then let us perform the translation ̟ = ω + ψ[η(T)]; it gives
E[h(y(T + 1), y_0^T)] = (1/CK) ∫∫ h(̟, η_0^T) exp[−½ (̟ − α − ψ[η(T)])^t K^{−1} (̟ − α − ψ[η(T)])] exp ΘT(η_0^T) d̟ dP(η_0^T)
To simplify notations, let us write ζT = ψ[η(T)] − φ[η(T)], so that ̟ − α − ψ[η(T)] = ̟ − α − φ[η(T)] − ζT. Let us develop the quadratic form in the exponential:
−½ (̟ − α − φ[η(T)] − ζT)^t K^{−1} (̟ − α − φ[η(T)] − ζT)
= −½ (̟ − α − φ[η(T)])^t K^{−1} (̟ − α − φ[η(T)]) − ½ ζT^t K^{−1} ζT + ζT^t K^{−1} (̟ − α − φ[η(T)])
So we have
exp[−½ (̟ − α − φ[η(T)] − ζT)^t K^{−1} (̟ − α − φ[η(T)] − ζT)]
= exp[−½ (̟ − α − φ[η(T)])^t K^{−1} (̟ − α − φ[η(T)])] exp[−½ ζT^t K^{−1} ζT + ζT^t K^{−1} (̟ − α − φ[η(T)])]
We obtain a product of two exponentials. The first one combines with (1/CK) d̟ dP(η_0^T) to give dP(η_0^{T+1}); the second one combines with exp ΘT(η_0^T) to give exp ΘT+1(η_0^{T+1}). So we eventually get
E[h(y_0^{T+1})] = ∫ h(η_0^{T+1}) exp ΘT+1(η_0^{T+1}) dP(η_0^{T+1})
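For T = 1 the Girsanov density (61) reduces to a single term, and it can be verified against the ratio of the one-step transition densities of y and x. The sketch below does this in dimension one with arbitrary drift maps φ and ψ (chosen purely for illustration):

```python
import math

def npdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) \
        / math.sqrt(2.0 * math.pi * var)

alpha, K = 0.0, 1.0                      # noise law N(alpha, K), scalar case
phi = lambda x: 0.5 * x                  # arbitrary drift of the x chain
psi = lambda x: 0.8 * x - 0.3            # arbitrary drift of the y chain

def girsanov_weight(eta0, eta1):
    """The T = 1 term of eq. (61)."""
    z = psi(eta0) - phi(eta0)
    return math.exp(-0.5 * z * z / K + z * (eta1 - alpha - phi(eta0)) / K)

for eta0, eta1 in [(0.2, 1.0), (-1.0, 0.5), (2.0, -0.7)]:
    # transition densities of y and x from eta0 to eta1
    direct = npdf(eta1, alpha + psi(eta0), K) / npdf(eta1, alpha + phi(eta0), K)
    assert math.isclose(girsanov_weight(eta0, eta1), direct, rel_tol=1e-9)
```

The agreement is exact because, at each step, the two chains share the same Gaussian noise and differ only through their drifts, which is what (61) quantifies.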
6.3 Convergence of random variables
The definition of probability is based upon the law of large numbers (LLN). This result may be roughly formulated as follows:
When (xn) is an independent sequence of random variables with the same probability law p, with first two moments c = ∫ x dp(x) and k = ∫ x² dp(x), then the sequence of empirical averages
x̄n = (1/n) Σ_{k=1}^n xk
converges towards c.
This statement is not precise: the convergence may have several meanings. Some useful convergence concepts in probability theory are the convergence in law, the convergence in probability and the almost sure convergence. Let us recall their definitions.
Definition 14 Let (xn) and x be random variables on a probability space (Ω, A, P). The sequence of random variables (xn) is said to
– converge in law to x if and only if for any continuous bounded function h, E[h(xn)] → E[h(x)] 14
– converge in probability to x if and only if ∀ǫ > 0, P(| xn − x |≥ ǫ) → 0
– converge almost surely to x if and only if ∃N ⊂ Ω with P(N) = 0 such that ∀ω ∈/ N, xn(ω) → x(ω)
These definitions are stronger and stronger: almost sure convergence implies convergence in probability, which in turn implies convergence in law. Most derivations of mean-field equations in random neural networks use the convergence of Fourier transforms through a Laplace limit integral, ensuring convergence in law.
However, from the point of view of practitioners, almost sure convergence is more pleasant because a single realization of the sequence (xn) allows one to check the convergence. To check the weaker convergence statements, many realizations of the sequence are necessary.
Let us return to the law of large numbers. The convergence in probability of the sequence (x̄n) is especially easy to show, since E(x̄n) = c and Var(x̄n) = (k − c²)/n. Then one has just to write the Bienaymé-Tchebychev inequality
P(| x̄n − c |≥ ǫ) ≤ (k − c²)/(nǫ²)
But this convergence is not strong enough to yield the almost sure convergence (the so-called strong law of large numbers).
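A quick numerical illustration with uniform variables on [0, 1] (so c = 1/2 and k = 1/3): the empirical average concentrates around c, and the Bienaymé-Tchebychev bound, although crude, already decays like 1/n.

```python
import random

random.seed(42)
# i.i.d. uniform variables on [0, 1]: c = 1/2, k = E(x^2) = 1/3
n = 100_000
xs = [random.random() for _ in range(n)]
xbar = sum(xs) / n

c, k = 0.5, 1.0 / 3.0
eps = 0.01
# Bienaymé-Tchebychev: P(|xbar - c| >= eps) <= (k - c^2) / (n * eps^2)
bound = (k - c * c) / (n * eps * eps)
```

Here the bound is already below 1%, while the large deviation estimates of the next section show that the true probability is in fact exponentially small in n.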
13 Such a sequence is called an i.i.d. sequence.
14 An equivalent condition is the convergence of their characteristic functions (or Fourier transforms): ∀t ∈ IR, E(exp(itxn)) → E(exp(itx))
6.4 Large deviation principle
6.4.1 Cramer’s theorem
One way to obtain the strong law is to show that the convergence in probability occurs much faster than it appears from the Bienaymé-Tchebychev inequality. Actually, the following theorem was obtained by Cramer in the late 1930s:
Theorem 20 Let (xn) be a sequence of i.i.d. random variables with probability law µ such that ∫ ξdµ(ξ) = θ. Then we have
∀a > θ, (1/n) log P(x̄n > a) → −I(a)
where
I(a) = max_{p∈IR} [pa − log E[exp(px)]]
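For a concrete instance of Cramer's theorem, take fair coin flips xk ∈ {0, 1}: then log E[exp(px)] = log((1 + e^p)/2), and the rate function is I(a) = a log(2a) + (1 − a) log(2(1 − a)) for 0 < a < 1. The exact binomial tail already exhibits the predicted exponential decay rate at moderate n:

```python
import math

def rate(a):
    """Legendre transform of log((1 + e^p)/2) for a fair coin, 0 < a < 1."""
    return a * math.log(2.0 * a) + (1.0 - a) * math.log(2.0 * (1.0 - a))

a, n = 0.75, 2000
# exact tail P(xbar_n >= a) for n fair coin flips, via big-integer binomials
tail = sum(math.comb(n, k) for k in range(int(a * n), n + 1))
empirical_rate = -(math.log(tail) - n * math.log(2.0)) / n   # -(1/n) log P
```

empirical_rate agrees with I(0.75) ≈ 0.1308 up to an O(log n / n) correction, as the theorem predicts.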
This theorem can be extended to more general settings; this is the subject of large deviations theory. Let us first consider the case of finite-dimensional random vectors [29]. The following proposition is easy to prove:
Proposition 21 Let µ be a probability law on IR^d such that for all p ∈ IR^d, Λ(p) = log E[exp(p^t x)] exists. The function Λ is called the log-generating function of µ. We define its Legendre transform Λ* on IR^d as follows:
Λ*(a) = sup_{p∈IR^d} [(p^t a) − Λ(p)]
Then
a) Λ* is a convex function (with ∞ as a possible value)
b) ∀a ∈ IR^d, Λ*(a) ≥ 0
c) a = ∫ ξdµ(ξ) ⇔ Λ*(a) = 0
PROOF: a) is straightforward, since a supremum of affine functions is convex; b) comes from Λ(0) = 0; c) comes from Jensen’s inequality.
Then we can state Cramer’s theorem for an i.i.d. sequence of finite-dimensional random vectors:
Theorem 22 (Cramer’s theorem) Let (xn) be a sequence of i.i.d. random vectors with a probability distribution µ satisfying the assumptions and notations of the previous proposition. Then for any Borel subset B of IR^d, we have
− inf_{a∈B°} Λ*(a) ≤ lim inf_{n→∞} (1/n) log[IP(x̄n ∈ B°)] ≤ lim sup_{n→∞} (1/n) log[IP(x̄n ∈ B̄)] ≤ − inf_{a∈B̄} Λ*(a)   (62)
where B° is the interior of B (the greatest open subset of B) and B̄ is the closure of B (the smallest closed set containing B).
A consequence of Cramer’s theorem is that for any closed subset F of IR^d such that inf_{a∈F} Λ*(a) > 0, IP(x̄n ∈ F) goes to 0 exponentially fast when n → ∞, and the rate of convergence depends only on the value of Λ* at the point of F where Λ* reaches its minimum. This point is called the dominating point. For regular probability distributions, where Λ* is strictly convex, defined and continuous around θ = E(x), the exponential decay of finite deviations from the expectation (large deviations) and the strong law of large numbers are easy consequences.
6.4.2 Large deviation principle in an abstract setting
The convergence with an exponential rate is a general situation, which is characterized in the
following general definitions:
Definition 15 Let E be a Polish space and I be a lower semi-continuous function of E into
[0, ∞]. I is called a rate function. If I possesses the property of compact level set, i.e.
∀ǫ > 0, {x ∈ E such that I(x) ≤ ǫ} is compact
then I is called a good rate function.
Definition 16 Given a rate function I on a Polish space F and a sequence of probability measures Qn on F:
– (Qn) satisfies the large deviation minoration on open sets if
∀O open set in F, − inf_{ξ∈O} I(ξ) ≤ lim inf_{n→∞} (1/n) log[Qn(O)]   (63)
– (Qn) satisfies the large deviation majoration on compact sets if
∀K compact set in F, lim sup_{n→∞} (1/n) log[Qn(K)] ≤ − inf_{ξ∈K} I(ξ)   (64)
– (Qn) satisfies the large deviation majoration on closed sets if
∀C closed set in F, lim sup_{n→∞} (1/n) log[Qn(C)] ≤ − inf_{ξ∈C} I(ξ)   (65)
– If (Qn) checks the large deviation minoration for open sets and the large deviation majoration for compact sets, we say that (Qn) satisfies the large deviation principle (LDP) with rate function I.
– If (Qn) checks the large deviation minoration for open sets and the large deviation majoration for closed sets, we say that (Qn) satisfies the full large deviation principle with rate function I.
– (Qn) is said to be tight if for all ǫ > 0, there exists a compact subset K of F such that Qn(K^c) < ǫ for all n. If (Qn) is tight and checks a LDP, it satisfies the full LDP with the same rate function.
The same definitions stand for a sequence of random elements in F if the sequence of their probability laws checks the respective minorations and majorations.
A simpler way to state that (Qn) satisfies the full large deviation principle with rate function I is to write that, for any Borel subset B of F,
− inf_{ξ∈B°} I(ξ) ≤ lim inf_{n→∞} (1/n) log[Qn(B°)] ≤ lim sup_{n→∞} (1/n) log[Qn(B̄)] ≤ − inf_{ξ∈B̄} I(ξ)   (66)
Actually, the scope of Cramer’s theorem may be widely extended, and a full large deviation principle holds for the empirical mean of any i.i.d. random sequence in a Polish space under mild assumptions on the existence of the log-generating function [18]. The rate function of this LDP is the Legendre transform of the log-generating function.
6.4.3 Varadhan theorem and Laplace principle
An equivalent functional formulation of the full large deviation principle is due to Varadhan
and is called by Dupuis and Ellis the Laplace principle ([20]).
Definition 17 Let I be a good rate function on the Polish space F. The random sequence (xn) in F is said to satisfy the Laplace principle with rate function I if for any continuous bounded function h on F we have
lim_{n→∞} (1/n) log E{exp[−nh(xn)]} = − inf_{ξ∈F} {h(ξ) + I(ξ)}
This approach is called by the authors of [20] the weak convergence approach to the theory of large deviations. The equivalence of the two approaches (Laplace principle for good rate functions and full large deviation principle with good rate functions) is expressed in a theorem of Varadhan and its converse; their proofs are in [20]. Handling continuous bounded test functions may be more practical than dealing with open and closed sets. In particular, it is very easy to show the following transfer theorem for the LDP when the law is changed.
Theorem 23 Let Pn and Qn be two sequences of probability measures on the Polish space F, let I be a good rate function on F and let Γ be a continuous function on F such that
(a) Qn << Pn and dQn/dPn (ξ) = exp nΓ(ξ),
(b) (Pn) satisfies a full large deviation principle with rate function I,
(c) I − Γ is a good rate function.
Then (Qn) satisfies a full large deviation principle with rate function I − Γ.
PROOF OF THEOREM: Using the weak convergence approach and the strong hypotheses of the theorem, the proof is quite formal. Let h be any continuous bounded test function on F; from hypotheses (a) and (b),
(1/n) log EQn{exp[−nh(xn)]} = (1/n) log EPn{exp[−n(h(xn) − Γ(xn))]} → − inf_{ξ∈F} {h(ξ) − Γ(ξ) + I(ξ)}
which is the Laplace principle for (Qn) with the rate function I − Γ, a good rate function by hypothesis (c).
6.5 Convergence of random measures
Let us have a second look at the law of large numbers. Since this law claims the convergence of the sequence of empirical averages (1/n) Σ_{k=1}^n f(xk) for any bounded continuous test function f, we are led to consider the empirical measure of a sample.
Definition 18 Let ξ = (ξ1, ..., ξn) ∈ IR^{nd} be a sequence of n vectors of IR^d. We associate to ξ the probability measure µξ ∈ P(IR^d)
µξ = (1/n) Σ_{k=1}^n δξk
µξ is called the empirical measure associated to ξ.
In the network setting, this definition says that if A is a Borel subset of F then µN(u)(A) is the fraction of neurons whose state trajectory belongs to A. More practically, if φ is any continuous test function on E, it says that
∫_E φ(η)dµN(u)(η) = (1/N) Σ_{i=1}^N φ(ui)
With this definition, the convergence, for each continuous bounded test function f, of (1/n) Σ_{k=1}^n f(xk) towards ∫ f(ξ)dµ(ξ) is exactly the narrow convergence of the sequence µ(x1,...,xn) towards µ.
The set P(IR^d) of probability measures on IR^d is a convex subset of the functional vector space M1(IR^d) of bounded measures on IR^d. We endow P(IR^d) with the narrow topology, for which µn → µ if and only if for every continuous and bounded test function f ∈ Cb(IR^d), ∫ f dµn → ∫ f dµ. P(IR^d) is a Polish space for this topology.
So instead of considering the random variable x, which takes its values in IR^d, we consider the random variable δx, which takes its values in the Polish space P(IR^d). If (xk) is an i.i.d. sequence in IR^d with probability law µ, then (δxk) is an i.i.d. sequence in P(IR^d) and its empirical mean is just µ(x1,...,xn), the empirical measure of an i.i.d. sample of size n. That means that the extension of Cramer’s theorem to Polish spaces may be applied. This theorem is known as the Sanov theorem.
Let us first recall the definition of the relative entropy with respect to a probability measure µ on IR^d.
Definition 19 Let µ be a probability measure on IR^d. We define a convex function ν ∈ P(IR^d) → I(ν, µ) ∈ IR by:
I(ν, µ) = ∫ log (dν/dµ)(ξ) dν(ξ) if ν is absolutely continuous with respect to µ
I(ν, µ) = ∞ otherwise   (67)
This function is called the relative entropy with respect to µ.
Then we may state the Sanov theorem [21], [18].
Theorem 24 The sequence of empirical measures µn associated to i.i.d. samples of size n of a probability law µ on IR^d satisfies a full LDP with the relative entropy with respect to µ as the rate function.
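For discrete laws the relative entropy of Definition 19 is a finite sum, and the Sanov rate can be evaluated directly. The sketch below checks its two basic properties (nonnegativity, and I(ν, µ) = 0 iff ν = µ); note that for ν = Bernoulli(0.75) against µ = Bernoulli(0.5) one recovers exactly the Cramer rate Λ*(0.75) of the coin-flip example above:

```python
import math

def rel_entropy(nu, mu):
    """I(nu, mu) = Σ nu(x) log(nu(x)/mu(x)) for discrete laws given as
    {point: mass} dicts; ∞ if nu is not absolutely continuous w.r.t. mu."""
    total = 0.0
    for x, p in nu.items():
        if p > 0.0:
            q = mu.get(x, 0.0)
            if q == 0.0:
                return math.inf
            total += p * math.log(p / q)
    return total

mu = {"a": 0.5, "b": 0.5}
nu = {"a": 0.75, "b": 0.25}
```

Here rel_entropy(nu, mu) ≈ 0.1308, matching the exponential decay rate of the binomial tail, which is Sanov's theorem specialized to a two-point space.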
References
1. S. Amari. Characteristics of random nets of analog-like elements. IEEE Trans. Syst. Man. Cyb.,
SMC-2(N5), 1972.
2. S. Amari, K. Yoshida, and K. Kanatani. A mathematical foundation for statistical neurodynamics.
SIAM J. Appl. Math., 33(1):95–126, 1977.
3. D.J. Amit and N. Brunel. Dynamics of a recurrent network of spiking neurons before and following
learning. Network: Computation in Neural Systems, 8:373–404, 1997.
4. D.J. Amit and N. Brunel. Model of global spontaneous activity and local structured delay activity
during delay periods in the cerebral cortex. Cerebral Cortex, 7:237–252, 1997.
5. A.M.O. De Almeida and D.J. Thouless. Stability of the Sherrington-Kirkpatrick solution of a spin
glass model. J. Phys. A, 11(5):983–990, 1978.
6. B. Cessac. Occurrence of chaos and AT line in random neural networks. Europhys. Lett., 26(8):577–
582, 1994.
7. G. Ben Arous and A. Guionnet. Large deviations for langevin spin glass dynamics. Probability
Theory and Related Fields, 102:455–509, 1995.
8. N. Brunel. Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons.
Journal of Computational Neuroscience, 8:183–208, 2000.
9. N. Brunel and V. Hakim. Fast global oscillations in networks of integrate-and-fire neurons with
low firing rates. Neural Computation, 11:1621–1671, 1999.
10. G. Buzsaki and J.J. Chrobak. Temporal structure in spatially organized neuronal ensembles: a role
for interneuronal networks. Current Opinion in Neurobiology, 5:504, 1995.
11. B. Cessac. Increase in complexity in random neural networks. Journal de Physique I, 5:409–432,
1995.
12. B. Cessac, B. Doyon, M. Quoy, and M. Samuelides. Mean-field equations, bifurcation map and
route to chaos in discrete time neural networks. Physica D, 74:24–44, 1994.
13. A. Crisanti, H.J. Sommers, and H. Sompolinsky. Chaos in neural networks: chaotic solutions, 1990.
14. A. Crisanti and H. Sompolinsky. Dynamics of spin systems with randomly asymmetric bonds:
Ising spins and Glauber dynamics. Phys. Review A, 37(12):4865, 1987.
15. A. Crisanti and H. Sompolinsky. Dynamics of spin systems with randomly asymmetric bonds:
Langevin dynamics and spherical models. Phys. Review A, 36(10), 1987.
16. E. Daucé, O. Moynot, O. Pinaud, and M. Samuelides. Mean field theory and synchronization in
random recurrent neural networks. Neural Processing Letters, 14:115–126, 2001.
17. E. Daucé, M. Quoy, B. Cessac, B. Doyon, and M. Samuelides. Self-organization and dynamics
reduction in recurrent networks: stimulus presentation and learning. Neural Networks, 11:521–533,
1998.
18. A. Dembo and O. Zeitouni. Large Deviations techniques. Jones & Bartlett publishers, 1993.
19. B. Derrida and Y. Pomeau. Europhys. Lett., 1:45–59, 1986.
20. P. Dupuis and R.S. Ellis. A weak convergence approach to the theory of large deviations. Wiley,
1997.
21. Richard S. Ellis. Entropy, Large Deviations and Statistical Mechanics. Springer Verlag, 1985.
22. W. Gerstner and W.M. Kistler. Spiking neuron models, single neurons, populations, plasticity.
Pure and Applied Mathematics Series. Cambridge University Press, 2002.
23. I.V. Girsanov. On transforming a certain class of stochastic processes by absolutely continuous
substitution of measures. Theor. Probab. Appl., 5:285–301, 1960; translated from Teor. Veroyatn. Primen., 5:314–330, 1960.
24. C.M. Gray. Synchronous oscillations in neuronal systems: mechanisms and functions. Journal of Computational Neuroscience, 7:334–338, 1994.
25. I. Guikhman and A. Skorokhod. Introduction à la théorie des processus aléatoires (in French). Editions
Mir, Moscou, 1980.
26. A. Guillot and E.Daucé. Approches dynamiques de la cognition artificielle. Hermès, Paris, 2002.
27. D. Hansel and H. Sompolinsky. Chaos and synchrony in a model of a hypercolumn in visual cortex.
J. Comp. Neurosc., 3:7–34, 1996.
28. J. J. Hopfield. Neural networks and physical systems with emergent collective computational
abilities. Proc. Nat. Acad. Sci., 79:2554–2558, 1982.
29. O. Kallenberg. Foundations of Modern Probability. Probability and its Applications. Springer-Verlag, 2002.
30. N.G. Van Kampen. Stochastic processes in Physics and Chemistry. North Holland, 1992.
31. L. Molgedey, J. Schuchardt, and H.G. Schuster. Suppressing chaos in neural networks by noise.
Physical Review Letters, 69(26):3717–3719, 1992.
32. W. S. McCulloch and W. Pitts. A logical calculus of ideas immanent in nervous activity. Bulletin
of Mathematical Biophysics, 5:115–133, 1943.
33. O. Moynot. Etude mathématique de la dynamique des réseaux aléatoires récurrents. PhD thesis,
Université Paul Sabatier, Toulouse, 1999.
34. O. Moynot and M. Samuelides. Large deviations and mean-field theory for asymmetric random recurrent neural networks. Probability Theory and Related Fields, 123(1):41–75, 2002.
35. R. Ritz and T.J. Sejnowski. Synchronous oscillatory activity in sensory systems: new vistas on
mechanisms. Current Opinion in Neurobiology, 7(4):536–546, 1997.
36. W. Singer and C.M. Gray. Visual feature integration and the temporal correlation hypothesis.
Annual Review of Neuroscience, 5, 1995.
37. C.A. Skarda and W.J. Freeman. Chaos and the new science of the brain. In “Concepts in Neuroscience”,
volume 1-2, pages 275–285. World Scientific Publishing Company, 1990.
38. H. Sompolinsky, A. Crisanti, and H.J. Sommers. Chaos in random neural networks. Phys. Rev.
Lett., 61:259–262, 1988.
39. H. Sompolinsky and A. Zippelius. Relaxational dynamics of the Edwards-Anderson model and the
mean-field theory of spin-glasses. Phys. Rev. B, 1982.
40. H. Soula, G. Beslon, and O. Mazet. Spontaneous dynamics of asymmetric random recurrent
spiking neural networks. Neural Computation, 18(1), 2006.
41. A.S. Sznitman. Non linear reflecting diffusion process and the propagation of chaos and fluctuations
associated. Journal of Functional Analysis, 56(3):311–336, 1984.