Multi-Agent Complex Systems and Many-Body Physics
Neil F. Johnson, David M.D. Smith and Pak Ming Hui
EUROPHYSICS LETTERS, (c) EDP Sciences
Multi-agent simulations are currently being used to study the dynamical behavior within
a wide variety of Complex Systems [1]. Within these simulations, N decision-making particles
or agents (e.g. commuters, traders, computer programs [2–4]) repeatedly compete with each
other for some limited global resource (e.g. road space, best buy/sell price, processing time)
using sets of rules which may differ between agents and may change in time. The population
is therefore competitive, heterogeneous and adaptive. A simple version of such a scenario,
which has generated more than one hundred papers since 1998, is the Minority Game (MG)
of Challet and Zhang [2–9].
Here we show that the time-averaged dynamics of such multi-agent systems – in particular,
their n-point correlation functions – can be interpreted as a generalization of conventional
many-body physics [10]. We also show that these correlation functions can be evaluated
accurately if one regroups the agents into clusters of like particles (i.e. crowds) and their
anti-correlated mirror-images (i.e. anticrowds). When applied to the MG, this approach yields a
set of analytic results which are in excellent agreement with the numerical findings of Savit et
al. (see Fig. 1) [6]. Although there have been many other MG theories proposed to date [5],
none of these provides an analytic description of this Savit-curve [6] (Fig. 1) over the entire
parameter space.
Figure 2 shows a generic setup of such a multi-agent system. The Minority Game represents
a specific example of this setup, in which the identity of the minority group provides the global
outcome – however, what follows does not depend specifically on such a minority rule. There
are N agents (e.g. commuters) who repeatedly decide between two actions at each timestep t
(e.g. +1/ − 1 ≡ take route A/B) using their individual S strategies. The agents have access to
some common information µ(t). For example, Fig. 2 shows µ(t) to be the past few outcomes
[Figure 1: standard deviation $D_2^{1/2}$ versus $R_{\max}$ for N = 501, 301 and 101 (all with S = 2), together with the analytic 'overestimate' and 'underestimate' curves.]
Fig. 1 – Results for the standard deviation of fluctuations $D_2^{1/2}$ in the Minority Game. Numerical
results correspond to 20 different runs at each N and Rmax . The theoretical curves are generated
using the analytic expressions in the text. The shaded area bounded by the upper and lower curves
shows our theoretical prediction of the numerical spread for a given N . In line with the original
numerical results of Ref. [6], we have chosen successive Rmax tick-values to increase by a factor of 4.
Fig. 2 – General setup: At timestep t, each agent decides between action −1 and action +1 based on
the predictions of the S strategies that he possesses. $N_{-1}(t)$ agents choose −1, and $N_{+1}(t)$ choose +1.
A global outcome 0 or 1 is assigned depending on the rules of the game (e.g. minority group wins).
Strategies are rewarded/penalized one virtual point according to whether their predicted action would
have been a winning/losing action.
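As a concrete illustration, the per-timestep rules of Fig. 2 (with a minority-wins outcome, as in the MG) can be sketched in a few lines of Python. This is our own minimal sketch, not the authors' code: the parameter values, the random seed, and the simplification of breaking strategy-score ties by index rather than by coin toss are all illustrative assumptions.

```python
import random

random.seed(0)
N, S, m = 101, 2, 2          # agents, strategies per agent, memory length
P = 2 ** m                   # number of possible histories mu

# Each strategy maps every history mu in {0, ..., P-1} to an action -1/+1.
def random_strategy():
    return [random.choice([-1, +1]) for _ in range(P)]

agents = [[random_strategy() for _ in range(S)] for _ in range(N)]
scores = [[0] * S for _ in range(N)]     # virtual points per strategy
history = random.randrange(P)            # current mu(t), as an m-bit integer

D_series = []
for t in range(2000):
    # Each agent follows its currently best-scoring strategy
    # (ties broken by index here, a simplification).
    actions = []
    for i in range(N):
        best = max(range(S), key=lambda s: scores[i][s])
        actions.append(agents[i][best][history])
    D = sum(actions)                     # D(t) = N_{+1}(t) - N_{-1}(t)
    D_series.append(D)
    winner = -1 if D > 0 else +1         # minority action wins
    # Reward/penalize every strategy according to its would-be action.
    for i in range(N):
        for s in range(S):
            scores[i][s] += 1 if agents[i][s][history] == winner else -1
    outcome_bit = 1 if winner == +1 else 0
    history = ((history << 1) | outcome_bit) % P   # slide the m-bit window

mean = sum(D_series) / len(D_series)
var = sum((d - mean) ** 2 for d in D_series) / len(D_series)
print(var ** 0.5)    # the standard deviation D_2^{1/2} of the fluctuations
```

Running this for several values of m (and hence $R_{\max}$) reproduces the qualitative shape of the Savit curve in Fig. 1.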
Neil F. Johnson, David M.D. Smith and Pak Ming Hui: Multi-Agent Complex Systems and Many-Body Physics3
Fig. 3 – Left: strategy space for m = 2, together with some example strategies. The reduced strategy
space contains $2 \cdot 2^m = 2P$ strategies with specific correlations (i.e. fully correlated, uncorrelated or
anti-correlated [4, 5]). Right: strategy allocation tensor Ψ in the case of S = 2 strategies per agent.
Each square shows the number of agents who were assigned strategy R and then R′ .
of the system where 0 or 1 could represent route A or B being the least crowded, respectively.
Hence µ(t) is, in the setup of Fig. 2, a binary string of outcomes. Each strategy, labelled R,
comprises a particular action $a_R^{\mu(t)} = \pm 1$ for each µ(t) ∈ {µ(t)}, and the set of all possible
strategies constitutes a strategy space Θ ≡ {R}. Figure 3 (left) gives an example of such a
strategy space in the case that each agent has information µ(t) corresponding to the most
recent m = 2 outcomes. Each agent has some subset S of these strategies, hence the strategy
allocation among agents can be described in terms of a rank-S tensor Ψ [9] where each entry
gives the number of agents holding a particular combination of S strategies. Figure 3 (right)
gives an example of this strategy allocation matrix Ψ for S = 2. We assume Ψ to be constant
over the timescale for which time-averages are taken. A single Ψ ‘macrostate’ corresponds
to many possible ‘microstates’ describing the specific partitions of strategies among the N
agents [9]. To allow for large strategy spaces and large sets of global information, we can
convert the strategies {R} to their decimal equivalents and label them from R = 1 → Rmax ,
where $R_{\max} = 2 \cdot 2^m$, in order of increasing magnitude. Hence for the example strategies in
Fig. 3, we can relabel strategy [−1 − 1 − 1 − 1] as R = 1, [−1 − 1 + 1 + 1] as R = 4 and
[+1 + 1 + 1 + 1] as R = Rmax = 8. Hence R mimics a ‘coordinate’ in that if an agent is using
a particular strategy R, the agent can be thought of as sitting on a line at coordinate R. For
large strategy spaces, there will be many possible R-values – for example, m ≥ 3 means that
there are $2 \cdot 2^m \geq 16$ strategies in the reduced strategy space of Fig. 3. The line of ordered
strategies, and hence allowed R-values, will therefore appear quite dense. Given this, we will
take the liberty of writing down sums over R as integrals over R in the rest of this paper.
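For m = 2 this labelling can be made concrete. The sketch below builds a reduced strategy space from Walsh (Hadamard) rows and their negations, which is one standard way to realize the required pairwise correlations; the choice of construction and all variable names are our own illustrative assumptions.

```python
from itertools import product

m = 2
P = 2 ** m                        # number of distinct histories mu

# Full strategy space: one action -1/+1 for each of the P histories.
full_space = list(product([-1, +1], repeat=P))
assert len(full_space) == 2 ** P  # 2^(2^m) = 16 strategies for m = 2

# Reduced strategy space: the P Walsh (Hadamard) rows plus their negations
# give 2P strategies that are pairwise fully correlated, anti-correlated
# or uncorrelated.
def hadamard_row(k, P):
    # Entry for history index j is (-1)^popcount(j AND k).
    return tuple((-1) ** bin(j & k).count("1") for j in range(P))

rows = [hadamard_row(k, P) for k in range(P)]
reduced = rows + [tuple(-a for a in r) for r in rows]
assert len(set(reduced)) == 2 * P     # R_max = 2 * 2^m = 8

# Decimal label: map actions -1/+1 to bits 0/1; order by increasing value
# so each strategy gets a 'coordinate' R = 1 ... R_max on a line.
def decimal_value(strategy):
    return int("".join("1" if a == +1 else "0" for a in strategy), 2)

ordered = sorted(reduced, key=decimal_value)
labels = {s: R for R, s in enumerate(ordered, start=1)}

# Pairwise scalar products are +P, -P or 0, never anything in between.
for s1 in reduced:
    for s2 in reduced:
        assert sum(a * b for a, b in zip(s1, s2)) in (P, -P, 0)

print(labels[(-1, -1, -1, -1)])   # -> 1, as in the text
```

The final loop verifies explicitly that every pair in the reduced space is fully correlated, anti-correlated or uncorrelated, which is the property used to contract the correlation functions below.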
We denote the number of agents choosing −1 (e.g. take route A, or buy a given stock)
and +1 (e.g. take route B, or sell a given stock) as N−1 (t) and N+1 (t) respectively. Many
macroscopic quantities of interest can be written in terms of N−1 (t) and N+1 (t). For example,
the excess traffic on route A as compared to route B, or the excess number of buyers over sellers
of a given stock and hence its price-change at time t, can be written as D(t) = N+1 (t)−N−1 (t).
Hence we will focus here on D(t), which is given by:
$$D(t) \equiv \int_{R=1}^{R_{\max}} dR\; a_R^{\mu(t)}\, n_R^{S(t);\Psi}, \qquad (1)$$

where $n_R^{S(t);\Psi}$ is the number of agents using strategy R at time t, and S(t) is the current score-vector denoting the past performance of each strategy [9]. In particular, the element $S_R(t)$ represents the net number of successful predictions of strategy R up to time t. Therefore, the combination of S(t), Ψ and the game rule (e.g. use the strategy with the best or second-best performance to date) defines $n_R^{S(t);\Psi}$. The action $a_R^{\mu(t)} = \pm 1$ is determined uniquely by µ(t).
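In the discrete case, the integral in Eq. (1) is simply a sum over strategy labels. A toy snapshot evaluation (the particular $a_R^{\mu}$ table and occupation numbers below are invented purely for illustration):

```python
# Toy snapshot: R_max = 8 reduced strategies (m = 2), their actions for
# each of the P = 4 histories, and the number of agents on each strategy.
P = 4
mu = 2                                    # current history index mu(t)
strategies = [                            # a_R^mu for mu = 0..3, R = 1..8
    (-1, -1, -1, -1), (-1, -1, +1, +1), (-1, +1, -1, +1), (-1, +1, +1, -1),
    (+1, -1, -1, +1), (+1, -1, +1, -1), (+1, +1, -1, -1), (+1, +1, +1, +1),
]
n = [3, 0, 5, 1, 2, 4, 0, 6]              # n_R^{S(t);Psi}, summing to N = 21

# Eq. (1): D(t) = sum_R a_R^{mu(t)} * n_R^{S(t);Psi}
D = sum(a[mu] * nR for a, nR in zip(strategies, n))
print(D)   # -> 1
```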
In conventional many-body physics, we are typically interested in the following statisti-
cal properties of macroscopic measurable quantities such as D(t): (i) the moments of the
probability distribution function (PDF) of D(t) (e.g. mean, variance, kurtosis) and (ii) the
correlation functions that are products of D(t) at different times t1 = t, t2 = t + τ , t3 = t + τ ′
etc. (e.g. autocorrelation). Numerical multi-agent simulations typically average over time
t and then over configurations {Ψ}. A general expression generating all such functions is
therefore
$$D_P^{(\tau,\tau',\tau'',\ldots)} \equiv \big\langle\big\langle D(t_1)\,D(t_2)\cdots D(t_P)\big\rangle_t\big\rangle_\Psi \qquad (2)$$
$$= \Big\langle\Big\langle \int\!\!\cdots\!\!\int_{1}^{R_{\max}} dR_1\,dR_2\cdots dR_P\; a_{R_1}^{\mu(t_1)} a_{R_2}^{\mu(t_2)}\cdots a_{R_P}^{\mu(t_P)}\; n_{R_1}^{S(t_1);\Psi} n_{R_2}^{S(t_2);\Psi}\cdots n_{R_P}^{S(t_P);\Psi}\Big\rangle_t\Big\rangle_\Psi$$
$$\equiv \Big\langle\Big\langle \int\!\!\cdots\!\!\int_{1}^{R_{\max}} dR_1\,dR_2\cdots dR_P\; V^{(P)}(R_1,R_2,\ldots,R_P;t_1,t_2,\ldots,t_P)\; n_{R_1}^{S(t_1);\Psi} n_{R_2}^{S(t_2);\Psi}\cdots n_{R_P}^{S(t_P);\Psi}\Big\rangle_t\Big\rangle_\Psi$$

where

$$V^{(P)}(R_1,R_2,\ldots,R_P;t_1,t_2,\ldots,t_P) \equiv a_{R_1}^{\mu(t_1)} a_{R_2}^{\mu(t_2)}\cdots a_{R_P}^{\mu(t_P)} \qquad (3)$$

resembles a time-dependent, non-translationally invariant, P-body interaction potential in $\mathbf{R} \equiv (R_1,R_2,\ldots,R_P)$-space, between P charge-densities $\{n_{R_i}^{S(t_i);\Psi}\}$ of like-minded agents. Note that each charge-density $n_{R_i}^{S(t_i);\Psi}$ now possesses internal degrees of freedom determined by S(t) and Ψ. Since the $\{n_{R_i}^{S(t_i);\Psi}\}$ are determined by the game's rules, Eq. (2) can be applied to any multi-agent game, not just the MG. We focus here on moments of the PDF of D(t), where $\{t_i\} \equiv t$ and hence $\{\tau\} = 0$. Discussion of temporal correlation functions such as the autocorrelation $D_2^{(\tau)}$ will be reported elsewhere. We consider explicitly the variance $D_2$ to demonstrate the approach, noting that higher-order moments such as $D_4$ (i.e. kurtosis), which classify the non-Gaussianity of the PDF, can be treated in a similar way. The potential $V^{(P)}$ is insensitive to the configuration-average over {Ψ}, hence the mean is given by [11]:

$$D_1 = \int_{R=1}^{R_{\max}} dR\; \Big\langle\big\langle V^{(1)}(R;t)\, n_R^{S(t);\Psi}\big\rangle_\Psi\Big\rangle_t. \qquad (4)$$
If the game's output is unbiased, the averages yield $D_1 = 0$. This condition is not necessary: one can simply subtract $D_1^2$ from the right-hand side of the expression for $D_2$ below. However, we will take $D_1 = 0$ for clarity. The variance $D_2$ measures the fluctuations of D(t) about its average value:

$$D_2 = \int\!\!\int_{R,R'=1}^{R_{\max}} dR\,dR'\; \Big\langle\big\langle V^{(2)}(R,R';t)\, n_R^{S(t);\Psi} n_{R'}^{S(t);\Psi}\big\rangle_\Psi\Big\rangle_t \qquad (5)$$
where $V^{(2)}(R,R';t) \equiv a_R^{\mu(t)} a_{R'}^{\mu(t)}$ acts like a time-dependent, non-translationally invariant, two-body interaction potential in (R, R′)-space.
These effective charge-densities and potential fluctuate in time. It is reasonable to assume that the charge densities fluctuate around some mean value, hence $n_R^{S(t);\Psi} n_{R'}^{S(t);\Psi} = \bar n_R \bar n_{R'} + \varepsilon_{RR'}(t)$, i.e. a mean $\bar n_R \bar n_{R'}$ plus a fluctuating term $\varepsilon_{RR'}(t)$. This is a good approximation if we take R to be a popularity-ranking (i.e. the Rth most popular strategy) or a strategy-performance ranking (i.e. the Rth best-performing strategy), since in these cases $n_R^{S(t);\Psi}$ will be reasonably constant [4]. For example, taking R as a popularity-ranking implies $n_{R=1}^{S(t);\Psi} \geq n_{R=2}^{S(t);\Psi} \geq n_{R=3}^{S(t);\Psi} \geq \ldots$, thereby constraining the magnitude of the fluctuations in the charge-density $n_R^{S(t);\Psi}$. Hence

$$D_2 = \int\!\!\int_{R,R'=1}^{R_{\max}} dR\,dR'\; \Big\langle\big\langle V^{(2)}(R,R';t)\,\big[\bar n_R \bar n_{R'} + \varepsilon_{RR'}(t)\big]\big\rangle_\Psi\Big\rangle_t. \qquad (6)$$
We will assume that $\varepsilon_{RR'}(t)$ averages out to zero. In the presence of network connections between agents, there can be strong correlations between these noise terms $\varepsilon_{RR'}(t)$ and the time-dependence of $V^{(2)}(R,R';t)$, implying that the averaging over t should be carried out timestep-by-timestep [12]. For MG-like games without connections, the agents cannot suddenly access larger numbers of strategies and hence these correlations can be ignored. This gives

$$D_2 = \int\!\!\int_{R,R'=1}^{R_{\max}} dR\,dR'\; \big\langle V^{(2)}(R,R';t)\big\rangle_t\, \bar n_R \bar n_{R'}. \qquad (7)$$
As in conventional many-body theory, the expectation value in Eq. (7) can be 'contracted' down by making use of the equal-time correlations between $\{a_R^{\mu(t)}\}$. As is well known for MG-like games [2, 5, 8], we can safely work in the so-called reduced strategy space, which is constructed such that any pair R and R′ are either (i) fully correlated, i.e. $a_R^{\mu(t)} = a_{R'}^{\mu(t)}$ for all µ(t); (ii) anti-correlated, i.e. $a_R^{\mu(t)} = -a_{R'}^{\mu(t)}$ for all µ(t); or (iii) uncorrelated, i.e. $a_R^{\mu(t)} = a_{R'}^{\mu(t)}$ for half of {µ(t)} while $a_R^{\mu(t)} = -a_{R'}^{\mu(t)}$ for the other half of {µ(t)}. In other words, one can choose two subsets of the strategy space Θ, i.e. $\Theta = U \oplus \bar U$, such that the strategies within U are uncorrelated, the strategies within $\bar U$ are uncorrelated, the anticorrelated strategy of $R \in U$ appears in $\bar U$, and the anticorrelated strategy of $\bar R \in \bar U$ appears in U. We can therefore break up the integrals in Eq. (7): (i) $R' \equiv R$ (i.e. fully correlated), hence $\frac{1}{\mu_{\max}}\int_{\mu=1}^{\mu_{\max}} d\mu\; a_R^{\mu} a_{R'}^{\mu} = 1$ and $\langle V^{(2)}(R,R';t)\rangle_t = 1$; (ii) $R' \equiv \bar R$ (i.e. anticorrelated), which yields $\frac{1}{\mu_{\max}}\int_{\mu=1}^{\mu_{\max}} d\mu\; a_R^{\mu} a_{R'}^{\mu} = -1$. If all possible global information values {µ} are visited reasonably equally over a long time-period, this implies $\langle V^{(2)}(R,R';t)\rangle_t = -1$. For the MG, for example, {µ} corresponds to the m-bit histories, which indeed are visited equally for small m. For large m they are not visited equally for a given Ψ, but are when averaged over all Ψ. If, by contrast, we happened to be considering some general non-MG game where the µ's occur with unequal probabilities $\rho_\mu$, even after averaging over all Ψ, one can simply redefine the strategy subsets U and $\bar U$ to yield a generalized scalar product, i.e. $\frac{1}{\mu_{\max}}\int_{\mu=1}^{\mu_{\max}} d\mu\; a_R^{\mu} a_{R'}^{\mu}\,\rho_\mu = -1$ (or 0 in case (iii)). (iii) $R' \perp R$ (i.e. uncorrelated), which yields $\frac{1}{\mu_{\max}}\int_{\mu=1}^{\mu_{\max}} d\mu\; a_R^{\mu} a_{R'}^{\mu} = 0$ and hence $\langle V^{(2)}(R,R';t)\rangle_t = 0$. Hence
$$D_2 = \int\!\!\int_{R,R'=1}^{R_{\max}} dR\,dR'\; \big\langle V^{(2)}(R,R';t)\big\rangle_t\, \bar n_R \bar n_{R'} = \int_{R=1}^{R_{\max}} dR\;\big(\bar n_R \bar n_R - \bar n_R \bar n_{\bar R}\big)$$
$$= \int_{R\in U} dR\;\big(\bar n_R \bar n_R - \bar n_R \bar n_{\bar R} + \bar n_{\bar R} \bar n_{\bar R} - \bar n_{\bar R} \bar n_R\big) = \int_{R\in U} dR\;\big(\bar n_R - \bar n_{\bar R}\big)^2. \qquad (8)$$
Equation (8) must be evaluated together with the condition which guarantees that the total
number of agents N is conserved:
$$N = \int_{R=1}^{R_{\max}} dR\; \bar n_R \equiv \int_{R\in U} dR\;\big(\bar n_R + \bar n_{\bar R}\big). \qquad (9)$$
Equation (8) has a simple interpretation. Since $\bar n_R$ and $\bar n_{\bar R}$ enter with opposite sign, they act like two charge-densities of opposite charge which tend to cancel each other out: $\bar n_R$ represents a Crowd of like-minded people, while $\bar n_{\bar R}$ corresponds to a like-minded Anticrowd who always do exactly the opposite of the Crowd, regardless of the specific µ(t). We have effectively renormalized the charge-densities $n_R^{S(t);\Psi}$ and $n_{R'}^{S(t);\Psi}$ and their time- and position-dependent two-body interaction $V^{(2)}(R,R';t) \equiv a_R^{\mu(t)} a_{R'}^{\mu(t)}$, to give two identical Crowd-Anticrowd 'quasiparticles' of charge-density $(\bar n_R - \bar n_{\bar R})$ which interact via a time-independent and position-independent interaction term $V^{(2)}_{\mathrm{eff}} \equiv 1$. The different types of Crowd-Anticrowd quasiparticle in Eq. (8) do not interact with each other, i.e. $(\bar n_R - \bar n_{\bar R})$ does not interact with $(\bar n_{R'} - \bar n_{\bar R'})$ if $R \neq R'$. Interestingly, this situation could not arise in a conventional physical system containing collections of just two types of charge (i.e. positive and negative).
A given numerical simulation will employ a given strategy-allocation matrix (i.e. a given rank-S tensor) Ψ. As $R_{\max}$ increases from $1 \to \infty$, Ψ tends to become increasingly disordered (i.e. increasingly non-uniform) [4, 9], since the ratio of the standard deviation to the mean number of agents holding a particular set of S strategies is equal to $[(R_{\max}^{S} - 1)/N]^{1/2}$. There are two regimes: (i) a 'high-density' regime where $R_{\max} \ll N$. Here the charge-densities $\{\bar n_R\}$ tend to be large, non-zero values which decrease monotonically with increasing R. Hence the set $\{\bar n_R\}$ acts like a smooth function $n(R) \equiv \{\bar n_R\}$. (ii) A 'low-density' regime where $R_{\max} \gg N$. Here Ψ becomes sparse, with each element $\Psi_{R,R',R'',\ldots}$ reduced to 0 or 1. The $\{\bar n_R\}$ should therefore be written as 1's or 0's in order to retain the discrete nature of the agents, and yet also satisfy Eq. (9) [4]. Depending on the particular type of game, moving
between regimes may or may not produce an observable feature. In the MG, for example, D1
does not show an observable feature as Rmax increases – however D2 does [6]. We leave aside
the discussion as to whether this constitutes a true phase-transition [5, 9] and instead discuss
the explicit analytic expressions for D2 which result from Eq. (8). It is easy to show that
the mean number of agents using the Xth most popular strategy (i.e. after averaging over Ψ)
is [4]:
$$\bar n_X = N\left[\left(1 - \frac{X-1}{R_{\max}}\right)^{S} - \left(1 - \frac{X}{R_{\max}}\right)^{S}\right]. \qquad (10)$$
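Equation (10) can be checked numerically under the interpretation (ours, purely for illustration) that each agent effectively ends up on the best-ranked of S labels drawn uniformly from 1 to $R_{\max}$; all parameter choices below are arbitrary.

```python
import random

random.seed(1)
N, S, R_max = 200_000, 2, 8

def n_bar(X):
    # Eq. (10): mean number of agents on the X-th most popular strategy.
    return N * ((1 - (X - 1) / R_max) ** S - (1 - X / R_max) ** S)

# Normalization: the n_bar(X) telescope and sum exactly to N, as Eq. (9)
# requires.
assert abs(sum(n_bar(X) for X in range(1, R_max + 1)) - N) < 1e-6

# Monte Carlo: each agent uses the lowest-ranked of its S uniform draws.
counts = [0] * (R_max + 1)
for _ in range(N):
    counts[min(random.randint(1, R_max) for _ in range(S))] += 1

# The empirical bin counts should track n_bar(X) to within sampling noise.
for X in range(1, R_max + 1):
    assert abs(counts[X] - n_bar(X)) < 0.05 * N
```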
The increasing non-uniformity in Ψ as $R_{\max}$ increases means that the popularity-ranking of R becomes increasingly independent of the popularity-ranking of $\bar R$. Using Eq. (10) with S = 2, and averaging over all possible $\bar R$ positions in Eq. (8) to reflect the independence of the popularity-rankings for R and $\bar R$, we obtain:
$$D_2 = \mathrm{Max}\!\left[\frac{N^2}{3R_{\max}}\big(1 - R_{\max}^{-2}\big),\; N\left(1 - \frac{N}{R_{\max}}\right)\right]. \qquad (11)$$
The 'Max' operation ensures that as $R_{\max}$ increases, and hence $\{\bar n_R\} \to 0$ or 1, Eq. (9) is still
satisfied [4]. Equation (11) underestimates $D_2$ at small $R_{\max}$ (see Fig. 1) since it assumes that
the rankings of R and $\bar R$ are unrelated, thereby overestimating the Crowd-Anticrowd cancellation. By contrast, an overestimate of $D_2$ at small $R_{\max}$ can be obtained by considering the opposite limit, whereby Ψ is sufficiently uniform that the popularity and strategy-performance rankings are identical. Hence the strategy with popularity-ranking X in Eq. (10) is anticorrelated to the strategy with popularity-ranking $R_{\max} + 1 - X$. This leads to a slightly modified first expression in Eq. (11): $\frac{2N^2}{3R_{\max}}\big(1 - R_{\max}^{-2}\big)$. Figure 1 shows that the resulting analytical expressions reproduce the quantitative trends in the standard deviation $D_2^{1/2}$ observed numerically for all N and $R_{\max}$, and they describe the wide spread in the numerical data observed at small $R_{\max}$.
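The underestimate of Eq. (11) and the overestimate with the modified first expression can be tabulated directly. A short sketch (the function names are ours, and the $R_{\max}$ values follow the factor-of-4 tick-values of Fig. 1):

```python
def d2_under(N, R_max):
    # Eq. (11): the underestimate, with the Max enforcing Eq. (9).
    return max(N**2 / (3 * R_max) * (1 - R_max**-2),
               N * (1 - N / R_max))

def d2_over(N, R_max):
    # Same, but with the modified first expression 2N^2/(3 R_max)(1 - R_max^-2).
    return max(2 * N**2 / (3 * R_max) * (1 - R_max**-2),
               N * (1 - N / R_max))

# Tabulate the two bounds on D_2^{1/2} for N = 101, as in Fig. 1.
N = 101
for R_max in [2, 8, 32, 128, 512, 2048, 8192]:
    lo, hi = d2_under(N, R_max), d2_over(N, R_max)
    print(R_max, round(lo**0.5, 1), round(hi**0.5, 1))
```

At large $R_{\max}$ the second argument of the Max dominates in both functions, so the two bounds coincide, consistent with the shrinking shaded region in Fig. 1.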
An important practical implication of the present paper is that the wide range of cluster-based approximation schemes developed in conventional many-body theory might usefully be extended to capture the dominant correlations in multi-agent competitive populations. Such generalizations will undoubtedly raise interesting issues for the physics community concerning the precise manner in which time-averages and configuration-averages should be performed within these traditional approximation schemes.
REFERENCES