Carl von Clausewitz, the Fog-of-War, and the AI Revolution
The Real World Is Not a Game of Go
Rodrick Wallace
SpringerBriefs in Applied Sciences and Technology: Computational Intelligence
Series editor: Janusz Kacprzyk, Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland
Rodrick Wallace
Division of Epidemiology
The New York State Psychiatric Institute
New York, NY
USA
© The Author(s), under exclusive licence to Springer International Publishing AG, part of Springer Nature 2018
Preface
Corporate interests and their academic clients now claim that artificial intelligence,
via recent advances in deep learning and related technologies, is ready to take on
management of critical real-time processes ranging from driverless cars on intel-
ligent roads to the conduct of war. In the past, corporate interests have also claimed
that smoking is harmless, environmental contamination unimportant, faulty airbags
are safe, polyvinyl chloride furnishings and finishings in fires are no more dan-
gerous than wood, and made any number of other assertions that, in the long run,
have caused massive human suffering. In many cases, aggressive marketing by
those interests was able to build edifices “too big to fail, too big to jail”, privatizing
profits while socializing costs. Corporate AI advocates for driverless cars and
autonomous weapons stand on the verge of creating such conditions for their
products. Absent intervention, others will follow.
The central thesis of this monograph is that cognitive algorithmic entities tasked
with the real-time management of critical processes under rapidly shifting “road-
way” conditions will face many of the same conundrums and constraints that
confront the conduct of warfare and other forms of conflict. As with conventional
traffic flow, such roadways need not be passive, but may engage or employ entities
having their own purposes, mechanisms, and cognitive abilities. These may range
across biological, individual, social, institutional, machine, and/or hybrid mani-
festations and dynamics, from cancer, murder, and neoliberal capitalism, to Centaur
or autonomous battlespaces.
From the Somme and Passchendaele, to Blitzkrieg madness and Cold War
preparations for human extinction, Vietnam, and the current Middle Eastern
bludgeonings, the art and science of warfare has been singularly unable to cope
with what the military theorist Carl von Clausewitz characterized as the
“fog-of-war” and “friction” inevitable in human conflict. We argue here that, in the
real world, Artificial Intelligence will face similar challenges with similar or greater
ineptitude. The biblical injunction not to put trust in the chariots of Egypt is likely
to take new meaning over the next century.
The fourth chapter provides another case history, examining how failure of the
dynamics of crosstalk between “tactical” and “strategic” levels of organization will
lead to another version of the John Boyd mechanism of command failure: the rules
of the game change faster than executive systems can respond.
The fifth chapter comes full circle, applying the theory explicitly to military
systems. Here, the powerful asymptotic limit theorems of control and information
theories particularly illuminate target discrimination failures afflicting autonomous
weapon, man/machine centaur or cockpit, and more traditional structures under
increasing fog-of-war and friction burdens. The argument indicates that degradation in
targeting precision by high-level cognitive entities under escalating uncertainty,
operational difficulty, attrition, and real-time demands will almost always involve
sudden collapse to an all too familiar pathological state in which “all possible
targets are enemies”, otherwise known as “kill everyone and let God sort them out”.
The sixth chapter examines real-time critical processes on a longer timescale,
through an evolutionary lens. The basic finding is that protracted conflict between
cognitive entities can trigger a self-referential, coevolutionary bootstrap dynamic, in
essence a “language that speaks itself”. Such phenomena do not permit simple
command-loop interventions in John Boyd’s sense and are very hard to contain. An
example might be found in the evolutionary transformation of the Soviet Union’s
military forces, tactics, and strategy in the face of German Bewegungskrieg from
the battles of Moscow to Stalingrad, and then Kursk, and in the “insurgency” that
followed the 2003 tactical successes of the US in Iraq. Another example can be
found in the systematic resistance of the defeated Confederate states after the US
Civil War that resulted in the withdrawal of US troops and the end of
Reconstruction in 1877, permitting imposition of the Jim Crow system of racial
segregation and voter suppression that lasted well into the latter half of the twentieth
century.
The final chapter sums up the argument: Caveat Venditor, Caveat Emptor.
Some explicit comment on methodology is in order. The basic approach is
through the asymptotic limit theorems of information and control theories, leading
to statistical models that, like regression equations, are to be fitted to observational
or experimental data. The essential questions do not, then, revolve around the
pseudoscientific manipulation of metaphors abducted from “nonlinear science”, as
devastatingly critiqued by Lawson (2014), but rather on how well these statistical
models work in practice. Mathematical models that surround, or arise from, the
development of these tools should be viewed in the sense of the theoretical ecol-
ogist E. C. Pielou (1977) as generating conjectures that are to be tested by the
analysis of observational and experimental data: the word is never the thing.
The author thanks Barry Watts and a number of anonymous commentators for
suggestions and differences of opinion useful in revision.
References
Lawson, S., 2014. Non-Linear Science and Warfare: Chaos, complexity and the US military in the
information age. New York: Routledge.
Pielou, E.C., 1977. Mathematical Ecology. New York: John Wiley and Sons.
Wallace, D., Wallace, R., 1998. A Plague on Your Houses. New York: Verso.
Chapter 1
AI in the Real World
1.1 Introduction
Thus, for important biological and social processes, instability and its draconian
regulation are always implicit. Similar considerations apply to large-scale human
institutions that respond to rapidly-changing patterns of demand and opportunity.
Driverless cars on intelligent roads—V2V/V2I systems—will operate quite literally
on rapidly-shifting roadway environments, as, currently, do financial, communica-
tions, and power networks of any size, and, of course, autonomous, man-machine
‘centaur’ and more familiar ‘cockpit’ weapon systems of varying levels of complexity
and human control.
One example that has engendered particular attention is Richard Bookstaber’s
(2017) elegant sales pitch for agent-based modeling in economics and finance. He
proposes yet another ‘revolution in military affairs’ (Neuneck 2008), this time guided
by an array of cognitive modules that, acting together, via a kind of swarm intelligence
in the presence of Big Data, are supposed to permit us to manage financial crises in
real time. Agent-based models rely on ad-hoc (and possibly evolutionarily derived)
heuristics rather than a full-scale, detailed underlying dynamic picture of the world.
Agent-based modeling per se has been the subject of trenchant criticism. In one
example, Conte and Paolucci (2014), who go on to cite a considerable skeptical
literature, write
[Agent Based Models (ABM)] can only provide a sufficient explanation of the phenomenon
of interest, not a necessary one. This... is also known as multi-realizability... and is an out-
standing property of multilevel systems. A macro-level phenomenon in whatever domain...
is multirealizable when it can be implemented in different ways on the lower levels... Even
if as many models as generating paths were actually implemented, it would still be difficult,
if not impossible, to assess which one among them is effectively implemented in the real
world...
Under the pressure of complex systems science... agent-based simulation is increasingly
expected to meet a further... requirement, i.e., to be fed by massive data in real-time...
Unlike laws of nature, Agent Based models of socio-economic phenomena are countless and
not always consistent...
...[T]he variety of equivalent agent models in part depends on a property inherent to [complex]
multi-level systems... [i.e.,]... multirealizability... [i]n part... a consequence of the shaky
foundations, the poor theoretical justification at the basis of many agent models...
They particularly note that the consensus surrounding ABM directs that one
seeks the rules that are minimally necessary to obtain the macroscopic effect to
be described, and emphasize, by contrast, that ‘Entities and properties emerge from
the bottom up and retro-act on the systems that have generated them. Current agent-
based models instead simulate only emergent properties’.
Clearly, real-world, real-time systems are not necessarily minimal and are almost
always engaged in feedback with their effects. One need only think of the multi-
plicities and variations of parasite and pathogen life cycles that have evolved under
shifting selection pressures. Institutional systems suffer similar feedbacks and selec-
tions, and neither minimality nor linearity can be assumed.
Indeed, unlike the linear case, out-of-sample dynamics of nonlinear systems cannot be estimated by ABMs necessarily constructed on the sample. The Ptolemaic solar system involves circular cycles-on-cycles of different radii about a fixed Earth: such a model can be tuned to fit any set of observations within sample, while failing utterly out of sample.
The Data Rate Theorem (DRT) (Nair et al. 2007) relates control and information
theories in the study of regulation and its failure. That is, the DRT tells how good the
headlights must be for driving on a particular twisting, potholed road at night. More
specifically, the DRT establishes the minimum rate at which externally-supplied
control information must be provided for an inherently unstable system to maintain
stability.
At first approximation, it is usual to assume a linear expansion near a nonequilibrium steady state, so that an $n$-dimensional vector of system parameters at time $t$, $x_t$, determines the state at time $t + 1$ according to the model of Fig. 1.1 and the expression
$$x_{t+1} = A x_t + B u_t + W_t \qquad (1.1)$$
Fig. 1.1 A linear expansion near a nonequilibrium steady state of an inherently unstable control system, for which $x_{t+1} = A x_t + B u_t + W_t$. $A$, $B$ are square matrices, $x_t$ the vector of system parameters at time $t$, $u_t$ the control vector at time $t$, and $W_t$ a white noise vector. The Data Rate Theorem states that the minimum rate at which control information must be provided for system stability is $H > \log[|\det(A^m)|]$, where $A^m$ is the subcomponent of $A$ having eigenvalues $\geq 1$. This is characterized as saying that the rate of control information must exceed the rate at which the unstable system generates topological information. The US military strategist John Boyd has observed that driving conflict at a rate more rapid than an adversary can respond causes fatal destabilization, in this context making the rate of topological information greater than the rate at which an opponent can exert control. All cognitive systems will be vulnerable to such challenge
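The Data Rate Theorem bound is easy to exhibit numerically. The following minimal sketch, in Python, simulates Eq. (1.1) for an assumed two-dimensional plant with one unstable mode, computes the DRT minimum control information rate from the unstable eigenvalues, and stabilizes the system with a crude feedback gain; all matrices, gains, and noise levels are illustrative assumptions, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Plant of Eq. (1.1): x_{t+1} = A x_t + B u_t + W_t (illustrative values).
A = np.array([[1.2, 0.0],
              [0.1, 0.7]])      # one unstable eigenvalue (1.2)
B = np.eye(2)

# DRT: control information rate must exceed log|det(A^m)|,
# A^m the subcomponent of A with eigenvalues >= 1.
eigs = np.linalg.eigvals(A)
H_min = np.sum(np.log(np.abs(eigs[np.abs(eigs) >= 1.0])))
print(f"minimum control information rate: {H_min:.3f} nats/step")

# Closed loop with a crude stabilizing gain u_t = -K x_t (an assumption).
K = np.array([[0.5, 0.0],
              [0.0, 0.0]])
x = np.array([1.0, 1.0])
for _ in range(50):
    x = A @ x + B @ (-K @ x) + 0.01 * rng.standard_normal(2)
print("controlled state norm after 50 steps:", np.linalg.norm(x))
```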
For those familiar with the works of the US military strategist John Boyd, Eqs. (1.1)
and (1.2) and Fig. 1.1 instantiate something close to his vision of a necessary continu-
ous cycle of interaction with the environment, assessing and responding to its constant
changes. Boyd asserts that victory in conflict is assured by the ability to ‘get inside’
the decision/correction control loop time frame of an opponent. That is, driving cir-
cumstances more rapidly than an adversary can respond triggers fatal destabilization
by making the rate at which topological information is generated greater than the
rate at which the adversary can counter with useful control information.
No cognitive system—biological, machine, organizational, or hybrid—is immune
to such attack.
How do elaborate control systems fail? The military strategist Carl von Clausewitz emphasized two particular constraints leading to failure: 'fog-of-war' and 'friction'.
The first term refers to the inevitability of limited intelligence regarding battlefield
conditions, and the second to the difficulty of imposing control, due to weather,
terrain, time lags, attrition, difficulty in resupply and logistics, and so on. Again,
for a night driving example, this might be represented as a synergism between poor
headlights and unresponsive steering.
Perhaps obviously, each critical real-time AI system will have several, perhaps
many, such constraints acting synergistically. We then envision, for each system, a
nonsymmetric $n \times n$ 'correlation matrix' $\rho$ having elements $\rho_{i,j}$ representing those constraints and their pattern of interaction. Such matrices will have $n$ invariants, $r_i,\ i = 1, \dots, n$, that remain fixed when 'principal component' transformations are applied to data, and we construct an invariant scalar measure from them, based on the well-known characteristic polynomial relation
$$p(\lambda) = \det(\rho - \lambda I)$$
where det is the determinant, $\lambda$ a parameter, and $I$ is the $n \times n$ identity matrix: the coefficients of $p(\lambda)$ are invariant under similarity transformations of $\rho$. The first invariant will be the trace of the matrix, and the last $\pm$ the determinant. Using these $n$ invariants we define an appropriate composite scalar index $\Gamma = \Gamma(r_1, \dots, r_n)$ as a monotonic increasing real function. This is similar to the Rate Distortion Manifold
of Glazebrook and Wallace (2009) or the Generalized Retina of Wallace and Wallace
(2016).
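The construction of $\Gamma$ can be illustrated concretely. The sketch below computes the $n$ invariants of an assumed $3 \times 3$ constraint matrix from its characteristic polynomial and forms one admissible monotonic index; both the matrix and the particular choice $\Gamma = \sum_i |r_i|$ are illustrative assumptions.

```python
import numpy as np

# Illustrative nonsymmetric fog-of-war/friction crosstalk matrix (an assumption).
rho = np.array([[1.0, 0.3, 0.0],
                [0.2, 1.0, 0.5],
                [0.1, 0.0, 1.0]])

# np.poly gives the coefficients of det(lambda*I - rho); the n trailing
# coefficients are, up to sign, the invariants r_1..r_n.
r = np.poly(rho)[1:]
print("trace recovered:", -r[0], "=", np.trace(rho))
print("determinant recovered:", (-1) ** len(r) * r[-1], "=", np.linalg.det(rho))

# One admissible monotonic increasing scalar index (an assumed choice).
Gamma = np.sum(np.abs(r))
print("Clausewitz parameter Gamma:", Gamma)
```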
Taking the one-dimensional projection $\Gamma$ as the 'Clausewitz parameter', we heuristically extend the condition of Eq. (1.2) as
$$H(\Gamma) > f(\Gamma) \log[|\det(A^m)|]$$
for some positive, monotonically increasing function $f(\Gamma)$. The Mathematical Appendix, following Wallace (2017, Sect. 7.10), uses a Black-Scholes approximation to find that $H(\Gamma)$ will have, in first order, the unsurprising linear form
$$H(\Gamma) \approx \kappa_1 \Gamma + \kappa_2 \qquad (1.5)$$
A second approach to Eq. (1.5) is via the information bottleneck method of Tishby
et al. (1999), adapted here from Wallace (2017, Sect. 9.5). The basic idea is to view
the control information H of Eq. (1.2) as the distortion measure in a Rate Distortion
Theorem argument. We examine a sequence of actual system outputs and, in a deterministic manner, infer from it a sequence of control signals $\hat{U}^i = \hat{u}^i_0, \hat{u}^i_1, \dots$ that we compare with the actual sequence of control signals $U^i = u^i_0, u^i_1, \dots$ having a probability $p(U^i)$. The RDT distortion measure is then the minimum necessary control information for system stability $H(\hat{U}^i, U^i)$, and we write an 'average distortion' as
$$\hat{H} \equiv \sum_{U^i} p(U^i)\, H(\hat{U}^i, U^i) \geq 0 \qquad (1.6)$$
Using standard methods (Cover and Thomas 2006), we can then define a convex 'Rate Distortion Function' in the 'distortion' $\hat{H}$. For illustration we take the RDF as the standard Gaussian, although the essential result depends only on the function's inherent convexity (Cover and Thomas 2006). Then
$$R(\hat{H}) = \frac{1}{2} \log\left[\frac{\sigma^2}{\hat{H}}\right]$$
which recovers Eq. (1.5). Other—convex—forms of the RDF give the same result.
We next examine control failure, focusing on the dynamics of T itself, using a variant
of the bottleneck approach.
Again the central interest is on how a control signal $u_t$ in Fig. 1.1 is expressed in the system response $x_{t+1}$, but here with a focus on $T$ rather than on $H$.
Again the idea is to deterministically retranslate an observed sequence of system outputs $X^i = x^i_1, x^i_2, \dots$ into a sequence of possible control signals $\hat{U}^i = \hat{u}^i_0, \hat{u}^i_1, \dots$ and compare that sequence with the original control sequence $U^i = u^i_0, u^i_1, \dots$, with the difference between them having a particular value under some chosen distortion measure and hence having an average distortion
$$D \equiv \sum_i p(U^i)\, d(U^i, \hat{U}^i) \qquad (1.11)$$
where $p(U^i)$ is the probability of the sequence $U^i$ and $d(U^i, \hat{U}^i)$ measures the distortion between $U^i$ and the sequence of control signals that has been deterministically reconstructed from the system output.
Again, a classic Rate Distortion argument. According to the Rate Distortion Theorem, there exists a Rate Distortion Function, $R(D)$, that determines the minimum channel capacity necessary to keep the average distortion below some fixed limit $D$ (Cover and Thomas 2006). Based on Feynman's (2000) interpretation of information as a form of free energy, it becomes possible to construct a Boltzmann-like pseudoprobability in the Clausewitz temperature $T$ as
$$dP(R, T) = \frac{\exp[-R/T]\, dR}{\int_0^\infty \exp[-R/T]\, dR} \qquad (1.12)$$
'explodes' with increasing time. By the Stochastic Stabilization Theorem (Mao 2007; Appleby et al. 2008), an 'exploding' function for which
$$\limsup_{t \to \infty} \frac{\log[|x(t)|]}{t} \to -\frac{\sigma^2}{2} + \omega \qquad (1.17)$$
Expanding $\log[T_t]$ using the Ito Chain Rule gives the differential equation
$$dT/dt = \mu\phi \log[T(t)\phi] + 2\mu\phi - \frac{1}{2} T(t)\sigma^2 \qquad (1.21)$$
As above, there are two nonequilibrium steady state solutions, constrained by Jensen's inequality, with the larger stable and the smaller either collapsing to zero or increasing toward the larger. The relations are
$$E(T_L) \geq -\frac{2\phi}{\sigma^2}\, W\!\left[-1, \frac{-\sigma^2 \exp(-3)}{2\mu\phi}\right]$$
$$E(T_S) \geq -\frac{2\phi}{\sigma^2}\, W\!\left[0, \frac{-\sigma^2 \exp(-3)}{2\mu\phi}\right] \qquad (1.22)$$
where $W[-1, x]$, $W[0, x]$ are the $-1$ and $0$ branches of the Lambert W-function. As above, large enough $\sigma$ coalesces the upper and lower limits, causing $T$ to collapse to zero. Figure 1.3 shows that coalescence with increasing $\sigma$ for the relation
$$-\frac{1}{\sigma^2} W[j, -\sigma^2], \quad j = -1, 0$$
Setting the two different expressions for $W$ in Eq. (1.22) equal to each other and solving for $\phi$ gives a stability condition in terms of $\sigma$ and $\mu$. The trick is to recognize that $W[-1, -x] = W[0, -x]$ at a branch point $x = \exp[-1]$. This gives the stability condition on the force capacity/resolve $\phi$ as
$$\phi > \frac{\sigma^2}{2\mu \exp(2)} \qquad (1.23)$$
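The branch structure behind Eqs. (1.22) and (1.23) can be checked directly with a numerical Lambert W-function. In the sketch below, the parameter values $\mu = 1$ and $\phi = 0.05$ are illustrative assumptions; past the branch point the two limits coalesce and $T$ collapses, in agreement with the stability condition of Eq. (1.23).

```python
import numpy as np
from scipy.special import lambertw

mu, phi = 1.0, 0.05   # assumed values

for sigma in (0.5, 0.8, 1.0, 1.5):
    stable = phi > sigma**2 / (2 * mu * np.exp(2))   # Eq. (1.23)
    arg = -sigma**2 * np.exp(-3) / (2 * mu * phi)
    if arg < -np.exp(-1):
        # past the Lambert-W branch point: limits coalesce, T collapses
        print(f"sigma={sigma}: collapsed (Eq. 1.23 satisfied: {stable})")
        continue
    T_L = -(2 * phi / sigma**2) * lambertw(arg, -1).real   # larger, stable
    T_S = -(2 * phi / sigma**2) * lambertw(arg, 0).real    # smaller
    print(f"sigma={sigma}: E(T_L) >= {T_L:.3f}, E(T_S) >= {T_S:.3f} "
          f"(Eq. 1.23 satisfied: {stable})")
```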
Loss of force capacity remains a difficult conundrum for models of combat oper-
ations. Ormrod and Turnbull (2017), for example, write that
The practical relationship between attrition and combat remains uncertain, with a host of
variables influencing the outcome of battle [for example leadership, fire support, morale,
training, mobility, infiltration etc.]... Comprehensive assessment models of military forces
and combat skill is a difficult and unsolved proposition... [D]ata are far from convincing
that [available] simulations provide robust macro-attrition models that align with military
doctrine.
Again, McQuie focuses on 'force resolve' rather than attrition per se, although most battles have been broken off at casualty rates less than 10%. Nonetheless, the
inference of Eq. (1.23), in consonance with much observation, is that sufficiently lowered force capacity $\phi$—from either loss of resources or resolve—can be expected to trigger tactical, operational, or strategic failure, depending on the scale of observation. Details vary, for this model, in proportion to the ratio $\sigma/\sqrt{\mu}$.
It is probably necessary to make the same kind of expansion for φ as was done
for Γ in Sect. 1.2 so as to include factors of resolve as well as of material resource.
McQuie explicitly identifies a high level of enemy maneuverability as an essential determinant of defeat in combat, and we can model the interaction between $T$ and $\phi$ from that perspective.
We first normalize variates as $\hat{T} \equiv T/T_{max}$, $\hat{\phi} \equiv \phi/\phi_{max}$. The interaction between them is then taken as
$$d\hat{T}/dt = \mu_1 \hat{\phi}(1 - \hat{T}) - \gamma_1 \hat{T}$$
$$d\hat{\phi}/dt = \mu_2 \hat{T}(1 - \hat{\phi}) - \gamma_2 \hat{\phi} \qquad (1.24)$$
The $\mu_i$ indicate positive feedback and the $\gamma_i$ represent the rate of 'entropy' effects that decrease the indices of interest, respectively the rates of attrition of situational awareness and capability/force resolve.
Elementary calculation finds equilibrium values for this system as
$$\hat{T} \to \frac{\mu_1\mu_2 - \gamma_1\gamma_2}{\mu_2(\mu_1 + \gamma_1)}$$
$$\hat{\phi} \to \frac{\mu_1\mu_2 - \gamma_1\gamma_2}{\mu_1(\mu_2 + \gamma_2)} \qquad (1.25)$$
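The equilibria of Eq. (1.25) can be verified by direct integration of the interaction system. The following sketch, under assumed rate constants, integrates Eq. (1.24) from an arbitrary interior starting point and compares the long-time state with Eq. (1.25).

```python
import numpy as np
from scipy.integrate import solve_ivp

mu1, mu2, g1, g2 = 1.0, 0.8, 0.2, 0.1   # assumed rate constants

def rhs(t, y):
    T, phi = y
    return [mu1 * phi * (1 - T) - g1 * T,      # dT/dt, Eq. (1.24)
            mu2 * T * (1 - phi) - g2 * phi]    # dphi/dt

sol = solve_ivp(rhs, (0.0, 100.0), [0.5, 0.5], rtol=1e-10)
T_eq = (mu1 * mu2 - g1 * g2) / (mu2 * (mu1 + g1))     # Eq. (1.25)
phi_eq = (mu1 * mu2 - g1 * g2) / (mu1 * (mu2 + g2))
print("integrated:", sol.y[:, -1])
print("Eq. (1.25):", T_eq, phi_eq)
```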
$$\hat{T} \propto \frac{R^2 - 1/R^2}{R(R + 1/R)} \qquad (1.26)$$
See Wallace (1993) for an application of the network-failure method to the recur-
rent collapse of fire service in New York City that began after 1972, triggered by
political ‘planned shrinkage’ fire service reductions focused in high population, high
density minority voting blocs. These reductions persist to the present, in spite of the
reoccupation of formerly minority communities by an affluent majority population.
That analysis centers on cascading hierarchical disintegration.
It is interesting to note that justification for these fire service reductions was by
means of ‘deployment algorithms’ developed by the Rand Corporation that have
since been institutionalized in a highly automated system that is a recognizable
precursor of the AI which will be given control of V2V/V2I and similar critical
infrastructure (Wallace and Wallace 1998). In essence, New York City’s housing
stock collapsed to a level that could be supported by the reduced fire extinguishment
services, resulting in the loss of hundreds of thousands of primarily low income units.
Analogous evolutionary selection pressures can be expected to follow widespread
deployments of AI control for other critical real-time systems.
A different approach is to expand Eq. (1.24) by letting $X(t)$ represent the vector $\langle \hat{T}, \hat{\phi} \rangle$
and assuming that a 'mobility function', redesignated $\hat{R}(X(t))$, acts directly—as opposed to inversely with $R$ above. Then Eq. (1.24) can be expressed as a stochastic differential equation vector system
almost surely.
That is, sufficient ‘enemy maneuverability’, in this model, if maintained long
enough, drives any levels of Clausewitz temperature and capacity/resolve to extinc-
tion.
One can, of course, imagine both this and the mechanism of Eq. (1.26) at work
together.
The challenge to an agent or agency is then to deny an ‘opponent’ the necessary
scale and pattern of maneuverability.
A simplified stochastic variant of Eq. (1.24) would involve fixing the value of $\phi$, analogous to the development of Eqs. (1.20)–(1.23). Then, at nonequilibrium steady state, the expectation becomes
$$E(\hat{T}) = \frac{\mu\phi}{\mu\phi + \gamma}$$
Two points are evident. First, since this is an expectation, there will always be some probability that the system falls below the critical value for $\hat{T}$ determined by the DRT. Second, as the rate of attrition of situational awareness, $\gamma$, rises, this probability significantly increases.
However, applying the Ito Chain Rule to $\hat{T}^2$, after some calculation, finds
$$E(\hat{T}^2) = \left[\frac{\mu\phi}{\mu\phi + \gamma - \sigma^2/2}\right]^2 \qquad (1.31)$$
Rising σ can thus trigger a particular instability leading to rapid violation of the
DRT condition.
A more comprehensive ‘cognitive’ argument can be made for less regular circum-
stances if it is possible to identify equivalence classes of a system’s developmental
pathways, e.g., ‘healthy’ versus ‘pathological’, permitting definition of a ‘develop-
mental symmetry groupoid’ (Wallace 2017; Weinstein 1996; Golubitsky and Stewart
2006). A groupoid is a generalization of the idea of a symmetry group in which a
product is not necessarily defined between each element. The simplest example might
be a disjoint union of separate symmetry groups, but sets of equivalence classes also
define a groupoid. See the Mathematical Appendix for an introduction to standard
material on groupoids.
We will show that a new ‘free energy’ can then be defined that is liable to an analog
of Landau’s classical spontaneous symmetry breaking, in the Morse Theory sense
(Pettini 2007). Under symmetry breaking, higher ‘temperatures’ are associated with
more symmetric higher energy states in physical systems. Cosmological theories
make much of such matters in the first moments after the ‘big bang’, where different
physical phenomena began to break out as the universe rapidly cooled. Here, for
cognitive processes controlled by AI systems a decline in the Clausewitz temperature
T can result in sharply punctuated collapse from higher to lower symmetry states,
where the sum is over the different possible cognitive modes of the full system. A 'free energy' Morse Function $F$ can then be defined as
$$\exp[-F/T] \equiv \sum_j \exp[-H_{G_j}/T]$$
$$F = -T \log\left[\sum_j \exp[-H_{G_j}/T]\right] \qquad (1.34)$$
Symmetry breaking here is again driven by the temperature $T$, but in the context of what are likely to be far more complicated groupoid rather than group symmetries.
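The punctuated character of this collapse is already visible in the bare log-sum-exp structure of Eq. (1.34). The sketch below evaluates $F$ and the Boltzmann-like weights over an assumed set of mode uncertainties $H_{G_j}$: at high $T$ the modes are nearly equally weighted, while at low $T$ the system condenses onto the lowest-$H$ mode.

```python
import numpy as np

# Assumed dual-source uncertainties for four cognitive modes.
H = np.array([1.0, 1.5, 2.0, 3.0])

for T in (5.0, 1.0, 0.2, 0.05):
    w = np.exp(-H / T)
    F = -T * np.log(w.sum())      # Eq. (1.34)
    p = w / w.sum()               # Boltzmann-like weights over modes
    print(f"T={T}: F={F:.3f}, dominant mode weight={p.max():.3f}")
```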
As above, it is possible to invoke an index of resolve/capability by the mapping
T → φT in Eqs. (1.33) and (1.34).
Based on the analogy with physical systems, there should be only a few possible
phases, with sharp and sudden transitions between them as the Clausewitz tempera-
ture T decreases.
It is possible to examine sufficient conditions for the intractable stability of the
pathological ‘ground state’ via the Stochastic Stabilization Theorem (Appleby et al.
2008; Mao 2007). Suppose there is a multidimensional vector of parameters asso-
ciated with that phase, X , that measures deviations from the pathological state. The
free energy measure from Eq. (1.34) allows definition of another entropy in terms of
a Legendre transform
$$\hat{S} \equiv F(X) - X \cdot \nabla_X F \qquad (1.35)$$
increasing function of both the rate of material supply $M_j$ and of information supply characterized by a local channel capacity $C_j$. The overall capability of the system is seen as limited to some total maximum rate $M = \sum_j M_j(M_j, C_j)$.
Wallace (2016) uses an Arrhenius reaction rate model to argue that the rate of individual module cognition is then given as $\exp[-K_j/M_j]$ for an appropriate $K_j > 0$. We focus first on optimization under real-time 'roadway' constraints in which tactical rather than strategic considerations predominate.
Taking a Lagrange multiplier approach to efficiency optimization under the constraint $\sum_j M_j = M > 0$, we use the simplest possible equally-weighted multi-objective scalarization, producing the Lagrangian
$$L \equiv \sum_j \exp[-K_j/M_j] - \lambda\left[\sum_j M_j - M\right] \qquad (1.37)$$
The first-order conditions are then
$$K_j \frac{\exp[-K_j/M_j]}{M_j^2} = \lambda$$
$$M = \sum_j M_j$$
$$\partial L/\partial M = \lambda \qquad (1.38)$$
where, abducting arguments from physical theory, $\lambda$ is taken as the 'inverse Boyd temperature' of the full system. Any good statistical thermodynamics text will go through the argument (e.g., Schrödinger 1989, Chap. II). The calculation is based on maximizing a probability distribution using Lagrange multipliers. Then $\log(P)$ in
$$P = \frac{N!}{n_1! n_2! \cdots}$$
is maximized subject to the constraints $\sum_i n_i = N$ and $\sum_i \varepsilon_i n_i = E$, where $n_i$ is the number in state $i$ and $\varepsilon_i$ its energy. One then applies the Stirling formula $\log(n!) \approx n(\log(n) - 1)$ and some hand-waving to identify the energy multiplier as an inverse temperature.
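The first optimization condition of Eq. (1.38) can be solved numerically to show the catastrophe at small $\lambda$. In the sketch below, $K_j = 0.5$ as in Fig. 1.5, and the $\lambda$ grid is an assumption; the required $M_j$ on the branch $M_j > K_j/2$ grows without bound as $\lambda$ falls.

```python
import numpy as np
from scipy.optimize import brentq

K = 0.5   # as in Fig. 1.5

def foc(M, lam):
    # first condition of Eq. (1.38): K exp(-K/M)/M^2 = lambda
    return K * np.exp(-K / M) / M**2 - lam

for lam in (0.5, 0.1, 0.01, 0.001):
    # root on the decreasing branch M > K/2 (to the right of the peak)
    M = brentq(foc, K / 2, 1e6, args=(lam,))
    print(f"lambda={lam}: required M_j = {M:.2f}")
```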
Figure 1.5 shows a single term for $K_j = 0.5$ over the range $0 \leq M_j \leq 2$. It is important to recognize that, for small $\lambda$, i.e., high Boyd temperature, an $M_j$ may become arbitrarily large, a requirement that cannot be met: the system then fails catastrophically.
Clearly, then, sufficient 'cognitive challenge' creates the conditions for sudden, punctuated collapse. This follows directly from the inference that, for a given cognitive module, we will most likely have something much like $M_j \propto \phi_j T_j$, i.e., the rate of resource consumption is determined by the synergism between the force capacity/resolve index and the Clausewitz temperature index. At the very least, $M_j(M_j, C_j)$, where $C_j$ represents the required information channel capacity, must itself be some positive, monotonic increasing function of them both.
Although we have used a ‘simple’ deterministic model, the real world is seldom
deterministic: the essential parameters of Eqs. (1.37) and (1.38) can themselves be
stochastic variates, and we enter the complicated realm of stochastic programming,
following closely the presentation of Cornuéjols and Tütüncü (2006, Chap. 16).
Many optimization problems are described by uncertain parameters, and one form
of approach, stochastic programming, assumes that these uncertain parameters are
random variables with known probability distributions. This information is then used
to transform the stochastic program into a so-called deterministic equivalent which
might be a linear program, a nonlinear program, or an integer program. As Cornuéjols and Tütüncü put it,
While stochastic programming models have existed for several decades, computational tech-
nology has only recently allowed the solution of realistic size problems... It is a popular
modeling tool for problems in a variety of disciplines including financial engineering...
Stochastic programming models can include anticipative and/or adaptive decision variables.
Anticipative variables correspond to those decisions that must be made here-and-now and
cannot depend on the future observations/partial realizations of the random parameters...
Evidently, real-time critical systems will often fall heavily into the anticipative
category.
We provide a relatively simple example, explicitly reconsidering the effectiveness reaction rate index $\exp[-K_j/M_j]$.
The scalarization function is then to be replaced by its expectation before the
optimization calculation is carried out.
We assume, first, that the $K_j$ have exponential distribution density functions, i.e., $\rho(K_j) = \omega_j \exp[-\omega_j K_j]$, so that
$$E(K_j) = \int_0^\infty K_j \rho(K_j)\, dK_j = 1/\omega_j \qquad (1.39)$$
As a consequence,
$$E(\exp[-K_j/M_j]) = \int_0^\infty \omega_j \exp[-\omega_j K_j] \exp[-K_j/M_j]\, dK_j = \frac{M_j \omega_j}{M_j \omega_j + 1} \qquad (1.40)$$
The analog of the first expression of Eq. (1.38) is then
$$\frac{\omega_j}{M_j \omega_j + 1} - \frac{M_j \omega_j^2}{(M_j \omega_j + 1)^2} = \lambda \qquad (1.41)$$
Figure 1.6 shows this relation for $\langle K_j \rangle = 1/\omega_j = 0.5$. Again, small $\lambda$, equivalent to a high Boyd temperature, is to be associated with exploding demand for resources, but without a zero state as at the left of the peak in Fig. 1.5. In this case, noise precludes such a state.
A second approach is to take the $M_j$ themselves as stochastic variables having exponential distributions and the $K_j$ as fixed parameters, so that
$$\langle M \rangle_j \equiv E(M_j) = \int_0^\infty M_j \omega_j \exp[-\omega_j M_j]\, dM_j = 1/\omega_j \qquad (1.42)$$
The Lagrangian is then
$$L = \sum_j E(\exp[-K_j/M_j]) - \lambda\left[\sum_j E(M_j) - M\right] = \sum_j 2\sqrt{\omega_j K_j}\,\mathrm{BesselK}\!\left(1, 2\sqrt{\omega_j K_j}\right) - \lambda\left[\sum_j \frac{1}{\omega_j} - M\right] \qquad (1.43)$$
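The closed Bessel form in Eq. (1.43) follows from a standard integral; the following sketch checks it by direct quadrature, with the values of $\omega_j$ and $K_j$ chosen arbitrarily.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv

omega, K = 2.0, 0.5   # assumed values, so <M> = 1/omega

numeric, _ = quad(lambda M: omega * np.exp(-omega * M) * np.exp(-K / M),
                  0.0, np.inf)
x = 2.0 * np.sqrt(omega * K)
closed = x * kv(1, x)       # 2 sqrt(omega K) BesselK(1, 2 sqrt(omega K))
print(numeric, closed)      # the two values agree
```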
The first term in the gradient equation analogous to that of Eq. (1.38), but now replacing $\omega_j$ with $1/\langle M \rangle_j$, is
$$2\,\mathrm{BesselK}\!\left(0, 2\sqrt{K_j/\langle M \rangle_j}\right)\frac{K_j}{\langle M \rangle_j^2} = \lambda \qquad (1.44)$$
Taking both $K_j$ and $M_j$ as exponentially distributed gives
$$E(\exp[-K_j/M_j]) = \frac{\langle M \rangle_j}{\langle K \rangle_j}\left(1 - 2\exp[\langle K \rangle_j/\langle M \rangle_j]\,\mathrm{Ei}_3(\langle K \rangle_j/\langle M \rangle_j)\right) \qquad (1.45)$$
The first-order condition analogous to Eq. (1.38) is then
$$\lambda = \frac{1}{KM}\left(2\exp[K/M](K - M)\,\mathrm{Ei}_3(K/M) - 2K\exp[K/M]\,\mathrm{Ei}_2(K/M) + M\right) \qquad (1.46)$$
where we have suppressed the $j$ index and both $M$ and $K$ are their expectation values.
Figure 1.8 shows this relation for $\langle K \rangle_j = 0.5$, and is similar in form to Fig. 1.6.
It is of some interest to carry through this program for the efficiency measure $\exp[-K/M]/M$, which becomes important on strategic scales of analysis, that is, long-term conflict beyond do-or-die exigencies. The equations below, for which we have suppressed the $j$ index, are equivalent to the first of Eq. (1.38): (1) fully deterministic; then, for exponential distributions, (2) deterministic $K$, stochastic $M$; (3) stochastic $K$, deterministic $M$; (4) both $K$ and $M$ stochastic.
Fig. 1.10 Term-by-term stochastic optimization for the efficiency index $\sum_j \exp[-K_j/M_j]/M_j$. a fixed $K$, stochastic $M$; b stochastic $K$, fixed $M$; c both stochastic. Exponential distributions assumed. $K$ and $\langle K \rangle$ are taken as 0.5
Figure 1.10 shows the pattern for the different stochastic optimizations: (a) fixed
K , stochastic M, (b) stochastic K , fixed M, (c) both stochastic. In all cases, the
demand for resources, either directly or on average, becomes explosive with declin-
ing λ.
These stochastic optimization calculations are not completely trivial and needed
a sophisticated computer algebra program for their solution.
One-parameter distributions, in general, can be explored using a variant of the method applied here. Under such a condition, $\langle M \rangle_j \equiv E(M_j)$ can be expressed as a function of the distribution's characteristic parameter, say $\alpha_j$, recalling that the distribution function is $\rho(\alpha_j, M_j)$. Then
$$\langle M \rangle_j \equiv E(M_j) = \int_0^\infty M_j \rho(\alpha_j, M_j)\, dM_j = Q_j(\alpha_j)$$
Many years ago, Huberman and Hogg (1987) examined the punctuated onset of
collective phenomena across interacting algorithmic systems:
We predict that large-scale artificial intelligence systems and cognitive models will undergo
sudden phase transitions from disjointed parts into coherent structures as their topological
connectivity increases beyond a critical value. These situations, ranging from production
systems to semantic net computations, are characterized by event horizons in space-time
that determine the range of causal connections between processes. At transition, these event
horizons undergo explosive changes in size. This phenomenon, analogous to phase transi-
tions in nature, provides a new paradigm with which to analyze the behavior of large-scale
computation and determine its generic features.
Recent work on ‘flash crash’ stock market collapses bears out these predictions,
implying indeed dynamics roughly analogous to the Boyd mechanism of the previous
section (Parker 2016a, b; Johnson et al. 2013; Zook and Grote 2017).
Fig. 1.11 From Zook and Grote 2017. Flash Crash of May 6, 2010. Percent change in Standard
and Poor’s 500 index at one minute intervals during trading day. Chapter 6 will examine such
phenomena as an example of the punctuated onset of a coevolutionary ‘language that speaks itself’.
Military coevolutionary catastrophes can play out over much longer time scales
Figure 1.11, from Zook and Grote (2017), shows the flash crash of May 6, 2010,
in which Standard and Poor’s 500 index declined by about 5 percent in only a few
minutes.
Zook and Grote (2017) remark
In the days of HFT [high frequency trading] with its enormous technological infrastructure,
public information is transformed into orders that are brought to the market extremely fast
so that they resemble private information—at least with regard to other, slower, market
participants. Fore fronting the processes and strategies contained in the assemblages of
HFT is essential in recognizing that the recreation of capital exchanges is not simply an
exercise in efficiency but a calculated strategy. The human traders directing the efforts of
HFT assemblages rely upon space-based strategies of information inequality to extract profits
while simultaneously introduce new and unknown risks into the market.
Thus the ratio $CC_A/CC_L$ acts as a temperature analog in the Parker model.
Johnson et al. (2013) put it somewhat differently, invoking their own version of
the Boyd analysis:
Society’s techno-social systems are becoming ever faster and more computer-oriented. How-
ever, far from simply generating faster versions of existing behavior... this speed-up can gen-
erate a new behavioral regime as humans lose the ability to intervene in real time. Analyzing
millisecond-scale data for... the global financial market, we uncover an abrupt transition to
a new all-machine phase characterized by large numbers of subsecond extreme events.
Their ‘temperature’ analog is the real-time ratio of the number of available strate-
gies to the number of active agents. If this is greater than 1, the system remains
stable in their model. Below 1, the system undergoes a phase transition to an unsta-
ble dynamic regime. See their Fig. 6 for details, and their reference list for some of
the other phase transition studies of the flash-crash pathology.
Something similar to Parker's analysis emerges from the arguments of the previous section—although only indirectly as a temperature—by letting the constraint $M$, via its components $M_j$, be given wholly in terms of the available information channel capacity $C_j$, replacing the resolve-and-information constraint above. That is, $M = \sum_j M_j(M_j, C_j)$ becomes a purely informational constraint in a multi-channel complex, i.e., $C = \sum_j C_j$. The system's inverse Boyd temperature index $\lambda$ then determines whether there is enough channel capacity available to permit stability.
Unlike Parker’s single component result, for larger, interactive systems, under certain
Boyd temperature regimes there may never be enough channel capacity available.
That is, for the flash crash example, if the rate of challenge 'gets inside the command loop' of the market system, $CC_L$ of individual components can never be made large enough for stabilization: the response rate calculation leading to Figs. 1.6 and 1.7 suggests that high enough Boyd temperature—sufficiently small $\lambda$—leads to channel capacity demands for individual modules that cannot be met.
These mechanisms have been recognized as sources of instability in AI-driven
military confrontations (e.g., Baumard 2016). As Kania (2017) put it in the context
of the inevitably different design ‘cultures’ for Western and Chinese military AI
systems,
Against the backdrop of intensifying strategic competition, great powers are unlikely to
accept constraints upon capabilities considered critical to their future military power. At
this point, despite recurrent concerns over the risks of ‘killer robots,’ an outright ban would
likely be infeasible. At best, militaries would vary in their respective adherence to potential
norms. The military applications of AI will enable new capabilities for militaries but also will
create new vulnerabilities. This militarization of AI could prove destabilizing, potentially
intensifying the risks of uncontrollable or even unintended escalation. There will likely be
major asymmetries between different militaries’ approaches to and employment of AI in
warfare, which could exacerbate the potential for misperception or unexpected algorithmic
interactions.
Turchin and Denkenberger (2018), in their long chapter on military AI, devote
only a single paragraph to such dynamics:
Nuclear weapons lessened the time of global war to half an hour. In the case of war between
two military AIs it could be even less.... A war between two military AIs may be similar to
the flash-crash: two AIs competing with each other in a stable mode, could, in a very short
time (from minutes to milliseconds), lose that stability. They could start acting hostilely to
each other...
It seems clear that the risk of such pathological interaction is inherent to AI control
of real-time critical systems across many venues.
In Chap. 6 we will reexamine algorithmic flash-crash processes and similar phe-
nomena from the more general perspective of evolutionary theory, suggesting that
they represent the punctuated onset of rapid coevolutionary dynamics, in effect,
of a ‘language that speaks itself’, creating instabilities far beyond those resulting
from John Boyd’s command loop challenge. Indeed, quite perversely, command
loop robustness under dynamic challenge is the keystone to the instability.
$$F(J_L, K_L) = L^D F(J, K)$$
$$\chi(J_L, K_L) = \chi(J, K)/L \qquad (1.48)$$
where $J_L$ and $K_L$ are the transformed values after the clumping renormalization, and we take $J_1, K_1 \equiv J, K$. $D$ is a real positive number characteristic of the network,
here most likely a fractal dimension. In physical systems D is integral and determined
by the underlying dimensionality of the object under study (Wilson 1971). As shown
in the Mathematical Appendix, many different such renormalization relations are
possible for cognitive systems.
These relations are presumed to hold in the neighborhood of the critical value of
the transition index, KC .
Differentiating with respect to $L$ gives expressions of the form
$$F = \omega^{D/y} F_0$$
$$\chi = \omega^{1/y} \chi_0 \qquad (1.50)$$
so that
$$\omega = s\tau_0/\tau_K \qquad (1.52)$$
Substituting this into the relation for the correlation length gives the expected fragment size in $L$-space, $d(\hat{t})$, as
What are the limits on T (or on T φ), the temperature analog that determines cog-
nitive function in elaborate AI (and other) cognitive systems? We reconsider the
argument leading to Eq. (1.13).
First, assume that $T \to T + \Delta$, $\Delta \ll T$. This leads to an expression for the free energy index $F$ of the form
$$\exp\left[-\frac{F}{T + \Delta}\right] = \int_0^\infty \exp[-R/(T + \Delta)]\, dR = T + \Delta \qquad (1.54)$$
The 'ratchet' dynamics of $\Delta_t$ are then taken as
$$d\Delta_t = \frac{\mu \Delta_t}{T + \Delta_t}\, dt + \sigma \Delta_t\, dW_t \approx \frac{\mu}{T}\, \Delta_t\, dt + \sigma \Delta_t\, dW_t \qquad (1.55)$$
where $\mu$ is an appropriate 'diffusion coefficient', $dW_t$ represents Brownian white noise, $\sigma$ determines the magnitude of the volatility, and we use the condition that $\Delta \ll T$.
Applying the Ito Chain Rule (Protter 1990) to $\log[\Delta]$ produces the SDE
$$d\log[\Delta_t] = \left(\frac{\mu}{T} - \frac{1}{2}\sigma^2\right)dt + \sigma\, dW_t \qquad (1.56)$$
Invoking the Stochastic Stabilization Theorem (Mao 2007; Appleby et al. 2008),
$$\lim_{t \to \infty} \frac{\log[|\Delta_t|]}{t} < 0$$
almost surely unless
$$\frac{\mu}{T} > \frac{1}{2}\sigma^2$$
that is, unless
$$T < \frac{2\mu}{\sigma^2} \qquad (1.57)$$
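The ceiling of Eq. (1.57) can be seen in simulation. The sketch below evolves $\log \Delta_t$ according to Eq. (1.56) for temperatures on either side of $2\mu/\sigma^2$; the drift term $\mu/T - \sigma^2/2$ changes sign at the ceiling. Parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, dt, n = 1.0, 1.0, 1e-3, 200_000   # assumed values
T_ceiling = 2 * mu / sigma**2

for T in (0.5 * T_ceiling, 2.0 * T_ceiling):
    increments = (mu / T - 0.5 * sigma**2) * dt \
                 + sigma * np.sqrt(dt) * rng.standard_normal(n)
    log_Delta = increments.sum()   # log Delta_t at t = n*dt, from Eq. (1.56)
    print(f"T={T}: log Delta at t={n * dt:.0f} is {log_Delta:+.1f}")
```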
The essential point is that there will be an upper limit to T in this version of the
ratchet. Above that ceiling, other things being equal, Δt → 0.
This mechanism might constrain the maximum possible T .
Conversely, a sudden increase in σ might trigger a decline in T that in turn causes
a subsequent increase in σ , leading to a downward ratchet and system collapse.
Typically, for real-time systems, local entities are engaged in what the military call
immediate do-or-die ‘tactical’ challenges, for example a single driverless car in a
rapidly varying traffic stream. Two subsequent layers of cognition, however, are
imposed on the tactical level. The highest involves the ‘strategic’ aims in which
tactical decisions are embedded. For driverless cars on intelligent roads, so-called
V2V/V2I systems, the ultimate aim is unimpeded, rapid traffic flow over some preex-
isting street network. Connecting strategy to tactics is done through the operational
level of command, the necessary interface between local and ultimate cognitive
intent. While ‘tactical’ problems usually have relatively straightforward engineering
solutions—lidar, radar, V2V crosstalk, and so on for driverless cars—operational
and strategic levels do not.
As Watts (2008), in a military setting, puts it
The cognitive skills demanded of operational artists and competent strategists appear to
differ fundamentally from those underlying tactical expertise in do-or-die situations... Tac-
tical competence does not necessarily translate into operational competence... Operational
problems, being wicked [in a precise technical sense] are characterized by complexity and
uncertainty embedded in a turbulent environment riddled with uncertainties.
Rose (2001) explores critical US strategic intelligence failures during the Korean
War. The tactical brilliance of the US amphibious landing at Inchon, on South Korea’s
Northwest coast, on September 15, 1950 was matched by a stunning blindness to per-
sistent and accurate intelligence reports of a massive Chinese buildup in Manchuria.
Indeed, the Chinese had already sent numerous diplomatic signals that they viewed
US presence north of the 38th Parallel as a strategic threat. US Cold War doctrine,
however, dictated that the Soviet Union controlled all Communist entities, and that,
fearing war with the US, the Soviets would simply rein in the Chinese. US China
scholars, who would have known better and might have entered into policy discus-
sions, had all been silenced by the McCarthy era smears about who had ‘lost China’.
US commanding general Douglas MacArthur and his hand-picked, sycophantic staff
argued that, in spite of the evident massive military buildup, the Chinese would not
intervene. On October 13, they began doing so, in two distinct stages.
As Rose puts it,
By mid-November [1950], FEC reported that 12 PLA divisions had been identified in Korea.
On 24 November, however, National Intelligence Estimate 2/1 stated that China had the
capability for large-scale offensive operations but that there were no indications such an
offensive was in the offing. That same day, the second Chinese offensive started, leaving the
8th Army fighting for its life and most of the 1st Marine Division surrounded and threatened
with annihilation.
It took several days for MacArthur and his staff to face the fact that his ‘end of the war’
offensive toward the Yalu was over and victory was not near. Finally, on 28 November,
MacArthur reported that he faced 200,000 PLA troops and a completely new war. MacArthur
again had the numbers significantly wrong, but he got the ‘new war’ part right.
Using $Q$, we have broken out the underlying network topology, a fixed between-and-within communication configuration weighted by 'command weight' $A_i$ that is assumed to change relatively slowly on the timescale of observation compared to the time needed to approach the nonequilibrium steady state distribution.
Since the $X$ are expressed in dimensionless form, $g$, $t$, and $A$ must be rewritten as dimensionless as well, giving, for the monotonic increasing (or threshold-triggered) function $G$,
$$X^i_\tau = G\left[\tau, \frac{\varepsilon_i}{A_i} \times A_\tau\right] \qquad (1.59)$$
where $A_\tau$ is the value of a 'characteristic area' variate that represents the spread of the essential signal at (dimensionless) characteristic time $\tau = t/T_0$.
G may be quite complicated, including dimensionless ‘structural’ variates for each
individual geographic node i. The idea is that the characteristic ‘area’ Aτ —the level
of command that recognizes the importance of essential incoming information—
grows according to a stochastic process, even though G may be a deterministic
mixmaster driven by systematic local probability-of-contact or other information
flow patterns.
An example: a characteristic area cannot grow indefinitely, and we invoke a 'carrying capacity' for command level on the network under study, say $K > 0$. An appropriate SDE is then
$$dA_\tau = [\mu\rho A_\tau(1 - A_\tau/K)]d\tau + \sigma A_\tau\, dW_\tau \qquad (1.60)$$
The nonequilibrium steady state expectations are then
$$E(A) \to 0, \quad \mu\rho < \sigma^2/2$$
$$E(A) \geq K\left(1 - \frac{\sigma^2}{2\mu\rho}\right), \quad \mu\rho \geq \sigma^2/2 \qquad (1.61)$$
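Euler-Maruyama paths of Eq. (1.60) display the threshold of Eq. (1.61) directly. In the sketch below, the competence index $\rho$ and noise strength $\sigma$ are assumed values chosen to place the system on either side of $\mu\rho = \sigma^2/2$.

```python
import numpy as np

rng = np.random.default_rng(2)
K_cap, mu, dt, n = 1.0, 1.0, 1e-3, 100_000   # assumed values; t runs to 100

def final_area(rho, sigma):
    A = 0.1
    for _ in range(n):
        dW = np.sqrt(dt) * rng.standard_normal()
        A += mu * rho * A * (1.0 - A / K_cap) * dt + sigma * A * dW
        A = max(A, 1e-300)   # numerical floor; Eq. (1.60) keeps A > 0
    return A

print("mu*rho > sigma^2/2:", final_area(rho=1.0, sigma=0.5))  # persists
print("mu*rho < sigma^2/2:", final_area(rho=0.1, sigma=0.8))  # collapses
```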
Figure 1.12 shows the form of this relation. To the left of a critical value of the
competence index ρ, given the usual stochastic variabilities and excursions, there is
a high probability that critical information will not propagate to higher command
from the tactical level.
The effect of more general noise forms—colored, Lévy, etc.—can be explored using the Doléans-Dade exponential (DDE) (Protter 1990). We suppose Eq. (1.60) can be rewritten in the form
$$dA_\tau = A_\tau\, dY_\tau \qquad (1.62)$$
for an appropriate stochastic process $Y_\tau$. For continuous $Y_\tau$ the DDE solution is
$$A_\tau = A_0 \exp\left(Y_\tau - \frac{1}{2}[Y_\tau, Y_\tau]\right) \qquad (1.63)$$
where $[Y_\tau, Y_\tau]$ is the quadratic variation. If
$$\frac{1}{2}\, d[Y_\tau, Y_\tau]/dt > dY_\tau/dt \qquad (1.64)$$
then the pathological ground state is stable and information will not flow across the network system. A version of the formalism does indeed extend to Lévy noise, which has a long tail and relatively large jumps, in comparison with the usual Brownian noise (Protter 1990).
In one dimension, for sufficiently powerful noise, similar results arise directly via
the stochastic stabilization theorems explored by Mao (2007), Appleby et al. (2008).
Matters are more complicated in two or more dimensions, where the noise structure
can determine more complicated dynamic effects.
Similar themes have been explored using Kuramoto’s (1984) model of frequency
synchronization across a coupled network (e.g., Acebron et al. 2005; Kalloniatis
and Roberts 2017). A central result of Kalloniatis and Roberts is the difference
between random and scale-free networks in their response to Lévy noise. Their Fig. 6
is essentially our Fig. 1.12, but rephrased in terms of the Kuramoto order parameter
representing the degree of synchronization across the network as a function of the
strength of the noise, so the figures are mirror images.
Human minds, small working teams, larger institutions, and the machines—cognitive
and otherwise—that become synergistic with them, are all cultural artifacts. Indeed,
culture, as the evolutionary anthropologist Robert Boyd has commented, ‘is as much
a part of human biology as the enamel on our teeth’.
The failure of critical real-time cognition at tactical, operational, and strategic
scales—and the correction of failure—can and must be reexamined from the per-
spective of the dynamics imposed by embedding culture. US operational and strategic
misfeasance, malfeasance and nonfeasance in Korea, Vietnam, and Afghanistan have
deep cultural roots, as the military writings of Mao Tse-Tung and other East Asian
practitioners suggest.
Artificial Intelligence systems are, then, also cultural artifacts, and the dynamics
of critical systems under their influence must inevitably reflect something of the
embedding culture, if only through the dynamics of the rapidly-shifting roadway
topologies they must ride on, adapt to, or attempt to control. A simple, if somewhat
static, example can be seen in the differential successes of Japanese and American
automobile manufacturers.
Extension of the preceding theoretical development is surprisingly direct, at least
in a purely formal sense. The devil, as always, will be in the details.
The symmetry-breaking model of Sect. 1.6 can be extended to include the effects
of an embedding cultural environment—via an information source Z —on global
broadcast mechanisms at the different scales and levels of organization that link
across the tactical, operational, and strategic scales of organization. A single dual information source $X_{G_i}$ then becomes a large-scale joint information source whose individual components are linked by crosstalk, having a joint source uncertainty $H(X^i_{G_1}, X^j_{G_2}, \dots, X^q_{G_m})$.
Given the embedding cultural information source $Z$, the splitting criterion between high and low probability dynamic system trajectories is given by network information theory as the complicated sum
$$I(X^i_{G_1}, X^j_{G_2}, \dots, X^q_{G_m}|Z) = H(Z) + \sum_n H(X_{G_n}|Z) - H(X^i_{G_1}, X^j_{G_2}, \dots, X^q_{G_m}|Z) \qquad (1.65)$$
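The entropy terms entering Eq. (1.65) can be computed explicitly for a toy system. The sketch below assumes two binary 'global broadcast' sources and a binary cultural source $Z$ that biases both toward agreement with it, and evaluates the splitting criterion as reconstructed above; the joint distribution is entirely an assumption for illustration.

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a probability array."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Assumed joint distribution p[z, x1, x2]: Z biases both sources toward itself.
p = np.zeros((2, 2, 2))
for z in (0, 1):
    for x1 in (0, 1):
        for x2 in (0, 1):
            p[z, x1, x2] = 0.5 * (0.8 if x1 == z else 0.2) \
                               * (0.8 if x2 == z else 0.2)

H_Z = H(p.sum(axis=(1, 2)))
H_X1_given_Z = H(p.sum(axis=2).ravel()) - H_Z   # H(X1,Z) - H(Z)
H_X2_given_Z = H(p.sum(axis=1).ravel()) - H_Z
H_all_given_Z = H(p.ravel()) - H_Z              # H(X1,X2,Z) - H(Z)

# Splitting criterion of Eq. (1.65) for m = 2 components.
I_split = H_Z + H_X1_given_Z + H_X2_given_Z - H_all_given_Z
print("I(X1, X2 | Z) =", I_split)   # 1.0 bit for this construction
```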
Equations (1.33) and (1.34) are then rewritten in terms of the splitting criterion $I(X^i_{G_1}, X^j_{G_2}, \dots, X^q_{G_m}|Z)$.
We will call the new 'free energy' index, now influenced by embedding culture, $\mathcal{F}$.
We have, in essence, extended to complex man/work group/institution/machine
composites a kind of ‘Mathematical Kleinman Theory’, representing in a formal way
something of the observations of Kleinman (1991), Kleinman and Cohen (1997),
and their colleagues who studied the profound differences in the expression and
experience of mental disorders across cultures.
It is possible to reexamine sufficient conditions for the intractable stability of a
pathological ‘ground state’ condensation representing control system collapse via
the Stochastic Stabilization Theorem (Appleby et al. 2008; Mao 2007), but now in a
particular embedding cultural milieu. Recall that, for military systems, that ground
state is usually something like ‘kill them all and let God sort them out’, or other
forms of ‘target discrimination failure’.
We assume a multidimensional vector of parameters associated with that phase,
J , that measures deviations from the pathological ground state. The free energy
measure from the generalization in Eq. (1.34) allows definition of another ‘entropy’
in terms of the Legendre transform
$$\hat{S} \equiv \mathcal{F}(J) - J \cdot \nabla_J \mathcal{F} \qquad (1.66)$$
What should be evident is that culture will become inherently convoluted not only with patterns of cognitive system failure at different scales and levels of organization, but with successful modalities of treatment. Treatment of cognitive failure—for individual minds, small groups, institutions, real-time AI critical systems, and so on—will, in the sense of Kleinman, itself always be 'culture-bound'.
For nonergodic systems addressed in the next chapter, where time averages are
not the same as ensemble averages, the groupoid symmetries become ‘trivial’, asso-
ciated with the individual high probability paths for which an H -value may be
defined, although it cannot be represented in the form of the usual Shannon ‘entropy’
(Khinchin 1957, p. 72). Then equivalence classes must be defined in terms of other
similarity measures for different developmental pathways. The arguments of this
section regarding pathological modes and their treatment then follow through.
Matters are far more complicated than we have examined so far. That is, while
this work has studied particular mechanisms and their dynamics at various scales
and levels of organization, in real systems the individual ‘factoids’ will influence
each other, consequently acting collectively and emergently, becoming, in the usual
sense, greater than the sum of their parts. This implies the existence of a ‘free energy’
splitting criterion that must be a specific and appropriate generalization of Eq. (1.65).
The argument is, yet again, surprisingly direct.
1. Cultural, cognitive, and communication processes can all be characterized by
information sources subject to phase transitions analogous to those of physical sys-
tems, if only by the identification of information as a form of free energy. The
Mathematical Appendix provides several examples of ‘biological’ renormalizations.
2. Behavioral ‘traffic flow’ for real-time critical systems, in a very large sense,
is itself subject to phase transitions, via directed homotopy groupoids, building into
shifting aggregations of these simpler transitive groupoids. That is, the system ‘traffic’
involves one-way paths from an ‘origin’ state to a ‘destination’ state. Equivalence
classes of such paths form the transitive groupoids that combine into the larger
groupoid of interest, subject to ‘symmetry’ making/breaking associated with system
and time-specific extensions. Wallace (2018), in the context of driverless cars on intelligent roads, so-called V2V/V2I systems, puts it thus:
Traffic flow can be rephrased in terms of ‘directed homotopy’ – dihomotopy – groupoids
on an underlying road network, again parameterized by [a particular] ‘temperature’ index
T . Classical homotopy characterizes topological structures in terms of the number of ways
a loop within the object can be continuously reduced to a base point... For a sphere, all
loops can be reduced [to a single point]. For a toroid – a donut shape – there is a hole so
that two classes of loops cannot be reduced to a point. One then composes loops to create
the ‘fundamental group’ of the topological object. The construction is standard. Vehicles
on a road network, however, are generally traveling from some initial point So to a final
destination S1, and directed paths, not loops are the ‘natural’ objects, at least over a short
time period, as in commuting.
Given some ‘hole’ in the road network, there will usually be more than one way to reach S1
from So. An equivalence class of directed paths is defined by paths that can be deformed
into one another without crossing barrier zones [such as holes]... At high values of [some
appropriate index] T , many different sets of paths will be possible allowing unobstructed
travel from one given point to another, defining equivalence classes creating a large groupoid.
As [the critical index] T declines, roadways and junctions become increasingly jammed,
eliminating entire equivalence classes of open pathways, and lowering the groupoid sym-
metry: phase transitions via classic symmetry breaking on a network. The ‘order parameter’
that disappears at high T is then simply the number of jammed roadways.
These results extend to higher dihomotopy groupoids via introduction of cylindrical paths
rather than one-dimensional lines...
Most fundamentally... the traffic flow groupoid and the groupoid associated with cogni-
tion across the V2V/V2I system will inevitably be intimately intertwined, synergistically
compounding symmetry breaking traffic jams as-we-know-them with symmetry breaking
cognitive collapse of the control system automata, creating conditions of monumental chaos.
3. Sufficiently rapid challenge can always ‘get inside the command loop’ of a
real-time critical system in the sense of John Boyd, and/or can trigger network frag-
mentation by the Zurek mechanism(s) of Sect. 1.10.
These considerations lead to a particular inference:
4. The dynamics of critical real-time systems will almost always involve the syner-
gism of several of the mechanisms studied above, leading to highly counterintuitive,
unexpected, and often deliberately triggered, groupoid symmetry breaking phase
transitions that can, and most certainly will, seriously compromise the health and
welfare of large populations.
The devil, of course, will be in the particular details of each system studied.
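The symmetry-lowering of point 2 can be made concrete with a toy computation. The sketch below, a minimal illustration and not from the text, uses the count of edge-disjoint origin-to-destination routes on a grid 'road network' as a crude stand-in for the number of dihomotopy equivalence classes: removing 'jammed' links collapses that count, a network analog of groupoid symmetry breaking. The networkx library, the grid layout, and the jammed-link pattern are all assumptions for illustration.

    import networkx as nx

    # Toy road network: a 6x6 grid; nodes are (row, column) intersections.
    G = nx.grid_2d_graph(6, 6)
    origin, destination = (0, 0), (5, 5)

    # Edge-disjoint route count before jamming: by Menger's theorem this is
    # a crude proxy for the number of independent path classes.
    print(nx.edge_connectivity(G, origin, destination))

    # 'Jam' most of the links crossing the middle of the grid.
    jammed = [((2, j), (3, j)) for j in range(5)]
    G.remove_edges_from(jammed)

    # The count of open, mutually independent routes has collapsed.
    print(nx.edge_connectivity(G, origin, destination))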
1.15 Discussion
The first two models examined the effect of a declining ‘Clausewitz temperature’ T
constructed from indices of the fog-of-war, friction, and related constraints on the
stability of an inherently unstable control system, using adaptations of the Data Rate
Theorem.
A third approximation modeled the dynamics of control breakdown for a simple
class of inherently unstable systems in terms of the dynamics of the Clausewitz
temperature itself and a parallel ‘resolve/capability’ index. The outcome, after some
algebra, also implies the inevitability of highly punctuated collapse under sufficient
stress, and it appears possible, using relatively direct methods, to calculate explicit
system limits that may be empirically tested.
These models were followed by an extension of the DRT to more highly ‘cog-
nitive’ systems via the recognition that cognition represents choice, choice reduces
uncertainty, and the reduction in uncertainty implies the existence of an information
source ‘dual’ to the cognitive process under study. The dynamics of the most complex
This central problem remains. Although human failures of perception are manifold (e.g., Kahneman 2011), they have been pared down by an evolutionary selection process that has yet to act full-scale on human systems controlled by automata.
More recently, Watts (2011), consonant with Neuneck's view, has commented
on the assertion that near-perfect information availability combined with precision
targeting in combat situations will allow large-scale man/machine ‘cockpit’ systems
to profoundly alter the conduct of war. He states that
These assumptions obviously fly in the face of the view that the fundamental nature of war
is essentially an interactive clash - a Zweikampf or two-sided ‘duel,’ as Carl von Clause-
witz characterized it - between independent, hostile, sentient wills dominated by friction,
uncertainty, disorder, and highly nonlinear interactions. Can sensory and network technolo-
gies eliminate the frictions, uncertainties, disorder, and nonlinearities of interactive clashes
between opposing polities? As of this writing, the answer appears to be ‘No.’
Schrage (2003) again makes Boyd’s case about ‘getting inside’ an opponent’s
decision loop:
...[C]omparative information advantage accrues as the rate of information acquisition and
analysis changes over time. The ability to detect and respond to battlespace changes faster,
better, cheaper and more pervasively than the opposing force inherently places a premium
on better ‘improvisation’ than better planning. Indeed, the critical combat competency for
commanders shifts from rigorous planning – that stochastically evaporates on contact with
the enemy – to improvisational responsiveness...
The ‘roadway’ example used at the beginning... is akin to the well-known midcourse guidance
problem of stochastic optimal control that has yet to be solved. In the midcourse guidance
example, many authors have tried to deal with the problem that as the space voyage pro-
gresses, information gained about the state vector is at the cost of increased control energy
[e.g., Mortensen’s seminal studies 1966a, b].
For the record, spacecraft trajectories are relatively simple geodesics in a gravita-
tional manifold characterized by Newtonian mechanics. In the context of air traffic
control, Hu et al. (2001) show that finding collision-free maneuvers for multiple agents on a two-dimensional Euclidean plane R^2 is the same as finding the shortest geodesic in a particular manifold with nonsmooth boundary. Given n vehicles, the geodesic is calculated for the quotient space R^{2n}/W(r), where W(r) is defined by the requirement that no vehicles are closer together than some critical Euclidean distance r.
For autonomous ground vehicles, R^2 must be replaced by a far more topologically complex 'roadmap space' M^2 subject to traffic jams and similar conditions. Geodesics for n such vehicles are then in a quotient space M^{2n}/W(r) whose dynamics
are subject to phase transitions driven by changes in vehicle and/or passenger density
that represent cognitive groupoid symmetry breaking (Wallace 2017, Sect. 9.6).
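As a concrete illustration of the constraint set W(r), the following minimal sketch (not from the text; the numbers and helper name are assumptions) tests whether a joint configuration of n vehicles in the plane respects the critical separation r; configurations failing the test are excluded from the quotient space.

    import numpy as np

    def separated(positions: np.ndarray, r: float) -> bool:
        """positions: (n, 2) array of vehicle coordinates. True if every
        pairwise Euclidean distance exceeds the critical separation r."""
        n = len(positions)
        for i in range(n):
            for j in range(i + 1, n):
                if np.linalg.norm(positions[i] - positions[j]) <= r:
                    return False
        return True

    print(separated(np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 0.0]]), r=2.0))  # True
    print(separated(np.array([[0.0, 0.0], [1.0, 1.0]]), r=2.0))               # False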
Fifty years after Mortensen’s work, on the eve of the ‘AI revolution’ that will
place inherently unstable critical infrastructure fully under machine control, essential
questions have yet to be solved, or—in large part—even properly stated.
In consequence, and in summary, promises of ‘graceful degradation under stress’
for driverless vehicles on intelligent roads, of ‘precision targeting’ for autonomous
or ‘centaur’ weapons that avoids civilian casualties, of ‘precision medicine’ under
collapsing living and working conditions, of agent-based models that manage finan-
cial crises in real time, and so on, at best represent species of wishful thinking across
many different modalities, scales, and levels of organization.
It is difficult (but not impossible with the help of self-deception, groupthink, or
outright prostitution and a good PR firm) to escape the inference that the forthcoming
beta testing of large-scale AI systems on unsuspecting human populations violates
fundamental norms.
The USA’s tactical and operational level ‘Revolution in Military Affairs’ of the
1990’s, the networked information system designed to lift the fog-of-war from armed
conflict, died a hard death in the protracted insurgencies that evolved against it in
Iraq and Afghanistan, abetted by a strategic incompetence that is a major recurring
manifestation of friction (e.g., Bowden 2017). (For a more trenchant analysis, see
Stephenson 2010). Now the AI revolution is about to meet Carl von Clausewitz.
New York City’s early use of algorithms to supervise critical service deployment,
the resulting catastrophic loss of housing and community, and a consequent massive
rise in premature mortality, provides a cogent case history (Wallace 1993; Wallace
and Wallace 1998).
The real world is not a game of Go. In the real world, nothing is actually too big
to fail, and in the real world, the evolution of one’s competitors is an ever-present
selection pressure: Caveat Emptor, Caveat Venditor.
References
Abler, R., J. Adams, and P. Gould. 1971. Spatial organization: The geographer’s view of the world.
New York: Prentice Hall.
Acebron, J., L. Bonilla, C. Perez Vicente, F. Ritort, and R. Spigler. 2005. The Kuramoto model: A
simple paradigm for synchronization phenomena. Reviews of Modern Physics 77: 137–185.
Altmann, J., and F. Sauer. 2017. Autonomous weapon systems and strategic stability. Survival 59:
117–142.
Appleby, J., X. Mao, and A. Rodkina. 2008. Stabilization and destabilization of nonlinear differential
equations by noise. IEEE Transactions on Automatic Control 53: 126–132.
Atlan, H., and I. Cohen. 1998. Immune information, self-organization and meaning. International
Immunology 10: 711–717.
Baumard, P. 2016. Deterrence and escalation in an artificial intelligence dominant paradigm: Determinants and outputs. In MIT International Conference on Military Cyber Stability. Boston, MA: MIT CSAIL Computer Science and Artificial Intelligence Laboratory.
Binney, J., N. Dowrick, A. Fisher, and M. Newman. 1986. The theory of critical phenomena. Oxford,
UK: Clarendon Press.
Bookstaber, R. 2017. The end of theory: Financial crises, the failure of economics, and the sweep
of human interactions. Princeton NJ: Princeton University Press.
Bowden, M. 2017. Hue 1968: A turning point of the American war in Vietnam. New York: Atlantic
Monthly Press.
Boyd, J. 1976. Destruction and creation. Available online from various sources.
Conte, R., and M. Paolucci. 2014. On agent-based modeling and computational social science.
Frontiers in Psychology 5: 668.
Cornuejols, G., and R. Tutuncu. 2006. Optimization methods in finance. New York: Cambridge
University Press.
Cover, T., and J. Thomas. 2006. Elements of information theory, 2nd ed. New York: Wiley.
de Groot, S., and P. Mazur. 1984. Non-equilibrium thermodynamics. New York: Dover.
Feynman, R. 2000. Lectures in computation. Boulder CO: Westview Press.
Glazebrook, J.F., and R. Wallace. 2009. Rate distortion manifolds as model spaces for cognitive
information. Informatica 33: 309–346.
Golubitsky, M., and I. Stewart. 2006. Nonlinear dynamics and networks: The groupoid formalism.
Bulletin of the American Mathematical Society 43: 305–364.
Gould, P., and R. Wallace. 1994. Spatial structures and scientific paradoxes in the AIDS pandemic.
Geografiska Annaler 76B: 105–116.
Hu, J., M. Prandini, K. Johansson, and S. Sastry. 2001. Hybrid geodesics as optimal solutions to the collision-free motion planning problem. In HSCC 2001, LNCS 2034, ed. M. Di Benedetto and A. Sangiovanni-Vincentelli, 305–318.
Huberman, B., and T. Hogg. 1987. Phase transitions in artificial intelligence systems. Artificial
Intelligence 33: 155–171.
Hwang, C., and A. Masud. 1979. Multiple objective decision making, methods and applications.
New York: Springer.
Ingber, L., and D. Sworder. 1991. Statistical mechanics of combat with human factors. Mathematical
Computational Modeling 15: 99–127.
Ingber, L., H. Fujio, and M. Wehner. 1991. Mathematical comparison of combat computer models
to exercise data. Mathematical Computational Modeling 15: 65–90.
Johnson, N., G. Zhao, E. Hunsader, H. Qi, N. Johnson, J. Meng, et al. 2013. Abrupt rise of new
machine ecology beyond human response time. Scientific Reports 3: 2627.
Kahneman, D. 2011. Thinking fast and slow. New York: Farrar, Straus and Giroux.
Kalloniatis, A., and D. Roberts. 2017. Synchronization of networked Kuramoto oscillators under
stable Levy noise. Physica A 466: 476–491.
Kania, E. 2017. Battlefield singularity: Artificial intelligence, military revolution, and China’s future
military power. Retrieved from https://www.cnas.org/.
Khinchin, A. 1957. Mathematical foundations of information theory. New York: Dover Publications.
Kleinman, A. 1991. Rethinking psychiatry: From cultural category to personal experience. New
York: Free Press.
Kleinman, A., and A. Cohen. 1997. Psychiatry’s global challenge. Scientific American 276 (3):
86–89.
Kuramoto, Y. 1984. Chemical oscillations, waves, and turbulence. Berlin: Springer.
Lobel, I., A. Ozdaglar, and D. Feijer. 2011. Distributed multi-agent optimization with state-
dependent communication. Mathematical Programming B 129: 255–284.
Mao, X. 2007. Stochastic differential equations and applications, 2nd ed. Philadelphia: Woodhead
Publishing.
McQuie, R. 1987. Battle outcomes: Casualty rates as a measure of defeat, ARMY, November, 30-34.
Mortensen, R. 1966a. A priori open loop optimal control of continuous time stochastic systems.
International Journal of Control 3: 113–127.
Mortensen, R. 1966b. Stochastic optimal control with noisy observations. International Journal of
Control 4: 455–464.
Nair, G., F. Fagnani, S. Zampieri, and R. Evans. 2007. Feedback control under data rate constraints:
An overview. Proceedings of the IEEE 95: 108–137.
Neuneck, G. 2008. The revolution in military affairs: Its driving forces, elements, and complexity.
Complexity 14: 50–60.
Nisbett, R., and Y. Miyamoto. 2005. The influence of culture: Holistic versus analytic perception.
TRENDS in Cognitive Sciences 10: 467–473.
Ormrod, D., and B. Turnbull. 2017. Attrition rates and maneuver in agent-based simulation models.
Journal of Defense Modelling and Simulation: Applications, Methodology, Technology. https://
doi.org/10.1177/1548512917692693.
Parker, E. 2016a. Flash crashes, information processing limits, and phase transitions. http://ssrn.
com/author=2119861.
Parker, E. 2016b. Flash crashes: The role of information processing based subordination and the
Cauchy distribution in market instability. Journal of Insurance and Financial Management 2:
90–103.
Pettini, M. 2007. Geometry and topology in hamiltonian dynamics. New York: Springer.
Protter, P. 1990. Stochastic integration and differential equations. New York: Springer.
Rose, P. 2001. Two strategic intelligence mistakes in Korea, 1950. https://www.cia.gov/library/
center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/fall_winter_2001/
article06.html
Schrage, M. 2003. Perfect information and perverse incentives: Costs and consequences of trans-
formation and transparency. SPP Working Paper WP 03-1, MIT Center for International Studies.
Schrodinger, E. 1989. Statistical thermodynamics. New York: Dover Publications.
Shannon, C. 1959. Coding theorems for a discrete source with a fidelity criterion. Institute of Radio
Engineers International Convention Record 7: 142–163.
Stephenson, S. 2010. The revolution in military affairs: 12 observations on an out-of-fashion idea.
In Military review, 38–46, May–June.
Tishby, N., F. Pereira, and W. Bialek. 1999. The information bottleneck method. In 37th annual
conference on communication, control and computing, 368–377.
Tse-Tung, Mao. 1963. Selected military writings of Mao Tse-Tung. Peking, PRC: Foreign Languages Press.
Turchin, A., and D. Denkenberger. 2018. Military AI as a convergent goal of self-improving AI. In
AI Safety and Security, ed. R. Yampolskiy. CRC Press.
Wallace, R. 1993. Recurrent collapse of the fire service in New York City: The failure of paramilitary
systems as a phase change. Environment and Planning A 25: 233–244.
Wallace, R. 2005. Consciousness: A mathematical treatment of the global neuronal workspace
model. New York: Springer.
Wallace, R. 2012. Consciousness, crosstalk, and the mereological fallacy: An evolutionary perspec-
tive. Physics of Life Reviews 9: 426–453.
Wallace, R. 2015. An ecosystem approach to economic stabilization: Escaping the neoliberal wilder-
ness. London: Routledge.
Wallace, R. 2016. High metabolic demand in neural tissues: Information and control theory per-
spectives on the synergism between rate and stability. Journal of Theoretical Biology 409: 86–96.
Wallace, R. 2017. Information theory models of instabilities in critical systems. Singapore: World
Scientific.
Wallace, R. 2018. Canonical Instabilities of autonomous vehicle systems: The unsettling reality
behind the dreams of greed. New York: Springer.
Wallace, D., and R. Wallace. 1998. A plague on your houses. New York: Verso.
Wallace, R., and D. Wallace. 2016. Gene expression and its discontents: The social production of
chronic disease, 2nd ed. New York: Springer.
Wallace, R., D. Wallace, and H. Andrews. 1997. AIDS, tuberculosis, violent crime and low birth-
weight in eight US metropolitan areas: Public policy, stochastic resonance, and the regional
diffusion of inner city markers. Environment and Planning A 29: 525–555.
Watts, B. 2004. Clausewitzian friction and future war, revised edition, McNair Paper 68. Washing-
ton, DC: Institute for National Strategic Studies, National Defense University.
Watts, B. 2008. US Combat training, operational art, and strategic competence: Problems and
opportunities. Washington, D.C.: Center for Strategic and Budgetary Assessments.
Watts, B. 2011. The maturing revolution in military affairs. Washington DC: Center for Strategic and Budgetary Assessments.
Weinstein, A. 1996. Groupoids: Unifying internal and external symmetry. Notices of the American Mathematical Society 43: 744–752.
Wilson, K. 1971. Renormalization group and critical phenomena. I renormalization group and the Kadanoff scaling picture. Physical Review B 4: 3174–3183.
Wolpert, D., and W. Macready. 1995. No free lunch theorems for search. SFI-TR-95-02-010, Santa Fe Institute.
Wolpert, D., and W. Macready. 1997. No free lunch theorems and optimization. IEEE Transactions
on Evolutionary Computation 1: 67–82.
Zook, M., and M. Grote. 2017. The microgeographies of global finance: High-frequency trading
and the construction of information inequality. Environment and Planning A 49: 121–140.
Zurek, W. 1985. Cosmological experiments in superfluid helium? Nature 317: 505–508.
Zurek, W. 1996. The shards of broken symmetry. Nature 382: 296–298.
Chapter 2
Extending the Model
2.1 Introduction
Cognitive systems can, at least in first order, be described in terms of the ‘grammar’
and ‘syntax’ of appropriate information sources. This is because cognition implies
choice, choice reduces uncertainty, and the reduction of uncertainty implies the exis-
tence of an information source (Atlan and Cohen 1998; Wallace 2012, 2015a, b,
2016a, b, c, 2017). Conventional ‘parametric’ theory focuses, however, on adiabati-
cally piecewise stationary ergodic (APSE) sources, i.e., those that are parameterized
in time but remain as close as necessary to ergodic and stationary for the theory to
work. ‘Stationary’ implies that probabilities are not time dependent, and ‘ergodic’
roughly means that time averages are well represented as ensemble averages. Tran-
sitions between ‘pieces’ can then be described using an adaptation of standard renor-
malization methods, as described in the Mathematical Appendix.
The Wallace references provide details of the ‘adiabatic’ approximation, much like
the Born-Oppenheimer approach to molecular dynamics where nuclear oscillations
are taken as very slow in comparison with electron dynamics that equilibrate about
the nuclear motions. Here, we extend the theory to nonergodic cognitive systems, for which time averages need not coincide with ensemble averages.
As Hoyrup (2013) notes, however, while every non-ergodic measure has a unique
decomposition into ergodic ones, this decomposition is not always computable. Such
expansions—in terms of the usual ergodic decomposition or the groupoid/directed
homotopy equivalents—both explain everything and explain nothing, in the same
sense that, over some limited domain, almost any real function can be written as
a Fourier series or integral that retains the essential character of the function itself.
Sometimes this helps if there are basic underlying periodicities leading to a meaning-
ful spectrum, otherwise not. A good analogy is the contrast between the Ptolemaic
expansion of planetary orbits in circular components around a fixed Earth versus
the Newtonian/Keplerian gravitational model in terms of ellipses with the Sun at one
focus. While the Ptolemaic expansion converges to any required accuracy, it conceals
the essential dynamics.
Here, we show that the very general approach adapted from nonequilibrium ther-
modynamics and used above can apply to both nonergodic systems and their ergodic
components, if such exist. Again, this is in terms of inherent groupoid symmetries
associated with equivalence classes of directed homotopy developmental pathways.
To reiterate, the attack is based on the counterintuitive recognition of information
as a form of free energy (Feynman 2000), rather than an ‘entropy’ in the physical
sense. A central constraint is that, in the extreme case which will be the starting point,
only individual developmental paths can be associated with an information-theoretic
source function that cannot be represented in terms of a Shannon entropy-like uncer-
tainty value across a probability distribution.
Equivalence classes then must arise via a metric distance measure for which the
developmental trajectories of one kind of ‘game’ are closer together than for a sig-
nificantly different ‘game’. Averaging occurs according to such equivalence classes,
and is marked by groupoid symmetries, and by characteristic dynamics of symmetry
breaking according to appropriate ‘temperature’ changes indexing the influence of
embedding regulatory mechanisms. We will, however, recover the standard decom-
position by noting that larger equivalence classes across which uncertainty measures
are constant can be collapsed to single paths on an appropriate quotient manifold.
Recall that, for a stationary, ergodic information source X, as Khinchin (1957)
indicates, it is possible to divide statements of length n—written as x^n = {X(0) = x_0, X(1) = x_1, ..., X(n) = x_n}—into two sets. The first, and largest, is not consonant
with the ‘grammar’ and ‘syntax’ of the information source, and consequently has
vanishingly small probability in the limit of large n. The second, much smaller
set that is consonant and characterized as ‘meaningful’, has the following essential
properties.
If N(n) is the number of meaningful statements of length n, then limits exist satisfying the conditions

$$H[\mathbf{X}] = \lim_{n \to \infty} \frac{\log[N(n)]}{n} = \lim_{n \to \infty} H(X_n|X_0, \ldots, X_{n-1}) = \lim_{n \to \infty} \frac{H(X_0, \ldots, X_n)}{n} \quad (2.1)$$
H(X_n|X_0, ..., X_{n-1}) and H(X_0, ..., X_n) are conditional and joint Shannon uncertainties having the familiar pseudo-entropy form

$$H = -\sum_i P_i \log[P_i], \qquad 0 \le P_i \le 1, \quad \sum_i P_i = 1 \quad (2.2)$$
in the appropriate joint and conditional probabilities (Cover and Thomas 2006). This
limit is called the source uncertainty.
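A minimal numerical sketch of Eq. (2.1), not from the text and with all parameters illustrative: for a simple two-state Markov source the per-symbol block entropies H(X_0, ..., X_n)/n decline toward the source uncertainty, which for this source is known in closed form.

    import numpy as np
    from collections import Counter

    rng = np.random.default_rng(0)
    p = 0.1                                  # assumed state-switching probability
    seq, s = [], 0
    for _ in range(200_000):                 # sample path of the source
        seq.append(s)
        s = s if rng.random() > p else 1 - s

    def block_entropy(seq, n):
        """Shannon entropy (bits) of the empirical n-block distribution."""
        counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
        probs = np.array(list(counts.values()), dtype=float)
        probs /= probs.sum()
        return -(probs * np.log2(probs)).sum()

    for n in (1, 2, 4, 8):
        print(n, block_entropy(seq, n) / n)  # decreasing toward the limit
    print("exact:", -(p * np.log2(p) + (1 - p) * np.log2(1 - p)))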
Nonergodic information sources cannot be directly represented in terms of Shannon uncertainties resembling entropies. For such sources, however, a function, H(x^n), of each path x^n → x, may still be defined, such that lim_{n→∞} H(x^n) = H(x) holds (Khinchin 1957, p. 72). However, H will not, in general, be given by the simple cross-sectional law-of-large-numbers analog having the (deceptive) entropy-like form of Eq. (2.2).
Two baseball games will usually be played in recognizably similar ways, but a baseball and
a football game are played quite differently. This permits identification of directed
homotopy equivalence classes of paths associated with different ‘fundamental tasks’
carried out by the cognitive system under study. Again, equivalence classes of paths
define groupoids, and groupoids represent an extension of the idea of a symmetry
group (Weinstein 1996). For example, the simplest groupoid might be seen as a
disjoint union of groups, for which there is no single universal product.
See the Mathematical Appendix for formal characterization of the metric M , a
somewhat nontrivial matter that conceals much of the underlying machinery.
Suppose the data rate of the incoming control information—again, this is via
another information source—is a real number U . H (x) is the path dependent infor-
mation source uncertainty associated with the consonant cognitive path x, and we
can construct a Morse Function (Pettini 2007) using a pseudoprobability
$$P(x) \equiv \frac{\exp[-H(x)/\kappa U]}{\sum_{\hat{x}} \exp[-H(\hat{x})/\kappa U]} \quad (2.3)$$
where the sum is over all possible consonant paths x̂ originating from some base
point. κ is a measure of the effectiveness of the control signal and might parameterize
processes of aging or environmental insult.
A Morse Function F, analogous to free energy in a physical system, is then defined as

$$\exp[-F/\kappa U] \equiv \sum_{\hat{x}} \exp[-H(\hat{x})/\kappa U] \quad (2.4)$$

where, again, the sum is over all possible consonant paths originating from some fixed initial system state.
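A minimal sketch of Eqs. (2.3) and (2.4), not from the text: the path uncertainties H(x) and the value of κU below are hypothetical inputs; the pseudoprobabilities and the 'free energy' Morse Function F then follow directly.

    import numpy as np

    H = np.array([1.0, 1.5, 2.3, 4.0])    # H(x) for four consonant paths (assumed)
    kappa_U = 0.8                         # control effectiveness times data rate (assumed)

    weights = np.exp(-H / kappa_U)
    P = weights / weights.sum()           # Eq. (2.3): pseudoprobability of each path
    F = -kappa_U * np.log(weights.sum())  # Eq. (2.4): exp[-F/kU] = sum of weights

    print(P, F)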
The extension of the Data Rate Theorem emerges via a spontaneous symmetry
breaking driven by changes in κU . These changes affect the groupoid structure
underlying the ‘free energy’ Morse Function F associated with different dihomotopy
classes defined in terms of the metric M. Generally, higher values of κU will permit richer cognitive behaviors—higher values of H(x). The analogy is with
spontaneous group symmetry breaking in physical systems, first characterized by
Landau, that has since become a foundation of much of modern physics (Pettini
2007). We argue that extension of the perspective to cognition is via dihomotopy
groupoid rather than group symmetries. Previous work in this direction was restricted
to ergodic sources and their spectral constructs and averages. Here, we have attempted
to lift that restriction without invoking an ergodic decomposition that may not actually
be computable (Hoyrup 2013) and in a manner that permits a variant of the symmetry-
breaking arguments now central to modern physical theory.
It seems clear that the extended DRT for cognitive systems is not confined to
dichotomy between stable and unstable operation, but can encompass a broad range
of qualitative behavioral dynamics, some of which may be adaptive to selection
pressures, but many of which will not, and might be characterized as pathological, particularly as the embedding control information U, or its effectiveness as parameterized by κ, declines.
The formalism allows restatement of a result from Chap. 1, but in more general terms.
The regulation and control of a developmental trajectory is almost certainly a high
dimensional process, involving a number of interacting signals at different critical
branch points. We can model the dynamics of this, in first order, via an analog to
Onsager’s approach to nonequilibrium thermodynamics. The general approach is
well-studied (e.g., de Groot and Mazur 1984). The first step is to use the free energy
Morse Function F of Eq. (2.4) to construct an entropy scalar via the Legendre transform in the vector of essential driving parameters K as

$$S \equiv F(K) - K \cdot \nabla_K F \quad (2.5)$$
The Onsager approximation then makes a linear expansion for the rate of change of the vector K in the gradients of S by the components of K, which we write in the more general, and not necessarily linear, multidimensional form sketched below.
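A plausible multidimensional form for Eq. (2.6), an assumption patterned on the Onsager-type stochastic differential equations used elsewhere in this text, is

$$dK_t^j = L^j\!\left(t, \frac{\partial S}{\partial K^1}, \ldots, \frac{\partial S}{\partial K^n}\right)dt + \sigma^j(K_t, t)\,dW_t \quad (2.6)$$

where the functions L^j need not be linear and dW_t is white noise.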
Much of the basic argument can be redone using the Kolmogorov algorithmic com-
plexity K (X ) of a stochastic process X , since the expectation of K converges to
the Shannon uncertainty, i.e.,
$$\frac{1}{n} E[K(X^n|n)] \to H(X) \quad (2.7)$$
Cover and Thomas (2006) provide details.
However, Zvonkin and Levin (1970) argue that, if the ensemble is stationary but not
ergodic, the limit varies over the ensemble, as is the case with Shannon uncertainty.
This permits a redefinition of the entropy measure of Eq. (2.5) in terms of K and
may provide a different perspective on system dynamics. Indeed, there may well
be a considerable set of such complexity measures that converge in expectation to
Shannon uncertainty in a similar manner. These could perhaps be crafted to partic-
ular circumstances for cleaving specific Gordian knots, much as does reexpressing
electrodynamics according to the underlying symmetries of the system under study:
Maxwell’s equations for spherical systems are more easily solved in spherical coor-
dinates, and so on. However, this is not at all straightforward. For example, Teixeira
et al. (2011) demonstrate that analogs to the above expression apply exactly to Renyi
and Tsallis entropies of order α only in the limit α → 1, for which they are not
defined. However, for the Tsallis entropy they do show that, for every ε > 0 and 0 < ε̂ < 1, given a probability distribution P, T_{1+ε}(P) ≤ H(P) ≤ T_{1−ε̂}(P), where T represents the Tsallis and H the Shannon measure.
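A minimal numerical check of the Teixeira et al. (2011) sandwich bound, not from the text, with an illustrative distribution and natural-log units:

    import numpy as np

    def shannon(P):
        """Shannon entropy in nats."""
        return -(P * np.log(P)).sum()

    def tsallis(P, q):
        """Tsallis entropy of order q."""
        return (1.0 - (P ** q).sum()) / (q - 1.0)

    P = np.array([0.5, 0.25, 0.125, 0.125])  # assumed distribution
    eps = 0.05
    print(tsallis(P, 1 + eps), shannon(P), tsallis(P, 1 - eps))
    # The three values should appear in increasing order, bracketing H(P).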
2.6 Discussion
The models of this chapter recover dynamic regularities from Onsager theory without the 'reciprocity relations' associated with microreversibility, together with a spontaneous symmetry breaking, based on groupoid rather than group symmetries, that extends the Data Rate Theorem. The underlying one-way
topological perspective of directed homotopy for cognitive/information processes
holds through the loss of the ergodic property and the consequent disappearance of
any simple expression for information source uncertainty.
These results provide a different perspective on the mechanisms of punctuated
failure across a broad spectrum of cognitive phenomena, ranging from cellular, neu-
rological, and other physiological and psychosocial processes, to critical systems
automata, institutional economics, and sociocultural dynamics.
References
Appleby, J., X. Mao, and A. Rodkina. 2008. Stabilization and destabilization of nonlinear differential
equations by noise. IEEE Transactions on Automatic Control 53: 126–132.
Atlan, H., and I. Cohen. 1998. Immune information, self-organization and meaning. International
Immunology 10: 711–717.
Coudène, Y. 2016. Ergodic theory and dynamical systems. New York: Springer Universitext.
Cover, T., and J. Thomas. 2006. Elements of information theory, 2nd ed. New York: Wiley.
de Groot, S., and P. Mazur. 1984. Non-equilibrium thermodynamics. New York: Dover.
Durlauf, S. 1993. Nonergodic economic growth. Reviews of Economic Studies 60: 349–366.
Fajstrup, L., E. Goubault, A. Mourgues, S. Mimram, and M. Raussen. 2016. Directed algebraic
topology and concurrency. New York: Springer.
Feynman, R. 2000. Lectures in computation. Boulder CO: Westview Press.
Grandis, M. 2009. Directed algebraic topology: Models of non-reversible worlds. New York: Cam-
bridge University Press.
Gray, R., and L. Davisson. 1974. The ergodic decomposition of stationary discrete random processes. IEEE Transactions on Information Theory IT-20: 625–636.
Gray, R. 2011. Entropy and information theory, 2nd ed. New York: Springer.
Gray, R., and F. Saadat. 1984. Block source coding theory for asymptotically mean stationary measures. IEEE Transactions on Information Theory 30: 54–68.
Hahn, P. 1978. The regular representations of measure groupoids. Transactions of the American
Mathematical Society 242: 35–53.
Hoyrup, M. 2013. Computability of the ergodic decomposition. Annals of Pure and Applied Logic
164: 542–549.
Khinchin, A. 1957. Mathematical foundations of information theory. New York: Dover Publications.
Lee, J. 2000. Introduction to topological manifolds. New York: Springer.
Mackey, G.W. 1963. Ergodic theory, group theory, and differential geometry. Proceedings of the
National Academy of Sciences USA 50: 1184–1191.
Nair, G., F. Fagnani, S. Zampieri, and R. Evans. 2007. Feedback control under data rate constraints:
An overview. Proceedings of the IEEE 95: 108–137.
Pettini, M. 2007. Geometry and topology in Hamiltonian dynamics. New York: Springer.
Schonhuth, A. 2008. The ergodic decomposition of asymptotically mean stationary random sources.
arXiv: 0804.2487v1 [cs.IT].
Series, C. 1977. Ergodic actions of product groups. Pacific Journal of Mathematics 70: 519–534.
Teixeira, A., A. Matos, A. Souto, and L. Antunes. 2011. Entropy measures vs. Kolmogorov com-
plexity. Entropy 13: 595–611.
Van den Broeck, C., J. Parrondo, and R. Toral. 1994. Noise-induced nonequilibrium phase transition.
Physical Review Letters 73: 3395–3398.
Von Neumann, J. 1932. Zur Operatorenmethode in der klassischen Mechanik. Annals of Mathematics 33: 587–642.
Wallace, R. 2012. Consciousness, crosstalk, and the mereological fallacy: An evolutionary perspec-
tive. Physics of Life Reviews 9: 426–453.
Wallace, R. 2015a. An ecosystem approach to economic stabilization: Escaping the neoliberal
wilderness. London: Routledge.
Wallace, R. 2015b. An information approach to Mitochondrial dysfunction: Extending Swerdlow’s
hypothesis. Singapore: World Scientific.
Wallace, R. 2016a. High metabolic demand in neural tissues: Information and control theory per-
spectives on the synergism between rate and stability. Journal of Theoretical Biology 409: 86–96.
Wallace, R. 2016b. Subtle noise structures as control signals in high-order biocognition. Physics Letters A 380: 726–729.
Wallace, R. 2016c. Environmental induction of neurodevelopmental disorders. Bulletin of Mathe-
matical Biology 78: 2408–2426.
Wallace, R. 2017. Information theory models of instabilities in critical systems. Singapore: World
Scientific.
Wallace, R., and M. Fullilove. 2008. Collective consciousness and its discontents. New York:
Springer.
Weinstein, A. 1996. Groupoids: Unifying internal and external symmetry. Notices of the American
Mathematical Association 43: 744–752.
Zvonkin, A., and L. Levin. 1970. The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Mathematical Surveys 25: 83–124.
Chapter 3
An Example: Passenger Crowding
Instabilities of V2I Public Transit Systems
3.1 Introduction
number of standees inside a bus, multiplied by the total number of passengers boarding and
alighting at a bus stop... A formal treatment [shows]... that average waiting time is related not
only to the headway (the inverse of bus frequency) but also to the occupancy rate or crowding
level in an additive or multiplicative way... A second effect of high occupancy levels on
waiting times is the possibility of triggering bus bunching [by a number of mechanisms] ...
[T]he negative impacts of crowding on the reliability of public transport services should be
carefully analysed...
The seduction of real-time V2I systems using GPS positioning of individual transit
vehicles is the assumption that sufficient control of vehicle headway will smooth out
passenger and vehicle congestion, avoiding bunching, mitigating overcrowding, and
so on. Here, via the Data Rate Theorem that links control and information theories,
we show that assumption to be an illusion, and that there will always be a critical
value of passenger density at which a public transit system suffers the functional
equivalent of a massive traffic jam.
The phenomenological model we develop will, in fact, link larger-scale vehi-
cles/mile traffic density with passengers/bus density and roadway quality.
The underlying conceit of V2I systems is that the infrastructure can control indi-
vidual vehicles to regulate traffic flow. An essential constraint on such systems,
however, is that they are inherently unstable, and require a constant flow of control
information to stay on the road or, if on a track, to avoid collisions. As discussed
above, aircraft can be designed to be inherently stable, in the sense that, for a short
time at least, they can be allowed to proceed ‘hands off’, as long as the center of
pressure of the vehicle is behind the center of gravity. Then small perturbations from
steady state rapidly die out. Ground vehicles in heavy traffic on twisting roads must,
by contrast, always be under real-time direction by a cognitive entity—driver or AI
automaton.
The first stage of modeling the V2I public transit system is the usual linear expan-
sion around a nonequilibrium steady state in which control information is sufficient
to keep the system ‘on track’.
Recall that the Data Rate Theorem (Nair et al. 2007) establishes the minimum rate at which externally-supplied control information must be provided for an inherently unstable system to maintain stability. Given the linear expansion near a nonequilibrium steady state, an n-dimensional vector of system parameters at time t, x_t, determines the state at time t + 1 according to the model of Fig. 1.1.
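A reconstruction of the linear model of Eq. (3.1), on the assumption that it follows the standard Data Rate Theorem setup of Nair et al. (2007), is

$$x_{t+1} = A x_t + B u_t + W_t \quad (3.1)$$

where A and B are fixed matrices, u_t is the control vector, and W_t is white noise; A^m in Eq. (3.2) below is then the subcomponent of A having eigenvalues ≥ 1.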
The DRT under such conditions states that the minimum control information rate H is determined by the relation

$$H > \log[|\det(A^m)|] \equiv a_0 \quad (3.2)$$
Expanding both the necessary control information rate and the rate at which the inherently unstable system generates 'topological information' as first-order functions of an appropriate density index ρ gives H(ρ) ≈ κ1ρ + κ2 (3.3) and

$$f(\rho) \approx \kappa_3\,\rho + \kappa_4 \quad (3.4)$$

Again, a Clausewitz temperature can be defined, and, as before, the limit condition for stability becomes

$$T \equiv \frac{\kappa_1\,\rho + \kappa_2}{\kappa_3\,\rho + \kappa_4} > a_0 \quad (3.5)$$

And as before, for small ρ, the stability condition is κ2/κ4 > a0. At large ρ this again becomes κ1/κ3 > a0. If κ2/κ4 ≫ κ1/κ3, the stability condition may be violated at high traffic densities, and instability becomes manifest, as at the higher ranges of Fig. 3.1. See Fig. 1.2 for the canonical form.
For buses embedded in a larger traffic stream we recapitulate something of Sect. 1.3,
as there are at least three critical densities that must interact: vehicles per linear mile,
passengers per bus, and an inverse index of roadway quality that might be called
‘potholes per mile’. There is, then, a characteristic density matrix for the system,
which we write as ρ̂:

$$\hat{\rho} = \begin{pmatrix} \rho_{11} & \rho_{12} & \rho_{13} \\ \rho_{21} & \rho_{22} & \rho_{23} \\ \rho_{31} & \rho_{32} & \rho_{33} \end{pmatrix}$$
ρ11 is the number of passengers per bus, ρ22 vehicles per mile, ρ33 ‘potholes per
mile’, and the off-diagonal terms are measures of interaction between them since,
at the least, buses are part of the traffic stream, roadway quality affects vehicles per
mile, and so on.
One might extend the model to even higher dimensions by including, for example,
passenger densities of a subway or light rail system feeding into a transit ‘hot spot’.
Again, we apply the arguments of Sect. 1.3. An n × n matrix ρ̂ has n invariants r_i, i = 1...n, that remain fixed when 'principal component analysis' transformations are applied to data, and these can be used to construct an invariant scalar measure, using the polynomial relation

$$p(\lambda) = \det(\hat{\rho} - \lambda I)$$

det is the determinant, λ is a parameter and I the n × n identity matrix. The invariants are the coefficients of λ in p(λ), normalized so that the coefficient of λ^n is 1. As described in Sect. 1.3, typically, the first invariant will be the matrix trace and the last ± the matrix determinant.
For an n × n ρ-matrix it again becomes possible to define a composite scalar index Γ as a monotonic increasing function of the matrix invariants, for example

$$\Gamma = \sum_{i=1}^{n} \alpha_i\, r_i$$

for positive α_i. Recall that, for n = 2, Tr[ρ̂] = ρ11 + ρ22 and det[ρ̂] = ρ11ρ22 − ρ12ρ21. Again, an n × n matrix will have n such invariants from which the scalar index Γ can be constructed.
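A minimal sketch of the invariant construction, not from the text: the density values and the weights α_i are illustrative assumptions; numpy.poly returns the characteristic-polynomial coefficients of a matrix.

    import numpy as np

    rho = np.array([[40.0, 2.0, 1.0],     # hypothetical passengers/bus etc.
                    [2.0, 150.0, 5.0],
                    [1.0, 5.0, 12.0]])

    coeffs = np.poly(rho)                 # det(lambda*I - rho), leading coefficient 1
    invariants = np.abs(coeffs[1:])       # r_1 = trace, ..., r_n = determinant, up to sign
    alpha = np.array([1.0, 0.01, 0.001])  # assumed positive weights
    Gamma = float(alpha @ invariants)     # monotonic increasing composite index
    print(invariants, Gamma)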
This method can be seen as a variant of the ‘Rate Distortion Manifold’ of Glaze-
brook and Wallace (2009) or the ‘Generalized Retina’ of Wallace and Wallace
(2013, Sect. 10.1) in which high dimensional data flows can be projected down onto
lower dimensional, shifting, tunable ‘tangent spaces’ with minimal loss of essential
information.
The DRT argument implies a raised probability of a transition between stable and unstable behavior if the Clausewitz temperature analog

$$T \equiv \frac{\kappa_1\,\Gamma + \kappa_2}{\kappa_3\,\Gamma + \kappa_4}$$

falls below a critical value, as in Fig. 1.2. Kerner and Klenov (2009), however, argue that traffic flow can be subject to more than two phases. We can recover something
that traffic flow can be subject to more than two phases. We can recover something
similar for V2I public transit systems driven by passenger density etc. via a ‘cognitive
paradigm’ similar to that of Sect. 1.6. Recall that Atlan and Cohen (1998) view a
system as cognitive if it must compare incoming signals with a learned or inherited
picture of the world, then actively chooses a response from a larger set of those
possible to it. V2I systems are clearly cognitive in that sense. Such choice, however,
implies the existence of an information source, since it reduces uncertainty in a formal
way.
Given the ‘dual’ information source associated with the inherently unstable cogni-
tive V2I public transit system, an equivalence class algebra can again be constructed
by selecting different system origin states and defining the equivalence of subsequent
states at a later time by the existence of a high probability path connecting them to the
same origin state. Disjoint partition by equivalence class, analogous to orbit equiv-
alence classes in dynamical systems, defines a symmetry groupoid associated with
the cognitive process. Groupoids are ‘weak’ generalizations of group symmetries in
which there is not necessarily a product defined for each possible element pair, for
example in the disjoint union of different groups.
The equivalence classes across possible origin states define a set of information
sources dual to different cognitive states available to the inherently unstable V2I
public transit system. These create a large groupoid, with each orbit corresponding
to a transitive groupoid whose disjoint union is the full groupoid. Each subgroupoid
is associated with its own dual information source, and larger groupoids must have richer dual information sources than smaller ones.
Let X_{G_i} be the V2I system's dual information source associated with groupoid element G_i. Given the argument leading to Eqs. (3.5–3.7), it is again possible to construct a Morse Function in the manner of Sect. 1.6.
Let H(X_{G_i}) ≡ H_{G_i} be the Shannon uncertainty of the information source associated with the groupoid element G_i. We can define another pseudoprobability as

$$P[H_{G_i}] \equiv \frac{\exp[-H_{G_i}/\omega T]}{\sum_j \exp[-H_{G_j}/\omega T]} \quad (3.9)$$
where T has again been constructed using a composite index, Γ , and the sum is over
the different possible cognitive modes of the full system. ω is a scaling parameter
representing the rate at which changes in T affect system cognition.
A ‘free energy’ Morse Function F can again be defined as
exp[−F/ωT ] ≡ exp[−HG j /ωT ] (3.10)
j
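The fragmentation argument rests on the standard Erdős–Rényi giant-component expression (an assumption here, but consistent with the caption of Fig. 3.2): the relative size S of the largest connected component of a random network with mean degree a is

$$S(a) = 1 + \frac{W(-a\,e^{-a})}{a}$$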
with W being the Lambert W-function, defined by W(x) exp[W(x)] = x. We take a as an index of the proportion of routes with overcrowded transit vehicles.
Figure 3.2 shows the relation, which is strikingly similar to the 'two population' model of Fig. 1.4. Indeed, the proportions of overcrowded buses and of still-open routes can be treated as two interacting populations in much the same way.
Fig. 3.2 Relative size of the largest network connected component—the multimodal ‘transit jam’—
for random connections. a is taken as an index of the proportion of transit vehicles that are over-
crowded, and W is the Lambert W-function. Tuning the topology of the network leads to a family
of broadly similar curves with different thresholds and topping-out levels
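A minimal sketch of the curve in Fig. 3.2, not from the text and relying on the giant-component assumption above; scipy's lambertw supplies W.

    import numpy as np
    from scipy.special import lambertw

    a = np.linspace(0.2, 4.0, 20)          # crowding index (assumed range)
    # Relative size of the largest connected component, zero below a = 1.
    S = 1.0 + np.real(lambertw(-a * np.exp(-a))) / a
    for ai, Si in zip(a, S):
        print(f"a = {ai:4.2f}  S = {max(Si, 0.0):5.3f}")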
The essential content of the Data Rate Theorem is, of course, that, if the rate at which
control information can be provided to an unstable system is below the critical limit
defined by the rate at which the system generates ‘topological information’, there
is no coding strategy, no timing strategy, no control scheme of any form, that can
provide stability. Generalization, based on the inherently cognitive nature of V2I
systems—human or AI controlled—suggests that there may be a sequence of stages
of increasing transit jam dysfunction for public transit under the burden of rising
per-bus passenger densities.
Thus, for a bus system necessarily embedded in a larger traffic flow, no matter
what V2I headway manipulations are applied, there will always be a critical per-bus
passenger density that creates the public transit equivalent of a traffic jam, i.e., transit
jams that include bunching, long headways, extended on/off delays, buses too full to
pick up passengers, and so on, all synergistic with gross overcrowding. The arguments
of Kerner and Klenov (2009) on phase transitions carry over into public transit sys-
tems whose dynamics are driven by multiple density measures and their interaction.
For a given route at a fixed time, there should be a ‘passenger density macroscopic
fundamental diagram’ (PDMFD) much like Fig. 3.1 showing passengers/hour as a
function of passengers/vehicle. The previous sections, describing service instability,
imply the inevitability of ‘explosive’ deviations from regularity in the PDMFD with
increasing passenger load.
The essential solution to traffic jam analogs—to transit jams in public
transportation—is to provide adequate numbers of vehicles so that critical passenger
densities are not exceeded.
In sum, there can be no cheap tech fix for inadequate public transit service.
References
Albert, R., and A. Barabasi. 2002. Statistical mechanics of complex networks. Reviews of Modern
Physics 74: 47–97.
Atlan, H., and I. Cohen. 1998. Immune information, self-organization and meaning. International
Immunology 10: 711–717.
Blandin, S., et al. 2011. A general phase transition model for vehicular traffic. SIAM Journal of
Applied Mathematics 71: 107–127.
Chiabaut, N. 2015. Evaluation of a multimodal urban arterial: The passenger macroscopic funda-
mental diagram. Transportation Research Part B 81: 410–420.
Corless, R., G. Gonnet, D. Hare, D. Jeffrey, and D. Knuth. 1996. On the Lambert W function.
Advances in Computational Mathematics 4: 329–359.
Geroliminis, N., N. Zheng, and K. Ampountolas. 2014. A three-dimensional macroscopic fundamental diagram for mixed bi-modal urban networks. Transportation Research Part C 42: 168–181.
Glazebrook, J.F., and R. Wallace. 2009. Rate distortion manifolds as model spaces for cognitive
information. Informatica 33: 309–346.
Ivanchev, J., H. Aydt, and A. Knoll. 2014. Stochastic bus traffic modeling and validation using smart card fare collection data. In IEEE 17th Conference on ITSC, 2954–2961.
Kerner, B., and S. Klenov. 2009. Phase transitions in traffic flow on multilane roads. Physical Review
E 80: 056101.
Maerivoet, S., and B. De Moor. 2006. Data quality travel time estimation and reliability. Katholieke Universiteit Leuven 06-030.
Nair, G., F. Fagnani, S. Zampieri, and R. Evans. 2007. Feedback control under data rate constraints:
An overview. Proceedings of the IEEE 95: 108–137.
Sugiyama, Y., M. Fukui, M. Kikuchi, K. Hasebe, A. Nakayama, et al. 2008. Traffic jams without
bottlenecks—experimental evidence for the physical mechanisms of the formation of a jam. New
Journal of Physics 10: 033001.
Tirachini, A., D. Hensher, and J. Rose. 2013. Crowding in public transport systems: Effects on
users, operation and implications for the estimation of demand. Transportation Research Part A
53: 36–52.
Tirachini, A., D. Hensher, and J. Rose. 2014. Multimodal pricing and optimal design of urban public
transport: The interplay between traffic congestion and bus crowding. Transportation Research
Part B 61: 33–54.
Wallace, R., and D. Wallace. 2013. A mathematical approach to multilevel, multiscale health inter-
ventions: Pharmaceutical Industry decline and policy response. London: Imperial College Press.
Chapter 4
An Example: Fighting the Last War
4.1 Introduction
A healthy small child, given ten or so different pictures of an elephant over a few
days, when first taken to the zoo or the circus (at least during the author’s childhood)
or to the appropriate Disney movie, has no trouble identifying a newly-seen elephant,
in-the-flesh, or on screen. AI systems, deep learning or otherwise, must be confronted
with innumerable elephant pictures in an enormous variety of situations to be able to
identify an elephant in some previously unexperienced context. Human institutions,
which are cognitive entities, do not fare much better.
The canonical example of institutional failure is, perhaps, the inevitability of
military structures almost always ‘fighting the last war’ (or the last battle). Although
one might prefer to focus on the fall of France in 1940, such considerations apply as
much to Erwin Rommel’s armored sideshow of Bewegungskrieg a la 1940 France
in the face of an ultimately overwhelming English/US strategic superiority in North
Africa that harked back to U.S. Grant’s Civil War strategy. Indeed, Bewegungskrieg
suffered a similar, if slower, grinding collapse on the Eastern Front of WWII from
Moscow to Stalingrad and Kursk, for analogous reasons associated with differences
in both manufacturing and manpower capacity and approach. Vergeltungswaffen
and a handful of jet interceptors and Tiger tanks didn’t count for much in the face of
massive supply chains and the evolution of sophisticated combined arms tactics. As
they say, the mills of the Gods grind slowly, but they grind exceeding fine.
Grant’s autobiography remains of some interest.
Another cogent example can be found in the aftermath of the US victory in the
Gulf wars of 1991 and 2003, and in the career of General H.R. McMaster, as of this
writing, the U.S. National Security Advisor.
During the first Gulf War, in February 1991, H.R. McMaster was a Captain commanding Eagle Troop of the Second Armored Cavalry Regiment in the battle known as 73 Easting. As the result of an all too typical US operational failure, McMaster's unit of 9 M1A1 tanks and 12 Bradley reconnaissance vehicles (each armed with a brace of reloadable anti-tank missiles) was ordered to rapidly advance toward Iraqi defense lines in a sandstorm, without air support, and without any intelligence regarding actual enemy deployment.
In the sandstorm, not knowing the whereabouts of the enemy, McMaster ordered the
lightly-armored Bradley vehicles to form up behind the line of tanks.
Topping a ridge, Eagle Troop’s 9 M1A1 tanks were unexpectedly confronted
with a fully dug-in Iraqi T-72 tank company. Relying on the tactical superiority of
the M1A1 over the T-72, and on the relentless US live-fire training that permitted a
fire rate of 12 shots per minute, in 23 min Eagle Troop destroyed 28 T-72 tanks, 16
armored personnel carriers, and 39 trucks, eliminating the entrenched Iraqi company,
without taking a casualty.
Other US armored units in the same offensive thrust faced similar operational lacu-
nae, again forced to engage in unexpected large-scale combat with highly motivated,
modestly well-trained, deeply entrenched Iraqi armor. Again, only vastly superior
equipment and training permitted US forces to carry through the confrontations with
minimal casualties and with the destruction of almost all enemy units.
In some 90 min, in spite of a characteristic operational level incompetence, US
tactical advantage resulted in the elimination of an entire elite Iraqi armored brigade.
The spirit of Erwin Rommel, and of a resurrected Prussian Bewegungskrieg, seemed
to have completely won the day.
Fast forward to the 2003 occupation of Iraq, the invasion of Afghanistan, and
nearly fifteen years of grinding insurgency: somebody changed the rules of the game
from armored Bewegungskrieg to another style of US Grant’s sociocultural grind.
The Gods are still deciding how small the pieces are going to be on that.
Indeed, in 2005 then-Col. McMaster was tasked with the pacification of the city of
Tal Afar in Iraq, under the rubric of 'Operation Restoring Rights', a modestly successful
effort whose central innovation was the active presence of many US troops within
the city 24/7, usually involving forcible evictions of Iraqi families to house them
overnight (Finer 2017). The US soon ‘declared victory’ and withdrew. By mid-2006,
in Finer’s words, Tal Afar “was awash in the sectarian violence that had engulfed
much of Iraq”. In June 2014 Tal Afar was one of the first cities taken by the Islamic
State.
McMaster went on to hold a series of staff positions in the US Central Command.
It is, however, of some interest that, in 2006–7, he was passed over for promotion to
general. In 2007 the Secretary of the Army requested that General David Petraeus
return from Iraq to take charge of the promotion board as a way to ensure that the
best performers in combat received every consideration for advancement, resulting
in McMaster’s promotion. As a third-star general, in 2014 he began duties as Deputy
Commanding General of the Training and Doctrine Command (Wikipedia 2017).
As Watts (2008) argues at some length, skills needed at the tactical do-or-die level do not translate well into a corresponding degree of skill at the operational and strategic levels, where the US remains severely challenged. As Watts puts it, tactical problems
are usually subject to relatively simple engineering solutions—better equipment and
training than the opposition—while operational and strategic problems are, in a
formal sense, ‘wickedly hard’, involving subtleties and synergisms that are difficult
to understand and to address. Different kinds of thinking, training, and talent are
required for each level.
It is the contention of this work that AI systems tasked with the control of crit-
ical real-time systems will face many of the same problems that have routinely
crippled their institutional counterparts, particularly under fog-of-war and frictional
constraints.
Here, we will model how cognitive systems, including AI entities, are dependent
on continuing crosstalk between strategic and operational ‘algorithms’, in a large
sense, and an appropriate reading of real-time field experience. This will prove to
generate a different version of John Boyd’s ‘command loop’ dynamic failure mech-
anism.
In short, if you don’t know if, when, or how the rules have changed, you can’t
win the game.
Here, the approach to dynamic process is via the mutual information generated by
crosstalk between channels. The essential point is again that there must be continual
communication between tactical and higher—operational and strategic—levels of
cognition. The tactical level is tasked with response to real-time ‘roadway’ shifts,
while the operational level must coordinate larger-scale tactical challenges and the
strategic must focus on monitoring the inevitably-changing rules-of-the-game. As
they say, don’t take a knife to a gunfight.
The focus is on the mutual information between information channels representing
the tactical and higher levels of control.
Mutual information between information sources X and Y is defined as

$$I(X;Y) = H(Y) - H(Y|X) = \sum_{x,y} p(x,y)\log\!\left[\frac{p(x,y)}{p(x)p(y)}\right] = \int\!\!\int p(x,y)\log\!\left[\frac{p(x,y)}{p(x)p(y)}\right]dx\,dy \quad (4.1)$$
where the last expression is for continuous variates. It is a convex function of the
conditional probability p(y|x) = p(x, y)/ p(x) for fixed probabilities p(x) (Cover
and Thomas 2006), and this would permit a complicated construction something like
that of previous chapters, taking the x-channel as the control signal. We will treat a
simplified example.
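A minimal sketch of Eq. (4.1) for the discrete case, not from the text: the 2×2 joint distribution coupling a tactical channel X and a strategic channel Y is a hypothetical illustration.

    import numpy as np

    p_xy = np.array([[0.35, 0.15],
                     [0.10, 0.40]])       # assumed joint probabilities, sums to 1
    p_x = p_xy.sum(axis=1, keepdims=True) # marginal of X
    p_y = p_xy.sum(axis=0, keepdims=True) # marginal of Y

    # Discrete form of Eq. (4.1), in nats.
    I = float((p_xy * np.log(p_xy / (p_x * p_y))).sum())
    print(I)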
Given two interacting channels where the p's have normal distributions, mutual information is related to correlation as

$$M = -\frac{1}{2}\log[1 - \rho^2] \quad (4.2)$$

where ρ is the correlation between the channels (Stuart and Ord 1994). Writing Z ≡ ρ², an entropy-analog follows from the Legendre transform

$$S_M \equiv M(Z) - Z\,\frac{dM(Z)}{dZ} \quad (4.3)$$

The corresponding gradient dynamics are

$$\frac{dZ(t)}{dt} = \mu\,\frac{dS_M}{dZ} = -\frac{\mu}{2}\frac{Z(t)}{(1 - Z(t))^2} \quad (4.4)$$

which, taking the initial condition Z(0) = 1, has the implicit solution

$$-Z^2 + 4Z - 2\log(Z) = \mu t + 3 \quad (4.5)$$
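One plausible form for Eq. (4.6), an assumption chosen so that a vanishing drift term reproduces Eq. (4.7) below, is

$$dZ_t = \left(K(1 - Z_t)^2 - \frac{\mu}{2}Z_t\right)dt + \sigma Z_t\,dW_t \quad (4.6)$$

with σ a hypothetical noise amplitude; the drift vanishes when 2KZ² − (4K + μ)Z + 2K = 0, whose smaller root is exactly Eq. (4.7).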
where K is a measure of free energy exchange, μ is a diffusion rate and, again, dWt
represents white noise.
This has the steady state expectation

$$\rho^2 = \frac{4K + \mu - \sqrt{8K\mu + \mu^2}}{4K} \quad (4.7)$$

The limit of this expression, as K → ∞, is 1.
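A minimal numerical sketch of Eq. (4.7), not from the text, taking μ = 1 as in the discussion of Fig. 4.2; ρ² rises toward 1 as the crosstalk free energy K grows.

    import numpy as np

    def rho2(K, mu=1.0):
        """Steady-state squared correlation of Eq. (4.7)."""
        return (4 * K + mu - np.sqrt(8 * K * mu + mu ** 2)) / (4 * K)

    for K in (0.5, 1.0, 2.0, 5.0, 50.0):
        print(f"K = {K:5.1f}  rho^2 = {rho2(K):.4f}")  # approaches 1 as K grows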
Figure 4.2 shows a graph of ρ 2 versus K for μ = 1. The greater the diffusion
coefficient μ, the slower the rate of convergence.
It is possible to determine the standard deviation of the squared correlation (i.e., of the fraction of the joint variance) by calculating the difference of steady state expectations E(Z²) − E(Z)² for the model of Eq. (4.6), again using the Ito chain rule.
4.3 Discussion
The K index in Eq. (4.6) is a free energy measure representing the degree of crosstalk
between the channels X and Y , indexing here different levels of command. Free en-
ergy is a measure of actual work. That is, it takes active work to effectively crosslink
strategic and tactical levels of organization in a cognitive system. It is not enough
to ‘accept information’ from below, but that information must be winnowed out to
look for patterns of changing challenge and/or opportunity. Winnowing data involves
choice, choice involves cognition and the reduction of uncertainty. The reduction of
uncertainty implies the existence of an information source. Information is a form
of free energy, and the exercise of cognition implies the expenditure of, often con-
siderable, free energy. The argument is exactly circular, and is illustrated by Fig. 4.2.
Fog-of-war and frictional constraints can probably be added to this model, perhaps via parameterization of the variance measure E(Z²) − E(Z)².
AI systems that take millions of exposures to pictures of elephants in different
contexts to recognize one elephant in an unfamiliar context will not do well when
told they must search for tigers. 'Big Data' cannot train AI unless some executive function has already recognized the need to winnow the data and choose the appropriate subset for retraining, even for advanced algorithms that retrain easily. If
embedding reality changes the game faster than the AI (or institutional) system can
respond, then some version of John Boyd’s trap will have been sprung on it, either
through happenstance or deliberation.
Crosstalk takes work.
References
Cover, T., and J. Thomas. 2006. Elements of information theory, 2nd ed. New York: Wiley.
Finer, J. 2017. H.R. McMaster is hailed as the hero of Iraq's Tal Afar. Here's what that operation looked like. Washington Post, February 24, 2017.
Stuart, A., and J. Ord. 1994. Kendall’s advanced theory of statistics, 6th ed. London: Hodder Arnold.
Watts, B. 2008. US combat training, operational art, and strategic competence: Problems and
opportunities. Washington, D.C.: Center for Strategic and Budgetary Assessments.
Wikipedia. 2017. https://en.wikipedia.org/wiki/H._R._McMaster.
Chapter 5
Coming Full Circle: Autonomous Weapons
5.1 Introduction
Unfortunately... adjusting the sensor threshold to increase the number of target attacks also
increases the number of false target attacks. Thus the operator’s objectives are competing,
and a trade-off situation arises. (Kish et al. 2009)
The political failures that created and followed World War I—including the European colonial 'country building' producing Iraq and Syria—haunt us today. At present, the USA—and other
nations—are poised to move beyond current man/machine ‘cockpit’ drone systems
to autonomous weapons.
As Archbishop Silvano Tomasi (2014) put it,
...[T]he development of complex autonomous weapon systems which remove the human
actor from lethal decision-making is short-sighted and may irreversibly alter the nature of
warfare in a less humane direction, leading to consequences we cannot possibly foresee, but
that will in any case increase the dehumanization of warfare.
The collapse dynamics we have explored in the previous chapters move the argu-
ment beyond Scharre’s ‘operational risk’ into violations of the Laws of Land Warfare
that require distinction between combatants and noncombatants.
To reiterate, unlike an aircraft that can remain in stable flight as long as the center
of pressure is sufficiently behind the center of gravity, high-order cognitive systems
like human sports and combat teams, man-machine ‘cockpits’, self-driving vehicles,
autonomous weapon systems, and modern fighter aircraft—built to be maneuverable
rather than stable—operate in real-time on rapidly-shifting topological ‘highways’
of complex multimodal demand. Facing these turbulent topologies, according to the
Data Rate Theorem, the cognitive system must receive a constant flow of sufficiently
detailed information describing them.
Matters are, of course, even more complex. The underlying ‘roadway topology’
of combat operations becomes exceedingly rich under conditions of necessary dis-
crimination between combatant and noncombatant. Again, the problem of air traffic
control (ATC) provides an entry point. In ATC, locally stable vehicle paths are seen
as thick braid geodesics in a simpler Euclidean quotient space (Hu et al. 2001). These
are generalizations of the streamline characteristics of hydrodynamic flow (Landau
and Lifshitz 1987). As described above, in the context of ATC, Hu et al. demonstrate
that finding collision-free maneuvers for multiple agents on a Euclidean plane surface R² is the same as finding the shortest geodesic in a particular manifold with nonsmooth boundary. Given n vehicles, that geodesic is calculated for the topological quotient space R^{2n}/W(r), where W(r) is defined by the requirement that no two vehicles are closer together than some critical Euclidean distance r.
For autonomous or other weapons under targeting constraints, r is, crudely, the minimum acceptable distance to possible noncombatants in the target zone. R² must again be replaced by a far more topologically complex and extraordinarily dynamic roadway space M² (or even M³) that incorporates evasive maneuvers of potential targets within and around 'no-go' zones for the weapon. Geodesics for n possible targets are then in a highly irregular and rapidly-shifting quotient space M^{αn}/W(r),
whose dynamics are subject to phase transitions driven by the convolution of fog-of-
war and friction indices characterized in the previous chapters. The different phases
are analogous to the different ‘traffic jam’ conformations identified by Kerner and
Klenov (2009), who apply insights from statistical physics to traffic flow.
Needless to say, navigating under such restraints will always be far more difficult
than in the case of air traffic control. The ‘ground state’ fallback will obviously be
to simply collapse r to zero and thus greatly simplify target space topology.
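The constraint defining W(r) is simple to state computationally; the following toy Python sketch (an illustration of the separation requirement only, not of the quotient-space geodesic calculation) tests whether a configuration of n planar agents is admissible:

import numpy as np

def admissible(config, r):
    # A point of R^{2n} (n planar agents) satisfies the W(r) constraint
    # only if every pairwise Euclidean separation exceeds r
    diffs = config[:, None, :] - config[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    n = len(config)
    return bool((dists[np.triu_indices(n, k=1)] > r).all())

config = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 0.0]])
print(admissible(config, r=2.0))   # True: smallest separation is 5
print(admissible(config, r=5.5))   # False: two pairs sit at distance 5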
According to the Data Rate Theorem, if the rate at which control information
can be provided to an unstable system is below the critical limit defined by the rate
at which the system generates ‘topological information’, there is no coding strat-
egy, no timing strategy, no control scheme of any form, that can ensure stability.
Generalization to the rate of incoming information from the rapidly-changing multi-
modal ‘roadway’ environments in which a real-time cognitive system must operate
suggests that there will be sharp onset of serious dysfunction under the burden of
rising demand. In Sect. 1.3 we analyzed that multimodal demand in terms of the
crosstalk-like fog-of-war matrix ρ_{i,j} that can be characterized by situation-specific
statistical models leading to the scalar temperature analog T and a similar argument
leading to the friction/resolve index φ. More complicated ‘tangent space’ reductions
are possible, at the expense of greater mathematical overhead (e.g., Glazebrook and
Wallace 2009).
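For the standard linear-plant statement of the Data Rate Theorem (adopted here purely for illustration; the argument in the text uses the result qualitatively), the critical rate is the summed log-magnitude of the unstable eigenvalues, and is trivial to compute:

import numpy as np

def min_control_rate_bits(A):
    # Data Rate Theorem bound for x_{t+1} = A x_t + control:
    # stabilization needs a channel rate exceeding the plant's
    # 'topological information' rate, the sum of log2|lambda|
    # over eigenvalues with |lambda| > 1
    eigs = np.linalg.eigvals(np.asarray(A, dtype=float))
    return float(sum(np.log2(abs(l)) for l in eigs if abs(l) > 1))

A = [[1.5, 1.0],
     [0.0, 0.8]]  # one unstable mode, eigenvalue 1.5
print(min_control_rate_bits(A))  # about 0.585 bits per time step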
There will not be graceful degradation under falling fog-of-war ‘temperatures’ or
increasing ‘friction’, but rather punctuated functional decline that, for autonomous,
centaur, or man-machine cockpit weapon systems, deteriorates into a frozen state
in which ‘all possible targets are enemies’, as in the case of the Patriot missile
fratricides (Hawley 2006). Other cognitive systems will display analogous patterns of
punctuated collapse into simplistic dysfunctional phenotypes or behaviors (Wallace
2015a, b, 2017): the underlying dynamic is ubiquitous and, apparently, inescapable.
As Neuneck (2008) puts it,
[Proponents of the ‘Revolution in Military Affairs’ seek] to eliminate Clausewitz’s ‘fog of
war’... to eliminate unpredictability on the battlefield. War is a complex, nonlinear process
of violent interactions where technological edge is not a guarantee for success.
Fig. 5.1 Adapted from Venkataraman et al. (2011). Under real-time fog-of-war constraints it
becomes difficult for automated systems to differentiate between military and civilian vehicles.
Ground state collapse identifies everything as a tank
The problem, in its many and varied intractable forms, has been considered and reconsidered across a number of venues. In addition to the opening remarks by Kish et al. (2009), Venkataraman et al. (2011), for example, review a relevant sector of the signal processing literature, and Fig. 5.1, adapted from their paper, encapsulates,
in reverse, something of the conundrum. Under sufficient real-time fog-of-war con-
straint, a cognitive system collapses into a ground state that does not differentiate
between an SUV, a van, and a tank.
In 2017 the Pentagon’s elite advisory panel, the infamous JASON group of Viet-
nam war ‘automated battlefield’ fame, released an unclassified overview of possible
uses for artificial intelligence by the US Department of Defense (JASON 2017).
At the very end of a Statement of Work appendix to the report is the following
‘Scope’ Q&A exchange:
4. Many leading AI researchers and scientists from other disciplines expressed their concerns
of potential pitfalls of AI development in the “Open Letter on Artificial Intelligence.” As the
letter suggests, can we trust these agents to perform correctly? Can we verify and validate
these agents with sufficient level of built-in security and control to ensure that these systems
do what we want them to do?
JASON response: Verification and validation of AI agents is, at present, immature. There is
considerable opportunity for DoD to participate in the program of advancing the state of the
art of AI to become a true engineering discipline, in which V&V, as well as other engineering
“ilities” [reliability, maintainability, accountability, verifiability, etc.], will be appropriately
controlled.
Recognizing, perhaps, a classic lawyer’s tapdance, Carl von Clausewitz might well
disagree that matters will be this simple. Indeed, it is interesting to note that John Boyd
himself had directly ordered the closing of JASON’s automated battlefield project—
a mix of electronic sensors and quick-response air strikes aimed at closing North Vietnam's 'Ho Chi Minh trail' supply line—as an ineffective waste of resources
(Lawson 2014, Chap. 5).
In sum, as with other real-time AI, there is no free lunch for cognitive weapon sys-
tems, with or without hands-on human control. All such systems—including conven-
tional military command structures at different scales and levels of organization—are
inherently susceptible to serious operational instabilities under complex fog-of-war
and frictional environments. Policy based on the business dreams of military contrac-
tors and their academic or think-tank clients—promises of precision targeting—will
be confronted by nightmare realities of martyred civilian populations, recurring gen-
erations of new ‘terrorists’, and the persistent stench of war crime.
References
Columbia University Law School Human Rights Clinic. 2012. Counting drone strike deaths. http://
web.law.columbia.edu/human-rights-institute.
Glazebrook, J.F., and R. Wallace. 2009. Rate distortion manifolds as model spaces for cognitive
information. Informatica 33: 309–346.
Hanley, C., M. Mendoza, and S. Choe. 2001. The bridge at No Gun Ri: A hidden nightmare from
the Korean War. New York: Henry Holt and Company.
Hawley, J. 2006. Patriot fratricides: The human dimension lessons of Operation Iraqi Freedom.
Field Artillery, January-February.
Hersh, S. 1972. Cover-up: The Army’s secret investigation of the massacre at My Lai 4. New York:
Random House.
Hu, J., M. Prandini, K. Johnasson, and S. Sastry. 2001. Hybrid geodesics as optimal solutions to
the collision-free motion planning problem. In HSCC 2001. LNCS, vol. 2034, eds. Di Benedetto,
M., and A. Sangiovanni-Vincentelli, 305–318.
JASON. 2017. Perspectives on research in artificial intelligence and artificial general intelligence
relevant to DoD, JSR-16-Tasl-003. McLean, VA: The MITRE Corporation.
Kerner, B., and S. Klenov. 2009. Phase transitions in traffic flow on multilane roads. Physical Review
E 80: 056101.
78 5 Coming Full Circle: Autonomous Weapons
Kish, B., M. Pachter, and D. Jacques. 2009. Effectiveness measures for operations in uncertain
environments. In UAV Cooperative Decision and Control: Challenges and Practical Applications,
eds. Shima, T., and S. Rasmussen, Chap. 7. Philadelphia: SIAM Publications.
Landau, L., and E. Lifshitz. 1987. Fluid mechanics, 2nd ed. New York: Pergamon.
Lawson, S. 2014. Non-linear science and warfare: Chaos, complexity and the US military in the
information age. New York: Routledge.
Neuneck, G. 2008. The revolution in military affairs: Its driving forces, elements, and complexity.
Complexity 14: 50–60.
Scharre, P. 2016. Autonomous weapons and operational risk. Washington, DC: Center for New American Security. http://www.cnas.org/autonomous-weapons-and-operational-risk.
Stanford/NYU. 2012. Living under drones: Death, injury and trauma to civilians from US drone
practices in Pakistan. http://livingunderdrones.org/.
Tomasi, S. 2014. Catholic Herald. http://www.catholicherald.co.uk/news/2014/05/15/vatican-
official-voices-opposition-to-automated-weapons-systems/.
Trsek, R. (Lt. Col., USAF). 2014. Cutting the cord: Discrimination and command responsibility in autonomous lethal weapons. USA: Air War College of Air University.
Venkataraman, V., G. Fan, L. Yu, X. Zhang, W. Liu, and J. Havlick. 2011. Automated target tracking and recognition using coupled view and identity manifolds for shape recognition. EURASIP Journal on Advances in Signal Processing 2011: 124.
Wallace, R. 2015a. An ecosystem approach to economic stabilization: Escaping the neoliberal
wilderness. London: Routledge.
Wallace, R. 2015b. An information approach to mitochondrial dysfunction: Extending Swerdlow’s
hypothesis. Singapore: World Scientific.
Wallace, R. 2017. Information theory models of instabilities in critical systems. Singapore: World
Scientific.
Chapter 6
An Evolutionary Approach to Real-Time
Conflict: Beware the ‘Language that Speaks
Itself’
6.1 Introduction
nature of those processes. That is, although the operation of information sources is
both nonequilibrium and irreversible in the most fundamental sense (e.g., few and
short palindromes), the asymptotic limit theorems of information theory beat back
the mathematical thicket surrounding such phenomena. The theorems permit, in
some measure, a non-equilibrium steady state approximation to inherently nonequi-
librium processes under proper circumstances, and allow the stochastic differential
equation models inherent to nonequilibrium statistical mechanics to penetrate a full
step deeper.
Two dynamics dominate evolutionary process: punctuated equilibrium, in the
sense of Eldredge and Gould (1972), and path dependence (Gould 2002). Punctu-
ated equilibrium implies periods of relative stasis followed by sudden 'extinction'
and/or ‘speciation’ events where entities undergo fundamental reorganization under
selection pressures that may involve competition. Path dependence implies that what
comes next depends heavily on, and is largely driven by, what has come before.
Western ‘market economies’ quintessentially involve persistent, grinding conflict
between cognitive entities, i.e., competition between institutions. The model can be
applied, with some modification, to the kind of de-facto combat operations likely to
confront AI control of real-time critical systems. The basic argument is that conflict
will always act as a powerful selection pressure on interacting cognitive entities, of
any nature, leading to punctuated speciation/extinction events on appropriate time
scales, depending on the exact nature of the contending systems.
Changes in Soviet military organization, leadership, and doctrine under German
‘selection pressure’ in WWII provide something of a case history, albeit on a different
timescale than the stock market flash-crash. The emergence of an ‘insurgency’ after
the botched US occupation of Iraq in 2003 provides another example, as does the
ultimately successful resistance of the defeated Confederate states after the US Civil
War that forced the removal of Federal troops after 1877, leading to imposition of a
draconian ‘Jim Crow’ system of racial apartheid and voter disenfranchisement that
lasted well into the latter half of the 20th Century. Indeed, after 1980, the Jim Crow
system evolved into current nation-wide programs of mass incarceration afflicting
racial minorities with much the same effect.
Interacting cognitive enterprises can be seen as undergoing evolutionary process
according to a modified version of the traditional biological model (Wallace 2010,
2011, 2013, 2015a):
1. Variation. Among individual cognitive entities—AI systems, individuals, insti-
tutions, and their composites—there is considerable variation in structure and
behavior.
2. Inheritance of culture. Along its developmental path, which can be seen as a
kind of reproductive process, a machine/entity/institution (MEI) will resemble
its own history more than that of others, as ‘corporate’ strategies, resources, and
perspectives are passed on in time.
3. Change. Learned or enforced variation in structure, policy, and ‘doctrine’, in a
large sense, is constantly occurring in surviving MEI’s.
for some probability distribution. This result goes under a number of names: Sanov's Theorem, Cramér's Theorem, the Gärtner–Ellis Theorem, the Shannon–McMillan Theorem, and so forth (Dembo and Zeitouni 1998). Thus a large deviation can itself be described in terms of an information source, here designated L_D.
As a consequence of these considerations, we can define a joint Shannon uncertainty representing the interaction of these information sources as

H(X, Y, Z, L_D)   (6.2)

An entropy-analog can then be defined in terms of a vector K of driving parameters as the Legendre transform

\hat{S} \equiv H(K) - K \cdot \nabla_K H   (6.3)

and a first-order system of stochastic Onsager equations constructed in the gradients of Ŝ,

dK_t^i = \left[\sum_k \mu_{i,k}\, \partial\hat{S}/\partial K^k\right] dt + \sigma_i K_t^i\, dB_t   (6.4)

Here μ_{i,k} is a diffusion matrix analog, and the last term represents volatility in a noise process dB_t that may not be Brownian.
Setting the expectation of this set of relations to zero, we find a relatively large set of nonequilibrium steady states, indexed as j = 1, 2, ..., j_max ≫ 1, and each characterized by an uncertainty value H_j.
Importing the Clausewitz temperature T, we again write a pseudoprobability for state q as

P_q = \frac{\exp(-H_q/T)}{\sum_j \exp(-H_j/T)}   (6.5)

and define a new 'free energy' Morse Function F̂ in terms of the denominator sum,

\exp(-\hat{F}/T) \equiv \sum_j \exp(-H_j/T)   (6.6)
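Equations (6.5) and (6.6) are easily explored numerically; the following Python sketch (with an arbitrary, made-up spectrum of H_j values) shows how falling T locks the system into its lowest-uncertainty state:

import numpy as np

def pseudoprob_free_energy(H, T):
    # P_q of Eq. (6.5) and the Morse-function free energy of Eq. (6.6)
    w = np.exp(-np.asarray(H, dtype=float) / T)
    Z = w.sum()                    # denominator sum of Eq. (6.5)
    return w / Z, -T * np.log(Z)   # (P_q, F-hat)

H = [1.0, 1.5, 3.0]                # illustrative uncertainty values H_j
for T in [0.1, 1.0, 10.0]:
    P, F = pseudoprob_free_energy(H, T)
    print(T, P.round(3), round(F, 3))
# Low T concentrates P_q on the lowest H_j; high T spreads it out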
Fig. 6.1 Adapted from Fig. 10.3 of Wallace (2017b). The vertical axis indexes MEI capacity. The
horizontal one represents the degree of Clausewitz challenge Γ . At low values the system drifts
about a nonequilibrium steady state with significant capacity. Γ burden exceeding some critical level
triggers a punctuated phase change via a large deviation, leading to a less organized nonequilibrium
steady state. Such disintegration will likely, in itself, constitute a serious environmental insult,
leading to ‘self-referential’ ratchet dynamics: a positive feedback-driven race to the bottom. Similar
mechanisms may act during the rapid flash-crashes studied in Sect. 1.9
Historical contingency defines the 'riverbanks' along which systems of conflicting MEI's develop. There is never, ever, a 'return to
normal after perturbation’ in path-dependent evolutionary process.
The evolutionary dynamic we propose for conflicting MEI's under Clausewitzian stress is illustrated by Fig. 6.1 (adapted from Fig. 10.3 of Wallace 2017b). The ver-
tical axis represents an index of system capacity—the ability to carry out designated
duties. The horizontal axis is taken as a measure of the Clausewitz stress Γ . At low
levels of stress the system drifts about some nonequilibrium steady state having rela-
tively high degrees of capacity. When stress exceeds a threshold, there is a punctuated
phase change associated with a large deviation, leading to a less organized nonequi-
librium steady state, as indicated. Thus onset of disintegration may itself constitute a
significant environmental insult, leading to a fully self-referential downward ratchet,
similar to the argument in Sect. 1.11.
A relatively simple deterministic mathematical description of such a binary switch might be as follows. Assume Γ, the stress index, is initially at some nonequilibrium steady state, and that Γ → Γ + ε. Then ε can be assumed, at least in first order, to follow an approximate relation

d\varepsilon/dt = \mu\varepsilon - C/\varepsilon   (6.8)

If ε ≤ √(C/μ), then dε/dt ≤ 0, and the system remains at or near the initial value of Γ. Otherwise dε/dt becomes positive, and the switch is triggered, according to Fig. 6.1.
The standard stochastic extension has the SDE dynamics
d\varepsilon/dt = \mu\varepsilon - \frac{C}{\varepsilon} - \frac{1}{2}\sigma^2\varepsilon   (6.9)
The last term is the added ‘Ito correction factor’ due to noise. ε has the nonequilib-
rium steady state expectation, again via the Jensen inequality for a concave function,
E(\varepsilon) \geq \frac{C}{\mu - \frac{1}{2}\sigma^2}   (6.10)
Below this level, the system collapses to zero. Above it, the system ‘explodes’ to
higher values.
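A crude Euler–Maruyama sketch of the switch (ours, taking the underlying SDE in the assumed form dε_t = (με_t − C/ε_t)dt + σε_t dW_t, whose Ito chain rule expansion reproduces the corrected drift of Eq. (6.9)) illustrates the bistability:

import numpy as np

rng = np.random.default_rng(42)

def simulate_eps(eps0, mu=1.0, C=1.0, sigma=0.5, dt=1e-4, steps=200_000):
    # Euler-Maruyama for d(eps) = (mu*eps - C/eps) dt + sigma*eps dW,
    # absorbing at zero (collapse) and capping the upward 'explosion'
    eps = eps0
    for _ in range(steps):
        eps += (mu * eps - C / eps) * dt \
               + sigma * eps * np.sqrt(dt) * rng.standard_normal()
        if eps <= 1e-6:
            return 0.0       # collapsed state
        if eps >= 100.0:
            return eps       # switch triggered
    return eps

threshold = np.sqrt(1.0 / (1.0 - 0.5**2 / 2))  # sqrt(C/(mu - sigma^2/2))
print(simulate_eps(0.8 * threshold))  # typically absorbed at zero
print(simulate_eps(1.2 * threshold))  # typically explodes upward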
Sufficient noise creates the ‘stochastic self-stabilization’ of Mao (2007), locking
in the collapsed ratchet state. In addition, since Eq. (6.8) represents an expectation
across a probability distribution, even at relatively low mean values there may well be
much larger stochastic excursions—large deviations—that can trigger a destabilizing
transition, following Fig. 6.1. For example, Wallace (2015a, Chap. 7) examines the
impact of the diversion of technological resources from civilian to military industrial
enterprise during the Cold War—equivalent to increasing σ in Eq. (6.9)—locking in
the massive ‘rust belt’ industrial collapse in the US.
Of course, given sufficient ‘available free energy’, in a large sense, upward ratchets
in levels of organization—analogous to the famous aerobic transition or, in human
social systems, to the Renaissance, the Industrial Revolution, the many post Victorian
Great Urban Reforms, and the US Labor and Civil Rights Movements—are also
possible, but these cannot at all be described as a ‘return to normal’ after perturbation.
Under such circumstances, decline in σ in Eq. (6.9) can lower the collapse-to-zero threshold, permitting a monotonic increase in the 'free energy' index.
If σ 2 /2 ≥ μ, however, the needed reinvestment may become very large indeed,
leading to the collapse of the MEI.
It is important to realize that, although we have couched evolutionary dynamics
in terms of interacting information sources, evolutionary process, per se, is not cog-
nitive. Variation and selection will operate in the presence of any heritage system,
Lamarckian, cultural, and so on. Russian success over Prussian Bewegungskrieg in WWII owes much (but not everything) to this dynamic, which, in the long term, can undercut the John Boyd mechanism associated with the Data Rate Theorem that applies to real-time tactics and operations. Recall also the ultimate defeat of the
US ‘revolution in military affairs’ of the 1990s by the grinding ‘insurgencies’ that
evolved against it in Iraq and Afghanistan.
In sum, Wallace (2011), Goldenfeld and Woese (2010), and others emphasize the point that evolutionary process is, at base, a self-dynamic, self-referential, continually-bootstrapping phenomenon, one that, in essence, becomes 'a language that speaks itself'. Once triggered, such evolutionary ratchets can take on a life of their own, entraining constituent cognitive subprocesses into a larger, embedding, but basically non-cognitive, process in which there is no command loop to short-circuit, in the sense of John Boyd. The German experience on the Eastern Front of WWII, the US experiences in Vietnam, Iraq and Afghanistan, and the market flash-crashes of Sect. 1.9 seem to provide examples, albeit on different time scales.
A somewhat different view emerges from explicitly considering the dynamics of the large deviations characterized by the information source L_D (Wallace 2011). This can be done using the metric M from Chap. 2, as described in the Mathematical Appendix. Recall that M characterizes the 'distance' between different essential behaviors and/or other 'phenotypes'. We then express the large deviation in terms of the dynamics of M, using the entropy of Eq. (6.3) to define another first-order stochastic Onsager equation of the form

dM/dt = \mu\, d\hat{S}/dM + \sigma W_t   (6.11)
Here, dM/dt represents the 'flow' from system A to Â in the underlying manifold, and W_t represents Brownian noise. Again, see the Mathematical Appendix for details.
More generally, this must be expressed as the SDE

dM_t = \mu(M_t, t)\, dt + \sigma(M_t, t)\, dB_t   (6.12)

where B_t is not necessarily Brownian white noise and μ and σ are now appropriate functions of M_t and t. Here we enter deep realms of stochastic differential geometry in the sense of Emery (1989). We do this by making an explicit parameterization of M(A, Â) in terms of a vector K and an associated metric tensor g_{i,j}(K) as

M(A, \hat{A}) = \int_A^{\hat{A}} \sqrt{\sum_{i,j} g_{i,j}(K)\, \frac{dK^i}{dt} \frac{dK^j}{dt}}\; dt   (6.13)

where the integral is taken over some parameterized curve from A to Â in the embedding manifold. Substituting Eq. (6.13) into Eq. (6.12) produces a very complicated expression in the components of K.
A first order iteration would apply the calculus of variations to minimize Eq. (6.13), producing a starting expression having the form

\frac{d^2 K^i}{dt^2} + \sum_{j,m} \Gamma^i_{j,m}\, \frac{dK^j}{dt} \frac{dK^m}{dt} = 0   (6.14)
where the Γ terms are the famous Christoffel symbols involving sums and products of g_{i,j} and ∂g_{i,j}/∂K^m. For the second iteration, this must be extended by introduction
of noise terms to produce Emery’s stochastic differential geometry. The formalism
provides a means of introducing essential factors such as geographic, social, and
other structures as the necessary ‘riverbanks’ constraining the ‘flow’ of self-dynamic
evolutionary process that act in addition to historical path-dependence.
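The machinery of Eq. (6.14) can be made concrete with a toy metric; the following sympy sketch (the polar-coordinate plane standing in for a genuine g_{i,j}(K)) computes the Christoffel symbols that would enter the geodesic equation:

import sympy as sp

r, th = sp.symbols('r theta', positive=True)
K = [r, th]
g = sp.Matrix([[1, 0], [0, r**2]])  # toy metric tensor g_{i,j}(K)
g_inv = g.inv()

def christoffel(i, j, m):
    # Gamma^i_{j,m} = (1/2) g^{i,k} (d_j g_{k,m} + d_m g_{k,j} - d_k g_{j,m})
    return sp.simplify(sum(
        sp.Rational(1, 2) * g_inv[i, k]
        * (sp.diff(g[k, m], K[j]) + sp.diff(g[k, j], K[m]) - sp.diff(g[j, m], K[k]))
        for k in range(2)))

print(christoffel(0, 1, 1))  # Gamma^r_{theta,theta} = -r
print(christoffel(1, 0, 1))  # Gamma^theta_{r,theta} = 1/r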
In the Mathematical Appendix we contrast this empirical Onsager approach,
where equations must actually fit data, with the supposedly necessary and sufficient
methodology of evolutionary game theory.
The growth of an initiating 'fragment' of challenge N_t toward an environmental carrying capacity K can be written as the stochastic logistic equation

dN_t = \alpha N_t (1 - N_t/K)\, dt + \sigma N_t\, dW_t   (6.15)

Applying the Ito chain rule to log(N_t) invokes the stochastic stabilization mechanisms of Mao (2007), via the added 'correction factor', leading to the long-time endemic limits

N_t \to 0, \quad \alpha < \sigma^2/2

N_t \to K\left(1 - \frac{\sigma^2}{2\alpha}\right), \quad \alpha \ge \sigma^2/2   (6.16)
If the rate of growth of the initial fragment, α, is large enough, noise-driven
fluctuations are not sufficient to collapse it to zero: a ‘traffic jam’ or ‘Cambrian
event’ (Wallace 2014) analog grows.
Figure 6.2 shows two simulations, with σ below and above criticality.
Taking the potential carrying capacity K as very large, so that Nt /K → 0 in
Eq. (6.15), the model suggests that improper management of conflict between cog-
nitive entities can lead to Cambrian events akin to Hercules’ battle with the Hydra:
cut off one head, and two will grow in its place.
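Figure 6.2 was produced with Mathematica's ItoProcess; an equivalent Euler–Maruyama sketch in Python (our reimplementation, using the caption's parameter values) is:

import numpy as np

rng = np.random.default_rng(0)

def simulate_logistic(alpha=1.0, K=100.0, sigma=0.5, N0=100.0,
                      dt=0.01, steps=200_000):
    # Euler-Maruyama for Eq. (6.15): dN = alpha*N*(1 - N/K) dt + sigma*N dW
    N = N0
    trace = np.empty(steps)
    for t in range(steps):
        N += alpha * N * (1 - N / K) * dt \
             + sigma * N * np.sqrt(dt) * rng.standard_normal()
        N = max(N, 0.0)  # absorb at zero
        trace[t] = N
    return trace

# Critical sigma is sqrt(2*alpha) = sqrt(2), per Eq. (6.16)
print(simulate_logistic(sigma=0.5)[-1000:].mean())  # near 87.5 = K(1 - sigma^2/(2 alpha))
print(simulate_logistic(sigma=1.5)[-1000:].mean())  # collapses toward zero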
Wallace and Fullilove (2014) describe the Hydra mechanism of the latter dynamic
as follows:
Atomistic, individual-oriented economic models of criminal behavior fail to capture crit-
ical scale-dependent behaviors that characterize criminal enterprises as cultural artifacts.
Public policies based on such models have contributed materially to the practice of mass
incarceration in the USA. A survey of similar policing strategies in other venues suggests
that such policies almost inevitably lead to exacerbation of organized violence. Adapting a
Black-Scholes methodology, it is possible to characterize the ‘regulatory investment’ needed
to manage criminal enterprise under conditions of uncertainty at a scale and level of orga-
nization that avoids an atomistic fallacy. The model illuminates how public policy that
might seem rational on an individual scale can trigger ecosystem resilience transitions to
long-lasting or permanent modes of institutionalized hyperviolence. The homicide waves
associated with the planned shrinkage program in New York City that was directed at dis-
persing minority voting blocks carry implications for national patterns of social disruption in
which mass incarceration is an ecological keystone. Continuing large-scale socioeconomic
decay, in the specific context of that keystone, greatly increases the probability of persistent,
large-scale, organized hyperviolence, as has been the experience in Naples, Sicily, Mexico,
and elsewhere.
One is indeed led to another quotation, this by Charles Dickens, from the 1853
novel Bleak House, describing the social diffusion of the pathologies of the generic
London slum he called Tom-All-Alone’s:
Even the winds are his messengers, and they serve him in these hours of darkness... There
is not an atom of Tom’s slime, not a cubic inch of any pestilential gas in which he lives, not
one obscenity or degradation about him, not an ignorance, not a wickedness, not a brutality
of his committing, but shall work its retribution through every order of society up to the
proudest of the proud and to the highest of the high.
Welcome to Iraq, Afghanistan, and the Drone Wars: sow chaos, reap chaos.
Fig. 6.2 Simulating N_t based on the Ito chain rule expansion of log(N_t) using Eq. (6.15). The simulations apply the ItoProcess function in Mathematica 10 for white noise. N_0 = 100, K = 100, α = 1, σ = 0.5, 1.5. The critical value for σ is √2. 2000 time steps. While the upper trace fluctuates about K, the lower collapses to zero. If K becomes large, then the upper trace explodes in the Hydra mechanism: cut off one head, and two grow in its place. This is the Drone War dynamic
References
Champagnat, N., R. Ferriere, and S. Meleard. 2006. Unifying evolutionary dynamics: From indi-
vidual stochastic process to macroscopic models. Theoretical Population Biology 69: 297–321.
Dembo, A., and O. Zeitouni. 1998. Large deviations and applications, 2nd ed. New York: Springer.
Eldredge, N., and S. Gould. 1972. Punctuated equilibrium: An alternative to phyletic gradualism.
In Models in Paleobiology, ed. T. Schopf, 82–115. San Francisco: Cooper and Co.
Emery, M. 1989. Stochastic calculus in manifolds, Universitext Series. New York: Springer.
Goldenfeld, N., and C. Woese. 2010. Life is physics: Evolution as a collective phenomenon far from
equilibrium. arXiv: 1011.4125v1 [q-bio.PE]
Gould, S.J. 2002. The structure of evolutionary theory. Cambridge, MA: Harvard University Press.
Hodgson, G., and T. Knudsen. 2010. Darwin’s Conjecture: The search for general principles of
social and economic evolution. Chicago, IL: University of Chicago Press.
Langton, C. 1992. Life at the edge of chaos. In Artificial Life II, ed. C. Langton, C. Taylor, J. Farmer,
and S. Rasmussen. Reading MA: Addison-Wesley.
Mao, X. 2007. Stochastic differential equations and applications, 2nd ed. Philadelphia: Woodhead
Publishing.
Pettini, M. 2007. Geometry and topology in Hamiltonian dynamics. New York: Springer.
Sereno, M. 1991. Four analogies between biological and cultural/linguistic evolution. Journal of
Theoretical Biology 151: 467–507.
Von Neumann, J. 1966. Theory of self-reproducing automata. Urbana, IL: University of Illinois Press.
Wallace, R. 2010. Expanding the modern synthesis. Comptes Rendus Biologies 333: 701–709.
Wallace, R. 2011. A formal approach to evolution as self-referential language. BioSystems 106:
36–44.
Wallace, R. 2013. A new formal approach to evolutionary processes in socioeconomic systems.
Journal of Evolutionary Economics 23: 1–15.
Wallace, R. 2014. A new formal perspective on ‘Cambrian explosions’. Comptes Rendus Biologies
337: 1–5.
Wallace, R. 2015a. An ecosystem approach to economic stabilization: Escaping the neoliberal
wilderness. London: Routledge.
Wallace, R. 2017b. Computational psychiatry: A systems biology approach to the epigenetics of
mental disorders. New York: Springer.
Wallace, R. 2018. Canonical instabilities of autonomous vehicle systems: The unsettling reality
behind the dreams of greed. New York: Springer.
Wallace, R., and R. Fullilove. 2014. State policy and the political economy of criminal enterprise:
Mass incarceration and persistent organized hyperviolence in the USA. Structural Change and
Economic Dynamics 31: 17–31.
Chapter 7
Summary
The language of business is the language of dreams, but the language of war is the
language of nightmare made real. Yet business dreams of driverless cars on intelli-
gent roads, and of other real-time critical systems under the control of algorithmic
entities, have much of war about them. Critical real-time systems, including military
institutions at the tactical, operational and strategic scales, act on rapidly-shifting
roadway topologies whose ‘traffic rules’ can themselves rapidly change. Indeed,
combat rules-of-the-game usually morph in direct response to an entity’s ‘driving
pattern’, in a large sense. ‘Defensive driving’ is something more than an oxymoron.
The conduct of war is never without both casualty and collateral damage. Real-
time critical systems of any nature will inevitably partake of fog-of-war and fric-
tional challenges almost exactly similar to those that have made warfare increasingly
intractable for modern states. Indeed, the destabilization of essential algorithmic
entities has become a new tool of war.
Into the world of Carl von Clausewitz, John Boyd, Mao Tse-Tung, Vo Nguyen
Giap and Genghis Khan, come the brash, bright-eyed techies of Waymo, Alphabet,
Microsoft, Amazon, Uber, and all the wannabes. They will forthrightly step in where a literal phalanx of angels has not feared to tread, but has already trodden very badly indeed.
For systems facing Clausewitz challenges, everybody always eventually screws
up, and there are always very many dead bodies. Nobody navigates, or can navigate,
such landscapes unscathed.
Something of this is, of course, already known within the tech industries, much as
the risks of tobacco, of PVC furnishings and finishings, and so on, have always been
well understood by the corporations that market them and deliberately obscure their
dangers. At best, heuristic measures such as ‘anytime algorithms’ or multi-subsystem
voting strategies are deemed sufficient to meet real-world conditions. What is not well
appreciated by the tech industries, however, is the utterly unforgiving nature of the
Clausewitz Zweikampf. A taste of these matters has been presented here in a number
of narrative military vignettes: the ‘best and the brightest’ have always screwed up.
Even the Russians lost some 4 million men in WWII before they learned how to
systematically overcome Prussian Bewegungskrieg, as at Stalingrad and Kursk.
If you are doing AI and all this seems irrelevant, you are fucking clueless.
Where fog-of-war and frictional challenges are both infrequent and small, AI sys-
tems will be relatively reliable, perhaps as reliable as the already highly-automated
power grids. Introduce those challenges, and AI will fail as badly as has military enter-
prise. Internet-of-X real-time systems—V2V/V2I, etc.—will be particularly suscep-
tible to large-scale blackout analogs (e.g., Wallace 2018, and references therein).
If these are critical systems, then considerable morbidity and mortality must be
expected.
Deep learning and reinforcement learning AI, when confronted with novel, and often highly cognitive, challenges that 'get inside the command decision loop', can be expected to fail. Under heavy load, the command decision loop time constant
will relentlessly increase, providing opportunity for inadvertent or deliberate short-
circuiting leading to failure. Indeed, minor perturbations of any nature at the func-
tional equivalent of ‘rush hour’ will have increased probability of amplification to
debilitating meso-, and often macro-scale, phase transitions. This is pretty much
written in stone, as are the associated coevolutionary ‘flash-crash’ and extended
self-referential dynamics that take matters beyond John Boyd’s OODA loop.
The persistent and characteristic failures of military enterprises confronted by
Clausewitz challenges raise a red flag for tech industries hell-bent on marketing AI
for the control of real-time critical systems. The current trajectory of both policy
and practice suggests that, at best, the liability lawyers are going to get rich beyond
dreams of avarice. Worst-case scenarios involve large-scale 'flash crashes' of contending military AI systems.
Reference
Wallace, R. 2018. Canonical instabilities of autonomous vehicle systems: The unsettling reality
behind the dreams of greed. New York: Springer.
Appendix A
Mathematical Appendix
Take H (Γ ) as the control information rate ‘cost’ of stability at the index level Γ .
What is the mathematical form of H (Γ ) under conditions of volatility i.e., variability
in Γ proportional to it? Let

d\Gamma_t = g(t, \Gamma_t)\, dt + b\Gamma_t\, dW_t   (A.1)

where dW_t is taken as white noise and the function g(t, Γ) will 'fall out' of the calculation on the assumption of certain regularities.
Let H (Γt , t) be the minimum needed incoming rate of control information under
the Data Rate Theorem, and expand in Γ using the Ito chain rule (Protter 1990)
dH_t = \left[\partial H/\partial t + g(\Gamma_t, t)\, \partial H/\partial \Gamma + \frac{1}{2} b^2 \Gamma_t^2\, \partial^2 H/\partial \Gamma^2\right] dt + \left[b\Gamma_t\, \partial H/\partial \Gamma\right] dW_t   (A.2)
Define the Black-Scholes-like 'portfolio'

L = \Gamma\, \partial H/\partial \Gamma - H   (A.3)

holding ∂H/∂Γ fixed across the increment Δt, so that

\Delta L = \left(-\partial H/\partial t - \frac{1}{2} b^2 \Gamma^2\, \partial^2 H/\partial \Gamma^2\right) \Delta t   (A.4)
As in the classical Black-Scholes model (Black and Scholes 1973), the terms in g and dW_t 'cancel out', and the effects of noise are subsumed into the Ito correction factor, a regularity assumption making this an exactly solvable but highly approximate model.
The conventional Black-Scholes calculation takes ΔL/Δt ∝ L. Here, at nonequilibrium steady state, we assume ΔL/Δt = ∂H/∂t = 0, so that

-\frac{1}{2} b^2 \Gamma^2\, \partial^2 H/\partial \Gamma^2 = 0   (A.5)
By inspection,
H = \kappa_1 \Gamma + \kappa_2   (A.6)
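A one-line symbolic check (ours) confirms that the linear form of Eq. (A.6) is the general solution of Eq. (A.5):

import sympy as sp

G, b = sp.symbols('Gamma b', positive=True)
H = sp.Function('H')
# General solution of (1/2) b^2 Gamma^2 H''(Gamma) = 0, Eq. (A.5)
print(sp.dsolve(sp.Eq(sp.Rational(1, 2) * b**2 * G**2 * H(G).diff(G, 2), 0), H(G)))
# -> H(Gamma) = C1 + C2*Gamma, i.e., Eq. (A.6)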
A.2 Groupoids
other than the classification of equivalence relations via the orbit equivalence rela-
tion and groups via the isotropy groups. The imposition of a compatible topological
structure produces a nontrivial interaction between the two structures. Below we will
introduce a metric structure on manifolds of related information sources, producing
such interaction.
In essence a groupoid is a category in which all morphisms have an inverse, here
defined in terms of connection by a meaningful path of an information source dual
to a cognitive process.
As Weinstein (1996) points out, the morphism (α, β) suggests another way of
looking at groupoids. A groupoid over A identifies not only which elements of A
are equivalent to one another (isomorphic), but it also parameterizes the different
ways (isomorphisms) in which two elements can be equivalent, i.e., all possible
information sources dual to some cognitive process. Given the information theoretic
characterization of cognition presented above, this produces a full modular cognitive
network in a highly natural manner.
Brown (1987) describes the basic structure as follows:
A groupoid should be thought of as a group with many objects, or with many identities... A
groupoid with one object is essentially just a group. So the notion of groupoid is an extension
of that of groups. It gives an additional convenience, flexibility and range of applications...
EXAMPLE 1. A disjoint union [of groups] G = ∪λ G λ , λ ∈ Λ, is a groupoid: the product
ab is defined if and only if a, b belong to the same G λ , and ab is then just the product in the
group G λ . There is an identity 1λ for each λ ∈ Λ. The maps α, β coincide and map G λ to
λ, λ ∈ Λ.
EXAMPLE 2. An equivalence relation R on [a set] X becomes a groupoid with α, β :
R → X the two projections, and product (x, y)(y, z) = (x, z) whenever (x, y), (y, z) ∈ R.
There is an identity, namely (x, x), for each x ∈ X ...
the set of all points in M with f (x) = a. If M is compact, then the whole manifold
can be decomposed into such slices in a canonical fashion between two limits, defined
by the minimum and maximum of f on M. Let the part of M below a be defined as

M_a = f^{-1}((-\infty, a]) = \{x \in M : f(x) \le a\}

These sets describe the whole manifold as a varies between the minimum and
maximum of f .
Morse functions are defined as a particular set of smooth functions f : M → R as
follows. Suppose a function f has a critical point xc , so that the derivative d f (xc ) =
0, with critical value f (xc ). Then f is a Morse function if its critical points are
nondegenerate in the sense that the Hessian matrix J of second derivatives at xc ,
whose elements, in terms of local coordinates are
Ji, j = ∂ 2 f /∂ x i ∂ x j ,
has rank n, which means that it has only nonzero eigenvalues, so that there are no
lines or surfaces of critical points and, ultimately, critical points are isolated.
The index of the critical point is the number of negative eigenvalues of J at xc .
A level set f −1 (a) of f is called a critical level if a is a critical value of f , that
is, if there is at least one critical point xc ∈ f −1 (a).
Again following Pettini (2007), the essential results of Morse theory are as follows:
1. If an interval [a, b] contains no critical values of f , then the topology of f −1 [a, v] does not
change for any v ∈ (a, b]. Importantly, the result is valid even if f is not a Morse function,
but only a smooth function.
2. If the interval [a, b] contains critical values, the topology of f −1 [a, v] changes in a manner
determined by the properties of the matrix J at the critical points.
3. If f : M → R is a Morse function, the set of all the critical points of f is a discrete subset
of M, i.e., critical points are isolated. This is Sard’s Theorem.
4. If f : M → R is a Morse function, with M compact, then on a finite interval [a, b] ⊂ R,
there is only a finite number of critical points p of f such that f ( p) ∈ [a, b]. The set of
critical values of f is a discrete set of R.
5. For any differentiable manifold M, the set of Morse functions on M is an open dense set
in the set of real functions of M of differentiability class r for 0 ≤ r ≤ ∞.
6. Some topological invariants of M, that is, quantities that are the same for all the manifolds
that have the same topology as M, can be estimated and sometimes computed exactly once
all the critical points of f are known: let the Morse numbers μ_i (i = 0, ..., m) of a function f on M be the number of critical points of f of index i (the number of negative eigenvalues of J). The Euler characteristic of M can then be written as the alternating sum

\chi = \sum_{i=0}^{m} (-1)^i \mu_i

which reduces, for a simple polyhedron, to

\chi = V - E + F

where V, E, and F are the numbers of vertices, edges, and faces in the polyhedron.
7. Another important theorem states that, if the interval [a, b] contains a critical value of f
with a single critical point xc , then the topology of the set Mb defined above differs from
that of Ma in a way which is determined by the index, i, of the critical point. Then Mb is
homeomorphic to the manifold obtained from attaching to Ma an i-handle, i.e., the direct
product of an i-disk and an (m − i)-disk.
Matsumoto (2002) and Pettini (2007) provide details and further references.
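As a toy illustration of these definitions (ours, using an arbitrary polynomial), critical points and their indices can be computed symbolically:

import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 - 3*x + y**2          # a Morse function on R^2
grad = [sp.diff(f, v) for v in (x, y)]
hess = sp.hessian(f, (x, y))

for crit in sp.solve(grad, [x, y], dict=True):
    J = hess.subs(crit)                               # Hessian at the critical point
    index = sum(1 for ev in J.eigenvals() if ev < 0)  # count negative eigenvalues
    print(crit, 'index =', index)
# (x=1, y=0): index 0, a minimum; (x=-1, y=0): index 1, a saddle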
to hold for s sufficiently small. The idea is to take a distortion measure as a kind of
Finsler metric, imposing resulting ‘global’ geometric structures for an appropriate
class of non-ergodic information sources. Possible interesting theorems, then, revolve
around what properties are metric-independent, in much the same manner as the Rate
Distortion Theorem is independent of the exact distortion measure chosen.
This sketch can be made more precise.
Take a set of ‘consonant’ paths x n → x, that is, paths consistent with the ‘gram-
mar’ and ‘syntax’ of the information source dual to the cognitive process of interest.
Suppose, for all such x, there is an open set, U , containing x, on which the
following conditions hold:
(i) For all paths x̂ n → x̂ ∈ U , a distortion measure s n ≡ dU (x n , x̂ n ) exists.
(ii) For each path x n → x in U there exists a pathwise invariant function H (x n ) →
H (x), in the sense of Khinchin (1957, p.72). While such a function will almost
always exist, only in the case of an ergodic information source does it have the
mathematical form of an ‘entropy’ (Khinchin 1957). It can, however, in the sense
of Feynman (2000), still be characterized as homologous to free energy, since
Bennett’s elegant little machine (Feynman 2000) can still turn the information
in a message from a nonergodic information source into work.
(iii) A function MU (s n , n) ≡ Mn → M exists, for example,
and so on.
(iv) The limit

\lim_{n \to \infty} \frac{H(x^n) - H(\hat{x}^n)}{M_n} \equiv dH/dM   (A.9)

exists.
using an appropriate integration limit argument. The second integration is over dif-
ferent paths within A itself, while the first is between different paths in A and Â.
Equation (1.48) states that the ‘free energy’ F and the correlation length, the degree
of coherence on the underlying network, scale under renormalization clustering in
chunks of size L as
F[K_L, J_L]/f(L) = F[K, J]

\chi[K_L, J_L]\, L = \chi[K, J]

K_L = K_C + (K - K_C) L^z f(L)^y   (A.13)

K_C/2 \approx K_C + (K - K_C) L^z f(L)^y   (A.14)
We iterate in two steps, first solving this for f (L ) in terms of known values,
and then solving for L , finding a value LC that we then substitute into the first
of Eq. (1.48) to obtain an expression for F[K , 0] in terms of known functions and
parameter values.
The first step gives the general result

f(L_C) \approx \frac{[K_C/(K_C - K)]^{1/y}}{2^{1/y}\, L_C^{z/y}}   (A.15)

Solving this for L_C and substituting into the first expression of Eq. (1.48) gives, as a first iteration of a far more general procedure (Shirkov and Kovalev 2001), the result

F[K, 0] \approx \frac{F[K_C/2, 0]}{f(L_C)} = \frac{F_0}{f(L_C)}

\chi(K, 0) \approx \chi(K_C/2, 0)\, L_C = \chi_0 L_C   (A.16)
The simplest choice is

f(L) = L^\delta   (A.17)

where δ > 0 is a real number which may be quite small, and we can solve for L_C, obtaining

L_C = \frac{[K_C/(K_C - K)]^{1/(\delta y + z)}}{2^{1/(\delta y + z)}}   (A.18)
for K near K C . Note that, for a given value of y, one might characterize the relation
α ≡ δy + z = constant as a ‘tunable universality class relation’ in the sense of Albert
and Barabasi (2002).
Substituting this value for LC back gives a complex expression for F, having
three parameters: δ, y, z.
A more interesting choice for f(L) is a logarithmic curve that 'tops out', for example

f(L) = m \log(L) + 1   (A.19)

Again f(1) = 1. A late version of the computer algebra program Mathematica solves for L_C as

L_C = \left[\frac{Q}{\mathrm{LambertW}[Q \exp(z/my)]}\right]^{y/z}   (A.20)

where

Q \equiv (z/my)\, 2^{-1/y}\, [K_C/(K_C - K)]^{1/y}

The Lambert W-function arises in the theory of random networks and in renormalization strategies for quantum field theories.
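Equation (A.20) is straightforward to evaluate with scipy's lambertw; a quick numerical sketch (ours, with arbitrary illustrative parameter values) is:

import numpy as np
from scipy.special import lambertw

def L_C(K, K_C=1.0, y=1.0, z=1.0, m=1.0):
    # Evaluate Eq. (A.20) for the logarithmic f(L) = m log(L) + 1
    Q = (z / (m * y)) * 2**(-1.0 / y) * (K_C / (K_C - K))**(1.0 / y)
    return float(np.real((Q / lambertw(Q * np.exp(z / (m * y))))**(y / z)))

for K in [0.5, 0.9, 0.99]:
    print(K, round(L_C(K), 3))  # L_C grows as K approaches K_C from below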
An asymptotic relation for f(L) would be of particular interest, implying that 'computational richness' increases to a limiting value with system growth. Taking, for example,

f(L) = \exp[m(L - 1)/L]   (A.21)

gives a system which begins at 1 when L = 1, and approaches the asymptotic limit exp(m) as L → ∞. Mathematica finds

L_C = \frac{my/z}{\mathrm{LambertW}[A]}   (A.22)

where

A \equiv (my/z) \exp(my/z)\, \left[2^{1/y} [K_C/(K_C - K)]^{-1/y}\right]^{y/z}
Applying these latter results to the Zurek calculation on fragment size, Eq. (1.53),
has yet to be done.
In contrast to the empirical methodology of Chap. 6, where equations must fit data,
evolutionary game theory is supposed to provide both a necessary and sufficient
model of evolutionary dynamics. The underlying formalism is the replicator equation
of Taylor and Jonker (1978). We follow the presentation of Roca et al. (2009).
Given an evolutionary game with a payoff matrix W, the dynamics of the distribution of strategy frequencies x_i, as elements of a vector x, follow the relation

\frac{dx_i}{dt} = x_i\left[(Wx)_i - x^T W x\right]   (A.23)
The term x^T W x ensures that \sum_i x_i = 1 is preserved. The implications are then derived by recourse to dynamical systems theory. An appropriate change of variables converts the equation to a system of the Lotka-Volterra type.
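A minimal numerical sketch of Eq. (A.23) (ours, with an illustrative Hawk-Dove payoff matrix) shows convergence to the interior equilibrium:

import numpy as np

def replicator_step(x, W, dt=0.01):
    # One Euler step of Eq. (A.23): dx_i/dt = x_i [(W x)_i - x^T W x]
    fitness = W @ x
    return x + dt * x * (fitness - x @ fitness)

# Hawk-Dove payoffs with V = 2, C = 3: mixed equilibrium at x_hawk = 2/3
W = np.array([[-0.5, 2.0],
              [ 0.0, 1.0]])
x = np.array([0.1, 0.9])      # initial strategy frequencies
for _ in range(5000):
    x = replicator_step(x, W)
print(x.round(3))             # approaches [0.667, 0.333]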
References
Albert, R., and A. Barabasi. 2002. Statistical mechanics of complex networks. Reviews of Modern
Physics 74: 47–97.
Atlan, H., and I. Cohen. 1998. Immune information, self-organization and meaning. International
Immunology 10: 711–717.
Black, F., and M. Scholes. 1973. The pricing of options and corporate liabilities. Journal of Political
Economy 81: 637–654.
Brown, R. 1987. From groups to groupoids: A brief survey. Bulletin of the London Mathematical
Society 19: 113–134.
Cover, T., and J. Thomas. 2006. Elements of information theory, 2nd ed. New York: Wiley.
Feynman, R. 2000. Lectures in computation. Boulder CO: Westview Press.
Glazebrook, J.F., and R. Wallace. 2009. Rate distortion manifolds as model spaces for cognitive
information. Informatica 33: 309–346.
Khinchin, A. 1957. Mathematical foundations of information theory. New York: Dover Publications.
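Matsumoto, Y. 2002. An introduction to Morse theory. Providence, RI: American Mathematical Society.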
Pettini, M. 2007. Geometry and topology in Hamiltonian dynamics. New York: Springer.
Protter, P. 1990. Stochastic integration and differential equations. New York: Springer.
Roca, C., J. Cuesta, and A. Sanchez. 2009. Evolutionary game theory: Temporal and spatial effects
beyond replicator dynamics. Physics of Life Reviews 6: 208–249.
Shirkov, D., and V. Kovalev. 2001. The Bogoliubov renormalization group and solution symmetry
in mathematical physics. Physics Reports 352: 219–249.
Taylor, P., and L. Jonker. 1978. Evolutionarily stable strategies and game dynamics. Mathematical
Biosciences 40: 145–156.
Wallace, R., and M. Fullilove. 2008. Collective consciousness and its discontents. New York:
Springer.
Weinstein, A. 1996. Groupoids: unifying internal and external symmetry. Notices of the American Mathematical Society 43: 744–752.
Wilson, K. 1971. Renormalization group and critical phenomena. I. Renormalization group and the Kadanoff scaling picture. Physical Review B 4: 3174–3183.