A Short Course in Mathematical Neuroscience (Revised 7/2015)

A Short Course in Mathematical Neuroscience
(revised 7/2015).
Philip Eckhoff and Philip Holmes,

with contributions by
Kolia Sadeghi, Michael Schwemmer and KongFatt Wong-Lin.
Program in Applied and Computational Mathematics,

Princeton University.
July 9, 2015
Preface
This book has two main aims: to teach how mathematical models that illuminate some parts of
neuroscience can be constructed, primarily by describing both “classic” and more recent examples;
and to explain mathematical methods by which these models can be analyzed, thus yielding pre-
dictions and explanations that can be brought to bear on experimental data. Theory is becoming
increasingly important in neuroscience [1], not least because recent developments in optogenetics,
imaging and other experimental methods are creating very big data. Computational approaches
already play a substantial rôle [59, 60], and modelers are becoming more ambitious: the Euro-
pean Blue Brain project [184] (http://bluebrain.epfl.ch/) proposes to simulate all the cells
and most of the synapses in an entire brain, thereby hoping to “challenge the foundations of our
understanding of intelligence and generate new theories of consciousness.”
Our goal is more modest: to show how relatively simple mathematical models and their analyses
can aid understanding of some parts of brains and central nervous systems. The choice of the
adjective “mathematical” here and in this book’s title reflects our strategy and biases. Scientific
theories are often expressed via mathematics (especially in physics and chemistry), but they are
increasingly applied and “solved” by numerical computation. This works well if the models are
based in physics and well tried and tested, as in classical and quantum mechanics. Even though
we cannot solve the Navier-Stokes equations explicitly (and although a full proof of existence and
uniqueness of solutions is lacking1 ), engineers can use simulations of transonic flow with confidence
in airplane design. In contrast, although good biophysically-based models of single cells and small
circuits exist (including some described here), it is not clear that massive simulations of large neural
and biomechanical networks will reveal brain and neuromuscular function at the levels required for
understanding. We believe that the discipline, focus and constraints implicit in mathematical
analysis can lead to models that identify key features at appropriate spatial and temporal scales,
and perhaps relate models across the scales. Moreover, since relatively little neuro-theory exists,
and we wish to teach some general principles, we favor simple models and do not attempt to state
general theories.
This book grew from notes written for a course developed at Princeton University, directed at
junior and senior undergraduates and beginning graduate students, that has been taught biennially
since 2006. Minimal prerequisites for the course, and reading the book, are multivariable calculus
and linear algebra, including eigenvalue problems; familiarity with separable first order and linear
second order ordinary differential equations is also helpful. The first ten chapters of Hirsch, Smale
and Devaney’s textbook [121] provides a good background for this material.
1
See: http://www.claymath.org/millennium-problems/navier%E2%80%93stokes-equation
i
Our collective experience indicates that the substance of Chapters 2-6 can be covered in 24
lectures each of one hour and 20 minutes. Much of Chapter 1 and some subsequent verbally-
descriptive passages may be relegated to self study, and the notes supplemented by reading of a
few original sources such as the 1952 paper of Hodgkin and Huxley [126]. The focus has changed
from year to year, but a typical schedule has been: Chapter 1: 1-2 lectures; Chapter 2: 5 lectures;
Chapter 3: 4 lectures; Chapter 4: 5 lectures; Chapter 5: 4 lectures; Chapter 6: 4-5 lectures. We set
six homework assignments, the fourth serving as a “take-home” midterm exam, and asked students
to write a review or research paper in place of a final exam. Homework problems were drawn
from the Exercises scattered strategically throughout the text and also taken from Hugh Wilson’s
textbook [277], which is currently out of print but available for download.
When we began writing notes, with the exceptions of [277] and Dayan and Abbott [58], few
textbooks on mathematical or theoretical neuroscience were available. Wilson’s book covers similar
material to our chapters 3-4, with more examples, and emphasizes nonlinear dynamical models of
the type introduced in chapter 2. Dayan and Abbott cover a broader range, including coding and
decoding, neurons and networks, and models of learning, but without describing dynamical models
or their analyses in depth. They provide more detail on information-theoretic and probabilistic
models, including Bayesian methods, than our brief discussions in chapter 5. The monograph of
Rieke et al. [219] also covers these areas well (we refer to it extensively in chapter 5). These books,
published over 14 years ago, have now been joined by the textbooks of Izhikevich [141], Ermentrout
and Terman [75] and Gabbiani and Cox [86]. The first two cover nonlinear dynamical systems
methods and models at a more advanced level than the present book, the former focusing on single
cells and the latter encompassing neural networks and systems neuroscience. Gabbiani and Cox
also probe the behaviour of single cells and synapses in far greater detail than we do and introduce
a wider range of mathematical methods, including Fourier series and transforms, the singular value
decomposition, and power spectra. These three books, all significantly longer than ours, would be
suitable for readers wishing to continue beyond our short course.
ii
Contents
1 A very short introduction to (mathematical) neuroscience 1
1.1 An historical sketch of nerves and their mathematics . . . . . . . . . . . . . . . . . . 2
1.2 Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Action potentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Synapses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Neural coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 The central nervous system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 The cerebral cortex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5.1 Discovery of organization of the cerebral cortex . . . . . . . . . . . . . . . . . 11
1.5.2 Functional organization of the cerebral cortex . . . . . . . . . . . . . . . . . . 14
1.6 The peripheral nervous system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.7 A note on experimental methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2 Mathematical Tools: ODEs, Numerical Methods and Dynamical Systems 18
2.1 Linearization and solving linear systems of ODEs . . . . . . . . . . . . . . . . . . . . 19
2.1.1 Stability of solutions of ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.2 Liapunov functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Numerical methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
iii
2.2.1 Euler’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.2 The second-order Euler-Heun method . . . . . . . . . . . . . . . . . . . . . . 31
2.2.3 The Runge-Kutta method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2.4 Newton’s method for finding zeros . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3 Geometric theory of dynamical systems . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3.1 Stable and unstable manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.2 Planar systems of ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3.3 Center manifolds and local bifurcations . . . . . . . . . . . . . . . . . . . . . 41
2.3.4 Periodic orbits and Poincaré maps . . . . . . . . . . . . . . . . . . . . . . . . 51
2.4 An introduction to neural networks as nonlinear ODEs . . . . . . . . . . . . . . . . . 53
3 Models of single cells 60
3.1 Modeling strategies and scales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2 Ion channels and membranes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.3 The Hodgkin-Huxley (H-H) equations . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.4 Two-dimensional simplifications of the H-H equations . . . . . . . . . . . . . . . . . 75
3.5 Bursting neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.6 Propagation of action potentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.6.1 Traveling waves in a simple PDE . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.6.2 Traveling waves in the FitzHugh-Nagumo equation . . . . . . . . . . . . . . . 89
3.7 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4 Synaptic connections and small networks 92
4.1 Synapses and gap junctions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.1.1 Electrical Synapses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
iv
4.1.2 Chemical Synapses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.2 Integrate-and-fire models of neurons and synapses . . . . . . . . . . . . . . . . . . . 98
4.2.1 Integrate-and-fire models for a single cell . . . . . . . . . . . . . . . . . . . . 98
4.2.2 Integrate-and-fire models with noisy inputs . . . . . . . . . . . . . . . . . . . 100
4.2.3 Implementation of synaptic inputs to IF models . . . . . . . . . . . . . . . . 103
4.2.4 A pair of coupled integrate-and-fire neurons . . . . . . . . . . . . . . . . . . . 109
4.2.5 A final note on neuronal modeling . . . . . . . . . . . . . . . . . . . . . . . . 113
4.3 Phase reductions and phase oscillator models of neurons . . . . . . . . . . . . . . . . 116
4.3.1 Phase response or resetting curves . . . . . . . . . . . . . . . . . . . . . . . . 116
4.3.2 Weakly coupled phase oscillators and averaging theory . . . . . . . . . . . . . 121
4.3.3 Phase models and half-center oscillators . . . . . . . . . . . . . . . . . . . . . 122
4.4 Central Pattern Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.4.1 A CPG model for lamprey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.4.2 A CPG model for insect locomotion . . . . . . . . . . . . . . . . . . . . . . . 125
5 Probabilistic methods and information theory 131
5.1 Stimulus to response maps: Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.1.1 Description of spike trains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.1.2 A primer on probability theory . . . . . . . . . . . . . . . . . . . . . . . . . . 134
5.1.3 Spike-triggered averages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.1.4 A Poisson model for firing statistics . . . . . . . . . . . . . . . . . . . . . . . 139
5.2 Response to stimulus maps: Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.3 Information theoretic approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
5.3.1 Entropy and missing information . . . . . . . . . . . . . . . . . . . . . . . . . 148
v
5.3.2 Entropy of spike trains and spike counts . . . . . . . . . . . . . . . . . . . . . 149
5.3.3 Relative entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.3.4 Mutual information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6 Stochastic models of decision making 158
6.1 Two-alternative forced-choice tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.1.1 Neural data from the moving dots task . . . . . . . . . . . . . . . . . . . . . . 159
6.1.2 A neural network model for choosing: leaky competing accumulators . . . . . 160
6.2 Derivation of firing rate equations from spiking neuron models . . . . . . . . . . . . 163
6.3 The best way to choose between two noisy signals . . . . . . . . . . . . . . . . . . . . 170
6.3.1 The sequential probability ratio test . . . . . . . . . . . . . . . . . . . . . . . 171
6.3.2 Random walks and the continuum limit of SPRT . . . . . . . . . . . . . . . . 173
6.4 Introduction to stochastic differential equations . . . . . . . . . . . . . . . . . . . . . 175
6.4.1 Change of variables: Itō’s lemma . . . . . . . . . . . . . . . . . . . . . . . . . 176
6.4.2 The forward Kolmogorov or Fokker-Planck equation . . . . . . . . . . . . . . 177
6.4.3 First passage problems and the backward Kolmogorov equation . . . . . . . . 182
6.4.4 Numerical integration: the Euler-Maruyama method . . . . . . . . . . . . . . 184
6.5 A return to two-alternative forced-choice tasks . . . . . . . . . . . . . . . . . . . . . 185
6.5.1 The free response protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
6.5.2 The interrogation protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
6.6 Optimal performance on a free response task: analysis and predictions . . . . . . . . 187
vi
Chapter 1
A very short introduction to

(mathematical) neuroscience
The nervous system has been studied for millennia, and current progress is faster than ever,
largely due to advances in experimental methods, although as noted in the Preface, theoretical work
is playing an increasingly important rôle [1]. We begin with a very brief history, pointing out that
mathematics entered the picture in the mid 20th century. We then describe the basic components –
neurons and their synaptic connections – before jumping up in size to the organization of the central
nervous system, brain and the peripheral sensory system, emphasizing human physiology. We end
with a sketch of experimental methods. Much of the material follows Dayan and Abbott [58]
and Kandel et al. [149]. Readers wishing to establish firm foundations in neuroscience should
consult these and other textbooks and take an introductory course, preferably with a laboratory
component. Neuroscience is a very rich field, drawing on many areas of biology, chemistry and
physics, and spanning molecular to organismal spatial scales and milliseconds to lifetimes.
Chapter 2 provides an introduction to nonlinear dynamical systems, aka sets of deterministic

ordinary differential equations (ODEs). We start by showing how and under what conditions solu-
tions of nonlinear ODEs can be approximated by linearizing them near fixed points, provide a rapid
review of linear, constant coefficient ODEs, define different notions of stability, and describe meth-
ods for determining stability and instability of fixed points. We then present numerical methods for
approximating solutions of ODEs before returning to nonlinear systems to describe the geometric
structures that organize the full set of solutions in their state spaces. Throughout, we illustrate
with simple examples and models drawn from neuroscience.
Chapters 3 and 4 address the biophysical properties and behavior of neurons, introducing the
Hodgkin-Huxley ODEs that model the generation and propagation of action potentials (APs or
spikes) and presenting simplified versions of these equations. We then discuss gap junctions and
synapses, and much simplified integrate-and-fire (I-F) models that replace the detailed ion channel
dynamics responsible for action potentials by delta functions (instantaneous spikes). I-F models are
often used in computational studies of neural networks. We also show how networks of periodically
spiking or bursting neurons can be reduced to sets of phase oscillators, in which a single phase
(timing) variable replaces voltage and ionic channel dynamics in each cell or group of cells. These
1
methods are illustrated by models of central pattern generators, networks located in the spine in
vertebrates and thorax in insects that can produce rhythmic locomotive and other behaviors in
isolation.
In chapter 5 we turn to probabilistic methods, introducing basic ideas from probability theory
motivated by analyses of spike train data. We describe simple models for encoding stimuli and
decoding spike trains, and show how tools from information theory can be applied to spike train
data. Finally, chapter 6 moves from cells and networks to brain areas to consider simple models
of two-alternative decisions, drawing on behavioral data and neural recordings from humans and
primates. Here probabilistic elements due to noisy stimuli and intrinsic neural variability combine
with deterministic dynamics and the models are stochastic differential equations.
1.1 An historical sketch of nerves and their mathematics
The Greek physician Galen thought that the brain was a gland, from which nerves carry fluid
to the extremities. In the mid-nineteenth century, it was discovered that electrical activity in a
nerve has predictable effects on neighboring neurons. Camillo Golgi and Santiago Ramon y Cajal
made the first detailed descriptions of nerve cells in the late nineteenth century, and one of Cajal’s
drawings is reproduced in Fig. 1.1. Ross Harrison discovered that the axon and dendrites grow from
the cell body in isolated culture [114]; also see [151]. Pharmacologists discovered that drugs affect
cells by binding to receptors, which opened the door to the discovery of chemical communication
between neurons at synapses.
Figure 1.1: A neuron drawn by Ramon y Cajal. (expasy.org).
Over the past century neuroscience has now grown into a broad and diverse field. Molecu-
lar neuroscience studies the detailed structure and dynamics of neurons, synapses and small net-
works; systems neuroscience studies larger-scale networks that perform tasks and interact with
other such networks (or brain areas) to form pathways for higher-level functions; and cognitive
2
neuroscience studies the relationship between the underlying physiology (neural substrates) and
behavior, thought and cognition.
Mathematical treatments of the nervous system began in the mid 20th century. One of the first
examples is the book of Norbert Wiener, based on work done with the Mexican physiologist Arturo
Rosenblueth, and originally published in 1948 [275]. Weiner introduced ideas from dissipative
dynamical systems, symmetry groups, statistical mechanics, time series analysis, information theory
and feedback control. He also discussed the relationship between digital computers (then in their
infancy) and neural circuits, a theme that John von Neumann subsequently addressed in a book
written in 1955-57 and published in the year after his death [262]. In fact, while developing one
of the first programmable digital computers (JONIAC, built at the Institute for Advanced Study
in Princeton after the second World War), von Neumann had “tried to imitate some of the known
operations of the live brain” [262, see the Preface by Klara von Neumann]. It is also notable that,
in developing cybernetics, Wiener drew heavily on von Neumann’s earlier works in analysis, ergodic
theory, computation and game theory, as well his own studies of Brownian motion (now known as
Wiener processes). We will describe some of these ideas and methods in chapters 5 and 6.
These books [275, 262] were directed at the brain and nervous system in toto, although much of
the former was based on detailed experimental studies of heart and leg muscles in animals. The first
cellular-level mathematical model of a single neuron was developed in the early 1950’s by the British
physiologists Alan Hodgkin and Andrew Huxley [126]. This work, which won them the Nobel Prize
in Physiology in 1963, grew out of a long series of experiments on the giant axon of the squid Loligo
by themselves and others, as noted in chapter 3 (also see Huxley’s obituary [177]). Since their
pioneering work, mathematical neuroscience has grown into a subdiscipline, served worldwide by
courses short and long, a growing list of textbooks (e.g. [277, 58, 141, 150, 75, 86]) noted in the
Preface, and review articles such as [272, 158, 187, 61]. The number of mathematical models must
now exceed the catalogue of brain areas by several orders of magnitude. In this short book we can
present only a few examples, inevitably biased toward our own interests, but we hope that it will
prepare you to learn more, and to contribute to this exciting field.
Before beginning our rapid tour through the physiology of the nervous system, some remarks
about mathematical models are necessary. Although interest in theory and modeling is growing,
neuroscience is still dominated by experimental methods and findings. Most models are developed
to explain and/or to predict experimental observations. To do so they must be validated by
recreating, at least qualitatively, behavior observed experimentally. Models can be of two broad
types: empirical (also called descriptive or phenomological ), or mechanistic. Empirical models
ignore the (perhaps unknown) anatomy and physiology, and attempt to reproduce input-output
(or stimulus-response) relationships of the biological system under study. Mechanistic models
attempt to describe physiological and anatomical features in some detail, and reproduce observed
behaviors by appropriate choice of model components and the parameters characterizing them,
thereby revealing mechanisms responsible for those behaviors. Many models are not so easily
classifiable, and models can also occupy all kinds of places in a continuum from molecular to
organismal scales. We shall have more to say about this in §3.1. Most models in neuroscience and
mathematical biology strive for quantitative accuracy, but this remains more elusive than is typical
in the physical sciences.
3
1.2 Neurons
The nervous system consists of the peripheral and central nervous systems, which themselves
can be further subdivided and studied. The neuron is the building block for the entire system, and
so, following the reductive method that has been so successful in the physical sciences, we shall
begin with a single cell, and build up to larger systems.
Figure 1.2: Photograph of neurons from an ox spinal cord smear, with an H & E stain, magnified
100X. (dmacc.edu).
We start with some basic anatomy and architecture. Neurons are cells that can carry signals
rapidly over distance (albeit at speeds much less that that of light!). These signals are action
potentials (APs, or spikes): relatively large (O(100) mV)1 fluctuations in the voltage across the
cell membrane. APs propagate down the axon: an extension of the cell that forms a relatively long
cable, compared to the size of the soma, or central cell body. Neurons exhibit significant variety
in shape and size, but all have the same basic features: the soma, dendrites, which are the shorter
and more numerously branching extensions2 that receive signals from other neurons, and the axon,
which carries signals to other neurons, and may also be branched. Principal (or projection), neurons
are often excitatory, and carry signals over long distances from one region of the central nervous
system (CNS) to another or from one processing region of the brain to another. Cortical pyramidal
neurons, examples of projection neurons, are the primary excitatory neurons of the cerebral cortex,
the brain region where most cognitive functions are located. Interneurons have shorter axons and
usually provide inhibitory inputs. Fig. 1.2 shows neurons in a vertebrate spinal cord.
The dendritic tree (the branching pattern of dendrites) receives an average of 2 inputs/µm,
and on average each neuron has 4 mm of dendritic tree. The type and number of inputs varies
significantly, and some neurons receive over 105 inputs. An axon has 180 connections/mm length
on average, and a typical mouse neuron has a 40 mm axon. The longest axons in the human can
exceed 1 m (extending from motoneurons in the spinal cord to muscles in the feet). Each cell can
1
The notation O(n) means, of the order n, a rigorous definition is provided in §2.1. Here we use it informally to
mean “within an order of magnitude.”
2
Biologists refer to dendrites and axons as processes: confusing terminology for a mathematician!
4
Figure 1.3: Schematic of parts of a neuron. (college.hmco.com/psychology/bernstein).
synapse with thousands of others (O(103 )) and some brainstem cells make O(105 ) connections.
The human brain contains O(1011 ) neurons and O(1014 ) connections. A diagram of the major
components of a neuron is given in Fig. 1.3, and Fig. 1.4 shows a sample of neuron types, which
can vary widely in geometry and types of ionic channels.
1.2.1 Action potentials
The membrane that encloses the neuron contains a large number of ion channels, pores that
allow charged ions to pass across it. Most are selective for a specific ion, with N a+ , K + , Ca2+ , and
Cl− the most common. These pores control the flow of each ion by opening and closing in response
both to the voltage across the membrane and to internal or external signals. The membrane voltage
(or more correctly, trans-membrane voltage) – the difference between the electrical potential inside
and outside the cell – is the key variable. A typical resting potential (the potential inside the neuron
relative to the extracellular fluid measured when the neuron is not spiking) is approximately -70
mV. An action potential (AP) or spike is a fluctuation on the order of 100 mV that lasts about a
millisecond. The resting potential is maintained by ion pumps, which remove N a+ from the cell
and bring K + into the cell. Apart from the pumps, ions flow along concentration gradients from
high to low (provided their channels are open), and along voltage gradients, negative ions moving
towards higher potential, and positive ions towards lower potential. A positive current is defined
outward, and consists of positive ions flowing out or negative ions flowing in. A positive current
hyperpolarizes the cell (makes the voltage more negative), and a negative inward current depolarizes
it (makes the voltage more positive)3 .
3
More confusing terminology.
5
Figure 1.4: Some different types of neurons, classified primarily by their morphology or location.
(students.uni-marburg.de).
6
Depolarization leads to spikes, and if the depolarization is large enough, the cell will sponta-
neously spike, with the spike voltage fluctuation often at least an order of magnitude larger than
the original depolarization. The spike is typically caused by N a+ channels opening quickly and
allowing an inrush of N a+ ions, which raises the voltage and creates the spike, followed by K +
channels opening, allowing an outflow K + , while the N a+ channels close, which drops the mem-
brane voltage below resting potential. While the pumps work to reestablish concentration gradients
which enable the spike, there is a short period of time, the absolute refractory period, during which
the cell cannot spike, followed by a relative refractory period as ion concentrations recover, during
which it is very difficult, but not impossible, to spike.
Small changes in voltage insufficient to trigger a spike are called subthreshold potentials. These
are strongly attenuated along the axon and usually cannot be detected even 1 mm away from the
soma. Action potentials, on the other hand, regenerate along the axon and travel long distances
without attenuation. Wave propagation in this active medium, accompanied by waves of ionic
transport across the cell membrane, is quite different from propagation through a passive medium
like that of sound waves in still air and electrical pulses in wires. Some cells (e.g., bipolar cells in
the retina) typically exhibit only non-spiking subthreshold oscillations.
1.2.2 Synapses
Neurons communicate via synapses (sometimes called chemical synapses) and direct electrical
contacts called gap junctions (electrical synapses). Gap junctions influence the voltages of neigh-
boring cells much as if they were connected by a simple electrical resistor.
Synapses are points at which an axon of the presynaptic cell is in close contact with a dendrite
of the postsynaptic cell. When an AP arrives at a synapse, channels in the presynaptic terminal
open to allow an influx of Ca2+ from the extracellular medium. The increased internal calcium
concentration causes vesicles that contain neurotransmitter molecules to fuse with the cell mem-
brane, thereby releasing neurotransmitter molecules that diffuse across the synaptic cleft. The other
(postsynaptic) side of the synapse is a dendritic spine (a projection from a dendrite). The neuro-
transmitters open ion channels on the spine, thereby causing currents that either excite (depolarize)
or inhibit (typically hyperpolarize) the post-synaptic neuron. A given synapse is either excitatory or
inhibitory and the resulting signals are called excitatory (resp. inhibitory) post-synaptic potentials,
or EPSPs (resp. IPSPs) for short.
1.3 Neural coding
Neural coding is a major field in itself, and in chapter 5 we will present some key questions and
the mathematics used to address them. The problem has two components: encoding and decoding.
Close your eyes and run a finger along a rough surface. What happens as you ‘feel’ the texture
and try to identify the material? Force and displacement sensors responding to continuously vary-
ing signals (stimuli) cause neurons in the peripheral nervous system to spike, conveying discrete
7
sequences of APs to the somatosensory cortex (see §§1.5-1.6). From these sequences your brain ‘re-
constructs’ the surface. In parallel with this process, encoding tries to understand the production of
spike sequences produced by a neuron in response to a stimulus. Remember that individual spikes
are separated by refractory periods, and the stimulus can be changing rapidly, perhaps on the order
of the interspike interval. Decoding attempts to recover the stimulus from a spike sequence. Given
the variability in neurons and the sparsity of spikes compared to changes in the stimulus, encoding
and decoding are often studied in terms of firing rates, which translate into the probability of a
spike occurring in a small time interval for an individual neuron, or into the average firing rate at a
given time for a population of neurons receiving a common stimulus. However, there is increasing
emphasis on the role of individual spike timing [219].
1.4 The central nervous system
The central nervous system (CNS) is bilateral and has seven major divisions: the spinal cord, the
medulla oblongata, pons and midbrain (which form the brain stem), the cerebellum, diencephalon,
and the cerebral hemispheres: see Fig. 1.5.
Figure 1.5: The central nervous system. (emedicinehealth.com).
The anatomy of the CNS and and its divisions is referenced by axes with directions defined as
rostral, caudal, ventral, and dorsal. In most animals, the axes directions are consistent for each
division and for the CNS as a whole. Rostral means towards the front (head), caudal towards
the tail, ventral towards the belly, and dorsal towards the back. In a rat, for example, these axes
8
maintain orientation for the CNS as a whole, and for the cerebrum locally, as seen in Fig. 1.6. In
higher primates, there is an orientation change above the spinal cord, as seen in Fig. 1.7. For the
CNS as a whole and the spinal cord, rostral is towards the head, caudal towards the coccyx, ventral
towards the belly and dorsal the back. Above the spinal cord, and in the brain, rostral is rotated
and is towards the front, caudal towards the back of the head. Dorsal is towards the top and ventral
towards the bottom, although these two terms are often replaced by superior and inferior. For the
remainder of the course, we will try to use these four terms to indicate location and direction, but
their orientation depends on the system being discussed, and we may be inconsistent anyway, so
stay alert.
Figure 1.6: Central nervous system orientation and axes for a rat, typical of most vertebrates.
(fmrib.ox.ac.uk).
Figure 1.7: Central nervous system orientation and axes for humans, typical of higher primates.
(fmri.ox.ac.uk).
Spinal cord: The spinal cord synapses with nerves leading into the limbs and the trunk, allowing
control of their movement and reception of sensory information. A cross-section of spinal cord
reveals an ‘H’ of gray matter surrounded by white matter, as seen in Fig. 1.8. The ventral horn
of the gray matter (the leg of the ‘H’ pointing towards the belly) contains the cell bodies of motor
neurons that control muscles. The dorsal horn contains sensory neurons that receive inputs from
the trunk and limbs. The white matter is mostly axons that run up and down the spinal cord, the
white actually being the myelin covering the axons.
9
Figure 1.8: Cross section of a spinal cord, revealing gray matter and white matter. (nap.edu).
Brain stem: The brain stem lies at the rostral end of the spinal cord and consists of the medulla
oblongata, the pons, and the midbrain. As a whole, it controls the muscles of the face and head,
receives sensory signals from the face and head, conveys information between the brain and the
spinal cord, and regulates levels of arousal and awareness. The medulla oblongata governs vital
autonomic functions such as digestion, breathing, and heart rate. The pons is the link between the
cerebrum and the cerebellum. Finally, the midbrain contains sensory and motor function of the
head and face, controls eye movement, and is part of the pathway for visual and auditory reflexes.
Cerebellum: The cerebellum controls the force and range of movements and is important in
learning motor skills. The cerebellum actually contains more neurons than any other division of
the CNS, even the cerebrum.
Diencephalon: The diencephalon contains the thalamus and hypothalamus. The thalamus passes
sensory information to the cerebral cortex from the rest of the CNS, but it is more than a relay. It
has a gating and modulatory function, governing which sensory inputs receive conscious attention.
The hypothalamus regulates autonomic, endocrine, and visceral function, as well as many other
functions through its control of the pituitary gland. The hypothalamus is also related to motivation
and pursuing actions which are rewarding. The hypothalamus can cause the sensation of hunger, for
example, and is involved in the complicated process of arousal and selective attention. Dopaminergic
neurons in the midbrain are involved in perception of reward, and are also part of the larger attention
process. In this way, these regions exert strong modulatory effects on the entire CNS and body.
Cerebral hemispheres: The two cerebral hemispheres contain the cerebral cortex, which on each
side consists of four lobes (frontal, parietal, temporal, and occipital), the basal ganglia, hippocam-
pus, and amygdaloid nuclei (a nucleus refers to a cluster of neurons). The basal ganglia regulate
movement and are involved in cognitive processes such as learning skills. The hippocampus relates
to memory, although it is not a memory storage organ. It is responsible for formation of long-term
memories, and damage does not affect existing memories, but degrades the ability to make new
ones. The amygdaloid nuclei, or amygdala, governs autonomic and endocrine responses to emo-
tional states. It discerns any emotional and motivational component to sensory stimuli, and its
projection to the brain stem allows it to affect the body. For instance, when experiencing fear, the
amygdala is part of the pathway that results in increased heart rates. The cerebral cortex is very
complicated and will be treated in its own section.
10
1.5 The cerebral cortex
As in §1.2 we start with some anatomy. The cerebral cortex is the furrowed ‘gray matter’ which
allows cognitive function. It also has a bilateral structure and each hemisphere is divided into four
lobes. The frontal lobe is an essential part of the pathway for motor control, and it is concerned with
future actions. The parietal lobe processes somatic sensation, which also involves understanding
the current position and orientation of the body and its parts and the relation of the body to space.
The occipital lobe is mostly occupied with processing vision. The temporal lobe processes auditory
inputs, houses the hippocampus and amygdaloid nuclei, and is part of visual memory. The lobes
of the brain are diagrammed in Fig. 1.9.
The two hemispheres are connected by the corpus collosum, a thick bundle of mostly white
matter which allows information to pass between hemispheres. The surface of the cortex has
many distinctive grooves and ridges, which most people think of when they think of ‘brains.’ The
grooves are called sulci and the ridges gyri. Most cognitive processes reside in the cortex, which is
typically only 2 to 4 mm thick. Therefore, the number of neurons available for cognitive processes
is proportional to cortical area. The numerous deep infoldings provide a way to increase the cortex
area in a given volume. The extent of infolding varies from species to species, increasing in the
higher primates and being most evident in humans. The central sulcus lies between the parietal
and frontal lobes, dividing sensory and motor areas. The lateral sulcus defines the temporal lobe,
separating it from the frontal and parietal lobes. In addition to the four lobes, there are two major
cortical areas that are not part of any specific lobe. Above the corpus collosum is the cingulate
cortex, and inside the lateral sulcus is the insular cortex. The insular cortex cannot be seen from
outside the brain, since the parietal and temporal lobes bulge around it.
The cerebral cortex exhibits contralateral function: each hemisphere is primarily concerned
with the opposite side of the body. The left side of the brain receives input from and controls
the right hand. Vision is slightly different, the signals are not distributed depending on what eye
they are from, but rather which side of the body they see: the part of each retina that sees the
left half-plane sends its signals to the right hemisphere and vice-versa. Recall that in the spinal
cord, inputs from the right side go into the right side of the spinal cord, but in the brain they are
switched. Between entering the dorsal horn of the spinal cord and arriving at the cerebral cortex,
the pathway switches sides. These switches are called decussations and they occur at different
points in the ascent, depending on the pathway. Surprisingly, the cerebral hemispheres themselves
are not symmetrical: there are deviations from symmetry not only in structure, but also function.
1.5.1 Discovery of organization of the cerebral cortex
The function of the brain has been studied and debated seriously for about two centuries. Gall
theorized that different regions of cortex perform different functions, to such an extent that he
divided the brain into 35 separate organs, each with a different mental function. He thought that
use of certain faculties would increase the size of the corresponding section of cortex, as exercise
does for muscles. His ideas led to the pseudoscience of phrenology, in which personal traits and
abilities are ascertained from bumps in the skull supposedly caused by changes in these regions of
cortex. In the 1820’s, Flourens tested this theory experiementally by removing different regions
11
Figure 1.9: The four lobes of the brain and other prominent structural features. (sruweb.com/-
walsh).
12
from animals. Removing the entire region that Gall claimed corresponds to a certain mental ability
did not destroy that ability in these animals. Flourens decided that the entire brain was involved
in each mental function and introduced the aggregate-field view of the brain.
In the mid-nineteenth century, Jackson, and later Carl Wernicke, Ramon y Cajal, and C.S.
Sherrington showed that sensations and motions in different body parts correspond to different
cortical areas. Jackson worked with epilepsy, and saw that convulsions in different parts of the
body can be traced to specific and different parts of cortex. The later three developed a new view
called cellular connectionism: neurons are basic signalling units, and are arranged in functional
groups which connect to other functional groups in an ordered sequence. At the start of the 20th
century, Brodmann distinguished 52 functionally distinct areas of cortex with his cytoarchitectonic
method. In spite of mounting and compelling evidence for distinct functional areas by the start
of the twentieth century, the aggregate-field view dominated the field for several more decades,
primarily due to the pronouncements of prominent neural scientists, including Head, Goldstein,
Pavlov, and Lashley. Lashley tested for distinct areas by looking for the area that allows rats to
find their way through a maze. No matter which region he lesioned, the rats were able to find their
way through the maze, and Lashley considered this proof of aggregate-field. In retrospect, we see
that many faculties have parallel pathways which can compensate for damage to one of the routes
through the brain. For instance, when a rat’s vison system was damaged, it was able to find its
way through the maze using whiskers. The unscientific nature of phrenology probably played a
significant role in the scientific opposition to distinct areas. As the century continued, the evidence
was finally sufficiently overwhelming to change the dominant scientific view. For example, a small
lesion can destroy the ability to recognize people by name but not by sight.
Until recently, everything we knew about function of different brain regions was learned in
lesion studies, in which physical damage to the brain is observed to correlate with losses in specific
functions. For instance, in 1861, Broca was studying aphasia, a language disorder. He had a
patient whose mouth, vocal cords, and tongue appeared physically intact, but who could not speak
coherently. This man could understand language and utter isolated words and hum music, but
he could not speak. Following the patient’s death, Broca studied his brain and found a lesion in
the left hemisphere. Over the next three years, eight similar cases were found to have lesions in
the same spot. This led to Broca’s famous announcement that we speak with the left hemisphere.
Wernicke later discovered that Broca’s region is not the entire language region, but there is another
part of the left hemisphere that allows comprehension of language. Later it was discovered that
lesions to Broca’s area make deaf people unable to sign, even when they know sign language before
damage. Today, it is possible to more finely subdivide cortical function, and we can now study
distinct regions in fully functional, healthy brains without lesions using imaging methods (see §1.7).
For example, the 5 visual processing areas noted by Brodmann have now been subdivided into 35
functionally distinct regions.
In summary, specific brain regions are not concerned with individual mental faculties, as Gall
had supposed. Each region performs a basic processing task, but they are linked in both series
and parallel, with “reentrant” (two way) connections, to make higher faculties such as language
possible. Damage to a single area does not necessarily destroy the faculty, since parallel pathways
can compensate, and the brain can even reorganize linkages in some cases.
13
1.5.2 Functional organization of the cerebral cortex
The cerebral cortex is 2 to 4 mm thick across species, and is arranged in six distinctly-defined
layers, which are structurally different: some are home to the main cell bodies, some for projection
to later regions, some for inputs, some for back-projection to previous regions. The layered structure
allows for efficient space utilization and organization of input-output and feedback connections.
Basic processing takes place in cortical columns, each of which would fit in a 1 mm diameter cylinder.
These columns are viewed as the fundamental computational modules, and not surprisingly, humans
have many more of them than rats.
The primary visual cortex lies in the occipital lobe near the calcarine sulcus. The primary
auditory cortex is in the temporal lobe on gyri on the lateral sulcus, which separates the temporal
lobe from the parietal and frontal lobes. Somatosensory processing is caudal to the central sulcus on
the post-central gyrus in the parietal lobe. The primary motor cortex is rostral to the central sulcus,
just opposite the somatosensory region, as seen in Fig. 1.10. The cortex contains topographical
representations of the retina, cochlea, and body itself. Thus, that neighboring sections of auditory
cortex represent neighboring sections of cochlea, for example. The sensory and motor cortex each
have well-defined body maps including arms, hands, even fingers. These maps do not have equal
representation, but cortex area is distributed in proportion to the density of sensors or the fineness of
movement required in the represented area. For example, the fingers are enormous and completely
out of normal proportion in this somatosenory body map. Higher order areas do not exhibit as
clearly-defined a map, although topographical representation may still be present. Higher order
motor areas, for example, are located rostral to primary motor cortex in the frontal lobe. Higher
order vision areas are rostral to primary visual cortex, area V1. Note that primary somatosensory
is the first stop in the sensory pathway into cortex, but the primary motor cortex is the last stop
in the motor pathway out of cortex.
Figure 1.10: Locations of primary motor and somatosensory cortices, with some other lobes and
brain areas indicated. (emc.maricopa.edu).
14
1.6 The peripheral nervous system
The peripheral nervous system has somatic and autonomic functions. Somatic refers to the
nerves and nerve endings in skin, muscles, and joints. Sensory information from nerve endings
relays to the spinal cord information about the current configuration of the body and about its
surroundings. Motor control of muscles is also partially achieved through nerves of the peripheral
nervous system. Such feedback, originating in sensors embedded within the body that monitor
body states, is called proprioceptive.
Autonomic functions include visceral sensation, motor control of viscera, motor control of smooth
muscles, and control of exocrine functions. The autonomic functions can be divided into three
systems. The sympathetic system governs the body’s response to stress, the parasympathetic system
governs conservation of body resources and homeostasis, and the enteric system controls the smooth
muscle in the gut. Functions of the autonomic nervous system are not consciously controlled. In
contrast, exteroceptive sensing involves monitoring the external environment via the senses of sight,
hearing, touch, taste and smell, which typically requires brain activity, and may rely on conscious
reflection.
1.7 A note on experimental methods
Membrane voltage recordings played a central role in the development of the Hodgkin-Huxley
equations (see §3.3) and thy remain essential for studying individual neurons and small circuits.
Such recordings, or more general experiments on the neural system, are carried out via two major
avenues in vivo experiments on live animals, and in vitro experiments on slices or aggregates of
cells removed from the organism and maintained under suitable conditions.
Changes in membrane potential can be recorded intracellularly or extracellularly. Intracellular

recordings are usually done for in vitro experiments. Either a sharp electrode pierces the membrane,
achieving electrical contact with the intracellular fluid, or the electrode attaches to the outside of the
cell, sealing a patch of membrane which is then broken. The electrode typically is attached at the
soma, but new techniques allow measurement in the dendritic tree as well. Intracellular recordings
are usually not performed on the axon, but when they are, only spikes are discernable, since other
voltage fluctuations are attenuated. Extracellular recordings are used for in vivo experiments, and
do not damage the cell membrane. The disadvantage is that the voltage fluctuations measured
during a spike are three orders of magnitude less than for intracellular recordings. Spikes are
detected, but not subthreshold fluctuations, and spikes from neighboring cells may also appear on
the voltage trace. This has led to sophisticated ‘spike-sorting’ methods and algorithms. Membrane
voltages can also be recorded optically, using voltage sensitive dyes.
Neuroanatomical tracing techniques have provided improved techniques for tracing axonal pro-
jections, and thereby establishing connectivity among brain areas. Originally, researchers cut an
axon and looked for the affected cell bodies. Now, particles can be passed from axon to soma, or
from soma to axon, and often from cell to cell. Horseradish peroxidase and the herpes simplex
virus are both used, in addition to dyes or radioactively-labeled amino acids.
15
Optogenetic methods for imaging and interacting with individual neurons and groups of neurons
in vivo have recently been devloped (e.g. [63, 181]). Light-sensitive molecules are introduced into
specific cells by genetic targeting. These cells can then signal their voltages via fluorescence, and
can be activated or suppressed by solid-state or fiberoptic light sources of suitable wavelengths.
The resulting observations of neural activity at high spatial and temporal resolution provide access
to brain and CNS functions in behaving animals. In parallel, striking advances in probing the
structures of neural circuits at subcellular scales include CLARITY [46] and Connectomics [235].
These technologies, and the associated data analysis methods, drawing on machine learning and
statistics, are drawing neuroscientists, mathematicians and computer scientists closer.
Figure 1.11: An fMRI image, showing active brain areas. (biologydaily.com).
Functional magnetic resonance imaging (fMRI) has emerged over the past decade as an indis-
pensible technique for understanding the brain. Hemoglobin molecules change from diamagnetic
to paramagnetic when deoxygenated, allowing fMRI to tracks changes in blood oxygenation. The
fMRI blood oxygenation level-dependant (BOLD) signal shows which areas of the brain are most
active at any given time, as illustrated in Fig. 1.11. (This is a very complex procedure: the sig-
nal is small and must be ‘deconvolved’ to produce ‘activity maps’ like that of Fig. 1.11.) Two
and three-dimensional images can be reconstructed with good spatial resolution (on the order of 1
mm3 ), but data collection of each image takes 1-2 seconds. This temporal resolution is insufficient
to track neural activity during rapid behaviors, but it can provide useful data for cognitive tasks
that extend over several seconds (e.g. working memory). (Magnetoencephalography (MEG) offers
much better temporal resolution, and spatial resolution close to that of fMRI.) Subjects can be
given tasks while in the scanner, and brain areas showing increased activity during the task identi-
fied. These presumably play some part in the ‘circuit’ (assembly of areas) that executes the task.
This technique is leading to greater understanding of the collaborative functions of different brain
regions, allowing substantial refinement of Brodmann’s divisions. It has also engaged physicists,
mathematicians and computer scientists in neuroscience.
Electroencephalograms (EEGs) are collected from multiple scalp electrodes, much like electro-
cardiograms from the chest, and they offer excellent temporal resolution, but it is difficult to localize
which brain areas are responsible for the signals detected at the scalp surface. This requires the
solution of a hard electro-magnetic inverse problems like those used to interpret acoustic signals
during prospecting for oil and minerals. (There is also beautiful mathematics in this area, but no
space for it here.)
Behavioral experiments are also important, especially in cognitive psychology, even though they
do not directly probe the anatomy and physiology of brain tissue. First of all, without observation
and quantification of behavior during voltage recordings and fMRI scanning, the recorded data
16
cannot be interpreted. In addition, behavioral data can be fit to ‘high-level’ abstracted models of
neural systems, and important descriptors of the neural system thereby inferred. Behavioral data
includes distributions of reaction times to stimuli, and descision times and error rates on cognitive
choice tasks. In chapter 6 we will discuss behavioral measures of this type, and describe some
simple models that may link them with underlying neural processes.
17
Chapter 2
Mathematical Tools: ODEs,

Numerical Methods and Dynamical
Systems
In this chapter we provide a survival course in basic methods and ideas from the qualitative the-
ory of ordinary differential equations (ODEs) and dynamical systems theory. Background material
on linear ODEs can be found in textbooks such as [25], and on nonlinear systems in [121], and
relevant ideas and results are also reviewed in texts on mathematical neuroscience, including [277]
and [75, Chap. 3]. More advanced treatments are provided by [12] and [107]. Here we will intro-
duce basic ideas that build on knowledge of multivariable calculus and linear algebra, and we will
illustrate the theory with neurobiological examples.
The mathematical objects of interest are systems of first order ODEs, which may be written in
compact vector notation as:
d def
x = ẋ = f (x); x ∈ Rn , x(0) = x0 , (2.1)
dt
where x0 is an initial condition, and
   
f1 (x1 , . . . , xn ) x1
f (x) =  .. ..  .
, x =  (2.2)
  
. . 
fn (x1 , . . . xn ) xn
This is sometimes called an ODEIVP (initial value problem) to distinguish it from boundary value
problems (BVPs). The time-dependent quantities xj = xj (t) are called state variables and they
evolve in an n-dimensional state space, which we can usually take to be Euclidean space Rn . Often
the functions fj (x) depend upon parameters µ = (µ1 , . . . , µk ), in which case we may write f (x; µ)
or fµ(x), to indicate the different status of x and µ. Indeed, while the parameters may themselves
change, they usually do so on a slower scale than x, and so may be taken as constant for the
purposes of solving (2.1). The function f (x) (or f (x; µ) or fµ(x)) defines a vectorfield on Rn , which
may be thought of as a forest of arrows telling which way solutions of (2.1) must go at each point.
18
One can think of it as the velocity field of an (n-dimensional) “fluid” that permeates state space,
and we say that it generates a flow on the state space. See Fig. 2.1.
4.5 5
4
4
3
3.5
2
3
1
x(t)
x2
2.5 0
−1
2
−2
1.5
−3
1
−4
0.5 −5
0 1 2 3 4 −5 0 5
Time t x1
Figure 2.1: Vectorfields and some flowlines (solutions). (a) A non-autonomous scalar system ẋ =
f (x, t); (b) an autonomous two-dimensional system.
In general, flow maps are nonlinear, and while we cannot give explicit formulae like (2.8-2.10) be-
low, the local existence-uniqueness theorem [121] guarantees that the nonlinear ODE (2.1) generates
a flow map:
x(t, x0 ) = φt (x0 ) or φt : Rn → Rn , (2.3)
at least for short times t, provided that the functions fj (x) (or, collectively, f (x)) are smooth.
This follows from assembling the unique solutions of (2.1) for each initial condition x(0) = x0 ,
which exist for finite time intervals around t = 0. Ideally, we would like to find sets of solutions to
appropriate ranges of initial conditions, and to study how they depend upon those initial states,
and on the parameters. Alas, it is a sad fact that explicit solutions expressible in terms of known
functions can be found for very few functions f . Nonetheless, it is often relatively easy to find
special solutions such as fixed points, also called equilibria: points at which f (xe ) = 0 and so xe
remains unchanged.
2.1 Linearization and solving linear systems of ODEs
Suppose that x = xe is an equilibrium or fixed point, i.e. f (xe ) = 0 (⇔ fj (xe1 , . . . , xen ) = 0, 1 ≤

j ≤ n ). So, if x0 = xe , we conclude that x(t) ≡ xe . Let x(t) = xe + ξ(t), where ξ(t) = (ξ1 , . . . , ξn )
is a small perturbation: |ξ(t)| ≪ 1. Substitute into (2.1) and expand f in a multivariable, vector-
valued Taylor series1 to obtain
ẋe + ξ̇ = f (xe + ξ) = f (xe ) + Df (xe )ξ + O(|ξ|2 ). (2.4)

1
Here and in general we assume that f is sufficiently differentiable so that Taylor’s Theorem with remainder applies
to each component.
19
h i
∂fi
In (2.4) Df (xe ) denotes the n × n Jacobian matrix of partial derivatives ∂xj , evaluated at the
fixed point xe , and the “Big Oh” notation O(|ξ|2 ) characterizes the magnitude of quadratic and
higher order terms in the components ξ1 , . . . , ξn . Specifically, g(ξ) = O(|ξ|p ) means that, for each
component gi v
u n
gi (ξ) uX
lim p
≤ k < ∞ , where |ξ| = t ξi2 . (2.5)
|ξ|→0 |ξ|
i=1
More generally, g(x) = O(|h(x)|) as x → x0 means that the quotient g(x)/h(x) is bounded as x →
x0 . (We will occasionally also use the less precise ∼ notation, as in sin x ∼ x and sin x ∼ x − x3 /3.)
Returning to Eqn. (2.4), for small enough |ξ|, the first order term Df (xe )ξ dominates. Taking
into account that ẋe and f (xe ) vanish and ignoring the small term O(|ξ|2 ), we get the linear system
ξ̇ = Df (xe )ξ (2.6)
The constant-coefficient linear ODE system (2.6) is called the linearization of (2.1) at xe . It can
be solved by standard methods, as outlined below.
The general solution of (2.6) or of any linear system of the form
ẏ = By, (2.7)
where B is an n × n matrix and y = (y1 , . . . , yn ) is an n-vector, is determined by the eigenvalues

and eigenvectors of B (or Df (xe )) [25]. Specifically, if the matrix B has n linearly-independent
eigenvectors vi , i = 1, . . . , n with eigenvalues λi , we can form an n × n, real-valued, fundamental
solution matrix Xe (t) with columns vi eλi t . As in (2.9), if complex eigenvalues λ± = α ± iβ
occur, Xe can still be chosen real by taking adjacent columns as eαt [u cos(βt) − w sin(βt)] and
eαt [w cos(βt) + u sin(βt)], where v = u ± iw is the complex conjugate pair of eigenvectors belonging
to λ± . This follows from the facts that, if λ is a real eigenvalue with eigenvector v, then there is a
solution to (2.7) of the form:
y(t) = cveλt , (2.8)
and if λ = α ± iβ is a complex conjugate pair with eigenvectors v = u ± iw (where u, w are real)
then
y1 (t) = eαt (u cos βt − w sin βt) and y2 (t) = eαt (u sin βt + w cos βt) (2.9)
are two linearly-independent real-valued solutions.
A short calculation (which will be done on demand) shows that, no matter how the columns of
Xe (t) are ordered, the solution of (2.7) with initial condition y(0) = y0 is given by:
y(t) = Φ(t)y0 , where Φ(t) = Xe (t)[Xe (0)]−1 . (2.10)
This special fundamental solution matrix Φe (t), which satisfies Φ(0) = I, is the flow map for (2.7).
It is an explicit rule (formula) that tells us how the vectorfield evolves solutions forward: it defines
a time-dependent flow map that advances solutions in the state space Rn . The flow map (2.10) is
linear, like the ODE that defines it.
Exercise 1. (Linear algebra review) Verify that the flow map (2.10) does solve (2.7). What hy-
potheses must the eigenvectors vi satisfy for the matrix Xe (t) to be invertible? For what sorts of
matrices B do these hypothesis always hold?
20
Here we are concerned with qualitative properties rather than exact or “complete” solutions. In
particular, in studying stability we want to know whether the size of solutions grows, stays constant,
or shrinks as t → ∞. This can usually be answered just by looking at the eigenvalues2 . Indeed, the
real part of λ (almost) determines stability. Since any solution of (2.7) can be written as a linear
superposition of terms of the forms (2.8)-(2.9) (except in the case of multiple eigenvalues), we can
deduce that
• If all eigenvalues of B have strictly negative real parts, then |y(t)| → 0 as t → ∞ for all
solutions.
• If at least one eigenvalue of B has a positive real part, then there is a solution y(t) with
|y(t)| → +∞ as t → ∞.
• If some eigenvalues have zero real part with distinct imaginary parts, then the corresponding
solutions oscillate and neither decay nor grow as t → ∞.
Definition 1. The fixed point xe of (2.1) is hyperbolic or non-degenerate, if all the eigenvalues
of Df (xe ) have non-zero real parts.
2.1.1 Stability of solutions of ODEs
To make use of all this we need a notion of stability, which we define in a geometrical context,
suitable for our picture of state space. See Figs. 2.2-2.3 for illustrations.
Definition 2 (Liapunov stability). xe is a stable fixed point of (2.1) if for every neighborhood
U ∋ xe there is a neighborhood V ⊆ U such that every solution x(t) of (2.1) starting in V (x(0) ∈ V )
remains in U for all t ≥ 0. If xe is not stable, it is unstable.
Definition 3 (Asymptotic stability). xe is asymptotically stable if it is stable and additionally

V can be chosen such that |x(t) − xe | → 0 as t → ∞ for all x(0) ∈ V .
x(0)
V x(0)
e
x xe x(0)
x(0)
U U V
Figure 2.2: Stability. Figure 2.3: Asymptotic stability: a sink.
A fixed point that is Liapunov stable but not asymptotically stable is sometimes called neutrally
stable. Equipped with these definitions and the linear analysis sketched above, and recognizing
2
Unless there are repeated eigenvalues (multiplicity ≥ 2). This is trickier – see [25, Ch. 7.7] or ask me.
21
that the remainder terms ignored in passing from (2.4) to (2.7) can be made as small as we wish by
selecting small neighborhood of xe , we can conclude that the stability type of the nonlinear system
(2.1) under the flow φt is locally determined by the eigenvalues of Df (xe ):
Proposition 1. If xe is a fixed point of ẋ = f (x) and all the eigenvalues of Df (xe ) have strictly
negative real parts, then xe is asymptotically stable. If at least one eigenvalue has strictly positive
real part, then xe is unstable.
Further geometric information on the structure of solutions is given in §2.3.1.
Borrowing from fluid mechanics, we say that if all nearby solutions approach a fixed point (e.g.
all eigenvalues have negative real parts), it is a sink ; if all nearby solutions recede from it, it is
a source, and if some approach and some recede, it is a saddle point. When the fixed point is
surrounded by nested closed orbits, we call it a center.
We might hope to claim that stability (per Definition 2) holds even if (some) eigenvalues have
zero real part, but the following counterexample crushes our hopes:
ẋ = αx3 , α 6= 0. (2.11)
Here x = 0 is the unique fixed point and the linearization at 0 is
ξ˙ = 3αx2 ¯x=0 ξ = 0,
¯
(2.12)
with solution ξ(t) = ξ(0) = const., so certainly x = 0 is stable for (2.12). But the exact solution
of the nonlinear ODE (2.11) may be found by separating variables:
Z x(t) µ ¶
dx 1 1
Z
3
= αdt ⇒ − − = αt
x(0) x 2x(t)2 2x(0)2
x(0)
⇒ x(t) = p . (2.13)
1 − 2α x(0)2 t
We therefore deduce that
)
1
|x(t)| → ∞ as t → 2α x(0) 2 if α > 0 (blow up! instability)
.
but |x(t)| → 0 as t → ∞ if α < 0 (asymptotic stability)
In this case the linearized system (2.12) is degenerate and the nonlinear “remainder terms,” ignored
in our linearized analysis, determine the outcome. Here it is obvious, at least in retrospect, that
ignoring these terms is perilous, since while they are indeed O(ξ 2 ) (in fact, O(ξ 3 )), the linear O(ξ)
term is identically zero! Moreover, global exisitence ∀t ∈ R fails for (2.11).
Example 1. Consider the two-dimensional system
x˙1 = x2 + α(x21 + x22 )x1 ,
ẏ = −x1 + α(x21 + x22 )x2 .
Note that the linearization is simply a harmonic oscillator with eigenvalues ±i. Is the equilibrium
(x, y) = (0, 0) of this system stable or unstable? To answer this, it is convenient to transform to
polar coordinates x1 = r cos θ, x2 = r sin θ, which gives the uncoupled system:
ṙ = αr3 , θ̇ = −1.
22
The first equation is identical to the counterexample of (2.11) above, so we may conclude: α > 0 ⇒
unstable; α = 0 ⇒ stable; α < 0 ⇒ asymptotically stable. The linearized system is Liapunov stable
for all α.
How can we prove stability in such degenerate cases, in which one or more eigenvalues has zero
real part? One method requires construction of a function which remains constant, or decreases,
along solutions. For mechanical systems the total (kinetic plus potential) energy is often a good
candidate.
2.1.2 Liapunov functions
Liapunov functions allow one to prove stability and even asymptotic stability in certain cases.
We describe Liapunov’s ‘second method’ or ‘direct method’ for ODEs of the form (2.1). Also
see [277, Chap. 14].
Theorem 1. Suppose that (2.1) has an isolated fixed point at x = 0 (w.l.o.g. one can move a
fixed point xe to 0 by letting y = x − xe ). If there exists a differentiable function V (x), which is
positive definite and for which dV
dt = ∇V · f is negative definite on some domain D ∋ 0, then 0 is
asymptotically stable. If dt is negative semidefinite (i.e., dV
dV
dt = 0 is allowed), then 0 is Liapunov-
stable.
½ ¾
V (x) > 0, x 6= 0
Proof. (sketch) V is positive definite: ⇒ the level sets of V are hyperspheres
V (x) = 0, x = 0
surrounding x = 0.
¾
V̇ (x) < 0, x 6= 0
V̇ is negative definite: ⇒ solutions of (2.1) cross the level sets inwards
V̇ (x) = 0, x = 0
converging on x = 0. The level sets of V provide our neighborhoods U, V , each one crossed inwards
by all solutions. Thus, as x → 0 we get an asymptotic stability.
¾
V̇ (x) ≤ 0, x 6= 0
V̇ is negative semidefinite: ⇒ solutions of (2.1) either cross level sets
V (x) = 0, x = 0
inwards, or are confined to level sets. Again, we use the level sets of V for obtaining neighborhoods
U, V of x; but cannot conclude that 0 is asymptotically stable.
Example 2. Example 1 revisited. Let V = (x2 + y 2 )/2 and compute
V̇ = xẋ + y ẏ = x(y + α(x2 + y 2 )x) + y(−x + α(x2 + y 2 )y) = α(x2 + y 2 )2 .
We see that, if α < 0 V̇ < 0 for all (x, y) 6= (0, 0), implying asymptotic stability, while for α > 0
V̇ > 0 for all (x, y) 6= (0, 0), implying instability.
Exercise 6 below uses a Liapunov-like function that allows one to prove that solutions flow
“downhill,” crossing level sets and approaching a fixed point for an ODE that has multiple fixed
points.
23
The notions of stability in Defintions 2-3 may be generalized to non-constant orbits of ODEs
(periodic, quasiperiodic, or non-periodic). First, some notation and preparatory definitions are
needed. Let φt be the flow map of Eqn. (2.3) and let φ(x) = {φt (x)|t ≥ 0} denote the set of all
points in the solution or orbit φt (x) based at x.
Definition 4. Two orbits φ(x) and φ(x̂) are ǫ-close if there is a reparameterization of time t̂(t)
(a smooth, monotonic function) such that |φt (x) − φt̂ (x̂)| < ǫ for all t ≥ 0.
Neighboring periodic orbits in a conservative system may circulate at different speeds, as in the
following example (in polar coordinates, cf. Example 1):
ṙ = 0, θ̇ = 1 + r2 ,
where the period T (r) = 2π/(1 + r2 ) depends on amplitude. Reparameterization of time allows for
this by bringing neighboring solutions back into step. Monotonicity of t̂(t) implies that the orbits
are not close merely as sets, but that after the change of timescale every pair of their values φt (x)
and φt̂ (x̂) at succeeding times are close.
Definition 5. A solution φt (x) is orbitally stable if for every ǫ > 0 there is a neighborhood
V ∋ x such that, for all x̂ ∈ V , the sets φ(x) and φ(x̂) are ǫ-close. If additionally V may be chosen
such that for all x̂ ∈ V , there exists a time shift τ (x̂) so that |φt (x) − φt−τ (x̂) (x̂)| → 0 as t → ∞,
then φt (x) is orbitally asymptotically stable.
See Figs. 2.4-2.5, which show (segments of) the orbits φt (x) and their neighborhoods V , each
of which contains a second orbit based at x̂.
φt (x)
x̂
φt (x)
x̂
τ (x̂) corrects for the
‘asymptotic phase’
x̂
φt (x)
Figure 2.4: Orbital stability Figure 2.5: Orbital asymptotic stability
2.2 Numerical methods
As we have remarked, very few ODEs admit explicit solutions in terms of “nice” known funtions
(trignometric, exponential, polynomial, etc.) We therefore resort to numerical solution, happily
calling up Matlab or similar software. But can we believe what it tells us? This section outlines the
24
main mathematical ideas behind numerical integration of ODEs, starting with the simplest such
method, which we treat in some detail. For (much) more information and background, go to a text
on numerical methods, such as [251]. CITE MORE BASIC ODE NA REF??
2.2.1 Euler’s method
Forward Euler method: Consider the general scalar (one-state-space dimension) ODE
ẋ = f (x, t), x(t0 ) = x0 , x ∈ R. (2.14)
Euler’s method is the simplest, but least efficient numerical integrator in terms of the number of
steps needed to propagate solutions for times t ∈ [t0 , t0 + T ] with given accuracy. It discretizes the
derivative ẋ ≈ (xn+1 − xn )/∆t to yield the difference equation:
xn+1 = xn + ∆t f (xn , tn ), (2.15)
with x0 = x(t0 ) (naturally), and tn = t0 + n∆t. This is the forward Euler method (see below for
backward Euler method). For x ∈ Rn , we simply do the above for each component:
xn+1 = xn + ∆t f (xn , tn ), (2.16a)

x1n+1 = x1n + ∆ f1 (x1n , x2n , . . . xN
n , tn ), (2.16b)
x2n+1 = x2n + ∆t f2 (xn , xn , . . . xN
1 2
n , tn ), (2.16c)
..
.
xN N 1 2 N
n+1 = xn + ∆t fN (xn , xn , . . . xn , tn ), (2.16d)
Example 3. Consider the harmonic oscillator
ẍ + x = 0 or ẋ = y, ẏ = −x, with x(0) = x0 , y(0) = y0 ,
the exact solution of which can be expressed via a linear map that is a rotation matrix:
x(t) = x0 cos(t) + y0 sin(t),

y(t) = −x0 sin(t) + y0 cos(t),
µ ¶ · ¸µ ¶
x(t) cos(t) sin(t) x0
= ,
y(t) − sin(t) cos(t) y0
µ ¶ µ ¶
x(t) x0
= X(t) .
y(t) y0
This is an example of the general fundamental solution matrix procedure sketched in §2.1. Euler’s
method approximates the harmonic oscillator as
xn+1 = xn + ∆tyn ,
yn+1 = yn − ∆txn ,
which is a linear transformation:

µ ¶ µ ¶ · ¸
x x 1 ∆t
= [A] , A=
y n+1 y n −∆t 1
25
and det[A] = 1 + (∆t)2 > 1, so areas are expanded. This approximation to the solution matrix
is qualitatively incorrect, since the exact rotation preserves areas, because det[X(t)] = cos2 t +
sin2 t ≡ 1. In the (x, y = ẋ) state space, the exact solutions form circles, but the Euler method
steps continually move out across the circles, each finite step being a segment of a tangent vector
to one of the circles, as seen in Fig. 2.6. The numerical scheme has destroyed the conservative
(Hamiltonian) structure of the original problem.
−1
−2
−3
−4
−4 −3 −2 −1 0 1 2 3 4
Figure 2.6: Exact and Euler-approximated solutions to the harmonic oscillator ẍ + x = 0. On the
2 2 2 2
circles, the total energy x2 + y2 = x +2 ẋ = const. A large step size has been taken to emphasize
errors.
Moral: we can’t expect numerical schemes to preserve constants of motion, constraints, or the
overall structure of equations (but symplectic integrators do preserve Hamiltonian structures).
Back in one dimension, the Euler method is also called the tangent line method, since at each
iteration we take the step
xn+1 − xn = ∆tf (xn , tn ) (2.17)
in the direction of the tangent to a “true” solution, as in Fig. 2.7.
Note that Euler’s method essentially computes a truncated Taylor series of the solution. Letting
φ(t) denote the true (exact) solution based at φ(t0 ) = x0 , we have
φ′′ (t0 )
φ(t) = φ(t0 ) + φ′ (t0 )(t − t0 ) + (t − t0 )2 + O((t − t0 )3 ), (2.18)
2!
with φ(t0 ) = x0 and φ′ (t0 ) = f (x0 , t0 ). Setting t = t1 and t − t0 = t1 − t0 = ∆t (uniform step sizes),
(2.18) gives
φ′′ (t0 ) 2
φ(t1 ) = x0 + f (x0 , t0 )∆t + ∆t + O(∆t3 ). (2.19)
2!
So x1 = x0 + f (x0 , t0 )∆t approximates the true solution φ(t1 ) = φ(t0 + ∆t) with an error of size
|x1 − φ(t1 )| = O(∆t2 ). Similarly, for any successive step,
φ(tn+1 ) = φ(tn ) + φ′ (tn )∆t + O(∆t2 ),
so that, provided |xn − φ(tn )| = O(∆t2 ), we also have
xn+1 = xn + f (xn , tn )∆t = φ(tn+1 ) + O(∆t2 ). (2.20)
26
3
2.5
1.5
0.5
0
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
2
y
5
1
t
0 0 x
−1
−5 10
5
0 5
−2
−2 −1 0 1 2 −5 0
Figure 2.7: Top: the forward Euler or tangent-line method; solid and dashed curves are exact
solutions, segments with diamonds are Euler steps. Bottom: solutions of Example 3 projected onto
the (x, y) plane and plotted in three dimensional (x, y, t)-space.
At each step we get an error of O(∆t2 ). We shall shortly see how these errors accumulate.
Backward Euler method: In the forward Euler method, as (2.17) shows, we approximate the
time derivative ẋ = f (x, t) by a forward difference quotient: xn+1
∆t
−xn
= f (xn , tn ). We can also
approximate it by a backward difference quotient:
xn+1 − xn
= f (xn+1 , tn+1 ) or xn+1 = xn + ∆tf (xn+1 , tn+1 ). (2.21)
∆t
This formula defines the backward Euler method. Note that in general, (2.21) must be solved
iteratively: we have to find the point xn+1 that we would have arrived at, leaving from xn in the
direction tangent to the vector field at xn+1 . The forward Euler method is explicit – we just plug
in xn and compute – but backward Euler is implicit: xn+1 must be solved for.
Example 4. The linear ODE ẋ = ax, x(0) = x0 is a case that we can solve explicitly for both
methods. The forward Euler method leads to
xn+1 = xn + ∆t(axn ), with initial condition x0 (2.22a)
n
⇒ xn = x0 (1 + a∆t) . (2.22b)
For backward Euler we have
x1 = x0 + ∆t(ax1 ) ⇒ (1 − a∆t)x1 = x0 ⇒ x1 = x0 (1 − a∆t)−1 , (2.23a)
−1 −2
x2 = x1 + ∆t(ax2 ) ⇒ (1 − a∆t)x2 = x1 = x0 (1 − a∆t) ⇒ x2 = x0 (1 − a∆t) ··· (2.23b)
−n
xn = x0 (1 − a∆t) . (2.23c)
Both methods agree up to O(∆t) [xn = x0 (1 + na∆t + · · · )], but thereafter they disagree. Both are
accurate up to O(∆t), with O(∆t2 ) error per step. This is true in general, and not just for the
exponential solution of ẋ = ax.
27
3
2.8
2.6
forward Euler
2.4
2.2
1.8
backward Euler
1.6
1.4
1.2 t0 t1=t0+∆ t
1
0 0.5 1 1.5 2 2.5 3
Figure 2.8: The forward and backward Euler methods compared. The solid and dashed curves
denote exact solutions.
As noted above, the backward Euler formula (2.21) may not be explicitly soluble, so we may have
to use, e.g., Newton’s method to find xn+1 at each step: an iterative loop within an iterative loop.
(See §2.2.4 for Newton’s method). So why be backward, if forward Euler is much simpler? The
answer has to do with stability. Example 3 already shows that forward Euler can turn a (Liapunov)
stable system, the harmonic oscillator, into an unstable growing spiral. More generally, even for an
asymptotially stable scalar equation whose solutions all decay to fixed points or approach particular
solutions x̄(t) as t → ∞, the forward Euler method can give growing solutions if the step size ∆t
is too large. In contrast, the backward Euler method is unconditionally stable.
Example 5. Consider ẋ = −ax, x(0) = x0 , a > 0, for which the exact solution x(t) = x0 e−at is
a decaying exponential. Forward Euler gives:
xn+1 = xn − a∆txn = (1 − a∆t)xn (2.24)
⇒ xn = (1 − a∆t)n x0 → 0 if and only if |1 − a∆t| < 1.
If we pick a∆t < 2, then |1 − a∆t| < 1 and our numerical scheme reflects the stability and
‘contraction’ of solutions inherent in the equation. But if a∆t > 2 then 1 − a∆t < −1 and (1 −
a∆t)n → ∞ in a violent oscillatory manner. Hence ∆t < a2 is a necessary and sufficient condition
for forward Euler to preserve the stability inherent in this example. We may appeal to linearization
of ẋ = f (x, t) near specific solutions x̄(t) to generalize this argument. Conclusion: the forward
Euler method is only conditionally stable.
The backward Euler method has very different behavior: it is unconditionally stable:
xn
xn+1 = xn − a∆txn+1 ⇒ xn+1 =
1 + a∆t
x0
⇒ xn = → 0, ∀ ∆t > 0 if a > 0 ! (2.25)
(1 + a∆t)n
In general, implicit or backward numerical schemes have superior stability properties: they look
ahead, and so do better in reproducing forward asymoptotic behavior.
28
Stability: We now take a geometric view of the oscillatory instability of forward Euler and un-
conditional stability of backward Euler (again for the exponential decay equation). As we have
seen, the forward Euler method applied to ẋ = −ax gives the approximate discrete solution
xn = x0 (1 − a∆t)n , so that a∆t > 2 ⇒ instability. More specifically, with a = 1, we observe
the behaviors shown in Fig. 2.9:
µ ¶n
1 1
∆t = 2 xn = x0 , monotonic decay → 0;
2
µ ¶n
3 1
∆t = 2 xn = x0 − , oscillatory decay → 0 : correct as t → ∞, but a bad approach;
2
∆t = 2 xn = x0 (−1)n , period 2 orbit x0 → −x0 → x0 → · · · : entirely incorrect;
µ ¶n
5 3 3x0 9x0
∆t = 2 xn = x0 − , oscillatory growth, x0 → − → → · · · : a disaster!
2 2 4
Note that the stability condition ∆t < a2 becomes more stringent, requiring smaller ∆t, for larger
a. Paradoxically, very stable systems are harder to integrate accurately! There are special methods
for these stiff systems.
2 2 10
1.5 1.5
1 1 5
0.5 0.5
0 t 0 t 0 t
−0.5 −0.5
−1 −1 −5
−1.5 −1.5
−2 −2 −10
0 5 10 0 5 10 0 5 10
Figure 2.9: Oscillatory instability of forward Euler for a = 1, and ∆t = 1.5, 2, and 2.5.
3.5
2.5
x0
2
1.5
0.5
∆ t1 ∆ t2 ∆ t3
0
0 0.5 1 1.5 2 2.5 3 3.5 4
Figure 2.10: Unconditional stability of backward Euler for step sizes ∆t = 1, 2,and 3.
In contrast, for the backward Euler algorithm xn+1 = (1 + a∆t)−1 xn , so that monotonic decay
persists as ∆t increases, although the decay rate is increasingly underestimated: see Fig. 2.10.
Specifically, for the first step the slope of the piecewise-linear backward Euler approximation is
−ax0 /(1 + a∆t) compared with −ax0 for forward Euler, and the two approximate solution values
at t = ∆t bracket the exact solution:
x0
x0 (1 − a∆t) < x0 e−a∆t < .
1 + a∆t
29
Error Analysis: We have already noted that Euler’s method (forward or backward) produces an
error O(∆t2 ) at each step (Eq. (2.20)). As earlier, letting φ(t) denote the exact solution and xn
denote the (forward) Euler approximation, we compare the Taylor series expansion of φ with xn .
Recalling that φ′ (tn ) = f (φ(tn ), tn ) and using Taylor’s theorem with remainder, we have the exact
expressions
φ′′ (t̃n ) 2
φ(tn+1 ) = φ(tn ) + φ′ (tn )∆t + ∆t , t̃n ∈ [tn , tn+1 ], (2.26a)
2!
xn+1 = xn + f (xn , tn )∆t. (2.26b)
def
Denoting the error at step n as en = |xn − φ(tn )| and subtracting (2.26a) and (2.26b) leads to
∆t2 ¯¯
¯ ¯
′′
¯
en+1 = |xn+1 − φ(tn+1 )| = ¯xn + f (xn , tn )∆t − φ(tn ) − f (φ(tn ), tn ) − φ (t̃n )
¯
2 ¯
1
⇒ en+1 ≤ |xn − φ(tn )| + |f (xn , tn ) − f (φ(tn ), tn )|∆t + |φ′′ (t̃n )|∆t2 . (2.27)
2
Stepwise error: Suppose xn = φ(tn ) (we start at the right place), then (2.27) ⇒ en+1 ≤
1 ′′ 2 2
2 |φ (t̃n )|∆t , so the error is O(∆t ) per step, as we’ve already seen. In fact we can state this
2
in terms of a uniform bound on the second derivative of solutions: en ≤ M ∆t2 , where M =
maxt∈[t0 ,t0 +T ] |φ′′ (t)|.
Global accumulated error: Now pick a stepsize ∆t and take N steps for a total elapsed time
of T = N ∆t. We suppose that T is fixed and ask: “What is the largest total accumulated error at
time t = T ?” To answer this we let K be the Lipschitz constant3 of f (x, t) with respect to x ∈ U ,
and assume that it holds for all t ∈ [0, T ], where U is a bounded set in which the solution φ(t)
remains for t ∈ [t0 , t0 + T ]. Then (2.27) implies that
∆t2
en+1 ≤ en + Ken ∆t + M , M = maxt∈[t0 ,t0 +T ] |φ′′ (t)|,
2
M ∆t2
en+1 ≤ (1 + K∆t)en + , e0 = 0 (start at the right place).
2
M ∆t2
Setting A = (1 + K∆t) and B = 2 , we solve this difference inequality
en+1 ≤ Aen + B, with e0 = 0 (2.28)
as follows:
e1 ≤ Ae0 + B = B,
e2 ≤ Ae1 + B ≤ AB + B = (A + 1)B,
e3 ≤ Ae2 + B ≤ A(A + 1)B + B = (A2 + A + 1)B,
..
.
N −1 µ N ¶
N −1 N −2
X
j A −1
eN ≤ (A +A + · · · + A + 1)B = B A =B .
A−1
j=0
3
If g(x) has Lipschitz constant K then |g(x) − g(y)| ≤ K|x − y|. If g is differentiable then K is the maximum
value of g ′ , but Lipschitz constants can also be defined for continuous non-differentiable functions like g(x) = |x|.
30
M ∆t2 (1 + K∆t)N − 1 M ∆t2 eN K∆t − 1
· ¸ · ¸
⇒ eN ≤ ≤ ; (2.29)
2 K∆t 2 K∆t
α2 N 2
the last inequality uses (1 + α)N ≤ 1 + αN + 2! + · · · = eαN . We conclude that
M ∆t KN ∆t M ∆t KT def
eN ≤ [e − 1] = [e − 1] = K∆t, (2.30)
2K 2K
with a constant K that depends on K, M , and T , but not on N . Hence the global accumulated
error is O(∆t). Euler’s scheme is therefore called a first-order method : to double the accuracy at
a fixed time T , we must halve ∆t and thus double the number of steps N .
2.2.2 The second-order Euler-Heun method
Let φ(t) be an exact solution of ẋ = f (x, t) and consider the formula derived integrating the
ODE: Z tn+1
φ(tn+1 ) = φ(tn ) + f (φ(s), s)ds. (2.31)
tn
(This is an integral equation for the solution we seek, but of course we can’t solve it for general
nonlinear functions f !) Euler’s (forward or backward) method approximates the integral of (2.31)
simply by the product of steplength (tn+1 − tn ) and the value of f either at tn or tn+1 . A better
approximation to this integral should therefore improve the estimate. As Fig. 2.11 shows, Euler
uses a rectangular (cf. Riemann sum) approximation. The second-order Euler-Heun method uses
a trapezoidal approximation:
Z tn+1
tn+1 − tn
f (φ(s), s)ds ≈ [f (φ(tn ), tn ) + f (φ(tn+1 ), tn+1 )]. (2.32)
tn 2
11111111
00000000 P(t)
11111111
00000000f(P( t n+1), t n+1)
00000000
11111111 00000000
11111111
00000000
11111111 Backward
00000000
11111111
00000000
11111111
00000000
11111111
Euler
00000000
11111111
f(P( t n),t n)
00000000
11111111 Forward
00000000
11111111
111
000
00000000
11111111
Second−order
00000000
11111111 00000000
11111111
Integral
Euler
00000000
11111111 00000000
11111111
00000000
11111111 00000000
11111111
00000000
11111111
00000000
11111111 00000000
11111111
tn t n+1 tn00000000
11111111
t n+1
Figure 2.11: First-order and second-order approximations to the integral in (2.31).
Of course, we don’t know φ(t) and so cannot calculate this expression explicitly, but we do our
best by writing xn ≈ φ(tn ), xn+1 ≈ φ(tn+1 ), which gives:
∆t
xn+1 = xn + [f (xn , tn ) + f (xn+1 , tn+1 )], ∆t = tn+1 − tn . (2.33)
2
31
This is still an implicit method (xn+1 appears on RHS), but we may make it explicit at the cost of
another approximation, by estimating xn+1 = xn + ∆tf (xn , tn ) from first-order Euler:
∆t
xn+1 = xn + [f (xn , tn ) + f (xn + ∆tf (xn , tn ), tn+1 )]. (2.34)
2
This defines the improved Euler or Heun method. The stepwise error is O(∆t3 ) and the global
accumulated error is O(∆t2 ) (not proved here). Halving the step size now improves accuracy by a
factor of 4!
2.2.3 The Runge-Kutta method
The Runge-Kutta (RK) method is the basis of many ‘integration packages’, including ones called
by Matlab. Here’s the recipe for a fixed step size RK method for the scalar ODE ẋ = f (x, t):
∆t 1
xn+1 = xn + [k + 2kn2 + 2kn3 + kn4 ], where (2.35a)
6 n
kn1 = f (xn , tn ), (2.35b)
∆tkn1
∆t
kn2 = f (xn + , tn +
), (2.35c)
2 2
∆tkn2 ∆t
kn3 = f (xn + , tn + ), (2.35d)
2 2
kn4 = f (xn + ∆tkn3 , tn + ∆t). (2.35e)
This does a much better job of approximating the integral (2.31) by averaging the slope of the
vector field over its values at four points in the step, as pictured in Fig. 2.12.
0110
x
1010
P(t)
1010
xn 1010 t f(x n,t n)= t k1n
1010 t k3n t k1n
11111111111
00000000000
10
t 1
t 2 kn t k2n
t t 2{
tn t n+1 2 kn
} t k4
n
tn t n+ t /2 t n+1
Figure 2.12: Comparison of 1 point average forward-Euler and 4 point average Runge-Kutta.
The resulting stepwise error is O(∆t5 ) and the global accumulated error is O(∆t4 ), so this is a
fourth order method, abbreviated as RK4 (there is also a fifth order RK method). More calculations
are required per step, but far fewer steps are needed to obtain the same accuracy at time T . For
example, to obtain accuracy of O(10−6 ) at T = 1 for first-order Euler, we need ∆t ∼ 10−6 and
32
thus O(106 ) steps. For second-order Euler/Heun: ∆t ∼ 10−3 and O(103 ) steps, but for fourth-
1 6 √
order Runge-Kutta, ∆t ≈ 30 requires only O(10 4 ) = O( 1000) = O(30) steps. The extension of
second-order Euler-Heun and RK4 to systems is done component-wise as in (2.16).
Here we have considered only fixed step methods, for which ∆t = tn+1 − tn , ∀n. Clearly, we
could take larger steps in regions in which f (x, t) varies little (or ‘slowly’), so that the stepwise
error constant M is small, and, to maintain the same absolute accuracy, take shorter steps in
regions in which f changes rapidly. Such variable stepsize methods can also reduce computational
effort. At each step, the error is estimated and larger or shorter steps are used, sometimes in a
predictor/corrector strategy. The Matlab routine ode45() uses such a strategy. However, see [247]
for an interesting example of how it can fail.
2.2.4 Newton’s method for finding zeros
Finding the zeros of a function seems a simpler problem than solving an ODE: one must only
approximate some numbers, not a whole function. However, it also requires an iterative approach.
Suppose we want to solve f (x) = 0, with f ∈ C 1 (at least once-differentiable). This can be an
embarrassing question for a mathematician. For example, there is no closed form formula for the
roots of a polynomial of nth order for n ≥ 5, and he or she must resort to a numerical method.
We define the iterative process:
f (xn ) df
xn+1 = xn − ′
, where f ′ (x) = (x). (2.36)
f (xn ) dx
Proposition 2. For almost all x0 , the sequence generated by (2.36) {xn }∞n=0 → x̄, where f (x̄) = 0,
i.e. the sequence converges on one of the (possibly many) solutions of f (x) = 0.
Proof. (Sketch) Provided ff′(x̄) f (x̄)

(x̄) = 0 the limit equation x̄ = x̄ − f ′ (x̄) is clearly satisfied, and so
we can expect a well-defined limit point provided that the zero is simple: f ′ (x̄) 6= 0. This is true
for almost all zeros, since otherwise two conditions f (x̄) = f ′ (x̄) = 0 must hold simultaneously.
Moreover, the fixed point x̄ of the iterated map (2.36) is locally asymptotically stable, as we see
by linearizing (2.36) at x̄. Letting x = x̄ + y and expanding in a Taylor series as we did when
linearizing ODEs, (2.36) becomes
x̄ + yn+1 = x̄ + yn − ff′(x̄+y
(x̄+y
n)
n)
,
³ ´′ h ′ ′ i¯
f ′′ ¯
⇒ yn+1 = yn − f ′ (x̄) yn + O(yn2 ) = yn − f f(f−f
f
′ )2 ¯ yn + O(yn2 ).
x=x̄
Truncating at the linear terms, we have
f ′ (x̄)2 − f (x̄)f ′′ (x̄)

· ¸
yn+1 = 1 − yn = 0, (2.37)
f ′ (x̄)2
′
since f (x̄) = 0 and ff ′ (x̄)
(x̄) = 1. So the eigenvalue λ = 0 for this one-dimensional map, and we have
superlinear or quadratic convergence.
33
f(x)
f’(x 0) =0
f(x)
f’(x 0) f(x 0)
f(x)
1
x x x x
x2 x1 x0 x0 x1 x0 x
Figure 2.13: Geometric interpretation of Newton’s method. a) Normal case, with basin of attraction
and quadratic convergence rate. b) A ‘bad’ initial guess x0 for which f ′ (x0 ) = 0. c) Another bad
case: x0 lies in a periodic cycle.
There is a geometric interpretation to Newton’s method as can be seen in Fig. 2.13. In a), all
initial data is in the ‘basin of attraction’ of x̄, and approaches x̄ under iteration (2.36). In b),
there is an example of a ‘bad’ x0 , at which x1 = x0 − ff′(x 0)
(x0 ) = ±∞!! Finally, in c) we see x0 in an
(unstable) cycle. Consider small perturbations of these bad choices, and think about what might
happen if f (x) has multiple roots.
Numerical methods is a HUGE field! MAT 342, APC523 and other courses do a proper job on
it. This brief taste is only an appetizer!
2.3 Geometric theory of dynamical systems
In his attempts to solve problems of celestial mechanics in the late 19th century, the great
French mathematician Henri Poincaré thought deeply about nonlinear differential equations. His
ideas, research papers, and books (e.g. [202, 203]) gave rise to the qualitative or geometric theory
of dynamical systems. The following, partially taken from [130], provides an informal introduction
to some of the main ideas. For background and deeper coverage of the main ideas, see [121], and
for a more advanced treatment, see [107] (and take MAE541/APC571!).
Dynamical systems are more general objects than differential equations: they consist of a phase
or state space M and a rule that specifies how points x ∈ M (system states) evolve as time
progresses. Evolution can be continuous, as in ODEs, or discrete, as in iterated maps of the form:
xn+1 = F(xn ), (2.38)
examples of which include the numerical algorithms of §2.2. (In fact dynamical systems ideas can
in turn be used to analyse numerical algorithms such as Newton’s method for finding zeros [252].)
Solutions of ODEs near periodic orbits can be studied via Poincaré maps, as sketched in §2.3.4.
Maps (or mappings) can also be defined directly in modeling phenomena such as populations, which
may more naturally be described in terms of generations, or via census data taken periodically.
Dynamical systems can be deterministic, such as the ODEs introduced above and below, in which
34
the evolution rules specify uniquely the future and past, given the present state. Alternatively,
in stochastic or random dynamical systems, an element of chance intervenes, often modeled by a
gaussian (white noise) process. We shall meet simple examples of these in the form of stochastic
differential equations (SDEs) in chapter 6, and in particular, the drift-diffusion process.
2.3.1 Stable and unstable manifolds
In §2.1 we described the linearization procedure. We now investigate the geometry implied by
the linear algebra. The eigenvectors of the linear system (2.7) (or (2.6)) define invariant subspaces4
of solutions described by the flow map (2.10) and exponential formulae such as (2.8-2.9). For
convenience, we repeat (with a superscript e to denote that linearization is carried out at the
equilibrium xe :
x(t) = Φe (t)x0 , where Φ(t) = Xe (t)[Xe (0)]−1 , (2.39)
where Xe (t) is a fundamental solution matrix for the linearised system (2.6).
It is a wonderful fact that the closed form solution (2.39) of the linearized system (2.7) is a good
approximation for solutions of (2.1) as long as all the eigenvalues of Df (xe ) have non-zero real
parts and |ξ| remains sufficiently small. More precisely, the Hartman–Grobman theorem [121, 107]
asserts that there is a continuous change of coordinates transforming solutions of the full nonlinear
system (2.1) in a neighborhood B(xe ) of the fixed point into those of (2.7) in a similar neighborhood
of ξ = 0. This is called topological equivalence. Even better, the decomposition of the state space
Rn into invariant subspaces spanned by (collections of) eigenvectors vi also holds for the nonlinear
system in B(xe ), in that it possesses invariant manifolds filled with solutions whose qualitative
behavior echoes that of the linearised system. (For our puposes, we need only know that a manifold
is a curved space that looks locally like a piece of Euclidean space: see the examples below.)
First, recall that under the linear flow map Φe (t) (2.3) any solution initially lying in a k-
dimensional subspace span{v1 , . . . , vk }, (k < n) remains in that subspace. Thus we can define two
distinguished invariant subspaces whose names will reflect the asymptotic properties of solutions
belonging to them. Suppose that Df (xe ) has s (≤ n) and u (≤ n − s) eigenvalues with nega-
tive and positive real parts respectively and number them λ1 , . . . , λs and λs+1 , . . . , λs+u (counting
multiplicities). Let
E s = span{v1 , . . . , vs } and E u = span{vs+1 , . . . , vs+u } .
We call E s the stable subspace and E u the unstable subspace: solutions in E s decay exponentially
and those in E u grow exponentially as t increases. The stable manifold theorem states that for
(2.1) in a neighborhood B(xe ) of the equilibrium xe , there exist local stable and unstable manifolds
s (xe ), W u (xe ) of dimensions s and u respectively, tangent at xe to E s and E u , and characterised
Wloc loc
as follows, using the flow map (2.3):
s (xe ) = {x ∈ B(xe )|φ (x) → xe as t → +∞ and φ (x) ∈ B(xe ) for all t ≥ 0} ;
Wloc t t
u (xe ) = {x ∈ B(xe )|φ (x) → xe as t → −∞ and φ (x) ∈ B(xe ) for all t ≤ 0} .
Wloc t t
4
A set S is invariant for an ODE if every solution x(t) with initial condition x(0) ∈ S, remains in S for all time t.
35
In words, the local stable manifold consists of all solutions that start and remain near the equi-
librium for all future time and approach it as time tends to infinity, and the unstable manifold is
defined similarly, with the substitutions “past time” and “minus infinity.”
These manifolds are smooth, curved surfaces, which locally look like the linear subspaces E s
and E u (as the surface of the earth looks locally like a plane); see Fig. 2.14(a). We can make this
more precise as follows. In terms of the local coordinates ξ near xe the smooth manifolds Wlocs (xe )
u e s u
and Wloc (x ) can be expressed as graphs over E and E respectively. Picking an eigenvector basis
and letting E s⊥ denote the (n − s)-dimensional orthogonal complement to E s and y ∈ E s , z ∈ E s⊥
be local coordinates, we can write
s
Wloc (xe ) = {(y, z)|(y, z) ∈ B(0) and z = g(y)} (2.40)
for some smooth function g : E s → E s⊥ . We cannot generally compute g (if we could we would
have found solutions of (2.1)), but we can approximate it, as we shall see below.
Equipped with the local manifolds we can define the global stable and unstable manifolds:
[ [
W s (xe ) = s
φt (Wloc (xe )) ; W u (xe ) = u
φt (Wloc (xe )) .
t≤0 t≥0
These are the unions of backwards and forwards images of the local manifolds under the flow map
of the nonlinear system (Eqn. (2.3)). Thus W s (xe ) is the set of all points whose orbits approach xe
as t → +∞, even if they leave B(xe ) for a while, and W u (xe ) is the set of all points whose orbits
approach xe as t → −∞: Fig. 2.14(a).
u
E
s
E xe
xe
B(x e)
b)
a)
Figure 2.14: Stable and unstable manifolds (a). The local manifolds in B(xe ) can be expressed
as graphs as in (2.40), but globally they may “double back” and even intersect one another. A
homoclinic orbit (b).
A stable manifold cannot intersect itself (except at the fixed point); nor can it intersect the
stable manifold of any other fixed point, since this would violate uniqueness of solutions: the
intersection point would lead to more than one future. The same is true of unstable manifolds.
However, intersections of stable and unstable manifolds can occur: they contain solutions that
lead from one fixed point to another. If a solution x(t) leaves and returns to the same fixed point
36
(x(t) ∈ W u (xe ) ∩ W s (xe )), it is called homoclinic: Fig. 2.14(b). If it passes between two different
fixed points (x(t) ∈ W u (xe1 ) ∩ W s (xe2 )) it is heteroclinic.
2.3.2 Planar systems of ODEs
In a similar manner to fixed points, periodic orbits also carry stable and unstable manifolds
filled with orbits that are forwards and backwards-asymptotic to them. The global orbit structure
in state space is largely organized by the stable and unstable manifolds. The former can act as
separatrices dividing solutions that can have very different future behaviors. See example 7 below.
However, while Taylor series approximations for graphs defining the local manifolds (such as (2.40))
can be found, truly global behavior can only be established in simple cases such as two-dimensional
systems, of the form
ẋ1 = f1 (x1 , x2 ),
(2.41)
ẋ2 = f2 (x1 , x2 ).
Nullclines are the curves in the (x1 , x2 )-state space on which one component of the vectorfield
vanishes, so that solutions cross them either vertically or horizontally. Specifically, ẋ1 = 0 on
f1 (x1 , x2 ) = 0 and ẋ2 = 0 on f2 (x1 , x2 ) = 0. (One can also define isoclines [277] on which the slope
of the vectorfield is constant: nullclines are a special case of these.) It is often possible, although
tedious, to solve for nullclines explicitly, plot them on the state R2 , also called the phase plane,
and then sketch enough arcs and segments of solution curves that one can better interpret the
results of numerical solutions. Clearly, fixed points lie at the intersections of the nullclines, where
both components of the vectorfield vanish, and the orientations of the tangent vectors in regions
bounded by different components of the nullclines can help us to assemble the local pieces of state
space determined by linearization into a global picture.
Example 6. Nonlinear gain control in the retina [277, §6.2].
In this example we illustrate linearization and the use of nullclines, and show how some global
implications can be deduced from simple calculations of the direction of the vectorfield. Here the
state variables B and A represent activity levels of bipolar and amacrine cells in the retina, exposed
to light of intensity L ≥ 0 (assumed to be constant):
1
Ḃ = τB [−B + L/(1 + A)]
1 . (2.42)
Ȧ = τA [−A + 2B]
The terms −A, −B model the decay of activity with time constants τA and τB , in the absence of
inputs, 2B models the excitatory influence of bipolar cells on amacrine cells, and L/(1 + A) in the
first equation represents the feedback from amacrine cells that reduces the bipolar response. We
shall assume that L is constant. See [277] for more details.
Neural activity, measured as an average spike rate for example, is normally regarded as a non-
negative quantity (it might drop to zero), so we first ask if this model allows A or B to become
L
negative. Suppose we start at a point (A, B) = (A(0), 0) with A(0) ≥ 0: then Ḃ = (τB (1+A)) ≥ 0, so
B cannot go negative. Similarly, starting at (0, B(0) with B(0) ≥ 0, Ȧ = 2B
τA ≥ 0 and A cannot go
negative either. In fact the positive quadrant {Q+ = (B, A)|B ≥ 0, A ≥ 0} is positively invariant:
37
solutions of (2.42) starting in Q+ stay in Q+ for all future time. This also ensures that, given
physically acceptable initial conditions (A(0) ≥ 0, B(0) ≥ 0), solutions stay away from the line
A = −1 where the function defining the first equation is singular (= blows up). See Fig. 2.15.
1.8 A
1.6
Q+
1.4
1.2
0.8
0.6
0.4
0.2
0
L B
0 0.2 0.4 0.6 0.8 1
Figure 2.15: The phase portrait of the retinal gain model. The red hyperbola and green diagonal
are the nullclines Ȧ = 0 ⇒ B = L/(1 + A) and Ḃ = 0 ⇒ A = 2B respectively.
Fixed points are given by −A + 2B = 0 and −B + L/(1 + A) = 0, and setting A = 2B in the

second equation and solving for B we obtain:
√ √
−1 ± 1 + 8L −1 ± 1 + 8L
B̄ = and Ā = , (2.43)
4 2
so there is a single equilibrium in the physically-relevant region Q+ , corresponding to choosing the
positive roots in (2.43). Linearizing (2.42) we obtain the matrix:
Ã !
−1 −L
τB τB (1+A)2
2 −1 , (2.44)
τA τA
and evaluating at (B̄, Ā) yields:

Ã √
( 1+8L−1)2
! √
−1
µ ¶
τB 16τB L
1 + 8L − 1 + 8L 1 1
2 −1 ⇒ det(Df ) = > 0, tr(Df ) = − + < 0. (2.45)
τA τA
4τA τB L τA τB
The eigenvalues of the 2 × 2 matrix Df are given by the quadratic equation
λ2 − tr(Df )λ + det(Df ) = 0, (2.46)
and so, for all L, τA , τB > 0, we conclude that the fixed point is stable. (See Fig. 2.16 and the
discussion on pp 42-23 below.)
We have already shown that the postive quadrant Q+ is positively invariant. We can also show
that a trapping region A exists within Q+ such that all solutions starting in Q+ sufficiently close
to A eventually enter A. Referring to Fig. 2.15, we draw vertical and horizontal lines at B = C
38
and A = 2C for some C > L that together with the A and B axes bound a rectangular domain A
and consider the components of the vector field on these lines and normal to them:
Ḃ |B=C = τ1B [−C + L/(1 + A)] < 0, since A ≥ 0 and C > L;

Ȧ |A=2C = τ1A [−2C + 2B] < 0, since B ≤ C.
With the results established earlier, this shows that all solutions starting on the boundary of A enter
it. In fact we can do even more, as indicated in Exercise 3 below.
In addition to stable and unstable fixed points, systems of dimension ≥ 2 can exhibit periodic
orbits. These are closed curves in the state space that may attract or repel orbits in their neigh-
borhoods, in which case they are called limit cycles, since the neighboring orbits approach them as
t → ∞ or t → −∞ respectively. The harmonic oscillator of Example 3 is a linear, planar system
that has a continuous family of periodic orbits, but these are not limit cycles because all neighbor-
ing orbits remain on their own closed curves (circles, for the special case ẋ = y, ẏ = −x). However,
the system
x˙1 = x1 − ωx2 − (x21 + x22 )x1 ,
(2.47)
x˙2 = ωx1 + x2 − (x21 + x22 )x2 ,
shows that nonlinear ODEs can have limit cycles; in fact, transforming Eqns. (2.47) into polar
coordinates x1 = r cos θ, x2 = r sin θ as in Example 1, we obtain
ṙ = r − r3 , θ̇ = ω , (2.48)
showing that all orbits except that starting at the fixed point (x1 , x2 ) = (0, 0) approach the circle
r = 1 as t → ∞. This examplifies an asymmptotically stable limit cycle.
Exercise 2. Write a Matlab code to simulate Eqns. (2.47) using Euler’s method and compare the
results with exact solutions obtained by solving Eqns. (2.48) analytically. Compute solutions for
T = 5 time units. Start by setting ω = 10 and ∆t = 0.0001 and the decrease ∆t successively to
0.001, 0.01 and 0.1. Explain what you see, illustrating with phase plane plots of (x1 (t), x2 (t) and
plots of r(t) vs. t, and interpret the results in the light of the discussion of the harmonic oscillator
in §2.2.1 and Example 1.
Returning to Example 6, note that we have proved that solutions enter the trapping region
A, but we have not proved that all solutions approach the stable sink inside it. A stable limit
cycle might also lie in A, attracting all or some of the solutions entering it. If this were the case,
then some other invariant set(s) must also exist, separating orbits that approach the limit cycle
and those approaching the sink. For planar systems, the following result allows us to exclude the
possibility of closed orbits (and hence of limit cycles) in some cases.
Theorem 2 (Bendixson’s criterion). If on a simply-connected region D ⊂ R2 the quantity tr(Df ) =
∂f1 ∂f2
∂x1 + ∂x2 is not indentically zero and does not change sign, then the planar ODE (2.41) has no
closed orbits lying entirely in D.
Proof. (Sketch) This is proved by noting that on any closed (= time-periodic) orbit Γ we have:
ẋ2 dx2 f2
Z
= = ⇒ [f1 (x1 , x2 ) dx2 − f2 (x1 , x2 ) dx1 ] = 0;
ẋ1 dx1 f1 Γ
39
hence, appealing to Green’s theorem, we deduce that
· ¸
∂f1 ∂f2
Z Z
+ dx1 dx2 = 0,
intΓ ∂x1 ∂x2
thereby obtaining a contradiction. Simple connectivity is required for Green’s theorem.
Exercise 3. Use Bendixson’s criterion to show that the ODEs (2.42) cannot have periodic solutions.
Can you go on from this to prove that all solutions starting in the positive quadrant approach the
stable sink (B̄, Ā)?
The Poincaré-Bendixson theorem, stated at the beginning of §2.3.4, provides a kind of converse
to Bendixson’s criterion in that it gives conditions sufficient for the existence of closed orbits and
limit cycles.
Unlike the local conclusions drawn from linearization, which are restricted to neighborhoods of
each fixed point, Bendixson’s criterion gives a (semi-) global result for the flow. Its validity depend
the size of the region D, but in some case D = RR2 : the entire plane. For planar systems it is also
possible to obtain conditions on the types of fixed points that may lie in given regions using index
theory.
We start by defining the index of a simple closed curve CinR2 that does not pass through any
fixed point of the ODEs (2.41) (and is not, in general, a solution curve of these ODEs). The index
C is defined by taking a point p = (x1 , x2 ) ∈ C and letting it traverse C once counterclockwise and
return to its starting point. During this process the orientation of the vector (f1 (x1 , x2 ), f2 (x1 , x2 ))
changes continuously, and must therefore rotate through a net angle 2πk for some integer k: k is
the index of the closed curve C. It can be shown that k does not depend of the exact form of C, but
only upon properties of the fixed points of Eqns. (2.41) inside C. The index of C may be obtained
via the integral
Z · µ ¶¸ Z · µ ¶¸ Z · ¸
dx2 f2 f1 df2 − f2 df1
2πk = d arctan = d arctan = d (2.49)
C dx1 C f1 C f12 + f22
[11, § V.8].
If C encircles an isolated fixed point xe then k is the index of xe . Either by using Eqn. (2.49)
or simplly by drawing pictures of planar vectorfields the following proposition can be verified.
Proposition 3. Some indices for planar ODEs.
(i) The index of a sink, a source or a center is +1.
(ii) The index of a hyperbolic saddle point is −1.
(iii) The index of a closed orbit is +1.
(iv) The index of a closed curve containing no fixed points orbit is 0.
(v) The index of a closed curve is equal to the sum of the indices of all fixed points within it.
From this proposition we can deduce

Corollary 1. A closed orbit Γ of a planar ODE system must contain at least one fixed point and if
this point is unique, it must be a sink or a source. If all the saddle points within Γ are hyperbolic,
then there must be 2n + 1 fixed points within Γ, n ≥ 0 of which are saddles and the remainder
sources or sinks.
40
Degenerate (non-hyperbolic) fixed points, especially those with a double zero eigenvalue, can
have indices |k| > 1, for example, the origin (0, 0) of the ODE
ẋ1 = x21 − x22 , ẋ2 = 2x1 x2 (2.50)
has index 2, and for

ẋ1 = −x21 ẋ2 = ±x2 (2.51)
it has index 0. The latter is an example of a saddle-node bifurcation point, described more generally
in the next section. See [11, § V.8] and [107, § 1.8] for more on index theory for planar systems.
The extended Definition 5 of stability given at the end of §2.1.2 suggests the more general idea
of an attracting set, which subsumes the special cases of asymptotically stable fixed points (sinks)
and stable periodic orbits or limit cycles:
Definition 6. An invariant set A for a dynamical system is an attracting set if all solutions
starting in a neighborhood of A, approach A as t → ∞.
The structures of 1-dimensional orbits on the plane are constrained by the fact that such curves
cannot cross (except where they limit at fixed points), due to the uniqueness of solutions. Thus
sinks, asymptotically stable limit cycles, and certain unions of curves connecting saddle points are
the only attracting sets possible for planar ODEs [107, §1.8-9]. However, much richer behavior,
including chaos and strange attractors that contain infinitely many unstable periodic orbits, occur
in ODEs of dimensions ≥ 3.
2.3.3 Center manifolds and local bifurcations
As parameters change, so do phase portraits and the resulting dynamical behavior of model
systems. New fixed points can appear, and the stability types of existing ones may change in
bifurcations. Examples will follow (the reader may like to look ahead to Example 7 and Figs. 2.22-
2.23 at the end of this section), but first we develop some background. Bifurcation theory addresses
how system behavior changes as control parameters are varied. We consider families of systems of
the form
ẋ = f (x; µ) or ẋ = fµ (x) , (2.52)
where µ ∈ Rm is a (vector of) parameter(s). Often all but one are (temporarily) fixed and we
consider a one-parameter family, which corresponds to a path in the appropriate space of systems.
Here it is necessary to introduce the important notion of structural stability, which is entirely
different from the Liapunov and asymptotic stabilities of equilibria and orbits defined in §2.1-2.1.2.
Rather than considering how solutions of a given ODE system change under small perturbations
of the initial conditions, we ask how the entire set of solutions – the global phase portrait – of a
system changes, when the defining vectorfield fµ (x) undergoes a small change, as happens when
parameters vary. We will sweep the (tricky) technical issues of defining “small perturbation” for a
system of ODEs under the rug (one needs to define a space of systems with appropriate metrics and
norms to describe how close two systems are, etc.), and merely sketch the idea. Rigorous definitions
can be found in [107].
41
We say that two systems are topologically equivalent if the phase portrait of one can be mapped
onto that of the other by a continuous coordinate change: i.e., there is a one-to-one correspondence
of all orbit segments. Furthermore, the sense of time must be preserved (sinks remain sinks, sources
remain sources). A small perturbation is one for which not only function values, but also their first
derivatives, are close:
||f (x) − g(x)|| < ǫ and ||Df (x) − Dg(x)|| < ǫ , (2.53)
qP
where || · || denotes a suitable norm, e.g. Euclidean (for a matrix ||A|| = 2
i,j |ai,j | ). Thus
equipped, we may state:
Definition 7. The nonlinear system ẋ = f (x) is structurally stable if all sufficiently close
systems ẋ = g(x) are topologically equivalent to ẋ = f (x).
The case of two-dimensional linear systems nicely illustrates this, and further illuminates our
linearized stability analyses. Consider the ODEs
ẋ1 = a11 x1 + a12 x2 ,

(2.54)
ẋ2 = a21 x1 + a22 x2 .
The eigenvalues of the matrix A = [aij ] are given by the roots of the characteristic equation
λ2 − λ trA + det A = 0 ,
where tr A = a11 + a22 and det A = a11 a22 − a12 a21 . Thus, if det(A) < 0 then (0, 0) is a saddle
point, while if det(A) > 0 and tr(A) < 0 it is a sink and if det(A) > 0 and tr(A) > 0 it is a source.
The diagram of Fig. 2.16 summarises the qualitative behavior. On the diagram we also indicate
the parabola [tr(A)]2 = 4 det(A) on which the eigenvalues change from real to complex, but this
does not correspond to a topological change in the character of solutions: a sink is a sink, whether
solutions approach it directly or in spirals.
4
tr(A)
3
sources(nodes)
1 sources(foci)
saddles det(A)
0
−1 sinks(foci)
−2
−3
sinks(nodes)
−4
−4 −3 −2 −1 0 1 2 3 4
Figure 2.16: Stability types of the fixed point at the origin for the planar system (2.54).
Here the space of systems is effectively the plane with det(A), tr(A) as coordinates and any
system not on the tr(A)-axis or the positive det(A)-axis is structurally stable, since all systems in
42
a sufficiently small neighborhood of it share its behavior. On the positive det(A)-axis, however,
we have purely imaginary eigenvalues and the ODE behaves like an undamped harmonic oscillator.
Such systems are structurally unstable. Moving off the axis – adding damping, no matter how
small – yields qualitatively different behavior. The periodic orbits are all destroyed and replaced
by solutions spiralling into a sink, or out from a source.
We now return to bifurcations, via a simple example. The one-parameter, one-dimensional

family of ODEs
ẋ = µ − x2 (2.55)
will serve to introduce local bifurcation theory. Equation (2.55) is soluble in closed form, but here
we need only observe that, for µ < 0 it has no equilibria and all solutions approach −∞ as t
√
increases and +∞ as t decreases, while for µ > 0 there are two hyperbolic equilibria xe1 = + µ
√
and xe2 = − µ, the former being an asymptotically stable sink and the latter an unstable source,
√
and that all solutions starting at x(0) > − µ approach the sink as t → +∞. At µ = 0 the single
equilibrium xe = 0 is non-hyperbolic and the linearisation ξ˙ = 2xe ξ = 0 gives no information on
stability. For all µ 6= 0 (2.55) is, in fact, structurally stable: its behavior changes as it passes
through the point µ = 0 of structural instability. This information can be conveniently pictured on
a bifurcation diagram: a plot of the branches of equilibria versus the parameter: Fig. 2.17. This is
a saddle-node bifurcation [121, 107].
2
x
1.5
0.5
0 µ
−0.5
−1
−1.5
−2
−2 −1.5 −1 −0.5 0 0.5 1 1.5 2
Figure 2.17: The bifurcation diagram for Equation (2.55). Vertical lines represent phase portraits of
individual systems, with arrows corresponding to time and thus displaying stability; —– represents
a branch of sinks, - - - - a branch of sources.
In general we have:
Definition 8. A value µ0 of µ for which Equation (2.52) is structurally unstable is a bifurcation

value.
The bifurcation at µ = 0 for (2.55) is called local because for µ ≈ 0 the qualitative change in
the phase portrait is confined to a neighborhood of x = 0. In fact when such bifurcations occur in
more complex, higher-dimensional systems, they can be analysed by local methods: examination
43
of truncated Taylor series expansions near the degenerate (non-hyperbolic) equilibrium or periodic
orbit involved in the bifurcation. Here the center subspace E c , spanned by eigenvectors whose
eigenvalues have zero real part, is important. For example, supplementing the example of (2.55) in
the following way:
ẋ1 = −x1
ẋ2 = µ − x22 , (2.56)
ẋ3 = 2x3
we can see that “nothing happens” in the x1 - and x3 -directions as µ varies. At µ = 0 the degenerate
equilibrium x = 0 is hyperbolic in those two directions and they can effectively be ignored. It is a
remarkable fact that this remains true even when there is nonlinear coupling between the hyperbolic
and non-hyperbolic directions. This is the content of the center manifold theorem [107]. In addition
to local stable and unstable manifolds Wlocs (xe ), W u (xe ), a center manifold W c (xe ), tangent to
loc loc
E c , exists in a neighborhood of the degenerate equilibrium xe . See Fig. 2.18. Unlike Wloc s (xe )
u e c e
and Wloc (x ), however, Wloc (x ) cannot be characterized in terms of the asymptotic behavior of
solutions within it. For example, at µ = 0 solutions of (2.56) approach x = 0 from x2 < 0 and
recede from it for x2 > 0 (the fixed point is “semi-stable”).
Eu
u(xe)
Wloc
Es
s (xe)
Wloc
c (xe)
Wloc
Ec
Figure 2.18: The stable, center, and unstable manifolds
Fig. 2.18 provides a schematic picture of the phase space near a degenerate equilibrium with
s negative, u positive, and c zero (real part) eigenvalues. As the picture suggests, the “nonlinear
s , W u , and W c allow us to separate stable, unstable
coordinates” implicit in the definitions of Wloc loc loc
and non-hyperbolic or bifurcation behaviors and so to reduce our analysis to that of a c-dimensional
system restricted to the center manifold.
To describe the reduction process we assume that coordinates have been chosen with the de-
generate equilibrium xe at the origin and such that the matrix Df (xe ) is block diagonalised. Thus
(2.52) can be written in the form:
ẋ = Ax + f (x, y),
(2.57)
ẏ = By + g(x, y),
where x belongs to the center subspace E c and y belongs to the stable and unstable subspaces. We
drop the explicit reference to parameter dependence. Here all the eigenvalues of the c × c matrix
A have zero real parts and the (n − c) × (n − c) = (s + u) × (s + u) matrix B has only eigenvalues
with non-zero real parts. The center manifold can now be expressed as a graph h over E c :
y = h(x) . (2.58)
This implies that, as long as solutions remain on the center manifold, the full state (x, y) of the
system can be specified, via (2.58) by the state of the x variables alone. The reduced system is then
44
defined as
ẋ = Ax + f (x, h(x)) (2.59)
and the stability and bifurcation behavior near the degenerate equilibrium can be deduced from it.
As an example, consider the two dimensional system
ẋ1 = µx1 − x1 x2 ,
(2.60)
ẋ2 = −x2 + x21 ,
whose linear part is already diagonalised. At µ = 0 the eigenvalues are 0 and −1 respectively and
E c = span{(1, 0)}, E s = span{(0, 1)}, so we seek the center manifold as a function x2 = h(x1 ).
This function will be a particular solution of the ODE
dx2 −x2 + x21 1 x1

= = − , (2.61)
dx1 −x1 x2 x1 x2
which cannot be solved explicitly. Here and in general we instead approximate the graph h(x1 ) as
a Taylor series, which, in view of the local nature of our study, will be adequate provided that the
leading terms do not vanish.
Returning to the general problem (2.57), we substitute y = h(x) into the second component and
use the chain rule and the first component to obtain a partial differential equation for the function
h(x):
Dh(x)[Ax + f (x, h(x))] = Bh(x) + g(x, h(x)) (2.62)
with “boundary conditions”
h(0) = 0 ; Dh(0) = 0 , (2.63)
which result from the tangency of c
Wloc
to Ec
at xe = 0. We shall seek an approximate solution of
(2.62) and (2.63) in the form of a Taylor series. For our example (2.60) at µ = 0, therefore, we set
x2 = h(x1 ) = a2 x21 + a3 x31 + O(|x1 |4 )

⇒ Dh(x1 ) = 2a2 x1 + 3a3 x21 + O(|x1 |3 ) ; (2.64)
due to the boundary conditions, the Taylor series necessarily starts at second order. Equation (2.62)
becomes, in this case,
[2a2 x1 + 3a3 x21 + O(|x1 |3 )]{−x1 [a2 x21 + a3 x31 + O(|x1 |4 )]}
= −[a2 x21 + a3 x31 + O(|x1 |4 )] + x21 . (2.65)
Equating terms of comparable orders in |x1 | yields:
O(|x1 |2 ) : 0 = −a2 + 1
¾
(2.66)
O(|x1 |3 ) : 0 = a3
and we conclude that the function h may be written
x2 = h(x1 ) = x21 + O(|x1 |4 ) . (2.67)
At µ = 0 the reduced equation is therefore
ẋ1 = −x1 [x21 + O(|x1 |4 )] = −x31 + O(|x1 |5 ) , (2.68)
45
the behavior of which is dominated, for small |x1 |, by the negative cubic term which pushes solutions
towards x1 = 0. The degenerate equilibrium is evidently stable at the bifurcation point, as well
as for all µ < 0. This is not directly obvious from (2.60): the linearisation ẋ1 = µx1 of the first
component tells us nothing at µ = 0 and the reader can check, repeating the calculations above,
that a change in sign in the term x21 in the second component turns stability on its head! (See
Example 1 above.)
To study the bifurcation that occurs at as (x1 , x2 ) = (0, 0) for (2.60) as µ varies from zero, we
should really incorporate µ as an additional “dummy state variable” and compute an extended
center manifold tangent to the (x1 , µ)-plane, but in this case we can solve explicitly for the new
equilibria that appear for µ > 0. Representative phase portraits of the full system for µ < 0, µ = 0
and µ > 0 appear in Fig. 2.19, and Fig. 2.20 shows the bifurcation diagram for the reduced system.
This is an example of a pitchfork bifurcation [107].
2 2 2
1 1 1
x2
x2
x2
0 0 0
−1 −1 −1
−2 −2 −2
−2 0 2 −2 0 2 −2 0 2
x1 x1 x1
Figure 2.19: Phase portraits of the two dimensional system (2.60): (a) µ = −0.2, (b) µ = 0, (c)
µ = 0.2. Figure taken from [130, Fig 6,9].
2
x
1.5
0.5
0 µ
−0.5
−1
−1.5
−2
−2 −1.5 −1 −0.5 0 0.5 1 1.5 2
Figure 2.20: The bifurcation diagram for Equation (2.60); —– represents a branch of sinks, - - - -
a branch of saddle points.
Exercise 4. Verify by direct calculation of equilibria, and by checking stability types, that the phase
portraits of Fig. 2.19 and the bifurcation diagram Fig. 2.20 are correct.
Example 7. The Usher-McClelland leaky accumulator model for two-alternative decisions [259].
46
1
0.9
0.8
0.7
0.6
ψ(x)
0.5
0.4
0.3
0.2
0.1
0
−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1
x
Figure 2.21: Sigmoidal or logistic activation functions, showing the effects of gain g and bias β.
Solid blue curve: g = 1, β = 0; dashed red curve: g = 2, β = 0.2.
This example illustrates nullclines, invariant manifolds and local bifurcations. It is similar to
the short-term memory circuit of [277, §6.4] and the “winner take all” decision-making model of
Eqn. (2.80) below. Consider the system
ẋ1 = −x1 − f (x2 ; g, β) + s1 ,
(2.69)
ẋ2 = −x2 − f (x1 ; g, β) + s2 ,
where the function f is a sigmoid:
1
f (x) = ; (2.70)
1 + exp(−4g(x − β))
see Fig. 2.21. Here x1 and x2 represent the activity levels of two pools of neurons that mutually
inhibit each other through the “input/output” function f . The neurons in each pool are selectively
responsive to one of the stimuli sj , as, for example, in oculo-motor brain areas of monkeys trained
to discriminate arrays of dots moving to the right or left [111, 238]. In the absence of stimuli
(sj = 0), the activities xj (t) decay to a baseline level due to the leakage terms −xj . With stimuli of
different magnitudes present (e.g. s2 > s1 ) one unit receives greater input than the other, so that
its activity intially grows faster than the other’s; this effect is magnified by the greater inhibition
that it exerts (if x2 > x1 then f (x2 ) > f (x1 )). The decision is made either when x1 or x2 first
crosses a threshold, or at a specific time, by selecting the pool with higher activity. See §6.1 for
more on models of this type.
Eqn. (2.69) has four parameters: g, β, s1 and s2 : g and β control the gain and offset or bias
implicit in the (synaptic) connections between the two pools and they can be expected to change only
slowly (as the animal learns the task); the stimulus levels sj will change from trial to trial. The
nullclines of (2.69) can be read off the right hand sides immediately as explicit functions of x1 in
terms of x2 and vice versa:
ẋ1 = 0 on x1 = s1 − f (x2 ; g, β) and ẋ2 = 0 on x2 = s2 − f (x1 ; g, β), (2.71)
and, depending upon the parameter values, we can produce pictures such as those of Fig. 2.22. Note
that the system may have one, two or three fixed points. In particular, if gain g is sufficiently low
47
then the slopes of the sigmoids are such that only one intersection is possible. For high gain and
with sufficiently similar stimuli, three fixed points exist.
2 2
x2 x2
1.5 x1’=0 1.5 x1’=0
1 1
x2’=0 x2’=0
0.5 0.5
0 0
x1 x1
−0.5 −0.5
−1 −1
−1 −0.5 0 0.5 1 1.5 2 −1 −0.5 0 0.5 1 1.5 2
a) low gain, unique intersection b) high gain, 3 intersections
Figure 2.22: Examples of nullclines and fixed points for equations (2.69) with s1 = 0.95, s2 = 1,
β = 0.2, and g = 1 (on left) and g = 2 (on right).
Linearizing (2.69) we obtain the matrix

−f ′ (x2 ; g, β)
µ ¶
−1
Df = , (2.72)
−f ′ (x1 ; g, β) −1
where the derivatives f ′ = f ′ (x̄j ; g, β) are evaluated at the fixed point in question. From (2.71) we
see that tr(Df ) = −2 and det(Df ) = 1 − f ′ (x̄1 ; g, β)f ′ (x̄2 ; g, β). Using the fact that the eigenvalues
of a 2 × 2 matrix such as (2.72) are the roots of the quadratic equation
p
2 tr(Df ) ± [tr(Df )]2 − 4det(Df )
λ − tr(Df )λ + det(Df ) = 0 ⇒ λ1,2 =
p 2
= −1 ± f (x̄1 ; g, β)f ′ (x̄2 ; g, β),
′ (2.73)
we can deduce that the fixed points are saddles with one positive and one negative eigenvalue if
f ′ (x̄1 ; g, β)f ′ (x̄2 ; g, β) > 1 and sinks, with two negative eigenvalues if f ′ (x̄1 ; g, β)f ′ (x̄2 ; g, β) < 1.
The fixed points (x̄1 , x̄2 ) must be found numerically by simultaneoeus solution of the transcen-
dental equations (2.71) defining the nullclines, but it is easy to deduce stability from the graph-
ical evidence of Fig. 2.23. For small slopes and a unique intersection of nullclines, we have
f ′ (x̄1 ; g, β)f ′ (x̄2 ; g, β) < 1 and stability, while for large slopes an unstable fixed point can appear.
The upshot is that, when g > 1 and multiple equilibria exist, the “outer” fixed points that lie in
regions where the derivatives f ′ (x̄j ; g, β) are small, are sinks and the “inner” one is a saddle. See
Fig. 2.23.
In applying Usher and McClelland’s model to discriminating between two stimuli, this phase
portrait may be interpreted as follows. In the absence of stimuli, or with low stimulus levels,
both pools of neurons relax to baseline activity. When a stronger stimulus with low coherence is
presented, so that no preponderant fraction of dots is moving to left or to right, both neural pools are
equally activated, and the one that “wins” by inhibiting the other is determined by initial conditions.
Informally, the animal chooses left or right “at random.”
48
2 1.5
x2 x1’=0 x2
1.5 x1’=0
E1
E2
1 1
x2’=0 Es=Ws
0.5
0 0.5
x2’=0 x1
−0.5
Wu
−1 0
x1
Eu
−1.5
−2 −0.5
−2 −1 0 1 2 −1 −0.5 0 0.5 1 1.5
low s1=s2=0 high s1=s2 = 1
Figure 2.23: The phase planes of (2.69) for large gain g = 2, and equal stimuli, showing one stable
fixed point in the absence of stimuli (s1 = s2 = 0 on left), and three for s1 = s2 = 1 (on right). In
the latter case the outer fixed points are sinks and the inner one is a saddle.
Exercise 5. Show that, in the case of equal stimulus strengths s1 = s2 , the phase portrait of
Eqn. (2.69) is reflection-symmetric about the diagonal x1 = x2 . What happens as the difference
s1 − s2 changes between 0 and 1, if the sum s1 + s2 = 1 is fixed and g > 1? Sketch a bifurcation
diagram and provide an interpretation in terms of typical behavior(s) as t → ∞ (i.e., long-term
decision outcomes).
Exercise 6. Show that the function
x1 x2
∂f ∂f
Z Z
V (x1 , x2 ) = x dx + x dx + f (x1 )f (x2 ) (2.74)
∂x ∂x
is a Liapunov function for Eqns. (2.69) with s1 = s2 = 0, in the sense that dV /dt < 0 unless
x1 = −f (x2 ) and x2 = −f (x1 ) (here we suppress the explicit dependence of f on the parameters g
and β). What does the function V (x1 , x2 ) look like? What can you deduce about the global structure
of the state space? Can you generalize V to cover the case s1 , s2 6= 0?
Grossberg and Hopfield [51, 133, 134, 104] have shown that rather large classes of n-dimensional
neural networks with symmetric connectivity matrices, including that of Example 7, admit Lia-
punov functions: see [277, §14.5, Theorem 16] (Exercise 6 is a very special two-dimensional case).
This implies that all solutions run “downhill” and approach equilibria as time progresses, and for
typical systems almost all solutions approach stable equilibria. No sustained periodic or other cyclic
activity is possible. In contrast, the following generalization of Example 1 in §2.1 above exhibits
persistent oscillatory behavior.
Example 8. Consider the two-dimensional system
ẋ = µx − ωy + α(x2 + y 2 )x,
(2.75)
ẏ = ωx + µy + α(x2 + y 2 )y,
where α and ω are regarded as fixed parameters and µ as the bifurcation parameter. Passing to
polar coordinates as in Example 1, we have:
ṙ = µr + αr3 , θ̇ = ω. (2.76)
49
Suppose that α < 0. Then, for µ < 0 we see that r(t) → 0, since the right hand side of the first
equation
p of (2.76) is strictly negative for all r > 0. In contrast, for µ > 0 a “radial equilibrium”
r = −µ/α > 0 appears. This is, in fact, not an equilibrium, since the angular variable θ
increases (or decreases) continually due to the second equation of (2.76). The new solution is a
limit cycle. As µ increases through zero a Hopf bifurcation occurs in which the fixed point at r = 0
((x1 , x2 ) = (0, 0)) loses stability and a stable limit cycle is born. See Fig. 2.24.
1
|τ|
0.8
0.6 limit cycle

0.4
0.2
equilibrium
0
µ
−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1
1 1 x2
x2
r
0.5 0.5
θ
0 0
x1 x1
−0.5 −0.5
µ<0
µ >0
−1 −1
−1 −0.5 0 0.5 1 −1 −0.5 0 0.5 1
Figure 2.24: A Hopf bifurcation, showing bifurcation diagram (top) and phase portraits of (2.75)
for α < 0 with µ < 0 and µ > 0 (bottom left and right). The bifurcation diagram shows branches
of stable solutions as solid curves and unstable solutions as dashed curves. Note the limit cycle, an
attracting periodic orbit, for µ > 0.
For more on Hopf bifurcations, see [277, §8.4] or for greater detail [107, Chap. 3], which explains
how nonlinear coordinate changes can be made that turn any non-degenerate Hopf bifurcation
into a standard normal form whose leading terms look like those of (2.76) when written in polar
coordinates. In particular, there is an explicit formula for the coefficient α in (2.75-2.76) in terms of
the first three derivatives of the functions defining the vectorfield f (x). Normal forms are essentially
nonlinear generalizations of the similarity transformations which diagonalize matrices in linear
algebra.
Exercise 7. What happens in Eqn. (2.75) above as µ passes through zero in the case that α > 0?
Produce a bifurcation diagram and phase portraits analogous to those of Fig. 2.24. Explain how the
sign of α determines the stability of the limit cycles. Now suppose that a fifth order term is present,
so that the radial equation becomes ṙ = µr + αr3 − r5 , and produce bifurcation diagrams and phase
portraits for this case, showing all topologically distinct cases that can occur.
50
2.3.4 Periodic orbits and Poincaré maps
The Hopf bifurcation described above provides a way to deduce the existence of limit cycles or
periodic orbits from local information: linearization and computation of a stability coefficient at
a degenerate fixed point. Fixed points can occur for ODEs with phase spaces of any dimension
n ≥ 1, but in Euclidean spaces Rn periodic orbits only appear for dimensions n ≥ 2. They are global
phenomena: a solution lying in or attracted to a periodic orbit must leave any sufficiently small
neighborhood and circulate around before returning. This clearly cannot occur on the real line.
Global behavior is generally hard to detect and analyze for n ≥ 3 (for example, chaotic behavior
can occur [121, 107]), but for planar (n = 2) systems there are direct global methods.
The Poincaré-Bendixson theorem [277, §8.1] allows one to prove existence of a periodic orbit
“directly” by showing that orbits must enter a certain region that contains no fixed points:
Theorem 3. Suppose that a planar ODE has an attracting set A that contains no fixed points.
Then A has a periodic orbit.
Actually Wilson’s statement is a little more specfic and more restricted: it assumes the existence
of a positively invariant (attracting) annulus, which implies the presence of at least one asymptot-
ically stable limit cycle. For a more general statement than this or that given above, and a proof,
see [121, §10.5]. One can also deduce necessary conditions for the existence of limit cycles using
index theory; in particular, application of Proposition 3 leads to:
Proposition 4. If a planar ODE has a limit cycle Γ then at least one fixed point must lie inside Γ.
If multiple points lie within Γ, then the numbers Ns of sources and sinks, and Nc of saddle points,
must satisfy Ns − Nc = 1.
Hence if one finds a trapping region B ⊂ R2 and hopes to show that a limit cycle lies within it,
B must exclude all fixed points, so it cannot be simply-connected. For example, if there is a single
fixed point, B can be an annulus. Fig. 2.25 shows an example for the classical van der Pol equation:
x31
ẋ1 = x2 − ( 3 − x1 ), (2.77)
ẋ2 = −x1 .
Finding an inner boundary through which orbits enter the annulus is easy: one need only take a
sufficiently small circle around the unstable source at (x1 , x2 ) = (0, 0), but it is tricky to construct
an outer boundary.
The limit cycle of Fig. 2.25 is another example of an asymptotically stable or attracting periodic
orbit. Periodic orbits can occur in differential equations of any dimension n ≥ 2, but finding them
analytically is much more difficult than finding fixed points, since one must effectively solve the
ODE to do it. It is still useful, however, to develop some analytical tools. The Poincaré map is one
such tool which reduces the analysis of solutions near a periodic orbit γ to the study of a mapping
of dimension n − 1, one lower than that of the original state space. One defines a (small) piece Σ
of an (n − 1)-dimensional manifold (a hyper-plane often suffices), pierced by γ at a point p and
transverse to the flow in that the component of the vector field f normal to Σ does not vanish at
any point in Σ. The (hyper-) surface Σ is called a cross-section: solutions cross it, all in the same
51
4
x2
−1
−2
−3
−4
−5
−4 −3 −2 −1 0 1 2 3 4
x1
Figure 2.25: An annular trapping region B and the limit cycle of the van der Pol equation (2.77).
direction, with non-zero speed. By continuity, solutions starting at points q ∈ Σ near p follow γ
sufficiently closely to intersect Σ again at, say q′ , thus implicitly defining a map on Σ:
P : Σ → Σ or q → q′ = P(q) . (2.78)
P is called the Poincaré or first return map. Note that the point p at which the periodic orbit
intersects the cross-section Σ is a fixed point: p = P(p). Poincaré maps are continuous and at
least as smooth as the right hand sides of the ODEs that define them.
In the example (2.77) above, the positive x1 -axis
Σ = {(x1 , x2 )|x1 > 0, x2 = 0}
is a suitable cross-section, and while we cannot integrate (2.77) explicitly to obtain a formula for
the one-dimensional map P , the instability of the fixed point (0, 0) and attractivity of the trapping
region B from outside implies that P takes the form sketched in Fig. 2.26. It is clear that the
continuous function P must intersect the diagonal in at least one point p > 0, so that p = P (p) is
a fixed point corresponding to the limit cycle. In fact it is true, although not easy to prove, that p
is unique and that the linearised map satisfies 0 < (dP/dx1 )|x1 =p = λp < 1, implying asymptotic
stability of both the fixed point and the corresponding periodic orbit.
More generally, from the theory of iterated matrices, if an (n − 1)-dimensional Poincaré map
DP(p) linearised at a fixed point p has only eigenvalues of modulus strictly less than one, then p
and the associated periodic orbit are asymptotically stable. If at least one eigenvalue has modulus
greater than one, p and the periodic orbit are unstable. This paralells the all-negative and at-least-
one-positive eigenvalue criteria for flows, described in §2.1. The degenerate case, with one or more
eigenvalues of unit modulus, is dealt with rather like the analogous non-hyperbolic fixed point.
As for flows generated by solutions of differential equations, along with the discrete orbits of
iterated mappings come invariant manifolds, tangent to the appropriate stable and unstable sub-
spaces of the linearised maps. Suppose that DP(p) has s ≤ n − 1 eigenvalues of modulus less than
52
P(x1 )
x1
Figure 2.26: The Poincaré map for Equation (2.77). Figure adapted from [130, Fig. 6.3].
one and u = n − 1 − s of modulus greater than one (for simplicity we assume p is hyperbolic).
Then DP(p) has, on Σ, s- and u-dimensional stable and unstable subspaces E s (p), E u (p) respec-
tively. The stable manifold theorem for maps [107] then yields local manifolds Wloc s (p), W u (p),
loc
s u
tangent to E (p), E (p) at p, just as for flows. A solution to the ODE started on Σ in Wloc s (p)
s (p) closer to p at its next intersection with Σ.
will circulate near γ and return to a point of Wloc
All such solutions started in a neighborhood B(p) ⊂ Σ form the (s + 1)-dimensional local stable
s (γ), while solutions started at points of W u (p) form the
manifold of the periodic orbit γ, Wloc loc
s (γ). Fig. 2.27 shows two possible structures
(u + 1)-dimensional local unstable manifold of γ, Wloc
in the three-dimensional case in which s = u = 1. In case (a) the real eigenvalues of DP(p) satisfy
0 < λ1 < 1 < λ2 ; in case (b) λ2 < −1 < λ1 < 0. In the latter case it is easy to see that the
(diagonalised) matrix · ¸
λ1 0
0 λ2
of DP(p), which takes (x1 , x2 ) ∈ Σ to (λ1 x1 , λ2 x2 ), rotates the vector connecting (0, 0) to (x1 , x2 )
by (approximately) π. As solutions circulate in a tubular neighborhood of γ, therefore, they pick
up an odd number of half twists. In case (a) the number of half twists is even. Fig. 2.27 depicts the
simplest cases of 0 and 1 half twists respectively. In the latter case the local stable and unstable
manifolds are Möbius bands. Starting on the right of the periodic orbit, say, and following it on
either manifold for one circuit, one returns on its left.
2.4 An introduction to neural networks as nonlinear ODEs
In this section we present two abstracted examples of neural networks and their realizations as
systems of ODEs. These connectionist or firing rate models [228, 278, 279, 116] ignore the details
of spike dynamics and synaptic transmission, replacing the fine scale temporal dynamics of trans-
membrane voltages by variables describing the activations or firing rates of cells or groups of cells.
They are considerably simpler and more tractable than the Hodgkin-Huxley equations (which we
introduce in §§3.3-3.4).
Inhibitory connections lower a postsynaptic neuron’s activation in response to presynaptic ac-

tivity; this is accomplished in ODEs by negative coefficients multiplying the presynaptic activition
terms in the postsynaptic equation. Similarly, positive coefficients represent excitatory connections.
Stimuli originating in sensory neurons can thereby either increase or decrease firing rates depending
53
Figure 2.27: Periodic orbits in a three-dimensional flow with (a) orientable and (b) non-orientable
stable and unstable manifolds. The stable manifolds are lightly shaded and in (b) part of Wlocs (γ)
is removed for clarity. Figure taken from [130, Fig. 6.4].
on the neuronal response properties, but the effects are typically nonlinear.
In firing rate models, postsynaptic neural responses to inputs from other cells, and to stimuli, are
characterized by simple input/output functions or current-to-frequency transduction curves. Firing
rates are bounded: they cannot drop below zero, and they cannot exceed a maximum frequency,
controlled by the refractory period, and depending on the cell type. Thus, low level stimuli typically
have little effect, while the response saturates at high stimulus levels. Between these limits the cell
is maximally responsive to changes in stimuli. This intuitively justifies the use of sigmoid-like
input/output functions. Such functions can be derived from spiking neuron models by mean field
and kinetic theory methods [216, 243], as described in §6.2 below.
The size and scales of the “compressive effect” vary from system to system, so several functional
descriptions have been developed. In addition to the sigmoid of Eqn. (2.70), the Naka-Rushton
function describes the response S(P ) to a stimulus of strength P via three parameters:
MPN
S(P ) = for P ≥ 0 and S(P ) = 0 for P < 0 . (2.79)
σN + P N
Here S(P ) → M , the maximal response (spike rate) for strong stimuli. The parameter σ sets the
stimulus strength at which S(P ) reaches M/2 and N controls the steepness with which the rise
from 0 to M occurs. In the sigmoid of (2.70) the parameters g (gain) and β (bias, or offset) play
similar roles to N and σ in (2.79) (the maximum value was normalized to 1 in (2.70)). Piecewise
linear functions are also sometimes used [259, 30].
The following ODEs describe a simple two neuron network with the winner-take-all prop-
54
erty [277, §6.7]:
dE1 1
= (−E1 + S(K1 − 3E2 )) , (2.80a)
dt τ
dE2 1
= (−E2 + S(K2 − 3E1 )) : (2.80b)
dt τ
see Fig. 2.28. Here E1 represents the spike rate of neuron 1, which receives external input K1 , and
is inhibited by neuron 2, and E2 represents the spike rate of neuron 2, which neuron 1 inhibits.
Although we say “neuron 1 and neuron 2,” each of these mathematical variables may represent the
average activity over a population of cells of the same type, or a subgroup tuned to a particular
stimulus (as in the Usher-McClelland model of Example 7).
1 2
K1 K2
Figure 2.28: A simple network with winner-take-all dynamics. Neurons or groups of neurons are
represented as circles, and external inputs and synapses to other neurons as connecting lines or
arcs. Excitatory connections (positive inputs) terminate with an arrow and inhibitory connections
(negative inputs) with a filled circle. Self-inhibitory connections are sometimes omitted in such
depictions. The self-excitatory connections are absent in Eqns. (2.80). FIG NEEDS FIXING:
CHANGE dots on self connections to arrows!
For suitable parameter values (2.80) has two sinks whose domains of attraction are separated
by the stable manifold of a saddle point. In particular, for M = 100, σ = 120, N = 2 and
K1 = K2 = 120, they lie at (0, 50) and (50, 0) This is the essence of the winner-take-all idea. If
neuron 1, say, has a higher firing rate, either from the initial conditions or due to a stronger input,
its output drives down that of neuron 2, thus decreasing inhibitory feedback to 1 and allowing 1’s
rate to increase further. This continues until neuron 2’s state is driven to zero. This system, which
is explored further in Exercise 8 below, is a simple model for deciding between two alternatives
based on evidence (external inputs). If the solution goes to the equilibrium with neuron 1 non-zero,
then the system “chooses” alternative 1. If it goes to the equilibrium with neuron 1 driven to zero,
then it chooses 2. Note that the decision is determined both by the inputs (evidence for each choice)
and by initial conditions (e.g., bias towards one alternative). Biases that set the system close to
an equilibrium can require strong evidence to converge on the correct decision! Also, if inputs are
changed after the system makes a decision, it is likely to make the same choice in the next trial.
We take up decision models in chapter 6.
Exercise 8. Plot nullclines for the ODEs (2.80), find all the equilibria and determine their stability
types for the parameter values specified above (you may additionally take τ = 20: how does this time
constant affect stability?). Repeat for K1 = 130, K2 = 110 and for K1 = 150, K2 = 90 with the
other parameters unchanged. In each case plot nullclines and solution trajectories for five different
initial conditions. In the decision-making context, what is significant about the third case?
We now turn to a multi-stage, higher-dimensional system: a neural network describing retinal

light adaptation. See [277, §7.3] for further details and an illustrative experiment that allows you
55
to experience your own time constants. The network is illustated in Fig. 2.29, in which the letters
identify types of neurons, and, in the ODEs (2.81) below, represent spike rates of those cells, or,
more realistically, of populations of the cells.
Light reaching the retina activates cone photoreceptors (C), which excite bipolar cells (B) and
horizontal cells (H). Bipolar cells excite amacrine cells (A) and ganglion cells (G). Ganglion cell
axons conduct signals further into the visual system, which is not modeled here. Horizontal cells
inhibit the cones with subtractive feedback and the bipolar cells with feedforward inhibition. The
amacrine cells excite the interplexiform cells (P), which excite the horizontal cells. Finally, amacrine
cells inhibit the bipolar cells with divisive feedback, as in Example 6 above (albeit with a different
constant premultiplying A in the denominator). As there, in the winner-take-all network above,
and in Example 7, spike rates are assumed to decay exponentially in the absence of inputs. This
model incorporates the “H - C” model of [277, §3.3] and the “A - B” model of Example 6 and
unites them in a more general setting.
H P A
L
C B G
Figure 2.29: The retinal light adaptation circuit modeled in Eqns 2.81. Here C denotes cone
photoreceptors receiving light input (L), B and H denote bipolar and horizontal cells and A and
P are amacrine and interplexiform cells. Ganglion cells (G) provide output to lateral geniculate
nucleus and visual cortex. Excitatory and inhibitory connections are shown as arrows and filled
circles.
This six-unit network is modeled by the following ODEs, where L again represents intensity of
light incident on the retina:
dC 1
= (−C − P H + L), (2.81a)
dt 10
dH 1
= (−H + C), (2.81b)
dt 100µ ¶
dB 1 6C − 5H
= −B + , (2.81c)
dt 10 1 + 9A
dA 1
= (−A + B), (2.81d)
dt 80
dP 1
= (−P + 0.1A), (2.81e)
dt 4000µ ¶
dG 1 50B
= −G + . (2.81f)
dt 10 13 + B
The values for the time constants and other parameters are taken from [277]. The Naka-Rushton
function appears again in the equation for G, with N = 1; this function was originally invented to
describe retinal ganglion cells.
56
We start our analysis by finding equilibria and analyzing their stability. Equilibria satisfy
C = −P H + L, (2.82a)
H = C, (2.82b)
6C − 5H
B= , (2.82c)
1 + 9A
A = B, (2.82d)
P = 0.1A, (2.82e)
50B
G= . (2.82f)
13 + B
We may immediately eliminate the three variables H, A and P in favor of B and C, using H = C,
A = B, and P = 0.1A. Moreover, none of the other variables depends on G (Eqn. (2.81f) decouples),
and G depends only on B, so we can solve for C as a function of B and then form a single equation
which can be solved for B:
10L C
C= , B= ⇒ 9B 3 + 91B 2 + 10B = 10L. (2.83)
10 + B 1 + 9B
Given a light intensity L ≥ 0, this cubic polynomial has a unique positive root in B ≥ 0 (why?),
and solving it (numerically?) we may substitute back into (2.82) to get the equilibrium values for
the other five variables. From (2.83) we note that B ≈ L for small L, while for large L, B ≈ L1/3 :
the cube root nonlinearity effectively compresses the ganglion cell’s output range at high intensities.
Exercise 9. Derive the Jacobian matrix for this equilibrium in terms of L, and show that it is
stable for L = .01, .1, 1, 10, 100, 1000. [Hint: use eig() in MATLAB].
The intermediate stages are critical to light adaptation. If L were input directly to the ganglion
cells G in place of B, then a change from intensity L to L + ∆ would produce an incremetal change
in ganglion firing rate of
50(L + ∆) 50L
δG = − . (2.84)
13 + (L + ∆) 13 + L
Supposing that δG = 1 (in arbitrary units) is the smallest discriminable change that can be recog-
nized, and assuming that ∆ ≪ L, we have
50(L + ∆) 50L
δG = 1 = −
13 + (L + ∆) 13 + L
⇒ (13 + L + ∆)(13 + L) = 50[(13 + L)(L + ∆) − (13 + L + ∆)L]
(13 + L + ∆)(13 + L)
⇒ (13 + L)(L + ∆ − L) − ∆L =
50
(13 + L + ∆)(13 + L) (13 + L) 2
⇒∆= ≈ . (2.85)
50 × 13 50 × 13
Hence, for large L, the smallest discriminable intensity change increases like L2 , making intensity
differences of bright lights hard to distinguish. Sensitivity is drastically improved by adaptation,
which relies on the relatively slow dynamics occuring in the intermediate processing stages of
Fig. 2.29.
Equation (2.81) has three distinct time scales: C, B, and G have time constants of 10 ms, which
is much faster than those of the other variables, A and H have time constants of 80 and 100 ms,
57
respectively, and P has a very slow time constant of 4000 ms. Hence, following a sudden change in
L, the variables C, B, and G rapidly approach what would be a stable equilibrium if H, A and P
were fixed. But they are not, and A and H change on the intermediate timescale, while P begins
its slow change. This leads to the behavior illustrated in Fig. 2.30, where the three timescales are
apparant: a rapid rise in B and G, a slightly slower rise in A, followed by equilibration of these
three variables to the new equilibrium, while P contines to approach 0.1A very slowly.
Exercise 10. Write a MATLAB program simulating the system in Equation 2.81 for a given level
of light L. Incorporate a way to have the light step to a higher intensity during the simulation.
Explore for different starting intensities L what increment ∆ is required so that the spike in G is
of magnitude 1. Allow the system to reach steady state before you increase the intensity. Do NOT
record the new steady state adapted value of G, but measure the height of the transient spike. Find
∆ for L initially equal to 1, 2, 10, 20, 100, 200, 1000, 2000, 10000, and plot ∆ versus L on a
logarithmic scale. Also plot the value of ∆ from Equation (2.85), which does not incorporate the
adaptation mechanism (cf. [277, Fig. 7.8]).
Although the timescale separation is not really sufficient, we can give an idea of how it may be
used to reduce the dimension of the system, a method that will be important in chapter 3. Since
C, B and G equilibrate rapidly, we assume that they instantaneously attain the values specified in
Eqns. (2.81a), (2.81c) and (2.81f) as they track the other variables. Substituting into the remaining
ODEs for H, A and P , we obtain a three-dimensional system that describes behavior on the 100
to 1000 ms time scale. Furthermore, since P changes very slowly, the essential dynamics on the
intermediate 100 ms time scale is governed by the two-dimensional system
dH 1 1
= (−H − P H + L) = (−(1 + P )H + L), (2.86a)
dt 100µ 100 ¶ µ ¶
dA 1 6(L − P H) − 5H 1 6L − (5 + 6P )H
= −A + = −A + . (2.86b)
dt 80 1 + 9A 80 1 + 9A
This system changes as P , acting like a slowly-varying parameter, moves to its new equilibrium.
Here no bifurcations occur, but they can do so in other systems, and we shall use reduction methods
of this type along with bifurcation theory to elucidate the dynamics of bursting in §3.5. In general,
identifying the relevant time scales in a system, modeling at the appropriate time scales for the
phenomena of interest, and justifying the reductive methods, are all major research challenges.
Exercise 11. Find the equilibrium for Eqn. (2.86) as a function of L and P , and show that it is
stable.
Summarizing, cells C, B, and G form a direct excitatory pathway from L, but inhibition of B
from H and A, modulated by P , introduces the adaptive dynamics. Mechanisms such as this allow
sensory systems to be sensitive to small changes over broad ranges of stimuli. For an illustrative
experiment using your own retinas, see [277, Fig. 7.9, p 103].
58
25
B
G
A
20
P
15
10
0
0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000
Time (ms)
25
B
G
A
20
P
15
10
0
4000 4050 4100 4150 4200 4250 4300 4350 4400 4450 4500
Time (ms)
0.38
0.37
0.36
0.35
0.34
0.33
P
0.32
0.31
0.3
0.29
0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000
Time (ms)
Figure 2.30: Solutions of Eqns 2.81 modeling retinal adaptation following an intensity step from
L = 100 to L = 200. Top and middle: A, B, P and G over 10 sec, and over 500 ms immediately
following the stimulus change. Bottom: P over 10 sec with expanded vertical scale.
59
Chapter 3
Models of single cells
Having developed some of our mathematical toolbox, we now return to the main topic of the
course: neurobiological modeling. Unlike the classic early books of Wiener [275] and von Neu-
mann [262] on modeling the nervous system and brain, and the models introduced in §2.4 above,
in this and the following section we shall take a reductive, cellular-level viewpoint. We start with
general observations about models.
3.1 Modeling strategies and scales
There are many approaches to the mathematical modeling of biological phenomena, each with
different advantages and uses. This course explores several of these avenues. The first and most
obvious characteristic of a model is the level or scale at which it operates (cf. [277, §1.2]). In
ecology the focus is often on populations and little detail is accorded to individual organisms beyond
attributes such as infected or susceptible, male or female, age or developmental stage, etc. While
there is considerable interest in communities of individuals engaged in collective decision-making
(in neuro-economics, for example), neurobiological models usually start at the level of individuals,
with full nervous system models, which identify the major nerves connecting parts of the body and
various organs to the central nervous system, as described in §1.4.
The next level of detail focuses on the brain, modeling interactions among regions or brain
areas within it. Zooming in further, one can subdivide each area into smaller groups of neurons,
for example: those tuned to respond to particular elements of stimuli such as horizontal bars,
and subgroups of those tuned for vertical bars (Fig. 3.1). At these levels there is no concern
for the membrane potential of a single neuron or its morphology; the scale is still too large to
be concerned with details that do not (obviously) influence the behavior. Continuing to smaller
scales, we progress to models of several neurons, at which point spike rates, membrane voltages,
and phases of individual cells are of interest. But even these models do not address the anatomical
or physiological detail of single cells.
At the next level of detail, where the Hodgkin-Huxley model resides [277, Chap. 9], the neuron is
60
Figure 3.1: Phenomena in the brain exist at many different spatial and temporal scales.
61
viewed as a single compartment containing different ions that drive transmembrane currents. Here
ionic currents and membrane potentials characterize the model, but although it is far more detailed
than those at larger scale, it is still imprecise and crude in many ways. Variations in properties
along axons and dendritic processes of cells that make them functions of space as well as time are
absent, and the morphology of the neuron is ignored. At the next level details of dendritic trees,
axons and synapse locations can be added, but even these ignore the biochemical and molecular
machinery that governs each ion channel and determines neurotransmitter release and transport
across the synaptic cleft.
Thus far we have been discussing spatial scale, from body size to molecules, but temporal
scales play equally important roles. Spatial and temporal scales usually go hand in hand, and as
illustrated in Figure 3.2 and explored in more depth later, the range of temporal scales in the brain
actually facilitates mathematical modeling and analysis, since it allows one to separate phemonema
that take place at different rates and treat them quasi-independently, approximating fast ones as
instantaneously equilibrated and slow ones as fixed. The Hodgkin-Huxley equations, treated in this
chapter, describe phenomena that last for milliseconds, and synapses work at a similar scale. The
time scale increases to hundreds of milliseconds to seconds when considering behavioral respones
to stimuli. Decision making takes place over fractions of a second to minutes. Learning includes
short- and long-term effects, with time scales ranging from minutes through hours and days to
years. Developmental neuroscience studies the formation of the embryonic nervous system, and its
relevant time scale can be months to years, depending on the species.
At any given level of detail, there are several types of models that can be developed. As noted
in §1.1, mechanistic models aim to accurately capture the underlying physiology, and thereby
make predictions about physiological behavior which can be detected experimentally. Models of
this type, while seemingly the best, are often 1) complicated (nonlinear and high-dimensional),
thereby limiting useful analysis, 2) extensive, limiting large scale simulation of linked models, 3)
characterized by many parameters which are difficult to measure or estimate, and 4) therefore
rare in practice. Other models simplify the physiology, often only focussing on one feature; they
consequently have reduced dimensionality compared to more complete models, and are therefore
often less useful for prediction and explanation. Finally, at the far end of the spectrum, are
empirical or phenomenological models that attempt to fit experimental data, without reference to
underlying physiology. Typical models rarely fall cleanly into one of these three classes; they inhabit
a continuum, with most somewhere near the middle. As these notes proceed, we will develop and
analyse a representative set of them.
Fortunately, for our first major exercise, the Hodgkin-Huxley (H-H) model is excellent in several
aspects. It is a cellular-level model which was established by careful physiological experiments
(although it is crude when viewed from the molecular scale). It and its modifications and exten-
sions have successfully described a wide variety of observations. For instance, years after their
development in the early 1950’s, it was noticed that the H-H equations predicted a phenomenon
known as hysteresis, which had never been seen experimentally. In 1980 hysteresis was verified in
experiments of Guttman, Lewis and Rinzel [109], cf. [277, p. 142]. This example demonstrates the
power of a physiologically-inspired model.
Other models that we will see in this section are simplifications of H-H, which retain physiological
inspiration but are less ccurate, less detailed and more amenable to analysis. The first is a two-
62
Figure 3.2: The range of time scales is addressed by a range of mathematical models. Part I refers
to chapter 6 and part II to chapters 3 and 4.
63
dimensional reduction due to Rinzel [220]. The second, the the FitzHugh-Nagumo (F-N) model [79,
193], predates Rinzel’s and was not directly inspired by physiology, but by examining H-H solutions
and creating a more tractable two dimensional ODE with similar dynamics. While not directly
modeling physiology, it is still very useful, since it identifies key qualitative properties that produce
the observed action potential (AP) or spike and the refractory behavior that follows it: properties
that other more detailed models share.
If one is interested in, e.g., the accumulation of spiking activity in an oculo-motor area preceding
a saccade in response to a visual stimulus (see chapter 6), one would prefer to model a single variable
– the short-term average spike rate in that region – rather than the details of membrane potentials
and ionic transport in every neuron that determine it. Thus, a major challenge is extracting a
closed model at the appropriate scale to describe the phenomenon of interest. Hodgkin and Huxley
achieved this at the cellular scale by modeling ion channels phenomenologically, fitting functions
that describe the voltage-dependence without reference to the (then unknown) molecular detail
involved in gating the channels.
In the remainder of this section we describe the H-H model, and then consider simpler models
for which mathematical analyses can go further. We go beyond APs in single cells to examine
propagation of APs, bursting effects and communication between neurons at synapses. In the
following chapter 4) we describe synaptic connections and interactions among multiple neurons in
small neural circuits that produce repetitive rhythms: central pattern generators (CPGs). However,
we must begin at a more fundamental level and smaller scale than a single neuron, which turns out
not to be so simple.
More extensive accounts of parts of the material in this and the next section can be found in
[146, 150, 75].
3.2 Ion channels and membranes
All animal cells are enclosed by a membrane consisting primarily of a lipid bilayer about 7.5 nm
thick. Its primary purpose is to separate the inside of the cell from what is outside, but it also
allows passage to various molecules, through many pores and channels. The membrane itself apart
from the pores and channels prevents the unregulated passage of water, and of sodium, potassium,
and chloride ions, and therefore resists the easy flow of electrical current. The membrane exhibits
electrical resistance and capacitance, each of which are modified by the states of the channels. Dif-
ferent channels allow the movement of different ions; hence we refer to sodium channels, potassium
channels, etc.
Ionic transport is important for many cell processes, but is essential for neural phenomena. Ions
and molecules can be moved by both active and passive processes. Water crosses the membrane
passively through osmosis, which is controlled via ion concentrations, allowing the cell to regulate its
volume. Ions also cross the membrane passively via diffusion through pores. Lipid-soluble molecules
such as carbon dioxide diffuse through the lipid bilayer itself. Differences in ion concentration drive
water osmotically, and create an electrical potential across the membrane, known as the membrane
potential. Finally, ionic concentration differences tend to decrease due to diffusion, creating the
64
need for active processes to balance the passive ones.
Active processes include pumps that exchange sodium in the cell for potassium and pumps that
remove calcium from the cell. For example, after cell death when active process of calcium Ca2+
removal stops, the Ca2+ concentration in the cell rises much higher than in a living cell, keeping
the cell in the constant tension of rigor mortis [150, Chaps. 5 and 15]. One pump exchanges three
sodium N a+ ions for two potassium K + ions, maintaining the intracellular K + concentration much
higher than in the extracellular space, and the extracellular N a+ concentration higher than in the
cell. The pump works against concentration gradients, and so requires energy (in the form of ATP)
to operate.
Since diffusion plays such an important role in determining intracellular ionic concentrations,
we briefly examine a mathematical model of it [277, Chap. 15]. This discussion will also introduce
another kind of mathematical model: a partial differential equation (PDE). Given a region of space
Ω, we denote by c = c(x, t) the concentration of the ion of interest as a function of space and time
over Ω. Letting q be the production of c per unit volume defined over Ω and J the vector flux of c
defined along the boundary of Ω, with n the unit normal to the boundary, we obtain the following
conservation law:
∂
Z Z Z
c dV = q dV − J · n dA. (3.1)
∂t Ω Ω ∂Ω
(In words: the rate of change of c in Ω = production − loss through boundary: this is an example
of a conservation law.) By the divergence theorem, we have
Z Z
J · n dA = ∇ · J dV, (3.2)
∂Ω Ω
so that for a fixed region Ω (3.1) becomes

∂c
Z Z
dV = (q − ∇ · J) dV. (3.3)
Ω ∂t Ω
This integral conservation law holds for any fixed region Ω, and since the region is arbitrary, the
integrand must be identically zero. We therefore obtain a partial differential equation (PDE)
describing the rate of change of c:
∂c
= q − ∇ · J. (3.4)
∂t
To close this equation, we need an expression for the ion flux J, in terms of c. Fick’s law, which
is not a natural law, but a good approximation under many circumstances (similar to Ohm’s Law),
states that flux is proportional to the negative of the concentration gradient. This matches our
intuitive understanding that ions tend to flow from regions of high concentration to those of low
concentration. Substituting Fick’s law with diffusion coefficient D into (3.4), we obtain
∂c
= ∇ · (D∇c) + q. (3.5)
∂t
Specializing to a single space dimension (x = x), (3.5) becomes

µ ¶
∂c ∂ ∂c
= D + q, (3.6)
∂t ∂x ∂x
65
∂ c 2
and if D is constant, the right hand side may be written D ∂x 2 : the classical “heat equation” first
proposed by Fourier to describe heat conduction. Note that in steady state, with no local production
(eg: release of bound ions such as Ca2+ from the endoplasmic reticulum), the concentration profile
is linear. This is easily checked.
Random, passive diffusion and active pumps are not the only processes affecting ion concentra-
tions. Positive ions tend to move from high to low electrical potentials. Let z be the valence of
z
the ion, so the quantity |z| = ±1 denotes the sign of the charge on the ion. As above, c is the
concentration of the ion, and u describes its mobility: a different constant for each ionic species.
Finally, φ denotes the electrical potential, and recall from physics that a potential gradient is the
same as an electrical field. Planck’s equation describes the ion flow resulting from such a potential
gradient [150, §2.6]:
z
J = −u c∇φ. (3.7)
|z|
There is thus a flux due to concentration gradients and a flux due to potential gradients. Each is
governed by a constant specific to an ion. Einstein realized that the constants for the same ion
were related, and developed a relationship between the diffusion constant D in Fick’s law, and the
ionic mobility u in Planck’s equation [150, p. 83]. If F is Faraday’s constant, T is the absolute
temperature, and R is the universal gas constant, this reads:
uRT
D= . (3.8)
|z|F
Combining the concentration-driven diffusion (3.5) with the Planck’s equation flux (3.7) using
Einstein’s relationship (3.8) yields the Nernst-Planck equation
µ ¶
zF
J = −D ∇c + c∇φ . (3.9)
RT
From this equation, which relates ion flux, concentration difference, and potential difference,
we can now derive the Nernst potential : the transmembrane potential difference consistent with
zero flux for a given concentration difference. We model the membrane as one-dimensional without
spatial variations parallel to the boundary: Fig. 3.3. Partial derivatives in (3.9) become one-
dimensional spatial derivatives, gradients also become spatial derivatives, as in (3.6), and setting
the flux J = 0 allows one to solve for the potential difference:
µ ¶ µ ¶
dc zF dφ 1 dc zF dφ
−D + c =0 ⇒ + = 0. (3.10)
dx RT dx c dx RT dx
Integrating (3.10) across the membrane length, with ce and ci denoting the external and internal
concentrations and φe and φi the potentials, we obtain
zF
ln(c) |ccei = (φi − φe ). (3.11)
RT
This equation relates concentration differences to potential differences for the zero flux condition,
which is the condition for the Nernst equation. Recognizing the difference V = φi − φe as the
membrane potential and exponentiating (3.11) yields the Nernst equation:
µ ¶
RT ce
V = ln : the Nernst potential. (3.12)
zF ci
66
Intracellular Extracellular
[K+ ]in [K+ ]out
−
−
[Cl ]in [Cl ]out
[Na +]in [Na +]out
i e
Cell
membrane
V= i
− e
Figure 3.3: Schematic of a 1-d cell membrane showing ion concentrations and membrane potential.
However, even if one ionic species is in equilibrium according to (3.12), others will typically not be,
thus creating nonzero fluxes and currents that can change the potential. The Nernst equation (3.12)
is derived from first principles of thermodynamic equilibrium and does not depend on our model,
but to obtain the potential for zero net electrical current, we must model the current-voltage
relationship for the cell membrane. We have no first-principles argument to derive the current, and
indeed this I − V relationship differs for different cells, so here we describe two simple models that
cover a wide range of cases.
The Goldman-Hodgkin-Katz (GHK) current equation is obtained from the Nernst-Planck equa-
tion [150, §2.6.3] as follows. Assuming a constant electric field across a cell membrane of thickness L
and potential V , the field within the membrane is E = −dφ/dx = −V /L. This is then substituted
into (3.9) to obtain
dc zF V J
− c+ = 0, (3.13)
dx RT L D
which we can solve for c(x). Note that we may take J(x) = J as constant, since the current
does not change across the membrane, for there is no local accumulation of charge. The boundary
conditions are that c(0) = ci and c(L) = ce , and using the former we get:
µ ¶ · µ ¶ ¸
−zV F x JRT L −zV F x
exp c(x) = exp − 1 + ci . (3.14)
RT L DzV F RT L
So far we have only used the intracellular boundary condition c(0) = ci but we may now choose
the transmembrane current J such that c(L) = ce . We also rename D/L as the parameter PS :
the permeability of the membrane to ion S, and setting IS = zF J we replace J by IS : the GHK
current for ion S. Here J is the flux in moles per unit area per unit time and zF is the charge
carried per mole of ion S. Thus IS is the charge per unit area per unit time, or the current per unit
area:
z 2 F 2 ci − ce exp −zF V
¡ ¢
RT
IS = PS V ¢ . (3.15)
1 − exp −zF V
¡
RT RT
Note that J = 0 and hence the GHK current (3.15) is also zero at
µ ¶
RT ce
V = ln , (3.16)
zF ci
which is the Nernst potential, so the GHK ionic current model is consistent with our work from
first principles.
67
Exercise 12. Solve Equation 3.14 for c(x), with the given boundary conditions, and verify that
using Equation (3.15) in Equation (3.14) results in c(L) = ce .
We now wish to determine the membrane potential at which the net ionic current is zero: the
resting potential. Obviously, for given intra- and extracellular concentrations, each ion has a Nernst
potential given by (3.12), but how do we get the overall membrane resting potential? The GHK
current equation (3.15) allows us to add the currents and obtain the requisite balance. With some
ions with valence z = 1 and others with valence z = −1, we get zero total current if
X cj − cje exp −V F j j
¡ ¢ ¡V F ¢
i
X c i − c e exp
0= Pj ¡ −VRT
F
¢ + Pj ¡ V FRT¢ , (3.17)
z=1
1 − exp RT z=−1
1 − exp RT
which allows us to solve for the GHK potential V:

j j
ÃP P !
RT z=−1 Pj ci + z=1 Pj ce
V = ln P j j
. (3.18)
F
P
z=−1 P j c e + z=1 Pj c i
We can now express the resting potential of a neuron in terms of concentrations and permeabilities.
Remember that the GHK derivation requires the constant electric field assumption, whose validity
depends on the type of cell, the concentration levels, and other factors.
Exercise 13. Given that RT /F ≈ 25.8, and that for the squid giant axon the intracellular and
extracellular concentrations of the ions are [K + ]in = 397, [K + ]out = 20, [N a+ ]in = 50, [N a+ ]out =
437, [Cl− ]in = 40, and [Cl− ]out = 556, and that the permeability ratios are PK : PN a : PCl = 1 :
0.03 : 0.1 at rest, solve for the individual Nernst potential (3.16) of each ionic species, and also,
from (3.18), for the resting potential. [Note that only permeability ratios are needed for this, but
permeabilities change with the voltage, so we are using their values at resting potential to determine
the voltage at rest.]
Note that the resting potential determined by (3.18) will not, in general, coincide with any of
the Nernst potentials for individual ions given by (3.16). Thus, one would expect passive ionic flow
to occur, thus changing the concentration ratios. This is where the active ionic pumps, referred to
above, enter: they balance individual passive ionic flows, and so can maintain concentration ratios
at equilibrium.
We now further explore the relationships between membrane currents and potentials and electri-
cal circuits, and introduce a second and much simpler model for ionic currents. Assuming that the
membrane has constant capacitance per unit area, the cross-membrane ion current is proportional
to the rate of change of membrane potential, with the capacitance the constant of proportionality.
There is also a negative sign, since a positive current (outward), results in reduced potential inside
the cell:
dV
Cm = −Iion . (3.19)
dt
Equation (3.19) can be understood from Kirchhoff’s loop law applied to the circuit of Fig. 3.4.
The ionic current depends on the membrane conductance for that ion, as well as the Nernst and
membrane potentials. Each ion has its own Nernst potential, and since different channels are
68
Extracellular
I ion Cm C mdV/dt
Intracellular
Figure 3.4: The cell membrane as an electrical circuit, with ionic and capacitative currents labeled
as in Equation 3.19.
permeable to specific ions, each ion has a different membrane conductance, which usually depends
on the membrane potential. Various conductance forms are used in the H-H model.
Our second model for ionic currents assumes a linear I − V relationship. Denoting by V the
membrane potential, gS the specific ion’s membrane conductance, and VS its Nernst potential, the
following simple equation replaces (3.15) for the ionic current:
IS = gS (V − VS ). (3.20)
Constant conductance gS implies a linear or ohmic relationship between potential difference from
Nernst potential and current ( Eqn. (3.20) is Ohm’s law). However, cells usually exhibit voltage-
dependent membrane conductances, and the voltage-current relationship (3.20) becomes nonlinear,
as it will in the H-H equations. Recall that what is relevant here is the difference (V − VS ) from
equilibrium Nernst potential, not the potential difference from zero, and that this equation holds
regardless of whether the ion is positively or negatively charged. This is because if (V − VS ) < 0,
positive ions will flow in and negative ions will flow out, both of which are negative currents.
Similarly, for (V − VS ) > 0, positive ions flow out, and negative ions flow in, both of which are
positive currents.
We can use the ohmic conductance model to solve for the resting potential of the squid gi-
ant axon. With VS donoting the Nernst potential of ion S, and including the three species
K + , N a+ , Cl− , we have:
dV
Cm = −gK (V − VK ) − gN a (V − VN a ) − gCl (V − VCl ), (3.21)
dt
which yields the resting potential (for zero applied current) upon setting the right hand side equal
to zero. Note that the sign of each current changes when V passes through VS ; hence the Nernst
potential VS is sometimes called the reversal potential for the ion S.
Exercise 14. Using the Nernst potentials calculated in Exercise 13, and assuming the same ratios
of conductances as permeabilities, solve (3.21) for the resting potential V and compare to the result
of Exercise 13.
Substantially more information on these and related topics can be found in [150, Chaps. 4-5]
and [146, Chaps. 2-3]. The latter reference covers Fick’s law, with derivations of the Einstein,
69
Nernst-Planck, and Nernst equations. In particular, Section 2.2.4 has a good explanation of space-
charge neutrality. The GHK equations are derived, and there are examples of how to use the
concepts, including a worked-out resting potential exercise on p. 30. Chapter 3 has equivalent
circuit descriptions.
3.3 The Hodgkin-Huxley (H-H) equations
The Hodgkin-Huxley (H-H) model of the squid giant axon [126] is a triumph of mathematical
modeling. Their formulation of an ODE model that reproduces the action potential, and extension
to a partial differential equation (PDE) to explain its propagation along the axon, marks the
beginning of mathematical neuroscience. The model was built on more than 15 years of beautiful
and painstaking experimental work, interrupted by World War II, that culminated in a remarkable
series of papers [127, 128, 124, 123, 125, 126]. This gained Hodgkin and Huxley a Nobel prize in
1963, along with J.C. Eccles (for his work on synapses and discovery of excitatory and inhibitory
post-synaptic potentials: see §4.1). Here we give only the essence of the final mathematical model
from [126]; for an introduction with some historical notes, see [150, §5.1]. See [122] for a personal
story that also has general relevance for scientific research.
Exercise 15. A possible journal club team project: Read and present the experimental H-H(-K)
papers [127, 128, 124, 123, 125].
Figure 3.5: The equivalent circuit for the giant axon of squid, reproduced from [126, Fig. 1].
The leak conductance gL is assumed constant, but the sodium and potassium conductances vary,
indicated by the arrow crossing the resistor symbol. Batteries in series with each resistor represent
reversal potentials. The external current I flows inward in the convention of [126], in contrast to
our convention of outward flow (Fig. 3.4).
The H-H model comprises four coupled nonlinear first-order differential equations, listed below,
70
and followed by notes on their derivation, explanations and analysis:
dv
Cm = −ḡK n4 (v − vK ) − ḡN a m3 h(v − vN a ) − gL (v − vL ) + Iapp , (3.22a)
dt
dn
= αn (v)(1 − n) − βn (v)n, (3.22b)
dt
dm
= αm (v)(1 − m) − βm (v)m, (3.22c)
dt
dh
= αh (v)(1 − h) − βh (v)h. (3.22d)
dt
Equation (3.22a) expresses the change in membrane potential in terms of four ionic currents, the
potassium current, the sodium current, a leakage ion current, and an external applied current,
according to Kirchhoff’s law applied to the equivalent circuit of Fig. 3.5. The simple circuit of
Fig. 3.4 lumps all the currents; here they are distinguished by the resistors in parallel. Bars have
been added to the leading terms in the sodium and potassium conductances to indicate that they
now denote constant parameter values that multiply time-dependent functions n(t), m(t) and h(t)
to form the “dynamical” conductances gK = ḡK n4 and gN a = ḡN a m3 h of Eqn. (3.22a). The leak
conductance gL = gCl , primarily due to chloride ions, remains constant. The external current Iapp
may derive from synaptic inputs, electrical (gap junction) contacts with other cells, or from an
intracellular electrode (it is sometimes called Iext in that case). Note that in writing Eqns. 3.22
and the expressions (3.23-3.25) below we have adopted the usual convention of increasing voltages
leading to an action potential. In [126], the signs of v and the currents are reversed.
Voltage dependence in the potassium and sodium conductances is accounted for the channel
gating variables n, m and h, which model the opening and closing of the ion channels. These
variables evolve under Eqns. (3.22b-3.22d), whose coefficients αn , . . . βn depend on voltage as shown
in Eqns. (3.23-3.25), so that the entire set of equations is coupled. Voltage dependences are also
characterized by their reversal potentials vK , vN a , vL : as the name suggests, the directions of the
currents change as the membrane potential crosses these values. The leakage current gL (v − vL )
combines the effect of all other ions, the most important of which is chloride, and its membrane
conductance gL is assumed to remain constant over the relevant voltage range.
The forms of the α and β functions are:

µ ¶
10 − v −v
αn = 0.01 ¡ 10−v ¢ , βn = 0.125 exp , (3.23)
exp 10 − 1 80
µ ¶
25 − v −v
αm = 0.1 ¡ 25−v ¢ , βm = 4 exp , (3.24)
exp 10 − 1 18
µ ¶
−v 1
αh = 0.07 exp , βh = ¡ 30−v ¢ . (3.25)
20 exp 10 + 1
Sometimes the gating equations are rewritten in the form
dn αn (v) 1
τn (v) = n∞ (v) − n, where n∞ (v) = and τn (v) = (3.26)
dt αn (v) + βn (v) αn (v) + βn (v)
and similarly for m and h, to emphasize the equilibrium potential, n∞ (v), for the gating variable
and the time scale, τn (v).
71
A Note on units: Units are not always used in a consistent manner. So that numerical values
remain reasonable, however, the following choices are often made. Membrane voltage: milliVolts
(mV); currents: microAmps (µA); capacitance: microFarads (µF); conductance: milliSiemens =
10−6 × 1/milliOhm (mS). The units of Eqn. (3.22a) are then µA for the cell as a whole. Cell size
can be removed by expressing membrane capacitance and ion channel conductances as µF/cm2 and
mS/cm2 , giving current densities per unit area in µA/cm2 . The gating variables m, n and h are
dimensionless and take values in the unit interval [0, 1]. The usual time scale is milliseconds (ms).
In (3.22c-3.22d), each of the α’s and β’s is a function of voltage empirically fit by Hodgkin and
Huxley to voltage clamp data (see below). The resulting six functions have units of ms−1 and
depend on voltages in mV measured with respect to the resting potential of the cell. The constants
are gN a = 120, gK = 36, and gL = 0.3. vN a = 115, vK = −12, and vL = 10.6. Note that the resting
potential is very close to the potassium Nernst potential vK , compared to the sodium potential
vN a . This is because the equilibrium conductance is higher for potassium than for sodium, and it
can drive the resting potential closer to the potassium Nernst potential.
As noted above, the H-H model was developed through ingenious experimental work by Hodgkin,
Huxley, Katz, and others. They used a voltage clamp technique that built on earlier work by Cole
and Marmont. A metallic conductor was threaded inside the axon along its full length, thus
inducing a uniform membrane potential so that voltage is independent of space. Voltage can then
be stepped from one value to another and held constant, and the current Iapp (t) required to maintain
it recorded as a function of time. Since there is no accumulation of charge in the axon, the applied
current exactly balances the transmembrane ionic currents. Combined with manipulations of the
extracellular solution (replacing sodium with inert choline, for example), this device permitted
characterizations of the individual ionic currents, leading to the gating functions of (3.24-3.23) and
the polynomial fits n4 and m3 h for conductance dependence in (3.22a). Keener and Sneyd [150,
§5.1] provide a good summary, which we now sketch.
In their experiments, Hodgkin and Huxley noticed that during a spike an initial inward current
was followed by outward current flow. They hypothesized that the inward current was due to
sodium ions, since open sodium channels would allow the higher extracellular concentrations to flow
inward. The outward current was assumed to be due to potassium ions, given the higher potassium
concentrations inside the cell. (Recall that a hyperpolarizing current makes the membrane potential
more negative, and a depolarizing current makes it more positive.) Separating the potassium current
as described above, its conductance was seen from experimental data to exhibit a sigmoidal increase
and an exponential decrease. Hodgkin and Huxley modeled this in terms of the fourth power of a
new variable, n(t): the potassium activation. The fourth power was not physiologically motivated,
but was the lowest exponent that provided an acceptable fit to the data (a biophysical interpretation
of this exponent will be given later). A mathematical examination of the response of n to a step
increase and a step decrease in v shows that n(t) raised to the fourth power reproduces the sigmoid-
like increase from n = 0, and the exponential decrease from n > 0, as seen in experiments (cf. [126,
Figs. 2 and 3]).
Exercise 16. Solve the ODE (3.22b) (or (3.26)) for n(t) with an applied voltage step of magnitude
Va .
Exercise 17. Numerically integrate the ODE (3.22b) (or (3.26)) in response to a voltage step from
0, which had been a steady state, to 60 mV for 10 ms, followed by a voltage step back to 0 and let
72
run until n settles to a steady state. Also, plot the exact solution from Exercise 16 for these same
parameters; you will need piecewise solutions, since Va will change at 10 ms.
The spike initiation dynamics suggested that sodium conductance is more complicated, involving
two processes, one of which rapidly activates the sodium current, while the other turns it off more
slowly [126, Fig. 6]). This led to proposal of sodium activation and deactivation variables m(t)
and h(t). Acceptable fits were again obtained by a power law: m3 h. This elegant solution models
the conductance by two underlying processes. Activation m(t) starts small (near 0) and increases
quickly, causing the initial inward sodium current, while inactivation h(t) starts near 1 and slowly
decreases, eventually shutting off the sodium current. The α and β functions are again chosen
to match experimental data. Here, there is no obvious physiological explanation, but the goal
of modeling is to use empirical data fits as little as possible, but to fill in the areas where the
physiology is not understood. This allows the physiological model to have the correct interactions
(matching data) with the parts of the system that are not physiologically modeled.
Fig. 3.6 shows the time course of a typical action potential simulated by numerical solution of
the Hodgkin-Huxley equations (3.22), and Fig. 3.7 shows the time course of the gating variables
over the same time period. Note the four phases of the action potential. First, there is a sharp
increase in membrane potential called the upstroke as the sodium conductance increases quickly
and drives the potential up towards the sodium Nernst potential. This excited or depolarized
state does not last long. At these higher voltages, h decreases, lowering the sodium conductance,
while n increases, increasing the potassium conductance and driving the voltage down towards the
potassium potential. During the ensuing refractory period, no more action potentials are possible;
m recovers quickly to its resting value, but n stays high and h remains low for a time, since their
equations have longer time constants, thus holding the potential low and preventing it from spiking
again. When n and h return to values that will allow another spike, the neuron is said to be in its
recovery phase.
In interpreting the auxiliary variables m, n and h in terms of channels in the cell membrane that
allow the ions to pass, we appeal to the statistics of large numbers. A given channel is open or closed,
depending on the states of gates within it. Specifically, each gate within a channel is stochastically
open or closed with a probability that depends on the voltage and the equation governing its gating
variable. Thus, n determines the probability that one of four gates in a potassium channel is open,
so that the percentage of open channels is proportional to n4 . Over a large number of ion channels,
the probability that a given one is open is very close to the percentage that are actually open. The
sodium channel is modeled as having three m gates and one h gate, all of which must be open for
the channel to be open. Then the percentage of open gates is m3 h.
The numerical simulations of Figs. 3.6-3.7 give some information on the H-H equations, but
it would be nice to have a more comprehensive picture. Such an analysis is necessarily more
qualitative, because we are not interested in the numerical results in a single case, but rather in the
general solution patterns for a range of parameters and initial conditions. In the next section we
sketch two-dimensional simplifications of H-H that reveal the mathematical mechanisms for spiking
more clearly.
In addition to the H-H papers cited at the beginning of this section, good supplements and
sources for the material include Keener and Sneyd [150, Chaps. 4-6] and Johnston and Wu [146].
73
120
100
80
membrane potential (mV)
60
40
20
−20
0 2 4 6 8 10 12 14 16 18 20
time (ms)
Figure 3.6: The time course of membrane voltage during an action potential and the subsequent
refractory period. Note that the voltage scale has been shifted so that the resting potential is at 0
mV.
0.9
0.8
m (solid), h (dashed), n (dotted)
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0 2 4 6 8 10 12 14 16 18 20
time (ms)
Figure 3.7: Gating variable evolutions during an action potential and the refractory period: m solid,
n dash-dotted and h dashed. Note the differing timescales and the approximate anticorrelation of
n(t) and h(t).
74
The latter has general information on membrane conductance models in Chapter 5, concluding with
a gate model leading in to Chapter 6 on the H-H equations. Chapter 7 contains a detailed discussion
of different potassium, sodium, calcium, and other ionic currents, including high-threshold calcium,
low-threshold calcium, calcium-gated potassium, etc... Chapters 8-10 provide good coverage of
stochastic models of ion channels. Chapter 8 is an introduction to the molecular structure of ion
channels, Chapter 9 is an introduction to basic probability, and Chapter 10 uses the Chapman-
Kolmogorov equation to analyze transition schemes for channel gating. These topics are very
interesting but go considerably beyond the detail needed for this course. A final project could
certainly be done in this area.
3.4 Two-dimensional simplifications of the H-H equations
In this section, we introduce two simplifications of the Hodgkin-Huxley equations, a two-

dimensional reduction of H-H, and the FitzHugh-Nagumo (F-N) equations. See [277, Chaps 8-9].
Although the FitzHugh-Nagumo model predated the H-H reduction [78, 79, 193], we start with the
latter. It was first proposed by Krinsky and Kokoz [161] and subsequently and independently, by
Rinzel [220]. We follow Rinzel’s description.
Exercise 18. Plot projections of solutions of the H-H equations (3.22) in a repetitively spiking
mode with input current I = 15, onto planes defined by the following pairs of state variables:
(n, h), (m, n), and V, n. Explore the parameter space a little. See Fig. 3.8 for an example.
1 1 1
0.8 0.8 0.8
0.6 0.6 0.6

h
0.4 0.4 0.4
0.2 0.2 0.2
0 0 0
0 0.5 1 0 0.5 1 0 50 100
n m V
Figure 3.8: Phase plane projections for Eqns. (3.22). (Left) h plotted versus n: note that h ≈ 0.8−n.
(Center) n plotted versus m: note that n changes little during the rapid change in m, and that
m is approximately constant during the slow change in n. (Right) n plotted versus V : the two
variables retained in Rinzel’s reduction.
In examining the behavior of the four H-H state variables, Rinzel noted that m(t) changes
relatively rapidly because its timescale τm = 1/(αm + βm ) is small relative to τn and τh in the
relevant voltage range. See Eqns. (3.24-3.26) and Figs. 3.7 and 3.8. He therefore ignored transients
in m and assumed that it is always approximately equilibrated so that ṁ ≈ 0, implying that
αm (v)
m(t) ≈ m∞ (v) = , (3.27)
αm (v) + βm (v)
from (3.24) (cf. Eqn. (3.26)). Following Fitzhugh [79], he further noted that the variables n(t)
and h(t) are approximately anti-correlated in that, throughout the action potential and recovery
75
phase, they remain close to a line of slope −1: h = a − n (see Figs. 3.7 and 3.8). This allowed him
to eliminate m and h as state variables, dropping Eqns. (3.22c) (3.22d) and replacing m and h in
(3.22a) by the function m∞ (v) and by h = a − n. (Wilson [277, §9.1] gives the value a = 1, but
for the “classical” parameters of the Hodgkin-Huxley paper [126], a = 0.8 is more appropriate: see
Fig. 3.8.) He thereby reduced the H-H system (3.22) to the two variables, v and n:
dv
Cm = −ḡK n4 (v − vK ) − gN a m3∞ (v)(1 − n)(v − vN a ) − gL (v − vL ) + Iapp , (3.28a)
dt
dn
= αn (v) (1 − n) − βn (v) n. (3.28b)
dt
This reduction to a planar system can be made rigorous by use of geometric singular perturbation
methods [147].
1 1
0.9 0.9
0.8 0.8
0.7 0.7
0.6 0.6
0.5 0.5
n
0.4 0.4 limit cycle
0.3 0.3
n’=0
n’=0 V’=0
0.2 0.2 V’=0
0.1 0.1
0 0
0 20 40 60 80 100 120 0 20 40 60 80 100 120
V V
Figure 3.9: Phase planes of the reduced H-H equations (3.28), showing nullclines (bold) for Iapp = 0
(left) and Iapp = 15 (right). Diamonds are at the end of flows and thereby indicate the direction of
the vector field. Approximately horizontal components correspond to fast flows and solutions move
slowly near v̇ = 0: the slow manifold.
We can now use nullclines and other techniques described in §2.3 to study the two-dimensional
phase portrait of (3.28). However, while the ṅ = 0 nullcline
αn (v)
n= (3.29)
αn (v) + βn (v)
can be written explicitly as a function of n in terms of V , the v̇ = 0 nullcline, given by setting the
right-hand side of (3.28a) equal to zero, demands solution of a quartic polynomial in n. This can
be done numerically to yield the phase portrait of Fig. 3.9. The lefthand plot is for Iapp = 0 and
features an attracting fixed point near v = 0 and two other fixed points (a saddle and a source).
The righthand plot, for Iapp = 15, shows a limit cycle which corresponds to periodic spiking. To
illustrate the rich dynamics that a planar system with nonlinear nullclines can exhibit, we have
chosen parameter values for which Eqn. (3.28) has three fixed points; for others, it has only one,
as do the original HH equations [220].
76
120 1
0.9
100
0.8
80 0.7
0.6
60
0.5
n
40
0.4
20 0.3
0.2
0
0.1
−20 0
0 10 20 30 40 50 0 20 40 60 80 100 120
t V
Figure 3.10: (Left) Plot of voltage V versus time t for (3.28) with zero applied current (flat, blue)
and for Iapp = 15 (spikes, green): spike shape is similar to that of full H-H system, Fig. 3.6. (Right)
With Iapp = 0 the (V, n) phase plane contains a stable fixed point at V = 0, marked *; with
Iapp = 15, the state leaves that point and approaches the limit cycle.
Fig. 3.10(left) shows the voltage profile versus time for (3.28): note that the spikes are qual-
itatively similar to those of the full H-H system (3.22) (Fig. 3.6). The right panel of Fig. 3.10
again shows the (v, n)-phase plane. For applied current Iapp = 0, solutions settle to the stable fixed
point at V = 0, but for Iapp = 15 they approach a stable limit cycle, representing periodic spiking
behavior. The horizontal segments of the limit cycle correspond to fast flows in the phase plane
seen in Fig. 3.9. This figure also illustrates the threshold potential vth , at the local minimum of the
v̇ = 0 nullcline. When the fixed point lies to the left of this, as it does for Iapp = 0, solutions may
spike in response to perturbations that push v past vth , but absent further perturbations the state
will settle at the stable fixed point. When it moves to the right of vth (Iapp = 15) it loses stability
and solutions repeatedly reach and cross threshold, leading to autonomous spiking.
The F-N equations [78, 79, 193] have a similar mathematical structure to that of Rinzel’s H-H
reduction in that the nullclines have similar qualitative forms, but are defined by simple cubic and
linear functions rather than the complicated sigmoids of (3.24-3.23). Moreover, apart from the
two timescales, the physiological interpretation and physical units have vanished, but the major
qualitative properties remain. We reproduce the version from [277, §8.3 ]:
v3
µ ¶
1
v̇ = v− − r + Iapp , (3.30a)
τv 3
1
ṙ = (−r + 1.25v + 1.5). (3.30b)
τr
Wilson chooses values τv = 0.1 and τr = 1.25, which we adopt for the calculations to follow, but we
write the time constants as parameters to indicate that they may vary (depending on temperature,
for example, as in H-H [126]). Here v still represents the membrane voltage, but r is a combined
effective gating/recovery variable. The time constants 1/10 (fast) and 1/0.8 (slow) have been
chosen to reflect the rapid upstroke and downstroke in the action potential (cf. Fig. 3.6) and the
slower hyperpolarized, subthreshold dynamics, but as one can see from the time courses plotted in
Fig. 3.11, the relative durations of the depolarized and hyperpolarized episodes are approximately
equal, unlike the H-H dynamics of Figs. 3.6-3.7. The reason for this becomes clear when we examine
77
the nullclines
v3
ṙ = 0 : r = 1.5 + 1.25v and v̇ = 0 : r = v − + Iapp (3.31)
3
shown in the phase portrait of Fig. 3.12. Provided that τv ≪ τr (and 0.1 ≪ 1.25 in this case),
the vectorfield of (3.30) is dominated by its large v̇ component everywhere except in an O(|τv /τr |)
neighborhood of the v̇ = 0 nullcline. The flow therefore moves approximately horizontally and
quickly towards this slow manifold and follows it closely in the direction determined by the slower
component (3.30b) of the vectorfield. This leads to the slow climb up the righthand branch of the
cubic v̇ = 0 nullcline and the slow descent of its left-hand branch, punctuated by fast jumps up
and down in voltage when the solutions leave the neigborhood of the attracting branches of the
nullcline.
2.5 2.5
2
2
1.5
1 1.5
0.5
1
V
−0.5 0.5
−1
0
−1.5
−2 −0.5
0 10 20 30 40 50 0 10 20 30 40 50
t t
Figure 3.11: The time courses of the voltage and recovery variables v (left) and r (right) of the
FitzHugh-Nagumo equations (3.30) for Iapp = 0 (blue dashed) and Iapp = 1.5 (green solid).
Note that (3.30) has a single equilibrium point for all values of Iapp , as one can see by examining
the cubic equation that results from setting r = 1.5 + 1.25v (ensuring ṙ = 0) in v̇ = 0, to obtain:
v3
+ 0.25v + 1.5 − Iapp = 0. (3.32)
3
dr
Alternatively, note that the cubic v̇ = 0 nullcline of (3.31) satisfies dv = 1−v 2 and so has maximum
(positive) slope 1, while the linear nullcline v̇ = 0 has slope 1.25. Hence they can only intersect
once. However, as Iapp varies, the stability of this equilibrium can change, creating limit cycles in
Hopf bifurcations.
Exercise 19. Show that, as Iapp increases in (3.30), a Hopf bifurcation takes place, and that a
stable limit cycle appears. Do you find behavior reminiscent of the second part of Exercise 7?
See [79, especially Fig. 1, pp 448-9], [220] and [277, §8.3].
Exercise 20. Compute and compare phase portraits and bifurcation diagrams for the full H-H
equations (3.22) with those for the F-N equations (3.30). Refer to Wilson [277, §9.2] for a similar
comparison; also see [79].
78
6 6
5 5
V’=0 r’=0 r’=0

4 4
V’=0
3 3
r 2 2 limit
r
cycle
1 1
0 0
−1 −1
−2 −2
−5 0 5 −5 0 5
V V
Figure 3.12: The phase plane of the FitzHugh-Nagumo equations (3.30), showing nullclines and
indicating fast and slow flows for Iapp = 0 (left) and Iapp = 1.5 (right). On the left there is a unique
attracting fixed point, but on the right an asymptotically stable limit cycle surrounding a source
produces the periodic spiking seen in Fig. 3.11. Nullclines and solutions are shown as in Fig. 3.9.
Wilson [277, §9.2] presents a simplification of Rinzel’s reduction (3.28) that also uses cubic and
linear functions, but which better captures the spike/recovery dynamics and the spike rate vs.
applied current characteristic.
3.5 Bursting neurons
Thus far we have considered models such as that of H-H that produce single spikes driven by fast
depolarizing (N a+ ) currents, separated by refractory periods controlled by slower hyperpolarizing
(K + ) currents. Bursting – the clustering of multiple spikes followed by a refractory period of
relative quiescence – can also occur and can vary substantially in form and function [150, Chap. 9]
and [277, Chap. 10]. The mechanism can be described qualitatively as the interaction of two
subsystems dynamically separated by their intrinsic time scales: a faster one, typically governed by
the sodium and potassium channels, which can either be at rest or exhibit (periodic) oscillations,
and a slow subsystem driving the first through its quiescent and oscillatory states in a quasi-
static manner. The slower mechanism can be attributed to accumulation of intracellular calcium
ions (referred to as calcium dynamics [150, Chap. 7]) that mediates a hyperpolarizing potassium
current, or to other slow voltage-dependent processes.
Singular perturbative reduction methods [147], mentioned in passing in §3.6.1, can also provide
79
insights into the dynamics of bursting neurons, which may be written in the general form
u̇ = f (u, c) , (3.33a)
ċ = ǫg(u, c) , (3.33b)
where the vector u = (v, w) ∈ Rn , v denotes the cell membrane voltage, w = (w1 , . . . , wn−1 ) repre-
sents a collection of n − 1 gating variables wi , and ǫ ≪ 1 is a small parameter. The variable c may
represent calcium concentration or, more generally, any (very) slowly varying quantity responsible
for bursting.
The subset of fast equations (3.33a) generally takes the Hodgkin-Huxley (HH) form (3.22):
C v̇ = −Iion (v, w1 , . . . , wn , c) + Iext (t) (3.34a)

wi (v) − wi
ẇi = ∞ ; i = 1, . . . , n − 1, (3.34b)
τi (v)
where the term Iion (. . .) represents the sum of all ionic currents, and the functions wi∞ (v) and τi (v)
take forms similar to those of the H-H system, cf. Eqns. (3.24-3.26). Some of the gating equations
(3.34b) may be so much faster than others that those variables may be assumed to be equilibrated,
but it is implicit that all are fast in comparison to the slow variable c.
Wilson [277, Chap. 10] and Keener and Sneyd [150, Chap. 9] both provide reviews of bursting
models, including those proposed to describe oscillations in β-cells of the pancreas, where insulin
is secreted. Here we merely sketch some of the key ideas, basing our discussion on the Sherman-
Rinzel-Keizer (SRK) model [241] for pancreatic cells. This is a minimal burster model consisting
of three ODEs, two fast and one slow:
C v̇ = −ḡK n(v − EK ) − ICa (v) − gKCa (Ca) · (v − EK ) (3.35a)

n∞ (v) − n
ṅ = δ (3.35b)
τn (v)
˙
Ca = ǫ(−αICa (v) − kCa Ca) . (3.35c)
where
Ca
ICa (v) = ḡCa m∞ (v)h∞ (v)(v − ECa ) , gKCa (Ca) = ḡKCa , (3.36)
Kd + Ca
and n∞ , m∞ , h∞ are standard H-H-type equilibrium functions:
1 1 1
n∞ (v) = Vn −v , m∞ (v) = Vm −v , h∞ (v) = v−Vh (3.37)
1+e Sn 1+e Sm
1+e Sh
and the the n-timescale and parameter α are

γ 1
τn (v) = , α= . (3.38)
e
v−V̄
a −e − v−b V̄ 2VCell F
The model has a potassium current IK = ḡK n(v − EK ), a fast transitory calcium current ICa and a
very slow calcium-dependent potassium current IKCa = gKCa (Ca) · (v − EK ). Intracellular calcium
Ca affects the conductance via (3.36) and has its own dynamics as given in the last equation of
(3.35c). This system differs from the version in [150, §9.1.1]) in that the gating variable n appears
80
ḡCa = 1400 p S ECa = 110 mV
ḡK = 2500 p S EK = −75 mV
ḡKCa = 30000 p S
C = 6310 f F VCell = 1150 µm3
F = 96.487 Coul/mMol Kd = 100 µMol
δ = 1.6 kCa = 0.03 m/s
Vn = −15 mV Sn = 5.6 mV
Vm = 4 mV Sm = 14 mV
Vh = −10 mV Sh = 10 mV
a = 65 mV b = 20 mV
γ = 60 ms V̄ = −75 mV
ǫ = 0.001
Table 3.1: Parameter values for the SRK model for bursting pancreatic β-cells.
linearly in (3.35a) rather than as n4 . Specific parameter values are given in Table 3.1 and our
presentation is taken from [94].
The calcium concentration Ca enters equation (3.35a) via the Hill-type function gKCa (Ca) of
(3.36), and the fast system becomes simpler if one defines a new slow variable c in place of Ca, so
that the calcium-dependent potassium current IKCa = gKCa (Ca)(v − EK ) is linear in c:
Ca
c= . (3.39)
Kd + Ca
(In fact, since Kd ≫ Ca, the conductance gKCa is essentially proportional to c and (3.39) is in
its linear régime.) Differentiating (3.39) we find ċ = (KdKd ˙ and inverting (3.39) to obtain
Ca,
+Ca)2
c
Ca = Kd 1−c , the ODEs (3.35) become:
1
v̇ = [−ḡK n(v − EK ) − ICa (v) − ḡKCa c(v − EK ) + Iext ] ,
C· ¸
n∞ (v) − n
ṅ = δ , (3.40)
τn (v)
(1 − c)2
· ¸
c
ċ = ǫ −αICa (v) − kCa Kd ,
Kd 1−c
where we have added an external (bias) current Iext and ICa (v) is given in (3.36) above.
To understand the mathematical origin of bursting, first recall that we assume ǫ ≪ δ ≪ 1/C
(cf. the values in Table 3.1). Hence we can “freeze” c (or Ca) and consider the phase plane of the
(v, n) subsystem for different values of c in its operating range. This subsystem is similar to the
two-dimensional reductions of the H-H equation of §3.4, and it possesses a stable limit cycle (with
relatively fast v and slower n dynamics) over a range of c values. The spiking frequency is basically
set by the values of C and δ, which control the (relatively) slow flow along the v̇ = 0 nullcline and
the fast jumps along n = constant lines (cf. Figs. 3.9-3.10), but it can also vary with c. More
significantly the limit cycle can disappear in local and global bifurcations as c varies. Bursting
results from hysteretic transitions, driven by the slow variable c, between a quasi-static quiescent
81
state of the fast (v, n) subsystem in which v remains subthreshold, and a periodic (spiking) state.
Here c is regarded as a bifurcation parameter.
The branch of equilibria (or slow manifold) that forms the bifurcation diagram is obtained by
setting v̇ = ṅ = 0, implying that n = n∞ (v) and ḡK n∞ (v)(v − EK ) + ICa (v) + ḡKCa c(v − EK ) = Iext ,
or
Iext − ḡK n∞ (v)(v − EK ) − ICa (v)
c= . (3.41)
ḡKCa (v − EK )
This yields a cubic-like curve in the (c, v) plane with two folds. Laborious analyses, or numerical
explorations, reveal that saddle-node bifurcations occur at the folds, and that a Hopf bifurcation
occurs on the upper segment of the curve towards the left. See Fig. 3.13. The lower segment (to
the right of the saddle-node) consists of stable sinks, the middle segment contains saddle points,
and the upper one, between the Hopf bifurcation point and the saddle node, is filled with unstable
sources.
−10 −20
−20 −30
H
−30
−40
SN
−40
−50
V
c’=0
−50
−60
−60 SN
−70
V’=0
−70
−80
0 0.002 0.004 0.006 0.008 0.01 0 1 2 3
c t 4
x 10
Figure 3.13: Left: A branch of equilibria (red) for the “frozen c”system and bifurcations with
respect to c; note the two saddle nodes (SN) and a Hopf (H) bifurcation. The v̇ = 0 and ċ = 0
nullclines and a typical bursting trajectory are projected onto the (c, v) plane. Right: The voltage
time history exhibiting periodic bursts. Adapted from [94, Fig 11].
To see how c evolves we examine the nullclines of the ċ equation. We have ċ = 0 on c = 1 and
on
αḡCa m∞ (v)h∞ (v)(v − EK )
c= , (3.42)
αḡCa m∞ (v)h∞ (v)(v − EK ) − kCa Kd
and away from c = 1 the sign of ċ (and hence the direction in which c moves) is determined by
the sign of the RHS of (3.42). The curve defined by (3.42) is also shown in Fig. 3.13: c moves
82
to the right above and to the left of it, and moves to the left below it and to the right. Starting
from initial data with the fast system state at a sink on the lower branch of Fig. 3.13, c will slowly
decrease, while the fast state tracks the slow manifold of equilibria, until it arrives at the saddle
node point, where the equilibrium disappears and the solution jumps rapidly up in v and converges
on the limit cycle, leading to a burst of spikes. Since v lies above the ċ = 0 nullcline in this regime,
c now begins to increase, leading to a (slight) slowing of the spike rate, and ultimately “collision”
of the limit cycle with the intermediate saddle point in a global, homoclinic bifurcation [107, §6.1]
(to be described in class). Spiking terminates, the solution drops onto the lower segment of the
slow manifold initiating the refractory period, and the process repeats.
a1) b1)
20 20
v [mV]
v [mV]
40 40
60 60
80 80
0 1 2 3 4 5 0 1 2 3 4 5
4 4
a2) x 10 b2) x 10
20 20
v [mV]
v [mV]
40 40
60 60
80 80
0 1 2 3 0 1 2 3
4 4
x 10 x 10
a3) b3)
20 20
v [mV]
v [mV]
40 40
60 60
80 80
0 1 2 3 4 0 1 2 3 4
Time [ms] x 10
4 Time [ms] x 10
4
Figure 3.14: Bursts in the SRK model. Panels (a1 , b1 ) Duty cycle changes due to ḡKCa : ḡKCa =
30000, left and 41750, right. (a2 , b2 ) Effect of δ on spike rate and hence numbers of APs: δ = 1.7, left
and 1.55, right. (a3 , b3 ) Effect of added external currents on AP numbers and bursting frequency:
Iext = 0, left and −550, right. From [94, Fig 15].
As we have noted, the parameters C and δ primarily control spike rates within a burst, and the
(very) small parameter ǫ clearly controls the rate of evolution of c, and hence the overall bursting
period: Tburst ∼ 1/ǫ. The other parameters, including conductances and the external current,
influence the behavior in more suble ways. The duty cycle – the fraction of Tburst occupied by
spiking – is controlled by the relative speeds with which c changes on the upper and lower segments
of the slow manifold, and this in turn is controlled by how close the ċ = 0 nullcline lies to the
those segments. From (3.41) we see that Iext and the conductance ḡKCa affect the position of the
latter, while αḡCa and kCa Kd influence the ċ = 0 nullcline (3.42). The effect of ḡKCa is simple: as
the denominator in (3.41) it scales the horizontal extent of the slow manifold and its bifurcations
83
in Fig. 3.13, so that increases in ḡKCa bring the homoclinic bifurcation leftward (and vice versa).
If the ċ = 0 nullcline is closer to the lower branch, this reduces the burst duration more than the
refractory period, thereby decreasing the duty cycle (the bursting frequency also changes slightly).
See Fig. 3.14(a1 -b1 ).
Changes in membrane capacitance C have little effect on AP numbers (although they can drasti-
cally affect AP magnitudes), but decreasing the parameter δ reduces the number of APs from 22 to
2-3; this is accompanied by a moderate increase in bursting frequency: Fig. 3.14(a2 -b2 ). The bias
current Iext also has a strong influence, permitting adjustment of AP numbers without drastically
changing the bursting frequency: Fig. 3.14(a3 -b3 ).
Note that if c (or Ca) is taken constant at a value in the range in which there are three equilibria
in the bifurcation diagram of Fig. 3.13, then the fast equations (3.35a-3.35b) have coexisting stable
equilibria (hyperpolarized states) and periodic spiking states. Such behavior is called bistable: the
neuron can be “kicked” from inactivity to activity or vice versa by transient stimuli, thus providing
short-term memory at the level of a single cell.
Both Wilson [277, Chap. 10] and Keener and Sneyd [150, Chap. 9] include dynamical taxonomies
or classifications of different types of bursters based on the bifurcations that the fast u subsystem
undergoes as the slow c variable drifts back and forth, and [277, §10.1] contains a nice description
of spike frequency adaptation due to a slow “afterhyperpolarizing” or IAHP current. For (many)
more details and analyses of general bursting models, also see [94].
3.6 Propagation of action potentials
Hodgkin and Huxley’s paper [126] also contains a model for the propagation of the AP along
the axon, viewed as a one-dimensional continuum. The ODEs (3.22) and their simplifications
considered in §§3.3-3.4 assume a spatially-uniform potential difference across the cell membrane1 .
While this may be reasonable for compact cells, or for the soma of a neuron, it is clearly wrong for
cells with extensive axonal or dendritic processes and for rapid potential changes, and it cannot,
of course, account for propagating APs that travel as waves along the axon and dendrites. A
spatio-temporal description is needed.
The spatial description of current Im passing through the cell membrane is given in terms of
transmembrane potential v by the cable equation:
µ ¶
∂ 1 ∂v
Im = . (3.43)
∂x R ∂x
(The diffusion or heat equation of §3.2 appears again!) Here R = Ri + Re , which may depend
on x, denotes the sum of the intra- and extracellular resistences per unit length (the intra- and
extracellular media near the membrane carry the current along the cable). This equation may be
derived as follows (cf. [150, §4.1]): first consider a short (infinitesimal) segment [x, x + dx] of length
1
As noted in §3.3, in some of their experiments Hodgkin and Huxley enforced spatial uniformity in the giant axon
of the squid by threading a thin wire down its interior (voltage clamp).
84
dx, having intra- and extracellular resistences Ri dx, Re dx, and denote the intra- and extracellular
voltages at either end as vi (x), ve (x) and vi (x + dx), ve (x + dx): Fig. 3.15(a). With the convention
of positive current flow from left to right (x to x + dx), Ohm’s law implies that
vi (x + dx) − vi (x) = −Ii (x)Ri dx and ve (x + dx) − ve (x) = −Ie (x)Re dx, (3.44)
where Ii , Ie denote the internal and external currents in the axial direction. Dividing by dx and
taking the limit dx → 0, as in calculus, gives
1 ∂vi 1 ∂ve
Ii (x) = − and Ie (x) = − . (3.45)
Ri ∂x Re ∂x
Figure 3.15: a) Current flow along an axon. b) Schematic for the Kirchhoff law current balance.
We next balance the axial currents Ii , Ie at x and x + dx with the transmembrane current,
appealing to Kirchhoff’s laws. If Im is the transmembrane current per unit cable length, this gives
Ie (x + dx) − Ie (x) = Im dx = Ii (x) − Ii (x + dx) : (3.46)
in words: what is lost in external current is gained in internal current. See Fig. 3.15(b) and recall
that outward transmembrane currents are positive: §3.2. Dividing by dx and taking the limit again
gives
∂Ie ∂Ii
Im (x) = =− . (3.47)
∂x ∂x
We now use the facts that v = vi − ve and the total axial current Ia = Ii + Ie is constant to
eliminate explicit reference to the internal and external currents and voltages. From (3.45) and
ve = vi − v we get
µ ¶
1 ∂vi 1 ∂(vi − v) Ri + Re ∂vi 1 ∂v
Ia = Ii + Ie = − − =− +
Ri ∂x Re ∂x Ri Re ∂x Re ∂x
µ ¶
Re Ia 1 ∂vi 1 ∂v
⇒ =− + . (3.48)
Ri + R e Ri ∂x Ri + Re ∂x
Since ∂Ia /∂x = 0 (Ia constant), this gives
· ¸ ·µ ¶ ¸ · ¸
∂ 1 ∂vi ∂ 1 ∂v ∂ Re
= − Ia . (3.49)
∂x Ri ∂x ∂x Ri + Re ∂x ∂x Ri + Re
85
If the resistances are constant, or if Re ≈ 0 (e.g. the axon is isolated in a large bath of fluid, as
assumed in [126], cf. [150, §5.1]) the final term is zero and substituting Ii for ∂vi /∂x from the first
of (3.45) and using (3.47) we obtain
·µ ¶ ¸ µ ¶
∂Ii ∂ 1 ∂v ∂ 1 ∂v
Im (x) = − = = . (3.50)
∂x ∂x Ri + Re ∂x ∂x R ∂x
To connect this with the H-H ODEs (3.22) we recall that the transmembrane current Im is the
sum of the capacitive current C ∂v
∂t and the ionic and applied currents, giving:
µ ¶ µ ¶
∂v ∂ 1 ∂v
p C + Iion − Iapp = . (3.51)
∂t ∂x R ∂x
Here the LHS is multiplied by the perimeter p of the axon to put all terms into the same units:
µAmps/cm (recall that Ri and Re are resistances per unit length of the cable and so have units
mOhms/cm, the units of 1/R are mSeimens·cm, and mSeimens·cm×mV/cm2 = µAmps/cm). Par-
tial derivatives are used here and above, because v = v(x, t) and the currents also depend on t and
x. Finally, note that the sign of Iapp in (3.51) is chosen for consistency with (3.22), to which (3.51)
reduces if v does not depend on x. Eqn. (3.51) is solved along with the ODEs (3.22c-3.22d) for the
gating variables.
Like ODEs, PDEs such as (3.51) must be supplemented by intial conditions to be well-posed
and admit unique solutions, but they also require boundary conditions, specifying the states which
are assumed at the ends or edges of the spatial domain. Here we are considering one-dimensional
cables (axons) and so we must specify the voltages at the end points. For endpoints x = 0, L typical
cases are:
v(0, t) = v0 , v(L, t) = vL (voltage clamp), (3.52)

¯ ¯
∂v ¯¯ ∂v ¯¯
= 0, = 0 (insulated ends), (3.53)
∂x ¯(0,t) ∂x ¯(L,t)
¯
∂v ¯¯
or = −Ri Ii (0, t) (current injection at x = 0); (3.54)
∂x ¯(0,t)
and a suitable initial condition is that the axon starts in the resting state
v(x, 0) = vrest , (3.55)
where vrest satisfies gK (V − VK ) + gN a (V − VN a ) + gCl (V − VCl ) = 0 with gK = ḡK n4∞ (v), gN a =

ḡN a m3∞ (v)h∞ (v), and Iapp = 0 (cf. Eqns. (3.21) and (3.22a)).
See [150, §§4.1-2] for more on the derivation of (3.51), and see Chap. 8 in general, and [146,
Chap. 4] for additional material. Abbot and Dayan [58, §6.3] also derive (3.43), but not in detail.
Wilson [277, Chap. 15] also has a brief derivation, with more information on dendritic trees and
spatially-discretized multicompartment models of neurons. W. Rall recognised as early as 1959
that dendritic tree structures could modulate incoming signals [209, 210], and it is now known that
passive and active, linear and nonlinear dendritic effects, including back-propagation of APs, can
affect intercellular communication. This is referred to as dendritic computation [176].
86
3.6.1 Traveling waves in a simple PDE
Equation (3.51) is an example of a reaction-diffusion equation (RDE), the simplest types of

which take the form
∂u ∂2u
= + f (u), (3.56)
∂t ∂x2
where f (u) describes the “reaction” or source term which depends on the state u(x, t). Such
equations describe the propagation of waves in active media, in contrast to the classical wave and
heat equations
∂2u 1 ∂2u ∂u 2
2∂ u
= and = α (3.57)
∂t2 c2 ∂x2 ∂t ∂x2
which describe the evolution of disturbances in passive media. The following example and exercise,
while too simple to capture the rise and fall of an AP, illustrate a monotonic traveling front that
can occur in a chemical reaction such as combustion.
Example 9. Consider the bistable RDE
∂u ∂2u
= + u − u3 , (3.58)
∂t ∂x2
so called because in the spatially-uniform case (∂ 2 u/∂x2 ≡ 0) (3.58) becomes the ODE u̇ = u − u3
has two stable states ue = ±1 separated by an unstable fixed point ue = 0. These spatially uniform
states ue (x) = ±1 and ue (x) = 0 are also equilibria for the RDE, but, as for nonlinear ODEs it is
usually difficult to find time- (and space-) dependent solutions.
Here we can find space-dependent equilibria: solutions of the boundary value problem (BVP):
∂2u
¯ ¯
3 ∂u ¯¯ ∂u ¯¯
+ u − u = 0, = =0 (3.59)
∂x2 ∂x ¯(0) ∂x ¯(L)
∂u ′ ∂2u ′′ ′
(we assume insulated boundaries). Indeed, writing ∂x = u and ∂x2
= u and multiplying by u , we
may integrate (3.59) once:
′
(u )2 u2 u4
Z Z
′′ ′ 3 ′
u u dx = − (u − u )u dx ⇒ + − = C, (3.60)
2 2 4
where C is a constant of integration. From (3.60) we get
s µ
u2 u4
¶
du du
Z Z
′
u = = 2 C− + ⇒ p = dx = x − x0 : (3.61)
dx 2 4 2(C − u2 /2 + u4 /4)
a separable first order ODE. A second constant of integration, x0 has appeared, and it and C are
′ ′
implicitly determined by the boundary conditions u (0) = u (L) = 0 of (3.59). But alas, only for
special values of C can the integral in (3.61) be evaluated in terms of elementary funtions (it is a
Jacobian elliptic function – a special function – in general). However, we can deduce qualitative
properties of solutions of the BVP by plotting level sets of its first integral (3.60): Fig. 3.16.
The closed curves surrounding the origin (u, u′ ) = (0, 0) for 0 < C < 1/4 correspond to periodic
orbits, whose periods increase monotonically from 2π to infinity as C goes from 0 to 1/4 (the proof
87
1.5 1
0
1 C increasing
−1
0 2 4 6 8 10
0.5
C=1/4 1
C=0
u’
0 0
−1
−0.5 0 2 4 6 8 10
0.5
−1
0
−1.5 −0.5
−1.5 −1 −0.5 0 0.5 1 1.5 0 2 4 6 8 10
u
Figure 3.16: Level sets of (3.60) (a); solutions of the BVP (3.59) must begin and end on the
′
horizontal axis u = 0. The first three stationary solutions (b).
of this relies on properties of Jacobian elliptic integrals), and to satisfy the boundary conditions of
′
(3.59) segments of these closed curves that start and end on u = 0 must be selected. Potentially
one can have 1/2, 1, 3/2, etc. full turns, but these must be “fitted in” to the length L of the domain,
and so no such solutions exist if L < π.
Time-dependent solutions are much harder to find, and here we only consider the special case of
uniform traveling waves (TWs): spatial structures that move at constant velocity c and so may be
written in the special form u(x, t) = U (x − ct). Note that if c > 0 the wave travels in the direction
of increasing x (from left to right). Letting ζ = x − ct represent the wave-frame coordinate and
denoting derivatives with respect to ζ dU/dζ = U ′ , from the chain rule we have:
∂U ∂(x − ct) ∂U ∂(x − ct) ∂2U

= U′ = U ′, = U′ = −cU ′ , and = U ′′ . (3.62)
∂x ∂x ∂t ∂t ∂x2
Substituting these into (3.58) gives the ODE
−cU ′ = U ′′ + U − U 3 ⇒ U ′′ + cU ′ + U − U 3 = 0 : (3.63)
analogous to a nonlinear oscillator with linear damping with the independent variable ζ replacing
time. The phase plane of (3.63) for c > 0 is shown in Fig. 3.17. We shall seek TWs defined on the
real line −∞ < ζ < ∞ (a reasonable simplification for cables (axons) that are long in comparison
to their diameter), and require that U ′ (x) → 0 as x → ±∞, so that they connect uniform states.
Fig. 3.17 illustrates that, for any c > 0 there are two special solutions, branches of the unstable
manifolds of (U, U ′ ) = (±1, 0), that connect those fixed points to the sink at (U, U ′ ) = (0, 0). These
are called heteroclinic orbits, and they correspond to TWs in which U rises from −1 to 0 and
drops from +1 to 0 respectively. If c < 0 the sink becomes a source and the connections flow, and
waves propagate, in the opposite direction. Only for c = 0 do heteroclinic orbits connecting the
equilibria (U, U ′ ) = (±1, 0) exist.
88
1 1
0.8 0.5
0.6 0
0.4 −0.5
0.2
−1
0 10 20 30
0
1
−0.2
−0.4 0.5
−0.6 0
−0.8 −0.5
−1 −1
−2 −1 0 1 2 0 10 20 30
Figure 3.17: Phase plane for (3.63) with c > 0 (left): the traveling waves of interest must begin at
either of the fixed points (U, U ′ ) = (±1, 0) and end at (U, U ′ ) = (0, 0). Profiles of the heteroclinic
orbits (right).
Exercise 21. Describe how the form of the heteroclinic traveling waves changes as wavespeed c
increases for the bistable RDE (3.58). Show that there is a critical speed cmon at and above which
the wave profile is monotonic (unlike the oscillations in Fig. 3.17).
See [150, Chap. 4] and [75, Chap. 6] for more information on propagating action potentials.
3.6.2 Traveling waves in the FitzHugh-Nagumo equation
Rather than tackling the spatially-dependent H-H equations directly, we will consider the sim-
1 ∂2v
plified F-N version. Adding the “diffusive” term pR ∂x2
to (3.30a) and assuming constant resistance
per unit length along the axon, we have:
1 ∂2v v3
µ ¶
∂v 1
= +v− − r + Iapp , (3.64a)
∂t τv pR ∂x2 3
∂r 1
= (−r + 1.25v + 1.5). (3.64b)
∂t τr
def
Recalling from §3.4 that τv ≪ τr we√define a small parameter ǫ = τv /τ √r . We then rescale time and
∂
space by letting t̄ = t/τr and x̄ = (ǫ pR)x so that ∂t ∂
= τ1r ∂∂t̄ , ∂x = (ǫ pR) ∂∂x̄ and (3.64) becomes
∂v ∂2v v3 def ∂2v

ǫ = ǫ2 2 + v − − r + Iapp = ǫ2 2 + f (v, r), (3.65a)
∂ t̄ ∂ x̄ 3 ∂ x̄
∂r def
= (−r + 1.25v + 1.5) = g(v, r). (3.65b)
∂ t̄
89
This subterfuge involves choosing time and space scales in which the rise and fall of the action
potential are equally steep in time and space (it also reduces the number of parameters by three).
Since, as in §3.6.1, we want the length of the axon (cable) to be long in comparison to its diameter,
which is consistent with assuming a high resistance R ≫ 1, the rescaling makes physical sense.
√ (If
the space domain in the original coordinates is 0 ≤ x ≤ L, in the new ones it is 0 ≤ x̄ ≤ (ǫ pR)L.)
We seek a uniform traveling wave with voltage and recovery variable profiles v(x̄, t̄) = V (x̄ − ct̄)
and r(x̄, t̄) = R(x̄ − ct̄). Proceeding as in Example 9, we obtain the pair of ODEs
ǫ2 V ′′ + ǫcV ′ + f (V, R) = 0, (3.66a)

′
cR + g(V, R) = 0, (3.66b)
∂()
where ()′ = ∂(x̄−c t̄) . This is essentially a third order ODE whose analysis would take us too deeply
into the technicalities of singular perturbation theory [20, 147] (although it might make a nice
project for the more mathematically-inclined). Briefly, since ǫ ≪ 1, the “waveframe dynamics”
of the V -ODE is fast compared to that of R, so that we may assume R to remain approximately
constant while V evolves rapidly towards a point on the nullcline f (V, R) = 0, according to (3.66a).
After V equilibrates, R changes slowly according to (3.66b) and the solution moves along close to
the nullcline until the bend, where it jumps rapidly to the other attracting branch, much as in
Fig. 3.12.
Keener and Sneyd [150, §§6.3-4] explicitly solve a version in which the cubic V -dependence
in f (V, R) is replaced by a piece-wise linear function representing the outer, negatively-sloped
branches, as was done in [222]. They assemble exponential solutions by choosing constants of
integration to satisfy matching conditions; they also sketch a geometrical approach to the nonlinear
problem. Ermentrout and Terman [75, §6.3] give a shorter account of this problem. We also note
that the F-N PDE exhibits a Hopf bifurcation to periodic traveling waves [221].
Exercise 22. Use the methods described in this section and in the references [150, §§6.3-4] and [75,
§6.3] given above to investigate traveling waves in the FitzHugh-Nagumo (F-N) equation. Start by
considering the piecewise linear version of F-N as in [150, §6.3.1]. [This could become a substantial
course project.]
3.7 Concluding remarks
In this section we have described relatively detailed single compartment models of isolated neu-
rons that reproduce the dynamics of ionic channels. In addition to the PDE descriptions introduced
in §3.6, multicompartment ODE models have also been developed to capture the morphology of
branching dendrites and axons. The simulation environment NEURON enables construction of
complicated cell geometries containing hundreds of compartments, and their assembly into inter-
connected neural networks (http://www.neuron.yale.edu/neuron/).
Complementing the inclusion of cell morphology, the original Hodgkin-Huxley model of the
squid’s giant axon has been extended to incorporate additional ionic currents. These typically
involve multiple gating variables (sometimes ten or more) in addition to the membrane voltage,
and analytical studies are not feasible. Numerical studies of such systems, with their multiple time
90
scales, are also quite challenging. Reductions based on removal of fast variables are helpful, but
even for the two-dimensional cases described in §3.4 it is hard to find fixed points analytically.
A class of substantially simplified integrate-and-fire models has therefore been proposed, which
neglect the details of spike dynamics, and only represent voltage variations during the refractory
period. These are particularly useful in modeling networks and small circuits, and we defer our
discussion of them to the next chapter, after describing models for synaptic connections.
91
Chapter 4
Synaptic connections and small

networks
We now move from single cell models to considering networks, starting with a review of models
for chemical synapses, in which neurons communicate via release and reception of neurotransmitter
molecules junctions, and of direct electrical connections or gap junctions.
4.1 Synapses and gap junctions
Synapses are structures in neurons that allow communication of signals with other neurons.
There are more than 1011 neurons in the human brain, and on average 1000 synaptic connections
from each neuron. Some cortical neurons receive up to 10,000 inputs to their dendritic tree, and
Purkinje fibers in the cerebellum receive up to 100,000 inputs. Sherrington originally thought all
synapses were electrical in nature, but in the 1920’s, Otto Loewi discovered the chemical synapse,
showing that signals from the vagus nerve to the heart are conveyed by acetylcholine (ACh). A
debate ensued as to whether synapses were all electrical or all chemical, but it slowly became
clear that there are two distinct categories: chemical and electrical. J.C. Eccles, co-winner of the
1963 Nobel Prize with Hodgkin and Huxley, who had studied with Sherrington and adopted his
ideas, subsequently did experiments that demonstrated chemical synapses, and, with Bernard Katz,
performed further studies of acetylcholine.
Electrical synapses involve direct contact of cytoplasm of two distinct cells and allow simple
depolarizing signals. Chemical synapses involve the release of neurotransmitter from a presynaptic
neuron and its reception at another, postsynaptic neuron, resulting in the generation of excitatory
or inhibitory postsynaptic potentials (EPSPs or IPSPs), as noted in §1.2. A single EPSP is typ-
ically too small to drive a hyperpolarized neuron above threshold, but multiple EPSPs cause the
postsynaptic cell to spike. IPSPs drive the voltage down and tend to prevent or delay spiking.
Chemical synapses are substantially slower than electrical synapses, but allow more complicated
behavior. In particular, their synaptic plasticity is crucial to learning, since it allows connections
among cells (and hence brain areas) to weaken or strengthen in response to experience. As synapses
92
change strength, pathways change, as we hope yours will in this course.
4.1.1 Electrical Synapses
Electrical synapses, also called gap junctions, allow neurons to communicate directly via cyto-
plasmic contact in channels at which the cells are only 3.5 nm apart. Gap junction channels are
specialized protein structures, having pores of diameter ≈ 1.5 nm and a pair of hemichannels (or
connexons), each made of six connexins: proteins that span their respective cell membranes. The
resulting channel opens and closes through the changing orientation of the connexins. It also allows
dyes and tracking particles to pass from one neuron to another, permitting study of the network.
Electrical synapses provide fast, bidirectional communication, due to the electrotonic transmis-
sion. The change in postsynaptic potential depends directly on the size and shape of the change in
presynaptic potential, as well as on the effective input resistance of the postsynaptic area. A small
postsynaptic neuron or extension will have a high input resistance and respond more strongly to
a small current through the channel. In general, electrical synapses are depolarizing (excitatory)
through the transmission of the increased potentials corresponding to spikes, but action potentials
with hyperpolarizing after-potential create inhibitory hyperpolarization. Bidirectionality results
from the cytoplasmic continuity, although there are exceptions: certain channels can also close in
response to voltage changes, permitting transmission of spikes in only one direction; these are called
rectifying synapses. Also, to isolate damaged cells, gap-junction channels close in response to low
cytoplasmic pH or high Ca2+ , both of which are signs of cell damage.
Electrical synapses appear where rapid response is required. Escape reflexes are a major example:
in goldfish, the tail-flip giant neuron is connected by electrical synapse to sensory input, allowing
a rapid burst of speed in response to threatening stimuli. Electrical synapses allow connection of
large groups of neurons, so that multiple small cells can function as a larger unit. The resistance
of the coupled network is low, since the cells are all in parallel, so any input results in a small
voltage change in the network, but once there is a sufficient input to cause a spike, all cells spike
synchronously. Rapid electrical synapses thus allow group triggering in an “all-or-none” mode. For
example, ink release in various marine snails is coordinated and synchronized by electrical synapses.
Modeling: Gap junctions are typically modeled as simple resistors, passing a current propor-
tional to the voltage difference between the cells in contact. Hence for the current passing from the
ith to the jth cell one adds to the internal ionic currents a term Igap = +ḡgap (vi − vj ), where ḡgap
represents the conductance of the junction and (vi − vj ) is the potential difference between cell i
and cell j. For a pair of H-H type model cells, this leads to:
C1 v̇1 = −I1,ion (. . .) + I1,ext + ḡgap (v2 − v1 ) , C2 v̇2 = −I2,ion (. . .) + I2,ext + ḡgap (v1 − v2 ) (4.1)
(plus equations for the gating variables of each cell). Note that the gap junction current term is
conventionally shown as +Igap . We consider a pair of simpler integrate-and-fire neurons coupled
by both gap junctions and inhibitory synapses in §4.2.4.
93
4.1.2 Chemical Synapses
Much of the following discussion, including the notation in the modeling section, is drawn
from [58, Chap. 5].
In chemical synapses, neurons are separated by synaptic clefts between presynaptic terminals or
boutons (swellings on the axon) and postsynaptic dendritic spines. The 20-40 nm synaptic cleft is
as wide or wider than the typical intracellular space. Following the arrival of an action potential,
an influx of Ca2+ occurs through voltage-gated channels in the active zone of the synapse, causing
synaptic vesicles, in which neurotransmitter molecules are stored, to fuse with the cell membrane
and release their contents. This process, called exocytosis, allows amplification of signals: one
vesicle releases thousands of neurotransmitter molecules, which can open many ion channels and
thereby depolarize a much larger cell than is possible with gap junctions. Bernard Katz, a coauthor
on two of the H-H papers [127, 128], discovered this “quantal” neurotransmitter release in packets
that correspond to the size of synaptic vesicles; his Nobel Prize was awarded in 1970.
After release, neurotransmitter molecules diffuse across the synaptic cleft to reach and activate
receptors: proteins that span the postsynaptic cell membrane. The receptors then open or close
ion channels in the postsynaptic neuron, which, depending on the channel, can produce excitatory
(depolarizing) or inhibitory (hyperpolarizing) effects: EPSPs or IPSPs, as noted above. This chain
of processes causes delays that range from 0.3 ms to several ms. Additional delays occur on the
postsynaptic side, where there are two methods of opening channels. The effects on the postsynaptic
neuron depend primarily on properties of its receptor circuits. Neurotransmitters act somewhat
like hormones, but their effect is generally faster and much more precise. Specialized release and
reception sites are less common in the autonomic nervous system, where neurotransmitters act
more slowly and over a more diffuse area.
Ionotropic receptors have directly ligand-gated ion channels that allow fast responses within
milliseconds; they typically employ the neurotransmitter ACh and appear in circuits which require
rapid behavior. Ionotropic receptors fall into three general classes: cys-loop receptors, ionotropic
glutamate (glutamic acid) receptors, and adenosine triphosphate (ATP)-gated channels. The first
class subdivides into anionic and cationic groups; the most widely modeled among the former
are inhibitory Gamma-aminobutyric acid (GABAA ) receptors. The latter are neuromodulatory
receptors, e.g. using serotonin. The excitatory glutamatergic class contains 2-amino-3-hydroxy-5-
methyl-isoxazolepropanoic acid (AMPA), N-methyl-D-aspartic acid (NMDA) and kainate receptors.
AMPA exhibits significantly faster activation and deactivation than NMDA. NMDA receptors ad-
ditionally depend on the postsynaptic potential, which, as discussed below, has implications for
synaptic plasticity and learning.
[PH: CHECK WITH KongFatt! Drop (ATP)-gated ref ??]
Metabotropic receptors act indirectly: neurotransmitter arrival starts a cascade of “second mes-
sengers” which eventually open or close postsynaptic ion channels, causing effects that can last
seconds to minutes, depending on properties of the messenger cascade. Norepinephrine (NE) acts
in this way at synapses in the cerebral cortex. The intracellular cascade, usually involving G-
proteins, requires more complex modeling. GABA activates both ionotropic and metabotropic
inhibitory receptors. GABAB is a widely-modeled G protein-coupled metabotropic receptor.
94
Modeling: As for gap junctions, neurotransmitter arrival at a synapse can be modeled as
a current source in the postsynaptic neuron. As described above, neurotransmitter molecules
bind to receptors, causing ion channels to open, permitting a current to flow and producing a
postsynaptic potential. The major variable influencing the synaptic current is the conductance of
the postsynaptic ion channels, which is typically modelled as the product of a maximal conductance,
ḡs , with the probability of being open, Ps . Ps can represent the probability of a single channel being
open, or the fraction of open channels in a larger group. Once the transmitter unbinds from the
receptor, the channels close. Since ḡs is a constant, the dynamics occur in Ps . This can be modeled
like the gating variables in the H-H equations (3.22):
dPs
= αs (1 − Ps ) − βs Ps , (4.2)
dt
where αs and βs respectively determine the rates at which channels open and close (e.g. if the
fraction of closed channels is 1 − Ps , then the net opening rate is αs (1 − Ps )): see [64, p. 15] (cf. [58,
p. 180]).
Time constants of the relevant channels can be determined by in vitro experiments by phar-
macologically blocking specific receptors. For example, AP5, CNQX and bicuculline are so-called
antagonists of NMDA, AMPA and GABAA receptors, respectively, which act disrupt their receptor
functions. An alternative and powerful method is to “dynamically clamp” conductances using a
real-time interface between cell and a computer. See [206] for a discussion of this technique.
In modeling neurons as single compartments, one typically lumps the collective properties of all
the presynaptic cell’s synapses with a given postsynaptic cell, so that Ps denotes the fraction of open
post
postsynaptic channels, and the resulting synaptic current Isyn (t), to be introduced in Eqn. (4.9)
below, represents the net effect of all individual EPSPs and IPSPs arriving at time t.
Channels typically open more rapidly than they close, so αs ≫ βs . The closing rate βs is
generally assumed to be constant, but the opening rate αs depends on the concentration of neuro-
transmitter in the synaptic cleft, and may be expressed as a constant ᾱs multiplied by a function
describing how neuro transmitter concentration depends on the presynaptic voltage v, e.g., as in
CN T,max def
αs (v) = pre = ᾱs G(v), (4.3)
1 + exp[−kpre (v − Esyn )]
from [64, p. 5]. In this sigmoidal function CN T,max represents the maximal neurotransmitter
pre
concentration in the synaptic cleft, kpre sets the “sharpness” of the switch, and Esyn sets the
voltage at which it opens.
Due to the rapid rise of the AP, neurotransmitter fills the cleft to its full concentration faster
than any other timescale in the model, so it behaves like an instantaneous switch that turns on and
pre
off as v passes up and down through Esyn . We can therefore simplify by modeling the transmitter
concentration as a step function that assumes a fixed value αs for a fixed time T , after which it
resets to 0: 
 0, t ≤ 0,
αs (t) = αs , 0 < t < T, (4.4)
0, t ≥ 0.

Under this approximation, if a presynaptic spike arrives at t = 0 when the postsynaptic open
probability is Ps (0), we can solve Equation (4.2) as a decoupled process, without further reference
95
to the presynaptic cell’s state. Since αs ≫ βs , we may ignore βs for 0 ≤ t ≤ T , which gives
Ps (t) = 1 + (Ps (0) − 1) exp(−αs t) , 0 ≤ t ≤ T . (4.5)
If Ps (0) = 0, this rises to a maximum value Pmax = Ps (T ) = 1 − exp(−αs T ) at time T . (This

formulation also allows derivation of αs experimentally, given the presynaptic voltage and the
measured maximum conductance Pmax .) For t ≥ T , αs = 0, and we simply have exponential decay:
Ps (t) = Ps (T ) exp(−βs (t − T )) . (4.6)
Putting (4.5) and (4.6) together, the postsynaptic current is approximated by an explicit piecewise-
smooth function.
If the synaptic rise time is slower, such that the assumptions behind (4.5) do not hold, we can
approximate the synaptic conductance after a presynaptic spike at t = 0 as the difference of two
exponentials:
Ps (t) = Pmax B(exp(−t/τ1 ) − exp(−t/τ2 )) , t ≥ 0 . (4.7)
The rise and fall in synaptic conductance is achieved by setting τ1 > τ2 . B is a normalization factor
to ensure that the maximum conductance is Pmax .
Exercise 23. What is the correct value for B, in terms of τ1 and τ2 ?
Another expression with a similar form to (4.7) is the so-called “alpha function”
µ ¶
Pmax t t
Ps (t) = exp 1 − , t ≥ 0, (4.8)
τs τs
which starts at zero, rises and peaks at the value of Pmax at time t = τs , and then decays back toward
zero with a time constant τs . All the expressions (4.5-4.8) employ stereotypical functions Ps (t) for
the postsynaptic current that incorporate some physiological detail, but that are automatically
elicited by a presynaptic spike, so that a train of such spikes occurring at times t1 , t2 , . . . produces
X
post post
Isyn (t) = ḡs Ps (t − ti )(v − vsyn ), (4.9)
i
post
where vsyn is the reversal potential associated with the synapse and specific neurotransmitter.
If all the synaptic time constants are short then one may further simplify the model by replacing
the functions Ps (t) with delta functions1 , possibly with a delay to account for neurotransmitter
transport across the synaptic cleft, so that (4.9) becomes:
X
post
Isyn (t) = ḡs δ(t − τs − ti ); (4.10)
i
we illustrate this in §4.2.4 below.
Other factors that arise in modeling synapses include postsynaptic voltage effects for NMDA
receptors. If the postsynaptic neuron is near resting voltage, the channels will not open for a presy-
naptic spike. For the channels to open, there must be activity in both presynaptic and postsynaptic
1
See §5.1.1 for a brief description of the delta function.
96
neurons. Once open, the NMDA receptors allow Ca2+ ions to enter, which increases the long-term
synaptic strength. Such modification of synaptic responses to presynaptic spikes is called synaptic
plasticity, and it includes both long-term effects as seen in NMDA receptors, and short-term ef-
fects, which are also known as synaptic facilitation and depression. Synaptic facilitation describes
stronger postsynaptic response to a presynaptic spike that comes a short time after previous spikes,
and synaptic depression is seen in a significantly lowered responses to each successive spike in a
train. Short-term effects can be modeled by adding a second synaptic conductance term, Prel ,
corresponding to the probability of neurotransmitter release at a presynaptic spike. In the absence
of presynaptic spikes, Prel decays to a resting value, as modeled by:
dPrel
τP = P0 − Prel . (4.11)
dt
For facilitation, after every presynaptic spike, we replace Prel by Prel +fF (1−Prel ), with 0 ≤ fF ≤ 1
representing the degree of facilitation. For depression, replace Prel by fD Prel , also with 0 ≤ fD ≤ 1.
Finally, the synaptic current at the jth (postsynaptic) cell due to an AP in the ith (presynaptic)
cell can be incorporated into the postsynaptic voltage equation as follows:
post
C v˙j = −Ij,ion (. . .) + Ij,ext − ḡsyn Prel Ps (vj − Esyn ). (4.12)
In solving this equation we can either take Ps = Ps (t − ti ) as a stereotypical function of the time
elapsed since the ith cell spiked, as given by (4.5-4.6), (4.7) or (4.8). Alternatively, if we wish to
include the dynamics of neurotransmitter release and represent dependence on presynaptic voltage
more accurately, we can use the form (4.3) in (4.2) and solve Eqns. (4.12) and (4.2) as a coupled
system along with the gating equations, as was done, e.g., in [93]).
Further models of synpatic dynamics appear in §4.2.3, but for extensive discussions and details
see Keener and Sneyd [150, Chap. 8] which covers every stage of synaptic transmission, from presy-
naptic calcium currents to neurotransmitter diffusion. Exploring, analyzing, and simulating these
models would be an interesting semester project. Johnston and Wu [146] also cover synapses in great
detail (they occupy about one-third of the book). Chapter 11 concerns presynaptic mechanisms,
Chapter 12 explores the role of calcium in transmitter release. Chapter 13 examines postsynap-
tic mechanisms, and Chapters 14 and 15 go into other synapse-related topics, with Chapter 15
investigating learning and memory.
We also note that, as mentioned in §3.6, the complex structure of dendritic trees and the
locations of synapses can influence how incoming EPSPs and IPSPs change the membrane voltage
in the postsynaptic cell’s soma. Not only do transmission delays occur and dendrite sizes affect
their conductances [209], but EPSPs arriving at nearby synapses interact to produce less excitation
than their sum predicts [211]. Nonlinear interactions due to “shunting” inhibition that changes
membrane conductance can also reduce excitatory currents [210]. See [176] for a review of these
and other dendritic computations, which can result in directionally-selective cells [210, 16] and
coincidence detection in auditory neurons [4]. The linear superposition assumed in Eqn. (4.9)
neglects all such details.
97
4.2 Integrate-and-fire models of neurons and synapses
Much of the complexity of the H-H equations for a single-compartment neuronal model and
reductions of them like the F-N model is due to the relatively detailed ionic channel (gating)
dynamics responsible for the spike dynamics; namely, the variables m, n and h in Eqn. (3.22) These
details are important in determining the shape and duration of the AP, but stereotypical APs
are sufficient for many purposes, especially when they are effectively filtered by slower synaptic
dynamics. Neglecting the details of spike generation allows us to significantly reduce the twenty-
plus H-H parameters (cf. Eqns. 3.24-3.23), and to obtain a more tractable class of systems that
can be more readily analyzed and simulated with less computer time and storage. These integrate-
and-fire (IF) models allow us to probe the dynamics of larger networks, and to develop a range
of models adapted to match different types of cortical neurons. In particular, we can fit specific
input/output functions or current-to-frequency (f − I) curves more easily than by changing the
gating dynamics, or adding further ionic currents. For example, initiation of spiking in typical
cortical neurons is not via the Hopf bifurcation exhibited by the H-H model (but see the modified
Connor-Stevens model [53, 54]).
IF models improve on the firing rate (connectionist) models that we encountered in §2, which
are too crude to allow the study of spike trains. In this “middle-ground” approach we simplify the
AP to a Dirac delta function, so that we need only model the refractory period during which the
membrane voltage v(t) builds toward threshold vth . Upon reaching vth at tk , a delta spike δ(t − tk )
is inserted and v(t) instantaneously reset at its minimum reference value vr . The membrane voltage
then begins to increase again, possibly after a brief absolute refractory period τref . Unlike firing
rate models, IF models generate spike trains and can thus reproduce a variety of spike-timing
phenomena. Furthermore, as will be shown later, they allow inclusion of biophysical cellular and
synaptic components of neural networks so that one may study their emergent behavior and effects
on network function. IF models are therefore widely used in systems neuroscience. They can also
be directly related to firing rate models, e.g. via approximation schemes such as the mean field
approach [216].
Historically, IF models were proposed by Louis Lapicque in 1907 [167] who studied the sciatic
nerve that excites leg muscles in frogs. Theoretical models were defined and analyzed decades later
(e.g., see [249, 153, 154] and [35]).
4.2.1 Integrate-and-fire models for a single cell
The simplest IF model is the perfect integrator of [89]. The subthreshold membrane potential
dynamics is described by the linear ODE
1 t
Z
C v̇ = Isyn + Iapp , for v ∈ [vr , vth ) ⇒ v(t) = vr + (Isyn + Iapp ) dt, (4.13)
C 0
where Isyn is the total synaptic input from other neurons, and Iapp is the intracellular injection
current. A delta function of appropriate weight wδ(t) is superposed on v and the reset rule applied
when v(t) reaches vth . The discontinuity (and effective nonlinearity) appears only when v = vth . We
98
discuss how to implement Isyn in §4.2.3. Fig. 4.1 shows the solution, when an absolute refractory
period τref is included and the input currents are constant in time and noise-free.
Action potential
Membrane potential, v
TISI
vth
vr
Absolute refractory period Time, t

Figure 4.1: Dynamics of a noiseless perfect integrate-and-fire spiking model. When the membrane
potential v reaches threshold vth , a delta spike is inserted is immediately reset. A brief absolute
refractory period is usually included. The firing rate is the inverse of the interspike interval f =
1/TISI .
Exercise 24. Assuming that input currents are noiseless and constant in time, compute the inter-
spike interval TISI (= spiking period) for the perfect IF neuron (4.13).
Lapicque’s [167, 35] work shows that the membrane potential is leaky, with a time constant
τ = RC which is independent of stimulation area, and associates the leak with a parallel R-C
circuit. In the 1960’s, Stein, Knight [249, 153] and others formally defined and analyzed such leaky
integrate-and-fire (LIF) models in which the subthreshold dynamics is described by
C v̇ = −gL (v − EL ) + Isyn + Iapp , for v ∈ [vr , vth ), (4.14)
The additional term in (4.14) includes the leak conductance, gL , and resting potential, EL , of
the cell (EL is the steady-state of v in the absence of input currents). Note that Eqn. (4.14) is
precisely Eqn. (3.22a) less the ionic gating terms for spiking dynamics. Ignoring the reset rule, all
solutions of (4.14) would approach an equilibrium vss = EL + (Isyn + Iapp )/gL . For Isyn = 0 and
Iapp > gL (vth − EL ), vss lies above vth and so the voltage will reach threshold and repetitive spiking
will result. If Iapp < gL (vth − EL ) the voltage will settle at a subthreshold value vss < vth , and
spikes cannot occur in the absence of noise. However, with the addition of noise, v(t) can cross
threshold with finite probability. Fig. 4.1 shows the solution of (4.14) in the noise-free case, with
vss > vth . In both of these examples analytical expressions for the interspike interval TISI (and
hence for the firing rate f = 1/TISI ), can be easily computed (see Exercises 24 and 25).
Exercise 25. Show that, for Isyn = 0 and Iapp > gL (vth − EL ), TISI for the LIF model (4.14) is
given by µ ¶
C vss − vr
TISI = τref + ln ,
gL vss − vth
where the steady-state vss = EL + Iapp /gL . How does the absolute refractory period τref affect the
shape of the f − I curve? Plot f vs. Iapp for fixed vr , vth , C, gL , EL and several τref values to
illustrate.
99
Action potential
Membrane potential, v
TISI
vss
vth
vr
Absolute refractory period

Time, t
Figure 4.2: Dynamics of a noiseless leaky integrate-and-fire spiking model. The steady-state vss
must exceed the threshold vth for spiking to occur. Trajectory of v(t) in the absence of the threshold
shown in red.
The LIF model has subthreshold dynamics that cause v(t) to decerelate as it approaches vss ,
and hence as it approaches the spike threshold vth . This differs qualitatively from the accelerating
pre-spike dynamics of more realistic neuronal models such as the H-H equations, cf. Figs. 3.6
and 3.10a. The shape of v(t) can be improved by incorporating nonlinearity in the model, which
becomes
C v̇ = F (v) + Isyn + Iapp . (4.15)
The simplest case is to make F (v) a quadratic function: F (v) ∼ v 2 [73, 168] or F (v) = k0 (v −
EL )(v − vc ) [169], for k0 > 0 and vth > vc > EL > vr . Suppose Isyn = 0. If Iapp = 0 and v < vc ,
v̇ < 0 and v(t) → EL , while if v > vc , then v̇ > 0 and v(t) will be automatically driven to threshold.
Thus the unstable fixed point vc acts as an effective threshold. When Iapp > 0 exceeds a critical
value, a saddle-node bifurcation occurs, v̇ > 0 for all v ∈ [vr , vth ] and periodic firing occurs. The
QIF model is related to the θ-neuron model and the Type I neuron normal form [73, 69, 108].
QIF models can be parameterized to produce spiking patterns of many different cortical neurons
[140, 141]. Realistic waveforms can also be constructed via exponential integrate-and-fire (EIF)
models [81, 80] where F (v) ∼ exp((v − vth )/∆th )) and ∆th is a quantity that sets the sharpness of
spike initiation. Fig. 4.3 compares f − I curves for several models.
Other IF models include the linear IF model [186, 85, 215] with a constant leak current, and the
spike response model [90, 91] (that accounts for spiking latency) which is closer to a H-H model.
Generalized IF models (GIF) [218, 34], including the QIF model with an additional recovery variable
[139, 141, 242], are used to describe subthreshold dynamics. See also the Abbott-Kepler model [3]
reducing from H-H model to a nonlinear IF model. See [39, 40] for model reviews and [140] for
comparisons among them. See e.g. [112, 28, 27] for details on numerical methods for simulations
of IF models.
4.2.2 Integrate-and-fire models with noisy inputs
Cortical neurons typically spike in an noisy and irregular manner. In reality, they (along with
most other) neurons are continually bombarded by synaptic inputs in the form of small post-
synaptic potentials (PSPs). Thus, Iapp is not constant but more realistically takes the form Iapp +
100
Figure 4.3: f − I curves for different models are compared to each other and the more realistic
Wang-Buzsaki (WB) model [273], which is modified version of the H-H model. (A) Constant current
inputs. (B) Noisy current inputs. Adapted from [81].
101
η(t), where Iapp is the mean input level, which may be excitatory (positive) or inhibitory (negative),
and η(t) is a random process. Rewriting in terms of the noise-free steady state vss = EL + Iapp /gL ,
Eqn. (4.14) becomes
1 η(t)
v̇ = (vss − v) + , for v ∈ [vr , vth ), (4.16)
τm C
where τm = C/gL is the time scale.
If we assume that the PSPs are much smaller than |vth − vreset |, η(t) can be modeled as additive
Gaussian noise and Eqn. (4.16) written as a Langevin equation:
dt σ
dv = (vss − v) + √ dW (t), (4.17)
τm τm
where dW (t) represents independent increments of a Weiner process or Brownian motion [88] (see
Definition 9 in §6.1.2). We shall develop some of the probability and stochastic process theory
necessary to better appreciate such processes in chapters 5 and 6.
Solutions of Eqn. (4.17) can be approximated by the Euler-Maruyama method [118], an extension
of the forward Euler method of §2.2.1 to stochastic differential equations. Upon each timestep of
length ∆t the voltage variable is updated as follows:
r
∆t ∆t
v(t + ∆t) = v(t) + [vss − v(t)] + σ N (0, 1). (4.18)
τm τm
Here the notation N (0, 1) indicates that independent samples are drawn from a Gaussian distribu-
tion with zero mean and unit variance. As in §2.2.1 the deterministic increment in Eqn. (4.18) is
proportional pto the nondimensional time step ∆t/τm , but the stochastic increment must be pro-
portional to ∆t/τm to ensure that each increment of the Weiner diffusion process is normally
distributed with mean zero and variance ∆t/τm . As ∆t → 0 successive iterates of (4.18) converge
to a solution of Eqn. 4.17, although notions of convergence for random variables are more compli-
cated than those discussed in §2.2, and must be formulated in terms of expectations. See [118] for
an introduction with examples and Matlab scripts, and §6.4 below, for (a little) more background
on stochastic differential equations. The Euler-Maruyama method is described in §6.4.4.
Exercise 26. Using Eqn. (4.18), investigate what happens when additive Gaussian noise is included
in Iapp ? How does the f-I curve change with increasing standard deviation σ?
How do we quantify irregularity in spike rates and interspike intervals? The Fano factor, defined
as the ratio F = (variance of ISI) / (mean of ISI) for a fixed time bin T , provides one measure of
variability (you will encounter this again in §5.1.2). Another quantity is the coefficient of variation,
Cv = (standard deviation of ISI) / (mean of ISI). For a homogenous Poisson process, F = 1,
independent of T . In a LIF model, we can create irregularity (large F ) by driving a neuron by
fluctuations, as in the next exercise.
Exercise 27. Use Eqn. (4.18) with fixed, large noise size σ to calculate and compare the Cv values
for inputs of a spike train that follows a Gaussian distribution when the excitatory mean input
current I1 is (i) subthreshold (vss < vth ), (ii) superthreshold (vss > vth ). Also, (iii), consider
combined excitatory (2I1 ) and inhibitory (−I1 ) inputs with net mean value I1 but independent
fluctuations both of standard deviation σ. Is the Cv in (iii) greater than that in (i)? Can you find
102
values of σ and input currents (I1 ) for which the Cv ’s are greater than 1? (There is experimental
evidence that networks can exhibit balances between excitation and inhibition; this is an ongoing
area of research, see, e.g., [236, 256, 260, 8, 237, 33].)
The lower panel of Fig. 4.3 shows how the f − I curves for different models change when noise
is added, and Figs. 4.4 and 4.5 show examples of f − I curves fitted to data from pyramidal and
other cortical cells.
4.2.3 Implementation of synaptic inputs to IF models
There are two major approaches to implementing synaptic inputs: current-based and conductance-
based. There are also two types of current inputs: excitatory and inhibitory, with reversal potentials
Ee and Ei , respectively. For IF models we require Ei < vreset < vth < Ee .
Current-based IF models. In current-based IF models, the inputs are individual spikes

generated by the presynaptic neuron(s), summed over the total number of presynaptic neurons or
synaptic contacts, and over the times at which spikes occur. This simplest representation of this is
to write the total input for the postsynaptic neuron i at time t as
X X
Isyn,i (t) = wi,j δ(t − tkj ), (4.19)
j k
where the index j labels the presynaptic neurons, wi,j is the synaptic weight from presynaptic
neuron j to postsynaptic
R +∞ neuron i, and δ denotes the Dirac delta function which contributes a
unit impulse ( ∞ δ(t)dt = 1) when neuron j spikes at times tkj . This implementation is a fairly
good approximation if the synaptic channels open and close very rapidly. A temporal delay from
presynaptic spikes due signal transduction along axons or dendrites can be incorporated if necessary.
The presynaptic spiking activity may include that from the local network or a stochastic background
from other brain areas. Although this formulation simplifies mathematical analyses, it a special
case, as we shall see below.
Conductance-based IF models Here the synaptic conductance varies as a function of time,

driven by the dynamics of auxiliary ODEs that are in turn driven by the presynaptic spike train. A
single non-NMDA-mediated synapse (e.g. one using AMPA or GABAA ) produces an input current
of the form
X
Isyn,i (t) = − gi,j Sj (v − Ej ), (4.20a)
j
Sj
S˙j = − + αxj (t), (4.20b)
τSj ,decay
xj X
x˙j = − + δ(t − tkj ), (4.20c)
τxj ,rise
k
where the synaptic gating variable S is the fraction (probability) of open channels, gi,j is the peak
synaptic conductance, Ej is the reversal potential for that particular ionic channel, and α is the
weight contributed by each presynaptic spike. Alternatively, the synapse dynamics can be described
by an alpha function.
103
Figure 4.4: Fits of an LIF model to spike rate data from a pyramidal cell that receives in-vivo-like
noisy input currents. Each panel uses different model parameters to fit data from four different
cells. Adapted from [215].
104
Figure 4.5: Neurons in different cortical layers I - VI can exhibit different morphologies and fuctional
properties. Top: Cells labeled A to D are pyramidal cells, E is a spiny stellate cell and F a double
bouquet cell. From [148]. Bottom: Diverse f −I curves for cells in different layers; L2/3 and L5 refer
to cortical layers (Roman numerals in top panel), FS denote fast-spiking inhibitory interneurons
and PYR denotes excitatory pyramidal neurons. From [165].
105
Shunting inhibition When Ei ≈ EL (e.g. with GABAA ), a phenomena called shunting
inhibition can arise [132, 155], whereby the f − I curve shifts laterally without change of shape
when the inhibitory current increases. Because v ∼ gI, early speculations proposed that increasing
g by shunting inhibition can be thought of as changing the gain modulation of input to cells (i.e.
the slope of the single-cell f − I curves). However, Holt and Koch [132] showed that this is not the
case. To see this, we add an inhibitory synaptic current to the LIF model:
v̇ = −gL (v − EL ) − gsyn (v − Ei ) + Iapp , (4.21)
and recompute the steady state

gL EL + gsyn Ei + Iapp
vss = . (4.22)
gL + gsyn
Defining the effective leak conductance gL,eff = gL + gsyn (> gL ), we find that (cf. Exercise 25)
µ ¶
C vss − vr
TISI = τref + ln . (4.23)
gL,eff vss − vth
For large Iapp vss ≫ vth , vr and we may use the approximation ln(1 + x) ≈ x to write
µ ¶
C vth − vr
TISI ≈ τref + , (4.24)
gL,eff vss − vth
and if τref is small enough to be ignored, we deduce that
gL,eff (EL − vth )

· ¸
Iapp
f = 1/TISI ≈ + . (4.25)
C(vth − vr ) C(vth − vr ) +
Clearly, increases in gL,eff only affect the first term on the right-hand-side of (4.25), which is
independent of Iapp . Hence, the f − I curve shifts laterally without change of slope (= gain).
Nonetheless, gain modulation can be achieved if synaptic noise [44] and/or dendritic properties
[204, 43] are included.
When modeling AMPA- and GABAA -mediated synapses with very short opening time constant
τxj ,rise , the auxiliary x variable is omitted, but this cannot be neglected for NMDA-mediated
synapses. Not only are the time constants longer (especially in prefrontal cortex [267]), but the
peak conductance also depends on the postsynaptic membrane potential v. (NMDA receptors are
blocked by Mg++ ions, and the postsynaptic neuron has to be depolarized to activate this receptor
[142, 143].) In this case the model takes the form
X
IN M DA,i (t) = − gi,j,N M DA Sj (v − Ej ), (4.26a)
j
Sj
S˙j = − + (1 − Sj )αxj (t), (4.26b)
τSj ,decay
xj X
x˙j = − + δ(t − tkj ), where (4.26c)
τxj ,rise
k
gN M DA
gi,j,N M DA = h i. (4.26d)
[M g ++ ]
1+ 3.57 exp(−v/16.13)
106
Note the nonlinearity (saturation) in S and voltage dependence in gN M DA . The saturation can be
appreciated if we assume regular presynaptic spiking ( k δ(t − tkj ) ∼ firing rate f ⇒ xj → x̄ =
P
τxj ,rise ). Then
τf
S̄ = , (4.27)
1 + τf
where τ = α τxj ,rise τSj ,decay .
( µA/cm 2 )
Figure 4.6: Illustration of bistable single-cell states for combined NMDA, GABAA and AMPA
currents via the current-potential or I − v plot. NMDA synaptic currents alone cannot support
bistability (black). When GABAA currents are included, there is a single stable state (where I
crosses zero; red). NMDA and AMPA currents also yield a single stable state (blue), but adding
both GABAA and AMPA currents can produce bistability (two stable states separated by an
unstable state; magenta). Replotted using the values of [172, Fig. 1b].
Bistable single neuron With appropriate parameters, a model neuron with NMDA-mediated
and additional currents and steady inputs can achieve an all-or-none activation pattern [172]. This
is the simplest model of a bistable neuron. In Fig. 4.6 we plot the nonlinear current voltage
relation given by (4.26a) and (4.26d) (with Sj constant) alone, and in combination with the linear
relationship of (4.20a) for AMPA or GABAA separately, and for all three currents together. The
magenta curve for the combined currents intersects Itotal = 0 three times, indicating the presence
of two stable states separated by an unstable state for the ODE
C v̇ = IN M DA (v) + IAM P A (v) + IGABAA (v). (4.28)
Bistable neurons allow networks to robustly perform functions such as sensory integration [42,
172, 160, 99, 197]. Other models include nonlinear synaptic or dendritic dynamics (e.g. due
to Ca++ , Ih -like, IADP and other currents) [173, 99, 196, 76, 175, 281]. The “up” and “down”
states of bistable neurons and their spontaneous switching between these states have been observed
experimentally in in vivo and in vitro neurons, see for example [250, 10, 244, 55, 175]. Other work
with multiple stable states include [68, 82, 281].
107
Figure 4.7: An example of spike frequency adaption in a pyramidal cell, showing membrane voltage
under a constant applied current of 400 ms duration. Adapted from [185].
Figure 4.8: Diversity of spike frequency adaption in pyramidal cells, depending on their cortical
layer. Adapted from [185].
108
Other ionic currents and gap junctions There are a wide variety of ionic currents which
we have not touched upon. They are often dependent on Ca2+ . A widely modeled synaptic cur-
rents include the widely-observed spike-rate or spike-frequency adaptation (afterhyperpolarization)
phenomenon [188, 185] due to a calcium-dependent potassium current, which can be modeled as
[254, 255, 269, 174]
Isyn,AHP (t) = gAHP [Ca2+ ](v − EK ), (4.29a)

[Ca++ ]
[Ca˙++ ] = αCa
X
δ(t − tk ) − . (4.29b)
τCa
k
As the calcium concentration [Ca++ ] increases due to an incoming spike train, the spike rate of the
postsynaptic neuron falls: see Figs. 4.7-4.8 and [277, §10.1].
Gap junction or direct electrotonic coupling, described in §4.1.1 can also combine with chemical
synapses to create bistable behaviors. See §4.2.4 for an example of two leaky integrate-and-fire
neurons coupled via gap junctions.
In addition to postsynaptic conductance dynamics, presynaptic release of neurotransmitters can

also be modeled. For example, it is known that presynaptic release of vesicles can exhibit short-
term history dependence or plasticity that characterize changes in synapse strength. Short-term
plasticity can be either facilitative or depressive [58].
Synaptic depression is due to depletion of neurotransmitter vesicles in the presynaptic cell.

Letting D(t) denote the fraction of available vesicles and appealing to the theory of kinetic processes
[268], the dynamics of D can be modeled as:
µ ¶X
1 (1 − D)
Ḋ = − ln δ(t − tk )D + (4.30a)
1 − pv τD
k
X (1 − D)
≈ −pv D δ(t − tk ) + , (4.30b)
τD
k
where pv is the release probability of each vesicle upon arrival of a spike, and τD (usually O(100)
msec) sets the time scale for recharging vesicles after fusion and release. To incorporate this effect
into the conductance models we simply multiply the factor αx (amount of neurotransmitter release
per spike) in Eqns. (4.20) and (4.26) by D. Synaptic facilitation, which is due to spike-triggered
calcium processes, can be modeled with dynamics similar to those for depression [268]:
µ ¶
1 X F
Ḟ = ln δ(t − tk )(1 − F ) + , (4.31a)
αF τF
k
where τF ≈ 100 msec.
4.2.4 A pair of coupled integrate-and-fire neurons
Here we consider a pair of integrate-and-fire neurons coupled by both gap junctions and in-
hibitory synapses, with the effects of postsynaptic currents represented by delta functions. This
109
results in simple linear ODEs that can be solved explicitly between spikes, allowing calculation of
a Poincaré map. This example, which is analyzed in [171, 87], illustrates how a (fairly) simple
one-dimensional map can be (almost) explicitly calculated, and how its structure reveals different
dynamical regimes.
Writing the synaptic current in neuron i due to spiking in neuron j at t = tkj as Isyn = ḡsyn (vi −
vsyn ) k δ(t − tkj ) and using (4.1) in (4.14) results in the following system of two ODEs:
P
post P
C1 v̇1 = −gL (v1 − vL ) + I1,ext + ḡgap [(v2 − v1 ) + ∆ k δ(t − tk2 )] − ḡsyn (v1 − Esyn ) k δ(t − tk2 ) ,
P
post P
C2 v̇2 = −gL (v2 − vL ) + I2,ext + ḡgap [(v1 − v2 ) + ∆ k δ(t − tk1 )] − ḡsyn (v2 − Esyn ) k δ(t − tk1 ) ,
P
(4.32)
where the delta function terms account for current flow due to potential differences across the
gap junction during the (instantaneous) spikes, as well as the postsynaptic current. In (4.32) the
(vi − vj ) components are called subthreshold and the delta function components, superthreshold.
The parameter ∆ quantifies the “weight” that the spike contributes to the voltage differences across
the gap junction.
For simplicity, suppose that the cells are identical (C1 = C2 = C) and subject to the same
external current I1,ext = I2,ext = Iext . In that case we can rescale time by gL t/C 7→ t and voltage
by (v − v0 )/(vth − v0 ) 7→ v to map the range between reset v0 and threshold vth onto the unit
interval [0, 1). This gives:
dv1 k k
£ P ¤ P
dt = I − v1 + α £(v2 − v1 ) + ∆ Pk δ(t − t2 )¤ − β(v1 − vsyn ) Pk δ(t − t2 ), (4.33)
dv2 k k
dt = I − v2 + α (v1 − v2 ) + ∆ k δ(t − t1 ) − β(v2 − vsyn ) k δ(t − t1 ).
post
where α, β, I and vsyn are rescaled analogs of ḡgap , ḡsyn , Iext and Esyn .
Exercise 28. Work out the details of the rescaling transformations noted above and find expressions
for α, β, I and vsyn in terms of the original parameters.
In the absence of spiking (or if ∆ = β = 0), and with initial conditions v1 (0) = v̄1 , v2 (0) = v̄2 ,
Eqns. (4.33) may be solved to give
v1 (t) = I(1 − e−t ) + ¡ v̄1 +v̄
¡ ¢ −t ¡ v̄ −v̄ ¢ −at
2 ¢e + ¡ 12 2¢e ,
2
v̄1 +v̄2 (4.34)

−t
v2 (t) = I(1 − e ) + 2 e − v̄1 −v̄
−t
2
2
e−at ,
where a = 1 + 2α. Hence v1 − v2 = (v̄1 − v̄2 )e−at : under the influence of subthreshold coupling
(or indeed just leakage alone), the membrane voltages approach one another. However, spikes
and resets to zero intervene whenever v1 or v2 reach the rescaled threshold value 1, so that we
have a hybrid dynamical system rather than the smooth ODEs that we have encountered so far.
To construct long time solutions of (4.33) we must assemble segments of solutions punctuated by
jumps at appropriate points. This suggests that we derive a discrete mapping somewhat different
from the one introduced in §2.3.4.
Assuming that cell 2 has just fired and the system is at (v1 , v2 ) = (v̄1 , 0), we can compute a
Poincaré map from this post-spike state to the next post-spike state immediately after cell 1 fires:
indeed, from the first of Eqns. (4.34), the interspike interval (ISI) T = T (v̄1 ) is determined by:
v̄1 ¡ −T
v1 (T ) = I(1 − e−T ) + e + e−aT = 1,
¢
(4.35)
2
110
and from the second equation the state of cell 2 when cell 1 fires is
v̄1 ¡ −T
v2 (T ) = I(1 − e−T ) + e − e−aT = 1 − v̄1 e−aT .
¢
(4.36)
2
Superthreshold electrical coupling adds a further amount α∆ to v2 (T ), and synaptic coupling
subtracts β(v2 (T ) − vsyn ). Similar remarks apply in the case that cell 1 fires followed by cell 2.
Hence, letting v denote the voltage of the cell that has not just fired immediately after the other
one has fired, we obtain the spike-to-spike Poincaré map and its derivative:
v 7→ f (v) = (1 − β) 1 − ve−aT + d(α, β, ∆, vsyn ) ,

¡ ¢
(4.37)
∂f (v)
= f ′ (v) = −(1 − β)e−aT (1 − avT ′ ) , (4.38)
∂v
where T = T (v) is determined by (4.35) and we have collected the synaptic and electrical “spike”
terms in the parameter d = (1 − β)α∆ + βvsyn which superimposes a vertical post-spike jump that
shifts f upward if superthreshold electrical coupling dominates synaptic inhibition and downward
otherwise. Fig. 4.9 shows three qualitatively different forms that the Poincaré map can exhibit as
parameter values change.
1 1
0.8 0.8
0.6 0.6
f(v)
f(v)
0.4 0.4
0.2 0.2
0 0
0 Vm 0.5 1 0 Vm 0.5 1
v v
1
0.8
0.6
f(v)
0.4
0.2
0
0 0.5 1
v
Figure 4.9: Three distinct forms for the Poincaré map f (v) with coexisting synchronous and asyn-
chronous states, showing disjoint domains of attraction (0 < vm < f (1− ): top left); interleaved
domains of attraction (0 < f (1− ) < vm : top right), and an isolated synchronous state (vm = 0:
bottom). Parameter values are (a) I = 1.15, β = 0, ∆ = 0.05; (b) I = 1.19, β = 0, ∆ = 0.1; (c)
I = 1.15, β = 0.1, ∆ = 0.05, vsyn = −0.5 and α = 0.2 throughout.
In interpreting these results v = 1 and v = 0 should be identified, since on spiking v resets

instantaneously to 0. Moreover, since the spike form is compressed into a delta function, we must
decide how to deal with the cases that a positive impulse carries v above threshold (v = 1) or a
negative one takes it below reset (v = 0). In the former we assume that a spike occurs immediately:
thus, if cell 1 fires when cell 2 is at v2 (T ) and (1 − β)v2 (T ) + d ≥ 1, cell 2 fires immediately and
thereafter 1 and 2 fire in perfect synchrony: this is called spike-capture synchrony (cf. [171]), and it
111
appears in the Poincaré map as a flat (horizontal) segment over the subinterval [0, vm ]. In the latter
case we allow v2 to drop below 0, so that v1 fires successive spikes and v2 ’s spikes are suppressed.
In all three cases shown in Fig. 4.9 there are coexisting stable fixed points at v = 0 (= 1) and at
v e in the interior of the interval (0, 1), but the nature of the separatrix dividing their domains of
attraction differs. Depending upon the size and sign of the post-spike jump d in (4.37), the values
of vm and the limit f (1− ) of the map may take any of the three qualitatively distinct forms. In the
first two there is a (small) domain [0, vm ] in which spike-capture synchrony occurs. In the first case
the interval [0, vm ] is not accessible from the invariant domain (vm , 1) and all solutions starting in
(vm , 1) limit either on the asynchronous fixed point v e or on an orbit of period 2. In the second
case, some solutions starting in (vm , 1) can escape to [0, vm ], resulting in spike capture. These two
cases are both shown without synaptic coupling (β = 0), but they can also occur for sufficiently
small β and α > 0. The third case occurs only with inhibitory synapses present (β > 0): here
the synchronous state v = 0 is completely isolated and a positively-invariant domain surrounds v e ,
and if (1 − β)(1 + α∆) + βvsyn < 0 (synaptic coupling is sufficiently strong relative to electrical
coupling), spike suppression occurs.
1 1 1
0.8
0.8
I=1.2 0.8
I=1.4 I=2
f(v) / f2(v)
f(v) / f2(v)
f(v) / f2(v)
0.6 0.6 0.6
0.4 0.4 0.4
0.2 0.2 0.2
0 0 0
0 0.5 1
0
v 0.5 1 0 v 0.5 1 v
1.5
1
Vm
f (1)
0.5
|f ’ (Ve)|
0
1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2
I
1
0.8
0.6
α=0.4, ∆=0.4, β=0.05
v
0.4
vsyn= − 1, b=1.
0.2
Figure 4.10: Poincaré

0
1
maps
1.1
and
1.2
a bifurcation
1.3 1.4
diagram
1.5
for
1.6
a pair
1.7
of cells
1.8
with
1.9
gap 2junction coupling
and delta function synapses. The top panels showI f (v) (bold) and f 2 (v) (dashed) for bias currents
I = 1.2, 1.4 and 2. The bottom panel shows branches of asynchronous fixed points v e and unstable
period 2 orbits arising in a subcritical period-doubling bifurcation and disappearing at the discon-
tinuity in f when vm = f (1− ). A stable synchronous state (v = 0 = 1) exists throughout and is
globally stable for 1 ≤ I ≤ 1.1. Parameter values: α = 0.4, ∆ = 0.4, β = 0.05, vsyn = −1.
As the applied current I increases the asynchronous fixed point v e can lose stability in a period-
doubling bifurcation [121, 107] in which unstable period 2 orbits shrink down on the branch of fixed
points (a subcritical bifurcation). Laborious perturbation calculations for weak coupling (small
112
α, β) confirm the stability properties of v e illustrated in the bifurcation diagram of Fig. 4.10. The
period 2 orbits and, for low bias current I, the discontinuity at f (1− ), separate the domains of
attraction of the synchronous and asynchronous states.
In §6.2 we describe a large network model for decision making, composed of 1400 excitatory
and 400 inhibitory integrate-and-fire cells, modeling a microcircuit within a cortical column, and
describe how it can be reduced to a leaky accumulator model similar to that of Example 7.
4.2.5 A final note on neuronal modeling
Figure 4.11: Diversity of spiking behavior in neurons. From [140].
We end by noting that parameters such as C, τm , gsyn , gL , EL , etc. can be measured or
113
estimated from in vitro experimental studies, and are thus not “free” parameters. However, these
and other parameters can vary widely depending on cell type and locations, as illustrated above in
Figs. 4.4, 4.5 and 4.8. Fig. 4.11 summarises the rich variety of spiking and bursting patterns that
have been observed. Thus, care must be taken in choosing parameters, especially when comparing
with experimental data.
Figure 4.12: General classes of neuronal models for single neurons, with level of detail and com-
plexity decreasing from top to bottom. From [117].
In chapter 3 and the present section we have developed a range of models, from the relative
complexity of H-H and related ionic current descriptions, with detailed spike dynamics, to IF models
with their stereotypical delta function spikes. Fig. 4.12 summarises the range of spatial scales and
complexities. In practice, synaptic currents and model types should be selected from the range
of available neuronal models based upon the types of questions to be addressed [117]. Analytical
114
Figure 4.13: Comparison of computational costs for simulating different single compartment mod-
els. Features preserved or neglected by each model are indicated, and the last column shows the
approximate number of floating point operations required to simulate the model for 1 ms. Some
models (e.g. integrate-and-fire-or-burst, resonate-and-fire) are not included. From [140].
115
tractability, and the ability to extract understanding of how parameters influence behavor through
bifurcations, is one consideration. A second is computational tractability (Fig 4.13) [140].
4.3 Phase reductions and phase oscillator models of neurons
We have already noted that the presence of multiple ionic currents, each necessitating one or
more gating variables, makes models of the form (3.22) analytically intractable, but that time scale
separation allows one to reduce the dimension of state space, sometimes to two dimensions, so that
phase plane methods can be used. In addition to reductions based on equilibrating fast gating
variables and eliminating correlated ones (§3.4), and the integrate-and-fire simplifications of §4.2
that replace the spike dynamics by delta functions and stereotypical postsynaptic potentials, there
is a third class of reduced models, based on the notion of phase. In these phase oscillators or
rotators, a single “clock-face” variable that tracks each cell’s progress toward spiking replaces the
membrane voltage and all the gating variables. Moreover, the theory applies to any ODE that has
a stable hyperbolic limit cycle, and so it can also be applied to networks of neurons that collectively
produce a periodic oscillation.
4.3.1 Phase response or resetting curves
The ability to reduce multidimensional dynamics to a single phase variable is based on state
space topology. Spontaneously spiking or bursting neuron models typically possess hyperbolic
(exponentially) attracting limit cycles [121, 107]. In Euclidean space Rn ODEs must have at
least two state variables (n ≥ 2) to possess limit cycles (§2.3.4), and so it might seem that the
two-dimensional reductions of §3.4 are the best we can do. However, circular and toroidal phase
spaces – manifolds – naturally allow orbits to return near their starting points. A limit cycle
is, topologically, a circle: it can be deformed into a geometrical circle by a nonlinear change
of coordinates, no matter what ambient dimension it inhabits. Phase reduction uses this fact.
The method was originally developed by Malkin [179, 180], and independently, with biological
applications in mind, by Winfree [280]; also see [74, 69] and, for an elementary introduction, [95,
Chap. 6]. For extensive treatments in the context of coupled oscillators in neuroscience, see the
books [135, Chap. 9] and [75, Chap. 8].
We describe the theory in general terms, with a focus on models of single cells. Consider a
system of the form
ẋ = f (x) + ǫg(x, . . .) ; x ∈ Rn , 0 ≤ ǫ ≪ 1 , (4.39)
where g(x, . . .) represents external inputs (e.g. (4.39) might be equations for voltage and ionic
gating variables of H-H type (3.34)). For ǫ = 0, suppose that (4.39) possesses a stable hyperbolic
limit cycle Γ0 of period T0 and let x0 (t) denote a solution lying in Γ0 . Invariant manifold theory [120,
107] guarantees that, in a neighborhood U of Γ0 , the n-dimensional state space splits into a phase
variable φ ∈ [0, 2π) along the closed curve Γ0 and a smooth foliation of transverse isochrons [106].
Each isochron is an (n − 1)-dimensional manifold Mφ with the property that any two solutions
starting on the same leaf Mφi are mapped by the flow to another leaf Mφj and hence approach
Γ0 with equal asymptotic phases as t → ∞ (see Figs. 4.14 and 4.15). It follows that, for points
116
x ∈ U , phase is defined by a smooth function φ(x) and the leaves Mφ ⊂ U are labeled by the
inverse function x(φ). Moreover, this structure persists for small ǫ > 0; in particular Γ0 perturbs
to a nearby limit cycle Γǫ .
The phase coordinate φ(x) is chosen so that progress around the limit cycle occurs at constant
speed when ǫ = 0:
∂φ(x(t)) 2π def
φ̇(x(t)) |x∈Γ0 = · f (x(t)) |x∈Γ0 = = ω0 . (4.40)
∂x T0
Applying the chain rule, using Eqns. (4.39) and (4.40) and assuming that ǫ ≪ 1, we obtain the
scalar phase equation:
∂φ(x) ∂φ
φ̇ = · ẋ = ω0 + ǫ · g(x0 (φ), . . .) |Γ0 (φ) +O(ǫ2 ) , (4.41)
∂x ∂x
where we have dropped explicit reference to t. The assumption that coupling and external influences
are weak (ǫ ≪ 1) allows us to approximate their effects by evaluating g(x, . . .) along Γ0 .
limit cycle Γ0
∆φ
∗ ∆ xi
level sets of φ (x)
Figure 4.14: The direct method for computing the PRC, showing the geometry of isochrons, the
effect of the perturbation at x∗ that results in a jump to a new isochron, recovery to the limit cycle,
and the resulting phase shift. Adapted from [31].
For single cell models in which inputs and coupling enter only via the voltage equation (e.g.,
∂φ ∂φ ∂x def ∂φ
Eqns. (3.22a) and (3.34a)), ∂V = ∂x · ∂V = z(φ) is the only nonzero component in the vector ∂x .
(For general systems of the form (4.39) inputs may enter on any component of g(x), and the entire
∂φ
phase response vector ∂x may be required.) The function z(φ) is called the phase resetting or phase
response curve (PRC), and it describes the sensitivity of the system to inputs as a function of phase
on the cycle. More specifically, for neurons it describes how impulsive (spike-like) perturbations
advance or retard the next spike as a function of the time (or phase) during the cycle at which they
are applied.
The PRC may be calculated directly using the finite-difference approximation to the derivative:
φ(x∗ + (∆V, 0)T ) − φ(x∗ )
· ¸
∂φ
z(φ) = = lim , (4.42)
∂V ∆V →0 ∆V
where the numerator ∆φ = [φ(x∗ + (∆V, 0)T ) − φ(x∗ )] describes the change in phase due to a finite
impulsive (delta function) perturbation V → V + ∆V applied at a point x∗ ∈ Γ0 : see Fig. 4.14.
Numerical simulations of solutions of the unperturbed (ǫ = 0) ODE (4.39) starting at a set of well-
chosen points x∗ and perturbed from them by (small) ∆V yield acceptable results. Asymptotic
117
approximations of PRCs may also be obtained near local and global bifurcations at which periodic
spiking begins, using explicit solutions of ODEs written in normal form coordinates: we give an
example at the end of this subsection. A recent collection of articles on PRCs and phase reduction
appears in [234], and [205] describes an explicit case in which the weak coupling assumption appears
to be justified.
The limit ∆V → 0 in Eqn. (4.42) identifies our definition as that of the infinitesimal PRC,
sometimes called the iPRC [194]. Finite amplitude and temporally distributed inputs are also used,
especially in experimental studies in which it is difficult or impossible to introduce impulses[280, 95].
For example, in [83, 56] externally-applied and magnetically-perturbed leg movements are used to
estimate “effective” PRCs for oscillatory circults driving cockroach legs.
PRCs can also be computed from an adjoint formulation, as is done numerically in the software
package XPP [70]. This is also based on linearization around the limit cycle Γ0 . Letting x0 (t)
represent a solution of the unperturbed ODE on Γ0 and ∆x(t) an arbitrary small perturbation to
it, substitution of x = x0 + ∆x into the ODE yields
d
∆x = Df (x0 (t))∆x + O(|∆x|2 ) . (4.43)
dt
This is a linear system of ODEs with a time-dependent (periodic) Jacobian matrix (cf. linearization
at a fixed point: Eqn. (2.4)). As in Eqn. (4.42) but allowing the more general vector-valued
perturbation ∆x, the resulting phase shift may be approximated as
∂φ
∆φ = · ∆x + O(|∆x|2 ) , (4.44)
∂x
where u · v denotes the inner product in Rn . The effects of the instantaneous perturbation ∆x
depend only on the place x∗ ∈ Γ0 at which it is applied, so that ∆φ is independent of time. Hence
we may differentiate Eqn. (4.44) with respect to time to yield
µ ¶ µ ¶
d ∂φ ∂φ d
0= · ∆x + · ∆x + O(|∆x|2 ) , (4.45)
dt ∂x ∂x dt
which implies that
µ ¶
d ∂φ ∂φ ∂φ
· ∆x ≈ − · Df (x0 (t))∆x = −Df T (x0 (t)) · ∆x , (4.46)
dt ∂x ∂x ∂x
where we use (4.43) and in the final step appeal to the definition of the adjoint operator (the
transposed matrix) in switching the order in which Df appears. Since ∆x is arbitrary, and ignoring
the small O(|∆x|2 ) terms as |∆x| → 0, (4.46) implies that
µ ¶
d ∂φ ∂φ
= −Df T (x0 (t)) . (4.47)
dt ∂x ∂x
We have obtained a linear ODE (the adjoint of Eqn. (4.43)) whose T0 -periodic solution is the phase
response vector! To obtain a unique solution, the following normalization condition is usually
applied:
∂φ dx0 ∂φ
· = · f (x) = ω0 , (4.48)
∂x dt ∂x
in agreement with Eqn. (4.40). See [135, Chap. 9] and [75, Chap. 8] for more information.
118
In addition to the “direct” finite-difference method (4.42) for estimating a PRC numerically,
it is also possible to approximate PRCs by applying a suitable random perturbation to an ODE
possessing an attracting hyperbolic limit cycle, or, indeed, to a physical or biological system that
exhibits autonomous periodic behavior. Ota et al. [198] show that the PRC z(φ) of the phase
equation
φ̇ = ω0 + ǫz(φ)I(t) , (4.49)
the case of Eqn. (4.41) in which the input I(t) is a zero-mean stationary random process, is related
to the weighted spike-triggered average input (WSTA) by the equation
ǫ2 z(φ)
WSTA(t̃) = + O(ǫ3 ) . (4.50)
2π
Here the function WSTA(t̃) is the sum of segments Ij (t) of the input preceding each spike, weighted
in the following manner:
N N
1 X T − τj 1 X
WSTA(t̃) = ∆j Ij (t̃j ) where ∆j = , T = , (4.51)
N τj N
j=1 j=1
and t̃j = T t/τj is the linearly rescaled “time” for the input segment of length τj between the
jth and (j + 1)st spikes. Because I(t) has zero mean, the average period (inter-spike interval) T
approaches the unperturbed system’s period as N → ∞. The Weiner processes of Eqns. (4.17-4.18)
are examples of suitable random inputs (see Definition 9 in §6.1.2), as is the low pass filtered version
as used in [198].
Spike triggered averages (STAs) themselves are used more generally to characterize features of
input signals that produce individual spikes in sensory neurons, as described in §5.1.3 below. Ota
et al. note that the relationship betweeen PRCs and STAs was pointed out previously in [71].
Given the PRC z(φ), the phase reduced system (4.41) can be written as
φ̇ = ω0 + ǫz(φ)h(φ, . . .) + O(ǫ2 ) , (4.52)
where (. . .) in the argument of the input or coupling function h depends on how the cell is externally
or synaptically perturbed. Since the cell itself is now described only through a single phase variable
φ, and its phase φ = ω0 t moves around the circle at constant speed in the uncoupled limit ǫ = 0,
this ODE is sometimes called a rotator.
Example 10. Fig. 4.15 shows an example of isochrons and PRCs computed for a two-dimensional
reduction due to Rose and Hindmarsh [225] of a multi-channel neuron model of Connor et al. [52]:
C v̇ = [I b − gNa m∞ (v)3 (−3(w − Bb∞ (v)) + 0.85)(v − ENa )

−gK w(v − EK ) − gL (v − EL ) + I ext ] (4.53)
ẇ = (w∞ (v) − w)/τw (v) ,
where the functions m∞ (v), b∞ (v), w∞ (v) and τw (v) are of the sigmoidal forms of (3.22-3.26). Since
the gating variables have been reduced to a single scalar w by use of the timescale separation methods
of §3.4, the isochrons are one-dimensional arcs. Note that these arcs, equally-spaced in time, are
bunched in the refractory region in which the nullclines almost coincide and flow is very slow. In
fact as the bias current I b is reduced, the v̇ = 0 nullcine moves downward and touches the ẇ = 0
119
Figure 4.15: (a) The state space structure for a repetitively spiking Rose-Hindmarsh model, showing
attracting limit cycle and isochrons. The dashed and dash-dotted lines are nullclines for v̇ = 0 and
ẇ = 0, respectively, and squares show points on the perturbed limit cycle, equally spaced in time,
under a small constant input current Iext . (b) PRCs for the Rose-Hindmarsh model; the asymptotic
form z(φ) ∼ [1−cos φ] is shown solid, and numerical computations near the saddle node bifurcation
on the limit cycle yield the dashed result. From [32].
nullcline creating a saddle-node bifurcation on the closed orbit of (4.53) (see [107, Figs 1.6.3 and
2.1.3]). Thereafter the nullclines intersect transversely at a sink and a saddle. The sink corresponds
to a quiescent, hyperpolarized state, and the saddle and its stable manifold provide the threshold for
spiking. Saddle-nodes occuring on periodic orbits are sometimes referred to as SNIPERs or SNICs
(saddle-nodes in cycles): rather unpleasant acronyms.
The right hand panel of Fig. 4.15 shows the results of a direct numerical computation using (4.42)
(dashed curve) and the analytical approximation derived from the normal form for a saddle-node
bifurcation: z(φ) ∼ [1 − cos φ] (solid curve).
An explicit approximation for the PRC in this example may be derived from the normal form
of the saddle-node bifurcation (Eqn. (2.55) in §2.3.3), which we write in a slightly different form to
indicate that we are interested in the situation just after the fixed points have disappeared, i.e.
ẋ = µ + x2 (4.54)
√ √
for small µ > 0. This separable ODE may be integrated to give x(t) = µ tan( µ(t − t0 )):
√
a solution that blows up to +∞ as the elapsed time t − t0 → π/2 µ and comes from −∞ as
√
t − t0 → −π/2 µ. Identifying the singularity with the “time” at which the cell fires2 , this gives a
√ √
period T0 = π/ µ and frequency ω0 = 2 µ. The time derivative is therefore
µ
ẋ = √ . (4.55)
cos2 ( µ(t − t0 ))
√
Adopting the convention that the spike occurs at φ = ωt = 0, we have µ(0 − t0 ) = −π/2 ⇒ t0 =
2
A tricky move! We’ll try to justify this by drawing pictures in class.
120
√
π/2 µ, so that the solution and its time derivative may be rewritten as:
µ µ ω02 ω02
ẋ = √ = √ = √ = . (4.56)
cos2 ( µt − π/2) sin2 ( µt) 2[1 − cos(2 µt)] 2[1 − cos φ]
√
Here we replace the parameter µ by ω0 using ω0 = 2 µ and also use the fact that the uncoupled
rotator (4.52) is solved by φ = ω0 t. Finally we compute, as above but with only scalar functions:
dφ dφ dt ω0 2
z(φ) = = = = (1 − cos φ). (4.57)
dx dt dx ẋ ω0
This form was originally derived in [69].
In this case positive voltage perturbations can only advance the phase and hasten the next spike.
For other limit cycles, phase can be advanced or retarded by positive perturbations; for example,
the PRC for the small limit cycle near a Hopf bifurcation is approximately sinusoidal [72, 31].
Exercise 29. Derive the PRC for the following limit cycle oscillator analytically:
ẋ = x − ωy − (x2 + y 2 )x − β(x2 + y 2 )y,
(4.58)
ẏ = ωx + y − (x2 + y 2 )y + β(x2 + y 2 )x,
assuming that perturbations of the form (x∗ , y ∗ ) 7→ (x∗ + ∆x, y ∗ ) are applied (i.e., in the x direction
only). (This is a case of the normal form for the Hopf bifurcation, with µ = 1 and α = −1; cf.
Example 8 in §2.3.3.) Use polar coordinates x = r cos θ, y = r sin θ to obtain explicit solutions of
Eqn. (4.58) and to reveal the dynamics of the phase variable φ = θ. You should first set β = 0
and show that θ̇ = ω = const. What are the isochrons in this³ case, ánd how do they change
when β 6= 0? Finally, compute the full phase response vector ∂φ ∂φ
∂x , ∂y that allows for general
perturbations (x∗ , y ∗ ) 7→ (x∗ +∆x, y ∗ +∆y), and show that it satisfies the linearised adjoint equation
(4.47) for (4.58) and the normalization condition (4.48).
4.3.2 Weakly coupled phase oscillators and averaging theory
Suppose that we have computed the phase reduction for a repetitively spiking neuron and wish
to study how two such identical cells interact when synaptically or electrically coupled. We now
have a pair of equations like (4.52):
def
φ˙1 = ω0 + ǫ[δ1 + z(φ1 )h1 (φ1 , φ2 )] = ω0 + ǫH1 (φ1 , φ2 ) , (4.59a)
def
φ˙2 = ω0 + ǫ[δ2 + z(φ2 )h2 (φ2 , φ1 )] = ω0 + ǫH2 (φ2 , φ1 ) ; (4.59b)
in writing this we allow small frequency differences (detuning) ǫδj and we neglect the yet smaller
O(ǫ2 ) terms. The state space of this system is the product of two circles: a two-dimensional torus
around which solutions wind with approximately the same speed in each direction (φ1 = ω0 t+O(ǫ)).
To proceed further we change variables to remove this common frequency by defining slow phase
variables ψi = φi − ω0 t, so that (4.59) becomes:
ψ˙1 = ǫH1 (ψ1 + ω0 t, ψ2 + ω0 t) , (4.60a)
ψ˙2 = ǫH2 (ψ2 + ω0 t, ψ1 + ω0 t) . (4.60b)
Equations (4.60) are now in a form to which we can apply the averaging theorem [107, §§4.1-2],
which may be somewhat informally stated as follows:
121
Theorem 4. Consider the following ODEs
ẋ = ǫf (x, t) , where f (x, t) = f (x, t + T ) is T -periodic in t, and (4.61)

1 T
Z
ẏ = ǫf (y) , where f (y) = f (y, t) dt is the average of f (x, t). (4.62)
T 0
Then solutions x(t) of (4.61) and y(t) of (4.62), if started within O(ǫ), remain within O(ǫ) for
times of O(1/ǫ). Moreover, hyperbolic fixed points of (4.62) correspond to hyperbolic T -periodic
orbits of (4.61) with the same stability types.
In computing the integral of (4.62) one treats the state variable vector y as constant, and only
averages the explicitly t-dependent terms in the vectorfield f (y, t). For more details and a proof,
see [107, §4.1].
Recalling that the common period of the uncoupled oscillators is T0 , the averages of the terms
on the RHS of (4.60) are
T0
1
Z
Hi (ψi , ψj ) = Hi (ψi + ω0 t, ψj + ω0 t) dt. (4.63)
T0 0
Changing variables by setting τ = ψj + ω0 t, so that dt = ωdτ0 = T02πdτ , and using the fact that the
Hi are 2π-periodic, Eqn. (4.63) is revealed as a convolution integral:
2π
1
Z
def
Hi (ψi , ψj ) = Hi (ψi − ψj + τ, τ ) dτ = Hi (ψi − ψj ) . (4.64)
2π 0
We find that the averaged functions Hi (ψi − ψj ) depend only on the difference between the slow
phases and that these functions are also 2π-periodic.
4.3.3 Phase models and half-center oscillators
For mutually-symmetric coupling between two identical cells we have h2 (φ1 , φ2 ) = h1 (φ2 , φ1 ) in
the original functions of (4.59). Equations (4.59) are symmetric under permutation of φ1 and φ2 :
def
what cell 1 does to cell 2, cell 2 does to cell 1. This implies that H2 (ψ1 − ψ2 ) = H1 (ψ2 − ψ1 ) =
H(ψ2 − ψ1 ) after averaging, and that the reduced phase equations are also permutation symmetric:
ψ̇1 = ǫH(ψ1 − ψ2 ) and ψ̇2 = ǫH(ψ2 − ψ1 ) . (4.65)
We may subtract these to further reduce to a single scalar ODE for the phase difference θ = ψ1 −ψ2 :
def
θ̇ = ǫ[H(θ) − H(−θ)] = ǫG(θ) . (4.66)
Since the function H is 2π-periodic, we have G(π) = H(π) − H(−π) = H(π) − H(π) = 0 and
G(0) = H(0) − H(0) = 0, implying that, regardless of the precise form of H, in-phase and anti-
phase solutions always exist. Moreover, because G(−θ) = [H(−θ) − H(θ)] = −G(θ), G is an odd
function and its derivative G′ (θ) is even: see Fig. 4.17 below. Additional fixed points can also exist,
122
depending on the details of H. In general, if there is a phase difference θe such that G(θe ) = 0, we
say that θ = θe is a phase-locked or synchronous solution.
The stability of phase-locked solutions is determined by the eigenvalues of the 2 × 2 matrix

obtained by linearizing (4.65) at ψ1 − ψ2 = θe :
′ ′
" #
H(θe ) −H(θe )
ǫ ′ ′ ; (4.67)
−H(−θe ) H(−θe )
′ ′ ′ ′
these are 0 and ǫ[H(θe ) + H(−θe ) ] = ǫ G′ (θe ), with eigenvectors (1, 1)T and (H(θe ) , −H(−θe ) )T
respectively. (For θe = 0 and π the second of these is (1, −1)T .) Hence the dynamics is only
Liapunov (neutrally) stable to perturbations that advance or retard the phases of both units equally,
′ ′
but if H(0) < 0 (resp. H(π) < 0) then the in-phase (resp. anti-phase) solution is asymptotically
stable to perturbations that disrupt the relative phase ψ1 − ψ2 .
Note that θ is also the difference between the original phase variables: θ = φ1 − φ2 . In fact,
transforming back to the (φ1 , φ2 ) variables, Eqns. (4.65) become
φ̇1 = ω0 + ǫH(ψ1 − ψ2 ) and φ̇2 = ω0 + ǫH(ψ2 − ψ1 ) . (4.68)
Thus, if (4.66) has a stable fixed point at θ = θe , then there is a periodic solution of the original
system in which the phases approximately maintain the difference θe , implying that spikes alternate
with a regular spacing. If θe = 0, we say that spikes are synchronized ; if θe = π, they are
antisynchronized. Terms such as frequency locking are also used to describe solutions in which the
phase difference θ = φ1 − φ2 remains constant. See [277, §12.1] for additional discussions.
Phase reduction and averaging have simplified a system with at least two voltage variables,
associated gating variables, and possibly additional synaptic variables, to a one-dimensional system
on the phase circle.
Exercise 30. Consider the following (averaged) symmetrically-coupled system:
ψ˙1 = ω1 − α sin(ψ1 − ψ2 ), ψ˙2 = ω2 − α sin(ψ2 − ψ1 ).
Find conditions on the parameters ωj and α that imply frequency locking, investigate the stability
of frequency locked solutions and express their phase difference(s) φ1 − φ2 in terms of the frequency
def
difference ω1 − ω2 = ∆ω and coupling strength α. What happens when frequency locking is lost?
Exercise 31. Modify the system of Exercise 30 to read:
ψ˙1 = ω1 − [sin(ψ1 − ψ2 ) − β sin(2(ψ1 − ψ2 ))], ψ˙2 = ω2 − [sin(ψ2 − ψ1 ) − β sin(2(ψ2 − ψ1 ))].
Compute the reduced system of the form (4.66) and analyze the bifurcations that occur as β increases
from zero for fixed ∆ω = ω1 − ω2 sufficiently small that phase locked solutions exist near 0 and π.
A common structure appearing in models of central pattern generators is the half-center oscilla-
tor : a reflection-symmetric pair of units, sometimes each containing several neurons, that are usu-
ally connected via mutual inhibition to produce an alternating rhythm [75, §9.6]. See [119, 57, 65]
for examples. The reduction to two phase oscillators described above is perhaps the simplest
expression of this architectural subunit. It will reappear in the next section.
123
4.4 Central Pattern Generators
Central pattern generators (CPGs) are networks of neurons in the spinal cords of vertebrates
and invertebrate thoracic ganglia, capable of generating muscular activity in the absence of sensory
feedback (e.g. [49, 92, 200, 182], cf [277, Chaps. 12-13]). Studies of locomotion generation are per-
haps most common, but CPGs drive many other rhythmic activities, including scratching, whisking
(e.g. in rats), moulting (in insects), chewing and digestion (indeed, the stomato-gastric ganglion
in lobster is probably the best-modeled among them [183]). CPGs are typically studied in prepa-
rations isolated in vitro, with sensory inputs and higher brain “commands” removed [49, 102], and
sometimes in neonatal animals, but it is increasingly acknowledged that an integrative approach,
including muscles, body-limb dynamics and proprioceptive feedback is needed to fully understand
their function, e.g. [45, 129, 257] and see [5] for a biomechanical view of animal locomotion. CPGs
nonetheless provide examples of neural networks capable of generating interesting behaviors, but
small enough to allow the study of relatively detailed biophysically-based models. Here we sketch
two examples that use phase oscillators. For an early review of CPG models that use phase reduc-
tion and averaging, see [157].
4.4.1 A CPG model for lamprey
As §4.1 shows, the direct study of a pair of synaptically (or electrically) coupled H-H type
neurons, or even of two-dimensional reductions thereof, is challenging, so much of the CPG modeling
has been done at a higher, more phenomenological level. Phase models have been particularly useful.
A relatively early CPG model for lamprey swimming [47] used a chain of N phase oscillators with
nearest-neighbor coupling to describe the distributed networks of cells in the spinal cord that
generate traveling waves (also see [72, 159]). At that time it could not be derived from single cell
models or otherwise justified, but following the discovery of distinct cell types [38] and the creation
of cell-based [103, 115, 265] and network-based models [37, 276], its status has been strengthened
by computing PRCs and using phase reduction, as in §4.3 [261].
The phase oscillator model of [47] takes the form:

φ̇1 = ω1 + α sin(φ2 − φ1 ),
φ̇i = ωi + α sin(φi+1 − φi ) + δ sin(φi−1 − φi ), for i = 2 . . . N − 1, (4.69)
φ̇N = ωN + δ sin(φN −1 − φN ).
Here we assume that there is a network of interneurons and motoneurons within each segment that
generate periodic bursts, and that the state of each “segmental network” can be described by a
single phase variable φi . We allow a frequency gradient (ω1 , . . . , ωN ) along the cord from rostral
(head, i = 1) to caudal (tail, i = N ) end, and nearest-neighbor ascending and descending interseg-
mental coupling is represented by the terms α sin(φi+1 − φi ) and δ sin(φi−1 − φi ) respectively. We
assume that α, δ > 0, so that the coupling tends to pull oscillator phases together, cf. Exercise 30.
Wilson [277, §13.2] presents a generalized version of this in the course of a review of models from
the segmental level to that of the whole cord, so here we give only a brief sketch.
Provided that the frequency differences ωi+1 − ωi are small, and/or coupling strengths α, δ are
large, traveling wave solutions can be found for (4.69). This is most easily seen by assuming a
124
def
uniform, time-independent phase lag φi+1 − φi = γ, substituting into (4.69), and subtracting the
equations pairwise:
φ̇2 − φ̇1 = (ω2 − ω1 ) − α[sin(γ) − sin(γ)] + δ sin(−γ), . . . ,

φ̇i+1 − φ̇i = (ωi+1 − ωi ) + α[sin(γ) − sin(γ) + δ[sin(−γ) − sin(−γ)] = 0, . . . , (4.70)
φ̇N − φ̇N −1 = (ωN − ωN −1 ) + δ[sin(−γ) − sin(−γ)] − α sin(γ).
For γ to be constant, the differences φ̇i+1 − φ̇i between time derivatives must all vanish and so we
must have
(ω2 − ω1 ) = δ sin γ , (ωN − ωN −1 ) = α sin γ and ωi+1 − ωi = 0 for i = 2 . . . N − 1, (4.71)
and evidently |ω2 − ω1 | < δ and |ωN − ωN −1 | < α must also hold for real solutions of the first two
conditions of (4.71). Moreover, for a uniform wave to propagate from head to tail we require γ < 0
(φi+1 (t) lags behind φi (t)), so we deduce that the head and tail oscillators must run respectively
faster and slower than the interior oscillators, which should all have the same frequency. This is
a rather unlikely situation, and in [47] it was also shown that, for the case δ = α, a phase locked
solution exists with a uniform frequency gradient ∆ω = ωi+1 − ωi provided that ∆ω < 8α/N 2 . In
that case the phase lag γi = φi+1 − φi is nonuniform, being greatest in the center of the cord and
smallest at the ends, and note that N enters in the condition for phase locking, which becomes
more stringent for longer chains.
Actually one should use a double chain of oscillators, since the waves of neural and electromyo-
graph (EMG) activation measured in lamprey spinal cords are in antiphase contralaterally as well
as displaying phase lags ipsilaterally: see [48], but as Wilson notes [277, §13.1], “symmetry sub-
sampling” techniques may be used to justify analysis of only one side (left or right), provided the
model is bilaterally symmetric.
In the introductory paragraph to this section we noted that integrated neuro-mechanical models,
including CPGs, muscles, body-limb dynamics and environmental reaction forces, are needed if we
are to better understand the rôle of CPGs in producing locomotion. (Indeed, without reaction
forces, animals would go nowhere!) Examples of such models for lamprey swimming, of increasing
sophistication and realism, can be found in [190, 191, 258].
4.4.2 A CPG model for insect locomotion
We now describe some analysis on the phase oscillator reduction of a CPG model for hexapedal
locomotion. The model was motivated by experiments of Pearson and Iles [199, 201] on american
cockroaches (Periplaneta americana), and uses bursting neuron and chemical synapse models of
the types described in §3.5 and §4.1. PRCs and averaged coupling functions in terms of phase were
computed numerically. For details, see [94, 93]. The architecture of the CPG and motoneuron
network is shown in Fig. 4.16, the caption of which provides some descriptive background.
As Fig. 4.16(c) indicates, one-way synaptic connections run from the CPG interneurons to the
motoneurons, so the basic stepping rhythm is determined by the six CPG units, which may be
studied in isolation. But before proceeding, we must explain the insect gait pattern that we wish
125
a) b)
CNS CNS
5
BI
BI BI BI Levator
6
sensors
D L D L D L
Df
meta c.s. meso c.s. pro c.s.
Ds Depressor
c) d)
CNS CNS
MN MN
CPG
Ds Ds
pro 1 4 pro 1 4
Df
(1-gF) 2gF 2gF (1-gF) Df
meso 5 2 ... meso 5 2 ...

2gH (1-gH) (1-gH) 2gH
meta 3 6 meta 3 6
Figure 4.16: A central pattern generator for hexapedal locomotion. (a) Ipsilateral CPG-motoneuron
network connectivity and (b) individual leg depressor and levator circuits in Periplaneta americana,
showing fast and slow motoneurons Df , Ds as proposed by Pearson [201]. The central nervous
system (CNS) excites bursting interneurons (BI) as well as Df and Ds , which innervate the leg
depressor muscles. Motoneurons 5 and 6 innervate levator muscles and are not modelled here. Sen-
sory feedback (dashed) affects the activity of motoneurons and interneurons. Open circles indicate
excitatory coupling, closed circles indicate inhibitory coupling, c.s. denotes campaniform sensillae:
proprioceptive sensors on the exoskeleton that measure forces in the legs. (c) Network connectivity
of the hexapedal model: CPG interneurons are coupled through mutually-inhibiting synapses and
fast and slow motoneurons are connected via inhibitory synapses to their corresponding CPG neu-
ron; they are also tonically driven by the CNS. (d) Asymmetric ipsilateral coupling: contralateral
coupling strengths ḡsyn are equal, and the network of (c) is obtained when gF = ḡsyn /2 = gH .
Adapted from [93].
126
to model. Cockroaches run over much of their speed range with a double tripod gait, in which
the left front, left rear and right middle legs (the L tripod) alternate with the right front, right
rear and left middle legs (the R tripod) in providing stance support. Motoneurons activating
depressor muscles that provide support and drive the “power stroke” during stance must therefore
be alternately active for the L and R tripods: in particular, neighboring legs on the same side
(ipsilateral) and across the body (contralateral) should work in antiphase. We augment Pearson’s
proposal for three mutually-inhibiting bursting interneurons on each side of the thoracic ganglia
(Fig. 4.16(a)) by similarly linking the front, middle and rear pairs on the two sides by mutual
inhibition. In Fig. 4.16(c) the three cells driving the L tripod are numbered 1, 2, 3 and those
driving the L tripod: 4, 5, 6. In fact the hemisegmental ganglia contain multiple active neurons,
and the representation of each “leg unit” by a single bursting cell in Fig. 4.16 is minimal. For
example, the hemisegmental units of lamprey described in [277, §13.1] each contain 3 different cell
types, cf. [38, 261].
The reduced model for the six CPG cells in terms of the slow phase variables ψi = φi − ω0 t takes
the form:
ψ˙1 = ḡsyn H(ψ1 − ψ4 ) + ḡsyn H(ψ1 − ψ5 ) ,

ḡsyn ḡsyn
ψ˙2 = H(ψ2 − ψ4 ) + ḡsyn H(ψ2 − ψ5 ) + H(ψ2 − ψ6 ) ,
2 2
ψ˙3 = ḡsyn H(ψ3 − ψ5 ) + ḡsyn H(ψ3 − ψ6 ) ,
ψ˙4 = ḡsyn H(ψ4 − ψ1 ) + ḡsyn H(ψ4 − ψ2 ) , (4.72)
ḡsyn ḡsyn
ψ˙5 = H(ψ5 − ψ1 ) + ḡsyn H(ψ5 − ψ2 ) + H(ψ5 − ψ3 ) ,
2 2
ψ˙6 = ḡsyn H(ψ6 − ψ2 ) + ḡsyn H(ψ6 − ψ3 ) .
Here the strengths of the inhibitory synapses are chosen so that the net effect on each cell from
those connected to it is the same (the middle leg cells 2 and 5 receive inputs from three neighbors;
front and hind leg cells from two).
The averaged coupling function H takes the form shown in Fig. 4.17. Although the PRC is a
complicated function with multiple oscillations caused by the burst of spikes, the integral required
by the averaging theorem yields a fairly simple function. Indeed, subtraction of H(−θ) from H(θ)
produces a phase difference function G(θ) that not only has zeroes at θ = 0 and π, as noted in
§4.3.3, but is also odd and remarkably close to a simple sinusoid, as assumed in the earlier phase
oscillator model for lamprey CPG [47]. Note that an odd function only appears for symmetric
coupling (G(θ) = [Hji (θ) − Hji (−θ)]).
If we seek idealized L-R tripod solutions of the form ψ1 = ψ2 = ψ3 ≡ ψL (t), ψ4 = ψ5 = ψ6 ≡

ψR (t), then Eqns. (4.72) collapse to the pair of ODEs
ψ̇L = 2ḡsyn H(ψL − ψR ) and ψ̇R = 2ḡsyn H(ψR − ψL ) , (4.73)
and the arguments used in §4.3.3 may be applied to conclude that ψR = ψL + π and ψR = ψL
are fixed points of (4.73), independent of the precise form of H. For this argument to hold, note
that the sums on the right hand sides of the first three and last three equations of (4.72) must be
identical when evaluated on the tripod solutions; hence, net inputs to each cell from its synaptic
connections must be equal. Also, since for ḡsyn > 0 we have G′ (0) > 0 > G′ (π) (Fig. 4.17(b)), so
127
Figure 4.17: (a) The coupling function ḡsyn Hji (θ) (solid) for an inhibitory synapse; ḡsyn Hji (−θ) also
shown (dash-dotted). (b) The phase difference coupling function ḡsyn G(θ) = ḡsyn [Hji (θ)−Hji (−θ)].
Note that G(0) = G(π) = 0 and ḡsyn G′ (0) > 0 > ḡsyn G′ (π). From [93].
that we expect the in-phase solution to be unstable and the antiphase one to be stable. However,
to confirm this in the full six-dimensional phase space of (4.72) we must form the 6 × 6 Jacobian
matrix:
2H ′ −H ′ −H ′
 
0 0 0
 0 2H ′ 0 −H ′ /2 −H ′ −H ′ /2 
2H ′ −H ′ −H ′ 
 
 0 0 0
ḡsyn 

′
, (4.74)
 −H −H ′ 0 2H ′ 0 0 

 −H ′ /2 −H ′ −H ′ /2 0 2H ′ 0 
0 −H ′ −H ′ 0 0 2H ′
with derivatives H ′ evaluated at the appropriate (constant) phase differences π or 0. The anti-
phase tripod solution ψL − ψR = π gives one zero eigenvalue with “equal phase” eigenvector
(1, 1, 1, 1, 1, 1)T , and the remaining eigenvalues and eigenvectors are as follows:
λ = ḡsyn H ′ : (1, 0, −1, 1, 0, −1)T ,

λ = 2ḡsyn H ′ , m = 2 : (1, −1, 1, 0, 0, 0)T and (0, 0, 0, −1, 1, −1)T ,
λ = 3ḡsyn H ′ : (1, 0, −1, −1, 0, 1)T , (4.75)
′ T
λ = 4ḡsyn H : (1, 1, 1, −1, −1, −1) .
(This can be checked by Matlab.) Since ḡsyn H ′ (π) < 0 (Fig. 4.17(a)), this proves asymptotic
stability with respect to perturbations that disrupt the tripod phase relationships; moreover, the
system recovers fastest from perturbations that disrupt the relative phasing of the L and R tripods
(λ = 4ḡsyn H ′ : last entry of (4.75)). Since ḡsyn H ′ (0) > 0 (Fig. 4.17(a)), the “pronking” gait with
all legs in phase (ψL (t) ≡ ψR (t)) is unstable.
Exercise 32. How many other phase locked solutions of the six ODEs (4.72) can you find?. [Hint:
Seek solutions in which different groups of 3 (and then of 2) phases are equal, e.g. ψ1 = ψ5 = ψ3 =
ψL , etc.]
128
Exercise 33. Consider the CPG network with unequal ipsilateral coupling shown in Fig. 4.16(d).
Can you find a modified double tripod solution with each contralateral pair of cells in antiphase, but
with nonzero phase differences between front and middle and middle and hind cells? [Hint: As in
§4.4.1 assume that such a solution exists and try to find conditions on the synaptic strengths gF
and gH consistent with it.]
This CPG model was created in the absence of information on coupling strengths among different
hemisegments, and symmetry assumptions were made for mathematical convenience, allowing the
reduction to a pair of tripod oscillators, as in Eqns. (4.73). Experiments on deafferented thoracic
ganglia support bilateral symmetry (L-R and R-L contralateral connections have approximately the
same strengths), but indicate that descending connections are stronger than ascending ones [84], as
allowed in Fig. 4.16(d). Similar rostralcaudal asymmetries have been identified in the lamprey spinal
cord [110, 14]. However, proprioceptive feedback can strengthen coupling among segments [83], and
more recent experiments in which footfall data from freely-running animals are fitted to a genralized
model of the form (4.72) with bilateral symmetry but allowing different contralateral and ipsilateral
coupling strengths among the 6 hemisegments suggests a different balance [56]. Specifically, setting
ψi (t) = ω̄t and ψi+3 (t) = ω̄t + π for i = 1, 2, 3 and substituting into the ODEs for the units on the
left hand side, we obtain
ω̄ = c1 H(π) + c5 H(ψ1 − ψ5 ) ,
ω̄ = c4 H(ψ5 − ψ1 ) + c2 H(π) + c7 H(ψ5 − ψ3 ) , (4.76)
ω̄ = c6 H(ψ3 − ψ5 ) + c3 H(π) ,
where c1 , c2 , c3 and c4 , c5 , c6 , c7 are respectively the contralateral and ipsilateral coupling strengths
(see [56, Fig. 5]). The equations for the phases ψ4 , ψ2 and ψ6 on the right hand side are similar.
Note that, in the original phase variables φi , the stepping frequqncy of these solutions is ω0 = ω̄,
and Eqns. (4.76) provide 3 conditions relating the two ipsilateral phase differences ψ1 − ψ5 and
ψ5 − ψ3 and the frequency shift ω̄. They provide conditions on the coupling strengths and function
values for specific gaits. In particular, for the double tripod ψ1 − ψ5 = ψ5 − ψ3 = π (all neighbors
cycle in antiphase) and the conditions become
c1 + c5 = c2 + c4 + c7 = c3 + c6 , (4.77)
implying that the summed input weights into each of the six units are identical. The data in [56,
Fig. 9C] show that as speed increases, estimated coupling strengths approach this “balanced”
condition.
Exercise 34. Reindexing the units so that 1, 2, 3 and 4, 5, 6 denote the front middle and hind legs
on the right (R) and left (L) sides respectively, as in [56], the Jacobian matrix analogous to to
(4.74) in the general 7-parameter case of bilaterally symmetric coupling takes the form
   
· ¸ (c1 + c5 ) −c5 0 −c1 0 0
B C
H ′ (π) , with B =  −c4 (c2 + c4 + c7 ) −c7  , C =  0 −c2 0 .
C B
0 −c6 (c3 + c6 ) 0 0 −c3
Find explicit expressions for the eigenvalues and eigenvectors in terms of the parameters cj and
H ′ (π), and using these find conditions in these parameters that guarantee asymptotic stability (ex-
129
cepting the “equal phase” eigenvalue with eigenvector (1, 1, 1, 1, 1, 1)T ). [Hint: The eigenvalue prob-
lem can be written as
· ′
H ′ (π)C
¸µ ¶ µ ¶
′ H (π)B − λI vR vR
H (π) = 0 , where v =
H ′ (π)C H ′ (π)B − λI vL vL
denotes the eigenvector. Block symmetry implies that eigenvectors come in orthogonal pairs in
which vL = ±vR , allowing decomposition into two 3 × 3 matrix problems:
[H ′ (π)[B + C] − λI]vR = 0 and [H ′ (π)[B − C] − λI]vR = 0 ,
for which the characteristic polynomials are cubics.]
To address the need for integrated studies noted at the beginning of this section, neuro-mechanical
models of insect locomotion have also been constructed that integrate a CPG with muscles and
body-limb dynamics [163, 164]. They were developed from much simpler, bipedal models with pas-
sive, springy legs [231, 232, 230] by introducing feed-forward actuation [233], constructing a more
realistic, hexapedal geometry driven by agonist-antagonist muscle pairs [162], and finally adding
proprioceptive feedback [207]. Their rigid bodies are equipped with jointed, massless legs whose
ground contact points remain fixed, and the body moves in the horizontal plane, implying that its
dynamics have only 3 degrees of freedom (fore-aft and lateral translations and yawing rotation). In
spite of these simplifications, the final, unanalyzable system contains 270 ODEs and simulations of
it run too slowly to reveal how its behavior depends on multiple parameters. It has subsequently
been reduced to 24 phase oscillators describing motonuerons driving extensors and flexors, coupled
to 6 force and moment balance equations describing the Newtonian body dynamics [208]. Lineariza-
tion around the limit cycles implicit in phase reduction allows one to interpret the contributions
of internal CPG coupling and feedback additively, thereby illuminating mechanisms involved in
stability and recovery from perturbations.
Animal locomotion has also inspied the design of legged robots, e.g. [6, 156, 138]. For further
information on this, and the rôles of CPGs and CPG models in it, see the special issue of Arthropod
Structure and Development [223], and [137].
130
Chapter 5
Probabilistic methods and

information theory
In chapter 3 the emphasis was on biophysically-based mechanistic models: sets of deterministic

ODEs describing the evolution of membrane voltages, ionic currents and gating variables, and
reductions of them. We now turn to empirical and probabilistic models, describing how they can
be formulated from experimental data. Specifically, we will examine neural coding and decoding,
addressing two questions: 1) What spike train results from a given stimulus? and 2) Given a spike
train, what was the stimulus that caused it? To develop (partial) answers, we provide a brief
introduction to relevant ideas from probability and information theory, and their implications for
efficiency and reliability of neural coding and communication. The main sources for this section
are the books of Rieke et al. [219] and Dayan and Abbott [58].
Unlike the continuous membrane potentials v(t) of chapter 3 (but like the integrate and fire
model), here we shall describe neural signals as trains of delta functions. Since action potentials
are stereotyped and subthreshold fluctuations attenuate on very short scales in axons, the times at
which soma voltages peak often suffice to describe the output of a neuron. As in the integrate-and-
fire models of §4.2.2 this delta function idealization provides a compact description: for example,
a one (or ten) second recording containing fifty spikes has only fifty elements in its data set – the
fifty spike times – representing a substantial compression of the membrane potential discretized on
a millisecond timescale.
The simplification is appropriate given the questions we now address. Instead of probing physio-
logical mechanisms that generate spikes, we ask how spike trains convey information to the organism
about the outside world. Details about mechanisms are largely irrelevant to such a question, so we
abstract from the physiology to make progress. However, we must not forget that spike trains come
from real neurons, which are not idealized binary (0, 1) communication units, but are composed
of ion channels, axons, dendrites, synapses, etc., as described in chapter 3. Indeed, computation
of a sort does occur at the cellular level (e.g., when subtle ionic current timescale effects deter-
mine whether a given cell reaches threshold and fires a spike), but here we emphasize information
contained in and transmitted by (idealized) spike trains themselves.
131
The encoding/decoding questions posed above can be further refined. The central nervous
system receives sensory input in the form of spike trains, which contain visual, auditory, and tactile
information from the eyes, ears, and skin receptors. How is this information conveyed? Given a
stimulus, what spike train results? How does the organism extract stimulus information from a
series of spike times? How closely do specific neural coding schemes approach to optimal information
transmission rates? How robust and reliable are encoding and decoding? We will not completely
answer any of these questions, but we will develop tools necessary to address them. We start
with statistical and probabilistic tools for spike train analysis, continue to filtering techniques for
decoding, and end with an introduction to information theory relevant for the final two questions.
5.1 Stimulus to response maps: Encoding
After introducing idealized models of spike trains and different notions of averaged spike rates, we
review basic ideas in probability theory. Equipped with these, we return to spike trains to describe
spike-triggered averages (patterns evoked by stereotypical stimuli), and a simple statistical model
for spiking
5.1.1 Description of spike trains
As noted above, we consider only spike times in the neural recording, rather than the full
membrane potential. Hence, given a train of n spikes over an interval [0, T ], our data consists of n
spike times, {ti }, and the spike train as a function of time becomes:
n
X
ρ(t) = δ(t − ti ). (5.1)
i=1
Recall that the delta function δ(t) is a generalized function, defined by what it does to continuous
functions rather than via its pointwise values. Specifically, δ(t) = 0 for all t 6= 0, and while its value
R +∞
is not defined at t = 0, it has unit integral −∞ δ(t) dt = 1. Thus, the integral
Z T
ρ(t) dt = n (5.2)
0
counts the number of spikes (n) in the interval [0, T ], and when convolved with a continuous function
R +∞
f (t), ρ(t) sifts out the value of that function at the time of the spike: −∞ δ(t − t0 )f (t) dt = f (t0 ).
Therefore, from (5.1),
Z ∞ n
X
f (τ )ρ(t − τ ) dτ = f (t − ti ) , (5.3)
−∞ i=1
and sums over spikes are interchangeable with integrals with the neural output ρ(t).
Spike sequences are usually stochastic: if the same stimulus is presented multiple times, spike
sequences from the same neuron may differ due to internal noise in the system. (Indeed, delta func-
tions cannot exactly line up, when one considers that {ti } has measure zero in [0, T ].) Nonetheless,
132
spike trains elicited by the same stimulus under carefully controlled conditions usually “look sim-
ilar,” and we would like to quantify this similarity using probability and statistics. Small jitter
in spike times does not necessarily pose a problem: microsecond differences often do not regis-
ter to the organism, and in many cases, firing rates rather than individual spike times appear to
code for sensory properties. (Echolocation in directional hearing by bats and owls is a stroking
counterexample.)
Following the notation in [58] we explore different definitions and meanings of the firing rate.
The first is the spike count rate r: the average number of spikes per unit time during stimulus
presentation. This lacks time resolution, but can be very useful for constant stimuli. The definition
uses the delta function property of Eqn. (5.2):
T
n 1
Z
r= = ρ(τ ) dτ . (5.4)
T T 0
The spike count rate r is useful in that it requires only one stimulus presentation to obtain,
but it averages out time variations and is thus useless for treating non-stationary stimuli, unless
T is very small. However, we can divide the train into time bins of width ∆t, and at a suitable
resolution, each bin will contain zero or one spike, resulting in a rate that is piecewise either 0 or
1/∆t. Presenting the same stimulus multiple times to the neuron, we may then derive an average
response r(t) (note the use of Roman r to distinguish from r(t)). Over fifty presentations, a given
bin may have 15 spikes, which gives an instantaneous rate of 15/50∆t = 0.3/∆t. The values are
still discrete (multiples of 1/∆t divided by the number of presentations), but they offer much better
resolution than 0 or 1. There is a tradeoff between temporal resolution (making ∆t small) and rate
resolution (possible values for r(t)). Letting angle brackets hi denote averaging over trials, we define
the time-dependent firing rate (or simply the firing rate) as:
t+∆t
1
Z
r(t) = hρ(τ )i dτ . (5.5)
∆t t
The limit ∆t → 0 is not taken since that would require the number of trials to approach infinity
to avoid the rate resolution problem. The probability of finding a spike in a window of size ∆t,
starting at time t, is then r(t)∆t.
Finally, we define the average firing rate hri as the average number of spikes in the spike train
averaged over trials. It has the same drawbacks as r, and can be defined in three equivalent ways:
T T
hni 1 1
Z Z
hri = = hρ(t)i dt = r(t) dt . (5.6)
T T 0 T 0
Note that for the last equality, the limit ∆t → 0 must be taken, for which r(t) → hρ(t)i in (5.5).
Firing rates may be determined by recording spike trains for (identical) stimulus presentations
and summing them aligned with t = 0 corresponding to stimulus onset. The summed record is
divided into bins of width ∆t and the total number of spikes in each bin across all trials is counted
and divided by ∆t to get the firing rate over that bin. The result is then plotted as a Post (or Peri)
Stimulus Time Histogram (PSTH): Fig. 5.1.
133
Figure 5.1: Construction of the Peri Stimulus Time Histogram (PSTH) from 17 0.5 second stimulus
presentations. Individual spike trains are shown on the raster plot above and the PSTH below with
spike rates in Hz. Here the stimulus (red bar below raster plot) is centered on t = 0, not starting
at t = 0, as is usual. www.biomedicale.univ-paris5.fr
5.1.2 A primer on probability theory
We assume familiarity with basic concepts in probability, but we review a few key ideas here.
For additional background see a text such as [226]. In a discrete probability distribution, PN there are
N possible outcomes {Xi }N i=1 , each with probability P [X i ] = p i . Each p i ≥ 0, and i=1 pi = 1. A
continuous distribution defines a probability density as a function on some connected domain. For
instance, if an event is known to occur between time t = 0 and T , its probability R T can be expressed
as a positive function, not necessarily continuous, p(t) : [0, T ] → R+ , where 0 p(t) dt = 1. The
probability that the event occurs between times a and b (with 0 ≤ a < b ≤ T ) is the integral of p(t)
over [a, b]. A discrete distribution can be represented as a sum of weighted delta functions pi δ(Xi ).
The mean or expected value µ of the distribution is the average outcome weighted by the prob-
abilities. For discrete distributions
N
X
µ = E[X] = Xi pi , (5.7)
i=1
and for a continuous distribution on a domain Ω:

Z
µ = E[X] = xp(x) dx . (5.8)
Ω
The variance of a distribution V ar(X) or σ 2 , with σ the standard deviation, is a measure of
134
how much the distribution “spreads” about the mean. For a discrete distibution
N
X
2 2 2 2
V ar(X) = σ = E[(X − µ) ] = E[X ] − µ = Xi2 pi − µ2 , (5.9)
i=1
and in the continuous case: Z

2
σ = (x − µ)2 p(x) dx . (5.10)
Ω
Certain distributions appear frequently in nature and will be very important in this section.
The most common is the Gaussian, or normal distribution, a continuous distribution with density
(x − µ)2
· ¸
1
p(x) = N (µ, σ) = √ exp − , (5.11)
2πσ 2 2σ 2
having mean µ and standard deviation σ. Normal distributions are important in applications
because the average of n independent, identically-distributed (i.i.d.) random variables, each with
mean µ and variance σ 2 , is normally distributed with mean µ and variance σ 2 /n, by the central
limit theorem [226, p.404]. Since neural events often result from multiple independent upstream
events, it is not surprising to see Gaussians in neuroscience.
The exponential distribution, which is often used to model the distribution of times until a single
event occurs (e.g., failure of a lightbulb), is another useful example. This distribution is given by
λe−λx , for x ≥ 0,
½
p(x) = (5.12)
0, for x < 0.
The probability that the waiting time will be longer than T , given a time constant λ, is calculated
by summing the probability in the tail of the distribution for x > T as follows:
Z ∞
P [t > T ] = λe−λt dt = e−λT . (5.13)
T
Interspike intervals are often modeled by repeated drawings from (5.12) with a suitable time con-
stant λ. We leave computation of the mean and variance of this distribution as Exercise 36 below.
A double-sided version of the exponential distribution will appear in Exercise 38 of §5.2.
The Poisson distribution is an important discrete distribution, in which X takes integer values
i = 0, 1, 2, . . . and
λi e−λ
pi = P [X = i] = . (5.14)
i!
Here the parameter λ is both the mean and the variance, as can easily be checked using (5.7) and
(5.9). The Poisson distributions arises from the sum of n ≫ 1 trials, each with small probability p
of the event occurring, so that np = λ is neither small nor large. Thus you can easily see how λ will
end up as the mean for that distribution. It provides a good description of the total occurences of
an event, if each is independent of the others. In §5.1.4 we will use it in a spike-generating model.
Exercise 35. Verify that the mean and variance for (5.14) are both λ, and that they are µ and σ 2
for (5.11). [Hint: You will need to let i → ∞ for the Poisson distribution.]
135
Two quantities are commonly used to describe the spread of probability distributions. The
coefficient of variation is the ratio CV = σ/µ of the standard deviation to the mean. The Fano
factor is the ratio F = σ 2 /µ of variance to mean. For Poisson distributions the Fano factor is 1,
and determining it for spike counts shows how close their distributions are to Poisson [58, p.32].
Multiple random variables can be related to the same distribution of outcomes. For instance,
let X be the random variable of the ordered pair of results from rolling a red and a white die.
Another random variable Y could represent the sum of the two dice. Possible values of X are
(1, 1), (2, 3), (3, 2), . . . , each with probability 1/36 for fair dice, and possible valus of Y are the
integers 2 through 12, with varying probabilities. Suppose we know the value of Y for a given trial.
What do we know about X? Y = 2 implies that X = (1, 1), but if Y = 7 there are six possibilities
for X: [(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)], each with probability 1/6. So we may define a new
distribution describing the probability of X, once the value of Y is known. This is called the
conditional probability of X given Y , and is usually written P [X|Y ]. It will be very important
when we describe the distribution of spike times {ti } given the stimulus s(t) as P [{ti }|s(t)] for
stimulus encoding and its “dual” P [s(t)|{ti }] for decoding.
The joint distribution P [X, Y ] of two random variables X and Y (or the joint density p(x, y)
for continuous variables) describes the probabilities of different combinations of X and Y . The
marginal distribution of Y can be obtained from the joint probability by summing or integrating
over X. In the continuous formulation, this is expressed as
Z ∞
p(y) = p(x, y) dx . (5.15)
−∞
The conditional distribution of X given Y can then be obtained from the joint distribution by
dividing the joint probability by the probability for Y . For instance, if the probabilty that X =
Y = 5 is .1 and the probability that Y = 5 is .5, then the probability that X = 5 given that Y = 5
is
P [X = 5, Y = 5] 0.1
P [X = 5|Y = 5] = = = 0.2 .
P [Y = 5] 0.5
So given the joint distribution and the distribution of Y , we get the conditional distribution of X
given Y . The reverse is also true, and in general we have
P [X|Y ] P [Y ] = P [X, Y ] = P [Y |X] P [X] . (5.16)
In the neural spike train context, experiments usually determine only P [{ti }|s(t)] or P [s(t)|{ti }],
but we typically require both distributions. Fortunately, from (5.16) we can get the joint distri-
bution P [{ti }, s(t)] from both P [{ti }|s(t)]P [s(t)] and from P [s(t)|{ti }]P [{ti }]. Setting the two
representations equal yields Bayes’ rule [18]:
P [s(t)]
P [s(t)|{ti }] = P [{ti }|s(t)] . (5.17)
P [{ti }]
We will return to this when discussing encoding and decoding.
Finally, two random variables X and Y are independent if P [X, Y ] = P [X] P [Y ]. Hence
P [X|Y ] = P [X]: knowing Y tells us nothing about X, and vice versa. For continuous distri-
butions, independence holds if and only if p(x, y) = p(x) p(y): the joint density is equal to the
product of the individual densities.
136
Exercise 36. Find the mean and variance of the exponential distribution for time constant λ. Also,
given that the event has not happened by time T0 , what is the conditional probability that the event
will happen after time T0 + T ? Why is this interesting?
5.1.3 Spike-triggered averages
We now develop quantitative analyses for the generation of spike trains, given stimuli. We
assume that spike trains have been recorded in response to a stereotypical stimulus s(t) repeated
many times, and the time-dependent firing rate r(t) has been obtained by averaging over trials. We
define the spike-triggered average stimulus (STA) C(τ ) as the average value of the stimulus at time
τ before a spike [58]. We average over all spikes, both within and across trials to obtain
n n
1X 1 X
C(τ ) = h s(ti − τ )i ≈ h s(ti − τ )i . (5.18)
n hni
i=1 i=1
We may take hni out of the trial average if n is large enough that n ≈ hni. C(τ ) should approach
the mean of the stimulus at large τ , since outside of a certain correlation time, we do not expect
the stimulus to have anything to do with a spike. Thus, the expected stimulus value distant from
spikes is just the value with no information at all: the global average. Using the notation developed
in §5.1.1, we note that
Z T Z T n
X
hρ(t)is(t − τ ) dt = h ρ(t)s(t − τ )dti = h s(ti − τ )i ,
0 0 i=1
where the first step is possible because the same stimulus is applied in every trial, and the second
follows from (5.1). Hence we may write (5.18) as
T T
1 1
Z Z
C(τ ) = hρ(t)is(t − τ ) dt = r(t)s(t − τ ) dt , (5.19)
hni 0 hni 0
the final step uses lim∆t→0 r(t) = hρ(t)i, as in (5.6).
The expression (5.19) resembles the correlation of the firing rate and stimulus, but it differs
in an important way. Correlations show how functions are related at different times. The firing
rate-stimulus correlation function Qrs (τ ) is
T
1
Z
Qrs (τ ) = r(t)s(t + τ ) dt , (5.20)
T 0
and it shows how the firing rate at time t is related to the stimlus at t+τ on average, as a function of
τ . From (5.19-5.20), the spike-triggered average C(τ ) is Qrs (−τ ), normalized by hri = hni/T . The
sign of τ is differs in the two definitions, and there is a normalization factor, but the spike-triggered
average is often called a reverse correlation function. Other names noted in [219] include first
Wiener kernel, mean effective stimulus, and triggered correlation function. See Fig. 5.2 for examples
of STAs for retinal ganglion cells. The STA C(τ ) is conventionally plotted with τ increasing to the
left to emphasize that it represents the stimulus preceding the spike: see Fig. 5.4 below.
137
Figure 5.2: Spike-triggered averages C(τ ) for nine different macaque ganglion cells in the visual
system for red, green, and blue color stimuli. Row a shows three different blue-yellow cells, which
are seen to respond on average to an increase in blue or a decrease in green or red intensities.
Row b shows three different ON cells, and row c shows three different OFF cells. Note that C(τ )
is conventionally plotted with τ increasing to the left. From www.nature.com Chichilnisky and
Baylor, 1999
Wiener kernels characterize nonlinear systems in terms of certain filters. They resemble Taylor
series, with functions rather than real or complex numbers as the domain variables. Wiener showed
that the output y(t) = F [x(t)] of a nonlinear system with input x(t) can be written as
y(t) = G0 + G1 [x(t)] + G2 [x(t)] + G3 [x(t)] + · · · , (5.21)
with G0 = g0 a constant term, and G1 defined as

Z ∞
G1 [x(t)] = g1 (τ1 )x(t − τ1 ) dτ1 . (5.22)
0
The function g1 (τ1 ) is called the first Wiener kernel ; it is analogous to the first derivative in the
linear term of Taylor series. Higher order terms have kernels that depend on higher-order cross-
correlations, e.g., the average value of the stimulus at times τ1 from one spike and τ2 from another.
Given enough terms, Wiener series provide good approximations of many nonlinear systems. The
simplest stimulus to firing rate encoder, then, is just the first Wiener kernel, which Rieke et al.
[219, §A.3, pp 292-295] show is proportional to the spike triggered average. In fact we have
Z T
1
g1 (τ ) = h y(t)x(t − τ ) dti , (5.23)
Sx 0
where h·i denotes averaging over an ensemble of input-output pairs and Sx is a normalization factor
(the power spectrum level).
In summary, we may start with a correlation of firing rate and stimulus, or separately create a
filter to approximate the firing rate given the stimulus. Up to a normalizing factor, the filter and
138
the correlation function are identical:
Z ∞
r(t) = G1 [s(t)] = g1 (τ )s(t − τ ) dτ , (5.24)
0
where
T n
1 1 1 1X
Z
g1 (τ ) = h ρ(t)s(t − τ ) dti = h s(ti − τ )i , (5.25)
Sx T 0 Sx T
i=1
and h·i denotes averaging over an ensemble of stimulus presentations.
This is a very simple model for obtaining the firing rate from stimulus, and in most cases it is far
from adequate. Many neural systems exhibit adaptation, as explored earlier, which may happen on
slow timescales. The Wiener series has significant difficulties in accounting for adaptation. Good
encoding descriptions, even without adaptation, often require higher-order terms. Here we only
present simple models that introduce relevant techniques but are certainly not at the research
frontier.
5.1.4 A Poisson model for firing statistics
Given the firing rate r(t) corresponding to a stimulus s(t), can we now create the “dictionary” of
conditional probabilities of spike trains given the stimulus? Unfortunately, no. There are so many
possible spike sequences that correspond to a given approximation of r(t), which itself depends on
the choice of ∆t, that it is impossible to gather enough trials to determine the full dictionary, even
when we have a confident estimate of the firing rate. Moreover, if the presence of one spike alters
spike generation in its vicinity through the refractory period or other factors, the firing rate cannot
account for this. We must make additional assumptions to obtain a complete encoding model.
Here we describe a simple model of spike generation given the firing rate r(t), which works well
in several cases. The major assumption is that spike times are mutually independent. Given the
refractory period τref , this assumption is not strictly valid for single neurons, since a spike is less
likely immediately following another spike, but if τref is short compared to the average interspike
interval, the model works reasonably well.
For ease of analysis, we start by assuming that r(t) = r = const. If independent spikes arrive
at a constant rate, then all trains of n spikes occurring in a given time interval [0, T ] have equal
probability. We will first find the probability P [{ti }ni=1 ] that n spikes occur at specified times
0 < t1 < t2 < . . . < tn < T , and that no other spikes occur in [0, T ]. Then we will compute the
probability PT [n] of observing exactly n spikes anywhere in the interval [0, T ].
We divide the interval into M bins of width ∆t = T /M and assume that ∆t is small enough that
not more than one spike falls in each bin (e.g. choose ∆t < τref ); in any case, we will eventually
take the limit ∆t → 0. The probability of a spike occurring in any given bin is simply r∆t, and,
since spike times are assumed to be independent, the joint probability of n spikes occurring in n
given bins is therefore (r∆t)n . We also require that no spikes occur in the remaining M − n bins,
and again appealing to independence, we obtain (1 − r∆t)(M −n) , since (1 − r∆t) is the probability
139
of not having a spike in a bin. Taking the product of these independent events, we obtain
P [{ti }ni=1 ] = (r∆t)n (1 − r∆t)(M −n) . (5.26)
This expression may be simplified by approximating it as ∆t → 0 (which implies that M = T /∆t →

∞), while n remains fixed. Thus, (M − n) → M and we may write the second factor as
h i−(rT )
lim (1 − r∆t)(M −n) = lim (1 − r∆t)(T /∆t) = lim (1 − r∆t)−(1/r∆t) = e−rT ; (5.27)
∆t→0 ∆t→0 ∆t→0
in the last step we use the fact that limǫ→0 (1 + ǫ)1/ǫ = e. This limit provides a good approximation
for ∆t small, and so, returning to (5.26) we have the probability that one spike occurs in each of n
specified disjoint bins of size ∆t, and that no spikes occur in the remaining (M − n) bins in [0, T ]:
P [{ti }ni=1 ] ≈ rn e−rT (∆t)n . (5.28)
Dividing (5.28) by the n-dimensional bin volume, we obtain the probability density function:
p[{ti }ni=1 ] = rn e−rT . (5.29)
To compute the probability PT [n] that n spikes occur anywhere in the interval [0, T ], we return
to (5.26), and multiply P [{ti }ni=1 ] by the number of ways that n spikes can be distributed
¡ ¢ among M
bins, with at most one spike per bin. This is the binomial coefficient M choose n: M n = M!
(M −n)! n! ,
which gives · ¸
M! n (M −n)
PT [n] = lim (r∆t) (1 − r∆t) . (5.30)
∆t→0 (M − n)! n!
n−1
Using the limiting behavior noted above, we deduce that M !/(M − n)! = Πj=0 (M − j) ≈ M n =
n
(T /∆t) and thus, again appealing to (5.27) to simplify the third term in (5.30), we obtain
µ ¶n
1 T (rT )n
PT [n] = (r∆t)n exp(−rT ) = exp(−rT ) . (5.31)
n! ∆t n!
PT [n] is a Poisson distribution with parameter λ = rT (cf. Eqn. (5.14)), which makes intuitive
sense, since the mean number of spikes in the interval [0, T ] is the firing rate multiplied by the
duration: rT . See Fig. 5.3 for examples. Note that, using (5.31), we may rewrite (5.28) as
µ ¶n
n ∆t
P [{ti }i=1 ] = n! PT [n] . (5.32)
T
The analysis is more complicated when the firing rate r(t) varies, but we can still calculate
P [{ti }ni=1 ]. We again divide the interval [0, T ] into M bins of size ∆t, and observe that the probabil-
ity of observing a spike in a given bin containing ti approaches r(ti )∆t as ∆t → 0. Now let Mi denote
the bin containing the ith spike (ti ∈ [Mi ∆t, Mi+1 ∆t)) and consider the Mi,i+1 = Mi+1 − Mi − 1
bins in the interspike interval (ti , ti+1 ). The probability of not having a spike in any of these is
[1 − r(ti + j∆t)∆t], and so the joint probability of no spikes occurring in this interval is
Mi,i+1
Y
P [no spikes in (ti , ti+1 )] = [1 − r(ti + j∆t)∆t)] . (5.33)
j=1
140
1 0.14
n=0
n=1
0.9
n=2
0.12
n=5
0.8 n=10
0.7 0.1
0.6
0.08
PT[n]
PT[n]
0.5
0.06
0.4
0.3 0.04
0.2
0.02
0.1
0 0
0 2 4 6 8 10 0 5 10 15 20
rT n
Figure 5.3: Left: probability that a constant rate Poisson process generates exactly n spikes within
time T given rate r, plotted as a function of rT from Eqn. (5.31) for various n; note n = 0 gives
an exponential distribution. Right: probability of finding n spikes given rT = 10, plotted as a
function of n for the Poisson process (stars), compared to a Gaussian with µ = σ 2 = 10 (solid).
For convenience we transform the product to a sum by taking logarithms, and use the fact that
ln(1 + ǫ) = ǫ + O(ǫ2 ):
Mi,i+1 Mi,i+1
X X
ln(P [no spikes in (ti , ti+1 )]) = ln[1−r(ti +j∆t)∆t] = − r(ti +j∆t)∆t+O(∆t2 ) . (5.34)
j=1 j=1
As ∆t → 0 this approximation becomes exact and the Riemann sum of (5.34) turns into an integral:
Z ti+1
ln(P [no spikes in (ti , ti+1 )]) = − r(t) dt . (5.35)
ti
Exponentiating (5.35) and using our earlier observation, we find that the probability of observing a
spike in the bin containing ti and no spikes in the succeeding interval (ti , ti+1 ) is well approximated
by µ Z ¶ ti+1
r(ti ) ∆t exp − r(t) dt . (5.36)
ti
Taking products over all spike times and spikeless intervals, we have
µ Z t1 n ·
¶Y µ Z ti+1 ¶¸ µ Z T ¶
n
P [{ti }i=1 ] ≈ exp − r(t) dt r(ti ) ∆t exp − r(t) dt exp − r(t) dt , (5.37)
0 i=1 ti tn
and combining all the exponentials this simplifies to

n
Y µ Z T ¶
n
P [{ti }i=1 ] ≈ r(ti ) exp − r(t) dt (∆t)n , (5.38)
i=1 0
141
with the associated probability density function
n
Y µ Z T ¶
p[{ti }ni=1 ] = r(ti ) exp − r(t) dt . (5.39)
i=1 0
Note that if r(t) ≡ r = const., Eqns. (5.38) and (5.39) reduce to (5.28) and (5.29), as expected.
However, we cannot compute an analog of PT [n] (Eqn. (5.31)) unless r(t) is constant, because the
probability of observing spikes varies over the interval [0, T ].
Although Eqn. (5.39) is not as simple as the constant rate density (5.29), it still provides a
compact description of spike train generation given a firing rate. It is easy to construct simulations
that give sample spike trains given a Poisson distribution, so we now have sample encoding. We
can also simulate by interspike intervals: at each spike, choosing the next interspike interval from
the appropriate distribution. For the constant rate Poisson process, the distribution of interspike
times is an exponential distribution with the form that you will find in Exercise 37. The inter-
spike distribution can also be modified to include a refractory period, which results in a gamma
distribution. For more details see [58, p.33].
Exercise 37. Show that the distribution of interspike intervals for a constant rate Poisson spike
model is an exponential distribution.
Many types of sensory neurons are well-modeled by Poisson spike processes, which provide good
approximations over durations T significantly larger than the refractory period but short enough
that low frequency noise in the neural circuit does not overwhelm the distribution. For instance,
the encoding of pure tones in primary auditory cortex can be described by a Poisson process over
a wide range of stimulus amplitudes [219, p.53].
5.2 Response to stimulus maps: Decoding
While encoding can require complicated nonlinear models, the linear methods developed above
often suffice for the related decoding problem: estimating the stimulus s(t) from a spike train
{ti }. Decoding plays a key role as animals make behavioral choices given neural input about their
environment. We remark that reconstruction of s(t) from {ti } may not be the exact problem solved
by the organism. For instance, as a fly navigates visually it receives angular motion signals from the
H1 neuron (a system extensively studied by Bialek and others [219]). The fly does not necessarily
compute angular velocity and then create a flight torque to compensate: torque may be directly
computed from the H1 signal. However, the latter process also produces an analog signal related
to the stimulus, so it would have a similar structure.
Three issues must be addressed before we develop a simple decoding model. The first is the
lack of a well-defined firing rate, on which we based most of the analysis of §5.1. The second is
dependence on the stimulus distribution, as in Bayes’ rule. The third concern is causality in the
decoding problem.
142
We dealt with the problem of stochasticity in encoding by finding a time-dependent firing rate
r(t) (Eqn. (5.5)) given multiple stimulus presentations, instead of focusing on single instances of
spike trains {ti }. Alternatively, we can give up time dependence and compute an average firing rate
for a stationary stimulus by assuming a constant rate for time T (the r of Eqn. (5.4)). In decoding,
the organism has neither luxury because natural stimuli are typically non-stationary (varying on
similar time scales to interspike intervals), and demand response after a single viewing.
In many systems, such as photon counting in ganglion cells, spatial memory in rat hippocampus,
visual cortical neurons in monkeys, bat auditory cortex during echolocation, speech comprehension
in auditory cortex, and animal calls in crickets and frogs, neurons fire at most one spike before the
stimulus changes significantly [219, pp.57-59]. In the monkey visual system, fixations last perhaps
100 ms, and cortical neurons fire at rates of 10-50 Hz, so during a given fixation, each neuron might
fire 1-5 spikes. One spike cannot determine a rate, and decisions based on the stimulus must be
made rapidly. A moth cannot take time to estimate rates and compute evasive paths when a bat
is closing in on it. It is unlikely that rate codes are used in such sensory systems.
In some animals many neurons respond to a particular stimulus, allowing simultaneous ensem-
ble averaging to obtain r(t), but this population model is also problematic. Such dedicated neural
populations are typically small for invertebrates, and in some (e.g., H1 in fly) they are a single-
ton. Second, even in large populations, rate calculation can be noisy, because averaging requires
independence to gain accuracy, and connected neurons may be highly correlated through synaptic
excitation and inhibition. Finally, there is a balance between accurately reconstructing stimuli
and having neurons do other tasks. How much benefit do multiple neurons give, for the cost of
having them compute in parallel instead of something else, or even not existing? Again, we see
that individual spike times rather than firing rates can be crucial to decoding.
It is often easier to determine the encoding distribution P [{ti }|s(t)] than the decoding distribu-
tion P [s(t)|{ti }] from experiments, unless the train in the latter contains only one or two spikes.
Does a full description of P [{ti }|s(t)] suffice to determine P [s(t)|{ti }]? The two conditional distri-
butions resemble the sections of a dictionary for translation between English and another language,
say Greek. In encoding we use the English-to-Greek section, in decoding Greek-to-English1 . Does
the former imply the latter? To answer this we recall Bayes’ rule (5.17):
P [s(t)]
P [s(t)|{ti }] = P [{ti }|s(t)] . (5.40)
P [{ti }]
Here P [{ti }] acts as a normalization factor, but the stimulus distribution P [s(t)] establishes the con-
text. Without this crucial information, it is impossible to decode a signal, even with full knowledge
of the encoding dictionary. This is echoed in both our language analogy and in natural signals.
An example with a very simple encoding scheme is given by Rieke et al. [219]. Suppose that a
neuron responds to a stimulus s by producing output x with noise η:
x = s+η. (5.41)
Both stimulus and noise are characterized by distributions. There is a distribution for the noise
and for the stimulus, but other than the noise, the encoding scheme is linear, and the encoding
1
If you are Greek, apply the mapping English ↔ Greek.
143
dictionary P [x|s] is therefore very simple:
P [x|s] = P [η = x − s] : (5.42)
the probability of output x given s is exactly the probability that noise η is equal to the difference
x − s. Assuming that noise samples are drawn from a distribution Pnoi , we may calculate the
decoding distribution from Bayes’ rule (5.40):
P [x|s]P [s] 1
P [s|x] = = Pnoi [η = x − s]P [s] . (5.43)
P [x] P [x]
How do we determine the best estimate of s given x? A frequent measure of best is the χ2 or mean
square error: the average of the squared difference between estimated and actual values
χ2 (s, sest ) = h|s(t) − sest (t)|2 i . (5.44)
It can be proved that χ2 is minimized by an estimate equal to the conditional mean, which is similar
to the mean from (5.8), but using conditional probability:
Z
sest = sp[s|x] ds . (5.45)
Noise is usually modeled as a zero-mean Gaussian with variance hη 2 i:
η2
· ¸
1
Pnoi (η) = p exp − , (5.46)
2πhη 2 i 2hη 2 i
and substituting (5.46) into (5.43), we obtain
(s − x)2
· ¸
1 P [s]
P [s|x] = p exp − . (5.47)
2πhη 2 i 2hη 2 i P [x]
To proceed we need the distribution of stimuli P [s]. If this is also Gaussian, with mean 0 and
variance hs2 i, then
s2 (s − x)2
· ¸ · ¸
1 1
P [s|x] = exp − 2 exp − . (5.48)
2hs i 2hη 2 i
p
P [x] 2π hs2 ihη 2 i
Combining terms and defining a new normalization factor, we get

· 2µ ¶ µ ¶¸
1 s 1 1 x
P [s|x] = exp − + +s , (5.49)
Z(x) 2 hs2 i hη 2 i hη 2 i
where Z(x, hs2 i, hη 2 i) = P [x] 2π hs2 ihη 2 i exp(x2 /2hη 2 i) is fixed, given x; in particular, the variance
p
hs2 i depends on the distribution from which s is drawn, but does not change once that distribution
is chosen. Moreover, P [s|x] is a Gaussian distribution and for Gaussians, the mean is also the value
at which the density is maximum, so we may set ∂P [s|x]/∂s = 0 to obtain the condition
µ ¶ µ ¶
1 1 x
−s + + = 0. (5.50)
hs2 i hη 2 i hη 2 i
144
This implies that our best estimate is
1
Ã ! µ ¶
hη 2 i SN R def
sest = x 1 1 =x = K1 (SN R) x , (5.51)
hs2 i
+ hη 2 i
SN R + 1
where SN R = hs2 i/hη 2 i denotes the signal to noise ratio. The decoder is evidently also linear, with
gain K1 a function of SN R. The greater the noise, the lower the stimulus estimate will be, and as
SN R → 0, sest → 0. So far, so good; but if we replace the Gaussian stimulus distribution by the
two-sided exponential distribution, P [s] = (1/2s0 ) exp(−|s|/s0 ), for example, we get a nonlinear
decoder.
Exercise 38. Given the two-sided exponential distribution P [s] above, write a decoder (on paper,
not programming) that for each value of x returns the most likely value of s as sest . How does it
differ from the linear decoder (5.51)? Continue to assume Gaussian noise and just return the most
likely value. Do not carry out analytical minimization of χ2 .
There is extended discussion of fly H1 experiments in [219], and a key point is made in Figure
2.2. The conditional mean of the neural output given the stimulus (encoding) is shown in i), and
it is clearly nonlinear. The conditional mean of the stimulus given the neural output is shown in
h), and it is approximately linear. In this system, nonlinear encoding does not imply nonlinear
decoding. Exercise 38 illustrates that the opposite can occur: linear encoding can accompany
nonlinear decoding. Thus, given linearity of one, you cannot assume it of the other.
Finally we address causality. In encoding, this is simple: spikes are only caused by stimulus
events that precede them: a spike at time t1 is related to s(t) only for t < t1 . Similarly, in decoding
at time t0 , we may only use spikes that occurred for ti < t0 , but each of them was caused by the
stimulus for t < ti . What does this imply about the stimulus at t0 if the most recent spike preceded
it by t0 − tn ?
There are two solutions to this problem. The first is the most intuitively obvious: decode with a
delay. We cannot determine the stimulus at the current time t0 , but if we look back over duration
τ0 , we can estimate the stimulus if there is at least one spike in [t0 − τ0 , t0 ], and the more spikes in
this interval, the better we can do. This suggests making τ0 large, to get more accurate estimates.
But there is usually a competing pressure for speed. A fly with τ0 = 200 ms could not turn quickly,
and most experiments indicate that τ0 ≈ 30 ms for H1 neuron decoding. (We shall meet another
speed-accuracy tradeoff in human decision making in chapter 6.) The second solution is to assume
that the stimulus has correlations that extend forward in time: if you know the stimulus at time t1 ,
then you know something about it for t > t1 (a sort of continuity). With correlation time τc , the
best guess for the stimulus at time t > t1 is s(t1 ) decaying to the average value sav of the stimulus
with time constant τc . (Note that sav = 0 if s is a zero mean Gaussian as assumed above). In this
way, all spikes that occur before t0 can be used to estimate s(t0 ). Because of causality, we must
use either delays in decoding, correlations in the stimulus, or both.
We can now create a simple model for decoding. We estimate the stimulus via a linear response
function, which takes both encoding by a given neuron and the stimulus distribution into account.
As in linear vibration theory, where we convolve a driving force with an impulse response function
to get a particle’s trajectory, here we convolve a sequence ρ(t) of delta function spikes with the
145
linear response function to get the stimulus estimate:
Z T N
X N
X
sest (t) = K1 (τ ) δ(t − τ − ti ) dτ = K1 (t − ti ). (5.52)
0 i=1 i=1
If estimation is delayed, then we effectively find sest (t − τ0 ).
Higher order terms may be added as in the Wiener filter, but in many systems, including fly
H1, this does not help. This is because the spike triggered average C(τ ) provides the best stimulus
estimate if there is only a single spike in the observation window: with estimation delay τ0 , our
best filter is
K1 (τ ) = C(τ0 − τ ) , (5.53)
i.e., C(τ ) run backwards and shifted by τ0 . The filter K1 has limited temporal width, related to the
stimulus correlation time τc . If the product hriτc is small, implying less than one spike on average
in each period of length τc , the reconstruction at any given time will depend on a single spike. As
spike density increases, higher order corrections may be needed, but for many of the systems noted
earlier, the reconstruction s(t) given by the STA and the spike in the interval [t − τ0 , t] is adequate.
If we include stimulus correlations, then spikes before t − τ0 also affect the reconstruction, since
the stimulus estimate at those spike times decays as τc to the average stimulus value. If neither
delay nor stimulus correlations are included, then the best guess for the stimulus at any time is
simply the stimulus average and K1 (τ ) = 0. For sufficiently large delays, setting K1 (τ ) = C(τ0 − τ )
to zero for τ < 0 has little effect, since C(τ ) → 0 for large τ : spikes are not correlated to stimulus
values far enough in the past. The part of K(τ ) set to 0 is already very close to 0 for τ0 large enough
for many systems, and K(τ ) is set to zero for τ < 0 because of causality: spikes occuring after time
t cannot affect our estimate at t − τ0 . The previous reasoning merely states that this imposition of
causality does not affect accuracy much. The construction of K(τ ) from C(τ ) is shown in Figure
5.4.
A final interpretation of the linear response function K1 is that it serves as a low pass filter.
High frequency components of the spike train are attenuated, so the reconstruction is insensitive to
small changes in the spike timing. The fly H1 filter attenuates frequencies above approximately 25
Hz. Provided the relevant information is carried below the cutoff frequency, this also filters out high
frequency noise in the neural system, a desirable feature which helps linear decoding work. Thus,
in many cases linear models that do not work for encoding do work remarkably well for decoding.
For more information see [219, §2.3] and [58, §3.4].
Another model for decoding is explored in [219, §2.2]. Briefly, one computes the stimulus given
a spike pattern. For instance, what is the average stimulus, given a two spikes separated by 20 ms
with none in between? During extended stimulus presentations, patterns of two spikes, or a single
spike preceded by 25 ms of silence, etc., appear many times. Random sampling of the stimuli that
correspond to these responses gives a response-conditional ensemble for each response. This can
be done in depth, looking at many different responses, which allows a type of decoding that finds
the appropriate stimulus given a response. An examination and explanation of literature for this
approach, some of which is referenced in [219, §2.2], would make a good semester project.
146
a C(t) b C(−t)
t t
c C(t0−t) d
K(t)=C(t0−t), t>0
t t
Figure 5.4: a) The spike-triggered average C(τ ). b) Flipping time to get C(−τ ). c) Shifting C(−τ )
by τ0 to introduce the estimation delay. d) Setting K(τ ) = 0 for t < 0 to enforce causality in
estimation.
5.3 Information theoretic approaches
Information theory gives us a principled way to measure information transmission in bits/second

or bits/spike, in much the same way that an internet connection might have had a data rate of
56 kbits/sec, back when the first draft of these notes began. This allows us to ask interesting
questions which would be hard to formulate without a proper way of quantifying information: Can
we quantify in bits the information that a spike train conveys about the environment? Is there a
maximum possible information transmission rate for a neuron, and how close are real neurons to
being optimal? Is there redundancy between the information transmitted by different neurons? How
much does noise corrupt the information transmitted by a neuron, and can this lost information
be recovered by looking at more than one neuron? Do groups of neurons transmit information
optimally?
Information theory also gives ways of evaluating how well a given encoding/decoding model
captures recorded data. One can compare the information transmission rate of a model with a
model-independent estimate of how much information is actually being transfered. For example,
one can ask how much information is lost by characterising a spike train only by its firing rate
r(t), and whether a model which tries to capture certain structures in the spike train (such as
bursts) serves to transmit information more efficiently. The mathematical theory of information
began with Shannon’s classic paper [239] of 1948, in which entropy was introduced to describe
signal transmission over noisy channels, cf. [240]. (Entropy arose in statistical physics in the
19th century, and had already been characterized by Boltzmann as missing information.) In the
same year Weiner proposed entropy as a useful descriptor for sensory receptors and information
transmission in the nervous system [275, especially Chap. III]. Also see [219, Chap. 3] for more
background on information theory and further examples of its uses in neuroscience.
147
5.3.1 Entropy and missing information
The most common information-theoretic quantity is the entropy S(p) of a probability distribution
p. Specifically, given a discrete random variable X that can take any one of N distinct values Xj
with probabilities pj = p(Xj ) (e.g., Xj = letters in a message, or words in a dictionary), the entropy
of p is defined as
XN
S(p) = −k pj log pj , (5.54)
j=1
where k is a constant which may be eliminated by choosing the base of the logarithm. The con-
vention is to define bits of information by taking logs to the base 2:
N
X X
S(p) = − pj log2 pj or, a little more generally, S(p) = − p(x) log2 p(x) . (5.55)
j=1 x
In this context, entropy measures the average number of yes/no questions needed to determine
which element Xj has been chosen in drawing a sample, assuming that for each trial an oracle
chooses one particular element (with probability p) for us to guess. Intuitively, it is a measure of
(the log of) the effective number of elements, taking into account that p may put greater weight
on some elements than others.
If the distribution p is uniform, its entropy is simply log2 of the number of elements; thus, for a
fair coin S = −2 × (1/2) log2 (1/2) = log2 2 = 1. One bit of information is revealed when the coin
lands: heads, or tails. For a fair six-sided die, we have
6 · ¸
X 1 1 ln 6
S=− log2 = log2 6 = = 2.58496 . . . . (5.56)
6 6 ln 2
j=1
(In the final step we recall the definition of logarithms to the base a, the natural logarithm loge x =
ln x, and the formula loga x = ln x/ ln a, which we shall use again later.) For a fixed number of
elements, non-uniformity in the distribution p reduces entropy: e.g., for a loaded die in which two
numbers come up with probabilities each 1/4 and the remaining four with probabilities 1/8, we
have ½ · ¸ · ¸¾
1 1 1 1 log2 4 + log2 8
S = − 2 × log2 + 4 × log2 = = 2.5 . (5.57)
4 4 8 8 2
Non-uniform probabilities effectively reduce the number of alternatives that are likely to appear.
The definition (5.55) respects the notion of independence introduced at the end of §5.1.2,
p(x, y) = p(x)p(y), in that for any distribution of two independent variables
S[p(x, y)] = S[p(x)] + S[p(y)] : (5.58)
thus, entropies are additive (iteration of (5.58) leads to an analogous expression for N variables).
However, one might ask if other definitions of entropy would suffice. In fact, the additivity constraint
(5.58) defines S(p) uniquely if we also require that (1): S(p) be a continuous function of the
probabilities and (2): the entropy σk of the uniform distribution over k outcomes increases with
k. The following exercises help you prove that under these assumptions, S(p) must be given by
Eqn. (5.55).
148
P
Exercise 39. Use the expression S[p(x, y)] = S[p(x)] + x p(x)S[p(y|x)] to show that the entropies
of independent variables add, and in particular that σkn = nσk . (Recall that p(y|x) denotes the
probability of y, given x.)
Exercise 40. It is a fact that for an arbitrarily large integer n, there exists an integer m(n) such
that 2m(n) ≤ k n < 2m(n)+1 . Use this and the assumption that σk increases with k to show that
σk = log2 k.
P
Exercise 41. Using S[p(x, y)]P = S[p(x)] + x p(x)S[p(y|x)], show that the entropy of a probability
distribution in which pi = ki / i ki (i.e. pi is a rational number for all i) is given by (5.55). Can
you now prove that (5.55) holds for all discrete probability distributions?
Continuous distributions can R be accomodated by replacing the sum with an integral to define
differential entropy S(p) = − p(x) log p(x) dx, where p(x) is a probability density with respect to
some measure dx. For the Gaussian distribution of Eqn. (5.11), this gives
Z ∞
(x − µ)2 (x − µ)2
· ¸ ½ · ¸¾
1 1
S[N (µ, σ)] = − √ exp − log2 √ exp − dx
−∞ 2πσ 2 2σ 2 2πσ 2 2σ 2
1
= . . . = log2 2πeσ 2 bits ,
¡ ¢
(5.59)
2
where . . . represents a calculation that is detailed in [219, Appendix A.9]. This example emphasizes
the fact that entropy measures variability: up to an additive constant, S[N (µ, σ)] = log2 σ. Also
note that the entropy is independent of the mean µ.
In general, entropy is well-defined only for discrete probability distributions over finite numbers
of elements. In particular, differential entropy is not invariant under changes of variables such as
changing the units of x. This is troublesome, since entropy should be a function of the probability
distribution alone, and not of the underlying element space. As noted in [219, §3.1], the entropy
of a voltage distribution should not change by a factor of 3 log2 (10) if one goes from mV to V in
measuring variances! Moreover, differential entropies can be negative. The underlying problem is
that the number of states available to a continuous variable x is infinite, even when the support of
p(x) is finite (p(x) ≡ 0 for x 6∈ [a, b] ⊂ R).
Fortunately, differences of differential entropies are invariant under changes of variables. For
example, comparing Gaussians with variances σ1 and σ2 the difference is
log2 2πeσ12 − log2 2πeσ22
¡ ¢ ¡ ¢ µ ¶
σ1
S[p1 (x)] − S[p2 (x)] = = log2 ; (5.60)
2 σ2
all the constants vanish, as would multiplicative factors due to changes in units. This also reveals
that doubling the standard deviation increases entropy by 1 bit, a consequence of taking logs to the
base 2. Finally, note that mutual information, introduced in §5.3.4, can be expressed as differences
of entropies.
5.3.2 Entropy of spike trains and spike counts
We now illustrate an application to neuroscience by computing the entropy of a spike train,

following [219, §3.1.2]. Suppose that we have a record of length T from a neuron firing at rate r,
149
or, more precisely, with spike count rate r, according to Eqn. (5.4). We further assume that the
record is discretized with precision (bin size) ∆t, small enough so that at most one spike falls in
each bin (cf. §5.1.4). Labeling bins with and without spikes by 1’s and 0’s respectively, a spike
train becomes a binary word of length N = T /∆t containing rT 1’s, and the number of distinct
possible words is given by the binomial coefficient (N choose rT ):
µ ¶
T /∆t (T /∆t)!
= = Ntot . (5.61)
rT (rT )!(T /∆t − rT )!
If each spike train is equally probable, the probability of any given word i ∈ [1, Ntot ] is pi = 1/Ntot ,
from which we may compute the entropy of the spike train
N tot N tot µ ¶ · ¸
X X 1 1 (T /∆t)!
Strain = − pi log2 (pi ) = − log2 = log2
Ntot Ntot (rT )!(T /∆t − rT )!
i=1 i=1
ln[(T /∆t)!] − ln[(rT )!] − ln[(T /∆t − rT )!]
= . (5.62)
ln 2
If N = T /∆t, rT and N − rT are all large (e.g., as T → ∞), we may simplify (5.62) by using
Stirling’s approximation, which holds for large N :
√
N ! ∼ 2π N N +1/2 e−N , or ln(N !) = N [ln(N ) − 1] + O(ln(N )) . (5.63)
Substituting (5.63) into the three factorials of (5.62), neglecting the O(ln(·)) terms, and further
simplification detailed in [219, §3.1.2] yields
T
Strain ≈ − [r∆t ln(r∆t) + (1 − r∆t) ln(1 − r∆t)]
∆t ln 2
T
= − [r∆t log2 (r∆t) + (1 − r∆t) log2 (1 − r∆t)] . (5.64)
∆t
The assumption of ≤ 1 spike/bin implies that r∆t < 1, and so Strain > 0.
Note that S ∝ T , where T is the duration of the spike train, and if the probability r∆t of
observing a spike in any bin is small, we may use the approximation ln(1−r∆t) = −r∆t+O((r∆t)2 )
to estimate the entropy rate of the spike train:
Strain r∆t[1 − ln(r∆t) + O(r∆t)] r[ln(e) − ln(r∆t)] ³ e ´
≈ ≈ = r log2 bits/sec . (5.65)
T ∆t ln 2 ln 2 r∆t
Dividing (5.65) by the mean firing rate r, we obtain the entropy per spike in units of bits/spike.
Fig. 5.5(a) plots this quantity as a function of the timing precision ∆t. Note that the entropy
can exceed 1 bit/spike, especially for small ∆t. The information per bin cannot exceed 1 bit, but
empty bins also provide information. More precisely, interspike intervals (ISIs) are distributed with
mean 1/r sec and each ISI is measured with accuracy ∼ ∆t, and therefore chosen from ∼ 1/r∆t
possibilities. Each ISI, and hence each spike, is thus associated with ∼ log2 (1/r∆t) bits of entropy.
Spike train entropy provides a standard against which to guage the performance of different
coding schemes, including real neural codes. For example, we can compare the expressions (5.64)
and (5.65), which assume that spike times are tracked with precision ∆t, with a scheme in which
we simply count the total number of spikes in a (possibly large) window T , or equivalently, measure
the average firing rate as in Eqn. (5.4). The former case corresponds to a spike timing code, and
150
8 8
Entropy per Spike (bits)
Entropy per Spike (bits)

6 6
4 4
2 2
0 0
0 2 4 6 8 10 0 2 4 6 8 10
Timing Precision (msec) Mean Spike Count
(a) (b)
Figure 5.5: Entropy of a spike train. (a) Entropy per spike vs. bin width ∆t (Eqn (5.65) divided
by r) for r = 50 spikes/sec. (b) Entropy per spike vs mean spike count rT (Eqn. (5.74) divided by
rT ). Figure replotted from [219].
the latter to a spike rate code or simply rate code, in which information is carried by the average
spike count n = rT or its associated rate r. Letting PT [n] denote the probability of finding n spikes
in a window of length T as in §5.1.4, we require the entropy of the spike count distribution:
X
Scount = − PT [n] log2 (PT [n]) . (5.66)
n
Since PT [n] is a probability distribution with mean rT , it must satisfy

X X
PT [n] = 1 and nPT [n] = rT , (5.67)
n n
and although these two constraints do not uniquely specify PT [n], we can use them to find the
distribution that maximizes the entropy. This is the most random description of spike count
probability consistent with our prior knowledge, and it will provide an upper bound for the entropy
of spike counts from any real neuron.
To maximize (5.66) under the constraints (5.67), we seek a critical point of the functional
Ã ! Ã !
X X X
G(PT [n]) = − PT [n] log2 (PT [n]) − λ1 PT [n] − 1 − λ2 nPT [n] − rT , (5.68)
n n n
where λ1 and λ2 are Lagrange multipliers. This is a problem in the calculus of variations, introduc-
tions to which can be found in [274, 101]. Much as in multivariable calculus, to locate a maximum
we take the first and second derivatives to find analogs of the gradient vector and Hessian matrices,
set the former equal to zero to locate a critical point, and check that the latter is negative definite
to verify that this is a (local) maximum. Specifically, to find the critical point (here, a “critical
function”) of (5.68), we perturb the function PT [n] by adding a small variation ǫh[n], differentiate
G(PT [n] + ǫh[n]) w.r.t. ǫ (cf. Eqn. (5.68)), and set ǫ = 0. A long, explicit calculation is displayed
in [219, Appendix 11], in which the second variation is also computed. The expression for the first
variation is found to be:
X ½ 1 + ln(PT [n]) ¾
δG = − + λ1 + λ2 n h[n] . (5.69)
n
ln 2
151
Since h[n] is arbitrary, we may choose it to be non-zero on any integer n and zero on all others,
implying that the expression within the curly brackets must vanish for all values of n, if PT [n] is
to be a critical point. After rearrangement, this yields
1 −λn
PT [n] = e , (5.70)
Z
in which we have subsumed the Lagrange multipliers λ1 and λ2 into the constants Z and λ. We
have determined that an exponential distribution maximizes the spike count entropy.
So far, we have not specified the limits of the sums on n in Eqns. (5.66-5.69). It is possible (no
matter how unlikely) that no spikes fall in the interval [0, T ], so the lower limit is n = 0. However,
we cannot assign an upper bound to n, for although neural firing rates are bounded above, more
and more spikes may be observed as T grows. Moreover, in continuing to solve for Z and λ it is
simpler to allow n → ∞.
P∞ −λn /Z
To find the normalization constant Z, we use the first of the constraints (5.67), n=0 e =
1, and sum the geometric series to obtain
∞ ³
X ń 1
Z= e−λ = . (5.71)
1 − e−λ
n=0
The second constraint can then be written

∞
1 X −λn
ne = rT . (5.72)
Z
n=0
∂ −λn
To sum this series, it is helpful to note that e = −ne−λn , so that we can rewrite (5.72) as
∂λ
follows:
∞ ∞
e−λ
µ ¶
1 X ∂ −λn 1 ∂ X ³ −λ ń 1 ∂ 1
rT = (−1) e =− e =− = , (5.73)
Z ∂λ Z ∂λ Z ∂λ 1 − e−λ 1 − e−λ
n=0 n=0
where (5.71) was used in the last equality. Thus, λ = ln(1 + 1/rT ) (> 0, as required), and via
(5.71) we find that Z = 1 + rT . At last, substituting PT [n] into (5.66) and using almost everything
found above, we obtain
∞ ∞ −λn ∞
Ã !
X 1 X e X ne−λn
max
Scount = − PT [n] log2 (PT [n]) = . . . = ln Z +λ
ln 2 Z Z
n=0 n=0 n=0
µ ¶
ln Z + λrT 1
= = log2 (1 + rT ) + rT log2 1 + bits . (5.74)
ln 2 rT
max by the mean spike count rT , in Fig. 5.5(b) we plot the maximum entropy per
Dividing Scount
spike as a function of rT . Note that it decreases with T , implying that the spike train’s capacity
to carry information declines as the time resolution coarsens. Indeed, as rT → ∞, the available
max /rT → 0. This is as expected, for firing rates must vary
information capacity per spike Scount
with time if they are to carry much information, and we are averaging over a window of length T .
Indeed, if we count spikes in windows with lengths equal to the mean ISI, so that rT = 1, then the
152
max is precisely 2 bits/spike: rate codes can carry more than 1 bit per/spike if spike
entropy in Scount
rates are measured on time scales comparable to ISIs. More significantly, Eqns. (5.64) and (5.74)
reveal that the entropy of the spike train is typically greater than the entropy of the spike count,
because keeping track of individual spike timing allows for a larger set of available states, and thus
more available information.
Exercise 42. Quantify the claim made immediately above by computing the behaviors of Strain and
max as T increases for fixed r and ∆t. In particular, how does S max depend on T as T → ∞?
Scount count
Finally, if we count the numbers of spikes in very small time windows so that rT ≪ 1, the
maximum entropy rate of the spike count becomes
max
Scount ln(1 + rT ) + rT ln(1 + 1/rT ) r[1 − ln(rT ) + O(rT )] ³ e ´
= = ≈ r log2 bits/sec : (5.75)
T T ln 2 ln 2 rT
this is just the entropy rate of the full spike train (5.65), with the bin size ∆t replaced by T .
Counting spikes in small enough windows blurs the distinction between rate and timing codes.
What rate or timing codes actually tell us about sensory inputs depends on the time scales of
the stimuli, and on the range of firing rates available to the relevant neurons. If a signal (in the
natural world or laboratory) varies sufficiently slowly, then it makes sense to divide the spike train
into windows that each contain many spikes. The resulting time-varying spike rate can presumably
report quasi-static descriptors of the signal averaged over each window. In contrast, a timing code
can probe brief transients and sudden events. In this case, the distinction between rate and timing
codes is clear. However, if stimuli vary on time scales comparable to ISIs, then the natural time
windows contain few spikes, and the rate code transforms smoothly into a timing code, as shown
in the calculations above.
In chapter 6 we describe models of evidence accumulation over periods of O(1) sec that rely on
firing rates. In contrast, sound localization via binaural sensing requires precision on the order of
10 − 200 microseconds: 1 − 2 orders of magnitude shorter than an action potential. This strongly
suggests that a timing code is used to transmit interaural time differences (ITDs) between the two
ears. The classical theory of Jeffress [144] posits the use of axonal delay lines from sensory cells in
the two ears. Depending upon the angle of the sound source relative to the head, spikes traveling
along the delay lines coincide at different locations on an array of coincidence detectors. These
convert ITDs into spike rates, transforming the timing code to a place map in which firing rates
of cells signal the probable direction of the source, much as in the orientation-selective neurons
in visual cortex [136] (cf. [277, §§6.2, 7.5]. Neural circuits consistent with Jeffress’ theory have
been found in animals such as the barn owl and chicken, but other mechanisms have also been
identified in birds, mammals and reptiles, particularly regarding details of the rate coding of ITDs.
Acoustic coupling via the mounth cavity can increase ITDs. A distinct system using interaural
level differences (ILDs) also exists, and yet another principle involving monaural frequency analysis
is used for localization in the vertical plane.
Recent reviews of the sound-localization literature appear in [105, 13]. In this field details of
ion channels, dendrites and axons in single cells, timescales of synapses, balances of excitation
and inhibition, and circuit architectures interact with the mechanics of sound waves to form a rich
environment both for nature’s evolutionary solutions, and for theorists’ and modelers’ contributions.
153
Some examples of the latter can be found in [4, 253, 145], and in particular [113] develops an
information-theoretic analysis of optimal coding strategies to explain different mechanisms on the
basis of head size and frequency ranges. The whole area would be a good source for final projects.
Exercise 43. Redo the computations leading to PT [n] above

P without the second constraint of (5.67),
i.e., find the distribution
P that maximizes S(p[n]) = − n p[n] log2 p[n] subject only to the normal-
ization condition n p[n] = 1. What additional assumptions on the sums, if any, are necessary for
your calculations to make sense? Does the distribution that you have found seem reasonable for
describing neural spike counts?
5.3.3 Relative entropy
Relative entropies (also known as the Kullback-Leibler divergence) are defined in a rather broad
context. They can be concisely developed from first principles. Suppose we are playing a game of
‘20 questions’ on the interval [0, 1]: there is a particular number a we need to find by asking an
oracle yes/no questions such as: ‘Is a < 1/3?’. We start off not knowing anything about a, so we
assume a to have uniform probability r over [0, 1]. Suppose further that the actual value of a is
drawn from a probability distribution p. For example if p is a Dirac delta function δ(0) at a = 0,
the oracle will answer our question by saying ‘a < 1/3 with probability 1’. Let us try to quantify
the information gained by asking a set of such questions. We want the relative entropy I(p||r)2
to be solely a function of our prior expectations r(answers) and of the probabilities p(answers)
implied by the oracle’s responses. If these are very different from what was expected, then the
question was informative.
The information gain should have some basic properties. If the oracle always answers what
we expected a priori, then we already know what the oracle knows (p = r) and the information
gain should be I(r||r) = 0. We would also like the order in which we ask questions not to change
how much information is gained. We can start by asking a certain question X, thereby gaining
information I(pX ||rX ). If the answer is X = x and then we proceed to ask another question Y , the
quantity of new information gained will be I(pY |X=x ||rY |X=x P ). Since each answer to X occured
with probability pX , the average information gain of Y is X pX I(pY |X ||rY |X ). To ensure that
the order in which questions are asked does not matter, we require:
X X
I(pX,Y ||rX,Y ) = I(pX ||rX ) + pX I(pY |X ||rY |X ) = I(pY ||rY ) + pY I(pX|Y ||rX|Y ) . (5.76)
X Y
Here pX,Y , pX|Y , pY |X , etc. denote the joint and conditional probabilities, as defined in §5.1.2. We
do not distinguish between the discrete and continuous cases (P [X, Y ] and p(x, y), etc.).
Surprisingly, there is only one continuous function of p and r which satisfies I(r||r) = 0 and
(5.76), up to a multiplicative constant. The multiplicative constant is usually chosen so that one bit
is the information gained by asking a yes/no question for which our prior expectation is 1/2 ‘yes’ 1/2
‘no’ and the actual answer is ‘yes with probability 1’, (this is the most informative yes/no question
possible). The following exercises help you prove that under these assumptions, the relative entropy
2
the double bar || notation is conventional.
154
must have the expression: µ ¶
X pX
I(pX ||rX ) = pX log2 bits . (5.77)
rX
X
Exercise 44. Suppose we are trying to guess a number a ∈ [0, 1], and our prior expectations over
[0, 1] are uniform. Use I(r||r) = 0 and (5.76) to show that if the answer to the question ‘Is a < t2 ?’
is ‘Yes with probability 1’ for some 0 < t < 1, then I(a < t2 ) = 2 I(a < t).
Exercise 45. Show that if the answer to the question ‘Is a < 1/2?’ is ‘Yes with probability 1’,
then this question contains one bit of information. Now show that if the answer to ‘Is a < t?’ is
‘Yes with probability 1’ for some 0 < t ≤ 1, then I(a < t) = log2 1t . [Hint: construct a decreasing
sequence tn converging towards t for which I(a < tn ) is known for all n.]
Exercise 46. Suppose the answer to the question X ‘Is a < t?’ is ‘Yes with probability α’.
By expanding the yes/no question game to guessing (a, b) ∈ [0, 1]2 for a well-chosen probability
distribution p(a, b) and a uniform prior, show that I(pX ||rX ) satisfies (5.77). [Hint: use (5.76)
once more.] Can you generalize to show that (5.77) holds for questions with more than 2 answers
or for more than one question?
One can show that (5.77) continues to define the gain in information for very general
P p and r (as
long
R as p is absolutely continuous with respect to (w.r.t.) r) by replacing sums with integrals
or expectations E. The questions can be much more general than yes/no questions. In fact, it
is not necessary to define questions at all: I(p||r) defines the information someone gains by being
told that the true probability distribution is p when he thought it was r.
Exercise 47. Using Jensen’s inequality3 , show that I(p||r) is always ≥ 0, and only = 0 if p = r.
Use this to show that asking more questions always increases the information gain.
5.3.4 Mutual information
We can now define the mutual information MI(X, Y ) which measures the information shared
between two random variables, and can be written in a number of equivalent ways:
· ¸
p(x, y)
Z Z
MI(X, Y ) = I(pX,Y ||pX pY ) = p(x, y) log2 dx dy
p(x)p(y)
· ¸
p(x|y)
Z Z Z
= p(y) I(pX|Y ||pX ) dy = p(y) p(x|y) log2 dx dy (5.78)
p(x)
· ¸
p(y|x)
Z Z Z
= p(x) I(pY |X ||pY ) dx = p(x) p(y|x) log2 dy dx .
p(y)
The expressions in (5.78) can be derived by use of the conditional entropy

Z
S(x|y) = − p(x|y) log2 p(x|y) dx , (5.79)
3
Jensen’s inequality
P may ! be stated
P as follows. For a real-valued, convex fucntion φ, numbers x1 , x2 , . . . , xn and
j a j xj j φ(aj )
weights aj > 0, φ P ≤ P .
j aj j aj
155
which quantifies the information available, or remaining variability, in x given the observation y.
The gain in information due to observing y, and averaging over many such observations, is therefore
Z
MI(X, Y ) = [S(x) − S(x|y)] dy . (5.80)
Exercise 48. Derive the three integral expressions in (5.78) from Eqns. (5.79-5.80) and the rela-
tions among the joint and conditional probabilities for x and y.
For example, we can calculate the mutual information between the spiking of a neuron and a
stimulus s(t) that leads up to the spike, as we expect spikes to carry information about the stimulus
def
through p(s(t < t0 )|given a spike at t0 ) = ps|spike . This calculation can be done in two different
ways. If r is the average firing rate of the neuron, then the probability of having a spike in an
interval of length ∆t is hri∆t and we can write:
MI(s, spike ∈ [t; t + ∆t]) = hri∆t I(ps|spike ||ps ) + (1 − hri∆t) I(ps|no spike ||ps ) . (5.81)
Note that ps|no spike = ps = the probability distribution of the stimulus. This is because in our
idealization spikes are mathematical points (they take no time to happen), so conditioning on
having no spike is equivalent to not conditioning at all. (In fact the argument that shows this
is subtle, since one must take the limit ∆t → 0.) This implies that the second term of (5.81) is
zero. Dividing through by ∆t gives the mutual information between the stimulus and a spike in
bits/second, and dividing by hri, we obtain the mutual information as
MI(s, spike) = I(ps|spike ||ps ) bits/spike . (5.82)
The expression (5.82) makes a lot of sense, since it is saying that the information gained by
knowing there was a spike is determined by how different the distribution of stimuli is close to a
spike compared to the overall stimulus distribution. In practice, evaluating this quantity requires
an estimate of ps|spike and ps , which may or may not be feasible, depending on the stimulus. We
can do the same calculation by first integrating over stimuli. This yields
Z
MI(s, spike) = p(s) I(pspike|s ||pspike ) ds bits/second . (5.83)
Once again, measuring pspike|s could be difficult for high-dimensional stimuli, but as usual one
can present the same stimulus many times while recording a neuron’s response in order to get an
estimate of pspike|s(t) = r(s(t)). Recalling the definition of ensemble averages in Eqns. (5.5-5.6) of
§5.1.1, we replace pspike|s(t) /pspike with r(t)/hri and divide through by hri to obtain:
T · ¸
1 r(t) r(t)
Z
MI(s, spike) = log2 dt bits/spike . (5.84)
T 0 hri hri
For early visual neurons such as retinal ganglion cells or the H1 neuron in the fly, the amount of
information that each spike carries about the visual scene is in the range of 1 − 6 bits/spike. Note
that this calculation does not take into account correlations between spikes. In practice, spikes that
occur in close succession often convey overlapping pieces of information about the stimulus, and the
overall information transfered by the spike train is consequently less than the sum of informations
156
transfered by each spike separately. This need not always be the case: it is possible for the total
information that several spikes carry about the stimulus to be more than the sum of each separate
spike’s information. For example, In the spike train code of §5.3.2 information per bin cannot
exceed 1 bit, but entropies can exceed 1 bit/spike, because interspike intervals are also monitored.
As noted in §5.3.1, mutual information can be expressed as differences of entropies:

· ¸
p(x, y)
Z Z
MI(X, Y ) = p(x, y) log2 dx dy = S(pX ) + S(pY ) − S(pX,Y )
p(x)p(y)
· ¸
p(x|y)
Z Z Z
= p(y) p(x|y) log2 dx dy = S(pX ) − p(y) S(pX|Y ) dy (5.85)
p(x)
· ¸
p(y|x)
Z Z Z
= p(x) p(y|x) log2 dy dx = S(pY ) − p(x) S(pY |X ) dx .
p(y)
Since entropy differences are well defined for continuous random variables, these expressions may
also be used in such cases.
Exercise 49. For a probability distribution over a finite number of elements, derive an expression
for S(p) as a function of I(p||u), where u is the uniform distribution. What goes wrong when trying
to generalize to an infinite number of elements?
For more examples of these ideas in action in neuroscience, see [26] and [219, Chap. 3].
157
Chapter 6
Stochastic models of decision making
In this final part of the course we continue with phenomenological models, but return to ones
inspired by subsets of neural circuitry within the brain. Such connectionist or firing-rate models
have already appeared in Examples 6 and 7 of §2.3, and in §2.4. They are biophysically motivated
in the sense that the variables and parameters that define them represent quantities such as firing
rates of groups of neurons, synaptic connection strengths, bias currents, etc., but in most cases
a satisfactory derivation from mechanistic H-H type models, is lacking or incomplete. However,
these “high level” models address phenomena on the scale of brain areas (cf. §1.1 and §3.1), they
provide direct links to behavioral observables, and derivations from simplified integrate and fire
spiking models are available, as we describe in §6.2.
In this chapter we focus on decision making or signal detection in simple situations in which
one of two known stimuli appears, partially buried in noise, and the task is to correctly identify
it. A major new aspect, that requires mathematical tools beyond those introduced in chapter 2,
is the inclusion of random inputs. After introducing a simple decision task and a model for it in
§6.1, and describing an optimal decision strategy in §6.3, we therefore sketch necessary background
material on random processes and stochastic differential equations in §6.4. We then return to the
optimal procedure in §§6.5-6.6, showing how neural network models approximate it, and discussing
behavioral experiments that test human ability for optimal performance.
6.1 Two-alternative forced-choice tasks
Drawing primarily from [22], we first review a canonical behavioral experiment: the two-
alternative forced-choise (2AFC) task. Choosing between two alternatives vastly simplifies typical
cognitive tasks, but we focus on it for several reasons. First, it represents many problems faced
by animals in their natural environments (e.g., whether to approach or avoid a novel stimulus).
Pressures for speed and accuracy in such constrained situations may have exerted strong evolution-
ary influences, thereby optimizing neural decision-making mechanisms. Even if optimality has not
been acheived, analyses based on it can provide bounds for possible behaviors. Second, a wealth of
human behavioral data has motivated formal modeling of the dynamics and response outcomes in
158
2AFC tasks (e.g. [166, 212, 41, 214, 248] and see additional references in [22]). Finally, neuroscien-
tists can now directly monitor neuronal dynamics in primates trained to perform the same decision
tasks as human subjects, and assess their relationships to task performance. In many cases, neural
and behavioral data are converging to support mathematical models such as those described in this
section (e.g. [111, 229, 238, 96, 213, 248, 61]).
In a common version of the 2AFC task, subjects must identify the direction of a coherently-
moving subset of dots embedded in a random motion field [29]. The stimulus appears after a
period during which the subject fixates on a central dot, and the response is typically signalled by
button pushes for humans, and eye saccades to left or right for monkeys. Key parameters under
experimenter control include: i) the stimulus difficulty or signal-to-noise ratio (SNR), which can
be manipulated by varying the coherence fraction; ii) whether participants are allowed to respond
freely or responses are cued or deadlined; and, as discussed further below, iii) the delay between
response and the next stimulus.
Models of 2AFC typically make three fundamental assumptions: i) evidence favoring each alter-
native is integrated over time; ii) the process is subject to random fluctuations; and iii) the decision
is made when sufficient evidence has accumulated favoring one alternative over the other. In the
case of fixed duration stimuli, with responses delivered at stimulus offset or signalled by a cue that
follows a delay period, the accumulation period is determined by the experimenter rather than the
subject. We shall shortly describe a leaky accumulator model [259] that formalizes these ideas. To
motivate it, we next sketch examples of neural data.
6.1.1 Neural data from the moving dots task
In addition to the famous orientation-sensitive neurons in visual cortex discovered by Hubel and
Weisel [136], the medial temporal (MT) area contains motion sensitive neurons tuned to particular
directions. Consider a pair of such cells, one preferentially responsive to right-going and the other
to left-going motions, subject to the moving dots stimulus. The width of the tuning curves implies
that the firing rates of the two cells may not differ much on average, and, especially for low coherence
displays, they will be noisy, implying that decisions based on instantaneous activities of MT neurons
would be inaccurate. Accuracy can be improved by integrating noisy signals and comparing the
relative levels of the integrated quantities (this is basically the content of the optimal detection
procedure of the sequential probability ratio test described below). The lateral interparietal area
(LIP) and frontal eye fields (FEF), involved in eye movement control, have been proposed as sites
for such integration.
Fig. 6.1 illustrates this idea via representations of typical firing rates observed in areas MT and
LIP of monkeys trained on the moving dots task. Note that while it is hard to distinguish which
MT signal dominates by direct observation, the (putatively) integrated LIP signals draw apart
clearly as time elapses. In fact, after a transient due to the initial bursts of MT spikes, the activity
of one population (black) declines while the other (gray) grows. This provides evidence for mutual
inhibition, as in the model described in the next section.
159
200
MT left
150 MT right
firing rate
100
50
0
0 0.5 1 1.5 2
time from stimulus onset [s]
30
20
firing rate
10
LIP left
LIP right
0
0 0.5 1 1.5 2
time from stimulus onset [s]
Figure 6.1: Cartoons of neural firing rates vs time from MT (top) and LIP (bottom) neurons
sensitive to left-going (gray) and right-going (black) stimuli, for a left-going stimulus. This figure,
taken from [22], is redrawn schematically based on data from [29, 229, 238].
6.1.2 A neural network model for choosing: leaky competing accumulators
Following [259, 22] and others, we model the accumulation of evidence in 2AFC tasks in LIP by
two competing and mutually inhibitory neural populations, each selectively responsive to sensory
input corresponding to one of the two alternatives. We already met a deterministic version of this
leaky accumulator model (LAM) or leaky completing accumulator model (LCA) in Example 7 and
Exercises 5-6 of §2.3.3. We now describe it in more detail. We consider a specific version of the
LAM which may be written as
τ ẋ1 = −x1 − γfg,β (x2 ) + I1 + c η1 (t) , (6.1a)

τ ẋ2 = −x2 − γfg,β (x1 ) + I2 + c η2 (t) (6.1b)
(cf. [30]), in which the state variables xj (t) denote mean input currents to cells of the jth neural
population and the integration implicit in the differential equations models temporal summation
of dendritic synaptic inputs [104]. The parameter γ sets the strength of mutual inhibition via
population firing rates fg,β (xj (t)), where fg,β denotes the sigmoidal activation (“current-frequency”
or “input-output”) function:
1 1
fg,β (x) = = [1 + tanh (2g(x − β))] , (6.2)
1 + exp (−4g (x − β)) 2
which has maximal slope g at x = β (see Fig. 2.21 in §2.3.3). The stimulus received by each
population has mean Ij and is polluted by noise ηj (t) of strength c, the ηj (t) being independent,
identically distributed (i.i.d.) white noise processes with zero mean E[ηj (t)] = 0 and variance unity.
The time constant τ sets the rate at which neural activities decay in the absence of inputs. In the
context of the moving dots task described above, the noisy inputs come (primarily) from left- and
160
right-sensitive MT cells, and the firing rates fg,β (xj (t) describe activities of the corresponding LIP
“integrator cells.” The noise can also represent other unmodeled inputs to LIP.
In free-response mode the decision is made and response initiated when the firing rate fg,β (xj (t))
of either population first exceeds a preset threshold θj , it being usually assumed that θ1 = θ2 . For
cued responses, the population with greatest firing rate at time t determines the decision. It is also
generally assumed that activities xj decay to zero after response and before the next trial, so that
initial conditions for (6.1) are xj (0) = 0. See [30] for further details, discussions of variable gain
g = g(t), and a second model in which population firing rates themselves are the state variables.
The LAM (6.1) is a nonlinear ODE with additive random inputs: an example of a stochastic
differential equation (SDE). Explicit results are hard to come by for nonlinear stochastic systems,
so we start our analysis by linearizing the sigmoidal response functions at the maximum gain point
x = β. A (partial) justification for this can be found in the proposal of Cohen et al. [50] that
neural circuits should “equilibrate” to work near this point to utilize the resulting sensitivity and
avoid the lower and upper bounds on firing rates (Fig. 2.21). We therefore replace the functions
fg,β (xj (t)) of (6.1) by g xj to obtain:
dx1 = (−kx1 − wx2 + s1 ) dt + c dW1 , (6.3a)

dx2 = (−kx2 − wx1 + s2 ) dt + c dW2 . (6.3b)
Here we have defined new parameters sj = Ij /τ , k = 1/τ , w = g γ/τ , the latter both positive, and
written the LAM in Itō form [88] to emphasise the fact that white noise is a continuous, but not
differentiable, function of time. Hence the terms ηj = dWj /dt in Eqn. (6.1) are not defined, and
these SDEs are better written in the form (6.3) in which the noise terms are represented by Weiner
processes which add independent random increments dWj to each state variable during each time
instant dt.
Definition 9. A Wiener process or Brownian motion on the interval [0, T ] is a random variable
W (t) that depends continuously on t ∈ [0, T ] and satisfies the following conditions [88, 118]:
• W (0) = 0.
√
• For 0 ≤ s < t ≤ T , W (t) − W (s) ∼ t − s N (0, 1), where N (0, 1) is a normal distribution
with zero mean and unit variance. (In words, the increments W (t) − W (s) are normally
distributed with zero mean and unit variance.)
• For 0 ≤ s < t < u < v ≤ T , W (t) − W (s) and W (v) − W (u) are independent.
As noted in §4.2.2, in numerical simulations, each increment of the Wiener process is discretized
for a timestep dt as follows [118]:
√ √
dW ∼ dt N (0, 1) , so that W (t + dt) = W (t) + dt N (0, 1) , (6.4)
where samples are drawn independently from N (0, 1) on each step. Because the normal distribution
is used, the process is also referred to as Gaussian.
161
1.2
0.6
1
0.4
0.8
0.2
0.6
2
2
y
x
0.4 0.2
0.2 0.4
0.6
0
0 0.2 0.4 0.6 0.8 1 0 0.5 1 1.5

x y
1 1
Figure 6.2: Five sample paths for the linearized LAM (6.3) showing the original and decoupled
coordinate systems and illustrating convergence to an attracting line. The thick lines are the
thresholds xi = θ; the attracting line is shown thin solid; the dashed lines are the thresholds ±θ̄ for
y2 ; the star is the initial condition, and dots are final states following threshold crossing. Parameter
values are s1 = 5, s2 = 4, θ = 1, c = 0.1, w = 4, k = 4 (a balanced case). From [189, Fig. 1].
√ √
Making the othonormal change of variables y1 = (x1 + x2 )/ 2, y2 = (x2 − x1 )/ 2 to an
eigenvector basis, equations (6.3) decouple and become:
µ ¶
s1 + s2
dy1 = −(w + k)y1 + √ dt + c dW̃1 , (6.5a)
2
µ ¶
s2 − s1
dy2 = λy2 + √ dt + c dW̃2 , (6.5b)
2
where λ = w − k and the dW̃j are i.i.d. Wiener processes resulting from the orthogonal transfor-
mation. In the absence of noise (c =√0) the linear ODE (6.5a) is easily solved to reveal that y1 (t)
approaches a sink at y1e = (s1 + s2 )/[ 2(w + k)]. As we shall show in §6.4 below (Exercise 53), with
noise present the sample paths of solutions of (6.5a) behave
√ on average in a similar manner, ap-
proaching a normal distribution with mean µ = (s1 +s2 )/[ 2(w+k)] and variance σ = c2 /[2(w+k)]
2
(a stochastic attractor). Thus, if (w + k) is large,

√ most sample paths of (6.5) converge to a neigh-
borhood of an attracting line y1 = (s1 + s2 )/ 2(w + k) before crossing a threshold. Hence we may
reduce to the scalar process (6.5b), with modified thresholds
2θ(w + k) − (s1 + s2 )
θ̄ = ± √ . (6.6)
2(w + k)
Fig. 6.2 illustrates the behavior or typical solutions of the LAM and shows how the coordinate sys-
tems are related. After introducing further examples of SDEs, we shall provide a brief introduction
to their analysis in §6.4 below.
Exercise 50. Verify the coordinate change based on eigenvectors from (6.3) to (6.5) and confirm
that the transformed threshold value (6.6) is correct. Find analytical solutions of (6.3) for the
noise-free (c = 0) case and compare then with numerical simulations of the original nonlinear
system (6.1). Draw phase planes in both cases and explain how the differences influence threshold
crossing behavior. [Remember to allow for the rescaling of time in passing from (6.1) to (6.3)].
162
In the linearized and reduced approximation of Eqn. (6.5b), it is only the difference in evidence
that drives the decision process. Eqn. (6.5b) is an example of an Ornstein-Uhlenbeck (OU) process:
a much studied linear stochastic differential equation. Furthermore, when λ = w−k = 0 we say that
(6.3) is balanced, and (6.5b) becomes a drift-diffusion process (a Wiener process with constant drift
rate). Quantitative comparisons showing fairly good agreements among the nonlinear, linearized,
and reduced systems are given in [30]. However, √ note in Fig. 6.2 that, while solutions are attracted
to the neighborhood of the line y1 = (s1 + s2 )/[ 2(w + k)], they can cross the original thresholds
x1 = θ, x2 = θ on either side of the points y2 = ±θ̄ for the reduced scalar process. The reduced OU
or drift-diffusion process neglects nonlinear effects that limit sensitivity at high and low firing rates,
and it approximates the threshold crossing dynamics for individual populations of left-and right-
sensitive LIP cells by assuming “tight competition” (large values of leak and inhibition). See [22]
for further details.
6.2 Derivation of firing rate equations from spiking neuron models
Before continuing to describe high level models of decision making, and in particular the optimal
sequential probability ratio test and its continuum limit, the drift-diffusion process, we outline how
a two-unit model similar to the leaky competing accumulators of §6.1.2 emerges by reduction of
an integrate-and-fire model of interacting pools of excitatory and inhibitory cortical cells subject
to stochastic spike inputs. Such models were first proposed to investigate working memory, and
in particular to address the question of how activity that persists for several seconds can arise
from individual neurons and synapses that have time constants of at most O(102 ) msec (NMDA
synapses) [7, 36].
Building on these models, Wang [270] performed simulations on a circuit containing three popu-
lations of excitatory (pyramidal type) neurons, two of which are respectively sensitive to each of the
two stimuli, the other being nonselective, and a single population of inhibitory neurons that glob-
ally suppress the excitatory cells. He showed that the firing rates of the two selective populations
behaved in a manner (loosely) similar to the activity levels of the LAM network of Eqn. (6.1) [270]
(specifically, compare Fig. 6.2 above with [270, Fig. 3B]). Subsequently, Wong and Wang [282]
used mean field theory and some additional approximations to derive a reduced, nonlinear two-unit
model, as we describe below. See [61] for a recent review of similar spiking models and their relation
to the leaky accumulators of §6.1.2 and [271] for more on decision making models.
The network in [282] contains 2000 leaky integrate-and-fire neurons divided into four groups:
two stimulus-selective populations each containing 240 excitatory pyramidal cells, a non-selective
pool of 1120 pyramidal cells, and an inhibitory population of 400 interneurons, connected as shown
in Fig. 6.3(a). Recalling §4.2, The state variables are the cellular trans-membrane voltages Vj (t),
the internal synaptic variables SAM P A,j (t), SN M DA,j (t) and SGABA,j (t), and noisy external inputs
SAM P A,ext,j (t) are applied to all 2000 cells.
163
Decision threshold
Firing rate
Cells selective
to stimulus 1
Cells selective
to stimulus 2
0
Time from stimulus onset
(a) (b)
Figure 6.3: (a) The network of [282] contains three populations of excitatory cells; each selective
population responds preferentially to one stimulus, the third is nonselective to both stimuli. A
fourth population of interneurons provides overall inhibition. Excitatory (NMDA- and AMPA-
mediated) and inhibitory (GABAA -mediated) synapses are denoted by filled and open ovals re-
spectively. All cells receive noisy AMPA-mediated background excitation; each cell connects to
every other and selective populations have relatively stronger local recurrent excitation. (b) Stim-
uli excite both selective populations, but inhibition typically suppresses one population, producing
winner-take-all dynamics. A decision is made when the first population crosses a fixed decision
threshold. Figure adapted from [66].
The ODEs describing the subthreshold voltages and the fast synaptic dynamics are
dVj
Cj = −gL (Vj − VL ) + Isyn,j (t), (6.7)
dt
dStype,j Stype,j X
= − + δ(t − tlj ), (6.8)
dt Ttype
l
where type = AM P A, GABA, or AM P A, ext and Ttype is the time constant for that synapse type.
Two ODEs are needed to describe the fast rise and slow fall of N M DA:
dSN M DA,j SN M DA,j
= − + αxj [1 − SN M DA,j ], (6.9)
dt τN M DA,decay
dxj xj X
= − + δ(t − tlj ). (6.10)
dt τN M DA,rise
l
When Vj (t) crosses a threshold Vthresh at time tlj the cell emits a delta function δ(t−tlj ), after which
Vj is instantaneously reset and held at Vreset for an absolute refractory period τref . The sums over
delta functions in (6.8) and (6.10) represents spikes in presynaptic neurons, and, in the absence
of stimuli, the external input currents are generated by Gaussian noise of mean µ and standard
deviation σ:
r
dt dt
dSAM P A,ext,j = −(SAM P A,ext,j − µj ) + σj N (0, 1). (6.11)
τAM P A τAM P A
164
In Eqns. (6.7-6.11) subscripts j index individual cells, and superscripts l index the times tlj at which
the jth cell emits spikes. These external inputs are the sole source of randomness in the system.
The synaptic currents in Eqn. (6.7) are given in terms of the synaptic variables as follows:
Isyn,j (t) = IAM P A,ext,j (t) + IAM P A,rec,j (t) + IN M DA,rec,j (t) + IGABA,rec,j (t),
IAM P A,ext,j (t) = −gAM P A,ext,j (Vj − VE )SAM P A,ext,j (t),
NE
X
IAM P A,rec,j (t) = −gAM P A,rec,j (Vj − VE ) wk,j SAM P A,k (t), (6.12)
k=1
E N
gN M DA,rec,j (Vj − VE ) X
IN M DA,rec,j (t) = − wk,j SN M DA,k (t),
(1 + [M g 2+ ] exp(−0.062Vk )/3.57)
k=1
NI
X
IGABA,rec,j (t) = −gGABA,rec,j (Vj − VI ) wI,j SGABA,k (t).
k=1
Here NE and NI denote the numbers of excitatory and inhibitory cells (1600 and 400, for the
simulations of [282]), the subscript rec (henceforth omitted) indicates recurrent connections within
the network, VE and VI are the glutamatergic (excitatory) and GABAergic (inhibitory) reversal
potentials, ωk,j denotes the strength of the synaptic connection from population k to population
j, gtype,j are synaptic conductances and gL in Eqn. (6.7) denotes the leak conductance. For each
synapse type there are two conductance values: gtype,p for post-synaptic pyramidal neurons and
gtype,I for post-synaptic interneurons. Recurrent excitatory connections within the selective pop-
ulations 1 and 2 have strength w1,1 = w1,1 = w+ , which are normally set greater than all other
connections within and between the excitatory and inhibitory populations.
Stimuli are represented by the addition of terms µ0 (1 ± c′ /100) to the mean inputs µ1 and µ2 to
the two populations of selective cells, with appropriate adjustments to the variances σj . Here µ0 is
the overall strength of the stimulus and c′ denotes its discriminability: specifically, the percentage
of coherently moving dots in the random motion display. (More generally, c′ /100 = 1 for perfectly
clear stimuli with infinite signal-to-noise ratio and c′ /100 = 0 for zero SNR.) To run simulations of
the 2AFC task, the ODEs are given random (small) initial conditions and integrated for a “fixation
period” without stimuli, after which the stimuli µ0 (1±c′ /100) are applied and integration continues
until the collective firing rate of any of the three excitatory populations exceeds a fixed decision
threshold: see Fig. 6.3(b). If c′ > 0 (resp c′ < 0) and population 1 (resp. 2) first crosses threshold,
then the response is correct; otherwise, it is an error, and if the threshold is crossed prior to stimulus
onset, a premature response is logged. Further details, along with parameter values appropriate
for modeling area LIP, appear in [282] and the supplementary materials to that paper; also see
[66, 67].
Using the all-to-all coupling structure and eliminating irrelevant Stype,j ’s (excitatory neurons
have no GABA-synapses, inhibitory neurons have no AM P A- or N M DA-synapses), Eqns. (6.7-
6.11) still constitute a 9200-dimensional stochastic dynamical system that is analytically intractible
and computationally intensive to simulate. Following [282] we now sketch a sequence of low-
dimensional reductions that preserve key physiological detail, permit bifurcation analyses, and
relate the spiking network to the leaky accumulators described in §6.1.2.
We first reduce to a four-population model using a mean field approach from statistical physics
165
[282], simplifying the self-consistency calculations of [36, 216] by employing a fixed average voltage
V̄ = (Vreset +Vthresh )/2 to estimate synaptic currents that enter each of the four cell populations via
terms Jtype,j = −gtype,j (V̄ − Vtype ). These are then multiplied by the appropriate number NE or NI
of presynaptic cells in each population and by an averaged synaptic variable Stype,j , and summed
to create the incoming synaptic input currents to each population. Each term in the current to
postsynaptic population j therefore takes the form
Itype,j (t) = NE/I Jtype,j ωk,j Stype,k (t). (6.13)
Individual neuron voltages are replaced by averaged firing rates determined by frequency-current
(f − I) relationships, analogous to the input-output function f (·) of Eqn. (6.2) (see §4.2.2 above).
This yields an 11-dimensional system described by 4 firing rates νj (t), one inhibitory population-
averaged synaptic variable SGABA (t), and two such variables SAM P A,j (t) and SN M DA,j (t) for each
excitatory population (6 in all). The seven synaptic equations take the forms1
= − + 0.641(1 − SN M DA,j )νj , (6.14)
dt TN M DA
dSAM P A,j SAM P A,j
= − + νj , (6.15)
dt TAM P A
dSGABA,I SGABA,I
= − + νI , (6.16)
dt TGABA
where j = 1, 2, 3 for the excitatory populations, TN M DA = TN M DA,decay , and the four firing rates
obey
dνj −(νj − φj (Isyn,j ))
= , (6.17)
dt Tj
with j = 1, 2, 3 and j = I for the inhibitory population. Thus, the four population model is
described by 11 differential equations: a remarkable reduction from the 9,200 ODEs and SDEs of
the full network.2
The f − I relationships φj (Isyn ) of Eqn. (6.17) for the populations 1, 2, 3 and I are estimated by
computing the firing rate of a leaky integrate-and-fire neuron (cf. §4.2.2) from the first passage time
of the Ornstein-Uhlenbeck process (6.7), with additive Gaussian noise, across a single threshold [217,
9, 216]:
" Z Vthresh −Vss #−1
√ σV
x2
φ(Isyn ) = τref + τm π e [1 + erf(x)] dx , (6.18)
Vreset −Vss
σV
where τm = Cj /gL , Vss = VL + Isyn /gL and σV is the standard deviation in the membrane potential
due to the current fluctuations. The integral of (6.18) cannot be evaluated explicitly, and the
following approximation, due to [2], was employed in [282]:
cE/I (Isyn − IE/I )

φ(Isyn ) ≈ . (6.19)
1 − exp[−γE/I (cE/I (Isyn − IE/I ))]
1
The factor 0.641 in Eqn. (6.14) appears as a function value F (ψ(ri )) in [282, Eqn. (6)] corresponding to a steady
state firing rate, which appears in a subsequent approximation step below.
2
The number 7200 that appears in [282, Fig. 1] evidently excludes the 2000 uncoupled SDEs (6.11) that determine
the external inputs. Indeed, since these are stable OU processes one can in fact draw independent samples from the
equilibrium distribution of sample paths for the OU process (see §6.4.2, below).
166
In Eqn. (6.19) the subscripts E/I denote excitatory and inhibitory neurons, Isyn is the total synaptic
input to a single cell, cE/I is a gain factor and γE/I determined how sharply the f − I curve turns
from φ(Isyn ) ≈ 0 at low currents to an approximately linear relationship φ(Isyn ) ≈ cE/I (Isyn −IE/I )
at high currents. In [282] the parameters cE/I , γE/I and IE/I are chosen separately for the excitatory
and inhibitory neurons, and the authors remark that the fit to Eqn. (6.18) is accurate for large
input noise and moderate firing rates. This is appropriate for the decision making application, since
decision thresholds are observed to lie at around 40 − 50 Hz.
The next simplification adopted in [282] uses the fact that, as observed in simulations of the
full network, the firing rate of the nonselective cells ν3 (t) does not vary much across a trial. It
is therefore assumed to remain constant, thereby removing the variables SN M DA,3 , SAM P A,3 and
ν3 and replacing them by suitable constants. The firing rate of the inhibitory interneurons is also
approximated by linearizing the f − I curve φI for that population in the appropriate range of
8 − 15 Hz (again obtained from simulations):
cI (Isyn − II )
φI (Isyn ) = + r0 , (6.20)
γ2
with suitable choices for γ2 and r0 . This approximation makes it possible to compute a self-
consistent firing rate for the inhibitory population in the presence of self-inhibition. The fact that
the time constants for GABA and AM P A are short compared to that for N M DA (TGABA = 5 ms,
TAM P A = 2 ms ≪ TN M DA = 100 ms) then justifies use of the quasi-equilibrium approximations
SGABA,I = TGABA νI and SAM P A,j = TAM P A νj , j = 1, 2 from Eqns. (6.15-6.16). (Recall that a
similar approximation, based on separation of times scales [147, 107] was made in reducing the
Hodgkin-Huxley system to two variables in §3.4.) Finally we assume that the time constants
T1 , T2 in Eqns. (6.17) are also much faster than TN M DA , so that the firing rates also rapidly reach
quasiequilibria: νj = φj (Isyn,j ), j = 1, 2. Thus we set
SGABA,I = TGABA νI = TGABA φI and SAM P A,j = TAM P A νj = TAM P A φj , j = 1, 2, (6.21)
and only SN M DA,1 and SN M DA,2 remain as dynamical variables, evolving according to Eqn. (6.14)
with νj = φj :
=− + 0.641(1 − SN M DA,j )φj (Isyn,j ), j = 1, 2. (6.22)
dt TN M DA
The synaptic currents Isyn,j in Eqns. (6.22) are determined from Eqns. (6.12) via Eqn. (6.13),
but to close this calculation the GABA input currents and AM P A and N M DA currents due to
population 3 must be replaced by self-consistently computed quasiequilibrium values, since the
quantities that determine them have been removed as variables. Replacing the subscripts N M DA
and AM P A by N and A for short, we therefore have
Isyn,1 = JN,11 SN,1 − JN,21 SN,2 + JA,11 TA φ1 (Isyn,1 ) − JA,21 TA φ2 (Isyn,2 )

+Isc,1 + Istim,1 + Inoise,1 , (6.23)
Isyn,2 = −JN,12 SN,1 + JN,22 SN,2 − JA,12 TA φ1 (Isyn,1 ) + JA,22 TA φ2 (Isyn,2 )
+Isc,2 + Istim,2 + Inoise,2 , (6.24)
where JN,ij and JA,ij denote the net self (i = j) and cross (i 6= j) connection strengths between
populations 1 and 2, Isc,j , j = 1, 2 are the self-consistent internal currents due to AM P A, GABA
167
and population 3, Istim,j = µ(1 ± E) are the stimuli, and Inoise,j represent additive Gaussian noise.
The negative signs before the cross terms JN,12 , JA,21 etc. in Eqns. (6.23-6.24) indicate that the net
interaction between populations 1 and 2 is inhibitory, because global inhibition from interneurons
excited by each population dominates that population’s self excitation.
While equilibria of Eqns. (6.22) can be found by setting their right hand sides to zero, the fact
that Eqns. (6.23-6.24) contain nonlinear terms of the form φ1 (Isyn,j ) implies that the Isyn,j ’s would
have to be solved for recursively at each step in the integration process to find a time-dependent
solution. In [282] this was overcome by finding effective nonlinear f − I relationships of the forms
φ1 = H(y1 , y2 ) and φ2 = H(y2 , y1 ), where

y1 = JN,11 SN,1 − JN,21 SN,2 + Isc,1 + Istim,1 + Inoise,1 , (6.25)
y2 = −JN,12 SN,1 + JN,22 SN,2 + Isc,2 + Istim,2 + Inoise,2 ,
as described in the supplementary materials to [282]. Finally, φ1 (Isyn,1 ) and φ2 (Isyn,2 ) in (6.22) were
respectively replaced by H(y1 , y2 ) and H(y2 , y1 ), thus producing a well-defined pair of ODEs. From
these one can derive bifurcation diagrams, indicating subcritical pitchforks and tristability, and
associated phase portraits, as shown in [282, Figs. 4-5 and 12-13], samples of which are reproduced
here in Figs. 6.4-6.5. The reader should compare the phase portraits of Fig. 6.4 with those of
Example 7, Figs. 2.22-2.23 in §2.3.3.
Referring to Fig. 6.5, for small recurrent connection strengths w+ (Region I) a unique branch
of low amplitude stable states exists, but as w+ increases a pair of asymmetric stable states ap-
pears over a small range of stimulus strengths µ0 , bounded by pitchfork bifurcations (Region II).
Branches of saddle-node bifurcations emerge from degenerate pitchforks, which become subcritical
in Region III, producing ranges in µ0 over which 5 equilibria exist, 3 stable and 2 unstable, as well
as the bistable region illustrated in the phase portraits of Figs 6.4(b,c). For sufficiently high w+ ,
after the order of the two pitchfork bifurcations changes on the µ0 axis, 4 stable equilibria and 5
unstable equilibria coexist in the central region of the bifurcation diagrams of region IV, providing
phase portraits that are considerably more complex than those of the Usher-McClelland model
of Example 7 (not shown here in bifurcation diagram IV, but see the analogous diagrams in [67,
Figs. 5-7, 11 and 13]).
The phase portrait of Fig. 6.4(a) has 3 stable equilibria, one with both SN,1 and SN,2 low,
and the others with SN,1 ≫ SN,2 and SN,2 ≫ SN,1 . We refer to these as the low-low, high-low
and low-high attractors; their domains of attraction are separated by stable manifolds of the two
saddle points that lie between them. The latter attractors represent working memory states that
can store decisions for alternatives 1 and 2 during a delay period, while the low-low attractor
corresponds to an undecided or pre-decision state. At stimulus onset the low-low state typically
becomes unstable or disappears (see Figs. 6.4(b,c,d)), causing the solution to jump to the high-low
or low-high attractor; when a stable low-low state persists, its domain of attraction shrinks and
noise causes the jump. If thresholds are applied, they would cross the phase plane between the
low-low and high-low and low-high states, e.g. at SN,1 = 0.55 and SN,2 = 0.55. In this context,
the fourth attractor that occurs for large w+ corresponds to a high-high state that lies above
both thresholds and represents an impulsive choice. See [282] and [67] for further information and
interpretations.
Note that the regions labeled “bistable” in Fig. 6.5(top), bounded by the saddle-node and pitch-
168
Figure 6.4: Phase portraits of the reduced noise-free system (6.22) with effective f − I relationships
(6.25); nullclines shown as bold red and green curves. (a): no stimulus, µ0 = 0; (b,c,d): µ0 > 0 and
c′ = 6.4% (b), 51.2% (c) and 100% (d). Note reflection-symmetry about S1 = S2 and tristability
in (a); bistability and increasing symmetry breaking as c′ increases through (b,c), and a unique
global attractor in (d). In (b-d) the upper left sink corresponds to the correct choice, and blue
and red sample paths in (b) respectively indicate correct and error choices for the sysytem with
small additive white noise. Reproduced from [282, Fig. 5], which uses the notation Sj = SN,j (=
SN M DA,j ).
fork bifurcation curves, are in fact tristable since three stable states (low-low, low-high and high-low)
coexist, as indicated by bifurcation digram III; the region labeled “competition” is bistable. See
the phase portraits of Fig. 6.4(a) and (b) for examples.
More recently the effects of neuromodulation by norepinephrine have been studied using a mod-
ified version of the Wong-Wang model [66], and mean field reduction of this circuit to a four- and
ultimately a two-population model has been carried out [67], although the final reduction to two
variables as above was not employed, since effective relationships of the form (6.25) could not be
found over the entire parameter space necessary to model neuromodulation. (Nonetheless, null-
clines for 2-dimensional projected phase planes were computed and used to elucidate the dynamics,
and bifurcation diagrams analogous to those of Fig. 6.5 were computed.) Local reductions of a
more rigorous nature to 1-dimensional systems have also been done from similar models in the
neighborhood of a pitchfork bifurcation [227], using center manifold theory. For reviews of decision
making and spiking models of neural circuits, see [271, 61].
169
1.9
1.9
IV
Recurrent strength, w+
1.8
1.8
Bistability Bistability
1.7
1.7 III
Competition
1.6
1.6
II
I Monostability
1.5
1.5
00 20
20 40
40 60
60 80
80
Stimulus strength, µ0 (Hz)
I 1
w = 1.57
II 1
w = 1.59
0.8 + 0.8 +
0.6 0.6
S S as s
0.4 0.4 s
s
0.2 0.2 s as
0 0
0 20 40 60 80 0 20 40 60 80
µ 0 (Hz) µ0 (Hz)
III 1
w = 1.7 IV 1
w = 1.87
+ +
0.8 0.8 as
as s
0.6 s 0.6
S s S
0.4 0.4
s
0.2 0.2
s as s as
0 0
0 20 40 60 80 -100 -50 0 50 100
µ0 (Hz) µ 0 (Hz)
Figure 6.5: A bifurcation set (top) and bifurcation diagrams (I-IV below) for the reduced system
(6.22) with effective f − I relationships (6.25), as stimulus strength µ0 and recurrent connection
strength w+ vary, with c′ = 0; red and blue curves respectively denote saddle-node and pitchfork
bifurcations. In (I-IV) S and AS respectively denote symmetric (SN,1 = SN,2 ) and asymmetric
(SN,1 6= SN,2 ) states, which occur in reflection-symmetric pairs for c′ = 0. Solid and dashed curves
respectively denote stable and unstable equilibria. Reproduced from [282, Fig. 12]. See text for
discussion.
6.3 The best way to choose between two noisy signals
We now depart from the neural evidence and accumulator models for a while, to describe a
discrete procedure from mathematical statistics and signal analysis that delivers decisions in binary
170
choice situations that are optimal in a precise sense. Remarkably, a continuum limit of this method
turns out to be a drift-diffusion process of the type that emerged in §6.1.2 above. First we outline
a standard procedure for testing samples of fixed size.
Suppose we wish to decide whether a random sequence Y = y1 , y2 , · · · , yN of N independent

observations is drawn from the probability distribution p0 (y) (hypothesis H0 ) or p1 (y) (hypothesis
H1 ). Neyman and Pearson [195] showed that the optimal procedure is to calculate the following
ratio:
p1N p1 (y1 )p1 (y2 ) · · · p1 (yN )
= , (6.26)
p0N p0 (y1 )p0 (y2 ) · · · p0 (yN )
and to accept hypothesis H0 (resp., H1 ) if pp1N 0N
< K (resp., pp1N
0N
≥ K), where K is a constant
determined by the desired level of accuracy for one of the hypotheses. Setting K < 1 increases
the reliability of the decision to accept H0 , at the expense of reducing the reliability of accepting
H1 . Since the piN , i = 0, 1 are the probabilities of Y occurring under the hypotheses Hi , setting
K = 1 reduces the procedure to simply determining which hypothesis is most likely (determining
maximum likelihood ). In this case, if the hypotheses occur with equal probability, the procedure is
optimal in that it guarantees the smallest overall error rate.
6.3.1 The sequential probability ratio test
The Neyman-Pearson procedure is optimal for fixed sequence sizes N . When this requirement is
relaxed, the optimal procedure is the sequential probability ratio test (SPRT), in which observations
continue as long as the running quotient pp1n
0n
(defined as in (6.26) with N = n) satisfies the inequality
p1n
A0 < < A1 , (6.27)
p0n
where A0 < 1 < A1 are two given constant thresholds. The hypothesis H0 (resp., H1 ) is accepted
at step n as soon as pp0n
1n
≤ A0 (resp., pp1n
0n
≥ A1 ). The SPRT was independently developed during
World War II by Abraham Wald [263], who was introduced to the problem by Milton Friedman
and W. Allen Wallis while they were members of the Statistical Research Group at Columbia
University [266], and by George Barnard [17, 62] in the U.K. Alan Turing and his coworkers at
Bletchley Park employed the SPRT in breaking the Enigma code used by the German navy in
World War II [100, 98]3 .
The SPRT is optimal in the following sense. Let P (rej Hi |Hi ) be the probability that hypothesis
Hi is true but rejected, i = 0, 1, and let Ei (N ) be the expected value for the number of observations
required to reach a decision when hypothesis Hi is true, i = 0, 1.
Theorem 5. Among all fixed sample or sequential tests for which
P (rej Hi |Hi ) ≤ αi , i = 0, 1,
and for which E0 (N ) and E1 (N ) are finite, the SPRT with error probabilities P (rej Hi |Hi ) = αi ,
i = 0, 1, minimizes both E0 (N ) and E1 (N ).
3
Both Wald and Turing died prematurely: Wald in a plane crash en route to a scientific presentation in India in
1950, and Turing (apparently) committed suicide in 1954.
171
This theorem was first proved in [264]; for a simpler proof, see [170].
The thresholds A0 and A1 in the SPRT are related to the error rates α0 and α1 as follows [263,
170]. Consider the set C1 of n-length sequences Y such that the SPRT chooses H1 when Y occurs.
That is, for any Y ∈ C1 ,
p1 (y1 )p1 (y2 ) · · · p1 (yn ) ≥ A1 p0 (y1 )p0 (y2 ) · · · p0 (yn ).
Integrating this inequality over C1 , we get
p1 (C1 ) ≥ A1 p0 (C1 ), (6.28)
where pj (C1 ) is the probability of making choice 1 given that hypothesis Hj is true. By definition,
p1 (C1 ) ≥ 1 − α1 and p1 (C0 ) ≤ α0 , so that
1 − α1
1 − α1 ≥ A1 α0 ⇒ A1 ≤ .
α0
Similarly,
α1
α1 ≤ A0 (1 − α0 ) ⇒ A0 ≥ .
1 − α0
The inequalities fail to be equalities because it is possible to overshoot the boundaries A0 or A1 .
However, in practice, there is typically little penalty in assuming equality [263, 170]:
α1 1 − α1
A0 = , A1 = . (6.29)
1 − α0 α0
Note that, when using an SPRT with A0 and A1 defined in this way, the condition that A0 <
1 < A1 in the proof becomes
α1 1 − α1
<1< . (6.30)
1 − α0 α0
This requires that α1 < 1 − α0 (which also implies α0 < 1 − α1 ). Now, 1 − α0 is the probability of
choosing H0 when H0 is true, so (6.30) implies that the probability of choosing H0 when H1 is true
is less than the probability of choosing H0 when H0 is true. Similarly, the probability of choosing
H1 when H0 is true should be less than the probability of choosing H1 when H1 is true. These are
reasonable restrictions for a decision making procedure.
Wald [263] also gives approximate expressions for the expected numbers of observations that
may be written
³ ´ ³ ´
α1 1−α1
α1 log 1−α 0
+ (1 − α1 ) log α0
E1 (N ) ≈ ³ ³ ´´ (6.31)
p1 (y)
E1 log p0 (y)
³ ´ ³ ´
α1 1−α1
(1 − α0 ) log 1−α 0
+ α 0 log α0
E0 (N ) ≈ ³ ³ ´´ , (6.32)
p1 (y)
E0 log p0 (y)
³ ³ ´´
where Ei log pp10 (y)
(y) is the expected value of the logarithmic likelihood when Hi is true, i = 0, 1.
In particular, to obtain the same accuracy for both alternatives, we set α0 = α1 = ER and
172
Eqns. (6.31-6.32) become:
(1 − 2ER) log 1−ER
¡ ¢
ER
E(N ) ≈ ³ ³ ´´ , (6.33)
E log pp10 (y)
(y)
where ER henceforth denotes the common error rate.
6.3.2 Random walks and the continuum limit of SPRT
Laming models the 2AFC by supposing that decisions are made based on accumulation of
information [166]. In each trial the subject makes a series of brief observations of the stimulus (S0
or S1 ) represented by the random sequence y1 , y2 , · · · , yn . The increment of information gained
from (independent) observation yr is defined to be
µ ¶
p1 (yr )
δIr = log , (6.34)
p0 (yr )
where pi (y) is the probability distribution for y given that stimulus Si was presented, i = 0, 1.
(Implicitly, the subject has some internal representation of p0 (y) and p1 (y).) At the nth observation
(cf. Eqn. (6.26)), the total information accumulated is
n n µ ¶ µ ¶
X X p1 (yr ) p1n
In = δIr = log = log : (6.35)
p0 (yr ) p0n
r=1 r=1
this is the log likelihood ratio [96]. Under the free response protocol observations continue as long
as I0 < In < I1 , where I0 and I1 are the thresholds in the logarithmic space, and a choice is made
at step n if In ≤ I0 or In ≥ I1 . Hence this log likelihood formulation is equivalent to making
decisions using the SPRT with I0 = log A0 and I1 = log A1 . For example, if the desired error rate
(ER) is ER = α0 = α1 , which is reasonable if the signals S0 and S1 are equally salient, from (6.29)
we take µ ¶ µ ¶
ER 1 − ER
I0 = log < 0, I1 = log = −I0 > 0, (6.36)
1 − ER ER
cf. [166]. (The signs follow from the assumed inequality (6.30).)
Thus, from (6.35), in logarithmic variables the trajectory In is a discrete-time, biased random
walk with initial condition zero: a new increment of information arrives, and the trajectory is
updated, as the timestep advances from n → n + 1 (recall that the increments δIr are assumed to
be independent and identically distributed). Hereafter we treat the continuous-time limit I(t) of
this process, in which infinitesimal increments of information arrive at each moment in time. This
limit must be taken with some care in order to preserve the variability present in (6.35). Up to an
unimportant scale factor between timesteps n and the continuous time t, the limiting procedure
is as follows. Let the δIr have mean m and variance D2 (assumed finite). Then define the family
(indexed by N = 1, 2, ...) of random functions of t ∈ [0, T ], where T is some large time, as follows:
k k
1 X 1 X
I N (t) = √ (δIr − m) + δIr where k = ⌊N t/T ⌋ . (6.37)
N r=1 N
r=1
Here, ⌊N t/T ⌋ is √
the largest integer smaller than N t/T . Note that the first term of (6.37) is
normalized by 1/ N and the second by 1/N , reflecting the different rates at which fluctuations
173
and means accumulate as random increments are summed. For any N , I N (t) has mean m⌊t/T ⌋
and variance D2 ⌊t/T ⌋; e.g., from (6.35), In has mean mn and variance D2 n. Furthermore, the
Donsker invariance principle (see [21, Thm. 37.8]), together with the law of large numbers, implies
that as N → ∞
I N (t) ⇒ D W (t) + mt ≡ I(t) , (6.38)
where W (·) is a Wiener process and the random functions I N (·) converge in the sense of distribu-
tions.
Eqn. (6.38) implies that the limiting process I(t) satisfies the stochastic differential equation
(SDE)
dI = m dt + D dW , I(0) = 0 , (6.39)
with thresholds I0 < 0 < I1 . The drift m and variance D of the δIr and hence of (6.39) depend
upon the distributions pi (y), cf. (6.34). For example, in the case of Gaussians
1 2 /(2σ 2 ) 1 2 /(2σ 2 )
p0 (y) = √ e−(y−µ0 ) , p1 (y) = √ e−(y−µ1 ) , (6.40)
2πσ 2 2πσ 2
with µ1 > µ0 , we have
µ ¶ µ ¶
p1 (yr ) µ1 − µ0 µ0 + µ1
δIr = log = yr − , (6.41)
p0 (yr ) σ2 2
and if Si is presented, the expected value of yr is E(yr ) = µi , and the variance is V ar(yr ) = σ 2 .
Thus, taking expectations and substituting in (6.41), we obtain
(µ1 − µ0 )2
E(δIr ) = ± = m, (6.42)
2σ 2
(the + applies if S1 is presented, the − if S0 ), and in both cases
(µ1 − µ0 )2
V ar(δIr ) = D2 = , (6.43)
σ2
cf. [97, 98]. Thus the limiting SDE is
·µ ¶ ¸
µ1 − µ0 µ1 − µ0
dI = dt + σ dW , I(0) = 0 . (6.44)
σ2 2
If each incremental observation δIr is composed of many subobservations (for example, from dif-
ferent regions of the visual field, or from large populations of neurons), this Gaussian assumption
is justified by the Central Limit Theorem.
In the particular case of (6.41) in which µ1 = −µ0 = A, σ = c, appropriate to tasks such as

the moving dots task in which the alternative stimuli are of equal coherence, the simplified form of
(6.41) implies that the accumulating information In is simply a scaled version of the running total
of observations yr :
n n
2A X 2A X def 2A
δIr = 2 yr ⇒ In = δIr = 2 yr = 2 yn . (6.45)
c c c
r=1 r=1
Assuming without loss of generality that S1 is presented, the yr have mean A and variance c2 , so
that, in the continuous time limit analogous to (6.37), yn converges to y(t), which satisfies the SDE:
dy = ±A dt + c dW ; y(0) = 0 . (6.46)
174
The “logarithmic” SPRT involving observations δIr is therefore equivalent to solving the first
c2
passage problem defined by (6.46) with thresholds y = ±z = ± 2A I1 , as in (6.36). We shall refer to
(6.46) as a drift-diffusion (DD) process or drift-diffusion model (DDM). In computing first passage
times and error rates below, we may appeal to reflection symmetry about y = 0 and assume without
loss of generality that A ≥ 0 so that stimulus S1 is present and alternative 1 is correct.
2.5
1.5
0.5
x
-0.5
-1
-1.5
0 0.5 1 1.5 2 2.5
time
Figure 6.6: Left: three sample paths of the drift-diffusion process (6.46) along with histograms
of first passage times for correct and incorrect responses obtained from 100,000 trials. Parameter
values A = c = z = 1. Right: typical sample paths for the states xj of the leaky accumulator model
of §6.1.2. From [22, Fig. 2]. Sample paths were computed by the Euler-Maruyama method with
step size ∆t = 0.01 (§6.4.4). NOTE: Interpret x 7→ y on left panel and yj 7→ xj on right.
Fig. 6.6 illustrates typical sample paths of the DDM and the resulting first passage time his-
tograms that emerge from threshold crossing for both correct and incorrect responses. It also
shows the states of the LAM of §6.1.2 to illustrate that the difference x2 (t) − x1 (t) behaves like
y(t), after an initial transient during which both values rise at similar rates (cf. the phase portraits
of Fig. 6.2). The constant drift SDE (6.39) or (6.46) is a particular limit of the discrete random
walk occurring in the SPRT and Neyman-Pearson tests. In the following sections we analyse these
stochastic processes in both free response and time-constrained contexts. First we need some facts
about SDEs that were promised in §6.1.2.
6.4 Introduction to stochastic differential equations
In this section we discuss scalar SDEs; specifically, initial value problems of the form
dx = f (x, t) dt + g(x, t) dW , x(t0 ) = x0 , (6.47)
which include the constant drift-diffusion process (6.46) derived above from the SPRT, and the
Ornstein-Uhlenbeck (OU) processes (6.5) that emerged from the linearized LAM model of §6.1.2.
Multidimensional generalizations also exist, but in practice explicit results can typically be obtained
only in the scalar case, and often only for linear SDEs such as Eqns. (6.5).
175
In Eqn. (6.47) f (x, t) and g(x, t) are respectively called the drift and diffusion terms (or coef-
ficients), and dW denotes Weiner increments (Definition 9, §6.1.2). We may rewrite the SDE in
integral form: Z t Z t
x(t) = x0 + f (x(s), s) dt + g(x, s) dW (s) , (6.48)
t0 t0
where the first integral is the usual one from calculus which (implicitly) defines the unique solution
of the deterministic ODE ẋ = f (x, t) . The second, stochastic integral can be interpreted as the
continuum limit of a discrete sum, much as in the Riemann sum used in Euler’s method (§§2.2.1-
2.2.2; Fig. 2.11):
N
X N
X
Istoch = g(x(tj ), tj ) dW (tj ) = g(x(τj ), τj ) [W (tj ) − W (tj−1 )] , with τj ∈ [tj−1 , tj ] . (6.49)
j=0 j=1
The sum (6.49) approaches the second integral of (6.48) as δt → 0 and N → ∞. (For simplicity
we may take equal increments tj − tj−1 = δt and choose δt = (t − t0 )/N .) However, unlike the
deterministic case, the choice of time instants τj at which the function g(x(t), t) is evaluated can
affect the result. We shall adopt the choice τj = tj−1 , which corresponds to the Itō stochastic
integral; (choosing τj = (tj − tj−1 )/2 yields the Stratonovich integral) [88, §4.2], cf. [75, §10.1].
If the function g in Eqn. (6.49) is constant (i.e., independent of x and t, as for many examples
considered below), then the choice of τj makes no difference.
Solutions of SDEs are random processes, and so we often want to average over many realizations
to obtain probabilistic descriptions such as the probability density of solutions of (6.47). To derive
these, we need an important result from Itō’s stochastic calculus.
6.4.1 Change of variables: Itō’s lemma
A primary difference between regular calculus and stochastic calculus occurs in the chain rules
for differentiation. Recall from calculus that if y(t) = h(x(t)) for scalar functions, then dy dh dx
dt = dx dt .
Itō’s lemma, which we will formally derive below, shows that additional higher order terms are
present in the analogue of the chain rule for stochastic calculus.
We shall illustrate the stochastic chain rule in connection with the SDE (6.47). First, recall the
following property of a Wiener process
√
dW = W (t + dt) − W (t) = dt N (0, 1) , (6.50)
implied by the second condition of Definition 9 (§6.1.2), which in turn implies that
√
dW = O( dt) , dW 2 = O(dt) and E[(dW )2 ] = dt. (6.51)
Given a twice-differentiable function h(x(t)), we ask what SDE does h obey? Expanding to second
order in a Taylor series we find:
1
dh(x) = h(x + dx) − h(x) = h′ (x)dx + h′′ (x)dx2 + . . .
2
1
= h (x)[f (x, t) dt + g(x, t) dW ] + h′′ (x)[f (x, t) dt + g(x, t) dW ]2 + . . .
′
2
1
= h′ (x)f (x, t) dt + h′ (x)g(x, t) dW + O(dt3/2 ) + h′′ (x)g 2 (x, t) dW 2 .
2
176
In the third step we substitute (6.47) and then use (6.51) to estimate orders of magnitude of dW
and (dW )2 in the fourth step.
Finally, we let dt → 0 and appeal to the fact that dW 2 → E[(dW )2 ] = dt as dt → 0 in Itō

calculus (this is essentially due to the appearance of dW only in stochastic integrals like (6.48):
see [88, §§4.2.5 and 4.3.3]). This yields Itō’s lemma:
· ¸
1 ′′
′
dh(x(t)) = h (x(t))f (x(t), t) + h (x(t))g (x(t), t) dt + h′ (x(t))g(x(t), t) dW (t) ,
2
(6.52)
2
also known as Itō’s formula or It̄o’s rule. Note that, unlike the standard chain rule, an additional
term 12 h′′ (x)g 2 (x, t) appears in the O(dt) deterministic part of dh (unless h is linear in x).
6.4.2 The forward Kolmogorov or Fokker-Planck equation
Since individual solutions or sample paths of an SDE are determined by successive increments
dW (t) drawn randomly from a distribution, we often consider ensembles of solutions produced
by integrating the SDE many times with different i.i.d. (independent, identically distributed)
sequences dW (t). A reasonable question is then: Given an initial condition x(t0 ) = x0 at time t0 ,
what is the probability p(x, t|x0 , t0 ) of finding the solution at a point x at time t > t0 ? This can,
of course, be estimated by repeated (Monte-Carlo) simulations of the SDE, but there is a more
elegant answer.
It turns out that the conditional probability density p(x, t|x0 , t0 ) evolves according to a linear
PDE. This can be derived from the SDE (6.47)
dx = f (x, t) dt + g(x, t) dW , x(t0 ) = x0 , (6.53)
using Itō’s lemma, the statement (6.52) of which we also repeat for convenience:
· ¸
1
dh(x(t)) = h′ (x(t))f (x(t), t) + h′′ (x(t))g 2 (x(t), t) dt + h′ (x(t))g(x(t), t) dW (t) . (6.54)
2
Taking expectations of Eqn. (6.54) with respect to x, dividing by t, and using the fact that E[dW ] =
0, we obtain
d 1
E[h(x)] = E[h′ (x)f (x, t)] + E[h′′ (x)g 2 (x, t)] . (6.55)
dt 2
The expectations in Eqn. (6.55) are averages over x with respect to the joint probability density
p(x, t|x0 , t0 ), i.e. Z ∞
E[h(x)] = h(x)p(x, t|x0 , t0 ) dx , (6.56)
−∞
for any (continous function h(x). Here we have assumed that p(x, t|x0 , t0 ) is defined on R, as is
appropriate for one-dimensional Brownian motion. Using (6.56) and writing p(x, t|x0 , t0 ) = p(x, t)
for short, Eqn. (6.55) becomes
d ∞
Z ∞· ¸
1 ′′
Z
′ 2
h(x)p(x, t)dx = h (x)f (x, t) + h (x)g (x, t) p(x, t) dx
dt −∞ −∞ 2
Z ∞½ ¾
′ 1 2 ′′
= [−f (x, t)p(x, t)] + [g (x, t)p(x, t)] h(x) dx . (6.57)
−∞ 2
177
Here the second equality comes from integration by parts and using the fact that p(x, t) → 0 and
p′ (x, t) → 0 as x → ±∞, conditions that are necessary for the probability density to be well-defined.
Since Eqn. (6.57) must hold for arbitrary functions h(x), the integrands of its left- and right-hand
sides must be equal:
∂ ∂ 1 ∂2 2
p(x, t) = − [f (x, t)p(x, t)] + [g (x, t)p(x, t)] . (6.58)
∂t ∂x 2 ∂x2
This is called the forward Fokker-Planck or Kolmogorov equation for the SDE (6.53), reflecting its
origins in both statistical physics and probability theory. See [88] for (much) more information.
To obtain a well-posed problem Eqn. (6.58) must be supplied with an initial condition in the
form of a function
p(x, t0 ) = p0 (x) (6.59)
which encodes the distribution
R∞ of starting points x0 for sample paths of the SDE, and satisfies the
normalization condition −∞ p(x, t0 ) dx = 1 for a probability distribution.
R∞
Exercise 51. Verify that Eqn. (6.58) does preserve the value of the integral (mass) −∞ p(x, t) dx
of the 1
R ∞ probability distribution. [This is an example of a Liapunov function for a PDE. The L norm
of −∞ p(x, t) dx remains constant.]
Taking the simplest imaginable SDE
dx = σdW , x(0) = 0 , (6.60)
which is the continuum limit of a random walk, the corresponding Fokker-Planck equation is the
classical diffusion or heat equation:
∂p σ2 ∂ 2p
= . (6.61)
∂t 2 ∂x2
If all sample paths are started at the origin, we have p(x, t0 ) = δ(x). The initial value problem (6.60-
6.61) can be solved (e.g., by Fourier transforms) to yield a Gaussian distribution whose variance
increases linearly with t:
x2
µ ¶
1
p(x, t) = √ exp − 2 , (6.62)
2πσ 2 t 2σ t
where, for simplicity, we take t0 = 0. As expected, Eqn. (6.62) encodes the temporal signature of
spreading trajectories of Brownian particles. This example provides an intuitive explanation of the
diffusion term in Eqn. (6.58).
Fig. 6.7 shows sample paths (lower traces) and normalized histograms of the positions x(T )
generated by 10, 000 simulations of the SDE (6.60) at times (a) T = 20 and (b) T = 60, in
comparison with plots of the probability density function (6.62) at the same times (upper traces).
Agreement between the densities p(x, T ) and the histograms is excellent, but note that many
sample paths are necessary to obtain such good estimates, while a single solution of the Fokker-
Planck equation suffices, and moreover explicitly shows how parameters in the SDE appear in the
evolving densities. To further illustrate this point, consider an OU process with time-varying drift
rate A(t):
dx = [λx + A(t)] dt + σ dW , x(0) = x0 . (6.63)
178
0.4 Histogram of x(20) (normalized)
ρ(x,20)
0.2
0
20
15 Sample Trajectories
Time
10
0
−5 −4 −3 −2 −1 0 1 2 3 4 5
x(t)
(a)
Histogram of x(60) (normalized)

0.2
ρ(x,60)
0.1
0
60
Sample Trajectories
40
Time
20
0
−5 −4 −3 −2 −1 0 1 2 3 4 5
x(t)
(b)
Figure 6.7: Brownian Motion. Lower panels in each block show sample paths of (6.60) with
horizontal axis x(t) and vertical axis t. Upper panels plot solutions (6.62) of the Fokker-Planck
equation (black traces) along with normalized histograms of sample paths evaluated at (a) T = 20
and (b) T = 60. Sample paths were computed by the Euler-Maruyama method (§6.4.4).
179
(For λ = 0 this becomes a pure drift-diffusion (DD) process, although some authors reserve that
term for SDEs with A constant.) In this case the Fokker-Planck equation is
∂p ∂ σ2 ∂ 2p
=− [(λx + A(t))p] + , (6.64)
∂t ∂x 2 ∂x2
The term involving the first spatial derivative now appears due to the nonzero drift A and λx, which
exert a deterministic force. In the special case σ = 0, λ = 0 and A constant, we have convection at
speed A:
∂p ∂ ∂p
=− [A p] = −A ⇒ p(x, t) = p(x − At, 0) . (6.65)
∂t ∂x ∂x
This first order PDE provides the simplest example of traveling waves: the initial data p(x, 0) is
propagated at speed A without change of shape (cf. §3.6.1, and remember manipulations with the
traveling coordinate ζ = x − At and the chain rule). This provides intuition for the drift term in
Eqn. (6.58).
In general, for a Gaussian distribution of initial data
(x − µ0 )2
· ¸
1
p(x, 0) = √ exp − , (6.66)
2πν0 2ν0
the solution of (6.64) is:
(x − µ(t))2
· ¸
1
p(x, t) = p exp − , where (6.67)
2πν(t) 2ν(t)
Z t
λt c2 ³ 2λt ´
µ(t) = µ0 e + eλ(t−s) A(s) ds and ν(t) = ν0 e2λt + e −1 . (6.68)
0 2λ
In the case of stable OU processes (λ < 0), the variance ν(t) converges to c2 /|2λ|, while for unstable
processes (λ > 0) ν(t) grows exponentially fast. For pure drift diffusion processes (λ = 0), the mean
and variance formulae simplify to:
Z t
µ(t) = µ0 + A(s) ds and ν(t) = ν0 + c2 t , (6.69)
0
giving linear growth, as in Brownian motion. The time-varying drift term A(t) affects only the
center of mass or average of solutions of (6.47), which evolve according to the the deterministic
ODE
µ̇ = λµ + A(t) , µ(0) = µ0 . (6.70)
Exercise 52. Verify that Eqns. (6.67)-(6.68) and (6.69) satisfy the forward Kolmogorov equation
(6.64) with initial condition (6.66).
Exercise 53. Use the results above to verify the claim made following Eqn. (6.5): that solutions of
the stable OU process (6.5a) approach a Gaussian distribution with mean and variance as stated.
Describe what happens if λ = w − k > 0 in (6.5b).
The Kolmogorov equation (6.58) generalises to SDEs with multi-dimensional state spaces in a
natural way: the partial derivatives ∂/∂x and ∂ 2 /∂x2 becoming divergence and Laplacian opera-
tors respectively [88]. To illustrate this, and connect to the LAM of §6.1.2, in Fig. 6.8 we show
180
a) Decay > Inhibition b) Decay = Inhibition c) Decay < Inhibition
(O < 0) (O = 0) (O > 0)
4 4 4
y2
y2
y2
2 2 2
t = 0.5
0 0 0
-2 -2 -2
-4 -4 -4
-4 -2 0 2 y 4 -4 -2 0 2 y 4 -4 -2 0 2 y 4
1 1 1
4 4 4
y2
y2
y2
2 2 2
t=1
0 0 0
-2 -2 -2
-4 -4 -4
-4 -2 0 2 y 4 -4 -2 0 2 y 4 -4 -2 0 2 y 4
1 1 1
4 4 4
y2
y2
y2
2 2 2
t = 1.5
0 0 0
-2 -2 -2
-4 -4 -4
-4 -2 0 2 y1 4 -4 -2 0 2 y1 4 -4 -2 0 2 y1 4
Figure 6.8: Distributions of 1000 sample paths for the linearised leaky accumulator model (6.3)
at early (0.5 sec, top row), middle (1.0 sec, middle row) and later (1.5 sec, bottom row) times.
Here λ = w − k with k = 11.5 > w = 8.5 (left column), k = w = 10.0 (middle column) and
k = 8.5 < w = 11.5 (right
√ column). Attracting line shown thin solid. Remaining parameters
values are s1 = 3 + 2.19 2, s2 = 3, c = 1. In lower right most solutions have left the region shown.
From [22, Fig. 11]; sample paths were computed by the Euler-Maruyama method (§6.4.4). NOTE:
Interpret yj 7→ xj .
181
numerically-simulated sample paths for Eqn. (6.3) at three √ times after starting with a delta func-
tion distribution on the attracting line at y1 = (s1 + s2 )/[ 2(w + k)], y2 = 0 in the eigenvector
coordinates. Thus, the density projected onto the attracting line is effectively that for the scalar
OU-DD process (6.5b) and the density projected onto the dashed diagonal is that for the stable OU
process (6.5a). The left and right hand columns shows cases of stable (λ < 0) and unstable (λ > 0)
OU processes respectively in the y2 direction, and the central column is the pure DD process (a
balanced LAM).
6.4.3 First passage problems and the backward Kolmogorov equation
As we shall show in §6.5.2, the expressions (6.67-6.68) allow us to predict accuracy as a function
of time in 2AFC tasks run under a hard limit of the cued or deadlined protocol. Free responses, on
the other hand, are usually modeled as a first passage problem. Indeed, this is the situation that
emerges in the continuum limit of SPRT derived in §6.3.2: the decision is made when the solution
of the SDE (6.46) first reaches one of the two thresholds +z or −z.
We outline the first passage problem for the general nonlinear SDE
dx = f (x) dt + g(x) dW , x(0) = x0 , (6.71)
in which the functions f (x) and g(x) are independent of time except via the state variable x(t),
and passage of x(t) through x = a (resp., x = b), with b > a and x(0) ∈ (a, b), corresponds
to an incorrect (resp., correct) decision (to fix ideas, think of the case f (x) > 0). First passage
probabilities Π(x, t) are governed by the backward Kolmogorov or Fokker-Planck equation:
∂Π ∂Π g 2 (x) ∂ 2 Π
= f (x) + , (6.72)
∂t ∂x 2 ∂x2
To distinguish from the forward evolution problem (6.58) we use the variable Π(x, t) in place of
p(x, t). Note that, in (6.72), the functions f (x) and g(x) are now outside the partial x-derivatives.
Derivations of Eqn. (6.72) and of the ODEs that follow are more complicated than for the foward
equation (6.58) and we do not sketch them here; they can be found in [88].
Specifically, for the homogeneous problem (6.71) with time-independent coefficients, the proba-
bility Πa (x0 ) of first passage being through x = a, given a starting point x0 ∈ (a, b), is found from
the following boundary value problem:
g 2 (x0 ) ′′
f (x0 )Π′a (x0 ) + Πa (x0 ) = 0 , Πa (a) = 1, Πa (b) = 0 . (6.73)
2
(see [88, Eqn. (5.2.186)]). Note that the requirement of time-independent coefficients implies that
the variable drift case A(t) in (6.47) is not covered by this theory.
Multiplying the linear ODE (6.73) by the integrating factor

µZ x0 ¶
2f (y)
ψ(x0 ) = exp dy , (6.74)
a g 2 (y)
182
we may integrate once with respect to x0 to obtain:
d £ ′ c0
Πa (x0 )ψ(x0 ) = 0 ⇒ Π′a (x0 )ψ(x0 ) = c0 ⇒ Π′a (x0 ) =
¤
. (6.75)
dx0 ψ(x0 )
Integrating once more and using the boundary conditions we find:
Z x0 Z b
dy dy
Πa (x0 ) = c0 + c1 ⇒ Πa (a) = c1 = 1 , Πa (b) = c0 + c1 = 0 . (6.76)
a ψ(y) a ψ(y)
R b dy
This determines the two constants of integration c0 = −1/ a ψ(y) and c1 = 1.
As noted following Eqn. (6.46), we adopt the convention that the drift rate A ≥ 0, which implies
that the lower boundary x = a corresponds to incorrect decisions. Hence the error rate is
Fxb0
Z x2
Fax0 Fab − Fax0 x2 dy
ER = Πa (x0 ) = 1 − b = = b , where Fx1 = . (6.77)
Fa Fab Fa x1 ψ(y)
The accuracy or probability that a correct decision is made is therefore4
Fxb0 Fab − Fxb0 Fax0
1 − ER = 1 − = = (= Πb (x0 )) . (6.78)
Fab Fab Fab
The mean first passage time T (x0 ) for starting point x0 and passage through x = a or x = b, is
found from a second boundary value problem:
g 2 (x0 ) ′′
f (x0 )T ′ (x0 ) +T (x0 ) = −1 , T (a) = T (b) = 0 , (6.79)
2
(see [88, Eqn. (5.2.154)]; a simple derivation of this is given in [75, §10.1.6.1].) Eqn. (6.79) also
follows from the backward Kolmogorov equation and may be solved much as above. Letting
Z y Z x2
2 ψ(s) x2
h(y) = ds, Gx 1 = h(y)dy , (6.80)
ψ(y) a g 2 (s) x1
where ψ(y) is defined in Eqn. (6.74) and Fxx12 in Eqn. (6.77), we find that the mean first passage
time is [88, Eqn. (5.2.158)]:
F x 0 Gb − F b Gx 0
T (x0 ) = a x0 b x0 a . (6.81)
Fa
In the following, we shall interpret this as the mean decision time, i.e., T (x0 ) = hDTi, since the
DD process models only the evidence accumulation part of the cognitive process. The reaction
time (RT) also includes sensory transduction and motor response components, which are usually
modelled by an additive non-decision time Tnd , so that hRTi = hDTi + hTnd i [214, 248].
Exercise 54. Solve the inhomogeneous boundary value problem (6.79) and verify that the mean
first passage time is given by Eqn. (6.81). Also show that it can be written as
Fax0 Gba − Fab Gxa0
T (x0 ) = . (6.82)
Fab
Although Eqns. (6.77) and (6.81) give explicit expressions for ER and hDTi, the integrals Fxx12
and Gxx21 can only be evaluated in closed form for simple drift and diffusion coefficients f (x), g(x).
Even for the linear OU process (6.63) with constant A, special functions are required [22, Appendix].
Fortunately, for the pure DDM (6.46), simple expressions emerge: see §6.5.1 below).
4
R
Note that equations (5.2.189) and (5.2.190) of [88] are incorrect: the integrals are written there as ψ(y)dy.
183
6.4.4 Numerical integration: the Euler-Maruyama method
As we know, few nonlinear equations can be solved explicitly, and we frequently resort to nu-
merical integration to obtain approximate solutions. We end this section by describing the simplest
integration method for stochastic ODEs, which generalizes the forward Euler method of §2.2.1.
Taking steps of fixed size ∆t and letting tn = t0 + n∆t and xn = x(t0 + n∆t), the SDE (6.48)
is discretized as follows:
xn+1 = xn + f (xn , tn ) ∆t + g(xn , tn ) dW (tn ) or
√
xn+1 = xn + f (xn , tn ) ∆t + g(xn , tn ) ∆tN (0, 1) . (6.83)
Here
√ we again use the diffusion property (6.50) of the Weiner process in drawing samples of size
O( ∆t) from a normal distribution. Note that the function g(x, t) is evaluated at the start (tn )
of each time step, implying that the random increments are summed as in the Itō integral (6.49).
For more discussion and examples of Matlab codes that execute the Euler-Maruyama method, see
[118], and for extensive treatment of numerical methods for SDEs [152]. CITE an introductory
text, if there’s a good one!
In §2.2.1 we proved that solutions of the forward Euler method, defined on a finite interval
[t0 , t0 + T ] for the deterministic ODE ẋ = f (x, t), converge on the true solution as ∆t → 0, with a
global accumulated error of O(∆t). More precisely and in the present context, letting x(t) be the
true solution of ẋ = f (x, t) and {xn }Nn=0 be the discrete sequence generated by Eqn (6.83) with
g(x, t) ≡ 0, we proved first order convergence in the sense that
|xN − x(t0 + T )| ≤ K∆t , (6.84)
where the prefactor K depends upon the properties of f (x, t) and the elapsed time T , cf. Eqn. (2.30).
Since sample paths x(t) and xn of both the SDE (6.48) and its discrete approximation (6.83)
are random variables, we must use expected values to guage the error in the present case. If
E[|xn − x(t0 + n∆t)|] ≤ K∆tγ (6.85)
for all n ∈ [0, T /∆t], the numerical method is said to have strong order of convergence γ. Here K
depends upon T and properties of f (x, t), g(x, t), but for appropriate functions it can be shown
that the Euler-Maruyama
√ method satisfies inequality (6.85) with γ = 1/2. Recalling the fact that
increments dW ∼ ∆t, this is not surprising.
As noted in [118], Eqn. (6.85) bounds the rate at which the mean of the errors (averages over
many sample paths) decreases as ∆t → 0. One can also seek a bound for the rate of decrease in
the error of the means:
|E[xn ] − E[x(t0 + n∆t)]| ≤ K∆tγ . (6.86)
If inequality (6.86) holds for all n ∈ [0, T /∆t], the method has weak order of convergence γ. The fact
that E[dW ] = 0 implies that the noise terms average to zero in taking expectations of the integral
equation (6.48) and its discretized analogue (6.83). In other words, the expected values of sample
paths evolve according to the deterministic parts of the SDE (6.47) and the discrete map(6.83); cf.
Eqn. (6.70). The weak order of convergence for the Euler-Maruyama method is therefore γ = 1.
Numerical investigations of strong and weak convergence are described and displayed in [118].
184
6.5 A return to two-alternative forced-choice tasks
As we noted in §6.1, the 2AFC can be administered under a free response protocol, in which
subjects respond in their own time, deadlines may be imposed (to encourage rapid responses), or
the subject may be instructed to respond when a cue is presented, possibly some time after the
stimulus display is removed, as in a working memory task. A simplified limit of cued responses
is obtained by assuming that the stimulus remains visible until the response cue is presented, at
which point the subject is supposed to respond without further delay. We call this the interrogation
protocol. The DDM can model both these situations, as we now show.
6.5.1 The free response protocol
The only case in which simple formulae for error rates and mean decision times is the DDM
with constant drift:
dy = A dt + c dW , y(0) = y0 , (6.87)
and to further simplify we shall also assume symmetric thresholds y = ±z. Without loss of
generality, we take A > 0, so that passage through y = +z indicates a correct response, and through
−z an incorrect response. Letting y = Ax and defining new parameters and initial condition
µ ¶2
z A y0
α= > 0, β = > 0, and x(0) = x0 = , (6.88)
A c A
equation (6.87) becomes
1
dx = dt + √ dW , (6.89)
β
with thresholds x = ±α. The rescaling reveals that the problem only depends on two parameters:
the signal-to-noise ratio (SNR) β and the threshold-to-drift ratio α; α is the time taken to reach
threshold in the absence of noise: SNR β → ∞.
Eqn. (6.89) is a special case of the SDE (6.71) with g(x) = 1, D(x) = 1/β and thresholds
a = −α, b = +α. Hence, from Eqns. (6.74), (6.77) and (6.80) we can compute
ψ(y) = e2β(y+α) , h(y) = 1 − e−2β(y+α) ,
and
1 h i e−2αβ h −2x0 β i 1 h i
Fax0 = 1 − e−2(α+x0 )β , Fxb0 = e − e−2αβ , Fab = 1 − e−4αβ ,
2β 2β 2β
1 h −2(α+x0 )β i e−2αβ h −2αβ i
Gxa0 = α + x0 + e − 1 , Gbx0 = α − x0 + e − e−2x0 β .
2β 2β
Thus, via (6.77) and (6.81) we deduce that
1 − e−2x0 β
· ¸
1
ER = − , (6.90)
1 + e2αβ e2αβ − e−2αβ
2α(1 − e−2x0 β )
· ¸
hDTi = α tanh(αβ) + − x0 , (6.91)
e2αβ − e−2αβ
185
y0
where x0 = A.
Note that the error rate (6.90) may be made as small as we wish for a given drift and noise
variance, by picking α = z/A sufficiently high. Also, biased initial data with x0 > 0 reduces the ER
and mean DT (since we have assumed that A > 0 and the correct threshold +z > 0 is closer), while
if x0 < 0 the ER increases. If the two stimuli are delivered with equal probability and we require
equal accuracy for responses to both, the optimal procedure is to start each trial with unbiased
initial data: x0 = y0 = 0. (Recall that the analogous SPRT procedure starts with I0 = log(1) = 0,
cf. Eqn. (6.35).) In this case Eqns (6.90-6.91) reduce to
µ 2αβ ¶
1 e −1
ER = and hDTi = α tanh(αβ) = α 2αβ . (6.92)
1 + e2αβ e +1
It is also worth noting the behavior in the limit A → 0, the drift-free case. Evaluating the indeter-
minate forms in Eqns (6.90-6.91) (either by L’Hôpital’s rule or series expansions), we obtain
α − x0 z − y0 z 2 − y02
ER = = and hDTi = β(α2 − x20 ) = . (6.93)
2α 2z c2
Exercise 55. Show that the error rate and mean reaction time for the DDM can also be written
in the following forms:
e−2αγ − e−2αβ e−2αγ
· ¸
1
ER = 2αβ and hDTi = α − −γ, (6.94)
e − e−2αβ tanh(2αβ) sinh(2αβ)
y0
where γ = A.
6.5.2 The interrogation protocol
Cued responses can be modeled by assuming that sample paths of the drift-diffusion process
evolve freely until interrogation, at which instant we interpret the probability of responses in favor
of hypotheses H0 (resp., H1 ) by asking if a given sample path lies above or below y = 0. This is
the continuum analog of the Neyman-Pearson test. Specifically, we evaluate the integrals of the
probability distribution of solutions of the forward Kolmogorov equation between −∞ and 0 and
0 and +∞ respectively, to evaluate the expected accuracy and error rate. In the interval before
interrogation a sample path may cross and recross y = 0 many times.
Assuming again that y > 0 represents the correct choice, the probabilities of correct and incorrect
choices at time t are therefore:
Z ∞ Z 0
P(correct) = 1 − ER = p(y, t) dy and P(incorrect) = ER = p(y, t) dy . (6.95)
0 −∞
Using the change of variables

y−µ
u= √ ,
2ν
and the expression (6.67) derived above for p(y, t), the first of these integrals becomes
Z ∞ · µ ¶¸
1 −u2 1 µ
P(correct) = √ e du = 1 + erf √ , (6.96)
− √µ π 2 2ν
2ν
186
and consequently we also have
· µ ¶¸
1 µ
P(incorrect) = ER = 1 − erf √ . (6.97)
2 2ν
Rx 2
Here erf denotes the error function integral (erf(x) = √2π 0 e−u du ; erf(∞) = 1), and the appro-
priate expressions for the evolving mean µ(t) and variance ν(t) are found from (6.68) or (6.69).
The expression P(correct) of (6.95) is therefore a function of the time t at which the response is
given. In psychology this cumulative distribution is called the psychometric function.
6.6 Optimal performance on a free response task: analysis and

predictions
In §6.1 we referred to the speed-accuracy tradeoff. In many situations there are competing
pressures for accuracy, and speed: one would like to finish the test and get as many questions right
as possible. How long should one spend on each question? The constrained 2AFC situation allows
us to derive the optimal compromise using little more than freshman calculus.
We have seen that the neurally-inspired (albeit not rigorously-derived) leaky accumulator model
reduces to a DDM in the limit of balanced, and sufficiently large, leak and inhibition. It is therefore
plausible that human and animal brains may be able to approximate DDM decision dynamics.
Moreover the DDM is the optimal decision maker in the sense of Theorem 5: faced with noisy
incoming data it delivers a decision of guaranteed accuracy in the shortest possible time. It remains
to balance accuracy and speed, which, as we shall see, involves selecting the appropriate threshold
depending on task conditions.
Suppose that each correct response is rewarded with a fixed sum of money (for humans) or a
drop of juice (for thirsty monkeys), both assumed to be items of fixed value, and that each block
of trials runs for a fixed duration. If the subject wishes to maximise his or her total reward, then
the average reward rate given by the following function must be maximised:
1 − ER
RR = . (6.98)
hRTi + D + Dp ER
In (6.98) the numerator is the probability of being correct on each trial and the denominator is the
average time that elapses between each response. Here hRTi = hDT + Tnd i denotes the subject’s
mean reaction time (the sum of the decision time and sensori-motor latency Tnd ), D denotes the
response to stimulus interval (RSI) and Dp denotes an additional penalty delay that may be imposed
following errors. Both D and Dp are set by the experimenter, and the difficulty of the task may
also be adjusted, e.g. via coherence in the moving dots stimulus.
To maintain a stationary environment, the difficulty must be kept constant during each block
of trials, implying that the SNR β is fixed5 . For simplicity we shall also assume that Tnd and the
5
Note that many experiments, especially those using primates, are run with coherences and viewing times drawn
from random distributions, so that the local temporal environments are not stationary in the sense required here.
187
delays D and Dp are fixed, but they could be drawn from fixed distributions with well-defined
means hDi and hDp i, which would then replace D and Dp in the denominator of Eqn. (6.98).
We substitute the expressions (6.92) for ER and hDTi into (6.98). A short calculation yields
the expression:
µ ¶2
h
−2αβ
i−1 z A
RR = α + hTnd i + D + (hTnd i + D + Dp − α) e , where α = and β = (6.99)
A c
from Eqn. (6.88). Given fixed values of D, Dp and SNR (⇒ β = const.), we can now maximize
RR, or rather minimize 1/RR, with respect to the threshold-to-drift ratio α:
µ ¶
d 1
= 1 − e−2αβ − 2β (hTnd i + D + Dp − α) e−2αβ = 0 ,
dα RR
which implies that
e2αβ − 1 = 2β(hTnd i + D + Dp − α) . (6.100)
Eqn. (6.100) has a unique solution α = α(β, hTnd i + D + Dp ) that depends only on SNR and the
sum of the nondecision time, RSI and penalty delay. This critical point is a minimum of 1/RR,
and hence a maximum of RR; see Fig. 6.9.
~
zo H ~
zo ~
zo H
0.9
0.85 RR ~
zo H
0.8
RR ~
zo H
0.75
Reward rate
0.7
0.65
0.6 a~a=14.1
a~a=9.1
0.55 a~a=4.0
0.5
0 0.1 0.2 0.3 0.4 0.5
Normalized threshold
Figure 6.9: Reward rate as a function of rescaled threshold α for three values of SNR: β = 4.0
(dotted), β = 9.1 (dashed) and β = 14.1 (solid). Perturbations ǫ from optimal threshold that
maximizes RR are described in text at the end of this section. From [22, Fig. 15].
Exercise 56. Check that the critical point of the function 1/RR found above is a minimum, and
show that it is the only critical point of this function. Plot graphs of the reward rate RR as a
function of α for values of β = 10, 5, 1 and D = 0.5, 1.0, 2 (set Dp = 0).
We may go further, solving for α and β in terms of hDTi and ER from (6.92) to obtain
µ ¶
hDTi 1 − 2ER 1 − ER
α= , β= log . (6.101)
1 − 2ER 2hDTi ER
188
Substituting Eqns. (6.101) into (6.100) yields the speed-accuracy tradeoff that follows from maxi-
mizing RR:
" #−1
hDTi 1 1
= ¢+ , (6.102)
ER log 1−ER 1 − 2ER
¡
Dtotal ER
where Dtotal = hTnd i + D + Dp . This optimal performance curve (OPC) uniquely relates the mean
normalised decision time hDTi/Dtotal to ER. Not that only experimental observables appear in
Eqn (6.102): there are no other parameters, and in particular, no free parameters. Hence data
collected for different subjects (who may exhibit differing SNRs, even when viewing the same
stimuli), and for differing RSIs and penalty delays, can be pooled and compared with the theory.
The form of the OPC, shown in Fig. 6.10(a), may be intuitively understood by observing that
very noisy stimuli (β ≈ 0) contain little information, so that given a priori knowledge that they
are equally likely, it is optimal to choose at random without examining them, giving ER = 0.5
and hDTi = 0 (note the points for β = 0.1 in Fig. 6.10(a)). At the opposite limit β → ∞, noise-
free stimuli are so easy to discriminate that both hDTi and ER approach zero (note the points
for β = 100 in Fig. 6.10(a)). This limit yields the highest RRs, but when SNR is finite, due to
poor stimulus discriminability or a subject’s inability to clearly detect a signal, making immediate
decisions is not optimal. Instead, it is advantageous to accumulate the noisy evidence for just long
enough to make the best possible choice (see the points for β = 1 and β = 10 in Fig. 6.10(a)).
Non-optimal thresholds degrade performance, as illustrated by the diamonds in Fig. 6.10(a), which
result from setting α 25% above and below optimal for β = 1 and Dtot = 2: in both cases the RR
degrades by ≈ 1.3%. See [22, 283] for further details.
Fig. 6.10(b) shows a histogram of behavioral data compiled from human subjects, indicating
that those who score in the top 30% overall on a series of tests with differing dalays and SNRs
follow the optimal curve remarkably closely [131, 23]. More detailed data analysis in [23] reveals
that, in each block of trials for which stimulus recognition difficulty (∼ SNR) and RSI are held
constant, these subjects rapidly adjust their thresholds to achieve this. (A model for threshold
adjustment is described in [246].)
However, other subjects, and especially the lowest-scoring 10%, display suboptimal behavior,
with significantly longer decision times and correspondingly lower ERs. Previous studies have shown
that humans often favor accuracy over reward [192], and alternative objective functions have been
proposed to account for this. For example, one can define a modified reward rate, weighted toward
accuracy by additionally penalizing errors, as suggested by the proposal that humans exhibit a
competition between reward and accuracy (COBRA) [178, 24]:
q
RA = RR − ER ; (6.103)
Dtotal
here the factor q specifies the additional weight placed on accuracy, and the characteristic time Dtotal
is included in the second factor, so that the units of both terms in RA are consistent. Maximizing
RA as above we obtain a family of OPCs parameterized by q:
p
DT E − 2q − E 2 − 4q(E + 1)
= , (6.104)
Dtotal 2q
where " #
1 1
E= ¢ + . (6.105)
ER log 1−ER −
¡
ER
1 2ER
189
0.3
OPC
Mean normalized decision time

Total delay = 1
Total delay = 2
suboptimal
0.2
SNR= 1
0.1 SNR= 10
SNR=100 SNR=0.1
0
0 0.1 0.2 0.3 0.4 0.5
Error rate
0.7 Top 30%

Top 50%
Top 90%
0.6 All
Mean normalized decision time
Theory
0.5
0.4
0.3
0.2
0.1
0
0 0.1 0.2 0.3 0.4 0.5
Error rate
Figure 6.10: Top: Optimal performance curve (OPC) of Eq. (6.102) relating mean normalized de-
cision time to error-rate across varying task conditions. Triangles and circles denote performances
under conditions specified in box. Moving left, RRs increase with SNR from 0.51 to 0.60, 0.84 and
0.97 with Dtot = 1, and from 0.26 to 0.33, 0.45 and 0.49 with Dtot = 2. Suboptimal performance
results from thresholds 25% above and below optimal threshold with SNR=1 and Dtot = 2 (dia-
monds); both ≈ 1.3% below maximum RR. Bottom: OPC (black curve) compared with data from
80 human subjects (histograms), sorted according to total rewards accrued over multiple blocks
of trials with two difficulty levels and D = 0.5, 1.0, 2.0 s and Dp = 0, 1.5 s. Open bars: all sub-
jects; light bars: lowest 10% excluded; medium bars: lowest 50% excluded; dark bars: lowest 70%
excluded. Vertical line segments indicate standard errors. From [283, Fig. 1].
If rewards are monetary, one can also postulate a situation in which errors are rewarded (albeit
less lavishly than correct reponses), or penalized by subtraction from previous winnings:
(1 − ER) − qER
RRm = (6.106)
DT + Dtotal
190
This leads to the following family of OPCs:
" q
#−1
1
DT − 1 − q
= (1 + q) ER ¡ 1−ER
1−ER
¢ + . (6.107)
Dtotal log ER 1 − 2ER
Eqns. (6.104-6.105) and Eqn. (6.107) both reduce to (6.102) for q = 0, as expected. Fig. 6.11 shows
an example of the second family (6.107). Eqn. (6.104) gives a similar family, but the maxima remain
at the same ER(≈ 0.18) as q increases, rather than moving rightwards as in Fig. 6.11 [22, 23].
0.9
0.8
0.7
0.6
Decision time
0.5
0.4
0.3
0.2
0.1
0
0 0.1 0.2 0.3 0.4 0.5
Error rate
Figure 6.11: Optimal performance curves of (6.107) for the modified reward rate function RRm
of (6.106) with q varied in steps of 0.1 between −0.2 (lowest curve) and 0.8 (highest curve). The
dashed curve corresponds to q = 0 (Eqn. (6.102)) and the bold solid curve to q = 0.62: the best fit
to all the subjects in the study (white bars). Error bars indicate standard error. From [131, Fig.
2].
Both these functions involve a weight parameter q, which will typically be subject-dependent,
since different people may place a greater or lesser weight on accuracy, even if they understand that
a specific balance is implied, as in Eqn. (6.106). Values of q should therefore be fitted to indivduals
or subgroups of subjects, and the theory becomes descriptive rather than prescriptive. However,
Fig. 6.11 shows that an average weight (q = 0.62) may be assigned to the entire group to yield an
improved fit over that to the RR in Fig 6.10. See [23] for further details and fits of these curves to
behavioral data.
One can also observe that the typical shape of the RR vs. threshold curve (see Fig. 6.9 above)
is asymmetric around its maximum, falling more steeply toward lower thresholds and less steeply
toward higher thresholds. The loss of reward is therefore greater for underestimates of optimal
thresholds than for overestimates, assuming misestimation is symmetric (RR(z̃0 − ǫ) < RR(z̃0 +
ǫ)) [22]. This may partially explain the tendency of some subjects to pick higher thresholds and
consequently emphasise accuracy. Indeed, a re-analysis of the data of [23], suggested by information
gap theory [19], shows that such subjects may be allowing for their uncertainty in assessing how
much time has elapsed between responses [283]. This was modeled by allowing the RSIs to belong
to a closed interval ID of finite length and maximizing the worst RR that occurs for D ∈ ID .
191
A recent study that allowed more training sessions than [23] found that, while most subjects
started the experiment with a bias toward accuracy, their performance approached the OPC over
multiple sessions as they abandoned this bias, and that the remaining individual deviations from
the OPC were correlated with subjects’ coefficients of variation in an interval timing task [15].
However, for the range of difficulties used, it was also found that a constant threshold strategy
fitted the “well-trained” data almost as well as the OPC.
Finally, experiments featuring stimuli of unequal probability and unequal rewards can also be
analyzed in terms of reward rate metrics, and (again with sufficient training), both humans on a
free response task [245], and monkeys in an interrogation task [77, 224] have been found to exhibit
near optimal performances.
192
Bibliography
[1] L.F. Abbott. Theoretical neuroscience rising. Neuron, 60:489–495, 2008.
[2] L.F. Abbott and E.S. Chance. Drivers and modulators from push-pull and balanced synaptic
input. Prog. Brain Res., 149:147–155, 2005.
[3] L.F. Abbott and T. Kepler. Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido,
editor, Statistical Mechanics of Neural Networks. Springer, Berlin, 1990.
[4] H. Agmon-Snir, C.E. Carr, and J. Rinzel. The role of dendrites in auditory coincidence
detection. Nature, 393:268–272, 1998.
[5] R. McN. Alexander. Principles of Animal Locomotion. Princeton University Press, Princeton,
NJ, 2003.
[6] R. Altendorfer, A.N. Moore, H. Komsuoglu, M. Buehler, H.B. Brown, D. McMordie,

U. Saranli, R.J. Full, and D.E. Koditschek. RHex: A biologically inspired hexapod run-
ner. J. Autonomous Robots, 11:207–213, 2001.
[7] D. J. Amit and N. Brunel. Model of global spontaneous activity and local structured activity
during delay periods in the cerebral cortex. Cereb. Cortex, 7:237–252, 1997.
[8] D.J. Amit and N. Brunel. Model of global spontaneous activity and local structured activity
during delay periods in the cerebral cortex. Cereb. Cortex, 7:237–252, 1997.
[9] D.J. Amit and M.V. Tsodyks. Quantitative study of attractor neural network retrieving at
low spike rates i: substrate-spikes, rates and neuronal gain. Network, 2:259–274, 1991.
[10] J. Anderson, I. Lampl, I. Reichova, M. Carandini, and D. Ferster. Stimulus dependence of

two-state fluctuations of membrane potential in cat visual cortex. Nat. Neurosci., 3:617–621,
2000.
[11] A.A. Andronov, E.A. Witt, and Khaiken S.E. Theory of Oscillators. Pergamon Press, Oxford,
UK, 1966. Reprinted by Dover Publications, New York, 1987.
[12] V.I. Arnold. Geometrical Methods in the Theory of Ordinary Differential Equations. Springer,
New York, 1983.
[13] G. Ashida and C.E. Carr. Sound localization: Jeffress and beyond. Curr. Opin. Neurobiol.,
21:1–7, 2011.
193
[14] A. Ayali, E. Fuchs, E. Hulata, and E. Ben Jacob. The function of inter segmental connections
in determining temporal characteristics of the spinal cord rhythmic output. Neuroscience,
147:236–246, 2007.
[15] F. Balci, P. Simen, R. Niyogi, A. Saxe, P. Holmes, and J.D. Cohen. Acquisition of de-
cision making criteria: Reward rate ultimately beats accuracy. Attention, Perception &
Psychophysics, 73 (2):640–657, 2011.
[16] H.B. Barlow. Intraneuronal information processing, directional selectivity and memory for
spatio-temporal sequences. Network, 7:251–259, 1996.
[17] G.A. Barnard. Sequential tests in industrial statistics. J. Roy. Statist. Soc. Suppl., 8:1–26,
1946.
[18] T. Bayes. An essay toward solving a problem in the doctrine of chances. Phil. Trans. Roy.
Soc., 53:370–418, 1763.
[19] Y. Ben-Haim. Information Gap Decision Theory: Decisions under Severe Uncertainty. Aca-
demic Press, New York, 2006. 2nd Edition.
[20] C.M. Bender and S.A. Orszag. Advanced Mathematical Methods for Scientists and Engineers.
McGraw Hill, New York, 1978.
[21] P. Billingsley. Probability and Measure. Wiley, New York, 1995.
[22] R. Bogacz, E. Brown, J. Moehlis, P. Holmes, and J.D. Cohen. The physics of optimal decision
making: A formal analysis of models of performance in two alternative forced choice tasks.
Psychol. Rev., 113 (4):700–765, 2006.
[23] R. Bogacz, P. Hu, P. Holmes, and J.D. Cohen. Do humans produce the speed-accuracy
tradeoff that maximizes reward rate? Quart. J. Exp. Psychol., 63 (5):863–891, 2010.
[24] C.J. Bohil and W.T. Maddox. On the generality of optimal versus objective classifier feedback
effects on decision criterion learning in perceptual categorization. Memory & Cognition, 31
(2):181–198, 2003.
[25] W.E. Boyce and R.C. DiPrima. Elementary Differential Equations and Boundary Value
Problems. Wiley, New York, 1997.
[26] N. Brenner, S.P. Strong, R. Koberle, W. Bialek, and R. De Ruyter van Steveninck. Synergy
in a neural code. Neural Computation, 12:1531–1552, 2000.
[27] R. Brette. Exact simulation of integrate-and-fire models with exponential current. Neural
Computation, 19:2604–2609, 2007.
[28] R. Brette, M. Rudolph, T. Carnevale, M. Hines, D. Beeman, J.M. Bower, M. Diesmann,

A. Morrison, P.H. Goodman, F.C. Jr. Harris, M. Zirpe, T. Natschlager, D. Pecevski, G.B.
Ermentrout, M. Djurfeldt, A. Lansner, O. Rochel, T. Vieville, E. Muller, A.P. Davision,
S. El Boustani, and A. Destexhe. Simulation of networks of spiking neurons: a review of
tools and strategies. J. Comput. Neurosci., 23:349–398, 2007.
[29] K.H. Britten, M.N. Shadlen, W.T. Newsome, and J.A. Movshon. Responses of neurons in
macaque MT to stochastic motion signals. Visual Neurosci., 10:1157–1169, 1993.
194
[30] E. Brown, J. Gao, P. Holmes, R. Bogacz, M. Gilzenrat, and J Cohen. Simple networks that
optimize decisions. Int. J. Bifurcation and Chaos, 15 (3):803–826, 2005.
[31] E. Brown, J. Moehlis, and P. Holmes. On the phase reduction and response dynamics of
neural oscillator populations. Neural Computation, 16 (4):673–715, 2004.
[32] E. Brown, J. Moehlis, P. Holmes, E. Clayton, J. Rajkowski, and G. Aston-Jones. The influence
of spike rate and stimulus duration on noradrenergic neurons. J. Comput. Neurosci, 17 (1):5–
21, 2004.
[33] N. Brunel and V. Hakim. Fast global oscillations in networks of integrate-and-fire neurons
with low firing rates. Neural Computation, 11:1621–1671, 1999.
[34] N. Brunel, V. Hakim, and M.J.E. Richardson. Firing-rate resonance in a generalized integrate-
and-fire neuron with subthreshold resonance. Phys. Rev. E, 67:051916, 2003.
[35] N. Brunel and M.C.W. van Rossum. Lapicque’s 1907 paper: from frongs to integrate-and-fire.
Biol. Cybern., 97:337–339, 2007.
[36] N. Brunel and X.-J. Wang. Effects of neuromodulation in a cortical network model. J.
Comput. Neurosci., 11:63–85, 2001.
[37] J.T. Buchanan. Neural network simulations of coupled locomotor oscillators in the lamprey
spinal cord. Biol. Cybern., 66:367–374, 1992.
[38] J.T. Buchanan and S. Grillner. Newly identified glutamate interneurons and their role in
locomotion in the lamprey spinal cord. Science, 236:312–314, 1987.
[39] A.N. Burkitt. A review of the integrate-and-fire neuron model: I. Homogeneous synaptic
input. Biol. Cybern., 95:1–19, 2006.
[40] A.N. Burkitt. A review of the integrate-and-fire neuron model: II. Inhomogeneous synaptic
input and network properties. Biol. Cybern., 95:97–112, 2006.
[41] J.R. Busemeyer and J.T. Townsend. Decision field theory: A dynamic-cognitive approach to
decision making in an uncertain environment. Psychol. Rev., 100:432–459, 1993.
[42] M. Camperi and X.-J. Wang. A model of visuospatial working memory in prefrontal cortex:
recurrent network and cellular bistability. J. Comput. Neurosci., 5:383–405, 1998.
[43] C. Capaday and C. van Vreeswijk. Direct control of firing rate gain by dendritic shunting
inhibition. J. Integr. Neurosci., 5:199–222, 2006.
[44] F.S. Chance, L.F. Abbott, and A.D. Reyes. Gain modulation from background synaptic
input. Neuron, 35:773–782, 2002.
[45] H.J. Chiel and R.D. Beer. The brain has a body: adaptive behavior emerges from interactions
of nervous system, body and environment. Trends Neurosci., 20(12):553–557, 1997.
[46] K. Chung, J. Wallace, S.-Y. Kim, S. Kalyanasundaram, A.S. Andalman, T.J. Davidson, J.J.
Mirzabekov, K.A. Zalocusky, J. Mattis, A.K. Denisin, S. Pak, H. Bernstein, C. Ramakrishnan,
L. Grosenick, V. Gradinaru, and K. Deisseroth. Structural and molecular interrogation of
intact biological systems. Nature, 497:332–337, 2013.
195
[47] A.H. Cohen, P. Holmes, and R.H. Rand. The nature of coupling between segmental oscillators
of the lamprey spinal generator for locomotion: a model. J. Math Biol., 13:345–369, 1982.
[48] A.H. Cohen, R.H. Rand, and P. Holmes. Systems of coupled oscillators as models of central
pattern generators. In A.H Cohen, S. Rossignol, and S. Grillner, editors, Neural Control of
Rhythmic Movements in Vertebrates, pages 333–367. Wiley, New York, 1988.
[49] A.H. Cohen, S. Rossignol, and S. Grillner, editors. Neural Control of Rhythmic Movements
in Vertebrates. Wiley, New York, 1988.
[50] J.D. Cohen, K. Dunbar, and J.L. McClelland. On the control of automatic processes: A
parallel distributed processing model of the Stroop effect. Psychol. Rev., 97(3):332–361,
1990.
[51] M. Cohen and S. Grossberg. Absolute stability of global pattern formation and parallel
memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and
Cybernetics, SMC-13:815–826, 1983.
[52] J. Connor, D. Walter, and R. McKown. Neural repetitive firing: modifications of the Hodgkin-
Huxley axon suggested by experimental results from crustacean axons. Biophys. J., 18:81–102,
1977.
[53] J.A. Connor and C.F. Stevens. Prediction of repetitive firing behaviour from voltage clamp
data on an isolated neurone soma. J. Physiol., 213:31–53, 1971.
[54] J.A. Connor, D. Walter, and R. McKown. Neural repetitive firing: modifications of the
hodgkin-huxley axon suggested by experimental results from crustacean axons. Biophys. J.,
18:81–102, 1977.
[55] R. Cossart, D. Aronov, and R. Yuste. Attractor dynamics of network up states in the neo-
cortex. Nature, 423:283–288, 2003.
[56] E. Couzin-Fuchs, T. Kiemel, O. Gal, P. Holmes, and A. Ayali. Intersegmental coupling and
recovery from perturbations in freely-running cockroaches. J. Exp. Biol., 218():285–297, 2015.
[57] S. Daun-Gruhn, J.E. Rubin, and I.A. Rybak. Control of oscillation periods and phase du-
rations in half-center central pattern generators: a comparative mechanistic analysis. J.
Comput. Neurosci., 27 (1):3–36, 2009.
[58] P. Dayan and L.F. Abbott. Theoretical Neuroscience: Computational and Mathematical
Modeling of Neural Systems. MIT Press, Cambridge, MA, 2001.
[59] E. De Schutter. Why are computational neuroscience and systems biology so separate? PLoS
Comput. Biol., 4 (5):e1000078, 2008.
[60] E. De Schutter, editor. Computational Modeling Methods for Neuroscientists. MIT Press,
Cambridge, MA, 2009.
[61] G. Deco, E.T. Rolls, L. Albantakis, and R. Romo. Brain mechanisms for perceptual and
reward-related decision-making. Prog. in Neurobiol., 103:194–213, 2013.
[62] M.H. DeGroot. A conversation with George A. Barnard. Statist. Sci., 3:196–212, 1988.
196
[63] K. Deisseroth, G. Feng, A.K. Majewska, G. Miesenbock, A. Ting, and M.J. Schnitzer. Next-
generation optical technologies for illuminating genetically targeted brain circuits. J. Neu-
rosci., 26 (41):10380–10386, 2006.
[64] A. Destexhe, Z.F. Mainen, and T.J. Sejnowski. Kinetic models of synaptic transmission. In
C. Koch and I. Segev, editors, Methods in Neuronal Modeling: From Ions to Networks, pages
1–25. MIT Press, Cambridge, MA, 1999. Second edition.
[65] A. Doloc-Mihu and R.L. Calabrese. A database of computational models of a half-center

oscillator for analyzing how neuronal parameters influence network activity. J. Biol. Phys.,
37 (3):263–283, 2011.
[66] P. Eckhoff, K.F. Wong-Lin, and P. Holmes. Optimality and robustness of a biophysical
decision-making model under nonepinephrine modulation. J. Neurosci., 29 (13):4301–4311,
2009.
[67] P. Eckhoff, K.F. Wong-Lin, and P. Holmes. Dimension reduction and dynamics of a spiking
neuron model for decision making under neuromodulation. SIAM J. on Applied Dynamical
Systems, 10 (1):148–188, 2011.
[68] A.V. Egorov, B.N. Hamam, E. Fransen, M.E. Hasselmo, and A.A. Alonso. Graded persistent
activity in entorhinal cortex neurons. Nature, 420:173–178, 2002.
[69] G.B. Ermentrout. Type I membranes, phase resetting curves, and synchrony. Neural Com-
putation, 8:979–1001, 1996.
[70] G.B. Ermentrout. Simulating, Analyzing, and Animating Dynamical Systems: A Guide to
XPPAUT for Researchers and Students. SIAM, Philadelphia, 2002.
[71] G.B. Ermentrout, R.F. Galán, and N.N. Urban. Relating neural dynamics to neural coding.
Phys. Rev. Lett., 99:248103, 2007.
[72] G.B. Ermentrout and N. Kopell. Frequency plateaus in a chain of weakly coupled oscillators.
SIAM J. Math. Anal., 15:215–237, 1984.
[73] G.B. Ermentrout and N. Kopell. Parabolic bursting in an excitable system coupled with a
slow oscillation. SIAM J. Appl. Math., 46:233–253, 1986.
[74] G.B. Ermentrout and N. Kopell. Multiple pulse interactions and averaging in systems of
coupled neural oscillators. J. Math. Biol., 29:195–217, 1991.
[75] G.B. Ermentrout and D. Terman. Mathematical Foundations of Neuroscience. Springer, New
York, 2010.
[76] C.P. Fall and J. Rinzel. An intracellular Ca2+ subsystem as a biologically plausible source of
intrinsic conditional bistability in a network model of working memory. J. Comput. Neurosci.,
20:97–107, 2006.
[77] S. Feng, P. Holmes, A. Rorie, and W.T. Newsome. Can monkeys choose optimally when faced
with noisy stimuli and unequal rewards? PLoS Comput. Biol., 5 (2):e1000284, 2009.
[78] R. FitzHugh. Thresholds and plateaus in the Hodgkin-Huxley nerve equations. J. Gen.
Physiol., 43:867–896, 1960.
197
[79] R. FitzHugh. Impulses and physiological states in models of nerve membrane. Biophys. J.,
1:445–466, 1961.
[80] N. Foucaud-Trocme and N. Brunel. Dynamics of the instantaneous firing rate in response to
changes in input statistics. J. Comput. Neurosci., 18:311–321, 2005.
[81] N. Foucaud-Trocme, D. Hansel, C. van Vreeswijk, and N. Brunel. How spike generation
mechanisms determine the neuronal response to fluctuating inputs. J. Neurosci., 23:11628–
11640, 2003.
[82] E. Fransen, B. Tahvildari, A.V. Egorov, M.E. Hasselmo, and A.A. Alonso. Mechanism of
graded persistent cellular activity of entorhinal cortex layer V neurons. Neuron, 49:735–746,
2006.
[83] E. Fuchs, P. Holmes, I. David, and A. Ayali. Proprioceptive feedback reinforces centrally-
generated stepping patterns in the cockroach. J. Exp. Biol., 215:1884–1891, 2012.
[84] E. Fuchs, P. Holmes, T. Kiemel, and A. Ayali. Intersegmental coordination of cockroach

locomotion: Adaptive control of centrally coupled pattern generator circuits. Frontiers in
Neural Circuits, 4:125, 2011.
[85] S. Fusi and M. Mattia. Collective behavior of networks with linear (VLSI) Integrate and Fire
Neurons. Neural Computation, 11:633–652, 1999.
[86] F. Gabbiani and S. Cox. Mathematics for Neuroscientists. Academic Press, San Diego, CA,
2010.
[87] J. Gao and P. Holmes. On the dynamics of electrically-coupled neurons with inhibitory
synapses. J. Comput. Neurosci., 22:39–61, 2007.
[88] C.W. Gardiner. Handbook of Stochastic Methods for Physics, Chemistry and the Natural
Sciences. Springer, New York, 1985. Third edition, 2004.
[89] G.L. Gerstein and B. Mandelbrot. Random walk models for the spike activity of a single
neuron. Biophys. J., 4:41–68, 1964.
[90] W. Gerstner. Time structure of the activity in neural network models. Phys Rev E Stat Phys
Plasmas Fluids Relat Interdiscip Topics, 51:738–758, 1995.
[91] W. Gerstner and W.M. Kistler. Spiking Neuron Models. Cambridge University Press, Cam-
bridge, 2002.
[92] P.A. Getting. Comparative analysis of invertebrate central pattern generators. In A.H. Cohen,
S. Rossignol, and S. Grillner, editors, Neural Control of Rhythmic Movements in Vertebrates,
chapter 4, pages 101–128. John Wiley, New York, 1988.
[93] R.M. Ghigliazza and P. Holmes. A minimal model of a central pattern generator and mo-
toneurons for insect locomotion. SIAM J. on Applied Dynamical Systems, 3 (4):671–700,
2004.
[94] R.M. Ghigliazza and P. Holmes. Minimal models of bursting neurons: How multiple currents,
conductances and timescales affect bifurcation diagrams. SIAM J. on Applied Dynamical
Systems, 3 (4):636–670, 2004.
198
[95] L. Glass and M.C. Mackey. From Clocks to Chaos. Princeton University Press, Princeton,
NJ, 1988.
[96] J. Gold and M. Shadlen. Banburismus and the brain: Decoding the relationship between
sensory stimuli, decisions, and reward. Neuron, 36:299–308, 2002.
[97] J.I. Gold and M.N Shadlen. Neural computations that underlie decisions about sensory
stimuli. Trends in Cognitive Science, 5 (1):10–16, 2001.
[98] J.I. Gold and M.N. Shadlen. Banburismus and the brain: decoding the relationship between
sensory stimuli, decisions, and reward. Neuron, 36:299–308, 2002.
[99] M.S. Goldman, J.H. Levine, G. Major, D.W. Tank, and H.S. Seung. Robust persistent neural
activity in a model integrator with multiple hysteretic dendrites per neuron. Cereb. Cortex,
13:1185–1195, 2003.
[100] I.J. Good. Studies in the history of probability and statistics. XXXVI. A.M. Turing’s statis-
tical work in World War II. Biometrika, 66:393–396, 1979.
[101] M.D. Greenberg. Foundations of Applied Mathematics. Prentice-Hall, Englewood Cliffs, NJ,
1978. Reprinted by Dover Publications Inc, NY.
[102] S. Grillner. Bridging the gap - from ion channels to networks and behaviour. Curr. Opin.
Neurobiol., 9:663–669, 1999.
[103] S. Grillner, J.T. Buchanan, and A. Lansner. Simulation of the segmental burst generating
network for locomotion in lamprey. Neurosci. Letters, 89:31–35, 1988.
[104] S. Grossberg. Nonlinear neural networks: Principles, mechanisms, and architectures. Neural
Networks, 1:17–61, 1988.
[105] B. Grothe, M. Pecka, and D. McAlpine. Mechanisms of sound localization in mammals.

Physiol. Rev., 90:983–1012, 2010.
[106] J. Guckenheimer. Isochrons and phaseless sets. J. Math. Biol., 1:259–273, 1975.
[107] J. Guckenheimer and P. Holmes. Nonlinear Oscillations, Dynamical Systems and Bifurcations
of Vector Fields. Springer, New York, 1983. Sixth Edition, 2002.
[108] B.S. Gutkin and G.B. Ermentrout. Dynamics of membrane excitability determine interspike
interval variability: a link between spike generation mechanisms and cortical spike train
statistics. Neural Computation, 10:1047–1065, 1998.
[109] R. Guttman, S. Lewis, and J. Rinzel. Control of repetitive firing in squid axon membrane as
a model for a neuroneoscillator. J. Physiol. (London), 305:377–395, 1980.
[110] A. Hagevik and A.D. McClellan. Coupling of spinal locomotor networks in larval lamprey
revealed by receptor blockers for inhibitory amino acids: neurophysiology and computer
modeling. J. Neurophysiol., 72:1810–1829, 1994.
[111] D.P. Hanes and J.D. Schall. Neural control of voluntary movement initiation. Science,
274:427–30, 1996.
199
[112] D. Hansel, G. Mato, C. Meunier, and L. Neltner. On numerical simulations of integrate-and-
fire neural networks. Neural Computation, 10:467–483, 1998.
[113] N.S. Harper and D. McAlpine. Optimal neural population coding of an auditory spatial cue.
Nature, 430:682–686, 2004.
[114] R. Harrison. The outgrowth of the nerve fiber as a mode of protoplasmic movement. J. Exp.
Zool., 9:787–846, 1910.
[115] J. Hellgren, S. Grillner, and A. Lansner. Computer simulation of the segmental neural net-
work generating locomotion in lamprey by using populations of network interneurons. Biol.
Cybern., 68:1–13, 1992.
[116] J. Hertz, A. Krogh, and R.G. Palmer. Introduction to the Theory of Neural Computation.
Addison Wesley, Reading, MA, 1991.
[117] A.V. Herz, T. Gollisch, C.K. Machens, and D. Jaeger. Modeling single-neuron dynamics and
computations: a balance of detail and abstraction. Science, 314:80–85, 2006.
[118] D.J. Higham. An algorithmic introduction to numerical simulation of stochastic differential

equations. SIAM Rev., 43:525–546, 2001.
[119] A.A. Hill, J. Lu, M.A. Masino, O.H. Olsen, and R.L. Calabrese. A model of a segmental
oscillator in the leech heartbeat neuronal network. J. Comput. Neurosci., 10:281–302, 2001.
[120] M.W. Hirsch, C.C. Pugh, and M. Shub. Invariant Manifolds. Springer, Berlin, Heidelberg,
New York, 1977. Springer Lecture Notes in Mathematics No. 583.
[121] M.W. Hirsch, S. Smale, and R.L. Devaney. Differential Equations, Dynamical Systems and
an Introduction to Chaos. Academic Press/Elsevier, San Diego, CA, 2004.
[122] A.L. Hodgkin. Chance and design in electrophysiology: An informal account of certain
experiments on nerve carried out between 1934 and 1952. J. Physiol., 263:1–21, 1976.
[123] A.L. Hodgkin and A.F. Huxley. The components of membrane conductance in the giant axon
of Loligo. J. Physiol., 116:473–496, 1952.
[124] A.L. Hodgkin and A.F. Huxley. Currents carried by sodium and potassium ions through the
membrane of the giant axon of Loligo. J. Physiol., 116:449–472, 1952.
[125] A.L. Hodgkin and A.F. Huxley. The dual effect of membrane potential on sodium conductance
in the giant axon of Loligo. J. Physiol., 116:497–506, 1952.
[126] A.L. Hodgkin and A.F. Huxley. A quantitative description of membrane current and its
application to conduction and excitation in nerve. J. Physiol., 117:500–544, 1952.
[127] A.L. Hodgkin, A.F. Huxley, and B. Katz. Ionic currents underlying activity in the giant axon
of the squid. Arch. Sci. Physiol., 3:129–150, 1949.
[128] A.L. Hodgkin, A.F. Huxley, and B. Katz. Measurement of current-voltage relations in the
membrane of the the giant axon of Loligo. J. Physiol., 116:424–448, 1952.
[129] P. Holmes, R.J. Full, D. Koditschek, and J. Guckenheimer. The dynamics of legged locomo-
tion: Models, analyses and challenges. SIAM Rev., 48(2):207–304, 2006.
200
[130] P. Holmes, J.L. Lumley, G. Berkooz, and C.W. Rowley. Turbulence, Coherent Structures,
Dynamical Systems and Symmetry. Cambridge University Press, Cambridge, U.K., 2012.
Second Edition.
[131] P. Holmes, E. Shea-Brown, J. Moehlis, R. Bogacz, J. Gao, G. Aston-Jones, E. Clayton,

J. Rajkowski, and J.D. Cohen. Optimal decisions: From neural spikes, through stochastic
differential equations, to behavior. IEICE Transactions on Fundamentals on Electronics,
Communications and Computer Science, E88A (10):2496–2503, 2005.
[132] G.R. Holt and C. Koch. Shunting inhibition does not have a divisive effect on firing rates.
Neural Computation, 9:1001–1013, 1997.
[133] J.J. Hopfield. Neural networks and physical systems with emergent collective computational
abilities. Proc. Natl. Acad. Sci. USA, 79 (8):2554–2558, 1982.
[134] J.J. Hopfield. Neurons with graded response have collective computational properties like
those of two-state neurons. Proc. Natl. Acad. Sci. USA, 81 (10):3088–3092, 1984.
[135] F.C. Hoppensteadt and E.M. Izhikevich. Weakly Connected Neural Networks. Springer, New
York, 1997.
[136] D.H. Hubel and T.N. Wiesel. Functional architecture of macaque monkey visual cortex. Proc.
Roy. Soc. B, 198:1–59, 1977.
[137] A.J. Ijspeert. Central pattern generators for locomotion control in animals and robots: A
review. Neural Netw., 21:642–653, 2008.
[138] A.J. Ijspeert, A. Crespi, D. Ryczko, and J.M. Cabelguen. From swimming to walking with a
salmander robot driven by a spinal cord model. Science, 315:1416–1420, 2007.
[139] E.M. Izhikevich. Simple model of spiking neurons. IEEE Transactions on Neural Networks,
14:1569–1572, 2003.
[140] E.M. Izhikevich. Which model to use for cortical spiking neurons? IEEE Transactions on
Neural Networks, 15:1063–1070, 2004.
[141] E.M. Izhikevich. Dynamical systems in neuroscience: The geometry of excitability and burst-
ing. MIT Press, Cambridge, MA, 2007.
[142] C.E. Jahr and C.F. Stevens. A quantitative description of NMDA receptor-channel kinetic
behavior. J. Neurosci., 10:1830–1837, 1990.
[143] C.E. Jahr and C.F. Stevens. Voltage dependence of NMDA-activated macroscopic conduc-
tances predicted by single-channel kinetics. J. Neurosci., 10:3178–3182, 1990.
[144] L.A. Jeffress. A place theory of sound localization. J. Comp. Physiol. Psychol., 41:35–39,
1948.
[145] P.E. Jercog, G. Svirskis, V.C. Kotak, D.H. Sanes, and J. Rinzel. Asymmetric excitatory
synaptic dynamics underlie interaural time difference processing in the auditory system. PLoS
Comput. Biol., 8 (6):e1000406, 2010.
[146] D. Johnston and S. Wu. Foundations of Cellular Neurophysiology. MIT Press, Cambridge,
MA, 1997.
201
[147] C.K.R.T. Jones. Geometric Singular Perturbation Theory, volume 1609 of Lecture Notes in
Mathematics. Springer, Heidelberg, 1994. C.I.M.E. Lectures.
[148] E.G. Jones and A. Peters. Cerebral Cortex, Functional Properties of Cortical Cells, Vol.2.
Plenum, New York, 1984.
[149] E.R. Kandel, J.H. Schwartz, and T.M. Jessel. Principles of Neural Science. McGraw-Hill,
New York, 2000.
[150] J. Keener and J. Sneyd. Mathematical Physiology. Springer, New York, 2009. 2nd Edition,
2 Vols.
[151] H. Keshishian. Ross Harrison’s “The Outgrowth of the Nerve Fiber as a Mode of Protoplasmic
Movement”. J. Exp. Zool., 301A:201–203, 2004.
[152] P.E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations.
Springer, Berlin, 1999.
[153] B.W. Knight. Dynamics of encoding in a population of neurons. J. Gen. Physiol., 59:734–766,
1972.
[154] B.W. Knight. The relationship between the firing rate of a single neuron and the level of
activity in a population of neurons. experimental evidence for resonant enhancement in the
population response. J. Gen. Physiol., 59:767–778, 1972.
[155] C. Koch. Biophysics of computationa - information processing in single neurons. Oxford

University Press, New York, 1999.
[156] D.E. Koditschek, R.J. Full, and M. Buehler. Mechanical aspects of legged locomotion control.
Arthropod Struct. and Devel., 33 (3):251–272, 2004.
[157] N. Kopell. Toward a theory of modelling central pattern generators. In A.H Cohen, S. Rossig-
nol, and S. Grillner, editors, Neural Control of Rhythmic Movements in Vertebrates, pages
3369–413. Wiley, New York, 1988.
[158] N. Kopell, C. Börgers, D. Pervouchine, P. Malerba, and A.B.L. Tort. Gamma and theta
rhythms in biophysical models of hippocampal circuits. In V. Cutsuridis, B.F. Graham,
S. Cobb, and I. Vida, editors, Hippocampal microcircuits: A computational modeller’s resource
book, pages 423–457. Springer, New York, 2010.
[159] N. Kopell and G.B. Ermentrout. Coupled oscillators and the design of central pattern gen-
erators. Math. Biosciences, 90:87–109, 1988.
[160] A.A. Koulakov, S. Taghavachari, A. Kepecs, and J.E. Lisman. Model for a robust neural
integrator. Nat. Neurosci., 5:775–782, 2002.
[161] V.I. Krinsky and Yu. M. Kokoz. Reduction of the Hodgkin-Huxley system to a second order
system. Biofizika, 18 (3):506–511, 1973.
[162] R.P. Kukillaya and P. Holmes. A hexapedal jointed-leg model for insect locomotion in the
horizontal plane. Biol. Cybern., 97:379–395, 2007.
202
[163] R.P. Kukillaya and P. Holmes. A model for insect locomotion in the horizontal plane: Feed-
forward activation of fast muscles, stability, and robustness. J. Theor. Biol., 261 (2):210–226,
2009.
[164] R.P. Kukillaya, J. Proctor, and P. Holmes. Neuro-mechanical models for insect locomotion:
Stability, maneuverability, and proprioceptive feedback. CHAOS: An Interdisciplinary Jour-
nal of Nonlinear Science, 19 (2):026107, 2009.
[165] G. La Camera, A. Rauch, D. Thurbon, H.-R. Luscher, W. Senn, and S. Fusi. Multiple time
scales of temporal response in pyramidal and fast spiking cortical neurons. J. Neurophysiol.,
96:3448–3464, 2006.
[166] D.R.J. Laming. Information Theory of Choice-Reaction Times. Academic Press, New York,
1968.
[167] L. Lapicque. Recherches quantitatives sur l’excitatabilité électrique des nerfs traitée comme
une polarisation. J. Physiol. Pathol. Gen., 9:620–635, 1907.
[168] P. Latham and N. Brunel. Firing rate of the noisy quadratic integrate-and-fire neuron. Neural
Computation, 15:2281–2306, 2003.
[169] P.E. Latham, B.J. Richmond, P.G. Nelson, and S. Nirenberg. Intrinsic dynamics in neuronal
networks. I. Theory. J. Neurophysiol., 83:808–827, 2000.
[170] E.L. Lehmann. Testing Statistical Hypotheses. John Wiley & Sons, New York, 1959.
[171] T.J. Lewis and J. Rinzel. Dynamics of spiking neurons connected by both inhibitory and
electrical coupling. J. Comput. Neurosci., 14:283–309, 2003.
[172] J.E. Lisman, J.M. Fellous, and X.-J. Wang. A role for NMDA-receptor channels in working
memory. Nat. Neurosci., 1:273–275, 1998.
[173] J.E. Lisman and M.A. Idiart. Storage of 7 +/- 2 short-term memories in oscillatory subcycles.
Science, 267:1512–1515, 1995.
[174] Y.H. Liu and X.-J. Wang. Spike-frequency adaptation of a generalized integrate-and fire
model neuron. J. Comput. Neurosci., 10:25–45, 2001.
[175] Y. Loewenstein, S. Mahon, P. Chadderton, K. Kitamura, H. Sompolinsky, Y. Yarom, and

M. Hausser. Bistability of cerebellar purkinje cells modulated by sensory stimulation. Nat.
Neurosci., 8:202–211, 2005.
[176] M. London and M. Häusser. Dendritic computation. Annu. Rev. Neurosci., 28:503–532, 2005.
[177] M.C. Mackey and M. Santillán. Andrew Fielding Huxley (1917-1952). AMS Notices, 60
(5):576–584, 2013.
[178] W.T. Maddox and C.J. Bohil. Base-rate and payoff effects in multidimensional perceptual
categorization. J. Experimental Psychology: Learning, Memory, and Cognition, 24 (6):1459–
1482, 1998.
[179] I.G. Malkin. Methods of Poincaré and Linstedt in the Theory of Nonlinear Oscillations.
Gostexisdat, Moscow, 1949. In Russian.
203
[180] I.G. Malkin. Some Problems in Nonlinear Oscillation Theory. Gostexisdat, Moscow, 1956.
In Russian.
[181] J.J. Mancuso, J. Kim, S. Lee, S. Tsuda, N.B.H. Chow, and G.J. Augustine. Optogenetic
probing of functional brain circuitry. Experimental Physiol., 96 (1):26–33, 2010.
[182] E. Marder. Motor pattern generation. Curr. Opin. Neurobiol., 10 (6):691–698, 2000.
[183] E. Marder and D. Bucher. Understanding circuit dynamics using the stomatogastric nervous
system of lobsters and crabs. Annu. Rev. Physiol., 69:291–316, 2007.
[184] H. Markram. The blue brain project. Nature Reviews in Neuroscience, 7:153–160, 2006.
[185] A. Mason and A. Larkman. Correlations between morphology and electrophysiology of pyra-
midal neurons in slices of rat visual cortex. ii. electrophysiology. J. Neurosci., 10:1415–1428,
1990.
[186] M. Mattia and S. Fusi. Modeling networks with VLSI (linear) integrate-and-fire neurons.
Losanna, Switzerland: Proceedings of the 7th international conference on artificial neural
networks, 1997.
[187] M.M. McCarthy, S. Ching, M.A. Whittington, and N. Kopell. Dynamical changes in neuro-
logical disease and anesthesia. Curr. Opin. Neurobiol., 22 (4):693–703, 2012.
[188] D.A. McCormick, B.W. Connors, J.W. Lighthall, and D.A. Prince. Comparative electrophys-
iology of pyramidal and sparsely spiny stellate neurons of the neocortex. J. Neurophysiol.,
54:782–806, 1985.
[189] T. McMillen and P. Holmes. The dynamics of choice among multiple alternatives. J. Math.
Psychol., 50:30–57, 2006.
[190] T. McMillen and P. Holmes. An elastic rod model for anguilliform swimming. J. Math. Biol.,
53:843–866, 2006.
[191] T. McMillen, T.L. Williams, and P. Holmes. Nonlinear muscles, passive viscoelasticity and
body taper conspire to create neuro-mechanical phase lags in anguilliform swimmers. PLoS
Comput. Biol., 4 (8):e1000157, 2008.
[192] J. Myung and J.R. Busemeyer. Criterion learning in a deferred decision-making task. Amer-
ican J. of Psychology, 102 (1):1–16, 1989.
[193] J.S. Nagumo, S. Arimoto, and S. Yoshizawa. An active pulse transmission line simulating a
nerve axon. Proc. IRE, 50:2061–2070, 1962.
[194] T. Netoff, M.A. Schwemmer, and T.J. Lewis. Experimentally estimating phase response
curves of neurons: theoretical and practical issues. In N.W. Schultheiss, A. Prinz, and R.J.
Butera, editors, PRCs in Neuroscience: Theory, Experiment and Analysis, pages 95–129.
Springer, New York, 2012.
[195] J. Neyman and E.S. Pearson. On the problem of the most efficient tests of statistical hy-
potheses. Phil. Trans. Roy. Soc. A., 231:289–337, 1933.
[196] H. Okamoto and T. Fukai. Physiologically realistic modelling of a mechanism for neural
representation of intervals of time. Biosystems, 68:229–233, 2003.
204
[197] H. Okamoto, Y. Isomura, M. Takada, and T. Fukai. Temporal integration by stochastic
recurrent network dynamics with bimodal neurons. J. Neurophysiol., 97:3859–3867, 2007.
[198] K. Ota, M. Nomura, and T. Aoyagi. Weighted spike-triggered average of a fluctuating stimulus
yielding the phase response curve. Phys. Rev. Lett., 103:024101, 2009.
[199] K.G. Pearson. Central programming and reflex control of walking in the cockroach. J. Exp.
Biol., 56:173–193, 1972.
[200] K.G. Pearson. Motor systems. Curr. Opin. Neurobiol., 10:649–654, 2000.
[201] K.G. Pearson and J.F. Iles. Nervous mechanisms underlying intersegmental co-ordination of
leg movements during walking in the cockroach. J. Exp. Biol., 58:725–744, 1973.
[202] H.J. Poincaré. Sur le problème des trois corps et les équations de la dynamique. Acta
Mathematica, 13:1–270, 1890.
[203] H.J. Poincaré. Les méthodes nouvelles de la mécanique celeste, Vols 1-3. Gauthiers-Villars,
Paris, 1892,1893,1899.
[204] S.A. Prescott and Y. De Koninck. Gain control of firing rate by shunting inhibition: roles of
synaptic noise and dendritic saturation. Proc. Natl. Acad. Sci. USA, 100:2076–2081, 2003.
[205] A.J. Preyer and R.J. Butera. Neuronal oscillators in aplysia californica that demonstrate
weak coupling in vitro. Phys. Rev. Lett., 95:138103, 2005.
[206] A.A. Prinz, L.F. Abbott, and E. Marder. The dynamics clamp comes of age. Trends Neurosci.,
27:218–224, 2004.
[207] J. Proctor and P. Holmes. Reflexes and preflexes: On the role of sensory feedback on rhythmic
patterns in legged locomotion. Biol. Cybern., 2:513–531, 2010.
[208] J. Proctor, R.P. Kukillaya, and P. Holmes. A phase-reduced neuro-mechanical model for
insect locomotion: feed-forward stability and proprioceptive feedback. Phil. Trans. Roy. Soc.
A, 368:5087–5104, 2010.
[209] W. Rall. Branching dendritic trees and motoneuron membrane resistivity. Exp. Neurol., 1
(5):491–517, 1959.
[210] W. Rall. Theoretical significance of dendritic trees for neuronal input-output relations. In
R. Reiss, editor, Neural Theory and Modeling, pages 93–97. Stanford Univ. Press, Stanford,
CA, 1964.
[211] W. Rall, R.E. Burke, T.G. Smith, P.G. Nelson, and K. Frank. Dendritic location of synapses
and possible mechanisms for the monosynaptic EPSP in motoneurons. J. Neurophysiol.,
30:1169–1193, 1967.
[212] R. Ratcliff. A theory of memory retrieval. Psychol. Rev., 85:59–108, 1978.
[213] R. Ratcliff, A. Cherian, and M.A. Segraves. A comparison of macaque behavior and superior
colliculus neuronal activity to predictions from models of two choice decisions. J. Neurophys-
iol., 90:1392–1407, 2003.
205
[214] R. Ratcliff, T. Van Zandt, and G. McKoon. Connectionist and diffusion models of reaction
time. Psychol. Rev., 106 (2):261–300, 1999.
[215] A. Rauch and S. Fusi. Neocortical pyramidal cells respond as integrate-and-fire neurons to
in vivo-like input currents. J. Neurophysiol., 90:1598–1612, 2003.
[216] A. Renart, N. Brunel, and X.-J. Wang. Mean-field theory of recurrent cortical networks:
from irregularly spiking neurons to working memory. In J. Feng, editor, Computational
Neuroscience: A comprehensive approach. CRC Press, Boca Raton, 2003.
[217] L. M. Ricciardi. Diffusion Processes and Related Topics in Biology. Springer, Berlin, 1977.
[218] M.J.E. Richardson, N. Brunel, and V. Hakim. From subthreshold to firing-rate resonance. J.
Neurophysiol., 89:2538–2554, 2003.
[219] F. Rieke, W. Bialek, D. Warland, and R. De Ruyter van Steveninck. Spikes: Exploring the
Neural Code. MIT Press, Cambridge, MA, 1997.
[220] J. Rinzel. Excitation dynamics: Insights from simplified membrane models. Fed. Proc.,
37:2944–2946, 1985.
[221] J. Rinzel and J.P. Keener. Hopf bifurcation to repetitive activity in nerve. SIAM J. on
Applied Mathematics, 43:907–922, 1983.
[222] J. Rinzel and J.B. Keller. Traveling wave solutions of a nerve conduction equation. Biophys.
J., 13:1313–1337, 1973.
[223] R.E. Ritzmann, S.N. Gorb, and R.D. Quinn (Eds). Arthropod locomotion systems: from
biological materials and systems to robotics. Arthropod Struct. and Devel., 33 (3), 2004.
Special issue.
[224] A.E. Rorie, J. Gao, J.L. McClelland, and W.T. Newsome. Integration of sensory and reward
information during perceptual decision-making in lateral intraparietal cortex (LIP) of the
macaque monkey. PLoS ONE, 5 (2):e9308, 2010.
[225] R. Rose and J. Hindmarsh. The assembly of ionic currents in a thalamic neuron I. The
three-dimensional model. Proc. Roy. Soc. B, 237:267–288, 1989.
[226] S. Ross. A First Course in Probability. Prentice Hall, Upper Saddle River, New Jersey, 2002.
[227] A. Roxin and A. Ledberg. Neurobiological models of two-choice decision making can
be reduced to a one-dimensional nonlinear diffusion equation. PLoS Comput. Biol., 4
(3):e10000146, 2008.
[228] D.E. Rumelhart and J.L. McClelland. Parallel Distributed Processing: Explorations in the
Microstructure of Cognition. MIT Press, Cambridge, MA., 1986.
[229] J.D. Schall. Neural basis of deciding, choosing and acting. Nature Reviews in Neuroscience,
2:33–42, 2001.
[230] J. Schmitt, M. Garcia, C. Razo, P. Holmes, and R.J. Full. Dynamics and stability of legged
locomotion in the horizontal plane: A test case using insects. Biol. Cybern., 86(5):343–353,
2002.
206
[231] J. Schmitt and P. Holmes. Mechanical models for insect locomotion: Dynamics and stability
in the horizontal plane – I. Theory. Biol. Cybern., 83(6):501–515, 2000a.
[232] J. Schmitt and P. Holmes. Mechanical models for insect locomotion: Dynamics and stability
in the horizontal plane – II. Application. Biol. Cybern., 83(6):517–527, 2000b.
[233] J. Schmitt and P. Holmes. Mechanical models for insect locomotion: Active muscles and
energy losses. Biol. Cybern., 89(1):43–55, 2003.
[234] N.W. Schultheiss, A. Prinz, and R.J. Butera, editors. PRCs in Neuroscience: Theory, Experi-
ment and Analysis. Springer, New York, 2012. Springer Series in Computational Neuroscience,
Vol. 6.
[235] S. Seung. Connectome: How the Brain’s Wiring Makes Us Who We Are. Houghton Mifflin
Harcourt, New York, 2012.
[236] M.N. Shadlen and W.T. Newsome. Noise, neural codes and cortical organization. Curr. Opin.
Neurobiol., 4:569–579, 1994.
[237] M.N. Shadlen and W.T. Newsome. The variable discharge of cortical neurons: implications
for connectivity, computation, and information coding. J. Neurosci., 18:3780–3896, 1998.
[238] M.N. Shadlen and W.T. Newsome. Neural basis of a perceptual decision in the parietal cortex
(area LIP) of the rhesus monkey. J. Neurophysiology, 86:1916–1936, 2001.
[239] C.E. Shannon. A mathematical theory of communication. Bell Sys. Tech. J., 27:379–423;
623–656, 1948. Reprinted in Shannon and Weaver (1949) [240].
[240] C.E. Shannon and W. Weaver. The Mathematical Theory of Communication. University of
Illinois Press, Urbana, IL., 1949.
[241] A. Sherman, J. Rinzel, and J. Keizer. Emergence of organized bursting in clusters of pancre-
atic β-cells by channel sharing. Biophys. J., 54:411–425, 1988.
[242] E. Shlizerman and P. Holmes. Neural dynamics, bifurcations, and firing rates in a quadratic
integrate-and-fire model with a recovery variable. I: Deterministic behavior. J. Comput.
Neurosci., 24:2078–2118, 2012.
[243] O. Shriki, D. Hansel, and H. Sompolinsky. Rate models for conductance-based cortical neu-
ronal networks. Neural Computation, 15:1809–1841, 2003.
[244] Y. Shu, A. Hasenstaub, and D.A. McCormick. Turning on and off recurrent balanced cortical
activity. Nature, 423:288–293, 2003.
[245] P. Simen, D. Contreras, C. Buck, P. Hu, P. Holmes, and J.D. Cohen. Reward rate optimiza-
tion in two-alternative decision making: Empirical tests of theoretical predictions. J. Exp.
Psychol.: Hum. Percept. Perform., 35 (6):1865–1897, 2009.
[246] P.A. Simen, J.D. Cohen, and P. Holmes. Rapid decision threshold modulation by reward rate
in a neural network. Neural Networks, 19:1013–1026, 2006.
[247] J.D. Skufka. Analysis still matters: A surprising instance of failure of Runge-Kutta-Felberg
ODE solvers. SIAM Rev., 46 (4):729–737, 2004.
207
[248] P.L. Smith and R. Ratcliff. Psychology and neurobiology of simple decisions. Trends in
Neurosci., 27 (3):161–168, 2004.
[249] R. Stein. A theoretical analysis of neuronal variability. Biophys. J., 5:173–194, 1965.
[250] E.A. Stern, A.E. Kincaid, and C.J. Wilson. Spontaneous subthreshold membrane potential
fluctuations and action potential variability of rat corticostriatal and striatal neurons in vivo.
J. Neurophysiol., 77:1697–1715, 1997.
[251] J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer, New York, 2002.
Third edition.
[252] A.M. Stuart and A.R. Humphries. Dynamical Systems and Numerical Analysis. Cambridge
University Press, Cambridge, U.K., 1996.
[253] G. Svirskis, R. Dodla, and J. Rinzel. Subthreshold outward currents enhance temporal inte-
gration in auditory neurons. Biol. Cybern., 89:333–340, 2003.
[254] A. Treves. Mean-field analysis of neuronal spike dynamics. Network, 4:259–284, 1993.
[255] T.W. Troyer and K.D. Miller. Physiological gain leads to high isi variability in a simple model
of a cortical regular spiking cell. Neural Computation, 9:971–983, 1997.
[256] M. Tsodyks and T. Sejnowski. Rapid switching in balanced cortical network models. Network,
6:1–14, 1995.
[257] E.D. Tytell, P. Holmes, and A.H. Cohen. Spikes alone do not behavior make: why neuro-
science needs biomechanics. Curr. Opin. Neurobiol., 21:816–822, 2011.
[258] E.D. Tytell, C.-Y. Hsu, T.L. Williams, A.H. Cohen, and L.J. Fauci. Interactions between body
stiffness, muscle activation, and fluid environment in a neuromechanical model of lamprey
swimming. Proc. Natl. Acad. Sci., 107(46):19832–19837, 2010.
[259] M. Usher and J.L. McClelland. On the time course of perceptual choice: The leaky competing
accumulator model. Psychol. Rev., 108:550–592, 2001.
[260] C. van Vreeswijk and H. Sompoliinsky. Chaos in neuronal networks with balanced excitatory
and inhibitory activity. Science, 274:1724–1726, 1996.
[261] P.L. Várkonyi, T. Kiemel, K. Hoffman, A.H. Cohen, and P. Holmes. On the derivation
and tuning of phase oscillator models for lamprey central pattern generators. J. Comput.
Neurosci., 25:245–261, 2008.
[262] J. von Neumann. The Computer and the Brain. Yale University Press, New Haven, CT, 1958.
2nd Edition, with a foreward by P.M. and P.S. Churchland, 2000.
[263] A. Wald. Sequential Analysis. John Wiley & Sons, New York, 1947.
[264] A. Wald and J. Wolfowitz. Optimum character of the sequential probability ratio test. Annu.
Math. Statist., 19:326–339, 1948.
[265] P. Wallen, O. Ekeberg, A. Lansner, L. Brodin, H. Traven, and S. Grillner. A computer-based

model for realistic simulations of neural networks. II. The segmental network generating
locomotor rhythmicity in the lamprey. J. Neurophysiol., 68:1939–1950, 1992.
208
[266] W.A. Wallis. The Statistical Research Group, 1942-1945. J. Amer. Statist. Assoc., 75:320–
330, 1980.
[267] H. Wang, G.G. Stradtman 3rd, X.-J. Wang, and W.-J. Gao. A specialized NMDA receptor
function in layer 5 recurrent microcircuitry of the adult rat prefrontal cortex. Proc. Natl.
Acad. Sci. USA, 105:16791–16796, 2008.
[268] X.-J. Wang. Fast burst firing and short-term synaptic plasticity: a model of neocortical
chattering neurons. Neuroscience, 89:347–362, 1999.
[269] X.-J. Wang. Synaptic basis of cortical persistent activity: the importance of NMDA receptors
to working memory. J. Neurosci., 19:9587–9603, 1999.
[270] X.-J. Wang. Probabilistic decision making by slow reverberation in cortical circuits. Neuron,
36:955–968, 2002.
[271] X.-J. Wang. Decision making in recurrent neuronal circuits. Neuron, 60:215–234, 2008.
[272] X.-J. Wang. Neurophysiological and computational principles of cortical rhythms in cognition.
Physiol. Rev., 90:1195–1268, 2010.
[273] X.-J. Wang and G. Buzsaki. Gamma oscillation by synaptic inhibition in a hippocampal
interneuronal network model. J. Neurosci., 16:6402–6413, 1996.
[274] R. Weinstock. Calculus of Variations. McGraw-Hill, New York, 1952. Reprinted by Dover
Publications Inc., New York, 1974.
[275] N. Wiener. Cybernetics: or Control and Communication in the Animal and the Machine.
M.I.T. Press, Cambridge, MA, 1948. 2nd Edition, 1961.
[276] T.L. Williams. Phase coupling by synaptic spread in chains of coupled neuronal oscillators.
Science, 258:662–665, 1992.
[277] H. Wilson. Spikes, Decisions and Actions: The Dynamical Foundations of Neuroscience.
Oxford University Press, Oxford, U.K., 1999. Currently out of print, downloadable from
http://cvr.yorku.ca/webpages/wilson.htm#book.
[278] H. Wilson and J. Cowan. Excitatory and inhibitory interactions in localized populations of
model neurons. Biophys. J., 12:1–24, 1972.
[279] H. Wilson and J. Cowan. A mathematical theory of the functional dynamics of cortical and
thalamic nervous tissue. Kybernetik, 13:55–80, 1973.
[280] A.T. Winfree. The Geometry of Biological Time. Springer, New York, 2001. Second Edition.
[281] M. Winograd, A. Destexhe, and M.V. Sanchez-Vives. Hyperpolarization-activated graded

persistent activity in the prefrontal cortex. Proc. Natl. Acad. Sci. USA, 105:7298–7303, 2008.
[282] K.F. Wong and X.-J. Wang. A recurrent network mechanism of time integration in perceptual
decisions. J. Neurosci., 26 (4):1314–1328, 2006.
[283] M. Zacksenhouse, P. Holmes, and R. Bogacz. Robust versus optimal strategies for two-
alternative forced choice tasks. J. Math. Psychol., 54:230–246, 2010.
209

A Short Course in Mathematical Neuroscience (Revised 7/2015)

Uploaded by

Copyright:

Available Formats

A Short Course in Mathematical Neuroscience (Revised 7/2015)

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

A Short Course in Mathematical Neuroscience (Revised 7/2015)

Uploaded by

Copyright:

Available Formats

A Short Course in Mathematical Neuroscience

Philip Eckhoff and Philip Holmes,

Program in Applied and Computational Mathematics,

1 A very short introduction to (mathematical) neuroscience 1

1.1 An historical sketch of nerves and their mathematics . . . . . . . . . . . . . . . . . . 2

1.2.1 Action potentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.3 Neural coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.4 The central nervous system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.5 The cerebral cortex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.5.1 Discovery of organization of the cerebral cortex . . . . . . . . . . . . . . . . . 11

1.5.2 Functional organization of the cerebral cortex . . . . . . . . . . . . . . . . . . 14

1.6 The peripheral nervous system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.7 A note on experimental methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2 Mathematical Tools: ODEs, Numerical Methods and Dynamical Systems 18

2.1 Linearization and solving linear systems of ODEs . . . . . . . . . . . . . . . . . . . . 19

2.1.1 Stability of solutions of ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.1.2 Liapunov functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.2 Numerical methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.2.2 The second-order Euler-Heun method . . . . . . . . . . . . . . . . . . . . . . 31

2.2.3 The Runge-Kutta method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.2.4 Newton’s method for finding zeros . . . . . . . . . . . . . . . . . . . . . . . . 33

2.3 Geometric theory of dynamical systems . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.3.1 Stable and unstable manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.3.2 Planar systems of ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2.3.3 Center manifolds and local bifurcations . . . . . . . . . . . . . . . . . . . . . 41

2.3.4 Periodic orbits and Poincaré maps . . . . . . . . . . . . . . . . . . . . . . . . 51

2.4 An introduction to neural networks as nonlinear ODEs . . . . . . . . . . . . . . . . . 53

3 Models of single cells 60

3.1 Modeling strategies and scales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3.2 Ion channels and membranes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

3.3 The Hodgkin-Huxley (H-H) equations . . . . . . . . . . . . . . . . . . . . . . . . . . 70

3.4 Two-dimensional simplifications of the H-H equations . . . . . . . . . . . . . . . . . 75

3.5 Bursting neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

3.6 Propagation of action potentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

3.6.1 Traveling waves in a simple PDE . . . . . . . . . . . . . . . . . . . . . . . . . 87

3.6.2 Traveling waves in the FitzHugh-Nagumo equation . . . . . . . . . . . . . . . 89

3.7 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

4 Synaptic connections and small networks 92

4.1 Synapses and gap junctions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

4.1.1 Electrical Synapses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

4.2 Integrate-and-fire models of neurons and synapses . . . . . . . . . . . . . . . . . . . 98

4.2.1 Integrate-and-fire models for a single cell . . . . . . . . . . . . . . . . . . . . 98

4.2.2 Integrate-and-fire models with noisy inputs . . . . . . . . . . . . . . . . . . . 100

4.2.3 Implementation of synaptic inputs to IF models . . . . . . . . . . . . . . . . 103

4.2.4 A pair of coupled integrate-and-fire neurons . . . . . . . . . . . . . . . . . . . 109

4.2.5 A final note on neuronal modeling . . . . . . . . . . . . . . . . . . . . . . . . 113

4.3 Phase reductions and phase oscillator models of neurons . . . . . . . . . . . . . . . . 116

4.3.1 Phase response or resetting curves . . . . . . . . . . . . . . . . . . . . . . . . 116

4.3.2 Weakly coupled phase oscillators and averaging theory . . . . . . . . . . . . . 121

4.3.3 Phase models and half-center oscillators . . . . . . . . . . . . . . . . . . . . . 122

4.4 Central Pattern Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

4.4.1 A CPG model for lamprey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

4.4.2 A CPG model for insect locomotion . . . . . . . . . . . . . . . . . . . . . . . 125

5 Probabilistic methods and information theory 131

5.1 Stimulus to response maps: Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . 132

5.1.1 Description of spike trains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

5.1.2 A primer on probability theory . . . . . . . . . . . . . . . . . . . . . . . . . . 134

5.1.3 Spike-triggered averages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

5.1.4 A Poisson model for firing statistics . . . . . . . . . . . . . . . . . . . . . . . 139