Biol. Cybern. 78, 277–292 (1998)
Cortical memory dynamics
Edward W. Kairiss1,3, Willard L. Miranker2,3*
1 Department of Psychology, Yale University, New Haven, Connecticut, USA
2 Department of Computer Science, Yale University, Box 208205, New Haven, CT 06520-28205, USA
3 Neuroengineering and Neuroscience Center, Yale University, New Haven, Connecticut, USA
Received: 26 December 1995 / Accepted: 14 November 1997
Abstract. Biological memories have a number of unique
features, including (1) hierarchical, reciprocally interacting layers, (2) lateral inhibitory interactions within
layers, and (3) Hebbian synaptic modifications. We
incorporate these key features into a mathematical and
computational model in which we derive and study
Hebbian learning dynamics and recall dynamics. Introducing the construct of a feasible memory (a memory
that formally responds correctly to a specified collection
of noisy cues that are known in advance), we study
stability and convergence of the two kinds of dynamics
by both analytical and computational methods. A
conservation law for memory feasibility under Hebbian
dynamics is derived. An infomax net is one where the
synaptic weights resolve the most uncertainty about a
neural input based on knowledge of the output. The
infomax notion is described and is used to grade
memories and memory performance. We characterize
the recall dynamics of the most favorable solutions from
an infomax perspective. This characterization includes
the dynamical behavior when the net is presented with
external stimuli (noisy cues) and a description of the
accuracy of recall. The observed richness of dynamical
behavior, such as its initial state sensitivity, provides
some hints for possible biological parallels to this model.
1 Introduction
The contemporary effort to understand the brain as a
computational device faces a great challenge. Insights
into the morphological and biophysical complexity of
the nervous system, revealed by increasingly sophisticated
methods over the last decade, have not been accompanied
by equivalent insights into functional architecture. Among
the many related questions are: What structural and
biophysical details of neurons and their connections are
relevant to the computations performed by them? What is
the significance (to information processing) of the
morphological and biophysical heterogeneity of neurons?
Do the different circuit architectures seen throughout the
brain subserve different computational functions?

Correspondence to: W.L. Miranker
(e-mail: miranker-willard@yale.edu,
Tel.: +1-203-432 4671, Fax: +1-203-432 0593)
* Research Staff Member Emeritus, IBM Research Center,
Yorktown Heights, New York, USA
The goal of computational neuroscience has been to
develop a theoretical framework that can integrate
knowledge of structure, function, and information processing by neurons, thereby leading to better understanding of higher cognitive qualities of the brain.
Computational studies of neural information processing
generally take one of two forms. Biophysical models of
neurons and networks attempt to embody molecular,
cellular and circuit details of well-studied brain regions
in models that accurately reflect experimental findings.
This class of models serves as a valuable tool for gaining
insight into the complex biophysical interactions that
generate neural activity, and thus provides guidance for
future experiments. On the other hand, artificial neural
networks rely on highly simplified abstractions of biological nervous systems to create models that have
powerful computational properties but provide little
connection with information processing in brains.
The present work is an effort toward bridging the gap
between biologically realistic models and abstract computational models. We design an artificial neural network that functions as an associative memory. While its
architecture is similar to that of many recurrent networks (such as Hopfield-Grossberg nets), it incorporates
three constraints that we judge to be signi®cant features
of biological neural networks. With few exceptions,
none of the existing connectionist models attempts to
cast these architectural features into a computational
and analytical form. Such a model may be used for the
study of a number of key issues, including capacity,
scaling, speed, stability, consistency and the relationship
between network dynamics and synaptic learning rules.
We hope that a deeper understanding of the remarkable
cognitive capabilities of the brain will also result.
2 Biological constraints on the computational model
We modify the standard recurrent architecture to
incorporate three salient features of cortical memory
systems. These are (1) hierarchical, reciprocally connected areas, (2) lateral inhibition, and (3) Hebbian plasticity.
Hierarchical organization. The overall organization of
cortical areas suggests a hierarchical structure, with
extensive reciprocal interactions between areas (Zeki
and Shipp 1988; Felleman and Van Essen 1991).
Thalamic input provides sensory information to `primary' sensory areas, which in turn project to `higher'
cortical regions. As one example, consider the visual
system. A striking characteristic of its organization is
that any visual cortical area that projects to another
visual area receives a reciprocal connection of equal or
greater size. In most cases, these connections are
believed to be excitatory. Similarly, the temporal lobe
memory system consists of a cascaded series of hierarchical areas, with feedforward and feedback connections
between them. Our model represents cortical areas as
layers of processing elements.
Cortical inhibitory systems. Inhibitory interneurons are a
ubiquitous feature of all cortical areas, and a powerful
lateral inhibitory system exists in cortex (for example,
Szentagothai 1969). Although feedforward and feedback
inhibitory schemes have been suggested, the most
compelling evidence (Thomson and Deuchars 1994)
suggests some form of surround inhibition. In this
scheme, activity of a pyramidal neuron leads to inhibition (via an inhibitory interneuron) of neurons that are
not nearest-neighbors. A system like this could implement a type of competitive interaction between groups
of neurons or cortical columns. A key feature of cortical
inhibition is that it is quite long-lasting compared with
excitatory input, which may enhance its ability to create
competitive interactions among sets of cortical pyramidal cells. Our model also incorporates recent evidence
that inhibitory synapses are modifiable (Kano 1994) and
thus may contribute to memory storage.
Synaptic learning rule. A popular hypothesis for the
formation of memories involves a modification of the
connection strengths at the synaptic contacts between
neurons. Experimental studies have identified a use-dependent form of synaptic plasticity, called long-term
potentiation (LTP) (Collingridge and Bliss 1995). It is a
persistent increase in synaptic efficacy that can be quickly
induced. Although there appear to be several forms of
LTP, the one that has been studied most extensively
resembles the synaptic mechanism commonly referred
to as Hebbian modification (Brown et al. 1990). While it
has been convincingly demonstrated that this form of
use-dependent synaptic modification can be triggered at
many different synapses, its precise role in learning
remains uncertain. A Hebbian mechanism is one that
might be particularly useful for unsupervised learning.
Dynamics of learning and recall. Relatively little is
known concerning the patterns of neural activity that
underlie memory storage. One popular hypothesis is that
during learning, long-term representations of sensory
events are stored in a distributed fashion throughout
cortical and subcortical areas, including those regions
that are activated by the sensory event. This process may
involve the simultaneous, widespread activation of
multiple neocortical areas through thalamocortical and
other pathways. These activity patterns are encoded as
persistent memories through some form of synaptic
modification, Hebbian or otherwise. Retrieval (or recall)
involves a process of reactivating, in a similar spatial and
temporal sequence, the originally activated cortical
areas. This idea (Amaral and Price 1984; Rolls 1990) is
lent credence by several observations. First, cortical
areas receive extensive backprojections from succeeding
cortical regions (Felleman and Van Essen 1991), such
that activity in `higher' or `association' (i.e., more
remote from sensory or thalamic input) cortical regions
could reactivate the original patterns in `lower' or
`primary' sensory areas. Second, functional imaging
studies in humans (Roland and Friberg 1985) indicate
that early cortical areas are activated during recall. Our
model assumes that `readout' from memory takes place
in these early cortical regions, but depends heavily on
input signals as well as backprojections.
3 The present work
3.1 Models of cortical memory
Existing models of cortex (for example, Wilson and
Bower 1992; Traub and Jefferys 1994) tend to focus on
biophysical and network-level phenomena that are
present in cortical circuits. While this class of models
embodies selected aspects of biological realism, it is
usually based on numerical representations of cortical
neurons and is directed towards the simulation of a
restricted range of biological phenomena. This class of
model, therefore, is not well suited to the systematic
exploration of the relationship between neural architecture and memory function. Connectionist models, on the
other hand, have not fully explored the diversity of
architectural features (such as those described above)
that are found in biological systems, and therefore
contribute only indirectly to our understanding of
cortical memory.
3.2 Goals of this study
Here we hope to abstract key features of cortical
architecture and incorporate them into a neural network
model that will allow us to explore how these features
might contribute to the design of a biological memory
system (learning and recall). Our approach has a number
of advantages and novel features. First, we systematically examine the contributions of those biological
features considered to be central to cortical associative
memory (see Sect. 2). Specifically, these include Hebbian
learning, interlayer connections, and intralayer connections in the form of lateral inhibition. To our knowledge,
this is the first study to examine a model constrained by
these key biological features by both mathematical
analysis and numerical simulation. Experimental neuroscience provides the neural architectures and cognitive
phenomena that guide and motivate the modelling and
the mathematical analysis. In turn, the applied mathematics reveals conditions, properties and criteria that
will direct and organize experimental strategies and
theories associated with cortical memory.
4 The model
In this section we derive the neural net model accommodating the features described in Sect. 1. The model is
a collection of modified McCulloch-Pitts neurons (with
sigmoidal transfer functions canonical in this subject,
e.g. Haykin 1994) arranged in layers with reciprocal
excitation between layers and lateral inhibition within
layers. We implement the Hebbian modification previously described by means of a dynamics, called Hebbian
dynamics, for developing synaptic strength (see Sect.
4.2). For clarity we restrict our attention here to the case
of two layers and, for the most part, to the case of a
neural transfer function that is a step function. We then
introduce training, consisting of the application of
exogenous stimulation representing input patterns to
the network (see Sect. 5.1). The training drives the
Hebbian dynamics, in turn generating learning.
Following the descriptions of hierarchical organization, local cortical connectivity, cortical inhibition, synaptic learning, and mammalian memory architecture
given in Sect. 1, we propose the following model, a layered network with Hebbian learning dynamics (Fig. 1).
Cortical areas are represented as layers of modified
McCulloch-Pitts neurons. In common with many existing neural network models, there are feedforward excitatory connections between layers. In addition, however,
we introduce two salient features of cortical architecture:
(1) modifiable recurrent excitatory connections that
provide feedback from higher layers to lower layers, and
(2) modifiable inhibitory intralayer synapses.
4.1 The network
Consider a network of McCulloch-Pitts neurons arranged in l layers (Fig. 1). The (input, output) of neuron i
in layer k is

$(u_i^k, v_i^k), \qquad i = 1,\dots,N, \quad k = 0,1,\dots,l-1$

where N is the number of neurons in a layer. There is an
additional exogenous input to layer zero only, and the
value of this input to neuron i is $u_i$, $i = 1,\dots,N$. We
employ the vector notation

$u^k = (\dots, u_i^k, \dots)^T, \qquad v^k = (\dots, v_i^k, \dots)^T, \qquad k = 0,1,\dots,l-1$
$u = (\dots, u_i, \dots)^T$
Consecutively numbered layers are reciprocally interconnected via excitatory synapses. The synaptic strength
of a connection from neuron j in layer m to neuron i in
layer l is denoted by $w_{ij}^{lm}$. There is lateral inhibition, so
that each neuron is connected via an inhibitory synapse
to the neurons in its own layer. Note that if a connection
is missing, that value of $w_{ij}^{lm}$ is zero. The synaptic weights
(excitatory or inhibitory) are real-valued, as are the
neural inputs. We employ the matrix notation

$W^{lm} = (w_{ij}^{lm}), \qquad i,j = 1,\dots,N$

where the $l,m = 0,\dots,l-1$ are either equal (the lateral
inhibition) or are consecutive integers (the reciprocal
excitation). Using signs to indicate excitation and
inhibition and using $\delta$, the Kronecker delta, we write the
input to layer l as a composition of excitatory feedback
from layer l+1, excitatory input from the previous
layer l-1, lateral inhibition and a constant bias
(exogenous input):

$u^l = W^{l,l+1}v^{l+1} + W^{l,l-1}v^{l-1} - W^{ll}v^l + \delta_{l0}u$

The output of layer l is

$v^l(t) = g(u^l(t-1)) = (g(u_1^l(t-1)), \dots, g(u_N^l(t-1)))^T$   (1)
Fig. 1. Two-layer model schematic. Only a subset of possible
connections is illustrated. Excitatory connections are depicted with
open arrows, inhibitory connectivity with filled arrows. Excitatory
connectivity between the layers 0 and 1, and inhibitory connectivity
within layers, are described by matrices W in Sect. 4.1
The neural outputs may be real, integer, or binary,
depending on the choice of the neural transfer function
(the sigmoid g). Here g, the transfer function (`the
sigmoid') of the neurons, may be any one of a variety
of sigmoidal functions. The simplest such depends on a
threshold h, and is given by

$g(z) = \frac{1}{2}\big(1 + \mathrm{sgn}(z - h)\big) = \begin{cases} 1, & z \ge h \\ 0, & z < h \end{cases}$   (2)
The sigmoidal transfer function of our processing
elements is subject to the usual biological interpretation.
That is, g(z) may be interpreted to represent either the
mean firing rate of a neuron, averaged over some
interval, or the instantaneous interspike interval. When
we refer to the binary behavior of our model (i.e., 0 or
1), this may be interpreted as either minimal and
maximal firing rate, or, alternatively, the absence (0)
or presence (1) of a spike during a small time interval. In
its present form, the model is neutral with respect to the
choice of interpretation. Except as noted, we shall, for
convenience and ease of presentation, hereafter restrict
ourselves to this g(z). We refer to such a layered network
as a memory. The processes of storing information in
such a memory and of recalling information stored in it
will be introduced in Sect. 5 (Learning) and in Sect. 6
(Recall), respectively. We shall see that layer 0 functions
as both the input to the memory during learning and the
output during recall.
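As a concrete illustration, the synchronous two-layer update of (1) and (5a) with the step sigmoid (2) can be sketched as follows. This is a minimal sketch, not the authors' code: the threshold, weight values, and the cue are illustrative assumptions of ours.

```python
import numpy as np

def g(z, h=0.5):
    """Step-function sigmoid of (2): g(z) = (1 + sgn(z - h))/2,
    i.e. 1 when z >= h, else 0."""
    return (z >= h).astype(float)

def layer_inputs(W01, W10, W00, W11, v0, v1, u):
    """One synchronous evaluation of (5a) for a two-layer net:
    u^0 = W^{01} v^1 - W^{00} v^0 + u   (exogenous input only at layer 0)
    u^1 = W^{10} v^0 - W^{11} v^1
    """
    u0 = W01 @ v1 - W00 @ v0 + u
    u1 = W10 @ v0 - W11 @ v1
    return u0, u1

# Toy example: N = 3 neurons per layer, weights chosen arbitrarily.
rng = np.random.default_rng(0)
N = 3
W01, W10 = rng.uniform(0, 1, (N, N)), rng.uniform(0, 1, (N, N))
W00, W11 = rng.uniform(0, 0.2, (N, N)), rng.uniform(0, 0.2, (N, N))
v0, v1 = np.zeros(N), np.zeros(N)
u = np.array([1.0, 0.0, 1.0])            # exogenous cue to layer 0

for _ in range(5):                       # iterate v(t) = g(u(t-1)), eq. (1)
    u0, u1 = layer_inputs(W01, W10, W00, W11, v0, v1, u)
    v0, v1 = g(u0), g(u1)
```

With the step sigmoid, all outputs remain binary regardless of the (real-valued) weights and inputs.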
The growth of synaptic strength during dynamics
must be made to be bounded. We achieve this by introducing matrix-valued bounds, $W_L^{lm}$ and $W_U^{lm}$, so that1
(Linsker 1986; Willner et al. 1993)

$0 \le W_L^{lm} \le W^{lm} \le W_U^{lm}$   (7)
This agrees with the biological observation that synaptic
weights do not increase without bound. Alternative
methods for bounding synaptic strength by means of
damping are also available, such as Oja's rule and
Sanger's rule (see Hertz et al. 1991).
For convenience, we shall hereafter restrict ourselves
to the case of two layers.
4.2 Hebbian dynamics
To introduce learning dynamics, we regard all variables
as functions of time. Thus the synaptic dynamics is
specified by the following matrix differential equations:

$\frac{dW^{lm}}{dt} = H^{lm}(v^l(t), v^m(t-1))$

(For convenience, we shall not hereafter indicate the
ranges of the superscripts and subscripts when that
range is clear from the context. Likewise we shall not
indicate the temporal arguments t or t-1 when
confusion will not occur.) $H^{lm}$ is a matrix-valued
(Hebbian) function to be specified: $H^{lm}(v^l, v^m) = (H_{ij}^{lm}(v^l, v^m)) = (H_{ij}^{lm}(v_i^l, v_j^m))$. Consider the following
form of H (Linsker 1986; Willner et al. 1993):

$H^{lm}(x, y) = a_0^{lm}xy + a_1^{lm}x + a_2^{lm}y + a_3^{lm}$   (3)

which we shall refer to as the base form. This form of H
encapsulates our current understanding of the biological
mechanisms underlying Hebbian plasticity in the nervous system. We shall use the base form of H for some
of our analytical results and for all our computations.
For the case of two layers we simplify the notation,
using the binary variables k and $\tilde{k}$ (= not k):

$\frac{dW^{k\tilde{k}}}{dt} = H^{k\tilde{k}}(v^k, v^{\tilde{k}}), \qquad \frac{dW^{kk}}{dt} = H^{kk}(v^k, v^k), \qquad k = 0,1$   (4)
and

$u^k = W^{k\tilde{k}}v^{\tilde{k}} - W^{kk}v^k + \delta_{k0}u$   (5a)
$v^k = g(u^k), \qquad k = 0,1$   (5b)

Suppressing superscripts for clarity, we may alternatively write the differential equations as a recurrence

$W(n+1) - W(n) = sH(v(n+1), v(n))$   (6)

the form that is used in the simulations. Here s is a
scaling factor, and n denotes the time clocked in cycles.
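The recurrence (6), using the base form (3) for H and the hard bounds (7), can be sketched as follows. The coefficient values $a_0,\dots,a_3$, the scaling s, and the scalar bounds are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def hebb_base(x, y, a0, a1, a2, a3):
    """Base form (3): H(x, y) = a0*x*y + a1*x + a2*y + a3, applied
    componentwise, so H_ij depends on (x_i, y_j)."""
    return a0 * np.outer(x, y) + a1 * x[:, None] + a2 * y[None, :] + a3

def hebbian_step(W, v_post, v_pre, s=0.05, a=(1.0, 0.0, 0.0, -0.1),
                 WL=0.0, WU=1.0):
    """One step of the recurrence (6), W(n+1) = W(n) + s*H(v(n+1), v(n)),
    followed by the bounds (7), W_L <= W <= W_U (scalar bounds here)."""
    W_new = W + s * hebb_base(v_post, v_pre, *a)
    return np.clip(W_new, WL, WU)

# Toy usage: coincident pre/post activity strengthens those entries,
# while the (assumed) negative a3 gives a uniform decay elsewhere.
N = 4
W = np.full((N, N), 0.5)
v = np.array([1.0, 0.0, 1.0, 0.0])
W = hebbian_step(W, v, v)
```

One such step raises the weights between co-active neurons and slightly lowers the rest, with all weights staying inside the prescribed bounds.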
5 Learning

Learning in our layered cortical model is based on the
description of learning in the mammalian memory
architecture section of Sect. 1. In this section, we
introduce the notion of a feasible memory to characterize convergence of the Hebbian dynamics (the learning).
A feasible memory is one whose synaptic weights are
such that the memory formally responds with the correct
one of a specified set of patterns in response to a
stimulus, the latter being one of an appropriate set of
noisy cues. A feasible memory is a novel mathematical
construct, and it is shown to be specified by a collection
of degenerate linear programs. It is a memory that is
consistent with a correct recall corresponding to each of
a set of noisy cues, were they to be known in advance. A
number of properties of such memories are developed to
study the learning (and in Sect. 6, the recall) process.
Next we carry over the techniques of reinforcement
learning analysis to the present self-organizing memory
protocol (see Sect. 5.3). This novel use of reinforcement
learning techniques will have wide impact on the study
of self-organizing memories. For example, using these
techniques and the feasible memory construct, we
develop convergence criteria for learning in a memory
with Hebbian dynamics. As we shall see, correct recall is
impossible without memory feasibility, so we next
develop conditions for when the learning causes convergence to a feasible memory.
It is of interest to know whether a memory, once
feasible, stays feasible when subjected to further learning
(further Hebbian dynamics). For this purpose the conservation of feasibility under Hebbian dynamics is then
addressed, and conditions for it are obtained. Since, as it
turns out, there is a wide choice of feasible memories,
indeed a polyhedron-full in synaptic weight space, we
are motivated to grade these memories for performance,
capacity, etc. To do this we employ the principle of
maximum information preservation (infomax) to specify a
notion of optimality (in an information theoretic sense)
among feasible memories. The mutual information for a
neuron is the uncertainty concerning its input that is
resolved given its output. An infomax memory is one
whose synaptic weights are chosen so as to maximize this
uncertainty resolution.

1 We use the conventional mathematical notation for relations
between arrays (matrices, vectors, etc.). For example, a matrix is
positive if all its entries are positive. The symbol 0 will denote an
array all of whose entries are 0, no matter what the dimension of
the array, the last being clear from the context.
5.1 Training
A memory is trained by exposing it to patterns $\chi$. Each
such pattern corresponds to a subset P of the integers
$\{1,\dots,N\}$, the subset indexing which of the exogenous
inputs are on when $\chi$ is presented, that is, which of the
components $u_i = 1$. Each pattern is thus a characteristic
function $\chi$, where

$\chi(i) = \begin{cases} 1, & i \in P \\ 0, & i \notin P \end{cases}, \qquad i = 1,\dots,N$

We shall also use the spin variable $\sigma = 2\chi - 1$. Exposure
to a pattern means to set the exogenous input $u = \chi$ in
the Hebbian dynamics (4) and (5). Since $\chi(i) \in \{0,1\}$ for
each i, then $\chi \in \{0,1\}^N$.
We now introduce the notion of a feasible memory. It
is a mathematical construct by means of which a number
of key properties of memories, for both learning and
recall, will be obtained. For example, in Sect. 6.2 we
shall see that correct recall is impossible without memory feasibility.
5.2 Feasible memories
The response of a memory is $v^0$, the output of layer zero,
provided that output is stationary. A feasible memory
(the terminology coming from linear programming) is
one that satisfies a certain collection of static alignment
conditions between input and response. In particular, we
specify conditions for the weights that correspond to a
memory which has as output (that is, which does recall)
each of a specified collection of patterns $\{\chi_l\}$,
$l = 1,\dots,p$. Recall of the pattern $\chi_l$ is in response to
any member of an associated set of input stimuli
$S_l$, $l = 1,\dots,p$. The input stimuli are to be thought of
as noisy versions of the pattern. (Each input stimulus,
that is, each element of $S_l$, takes values in $\{0,1\}^N$.) We
would usually expect $\chi_l$ itself to be one of the stimuli in
$S_l$, and for definiteness, we take this to be the case. The
$S_l$ could be chosen alternatively as disjoint sets of
positive integers, where each $S_l$ indexes the set of noisy
cues intended to cause the memory to give the output
pattern $\chi_l$, that is, the output $v^0 = \chi_l$. Let $s_l = |S_l|$, $l = 1,\dots,p$, be the number of noisy cues in $S_l$. We
call such a memory a feasible memory.
Let $W^{lm}, U^l, V^m$ denote the weights, inputs and outputs, respectively, of a feasible memory. Then

$U^k = W^{k\tilde{k}}V^{\tilde{k}} - W^{kk}V^k + \delta_{k0}\chi_s, \qquad s \in S_l$   (8)
$V^k = g(U^k), \quad V^0 = \chi_l, \qquad k = 0,1, \quad l = 1,\dots,p$   (9)
specify the feasibility [cf. (5b)]. That is, (8) and (9)
contain the statements that the output of layer 0 is the
pattern $\chi_l$ when the exogenous input to layer 0 is the
noisy cue $\chi_s \in S_l$. The reader can check this: Replace
$U, V, W$ in (8) by $u, v, W$. Then set $k = 0,1$. The result
is (5a) for $k = 0,1$. Similarly, (9) leads to (5b). If the
output of layer 1 is also specified, say as $V^1 = \chi_l^1$, then
we also have $g(U^1) = \chi_l^1$. If $\chi_l = \chi_l^1$, a kind of resonance,
(8) and (9) simplify to

$U^k = (W^{k\tilde{k}} - W^{kk})\chi_l + \delta_{k0}\chi_s$
$\chi_l = g(U^k)$

We shall use the notation $\chi_l = \chi_l^0$ and the notation
$V^1 = \chi_l^1$ whether or not the latter is specified. The
restriction $V^1 = \chi_l^1$ is, in fact, a loss of generality since
$V^1$, the output of layer 1, could be different for each
input stimulus $\chi_s$. We make this restriction for clarity
and convenience, deferring treatment of the fully general
case. Memory feasibility is also specified by systems of
linear inequalities (that define a polyhedron in W-space)
obtained by suitably combining (8) and (9) and using the
spin variable $\sigma_l^k$ ($= 2\chi_l^k - 1$) annotated by pattern and
layer number. Indeed a formal definition of a feasible
memory is the following:
DEFINITION. A memory is called feasible if the following
condition holds for the input $\chi$, the weights W and
threshold h:

$\sigma_l^k \cdot F^k(W) \equiv \sigma_l^k \cdot \big({-W^{kk}}\chi_l^k + W^{k\tilde{k}}\chi_l^{\tilde{k}} + \delta_{k0}\chi_s - h\big) \ge^* 0, \qquad k = 0,1, \quad s \in S_l, \quad l = 1,\dots,p$   (10)

Here the bold dot ($\cdot$) indicates that the multiplication is
componentwise. We shall not necessarily repeat the bold
dot, since this componentwise product, involving the
spin, is clear from the context. The quantity F is defined
in this relation as the indicated abbreviation. The
asterisk indicates that $\ge$ is to be replaced by $>$ for
those i for which the spin variable $\sigma_l^0(i) = -1$. That is,
for each l and s the replacement is made for those
neurons in layer zero that are intended not to fire. For
the case of resonance, we simply set $\chi_l^{\tilde{k}} = \chi_l^k$ here. Note
that $F^k$ is an N-vector. h indicates the vector of the
appropriate dimension all of whose components equal h.
We shall write $\sigma_l^k F^k$ (in which the product is taken
componentwise) as $\sigma^k F^k$ or as $\sigma F$ for convenience as
needed. We call W the set of solutions W of (10). That
is, W is the totality of feasible memories.
Setting2

$\sigma_l = (\sigma_l^0, \sigma_l^1)^T, \qquad \hat{\chi}_s = (\chi_s, 0)^T, \qquad X_l = (\chi_l^0, \chi_l^1)^T$

and

$\mathbf{W} = \begin{pmatrix} -W^{00} & W^{01} \\ W^{10} & -W^{11} \end{pmatrix}$   (11)

(10) may be written more compactly as

$\sigma_l \cdot (\mathbf{W}X_l + \hat{\chi}_s - h) \ge^* 0, \qquad s \in S_l, \quad l = 1,\dots,p$   (12)

2 Note that here and throughout we have standard mathematical
usage. Namely, bold type is not used to specify the dimension of an
array, that is, whether it is a vector or a matrix. The dimension of
an array is indicated by the context.
A feasible memory is defined by these static constraints
and not by dynamic behavior. Thus a feasible memory is
consistent with the requirement to give a specified set of
recalls, each to a different specified collection of cues.
We shall see in Sect. 6.2 that memory feasibility is a
necessary condition for correct recall, but it is not
sufficient. Indeed, in a framework wherein a feasible
memory is stimulated and then an appropriate recall
dynamics is implemented, the specified (correct) recall may
or may not occur. It may even happen that no recall occurs at
all, the entire process being initial-state-sensitive. This
behavior is compatible with the operation of recurrent neural
nets generally. It is also consistent with our own behavioral
experience.
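A direct mechanical check of the alignment conditions (10) can be sketched as follows. This is a toy verifier of ours, assuming the two-layer notation above; the strict inequalities are applied only to the layer-0 neurons intended not to fire, per the asterisk convention.

```python
import numpy as np

def is_feasible(W00, W01, W10, W11, chi0, chi1, cues, h=0.5):
    """Check the static feasibility conditions (10) for one stored
    pattern (chi0, chi1) against its cue set."""
    for chi_s in cues:
        for k, (Wkk, Wkt, own, other, cue) in enumerate([
                (W00, W01, chi0, chi1, chi_s),                   # layer 0: cue
                (W11, W10, chi1, chi0, np.zeros_like(chi_s))]):  # layer 1: none
            sigma = 2 * own - 1
            F = -Wkk @ own + Wkt @ other + cue - h   # the N-vector F^k(W)
            sF = sigma * F                           # componentwise product
            # strict > 0 only where the layer-0 spin is -1 (silent neurons)
            strict = (sigma < 0) if k == 0 else np.zeros_like(sigma, bool)
            if np.any(sF[strict] <= 0) or np.any(sF[~strict] < 0):
                return False
    return True

# Tiny hand-built example: resonance pattern [1, 0] with identity
# cross-layer weights and no lateral inhibition.
N = 2
chi0 = chi1 = np.array([1.0, 0.0])
W01 = W10 = np.eye(N)
W00 = W11 = np.zeros((N, N))
ok = is_feasible(W00, W01, W10, W11, chi0, chi1, [chi0], h=0.5)
bad = is_feasible(W00, W01, W10, W11, chi0, chi1, [chi0], h=2.0)
```

Raising the threshold h far enough violates the alignment conditions, illustrating that feasibility is a joint property of weights, threshold, and patterns.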
5.3 Convergence of the Hebbian dynamics (the training)
We may borrow techniques of the analysis of the
perceptron training algorithm to develop a convergence
result for the Hebbian dynamics. This is a novel use of
reinforcement learning techniques, carrying them over to
the study of self-organizing memory dynamics. To
establish convergence, we introduce the following three
hypotheses.3 (For convenience, we omit the superscripts
on W and H.)

A0: $W > 0$
A1: $W \cdot H(g(u^k(n-j)), v^k(n-j)) \ge cW$, for some $c > 0$
A2: $H^2 \le M^2$, $|W| \le jM^2$, for constants $M$ and $j > 0$

A0 is an additional constraint to be imposed for
memory feasibility. The arguments of H (componentwise) are the vertices of the unit square [cf. (2)]. In
particular,

$(v^l(n), v^m(n-1)) \in \{(0,0), (1,0), (0,1), (1,1)\}$   (14)

Then A1 will be valid if the following condition holds:

$W \cdot H \ge \min\{H(0,0), H(0,1), H(1,0), H(1,1)\}\,W$

or in the base case (3), if

$W \cdot H \ge \min\{a_3,\ a_2 + a_3,\ a_1 + a_3,\ a_0 + a_1 + a_2 + a_3\}\,W$

Indeed, the constant c in A1 could be chosen to be the
value of the minimum in these equations. The four
candidates for this minimum are not all positive.
However, as a practical observation based on simulations (not reported in this work), we noted that the
sequence of values of H that appears during the training
dynamics, while not a positive sequence, is positive on
the average. Thus a suitable modification of the
perceptron algorithm argument that incorporates averaging is suggested and is developed in the next section.
A2 is easily satisfied since $W \in [W_L, W_U]$ [see (7)].
With these hypotheses, we derive the following lower
bound (13a) and upper bound (13b) for W(n), where J is
the matrix of ones (see Appendix A for details):

$|W(n)| \ge sncJ - |W(0)|$   (13a)
$|W(n)| \le \big(\sqrt{n}\,Ms + s\sqrt{2jM}\big)J$   (13b)

Notice that at n = 0 the two bounds are compatible.
However, for n sufficiently large, the bounds cross and a
contradiction results. This implies that W(n) converges
in finitely many time steps. We shall refer to this finite
number of steps as $n_0$. These three hypotheses are not
biologically realistic, and so in the next section on
averaging, we show how to deduce the convergence
result without recourse to them. That is, in Sect. 5.4 we
obtain convergence of the Hebbian dynamics in a
biologically realistic context.

3 In the following and throughout, we have standard mathematical usage. Namely, the absolute value of a matrix (or of any
array) denotes the matrix (or array) of absolute values.

5.4 Averaging

We appeal to the asymptotic multiscale theory of
recurrences to develop an averaging approach for the
training algorithm. While an averaged convergence
result is weaker (in the mathematical sense) than the
one just obtained in Sect. 5.3, the averaging will allow us
to drop A0, A1 and any associated biological constraints. For a recurrence relation of the form (6), this
multiscale theory (Miranker 1981) gives

$W = sH_{av}n + O(1)$

Here, $H_{av}$ is a matrix defined in Appendix B, where other
details of this derivation may also be found. This is the
projected averaging result, and it enables us to deduce
the needed lower bound for W(n) above without
recourse to A1. Nor for that matter is A0 needed, since
division by W (componentwise) as performed in Appendix A is now out of the picture.

5.5 Convergence to a feasible memory

Since feasibility is an indispensable memory requirement, we give an approach to answering the question: If
convergent, when does the training result in a feasible
memory? In the following theorem, a condition for this
is given in terms of an appropriate matrix C, a
correlation matrix of the patterns (details are given in
Appendix C):

THEOREM. A necessary condition for training to result in a
feasible memory is that the admissible cues lie in the
eigenspace of $C = \big(\sum_{(a,b)} a\,b\,\chi(i)\chi(j)\big)$. Here the $(a,b)$
vary over the unit square [cf. (14)], and the relevant $\chi$
vary over the output patterns.
5.6 Conservation of feasibility
We ask whether the property of memory feasibility is
invariant under Hebbian dynamics. One biological
implication of conservation of feasibility (under Hebbian dynamics) is that once a memory has learned a
collection of patterns and is in fact a feasible memory,
further learning will not destroy already acquired
essential (for recall) memory properties. We now
develop conditions for which $\sigma F(W(n)) \ge^* 0$ implies
that $\sigma F(W(n+1)) \ge^* 0$ [which is the formal mathematical statement of conservation of feasibility, cf. (10)].
That is, we develop conditions for when the set W of
feasible memories is invariant under the Hebbian dynamics.
Let $M_l^k$ denote the number of ones in the set
$\{\chi_l^k(i),\ i = 1,\dots,N\}$. Then the condition

$M_l^k = M^k$   (15)

(that is, the quantities $M_l^k$ are independent of l) gives
conservation of feasibility. See Appendix D for details.
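Condition (15) is easy to test mechanically; the following sketch (the pattern encoding as pairs of layer-0 and layer-1 lists is our choice) checks that the per-layer counts of ones agree across all stored patterns.

```python
def ones_counts(patterns):
    """M_l^k of Sect. 5.6: the number of ones in pattern l at layer k."""
    return [[int(sum(layer)) for layer in pat] for pat in patterns]

def feasibility_conserved(patterns):
    """Condition (15): for each layer k, M_l^k must be the same for
    every stored pattern l (i.e. independent of l)."""
    counts = ones_counts(patterns)
    return all(len({row[k] for row in counts}) == 1
               for k in range(len(counts[0])))

# Two patterns, each given as (layer-0 pattern, layer-1 pattern):
pats_ok = [([1, 0, 1], [1, 1, 0]), ([0, 1, 1], [0, 1, 1])]   # counts match
pats_bad = [([1, 0, 0], [1, 1, 0]), ([0, 1, 1], [0, 1, 1])]  # layer 0 differs
```

In the first set every pattern has two active units per layer, so (15) holds; in the second, the layer-0 counts differ between patterns and conservation fails.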
5.7 Infomax
Among the feasible memories W , we should like to
choose those which are optimal, and the principle of
maximum information preservation (see Linsker 1986;
Haykin 1994) furnishes a means for doing so. We use the
well-known concept of an infomax net (Linsker 1986;
Haykin 1994) ± one where the synaptic weights resolve
the most uncertainty about a neural input based on
knowledge of the output. By means of a closed form
example to be given in Sect. 6 on recall, we shall see that
the infomax technique oers a way to grade memories
by performance with clear biological implications.
As is well known, the mutual information for a neuron is the uncertainty concerning its input that is resolved given its output. Let wi ; i 1; 2; . . . be the weights
of the input synapses, and let xi be the corresponding
inputs. The mutual information denoted
P I y; x, where y
is an output of a neuron and where wi xi is the input to
that neuron, is given by (Haykin 1994)
X
wi constant
I y; x ÿ log
i
Although we do not restrict our network to pure Gaussian input, by minimizing the sum of all the synaptic weights in our memory network (that is, a global infomax) we will apply the information theoretic criteria above to design our network. In particular, by performing the minimization

$$\min \sum_{i,j,l,m} w_{ij}^{lm}$$

subject to the memory constraints (10), we can specify a best global memory in an information theoretic sense. The optimization problem is a collection of linear programs (Dantzig 1963) indexed by the choices of $V_l^1$. We call a solution of such a linear program $W_I$, and the collection of such solutions we call $\mathcal{W}_I$.
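The selection can be illustrated without a full LP solver. A minimal sketch that scans a coarse grid for the feasible weight setting of smallest total weight; the two inequality constraints are illustrative stand-ins for the memory constraints (10), not a real memory design (the paper's simulations used a true linear program):

```python
# Sketch of the global infomax criterion: among feasible weight
# settings, prefer the one of smallest total synaptic weight.
# The constraints w1 + 2*w2 >= 1 and w2 + w3 >= 1 are toy stand-ins.
grid = [i / 20.0 for i in range(41)]            # candidate weights in [0, 2]
feasible = [(w1, w2, w3)
            for w1 in grid for w2 in grid for w3 in grid
            if w1 + 2 * w2 >= 1 and w2 + w3 >= 1]
infomax = min(feasible, key=sum)                # smallest weight sum wins
print(infomax, sum(infomax))                    # (0.0, 0.5, 0.5) 1.0
```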
For linear programming (to be pursued computationally in Sect. 7), we write the constraints (10) in the canonical form

$$A(m, n)\,Z(n) \ge B(m)$$  (16)

where

$$Z(n) = W_{ij}^{ab},\qquad A(m, n) = r^{al}(i)\,v_l^b(j),\quad a \ne b,\qquad B(m) = -r^{al}(i)\,\big(\hat v_{s(l)}(i) - \theta\big)$$

Here, $v_{s(l)}$ denotes the noisy cue $v_s$ where, in particular, the $l$ stresses that the cue lies in the collection $S_l$; $v_{s(l)}(j)$ denotes the $j$th component (input) of that
cue. Setting $\sigma = \max_l |S_l|$, the new row index $m$ enumerates the tuples $(i, b, l, s)$ and the new column index $n$ the tuples $(j, b, l, a)$:

$$m = m(i, b, l, s),\qquad 0 < i \le N,\ 0 \le b \le 1,\ 0 < l \le p,\ 0 < s \le \sigma$$

and

$$n = n(j, b, l, a),\qquad 0 < j \le N,\ 0 \le b \le 1,\ 0 < l \le p,\ 0 \le a \le 1$$

In the case that the $|S_l|$ are different, we simply make each $|S_l| = \sigma$ by augmenting each $S_l$ with redundant cues as necessary. Thus the order of the matrix $A$ is the number of admissible row tuples by the number of admissible column tuples.
5.8 Open questions
When do the Hebbian dynamics achieve a global
infomax memory in WI ? Is the set WI invariant under
the Hebbian dynamics? That is, if an infomax memory is
once achieved, is the infomax property maintained
under further learning? Does the set WI ever consist
of one point? That is, when is the infomax memory
unique?
6 Recall
The process of recall is directly modelled on the
biological description given in the architecture section
(Recall, cf. Sect. 5.2, that $|S_l|$ denotes the number of noisy cues in $S_l$, $l = 1, \ldots, p$.)
of Sect. 1. As a preliminary simplification, we assume that weights are frozen during recall. We believe that this weight freezing is not mandatory, but a formal demonstration of this is yet to be done. This might proceed along the lines of the conservation of feasibility property that is described in Sect. 5.6. Layer 0 serves as both the input layer during learning and the output layer during recall (as described in Sect. 2; cf. Roland and Friberg 1985). Input stimuli are presented to layer 0 and the network is allowed to relax (via the recall dynamics, to be introduced), reaching a recall that is the output of layer 0. This resultant output state may or may not be a correct recall, this being dependent on the initial state of the memory when it is first exposed to an input stimulus.
In this section, the indispensability of memory feasibility is demonstrated by showing that feasibility is a necessary condition for correct recall. Further, we show that the recall dynamics is strongly stable at a feasible memory (see Sect. 6.3). This implies that each of the different memory records defines a basin of attraction. A notion of best memory results from this analysis. A best memory is specified by a geometric condition characterizing memories that have the largest basins of attraction, on the average. Lyapunov and averaging techniques are then used to obtain a local convergence result for the recall dynamics (see Sect. 6.4). This tells us that if the recall process gets sufficiently close to a retrieval, that retrieval in fact occurs. This results in a characterization of neuron gain functions which are appropriate. A closed form example (of our neural network model) is then given (see Sect. 6.5) to show that the recall dynamics applied to a feasible memory does not always give the correct recall. Indeed it may give none at all, since the entire process is highly initial-state sensitive. The closed form example also exposes a number of possible memory properties, including how the notion of infomax may be used to grade the recall process itself. Some conjectures dealing with the superiority in performance (in terms of speed and consistency) of an infomax memory are suggested by this example. Finally, we develop a number of mathematical properties of the recall process: stability, local convergence, global convergence. The global convergence result for recall is developed for the case of smooth neuron transfer functions, and this further characterizes such functions.
6.1 Recall dynamics
Recall dynamics consists of freezing the synaptic weights (that is, setting the right member of (4) to zero), setting the exogenous input $u = v_s$ (the stimulus), and rewriting (5b) as a recurrence, as follows:

$$u^k(n) = W^{k\tilde k} v^{\tilde k}(n) - \tilde W^{kk} v^k(n) + \hat v_s^k$$
$$v^k(n+1) = g\big(u^k(n)\big),\qquad n = 0, 1, \ldots$$  (17)

Anticipating the theorem giving necessity of feasibility in Sect. 6.2, let us suppose that the memory is feasible, and let us place a bar on the $W$ that appears in (17). Then we combine (17) with the feasibility constraints (8) to obtain (for $k = 0$ and 1)

$$u^0(n) - U^0 = \bar W^{01}(v^1 - V^1) - \bar{\tilde W}^{00}(v^0 - v_l^0)$$
$$u^1(n) - U^1 = \bar W^{10}(v^0 - v_l^0) - \bar{\tilde W}^{11}(v^1 - V^1)$$

Similarly,

$$v^0(n+1) - v_l^0 = g(u^0(n)) - g(U^0)$$
$$v^1(n+1) - V^1 = g(u^1(n)) - g(U^1)$$

Then setting $x = (u^0, u^1)^T$, $y = (v^0, v^1)^T$, $X = (U^0, U^1)^T$, $Y = (v_l^0, V^1)^T$, the recall dynamics of a feasible neural net may be written as

$$y(n+1) = g(x(n))$$  (18)

with

$$x(n) = X + \bar W(y(n) - Y) = \bar W y(n) + \hat v_s,\qquad n = 0, 1, \ldots$$  (19)

the last following from $X - \bar W Y = \hat v_s$, by definition. The vector of initial neuronal outputs $y(0)$, a vertex of the unit square, must be known independently, and it plays a critical role in recall, as we shall see in (31). Note that we have placed a bar on the $W$ appearing in (19). We do this to confine our attention to feasible memories, since we shall see in Sect. 6.2 that memory feasibility is a necessary condition for recall.
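The recurrence (18), (19) can be exercised directly. A minimal sketch with a hard-threshold gain; the weights, cue, and threshold below are illustrative assumptions, not a trained memory:

```python
# Sketch of the recall iteration (18)-(19): y(n+1) = g(x(n)),
# x(n) = W_bar y(n) + v_hat_s, with a hard threshold gain g.
theta = 0.5                                   # illustrative firing threshold

def g(u):                                     # threshold gain, cf. (1)
    return [1.0 if ui > theta else 0.0 for ui in u]

def recall(W_bar, v_hat_s, y0, steps=20):
    """Iterate until the whole state is stationary (finite convergence)."""
    y = list(y0)
    for _ in range(steps):
        x = [sum(Wij * yj for Wij, yj in zip(row, y)) + vi
             for row, vi in zip(W_bar, v_hat_s)]
        y_next = g(x)
        if y_next == y:
            return y
        y = y_next
    return None                               # no stationary recall reached

W_bar = [[0.0, 1.0], [1.0, 0.0]]              # illustrative 2-neuron net
print(recall(W_bar, v_hat_s=[0.0, 0.0], y0=[0.0, 0.0]))  # [0.0, 0.0]
```

Note that starting the same net from y0 = [1.0, 0.0] yields no stationary state at all (the state oscillates), previewing the initial-state sensitivity of Sect. 6.5.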
6.2 Memory recall
Corresponding to an input $u$, a memory is said to produce the recall $v$ if the sequence $v^0(n)$ generated by (18) and (19) is finitely convergent to $v$. While this makes no demand on the limiting nature of $v^1(n)$, in order to obtain some of the results to follow we shall also require that $v^1(n)$ be finitely convergent as well, so that the recall state of the entire memory net shall be stationary. We use the notation $u \to v$ to denote this, namely, that the memory gives the response $v$ to the cue $u$. The following theorem allows us to confine our attention to feasible nets:

THEOREM. A necessary condition for the recall dynamics to give $v_s \to v_l$, $s \in S_l$, $l = 1, \ldots, p$, is that the memory is feasible [that is, that the weights of the memory obey the feasibility constraints (10)].
PROOF. We rewrite the recall dynamics (17) as

$$u^k(n) = P^k(n) + p^k(n)$$

and

$$v^k(n+1) = g(u^k(n))$$

Here

$$P^0(n) = W^{01} v^1(n) - \tilde W^{00} v_l^0 + \hat v_s,\qquad P^1(n) = W^{10} v_l^0 - \tilde W^{11} v^1(n)$$  (20)

and

$$p^0(n) = -\tilde W^{00}\big(v^0(n) - v_l^0\big),\qquad p^1(n) = W^{10}\big(v^0(n) - v_l^0\big)$$  (21)

This may be checked by direct substitution. By hypothesis $v^0(n) = v_l^0$ and $v^1(n) = V^1$, say, for $n > n_0$ sufficiently large. Then also for $n > n_0$ we have from (21) that $p^k(n) = 0$, and from (20) that $u^k(n) = P^k(n) = U^k$, say. Then

$$U^0 = W^{01} V^1 - \tilde W^{00} v_l^0 + \hat v_s,\qquad U^1 = W^{10} v_l^0 - \tilde W^{11} V^1$$

and

$$(v_l^0, V^1)^T = \big(g(U^0), g(U^1)\big)^T$$

The last two equations are the feasibility constraints in the form (8) and (9), which demonstrates the theorem.

Note: An open question concerns the case of recall when the synaptic weights are allowed to continue development. In particular, under what conditions does a feasible memory stay feasible during such an active recall process? If feasibility is conserved under the Hebbian dynamics (cf. Sect. 5.6), then the theorem here allows us to eliminate the need to freeze the weights $W$ after training.

Best memory. The hypothesis concerning constraint inactivity, as it is used here, suggests an alternative notion to the optimal memory described previously in the context of infomax. A `best memory' may be defined as one for which the constraints are `the most inactive'. Then a best memory corresponds to the case when the polyhedron of feasible memories contains the largest possible inscribed sphere, and where the weights $W$ of that memory are at the center of that sphere. Such a memory would be the most robust biologically. That is, it would have `the most ground to give' through any process of degradation of synaptic weights.
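The largest-inscribed-sphere characterization is the classical Chebyshev center of the constraint polyhedron. A minimal sketch locating it by grid search over a toy polyhedron (the unit square stands in for the memory-constraint polyhedron; a linear program would do the same job exactly):

```python
# Sketch: a `best memory' as the center of the largest sphere inscribed
# in a polyhedron {x : a_i . x <= b_i}.  The radius of the largest
# sphere centered at x is min_i (b_i - a_i . x)/|a_i|; we maximize it
# by a coarse grid search over a toy square polyhedron.
import math

A = [(-1.0, 0.0), (0.0, -1.0), (1.0, 0.0), (0.0, 1.0)]   # x>=0, y>=0, x<=1, y<=1
b = [0.0, 0.0, 1.0, 1.0]

def inscribed_radius(x):
    return min((bi - ai[0] * x[0] - ai[1] * x[1]) / math.hypot(*ai)
               for ai, bi in zip(A, b))

grid = [i / 100.0 for i in range(101)]
best = max(((x1, x2) for x1 in grid for x2 in grid), key=inscribed_radius)
print(best, inscribed_radius(best))   # (0.5, 0.5) 0.5
```

A weight setting at this center has `the most ground to give': every constraint is slack by at least the inscribed radius.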
6.3 Stability of recall dynamics

We would expect cortical memories to have basins of attraction relevant to each memory trace, so that noisy cues do give recall. This result is provided by the following stability considerations.
The recall dynamics is described by the sequence $(u^k(n), v^k(n))$, $n = 0, 1, \ldots$ We show that if $u^0(n)$ gets sufficiently close to the value $U^0$ of a feasible memory, then it converges at the next step. That is, $(u^0(n+1), v^0(n+1)) = (U^0, V^0)$. As we shall see, sufficiently close means: to within threshold of. Thus the recall dynamics is strongly stable at a feasible memory (a local and finite convergence result). Let

$$v^k(n) = V^k + dV^k(n),\qquad u^k(n) = U^k + dU^k(n)$$  (22)

Then using (1) we have

$$V_i^k + dV_i^k(n+1) = g\big(U_i^k + dU_i^k(n)\big)$$  (23)

from which follows, upon setting $k = 0$, that

$$v_l^0(i) + dV_i^0(n+1) = g\big(U_i^0 + dU_i^0(n)\big)$$

Suppose no feasibility constraint is active at $(u^k(n), v^k(n))$ (that is, $u_i^0(n) \ne \theta$). In particular, suppose that

$$\theta < u_i^0(n) = U_i^0 + dU_i^0(n),\qquad \forall i \text{ such that } v_l(i) = 1$$  (25)

Suppose further that $dU_i^0(n)$ is sufficiently small, in particular that

$$|dU_i^0(n)| < U_i^0 - \theta$$

Then $g(U_i^0 + dU_i^0(n)) = v_l^0(i)$. Combining this with (7), we have

$$dV_i^0(n+1) = 0$$  (24)

For the input-output at time $n+1$ we have, using (24),

$$U_i^0 + dU_i^0(n+1) = \big(W^{01} V^1 - \tilde W^{00} v_l^0 + \hat v_s\big)(i)$$

The right member here is $U_i^0$, by definition. Then $dU_i^0(n+1) = 0$. Then from (22) with $n$ replaced by $n+1$, we get $v^0(n+1) = v_l^0$.

6.4 Local convergence (differentiable g)

We consider an alternate local convergence demonstration, valid for the case of differentiable transfer functions $g$ whose derivative $g' > 0$. The positivity of $g'$ is a biologically feasible requirement, since biological neurons typically increase their output as the net depolarization due to synaptic inputs increases.
Using the notation in (19), we may write the input-output relation of a feasible memory,

$$U^0 = W^{01} g(U^1) - \tilde W^{00} v_l^0 + \hat v_s,\qquad U^1 = W^{10} g(U^0) - \tilde W^{11} V^1$$

as

$$X = W g(X) + \hat v_s$$

Comparing this with (19), we see that $x = X$ is an equilibrium point of the recall dynamics. Next, to show that the equilibrium point $x = X$ is an attractor, we introduce a Lyapunov function. Set

$$x_{n+1} = W g(x_n) + \hat v_s$$  (26)

and

$$\eta_n = g(x_n)$$  (27)

Next consider the function

$$H_n = -\frac{1}{2}\,\eta_n^T W \eta_n + \sum_i \int^{\eta_n(i)} g^{-1}(z)\,dz$$  (28)

where the sum is over all components $\eta_n(i)$ of $\eta_n$. To see that $H_n$ is a Lyapunov function, we require that $W$ be symmetric and that $W^{-1} > 0$. (Recall that a matrix is positive if all its entries are positive.) For details, see Appendix E.
6.5 Nonuniversality of recall
Our experience tells us that cortical memories give
wrong answers or no answers on occasion. These
features fit into our feasible memory model. Indeed the
memory state when a recall process is commenced
impacts the outcome. To show that the local result is the
best we can do, we give an example showing that the
recall dynamics of a feasible memory does not always
give the correct recall. Indeed, it may give none at all.
Consider the following two-layer, two-neuron net without lateral inhibition. We have

$$v^0(n+1) = g\big(w^{01} v^1(n) + x\big)$$  (29)
$$v^1(n+1) = g\big(w^{10} v^0(n)\big)$$

Corresponding to the two (input, output) pairs $(x = 1, V^0 = 1)$ and $(x = 0, V^0 = 0)$, we have

$$1 = g\big(w^{01} V^1 + 1\big) \quad\text{and}\quad 0 = g(0)$$

That is, we have the following (degenerate) constraint polyhedron:

$$w^{01} V^1 + 1 > \theta > 0$$  (30)
At time zero the state of the net outputs, $y(0) = (v^0(0), v^1(0))^T$, may be any vertex of the unit square [see (14)]. We show in the following table the net's response to the input cue, $x = 0$.

    y(0) \ (w01, w10)   (<θ, <θ)   (<θ, >θ)   (>θ, <θ)   (>θ, >θ)
    (0, 0)                  0          0          0          0
    (1, 0)                  1          2          1     (1,0) ↔ (0,1)
    (0, 1)                  1          1          2     (1,0) ↔ (0,1)
    (1, 1)                  1          2          2          1
                                                                 (31)

The values that the feasible weights $(w^{01}, w^{10})$ may have are indicated by the inequalities along the top row in (31); the initial values $y(0)$ are labelled in the column at the left. The net oscillates in the indicated two cases in the right-most column, $(1,0) \leftrightarrow (0,1)$. That is, the net gives no recall at all in these two cases. In all other cases, a recall is achieved in the number of cycles as displayed. In the lower right-hand corner, an incorrect recall $v^0 = 1$ occurs. All other recalls are correct $(v^0 = 0)$. In Appendix F we indicate how table (31) is derived.
Infomax. This special model net allows us to solve for the infomax values of the weights in closed form. Indeed, we see that the minimum value of $w^{01} + w^{10}$ subject to the constraint (30) (plus the non-negativity constraints $w^{01}, w^{10} \ge 0$) occurs at $w^{10} = 0$ and $w^{01} = (\theta - 1)/V^1$. For $V^1 = 1$, we see that these values $(w^{01}, w^{10}) = (\theta - 1, 0)$ lie in the first column of the table (31). From this we make the following enticing observations for this example: (i) the infomax net always gives a recall, which is moreover correct; (ii) the average time to recall is smallest for the infomax net. To what extent these properties prevail for general infomax nets (and their cortical implications) is a question we leave for later resolution.
6.6 Global convergence (differentiable gain functions)

We conclude with a second observation on differentiable transfer (or gain) functions. Suppose that $\max |g'| < c$ (a property of transfer functions that we expect to be valid in cortical neurons). Then we use the mean value theorem to derive a global convergence result for the recall dynamics. We use the notation in (19), and we set $\delta x = x - X$. Then if for some constant $q$, $c\|W\| \le q < 1$, we may show that $\lim_{n\to\infty} \delta x(n) = 0$. We refer to Appendix 9 for details.
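The contraction can be seen numerically. A sketch with the smooth gain g(u) = tanh(u/4), so that c = 1/4, and a weight matrix of unit norm, giving c‖W‖ = 1/4 < 1; the weights, cue, and starting point are illustrative:

```python
# Sketch of the global convergence result: with |g'| <= 1/4 and
# ||W|| = 1, the map x -> W g(x) + v_s is a contraction, so the recall
# iteration converges from any starting state.
import math

def g(u):                                  # smooth gain with |g'| <= 1/4
    return math.tanh(u / 4.0)

W = [[0.0, 1.0], [1.0, 0.0]]               # ||W|| = 1
v_s = [0.3, -0.1]                          # illustrative cue

x = [5.0, -7.0]                            # arbitrary (far) starting point
for _ in range(100):
    x = [sum(Wij * g(xj) for Wij, xj in zip(row, x)) + vi
         for row, vi in zip(W, v_s)]
x2 = [sum(Wij * g(xj) for Wij, xj in zip(row, x)) + vi
      for row, vi in zip(W, v_s)]
print(max(abs(a - b) for a, b in zip(x, x2)) < 1e-9)  # True: fixed point
```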
7 Computational implementation
Our numerical simulations were designed to generate
feasible solutions (i.e., feasible memories) and to study
the recall dynamics of the best such in the infomax sense
(see Sect. 5.7). We compute the range of feasible solutions
that emerges from the memory design. We then select the
most favorable solutions from an infomax perspective
and characterize the behavior of these networks during
recall. This characterization includes the dynamical
behavior when presented with external stimuli (noisy
cues), and a description of the accuracy of recall.
7.1 Model architecture
The basic architecture of a two-layer model is shown in Fig. 1. The input vector buffers exogenous stimulus values and allows them to be presented to the layer 0 for a fixed simulation epoch. Neuronal outputs are given by (1).
For visualization purposes, stimuli are generated as two-dimensional pixel patterns and converted to vectors of exogenous stimuli. As described in Sect. 5.2, activity patterns in layer 0 are viewed as the `memory' evoked by the stimulus. Thus, the `activity' of layer 0 can be displayed as a two-dimensional pixel array, a feature that was useful during exploratory simulations to determine whether a pattern had been successfully stored and recalled. For the experiments described below, each layer had four elements.
Fig. 2. Examples of the network trajectories observed during recall. Two are convergent, and two are oscillatory. Convergent trajectories could correspond to correct, incorrect, or spurious recalls. All other trajectories were classified as oscillatory
7.2 Infomax design and recall dynamics
As described above (see Sect. 5.7), the constraints (10) can be represented as a collection of linear programs [see (16)]. The experiments described below involved two overlapping patterns in layer 0 as `target patterns' ($P_1$ and $P_2$, say), and each was associated with two `noisy cues'. The pattern of activity in layer 1 that corresponded to each target and its associated noisy cues was specified. Using the linear program procedure in SAS (v6.09), we computed the solutions $W$ [see (10)] for all possible memories. The possible memories are indexed by the number of possible choices of $V_l^1(i)$, $i = 1, \ldots, N$; $l = 1, \ldots, p$. This number is $2^{pN}$, which equals 256 for our simulation. Of these, 225 states yielded feasible solutions.
To study recall dynamics, we applied the stimuli to the network and examined the sequence of network states that evolved over time. These were classified as either (a) convergent correct or convergent incorrect (if a steady-state dynamics was achieved); (b) spurious (if a steady state was reached that did not correspond to either of the design patterns); or (c) oscillatory (if a steady state was not achieved within a specified number of steps).
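The classification logic can be sketched as follows; the state sequences are illustrative, and `correct' here assumes the applied cue was associated with the first target pattern:

```python
# Sketch of the trajectory classification used in the simulations:
# convergent-correct / convergent-incorrect / spurious / oscillatory.
def classify(trajectory, targets):
    """trajectory: list of layer-0 states; targets: (P1, P2), P1 cued."""
    for n in range(1, len(trajectory)):
        if trajectory[n] == trajectory[n - 1]:          # steady state
            if trajectory[n] == targets[0]:
                return 'correct'
            if trajectory[n] == targets[1]:
                return 'incorrect'                      # confusional error
            return 'spurious'
    return 'oscillatory'                                # no steady state seen

P1, P2 = (1, 1, 0, 0), (0, 0, 1, 1)
print(classify([(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 0, 0)], (P1, P2)))  # correct
print(classify([(1, 0, 1, 0), (0, 1, 0, 1), (1, 0, 1, 0)], (P1, P2)))  # oscillatory
```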
Fig. 3. Distributions of recall states for all (225) feasible solutions. The three groups of columns reflect data sorted according to the applied stimulus, that is, all stimuli, or whether or not they were in the design set. Each block of columns contains data over all possible starting values of the network
Two classes of dynamical states were observed
(Fig. 2). Convergent states were those in which the
network eventually reached a stable state. Interestingly,
only a small number of unique states were seen. The
number of `correct' states might be interpreted as a
measure of the success of the network in recovering the
target state from a stimulus that consisted of either a
`noisy' cue or the target state itself. States that converged
to the other stored pattern were labelled as incorrect,
and might be viewed as `confusional errors'.
The remaining convergent states were considered
`spurious', since they did not correspond to any of the
patterns in the memory design. Presumably, the associated basins of attraction result from interference or
overlap effects between the stored patterns. Only a small
number of such states was observed, however. For example, no `zero' states were seen, or states in which all
the neurons in layer 0 were on. Similarly, a limited
number of unique oscillatory states were observed,
suggesting that only a few `limit cycle' attractors were
formed by the present design.
Examples of trajectories of each of these dynamics are given in Fig. 2. Distributions of the dynamics illustrated in Fig. 2 are shown for all feasible solutions in Fig. 3, for all best feasible solutions (objective value 2.53) in Fig. 4, and for all worst feasible solutions (objective value 4.51) in Fig. 5.
Fig. 4. Distributions of recall states for all best feasible
solutions. This is similar to Fig. 3, but data are selected only
from the four best feasible solutions (objective value 2.53).
Note that when design stimuli are applied to the network, the
number of correct recall states exceeds that of the other
categories
Fig. 5. Distributions of recall states for all worst feasible
solutions. This is similar to Fig. 4, but data are selected only
from the 17 worst feasible solutions (objective value 4.53).
The correct recall states are much fewer when compared with
those generated by the best feasible solutions (Fig. 4)
8 Discussion
The nature of learned representations, their distribution,
mechanisms of encoding, and access during recall have
been the subject of much theoretical and experimental
work. The model described herein begins to address
some of the systems-level architectures that may be
important in mammalian memory. Some of the issues
that our model can begin to address include: How does
plasticity at inhibitory synapses contribute to memory
storage? How does the dynamic interaction between
subcortical and cortical areas encode memory? How do
the different representations encoded at stages (layers) in the memory hierarchy contribute to the memory process, that is, to its formation and recall?
Role of inhibition. Numerous neural network studies
have examined the role of inhibitory connections in the
context of associative memory, and these generally
involve the use of inhibitory connections to generate
competitive interactions among processing elements
(Hertz et al. 1991; Haykin 1994). In contrast, Baird
and Eeckman (1993) used a constant local inhibitory
feedback to embed periodic attractors in a recurrent
network architecture. The principal role of the (fixed) inhibitory connections was to endow the system with oscillatory dynamics. In our model, the inhibitory weights do more than simply implement a competitive network. Since they are modifiable, they contribute towards memory dynamics in the same way that excitatory connections do. The existence of use-dependent plasticity in inhibitory systems has recently received some empirical support (Kano 1994). We view the modifiable intralayer inhibitory connections in our model as a mechanism for generating an effective
representation within the multilayer hierarchy. These
representations depend on the dynamic interaction
between neuronal activity within a layer and the signals
impinging on the layer from above and below.
The present model does not employ within-layer excitatory contacts, and some aspect of the intralayer code
could be implemented by such connections. Future
studies that introduce intralayer excitation will address
this possibility as well as the stability issues that arise
from the positive feedback that such connections introduce.
Recall dynamics. The sensitivity to initial conditions of
the recall dynamics was a surprising finding, especially
its degree. With the memory designs studied, recall is not
`perfect' but is a function of the dynamical history of the
network. This suggests a mechanism that could underlie
errors during recall of sequences. If we view the state of
the network at any instant as the background `context'
against which new stimuli are presented, then the ability
of the network to converge to a stored memory state
(given a previously associated stimulus) depends on this
context. Thus, convergence to a spurious attractor is
more likely if the network state happens to lie closer to
the spurious attractor than the desired memory at the
time the stimulus is presented. It may be that some type
of `priming' stimulus might increase the proportion of
correct recall at the expense of spurious or incorrect
states. Alternatively, we might view the starting state of
the network as `noise' that corrupts the stimulus, and
increases the likelihood of error in recall. A more
detailed analysis of the effect of initial state on recall
dynamics will be required to quantify the robustness of
the network against both extraneous and endogenous
noise.
A key question is how biological nets deal with this
initial state sensitivity. It may be that additional cascaded layers result in more robust recall dynamics, or
that some form of preprocessing (such as saccades or
attention) overcomes initial state sensitivity. Our future
studies will probe and illuminate such possibilities.
Comparison with existing architectures. A number of
other models incorporate, as we do, feedforward and
feedback weights, and modifiable lateral inhibitory
connections. For example, the Adaptive Resonance
Theory (ART) model (Grossberg 1987), the `wake-sleep'
algorithm (Hinton et al. 1995), and the Bidirectional
Associative Memory (BAM) architecture (Kosko 1992)
all employ bottom-up and top-down streams. Our model
differs from these in several ways. First, we place few
restrictions on the properties of the interlayer weights.
Unlike BAM, the feedforward and feedback weights are
independently specified. We also specify a common learning algorithm for all layers, unlike ART and wake-sleep, which use different training methods for the
bottom-up and top-down streams. Second, training of the feedforward and feedback weights proceeds simultaneously, unlike the phased training of those architectures. Learning is by means of a Hebb-like mechanism, in contrast to the winner-take-all algorithm used in ART. Indeed, this is a feature that our
model shares with networks that employ anti-Hebbian learning to decorrelate the output of a layer (e.g., Földiák 1990) or to maximize the mutual information transmission through the network (Plumbley 1990).
Implications for models of mammalian memory. A leading
systems-level model of mammalian memory is that of
Squire and Zola-Morgan (1991). In describing the role
of temporal lobe structures involved in memory, they
argue that neocortical structures support perceptual
processes as well as short-term memory. Projections
from the activated cortical regions enter medial temporal lobe structures (including perirhinal and entorhinal
cortex and hippocampus). They propose that (i) the
hippocampal areas are specialized for forming conjunctions, or associations between individual elements of the
sensory event; and (ii) these `bindings' are used for later
retrieval. Distributed activity in cortical networks may
represent aspects of the sensory world; in the case of
area TE, this may reflect, for example, visual object
quality. For this distributed activity to develop into a
stable long-term memory, activity must occur at the time
of learning along projections from these neocortical
regions to the medial temporal lobe. The pathways
involved include the parahippocampal gyrus, perirhinal
cortex, and entorhinal cortex.
Models of memory that have, as a central component,
a dynamic interaction between subcortical and cortical
areas have also been proposed by other workers
(Grossberg 1987; Rolls 1990; Miller 1991; Mumford
1994). It is generally believed that this interplay between
cortical regions is restricted to certain categories of associative memory, such as those involving integration
over space or some complex array of environmental
cues. An attractive feature of our model is that it incorporates these features (intralayer inhibition and reciprocally connected layers) in a form that is both
analytically tractable and extensible to larger and more
complex networks.
Future work. From the perspective of artificial neural
nets, our system would be termed a self-organizing,
cascaded, bidirectional, autoassociative memory, with
both excitatory and inhibitory connections. However,
since our objective herein was to obtain results and insight into the working of cortical memory, the performance of our model as an artificial neural net is not at all
in question. Nevertheless, the mathematical results and
techniques developed here will form the basis for future
studies of capacity and other traditional measures of
neural network performance.
Our simulations suggest that, with appropriately
chosen synaptic weights, a simple, multilayer recurrent
network displays a form of associative memory. Future
work will focus on determining which particular forms
of Hebbian algorithms can embed memories that can be
reliably recalled. In addition, it will be interesting to
explore the effects of scaling layer number and size, and
methods by which `priming' might increase the accuracy
of recall. Finally, we have assumed that memories are
stored as `fixed points' in the state space of the network,
and that the oscillatory states represent unwanted dynamics. Certain aspects of perceptual and mnemonic
processing appear to involve oscillations in ensembles of
neurons (Gray and Singer 1989). It may be that oscillatory memory states have better robustness and convergence properties than point attractors (Liljenstrom
and Wu 1995), and oscillatory dynamical behaviour of
our model might be exploited for this purpose.
Acknowledgement. The research reported here was supported by
the Neuroengineering and Neuroscience Center (NNC) at Yale
University.
Appendix A

In this appendix, we derive the bounds (13). From (6), we have

$$W(n) = s\sum_{j=1}^{n} H\big(g(u(n-j)), v(n-j)\big) + W(0)$$

Then multiplying this relation componentwise by $\bar W$ (the corresponding matrix of weights of a feasible memory), we have

$$\bar W\,W(n) \ge sn\,\bar W \min_j H\big(g(u(n-j)), v(n-j)\big) + \bar W\,W(0)$$  (A.1)

Also from (6), we deduce (componentwise) that

$$|W(n+1)|^2 - |W(n)|^2 = s^2 H^2 + 2sW(n)H$$  (A.2)

Then from (A.1) (componentwise), H0 and H1 in Sect. 5.3 imply (componentwise)

$$|W(n)| \ge snc - |W(0)|$$

From (A.2) and H2 we deduce that (componentwise)

$$|W(n+1)|^2 - |W(n)|^2 \le sM(s + 2jM)$$

Setting $j = n$ here and summing from $j = 0$ to $n - 1$, we get (componentwise)

$$|W(n)| \le \sqrt{n}\,\sqrt{Ms(s + 2nM)}$$
Appendix B

For a recurrence relation of the form (6), the multiscale theory (Miranker 1981) gives

$$W(n) = W_0(\sigma) + O(s),\qquad \sigma = sn$$

where $W_0(\sigma)$ obeys the differential equation

$$\frac{dW_0}{d\sigma} = \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} H\big(g(u(j)), v(j)\big)$$

The right member here is the average of $H$ as its arguments vary appropriately over vertices of the unit square [see (14)]. Calling this average $H_{av}$, we have

$$\frac{dW_0}{d\sigma} = H_{av}$$
so that

$$W(n) = W_0(0) + H_{av}\,\sigma + O(s),\qquad \sigma = sn$$
Appendix C

To develop the theorem stated in Sect. 5.5, we begin by writing the Hebbian dynamics for $n > n_0$ [see (13)] as follows:

$$0 = H(v(n), v(n)) = H^{ab}(v_l^a, v_l^b)$$  (C.1)

where the indices $a, b$ are vertices of the unit square [see (14)]. We ask when this relation is consistent with the feasibility constraints. To see this we expand $H^{ab}$ in (C.1) in a power series with remainder in terms of its two arguments. Doing this, we obtain

$$0 = A^{ab} + B^{ab} v_l^a + C^{ab} v_l^b + D^{ab} v_l^a v_l^b + R^{ab}$$  (C.2)

where

$$(A^{ab}, B^{ab}, C^{ab}, D^{ab}) = \Big(1,\ \frac{\partial}{\partial v_l^a},\ \frac{\partial}{\partial v_l^b},\ \frac{\partial^2}{\partial v_l^a\,\partial v_l^b}\Big) H^{ab}(v_l^a, v_l^b)\Big|_{v_l^a = v_l^b = 0}$$

For the remainder we have

$$R^{ab} = \frac{1}{2}\Big(\frac{\partial}{\partial v_l^a}\Big)^2 H^{ab}\Big|_{v_l^a = v_l^b = 0} (v_l^a)^2 + \frac{1}{2}\Big(\frac{\partial}{\partial v_l^b}\Big)^2 H^{ab}\Big|_{v_l^a = v_l^b = 0} (v_l^b)^2 + R_3,\qquad R_3 = O\big(|v^a|^3 + |v^b|^3\big)$$

Note the components of $v_l^a$ and $v_l^b$ take on the values 0 and 1 only, and so some care must be taken when the smallness of this remainder is needed. In fact we shall apply these considerations in the base case when $R^{ab} = 0$. Write the matricial equation (C.2) componentwise:

$$0 = A_{ij}^{ab} + B_{ij}^{ab} v_l^a(j) + C_{ij}^{ab} v_l^b(j) + D_{ij}^{ab} v_l^a(i) v_l^b(j) + R_{ij}^{ab}(l)$$  (C.3)

and write the vector of feasibility constraints [see (10)] also componentwise as follows:

$$\sum_j W^{k\tilde k}(i,j)\,v_l^{\tilde k}(j) - \sum_j \tilde W^{kk}(i,j)\,v_l^k(j) + \hat v_s(i) - \theta = s_l(i)$$  (C.4)

where $s_l = s_l(i)$ is a slack vector whose components obey the following constraints:

$$r_l(i)\,s_l(i) = 0,\qquad i = 1, \ldots, N$$  (C.5)

Now multiply (C.3) by a scaling factor $r_{ij}^{ab}$ and sum over $j$. We get

$$0 = \sum_j r_{ij}^{ab}\Big(A_{ij}^{ab} + B_{ij}^{ab} v_l^a(j) + C_{ij}^{ab} v_l^b(j) + D_{ij}^{ab} v_l^a(i) v_l^b(j) + R_{ij}^{ab}(l)\Big)$$  (C.6)

Now in (C.6) alternately set $(a, b)$ equal to an element of the set

$$\{(a,b)\} = \{(k,k),\ (k,\tilde k),\ (\tilde k, k),\ (\tilde k, \tilde k)\}$$

and add. We get

$$0 = \sum_j \sum_{(a,b)} r_{ij}^{ab}\Big(A_{ij}^{ab} + B_{ij}^{ab} v_l^a(j) + C_{ij}^{ab} v_l^b(j) + D_{ij}^{ab} v_l^a(i) v_l^b(j) + R_{ij}^{ab}(l)\Big)$$  (C.7)

Now comparing (C.4) and (C.7), we make the following assignments (with $\delta_{ij}$ the Kronecker delta): the terms multiplying $v_l^k(j)$ give

$$-\tilde W^{kk}(i,j)\,\delta_{ij} = \sum_l\Big(r_{il}^{kk} B_{il}^{kk} + r_{il}^{k\tilde k} B_{il}^{k\tilde k} + r_{ij}^{kk} C_{ij}^{kk} + r_{ij}^{\tilde k k} C_{ij}^{\tilde k k}\Big)$$  (C.8)

and the terms multiplying $v_l^{\tilde k}(j)$ give

$$W^{k\tilde k}(i,j)\,\delta_{ij} = \sum_l\Big(r_{il}^{\tilde k k} B_{il}^{\tilde k k} + r_{il}^{\tilde k\tilde k} B_{il}^{\tilde k\tilde k} + r_{ij}^{k\tilde k} C_{ij}^{k\tilde k} + r_{ij}^{\tilde k\tilde k} C_{ij}^{\tilde k\tilde k}\Big)$$  (C.9)

The remaining terms of (C.7) are matched with the slack, threshold and stimulus terms of (C.4) [(C.10)], giving

$$-s_l(i) + \theta - \hat v_s(i) = \sum_j \sum_{(a,b)} r_{ij}^{ab}\Big(A_{ij}^{ab} + D_{ij}^{ab} v_l^a(i) v_l^b(j) + R_{ij}^{ab}(l)\Big)$$  (C.11)

Assigning the $D$-terms to the stimulus, (C.11) becomes

$$-s_l(i) + \theta = \sum_j \sum_{(a,b)} r_{ij}^{ab}\Big(A_{ij}^{ab} + R_{ij}^{ab}(l)\Big)$$  (C.12)

Recall that the $s_l(i)$ obey the condition in (C.5). Setting $k = 0, 1$ alternately in (C.11), we get

$$0 = \sum_j\Big[r_{ij}^{00} D_{ij}^{00} v_l^0(i) v_l^0(j) + r_{ij}^{10} D_{ij}^{10} v_l^1(i) v_l^0(j) + r_{ij}^{01} D_{ij}^{01} v_l^0(i) v_l^1(j) + r_{ij}^{11} D_{ij}^{11} v_l^1(i) v_l^1(j)\Big]$$  (C.13)

and

$$\hat v_s(i) = \sum_j\Big[r_{ij}^{11} D_{ij}^{11} v_l^1(i) v_l^1(j) + r_{ij}^{10} D_{ij}^{10} v_l^1(i) v_l^0(j) + r_{ij}^{01} D_{ij}^{01} v_l^0(i) v_l^1(j) + r_{ij}^{00} D_{ij}^{00} v_l^0(i) v_l^0(j)\Big]$$  (C.14)
This is a set of equations for the $4N^2$ scaling factors $r_{ij}^{ab}$. There are two equations for each value of $i$, $s$ and $l$. Recalling that $|S_l|$ is the number of stimulus cues per pattern, we see there are $2N\sum_{l=1}^p |S_l|$ equations. Since the system (C.13), (C.14) is singular, the number of unknowns must exceed the number of equations. This gives the condition

$$\sum_{l=1}^p |S_l| < 2N$$

Each equation in (C.13), (C.14) has $4N$ unknowns. Additional conditions involving the patterns $v_l$ and the stimuli $v_s$ may be derived by applying the well-known conditions for the solution of linear systems of equations (e.g., Golub and van Loan 1989) to (C.13), (C.14). These take the form of restrictions on the collection of permissible noisy cues relative to the patterns. In the base case, these restrictions may be derived as follows.
We note that in this case [see (3)], $R^{ab} = 0$ and

$$(A^{ab}, B^{ab}, C^{ab}, D^{ab}) = (a_3^{ab}, a_1^{ab}, a_2^{ab}, a_0^{ab})$$

Then in this case the expressions (C.8), (C.9), (C.12), (C.13), (C.14) simplify somewhat. In particular (C.12) becomes

$$-s_l(i) + \theta = \sum_j \sum_{(a,b)} a_3^{ab}\, r_{ij}^{ab}$$

In particular, $D_{ij}^{ab}$ is independent of $i$. Take $r_{ij}^{ab}$ to be independent of $i$ also, and set

$$C = \Big(\sum_{(a,b)} v^a(i)\,v^b(j)\Big)$$

Thus finding a set of multipliers as a solution of the relevant linear systems here requires that $v_s \in$ eigenspace($C$) (an orthogonality condition: see Golub and van Loan 1989). That is, the admissible noisy cues must lie in the eigenspace of an appropriate matrix of output pattern correlations (cf. Haykin 1994, p. 200).
$$G^k = \big(M^k - \tilde M^k\big)\, h\big(v^k(n+1), 1\big)$$
Now the Hebbian hypothesis, concerning positive and negative reinforcement of synaptic strength, requires that $\operatorname{sig} h\big(v^k(n+1), 1\big) = 2v^k(n+1) - 1$. Then $G^k$ cannot be expected to be of one sign. Thus according to this approach, we must set $M^k = \tilde M^k$, that is, set $G^k = 0$, to obtain $\nabla^k \cdot F^k(W(n+1)) = 0$, the result we seek. Improved invariance results could be obtained by using a possible positivity of $F^k$ to allow some negativity in $G^k$.
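The conservation condition can be checked directly. In this sketch (the binary patterns and the Hebbian function $h$ are invented for illustration), the term $G^k$ vanishes exactly when the ones-counts of the two layers' patterns agree:

```python
# Conservation of feasibility: with H_ij^lm(x, y) = h(x, y), the update
# term reduces to G^k = (M^k - M~^k) h(v^k(n+1), 1) and vanishes iff the
# ones-counts of the two layers' patterns agree.  Patterns and h are
# invented illustration values.
def h(x, y):
    return x * y             # a simple Hebbian product rule (illustrative)

v_k  = [1, 0, 1, 1]          # layer-k pattern
v_kt = [0, 1, 1, 1]          # layer-k~ pattern, same number of ones

M  = sum(v_k)                # M^k:  number of ones in layer k
Mt = sum(v_kt)               # M~^k: number of ones in layer k~

Gk = [(M - Mt) * h(vi, 1) for vi in v_k]
assert M == Mt and Gk == [0, 0, 0, 0]
```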
Appendix E
To show that $H_n$ given in (28) is a Lyapunov function (under the conditions on $W$ stated in Sect. 6.4), first note that, by direct substitution of (20),
$$H_{n+1} - H_n = -\tfrac12\,(g_{n+1} + g_n)^T W (g_{n+1} - g_n) + \sum_i \int_{g_n(i)}^{g_{n+1}(i)} h^{-1}(z)\, dz$$
Next using the mean value theorem of the integral calculus, we get
$$H_{n+1} - H_n = -\tfrac12\,(g_{n+1} + g_n)^T W (g_{n+1} - g_n) + h^{-1}(\bar g)^T (g_{n+1} - g_n)$$
where $\bar g$ is the `intermediate value' occurring in that theorem. Let
$$\bar x = h^{-1}(\bar g)$$
Then using (26), (27), we find
$$H_{n+1} - H_n = -\tfrac12\,(g_{n+1} + g_n)^T W (g_{n+1} - g_n) + \bar x^T (g_{n+1} - g_n)$$
$$= -\Big[\tfrac12\big(h(x_{n+1}) + h(x_n)\big)^T W - \bar x^T\Big](g_{n+1} - g_n)$$
$$= -\tfrac12\big[(x_{n+2} - \bar x) + (x_{n+1} - \bar x)\big]^T W^{-1} \big[(x_{n+2} - \bar x) - (x_{n+1} - \bar x)\big]$$
$$= -\tfrac12\Big(\|x_{n+2} - \bar x\|_{W^{-1}}^2 - \|x_{n+1} - \bar x\|_{W^{-1}}^2\Big)$$
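The monotonicity this derivation establishes can be probed numerically. The sketch below is an illustration, not the paper's simulation: the symmetric positive-definite $W$ is invented and the gain is taken as $g = \tanh$, so $h^{-1} = \operatorname{atanh}$. It iterates $g_{n+1} = g(Wg_n)$ and checks that $H_n$ is non-increasing:

```python
import math

# Numeric probe of the Lyapunov property.  W (symmetric positive
# definite) and the gain g = tanh are invented illustration choices.
W = [[1.0, 0.3, 0.1],
     [0.3, 0.8, 0.2],
     [0.1, 0.2, 0.9]]
N = 3

def g(x):
    return [math.tanh(xi) for xi in x]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(N)) for i in range(N)]

def int_ginv(y):
    # closed form of  integral_0^y atanh(z) dz
    return y * math.atanh(y) + 0.5 * math.log(1.0 - y * y)

def H(gn):
    quad = -0.5 * sum(gn[i] * matvec(W, gn)[i] for i in range(N))
    return quad + sum(int_ginv(gi) for gi in gn)

gn = g([0.9, -0.4, 0.1])         # arbitrary start
vals = []
for _ in range(30):
    vals.append(H(gn))
    gn = g(matvec(W, gn))        # synchronous update g_{n+1} = g(W g_n)

monotone = all(b <= a + 1e-12 for a, b in zip(vals, vals[1:]))
print(monotone)
```

With a positive-definite $W$ the difference $H_{n+1} - H_n$ is bounded above by $-\tfrac12 (g_{n+1}-g_n)^T W (g_{n+1}-g_n) \le 0$, which the trajectory confirms.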
Appendix F
For the convenience of the reader, we give the derivation of the entry 0 in the first row, first column of (31) [the $(0,0)\cap({<}h, {<}h)$ entry]. The arguments for developing the remaining tabular entries are similar, albeit more ramified. Since the stimulus $x = 0$, the recall dynamics is given by
$$v^0(n+1) = g\big(w^{01} v^1(n)\big)$$
Appendix D
To establish a condition for the conservation of feasibility, we begin by noting that, as the expressions in (6) and (10) are linear in $W$, we have
$$\nabla^k \cdot F^k(W(n+1)) = \nabla^k \cdot F^k(W(n)) + s\, \nabla^k \cdot G^k$$
where the $N$-vector
$$G^k = -H^{kk}(v^k) + H^{k\tilde k}(v^{\tilde k})$$
Recall that $v_i^k(n) = v^k(i)$. Then for each pattern $P$, the $i$th component of $G$ is
$$G_i = \begin{cases} \displaystyle -\sum_{j\in P} H_{ij}^{00}\big(v_i^0(n+1), 1\big) + \sum_j H_{ij}^{01}\big(v_i^0(n+1), 1\big)\, v^1(j), & k = 0\\[6pt] \displaystyle -\sum_j H_{ij}^{10}\big(v_i^1(n+1), 1\big)\, v^0(j) + \sum_{j\in P} H_{ij}^{11}\big(v_i^1(n+1), 1\big), & k = 1 \end{cases}$$
Let $M_l^k$ denote the number of ones in the set $\{v_l^k(i),\ i = 1, \ldots, N\}$. Then the condition $M_l^k = \tilde M^k$ gives conservation of feasibility. If $H_{ij}^{lm}(x, y) = H_i^{lm}(x, y)$, that is, if all the Hebbian functions are independent of $j$, we may write
$$v^1(n+1) = g\big(w^{10} v^0(n)\big)$$
Since we are in the first row of the table, $v^0(0) = v^1(0) = 0$, so that
$$v^0(1) = g\big(w^{01} v^1(0)\big) = g(0)$$
$$v^1(1) = g\big(w^{10} v^0(0)\big) = g(0)$$
However, $g(0) = 0$ since the threshold $h$ is positive. That is, $v^0(1) = v^1(1) = 0$ as claimed.
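The argument above can be replayed in a few lines. In this sketch the cross-layer weights and the threshold are hypothetical values; with a positive threshold and zero stimulus, the zero state reproduces itself under the recall dynamics:

```python
# Recall dynamics for the first-row table entry: with zero stimulus and
# a positive threshold, the zero state is a fixed point.  The weights
# w01, w10 and the threshold theta are invented illustration values.
theta = 0.5                  # positive threshold

def g(u):
    return 1 if u >= theta else 0

w01, w10 = 0.8, 0.6          # cross-layer weights (hypothetical)
v0, v1 = 0, 0                # first row of the table: zero initial state
for n in range(5):
    v0, v1 = g(w01 * v1), g(w10 * v0)

assert (v0, v1) == (0, 0)    # v^0(n) = v^1(n) = 0 for all n, as claimed
```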
Appendix G
$$G_i^k = -M^k H_i^{kk}\big(v_i^k(n+1), 1\big) + \tilde M^k H_i^{k\tilde k}\big(v_j^{\tilde k}(n+1), 1\big)$$
Next suppose that $H_i^{lm}(x, y) = h(x, y)$. Then
To see that $\delta x \to 0$ under the hypothesis on $W$ given in Sect. 6.6, we subtract transfer (or gain) relations to get
$$v^k(n+1) - V^k = g\big(u^k(n)\big) - g\big(U^k\big) = g'\big(\bar U^k\big)\big(u^k(n) - U^k\big)$$
by the mean value theorem. Now setting $\delta v^k = v^k - V^k$, we may write this as [cf. (5b)]
$$\delta v^k(n+1) = g'\big(\bar U^k\big)\Big[\big(W^{kk} v^k + W^{k\tilde k} v^{\tilde k} + \kappa v^s\big) - \big(W^{kk} V^k + W^{k\tilde k} V^{\tilde k} + \kappa v^s\big)\Big]$$
Using the notation in (20), we may rewrite this as the following
vector relation:
$$\delta x(n+1) = g'\, W\, \delta x(n)$$
The result follows from this under the hypothesis on W given in
Sect. 6.6.
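The vector relation says the deviation $\delta x$ evolves linearly, so it contracts to zero whenever the gain-slope bound times the weight-matrix norm is below one. A pure-Python sketch with invented values:

```python
# delta_x(n+1) = g'(U) W delta_x(n): when sup|g'| * ||W|| < 1 the
# deviation from the memory contracts to zero.  W and the slope bound
# are invented illustration values.
W = [[0.4, 0.2],
     [0.1, 0.3]]             # infinity-norm 0.6 < 1
gprime_max = 1.0             # sup |g'| for g = tanh

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

dx = [1.0, -2.0]             # initial deviation delta_x(0)
for n in range(50):
    dx = [gprime_max * xi for xi in matvec(W, dx)]

assert max(abs(xi) for xi in dx) < 1e-10   # geometric contraction
```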
References
Amaral DG, Price JL (1984) Amygdalo-cortical projections in the monkey (Macaca fascicularis). J Comp Neurol 230:465–496
Baird B, Eeckman F (1993) A normal form projection algorithm for associative memory. In: Hassoun M (ed) Associative neural memories: theory and implementation. Oxford University Press, Oxford
Brown TH, Kairiss EW, Keenan CL (1990) Hebbian synapses: biophysical mechanisms and algorithms. Annu Rev Neurosci 13:475–511
Collingridge GL, Bliss TV (1995) Memories of NMDA receptors and LTP. Trends Neurosci 18:54–56
Dantzig GB (1963) Linear programming and extensions. Princeton University Press, Princeton, NJ
Felleman DJ, Van Essen DC (1991) Distributed hierarchical processing in the primate cerebral cortex. Cereb Cortex 1:1–47
Földiák P (1990) Forming sparse representations by anti-Hebbian learning. Biol Cybern 64:165–170
Golub GH, van Loan CF (1989) Matrix computations, 2nd edn. Johns Hopkins University Press, Baltimore
Gray C, Singer W (1989) Stimulus-specific neuronal oscillations in orientation columns of cat visual cortex. Proc Natl Acad Sci USA 86:1698–1702
Grossberg S (1987) Competitive learning: from interactive activation to adaptive resonance. Cogn Sci 11:23–63
Haykin S (1994) Neural networks: a comprehensive foundation. Macmillan, London
Hertz J, Krogh A, Palmer RG (1991) Introduction to the theory of neural computation. Addison-Wesley, Reading, Mass
Hinton GE, Dayan P, Frey BJ, Neal RM (1995) The wake-sleep algorithm for unsupervised neural networks. Science 268:1158–1161
Kano M (1994) Calcium-induced long-lasting potentiation of GABAergic currents in cerebellar Purkinje cells. Jpn J Physiol 44 [Suppl 2]:S131–S136
Kosko B (1992) Neural networks and fuzzy systems. Prentice-Hall, Englewood Cliffs, NJ
Liljenstrom H, Wu XB (1995) Noise-enhanced performance in a cortical associative memory model. Int J Neural Syst 6:19–29
Linsker R (1986) From basic network principles to neural architecture. Proc Natl Acad Sci USA 83:7508–7512
Miller R (1991) Cortico-hippocampal interplay and the representation of context in the brain. Springer, Berlin Heidelberg New York
Miranker WL (1981) Numerical methods for stiff equations. Reidel, Dordrecht
Mumford D (1994) Neuronal architectures for pattern-theoretic problems. In: Koch C, Davis JL (eds) Large scale neuronal theories of the brain. MIT Press, Cambridge, Mass
Plumbley MD (1993) Efficient information transfer in anti-Hebbian neural networks. Neural Networks 6:823–833
Roland PE, Friberg L (1985) Localization of cortical areas activated by thinking. J Neurophysiol 53:1219–1243
Rolls E (1990) Functions of neuronal networks in the hippocampus and of backprojections in the cerebral cortex in memory. In: McGaugh JL, Weinberger NM, Lynch G (eds) Brain organization and memory. Oxford University Press, Oxford
Squire LR, Zola-Morgan S (1991) The medial temporal lobe memory system. Science 253:1380–1386
Szentagothai J (1969) Architecture of the cerebral cortex. In: Jasper HH, Ward AA Jr, Pope A (eds) Basic mechanisms of the epilepsies. Little, Brown, Boston
Thomson AM, Deuchars J (1994) Temporal and spatial properties of local circuits in neocortex. Trends Neurosci 17:119–126
Traub RD, Jefferys JG (1994) Simulations of epileptiform activity in hippocampal CA3 region in vitro. Hippocampus 4:281–285
Willner B, Miranker WL, Lu C-P (1993) Self-organization of the locomotive oscillator. Biol Cybern 68:307–320
Wilson M, Bower JM (1992) Cortical oscillations and temporal interactions in a computer simulation of piriform cortex. J Neurophysiol 67:981–995
Zeki S, Shipp S (1988) The functional logic of cortical connections. Nature 335:311–317