

Biol. Cybern. 78, 277–292 (1998)

Cortical memory dynamics

Edward W. Kairiss¹,³, Willard L. Miranker²,³*

¹ Department of Psychology, Yale University, New Haven, Connecticut, USA
² Department of Computer Science, Yale University, Box 208205, New Haven, CT 06520-8205, USA
³ Neuroengineering and Neuroscience Center, Yale University, New Haven, Connecticut, USA

Received: 26 December 1995 / Accepted: 14 November 1997

Abstract. Biological memories have a number of unique features, including (1) hierarchical, reciprocally interacting layers, (2) lateral inhibitory interactions within layers, and (3) Hebbian synaptic modifications. We incorporate these key features into a mathematical and computational model in which we derive and study Hebbian learning dynamics and recall dynamics. Introducing the construct of a feasible memory (a memory that formally responds correctly to a specified collection of noisy cues that are known in advance), we study stability and convergence of the two kinds of dynamics by both analytical and computational methods. A conservation law for memory feasibility under Hebbian dynamics is derived. An infomax net is one whose synaptic weights resolve the most uncertainty about a neural input based on knowledge of the output. The infomax notion is described and is used to grade memories and memory performance. We characterize the recall dynamics of the most favorable solutions from an infomax perspective. This characterization includes the dynamical behavior when the net is presented with external stimuli (noisy cues) and a description of the accuracy of recall. The observed richness of dynamical behavior, such as its initial-state sensitivity, provides some hints for possible biological parallels to this model.

1 Introduction

The contemporary effort to understand the brain as a computational device faces a great challenge.
Correspondence to: W.L. Miranker (e-mail: miranker-willard@yale.edu, Tel.: +1-203-432-4671, Fax: +1-203-432-0593)
* Research Staff Member Emeritus, IBM Research Center, Yorktown Heights, New York, USA

Insights into the morphological and biophysical complexity of the nervous system, revealed by increasingly sophisticated methods over the last decade, have not been accompanied by equivalent insights into functional architecture. Among the many related questions are: What structural and biophysical details of neurons and their connections are relevant to the computations performed by them? What is the significance (to information processing) of the morphological and biophysical heterogeneity of neurons? Do the different circuit architectures seen throughout the brain subserve different computational functions? The goal of computational neuroscience has been to develop a theoretical framework that can integrate knowledge of structure, function, and information processing by neurons, thereby leading to a better understanding of the higher cognitive qualities of the brain. Computational studies of neural information processing generally take one of two forms. Biophysical models of neurons and networks attempt to embody molecular, cellular and circuit details of well-studied brain regions in models that accurately reflect experimental findings. This class of models serves as a valuable tool for gaining insight into the complex biophysical interactions that generate neural activity, and thus provides guidance for future experiments. On the other side, artificial neural networks rely on highly simplified abstractions of biological nervous systems to create models that have powerful computational properties but provide little connection with information processing in brains. The present work is an effort toward bridging the gap between biologically realistic models and abstract computational models. We design an artificial neural network that functions as an associative memory.
While its architecture is similar to that of many recurrent networks (such as Hopfield-Grossberg nets), it incorporates three constraints that we judge to be significant features of biological neural networks. With few exceptions, existing connectionist models do not attempt to cast these architectural features into a computational and analytical form. Such a model may be used for the study of a number of key issues, including capacity, scaling, speed, stability, consistency and the relationship between network dynamics and synaptic learning rules. We hope that a deeper understanding of the remarkable cognitive capabilities of the brain will also result.

2 Biological constraints on the computational model

We modify the standard recurrent architecture to incorporate three salient features of cortical memory systems. These are (1) hierarchical, reciprocally connected areas, (2) lateral inhibition, and (3) Hebbian plasticity.

Hierarchical organization. The overall organization of cortical areas suggests a hierarchical structure, with extensive reciprocal interactions between areas (Zeki and Shipp 1988; Felleman and Van Essen 1991). Thalamic input provides sensory information to `primary' sensory areas, which in turn project to `higher' cortical regions. As one example, consider the visual system. A striking characteristic of its organization is that any visual cortical area that projects to another visual area receives a reciprocal connection of equal or greater size. In most cases, these connections are believed to be excitatory. Similarly, the temporal lobe memory system consists of a cascaded series of hierarchical areas, with feedforward and feedback connections between them. Our model represents cortical areas as layers of processing elements.

Cortical inhibitory systems. Inhibitory interneurons are a ubiquitous feature of all cortical areas, and a powerful lateral inhibitory system exists in cortex (for example, Szentagothai 1969).
Although feedforward and feedback inhibitory schemes have been suggested, the most compelling evidence (Thomson and Deuchars 1994) suggests some form of surround inhibition. In this scheme, activity of a pyramidal neuron leads to inhibition (via an inhibitory interneuron) of neurons that are not nearest-neighbors. A system like this could implement a type of competitive interaction between groups of neurons or cortical columns. A key feature of cortical inhibition is that it is quite long-lasting compared with excitatory input, which may enhance its ability to create competitive interactions among sets of cortical pyramidal cells. Our model also incorporates recent evidence that inhibitory synapses are modifiable (Kano 1994) and thus may contribute to memory storage.

Synaptic learning rule. A popular hypothesis for the formation of memories involves a modification of the connection strengths at the synaptic contacts between neurons. Experimental studies have identified a use-dependent form of synaptic plasticity, called long-term potentiation (LTP) (Collingridge and Bliss 1995). It is a persistent increase in synaptic efficacy that can be quickly induced. Although there appear to be several forms of LTP, the one that has been studied most extensively resembles the synaptic mechanism commonly referred to as Hebbian modification (Brown et al. 1990). While it has been convincingly demonstrated that this form of use-dependent synaptic modification can be triggered at many different synapses, its precise role in learning remains uncertain. A Hebbian mechanism is one that might be particularly useful for unsupervised learning.

Dynamics of learning and recall. Relatively little is known concerning the patterns of neural activity that underlie memory storage.
One popular hypothesis is that during learning, long-term representations of sensory events are stored in a distributed fashion throughout cortical and subcortical areas, including those regions that are activated by the sensory event. This process may involve the simultaneous, widespread activation of multiple neocortical areas through thalamocortical and other pathways. These activity patterns are encoded as persistent memories through some form of synaptic modification, Hebbian or otherwise. Retrieval (or recall) involves a process of reactivating, in a similar spatial and temporal sequence, the originally activated cortical areas. This idea (Amaral and Price 1984; Rolls 1990) is lent credence by several observations. First, cortical areas receive extensive backprojections from succeeding cortical regions (Felleman and Van Essen 1991), such that activity in `higher' or `association' (i.e., more remote from sensory or thalamic input) cortical regions could reactivate the original patterns in `lower' or `primary' sensory areas. Second, functional imaging studies in humans (Roland and Friberg 1985) indicate that early cortical areas are activated during recall. Our model assumes that `readout' from memory takes place in these early cortical regions, but depends heavily on input signals as well as backprojections.

3 The present work

3.1 Models of cortical memory

Existing models of cortex (for example, Wilson and Bower 1992; Traub and Jefferys 1994) tend to focus on biophysical and network-level phenomena that are present in cortical circuits. While this class of models embodies selected aspects of biological realism, it is usually based on numerical representations of cortical neurons and is directed towards the simulation of a restricted range of biological phenomena. This class of model, therefore, is not well suited to the systematic exploration of the relationship between neural architecture and memory function.
Connectionist models, on the other hand, have not fully explored the diversity of architectural features (such as those described above) that are found in biological systems, and therefore contribute only indirectly to our understanding of cortical memory.

3.2 Goals of this study

Here we hope to abstract key features of cortical architecture and incorporate them into a neural network model that will allow us to explore how these features might contribute to the design of a biological memory system (learning and recall). Our approach has a number of advantages and novel features. First, we systematically examine the contributions of those biological features considered to be central to cortical associative memory (see Sect. 2). Specifically, these include Hebbian learning, interlayer connections, and intralayer connections in the form of lateral inhibition. To our knowledge, this is the first study to examine a model constrained by these key biological features by both mathematical analysis and numerical simulation. Experimental neuroscience provides the neural architectures and cognitive phenomena that guide and motivate the modelling and the mathematical analysis. In turn, the applied mathematics reveals conditions, properties and criteria that will direct and organize experimental strategies and theories associated with cortical memory.

4 The model

In this section we derive the neural net model accommodating the features described in Sect. 2. The model is a collection of modified McCulloch-Pitts neurons (with sigmoidal transfer functions canonical in this subject, e.g. Haykin 1994) arranged in layers with reciprocal excitation between layers and lateral inhibition within layers. We implement the Hebbian modification previously described by means of a dynamics, called Hebbian dynamics, for developing synaptic strength (see Sect. 4.2).
For clarity we restrict our attention here to the case of two layers and, for the most part, to the case of a neural transfer function that is a step function. We then introduce training, consisting of the application of exogenous stimulation representing input patterns to the network (see Sect. 5.1). The training drives the Hebbian dynamics, in turn generating learning. Following the descriptions of hierarchical organization, local cortical connectivity, cortical inhibition, synaptic learning, and mammalian memory architecture given in Sect. 2, we propose the following model, a layered network with Hebbian learning dynamics (Fig. 1). Cortical areas are represented as layers of modified McCulloch-Pitts neurons. In common with many existing neural network models, there are feedforward excitatory connections between layers. In addition, however, we introduce two salient features of cortical architecture: (1) modifiable recurrent excitatory connections that provide feedback from higher layers to lower layers, and (2) modifiable inhibitory intralayer synapses.

4.1 The network

Consider a network of McCulloch-Pitts neurons arranged in $l$ layers (Fig. 1). The (input, output) of neuron $i$ in layer $k$ is $(u^k_i, v^k_i)$, $i = 1, \ldots, N$, $k = 0, 1, \ldots, l-1$, where $N$ is the number of neurons in a layer. There is an additional exogenous input to layer zero only, and the value of this input to neuron $i$ is $u_i$, $i = 1, \ldots, N$. We employ the vector notation

$$u^k = (\ldots, u^k_i, \ldots)^T, \quad u = (\ldots, u_i, \ldots)^T, \quad v^k = (\ldots, v^k_i, \ldots)^T, \quad k = 0, 1, \ldots, l-1$$

Consecutively numbered layers are reciprocally interconnected via excitatory synapses. The synaptic strength of a connection from neuron $j$ in layer $m$ to neuron $i$ in layer $l$ is denoted by $w^{lm}_{ij}$. There is lateral inhibition, so that each neuron is connected via an inhibitory synapse to the neurons in its own layer. Note that if a connection is missing, that value of $w^{lm}_{ij}$ is zero.
The synaptic weights (excitatory or inhibitory) are real-valued, as are the neural inputs. We employ the matrix notation $W^{lm} = (w^{lm}_{ij})$, $i, j = 1, \ldots, N$, where the $l, m = 0, \ldots, l-1$ are either equal (the lateral inhibition) or are consecutive integers (the reciprocal excitation). Using $\pm$ signs to indicate excitation and inhibition and using $\delta$, the Kronecker delta, we write the input to layer $l$ as a composition of excitatory feedback from layer $l+1$, excitatory input from the previous layer $l-1$, lateral inhibition and a constant bias (exogenous input):

$$u^l = W^{l,l+1} v^{l+1} + W^{l,l-1} v^{l-1} - W^{ll} v^l + \delta_{l0}\, u$$

The output of layer $l$ is

$$v^l(t) = g(u^l(t-1)) \equiv (g(u^l_1(t-1)), \ldots, g(u^l_N(t-1)))^T \quad (1)$$

[Fig. 1. Two-layer model schematic. Only a subset of possible connections is illustrated. Excitatory connections are depicted with open arrows, inhibitory connectivity with filled arrows. Excitatory connectivity between the layers 0 and 1, and inhibitory connectivity within layers, are described by matrices W in Sect. 4.1]

The neural outputs may be real, integer, or binary, depending on the choice of the neural transfer function (the sigmoid g). Here g is the transfer function, `the sigmoid', of the neurons; g may be any one of a variety of sigmoidal functions. The simplest such depends on a threshold $\theta$, and is given by

$$g(z) = \frac{1}{2}\left(1 + \mathrm{sig}(z - \theta)\right) = \begin{cases} 1, & z \ge \theta \\ 0, & z < \theta \end{cases} \quad (2)$$

The sigmoidal transfer function of our processing elements is subject to the usual biological interpretation. That is, $g(z)$ may be interpreted to represent either the mean firing rate of a neuron, averaged over some interval, or the instantaneous interspike interval. When we refer to the binary behavior of our model (i.e., 0 or 1), this may be interpreted as either minimal and maximal firing rate, or, alternatively, the absence (0) or presence (1) of a spike during a small time interval. In its present form, the model is neutral with respect to the choice of interpretation.
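As a concrete illustration, the two-layer update (5a)-(5b) with the step-function sigmoid (2) can be sketched numerically. The layer size, threshold, and random weights below are illustrative choices, not values from the paper:

```python
import numpy as np

# Sketch of the layer update: layer 0 receives exogenous input u,
# each layer gets reciprocal excitation and lateral inhibition.
N, theta = 4, 0.5
rng = np.random.default_rng(0)

# Reciprocal excitatory weights W^{01}, W^{10}; lateral inhibition W^{00}, W^{11}
W01 = rng.uniform(0.0, 1.0, (N, N))
W10 = rng.uniform(0.0, 1.0, (N, N))
W00 = rng.uniform(0.0, 0.2, (N, N))
W11 = rng.uniform(0.0, 0.2, (N, N))

def g(z, theta=theta):
    """Step-function sigmoid (2): 1 if z >= theta, else 0."""
    return (z >= theta).astype(float)

def step(v0, v1, u):
    """One synchronous update of both layers.
    Layer 0 receives the exogenous input u; layer 1 does not."""
    u0 = W01 @ v1 - W00 @ v0 + u      # k = 0: exogenous term present
    u1 = W10 @ v0 - W11 @ v1          # k = 1: no exogenous input
    return g(u0), g(u1)

v0 = np.zeros(N)
v1 = np.zeros(N)
u = np.array([1.0, 1.0, 0.0, 0.0])   # a binary cue presented to layer 0
for _ in range(10):
    v0, v1 = step(v0, v1, u)
```

With the step-function g, both layer outputs stay binary, matching the {0, 1} interpretation in the text.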
Except as noted, we shall, for convenience and ease of presentation, hereafter restrict ourselves to this $g(z)$. We refer to such a layered network as a memory. The processes of storing information in such a memory and of recalling information stored in it will be introduced in Sect. 5 (Learning) and in Sect. 6 (Recall), respectively. We shall see that layer 0 functions as both the input to the memory during learning and the output during recall.

4.2 Hebbian dynamics

To introduce learning dynamics, we regard all variables as functions of time. Thus the synaptic dynamics is specified by the following matrix differential equations:

$$\frac{dW^{lm}}{dt} = H^{lm}(v^l(t), v^m(t-1))$$

(For convenience, we shall not hereafter indicate the ranges of the superscripts and subscripts when that range is clear from the context. Likewise we shall not indicate the temporal arguments $t$ or $t-1$ when confusion will not occur.) $H^{lm}$ is a matrix-valued (Hebbian) function to be specified: $H^{lm}(v^l, v^m) = (H^{lm}_{ij}(v^l, v^m)) \equiv (H^{lm}_{ij}(v^l_i, v^m_j))$. Consider the following form of H (Linsker 1986; Willner et al. 1993):

$$H^{lm}(x, y) = a^{lm}_0 xy + a^{lm}_1 x + a^{lm}_2 y + a^{lm}_3 \quad (3)$$

which we shall refer to as the base form. This form of H encapsulates our current understanding of the biological mechanisms underlying Hebbian plasticity in the nervous system. We shall use the base form of H for some of our analytical results and for all our computations.

For the case of two layers we simplify the notation, using the binary variables $k$ and $\tilde{k}$ ($= \mathrm{NOT}\ k$):

$$\frac{dW^{k\tilde{k}}}{dt} = H^{k\tilde{k}}(v^k, v^{\tilde{k}}), \quad \frac{dW^{kk}}{dt} = H^{kk}(v^k, v^k), \quad k = 0, 1 \quad (4)$$

and

$$u^k = W^{k\tilde{k}} v^{\tilde{k}} - W^{kk} v^k + \tilde{k} u \quad (5a)$$

$$v^k = g(u^k), \quad k = 0, 1 \quad (5b)$$

Suppressing superscripts for clarity, we may alternatively write the differential equations as a recurrence

$$W(n+1) - W(n) = sH(v(n+1), v(n)) \quad (6)$$

the form that is used in the simulations. Here $s$ is a scaling factor, and $n$ denotes the time clocked in cycles.

The growth of synaptic strength during the dynamics must be made to be bounded. We achieve this by introducing matrix-valued bounds, $W^{lm}_L$ and $W^{lm}_U$, so that¹ (Linsker 1986; Willner et al. 1993)

$$0 \le W^{lm}_L \le W^{lm} \le W^{lm}_U \quad (7)$$

This agrees with the biological observation that synaptic weights do not increase without bound. Alternative methods for bounding synaptic strength by means of damping are also available, such as Oja's rule and Sanger's rule (see Hertz et al. 1991). For convenience, we shall hereafter restrict ourselves to the case of two layers.

5 Learning

Learning in our layered cortical model is based on the description of learning in the mammalian memory architecture section of Sect. 2. In this section, we introduce the notion of a feasible memory to characterize convergence of the Hebbian dynamics (the learning). A feasible memory is one whose synaptic weights are such that the memory formally responds with the correct one of a specified set of patterns in response to a stimulus, the latter being one of an appropriate set of noisy cues. A feasible memory is a novel mathematical construct, and it is shown to be specified by a collection of degenerate linear programs. It is a memory that is consistent with a correct recall corresponding to each of a set of noisy cues, were they to be known in advance. A number of properties of such memories are developed to study the learning (and in Sect. 6, the recall) process. Next we carry over the techniques of reinforcement learning analysis to the present self-organizing memory protocol (see Sect. 5.3). This novel use of reinforcement learning techniques will have wide impact on the study of self-organizing memories. For example, using these techniques and the feasible memory construct, we develop convergence criteria for learning in a memory with Hebbian dynamics. As we shall see, correct recall is impossible without memory feasibility, so we next develop conditions for when the learning causes convergence to a feasible memory.
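A minimal numerical sketch of one step of the recurrence (6) with the base form (3), clipped to the bounds (7). The coefficients, the scaling factor, and the scalar stand-ins for the bound matrices are illustrative choices, not values from the paper:

```python
import numpy as np

# One Hebbian update W(n+1) = W(n) + s*H(v(n+1), v(n)), with
# H(x, y) = a0*x*y + a1*x + a2*y + a3 applied componentwise, then
# clipped to [W_L, W_U].  All constants are illustrative.
a0, a1, a2, a3 = 1.0, -0.1, -0.1, 0.0
s = 0.05                     # scaling factor in (6)
W_L, W_U = 0.0, 1.0          # scalar stand-ins for the bound matrices in (7)

def hebb(post, pre):
    """Base-form Hebbian matrix: entry (i, j) is H(post_i, pre_j)."""
    x = post[:, None]        # postsynaptic output, one per row
    y = pre[None, :]         # presynaptic output, one per column
    return a0 * x * y + a1 * x + a2 * y + a3

def update(W, post, pre):
    """W(n+1) = clip(W(n) + s*H(post, pre), W_L, W_U)."""
    return np.clip(W + s * hebb(post, pre), W_L, W_U)

W = np.full((3, 3), 0.5)
post = np.array([1.0, 0.0, 1.0])
pre = np.array([1.0, 1.0, 0.0])
W = update(W, post, pre)
```

Jointly active pairs (post = pre = 1) strengthen their weight (here by s times a0 + a1 + a2 + a3 = 0.04), while singly active pairs weaken slightly, and the clip enforces the biological boundedness observation.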
It is of interest to know whether a memory, once feasible, stays feasible when subjected to further learning (further Hebbian dynamics). For this purpose the conservation of feasibility under Hebbian dynamics is then addressed, and conditions for it are obtained. Since, as it turns out, there is a wide choice of feasible memories, indeed a polyhedron-full in synaptic weight space, we are motivated to grade these memories for performance, capacity, etc. To do this we employ the principle of maximum information preservation (infomax) to specify a notion of optimality (in an information theoretic sense) among feasible memories. The mutual information for a neuron is the uncertainty concerning its input that is resolved given its output. An infomax memory is one whose synaptic weights are chosen so as to maximize this uncertainty resolution.

¹ We use the conventional mathematical notation for relations between arrays (matrices, vectors, etc.). For example, a matrix is positive if all its entries are positive. The symbol 0 will denote an array all of whose entries are 0, no matter what the dimension of the array, the last being clear from the context.

5.1 Training

A memory is trained by exposing it to patterns $v$. Each such pattern corresponds to a subset $P$ of the integers $\{1, \ldots, N\}$, the subset indexing which of the exogenous inputs are on when $v$ is presented, that is, which of the components $u_i = 1$. Each pattern is thus a characteristic function $v$, where

$$v(i) = \begin{cases} 1, & i \in P \\ 0, & i \notin P \end{cases}, \quad i = 1, \ldots, N$$

We shall also use the spin variable $\sigma \equiv 2v - 1$. Exposure to a pattern means to set the exogenous input $u = v$ in the Hebbian dynamics (4) and (5). Since $v(i) \in \{0, 1\}$ for each $i$, $v \in \{0, 1\}^N$. We now introduce the notion of a feasible memory. It is a mathematical construct by means of which a number of key properties of memories, for both learning and recall, will be obtained. For example, in Sect.
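The pattern encoding of Sect. 5.1 can be sketched directly; N and the subset P below are illustrative choices:

```python
import numpy as np

# A pattern is the characteristic function of an index subset P of
# {1, ..., N} (0-based here), and the spin variable is 2*pattern - 1.
N = 6
P = {0, 2, 3}
pattern = np.array([1.0 if i in P else 0.0 for i in range(N)])
sigma = 2 * pattern - 1    # +1 where the exogenous input is on, -1 where off
```

The spin form is what enters the feasibility inequalities: it flips the sign of the alignment condition exactly for the neurons that are intended not to fire.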
6.2 we shall see that correct recall is impossible without memory feasibility.

5.2 Feasible memories

The response of a memory is $v^0$, the output of layer zero, provided that output is stationary. A feasible memory (the terminology coming from linear programming) is one that satisfies a certain collection of static alignment conditions between input and response. In particular, we specify conditions for the weights that correspond to a memory which has as output (that is, which does recall) each of a specified collection of patterns $\{v_\mu\}$, $\mu = 1, \ldots, p$. Recall of the pattern $v_\mu$ is in response to any member of an associated set of input stimuli $S_\mu$, $\mu = 1, \ldots, p$. The input stimuli are to be thought of as noisy versions of the pattern. (Each input stimulus, that is, each element of $S_\mu$, takes values in $\{0, 1\}^N$.) We would usually expect $v_\mu$ itself to be one of the stimuli in $S_\mu$, and for definiteness, we take this to be the case. The $S_\mu$ could alternatively be chosen as disjoint sets of positive integers, where each $S_\mu$ indexes the set of noisy cues intended to cause the memory to give the output pattern $v_\mu$, that is, the output $v^0 = v_\mu$. Let $s_\mu = |S_\mu|$, $\mu = 1, \ldots, p$ be the number of noisy cues in $S_\mu$. We call such a memory a feasible memory. Let $W^{lm}, U^l, V^m$ denote the weights, inputs and outputs, respectively, of a feasible memory. Then

$$U^k = W^{k\tilde{k}}(\tilde{k} V^{\tilde{k}} + k v_\mu) - W^{kk}(k V^k + \tilde{k} v_\mu) + \tilde{k} v_s, \quad v_s \in S_\mu \quad (8)$$

$$k V^k + \tilde{k} v_\mu = g(U^k), \quad k = 0, 1, \quad \mu = 1, \ldots, p \quad (9)$$

specify the feasibility [cf. (5b)]. That is, (8) and (9) contain the statements that the output of layer 0 is the pattern $v_\mu$ when the exogenous input to layer 0 is the noisy cue $v_s \in S_\mu$. The reader can check this: replace $(U, V, W)$ in (8) by $(u, v, W)$. Then set $k = 0, 1$. The result is (5a) for $k = 0, 1$. Similarly, (9) leads to (5b). If the output of layer 1 is also specified, say as $V^1 = v^1_\mu$, then we also have $g(U^1_i) = v^1_\mu(i)$.
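A small sketch of the feasibility test: for each layer and each cue, the spin-aligned net input must be nonnegative (strictly positive for neurons meant to stay off). The weight blocks, threshold, and patterns below are toy values chosen for illustration, not from the paper:

```python
import numpy as np

theta = 0.5

def feasible(W, v0, v1, cues, theta=theta):
    """Check the spin-aligned conditions for one stored pattern pair
    (v0, v1) against every noisy cue in `cues`.
    W is a dict of the four weight blocks keyed '00', '01', '10', '11'."""
    for k, (vk, vkt, exo) in enumerate([(v0, v1, True), (v1, v0, False)]):
        sigma = 2 * vk - 1
        for vs in cues:
            # net input: lateral inhibition, cross-layer excitation,
            # exogenous cue (layer 0 only), minus the threshold
            net = (-W[f"{k}{k}"] @ vk + W[f"{k}{1 - k}"] @ vkt
                   + (vs if exo else 0.0) - theta)
            aligned = sigma * net
            # neurons meant to stay off (sigma = -1) need strict inequality
            if np.any(aligned < 0) or np.any((sigma < 0) & np.isclose(aligned, 0)):
                return False
    return True

# A trivially feasible toy memory: identity excitation, no inhibition.
I2 = np.eye(2)
W = {"00": np.zeros((2, 2)), "11": np.zeros((2, 2)), "01": I2, "10": I2}
v0 = np.array([1.0, 0.0])
v1 = np.array([1.0, 0.0])
ok = feasible(W, v0, v1, cues=[v0])
```

Note that the test is purely static: it asks whether the weights are consistent with the desired recalls, and says nothing about whether the recall dynamics will actually reach them.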
If vl ˆ v1l , a kind of resonance, (8) and (9) simplify to Uk ˆ W k k~ kk ÿ W †vl ‡ vs vl ˆ g U k † We shall use the notation vl  v0l and the notation V 1  v1l whether or not the latter is speci®ed. The restriction V 1  v1l is, in fact, a loss of generality since V 1 , the output of layer 1, could be di€erent for each input stimulus vs . We make this restriction for clarity and convenience, deferring treatment of the fully general case. Memory feasibility is also speci®ed by systems of linear inequalities (that de®ne a polyhedron in W-space) obtained by suitably combining (8) and (9) and using the spin variable rkl (ˆ 2vkl ÿ 1) annotated by pattern and layer number. Indeed a formal de®nition of a feasible memory is the following: DEFINITION. A memory is called feasible if the following de®nition holds for the input v, the weights W and threshold h: ~ kk kk ~ ~ s ÿ h†  0; rkl á F k W †  rkl á ÿW vkl ‡ W vkl ‡ kv k ˆ 0; 1; s 2 Sl ; l ˆ 1; . . . ; p 10† Here the bold dot (á) indicates that the multiplication is componentwise. We shall not necessarily repeat the bold dot, since this componentwise product, involving the spin, is clear from the context. The quantity F is de®ned in this relation as the indicated abbreviation. The asterisk indicates that  is to be replaced by > for those i for which the spin variable r0l i† ˆ ÿ1. That is, for each l and s the replacement is made for those neurons in layer zero that are intended not to ®re. For ~ the case of resonance, we simply set vkl ˆ vkl here. Note k that F is an N -vector. h indicates the vector of the appropriate dimension all of whose components equal h. We shall write rkl F k (in which the product is taken componentwise) as rk F k or as rk F for convenience as needed. We call W the set of solutions W of (10). That is, W is the totality of feasible memories. Setting2 2 Note that here and throughout we have standard mathematical usage. 
Namely, bold type is not used to specify the dimension of an array, that is, whether it is a vector or a matrix. The dimension of an array is indicated by the context. 282 00 rl ˆ r0l ; r1l †T ; T v^s ˆ vs ; 0† ; Wˆ and ÿW 10 W Xl ˆ W 01 ÿW 11 ! ; 11† v0l ; v1l †T vl n†; vm n ÿ 1†† 2 f 0; 0†; 1; 0†; 0; 1†; 1; 1†g (10) may be written more compactly as rl WXl ‡ v^s ÿ h†  0; s 2 Sl ; l ˆ 1; . . . ; p 12† A feasible memory is de®ned by these static constraints and not by dynamic behavior. Thus a feasible memory is consistent with the requirement to give a speci®ed set of recalls, each to a di€erent speci®ed collection of cues. We shall see in Sect. 6.2 that memory feasibility is a necessary condition for correct recall, but it is not sucient. Indeed, in a framework wherein a feasible memory is stimulated and then an appropriate recall dynamics is implemented, the speci®ed (correct) recall may or may not occur. Indeed no recall is possible, the entire process being initial-state-sensitive. This behavior is compatible with the operation of recurrent neural nets generally. It is also consistent with our own behavioral experience. 5.3 Convergence of the Hebbian dynamics (the training) We may borrow techniques of the analysis of the perceptron training algorithm to develop a convergence result for the Hebbian dynamics. This is a novel use of reinforcement learning techniques, carrying them over to the study of self-organizing memory dynamics. To establish convergence, we introduce the following three hypotheses.3 (For convenience, we omit the superscripts on W and H.) A0 : W > 0 A1: W á H g uk n ÿ j††; vk n ÿ j††  c W ; for some c > 0 A2: H 2  M 2 ; W  jM 2 ; A0 is an additional constraint to be imposed for memory feasibility. The arguments of H (componentwise) are the vertices of the unit square [cf. (2)]. 
In particular for constants M and j > 0 14† Then A1 will be valid if the following condition holds: W H  minfH 0; 0†; H 1; 0†; H 0; 1†; H 1; 1†g W or in the base case (3), if W H  minfa3 ; a2 ‡ a3 ; a1 ‡ a3 ; a0 ‡ a1 ‡ a2 ‡ a3 g W Indeed, the constant c in A1 could be chosen to be the value of the minimum in these equations. The four candidates for this minimum are not all positive. However, as a practical observation based on simulations (not reported in this work), we noted that the sequence of values of H that appears during the training dynamics, while not a positive sequence, is positive on the average. Thus a suitable modi®cation of the perceptron algorithm argument that incorporates averaging is suggested and is developed in the next section. A2 is easily satis®ed since W 2 ‰WL ; WU Š [see (7)]. 5.4 Averaging We appeal to the asymptotic multiscale theory of recurrences to develop an averaging approach for the training algorithm. While an averaged convergence result is weaker (in the mathematical sense) than the one just obtained in Sect. 5.3, the averaging will allow us to drop A0, A1 and any associated biological constraints. For a recurrence relation of the form (6), this multiscale theory (Miranker 1981) gives W ˆ s Hav n ‡ O 1†† With these hypotheses, we derive the following lower bound (13a) and upper bound (13b) for W n† where J is the matrix of ones (see Appendix A for details). These three hypotheses are not biologically realistic, and so in the next section on averaging, we show how to deduce the convergence result without recourse to them. That is, in Sect. 5.4 we obtain convergence of the Hebbian dynamics in a biologically realistic context. Here, Hav is a matrix de®ned in Appendix B where other details of this derivation may also be found. This is the projected averaging result, and it enables us to deduce the needed lower bound for W n† above without recourse to A1. 
Nor for that matter is A0 needed since division by W (componentwise) as performed in Appendix A is now out of the picture. jW n†j  sncJ ÿ jW 0†j 13a† 5.5 Convergence to a feasible memory 13b† Notice that at n ˆ 0 the two bounds are compatible. However, for n suciently large, the bounds cross and a contradiction results. This implies that W n† converges in ®nitely many time steps. We shall refer to this ®nite number of steps as n0 . Since feasibility is an indispensable memory requirement, we give an approach to answering the question: If convergent, when does the training result in a feasible memory? In the following theorem, a condition for this is given in terms of an appropriate matrix C, a correlation matrix of the patterns (details are given in Appendix C): 3 In the following and throughout, we have standard mathematical usage. Namely, the absolute value of a matrix (or of any array) denotes the matrix (or array) of absolute values. THEOREM. A necessary condition for training to result in a feasible memory is that the admissible cues lie in the jW n†j  p p nM s s ‡ 2jM†J 283 eigenspace of C ˆ P a b ! v i†v j† . Here the a; b† f a;b†g vary over the unit square [cf. (14)], and the relevant v vary over the output patterns. 5.6 Conservation of feasibility We ask whether the property of memory feasibility is invariant under Hebbian dynamics. One biological implication of conservation of feasibility (under Hebbian dynamics) is that once a memory has learned a collection of patterns and is in fact a feasible memory, further learning will not destroy already acquired essential (for recall) memory properties. We now develop conditions for which r  F W n††  0 implies that r  F W n ‡ 1††  0 [which is the formal mathematical statement of conservation of feasibility, cf. (10)]. That is, we develop conditions for when the set W of feasible memories is invariant under the Hebbian dynamics. 
Let M_l^k denote the number of ones in the set {v_l^k(i), i = 1, …, N}. Then the condition

M_l^k = M̃^k   (15)

(that is, the quantities M_l^k are independent of l) gives conservation of feasibility. See Appendix D for details.

5.7 Infomax

Among the feasible memories W, we should like to choose those which are optimal, and the principle of maximum information preservation (see Linsker 1986; Haykin 1994) furnishes a means for doing so. We use the well-known concept of an infomax net (Linsker 1986; Haykin 1994) – one where the synaptic weights resolve the most uncertainty about a neural input based on knowledge of the output. By means of a closed-form example to be given in Sect. 6 on recall, we shall see that the infomax technique offers a way to grade memories by performance, with clear biological implications. As is well known, the mutual information for a neuron is the uncertainty concerning its input that is resolved given its output. Let w_i, i = 1, 2, … be the weights of the input synapses, and let x_i be the corresponding inputs. The mutual information, denoted I(y; x), where y is the output of a neuron and Σ_i w_i x_i is the input to that neuron, is given by (Haykin 1994)

I(y; x) = −log Σ_i w_i + constant

Although we do not restrict our network to pure Gaussian input, by minimizing the sum of all the synaptic weights in our memory network (that is, a global infomax) we apply the information-theoretic criterion above to the design of our network. In particular, by performing the minimization

min Σ_{i,j,l,m} w_{ij}^{lm}

subject to the memory constraints (10), we can specify a best global memory in an information-theoretic sense. The optimization problem is a collection of linear programs (Dantzig 1963) indexed by the choices of v_l^1. We call a solution of such a linear program W_I, and the collection of such solutions we call 𝒲_I. For linear programming (to be pursued computationally in Sect.
7), we write the constraints (10) in the canonical form

A(m,n) Z(n) ≥ B(m)   (16)

where

Z(n) = W_{ij}^{ab}
A(m,n) = ∓ r_l^a(i) v_l^b(j), the sign according as a = b or a ≠ b
B(m) = −r_l^a(i)(κ̃ v_s[l](i) − h)

Here, v_s[l] denotes the noisy cue v_s, where the l stresses that the cue lies in the collection S_l; v_s[l](j) denotes the jth component (input) of that cue. Setting

(s̄, ā, ī, b̄, j̄) = (s̄, l̄, N, l̄, N)

where⁴ s̄ = max_l |S_l|, the relation between the new indices m and n and the former ones is

m = m(i, a, l, s) = i ā l̄ s̄ + a l̄ s̄ + l s̄ + s,  0 ≤ i ≤ ī, 0 ≤ a ≤ ā, 0 ≤ l ≤ l̄, 0 ≤ s ≤ s̄

and

n = n(j, b, i, a) = j b̄ ī ā + b ī ā + i ā + a,  0 ≤ j ≤ j̄, 0 ≤ b ≤ b̄

In the case that the |S_l| are different, we simply make each |S_l| = s̄ by augmenting each S_l with redundant cues as necessary. Thus the order of the matrix A is m(ī−1, b̄−1, l̄−1, s̄) × n(j̄−1, b̄−1, ī−1, ā).

⁴ Recall (cf. Sect. 5.2) that |S_l| denotes the number of noisy cues in S_l, l = 1, …, p.

5.8 Open questions

When do the Hebbian dynamics achieve a global infomax memory in 𝒲_I? Is the set 𝒲_I invariant under the Hebbian dynamics? That is, if an infomax memory is once achieved, is the infomax property maintained under further learning? Does the set 𝒲_I ever consist of one point? That is, when is the infomax memory unique?

6 Recall

The process of recall is directly modelled on the biological description given in the architecture section of Sect. 1. As a preliminary simplification, we assume that the weights are frozen during recall. We believe that this weight freezing is not mandatory, but a formal demonstration of this is yet to be done. Such a demonstration might proceed along the lines of the conservation of feasibility property described in Sect. 5.6. Layer 0 serves as both the input layer during learning and the output layer during recall (as described in Sect. 2; cf. Roland and Friberg 1985).
Input stimuli are presented to layer 0, and the network is allowed to relax (via the recall dynamics, to be introduced), reaching a recall that is the output of layer 0. This resultant output state may or may not be a correct recall, depending on the initial state of the memory when it is first exposed to an input stimulus. In this section, the indispensability of memory feasibility is demonstrated by showing that feasibility is a necessary condition for correct recall. Further, we show that the recall dynamics is strongly stable at a feasible memory (see Sect. 6.3). This implies that each of the different memory records defines a basin of attraction. A notion of best memory results from this analysis: a best memory is specified by a geometric condition characterizing memories that have the largest basins of attraction, on average. Lyapunov and averaging techniques are then used to obtain a local convergence result for the recall dynamics (see Sect. 6.4). This tells us that if the recall process gets sufficiently close to a retrieval, that retrieval in fact occurs. This results in a characterization of appropriate neuron gain functions. A closed-form example (of our neural network model) is then given (see Sect. 6.5) to show that the recall dynamics applied to a feasible memory does not always give the correct recall. Indeed, it may give none at all, since the entire process is highly initial-state sensitive. The closed-form example also exposes a number of possible memory properties, including how the notion of infomax may be used to grade the recall process itself. Some conjectures dealing with the superiority in performance (in terms of speed and consistency) of an infomax memory are suggested by this example. Finally, we develop a number of mathematical properties of the recall process: stability, local convergence, and global convergence.
The global convergence result for recall is developed for the case of smooth neuron transfer functions, and this further characterizes such functions.

6.1 Recall dynamics

The recall dynamics consists of freezing the synaptic weights (that is, setting the right member of (4) to zero), setting the exogenous input u = v_s (the stimulus), and rewriting (5b) as the recurrence

u^k(n) = W^{k k̃} v^{k̃}(n) − W^{kk} v^k(n) + κ̃ v_s^k,  v^k(n+1) = g(u^k(n)),  n = 0, 1, …   (17)

Anticipating the theorem giving the necessity of feasibility in Sect. 6.2, let us suppose that the memory is feasible, and let us place a bar on the W that appears in (17). Then we combine (17) with the feasibility constraints (8) to obtain (for k = 0 and 1)

u^0(n) − U^0 = W̄^{01}(v^1 − V^1) − W̄^{00}(v^0 − v_l^0)
u^1(n) − U^1 = W̄^{10}(v^0 − v_l^0) − W̄^{11}(v^1 − V^1)

Similarly,

v^0(n+1) − v_l^0 = g(u^0(n)) − g(U^0)
v^1(n+1) − V^1 = g(u^1(n)) − g(U^1)

Then, setting x = (u^0, u^1)ᵀ, y = (v^0, v^1)ᵀ, X = (U^0, U^1)ᵀ, Y = (v_l^0, V^1)ᵀ, the recall dynamics of a feasible neural net may be written as

y(n+1) = g(x(n))   (18)

with

x(n) = X + W̄(y(n) − Y) = W̄ y(n) + v̂_s,  n = 0, 1, …   (19)

the last equality following from X − W̄Y = v̂_s, by definition. The vector of initial neuronal outputs y(0), a vertex of the unit square, must be known independently; it plays a critical role in recall, as we shall see in (31). Note that we have placed a bar on the W appearing in (19). We do this to confine our attention to feasible memories, since we shall see in Sect. 6.2 that memory feasibility is a necessary condition for recall.

6.2 Memory recall

Corresponding to an input u, a memory is said to produce the recall v if the sequence v^0(n) generated by (18) and (19) is finitely convergent to v. While this makes no demand on the limiting behavior of v^1(n), in order to obtain some of the results to follow we shall also require that v^1(n) be finitely convergent as well, so that the recall state of the entire memory net is stationary. We use the notation, u →
v, to denote this, namely, that the memory gives the response v to the cue u. The following theorem allows us to confine our attention to feasible nets:

THEOREM. A necessary condition for the recall dynamics to give v_s → v_l, s ∈ S_l, l = 1, …, p, is that the memory be feasible [that is, that the weights of the memory obey the feasibility constraints (10)].

PROOF. We rewrite the recall dynamics (17) as

u^k(n) = P^k(n) + p^k(n)  and  v^k(n+1) = g(u^k(n))

Here P^k(n) collects the terms involving v^{k̃}(n), the cue, and the stored pattern,

P^k(n) = W̄^{k k̃}(κ v^{k̃}(n) + κ̃ v_l^0) − W̄^{kk}(κ v^k(n) + κ̃ v_l^0) + κ̃ v_s(n)   (20)

and

p^k(n) = κ̃(W̄^{k k̃} − W̄^{kk})(v^0(n) − v_l^0)   (21)

This may be checked by direct substitution. By hypothesis, v^0(n) = v_l and v^1(n) = V^1, say, for n > n₀ sufficiently large [cf. Sect. 5.5]. Then also for n > n₀ we have from (21) that p^k(n) = 0, and from (20) that u^k(n) = P^k(n) ≡ U^k, say. Then

U^k = W̄^{k k̃}(κ V^1 + κ̃ v_l^0) − W̄^{kk}(κ V^1 + κ̃ v_l^0) + κ̃ v_s

and

(v_l^0, V^1)ᵀ = (g(U^0), g(U^1))ᵀ

The last two equations are the feasibility constraints in the form (8) and (9), which demonstrates the theorem.

Note: An open question concerns the case of recall when the synaptic weights are allowed to continue development. In particular, under what conditions does a feasible memory stay feasible during such an active recall process? If feasibility is conserved under the Hebbian dynamics (cf. Sect. 5.6), then the theorem here allows us to eliminate the need to freeze the weights W after training.

Best memory. The hypothesis concerning constraint inactivity, as it is used here, suggests an alternative to the optimal memory described previously in the context of infomax. A `best memory' may be defined as one for which the constraints are `the most inactive'. Then a best memory corresponds to the case when the polyhedron of feasible memories contains the largest possible inscribed sphere, and where the weights W̄ of that memory are at the center of that sphere. Such a memory would be the most robust biologically. That is, it would have `the most ground to give' through any process of degradation of the synaptic weights.

6.3 Stability of recall dynamics

We would expect cortical memories to have basins of attraction relevant to each memory trace, so that noisy cues do give recall. This result is provided by the following stability considerations. The recall dynamics is described by the sequence (u^k(n), v^k(n)), n = 0, 1, …. We show that if u^0(n) gets sufficiently close to the value U^0 of a feasible memory, then it converges at the next step; that is, (u^0(n+1), v^0(n+1)) = (U^0, V^0). As we shall see, sufficiently close means: to within threshold of. Thus the recall dynamics is strongly stable at a feasible memory (a local and finite convergence result). Let

v^k(n) = V^k + dV^k(n),  u^k(n) = U^k + dU^k(n)   (22)

Then using (18) we have

V_i^k + dV_i^k(n+1) = g(U_i^k + dU_i^k(n))

from which follows, upon setting k = 0, that

v_l^0(i) + dV_i^0(n+1) = g(U_i^0 + dU_i^0(n))   (23)

Suppose that no feasibility constraint is active at (u^k(n), v^k(n)) (that is, u_i^0(n) ≠ h). In particular, suppose that

h < u_i^0(n) = U_i^0 + dU_i^0(n)   (25)

for all i such that v_l(i) = 1, and suppose further that dU_i^0(n) is sufficiently small. Then g(U_i^0 + dU_i^0(n)) = v_l^0(i). Combining this with (23), we have

dV_i^0(n+1) = 0   (24)

For the input-output relation at time n+1, we have

U_i^0 + dU_i^0(n+1) = W̄^{01}(V_i^1 + dV_i^1(n+1)) − W̄^{00}(V_i^0 + dV_i^0(n+1)) + v_l^0 = W̄^{01} V_i^1 − W̄^{00} V_i^0 + v_l^0

by (24). The right member here is U_i^0, by definition. Then dU_i^0(n+1) = 0, and from (22), with n replaced by n+1, we get v^0(n+1) = v_l^0.

6.4 Local convergence (differentiable g)

We consider an alternate local convergence demonstration, valid for the case of differentiable transfer functions g whose derivative g′ > 0. The positivity of g′ is a biologically feasible requirement, since biological neurons typically increase their output as the net depolarization due to synaptic inputs increases. Using the notation in (19), we may write the input-output relation of a feasible memory,

U^0 = W̄^{01} g(U^1) − W̄^{00} v_l^0 + v_s
U^1 = W̄^{10} g(U^0) − W̄^{11} V^1

as

X = W̄ g(X) + v̂_s

Comparing this with (19), we see that x = X is an equilibrium point of the recall dynamics. Next, to show that the equilibrium point x = X is an attractor, we introduce a Lyapunov function. Set

W̄ h(x) = W̄ g(X) + v̂_s   (26)

and

γ = h(x)   (27)

Next consider the function

H_n = −½ γ_nᵀ W̄ γ_n + Σ_i ∫^{γ_n(i)} h⁻¹(z) dz   (28)

where the sum is over all components γ_n(i) of γ_n. To see that H_n is a Lyapunov function, we require that W̄ be symmetric and that W̄⁻¹ > 0. (Recall that a matrix is positive if all its entries are positive.) For details, see Appendix E.

6.5 Nonuniversality of recall

Our experience tells us that cortical memories give wrong answers, or no answers, on occasion. These features fit into our feasible memory model. Indeed, the memory state when a recall process is commenced impacts the outcome. To show that the local result is the best we can do, we give an example showing that the recall dynamics of a feasible memory does not always give the correct recall; indeed, it may give none at all. Consider the following two-layer, two-neuron net without lateral inhibition. We have

v^0(n+1) = g(w^{01} v^1(n) + x)
v^1(n+1) = g(w^{10} v^0(n))   (29)

Corresponding to the two (input, output) pairs (x = 1, V^0 = 1) and (x = 0, V^0 = 0), we have

1 = g(w^{01} V^1 + 1)  and  0 = g(0)

That is, we have the following (degenerate) constraint polyhedron:

w^{01} V^1 + 1 > h > 0   (30)

At time zero the state of the net outputs, y(0) = (v^0(0), v^1(0))ᵀ, may be any vertex of the unit square [see (14)]. The following table shows the net's response to the input cue x = 0:

                w01<h, w10<h   w01<h, w10>h   w01>h, w10<h   w01>h, w10>h
y(0) = (0,0)         0              0              0              0
y(0) = (1,0)         1              2              1         (1,0) <-> (0,1)
y(0) = (0,1)         1              1              2         (0,1) <-> (1,0)
y(0) = (1,1)         1              2              2              1
                                                                        (31)

The values that the feasible weights (w^{01}, w^{10}) may have are indicated by the inequalities along the top row of (31); the initial values y(0) are labelled in the column at the left. The net oscillates in the two cases indicated in the right-most column ((1,0) <-> (0,1)). That is, the net gives no recall at all in these two cases. In all other cases, a recall is achieved in the number of cycles displayed. In the lower right-hand corner, an incorrect recall (v^0 = 1) occurs. All other recalls are correct (v^0 = 0). In Appendix F we indicate how table (31) is derived.

Infomax. This special model net allows us to solve for the infomax values of the weights in closed form. Indeed, we see that the minimum value of w^{01} + w^{10}, subject to the constraint (30) plus the non-negativity constraints w^{01}, w^{10} ≥ 0, occurs at w^{10} = 0 and w^{01} = (h−1)/V^1. For V^1 = 1, these values (w^{01}, w^{10}) = (h−1, 0) lie in the first column of table (31). From this we make the following enticing observations for this example: (i) the infomax net always gives a recall, which is moreover correct; (ii) the average time to recall is smallest for the infomax net. To what extent these properties prevail for general infomax nets (and their cortical implications) is a question we leave for later resolution.

6.6 Global convergence (differentiable gain functions)

We conclude with a second observation on differentiable transfer (or gain) functions. Suppose that max|g′| < c (a property of transfer functions that we expect to be valid in cortical neurons). Then we use the mean value theorem to derive a global convergence result for the recall dynamics. We use the notation in (19), and we set dx = x − X. Then if, for some constant q, c‖W̄‖ ≤ q < 1, we may show that lim_{n→∞} dx(n) = 0. We refer to Appendix G for details.

7 Computational implementation

Our numerical simulations were designed to generate feasible solutions (i.e., feasible memories) and to study the recall dynamics of the best of these in the infomax sense (see Sect. 5.7). We compute the range of feasible solutions that emerges from the memory design. We then select the most favorable solutions from an infomax perspective and characterize the behavior of these networks during recall. This characterization includes the dynamical behavior when the networks are presented with external stimuli (noisy cues), and a description of the accuracy of recall.

7.1 Model architecture

The basic architecture of the two-layer model is shown in Fig. 1. The input vector buffers exogenous stimulus values and allows them to be presented to layer 0 for a fixed simulation epoch. Neuronal outputs are given by (1). For visualization purposes, stimuli are generated as two-dimensional pixel patterns and converted to vectors of exogenous stimuli. As described in Sect. 5.2, activity patterns in layer 0 are viewed as the `memory' evoked by the stimulus. Thus, the `activity' of layer 0 can be displayed as a two-dimensional pixel array - a feature that was useful during exploratory simulations for determining whether a pattern had been successfully stored and recalled. For the experiments described below, each layer had four elements.

Fig. 2. Examples of the network trajectories observed during recall. Two are convergent, and two are oscillatory. Convergent trajectories could correspond to correct, incorrect, or spurious recalls. All other trajectories were classified as oscillatory

7.2 Infomax design and recall dynamics

As described above (see Sect. 5.7), the constraints (10) can be represented as a collection of linear programs [see (16)]. The experiments described below involved two overlapping patterns in layer 0 as `target patterns' (P1 and P2, say), and each was associated with two `noisy cues'. The pattern of activity in layer 1 that corresponded to each target and its associated noisy cues was specified. Using the linear programming procedure in SAS (v6.09), we computed the solutions W [see (10)] for all possible memories.
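This linear-programming step can be sketched in miniature (a hypothetical one-neuron design with assumed threshold and margin values, using scipy's `linprog` in place of the SAS procedure; the actual programs built from the constraints are much larger):

```python
import numpy as np
from scipy.optimize import linprog

# Toy infomax design (assumed numbers): find the smallest-total-weight
# w = (w1, w2) >= 0 such that the cue (1,1) drives the input above the
# threshold h while the cue (1,0) stays below it, with a margin eps.
h, eps = 0.5, 0.1
c = np.ones(2)                    # objective: minimize w1 + w2 (global infomax)
A_ub = np.array([[-1.0, -1.0],    # -(w1 + w2) <= -(h + eps)   (must fire)
                 [ 1.0,  0.0]])   #   w1        <=  h - eps    (must stay silent)
b_ub = np.array([-(h + eps), h - eps])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
print(res.x, res.fun)             # minimal total weight is h + eps = 0.6
```

Enumerating the possible layer-1 patterns and solving one such program per choice mimics, in miniature, the sweep over all candidate memories described next.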
The possible memories are indexed by the possible choices of v_l^1(i), i = 1, …, N, l = 1, …, p. This number is 2^{pN}, which equals 256 for our simulation. Of these, 225 states yielded feasible solutions. To study the recall dynamics, we applied the stimuli to the network and examined the sequence of network states that evolved over time. These were classified as (a) convergent correct or convergent incorrect (if a steady-state dynamics was achieved); (b) spurious (if a steady state was reached that did not correspond to either of the design patterns); or (c) oscillatory (if a steady state was not achieved within a specified number of steps).

Fig. 3. Distributions of recall states for all (225) feasible solutions. The three groups of columns reflect data sorted according to the applied stimulus, that is, all stimuli, or whether or not they were in the design set. Each block of columns contains data over all possible starting values of the network

Two classes of dynamical states were observed (Fig. 2). Convergent states were those in which the network eventually reached a stable state. Interestingly, only a small number of unique states were seen. The number of `correct' states might be interpreted as a measure of the success of the network in recovering the target state from a stimulus that consisted of either a `noisy' cue or the target state itself. States that converged to the other stored pattern were labelled incorrect and might be viewed as `confusional errors'. The remaining convergent states were considered `spurious', since they did not correspond to any of the patterns in the memory design. Presumably, the associated basins of attraction result from interference or overlap effects between the stored patterns. Only a small number of such states was observed, however. For example, no `zero' states were seen, nor states in which all the neurons in layer 0 were on.
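The trajectory classification above can be illustrated with the two-neuron example net (29) of Sect. 6.5 (threshold and weight values assumed for illustration): iterating the map and testing for a repeated state separates convergent from oscillatory recalls.

```python
h = 0.5                                  # assumed threshold
g = lambda u: float(u > h)               # hard-limiting gain, as in the example

def classify(w01, w10, x, y0, max_steps=20):
    """Iterate the two-neuron net (29): v0' = g(w01*v1 + x), v1' = g(w10*v0).
    Returns ('convergent', state) if a fixed point is reached, otherwise
    ('oscillatory', state) when no fixed point appears within max_steps."""
    state = (float(y0[0]), float(y0[1]))
    for _ in range(max_steps):
        nxt = (g(w01 * state[1] + x), g(w10 * state[0]))
        if nxt == state:                 # unchanged state -> fixed point
            return 'convergent', nxt
        state = nxt
    return 'oscillatory', state

# With both weights above threshold (right-most column of table (31)) and
# cue x = 0, the start (1,0) cycles with (0,1); the start (0,0) is a fixed point.
print(classify(1.0, 1.0, 0.0, (1, 0))[0])   # oscillatory
print(classify(1.0, 1.0, 0.0, (0, 0))[0])   # convergent
```

The same kind of repeated-state test, with a step budget in place of an analytic criterion, is what the classification of simulated trajectories amounts to.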
Similarly, a limited number of unique oscillatory states were observed, suggesting that only a few `limit cycle' attractors were formed by the present design. Examples of trajectories of each of these dynamics are given in Fig. 2. Distributions of the dynamics illustrated in Fig. 2 are shown for all feasible solutions in Fig. 3, for all best feasible solutions (objective = 2.53) in Fig. 4, and for all worst feasible solutions (objective = 4.51) in Fig. 5.

Fig. 4. Distributions of recall states for all best feasible solutions. This is similar to Fig. 3, but data are selected only from the four best feasible solutions (objective value = 2.53). Note that when design stimuli are applied to the network, the number of correct recall states exceeds that of the other categories

Fig. 5. Distributions of recall states for all worst feasible solutions. This is similar to Fig. 4, but data are selected only from the 17 worst feasible solutions (objective value = 4.53). The correct recall states are much fewer when compared with those generated by the best feasible solutions (Fig. 4)

8 Discussion

The nature of learned representations, their distribution, mechanisms of encoding, and access during recall have been the subject of much theoretical and experimental work. The model described herein begins to address some of the systems-level architectures that may be important in mammalian memory. Some of the issues that our model can begin to address include: How does plasticity at inhibitory synapses contribute to memory storage? How does the dynamic interaction between subcortical and cortical areas encode memory? How do the different representations encoded at the stages (layers) of the memory hierarchy contribute to memory formation and recall?

Role of inhibition.
Numerous neural network studies have examined the role of inhibitory connections in the context of associative memory; these generally use inhibitory connections to generate competitive interactions among processing elements (Hertz et al. 1991; Haykin 1994). In contrast, Baird and Eeckman (1993) used a constant local inhibitory feedback to embed periodic attractors in a recurrent network architecture. The principal role of the (fixed) inhibitory connections was to endow the system with oscillatory dynamics. In our model, the inhibitory weights do more than simply implement a competitive network. Since they are modifiable, they contribute to the memory dynamics in the same way that the excitatory connections do. The existence of use-dependent plasticity in inhibitory systems has recently received some empirical support (Kano 1994). We view the modifiable intralayer inhibitory connections in our model as a mechanism for generating an effective representation within the multilayer hierarchy. These representations depend on the dynamic interaction between neuronal activity within a layer and the signals impinging on the layer from above and below. The present model does not employ within-layer excitatory contacts, and some aspects of the intralayer code could be implemented by such connections. Future studies that introduce intralayer excitation will address this possibility, as well as the stability issues that arise from the positive feedback that such connections introduce.

Recall dynamics. The sensitivity of the recall dynamics to initial conditions was a surprising finding, especially in its degree. With the memory designs studied, recall is not `perfect' but is a function of the dynamical history of the network. This suggests a mechanism that could underlie errors during recall of sequences.
If we view the state of the network at any instant as the background `context' against which new stimuli are presented, then the ability of the network to converge to a stored memory state (given a previously associated stimulus) depends on this context. Thus, convergence to a spurious attractor is more likely if the network state happens to lie closer to the spurious attractor than to the desired memory at the time the stimulus is presented. It may be that some type of `priming' stimulus could increase the proportion of correct recalls at the expense of spurious or incorrect states. Alternatively, we might view the starting state of the network as `noise' that corrupts the stimulus and increases the likelihood of error in recall. A more detailed analysis of the effect of the initial state on recall dynamics will be required to quantify the robustness of the network against both extraneous and endogenous noise. A key question is how biological nets deal with this initial-state sensitivity. It may be that additional cascaded layers result in more robust recall dynamics, or that some form of preprocessing (such as saccades or attention) overcomes initial-state sensitivity. Our future studies will probe and illuminate such possibilities.

Comparison with existing architectures. A number of other models incorporate, as we do, feedforward and feedback weights and modifiable lateral inhibitory connections. For example, the Adaptive Resonance Theory (ART) model (Grossberg 1987), the `wake-sleep' algorithm (Hinton et al. 1995), and the Bidirectional Associative Memory (BAM) architecture (Kosko 1992) all employ bottom-up and top-down streams. Our model differs from these in several ways. First, we place few restrictions on the properties of the interlayer weights. Unlike BAM, the feedforward and feedback weights are independently specified.
We also specify a common learning algorithm for all layers, unlike ART and wake-sleep, which use different training methods for the bottom-up and top-down streams. Second, training of the feedforward and feedback weights proceeds simultaneously, unlike the phased training of those models. Third, learning proceeds by means of a Hebb-like mechanism, in contrast to the winner-take-all algorithm used in ART. Indeed, this is a feature that our model shares with networks that employ anti-Hebbian learning to decorrelate the output of a layer (e.g., Földiák 1990) or to maximize mutual information transmission through the network (Plumbley 1990).

Implications for models of mammalian memory. A leading systems-level model of mammalian memory is that of Squire and Zola-Morgan (1991). In describing the role of temporal lobe structures involved in memory, they argue that neocortical structures support perceptual processes as well as short-term memory. Projections from the activated cortical regions enter medial temporal lobe structures (including the perirhinal and entorhinal cortex and the hippocampus). They propose that (i) the hippocampal areas are specialized for forming conjunctions, or associations, between individual elements of the sensory event; and (ii) these `bindings' are used for later retrieval. Distributed activity in cortical networks may represent aspects of the sensory world; in the case of area TE, this may reflect, for example, visual object quality. For this distributed activity to develop into a stable long-term memory, activity must occur at the time of learning along projections from these neocortical regions to the medial temporal lobe. The pathways involved include the parahippocampal gyrus, perirhinal cortex, and entorhinal cortex. Models of memory that have, as a central component, a dynamic interaction between subcortical and cortical areas have also been proposed by other workers (Grossberg 1987; Rolls 1990; Miller 1991; Mumford 1994).
It is generally believed that this interplay between cortical regions is restricted to certain categories of associative memory, such as those involving integration over space or some complex array of environmental cues. An attractive feature of our model is that it incorporates these features (intralayer inhibition and reciprocally connected layers) in a form that is both analytically tractable and extensible to larger and more complex networks.

Future work. From the perspective of artificial neural nets, our system would be termed a self-organizing, cascaded, bidirectional, autoassociative memory with both excitatory and inhibitory connections. However, since our objective herein was to obtain results and insight into the workings of cortical memory, the performance of our model as an artificial neural net is not at issue here. Nevertheless, the mathematical results and techniques developed here will form the basis for future studies of capacity and other traditional measures of neural network performance. Our simulations suggest that, with appropriately chosen synaptic weights, a simple multilayer recurrent network displays a form of associative memory. Future work will focus on determining which particular forms of Hebbian algorithms can embed memories that can be reliably recalled. In addition, it will be interesting to explore the effects of scaling layer number and size, and methods by which `priming' might increase the accuracy of recall. Finally, we have assumed that memories are stored as `fixed points' in the state space of the network and that the oscillatory states represent unwanted dynamics. Certain aspects of perceptual and mnemonic processing appear to involve oscillations in ensembles of neurons (Gray and Singer 1989). It may be that oscillatory memory states have better robustness and convergence properties than point attractors (Liljenström and Wu 1995), and the oscillatory dynamical behaviour of our model might be exploited for this purpose.
Acknowledgement. The research reported here was supported by the Neuroengineering and Neuroscience Center (NNC) at Yale University.

Appendix A

In this appendix, we derive the bounds (13). From (6), we have

W(n) = s \sum_{j=1}^{n} H(g(u(n-j), v(n-j))) + W(0)

Then, multiplying this relation componentwise by W (the corresponding matrix of weights of a feasible memory), we have

W\,W(n) \ge s\,n\,W \min_j H(g(u(n-j), v(n-j))) + W\,W(0)   (A.1)

Also from (6), we deduce (componentwise) that

|W(n+1)|^2 - |W(n)|^2 = s^2 H^2 + 2 s\,W(n) H   (A.2)

Then from (A.1), H0 and H1 in Sect. 5.3 imply (componentwise)

|W(n)| \ge s\,n\,c - |W(0)|

From (A.2) and H2 we deduce that (componentwise)

|W(n+1)|^2 - |W(n)|^2 \le s (s + 2|M|) M^2

Replacing n by j here and summing from j = 0 to n-1, we get (componentwise)

|W(n)| \le \sqrt{n}\, M \sqrt{s (s + 2|M|)}

Appendix B

For a recurrence relation of the form (6), the multiscale theory (Miranker 1981) gives

W(n) = W_0(\tau) + O(s), \qquad \tau = s n

where W_0(\tau) obeys the differential equation

\frac{dW_0}{d\tau} = \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} H(g(u(j), v(j)))

The right member here is the average of H as its arguments vary appropriately over vertices of the unit square [see (14)]. Calling this average H_{av}, we have

\frac{dW_0}{d\tau} = H_{av}

so that

W = W_0(0) + H_{av}\,\tau = s (H_{av}\, n + O(1))

Appendix C

To develop the theorem stated in Sect. 5.5, we begin by writing the Hebbian dynamics for n > n_0 (13) as follows:

0 = H(v(n), v(n))   (C.1)

or, componentwise,

0 = H^{ab}(v_l^a, v_l^b)

where the indices (a, b) are vertices of the unit square [see (14)]. We ask when this relation is consistent with the feasibility constraints. To see this we expand H^{ab} in (C.1) in a power series with remainder in terms of its two arguments. Doing this, we obtain

0 = A^{ab} + B^{ab} v_l^a + C^{ab} v_l^b + D^{ab} v_l^a v_l^b + R^{ab}(l)   (C.2)

where

(A^{ab}, B^{ab}, C^{ab}, D^{ab}) = \left(1,\ \frac{\partial}{\partial v_l^a},\ \frac{\partial}{\partial v_l^b},\ \frac{\partial^2}{\partial v_l^a \partial v_l^b}\right) H^{ab}(v_l^a, v_l^b)\Big|_{v_l^a = v_l^b = 0}

For the remainder R^{ab}(l), we have

R^{ab}(l) = \frac{1}{2}\left[\left(\frac{\partial}{\partial v_l^a}\right)^2 H^{ab}(v_l^a, v_l^b)\right]_{v_l^a = v_l^b = 0} (v^a)^2 + \frac{1}{2}\left[\left(\frac{\partial}{\partial v_l^b}\right)^2 H^{ab}(v_l^a, v_l^b)\right]_{v_l^a = v_l^b = 0} (v^b)^2 + R^3

where R^3 = O((v_l^a)^3 + (v_l^b)^3). Note that the components of v_l^a and v_l^b take on the values 0 and 1 only, and so some care must be taken when the smallness of this remainder is needed. In fact we shall apply these considerations in the base case, when R^{ab} \equiv 0.

Write the matricial equation (C.2) componentwise:

0 = A_{ij}^{ab} + B_{ij}^{ab} v_l^a(i) + C_{ij}^{ab} v_l^b(j) + D_{ij}^{ab} v_l^a(i) v_l^b(j) + R_{ij}^{ab}(l)   (C.3)

and write the vector of feasibility constraints [see (10)] also componentwise as follows:

-s_l(i) = h - k_v v^s(i) - \sum_j W^{kk}(i,j)\, v_l^k(j) + \sum_j W^{k\tilde{k}}(i,j)\, v_l^{\tilde{k}}(j)   (C.4)

where s_l = (s_l(i)) is a slack vector whose components obey the following constraints:

r_l(i)\, s_l(i) \ge 0, \qquad i = 1, \ldots, N   (C.5)

Now multiply (C.3) by a scaling factor r_{ij}^{ab} and sum over j. We get (using the Kronecker delta)

0 = \sum_j r_{ij}^{ab}\left(A_{ij}^{ab} + D_{ij}^{ab} v_l^a(i) v_l^b(j) + R_{ij}^{ab}(l)\right) + \sum_j \delta_{ij}\, r_{il}^{ab} B_{il}^{ab}\, v_l^a(j) + \sum_j r_{ij}^{ab} C_{ij}^{ab}\, v_l^b(j)   (C.6)

Now in (C.6) alternately set (a, b) equal to an element of the set \{(a,b)\}, where

\{(a,b)\} = \{(k,k),\ (k,\tilde{k}),\ (\tilde{k},k),\ (\tilde{k},\tilde{k})\}

and add. We get

0 = \sum_{\{(a,b)\}} \sum_j r_{ij}^{ab}\left(A_{ij}^{ab} + D_{ij}^{ab} v_l^a(i) v_l^b(j) + R_{ij}^{ab}(l)\right) + \sum_j \left[\delta_{ij}\left(r_{il}^{kk} B_{il}^{kk} + r_{il}^{k\tilde{k}} B_{il}^{k\tilde{k}}\right) + r_{ij}^{kk} C_{ij}^{kk} + r_{ij}^{\tilde{k}k} C_{ij}^{\tilde{k}k}\right] v_l^k(j) + \sum_j \left[\delta_{ij}\left(r_{il}^{\tilde{k}k} B_{il}^{\tilde{k}k} + r_{il}^{\tilde{k}\tilde{k}} B_{il}^{\tilde{k}\tilde{k}}\right) + r_{ij}^{k\tilde{k}} C_{ij}^{k\tilde{k}} + r_{ij}^{\tilde{k}\tilde{k}} C_{ij}^{\tilde{k}\tilde{k}}\right] v_l^{\tilde{k}}(j)   (C.7)

Now comparing (C.4) and (C.7), we make the following assignments:

-W^{kk}(i,j) = \delta_{ij}\left(r_{il}^{kk} B_{il}^{kk} + r_{il}^{k\tilde{k}} B_{il}^{k\tilde{k}}\right) + r_{ij}^{kk} C_{ij}^{kk} + r_{ij}^{\tilde{k}k} C_{ij}^{\tilde{k}k}   (C.8)

W^{k\tilde{k}}(i,j) = \delta_{ij}\left(r_{il}^{\tilde{k}k} B_{il}^{\tilde{k}k} + r_{il}^{\tilde{k}\tilde{k}} B_{il}^{\tilde{k}\tilde{k}}\right) + r_{ij}^{k\tilde{k}} C_{ij}^{k\tilde{k}} + r_{ij}^{\tilde{k}\tilde{k}} C_{ij}^{\tilde{k}\tilde{k}}   (C.9)

and

-s_l(i) = h - k_v v^s(i) + \sum_{\{(a,b)\}} \sum_j r_{ij}^{ab}\left(A_{ij}^{ab} + D_{ij}^{ab} v_l^a(i) v_l^b(j) + R_{ij}^{ab}(l)\right)   (C.10)

Also set

0 = \sum_{\{(a,b)\}} \sum_j r_{ij}^{ab} D_{ij}^{ab} v_l^a(i) v_l^b(j)   (C.11)

so that (C.10) becomes

-s_l(i) = h - k_v v^s(i) + \sum_{\{(a,b)\}} \sum_j r_{ij}^{ab}\left(A_{ij}^{ab} + R_{ij}^{ab}(l)\right)   (C.12)

Recall that the s_l(i) obey the condition in (C.5). Setting k = 0, 1 alternately in (C.11), we get

0 = \sum_j \left[r_{ij}^{00} D_{ij}^{00} v_l^0(i) v_l^0(j) + r_{ij}^{10} D_{ij}^{10} v_l^1(i) v_l^0(j) + r_{ij}^{01} D_{ij}^{01} v_l^0(i) v_l^1(j) + r_{ij}^{11} D_{ij}^{11} v_l^1(i) v_l^1(j)\right]   (C.13)

and

v^s(i) = \sum_j \left[r_{ij}^{11} D_{ij}^{11} v_l^1(i) v_l^1(j) + r_{ij}^{10} D_{ij}^{10} v_l^1(i) v_l^0(j) + r_{ij}^{01} D_{ij}^{01} v_l^0(i) v_l^1(j) + r_{ij}^{00} D_{ij}^{00} v_l^0(i) v_l^0(j)\right]   (C.14)

This is a set of equations for the 4N^2 scaling factors r_{ij}^{ab}. There are two equations for each value of i, v^s and l. Recalling that |S_l| is the number of stimulus cues per pattern, we see there are 2N \sum_{l=1}^{p} |S_l| equations. Since the system (C.13), (C.14) is singular, the number of unknowns must exceed the number of equations. This gives the condition

\sum_{l=1}^{p} |S_l| < 2N

Each equation in (C.13), (C.14) has 4N unknowns. Additional conditions involving the patterns v_l and the stimuli v^s may be derived by applying the well-known conditions for the solvability of linear systems of equations (e.g., Golub and van Loan 1989) to (C.13), (C.14). These take the form of restrictions on the collection of permissible noisy cues relative to the patterns. In the base case, these restrictions may be derived as follows. We note that in this case [see (3)], R^{ab} \equiv 0 and

(A^{ab}, B^{ab}, C^{ab}, D^{ab}) = (a_3^{ab}, a_1^{ab}, a_2^{ab}, a_0^{ab})

Then in this case the expressions (C.8), (C.9), (C.12), (C.13), (C.14) simplify somewhat.
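The counting condition just derived can be checked mechanically. The following sketch (in Python; the function name and the representation of the cue sets S_l as Python sets are our own illustrative choices, not part of the model) counts the 4N^2 unknown scaling factors against the 2N \sum_l |S_l| equations and reports whether the necessary inequality \sum_{l=1}^{p} |S_l| < 2N holds:

```python
def multipliers_underdetermined(N, cue_sets):
    """Necessary counting condition for the singular system (C.13), (C.14).

    N        -- number of units per layer
    cue_sets -- list of the stimulus-cue sets S_l, one per pattern

    There are 4*N**2 unknown scaling factors r_ij^{ab} (four (a, b)
    pairs, each an N-by-N array) and 2*N equations per stimulus cue.
    A singular system needs more unknowns than equations, which is
    equivalent to sum_l |S_l| < 2*N.
    """
    unknowns = 4 * N * N
    equations = 2 * N * sum(len(S) for S in cue_sets)
    return unknowns > equations

# Example: N = 5 units; three patterns with 2, 3 and 4 cues (sum = 9 < 10).
print(multipliers_underdetermined(5, [{0, 1}, {0, 2, 4}, {1, 2, 3, 4}]))
# With 4, 3 and 4 cues (sum = 11 >= 10) the condition fails.
print(multipliers_underdetermined(5, [{0, 1, 2, 3}, {0, 2, 4}, {1, 2, 3, 4}]))
```

Note that this is only the necessary count; the further restrictions on the cues themselves come from the solvability conditions discussed next.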
In particular, (C.12) becomes

-s_l(i) = h + \sum_j \sum_{\{(a,b)\}} a_3^{ab}\, r_{ij}^{ab}

In particular, D_{ij}^{ab} is independent of i and j. Take r_{ij}^{ab} to be independent of i also and set

C = \left(\sum_{\{(a,b)\}} v^a(i)\, v^b(j)\right)

Thus finding a set of multipliers as a solution of the relevant linear systems here requires that v^s lie in an eigenspace of C (an orthogonality condition; see Golub and van Loan 1989). That is, the admissible noisy cues must lie in the eigenspace of an appropriate matrix of output pattern correlations (cf. Haykin 1994, p. 200ff.).

Appendix D

To establish a condition for the conservation of feasibility, we begin by noting that, as the expressions in (6) and (10) are linear in W, we have

r^k \cdot F^k(W(n+1)) = r^k \cdot F^k(W(n)) + s\, r^k \cdot G^k

where the N-vector

G^k = -H^{kk} v^k + H^{k\tilde{k}} v^{\tilde{k}}

Recall that v_i^k(n) = v^k(i). Then for each pattern P, the i-th component of G^k is

G_i^k = \begin{cases} -\sum_{j \in P} H_{ij}^{00}(v_i^0(n+1), 1) + \sum_j H_{ij}^{01}(v_i^0(n+1), 1)\, v^1(j), & k = 0 \\ -\sum_j H_{ij}^{11}(v_i^1(n+1), 1)\, v^1(j) + \sum_{j \in P} H_{ij}^{10}(v_i^1(n+1), 1), & k = 1 \end{cases}

Let M_l^k denote the number of ones in the set \{v_l^k(i),\ i = 1, \ldots, N\}. If H_{ij}^{lm}(x, y) = H_i^{lm}(x, y), that is, if all the Hebbian functions are independent of j, we may write

G_i^k = -M^k H_i^{kk}(v_i^k(n+1), 1) + M^{\tilde{k}} H_i^{k\tilde{k}}(v_i^k(n+1), 1)

Next suppose that H_i^{lm}(x, y) = h(x, y). Then

G^k = (M^k - M^{\tilde{k}})\, h(v^k(n+1), 1)

Now the Hebbian hypothesis, concerning positive and negative reinforcement of synaptic strength, requires that

\mathrm{sgn}\, h(v^k(n+1), 1) = 2 v^k(n+1) - 1

Then G^k cannot be expected to be of one sign. Thus, according to this approach, we must set M^k = M^{\tilde{k}}, that is, set G^k = 0, to obtain r^k \cdot F^k(W(n+1)) \ge 0, the result we seek. That is, the condition M_l^k = M_l^{\tilde{k}} gives conservation of feasibility. Improved invariance results could be obtained by using a possible positivity of F^k to allow some negativity in G^k.

Appendix E

To show that H_n given in (28) is a Lyapunov function (under the conditions on W stated in Sect. 6.4), first note that, by direct substitution of (20),

H_{n+1} - H_n = -\frac{1}{2}(g_{n+1} + g_n)^T W (g_{n+1} - g_n) + \sum_i \int_{g_n(i)}^{g_{n+1}(i)} h^{-1}(z)\, dz

Next, using the mean value theorem of the integral calculus, we get

H_{n+1} - H_n = -\frac{1}{2}(g_{n+1} + g_n)^T W (g_{n+1} - g_n) + h^{-1}(\bar{g})^T (g_{n+1} - g_n)

where \bar{g} is the `intermediate value' occurring in that theorem. Let \bar{x} = h^{-1}(\bar{g}). Then using (26), (27), we find

H_{n+1} - H_n = -\frac{1}{2}(g_{n+1} + g_n)^T W (g_{n+1} - g_n) + \bar{x}^T (g_{n+1} - g_n)
= -\left(\frac{1}{2}(h(x_{n+1}) + h(x_n))^T W - \bar{x}^T\right)(g_{n+1} - g_n)
= -2\left(\frac{x_{n+2} - \bar{x}}{2} + \frac{x_{n+1} - \bar{x}}{2}\right)^T W^{-1} \left(\frac{x_{n+2} - \bar{x}}{2} - \frac{x_{n+1} - \bar{x}}{2}\right)

Appendix F

For the convenience of the reader, we give the derivation of the entry 0 in the first row, first column of (31) [the (0,0), (<h, <h) entry]. The arguments for developing the remaining tabular entries are similar, albeit more ramified. Since the stimulus x = 0, the recall dynamics is given by

v^0(n+1) = g(w^{01} v^1(n))
v^1(n+1) = g(w^{10} v^0(n))

Since we are in the first row of the table, v^0(0) = v^1(0) = 0, so that

v^0(1) = g(w^{01} v^1(0)) = g(0)
v^1(1) = g(w^{10} v^0(0)) = g(0)

However, g(0) = 0 since the threshold h is positive. That is, v^0(1) = v^1(1) = 0, as claimed.

Appendix G

To see that \delta x \to 0 under the hypothesis on W given in Sect. 6.6, we subtract the transfer (or gain) relations to get

v^k(n+1) - V^k = g(u^k(n)) - g(U^k) = g'(\bar{U}^k)(u^k(n) - U^k)

by the mean value theorem. Now setting \delta v^k = v^k - V^k, we may write this as [cf. (5b)]

\delta v^k(n+1) = g'(\bar{U}^k)\left[(W^{k\tilde{k}} v^{\tilde{k}} - W^{kk} v^k + k_v v^s) - (W^{k\tilde{k}} V^{\tilde{k}} - W^{kk} V^k + k_v v^s)\right]

Using the notation in (20), we may rewrite this as the following vector relation:

\delta x(n+1) = g' W \delta x(n)

The result follows from this under the hypothesis on W given in Sect. 6.6.

References

Amaral DG, Price JL (1984) Amygdalo-cortical projections in the monkey (Macaca fascicularis). J Comp Neurol 230:465–496
Baird B, Eeckman F (1993) A normal form projection algorithm for associative memory. In: Hassoun M (ed) Associative neural memories: theory and implementation. Oxford University Press, Oxford
Brown TH, Kairiss EW, Keenan CL (1990) Hebbian synapses: biophysical mechanisms and algorithms. Annu Rev Neurosci 13:475–511
Collingridge GL, Bliss TV (1995) Memories of NMDA receptors and LTP. Trends Neurosci 18:54–56
Dantzig GB (1963) Linear programming and extensions. Princeton University Press, Princeton, NJ
Felleman DJ, Van Essen DC (1991) Distributed hierarchical processing in the primate cerebral cortex. Cereb Cortex 1:1–47
Földiák P (1990) Forming sparse representations by anti-Hebbian learning. Biol Cybern 64:165–170
Golub GH, van Loan CF (1989) Matrix computations, 2nd edn. Johns Hopkins University Press, Baltimore
Gray C, Singer W (1989) Stimulus-specific neuronal oscillations in orientation columns of cat visual cortex. Proc Natl Acad Sci USA 86:1698–1702
Grossberg S (1987) Competitive learning: from interactive activation to adaptive resonance. Cogn Sci 11:23–63
Haykin S (1994) Neural networks: a comprehensive foundation. Macmillan, London
Hertz J, Krogh A, Palmer RG (1991) Introduction to the theory of neural computation. Addison-Wesley, Reading, Mass
Hinton GE, Dayan P, Frey BJ, Neal RM (1995) The wake-sleep algorithm for unsupervised neural networks. Science 268:1158–1161
Kano M (1994) Calcium-induced long-lasting potentiation of GABAergic currents in cerebellar Purkinje cells. Jpn J Physiol 44 [Suppl 2]:S131–S136
Kosko B (1992) Neural networks and fuzzy systems. Prentice-Hall, Englewood Cliffs, NJ
Liljenström H, Wu XB (1995) Noise-enhanced performance in a cortical associative memory model. Int J Neural Syst 6:19–29
Linsker R (1986) From basic network principles to neural architecture. Proc Natl Acad Sci USA 83:7508–7512
Miller R (1991) Cortico-hippocampal interplay and the representation of context in the brain. Springer, Berlin Heidelberg New York
Miranker WL (1981) Numerical methods for stiff equations. Reidel, Dordrecht
Mumford D (1994) Neuronal architectures for pattern-theoretic problems. In: Koch C, Davis JL (eds) Large scale neuronal theories of the brain. MIT Press, Cambridge, Mass
Plumbley MD (1993) Efficient information transfer in anti-Hebbian neural networks. Neural Networks 6:823–833
Roland PE, Friberg L (1985) Localization of cortical areas activated by thinking. J Neurophysiol 53:1219–1243
Rolls E (1990) Functions of neuronal networks in the hippocampus and of backprojections in the cerebral cortex in memory. In: McGaugh JL, Weinberger NM, Lynch G (eds) Brain organization and memory. Oxford University Press, Oxford
Squire LR, Zola-Morgan S (1991) The medial temporal lobe memory system. Science 253:1380–1386
Szentágothai J (1969) Architecture of the cerebral cortex. In: Jasper HH, Ward AA Jr, Pope A (eds) Basic mechanisms of the epilepsies. Little, Brown, Boston
Thomson AM, Deuchars J (1994) Temporal and spatial properties of local circuits in neocortex. Trends Neurosci 17:119–126
Traub RD, Jefferys JG (1994) Simulations of epileptiform activity in hippocampal CA3 region in vitro. Hippocampus 4:281–285
Willner B, Miranker WL, Lu C-P (1993) Self-organization of the locomotive oscillator. Biol Cybern 68:307–320
Wilson M, Bower JM (1992) Cortical oscillations and temporal interactions in a computer simulation of piriform cortex. J Neurophysiol 67:981–995
Zeki S, Shipp S (1988) The functional logic of cortical connections. Nature 335:311–317
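As a numerical footnote to Appendix G, the linearized error iteration \delta x(n+1) = g' W \delta x(n) is easy to exercise. The sketch below (Python) uses an illustrative 4-unit weight matrix and a logistic gain, neither taken from the model above; under the contraction hypothesis sup|g'| \cdot \|W\|_\infty < 1, the recall-error norm decays geometrically:

```python
# Hypothetical 4-unit weight matrix (illustrative values, not from the model).
# For the logistic gain g(u) = 1/(1 + exp(-u)), sup|g'| = 1/4, so the
# contraction hypothesis sup|g'| * ||W||_inf < 1 holds whenever
# ||W||_inf < 4; here ||W||_inf = 1.6, giving a per-step factor of 0.4.
W = [[0.0, 0.8, -0.5, 0.3],
     [0.6, 0.0, 0.4, -0.2],
     [-0.3, 0.5, 0.0, 0.7],
     [0.2, -0.4, 0.6, 0.0]]

GPRIME = 0.25  # sup|g'| for the logistic gain

def step(dx):
    """One application of the linearized error dynamics dx(n+1) = g' W dx(n)."""
    return [GPRIME * sum(W[i][j] * dx[j] for j in range(4)) for i in range(4)]

def sup_norm(dx):
    return max(abs(c) for c in dx)

dx = [1.0, -1.0, 0.5, -0.5]   # initial recall error
norms = [sup_norm(dx)]
for _ in range(10):
    dx = step(dx)
    norms.append(sup_norm(dx))

print(norms[0], norms[-1])    # the error norm shrinks toward 0
```

Because the per-step bound here is 0.4, ten iterations reduce the error norm by roughly four orders of magnitude, illustrating the convergence claimed in Sect. 6.6.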