
Cognitive Systems Research 3 (2002) 429–457

www.elsevier.com/locate/cogsys

The physical symbol grounding problem


Action editor: Tom Ziemke
Paul Vogt
IKAT/Infonomics, Universiteit Maastricht, P.O. Box 616, 6200 MD Maastricht, The Netherlands. E-mail address: p.vogt@cs.unimaas.nl
Received 14 May 2001; accepted 5 November 2001

Abstract

This paper presents an approach to solve the symbol grounding problem within the framework of embodied cognitive science. It will be argued that symbolic structures can be used within the paradigm of embodied cognitive science by adopting an alternative definition of a symbol. In this alternative definition, the symbol may be viewed as a structural coupling between an agent's sensorimotor activations and its environment. A robotic experiment is presented in which mobile robots develop a symbolic structure from scratch by engaging in a series of language games. In this experiment it is shown that robots can develop a symbolic structure with which they can communicate the names of a few objects with a remarkable degree of success. It is further shown that, although the referents may be interpreted differently on different occasions, the objects are usually named with only one form.
© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Symbol grounding problem; Embodied cognitive science; Symbolic structure; Robots

1. Introduction

This paper tries to show how symbols can be redefined to describe cognitive functions within the paradigm of embodied cognition. Traditionally, cognitive scientists describe cognition in terms of symbol systems (Newell & Simon, 1976; Newell, 1980). This is very useful because they assume that cognitive agents manipulate symbols when they think, reason or use language. However, when explaining the processes underlying such (higher) cognitive functions in terms of symbol manipulation, at least two major fundamental problems arise: the frame problem (McCarthy & Hayes, 1969; Pylyshyn, 1987) and the symbol grounding problem (Harnad, 1990). These problems arise because symbols are defined as internal representations, which are supposed to relate to entities in the real world.

A recent approach in cognitive science tries to overcome these problems by describing cognition in terms of the dynamics of an agent's interaction with the world. This novel approach has been called embodied cognitive science (e.g. Pfeifer & Scheier, 1999). In embodied cognitive science it is assumed that intelligence can be described in terms of an agent's bodily experiences that are acquired through its interaction with its environment. That is, an agent's intelligence should be based on its past interactions with the physical world. Within this paradigm, the frame problem and the symbol grounding problem are avoided to some degree, because it is argued that symbolic representations are no longer necessary to implement intelligent behaviors (Brooks, 1990).
But is this true? Are symbols no longer necessary? Indeed much can be explained without using symbolic descriptions, but most of these explanations have only dealt with low-level reactive behaviors such as obstacle avoidance, phototaxis, simple forms of categorization and the like (Pfeifer & Scheier, 1999). Higher cognitive functions such as language processing have been modeled most successfully using symbolic representations. This can be inferred from the fact that most natural language processing applications use symbolic processing, either hand-coded or acquired statistically from large corpora (see, e.g., ACL, 1997). It might therefore still be desirable to describe higher cognition in terms of symbol systems. However, to overcome the symbol grounding problem, the symbol system has to be embodied and situated (Pfeifer & Scheier, 1999). It has to be embodied in order to experience the world, and it has to be situated so that it can acquire its knowledge through interactions with the world (see, e.g., Sun, 2000, for a discussion). This leads to the central question of this paper: is it possible to define and develop an embodied and situated symbol system?

This paper argues that this could be possible by adopting Peirce's triadic definition of symbols. To distinguish this triadic definition from the traditional definition, these symbols will be called semiotic symbols. As will be shown, these semiotic symbols are both embodied and situated. The proposed approach combines the paradigm of physical grounding (Brooks, 1990) with the symbol grounding problem (Harnad, 1990) such that semiotic symbols are grounded inherently and meaningfully from the physical interactions of robots with their environment. It will be argued that this approach reduces the symbol grounding problem to a technical problem, which will be called the physical symbol grounding problem (Vogt, 2000b). To illustrate how a system of semiotic symbols can be constructed from scratch, a concrete robotic experiment will be presented.

A similar argumentation has recently been put forward in AI and cognitive science by Dorffner and colleagues (Dorffner, Prem, & Trost, 1993; Prem, 1995). They apply the theory of semiotics for discussing the symbol grounding problem (Dorffner et al., 1993; Prem, 1995), and to illustrate their ideas they have simulated some aspects in a connectionist model of language acquisition (Dorffner, 1992). The major difference between their work and the current work is that Dorffner's model, besides being a connectionist model, has been tested only in simulations of language acquisition, whereas this work invokes a concrete robotic experiment that investigates language evolution. As the ideas behind their approach are very similar to the one presented below, they will not be discussed further.

In the next section, the traditional cognitivist approach, the embodied cognitive science approach, and their problems in relation to symbols are briefly reviewed. This review paves the way for introducing an alternative interpretation of symbols. As will be argued, the newly defined semiotic symbols are meaningful within themselves and fit well within the embodied cognition paradigm. From there on, the article will present a model by which agents can acquire a set of semiotic symbols of which the meaning is grounded through the agents' interactions with their environment. The model (explained in Section 3) is based on the language game model that has been proposed by Luc Steels to study the origins and evolution of language (Steels, 1996b). Section 4 will present the results of a concrete experiment in which robotic agents develop a set of semiotic symbols using the language game model. The results will be discussed in Section 5. Finally, Section 6 concludes.

2. Symbols

2.1. Symbols as internal representations

Traditionally, symbols are defined as internal representations with which computations can be carried out. As mentioned, this approach is subject to some fundamental problems such as the frame problem and the symbol grounding problem. In this section, a brief review of the classical cognitivist approach is given, together with a discussion of the symbol grounding problem. Although the frame problem is closely related, the discussion in this paper concentrates on the symbol grounding problem.
The discussion starts with the physical symbol system hypothesis as put forward by Newell and Simon (Newell & Simon, 1976; Newell, 1980). This hypothesis states that physical symbol systems are sufficient and necessary conditions for intelligence. Physical symbol systems (or symbol systems for short) are systems that can store, manipulate and interpret symbolic structures according to some specified rules.

In Newell and Simon's definition, symbols are considered to be patterns that provide distal access to some structure (Newell, 1990). These are internal representations that can be accessed from some external structure. (Where external is relative to the pattern; hence it could be some other pattern.) Although Newell (1990) admits that the relation between a symbol's meaning and the outside world is important, he leaves it unspecified. Other related conceptions of a symbol are also used in the cognitive science community. De Saussure, for instance, defines a sign as a relation between a meaning and some arbitrary form/label, or more concretely as "a link between . . . a concept and a sound pattern" (De Saussure, 1974, p. 66). Harnad defines a symbol as an 'arbitrary category name' (Harnad, 1993). All these definitions assume that symbols are internal representations.

These notions of symbols fit well with the mind-as-a-computer metaphor. However, symbols in computers only have a meaning when interpreted by an external observer; computers manipulate symbols without being aware of their meaning. Naturally, this is not the way human cognition works. Humans are very well capable of interpreting the symbols which they manipulate, for instance during thought or while using language; they need no external observer to do this. Therefore, one would like to have symbols that agents can interpret themselves.

This problem led Searle to formulate his famous Chinese Room argument (Searle, 1980), which will not be discussed here. It also led Harnad to formulate his symbol grounding problem (Harnad, 1990). As argued, symbolic manipulation should be about something, and the symbols should acquire their meaning from reality. This is what Harnad calls the symbol grounding problem. According to Harnad, symbols should be grounded from the bottom up by invariantly categorizing sensorimotor signals. Harnad proposes that this should be done in three stages:

1. Iconization: analogue signals need to be transformed into iconic representations (or icons).
2. Discrimination: "[The ability] to judge whether two inputs are the same or different, and, if different, how different they are."
3. Identification: "[The ability] to be able to assign a unique (usually arbitrary) response—a 'name'—to a class of inputs, treating them all as equivalent or invariant in some respect." (Harnad, 1990, my italics)

Iconization and discrimination, according to Harnad, yield sub-symbolic representations; symbols are the result of identification. Hence, identification is the goal of symbol grounding. The process of identification is task dependent (Sun, 2000). As will be argued in this paper, using language is a task that is particularly suited to do the identification. This is mainly because language, through its conventions, offers a basis for invariant labeling of the real world.

In Harnad's work symbols are still defined as names for categories of sensorimotor activity. As such, the symbol grounding problem relates to the cognitivist paradigm, which concentrates on internal symbol processing (Ziemke, 1999). As will be argued, symbols could also be viewed as structural couplings between reality and the sensorimotor activations of an agent that arise from the agent–environment interaction. This reality may be a real world object or some internal state. When symbols are structures that inherently relate reality with internal structures, they are already meaningful in some sense and the symbol grounding problem is not a fundamental problem anymore.

2.2. Symbols or no symbols?

To overcome the problems of the cognitivist approach, embodied cognitive science came around in the late 1980s and has gained popularity ever since. The approach has strong roots in artificial intelligence, where it also became popular under the terms nouvelle AI and behavior-based robotics. Besides AI, it also has many roots in other disciplines of cognitive science such as psychology (Gibson, 1979), linguistics (Lakoff, 1987), philosophy (Boden, 1996), and neuroscience (Edelman, 1987; Johnson, 1997).
The essence of this modern approach will be discussed here briefly, in line with the argumentation brought by Brooks, who introduced the physical grounding hypothesis (Brooks, 1990, 1991). This hypothesis states that intelligence should be grounded in the interaction between a physical agent and its environment. Furthermore, according to this hypothesis, symbolic representations are no longer necessary: intelligent behavior can be established by parallel operating sensorimotor couplings.

When, as Brooks argues, symbolic representations are no longer necessary, it could be argued that the symbol grounding problem is no longer relevant since there are no symbols (Clancey, 1997; Pfeifer & Scheier, 1999). Another important aspect of the physical grounding hypothesis is that intelligent behaviors are often emergent phenomena (e.g. Pfeifer & Scheier, 1999). This means that intelligent behaviors may arise from mechanisms that appear not to be designed to perform the observed behavior. The physical grounding hypothesis lies at the heart of embodiment and situatedness: intelligence is embodied through an agent's bodily experiences of its behavior, and it is situated through the agent's interaction with the world.

Brooks and others showed that much of an agent's surprisingly intelligent behavior can be explained at the level of sensorimotor control (e.g. Steels & Brooks, 1995; Arkin, 1998; Pfeifer & Scheier, 1999). Very simple mechanisms that connect an agent's sensors with its motors can exhibit rather complex behavior without requiring symbolic representations (Braitenberg, 1984).

An example is the famous Cog experiment of Brooks and his colleagues (e.g. Brooks, Breazeal, Irie, Kemp, Marjanović, Scassellati, & Williamson, 1998). Cog is a humanoid robot that can mimic some human-like behaviors by connecting its sensory stimulation to some actuator response. Its behaviors are controlled according to the behavior-based paradigm. Several behaviors are modeled in a layered organization of sensorimotor couplings. These behavior modules are implemented as loosely coupled parallel processes and can be learned. Cog has learned, for instance, to detect human faces and gaze directions, to control hand–eye coordination, to saccade its eyes and to interact socially with humans (Brooks et al., 1998). All these behaviors are based on the physical grounding hypothesis.

Although many behaviors can be explained by the physical grounding hypothesis, the question still remains whether it is able to explain higher cognitive functions. The assumption taken in this paper is that intelligent behaviors such as thought and language do require some form of symbolic representation. The reason for this is twofold. First, as scientists like to describe overt behavior in terms of symbol manipulation, it is very useful to have a proper definition that fits well within the embodied cognition paradigm. This makes it easier to ascribe symbols to embodied cognitive agents from an observer's perspective. The second reason is that in order to facilitate higher cognitive functions such as language, agents might actually need symbols that they can manipulate. The question remains: how are symbols represented?

2.3. Symbols as structural couplings

As already discussed, the traditional approach to cognitive science and AI is confronted with problems such as the frame problem (McCarthy & Hayes, 1969; Pylyshyn, 1987), the symbol grounding problem (Harnad, 1990) and the Chinese Room 'problem' (Searle, 1980). At the heart of these problems lies the fact that the traditional symbols are neither situated nor embodied; see, e.g., (Clancey, 1997; Pfeifer and Scheier, 1999) for broad discussions of these problems. As mentioned, the physical grounding hypothesis (Brooks, 1990) doubts the necessity of symbolic representations, but if they are necessary they should be both situated and embodied (Clancey, 1997). In this section a definition of symbols will be given that is both situated and embodied from an agent's point of view.

2.3.1. Semiotic symbols

Various scientists from the embodied cognitive science field assume that, if symbols are necessary to describe cognition, they should be defined as structural couplings connecting objects to their categories based on their sensorimotor projections (Clancey, 1997; Maturana & Varela, 1992). There is, however, already a definition of a symbol that comes very close to such a structural coupling. This alternative definition stems from the work of Peirce (Peirce, 1931–1958).
Peirce's theory extends the semiotics of De Saussure. While De Saussure defines a sign as having a meaning (the signified) and a form (the signifier) (De Saussure, 1974), Peirce also includes its relation to a referent. A sign consists of what he calls a representamen, an interpretant and an object. These are defined as follows (Chandler, 1994):

Representamen: the form which the sign takes (not necessarily material).
Interpretant: ". . . the sense made of the sign."
Object: that to which the sign refers.

According to Peirce, the sign is called a symbol if the representamen in relation to its interpretant is either arbitrary or conventionalized, so that the relationship must be learned. In this respect the representamen could be, for instance, a word-form as used in language. The interpretant could be viewed "as another representation which is referred to the same object" (Eco, 1976, p. 68). The object can be viewed as a physical object in the real world, but may also be an abstraction, an internal state or another sign. In the experiments described below, the objects will be physical objects, and the interpretant will be represented by a category that is formed from the visual interaction of a robot with the real world.

Often the term symbol is used to denote the representamen (e.g. Ogden & Richards, 1923; Harnad, 1993). Many scientists, including Peirce, tend to 'misuse' the term sign when referring to the representamen. However, the sign was originally defined by Peirce as the triadic relation (Chandler, 1994). In this paper the triadic interpretation of the sign is adopted as the definition of the symbol, provided that the representamen of the sign is either arbitrary or conventionalized. In order to distinguish this definition from the traditional interpretation, this alternative interpretation of the symbol shall be called a semiotic symbol.

Also a more familiar terminology is adopted. Following Steels, the representamen is called a form, the interpretant a meaning and the object a referent (Steels & Kaplan, 1999).

The sign (or semiotic symbol) is often illustrated as a semiotic triangle such as the one introduced by Ogden and Richards (1923). The triangle displayed here (Fig. 1) only differs from the original one in its terminology.¹ The dotted line in the figure indicates that the relation between form and referent is not always explicitly observable.

Fig. 1. The semiotic triangle illustrates the relations that constitute a sign. When the form is either arbitrary or conventionalized, the sign can be interpreted as a symbol.

¹ In Ogden and Richards' original diagram, the term symbol was actually used instead of form. In addition, they call the meaning a thought or reference.

2.3.2. The meaning of meaning

The term meaning requires special attention. It has been (and still is) subject to much debate in philosophy and the cognitive sciences in general. This work tries to distill a definition of meaning that is suitable within the context of the current investigation.

According to Peirce, a semiotic symbol's meaning arises in its interpretation (Chandler, 1994). As such the meaning arises from the process of semiosis, which is the interaction between form, meaning and referent. This means that the meaning depends on how the semiotic symbol is constructed and with what function. This is comparable to the notion of meaning that is prominent in embodied cognitive science, where meaning depends on "the way we perceive the overall shape of things . . . and by the way we interact with things with our bodies" (Lakoff, 1987, p. 292).
So, the meaning of semiotic symbols can be viewed as a functional relation between a form and a referent. This relation is based on an agent's bodily experience and interaction with a referent. The experience of an agent is based on its history of interactions. Each interaction between an agent and a referent can activate its past experiences, bringing forth a new experience. The way these bodily experiences are represented and memorized forms the internal representation of the meaning. The actual interaction between an agent and a referent 'defines' the functional relation.

In the experiment described below, robots develop a system of semiotic symbols through communicative interactions called language games. The robots have a very simple body and can only visually interact with objects and, in principle, point at them by orienting towards the objects. In the language games, communication is only used to name a visually detected referent (in case of the speaker) and to guess what visually detected referent the speaker names (in case of the hearer). The function (or use, in Wittgenstein's terminology) of this naming is only to focus the hearer's attention towards a referent such that it, for instance, can go to or point at the referent. The meaning of the semiotic symbol in such a guessing game is conveyed by an agent's perception, categorization and naming of the referent, together with, in case of the hearer, an appropriate reaction. Such meanings may change dynamically over time as the robots (visually) interact more with the referents.

This use of meaning may not seem realistic, because the communication of semiotic symbols has no meaning with respect to the robots' survival, as, for instance, pointed out by Ziemke and Sharkey (2001). But it is very much similar to the way infants seem to construct meanings upon their first visual encounter with some object. This is nicely illustrated by a series of experiments reported in (Tomasello and Barton, 1994). In these experiments, infants are shown a, for them, novel toy together with an invented name (i.e. a non-existing word), such as "toma". After that, the toy is hidden and the infants are requested to find the "toma". Even if they do not know where the toy is hidden, the infants are very successful in recognizing the "toma" when they find it. So, although the children are not yet able to grasp the entire meaning of the toy (they do not know, for instance, what they can or should do with it), they presumably do form some meaning for the object. This meaning is solely based on the infants' visual interaction with, and categorization of, the toy. Naturally, when the infants further interact with the object, e.g. by playing with it, they expand their meaning of the object and they come to learn more of its functions.

As Ziemke and Sharkey do, one might still argue that robots cannot use semiotic symbols meaningfully, since the symbols are not rooted in the robot: the robots are designed rather than shaped through evolution or physical growth (Ziemke & Sharkey, 2001; Ziemke, 1999). In that respect, they continue, today's robots do not use semiotic symbols meaningfully, since whatever task they might have stems from their designer or is in the head of a human observer. With this in mind, it will be assumed that robots, once they can construct semiotic symbols, do so meaningfully. This assumption is made to illustrate how robots can construct semiotic symbols meaningfully.

2.3.3. Physical symbol grounding

Adopting Peirce's triadic notion of a symbol has at least two advantages. One advantage is that one may argue that the semiotic symbol is per definition grounded, because the triadic relation (i.e. the semiotic symbol) already bears the symbol's meaning with respect to reality. As meaning is defined by the functional relation between the form, meaning and referent, one might argue that such semiotic symbolic structures "are meaningful to begin with" (Lakoff, 1987, p. 372).² Hence the symbol grounding problem is no longer relevant. Not because symbols do not exist anymore, as argued by, e.g., Brooks, but because the semiotic symbols are already meaningful. This does not mean, however, that there is no problem anymore; rather, the symbol grounding problem is no longer a fundamental problem of interpreting symbols meaningfully. The problem is reduced to the process of semiosis, which is defined by Peirce as the interaction between the referent, meaning and form. The semiosis can be viewed as the process of constructing the semiotic triangle. Implementing this in autonomous systems, however, remains a hard problem. This problem shall be addressed as the physical symbol grounding problem (Vogt, 2000b) and will be treated as a technical problem.

² Note that Lakoff does not explicitly apply this quote to semiotic symbols, but his argument is similar to the one expressed here.
The physical symbol grounding problem is a combination of the physical grounding problem and the symbol grounding problem. It is based on the idea that symbols should be grounded (cf. Harnad, 1990) and the idea that they should be grounded by physical agents that interact with the real world (cf. Brooks, 1990). To solve the physical symbol grounding problem, the three phases of symbol grounding identified by Harnad (1990) are still relevant. The next section will show how these phases (iconization, discrimination and identification) can be modeled in a concrete experiment.

The second advantage is that the semiotic symbol is situated and embodied. It should be acquired through some interaction between a physical agent and its environment. This makes it possible to connect semiotic symbols with embodied cognition.

So constructing semiotic symbols is required for communication. Every time a semiotic symbol is used, it is (re-)constructed by its user. As a result, the semiotic symbols are not static, but may change dynamically whenever the agent–environment interaction requires it. In communication the form needs to be conventionalized (although still arbitrary to some extent). Establishing conventions about a semiotic symbol's form in relation to its referent is particularly useful for the invariant identification of the referent. As a referent is perceived differently under varying circumstances, it may be categorized differently. As will be shown, language use helps to identify the sensing of the referents invariantly. Therefore, it is assumed that meaning co-evolves with the language (Steels, 1997a), such that meaning can also be viewed as a cultural unit (cf. Eco, 1976). Similar arguments have been put forward to explain various aspects of language development (see, e.g., Whorf, 1956; Lakoff, 1987).

2.3.4. Summary

In this subsection a definition of a semiotic symbol has been adopted that provides an alternative to the traditional definitions. As Clancey has argued, a symbol within the embodied cognition paradigm should be some structural coupling (Clancey, 1997). The semiotic definition provided by Peirce yields a structural coupling between the real world object and some internal representation (or a sensorimotor activation pattern), especially in the process of semiosis. Since the semiotic symbol is defined by a relation between a form, meaning and referent, its meaning is an intrinsic property bearing the relation to the real world. Hence, it could be argued that the semiotic symbol is per definition grounded and the symbol grounding problem is not relevant anymore. However, there still remains the problem of constructing a semiotic symbol. Rather than a fundamental problem, this will be treated as a hard technical problem, addressed as the physical symbol grounding problem. The semiotic symbols acquire their meaning through perception and categorization as a process that conveys the semiotic symbols' names to the referents they stand for.

As argued, and as the experimental results will reveal, semiotic symbols are constructed most efficiently in communication. Language development gives rise to the development of meaning and vice versa.

3. Adaptive language games

3.1. Synthetic modeling of language evolution

From the second half of the 1990s, research at the Vrije Universiteit Brussel and at the Sony Computer Science Laboratory in Paris has focused on the study of language origins and evolution. These studies are based on the language game model as proposed by Steels (1996b). In this model language use is the central issue in language development, as has been hypothesized by the 'father of language games' (Wittgenstein, 1958). Steels hypothesized three mechanisms for language development: cultural interaction, individual adaptation and self-organization.

In this model, agents have a mechanism to exchange parts of their vocabulary with each other, called cultural interaction. When novel situations occur in a language game, agents can expand their lexicons. In addition, they evaluate each 'speech act' on its effectiveness, which they use to strengthen or weaken the form-meaning associations used. These mechanisms are called individual adaptations. Although agents have been implemented with mechanisms that model local behavior, such as communicating with another agent and individual adaptations, the iterative combination of these mechanisms yields the emergence of a global coherence in the agents' language. This process is very similar to the self-organizing phenomena that have been observed in many biological, physical and cultural systems (Prigogine & Stengers, 1984; Maynard Smith & Szathmáry, 1995; Maturana & Varela, 1992). Self-organization is thought to be a basic mechanism of complex dynamical systems.
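To make the interplay of these three mechanisms concrete, the following minimal simulation may help. It is not the model of this paper (that follows below), and it simplifies drastically: meanings are given symbols, perception is perfect, and the feedback reveals the topic. All names and constants are illustrative. Still, it shows how purely local speech acts plus score adaptation can let a shared vocabulary self-organize in a population.

```python
import random

MEANINGS = ["m1", "m2", "m3"]

def invent_form():
    """An arbitrary new consonant-vowel form, e.g. 'bogi'."""
    return "".join(random.choice("bdgklmnpst") + random.choice("aeiou")
                   for _ in range(2))

class Agent:
    def __init__(self):
        self.lexicon = {}   # (form, meaning) -> association score

    def name(self, meaning):
        cands = [(s, f) for (f, m), s in self.lexicon.items() if m == meaning]
        if not cands:                    # no association yet: invent a form
            form = invent_form()
            self.lexicon[(form, meaning)] = 0.01
            return form
        return max(cands)[1]             # strongest association wins

    def interpret(self, form):
        cands = [(s, m) for (f, m), s in self.lexicon.items() if f == form]
        return max(cands)[1] if cands else None

    def adapt(self, form, meaning, success):
        # Individual adaptation: strengthen the used association on
        # success, weaken it on failure (scores clipped to [0, 1]).
        s = self.lexicon.get((form, meaning), 0.01)
        delta = 0.1 if success else -0.1
        self.lexicon[(form, meaning)] = min(1.0, max(0.0, s + delta))

def language_game(speaker, hearer):
    """Cultural interaction: one speech act about a shared topic."""
    topic = random.choice(MEANINGS)
    form = speaker.name(topic)
    guess = hearer.interpret(form)
    success = guess == topic
    speaker.adapt(form, topic, success)
    if guess is not None:
        hearer.adapt(form, guess, success)
    if not success:                      # feedback reveals the topic,
        hearer.lexicon.setdefault((form, topic), 0.01)   # so the hearer adopts
    return success

agents = [Agent() for _ in range(5)]
results = [language_game(*random.sample(agents, 2)) for _ in range(5000)]
print(sum(results[-500:]) / 500)   # success over the last 500 games
```

Running this, communicative success typically climbs towards 1 as competing forms die out, although nothing global is programmed in; that emergence is what the term self-organization refers to here.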
The above mechanisms for lexicon development have been tested extensively under different settings by the researchers in Brussels and Paris to investigate various aspects of lexicon origins, evolution and development. These aspects include lexicon formation (Steels, 1996b), lexicon dynamics (Steels & McIntyre, 1999), multiple word games (Van Looveren, 1999) and stochasticity (Steels & Kaplan, 1998). All these experiments reveal that the language game model (also called the naming game in relation to lexicon formation) is a strong model to explain lexicon formation in terms of the nurture paradigm. The model is very similar to those that have been successfully studied by, e.g., Batali (1998), Oliphant (1999) and Kirby and Hurford (1997). It contrasts with the nativist approach advocated by, e.g., Chomsky (1980), Pinker and Bloom (1990) and Bickerton (1998).

The above mechanisms to explore lexicon formation have been extended to include meaning creation (hence modeling symbol grounding). The extension resulted in a model of discrimination games (Steels, 1996c). The mechanisms on which the discrimination games are based are very similar to those of the language games: agent–environment interaction, individual adaptation and self-organization. Agents are given a mechanism to perceive their environment. Another mechanism allows them to adapt their memories. They can construct new categories that form the basis of the meaning, and they can adapt them based on their use in grounded language games. The evolution of meaning using these mechanisms is an emergent property of the simple interactions and adaptations, so the term self-organization is in place. The discrimination game model has been studied separately in simulations (Steels, 1996c) and on mobile robots (Vogt, 1998b; De Jong & Vogt, 1998).

Other experiments have been done in which the discrimination game has been coupled to the naming game in simulations (De Jong, 2000; Belpaeme, 2001), on immobile robots called the 'Talking Heads' (Belpaeme, Steels, & van Looveren, 1998; Steels & Kaplan, 1999) and on mobile robots (Steels & Vogt, 1997; Vogt, 2000b). Again, and as will be shown in the next section, these experiments revealed that the three mechanisms are very powerful to explain how lexicons can evolve in co-evolution with the meaning, without implementing any prior knowledge of the lexicon and meaning.

Another variant of the language game that is worth mentioning is the imitation game, which has been used to study the origins of human-like vowel systems (De Boer, 1997, 2000a). De Boer showed in his experiments that agents can develop repertoires of vowels that are very similar to vowel systems observed in human languages. These systems emerged based on the three mechanisms proposed by Steels as mentioned above. A crucial factor in the experiments' success is the realistic simulation of the vocal tract and auditory system with which the agents are modeled. It is important to realize that in the experiments none of the vowels have been preprogrammed, and neither are their features, as the Chomsky and Halle theory proposes (Chomsky & Halle, 1968).

For an overview of the experiments done in Brussels and Paris, consult (Steels, 1997b).

3.2. The experiment

The development of the language games has resulted in a model that has been implemented on various robotic platforms: on mobile LEGO robots (Steels & Vogt, 1997), on the Talking Heads, which are immobile pan-tilt cameras (Belpaeme et al., 1998), and most recently on the AIBO, which is a four-legged robot (Kaplan, 2000). The experiment reported here is based on the one reported in (Steels and Vogt, 1997). The goal of this experiment is that two mobile robots, given their bodies and interaction/learning mechanisms, develop a shared and grounded lexicon from scratch about the objects that the robots can detect in their environment. This means that the robots construct a vocabulary of form-meaning associations from scratch with which the robots can successfully communicate the names of the objects.
Although not always modeling lexicon evolution, lexicon grounding on real robots is becoming increasingly popular. Other research, however, does not investigate lexicon development from scratch, but assumes a part of the lexicon is given. Examples are the work of Billard and Hayes (1997) and Yanco and Stein (1993) in robot–robot communication, or Roy (2000) and Sugita and Tani (2000) in human–robot communication. It is beyond the scope of this paper to discuss this research here, but all this work solves, to some extent, the physical symbol grounding problem.

In the experiments, the robots play a series of guessing games (Steels & Kaplan, 1999). The guessing game is a variant of a language game in which the hearer tries to guess what referent the speaker names. Much of the processing of the robots that is not directly involved with lexicon development, like sensing, turn-taking and evaluating the feedback, has been preprogrammed in a behavior-based manner; see (Vogt, 2000b) for details. The same holds for the learning mechanisms. The reason for this is not to complicate the system too much, so that a working experiment could be developed to investigate how robots can develop a grounded lexicon, given the mechanisms explained below. Other research that investigates various aspects of the origins of language and communication tries to explain how such mechanisms may have evolved. Examples of such research investigate the origins of communication channels (Quinn, 2001), feature detectors (Belpaeme, 1999) and communication as such (Noble, 2000; De Jong, 2000).

The basic scenario of a guessing game is illustrated in what Steels calls the semiotic square (Steels & Kaplan, 1999), see Fig. 2. The game is played by two robots. One robot takes the role of the speaker, while the other takes the role of the hearer. Both robots start sensing their surroundings, after which the sensory data is preprocessed. This way the robots acquire a context. The speaker selects the sensing of one referent as the topic; the hearer considers all detected referents as potential topics.

Fig. 2. The semiotic square illustrates the guessing game scenario. The two squares show the processes of the two participating robots. This figure is adapted from (Steels and Kaplan, 1999).

The robots categorize the preprocessed data, which results in a meaning if the categorization is used in the communication, i.e. when it is coupled to a form. After categorization, the speaker produces an utterance and the hearer tries to interpret this utterance. When the meaning of this utterance applies to the categorization of the sensed referents, the hearer can act on the appropriate referent, for instance, by pointing at it. The guessing game is successful if the hearer guessed the right topic from the speaker's utterance. The success is evaluated in the feedback, which is passed back to both robots so that they can adapt their ontology of categories (used as memorized representations of the meanings) and their lexicon.

As can be seen in Fig. 2, when the segment-node is ignored (although it is a necessary intermediate step), the guessing game allows the robots to construct a semiotic symbol. The game is thus similar to the process of semiosis. In this paper it is assumed that meaning arises from the sensing, segmentation and categorization. Hence semiotic symbols are illustrated with the triangle (Fig. 1) rather than the square. When a guessing game is successful, the relations between form, meaning and referent are established properly, and one could argue that the constructed semiotic symbol is used meaningfully.

Below follows a detailed description of the experimental setup and the guessing game model. Each basic part of the model is exemplified with an illustrative example.
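Before the details, the control flow of one guessing game can be summarized in pseudocode-like Python. This is only a schematic rendering of the semiotic square: every method below is a placeholder for a mechanism described in the following subsections, not actual robot code.

```python
import random

def guessing_game(speaker, hearer):
    """One guessing game, following the semiotic square of Fig. 2.

    Every method called on speaker/hearer is a placeholder for a
    mechanism detailed in Sections 3.3-3.6.
    """
    # Sensing and preprocessing: each robot builds its own context of
    # segments, described by feature vectors (Section 3.4).
    context_s = speaker.perceive()
    context_h = hearer.perceive()

    # The speaker arbitrarily selects one sensed referent as the topic;
    # the hearer must treat every sensed referent as a potential topic.
    topic = random.choice(context_s)

    # Discrimination games yield distinctive categories (Section 3.5).
    dc_speaker = speaker.discriminate(topic, context_s)
    dc_hearer = {seg: hearer.discriminate(seg, context_h)
                 for seg in context_h}

    # Naming: the speaker produces an utterance and the hearer guesses
    # which of its potential topics the utterance names (Section 3.6).
    utterance = speaker.produce(dc_speaker)
    guess = hearer.interpret(utterance, dc_hearer)

    # Feedback: the game succeeds iff both robots communicated about the
    # same referent (the invariance check of Section 3.6.4); both robots
    # then adapt their ontologies and lexicons.
    success = guess is not None and speaker.same_referent(topic, guess)
    speaker.adapt(utterance, topic, success)
    hearer.adapt(utterance, guess, success)
    return success
```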
3.3. The experimental setup

The experiment makes use of two LEGO robots, see Fig. 3. The robots are equipped with four light sensors, two motors, a radio module and a sensorimotor board. The light sensors are used to detect the objects in the robots' environment. The two motors control the robots' movements. The radio module is used to coordinate the two robots' behaviors and to send sensor data to a PC where most of the processing takes place.

Fig. 3. The LEGO robots in their environment.

The robots are situated in a small environment (2.5 × 2.5 m²) in which four light sources are placed at different heights. The light sources act as the objects that the robots try to name. The different light sensors of the robots are mounted at the same heights as the different light sources. Each sensor outputs its readings on a sensory channel. A sensory channel is said to correspond with a particular light source if the sensor has the same height as this light source.

The goal of the experiments is that the robots develop a lexicon with which they can successfully name the different light sources. Although this is not a realistic task for living organisms (or even for robots), complicating the task would not contribute much to the investigation of how robots can develop a lexicon to name things.

The reasons for designing the correspondence between light sources and sensors are twofold. First, it helps the observer in analyzing the experiments. It provides the observer with an easy tool to investigate what light source the robots saw based on the sensory data. Secondly, if the robots were to live in this environment and had to distinguish the light sources from each other in order to survive, evolution might have come up with a similar visual apparatus.

3.4. Sensing, segmentation and feature extraction

As a first step towards solving the symbol grounding problem, the robots have to construct what Harnad calls an iconic representation. Although Harnad refers to raw sensory data such as an image on the retina, in this work it is assumed that an iconic representation is the preprocessed sensory data that relates to the sensing of a light source. The resulting representation is called a feature vector, in line with the terminology from pattern recognition (see, e.g. Fu, 1976).

A feature vector is acquired in three stages: sensing, segmentation and feature extraction. Below these three stages are explained in more detail.

3.4.1. Sensing

During the sensing phase, the robots detect what is in their surroundings one by one. They do so by rotating 720° around their axis. While they do this, they record the sensor data of the middle 360° part of the rotation. This way the robots obtain a spatial view of their environment for each of the four light sensors, see Fig. 4. The robots rotate 720° instead of 360° in order to cancel out nasty side effects induced by the robots' acceleration and deceleration.

Fig. 4 shows, as an example, the sensing of two robots during a language game. Fig. 4(a) shows that robot r0 clearly detected the four light sources; there appears a peak for every light source. In each peak, the sensor that has the same height as the light source responsible for the peak (i.e. the corresponding sensor) has the highest intensity. The two robots do not sense the same view, as can be seen in Fig. 4(b). This is due to the fact that the robots are not located at the same place.
Fig. 4. The sensing of the two robots during a language game. The plots show the spatial view of the robots' environment, acquired during 360° of their rotation. The figures make clear that the two robots have a different sensing, since they stand at different positions. The y-axis shows the intensity of the sensors, while the x-axis represents the time (or angle) of the sensing in PDL units. A PDL unit takes about 1/40 s; hence the total time of this sensing event was 1.5 s for robot r0 and slightly less for robot r1. The robots have different rotation periods due to their noisy control. (a) Robot r0; (b) robot r1.

3.4.2. Segmentation

Segmentation is used by the robots to extract the sensory data that is induced by the detection of the light sources. In the raw sensory data, the sensing of the light sources corresponds to the peaks of increased intensity. As can be seen in Fig. 4, between two peaks the sensory channels are noisy. The first step in the segmentation preprocesses the raw sensory data to remove this noise. The preprocessed sensory channel data τ_{i,t}, for sensory channels i = 0, . . . , n at time steps t, is acquired by subtracting an upper noise level Θ_i from the raw sensory data x_{i,t}. This is expressed in the following equation:

$$\tau_{i,t} = \begin{cases} x_{i,t} - \Theta_i & \text{if } x_{i,t} - \Theta_i \ge 0,\\ 0 & \text{if } x_{i,t} - \Theta_i < 0. \end{cases} \qquad (1)$$

The upper noise levels Θ_i for the different sensory channels have been obtained empirically. Applying the above equation to the middle part of the scene from Fig. 4(a) results in the scene displayed in Fig. 5. In this figure there are two connected regions that have positive sensory values for at least one sensory channel. These regions, of which the boundaries are marked with vertical lines, relate to the sensing of a light source and form the segments. Note that there is no noise anymore between the segments.

Fig. 5. The preprocessed sensory channel data of a part of the sensed view from Fig. 4(a). The vertical lines mark the boundaries of segments S1 and S2.

The segmentation results in a set of segments {S_k}, where

$$S_k = \{s_{k,0}, \ldots, s_{k,n-1}\}, \qquad (2)$$

in which k is the number of the segment, s_{k,i} = (τ_{k,i,0}, . . . , τ_{k,i,m}) is the preprocessed sensory channel data, m is the length of the segment and n is the number of sensory channels (four in this case). The following holds for each segment S_k: for all j = 0, . . . , m there exists at least one sensory channel i for which τ_{k,i,j} > 0. This means, for every segment, that at every observation j there is at least one preprocessed sensory channel with a positive value. Applying this to the data shown in Fig. 5 yields the segments given in Table 1.

It is assumed that each segment relates to the detection of one light source. Due to the noisy control and sensing of the robots, this is not always the case. The set of segments constitutes what is called the context of the guessing game, i.e. Cxt = {S_1, . . . , S_N}, where N is the number of segments that are sensed. Each robot participating in the guessing game has its own context, which may differ from the other robot's.
Table 1
The segments S1 and S2 from Fig. 5.

          S1                           S2
        s1,0   s1,1   s1,2   s1,3    s2,0   s2,1   s2,2   s2,3
τk,i,0     2      0      0      0       0      1     11      0
τk,i,1    14      2      0      0       0      1     37      0
τk,i,2    32     17      1      0       0     16     39      1
τk,i,3    46     40      3      0       0     36     40      1
τk,i,4    51     54      3      0       0     28     35      1
τk,i,5    33     35      4      0       0     28     35      1
τk,i,6     3      1      4      0       0      0      8      0
τk,i,7     3      1      3      0       0      0      8      0

The preprocessed sensory channel data s_{k,i} is given in the columns. Note that for every segment each row has at least one positive value. Further note that here both segments have equal length; in general this is not the case.

3.4.3. Feature extraction

For all sensory channels, the feature extraction is a function φ(s_{k,i}) that normalizes the maximum intensity of sensory channel i to the overall maximum intensity from segment S_k:

$$\varphi(s_{k,i}) = \frac{\max_{s_{k,i}}(\tau_{k,i,j})}{\max_{S_k}\left(\max_{s_{k,i}}(\tau_{k,i,j})\right)} \qquad (3)$$

The result of applying the feature extraction to the data of sensory channel i will be called a feature f_{k,i}, so f_{k,i} = φ(s_{k,i}). Segment S_k can now be related to a feature vector f_k = (f_{k,0}, . . . , f_{k,n-1}), where n is the total number of sensory channels. Like a segment, a feature vector is assumed to relate to the sensing of a referent. The space that spans all possible feature vectors f is called the n-dimensional feature space F = [0, 1]^n, or feature space for short.

Applying this feature extraction to the segments from Table 1 goes as follows. First the maximum values for each sensory channel (the numerator in Eq. (3)) have to be identified. In segment S1 these are 51, 54, 4 and 0 for sensory channels s_{1,0}, . . . , s_{1,3}, respectively. The maximum of these maxima (the denominator in Eq. (3)) is in sensory channel s_{1,1}. The features are extracted by normalizing each maximum value by the maximum value of s_{1,1}. The resulting feature vector for segment S1 is thus f1 = (0.94, 1.00, 0.06, 0.00).

Similarly, segment S2 has maximum values of 0, 36, 40 and 1 for sensory channels 0 to 3. Normalizing each value to 40 yields feature vector f2 = (0.00, 0.90, 1.00, 0.03). The context of this robot could be described in terms of the feature vectors as Cxt = {f1, f2}.

The fact that the sensor that reads the highest intensity during the sensing of a light source is mounted at the same height as the light is an invariant property of the sensing. The feature extraction from Eq. (3) has been designed to extract this property. In a feature vector there is one feature with value 1; the others have lower values. This feature indicates to which light source the vector corresponds. Because in the example feature f_{1,1} of feature vector f1 has a value 1, one can infer that this feature vector corresponds to the light source that is at the same height as sensor s1.³ Similarly, f2 corresponds to the light source at the same height as sensor 2.

³ Note that the sensors are numbered from 0 to 3 and that sensor s0 is the lowest in height.

The reasons for this feature extraction are manifold. First, it is useful to have a consistent representation of the sensed referents in order to categorize. It is easier to implement categorization when its input has a fixed format. As the segments vary in length, they have no fixed format. Second, the normalization to the maximum intensity within the segment (the 'invariance detection') is useful to deal with different distances between the robot and the light source. Furthermore, it helps to analyze the experiments from an observer's point of view and to evaluate feedback. Besides its use during feedback (see below), the robots are not 'aware' of this invariance. It should be noted that such feature extraction functions could well have been learned or evolved as, for instance, shown by De Jong and Steels (1999) and Belpaeme (1999).
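Eq. (3) and the worked example translate directly into a short sketch; segment S1 below is entered from Table 1, one list per sensory channel (the helper names are mine):

```python
def extract_features(segment):
    """Eq. (3): one feature per channel -- its peak intensity normalized
    by the overall peak intensity of the segment, so that exactly one
    feature equals 1, marking the corresponding sensor."""
    peaks = [max(channel) for channel in segment]   # numerators
    overall = max(peaks)                            # denominator
    return [p / overall for p in peaks]

# Segment S1 from Table 1, channels s_{1,0} ... s_{1,3}:
s1 = [[2, 14, 32, 46, 51, 33, 3, 3],
      [0,  2, 17, 40, 54, 35, 1, 1],
      [0,  0,  1,  3,  3,  4, 4, 3],
      [0,  0,  0,  0,  0,  0, 0, 0]]
f1 = extract_features(s1)
print([round(f, 2) for f in f1])
# [0.94, 1.0, 0.07, 0.0] -- the text reports 0.06 for the third feature
print(f1.index(max(f1)))   # 1: the light at the height of sensor s1
```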
3.5. Meaning formation

In order to form a semiotic symbol, the robots have to categorize the sensing of a referent so that it is distinctive from the categories relating to the other referents. This way the category can be used as the memorized representation of the meaning of this referent. Although a category should not be equated with a meaning, it is labeled as such when used in communication, because it forms the memorized representation of a semiotic symbol's meaning.

The 'meaning' formation is modeled with the so-called discrimination game (Steels, 1996c). The discrimination game models the discrimination phase in symbol grounding as proposed by Harnad (1990), and it searches for distinctive categories in three steps: (1) the feature vectors from the context are categorized, (2) categories that distinguish a topic from the other segments in the context are identified, and (3) the ontology of categories is adapted.

Each robot plays a discrimination game for each (potential) topic individually. A topic is a segment from the constructed context (described by its feature vector). The topic of the speaker is arbitrarily selected from the context and is the subject of communication. As the hearer tries to guess what the speaker intends to communicate, it considers all segments in its context as potential topics.

3.5.1. Categorization

Let a category c = ⟨c, ν, ρ, κ⟩ be defined as a region in the feature space F. It is represented by a prototype c = (x_0, . . . , x_{n-1}), i.e. a point in the n-dimensional feature space F, and ν, ρ and κ are scores. The category is the region in F in which all points have the nearest distance to c.

A feature vector f is categorized using the 1-nearest neighbor algorithm (see, e.g., Fu, 1976). This algorithm returns the category of which the prototype has the smallest Euclidean distance to f. Each robot categorizes all segments this way.

Consider the context of feature vectors from the example derived in the previous section. Further suppose that the robot has two categories in its ontology, c_1 and c_2, which are represented by the prototypes c_1 = (0.50, 0.95, 0.30, 0.00) and c_2 = (0.50, 0.95, 1.00, 0.00). Then the Euclidean distance between f1 and c_1 is 0.50, and between f1 and c_2 it is 1.04. Since the distance between f1 and c_1 is smallest, f1 is categorized with c_1. Likewise, f2 is categorized with c_2.
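The 1-nearest-neighbor step and the distances of this example can be verified in a few lines. A sketch, with categories reduced to named prototype tuples and without the scores introduced below:

```python
from math import dist   # Euclidean distance (Python 3.8+)

def categorize(f, ontology):
    """1-nearest neighbor: the category whose prototype is closest to f."""
    return min(ontology, key=lambda name: dist(f, ontology[name]))

ontology = {"c1": (0.50, 0.95, 0.30, 0.00),
            "c2": (0.50, 0.95, 1.00, 0.00)}
f1 = (0.94, 1.00, 0.06, 0.00)
f2 = (0.00, 0.90, 1.00, 0.03)

print(round(dist(f1, ontology["c1"]), 2))   # 0.5
print(round(dist(f1, ontology["c2"]), 2))   # 1.04
print(categorize(f1, ontology), categorize(f2, ontology))   # c1 c2
```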
In order to allow generalization and specialization in the categories, different versions of the feature space are available to a robot. These different feature spaces, indicated by F_l, allow different levels of generality, specified by l = 0, . . . , l_max, where l_max is the level at which maximum specialization can be reached. This maximum is introduced because the sensors have a limited resolution, and to prevent unnecessary specialization, which makes the discrimination game computationally very inefficient. In each space a different resolution is obtained by allowing each dimension of F_l to be exploited up to 3^l times. How this is done will be explained soon. The higher l is, the denser the distribution of categories in feature space F_l can be, and the less general the categories in that feature space are. The categories of all feature spaces together constitute an agent's ontology.

The different feature spaces allow the robots to categorize a segment in different ways. The categorization of segment S_k results in a set of categories C_k = {c_0, . . . , c_m}, where m ≤ l_max. So, assuming only one feature space, the two feature vectors from the example are categorized with the following sets of categories: C_1 = {c_1} and C_2 = {c_2}.

In the discrimination games, a prototypical representation has been used rather than the binary representation used in (Steels, 1996c; Steels and Vogt, 1997) or the subspace representation used by De Jong and Vogt (1998), which all yield more or less similar results (De Jong & Vogt, 1998; Vogt, 2000b). The reason for adopting a prototype representation is its biological plausibility, as inferred by, for instance, psychologists (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976) and physiologists (Churchland, 1989).

3.5.2. Discrimination

Suppose that a robot wants to find distinctive categories for (potential) topic S_t; then a distinctive category set can be defined as follows:

$$DC = \{\, c_i \in C_t \mid \forall (S_k \in Cxt \setminus \{S_t\}) : c_i \notin C_k \,\}.$$

Or in words: the distinctive category set consists of all categories of the topic that are not a category of any other segment in the context. That is, those categories that distinguish the topic from all other segments in the context.
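As a sketch, with each segment's category set held as a Python set, the distinctive category set is a plain set difference (the helper name is mine):

```python
def distinctive_categories(topic_cats, other_cats):
    """DC: the topic's categories minus every category that also
    categorizes some other segment in the context."""
    used_elsewhere = set().union(*other_cats) if other_cats else set()
    return set(topic_cats) - used_elsewhere

# Topic S1 with C1 = {c1} against C2 = {c2}: the game succeeds.
print(distinctive_categories({"c1"}, [{"c2"}]))           # {'c1'}
# With a third segment also categorized as c1 (cf. Section 3.5.4): failure.
print(distinctive_categories({"c1"}, [{"c2"}, {"c1"}]))   # set()
```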
3.5.3. Adaptation

If DC = ∅, the discrimination game is a failure and some new categories are constructed. Suppose that the robot tried to categorize feature vector f = (f_0, . . . , f_{n-1}); then new categories are created as follows (see also the example in Section 3.5.4):

1. Select an arbitrary feature f_i > 0.
2. Select a feature space F_l that has not been exploited 3^l times in dimension i, for l as low as possible.
3. Create new prototypes c_j = (x_0, . . . , x_{n-1}), where x_i = f_i and the other x_r are taken from already existing prototypes in F_l.
4. Add the new prototypical categories c_j = ⟨c_j, ν_j, ρ_j, κ_j⟩ to the feature space F_l, with ν = ρ = 0.01 and κ = 1 − l/l_max.

The score ν indicates the effectiveness of a category in the discrimination game, ρ indicates its effectiveness in categorization, and κ indicates how general the category is (i.e., in which feature space F_l the category resides). Score κ is a constant, based on the feature space F_l and the feature space that has the highest resolution possible (i.e. l = l_max). This score implements a bias towards selecting the most general category. The other scores are updated as in reinforcement learning. It is beyond the scope of this paper to give exact details of the update functions; see (Vogt, 2000b) for these details.

The three scores together constitute the meaning score μ = (ν + ρ + κ)/3, which is used in the naming phase of the experiment. The influence of this score is small, but it helps to select a form-meaning association in case of an impasse.

The reason to exploit only one feature of the topic during the construction of new prototypes, rather than the complete feature vector, is to speed up the construction.

If the distinctive category set DC ≠ ∅, the discrimination game is a success. The DC is forwarded to the naming game that models the naming phase of the guessing game. If a category c is used successfully in the guessing game, the prototype c of this category is moved towards the feature vector f it categorizes:

$$\mathbf{c} \leftarrow \mathbf{c} + \epsilon \cdot (f - \mathbf{c}), \qquad (4)$$

where ε is the step size with which the prototype moves towards f. This way the prototype becomes a more representative sample of the feature vectors it categorizes. This update is similar to on-line k-means clustering (MacQueen, 1967).

3.5.4. An example

To continue the example from above, recall that the two feature vectors in the context are categorized with the following sets of categories: C_1 = {c_1} and C_2 = {c_2}. Suppose that S1 is the topic of the discrimination; then the distinctive category set is DC = {c_1}, because c_1 ∉ C_2. Because DC ≠ ∅, the discrimination game is a success. If c_1 is further successfully used in naming the referent, its prototype shifts towards f1 by applying Eq. (4). If ε = 0.1, which is the case in the experiment, then c_1 becomes c_1′ = (0.54, 0.96, 0.28, 0.00).

Now suppose that the context did not consist of two feature vectors, but three: Cxt = {f1, f2, f3}, where f1 and f2 are as before and f3 = (1.00, 0.90, 0.05, 0.00). Further suppose that the robot has the same ontology as before; then f3 is categorized with the category set C_3 = {c_1}, since c_1 is the closest to f3. When S1 is the topic, the distinctive category set is empty, since C_1 = {c_1} and c_1 ∈ C_3. The discrimination game fails, and two new categories are formed of which the prototypes are as follows when feature f_{1,0} is selected as an exemplar of topic S1 (note that only one feature space F is considered in this example): c_3 = (1.00, 0.95, 0.30, 0.00) and c_4 = (1.00, 0.95, 1.00, 0.00). Feature f_{1,0} is copied to the new prototypes, together with the other features of already existing prototypes; compare c_3 and c_4 with c_1 and c_2.
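Both update rules of this section, the prototype shift of Eq. (4) and the creation of new prototypes in step 3 of the adaptation, can be checked against the numbers of this example. A sketch under the same simplifications as before; the score bookkeeping (ν, ρ, κ) is omitted and the function names are mine:

```python
def shift_prototype(c, f, eps=0.1):
    """Eq. (4): move prototype c a step eps towards the feature vector f
    it successfully categorized (an on-line k-means-like update)."""
    return tuple(x + eps * (fx - x) for x, fx in zip(c, f))

def create_prototypes(value, dim, existing):
    """Step 3 of the adaptation: each new prototype copies the exemplar
    feature value into dimension dim and takes its remaining coordinates
    from an already existing prototype."""
    return [tuple(value if d == dim else p[d] for d in range(len(p)))
            for p in existing]

c1 = (0.50, 0.95, 0.30, 0.00)
c2 = (0.50, 0.95, 1.00, 0.00)
f1 = (0.94, 1.00, 0.06, 0.00)

# Successful use of c1 shifts its prototype towards f1:
print(shift_prototype(c1, f1))
# approx. (0.54, 0.96, 0.28, 0.00), i.e. the c1' of the example above

# A failed discrimination game creates new prototypes from an exemplar
# feature value (1.00 in dimension 0 reproduces c3 and c4):
print(create_prototypes(1.00, 0, [c1, c2]))
# [(1.0, 0.95, 0.3, 0.0), (1.0, 0.95, 1.0, 0.0)]
```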
The reason to exploit only one feature of the topic
during the construction of new prototypes, rather 3.6. Naming
than the complete feature vector is to speed up the
construction. After both robots have obtained distinctive
If the distinctive category set DC ± 5, the dis- categories of the (potential) topic(s) from the dis-
crimination game is a success. The DC is forwarded crimination game as explained above, the naming
to the naming game that models the naming phase of game (Steels, 1996b) starts. In the naming game, the
the guessing game. If a category c is used successful- robots try to communicate the topic.
ly in the guessing game, the prototype c of this The speaker tries to produce an utterance as the
category is moved towards the feature vector f it name of one of the distinctive categories of the topic.
categorizes, The hearer tries to interpret this utterance in relation
to distinctive categories of its potential topics. This
c[c 1 e ? ( f 2 c), (4) way the hearer tries to guess the speaker’s topic. If
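The machinery of Sections 3.5.3 and 3.5.4 can be summarized in a short sketch. The following Python fragment is illustrative only: the function names are invented, categorization is reduced to a single nearest prototype per segment (the actual system keeps a distinctive category set over several feature spaces ℱ_l), and the scores ν, ρ and κ are omitted; see (Vogt, 2000b) for the real implementation.

```python
def categorize(f, ontology):
    """Return the prototype (a tuple) closest to feature vector f."""
    return min(ontology, key=lambda c: sum((x - y) ** 2 for x, y in zip(f, c)))

def discrimination_game(topic, context, ontology):
    """Succeed iff the topic's category differs from every other segment's."""
    c_topic = categorize(topic, ontology)
    others = {categorize(f, ontology) for f in context if f is not topic}
    return c_topic if c_topic not in others else None  # None signals failure

def shift_prototype(c, f, eps=0.1):
    """Eq. (4): c <- c + eps * (f - c); eps = 0.1 in the experiment."""
    return tuple(x + eps * (y - x) for x, y in zip(c, f))
```

On success the returned category enters the naming phase, and shift_prototype implements the on-line k-means-like update of Eq. (4); on failure, new prototypes would be constructed around one exemplar feature of the topic, as described above.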
3.6. Naming

After both robots have obtained distinctive categories of the (potential) topic(s) from the discrimination game as explained above, the naming game (Steels, 1996b) starts. In the naming game, the robots try to communicate the topic.
The speaker tries to produce an utterance as the name of one of the distinctive categories of the topic. The hearer tries to interpret this utterance in relation to the distinctive categories of its potential topics. In this way the hearer tries to guess the speaker's topic. If the hearer finds a possible interpretation, the guessing game is successful if both robots communicated about the same referent. This is evaluated by the feedback process, as will be explained below. According to the outcome of the game, the lexicon is adapted.

3.6.1. The lexicon
The lexicon L is defined as a set of form-meaning associations: L = {FM_i}, where FM_i = ⟨F_i, M_i, σ_i⟩ is a lexical entry. Here F_i is a form that is made of a combination of consonant–vowel strings, M_i is a (memorized part of a) meaning represented by a category, and σ_i is the association score that indicates the effectiveness of the lexical entry in language use.

3.6.2. Production
The speaker of the guessing game will try to name the topic. To do this it selects the distinctive category from the DC for which the meaning score μ is highest. Then it searches its lexicon for a form-meaning association of which the meaning matches the distinctive category.
If it fails to do so, the speaker will consider the next distinctive category from the DC. If all distinctive categories have been explored and still no entry has been found, the speaker may create a new form, as will be explained in the adaptation section.
If there are one or more lexical entries that fulfill the above condition, the speaker selects the entry that has the highest association score σ. The form thus produced is uttered to the hearer.

3.6.3. Interpretation
On receipt of the utterance, the hearer searches its lexicon for entries for which the form matches the utterance, and for which the meaning matches one of the distinctive categories of the potential topics. A topic relates to an entry when its distinctive category matches the entry's meaning.
If it fails to find one, the lexicon has to be expanded, as explained later.
If the hearer finds more than one, it will select the entry that has the highest score S = σ + α·μ, where α = 0.1 is a constant weight. The potential topic that relates to this lexical entry is selected by the hearer as the topic of the guessing game. That is, this segment is what the hearer guessed to be the subject of communication.

3.6.4. Feedback
In the feedback, the outcome of the guessing game is evaluated. It is important to note that in this paper, the term feedback is only used to indicate the process of evaluating a game's success by verifying whether both robots communicated about the same referent. As mentioned, the guessing game is successful when both robots communicated about the same referent. The feedback is established by comparing the feature vectors of the two robots relating to the topics. If these feature vectors correspond to each other, i.e. they both have a value of 1 in the same dimension, the robots have identified the same topic, cf. the invariance criterion mentioned in Section 3.4.3. The outcome of the feedback is known to both robots.
If the hearer selected a topic after the understanding phase, but this topic is not consistent with the speaker's topic, there is a mismatch in referent. This is the case when the invariance criterion is not met. If the speaker has no lexical entry that matches a distinctive category, or if the hearer could not interpret the speaker's utterance because it does not have a proper lexical entry in the current context, then the guessing game is a failure.
The feedback is evaluated rather artificially and is not very realistic, since robots normally have no access to each other's internal states. However, previous attempts to implement feedback physically have failed (Vogt, 1998a). In these attempts the hearer pointed at the topic so that the speaker could verify whether the hearer had identified the same topic. This failed because the speaker was not able to verify reliably at what object the hearer pointed. A similar pointing strategy has been successfully implemented in the Talking Heads experiment (Steels & Kaplan, 1999). To overcome this technical problem, it is assumed that the robots can do this. Naturally, this problem needs to be solved in the future.
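Production, interpretation and the feedback check can likewise be sketched in a few lines. This is a sketch under assumptions, not the actual code: the lexicon is modeled as (form, meaning, score) triples, meaning_score stands for the meaning score μ of a category, and potential_topics maps each potential topic to its distinctive category set.

```python
ALPHA = 0.1  # weight of the meaning score in interpretation (Section 3.6.3)

def produce(DC, lexicon, meaning_score):
    """Try distinctive categories in order of decreasing mu; utter the
    best-scored form of the first category that has lexical entries."""
    for c in sorted(DC, key=meaning_score, reverse=True):
        entries = [(score, form) for form, m, score in lexicon if m == c]
        if entries:
            return max(entries)[1]          # the form with the highest sigma
    return None                             # no entry: the speaker may invent a form

def interpret(utterance, potential_topics, lexicon, meaning_score):
    """Pick the potential topic whose matching entry maximizes S = sigma + alpha * mu."""
    candidates = [(score + ALPHA * meaning_score(m), topic)
                  for topic, DC in potential_topics.items()
                  for form, m, score in lexicon
                  if form == utterance and m in DC]
    return max(candidates, key=lambda t: t[0])[1] if candidates else None

def same_referent(f_speaker, f_hearer):
    """Feedback: the invariance criterion -- both feature vectors carry a 1
    in the same dimension (Section 3.6.4)."""
    return any(a == 1.0 and b == 1.0 for a, b in zip(f_speaker, f_hearer))
```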
3.6.5. Adaptation
Depending on the outcome of the game, the lexicon of the two robots is adapted. There are four possible outcomes/adaptations:

1. The speaker has no lexical entry: In this case the speaker creates a new form and associates this with the distinctive category it tried to name. This is done with a probability of P_s = 0.1.
2. The hearer has no lexical entry: The hearer adopts the form uttered by the speaker and associates this with the distinctive categories of a randomly selected segment from its context.
3. There was a mismatch in referent: Both robots lower the association score σ of the used lexical entry: σ ← η·σ, where η = 0.9 is a constant learning rate. In addition, the hearer adopts the utterance and associates it with the distinctive categories of a different randomly selected segment.
4. The game was a success: Both robots reinforce the association score of the used entry: σ ← η·σ + (1 − η). In addition, they lower competing entries (entries for which either the form or the meaning is the same as in the used entry): σ ← η·σ. The latter update is called lateral inhibition.[4]

3.6.6. An example
The following example illustrates the naming phase of the guessing game. Suppose the speaker and the hearer have a lexicon as given in Table 2. Each agent has three private meanings c_i (or c_i') associated with two different forms, tyfo and labo. In the cells of the table the association scores are given; a dash indicates that there is no association between that meaning and form. Further suppose that both robots have each detected three light sources L1, L2 and L3.

Table 2
The lexicon of the speaker and hearer used in the example

  Speaker              Hearer
  M_s   tyfo   labo    M_h    tyfo   labo
  c_1   0.20   0.25    c_1'   0.10   –
  c_2   –      0.65    c_2'   –      0.75
  c_3   0.95   –       c_3'   0.70   0.05

Each agent has associated three meanings (in the rows) with two forms (in the columns). Real-valued numbers indicate the association scores and a dash indicates there is no association.

Suppose that the speaker selected the segment relating to L3 as the topic, which it categorized distinctively with c_3. This category is only associated with the form tyfo, which it utters. Upon receiving the form tyfo, the hearer tries to interpret it. The hearer has two meanings associated with tyfo: c_1' and c_3'. As the association score relating tyfo with c_3' is highest, the hearer selects the segment belonging to c_3' as the topic.[5] If this segment relates to L3, the guessing game is successful and the scores are updated as follows: the speaker and the hearer both increase the association score between c_3 (or c_3') and tyfo, which become 0.955 and 0.73, respectively (recall that η = 0.9). Competing associations are laterally inhibited: the association score of ⟨c_1, tyfo⟩ becomes 0.18, that of ⟨c_1', tyfo⟩ becomes 0.09 and that of ⟨c_3', labo⟩ becomes 0.045.
Reconsider the lexicon in Table 2, but now the speaker selected L1, which it categorized with c_1, as the topic. In this case, the speaker will select labo to name L1, because this has the highest association score. The hearer interprets labo with c_2', which unfortunately has been categorized for L2. There is a mismatch in referent. The association scores of the used associations are lowered. So, the association score of ⟨c_1, labo⟩ becomes 0.225 and that of ⟨c_2', labo⟩ becomes 0.675.
If the speaker has categorized the topic, e.g. the segment relating to L1, with a category that is not yet in its lexicon, say c_4, then it may invent a new form. This new form, for instance gufi, is then uttered to the hearer. The hearer, however, does not know the form yet; it associates it with the categorization(s) of a randomly selected segment and the game is considered a failure.

[4] Independently of each other, Oliphant (1999), De Jong (2000) and Kaplan (2001) have shown that lateral inhibition is crucial for convergence in lexicon development.
[5] Note that the meaning score μ is disregarded to simplify the example.
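The four update rules, and the numbers in the example above, can be checked directly with a few lines of Python (a sketch; the surrounding lexicon bookkeeping is omitted):

```python
ETA = 0.9  # constant learning rate

def on_success(score):   # the used entry in a successful game
    return ETA * score + (1 - ETA)

def on_mismatch(score):  # the used entry after a mismatch in referent
    return ETA * score

def inhibit(score):      # lateral inhibition of competing entries
    return ETA * score

# The successful game with tyfo:
assert round(on_success(0.95), 3) == 0.955   # speaker, <c3, tyfo>
assert round(on_success(0.70), 3) == 0.730   # hearer, <c3', tyfo>
assert round(inhibit(0.20), 3) == 0.180      # speaker, <c1, tyfo>
assert round(inhibit(0.10), 3) == 0.090      # hearer, <c1', tyfo>
assert round(inhibit(0.05), 3) == 0.045      # hearer, <c3', labo>
# The mismatch with labo:
assert round(on_mismatch(0.25), 3) == 0.225  # speaker, <c1, labo>
assert round(on_mismatch(0.75), 3) == 0.675  # hearer, <c2', labo>
```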
3.7. Summary

This section presented the guessing game model by which the experiments are done. Two mobile robots try to construct a lexicon with which they can solve the physical symbol grounding problem. In each guessing game, the robots try to name one of the light sources that are in their surroundings. They do so by taking the following steps:

1. Sensing, segmentation and feature extraction.
2. Topic selection.
3. Discrimination games: (a) categorization, (b) discrimination and (c) ontology adaptation.
4. Naming: (a) production, (b) interpretation, (c) evaluating feedback and (d) lexicon adaptation.

The guessing game described above implements the three mechanisms hypothesized by Luc Steels (1996b) that can model lexicon development. (Cultural) interactions are modeled by the sensing, communication and feedback. Individual adaptation is modeled at the level of the discrimination and naming games. The selection of elements and the individual adaptations are the main sources for the self-organization of a global lexicon.
The coupling of the naming game with the discrimination games and the sensing part ensures that the emerging lexicon is grounded in the real world. The robots successfully solve the physical symbol grounding problem in some situation when the guessing game is successful. This is so because identification (Harnad, 1990) is established when the semiotic triangle (Fig. 1) is constructed completely. Identification is done at the naming level and it is successful when the guessing game is successful. It is important to realize that internal representations stored in the robots' memories as such are not semiotic symbols; they only constitute part of a semiotic symbol when used in a language game. Only then is the relation with a referent assured.

4. The experiments

Using the defined model (and some variations of it), a series of experiments has been done to investigate various aspects of the model. These experiments are all reported in (Vogt, 2000b). One of them is reported in this section.
This section is organized as follows: before reporting the experiment, some expectations of the experiment's success are stated according to some statistics calculated from the recorded sensory data. The measures with which the experiment is analyzed are specified in Section 4.2. Section 4.3 presents the results.

4.1. The sensory data

As mentioned, the guessing games are for a large part processed off-line on a PC. Only the sensing is done on-board. The recorded sensory data of the sensing is re-used to do multiple experiments using the same data, but also to process more games than have been recorded. The most important reason for the off-line processing has to do with time efficiency. Conducting a complete experiment on-board would take at least one week of full-time experimenting. Another advantage of processing off-board is that one can do multiple experiments in which various methods and parameter settings can be compared reliably.
The data that has been recorded for the experiments reported here consists (after preprocessing the raw sensory data) of approximately 1000 context settings. These context settings are used for 10 runs of 10 000 guessing games, so in each run the context settings are used approximately ten times. Statistics on the data set revealed that the average context size is about 3.5 segments per robot. In each game one robot is selected randomly to be the speaker, who arbitrarily selects one feature vector as the topic. Therefore, it takes, in principle, approximately 7000 games until a particular situation re-occurs.
Other statistics on the preprocessed data set revealed that the a priori chance for success is around 23.5%. This means that when both the speaker and the hearer select a topic at random, in about 23.5% of the games they will select the same topic. This a priori chance is calculated from the average context size (3.5 segments) and the potential understandability. Since the robots do not always detect the same surroundings (Fig. 4), the possibility exists that the speaker selects a topic that the hearer did not detect. Although in natural communication the hearer might try to find the missing information, this is not done here for practical reasons. Besides, human communication is not always perfect either, because humans may fail to construct a shared context. So, there is a maximum in the success to be expected, which has been coined the potential understandability.
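(A plausible reconstruction of these figures, which the text does not spell out: with the potential understandability of roughly 0.8 reported next, and a hearer guessing uniformly among its ~3.5 segments on average, the chance of hitting the speaker's topic is about 0.8/3.5 ≈ 0.23, close to the reported 23.5%. The recurrence estimate presumably follows from 2 robots × ~1000 context settings × ~3.5 candidate topics ≈ 7000 distinct game situations.)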
The potential understandability has been calculated to lie around 80%. For details on the calculation and other information about the sensory data, see (Vogt, 2000b).

4.2. Measures

The experiments are investigated using six different measures. Two of these (the discriminative and communicative success) measure the success rates of the discrimination games and the guessing games. The other measures indicate the quality of the system that emerges. These measures, which are based on the entropy measure taken from information theory (Shannon, 1948), were developed by Edwin De Jong (2000). They are called distinctiveness, parsimony, specificity and consistency, and are calculated every 200 guessing games. Below follows a description of these measures.

4.2.1. Discriminative success
Discriminative success (Steels, 1996c) measures the number of successful discrimination games averaged over the past 100 guessing games.

4.2.2. Distinctiveness
"Intuitively, distinctiveness expresses to what degree a meaning identifies the referent" (De Jong, 2000, p. 76). For this one can measure how the entropy of a meaning in relation to a certain referent H(r|m_i) decreases the uncertainty about the referent H(r). To do this, one can calculate the difference between H(r) and H(r|m_i). Here r ranges over the referents r_1, ..., r_n and m_i relates to one of the meanings m_1, ..., m_m for robot R. The distinctiveness D_R can now be defined as follows:

$$H(r|m_i) = \sum_{j=1}^{n} -P(r_j|m_i) \cdot \log P(r_j|m_i),$$

$$\mathrm{dist}(m_i) = \frac{H(r) - H(r|m_i)}{H(r)} = 1 - \frac{H(r|m_i)}{H(r)}, \qquad (5)$$

$$D_R = \frac{\sum_{i=1}^{m} P_o(m_i) \cdot \mathrm{dist}(m_i)}{m},$$

where H(r) = log n and P_o(m_i) is the occurrence probability of meaning m_i. The use of P_o(m_i) as a weighting factor is to scale the importance of such a meaning to its occurrence.

4.2.3. Parsimony
The parsimony P_R is calculated similarly to the distinctiveness:

$$H(m|r_i) = \sum_{j=1}^{m} -P(m_j|r_i) \cdot \log P(m_j|r_i),$$

$$\mathrm{pars}(r_i) = 1 - \frac{H(m|r_i)}{H(m)}, \qquad (6)$$

$$P_R = \frac{\sum_{i=1}^{n} P_o(r_i) \cdot \mathrm{pars}(r_i)}{n},$$

with H(m) = log m. Parsimony thus calculates to what degree a referent gives rise to a unique meaning.

4.2.4. Communicative success
Communicative success (Steels, 1996b) measures the number of successful guessing games averaged over the past 100 guessing games.

4.2.5. Specificity
"The specificity of a word[-form] is ... defined as the relative decrease of uncertainty in determining the referent given a word that was produced" (De Jong, 2000, p. 115). It thus is a measure to indicate how well a word-form can identify a referent. It is calculated analogously to the distinctiveness and parsimony. For a set of word-forms s_1, ..., s_q, the specificity is defined as follows:

$$H(r|s_i) = \sum_{j=1}^{n} -P(r_j|s_i) \cdot \log P(r_j|s_i),$$

$$\mathrm{spec}(s_i) = 1 - \frac{H(r|s_i)}{H(r)}, \qquad (7)$$

$$S_R = \frac{\sum_{i=1}^{q} P_o(s_i) \cdot \mathrm{spec}(s_i)}{q},$$

where H(r) = log n is defined as before and P_o(s_i) is the occurrence probability of encountering word-form s_i.
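All four entropy-based measures share one computational pattern: a conditional entropy, normalized and averaged with occurrence probabilities. A minimal sketch, assuming co-occurrence counts have been logged as counts[x][r] (how often meaning or form x co-occurred with referent r); consistency, defined in the next subsection, follows the same scheme with the roles of forms and referents exchanged:

```python
from math import log

def entropy(ps):
    """Shannon entropy of a probability distribution."""
    return sum(-p * log(p) for p in ps if p > 0)

def entropy_measure(counts, n_referents):
    """Average of 1 - H(r|x)/H(r), weighted by P_o(x) and divided by the
    number of x's, as in Eqs. (5) and (7): distinctiveness when x ranges
    over meanings, specificity when x ranges over word-forms."""
    total = sum(sum(row.values()) for row in counts.values())
    h_r = log(n_referents)                                # H(r) = log n
    result = 0.0
    for row in counts.values():
        occ = sum(row.values())                           # occurrences of x
        h_cond = entropy(c / occ for c in row.values())   # H(r | x)
        result += (occ / total) * (1 - h_cond / h_r)      # P_o(x) * dist(x)
    return result / len(counts)
```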
4.2.6. Consistency
Consistency measures how consistently a referent is named by a certain word-form. It is calculated as follows:

$$H(s|r_i) = \sum_{j=1}^{q} -P(s_j|r_i) \cdot \log P(s_j|r_i),$$

$$\mathrm{cons}(r_i) = 1 - \frac{H(s|r_i)}{H(s)}, \qquad (8)$$

$$C_R = \frac{\sum_{i=1}^{n} P_o(r_i) \cdot \mathrm{cons}(r_i)}{n},$$

where H(s) = log q and P_o(r_i) is defined as before.
The four entropy-based measures specify whether or not there is order in the system. When a measure has the value 1, there is order in the system; when it has the value 0, there is disorder. All these entropy measures are calculated per robot every 200 games within one run; the other measures are calculated after every single game. When presented, all measures are averaged over the ten runs that are done in each experiment.

4.3. The results

The experiment is done with 10 runs of 10 000 guessing games. Fig. 6 shows the evolution of the different measures. The discriminative success approaches 100% early in the experiment (Fig. 6(a)). This indicates that the discrimination game is a very efficient model for categorizing different sensings. Similar results have been confirmed in other experiments using different representations of categories and varying numbers of objects, e.g. (Steels, 1996c; De Jong and Vogt, 1998). As the distinctiveness shows in Fig. 6(b), when a categorization (or meaning) is used, it usually stands for one referent only. That is, there is a one-to-one relation between meaning and referent. This, however, does not imply that there is a one-to-one relation between referent and meaning. The lower parsimony shows this (see Fig. 6(c)).
The communicative success (Fig. 6(d)) approaches the potential understandability of 80%. After 10 000 games, the communicative success is approximately 75%. Hence the robots are fairly well capable of constructing a shared lexicon. The specificity, shown in Fig. 6(e), increases to a value slightly above 0.9. It shows that when a form is used, it is mainly used to name one referent. Hence there is little polysemy. The consistency (Fig. 6(f)) is lower than the specificity. This means that a referent is not always named with the same form. As will be shown later, this does not mean that the lexicon is inefficient; it rather means that the system bears some synonymy.
It should be noted that although the parsimony is almost as high as the consistency, this does not mean that the number of meanings used is close to the number of forms. It merely indicates that the inconsistent use of forms happens about as often as the inconsistent use of meanings.
As can be seen in Figs. 6(e) and (f), the specificity and consistency rise very rapidly. To understand this, it should be realized that these measures are calculated relative to the successful use of forms (in the case of specificity) or to the successful naming of referents (in the case of consistency). Specificity and consistency are not relative to the number of successful guessing games. So, this means that, whenever the robots communicate successfully, the used semiotic symbols reveal order in the lexicon. Similar arguments hold for the distinctiveness and parsimony (Figs. 6(b) and (c)).
Although still rather fast, the communicative success rises more slowly (see Fig. 6(d)). Because the agents tend to categorize the referents very differently on different occasions, much synonymy and polysemy arises in the system. Too much synonymy and polysemy causes many confusions, so many guessing games fail. Hence the robots must disambiguate the synonymy and polysemy, which takes some time. Nevertheless, the agents perform better than chance already after a few hundred games. Kaplan has shown that the speed of convergence in communicative success depends, amongst others, on the number of meanings/referents, the number of agents and noise in the transmission (Kaplan, 2001).
The run that will be discussed in more detail below resulted in the lexicon that is displayed in the semiotic landscape shown in Fig. 7. This figure shows the associations between referent, meaning and form for both robots, with a strength that indicates the relative occurrence frequency of connections that are successfully used over 10 000 guessing games.
Fig. 6. The evolution of the measures during the experiment.
Fig. 7. The semiotic landscape of the experiment. A semiotic landscape provides a way to illustrate how the semiotic symbols of the two robots are related. It illustrates the connections between referent/light source L, meaning M and forms such as sema, zigi, etc. The upper half of the graph shows the lexicon of robot r0, the lower half the lexicon of r1. The connections drawn indicate the relative occurrence frequencies (P) of referents, meanings and forms. The relations between referent and meaning are relative to the occurrence of the referent. The relations between meaning and form are relative to the occurrence of the form. Associations with an occurrence frequency of P < 0.005 are left out for clarity.

Ideally, the connections between referent–form–referent would be orthogonal. That is, the couplings between a referent and its form should not cross-connect with other form–referent relations. This orthogonality criterion is achieved for mety, luvu and possibly zigi. The word-forms kate and demi have cross-connections, but these are relatively unimportant because they occur with very low frequencies. More polysemy is found for sema and tyfo. As will be shown below, tyfo becomes well established to name L1 almost unambiguously. The form sema, however, introduces some instability in the system.
Fig. 8 shows various competition diagrams of robot r0, relating to referent L1 in one of the runs of the experiment. A competition diagram displays the relative co-occurrence frequency in time of, for instance, forms in relation to a referent (Steels & Kaplan, 1999). Fig. 8(a) shows the referent–form competition. This figure shows the successful co-occurrence of referent and form, where the occurrence of the form is calculated relative to the occurrence of the referent. Very infrequently occurring elements are left out for clarity. Fig. 8(a) shows that the form tyfo clearly wins the competition and is used nearly uniquely to name light source L1. Hence L1 has very little synonymy. Vice versa, the form–referent diagram shows that when the form tyfo is used, it is used mostly to name light source L1 (see Fig. 8(b)). This, however, happens after game 3000. Before this, the form tyfo shows quite some polysemy.
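How such a diagram can be computed is easy to sketch. Assuming the successful games have been logged as (game, referent, form) triples (the logging format is an assumption, not the paper's actual data structure), the referent–form curves of Fig. 8(a) amount to windowed relative frequencies:

```python
def referent_form_competition(games, referent, window=200):
    """For each window of games, the frequency of every form used for
    `referent`, relative to the referent's occurrence in that window."""
    for start in range(0, len(games), window):
        hits = [form for _, ref, form in games[start:start + window]
                if ref == referent]
        if hits:
            yield start, {form: hits.count(form) / len(hits)
                          for form in set(hits)}
```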
Fig. 8. Some competition diagrams of robot r0 in one run of the experiment. (a) A referent–form competition for light source L1. The y-axis shows the co-occurrence frequencies of form and referent relative to the occurrence of the referent over the past 200 guessing games. The x-axis shows the number of guessing games. (b) The form–referent competition for tyfo. Again the y-axis shows the co-occurrence frequencies, but now of the form and referent relative to the occurrence of the form. (c) The referent–meaning competition for L1 and (d) the form–meaning competition for tyfo.

Fig. 8(c) shows that, throughout the run, L1 is categorized with more than one meaning, of which two are used most frequently. A similar competition can be seen in Fig. 8(d), where various meanings compete to be the meaning of tyfo. Apparently, these competitions compensate each other such that competitions as in Fig. 8(a) and (b) emerge.
That the competition does not always run so smoothly is shown in Fig. 9. Here there are two forms strongly competing to name light source L2. In most cases, the forms are used to name only one referent in the end, as Fig. 9(b) shows. But sometimes this could fail. Nevertheless, the overall picture is that all referents are mostly named by only one form and all forms are mostly used to name only one referent. Such relations are much less frequently observed when investigating co-occurrences of referent and meaning or form and meaning.

5. Discussion

5.1. Meeting the limits

The results make clear that the robots construct a communication system that meets its limits. The communicative success is in the end nearly as high as the potential understandability.
Fig. 9. (a) A referent–form competition for light source L2 and (b) a form–referent competition for the form luvu.

Both the discriminative success and distinctiveness are very close to 1, and the specificity is also close to 1. So, when a robot uses a semiotic symbol successfully, it almost always refers to the same referent. This means that there is hardly any polysemy. The parsimony and consistency are somewhat lower than the distinctiveness and specificity. Hence, there are some one-to-many relations between referent and meaning and between referent and form in the system. The semiotic landscape (Fig. 7) already showed that most of the synonymy does not necessarily mean that communication is difficult. Usually, the hearer can rather easily interpret any speaker's utterance.
The landscape also shows that a one-to-many relationship between form and meaning does not necessarily mean polysemy. In fact, it is beneficial, since it cancels out the one-to-many mapping of referent to meaning to a great extent. Hence semiotic symbols may have different meanings. Referents are interpreted differently when observed under different circumstances. Yet the referents can be named invariantly to a high degree. One may argue that the robots use different semiotic symbols when they use different meanings, and sometimes this is even appropriate. However, when the semiotic symbols have the same form and relate to the same referent, attributing them to a single semiotic symbol can be useful, especially when the semiotic symbols are only used to name a referent. Therefore, one could also argue that the meaning of a semiotic symbol changes dynamically over time, depending on the situation the robots are in. Note, by the way, that the meanings also change dynamically over time by the prototype shift towards detected referents, cf. Eq. (4).
In what respect did the robots acquire meaningful symbols? In Section 2.3.2, meaning has been defined as a functional relation between form and referent. The robots do not acquire meaning in the sense that they use the communicated symbols to fulfill a 'life-task', nor is it rooted in the sense that the robots' bodies, their interaction and learning mechanisms are designed (see Ziemke, 1999). But given the robots' bodies, interaction mechanisms and learning mechanisms, the semiotic symbols are meaningful in that they are used by the robots to name referents. They could also be used to perform simple actions, such as pointing at the referent, as shown in (Vogt, 1998a) and in the Talking Heads experiment (Steels & Kaplan, 1999). However, because this pointing could not be used in the current experiments to evaluate feedback, it has not been implemented here. In more realistic experiments, the robots should use the communication to fulfill some task. Fulfilling tasks could then be used to evaluate the language game's success, as is the case in, e.g. (De Jong, 2000; Vogt, 2000a; Kaplan, 2000). Further research is currently in progress in which the robots use the lexicon to improve their learned capability to sustain their energy level in order to survive. This experiment combines the language game model with the viability experiments previously done at the AI Lab in Brussels, see, e.g., (Steels, 1996a; Birk, 1998). The meaning of the semiotic symbols that are constructed in such an experiment is then based on the robots' activity to remain viable.
But even then, as argued in, e.g., (Ziemke, 1999; Ziemke and Sharkey, 2001), the semiotic symbols will only be really meaningful to an agent when the agent is completely rooted by, for instance, evolution (Ziemke, 1999). Current studies in ALife focus on how robotic agents may evolve, for instance, their bodies (Lund, Hallam & Lee, 1997), their control architecture (Nolfi & Floreano, 2000), their sensors (Jung, Dauscher, & Uthmann, 2001) and their communication channels (Quinn, 2001). This research might help to explain how robots can become 'rooted' in their environment. For a broad discussion of these issues, see, e.g., (Ziemke & Sharkey, 2001).
In the current experiment the forms are transmitted as strings in the off-line processing on the PC. Previously, this has been done using radio communication (Steels & Vogt, 1997). However, in a more realistic setting, the transmission should occur in phonetic strings. In such a case, the phonetic strings are also physical objects and must be processed and categorized. Both processing and categorization could, for instance, be done in a similar way as modeled by De Boer (1997), see Section 3.1. Using De Boer's model, vowels, or even more complex utterances (De Boer, 2000b; Oudeyer, 2001), could be developed in a similar way as the lexicon is developed. One non-trivial problem remains to be solved. This problem has to do with distinguishing utterances as forms from physical objects. How this problem can be solved is unclear, but it may depend on the context setting, and for that the agents need to develop more sophisticated means of recognizing a context. But perhaps it could also be solved by the evolution of communication channels as, for instance, is being investigated in (Quinn, 2001).

5.2. The dynamics of the system

What can be said about the dynamics of the system? Fig. 6(d) showed that the communicative success shows a rapid increase during the first 1000 games, after which the rate of increase decreases. Furthermore, the figure shows a slower increase than, e.g., the discriminative success shown in Fig. 6(a). So, what happens in the beginning? In the first few hundred games, when the robots try to name the various referents, there is a lot of confusion about what the speaker intends to communicate. The hearer of a game adopts forms that may already exist for the robot, but that are not applicable in the current context (i.e. the meaning does not apply to any distinctive category). Consequently, a lot of variety is introduced in the lexicon. As a result of adapting the association scores, the robots will tend to select effective elements appropriately more often. This induces a repetitive cycle that strengthens the effective elements and meanwhile weakens the ineffective ones. In the beginning this adaptation is more flexible than later on, when the association scores have become stronger. When the association scores are strong, it is more difficult for other associations to influence the communicative success. This, however, becomes less important as the communicative success rises to a satisfactory degree.
The dynamics of the system allows an important conclusion to be drawn, namely that the selection criteria and the dynamics of the association scores cause a self-organizing convergence of the lexicon. The conclusion that this is an emergent property can, amongst others, be drawn from the fact that the lexicon dynamics is controlled at the form–meaning level and not at the form–referent level. That is, adaptations in the lexicon occur only at the form–meaning level. A referent is categorized differently in different situations (see, e.g., Fig. 8(c)). At the same time, these different categories may be named by only one name (or, more concretely, one name may have different meanings in different situations), as Fig. 8(d) depicts. Fig. 8(a) and (b) show that the one-to-many relations at the referent–meaning and form–meaning levels cancel each other out at the referent–form level in both directions. This happens despite the fact that when a form–meaning association is used successfully, the strengths of competing form–meaning associations are laterally inhibited. Although this lateral inhibition helps to decrease polysemy and synonymy at the referent–form level, it is also an antagonizing force at the form–meaning level when the meaning is used to stand for the same referent. This antagonizing force, however, is not problematic, due to the context dependence of the guessing games. Selected lexical elements must be applicable within the context of a particular game. This is a nice consequence of the pragmatic approach.
Furthermore, the feedback signals that operate at the form–referent level contribute largely to the convergence of the lexicon. It has been shown in (Vogt, 2000b) that leaving out the feedback, without alternative ways of knowing the topic, does not lead to convergence in this experimental setup. This does not mean, however, that leaving out the feedback does not work in general. It may well be that not using such feedback, or any other non-verbal means of exchanging topic knowledge, might lead to convergence in a richer environment, cf. the results of simulations reported in (Smith, 2001). This is currently being investigated. Another strategy that could be beneficial in such cases is to use a cooperative rather than a competitive selection and adaptation scheme, as demonstrated in (Kaplan, 2001).

5.3. The relation to other work

The experiment presented here is unique in its modeling of the development of a lexicon grounded in reality from scratch using mobile robots. Although the Talking Heads experiment (Belpaeme et al., 1998; Steels & Kaplan, 1999) also models lexicon development from scratch and is also a grounded experiment, the Talking Heads are immobile (they can only move pan-tilt from a fixed position). This immobility helps to evaluate the feedback on guessing games, because the Talking Heads use calibrated knowledge about their environment to evaluate feedback. In addition, the different sensings of a referent are more similar on different occasions than for the mobile robots, as the Talking Heads sense their environment from a fixed position. In controlled experiments, this allowed the Talking Heads to speed up the lexicon development (Steels & Kaplan, 1999). Nevertheless, the overall success on both platforms is more or less comparable.
Similar findings emerge when comparing the work of this paper with the work of Billard and her colleagues on mobile robots (Billard & Hayes, 1997; Billard & Dautenhahn, 1998). In Billard's experiments, a student robot learns a grounded lexicon about its environment by imitating a teacher robot that has been preprogrammed with the lexicon. The overall results are similar to the results presented here, although the lexicon acquisition is much faster in (Billard and Dautenhahn, 1998). The latter result is presumably due to the fact that one of Billard's robots already knows the lexicon, while in the experiments here none of the robots has been preprogrammed with the lexicon.
Although not situated and embodied, the simulations of Oliphant, too, are relevant. He showed that populations of agents could rather easily develop highly efficient lexicons from scratch by associating given meanings with given forms (Oliphant, 1999). Also relevant is the work of De Jong, who experimented with language games of which the meanings were grounded in simulations (De Jong, 2000). In addition, De Jong's agents tried to improve their (task-oriented) behavior using the developed lexicon. Both Oliphant and De Jong showed that agents could develop a coherent lexicon without using feedback on the effect of the game, provided the agents have access to both the form and the meaning during a language game. Although in this paper the robots did use such feedback, robotic experiments have confirmed Oliphant and De Jong's results (Vogt, 2000b, 2001). In these experiments both robots had access to both the form and the topic by means of 'pointing', so that the hearer knows the topic in advance.

6. Conclusions

In order to overcome fundamental problems that exist in the cognitivist approach towards cognition, and to allow describing cognition in terms of symbols within the paradigm of embodied cognition, this paper proposes an alternative definition of symbols. This definition is not novel, but is adopted from Peirce's definition of symbols as the relation between a form, a meaning and a referent. The relation as such is not meaningful, but arises from its active construction and use. This process is called semiosis and has been modeled in robotic agents through adaptive language games.
As a result of the semiotic definition of symbols, it could be argued that the symbols are by definition grounded, because semiotic symbols have intrinsic meaning in relation to the real world (cf. Lakoff, 1987). Hence, the symbol grounding problem is no longer a fundamental problem, since a semiotic symbol is a relation between a form, meaning and referent, and the way this relation is formed and used specifies its meaning.
There remains, however, the problem of constructing the semiotic symbols, but this problem can be viewed as a technical problem. This problem is called the physical symbol grounding problem.
The experiment reported shows how robotic agents can develop a structured set of semiotic symbols which they can use to name various real world objects invariantly. The semiotic symbols are constructed through the robots' interactions with their environment (including themselves), individual adaptations and self-organization. These three mechanisms, hypothesized by Luc Steels (1996a), are based on the core principles of embodied cognitive science: embodiment, situatedness and emergence. The semiotic symbols are structural couplings that are formed through an agent's interaction with the real world, as proposed for instance by Clancey (1997).
An important result that the experiment reveals is that semiotic symbols need not be categorized the same under different circumstances. As the semiotic landscape shows, there is no one-to-one-to-one relation between a referent, meaning and form; this relation is rather one-to-many-to-one. In different situations, the robots detect the referents differently. Yet they are able to identify them invariantly at the form level. In the process of arriving at such invariant identification, which is the most important aspect of symbol grounding (Harnad, 1990), the co-evolution of form and meaning proves to be extremely important.
The experiment shows that the physical symbol grounding problem can be solved in the simple experimental setup, given the language game model, the designed robots and under the assumption that feedback can be established by, for instance, using pointing. These given assumptions mean that the physical symbol grounding problem is not entirely solved, because for this the language game model, the robots and other assumptions should be rooted by, e.g., evolution (Ziemke, 1999; Ziemke & Sharkey, 2001). The experiment nevertheless illustrates how semiotic symbols can be constructed and used, and is thus an important step towards solving the physical symbol grounding problem, or at least in our understanding of cognition.
Although the guessing game works well in the current experimental setup, it should be realized that this setup is rather simplistic. Future work should confirm the scalability of the model in more realistic and more complex environments using more complex robots. Another improvement that is currently under investigation is that the communication system is used to perform concrete 'life-tasks' rather than just using and developing a lexicon. This would make the approach more realistic, since in natural systems communication is usually used to guide (task-oriented) behavior, such as coordinating each other's actions.

Acknowledgements

The author wishes to thank Ida Sprinkhuizen-Kuyper, Edwin De Jong and Eric Postma for proofreading earlier versions of this paper. Erik Myin, Tom Ziemke, Georg Dorffner and two anonymous reviewers are thanked for various useful comments that helped improve this paper a lot.

References

ACL (1997). Proceedings of the fifth conference on applied natural language processing, Washington, DC. Menlo Park: Association for Computational Linguistics.
Arkin, R. C. (1998). Behavior-based robotics. Cambridge, MA: MIT Press.
Batali, J. (1998). Computational simulations of the emergence of grammar. In Hurford, J. R., Studdert-Kennedy, M., & Knight, C. (Eds.), Approaches to the evolution of language. Cambridge, UK: Cambridge University Press.
Belpaeme, T. (1999). Evolution of visual feature detectors. In Evolutionary computation in image analysis and signal processing and telecommunications: first European workshops, EvoIASP99 and EuroEcTel99 joint proceedings, Göteborg, Sweden, LNCS 1596. Berlin: Springer.
Belpaeme, T. (2001). Simulating the formation of color categories. In Proceedings of the international joint conference on artificial intelligence 2001 (IJCAI'01), Seattle, WA.
Belpaeme, T., Steels, L., & van Looveren, J. (1998). The construction and acquisition of visual categories. In Birk, A., & Demiris, J. (Eds.), Learning robots: proceedings of the EWLR-6, Lecture notes on artificial intelligence 1545. Berlin: Springer.
Bickerton, D. (1998). Catastrophic evolution: the case for a single step from protolanguage to full human language. In Hurford, J., Knight, C., & Studdert-Kennedy, M. (Eds.), Approaches to the evolution of language. Cambridge: Cambridge University Press, pp. 341–358.
Billard, A., & Dautenhahn, K. (1998). Grounding communication in autonomous robots: an experimental study. Robotics and Autonomous Systems, 24 (1–2), 71–79.
Billard, A., & Hayes, G. (1997). Robot's first steps, robot's first words. In Sorace, P., & Heycock, S. (Eds.), Proceedings of the GALA '97 conference on language acquisition, Edinburgh. University of Edinburgh: Human Communication Research Centre.
Birk, A. (1998). Robot learning and self-sufficiency: What the energy-level can tell us about a robot's performance. In Birk, A., & Demiris, J. (Eds.), Learning robots: proceedings of EWLR-6, Lecture notes on artificial intelligence 1545. Berlin: Springer, pp. 109–125.
Boden, M. A. (1996). The philosophy of artificial life. Oxford: Oxford University Press.
Braitenberg, V. (1984). Vehicles, experiments in synthetic psychology. Cambridge, MA: MIT Press.
Brooks, R. A. (1990). Elephants don't play chess. Robotics and Autonomous Systems, 6, 3–15.
Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence, 47, 139–159.
Brooks, R. A., Breazeal-Ferrell, C., Irie, R., Kemp, C. C., Marjanović, M., Scassellati, B., & Williamson, M. M. (1998). Alternative essences of intelligence. In Proceedings of the fifteenth national conference on artificial intelligence. Menlo Park, CA: AAAI Press.
Chandler, D. (1994). Semiotics for beginners, http://www.aber.ac.uk/media/Documents/S4B/semiotic.html.
Chomsky, N. (1980). Rules and representations. The Behavioral and Brain Sciences, 3, 1–61.
Chomsky, N., & Halle, M. (1968). The sound pattern of English. Cambridge, MA: MIT Press.
Churchland, P. M. (1989). A neurocomputational perspective. Cambridge, MA: MIT Press.
Clancey, W. J. (1997). Situated cognition. Cambridge, UK: Cambridge University Press.
De Boer, B. (1997). Generating vowels in a population of agents. In Husbands, C., & Harvey, I. (Eds.), Proceedings of the fourth European conference on artificial life. Cambridge, MA: MIT Press, pp. 503–510.
De Boer, B. (2000a). Emergence of vowel systems through self-organisation. AI Communications, 13, 27–39.
De Boer, B. (2000b). Imitation games for complex utterances. In Van den Bosch, A., & Weigand, H. (Eds.), Proceedings of the Belgian–Netherlands artificial intelligence conference, pp. 173–182.
De Jong, E. D. (2000). The development of communication. PhD thesis, Vrije Universiteit, Brussels.
De Jong, E. D., & Steels, L. (1999). Generation and selection of sensory channels. In Evolutionary computation in image analysis and signal processing and telecommunications: first European workshops, EvoIASP99 and EuroEcTel99 joint proceedings, Göteborg, Sweden, LNCS 1596. Berlin: Springer.
De Jong, E. D., & Vogt, P. (1998). How should a robot discriminate between objects. In Pfeifer, R., Blumberg, B., Meyer, J.-A., & Wilson, S. (Eds.), From animals to animats, Proceedings of the fifth international conference on simulation of adaptive behavior, vol. 5. Cambridge, MA: MIT Press.
De Saussure, F. (1974). Course in general linguistics. New York: Fontana.
Dorffner, G. (1992). Taxonomies and part-whole hierarchies in the acquisition of word meaning – a connectionist model. In Proceedings of the 14th annual conference of the cognitive science society. Hillsdale, NJ: Lawrence Erlbaum, pp. 803–808.
Dorffner, G., Prem, E., & Trost, H. (1993). Words, symbols, and symbol grounding, Technical report TR-93-30. Wien: Oesterreichisches Forschungsinstitut fuer Artificial Intelligence. Available on-line at http://www.ai.univie.ac.at/papers/oefai-tr-93-30.ps.gz.
Eco, U. (1976). A theory of semiotics. Bloomington, IN: Indiana University Press.
Edelman, G. M. (1987). Neural Darwinism. New York: Basic Books.
Fu, K. S. (Ed.), (1976). Digital pattern recognition. Berlin: Springer.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston, MA: Houghton-Mifflin.
Harnad, S. (1990). The symbol grounding problem. Physica D, 42, 335–346.
Harnad, S. (1993). Symbol grounding is an empirical problem: neural nets are just a candidate component. In Proceedings of the fifteenth annual meeting of the cognitive science society. Hillsdale, NJ: Lawrence Erlbaum.
Johnson, M. H. (1997). Developmental cognitive science. Oxford: Blackwell.
Jung, T., Dauscher, P., & Uthmann, T. (2001). Some effects of individual learning on the evolution of sensors. In Kelemen, J., & Sosík, P. (Eds.), Proceedings of the 6th European conference on artificial life, ECAL 2001, LNAI 2159. Berlin: Springer, pp. 432–435.
Kaplan, F. (2000). Talking aibo: First experimentation of verbal interactions with an autonomous four-legged robot. In Nijholt, A., Heylen, D., & Jokinen, K. (Eds.), Learning to behave: interacting agents. CELE-TWENTE workshop on language technology.
Kaplan, F. (2001). La naissance d'une langue chez les robots. Paris: Hermes Science.
Kirby, S., & Hurford, J. (1997). Learning, culture and evolution in the origin of linguistic constraints. In Husbands, C., & Harvey, I. (Eds.), Proceedings of the fourth European conference on artificial life. Cambridge, MA: MIT Press.
Lakoff, G. (1987). Women, fire and dangerous things. Chicago: The University of Chicago Press.
Lund, H. H., Hallam, J., & Lee, W. (1997). Evolving robot morphology. In Proceedings of the IEEE fourth international conference on evolutionary computation. IEEE Press.
MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp. 281–297.
Maturana, H. R., & Varela, F. R. (1992). The tree of knowledge: the biological roots of human understanding. Boston: Shambhala.
Maynard-Smith, J., & Szathmáry, E. (1995). The major transitions in evolution. Oxford: W.H. Freeman.
McCarthy, J., & Hayes, P. J. (1969). Some philosophical problems from the standpoint of artificial intelligence. Machine Intelligence, 4, 463–502.
Newell, A. (1980). Physical symbol systems. Cognitive Science, 4, 135–183.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.
Newell, A., & Simon, H. A. (1976). Computer science as empirical inquiry: Symbols and search. Communications of the ACM, 19, 113–126.
Noble, J. (2000). Talk is cheap: evolved strategies for communication and action in asymmetrical animal contests. In Meyer, J.-A., Berthoz, A., Floreano, D., Roitblat, H., & Wilson, S. W. (Eds.), From animals to animats, Proceedings of the sixth international conference on simulation of adaptive behavior, vol. 6. Cambridge, MA: MIT Press, pp. 481–490.
Nolfi, S., & Floreano, F. (2000). Evolutionary robotics: the biology, intelligence, and technology of self-organizing machines. Cambridge, MA: MIT Press.
Ogden, C. K., & Richards, I. A. (1923). The meaning of meaning: a study of the influence of language upon thought and of the science of symbolism. London: Routledge & Kegan Paul.
Oliphant, M. (1999). The learning barrier: Moving from innate to learned systems of communication. Adaptive Behavior, 7 (3–4), 371–384.
Oudeyer, P.-Y. (2001). The origins of syllable systems: an operational model. In Proceedings of the international conference on cognitive science, COGSCI'2001, Edinburgh.
Peirce, C. S. (1931–1958). Collected papers, vol. I–VIII. Cambridge, MA: Harvard University Press.
Pfeifer, R., & Scheier, C. (1999). Understanding intelligence. Cambridge, MA: MIT Press.
Pinker, S., & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13, 707–789.
Prem, E. (1995). Symbol grounding and transcendental logic. In Niklasson, L. F., & Bodén, M. B. (Eds.), Current trends in connectionism, Proceedings of the Swedish conference on connectionism. Hillsdale, NJ: Lawrence Erlbaum, pp. 271–282.
Prigogine, I., & Strengers, I. (1984). Order out of chaos. New York: Bantam Books.
Pylyshyn, Z. W. (Ed.), (1987). The robot's dilemma. New Jersey: Ablex Press.
Quinn, M. (2001). Evolving communication without dedicated communication channels. In Kelemen, J., & Sosík, P. (Eds.), Proceedings of the 6th European conference on artificial life, ECAL 2001, LNAI 2159. Berlin: Springer, pp. 357–366.
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382–439.
Roy, D. (2000). A computational model of word learning from multimodal sensory input. In Proceedings of the international conference of cognitive modeling, Groningen, The Netherlands.
Searle, J. R. (1980). Minds, brains and programs. Behavioral and Brain Sciences, 3, 417–457.
Shannon, C. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423; 623–656.
Smith, A. D. M. (2001). Establishing communication systems without explicit meaning transmission. In Kelemen, J., & Sosík, P. (Eds.), Proceedings of the 6th European conference on artificial life, ECAL 2001, LNAI 2159. Berlin: Springer, pp. 381–390.
Steels, L. (1996a). Discovering the competitors. Adaptive Behavior, 4 (2), 173–199.
Steels, L. (1996b). Emergent adaptive lexicons. In Maes, P. (Ed.), From animals to animats, Proceedings of the fourth international conference on simulating adaptive behavior, vol. 4. Cambridge, MA: MIT Press.
Steels, L. (1996c). Perceptually grounded meaning creation. In Tokoro, M. (Ed.), Proceedings of the international conference on multi-agent systems. Menlo Park, CA: AAAI Press.
Steels, L. (1997a). Synthesising the origins of language and meaning using co-evolution, self-organisation and level formation. In Hurford, J., Knight, C., & Studdert-Kennedy, M. (Eds.), Approaches to the evolution of language. Cambridge: Cambridge University Press.
Steels, L. (1997b). The synthetic modeling of language origins. Evolution of Communication, 1 (1), 1–34.
Steels, L., & Brooks, R. (Eds.), (1995). The 'artificial life' route to 'artificial intelligence'. Building situated embodied agents. New Haven, CT: Lawrence Erlbaum.
Steels, L., & Kaplan, F. (1998). Stochasticity as a source of innovation in language games. In Proceedings of Alive VI.
Steels, L., & Kaplan, F. (1999). Situated grounded word semantics. In Proceedings of IJCAI 99. San Mateo, CA: Morgan Kaufmann.
Steels, L., & McIntyre, A. (1999). Spatially distributed naming games. Advances in Complex Systems, 1 (4), 1.
Steels, L., & Vogt, P. (1997). Grounding adaptive language games in robotic agents. In Husbands, C., & Harvey, I. (Eds.), Proceedings of the fourth European conference on artificial life. Cambridge, MA: MIT Press.
Sugita, Y., & Tani, J. (2000). A connectionist model which unifies the behavioral and the linguistic processes: Results from robot learning experiments, Technical report SCSL-TR-00-001. Sony CSL.
Sun, R. (2000). Symbol grounding: A new look at an old idea. Philosophical Psychology, 13 (2), 149–172.
Tomasello, M., & Barton, M. (1994). Learning words in nonostensive contexts. Developmental Psychology, 30 (5), 639–650.
Van Looveren, J. (1999). Multiple word naming games. In Postma, E., & Gyssens, M. (Eds.), Proceedings of the eleventh Belgium–Netherlands conference on artificial intelligence. University of Maastricht.
Vogt, P. (1998a). The evolution of a lexicon and meaning in robotic agents through self-organization. In La Poutré, H., & van den Herik, J. (Eds.), Proceedings of the Netherlands–Belgium artificial intelligence conference, Amsterdam. Amsterdam: CWI.
Vogt, P. (1998b). Perceptual grounding in robots. In Birk, A., & Demiris, J. (Eds.), Learning robots: proceedings of the EWLR-6, Lecture notes on artificial intelligence 1545. Berlin: Springer.
Vogt, P. (2000a). Grounding language about actions: Mobile robots playing follow me games. In Meyer, J.-A., Berthoz, A., Floreano, D., Roitblat, H., & Wilson, S. (Eds.), SAB2000 proceedings supplement book. Honolulu: International Society for Adaptive Behavior.
Vogt, P. (2000b). Lexicon grounding on mobile robots. PhD thesis, Vrije Universiteit, Brussels.
Vogt, P. (2001). The impact of non-verbal communication on lexicon formation. In Proceedings of the Belgian/Netherlands artificial intelligence conference, BNAIC'01.
Whorf, B. L. (1956). Language, thought, and reality. Cambridge, MA: MIT Press.
Wittgenstein, L. (1958). Philosophical investigations. Oxford: Blackwell.
Yanco, H., & Stein, L. (1993). An adaptive communication protocol for cooperating mobile robots. In Meyer, J.-A., Roitblat, H. L., & Wilson, S. (Eds.), From animals to animats, Proceedings of the second international conference on simulation of adaptive behavior, vol. 2. Cambridge, MA: MIT Press, pp. 478–485.
Ziemke, T. (1999). Rethinking grounding. In Riegler, A., Peschl, M., & von Stein, A. (Eds.), Understanding representation in the cognitive sciences: does representation need reality. New York: Plenum Press.
Ziemke, T., & Sharkey, N. E. (2001). A stroll through the worlds of robots and animals: Applying Jakob von Uexküll's theory of meaning to adaptive robots and artificial life. Semiotica, 134 (1–4), 701–746.
