Levinson Holler 2014
All content following this page was uploaded by Judith Holler on 12 August 2014.
Opinion piece

Cite this article: Levinson SC, Holler J. 2014 The origin of human multi-modal communication. Phil. Trans. R. Soc. B 369: 20130302. http://dx.doi.org/10.1098/rstb.2013.0302

One contribution of 12 to a Theme Issue ‘Language as a multimodal phenomenon: implications for language learning, processing and evolution’.

Subject Areas: behaviour, cognition, evolution

Keywords: language evolution, intentional vocalization, gesture, multi-modal communication, deixis, iconicity

Author for correspondence: Stephen C. Levinson, e-mail: stephen.levinson@mpi.nl

One reason for the apparent gulf between animal and human communication systems is that the focus has been on the presence or the absence of language as a complex expressive system built on speech. But language normally occurs embedded within an interactional exchange of multi-modal signals. If this larger perspective takes central focus, then it becomes apparent that human communication has a layered structure, where the layers may be plausibly assigned different phylogenetic and evolutionary origins, especially in the light of recent thoughts on the emergence of voluntary breathing and spoken language. This perspective helps us to appreciate the different roles that the different modalities play in human communication, as well as how they function as one integrated system despite their different roles and origins. It also offers possibilities for reconciling the ‘gesture-first hypothesis’ with that of gesture and speech having evolved together, hand in hand (or hand in mouth, rather) as one system.

1. Introduction
Human communication is unusual in the animal world on at least two principal counts: it has an unrivalled complexity and expressivity on the one hand, and an unparalleled inter-group variation on the other. The combination is extraordinary, because the variation within an unusually genetically homogeneous species excludes a fully biological explanation. In this paper, we take the view that human communication is evolutionarily stratified, composed of layers of abilities of different types and different antiquity. A wide range of scholars of different perspectives seem to subscribe to such a general view (e.g. [1,2]). But in this paper we suggest that viewing language as embedded in its full pragmatic, interactive and multi-modal context transforms this stratificational perspective. Unpeeling the layers can then help us see the different contributions of the distinct systems that underlie the peculiarities of human communication.

It is often hard for the literate world to remember that the core ecology for language use is in face-to-face interaction: this is the niche in which languages are learnt and where the great bulk of language use occurs. In this niche, language production always occurs with the involvement of not only the vocal tract and lungs, but also the trunk, the head, the face, the eyes and, normally, the hands. Our upright posture allows the whole ventral surface of the body to be used in communication. The speaker produces a multi-modal display, part semiotic, part entrained by the effort of vocal production. When thinking about the origins of human language, it is essential to bear this ensemble of linked systems in mind. The ease with which human language switches the main channel or modality carrying lexical material from mouth to hands, as in the sign languages of the deaf, should be a constant reminder that human communication is a system of systems, where the burden of information can be shifted from one part to another (see also [3–6]).
© 2014 The Author(s) Published by the Royal Society. All rights reserved.
Downloaded from rstb.royalsocietypublishing.org on August 5, 2014
[...] followed, perhaps with a 100 000 year lag, by the emergence of anatomically modern humans about 200 000 years ago. Clearly, however, it must predate the great diaspora of modern humans thought to date to 60 000 years ago. In a recent meta-study surveying all the genetic, anatomical and archaeological evidence, Dediu & Levinson [7] argue that a compelling case can be made for a much earlier origin of modern vocal language at over half a million years ago, dating back to the common ancestor (often identified as Homo heidelbergensis) of modern humans and Neandertals (see [8] for additional evidence supporting the idea of an early origin of modern human language). The development [...]

[...] other capacities. Indeed, careful observation of human interaction reveals the very same process in constant operation: if I reach as if to get the water, you pass it to me. Much of the facility with which we interact and cooperate depends on precisely this kind of nonce gesture [33].

Researchers in linguistic pragmatics have long held that language is the tip of an iceberg riding on a deep infrastructure of communicational abilities [34,35]. Simple utterances are rarely interpreted at face value: thus ‘Are you using that pencil?’ is likely to be read as a request, ‘Do you want another beer?’ as an offer, ‘What are you doing tonight?’ as a prelude to an invitation, and so on. Grice [36] emphasized the [...]
[...] ways: a request (perhaps visual) may prompt a visible action, a wave another wave back, a passing of a needed item a reciprocal grasping and so forth. They also appear to have a largely universal structure, and recent work suggests that precursors of some of these very same visual, bodily action sequences can be seen in apes [30,55]. Sequences can be embedded in other sequences, so offering recursive structures that may be the ultimate origin of recursion in language syntax [40].

Human interactional ethology, despite the presence of precursor elements in other primate species, as a whole ensemble is entirely distinctive. For example, the toleration or indeed expectation of mutual gaze is of paramount importance [...]

[...] persons without verbal reference at all [75]. Village sign languages are systems that have arisen de novo over four or more generations in remote areas where there are significant pockets of inherited deafness; these typically make extensive use of pointing, to the extent, for example, where the language has no need for place-names [76].

While pointing has a strong symbolic element with its arbitrary and species-specific use of the index finger, many of the details of pointing are indexical: for example, the varied hand shapes, orientations and elevations of the arm are adapted to the relevant aspects of the surroundings, as when the distance of an unseen referent is indicated with [...]
The ability of the gestural modality to depict spatial relations has implications beyond the spatial domain, for iconic gestures and signs are well suited to depicting transitivity, and thus agents and patients [87,88]. In the evolution of language, this facility to indicate space must have given gesture a special importance (for a detailed discussion of the role of the visual modality in the emergence of complex syntax, see [2]). In current human communication, gestures depicting spatial relations still carry a great deal of the communicative import of informal conversation.

All this suggests a long phylogenetic acquisition of layers of this system. At the deepest stratum, one may assume a [...]

[...] hypothesis is that, given its language independence, the interaction engine is phylogenetically older than language, and perhaps characterized the communication of early Homo before complex speech evolved. This system is itself no doubt phylogenetically layered, and one can speculate that the more ethological elements (e.g. mutual gaze, gesture and turn-taking) may have partially preceded and driven the underlying cognitive capacity for evaluating other minds, since there are plausible precursors in the primate lineage (e.g. gestures among the great apes, and turn-taking in other primate clades). The development of this system may in turn have been pushed by ecological changes to which increased [...]
time-scale  taxa                      ritualized  ‘interaction  pointing  iconic gestural   voluntary vocal  modern language
                                      gestures    engine’                 representations   utterances       capacities
6 m+        Hominidae                 +
2 m         early Homo                +           +             +         +
1–0.6 m     immediate ancestor of     +           +             +         +                 +
            H. heidelbergensis
0.2         H. sapiens (incl. [...]   +           +             +         +                 +                +
[...] broadcast, communication without sight and so on. Thus, a multi-modal communication system that combines both gesture and speech, and thus their complementary strengths and weaknesses, seems to meet human communication needs rather optimally.

The proposed sequential accumulation of layers of a communication system maps roughly onto the stages we observe during the development of communication in human children: proto-gestures deriving from ritualized action sequences such as stretching one’s arms up in the air in order to be picked up occur very early (albeit in an already more generalized form than with young apes [96]) and act as a form of pre-linguistic turn-taking, followed by pointing, followed by both kinds of brachio-manual action becoming integrated with speech to function as co-speech deictic and co-speech iconic (or other representational) gestures.

The details of this development are telling. For example, turn-taking before language can be quite fast, with the infant responding in under three-quarters of a second [97], approaching adult norms. But later, when children are trying to respond with more complex language at say 3 years of age, the response times can be twice as slow [98], converging with adult norms only in middle childhood. This suggests that the natural rhythm of conversation is independent of spoken language, and children have to gradually learn to compress complex material into the short rapid bursts of speech that adults use. The adult turn-taking speed puts extraordinary pressure on language production and comprehension: since it takes between 600 and 1500 ms to plan a response, and the gaps between turns are only on average 200 ms, this forces those engaged in dialogue to be already planning responses long before the other speaker has completed his or her utterance. Comprehension and production must thus work in parallel, with the next speaker predicting the ongoing turn by the other, in order to achieve precisely timed behavioural alternation. The whole system suggests an evolution from an original rapid exchange of very simple gestural or vocal material, into a system where the complexity of the linguistic and gestural material that is crammed into these short bursts has grown to the very limits that human cognition can process.

3. One multi-modal communication system or separate systems?
The evidence is that, despite the modern human communication system having evolved in layers (see [22,90] for one account of how the communication burden may gradually have shifted from hand to mouth), what results is one integrated multi-modal communication system, as suggested by many details of the whole assemblage. Evidence for this hypothesis is plentiful. For example, hand and mouth are closely connected in the somatotopic organization of the human motor cortex (e.g. [99,100]), and a very similar connection is also evident in the monkey motor cortex [101]. The hand–mouth connection is further evidenced by overt human behaviour such as drawing or cutting something, which is frequently accompanied by intricate movements of the tongue, lips or jaws. And although the hands are the major articulators in sign languages, the mouth and face are always also involved. Neuroanatomical asymmetries in the brains of non-human primates and the lateralization of both their vocal and their gestural communicative signals [102–104] further corroborate the notion of an early evolutionary link between hand and mouth (but see [105]). In addition, congenitally blind individuals gesture while they speak despite never having seen a single gesture [106,107]. And further evidence consists of the fact that neurons coding for manual goal-directed transitive movements occupy areas in the monkey brain that correspond to brain areas critical for processing language in the human brain: the putative mirror neuron system [108–110].

It may then be that there was pre-adaptation for an integrated multi-modal communication system based on a close marriage between hands and mouth, which was only fully exploited when the changes in cortical organization occurred that made voluntary breathing and intentional spoken communication possible. Given the large time-scales we are envisaging, the gradual co-evolution of vocal language with a pre-existing gestural mode of communication may have taken place over nearly a million years, so that the different modalities are deeply intertwined. This view may therefore not be as diametrically opposed as it seems to McNeill’s [111] proposal that speech and gesture coevolved from the beginning. In the stratificational model we are proposing, our cousins the great apes suggest an early reliance on the gestural modality, but co-occurring simple vocalizations may rapidly have emerged as a way of drawing attention to the signals. Indeed, it is possible that multimodality is actually present in our great ape cousins and thus in the common ancestor, since the extent to which their gestural signals co-occur with vocalizations is still a largely unexplored domain. In any case, from those initial multi-modal seeds our fully-fledged multi-modal communication system
would have evolved in the layered manner suggested. During the course of this evolution, the communicational burden has progressively but only partially shifted from hand to mouth, but sign languages offer a constant reminder that the roots of the system are ‘hand + mouth’ and where the motivation exists the burden can be readily shifted back.

Crucially, this perspective is at odds with the view that gesture was a ‘bridging modality’ that withered away once conventionalized spoken language had emerged (e.g. [2,21,109], see also [111]). Such a view reduces the role of gesture to a scaffolding function for the evolution and ontogenetic development of speech, with speech, once fully [...]

[...] development, the emergence of gesture and speech is intricately connected as evidenced by the two modalities’ parallel developmental trajectories and patterns [5,118]. This connection is maintained in adulthood and evident in the precise temporal, semantic and pragmatic relationship of speech and gesture [5], and further supported by the finding that semantic information conveyed by speech and gesture is processed in the same brain areas [119–121].

Despite the remarkable flexibility that allows us to shift between visual and verbal modes of communication as needs require, the neglected modality is rarely fully repressed. Thus, deaf signers mouth and even vocalize, while speakers [...]
References
1. Hauser MD, Chomsky N, Fitch WT. 2002 The faculty of language: what is it, who has it, and how did it evolve? Science 298, 1569–1579. (doi:10.1126/science.298.5598.1569)
2. Tomasello M. 2008 Origins of human communication. Cambridge, UK: MIT Press.
3. Clark HH. 1996 Using language. Cambridge, UK: Cambridge University Press.
4. Goldin-Meadow S, McNeill D, Singleton J. 1996 Silence is liberating: removing the handcuffs on grammatical expression in the manual modality. Psychol. Rev. 103, 34–55. (doi:10.1037/0033-295X.103.1.34)
5. McNeill D. 1992 Hand and mind: what gestures reveal about thought. Chicago, IL: Chicago University Press.
6. Kendon A. 2004 Gesture: visible action as utterance. Cambridge, UK: Cambridge University Press.
7. Dediu D, Levinson SC. 2013 On the antiquity of language: the reinterpretation of Neandertal linguistic capacities and its consequences. Front. Psychol. 4, 397. (doi:10.3389/fpsyg.2013.00397)
8. Schepartz LA. 1993 Language and modern human origins. Yearb. Phys. Anthropol. 36, 91–126. (doi:10.1002/ajpa.1330360607)
9. Jürgens U. 2002 Neural pathways underlying vocal control. Neurosci. Biobehav. Rev. 26, 235–258. (doi:10.1016/S0149-7634(01)00068-9)
10. Simonyan K, Horwitz B. 2011 Laryngeal motor cortex and control of speech in humans. Neuroscientist 17, 197–208. (doi:10.1177/1073858410386727)
11. MacLarnon A, Hewitt G. 2004 Increased breathing control: another factor in the evolution of human language. Evol. Anthropol. 13, 181–97. (doi:10.1002/evan.20032)
12. Fitch WT. 2009 Fossil cues to the evolution of speech. In The cradle of language (eds RP Botha, C Knight), pp. 112–134. Oxford, UK: Oxford University Press.
13. Bramble DM, Lieberman DE. 2004 Endurance running and the evolution of Homo. Nature 432, [...]
[...]
30. Rossano F. 2013 Gaze in conversation. In The handbook of conversation analysis (eds T Stivers, J Sidnell), pp. 308–29. Chichester, UK: Wiley-Blackwell.
31. Thorpe WH. 1966 Ritualization in ontogeny. I. Animal play. Phil. Trans. R. Soc. Lond. B 251, 311–319. (doi:10.1098/rstb.1966.0015)
32. Scott-Phillips TC, Blythe RA, Gardner A, West SA. 2012 How do communication systems emerge? Proc. R. Soc. B 279, 1943–1949. (doi:10.1098/rspb.2011.2181)
33. Clark HH. 2005 Coordinating with each other in a [...]
[...]
[...] edwardsi)? Am. J. Phys. Anthropol. 139, 523–532. (doi:10.1002/ajpa.21017)
49. Snowdon CT, Cleveland J. 1984 ‘Conversations’ among pygmy marmosets. Am. J. Primatol. 7, 15–20. (doi:10.1002/ajp.1350070104)
50. Takahashi DY, Narayanan DZ, Ghazanfar AA. 2013 Coupled oscillator dynamics of vocal turn-taking in monkeys. Curr. Biol. 23, 2162–2168. (doi:10.1016/j.cub.2013.09.005)
51. Lemasson A, Glas L, Barbu S, Lacroix A, Guilloux M, Remeuf K, Koda H. 2011 Youngsters do not pay attention to conversational rules: is this so for [...]
[...]
[...] human communication. Cogn. Sci. 36, 698–713. (doi:10.1111/j.1551-6709.2011.01228.x)
65. Carpendale JI, Carpendale AB. 2010 The development of pointing: from personal directedness to interpersonal direction. Hum. Dev. 53, 110–126. (doi:10.1159/000315168)
66. Pika S, Liebal K, Tomasello M. 2005 Gestural communication in subadult bonobos (Pan paniscus): repertoire and use. Am. J. Primatol. 65, 39–61. (doi:10.1002/ajp.20096)
67. Hobaiter C, Leavens DA, Byrne RW. 2013 Deictic gesturing in wild chimpanzees (Pan troglodytes)? [...]
[...]
81. Hinton L, Nichols J, Ohala J (eds) 1994 Sound symbolism. Cambridge, UK: Cambridge University Press.
82. Dingemanse M. In press. Expressiveness and system integration. On the typology of ideophones, with special reference to Siwu. STUF - Language Typology and Universals.
83. Haiman J. 1985 Natural syntax: iconicity and erosion. Cambridge, UK: Cambridge University Press.
84. McNeill D. 1985 So you think gestures are nonverbal? Psychol. Rev. 92, 350–371. (doi:10.1037/0033-295X.92.3.350)
[...]
99. Aflalo TN, Graziano MS. 2006 Possible origins of the complex topographic organization of motor cortex: reduction of a multidimensional space onto a two-dimensional array. J. Neurosci. 26, 6288–6297. (doi:10.1523/JNEUROSCI.0768-06.2006)
100. Meier JD, Aflalo TN, Kastner S, Graziano MS. 2008 Complex organization of human primary motor cortex: a high-resolution fMRI study. J. Neurophysiol. 100, 1800–1812. (doi:10.1152/jn.90531.2008)
101. Graziano MS, Aflalo TN. 2007 Mapping behavioral repertoire onto the cortex. Neuron 56, 239–251. [...]
[...]
115. McNeill D. 2005 Gesture and thought. Chicago, IL: University of Chicago Press.
116. Kendon A. 1985 Some uses of gesture. In Perspectives on silence (eds D Tannen, M Saville-Troike), pp. 215–234. Norwood, NJ: Ablex.
117. Gershkoff-Stowe L, Goldin-Meadow S. 2002 Is there a natural order for expressing semantic relations? Cogn. Psychol. 45, 375–412. (doi:10.1016/S0010-0285(02)00502-9)
118. Goldin-Meadow S. 2003 Hearing gesture: how our hands help us think. Cambridge, MA: Harvard University Press.
[...] and the human mirror system. Brain Lang. 101, 260–277. (doi:10.1016/j.bandl.2007.02.008)
120. Willems RM, Hagoort P. 2007 Neural evidence for the interplay between language, gesture, and action: a review. Brain Lang. 101, 278–289. (doi:10.1016/j.bandl.2007.03.004)
121. Willems RM, Özyürek A, Hagoort P. 2007 When language meets action: the neural integration of gesture and speech. Cerebral Cortex 17, 2322–2333. (doi:10.1093/cercor/bhl141)
122. Bavelas JB, Gerwing J, Sutton C, Prevost D. 2008 Gesturing on the telephone: independent effects of [...]
123. Holler J, Tutton M, Wilkin K. 2011 Co-speech gestures in the process of meaning coordination. In Proc. 2nd GESPIN - Gesture and Speech in Interaction Conf., Bielefeld, 5–7 September 2011. See http://hdl.handle.net/11858/00-001M-0000-0012-1BB3-D.
124. Alibali MW, Heath DC, Myers HJ. 2001 Effects of visibility between speaker and listener on gesture production: some gestures are meant to be seen. J. Mem. Lang. 44, 169–188. (doi:10.1006/jmla.2000.2752)
125. Jacobs N, Garnham A. 2007 The role of conversational hand gestures in a narrative task. [...]