Chapter 11
Deixis and the Interactional Foundations of Reference
Jack Sidnell and N. J. Enfield
11.1 Introduction
All reference involves directing the attention of some other person to something. The
something to which attention is directed may or may not be present in the immediate
context of interaction. Whether the referent is a hilltop in plain view, a bird’s singing,
Gottlob Frege, sorrow, the ideas of Augustine, or the concept of liberty, making reference requires bringing the recipient’s attention in line with that of the speaker. If human
cognition is fundamentally intentional in the sense of being about or directed towards
something, reference is a form of shared intentionality in which the cognitive focus of
two or more persons is aligned and jointly focused. In deictic reference, this directing
of attention is accomplished by relating an object of reference to some aspect of the
event of speaking—the indexical origo (Bühler 1982 [1934])—via a ground. So for instance when I point to a book and say ‘this one’ in response to the question ‘Which are
you reading?’, my recipient’s attention is directed to the book by relating it to my location, and specifying the relation as one of relative proximity (or immediacy of access—
see Fillmore 1982; Hanks 1990, 2005).
In this chapter we develop an account of deixis that builds from its simplest manifestation in acts of gaze-following. For humans, gaze-following results from a basic
propensity to attend to the attention of others. Because co-present others are able to
control their own gaze and other visible signs of attention they can actively manipulate
another’s attention such that what was a cue becomes a signal (see Krebs and Dawkins
1984). Pointing and all other forms of deixis (indeed all forms of reference) exploit this
propensity by actively directing others’ attention. Of special importance to our account
is so-called lip-pointing in which a meta-communicative facial expression (conveyed
by a configuration of lip and/or head; Sherzer 1972, Enfield 2009: ch. 3) indicates that
a participant’s gaze direction is, at that moment, to be understood as an intentional,
communicative signal.
With shared intentionality as a foundation, all languages have developed systems of
deictic markers: for example, demonstratives such as English that and this. These systems display a defining semiotic property of human communication, namely the use
of signs that not only have meanings in themselves, but whose meanings are enriched
through relations of opposition and contrast with other elements of the system, such
that each element has a composite meaning, a combination of what it is and what it is
not. Simple systems in the domain of deixis feature a semantically marked form in opposition to an unmarked one. More complex systems involve multiple dimensions of
contrast. A further way in which the meanings of elements of a deictic system may be
enriched is through their mapping onto the local socioculturally constituted worlds of
their users. Speakers use deictic forms to refer to locally relevant features of the environment and deictic systems are interwoven with the sociocultural world in complex
and sometimes counter-intuitive ways.
An overarching question to be addressed is, ‘What’s special about deixis as a form
of reference? How does it differ if at all from reference accomplished by non-deictic
means, and what consequence does this difference have for its function or use in
actual situations of social interaction?’ In order to address this question, we begin by
developing an account of deixis that is rooted in basic, instinctive human propensities for (a) intentional, goal-directed behaviour and (b) the capacity for two or more
individuals to share attention. Together, these human capacities provide a basis for
the collective or shared intentionality that underwrites all forms of reference, including reference accomplished via the use of deixis. We then turn to briefly sketch the
semantic domain, describing the essential elements of deictic reference and some of
the documented typological variation. Much of the literature in this area focuses on
just these issues and so here we do little more than provide a thumbnail sketch and
point to relevant landmarks. We then consider demonstrative reference in which the
recipient’s attention is directed either by talk, gesture, or gaze to some enumerable
thing. Here we show that deixis is a low-cost, high-efficiency, minimally characterizing way to accomplish reference. These features surely account for many of its uses
in interaction. But we suggest that referrers select deixis for reference for reasons
other than efficiency. First, the semantically general character of deictic forms makes
them well-suited for reference to hard-to-describe and/or nameless objects. In such a
situation a deictic form can exploit features of the artifactual environment including
the presence of the thing being referred to. Second, the semantically non-specific,
minimally characterizing features of deixis allow speakers to avoid description where
such description may be counterproductive to some interactional goal. Third, because these forms require for their interpretation the application of knowledge in
common ground (shared knowledge), successful reference via such a form can be
a demonstration of social proximity—an informational enactment of intimacy (see
Enfield 2006).
11.2 Directing Attention in Deixis
At about nine months of age human infants begin to engage in a suite of joint attentional behaviours such as gaze-following and joint engagement with objects. These behaviours differ markedly from those of younger infants which are primarily dyadic.
At about this age, ‘infants for the first time begin to “tune in” to the attention and
behavior of adults toward outside entities …’ (Tomasello 1999: 62). We can think of
gaze-following schematically as in Figure 11.1.
In following the gaze of another, a human infant is attending to that other’s attentional state. Essentially the infant is treating the other’s gaze direction as a sign and their
own gaze redirection is an interpretant of that sign (see Kockelman 2005). Importantly,
however, gaze-following of this kind occurs at least partially independently of whether
the other intended their own gaze to function as a communicative signal. The initial
gaze redirection then may function to prompt an infant’s gaze redirection either as a
signal or a cue. As Tomasello notes, it is at around this same age—nine months—that
infants also begin to direct adult attention to things using deictic gestures such as pointing, or by holding up an object to show it to someone. So at least ontogenetically there
seems to be a correlation between the emergence of gaze-following and the emergence
of deictic pointing and showing.
There is also a clear conceptual connection between gaze-following and deictic
pointing. Milgram and colleagues (1969) showed, somewhat inadvertently, that gaze-following in adults was sensitive to the character of the stimulus. The study showed that
larger crowds of gazing individuals were more likely to promote gaze-following than
smaller crowds. The basic, apparently instinctive, propensity of humans to follow the
gaze of others is then available for manipulation—altering aspects of the stimulus/sign
will make gaze-following by others more likely. This, then, allows us to see the connection between gaze-following and ‘true’ pointing, one version of which, often referred
to as ‘lip-pointing’, is done with gaze; indeed, it is essentially ‘gaze-pointing’. Figure
11.2 is a still image from video taken by Niclas Burenhult of a Jahai speaker in Malaysia
lip-pointing.
Figure 11.1 Basic structure of gaze-following (participants shown: Infant, Parent, Object)
As Enfield (2001: 186) writes, in relation to a study of lip-pointing among speakers
of Lao, the term ‘lip-pointing’ ‘should not be taken to suggest that only the lips are involved…. Additional actions of chin-raise/head-lift, gaze direction, and eyebrow raise
are usually involved.’ Key for our purposes is the fact that the vector of pointing is defined by gaze while the ‘lips’ actually serve a meta-communicative purpose, signalling
that the gaze is being used as a point. Enfield (2001: 185) thus writes, ‘the “vector” of
lip-pointing is in fact defined by gaze, and the lip-pointing action itself (like other kinds
of “pointing” involving the head area) is a “gaze-switch”, i.e. it indicates that the speaker
is now pointing out something with his or her gaze.’ The example of lip-pointing thus
illustrates the way that humans can accomplish intentional reference (i.e. non-natural
meaning in Grice’s 1957 sense) through small manipulations of naturally meaningful
behaviours (gaze direction) which exploit the human propensity to follow another’s
gaze. The introduction of a meta-communicative overlay (chin/lip/head) on gaze direction transforms a cue into something another can recognize as a true intentional
signal—‘He’s referring to that thing/person/area over there.’
Figure 11.2 Still image from video of a Jahai speaker lip-pointing, provided by Niclas Burenhult
Table 11.1 Current and projected focus of attention in deixis
Gaze-following    Lip-pointing    Finger-pointing
CFA only          CFA=PFA         CFA=PFA or CFA≠PFA
We are now in a position to describe one distinctive feature of finger-pointing relative
to the other forms of primitive deixis so far described. Specifically, in finger-pointing it
is possible to separate the speaker’s current focus of attention from the focus that they
are proposing for a recipient. We can describe the basic elements and their combinatorial possibilities by means of the following table and figures. In Table 11.1 ‘current focus
of attention’ is annotated CFA and ‘proposed focus of attention’ is annotated PFA.
This can be easily seen in the frame-grabs in Figure 11.3, taken from a video-recorded
interaction among speakers of Bequia creole. In the first frame Viv (in the foreground)
is telling Baga (in the background) about a man she thinks he might know but whose
name he does not recognize. When the description given allows Baga to identify the
person Viv is talking about, he points up the hill to his right (Figure 11.3b). Notice that
when Baga initially points, his own gaze is directed to the place he is indicating with his
finger (i.e. CFA = PFA). In Figure 11.3c, he maintains the pointing gesture but now gazes
toward Viv apparently to check whether his reference has been successful—checking,
that is, on his recipient’s focus of attention (i.e. CFA≠PFA). He finds Viv pointing to
the same place and the two engage in a moment of mutual gaze. Here then we can see,
in the visible behaviour of the participants, how reference involves joint attention such
that two persons are not only publicly projecting their attention to the same referent,
but where they are, in addition, mutually aware of the current alignment, and thus
sharedness, of their two lines of attention.
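The contrast summarized in Table 11.1 can be restated as a small sketch. The Python below is an illustration only: the names `Mode`, `ALLOWED`, `relation`, and `can_express`, and the string labels for the table's cells, are hypothetical and are not part of the original analysis. It records which relations between a referrer's current focus of attention (CFA) and proposed focus of attention (PFA) each form of primitive deixis allows, and then classifies the two moments of Baga's point in Figure 11.3.

```python
from enum import Enum


class Mode(Enum):
    """Forms of primitive deixis listed in Table 11.1."""
    GAZE_FOLLOWING = "gaze-following"
    LIP_POINTING = "lip-pointing"
    FINGER_POINTING = "finger-pointing"


# CFA/PFA relations that each mode allows, transcribed from Table 11.1.
# The string labels are hypothetical shorthand for the table's cells.
ALLOWED = {
    Mode.GAZE_FOLLOWING: {"CFA only"},               # no focus is proposed
    Mode.LIP_POINTING: {"CFA=PFA"},                  # gaze itself is the pointer
    Mode.FINGER_POINTING: {"CFA=PFA", "CFA!=PFA"},   # hand and gaze can diverge
}


def relation(gaze_target, point_target):
    """Classify one moment by where the referrer looks and where they point."""
    if point_target is None:
        return "CFA only"
    return "CFA=PFA" if gaze_target == point_target else "CFA!=PFA"


def can_express(mode, gaze_target, point_target):
    """Return True if the mode allows this combination of gaze and point."""
    return relation(gaze_target, point_target) in ALLOWED[mode]


# Figure 11.3b: Baga looks where he points.
print(relation("hill", "hill"))                           # CFA=PFA
# Figure 11.3c: Baga keeps pointing uphill but looks at Viv.
print(relation("Viv", "hill"))                            # CFA!=PFA
print(can_express(Mode.LIP_POINTING, "Viv", "hill"))      # False
print(can_express(Mode.FINGER_POINTING, "Viv", "hill"))   # True
```

On this sketch, only finger-pointing accommodates the moment in which Baga checks on his recipient while keeping the point in place.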
So we can see why this possibility of separating the speaker’s/gesturer’s directing signal
from the speaker’s gaze is important since a joint attentional frame crucially involves
the speaker monitoring the recipient’s attention to some third object (Carpenter et al.
1998; Tomasello 1999; Liszkowski et al. 2004; Tomasello et al. 2005). This monitoring
of the recipient transforms common attention to a THIRD into true joint attention—a
basic form of shared intentionality (see also Gilbert 1989; Searle 2010). It is relevant
to note here that ‘lip-pointing’—which crucially involves gaze as noted above—seems
specialized among Lao speakers in two ways: First, ‘lip-pointing is apparently restricted
to cases when the addressee is looking at the speaker’ (Enfield 2001: 192) and second
‘to acts of direct ostension in which the location or identity of a referent in the physical
environment is in focus’ (Enfield 2001: 196, emphasis added). The prior establishment of
recipiency along with the already ‘in focus’ character of the referent, it can be supposed,
obviates or at least alleviates the need to monitor the recipient.
Figure 11.3 Finger-pointing, Bequia, St. Vincent (see Sidnell 2005) (still images from video recording; panels a–d).
Finger-pointing (Kita 2003) would also seem to allow for a higher informational
load than do the forms of ‘lip/gaze-pointing’ we have considered. Thus, researchers
have noted various functional contrasts here in little versus big points (Enfield et al.
2007), those that are accompanied by gaze versus those without (Streeck 1993), as well
as informational possibilities associated with different hand and finger configurations
(Wilkins 2003; Kendon and Versante 2003). Finger-pointing also makes ‘path descriptions’ possible, as well as illustrative combinations. At the far end of the informational
scale are diagrammatic representations in which pointing gestures are used to identify
positions within a virtual drawing (see Enfield 2005).
We can see many of the basic features of deictic reference in another form of behaviour among infants which Kidwell and Zimmerman (2007) as well as Tomasello (1999)
and Clark (2003) describe as ‘showing’. In a typical showing sequence, a young child
will approach another (typically an adult) with an outstretched arm and an object in
hand (see Figure 11.4). The other might produce a response which identifies the object
(‘Watermelon’), expresses a social-relational feature of the object (‘Your shoe’), or appreciates it in some way (‘Oh wow, a pretty hat’). The showing child then withdraws
the object from view and/or moves out of the recipient’s line of vision, either returning
to the activity she was engaged in before the showing or initiating some new activity.
Figure 11.4 Human infant showing object to camera person
Such ‘showings’ are arguably one of the most basic forms which exhibit the triadic,
joint attentional interaction configuration that constitutes the very foundation of reference in all its various forms (see Tomasello 1999, 2003). Clark (2003) has explicated
the parallels and the key difference: in pointing, the other’s attention is made to move
toward the current location of a thing, while in showing, a thing is moved into the
current line of the other’s attention—either way, the other’s attention ends up directed
towards the thing. In the current context, showings can be understood as an early form
of demonstrative, or better, ‘presentational’ deixis akin to adult uses of French ‘voilà’
or English ‘look at this’. Instructional activities build upon the human propensity for
attending to the attention of another, and showings play an important role in their
organization. Rembrandt’s Anatomy Lesson of Dr Nicolaes Tulp provides a stunning illustration (Figure 11.5).
Here Tulp is presenting a part of the cadaver for the consideration of his students,
some of whom look attentively at that which is being shown. Through such presentations or showings, novices are socialized into new ways of ‘seeing’ the world around
them, ways of seeing that are appropriate to some particular status or role (see Goodwin
1994; Kockelman 2007). In showings, then, we see not only the roots of reference in
human action but moreover the interactional foundations of human teaching, learning,
the transmission of knowledge across generations, and thus, ultimately, of culture (see
Tomasello 1999).
Figure 11.5 The Anatomy Lesson of Dr Nicolaes Tulp, Rembrandt Harmenszoon van Rijn (1632)
11.3 Demonstrative Systems
In order to achieve joint attention on something, that thing must be somehow picked
out from the range of possible things that one might be attending to. Often there are
many possible things a person might be looking at or pointing to, and there are various
ways to solve the problem of figuring out just which thing is the focus of attention. In
the joint-attentional behaviours described in the previous section, details of body comportment such as eye gaze, pointing, and showing constitute relatively straightforward
ways to narrow another’s attention on something to the exclusion of other possible
referents in a context.¹ But when the deictic function is supplied purely by the selection
of a word, there is little of inherent value in the word form itself that helps to solve this
narrowing-in function. This is why demonstratives like that and this are often accompanied by some form of deictic bodily behaviour (or by descriptive lexical content, e.g.
‘That blue one’ etc.). At the same time, such linguistic forms are also able to rely on the
special salience of potential referents as determined by the current common ground of
interlocutors; for instance, one might say ‘My brother has a car like that one’ while there
are numerous cars in view, but where the car has a special salience in the scene: for example, it just drove past us, or it is painted a garish colour, or is particularly expensive-looking (Clark, Schreuder, and Buttrick 1983).
¹ Of course, as Wittgenstein (1953) and others pointed out, all reference involves a certain degree of
indeterminacy. So, for instance, in the example from Rembrandt a recipient must infer, on the basis of
whatever evidence is available, whether the doctor intends to draw attention to the arm, the tendon,
the flesh, or the entire body of the cadaver. Or again whether it is the colour, the size, or shape of some
or all of the cadaver that is being indicated.
Take examples like I heard that, Take this, or Were you at that party? These are semantically very general forms of expression, and a listener can only make sense of
them by connecting the speech to something semantically much more specific such
as a physical object or something in the spoken discourse or other shared knowledge,
in other words, in the common ground (Clark 1992, 1996). The salience required for
the successful connecting of a demonstrative to a referent may come from different
sources. Certain things might be salient already because they are large, bright, central,
or otherwise prominent in their surroundings (Clark et al. 1983). And one can render
something salient in various ways (e.g. by pointing at it, looking at it, using a laser
pointer, shining a light, holding the thing up). Ultimately, however, even where many
sources of information converge to suggest a single referent, recipients of deictic expressions must infer what is being indicated.
Syntactically, demonstratives may serve a range of different functions. For example,
in English that may occur as an independent noun phrase (e.g. I saw that) or as a modifier within a noun phrase (e.g. I saw that car). Some demonstratives are ‘adverbial’ in
function, in that they can be seen to relate to or modify events and actions (e.g. there in
I went there). Depending on which language system we consider, demonstratives show
different distributions (thus, in English I saw that/*there, I went *that/there, I saw that
car/*there car). The details of such distinctions are subtle and complex and are particular to each language system (see Anderson and Keenan 1985, Diessel 1999, Dixon 2003
for reviews).
One common function of demonstratives in spoken language is ‘exophoric’. In exophoric uses, reference is made to physical things and places that can be seen and
pointed to in the context of the speech event. Alongside these exophoric functions,
there are also endophoric referential uses of demonstratives (Halliday and Hasan 1976).
In endophoric uses, reference is made not to things that can be physically pointed to
and shown, but to things in the discourse context, which often includes things that
have been said (e.g. anaphoric use of that in He said it was good and I agreed with
that), but could also refer to things that will be said next (e.g. cataphoric use of this
in What I want to say is this: I agree). Another kind of endophoric reference points to
things in the shared common ground, sometimes referred to as a ‘recognitional’ usage
(Himmelmann 1996, after Sacks and Schegloff 2007b [1979], see later); this is found in
cases like He reminds me of that boyfriend of Jane’s, where in order to resolve the reference of ‘that’, the listener consults neither the physical setting nor the current discourse,
but rather the interpersonally shared common ground of the dyad. The endophoric
uses of demonstratives are often regarded as secondary or derived from exophoric uses,
based on arguments from both ontogeny (infants acquire exophoric functions first)
and diachrony (endophoric functions often develop from exophoric ones; see Diessel
1999 for a statement of this position). However, it is not clear that from a synchronic
perspective either function is subordinate to the other. Hanks (1990) and Enfield (2003)
have argued that the core meanings of demonstratives do not semantically specify an exophoric versus endophoric distinction, but rather that these are simply distinct (and sometimes not-so-distinct) pragmatic contexts of use of the semantically general terms.
Typological work on demonstratives indicates that there is significant and subtly complex variation across languages in terms of the semantic dimensions that are encoded,
the number of distinctions, and the grammatical properties of the various elements of
the systems. Here we do not attempt to give an overview of the typological properties
of demonstratives and demonstrative systems (for that, see e.g. Himmelmann 1996;
Diessel 1999; Dixon 2003; Huang 2014). We will simply introduce a few of the known
‘realms of possibility’, concentrating specifically on the number and semantic types of
possible distinctions found in systems of ‘demonstrative adjectives’ (i.e. words like that
and this as modifiers in expressions with nominal referents; e.g. that car or this book).
A demonstrative system can be extremely simple in terms of the number of distinctions it makes along a given dimension such as distance from speaker. Colloquial
German, for example, has essentially a one-term system of demonstrative adjectives.
While grammars of German state that a noun may be modified by one of three distinct
terms: der ‘that/the’, dieser ‘this’, and jener ‘yon’, in fact only der (and its variants die and
das, depending on the gender of the head noun) tends to be used. So, for instance, a
German speaker would be more likely to say das Buch hier for ‘this book’ (proximal to
the speaker, literally ‘that/the book here’) to distinguish from das Buch ‘that/the book’.
A more common and still very simple type of system features a two-way distinction.
In English, for example, a ‘proximal’ term this stands in opposition to a ‘distal’ term
that. There is an archaic term yon ‘far distal, over there’, but it is almost never used.
A similar situation is found in the non-Pama-Nyungan language Kayardild, with ‘distal’
dathin- and ‘proximal’ dan-, and a ‘rarely used’ form nganikin- meaning ‘that, beyond
the field of vision’ (Evans 1995: 206–210). It is surprisingly difficult to determine precisely what is the semantic distinction between the terms in such a system, though the
most common characterization is ‘proximate’ versus ‘distal’. This captures the fact that,
in general, things that one refers to with the word this tend to be spatially closer to the
speaker than things one would refer to with that. However, there are problems with this
suggestion. For one thing, these words are used in endophoric, non-spatial domains
where the application of an analysis in terms of ‘proximate’ and ‘distal’ is metaphorical
at best. A more parsimonious analysis would then not specify spatial distance as the
operative factor (Enfield 2003; Hanks 1990; Kirsner 1979). For another thing, there is
no objective measure of what would count as ‘proximal’ versus ‘distal’, yet these terms
imply some kind of specifiable distance. When we observe actual usage, it turns out
that spatial distance between speaker and referent does not predict which term will be
used. This was demonstrated in an analysis of situated usage of a two-term system in
Lao (a Southwestern Tai language of Laos; Enfield 2003). The account that best captures the observed data posits a semantic asymmetry in the system: one of the terms is
semantically specified as ‘external’, ‘distal’, or more accurately ‘not here’, while the other
term has no specification for ‘externality’ or ‘distance’. This is a basic ‘informativeness
scale’ (Horn 1989; Levinson 2000), by which the unmarked member of a paradigm
can readily pick up extra pragmatic meaning by virtue of its opposition to the other
members. In the Lao case, the semantically general form tends to imply ‘proximal’, not
because it semantically specifies proximal but because it is being chosen when ‘distal/
external’ could have been chosen instead. A similar solution has been implied in analyses of the English that/this opposition, though without consensus as to which term
is the semantically unmarked one (Halliday and Hasan 1976: 59 say that that is basic,
while Wierzbicka 1980: 27 and Dixon 2003: 81 say that this is the basic form).
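The reasoning behind this asymmetry can be laid out step by step. The following Python sketch is hypothetical: the labels `GENERAL` and `EXTERNAL` and the functions `semantically_applicable` and `implicature` are invented here for illustration and are not drawn from Enfield (2003). It assumes a two-term system in which one term encodes ‘not here’ and the other encodes nothing about location, and it treats ‘here’ as a construal by the interlocutors, not as a measured distance.

```python
# Hypothetical labels for the two terms of an asymmetric two-term system.
GENERAL = "general"    # no semantic specification for distance or externality
EXTERNAL = "external"  # semantically specified as 'not here'


def semantically_applicable(term, referent_is_here):
    """Return True if the term's encoded meaning is compatible with the referent.

    `referent_is_here` stands for whether the interlocutors currently construe
    the referent as within the speaker's 'here' (an assumption of this sketch).
    """
    if term == EXTERNAL:
        return not referent_is_here  # 'not here' excludes what counts as here
    return True  # the general term is compatible with any referent


def implicature(term):
    """Return the pragmatic enrichment a hearer may draw from the choice of term."""
    if term == GENERAL:
        # The speaker could have chosen the more informative term but did not,
        # so the hearer may infer that 'not here' does not apply.
        return "here (implied, not encoded)"
    return None  # the external term's meaning is encoded; nothing is added


for term in (GENERAL, EXTERNAL):
    print(term,
          semantically_applicable(term, referent_is_here=True),
          semantically_applicable(term, referent_is_here=False),
          implicature(term))
```

The point of the sketch is that ‘proximal’ appears only in `implicature`, never in `semantically_applicable`: the general term is usable for any referent, and its proximal reading arises from the contrast with the marked term.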
Many languages have three-term systems, often described in terms of the familiar
‘proximate’ versus ‘distal’ distinction, but where there are two ‘proximate’ terms: one
refers to things that are proximate to the speaker, the other to things that are proximate
to the addressee. For example, in Yimas, a Lower Sepik language of Papua New Guinea,
there are three deictic stems: -k ‘this (near me)’, m- ‘that (near you)’, and -n ‘that yonder
(near neither you nor me)’ (Foley 1991: 112). Or in Manambu, also spoken in the East
Sepik, there are the forms k- ‘close to speaker’, wa- ‘close to hearer’, and a- ‘far from both’
(Aikhenvald 2008: 201). Other three-term systems operate on different semantic principles. In Turkish, alongside a contrast between ‘proximal’ (bu) and ‘distal’ (o), there is
a term (şu) that encodes ‘the absence of the addressee’s visual attention’ on the thing
being referred to (Küntay and Özyürek 2006: 304).
There are also many languages with demonstrative systems that have more than three
terms. Often the extra terms mark spatial contrasts associated with living in a particular kind of physical environment and lifestyle. For example, in Kri, a Vietic language
of Laos (Enfield and Diffloth 2009), there is a five-term system of exophoric demonstratives, featuring a familiar-looking proximal versus distal distinction, in addition to
semantic distinctions of ‘across’, ‘up’, and ‘down’, motivated by the Kri speakers’ riverine
up–down environment (this system is also used with reference to small-scale or ‘table
top’ space; see further discussion of the Kri system in section 11.4):
(1) a. nìì     general (‘this’, proximal)
    b. naaq    external (‘that’, distal)
    c. seeh    external, across (‘yon’, far distal)
    d. cồồh    external, down below
    e. lêêh    external, up above
A similar system is found in Lezgian, a Nakho-Daghestanian language of the Eastern
Caucasus (Haspelmath 1993: 190; note that according to Haspelmath, in ‘modern
standard’ Lezgian, only the two forms glossed as ‘that’ and ‘this’ are commonly used).
In the Lezgian system, yet another term (ha) is added, which has a dedicated ‘discourse
anaphoric’ function:
(2) a. i       this
    b. a       that
    c. at’a    yonder
    d. ha      the aforementioned
    e. wini    that up there
    f. aǧa     that down there
These few examples can only hint towards the complexity and subtlety of different demonstrative systems in languages of the world. The list of possible semantic distinctions
is long. In his typological survey of demonstrative systems, Diessel (1999: 52) summarizes all of the semantic features that are attested. These divide into ‘deixis’ and ‘quality’,
subcategorized in Table 11.2.
Table 11.2 Diessel’s summary of semantic distinctions attested in demonstrative systems
(A) Semantic distinctions in demonstratives of the type ‘deixis’:
    (i) distance
    (ii) visibility
    (iii) elevation
    (iv) geography
    (v) movement (or direction)
(B) Semantic distinctions in demonstratives of the type ‘quality’:
    (i) ontology
    (ii) animacy
    (iii) humanness
    (iv) sex
    (v) number
    (vi) boundedness
Adding to the complexity and richness of the possibility space for demonstratives, the various terms may be enlisted in many different ways for endophoric
usages, and in other syntactic functions (e.g. as demonstrative adverbs like English
there and here). The most important future line of research is to test the proposed
semantics of these systems in the context of their usage in everyday life. Since the
understanding of demonstratives is so heavily context-dependent, they cannot be
meaningfully studied without looking at a corpus of usage. This issue is discussed
in section 11.4.
11.4 Demonstratives in the Context of Common Ground
We began this survey of deictic reference with the simplest kinds of joint-attentional
scenes, the kinds that allow a 9-month-old to get started on his or her long journey
of socialization. It is a years-long path of countless moments of joint attention,
countless instances of learning and guidance, of gradual convergence in knowledge and stance with elders and peers, first through simple gestures and shared
participation frames, and soon within the increasingly rich matrices of language,
kinship, ritual, livelihood, and material culture. These aspects of the sociocultural
world all form the basis of a community’s common ground, and thus are naturally
caught up in the elements of demonstrative systems, dependent as they are on
whatever sources of ‘mutual salience’ happen to be at hand. Most previous work
on deixis, such as the research on demonstrative systems outlined in the previous section, has approached the task as a search for the right ‘gloss’ of each form’s
meaning. However, deictic terms like demonstratives are especially hard to gloss in
the abstract since interpreters are so heavily dependent on context in figuring out
what they refer to on any given occasion. Research such as that by Hanks (1990)
and Enfield (2003) has shown that the situated dynamics (both spatial and social-relational) of social interaction bear directly on how a simple demonstrative distinction, e.g. between that and this in English, is to be interpreted. The key to
interpreting deictic expressions is the common ground that pertains between interlocutors (Clark 1996; cf. Hanks 2006b). In a study of Lao, Enfield (2003) shows
how the rapidly changing common ground arising from fluidly evolving participation frames in marketplace interactions can affect the differential selection of
demonstratives for picking out referents that are all proximate and in common
view. In other kinds of context, we see how common ground of the more enduring kind—that is, cultural common ground (Clark 1992)—also has a bearing on
the selection and interpretation of demonstratives. Let us consider an example
from research on speakers of Kri, an Austroasiatic language of Laos (Enfield and
Diffloth 2009).
Figure 11.6 Kri house
In the Kri-speaking community of Mrka village in upland central Laos, houses are
built to a precise plan, by which the physical layout of the house is a diagram of certain
social-relational asymmetries, on two axes (see Enfield 2009 for detailed discussion).
Running laterally across the house is an ‘in–out’ axis, where ‘in’ maps onto ‘private,
family, women, children, storage room, food preparation’ and ‘out’ maps onto ‘public,
non-kin, men, adults, guests, drinking, public ritual’. Orthogonal to this is an axis that
runs from what we would call in English the ‘front’ of the house, where one enters,
to the ‘back’ of the house. In Kri, this is referred to as a ‘below–above’ axis, where
‘below’ maps onto socially lower rank, and ‘above’ to socially higher rank, where relative ‘height’ is determined primarily by relative age, often attenuated by classificatory
kinship. See Figures 11.6 and 11.7.
The Kri house is therefore conceptualized spatially as a mini-version of the larger
geographical environment, as coded in the demonstrative system. Recall that in that
system (see (1)), beyond the ‘proximal’ and ‘distal’ forms, there are three forms in addition: ‘the one up/above/upstream/uphill’, ‘the one down/below/downstream/downhill’,
and ‘the one across’ (i.e. away but neither up nor down). While the house floor is normally perfectly level, the ‘up/down/across’ scheme is nevertheless readily mapped onto
it, thanks to its diagrammatic relation to the sociocultural dimensions represented as
‘in–out’ (family versus non-family) and ‘up–down’ (senior versus junior). Now consider an example from a video-recorded interaction between a group of Kri-speaking
women sitting on a front verandah, in which this socioculturally motivated mapping
provides the solution for a simple referential problem of locating an object. Figure 11.8
shows a still image from the video recording.
Figure 11.7 Kri house floor plan (scale bar: approx. 5 m; axes labelled upper/lower and outer/inner; labelled features: roong ‘upper corner’, sùàmq ‘inner room’ (three instances), tkoolq ‘giant mortar’, khraa ‘storage and work room’, prùng kùùjh ‘fire pit’ (two instances), cààr ‘verandah (covered)’, cààr ‘verandah (open)’, krcààngq ‘ladder’)
Figure 11.8 Image of the speakers (Still image from video recording).
The scene is in the house of E, the elderly woman at the right of frame. We focus on
an exchange between her and B, the young woman second from left, visible in the door
frame, who does not live in this house.
(3) Kri interaction
    B:  piin  sulaaq
        give  leaf
        Pass some leaf.
    E:  sulaaq  quu  kuloong  lêêh,   sulaaq,  quu  khraa  seeh
        leaf    LOC  inside   DEM.UP  leaf     LOC  store  DEM.ACROSS
        The leaf is inside up there, the leaf, in the store.
        môôc  cariit    hanq
        one   backpack  3SG
        (There’s) a (whole) backpack.
        (5s; B walks inside)
Here, B makes a request to be given some ‘leaf’ (actually, corn husk) with which
she can roll a cigarette. In E’s reply, she uses a complex combination of referential
expressions to inform B of the location of the ‘leaf’ so that she can go and get some
herself. First, an intrinsic spatial reference (kuloong ‘inside’) is combined with the
‘up’ demonstrative lêêh, in alignment with the up–down axis of the house. From
their perspective sitting on the verandah, the ‘lowest’ part of the house, the inside
area of the house is ‘up’, and, accordingly, this is coded in the demonstrative chosen.
E then narrows in further on the spatial location; where they are currently sitting
is the ‘outer’ edge of the house, and the ‘leaf’ in question is located inside the khraa
‘storage room’ at the ‘innermost’ side of the house: once one has entered the house
going ‘up’ from where the speakers are sitting, one would then have to go ‘across’;
this is specified with use of the relevant demonstrative seeh ‘the one across there’.
This example has illustrated one way in which the interpretation of demonstratives depends crucially on shared background knowledge, as relevant to the context of
speaking. In the case of Kri, the selection and interpretation of demonstratives draws
directly on a conventional mapping of the sociocultural domain of kinship and other
personal relations onto the 2D spatial array of the house floor plan.
11.5 What’s Special about Deixis as a Form of Reference?
In this final section we address the central question with which we began: what is special about deixis as a form of reference? Another way to ask the same question is: where
both deictic and non-deictic formulations of a referent are possible, why might a speaker
choose the deictic one? Consider the following case from the second presidential debate
between John McCain and Barack Obama in 2008. Here the moderator has asked
McCain the following: ‘Should we fund a Manhattan-like project that develops a nuclear bomb to deal with global energy and alternative energy or should we fund 100,000
garages across America, the kind of industry and innovation that developed Silicon
Valley?’ McCain has already begun to respond when he produces the following segment:
(4) McCain²
01 JM: By the way my friends: I-I know you grow a little wea:ry
02     with this back-and-forth.
03     (.)
04     It was an energy bill on the floor of the Senate loaded down
05     with goodies. billions for the oil companies. An’ it was
06     sponsored by- Bush and Cheney.
07     (0.2)
08     You know who voted for it, might never know, That one.
09     You know who voted against it? Me. I have fought time
10     after time against these pork barrel- these-these bills
11     that come to the floor and they have all kinds of goodies
12     an’ all kinds of things in them for everybody and they
13     buy off the votes,
² English examples are presented using the transcription conventions originally developed by Gail
Jefferson. For present purposes, the most important symbols are the period (‘.’), which indicates falling
and final intonation, the question mark (‘?’), which indicates rising intonation, and brackets (‘[’ and ‘]’),
which mark the onset and resolution of overlapping talk between two speakers. Equal signs, which come
in pairs (one at the end of a line and another at the start of the next line or one shortly thereafter), are
used to indicate that the second line followed the first with no discernible silence between them, i.e. it
was ‘latched’ to it. Numbers in parentheses (e.g. (0.5)) indicate silence, represented in tenths of a second.
Finally, colons are used to indicate prolongation or stretching of the sound preceding them. The more
colons, the longer the stretching. For an explanation of other symbols, see Enfield and Stivers (2007).
Notice then that McCain selected the deictic formulation ‘that one’ in referring to
Obama, who was sitting close by at the time (see Figure 11.9). This is clearly a marked
usage in contrast to ‘Obama’ or ‘Senator Obama’ and it was noted in the press, with
many ordinary people as well as political pundits weighing in on what the formulation
might ‘mean’. For instance, the Huffington Post reported:
During a discussion about energy, McCain punctuates a contrast with Obama by
referring to him as “that one,” while once again not looking in his opponent’s direction (merely jabbing a finger across his chest). That’s not going to win McCain any
Miss Congeniality points. Nor will it reassure any voters who believe McCain is
improperly trying to capitalize on Obama’s “otherness.”
David Axelrod, an Obama strategist, was reported as saying: ‘Senator Obama has
a name. You’d expect your opponent to use that name’, clearly drawing attention to
the marked character of ‘that one’. Other commentators suggested that the usage was
disrespectful, rude, or even racist. Defenders of McCain, in contrast, argued that the
press and others were making something out of nothing.
Drawing on the basic principles sketched in this chapter we can develop an analysis
of how people were able to arrive at these diverse interpretations. First, the reference is
accompanied by a pointing gesture in the direction of Obama (Figure 11.9); indeed,
there is a prior point at Obama produced over ‘you know who voted for it?’ Second,
while producing the reference (the deictic formulation ‘that one’ with point in Obama’s
direction), McCain was gazing at the studio audience. Third, the reference combines
the deictic ‘that’ with the characterizing ‘one’, a usage which denotes any enumerable person or thing. The combination is roughly equivalent to ‘him’ in denoting a
third person, non-participant in the immediately available speech situation, i.e. not
a speaker, not an addressee; and note that it is compatible with the referent being an
inanimate object. Fourth, McCain can be seen to have selected ‘that’ from the pair of
contrasting terms ‘this/that’: ‘that’ is what we gloss as the distal member of the pair
and, in contrast to ‘this’, conveys distance from speaker (see Stivers 2007).
Figure 11.9 McCain and Obama, ‘That one’ (panels (a) and (b))
We can see that this reference positions Obama as a non-participant in a speech
event comprised of McCain and the audience to whom his talk is directed. In addition, the use of ‘one’ and ‘that’ (rather than ‘this’) conveys distance. These effects,
along with McCain’s use of ‘my friends’ to address and align the audience, thus work
together to construe an interactional rift that divides himself and the audience on the
one side from Obama on the other. At the same time, of course, these meanings are
defeasible—from another perspective, McCain was simply using a highly efficient,
minimally characterizing referring expression to identify who he was talking about.
The availability of seemingly incompatible, even opposed interpretations is surely an
outcome of the fact that so much of the meaning of these forms is inferred rather than
encoded.
We are now in a position to summarize at least some of the features of deixis that distinguish it from other forms of reference and to see how these might shape a speaker’s
selection of a deictic over a non-deictic formulation.
1. Deictic reference is a low-cost, highly efficient, minimally characterizing way to
accomplish reference.
Many of the examples we have so far discussed exemplify just this point. Simply put,
there are many situations in which a deictic formulation is the most efficient way to
accomplish reference. Where the intended referent is already available in the common
ground and perhaps even co-present, a deictic formulation constitutes the most
straightforward way of referring to it. Notice that this likely explains the universal occurrence of deictic words in the world’s languages—a language without them would
be unnecessarily cumbersome. It should be noted however that there are some (perhaps many) situations in which sociocultural norms override any pressure towards efficiency. So for instance in Vietnamese, in many situations, speakers avoid minimally
characterizing deictic formulations in referring to speaker and hearer (tôi/ta ‘I’, mày
‘you’) in favour of kin terms which explicitly characterize the social relationship between speaker and hearer (Luong 1990; Sidnell and Shohet 2013). So while matters of
efficiency are clearly at play, their relevance may not always be paramount.
2. The semantically general character of deictic forms makes them well-suited for
reference to hard-to-describe and/or nameless objects. In such a situation a deictic form can exploit features of the artifactual environment, including the presence of the thing being referred to.
For instance, in the following case something hanging on the door of the small room
where three children are playing is initially referred to by ‘it’. However, when the recipient initiates repair of the reference with ‘move what?’, a deictic formulation is used
which locates the referent relative to landmarks in the physical environment rather
than characterizing or describing it.
(5) Kids_11_24_05(2of2)T7 @11:33
01 A:    ((looks at door)) Maybe R---, maybe you can move it,
02 C: -> °Move what?°
03 A:    Move that thing that’s in the lock
04 C:    Okay.
3. The semantically non-specific, minimally characterizing features of deixis allow
speakers to avoid description where such description may be counter-productive
to some interactional goal.
There are situations in which a speaker may wish to avoid characterizing the thing
referred to and here deictic formulations are particularly well-fitted. Sacks (1995) discussed this issue in his consideration of ‘indicator terms’ (the term used by analytic
philosophers such as Russell and Goodman to talk about deictics). Sacks observed that
in the context of group therapy one patient may wish to avoid saying ‘why are you in
therapy?’ and prefer instead ‘why are you here?’—these questions having quite different
implications. The first invites an answer that makes reference to the real or supposed
psychological issues with which the recipient is struggling. The second, in contrast, can
be answered with something like ‘my father sent me’ or ‘it’s a condition of my parole’
etc.—i.e. practical circumstances.
This points to some of the ways sociocultural rules or norms may come into play in
the selection of deictic or non-deictic forms. Levinson (2005, 2007) has discussed data
from Rossel Island that is also relevant here. The Rossel Islanders observe taboos on
name use when the bearer of the name is recently deceased. In their attempts to observe
these taboos, speakers of Yélî Dnye sometimes resort to highly circumspect reference
often involving elaborate deictic gestures or linguistic formulations—eyebrow flashes
to distant locales, points, or expressions like ‘that girl’ and so on.
4. Because these forms require for their interpretation the application of knowledge
in common ground (shared knowledge), successful reference via such a form can
be a demonstration of social proximity, an informational enactment of intimacy
(see Enfield 2006).
Schegloff (2007b) discusses how this works via the indexical
meaning of a person’s voice. In the following example, Clara picks up the phone
and says hello (line 6b), to which the caller, Agnes, responds with ‘Hi’ (line 6c).
From this one-syllable voice sample, Clara knows it is Agnes, and demonstrates
this knowledge in her subsequent utterance, by using Agnes’s name.
(6) a.         ((Ring))
    b. Clara   Hello
    c. Agnes   Hi
    d. Clara   Oh hi, how are you Agnes
This indexically-based understanding is a way of making a genuine demonstration of
shared knowledge between a particular dyad. Had the caller been someone who Clara
did not know, or knew less well, she would have been simply unable to make this demonstration of knowing who it was, and thus would have made explicit the greater social
distance between the two. This example relates to the indexical meaning that allows
us to recognize a person just from their voice, and so is not in the realm of linguistic
deixis; however, we see exactly the same effect in the domain of grammatical deixis. In
this example from Lao (see Enfield 2006 for more information), a man is talking about
a riverine environment near his village, where villagers were once able to collect large
amounts of a certain herbal medicine.
(7) 6  tè-kii4  paj3  haak5  vang2-phêêng2  nanø         tèø-kii4  khaw3  tèq2-tòòng4
       before   go    pcl    vp             tpc.nonprox  before    3pl.b  touch
       ‘Before, in Vang Phêêng weir, before, for them (the villagers) to go and touch it
    7  bòø  daj4,  paa1-dong3  man2       lèwø  dêj2
       neg  can    forest      3.nonresp  prf   fac.news
       was impossible, it was the forest of it[non-respect], you know.’
The deictic element in line 7, man2 ‘it’, has no local antecedent, and so the speaker
is evidently assuming that his listeners will know who or what ‘it’ is. A couple of lines
later, a woman who is listening to the man’s story asks:
(8) 8  FW  khuam2  phen1  haaj4  niø  naø
           reason  3.p    angry  tpc  tpc.periph
           ‘Owing to its[respect] being angry?’
She uses a different pronoun, this time marking respect; however, the referent is still
entirely inexplicit. In the next line, the man does make the referent explicit:
(9) 9  FM  qee5 – bòò1  mèèn2  lin5  lin5  dêj2,
           yeah   neg   be     play  play  fac.news
           phii3   vang2-phêêng2  niø
           spirit  V              pcl
           ‘Yeah – It’s not playing around you know, the spirit of Vang Phêêng.’
The deictic expressions man2 and phen1, both third person pronouns, were first used in
this sequence in such a way as to assume certain cultural common ground; namely that
‘weirs’ and similar deep water environments have spirit owners that protect the aquatic
resources and that are feared and respected. The fact that these interlocutors were able
to successfully refer to these spirits with only the use of these semantically very general
demonstrative expressions is a demonstration of their common membership in a particular sociocultural world, and not only in a common ‘speech community’.
In this chapter we have sketched the interactional foundations of deixis (and reference in general) in the joint attentional scenes and associated action trajectories of
ordinary social life. We then discussed two ways in which the basic features of deictic
reference are elaborated—in semantically complex systems of linguistic opposition and
in the way they map onto the rich, conventionally meaningful cultural systems that
make up the life-world. Finally, we have tried to address the fundamental question of
why any given speaker on any given occasion would select a deictic over a non-deictic
expression.
Abbreviations Used
Orthography used for Lao in this book follows Enfield (2007). Orthography used for
Kri follows Enfield and Diffloth (2009). Following are the conventions used for interlinear morphemic glossing:
1        1st person
2        2nd person
3        3rd person
B        bare
dem      demonstrative
dir      directional
dist     distal
fac      factive
loc      locative
neg      negation
news     news marker
nonprox  non proximal
pcl      particle
pl       plural
prf      perfect
tpc      topic