How Flexible Idiom Is
Christiane Fellbaum*
How flexible are idioms? A corpus-based
study
https://doi.org/10.1515/ling-2019-0015
1 Introduction
Analyses of spoken and written language reveal a high percentage of Multi Word
Units (MWUs), both in terms of types and tokens (Jackendoff 1995; Moon 1998;
Cowie 1998). MWUs comprise a broad range of phrases, including idiosyncratic
collocations like brush one’s teeth and answer the door, formulae like Happy New
Year! and idioms like pull someone’s leg (for a typology of MWUs see Mel’čuk
1995; Langlotz 2006; Fellbaum 2015a; inter alia). Such “pre-fabricated” phrases
persist in the language, perhaps because they often refer to complex but common
events and situations, and save speakers the effort of encoding appropriate
messages anew (Nunberg et al. 1994; Fellbaum 2007a).2
1 This article is based on a talk given at the BSGL 2015 meeting on idioms in Brussels (Fellbaum
2015b).
2 Domain-specific language, where the topics tend to be limited, may make even greater use of
“pre-fabricated” phrases. Kuiper (1996) studied the speech of sports commentators and
auctioneers. Noting that they are under pressure to talk fast when reporting in real time on rapidly
evolving situations that they need to hold in short-term memory, Kuiper shows that such
speakers resort to a repertoire of formulaic language, which allows them to pack complex
information into standardized forms.
where one or more complements of the verb are part of the idiom, though not
necessarily all. Examples include the English idioms hit a nerve, give somebody
the glad eye, and (not) look a gift horse in the mouth.3 The open argument position
that is not lexically filled by an idiom component is most often the subject. Many
idioms have in addition an open indirect object argument (read Y the riot act), or
an open object of a preposition (keep tabs on Y) or a possessive (get Y’s goat).
We consider both “plausible” and “implausible” idioms, exemplified by
smell a rat and fry one’s brains, respectively; the former, but not the latter,
have possible alternate non-idiomatic readings.4 Of particular interest are
3 We will not consider Light Verb Constructions, Support Verb Constructions or Vague Action
Verbs like take a bow, make a phone call, have a drink and give a groan (Grimshaw and Mester
1988; Kearns 2002 [1988]; inter alia). We exclude sentences like the early bird gets the worm,
routine formulae like have a nice day, idioms that cannot be assigned to any phrasal category
such as say when, like father like son as well as lexically specified collocations such as answer
the door. See Fellbaum (2014) for a broad classification of idioms.
4 A manual classification of frequent German idiom candidates retrieved from a one billion-
word corpus showed that the literal meaning is intended about half the time on average, with
significant variation across individual idioms (Fellbaum 2007b).
Corpus data suggest that the latter, with a Patient argument in the open subject
position, is by far the more frequent form of this idiom (192 tokens vs. 44 tokens
in the one billion word corpus, see Section 4.1.1 for more discussion of corpus-
based frequency).
Another hallmark of the canonical form is that it is neutral with respect to
information structure and does not require a contrast or assume prior reference
to (and thus givenness of) a constituent in the context in which the idiom is
used. In (4), an example of a non-neutral context, the beans are contrasted with
the beans that were spilled in the author’s previously published memoir, and
this contrast licenses the topicalization of the noun phrase in the second
sentence:
(4) The book is a collection of anecdotes and political opinions, rather than a
sequel to her best-selling memoir Spilling the Beans. The beans she spilled in
that book included an account of how her alcoholic father beat her as a
child.
https://www.telegraph.co.uk/news/6179497/Clarissa-Dickson-Wright-
They-dont-call-me-Krakatoa-for-nothing.html
The aim of this paper is to investigate whether, and to what extent, VP idioms
are in fact the kind of monolithic, inflexible structures that they are often
portrayed to be. To this end, we examine corpus data for syntactic and lexical
variations of familiar idioms.5
Operations that apply to entire idioms rather than idiom-internal constitu-
ents will not be further discussed in this paper. These include negation (William
F. Cody, who did not kick the bucket until 1917), questioning (When did he kick the
bucket?), variations in mood, tense and aspect (it would make your grandmother
very happy if you had kids before she kicks the bucket; the washing machine was
kicking the bucket when suddenly it started working just fine) and use of a modal
verb (80 tablets and he could not kick the bucket?). There is general agreement in
the literature that these operations are freely available to all idioms, regardless
of semantic compositionality (Stathi 2007; Schenk 1995; Mel’čuk 1995).6
We consider variations of the canonical form that result from the following
syntactic operations on idiom-internal components: topicalization, clefting, pas-
sivization, relativization, wh-questioning, ellipsis and pronominalization.
5 Our distinction between syntactic and lexical operations does not imply any theoretical
claims pertaining to different levels of grammatical analysis.
6 In fact, statements that aspectual variations are acceptable only when the aspect agrees with
that of the idiom’s meaning (Everaert 2010; Mel’čuk 1995, inter alia) are refuted by attested
examples like (14).
Ernst points out that the adjective here does not in fact modify the noun but
rather the entire VP and that it must therefore be considered semantically
external. (5) can be paraphrased roughly as “for linguists/linguistically, this is
grist for the mill.” The adjectival modification here does not necessarily entail
that the noun receives a semantic interpretation. A related case is (6), called
"metalinguistic" by Stathi (2007), where the adjective does not modify the noun
but rather comments on the linguistic (idiomatic) status of the phrase:
(6) An agenda of adventures that you were made to experience and a lifestyle
that you were meant to live before you kicked the proverbial bucket?
https://books.google.com/books?
id = 0XtKBQAAQBAJ&pg = PA51&lpg = PA51&q = %22you + kicked
+ the + proverbial + bucket%22&source = bl&ots = qPuGiS3pBg&sig
= rARQhJ0Sy8iNOInUNBpq0Oy2cW0&hl = en&sa = X&ved = 2ahUKEwj1j7D-
orvLeAhWJd98KHdMzC6EQ6AEwBXoECAgQAQ#v = onepage&q = %22you
%20kicked%20the%20proverbial%20bucket%22&f = false
Both external and metalinguistic noun modifications are available to all idioms,
regardless of compositionality.7
2 Related work
Idioms have received attention from linguists, lexicologists, lexicographers,
computational linguists and psycholinguists. Among the analyses proposed by
linguists, we distinguish those that focus on the syntax of idioms, those that
consider syntax in conjunction with semantic compositionality, and those that
categorically deny any semantic compositionality.
Reagan bandwagon, where the adjective horse-drawn modifies the literal meaning of bandwa-
gon, although this noun is not interpreted literally within the idiom. Nicolas considers such
cases to be “word play,” “external to the grammar of idioms.” See Section 7 for a discussion of
word play.
(7) And no one here knows when the bell will toll or when the bucket will be
kicked
https://cherylcapaldotraylor.com/2016/02/29/take-the-leap/
https://books.google.com/books?id = YIdCmIrZZxEC&pg
= PA195&lpg = PA195&dq = %22the + bucket + will + be + kicked%
22&source = bl&ots = sXEqD5Mski&sig = qfKMK4D6c6jBPjQhMKHcIwMTn6-
k&hl = en&sa = X&ved = 2ahUKEwj-0sa7p97eAhVC3VMKHWrTAMo
Q6AEwCHoECAYQAQ#v = onepage&q = %22the%20bucket%20will%20be%
20kicked%22&f = false
(8) this, coupled with his diabetic cum hypertensive condition, was what led to his
kicking the bucket in the early hours of last Saturday, November 18, 2017
http://www.peacefmonline.com/pages/local/news/201711/336344.php?
storyid = 100&
Lebeaux (2000) attempts to integrate idioms into the core grammar and to
capture their behavior in terms of broad rules. He argues that idioms are
constructed like partial phrase markers similar to those characterizing certain
stages of language acquisition. Both can be accommodated in a “sub-gram-
mar” framework that is distinct from, but compatible with, the full grammar
that defines competence. Lebeaux distinguishes between a class of “pre-
merger” and a class of “post-merger” phrases. For example, pre-merger
idioms, including take advantage of, have a variable determiner (take no
advantage of) and are subject to syntactic operations like passivization
(advantage was taken of Jim); post-merger idioms like kick the bucket include
a definite determiner and cannot undergo syntactic operations like passive.
While Lebeaux’s proposal for an idiom grammar is interesting in that it
integrates language acquisition and adult grammar, it, too, is based on
constructed data that conflict with attested data. Moreover, the claim that
in an idiom with a definite NP the determiner is invariant is contradicted by
idioms like break the ice, which occurs freely with negation (break
no ice) or a demonstrative (break this ice):
(9) After talks, India and Pakistan break no ice on how to demilitarize the no-man’s
land above the Siachen glacier.
https://in.reuters.com/article/india-pakistan-events/timeline-flashpoints-and-
flare-ups-in-india-pakistan-ties-idINDEE83703C20120408
(10) How to break this ice and how not to spurt out unwanted topics and place
yourself in an awkward situation?
https://www.blinddate.com/blogs
(11) "Thank God" said a Georgia representative, and the ice was broken.
https://www.google.com/search?q = %22the + ice + was + broken%22&ie = utf-
8&oe = utf-8&client = firefox-b-1-ab
(12) In this latest meeting between leaders of the two countries, the hatchet was buried
https://www.theeastafrican.co.ke/news/ea/Rwanda-France-relations-bury-
hatchet/4552908-4580862-format-xhtml-112t1ak/index.html
(13) But when a greedy nephew took her to court to get a piece of the pie, the beans
were spilled.
https://www.forbes.com/pictures/eiif45gkek/catherine-lozick/#4f96c9555f45
Accounting for the syntactic flexibility of idioms in purely structural terms does
not do justice to the data.
Syntactic and lexical flexibility has been linked to semantic transparency. This is
an intuitively appealing approach, as it breaks down the hard boundary
between literal, freely composed and non-literal, possibly frozen language,
and could account straightforwardly for syntactic variations from the canonical
form. But this view is not universally accepted.
Sabban (1998) and Mel’čuk (1995) are among those who assert that all VP
idioms are non-compositional. Consequently, their morphosyntactic behavior is
that of simplex verbs and they can show variation only in the verb’s tense,
aspect and number (as far as it is compatible with the figurative meaning), as
well as negation and questioning of the entire VP.
Schenk (1995) categorically states that idiom chunks do not have meaning
but allows for some variation. He distinguishes two types of syntactic opera-
tions. The first does not affect meaning and can operate on meaningless expres-
sions such as idioms. These operations comprise raising, passivization and yes-
no-questioning.8 The second kind of syntactic operations distinguished by
Schenk apply to meaningful expressions only. They include topicalization, rais-
ing, control structures, clefting, pseudo-clefting, modification, relativization,
pronominalization and wh-questioning. Since Schenk considers all idiom
components to be semantically unanalyzable, idioms cannot appear in these
syntactic configurations. However, all data cited by Schenk are constructed, and
throughout this paper we will cite corpus examples showing that speakers
produce syntactically modified idioms in ways that Schenk would fail to predict.
Everaert (2010) considers the lexical representation of idioms within the
framework of generative grammar. He argues that syntactic flexibility is not
tied to semantic transparency. Rather, the properties of a given idiom compo-
nent are connected to those of all senses of the same word form in the lexicon,
and these senses are always available. For example, the lexical encodings of kick
and bucket in their use as idiom components share properties of the literal
meanings of these lexical items, and variations like passivization are licensed,
as they are for the non-idiomatic senses. However, Everaert’s analysis also
entails that the aspectual properties of kick are retained in the idiomatic use,
and he specifically dismisses structures like he kicked the bucket slowly as
being as ill-formed as he kicked the ball slowly. But speakers do produce such
data, indicating that the lexical entry of the idiom component kick is not simply
merged with non-idiomatic senses of that verb:
(14) Our computer here at home slowly, ever so slowly, kicked the bucket.
asksistermarymartha.blogspot.com/2008/11/its-alive.html
Abeillé (1995) argues that while some idioms are semantically decomposable,
decompositionality is not systematically associated with, and does not predict,
syntactic flexibility. Working within a Tree Adjoining Grammar (TAG), where
idioms are represented as elementary "frozen" trees associated with a semantic
representation, Abeillé proposes that idioms follow the same syntactic rules as
corresponding non-idiomatic structures, a position that will be argued for, and
supported by corpus data, in this paper as well, though not within the
grammatical framework assumed by Abeillé.
8 Bargmann and Sailer (2015) similarly separate purely syntactic operations from those that are
semantically motivated. They take a crosslinguistic perspective on the syntactic flexibility of
non-decomposable idioms and argue that the German obligatory verb-second syntax in declarative
sentences allows non-referential nominal idiom chunks to be fronted in topicalization and
passivization while remaining “semantically neutral”. By contrast, in English such dislocated
NPs are claimed to be topics, and thus topicalization and passivization of non-compositional
idioms are licensed only under the appropriate discourse conditions and information structure.
A comprehensive proposal regarding the correlation between semantic
decompositionality and syntactic flexibility is made by Nunberg et al. (1994),
who examine a large number of English idioms and argue that the majority are
in fact semantically decomposable. To distinguish the semantic compositionality
of freely generated phrases like pull the rope and knot strings from idioms
like pull strings, they introduce the notion of “idiomatically combining
expression,” whose parts carry conventionalized meanings, specific to the idiom. Thus,
pull strings derives its meaning (roughly, “exploit personal contacts”) from
directly identifiable correspondences between its constituents and their
idiom-specific meanings (pull = exploit, strings = personal contacts). “Idiomatically
combining” refers to the fact that neither pull nor strings carries these meanings
outside of the idiom. Other idiomatically combining phrases are spill the beans
and let the cat out of the bag. Such decomposition of idiomatic phrases is
consistent with analyses that posit metaphorical status for idiom components
like cat and strings, within the context of specific idioms (Gibbs and Nayak 1989;
Glucksberg 1993; Geeraerts 1995). Nunberg et al. (1994) argue that the semantic
interpretation of idiom constituents allows syntactic operations on these consti-
tuents, such as topicalization, pronominalization and VP ellipsis.
Nunberg et al. (1994) state that in contrast to the constituents of idiomatically
combining expressions, the components of “idiosyncratic phrasal constructions”
like kick the bucket and saw logs are not semantically interpreted, though
the meaning of saw logs may be more intuitively apparent than that of kick the
bucket, as it suggests the kind of sounds a sleeper may make. Idiosyncratic
phrasal constructions, whose constituents are not metaphors, do not show
syntactic flexibility, according to Nunberg et al. However, structures that Nunberg
et al. rule out, such as the passivization of kick the bucket, are attested,
indicating that semantic transparency is not sufficient to account for the
flexibility of idioms.
Recognizing that idioms are more flexible than often claimed, Kay et al.
(2012) propose a lexical theory of idioms, citing rich attested data. They conclude
that semantically compositional idioms (like let the cat out of the bag) are
flexible, and, conversely, that the constituents of inflexible idioms do not receive
a semantic interpretation. However, the data we retrieved from corpora and
report on in Sections 5.2, 6.1 and 7.2 show that speakers also modify idioms
that are not semantically compositional.
Corpus analyses show that the canonical or citation form is the most frequent and
thus the most familiar one. It may well reflect the way the idiom is represented
in speakers’ mental lexicon.9 Put differently, variations are relatively infrequent
and unfamiliar, hence speakers may reject them, especially when considered
outside of a context.
Idiom components like the verbs and nouns in pull strings and spill the beans
readily lend themselves to semantic interpretation and paraphrases of the
idioms like ‘use personal connections’ and ‘reveal a secret.’ Tying such semantic
compositionality to syntactic flexibility and modification is intuitively convin-
cing. However, there is widespread disagreement about the compositionality of
many idioms and, related, their flexibility. Speakers differ in the way they assign
meaning to idiom components and to entire idioms, and different paraphrases
and mappings of idiom components to metaphoric readings may account for the
divergent judgments of constructed data that one finds in the literature.
Gibbs (1995) makes an important point in arguing that speakers do not
access the same invariant, literal meaning when they encounter a word, and
that one cannot assume that idioms or their components have easily determined
literal meanings. Indeed, speakers do not always agree on the precise meaning
of an idiom or on the interpretation of idiom components, and this may affect
their acceptability judgments. For example, Abeillé and Schabes (1989) para-
phrase grist for someone’s mill as ‘help.’ This interpretation precludes separate
meanings for grist and mill, unlike a more specific interpretation of this idiom
that includes a reference to someone’s particular situation or agenda (the mill)
and the entity or event that has a favorable effect on it (the grist). Examples (15,
16) suggest such an interpretation:
(15) Each detail that leaks out becomes grist for the Democrats’ mill
https://www.washingtonpost.com/blogs/right-turn/wp/2017/06/19/how-
will-we-miss-congress-if-it-doesnt-go-away/?noredirect = on&utm_term = .
2f4b2c45e5a8
(16) That makes grist for the Democrats’ mill and they are grinding it night and day.
https://newspaperarchive.com/austin-daily-herald-sep-07-1957-p-15/
(19) I fell off that wagon for a year or so, but drank decaf for 4 years before that.
https://twitter.com/MarkMaddenX/status/959100902329241602
(20) before I fell off that wagon and started smoking again
https://www.quora.com/What-happens-to-addicted-people-when-they-enter-
a-long-coma
The specific meanings of wagon here vary across speakers, and the noun
appears to have undergone a change from its constituency in the monolithic
VP to an independent metaphor referring to any unhealthy or undesirable habit.
Similar to (17)–(20), (21) and (22) are examples of semantic re-analysis where
the noun in face the music is assigned a meaning (an unpleasant situation) that
is interpreted relative to a specific context:
(21) There is some responsibility; he might have to face that music, that much is
sure, but not murder or manslaughter charges.
https://books.google.com/books?id = 3S0BAAAQBAJ&pg = PT223&lpg =
PT223&dqx = %22face + that + music%22&source = bl&ots = Rqx3iRMD
G2&sig = USMcw9NfMKW9uf1CLYf5bmZ5EMA&hl = en&sa = X&ved = 2ahU-
KEwimjqDhie7eAhXG3VMKHbgtDf44ChDoATAJegQIARAB#v = onepage&q-
= %22face%20that%20music%22&f = false
(22) Harold Wilson had to face this music in 1967, and Callaghan and Healey
needed the IMF to bail them out in 1976
https://www.terrafirma.com/an-alternative-perspective-article/items/its-
not-over-yet-the-implications-of-the-credit-crunch.html
Idioms like beat around the bush, sit on the fence, and be on one’s high horse are
considered non-composing; the entire VP refers to a specific situation, form of
behavior or attitude. Given a context where this situation, behavior or attitude
is known to the interlocutors, we find the nouns preceded by a demonstrative:
(23) He asked if I would go into psychology, and rather than beat around that
bush again, I said yes.
https://books.google.com/books?id = m5HrCQAAQBAJ&
pg = PT427&lpg = PT427&dq = %22rather + than + beat + around
+ that + bush + again%22&source = bl&ots = LM54CrPmhc&sig
= G4tp1GOEINhDUgFfGf8vkqyquJM&hl = en&sa = X&ved = 2ahUKEwivkb-
u0puveAhUQ0VMKHcMgAa0Q6AEwAHoECAAQAQ#v = onepage&q = %
22rather%20than%20beat%20around%20that%20bush%20again%
22&f = false
(24) Thinking About a New Home? Don’t Sit On That Fence Too Long!
https://www.facebook.com/Sharri.Abii.Realtor/photos/hey-youdont-sit-
on-that-fence-too-long-if-youre-considering-buying-a-house-in-20/
777224589135660/
(25) You had better come down from that high horse, and own up that you set the
Maud afire.
https://www.gutenberg.org/files/23351/23351-h/23351-h.htm
3.4 Context
of the participants in the experiments were rated as “correct” when they agreed
with a majority of pre-classified judgments, and that there never was full or
nearly full agreement among the participants. This suggests that acceptability
judgments differ across speakers, at least for "invented" idioms and contexts.
4 Corpus data
Tabossi et al. (2009) importantly emphasize the need for context when accept-
ability judgments are elicited. To better support their claim that, given appro-
priate contextual embedding, syntactic flexibility is available to idioms just as it
is to freely composed phrases, we examine data attested in corpora. By doing so,
we do not ask how speakers judge a given structure but merely analyze what
speakers produce.
10 A URL will be provided for each English example, while the German data are all from the
corpus described in Geyken (2007) and accessible via
http://kollokationen.bbaw.de/htm/idb_de.html.
with a linguistic search engine designed for this purpose (Geyken and Sokirko
2007; Herold 2007).
Fellbaum (2007b) and Neumann et al. (2004) report on the creation of the
database of German idioms found in the corpus
(http://kollokationen.bbaw.de/htm/idb_de.html). A target list of 817 German idioms
was manually created and the corpus was searched with the goal of extracting
morphosyntactic and lexical variations.11 Importantly, the searches did not target any
pre-selected variations. Regular expressions written explicitly for this purpose
(Herold 2007) allowed for the retrieval of a maximal number of variations
of the target idioms. The queries focused on the idiom constituents that are
arguably its core lexemes (equivalent to, for example, English bite and bullet)
but allowed for lexical divergence from the dictionary form such as compounds
and semantically related words like synonyms, as extracted from a
thesaurus-like resource.
The regular expressions moreover were designed to capture all inflected and
derivational forms. Furthermore, the order of the lexemes was not specified, so
that structures like passive and topicalization were retrieved. Allowing for vari-
able distance between the lexemes also returned such variations as adjectival
modification of nouns.
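The design of such a query can be illustrated with a minimal sketch in Python. It is not the query language of Herold (2007); the function name build_idiom_query, the max_gap parameter and the English stems are assumptions made purely for illustration. The pattern accepts the core lexemes in either order, lets each stem carry an ending (suffixes only, a simplification), and tolerates a bounded number of intervening words.

```python
import re
from itertools import permutations


def build_idiom_query(stems, max_gap=5):
    """Compile a pattern that finds all stems in any order.

    Hypothetical helper, not the search engine described in the text.
    Each stem may be followed by further word characters (inflection),
    and up to max_gap words may stand between two consecutive stems.
    """
    # Zero to max_gap intervening words, then the separator before the next stem.
    gap = r"(?:\W+\w+){0,%d}?\W+" % max_gap
    alternatives = []
    for order in permutations(stems):
        parts = [r"\b" + re.escape(stem) + r"\w*" for stem in order]
        alternatives.append("(?:" + gap.join(parts) + ")")
    return re.compile("|".join(alternatives), re.IGNORECASE)


# Illustrative English stems for the core lexemes bite and bullet.
query = build_idiom_query(["bit", "bullet"])

examples = [
    "He finally bit the bullet.",                  # canonical order
    "The bullet was bitten at last.",              # passive: noun before verb
    "She decided to bite the proverbial bullet.",  # modified noun
    "The bullet hit the wall.",                    # no verb stem present
]
for sentence in examples:
    print(bool(query.search(sentence)), sentence)
```

Because a stem such as bit also matches unrelated words like bitter, hits from a query of this kind would still have to be sorted manually into idiomatic and literal uses.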
An example is jemanden ins Bockshorn jagen (lit. chase somebody into the
buck’s horn, ‘intimidate’). The noun Bockshorn is considered a lexeme that does
not occur outside the idiom and that carries no meaning as an idiom constituent.
Searching for Bockshorn alone yielded 285 hits. Searching with the regular
expression
leaves the verb unspecified and allows for variation in the noun, and produces
twenty-seven hits, some with variant spellings of the verb and the noun as well
as with a different, semantically similar verb. It also yielded a noun Hasenhorn
11 Fazly et al. (2009) and Zhu and Fellbaum (2015) represent efforts to automatically extract
idiomatic expressions from a corpus, based on statistical measures of co-occurrence of tokens.
By contrast, we proceeded from a predefined set of frequent and familiar English and German
Verb Phrase idioms.
The idiom was found most often in the context of a negation (141 tokens vs. 36
affirmative tokens), suggesting its “canonical” status as a Negative Polarity
Item.
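The proportion implied by these counts can be made explicit with a short calculation. The helper name share is hypothetical; the two figures are the token counts reported above.

```python
def share(count, other):
    """Return count as a fraction of the combined total."""
    return count / (count + other)


# Token counts reported in the text: negated vs. affirmative contexts.
negated, affirmative = 141, 36
print(f"{share(negated, affirmative):.1%} of {negated + affirmative} tokens are negated")
```

On these figures, roughly four out of five idiomatic tokens occur under negation.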
The flexible search queries also allowed for variations of prepositions
that were part of the canonical structure. The retrieved tokens were manually
sorted into true positives (with an idiomatic reading) and false positives,
sequences containing one or more of the core constituents but with literal
interpretation. Across all idioms about half of the tokens received a literal
reading. There was variation – tokens with idiom-specific lexical items like
Bockshorn were unsurprisingly used idiomatically in most of the retrieved
tokens. In a few cases, the linguist sorting the tokens concluded that both
idiomatic and literal readings were possible.
For each target idiom, a manually created entry in the database shows
the kinds of attested variations and their frequency in the corpus. The
“canonical” form, following the structure given in (1) for most idioms, was
by far the most frequent in all cases. The frequency of non-canonical varia-
tions of a given idiom differed across the idioms, as did the number of
retrieved examples with a specific type of variation. Most variations for a
given idiom were in the single digits; for some idioms, no syntactically
specified variation was found, but equivalent variations were found for
other idioms with a similar syntactic canonical form. All data can be accessed
on the website http://kollokationen.bbaw.de/htm/idb_de.html.
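The bookkeeping behind such an entry might be sketched as follows. This is an assumption about one possible representation, not the schema of the actual database; the class name IdiomEntry, its fields, the variation labels and the toy tokens are all invented for illustration and are not corpus figures.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IdiomEntry:
    """Hypothetical record for one target idiom."""
    citation_form: str
    variations: Counter = field(default_factory=Counter)  # idiomatic tokens by type
    literal_hits: int = 0                                  # false positives

    def add_token(self, variation, idiomatic=True):
        """Record one manually sorted corpus token."""
        if idiomatic:
            self.variations[variation] += 1
        else:
            self.literal_hits += 1

    def most_frequent(self):
        """Return the variation label with the highest token count."""
        return self.variations.most_common(1)[0][0]


# Toy data, invented for illustration only.
entry = IdiomEntry("kein Blatt vor den Mund nehmen")
for label in ["canonical", "canonical", "canonical", "passive", "topicalization"]:
    entry.add_token(label)
entry.add_token("canonical", idiomatic=False)  # a literal use, counted separately

print(entry.most_frequent(), dict(entry.variations), entry.literal_hits)
```

Keeping literal hits apart from the variation counts mirrors the manual separation of true and false positives described above.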
Like many idioms, this is a negative polarity item. Its origin–a very old custom
whereby actors in the theater covered their faces so as to remain anonymous and
protected from possible prosecution for using obscene or provocative language–
is unknown to everyday contemporary speakers. Mund ‘mouth’ is assigned a
meaning; it may refer to the mouth of the typical subjects of this VP (speakers)
(30) Bei BMW wird kein Blatt vor den Mund genommen
at BMW is no sheet in front of the mouth taken
at BMW no sheet is taken in front of the mouth
‘people at BMW speak out openly’
In (33), the writer plays on the polysemy of Blatt: the pronoun in the idiom refers
back to the antecedent with the “newspaper” reading:
12 Note that the adjectival modification here is not of the external kind studied by Ernst (1981),
i.e. hostility to the republic strictly modifies the speaker (metonymically his mouth) and not the
entire sentence.
In (34), the writer interpreted Blatt as ‘sheet’ and created the compound
meaning ‘sheet of music’ in a musical context about the orchestra conductor Herbert
von Karajan, referring to his unrestrained performance:
The quantification here does not entail a metaphoric reading of Blatt; the writer
expresses the opinion that a government speaker takes great care not to
speak too openly.
Some idiom components are not found outside of their use in idioms. An
example is gift horse in the composing phrase (not) look a gift horse in the
mouth. While the meaning of this compound noun is readily interpretable as
‘gift,’ corresponding to that of its first member, other lexemes are semantically
opaque constituents of non-composing idioms. An example is the common
German idiom in (36):
Such data attest to variations on the “canonical” form of the idiom and the
modification of a non-referring constituent.
6 English data
We now turn to English data, focusing on non-compositional idioms, retrieved
from the Web.
Kick the bucket is perhaps the English idiom par excellence, and it is cited in
many papers as the prototypical frozen idiom whose constituents do not receive
any semantic interpretation. Thus a frequently encountered claim is that the
bucket was kicked last night can only receive a literal interpretation (Nunberg
et al. 1994; Glucksberg 1993; inter alia), under the reasoning that neither the verb
nor the noun can be assigned a meaning as parts of the idiom. However, a Web
search yields examples of passivization such as these:
(42) And no one here knows when the bell will toll or when the bucket will be
kicked.
https://cherylcapaldotraylor.com/2016/02/29/take-the-leap/
(43) Live life to the fullest, you never know when the bucket will be kicked.
https://www.puff.com/forums/vb/general-cigar-discussion/84813-avo-le05s-
what-do-what-do-3.html
These naturally occurring examples clearly contradict the claim that this idiom
is blocked from the passive construction (Nunberg et al. 1994; Schenk 1995; inter
alia). (42) and (43) have the flavor of impersonal passives, which can be formed
with intransitive verbs and a semantically empty “dummy” subject as in (44):
No dummy subject is needed in (42)–(43), where the object of the verb occupies
the subject position. Like there, it is semantically empty.13
13 Tabossi et al. (2009) rule out the passive for the idiom miss the boat based on similar
reasoning. But numerous examples can be found on the Web, such as It seems the boat was
missed in not building a fertilizer plant right at the new sewage treatment plant. (https://cityroom.
blogs.nytimes.com/…/turn-piles-of-waste-into-piles-of-cash-city-asks). These have the same
impersonal passive flavor.
We also find the idiom in nominalized forms, with adjectival modifiers and
quantifiers, and as a member of a compound:
(46) Petra prays for her hateful hubby’s untimely kicking of the bucket
http://gone-and-forgotten.blogspot.com/2014/10/truly-gone-forgotten-
foes-gorna-lord-of.html
(47) I am young but have experienced more bucket kicking within my immediate
family and circle of family friends than I can shake a fist at
https://www.straight.com/confessions/1505/live-kicking-bucket
(48) here’s a short list of things I hope to continue to avoid from now until bucket
kicking time.
https://www.boothbayregister.com/article/my-non-bucket-list/7624
(49) Now he (Mr. H.) could also let a cat out of the bag – to show how the
opposition jumps
https://books.google.com/books?id = LqpCAQAAMAAJ&pg = PA639&lpg =
PA639&dq = %22let + a + cat + out + of + the + bag%22&source = bl&ots =
Fcm5pIrqvq&sig = I7lKDjB6kQqj6eyExunKBZtqnak&hl = en&sa = X&ved = 2-
ahUKEwjQu5PuqvLeAhWNrFMKHTiC70Q6AEwDHoECAMQAQ#v = onepag-
e&q = %22let%20a%20cat%20out%20of%20the%20bag%22&f = false
(50) Trump spilled his beans that he has no intention of releasing his tax returns
when he slipped up and said, “And only a fool would give a tax return…”
https://www.redstate.com/california_yankee/2016/04/01/trump-slipped-
spilled-beans-releasing-tax-returns/
(52) You can study hard, burn that midnight oil, and attempt another A plus.
https://books.google.com/books?id = 9-ScBwAAQBAJ&pg
= PT145&lpg = PT145&dq = %22burn + that + midnight + oil, + and + attempt
%22&source = bl&ots = AoadDOh_yd&sig = t1NWImhdkTmjqg-
AcP0fKgQiF0E&hl = en&sa = X&ved = 2ahUKEwih-Pb5vfLeAhWig-
AKHeYKDF4Q6AEwAHoECAEQAQ#v = onepage&q = %22burn%20that%
20midnight%20oil%2C%20and%20attempt%22&f = false
(53) We had the hardest time trying to get that boy to hit those books. But, we
succeeded. He ended up going to engineering school.
https://books.google.com/books?id=MNprioytNPYC&pg=PA101&lpg=PA101&dq=%22hit+those+books%22&source=bl&ots=tlDgmkRuJv&sig=QDuHjD9u7_ZERo3y6yfebCfNcs8&hl=en&sa=X&ved=2ahUKEwixicfnvvLeAhWBmOAKHSlRB7s4ChDoATAIegQIAhAB#v=onepage&q=%22hit%20those%20books%22&f=false
The demonstratives here do not function deictically; rather, they serve to
establish a common ground and a shared perspective among the speakers. This use
of demonstratives is explored in Lakoff (1974), though she did not consider the
kind of non-referring nouns we find in the idioms.
canonical idiom components with a lexical item that is specific to the context;
such substitutions typically achieve a humorous effect. These modifications
indicate that speakers have a representation of the canonical form,14 and that
the modification is deliberate. Importantly, the data show that the lexical
substitutions are systematic, similar to paraphrases and puns.
(54) To say that Simone had let the cat out of the bag was an understatement.
She’d let a lion out of the bag.
https://books.google.com/books?id=fIcRCPvI1Y4C&pg=PA157&lpg=PA157&dq=%22She%27d+let+a+lion+out+of+the+bag%22&source=bl&ots=bp4JfjsEA3&sig=tS7G25acTKwgLyyDSHFcAQ52GI&hl=en&sa=X&ved=2ahUKEwiR2enWqd7eAhWLZd8KHa6_An4Q6AEwAHoECAAQAQ#v=onepage&q=%22She’d%20let%20a%20lion%20out%20of%20the%20bag%22&f=false
(55) Barack Obama got himself in trouble when he let the tiger out of the bag
about how he wanted to “spread the wealth around.”
https://startthinkingright.wordpress.com/tag/we-are-all-socialists-now/
Further examples of substitutions with semantically related words are (56) and (57):
(56) Kelly Rowland appears to have let the kitten out of the bag. The recently
married singer has been the subject of pregnancy rumors for a while
https://www.eonline.com/news/550004/is-this-kelly-rowland-announcing-that-she-s-pregnant-see-the-pic
(57) when you asked him to share his knowledge with you he let the Angora out of
the bag?
https://books.google.com/books?id=sYxFAQAAMAAJ&pg=PA15&lpg=PA15&dq=%22he+let+the+Angora+out+of+the+bag%22&source=bl&ots=oKTSBEiXcY&sig=8WpFIuIwOHDdt3DrLXtxrPUC0qk&hl=en&sa=X&ved=2ahUKEwivnvjMqt7eAhUSVd8KHbbhD1AQ6AEwAHoECAAQAQ#v=onepage&q=%22he%20let%20the%20Angora%20out%20of%20the%20bag%22&f=false
Speakers also clearly access the phonological form of the idiom constituents, as
shown by the following examples from the Web, which allude to the secret sexual
adventures of the golfer Tiger Woods (whose first name is homonymous with the
name of the animal) and the actor Tom Cruise:
(58) More importantly, now that the Tiger is out of the bag, what purpose does it
serve not to release an official photo? [of President Obama playing golf with
T.W.]
https://theweek.com/articles/467558/5-ways-looking-obamas-secret-golf-game-tiger-woods
(59) Nevertheless, it’s quite a shame that someone let the Tomcat out of the bag
http://www.mtv.com/news/2760121/tom-cruise-steals-tropics-thunder/
The example below involves a sense of kitten that differs from the feline one:
(60) Let that sex kitten out of the bag. You deserve a good romp.
https://epdf.tips/house-of-lies.html
Such data suggest that speakers access the literal meanings of idiom constituents
along with their semantic and formal (phonological) properties. They thus support
proposals that psycholinguists have made on independent grounds (Cutting and
Bock 1997; inter alia).
To preserve the idiomatic meaning, the substitution of a “canonical” idiom
constituent with another lexical item must be restricted to semantically similar
lexemes.
Cat is semantically similar to dog. Word association norms show that the
response rate of dog to the stimulus cat is 66.7%, and 55.1% of responses to the
stimulus dog are cat (Moss and Older 1996). The semantic similarity of this word
pair is also reflected in the high similarity of their semantic vectors, which reflect
shared contexts (for example, Boyd-Graber reports a similarity score of 0.9).15
15 http://www.umiacs.umd.edu/~jbg/teaching/CMSC_726/13b.pdf
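As a rough illustration of what a similarity score between semantic vectors measures, the following minimal sketch builds co-occurrence vectors from a handful of invented sentences and compares them with cosine similarity. The toy sentences, the window size, and the function names are assumptions made for this illustration only; they are not the data or the method behind the score cited above, and the resulting numbers are not comparable to it.

```python
import math
from collections import Counter

# Invented toy sentences (an assumption for illustration, not corpus data).
TOY_SENTENCES = [
    "the cat sleeps on the sofa",
    "the dog sleeps on the sofa",
    "we feed the cat every morning",
    "we feed the dog every morning",
    "she packed the bag for the trip",
]

def context_vector(target, sentences, window=2):
    """Count the words occurring within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == target:
                start, end = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[start:end], start=start)
                    if j != i
                )
    return counts

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

cat = context_vector("cat", TOY_SENTENCES)
dog = context_vector("dog", TOY_SENTENCES)
bag = context_vector("bag", TOY_SENTENCES)
print(cosine(cat, dog), cosine(cat, bag))
```

In this toy setting cat and dog occur in identical contexts and therefore score higher than cat and bag; real distributional models are trained on large corpora and yield graded scores.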
Thus it is not surprising that we find speakers substituting dog for cat in the
idiom, in contexts where dog in one of its literal meanings is present:
(61) The Training Cesar’s Way Clinics and Fundamentals of Dog Behavior and
Training will continue at the Dog Psychology Centers in California and Florida,
and we have some nice surprises coming down the line, one of which I’m
particularly proud of, although I can’t let the dog out of the bag yet.
https://www.cesarsway.com/cesar-millan/cesars-blog/bark-to-the-future
(62) Not to let the dog out of the bag, but some of the creations involve jalapeno
poppers, mac and cheese and some everlovin’ BLT action. A Split Rail craft
beer will be paired with each of the gourmet hot dogs to create a unique
dining experience.
https://www.manitoulin.ca/elliotts-fundraiser-feature-gourmet-hot-dogs-cause/
(63) How do you avoid bringing chemicals and poisons into your home while
keeping it pest free? We live for this stuff so spill the (green) beans.
https://www.younghouselove.com/ants-in-my-pans/comment-page-3/
(64) Selena Gomez’s mother…breaks silence about her daughter’s kidney transplant.
Now that Selena Gomez has spilled the kidney beans, her mom has
something to say.
http://oceanup.com/2017/09/18/selena-gomezs-mother-mandy-teefey-breaks-silence-about-her-daughters-kidney-transplant/now-that-selena-gomez-has-spilled-the-kidney-beans-her-mom-has-something-to-say-feature/
Note that kidney in (64) has two different meanings and the similarity is
phonological rather than semantic.
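The substitutions in (54)–(57), (61) and (62) can be viewed as a fixed frame, let Det N out of the bag, with an open noun slot. A minimal sketch of how such variants might be pulled out of running text is given below; the regular expression, the helper names, and the restriction to three determiners are illustrative assumptions, not the search procedure used for this study.

```python
import re

# Hypothetical frame: "let" + determiner + up to two modifiers + noun
# + "out of the bag". The determiner list is an assumption.
FRAME = re.compile(
    r"\blet\s+(?:the|a|that)\s+((?:\w+\s+){0,2}?\w+)\s+out\s+of\s+the\s+bag\b",
    re.IGNORECASE,
)

CANONICAL = "cat"  # canonical filler of the noun slot

def slot_fillers(text):
    """Return the lowercased fillers of the noun slot, in order of occurrence."""
    return [m.group(1).lower() for m in FRAME.finditer(text)]

def variants(text):
    """Return only the fillers that differ from the canonical noun."""
    return [f for f in slot_fillers(text) if f != CANONICAL]

sample = (
    "To say that Simone had let the cat out of the bag was an understatement. "
    "She'd let a lion out of the bag. "
    "Let that sex kitten out of the bag."
)
print(variants(sample))
```

A frame of this kind finds only variants that keep the verb and the prepositional phrase intact; forms such as the Tiger is out of the bag in (58) would need a separate pattern.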
The German idiom sich auf die Strümpfe machen (lit. ‘make oneself onto one’s
stockings’, i.e. ‘get going or moving’) is often found with Strümpfe replaced by
Socken ‘socks’. This common idiom is an example of what Nunberg et al. (1994)
call “idiosyncratic phrasal,” somewhat similar to saw logs: the constituents do
not receive a semantic interpretation, though the meaning of the entire phrase
could perhaps be guessed. The lexical variation involves two nouns, Strümpfe
‘stockings’ and Socken ‘socks’, whose literal meanings are so similar that they
are interchangeable in many contexts.
Another kind of modification of idiom components is seen in compounding,
as in the example below:
(66) Addison Graham is hardly the first porn actor to move to the relatively quiet
desert town of Palm Springs after saying au revoir to the industry, but unlike
most retired adult industry veterans who throw in the bath towel after long
extensive careers, Graham was done with porn after only two years of
working.
(https://thehissfit.com/blog/my-interview-with-former-porn-star-addison-graham/)
Bath towel does not receive a figurative meaning in the use of the idiom here,
but alludes to the state of undress associated with porn actors.
7.3 Zeugma
Though the discussion of “word play” in the literature does not include zeugma,
the conjunction of canonical idiom components with non-idiomatic components
within a single idiom, such structures are relevant. Kramer (2006) analyzes some
of the examples found in the German corpus. One type she distinguishes is
“interphrasal” zeugma, where two VP idioms share the same verb, which takes two
noun phrase complements that are each part of a different idiom. An example is (67):
In den sauren Apfel beissen is similar to the English idiom bite the bullet,
i.e. ‘bear the negative consequences,’ and ins Gras beissen, meaning ‘die,’
corresponds to bite the dust; hence the sentence can be translated as ‘the
loser, who has to bite the bullet and not infrequently the dust as well, is the
dog.’
Another frequent type of zeugma, labeled “transphrasal” by Kramer (2006),
extends over multiple sentences. Here, an idiom component and a free NP
are arguments of the same verb, which may receive two readings, one literal and
one as an idiom component. Such cases of zeugma are characteristic of
journalistic prose.
(68) Ein Essen ist wie ein Konzert. Der Wein kann darin
A meal is like a concert. The wine can in it
die erste Geige spielen oder nur ein Begleitinstrument.
the first violin play or only an accompanying instrument.
‘A meal is like a concert. The wine can play first fiddle or only [be] an
accompanying instrument.’
References
Abeillé, Anne. 1995. The flexibility of French idioms: A representation with lexicalized tree
adjoining grammar. In Martin Everaert, Erik-Jan van der Linden, André Schenk & Rob
Schreuder (eds.), Idioms: Structural and psychological perspectives, 15–42. Hillsdale, NJ:
Lawrence Erlbaum.
Abeillé, Anne & Yves Schabes. 1989. Parsing idioms in lexicalized TAGs. In Proceedings of the
fourth conference on European chapter of the Association for Computational Linguistics,
1–9. Stroudsburg, PA: Association for Computational Linguistics.
Bargmann, Sascha & Manfred Sailer. 2015. Syntactic flexibility of non-decomposable idioms.
Abstract, 4th general meeting of PARSEME, Valletta, Malta, March 19–20.
https://typo.uni-konstanz.de/parseme/images/WG1-Volume-Outlines/BARGMANN-SAILER-outline.pdf
Cowie, Anthony. 1998. Phraseology: Theory, analysis, and applications. Oxford: Oxford
University Press.
Cutting, Cooper & Kathryn Bock. 1997. That’s why the cookie bounces: Syntactic and semantic
components of experimentally elicited idiom blends. Memory and Cognition 25(1). 57–71.
Di Sciullo, Anna Maria & Edwin Williams. 1987. On the definition of word (Linguistic Inquiry
Monograph 14). Cambridge, MA: MIT Press.
Ernst, Thomas. 1981. Grist for the linguistic mill: Idioms and “extra” adjectives. Journal of
Linguistic Research 1(3). 51–68.
Everaert, Martin. 2010. The lexical encoding of idioms. In Malka Rappaport Hovav, Edit Doron &
Ivy Sichel (eds.), Lexical semantics, syntax, and event structure, 76–98. Oxford: Oxford
University Press.
Fazly, Afsaneh, Paul Cook & Suzanne Stevenson. 2009. Unsupervised type and token
identification of idiomatic expressions. Computational Linguistics 35(1). 61–103.
Fellbaum, Christiane. 2007a. The ontological loneliness of idioms. In Andrea Schalley & Dieter
Zaefferer (eds.), Ontolinguistics, 419–434. Berlin & New York: Mouton de Gruyter.
Fellbaum, Christiane. 2007b. Introduction. In Christiane Fellbaum (ed.), Idioms and
collocations: From corpus to electronic lexical resource, 1–19. Birmingham: Continuum Press.
Fellbaum, Christiane. 2014. Non-syntactic idioms and phrases. In Tibor Kiss & Artemis
Alexiadou (eds.), Handbook of syntax, 776–802. Berlin & Boston: De Gruyter.
Fellbaum, Christiane. 2015a. The treatment of multi-word units. In Philip Durkin (ed.), Oxford
handbook of lexicography, 411–425. Oxford: Oxford University Press.
Fellbaum, Christiane. 2015b. Is there a grammar of idioms? Paper presented at the Brussels
Conference on Generative Linguistics (BCGL 8: The grammar of idioms), 4–5 June 2015,
Brussels, Belgium.
Fraser, Bruce. 1970. Idioms within a transformational grammar. Foundations of Language 6.
22–42.
Geeraerts, Dirk. 1995. Specialization and reinterpretation in idioms. In Martin Everaert, Erik-Jan
van der Linden, André Schenk & Rob Schreuder (eds.), Idioms: Structural and
psychological perspectives, 57–73. Hillsdale, NJ: Lawrence Erlbaum.
Geyken, Alexander. 2007. The DWDS Corpus: A reference corpus for the German language of the
twentieth century. In Christiane Fellbaum (ed.), Idioms and collocations: From corpus to
electronic lexical resource, 23–39. Birmingham: Continuum Press.
Geyken, Alexander & Alexey Sokirko. 2007. Classifying NVGs/FVGs in an interactive parsing
process. In Christiane Fellbaum (ed.), Idioms and collocations: From corpus to electronic
lexical resource, 41–53. Birmingham: Continuum Press.
Gibbs, Raymond. 1995. Idiomaticity and human cognition. In Martin Everaert, Erik-Jan van der
Linden, André Schenk & Rob Schreuder (eds.), Idioms: Structural and psychological
perspectives, 97–116. Hillsdale, NJ: Lawrence Erlbaum.
Gibbs, Raymond & Nandini Nayak. 1989. Psycholinguistic studies on the syntactic behaviour of
idioms. Cognitive Psychology 21. 100–138.
Glucksberg, Sam. 1993. Idiom meaning and allusional content. In Cristina Cacciari & Patrizia
Tabossi (eds.), Idioms: Processing, structure, and interpretation, 3–26. Hillsdale, NJ:
Lawrence Erlbaum.
Grimshaw, Jane & Arnim Mester. 1988. Light verbs and theta-marking. Linguistic Inquiry 19.
205–232.
Herold, Axel. 2007. Corpus queries. In Christiane Fellbaum (ed.), Idioms and collocations: From
corpus to electronic lexical resource, 54–63. Birmingham: Continuum Press.
Jackendoff, Ray. 1995. The boundaries of the lexicon. In Martin Everaert, Erik-Jan van der
Linden, André Schenk & Rob Schreuder (eds.), Idioms: Structural and psychological
perspectives, 153–165. Hillsdale, NJ: Lawrence Erlbaum.
Kay, Paul, Ivan Sag & Dan Flickinger. 2012. A lexical theory of phrasal idioms. Unpublished
manuscript. Stanford, CA: Stanford University.
Kearns, Kate. 2002 [1988]. Light verbs in English.
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=C4DE6737920C5946FB4D814BD1B8EB33?doi=10.1.1.132.29&rep=rep1&type=pdf
Kramer, Undine. 2006. Linguistic lightbulb moments: Zeugma in idioms. In Christiane Fellbaum
(ed.), International Journal of Lexicography 19(4). 370–395.
Kuiper, Koenraad. 1996. Smooth talkers: The linguistic performance of auctioneers and
sportscasters. Hillsdale, NJ: Lawrence Erlbaum.
Lakoff, Robin. 1974. Remarks on ‘this’ and ‘that’. Proceedings of the Chicago Linguistics Society
(CLS) 10. 345–356.
Langlotz, Andreas. 2006. Idiomatic creativity: A cognitive-linguistic model of
idiom-representation and idiom-variation in English. Amsterdam & Philadelphia: John Benjamins.
Lebeaux, David. 2000. Language acquisition and the form of grammar. Amsterdam &
Philadelphia: John Benjamins.
Mel’čuk, Igor. 1995. Phrasemes in language and phraseology in linguistics. In Martin Everaert,
Erik-Jan van der Linden, André Schenk & Rob Schreuder (eds.), Idioms: Structural and
psychological perspectives, 167–232. Hillsdale, NJ: Lawrence Erlbaum.
Moon, Rosamund. 1998. Fixed expressions and idioms in English: A corpus-based approach
(Oxford Studies in Lexicography and Lexicology). Oxford: Clarendon Press.
Moss, Helen & Lianne Older. 1996. Birkbeck word association norms. East Sussex: Psychology
Press.
Neumann, Gerald, Christiane Fellbaum, Alexander Geyken, Axel Herold, Christiane Huemmer,
Fabian Koerner, Undine Kramer, Kerstin Krell, Alexander Sokirko, Diana Stantcheva &
Ekatherini Stathi. 2004. A corpus-based lexical resource of German idioms. Paper
presented at the 20th International Conference on Computational Linguistics (COLING),
Geneva, Switzerland, August 23–27.
Newmeyer, Frederick. 1974. The regularity of idiom behavior. Lingua 34. 327–342.
Nicolas, Tim. 1995. Semantics of idiom modification. In Martin Everaert, Erik-Jan van der Linden,
André Schenk & Rob Schreuder (eds.), Idioms: Structural and psychological perspectives,
233–252. Hillsdale, NJ: Lawrence Erlbaum.
Nunberg, Geoffrey, Ivan Sag & Thomas Wasow. 1994. Idioms. Language 70. 491–538.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive
grammar of the English language. London: Longman.
Sabban, Annette. 1998. Okkasionelle Variationen sprachlicher Schematismen: Eine Analyse
französischer und deutscher Presse- und Werbetexte. Tübingen: Narr.
Schenk, André. 1995. The syntactic behavior of idioms. In Martin Everaert, Erik-Jan van der
Linden, André Schenk & Rob Schreuder (eds.), Idioms: Structural and psychological
perspectives, 253–272. Hillsdale, NJ: Lawrence Erlbaum.
Stathi, Katerina. 2007. A corpus-based analysis of adjectival modification in German idioms. In
Christiane Fellbaum (ed.), Idioms and collocations: Corpus-based linguistic and
lexicographic studies, 81–108. Birmingham: Continuum Press.
Tabossi, Patrizia, Kinou Wolf & Sara Koterle. 2009. Idiom syntax: Idiosyncratic or principled?
Journal of Memory and Language 61. 77–96.
Zhu, Feng & Christiane Fellbaum. 2015. Quantifying fixedness and compositionality in Chinese
idioms. International Journal of Lexicography 28(3). 338–350.