Steven Pinker Video Transcript
And today I’m going to speak to you about language. I’m actually not a linguist, but a cognitive
scientist. I’m not so much interested in language as an object in its own right, but as a window to
the human mind. Language is one of the fundamental topics in the human sciences. It’s the trait
that most conspicuously distinguishes humans from other species, it’s essential to human
cooperation; we accomplish amazing things by sharing our knowledge or coordinating our
actions by means of words.
It poses profound scientific mysteries such as, how did language evolve in this particular
species? How does the brain compute language? But also, language has many practical
applications not surprisingly given how central it is to human life. Language comes so naturally
to us that we’re apt to forget what a strange and miraculous gift it is. But think about what you’re
doing for the next hour. You’re going to be listening patiently as a guy makes noise as he
exhales.
Now, why would you do something like that? It’s not that I can claim that the sounds I’m going
to make are particularly mellifluous, but rather I’ve coded information into the exact sequences
of hisses and hums and squeaks and pops that I’ll be making. You have the ability to recover the
information from that stream of noises allowing us to share ideas.
Now, the ideas we are going to share are about this talent, language, but with a slightly different
sequence of hisses and squeaks, I could cause you to be thinking thoughts about a vast array of
topics, anything from the latest developments in your favorite reality show to theories of the
origin of the universe. This is what I think of as the miracle of language, its vast expressive
power, and it’s a phenomenon that still fills me with wonder, even after having studied language
for 35 years. And it is the prime phenomenon that the science of language aims to explain.
Not surprisingly, language is central to human life. The Biblical story of the Tower of Babel
reminds us that humans accomplish great things because they can exchange information about
their knowledge and intentions via the medium of language. Language, moreover, is not a
peculiarity of one culture, but it has been found in every society ever studied by
anthropologists. There are some 6,000 languages spoken on Earth, all of them complex, and no
one has ever discovered a human society that lacks complex language.
For this and other reasons, Charles Darwin wrote, “Man has an instinctive tendency to speak as
we see in the babble of our young children while no child has an instinctive tendency to bake,
brew or write.” Language is an intricate talent and it’s not surprising that the science of language
should be a complex discipline.
It includes the study of how language itself works including: grammar, the assembly of words,
phrases and sentences; phonology, the study of sound; semantics, the study of meaning; and
pragmatics, the study of the use of language in conversation.
Scientists interested in language also study how it is processed in real time, a field
called psycholinguistics; how is it acquired by children, the study of language acquisition. And
how it is computed in the brain, the discipline called neurolinguistics.
Now, before we begin, it’s important not to confuse language with three other things that are
closely related to language. One of them is written language. Unlike spoken language, which is
found in all human cultures throughout history, writing was invented a very small number of
times in human history, about 5,000 years ago. And alphabetic writing where each mark on the
page stands for a vowel or a consonant, appears to have been invented only once in all of human
history by the Canaanites about 3,700 years ago. And as Darwin pointed out, children have no
instinctive tendency to write, but have to learn it through instruction and schooling.
A second thing not to confuse language with is proper grammar. Linguists distinguish between
descriptive grammar – the rules that characterize how people do speak – and prescriptive
grammar – rules that characterize how people ought to speak if they are writing careful written
prose. A dirty secret from linguistics is that not only are these not the same kinds of rules, but
many of the prescriptive rules of language make no sense whatsoever.
Take one of the most famous of these rules, the rule not to split infinitives. According to this
rule, Captain Kirk made a grievous grammatical error when he said that the mission of the
Enterprise was “to boldly go where no man has gone before.” He should have said, according to
these editors, “to go boldly where no man has gone before,” which immediately clashes with the
rhythm and structure of ordinary English. In fact, this prescriptive rule was based on a clumsy
analogy with Latin where you can’t split an infinitive because it’s a single word, as in facere, to
do. Julius Caesar couldn’t have split an infinitive if he wanted to. That rule was translated
literally over into English where it really should not apply.
Another famous prescriptive rule is that one should never use a so-called double negative. Mick
Jagger should not have sung, “I can’t get no satisfaction,” he really should have sung, “I can’t get
any satisfaction.”
Now, this is often promoted as a rule of logical speaking, but “can’t” and “any” is just as much
of a double negative as “can’t” and “no.” The only reason that “can’t get any satisfaction” is
deemed correct and “can’t get no satisfaction” is deemed ungrammatical is that the dialect of
English spoken in the south of England in the 17th century used “can’t” “any” rather than “can’t”
“no.”
If the capital of England had been in the north of the country instead of the south of the country,
then “can’t get no,” would have been correct and “can’t get any,” would have been deemed
incorrect.
There’s nothing special about a language that happens to be chosen as the standard for a given
country. In fact, if you compare the rules of languages and so-called dialects, each one is
complex in different ways. Take for example, African-American vernacular English, also called
Black English or Ebonics. There is a construction in African-American where you can say, “He
be workin,” which is not an error or bastardization or a corruption of Standard English, but in
fact conveys a subtle distinction, one that’s different than simply, “He workin.” “He be workin,”
means that he is employed; he has a job, “He workin,” means that he happens to be working at
the moment that you and I are speaking.
Now, this is a tense difference that can be made in African-American English that is not made in
Standard English, one of many examples in which the dialects have their own set of rules that is
just as sophisticated and complex as the one in the standard language.
Now, a third thing, not to confuse language with is thought. Many people report that they think
in language, but cognitive psychologists have shown that there are many kinds of thought that
don’t actually take place in the form of sentences.
(1.) Babies (and other mammals) communicate without speech. For example, we know from
ingenious experiments that non-linguistic creatures, such as babies before they’ve learned to
speak, or other kinds of animals, have sophisticated kinds of cognition, they register cause and
effect and objects and the intentions of other people, all without the benefit of speech.
(2.) Types of thinking go on without language—visual thinking. We also know that even in
creatures that do have language, namely adults, a lot of thinking goes on in forms other than
language, for example, visual imagery. If you look at the top two three-dimensional figures in
this display, and I would ask you, do they have the same shape or a different shape? People don’t
solve that problem by describing those strings of cubes in words, but rather by taking an image
of one and mentally rotating it into the orientation of the other, a form of non-linguistic thinking.
(3.) We use tacit knowledge to understand language and remember the gist. For that matter, even
when you understand language, what you come away with is not in itself the actual language that
you hear. Another important finding in cognitive psychology is that long-term memory for verbal
material records the gist or the meaning or the content of the words rather than the exact form of
the words. For example, I like to think that you retain some memory of what I have been saying
for the last 10 minutes. But I suspect that if I were to ask you to reproduce any sentence that I
have uttered, you would be incapable of doing so. What sticks in memory is far more abstract
than the actual sentences, something that we can call meaning or content or semantics.
In fact, when it even comes to understanding a sentence, the actual words are the tip of a vast
iceberg of a very rapid, unconscious, non-linguistic processing that’s necessary even to make
sense of the language itself.
And I’ll illustrate this with a classic bit of poetry, the lines from the shampoo bottle. “Wet hair,
lather, rinse, repeat.”
Now, in understanding that very simple snatch of language, you have to know, for example, that
when you repeat, you don’t wet your hair a second time because it’s already wet, and when you
get to the end of it and you see “repeat,” you don’t keep repeating over and over in an infinite loop;
repeat here means, “repeat just once.” Now this tacit knowledge of what the writers **** of
language had in mind is necessary to understand language, but it, itself, is not language.
(4.) If language is thinking, then where did it come from? Finally, if language were really
thought, it would raise the question of where language would come from if we were incapable of
thinking without language. After all, the English language was not designed by some committee
of Martians who came down to Earth and gave it to us. Rather, language is a grassroots
phenomenon. It’s the original wiki, which aggregates the contributions of hundreds of thousands
of people who invent jargon and slang and new constructions, some of them get accumulated
into the language as people seek out new ways of expressing their thoughts, and that’s how we
get a language in the first place.
Now, this is not to deny that language can affect thought, and linguistics has long been interested in
what has sometimes been called the linguistic relativity hypothesis or the Sapir-Whorf
Hypothesis, named after the two linguists who first formulated it, namely
that language can affect thought. There’s a lot of controversy over the status of the linguistic
relativity hypothesis, but no one believes that language is the same thing as thought and that all
of our mental life consists of reciting sentences.
How language works
Now that we have set aside what language is not, let’s turn to what language is, beginning with
the question of how language works. In a nutshell, you can divide language into three
topics. There are the words that are the basic components of sentences that are stored in a part of
long-term memory that we can call the mental lexicon or the mental dictionary. There are rules,
the recipes or algorithms that we use to assemble bits of language into more complex stretches of
language including syntax, the rules that allow us to assemble words into phrases and sentences;
morphology, the rules that allow us to assemble bits of words, like prefixes and suffixes, into
complex words; and phonology, the rules that allow us to combine vowels and consonants into the
smallest words.
And then all of this knowledge of language has to connect to the world through interfaces that
allow us to understand language coming from others and to produce language that others can
understand: the language interfaces.
Words
Let’s start with words. The basic principle of a word was identified by the Swiss linguist,
Ferdinand de Saussure, more than 100 years ago when he called attention to the arbitrariness of
the sign.
Take for example the word, “duck.” The word, “duck” doesn’t look like a duck or walk like a
duck or quack like a duck, but I can use it to get you to think the thought of a duck because all of
us at some point in our lives have memorized that brute force association between that sound and
that meaning, which means that it has to be stored in memory in some format. In a very
simplified form, an entry in the mental lexicon might look something like this. There is a
symbol for the word itself, there is some kind of specification of its sound and there’s some kind
of specification of its meaning.
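As a rough illustration of such an entry, here is a minimal Python sketch. The class name `LexicalEntry`, the dictionary `MENTAL_LEXICON`, and the helper `look_up` are hypothetical names chosen for this example; the IPA transcription and the one-line gloss are stand-ins, not a claim about how the mind actually stores sound or meaning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    symbol: str   # the word itself
    sound: str    # specification of its sound (IPA here, as an assumption)
    meaning: str  # specification of its meaning (a plain gloss, as an assumption)


# The pairing of sound and meaning is arbitrary, so it simply has to be listed.
MENTAL_LEXICON = {
    "duck": LexicalEntry(symbol="duck", sound="/dʌk/", meaning="a water bird that quacks"),
}


def look_up(word):
    """Return the memorized entry for a word, or None if it was never learned."""
    return MENTAL_LEXICON.get(word)
```

Nothing about the string "duck" predicts its meaning; the table lookup is the whole story, which is the point of the arbitrariness of the sign.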
Now, one of the remarkable facts about the mental lexicon is how capacious it is. Using
dictionary sampling techniques where you say, take the top left-hand word on every 20th page of
the dictionary, give it to people in a multiple choice test, correct for guessing, and multiply by
the size of the dictionary, you can estimate that a typical high school graduate has a vocabulary
of around 60,000 words, which works out to a rate of learning of about one new word every two
hours starting from the age of one. When you think that every one of these words is as arbitrary as a
telephone number or a date in history, you’re reminded about the remarkable capacity of human
long-term memory to store the meanings and sounds of words.
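The sampling estimate and the learning-rate arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the correction for guessing is the classical one (subtract wrong answers divided by the number of distractors), and the figures of 18 years of age and 16 waking hours per day are assumptions, since the talk says only "high school graduate" and "starting from the age of one."

```python
def estimate_vocabulary(sampled, correct, choices, dictionary_size):
    """Scale a guessing-corrected test score up to the whole dictionary."""
    wrong = sampled - correct
    # Classical correction for guessing: each wrong answer implies roughly
    # 1 / (choices - 1) lucky guesses hidden among the right answers.
    known = max(correct - wrong / (choices - 1), 0.0)
    return known / sampled * dictionary_size


def waking_hours_per_word(vocabulary, start_age, end_age, waking_hours_per_day=16):
    """Average number of waking hours available per word learned."""
    days = (end_age - start_age) * 365
    return days * waking_hours_per_day / vocabulary


# Hypothetical test: 100 sampled words, 55 correct, 4 choices each,
# drawn from a dictionary of 120,000 entries.
estimate = estimate_vocabulary(sampled=100, correct=55, choices=4, dictionary_size=120_000)

# The figure from the talk: 60,000 words, learned from age one (to 18, assumed).
hours = waking_hours_per_word(vocabulary=60_000, start_age=1, end_age=18)
```

Under these assumptions the rate comes out a little under two waking hours per word, in line with the figure quoted above.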
But of course, we don’t just blurt out individual words, we combine them into phrases and
sentences. And that brings up the second major component of language; namely, grammar.
Grammar
Now the modern study of grammar is inseparable from the contributions of one linguist, the famous
scholar, Noam Chomsky, who set the agenda for the field of linguistics for the last 60 years. To
begin with, Chomsky noted that the main puzzle that we have to explain in understanding
language is creativity or as linguists often call it productivity, the ability to produce and
understand new sentences.
Except for a small number of clichéd formulas, just about any sentence that you produce or
understand is a brand new combination produced for the first time perhaps in your life, perhaps
even in the history of the species. We have to explain how people are capable of doing it. It
shows that when we know a language, we haven’t just memorized a very long list of sentences,
but rather have internalized a grammar or algorithm or recipe for combining elements into brand
new assemblies. For that reason, Chomsky has insisted that linguistics is really properly a branch
of psychology and is a window into the human mind.
A second insight is that languages have a syntax which can’t be identified with their
meaning. Now, the only quotation that I know of, of a linguist that has actually made it into
Bartlett’s Familiar Quotations, is the following sentence from Chomsky, from 1956, “Colorless
green ideas sleep furiously.” Well, what’s the point of that sentence? The point is that it is very
close to meaningless. On the other hand, any English speaker can instantly recognize that it
conforms to the patterns of English syntax.
Compare, for example, “furiously sleep ideas dream colorless,” which is also meaningless, but
we perceive as a word salad.
A third insight is that syntax doesn’t consist of a string of word by word associations as in
stimulus response theories in psychology where producing a word is a response which you then
hear and it becomes a stimulus to producing the next word, and so on. Again, the sentence,
“colorless green ideas sleep furiously,” can help make this point. Because if you look at the word
by word transition probabilities in that sentence, for example, colorless and then green; how
often have you heard colorless and green in succession? Probably zero times. Green and ideas,
those two words never occur together, ideas and sleep, sleep and furiously. Every one of the
transition probabilities is very close to zero, nonetheless, the sentence as a whole can be
perceived as a well-formed English sentence.
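To make "transition probability" concrete, here is a small Python sketch that estimates the probability of each next word from raw counts. The five-sentence `TOY_CORPUS` is invented for the illustration and the function name `bigram_model` is hypothetical; a real estimate would need a very large body of text.

```python
from collections import Counter


def bigram_model(corpus):
    """Estimate P(next word | word) from raw counts in a list of sentences."""
    pair_counts, word_counts = Counter(), Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        word_counts.update(words[:-1])             # words that have a successor
        pair_counts.update(zip(words, words[1:]))  # adjacent word pairs

    def probability(word, next_word):
        if word_counts[word] == 0:
            return 0.0
        return pair_counts[(word, next_word)] / word_counts[word]

    return probability


# Invented corpus: every word of the famous sentence occurs, but never
# next to the word that follows it in that sentence.
TOY_CORPUS = [
    "green leaves fall slowly",
    "new ideas spread quickly",
    "tired children sleep soundly",
    "the crowd protested furiously",
    "colorless liquids evaporate",
]

p = bigram_model(TOY_CORPUS)
sentence = "colorless green ideas sleep furiously".split()
transitions = [(a, b, p(a, b)) for a, b in zip(sentence, sentence[1:])]
```

A model that only chains word-to-word associations would rate the sentence as impossible, yet speakers still judge it to be well-formed English.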
Language in general has long distance dependencies. The word in one position in a sentence can
dictate the choice of the word several positions downstream. For example, if you begin a
sentence with “either,” somewhere down the line, there has to be an “or.” If you have an “if,”
generally, you expect somewhere down the line there to be a “then.” There’s a story about a
child who says to his father, “Daddy, why did you bring that book that I don’t want to be read to
out of, up for?” Where you have a set of nested or embedded long distance dependencies.
Indeed, one of the applications of linguistics to the study of good prose style is that sentences can
be rendered difficult to understand if they have too many long distance dependencies because
that could put a strain on the short-term memory of the reader or listener while trying to
understand them. Rather than a set of word by word associations, sentences are assembled in a
hierarchical structure that looks like an upside down tree.
Let me give you an example of how that works in the case of English. One of the basic rules of
English is that a sentence consists of a noun phrase, the subject, followed by a verb phrase, the
predicate.
A second rule in turn expands the verb phrase. A verb phrase consists of a verb followed by a
noun phrase, the object, followed by a sentence, the complement as, “I told him that it was sunny
outside.”
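The two rules just described can be written down as a tiny grammar and checked against the upside-down tree for the example sentence. This is a sketch, not a full grammar of English: the rule for a clause introduced by "that" (`C NP VP`) and the rule `VP -> V AP` are assumptions added so the example sentence can be built, and the names `RULES`, `TREE`, `leaves`, and `conforms` are hypothetical.

```python
# Phrase structure rules: each phrase label maps to its allowed expansions.
RULES = {
    "S": {("NP", "VP"), ("C", "NP", "VP")},  # second expansion is an assumption
    "VP": {("V", "NP", "S"), ("V", "AP")},   # second expansion is an assumption
}

# A tree is (label, child, child, ...); a child is a tree or a word.
TREE = ("S",
        ("NP", "I"),
        ("VP",
         ("V", "told"),
         ("NP", "him"),
         ("S",
          ("C", "that"),
          ("NP", "it"),
          ("VP", ("V", "was"), ("AP", "sunny", "outside")))))


def leaves(tree):
    """Return the words at the bottom of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


def conforms(tree):
    """Check that every phrase in the tree is licensed by RULES."""
    if isinstance(tree, str):
        return True
    label, *children = tree
    if all(isinstance(child, str) for child in children):
        return True   # a category sitting directly on top of words
    if any(isinstance(child, str) for child in children):
        return False  # mixing words and phrases is not allowed in this sketch
    child_labels = tuple(child[0] for child in children)
    return child_labels in RULES.get(label, set()) and all(conforms(c) for c in children)
```

Because the rule for the verb phrase mentions a sentence, and the rule for a sentence mentions a verb phrase, the same two rules can be reapplied inside their own output.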
Now, why do linguists insist that language must be composed out of phrase structure
rules? (1.) Rules allow for open-ended creativity. Well for one thing, that helps explain the main
phenomenon that we want to explain, namely the open-ended creativity of language.
(2.) Rules allow for expression of unfamiliar meaning. It allows us to express unfamiliar
meanings. There’s a cliché in journalism, for example, that when a dog bites a man, that isn’t
news, but when a man bites a dog, that is news. The beauty of grammar is that it allows us to
convey news by assembling familiar words in brand new combinations. Also, because of the
way phrase structure rules work, they produce a vast number of possible combinations.
(3.) Rules allow for production of vast numbers of combinations.
Moreover, the number of different thoughts that we can express through the combinatorial power
of grammar is not just humongous, but in a technical sense, it’s infinite. Now of course, no one
lives an infinite number of years, and therefore can show off their ability to understand an infinite
number of sentences, but you can make the point in the same way that a mathematician can say
that someone who understands the rules of arithmetic knows that there are an infinite number of
numbers, namely if anyone ever claimed to have found the longest one, you can always come up
with one that’s even bigger by adding a one to it. And you can do the same thing with language.
Let me illustrate it in the following way. As a matter of fact, there has been a claim that there is a
world’s longest sentence. Who would make such a claim? Well, who else? The Guinness Book
of World Records. You can look it up. There is an entry for the World’s Longest Sentence. It is
1,300 words long. And it comes from a novel by William Faulkner. Now I won’t read all 1,300
words, but I’ll just tell you how it begins.
“They both bore it as though in deliberate flatulent exaltation…” and it runs on from there. But
I’m here to tell you that in fact, this is not the world’s longest sentence. And I’ve been tempted to
obtain immortality in Guinness by submitting the following record breaker. “Faulkner wrote,
they both bore it as though in deliberate flatulent exaltation.” But sadly, this would not be
immortality after all but only the proverbial 15 minutes of fame because based on what you now
know, you could submit a record breaker for the record breaker namely, “Guinness noted that
Faulkner wrote” or “Pinker mentioned that Guinness noted that Faulkner wrote”, or “who cares
that Pinker mentioned that Guinness noted that Faulkner wrote…”
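The record-breaking trick is the same move as adding one to a number, and it can be sketched as a short Python generator. The names `embed`, `record_breakers`, `BASE`, and `FRAMES` are hypothetical; the frames are the ones used in the examples above.

```python
def embed(sentence, frame):
    """Wrap a sentence inside a clause-taking frame such as 'Guinness noted that'."""
    return f"{frame} {sentence}"


def record_breakers(base, frames):
    """Yield each successive record breaker, one level of embedding at a time."""
    sentence = base
    for frame in frames:
        sentence = embed(sentence, frame)
        yield sentence


BASE = "they both bore it as though in deliberate flatulent exaltation"
FRAMES = ["Faulkner wrote,", "Guinness noted that", "Pinker mentioned that", "who cares that"]

results = list(record_breakers(BASE, FRAMES))
lengths = [len(s.split()) for s in results]
```

Each output is a longer sentence than the one before, and nothing in the procedure ever forces it to stop, which is the sense in which the set of sentences is infinite.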
Take for example, the following wonderfully ambiguous sentence that appeared in TV Guide.
“On tonight’s program, Conan will discuss sex with Dr. Ruth.” Now this has a perfectly innocent
meaning in which the verb, “discuss” involves two things, namely the topic of discussion, “sex”
and the person with whom it’s being discussed, in this case, with Dr. Ruth. But it has a somewhat
naughtier meaning if you rearrange the words into phrases according to a different structure in
which case “sex with Dr. Ruth” is the topic of conversation, and that’s what’s being discussed.
Now, phrase structure not only can account for our ability to produce so many sentences, but it’s
also necessary for us to understand what they mean. The geometry of branches in a phrase
structure is essential to figuring out who did what to whom.
Another important contribution of Chomsky to the science of language is the focus on language
acquisition by children. Now, children can’t memorize sentences because knowledge of language
isn’t just one long list of memorized sentences, but somehow they must distill out or abstract out
the rules that go into assembling sentences based on what they hear coming out of their
parents’ mouths when they were little.
And the talent of using rules to produce combinations is in evidence from the moment that kids
begin to speak.
Children create sentences unheard from adults. At the two-word stage, which you typically see
in children who are 18 months or a bit older, kids are producing the smallest sentences that
deserve to be counted as sentences, namely two words long. But already it’s clear that they are
putting them together using rules in their own mind. To take an example, a child might say,
“more outside,” meaning, take them outside or let them stay outside. Now, adults don’t say,
“more outside.” So it’s not a phrase that the child simply memorized by rote, but it shows that
already children are using these rules to put together new combinations.
Another example, a child having jam washed from his fingers said to his mother ‘all gone
sticky’. Again, not a phrase that you could ever have copied from a parent, but one that shows
the child producing new combinations.
Past tense rule. An easy way of showing that children assimilate rules of grammar unconsciously
from the moment they begin to speak, is the use of the past tense rule. For example, children go
through a long stage in which they make errors like, “We holded the baby rabbits” or “He teared
the paper and then he sticked it.” These are cases in which they overgeneralize the regular rule of forming
the past tense, add ‘ed’, to irregular verbs like “hold,” “stick” or “tear.” And it’s easy to
get children to flaunt this ability to apply rules productively in a laboratory
demonstration called the Wug Test. You bring a kid into a lab. You show them a picture of a
little bird and you say, “This is a wug.” And you show them another picture and you say, “Well,
now there are two of them. There are two...” and children will fill in the gap by saying
“wugs.” Again, a form they could not have memorized because it’s invented for the experiment,
but it shows that they have productive mastery of the regular plural rule in English.
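The difference between applying a rule and retrieving a memorized form can be shown with a toy Python contrast. This is not a model of how children actually learn; the function names are hypothetical, spelling adjustments are ignored, and the idea that a stored irregular form takes priority over the rule is an assumption made for the illustration.

```python
# Irregular forms have to be memorized one by one.
IRREGULAR_PAST = {"hold": "held", "tear": "tore", "stick": "stuck"}


def regular_past(verb):
    """The regular rule as described in the talk: add 'ed'."""
    return verb + "ed"


def overgeneralized_past(verb):
    """Apply the regular rule to everything, including irregular verbs."""
    return regular_past(verb)


def adult_past(verb):
    """Use a memorized irregular form if there is one, otherwise the rule."""
    return IRREGULAR_PAST.get(verb, regular_past(verb))


def regular_plural(noun):
    """The regular plural rule, which works even for an invented noun."""
    return noun + "s"
```

For instance, `overgeneralized_past("hold")` gives "holded", `adult_past("hold")` gives "held", and `regular_plural("wug")` gives "wugs", a form that could not have been looked up because nobody has ever stored it.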
And famously, Chomsky claimed that children solve the problem of language acquisition by
having the general design of language already wired into them in the form of a universal
grammar, a spec sheet for what the rules of any language have to look like.
What is the evidence that children are born with a universal grammar? Well, surprisingly,
Chomsky didn’t propose this by actually studying kids in the lab or kids in the home, but through
a more abstract argument called, “The poverty of the input.” Namely, if you look at what goes
into the ears of a child and look at the talent they end up with as adults, there is a big chasm
between them that can only be filled in by assuming that the child has a lot of knowledge of the
way that language works already built in.
Here’s how the argument works. One of the things that children have to learn when they learn
English is how to form a question.
Now, children will get evidence from parents’ speech as to how the question rule works, such as
sentences like, “The man is here,” and the corresponding question, “Is the man here?”
Now, logically speaking, a child getting that kind of input could posit two different kinds of
rules. There’s a simple word by word linear rule. In this case, find the first “is” in the sentence
and move it to the front. “The man is here,” “Is the man here?” Now there’s a more complex rule
that the child could posit called a structure dependent rule, one that looks at the geometry of the
phrase structure tree. In this case, the rule would be: find the first “is” after the subject noun
phrase and move that to the front of the sentence.
A diagram of what that rule would look like is as follows: you look for the “is” that occurs after
the subject noun phrase and that’s what gets moved to the front of the sentence. Now, what’s the
difference between the simple word-by-word rule and the more complex structure-dependent
rule? Well, you can see the difference when it comes to forming the question from a slightly
more complex sentence like, “The man who is tall is in the room.” The word-by-word rule moves
the first “is” and yields “Is the man who tall is in the room?”, whereas the structure-dependent
rule yields the correct question, “Is the man who is tall in the room?”
But how is the child supposed to learn that? How did all of us end up with the correct
structure-dependent version of the rule rather than the far simpler word-by-word version? Well,
Chomsky argues, if you were actually to look at the kind of language that all of us hear, it’s
actually quite rare to hear a sentence like, “Is the man who is tall in the room?”, the kind of input
that would logically inform you that the word-by-word rule is wrong and the structure-dependent
rule is right. Nonetheless, we all grow up into adults who unconsciously use the structure-dependent
rule rather than the word-by-word rule. Moreover, children don’t make errors like, “Is
the man who tall is in the room?” As soon as they begin to form complex questions, they use the
structure-dependent rule. And that, Chomsky argues, is evidence that structure-dependent rules
are part of the definition of universal grammar that children are born with.
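The two candidate rules can be written out side by side in Python to see where they come apart. This is a sketch under one large assumption: the boundary of the subject noun phrase is supplied by hand, whereas a real speaker has to find it. The function names are hypothetical.

```python
def linear_rule(words):
    """Word-by-word rule: move the first 'is' in the string to the front."""
    i = words.index("is")
    return ["is"] + words[:i] + words[i + 1:]


def structure_dependent_rule(subject_np, predicate):
    """Structure-dependent rule: move the first 'is' after the subject noun phrase."""
    i = predicate.index("is")
    return ["is"] + subject_np + predicate[:i] + predicate[i + 1:]


# Simple sentence: both rules give the same question.
simple_subject, simple_predicate = ["the", "man"], ["is", "here"]
simple_linear = " ".join(linear_rule(simple_subject + simple_predicate))
simple_structural = " ".join(structure_dependent_rule(simple_subject, simple_predicate))

# Complex sentence: the subject noun phrase contains an 'is' of its own.
subject = ["the", "man", "who", "is", "tall"]
predicate = ["is", "in", "the", "room"]
complex_linear = " ".join(linear_rule(subject + predicate))
complex_structural = " ".join(structure_dependent_rule(subject, predicate))
```

On the simple sentence the two rules are indistinguishable, which is why that kind of input cannot tell a learner which rule is right; only the complex sentence separates them.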
Now, though Chomsky has been fantastically influential in the science of language, that does not
mean that all language scientists agree with him. And there have been a number of critiques of
Chomsky over the years. For one thing, the critics point out, Chomsky hasn’t really shown
principles of universal grammar that are specific to language itself as opposed to general ways in
which the human mind works across multiple domains, language and vision and control of
motion and memory and so on. We don’t really know that universal grammar is specific to
language, according to this critique.
Secondly, Chomsky and the linguists working with him have not examined all 6,000 of the
world’s languages and shown that the principles of universal grammar apply to all
6,000. They’ve posited it based on a small number of languages and the logic of the poverty of
the input, but haven’t actually come through with the data that would be necessary to prove that
universal grammar is really universal.
Finally, the critics argue, Chomsky has not shown that more general purpose learning models,
such as neural network models, are incapable of learning language together with all the other
things that children learn, and therefore has not proven that there has to be specific knowledge
of how grammar works in order for the child to learn grammar.
Another component of language governs the sound pattern of language, the ways that the vowels
and consonants can be assembled into the minimal units that go into words. Phonology, as this
branch of linguistics is called, consists of formation rules that capture what is a possible word in
a language according to the way that it sounds.
To give you an example, the sequence, bluk, is not an English word, but you get a sense that it
could be an English word, that someone could coin a new
term of English that we pronounce “bluk.” But when you hear the sound ****, you instantly
know that not only isn’t it an English word, but it really couldn’t be an English word. ****, by
the way, comes from Yiddish and it means kind of to sigh or to moan. Oi. That’s to ****. The
reason that we recognize that it’s not English is because it has sounds like **** and sequences
like ****, which aren’t part of the formation rules of English phonology. But together with the
rules that define the basic words of a language, there are also phonological rules that make
adjustments to the sounds, depending on what other words the word appears with. Very few
of us realize, for example, in English, that the past tense suffix “ed” is actually pronounced in
three different ways.
When we say, “He walked,” we pronounce the “ed” like a “t,” walked. When we say “jogged,”
we pronounce it as a “d,” jogged. And when we say “patted,” we stick in a vowel, pat-ted,
showing that the same suffix, “ed” can be readjusted in its pronunciation according to the rules
of English phonology.
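The three pronunciations follow from the last sound of the verb stem, and the pattern can be sketched as a small Python function. The input is the final sound of the stem rather than its spelling, the list of voiceless sounds is a standard but simplified inventory, and "ɪd" is one common way of transcribing the inserted vowel; the names `VOICELESS` and `past_suffix` are hypothetical.

```python
# Voiceless consonants of English other than "t" (IPA symbols).
VOICELESS = {"p", "k", "f", "s", "ʃ", "tʃ", "θ"}


def past_suffix(final_sound):
    """Pick the pronunciation of the regular past tense suffix from the stem's last sound."""
    if final_sound in {"t", "d"}:
        return "ɪd"  # a vowel is inserted, as in "patted"
    if final_sound in VOICELESS:
        return "t"   # as in "walked", whose stem ends in "k"
    return "d"       # after voiced sounds, as in "jogged"
```

So `past_suffix("k")` gives "t" for "walked", `past_suffix("g")` gives "d" for "jogged", and `past_suffix("t")` gives "ɪd" for "patted".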
Now, when someone acquires English as a foreign language or acquires a foreign language in
general, they carry over the rules of phonology of their first language and apply them to their second
language. We have a word for it; we call it an “accent.” When a language user deliberately
manipulates the rules of phonology, that is, when they don’t just speak in order to convey
content but pay attention to what phonological structures are being used, we call it poetry
and rhetoric.
So far, I’ve been talking about knowledge of language, the rules that go into defining what are
possible sequences of language. But those sequences have to get into the brain during speech
comprehension and they have to get out during speech production. And that takes us to the topic
of language interfaces.
And let’s start with production. This diagram here is literally a human cadaver that has been
sawn in half. An anatomist took a saw and [sound] allowing us to see in cross section the human
vocal tract.
And that can illustrate how we get our knowledge of language out into the world as a sequence of
sounds.
Now, each of us has at the top of our windpipe or trachea, a complex structure called the larynx
or voice box; it’s behind your Adam’s Apple. And the air coming out of your lungs has to go
past two cartilaginous flaps that vibrate and produce a rich, buzzy sound source, full of
harmonics. Before that vibrating sound gets out to the world, it has to pass through a gauntlet of
chambers of the vocal tract. The throat behind the tongue, the cavity above the tongue, the cavity
formed by the lips, and when you block off airflow through the mouth, it can come out through
the nose.
Now, each one of those cavities has a shape that, thanks to the laws of physics, will amplify
some of the harmonics in that buzzy sound source and suppress others. We can change the shape
of those cavities when we move our tongue around. When we move our tongue forward and
backward, for example, as in “eh,” “aa,” “eh,” “aa,” we change the shape of the cavity behind the
tongue, change the frequencies that are amplified or suppressed and the listener hears them as
two different vowels.
Likewise, when we raise or lower the tongue, we change the shape of the resonant cavity above
the tongue as in say, “eh,” “ah,” “eh,” “ah.” Once again, the change in the mixture of harmonics
is perceived as a change in the nature of the vowel. When we stop the flow of air and then
release it as in, “t,” “ca,” “ba,” then we hear a consonant rather than a vowel, or even when we
restrict the flow of air as in “f,” “ss,” producing a chaotic noisy sound. Each one of those sounds
that gets sculpted by different articulators is perceived by the brain as a qualitatively different
vowel or consonant.
Now, an interesting peculiarity of the human vocal tract is that it obviously co-opts structures
that evolved for different purposes, for breathing and for swallowing and so on. And it’s an
interesting fact first noted by Darwin that the larynx over the course of evolution has descended
in the throat so that every particle of food going from the mouth through the esophagus to the
stomach has to pass over the opening into the larynx with some probability of being inhaled
leading to the danger of death by choking. And in fact, until the invention of the Heimlich
Maneuver, several thousand people every year died of choking because of this maladaptive design of
the human vocal tract.
Why did we evolve a mouth and throat that leaves us vulnerable to choking? Well, a plausible
hypothesis is that it’s a compromise that was made in the course of evolution to allow us to
speak. By giving range to a variety of possibilities for altering the resonant cavities, for
moving the tongue back and forth and up and down, we expanded the range of speech sounds we
could make, improved the efficiency of language, but suffered the compromise of an increased
risk of choking showing that language presumably had some survival advantage that
compensated for the disadvantage in choking.
What about the flow of information in the other direction, that is from the world into the brain,
the process of speech comprehension? Speech comprehension turns out to be an extraordinarily
complex computational process, which we’re reminded of every time we interact with a
voicemail menu on a telephone or use dictation on our computers. For example, one
writer, using a state-of-the-art speech-to-text system, dictated the following words into his
computer. He dictated “book tour,” and it came out on the screen as “back to work.” Another
example, he said, “I truly couldn’t see,” and it came out on the screen as, “a cruelly good
MC.” Even more disconcertingly, he started a letter to his parents by saying, “Dear mom and
dad,” and what came out on the screen was, “The man is dead.” Now, dictation systems have gotten
better and better, but they still have a way to go before they can duplicate a human stenographer.
What is it about the problem of speech understanding that makes it so easy for a human but so
hard for a computer? Well, there are two main contributors. One of them is the fact that each
phoneme, each vowel or consonant, actually comes out very differently, depending on what comes
before and what comes after, a phenomenon sometimes called co-articulation.
Let me give you an example. The place called Cape Cod has two “c” sounds. Each of them
symbolized by the letter “C,” the hard “C.” Nonetheless, when you pay attention to the way you
pronounce them, you notice that in fact, you pronounce them in very different parts of the
mouth. Try it. Cape Cod, Cape Cod… “c,” “c”. In one case, the “c” is produced way back in the
mouth; the other it’s produced much farther forward. We don’t notice that we pronounce “c” in
two different ways depending on whether it comes before an “a” or an “ah,” but that difference
forms a difference in the shape of the resonant cavity in our mouth which produces a very
different wave form.
And unless a computer is specifically programmed to take that variability into account, it will
perceive those two different “c’s” as the different sounds that, objectively speaking, they really are:
“c-eh” “c-oa”. They really are different sounds, but our brain lumps them together. The other
reason that speech recognition is such a difficult problem is because of the absence of
segmentation. Now, we have an illusion when we listen to speech that it consists of a sequence of
sounds corresponding to words. But if you actually were to look at the wave form of a sentence
on an oscilloscope, there would not be little silences between the words the way there are little bits
of white space in printed words on a page, but rather a continuous ribbon in which the end of one
word leads right to the beginning of the next. It’s
something that we’re aware of when we listen to speech in a foreign language when we have no
idea where one word ends and the other one begins.
In our own language, we detect the word boundaries simply because in our mental lexicon, we
have stretches of sound that correspond to one word that tell us where it ends.
But you can’t get that information from the wave form itself. In fact, there’s a whole genre of
wordplay that takes advantage of the fact that word boundaries are not physically present in the
speech wave. Novelty songs like “Mairzy doats and dozy doats and liddle lamzy divey, a
kiddley divey too, wooden shoe?”
Now, it turns out that this is actually a grammatical sequence of words in English… Mares eat
oats and does eat oats and little lambs eat ivy, a kid’ll eat ivy too, wouldn’t you? When it is
spoken or sung normally, the boundaries between words are obliterated and so the same
sequence of sounds can be perceived either as nonsense or if you know what they’re meant to
convey, as sentences.
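The role of the mental lexicon in finding word boundaries can be sketched with a toy example. This is only an illustration under loud assumptions: letters stand in for the continuous stream of sound, the five-word lexicon is invented for the example, and the function name is hypothetical. Given an unbroken string, the code lists every way to carve it into known words.

```python
def segmentations(sounds, lexicon):
    """Return every way to split `sounds` into a sequence of lexicon words."""
    if not sounds:
        return [[]]  # one way to segment nothing: the empty sequence
    results = []
    for end in range(1, len(sounds) + 1):
        word = sounds[:end]
        if word in lexicon:
            # Keep this word only if the remainder can also be segmented.
            for rest in segmentations(sounds[end:], lexicon):
                results.append([word] + rest)
    return results


# A tiny stand-in for the mental lexicon (an assumption for this sketch).
LEXICON = {"mare", "mares", "eat", "seat", "oats"}

for words in segmentations("mareseatoats", LEXICON):
    print(" ".join(words))
```

With this lexicon the same unbroken string comes apart two ways, as “mare seat oats” and as “mares eat oats.” Nothing in the string itself marks the boundaries; they come from the stored words, and it takes further knowledge to pick the reading that makes sense.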
Another example familiar to most children, Fuzzy Wuzzy was a bear, Fuzzy Wuzzy had no
hair. Fuzzy Wuzzy wasn’t very fuzzy, was he? And the famous doggerel, I scream, you scream,
we all scream for ice cream. We are generally unaware of how ambiguous language is. In
context, we effortlessly and unconsciously derive the intended meaning of a sentence, but a poor
computer not equipped with all of our common sense and human abilities and just going by the
words and the rules is often flabbergasted by all the different possibilities. Take a sentence as
simple as “Mary had a little lamb,” you might think that that’s a perfectly simple unambiguous
sentence.
But now imagine that it was continued with “with mint sauce.” You realize that “have” is
actually a highly ambiguous word. As a result, computer translations can often deliver
comically incorrect results. According to legend, one of the first computer systems that was
designed to translate from English to Russian and back again did the following: given the
sentence, “The spirit is willing, but the flesh is weak,” it translated it back as “The vodka is
agreeable, but the meat is rotten.”
So why do people understand language so much better than computers? What is the knowledge
that we have that has been so hard to program into our machines? Well, there’s a third interface
between language and the rest of the mind, and that is the subject matter of the branch of
linguistics called Pragmatics, namely, how people understand language in context using their
knowledge of the world and their expectations about how other speakers communicate.
The most important principle of Pragmatics is called “the cooperative principle,” namely,
assume that your conversational partner is working with you to try to get a meaning across
truthfully and clearly. And our knowledge of Pragmatics, like our knowledge of syntax and
phonology and so on, is deployed effortlessly, but involves many intricate computations.
For example, if I were to say, “If you could pass the guacamole, that would be awesome.” You
understand that as a polite request meaning, give me the guacamole. You don’t interpret it
literally as a rumination about a hypothetical affair, you just assume that the person wanted
something and was using that string of words to convey the request politely. Often comedies will
use the absence of pragmatics in robots as a source of humor.
As in the old “Get Smart” situation comedy, which had a robot named Hymie, and a recurring
joke in the series would be that Maxwell Smart would say to Hymie, “Hymie, can you give me a
hand?” And then Hymie would go, {sound}, remove his hand and pass it over to Maxwell Smart,
not understanding that “give me a hand,” in context means, help me rather than literally transfer
the hand over to me.
Or take the following example of Pragmatics in action. Consider the following dialogue, Martha
says, “I’m leaving you.” John says, “Who is he?” Now, understanding language requires finding
the antecedents of pronouns, in this case who the “he” refers to, and any competent English speaker
knows exactly who the “he” is, presumably John’s romantic rival even though it was never stated
explicitly in any part of the dialogue. This shows how we bring to bear on language
understanding a vast store of knowledge about human behavior, human interactions, human
relationships. And we often have to use that background knowledge even to solve mechanical
problems like, who does a pronoun like “he” refer to. It’s that knowledge that’s extraordinarily
difficult, to say the least, to program into a computer.
Language is a miracle of the natural world because it allows us to exchange an unlimited number
of ideas using a finite set of mental tools. Those mental tools comprise a large lexicon of
memorized words and a powerful mental grammar that can combine them. Language thought of
in this way should not be confused with writing, with the prescriptive rules of proper grammar or
style or with thought itself.
Modern linguistics is guided by the questions, though not always the answers, suggested by the
linguist known as Noam Chomsky, namely how is the unlimited creativity of language
possible? What are the abstract mental structures that relate words to one another? How do
children acquire them? What is universal across languages? And what does that say about the
human mind? The study of language has many practical applications including computers that
understand and speak, the diagnosis and treatment of language disorders, the teaching of reading,
writing, and foreign languages, the interpreting of the language of law, politics and literature.
But for someone like me, language is eternally fascinating because it speaks to such fundamental
questions of the human condition. Language is really at the center of a number of different
concerns: of thought, of social relationships, of human biology, of human evolution, that all speak
to what’s special about the human species. Language is the most distinctively human
talent. Language is a window into human nature, and most significantly, the vast expressive
power of language is one of the wonders of the natural world. Thank you.