
Natural Language Processing

(PE 633 – IT)

LECTURE NOTES

Prepared By
J.SOWJANYA
Asst. Prof., ITD
Department of Information Technology

M.V.S.R. Engineering College


(Sponsored by Matrusri Educational Society, Estd. 1980)
Affiliated to Osmania University, Hyderabad and Recognized by A.I.C.T.E., New Delhi
Nadergul (P.O), Hyderabad – 501510, A.P.

Vision of the Department:

To impart technical education producing competent and socially responsible engineering


professionals in the field of Information Technology.

Mission of the Department:

M1. To make the teaching-learning process effective and stimulating.

M2. To provide adequate fundamental knowledge of sciences and Information Technology with a
positive attitude.

M3. To create an environment that enhances skills and technologies required for industry.

M4. To encourage creativity and innovation for solving real world problems.

M5. To cultivate professional ethics in students and inculcate a sense of responsibility towards
society.

Program Educational Objectives:

After 3 to 4 years of graduation, graduates of the Information Technology program will:

PEO1: Engage in Professional development or postgraduate education to be a life-long learner.

PEO2: Apply knowledge of mathematics and Information Technology to analyze, design and
implement solutions for real world problems in core or in multidisciplinary areas.

PEO3: Communicate effectively, work in a team, practice professional ethics and apply
knowledge of computing technologies for societal development.

Program Specific Outcomes (PSOs):


PSO 1- Hardware design: An ability to analyze, design, simulate and implement computer
hardware / software and use basic analog/digital circuits, VLSI design for various computing
and communication system applications.

PSO 2 - Software design: An ability to analyze a problem, design algorithm, identify and define
the computing requirements appropriate to its solution and implement the same.
PROGRAM OUTCOMES (POs)

Engineering Graduates will be able to:

1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering


fundamentals, and an engineering specialization to the solution of complex engineering problems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of mathematics,
natural sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and
design system components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and environmental
considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research
methods including design of experiments, analysis and interpretation of data, and synthesis of the
information to provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities with
an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to
the professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for
sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms
of the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader in
diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as, being able to comprehend and write effective
reports and design documentation, make effective presentations, and give and receive clear
instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.
UNIVERSITY ALMANAC
1. Commencement of classes : 28-03-2022

2. CIE (Internal Test ) - I : 02-05-2022 to 07-05-2022

3. Display of CIE - I Marks : 13-05-2022

4. CIE ( Internal Test ) -II : 13-06-2022 to 16-06-2022

5. Display of total Sessional Marks : 25-06-2022

6. Last date of instruction : 18-06-2022

7. Preparation & Practical Examinations : 27-06-2022 to 08-07-2022

8. Commencement of Theory Examinations : 11-07-2022

9. Commencement of Next Academic year : 25-07-2022

TIME TABLE

Section-B
1. Wednesday : 10:30 am to 11:30 am

2. Thursday : 11:40 am to 12:40 pm

3. Friday : 09:30 am to 10:30 am

4. Saturday : 10:30 am to 11:30 am

Section- C
1. Monday : 10:30 am to 11:30 am

2. Tuesday : 12:40 pm to 1:40 pm

3. Wednesday : 12:40 pm to 1:40 pm

4. Thursday : 09:30 am to 10:30 am


Maturi Venkata Subba Rao (MVSR) Engineering College
Nadergul, Hyderabad-501510
DEPARTMENT OF INFORMATION TECHNOLOGY

LIST OF HOLIDAYS – 2022


Occasion / Festival Date Week day
New Year Day 01-01-2022 Saturday
Bhogi 14-01-2022 Friday
Sankrathi Pongal 15-01-2022 Saturday
Republic Day 26-01-2022 Wednesday
Mahasivaratri 01-03-2022 Tuesday
Holi 18-03-2022 Friday
Ugadi 02-04-2022 Saturday
Babu Jagjivan Ram Birthday 05-04-2022 Tuesday
Srirama Navami 10-04-2022 Sunday
Dr. Ambedkar Birthday 14-04-2022 Thursday
Good Friday 15-04-2022 Friday
Eid ul Fitr (Ramzan) 03-05-2022 Tuesday
Following day of Ramzan. 04-05-2022 Wednesday
Eidul Azha (Bakrid) 10-07-2022 Sunday
Bonalu 25-07-2022 Monday
Muharram 09-08-2022 Tuesday
Independence Day 15-08-2022 Monday
Sri Krishnastami 20-08-2022 Saturday
Vinayaka chavithi 31-08-2022 Wednesday
Bathukamma Starting Day 25-09-2022 Sunday
Gandhi Jayanthi 02-10-2022 Sunday
Vijaya Dasami/ Dussehra 05-10-2022 Wednesday
Following day of Vijaya Dasami 06-10-2022 Thursday
Eid Milad-un-Nabi 09-10-2022 Sunday
Deepavali 25-10-2022 Tuesday
Karthika Purnima/Gurunanak Jayanthi 08-11-2022 Tuesday
Christmas 25-12-2022 Sunday
Boxing Day 26-12-2022 Monday
PE 633 IT WITH EFFECT FROM THE ACADEMIC YEAR 2018-2019

NATURAL LANGUAGE PROCESSING


Instruction 3 Periods per week
Duration of University Examination 3 Hours
University Examination 70 Marks
Sessional 30 Marks
UNIT – I
Natural Language Processing: Introduction to Natural Language Processing, the study of
Language, Applications of NLP, Evaluating Language Understanding Systems, Different
levels of Language Analysis, Representations and Understanding, Organization of Natural
Language Understanding Systems,
Linguistic Background: An outline of English syntax. Spoken Language Input and Output
Technologies. Written Language Input – Mathematical Methods – Statistical Modelling and
Classification – Finite State Methods.
Grammar for Natural Language Processing – Parsing – Semantic and Logic Form.

UNIT – II
Introduction to Semantics and Knowledge Representation: some applications like
Machine translation, database interface Semantic Interpretation, word senses and
ambiguity, Basic logical form language, Encoding ambiguity in logical form, Thematic roles,
Linking syntax and semantics, Recent trends in NLP.

UNIT – III
Grammars and Parsing: Grammars and sentence Structure, Top-Down and Bottom-Up
Parsers, Transition Network Grammars, Top-Down Chart Parsing. Feature Systems and
Augmented Grammars: Basic Feature system for English, Morphological Analysis and the
Lexicon, Parsing with Features, Augmented Transition Networks.

UNIT – IV
Semantic Interpretation: word senses and ambiguity, Basic logical form language,
Encoding ambiguity in logical form, Thematic roles, Linking syntax and semantics, Recent
trends in NLP.

UNIT – V
Ambiguity Resolution: Statistical Methods, Probabilistic Language Processing, Estimating
Probabilities, Part- of- Speech tagging, Obtaining Lexical Probabilities, Probabilistic
Context- Free Grammars, Best First Parsing. Semantics and Logical Form, Word senses and
Ambiguity, Encoding Ambiguity in Logical Form.

Suggested Readings:
1. James Allen, "Natural Language Understanding", Pearson Education.
2. Christopher D. Manning and Hinrich Schutze, "Foundations of Statistical Natural
Language Processing", MIT Press, 1999.
3. Akshar Bharti, Vineet Chaitanya and Rajeev Sangal, "NLP: A Paninian Perspective",
Prentice Hall, New Delhi.
4. D. Jurafsky, J. H. Martin, "Speech and Language Processing", Pearson.
Objectives and Outcomes

Course Name: Natural Language Processing Faculty: J.Sowjanya

Code: PE 633 IT

Objectives:

1. To represent and analyse natural language both spoken and written, using statistical
and finite state methods for modelling and classification. To use grammar for
natural language processing.
2. To study knowledge representation from its semantics view point with emphasis on
applications. To study basic logical form language to encode ambiguity.
3. To study augmented grammars and parsers for feature systems.
4. To resolve and encode ambiguity using statistical methods to estimate lexical
probabilities along with critical study of probabilistic context free grammars and
parsing.
5. To interpret semantics covering ambiguity and to link syntax to semantics.

Outcomes:

After completing this course, the student will be able to

1. Use statistical and finite state methods for modelling and classification for
representation and analysis of natural languages, and use grammars for natural
language processing.
2. Apply knowledge representation and semantics to machine translation and
database semantic interpretation.
3. Perform top-down and bottom-up parsing, and parsing with features.
4. Estimate lexical probabilities, resolve ambiguity, and use probabilistic context-free
grammar.
5. Encode ambiguity in logical form language, deal with word senses and ambiguity, and
link syntax to semantics.
LECTURE PLAN
Course: Natural Language Processing Faculty: J. Sowjanya.
Code: PE 633 IT Designation: Asst Prof.

B.E. III Year, Section B, VI Semester, 2021-2022


Department of Information Technology, Maturi Venkata Subba Rao (MVSR) Engineering College.

Week | Lecture No. | No. of Periods | Delivery Date | Topic | Unit (Broad Area)
1 | 1 | 1 | 30/03 | Introduction to Natural Language Processing, the study of Language | Unit 1 (Natural Language Processing & Linguistic Background)
1 | 2 | 1 | 31/03 | Applications of NLP, Evaluating Language Understanding Systems |
1 | 3 | 1 | 01/04 | Different levels of Language Analysis, Representations and Understanding |
2 | 4 | 1 | 06/04 | Organization of Natural Language Understanding Systems; Linguistic Background |
2 | 5 | 1 | 07/04 | An outline of English syntax |
2 | 6 | 1 | 08/04 | An outline of English syntax (contd.) |
3 | 7 | 1 | 13/04 | Spoken Language input and output Technologies |
3 | 8 | 1 | 16/04 | Written language Input; Mathematical Methods – statistical Modelling |
3 | 9 | 1 | 16/04 | classification, Finite State Methods; Grammar for Natural Language Processing, Parsing, Semantic and Logic Form |
4 | 10 | 1 | 20/04 | Introduction to Semantics and Knowledge Representation | Unit 2 (Introduction to Semantics and Knowledge Representation)
4 | 11 | 1 | 21/04 | Some applications: Machine translation, database interface |
4 | 12 | 1 | 22/04 | Semantic Interpretation, word senses and ambiguity |
4 | 13 | 1 | 23/04 | Basic logical form language |
5 | 14 | 1 | 27/04 | Encoding ambiguity in logical form |
5 | 15 | 1 | 28/04 | Thematic roles |
5 | 16 | 1 | 29/04 | Linking syntax and semantics |
5 | 17 | 1 | 30/04 | Recent trends in NLP |
6 | 18 | 1 | 05/05 | Grammars and Parsing: Grammars and sentence Structure | Unit 3 (Grammars and Parsing)
6 | 19 | 1 | 06/05 | Top-Down and Bottom-Up Parsers |
6 | 20 | 1 | 07/05 | Transition Network Grammars, Top-Down Chart Parsing |
7 | CIE – I (First Internal Test): 09-05-2022 to 11-05-2022
7 | 21 | 1 | 12/05 | Feature Systems and Augmented Grammars |
7 | 22 | 1 | 13/05 | Basic Feature system for English |
7 | 23 | 1 | 14/05 | Morphological Analysis and the Lexicon |
8 | 24 | 1 | 18/05 | Parsing with Features |
8 | 25 | 1 | 19/05 | Augmented Transition Networks |
8 | 26 | 1 | 20/05 | Semantic Interpretation: word senses and ambiguity | Unit 4 (Semantic Interpretation)
8 | 27 | 1 | 21/05 | Basic logical form language |
9 | 28 | 1 | 18/05 | Encoding ambiguity in logical form |
9 | 29 | 1 | 19/05 | Thematic roles |
9 | 30 | 1 | 20/05 | Linking syntax and semantics |
9 | 31 | 1 | 21/05 | Recent trends in NLP |
10 | 32 | 1 | 25/05 | Ambiguity Resolution: Statistical Methods | Unit 5 (Ambiguity Resolution)
10 | 33 | 1 | 26/05 | Probabilistic Language Processing |
10 | 34 | 1 | 27/05 | Estimating Probabilities |
10 | 35 | 1 | 28/05 | Part-of-Speech tagging |
10 | 36 | 1 | 01/06 | Obtaining Lexical Probabilities |
11 | 37 | 1 | 02/06 | Probabilistic Context-Free Grammars |
11 | 38 | 1 | 03/06 | Best First Parsing |
11 | 39 | 1 | 04/06 | Semantics and Logical Form |
12 | 40 | 1 | 08/06 | Word senses and Ambiguity |
12 | 41 | 1 | 09/06 | Encoding Ambiguity in Logical Form |
12 | 42 | 1 | 10/06 | Review of Units I-V, Previous Question Papers |
13 | CIE – II (Second Internal Test): 13-06-2022 to 16-06-2022 (Last Date of Instruction: 18-06-2022)

Total No. of Periods: 42


DATE: Signature:
LECTURE PLAN
Course: Natural Language Processing Faculty: J. Sowjanya.
Code: PE 633 IT Designation: Asst Prof.

B.E. III Year, Section C, VI Semester, 2021-2022


Department of Information Technology, Maturi Venkata Subba Rao (MVSR) Engineering College.

Week | Lecture No. | No. of Periods | Delivery Date | Topic | Unit (Broad Area)
1 | 1 | 1 | 28/03 | Introduction to Natural Language Processing, the study of Language | Unit 1 (Natural Language Processing & Linguistic Background)
1 | 2 | 1 | 29/03 | Applications of NLP, Evaluating Language Understanding Systems |
1 | 3 | 1 | 30/03 | Different levels of Language Analysis, Representations and Understanding |
1 | 4 | 1 | 31/03 | Organization of Natural Language Understanding Systems |
2 | 5 | 1 | 04/04 | Linguistic Background: An outline of English syntax |
2 | 6 | 1 | 06/04 | An outline of English syntax (contd.) |
2 | 7 | 1 | 07/04 | Spoken Language input and output Technologies |
3 | 8 | 1 | 11/04 | Written language Input; Mathematical Methods – statistical Modelling |
3 | 9 | 1 | 12/04 | classification, Finite State Methods |
3 | 10 | 1 | 13/04 | Grammar for Natural Language Processing, Parsing, Semantic and Logic Form |
4 | 11 | 1 | 18/04 | Introduction to Semantics and Knowledge Representation | Unit 2 (Introduction to Semantics and Knowledge Representation)
4 | 12 | 1 | 19/04 | Some applications: Machine translation, database interface |
4 | 13 | 1 | 20/04 | Semantic Interpretation, word senses and ambiguity |
4 | 14 | 1 | 21/04 | Basic logical form language |
5 | 15 | 1 | 25/04 | Encoding ambiguity in logical form, Thematic roles |
5 | 16 | 1 | 26/04 | Linking syntax and semantics, Recent trends in NLP |
5 | 17 | 1 | 27/04 | Grammars and Parsing: Grammars and sentence Structure |
5 | 18 | 1 | 28/04 | Top-Down and Bottom-Up Parsers |
6 | 19 | 1 | 02/05 | Transition Network Grammars |
6 | 20 | 1 | 05/05 | Top-Down Chart Parsing |
7 | CIE – I (First Internal Test): 09-05-2022 to 11-05-2022
7 | 21 | 1 | 12/05 | Feature Systems and Augmented Grammars | Unit 3 (Grammars and Parsing)
8 | 22 | 1 | 16/05 | Basic Feature system for English |
8 | 23 | 1 | 17/05 | Morphological Analysis and the Lexicon |
8 | 24 | 1 | 18/05 | Parsing with Features, Augmented Transition Networks |
8 | 25 | 1 | 19/05 | Semantic Interpretation: word senses and ambiguity |
9 | 26 | 1 | 18/05 | Basic logical form language, Encoding ambiguity in logical form | Unit 4 (Semantic Interpretation)
9 | 27 | 1 | 19/05 | Thematic roles |
9 | 28 | 1 | 20/05 | Linking syntax and semantics |
9 | 29 | 1 | 21/05 | Recent trends in NLP |
10 | 30 | 1 | 23/05 | Ambiguity Resolution: Statistical Methods | Unit 5 (Ambiguity Resolution)
10 | 31 | 1 | 24/05 | Probabilistic Language Processing |
10 | 32 | 1 | 25/05 | Estimating Probabilities |
10 | 33 | 1 | 26/05 | Part-of-Speech tagging |
10 | 34 | 1 | 30/05 | Obtaining Lexical Probabilities |
11 | 35 | 1 | 31/05 | Probabilistic Context-Free Grammars |
11 | 36 | 1 | 01/06 | Best First Parsing |
11 | 37 | 1 | 02/06 | Semantics and Logical Form |
12 | 38 | 1 | 06/06 | Word senses and Ambiguity |
12 | 39 | 1 | 07/06 | Encoding Ambiguity in Logical Form |
12 | 40 | 1 | 08/06 | Review of Units I-V, Previous Question Papers |
12 | 41 | 1 | 09/06 | Review of Units I-V, Previous Question Papers |
13 | CIE – II (Second Internal Test): 13-06-2022 to 16-06-2022 (Last Date of Instruction: 18-06-2022)

Total No. of Periods: 41


DATE: Signature:
UNIT – I
Natural Language Processing: Introduction to Natural Language Processing, the study of
Language, Applications of NLP, Evaluating Language Understanding Systems, Different levels of
Language Analysis, Representations and Understanding, Organization of Natural Language
Understanding Systems,
Linguistic Background: An outline of English syntax. Spoken Language Input and Output
Technologies. Written Language Input – Mathematical Methods – Statistical Modelling and
Classification – Finite State Methods.
Grammar for Natural Language Processing – Parsing – Semantic and Logic Form.

Introduction to Natural Language Processing


NLP is an automated way to understand and analyze natural human language and extract
information from such data by applying machine learning algorithms. It is the technology that is used
by machines to understand, analyse, manipulate, and interpret human languages. It helps developers to
organize knowledge for performing tasks such as translation, automatic summarization, Named Entity
Recognition (NER), speech recognition, relationship extraction, and topic segmentation.

Applications of NLP
The following are some applications of NLP:
1. Speech Recognition - Speech recognition is used for converting spoken words into text. It is used in
applications such as mobile devices, home automation, video retrieval, dictation to Microsoft Word, voice
biometrics, voice user interfaces, and so on.
2. Spam Detection -Spam detection is used to detect unwanted e-mails getting to a user's inbox.
3. Sentiment Analysis - Sentiment analysis is also known as opinion mining. It is used on the web to
analyse the attitude, behaviour, and emotional state of the sender. This application is implemented through
a combination of NLP (Natural Language Processing) and statistics by assigning values to the text
(positive, negative, or neutral) and identifying the mood of the context (happy, sad, angry, etc.).
4. Machine Translation - Machine translation is used to translate text or speech from one natural language
to another natural language. Example: Google Translator
5. Spelling correction (spell check) - Word processors such as Microsoft Word and PowerPoint use NLP
for spelling correction.
6. Chatbot (Virtual Assistant) - Chatbots are used by many companies to provide customer chat services.
7. Information extraction - used for extracting structured information from unstructured or semi-
structured machine-readable documents.
8. Question Answering- focuses on building systems that automatically answer the questions asked by
humans in a natural language.
9. Auto correction
10. Recommendations.

Components of NLP

Input → Speech Recognition → NLU → NLG → Output

NLP has the following two components:


1. Natural Language Understanding (NLU)
Natural Language Understanding (NLU) helps the machine to understand and analyse human language by
extracting metadata from content such as concepts, entities, keywords, emotions, relations, and semantic
roles. NLU is mainly used in business applications to understand the customer's problem in both spoken
and written language.
NLU involves the following tasks -
• It maps the given input into a useful representation.
• It analyses different aspects of the language.

2. Natural Language Generation (NLG)


Natural Language Generation (NLG) acts as a translator that converts computerized data into a natural
language representation. It mainly involves text planning, sentence planning, and text realization.
Note: NLU is harder than NLG.
Difference between NLU and NLG
NLU is the process of reading and interpreting language; it produces non-linguistic outputs from natural
language inputs.
NLG is the process of writing or generating language; it produces natural language outputs from
non-linguistic inputs.

NLP pipeline
The following steps are used to build an NLP pipeline:
Step 1: Sentence Segmentation - Sentence segmentation is the first step in building the NLP pipeline. It
breaks the paragraph into separate sentences.
Example: Consider the following paragraph -
Independence Day is one of the important festivals for every Indian citizen. It is celebrated on the
15th of August each year ever since India got independence from the British rule. The day celebrates
independence in the true sense.
Sentence Segment produces the following result:
1. "Independence Day is one of the important festivals for every Indian citizen."
2. "It is celebrated on the 15th of August each year ever since India got independence from the British
rule."
3. "This day celebrates independence in the true sense."

Step 2: Word Tokenization


Word Tokenizer is used to break the sentence into separate words or tokens.
Example:
JavaTpoint offers Corporate Training, Summer Training, Online Training, and Winter Training.
Word Tokenizer generates the following result:
"JavaTpoint", "offers", "Corporate", "Training", "Summer", "Training", "Online", "Training", "and", "Winter",
"Training", "."

Step 3: Stemming
Stemming is used to normalize words into their base or root form. For example, celebrates, celebrated,
and celebrating all derive from the single root word "celebrate." The big problem with stemming is that
it sometimes produces a root word which may not have any meaning.
For example, intelligence, intelligent, and intelligently all reduce to the single root "intelligen." In
English, the word "intelligen" does not have any meaning.

Step 4: Lemmatization
Lemmatization is quite similar to stemming. It is used to group the different inflected forms of a word
into its dictionary form, called the lemma. The main difference from stemming is that lemmatization
produces a root word which has a meaning.
For example, in lemmatization the words intelligence, intelligent, and intelligently map to a root
intelligent, which has a meaning.
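
A minimal sketch with NLTK's WordNet lemmatizer, assuming the "wordnet" corpus has been downloaded (nltk.download("wordnet")); unlike a stem, the lemma is a dictionary word.

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
# The pos argument ("v" = verb) tells the lemmatizer how to look up the word.
print(lemmatizer.lemmatize("celebrated", pos="v"))   # -> celebrate
print(lemmatizer.lemmatize("celebrating", pos="v"))  # -> celebrate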

Step 5: Identifying Stop Words


In English, there are a lot of words that appear very frequently, like "is", "and", "the", and "a". NLP
pipelines flag these words as stop words. Stop words may be filtered out before doing any statistical
analysis.
Example: He is a good boy.
Note: When you are building a rock band search engine, you should not ignore the word "The".
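
A stop-word filtering sketch with NLTK, assuming the "stopwords" corpus has been downloaded (nltk.download("stopwords")):

import nltk
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
tokens = nltk.word_tokenize("He is a good boy.")
# "He", "is", and "a" are stop words and are filtered out here.
print([t for t in tokens if t.lower() not in stop_words])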

Step 6: Dependency Parsing


Dependency parsing is used to find how all the words in a sentence are related to each other.
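
NLTK itself does not include a dependency parser, so the sketch below uses spaCy as one possible tool; it assumes spaCy is installed and the small English model has been downloaded (python -m spacy download en_core_web_sm).

import spacy

nlp = spacy.load("en_core_web_sm")
for token in nlp("He is a good boy."):
    # Each word is linked to a head word by a grammatical relation (dep_).
    print(token.text, token.dep_, "->", token.head.text)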

Step 7: POS tags


POS stands for parts of speech, which include noun, verb, adverb, and adjective. A POS tag indicates how
a word functions in meaning as well as grammatically within the sentence. A word can have one or more
parts of speech depending on the context in which it is used.
Example: "Google" something on the Internet.
In the above example, "Google" is used as a verb, although it is a proper noun.
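
A POS-tagging sketch with NLTK, assuming the "averaged_perceptron_tagger" model has been downloaded; the tag assigned to each word (for example NN for a noun or VB for a verb) depends on the trained model and the surrounding context.

import nltk

tokens = nltk.word_tokenize("Google something on the Internet.")
# pos_tag pairs every token with a part-of-speech tag.
print(nltk.pos_tag(tokens))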

Step 8: Named Entity Recognition (NER)


Named Entity Recognition (NER) is the process of detecting named entities such as a person name, movie
name, organization name, or location.
Example: Steve Jobs introduced iPhone at the Macworld Conference in San Francisco, California.
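
An NER sketch using spaCy (assuming the en_core_web_sm model is installed); the detected entities carry labels such as PERSON, ORG, GPE, or EVENT.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Steve Jobs introduced iPhone at the Macworld Conference "
          "in San Francisco, California.")
for ent in doc.ents:
    # Each named entity comes with a predicted type label.
    print(ent.text, "->", ent.label_)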
Step 9: Chunking
Chunking is used to collect individual pieces of information and group them into bigger units, such as
phrases within a sentence.
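
A chunking sketch with NLTK's regular-expression chunker; the one-rule noun-phrase grammar here is an illustrative assumption, not a complete grammar of English.

import nltk

tagged = nltk.pos_tag(nltk.word_tokenize(
    "The little yellow dog barked at the cat."))
# NP = optional determiner, any number of adjectives, then a noun.
grammar = "NP: {<DT>?<JJ>*<NN>}"
print(nltk.RegexpParser(grammar).parse(tagged))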

Phases of NLP
There are five phases of NLP:

1. Lexical and Morphological Analysis


The first phase of NLP is lexical analysis. This phase scans the input text as a stream of characters and
converts it into meaningful lexemes. It divides the whole text into paragraphs, sentences, and words.

2. Syntactic Analysis (Parsing)


Syntactic analysis is used to check grammar and word arrangement, and it shows the relationships among
the words.
Example: Agra goes to the Poonam
In the real world, "Agra goes to the Poonam" does not make any sense, so this sentence is rejected by the
syntactic analyzer.

3. Semantic Analysis
Semantic analysis is concerned with meaning representation. It mainly focuses on the literal meaning of
words, phrases, and sentences.
4. Discourse Integration
Discourse integration depends upon the sentences that precede it and also invokes the meaning of the
sentences that follow it.

5. Pragmatic Analysis
Pragmatic analysis is the fifth and last phase of NLP. It helps you to discover the intended effect by
applying a set of rules that characterize cooperative dialogues.
For example, "Open the door" is interpreted as a request instead of an order.

Why is NLP difficult?

NLP is difficult because ambiguity and uncertainty exist in language.


Ambiguity
There are three types of ambiguity:
• Lexical Ambiguity
Lexical ambiguity exists when a single word has two or more possible meanings.
Example:
Manya is looking for a match.
In the above example, the word match is ambiguous: Manya may be looking for a partner or for a match
(a cricket match or some other contest).
• Syntactic Ambiguity
Syntactic ambiguity exists when a sentence can be parsed in two or more ways, each giving a different
meaning.
Example:
I saw the girl with the binoculars.
In the above example, did I have the binoculars, or did the girl have the binoculars?
• Referential Ambiguity
Referential ambiguity exists when something is referred to using a pronoun.
Example: Kiran went to Sunita. She said, "I am hungry."
In the above sentence, you do not know who is hungry: Kiran or Sunita.
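
Lexical ambiguity can be made concrete with WordNet, where each synset is one dictionary sense of a word; the sketch assumes NLTK's "wordnet" corpus has been downloaded.

from nltk.corpus import wordnet

# "match" has many senses: a contest, a lighter, a counterpart, and so on.
for synset in wordnet.synsets("match")[:5]:
    print(synset.name(), "-", synset.definition())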

Difference between Natural language and Computer Language

Natural language has a very large vocabulary; computer language has a very limited vocabulary.
Natural language is easily understood by humans; computer language is easily understood by machines.
Natural language is ambiguous in nature; computer language is unambiguous.
Introduction to Natural Language Understanding

1.1 The Study of Language

Language is one of the fundamental aspects of human behavior and is a crucial component
of our lives. In written form it serves as a long-term record of knowledge from one generation to
the next. In spoken form it serves as our primary means of coordinating our day-to-day behavior
with others. This book describes research about how language comprehension and production
work. The goal of this research is to create computational models of language in enough detail that
you could write computer programs to perform various tasks involving natural language. The
ultimate goal is to be able to specify models that approach human performance in the linguistic
tasks of reading, writing, hearing, and speaking.
Language is studied in several different academic disciplines. Each discipline defines its own
set of problems and has its own methods for addressing them.
1. The linguist - studies the structure of language itself, considering questions such as why certain
combinations of words form sentences but others do not, and why a sentence can have some
meanings but not others.
2. The psycholinguist - studies the processes of human language production and comprehension,
considering questions such as how people identify the appropriate structure of a sentence and
when they decide on the appropriate meaning for words.
3. The philosopher- considers how words can mean anything at all and how they identify objects
in the world. Philosophers also consider what it means to have beliefs, goals, and intentions,
and how these cognitive capabilities relate to language.
4. Computational linguist - goal is to develop a computational theory of language, using the
notions of algorithms and data structures from computer science. Of course, to build a
computational model, you must take advantage of what is known from all the other disciplines.
Figure 1.1 summarizes these different approaches to studying language.

Discipline: Linguists
Typical problems: How do words form phrases and sentences? What constrains the possible meanings
for a sentence?
Tools: Intuitions about well-formedness and meaning; mathematical models of structure (for example,
formal language theory, model theoretic semantics)

Discipline: Psycholinguists
Typical problems: How do people identify the structure of sentences? How are word meanings
identified? When does understanding take place?
Tools: Experimental techniques based on measuring human performance; statistical analysis of
observations

Discipline: Philosophers
Typical problems: What is meaning, and how do words and sentences acquire it? How do words identify
objects in the world?
Tools: Natural language argumentation using intuition about counterexamples; mathematical models
(for example, logic and model theory)

Discipline: Computational Linguists
Typical problems: How is the structure of sentences identified? How can knowledge and reasoning be
modeled? How can language be used to accomplish specific tasks?
Tools: Algorithms, data structures; formal models of representation and reasoning; AI techniques
(search and representation methods)

Figure 1.1 The major disciplines studying language

Computational models are useful both for scientific purposes — for exploring the nature of
linguistic communication — and for practical purposes — for enabling effective human-machine
communication.
The scientific motivation is to obtain a better understanding of how language works. It
recognizes that any one of the other traditional disciplines does not have the tools to completely
address the problem of how language comprehension and production work. Even if you combine
all the approaches, a comprehensive theory would be too complex to be studied using traditional
methods. But we may be able to realize such complex theories as computer programs and then test
them by observing how well they perform. By seeing where they fail, we can incrementally improve
them. Computational models may provide very specific predictions about human behavior that can
then be explored by the psycholinguist. By continuing in this process, we may eventually acquire a
deep understanding of how human language processing occurs. To realize such a dream will take
the combined efforts of linguists, psycholinguists, philosophers, and computer scientists. This
common goal has motivated a new area of interdisciplinary research often called cognitive science.

The practical, or technological, motivation is that natural language processing capabilities


would revolutionize the way computers are used. Since most of human knowledge is recorded in
linguistic form, computers that could understand natural language could access all this information.
In addition, natural language interfaces to computers would allow complex systems to be accessible
to everyone. Such systems would be considerably more flexible and intelligent than is possible with
current computer technology. For technological purposes it does not matter if the model used
reflects the way humans process language. It only matters that it works.

This book takes a middle ground between the scientific and technological goals. On the one
hand, this reflects a belief that natural language is so complex that an ad hoc approach without a
well-specified underlying theory will not be successful. Thus the technological goal cannot be
realized without using sophisticated underlying theories on the level of those being developed by
linguists, psycholinguists, and philosophers. On the other hand, the present state of knowledge
about natural language processing is so preliminary that attempting to build a cognitively correct
model is not feasible. Rather, we are still attempting to construct any model that appears to work.

1.2 Applications of Natural Language Understanding

A good way to define natural language research is to consider the different applications that
researchers work on. As you consider these examples, it will also be a good opportunity to
consider what it would mean to say that a computer system understands natural language. The
applications can be divided into two major classes: text-based applications and dialogue-based
applications.
Text-based applications involve the processing of written text, such as books, newspapers,
reports, manuals, e-mail messages, and so on. These are all reading-based tasks. Text-based natural
language research is ongoing in applications such as

1. finding appropriate documents on certain topics from a data-base of texts (for example,
finding relevant books in a library)
2. extracting information from messages or articles on certain topics (for example, building
a database of all stock transactions described in the news on a given day)
3. translating documents from one language to another (for example, producing automobile
repair manuals in many different languages)
4. summarizing texts for certain purposes (for example, producing a 3-page summary of a
1000-page government report)
Not all systems that perform such tasks must be using natural language understanding
techniques in the way we mean in this book. For example, consider the task of finding newspaper
articles on a certain topic in a large database. Many techniques have been developed that classify
documents by the presence of certain keywords in the text. You can then retrieve articles on a
certain topic by looking for articles that contain the keywords associated with that topic. Articles
on law, for instance, might contain the words "lawyer", "court", "sue", "affidavit", and so on, while
articles on stock transactions might contain words such as "stocks", "takeover", "leveraged buyout",
"options", and so on. Such a system could retrieve articles on any topic that has been predefined by
a set of keywords. Clearly, we would not say that this system is understanding the text; rather, it is
using a simple matching technique. While such techniques may produce useful applications, they
are inherently limited. It is very unlikely, for example, that they could be extended to handle
complex retrieval tasks that are easily expressed in natural language, such as the query "Find me
all articles on leveraged buyouts involving more than 100 million dollars that were attempted but
failed during 1986 and 1990". To handle such queries, the system would have to be able to extract
enough information from each article in the database to determine whether the article meets the
criteria defined by the query; that is, it would have to build a representation of the information in
the articles and then use the representation to do the retrievals. This identifies a crucial
characteristic of an understanding system: it must compute some representation of the information
that can be used for later inference.
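
The keyword-matching technique described above can be sketched in a few lines of Python; the topics, keyword lists, and articles here are illustrative assumptions.

# Topic -> keyword sets, as in the law and stock-transaction examples.
topics = {
    "law":    {"lawyer", "court", "sue", "affidavit"},
    "stocks": {"stocks", "takeover", "leveraged", "buyout", "options"},
}
articles = {
    "article-1": "The lawyer filed an affidavit with the court.",
    "article-2": "The takeover was structured as a leveraged buyout.",
}

def matching_topics(text):
    words = set(text.lower().replace(".", "").split())
    # Pure surface matching: no representation of meaning is computed.
    return [topic for topic, keywords in topics.items() if words & keywords]

for name, text in articles.items():
    print(name, "->", matching_topics(text))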
Consider another example. Some machine translation systems have been built that are
based on pattern matching; that is, a sequence of words in one language is associated with a
sequence of words in another language. The translation is accomplished by finding the best set of
patterns that match the input and producing the associated output in the other language. This
technique can produce reasonable results in some cases but sometimes produces completely
wrong translations because of its inability to use an understanding of content to disambiguate
word senses and sentence meanings appropriately. In contrast, other machine translation systems
operate by producing a representation of the meaning of each sentence in one language, and then
producing a sentence in the other language that realizes the same meaning. This latter approach,
because it involves the computation of a representation of meaning, is using natural language
understanding techniques.

One very attractive domain for text-based research is story understanding. In this task the
system processes a story and then must answer questions about it. This is similar to the type of
reading comprehension tests used in schools and provides a very rich method for evaluating the
depth of understanding the system is able to achieve.

Dialogue-based applications involve human-machine communication. Most naturally this


involves spoken language, but it also includes interaction using keyboards. Typical potential
applications include

1. question-answering systems, where natural language is used to query a database (for


example, a query system to a personnel database)
2. automated customer service over the telephone (for example, to perform banking
transactions or order items from a catalogue)
3. tutoring systems, where the machine interacts with a student (for example, an
automated mathematics tutoring system)
4. spoken language control of a machine (for example, voice control of a VCR or computer)
5. general cooperative problem-solving systems (for example, a system that helps a person
plan and schedule freight shipments)
Some of the problems faced by dialogue systems are quite different from those in text-based
systems. First, the language used is very different, and the system needs to participate actively in
order to maintain a natural, smooth-flowing dialogue. Dialogue requires the use of
acknowledgments to verify that things are understood, and an ability to both recognize and
generate clarification sub-dialogues when something is not clearly understood. Even with these
differences, however, the basic processing techniques are fundamentally the same.
It is important to distinguish the problems of speech recognition from the problems of language
understanding. A speech recognition system need not involve any language understanding. For
instance, voice-controlled computers and VCRs are entering the market now. These do not involve
natural language understanding in any general way. Rather, the words recognized are used as
commands, much like the commands you send to a VCR using a remote control. Speech recognition
is concerned only with identifying the words spoken from a given speech signal, not with
understanding how words are used to communicate. To be an understanding system, the speech
recognizer would need to feed its input to a natural language understanding system, producing
what is often called a spoken language understanding system. The key characteristic of any
understanding system is that it represents the meaning of sentences in some representation
language that can be used later for further processing.

1.3 Evaluating Language Understanding Systems

As you can see, what counts as understanding might vary from application to application. If
this is so, how can you tell if a system works? One obvious way to evaluate a system is to run the
program and see how well it performs the task it was designed to do. If the program is meant to
answer questions about a database of facts, you might ask it questions to see how good it is at
producing the correct answers. If the system is designed to participate in simple conversations on a
certain topic, you might try conversing with it. This is called black box evaluation because it
evaluates system performance without looking inside to see how it works. While ultimately this
method of evaluation may be the best test of a system’s capabilities, it is problematic in the early
stages of research because early evaluation results can be misleading. Sometimes the techniques
that produce the best results in the short term will not lead to the best results in the long term. For
instance, if the overall performance of all known systems in a given application is uniformly low,
few conclusions can be drawn. The fact that one system was correct 50 percent of the time while
another was correct only 40 percent of the time says nothing about the long-term viability of either
approach. Only when the success rates become high, making a practical application feasible, can
much significance be given to overall system performance measures.

An alternative method of evaluation is to identify various subcomponents of a system and


then evaluate each one with appropriate tests. This is called glass box evaluation because you look
inside at the structure of the system. The problem with glass box evaluation is that it requires some
consensus on what the various components of a natural language system should be. Achieving such
a consensus is an area of considerable activity at the present.

Despite the difficulties, evaluations are of crucial importance to the field, as very little can be
concluded from examining a few examples that a particular system can handle. If you cannot
interact freely with the program yourself but must depend on transcripts of others’ interactions,
you can easily be misled into overestimating a program’s abilities. It is very hard for people not to
attribute much more intelligence to the program than is actually present.

Consider a concrete example — the ELIZA program that was developed in the mid-1960s at
MIT. This program must be one of the most popular AI programs of all time, and versions of it exist
for most machines, including most personal computers. ELIZA was never claimed to embody a
theory of language comprehension and production, but it serves as an excellent example as its
behavior initially seems impressive. The system plays the role of a therapist and, to obtain the best
results, the user should correspondingly play the role of a patient. Figure 1.2 presents a transcript
of this system in operation. Given this transcript, or even playing with the system yourself for a few
minutes, ELIZA's performance certainly seems impressive.

USER: Men are all alike.
ELIZA: In what way?
USER: They’re always bugging us about something or other.
ELIZA: Can you think of a specific example?
USER: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here.
USER: He says I’m depressed much of the time.
ELIZA: I am sorry to hear you are depressed.
USER: It’s true, I am unhappy.
ELIZA: Do you think coming here will help you not to be unhappy?
USER: I need some help; that much seems certain.
ELIZA: What would it mean to you if you got some help?
USER: Perhaps I could learn to get along with my mother.
ELIZA: Tell me more about your family.
USER: My mother takes care of me.
ELIZA: Who else in your family takes care of you?
USER: My father.
ELIZA: Your father.
USER: You are like my father in some ways.
ELIZA: What resemblance do you see?

Figure 1.2 A dialogue with ELIZA

Here is a simple description of how ELIZA works. There is a database of particular words
that are called keywords. For each keyword, the system stores an integer, a pattern to match
against the input, and a specification of the output. The algorithm is as follows: Given a sentence S,
find a keyword in S whose pattern matches S. If there is more than one keyword, pick the one with
the highest integer value. Use the output specification that is associated with this keyword to
generate the next sentence. If there are no keywords, generate an innocuous continuation
statement, such as "Tell me more" or "Go on".
Word | Rank | Pattern | Outputs
alike | 10 | ?X | In what way? / What resemblance do you see?
are | 3 | ?X are you ?Y | Would you prefer it if I weren't ?Y?
are | 3 | ?X are ?Y | What if they were not ?Y?
always | 5 | ?X | Can you think of a specific example? / When? / Really, always?
what | 2 | ?X | Why do you ask? / Does that interest you?

Figure 1.3 Sample data from ELIZA

Figure 1.3 shows a fragment of a database of keywords. In this database a pattern consists of
words and variables. The prefix ? before a letter indicates a variable, which can match any sequence
of words. For example, the pattern

?X are you ?Y
would match the sentence "Why are you looking at me?", where the variable ?X matches "Why" and
"?Y" matches "looking at me". The output specification may also use the same variables. In this case,
ELIZA inserts the words that match the variables in the input into the output after making some
minor changes in the pronouns (for example, replacing "me" with "you"). Thus, for the pattern
above, if the output specification is

Would you prefer it if I weren’t ?Y?

the rule would generate a response "Would you prefer it if I weren’t looking at you?" When the
database lists multiple output specifications for a given pattern, ELIZA selects a different one each
time a keyword rule is used, thereby preventing unnatural repetition in the conversation. Using
these rules, you can see how ELIZA produced the first two exchanges in the conversation in Figure
1.2. ELIZA generated the first response from the first output of the keyword "alike" and the second
response from the first output of the keyword "always". This description covers all of the essential
points of the program. You will probably agree that the program does not understand the
conversation it is participating in. Rather, it is a collection of tricks. Why then does ELIZA appear to
function so well? There are several reasons. Perhaps the most important reason is that, when
people hear or read a sequence of words that they understand as a sentence, they attribute
meaning to the sentence and assume that the person (or machine) that produced the sentence
actually intended that meaning. People are extremely good at distinguishing word meanings and
interpreting sentences to fit the context. Thus ELIZA appears to be intelligent because you use your
own intelligence to make sense of what it says.
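
The keyword mechanism just described is simple enough to sketch in Python. The rules below mirror Figure 1.3; the regular expressions standing in for the ?X/?Y patterns and the tiny pronoun table are simplifying assumptions, not Weizenbaum's original code.

import re

RULES = [  # (rank, pattern, outputs); the matching rule with the highest rank wins
    (10, re.compile(r".*\balike\b.*", re.I),
         ["In what way?", "What resemblance do you see?"]),
    (3,  re.compile(r".*\bare\b(?P<Y>.+)", re.I),
         ["What if they were not {Y}?"]),
    (0,  re.compile(r".*"),
         ["Tell me more.", "Go on."]),  # innocuous continuation statements
]
PRONOUNS = {"me": "you", "my": "your", "i": "you", "am": "are"}
used = {}  # cycles each rule's outputs to prevent unnatural repetition

def swap_pronouns(text):
    # Crude first/second-person swap applied to the echoed fragment.
    return " ".join(PRONOUNS.get(w.lower(), w) for w in text.split())

def respond(sentence):
    rank, pattern, outputs = max(
        (rule for rule in RULES if rule[1].match(sentence)),
        key=lambda rule: rule[0])
    count = used[rank] = used.get(rank, -1) + 1
    reply = outputs[count % len(outputs)]
    if "Y" in pattern.groupindex:  # fill in the ?Y variable, if the pattern has one
        fragment = pattern.match(sentence).group("Y").strip(" .?!")
        reply = reply.format(Y=swap_pronouns(fragment))
    return reply

print(respond("Men are all alike."))       # -> In what way?
print(respond("You are like my father."))  # -> What if they were not like your father?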

Other crucial characteristics of the conversational setting also aid in sustaining the illusion
of intelligence. For instance, the system does not need any world knowledge because it never has to
make a claim, support an argument, or answer a question. Rather, it simply asks a series of
questions. Except in a patient-therapist situation, this would be unacceptable. ELIZA evades all
direct questions by responding with another question, such as "Why do you ask?" There is no way to
force the program to say something concrete about any topic.

Even in such a restricted situation, however, it is relatively easy to demonstrate that the
program does not understand. It sometimes produces completely off-the-wall responses. For
instance, if you say "Necessity is the mother of invention", it might respond with "Tell me more about
your family", based on its pattern for the word "mother". In addition, since ELIZA has no knowledge
about the structure of language, it accepts gibberish just as readily as valid sentences. If you enter
"Green the adzabak are the a ran four", ELIZA will respond with something like "What if they were
not the a ran four?" Also, as a conversation progresses, it becomes obvious that the program does
not retain any of the content in the conversation. It begins to ask questions that are inappropriate
in light of earlier exchanges, and its responses in general begin to show a lack of focus. Of course, if
you are not able to play with the program and must depend only on transcripts of conversations by
others, you would have no way of detecting these flaws, unless they are explicitly mentioned.

Suppose you need to build a natural language program for a certain application in only six
months. If you start to construct a general model of language understanding, it will not be
completed in that time frame and so will perform miserably on the tests. An ELIZA-like system,
however, could easily produce behavior like that previously discussed with less than a few months
of programming and will appear to far outperform the other system in testing. The differences will
be especially marked if the test data only includes typical domain interactions that are not designed
to test the limits of the system. Thus, if we take short-term performance as our only criterion of
progress, everyone will build and fine-tune ELIZA-style systems, and the field will not progress past
the limitations of the simple approach.
To avoid this problem, either we have to accept certain theoretical assumptions about the
architecture of natural language systems and develop specific evaluation measures for different
components, or we have to discount overall evaluation results until some reasonably high level of
performance is obtained. Only then will cross-system comparisons begin to reflect the potential for
long-term success in the field.

1.4 The Different Levels of Language Analysis

A natural language system must use considerable knowledge about the structure of the language
itself, including what the words are, how words combine to form sentences, what the words mean,
how word meanings contribute to sentence meanings, and so on. However, we cannot completely
account for linguistic behavior without also taking into account another aspect of what makes
humans intelligent — their general world knowledge and their reasoning abilities. For example, to
answer questions or to participate in a conversation, a person not only must know a lot about the
structure of the language being used, but also must know about the world in general and the
conversational setting in particular.

The following are some of the different forms of knowledge relevant for natural language
understanding:

1. Phonetic and phonological knowledge - concerns how words are related to the sounds
that realize them. Such knowledge is crucial for speech-based systems.
2. Morphological knowledge - concerns how words are constructed from more basic
meaning units called morphemes. A morpheme is the primitive unit of meaning in a
language (for example, the meaning of the word "friendly" is derivable from the meaning of
the noun "friend" and the suffix "-ly", which transforms a noun into an adjective).
3. Syntactic knowledge - concerns how words can be put together to form correct sentences
and determines what structural role each word plays in the sentence and what phrases are
subparts of what other phrases.
4. Semantic knowledge - concerns what words mean and how these meanings combine in
sentences to form sentence meanings. This is the study of context-independent meaning -
the meaning a sentence has regardless of the context in which it is used.
5. Pragmatic knowledge - concerns how sentences are used in different situations and how
use affects the interpretation of the sentence.
6. Discourse knowledge - concerns how the immediately preceding sentences affect the
interpretation of the next sentence. This information is especially important for interpreting
pronouns and for interpreting the temporal aspects of the information conveyed.
7. World knowledge - includes the general knowledge about the structure of the world that
language users must have in order to, for example, maintain a conversation. It includes what
each language user must know about the other user’s beliefs and goals.
BOX 1.2 Syntax, Semantics, and Pragmatics

The following examples may help you understand the distinction between syntax, semantics, and
pragmatics. Consider each example as a candidate for the initial sentence of this book, which you
know discusses natural language processing:

1. Language is one of the fundamental aspects of human behavior and is a crucial component
of our lives.
2. Green frogs have large noses.
3. Green ideas have large noses.
4. Large have green ideas nose.

Sentence 1 appears to be a reasonable start (I hope!). It agrees with all that is known about
syntax, semantics, and pragmatics. Each of the other sentences violates one or more of these
levels. Sentence 2 is well-formed syntactically and semantically, but not pragmatically. It fares
poorly as the first sentence of the book because the reader would find no reason for using it. But
however bad sentence 2 would be as a start, sentence 3 is much worse. Not only is it obviously
pragmatically ill-formed, it is also semantically ill-formed. To see this, consider that you and I
could argue about whether sentence 2 is true or not, but we cannot do so with sentence 3. I
cannot affirm or deny sentence 3 in coherent conversation. However, the sentence does have
some structure, for we can discuss what is wrong with it: Ideas cannot be green and, even if they
could, they certainly cannot have large noses. Sentence 4 is even worse. In fact, it is unintelligible,
even though it contains the same words as sentence 3. It does not even have enough structure to
allow you to say what is wrong with it. Thus it is syntactically ill-formed. Incidentally, there are
cases in which a sentence may be pragmatically well-formed but not syntactically well-formed.
For example, if I ask you where you are going and you reply "I go store", the response would be
understandable even though it is syntactically ill-formed. Thus it is at least pragmatically well-
formed and may even be semantically well-formed.

1.5 Representations and Understanding

A crucial component of understanding involves computing a representation of the meaning


of sentences and texts. Without defining the notion of representation, however, this assertion has
little content. For instance, why not simply use the sentence itself as a representation of its
meaning? One reason is that most words have multiple meanings, which we will call senses. The
word "cook", for example, has a sense as a verb and a sense as a noun; "dish" has multiple senses as
a noun as well as a sense as a verb; and "still" has senses as a noun, verb, adjective, and adverb. This
ambiguity would inhibit the system from making the appropriate inferences needed to model
understanding. The disambiguation problem appears much easier than it actually is because people
do not generally notice ambiguity. While a person does not seem to consider each of the possible
senses of a word when understanding a sentence, a program must explicitly consider them one by
one.
To represent meaning, we must have a more precise language. The tools to do this come from
mathematics and logic and involve the use of formally specified representation languages. Formal
languages are specified from very simple building blocks. The most fundamental is the notion of an
atomic symbol which is distinguishable from any other atomic symbol simply based on how it is
written. Useful representation languages have the following two properties:
• The representation must be precise and unambiguous. You should be able to express every
distinct reading of a sentence as a distinct formula in the representation.
• The representation should capture the intuitive structure of the natural language
sentences that it represents. For example, sentences that appear to be structurally similar
should have similar structural representations, and the meanings of two sentences that are
paraphrases of each other should be closely related to each other.

Several different representations will be used that correspond to some of the levels of analysis
discussed in the last section. In particular, we will develop formal languages for expressing
syntactic structure, for context-independent word and sentence meanings, and for expressing
general world knowledge.

Syntax: Representing Sentence Structure

The syntactic structure of a sentence indicates the way that words in the sentence are related to
each other. This structure indicates how the words are grouped together into phrases, what words
modify what other words, and what words are of central importance in the sentence. In addition,
this structure may identify the types of relationships that exist between phrases and can store
other information about the particular sentence structure that may be needed for later processing.
For example, consider the following sentences:

1. John sold the book to Mary.


2. The book was sold to Mary by John.

These sentences share certain structural properties. In each, the noun phrases are "John", "Mary",
and "the book", and the act described is some selling action. In other respects, these sentences are
significantly different. For instance, even though both sentences are always either true or false in
the exact same situations, you could only give sentence 1 as an answer to the question "What did
John do for Mary?"
Sentence 2 is a much better continuation of a sentence beginning with the phrase "After it fell in
the river", as sentences 3 and 4 show. Following the standard convention in linguistics, this book
will use an asterisk (*) before any example of an ill-formed or questionable sentence.

3. *After it fell in the river, John sold Mary the book.


4. After it fell in the river, the book was sold to Mary by John.

Many other structural properties can be revealed by considering sentences that are not well-
formed.
Sentence 5 is ill-formed because the subject and the verb do not agree in number (the subject is singular and the verb is plural), while sentence 6 is ill-formed because the verb put requires a complement that describes where John put the object.

5. *John are in the corner.


6. *John put the book.

Making judgments on grammaticality is not a goal in natural language understanding. In fact, a robust system should be able to understand ill-formed sentences whenever possible. This might suggest that agreement checks can be ignored, but this is not so. Agreement checks are essential for eliminating potential ambiguities. Consider sentences 7 and 8, which are identical except for the number feature of the main verb, yet represent two quite distinct interpretations.

7. Flying planes are dangerous. (The planes are dangerous.)
8. Flying planes is dangerous. (Flying is dangerous.)

If subject-verb agreement were not checked, each of these sentences would be ambiguous between the two readings.

Figure 1.4 Two structural representations of "Rice flies like sand".

Most syntactic representations of language are based on the notion of context-free grammars, which represent sentence structure in terms of what phrases are subparts of other phrases. This information is often presented in a tree form, such as the one shown in Figure 1.4, which shows two different structures for the sentence "Rice flies like sand". In the first reading, the sentence is formed from a noun phrase (NP) describing a type of fly, rice flies, and a verb phrase (VP) that asserts that these flies like sand. In the second structure, the sentence is formed from a noun phrase describing a type of substance, rice, and a verb phrase stating that this substance flies like sand (say, if you throw it). The two structures also give further details on the structure of the noun phrase and verb phrase and identify the part of speech for each word. In particular, the word "like" is a verb (V) in the first reading and a preposition (P) in the second.

The Logical Form

The structure of a sentence does not by itself determine its meaning, however. For example, the NP "the
catch" can have different meanings depending on whether the speaker is talking about a baseball
game or a fishing expedition. Both these interpretations have the same syntactic structure, and the
different meanings arise from an ambiguity concerning the sense of the word "catch". Once the
correct sense is identified, say the fishing sense, there still is a problem in determining what fish
are being referred to. The intended meaning of a sentence depends on the situation in which the
sentence is produced. Rather than combining all these problems, this book will consider each one
separately. The division is between context-independent meaning and context-dependent meaning.
The fact that "catch" may refer to a baseball move or the results of a fishing expedition is
knowledge about English and is independent of the situation in which the word is used. On the
other hand, the fact that a particular noun phrase "the catch" refers to what Jack caught when
fishing yesterday is contextually dependent. The representation of the context-independent
meaning of a sentence is called its logical form.

The logical form encodes possible word senses and identifies the semantic relationships
between the words and phrases. Many of these relationships are often captured using an abstract
set of semantic relationships between the verb and its NPs. In particular, in both sentences 1 and 2
previously given, the action described is a selling event, where "John" is the seller, "the book" is the
object being sold, and "Mary" is the buyer. These roles are instances of the abstract semantic roles
AGENT, THEME, and TO-POSS (for final possessor), respectively.
Once the semantic relationships are determined, some word senses may be impossible and thus
eliminated from consideration. Consider the sentence

9. Jack invited Mary to the Halloween ball.

The word "ball", which by itself is ambiguous between the plaything that bounces and the formal
dance event, can only take the latter sense in sentence 9, because the verb "invite" only makes
sense with this interpretation. One of the key tasks in semantic interpretation is to consider what
combinations of the individual word meanings can combine to create coherent sentence meanings.
Exploiting such interconnections between word meanings can greatly reduce the number of
possible word senses for each word in a given sentence.
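A logical form can be sketched as a structure that records the sense of the predicate and its role fillers. The representation below is invented for illustration; both sentences 1 and 2 would map to the same structure, and a toy constraint shows how the verb invite rules out the plaything sense of ball:

    # One logical-form-like structure shared by sentences 1 and 2:
    sell_event = {"predicate": "SELL1",
                  "AGENT": "John", "THEME": "the book", "TO-POSS": "Mary"}

    # A toy selectional constraint: "invite ... to X" requires X to denote an event.
    BALL_SENSES = {"BALL1": "round plaything", "BALL2": "formal dance event"}

    def senses_compatible_with_invite(senses):
        return [s for s, gloss in senses.items() if "event" in gloss]

    print(senses_compatible_with_invite(BALL_SENSES))   # ['BALL2']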

The Final Meaning Representation

The final representation needed is a general knowledge representation (KR), which the
system uses to represent and reason about its application domain. This is the language in which all
the specific knowledge based on the application is represented. The goal of contextual
interpretation is to take a representation of the structure of a sentence and its logical form, and to
map this into some expression in the KR that allows the system to perform the appropriate task in
the domain. In a question-answering application, a question might map to a database query; in a
story-understanding application, a sentence might map into a set of expressions that represent the
situation that the sentence describes.

For the most part, we will assume that the first-order predicate calculus (FOPC) is the final representation language, because it is relatively well known, well studied, and precisely defined.
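As an illustration, sentences 1 and 2 might both receive a FOPC meaning representation along the following lines (the event variable and the predicate names are one plausible rendering, not a fixed standard):

    \exists e\; (Sell(e) \wedge Agent(e, John) \wedge Theme(e, Book1) \wedge ToPoss(e, Mary))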

1.6 The Organization of Natural Language Understanding Systems

This book is organized around the three levels of representation just discussed: syntactic structure,
logical form, and the final meaning representation. Separating the problems in this way will allow
you to study each problem in depth without worrying about other complications. Actual systems
are usually organized slightly differently, however. In particular, Figure 1.5 shows the organization
that this book assumes.
Figure 1.5 The flow of information

As you can see, there are interpretation processes that map from one representation to the
other. For instance, the process that maps a sentence to its syntactic structure and logical form is
called the parser. It uses knowledge about words and word meanings (the lexicon) and a set of rules
defining the legal structures (the grammar) in order to assign a syntactic structure and a logical
form to an input sentence. An alternative organization could perform syntactic processing first and
then perform semantic interpretation on the resulting structures. Combining the two, however, has
considerable advantages because it leads to a reduction in the number of possible interpretations,
since every proposed interpretation must simultaneously be syntactically and semantically well
formed. For example, consider the following two sentences:
10. Visiting relatives can be trying.
11. Visiting museums can be trying.
These two sentences have identical syntactic structure, so both are syntactically ambiguous. In
sentence 10, the subject might be relatives who are visiting you or the event of you visiting
relatives. Both of these alternatives are semantically valid, and you would need to determine the
appropriate sense by using the contextual mechanism. However, sentence 11 has only one possible
semantic interpretation, since museums are not objects that can visit other people; rather they
must be visited. In a system with separate syntactic and semantic processing, there would be two
syntactic interpretations of sentence 11, one of which the semantic interpreter would eliminate
later. If syntactic and semantic processing are combined, however, the system will be able to detect
the semantic anomaly as soon as it interprets the phrase "visiting museums", and thus will never
build the incorrect syntactic structure in the first place. While the savings here seem small, in a
realistic application a reasonable sentence may have hundreds of possible syntactic structures,
many of which are semantically anomalous.
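The pruning effect can be mimicked with a toy semantic check (the animacy test below is invented for illustration): the gerund reading is always available, but the "X who are visiting" reading requires an animate subject:

    def subject_readings(noun, animate):
        readings = ["the event of visiting " + noun]     # gerund reading, always possible
        if animate:                                      # "X who are visiting" needs animate X
            readings.append(noun + " who are visiting")
        return readings

    print(subject_readings("relatives", animate=True))   # two readings survive
    print(subject_readings("museums", animate=False))    # only one reading survives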

Continuing through Figure 1.5, the process that transforms the syntactic structure and
logical form into a final meaning representation is called contextual processing. This process
includes issues such as identifying the objects referred to by noun phrases such as definite
descriptions (for example, "the man") and pronouns, the analysis of the temporal aspects of the
new information conveyed by the sentence, the identification of the speaker’s intention (for
example, whether "Can you lift that rock" is a yes/no question or a request), as well as all the
inferential processing required to interpret the sentence appropriately within the application
domain. It uses knowledge of the discourse context (determined by the sentences that preceded the
current one) and knowledge of the application to produce a final representation.
The system would then perform whatever reasoning tasks are appropriate for the application.
When this requires a response to the user, the meaning that must be expressed is passed to the
generation component of the system. It uses knowledge of the discourse context, plus information
on the grammar and lexicon, to plan the form of an utterance, which then is mapped into words by
a realization process. Of course, if this were a spoken language application, the words would not be
the final input and output, but rather would be the output of a speech recognizer and the input to a
speech synthesizer, as appropriate. While this text focuses primarily on language understanding, notice that the same levels of knowledge are used for the generation task as well. For instance,
knowledge of syntactic structure is encoded in the grammar. This grammar can be used either to
identify the structure of a given sentence or to realize a structure as a sequence of words. A
grammar that supports both processes is called a bidirectional grammar. While most researchers
agree that bidirectional grammars are the preferred model, in actual practice grammars are often
tailored for the understanding task or the generation task. This occurs because different issues are
important for each task, and generally any given researcher focuses just on the problems related to
their specific task. But even when the actual grammars differ between understanding and
generation, the grammatical formalisms used remain the same.

Linguistic Background: An Outline of English Syntax

2.1 Words
At first glance the most basic unit of linguistic structure appears to be the word. The word,
though, is far from the fundamental element of study in linguistics; it is itself built from a complex set of more primitive parts. The study of morphology concerns the construction of words
from more basic components corresponding roughly to meaning units. There are two basic ways
that new words are formed, traditionally classified as inflectional forms and derivational forms.

Inflectional forms use a root form of a word and typically add a suffix so that the word
appears in the appropriate form given the sentence. Verbs are the best examples of this in English.
Each verb has a basic form that then is typically changed depending on the subject and the tense of
the sentence. For example, the verb sigh will take suffixes such as -s, -ing, and -ed to create the verb
forms sighs, sighing, and sighed, respectively. These new words are all verbs and share the same
basic meaning.

Derivational morphology involves the derivation of new words from other forms. The new
words may be in completely different categories from their subparts. For example, the noun friend
is made into the adjective friendly by adding the suffix - ly. A more complex derivation would allow
you to derive the noun friendliness from the adjective form. There are many interesting issues
concerned with how words are derived and how the choice of word form is affected by the
syntactic structure of the sentence that constrains it.
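A toy sketch of both processes (the suffix rules below handle only the regular examples in the text, not English morphology in general):

    def inflect(verb_root):
        # inflectional suffixes: sigh -> sighs, sighing, sighed
        return {"3sg": verb_root + "s",
                "present participle": verb_root + "ing",
                "past": verb_root + "ed"}

    def noun_to_adjective(noun):
        # derivation: friend -> friendly
        return noun + "ly"

    def adjective_to_noun(adjective):
        # derivation: friendly -> friendliness (final y becomes i before -ness)
        return adjective[:-1] + "iness" if adjective.endswith("y") else adjective + "ness"

    print(inflect("sigh"))
    print(adjective_to_noun(noun_to_adjective("friend")))   # friendliness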

Traditionally, linguists classify words into different categories based on their uses. Two
related areas of evidence are used to divide words into categories. The first area concerns the
word’s contribution to the meaning of the phrase that contains it, and the second area concerns the
actual syntactic structures in which the word may play a role. For example, you might posit the class noun as those words that can be used to identify the basic type of object, concept, or place being discussed, and adjective as those words that further qualify the object, concept, or place. Thus
green would be an adjective and book a noun, as shown in the phrases the green book and green
books. But things are not so simple: green might play the role of a noun, as in That green is lighter
than the other, and book might play the role of a modifier, as in the book worm. In fact, most nouns
seem to be able to be used as a modifier in some situations. Perhaps the classes should be
combined, since they overlap a great deal. But other forms of evidence exist. Consider what words
could complete the sentence it’s so... You might say it’s so green, it’s so hot, it’s so true, and so on.
Note that although book can be a modifier in the book worm, you cannot say *it’s so book about
anything. Thus there are two classes of modifiers: adjective modifiers and noun modifiers.

Consider again the case where adjectives can be used as nouns, as in the green. Not all
adjectives can be used in such a way. For example, the noun phrase the hot can be used, given a
context where there are hot and cold plates, in a sentence such as The hot are on the table. But this
refers to the hot plates; it cannot refer to hotness in the way the phrase the green refers to green.
With this evidence you could subdivide adjectives into two subclasses—those that can also be used
to describe a concept or quality directly, and those that cannot. Alternatively, however, you can
simply say that green is ambiguous between being an adjective or a noun and, therefore, falls in
both classes. Since green can behave like any other noun, the second solution seems the most
direct.
Using similar arguments, we can identify four main classes of words in English that
contribute to the meaning of sentences. These classes are nouns, adjectives, verbs, and adverbs.
Sentences are built out of phrases centered on these four word classes. Of course, there are many
other classes of words that are necessary to form sentences, such as articles, pronouns,
prepositions, particles, quantifiers, conjunctions, and so on. But these classes are fixed in the sense
that new words in these classes are rarely introduced into the language. New nouns, verbs,
adjectives and adverbs, on the other hand, are regularly introduced into the language as it evolves.
As a result, these classes are called the open class words, and the others are called the closed class
words.
A word in any of the four open classes may be used to form the basis of a phrase. This word
is called the head of the phrase and indicates the type of thing, activity, or quality that the phrase
describes. For example, with noun phrases, the head word indicates the general classes of objects
being described. The phrases
the dog
the mangy dog
the mangy dog at the pound
are all noun phrases that describe an object in the class of dogs. The first describes a member from
the class of all dogs, the second an object from the class of mangy dogs, and the third an object from
the class of mangy dogs that are at the pound. The word dog is the head of each of these phrases.

Similarly, the adjective phrases


hungry
very hungry
hungry as a horse
all describe the quality of hunger. In each case the word hungry is the head.

In some cases a phrase may consist of a single head. For example, the word sand can be a
noun phrase, hungry can be an adjective phrase, and walked can be a verb phrase. In many other
cases the head requires additional phrases following it to express the desired meaning. For
example, the verb put cannot form a verb phrase in isolation; thus the following words do not form
a meaningful sentence:
*Jack put.
To be meaningful, the verb put must be followed by a noun phrase and a phrase describing a
location, as in the verb phrase put the dog in the house. The phrase or set of phrases needed to
complete the meaning of such a head is called the complement of the head. In the preceding phrase
put is the head and the dog in the house is the complement. Heads of all the major classes may require complements. Figure 2.1 gives some examples of phrases; in each entry the head is separated from its complement by a plus sign.

Noun Phrases: The president + of the company; His desire + to succeed; Several challenges + from the opposing team
Verb Phrases: looked + up the chimney; believed + that the world was flat; ate + the pizza
Adjective Phrases: easy + to assemble; happy + that he'd won the prize; angry + as a hippo
Adverbial Phrases: rapidly + like a bat; intermittently + throughout the day; inside + the house

Figure 2.1 Examples of heads and complements

2.2 The Elements of Simple Noun Phrases


Noun phrases (NPs) are used to refer to things: objects, places, concepts, events, qualities,
and so on. The simplest NP consists of a single pronoun: he, she, they, you, me, it, I, and so on.
Pronouns can refer to physical objects, as in the sentence
It hid under the rug.
to events, as in the sentence
Once I opened the door, I regretted it for months.
and to qualities, as in the sentence
He was so angry, but he didn’t show it.
Pronouns do not take any modifiers except in rare forms, as in the sentence
"He who hesitates is lost".
Another basic form of noun phrase consists of a name or proper noun, such as John or
Rochester. These nouns appear in capitalized form in carefully written English. Names may also
consist of multiple words, as in the New York Times and Stratford-on-Avon.
Excluding pronouns and proper names, the head of a noun phrase is usually a common
noun. Nouns divide into two main classes:
count nouns — nouns that describe specific objects or sets of objects.
mass nouns — nouns that describe composites or substances.

Count nouns acquired their name because they can be counted. There may be one dog or
many dogs, one book or several books, one crowd or several crowds. If a single count noun is used to
describe a whole class of objects, it must be in its plural form. Thus you can say Dogs are friendly
but not *Dog is friendly.

Mass nouns cannot be counted. There may be some water, some wheat, or some sand. If you
try to count with a mass noun, you change the meaning. For example, some wheat refers to a
portion of some quantity of wheat, whereas one wheat is a single type of wheat rather than a single
grain of wheat. A mass noun can be used to describe a whole class of material without using a
plural form. Thus you say Water is necessary for life, not *Waters are necessary for life.

In addition to a head, a noun phrase may contain specifiers and qualifiers preceding the
head. The qualifiers further describe the general class of objects identified by the head, while the
specifiers indicate how many such objects are being described, as well as how the objects being
described relate to the speaker and hearer. Specifiers are constructed out of ordinals (such as first
and second), cardinals (such as one and two), and determiners. Determiners can be sub-divided
into the following general classes:
articles — the words the, a, and an.
demonstratives — words such as this, that, these, and those.
possessives — noun phrases followed by the suffix ‘s, such as
John’s and the fat man’s, as well as possessive pronouns, such as her, my, and whose.
wh-determiners — words used in questions, such as which and what.
quantifying determiners — words such as some, every, most, no, any, both, and half

Number     First Person   Second Person   Third Person
singular   I              you             he (masculine), she (feminine), it (neuter)
plural     we             you             they

Figure 2.2 Pronoun system (as subject)

Number     First Person   Second Person   Third Person
singular   my             your            his, her, its
plural     our            your            their

Figure 2.3 Pronoun system (possessives)

A simple noun phrase may have at most one determiner, one ordinal, and one cardinal. It is
possible to have all three, as in the first three contestants. An exception to this rule exists with a few
quantifying determiners such as many, few, several, and little. These words can be preceded by an
article, yielding noun phrases such as the few songs we knew. Using this evidence, you could
subcategorize the quantifying determiners into those that allow this and those that don’t, but the
present coarse categorization is fine for our purposes at this time.
The qualifiers in a noun phrase occur after the specifiers (if any) and before the head. They
consist of adjectives and nouns being used as modifiers. The following are more precise definitions:
adjectives - words that attribute qualities to objects yet do not refer to the qualities
themselves (for example, angry is an adjective that attributes the quality of anger to
something).
noun modifiers - mass or count nouns used to modify another noun,
as in the cook book or the ceiling paint can.

Before moving on to other structures, consider the different inflectional forms that nouns
take and how they are realized in English. Two forms of nouns - the singular and plural forms -
have already been mentioned. Pronouns take forms based on person (first, second, and third) and
gender (masculine, feminine, and neuter). Each of these distinctions reflects a systematic analysis
that is almost wholly explicit in some languages, such as Latin, while implicit in others. In French,
for example, nouns are classified by their gender. In English many of these distinctions are not
explicitly marked except in a few cases. The pronouns provide the best example of this. They
distinguish number, person, gender, and case (that is, whether they are used as possessive, subject,
or object), as shown in Figures 2.2 through 2.4.
Number     First Person   Second Person   Third Person
singular   me             you             him, her, it
plural     us             you             them

Figure 2.4 Pronoun system (as object)
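These paradigms are naturally encoded as a lookup table keyed by grammatical features; a sketch covering a few cells of Figures 2.2 through 2.4 (the feature names are chosen for illustration):

    # (person, number, case) -> form; third person singular lists masculine,
    # feminine, and neuter forms
    PRONOUNS = {
        ("1", "singular", "subject"): "I",
        ("1", "singular", "object"): "me",
        ("1", "singular", "possessive"): "my",
        ("3", "singular", "subject"): ("he", "she", "it"),
        ("3", "plural", "object"): "them",
        # ... remaining cells follow the same pattern
    }

    print(PRONOUNS[("3", "plural", "object")])   # them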

2.3 Verb Phrases and Simple Sentences

While an NP is used to refer to things, a sentence (S) is used to assert, query, or command.
You may assert that some sentence is true, ask whether a sentence is true, or command someone to
do something described in the sentence. The way a sentence is used is called its mood. Figure 2.5
shows four basic sentence moods.

Mood                         Example
declarative (or assertion)   The cat is sleeping.
yes/no question              Is the cat sleeping?
wh-question                  What is sleeping? / Which cat is sleeping?
imperative (or command)      Shoot the cat!

Figure 2.5 Basic moods of sentences

A simple declarative sentence consists of an NP, the subject, followed by a verb phrase (VP),
the predicate. A simple VP may consist of some adverbial modifiers followed by the head verb and
its complements. Every verb must appear in one of the five possible forms shown in Figure 2.6.
Form                 Examples                        Example Uses
base                 hit, cry, go, be                Hit the ball! / I want to go.
simple present       hit, cries, go, am              The dog cries every day. / I am thirsty.
simple past          hit, cried, went, was           I was thirsty. / I went to the store.
present participle   hitting, crying, going, being   I'm going to the store. / Being the last in line aggravates me.
past participle      hit, cried, gone, been          I've been there before. / The cake was gone.

Figure 2.6 The five verb forms

Tense                          The Verb Sequence                             Example
simple present                 simple present                                He walks to the store.
simple past                    simple past                                   He walked to the store.
simple future                  will + infinitive                             He will walk to the store.
present perfect                have in present + past participle             He has walked to the store.
future perfect                 will + have in infinitive + past participle   I will have walked to the store.
past perfect (or pluperfect)   have in past + past participle                I had walked to the store.

Figure 2.7 The basic tenses

Tense                         Structure                                                                Example
present progressive           be in present + present participle                                       He is walking.
past progressive              be in past + present participle                                          He was walking.
future progressive            will + be in infinitive + present participle                             He will be walking.
present perfect progressive   have in present + be as past participle + present participle             He has been walking.
future perfect progressive    will + have in infinitive + be as past participle + present participle   He will have been walking.
past perfect progressive      have in past + be as past participle + present participle                He had been walking.

Figure 2.8 The progressive tenses
Verbs can be divided into several different classes: the auxiliary verbs, such as be, do, and
have; the modal verbs, such as will, can, and could; and the main verbs, such as eat, run, and believe.
The auxiliary and modal verbs usually take a verb phrase as a complement, which produces a
sequence of verbs, each the head of its own verb phrase. These sequences are used to form
sentences with different tenses.
The tense system identifies when the proposition described in the sentence is said to be
true. The tense system is complex; only the basic forms are outlined in Figure 2.7. In addition, verbs
may be in the progressive tense. Corresponding to the tenses listed in Figure 2.7 are the
progressive tenses shown in Figure 2.8. Each progressive tense is formed by the normal tense
construction of the verb be followed by a present participle.
           First     Second     Third
singular   I am      you are    he is
           I walk    you walk   she walks
plural     we are    you are    they are
           we walk   you walk   they walk

Figure 2.9 Person/number forms of verbs
Verb groups also encode person and number information in the first word in the verb group.
The person and number must agree with the noun phrase that is the subject of the verb phrase.
Some verbs distinguish nearly all the possibilities, but most verbs distinguish only the third person
singular (by adding an –s suffix). Some examples are shown in Figure 2.9.
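A minimal agreement check in the spirit of Figure 2.9 (the tiny lexicon is invented; real grammars encode these features in the lexicon and enforce them during parsing):

    # form of the verb demanded by each (root, person, number) combination
    VERB_FORMS = {
        ("walk", "3", "singular"): "walks",
        ("be", "1", "singular"): "am",
        ("be", "3", "singular"): "is",
        ("be", "3", "plural"): "are",
    }

    def agrees(root, person, number, observed_form):
        expected = VERB_FORMS.get((root, person, number), root)  # default: bare root
        return observed_form == expected

    print(agrees("walk", "3", "singular", "walks"))   # True:  "she walks"
    print(agrees("walk", "3", "singular", "walk"))    # False: *"she walk"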

Transitivity and Passives

The last verb in a verb sequence is called the main verb, and is drawn from the open class of verbs.
Depending on the verb, a wide variety of complement structures are allowed. For example, certain
verbs may stand alone with no complement. These are called intransitive verbs and include
examples such as laugh (for example, Jack laughed) and run (for example, He will have been
running). Another common complement form requires a noun phrase to follow the verb. These are
called transitive verbs and include verbs such as find (for example, Jack found a key). Notice that
find cannot be intransitive (for example, *Jack found is not a reasonable sentence), whereas laugh
cannot be transitive (for example, *Jack laughed a key is not a reasonable sentence). A verb like run,
on the other hand, can be transitive or intransitive, but the meaning of the verb is different in each
case (for example, Jack ran vs. Jack ran the machine).
Transitive verbs allow another form of verb group called the passive form, which is
constructed using a be auxiliary followed by the past participle. In the passive form the noun
phrase that would usually be in the object position is used in the subject position, as can be seen by
the examples in Figure 2.10. Note that tense is still carried by the initial verb in the verb group.
Also, even though the first noun phrase semantically seems to be the object of the verb in passive
sentences, it is syntactically the subject. This can be seen by checking the pronoun forms. For
example, I was hit is correct, not *Me was hit. Furthermore, the tense and number agreement is
between the verb and the syntactic subject. Thus you say I was hit by them, not *I were hit by them.
Some verbs allow two noun phrases to follow them in a sentence; for example, Jack gave Sue a book
or Jack found me a key. In such sentences the second NP corresponds to the object NP outlined
earlier and is sometimes called the direct object. The other NP is called the indirect object.
Generally, such sentences have an equivalent sentence where the indirect object appears with a
preposition, as in Jack gave a book to Sue or Jack found a key for me.

Active Sentence         Related Passive Sentence
Jack saw the ball.      The ball was seen by Jack.
I will find the clue.   The clue will be found by me.
Jack hit me.            I was hit by Jack.

Figure 2.10 Active sentences with corresponding passive sentences

Particles
Some verb forms are constructed from a verb and an additional word called a particle.
Particles generally overlap with the class of prepositions considered in the next section. Some
examples are up, out, over, and in. With verbs such as look, take, or put, you can construct many
different verbs by combining the verb with a particle (for example, look up, look out, look over, and
so on). In some sentences the difference between a particle and a preposition results in two
different readings for the same sentence. For example, look over the paper would mean reading the
paper, if you consider over a particle (the verb is look over). In contrast, the same sentence would
mean looking at something else behind or above the paper, if you consider over a preposition (the
verb is look).
You can make a sharp distinction between particles and prepositions when the object of the
verb is a pronoun. With a verb-particle sentence, the pronoun must precede the particle, as in I
looked it up. With the prepositional reading, the pronoun follows the preposition, as in I looked up
it. Particles also may follow the object NP. Thus you can say I gave up the game to Mary or I gave the
game up to Mary. This is not allowed with prepositions; for example, you cannot say *I climbed the
ladder up.

Clausal Complements
Many verbs allow clauses as complements. Clauses share most of the same properties of
sentences and may have a subject, indicate tense, and occur in passivized forms. One common
clause form consists of a sentence form preceded by the complementizer that, as in that Jack ate the
pizza. This clause will be identified by the expression S[that], indicating a special subclass of S
structures. This clause may appear as the complement of the verb know, as in Sam knows that Jack
ate the pizza. The passive is possible, as in Sam knows that the pizza was eaten by Jack.

Another clause type involves the infinitive form of the verb. The VP[inf] clause is simply a VP
starting in the infinitive form, as in the complement of the verb wish in Jack wishes to eat the pizza.
An infinitive sentence S[inf] form is also possible where the subject is indicated by a for phrase, as
in Jack wishes for Sam to eat the pizza.

Another important class of clauses consists of sentences with complementizers that are wh-words,
such as who, what, where, why, whether, and how many. These question clauses, S[WH], can be used
as a complement of verbs such as know, as in Sam knows whether we went to the party and The
police know who committed the crime.

Prepositional Phrase Complements

Many verbs require complements that involve a specific prepositional phrase (PP). The verb give
takes a complement consisting of an NP and a PP with the preposition to, as in Jack gave the book to
the library. No other preposition can be used. Consider
*Jack gave the book from the library. (OK only if from the library modifies book.)
In contrast, a verb like put can take any PP that describes a location, as in
Jack put the book in the box.
Jack put the book inside the box.
Jack put the book by the door.
To account for this, we allow complement specifications that indicate prepositional phrases with
particular prepositions. Thus the verb give would have a complement of the form NP+PP[to].
Similarly the verb decide would have a complement form NP+PP[about], and the verb blame would
have a complement form NP+PP[on], as in Jack blamed the accident on the police.

Verbs such as put, which take any phrase that can describe a location (complement
NP+Location), are also common in English. While locations are typically prepositional phrases, they
can also be noun phrases, such as home, or particles, such as back or here. A distinction can be
made between phrases that describe locations and phrases that describe a path of motion, although
many location phrases can be interpreted either way. The distinction can be made in some cases,
though. For instance, prepositional phrases beginning with "to" generally indicate a path of motion.
Thus they cannot be used with a verb such as "put" that requires a location (for example, *I put the
ball to the box). This distinction will be explored further in Chapter 4.
Figure 2.11 summarizes many of the verb complement structures found in English. A full list would
contain over 40 different forms. Note that while the examples typically use a different verb for each
form, most verbs will allow several different complement structures.

Verb     Complement Structure     Example
laugh    Empty (intransitive)     Jack laughed.
find     NP (transitive)          Jack found a key.
give     NP+NP (bitransitive)     Jack gave Sue the paper.
give     NP+PP[to]                Jack gave the book to the library.
reside   Location phrase          Jack resides in Rochester.
put      NP+Location phrase       Jack put the book inside.
speak    PP[with]+PP[about]       Jack spoke with Sue about the book.
try      VP[to]                   Jack tried to apologize.
tell     NP+VP[to]                Jack told the man to go.
wish     S[to]                    Jack wished for the man to go.
keep     VP[ing]                  Jack keeps hoping for the best.
catch    NP+VP[ing]               Jack caught Sam looking in his desk.
watch    NP+VP[base]              Jack watched Sam eat the pizza.
regret   S[that]                  Jack regretted that he'd eaten the whole thing.
tell     NP+S[that]               Jack told Sue that he was sorry.
seem     ADJP                     Jack seems unhappy in his new job.
think    NP+ADJP                  Jack thinks Sue is happy in her job.
know     S[WH]                    Jack knows where the money is.

Figure 2.11 Some common verb complement structures in English
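A fragment of Figure 2.11 can be encoded as a computational lexicon mapping each verb to its allowed complement frames (a sketch; the notation mirrors the figure):

    COMPLEMENT_FRAMES = {
        "laugh": [[]],                               # intransitive
        "find":  [["NP"]],                           # transitive
        "give":  [["NP", "NP"], ["NP", "PP[to]"]],   # bitransitive or NP+PP[to]
        "put":   [["NP", "Location"]],
        "know":  [["S[that]"], ["S[WH]"]],
    }

    def allows(verb, frame):
        return frame in COMPLEMENT_FRAMES.get(verb, [])

    print(allows("give", ["NP", "PP[to]"]))   # True:  Jack gave the book to the library.
    print(allows("laugh", ["NP"]))            # False: *Jack laughed a key.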

2.4 Noun Phrases Revisited

Section 2.2 introduced simple noun phrases. This section considers more complex forms in
which NPs contain sentences or verb phrases as subcomponents.
All the examples in Section 2.2 had heads that took the null complement. Many nouns,
however, may take complements. Many of these fall into the class of complements that require a
specific prepositional phrase. For example, the noun love has a complement form PP[of], as in their
love of France, the noun reliance has the complement form PP[on], as in his reliance on handouts,
and the noun familiarity has the complement form PP[with], as in a familiarity with computers.

Many nouns, such as desire, reluctance, and research, take an infinitive VP form as a
complement, as in the noun phrases his desire to release the guinea pig, a reluctance to open the case
again, and the doctor’s research to find a cure for cancer. These nouns, in fact, can also take the S[inf]
form, as in my hope for John to open the case again.

Noun phrases can also be built out of clauses, which were introduced in the last section as
the complements for verbs. For example, a that clause (S[that]) can be used as the subject of a
sentence, as in the sentence That George had the ring was surprising. Infinitive forms of verb
phrases (VP[inf]) and sentences (S[inf]) can also function as noun phrases, as in the sentences "To
own a car would be delightful" and "For us to complete a project on time would be unprecedented". In
addition, the gerundive forms (VP[ing] and S[ing]) can also function as noun phrases, as in the
sentences Giving up the game was unfortunate and John's giving up the game caused a riot.

Relative clauses involve sentence forms used as modifiers in noun phrases. These clauses
are often introduced by relative pronouns such as who, which, that, and so on, as in
The man who gave Bill the money...
The rug that George gave to Ernest...
The man whom George gave the money to ...
In each of these relative clauses, the embedded sentence is the same structure as a regular
sentence except that one noun phrase is missing. If this missing NP is filled in with the NP that the
sentence modifies, the result is a complete sentence that captures the same meaning as what was
conveyed by the relative clause. The missing NPs in the preceding three sentences occur in the
subject position, in the object position, and as object to a preposition, respectively. Deleting the
relative pronoun and filling in the missing NP in each produces the following:
The man gave Bill the money.
George gave the rug to Ernest.
George gave the money to the man.
As was true earlier, relative clauses can be modified in the same ways as regular sentences.
In particular, passive forms of the preceding sentences would be as follows:
Bill was given the money by the man.
The rug was given to Ernest by George.
The money was given to the man by George.
Correspondingly, these sentences could have relative clauses in the passive form as follows:
The man Bill was given the money by...
The rug that was given to Ernest by George...
The man whom the money was given to by George...
Notice that some relative clauses need not be introduced by a relative pronoun. Often the relative
pronoun can be omitted, producing what is called a base relative clause, as in the NP the man
George gave the money to. Yet another form deletes the relative pronoun and an auxiliary be form,
creating a reduced relative clause, as in the NP the man given the money, which means the same as
the NP the man who was given the money.

2.5 Adjective Phrases


You have already seen simple adjective phrases (ADJPs) consisting of a single adjective in
several examples. More complex adjective phrases are also possible, as adjectives may take many of
the same complement forms that occur with verbs. This includes specific prepositional phrases, as
with the adjective pleased, which takes the complement form PP[with] (for example, Jack was
pleased with the prize), or angry with the complement form PP[at] (for example, Jack was angry at
the committee). Angry also may take an S[that] complement form, as in Jack was angry that he was
left behind. Other adjectives take infinitive forms, such as the adjective willing with the complement
form VP[inf], as in Jack seemed willing to lead the chorus.

These more complex adjective phrases are most commonly found as the complements of
verbs such as be or seem, or following the head in a noun phrase. They generally cannot be used as
modifiers preceding the heads of noun phrases (for example, consider *the angry at the committee
man vs. the angry man vs. the man angry at the committee).

Adjective phrases may also take a degree modifier preceding the head, as in the adjective
phrase very angry or somewhat fond of Mary. More complex degree modifications are possible, as in
far too heavy and much more desperate. Finally, certain constructs have degree modifiers that
involve their own complement forms, as in too stupid to come in out of the rain, so boring that
everyone fell asleep, and as slow as a dead horse.

2.6 Adverbial Phrases

You have already seen adverbs in use in several constructs, such as indicators of degree (for
example, very, rather, too), and in location phrases (for example, here, everywhere). Other forms of
adverbs indicate the manner in which something is done (for example, slowly, hesitantly), the time
of something (for example, now, yesterday), or the frequency of something (for example, frequently,
rarely, never).

Adverbs may occur in several different positions in sentences: in the sentence initial
position (for example, Then, Jack will open the drawer), in the verb sequence (for example, Jack then will open the drawer, Jack will then open the drawer), and in the sentence final position (for example, Jack opened the drawer then). The exact restrictions on which adverb can go where, however, are quite idiosyncratic to the particular adverb.

In addition to these adverbs, adverbial modifiers can be constructed out of a wide range of
constructs, such as prepositional phrases indicating, among other things, location (for example, in
the box) or manner (for example, in great haste); noun phrases indicating, among other things,
frequency (for example, every day); or clauses indicating, among other things, the time (for
example, when the bomb exploded). Such adverbial phrases, however, usually cannot occur except
in the sentence initial or sentence final position. For example, we can say Every day John opens his
drawer or John opens his drawer every day, but not *John every day opens his drawer.
Because of the wide range of forms, it generally is more useful to consider adverbial phrases
(ADVPs) by function rather than syntactic form. Thus we can consider manner, temporal, duration,
location, degree, and frequency adverbial phrases each as its own form. We considered the location
and degree forms earlier, so here we will consider some of the others.

Temporal adverbials occur in a wide range of forms: adverbial particles (for example, now),
noun phrases (for example, today, yesterday), prepositional phrases (for example, at noon, during
the fight), and clauses (for example, when the clock struck noon, before the fight started).
Frequency adverbials also can occur in a wide range of forms: particles (for example, often),
noun phrases (for example, every day), prepositional phrases (for example, at every party), and
clauses (for example, every time that John comes for a visit).

Duration adverbials appear most commonly as prepositional phrases (for example, for three
hours, about 20 feet) and clauses (for example, until the moon turns blue).

Manner adverbials occur in a wide range of forms, including particles (for example, slowly),
noun phrases (for example, this way), prepositional phrases (for example, in great haste), and
clauses (for example, by holding the embers at the end of a stick).

In the analyses that follow, adverbials will most commonly occur as modifiers of the action
or state described in a sentence. As such, an issue arises as to how to distinguish verb complements
from adverbials. One distinction is that adverbial phrases are always optional. Thus you should be
able to delete the adverbial and still have a sentence with approximately the same meaning
(missing, obviously, the contribution of the adverbial). Consider the sentences
Jack put the box by the door.
Jack ate the pizza by the door.
In the first sentence the prepositional phrase is clearly a complement, since deleting it to produce
*Jack put the box results in a nonsensical utterance. On the other hand, deleting the phrase from the
second sentence has only a minor effect:
Jack ate the pizza is just a less general assertion about the same situation described by Jack ate the
pizza by the door.

3.1 Spoken Language Input

Spoken language interfaces to computers have lured and fascinated engineers and speech scientists alike for over five decades. For many, the ability to converse freely with a
machine represents the ultimate challenge to our understanding of the production and perception
processes involved in human speech communication. In the near future, interactive networks will
provide easy access to a wealth of information and services that will fundamentally affect how
people work, play and conduct their daily affairs. Advances in human language technology are
needed to enable the average citizen to communicate with networks using natural communication
skills and everyday devices, such as telephones and televisions. A speech interface, in a user’s own
language, is ideal because it is the most natural, flexible, efficient, and economical form of human
communication. Spoken input to computers embodies many different technologies and
applications, as illustrated in Figure 1.1. In some cases, as shown at the bottom of the figure, one is
interested not in the underlying linguistic content but in the identity of the speaker or the language
being spoken. Speaker recognition can involve identifying a specific speaker out of a known
population, which has forensic implications, or verifying the claimed identity of a user, thus
enabling controlled access to locales (e.g., a computer room) and services (e.g., voice banking).
Language identification also has important applications, and similar techniques are applied in this area.
Figure 1.1: Technologies for spoken language interfaces.
When one thinks about speaking to computers, the first image is usually speech recognition,
the conversion of an acoustic signal to a stream of words. After many years of research, speech
recognition technology is beginning to pass the threshold of practicality. The last decade has
witnessed dramatic improvement in speech recognition technology, to the extent that high
performance algorithms and systems are becoming available. Speech input capabilities are
emerging that can provide functions like voice dialing (e.g., Call home), call routing (e.g., I would like
to make a collect call), simple data entry (e.g., entering a credit card number), and preparation of
structured documents (e.g., a radiology report). Speech recognition involves several component technologies. First, the digitized signal must be transformed into a set of measurements, i.e., a signal representation. Next, the various speech sounds must be modeled
appropriately. The most widespread technique for acoustic modeling is called hidden Markov
modeling (HMM).

Speech recognition is a very challenging problem in its own right, with a well defined set of
applications. However, many tasks that lend themselves to spoken input— making travel
arrangements or selecting a movie—are in fact exercises in interactive problem solving. The
solution is often built up incrementally, with both the user and the computer playing active roles in
the “conversation.” Therefore, several language based input and output technologies must be
developed and integrated to reach this goal.

Figure 1.1 shows the major components of a typical conversational system. The spoken
input is first processed through the speech recognition component. The natural language
component, working in concert with the recognizer, produces a meaning representation. The
spoken language understanding technology addresses the integration of speech recognition and
natural language processing techniques. For information retrieval applications illustrated in this
figure, the meaning representation can be used to retrieve the appropriate information in the form
of text, tables and graphics. If the information in the utterance is insufficient or ambiguous, the
system may choose to query the user for clarification. Natural language generation and speech
synthesis can be used to produce spoken responses that may serve to clarify the tabular
information. Throughout the process, discourse information is maintained and fed back to the
speech recognition and language understanding components, so that sentences can be properly
understood in context.

3.1.1 Speech Recognition

Speech recognition is the process of converting an acoustic signal, captured by a microphone


or a telephone, to a set of words. The recognized words can be the final results, for such applications as command and control, data entry, and document preparation. They can also serve as
the input to further linguistic processing in order to achieve speech understanding. Speech
recognition systems can be characterized by many parameters, some of the more important of
which are shown in Figure 1.1. An isolated-word speech recognition system requires that the
speaker pause briefly between words, whereas a continuous speech recognition system does not.
Spontaneous, or extemporaneously generated, speech contains disfluencies and is much more
difficult to recognize than speech read from a script. Some systems require speaker enrollment—a user must provide samples of his or her speech before using the system—whereas other systems are
said to be speaker independent, in that no enrollment is necessary. Some of the other parameters
depend on the specific task. Recognition is generally more difficult when vocabularies are large or
have many similar-sounding words. When speech is produced in a sequence of words, language
models or artificial grammars are used to restrict the combination of words. The simplest language
model can be specified as a finite-state network, where the permissible words following each word
are explicitly given. More general language models approximating natural language are specified in
terms of a context-sensitive grammar. One popular measure of the difficulty of the task, combining
the vocabulary size and the language model, is perplexity, loosely defined as the geometric mean of
the number of words that can follow a word after the language model has been applied. In addition,
there are some external parameters that can affect speech recognition system performance,
including the characteristics of the environmental noise and the type and the placement of the
microphone.
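Under this definition, a model that always leaves k equally likely choices has perplexity k. A sketch of the standard computation (the probabilities are invented for illustration):

    import math

    def perplexity(word_probabilities):
        # word_probabilities[i] = P(word_i | preceding words) under the language model
        n = len(word_probabilities)
        return math.exp(-sum(math.log(p) for p in word_probabilities) / n)

    # if every word in a 20-word test sequence gets probability 0.1, perplexity is 10
    print(perplexity([0.1] * 20))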

Parameters Range
Speaking Mode Isolated words to continuous speech
Speaking Style Read speech to spontaneous speech
Enrollment Speaker-dependent to Speaker-independent
Vocabulary Small (< 20 words) to large (> 20,000 words)
Language Model Finite-state to context-sensitive
Perplexity Small (< 10) to large (> 100)
SNR (signal-to-noise ratio) High (> 30 dB) to low (< 10 dB)
Transducer Voice-cancelling microphone to telephone

Table 1.1: Typical parameters used to characterize the capability of speech recognition systems

Figure 1.2 shows the major components of a typical speech recognition system. The
digitized speech signal is first transformed into a set of useful measurements or features at a fixed
rate, typically once every 10–20 msec. These measurements are then used to search for the most
likely word candidate, making use of constraints imposed by the acoustic, lexical, and language
models. Throughout this process, training data are used to determine the values of the model
parameters.
3.1.3 Signal Representation

In statistically based automatic speech recognition, the speech waveform is sampled at a rate
between 6.6 kHz and 20 kHz and processed to produce a new representation as a sequence of
vectors containing values that are generally called parameters. The vectors typically comprise
between 10 and 20 parameters, and are usually computed every 10 or 20 msec. These parameter
values are then used in succeeding stages in the estimation of the probability that the portion of
waveform just analyzed corresponds to a particular phonetic event in the phone-sized or whole-
word reference unit being hypothesized.
Representations used in current speech recognizers (see Figure 1.3) concentrate primarily on
properties of the speech signal attributable to the shape of the vocal tract rather than to the
excitation, whether generated by a vocal-tract constriction or by the larynx. Representations are
sensitive to whether the vocal folds are vibrating or not (the voiced/unvoiced distinction), but try
to ignore effects due to variations in their frequency of vibration.
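As one concrete and commonly used realization of such a representation, the sketch below computes mel-frequency cepstral coefficients with the librosa library; the file name, sampling rate, and frame rate are assumptions of the example:

    import librosa

    y, sr = librosa.load("speech.wav", sr=16000)        # waveform sampled at 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,  # 13 cepstral coefficients per frame
                                hop_length=160)         # one vector every 10 msec at 16 kHz
    print(mfcc.shape)                                   # (13, number_of_frames)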
Figure 1.3: Examples of representations used in current speech recognizers: (a) Time varying
waveform of the word speech, showing changes in amplitude (y axis) over time (x axis); (b) Speech
spectrogram of (a), in terms of frequency (y axis), time (x axis) and amplitude (darkness of the
pattern); (c) Expanded waveform of the vowel ee (underlined in b); (d) Spectrum of the vowel ee, in
terms of amplitude (y axis) and frequency (x axis); (e) Mel-scale spectrogram.

3.1.4 Robust Speech Recognition

Robustness in speech recognition refers to the need to maintain good recognition accuracy
even when the quality of the input speech is degraded, or when the acoustical, articulatory, or
phonetic characteristics of speech in the training and testing environments differ. Obstacles to
robust recognition include acoustical degradations produced by additive noise, the effects of linear
filtering, nonlinearities in transduction or transmission, as well as impulsive interfering sources,
and diminished accuracy caused by changes in articulation produced by the presence of high-
intensity noise sources. Some of these sources of variability are illustrated in Figure 1.4. Speaker-
to-speaker differences impose a different type of variability, producing variations in speech rate,
co-articulation, context, and dialect. Even systems that are designed to be speaker-independent
exhibit dramatic degradations in recognition accuracy when training and testing conditions differ.

Speech recognition systems have become much more robust in recent years with respect to
both speaker variability and acoustical variability. In addition to achieving speaker-independence,
many current systems can also automatically compensate for modest amounts of acoustical
degradation caused by the effects of unknown noise and unknown linear filtering.
Figure 1.4: Schematic representation of some of the sources of variability that can degrade speech
recognition accuracy, along with compensation procedures that improve environmental
robustness.
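One simple compensation technique of this kind is cepstral mean normalization, which removes the average spectral shape, and with it much of the effect of a fixed linear filter, from the feature vectors. A minimal sketch, assuming features arranged as frames by coefficients:

    import numpy as np

    def cepstral_mean_normalize(features):
        # features: array of shape (num_frames, num_coefficients)
        # subtracting the per-coefficient mean cancels a fixed channel filter,
        # which appears as an additive constant in the cepstral domain
        return features - features.mean(axis=0, keepdims=True)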

3.1.5 HMM Methods in Speech Recognition

Modern architectures for Automatic Speech Recognition (ASR) are mostly software architectures
which generate a sequence of word hypotheses from an acoustic signal. The most popular
algorithms implemented in these architectures are based on statistical methods. A vector y_t of acoustic features is computed every 10 to 30 msec, yielding an observation sequence y_1^T (the notation y_1^T stands for the sequence {y_1, y_2, ..., y_T}). The recognizer then searches for the word sequence that maximizes the a posteriori probability:

    \hat{W} = \arg\max_{W} P(W \mid y_1^T) = \arg\max_{W} P(y_1^T \mid W)\, P(W)

Here argmax is an operation that finds the argument that gives the maximum value of the target function; the two factors P(y_1^T | W) and P(W) are supplied by the acoustic model and the language model, respectively.

For large vocabularies, search is performed in two steps. The first generates a word lattice of the n-
best word sequences with simple models to compute approximate likelihoods in real time. In the second step, more accurate likelihoods are computed for a limited number of hypotheses. Some systems generate a single word-sequence hypothesis in a single step. The search produces a hypothesized word sequence if the task is dictation. If the task is understanding, then a conceptual
structure is obtained with a process that may involve more than two steps.
E.g., the first pass might keep acoustically similar hypotheses such as dollfins swim fast and dolphins swim fast in the lattice for the second pass to rescore.
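At the core of HMM decoding is the Viterbi algorithm, which finds the most probable state sequence for the observed features. A self-contained toy version follows (the state names and probabilities are invented; real recognizers work over phone- or word-level HMMs with the acoustic likelihoods discussed above):

    from math import log

    def viterbi(observations, states, start_p, trans_p, emit_p):
        # trellis[t][s] = (best log-probability of a path ending in s at time t, predecessor)
        trellis = [{s: (log(start_p[s]) + log(emit_p[s][observations[0]]), None)
                    for s in states}]
        for t in range(1, len(observations)):
            column = {}
            for s in states:
                prev, score = max(((p, trellis[t - 1][p][0] + log(trans_p[p][s]))
                                   for p in states), key=lambda pair: pair[1])
                column[s] = (score + log(emit_p[s][observations[t]]), prev)
            trellis.append(column)
        # trace back from the best final state
        best = max(states, key=lambda s: trellis[-1][s][0])
        path = [best]
        for t in range(len(observations) - 1, 0, -1):
            path.append(trellis[t][path[-1]][1])
        return list(reversed(path))

    # a toy two-state HMM over "quiet"/"loud" frames
    states = ["silence", "speech"]
    start = {"silence": 0.8, "speech": 0.2}
    trans = {"silence": {"silence": 0.7, "speech": 0.3},
             "speech": {"silence": 0.2, "speech": 0.8}}
    emit = {"silence": {"quiet": 0.9, "loud": 0.1},
            "speech": {"quiet": 0.2, "loud": 0.8}}
    print(viterbi(["quiet", "loud", "loud"], states, start, trans, emit))
    # ['silence', 'speech', 'speech']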

3.1.6 Language Representation

A speech recognizer converts the observed acoustic signal into the corresponding orthographic
representation of the spoken sentence. The recognizer chooses its guess from a finite vocabulary of
words that can be recognized. For simplicity, we assume that a word is uniquely identified by its
spelling. Dramatic progress has been demonstrated in solving the speech recognition problem via
the use of a statistical model of the joint distribution of the sequence of spoken words and the
corresponding observed sequence of acoustic information. This approach, pioneered by the IBM
Continuous Speech Recognition group, is called the source-channel model. In this approach, the
speech recognizer determines an estimate of the identity of the spoken word sequence from the
observed acoustic evidence by using the a posteriori distribution.

3.1.7 Speaker Recognition

Speaker recognition, which can be classified into identification and verification, is the process of
automatically recognizing who is speaking on the basis of individual information included in
speech waves. This technique makes it possible to use the speaker’s voice to verify their identity
and control access to services such as voice dialing, banking by telephone, telephone shopping,
database access services, information services, voice mail, security control for confidential
information areas, and remote access to computers. AT&T and TI (with Sprint) have started field
tests and actual application of speaker recognition technology; Sprint’s Voice Phone Card is already
being used by many customers. In this way, speaker recognition technology is expected to create
new services that will make our daily lives more convenient. Another important application of
speaker recognition technology is for forensic purposes.
Figure 1.8 shows the basic structures of speaker identification and verification systems.
Speaker identification is the process of determining which registered speaker provides a given
utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the
identity claim of a speaker. Most applications in which a voice is used as the key to confirm the
identity of a speaker are classified as speaker verification. There is also the case called open set
identification, in which a reference model for an unknown speaker may not exist. This is usually the
case in forensic applications. In this situation, an additional decision alternative, "the unknown does not match any of the models," is required. In both verification and identification processes, an
additional threshold test can be used to determine if the match is close enough to accept the
decision or if more speech data is needed.
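
A minimal sketch of such a threshold test, assuming deliberately simple diagonal-Gaussian speaker
and background models; the threshold and margin values are hypothetical and would be tuned on
held-out data in any real system:

    import numpy as np

    def log_likelihood(frames, mean, var):
        # Diagonal-Gaussian log-likelihood, summed over all feature frames
        return float(np.sum(-0.5 * (np.log(2 * np.pi * var)
                                    + (frames - mean) ** 2 / var)))

    def verify(frames, claimed, background, threshold=0.0, margin=50.0):
        # claimed and background are (mean, var) pairs for the two models
        score = log_likelihood(frames, *claimed) - log_likelihood(frames, *background)
        if score > threshold + margin:
            return "accept"
        if score < threshold - margin:
            return "reject"
        return "need more speech"  # match too close to decide either way

Scoring against a background model makes the decision a likelihood-ratio test, a common way to
normalize for variability across utterances.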
Speaker recognition methods can also be divided into text-dependent and text independent
methods. The former require the speaker to say key words or sentences having the same text for
both training and recognition trials, whereas the latter do not rely on a specific text being spoken.
Both text-dependent and text-independent methods share a problem, however. These systems can be
easily deceived because someone who plays back the recorded voice of a registered speaker saying
the key words or sentences can be accepted as the registered speaker. To cope with this problem,
there are methods in which a small set of words, such as digits, are used as key words and each
user is prompted to utter a given sequence of key words that is randomly chosen every time the
system is used. Yet even this method is not completely reliable, since it can be deceived with
advanced electronic recording equipment that can reproduce key words in a requested order.
Therefore, a text-prompted (machine-driven-text-dependent) speaker recognition method has
recently been proposed.
Figure 1.8: Basic structures of speaker recognition systems.

3.1.8 Spoken Language Understanding

Spoken language understanding involves two primary component technologies (each covered
elsewhere in this volume): speech recognition (SR), and natural language (NL) understanding. The
integration of speech and natural language has great advantages: To NL, SR can bring prosodic
information (information important for syntax and semantics but not well represented in text); NL
can bring to SR additional knowledge sources (e.g., syntax and semantics). For both, integration
affords the possibility of many more applications than could otherwise be envisioned, and the
acquisition of new techniques and knowledge bases not previously represented. The integration of
these technologies presents technical challenges, and challenges related to the quite different
cultures, techniques and beliefs of the people representing the component technologies.

In large part, NL research has grown from symbolic systems approaches in computer science and
linguistics departments. The desire to model language understanding is often motivated by a desire
to understand cognitive processes and, therefore, the underlying theories tend to be from
linguistics and psychology. Practical applications have been less important than increasing
intuitions about human processes. Therefore, coverage of phenomena of theoretical interest—
usually the more rare phenomena—has traditionally been more important than broad coverage.

Speech recognition research, on the other hand, has largely been practiced in engineering
departments. The desire to model speech is often motivated by a desire to produce practical
applications. Techniques motivated by knowledge of human processes have therefore been less
important than techniques that can be automatically developed or tuned, and broad coverage of a
representative sample is more important than coverage of any particular phenomenon.
There are certainly technical challenges to the integration of SR and NL. However, progress toward
meeting these challenges has been slowed by the differences outlined above. Collaboration can be
inhibited by differences in motivation, interests, theoretical underpinnings, techniques, tools, and
criteria for success. However, both groups have much to gain from collaboration. For the SR
engineers, human language understanding provides an existence proof, and needs to be taken into
account, since most applications involve interaction with at least one human. For the AI NL
researchers, statistical and other engineering techniques can be important tools for their inquiries.

3.2 Spoken Output Technologies

3.2.1.1 A Global View of Synthesis Research

Speech synthesis research predates other forms of speech technology by many years. In the early
days of synthesis, research efforts were devoted mainly to simulating human speech production
mechanisms, using basic articulatory models based on electroacoustic theories. Though this
modeling is still one of the ultimate goals of synthesis research, advances in computer science have
widened the research field to include Text-to-Speech (TtS) processing, in which not only human
speech generation but also text processing is modeled. As this modeling is generally done by a set
of rules derived, e.g., from phonetic theories and acoustic analyses, the technology is typically
referred to as speech synthesis by rule.

Figure 5.1 shows the configuration of a standard TtS system. In such systems, rule-based synthesis
has attained highly intelligible speech quality and can already serve in many practical uses.
Ceaseless efforts have improved the quality of rule-based synthetic speech, step by step, by
alternating speech characteristics analysis with the development of control rules. However, most of
this progress has been system dependent, and remains deeply embedded within system
architectures in impenetrable meshes of detailed rules and finely tuned control parameters. As a
consequence, the expert knowledge that has been incorporated is not available for sharing
commonly and can be very hard to replicate in equivalent systems by other researchers.

Figure 5.1: The configuration of a standard TtS system.

In contrast to this traditional rule-based approach, a corpus-based approach has also been pursued.
In the corpus-based work, well-defined speech data sets have been annotated at various levels with
information, such as acoustic-phonetic labels and syntactic bracketing, serving as the foundation
for statistical modeling. Spectral and prosodic feature parameters of the speech data are analyzed
in relation to the labeled information, and their control characteristics are quantitatively described.
Based on the results of these analyses, a computational model is created and trained using the
corpus. By subsequently applying the resulting model to unseen test data, its validity and any
defects can be quantitatively shown. By feeding back results from such tests into the original model
with extended training, further improvements can be attained in a cyclical process.

3.2.1.2 Synthesis Segment Units


In TtS systems, speech units that are typically smaller than words are used to synthesize speech
from arbitrary input text. Since there are over 10,000 different possible syllables in English, much
smaller units such as phonemes and dyads (phoneme pairs) have typically been modeled. A speech
segment’s spectral characteristics vary with its phonetic context, as defined by neighboring
phonemes, stress and positional differences, and recent studies have shown that speech quality can
be greatly affected by these contextual differences. However, in traditional rule-based synthesis,
though these units have been carefully designed to take into account phonetic variations, no
systematic studies have been carried out to determine how and where to best extract the acoustic
parameters of units or what kind of speech corpus can be considered optimal.

Corpus-based methods provide for a specification of the speech segments required for
concatenative synthesis in terms of three factors:
1. the procedures of the unit selection algorithm;
2. the objective measures used in the selection criteria; and
3. the design of the speech corpus from which the units are extracted.
This modularization of system building is useful not only in reducing construction effort, but
also in allowing precise mathematical specification of the problems and in defining ways to cope
with them systematically, by improving the selection algorithms, criteria and data.
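
As an illustration of factor 1, the following is a schematic sketch of a unit selection search,
assuming user-supplied target_cost and concat_cost objective measures (factor 2) and per-target
candidate lists drawn from a labeled speech corpus (factor 3):

    def select_units(targets, candidates, target_cost, concat_cost):
        # best[i][j] = (cost, backpointer) for the cheapest unit sequence
        # ending in candidates[i][j]
        best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
        for i in range(1, len(targets)):
            row = []
            for u in candidates[i]:
                cost, back = min(
                    (best[i - 1][k][0] + concat_cost(prev, u), k)
                    for k, prev in enumerate(candidates[i - 1])
                )
                row.append((cost + target_cost(targets[i], u), back))
            best.append(row)
        # Trace back from the cheapest final unit
        j = min(range(len(candidates[-1])), key=lambda k: best[-1][k][0])
        path = []
        for i in range(len(targets) - 1, -1, -1):
            path.append(candidates[i][j])
            j = best[i][j][1]
        return list(reversed(path))

This is the usual dynamic-programming formulation: total cost is the sum of target costs (how well
a unit matches its specification) and concatenation costs (how smoothly adjacent units join).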

3.2.1.3 Prosody Control


For synthesis of natural-sounding speech, it is essential to control prosody, to ensure appropriate
rhythm, tempo, accent, intonation and stress. Segmental duration control is needed to model
temporal characteristics, just as fundamental frequency control is needed for tonal characteristics.
In contrast to the relative sparsity of work on speech unit generation, many quantitative analyses
have been carried out for prosody control.

3.2.1.4 Speaker Characteristics Control


Speech waveforms contain not only linguistic information but also speaker voice characteristics, as
manifested in the glottal waveform of voice excitation and in the global spectral features
representing vocal tract characteristics. The glottal waveform has been manipulated using a glottal
source model and female voices (more difficult to model) have been successfully synthesized.
However, it is very difficult to fully automate such parameter extraction procedures, and an
automatic analysis-synthesis scheme remains a much sought-after goal.

3.2.2 Synthetic Speech Generation

Speech generation is the process which allows the transformation of a string of phonetic and
prosodic symbols into a synthetic speech signal. The quality of the result is a function of the quality
of the string, as well as of the quality of the generation process itself. Two quality criteria are
commonly applied to a text-to-speech (TtS) system today. The first is intelligibility, which can be
measured by taking into account several kinds of units (phonemes, syllables, words, phrases). The
second, more difficult to define, is often labeled pleasantness or naturalness. The concept of
naturalness may be related to the concept of realism in the field of image synthesis: the goal is not
to reproduce reality but to suggest it. Thus, listening to a synthetic voice must allow the listener to
attribute this voice to some pseudo-speaker and to perceive some kind of expressivity, as well as
some indices characterizing the speaking style and the particular situation of elocution.
Most of the present TtS systems produce an acceptable level of intelligibility, but the
naturalness dimension, the ability to control expressivity, speech style and pseudo speaker identity
are still poorly mastered. Let us mention, however, that users’ demands vary to a large extent
according to the field of application: general public applications such as telephonic information
retrieval need maximal realism and naturalness, whereas some applications involving
professionals (process or vehicle control) or highly motivated persons (visually impaired,
applications in hostile environments) demand intelligibility with the highest priority.

3.2.2.1 Speech Signal Generation


The last step for speech output is synthesis of the waveform, according to the segmental and
prosodic parameters defined at earlier stages of processing.
Speech signal generators (the synthesizers) can be classified into three categories:
1. articulatory synthesizers,
2. formant synthesizers, and
3. concatenative synthesizers.
Articulatory synthesizers are physical models based on the detailed description of the
physiology of speech production and on the physics of sound generation in the vocal apparatus.
Typical parameters are the position and kinematics of articulators. Then the sound radiated at the
mouth is computed according to equations of physics. This type of synthesizer is rather far from
applications and marketing because of its cost in terms of computation and the underlying
theoretical and practical problems still unsolved.
Formant synthesis is a descriptive, acoustic-phonetic approach to synthesis. Speech
generation is not performed by solving equations of physics in the vocal apparatus, but by
modeling the main acoustic features of the speech signal. The basic acoustic model is the
source/filter model. The filter, described by a small set of formants, represents articulation in
speech. It models speech spectra that are representative of the position and movements of
articulators. The source represents phonation. It models the glottal flow or noise excitation signals.
Both source and filter are controlled by a set of phonetic rules (typically several hundred). High-
quality, rule-based formant synthesizers, including multilingual systems, have been marketed for
many years.
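
A minimal sketch of the source/filter idea behind formant synthesis: an impulse-train source
(phonation) filtered by a cascade of second-order resonators (articulation). The formant values are
illustrative static settings for an /a/-like vowel; a real rule-based synthesizer drives hundreds of
time-varying parameters:

    import numpy as np
    from scipy.signal import lfilter

    fs = 10000        # sample rate (Hz)
    f0 = 120          # fundamental frequency of the source (Hz)
    dur = 0.5         # duration (s)
    formants = [(700, 130), (1220, 70), (2600, 160)]  # (freq, bandwidth) in Hz

    # Source: impulse train standing in for the glottal pulses
    n = int(fs * dur)
    source = np.zeros(n)
    source[::fs // f0] = 1.0

    # Filter: one digital resonator per formant, applied in cascade
    signal = source
    for f, bw in formants:
        r = np.exp(-np.pi * bw / fs)
        theta = 2 * np.pi * f / fs
        a = [1.0, -2 * r * np.cos(theta), r * r]  # resonator poles
        b = [1.0 - r]                             # rough gain normalization
        signal = lfilter(b, a, signal)

    signal /= np.abs(signal).max()  # normalize amplitude for playback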
Concatenative synthesis is based on speech signal processing of natural speech databases.
The segmental database is built to reflect the major phonological features of a language. For
instance, its set of phonemes is described in terms of diphone units, representing the phoneme-to-
phoneme junctures. Non-uniform units are also used (diphones, syllables, words, etc.). The
synthesizer concatenates (coded) speech segments, and performs some signal processing to
smooth unit transitions and to match predefined prosodic schemes. Several high-quality
concatenative synthesizers, including multilingual systems, are marketed today.

3.2.3 Text Interpretation for TtS Synthesis


The problem of converting text into speech for some language can naturally be broken down
into two subproblems. One subproblem involves the conversion of linguistic parameter
specifications (e.g., phoneme sequences, accentual parameters) into parameters (e.g., formant
parameters, concatenative unit indices, pitch time/value pairs) that can drive the actual synthesis
of speech. The other subproblem involves the computation of these linguistic parameter
specifications from input text.
In any language, orthography is an imperfect representation of the underlying linguistic
form. To illustrate this point, and to introduce some of the issues that we will discuss in this section,
consider an English sentence such as Give me a ticket to Dallas or give me back my money: see
Figure 5.2.
Figure 5.2: Some linguistic structures associated with the analysis of the sentence, “Give me a ticket
to Dallas or give me back my money.”

One of the first things that an English TtS system would need to do is tokenize the input into
words: for English this is not generally difficult though it is more complicated for some other
languages. A pronunciation then needs to be computed for each word; in English, given the
irregularity of the orthography, this process involves a fair amount of lexical lookup though other
processes are involved too. Some of the words in the sentence should be accented; in this particular
case, a reasonable accentuation would involve accenting content words like give, ticket, Dallas, back
and money, and leaving the other words unaccented. Then we might consider breaking the input
into prosodic phrases: in this case, it would be reasonable to intone the sentence as if there were a
comma between Dallas and or. Thus, various kinds of linguistic information need to be extracted
from the text, but only in the case of word boundaries can this linguistic information be said to be
represented directly in the orthography.
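
A toy sketch of these first analysis steps for the example sentence, with a hypothetical ten-word
pronouncing lexicon and a crude content/function-word split standing in for real lexical lookup and
accentuation rules:

    # Hypothetical mini-lexicon; a real system uses a large pronouncing dictionary
    LEXICON = {
        "give": "G IH V", "me": "M IY", "a": "AH", "ticket": "T IH K AH T",
        "to": "T UW", "dallas": "D AE L AH S", "or": "AO R",
        "back": "B AE K", "my": "M AY", "money": "M AH N IY",
    }
    FUNCTION_WORDS = {"a", "to", "or", "me", "my"}

    def analyze(text):
        tokens = text.lower().replace(",", "").split()  # trivial tokenization
        return [{
            "word": w,
            "phones": LEXICON.get(w, "<oov>"),    # lookup; irregular orthography
            "accented": w not in FUNCTION_WORDS,  # crude accentuation rule
        } for w in tokens]

    for entry in analyze("Give me a ticket to Dallas or give me back my money"):
        print(entry)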

3.2.4 Spoken Language Generation

Interactive natural language capabilities are needed for a wide range of today’s intelligent
systems: expert systems must explain their results and reasoning, intelligent assistants must
collaborate with users to perform tasks, tutoring systems must teach domain concepts and critique
students’ problem-solving strategies, and information delivery systems must help users find and
make sense of the information they need. These applications require that a system be capable of
generating coherent multi-sentential responses, and interpreting and responding to users’
subsequent utterances in the context of the ongoing interaction.
Spoken language generation allows for provision of responses as part of an interactive
human-machine dialogue, where speech is one medium for the response. This research topic draws
from the fields of both natural language generation and speech synthesis. It differs from synthesis
in that speech is generated from an abstract representation of concepts rather than from text.
While a relatively under-emphasized research problem, the ability to generate spoken responses is
clearly crucial for interactive situations, in particular when:
1. the user’s hands and/or eyes are busy;
2. screen real estate is at a premium;
3. time is critical; or
4. system and user are communicating via a primarily audio channel such as the telephone.
Like written language generation, spoken language generation requires determining what concepts
to include and how to realize them in words, but critically also requires determining intonational
form. Several problems are particularly pertinent to the spoken context:
1. The need to model and use knowledge about hearer goals, hearer background, and past discourse
in determining the content and form of a response. While the written context can include a general
audience (e.g., for report generation), responses in an interactive dialogue are intended for a
particular person and, to be useful, must take that person into account.
2. The question of what kind of language should be generated given a spoken context. Given the
lack of visual memory that a text provides, the form required for speech is likely to be quite
different from that found in text.

3.3 Written Language Input

The written form of language is contained in printed documents, such as newspapers, magazines
and books, and in handwritten matter, such as found in notebooks and personal letters. Given the
importance of written language in human transactions, its automatic recognition has practical
significance.
3.3.1.1 Written Language
Fundamental characteristics of writing are:
1. it consists of artificial graphical marks on a surface;
2. its purpose is to communicate something;
3. this purpose is achieved by virtue of the mark’s conventional relation to language.
Although speech is a sign system that is more natural than writing to humans, writing is considered
to have made possible much of culture and civilization. Different writing systems, or scripts,
represent linguistic units, words, syllables and phonemes, at different structural levels. In
alphabetic writing systems, principal examples of which are the Latin, Greek and Russian scripts,
alphabets are the primitive elements, or characters, which are used to represent words. Several
languages such as English, Dutch, French, etc., share the Latin script. The Devanagari script, which
represents syllables as well as alphabets, is used by several Indian languages, including Hindi. The
Chinese script, which consists of ideograms, is an alternative to alphabets. The Japanese script
consists of the Chinese ideograms (Kanji) and syllables (Kana).

There are roughly two dozen different scripts in use today (ignoring minor differences in
orthography, as between English and French). Each script has its own set of icons, known as
characters or letters, that have certain basic shapes. Each script has its rules for combining the
letters to represent the shapes of higher level linguistic units. For example, there are rules for
combining the shapes of individual letters so as to form cursively written words in the Latin
alphabet.
In addition to linguistic symbols, each script has a representation for numerals, such as the Arabic-
Indic digits used in conjunction with the Latin alphabet. In addition, there are icons for special
symbols found on keyboards.

3.3.1.2 Transducers

Most archived written language has been in the form of printed paper documents. In such
documents, text is presented as a visual image on a high contrast background, where the shapes of
characters belong to families of type fonts. Paper documents, which are an inherently analog
medium, can be converted into digital form by a process of scanning and digitization.
More recently, it has become possible to store and view electronically prepared documents
as formatted pages on a computer graphics screen, where the scanning and recognition process is
eliminated.
Written language is also encountered in the form of handwriting inscribed on paper or
registered on an electronically sensitive surface. Handwriting data is converted to digital form
either by scanning the writing on paper or by writing with a special pen on an electronic surface
such as a Liquid Crystal Display (LCD). The two approaches are distinguished as off-line and on-line
handwriting.

3.3.1.3 Recognition
Written language recognition is the task of transforming language represented in its spatial form of
graphical marks into its symbolic representation. For English orthography, this symbolic
representation is typically the ASCII representation of text. The characters of most written
languages of the world are representable today in the form of the Unicode. The central tasks are
character recognition and word recognition. A necessary preprocessing step for recognizing
written language is the spatial issue of locating and registering the appropriate text when there are
complex two-dimensional spatial layouts employed. The latter task is referred to as document
image analysis.
Character Recognition
The basic problem is to assign the digitized character into its symbolic class. In the case of a
print image, this is referred to as Optical Character Recognition (OCR). In the case of handprint, it
is referred to as Intelligent Character Recognition (ICR).
The typical classes are the upper- and lower-case characters, the ten digits, and special
symbols such as the period, exclamation mark, brackets, dollar and pound signs, etc. A pattern
recognition algorithm is used to extract shape features and assign the observed character into the
appropriate class. Artificial neural networks have emerged as fast methods for implementing
classifiers for OCR. Algorithms based on nearest neighbor methods have higher accuracy, but are
slower.
Recognition of characters from a single font family on a well-printed paper document can be
done very accurately. Difficulties arise when there are decorative fonts, many fonts to be handled,
or when the document is of poor quality. Some examples of poor quality machine-printed and
handwritten characters are shown in Figure 2.1. In the difficult cases, it becomes necessary to use
models to constrain the choices at the character and word levels. Such models are essential in
handwriting recognition due to the wide variability of hand printing and cursive script. A word
recognition algorithm attempts to associate the word image to choices in a lexicon. Typically, a
ranking is produced. This is done either by the analytical approach of recognizing the individual
characters or by the holistic approach of dealing with the entire word image. The latter approach is
useful in the case of touching printed characters and handwriting. A higher level of performance is
observed by combining the results of both approaches. In the off-line unconstrained handwritten
word recognition problem, recognition rates of 95%, 85% and 78% have been reported for the top
choice for lexicon sizes of 10, 100 and 1,000 respectively. In the on-line case, larger lexicons are
possible for the same accuracy; a top choice
recognition rate of 80% with pure cursive words and a 21,000 word lexicon has been reported.

Figure 2.1: Examples of low-quality machine-printed characters involving segmentation difficulties (a) and handwritten characters (b).

Language Models
Language models are useful in recovering strings of words after they have been passed
through a noisy channel, such as handwriting or print degradation. The most important model for
written language recognition is the lexicon of words. The lexicon, in turn, is determined by
linguistic constraints, e.g., in recognizing running text, the lexicon for each word is constrained by
the syntax, semantics and pragmatics of the sentence. The performance of a recognition system can
be improved by incorporating statistical information at the word sequence level. The performance
improvement derives from selection of lower-rank words from the word recognition output when
the surrounding context indicates such selection makes the entire sentence more probable. Lexical
techniques such as collocational analysis can be used to modify word neighborhoods generated by
a word recognizer. Modification includes re-ranking, deleting or proposing new word candidates.
Collocations are word patterns that occur frequently in language; intuitively, if word A is present,
there is a high probability that word B is also present.
Methods to apply linguistic knowledge include: n-gram word models, n-gram class (e.g.,
part-of-speech) models, context-free grammars, and stochastic context-free grammars. An example
of a handwritten sentence together with recognition choices produced by a word recognizer and
grammatically determined correct paths are shown in Figure 2.2. An increase in top choice word
recognition rate from 80% to 95% is possible with the use of language models.
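
A schematic sketch of such re-ranking with a bigram model, assuming rec_logprob (per-position
word recognizer scores) and bigram_logprob (log probabilities estimated from text corpora) are
supplied; a Viterbi-style search picks the jointly most probable path through the candidate lists:

    def rescore(candidates, rec_logprob, bigram_logprob):
        # best[w] = (score, path) of the best hypothesis ending in word w
        best = {w: (rec_logprob(0, w), [w]) for w in candidates[0]}
        for i in range(1, len(candidates)):
            new_best = {}
            for w in candidates[i]:
                prev = max(best, key=lambda v: best[v][0] + bigram_logprob(v, w))
                score = best[prev][0] + bigram_logprob(prev, w) + rec_logprob(i, w)
                new_best[w] = (score, best[prev][1] + [w])
            best = new_best
        final = max(best, key=lambda v: best[v][0])
        return best[final]  # (total log probability, word sequence)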

Document Image Analysis


Interaction with written language recognition is the task of document image analysis. It
involves determining the physical (spatial) and logical structure of document content. There is
wide variability in the structure of documents, as in the case of newspapers, magazines, books,
forms, letters and handwritten notes. In the case of a newspaper page, the objective of document
analysis is to:

1. determine spatial extent of document segments and to associate appropriate labels with them,
e.g., half-tone photographs, text, graphics, separating lines, etc.,
2. group image parts into meaningful units, e.g., figure and caption, heading, subheading, etc.,
3. determine reading order of blocks of text.
Document image analysis applies traditional image processing operations to printed text,
such as enhancement, gray-scale image binarization, texture analysis, segmentation, etc. Additional
difficult problems in the case of handwriting are: separation of lines of text, separation of words
within a line and the separation of touching characters.

Figure 2.2: Handwritten Sentence Recognition. The path through top word choices is determined
using part-of-speech tags.

3.3.2 Document Image Analysis


Document analysis, or more precisely, document image analysis, is the process that
performs the overall interpretation of document images. The process of determining document
structure may be viewed as guided by a model, explicit or implicit, of the class of documents of
interest. The model describes the physical appearance and the relationships between the entities
that make up the document. OCR is often at the final level of this process, i.e., it provides a final
encoding of the symbols contained in a logical entity such as paragraph or table, once the latter has
been isolated by other stages. However, it is important to realize that OCR can also participate in
determining document layout. For example, as part of the process of extracting a newspaper article,
the system may have to recognize the character string, continued on page 5, at the bottom of a page
image, in order to locate the entire text.
In practice then, a document analysis system performs the basic tasks of image
segmentation, layout understanding, symbol recognition and application of contextual rules, in an
integrated manner.
1. Text Documents - The ultimate goal for text systems can be termed inverse formatting or
completion of the Gutenberg loop, meaning that a scanned printed document is translated back into
a document description language from which it could be accurately reprinted if desired.
2. Forms - Forms are the printed counterparts of relations in a database. A typical form consists of
an n-tuple of data items, each of which can be represented as an ordered pair (item name, item
value). OCR is used to recognize the item value; more general document analysis operations may be
needed in order to identify the item name. The capability for locating items on a form, establishing
their name class, and encoding the accompanying data values has many applications in business
and government.
3. Postal Addresses and Check Reading
These applications are characterized by a well-defined logical format (but a highly variable
physical layout), and a high degree of contextual constraint on the symbolic data. The latter is
potentially very useful in the attainment of high accuracy. Contextual rules can modify OCR results
to force agreement of city names and postal codes, for example, or to reconcile numeric dollar
amounts on checks with the written entry in the legal amount field. Contextual constraints can also
assist in the detection of misrecognized documents, so that these can be handled by manual or
other processes. While pieces of mail and checks are actually a subclass of form documents, the
large amount of effort invested in these problems justifies listing them separately.
4. Line Drawings
Much of the activity in this area centers on entry of engineering drawings to Computer-Assisted
Design / Computer-Assisted Manufacture (CAD/CAM) systems. A project for input of integrated
circuit diagrams has reported cost-efficient conversion of drawings compared with conventional
manual input. This project offers evidence that new circuits can most efficiently be created on
paper and then encoded by recognition processes. The claim is that this is better than direct input
at a terminal, due to the small screen sizes on present-day equipment.

3.3.3 OCR: Print

Currently, there is considerable motivation to provide computerized document analysis systems.


Giant steps have been made in the last decade, both in terms of technological supports and in
software products. Character recognition (OCR) contributes to this progress by providing
techniques to convert large volumes of data automatically. So many papers and patents advertise
recognition rates as high as 99.99% that the automation problem may appear solved. However, the
failure of some real applications shows that performance problems persist on composite and
degraded documents (i.e., noisy characters, tilt, mixing of fonts, etc.) and that there is still room
for progress. Various methods have been proposed
to increase the accuracy of optical character recognizers.
There is a parallel between the stages of evolution of OCR systems and those of pattern
recognition. To overcome recognition deficiencies, the classical approach focusing on isolated
characters has been replaced with more contextual techniques. The opening of the OCR domain to
document recognition has led to the combination of many strategies, such as document layout
handling, dictionary checking, font identification, word recognition, and the integration of several
recognition approaches with consensual voting.

3.3.4 OCR: Handwriting


For more than thirty years, researchers have been working on handwriting recognition. As
in the case of speech processing, they have aimed at designing systems able to understand personal
encoding of natural language. Over the last few years, the number of academic laboratories and
companies involved in research on handwriting recognition has continually increased.
Simultaneously, commercial products have become available. This new stage in the evolution of
handwriting processing results from a combination of several elements: improvements in
recognition rates, the use of complex systems integrating several kinds of information, the choice of
relevant application domains, and new technologies such as high quality high speed scanners and
inexpensive, powerful CPUs.
Methods and recognition rates depend on the level of constraints on handwriting. The
constraints are mainly characterized by the types of handwriting, the number of writers, the size
of the vocabulary and the spatial layout. Obviously, recognition becomes more difficult when the
constraints decrease. Considering the types of Roman script (roughly classified as hand printed,
discrete script and cursive script), the difficulty is lower for handwriting produced as a sequence of
separate characters than for cursive script, which has much in common with continuous speech
recognition. For other writing systems, character recognition is hard to achieve, as in the case of
Kanji which is characterized by complex shapes and a huge number of symbols.
The characteristics which constrain handwriting may be combined in order to define
handwriting categories for which the results of automatic processing are satisfactory. The trade-off
between constraints and error rates give rise to applications in several domains. The resulting
commercial products have proved that handwriting processing can be integrated into working
environments. Most efforts have been devoted to mail sorting, bank check reading, forms
processing in administration and insurance. These applications are of great economic interest, each
of them concerning millions of documents.
Mail sorting is a good illustration of the evolution in the domain. In this case, the number of
writers is unconstrained. In the early stages, only the ZIP code was recognized. Then cities (and
states, as in the U.S.) were processed, implying the recognition of several types of handwriting: hand
printed, cursive, or a mixture of both. The use of the redundancy between the ZIP code and the city
name, as well as redundancy between numeral and literal amounts in bank checks, shows that
combining several sources of information improves the recognition rates. Today, the goal is to read
the full address, down to the level of the information used by the individual carrier. This
necessitates precisely extracting the writing lines, manipulating a very large vocabulary and using
contextual knowledge such as the syntax of addresses (as in the case of reading the literal amount
of checks, the use of syntactic rules improves the recognition). These new challenges bring the
ongoing studies closer to unconstrained handwritten language processing, the ultimate aim. The
reading of all of the handwritten and printed information present on a document is necessary to
process it automatically, to use content dependent criteria to store, access and transmit it and to
check its content. Automatic handwritten language processing will also allow one to convert and to
handle manuscripts produced over several centuries within a computer environment.

3.3.5 Handwriting as Computer Interface

Pen computers offer an interesting alternative to paper. One can write directly on a Liquid
Crystal Display (LCD) screen with a stylus or pen. The screen has an invisible sensitive matrix
which records the position of the pen on the surface. The trajectory of the pen appears almost
instantaneously on the screen giving the illusion of ink (electronic ink). Handwriting recognition
allows text and computer commands to be entered.
While nothing opposes the idea of a computer that would use multiple input modalities,
including speech, keyboard and pen, some applications call for a pen-only computer interface: in a
social environment, speech does not provide enough privacy; for small hand-held devices and for
large alphabets (e.g., Chinese), the keyboard is cumbersome. Applications are numerous: personal
organizer, personal communicator, notebook, data acquisition device for order entries, inspections,
inventories, surveys, etc.
The dream is to have a computer that looks like paper, feels like paper but is better than
paper. Currently, paper is the most popular medium for sketching, note taking and form filling,
because it offers a unique combination of features: light, cheap, reliable, available almost
everywhere any time, easy to use, flexible, foldable, pleasing to the eye and to the touch, silent. But
paper also has its drawbacks: in large quantities it is no longer light and cheap, it is hard to reuse
and recycle, difficult to edit, expensive to copy and to mail, and inefficient to transform into
computer files. With rapid technology progress, electronic ink could become cheaper and more
convenient than paper, if only handwriting recognition worked.
Human subjects make 4–8% error for isolated letters read in the absence of context, and
considerably fewer errors given the context of the neighboring letters. Therefore, the task of designing usable
handwriting recognizers for pen computing applications is tremendously hard. Human recognition
rates must be reached and even outperformed.
On-line Handwriting Recognition Techniques
Considerably more effort has been put into developing algorithms for Optical Character
Recognition (OCR) and speech recognition than for on-line handwriting recognition. Consequently,
on-line handwriting recognition, which bears similarity to both, has been borrowing a lot of
techniques from these.
There is a natural temptation to convert pen trajectory data to pixel images and process them
with an OCR recognizer. But the on-line handwriting recognition problem has a number of
distinguishing features which must be exploited to get the best results:
1. Preprocessing operations such as smoothing, deslanting, deskewing, and dehooking, as well as
feature extraction operations such as the detection of line orientations, corners, loops and cusps,
are easier and faster with the pen trajectory data than on pixel images (see the sketch after this
list).
2. Discrimination between optically ambiguous characters (for example, “j” and “;”) may be
facilitated with the pen trajectory information.
3. Segmentation operations are facilitated by using the pen-lift information, particularly for
handprinted characters.
4. Immediate feedback is given by the writer, whose corrections can be used to further train
the recognizer.
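
A small sketch of two of these trajectory-based operations, smoothing and pen-lift segmentation
(deliberately simplified; real preprocessing also handles deslanting, deskewing and dehooking):

    import numpy as np

    def smooth(points, k=3):
        # Moving-average smoothing of an (n, 2) array of pen coordinates
        kernel = np.ones(k) / k
        return np.column_stack(
            [np.convolve(points[:, d], kernel, mode="same") for d in (0, 1)])

    def segment_by_penlift(samples):
        # samples: iterable of (x, y, pen_down); split into strokes at pen lifts
        strokes, current = [], []
        for x, y, down in samples:
            if down:
                current.append((x, y))
            elif current:
                strokes.append(np.array(current))
                current = []
        if current:
            strokes.append(np.array(current))
        return strokes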
Another temptation is to use the pen trajectory as a temporal signal and process it with a speech
recognizer. Other problems arise:
1. Stroke reordering is usually necessary, to get rid of stroke order variability and of the
problem of delayed strokes.
2. Data unfolding in a purely one-dimensional representation may result in losing direct
reference to the two-dimensional structure of the data.

3.3.6 Handwriting Analysis

As in many well-mastered tasks, human subjects generally work at the highest and most
efficient level of abstraction possible when reading a handwritten document. When difficulties are
encountered in deciphering a part of the message using one level of interpretation, they often
switch to a lower level of representation to resolve ambiguities.
In this perspective, the lower levels of knowledge, although generally used in the
background, constitute a cornerstone on which a large part of the higher and more abstract process
levels relies. For example, according to motor theories of perception, it is assumed that motor
processes enter into genesis of percepts and that handwriting generation and perception tasks
interact and share sensorimotor information. Cursive script recognition or signature verification
tasks therefore require, directly or indirectly, an understanding of the handwriting generation
processes.
3.4 Mathematical Methods

3.4.1 Overview
Processing of written and spoken human language is computation. However, language
processing is not a single type of computation. The different levels of information encoding in our
language as well as the spectrum of applications for language technology pose a range of very
distinct computational tasks. Therefore a wide variety of mathematical methods have been
employed in human language technology. Some of them have led to specialized tools for a very
restricted task, while others are part of the mathematical foundations of the technology. In this
overview a very general map is drawn that groups the different approaches. Some particularly
relevant classes of methods are highlighted in the remaining sections of the chapter.
In the early days of language processing, most if not all researchers underestimated the
complexity of the problem. Many of them tried to bypass a mathematical characterization of their
tasks and solve the problem simply by looking at the envisaged inputs and outputs of their systems.
Purely procedural early approaches to machine translation fall in this category. These attempts
failed very badly. However, there is one major difference between language processing and most
other areas with highly complex calculation tasks, e.g., computational meteorology. One system
exists that can handle human language quite decently, i.e., the human cognitive system. Moreover,
there is a scientific discipline that strives for a formal description of the human language faculty.
Very soon the great majority of researchers became convinced that one needed to utilize
insights from linguistics—including phonetics and psycholinguistics—in order to make progress in
modeling the human language user. Modern linguistics tries to characterize the mapping between a
spoken or written utterance and its meaning. Linguists do this in roughly the following way. They
break up the complex mapping into suitable levels of representation, specify representations in
dedicated formalisms, and they employ the same formalisms for specifying the implicit linguistic
knowledge of a human language user. Traditionally their main data are collected or invented
example sentences, judged and interpreted by introspection.
Almost exclusively, discrete symbolic methods have been employed for representing
different types of information in linguistics. (The only exception was phonetic signal processing,
where Fourier transformations converted the two-dimensional acoustic signal into a three-
dimensional spectrogram.)
3.4.1.1 High-level Linguistic Methods
The mathematical methods of syntax, morphology and phonology are suited for describing
sets of strings, especially hierarchically structured strings. Most of these methods came from formal
language theory. Most notable is the formal theory of languages and grammars emerging from the
Chomsky hierarchy, an inclusion hierarchy of classes of languages. Each class of languages
corresponds to a certain level of computational complexity. For these classes of languages, classes
of grammars were found that generate the languages. Each class of languages also corresponds to a
type of automaton that accepts strings of a language. Typical for this research was the close
interaction between theoretical computer science and formal syntax with strong influences in both
directions. Much investigation has gone into the question of the proper characterization
of human language with respect to the Chomsky hierarchy.
The grammars and automata of formal language theory can rarely be applied to natural
language processing without certain modifications. The grammar models developed in linguistics
do not directly correspond to the ones from formal language theory. A variety of grammar models
have been designed in linguistics and language technology.
The grammars of formal language theory are rewrite systems with atomic non-terminal
symbols that stand for lexical and syntactic categories. However, in human language such
categories have complex properties that influence their syntactic distribution. Therefore
mathematical tools were developed for expressing linguistic entities as sets of complex features. A
new class of logic, so-called feature logic, evolved. This branch of research, often subsumed under
the term unification grammar, had close links with similar developments in knowledge
representation and programming languages.
The specialized processing methods that were developed for unification grammars had
strong connections with constraint logic programming. The situation is somewhat different in
semantics. Here representation languages are needed in which we can represent the meaning—or
better informational content—of an utterance. In order to provide unambiguous representations of
meaning that can serve as the basis for inferences, logic is employed. Many varieties of higher order
predicate logic have been developed for this purpose. Special representation languages such as
frame and script languages came from artificial intelligence (AI). In the last few years, many of
them have received a logical foundation. General purpose and specialized inference techniques
have been employed for interpreting the meaning representation in connection with knowledge
about the linguistic context, situational context, and the world. Logical deduction is the inference
technique mainly used, but there are also approaches that utilize abduction methods.
The last two decades witnessed a convergence of theoretical linguistics and language
processing with respect to their mathematical methods. On the one hand, this movement proved
very fruitful in many areas of language processing. On the other hand, it also led to some
disillusionment concerning the potentials of formal linguistic tools among practitioners of language
technology. Although the specification of linguistic knowledge improved quite a bit through the use
of advanced representation techniques, the resulting systems still lacked coverage, robustness, and
efficiency, the properties required for realistic applications. It even seemed that every increase in
linguistic coverage was accompanied by a loss of efficiency since efficient processing methods for
linguistic representation formalisms are still missing.

3.4.1.2 Statistical and Low-level Processing Methods


Encouraged by a breakthrough in the recognition of spoken words, many researchers turned to
statistical data-driven methods for designing language technology applications. For most of them,
the line of reasoning went as follows. Linguistic investigation of linguistic competence and
cognitive modeling of human language processing have not yet achieved a sufficient understanding
and formalization of the mapping from the language signal to the informational contents of the
utterance or vice versa. However, only very few applications need the complete mapping anyway.
Even if we had a formal model of the complete mapping, we do not have a model of the cognitive
system that could support it, since AI has not come close to modeling human knowledge
processing. If one cannot get access to the human linguistic competence through standard methods
of linguistic research, it may be possible to induce the knowledge necessary for a specific
application indirectly by correlating linguistic data with the desired outputs of the machine. In case
of a dictation system, this output is the appropriate written words.
For a machine translation system, the output consists of the translated sentences. For an
information access system, the output will be a query to a data base. After decades of expecting
technological progress mainly from investigating the cognitive structures and processes of the
human language user, attention moved back to the linguistic data produced by humans and to be
processed by language technology.
A special type of stochastic finite-state automata, hidden Markov models (HMMs), have been
utilized for the recognition of spoken words, syllables or phonemes. Probabilistic derivatives of
many grammar models have been proposed. Statistical methods are employed today for
substituting or supporting discrete symbolic methods in almost every area of language processing.
3.4.2 Statistical Modeling and Classification
The speech communication process is a very complicated one for which we have only an
incomplete understanding. It has thus far proven to be best treated as a stochastic process which is
well characterized by its statistical properties.
There are two fundamental assumptions which underlie the statistical model of speech. The
first is that speech is literate. That is, it is well represented by a small set of abstract symbols which
correspond to basic acoustic patterns. Second, the acoustic patterns are differentiated by their
short duration amplitude spectra. Measurement of these spectra shows that they possess a high
degree of variability even for the same symbol and thus are most accurately classified by means of
statistical decision theory. There is another property of speech that makes the application of
statistical methods to it more difficult. Speech comprises several overlapping levels of linguistic
structure including phonetic, phonological, phonotactic, prosodic, syntactic, semantic and
pragmatic information. Thus, to be useful, a statistical characterization of the acoustic speech signal
must include a principled means by which the statistics can be combined to satisfy all of the
constraints imposed by the aforementioned linguistic structure. This section briefly describes the
most effective statistical methods currently in use to model the speech signal.

3.4.2.1 Primitive Acoustic Features


We call the voltage analog of the sound pressure wave the speech signal. The signal is usually
sampled at 0.1 msec. intervals to form a sequence of 16-bit integers. Because the signal is non-
stationary, we divide it into short frames of, say, 10 to 30 msec. duration at 5 to 10 msec. intervals.
Thus frames of around a hundred samples are considered to be stationary and a short duration
spectrum is computed for each frame. Although many methods of spectral estimation have been
advocated in the past, the method of linear prediction is often used. Linear prediction provides an
n-dimensional (where n is often twelve) parameterization of the spectrum, usually in the form of
cepstral coefficients. Thus the information-bearing features of the speech signal are usually taken to
be a sequence of twelve-dimensional cepstral vectors with one vector computed every 10 msec.
Due to the intrinsic variability of the speech signal and specific assumptions inherent in the linear
prediction method, the sequence of cepstral vectors is a random process whose statistical
properties can be estimated.
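
A simplified sketch of this frame-based analysis. For brevity it computes FFT-based real cepstra
rather than the LPC-derived cepstra described above; frame and hop sizes follow the figures in the
text:

    import numpy as np

    fs = 10000                    # 0.1 msec sampling interval
    frame_len = int(0.025 * fs)   # 25 msec frames
    hop = int(0.010 * fs)         # one vector every 10 msec
    n_ceps = 12

    def cepstra(signal):
        window = np.hamming(frame_len)
        vectors = []
        for start in range(0, len(signal) - frame_len + 1, hop):
            frame = signal[start:start + frame_len] * window
            spectrum = np.abs(np.fft.rfft(frame)) + 1e-10  # avoid log(0)
            # Real cepstrum: inverse transform of the log magnitude spectrum
            c = np.fft.irfft(np.log(spectrum))
            vectors.append(c[:n_ceps])
        return np.array(vectors)  # one 12-dimensional vector per 10 msec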

3.4.2.2 Quantization
Since we assume that speech is literate, the signal could also be represented as a sequence of
symbols where each symbol corresponds to a phonetic unit of varying duration. Each phonetic unit
corresponds to a region of the twelve-dimensional acoustic feature space. The regions are defined
statistically by estimating the probability of each class conditioned on the vectors belonging to that
class and then computing the pairwise decision boundaries between the classes as the locus of
points in the feature space where both classes are equally probable.
If the decision boundaries are explicitly computed as described above, then an arbitrary feature
vector can be classified as resulting from the utterance of one phonetic class simply by finding
which region of the space it lies in. As a matter of practice, however, this computation is not
performed explicitly. Rather, a statistical decision rule is used. The rule simply states that a feature
vector belongs to that class whose probability is largest conditioned on the vector. The effect of this
decision rule is to quantize the entire twelve-dimensional feature space into a small number of
regions corresponding to the phonetic classes.
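
A minimal sketch of this decision rule, assuming diagonal-Gaussian class-conditional densities
whose parameters have already been estimated from labeled data:

    import numpy as np

    def classify(x, classes):
        # classes maps label -> (prior, mean, var); pick the class whose
        # posterior probability given x is largest
        def log_posterior(prior, mean, var):
            # log P(c) + log p(x | c), dropping class-independent constants
            return np.log(prior) - 0.5 * np.sum(np.log(var) + (x - mean) ** 2 / var)
        return max(classes, key=lambda c: log_posterior(*classes[c]))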

3.4.2.3 Maximum Likelihood and Related Rules


Although the above-described rule appears very intuitive, it is not easily implemented because
there is no direct method for computing the probability of a phonetic unit given its acoustic
features. If, however, a large set of acoustic vectors which have been phonetically labeled is
available, then an indirect method, based on Bayes rule, can be devised. Bayes rule allows us to
estimate the probability of a phonetic class given its features from the likelihood of the features
given the class. This method leads to the maximum likelihood classifier which assigns an unknown
vector to that class whose probability density function conditioned on the class has the maximum
value. It is most important to understand that ALL statistical methods of speech recognition are
based on this and related rules. The philosophy of maximum likelihood is now so much a part of
speech recognition that citations of it in the literature are no longer given.
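
In symbols, Bayes rule gives

    P(c | x) = p(x | c) P(c) / p(x),

so the decision rule becomes

    c* = argmax_c p(x | c) P(c)

since p(x) does not depend on the class; with equal priors P(c) this reduces to the maximum
likelihood rule c* = argmax_c p(x | c).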
Statistical methods in speech recognition are often called by different names such as
minimum prediction residual method, minimum risk, minimum probability of error or nearest
neighbor. These rules are all closely related as they derive from the same Bayesian argument.

3.4.2.4 Class Conditional Density Functions


From the previous section, it is clear that ALL statistical methods in speech recognition rest
on the estimation of class conditional density functions for phonetic units or, as we shall see later,
other linguistic constituents. Thus, the performance of a speech processing algorithm depends
critically on the accuracy of the estimates of the class conditional density functions. These, in turn,
depend on the existence of a sufficiently large, correctly labeled training set and well understood
statistical estimation techniques. Regarding the former, there is little to be said of a practical nature
except that the more data available, the better. There are some theoretical results, such as the
Cramer-Rao bound relating variance of estimators to sample size. Obviously the larger the size, the
lower the variance and hence the lower the error rate. However, it is quite difficult to relate
estimator variance to error rate precisely, so the various rules of thumb which are often invoked to
determine sample size needed for a specified performance level are unreliable.
There is one serious flaw to the above-described decision theory. It is predicated on the
principle that if the class conditional density functions are known exactly, then no other decision
rule based on the same training data can yield asymptotically better performance. Unfortunately,
the assumption of exact knowledge of the class conditional density functions is never met in reality.
The error may simply be in the parameter values of the densities or, worse, their form may be
incorrect.
An elegant solution to this problem is to directly minimize the classification error. This may
be done by Juang’s method of Generalized Probabilistic Descent (GPD) which has proven to be
effective in very difficult speech recognition problems.
Another variant of the maximum likelihood methodology is clustering. In many classification
problems, the items in a class differ amongst themselves in a systematic way. A simple example is
the pronunciation of a word by speakers of different national or regional accents or dialects. In
such cases the class conditional densities will be multi-modal with the modes and their shapes
unknown. Such densities can be estimated by clustering techniques. The most effective such
techniques are based on the Lloyd-Max optimum quantizer which has come to be known as the k-
means algorithm. These techniques have been applied to speech recognition.
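
A compact sketch of the k-means algorithm; initialization and convergence checking are
deliberately simplified:

    import numpy as np

    def kmeans(data, k, iters=20, seed=0):
        rng = np.random.default_rng(seed)
        centers = data[rng.choice(len(data), k, replace=False)]
        for _ in range(iters):
            # Assign each vector to its nearest center
            dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
            labels = dist.argmin(axis=1)
            # Re-estimate each center as the mean of its cluster
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = data[labels == j].mean(axis=0)
        return centers, labels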

3.4.2.5 Hidden Markov Model Methodology


If speech were composed solely of isolated acoustic patterns then the classical statistical decision
theory outlined above would be sufficient to perform speech recognition. Unfortunately, that is not
the case. That is, the putative fundamental units of speech are combined according to a rich set of
linguistic rules which are, themselves, so complicated that they are best captured statistically. The
problem is that in order to accomplish speech recognition, one must capture all the subtlety of
linguistic structure with a computationally tractable model. The lack of such a representation held
back progress in speech recognition for many years.
In the early 1970s both Baker (1975) and Jelinek (1976) independently applied an existing
mathematical technique based on the hidden Markov model (HMM) to speech recognition. As the
name of the method implies, the original concepts were proposed by A. A. Markov himself (Markov,
1913).
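
The core HMM computation used in recognition is the forward algorithm, which sums the
probability of the observations over all hidden state paths; a minimal sketch with discrete
emission symbols:

    import numpy as np

    def forward(pi, A, B, obs):
        # pi: (n,) initial state probabilities
        # A:  (n, n) transitions, A[i, j] = P(state j | state i)
        # B:  (n, m) emissions,   B[i, o] = P(symbol o | state i)
        # obs: sequence of observed symbol indices
        alpha = pi * B[:, obs[0]]
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
        return alpha.sum()  # P(obs | model)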
While the HMM has proven itself to be highly effective in representing several aspects of
linguistic structure, other techniques are presently preferred for dealing with the cognitive aspects
of language, syntax and semantics. Let us first consider syntax which refers to the relationship that
words bear to each other in a sentence. Several aspects of this grammatical structure are well-
captured using statistical methods. The simplest useful way of thinking about syntax is to define it
as word order constraint. That is, only certain words can follow certain other words. One way to
quantify this is to make a list of allowable n-grams, sequences of n words, in a language. We can
augment this list with n-gram probabilities, the probability of a word given its predecessors. This
reduces syntax to a Markov n-chain, not to be confused with an HMM, and the desired
probabilities can be estimated by counting relative frequencies in large text corpora. Once these
numbers are available, an error-ridden lexical transcription of a spoken sentence can be corrected
by finding the sequence of n-grams of maximum probability conditioned on the lexical
transcription. A number of optimal search strategies are available to find the best sequence. For
such practical cases, trigrams are typically used. While the n-gram methods are useful, they do not
constitute full syntactic analysis since syntax is more than word order constraint. In fact, the reason
why syntax is considered to be part of cognition is that grammatical structure is a prerequisite to
meaning. This aspect of syntax can also be exploited by statistical methods.
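As a sketch of how such probabilities are estimated by counting relative frequencies, the following Python fragment builds a bigram model. The two-sentence corpus is purely illustrative; a real system would use a large text corpus and, as noted above, trigrams.

    from collections import Counter

    def train_bigrams(sentences):
        """Estimate P(word | previous word) by relative frequency."""
        context, pairs = Counter(), Counter()
        for sentence in sentences:
            words = ["<s>"] + sentence.split() + ["</s>"]
            context.update(words[:-1])                # count each word as a context
            pairs.update(zip(words[:-1], words[1:]))  # count adjacent word pairs
        return {pair: n / context[pair[0]] for pair, n in pairs.items()}

    model = train_bigrams(["the dog barks", "the cat sleeps"])
    print(model[("the", "dog")])  # 0.5: "dog" follows "the" in half the contexts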
There are many ways to use full syntactic analysis, but the most intuitively appealing
method is to use the linguist’s notion of parts of speech. These syntactic constituents are categories
which define the function of a word in a sentence. Associated with the parts of speech are rules of
grammar which specify how parts of speech can be combined to form phrases which, in turn, can
be combined to form sentences. Finally, using relative frequencies derived from text corpora,
probabilities can be assigned to the grammatical rules. Using techniques from the theory of
stochastic grammars, it is possible to find a sentence the joint probability of whose lexical
transcription and syntactic structure, or parse, is maximized for a given corrupted transcription
from a speech recognizer. In addition to these statistical parsing techniques, methods similar in
spirit to HMM techniques have been studied by Baker (1979) and Jelinek (1990). In either case,
syntax analysis both increases the accuracy of speech recognition and provides information
necessary for the extraction of meaning from a spoken utterance.
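The essential computation behind a stochastic grammar can be sketched in a few lines: every rule carries a probability estimated from relative frequencies in a corpus, and a candidate parse is scored by the product of the probabilities of the rules it uses. The toy grammar and its numbers below are invented for illustration only.

    # Hypothetical rule probabilities; in practice these come from counting
    # rule applications in a parsed corpus.
    RULE_PROB = {
        ("S", ("NP", "VP")): 1.0,
        ("NP", ("Det", "N")): 0.6,
        ("NP", ("N",)): 0.4,
        ("VP", ("V", "NP")): 0.7,
        ("VP", ("V",)): 0.3,
    }

    def parse_probability(rules_used):
        """Probability of a parse = product of its rule probabilities."""
        p = 1.0
        for rule in rules_used:
            p *= RULE_PROB[rule]
        return p

    # Score the parse of "the dog barks": S -> NP VP, NP -> Det N, VP -> V.
    print(parse_probability([("S", ("NP", "VP")),
                             ("NP", ("Det", "N")),
                             ("VP", ("V",))]))  # 1.0 * 0.6 * 0.3 = 0.18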

3.4.2.7 Semantics
The ultimate goal of speech recognition is to enable computers to understand the meaning of
ordinary spoken discourse. Semantics is that aspect of linguistic structure relating words to
meaning. Thus, the ultimate speech recognition machine will necessarily include a semantic
analyzer. At present, there exists no general theory of the semantics of natural language. There are
many proposed theories some of which are abstract and others of which are worked out for specific
limited domains of discourse. All such theories rest on the idea that formal logical operations acting
on lexical tokens and syntactic structures yield a formal symbolic representation of meaning. These
theories have not yet been made statistical in any coherent way, although a new approach based
on the HMM seems promising. There is, however, a statistical methodology which captures useful
semantic information. It is called word sense disambiguation. A simple example is found in the
word bank which has two meanings or senses. One sense is that of a financial institution and
another refers to the shores of a river. Clearly, the words commonly associated with the two senses
are quite different. If we know the words that are appropriate to each word sense, then we can use
search techniques to maximize the joint probability of word sequences and word sense. This will
result in higher lexical transcription accuracy. The key to this line of reasoning is the precise
measurement of the closeness of word associations. Church and Hanks (1990) proposed using the
information-theoretic measure of mutual information, and analyzed large text corpora to show
that words clustered by large mutual information content are indicative of a single word sense. It
is thus possible to compile word sense statistics for large lexicons and apply them in statistical
parsing techniques as described earlier.
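A minimal sketch of the mutual information computation follows. Windowed co-occurrence counting is one common way to estimate the joint probability; it is not necessarily the exact procedure used by Church and Hanks.

    import math
    from collections import Counter

    def pmi(words, x, y, window=5):
        """Pointwise mutual information log2( P(x,y) / (P(x) P(y)) ), with all
        probabilities estimated by relative frequency in a list of words."""
        n = len(words)
        counts = Counter(words)
        pair = sum(1 for i, w in enumerate(words)
                   if w == x and y in words[i + 1 : i + 1 + window])
        if pair == 0:
            return float("-inf")   # the words never co-occur in the window
        return math.log2((pair / n) / ((counts[x] / n) * (counts[y] / n)))

    text = "the bank raised the interest rate as the river bank flooded".split()
    print(pmi(text, "interest", "rate", window=2))  # a strongly associated pair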

3.4.2.8 Performance
It would be impossible in a short paper such as this to completely and quantitatively
characterize the performance of statistical speech recognizers. Instead, I will briefly mention three
benchmarks established by systems based on the methodologies described above. They are
moderate vocabulary speech understanding, large vocabulary speech recognition and small
vocabulary recognition of telephony. Detailed summaries may be found in ARPA (1994); Wilpon
(1994). The most ambitious speech understanding experiment is presently ongoing under the
sponsorship of ARPA. Several laboratories have built ATIS (Air Travel Information System) systems that provide airline travel
information from spoken input. With a nominal vocabulary of 2000 words and spontaneous
discourse from undesignated but cooperative speakers, approximately 95% of all queries are
correctly answered.
Another highly ambitious project in speech recognition is also sponsored by ARPA. In this
large vocabulary recognition task, the goal is lexical transcription only; so unlike the ATIS task, no
semantic processing is used. The material is text read from North American business publications
by undesignated speakers. The nominal vocabulary is 60,000 words. For this task, several
laboratories have achieved word error rates of 11% or less. Unfortunately, such results are
obtained by computer programs requiring hundreds of times real time.
Finally, the largest commercial use of speech recognition is in the AT&T telephone network
for the placement of calls. In this case, customers are allowed to ask for one of five categories of
service using any words they like so long as their utterance contains one of five key words. This
system is currently processing about 1 billion calls per year. Calls are correctly processed more
than 95% of the time without operator intervention.

3.4.3 Finite State Technology

A formal language is a set of strings (sometimes called sentences) made up by concatenating
together symbols (characters or words) drawn from a finite alphabet or vocabulary. If a language
has only a finite number of sentences, then a complete characterization of the set can be given
simply by presenting a finite list of all the sentences. But if the language contains an infinite number
of sentences (as all interesting languages do), then some sort of recursive or iterative description
must be provided to characterize the sentences. This description is sometimes given in the form of
a grammar, a set of pattern-matching rules that can be applied either to produce the sentences in
the language one after another or else to recognize whether a given string belongs to the set. The
description may also be provided by specifying an automaton, a mechanistic device that also
operates either to produce or recognize the sentences of the language.
Languages have been categorized according to the complexity of the patterns that their
sentences must satisfy, and the basic classifications are presented in all the standard textbooks on
formal language theory, e.g., Hopcroft and Ullman (1979). The sentences of a regular language, for
example, have the property that what appears at one position in a string can depend only on a
bounded amount of information about the symbols at earlier positions. Consider the language over
the alphabet a, b, c whose sentences end with a c and contain an a at a given position only if there is
an earlier b. The strings cccc and bac belong to this language but abc does not. This is a regular
language since it only requires a single bit to record whether or not a b has previously appeared. On
the other hand, the language whose sentences consist of some number of a’s followed by exactly
the same number of b’s is not a regular language since there is no upper bound on the number of
a’s that the allowable number of b’s depends on. This set of strings belongs instead to the
mathematically and computationally more complex class of context-free languages.
A regular language can be described by grammars in various notations that are known to be
equivalent in their expressive power—they each can be used to describe all and only the regular
languages. The most common way of specifying a regular language is by means of a regular
expression, a formula that indicates the order in which symbols can be concatenated, whether there
are alternative possibilities at each position, and whether substrings can be arbitrarily repeated.
The regular expression {c*b{a|b|c}*|c*}c denotes the regular language described above; the
second alternative, c*, covers strings such as cccc that contain no b at all. In the
notation used here, concatenation is represented by sequence in the formula, alternatives are
enclosed in braces and separated by vertical bars, and asterisks (often called the Kleene closure
operator) indicate that strings satisfying the previous subexpression can be freely repeated, zero
or more times; the empty repetition yields ε, the empty string, the string containing no elements.
exactly those languages that can be accepted by a particular kind of automaton, a finite-state
machine. A finite-state machine (fsm) consists of a finite number of states and a function that
determines transitions from one state to another as symbols are read from an input tape. The
machine starts at a distinguished initial state with the tape positioned at the first symbol of a
particular string. The machine transitions from state to state as it reads the tape, eventually coming
to the end of the string. At that point, if the machine is in one of a designated set of final states, we
say that the machine has accepted the string or that the string belongs to the language that the
machine characterizes. An fsm is often depicted in a state-transition diagram where circles
representing the states are connected by arcs that denote the transitions. An arrow points to the
initial state and final states are marked with a double circle. The fsm, shown in Figure 11.2, accepts
the language {c*b{a|b|c}*|c*}c.

Figure 11.2: Finite-state machine diagram.
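The machine of Figure 11.2 can be written down directly as a transition table. The sketch below encodes the language {c*b{a|b|c}*|c*}c; the state names are arbitrary, and a missing table entry plays the role of a rejecting dead state.

    # q0: start; q1: only c's seen so far; q2: a b has been seen and the
    # string does not currently end in c; q3: a b seen, string ends in c.
    TRANSITIONS = {
        ("q0", "c"): "q1", ("q0", "b"): "q2",
        ("q1", "c"): "q1", ("q1", "b"): "q2",
        ("q2", "a"): "q2", ("q2", "b"): "q2", ("q2", "c"): "q3",
        ("q3", "a"): "q2", ("q3", "b"): "q2", ("q3", "c"): "q3",
    }
    FINALS = {"q1", "q3"}  # the double-circled states of the diagram

    def accepts(string):
        state = "q0"
        for symbol in string:
            state = TRANSITIONS.get((state, symbol))
            if state is None:      # e.g. an 'a' before any 'b' has appeared
                return False
        return state in FINALS

    for s in ["cccc", "bac", "abc"]:
        print(s, accepts(s))  # cccc True, bac True, abc False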

Because of their mathematical and computational simplicity, regular languages and finite-
state machines have been applied in many information processing tasks. Regular expressions are
often used to specify global search patterns in word-processors and in operating-system utilities
such as the Unix grep utility. The lexical analysis component of most modern programming language
compilers is defined as a finite-state machine that recognizes identifier classes, punctuation,
numbers, etc. (Aho & Ullman, 1978). But until relatively recently, finite-state techniques have not
been widely used in natural language processing. This is in large measure the result of Chomsky’s
argument (Chomsky, 1957) that natural language sentences are so rich and complicated that they
cannot be accurately characterized by means of regular or finite-state descriptions. One
consequence of this was that a generation of linguists and computational linguists came to believe
that finite-state techniques were of little or no interest.
It may well be true that complete and accurate natural language syntactic and semantic
dependencies lie beyond the power of finite-state description, but work in the last twenty years
(and particularly in the last five years) has identified a number of important problems for which
very efficient and effective finite-state solutions can be found.
One set of solutions relies on the observation that finite-state descriptions can provide an
approximation to the proper grammatical description of a language, an approximation that is often
good enough for practical purposes. The information extraction problem, for example, requires that
documents and passages be identified that are likely to contain information relevant to a given
user’s needs. Full-blown syntactic and semantic analyses of documents would certainly help to
solve this problem. But such analyses may provide much more information than this task actually
requires. Indeed, Appelt, Hobbs, et al. (1993) have constructed a finite-state solution that is
extremely efficient compared to more powerful but more complex extraction systems and also has
very favorable recall and precision scores. Their finite-state pattern specifications can only
approximate a complete analysis of a text, but the approximation seems close enough for this
particular purpose.
There has also been growing interest in using finite-state machines for storing and accessing
natural language dictionaries. Appel and Jacobson (1988) observed that the words in a large
lexicon can be very compactly represented in a state-transition graph. This is because the graph can
be transformed using determinization and minimization techniques that are well-known from
finite-state theory, with the result that prefixes and suffixes common to many words are collapsed
into a single set of transitions. Lucchesi and Kowaltowski (1993) discuss access methods for finite-
state dictionary representations that permit efficient retrieval of translation and synonymy
information associated with particular words.
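The prefix-sharing half of this idea is easy to sketch with a trie, a deterministic transition graph built by walking each word symbol by symbol. Full minimization, which would also collapse the common suffixes of words like works, worked and working into shared transitions, is omitted here.

    def build_trie(words):
        """Store words in a transition graph so common prefixes share states."""
        root = {}
        for word in words:
            node = root
            for ch in word:
                node = node.setdefault(ch, {})  # reuse an existing transition
            node["#"] = True                    # end-of-word marker (final state)
        return root

    def lookup(trie, word):
        node = trie
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return "#" in node

    lexicon = build_trie(["work", "works", "worked", "working"])
    print(lookup(lexicon, "worked"), lookup(lexicon, "worke"))  # True False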
A third set of problems requires that strings be systematically transformed into other strings.
For example, the negative prefix in in the abstract phonological representation such as in+practical
must be realized with an assimilated nasal as in impractical, or the inflected form stopping must be
mapped either to its stem stop (for use in information retrieval indexing) or to its morphological
analysis stop+PresentParticiple as an initial step in further processing. One formalism for
describing these kinds of string transformations is the context-sensitive rewriting rule, as discussed
for example by Chomsky and Halle (1968). Johnson (1972), and later Kaplan and Kay (1981, 1994),
showed that for each rule of this type there is a finite-state
transducer that maps input strings to exactly the same outputs that the rule prescribes.
A transducer is a simple extension of a basic finite-state machine: as it reads its input tape and
makes a transition from one state to another, it can also write a corresponding symbol on a second
tape. When it reaches a final state and the end of the input, the string produced on the second tape
is taken to be the output that the input string is mapped to.
Mathematically, the collection of input-output string-pairs for a given transducer
(corresponding perhaps to a particular rule) constitutes a regular relation. Regular relations share
many (but not all) of the formal properties of the regular languages (for example, closure under
union and concatenation) and also enjoy certain other properties. In particular, the regular
relations are closed under composition, and Kaplan and Kay use this fact to show that the effect of
an entire collection of rewriting rules making up a phonological or morphological grammar can be
implemented as a single finite-state transducer. This device is much more efficient than any scheme
using the rules directly for performing the string transformations that the grammar describes.
Koskenniemi (1983) proposed a different rule notation, called two-level rules, for characterizing
phonological and morphological variations. These rules also denote only regular relations and can
be transformed to equivalent transducers.
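As a toy illustration of such a transducer, the sketch below rewrites the abstract nasal N of a representation like iN+practical to m before p or b and to n otherwise, deleting the morpheme boundary +. A rule compiler in the style of Kaplan and Kay would produce an equivalent state-transition table rather than this hand-written loop.

    def transduce(s):
        """Two 'states': normal reading, and an N whose output is deferred
        until the next input symbol reveals the assimilation context."""
        out, pending_nasal = [], False
        for ch in s:
            if ch == "N":
                pending_nasal = True   # decision deferred (second state)
                continue
            if ch == "+":
                continue               # the boundary symbol is written as nothing
            if pending_nasal:
                out.append("m" if ch in "pb" else "n")
                pending_nasal = False
            out.append(ch)
        if pending_nasal:              # N at end of input: default to n
            out.append("n")
        return "".join(out)

    print(transduce("iN+practical"))   # impractical
    print(transduce("iN+tolerable"))   # intolerable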
UNIT – II
Introduction to Semantics and Knowledge Representation: some applications like machine
translation and database interfaces; Semantic Interpretation: word senses and ambiguity, basic
logical form language, encoding ambiguity in logical form, thematic roles, linking syntax and
semantics; recent trends in NLP.
 
Machine Translation

You feed a story written in Oriya into a computer system and out comes its translation in Hindi,
Tamil, English and other languages. It is inexpensive, immediate and simultaneous. The language
barriers melt away. The richness of other literatures opens up to everyone. The world is
intellectually and culturally united into one. This is the dream of people working in a fascinating
area of research called Machine Translation (MT).

Although the above goal is far from realization, the first signs of limited success are apparent. In
fact, the first faltering steps in MT were taken in the 1950s and 1960s. But at that time, formal
linguistics and artificial intelligence had barely been born, and computer science was in its
infancy. As a result, the efforts "failed" to achieve success.

Problems in Machine Translation

To see why the task is difficult, consider what even a single sentence demands. Suppose a Hindi
sentence is to be translated into English:

[Example: a Hindi sentence with its word-by-word English gloss and its English translation.]

Here the Hindi words are replaced by their English equivalents and reordered to follow the English
noun-verb-noun sequence instead of the Hindi noun-noun-verb sequence. Unfortunately, this naive
method fails to work in general (a code sketch at the end of this subsection makes the failure
concrete). A number of difficult problems come in the way, like word sense selection, choice of
sentence structure, pronoun reference, noun-noun modification, conjunctions like 'and' and 'or',
identification of tense and modality, etc. We will see some of these problems below.

Selection of the appropriate sense is necessary for translation. For example, the Hindi word
'jotaa' has a different sense in each of the following sentences:

[Examples: Hindi sentences in which 'jotaa' occurs in different senses.]

Thus, 'jotaa' can be replaced by 'plough', 'prepare' or 'harness' depending on the sense implied
in the sentence. The correct sense must first be identified for each of the words before the
appropriate replacement can be selected for translation.

The sentence structure must also be interpreted correctly. For example, in

[Example: a Hindi sentence of the form 'the boy saw the girl with the telescope'.]

the telescope could be the instrument of the seeing, or it could be with the girl. Hence, it would
be important to identify the relationship of the telescope correctly (called the sentence
structure). Such identification might require following a paragraph and maintaining a context. It
is similarly important to find the referent of a pronoun. Consider the following sentences for
example:

[Example: a pair of Hindi sentences in which the pronoun 'vaha' may refer to a dog, a cow or the
road.]

In Hindi, the gender of the verb depends on what the pronoun refers to. If the referent is the dog,
the verb would be masculine:

[Example: the sentence with the masculine verb form.]

If 'vaha' was referring to the cow or the road, the feminine would be used.

It may sound ridiculous to even consider that the road can bark, or it may seem obvious that it is
the dog which barks. However, reaching such conclusions requires more than just word replacement
ability.

The problems described above point to a common theme: a sentence must be "understood" (at least
partially) before it can be translated. Natural language understanding is a hard task because it
requires formulating not only a grammar for the language but also using background knowledge,
including common-sense knowledge. All this must be used in complex, and as yet unknown, ways to
process a given text.
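The naive method is easy to state as code, and the code makes the failure mode visible: nothing in the local context tells the program which sense of a word to choose. The miniature dictionary and the transliterated Hindi words below are illustrative only.

    # Hypothetical bilingual dictionary; 'jotaa' deliberately carries the
    # three senses discussed above.
    DICTIONARY = {
        "kisaana": "farmer",
        "khet": "field",
        "jotaa": ["ploughs", "prepares", "harnesses"],
    }

    def naive_translate(noun1, noun2, verb):
        """Word replacement plus reordering from noun-noun-verb to noun-verb-noun."""
        pick = lambda e: e[0] if isinstance(e, list) else e  # blind sense choice
        return " ".join(pick(DICTIONARY[w]) for w in (noun1, verb, noun2))

    print(naive_translate("kisaana", "khet", "jotaa"))
    # 'farmer ploughs field': correct here, but only by luck; the same blind
    # choice mistranslates sentences where 'jotaa' means 'harnesses' or 'prepares'.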
Where is MT Applicable?

The difficulty of the task makes it clear that literature and poetry are beyond MT in the
foreseeable future. Legal texts, which are carefully drafted to avoid ambiguity and unwanted
implications, might also be beyond the ambit of MT. Jokes and other material which rely on
conveying double meaning are also not good candidates for MT at present.

Are there any task domains where MT is applicable? While a definitive statement cannot be made, it
seems that the task domains are rather narrow. At first sight it appears that circulars, official
communication, minutes of meetings, and technical literature including scientific papers, manuals,
etc. are applications that can be handled by MT. These texts have limited vocabulary and fixed
styles. Their audience is not very large. Many of such texts have a short life: circulars and
minutes of meetings are used intensively for a short period of time, and then filed and forgotten.
They are also considered uninteresting by human translators. Because of the above reasons, these
texts are ideal candidates for MT. However, there are no practical automatic systems for
translation other than one system in the domain of meteorological forecasts.

So far we have discussed MT as a fully automatic system requiring no human intervention. There are
other possibilities as well. One can have human-aided machine translation: the human (translator)
could pre-edit the text in the source language, paraphrasing the difficult sentence constructions
so that the computer can understand it, or aid the computer during the translation process when it
has a difficulty it cannot handle, or post-edit the generated text in the target language. The MT
system should be viewed as a tool provided to the human translator.

Most MT systems present today require post-editing. Note that if MT systems become reasonably
good, the post-editor needs to know only one language.
A Brief History of MT

The history of MT can be traced starting from the early 1950s, when it was realized that computers
could be used for translation. In the US, a large number of research groups sprang up to work on
the task (usually Russian to English), with funding from defence and intelligence establishments.
In the USSR, there was a similar effort to translate from English and French to Russian.

As mentioned earlier, most of this work based itself on bilingual dictionary lookup. The developers
quickly started realizing that far more was needed. But unfortunately, in their enthusiasm and
optimism during the early days, they had proclaimed that MT systems were around the corner, and
that these systems would be capable of producing high-quality translations for general texts
without any human intervention. Thus in the US, when a committee called ALPAC was set up to
evaluate the MT research, it easily came to the conclusion that the research had failed to live up
to its promises. It said in its report in 1966 that basic research was needed and that MT was not
feasible in the foreseeable future.

The ALPAC report rang the death knell of MT efforts in the US at that time. All funding ceased,
the research groups disintegrated, and the field fell into disrepute. The fate of MT in Europe and
the USSR did not change so dramatically. It was generally recognized, however, that it was a field
whose time had not yet come. Only a few research groups continued to remain active.

The field revived in the late 1970s after the successful completion of the TAUM-METEO system in
Canada in 1976. It translates the Canadian weather forecasts from English to French. Around the
same time other systems like Titus (English to French, for textile technology) and CULT (Chinese
to English, for mathematics and physics journals) were also developed. In the 1980s, the Japanese
successfully completed a national project (Mu) on MT between English and Japanese. The European
Community has also undertaken an ambitious project called Eurotra, covering all the languages of
the Community. Work has also been undertaken by groups in France, Germany, Switzerland, the US
and India.
Possible Approaches

There are three major types of MT systems that one can imagine, based on their dependence on
languages. The first type of MT system is designed for a particular pair of languages. If
translation capability is needed between another pair of languages, a new system must be
constructed. Because of its dependence on the languages, the system has the advantage that
special features of, and similarities between, the concerned languages can be made use of. This
is called the direct approach.

The second type of system is based on the concept of an interlingua. A sentence in a source
language is first analyzed and represented in an intermediate language. The intermediate language
need not be a human "natural" language and is typically a formal "mathematical" language. Next, a
generator takes the intermediate representation and generates a sentence in the target language.
The major advantage of this approach is that the analyzer or parser for the source language is
independent of the generator for the target language. As a result, if one wants to build a system
with translation capability among, say, 15 languages, only 15 parsers and 15 generators need to be
constructed. Contrast this with the 210 (i.e., 15 x 14) systems that need to be constructed under
the first approach.

The difficulty with the second approach, however, is that it is difficult to define the
interlingua. Also, it is not possible to take advantage of similarities between languages. As a
result, many research groups use a third approach called the transfer approach. In this approach,
the parser produces the representation for the source language, which is then transferred to the
representation for the target language; the generator takes over from there. This approach is
intermediate between the first two: for 15 languages, only 15 parsers and generators are needed,
but the number of transfer components needed is 210.

In the Indian context, where there are a large number of languages (officially 15 major ones)
which are also very close, the interlingua approach would seem preferable, but the nature of the
interlingua is open.
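The component counts quoted above follow from simple arithmetic, sketched here for n = 15 languages:

    def components(n):
        """Number of modules to build under each of the three approaches."""
        direct = n * (n - 1)            # one full system per ordered pair
        interlingua = n + n             # n parsers + n generators
        transfer = n + n + n * (n - 1)  # parsers + generators + transfer modules
        return direct, interlingua, transfer

    print(components(15))  # (210, 30, 240): 210 pairwise systems versus 30 modules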
Current Status

Currently the field is going through a phase of vigorous activity. The Japanese national project
(Mu) was successfully completed, resulting in an industrial prototype, and work has already started
on a major development effort. The Eurotra project in Europe has hundreds of researchers working on
it. It uses the transfer approach, in which a tree representation of a source language sentence is
obtained by a series of steps involving formal grammars and well-formedness filters. This
representation is transformed to another tree structure using the transfer component, and forms the
basis for generation. A number of other research groups are also active in Europe.

India too has active groups in MT. The earliest published work was undertaken by Chakraborty in
the 1960s. More recently, work has been undertaken at Tamil University Thanjavur, the National
Centre for Software Technology (NCST) Bombay, the Centre for Development of Advanced Computing
(C-DAC) Pune, and IIT Kanpur. The research group at Thanjavur attempted Russian to Tamil
translation based on the direct approach, and the first system translated simple sentences. A
group at NCST did some work on English to Hindi translation but did not develop a working system.
The group at C-DAC did some preliminary work on MT, but has lately been concentrating on the
processing of Sanskrit. The IIT Kanpur effort (by the Akshar Bharati group) is the most
comprehensive and is focusing on Indian languages. It has applied the principles of Indian
traditional grammar (Sanskrit vyakarana) to modern Indian languages. The University of Hyderabad
is an important centre starting work on computational linguistics and MT.

The MT systems will have to fit into an organization. Instead of replacing people, they will
complement them. There can be other fallouts from the MT effort. At an intellectual level, it can
give a big boost to traditional Sanskrit vyakarana studies, linguistics and the artificial
intelligence disciplines; these will grow to new heights borrowing from one another. At the
practical level, one can imagine a number of computer-based products: for example, a language
assistant (a system that corrects grammatical mistakes), computer-assisted language instruction
(which includes various lessons, including comprehension tests, for language teaching), and
natural language interfaces to computers (so that one can communicate with computers using
language). The marriage of speech processing work with natural language processing has the
potential of producing computer systems that can listen and talk. If optical character recognition
technology is also combined, it can lead to systems that can read out books (to the blind, for
example).

Work for the Indian languages will have to be done mainly by Indians. We are also the best
equipped to do it. The problems are challenging and the potential immense. What is needed is a
national will and a national effort.

Anusaraka or Language Accessor

It is possible to overcome the language barrier in India today using anusaraka. Anusaraka tries to
take advantage of the relative strengths of the computer and of the human reader: the computer
takes the language load and leaves the world-knowledge load on the reader. It is particularly
effective when the languages are close, as is the case with the Indian languages. Keeping in line
with the anusaraka philosophy, it bridges the gap between languages by choosing the most
appropriate or nearest construction available in the target language, together with suitable
additional notation.

It is argued in this section that there are only three major differences between the south Indian
languages and Hindi. All of these can be bridged by simple additional notation in Hindi. The
resulting language can be viewed as a southern dialect of Hindi. Anusaraka uses this dialect to
make source text in the southern languages accessible to Hindi readers.

The Problem

The problem being addressed here is how to overcome the language barrier in India. Fully-automatic
general-purpose high-quality machine translation systems (FAHQ-MT for short) are extremely
difficult to build. There are no existing translation systems for any pair of languages in the
world that qualify to be called FAHQ-MT. The difficulty arises from the following reasons:

1. In any natural language text, only part of the information to be conveyed is explicitly
expressed; it is the human mind which fills up the details by using world knowledge. The basic
reason for this state of affairs is that the concepts and shades which a natural language purports
to describe form a continuum, while the number of lexical items available is finite. Necessarily,
the lexical items have to be overloaded, and it is only because of the total context and shared
background understanding that we are able to disambiguate them and are able to communicate.

2. Different natural languages adopt different conventions about the type and amount of
information to be explicitly expressed. The reasons for this may be the history of language
development (differing tastes, arbitrary choices made in the long history of the language, the
presence of other languages and mixing with them, etc.) and the envisaged primary function of the
language.

The net result of this is that unless we provide machines with knowledge and inferring capability
comparable to that of human beings, FAHQ-MT will not be feasible. It will not be an exaggeration
to say that in spite of tremendous progress in computer technology (mainly in terms of speed,
memory and programming environments), FAHQ-MT remains a distant dream.
Machine Aids: An Analogy

In spite of the difficulty of FAHQ-MT, it can be claimed that with the help of machines the
language barrier can be overcome today. If this sounds paradoxical, let us consider an analogy.
Scientists even today are struggling to build a machine that can walk like a human, avoiding
obstacles to reach a destination. At the same time, we have been successfully using rail transport
for more than a century. The distance barrier has been overcome even without the machines learning
to walk.

True, for this we have had to lay railway tracks all over the country, build bridges across
rivers, dig tunnels through mountains, and build a huge infrastructure to make the whole thing
feasible. Even then, it delivers the goods and passengers only at the railway stations; we need
separate arrangements to get the things and persons home. But all said and done, it does enable us
to overcome the distance barrier.

If FAHQ-MT is like the walking machine, what is the counterpart of the railway locomotive? The
answer lies in separating the language-based analysis of a text from its knowledge- and
inference-based analysis. The former task is left to the machine, and the latter to the human
reader, because of their proficiency at the respective tasks. We also relax the requirement that
the output be grammatical; we do require, however, that the output be comprehensible. Apart from
the effort to build appropriate multilingual databases from a computational viewpoint, creative
ideas are needed to establish language bridges. Though this is challenging, it is definitely
feasible. With the appropriate infrastructure, the language barrier can be overcome with today's
technology, at least among the Indian languages.
Division of Load between Man and Machine

The practical aspect of this problem is to divide the load between man and machine in such a way
that the aspects which are difficult for the human being are handled by the machine, and the
aspects which are easy for the human being are left to him. The aim is to minimize the effort of
the human being.

The approach, however, cannot be a heavy-handed method, because the problem is very complex. A
brute-force solution will either blow up as a result of the large amount of resources it will
need, or because of the large amount of time it will take. Another desirable feature of the
approach would be the ease with which the system can be extended to other related languages. The
system is likely to possess efficiency, extensibility, etc. only if it is based on sound
principles and theory.

Theoretical issues of interest relate to information and how it is coded in language. How is the
information extracted? A related question is why the same amount of surface information in one
language is clearly understood, while in another language it is not clear to the reader. What are
the sources of knowledge that are used in information extraction? How should the knowledge be
organized?

Although the concept of anusaraka is general, we will discuss it in the context of the Indian
languages, with respect to the Kannada-Hindi anusaraka in particular.

Structure of Anusaraka

The structure of the anusaraka is shown in the block schematic below.

[Figure: Block schematic of anusaraka. Source-language text passes through the morphological
analyzer MORPH (supported by paradigms and default choices), the local word grouper LWG (grouping
rules, TAM data), the mapping block (bilingual dictionary, TAM dictionary, vibhakti dictionary),
the local word splitter LWS, and the morph synthesizer MG (target-language paradigms, default
choices) to produce the anusaraka output, with on-line help available to the reader.]

A source language sentence is first processed by the morphological analyzer (morph). The morph
considers a word at a time, and for each word it checks whether the word is in the dictionary of
indeclinable words. If found, it returns the word's grammatical features. It also uses the word
paradigms to see whether the input word can be derived from a root and its paradigm. If the
derivation is possible, it returns the grammatical features associated with the word form
(obtained from the root and the paradigm). In case the input word cannot be derived, it is
possibly a compound word, and is given to the sandhi package to split it into two or more words,
which are then again analyzed by morph.

The output of morph is given as input to the local word grouper (LWG). Its main task is to group
function words with content words based on local information, such as postposition markers that
follow a noun, or auxiliary verbs following a main verb. This grouping (or the case endings, in
the case of inflectional languages) identifies the vibhakti of nouns and verbs. The vibhakti of a
verb is also called its TAM (tense-aspect-modality) label.

After the above stage, sentential analysis can be done. The current anusaraka does not do this
analysis, because it requires a large amount of linguistic data to be prepared. Also, since the
Indian languages are close, the 80-20 rule applies to vibhakti: use of vibhakti alone produces 80%
"correct" output with only 20% of the effort. A sentential parser can be incorporated when large
lexical databases are ready.

The next stage of processing is that of the mapping block. This stage uses a noun vibhakti
dictionary, a TAM dictionary, and a bilingual dictionary. For each word group, the system finds a
suitable root and vibhakti in the target language. Thus, it generates a local word group in the
target language.

The local word groups in the target language are passed on to a local word splitter (LWS),
followed by a morphological synthesizer (MG). LWS splits the local word groups into elements
consisting of roots and features. Finally, MG generates the words from each root and the
associated grammatical features.
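A drastically simplified sketch of the anusaraka pipeline described above follows. The suffix rule, the dictionary entries and the transliterations are invented for illustration; the real system uses paradigm tables, grouping rules, and large bilingual, TAM and vibhakti dictionaries.

    BILINGUAL = {"mane": "ghar"}   # illustrative Kannada root -> Hindi root
    VIBHAKTI = {"alli": "meM"}     # illustrative case suffix -> postposition

    def morph_analyze(word):
        """Morph (simplified): split a word form into (root, vibhakti)."""
        for suffix in VIBHAKTI:
            if word.endswith(suffix):
                return word[: -len(suffix)], suffix
        return word, ""

    def map_group(root, vibhakti):
        """Mapping block: find the target root and target vibhakti by lookup."""
        return BILINGUAL.get(root, root), VIBHAKTI.get(vibhakti, vibhakti)

    def synthesize(root, vibhakti):
        """MG (simplified): in Hindi the postposition is a separate word."""
        return f"{root} {vibhakti}".strip()

    def anusaraka(sentence):
        return " ".join(synthesize(*map_group(*morph_analyze(w)))
                        for w in sentence.split())

    print(anusaraka("manealli"))  # 'ghar meM' ("in the house")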
Anusaraka Output and User Interface

The anusaraka output is usually not the target language, but close to it. Thus, the Kannada-Hindi
anusaraka produces a dialect of Hindi that does not have agreement, etc. It can be called a sort
of Dakshini (southern) Hindi. Some additional notation may also be used in the output. A certain
amount of training is needed for a user to get used to the anusaraka output language.

The role of the anusaraka interface (or on-line help) is to facilitate the reading of the output
by the reader. It should keep track of what concepts have been introduced to the user, and also
provide on-line help when the user faces a problem.
[Figure: Different interfaces for anusaraka. The core anusaraka over the source language can be
coupled with an ordinary user interface, an intelligent user interface, a human aid (HAMT), or an
automatic corrector/post-editor, yielding various levels of anusaraka output, up to MT in the
target language.]

Depending on the nature of the interface, one can have an ordinary user interface, an intelligent
user interface, HAMT (human-aided machine translation), or machine translation (see the figure
above).


Why the Approach Works

The reason the above approach works for the Indian languages, even without a full-fledged parser,
is that the source and target languages are similar. At the mapping level, it is possible to find
roots, TAMs, and noun vibhakti in Hindi from an equivalent Kannada text without too many clashes.

Based on our experience, we can say that apart from agreement, there are only three major
differences between the south Indian languages and Hindi. All of these can in fact be bridged.
They are described below for the Kannada-Hindi anusaraka. But first let us look at agreement.
Agreement

Hindi has an agreement rule which can be stated as follows: the gender-number-person (gnp) of the
verb agrees with the gnp of the karta if the karta has φ (zero) vibhakti; otherwise the gnp of the
verb agrees with the gnp of the karma if the karma has φ vibhakti; otherwise the verb takes the
masculine, singular, third-person form. (A code sketch of this rule is given at the end of this
subsection.)

When the input Kannada sentence has an ambiguity in identifying the karta (or the karma, whichever
is relevant for agreement), and the gnp of the two candidate kartas (or karmas) is different, an
ambiguity will appear in the gnp of the verb.

Similarly, possessive modifiers of nouns need information about the gnp of the related nouns.
There are a number of cases where this is not easy to compute. Sometimes, the relevant information
may not even be available.

Kannada has three genders (masculine, feminine and neuter), whereas Hindi has only two (masculine
and feminine). Consequently, some information loss is bound to occur in going from Kannada to
Hindi. To avoid this loss, it will be necessary to provide additional notation to mark the neuter
gender.
Since separate Hindi verb forms typically go with the different genders, wherever the gnp of the
noun controlling agreement cannot be determined, or wherever Kannada makes a distinction (the
neuter) that Hindi lacks, the verb form of the output cannot be fully decided, and the anusaraka
marks it with additional notation.

[Examples: Kannada sentences with word-by-word Hindi glosses and the corresponding anusaraka
output, illustrating how gender is carried over.]

Loss of agreement is not expected to be too jarring to Hindi speakers, in view of their exposure
to the various varieties of non-native Hindi heard every day as propagated by television, radio
and films.
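The agreement rule stated at the beginning of this subsection translates directly into code. The sketch below uses an invented representation in which each noun group carries its gnp triple and a flag saying whether its vibhakti is φ (i.e., it carries no postposition).

    DEFAULT_GNP = ("masculine", "singular", 3)

    def verb_gnp(karta, karma):
        """Hindi agreement: karta with phi vibhakti, else karma with phi
        vibhakti, else the default masculine-singular-third-person form."""
        if karta and karta["phi_vibhakti"]:
            return karta["gnp"]
        if karma and karma["phi_vibhakti"]:
            return karma["gnp"]
        return DEFAULT_GNP

    boy = {"gnp": ("masculine", "singular", 3), "phi_vibhakti": True}
    bread = {"gnp": ("feminine", "singular", 3), "phi_vibhakti": True}

    print(verb_gnp(boy, bread))             # agrees with the karta
    boy_ne = dict(boy, phi_vibhakti=False)  # karta marked with 'ne'
    print(verb_gnp(boy_ne, bread))          # agreement falls to the karma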
<>= < 'GJ G   BED   R
‘ [5+-<=@QBD<=HJZ +-2J<=66"Z/6"%\@G@=9(6"<E6^+-<E6iHJ%3>?A@=9(<=66 Z/+(’H1<]7=AP%J@L+1Rr@=),Ri'3)}†6"<E6"%(R"67Q`e6rf
@’F 6"6"%sq])?%('*)I+-%(' +-%3%5+1'3+(W/ŒP4(<=[3<E),7N)?%321>?A+->?>H-B @=9(67=6/R+-%h`56/@L+-‰J6"%sR"+-<E6H-B
`PAb6%3<=),RL93)?%32q])?%('*)mFQ)}@E9+BD6"F +1'('*)}@E)OHJ%(+->BD43%(Rk@E)OHJ%(+->m[(+-<=@=),Rr>?67]H1<Y7=4*/V*67]+J7
"

7=93HaFQ%~`e6">?HaF W_XG9P4(7"n0@=936A~R+-%h`56_:P)?6"F 6'h+17C>?6rV*)?R+->2\+-[(7iH1<CBD4(%(Rk@E)OHJ%hF H1<L'


2J+1[(7W >,+-%(214(+1216i`3<=),'*2J67
           
$&%R"+J7N6 H1B6"ZK`e6'('*6'U7=6"%\@=6%(Rr687Q)?%q])?%('*) n(@E9367=6K+-<E6C[34*@C+aB @=6<Q@=9(6^Z_+-)?%U:16"<E`
43%3>?)?‰16^)?% +-%3%(+J'3+3W {3HJ<]6rV3+-Z/[3>?6DFQ9(6"<E6 /n(q^ne+-%('Uoy7=[e6Rr)OBDA@=936^>,+-%(214(+1216
H-Be@=9(6]7N6%J@E6"%(R"6Q+17 +-%(%(+1'3+(n-q])?%('*)(+1%('/op%32J>O),7=9n-<E67=[568Rk@=)?:16>OA†M ) q +17N@E+1%('37BDHJ<
!" "

21>?HJ7E7 )O%q])?%('*) n ) o +PBDH1<Q2J>OH\7=7I)O%Uot%321>?)?7=9n iq BDH1<]+1%\457=+1<E+1‰;+^H14*@E[34*@])?%dqY)O%('3)‚n


"

6r@LR-W˜„ku


  ~&‚;{>{Pœ{ ˆ } ™{ƒ;{>{ ™ „ œ˜ƒ!{ ˆ { ™{>z>{ {>{—} {>{


~ !{Pœ{ ’ {„ 7ƒ!{, œ˜ƒ!{ ˆ { ;˜Pœ˜‚>‚;˜ n”7˜2œ } †‡,› >
 

~ !{Pœ{ ’ {„ ‹7ƒ;{, œ˜ƒ;{ ˆ { „7z7z ”7˜Pœ } ;˜Pœ˜,‚7‚;˜


  
  

E~ ‚;{>{Pœ{ ƒ}  „ ;{ ˆ  } ˆ } ;{,‚} ˆ {>{ ˆ  œ { ˆ } † } ƒ!˜ † >>} ˆ }


~€‚;{>{Pœ ™!{ƒ!{>{ ™ „ œ };„ ƒ!{,‚;{ ™!˜ >{>{7 † {>{
!{Pœ ’ {„ 7ƒ!{šƒ;} !„7z>z † ˜ ƒ!˜Pœ } †
   $ 1
    

 

Note that the embedded sentence structure above can be avoided by instead using a construction of the following kind:

[Example sentences garbled in the source.]

) 6"),7=+J, + +RrHJ%(7N@=<E4(Rk@E)OHJ%~),7^+[3<EH1[e6"<^R"H1%(7N@=<E4(Rk@E)OHJ%~)?%ˆq])?%('*) MH1%3>?A‹)O@K),7C4(7=6's>?67E7


BD<E6c\436"%\@E>OAJWd$&%s@E936'3)?+1>O68Rk@KH1BQqY)O%5'*)t[3<EH*'*4(R"6's`PAh+1%P4(7=+1<E+1‰a+3n93HaF 6":J6"<8n@E93)?7
FQ)?>O>`e6C@=9(6 %3H1<EZ_+->RrHJ%(7’@E<=45Rk@=)?H1%d4(7=6'W
           
Adjectival Participial Phrases

Kannada has a large number of adjectival participial phrases or clauses which convey information about tense, aspect, etc., but they do not carry information about karaka relations. In the sentences below, Kannada codes information about tense in eating and making:
[Kannada example sentences with glosses; the transliterated text is garbled in the source.]
It does not contain information about the relation eat has to spoon, or make has to spoon. It is the background knowledge that provides the relation between them. For example, general world knowledge may be used to say that spoon is the instrument of eat (and is not eaten) in the first sentence. In the next sentence, spoon could be the instrument or theme of make. Hindi, on the other hand, has only two participial phrases, viz. yaa huaa and taa huaa, which code perfective and continuous aspects only, e.g.:
[Hindi example phrases garbled in the source.]
Thus, anusaraka would be able to use these constructions in Hindi only when the tense information in Kannada is appropriate. But what about the other tense, aspect, and modality values? There is a syntactic hole in Hindi.
There is another problem, however. The two participial phrases in Hindi carry coding for karaka relations, which is absent in Kannada: yaa huaa codes karma, while taa huaa codes karta. One participial phrase says the fruit that was eaten, while the other says the boy who is eating. Thus, Hindi is poorer than Kannada in coding tense, aspect, and modality information, while richer in coding karaka information in the case of adjectival participles. But this compounds the problem for anusaraka: using these constructions in Hindi would mean putting in something that is not contained in the source language sentence.
The answer lies in identifying another construction in Hindi and creating a correspondence between it and the constructions under consideration in Kannada. Hindi has another construction, the 'jo' construction, which allows both the tense and the karaka information to be specified. For example:
 –˜– šk¥ —  œLCa¤P š ¤ • –˜—;¥ › —¥;™OœE– šk › – ¥–˜ ž ¥ › œ – ¥N¥ œL£k¥ › ¥ œ  › –˜¥šk— ;›  ¥ ›‚› – ¥ ;–   ¥š › ¥¤˜¥N— œ  › —^ • šLœ ™—   •;•;– 𙥠œ ž› œ t– ¥”• ¥œ — –˜—8™Ošk › ×];œE– ¥ C–˜šk— –˜—ižNšk— — œƒ × ¥ — –˜ž¦
  

) )  (  . (    *  
& # )#      1   )   1  1 

£Lœ › lœC–˜šk—i› ¥ ž œ ž’šk¥ › › šL¥ ™3ž ‚–˜— ¤  › œ "— œkœ –  • –  ¥ œkœ ¥ ž’› š ¢ ¥ £Eœ › ×]œ –˜—iž œ ¥ šL™ œL£Eœ › ×]œL£Eœ šk›  › œ — –  –  ¥  ¥ ›‚¢ ;œ —
)  '*
 #1      1 1 # 1
 1 1'*
 n!     v1v
[Example sentences garbled in the source.]
Xm *H~W ©P6"„kVPn*[(F <=6^687=R"7K+1%@=93)?6U%P:17E6"+-%\Z/@]6b+)?%3%*H-BDHJ@L<=+aZ_@E)OHJ+a%d@E)OHJ+1%Š>OHJ%3+J7#2/)?FQ%ˆ)}@E@E9936 @E936+-(’H-%3f&%5Rr+1HJ'3%(+~7N@=<E7N4(6%JRk@E@E)O6"HJ%(%R"6+17d7  BD*HJW>Oy>?Ha„KF]+17%(u '



!" 

  ‚{>{Pœ{ ˆ } ™ƒ!{>{—;{>{š7ƒ!{7{ >˜> {ƒ!{ ”,{Pœ>œ{”,{ ™˜ 7ƒ!˜ !{>{>z>˜


     

The 'jo' here could even be replaced by vibhakti markers (i.e., functional words such as ne, ko, etc.) of the kind found in the colloquial Hindi of some regions:

[Example garbled in the source.]

Unlike the first case, this idea takes some time and effort for the Hindi reader to get used to.
        
 

The 'ne' Construction

In Hindi, the karta of a sentence takes the special marker 'ne' when the verb carries the perfective TAM label; Kannada has no counterpart of this construction. For example:

[Example garbled in the source.]
$&%UR"+J7N6CH-B AJ+J+ 2J+;AJ+J+KX ‘ X >,+-`e6"> n ) %(6[+3),7G%3H-@]457N68'W




  ~€‚;{>{Pœ{ ‰ ƒ!{>z7{ ™ƒ!{>{ † {,—;{>{


  

XG936<=6"BDH1<E61n-F 6YR"+1%_[5H\7’@E43>,+a@=6i+ %36FwF H1<L'/X ‘ X NA\+1+,) iFQ)O@=9b7E+-Z/6Y7=6"Z_+-%3f


@=),R"7 +17 NAJ+J+ (n-`34*@ FQ93)?RL9'*HP67p%(H-@I457N6 =%36 KRrH1%57’@E<=4(Rr@=)?H1%M\FQ)O@=9_@E93)?7IX ‘ X F 6
 

R"+1%6"V*[3<=687=7 @E936^RrHJ<=<E67=[eH1%('*)?%32 +-%3%(+J'3+7N6%J@E6"%(R"6 Z/H1<E6YBƒ+1)}@E9*BD43>?>OA+17u


   

‚;{>{2œ{ ‰ ƒ!{7z>{ ™ƒ!{>{—;{>{


"

/    !    

We have argued that it is possible to overcome the language barrier in India using anusaraka. Anusaraka tries to take advantage of the relative strengths of the computer and the human reader, where the computer takes the language load and leaves the world knowledge load on the reader. It is particularly effective when the languages are close, as is the case with Indian languages.
UNIT – II
Introduction to Semantics and Knowledge Representation: some applications like Machine
translation, database interface Semantic Interpretation, word senses and ambiguity, Basic logical
form language, Encoding ambiguity in logical from, Thematic roles, Linking syntax and semantics,
Recent trends in NLP.

Introduction: NLIDBs (Application of NLP)

Natural Language Processing (NLP) has many applications that require either
Natural Language understanding or Natural Language generation, or even both. One
example is Machine Translation, which focuses on translating text from one human
language to another automatically. Another good example of NLP is Information
Extraction, which is concerned with "factual information in free text". Natural
Language Interface to Database (NLIDB) is one of the traditional applications of the
NLP domain. It is a powerful example of Question Answering Systems (QAS) for
querying structured and unstructured databases. NLIDB has been an active area
of research that aims to provide accurate answers to user questions expressed in
Natural Language, and to generalize access to databases for different types of users
regardless of their technical skills.

The figure above shows the components of a Natural Language Interface to Database. The
idea of NLIDB systems came from the method of questioning the database using
Natural Language queries (in English, for example) instead of a database language to query the
database.

Architecture diagram for NLIDB:


1. Different approaches and architectures in the field of NLIDBs:
a. Pattern matching systems: Pattern matching systems appeared in the late 60s
and early 70s. They used the FLIP (Formal List Processor) language, based on the
structure of the LISP (List Processing) language. Such a system creates a query in the NLIDB from an
entry corresponding to a predefined model, using a set of rules. Many NLIDB systems
use pattern matching because of the simplicity with which user inputs expressed
in natural language (NL) can be mapped to a query in the database query language: the NL query is
processed by first associating it with a model; it is then transformed into a logical form
according to the model to which it belongs; finally, the system formulates the query
in the NLIDB. Moreover, these systems often manage to find reasonable answers to user
requests, but they are limited to a specific database. One of the best-known systems
based on this architecture is ELIZA.
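As a rough illustration of the pattern-matching idea, the Python sketch below maps two invented question patterns onto SQL templates; the employees table, its columns, and the patterns are hypothetical and not taken from any particular system:

    import re

    # Each entry pairs a surface pattern with an SQL template.
    # The patterns and the schema (an "employees" table) are hypothetical.
    PATTERNS = [
        (re.compile(r"who works in (\w+)", re.I),
         "SELECT name FROM employees WHERE city = '{0}';"),
        (re.compile(r"how many employees in (\w+)", re.I),
         "SELECT COUNT(*) FROM employees WHERE city = '{0}';"),
    ]

    def nl_to_sql(question):
        for pattern, template in PATTERNS:
            m = pattern.search(question)
            if m:
                # Fill the SQL template with the matched fragment.
                return template.format(m.group(1))
        return None   # no pattern matched: the system cannot answer

    print(nl_to_sql("Who works in Hyderabad?"))
    # SELECT name FROM employees WHERE city = 'Hyderabad';

The sketch also makes the main limitation visible: every answerable question shape must be anticipated as a pattern, and the patterns are tied to one database schema.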
b. Syntax-based systems: In syntax-based systems, the Natural Language query is
parsed syntactically, and the resulting parsing tree is directly mapped to a Database
Query Language expression. These systems use grammatical rules that describe the
different possible syntactic structures of queries and a lexicon of words that may
appear in the user's queries. The main advantage of this approach is that it provides
detailed information about sentence structure, words, grammatical function, the
relationships between these words, and how sentences can be grouped to form more
complex sentences. The best known syntax-based NLIDB is LUNAR.
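The mapping from a parse tree to a query can be illustrated with a toy sketch; the tree shape, node labels, and the schema below are invented for the example and stand in for the detailed grammatical rules a real system would use:

    # A toy parse tree: (label, children...) tuples.
    tree = ("S", ("NP", ("N", "employees")),
                 ("VP", ("V", "list")))

    # Map a syntactic structure to a query fragment.
    def tree_to_sql(tree):
        if tree[0] == "S":
            noun = tree[1][1][1]   # S -> NP -> N -> word names the table
            return "SELECT * FROM {0};".format(noun)
        raise ValueError("unhandled tree shape")

    print(tree_to_sql(tree))   # SELECT * FROM employees;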

c. Semantic grammar systems: A semantic grammar system is similar to a syntax-


based system. The query result is obtained by mapping the analysis tree to a
database query [6]. The basic idea of such a system is to simplify the analysis tree as
much as possible by removing unnecessary nodes or combining some nodes. The
main disadvantage is that the analysis tree in this system has specific structures and
node labels, which makes it useless for other applications. Many NLIDBs, such as
Planes, Ladder, and Rel, are based on this architecture.

d. Intermediate representation languages: Intermediate representation systems


were proposed because of the difficulty of translating user queries expressed in
Natural Language (NLQ) directly into a database query language. The idea of this
method is to first match the NLQ to an intermediate logical query expressed in a
representation language; this is then translated into a query in the NLIDB. One
known system that uses this architecture is PRECISE.
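A minimal sketch of this two-stage idea is shown below; the dictionary-style logical form and the employees schema are invented stand-ins, not PRECISE's actual representation:

    # Stage 1: map the NL query to an intermediate logical form.
    # The logical form here is a simple dict; real systems use richer
    # representations (e.g., lambda calculus or query graphs).
    def nlq_to_logical_form(question):
        if "older than" in question:
            age = int(question.rsplit(" ", 1)[1].rstrip("?"))
            return {"select": "name", "from": "employees",
                    "constraint": ("age", ">", age)}
        raise ValueError("unhandled question")

    # Stage 2: translate the logical form into the database query language.
    def logical_form_to_sql(lf):
        col, op, val = lf["constraint"]
        return "SELECT {0} FROM {1} WHERE {2} {3} {4};".format(
            lf["select"], lf["from"], col, op, val)

    lf = nlq_to_logical_form("list employees older than 30?")
    print(logical_form_to_sql(lf))   # SELECT name FROM employees WHERE age > 30;

The benefit of the split is that stage 1 is independent of the target query language, so the same logical form could be rendered into SQL or another formalism.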

2. There are different ways of classifying NLIDB systems:

- Graphical NLIDBs: present users with an interface where they can make
selections for query formulation; the NL-Menu system is an example.
- Textual NLIDBs: allow users to express their queries in NL by directly
writing them; LUNAR and PRECISE are systems that use textual NLIDBs.
- NLIDBs dependent on the database domain: it is necessary to know
beforehand all the particularities of the field of application, as in the LUNAR, ASK,
and GILNDB systems.
- NLIDBs independent of the database domain: PRECISE is an example of this
type of NLIDB system, where knowledge of the field of application is not
necessary.
- Classification of NLIDBs by language: most existing NLIDBs answer
English requests, since English is the main language of several countries.
However, this does not prevent other NLIDBs that allow access to the
information stored in a database through user queries formulated in
other languages: Arabic NLIDBs, Indian NLIDBs, French NLIDBs, Chinese
NLIDBs, the Bengali Language Query Processing System, and multilingual NLIDBs
(the EDITE system, which supports French, English, and Spanish).

3. Advantages and disadvantages of NLIDBs


The main advantages of NLIDBs are listed below:
1. Absence of artificial language: NLIDB systems allow users to access the
information stored in a database easily, using queries written in Natural Language.
2. Simple to use: using an NLIDB is easier than using database query languages
or forms.
3. Better for some questions: some queries that involve negation or quantification
can be expressed easily in Natural Language.
4. Easy to use for multiple tables: queries that involve joins between tables pose no
difficulty for users; it is sufficient that the user types the query and the system
displays the requested results.
5. Tolerance for grammatical errors: most NLIDB systems tolerate small
grammatical errors, whereas the majority of computer systems require strict syntax.

Despite the development of many NLIDB systems, their use is not widespread due to
a set of constraints, some of which are listed below:
1. Lack of obvious language coverage: some NLIDB systems cannot answer all
questions expressed in Natural Language.
2. Absence of explanations in case of failure: in case of failure, some NLIDBs
do not provide any explanation at all.
3. Disappointment of user expectations: individuals can be misled about the ability
of an NLIDB system to process all their queries in Natural Language.
4. Suitability of NLIDB for any user.
UNIT – III
Grammars and Parsing: Grammars and sentence Structure, Top-Down and Bottom-Up Parsers,
Transition Network Grammars, Top-Down Chart Parsing. Feature Systems and Augmented
Grammars: Basic Feature system for English, Morphological Analysis and the Lexicon, Parsing with
Features, Augmented Transition Networks.

Chapter 3. Grammars and Parsing

To examine how the syntactic structure of a sentence can be computed, you must consider
two things: the grammar, which is a formal specification of the structures allowable in the
language, and the parsing technique, which is the method of analyzing a sentence to determine its
structure according to the grammar. This chapter examines different ways to specify simple
grammars and considers some fundamental parsing techniques. Next chapter then describes the
methods for constructing syntactic representations that are useful for later semantic
interpretation.

3.1 Grammars and Sentence Structure

This section considers methods of describing the structure of sentences and explores ways of
characterizing all the legal structures in a language. The most common way of representing how a
sentence is broken into its major subparts, and how those subparts are broken up in turn, is as a
tree. The tree representation for the sentence John ate the cat is shown in Figure 3.1.

Figure 3.1 A tree representation of John ate the cat

This illustration can be read as follows: The sentence (S) consists of an initial noun phrase (NP) and
a verb phrase (VP). The initial noun phrase is made of the simple NAME John. The verb phrase is
composed of a verb (V) ate and an NP, which consists of an article (ART) the and a common noun
(N) cat. In list notation this same structure could be represented as

(S (NP (NAME John))


(VP (V ate)
(NP (ART the)
(N cat) )))
Since trees play such an important role throughout this book, some terminology needs to be
introduced. Trees are a special form of graph, which are structures consisting of labeled nodes (for
example, the nodes are labeled S, NP, and so on in Figure 3.1) connected by links. They are called
trees because they resemble upside-down trees, and much of the terminology is derived from this
analogy with actual trees. The node at the top is called the root of the tree, while
the nodes at the bottom are called the leaves. We say a link points from a parent node to a child
node. The node labeled S in Figure 3.1 is the parent node of the nodes labeled NP and VP, and the
node labeled NP is in turn the parent node of the node labeled NAME. While every child node has a
unique parent, a parent may point to many child nodes. An ancestor of a node N is defined as N's
parent, or the parent of its parent, and so on. A node is dominated by its ancestor nodes. The root
node dominates all other nodes in the tree.
To construct a tree structure for a sentence, you must know what structures are legal for English. A
set of rewrite rules describes what tree structures are allowable. These rules say that a certain
symbol may be expanded in the tree by a sequence of other symbols. A set of rules that would allow
the tree structure in Figure 3.1 is shown as Grammar 3.2.

Grammar 3.2 A simple grammar

Rule 1 says that an S may consist of an NP followed by a VP. Rule 2 says that a VP may consist of a V
followed by an NP. Rules 3 and 4 say that an NP may consist of a NAME or may consist of an ART
followed by an N. Rules 5 - 8 define possible words for the categories. Grammars consisting entirely
of rules with a single symbol on the left-hand side, called the mother, are called context-free
grammars (CFGs). CFGs are a very important class of grammars for two reasons: The formalism is
powerful enough to describe most of the structure in natural languages, yet it is restricted enough
so that efficient parsers can be built to analyze sentences. Symbols that cannot be further
decomposed in a grammar, namely the words in the preceding example, are called terminal
symbols. The other symbols, such as NP, VP, and S, are called non-terminal symbols. The
grammatical symbols such as N and V that describe word categories are called lexical symbols. Of
course, many words will be listed under multiple categories. For example, can would be listed
under V and N.

Grammars have a special symbol called the start symbol. In this book, the start symbol will always
be S. A grammar is said to derive a sentence if there is a sequence of rules that allow you to rewrite
the start symbol into the sentence. For instance, Grammar 3.2 derives the sentence John ate the cat.
This can be seen by showing the sequence of rewrites starting from the S symbol, as follows:
S => NP VP (rewriting S)
=> NAME VP (rewriting NP)
=> John VP (rewriting NAME)
=> John V NP (rewriting VP)
=> John ate NP (rewriting V)
=> John ate ART N (rewriting NP)
=> John ate the N (rewriting ART)
=> John ate the cat (rewriting N)

Two important processes are based on derivations. The first is sentence generation, which uses
derivations to construct legal sentences. A simple generator could be implemented by randomly
choosing rewrite rules, starting from the S symbol, until you have a sequence of words. The
preceding example shows that the sentence John ate the cat can be generated from the grammar.
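Such a generator is easy to sketch in Python; the rules below are exactly those of Grammar 3.2 (rules 1-8), with words as terminals:

    import random

    # Grammar 3.2: each left-hand side maps to its possible right-hand sides.
    RULES = {
        "S":    [["NP", "VP"]],
        "VP":   [["V", "NP"]],
        "NP":   [["NAME"], ["ART", "N"]],
        "NAME": [["John"]],
        "V":    [["ate"]],
        "ART":  [["the"]],
        "N":    [["cat"]],
    }

    def generate(symbol):
        if symbol not in RULES:              # terminal: an actual word
            return [symbol]
        rhs = random.choice(RULES[symbol])   # randomly choose a rewrite rule
        words = []
        for s in rhs:
            words.extend(generate(s))
        return words

    print(" ".join(generate("S")))   # e.g. "John ate the cat"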
The second process based on derivations is parsing, which identifies the structure of sentences
given a grammar. There are two basic methods of searching. A top-down strategy starts with the S
symbol and then searches through different ways to rewrite the symbols until the input sentence is
generated, or until all possibilities have been explored. The preceding example demonstrates that
John ate the cat is a legal sentence by showing the derivation that could be found by this process.
In a bottom-up strategy, you start with the words in the sentence and use the rewrite rules
backward to reduce the sequence of symbols until it consists solely of S. Here each rule is used
in reverse: a sequence of symbols matching a rule's right-hand side is replaced by the symbol
on its left-hand side. A possible bottom-up parse of the
sentence is
John ate the cat
=> NAME ate the cat (rewriting John)
=> NAME V the cat (rewriting ate)
=> NAME V ART cat (rewriting the)
=> NAME V ART N (rewriting cat)
=> NP V ART N (rewriting NAME)
=> NP V NP (rewriting ART N)
=> NP VP (rewriting V NP)
=> S (rewriting NP VP)
A tree representation, such as Figure 3.1, can be viewed as a record of the CFG rules that account
for the structure of the sentence. In other words, if you kept a record of the parsing process,
working either top-down or bottom-up, it would be something similar to the parse tree
representation.

3.2 What Makes a Good Grammar

In constructing a grammar for a language, you are interested in generality, the range of sentences
the grammar analyzes correctly; selectivity, the range of non-sentences it identifies as problematic;
and understandability, the simplicity of the grammar itself.

In small grammars, such as those that describe only a few types of sentences, one structural
analysis of a sentence may appear as understandable as another, and little can be said as to why
one is superior to the other. As you attempt to extend a grammar to cover a wide range of
sentences, however, you often find that one analysis is easily extendable while the other requires
complex modification. The analysis that retains its simplicity and generality as it is extended is
more desirable.

Unfortunately, here you will be working mostly with small grammars and so will have only a few
opportunities to evaluate an analysis as it is extended. You can attempt to make your solutions
generalizable, however, by keeping in mind certain properties that any solution should have. In
particular, pay close attention to the way the sentence is divided into its subparts, called
constituents. Besides using your intuition, you can apply a few specific tests, discussed here.

Anytime you decide that a group of words forms a particular constituent, try to construct a new
sentence that involves that group of words in a conjunction with another group of words classified
as the same type of constituent. This is a good test because for the most part only constituents of
the same type can be conjoined. The sentences in Figure 3.3, for example, are acceptable, but the
following sentences are not:

*I ate a hamburger and on the stove.


*I ate a cold hot dog and well burned.
*I ate the hot dog slowly and a hamburger.
To summarize, if the proposed constituent doesn't conjoin in some sentence with a constituent of
the same class, it is probably incorrect.

Another test involves inserting the proposed constituent into other sentences that take the same
category of constituent. For example, if you say that John's hitting of Mary is an NP in John's hitting
of Mary alarmed Sue, then it should be usable as an NP in other sentences as well. In fact this is
true—the NP can be the object of a verb, as in I cannot explain John's hitting of Mary as well as in the
passive form of the initial sentence Sue was alarmed by John's hitting of Mary. Given this evidence,
you can conclude that the proposed constituent appears
to behave just like other NPs.

NP- NP: I ate a hamburger and a hot dog.


VP -VP: I will eat the hamburger and throw away the hot dog.
S- S: I ate a hamburger and John ate a hot dog.
PP -PP: I saw a hot dog in the bag and on the stove.
ADJP- ADJP: I ate a cold and well burned hot dog.
ADVP - ADVP: I ate the hot dog slowly and very carefully.
N - N: I ate a hamburger and hot dog.
V - V: I will cook and burn a hamburger.
AUX - AUX: I can and will eat the hot dog.
ADJ - ADJ: I ate the very cold and burned hot dog (that is, very cold and very burned).

Figure 3.3 Various forms of conjunctions

As another example of applying these principles, consider the two sentences I looked up John's
phone number and I looked up John's chimney. Should these sentences have the identical structure?
If so, you would presumably analyze both as subject-verb-complement sentences with the
complement in both cases being a PP. That is, up John's phone number would be a PP.

When you try the conjunction test, you should become suspicious of this analysis. Conjoining up
John’s phone number with another PP, as in *I looked up John’s phone number and in his cupboards, is
certainly bizarre. Note that I looked up John’s chimney and in his cupboards is perfectly acceptable.
Thus apparently the analysis of up John's phone number as a PP is incorrect.

Further evidence against the PP analysis is that up John's phone number does not seem usable as a
PP in any sentences other than ones involving a few verbs such as look or thought. Even with the
verb look, an alternative sentence such as *Up John’s phone number, I looked is quite implausible
compared to Up John’s chimney, I looked.

This type of test can be taken further by considering changing the PP in a manner that usually is
allowed. In particular, you should be able to replace the NP John’s phone number by the pronoun it.
But the resulting sentence, I looked up it, could not be used with the same meaning as I looked up
John ‘s phone number. In fact, the only way to use a pronoun and retain the original meaning is to
use I looked it up, corresponding to the form I looked John’s phone number up.

Thus a different analysis is needed for each of the two sentences. If up John’s phone number is not a
PP, then two remaining analyses may be possible. The VP could be the complex verb looked up
followed by an NP, or it could consist of three components: the V looked, a particle up, and an NP.
Either of these is a better solution. What types of tests might you do to decide between them?
As you develop a grammar, each constituent is used in more and more different ways. As a result,
you have a growing number of tests that can be per-formed to see if a new analysis is reasonable or
not. Sometimes the analysis of a new form might force you to back up and modify the existing
grammar. This backward step is unavoidable given the current state of linguistic knowledge. The
important point to remember, though, is that when a new rule is proposed for a grammar, you must
carefully consider its interaction with existing rules.

3.3 A Top-Down Parser

A parsing algorithm can be described as a procedure that searches through various ways of
combining grammatical rules to find a combination that generates a tree that could be the structure
of the input sentence. To keep this initial formulation simple, we will not explicitly construct the
tree. Rather, the algorithm will simply return a yes or no answer as to whether such a tree could be
built. In other words, the algorithm will say whether a certain sentence is accepted by the grammar
or not. This section considers a simple top-down parsing method in some detail and then relates
this to work in artificial intelligence (AI) on search procedures.

A top-down parser starts with the S symbol and attempts to rewrite it into a sequence of
terminal symbols that matches the classes of the words in the input sentence. The state of the parse
at any given time can be represented as a list of symbols that are the result of operations applied so
far, called the symbol list. For example, the parser starts in the state (S) and after applying the rule
S -> NP VP the symbol list will be (NP VP). If it then applies the rule NP ->ART N, the symbol list will
be (ART N VP), and so on.

The parser could continue in this fashion until the state consisted entirely of terminal
symbols, and then it could check the input sentence to see if it matched. But this would be quite
wasteful, for a mistake made early on (say, in choosing the rule that rewrites S) is not discovered
until much later. A better algorithm checks the input as soon as it can. In addition, rather than
having a separate rule to indicate the possible syntactic categories for each word, a structure called
the lexicon is used to efficiently store the possible categories for each word. For now the lexicon
will be very simple. A very small lexicon for use in the examples is

cried: V
dogs: N, V
the: ART

With a lexicon specified, a grammar, such as that shown as Grammar 3.4, need not contain any
lexical rules.

1. S -> NP VP
2. NP -> ART N
3. NP -> ART ADJ N
4. VP -> V
5. VP -> V NP

Grammar 3.4

Given these changes, a state of the parse is now defined by a pair: a symbol list similar to before
and a number indicating the current position in the sentence. Positions fall between the words,
with 1 being the position before the first word. For example, here is a sentence with its positions
indicated:
1 The 2 dogs 3 cried 4
A typical parse state would be
((N VP) 2)
indicating that the parser needs to find an N followed by a VP, starting at position two. New states
are generated from old states depending on whether the first symbol is a lexical symbol or not. If it
is a lexical symbol, like N in the preceding example, and if the next word can belong to that lexical
category, then you can update the state by removing the first symbol and updating the position
counter. In this case, since the word dogs is listed as an N in the lexicon, the next parser state would
be ((VP) 3)
which means it needs to find a VP starting at position 3. If the first symbol is a non-terminal, like
VP, then it is rewritten using a rule from the grammar. For example, using rule 4 in Grammar 3.4,
the new state would be
((V) 3)
which means it needs to find a V starting at position 3. On the other hand, using rule 5, the new
state would be
((V NP) 3)
A parsing algorithm that is guaranteed to find a parse if there is one must systematically explore
every possible new state. One simple technique for this is called backtracking. Using this approach,
rather than generating a single new state from the state ((VP) 3), you generate all possible new
states. One of these is picked to be the next state and the rest are saved as backup states. If you ever
reach a situation where the current state cannot lead to a solution, you simply pick a new current
state from the list of backup states. Here is the algorithm in a little more detail.

A Simple Top-Down Parsing Algorithm


The algorithm manipulates a list of possible states, called the possibilities list. The first element of
this list is the current state, which consists of a symbol list and a word position In the sentence, and
the remaining elements of the search state are the backup states, each indicating an alternate
symbol-list—word-position pair. For example, the possibilities list
(((N) 2) ((NAME) 1) ((ADJ N) 1))
indicates that the current state consists of the symbol list (N) at position 2, and that there are two
possible backup states: one consisting of the symbol list (NAME) at position 1 and the other
consisting of the symbol list (ADJ N) at position 1.
Step  Current State       Backup States           Comment
1.    ((S) 1)                                     initial position
2.    ((NP VP) 1)                                 rewriting S by rule 1
3.    ((ART N VP) 1)      ((ART ADJ N VP) 1)      rewriting NP by rules 2 & 3
4.    ((N VP) 2)          ((ART ADJ N VP) 1)      matching ART with the
5.    ((VP) 3)            ((ART ADJ N VP) 1)      matching N with dogs
6.    ((V) 3)             ((V NP) 3)              rewriting VP by rules 4 & 5
                          ((ART ADJ N VP) 1)
7.    the parse succeeds as V is matched to cried, leaving an empty grammatical
      symbol list with an empty sentence

Figure 3.5 Top-down depth-first parse of 1 The 2 dogs 3 cried 4

The algorithm starts with the initial state ((S) 1) and no backup states.
1. Select the current state: Take the first state off the possibilities list and call it C. If the
possibilities list is empty, then the algorithm fails (that is, no successful parse is possible).
2. If C consists of an empty symbol list and the word position is at the end of the sentence, then
the algorithm succeeds.
3. Otherwise, generate the next possible states.

3.1. If the first symbol on the symbol list of C is a lexical symbol, and the next word in the
sentence can be in that class, then create a new state by removing the first symbol from
the symbol list and updating the word position, and add it to the possibilities list.

3.2. Otherwise, if the first symbol on the symbol list of C is a non-terminal, generate a new
state for each rule in the grammar that can rewrite that nonterminal symbol and add
them all to the possibilities list.
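A direct transcription of this procedure into Python, using Grammar 3.4 and the small lexicon above, and treating the possibilities list as a stack (the depth-first behavior of the trace that follows), might look like this sketch; note that positions are 0-based here, whereas the text counts from 1:

    GRAMMAR = {                      # Grammar 3.4
        "S":  [["NP", "VP"]],
        "NP": [["ART", "N"], ["ART", "ADJ", "N"]],
        "VP": [["V"], ["V", "NP"]],
    }
    LEXICON = {"the": ["ART"], "dogs": ["N", "V"], "cried": ["V"]}

    def parse(words):
        possibilities = [(("S",), 0)]            # state: (symbol list, position)
        while possibilities:
            symbols, pos = possibilities.pop(0)  # step 1: take the current state C
            if not symbols and pos == len(words):
                return True                      # step 2: success
            if not symbols:
                continue                         # dead end: words left over
            first, rest = symbols[0], symbols[1:]
            if first in GRAMMAR:                 # step 3.2: rewrite a non-terminal
                new = [(tuple(rhs) + rest, pos) for rhs in GRAMMAR[first]]
                possibilities = new + possibilities
            elif pos < len(words) and first in LEXICON.get(words[pos], []):
                possibilities.insert(0, (rest, pos + 1))   # step 3.1: match a word
        return False                             # possibilities list empty: failure

    print(parse(["the", "dogs", "cried"]))   # True

The backup states are simply the tail of the possibilities list; putting new states on the front makes the search depth-first.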

Consider an example. Using Grammar 3.4, Figure 3.5 shows a trace of the algorithm on the sentence
The dogs cried. First, the initial S symbol is rewritten using rule 1 to produce a new current state of
((NP VP) 1) in step 2. The NP is then rewritten in turn, but since there are two possible rules for NP
in the grammar, two possible states are generated: The new current state involves (ART N VP) at
position 1, whereas the backup state involves (ART ADJ N VP) at position 1. In step 4 a word in
category ART is found at position 1 of the sentence, and the new current state becomes ((N VP) 2). The
backup state generated in step 3 remains untouched. The parse continues in this fashion to step 5,
where two different rules can rewrite VP. The first rule generates the new current state, while the
other rule is pushed onto the stack of backup states. The parse completes successfully in step 7,
since the current state is empty and all the words in the input sentence have been accounted for.
Consider the same algorithm and grammar operating on the sentence
1 The 2 old 3 man 4 cried 5
In this case assume that the word old is ambiguous between an ADJ and an N and that the word
man is ambiguous between an N and a V (as in the sentence The sailors man the boats). Specifically,
the lexicon is
the: ART
old: ADJ, N
man: N, V
cried: V
The parse proceeds as follows (see Figure 3.6). The initial S symbol is rewritten by rule 1 to
produce the new current state of ((NP VP) 1). The NP is rewritten in turn, giving the new state of
((ART N VP) 1) with a backup state of ((ART ADJ N VP) 1). The parse continues, finding the as an
ART to produce the state ((N VP) 2) and then old as an N to obtain the state ((VP) 3). There are now
two ways to rewrite the VP, giving us a current state of ((V) 3) and the backup states of ((V NP) 3)
and ((ART ADJ N VP) 1) from before. The word man can be parsed as a V, giving the state (() 4).
Unfortunately, while the symbol list is empty, the word position is not at the end of the sentence, so
no new state can be generated and a backup state must be used. In the next cycle, step 8, ((V NP) 3)
is attempted. Again man is taken as a V and the new state ((NP) 4) generated. None of the rewrites
of NP yield a successful parse. Finally, in step 12, the last backup state, ((ART ADJ N VP) 1), is tried
and leads to a successful parse.
Figure 3.6 A top-down parse of 1 The 2 old 3 man 4 cried 5

Parsing as a Search Procedure


You can think of parsing as a special case of a search problem as defined in AI. In particular, the top-
down parser in this section was described in terms of the following generalized search procedure.
The possibilities list is initially set to the start state of the parse. Then you repeat the following
steps until you have success or failure:
1. Select the first state from the possibilities list (and remove it from the list).
2. Generate the new states by trying every possible option from the selected state (there may
be none if we are on a bad path).
3. Add the states generated in step 2 to the possibilities list.

For a depth-first strategy, the possibilities list is a stack. In other words, step 1 always takes the
first element off the list, and step 3 always puts the new states on the front of the list, yielding a
last-in first-out (LIFO) strategy.
In contrast, in a breadth-first strategy the possibilities list is manipulated as a queue. Step 3 adds
the new positions onto the end of the list, rather than the beginning, yielding a first-in first-out
(FIFO) strategy.
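In code the two strategies differ only in how steps 1 and 3 touch the possibilities list; a minimal sketch (the states are placeholders):

    from collections import deque

    new_states = ["state-A", "state-B"]    # placeholder parser states

    # Depth-first: the possibilities list behaves as a stack (LIFO).
    possibilities = ["state-0"]
    possibilities[:0] = new_states         # step 3: new states go on the front
    current = possibilities.pop(0)         # step 1: take the first element

    # Breadth-first: the possibilities list behaves as a queue (FIFO).
    possibilities = deque(["state-0"])
    possibilities.extend(new_states)       # step 3: new states go on the end
    current = possibilities.popleft()      # step 1: still take the first element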
Figure 3.7 Search tree for two parse strategies (depth-first strategy on left; breadth-first on right)

We can compare these search strategies using a tree format, as in Figure 3.7, which shows
the entire space of parser states for the last example. Each node in the tree represents a parser
state, and the sons of a node are the possible moves from that state. The number beside each node
records when the node was selected to be processed by the algorithm. On the left side is the order
produced by the depth-first strategy, and on the right side is the order produced by the breadth-
first strategy. Remember, the sentence being parsed is 1 The 2 old 3 man 4 cried 5

The main difference between depth-first and breadth-first searches in this simple example is
the order in which the two possible interpretations of the first NP are examined. With the depth-
first strategy, one interpretation is considered and expanded until it fails; only then is the second
one considered. With the breadth-first strategy, both interpretations are considered alternately,
each being expanded one step at a time. In this example, both depth-first and breadth-first searches
found the solution but searched the space in a different order. A depth-first search often moves
quickly to a solution but in other cases may spend considerable time pursuing futile paths. The
breadth-first strategy explores each possible solution to a certain depth before moving on. In this
particular example -the depth-first strategy found the solution in one less step than the breadth-
first. (The state in the bottom right hand side of Figure 3.7 was not explored by the depth-first
parse.)

In certain cases it is possible to put these simple search strategies into an infinite loop. For
example, consider a left-recursive rule that could be a first account of the possessive in English (as
in the NP the man ‘s coat):
NP -> NP 's N
With a naive depth-first strategy, a state starting with the non-terminal NP would be rewritten to a
new state beginning with NP 's N. But this state also begins with an NP that could be rewritten in the
same way. Unless an explicit check were incorporated into the parser, it would rewrite NPs forever!
The breadth-first strategy does better with left-recursive rules, as it tries all other ways to rewrite
the original NP before coming to the newly generated state with the new NP. But with an
ungrammatical sentence it would not terminate because it would rewrite the NP forever while
searching for a solution. For this reason, many systems prohibit left-recursive rules from the
grammar.

Many parsers built today use the depth-first strategy because it tends to minimize the
number of backup states needed and thus uses less memory and requires less bookkeeping.

3.4 A Bottom-Up Chart Parser

The main difference between top-down and bottom-up parsers is the way the grammar rules are
used. For example, consider the rule
NP -> ART ADJ N
In a top-down system you use the rule to find an NP by looking for the sequence ART ADJ N. In a
bottom-up parser you use the rule to take a sequence ART ADJ N that you have found and identify
it as an NP. The basic operation in bottom-up parsing then is to take a sequence of symbols and
match it to the right-hand side of the rules. You could build a bottom-up parser simply by
formulating this matching process as a search process. The state would simply consist of a symbol
list, starting with the words in the sentence. Successor states could be generated by exploring all
possible ways to
 rewrite a word by its possible lexical categories
 replace a sequence of symbols that matches the right-hand side of a grammar rule by its left-
hand side symbol

Unfortunately, such a simple implementation would be prohibitively expensive, as the


parser would tend to try the same matches again and again, thus duplicating much of its work
unnecessarily. To avoid this problem, a data structure called a chart is introduced that allows the
parser to store the partial results of the matching it has done so far so that the work need not be
reduplicated.
Matches are always considered from the point of view of one constituent, called the key. To
find rules that match a string involving the key, look for rules that start with the key, or for rules
that have already been started by earlier keys and require the present key either to complete the
rule or to extend the rule. For instance, consider Grammar 3.8.

1. S -> NP VP
2. NP -> ART ADJ N
3. NP -> ART N
4. NP -> ADJ N
5. VP -> AUX VP
6. VP -> V NP
Grammar 3.8 A simple context-free grammar

Assume you are parsing a sentence that starts with an ART. With this ART as the key, rules 2 and 3
are matched because they start with ART. To record this for analyzing the next key, you need to
record that rules 2 and 3 could be continued at the point after the ART. You denote this fact by
writing the rule with a dot (●), indicating what has been seen so far. Thus you record
2'. NP -> ART ●ADJ N
3'. NP -> ART ● N
If the next input key is an ADJ, then rule 4 may be started, and the modified rule 2 may be extended
to give
2''. NP -> ART ADJ ● N

The chart maintains the record of all the constituents derived from the sentence so far in the
parse. It also maintains the record of rules that have matched partially but are not complete. These
are called the active arcs. For example, after seeing an initial ART followed by an ADJ in the
preceding example, you would have the chart shown in Figure 3.9. You should interpret this figure
as follows. There are two completed constituents on the chart: ART1 from position 1 to 2 and ADJ1
from position 2 to 3. There are four active arcs indicating possible constituents. These are indicated
by the arrows and are interpreted as follows (from top to bottom). There is a potential NP starting
at position 1, which needs an ADJ starting at position 2. There is another potential NP starting at
position 1, which needs an N starting at position 2. There is a potential NP starting at position 2
with an ADJ, which needs an N starting at position 3. Finally, there is a potential NP starting at
position 1 with an ART and then an ADJ, which needs an N starting at position 3.

Figure 3.9 The chart after seeing an ADJ in position 2

The basic operation of a chart-based parser involves combining an active arc with a
completed constituent. The result is either a new completed constituent or a new active arc that is
an extension of the original active arc. New completed constituents are maintained on a list called
the agenda until they themselves are added to the chart. This process is defined more precisely by
the arc extension algorithm shown in Figure 3.10.

Figure 3.10 The arc extension algorithm


Given this algorithm, the bottom-up chart parsing algorithm is specified in Figure 3.11.

Do until there is no input left:


1. If the agenda is empty, look up the interpretations for the next word in the input and add
them to the agenda.
2. Select a constituent from the agenda (let’s call it constituent C from position p1 to p2).
3. For each rule in the grammar of form X -> C X1 ... Xn, add an active arc of form
X -> C ● X1 ... Xn from position p1 to p2.
4. Add C to the chart using the arc extension algorithm above.
Figure 3.11 A bottom-up chart parsing algorithm
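A compact Python sketch of the whole method is given below; it folds the arc extension operation of Figure 3.10 (whose body does not reproduce here) into the loop of Figure 3.11, represents an active arc as a (lhs, rhs, dot, start, end) tuple, and uses Grammar 3.8 with the small lexicon from the example that follows. The agenda is used as a stack, i.e., depth-first:

    GRAMMAR = [("S", ["NP", "VP"]), ("NP", ["ART", "ADJ", "N"]),
               ("NP", ["ART", "N"]), ("NP", ["ADJ", "N"]),
               ("VP", ["AUX", "VP"]), ("VP", ["V", "NP"])]    # Grammar 3.8
    LEXICON = {"the": ["ART"], "large": ["ADJ"],
               "can": ["N", "AUX", "V"], "hold": ["N", "V"], "water": ["N", "V"]}

    def parse(words):
        chart, arcs, agenda = [], [], []   # constituents, active arcs, agenda
        pos = 0
        while True:
            if not agenda:                 # step 1: read the next word
                if pos == len(words):
                    break
                for cat in LEXICON[words[pos]]:
                    agenda.append((cat, pos + 1, pos + 2))  # positions as in the text
                pos += 1
            cat, p1, p2 = agenda.pop()     # step 2: select a constituent
            for lhs, rhs in GRAMMAR:       # step 3: start arcs for rules beginning with cat
                if rhs[0] == cat:
                    arcs.append((lhs, rhs, 0, p1, p1))
            chart.append((cat, p1, p2))    # step 4: arc extension with this constituent
            for lhs, rhs, dot, a1, a2 in list(arcs):
                if a2 == p1 and rhs[dot] == cat:
                    if dot + 1 == len(rhs):             # arc completed:
                        agenda.append((lhs, a1, p2))    # new constituent on the agenda
                    else:
                        arcs.append((lhs, rhs, dot + 1, a1, p2))
        return [c for c in chart
                if c[0] == "S" and c[1] == 1 and c[2] == len(words) + 1]

    print(parse("the large can can hold the water".split()))   # [('S', 1, 8)]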

As with the top-down parsers, you may use a depth-first or breadth-first search strategy,
depending on whether the agenda is implemented as a stack or a queue. Also, for a full breadth-first
strategy, you would need to read in the entire input and add the interpretations of the words onto
the agenda before starting the algorithm. Let us assume a depth-first search strategy for the
following example.

Consider using the algorithm on the sentence The large can can hold the water using
Grammar 3.8 with the following lexicon:

the: ART
large: ADJ
can: N,AUX,V
hold: N,V
water: N, V
To best understand the example, draw the chart as it is extended at each step of the algorithm. The
agenda is initially empty, so the word the is read and a constituent ART1 placed on the agenda.
Entering ART1: (the from 1 to 2)
Adds active arc NP -> ART o ADJ N from 1 to 2
Adds active arc NP -> ART o N from 1 to 2
Both these active arcs were added by step 3 of the parsing algorithm and were derived from rules 2
and 3 in the grammar, respectively. Next the word large is read and a constituent ADJ1 is created.
Entering ADJ1: (large from 2 to 3)
Adds arc NP -> ADJ o N from 2 to 3
Adds arc NP -> ART ADJ o N from 1 to 3
The first arc was added in step 3 of the algorithm. The second arc added here is an extension
of the first active arc that was added when ART1 was added to the chart using the arc extension
algorithm (step 4).
The chart at this point has already been shown in Figure 3.9. Notice that active arcs are
never removed from the chart. For example, when the arc NP -> ART o ADJ N from 1 to 2 was
extended, producing the arc from 1 to 3, both arcs remained on the chart. This is necessary because
the arcs could be used again in a different way by another interpretation.
For the next word, can, three constituents, N1, AUX1, and V1 are created for its three
interpretations.
Entering N1: (can from 3 to 4)
No active arcs are added in step 3, but two are completed in step 4 by the arc extension algorithm,
producing two NPs that are added to the agenda: The first, an NP from 1 to 4, is constructed from
rule 2, while the second, an NP from 2 to 4, is constructed from rule 4. These NPs are now at the top
of the agenda.
Entering NP1 :an NP (the large can from 1 to 4)
Adding active arc S -> NP o VP from 1 to 4
Entering NP2: an NP (large can from 2 to 4)
Adding arc S -> NP o VP from 2 to 4
Entering AUX1: (can from 3 to 4)
Adding arc VP -> AUX o VP from 3 to 4
Entering V1: (can from 3 to 4)
Adding arc VP -> V o NP from 3 to 4

Figure 3.12 After parsing the large can

The chart is shown in Figure 3.12, which illustrates all the completed constituents (NP2, NP1,
ART1, ADJ1, N1, AUX1, V1) and all the uncompleted active arcs entered so far. The next word is can
again, and N2, AUX2, and V2 are created.
Entering N2: (can from 4 to 5, the second can)
Adds no active arcs
Entering AUX2: (can from 4 to 5)
Adds arc VP -> AUX o VP from 4 to 5
Entering V2: (can from 4 to 5)
Adds arc VP -> V o NP from 4 to 5
The next word is hold, and N3 and V3 are created.
Entering N3: (hold from 5 to 6)
Adds no active arcs
Entering V3: (hold from 5 to 6)
Adds arc VP -> V o NP from 5 to 6

The chart in Figure 3.13 shows all the completed constituents built so far, together with all the
active arcs, except for those used in the first NP.
Entering ART2: (the from 6 to 7)
Adding arc NP -> ART o ADJ N from 6 to 7
Adding arc NP -> ART o N from 6 to 7
Entering N4: (water from 7 to 8)
No active arcs added in step 3
An NP, NP3, from 6 to 8 is pushed onto the agenda, by completing arc
NP -> ART o N from 6 to 7
Entering NP3: (the water from 6 to 8)
A VP, VP1, from 5 to 8 is pushed onto the agenda, by completing
VP -> V o NP from 5 to 6
Adds arc S -> NP o VP from 6 to 8

The chart at this stage is shown in Figure 3.14, but only the active arcs to be used in the remainder
of the parse are shown.
Entering VP1: (hold the water from 5 to 8)
A VP, VP2, from 4 to 8 is pushed onto the agenda, by completing
VP -> AUX o VP from 4 to 5
Entering VP2: (can hold the water from 4 to 8)
An S, S1, is added from 1 to 8, by completing arc
S -> NP o VP from 1 to 4
A VP, VP3, is added from 3 to 8, by completing arc
VP -> AUX o VP from 3 to 4
An S, S2, is added from 2 to 8, by completing arc
S -> NP o VP from 2 to 4
Since you have derived an S covering the entire sentence, you can stop successfully. If you wanted
to find all possible interpretations for the sentence, you would continue parsing until the agenda
became empty. The chart would then contain as many S structures covering the entire set of
positions as there were different structural interpretations. In addition, this representation of the
entire set of structures would be more efficient than a list of interpretations, because the different S
structures might share common subparts represented in the chart only once. Figure 3.15 shows the
final chart.
Efficiency Considerations

Chart-based parsers can be considerably more efficient than parsers that rely only on a search
because the same constituent is never constructed more than once. For instance, a pure top-down
or bottom-up search strategy could require up to C^n operations to parse a sentence of length n,
where C is a constant that depends on the specific algorithm you use. Even if C is very small, this
exponential complexity rapidly makes the algorithm unusable. A chart-based parser, on the other
hand, in the worst case would build every possible constituent between every possible pair of
positions. This allows us to show that it has a worst-case complexity of K*n^3, where n is the length
of the sentence and K is a constant depending on the algorithm. Of course, a chart parser involves
more work in each step, so K will be larger than C. To contrast the two approaches, assume that C is
10 and that K is a hundred times worse, 1000. Given a sentence of 12 words, the brute force search
might take 10^12 operations (that is, 1,000,000,000,000), whereas the chart parser would take 1000
* 12^3 (that is, 1,728,000). Under these assumptions, the chart parser would be up to 500,000 times
faster than the brute force search on some examples!
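The arithmetic can be checked directly:

    brute_force = 10 ** 12        # C**n with C = 10, n = 12
    chart = 1000 * 12 ** 3        # K * n**3 with K = 1000, i.e. 1,728,000
    print(brute_force // chart)   # 578703 -- roughly the 500,000x claimed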

3.5 Transition Network Grammars

So far we have examined only one formalism for representing grammars, namely context-free
rewrite rules. Here we consider another formalism that is useful in a wide range of applications. It
is based on the notion of a transition network consisting of nodes and labeled arcs. One of the
nodes is specified as the initial state, or start state. Consider the network named NP in Grammar
3.16, with the initial state labeled NP and each arc labeled with a word category. Starting at the
initial state, you can traverse an arc if the current word in the sentence is in the category on the arc.
If the arc is followed, the current word is updated to the next word. A phrase is a legal NP if there is
a path from the node NP to a pop arc (an arc labeled pop) that accounts for every word in the
phrase. This network recognizes the same set of sentences as the following context-free grammar:

NP -> ART NP1
NP1 -> ADJ NP1
NP1 -> N
Consider parsing the NP a purple cow with this network. Starting at the node NP, you can follow the
arc labeled art, since the current word is an article— namely, a. From node NP1 you can follow the
arc labeled adj using the adjective purple, and finally, again from NP1, you can follow the arc labeled
noun using the noun cow. Since you have reached a pop arc, a purple cow is a legal NP.
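This traversal is straightforward to sketch in code; the arcs below mirror the NP network just described, and each word is assumed to have a single lexical category:

    # Lexicon and arcs for the NP network; "DONE" plays the role of the pop arc.
    LEXICON = {"a": "ART", "the": "ART", "purple": "ADJ", "cow": "N"}
    ARCS = {"NP":  [("ART", "NP1")],
            "NP1": [("ADJ", "NP1"), ("N", "DONE")]}

    def accepts_np(words):
        state = "NP"
        for word in words:
            for category, nxt in ARCS.get(state, []):
                if LEXICON.get(word) == category:   # traverse the arc
                    state = nxt
                    break
            else:
                return False      # no arc can be followed from this state
        return state == "DONE"    # every word accounted for, pop arc reached

    print(accepts_np(["a", "purple", "cow"]))   # True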
Simple transition networks are often called finite state machines (FSMs). Finite state
machines are equivalent in expressive power to regular grammars, and thus are not powerful
enough to describe all languages that can be described by a CFG. To get the descriptive power of
CFGs, you need a notion of recursion in the network grammar. A recursive transition network
(RTN) is like a simple transition network, except that it allows arc labels to refer to other networks
as well as word categories. Thus, given the NP network in Grammar 3.16, a network for simple
English sentences can be expressed as shown in Grammar 3.17. Uppercase labels refer to networks.
The arc from S to S1 can be followed only if the NP network can be successfully traversed to a pop
arc. Although not shown in this example, RTNs allow true recursion—that is, a network might have
an arc labeled with its own name.
Consider finding a path through the S network for the sentence The purple cow ate the grass.
Starting at node S, to follow the arc labeled NP, you need to traverse the NP network. Starting at
node NP, traverse the network as before for the input the purple cow. Following the pop arc in the
NP network, return to the S network and traverse the arc to node S1. From node S1 you follow the
arc labeled verb using the word ate. Finally, the arc labeled NP can be followed if you can traverse
the NP network again. This time the remaining input consists of the words the grass. You follow the
arc labeled art and then the arc labeled noun in the NP network; then take the pop arc from node
NP2 and then another pop from node S3. Since you have traversed the network and used all the
words in the sentence, The purple cow ate the grass is accepted as a legal sentence.

Arc Type Example How Used


CAT noun succeeds only if current word is of the named category
WRD of succeeds only if current word is identical to the label
PUSH NP succeeds only if named network can be successfully traversed
JUMP jump always succeeds
POP pop succeeds and signals the successful end of the network
Figure 3.18 The arc labels for RTNs

In practice, RTN systems incorporate some additional arc types that are useful but not formally
necessary; Figure 3.18 summarizes the arc types, together with the notation that will be used in
this book to indicate these arc types. According to this terminology, arcs that are labeled with
networks are called push arcs, and arcs labeled with word categories are called cat arcs. In
addition, an arc that can always be followed is called a jump arc.

Top-Down Parsing with Recursive Transition Networks

An algorithm for parsing with RTNs can be developed along the same lines as the algorithms for
parsing CFGs. The state of the parse at any moment can be represented by the following:
current position - a pointer to the next word to be parsed.
current node - the node at which you are located in the network.
return points - a stack of nodes in other networks where you will continue if you pop from the
current network.
Grammar 3.19

First, consider an algorithm for searching an RTN that assumes that if you can follow an arc, it will
be the
correct one in the final parse. Say you are in the middle of a parse and know the three pieces of
information just cited. You can leave the current node and traverse an arc in the following cases:
Case 1: If arc names word category and next word in sentence is in that category,
Then (1) update current position to start at the next word;
(2) update current node to the destination of the arc.
Case 2: If arc is a push arc to a network N,
Then (1) add the destination of the arc onto return points;
(2) update current node to the starting node in network N.
Case 3: If arc is a pop arc and return points list is not empty,
Then (1) remove first return point and make it current node.
Case 4: If arc is a pop arc, return points list is empty and there are no words left,
Then (1) parse completes successfully.
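These four cases translate almost directly into code. The sketch below follows the first arc that can be taken (the no-backtracking assumption stated above); the S and NP network tables are invented in the style of Grammars 3.16 and 3.17:

    # Node -> outgoing arcs, each (arc type, label, destination).
    NETWORKS = {
        "NP": {"NP":  [("cat", "ART", "NP1")],
               "NP1": [("cat", "ADJ", "NP1"), ("cat", "N", "NP2")],
               "NP2": [("pop", None, None)]},
        "S":  {"S":   [("push", "NP", "S1")],
               "S1":  [("cat", "V", "S2")],
               "S2":  [("push", "NP", "S3"), ("pop", None, None)],
               "S3":  [("pop", None, None)]},
    }
    LEXICON = {"the": "ART", "purple": "ADJ", "cow": "N", "ate": "V", "grass": "N"}

    def traverse(node, net, pos, words, returns):
        for kind, label, dest in NETWORKS[net][node]:
            # Case 1: cat arc, and the next word is in the named category.
            if kind == "cat" and pos < len(words) and LEXICON.get(words[pos]) == label:
                return traverse(dest, net, pos + 1, words, returns)
            # Case 2: push arc -- remember where to continue, enter the subnetwork.
            if kind == "push":
                return traverse(label, label, pos, words, [(dest, net)] + returns)
            # Case 3: pop arc with pending return points.
            if kind == "pop" and returns:
                (back, back_net), rest = returns[0], returns[1:]
                return traverse(back, back_net, pos, words, rest)
            # Case 4: pop arc, nothing to return to, all words consumed.
            if kind == "pop" and not returns and pos == len(words):
                return True
        return False   # no arc could be followed

    print(traverse("S", "S", 0, ["the", "purple", "cow", "ate", "the", "grass"], []))
    # True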
Grammar 3.19 shows a network grammar. The numbers on the arcs simply indicate the order in
which arcs will be tried when more than one arc leaves a node.
Figure 3.20 demonstrates that the grammar accepts the sentence
1 The 2 old 3 man 4 cried 5
by showing the sequence of parse states that can be generated by the algorithm. In the trace, each
arc is identified by the name of the node that it leaves plus the number identifier. Thus arc S/1 is
the arc labeled 1 leaving the S node. If you start at node S, the only possible arc to follow is the push
arc NP. As specified in case 2 of the algorithm, the new parse state is computed by setting the
current node to NP and putting node S1 on the return points list. From node NP, arc NP/1 is
followed and, as specified in case 1 of the algorithm, the input is checked for a word in category art.
Since this check succeeds, the arc is followed and the current position is updated (step 3). The
parse continues in this manner to step 5, when a pop arc is followed, causing the current node to be
reset to S1 (that is, the NP arc succeeded). The parse succeeds after finding a verb in step 6 and
following the pop arc from the S network in step 7.
In this example the parse succeeded because the first arc that succeeded was ultimately the
correct one in every case. However, with a sentence like The green faded, where green can be an
adjective or a noun, this algorithm would fail because it would initially classify green as an adjective
and then not find a noun following. To be able to recover from such failures, we save all possible
backup states as we go along, just as we did with the CFG top-down parsing algorithm.
Consider this technique in operation on the following sentence:
1 One 2 saw 3 the 4 man 5
The parser initially attempts to parse the sentence as beginning with the NP one saw, but after
failing to find a verb, it backtracks and finds a successful parse starting with the NP one. The trace
of the parse is shown in Figure 3.21, where at each stage the current parse state is shown in the
form of a triple (current node, current position, return points), together with possible states for
backtracking. The figure also shows the arcs used to generate the new state and backup states.

This trace behaves identically to the previous example except in two places. In step 2, two arcs
leaving node NP could accept the word one. Arc NP/2 classifies one as a number and produces the
next current state. Arc NP/3 classifies it as a pronoun and produces a backup state. This backup
state is actually used later in step 6 when it is found that none of the arcs leaving node S1 can
accept the input word the.
Of course, in general, many more backup states are generated than in this simple example.
In these cases there will be a list of possible backup states. Depending on how this list is organized,
you can produce different orderings on when the states are examined.
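To make the backtracking concrete, here is a backtracking variant of the earlier sketch (same assumed toy encoding, not the book's grammar): every applicable arc yields a successor state, and untried alternatives wait on a stack of backup states, giving a depth-first search.

def successors(state, words):
    """Yield every parse state reachable by one arc from `state`."""
    pos, node, returns = state
    for arc_type, label, dest in ARCS[node]:
        if arc_type == "cat" and pos < len(words) and label in LEXICON.get(words[pos], ()):
            yield (pos + 1, dest, returns)
        elif arc_type == "push":
            yield (pos, label, (dest,) + returns)   # immutable tuples: states can be saved safely
        elif arc_type == "pop" and returns:
            yield (pos, returns[0], returns[1:])
        elif arc_type == "pop" and not returns and pos == len(words):
            yield "success"

def parse(words):
    backups = [(0, "S", ())]                        # (position, node, return points)
    while backups:
        for nxt in successors(backups.pop(), words):
            if nxt == "success":
                return True
            backups.append(nxt)                     # later alternatives become backup states
    return False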
Step  Current State     Arc to be Followed          Backup States
1.    (S, 1, NIL)       S/1                         NIL
2.    (NP, 1, (S1))     NP/2 (& NP/3 for backup)    NIL
3.    (NP1, 2, (S1))    NP1/2                       (NP2, 2, (S1))
4.    (NP2, 3, (S1))    NP2/1                       (NP2, 2, (S1))
5.    (S1, 3, NIL)      no arc can be followed      (NP2, 2, (S1))
6.    (NP2, 2, (S1))    NP2/1                       NIL
7.    (S1, 2, NIL)      S1/1                        NIL
8.    (S2, 3, NIL)      S2/2                        NIL
9.    (NP, 3, (S2))     NP/1                        NIL
10.   (NP1, 4, (S2))    NP1/2                       NIL
11.   (NP2, 5, (S2))    NP2/1                       NIL
12.   (S2, 5, NIL)      S2/1                        NIL
13.   Parse succeeds                                NIL
Figure 3.21 A top-down RTN parse with backtracking

An RTN parser can be constructed to use a chart-like structure to gain the advantages of chart
parsing. In RTN systems, the chart is often called the well-formed substring table (WFST). Each
time a pop is followed, the constituent is placed on the WFST, and every time a push is found, the
WFST is checked before the sub-network is invoked. If the chart contains constituent(s) of the type
being pushed for, these are used and the sub-network is not reinvoked. An RTN using a WFST has
the same complexity as the chart parser described in the last section: K*n^3, where n is the length of
the sentence.
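A rough sketch of the WFST bookkeeping (the table layout below is our assumption; the text does not fix a representation): index completed constituents by category and start position, record on every pop, and consult before every push.

wfst = {}   # (category, start_position) -> set of end positions already found

def record_pop(category, start, end):
    # called whenever a subnetwork pops: remember the completed constituent
    wfst.setdefault((category, start), set()).add(end)

def before_push(category, start):
    # called before invoking a subnetwork: reuse cached constituents if any
    return wfst.get((category, start))   # None means the subnetwork must be run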

3.6 Top-Down Chart Parsing

So far, you have seen a simple top-down method and a bottom-up chart-based method for parsing
context-free grammars. Each of the approaches has its advantages and disadvantages. In this
section a new parsing method is presented that actually captures the advantages of both. But first,
consider the pluses and minuses of the approaches.
Top-down methods have the advantage of being highly predictive. A word might be
ambiguous in isolation, but if some of those possible categories cannot be used in a legal sentence,
then these categories may never even be considered. For example, consider Grammar 3.8 in a top-
down parse of the sentence The can holds the water, where can may be an AUX, V, or N, as before.
The top-down parser would rewrite (S) to (NP VP) and then rewrite the NP to produce three
possibilities, (ART ADJ N VP), (ART N VP), and (ADJ N VP). Taking the first, the parser checks if the
first word, the, can be an ART, and then if the next word, can, can be an ADJ, which fails. Trying the
next possibility, the parser checks the again, and then checks if can can be an N, which succeeds.
The interpretations of can as an auxiliary and a main verb are never considered because no
syntactic tree generated by the grammar would ever predict an AUX or V in this position. In
contrast, the bottom-up parser would have considered all three interpretations of can from the
start - that is, all three would be added to the chart and would combine with active arcs. Given this
argument, the top-down approach seems more efficient.
On the other hand, consider that the top-down parser in the example above needed to check that
the word the was an ART twice, once for each rule. This duplication of effort is very common in
pure top-down approaches and becomes a serious problem, as large constituents may be rebuilt
again and again as they are used in different rules. In contrast, the bottom-up parser only checks
the input once, and only builds each constituent exactly once. So by this argument, the bottom-up
approach appears more efficient.
You can gain the advantages of both by combining the methods. A small variation in the
bottom-up chart algorithm yields a technique that is predictive like the top-down approaches yet
avoids any duplication of work, just as the bottom-up approach does.
As before, the algorithm is driven by an agenda of completed constituents and the arc
extension algorithm, which combines active arcs with constituents when they are added to the
chart. While both use the technique of extending arcs with constituents, the difference is in how
new arcs are generated from the grammar. In the bottom-up approach, new active arcs are
generated whenever a completed constituent is added that could be the first constituent of the
right-hand side of a rule. With the top-down approach, new active arcs are generated whenever a
new active arc is added to the chart, as described in the top-down arc introduction algorithm
shown in Figure 3.22. The parsing algorithm is then easily stated, as is also shown in Figure 3.22.

Top-Down Arc Introduction Algorithm


To add an arc S -> C1 ... o Ci ... Cn ending at position j, do the following:
For each rule in the grammar of form Ci -> X1 ... Xk, recursively add the new arc Ci -> o X1
... Xk from position j to j.
Top-Down Chart Parsing Algorithm
Initialization: For every rule in the grammar of form S -> X1 ... Xk, add an arc labeled S -> o
X1 ... Xk using the arc introduction algorithm.
Parsing: Do until there is no input left:
1. If the agenda is empty, look up the interpretations of the next word and add them
to the agenda.
2. Select a constituent from the agenda (call it constituent C).
3. Using the arc extension algorithm, combine C with every active arc on the chart. Any
new constituents are added to the agenda.
4. For any active arcs created in step 3, add them to the chart using the top-down arc
introduction algorithm.
Figure 3.22 The top-down arc introduction and chart parsing algorithms
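A Python sketch of the arc introduction step may help, assuming rules are (lhs, rhs) pairs and an active arc is a (lhs, rhs, dot, start, end) tuple; the duplicate check in it also keeps left-recursive rules from looping forever, a detail the prose statement glosses over. The toy grammar mirrors the NP/VP rules of the running example.

GRAMMAR = [("S", ("NP", "VP")),
           ("NP", ("ART", "ADJ", "N")), ("NP", ("ART", "N")), ("NP", ("ADJ", "N")),
           ("VP", ("AUX", "VP")), ("VP", ("V", "NP"))]

def add_arc_top_down(arc, chart):
    """Add an active arc and recursively seed arcs for the category after its dot."""
    if arc in chart:               # already known: stop (also prevents left-recursion loops)
        return
    chart.add(arc)
    lhs, rhs, dot, start, end = arc
    if dot < len(rhs):             # the arc still needs a constituent of category rhs[dot]
        for head, body in GRAMMAR:
            if head == rhs[dot]:   # seed an empty arc for that category at position `end`
                add_arc_top_down((head, body, 0, end, end), chart)

chart = set()
add_arc_top_down(("S", ("NP", "VP"), 0, 1, 1), chart)   # the initialization step
# chart now also holds the three empty NP arcs from 1 to 1, as in Figure 3.23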

Consider this new algorithm operating with the same grammar on the same sentence as in Section
3.4, namely The large can can hold the water. In the initialization stage, an arc labeled S -> o NP VP
is added. Then, active arcs for each rule that can derive an NP are added: NP -> o ART ADJ N,
NP -> o ART N, and NP -> o ADJ N are all added from position 1 to 1. Thus the initialized chart is as
shown in Figure 3.23. The trace of the parse is as follows:
Entering ART1 (the) from 1 to 2
Two arcs can be extended by the arc extension algorithm
NP -> ART o N from 1 to 2
NP -> ART o ADJ N from 1 to 2
Entering ADJ1 (large) from 2 to 3
One arc can be extended
NP -> ART ADJ o N from 1 to 3
Entering AUX1 (can) from 3 to 4
No activity, constituent is ignored
Entering V1 (can) from 3 to 4
No activity, constituent is ignored
Entering N1 (can) from 3 to 4
One arc extended and completed yielding
NP1 from 1 to 4 (the large can)
Entering NP1 from 1 to 4
One arc can be extended
S -> NP o VP from 1 to 4
Using the top-down rule (step 4), new active arcs are added for VP
VP -> o AUX VP from 4 to 4
VP -> o V NP from 4 to 4

At this stage, the chart is as shown in Figure 3.24. Compare this with Figure 3.10. It contains
fewer completed constituents since only those that are allowed by the top-down filtering have been
constructed.
The algorithm continues, adding the three interpretations of can as an AUX, V, and N. The
AUX reading extends the VP -> o AUX VP arc at position 4 and adds active arcs for a new VP starting
at position 5. The V reading extends the VP -> o V NP arc and adds active arcs for an NP starting at
position 5. The N reading does not extend any arc and so is ignored. After the two readings of hold
(as an N and V) are added, the chart is as shown in Figure 3.25. Again, compare with the
corresponding chart for the bottom-up parser in Figure 3.13. The rest of the sentence is parsed
similarly, and the final chart is shown in Figure 3.26. In comparing this to the final chart produced
by the bottom-up parser (Figure 3.15), you see that the number of constituents generated has
dropped from 21 to 13. While it is not a big difference here with such a simple grammar, the
difference can be dramatic with a sizable grammar.

It turns out in the worst-case analysis that the top-down chart parser is no more efficient than the
pure bottom-up chart parser. Both have a worst-case complexity of K*n^3 for a sentence of length n.
In practice, however, the top-down method is considerably more efficient for any reasonable
grammar.

3.7 Feature Systems and Augmented Grammars

Context-free grammars provide the basis for most of the computational parsing mechanisms
developed to date, but as they have been described so far, they would be very inconvenient for
capturing natural languages. This chapter describes an extension to the basic context-free
mechanism that defines constituents by a set of features. This extension allows aspects of natural
language such as agreement and sub-categorization to be handled in an intuitive and concise way.
In natural languages there are often agreement restrictions between words and phrases. For
example, the NP "a men" is not correct English because the article a indicates a single object while
the noun "men" indicates a plural object; the noun phrase does not satisfy the number agreement
restriction of English. There are many other forms of agreement, including subject-verb agreement,
gender agreement for pronouns, restrictions between the head of a phrase and the form of its
complement, and so on. To handle such phenomena conveniently, the grammatical formalism is
extended to allow constituents to have features. For example, we might define a feature NUMBER
that may take a value of either s (for singular) or p (for plural), and we then might write an
augmented CFG rule such as
NP -> ART N only when NUMBER1 agrees with NUMBER2
This rule says that a legal noun phrase consists of an article followed by a noun, but only when the
number feature of the first word agrees with the number feature of the second. This one rule is
equivalent to two CFG rules that would use different terminal symbols for encoding singular and
plural forms of all noun phrases, such as
NP-SING -> ART-SING N-SING
NP-PLURAL -> ART-PLURAL N-PLURAL
While the two approaches seem similar in ease-of-use in this one example, consider that all rules in
the grammar that use an NP on the right-hand side would now need to be duplicated to include a
rule for NP-SING and a rule for NP-PLURAL, effectively doubling the size of the grammar. And
handling additional features, such as person agreement, would double the size of the grammar
again and again. Using features, the size of the augmented grammar remains the same as the
original one yet accounts for agreement constraints.
To accomplish this, a constituent is defined as a feature structure - a mapping from features
to values that defines the relevant properties of the constituent. In the examples in this book,
feature names in formulas will be written in boldface. For example, a feature structure for a
constituent ART1 that represents a particular use of the word a might be written as follows:
ART1: (CAT ART
ROOT a
NUMBER s)
This says it is a constituent in the category ART that has as its root the word a and is singular.
Usually an abbreviation is used that gives the CAT value more prominence and provides an
intuitive tie back to simple context-free grammars. In this abbreviated form, constituent ART1
would be written as
ART1: (ART ROOT a NUMBER s)
Feature structures can be used to represent larger constituents as well. To do this, feature
structures themselves can occur as values. Special features based on the integers - 1, 2, 3, and so on
- will stand for the first sub-constituent, second sub-constituent, and so on, as needed. With this, the
representation of the NP constituent for the phrase "a fish" could be
NP1: (NP NUMBER s
1 (ART ROOT a
NUMBER s)
2 (N ROOT fish
NUMBER s))
Note that this can also be viewed as a representation of a parse tree shown in Figure 3.27, where
the subconstituent features 1 and 2 correspond to the subconstituent links in the tree.
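Concretely, feature structures map naturally onto nested Python dicts (a rendering convention of ours, not notation from the text), with the integer features serving as the sub-constituent links:

# The NP1 structure for "a fish", rendered as nested dicts.
ART1 = {"CAT": "ART", "ROOT": "a",    "NUMBER": "s"}
N1   = {"CAT": "N",   "ROOT": "fish", "NUMBER": "s"}
NP1  = {"CAT": "NP",  "NUMBER": "s", 1: ART1, 2: N1}   # 1, 2 = sub-constituent links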
The rules in an augmented grammar are stated in terms of feature structures rather than
simple categories. Variables are allowed as feature values so that a rule can apply to a wide range of
situations. For example, a rule for simple noun phrases would be as follows:
(NP NUMBER ?n) -> (ART NUMBER ?n) (N NUMBER ?n)
This says that an NP constituent can consist of two subconstituents, the first being an ART and the
second being an N, in which the NUMBER feature in all three constituents is identical. According to
this rule, constituent NP1 given previously is a legal constituent. On the other hand, the constituent

Figure 3.27 Viewing a feature structure as an extended parse tree


* (NP 1 (ART NUMBER s)
2 (N NUMBER s))
is not allowed by this rule because there is no NUMBER feature in the NP, and the constituent
*(NP NUMBER s
1 (ART NUMBER s)
2 (N NUMBER p))
is not allowed because the NUMBER feature of the N constituent is not identical to the other two
NUMBER features.
Variables are also useful in specifying ambiguity in a constituent. For instance, the word fish
is ambiguous between a singular and a plural reading. Thus the word might have two entries in the
lexicon that differ only by the value of the NUMBER feature. Alternatively, we could define a single
entry that uses a variable as the value of the NUMBER feature, that is,
(N ROOT fish NUMBER ?n)
This works because any value of the NUMBER feature is allowed for the word fish. In many cases,
however, not just any value would work; rather, a range of values is possible. To handle these cases,
we introduce constrained variables, which are variables that can only take a value out of a specified
list. For example, the variable ?n{s p} would be a variable that can take the value s or the value p.
Typically, when we write such variables, we will drop the variable name altogether and just list the
possible values. Given this, the word fish might be represented by the constituent
(N ROOT fish NUMBER ?n{s p})
or more simply as
(N ROOT fish NUMBER {s p})
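In code, one convenient rendering (again an assumption, not the book's notation) treats every feature value as the set of values it may still take; a constrained variable like ?n{s p} is then a two-element set, and agreement checking reduces to set intersection:

a_entry    = {"CAT": "ART", "ROOT": "a",    "NUMBER": {"s"}}
fish_entry = {"CAT": "N",   "ROOT": "fish", "NUMBER": {"s", "p"}}

def number_agreement(art, noun):
    common = art["NUMBER"] & noun["NUMBER"]   # empty set means the rule fails
    return common or None

# number_agreement(a_entry, fish_entry) -> {"s"}: "a fish" is forced singular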
There is an interesting issue of whether an augmented context-free grammar can describe
languages that cannot be described by a simple context-free grammar. The answer depends on the
constraints on what can be a feature value. If the set of feature values is finite, then it would always
be possible to create new constituent categories for every combination of features. Thus it is
expressively equivalent to a context-free grammar. If the set of feature values is unconstrained,
however, then such grammars have arbitrary computational power. In practice, even when the set
of values is not explicitly restricted, this power is not used, and the standard parsing algorithms can
be used on grammars that include features.

3.8 Some Basic Feature Systems for English


This section describes some basic feature systems that are commonly used in grammars of English
and develops the particular set of features used throughout this book. Specifically, it considers
number and person agreement, verb form features, and features required to handle
subcategorization constraints. You should read this to become familiar with the features in general
and then refer back to it later when you need a detailed specification of a particular feature.

Person and Number Features


In the previous section, you saw the number system in English: Words may be classified as to
whether they can describe a single object or multiple objects. While number agreement restrictions
occur in several different places in English, they are most importantly found in subject-verb
agreement. But subjects and verbs must also agree on another dimension, namely with respect to
the person. The possible values of this dimension are
First Person (1): The noun phrase refers to the speaker, or a group of people including the
speaker (for example, I, we, you and I).
Second Person (2): The noun phrase refers to the listener, or a group including the listener
but not including the speaker (for example, you, all of you).
Third Person (3): The noun phrase refers to one or more objects, not including the speaker
or hearer.

Since number and person features always co-occur, it is convenient to combine the two into a
single feature, AGR, that has six possible values: first person singular (1s), second person singular
(2s), third person singular (3s), and first, second, and third person plural (1p, 2p, and 3p,
respectively). For example, an instance of the word is can agree only with a third person singular
subject, so its AGR feature would be 3s. An instance of the word are, however, may agree with
second person singular or any of the plural forms, so its AGR feature would be a variable ranging
over the values {2s 1p 2p 3p}.

Verb-Form Features and Verb Subcategorization


Another very important feature system in English involves the form of the verb. This feature is used
in many situations, such as the analysis of auxiliaries and generally in the subcategorization
restrictions of many head words. As described in Chapter 2, there are five basic forms of verbs. The
feature system for verb forms will be slightly more complicated in order to conveniently capture
certain phenomena. In particular, we will use the following feature values for the feature VFORM:
base - base form (for example, go, be, say, decide)
pres - simple present tense (for example, go, goes, am, is, say, says, decide)
past - simple past tense (for example, went, was, said, decided)
fin - finite (that is, a tensed form, equivalent to {pres past})
ing - present participle (for example, going, being, saying, deciding)
pastprt - past participle (for example, gone, been, said, decided)
inf - a special feature value that is used for infinitive forms with the word to

Figure 3.28 The SUBCAT values for NP and VP combinations

To handle the interactions between words and their complements, an additional feature, SUBCAT,
is used. Chapter 2 described some common verb sub-categorization possibilities. Each one will
correspond to a different value of the SUBCAT feature. Figure 3.28 shows some SUBCAT values for
complements consisting of combinations of NPs and VPs. To help you remember the meaning of the
feature values, they are formed by listing the main category of each part of the complement. If the
category is restricted by a feature value, then the feature value follows the constituent separated by
a colon. Thus the value _np_vp:inf will be used to indicate a complement that consists of an NP
followed by a VP with VFORM value inf. Of course, this naming is just a convention to help the
reader; you could give these values any arbitrary name, since their significance is determined solely
by the grammar rules that involve the feature. For instance, the rule for verbs with a SUBCAT value
of _np_vp:inf would be
(VP) -> (V SUBCAT _np_vp:inf)
(NP)
(VP VFORM inf)
This says that a VP can consist of a V with SUBCAT value _np_vp:inf, followed by an NP, followed by
a VP with VFORM value inf. Clearly, this rule could be rewritten using any other unique symbol
instead of _np_vp:inf, as long as the lexicon is changed to use this new value.
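Under the same dict rendering used earlier (our assumption), such a rule is naturally just data: a left-hand-side pattern plus an ordered list of right-hand-side patterns, with the SUBCAT value acting as the opaque token that the rule and the lexicon agree on.

RULE_NP_VP_INF = {
    "lhs": {"CAT": "VP"},
    "rhs": [{"CAT": "V", "SUBCAT": "_np_vp:inf"},   # the verb selects this frame
            {"CAT": "NP"},                          # the object NP
            {"CAT": "VP", "VFORM": "inf"}],         # the infinitive complement
}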
Many verbs have complement structures that require a prepositional phrase with a
particular preposition, or one that plays a particular role. For example, the verb give allows a
complement consisting of an NP followed by a PP using the preposition to, as in "Jack gave the
money to the bank". Other verbs, such as "put", require a prepositional phrase that describes a
location, using prepositions such as "in", "inside", "on", and "by". To express this within the feature
system, we introduce a feature PFORM on prepositional phrases. A prepositional phrase with a
PFORM value such as TO must have the preposition to as its head, and so on. A prepositional phrase
with a PFORM value LOC must describe a location. Another useful PFORM value is MOT, used with
verbs such as walk, which may take a prepositional phrase that describes some aspect of a path, as
in We walked to the store. Prepositions that can create such phrases include to, from, and along.
The LOC and MOT values might seem hard to distinguish, as certain prepositions might describe
either a location or a path, but they are distinct. For example, while Jack put the box {in on by} the
corner is fine, *Jack put the box {to from along} the corner is ill-formed. Figure 3.29 summarizes the
PFORM feature.

Figure 3.29 Some values of the PFORM feature for prepositional phrases

Figure 3.30 Additional SUBCAT values

This feature can be used to restrict the complement forms for various verbs. Using the
naming convention discussed previously, the SUBCAT value of a verb such as put would be
_np_pp:loc, and the appropriate rule in the grammar would be
(VP) -> (V SUBCAT _np_pp:loc)
(NP)
(PP PFORM LOC)
For embedded sentences, a complementizer is often needed and must be subcategorized for. Thus a
COMP feature with possible values for, that, and no-comp will be useful. For example, the verb tell
can subcategorize for an S that has the complementizer that. Thus one SUBCAT value of tell will be
_s:that. Similarly, the verb wish subcategorizes for an S with the complementizer for, as in We
wished for the rain to stop. Thus one value of the SUBCAT feature for wish is _s:for. Figure 3.30 lists
some of these additional SUBCAT values and examples for a variety of verbs. In this section, all the
examples with the SUBCAT feature have involved verbs, but nouns, prepositions, and adjectives
may also use the SUBCAT feature and subcategorize for their complements in the same way.

Binary Features
Certain features are binary in that a constituent either has or doesn’t have the feature. In our
formalization a binary feature is simply a feature whose value is restricted to be either + or -. For
example, the INV feature is a binary feature that indicates whether or not an S structure has an
inverted subject (as in a yes/no question). The S structure for the sentence Jack laughed will have
an INV value -, whereas the S structure for the sentence Did Jack laugh? will have the INV value +.
Often, the value is used as a prefix, and we would say that a structure has the feature +INV or -INV.
Other binary features will be introduced as necessary throughout the development of the
grammars.

The Default Value for Features


It will be useful on many occasions to allow a default value for features. Any time a constituent is
constructed that could have a feature, but a value is not specified, the feature takes the default
value of -. This is especially useful for binary features but is used for non-binary features as well;
this usually ensures that any later agreement check on the feature will fail. The default value is
inserted when the constituent is first constructed.

3.9 Morphological Analysis and the Lexicon

Before you can specify a grammar, you must define the lexicon. This section explores some
issues in lexicon design and the need for a morphological analysis component.
The lexicon must contain information about all the different words that can be used,
including all the relevant feature value restrictions. When a word is ambiguous, it may be described
by multiple entries in the lexicon, one for each different use.
Because words tend to follow regular morphological patterns, however, many forms of
words need not be explicitly included in the lexicon. Most English verbs, for example, use the same
set of suffixes to indicate different forms: -s is added for third person singular present tense, -ed for
past tense, -ing for the present participle, and so on. Without any morphological analysis, the
lexicon would have to contain every one of these forms. For the verb want this would require six
entries, for want (both in base and present form), wants, wanting, and wanted (both in past and
past participle form).
In contrast, by using the suffix-stripping methods described in the previous section, there
needs to be only one entry for want. The idea is to store the base form of the verb in the lexicon and use
context-free rules to combine verbs with suffixes to derive the other entries. Consider the following
rule for present tense verbs:
(V ROOT ?r SUBCAT ?s VFORM pres AGR 3s) ->
(V ROOT ?r SUBCAT ?s VFORM base) (+S)
where +S is a new lexical category that contains only the suffix morpheme -s. This rule, coupled
with the lexicon entry
want:(V ROOT want
SUBCAT {_vp:inf _np_vp:inf}
VFORM base)
would produce the following constituent given the input string want -s
want:(V ROOT want
SUBCAT {_vp:inf _np_vp:inf}
VFORM pres
AGR 3s)
Another rule would generate the constituents for the present tense form not in third person
singular, which for most verbs is identical to the root form:
(V ROOT ?r SUBCAT ?s VFORM pres AGR {1s 2s 1p 2p 3p})
-> (V ROOT ?r SUBCAT ?s VFORM base)
But this rule needs to be modified in order to avoid generating erroneous interpretations.
Currently, it can transform any base form verb into a present tense form, which is clearly wrong for
some irregular verbs. For instance, the base form be cannot be used as a present form (for example,
*We be at the store). To cover these cases, a feature is introduced to identify irregular forms.
Specifically, verbs with the binary feature +IRREG-PRES have irregular present tense forms. Now
the rule above can be stated correctly:
(V ROOT ?r SUBCAT ?s VFORM pres AGR {1s 2s 1p 2p 3p})
-> (V ROOT ?r SUBCAT ?s VFORM base IRREG-PRES -)
Because of the default mechanism, the IRREG-PRES feature need only be specified on the
irregular verbs. The regular verbs default to -, as desired. Similar binary features would be needed
to flag irregular past forms (IRREG-PAST, such as saw), and to distinguish -en past participles from
-ed past participles (EN-PASTPRT). These features restrict the application of the standard lexical
rules, and the irregular forms are added explicitly to the lexicon. Grammar 4.5 gives a set of rules
for deriving different verb and noun forms using these features.

Grammar 4.5 Some lexical rules for common suffixes on verbs and nouns
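As a sketch of how such a derivation might run in code (the entry format, the single rule, and the crude suffix handling below are simplifying assumptions, not Grammar 4.5 itself):

LEX = {"want": {"CAT": "V", "ROOT": "want", "VFORM": "base",
                "SUBCAT": {"_vp:inf", "_np_vp:inf"}}}

def present_3s(entry):
    """V[base, IRREG-PRES -] + -s -> V[pres, AGR 3s]; irregular verbs are blocked."""
    if (entry["CAT"] == "V" and entry["VFORM"] == "base"
            and entry.get("IRREG-PRES", "-") == "-"):   # the default value mechanism
        derived = dict(entry)
        derived.update(VFORM="pres", AGR="3s")
        return derived
    return None

def analyze(word):
    if word.endswith("s") and word[:-1] in LEX:         # crude stand-in for the suffix-stripping FST
        return present_3s(LEX[word[:-1]])
    return LEX.get(word)

# analyze("wants") yields the pres/3s entry; an entry carrying "IRREG-PRES": "+"
# (e.g. be) would make present_3s return None, blocking the derivation.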
Given a large set of features, the task of writing lexical entries appears very difficult. Most
frameworks allow some mechanisms that help alleviate these problems. The first technique -
allowing default values for features - has already been mentioned. With this capability, if an entry
takes a default value for a given feature, then it need not be explicitly stated. Another commonly
used technique is to allow the lexicon writer to define clusters of features, and then indicate a
cluster with a single symbol rather than listing them all. Later, additional techniques will be
discussed that allow the inheritance of features in a feature hierarchy.
Figure 3.31 contains a small lexicon. It contains many of the words to be used in the
examples that follow. It contains three entries for the word "saw" - as a noun, as a regular verb, and
as the irregular past tense form of the verb "see" - as illustrated in the sentences
The saw was broken.
Jack wanted me to saw the board in half.
I saw Jack eat the pizza.
With an algorithm for stripping the suffixes and regularizing the spelling, as described in the
previous section, the derived entries can be generated using any of the basic parsing algorithms on Grammar
4.5. With the lexicon in Figure 3.31 and Grammar 4.5, correct constituents for the following words
can be derived: been, being, cries, cried, crying, dogs, saws (two interpretations), sawed, sawing,
seen, seeing, seeds, wants, wanting, and wanted. For example, the word cries would be transformed
into the sequence cry +s, and then rule 1 would produce the present tense entry from the base form
in the lexicon.
Figure 3.31 A lexicon

Often a word will have multiple interpretations that use different entries and different
lexical rules. The word saws, for instance, transformed into the sequence saw +s, can be a plural
noun (via rule 7 and the first entry for saw), or the third person present form of the verb saw (via
rule 1 and the second entry for saw). Note that rule 1 cannot apply to the third entry, as its VFORM
is not base.
The success of this approach depends on being able to prohibit erroneous derivations, such
as analyzing seed as the past tense of the verb "see". This analysis will never be considered if the
FST that strips suffixes is correctly designed. Specifically, the word see will not allow a transition to
the states that allow the -ed suffix. But even if this were produced for some reason, the IRREG-PAST
value + in the entry for see would prohibit rule 3 from applying.
3.10 Parsing with Features
The parsing algorithms developed in Chapter 3 for context-free grammars can be extended to
handle augmented context-free grammars. This involves generalizing the algorithm for matching
rules to constituents. For instance, the chart-parsing algorithms developed in Chapter 3 all used an
operation for extending active arcs with a new constituent. A constituent X could extend an arc of
the form
C -> C1 ... Ci o X ... Cn
to produce a new arc of the form
C -> C1 ... Ci X o ... Cn
A similar operation can be used for grammars with features, but the parser may have to instantiate
variables in the original arc before it can be extended by X. The key to defining this matching
operation precisely is to remember the definition of grammar rules with features. A rule such as
1. (NP AGR ?a) -> o (ART AGR ?a) (N AGR ?a)
says that an NP can be constructed out of an ART and an N if all three agree on the AGR feature. It
does not place any restrictions on any other features that the NP, ART, or N may have. Thus, when
matching constituents against this rule, the only thing that matters is the AGR feature. All other
features in the constituent can be ignored. For instance, consider extending arc 1 with the
constituent
2. (ART ROOT A AGR 3s)
To make arc 1 applicable, the variable ?a must be instantiated to 3s, producing
3. (NP AGR 3s) -> o (ART AGR 3s) (N AGR 3s)
This arc can now be extended because every feature in the rule is in constituent 2:
4. (NP AGR 3s) -> (ART AGR 3s) o (N AGR 3s)
Now, consider extending this arc with the constituent for the word dog:
5. (N ROOT DOG1 AGR 3s)
This can be done because the AGR features agree. This completes the arc
6. (NP AGR 3s) -> (ART AGR 3s) (N AGR 3s)
This means the parser has found a constituent of the form (NP AGR 3s).
This algorithm can be specified more precisely as follows: Given an arc A, where the
constituent following the dot is called NEXT, and a new constituent X, which is being used to extend
the arc,
a. Find an instantiation of the variables such that all the features specified in NEXT are found
in X.
b. Create a new arc A', which is a copy of A except for the instantiations of the variables
determined in step (a).
c. Update A' as usual in a chart parser.
For instance, let A be arc 1, and X be the ART constituent 2. Then NEXT will be (ART AGR ?a). In
step a, NEXT is matched against X, and you find that ?a must be instantiated to 3s. In step b, a new
copy of A is made, which is shown as arc 3. In step c, the arc is updated to produce the new arc
shown as arc 4.
When constrained variables, such as ?a{3s 3p}, are involved, the matching proceeds in the
same manner, but the variable binding must be one of the listed values. If a variable is used in a
constituent, then one of its possible values must match the requirement in the rule. If both the rule
and the constituent contain variables, the result is a variable ranging over the intersection of their
allowed values.
For instance, consider extending arc 1 with the constituent (ART ROOT the AGR ?v{3s 3p}),
that is, the word "the". To apply, the variable ?a would have to be instantiated to ?v{3s 3p},
producing the rule
(NP AGR ?v{3s 3p}) -> (ART AGR ?v{3s 3p}) o (N AGR ?v{3s 3p})
This arc could be extended by (N ROOT dog AGR 3s), because ?v{3s 3p} could be instantiated by the
value 3s. The resulting arc would be identical to arc 6. The entry in the chart for the is not changed
by this operation. It still has the value ?v{3s 3p}. The AGR feature is restricted to 3s only in the arc.
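A sketch of steps (a) and (b) under the set-valued rendering of variables used earlier (an assumed convention): matching NEXT against X intersects the allowed values feature by feature and fails on an empty intersection. For brevity, this version narrows each feature independently and omits the extra bookkeeping needed when one variable such as ?a is shared across several constituents of a rule.

def match(next_spec, constituent):
    """Return the narrowed value for each feature NEXT mentions, or None on failure."""
    def as_set(v):
        return v if isinstance(v, set) else {v}
    narrowed = {}
    for feat, required in next_spec.items():
        if feat not in constituent:
            return None                          # NEXT requires a feature X lacks
        common = as_set(required) & as_set(constituent[feat])
        if not common:
            return None                          # e.g. AGR 3s against AGR 3p
        narrowed[feat] = common
    return narrowed

the_art = {"CAT": "ART", "ROOT": "the", "AGR": {"3s", "3p"}}
# match({"CAT": "ART", "AGR": {"3s", "3p"}}, the_art)
#   -> {"CAT": {"ART"}, "AGR": {"3s", "3p"}}
# the chart entry `the_art` itself is left untouched, as the text notes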
Another extension is useful for recording the structure of the parse. Subconstituent features (1, 2,
and so on, depending on which subconstituent is being added) are automatically inserted by the
parser each time an arc is extended. The values of these features name subconstituents already in
the chart.
With this treatment, and assuming that the chart already contains two constituents, ART1
and N1, for the words the and dog, the constituent added to the chart for the phrase the dog would
be
(NP AGR 3s
1 ART1
2 N1)
where ART1 = (ART ROOT the AGR {3s 3p}) and N1 = (N ROOT dog AGR {3s}). Note that the AGR
feature of ART1 was not changed. Thus it could be used with other interpretations that require the
value 3p if they are possible. Any of the chart-parsing algorithms described in Chapter 3 can now be
used with an augmented grammar by using these extensions to extend arcs and build constituents.
Consider an example. Figure 3.32 contains the final chart produced from parsing the sentence He
wants to cry using Grammar 4.8. The rest of this section considers how some of the non-terminal
symbols were constructed for the chart.
Grammar 4.8 A grammar for parsing with features
Figure 3.32 The chart for "He wants to cry".

Constituent NP1 was constructed by rule 3, repeated here for convenience:


3. (NP AGR ?a) -> (PRO AGR ?a)
To match the constituent PRO1, the variable ?a must be instantiated to 3s. Thus the new
constituent built is
NP1: (CAT NP
AGR 3s
1 PRO1)
Next consider constructing constituent VP1 using rule 4, namely
4. (VP AGR ?a VFORM ?v) -> (V SUBCAT _none AGR ?a VFORM ?v)
For the right-hand side to match constituent V2, the variable ?v must be instantiated to base. The
AGR feature of V2 is not defined, so it defaults to -. The new constituent is
VP1: (CAT VP
AGR -
VFORM base
1 V2)
Generally, default values are not shown in the chart. In a similar way, constituent VP2 is built from
TO1 and VP1 using rule 9, VP3 is built from V1 and VP2 using rule 6, and S1 is built from NP1 and
VP3 using rule 1.

Construct parse tree with features for the following sentences

1. The man cries.
2. The men cry.
3. The man saw the dogs.
4. He wants the dog.
5. He wants to be happy.
6. He wants the man to see the dog.
7. He is happy to be a dog.
3.11 Augmented Transition Networks

Features can also be added to a recursive transition network to produce an augmented transition
network (ATN). Features in an ATN are traditionally called registers. Constituent structures are
created by allowing each network to have a set of registers. Each time a new network is pushed, a
new set of registers is created. As the network is traversed, these registers are set to values by
actions associated with each arc. When the network is popped, the registers are assembled to form
a constituent structure, with the CAT slot being the network name.
Grammar 4.11 is a simple NP network. The actions are listed in the table below the network.
ATNs use a special mechanism to extract the result of following an arc. When a lexical arc, such as
arc 1, is followed, the constituent built from the word in the input is put into a special variable
named "*". The action
DET := *
then assigns this constituent to the DET register. The second action on this arc,
AGR := AGR*
assigns the AGR register of the network to the value of the AGR register of the new word (the
constituent in "*").
Agreement checks are specified in the tests. A test is an expression that succeeds if it returns a
nonempty value and fails if it returns the empty set or nil.

Grammar 4.12 A simple S network


If a test fails, its arc is not traversed. The test on arc 2 indicates that the arc can be followed only if
the AGR feature of the network has a non-null intersection with the AGR register of the new word
(the noun constituent in "*").
Features on push arcs are treated similarly. The constituent built by traversing the NP
network is returned as the value "*". Thus in Grammar 4.12, the action on the arc from S to S1,
SUBJ := *
would assign the constituent returned by the NP network to the register SUBJ. The test on arc 2 will
succeed only if the AGR register of the constituent in the SUBJ register has a non-null intersection
with the AGR register of the new constituent (the verb). This test enforces subject-verb agreement.

Figure 3.33 Trace tests and actions used with "1 The 2 dog 3 saw 4 Jack 5"
With the lexicon in Section 4.3, the ATN accepts the following sentences:
The dog cried.
The dogs saw Jack.
Jack saw the dogs.
Consider an example. A trace of a parse of the sentence "The dog saw Jack" is shown in Figure 3.33
It indicates the current node in the network, the current word position, the arc that is followed
from the node, and the register manipulations that are performed for the successful parse. It starts
in the S network but moves immediately to the NP network from the call on arc 4. The NP network
checks for number agreement as it accepts the word sequence The dog. It constructs a noun phrase
with the AGR feature 3s. When the pop arc is followed, it completes arc 4 in the S network. The
NP is assigned to the SUBJ register and then checked for agreement with the verb when arc 3 is
followed. The NP
"Jack" is accepted in another call to the NP network.

Grammar 4.14 An S network for assertions

An ATN Grammar for Simple Declarative Sentences

Here is a more comprehensive example of the use of an ATN to describe some declarative
sentences. The allowed sentence structure is an initial NP followed by a main verb, which may then
be followed by a maximum of two NPs and many PPs, depending on the Verb. Using the feature
system extensively, you can create a grammar that accepts any of the preceding complement forms,
leaving the actual verb-complement agreement to the feature restrictions. Grammar 4.14 shows the
S network. Arcs are numbered using the conventions described previously. For instance, the arc S3/1 is the
arc labeled 1 leaving node S3. The NP network in Grammar 4.15 allows simple names, bare plural
nouns, pronouns, and a simple sequence of a determiner followed by an adjective and a head noun.
Allowable noun complements include an optional number of prepositional phrases. The
prepositional phrase network in Grammar 4.16 is straightforward. Examples of parsing sentences
with this grammar are left for the exercises.

Presetting Registers
One further extension to the feature-manipulation facilities in ATNs involves the ability to
preset registers in a network as that network is being called, much like parameter passing in a
programming language. This facility, called the SENDR action in the original ATN systems, is useful
to pass information to the network that aids in analyzing the new constituent.
Consider the class of verbs, including want and pray, that accept complements using the
infinitive forms of verbs, which are introduced by the word to. According to the classification in
Section 4.2, this includes the following:
Grammar 4.16 The PP network

_vp:inf      Mary wants to have a party.
_np_vp:inf   Mary wants John to have a party.

In the context-free grammar developed earlier, such complements were treated as VPs with the
VFORM value inf. To capture this same analysis in an ATN, you would need to be able to call a
network corresponding to VPs but preset the VFORM register in that network to inf. Another
common analysis of these constructs is to view the complements as a special form of sentence with
an understood subject. In the first case it is Mary who would be the understood subject (that is, the
host), while in the other case it is John. To capture this analysis, many ATN grammars preset the
SUBJ register in the new S network when it is called.
UNIT – IV
Semantic Interpretation: word senses and ambiguity, Basic logical form language, Encoding
ambiguity in logical from, Thematic roles, Linking syntax and semantics, Recent trends in NLP.

Semantics and Logical Form

This chapter introduces the basic ideas underlying theories of meaning, or semantics. It introduces
a level of context-independent meaning called the logical form, which can be produced directly
from the syntactic structure of a sentence. Because it must be context independent, the logical form
does not contain the results of any analysis that requires interpretation of the sentence in context.

4.1 Semantics and Logical Form


Precisely defining the notions of semantics and meaning is surprisingly difficult because the terms
are used for several different purposes in natural and technical usage. For instance, there is a use of
the verb "mean" that has nothing to do with language. Say you are walking in the woods and come
across a campfire that is just noticeably warm. You might say "This fire means someone camped here
last night". By this you mean that the fire is evidence for the conclusion or implies the conclusion.
This is related to, but different from, the notion of meaning that will be the focus of the next few
chapters. The meaning we want is closer to the usage when defining a word, such as in the sentence
"'Amble' means to walk slowly". This defines the meaning of a word in terms of other words. To
make this more precise, we will have to develop a more formally specified language in which we
can specify meaning without having to refer back to natural language itself. But even if we can do
this, defining a notion of sentence meaning is difficult. For example, I was at an airport recently and
while I was walking towards my departure gate, a guard at the entrance asked, "Do you know what
gate you are going to?" I interpreted this as asking whether I knew where I was going and
answered yes. But this response was based on a misunderstanding of what the guard meant, as he
then asked, "Which gate is it?" He clearly had wanted me to tell him the gate number. Thus the
sentence "Do you know what gate you are going to?" appears to mean different things in different
contexts.

Figure 4.1 Logical form as an intermediate representation

Can we define a notion of sentence meaning that is independent of context? In other words,
is there a level at which the sentence "Do you know what gate you are going to?" has a single
meaning, but may be used for different purposes? This is a complex issue, but there are many
advantages to trying to make such an approach work. The primary argument is modularity. If such
a division can be made, then we can study sentence meaning in detail without all the complications
of sentence usage. In particular, if sentences have no context-independent meaning, then we may
not be able to separate the study of language from the study of general human reasoning and
context. As you will see in the next few chapters, there are many examples of constraints based on
the meaning of words that appear to be independent of context. So from now on, we will use the
term "meaning" in this context-independent sense, and we will use the term "usage" for the
context-dependent aspects. The representation of context-independent meaning is called the
logical form. The process of mapping a sentence to its logical form is called semantic interpretation,
and the process of mapping the logical form to the final knowledge representation (KR) language is
called contextual interpretation. Figure 4.1 shows a simple version of the stages of interpretation.
The exact meaning of the notations used will be defined later.
For the moment let us assume the knowledge representation language is the first-order
predicate calculus (FOPC). Given that assumption, what is the status of the logical form? In some
approaches the logical form is defined as the literal meaning of the utterance, and the logical form
language is the same as the final knowledge representation language. If this is to be a viable
approach in the long run, however, it would mean that the knowledge representation must be
considerably more complex than representations in present use in AI systems. For instance, the
logical form language must allow indexical terms, that is, terms that are defined by context. The
pronouns "I" and "you" are indexical because their interpretation depends on the context of who is
speaking and listening. In fact most definite descriptions (such as "the red ball") are indexical, as
the object referred to can only be identified with respect to a context. Many other aspects of
language, including the interpretation of tense and determining the scope of quantifiers, depend on
context as well and thus cannot be uniquely determined at the logical form level. Of course, all of
this could be treated as ambiguity at the logical form level, but this would be impractical, as every
sentence would have large numbers of possible logical forms (as in the sentence "The red ball
dropped", which would have a different logical form for every possible object that could be
described as a ball that is red).
But if the logical form language is not part of the knowledge representation language, what
is its formal status? A promising approach has been developed in linguistics over the last decade
that suggests an answer that uses the notion of a situation, which is a particular set of
circumstances in the world. This corresponds reasonably well to the intuitive notion of the
meaning of "situation" in English. For instance, when attending a class, you are in a situation where
there are fellow students and an instructor, where certain utterances are made by the lecturer,
questions asked, and so on. Also, there will be objects in the lecture hall, say a blackboard and
chairs, and so on. More formally, you might think of a situation as a set of objects and relations
between those objects. A very simple situation might consist of two objects, a ball B0005 and a
person P86, and include the relationship that the person owns the ball. Let us encode this situation
as the set {(BALL B0005), (PERSON P86), (OWNS P86 B0005)}.
Language creates special types of situations based on what information is conveyed. These
issues will be explored in detail later, but for now consider the following to help your intuition. In
any conversation or text, assume there is a discourse situation that records the information
conveyed so far. A new sentence is interpreted with respect to this situation and produces a new
situation that includes the information conveyed by the new sentence. Given this view, the logical
form is a function that maps the discourse situation in which the utterance was made to a new
discourse situation that results from the occurrence of the utterance. For example, assume that the
situation we just encoded has been created by some preceding sentences describing the ball and
who owns it. The utterance "The ball is red" might produce a new situation that consists of the old
situation plus the new fact that B0005 has the property RED: {(BALL B0005), (PERSON P86),
(OWNS P86 B0005), (RED B0005)}. Figure 4.2 shows this view of the interpretation process,
treating the logical form as a function between situations. The two organizations presented in
Figures 4.1 and 4.2 differ in that the latter might not include a single identifiable expression in the
knowledge representation that fully captures the "meaning" of the sentence. Rather, the logical
form might make a variety of changes to produce the updated situation. This allows other
implications to be derived from an utterance that are not directly captured in the semantic content
of the sentence.
Such issues will become important later when we discuss contextual interpretation.

Figure 4.2 Logical form as a function

Even though much of language is highly context dependent, there is still considerable
semantic structure to language that is context independent and that can be used in the semantic
interpretation process to produce the logical form. Much of this semantic knowledge consists of the
type of information you can find in dictionaries - the basic semantic properties of words (that is,
whether they refer to relations, objects, and so on), what different senses are possible for each
word, what senses may combine to form larger semantic structures, and so on. Identifying these
forms of information and using this information to compute a logical form are the focus of Part II of
the book.

4.2 Word Senses and Ambiguity

To develop a theory of semantics and semantic interpretation, we need to develop a structural


model, just as we did for syntax. With syntax we first introduced the notion of the basic syntactic
classes and then developed ways to constrain how simple classes combine to form larger
structures. We will follow the same basic strategy for semantics. You might think that the basic
semantic unit could be the word or the morpheme, but that approach runs into problems because
of the presence of ambiguity. For example, it is not unusual for the verb go to have more than 40
entries in a typical dictionary. Each one of these definitions reflects a different sense of the word.
Dictionaries often give synonyms for particular word senses. For go you might find synonyms such
as move, depart, pass, vanish, reach, extend, and set out. Many of these highlight a different sense of
the verb go. Of course, if these are true synonyms of some sense of go, then the verbs themselves
will share identical senses. For instance, one of the senses of go will be identical to one of the senses
of depart.
If every word has one or more senses then you are looking at a very large number of senses,
even given that some words have synonymous senses. Fortunately, the different senses can be
organized into a set of broad classes of objects by which we classify the world. The set of different
classes of objects in a representation is called its ontology. To handle a natural language, we need a
much broader ontology than commonly found in work on formal logic. Such classifications of
objects have been of interest for a very long time and arise in the writings of Aristotle (384-322
B.C.). The major classes that Aristotle suggested were substance (physical objects), quantity (such
as numbers), quality (such as bright red), relation, place, time, position, state, action, and affection.
To this list we might add other classes such as events, ideas, concepts, and plans. Two of the most
influential classes are actions and events. Events are things that happen in the world and are
important in many semantic theories because they provide a structure for organizing the
interpretation of sentences. Actions are things that agents do, thus causing some event. Like all
objects in the ontology, actions and events can be referred to by pronouns, as in the discourse
fragment
We lifted the box. It was hard work.
Here, the pronoun "it" refers to the action of lifting the box. Another very influential category is the
situation. As previously mentioned, a situation refers to some particular set of circumstances and
can be viewed as subsuming the notion of events. In many cases a situation may act like an
abstraction of the world over some location and time. For example, the sentence "We laughed and
sang at the football game" describes a set of activities performed at a particular time and location,
described as the situation "the football game".
Not surprisingly, ambiguity is a serious problem during semantic interpretation. We can
define a word as being semantically ambiguous if it maps to more than one sense. But this is more
complex than it might first seem, because we need to have a way to determine what the allowable
senses are. For example, intuitively the word "kid" seems to be ambiguous between a baby goat and
a human child. But how do we know it doesn't have a single sense that includes both
interpretations? The word "horse", on the other hand, seems not to be ambiguous even though we
know that horses may be subdivided into mares, colts, trotters, and so on. Why doesn't the word
"horse" seem ambiguous when each time the word is used we might not be able to tell if it refers to
a mare or a colt? A few linguistic tests have been suggested to define the notion of semantic
ambiguity more precisely. One effective test exploits the property that certain syntactic constructs
typically require references to identical classes of objects. For example, the sentence "I have two
kids and George has three" could mean that George and I are goat farmers or that we have children,
but it can't mean a combination of both (I have goats and George has children). On the other hand
you can say "I have one horse and George has two", even when I have a colt and George has mares.
Thus this test provides a way to examine our intuitions about word senses. The word "kid"
is ambiguous between two senses, BABY-GOAT1 and BABY-HUMAN1, whereas "horse" is not
ambiguous between MARE1 and COLT1 but rather has a sense HORSE1 that includes both. This
brings up the important point that some senses are more specific than others. This property is
often referred to as vagueness. The sense HORSE1 is vague to the extent that it doesn't distinguish
between mares and colts. Of course, the sense MARE1 is vague with respect to whether it is a large
mare or a small one. Virtually all senses involve some degree of vagueness, as they might always
allow some more precise specification.
A similar ambiguity test can be constructed for verb senses as well. For example, the
sentence "I ran last year and George did too" could mean that we both were candidates in an
election or that we both ran some race, but it would be difficult to read it as a mixture of the two.
Thus the word run is ambiguous between the senses RUN1 (the exercise sense) and RUN2 (the
political sense). In contrast, the verb "kiss" is vague in that it does not specify where one is kissed.
You can say "I kissed Sue and George did too", even though I kissed her on the cheek and
George kissed her hand.
In addition to lexical ambiguity, there is considerable structural ambiguity at the semantic
level. Some forms of ambiguity are parasitic on the underlying syntactic ambiguity. For instance,
the sentence "Happy cats and dogs live on the farm" is ambiguous between whether the dogs are
also happy or not (that is, is it happy cats and happy dogs, or happy cats and dogs of any
disposition). Although this ambiguity does have semantic consequences, it is actually rooted in the
syntactic structure; that is, whether the conjunction involves two noun phrases, (Happy cats) and
(dogs), or the single noun phrase (Happy (cats and dogs)). But other forms of structural ambiguity
are truly semantic and arise from a single syntactic structure. A very common example involves
quantifier scoping. For
instance, does the sentence "Every boy loves a dog" mean that there is a single dog that all boys love,
or that each boy might love a different dog? The syntactic structure is the same in each case, but the
difference lies in how the quantifiers are scoped. For example, the two meanings correspond
roughly to the following statements in FOPC:
∃d. Dog(d) & ∀b. Boy(b) ⊃ Loves(b, d)
∀b. Boy(b) ⊃ ∃d. Dog(d) & Loves(b, d)
Thus, while Every boy loves a dog has a single syntactic structure, its semantic structure is
ambiguous. Quantifiers also vary with respect to vagueness. The quantifier all is precise in
specifying every member of some set, but a quantifier such as many, as in Many people saw the
accident, is vague as to how many people were involved.
You might also think that indexical terms such as you, I, and here are ambiguous because
their interpretations depend on context. But this is not the sense of ambiguity being discussed here.
Note that the word dog is not considered ambiguous because there are many dogs to which the
noun phrase the dog could refer. The semantic meaning of the dog is precisely determined.
Likewise, the pronoun I may be unspecified as to its referent, but it has a single well-defined sense.
While the referents of phrases are context dependent and thus beyond the scope of this
discussion, there are context-independent constraints on reference that must be accounted for.
Consider the sentence Jack saw him in the mirror. While it is not specified who was seen, you know
that it wasn't Jack seeing himself. To express this meaning, you would have to use the reflexive
pronoun, as in Jack saw himself in the mirror. Thus certain reference constraints arise because of
the structure of sentences.
A very important aspect of context-independent meaning is the co-occurrence constraints
that arise between word senses. Often the correct word sense can be identified because of the
structure and meaning of the rest of the sentence. Consider the verb "run", which has one sense
referring to the action you do when jogging and another referring to the action of operating some
machine. The first sense is typically realized as an intransitive verb (as in Jack ran in the park),
whereas the second can only be realized as a transitive verb (as in Jack ran the printing press for
years). In other cases the syntactic structure remains the same, but the possible senses of the words
can only combine in certain ways. For instance, consider Jack ran in the park versus Jack ran in the
election. The syntactic structures of these sentences are identical, but different senses of the verb
run must be selected because of the possible senses in the modifier. One of the most important
tasks of semantic interpretation is to utilize constraints such as this to help reduce the number of
possible senses for each word.
4.3 The Basic Logical Form Language
The last section introduced a primitive unit of meaning, namely the word sense. This section
defines a language in which you can combine these elements to form meanings for more complex
expressions. This language will resemble FOPC, although there are many equivalent forms of
representation, such as network-based representations, that use the same basic ideas. The word
senses will serve as the atoms or constants of the representation. These constants can be classified
by the types of things they describe. For instance, constants that describe objects in the world,
including abstract objects such as events and situations, are called terms. Constants that describe
relations and properties are called predicates. A proposition in the language is formed from a
predicate followed by an appropriate number of terms to serve as its arguments. For instance, the
proposition corresponding to the sentence "Fido is a dog" would be constructed from the term
FIDO1 and the predicate constant DOG1 and is written as
(DOG1 FIDO1)
Predicates that take a single argument are called unary predicates or properties; those
that take two arguments, such as LOVES1, are called binary predicates; and those that take n
arguments are called n-ary predicates. The proposition corresponding to the sentence Sue loves
Jack would involve a binary predicate LOVES1 and would be written as
(LOVES1 SUE1 JACK1)
You can see that different word classes in English correspond to different types of constants
in the logical form. Proper names, such as Jack, have word senses that are terms; common nouns,
such as dog, have word senses that are unary predicates; and verbs, such as run, love, and put, have
word senses that correspond to n-ary predicates, where n depends on how many terms the verb
subcategorizes for.
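These type distinctions can be mirrored directly in code. The following is a minimal sketch in Python, assuming logical forms are encoded as nested tuples (s-expressions) whose first element is a predicate constant and whose remaining elements are its argument terms; this encoding is an illustrative assumption reused in the later sketches in this chapter.

# A minimal sketch: logical forms as nested tuples (s-expressions).
# The first element names a predicate constant; the rest are terms.

FIDO1, SUE1, JACK1 = "FIDO1", "SUE1", "JACK1"   # terms (constants)

dog_fido = ("DOG1", FIDO1)           # "Fido is a dog"
loves = ("LOVES1", SUE1, JACK1)      # "Sue loves Jack"

def arity(proposition):
    """Number of argument terms the predicate is applied to."""
    return len(proposition) - 1

print(arity(dog_fido))   # 1 -- a unary predicate (a property)
print(arity(loves))      # 2 -- a binary predicate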
Figure 4.3 Two possible network representations of "Sue loves Jack"
Note that while the logical forms are presented in a predicate-argument form here, the same
distinctions are made in most other meaning representations. For instance, a network
representation would have nodes that correspond to the word senses and arcs that indicate the
predicate-argument structure. The meaning of the sentence Sue loves Jack in a semantic network
like representation might appear in one of the two forms shown in Figure 4.3. For most purposes
all of these representation formalisms are equivalent.
More complex propositions are constructed using a new class of constants called logical
operators. For example, the operator NOT allows you to construct a proposition that says that some
proposition is not true. The proposition corresponding to the sentence Sue does not love Jack would
be
(NOT (LOVES1 SUE1 JACK1))
English also contains operators that combine two or more propositions to form a complex
proposition. FOPC contains operators such as disjunction (∨), conjunction (&), what is often called implication (⊃), and other forms (there are 16 possible truth-functional binary operators in FOPC). English contains many similar operators, including or, and, if, only if, and so on. Natural language
connectives often involve more complex relationships between sentences. For instance, the
conjunction "and" might correspond to the logical operator "&" but often also involves temporal
sequencing, as in "I went home and had a drink", in which going home preceded having the drink.
The connective "but", on the other hand, is like "and" except that the second argument is something
that the hearer might not expect to be true given the first argument. The general form for such a
proposition is (connective proposition proposition). For example, the logical form of the sentence
"Jack loves Sue or Jack loves Mary" would be (OR1 (LOVES1 JACK1 SUE1) (LOVES1 JACK1
MARY1)). The logical form language will allow both operators corresponding to word senses and
operators like "&" directly from FOPC. The logic-based operators will be used to connect
propositions not explicitly conjoined in the sentence.
With the simple propositions we just defined, we can define the meaning of only a very
limited subset of English, namely sentences consisting of simple verb forms and proper names. To
account for more complex sentences, we must define additional semantic constructs. One
important construct is the quantifier. In first-order predicate calculus, there are only two
quantifiers: ∀ and ∃. English contains a much larger range of quantifiers, including all, some, most,
many, a few, the, and so on. To allow quantifiers, variables are introduced as in first-order logic but
with an important difference. In first-order logic a variable only retains its significance within the
scope of the quantifier. Thus two instances of the same variable x occurring in two different
formulas - say in the formulas ∃x. P(x) and ∃x. Q(x) - are treated as completely different variables
with no relation to each other. Natural languages display a different behavior. For instance,
consider the two sentences "A man entered the room. He walked over to the table." The first
sentence introduces a new object to the discussion, namely some man. You might think to treat the
meaning of this sentence along the lines of the existential quantifier in logic. But the problem is that
the man introduced existentially in the first sentence is referred to by the pronoun "He" in the
second sentence. So variables appear to continue their existence after being introduced. To allow
this, each time a discourse variable is introduced, it is given a unique name not used before. Under
the right circumstances, a subsequent sentence can then refer back to this term.
Figure 4.4 Some common quantifiers
Natural language quantifiers have restricted ranges and thus are more complex than those
found in FOPC. In FOPC a formula of the form ∀x. Px is true if and only if Px is true for every
possible object in the domain (that is, x may be any term in the language). Such statements are rare
in natural language. Rather, you would say "all dogs bark" or "most people laughed", which require
constructs that are often called generalized quantifiers. These quantifiers are used in statements of
the general form
(quantifier variable : restriction-proposition body-proposition)
For instance, the sentence "Most dogs bark" would have the logical form
(MOST1 d1 : (DOG1 d1) (BARKS1 d1))
This means that most of the objects d1 that satisfy (DOG1 d1) also satisfy (BARKS1 d1). Note that
this has a very different meaning from the formula
(MOST1 d2 : (BARKS1 d2) (DOG1 d2))
which roughly captures the meaning of the sentence "Most barking things are dogs".
A very important class of generalized quantifiers corresponds to the articles "the" and "a". The
sentence "The dog barks" would have a logical form
(THE x: (DOG1 x) (BARKS1 x))
which would be true only if there is a uniquely determined dog in context and that dog barks.
Clearly, in any natural setting there will be many dogs in the world, so the use of context to identify
the correct one is crucial for understanding the sentence. Since this identification process requires
context, however, discussion of it is delayed until Part III. Here it suffices to have a way to write the
logical form. The special set of quantifiers corresponding to the articles (or the absence of articles
in bare noun phrases) is shown in Figure 4.4.
More complex noun phrases will result in more complex restrictions. For instance, the sentence "The happy dog barks" will involve a restriction that is a conjunction, namely
(THE x : (& (DOG1 x) (HAPPY x)) (BARKS1 x))
This will be true only if there is a contextually unique x such that (& (DOG1 x) (HAPPY x)) is true,
and this x barks.
Another construct needs to be introduced to handle plural forms, as in a phrase such as the
dogs bark. A new type of constant called a predicate operator is introduced that takes a predicate as
an argument and produces a new predicate. For plurals the predicate operator PLUR will be used. If
DOG1 is a predicate that is true of any dog, then (PLUR DOG1) is a predicate that is true of any set
of dogs. Thus the representation of the meaning of the sentence "The dogs bark" would be
(THE x: ((PLUR DOG1) x) (BARKS1 x))
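The behavior of a predicate operator is easy to illustrate computationally. Below is a minimal sketch, assuming predicates are modeled as Python functions from an individual to a truth value; the toy extension given to DOG1 is purely hypothetical.

# A minimal sketch of the PLUR predicate operator, assuming predicates
# are functions from an object to True/False.

def DOG1(x):
    return x in {"fido", "rex"}        # hypothetical toy extension

def PLUR(pred):
    """Map a predicate true of individuals to one true of any set of
    two or more individuals that each satisfy the base predicate."""
    return lambda s: isinstance(s, (set, frozenset)) \
        and len(s) >= 2 and all(pred(x) for x in s)

dogs = PLUR(DOG1)
print(dogs({"fido", "rex"}))     # True  -- a set of dogs
print(dogs({"fido", "felix"}))   # False -- felix does not satisfy DOG1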
Plural noun phrases introduce the possibility of a new form of ambiguity. Note that the natural
reading of The dogs bark is that there is a specific set of dogs, and each one of them barks. This is
called the distributive reading, since the predicate BARKS1 is distributed over each element of the
set. In contrast, consider the sentence The dogs met at the corner. In this case, it makes no sense to
say that each individual dog met; rather the meeting is true of the entire set of dogs. This is called
the collective reading. Some sentences allow both interpretations and hence are ambiguous. For
instance, the sentence Two men bought a stereo can mean that two men each bought a stereo (the
distributive reading), or that two men bought a stereo together (the collective reading).
The final constructs to be introduced are the modal operators. These are needed to
represent the meaning of verbs such as believe and want, for representing tense, and for many
other constructs. Modal operators look similar to logical operators but have some important
differences. Specifically, terms within the scope of a modal operator may have an interpretation
that differs from the normal one. This affects what conclusions you can draw from a proposition.
For example, assume that Jack is also known as John to some people. There are two word senses
that are equal; that is, JACK1 = JOHN22. With a simple proposition, it doesn't matter which of these
two constants is used: if (HAPPY JOHN22) is true then (HAPPY JACK1) is true, and vice versa. This
is true even in complex propositions formed from the logical operators. If (OR (HAPPY JOHN22)
(SAD JOHN22)) is true, then so is (OR (HAPPY JACK1) (SAD JACK1)), and vice versa. The same
propositions within the scope of a modal operator such as BELIEVE1, however, are not
interchangeable. For instance, if Sue believes that Jack is happy, that is,
(BELIEVE SUE1 (HAPPY JACK1))
then it does not necessarily follow that Sue believes John is happy, that is,
(BELIEVE SUE1 (HAPPY JOHN22))
because Sue might not know that JACK1 and JOHN22 are the same person. Thus you cannot freely
substitute equal terms when they occur within the scope of a modal operator. This is often referred
to as the failure of substitutivity in modal contexts.
An important class of modal operators for natural language are the tense operators, PAST,
PRES, and FUT. So far all examples have ignored the effect of tense. With these new operators,
however, you can represent the difference
in meaning between John sees Fido, John saw Fido, and John will see Fido, namely as the propositions
(PRES (SEES1 JOHN1 FIDO1))
(PAST (SEES1 JOHN1 FIDO1))
(FUT (SEES1 JOHN1 FIDO1))
You can see that these are modal operators because they exhibit the failure of substitutivity. For
example, consider the operator PAST, and assume two constants, say JOHN1 and PRESIDENT1, that
are equal now, indicating that John is currently the president. But in the past, John was not the
president, so JOHN1 did not equal PRESIDENT1. Given this and the fact that John saw Fido in the past, (PAST (SEES1 JOHN1 FIDO1)),
you cannot conclude that the president saw Fido in the past, that is, (PAST (SEES1 PRESIDENT1
FIDO1)), since John was not the president at that time. Note also that a proposition and its negation
can both be true in the past (but at different times). Thus it is possible for both the sentences John
was happy and John was not happy to be true; that is, (PAST (HAPPY JOHN1)) and (PAST (NOT
(HAPPY JOHN1))) are both true.
This completes the specification of the basic logical form language. The next sections discuss
various extensions that make the language more convenient for expressing ambiguity and
capturing semantic regularities.
4.4 Encoding Ambiguity in the Logical Form
The previous sections defined many of the constructs needed to specify the logical form of a
sentence, and if you were interested solely in the nature of logical form, you could be finished. But
representations for computational use have another important constraint on them, namely the
handling of ambiguity. A typical sentence will have multiple possible syntactic structures, each of
which might have multiple possible logical forms. In addition, the words in the sentence will have
multiple senses. Simply enumerating the possible logical forms will not be practical. Rather, we will
take an approach where certain common ambiguities can be collapsed and locally represented
within the logical form, and we will develop techniques to incrementally resolve these ambiguities
as additional constraints from the rest of the sentence and from context are brought into play.
Many researchers view this ambiguity encoding as a separate level of representation from the
logical form, and it is often referred to as the quasi-logical form.
Perhaps the greatest source of ambiguity in the logical form comes from the fact that most
words have multiple senses. Some of these senses have different structural properties, so they can
be eliminated given the context of the surrounding sentence. But often words have different senses
that have identical structural constraints. At present, the only way to encode these would be to
build a separate logical form for each possible combination of senses for the words in the sentence.
To reduce this explosion of logical forms, we can use the same technique used to handle multiple
feature values in the syntactic structure. Namely, anywhere an atomic sense is allowed, a set of
possible atomic senses can be used. For example, the noun "ball" has at least two senses: BALL1,
the object used in games, and BALL2, the social event involving dancing. Thus the sentence "Sue
watched the ball" is ambiguous out of context. A single logical form can represent these two
possibilities, however:
1. (THE b1 : ({BALL1 BALL2} b1) (PAST (WATCH1 SUE1 b1 )))
This abbreviates two possible logical forms, namely
2. (THE b1 : (BALL1 b1) (PAST (WATCH1 SUE1 b1 )))
and
3. (THE b1 : (BALL2 b1) (PAST (WATCH1 SUE1 b1 )))
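This abbreviation lends itself to a simple expansion procedure. The sketch below, reusing the tuple encoding introduced earlier, writes an ambiguous atom as a Python set of senses and enumerates every fully disambiguated logical form; the representation details are assumptions for illustration.

from itertools import product

# A minimal sketch: an ambiguous atom is a set of senses; expand a
# quasi-logical form into all fully disambiguated logical forms.

def alternatives(form):
    if isinstance(form, set):        # ambiguous sense set, e.g. {BALL1, BALL2}
        return sorted(form)
    if isinstance(form, tuple):      # complex expression: combine subparts
        return [tuple(combo)
                for combo in product(*(alternatives(part) for part in form))]
    return [form]                    # unambiguous atom

quasi_lf = ("THE", "b1", ({"BALL1", "BALL2"}, "b1"),
            ("PAST", ("WATCH1", "SUE1", "b1")))

for lf in alternatives(quasi_lf):
    print(lf)   # one logical form with BALL1, one with BALL2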
One of the most complex forms of ambiguity in logical forms arises from the relative scoping
of the quantifiers and operators. You saw in Section 4.1 that a sentence such as "Every boy loves a
dog" is ambiguous between two readings depending on the scope of the quantifiers. There is no
context-independent method for resolving such issues, so the ambiguity must be represented in the
final logical forms for the sentence. Rather than enumerating all possible scopings, which would
lead to an exponentially growing number of interpretations based on the number of scoping
constructs, we introduce another abbreviation into the logical form language that collapses
interpretations together. Specifically, the abbreviated logical form does not contain scoping
information at all. Rather, constructs such as generalized quantifiers are treated syntactically like
terms and appear in the position indicated by the syntactic structure of the sentence. They are
marked using angle brackets to indicate the scoping abbreviation. For example, the logical forms
for the sentence "Every boy loves a dog" are captured by a single ambiguous form
(LOVES1 <EVERY b1 (BOY1 b1)> <A d1 (DOG1 d1)>)
This abbreviates an ambiguity between the logical form
(EVERY b1 : (BOY1 b1) (A d1 : (DOG1 d1) (LOVES1 b1 d1)))
and
(A d1: (DOG1 d1) (EVERY b1 : (BOY1 b1) (LOVES1 b1 d1)))
While the savings don't amount to much in this example, consider that a sentence with four
scoping constructs in it would have 24 (4 factorial) possible orderings, and one with five scoping
constructs would have 120 orderings. The abbreviation convention allows all these forms to be
collapsed to a single representation. Chapter 12 will explore some heuristic techniques for
determining the scope of operators. For the time being, however, it is reasonable to assume that no
context-independent scoping constraints need be represented.
If the restriction in a generalized quantifier is a proposition involving a simple unary predicate, a further abbreviation will be used that drops the variable. Thus the form <EVERY b1 (BOY1 b1)> will often be abbreviated as <EVERY b1 BOY1>.
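The factorial growth mentioned above, and the way a single unscoped form stands in for all the orderings, can be shown in a few lines of code. This is a minimal sketch, assuming a quantifier is written as a (quantifier, variable, restriction-predicate) triple matching the abbreviated form <EVERY b1 BOY1>.

from itertools import permutations

# A minimal sketch: enumerate every scoping of an unscoped logical
# form by permuting its quantifier terms around the body.

def scopings(body, quantifiers):
    for order in permutations(quantifiers):
        scoped = body
        for quant, var, restriction in reversed(order):
            scoped = (quant, var, (restriction, var), scoped)
        yield scoped   # the first quantifier in `order` ends up outermost

body = ("LOVES1", "b1", "d1")
quants = [("EVERY", "b1", "BOY1"), ("A", "d1", "DOG1")]

for lf in scopings(body, quants):
    print(lf)
# Two quantifiers give 2 scopings; n quantifiers give n! scopings.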
A large number of constructs in natural language are sensitive to scoping. In particular, all
the generalized quantifiers, including "the", are subject to scoping. For example, in "At every hotel,
the receptionist was friendly", the preferred reading in almost any context has the definite reference
"the receptionist" fall within the scope of "every hotel"; that is, there is a different receptionist at
each hotel.
In addition, operators such as negation and tense are also scope sensitive. For example, the
sentence "Every boy didn't run" is ambiguous between the reading in which some boys didn't run
and some did, that is,
(NOT (EVERY b1 : (BOY1 b1) (RUN1 b1 )))
and the reading where no boys ran, that is,
(EVERY b1 : (BOY1 b1) (NOT (RUN1 b1)))
These two readings are captured by the single logical form
(<NOT RUN1> <EVERY b1 BOY1>)
where unscoped unary operators (for example, NOT, PAST, PRES, and so on) are wrapped around
the predicate.
Finally, let us return to two constructs that need to be examined further: proper names and
pronouns. So far we have assumed that each proper name identifies a sense that denotes an object
in the domain. While this was a useful abstraction to introduce the basic ideas of logical form, it is
not a general enough treatment of the phenomena. In fact, proper names must be interpreted in
context, and the name John will refer to different people in different situations. Our revised
treatment resolves these problems by using a discourse variable that has the property of having the
specified name. We will introduce this construct as a special function, namely
(NAME <variable> <name>)
which produces the appropriate object with the name in the current context. Thus, the logical form
of "John ran" would be (<PAST RUN1> (NAME j1 "John")).
Arguments similar to those previously given for proper names can be made for pronouns
and other indexical words, such as "here" and "yesterday", and we will treat them using a special
function of the form (PRO <variable> <proposition>). For example, the quasi-logical form for "Every
man liked him" would be
(<PAST LIKE1> <EVERY m1 MAN1> (PRO m2 (HE1 m2)))
HE1 is the sense for "he" and "him", and formally is a predicate true of objects that satisfy the
restrictions on any antecedent, that is, being animate-male in this case. As with generalized
quantifiers, when the restriction is a simple unary predicate, the pro forms are often abbreviated.
For example, the logical form for "he" will often be
written as (PRO m2 HE1).
The constructs described in this chapter dramatically reduce the number of logical forms
that must be initially computed for a given sentence. Not all ambiguities can be captured by the
abbreviations, however, so there will still be sentences that will require a list of possible logical
forms, even for a single syntactic structure.
4.6 Thematic Roles
This section examines theories based on the notion of thematic roles, or cases. One motivating
example from the last section included the sentences
John broke the window with the hammer.
The hammer broke the window.
The window broke.
"John", " the hammer", and "the window" play the same semantic roles in each of these sentences.
"John" is the actor, "the window" is the object, and "the hammer" is the instrument used in the act of
breaking of the window. We introduced relations such as AGENT, THEME, and INSTR to capture
these intuitions. But can we define these relations more precisely, and what other thematic roles
have proved useful in natural language systems? These issues are explored in this section.
Perhaps the easiest thematic role to define is the AGENT role. A noun phrase fills the AGENT
role if it describes the instigator of the action described by the sentence. Further, this role may
attribute intention, volition, or responsibility for the action to the agent described. One test for
AGENT-hood involves adding phrases like "intentionally" or "in order to" to active voice sentences.
If the resulting sentence is well formed, the subject NP can fill the AGENT role. The following
sentences are acceptable:
John intentionally broke the window.
John broke the window in order to let in some air.
But these sentences are not acceptable:
* The hammer intentionally broke the window.
* The window broke in order to let in some air.
Thus the NP "John" fills the AGENT role only in the first two sentences.
Not all animate NPs, even in the subject position, fill the AGENT role. For instance, you cannot
normally say
* John intentionally died.
* Mary remembered her birthday in order to get some presents.
Of course, by adding the phrase intentionally to the first sentence, you may construct some
plausible reading of the sentence ("John killed himself"), but this is a result of modifying the initial
meaning of the sentence "John died".
NPs that describe something undergoing some change or being acted upon will fill a role
called THEME. This usually corresponds to the syntactic OBJECT and, for any transitive verb X, is
the answer to the question "What was Xed?". For example, given the sentence "The gray eagle saw
the mouse", the NP "the mouse" is the THEME and is the answer to the question "What was seen?".
For intransitive verbs, the THEME role is used for the subject NPs that are not AGENTs. Thus in
"The clouds appeared over the horizon", the NP "the clouds" fills the THEME
role. More examples follow, with the THEME NP in italics:
The rock broke.
John broke the rock.
I gave John the book.
A range of roles has to do with locations, or abstract locations. First, we must make the
distinction mentioned earlier between relations that indicate a location or place and those that
indicate motion or paths. The AT-LOC relation indicates where an object is or where an event takes
place, as in
Harry walked on the road.
The chair is by the door.
On the road describes where the walking took place, while by the door describes where the chair is
located.
Other phrases describe changes in location, direction of motion, or paths:
I walked from here to school yesterday.
It fell to the ground.
The birds flew from the lake along the river gorge.
There are at least three different types of phrases here: those that describe where something came
from (the FROM-LOC role), such as "from here"; those that describe the destination (the TO-LOC
role), such as "to the ground"; and those that describe the trajectory or path (the PATH-LOC role),
such as "along the gorge".
These location roles can be generalized into roles over arbitrary state values, called the AT
role, and roles for arbitrary state change (the FROM, TO, and PATH roles). Thus AT-LOC is a
specialization of the AT role, and so on. You can see other specializations of these roles when you
consider the abstract relation of possession:
I threw the ball to John. (the TO-LOC role)
I gave a book to John. (the TO-POSS role)
I caught the ball from John. (the FROM-LOC role)
I borrowed a book from John. (the FROM-POSS role)
The box contains a ball. (the AT-LOC role)
John owns a book. (the AT-POSS role)
Similarly, you might define AT-TIME, TO-TIME, and FROM-TIME roles, as in
I saw the car at 3 o'clock. (the AT-TIME role)
I worked from one until three. (the FROM-TIME and TO-TIME roles)
The roles apply to general state change as well, as with temperature in
The temperature remains at zero. (the AT-VALUE role)
The temperature rose from zero. (the FROM-VALUE role)
Thus the notion of general value and change of value along many dimensions seems to be
supported by the similarity of the ways of realizing these roles in sentences.
Another role is motivated by the problem that, given the present taxonomy, you cannot easily
classify the role of the NP in a sentence such as
John believed that it was raining.
The THEME role is filled with the clause "that it was raining", since this is what is believed. "John"
cannot be an AGENT because there is no intentionality in believing something. Thus you must
introduce a new role, called EXPERIENCER, which is filled by animate objects that are in a
described psychological state, or that undergo some psychological process, such as perception, as
in the preceding sentence and as in
John saw the unicorn.
Another role is the BENEFICIARY role, which is filled by the animate person for whom a certain
event is performed, as in
I rolled on the floor for Lucy.
Find me the papers!
I gave the book to Jack for Susan.
The last example demonstrates the need to distinguish the TO-POSS role (that is, to Jack) from the
BENEFICIARY role.
The INSTR role describes a tool, material, or force used to perform some event, as in
Harry broke the glass with the telescope.
The telescope broke the glass.
I used some flour to make a cake.
I made a cake with some flour.
Depending on the verb, the INSTR role sometimes can be used as the surface subject when the
AGENT role is not specified. Natural forces are also included in the INSTR category here, although
you could argue for a different analysis. Thus the following are also examples of the INSTR role:
The sun dried the apples.
Jack used the sun to dry the apples.
The AGENT and INSTR roles could be combined into a more general role named CAUSAL-
AGENT.
Other roles need to be identified before certain sentences can be analyzed. For example,
some sentences describe situations where two people perform an act together:
Henry lifted the piano with Jack.
To handle this, you must introduce a role CO-AGENT to account for the PP with Jack.
A more complicated case occurs in sentences involving exchanges or other complex
interactions. For example, consider the sentences
Jack paid $1 to the man for the book.
Jack bought the book from the man for $1.
These sentences both describe a situation where Jack gives the man $1 and receives the book in
exchange. In the first sentence, however, the $1 is the THEME and there is no role to account for the
book. In the second sentence the situation is reversed: the book is the THEME and $1 is
unaccounted for. To handle these cases you must add a role CO-THEME for the second object in an
exchange.
A more general solution to this problem would be to analyze such sentences as describing
two events. The primary event is the one you have been considering so far, but a secondary event
may be present. In this analysis you might analyze the first sentence as follows (the primary event
being Jack paying the dollar, and the secondary being Jack receiving the book):
Jack: AGENT of both PRIMARY and SECONDARY event
$1: THEME of PRIMARY event
the man: TO-POSS of PRIMARY, FROM-POSS of SECONDARY
the book: THEME of SECONDARY event
Roles and Subroles    Other Common Names   Definition
CAUSAL-AGENT                               the object that caused the event
  AGENT                                    intentional causation
  INSTR                                    force/tool used in causing the event
THEME                 PATIENT              the thing affected by the event
EXPERIENCER                                the person involved in perception or a physical/psychological state
BENEFICIARY                                the person for whom an act is done
AT                                         the state/value on some dimension
  AT-LOC              LOCATION             current location
  AT-POSS             POSSESSOR            current possessor
  AT-VALUE                                 current value
  AT-TIME                                  current time
TO                                         final value in a state change
  TO-LOC              DESTINATION          final location
  TO-POSS             RECIPIENT            final possessor
  TO-VALUE                                 final value
FROM                                       original value in a state change
  FROM-LOC            SOURCE               original location
  FROM-POSS                                original possessor
  FROM-VALUE                               original value
PATH                                       path over which something travels
CO-AGENT                                   secondary agent in an action
CO-THEME                                   secondary theme in an exchange

Figure 4.5 Some possible semantic roles
This possibility will not be pursued further, however, since it leads into many issues not
relevant to the remainder of this chapter.
Figure 4.5 provides a summary of most of the roles distinguished thus far and the
hierarchical relationships between them.
As you've seen, verbs can be classified by the thematic roles that they require. To classify
them precisely, however, you must make a distinction between roles that are "intimately" related
to the verb and those that are not. For example, almost any past tense verb allows an AT-TIME role
realized by the adverb "yesterday". Thus this role is apparently more a property of verb phrases in
general than a property of any individual verb. However, other roles – namely, those realized by
constituents for which the verb subcategorizes - seem to be properties of the verb. For example, the
verb "put" subcategorizes for a PP, and furthermore, this PP must realize the TO-LOC role. In verb
classification this latter type of role is important, and these roles are called the inner roles of the
verb.
The preceding examples suggest one test for determining whether a given role is an inner
role for a given verb: if the role is obligatory, it is an inner role. Other inner roles, however, appear
to be optional, so other tests are also needed. Another test is based on the observation that all verbs
may take at most one NP in any given inner role. If multiple NPs are needed, they must be related
by a conjunction. Thus you can say
John and I ran to the store.
but not
* John I ran to the store.
Similarly, you can say
I ran to the store and to the bank.
but not
* I ran to the store to the bank.
Thus the AGENT and TO-LOC roles for the verb run are inner roles.
Verbs typically specify up to three inner roles, at least one of which must always be realized
in any sentence using the verb. Sometimes a particular role must always be present (for example,
TO-LOC with put). Typically, the THEME role is also obligatory, whereas the AGENT role is always
optional for any verb that allows the passive form.
There are also syntactic restrictions on how various roles can be realized. Figure 4.6 shows a
sample of ways that roles can be realized in different sentences.
The following are some sample sentences with each verb in italics and its argument,
whether NP, PP, or embedded S, classified by its role in order of occurrence:
Jack ran. AGENT only
Jack ran with a crutch. AGENT + INSTR
Jack ran with a crutch for Susan. AGENT + INSTR + BENEFICIARY
Jack destroyed the car. AGENT + THEME
Jack put the car through the wall. AGENT + THEME + PATH
Jack sold Henry the car. AGENT + TO-POSS + THEME
Henry pushed the car from Jack's
house to the junkyard. AGENT + THEME + FROM-LOC + TO-LOC
Jack is tall. THEME
Henry believes that Jack is tall. EXPERIENCER + THEME
Susan owns a car. AT-POSS + THEME
I am in the closet. THEME + AT-LOC
The ice melted. THEME
Jack enjoyed the play. EXPERIENCER + THEME
The ball rolled down the hill
to the water. THEME + PATH + TO-LOC
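One common computational use of these patterns is a lexicon that pairs each verb with its expected role frame. The following is a minimal sketch under that assumption; the dictionary layout and helper function are hypothetical, but the frames follow the example sentences just listed.

# A minimal sketch of a verb lexicon keyed by thematic-role frames.

FRAMES = {
    "destroy": ["AGENT", "THEME"],
    "sell":    ["AGENT", "TO-POSS", "THEME"],
    "push":    ["AGENT", "THEME", "FROM-LOC", "TO-LOC"],
    "believe": ["EXPERIENCER", "THEME"],
}

def role_assignment(verb, *args):
    """Pair each argument with its role, in order of occurrence."""
    roles = FRAMES[verb]
    if len(args) != len(roles):
        raise ValueError(f"{verb!r} expects the roles {roles}")
    return dict(zip(roles, args))

print(role_assignment("sell", "Jack", "Henry", "the car"))
# {'AGENT': 'Jack', 'TO-POSS': 'Henry', 'THEME': 'the car'}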
Role           Realization
AGENT          as subject in active sentences; preposition by in passive sentences
THEME          as object of transitive verbs; as subject of nonaction verbs
INSTR          as subject in active sentences with no agent; preposition with
EXPERIENCER    as animate subject in active sentences with no agent
BENEFICIARY    as indirect object with transitive verbs; preposition for
AT-LOC         prepositions in, on, beyond, etc.
AT-POSS        possessive NP; as subject of sentence if no agent
TO-LOC         prepositions to, into
TO-POSS        preposition to; indirect object with certain verbs
FROM-LOC       prepositions from, out of, etc.
FROM-POSS      preposition from

Figure 4.6 Common realizations of the major roles
4.7 Linking Syntax and Semantics
This chapter discusses a particular method for linking logical forms with syntactic structures. This
will allow logical forms to be computed while parsing, a process called semantic interpretation. One
version discussed also allows a syntactic tree to be generated from a specified logical form, a
process called semantic realization. To fully couple syntax and semantics, there must be a well-
formed meaning expression for every constituent. The relation between the meaning of a
constituent and the meanings of its subconstituents can then be specified in the grammar using
features. Because each syntactic rule has a corresponding semantic interpretation rule, this method
is often referred to as a rule-by-rule style of semantic interpretation.
4.7.1 Semantic Interpretation and Compositionality
One of the principal assumptions often made about semantic interpretation is that it is a
compositional process. This means that the meaning of the constituent is derived solely from the
meanings of its subconstituents.
In linguistics compositionality is often defined in terms of a strict criterion: One
subconstituent must have as its meaning a function that maps the meanings of the other
subconstituents into the meaning of the new constituent. In computational approaches, the
requirement is often more relaxed and requires an incremental process of building meanings constituent by constituent, where the meaning of a constituent is produced by some well-defined computational function (that is, a program) that uses the meanings of the subconstituents as
inputs.
Compositional models tend to make grammars easier to extend and maintain. But
developing a compositional theory of semantic interpretation is harder than it looks. First, there
seems to be a structural inconsistency between syntactic structure and the structure of the logical
forms. For instance, a classic problem arises with quantified sentences. Consider the sentence Jill
loves every dog. The syntactic structure of this sentence clusters the words into phrases in the
obvious way: ((Jill) (loves (every dog))). But the unambiguous logical form of the sentence in a
predicate-argument form would be something like
(EVERY d : (DOG1 d) (LOVES1 l1 (NAME j1 "Jill") d))
This asserts that for every dog d there is an event l1 that consists of Jill loving d. There seems to be
no simple one-to-one correspondence between parts of the logical form and the constituents in
syntactic analysis. The phrase every dog, for instance, is a subconstituent of the VP loves every dog.
Its semantic interpretation, however – the generalized quantified phrase (EVERY d : (DOG1 d) ...) -
seems to have the meaning of the verb phrase as a part of it. Worse than that, the interpretation of
every dog seems to be split - it produces both the quantifier structure outside the predicate as well
as an argument to the predicate. As a result, it is hard to see how the meaning of every dog could be
represented in isolation and then used to construct the meaning of the sentence. This is one of
the most difficult problems to be dealt with if we are to have a compositional theory. Note that
introducing the unscoped logical form constructs provides one way around the problem. If we
define the goal of semantic interpretation as producing an unscoped logical form, the unscoped
version of this sentence would be
(LOVES1 l1 (NAME j1 "Jill") <EVERY d DOG1>)
which is much closer in structure to the syntactic form.
Another challenge to compositional theories is the presence of idioms. For instance, you may
say Jack kicked the bucket, meaning that Jack died. This interpretation seems to have no relation to
the meanings of the verb kick and nothing to do with buckets. Thus the meaning of the sentence
does not seem to be constructed out of the meaning of its subconstituents. One way to handle such
cases is to allow semantic meanings to be assigned to entire phrases rather than build the meaning
compositionally. So far, we have assumed that the primitive unit is the word (or morpheme).
Idiomatic expressions suggest that this might be generalized so that complete phrases may have a
primitive (that is, nonderived) meaning. In the previous example the verb phrase kick the bucket
has a primitive meaning similar to the verb die. This approach is supported by the observation that
certain syntactic paraphrases that are fine for sentences given their compositional meaning do not
apply for idiomatic readings. For instance, the passive sentence The bucket was kicked by Jack could
not mean that Jack died.
Interestingly, Jack kicked the bucket is ambiguous. It would have one meaning constructed
compositionally from the meanings of each of the words, say (KICK1 k1 (NAME j1 "Jack") <THE b
BUCKET>), and another constructed from the meaning of the word Jack and the primitive meaning
of the phrase kick the bucket, say (DIE1 d1 (NAME j1 "Jack")).
Another approach to this problem is to introduce new senses of words that appear in
idioms. For example, there might be a sense of kick that means DIE1, and that subcategorizes for an
object of type BUCKET1. While idioms are a very interesting and important aspect of language,
there will not be the space to deal with them in the next few chapters. For the purposes of this
book, you can assume that the primitive meanings are always associated with the word.
If the process of semantic interpretation is compositional, then you must be able to assign a
semantic structure to any syntactic constituent. For example, you must be able to assign some
uniform form of meaning to every verb phrase that can be used in any rule involving a VP as a
subconstituent. Consider the simplest case, where the VP consists of an intransitive verb, as in a
sentence such as Jack laughed. One suggestion is that the meaning of the verb phrase laughed is a
unary predicate that is true of any object that laughed in the past. Does this approach generalize? In
other words, could every VP have a meaning that is a unary predicate? Consider the sentence Jack
kissed Sue, with the logical form
(KISS1 k1 (NAME j1 "Jack") (NAME s1 "Sue"))
What is the meaning of the VP kissed Sue? Again, it could be a unary predicate that is true of any
object that kissed Sue. But so far we have no way to express such complex unary predicates. The
lambda calculus provides a formalism for this. In particular, the expression
(λ x (KISS1 k1 x (NAME s1 "Sue")))
is a predicate that takes one argument. You can view x as a parameter, and this predicate is true of any object o such that substituting o for x in the expression results in a true proposition. Like any
other predicate, you can construct a proposition from a lambda expression and an argument. In the
logical form language, the following is a proposition:
((λ x (KISS1 k1 x (NAME s1 "Sue"))) (NAME j1 "Jack"))
This proposition is true if and only if (NAME j1 "Jack") satisfies the predicate (λ x (KISS1 k1 x (NAME s1 "Sue"))), which by definition is true if and only if
(KISS1 k1 (NAME j1 "Jack") (NAME s1 "Sue"))
is true. We will often say that this last expression was obtained by applying the lambda expression (λ x (KISS1 k1 x (NAME s1 "Sue"))) to the argument (NAME j1 "Jack"). This operation is called lambda
reduction. Given that we have had to introduce new concepts such as lambda expressions in order
to establish a close syntactic-semantic coupling, you might be tempted to abandon this approach
and develop some other method of semantic interpretation. As a grammar becomes larger and
handles more complex phenomena, however, the compositional theory becomes more
attractive. For instance, using this method, verb phrases can easily be conjoined even when they
have different syntactic structures, as in the sentence
Sue laughs and opens the door.
There are two VPs here: laughs, which has a semantic interpretation as a unary predicate true of
someone who laughs, say (λ a (LAUGHS1 l2 a)); and opens the door, a unary predicate true of someone who opens the door, namely
(λ a (OPENS1 o1 a <THE d1 DOOR1>))
These two unary predicates can be combined to form a complex unary predicate that is true of
someone who both laughs and opens the door, namely,
(λ a (& (LAUGHS1 l2 a) (OPENS1 o1 a <THE d1 DOOR1>)))
This is in exactly the right form for a VP and can be combined with other constituents like any other
VP. For instance, it can be applied to a subject NP with logical form (NAME s1 "Sue") to form the
meaning of the original sentence.
(& (LAUGHS1 l2 (NAME s1 "Sue"))
(OPENS1 o1 (NAME s1 "Sue") <THE d1 DOOR1>))
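Python's own lambda expressions give a direct way to experiment with this idea. In the sketch below, a VP meaning is a function from the subject's logical form to a proposition (again using nested tuples), and lambda reduction is ordinary function application; the encoding is an illustrative assumption, not the notation of the logical form language itself.

# A minimal sketch: VP meanings as Python lambdas over the subject.

kissed_sue = lambda x: ("KISS1", "k1", x, ("NAME", "s1", "Sue"))

# Lambda reduction is just function application:
print(kissed_sue(("NAME", "j1", "Jack")))
# ('KISS1', 'k1', ('NAME', 'j1', 'Jack'), ('NAME', 's1', 'Sue'))

# Conjoining two VPs yields another unary predicate, so the result
# can be used anywhere an ordinary VP meaning is expected.
laughs = lambda a: ("LAUGHS1", "l2", a)
opens_the_door = lambda a: ("OPENS1", "o1", a, ("THE", "d1", "DOOR1"))
laughs_and_opens = lambda a: ("&", laughs(a), opens_the_door(a))

print(laughs_and_opens(("NAME", "s1", "Sue")))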
Consider another example. Prepositional phrase modifiers in noun phrases could be handled in
many different ways. For instance, we might not have an independent meaning for the phrase in the
store in the noun phrase The man in the store. Rather, a special mechanism might be used to look
for location modifiers in noun phrases and incorporate them into the interpretation. But this
mechanism would then not help in interpreting sentences like The man is in the store or The man
was thought to be in the store. If the prepositional phrase has an independent meaning, in this case
the unary predicate
(λ o (IN-LOC1 o <THE s1 STORE1>))
then this same interpretation can be used just as easily as a modifier to a noun phrase (adding a
new restriction) or as the predicate of a sentence. The logical form of the noun phrase the man in
the store would be
<THE m1 (MAN1 m1) (IN-LOC1 m1 <THE s1 STORE1>)>
while the logical form of the sentence The man is in the store would be
(IN-LOC1 <THE m1 MAN1> <THE s1 STORE1>)
These are just two simple examples. There are many other generalities that also arise if you adopt a
compositional approach to semantics.
In general, each major syntactic phrase corresponds to a particular semantic construction.
VPs and PPs map to unary predicates (possibly complex expressions built out of lambda
expressions), sentences map to propositions, and NPs map to terms. The minor categories map to
expressions that define their role in building the major categories. Since every constituent in the
same syntactic category maps to the same sort of semantic construct, these can all be treated
uniformly. For example, you don’t need to know the specific structure of a VP. As long as its
meaning is a unary predicate, you can use it to build the meaning of another larger constituent that
contains it.
4.7.2 A Simple Grammar and Lexicon with Semantic Interpretation
This section constructs a simple grammar and lexicon to illustrate how the logical form can
be computed using the features while parsing. To keep the examples simple, the logical form will be
the predicate-argument structure used in the last section rather than the thematic role
representation. This will allow all verbs with the same sub-categorization structure to be treated
identically.
The main extension needed is to add a SEM feature to each lexical entry and grammatical rule. For
example, one rule in the grammar might be
(S SEM (?semvp ?semnp)) -> (NP SEM ?semnp) (VP SEM ?semvp)
Consider what this rule does given the NP subconstituent with SEM (NAME m1 "Mary") and the VP
subconstituent with SEM
(λ a (SEES1 e8 a (NAME j1 "Jack")))
The SEM feature of the new S constituent is simply the expression
((λ a (SEES1 e8 a (NAME j1 "Jack"))) (NAME m1 "Mary"))
This expression can be simplified using lambda reduction to the formula
(SEES1 e8 (NAME m1 "Mary") (NAME j1 "Jack"))
which is the desired logical form for the sentence. Figure 4.7 shows the parse tree for this sentence
giving the SEM feature for each constituent.
Figure 4.7 A parse tree showing the SEM features
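The lambda reduction just used can be carried out mechanically over such SEM expressions. Below is a minimal sketch in which a symbolic lambda term is a ("lambda", variable, body) tuple and reduction substitutes the argument for the variable throughout the body; this naive substitution ignores variable capture, which is safe here because the parser generates unique discourse variables.

# A minimal sketch of lambda reduction over symbolic SEM values.
# A lambda term is ("lambda", var, body); reduction substitutes the
# argument for every occurrence of var in body.

def substitute(body, var, value):
    if body == var:
        return value
    if isinstance(body, tuple):
        return tuple(substitute(part, var, value) for part in body)
    return body

def reduce_sem(func, arg):
    tag, var, body = func            # expects a ("lambda", var, body) term
    assert tag == "lambda"
    return substitute(body, var, arg)

vp_sem = ("lambda", "a", ("SEES1", "e8", "a", ("NAME", "j1", "Jack")))
np_sem = ("NAME", "m1", "Mary")

print(reduce_sem(vp_sem, np_sem))
# ('SEES1', 'e8', ('NAME', 'm1', 'Mary'), ('NAME', 'j1', 'Jack'))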
In the lexicon the SEM feature is used to indicate the possible senses of each word. Generally, a
word will have a different word sense for every possible subcategorization it has, since these will
be predicates of different arity. A sample lexicon is shown in Figure 4.8. When a word has a different
SEM form depending on its syntactic features, multiple entries are required. For example, the verb
decide has two entries: one for the case where the SUBCAT is _none, and one for the case where the
SUBCAT is _pp:on, where the verb has an additional argument. Note also that the word fish also has
two entries because its SEM depends on whether it is singular or plural.
a (art AGR 3s SEM INDEF1)
can (aux SUBCAT base SEM CAN1)
car (n SEM CAR1 AGR 3s)
cry (v SEM CRY1 VFORM base SUBCAT _none)
decide (v SEM DECIDES1 VFORM base SUBCAT _none)
decide (v SEM DECIDES-ON1 VFORM base SUBCAT _pp:on)
dog (n SEM DOG1 AGR 3s)
fish (n SEM FISH1 AGR 3s)
fish (n SEM (PLUR FISH1) AGR 3p)
has (aux VFORM pres AGR 3s SUBCAT pastprt SEM PERF)
he (pro SEM HE1 AGR 3s)
house (n SEM HOUSE1 AGR 3s)
in (p PFORM {LOC MOT} SEM IN-LOC1)
Jill (name AGR 3s SEM "Jill")
man (n SEM MAN1 AGR 3s)
men (n SEM (PLUR MAN1) AGR 3p)
on (p PFORM LOC SEM ON-LOC1)
saw (v SEM SEES1 VFORM past SUBCAT _np AGR ?a)
see (v SEM SEES1 VFORM base SUBCAT _np IRREG-PAST + EN-PASTPRT +)
she (pro AGR 3s SEM SHE1)
the (art SEM THE AGR {3s 3p})
to (to AGR - VFORM inf)
Figure 4.8 A small lexicon showing the SEM features
Consider Grammar 4.3, which accepts very simple sentences and verb phrases and computes
their logical form. Note that another feature is introduced in addition to the SEM feature. The VAR
feature is new and stores the discourse variable that corresponds to the constituent. It will be
useful for handling certain forms of modifiers in the development that follows. The VAR feature is
automatically generated by the parser when a lexical constituent is constructed from a word, and
then it is passed up the tree by treating VAR as a head feature. It guarantees that the discourse
variables are always unique.
1. (S SEM (?semvp ?semnp)) -> (NP SEM ?semnp) (VP SEM ?semvp)
2. (VP VAR ?v SEM (λ a2 (?semv ?v a2))) -> (V[_none] SEM ?semv)
3. (VP VAR ?v SEM (λ a3 (?semv ?v a3 ?semnp))) -> (V[_np] SEM ?semv) (NP SEM ?semnp)
4. (NP WH - VAR ?v SEM (PRO ?v ?sempro)) -> (PRO SEM ?sempro)
5. (NP VAR ?v SEM (NAME ?v ?semname)) -> (NAME SEM ?semname)
6. (NP VAR ?v SEM <?semart ?v (?semcnp ?v)>) -> (ART SEM ?semart) (CNP SEM ?semcnp)
7. (CNP SEM ?semn) -> (N SEM ?semn)
Head features for S, VP, NP, CNP: VAR
Grammar 4.3 A simple grammar with SEM features
The lexical rules for morphological derivation must also be modified to handle the SEM feature. For instance, the rule that converts a singular noun into a plural noun takes the SEM of the singular noun and adds the PLUR operator:
(N AGR 3p SEM (PLUR ?semn)) -> (N AGR 3s IRREG-PL - SEM ?semn) +S
A similar technique is used to insert unscoped tense operators for the past and present tenses. The revised morphological rules are shown in Grammar 4.4.
L1. (V VFORM pres AGR 3s SEM <PRES ?semv>) ->
    (V VFORM base IRREG-PRES - SEM ?semv) +S
L2. (V VFORM pres AGR {1s 2s 1p 2p 3p} SEM <PRES ?semv>) ->
    (V VFORM base IRREG-PRES - SEM ?semv)
L3. (V VFORM past AGR {1s 2s 3s 1p 2p 3p} SEM <PAST ?semv>) ->
    (V VFORM base IRREG-PAST - SEM ?semv) +ED
L4. (V VFORM pastprt SEM ?semv) -> (V VFORM base EN-PASTPRT - SEM ?semv) +ED
L5. (V VFORM pastprt SEM ?semv) -> (V VFORM base EN-PASTPRT + SEM ?semv) +EN
L6. (V VFORM ing SEM ?semv) -> (V VFORM base SEM ?semv) +ING
L7. (N AGR 3p SEM (PLUR ?semn)) -> (N AGR 3s IRREG-PL - SEM ?semn) +S
Grammar 4.4 The morphological rules with semantic interpretation
These are the same as the original rules given as Grammar 3.5 except for the addition of the SEM
feature.
Rule 1 was previously discussed. Rules 2 and 3 handle intransitive and transitive verbs and
build the appropriate VP interpretation. Each takes the SEM of the verb, ?semv, and constructs a
unary predicate that will apply to the subject. The arguments to the verb sense include an event
variable, which is stored in the VAR feature, the subject, and then any additional arguments for
subcategorized constituents. Rule 4 constructs the appropriate SEM structure for pronouns given a
pronoun sense ?sempro, and rule 5 does the same for proper names. Rule 6 defines an expression
that involves an unscoped quantified expression, which consists of the quantifier ?semart, the
discourse variable ?v, and a proposition restricting the quantifier, which is constructed by applying
the unary predicate ?semcnp to the discourse variable. For example, assuming that the discourse variable ?v is m1, the NP the man would combine the SEM of the, namely the operator THE, with the
SEM of man, namely MAN1, to form the expression <THE m1 (MAN1 m1)>. Rule 7 builds a simple
CNP out of a single N. Since the SEM of a common noun is a unary predicate already, this value
is simply used as the SEM of the CNP.
Only two simple modifications need to be made to the standard chart parser to handle
semantic interpretation:
• When a lexical rule is instantiated for use, the VAR feature is set to a new discourse variable.
• Whenever a constituent is built, the SEM is simplified by performing any lambda reductions that are possible.
Figure 4.9 The parse of "Jill saw the dog" showing the SEM and VAR features
With these two changes, the existing parser now will compute the logical form as it parses.
Consider a trace of the parser on Jill saw the dog, whose parse tree is shown in Figure 4.9. The word
Jill is parsed as a name using the lexical lookup. A new discourse variable, j1, is generated and set to
the VAR feature. This constituent is used by rule 5 to build an NP. Since VAR is a head feature, the
VAR feature of the NP will be j1 as well, and the SEM is constructed from the equation in the
obvious manner. The lexical entry for the word saw generates a V constituent with the SEM <PAST
SEES1> and a new VAR ev1. Rule 6 combines the SEMs of the two entries for the and dog with the
VAR from the noun to build an NP with the SEM <THE d1 (DOG1 d1)>. This is then combined with
the SEM of the verb and its VAR by rule 3 to form a VP with the SEM
(λ x (<PAST SEES1> ev1 x <THE d1 (DOG1 d1)>))
This is then combined with the subject NP to form the final logical form for the sentence.
This completes the description of the basic semantic interpretation process. With the two
new features and two minor extensions to the parser described here, a grammar can be specified
that builds the logical form as it parses.
4.7.3 Prepositional Phrases and Verb Phrases
While the last section introduced everything needed to semantically interpret sentences, only the
simplest interpretation techniques were introduced. This section describes some additional
examples of grammatical rules that handle slightly more complicated phenomena. Specifically, it
addresses the interpretation of verb phrases and prepositional phrases in more detail. First,
consider the rule that is needed for handling auxiliary verbs:
(VP SEM (λ a1 (?semaux (?semvp a1)))) ->
(AUX SUBCAT ?v SEM ?semaux)
(VP VFORM ?v SEM ?semvp)
This rule inserts a modal operator in the appropriate place for the new VP. The SEM equation for
the new VP is a little complicated, so consider it in more detail. If ?semaux is a modal operator such
as CAN1, and ?semvp is a lambda expression such as (λ x (LAUGHS1 e3 x)), then, according to the
auxiliary rule, the SEM of the VP can laugh will be
(λ a1 (CAN1 ((λ x (LAUGHS1 e3 x)) a1)))
This can be simplified to
(λ a1 (CAN1 (LAUGHS1 e3 a1)))
It might help to view this type of SEM equation as "lifting" the variable for the subject over
the CAN1 operator. Starting with the VP interpretation (λ x (LAUGHS1 e3 x)), the equation builds a
new formula containing the CAN1 operator yet still retaining the lambda variable for the subject on
the outside of the formula. Note that, like all VPs, the new SEM is a unary predicate that applies to
the subject, so the auxiliary rule could be used recursively to analyze more complex sequences of
auxiliary verbs.
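The "lifting" step is compact enough to state directly in code. This is a minimal sketch in the same lambda-as-function style used earlier; the operator and sense names follow the example above.

# A minimal sketch of the auxiliary rule: lift the subject variable
# over a modal operator while keeping the result a unary predicate.

def lift(modal_op, semvp):
    """semvp maps a subject to a proposition; wrap its result."""
    return lambda a: (modal_op, semvp(a))

laugh = lambda x: ("LAUGHS1", "e3", x)
can_laugh = lift("CAN1", laugh)

print(can_laugh(("NAME", "j1", "Jack")))
# ('CAN1', ('LAUGHS1', 'e3', ('NAME', 'j1', 'Jack')))

# The result is again a unary predicate, so lift() applies
# recursively to longer auxiliary sequences, e.g. a tense operator:
print(lift("PAST", can_laugh)(("NAME", "j1", "Jack")))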
To analyze prepositional phrases, it is important to realize that they play two different
semantic roles in sentences. In one analysis the PP is a modifier to a noun phrase or verb phrase. In
the other use the PP is subcategorized for by a head word, and the preposition acts more as a flag
for an argument position than as an independent predicate.
Consider the modification case first. In these cases the SEM of the PP is a unary predicate to
be applied to whatever it eventually modifies. Thus the following rule would be appropriate for building a PP modifier:
(PP SEM (λ y (?semp y ?semnp))) -> (P SEM ?semp) (NP SEM ?semnp)
Given the PP in the corner, if the SEM of the P is IN-LOC1, and the SEM of the NP is <THE c1
(CORNER1 c1)>, then the SEM of the PP would be the unary predicate
(λ y (IN-LOC1 y <THE c1 CORNER1>))
Now you can consider the interpretation of the noun phrase the man in the corner. A reasonable rule that incorporates the PP modifier would be
(CNP SEM (λ n1 (& (?semcnp n1) (?sempp n1)))) -> (CNP SEM ?semcnp) (PP SEM ?sempp)
Given that the SEM of the CNP man is the unary predicate MAN1 and the SEM of the PP in the corner is (λ y (IN-LOC1 y <THE c1 CORNER1>)), the new SEM of the CNP, before simplification, would be
(λ n1 (& (MAN1 n1) ((λ y (IN-LOC1 y <THE c1 CORNER1>)) n1)))
The subexpression ((λ y (IN-LOC1 y <THE c1 CORNER1>)) n1) can then be simplified to (IN-LOC1 n1 <THE c1 CORNER1>). Thus the overall expression becomes
(λ n1 (& (MAN1 n1) (IN-LOC1 n1 <THE c1 CORNER1>)))
This is a unary predicate true of any man who is in the corner, the desired interpretation. Combining this with a quantifier such as the using rule 6 would form a SEM such as
<THE m2 ((λ z (& (MAN1 z) (IN-LOC1 z <THE c1 CORNER1>))) m2)>
which itself simplifies to
<THE m2 (& (MAN1 m2) (IN-LOC1 m2 <THE c1 CORNER1>))>
PPs can also modify verb phrases, as in the VP "cry in the corner" in "Jill can cry in the corner". The
syntactic rule that introduces the PP modifier would be
VP -> VP PP
You might think that the SEM feature equation can be treated in exactly a parallel way as with the
noun phrase case, but there is a complication. Consider the desired behavior. The VP
subconstituent cry has the logical form
(λ x (CRIES1 e1 x))
and the PP has the logical form shown above. The desired logical form for the entire VP cry in the
corner is
(λ a (& (CRIES1 e1 a) (IN-LOC1 e1 <THE c1 CORNER1>)))
The problem is that the unary predicate for the VP subconstituent is intended to apply to the
subject, where the unary predicate for the PP modifier must apply to the discourse variable
generated for the VP. So the technique used for rule 9 does not yield the correct answer. Rather, the
SEM constructed from the PP must be applied to the discourse variable instead. In other words, the
appropriate rule is
(VP VAR ?v SEM (λ x (& (?semvp x) (?sempp ?v)))) -> (VP VAR ?v SEM ?semvp) (PP SEM ?sempp)
The parse tree for the VP cry in the corner using this rule is shown in Figure 4.10.
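The contrast between the two attachment targets (the subject for the VP predicate, the event variable for the PP predicate) can be made explicit in the same style. A minimal sketch, with the event variable passed in as the VP's VAR value:

# A minimal sketch of VP modification: the PP predicate applies to
# the VP's discourse (event) variable, while the VP predicate still
# awaits the subject.

def modify_vp(semvp, sempp, event_var):
    return lambda x: ("&", semvp(x), sempp(event_var))

cry = lambda x: ("CRIES1", "e1", x)
in_the_corner = lambda y: ("IN-LOC1", y, ("THE", "c1", "CORNER1"))

cry_in_the_corner = modify_vp(cry, in_the_corner, "e1")
print(cry_in_the_corner(("NAME", "j1", "Jill")))
# ('&', ('CRIES1', 'e1', ('NAME', 'j1', 'Jill')),
#       ('IN-LOC1', 'e1', ('THE', 'c1', 'CORNER1')))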
Prepositional phrases also appear as subcategorized constituents in verb phrases, and these cases
must be treated differently. Specifically, it is the verb that determines how the prepositional phrase
is interpreted. For example, the PP on a couch in isolation might indicate the location of some
object or event, but with the verb decide, it can indicate the object that is being decided about. The
distinction between the two readings can be illustrated by considering that the sentence Jill
decided on a couch is ambiguous between two readings:
 Jill made a decision while she was on a couch.
 Jill made a decision about a couch.
Figure 4.10 Using the VAR feature for PP modifiers of VPs
The first case results from treating on a couch as an adverbial PP, which was discussed above. What
is the semantic interpretation equation for the second case? The appropriate syntactic rule is
VP -> V[_pp:on] PP[on]
and the desired logical form of the final VP is
(λ s (DECIDES-ON1 d1 s <A c1 (COUCH1 c1)>))
Note that there appears to be no semantic contribution from the word on in this case.
Subcategorized PPs are handled in many systems by distinguishing between two different types of
prepositional phrases. A new binary feature PRED is introduced, where + indicates that the
prepositional phrase should be interpreted as a predicate, and - means it should be interpreted as
an argument, that is, a term. This binary distinction allows us to specify two rules for prepositional
phrases. These are rules 8 and 9 in Grammar 4.7. Rule 8 is restricted to prepositional phrases with
a +PRED value, and the PP acts as a modifier. Rule 9 handles all the subcategorized prepositional
phrases, with the -PRED feature, in which case the SEM of the PP is simply the SEM of the object NP.
Grammar 4.7 also summarizes all the other rules developed in this section.
8. (PP PRED + SEM (λ x (?semp x ?semnp))) -> (P SEM ?semp) (NP SEM ?semnp)
9. (PP PRED - PFORM ?pf SEM ?semnp) -> (P ROOT ?pf) (NP SEM ?semnp)
10. (VP VAR ?v SEM (λ ag1 (& (?semvp ag1) (?sempp ?v)))) ->
(VP SEM ?semvp) (PP PRED + SEM ?sempp)
11. (VP VAR ?v SEM (λ ag2 (?semv ?v ag2 ?sempp))) ->
(V[_pp:on] SEM ?semv) (PP PRED - PFORM on SEM ?sempp)
12. (VP SEM (λ a1 (?semaux (?semvp a1)))) ->
(AUX SUBCAT ?v SEM ?semaux) (VP VFORM ?v SEM ?semvp)
13. (CNP SEM (λ n1 (& (?semcnp n1) (?sempp n1)))) ->
(CNP SEM ?semcnp) (PP PRED + SEM ?sempp)
Head features for PP: PFORM
Head features for VP, CNP: VAR
Grammar 4.7 Rules to handle PPs in verb phrases
Figure 4.11 shows the two readings of the VP decide on a couch given the grammar defined
by Grammars 4.3 and 4.7. The case in which the decision is about a couch is shown in the upper half
of the figure: the PP has the feature -PRED, as previously discussed, and its SEM is <A c1 COUCH1>.
The case in which a decision is made on a couch is shown in the lower half of the figure: the PP has
the feature +PRED and its SEM is (λ x (ON-LOC1 x <A c1 COUCH1>)).
Figure 4.11 Two possible parse trees for the VP "decide on a couch"
4.7.4 Lexicalized Semantic Interpretation and Semantic Roles
So far, the semantic forms of the lexical entries have only consisted of the possible senses for each
word, and all the complexity of the semantic interpretation is encoded in the grammatical rules.
While this is a reasonable strategy, many researchers use a different approach, in which the lexical
entries encode the complexities and the grammatical rules are simpler. Consider the verb decide,
say in its intransitive sense, DECIDES1. The SEM of the verb is simply the sense DECIDES1, and rule
2 builds the lambda expression (λ y (DECIDES1 e1 y)). An alternative approach would be to define
the SEM of the lexical entry to be (λ y (DECIDES1 e1 y)), and then the SEM equation for rule 2
would just use the SEM of the verb as its SEM. Likewise, the SEM for the lexical entry for the
transitive sense could be the expression (λ o (λ y (DECIDES-ON1 e1 y o))). The SEM equation for
rule 3 would then apply this predicate to the SEM of the object to obtain the appropriate SEM, as
before.
There is a tradeoff here between the complexity of the grammatical rules and the complexity
of the lexical entries. With the grammar we have developed so far, it seems better to stay with the
simple lexicon. But if the logical forms become more complex, the alternative approach begins to
look attractive. As an example, consider how to specify a grammar that produces a logical form
based on thematic roles. Consider first what happens if the lexicon can only store atomic word
senses. Whereas the earlier grammar had only one rule that could cover all transitive verbs, the
new grammar would have to classify transitive verbs by the thematic roles they use, and a separate
rule must be used for each. The verbs see and eat, for example, both have transitive forms where
the subject fills the AGENT role and the object fills the THEME role. The verb break, on the other
hand, also has a sense where the subject fills the INSTR role and the object fills the THEME role, as
in The hammer broke the window. To handle these two cases, a new feature, say ROLES, would
have to be added to the lexical entries that identifies the appropriate forms, and then a separate
grammar rule added for each. These might look as follows:
(VP VAR ?v SEM (λ a (?semv ?v [AGENT a] [THEME ?semnp]))) ->
(V ROLES AG-THEME SEM ?semv) (NP SEM ?semnp)
(VP VAR ?v SEM (λ a (?semv ?v [INSTR a] [THEME ?semnp]))) ->
(V ROLES INSTR-THEME SEM ?semv) (NP SEM ?semnp)
Additional rules would be added for all the other possible combinations of roles that can be used by
the verbs.
Clearly, this approach could be cumbersome. Since it requires adding thematic role
information to the lexicon anyway (using the ROLES feature), it might be simpler just to encode the
appropriate forms in the lexicon. For instance, if the lexical entries are
see: (V VAR ?v SEM (λ o (λ a (SEES1 ?v [AGENT a] [THEME ?o]))))
break: (V VAR ?v SEM (λ o (λ a (BREAKS1 ?v [INSTR a] [THEME ?o]))))
then a single grammar rule, namely,
(VP SEM (?semv ?semnp)) -> (V SEM ?semv) (NP SEM ?semnp)
will cover all the cases. Consider the VP see the book, where the SEM of "see" is as above and "the
book" is <THE b1 (BOOK1 b1)>. The SEM for the VP would be
((λ o (λ a (SEES1 b1 [AGENT a] [THEME o]))) <THE b1 (BOOK1 b1)>)
which can be simplified using one lambda reduction to
(λ a (SEES1 b1 [AGENT a] [THEME <THE b1 (BOOK1 b1)>]))
The same rule given the VP break the book would apply the SEM of break above to the SEM for the
book to produce the reduced logical form
(λ a (BREAKS1 b1 [INSTR a] [THEME <THE b1 (BOOK1 b1)>]))
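The currying can be seen directly by encoding the lexical entry as nested Python closures (again an illustrative tuple encoding; the variable b1 is used exactly as in the text):

# Curried lexical entry for "see", mirroring (λ o (λ a (SEES1 b1 ...))).
see = lambda o: lambda a: ("SEES1", "b1", ("AGENT", a), ("THEME", o))
the_book = ("THE", "b1", ("BOOK1", "b1"))

vp_sem = see(the_book)   # the single rule (VP SEM (?semv ?semnp)) is one application
print(vp_sem("a"))
# ('SEES1', 'b1', ('AGENT', 'a'), ('THEME', ('THE', 'b1', ('BOOK1', 'b1'))))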
Hierarchical Lexicons
The problem with making the lexicon more complex is that there are many words, and specifying a
lexicon is difficult even when the entries are simple. Just specifying the semantic interpretation
rules for the most common sense is tedious, as there is a different semantic interpretation rule for
each complement structure the verb allows. For example, the verb give allows the forms
I gave the money.
I gave John the money.
I gave the money to John.
The lexical entries for give would include the following:
(V SUBCAT _np
SEM λ o λ a (GIVE1 * [AGENT a] [THEME o]))
(V SUBCAT _np_np
SEM λ r λ o λ a (GIVE1 * [AGENT a] [THEME o] [TO-POSS r]))
(V SUBCAT _np_pp:to
SEM λ o λ r λ a (GIVE1 * [AGENT a] [THEME o] [TO-POSS r]))
This is quite a burden if it must be repeated for every verb. Luckily, we can do much better
by exploiting some general regularities across verbs in English. For example, there is a very large
class of verbs, including most transitive verbs, that all use the same semantic interpretation rule for
the _np SUBCAT form. This class includes verbs such as give, take, see, find, paint, and so on -
virtually all verbs that describe actions. The idea of a hierarchical lexicon is to organize verb senses
in such a way that their shared properties can be captured concisely. This depends on a technique
called inheritance, where word senses inherit or acquire the properties of the abstract classes
above them in the hierarchy. For example, a very useful lexical hierarchy can be based on the
SUBCAT and SEM properties of verbs. Near the top of the hierarchy are abstract verb senses that
define common verb classes. For example, the abstract class INTRANS-ACT defines a class of verbs
that allow a SUBCAT _none and have the semantic interpretation rule
λ s (?PREDN * [AGENT s])
where ?PREDN is a predicate name determined by the verb. This fully specifies the semantic
interpretation of intransitive verbs such as run, laugh, sit, and so on, except for the actual predicate
name that still needs to be specified in the lexical entry for the word. Another common form is the
simple transitive verb that describes an action, including the verbs listed above. This form, TRANS-
ACT, has a SUBCAT _np and a SEM λ o λ a (?PREDN * [AGENT a] [THEME o]).
We can define similar classes for all the common verb forms and then build an inheritance
structure that relates verb senses to the forms they allow. Figure 4.12 shows a lexical hierarchy
that encodes the definitions of four different verb senses. It is equivalent to the following entries
specified without a hierarchy:
run (in the intransitive exercise sense, RUN1):
(SUBCAT _none
SEM λ a (RUN1 * [AGENT a]))
run (in the transitive "operate" sense, OP1):
(SUBCAT _np
SEM λ o λ a (OP1 * [AGENT a] [THEME o]))
donate (allowing the transitive and "to" forms):
(SUBCAT _np
SEM λ o λ a (DONATE1 * [AGENT a] [THEME o]))
(SUBCAT _np_pp:to
SEM λ o λ r λ a (DONATE1 * [AGENT a] [THEME o] [TO-POSS r]))
and, of course,
give (in all its forms as previously discussed):
(SUBCAT _np
SEM λ o λ a (GIVE1 * [AGENT a] [THEME o]))
(SUBCAT _np_pp:to
SEM λ o λ r λ a (GIVE1 * [AGENT a] [THEME o] [TO-POSS r]))
(SUBCAT _np_np
SEM λ r λ o λ a (GIVE1 * [AGENT a] [THEME o] [TO-POSS r]))
Figure 4.12 Part of a lexical hierarchy for verbs based on SUBCAT and SEM features
You could implement a lexical hierarchy by adding another feature to each lexical entry called SUP
(for superclass), which has as its value a list of abstract categories from which the constituent
inherits properties. It is then relatively simple to write a program that searches up this hierarchy to
find all the relevant feature values whenever the lexicon is accessed. The entry for the verb give
might now look like this:
give: (V ROOT give
PREDN GIVE1
SUP (BITRANS-TO-ACT TRANS-ACT))
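A minimal sketch of how such a lookup might be implemented (the dictionary encoding and helper names here are assumptions for illustration; a real system would store and search full feature structures):

# Abstract verb classes map a predicate name to a SUBCAT frame plus a
# curried SEM, as in the TRANS-ACT and BITRANS-TO-ACT classes above.
def trans_act(predn):
    return ("_np", lambda o: lambda a:
            (predn, "*", ("AGENT", a), ("THEME", o)))

def bitrans_to_act(predn):
    return ("_np_pp:to", lambda o: lambda r: lambda a:
            (predn, "*", ("AGENT", a), ("THEME", o), ("TO-POSS", r)))

HIERARCHY = {"TRANS-ACT": trans_act, "BITRANS-TO-ACT": bitrans_to_act}

LEXICON = {
    "give":   {"PREDN": "GIVE1",   "SUP": ["BITRANS-TO-ACT", "TRANS-ACT"]},
    "donate": {"PREDN": "DONATE1", "SUP": ["BITRANS-TO-ACT", "TRANS-ACT"]},
}

def lookup(word):
    # Instantiate every inherited frame with the word's own predicate name.
    entry = LEXICON[word]
    return [HIERARCHY[sup](entry["PREDN"]) for sup in entry["SUP"]]

for subcat, sem in lookup("give"):
    print(subcat)        # _np_pp:to, then _np

Note that this two-class sketch omits give's _np_np frame; adding another abstract class for it would follow the same pattern.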
14. (S INV - SEM (WH-query ?sems)) -> (NP WH Q AGR ?a SEM ?semnp)
(S INV + SEM ?sems GAP (NP AGR ?a SEM ?semnp))
15. (S INV + GAP ?g SEM (?semaux (?semvp ?semnp))) -> (AUX AGR ?a SUBCAT ?s SEM ?semaux)
(NP AGR ?a GAP - SEM ?semnp)
(VP VFORM ?s GAP ?g SEM ?semvp)
16. (NP WH Q VAR ?v SEM <WH ?v (?sempro ?v)>) -> (PRO WH Q SEM ?sempro)
Grammar 4.10 Rules to handle simple wh-questions
4.7.5 Handling Simple Questions
The grammar developed so far deals only with simple declarative sentences. To extend the
grammar to handle other sentence types requires adding rules to interpret wh-terms, inverted
sentences, and the gap propagation required to handle wh-questions. This can be done quite
directly with the mechanisms developed so far and requires no additional extensions to the parser.
All you need to do is augment the S rules first developed for questions in Chapter 5 with
appropriate SEM feature equations.
The part that may need some explanation is how the SEM feature interacts with the GAP
feature. As you recall, many wh-questions use the GAP feature to build the appropriate structures.
Examples are
Who did Jill see?
Who did Jill want to see?
Who did Jill want to win the prize?
The rule introduced in Chapter 5 to account for these questions was
(S INV -) -> (NP WH Q AGR ?a) (S INV + GAP (NP AGR ?a))
That is, a wh-question consists of an NP with the WH feature Q followed by an inverted sentence
with an NP missing that agrees with the first NP. To make the semantic interpretation work, we add
the SEM feature to the features of the gap. This way it is passed into the S structure and can be used
at the appropriate place when the gap is found. The revised rule is
(S INV - SEM (WH-query ?sems)) -> (NP WH Q AGR ?a SEM ?semnp)
(S INV + SEM ?sems GAP (NP AGR ?a SEM ?semnp))
Grammar 4.10 gives the new rules required to handle this type of question. Rule 14 was just
discussed. Rule 15 handles inverted sentences and could be used for yes/no questions as well as for
the wh-questions discussed here. Rule 16 allows noun phrases to be built from wh-pronouns, such
as who and what.
The lexical entry for the wh-words would have to be extended with a SEM feature as well.
For instance, the word who would have the entry
(PRO WH {Q R} SEM WHO1 AGR {3s 3p})
The predicate WHO1 would be true of any objects that are reasonable answers to such questions,
including people and possibly other animate agents.
To see how the SEM feature and GAP features interact, consider a derivation of the parse tree for
the question Who did Jill see?, as shown in Figure 4.13. Using rule 16, the word who would be
parsed as the noun phrase
(NP WH Q AGR 3s SEM <WH p1 (WHO1 p1)>)
Figure 4.13 The parse tree for "Who did Jill see?"
This constituent can be used to start rule 14, and thus we need the following constituent to
complete the rule:
(S INV +
GAP (NP AGR 3s SEM <WH p1 (WHO1 p1)>)
SEM ?sems)
Rule 15 can be used to rewrite this, and the words did and Jill provide the first two subconstituents.
The third subconstituent needed is:
(VP VFORM base
GAP (NP AGR 3s SEM <WH p1 (WHO1 p1)>)
SEM ?semvp)
This is a VP with an NP gap. Rule 3 from Grammar 4.3 applies to transitive verbs such as see. The
GAP feature is used to fill the gap for the object of the verb, instantiating ?semnp to <WH p1 (WHO1
p1)>. Thus the SEM of the new VP is
(λ a3 (SEES1 s1 a3 <WH p1 (WHO1 p1)>))
The SEM of the initial word who has found its way to the appropriate place in the sentence. The
parse can now be completed as usual. This technique of passing in the SEM in the GAP is
completely general and can be used to handle all of the question types discussed in Chapter 5.
Prepositional Phrase Wh-Questions
Questions can also begin with prepositional phrases, such as In which box did you put the book?,
Where did you put the book?, and When did he disappear? The semantic interpretation of these
questions will depend on whether the PPs are subcategorized for the verb or are VP modifiers.
Many such questions can be handled by a rule virtually identical to rule 14 for NP wh-questions,
namely
(S INV - SEM (WH-query ?sems)) ->
(PP WH Q PRED ?p PTYPE ?pt SEM ?sempp)
(S INV + SEM ?sems GAP (PP PRED ?p PTYPE ?pt SEM ?sempp))
To handle wh-terms like where appropriately, the following rule is also needed:
(PP PRED ?pd PTYPE ?pt SEM ?sem) -> (PP-WRD PRED ?pd PTYPE ?pt SEM ?sem)
The wh-term where would have two lexical entries, one for each PRED value:
(PP-WRD PTYPE {LOC MOT} PRED - VAR ?v
SEM <WH ?v (LOC1 ?v)>)
(PP PRED + VAR ?v SEM (λ x (AT-LOC x <WH ?v (LOC1 ?v)>)))
These rules would extend the existing grammar so that many such questions could be answered.
Figure 4.14 shows part of the parse tree for the question "Where did Jill go?".
Figure 4.14 The parse tree for "Where did Jill go?"
Note that handling questions starting with +PRED prepositional phrases depends on having
a solution to the problem concerning gap propagation first mentioned in Chapter 5. Specifically, the
rule VP -> VP PP, treated in the normal way, would only pass the GAP into the VP subconstituent,
namely the nonlexical head. Thus there seems no way to create a PP gap that modifies a verb
phrase. But this was a problem with the syntactic grammar as well, and any solution at that level
should carry over to the semantic level.
4.7.6 Semantic Interpretation Using Feature Unification
So far we have used lambda expressions and lambda reduction to drive the semantic
interpretation. This provides a good framework for explaining and comparing techniques for
semantic interpretation. However, many systems do not explicitly use lambda expressions and
perform semantic interpretation directly using feature values and variables. The basic idea is to
introduce new features for the argument positions that earlier would have been filled using lambda
reduction. For instance, instead of using rule 1 in Grammar 4.3, namely
(S SEM (?semvp ?semnp)) -> (NP SEM ?semnp) (VP SEM ?semvp)
a new feature SUBJ is introduced, and the rule becomes
(S SEM ?semvp) -> (NP SEM ?semnp) (VP SUBJ ?semnp SEM ?semvp)
The SEM of the subject is passed into the VP constituent as the SUBJ feature and the SEM equations
for the VP insert the subject in the correct position. The new version of rule 3 in Grammar 4.3 that
does this is
(VP VAR ?v SUBJ ?semsubj SEM (?semv ?v ?semsubj ?semnp)) ->
(V[_none] SEM ?semv) (NP SEM ?semnp)
Figure 4.15 shows how this rule builds the SEM of the sentence Jill saw the dog. Compare this to the
analysis built using Grammar 4.3 shown in Figure 4.9. The differences appear in the treatment of
the VP. Here the SEM is the full proposition with the subject inserted, whereas before the SEM was
a lambda expression that would be applied to the subject later in the rule that builds the S.
Figure 4.15 The parse tree for Jill saw the dog using the SUBJ feature
1. (S SEM ?semvp) -> (NP SEM ?semsubj) (VP SUBJ ?semsubj SEM ?semvp)
2. (VP VAR ?v SUBJ ?semsubj SEM (?semv ?v ?semsubj)) -> (V[_none] SEM ?semv)
3. (VP VAR ?v SUBJ ?semsubj SEM (?semv ?v ?semsubj ?semnp)) ->
(V[_np] SEM ?semv) (NP SEM ?semnp)
4. (NP VAR ?v SEM (PRO ?v ?sempro)) -> (PRO SEM ?sempro)
5. (NP VAR ?v SEM (NAME ?v ?semname)) -> (NAME SEM ?semname)
6. (NP VAR ?v SEM <?semart ?v ?semcnp>) -> (ART SEM ?semart) (CNP SEM ?semcnp)
7. (CNP VAR ?v SEM (?semn ?v)) -> (N SEM ?semn)
Head features for S, VP, NP, CNP: VAR
Grammar 4.14 A simple grammar with SEM features
Grammar 4.14 is a version of Grammar 4.3 reformulated using this technique. Besides the
changes to rules 1, 2, and 3, rules 6 and 7 are also modified. Rule 7 uses the VAR value to build a full
proposition (as opposed to a unary predicate in the old grammar), and rule 6 is changed
appropriately to account for the change to rule 7.
One advantage of this approach is that no special mechanism need be introduced to handle
semantic interpretation. In particular, there is no need to have a lambda reduction step. Everything
is accomplished by feature unification. Another significant advantage is that a grammar specified in
this form is reversible and hence can be used to generate sentences as well as parse them, as
discussed in the next section. Not all lambda expressions can be eliminated using these techniques,
however. For instance, to handle conjoined subject phrases as in Sue and Sam saw Jack, the
meaning of the verb phrase must still be a lambda expression. If the subject were inserted into the
VP using a variable for the SUBJ feature, then this variable would have to unify with the SEMs
of both Sue and Sam, which it cannot do.
4.7.7 Generating Sentences from Logical Form
Intuitively, once you have a grammar for parsing, it should be easy to reverse it and use it for
generation. Such a sentence generator would be given a constituent with its SEM feature set to a
logical form, and then it would use the grammar to decompose this constituent into a series of
lexical constituents that would have the appropriate meaning. Not all grammars are reversible,
however. In fact, Grammar 4.3 is not reversible because it uses lambda reduction. To see why,
consider an example. Say you want to generate a sentence that has the meaning
(<PAST SEES1> s1 (NAME j1 "Jill") <THE d1 (DOG1 d1)>)
Grammar 4.3 has only one S rule, and if you try to unify the SEM value in rule 1 with this
logical form it will fail. The pattern (?semvp ?semnp) would match any proposition built out of a
unary predicate and one argument, but the logical form specified is a proposition with three
arguments. The problem is that lambda reduction was used to convert the original logical form,
which was
((λ a (<PAST SEES1> s1 a <THE d1 (DOG1 d1)>)) (NAME j1 "Jill"))
There is an inverse operation to lambda reduction, called lambda abstraction, that could be used to
find a match. But the problem is that there are three possible lambda abstractions of the logical
form, namely
(λ e (<PAST SEES1> e (NAME j1 "Jill") <THE d1 (DOG1 d1)>))
(λ a (<PAST SEES1> s1 a <THE d1 (DOG1 d1)>))
(λ o (<PAST SEES1> s1 (NAME j1 "Jill") o))
There is nothing in rule 1 that indicates which is the right abstraction, but only the second will yield
an appropriate sentence.
On the other hand, the approach using features, as in Grammar 4.14, is reversible because it
retains in the features the information necessary to determine how the logical form was
constructed. It is easiest to understand how it does this by considering an example.
In many ways parsing and realization are very similar processes. Both can be viewed as
building a syntactic tree. A parser starts with the words and tries to find a tree that accounts for
them and hence determine the logical form of the sentence, while a realizer starts with a logical
form and tries to find a tree to account for it and hence determine the words to realize it. This
analogy suggests that the standard parsing algorithms might be adaptable to realization. For
instance, you could imagine using the standard top-down algorithm, where rules are selected from
the grammar based on whether they match the intended SEM structure, and the words in the
sentence are selected in a left-to-right fashion. The problem is that this approach would be
extremely inefficient. For example, consider using Grammar 4.14 to realize an S with the SEM
(<PAST SEES1> s1 (NAME j1 "Jill") <THE d1 (DOG1 d1)>)
Rule 1 matches trivially since its SEM feature has a variable as its value, ?semvp. There are no other
S rules, so this one is instantiated. The standard top-down algorithm would now attempt to
generate the two subconstituents, namely
(NP SEM ?semsubj)
(VP SUBJ ?semsubj
SEM (<PAST SEES1> s1 (NAME j1 "Jill") <THE d1 (DOG1 d1)>))
But the SEM of the NP is unconstrained. The realizer could only proceed by randomly
generating noun phrases and then attempting to realize the VP with each one and backtracking
until an appropriate one is found. Worse than that, with a more general grammar, the algorithm
may fall into an infinite loop. Clearly, this is an unworkable strategy.
One method for avoiding these problems is to expand the constituents in a different order.
For instance, choosing what term will be the subject depends on decisions made about the verb and
the structure of the verb phrase, such as whether the active or passive voice is used, and so on. So it
makes sense to expand the verb phrase first and then generate the appropriate subject. In fact, a
good general strategy is to expand the head constituents first and then fill in the others. Figure 4.16
gives a simple algorithm to do this. The realization algorithm operates on a list of constituents
much like the basic top-down parser described in Chapter 3. It continues to rewrite constituents in
this list until the list consists only of lexical constituents, at which point the words can be
generated.
Initialization: Set L to a list containing the constituent that you wish to generate.
Do until L contains no nonlexical constituents:
1. If L contains a constituent C that is marked as a nonlexical head,
2. Then use a rule in the grammar to rewrite C. Any variables in C that are bound in the rewrite
should be instantiated throughout the entire list.
3. Else choose a nonlexical constituent C, giving preference to one whose SEM feature is bound,
if one exists. Use a rule in the grammar to rewrite C. Any variables in C that are bound in the
rewrite should be instantiated throughout the entire list.
Figure 4.16 A head-driven realization algorithm
This algorithm can easily be generalized so that it searches all possible realizations based on
the grammar using a backtracking technique. The reason it works well is that, by expanding the
head constituents first, the algorithm moves quickly down to the lexical level and chooses the
words that have the most influence on the structure of the overall sentence. Once the lexical head is
chosen, most of the structure of the rest of the constituent is determined.
Consider this algorithm operating with Grammar 4.14 and the initial input
(S SEM (<PAST SEES1> s1 (NAME j1 "Jill") <THE d1 (DOG1 d1)>))
The S constituent is rewritten based on rule 1 in the grammar to produce the following constituent
list:
(NP SEM ?semsubj)
(VP SUBJ ?semsubj
SEM (<PAST SEES1> s1 (NAME j1 "Jill") <THE d1 (DOG1 d1)>))
The nonlexical head constituents are indicated in italics. Thus the VP is expanded next. Only rule 3
will match the SEM structure. As a result of the match, the following variables are bound:
?semv <- <PAST SEES1>
?v <- s1
?semsubj <- (NAME j1 "Jill")
?semnp <- <THE d1 (DOG1 d1)>
Thus you obtain the following list of constituents after rewriting the VP and instantiating the
variables throughout the list:
(NP SEM (NAME j1 "Jill"))
(V[_np] SEM <PAST SEES1>)
(NP SEM <THE d1 (DOG1 d1)>)
Since there is no nonlexical head, the algorithm now picks any nonlexical constituent with a bound
SEM, say the first NP. Only rule 5 will match, yielding the lexical constituent (NAME SEM "Jill") and
producing the constituent list
(NAME SEM "Jill")
(V[_np] SEM <PAST SEES1>)
(NP SEM <THE d1 (DOG1 d1)>)
The remaining NP is selected next. Rule 6 matches and the subconstituents (ART SEM THE) and
(CNP SEM DOG1) are generated, producing the constituent list
(NAME SEM "Jill")
(V[_np] SEM <PAST SEES1>)
(ART SEM THE)
(CNP SEM DOG1)
The CNP constituent is selected next and rewritten as a common noun with SEM DOG1, and the
algorithm is complete. The constituent list is now a sequence of lexical categories:
(NAME SEM "Jill")
(V[_np] SEM <PAST SEES1>)
(ART SEM THE)
(N SEM DOG1)
It is simple to produce the sentence Jill saw the dog from the lexical constituents.
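The whole walk-through can be compressed into a toy sketch, hand-specialized to Grammar 4.14 and this one sentence (the helper names and three-word lexicon are assumptions; the real algorithm of Figure 4.16 rewrites an explicit constituent list and backtracks over rule choices):

# Toy head-first realization of (<PAST SEES1> s1 (NAME j1 "Jill") <THE d1 (DOG1 d1)>).
LEX = {"<PAST SEES1>": "saw", "THE": "the", "DOG1": "dog"}

def realize_np(sem):
    if sem[0] == "NAME":                 # rule 5: proper names
        return [sem[2]]
    art, _var, cnp = sem                 # rules 6 and 7: <THE d1 (DOG1 d1)>
    return [LEX[art], LEX[cnp[0]]]

def realize_s(sem):
    # Expanding the head VP first (rule 3) binds ?semv, ?semsubj, and ?semnp,
    # which then fully determines the subject NP, as in the trace above.
    semv, _var, semsubj, semnp = sem
    return realize_np(semsubj) + [LEX[semv]] + realize_np(semnp)

sem = ("<PAST SEES1>", "s1", ("NAME", "j1", "Jill"),
       ("THE", "d1", ("DOG1", "d1")))
print(" ".join(realize_s(sem)))          # Jill saw the dog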
The grammar used in this example was very small, so there is only one possible realization
of this sentence. With a larger grammar, a wider range of forms would be possible. For instance, if
only the SEM feature is specified, the realization program would randomly pick between active and
passive sentences when allowed by the verb. In other cases it might pick between different
subcategorization structures. For instance, the same logical form might be realized as Jill gave
the dog to Jack or Jill gave Jack the dog. Depending on the number of word senses used and whether
different words have senses in common, the realizer may also randomly pick between various
lexical realizations of a logical form. For instance, there could be a logical form that could be
realized by the sentence Jill gave the money to the Humane Society or Jack donated the money to
the Humane Society.
Each of these variations may have different effects in context, but these distinctions are not
captured in the logical form. To force particular realizations, you would have to specify other
features in addition to the SEM feature. For instance, you might set the VOICE feature to active to
guarantee an active voice sentence.
4.8 Recent Trends in Natural Language Processing
Natural Language Processing (NLP) is one of the most exciting fields in AI and has already given rise to
technologies like chatbots, voice assistants, translators, and many other tools we use every day.
The study of natural language processing began in the 1950s, with the first attempts at automated
translation from Russian to English laying the groundwork for research in natural language processing.
Around the same time, the Turing Test – originally known as the imitation game – was developed to
determine if a machine was capable of behaving in the same way as a human being.
Since then, NLP has come a long way. And, as unstructured data grows, NLP technology continues to
evolve to better understand the nuances, context, and ambiguities of human language.
Transfer Learning
Transfer learning is a machine learning approach that involves training a model for one job and then
repurposing it for a related activity. Instead of developing and training a model from scratch, which is
costly, time-consuming, and requires a large quantity of data, you may simply fine-tune one that has
already been trained. As a result, organizations can accomplish NLP jobs faster and with less labeled
data. Transfer learning, which first gained popularity in the field of computer vision, is now being
utilized in NLP applications such as intent classification, sentiment analysis, and named entity
recognition.
Fake News and Cyberbullying Detection
NLP has become a critical tool for detecting and preventing the spread of fake news and disinformation,
saving time and effort. We've already seen several intriguing approaches to automatic fake news
identification this year, and we'll likely see more of them in 2022, with so much incorrect information
about Covid-19 circulating. Cyberbullying detection is another area where NLP is being used to make a
positive impact. On social media, classifiers are being developed to detect the usage of abusive and
derogatory language, as well as hate speech.
Monitoring Social Media Using NLP
Sentiment analysis, also known as opinion mining, will continue to play a significant role, allowing
businesses to monitor social media and gain real-time insights into how customers feel about their
brand or products. Using natural language processing (NLP) tools to assess brand sentiment can assist
businesses in identifying areas for improvement, detecting negative comments on the fly (and
responding proactively), and gaining a competitive advantage. Analysis of the impact of marketing
efforts and evaluating how consumers react to events such as a new product introduction are two more
intriguing use cases for sentiment analysis in social media monitoring.
Reinforcement Learning Training Models
Reinforcement learning is a machine learning subfield that is expected to expand in 2021.
Reinforcement algorithms, in essence, learn by doing, using a trial-and-error process that incorporates
input from past acts and experiences. Reinforcement learning may be used in natural language
processing to speed up tasks such as question answering, machine translation, and summarization.
Currently, supervised algorithms are used to train NLP models, which are subsequently fine-tuned via
reinforcement learning.
The Use of Multilingual NLP Will Increase
So far, the majority of NLP advancements have been concentrated on English. Companies like Google
and Facebook, on the other hand, are increasingly releasing pre-trained multilingual models that
perform as well as or better than monolingual models. Open-source libraries are following in the
footsteps of Google and Facebook, with recent improvements in language-agnostic sentence
embeddings, zero-shot learning, and the availability of multilingual embeddings, so we can anticipate
an increasing trend toward multilingual NLP models in the coming year, 2022.
Using a Mix of Supervised and Unsupervised Machine Learning Techniques
Combining supervised and unsupervised approaches for developing a model for NLP appears to produce
more accurate results. Supervised learning, which is often used for tasks like topic categorization,
necessitates a significant quantity of labeled data as well as several iterations before a model can
generate good predictions. There is no labeled data in unsupervised learning: the model learns from
input data and can recognize patterns and draw conclusions on unseen data on its own. Clustering is an
example of this, in which related items are clustered together. Combining supervised and unsupervised
learning has been proven to improve the performance of machine learning models, particularly for text
interpretation.
Automating Customer Service: Tagging Tickets & New Era of Chatbots
Integrating natural language processing technologies with help desk software, for example, might
automate time-consuming and laborious operations like labeling and routing customer support issues,
freeing employees to focus on higher-value work. On the front lines of customer service, chatbots will
continue to play an important role. Despite certain restrictions, they can typically have a basic
conversation and do activities for which they have been trained. With advances in natural language
processing (NLP) and rising customer service demand, we can expect to see significant progress toward
the next generation of chatbots, which will be able to self-improve, hold more complex conversations,
and possibly learn how to complete new tasks without prior training.
Chatbots And Customer Assistants Taken To Next-Level
With advances in NLP and the increasing demands in customer service, one can expect major strides
towards next-gen bots that can hold complex conversations, self-improve, and learn how to carry out
tasks that have not been previously trained on.
Due to a rise in remote working situations in 2020, there has also been a tremendous increase in
customer support tickets across industries. It has become a major task to deal with increased ticket
volume and provide quick responses to urgent queries. One can expect the integration of NLP tools with
help desk software to perform tasks such as tagging and routing of customer support requests, thereby
reserving human intervention for the higher-value tasks.
Automation In NLP
The success of automated machine learning or autoML in effectively dealing with real-world problems
has prompted researchers to develop more automation and no-code tools and platforms.
One such area is automation in natural language processing. With AutoNLP, users can build models like
sentiment analysis with just a few basic lines of code. This encourages wider participation in the
machine learning community, earlier thought to be restricted to just developers and engineers.
AutoNLP helps in automating processes such as stemming, tokenisation, and lemmatisation, among
others. It can also help in choosing the best model for a given dataset.
The Rise of Low-Code Tools
Back in the day, developing NLP models needed an extensive understanding of AI and machine learning,
coding, open-source libraries, and other topics. No longer, thanks to low-code technologies that make
life easier. While low-code or no-code tools have existed for some time, their use has been restricted to
web and software development. We may see the emergence of these technologies in the NLP area as
well in 2022. MonkeyLearn, a SaaS company, aims to democratize machine learning and natural
language processing by making them accessible to non-technical people. The company has created a
model builder that allows you to create, train, and integrate emotional analysis and text classification
models using a point-and-click interface.
NLP will Necessitate a Comprehensive Strategy
From a commercial standpoint, organizations that understand how AI will operate within a product, in
addition to the technical skills required to launch and grow an NLP project, will do better. To remain
competitive, all departments of an organization must comprehend the benefits of AI integration and
how it will impact their roles. Many initiatives fail because of a lack of holistic AI integration, in which
product managers, designers, marketing, salespeople, and others aren’t actively involved in its
adoption. In the coming year, 2022, the overall investment in knowledge, time, energy, and practice
throughout the whole organization will determine its success.
Transformers Will Lead the Way: BERT & ELMO
BERT (Bidirectional Encoder Representations from Transformers) and ELMo (Embeddings from Language
Models) will be the focus of the NLP community in 2022. These models have been trained on massive
quantities of data and can enhance the performance of a wide range of NLP issues dramatically.
Deep Learning In NLP
In the past, machine learning approaches in NLP problems have largely been based on shallow models
such as SVM and logistic regression. However, recent trends show a shift towards the use of neural
networks to produce superior results on NLP tasks.
Traditional machine learning-based NLP systems rely on hand-crafted features, which are time-
consuming to build and often incomplete. Such shortcomings are overcome by deep learning related models,
more specifically convolutional neural networks (CNN), recurrent neural networks (RNNs), and recursive
neural networks. As per a paper titled ‘Recent Trends In Deep Learning Based Natural Language
Processing’, deep learning techniques optimise NLP models on features such as named-entity
recognition, semantic role labelling, sentiment classification, machine translations, contextual
embedding, and dialogue systems.
Future innovations for deep learning in NLP can be expected mainly on two fronts: the distinction
between similarity and relatedness, and word ambiguity. In the case of the former, current deep learning
methods are still based on the degree of correlation rather than the intended meaning, which may
translate into poor practical applications; word ambiguity, on the other hand, refers to the same word
having different meanings.
Market Intelligence Monitoring
Market knowledge and information exchange between various organisations, stakeholders,
government, and regulatory bodies influence business markets. Undeniably, it is very important to stay
updated with the fast-changing market trends and standards. In this regard, natural language processing
is an important tool to track and monitor the market intelligence reports for extracting vital information
necessary for building strategies. NLP is already in use in financial markets, where it helps to
determine the market situation, employment changes, tender-related information, extracting
information from large repositories, etc.
Other applications such as reputation management, neural machine translation, talent acquisition, data
visualisation, and even process automation will have major components of NLP.
UNIT – V
Ambiguity Resolution: Statistical Methods, Probabilistic Language Processing, Estimating
Probabilities, Part-of-Speech Tagging, Obtaining Lexical Probabilities, Probabilistic Context-Free
Grammars, Best First Parsing. Semantics and Logical Form, Word senses and Ambiguity, Encoding
Ambiguity in Logical Form.
5. Ambiguity Resolution: Statistical Methods
Chapter 6 suggested employing heuristics to choose between alternate syntactic analyses. Creating
such heuristics, however, is difficult and time consuming. Furthermore, there is no systematic
method for evaluating how well the heuristic rules work in practice. This section explores some
techniques for solving these problems based on probability theory. Such approaches have become
very popular in the past few years because large databases, or corpora, of natural language data
have become available. This data allows you to use statistically based techniques for automatically
deriving the probabilities needed. The most commonly available corpus, the Brown corpus, consists
of about a million words, all labeled with their parts of speech. More recently, much larger
databases have become available in formats ranging from raw text format (as in databases of
articles from the AP news wire and the "Wall Street Journal") to corpora with full syntactic
annotation (such as the Penn Treebank). The availability of all this data suggests new analysis
techniques that were previously not possible.
5.1 Basic Probability Theory
Intuitively, the probability of some event is the likelihood that it will occur. A probability of 1
indicates that the event is certain to occur, while a probability of 0 indicates that the event
definitely will not occur. Any number between 0 and 1 indicates some degree of uncertainty. This
uncertainty can be illuminated by considering the odds of the event occurring, as you would if you
were going to bet on whether the event will occur or not. A probability of .5 would indicate that the
event is equally likely to occur as not to occur, that is, a "50/50" bet. An event with
probability .5 would occur, on average, half of the time. An event with probability .1 would occur once in
every 10 opportunities (1/10 odds), whereas an event with probability .75 would occur 75 times
out of 100 (3/4 odds).
More formally, probability can be defined in terms of a random variable, which may range
over a predefined set of values. While random variables may range over infinite sets and
continuous values, here we will use only random variables that range over a finite set of values. For
instance, consider tossing a coin. The random variable TOSS representing the result of a coin toss
would range over two possible values: heads (h) or tails (t). One possible event is that the coin
comes up heads - TOSS=h; the other is that the coin comes up tails - TOSS=t. No other value for
TOSS is possible, reflecting the fact that a tossed coin always comes up either heads or tails. Usually,
we will not mention the random variable and will talk of the probability of TOSS=h simply as the
probability of the event h.
A probability function, PROB, assigns a probability to every possible value of a random
variable. Every probability function must have the following properties, where e1,...,en are the
possible distinct values of a random variable E:
1. PROB(ei) ≥ 0, for all i
2. PROB(ei) ≤ 1, for all i
3. ∑i=1,n PROB(ei)= 1
Consider a specific example. A particular horse, Harry, ran 100 races in his career. The result of a
race is represented as a random variable R that has one of two values, Win or Lose. Say that Harry
won 20 times overall. Thus the probability of Harry winning the race is PROB(R=Win) = .2 and the
probability of him losing is
PROB(R=Lose) = .8. Note that these values satisfy the three constraints just defined.
Of course, there may be many different random variables, and we are often interested in
how they relate to each other. Continuing the racing example, consider another random variable W
representing the state of the weather and ranging over the values Rain and Shine. Let us say that it
was raining 30 times out of the 100 times a race was run. Of these 30 races, Harry won 15 times.
Intuitively, if you were given the fact that it was raining - that is, that W=Rain - the probability that
R=Win would be .5 (15 out of 30). This intuition is captured by the concept of conditional
probability and is written as PROB(Win | Rain), which is often described as the probability of the
event Win given the event Rain. Conditional probability is defined by the formula
PROB(e | e') = PROB(e & e') / PROB(e')
where PROB(e & e') is the probability of the two events e and e' occurring simultaneously.
You know that PROB( Rain) = .3 and PROB(Win & Rain) = .15, and using the definition of
conditional probability you can compute PROB(Win | Rain) and see that it agrees with your
intuition:
PROB(Win | Rain) = PROB(Win & Rain) / PROB(Rain)
= .15 / .30
= .5
An important theorem relating conditional probabilities is called Bayes' rule. This rule relates the
conditional probability of an event A given B to the conditional probability of B given A:
PROB(A | B) = (PROB(B | A) * PROB(A)) / PROB(B)
We can illustrate this rule using the race horse example. Using Bayes’ rule we can compute the
probability that it rained on a day that Harry won a race:
PROB(Rain | Win) = ( PROB(Win | Rain)*PROB(Rain)) / PROB(Win)
= (.5 * .3) / .2
= .75
which, of course, is the same value as if we calculated the conditional probability directly from its
definition:
PROB(Rain | Win) = PROB(Rain & Win) / PROB(Win)
= .15 / .20
= .75
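The arithmetic is easy to check, for instance in Python:

# Probabilities from Harry's 100 races, as given in the text.
p_win, p_rain, p_win_and_rain = 0.20, 0.30, 0.15

p_win_given_rain = p_win_and_rain / p_rain                 # 0.5
# Bayes' rule: PROB(Rain | Win) = PROB(Win | Rain) * PROB(Rain) / PROB(Win)
p_rain_given_win = p_win_given_rain * p_rain / p_win       # 0.75
print(p_win_given_rain, p_rain_given_win)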
The reason that Bayes' rule is useful is that we usually do not have complete information
about a situation and so do not know all the required probabilities. We can, however, often
estimate some probabilities reasonably and then use Bayes’ rule to calculate the others.
Another important concept in probability theory is the notion of independence. Two events
are said to be independent of each other if the occurrence of one does not affect the probability of
the occurrence of the other. For instance, in the race horse example, consider another random
variable, L, that indicates whether I took my lucky rabbit's foot to the race (value Foot) or not
(Empty). Say I took my rabbit's foot 60 times, and Harry won 12 races. This means that the
probability that Harry won the race on a day that I took the rabbit's foot is PROB(Win | Foot) =
12/60 = .2. Since this is the same as the usual probability of Harry winning, you can conclude that
winning the race is independent of taking the rabbit’s foot.
More formally, two events A and B are independent of each other if and only if
PROB(A | B) = PROB(A)
which, using the definition of conditional probability, is equivalent to saying
PROB(A & B) = PROB(A) * PROB(B)
Note that the events of winning and raining are not independent, given that PROB(Win & Rain) =
.15 while PROB(Win) * PROB(Rain) = .2 * .3 = .06. In other words, winning and raining occur
together at a rate much greater than random chance.
Consider an application of probability theory related to language, namely part-of-speech
identification: Given a sentence with ambiguous words, determine the most likely lexical category
for each word. This problem will be examined in detail in Section 5.3. Here we consider a trivial
case to illustrate the ideas underlying probability theory. Say you need to identify the correct
syntactic category for words that can be either nouns or verbs. This can be formalized using two
random variables: one, C, that ranges over the parts of speech (N and V), and another, W, that
ranges over all the possible words. Consider an example when W = "flies". The problem can be
stated as determining whether PROB(C=N | W="flies") or PROB(C=V | W="flies") is greater. Note
that, as mentioned earlier, the random variables will usually be omitted from the formulas. Thus
PROB(C=N | W="flies") will usually be abbreviated as PROB(N | "flies"). Given the definition of
conditional probability, the probabilities for the categories of the word flies are calculated as
follows:
PROB( N | flies ) = PROB(flies & N) / PROB(flies)
PROB( V | flies ) = PROB(flies & V) / PROB(flies)
So the problem reduces to finding which of PROB (flies & N) and PROB(flies & V) is greater, since
the denominator is the same in each formula.
How might you obtain these probabilities? You clearly cannot determine the true
probabilities since you don't have a record of all text ever written, let alone the text that will be
processed in the future. But if you have a large enough sample of data, you can estimate the
probabilities.
Let’s say we have a corpus of simple sentences containing 1,273,000 words. Say we find
1000 uses of the word "flies", 400 of them in the N sense, and 600 in the V sense. We can
approximate the probabilities by looking at the ratios of the frequencies of the words in the corpus.
For example, the probability of a randomly selected word being the word "flies" is
PROB(flies) ≡ 1000 / 1,273,000 = 0.0008
The joint probabilities for "flies" as a noun and "flies" as a verb are
PROB(flies & N) ≡ 400 / 1,273,000 = 0.0003
PROB(flies & V) ≡ 600 / 1,273,000 = 0.0005
Thus our best guess is that each use of "flies" is a verb. We can compute the probability that "flies" is
a verb using the formula
PROB (V |flies) = PROB(V & flies) / PROB (flies)
= .0005 / .0008 = .625
Using this method, an algorithm that always asserts "flies" to be a verb will be correct about 60
percent of the time. This is clearly a poor strategy, but better than guessing that it is a noun all the
time! To get a better method, you must consider more context, such as the sentence in which "flies"
occurs.
5.2 Estimating Probabilities
If you have all the data that would ever be relevant to a problem, you can compute exact
probabilities for that data. For instance, say Harry, the horse in the last section, ran only 100 races
and then was put out to pasture. Now you can compute the exact probability of Harry winning any
particular race someone might choose of the 100 possibilities. But, of course, this is not how
probability is generally used. Typically, you want to use probability to predict future behavior; that
is, you’d use information on Harry’s past performance to predict how likely he is to win his 101st
race (which you are going to bet on). This is a real-life application of probability theory. You are
in the same position when using probabilities to help resolve ambiguity, since you are interested in
parsing sentences that have never been seen before. Thus you need to use data on previously
occurring sentences to predict the interpretation of the next sentence. We will always be working
with estimates of probability rather than the actual probabilities.
Figure 5.1 Possible estimates of Prob(H) given two flips
As seen in the last section, one method of estimation is to use the ratios from the corpus as
the probability to predict the interpretation of the new sentence. Thus, if we have seen the word
"flies" 1000 times before, and 600 of them were as a verb, then we assume that PROB(V | "flies") is
.6 and use that to guide our guess with the 1001st case. This simple ratio estimate is called the
maximum likelihood estimator (MLE). If you have many examples of the events you are
estimating, this can be a very reliable estimate of the true probability.
In general, the accuracy of an estimate increases as the amount of data used expands, and
there is a theorem of statistics - the law of large numbers - that states that estimates can be made as
accurate as desired if you have unlimited data. Estimates can be particularly unreliable, however,
when only a small number of samples are involved. Consider trying to estimate the true probability
of a fair coin coming up heads when it is flipped. For the sake of discussion, since we know the
actual answer is .5, let us say the estimate is accurate enough if it falls between .25 and .75. This
range will be called the acceptable margin of error. If you only do two trials, there are four possible
outcomes, as shown in Figure 5.1: two heads, heads then tails, tails then heads, or two tails. This
means that, if you flip a coin twice, half the time you will obtain an estimate of 1/2 or .5. The other
half of the time, however, you will estimate that the probability of coming up heads is 1 (the two-
heads case) or 0 (the two-tails case). So you have only a 50 percent chance of obtaining an estimate
within the desired margin of error. With three flips, the chance of getting a reliable enough estimate
jumps to 75 percent, as shown in Figure 5.2. With four flips, there is an 87.5 percent chance of the
estimate being accurate enough, with eight flips a 93 percent chance, with 12 flips a 95 percent
chance, and so on. No matter how long you flip, there is always a chance that the estimate found is
inaccurate, but you can reduce the probability of this occurring to as small a number as you desire
if you can do enough trials.
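These percentages can be reproduced by summing binomial outcomes, as in this short sketch:

from math import comb

def chance_within_margin(n, lo=0.25, hi=0.75):
    # Probability that heads/n falls within [lo, hi] after n fair flips.
    return sum(comb(n, h) for h in range(n + 1) if lo <= h / n <= hi) / 2 ** n

for n in (2, 3, 4, 8):
    print(n, chance_within_margin(n))    # 0.5, 0.75, 0.875, ~0.93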
Figure 5.2 Possible estimates of Prob(H) given three flips
Almost any method of estimation works well when there is a lot of data. Unfortunately, there
are a vast number of estimates needed for natural language applications, and a large proportion of
these events are quite rare. This is the problem of sparse data. For instance, the Brown corpus
contains about a million words, but due to duplication there are only 49,000 different words. Given
this, you might expect each word to occur about 20 times on average. But over 40,000 of the words
occur five times or less. With such few numbers, our estimates of the part of speech for such words
may be highly inaccurate. The worst case occurs if a low-frequency word does not occur at all in
one of its possible categories. Its probability in this category would then be estimated as 0; thus no
interpretation using the word in any category would be possible, because the probability of the
overall sentence containing the word would be 0 as well. Unfortunately, rare events are common
enough in natural language applications that reliable estimates for these low-frequency words are
essential for the algorithms.
There are other estimation techniques that attempt to address the problem of estimating
probabilities of low frequency events. To examine them, let's introduce a framework in which they
can be compared. For a particular random variable X, all techniques start with a set of values, Vi,
computed from the count of the number of times X = xi. The maximum likelihood estimation
technique uses Vi = |xi|; that is, Vi is exactly the count of the number of times X = xi. Once Vi is
determined for each xi, the probability estimates are obtained by the formula
PROB (X = xi) ≡ Vi / ∑i Vi
Note that the denominator guarantees that the estimates obey the three properties of a probability
function defined in Section 5.1.
One technique to solve the zero probability problem is to ensure that no Vi has the value 0.
We might, for instance, add a small amount, say .5, to every count, such as in Vi = |xi| + .5. This
guarantees no zero probabilities yet retains the relative likelihoods for the frequently occurring
values. This estimation technique is called the expected likelihood estimator (ELE). To see the
difference between this and the MLE, consider a word w that happens not to occur in the corpus,
and consider estimating the probability that w occurs in one of 40 word classes L1, ..., L40.
Thus we have a random variable X, where X = xi only if w appears in word category Li. The MLE for
PROB(X = xi) will not be defined because the formula has a zero denominator. The ELE, however,
gives an equally likely probability to each possible word class. With 40 word classes, for instance,
each Vi will be .5, and thus PROB(L1 | w) ≡ .5/20 = .025. This estimate better reflects the fact that
we have no information about the word. On the other hand, the ELE is very conservative. If w
appears in the corpus five times, once as a verb and four times as a noun, then the MLE estimate of
PROB(N | w) would be .8, while the ELE estimate would be 4.5/25 = .18, a very small value
compared to intuition.
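A small sketch contrasting the two estimators on this example (the function name and class labels are assumptions made for illustration):

def estimate(counts, add=0.0):
    # MLE when add=0; ELE when add=0.5 (every count is incremented).
    vs = {c: n + add for c, n in counts.items()}
    total = sum(vs.values())
    return {c: v / total for c, v in vs.items()}

# w seen 5 times: 4 as a noun, 1 as a verb, never in the other 38 classes.
counts = {"N": 4, "V": 1, **{f"L{i}": 0 for i in range(38)}}

print(estimate(counts)["N"])             # MLE: 4/5    = 0.8
print(estimate(counts, add=0.5)["N"])    # ELE: 4.5/25 = 0.18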
Evaluation
Once you have a set of estimated probabilities and an algorithm for some particular
application, you would like to be able to tell how well your new technique performs compared with
other algorithms or variants of your algorithm. The general method for doing this is to divide the
corpus into two parts: the training set and the test set. Typically, the test set consists of 10 - 20
percent of the total data. The training set is then used to estimate the probabilities, and the
algorithm is run on the test set to see how well it does on new data. Running the algorithm on the
training set is not considered a reliable method of evaluation because it does not measure the
generality of your technique. For instance, you could do well on the training set simply by
remembering all the answers and repeating them back in the test! A more thorough method of
testing is called cross-validation, which involves repeatedly removing different parts of the corpus
as the test set, training on the remainder of the corpus, and then evaluating on the new test set.
This technique reduces the chance that the test set selected was somehow easier than you might
expect.
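A sketch of the procedure, with train and evaluate left as placeholders for whatever tagger is being assessed:

def cross_validate(corpus, train, evaluate, folds=10):
    # Hold out a different slice each round; train on the rest; average scores.
    n, scores = len(corpus), []
    for k in range(folds):
        lo, hi = k * n // folds, (k + 1) * n // folds
        test, training = corpus[lo:hi], corpus[:lo] + corpus[hi:]
        scores.append(evaluate(train(training), test))
    return sum(scores) / folds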
5.3 Part-of-Speech Tagging
Part-of-speech tagging involves selecting the most likely sequence of syntactic categories for the
words in a sentence. A typical set of tags, used in the Penn Treebank project, is shown in Figure 5.3.
In Section 5.1 you saw the simplest algorithm for this task: Always choose the interpretation that
occurs most frequently in the training set. Surprisingly, this technique often obtains about a 90
percent success rate, primarily because over half the words appearing in most corpora are not
ambiguous. So this measure is the starting point from which to evaluate algorithms that use more
sophisticated techniques. Unless a method does significantly better than 90 percent, it is not
working very well.
1. CC Coordinating conjunction        19. PP$ Possessive pronoun
2. CD Cardinal number                 20. RB Adverb
3. DT Determiner                      21. RBR Comparative adverb
4. EX Existential there               22. RBS Superlative adverb
5. FW Foreign word                    23. RP Particle
6. IN Preposition / subord. conj.     24. SYM Symbol (math or scientific)
7. JJ Adjective                       25. TO to
8. JJR Comparative adjective          26. UH Interjection
9. JJS Superlative adjective          27. VB Verb, base form
10. LS List item marker               28. VBD Verb, past tense
11. MD Modal                          29. VBG Verb, gerund/pres. participle
12. NN Noun, singular or mass         30. VBN Verb, past participle
13. NNS Noun, plural                  31. VBP Verb, non-3s, present
14. NNP Proper noun, singular         32. VBZ Verb, 3s, present
15. NNPS Proper noun, plural          33. WDT Wh-determiner
16. PDT Predeterminer                 34. WP Wh-pronoun
17. POS Possessive ending             35. WP$ Possessive wh-pronoun
18. PRP Personal pronoun              36. WRB Wh-adverb
Figure 5.3 The Penn Treebank tagset
The general method to improve reliability is to use some of the local context of the sentence in
which the word appears. For example, in Section 5.1 you saw that choosing the verb sense of "flies"
in the sample corpus was the best choice and would be right about 60 percent of the time. If the
word is preceded by the word "the", on the other hand, it is much more likely to be a noun. The
technique developed in this section is able to exploit such information.
Consider the problem in its full generality. Let w1,...,wT be a sequence of words. We want to find
the sequence of lexical categories C1,...,CT that maximizes
1. PROB (C1,...,CT | W1,...,WT )
Unfortunately, it would take far too much data to generate reasonable estimates for such
sequences, so direct methods cannot be applied. There are, however, reasonable approximation
techniques that produce good results.To develop them, you must restate the problem using Bayes’
rule, which says that this conditional probability equals
2. (PROB (C1,...,CT) * PROB (W1,...,WT | C1,...,CT)) / PROB (W1,...,WT)
As before, since we are interested in finding the C1,...,CT that gives the maximum value, the
common denominator in all these cases will not affect the answer. Thus the problem reduces to
finding the sequence C1,...,CT that maximizes the formula
3. PROB (C1,...,CT) * PROB (W1,...,WT | C1,...,CT)
There are still no effective methods for calculating the probability of these long sequences
accurately, as it would require far too much data. But the probabilities can be approximated by
probabilities that are simpler to collect by making some independence assumptions. While these
independence assumptions are not really valid, the estimates appear to work reasonably well in
practice. Each of the two expressions in formula 3 will be approximated. The first expression, the
probability of the sequence of categories, can be approximated by a series of probabilities based on
a limited number of previous categories. The most common assumptions use either one or two
previous categories. The bigram model looks at pairs of categories (or words) and uses the
conditional probability that a category Ci will follow a category Ci-1, written as PROB(Ci | Ci-1).
The trigram model uses the conditional probability of one category (or word) given the two
preceding categories (or words), that is, PROB(Ci | Ci-2 Ci-1). These models are called n-gram
models, in which n represents the number of words used in the pattern. While the trigram model
will produce better results in practice, we will consider the bigram model here for simplicity. Using
bigrams, the following approximation can be used:
PROB (C1,...,CT) ≡ ∏i=1,T PROB(Ci| Ci-1)
To account for the beginning of a sentence, we posit a pseudocategory Ø at position 0 as the value
of C0. Thus the first bigram for a sentence beginning with an ART would be PROB(ART | Ø). Given
this, the approximation of the probability of the sequence ART N V N using bigrams would be
PROB(ART N V N) ≡ PROB(ART | Ø) * PROB(N | ART) * PROB(V | N) * PROB(N | V)
The second probability in formula 3,
PROB (W1,...,WT | C1,...,CT)
can be approximated by assuming that a word appears in a category independently of the words in
the preceding or succeeding categories. It is approximated by the product of the probability that
each word occurs in the indicated part of speech, that is, by
PROB (W1,...,WT | C1,...,CT) ≡ ∏i=1,T PROB(wi| Ci)
With these two approximations, the problem has changed into finding the sequence C1,...,CT that
maximizes the value of
∏i=1,T PROB(Ci | Ci-1) * PROB(wi | Ci)
The advantage of this new formula is that the probabilities involved can be readily estimated from
a corpus of text labeled with parts of speech. In particular, given a database of text, the bigram
probabilities can be estimated simply by counting the number of times each pair of categories
occurs compared to the individual category counts. The probability that a V follows an N would be
estimated as follows:
PROB(Ci = V | Ci-1 = N) ≡ Count(N at position i-1 and V at position i) / Count(N at position i-1)

Figure 5.4 Bigram probabilities from the generated corpus


Figure 5.4 gives some bigram frequencies computed from an artificially generated corpus of
simple sentences. The corpus consists of 300 sentences but has words in only four categories: N, V,
ART, and P. In contrast, a typical real tagset used in the Penn Treebank, shown in Figure 5.3,
contains about 40 tags. The artificial corpus contains 1998 words: 833 nouns, 300 verbs, 558
articles, and 307 prepositions. Each bigram is estimated using the previous formula. To deal with
the problem of sparse data, any bigram that is not listed here will be assumed to have a token
probability of .0001.
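Collecting these counts from a tagged corpus is straightforward. A minimal Python sketch, assuming
each sentence is a list of (word, category) pairs and using 'NONE' for the pseudocategory Ø:

    from collections import Counter

    def bigram_probs(tagged_sentences):
        pair_counts = Counter()   # Count(Ci-1 at position i-1 and Ci at i)
        prev_counts = Counter()   # Count(Ci-1 at position i-1)
        for sentence in tagged_sentences:
            prev = 'NONE'         # pseudocategory for the start of the sentence
            for _, cat in sentence:
                pair_counts[(prev, cat)] += 1
                prev_counts[prev] += 1
                prev = cat
        # PROB(Ci | Ci-1) = pair count divided by predecessor count
        return {pair: n / prev_counts[pair[0]] for pair, n in pair_counts.items()}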
The lexical-generation probabilities, PROB(wi | Ci), can be estimated simply by counting
the number of occurrences of each word by category. Figure 5.5 gives some counts for individual
words from which the lexical-generation probability estimates in Figure 5.6 are computed. Note
that the lexical-generation probability is the probability that a given category is realized by a
specific word, not the probability that a given word falls in a specific category. For instance, PROB
(the | ART) is estimated by Count(# times "the" is an ART) / Count(# times an ART occurs). The
other probability, PROB(ART | the), would give a very different value.
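The distinction is easy to check against the counts in Figure 5.5. A small sketch, with the numbers
read off the figure:

    word_cat = {('flies', 'N'): 21, ('flies', 'V'): 23}
    cats = {'N': 833, 'V': 300}
    words = {'flies': 44}

    # Lexical-generation probability PROB(flies | N):
    print(word_cat[('flies', 'N')] / cats['N'])       # 21/833, about .025
    # The other direction, PROB(N | flies), is very different:
    print(word_cat[('flies', 'N')] / words['flies'])  # 21/44, about .48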
Given all these probability estimates, how might you find the sequence of categories that has
the highest probability of generating a specific sentence? The brute force method would be to
generate all possible sequences that could generate the sentence and then estimate the probability
of each and pick the best one. The problem with this is that there is an exponential number of
sequences: given N categories and T words, there are N^T possible sequences. Luckily, you can do
much better than this because of the independence assumptions that were made about the data.

N V ART P TOTAL
flies 21 23 0 0 44
fruit 49 5 1 0 55
like 10 30 0 21 61
a 1 0 201 0 202
the 1 0 300 2 303
flower 53 15 0 0 68
flowers 42 16 0 0 58
birds 64 1 0 0 65
others 592 210 56 284 1142
TOTAL 833 300 558 307 1998
Figure 5.5 A summary of some of the word counts in the corpus

PROB (the | ART) .54      PROB (a | ART) .36
PROB (flies | N) .025     PROB (a | N) .001
PROB (flies | V) .076     PROB (flower | N) .063
PROB (like | V) .1        PROB (flower | V) .05
PROB (like | P) .068      PROB (birds | N) .076
PROB (like | N) .012
Figure 5.6 The lexical-generation probabilities

Since we are only dealing with bigram probabilities, the probability that the ith word is in a
category Ci depends only on the category of the (i-1)th word, Ci-1. Thus the process can be
modeled by a special form of probabilistic finite state machine, as shown in Figure 5.7. Each node
represents a possible lexical category and the transition probabilities (the bigram probabilities in
Figure 5.4) indicate the probability of one category following another.
With such a network you can compute the probability of any sequence of categories simply
by finding the path through the network indicated by the sequence and multiplying the transition
probabilities together. For instance, the sequence ART N V N would have the probability .71 * 1 *
.43 * .35 = .107. This representation, of course, is only accurate if the probability of a category
occurring depends only on the one category before it. In probability theory this is often called the
Markov assumption, and networks like that in Figure 5.7 are called Markov chains.

Figure 5.7 A Markov chain capturing the bigram probabilities

The network representation can now be extended to include the lexical-generation


probabilities, as well. In particular, we allow each node to have an output probability, which gives a
probability to each possible output that could correspond to the node. For instance, node N in
Figure 5.7 would be associated with a probability table that indicates, for each word, how likely
that word is to be selected if we randomly select a noun. The output probabilities are exactly the
lexical-generation probabilities shown in Figure 5.6. A network like that in Figure 5.7 with output
probabilities associated with each node is called a Hidden Markov Model (HMM). The word
"hidden" in the name indicates that for a specific sequence of words, it is not clear what state the
Markov model is in. For instance, the word "flies" could be generated from state N with a
probability of .025 (given the values in Figure 5.6), or it could be generated from state V with a
probability .076. Because of this ambiguity, it is no longer trivial to compute the probability of a
sequence of words from the network. If you are given a particular sequence, however, the
probability that it generates a particular output is easily computed by multiplying the probabilities
on the path times the probabilities for each output. For instance, the probability that the sequence
N V ART N generates the output "Flies like a flower" is computed as follows. The probability of the
path N V ART N, given the Markov model in Figure 5.7, is .29 * .43 * .65 * 1 = .081. The probability of
the output being "Flies like a flower" for this sequence is computed from the output probabilities
given in Figure 5.6:
PROB(flies | N) * PROB(like | V) * PROB(a | ART) * PROB(flower | N)
= .025 * .1 * .36 * .063
= 5.4 * 10^-5
Multiplying these together gives us the likelihood that the HMM would generate the sentence, 4.37
* 10^-6. More generally, the formula for computing the probability of a sentence w1,...,wT given a
sequence C1,...,CT is
∏i=1,T PROB(Ci| Ci-1) * PROB(wi| Ci)
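The worked example can be checked directly from this formula, using the transition probabilities of
Figure 5.7 and the output probabilities of Figure 5.6:

    transitions = [.29, .43, .65, 1.0]   # PROB(N|Ø), PROB(V|N), PROB(ART|V), PROB(N|ART)
    outputs = [.025, .1, .36, .063]      # PROB(flies|N), PROB(like|V), PROB(a|ART), PROB(flower|N)

    prob = 1.0
    for t, o in zip(transitions, outputs):
        prob *= t * o
    print(prob)  # about 4.6 * 10^-6; the 4.37 * 10^-6 above reflects rounded intermediate values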
Figure 5.8 Encoding the 256 possible sequences exploiting the Markov assumption

Now we can resume the discussion of how to find the most likely sequence of categories for
a sequence of words. The key insight is that because of the Markov assumption, you do not have to
enumerate all the possible sequences. In fact, sequences that end in the same category can be
collapsed together since the next category only depends on the one previous category in the
sequence. So if you just keep track of the most likely sequence found so far for each possible ending
category, you can ignore all the other less likely sequences. For example, consider the problem of
finding the most likely categories for the sentence "Flies like a flower", with the lexical-generation
probabilities and bigram probabilities discussed so far. Given that there are four possible
categories, there are 4^4 = 256 different sequences of length four. The brute force algorithm would
have to generate all 256 sequences and compare their probabilities in order to find this one.
Exploiting the Markov assumption, however, this set of sequences can be collapsed into a
representation that considers only the four possibilities for each word. This representation, shown
as a transition diagram in Figure 5.8, represents all 256 sequences. To find the most likely
sequence, you sweep forward through the words one at a time finding the most likely sequence for
each ending category. In other words, you first find the four best sequences for the two words "flies
like": the best ending with "like" as a V, the best as an N, the best as a P, and the best as an ART. You
then use this information to find the four best sequences for the three words "flies like a", each one
ending in a different category. This process is repeated until all the words are accounted for. This
algorithm is usually called the Viterbi algorithm. For a problem involving T words and N lexical
categories, it is guaranteed to find the most likely sequence using k*T*N^2 steps, for some constant
k, significantly better than the N^T steps required by the brute force search! The rest of this
section develops the Viterbi algorithm in detail.
Figure 5.9 The Viterbi algorithm

The Viterbi Algorithm


We will track the probability of the best sequence leading to each possible category at each
position using an NxT array, where N is the number of lexical categories (L1,...,LN) and T is the
number of words in the sentence (w1,...,wT). This array, SEQSCORE(n, t), records the probability for
the best sequence up to position t that ends with a word in category Ln. To record the actual best
sequence for each category at each position, it suffices to record only the one preceding category
for each category and position. Another NxT array, BACKPTR, will indicate for each category in
each position what the preceding category is in the best sequence at position t-1. The algorithm,
shown in Figure 5.9, operates by computing the values for these two arrays.
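A direct rendering of the algorithm in Figure 5.9 as a Python sketch may make the two arrays
concrete. The dictionary-based probability tables are an assumed representation: trans[(prev, cat)]
holds PROB(cat | prev) with 'NONE' standing in for Ø, and lex[(word, cat)] holds PROB(word | cat):

    def viterbi(words, categories, trans, lex, default=0.0001):
        T = len(words)
        seqscore = [{} for _ in range(T)]  # SEQSCORE(c, t)
        backptr = [{} for _ in range(T)]   # BACKPTR(c, t)
        # Initialization step, from the pseudocategory Ø.
        for c in categories:
            seqscore[0][c] = lex.get((words[0], c), 0.0) * trans.get(('NONE', c), default)
        # Iteration step: extend the best sequence leading to each category.
        for t in range(1, T):
            for c in categories:
                best = max(categories,
                           key=lambda p: seqscore[t - 1][p] * trans.get((p, c), default))
                backptr[t][c] = best
                seqscore[t][c] = (seqscore[t - 1][best] * trans.get((best, c), default)
                                  * lex.get((words[t], c), 0.0))
        # Identify the best final category and trace the back pointers.
        last = max(categories, key=lambda c: seqscore[T - 1][c])
        tags = [last]
        for t in range(T - 1, 0, -1):
            tags.append(backptr[t][tags[-1]])
        return list(reversed(tags))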
Let's assume you have analyzed a corpus and obtained the bigram and lexical-generation
probabilities in Figures 5.4 and 5.6, and assume that any bigram probability not in Figure 5.4 has a
value of .0001. Using these probabilities, the algorithm running on the sentence "Flies like a flower"
will operate as follows.
The first row is set in the initialization phase using the formula
SEQSCORE(i, 1) = PROB(Flies | Li) * PROB(Li | Ø)
where Li ranges over V, N, ART, and P. Because of the lexical-generation probabilities in Figure 5.6,
only the entries for a noun and a verb are greater than zero.
Figure 5.10 The results of the first two steps of the Viterbi algorithm
Thus the most likely sequence of one category ending in a V (= L1) to generate "flies" has a
score of 7.6 * 10^-6, whereas the most likely one ending in an N (= L2) has a score of .00725.
The result of the first step of the algorithm is shown as the left-hand side network in Figure
5.10. The probability of "flies" in each category has been computed. The second phase of the
algorithm extends the sequences one word at a time, keeping track of the best sequence found so
far to each category. For instance, the probability of the state like/V is computed as follows:

PROB(like/V) = MAX(PROB(flies/N) * PROB(V | N), PROB(flies/V) * PROB(V | V)) * PROB(like | V)
= MAX(.00725 * .43, 7.6 * 10^-6 * .0001) * .1
= 3.12 * 10^-4
The difference in this value from that shown in Figure 5.10 is simply a result of the fact that the
calculation here used truncated approximate values for the probabilities. In other words, the most
likely sequence of length two generating "Flies like" and ending in a V has a score of 3.1 * 10^-4 (and
is the sequence N V), the most likely one ending in a P has a score of 2.2 * 10^-5 (and is the sequence
N P), and the most likely one ending in an N has a score of 1.3 * 10^-5 (and is the sequence N N). The
heavier arrows indicate the best sequence leading up to each node. The computation continues in
the same manner until each word has been processed. Figure 5.11 shows the result after the next
iteration, and Figure 5.12 shows the final result. The highest probability sequence ends in state
flower/N. It is simple to trace back from this category (using BACKPTR( 1, 4) and so on) to find the
full sequence N V ART N, agreeing with our intuitions.
Algorithms like this can perform effectively if the probability estimates are computed from a
large corpus of data that is of the same style as the input to be classified. Researchers consistently
report labeling with 95 percent or better accuracy using trigram models. Remember, however, that
the naive algorithm picks the most likely category about 90 percent of the time. Still, the error rate
is cut in half by introducing these techniques.

Figure 5.11 The result after the second iteration


Figure 5.12 The result after the third iteration

5.4 Obtaining Lexical Probabilities

Corpus-based methods suggest some new ways to control parsers. If we had some large
corpora of parsed sentences available, we could use statistical methods to identify the common
structures in English and favor these in the parsing algorithm. This might allow us to choose the
most likely interpretation when a sentence is ambiguous, and might lead to considerably more
efficient parsers that are nearly deterministic. Such corpora of parsed sentences are now becoming
available.
The first issue is what the input would be to such a parser. One simple approach would be
to use a part-of-speech tagging algorithm from the last section to select a single category for each
word and then start the parse with these categories. If the part-of-speech tagging is accurate, this
will be an excellent approach, because a considerable amount of lexical ambiguity will be
eliminated before the parser even starts. But if the tagging is wrong, it will prevent the parser from
ever finding the correct interpretation. Worse, the parser may find a valid but implausible
interpretation based on the wrongly tagged word and never realize the error. Consider that even at
95 percent accuracy, the chance that every word is correct in a sentence consisting of only 8 words
is .66, and with 15 words it is .46 - less than half. Thus the chances of this approach working in
general look slim.
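The compounding is simple to verify:

    print(0.95 ** 8)    # about .66
    print(0.95 ** 15)   # about .46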

Figure 5.13 Context-independent estimates for the lexical categories


A more appropriate approach would be to compute the probability that each word appears
in the possible lexical categories. If we could combine these probabilities with some method of
assigning probabilities to rule use in the grammar, then we could develop a parsing algorithm that
finds the most probable parse for a given sentence.
You already saw the simplest technique for estimating lexical probability by counting the
number of times each word appears in the corpus in each category. Then the probability that word
w appears in a lexical category Lj out of possible categories L1,...,LN could be estimated by the
formula
PROB(Lj | w) ≡ Count(Lj & w) / ∑i=1,N Count(Li & w)
Using the data shown in Figure 5.5, we could derive the context-independent probabilities for each
category and word shown in Figure 5.13.
As we saw earlier, however, such estimates are unreliable because they do not take context
into account. A better estimate would be obtained by computing how likely it is that category Li
occurred at position t over all sequences given the input w1,...,wT. In other words, rather than
searching for the one sequence that yields the maximum probability for the input, we want to
compute the sum of the probabilities for the input from all sequences.
For example, the probability that "flies" is a noun in the sentence "The flies like flowers"
would be calculated by summing the probability of all sequences that end with "flies" as a noun.
Given the transition and lexical generation probabilities in Figures 5.4 and 5.6, the sequences that
have a nonzero value would be
The/ART flies/N 9.58 * 10^-3
The/N flies/N 1.13 * 10^-6
The/P flies/N 4.55 * 10^-9
which adds up to 9.58 * 10^-3. Likewise, three nonzero sequences end with "flies" as a V, yielding a
total sum of 1.13 * 10^-5. Since these are the only sequences that have nonzero scores when the
second word is "flies", the sum of all these sequences will be the probability of the sequence "The
flies", namely 9.591 * 10^-3. We can now compute the probability that "flies" is a noun as follows:

PROB(flies/N | The flies) = PROB(flies/N & The flies) / PROB(The flies)
= 9.58 * 10^-3 / 9.591 * 10^-3
= .9988
Likewise, the probability that "flies" is a verb would be .0012.

Figure 5.14 The forward algorithm for computing the lexical probabilities

Of course, it would not be feasible to enumerate all possible sequences in a realistic


example. Luckily, however, the same trick used in the Viterbi algorithm can be used here. Rather
than selecting the maximum score for each node at each stage of the algorithm, we compute the
sum of all scores.
To develop this more precisely, we define the forward probability, written as αi(t), which is
the probability of producing the words w1,...,wt and ending in state wt/Li:
αi(t) = PROB (wt / Li, w1,...,wt)
For example, with the sentence "The flies like flowers", α2(3) would be the sum of values computed
for all sequences ending in V (the second category) in position 3 given the input "The flies like".
Using the definition of conditional probability, you can then derive the probability that word wt is
an instance of lexical category Li as follows:
PROB(wt/Li | w1,...,wt) = PROB(wt/Li, w1,...,wt) / PROB(w1,...,wt)
We estimate the value of PROB(w1,...,wt) by summing over all possible sequences up to any state
at position t, which is simply ∑j=1,N αj(t). In other words, we end up with

PROB(wt/Li | w1,...,wt) ≡ αi(t) / ∑j=1,N αj(t)


The first two parts of the algorithm shown in Figure 5.14 compute the forward probabilities using a
variant of the Viterbi algorithm. The last step converts the forward probabilities into lexical
probabilities for the given sentence by normalizing the values.
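The change from the Viterbi computation is a single operation: sum over predecessors rather than
take the maximum. A sketch, using the same assumed probability tables as the Viterbi code above:

    def forward_lexical_probs(words, categories, trans, lex, default=0.0001):
        T = len(words)
        alpha = [{} for _ in range(T)]   # alpha[t][c] = forward probability
        for c in categories:
            alpha[0][c] = lex.get((words[0], c), 0.0) * trans.get(('NONE', c), default)
        for t in range(1, T):
            for c in categories:
                # Sum over all predecessor categories instead of taking the max.
                alpha[t][c] = sum(alpha[t - 1][p] * trans.get((p, c), default)
                                  for p in categories) * lex.get((words[t], c), 0.0)
        # Normalize each position to get PROB(wt/Li | w1,...,wt).
        return [{c: a[c] / sum(a.values()) for c in categories} for a in alpha]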

Figure 5.15 Computing the sums of the probabilities of the sequences

Figure 5.16 Context-dependent estimates for lexical categories in the sentence The flies like flowers

Consider deriving the lexical probabilities for the sentence "The flies like flowers" using the
probability estimates in Figures 5.4 and 5.6. The algorithm in Figure 5.14 would produce the sums
shown in Figure 5.15 for each category in each position, resulting in the probability estimates
shown in Figure 5.16.
Note that while the context-independent approximation in Figure 5.13 slightly favors the
verb interpretation of "flies", the context-dependent approximation virtually eliminates it because
the training corpus had no sentences with a verb immediately following an article. These
probabilities are significantly different from the context-independent ones and much more in line
with intuition.
Note that you could also consider the backward probability, βi(t), the probability of
producing the sequence wt,...,wT beginning from state wt/Li. These values can be computed by
an algorithm similar to the forward probability algorithm but starting at the end of the sentence
and sweeping backward through the states. Thus a better method of estimating the lexical
probabilities for word wt would be to consider the entire sentence rather than just the words up to
t. In this case, the estimate would be
PROB(wt/Li | w1,...,wT) ≡ (αi(t) * βi(t)) / ∑j=1,N (αj(t) * βj(t))

5.5 Probabilistic Context-Free Grammars

Just as finite state machines could be generalized to the probabilistic case, context-free grammars
can also be generalized. To do this, we must have some statistics on rule use. The simplest
approach is to count the number of times each rule is used in a corpus containing parsed sentences
and use this to estimate the probability of each rule being used. For instance, consider a category C,
where the grammar contains m rules, R1,...,Rm, with the left-hand side C. You could estimate the
probability of using rule Rj to derive C by the formula
PROB(Rj | C) ≡ Count(# times Rj used) / ∑i=1,m Count(# times Ri used)
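Given a corpus of parsed sentences, this is again a counting exercise. A minimal sketch, assuming
each parse tree is a nested (category, child, child, ...) tuple with plain strings as words:

    from collections import Counter

    def rule_probs(trees):
        rule_counts = Counter()
        lhs_counts = Counter()
        def walk(tree):
            if isinstance(tree, str):   # a word: no rule to count
                return
            lhs, children = tree[0], tree[1:]
            rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
            for c in children:
                walk(c)
        for tree in trees:
            walk(tree)
        # PROB(Rj | C) = Count(Rj used) / total uses of rules with left-hand side C
        return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}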
Grammar 5.17 shows a probabilistic CFG with the probabilities derived from analyzing a parsed
version of the demonstration corpus.

Grammar 5.17 A simple probabilistic grammar

You can then develop algorithms similar in function to the Viterbi algorithm that, given a
sentence, will find the most likely parse tree that could have generated that sentence. The
technique involves making certain independence assumptions about rule use. In particular, you
must assume that the probability of a constituent being derived by a rule Rj is independent of how
the constituent is used as a subconstituent. For example, this assumption would imply that the
probabilities of NP rules are the same whether the NP is the subject, the object of a verb, or the
object of a preposition. We know that this assumption is not valid in most cases. For instance, noun
phrases in the subject position are much more likely to be pronouns than noun phrases not in the
subject position. But, as before, it might be that useful predictive power can be obtained using these
techniques in practice.
With this assumption, a formalism can be developed based on the probability that a
constituent C generates a sequence of words wi, wi+1,...,wj, written as wi,j. This type of probability
is called the inside probability because it assigns a probability to the word sequence inside the
constituent. It is written as
PROB(wi,j | C)

Figure 5.18 The two possible ways that a flower could be an NP

Consider how to derive inside probabilities. The case for lexical categories is simple. In fact,
these are exactly the lexical-generation probabilities derived in Section 5.3. For example,
PROB(flower | N) is the inside probability that the constituent N is realized as the word "flower",
which for our hypothetical corpus was .063, as given in Figure 5.6.
Using such lexical-generation probabilities, we can then derive the probability that the
constituent NP generates the sequence "a flower" as follows: There are only two NP rules in
Grammar 5.17 that could generate a sequence of two words. The parse trees generated by these
two rules are shown in Figure 5.18. You know the likelihood of each rule, estimated from the
corpus as shown in Grammar 5.17, so the probability that the constituent NP generates the words
"a flower" will be the sum of the probabilities of the two ways it can be derived, as follows:

This probability can then be used to compute the probability of larger constituents. For
instance, the probability of generating the words "A flower wilted" from constituent S could be
computed by summing the probabilities generated from each of the possible trees shown in Figure
5.19. Although there are three possible interpretations, the first two differ only in the derivation of
"a flower" as an NP. Both these interpretations are already included in the preceding computation
of PROB(a flower | NP). Thus the probability of "a flower wilted" is

Figure 5.19 The three possible ways to generate a flower wilted as an S

Using this method, the probability that a given sentence will be generated by the grammar
can be computed efficiently. The method only requires recording the value of each constituent
between each two possible positions.
We will not pursue this development further, however, because in parsing, you are
interested in finding the most likely parse rather than the overall probability of a given sentence. In
particular, it matters which of the two NP interpretations in Figure 5.19 was used. The probabilities
of specific parse trees can be found using a standard chart parsing algorithm, where the probability
of each constituent is computed from the probability of its subconstituents and the probability of
the rule used. Specifically, when entering an entry E of category C using a rule i with n
subconstituents corresponding to entries E1,...,En, then
PROB (E) = PROB (Rule i | C) * PROB (E1) * ... * PROB (En)
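In a chart parser this is one multiplication per completed entry. For instance, with an illustrative
rule probability of .55 for NP -> ART N and the lexical probabilities for "a" and "flower" (the rule
probability is made up for the example):

    def entry_prob(rule_prob, child_probs):
        # PROB(E) = PROB(Rule i | C) * PROB(E1) * ... * PROB(En)
        p = rule_prob
        for cp in child_probs:
            p *= cp
        return p

    print(entry_prob(0.55, [0.36, 0.063]))  # about .012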
For lexical categories it is better to use the forward probability than the lexical-generation
probability. This will produce better estimates, because it accounts for some of the context of the
sentence. You can use the standard chart parsing algorithm and add a step that computes the
probability of each entry when it is added to the chart. Using the bottom-up algorithm and the
probabilities derived from the demonstration corpus, Figure 5.20 shows the complete chart for the
input "a flower". Note that the most intuitive reading, that it is an NP, has the highest
probability by far, .54. But there are many other possible interpretations because of the low
probability readings of the word a as a noun and the reading of "flower" as a verb. Note that the
context-independent probabilities for "flower" would favor the verb interpretation, but the forward
algorithm strongly favors the noun interpretation in the context where it immediately follows the
word "a". The probabilities for the lexical constituents ART416, N417, V419, and N422 were
computed using the forward algorithm. The context-independent lexical probabilities would be
much less in line with intuition.

Figure 5.20 The full chart for a flower

Unfortunately, a parser built using these techniques turns out not to work as well as you
might expect, although it does help. Some researchers have found that these techniques identify the
correct parse about 50 percent of the time. It doesn't do better because the independence
assumptions that need to be made are too radical. Problems arise in many guises, but one critical
issue is the handling of lexical items. For instance, the context-free model assumes that the
probability of a particular verb being used in a VP rule is independent of which rule is being
considered. This means lexical preferences for certain rules cannot be handled within the basic
framework. This problem then influences many other issues, such as attachment decisions.
For example, Grammar 5.17 indicates that rule 3 is used 39 percent of the time, rule 4, 22
percent of the time, and rule 5, 24 percent of the time. This means that, independent of whatever
words are used, an input sequence of form V NP PP will always be interpreted with the PP attached
to the verb. The tree fragments are shown in Figure 5.21: Any structure that attaches the PP to the
verb will have a probability of .22 from this fragment, whereas the structure that attaches the PP to
the NP will have a probability of .39 * .25, namely .093. So the parser will always attach a PP to the
verb phrase independent of what words are used. In fact, there are 23 cases of this situation in the
corpus where the PP attaches to an NP, and the probabilistic parser gets every one of them wrong.
Figure 5.21 The two structures affecting attachment decisions
This is true even with particular verbs that rarely take an NP PP complement. Because of the
context-free assumption made about probabilities, the particular verb used has no effect on the
probability of a particular rule being used. Because of this type of problem, a parser based on this
approach with a realistic grammar is only a slight improvement over the nonprobabilistic methods.
For example, 84 sentences were selected from the corpus, each involving a PP attachment
problem. With the standard bottom-up nonprobabilistic algorithm, using Grammar 5.17, the first
complete S structure found was the correct answer on one-third of the sentences. By using
probabilistic parsing, the highest probability S structure was correct half of the time. Thus, pure
probabilistic parsing is better than guessing but leaves much to be desired. Note also that this test
was on the same sentences that were used to compute the probabilities in the first place, so
performance on new sentences not in the training corpus would tend to be even worse! Papers in
the literature that use real corpora and extensive training report accuracy results similar to those
described here.
While the pure probabilistic context-free grammars have their faults, the techniques
developed here are important and can be reused in generalizations that attempt to develop more
context-dependent probabilistic parsing schemes.

5.6 Best-First Parsing

So far, probabilistic context-free grammars have done nothing to improve the efficiency of
the parser. Algorithms can be developed that attempt to explore the high-probability constituents
first. These are called best-first parsing algorithms. The hope is that the best parse can be found
quickly and much of the search space, containing lower-rated possibilities, is never explored.

It turns out that all the chart parsing algorithms in Chapter 3 can be modified fairly easily to
consider the most likely constituents first. The central idea is to make the agenda a priority queue -
a structure where the highest-rated elements are always first in the queue. The parser then
operates by always removing the highest-ranked constituent from the agenda and adding it to the
chart.
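The agenda itself can be a few lines. A sketch using Python's heapq module (which pops the
smallest value, so scores are negated to give highest-probability-first behavior):

    import heapq

    class Agenda:
        def __init__(self):
            self._heap = []
            self._counter = 0   # tie-breaker so constituents are never compared

        def add(self, score, constituent):
            heapq.heappush(self._heap, (-score, self._counter, constituent))
            self._counter += 1

        def next(self):
            neg_score, _, constituent = heapq.heappop(self._heap)
            return -neg_score, constituent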

It might seem that this one change in search strategy is all that is needed to modify the
algorithms, but there is a complication. The previous chart parsing algorithms all depended on the
fact that the parser systematically worked from left to right, completely processing constituents
occurring earlier in the sentence before considering later ones. With the modified algorithm, this is
not the case. If the last word in the sentence has the highest score, it will be added to the chart first.
The problem this causes is that you cannot simply add active arcs to the chart (and depend on later
steps in the algorithm to extend them). In fact, the constituent needed to extend a particular active
arc may already be on the chart. Thus, whenever an active arc is added to the chart, you must check
to see if it can be extended immediately, given the current chart. Thus we need to modify the arc
extension algorithm. The algorithm is as before (in Section 3.4), except that step 2 is modified to
check for constituents that already exist on the chart. The complete algorithm is shown in Figure
5.22.
Figure 5.22 The new arc extension algorithm

Adopting a best-first strategy makes a significant improvement in the efficiency of the


parser. For instance, using Grammar 5.17 and a lexicon trained from the corpus, the sentence "The
man put a bird in the house" is parsed correctly after generating 65 constituents with the best-first
parser. The standard bottom-up algorithm generates 158 constituents on the same sentence, only
to obtain the same result. If the standard algorithm were modified to terminate when the first
complete S interpretation is found, it would still generate 106 constituents for the same
sentence. So best-first strategies can lead to significant improvements in efficiency.

Even though it does not consider every possible constituent, the best-first parser is
guaranteed to find the highest probability interpretation. To see this, assume that the parser finds
an interpretation S1 with probability p1. The important property of the probabilistic scoring is that
the probability of a constituent must always be lower than (or equal to) the probability of any of its
subconstituents. So if there were an interpretation S2 that had a score p2, higher than p1, then it
would have to be constructed out of subconstituents all with a probability of p2 or higher.
This means that all the subconstituents would have been added to the chart before S1 was. But this
means that the arc that constructs S2 would be completed, and hence S2 would be on the agenda.
Since S2 has a higher score than S1, it would be considered first.

While the idea of a best-first parser is conceptually simple, there are a few problems that
arise when trying to apply this technique in practice. One problem is that if you use a multiplicative
method to combine the scores, the scores for constituents tend to fall quickly as they cover more
and more input. This might not seem problematic, but in practice, with large grammars, the
probabilities drop off so quickly that the search closely resembles a breadth-first search: First build
all constituents of length 1, then all constituents of length 2, and so on. Thus the promise of quickly
finding the most preferred solution is not realized. To deal with this problem, some systems use a
different function to compute the score for constituents. For instance, you could use the minimum
score of any subconstituent and the rule used, that is,

Score(C) = MIN(Score(C -> C1,...,Cn), Score (C1),...,Score(Cn))

This gives a higher (or equal) score than the first approach, but a single poorly-rated
subconstituent can essentially eliminate any constituent that contains it, no matter how well all the
other constituents are rated. Unfortunately, testing the same 84 sentences using the MIN function
leads to a significant decrease in accuracy to only 39 percent, not much better than a brute force
search. But other researchers have suggested that the technique performs better than this in actual
practice. You might also try other means of combining the scores, such as taking the average score
of all subconstituents.
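The behavior of the two scoring functions is easy to contrast on made-up scores:

    def product_score(rule_score, child_scores):
        p = rule_score
        for s in child_scores:
            p *= s
        return p

    def min_score(rule_score, child_scores):
        return min([rule_score] + child_scores)

    children = [.9, .9, .05]
    print(product_score(.8, children))  # .0324 - decays as constituents grow
    print(min_score(.8, children))      # .05 - dominated by the worst subconstituent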

5.7 A Simple Context-Dependent Best-First Parser

The best-first algorithm leads to improvement in the efficiency of the parser but does not
affect the accuracy problem. This section explores a simple alternative method of
computing rule probabilities that uses more context-dependent lexical information. The
idea exploits the observation that the first word in a constituent is often the head and thus
has a dramatic effect on the probabilities of rules that account for its complement. This
suggests a new probability measure for rules that is relative to the first word, PROB(R | C,
w). This is estimated as follows:
PROB(R | C, w) = Count(# times rule R used for cat C starting with w) / Count(# times cat C starts with w)
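Collecting the word-conditioned counts is a small extension of the earlier rule counting. A sketch,
using the same assumed tuple representation of parse trees:

    from collections import Counter

    def lexicalized_rule_probs(trees):
        rule_counts = Counter()   # times rule used for cat C starting with w
        cat_counts = Counter()    # times cat C starts with w

        def first_word(tree):
            # Descend leftmost children until a word (string) is reached.
            while not isinstance(tree, str):
                tree = tree[1]
            return tree

        def walk(tree):
            if isinstance(tree, str):
                return
            lhs, children = tree[0], tree[1:]
            rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
            w = first_word(tree)
            rule_counts[(lhs, rhs, w)] += 1
            cat_counts[(lhs, w)] += 1
            for c in children:
                walk(c)

        for tree in trees:
            walk(tree)
        # PROB(R | C, w) = Count(R used for C starting with w) / Count(C starts with w)
        return {k: n / cat_counts[(k[0], k[2])] for k, n in rule_counts.items()}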

The effect of this modification is that probabilities are sensitive to the particular words used. For
instance, in the corpus, singular nouns rarely occur alone as a noun phrase (that is, starting rule NP
-> N), whereas plural nouns are rarely used as a noun modifier (that is, starting rule
NP -> N N). This can be seen in the difference between the probabilities for these two rules given
the words "house" and "peaches", shown in Figure 5.23. With the context-free probabilities, the
rule NP -> N had a probability of .14. This underestimates the rule when the input has a plural
noun, and overestimates it when the input has a singular noun.

More importantly, the context-sensitive rules encode verb preferences for different
subcategorizations. In particular, Figure 5.23 shows that the rule VP -> V NP PP is used 93 percent of
the time with the verb "put" but only 10 percent of the time with "like". This difference allows a
parser based on these probabilities to do significantly better than the context-free probabilistic
parser. In particular, on the same 84 test sentences on which the context-free probabilistic parser
had 49 percent accuracy, the context-dependent probabilistic parser has 66 percent accuracy,
getting the attachment correct on 14 sentences on which the context-free parser failed. The
context-dependent parser is also more efficient, finding the answer on the sentence "The man put
the bird in the house" generating only 36 constituents. The results of the three parsing strategies
are summarized in Figure 5.24.
To see why the context-dependent parser does better, consider the attachment decision that has to
be made between the trees shown in Figure 5.21 for the verbs "like" and "put", say in the sentences
"The man put the bird in the house" and "The man likes the bird in the house". The context-free
probabilistic parser would assign the same structure to both, getting the example with "put" right
and the example with "like" wrong. The relevant parts of the charts are shown in Figures 5.25 and
5.26. In Figure 5.25 the probability of the rule VP -> V NP PP,
starting with "put", is .93, yielding a probability of .54 (.93 * .99 * .76 * .76) for constituent VP6840.
This is far greater than the probability of .0038 given the alternative VP6828. Changing the verb to
"like", as shown in Figure 5.26, affects the probabilities enough to override the initial bias towards
the V-NP-PP interpretation. In this case, the probability of the rule VP -> V NP PP, starting with
"like", is only .1, whereas the probability of rule VP -> V NP is .9. This gives a probability of .1 to
VP6883, beating out the alternative.

The question arises as to how accurate a parser can become by using the appropriate contextual
information. While 66 percent accuracy was good compared to the alternative strategies, it still
leaves a 33 percent error rate on sentences involving a PP attachment decision. What additional
information could be added to improve the performance? Clearly, you could make the rule
probabilities relative to a larger fragment of the input, using a bigram or trigram at the beginning of
the rule. This may help the selectivity if there is enough training data. Attachment decisions depend
not only on the verb but also on the preposition in the PP. You could devise a more
complex estimate based on the previous category, the head verb, and the preposition for rule VP ->
V NP PP.
This would require more data to obtain reliable statistics but could lead to a significant increase in
reliability. But a complication arises in that you cannot compare across rules as easily now: The VP
-> V NP PP rule is evaluated using the verb and the preposition, but there is no corresponding
preposition with the rule VP -> V NP. Thus a more complicated measure would have to be devised.

In general, the more selective the lexical categories, the more predictive the estimates can be,
assuming there is enough data. Earlier we claimed that basing the statistics on words would
require too much data. But is there a subset of words that could profitably be used individually?
Certainly function words such as prepositions seem ideally suited for individual treatment. There is
a fixed number of them, and they have a significant impact on the structure of the sentence.
Similarly, other closed class words - articles, quantifiers, conjunctions, and so on – could all be
treated individually rather than grouped together as a class. It is reasonable to assume we will be
able to obtain enough data to obtain reliable estimates with these words.

The complications arise with the open class words. Verbs and nouns, for instance, play a major role
in constraining the data, but there are far too many of them to consider individually. One approach
is to model the common ones individually, as we did here. Another would be to try to cluster words
together into classes based on their similarities. This could be done by hand, based on semantic
properties. For instance, all verbs describing motion might have essentially the same behavior and
be collapsed into a single class. The other option is to use some automatic technique for learning
useful classes by analyzing corpora of sentences with attachment ambiguities.
Code No. 3386
FACULTY OF INFORMATICS
B.E. 4/4 (IT) I-Semester (Makeup) Examination, May / June
Subject: Natural Language Processing (Elective - IV)
Time: 3 hours                                                  Max. Marks: 75
Note: Answer all questions from Part-A & any five questions from Part-B.

PART A (25 Marks)
1. What are the features of a good grammar?
2. Define Lexicon. What is its significance in Natural Language Processing?
3. What is a simple noun phrase?
4. What are the main issues in semantic interpretation?
5. Explain Noun-Noun modifiers.
6. What is Case grammar?
7. Define Natural language processing. List any two applications of Natural Language Processing.
8. What is logical form? How is it helpful in sentence interpretation?
9. Draw and explain the derived tree for "the boy kicked the bucket".
10. Explain Context-Free grammars.

PART B (50 Marks)
11 a) What is Natural Language understanding? Explain with an example. (5)
b) What is a phrase? Explain prepositional phrase and complements with example. (5)
12 a) Explain Shift Reduce Parser. (5)
b) What is Look ahead parser? (5)
13 Explain semantic networks with example. What is the advantage in removing the inheritance
property in a semantic network? (10)
14 Explain the different types of logical forms. Discuss the features of logical forms. (10)
15 Discuss rule-by-rule semantic interpretation based on lambda calculus. (10)
16 Write short notes on: a) Word senses ambiguity b) Encoding ambiguity in logical forms. (10)
17 Write short notes on:
a) Problems of Machine translation
b) Draw the block schematic of Anusaraka
FACULTY OF INFORMATICS
B.E. 4/4 (I.T.) II-Semester (Main & Backlog) Examination, May
Subject: Natural Language Processing (Elective - IV)
Time: 3 Hours                                                  Max. Marks: 75
Note: Answer all questions from Part-A & any five questions from Part-B.

PART A (25 Marks)
1. Discover all of the possible meanings of the following sentences by giving a paraphrase
of each interpretation. For each sentence, identify whether the different meanings arise
from structural ambiguity, semantic ambiguity, or pragmatic ambiguity. (3)
(i) Time flies like an arrow
(ii) He drew one card
2. Give an example for Discourse ambiguity. (2)
3. Define ontology. (2)
4. List any six thematic Roles. (3)
5. Write the Bottom-up Parsing Algorithm. (3)
6. Specify the lexicon for the following: (2)
The large can hold the water
7. State why each of the following sentences is ambiguous or not. Specifically, state
whether they are ambiguous because of their possible syntactic structures, their word
senses, their semantic structures, or a combination of these factors. (3)
S1: A man stopped at every truck stop
S2: Several people ate the pizza
S3: We saw her duck
8. The process of mapping the logical form to the final knowledge representation (KR)
language is called ______. (2)
9. Say we have an acceptable margin of error of between 0.4 and 0.6 for estimating the
probability of a fair coin coming up heads. What is the chance of obtaining a reliable
estimate for the probability with five flips? (3)
10. Write the advantages of best-first parser. (2)

PART B (50 Marks)
11 (a) Draw a diagram to show the flow of information in Natural Language Understanding
Systems and explain with the following example; consider the following two sentences: (7)
S1: Visiting relatives can be trying
S2: Visiting museums can be trying
(b) Write short notes on different approaches to studying language. (3)
12 (a) Explain the interpretation of verb phrases and prepositional phrases with an example.
(b) Discuss different strategies for Machine Translation.
13 (a) Consider the following CFG:
1. S -> NP V
2. S -> NP AUX V
3. NP -> ART N
Trace one of the chart parsers in processing the sentence "The man is laughing" with the
lexicon entries:
The: ART
man: N
is: AUX
laughing: V
Show every step of the parse, drawing the chart each time a non-terminal constituent is
added to the chart.
(b) Write the Top-Down Chart Parsing Algorithm.
14 (a) Explain Semantics and Logical Form with an example.
(b) Identify the senses of the word "can" used in the following sentences, and identify
whether the sense is a term, property (unary predicate), n-ary predicate, logical
operator, generalized quantifier, predicate operator, or modal operator. Justify your
classification and discuss alternate interpretations that could work just as well. Give
an example logical form that uses each sense.
S1: The yellow can fell to the ground
S2: He can see it
S3: He wants to can the tomatoes
15 (a) Give the Penn Treebank tag set.
(b) How to obtain Lexical Probabilities? Explain with an example.
16 (a) Define the minimal set of lexicon entries for the following verbs so that, using the
morphological analysis algorithm, all standard forms of the verb are derived and no
illegal forms are inadvertently produced. Discuss any problems and assumptions you make.
Base    Present Forms    Past    Past-Participle    Present-Participle
go      go, goes         went    gone               going
sing    sing, sings      sang    sung               singing
bid     bid, bids        bid     bidden             bidding
(b) Discuss briefly the various elements of Simple Noun Phrases.
17 Write short notes on the following: (10)
(a) Estimating Probabilities
(b) Word Senses and Ambiguity
Code No. 11359
FACULTY OF ENGINEERING
B.E. 4/4 (I.T.) II-Semester (Main & Backlog) Examination, May 2019
Subject: Natural Language Processing (Elective - )
Time: 3 hours                                                  Max. Marks: 75
Note: Answer all questions from Part-A & answer any five questions from Part-B

PART A (25 Marks)
1. What is Computational Linguistics? (3)
2. State two dialog based NLP applications. (2)
3. What are Verb Phrases? Give two examples. (3)
4. What is a Lexical Category? (2)
5. Describe the purpose of building an Augmented Transition Network. (3)
6. Give a simple Grammar and Structure for the sentence: Bob wrote the letter. (2)
7. How do you represent Context-Independent Meaning? (3)
8. What are Thematic Roles? (2)
9. Why do we need to tag Parts of Speech? (2)
10. How do you calculate Lexical Probability? (3)

PART B (50 Marks)
11. a) Describe Natural Language Understanding System. (5)
b) Explain Semantic Processing with a suitable example. (5)
12. a) Obtain the Logical Form for the sentence "the rose is red", and develop the
Contextual Interpretation. (5)
b) What is Ambiguity? How do you encode it? (5)
13. a) Illustrate Depth-First Top-Down Parsing. (5)
b) Discuss issues in the design of a Lexicon. (5)
14. a) Discuss co-occurrence constraints that arise between word senses. (5)
b) Discuss linking Syntax and Semantics. (5)
15. a) Present a Bigram and explain some typical computations involved therein. (5)
b) Explain the use of the following assumption: the probability of a category occurring
depends only on the category before it. (5)
16. a) Discuss Semantic Interpretation and Compositionality. (5)
b) Discuss how you resolve ambiguity for a case of your choice. (5)
17. Write notes on:
i) Statistical Methods. (5)
ii) Applications of NLP. (5)
