0% found this document useful (0 votes)
90 views99 pages

CS 388: Natural Language Processing: Syntactic Parsing: Raymond J. Mooney

CS 388 is a course on natural language processing that covers syntactic parsing. Syntactic parsing involves finding phrases like noun phrases and verb phrases in sentences, and producing a parse tree that shows the syntactic structure of a sentence according to a context-free grammar. There are two main approaches to parsing: top-down parsing which starts with the start symbol and tries to derive the sentence; and bottom-up parsing which starts with the words in the sentence and tries to combine them into larger phrases. Dynamic programming techniques can be used to cache intermediate results and obtain a polynomial time parsing algorithm.

Uploaded by

Booming System
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
90 views99 pages

CS 388: Natural Language Processing: Syntactic Parsing: Raymond J. Mooney

CS 388 is a course on natural language processing that covers syntactic parsing. Syntactic parsing involves finding phrases like noun phrases and verb phrases in sentences, and producing a parse tree that shows the syntactic structure of a sentence according to a context-free grammar. There are two main approaches to parsing: top-down parsing which starts with the start symbol and tries to derive the sentence; and bottom-up parsing which starts with the words in the sentence and tries to combine them into larger phrases. Dynamic programming techniques can be used to cache intermediate results and obtain a polynomial time parsing algorithm.

Uploaded by

Booming System
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 99

CS 388:

Natural Language Processing:


Syntactic Parsing

Raymond J. Mooney
University of Texas at Austin
1
Phrase Chunking
• Find all non-recursive noun phrases (NPs)
and verb phrases (VPs) in a sentence.
– [NP I] [VP ate] [NP the spaghetti] [PP with]
[NP meatballs].
– [NP He ] [VP reckons ] [NP the current account
deficit ] [VP will narrow ] [PP to ] [NP only #
1.8 billion ] [PP in ] [NP September ]
Phrase Chunking as Sequence Labeling
• Tag individual words with one of 3 tags
– B (Begin) word starts new target phrase
– I (Inside) word is part of target phrase but not
the first word
– O (Other) word is not part of target phrase
• Sample for NP chunking
– He reckons the current account deficit will
narrow to only # 1.8 billion in September.
Begin Inside Other

3
Syntactic Parsing
• Produce the correct syntactic parse tree for
a sentence.
Context Free Grammars (CFG)
• N a set of non-terminal symbols (or variables)
  a set of terminal symbols (disjoint from N)
• R a set of productions or rules of the form
A→, where A is a non-terminal and  is a
string of symbols from ( N)*
• S, a designated non-terminal called the start
symbol
Simple CFG for ATIS English
Grammar Lexicon
S → NP VP Det → the | a | that | this
S → Aux NP VP Noun → book | flight | meal | money
S → VP Verb → book | include | prefer
NP → Pronoun Pronoun → I | he | she | me
NP → Proper-Noun Proper-Noun → Houston | NWA
NP → Det Nominal Aux → does
Nominal → Noun Prep → from | to | on | near | through
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → VP PP
PP → Prep NP
Sentence Generation
• Sentences are generated by recursively rewriting
the start symbol using the productions until only
terminals symbols remain.
S

VP Derivation
or
Verb NP Parse Tree
book Det Nominal

the Nominal PP

Noun Prep NP
flight through Proper-Noun
Houston
Parsing
• Given a string of terminals and a CFG, determine
if the string can be generated by the CFG.
– Also return a parse tree for the string
– Also return all possible parse trees for the string
• Must search space of derivations for one that
derives the given string.
– Top-Down Parsing: Start searching space of
derivations for the start symbol.
– Bottom-up Parsing: Start search space of reverse
derivations from the terminal symbols in the string.
Parsing Example

VP

Verb NP
book that flight
book Det Nominal

that Noun

flight
Top Down Parsing

NP VP

Pronoun
Top Down Parsing

NP VP

Pronoun

X
book
Top Down Parsing

NP VP

ProperNoun
Top Down Parsing

NP VP

ProperNoun

X
book
Top Down Parsing

NP VP

Det Nominal
Top Down Parsing

NP VP

Det Nominal
X
book
Top Down Parsing

Aux NP VP
Top Down Parsing

Aux NP VP
X
book
Top Down Parsing

VP
Top Down Parsing

VP

Verb
Top Down Parsing

VP

Verb

book
Top Down Parsing

VP

Verb

X
book that
Top Down Parsing

VP

Verb NP
Top Down Parsing

VP

Verb NP

book
Top Down Parsing

VP

Verb NP

book Pronoun
Top Down Parsing

VP

Verb NP

book Pronoun
X
that
Top Down Parsing

VP

Verb NP

book ProperNoun
Top Down Parsing

VP

Verb NP

book ProperNoun
X
that
Top Down Parsing

VP

Verb NP

book Det Nominal


Top Down Parsing

VP

Verb NP

book Det Nominal

that
Top Down Parsing

VP

Verb NP

book Det Nominal

that Noun
Top Down Parsing

VP

Verb NP

book Det Nominal

that Noun

flight
Bottom Up Parsing

book that flight

32
Bottom Up Parsing

Noun

book that flight

33
Bottom Up Parsing

Nominal

Noun

book that flight

34
Bottom Up Parsing

Nominal

Nominal Noun

Noun

book that flight

35
Bottom Up Parsing

Nominal

Nominal Noun

X
Noun

book that flight

36
Bottom Up Parsing

Nominal

Nominal PP

Noun

book that flight

37
Bottom Up Parsing

Nominal

Nominal PP

Noun Det

book that flight

38
Bottom Up Parsing

Nominal

Nominal PP
NP
Noun Det Nominal

book that flight

39
Bottom Up Parsing

Nominal

Nominal PP
NP
Noun Det Nominal

book that Noun

flight
40
Bottom Up Parsing

Nominal

Nominal PP
NP
Noun Det Nominal

book that Noun

flight
41
Bottom Up Parsing

Nominal

Nominal PP S

NP VP
Noun Det Nominal

book that Noun

flight
42
Bottom Up Parsing

Nominal

Nominal PP S

NP VP
Noun Det Nominal X
book that Noun

flight
43
Bottom Up Parsing

Nominal

Nominal PP
X
NP
Noun Det Nominal

book that Noun

flight
44
Bottom Up Parsing

NP
Verb Det Nominal

book that Noun

flight
45
Bottom Up Parsing

VP
NP
Verb Det Nominal

book that Noun

flight
46
Bottom Up Parsing

VP
NP
Verb Det Nominal

book that Noun

flight
47
Bottom Up Parsing

VP X
NP
Verb Det Nominal

book that Noun

flight
48
Bottom Up Parsing

VP

VP PP
NP
Verb Det Nominal

book that Noun

flight
49
Bottom Up Parsing

VP

VP PP
X NP
Verb Det Nominal

book that Noun

flight
50
Bottom Up Parsing

VP
NP
NP
Verb Det Nominal

book that Noun

flight
51
Bottom Up Parsing

VP
NP
Verb Det Nominal

book that Noun

flight
52
Bottom Up Parsing

VP
NP
Verb Det Nominal

book that Noun

flight
53
Contoh
• Bagaimana proses dan hasil parsing kalimat
berikut: “He book a flight to Houstan”
dengan menggunakan:
– Top down parsing
– Bottom up parsing

54
Top Down vs. Bottom Up
• Top down never explores options that will not lead
to a full parse, but can explore many options that
never connect to the actual sentence.
• Bottom up never explores options that do not
connect to the actual sentence but can explore
options that can never lead to a full parse.
• Relative amounts of wasted search depend on how
much the grammar branches in each direction.

55
Dynamic Programming Parsing
• To avoid extensive repeated work, must cache
intermediate results, i.e. completed phrases.
• Caching (memoizing) critical to obtaining a
polynomial time parsing (recognition)
algorithm for CFGs.
• Dynamic programming algorithms based on
both top-down and bottom-up search can
achieve O(n3) recognition time where n is the
length of the input string.

56
Dynamic Programming Parsing Methods
• CKY (Cocke-Kasami-Younger) algorithm
based on bottom-up parsing and requires
first normalizing the grammar.
• Earley parser is based on top-down
parsing and does not require normalizing
grammar but is more complex.
• More generally, chart parsers retain
completed phrases in a chart and can
combine top-down and bottom-up search.
57
CKY
• First grammar must be converted to
Chomsky normal form (CNF) in which
productions must have either exactly 2 non-
terminal symbols on the RHS or 1 terminal
symbol (lexicon rules).
• Parse bottom-up storing phrases formed
from all substrings in a triangular table
(chart).

58
Simple CFG for ATIS English
Grammar Lexicon
S → NP VP Det → the | a | that | this
S → Aux NP VP Noun → book | flight | meal | money
S → VP Verb → book | include | prefer
NP → Pronoun Pronoun → I | he | she | me
NP → Proper-Noun Proper-Noun → Houston | NWA
NP → Det Nominal Aux → does
Nominal → Noun Prep → from | to | on | near | through
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → VP PP
PP → Prep NP
ATIS English Grammar Conversion
Original Grammar Chomsky Normal Form Lexicon
S → NP VP S → NP VP Det → the | a | that | this
S → Aux NP VP S → X1 VP
X1 → Aux NP Noun → book | flight |
S → VP S → book | include | prefer meal | money
S → Verb NP
S → VP PP Verb → book | include |
NP → Pronoun NP → I | he | she | me prefer
NP → Proper-Noun NP → Houston | NWA
NP → Det Nominal NP → Det Nominal Pronoun → I | he | she | me
Nominal → Noun Nominal → book | flight | meal | money
Nominal → Nominal Noun Nominal → Nominal Noun Proper-Noun → Houston |
Nominal → Nominal PP Nominal → Nominal PP NWA
VP → Verb VP → book | include | prefer
VP → Verb NP VP → Verb NP Aux → does
VP → VP PP VP → VP PP
PP → Prep NP PP → Prep NP Prep → from | to | on |
near | through
CKY Parser
Book the flight through Houston
j= 1 2 3 4 5
i=
0 Cell[i,j]
contains all
constituents
1
(non-terminals)
covering words
2 i +1 through j

4
61
CKY Parser

Book the flight through Houston

S, VP, Verb,
Nominal,
Noun None

NP
Det

Nominal,
Noun

62
CKY Parser

Book the flight through Houston

S, VP, Verb,
Nominal, VP
Noun None

NP
Det

Nominal,
Noun

63
CKY Parser

Book the flight through Houston

S, VP, Verb, S
Nominal, VP
Noun None

NP
Det

Nominal,
Noun

64
CKY Parser

Book the flight through Houston

S, VP, Verb, S
Nominal, VP
Noun None

NP
Det

Nominal,
Noun

65
CKY Parser

Book the flight through Houston

S, VP, Verb, S
Nominal, VP
Noun None None

NP
Det None

Nominal,
Noun None

Prep

66
CKY Parser

Book the flight through Houston

S, VP, Verb, S
Nominal, VP
Noun None None

NP
Det None

Nominal,
Noun None

Prep PP

NP
ProperNoun

67
CKY Parser

Book the flight through Houston

S, VP, Verb, S
Nominal, VP
Noun None None

NP
Det None

Nominal,
Noun Nominal
None

Prep PP

NP
ProperNoun

68
CKY Parser

Book the flight through Houston

S, VP, Verb, S
Nominal, VP
Noun None None

NP NP
Det None

Nominal,
Noun Nominal
None

Prep PP

NP
ProperNoun

69
CKY Parser

Book the flight through Houston

S, VP, Verb, S
Nominal, VP
Noun None None
VP

NP NP
Det None

Nominal,
Noun Nominal
None

Prep PP

NP
ProperNoun

70
CKY Parser

Book the flight through Houston

S, VP, Verb, S
Nominal, VP
Noun None S
None
VP

NP NP
Det None

Nominal,
Noun Nominal
None

Prep PP

NP
ProperNoun

71
CKY Parser

Book the flight through Houston

S, VP, Verb, S
Nominal, VP VP
Noun None S
None
VP

NP NP
Det None

Nominal,
Noun Nominal
None

Prep PP

NP
ProperNoun

72
CKY Parser

Book the flight through Houston

S, VP, Verb, S S
Nominal, VP VP
Noun None S
None
VP

NP NP
Det None

Nominal,
Noun Nominal
None

Prep PP

NP
ProperNoun

73
CKY Parser

Book the flight through Houston

S, VP, Verb, S S
Parse
Nominal, VP VP Tree
Noun None S
None
VP #1

NP NP
Det None

Nominal,
Noun Nominal
None

Prep PP

NP
ProperNoun

74
CKY Parser

Book the flight through Houston

S, VP, Verb, S S
Parse
Nominal, VP VP Tree
Noun None S
None
VP #2

NP NP
Det None

Nominal,
Noun Nominal
None

Prep PP

NP
ProperNoun

75
Complexity of CKY (recognition)
• There are (n(n+1)/2) = O(n2) cells
• Filling each cell requires looking at every
possible split point between the two non-
terminals needed to introduce a new phrase.
• There are O(n) possible split points.
• Total time complexity is O(n3)

76
Complexity of CKY (all parses)
• Previous analysis assumes the number of
phrase labels in each cell is fixed by the size
of the grammar.
• If compute all derivations for each non-
terminal, the number of cell entries can
expand combinatorially.
• Since the number of parses can be
exponential, so is the complexity of finding
all parse trees.
77
Syntactic Ambiguity
• Just produces all possible parse trees.
• Does not address the important issue of
ambiguity resolution.

79
Conclusions
• Syntax parse trees specify the syntactic structure
of a sentence that helps determine its meaning.
– John ate the spaghetti with meatballs with chopsticks.
– How did John eat the spaghetti?
What did John eat?
• CFGs can be used to define the grammar of a
natural language.
• Dynamic programming algorithms allow
computing a single parse tree in cubic time or all
parse trees in exponential time.

84
CS 388:
Natural Language Processing:
Statistical Parsing

Raymond J. Mooney
University of Texas at Austin
85
Statistical Parsing
• Statistical parsing uses a probabilistic model of
syntax in order to assign probabilities to each parse
tree.
• Provides principled approach to resolving syntactic
ambiguity.
• Allows supervised learning of parsers from tree-
banks of parse trees provided by human linguists.
• Also allows unsupervised learning of parsers from
unannotated text, but the accuracy of such parsers
has been limited.

86
Probabilistic Context Free Grammar
(PCFG)
• A PCFG is a probabilistic version of a CFG
where each production has a probability.
• Probabilities of all productions rewriting a
given non-terminal must add to 1, defining
a distribution for each non-terminal.
• String generation is now probabilistic where
production probabilities are used to non-
deterministically select a production for
rewriting a given non-terminal.
87
Simple PCFG for ATIS English
Grammar Prob Lexicon
S → NP VP 0.8 Det → the | a | that | this
S → Aux NP VP 0.1 + 1.0
0.6 0.2 0.1 0.1
S → VP 0.1 Noun → book | flight | meal | money
NP → Pronoun 0.2 0.1 0.5 0.2 0.2
NP → Proper-Noun 0.2 + 1.0 Verb → book | include | prefer
NP → Det Nominal 0.6 0.5 0.2 0.3
Nominal → Noun 0.3 Pronoun → I | he | she | me
Nominal → Nominal 0.2 + 1.0 0.5 0.1 0.1 0.3
Noun 0.5 Proper-Noun → Houston | NWA
Nominal → Nominal PP 0.2 0.8 0.2
VP → Verb 0.5 + 1.0 Aux → does
VP → Verb NP 0.3 1.0
VP → VP PP 1.0 Prep → from | to | on | near | through
PP → Prep NP 0.25 0.25 0.1 0.2 0.2
Sentence Probability
• Assume productions for each node are chosen
independently.
• Probability of derivation is the product of the
probabilities of its productions.
P(D1) = 0.1 x 0.5 x 0.5 x 0.6 x 0.6 x S
0.1 D1
0.5 x 0.3 x 1.0 x 0.2 x 0.2 x VP 0.5
0.5 x 0.8 Verb NP 0.6
0.5
=
0.0000216 book Det Nominal 0.5
0.6
the Nominal PP 1.0
0.3
Noun Prep NP 0.2
0.2
0.5flight throughProper-Noun
0.8
Houston
89
Syntactic Disambiguation
• Resolve ambiguity by picking most probable parse
tree.
S
P(D2) = 0.1 x 0.3 x 0.5 x 0.6 x 0.5 x 0.1 D2
0.6 x 0.3 x 1.0 x 0.5 x 0.2 x VP
0.3
VP 0.5
0.2 x 0.8
=
0.00001296 Verb NP0.6
0.5 PP
book Det Nominal 1.0
0.6 0.3
the Noun 0.2Prep NP 0.2
0.5flight
throughProper-Noun
0.8
Houston

90
Sentence Probability
• Probability of a sentence is the sum of the
probabilities of all of its derivations.
P(“book the flight through Houston”) =
P(D1) + P(D2) = 0.0000216 + 0.00001296
= 0.00003456

91
Contoh2
• Dari dua kalimat berikut, manakah yang
mempunyai nilai ambiguitas lebih sedikit:
– “Does the meal from Houstan”
– “Does the flight to Housan”

92
Probabilistic CKY
• CKY can be modified for PCFG parsing by
including in each cell a probability for each
non-terminal.
• Cell[i,j] must retain the most probable
derivation of each constituent (non-
terminal) covering words i +1 through j
together with its associated probability.
• When transforming the grammar to CNF,
must set production probabilities to
preserve the probability of derivations.
Probabilistic Grammar Conversion
Original Grammar Chomsky Normal Form
S → NP VP 0.8 S → NP VP 0.8
S → Aux NP VP 0.1 S → X1 VP 0.1
X1 → Aux NP 1.0
S → VP 0.1 S → book | include | prefer
0.01 0.004 0.006
S → Verb NP 0.05
S → VP PP 0.03
NP → Pronoun 0.2 NP → I | he | she | me
0.1 0.02 0.02 0.06
NP → Proper-Noun 0.2 NP → Houston | NWA
0.16 .04
NP → Det Nominal 0.6 NP → Det Nominal 0.6
Nominal → Noun 0.3 Nominal → book | flight | meal | money
0.03 0.15 0.06 0.06
Nominal → Nominal Noun 0.2 Nominal → Nominal Noun 0.2
Nominal → Nominal PP 0.5 Nominal → Nominal PP 0.5
VP → Verb 0.2 VP → book | include | prefer
0.1 0.04 0.06
VP → Verb NP 0.5 VP → Verb NP 0.5
VP → VP PP 0.3 VP → VP PP 0.3
PP → Prep NP 1.0 PP → Prep NP 1.0
Probabilistic CKY Parser

Book the flight through Houston


S :.01, VP:.1,
Verb:.5
Nominal:.03
Noun:.1 None

NP:.6*.6*.15
Det:.6 =.054

Nominal:.15
Noun:.5

96
Probabilistic CKY Parser

Book the flight through Houston


S :.01, VP:.1,
Verb:.5
Nominal:.03 VP:.5*.5*.054
Noun:.1 None =.0135

NP:.6*.6*.15
Det:.6 =.054

Nominal:.15
Noun:.5

97
Probabilistic CKY Parser

Book the flight through Houston


S :.01, VP:.1, S:.05*.5*.054
Verb:.5 =.00135
Nominal:.03
None VP:.5*.5*.054
Noun:.1
=.0135

NP:.6*.6*.15
Det:.6 =.054

Nominal:.15
Noun:.5

98
Probabilistic CKY Parser

Book the flight through Houston


S :.01, VP:.1, S:.05*.5*.054
Verb:.5 =.00135
Nominal:.03
None VP:.5*.5*.054 None
Noun:.1
=.0135

NP:.6*.6*.15
Det:.6 =.054 None

Nominal:.15
Noun:.5 None

Prep:.2

99
Probabilistic CKY Parser

Book the flight through Houston


S :.01, VP:.1, S:.05*.5*.054
Verb:.5 =.00135
Nominal:.03
None VP:.5*.5*.054 None
Noun:.1
=.0135

NP:.6*.6*.15
Det:.6 =.054 None

Nominal:.15
Noun:.5 None

PP:1.0*.2*.16
Prep:.2
=.032

NP:.16
PropNoun:.8

100
Probabilistic CKY Parser

Book the flight through Houston


S :.01, VP:.1, S:.05*.5*.054
Verb:.5 =.00135
Nominal:.03
None VP:.5*.5*.054 None
Noun:.1
=.0135

NP:.6*.6*.15
Det:.6 =.054 None

Nominal:
Nominal:.15 .5*.15*.032
Noun:.5 None =.0024

PP:1.0*.2*.16
Prep:.2
=.032

NP:.16
PropNoun:.8

101
Probabilistic CKY Parser

Book the flight through Houston


S :.01, VP:.1, S:.05*.5*.054
Verb:.5 =.00135
Nominal:.03
None VP:.5*.5*.054 None
Noun:.1
=.0135
NP:.6*.6*
NP:.6*.6*.15 .0024
Det:.6 =.054 None =.000864

Nominal:
Nominal:.15 .5*.15*.032
Noun:.5 None =.0024

PP:1.0*.2*.16
Prep:.2
=.032

NP:.16
PropNoun:.8

102
Probabilistic CKY Parser

Book the flight through Houston


S :.01, VP:.1, S:.05*.5*.054
Verb:.5 =.00135 S:.05*.5*
Nominal:.03 .000864
None VP:.5*.5*.054 None
Noun:.1 =.0000216
=.0135
NP:.6*.6*
NP:.6*.6*.15 .0024
Det:.6 =.054 None =.000864

Nominal:
Nominal:.15 .5*.15*.032
Noun:.5 None =.0024

PP:1.0*.2*.16
Prep:.2
=.032

NP:.16
PropNoun:.8

103
Probabilistic CKY Parser

Book the flight through Houston


S :.01, VP:.1, S:.05*.5*.054 S:.03*.0135*
Verb:.5 =.00135 .032
Nominal:.03 =.00001296
None VP:.5*.5*.054 None
Noun:.1 S:.0000216
=.0135
NP:.6*.6*
NP:.6*.6*.15 .0024
Det:.6 =.054 None =.000864

Nominal:
Nominal:.15 .5*.15*.032
Noun:.5 None =.0024

PP:1.0*.2*.16
Prep:.2
=.032

NP:.16
PropNoun:.8

104
Probabilistic CKY Parser

Book the flight through Houston


S :.01, VP:.1, S:.05*.5*.054
Verb:.5 =.00135 S:.0000216
Pick most probable
Nominal:.03
None VP:.5*.5*.054 None parse, i.e. take max to
Noun:.1
=.0135 combine probabilities
NP:.6*.6*
.0024
of multiple derivations
NP:.6*.6*.15
Det:.6 =.054 None =.000864 of each constituent in
each cell.
Nominal:
Nominal:.15 .5*.15*.032
Noun:.5 None =.0024

PP:1.0*.2*.16
Prep:.2
=.032

NP:.16
PropNoun:.8

105

You might also like