NLP Programming Tutorial 5: Part of Speech Tagging with Hidden Markov Models
Graham Neubig
Nara Institute of Science and Technology (NAIST)
Part of Speech (POS) Tagging
Given a sentence X, predict its part of speech sequence Y
Natural/JJ language/NN processing/NN (/-LRB- NLP/NN )/-RRB- is/VBZ a/DT field/NN of/IN computer/NN science/NN
A type of structured prediction, from two weeks ago
How can we do this? Any ideas?
Many Answers!
Pointwise prediction: predict each word individually with a classifier (e.g. perceptron, tool: KyTea)
e.g. for "Natural language processing ( NLP ) is a field of computer science", one classifier decides processing = NN? VBG? JJ?, another decides computer = NN? VBG? JJ?
Generative sequence models: today's topic! (e.g. Hidden Markov Model, tool: ChaSen)
Discriminative sequence models: predict the whole sequence with a classifier (e.g. CRF, structured perceptron, tools: MeCab, Stanford Tagger)
Probabilistic Model for Tagging
Find the most probable tag sequence, given the sentence
Natural/JJ language/NN processing/NN (/LRB NLP/NN )/RRB is/VBZ a/DT field/NN of/IN computer/NN science/NN
argmax_Y P(Y|X)
Any ideas?
Generative Sequence Model
First decompose probability using Bayes' law
argmax_Y P(Y|X) = argmax_Y P(X|Y) P(Y) / P(X)
                = argmax_Y P(X|Y) P(Y)
P(X|Y): model of word/POS interactions ("natural" is probably a JJ)
P(Y): model of POS/POS interactions (NN comes after DET)
Also sometimes called the noisy-channel model
Hidden Markov Models
Hidden Markov Models (HMMs) for POS Tagging
POS→POS transition probabilities (like a bigram model!):
    P(Y) ≈ ∏_{i=1..I+1} PT(y_i | y_{i-1})
POS→Word emission probabilities:
    P(X|Y) ≈ ∏_{i=1..I} PE(x_i | y_i)
Example, for the tag sequence <s> JJ NN NN LRB NN RRB ... </s> over "natural language processing ( nlp ) ...":
    P(Y)   includes PT(JJ|<s>) * PT(NN|JJ) * PT(NN|NN) * ...
    P(X|Y) includes PE(natural|JJ) * PE(language|NN) * PE(processing|NN) * ...
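As a concrete reference, the probability of one tagged sentence under this decomposition can be scored with the following minimal Python sketch (the dictionaries pt and pe, keyed by "prev tag" and "tag word" strings, are assumptions for illustration, not part of the slides):

    import math

    def hmm_log_prob(words, tags, pt, pe):
        # log P(X,Y) = sum of log PT(y_i|y_{i-1}) + log PE(x_i|y_i), plus the </s> transition
        logp = 0.0
        previous = "<s>"
        for word, tag in zip(words, tags):
            logp += math.log(pt[previous + " " + tag])  # transition PT(tag | previous)
            logp += math.log(pe[tag + " " + word])      # emission PE(word | tag)
            previous = tag
        logp += math.log(pt[previous + " </s>"])        # sentence-final transition
        return logp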
Learning Markov Models (with tags)
Count the number of occurrences in the corpus. For example, for the tagged sentence natural/JJ language/NN processing/NN (/LRB nlp/NN )/RRB is/VB:
    transition counts: c(<s> JJ)++, c(JJ NN)++, ...
    emission counts: c(JJ → natural)++, c(NN → language)++, ...
Divide by context to get probability:
    PT(LRB|NN) = c(NN LRB)/c(NN) = 1/3
    PE(language|NN) = c(NN → language)/c(NN) = 1/3
Training Algorithm
# Input data format is natural_JJ language_NN ...
make a map emit, transition, context
for each line in file
    previous = <s>                          # make the sentence start
    context[previous]++
    split line into wordtags with " "
    for each wordtag in wordtags
        split wordtag into word, tag with "_"
        transition[previous+" "+tag]++      # count the transition
        context[tag]++                      # count the context
        emit[tag+" "+word]++                # count the emission
        previous = tag
    transition[previous+" </s>"]++
# Print the transition probabilities
for each key, value in transition
    split key into previous, word with " "
    print "T", key, value/context[previous]
# Do the same thing for the emission probabilities, with "E" instead of "T"
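A minimal Python sketch of this training procedure, reading the training file from the command line and printing the model to standard output (the exact I/O layout is an assumption; the counting follows the pseudocode above):

    from collections import defaultdict
    import sys

    emit = defaultdict(int)
    transition = defaultdict(int)
    context = defaultdict(int)

    for line in open(sys.argv[1]):
        previous = "<s>"
        context[previous] += 1
        for wordtag in line.strip().split(" "):
            word, tag = wordtag.rsplit("_", 1)
            transition[previous + " " + tag] += 1   # count the transition
            context[tag] += 1                        # count the context
            emit[tag + " " + word] += 1              # count the emission
            previous = tag
        transition[previous + " </s>"] += 1

    for key, value in transition.items():            # print transition probabilities
        previous = key.split(" ")[0]
        print("T", key, value / context[previous])
    for key, value in emit.items():                  # print emission probabilities
        tag = key.split(" ")[0]
        print("E", key, value / context[tag])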
Note: Smoothing
In the bigram language model, we smoothed the probabilities:
    PLM(wi|wi-1) = λ PML(wi|wi-1) + (1-λ) PLM(wi)
HMM transition probabilities: there are not many tags, so smoothing is not necessary:
    PT(yi|yi-1) = PML(yi|yi-1)
HMM emission probabilities: smooth for unknown words:
    PE(xi|yi) = λ PML(xi|yi) + (1-λ) 1/N     (N: vocabulary size)
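In code, the emission smoothing can look like this small sketch (λ = 0.95 and N = 1,000,000 are assumed values in the spirit of the earlier language-modeling tutorials, not fixed by this slide):

    LAMBDA_E = 0.95      # assumed interpolation weight
    N = 1_000_000        # assumed vocabulary size for the unknown-word term

    def smoothed_emission(pe_ml, tag, word):
        # PE(word|tag) = lambda * PML(word|tag) + (1 - lambda) * 1/N
        return LAMBDA_E * pe_ml.get(tag + " " + word, 0.0) + (1 - LAMBDA_E) / N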
Finding POS Tags
Finding POS Tags with Markov Models
Use the Viterbi algorithm again!!
(Viterbi: "I told you I was important!!")
What does our graph look like?
Finding POS Tags with Markov Models
What does our graph look like? Answer:
[Figure: the search graph is a lattice over "natural language processing ( nlp )", with a start node 0:<S> and, for each word position 1-6, one node per candidate tag (i:NN, i:JJ, i:VB, i:LRB, i:RRB); every tag at position i-1 connects to every tag at position i.]
Finding POS Tags with Markov Models
The best path is our POS sequence
[Figure: the same lattice with the best path highlighted: 0:<S> → 1:JJ → 2:NN → 3:NN → 4:LRB → 5:NN → 6:RRB, giving the tag sequence JJ NN NN LRB NN RRB.]
Remember: Viterbi Algorithm Steps
Forward step: calculate the best path to each node, i.e. the path with the lowest negative log probability
Backward step: reproduce the path
This is easy, almost the same as word segmentation
Forward Step: Part 1
First, calculate the transition from <S> and the emission of the first word, for every POS
For the first word "natural":
best_score[1 NN]  = -log PT(NN|<S>)  + -log PE(natural | NN)
best_score[1 JJ]  = -log PT(JJ|<S>)  + -log PE(natural | JJ)
best_score[1 VB]  = -log PT(VB|<S>)  + -log PE(natural | VB)
best_score[1 LRB] = -log PT(LRB|<S>) + -log PE(natural | LRB)
best_score[1 RRB] = -log PT(RRB|<S>) + -log PE(natural | RRB)
Forward Step: Middle Parts
For middle words, calculate the minimum score over all possible previous POS tags
For the second word "language":
best_score[2 NN] = min(
    best_score[1 NN]  + -log PT(NN|NN)  + -log PE(language | NN),
    best_score[1 JJ]  + -log PT(NN|JJ)  + -log PE(language | NN),
    best_score[1 VB]  + -log PT(NN|VB)  + -log PE(language | NN),
    best_score[1 LRB] + -log PT(NN|LRB) + -log PE(language | NN),
    best_score[1 RRB] + -log PT(NN|RRB) + -log PE(language | NN),
    ...
)
best_score[2 JJ] = min(
    best_score[1 NN]  + -log PT(JJ|NN)  + -log PE(language | JJ),
    best_score[1 JJ]  + -log PT(JJ|JJ)  + -log PE(language | JJ),
    best_score[1 VB]  + -log PT(JJ|VB)  + -log PE(language | JJ),
    ...
)
Forward Step: Final Part
Finish up the sentence with the sentence final symbol
best_score[I+1 </S>] = min(
    best_score[I NN]  + -log PT(</S>|NN),
    best_score[I JJ]  + -log PT(</S>|JJ),
    best_score[I VB]  + -log PT(</S>|VB),
    best_score[I LRB] + -log PT(</S>|LRB),
    best_score[I RRB] + -log PT(</S>|RRB),
    ...
)
Implementation: Model Loading
make a map for transition, emission, possible_tags
for each line in model_file
    split line into type, context, word, prob
    possible_tags[context] = 1          # we use this to enumerate all tags
    if type = "T"
        transition[context word] = prob
    else
        emission[context word] = prob
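A minimal Python sketch of this loading step, assuming the model file written by the training sketch earlier ("T prev tag prob" and "E tag word prob" lines):

    import sys

    transition = {}
    emission = {}
    possible_tags = {}

    for line in open(sys.argv[1]):                 # model file path is an assumption
        type_, context, word, prob = line.split()
        possible_tags[context] = 1                 # used to enumerate all tags later
        if type_ == "T":
            transition[context + " " + word] = float(prob)
        else:
            emission[context + " " + word] = float(prob)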
Implementation: Forward Step
split line into words
I = length(words)
make maps best_score, best_edge
best_score[0 <s>] = 0       # start with <s>
best_edge[0 <s>] = NULL
for i in 0 .. I-1:
    for each prev in keys of possible_tags
        for each next in keys of possible_tags
            if best_score[i prev] and transition[prev next] exist
                score = best_score[i prev] +
                        -log PT(next|prev) + -log PE(word[i]|next)
                if best_score[i+1 next] is new or > score
                    best_score[i+1 next] = score
                    best_edge[i+1 next] = i prev
# Finally, do the same for </s>
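A minimal Python sketch of the forward step, reusing the dictionaries loaded above and the smoothed_emission sketch from the smoothing note (the "i tag" key format comes from the pseudocode; everything else is an assumption for illustration):

    import math

    def forward(words, transition, emission, possible_tags):
        I = len(words)
        best_score = {"0 <s>": 0.0}
        best_edge = {"0 <s>": None}
        for i in range(I):
            for prev in set(possible_tags) | {"<s>"}:
                for nxt in possible_tags:
                    prev_key, trans_key = f"{i} {prev}", f"{prev} {nxt}"
                    if prev_key in best_score and trans_key in transition:
                        score = (best_score[prev_key]
                                 - math.log(transition[trans_key])
                                 - math.log(smoothed_emission(emission, nxt, words[i])))
                        next_key = f"{i+1} {nxt}"
                        if next_key not in best_score or score < best_score[next_key]:
                            best_score[next_key] = score
                            best_edge[next_key] = prev_key
        # finally, do the same for </s>
        for prev in possible_tags:
            prev_key, trans_key = f"{I} {prev}", f"{prev} </s>"
            if prev_key in best_score and trans_key in transition:
                score = best_score[prev_key] - math.log(transition[trans_key])
                end_key = f"{I+1} </s>"
                if end_key not in best_score or score < best_score[end_key]:
                    best_score[end_key] = score
                    best_edge[end_key] = prev_key
        return best_edge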
Implementation: Backward Step
tags = [ ]
next_edge = best_edge[ I+1 </s> ]
while next_edge != 0 <s>
    # add the tag for this edge to the tag sequence
    split next_edge into position, tag
    append tag to tags
    next_edge = best_edge[ next_edge ]
tags.reverse()
join tags into a string and print
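A minimal Python sketch of the backward step, reading the path out of the best_edge map returned by the forward sketch above:

    def backward(best_edge, I):
        tags = []
        next_edge = best_edge[f"{I+1} </s>"]
        while next_edge != "0 <s>":
            position, tag = next_edge.split(" ")   # keys look like "3 NN"
            tags.append(tag)
            next_edge = best_edge[next_edge]
        tags.reverse()
        return " ".join(tags)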
Exercise
Exercise
Write train-hmm and test-hmm
Test the program
Input: test/05-{train,test}-input.txt
Answer: test/05-{train,test}-answer.txt
Train an HMM model on data/wiki-en-train.norm_pos and run the program on data/wiki-en-test.norm
Measure the accuracy of your tagging with script/gradepos.pl data/wiki-en-test.pos my_answer.pos
Report the accuracy
Challenge: think of a way to improve accuracy
Thank You!