NLP Programming Tutorial 5: Part of Speech Tagging with Hidden Markov Models
Graham Neubig
Nara Institute of Science and Technology (NAIST)
Part of Speech (POS) Tagging
Given a sentence X, predict its part of speech sequence Y
Natural/JJ language/NN processing/NN (/-LRB- NLP/NN )/-RRB- is/VBZ a/DT field/NN of/IN computer/NN science/NN
A type of structured prediction, from two weeks ago
How can we do this? Any ideas?
Many Answers!
Pointwise prediction: predict each word individually with a classifier (e.g. perceptron, tool: KyTea)
e.g. for "Natural language processing ( NLP ) is a field of computer science", one classifier decides processing = NN? VBG? JJ?, another decides computer = NN? VBG? JJ?
Generative sequence models: today's topic! (e.g. Hidden Markov Model, tool: ChaSen)
Discriminative sequence models: predict the whole sequence with a classifier (e.g. CRF, structured perceptron, tools: MeCab, Stanford Tagger)
Probabilistic Model for Tagging
Find the most probable tag sequence, given the sentence
Natural/JJ language/NN processing/NN (/LRB NLP/NN )/RRB is/VBZ a/DT field/NN of/IN computer/NN science/NN
argmax_Y P(Y|X)
Any ideas?
Generative Sequence Model
First decompose probability using Bayes' law
argmax_Y P(Y|X) = argmax_Y P(X|Y) P(Y) / P(X)
                = argmax_Y P(X|Y) P(Y)
P(X|Y): model of word/POS interactions ("natural" is probably a JJ)
P(Y): model of POS/POS interactions (NN comes after DET)
Also sometimes called the noisy-channel model
Hidden Markov Models
Hidden Markov Models (HMMs) for POS Tagging
POS→POS transition probabilities (like a bigram model!):
    P(Y) ≈ ∏_{i=1..I+1} PT(y_i | y_{i-1})
POS→Word emission probabilities:
    P(X|Y) ≈ ∏_{i=1..I} PE(x_i | y_i)
Example, for the tag sequence <s> JJ NN NN LRB NN RRB ... </s> over "natural language processing ( nlp ) ...":
    P(Y)   includes PT(JJ|<s>) * PT(NN|JJ) * PT(NN|NN) * ...
    P(X|Y) includes PE(natural|JJ) * PE(language|NN) * PE(processing|NN) * ...
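As a concrete reference, the probability of one tagged sentence under this decomposition can be scored with the following minimal Python sketch (the dictionaries pt and pe, keyed by "prev tag" and "tag word" strings, are assumptions for illustration, not part of the slides):

    import math

    def hmm_log_prob(words, tags, pt, pe):
        # log P(X,Y) = sum of log PT(y_i|y_{i-1}) + log PE(x_i|y_i), plus the </s> transition
        logp = 0.0
        previous = "<s>"
        for word, tag in zip(words, tags):
            logp += math.log(pt[previous + " " + tag])  # transition PT(tag | previous)
            logp += math.log(pe[tag + " " + word])      # emission PE(word | tag)
            previous = tag
        logp += math.log(pt[previous + " </s>"])        # sentence-final transition
        return logp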
Learning Markov Models (with tags)
Count the number of occurrences in the corpus. For example, for the tagged sentence natural/JJ language/NN processing/NN (/LRB nlp/NN )/RRB is/VB:
    transition counts: c(<s> JJ)++, c(JJ NN)++, ...
    emission counts: c(JJ → natural)++, c(NN → language)++, ...
Divide by context to get probability:
    PT(LRB|NN) = c(NN LRB)/c(NN) = 1/3
    PE(language|NN) = c(NN → language)/c(NN) = 1/3
Training Algorithm
# Input data format is natural_JJ language_NN ...
make a map emit, transition, context
for each line in file
    previous = <s>                          # make the sentence start
    context[previous]++
    split line into wordtags with " "
    for each wordtag in wordtags
        split wordtag into word, tag with "_"
        transition[previous+" "+tag]++      # count the transition
        context[tag]++                      # count the context
        emit[tag+" "+word]++                # count the emission
        previous = tag
    transition[previous+" </s>"]++
# Print the transition probabilities
for each key, value in transition
    split key into previous, word with " "
    print "T", key, value/context[previous]
# Do the same thing for the emission probabilities, with "E" instead of "T"
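A minimal Python sketch of this training procedure, reading the training file from the command line and printing the model to standard output (the exact I/O layout is an assumption; the counting follows the pseudocode above):

    from collections import defaultdict
    import sys

    emit = defaultdict(int)
    transition = defaultdict(int)
    context = defaultdict(int)

    for line in open(sys.argv[1]):
        previous = "<s>"
        context[previous] += 1
        for wordtag in line.strip().split(" "):
            word, tag = wordtag.rsplit("_", 1)
            transition[previous + " " + tag] += 1   # count the transition
            context[tag] += 1                        # count the context
            emit[tag + " " + word] += 1              # count the emission
            previous = tag
        transition[previous + " </s>"] += 1

    for key, value in transition.items():            # print transition probabilities
        previous = key.split(" ")[0]
        print("T", key, value / context[previous])
    for key, value in emit.items():                  # print emission probabilities
        tag = key.split(" ")[0]
        print("E", key, value / context[tag])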
Note: Smoothing
In the bigram language model, we smoothed the probabilities:
    PLM(wi|wi-1) = λ PML(wi|wi-1) + (1-λ) PLM(wi)
HMM transition probabilities: there are not many tags, so smoothing is not necessary:
    PT(yi|yi-1) = PML(yi|yi-1)
HMM emission probabilities: smooth for unknown words:
    PE(xi|yi) = λ PML(xi|yi) + (1-λ) 1/N     (N: vocabulary size)
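In code, the emission smoothing can look like this small sketch (λ = 0.95 and N = 1,000,000 are assumed values in the spirit of the earlier language-modeling tutorials, not fixed by this slide):

    LAMBDA_E = 0.95      # assumed interpolation weight
    N = 1_000_000        # assumed vocabulary size for the unknown-word term

    def smoothed_emission(pe_ml, tag, word):
        # PE(word|tag) = lambda * PML(word|tag) + (1 - lambda) * 1/N
        return LAMBDA_E * pe_ml.get(tag + " " + word, 0.0) + (1 - LAMBDA_E) / N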
Finding POS Tags
Finding POS Tags with Markov Models
Use the Viterbi algorithm again!!
(Viterbi: "I told you I was important!!")
What does our graph look like?
Finding POS Tags with Markov Models
What does our graph look like? Answer:
[Figure: the search graph is a lattice over "natural language processing ( nlp )", with a start node 0:<S> and, for each word position 1-6, one node per candidate tag (i:NN, i:JJ, i:VB, i:LRB, i:RRB); every tag at position i-1 connects to every tag at position i.]
Finding POS Tags with Markov Models
The best path is our POS sequence
[Figure: the same lattice with the best path highlighted: 0:<S> → 1:JJ → 2:NN → 3:NN → 4:LRB → 5:NN → 6:RRB, giving the tag sequence JJ NN NN LRB NN RRB.]
Remember: Viterbi Algorithm Steps
Forward step: calculate the best path to each node, i.e. the path with the lowest negative log probability
Backward step: reproduce the path
This is easy, almost the same as word segmentation
Forward Step: Part 1
First, calculate the transition from <S> and the emission of the first word, for every POS
For the first word "natural":
best_score[1 NN]  = -log PT(NN|<S>)  + -log PE(natural | NN)
best_score[1 JJ]  = -log PT(JJ|<S>)  + -log PE(natural | JJ)
best_score[1 VB]  = -log PT(VB|<S>)  + -log PE(natural | VB)
best_score[1 LRB] = -log PT(LRB|<S>) + -log PE(natural | LRB)
best_score[1 RRB] = -log PT(RRB|<S>) + -log PE(natural | RRB)
Forward Step: Middle Parts
For middle words, calculate the minimum score over all possible previous POS tags
For the second word "language":
best_score[2 NN] = min(
    best_score[1 NN]  + -log PT(NN|NN)  + -log PE(language | NN),
    best_score[1 JJ]  + -log PT(NN|JJ)  + -log PE(language | NN),
    best_score[1 VB]  + -log PT(NN|VB)  + -log PE(language | NN),
    best_score[1 LRB] + -log PT(NN|LRB) + -log PE(language | NN),
    best_score[1 RRB] + -log PT(NN|RRB) + -log PE(language | NN),
    ...
)
best_score[2 JJ] = min(
    best_score[1 NN]  + -log PT(JJ|NN)  + -log PE(language | JJ),
    best_score[1 JJ]  + -log PT(JJ|JJ)  + -log PE(language | JJ),
    best_score[1 VB]  + -log PT(JJ|VB)  + -log PE(language | JJ),
    ...
)
Forward Step: Final Part
Finish up the sentence with the sentence final symbol
best_score[I+1 </S>] = min(
    best_score[I NN]  + -log PT(</S>|NN),
    best_score[I JJ]  + -log PT(</S>|JJ),
    best_score[I VB]  + -log PT(</S>|VB),
    best_score[I LRB] + -log PT(</S>|LRB),
    best_score[I RRB] + -log PT(</S>|RRB),
    ...
)
Implementation: Model Loading
make a map for transition, emission, possible_tags
for each line in model_file
    split line into type, context, word, prob
    possible_tags[context] = 1          # we use this to enumerate all tags
    if type = "T"
        transition[context word] = prob
    else
        emission[context word] = prob
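A minimal Python sketch of this loading step, assuming the model file written by the training sketch earlier ("T prev tag prob" and "E tag word prob" lines):

    import sys

    transition = {}
    emission = {}
    possible_tags = {}

    for line in open(sys.argv[1]):                 # model file path is an assumption
        type_, context, word, prob = line.split()
        possible_tags[context] = 1                 # used to enumerate all tags later
        if type_ == "T":
            transition[context + " " + word] = float(prob)
        else:
            emission[context + " " + word] = float(prob)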
Implementation: Forward Step
split line into words
I = length(words)
make maps best_score, best_edge
best_score[0 <s>] = 0       # start with <s>
best_edge[0 <s>] = NULL
for i in 0 .. I-1:
    for each prev in keys of possible_tags
        for each next in keys of possible_tags
            if best_score[i prev] and transition[prev next] exist
                score = best_score[i prev] +
                        -log PT(next|prev) + -log PE(word[i]|next)
                if best_score[i+1 next] is new or > score
                    best_score[i+1 next] = score
                    best_edge[i+1 next] = i prev
# Finally, do the same for </s>
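A minimal Python sketch of the forward step, reusing the dictionaries loaded above and the smoothed_emission sketch from the smoothing note (the "i tag" key format comes from the pseudocode; everything else is an assumption for illustration):

    import math

    def forward(words, transition, emission, possible_tags):
        I = len(words)
        best_score = {"0 <s>": 0.0}
        best_edge = {"0 <s>": None}
        for i in range(I):
            for prev in set(possible_tags) | {"<s>"}:
                for nxt in possible_tags:
                    prev_key, trans_key = f"{i} {prev}", f"{prev} {nxt}"
                    if prev_key in best_score and trans_key in transition:
                        score = (best_score[prev_key]
                                 - math.log(transition[trans_key])
                                 - math.log(smoothed_emission(emission, nxt, words[i])))
                        next_key = f"{i+1} {nxt}"
                        if next_key not in best_score or score < best_score[next_key]:
                            best_score[next_key] = score
                            best_edge[next_key] = prev_key
        # finally, do the same for </s>
        for prev in possible_tags:
            prev_key, trans_key = f"{I} {prev}", f"{prev} </s>"
            if prev_key in best_score and trans_key in transition:
                score = best_score[prev_key] - math.log(transition[trans_key])
                end_key = f"{I+1} </s>"
                if end_key not in best_score or score < best_score[end_key]:
                    best_score[end_key] = score
                    best_edge[end_key] = prev_key
        return best_edge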
Implementation: Backward Step
tags = [ ]
next_edge = best_edge[ I+1 </s> ]
while next_edge != 0 <s>
    # add the tag for this edge to the tag sequence
    split next_edge into position, tag
    append tag to tags
    next_edge = best_edge[ next_edge ]
tags.reverse()
join tags into a string and print
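A minimal Python sketch of the backward step, reading the path out of the best_edge map returned by the forward sketch above:

    def backward(best_edge, I):
        tags = []
        next_edge = best_edge[f"{I+1} </s>"]
        while next_edge != "0 <s>":
            position, tag = next_edge.split(" ")   # keys look like "3 NN"
            tags.append(tag)
            next_edge = best_edge[next_edge]
        tags.reverse()
        return " ".join(tags)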
Exercise
Exercise
Write train-hmm and test-hmm
Test the program
Input: test/05-{train,test}-input.txt
Answer: test/05-{train,test}-answer.txt
Train an HMM model on data/wiki-en-train.norm_pos and run the program on data/wiki-en-test.norm
Measure the accuracy of your tagging with script/gradepos.pl data/wiki-en-test.pos my_answer.pos
Report the accuracy
Challenge: think of a way to improve accuracy
Thank You!