Report

A
REPORT
On
NEURAL POS TAGGING

[ ASSIGNMENT - 02 ]
INTRODUCTION TO NATURAL LANGUAGE PROCESSING

By
Rishi Nayak
2023201004
[ M.tech CSE ]
INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY,

HYDERABAD.
ABSTRACT
The following report contains the implementation of Part of Speech [ POS ] tagging
using Feed Forward Neural Network and Recurrent Neural Network ( LSTM ). The
models are trained on a variety of English datasets and validated and tested on given
datasets respectively. This report briefly explains the implementation of these neural
networks, and how the models are trained. It consists of reports of different
hyperparameters used to test the accuracies of the models and the graphs plotted
between accuracies and different hyperparameters.
INTRODUCTION
1. Feed Forward Neural Network:
This is an implementation for a Feedforward Neural Network (FNN) used as a

Part-of-Speech (POS) Tagger. The purpose of this system is to assign a grammatical
tag to each word in a given input sentence. The system is implemented using PyTorch.
Code Overview
The system's core is the FNN_POS_Tagger class, a PyTorch neural network module. It
utilises an embedding layer to convert input words into dense vectors, followed by fully
connected layers to perform classification. The system loads data from CONLLU files,
which contain parsed sentences in the Universal Dependencies format. These
sentences are then processed to extract word and POS tag information.
Functions are provided to generate input-output pairs for training the model. These pairs
consist of input contexts (previous and successive tokens) and their corresponding POS
tags. The model is trained using a provided training dataset. Training involves
minimizing a cross-entropy loss function using the Adam optimizer. The model's
performance is evaluated using a validation dataset to monitor loss and accuracy. After
training, the model is tested on separate test datasets to assess its generalization
performance. The test accuracy is reported for each test dataset. After training and
testing, the trained model's parameters are saved to a file for future use.
Hyperparameters
P: Number of previous tokens considered as context.

S: Number of successive tokens considered as context.
EMBEDDING_DIM: Dimensionality of word embeddings.
HIDDEN_DIM: Dimensionality of hidden layers in the neural network.
UNK_TOKEN: Token used to handle out-of-vocabulary words.
Batch Size: Number of input-output pairs processed in each training iteration.
Learning Rate: The rate at which the model's parameters are updated during optimization.
Number of Epochs: Number of times the entire dataset is passed forward and backwards
through the neural network.
2. Recurrent Neural Network [ LSTM ]:
This is an implementation for a Long Short-Term Memory (LSTM) based Part-of-Speech

(POS) Tagger. The system aims to assign a grammatical tag to each word in a given
input sentence. The implementation utilizes PyTorch.
Code Overview
The core component is the LSTM_POS_Tagger class, a PyTorch neural network

module. It incorporates an embedding layer to convert input words into dense vectors
and an LSTM layer for sequence modelling. The final fully connected layer performs
classification. Data is loaded from CONLLU files containing parsed sentences in the
Universal Dependencies format. These sentences are processed to extract word and
POS tag information.
Functions generate input-output pairs for training the model. These pairs consist of input
sequences (words represented as indices) and their corresponding POS tags. Data
loaders are created to handle batching and padding of input-output pairs for efficient
training.
The model is trained using the training dataset. Training involves minimizing the
cross-entropy loss function using the Adam optimizer. Validation is performed on a
separate dataset to monitor loss and accuracy. After training, the model is evaluated on
multiple test datasets to assess its generalization performance. The test accuracy is
reported for each test dataset. The trained model's parameters and the vocabulary
(word-to-index and tag-to-index mappings) are saved for future use.
Hyperparameters
EMBEDDING_DIM: Dimensionality of word embeddings.

HIDDEN_DIM: Dimensionality of hidden layers in the LSTM.
UNK_TOKEN: Token used to handle out-of-vocabulary words.
Batch Size: Number of input-output pairs processed in each training iteration.
Learning Rate: The rate at which the model's parameters are updated during optimization.
Number of Epochs: Number of times the entire dataset is passed forward and backwards
through the neural network.
HYPERPARAMETER TUNING REPORTS
FNN REPORTS
1. Experiment with hyperparameters:
To experiment with different values of hyperparameters, I created 3 sets and trained my

model on the training data. Then for the best validation results, I used that model to test
on my testing data. Here are the results -
hyperparams_sets = [
{'P': 4, 'S': 1, 'epochs': 2, 'hidden_layers': 1},
{'P': 1, 'S': 4, 'epochs': 5, 'hidden_layers': 2},
{'P': 2, 'S': 2, 'epochs': 10, 'hidden_layers': 3}
]
Set 1 - ACCURACY - 91.58%

Testing accuracies:
2. Experiment with context window:
With the entire given dataset of the English language, I tested my model with the
mentioned context window size with the specific condition of -
p ( previous tokens ) = s ( successive tokens ) = context_window_size
and trained the model with the training data, later testing the accuracies with the
validation set as mentioned in the task. Here are the observations -
Case 1: p = s = 0
Case 2: p = s = 1
Case 3: p = s = 2
Case 4: p = s = 3
Case 5: p = s = 4
Here is the graph plotted between the context window size and the validation set
accuracies -
LSTM REPORTS
hyperparameters = [
{"epochs": 10, "lstm_stacks": 1, "bidirectional": False,
"hidden_dim": 128, "embedding_dim": 100, "activation": nn.Tanh()},
{"epochs": 5, "lstm_stacks": 2, "bidirectional": True,
"hidden_dim": 256, "embedding_dim": 150, "activation": nn.ReLU()},
{"epochs": 8, "lstm_stacks": 1, "bidirectional": False,
"hidden_dim": 100, "embedding_dim": 200, "activation": nn.Sigmoid()}
]
1. Set 1 Results -
2. Set 2 Results -
3. Set 3 Results -
Performance of best model on testing data :
Evaluation of best model on test datasets -
1. ud-treebanks-v2.13/UD_English-ESLSpok/en_eslspok-ud-test.conllu:
Micro Precision: 0.9871, Micro Recall: 0.9871, Micro F1-score: 0.9871
Macro Precision: 0.8622, Macro Recall: 0.8176, Macro F1-score: 0.8296
2. ud-treebanks-v2.13/UD_English-EWT/en_ewt-ud-test.conllu:
3. ud-treebanks-v2.13/UD_English-GUM/en_gum-ud-test.conllu:
4. ud-treebanks-v2.13/UD_English-Pronouns/en_pronouns-ud-test.conllu:
5. ud-treebanks-v2.13/UD_English-GENTLE/en_gentle-ud-test.conllu:
6. ud-treebanks-v2.13/UD_English-LinES/en_lines-ud-test.conllu:
7. ud-treebanks-v2.13/UD_English-PUD/en_pud-ud-test.conllu:
MODEL PERFORMANCE -
LSTM -
FNN -
POS TAG RESULTS -
FNN - POS tag output for a given sentence -

LSTM - POS tag output for a given sentence -
CONCLUSION
FNN REPORT
When we Increase the Context Window Size -
● Positives: Enabling the model to capture more contextual information, potentially

improving accuracy by considering a wider range of surrounding words.
● Negatives: Increases computational complexity and memory requirements,

leading to longer training times and higher resource consumption.
Optimal Context Window Size: P=S=2

Impact on Model Performance:
Accuracy: Larger context window sizes may improve accuracy by providing more
contextual cues, while too small window sizes may lead to decreased accuracy due to
insufficient context.
Efficiency: Smaller window sizes improved the training efficiency but might sacrifice
accuracy, while larger window sizes enhanced the accuracy at the expense of increased
computational resources.
Resources: Larger context window sizes require more computational power.
RNN REPORT
Hyperparameters:
{"epochs": 10, "lstm_stacks": 1, "bidirectional": False, "hidden_dim": 128, "embedding_dim": 100,
"activation": nn.Tanh()}
This configuration resulted in moderate training and validation losses. The model
converged quickly, possibly indicating that the network capacity might not be fully
utilized.
Hyperparameters:
{"epochs": 15, "lstm_stacks": 2, "bidirectional": True, "hidden_dim": 256, "embedding_dim": 150,
"activation": nn.ReLU()}
This configuration led to slightly higher training and validation losses compared to the
first configuration but with an improved accuracy on the validation set. The deeper
network with bidirectional LSTM layers might have contributed to capturing more
complex patterns in the data.
Hyperparameters:
{"epochs": 12, "lstm_stacks": 1, "bidirectional": False, "hidden_dim": 100, "embedding_dim": 200,
"activation": nn.Sigmoid()}
With this configuration, the model exhibited a similar performance to the first
configuration on the validation set. The higher embedding dimension might have
allowed the model to capture more nuanced semantic information, but it did not
significantly improve the overall performance.

Report

Uploaded by

Copyright:

Available Formats

Report

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Report

Uploaded by

Copyright:

Available Formats

A

NEURAL POS TAGGING

INTRODUCTION TO NATURAL LANGUAGE PROCESSING

INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY,

1. Feed Forward Neural Network:

This is an implementation for a Feedforward Neural Network (FNN) used as a

P: Number of previous tokens considered as context.

2. Recurrent Neural Network [ LSTM ]:

This is an implementation for a Long Short-Term Memory (LSTM) based Part-of-Speech

The core component is the LSTM_POS_Tagger class, a PyTorch neural network

EMBEDDING_DIM: Dimensionality of word embeddings.

HYPERPARAMETER TUNING REPORTS

1. Experiment with hyperparameters:

To experiment with different values of hyperparameters, I created 3 sets and trained my

Set 1 - ACCURACY - 91.58%

Set 2 - ACCURACY - 92.34%

2. Experiment with context window:

p ( previous tokens ) = s ( successive tokens ) = context_window_size

Evaluation of best model on test datasets -

POS TAG RESULTS -

FNN - POS tag output for a given sentence -

When we Increase the Context Window Size -

● Positives: Enabling the model to capture more contextual information, potentially

● Negatives: Increases computational complexity and memory requirements,

Optimal Context Window Size: P=S=2

Resources: Larger context window sizes require more computational power.

You might also like