AI Assignment 1
Word Analysis
Aim - The objective of the experiment is to learn about morphological features of a word by
analyzing it.
Theory – Analysis of a word into its root and affix(es) is called morphological analysis of the word. Identifying the root of a word is essential for any natural language processing task. A root word can have various forms. For example, the word 'play' in English has the following forms: 'play', 'plays', 'played' and 'playing'. Hindi has a larger number of forms for the equivalent word 'खेल' (khela), and the Telugu root ఆడడం (Adadam) likewise has many forms.
Thus the morphological richness of a language varies from one language to another. Indian languages are generally morphologically rich languages, and therefore morphological analysis of words is a particularly significant task for Indian languages.
Types of Morphology
1. Inflectional morphology
Deals with word forms of a root, where there is no change in lexical category. For example,
'played' is an inflection of the root word 'play'. Here, both 'played' and 'play' are verbs.
2. Derivational morphology
Deals with word forms of a root, where there is a change in the lexical category. For example, the
word form 'happiness' is a derivation of the word 'happy'. Here, 'happiness' is a derived noun form
of the adjective 'happy'.
Morphological Features:
All words will have their lexical category attested during morphological analysis.
A noun or pronoun can take suffixes marking the following features: gender, number, person and case. For example, the morphological analysis of a few words is illustrated below.
A verb can take suffixes marking the following features: tense, aspect, modality, gender, number and person.
'rt' stands for root. 'cat' stands for lexical category; its value can be noun, verb, adjective, pronoun, adverb or preposition. 'gen' stands for gender; its value can be masculine or feminine.
'num' stands for number. The value of number can be singular (sg) or plural (pl).
'per' stands for person. The value of person can be 1, 2 or 3.
The value of tense can be present, past or future. This feature is applicable for verbs.
The value of aspect can be perfect (pft), continuous (cont) or habitual (hab). This feature is also applicable for verbs.
'case' can be direct or oblique. This feature is applicable for nouns. A noun is in the oblique case when a postposition occurs after it; if no postposition can occur after the noun, the case is direct. This applies to Hindi but not to English, since English does not have postpositions.
Some of the postpositions in Hindi are: का (kaa), की (kii), के (ke), को (ko), में (meM).
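As an illustration of the features described above, here is a minimal sketch (in Python) of morphological analyses stored as feature structures; the word list and helper name are hypothetical examples, not the lab's analyser or data.

```python
# A minimal sketch: morphological analyses stored as feature structures.
# The entries below are illustrative examples, not the lab's data.
ANALYSES = {
    "boys":   {"rt": "boy",  "cat": "noun", "num": "pl"},
    "plays":  {"rt": "play", "cat": "verb", "num": "sg", "per": 3, "tense": "present"},
    "played": {"rt": "play", "cat": "verb", "tense": "past"},
}

def analyse(word):
    """Return the stored morphological analysis of a word, if it is known."""
    return ANALYSES.get(word, {"rt": word, "cat": "unknown"})

for w in ["boys", "plays", "played"]:
    print(w, "->", analyse(w))
```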
Procedure –
STEP1: Select the language.
OUTPUT: Drop down for selecting words will appear.
Output –
AI Experiment 2
Word Generation
Aim - The objective of the experiment is to generate word forms from root and suffix
information.
Theory – Given the root and suffix information, a word form can be generated. For example, given the root 'play' and the past-tense suffix '-ed', the word form 'played' is generated.
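The sketch below illustrates this idea in Python; the suffix table and the spelling rule are illustrative assumptions, not the lab's generator.

```python
# A minimal sketch of word generation from a root and a (category, feature) pair.
# The suffix table and the spelling rule are illustrative assumptions.
SUFFIXES = {
    ("verb", "past"): "ed",
    ("verb", "continuous"): "ing",
    ("noun", "plural"): "s",
}

def generate(root, category, feature):
    """Generate a word form as root + suffix selected by (category, feature)."""
    suffix = SUFFIXES.get((category, feature), "")
    # Toy spelling rule: consonant + 'y' becomes 'i' before 'e' ('try' -> 'tried').
    if suffix.startswith("e") and root.endswith("y") and root[-2] not in "aeiou":
        root = root[:-1] + "i"
    return root + suffix

print(generate("play", "verb", "past"))        # played
print(generate("play", "verb", "continuous"))  # playing
print(generate("try", "verb", "past"))         # tried
print(generate("boy", "noun", "plural"))       # boys
```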
Procedure –
STEP3: After selecting all the features, select the word corresponding to the features selected above.
STEP4: Click the Check button to see whether the right word has been selected.
OUTPUT: The output tells whether the selected word is right or wrong.
Output –
AI Experiment 3
Morphology
Aim - Understanding the morphology of a word through the use of an Add-Delete table.
Definition
Morphemes are considered the smallest meaningful units of language. A morpheme can be either a root word ('play') or an affix ('-ed'). Combining morphemes is called a morphological process. So the word "played" is made up of two morphemes, "play" and "-ed". Finding all the parts of a word (its morphemes) and thereby describing the properties of the word is called "Morphological Analysis". For example, "played" carries the information verb "play" plus "past tense", so the given word is the past-tense form of the verb "play".
Analysis of a word:
बच्चों (bachchoM) = बच्चा (bachchaa), the root, + ओं (oM), a suffix marking 3rd person, plural, oblique.
A linguistic paradigm is the complete set of variants of a given lexeme. These variants can be
classified according to shared inflectional categories (e.g. number, case, etc.) and arranged into
tables.
Paradigm table for बच्चा (bachchaa):
          singular            plural
direct    बच्चा (bachchaa)     बच्चे (bachche)
oblique   बच्चे (bachche)      बच्चों (bachchoM)

Add-Delete table for deriving बच्चों (bachchoM):
Delete आ (aa)  ->  output बच्च (bachch)
Add ओं (oM)    ->  output बच्चों (bachchoM)
Paradigm Class
Words in the same paradigm class behave similarly. For example, लड़का (ladakaa) is in the same paradigm class as बच्चा (bachchaa), so लड़का behaves in the same way as बच्चा, since they share the same paradigm class.
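A rough sketch (not the lab's implementation) of how an add-delete entry can be applied to a root in Python; note that the code manipulates the dependent vowel signs (matras), whereas the table above writes the independent vowels.

```python
# A minimal sketch of applying an add-delete rule to a root word.
# The rule "delete आ (aa), add ओं (oM)" is applied here as: delete the
# trailing matra 'ा', then append 'ों'.
def apply_add_delete(root, delete, add):
    """Delete a trailing string from the root (if present), then append the add string."""
    if delete and root.endswith(delete):
        root = root[: -len(delete)]
    return root + add

# बच्चा -> बच्च -> बच्चों
print(apply_add_delete("बच्चा", "ा", "ों"))
# Words of the same paradigm class behave the same way: लड़का -> लड़कों
print(apply_add_delete("लड़का", "ा", "ों"))
```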
Procedure –
STEP1: Select a word root.
STEP2: Fill the add-delete table and submit.
STEP3: If wrong, see the correct answer or repeat STEP1.
Output –
AI Experiment 4
N-Grams
Aim - The objective of this experiment is to learn to calculate bigrams from a given corpus and to calculate the probability of a sentence.
Theory- A combination of words forms a sentence. However, such a formation is meaningful
only when the words are arranged in some order.
Eg: Sit I car in the
Such a sentence is not grammatically acceptable. However, some perfectly grammatical sentences can be nonsensical too!
Eg: Colorless green ideas sleep furiously
One easy way to handle such unacceptable sentences is by assigning probabilities to strings of words, i.e., how likely the sentence is.
Probability of a sentence
The probability of a sentence is the joint probability of its words occurring in their positions: P(w(1), w(2), ..., w(n-1), w(n)). By the chain rule, this equals P(w(1)) P(w(2)|w(1)) P(w(3)|w(1),w(2)) .... P(w(n)|w(1),...,w(n-1)), which requires conditioning on very long word histories.
Bigrams
We can avoid this very long calculation by approximating that the probability of a word depends only on its previous word. This assumption is called the Markov assumption, and such a model is called a Markov model; a bigram model is a first-order Markov model. Bigrams can be generalized to n-grams, which look at the previous (n-1) words.
Therefore,
P(w(1), w(2), ..., w(n-1), w(n)) ≈ P(w(1)) P(w(2)|w(1)) P(w(3)|w(2)) .... P(w(n)|w(n-1))
Eg: Corpus – (eos) You book a flight (eos) I read a book (eos) You read (eos)
Bigram Table (each cell gives P(column word | row word), i.e. the bigram count divided by the count of the row word):

         (eos)   you    book   a      flight  I      read
(eos)    0       0.5    0      0      0       0.25   0
you      0       0      0.5    0      0       0      0.5
book     0.5     0      0      0.5    0       0      0
a        0       0      0.5    0      0.5     0      0
flight   1       0      0      0      0       0      0
I        0       0      0      0      0       0      1
read     0.5     0      0      0.5    0       0      0
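The table above can be reproduced with a short script. The sketch below (variable and helper names are my own, not part of the lab) counts bigrams in the corpus, divides each count by the unigram count of the preceding word, and then multiplies bigram probabilities to score a sentence.

```python
from collections import Counter

# The example corpus from above.
corpus = "(eos) You book a flight (eos) I read a book (eos) You read (eos)"
tokens = corpus.lower().split()

unigrams = Counter(tokens)                   # C(w), unigram counts
bigrams = Counter(zip(tokens, tokens[1:]))   # C(w(n-1) w(n)), bigram counts

def p(w, prev):
    """Maximum-likelihood bigram probability P(w | prev) = C(prev w) / C(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p("you", "(eos)"))   # 2/4 = 0.5
print(p("book", "you"))    # 1/2 = 0.5

# Bigram probability of the sentence "(eos) you read a book (eos)":
sentence = "(eos) you read a book (eos)".split()
prob = 1.0
for prev, w in zip(sentence, sentence[1:]):
    prob *= p(w, prev)
print(prob)                # 0.5 * 0.5 * 0.5 * 0.5 * 0.5 = 0.03125
```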
Procedure –
STEP1: Select a corpus and click on Generate bigram table
STEP2: Fill up the table that is generated and hit Submit
STEP3: If incorrect (red), see the correct answer by clicking on show answer or repeat Step 2.
STEP4: If correct (green), click on take a quiz and fill the correct answer
Output –
AI Experiment 5
N-Grams Smoothing
Aim - The objective of this experiment is to learn how to apply add-one smoothing on a sparse bigram table.
Theory –
The standard N-gram models are trained from some corpus. The finiteness of the training corpus leads to the absence of some perfectly acceptable N-grams. This results in sparse bigram matrices, and such models tend to underestimate the probability of strings that do not occur in their training corpus.
There are techniques for assigning a non-zero probability to these 'zero-probability bigrams'. This task of re-evaluating some of the zero-probability and low-probability N-grams and assigning them non-zero values is called smoothing. Some of the techniques are: Add-One Smoothing, Witten-Bell Discounting and Good-Turing Discounting.
Add-One Smoothing
In add-one smoothing, we add one to all the n-gram counts before normalizing them into probabilities.
Application on unigrams
The unsmoothed maximum likelihood estimate of the unigram probability is computed by dividing the count of the word by the total number of word tokens N:
P(wx) = c(wx) / Σi c(wi) = c(wx) / N
Add-one smoothing gives an adjusted count ci*:
ci* = (ci + 1) * N / (N + V)
where V is the total number of word types in the language (the vocabulary size).
Normalizing the adjusted counts by N gives the smoothed probabilities:
pi* = (ci + 1) / (N + V)
Application on bigrams
Normal bigram probabilities are computed by normalizing each row of counts by the unigram count of the preceding word:
P(wn|wn-1) = C(wn-1 wn) / C(wn-1)
For add-one smoothed bigram probabilities, we augment the denominator by the number of word types in the vocabulary, V:
p*(wn|wn-1) = (C(wn-1 wn) + 1) / (C(wn-1) + V)
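A short sketch of add-one smoothing applied to the bigram counts of the corpus from the previous experiment (helper names are my own, not the lab's code):

```python
from collections import Counter

corpus = "(eos) You book a flight (eos) I read a book (eos) You read (eos)"
tokens = corpus.lower().split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)                 # number of word types; V = 7 for this corpus

def p_smoothed(w, prev):
    """Add-one smoothed bigram probability p*(w | prev)."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

# A bigram unseen in the corpus now gets a small non-zero probability:
print(p_smoothed("flight", "read"))   # (0 + 1) / (2 + 7) = 0.111...
# A seen bigram is discounted slightly:
print(p_smoothed("a", "book"))        # (1 + 1) / (2 + 7) = 0.222...
```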
Procedure –
AI Experiment 6
POS Tagging: Hidden Markov Model
Theory –
For POS tagging, it is assumed that the POS tags are generated by a random process, and that each tag in turn randomly generates a word. Hence the transition matrix denotes the probability of moving from one POS to another, and the emission matrix denotes the probability that a given word has a particular POS. The words act as the observations. The basic assumptions are that the probability of a tag depends only on the previous tag (a first-order Markov assumption) and that the probability of a word depends only on its own tag.
As an example, the emission and transition probabilities are computed from the following tagged corpus:
EOS/eos They/pronoun cut/verb the/determiner paper/noun EOS/eos
He/pronoun asked/verb for/preposition his/pronoun cut/noun EOS/eos
Put/verb the/determiner paper/noun in/preposition the/determiner cut/noun EOS/eos
Emission probabilities: count the number of times a specific word occurs with a specific POS tag in the corpus. Here, say, for "cut":
count(cut,verb) = 1
count(cut,noun) = 2
count(cut,determiner) = 0
P(cut/verb) = count(cut,verb) / count(cut) = 1/3 = 0.33
Similarly, the probability to be filled in the cell at the intersection of cut and determiner is
P(cut/determiner) = count(cut,determiner) / count(cut) = 0/3 = 0
Repeat the same for every word-tag combination and fill in the emission matrix.
Transition probabilities: count the number of times a specific tag comes after the other POS tags in the corpus. Here, say, for "determiner":
count(verb,determiner) = 2
count(preposition,determiner) = 1
count(determiner,determiner) = 0
count(eos,determiner) = 0
count(noun,determiner) = 0
P(determiner/verb) = count(verb,determiner) / count(determiner) = 2/3 = 0.66
Similarly, the probability to be filled in the cell at the intersection of determiner (in the column) and noun (in the row) is
P(determiner/noun) = count(noun,determiner) / count(determiner) = 0/3 = 0
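The counts above can be computed mechanically from the tagged corpus. The sketch below (variable names are my own, not the lab's code) follows the experiment's conventions exactly: an emission cell divides count(word, tag) by count(word), and a transition cell divides count(previous tag, tag) by count(tag).

```python
from collections import Counter

# The tagged corpus from above, written as word/tag pairs.
corpus = (
    "EOS/eos They/pronoun cut/verb the/determiner paper/noun EOS/eos "
    "He/pronoun asked/verb for/preposition his/pronoun cut/noun EOS/eos "
    "Put/verb the/determiner paper/noun in/preposition the/determiner cut/noun EOS/eos"
)

pairs = [tok.rsplit("/", 1) for tok in corpus.split()]
words = [w.lower() for w, t in pairs]
tags = [t for w, t in pairs]

word_counts = Counter(words)
tag_counts = Counter(tags)
word_tag = Counter(zip(words, tags))          # count(word, tag)
tag_bigrams = Counter(zip(tags, tags[1:]))    # count(previous tag, tag)

# Emission cell, following the experiment's convention:
# P(cut/verb) = count(cut, verb) / count(cut)
print(word_tag[("cut", "verb")] / word_counts["cut"])                  # 1/3 = 0.33
# Transition cell, following the experiment's convention:
# P(determiner/verb) = count(verb, determiner) / count(determiner)
print(tag_bigrams[("verb", "determiner")] / tag_counts["determiner"])  # 2/3 = 0.66
```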
Procedure –
Here "s" denotes words and "t" denotes tags. "a" is transmission matrix and "b" is emission
matrix.
Using above algorithm, we have to fill the viterbi table column by column.
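A minimal Viterbi sketch under this notation; the tag set and all probability values in the example are made up for illustration and are not the lab's matrices.

```python
# A minimal Viterbi sketch: a[t1][t2] are transition probabilities,
# b[t][word] are emission probabilities; all values below are toy numbers.
def viterbi(words, tags, a, b, start="eos"):
    """Fill the Viterbi table column by column and return the best tag sequence."""
    V = [{t: a[start].get(t, 0.0) * b[t].get(words[0], 0.0) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for t in tags:
            best_prev, best_p = max(
                ((tp, V[i - 1][tp] * a[tp].get(t, 0.0) * b[t].get(words[i], 0.0))
                 for tp in tags),
                key=lambda x: x[1],
            )
            V[i][t], back[i][t] = best_p, best_prev
    # Trace back the highest-probability path.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

tags = ["noun", "verb"]
a = {"eos":  {"noun": 0.6, "verb": 0.4},
     "noun": {"noun": 0.3, "verb": 0.7},
     "verb": {"noun": 0.8, "verb": 0.2}}
b = {"noun": {"they": 0.1, "cut": 0.4, "paper": 0.5},
     "verb": {"they": 0.0, "cut": 0.9, "paper": 0.1}}
print(viterbi(["they", "cut", "paper"], tags, a, b))   # ['noun', 'verb', 'noun']
```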
Procedure –
Output –
AI Experiment 8
Building POS Tagger
Aim - The objective of the experiment is to know the importance of context and size of
training corpus in learning Parts of Speech.
Theory – In the mid 1980s, researchers in Europe began to use Hidden Markov models (HMMs)
to disambiguate parts of speech. HMMs involve counting cases, and making a table of the
probabilities of certain sequences. For example, once you've seen an article such as 'the',
perhaps the next word is a noun 40% of the time, an adjective 40%, and a number 20%.
Knowing this, a program can decide that "can" in "the can" is far more likely to be a noun than
a verb or a modal. The same method can of course be used to benefit from knowledge about
following words.
More advanced ("higher order") HMMs learn the probabilities not only of pairs, but triples or
even larger sequences. So, for example, if you've just seen an article and a verb, the next item
may be very likely a preposition, article, or noun, but much less likely another verb.
When several ambiguous words occur together, the possibilities multiply. However, it is easy
to enumerate every combination and to assign a relative probability to each one, by
multiplying together the probabilities of each choice in turn.
It is worth remembering, as Eugene Charniak points out in Statistical techniques for natural
language parsing, that merely assigning the most common tag to each known word and the tag
"proper noun" to all unknowns, will approach 90% accuracy because many words are
unambiguous.
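This baseline is easy to express in code. The sketch below (with a small made-up training list, not the lab's data) tags every known word with its most frequent tag and every unknown word as a proper noun.

```python
from collections import Counter, defaultdict

# Toy tagged training data: (word, tag) pairs. Real corpora are much larger.
train = [("the", "DT"), ("can", "NN"), ("can", "MD"), ("can", "NN"),
         ("cut", "VB"), ("cut", "NN"), ("paper", "NN")]

counts = defaultdict(Counter)
for word, tag in train:
    counts[word][tag] += 1

def baseline_tag(word):
    """Most-frequent-tag baseline; unknown words are tagged as proper nouns."""
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return "NNP"

print([(w, baseline_tag(w)) for w in ["the", "can", "Charniak"]])
# [('the', 'DT'), ('can', 'NN'), ('Charniak', 'NNP')]
```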
HMMs underlie the functioning of stochastic taggers and are used in various algorithms. Accuracies for one such algorithm (TnT) on various training data are shown here.
Conditional random fields (CRFs) are a class of statistical modelling methods often applied in machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to "neighboring" samples, a CRF can take context into account. Because CRFs can consider context, they are well suited to natural language processing, and in particular to Parts of Speech tagging, where the POS is predicted using the surrounding lexical items as context.
If only one neighbouring word is considered as context, the feature is called a bigram; similarly, two neighbours as context is called a trigram. In this experiment, the size of the training corpus and the context were varied to study their importance.
Procedure –
STEP1: Select the language.
OUTPUT: Drop down to select size of corpus, algorithm and features will appear.
STEP2: Select corpus size.
STEP3: Select algorithm "CRF" or "HMM".
STEP4: Select feature "bigram" or "trigram".
OUTPUT: The corresponding accuracy will be shown.
Output –
AI Experiment 9
Chunking
Aim - The objective of this experiment is to understand the concept of chunking and get
familiar with the basic chunk tagset.
Theory –
Chunking of text involves dividing the text into groups of syntactically correlated words (chunks).
Chunk Types
The chunk types are based on the syntactic category of the chunk head. Besides the head, a chunk also contains modifiers (like determiners, adjectives and postpositions in NPs). The main chunk types are noun chunks (NP) and verb chunks.
For English, the types of verb chunks and their tags include, for example, VGNN for gerunds.
ADVP Adverb Chunk
Eg: He walks (slowly/ADV)/ADVP
PP Prepositional Chunk
This chunk type is present only for English and not for Indian languages. It consists of only the preposition, not the NP argument.
Eg: (with/IN)PP a pen
IOB prefixes
Each chunk tag carries a prefix: B- marks the first word of a chunk, I- marks a word inside (continuing) a chunk, and O marks a word outside any chunk. For example, for the sentence "He ate an apple to satiate his hunger":
He PRP B-NP
ate VBD B-VP
an DT B-NP
apple NN I-NP
to TO B-VP
satiate VB I-VP
his PRP$ B-NP
hunger NN I-NP
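A small sketch (my own helper, not part of the lab) that groups the IOB-tagged tokens above back into chunks:

```python
# Group (word, chunk-tag) tokens into chunks using their IOB prefixes.
tagged = [("He", "B-NP"), ("ate", "B-VP"), ("an", "B-NP"), ("apple", "I-NP"),
          ("to", "B-VP"), ("satiate", "I-VP"), ("his", "B-NP"), ("hunger", "I-NP")]

def iob_to_chunks(tokens):
    chunks = []
    for word, tag in tokens:
        prefix, _, label = tag.partition("-")
        if prefix == "B" or not chunks or chunks[-1][0] != label:
            chunks.append((label, [word]))        # start a new chunk
        else:
            chunks[-1][1].append(word)            # continue the current chunk
    return [(label, " ".join(words)) for label, words in chunks]

print(iob_to_chunks(tagged))
# [('NP', 'He'), ('VP', 'ate'), ('NP', 'an apple'), ('VP', 'to satiate'), ('NP', 'his hunger')]
```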
Procedure –
STEP1: Select a language
STEP2: Select a sentence
STEP3: Select the corresponding chunk-tag for each word in the sentence and click the Submit
button.
OUTPUT1: The submitted answer will be checked.
Click on Get Answer button for the correct answer.
Output –
AI Experiment 10
Building Chunker
Aim - The objective of the experiment is to understand the importance of selecting proper features and of the size of the training corpus when training a model to do chunking.
Theory –
Hidden Markov Model
In the mid 1980s, researchers in Europe began to use Hidden Markov models (HMMs) to
disambiguate parts of speech. HMMs involve counting cases, and making a table of the
probabilities of certain sequences. For example, once you've seen an article such as 'the',
perhaps the next word is a noun 40% of the time, an adjective 40%, and a number 20%.
Knowing this, a program can decide that "can" in "the can" is far more likely to be a noun than
a verb or a modal. The same method can of course be used to benefit from knowledge about
following words.
More advanced ("higher order") HMMs learn the probabilities not only of pairs, but triples or
even larger sequences. So, for example, if you've just seen an article and a verb, the next item
may be very likely a preposition, article, or noun, but much less likely another verb.
When several ambiguous words occur together, the possibilities multiply. However, it is easy
to enumerate every combination and to assign a relative probability to each one, by
multiplying together the probabilities of each choice in turn.
It is worth remembering, as Eugene Charniak points out in Statistical techniques for natural
language parsing, that merely assigning the most common tag to each known word and the tag
"proper noun" to all unknowns, will approach 90% accuracy because many words are
unambiguous.
HMMs underlie the functioning of stochastic taggers and are used in various algorithms. Accuracies for one such algorithm (TnT) on various training data are shown here.
Conditional random fields (CRFs) are a class of statistical modelling methods often applied in machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to "neighboring" samples, a CRF can take context into account. Because CRFs can consider context, they are well suited to natural language processing, and in particular to Parts of Speech tagging, where the POS is predicted using the surrounding lexical items as context.
In this experiment, both algorithms are used for training and testing. It is observed that accuracy increases as the size of the training corpus increases. The features also play an important role in the quality of the output: using Parts of Speech as a feature performs better than using only the lexicon (the words themselves) as the feature. Therefore, it is important to select proper features when training a model in order to obtain better accuracy.
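As an illustration of this point, the sketch below contrasts a lexicon-only feature set with one that also uses POS tags; the feature names and the token format are my own assumptions, not the lab's configuration.

```python
# Feature extraction sketch for a sequence labeller (e.g. a CRF chunker).
# Each token is a (word, pos) pair; features are plain strings.
sentence = [("He", "PRP"), ("ate", "VBD"), ("an", "DT"), ("apple", "NN")]

def lexicon_features(sent, i):
    """Context built only from the words themselves."""
    word = sent[i][0]
    prev = sent[i - 1][0] if i > 0 else "<s>"
    return [f"w={word}", f"w-1={prev}"]

def lexicon_pos_features(sent, i):
    """The same context plus POS tags, which generalise better to unseen words."""
    word, pos = sent[i]
    prev_pos = sent[i - 1][1] if i > 0 else "<s>"
    return lexicon_features(sent, i) + [f"pos={pos}", f"pos-1={prev_pos}"]

print(lexicon_features(sentence, 3))       # ['w=apple', 'w-1=an']
print(lexicon_pos_features(sentence, 3))   # ['w=apple', 'w-1=an', 'pos=NN', 'pos-1=DT']
```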
Procedure –
STEP1: Select the language.
OUTPUT: Drop down to select size of corpus, algorithm and features will appear.
Output –