AI Assignment 1
Word Analysis
Aim - The objective of the experiment is to learn about morphological features of a word by
analyzing it.
Theory – Analysis of a word into its root and affix(es) is called morphological analysis of the word. Identifying the root of a word is essential for any natural language processing task. A root word can have various forms. For example, the word 'play' in English has the following forms: 'play', 'plays', 'played' and 'playing'. Hindi has a larger number of forms for the equivalent word 'खेल' (khela), and the Telugu root ఆడడం (Adadam) likewise has many forms.
Thus the morphological richness of a language varies from one language to another. Indian languages are generally morphologically rich languages, and therefore morphological analysis of words is a particularly significant task for Indian languages.
Types of Morphology
1. Inflectional morphology
Deals with word forms of a root, where there is no change in lexical category. For example,
'played' is an inflection of the root word 'play'. Here, both 'played' and 'play' are verbs.
2. Derivational morphology
Deals with word forms of a root, where there is a change in the lexical category. For example, the
word form 'happiness' is a derivation of the word 'happy'. Here, 'happiness' is a derived noun form
of the adjective 'happy'.
Morphological Features:
All words will have their lexical category attested during morphological analysis.
A noun or pronoun can take suffixes marking the following features: gender, number, person and case. For example, the morphological analysis of a few words is illustrated below.
A verb can take suffixes marking the following features: tense, aspect, modality, gender, number and person.
'rt' stands for root. 'cat' stands for lexical category; its value can be noun, verb, adjective, pronoun, adverb or preposition. 'gen' stands for gender; its value can be masculine or feminine.
'num' stands for number. The value of number can be singular (sg) or plural (pl).
'per' stands for person. The value of person can be 1, 2 or 3.
The value of tense can be present, past or future. This feature is applicable for verbs.
The value of aspect can be perfect (pft), continuous (cont) or habitual (hab). This feature is also applicable for verbs.
'case' can be direct or oblique. This feature is applicable for nouns. A noun is in the oblique case when a postposition occurs after it; if no postposition can occur after the noun, the case is direct. This applies to Hindi but not to English, since English does not have postpositions.
Some of the postpositions in Hindi are: का (kaa), की (kii), के (ke), को (ko), में (meM).
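As an illustration of the features described above, here is a minimal sketch (in Python) of morphological analyses stored as feature structures; the word list and helper name are hypothetical examples, not the lab's analyser or data.

```python
# A minimal sketch: morphological analyses stored as feature structures.
# The entries below are illustrative examples, not the lab's data.
ANALYSES = {
    "boys":   {"rt": "boy",  "cat": "noun", "num": "pl"},
    "plays":  {"rt": "play", "cat": "verb", "num": "sg", "per": 3, "tense": "present"},
    "played": {"rt": "play", "cat": "verb", "tense": "past"},
}

def analyse(word):
    """Return the stored morphological analysis of a word, if it is known."""
    return ANALYSES.get(word, {"rt": word, "cat": "unknown"})

for w in ["boys", "plays", "played"]:
    print(w, "->", analyse(w))
```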
Procedure –
STEP1: Select the language.
OUTPUT: Drop down for selecting words will appear.
Output –
AI Experiment 2
Word Generation
Aim - The objective of the experiment is to generate word forms from root and suffix
information.
Theory – Given the root and suffix information, a word form can be generated. For example, given the root 'play' and the past-tense suffix '-ed', the word form 'played' is generated.
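The sketch below illustrates this idea in Python; the suffix table and the spelling rule are illustrative assumptions, not the lab's generator.

```python
# A minimal sketch of word generation from a root and a (category, feature) pair.
# The suffix table and the spelling rule are illustrative assumptions.
SUFFIXES = {
    ("verb", "past"): "ed",
    ("verb", "continuous"): "ing",
    ("noun", "plural"): "s",
}

def generate(root, category, feature):
    """Generate a word form as root + suffix selected by (category, feature)."""
    suffix = SUFFIXES.get((category, feature), "")
    # Toy spelling rule: consonant + 'y' becomes 'i' before 'e' ('try' -> 'tried').
    if suffix.startswith("e") and root.endswith("y") and root[-2] not in "aeiou":
        root = root[:-1] + "i"
    return root + suffix

print(generate("play", "verb", "past"))        # played
print(generate("play", "verb", "continuous"))  # playing
print(generate("try", "verb", "past"))         # tried
print(generate("boy", "noun", "plural"))       # boys
```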
Procedure –
STEP3: After selecting all the features, select the word corresponding to the features selected above.
STEP4: Click the Check button to see whether the right word has been selected.
OUTPUT: The output tells whether the selected word is right or wrong.
Output –
AI Experiment 3
Morphology
Aim - Understanding the morphology of a word through the use of an Add-Delete table.
Definition
Morphemes are considered the smallest meaningful units of language. A morpheme can be either a root word ('play') or an affix ('-ed'). Combining morphemes is called a morphological process. So the word "played" is made up of two morphemes, "play" and "-ed". Finding all the parts of a word (its morphemes) and thereby describing the properties of the word is called "Morphological Analysis". For example, "played" carries the information verb "play" plus "past tense", so the given word is the past-tense form of the verb "play".
Analysis of a word:
बच्चों (bachchoM) = बच्चा (bachchaa), the root, + ओं (oM), a suffix marking 3rd person, plural, oblique.
A linguistic paradigm is the complete set of variants of a given lexeme. These variants can be
classified according to shared inflectional categories (e.g. number, case, etc.) and arranged into
tables.
Paradigm table for बच्चा (bachchaa):
          singular            plural
direct    बच्चा (bachchaa)     बच्चे (bachche)
oblique   बच्चे (bachche)      बच्चों (bachchoM)

Add-Delete table for deriving बच्चों (bachchoM):
Delete आ (aa)  ->  output बच्च (bachch)
Add ओं (oM)    ->  output बच्चों (bachchoM)
Paradigm Class
Words in the same paradigm class behave similarly. For example, लड़का (ladakaa) is in the same paradigm class as बच्चा (bachchaa), so लड़का behaves in the same way as बच्चा, since they share the same paradigm class.
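A rough sketch (not the lab's implementation) of how an add-delete entry can be applied to a root in Python; note that the code manipulates the dependent vowel signs (matras), whereas the table above writes the independent vowels.

```python
# A minimal sketch of applying an add-delete rule to a root word.
# The rule "delete आ (aa), add ओं (oM)" is applied here as: delete the
# trailing matra 'ा', then append 'ों'.
def apply_add_delete(root, delete, add):
    """Delete a trailing string from the root (if present), then append the add string."""
    if delete and root.endswith(delete):
        root = root[: -len(delete)]
    return root + add

# बच्चा -> बच्च -> बच्चों
print(apply_add_delete("बच्चा", "ा", "ों"))
# Words of the same paradigm class behave the same way: लड़का -> लड़कों
print(apply_add_delete("लड़का", "ा", "ों"))
```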
Procedure –
STEP1: Select a word root.
STEP2: Fill the add-delete table and submit.
STEP3: If wrong, see the correct answer or repeat STEP1.
Output –
AI Experiment 4
N-Grams
Aim - The objective of this experiment is to learn to calculate bigrams from a given corpus and to calculate the probability of a sentence.
Theory- A combination of words forms a sentence. However, such a formation is meaningful
only when the words are arranged in some order.
Eg: Sit I car in the
Such a sentence is not grammatically acceptable. However, some perfectly grammatical sentences can be nonsensical too!
Eg: Colorless green ideas sleep furiously
One easy way to handle such unacceptable sentences is by assigning probabilities to strings of words, i.e., how likely the sentence is.
Probability of a sentence
The probability of a sentence is the joint probability of its words occurring in their positions: P(w(1), w(2), ..., w(n-1), w(n)). By the chain rule, this equals P(w(1)) P(w(2)|w(1)) P(w(3)|w(1),w(2)) .... P(w(n)|w(1),...,w(n-1)), which requires conditioning on very long word histories.
Bigrams
We can avoid this very long calculation by approximating that the probability of a word depends only on its previous word. This assumption is called the Markov assumption, and such a model is called a Markov model; a bigram model is a first-order Markov model. Bigrams can be generalized to n-grams, which look at the previous (n-1) words.
Therefore,
P(w(1), w(2), ..., w(n-1), w(n)) ≈ P(w(1)) P(w(2)|w(1)) P(w(3)|w(2)) .... P(w(n)|w(n-1))
Eg: Corpus – (eos) You book a flight (eos) I read a book (eos) You read (eos)
Bigram Table (each cell gives P(column word | row word), i.e. the bigram count divided by the count of the row word):

         (eos)   you    book   a      flight  I      read
(eos)    0       0.5    0      0      0       0.25   0
you      0       0      0.5    0      0       0      0.5
book     0.5     0      0      0.5    0       0      0
a        0       0      0.5    0      0.5     0      0
flight   1       0      0      0      0       0      0
I        0       0      0      0      0       0      1
read     0.5     0      0      0.5    0       0      0
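The table above can be reproduced with a short script. The sketch below (variable and helper names are my own, not part of the lab) counts bigrams in the corpus, divides each count by the unigram count of the preceding word, and then multiplies bigram probabilities to score a sentence.

```python
from collections import Counter

# The example corpus from above.
corpus = "(eos) You book a flight (eos) I read a book (eos) You read (eos)"
tokens = corpus.lower().split()

unigrams = Counter(tokens)                   # C(w), unigram counts
bigrams = Counter(zip(tokens, tokens[1:]))   # C(w(n-1) w(n)), bigram counts

def p(w, prev):
    """Maximum-likelihood bigram probability P(w | prev) = C(prev w) / C(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p("you", "(eos)"))   # 2/4 = 0.5
print(p("book", "you"))    # 1/2 = 0.5

# Bigram probability of the sentence "(eos) you read a book (eos)":
sentence = "(eos) you read a book (eos)".split()
prob = 1.0
for prev, w in zip(sentence, sentence[1:]):
    prob *= p(w, prev)
print(prob)                # 0.5 * 0.5 * 0.5 * 0.5 * 0.5 = 0.03125
```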
Procedure –
STEP1: Select a corpus and click on Generate bigram table
STEP2: Fill up the table that is generated and hit Submit
STEP3: If incorrect (red), see the correct answer by clicking on show answer or repeat Step 2.
STEP4: If correct (green), click on take a quiz and fill the correct answer
Output –
AI Experiment 5
N-Grams Smoothing
Aim - The objective of this experiment is to learn how to apply add-one smoothing on a sparse bigram table.
Theory –
The standard N-gram models are trained from some corpus. The finiteness of the training corpus leads to the absence of some perfectly acceptable N-grams. This results in sparse bigram matrices, and such models tend to underestimate the probability of strings that do not occur in their training corpus.
There are techniques for assigning a non-zero probability to these 'zero-probability bigrams'. This task of re-evaluating some of the zero-probability and low-probability N-grams and assigning them non-zero values is called smoothing. Some of the techniques are: Add-One Smoothing, Witten-Bell Discounting and Good-Turing Discounting.
Add-One Smoothing
In add-one smoothing, we add one to all the n-gram counts before normalizing them into probabilities.
Application on unigrams
The unsmoothed maximum likelihood estimate of the unigram probability is computed by dividing the count of the word by the total number of word tokens N:
P(wx) = c(wx) / Σi c(wi) = c(wx) / N
Add-one smoothing gives an adjusted count ci*:
ci* = (ci + 1) * N / (N + V)
where V is the total number of word types in the language (the vocabulary size).
Normalizing the adjusted counts by N gives the smoothed probabilities:
pi* = (ci + 1) / (N + V)
Application on bigrams
Normal bigram probabilities are computed by normalizing each row of counts by the unigram count of the preceding word:
P(wn|wn-1) = C(wn-1 wn) / C(wn-1)
For add-one smoothed bigram probabilities, we augment the denominator by the number of word types in the vocabulary, V:
p*(wn|wn-1) = (C(wn-1 wn) + 1) / (C(wn-1) + V)
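A short sketch of add-one smoothing applied to the bigram counts of the corpus from the previous experiment (helper names are my own, not the lab's code):

```python
from collections import Counter

corpus = "(eos) You book a flight (eos) I read a book (eos) You read (eos)"
tokens = corpus.lower().split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)                 # number of word types; V = 7 for this corpus

def p_smoothed(w, prev):
    """Add-one smoothed bigram probability p*(w | prev)."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

# A bigram unseen in the corpus now gets a small non-zero probability:
print(p_smoothed("flight", "read"))   # (0 + 1) / (2 + 7) = 0.111...
# A seen bigram is discounted slightly:
print(p_smoothed("a", "book"))        # (1 + 1) / (2 + 7) = 0.222...
```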
Procedure –
AI Experiment 6
POS Tagging: Hidden Markov Model
Theory –
For POS tagging, it is assumed that the POS tags are generated by a random process, and that each tag in turn randomly generates a word. Hence the transition matrix denotes the probability of moving from one POS to another, and the emission matrix denotes the probability that a given word has a particular POS. The words act as the observations. The basic assumptions are that the probability of a tag depends only on the previous tag (a first-order Markov assumption) and that the probability of a word depends only on its own tag.
As an example, the emission and transition probabilities are computed from the following tagged corpus:
EOS/eos They/pronoun cut/verb the/determiner paper/noun EOS/eos
He/pronoun asked/verb for/preposition his/pronoun cut/noun EOS/eos
Put/verb the/determiner paper/noun in/preposition the/determiner cut/noun EOS/eos
Emission probabilities: count the number of times a specific word occurs with a specific POS tag in the corpus. Here, say, for "cut":
count(cut,verb) = 1
count(cut,noun) = 2
count(cut,determiner) = 0
P(cut/verb) = count(cut,verb) / count(cut) = 1/3 = 0.33
Similarly, the probability to be filled in the cell at the intersection of cut and determiner is
P(cut/determiner) = count(cut,determiner) / count(cut) = 0/3 = 0
Repeat the same for every word-tag combination and fill in the emission matrix.
Transition probabilities: count the number of times a specific tag comes after the other POS tags in the corpus. Here, say, for "determiner":
count(verb,determiner) = 2
count(preposition,determiner) = 1
count(determiner,determiner) = 0
count(eos,determiner) = 0
count(noun,determiner) = 0
P(determiner/verb) = count(verb,determiner) / count(determiner) = 2/3 = 0.66
Similarly, the probability to be filled in the cell at the intersection of determiner (in the column) and noun (in the row) is
P(determiner/noun) = count(noun,determiner) / count(determiner) = 0/3 = 0
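The counts above can be computed mechanically from the tagged corpus. The sketch below (variable names are my own, not the lab's code) follows the experiment's conventions exactly: an emission cell divides count(word, tag) by count(word), and a transition cell divides count(previous tag, tag) by count(tag).

```python
from collections import Counter

# The tagged corpus from above, written as word/tag pairs.
corpus = (
    "EOS/eos They/pronoun cut/verb the/determiner paper/noun EOS/eos "
    "He/pronoun asked/verb for/preposition his/pronoun cut/noun EOS/eos "
    "Put/verb the/determiner paper/noun in/preposition the/determiner cut/noun EOS/eos"
)

pairs = [tok.rsplit("/", 1) for tok in corpus.split()]
words = [w.lower() for w, t in pairs]
tags = [t for w, t in pairs]

word_counts = Counter(words)
tag_counts = Counter(tags)
word_tag = Counter(zip(words, tags))          # count(word, tag)
tag_bigrams = Counter(zip(tags, tags[1:]))    # count(previous tag, tag)

# Emission cell, following the experiment's convention:
# P(cut/verb) = count(cut, verb) / count(cut)
print(word_tag[("cut", "verb")] / word_counts["cut"])                  # 1/3 = 0.33
# Transition cell, following the experiment's convention:
# P(determiner/verb) = count(verb, determiner) / count(determiner)
print(tag_bigrams[("verb", "determiner")] / tag_counts["determiner"])  # 2/3 = 0.66
```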
Procedure –
Here "s" denotes words and "t" denotes tags. "a" is transmission matrix and "b" is emission
matrix.
Using above algorithm, we have to fill the viterbi table column by column.
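A minimal Viterbi sketch under this notation; the tag set and all probability values in the example are made up for illustration and are not the lab's matrices.

```python
# A minimal Viterbi sketch: a[t1][t2] are transition probabilities,
# b[t][word] are emission probabilities; all values below are toy numbers.
def viterbi(words, tags, a, b, start="eos"):
    """Fill the Viterbi table column by column and return the best tag sequence."""
    V = [{t: a[start].get(t, 0.0) * b[t].get(words[0], 0.0) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for t in tags:
            best_prev, best_p = max(
                ((tp, V[i - 1][tp] * a[tp].get(t, 0.0) * b[t].get(words[i], 0.0))
                 for tp in tags),
                key=lambda x: x[1],
            )
            V[i][t], back[i][t] = best_p, best_prev
    # Trace back the highest-probability path.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

tags = ["noun", "verb"]
a = {"eos":  {"noun": 0.6, "verb": 0.4},
     "noun": {"noun": 0.3, "verb": 0.7},
     "verb": {"noun": 0.8, "verb": 0.2}}
b = {"noun": {"they": 0.1, "cut": 0.4, "paper": 0.5},
     "verb": {"they": 0.0, "cut": 0.9, "paper": 0.1}}
print(viterbi(["they", "cut", "paper"], tags, a, b))   # ['noun', 'verb', 'noun']
```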
Procedure –
Output –
AI Experiment 8
Building POS Tagger
Aim - The objective of the experiment is to know the importance of context and size of
training corpus in learning Parts of Speech.
Theory – In the mid 1980s, researchers in Europe began to use Hidden Markov models (HMMs)
to disambiguate parts of speech. HMMs involve counting cases, and making a table of the
probabilities of certain sequences. For example, once you've seen an article such as 'the',
perhaps the next word is a noun 40% of the time, an adjective 40%, and a number 20%.
Knowing this, a program can decide that "can" in "the can" is far more likely to be a noun than
a verb or a modal. The same method can of course be used to benefit from knowledge about
following words.
More advanced ("higher order") HMMs learn the probabilities not only of pairs, but triples or
even larger sequences. So, for example, if you've just seen an article and a verb, the next item
may be very likely a preposition, article, or noun, but much less likely another verb.
When several ambiguous words occur together, the possibilities multiply. However, it is easy
to enumerate every combination and to assign a relative probability to each one, by
multiplying together the probabilities of each choice in turn.
It is worth remembering, as Eugene Charniak points out in Statistical techniques for natural
language parsing, that merely assigning the most common tag to each known word and the tag
"proper noun" to all unknowns, will approach 90% accuracy because many words are
unambiguous.
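This baseline is easy to express in code. The sketch below (with a small made-up training list, not the lab's data) tags every known word with its most frequent tag and every unknown word as a proper noun.

```python
from collections import Counter, defaultdict

# Toy tagged training data: (word, tag) pairs. Real corpora are much larger.
train = [("the", "DT"), ("can", "NN"), ("can", "MD"), ("can", "NN"),
         ("cut", "VB"), ("cut", "NN"), ("paper", "NN")]

counts = defaultdict(Counter)
for word, tag in train:
    counts[word][tag] += 1

def baseline_tag(word):
    """Most-frequent-tag baseline; unknown words are tagged as proper nouns."""
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return "NNP"

print([(w, baseline_tag(w)) for w in ["the", "can", "Charniak"]])
# [('the', 'DT'), ('can', 'NN'), ('Charniak', 'NNP')]
```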
HMMs underlie the functioning of stochastic taggers and are used in various algorithms. Accuracies for one such algorithm (TnT) on various training data are shown here.
Conditional random fields (CRFs) are a class of statistical modelling methods often applied in machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to "neighboring" samples, a CRF can take context into account. Because CRFs can consider context, they are well suited to natural language processing, and in particular to Parts of Speech tagging, where the POS is predicted using the surrounding lexical items as context.
If only one neighbouring word is considered as context, the feature is called a bigram; similarly, two neighbours as context is called a trigram. In this experiment, the size of the training corpus and the context were varied to study their importance.
Procedure –
STEP1: Select the language.
OUTPUT: Drop down to select size of corpus, algorithm and features will appear.
STEP2: Select corpus size.
STEP3: Select algorithm "CRF" or "HMM".
STEP4: Select feature "bigram" or "trigram".
OUTPUT: The corresponding accuracy will be shown.
Output –
AI Experiment 9
Chunking
Aim - The objective of this experiment is to understand the concept of chunking and get
familiar with the basic chunk tagset.
Theory –
Chunking of text involves dividing the text into groups of syntactically correlated words (chunks).
Chunk Types
The chunk types are based on the syntactic category of the chunk head. Besides the head, a chunk also contains modifiers (like determiners, adjectives and postpositions in NPs). The main chunk types are noun chunks (NP) and verb chunks.
For English, the types of verb chunks and their tags include, for example, VGNN for gerunds.
ADVP Adverb Chunk
Eg: He walks (slowly/ADV)/ADVP
PP Prepositional Chunk
This chunk type is present only for English and not for Indian languages. It consists of only the preposition, not the NP argument.
Eg: (with/IN)PP a pen
IOB prefixes
Each chunk tag carries a prefix: B- marks the first word of a chunk, I- marks a word inside (continuing) a chunk, and O marks a word outside any chunk. For example, for the sentence "He ate an apple to satiate his hunger":
He PRP B-NP
ate VBD B-VP
an DT B-NP
apple NN I-NP
to TO B-VP
satiate VB I-VP
his PRP$ B-NP
hunger NN I-NP
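A small sketch (my own helper, not part of the lab) that groups the IOB-tagged tokens above back into chunks:

```python
# Group (word, chunk-tag) tokens into chunks using their IOB prefixes.
tagged = [("He", "B-NP"), ("ate", "B-VP"), ("an", "B-NP"), ("apple", "I-NP"),
          ("to", "B-VP"), ("satiate", "I-VP"), ("his", "B-NP"), ("hunger", "I-NP")]

def iob_to_chunks(tokens):
    chunks = []
    for word, tag in tokens:
        prefix, _, label = tag.partition("-")
        if prefix == "B" or not chunks or chunks[-1][0] != label:
            chunks.append((label, [word]))        # start a new chunk
        else:
            chunks[-1][1].append(word)            # continue the current chunk
    return [(label, " ".join(words)) for label, words in chunks]

print(iob_to_chunks(tagged))
# [('NP', 'He'), ('VP', 'ate'), ('NP', 'an apple'), ('VP', 'to satiate'), ('NP', 'his hunger')]
```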
Procedure –
STEP1: Select a language
STEP2: Select a sentence
STEP3: Select the corresponding chunk-tag for each word in the sentence and click the Submit
button.
OUTPUT1: The submitted answer will be checked.
Click on Get Answer button for the correct answer.
Output –
AI Experiment 10
Building Chunker
Aim - The objective of the experiment is to understand the importance of selecting proper features and of the size of the training corpus when training a model to do chunking.
Theory –
Hidden Markov Model
In the mid 1980s, researchers in Europe began to use Hidden Markov models (HMMs) to
disambiguate parts of speech. HMMs involve counting cases, and making a table of the
probabilities of certain sequences. For example, once you've seen an article such as 'the',
perhaps the next word is a noun 40% of the time, an adjective 40%, and a number 20%.
Knowing this, a program can decide that "can" in "the can" is far more likely to be a noun than
a verb or a modal. The same method can of course be used to benefit from knowledge about
following words.
More advanced ("higher order") HMMs learn the probabilities not only of pairs, but triples or
even larger sequences. So, for example, if you've just seen an article and a verb, the next item
may be very likely a preposition, article, or noun, but much less likely another verb.
When several ambiguous words occur together, the possibilities multiply. However, it is easy
to enumerate every combination and to assign a relative probability to each one, by
multiplying together the probabilities of each choice in turn.
It is worth remembering, as Eugene Charniak points out in Statistical techniques for natural
language parsing, that merely assigning the most common tag to each known word and the tag
"proper noun" to all unknowns, will approach 90% accuracy because many words are
unambiguous.
HMMs underlie the functioning of stochastic taggers and are used in various algorithms. Accuracies for one such algorithm (TnT) on various training data are shown here.
Conditional random fields (CRFs) are a class of statistical modelling methods often applied in machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to "neighboring" samples, a CRF can take context into account. Because CRFs can consider context, they are well suited to natural language processing, and in particular to Parts of Speech tagging, where the POS is predicted using the surrounding lexical items as context.
In this experiment, both algorithms are used for training and testing. It is observed that accuracy increases as the size of the training corpus increases. The features also play an important role in the quality of the output: using Parts of Speech as a feature performs better than using only the lexicon (the words themselves) as the feature. Therefore, it is important to select proper features when training a model in order to obtain better accuracy.
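As an illustration of this point, the sketch below contrasts a lexicon-only feature set with one that also uses POS tags; the feature names and the token format are my own assumptions, not the lab's configuration.

```python
# Feature extraction sketch for a sequence labeller (e.g. a CRF chunker).
# Each token is a (word, pos) pair; features are plain strings.
sentence = [("He", "PRP"), ("ate", "VBD"), ("an", "DT"), ("apple", "NN")]

def lexicon_features(sent, i):
    """Context built only from the words themselves."""
    word = sent[i][0]
    prev = sent[i - 1][0] if i > 0 else "<s>"
    return [f"w={word}", f"w-1={prev}"]

def lexicon_pos_features(sent, i):
    """The same context plus POS tags, which generalise better to unseen words."""
    word, pos = sent[i]
    prev_pos = sent[i - 1][1] if i > 0 else "<s>"
    return lexicon_features(sent, i) + [f"pos={pos}", f"pos-1={prev_pos}"]

print(lexicon_features(sentence, 3))       # ['w=apple', 'w-1=an']
print(lexicon_pos_features(sentence, 3))   # ['w=apple', 'w-1=an', 'pos=NN', 'pos-1=DT']
```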
Procedure –
STEP1: Select the language.
OUTPUT: Drop down to select size of corpus, algorithm and features will appear.
Output –