21AD3202 - Natural Language Processing - Record

LAB WORKBOOK
21AD3202 NATURAL LANGUAGE PROCESSING
Team NLP
K L UNIVERSITY | 21AD3202-NATURAL LANGUAGE PROCESSING
LABORATORY WORKBOOK
STUDENT
NAME
REG. NO
YEAR
SEMESTER
SECTION
FACULTY
University Vision and Mission

Vision
To be a globally renowned university.
Mission
To impart quality higher education and to undertake research and extension with
emphasis on application and innovation that cater to the emerging societal needs
through all-round development of the students of all sections enabling them to be
globally competitive and socially responsible citizens with intrinsic values.
Department Vision and Mission

Vision
To be a department of international repute through continuous research, innovation
and industry-led curriculum.
Mission
To impart quality education with social consciousness and make students globally
competent.
Mission Statements
M1: Provide quality education in both the theoretical and applied foundations of
computer science & computer engineering.
M2: Train students effectively to apply their computational skills in solving
industrial, societal and real-world problems.
M3: Provide students a competitive advantage, emulous environment in the
ever-changing and challenging global workforce.
M4: Facilitate multi-disciplinary innovation to advance theoretical computer
science through experimental research.
Program Educational Objectives (PEOs)
S. No | PEO# | Statement
1 | PEO1 | Graduates will be able to practice engineering in a broad range of industrial, societal and real-world applications.
2 | PEO2 | Graduates will be able to pursue advanced education, research and development, by adapting creative and innovative practices in their professional careers.
3 | PEO3 | Graduates will be able to conduct themselves in a responsible, professional, and ethical manner.
4 | PEO4 | Graduates will be able to participate as leaders in their fields of expertise and in activities that support service and economic development throughout the world.
Table of Contents
0. Organization of the student lab workbook
1. Installation of Python, NLTK, Jupyter and preliminaries
2. Implementation of Word Tokenizer
3. Implementation of Sentence Tokenizer
4. Implementation of Paragraph Tokenizer
5. Implement various actions in Corpora
6. Implement probabilistic parsing
7. Learning Grammar
8. Conditional frequency distribution
9. Lexical Analyzer
10. WordNet
11. Context Free Grammar
12. Named Entity Recognition
0: Organization of the student lab workbook
The laboratory framework includes a creative element but shifts the time-intensive
aspects outside of the two-hour closed laboratory period. Within this structure, each
laboratory includes three parts: Pre-Lab, In-Lab, and Post-Lab.
a. Pre-Lab
The Pre-Lab exercise is a homework assignment that links the lecture with the laboratory
period and typically takes 2 hours to complete. The goal is for students to synthesize the
information they learn in lecture with material from their textbook to produce a working
piece of software. Students attending a two-hour closed laboratory are expected to make
a good-faith effort to complete the Pre-Lab exercise before coming to the lab. Their work
need not be perfect, but their effort must be real (roughly 80 percent correct).
b. In-Lab
The In-Lab section takes place during the actual laboratory period. The first hour of the
laboratory period can be used to resolve any problems the students might have
experienced in completing the Pre-Lab exercises. The intent is to give constructive
feedback so that students leave the lab with working Pre-Lab software, a significant
accomplishment on their part. During the second hour, students complete the In-Lab
exercise to reinforce the concepts learned in the Pre-Lab. Students leave the lab having
received feedback on their Pre-Lab and In-Lab work.
c. Post-Lab
The last phase of each laboratory is a homework assignment that is done following the
laboratory period. In the Post-Lab, students analyse the efficiency or utility of a given
system call. Each Post-Lab exercise should take roughly 120 minutes to complete.
2023-24 EVEN SEMESTER LAB CONTINUOUS EVALUATION
Sl No | Date | Experiment Name | Pre-Lab (5M) | In-Lab: Logic (10M) | In-Lab: Execution (10M) | In-Lab: Result (10M) | In-Lab: Analysis (5M) | Post Lab (5M) | Viva Voce (5M) | Total (50M) | Faculty Signature
1 | | Installation of Python, NLTK, Jupyter and preliminaries | | | | | | | | |
2 | | Implementation of Word Tokenizer | | | | | | | | |
3 | | Implementation of Sentence Tokenizer | | | | | | | | |
4 | | Implementation of Paragraph Tokenizer | | | | | | | | |
5 | | Implement various actions in Corpora | | | | | | | | |
6 | | Implement probabilistic parsing | | | | | | | | |
7 | | Learning Grammar | | | | | | | | |
8 | | Conditional frequency distribution | | | | | | | | |
9 | | Lexical Analyzer | | | | | | | | |
10 | | WordNet | | | | | | | | |
11 | | Context Free Grammar | | | | | | | | |
12 | | Named Entity Recognition | | | | | | | | |
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SUBJECT CODE: 21AD3202
NATURAL LANGUAGE PROCESSING WORKBOOK
1. Installation of Python, NLTK, Jupyter and preliminaries.

Date of the Session: / / Time of the Session: to
Prerequisite:
• Basic Knowledge on Natural Language Processing.
• Importing NLTK package.
• Methods of NLTK.
Pre-Lab Task:
1) What is Natural Language Processing?

Ans:-
2) What is NLTK?
Ans:-
3) Mention some methods available in NLTK package and explain them.

Ans:-
In Lab Task:
1) Basic Steps for Installation of NLTK in python.
PROCEDURE:
Step 1: Download the latest version of Python for Windows.
Step 2: Click on the downloaded .exe to run it.
Step 3: Select “Customize installation”.
Step 4: Check all the features, especially “pip”, as it is needed to install NLTK, and
click Next.
Step 5: Select the advanced options, choose the installation path and click Install.
Step 6: Once the installation is successful, close the window.
Step 7: To have a better understanding of the commands, install Jupyter Notebook:
pip install notebook
Step 8: Copy the path of the Scripts folder so that NLTK is installed in the same folder.
Step 9: To install NLTK, open the command prompt and type the command:
pip install nltk
Step 10: To run Jupyter, change to the appropriate folder in the command prompt and give
the following command:
jupyter notebook
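Once the steps above are complete, a quick check from Python can confirm that the packages are visible to the interpreter. The following is a minimal sketch that uses only the standard library; the helper name `check_installed` is hypothetical and is not part of NLTK or Jupyter.

```python
import importlib.util
import sys


def check_installed(packages):
    """Map each package name to True if the interpreter can find it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    # "nltk" and "notebook" are the two packages installed with pip in the steps above.
    for name, found in check_installed(["nltk", "notebook"]).items():
        print(f"{name}: {'installed' if found else 'NOT installed'}")
```

If either package is reported as not installed, the usual cause is that `pip` belongs to a different Python installation than the one running the script.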
Writing space for the Problem: (For Student’s use only)
Post Lab Task:-

1) Steps for downloading datasets from NLTK.
After successfully installing NLTK, you can import it and download its
corpora with the following commands.
Command:
import nltk
nltk.download()
The NLTK downloader opens a window for downloading the datasets. The datasets are
large, so the download will take time. To test whether the datasets are installed
properly, try importing a dataset and using it.
Command:
from nltk.corpus import brown
brown.words()
2) Write the processing of NLTK.

Ans:-
3) Mention the uses of NLTK.

Ans:-
4) Although there are many other tools for natural language processing,
why is NLTK mostly preferred?
Ans:-
(For Evaluator’s use only)
Comment of the Evaluator (if Any) Evaluator’s Observation

Marks Secured: out of
Full Name of the Evaluator:
Signature of the Evaluator Date of Evaluation:

2. Implementation of Word Tokenizer

Prerequisite:
• NLTK and jupyter notebook.
• Basic knowledge on methods available in NLTK
Pre-Lab Task:
1) Explain Natural Language processing in your own words?
Ans:-
2) List out the techniques of NLP?

Ans:-
3) List out the real life examples of NLP?

Ans:-
4) Define tokenization?
Ans:-
5) Define stemming ?
Ans:-
6) Explain about stop words?
Ans:-
7) Which tokenizer is used when the text is in a language other than English?
Ans:-
In Lab Task:
1) Chinnu wants to help his sister Pravallika pass her aptitude exam, so he wants to
test her skills in English. He decides to give her some sentences and asks for the key
root words in each sentence. Implement a Python program that helps Pravallika get the
root words of the given sentences. (Note: use stemming and tokenization.)
Sample Input:
Maximum students are Suffering from Insomnia.
Sample Output:
Maximum;max
Students;Student
Are;are
Suffering;suffer
From;from
Insomnia;somni
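One possible starting point for this task is sketched below. It assumes NLTK's `wordpunct_tokenize` and `PorterStemmer`, neither of which needs downloaded data; the function name `root_words` is hypothetical. A rule-based stemmer will not necessarily reproduce the sample output word for word.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize


def root_words(sentence):
    """Tokenize a sentence and pair each alphabetic token with its Porter stem."""
    stemmer = PorterStemmer()
    tokens = wordpunct_tokenize(sentence)  # regex-based split into words and punctuation
    return [(token, stemmer.stem(token)) for token in tokens if token.isalpha()]


for word, stem in root_words("Maximum students are suffering from insomnia."):
    print(f"{word};{stem}")
```

Other stemmers (for example NLTK's `LancasterStemmer` or `SnowballStemmer`) can be swapped in to compare how aggressively each one reduces a word.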
Writing space for the Problem: (For Student’s use only)
2) Chinnu is a comic editor at company X. One day, while editing a script, he became
curious about its story, but he had no time to read the whole script. He decided to
follow the story line by learning about the characters, so he wants to separate the
words in each sentence to find the characters. Implement a Python program that splits
a given sentence into words using a tokenizer function and displays both the split
words and the word count.
Sample input:
Chinnu is the one who protected lilly.
Sample output:
['chinnu', 'is', 'the', 'one', 'who', 'protected', 'lilly']
Count: 7
Writing space for the Problem: (For Student’s use only)
Post Lab Task:
1) Albert is an English teacher in a school. He wants to teach the students about
connectors and prepositions and their usage in sentence formation. To help the students
understand them better, he plans to give the students different sentences and have the
connectors and prepositions removed from each sentence. Implement a Python program that
removes those connectors and prepositions. (Note: connectors and prepositions are the
stop words here; keep them in a text file to produce the required output.)
Sample input:
There is a change in a medium.
Sample output:
Before
There is a change in a medium.
After
['there', 'change', 'medium', '.']
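The note about keeping the stop words in a text file can be sketched as follows. This is only an illustration: the file name `stopwords.txt`, its three-word contents and the helper names are assumptions, and the file is written to a temporary folder so that the example is self-contained.

```python
import os
import tempfile

from nltk.tokenize import wordpunct_tokenize


def load_stop_words(path):
    """Read one stop word per line from a text file, lower-cased."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def remove_stop_words(sentence, stop_words):
    """Tokenize the sentence and drop tokens found in stop_words (case-insensitive)."""
    return [tok for tok in wordpunct_tokenize(sentence) if tok.lower() not in stop_words]


# Hypothetical stop-word file holding only the words needed for the sample sentence.
with tempfile.TemporaryDirectory() as folder:
    stop_file = os.path.join(folder, "stopwords.txt")
    with open(stop_file, "w", encoding="utf-8") as handle:
        handle.write("is\na\nin\n")
    stop_words = load_stop_words(stop_file)

sentence = "There is a change in a medium."
print("Before:", sentence)
print("After:", remove_stop_words(sentence, stop_words))
```

In practice the text file would hold a fuller list of connectors and prepositions; the filtering logic stays the same.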
3. Implementation of Sentence Tokenizer

Prerequisite:
• Linguistic rules and how to implement them.
• Sentence tokenize.
• Chunking.
Pre-Lab Task:
1) What is sentence tokenization?
Ans:-
Ans:-
Ans:-
4) What is shallow parsing and what is the minimum number of levels?

Ans:-
5) Define the punkt and averaged_perceptron_tagger modules in NLTK.
Ans:-
In Lab Task:
1) Dany is struggling with an assignment given by her tuition master. She has to
read a paragraph and generate tokens from it using a sentence tokenizer. She should
also find the part of speech of each word in the individual tokens that she has
generated, using Python.
Sample Input:
Notice of a bid advertisement shall be published in at least one local newspaper
and in one trade publication at least 30 days in advance of sale. If applicable, the
notice must identify the reservation within which the tracts to be leased are found.
Specific descriptions of the tracts shall be available at the office of the
superintendent. The complete text of the advertisement shall be mailed to each person
listed on the appropriate agency mailing list.
Sample Output:
Notice – NNP, bid – NN, advertisement – NN, and so on…
“Government of India is taking all necessary steps to ensure that we are prepared
well to face the challenge and threat posed by the growing pandemic of COVID-19
the Corona Virus.”
Writing space for the Problem:(For Student’s use only)
2. Sheena is working on her literature assignment and needs to draw a parse tree for
the given sentence that satisfies the grammar rule she is given. Use chunk parsing in
Python to build and draw the tree.
Sample Input:
Sentence = The little mouse ate the fresh cheese
Rule = r"NP: {<DT>?<JJ>*<NN>}"
Sample Output:
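As a hedged illustration of how such a chunk rule can be applied, the sketch below uses `nltk.RegexpParser`. The POS tags are written by hand (an assumption) so that no tagger model has to be downloaded; normally they would come from `nltk.pos_tag`.

```python
import nltk

# Hand-tagged tokens (assumed tags) standing in for the output of a POS tagger.
tagged = [
    ("The", "DT"), ("little", "JJ"), ("mouse", "NN"),
    ("ate", "VBD"),
    ("the", "DT"), ("fresh", "JJ"), ("cheese", "NN"),
]

# NP chunk: optional determiner, any number of adjectives, then a noun.
rule = r"NP: {<DT>?<JJ>*<NN>}"
chunker = nltk.RegexpParser(rule)
tree = chunker.parse(tagged)

# Collect the words of every NP subtree found by the chunker.
noun_phrases = [
    " ".join(word for word, _tag in subtree.leaves())
    for subtree in tree.subtrees()
    if subtree.label() == "NP"
]

print(tree)
print(noun_phrases)
```

Calling `tree.draw()` on the result opens a window with the drawn tree, which requires a working Tkinter display.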
Post Lab Task
1) Mary wants to prepare a list of events being held in her country, for which she
needs the name, location and time of each event, the name of the organization, and so
on. She has all the details of the events; all she needs is for the paragraph to be
modified so that the required details are highlighted in each sentence. Help her
extract these details using Python.
Sample Input:
Larry Page and Sergey Brin, two students at Stanford University, USA, started
Backrub in early 1996. They made it into a company, Google Inc., on September
7, 1998 at a friend's garage in Menlo Park, California. In February 1999, the
company moved to 165 University Ave., Palo Alto, California, and then moved
to another place called the Googleplex. In September 2001, Google's rating
system (PageRank, for saying which information is more helpful) got a U.S.
Patent. The patent was to Stanford University, with Lawrence (Larry) Page as
the inventor (the person who first had the idea). Google makes a percentage of
its money through America Online and InterActiveCorp. It has a special group
known as the Partner Solutions Organization (PSO) which helps make contracts,
helps making accounts better, and gives engineering help.
4. Implementation of Paragraph Tokenizer

Prerequisite:
• NLTK and jupyter notebook.
Pre-Lab Task:
1) What is latent Dirichlet allocation?
Ans:-
2) What is NLTK and how is it useful for NLP text analysis?
Ans:-
3) Mention the classes and methods in nltk.tokenize?
Ans:-
4) Write down the syntax for:
a) Importing nltk library.

Ans:-
b) Download the packages in nltk
Ans:-
c) Importing sentence tokenizer
Ans:-
d) Importing word tokenizer
Ans:-
In Lab Task:
1) Chinky is very naughty in class and is known for troubling her teacher.
One fine day her teacher decided to punish her: she gave her a set of
paragraphs and told her to count the number of paragraphs, sentences and
words in the given book. Write a Python program using the NLTK tokenizers
to help her complete the task.
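A minimal sketch of the counting step is shown below. It assumes NLTK's `BlanklineTokenizer` for paragraphs and uses a naive regular expression as a stand-in for sentence splitting, so that no model download is needed; with the punkt data installed, `nltk.sent_tokenize` would be the more usual choice. The function name `count_units` and the sample text are hypothetical.

```python
from nltk.tokenize import BlanklineTokenizer, RegexpTokenizer


def count_units(text):
    """Return (paragraphs, sentences, words) for a text whose paragraphs are blank-line separated."""
    paragraphs = BlanklineTokenizer().tokenize(text)
    # Naive sentence pattern (an assumption): text up to and including ., ! or ?
    sentence_splitter = RegexpTokenizer(r"[^.!?]+[.!?]+")
    word_splitter = RegexpTokenizer(r"\w+")
    sentences = [s for p in paragraphs for s in sentence_splitter.tokenize(p)]
    words = word_splitter.tokenize(text)
    return len(paragraphs), len(sentences), len(words)


sample = (
    "NLTK is a toolkit. It is written in Python.\n\n"
    "Tokenizers split text. They can split words!"
)
print(count_units(sample))
```

The regex splitter will miscount around abbreviations such as "U.S.", which is one reason a trained sentence tokenizer is normally preferred.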
2) Write Python code that uses Latent Dirichlet Allocation (LDA) for topic modelling
(i.e., apply LDA to a set of documents and split them into topics).
Post Lab Task:
1) Using Latent Dirichlet Allocation, answer the following:
a) From the dataset mentioned above, identify the three most
probable words for the first topic.
b) Analyze the dataset thoroughly and find the sum of the
probabilities assigned to the top 50 words in the 3rd topic.

Marks Secured: out of _____
Signature of the Evaluator

Date of Evaluation:

5. Implement various actions in Corpora

Prerequisite:
• Understanding corpora and its purpose.
• Understanding types of corpora and data contained in them.
Pre-Lab Task:
1) Define corpus. What are the different types of corpora? Explain them.
Ans:-
2) Austin is fond of poems. He wants to perform a few mathematical
operations on his favourite ‘blake-poems’.
a) Import the Gutenberg corpus and print all the file identifiers in this corpus.
Ans:-
b) Find the number of words in this poem.
Ans:-
c) Find the number of times the word ‘blake’ occurs in the poem.
Ans:-
In Lab Task:
1) Henry wants to train a text-to-speech engine with large amounts of English
words based on genres. He decides to use the Brown corpus, and to do this he needs
to understand it. Help him by importing the Brown corpus and
performing the following actions.
a. Print all the genres present in it.
b. Select the ‘editorial’ genre and print the distinct words present in
it in sorted order.
c. Select the range [3493:3499] of the above sorted list and find the count of
those words in the ‘fiction’, ‘lore’, ‘belles_lettres’, ‘government’ and ‘learned’
genres.
2) John wants to build a machine learning model to identify a speech given by
the president of the USA. To train the model, he wants to use the inaugural
corpus. To help him understand the inaugural corpus, implement the following
actions.
a) Print all the file identifiers.
b) Plot a graph where the y-axis shows the count of the words ‘president’ and
‘congress’ in each file and the x-axis shows the year of the presidential address.
Post Lab Task
1) Import the Reuters corpus and perform the following actions.
a) Print all the categories present in the corpus.
b) Print the file identifiers that have the category ‘coffee’.
Writing space for the Problem: (For Student’s use only)
2) Create your own corpus from the NLP lab manual saved as a text file. Print the
number of sentences in the text file.
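One way to build a corpus from a folder of text files is NLTK's `PlaintextCorpusReader`. The sketch below is an illustration under stated assumptions: the file `manual.txt` and its contents are placeholders written to a temporary folder, and a naive regex sentence splitter is passed in so that the example does not depend on the punkt model being installed.

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader
from nltk.tokenize import RegexpTokenizer


def corpus_stats(root, pattern=r".*\.txt"):
    """Return (file ids, sentence count, word count) for the text files under root."""
    reader = PlaintextCorpusReader(
        root,
        pattern,
        # Naive stand-in for a trained sentence tokenizer (an assumption).
        sent_tokenizer=RegexpTokenizer(r"[^.!?]+[.!?]+"),
    )
    return reader.fileids(), len(reader.sents()), len(reader.words())


# Placeholder file standing in for the lab manual text.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "manual.txt"), "w", encoding="utf-8") as handle:
        handle.write("This is a small corpus. It has two sentences.\n")
    fileids, n_sents, n_words = corpus_stats(folder)

print(fileids, n_sents, n_words)
```

For the actual task, `root` would point at the folder that contains the lab manual saved as a `.txt` file.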
Writing space for the Problem: (For Student’s use only)
6. Implement probabilistic parsing

Prerequisite:
• Understanding Probabilistic Parsing and its purpose.
• Understanding top-down and bottom-up parsing.
Pre-Lab Task:
1) Define ambiguity in a parse tree and draw all possible parse trees
for the given expression P=Q+R*P.
Ans:-
2) Distinguish between top-down and bottom-up parsing with an example.

Ans:-
3) Apply the rules of parsing to the following sentence: “A boy can do
nothing for girl but be anything for her.”
Ans:-
In Lab Task:
1) Implement the Cocke-Younger-Kasami (CYK) algorithm in Python for generating parse
trees for a given input sentence.
Sample Input:
S -> NP VP
PP -> P NP
NP -> Det N
NP -> Det N PP
NP -> 'I'
VP -> V NP
VP -> VP PP
Det -> 'an'
Det -> 'my'
N -> 'elephant'
N -> 'pajamas'
V -> 'shot'
P -> 'in'
Sample Output:
[S [NP 'I'][VP [V 'shot'][NP [NP0 [Det 'an'][N 'elephant']][PP [P 'in'][NP [Det 'my'][N 'pajamas']]]]]]
Post Lab Task:
1) Write the variables for all length-6 substrings when deciding membership in a CFL
using the CKY (Cocke-Younger-Kasami) algorithm. Consider the grammar
S -> ε (epsilon symbol) | AB | XB
T -> AB | XB
X -> AT
A -> a
B -> b
Is w = aaabbb in L(G)?
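The table-filling idea behind CKY can be sketched in plain Python for the grammar above. This is a minimal recognizer under stated assumptions: the rule dictionaries and function names are illustrative, and the S -> ε rule is handled as a special case for the empty string because the rest of the grammar is already in Chomsky normal form.

```python
from itertools import product

# Binary rules of the grammar above, indexed by right-hand side.
BINARY_RULES = {
    ("A", "B"): {"S", "T"},
    ("X", "B"): {"S", "T"},
    ("A", "T"): {"X"},
}
# Terminal rules: A -> a, B -> b.
TERMINAL_RULES = {"a": {"A"}, "b": {"B"}}


def cky_table(word):
    """Fill the CKY table; table[i][j] holds the variables deriving word[i:j+1]."""
    n = len(word)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][i] = set(TERMINAL_RULES.get(symbol, ()))
    for length in range(2, n + 1):          # substring length
        for i in range(n - length + 1):     # start index
            j = i + length - 1              # end index (inclusive)
            for k in range(i, j):           # split between k and k + 1
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return table


def in_language(word, start="S"):
    """Decide membership of word in L(G) for the grammar encoded above."""
    if word == "":
        return True  # covered by S -> epsilon
    return start in cky_table(word)[0][len(word) - 1]


table = cky_table("aaabbb")
print(sorted(table[0][5]))       # variables for the only length-6 substring
print(in_language("aaabbb"))
```

The same table also answers the question for every shorter substring, since `table[i][j]` is filled for all spans before the full string is reached.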

Date of Evaluation:

7. Learning Grammar
Prerequisite:
Concepts Based on:
➢ Parsing
➢ PCFG
➢ CFG
➢ Viterbi Parser
Pre-Lab Task:
1) What are the types of parsing and their disadvantages?

Ans:-
2) What is Viterbi PCFG parsing?
Ans:-
3) Explain about top-down and bottom-up parsing and give examples.
Ans:-
In Lab Task:
1. Vicky wants to know the difference between the two toy Probabilistic
Context-Free Grammars (toy_pcfg1 and toy_pcfg2) and the productions each provides
for the words of a given sentence. He also wants the probability of the
given sentence when it is parsed, using Python:
a. Print the words present in toy_pcfg1 and the number of productions.
b. Print the words present in toy_pcfg2 and the number of productions.
c. Find the probability of the given sentence using Viterbi PCFG parsing; all the
words in the sentence must be present in toy_pcfg1 or toy_pcfg2.
Sample Input:
Jack ate the cookie
Sample Output:
(S (NP (Name Jack)) (VP (V ate) (NP (Det the) (N cookie)))) (p=0.000883756)
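The sketch below illustrates Viterbi PCFG parsing with NLTK's `ViterbiParser`. The grammar and its probabilities are made up for illustration (they are not toy_pcfg1 or toy_pcfg2), so the probability printed here will differ from the sample output above.

```python
from nltk import PCFG
from nltk.parse import ViterbiParser

# Hypothetical grammar; the probabilities for each left-hand side sum to 1.
grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Name [0.4] | Det N [0.6]
    VP -> V NP [1.0]
    Name -> 'Jack' [1.0]
    Det -> 'the' [1.0]
    N -> 'cookie' [1.0]
    V -> 'ate' [1.0]
""")

parser = ViterbiParser(grammar)
tokens = "Jack ate the cookie".split()

# ViterbiParser yields the single most probable parse, if one exists.
best_tree = next(parser.parse(tokens))

print("Number of productions:", len(grammar.productions()))
print(best_tree)
print("Probability:", best_tree.prob())
```

The probability of the tree is the product of the probabilities of the productions used in it, here 0.4 for `NP -> Name` times 0.6 for `NP -> Det N`, with all other rules at 1.0.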

Writing space for the Problem: (For Student’s use only)
Post Lab Task
1) Nicky is trying to learn how to parse the words of a given sentence so that she
can win a challenge she has bet on with her friend. She has to parse the words of
the given sentence bottom-up in Python, using a Context-Free Grammar.
Sample Input:
the cat chased the dog
Sample Output:
(S (NP the (N cat)) (VP (V chased) (NP the (N dog))))
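A bottom-up parse of this kind can be sketched with NLTK's `ShiftReduceParser`. The grammar below is an assumption, written so that its trees have the same shape as the sample output; it is not taken from the workbook.

```python
from nltk import CFG
from nltk.parse import ShiftReduceParser

# Hypothetical grammar shaped to match the sample output.
grammar = CFG.fromstring("""
    S -> NP VP
    NP -> 'the' N
    VP -> V NP
    N -> 'cat' | 'dog'
    V -> 'chased'
""")

parser = ShiftReduceParser(grammar)
trees = list(parser.parse("the cat chased the dog".split()))

for tree in trees:
    print(tree)
```

`ShiftReduceParser` does not backtrack, so it returns at most one parse and can miss a valid parse on grammars where an early reduction turns out to be the wrong choice.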

Writing space for the Problem: (For Student’s use only)
8. Conditional frequency distribution

Date of the Session:____/ _____/ _____ Time of the Session: _____ to _____
Prerequisite:
• Understanding conditional frequency distribution and its
applications.
• Understanding concept of bigrams.
Pre-Lab Task:
1) Define conditional frequency distributions and describe their features.
Ans:-
2) What are bigrams? Write Python code to display the bigrams of any
sentence of length 15.
Ans:-
In Lab Task:
1) Steven has been asked by his friends to count the words in a text, but
the problem is to find the counts per genre using a conditional frequency
distribution. Help him by using the Brown corpus: take the genres [‘fiction’,
‘lore’] as conditions and print the most common words for both conditions so
that he can complete his task.
2) Using the Universal Declaration of Human Rights corpus, find the number of words
with frequency [2,4,7,9] for the conditions ['Chickasaw', 'English',
'German_Deutsch'] and tabulate the results.
POST-LAB TASK:-
1. Write Python code that, starting from a given word, generates a sentence of
25 words from the most likely following words, using a conditional frequency
distribution and bigrams.
Use the following data to build the model:
Corpus: - “genesis” Text: - “english-web.txt” Word: - “darkness”
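The bigram model for this task can be sketched as below. To keep the example independent of downloaded corpora, it is built from a short made-up token list (an assumption); the function names are hypothetical.

```python
import nltk


def build_model(tokens):
    """Conditional frequency distribution: each word -> counts of the words that follow it."""
    return nltk.ConditionalFreqDist(nltk.bigrams(tokens))


def generate(cfd, word, length):
    """Repeatedly follow the most frequent next word, up to `length` words in total."""
    words = [word]
    while len(words) < length and word in cfd:
        word = cfd[word].max()  # most likely word after the current one
        words.append(word)
    return words


# Made-up token list standing in for the corpus text.
tokens = "the darkness was upon the face of the deep and the darkness was great".split()
model = build_model(tokens)
print(" ".join(generate(model, "the", 3)))
```

With the Genesis corpus installed, the tokens would instead come from `nltk.corpus.genesis.words("english-web.txt")`, with `"darkness"` as the start word and 25 as the length. Always choosing the single most likely next word tends to fall into a repeating loop on real text.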


9. Lexical Analyzer
Date of the Session:_____/_____/_______ Time of the Session:_____ to ______
Prerequisite:
• Toolbox and text files.
Pre-Lab Task:
1) What is the purpose of lexicon?
Ans:-
2) What is the role of lexical analyzer in NLP?

Ans:-
3) What is a corpus? How to create wordlist corpus?

Ans:-
4) Implement a python program to create a wordlist corpus?
Ans:-
5) List out few tags for the parts of speech?

Ans:-
In Lab Task:
1) Gowtham is an English lecturer who plans to conduct an exam for all the
students in the college. In the exam, students have to identify the parts of
speech in a given paragraph. There are only a few English lecturers to evaluate
all the students, so he asks some other lecturers to help with the evaluation,
but they do not know the exam requirements well enough. To evaluate properly,
they decide to use a wordlist corpus, as it can help generate parts of speech. They
have to remove the stop words from the paragraph and identify the parts of speech
in the paragraph. Implement a Python program to remove stop words and identify
parts of speech.
Sample input:-
"gowtham, viswa and hari are my good friends. " "viswa is getting married next
year. " "hari is the innocent among three." "gowtham is intelligent among them."
Sample output:-
[('gowtham', 'NN'), (',', ','), ('viswa', 'NN'), ('and', 'CC'), ('hari', 'NN'),
('are', 'VBP'), ('my', 'PRP$'), ('good', 'JJ'), ('friends', 'NNS'), ('.', '.')]
[('viswa', 'NN'), ('is', 'VBZ'), ('getting', 'VBG'), ('married', 'VBN'),
('next', 'JJ'), ('year', 'NN'), ('.', '.')]
[('hari', 'NN'), ('is', 'VBZ'), ('the', 'DT'), ('innocent', 'JJ'), ('among', 'IN'),
('three.gowtham', 'NN'), ('is', 'VBZ'), ('intelligent', 'JJ'), ('among', 'IN'),
('them', 'PRP'), ('.', '.')]
Post Lab Task:
1) An analyst wants to analyse the complete script of a new movie to make sure
that there are no mistakes in it, so that the document is clean and understandable
to the editors. The analyst wants the script converted to lower case, with all
non-word characters and all punctuation removed, and divided into words accordingly.
Implement Python code in toolbox using BoW (bag of words).
Sample Input:
Beans. I was trying to explain to somebody as we were flying in, that’s corn. That’s
beans. And they were very impressed at my agricultural knowledge. Please give it up
for Amaury once again for that outstanding introduction. I have a bunch of good
friends here today, including somebody who I served with, who is one of the finest
senators in the country, and we’re lucky to have him, your Senator, Dick Durbin is
here. I also noticed, by the way, former Governor Edgar here, who I haven’t seen in
a long time, and somehow he has not aged and I have. And it’s great to see you,
Governor. I want to thank President Killeen and everybody at the U of I System for
making it possible for me to be here today. And I am deeply honored at the Paul
Douglas Award that is being given to me. He is somebody who set the path for so
much outstanding public service here in Illinois.
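The cleaning steps named in the task (lower-casing, removing non-word characters and punctuation, then counting words) can be sketched with the standard library alone. The function name `bag_of_words` is hypothetical, and treating every non-letter as a separator is an assumption, which means a contraction such as "that's" is split into "that" and "s".

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, replace non-letters with spaces, and count each word."""
    lowered = text.lower()
    cleaned = re.sub(r"[^a-z\s]", " ", lowered)  # drops punctuation, digits and symbols
    return Counter(cleaned.split())


bow = bag_of_words("Beans. That is corn, that is beans!")
for word, count in bow.most_common():
    print(word, count)
```

Applying `bag_of_words` to the sample paragraph gives one count per distinct word, which is the bag-of-words representation of the text.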
Sample Output:-
Writing space for the Problem: (For Student’s use only)
10. WordNet
Date of the Session: / / Time of the Session: to
Prerequisite:
• NLTK, importing WordNet

Pre-Lab Task:
1) What are WordNet and a Synset?
Ans:-
2) What are hypernyms and hyponyms?

Ans:-
3) For each of the words listed below, state its POS (part of speech) using Synset.
a) Silently
b) Lamborghini
c) Beauty
d) Maldives
Ans:-
4) Write a Python program to find synonyms and antonyms using NLTK WordNet
(Given word = ”bad”)
Ans:-
5) What are lexical relations? Give some examples of lexical relations.

Ans:-
In Lab Task:
1) Write down the syntax for the following:
(a) Import WordNet and use the term "hello" to find Synsets
(b) Using the Synsets, find the element at the 0th index and get just the word (using lemmas)
(c) Name and definition of that first (0th index) Synset, and examples of the word
(d) Find synonyms and antonyms in a Synset
(e) Find hypernyms and hyponyms in a Synset
2) I. Given two words, calculate the similarity between the words
(a) By using Path Similarity
(b) By using Wu-Palmer Similarity
Word1 = car & Word2 = bar
II. Mention other methods (i.e., other than the methods mentioned above) by which
the distance between words can be defined.
Post Lab Task:
1) Explain how we can find out the semantic similarity of the sentences (it can
be two or more).
2) Draw the flowchart of wordnet hierarchy.
(For Evaluator’s use only)

Date of Evaluation:
11. Context Free Grammar

Date of the Session:_____/ ____ /_______ Time of the Session: ______to________
Prerequisite:
• NLTK.
• Basic Knowledge on Context free Grammar.
• Basic knowledge on different types of Parsing techniques
Pre-Lab Task:
1) What is Recursive Descent Parser?
Ans:-
2) What is a parser and mention different types of parsers.

Ans:-
3) What is Shift Reduce Parser?

Ans:-
4) Implement the INLAB problem 1 manually.

Ans:-
In Lab Task:
1. Kriya, a student at Cambridge University, attends a competition on the theme
“Parsing”. She is given the following explanation of the problem: “The simplest
kind of parser interprets a grammar as a specification of how to
break a high-level goal into several lower-level sub goals. The top-level goal
is to find an S. The S → NP VP production permits the parser to replace this
goal with two sub goals: find an NP, then find a VP. Each of these sub goals
can be replaced in turn by sub-sub-goals, using productions that have NP and
VP on their left-hand side. Eventually, this expansion process leads to sub
goals such as: find the word telescope. Such sub goals can be directly
compared against the input sequence, and succeed if the next word is
matched. If there is no match the parser must back up and try a different
alternative.” The above explanation describes a parser. Help her identify the
parsing technique and write Python code for it. Hint: this is a top-down
parsing technique.
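The goal-expansion process described above corresponds to recursive descent parsing, for which NLTK provides `RecursiveDescentParser`. The sketch below is an illustration only; the small grammar and the test sentences are assumptions, and the grammar avoids left-recursive rules because a recursive descent parser would loop on them.

```python
from nltk import CFG
from nltk.parse import RecursiveDescentParser

# Hypothetical grammar without left recursion.
grammar = CFG.fromstring("""
    S -> NP VP
    VP -> V NP | V NP PP
    PP -> P NP
    NP -> 'Mary' | Det N | Det N PP
    V -> 'saw'
    Det -> 'a' | 'the'
    N -> 'dog' | 'telescope'
    P -> 'with'
""")

parser = RecursiveDescentParser(grammar)

simple = list(parser.parse("Mary saw a dog".split()))
ambiguous = list(parser.parse("Mary saw a dog with a telescope".split()))

for tree in simple + ambiguous:
    print(tree)
```

The second sentence has two parses because the prepositional phrase can attach either to the verb phrase or to the noun phrase; the parser finds both by backtracking over the alternative productions.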
2. Implement the parser that tries to find sequences of words and phrases that
correspond to the right-hand side of a grammar production and replaces them
with the left-hand side, until the whole sentence is reduced to an S. Hint: the
parser described above is a bottom-up parser.
Post Lab Task:
1) Implement the recursive descent and shift-reduce parsing techniques for a
large context-free grammar, the ATIS grammar.
12. Named Entity Recognition
Date of the Session:_____/_____/______ Time of the Session: ______ to ________
Prerequisite:
• Basic idea on Named Entity Recognition.
• Standard Libraries to use Named Entity Recognition.
Pre-Lab Task:
1) What is Context Free Grammar?
Ans:-
2) What is Chart Parser?
Ans:-
3) What are the different types of chart parsers?
Ans:-
In Lab Task:
1) Mr. P, a student at Purdue University, is interested in attending a
specialization-based competition. The competition has a preliminary round in
which Mr. P is given the following statement: “Find the tool which uses the
algorithmic technique of dynamic programming to derive the parses of an
ambiguous sentence more efficiently and try to implement this tool with the help
of NLTK”. Assuming you are Mr. P, try to implement it.
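A chart parser is one tool that fits this description, since it stores partial results in a chart instead of recomputing them. The sketch below uses NLTK's `ChartParser` with the elephant grammar from the probabilistic parsing experiment; treating the chart parser as the intended answer is an assumption.

```python
from nltk import CFG
from nltk.parse import ChartParser

grammar = CFG.fromstring("""
    S -> NP VP
    PP -> P NP
    NP -> Det N | Det N PP | 'I'
    VP -> V NP | VP PP
    Det -> 'an' | 'my'
    N -> 'elephant' | 'pajamas'
    V -> 'shot'
    P -> 'in'
""")

parser = ChartParser(grammar)
parses = list(parser.parse("I shot an elephant in my pajamas".split()))

print("Number of parses:", len(parses))
for tree in parses:
    print(tree)
```

The sentence is ambiguous: "in my pajamas" can attach to the verb phrase or to "an elephant", and the chart parser returns one tree for each reading.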
2) Implement the Bottom-Up Left-Corner parser and the Left-Corner parser with
bottom-up filter using NLTK.
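NLTK exposes both strategies as chart parser classes, `BottomUpLeftCornerChartParser` and `LeftCornerChartParser`. The sketch below runs them on the same grammar and sentence; the grammar (reused from the probabilistic parsing experiment) and the helper name `count_parses` are illustrative choices.

```python
from nltk import CFG
from nltk.parse import BottomUpLeftCornerChartParser, LeftCornerChartParser

grammar = CFG.fromstring("""
    S -> NP VP
    PP -> P NP
    NP -> Det N | Det N PP | 'I'
    VP -> V NP | VP PP
    Det -> 'an' | 'my'
    N -> 'elephant' | 'pajamas'
    V -> 'shot'
    P -> 'in'
""")
tokens = "I shot an elephant in my pajamas".split()


def count_parses(parser_class, grammar, tokens):
    """Build the given chart parser and count the complete parses it finds."""
    return len(list(parser_class(grammar).parse(tokens)))


bu_lc_count = count_parses(BottomUpLeftCornerChartParser, grammar, tokens)
lc_count = count_parses(LeftCornerChartParser, grammar, tokens)
print("Bottom-up left-corner:", bu_lc_count)
print("Left-corner with bottom-up filter:", lc_count)
```

Both strategies should find the same parses; they differ in how many edges they add to the chart along the way, which can be inspected by passing `trace=1` when constructing the parser.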

Date of Evaluation:
