21AD3202 - Natural Language Processing Record
Team NLP
K L UNIVERSITY | 21AD3202-NATURAL LANGUAGE PROCESSING
21AD3202 NATURAL LANGUAGE PROCESSING
LABORATORY WORKBOOK
STUDENT
NAME
REG. NO
YEAR
SEMESTER
SECTION
FACULTY
Mission
To impart quality higher education and to undertake research and extension with
emphasis on application and innovation that cater to the emerging societal needs
through all-round development of the students of all sections enabling them to be
globally competitive and socially responsible citizens with intrinsic values.
Mission
To Impart Quality Education with social consciousness and make them Globally
Competent.
Mission Statements
M1: Provide quality education in both the theoretical and applied foundations of
computer science & computer engineering.
S. No PEO# Statement
Table of Contents
The laboratory framework includes a creative element but shifts the time-intensive
aspects outside the two-hour closed laboratory period. Within this structure, each
laboratory includes three parts: Prelab, In-lab, and Post-lab.
a. Pre-Lab
The Prelab exercise is a homework assignment that links the lecture with the laboratory
period and typically takes 2 hours to complete. The goal is for students to synthesize
what they learn in lecture with material from their textbook to produce a working piece
of software. Students attending a two-hour closed laboratory are expected to make a
good-faith effort to complete the Prelab exercise before coming to the lab. Their work
need not be perfect, but their effort must be real (roughly 80 percent correct).
b. In-Lab
The In-lab section takes place during the actual laboratory period. The first hour of the
laboratory period can be used to resolve any problems the students might have
experienced in completing the Prelab exercises. The intent is to give constructive
feedback so that students leave the lab with working Prelab software - a significant
accomplishment on their part. During the second hour, students complete the In-lab
exercise to reinforce the concepts learned in the Prelab. Students leave the lab having
received feedback on their Prelab and In-lab work.
c. Post-Lab
The last phase of each laboratory is a homework assignment that is done following the
laboratory period. In the Post-lab, students analyse the efficiency or utility of a given
system call. Each Post-lab exercise should take roughly 120 minutes to complete.
Sl No | Date | Experiment Name | Pre-Lab (10M) | In-Lab Logic (5M) | In-Lab Execution (10M) | In-Lab Result (10M) | In-Lab Analysis (5M) | Post Lab (5M) | Viva Voce (5M) | Total (50M) | Faculty Signature
2 | | Implementation of Word Tokenizer
3 | | Implementation of Sentence Tokenizer
4 | | Implementation of Paragraph Tokenizer
7 | | Learning Grammar
9 | | Lexical Analyzer
10 | | WordNet
Prerequisite:
• Basic knowledge of natural language processing.
• Importing NLTK package.
• Methods of NLTK.
Pre-Lab Task:
2) What is NLTK?
Ans:-
In Lab Task:
1) Basic steps for installing NLTK in Python.
PROCEDURE:
Step 4: Check all the optional features, especially "pip", as it is needed to install NLTK,
and click Next.
Step 5: Select the advanced options, choose the installation path and click Install.
Step 8: Copy the path of the Scripts folder so that NLTK is installed in the same folder.
Step 9: To install NLTK, open the command prompt and type the command:
pip install nltk
Step 10: To run Jupyter, go to the appropriate folder in the command prompt and give
the following command:
jupyter notebook
Command:
import nltk
nltk.download()
The NLTK downloader opens a window from which the datasets can be downloaded. The
full collection is large, so the download will take some time. To test whether the
datasets are installed properly, try importing a dataset and using it.
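Instead of downloading the whole collection from the downloader window, individual packages can also be fetched by name and checked afterwards. A small sketch; `has_resource` and `ensure_resource` are illustrative helper names, and it assumes the usual NLTK data layout in which, for example, the package `brown` is found at `corpora/brown` and `punkt` at `tokenizers/punkt`.

```python
import nltk


def has_resource(path):
    """Return True if an NLTK data resource such as "corpora/brown" is installed."""
    try:
        nltk.data.find(path)
        return True
    except LookupError:  # nltk.data.find raises LookupError when the resource is missing
        return False


def ensure_resource(path, package):
    """Download `package` quietly only when `path` cannot be found locally."""
    if not has_resource(path):
        nltk.download(package, quiet=True)
    return has_resource(path)
```

For example, `ensure_resource("corpora/brown", "brown")` followed by `from nltk.corpus import brown` and `print(brown.words()[:10])` confirms that the Brown corpus can be imported and used.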
4) Although there are many other tools for natural language processing, why is NLTK
mostly preferred?
Ans:-
Prerequisite:
• NLTK and Jupyter Notebook.
• Basic knowledge of the methods available in NLTK.
Pre-Lab Task:
1) Explain natural language processing in your own words.
Ans:-
4) Define tokenization.
Ans:-
5) Define stemming.
Ans:-
Ans:-
7) Which tokenizer is used when the text is in a language other than English?
Ans:-
In Lab Task:
1) Chinnu wants to help his sister Pravallika pass her aptitude exam, so he wants to
test her skills in English. He decides to give her some sentences and asks for the key
root words in them. Implement a Python program that helps Pravallika get the root
words in the given sentences. (Note: use stemming and tokenization.)
Sample Input:
Maximum students are Suffering from Insominia.
Sample Output:
Maximum;max
Students;Student
Are;are
Suffering;suffer
From;from
Insominia;somni
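A minimal sketch of the tokenize-then-stem pipeline, assuming NLTK's Porter stemmer is acceptable as the stemming step. `RegexpTokenizer(r"\w+")` is used instead of `word_tokenize` so that no tokenizer model has to be downloaded and the final "." is dropped. The exact stems depend on the algorithm: Porter, for instance, leaves "maximum" unchanged, so not every line will match the sample output above. `root_words` is an illustrative helper name.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# \w+ keeps runs of letters/digits and drops punctuation such as the final "."
tokenizer = RegexpTokenizer(r"\w+")
stemmer = PorterStemmer()


def root_words(sentence):
    """Return (word, stem) pairs; words are lower-cased before stemming."""
    return [(word, stemmer.stem(word.lower())) for word in tokenizer.tokenize(sentence)]


if __name__ == "__main__":
    for word, stem in root_words("Maximum students are Suffering from Insominia."):
        print(f"{word};{stem}")
```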
2) Chinnu is a comic editor at company X. One day, while editing a script, he
became enthusiastic about its story, but he had no time to read the whole script.
He decided to understand the storyline by learning about the characters, so he
wants to separate the words in each sentence. Implement a Python program that
splits the words of the given sentence using a tokenizer function and displays
both the split words and the word count.
Sample input:
Chinnu is the one who protect lilly.
Sample output:
['chinnu', 'is', 'the', 'one', 'who', 'protect', 'lilly']
Count: 7
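A small sketch that reproduces the sample output. It assumes the sentence is lower-cased first and that punctuation is not counted as a word; with `word_tokenize` the final "." would become its own token and the count would be 8. `split_and_count` is an illustrative helper name.

```python
from nltk.tokenize import RegexpTokenizer

# Keep word characters only, so the trailing "." is not counted as a word
tokenizer = RegexpTokenizer(r"\w+")


def split_and_count(sentence):
    """Lower-case the sentence, split it into words and return (words, count)."""
    words = tokenizer.tokenize(sentence.lower())
    return words, len(words)


if __name__ == "__main__":
    words, count = split_and_count("Chinnu is the one who protect lilly.")
    print(words)
    print(f"Count: {count}")
```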
Post Lab Task:
Sample input:
There is a change in a medium.
Sample output:
Before:
There is a change in a medium.
After:
['there', 'change', 'medium', '.']
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SUBJECT CODE: 21AD3202
NATURAL LANGUAGE PROCESSING WORKBOOK
Prerequisite:
• Linguistic rules and how to implement them.
• Sentence tokenization.
• Chunking.
Pre-Lab Task:
Ans:-
Ans:-
Ans:-
Ans:-
In Lab Task:
1) Dany is struggling with an assignment given by her tuition master. She has to
read a paragraph and generate tokens from it using a sentence tokenizer. She
should also find the part of speech of each word in the tokens she has
generated, using Python.
Sample Input:
Notice of a bid advertisement shall be published in at least one local newspaper
and in one trade publication at least 30 days in advance of sale. If applicable, the
notice must identify the reservation within which the tracts to be leased are found.
Specific descriptions of the tracts shall be available at the office of the
superintendent. The complete text of the advertisement shall be mailed to each
person listed on the appropriate agency mailing list.
Sample Output:
Notice – NNP, bid – NN, advertisement – NN, and so on…
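A sketch of the two steps, sentence splitting and per-sentence POS tagging. It assumes an untrained `PunktSentenceTokenizer` is good enough for this paragraph (it needs no download, but knows no abbreviations such as "U.S."); the tagging step calls `nltk.pos_tag`, which does need the averaged perceptron tagger data (`nltk.download("averaged_perceptron_tagger")`, named `averaged_perceptron_tagger_eng` in some newer NLTK releases). Helper names are illustrative.

```python
import nltk
from nltk.tokenize import PunktSentenceTokenizer, TreebankWordTokenizer

# Untrained Punkt model: splits on sentence-final punctuation without downloaded data
sentence_splitter = PunktSentenceTokenizer()
word_splitter = TreebankWordTokenizer()

PARAGRAPH = (
    "Notice of a bid advertisement shall be published in at least one local "
    "newspaper and in one trade publication at least 30 days in advance of sale. "
    "If applicable, the notice must identify the reservation within which the "
    "tracts to be leased are found. Specific descriptions of the tracts shall be "
    "available at the office of the superintendent. The complete text of the "
    "advertisement shall be mailed to each person listed on the appropriate "
    "agency mailing list."
)


def split_sentences(paragraph):
    """Sentence tokens of the paragraph."""
    return sentence_splitter.tokenize(paragraph)


def tag_sentences(paragraph):
    """POS-tag the words of each sentence token (requires the tagger data)."""
    return [nltk.pos_tag(word_splitter.tokenize(s)) for s in split_sentences(paragraph)]
```

Calling `tag_sentences(PARAGRAPH)` returns one list of `(word, tag)` pairs per sentence, such as `('Notice', 'NNP')` or `('bid', 'NN')` depending on the tagger.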
“Government of India is taking all necessary steps to ensure that we are prepared
well to face the challenge and threat posed by the growing pandemic of COVID-19
the Corona Virus.”
2. Sheena is working on a literature task: for the given sentence, she has to use
chunk parsing in Python to build and draw a parse tree that satisfies the given
grammar rule.
Sample Input:
Sentence = The little mouse ate the fresh cheese
Rule = r"NP:{<DT>?<JJ>*<NN>}"
Sample Output:
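A minimal chunk-parsing sketch using `nltk.RegexpParser` with the rule from the task. To keep it runnable without the tagger data, the POS tags are assigned by hand here (an assumption); in the lab they would normally come from `nltk.pos_tag`. The rule chunks an optional determiner, any number of adjectives and a noun into an NP, so "ate" stays outside both chunks.

```python
import nltk

# Hand-assigned Penn Treebank tags (assumed; nltk.pos_tag would normally supply them)
tagged = [
    ("The", "DT"), ("little", "JJ"), ("mouse", "NN"), ("ate", "VBD"),
    ("the", "DT"), ("fresh", "JJ"), ("cheese", "NN"),
]

# Rule from the task: optional determiner, zero or more adjectives, then a noun
chunker = nltk.RegexpParser(r"NP:{<DT>?<JJ>*<NN>}")
tree = chunker.parse(tagged)

if __name__ == "__main__":
    print(tree)
    # tree.draw()  # opens a Tk window with the graphical parse tree
```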
Post Lab Task
1) Mary wants to prepare a list of the events being held in her country, for which she
needs details such as the name, location and time of each event and the name of the
organization. She has all the details of the events; all she needs is for the paragraph
to be processed so that the required details are highlighted in each sentence. Help
her find these details using Python.
Sample Input:
Larry Page and Sergey Brin, two students at Stanford University, USA, started
Backrub in early 1996. They made it into a company, Google Inc., on September
7, 1998 at a friend's garage in Menlo Park, California. In February 1999, the
company moved to 165 University Ave., Palo Alto, California, and then moved
to another place called the Googleplex. In September 2001, Google's rating
system (PageRank, for saying which information is more helpful) got a U.S.
Patent. The patent was to Stanford University, with Lawrence (Larry) Page as
the inventor (the person who first had the idea). Google makes a percentage of
its money through America Online and InterActiveCorp. It has a special group
known as the Partner Solutions Organization (PSO) which helps make contracts,
helps make accounts better, and gives engineering help.
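One way to "highlight" the details is named entity recognition with `nltk.ne_chunk`, which returns a tree whose entity subtrees carry labels such as PERSON, ORGANIZATION or GPE. A sketch, assuming the tokenizer, tagger and NE chunker data are downloaded (package names such as `maxent_ne_chunker` and `words` can differ between NLTK versions). The default chunker is not expected to label dates or times, so those would likely need an extra rule. `extract_entities` and `highlight` are illustrative names.

```python
from nltk.tree import Tree


def extract_entities(chunked):
    """Collect (label, text) pairs from an ne_chunk-style tree.

    Entities are subtrees (e.g. PERSON, GPE) whose leaves are (word, tag) pairs;
    all other tokens are plain (word, tag) leaves and are skipped.
    """
    entities = []
    for node in chunked:
        if isinstance(node, Tree):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((node.label(), text))
    return entities


def highlight(sentence):
    """Tokenize, tag and NE-chunk one sentence (needs the NLTK data packages)."""
    import nltk
    chunked = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sentence)))
    return extract_entities(chunked)
```

Running `highlight` on each sentence of the sample paragraph lists the entities found per sentence.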
Pre-Lab Task:
1) What is latent Dirichlet allocation?
Ans:-
Ans:-
Ans:-
Ans:-
Ans:-
Ans:-
In Lab Task:
1) Chinky is very naughty in class and is known for troubling her teacher. One day
her teacher decided to punish her: she gave Chinky a set of paragraphs and told her
to count the number of sentences, words and paragraphs in the given book. Write a
Python program using the NLTK tokenizers to help her complete the task.
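A sketch of the three counts with NLTK tokenizers that need no downloaded data. It assumes paragraphs are separated by blank lines, sentences are found by an untrained Punkt model, and words are runs of word characters (punctuation is not counted). `count_text` is an illustrative helper name.

```python
from nltk.tokenize import BlanklineTokenizer, PunktSentenceTokenizer, RegexpTokenizer

paragraph_splitter = BlanklineTokenizer()      # paragraphs are separated by blank lines
sentence_splitter = PunktSentenceTokenizer()   # untrained Punkt: no download needed
word_splitter = RegexpTokenizer(r"\w+")        # words = runs of word characters


def count_text(text):
    """Return (paragraphs, sentences, words) counts for a block of text."""
    paragraphs = paragraph_splitter.tokenize(text)
    sentences = [s for p in paragraphs for s in sentence_splitter.tokenize(p)]
    words = word_splitter.tokenize(text)
    return len(paragraphs), len(sentences), len(words)
```

For a book stored in a text file (the file name is hypothetical), `count_text(open("book.txt", encoding="utf8").read())` gives the three counts.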
2) Write Python code that uses Latent Dirichlet Allocation (LDA) for topic modelling,
i.e. apply LDA to a set of documents and split them into topics.
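A self-contained sketch of LDA fitted by collapsed Gibbs sampling in plain Python. This is a swapped-in inference method: libraries such as gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation` use variational inference instead, but the model is the same. Documents are assumed to be pre-tokenized lists of words; `alpha` and `beta` are the Dirichlet priors, and all function names are illustrative.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (vocab, phi, theta): phi[k][v] = P(word v | topic k) and
    theta[d][k] = P(topic k | document d), both smoothed by the priors.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]       # n(d, k)
    topic_word = [[0] * V for _ in range(K)]  # n(k, w)
    topic_total = [0] * K                     # n(k)
    z = []                                    # topic assignment of every token

    # Random initial topic for each token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
           for k in range(K)]
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


def top_words(vocab, phi, k, n=3):
    """The n most probable words of topic k as (word, probability) pairs."""
    order = sorted(range(len(vocab)), key=lambda v: phi[k][v], reverse=True)
    return [(vocab[v], phi[k][v]) for v in order[:n]]
```

Each document can then be assigned to the topic with the largest value in its row of `theta`. On tiny corpora the topics vary with `seed`, so results are only indicative.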
Writing space for the Problem:(For Student’s use only)
Post Lab Task:
1) Using Latent Dirichlet Allocation, answer the following:
a) From the dataset mentioned above, identify the three most probable
words for the first topic.
b) Analyze the dataset thoroughly and find the sum of the probabilities
assigned to the top 50 words in the third topic.
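Both questions reduce to ranking one topic's word probabilities. A sketch, assuming the topic's distribution is available as a word-to-probability mapping (for a gensim model, `dict(lda.show_topic(topicid, topn=...))` gives one). It also assumes topics are numbered from 0, so the "first topic" is index 0 and the "third topic" is index 2. `top_n_with_mass` is an illustrative name.

```python
def top_n_with_mass(word_probs, n):
    """Return the n most probable (word, prob) pairs of one topic and their summed probability."""
    ranked = sorted(word_probs.items(), key=lambda item: item[1], reverse=True)[:n]
    return ranked, sum(p for _, p in ranked)
```

Part (a) is `top_n_with_mass(topic0, 3)[0]` and part (b) is `top_n_with_mass(topic2, 50)[1]`.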
Prerequisite:
• Understanding corpora and their purpose.
• Understanding types of corpora and data contained in them.
Pre-Lab Task:
1. Define corpus. What are the different types of corpora? Explain them.
Ans:-
Ans:-
Ans:-
Ans:-
In Lab Task:
1) Henry wants to train a text-to-speech engine on large amounts of English text
organized by genre. He decides to use the Brown corpus, so he first needs to
understand it. Help him import the Brown corpus and perform the following actions.
a. Print all the genres present in it.
b. Select the 'editorial' genre and print the distinct word types in it in
sorted order.
c. Select the range [3493:3499] in the above sorted list and find their
counts in the 'fiction', 'lore', 'belles_lettres', 'government' and 'learned'
genres.
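A sketch of parts (b) and (c), written against the corpus-reader interface so it can be passed `nltk.corpus.brown` (which requires `nltk.download("brown")`). It relies on `corpus.words(categories=...)` and counts per genre with a `ConditionalFreqDist`, as in the NLTK book. The function names are illustrative.

```python
import nltk


def editorial_vocabulary(corpus, genre="editorial"):
    """Part (b): sorted list of the distinct word types in one genre."""
    return sorted(set(corpus.words(categories=genre)))


def counts_by_genre(corpus, words, genres):
    """Part (c): how often each word occurs in each genre (condition = genre)."""
    cfd = nltk.ConditionalFreqDist(
        (genre, word)
        for genre in genres
        for word in corpus.words(categories=genre)
    )
    return {g: {w: cfd[g][w] for w in words} for g in genres}
```

With the real corpus: `from nltk.corpus import brown`, `print(brown.categories())` for part (a), then `selected = editorial_vocabulary(brown)[3493:3499]` and `counts_by_genre(brown, selected, ["fiction", "lore", "belles_lettres", "government", "learned"])`.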
2) John wants to build a machine learning model to identify speeches given by the
presidents of the USA. To train the model, he wants to use the inaugural corpus. Help
him understand the inaugural corpus by implementing the following actions.
a) Print all the file identifiers.
b) Plot a graph with the count of the words 'president' and 'congress' in each file
on the y-axis and the year of the presidential address on the x-axis.
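A sketch of part (b), written against the corpus-reader interface so it can be passed `nltk.corpus.inaugural` (after `nltk.download("inaugural")`). It assumes exact, case-insensitive matches of each target word (the NLTK book's similar example uses `startswith` instead) and takes the year from the first four characters of file ids such as "1789-Washington.txt". Matplotlib is imported only inside the plotting function. Names are illustrative.

```python
from collections import Counter


def yearly_counts(corpus, targets=("president", "congress")):
    """Return {target: [(year, count), ...]} over every file of the corpus."""
    result = {t: [] for t in targets}
    for fileid in corpus.fileids():
        counts = Counter(w.lower() for w in corpus.words(fileid))
        year = int(fileid[:4])  # "1789-Washington.txt" -> 1789
        for t in targets:
            result[t].append((year, counts[t]))
    return result


def plot_counts(result):
    """One line per target word: x = year of address, y = count in that address."""
    import matplotlib.pyplot as plt  # only needed for plotting
    for target, points in result.items():
        years, counts = zip(*points)
        plt.plot(years, counts, marker="o", label=target)
    plt.xlabel("Year of presidential address")
    plt.ylabel("Word count")
    plt.legend()
    plt.show()
```

Usage: `from nltk.corpus import inaugural`, `print(inaugural.fileids())` for part (a), then `plot_counts(yearly_counts(inaugural))`.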
Post Lab Task
1) Import the Reuters corpus and perform the following actions.
a) Print all the categories present in the corpus.
b) Print the file identifiers that have the category 'coffee'.
2) Create your own corpus from the NLP lab manual saved as a text file, and print the
number of sentences in the text file.
Writing space for the Problem:(For Student’s use only)
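A sketch using NLTK's `PlaintextCorpusReader` to turn a folder of text files into a corpus. It assumes the manual is saved as a `.txt` file under some directory (the directory and file name are up to you) and passes an untrained `PunktSentenceTokenizer` so that no punkt model has to be downloaded; a trained model would handle abbreviations better. `count_sentences` is an illustrative name.

```python
from nltk.corpus.reader import PlaintextCorpusReader
from nltk.tokenize import PunktSentenceTokenizer


def count_sentences(root, fileids=r".*\.txt"):
    """Build a plaintext corpus from the files under `root` and count its sentences."""
    corpus = PlaintextCorpusReader(root, fileids, sent_tokenizer=PunktSentenceTokenizer())
    return len(corpus.sents())
```

For example, `count_sentences("lab_manual/")` counts the sentences in every `.txt` file of that (hypothetical) folder.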
Prerequisite:
• Understanding probabilistic parsing and its purpose.
• Understanding the types of parsing: top-down and bottom-up parsing.
Pre-Lab Task:
1) Define ambiguity in a parse tree and draw all possible parse trees for the given
expression P=Q+R*P.
Ans:-
3) Apply the rules of parsing to the following sentence: “A boy can do nothing for
girl but be anything for her.”
Ans:-
In Lab Task:
Post Lab Task:
1) Write down the variables for all substrings of the length-6 string w when deciding
membership in the CFL using the CKY (Cocke-Younger-Kasami) algorithm. Consider the
grammar:
S -> ε (epsilon symbol) | AB | XB
T -> AB | XB
X -> AT
A -> a
B -> b
Is w = aaabbb in L(G)?
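A minimal CKY recognizer for exactly this grammar, as a sketch. The grammar is already in Chomsky normal form apart from S -> ε; since S never appears on a right-hand side, ε only matters for the empty string and is handled as a special case. `table[(i, length)]` holds the variables that derive the substring of that length starting at position i, which is the table the question asks for. Names are illustrative.

```python
from itertools import product

# Grammar from the task in Chomsky normal form; S -> epsilon handled separately.
BINARY = {
    ("A", "B"): {"S", "T"},
    ("X", "B"): {"S", "T"},
    ("A", "T"): {"X"},
}
TERMINAL = {"a": {"A"}, "b": {"B"}}
START = "S"
START_NULLABLE = True  # because of S -> epsilon


def cky(word):
    """Fill the CKY table: table[(i, length)] = variables deriving word[i:i+length]."""
    n = len(word)
    table = {}
    for i, ch in enumerate(word):
        table[(i, 1)] = set(TERMINAL.get(ch, set()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = set()
            for split in range(1, length):
                left = table[(i, split)]
                right = table[(i + split, length - split)]
                for pair in product(left, right):
                    cell |= BINARY.get(pair, set())
            table[(i, length)] = cell
    return table


def accepts(word):
    """w is in L(G) iff S derives the whole string (or w is empty and S -> epsilon)."""
    if word == "":
        return START_NULLABLE
    return START in cky(word)[(0, len(word))]


if __name__ == "__main__":
    w = "aaabbb"
    table = cky(w)
    for length in range(1, len(w) + 1):
        for i in range(len(w) - length + 1):
            print(w[i:i + length], sorted(table[(i, length)]))
    print("w in L(G):", accepts(w))
```

Working the table by hand gives the same result: "ab" yields {S, T}, "aab" yields {X}, "aabb" yields {S, T}, "aaabb" yields {X}, and the full string yields {S, T}, so aaabbb is in L(G). The language is a^n b^n for n >= 1, plus ε.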
Date of Evaluation:
6. Learning Grammar
Date of the Session: / / Time of the Session: to
Prerequisite:
Concepts Based on:
➢ Parsing
➢ PCFG
➢ CFG
➢ Viterbi Parser
Pre-Lab Task:
Ans:-
Ans:-
In Lab Task:
Sample Input:
Jack ate the cookie
Sample Output:
(S (NP (Name Jack)) (VP (V ate) (NP (Det the) (N cookie)))) (p=0.000883756)
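The sample output has the shape produced by NLTK's `ViterbiParser`, which returns the single most probable parse under a PCFG together with its probability. The grammar used for the sample is not given here, so the PCFG below is hypothetical (its rule probabilities are assumed, with each left-hand side summing to 1). It therefore yields the same tree but p = 0.4 × 0.6 = 0.24, not 0.000883756.

```python
import nltk
from nltk.parse.viterbi import ViterbiParser

# Hypothetical PCFG: probabilities for each left-hand side sum to 1
grammar = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Name [0.4] | Det N [0.6]
    VP -> V NP [1.0]
    Name -> 'Jack' [1.0]
    V -> 'ate' [1.0]
    Det -> 'the' [1.0]
    N -> 'cookie' [1.0]
""")

parser = ViterbiParser(grammar)


def best_parse(sentence):
    """Return the most probable parse (a ProbabilisticTree), or None if there is none."""
    return next(iter(parser.parse(sentence.split())), None)


if __name__ == "__main__":
    print(best_parse("Jack ate the cookie"))
```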
Post Lab Task
1) Nicky is trying to learn how to parse the words in a sentence so that she can
complete a challenge she has taken on with her friend as a bet. She has to parse the
given sentence bottom-up in Python, using a context-free grammar.
Sample Input:
the cat chased the dog
Sample Output:
(S (NP the (N cat)) (VP (V chased) (NP the (N dog))))
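A sketch with NLTK's `BottomUpChartParser`. The grammar is reconstructed from the sample output (an assumption): "the" appears directly under NP, so the NP rule mixes a terminal with the nonterminal N. With this grammar the printed tree matches the sample output.

```python
import nltk
from nltk.parse.chart import BottomUpChartParser

# Grammar inferred from the sample output: NP -> 'the' N keeps "the" as a bare leaf
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> 'the' N
    VP -> V NP
    N -> 'cat' | 'dog'
    V -> 'chased'
""")

parser = BottomUpChartParser(grammar)


def parse_bottom_up(sentence):
    """Return every parse tree the bottom-up chart parser finds."""
    return list(parser.parse(sentence.split()))


if __name__ == "__main__":
    for tree in parse_bottom_up("the cat chased the dog"):
        print(tree)
```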
Prerequisite:
• Understanding conditional frequency distributions and their
applications.
• Understanding the concept of bigrams.
Pre-Lab Task:
Ans:-
2) What are bigrams? Write Python code to display the bigrams of any sentence of
length 15.
Ans:-
In Lab Task:
1) Steven has a task to count the words in a text given by his friends, but here the
problem is to find the word counts per genre using a conditional frequency
distribution. Help him by using the Brown corpus: take the genres ['fiction',
'lore'] and print the most common words under both conditions to complete his
task.
Writing space for the Problem:(For Student’s use only)
POST-LAB TASK:-
1. Write Python code to generate a sentence of 25 words that starts from a given word
and repeatedly picks the most likely next word, using a conditional frequency
distribution over bigrams.
Use the following data to build the model:
Corpus: “genesis” Text: “english-web.txt” Word: “darkness”
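A sketch that closely follows the `generate_model` idea from the NLTK book: build a `ConditionalFreqDist` from bigrams (condition = a word, samples = the words that follow it) and greedily follow the most frequent successor. Stopping early when a word has no observed successor is an added safeguard. Greedy generation tends to fall into repeating loops, which is expected behaviour for this simple model. Names are illustrative.

```python
import nltk


def build_model(words):
    """ConditionalFreqDist over bigrams: condition = a word, samples = the words after it."""
    return nltk.ConditionalFreqDist(nltk.bigrams(words))


def generate(cfd, word, length=25):
    """Emit up to `length` words, each time following the most frequent successor."""
    output = []
    for _ in range(length):
        output.append(word)
        if not cfd[word]:  # no observed successor: stop early
            break
        word = cfd[word].max()
    return output
```

With the data from the task (after `nltk.download("genesis")`): `from nltk.corpus import genesis`, `cfd = build_model(genesis.words("english-web.txt"))`, then `print(" ".join(generate(cfd, "darkness", 25)))`.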
Writing space for the Problem:(For Student’s use only)
8. Lexical Analyzer
Date of the Session:_____/_____/_______ Time of the Session:_____ to ______
Prerequisite:
• Toolbox and text files.
Pre-Lab Task:
1) What is the purpose of lexicon?
Ans:-
4) Implement a Python program to create a wordlist corpus.
Ans:-
In Lab Task:
1) Gowtham is an English lecturer who planned to conduct an exam for all the students
in the college. In the exam, students have to identify the parts of speech in a given
paragraph, but there are too few English lecturers to evaluate all the students. So
he asked some other lecturers to help with the evaluation, but they do not know
enough about the exam's requirements. To evaluate properly, they thought of using a
wordlist corpus. They have to remove the stop words from the paragraph and identify
the parts of speech in it. Implement Python code to remove stop words and identify
the parts of speech.
Sample input:-
"gowtham, viswa and hari are my good friends. " "viswa is getting married next
year. " "hari is the innocent among three." "gowtham is intelligent among them."
Sample output:-
[('gowtham', 'NN'), (',', ','), ('viswa', 'NN'), ('and', 'CC'), ('hari', 'NN'), ('are', 'VBP'), ('my', 'PRP$'), ('good', 'JJ'), ('friends', 'NNS'), ('.', '.')]
[('viswa', 'NN'), ('is', 'VBZ'), ('getting', 'VBG'), ('married', 'VBN'), ('next', 'JJ'), ('year', 'NN'), ('.', '.')]
[('hari', 'NN'), ('is', 'VBZ'), ('the', 'DT'), ('innocent', 'JJ'), ('among', 'IN'), ('three.gowtham', 'NN'), ('is', 'VBZ'), ('intelligent', 'JJ'), ('among', 'IN'), ('them', 'PRP'), ('.', '.')]
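A sketch of the filter-then-tag step. The stop-word set is passed in, typically `set(stopwords.words("english"))` from `nltk.corpus.stopwords` (a wordlist corpus, after `nltk.download("stopwords")`); tagging uses `nltk.pos_tag`, which needs the tagger data. The sample output above still contains words such as "is", "my" and "the" that the NLTK English stop-word list includes, so it appears to have been tagged before filtering. Names are illustrative.

```python
import nltk
from nltk.tokenize import TreebankWordTokenizer

tokenizer = TreebankWordTokenizer()


def remove_stopwords(tokens, stop_words):
    """Drop tokens whose lower-cased form is in stop_words."""
    return [t for t in tokens if t.lower() not in stop_words]


def tag_without_stopwords(sentence, stop_words):
    """Tokenize, filter stop words, then POS-tag what is left (needs the tagger data)."""
    return nltk.pos_tag(remove_stopwords(tokenizer.tokenize(sentence), stop_words))
```

Usage: `from nltk.corpus import stopwords`, `stop = set(stopwords.words("english"))`, then `tag_without_stopwords("viswa is getting married next year.", stop)`.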
Post Lab Task:
1) An analyst wants to analyse the complete script of a new movie to make sure there
are no mistakes in it, so that the document looks clean and is easy for the editors to
understand. The analyst wants the script normalized: convert the text to lower case,
remove all non-word characters and remove all punctuation, so that the text is clean
and divided into words. Implement Python code for this in the toolbox using BoW
(bag of words).
Sample Input:
Beans. I was trying to explain to somebody as we were flying in, that’s corn. That’s
beans. And they were very impressed at my agricultural knowledge. Please give it up
for Amaury once again for that outstanding introduction. I have a bunch of good
friends here today, including somebody who I served with, who is one of the finest
senators in the country, and we’re lucky to have him, your Senator, Dick Durbin is
here. I also noticed, by the way, former Governor Edgar here, who I haven’t seen in
a long time, and somehow he has not aged and I have. And it’s great to see you,
Governor. I want to thank President Killeen and everybody at the U of I System for
making it possible for me to be here today. And I am deeply honored at the Paul
Douglas Award that is being given to me. He is somebody who set the path for so
much outstanding public service here in Illinois.
Sample Output:-
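A sketch of the normalization and bag-of-words steps using only the standard library. Lower-casing, then deleting every character that is neither a word character nor whitespace, removes punctuation including curly apostrophes. One design consequence worth noting: "that’s" becomes "thats" rather than two words; replacing punctuation with a space instead would give "that" and "s". `bag_of_words` is an illustrative name.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case, strip punctuation/non-word characters, and count each word."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # drops . , ’ and other punctuation
    return Counter(text.split())
```

`bag_of_words(script).most_common(10)` then lists the ten most frequent words of the script.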
9. WORDNET
Date of the Session: / / Time of the Session: to
Prerequisite:
3) For each of the words below, state its POS (part of speech) using Synsets:
a) Silently
b) Lamborghini
c) Beauty
d) Maldives
Ans:-
4) Write a Python program to find synonyms and antonyms using NLTK WordNet
(given word = "bad").
Ans:-
In Lab Task:
1) Write down the syntax for the following:
(a) Import WordNet and use the term "hello" to find its synsets.
(b) Using the synsets, find the element at index 0, just the word (using lemmas).
(c) The name and definition of that first (index 0) synset, and examples of the word.
(d) Find synonyms and antonyms using synsets.
(e) Find hypernyms and hyponyms of a synset.
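A sketch of (a) to (e) with NLTK's WordNet interface. It assumes the WordNet data has been downloaded (`nltk.download("wordnet")`); importing `wordnet` is lazy, so the data is only loaded when one of the functions is called. Taking the first synset for (c) and (e) follows the task; the function names are illustrative.

```python
from nltk.corpus import wordnet as wn  # lazy loader: data is read on first use


def first_synset_summary(word):
    """(a)-(c): all synsets of `word`, plus details of the first one."""
    synsets = wn.synsets(word)
    first = synsets[0]
    return {
        "synsets": synsets,
        "name": first.name(),
        "lemma": first.lemmas()[0].name(),  # just the word
        "definition": first.definition(),
        "examples": first.examples(),
    }


def synonyms_and_antonyms(word):
    """(d): lemma names across all synsets of `word`, and their antonyms."""
    synonyms, antonyms = set(), set()
    for syn in wn.synsets(word):
        for lemma in syn.lemmas():
            synonyms.add(lemma.name())
            for ant in lemma.antonyms():
                antonyms.add(ant.name())
    return synonyms, antonyms


def hypernyms_and_hyponyms(word):
    """(e): immediate hypernyms and hyponyms of the first synset."""
    first = wn.synsets(word)[0]
    return first.hypernyms(), first.hyponyms()
```

For example, `first_synset_summary("hello")` covers (a) to (c), and `synonyms_and_antonyms("bad")` also answers Pre-Lab question 4.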
2) I. Given two words, calculate the similarity between them
(a) By using path similarity
(b) By using Wu-Palmer similarity
Word1 = car & Word2 = bar
II. Mention other methods for defining the distance between
words (i.e. other than the methods mentioned
above).
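A sketch of part I. It assumes the WordNet data is installed and compares the first noun sense of each word; that choice is an assumption, because "bar" has many senses and the scores change with the senses chosen (taking the maximum over all sense pairs is a common alternative). `path_similarity` and `wup_similarity` are methods of NLTK `Synset` objects; `similarities` is an illustrative name.

```python
from nltk.corpus import wordnet as wn  # lazy loader: data is read on first use


def similarities(word1, word2):
    """Path and Wu-Palmer similarity between the first noun synsets of two words."""
    s1 = wn.synsets(word1, pos="n")[0]
    s2 = wn.synsets(word2, pos="n")[0]
    return {
        "synsets": (s1.name(), s2.name()),
        "path": s1.path_similarity(s2),
        "wup": s1.wup_similarity(s2),
    }
```

`similarities("car", "bar")` prints which senses were compared alongside both scores, which makes the sense choice visible.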
Post Lab Task:
1) Explain how we can find the semantic similarity of sentences (two or more).
2) Draw the flowchart of the WordNet hierarchy.
Date of Evaluation:
Prerequisite:
• NLTK.
• Basic knowledge of context-free grammar.
• Basic knowledge of different types of parsing techniques.
Pre-Lab Task:
1) What is Recursive Descent Parser?
Ans:-
In Lab Task:
2. Implement the parser that tries to find sequences of words and phrases that
correspond to the right-hand side of a grammar production and replaces them
with the left-hand side, until the whole sentence is reduced to an S. Hint: the
above-mentioned parser is a bottom-up parser.
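The parser described is a shift-reduce parser, which NLTK provides as `ShiftReduceParser`. A sketch with a small toy grammar (adapted from the NLTK book's `grammar1`, not a grammar from this workbook). Because NLTK's shift-reduce parser does not backtrack, it can fail to find a parse on some grammars even when one exists; `trace=2` prints every shift and reduce step.

```python
import nltk

# Toy grammar adapted from the NLTK book's grammar1
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP | V NP PP
    PP -> P NP
    V -> "saw" | "ate" | "walked"
    NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
    Det -> "a" | "an" | "the" | "my"
    N -> "man" | "dog" | "cat" | "telescope" | "park"
    P -> "in" | "on" | "by" | "with"
""")

# trace=0 keeps it quiet; trace=2 shows each shift and reduce
parser = nltk.ShiftReduceParser(grammar, trace=0)


def shift_reduce(sentence):
    """Return the trees found by shifting words and reducing matching right-hand sides."""
    return list(parser.parse(sentence.split()))


if __name__ == "__main__":
    for tree in shift_reduce("Mary saw a dog"):
        print(tree)
```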
Post Lab Task:
1) Implement the recursive descent parsing technique and the shift-reduce parsing
technique for a large context-free grammar, the ATIS grammar.
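A sketch that runs both techniques on the same grammar so their results can be compared, using a small toy grammar (adapted from the NLTK book) so it runs without downloads. For the ATIS grammar itself, the assumption is that the `large_grammars` data package is installed (`nltk.download("large_grammars")`), which provides `grammars/large_grammars/atis.cfg`. Expect the recursive descent parser to be very slow on a grammar of that size, and it does not terminate on left-recursive rules; the shift-reduce parser may miss parses because it does not backtrack. Names are illustrative.

```python
import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP | V NP PP
    PP -> P NP
    V -> "saw" | "ate" | "walked"
    NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
    Det -> "a" | "an" | "the" | "my"
    N -> "man" | "dog" | "cat" | "telescope" | "park"
    P -> "in" | "on" | "by" | "with"
""")


def compare(grammar, sentence):
    """Return (recursive-descent trees, shift-reduce trees) for one sentence."""
    tokens = sentence.split()
    rd = list(nltk.RecursiveDescentParser(grammar).parse(tokens))
    sr = list(nltk.ShiftReduceParser(grammar).parse(tokens))
    return rd, sr


def load_atis():
    """Load NLTK's ATIS grammar (needs the 'large_grammars' data package)."""
    return nltk.data.load("grammars/large_grammars/atis.cfg")
```

With the data installed, `compare(load_atis(), sentence)` applies both parsers to an ATIS sentence; short sentences are advisable for the recursive descent parser.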
Prerequisite:
• Basic idea of named entity recognition.
• Standard libraries for named entity recognition.
Pre-Lab Task:
1) What is a context-free grammar?
Ans:-
Ans:-
Ans:-
In Lab Task:
1) Mr. P, a student at Purdue University, is interested in attending a
specialization-based competition. The competition had a preliminary round in
which Mr. P was given the following statement: “Find the tool which uses the
algorithmic technique of dynamic programming to derive the parses of an
ambiguous sentence more efficiently, and try to implement this tool with the help
of NLTK.” Assuming you are Mr. P, try to implement it.
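The tool most likely meant here is a chart parser: it stores each completed sub-parse (an edge) in a chart, so every constituent is built once and shared between the alternative full parses instead of being re-derived as in backtracking. A sketch with NLTK's `ChartParser` on the ambiguous "groucho" grammar from the NLTK book, where "in my pajamas" can attach to the verb phrase or to the noun phrase.

```python
import nltk

# "groucho" grammar from the NLTK book: the PP can attach to VP or to NP
groucho_grammar = nltk.CFG.fromstring("""
    S -> NP VP
    PP -> P NP
    NP -> Det N | Det N PP | 'I'
    VP -> V NP | VP PP
    Det -> 'an' | 'my'
    N -> 'elephant' | 'pajamas'
    V -> 'shot'
    P -> 'in'
""")

parser = nltk.ChartParser(groucho_grammar)


def all_parses(sentence):
    """Every parse of the sentence, recovered from the shared chart."""
    return list(parser.parse(sentence.split()))


if __name__ == "__main__":
    for tree in all_parses("I shot an elephant in my pajamas"):
        print(tree)
```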
2) Implement the bottom-up left-corner parser and the left-corner parser with a
bottom-up filter using NLTK.
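NLTK's chart module provides both variants: `BottomUpLeftCornerChartParser` and `LeftCornerChartParser`, where the latter adds a bottom-up filter and requires a grammar without empty productions. A sketch on the ambiguous "groucho" grammar from the NLTK book (an illustrative choice); both parsers should find the same two parses and differ only in how many edges they build along the way.

```python
import nltk
from nltk.parse.chart import BottomUpLeftCornerChartParser, LeftCornerChartParser

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    PP -> P NP
    NP -> Det N | Det N PP | 'I'
    VP -> V NP | VP PP
    Det -> 'an' | 'my'
    N -> 'elephant' | 'pajamas'
    V -> 'shot'
    P -> 'in'
""")

parsers = {
    "bottom-up left-corner": BottomUpLeftCornerChartParser(grammar),
    "left-corner with bottom-up filter": LeftCornerChartParser(grammar),
}


def count_parses(tokens):
    """Number of parse trees each parser finds for the token list."""
    return {name: len(list(p.parse(tokens))) for name, p in parsers.items()}


if __name__ == "__main__":
    print(count_parses("I shot an elephant in my pajamas".split()))
```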
Date of Evaluation: