21AD3202 - Natural Language Processing - Record


LAB WORKBOOK

21AD3202 NATURAL LANGUAGE PROCESSING

Team NLP
K L UNIVERSITY | 21AD3202-NATURAL LANGUAGE PROCESSING
21AD3202 NATURAL LANGUAGE PROCESSING

LABORATORY WORKBOOK

STUDENT
NAME
REG. NO
YEAR
SEMESTER
SECTION
FACULTY


University Vision and Mission


Vision

To be a globally renowned university.

Mission

To impart quality higher education and to undertake research and extension with
emphasis on application and innovation that cater to the emerging societal needs
through all-round development of the students of all sections enabling them to be
globally competitive and socially responsible citizens with intrinsic values.

Department Vision and Mission


Vision

To be a department of international repute through continuous research, innovation
and industry-led curriculum.

Mission

To impart quality education with social consciousness and make students globally
competent.

Mission Statements

M1: Provide quality education in both the theoretical and applied foundations of
computer science & computer engineering.

M2: Train students effectively to apply their computational skills in solving
industrial, societal and real-world problems.

M3: Provide students a competitive advantage, emulous environment in the
ever-changing and challenging global workforce.

M4: Facilitate multi-disciplinary innovation to advance theoretical computer
science through experimental research.

Program Educational Objectives (PEOs)

S. No  PEO#  Statement
1      PEO1  Graduates will be able to practice engineering in a broad range of
             industrial, societal and real-world applications.
2      PEO2  Graduates will be able to pursue advanced education, research and
             development, by adapting creative and innovative practices in their
             professional careers.
3      PEO3  Graduates will be able to conduct themselves in a responsible,
             professional, and ethical manner.
4      PEO4  Graduates will be able to participate as leaders in their fields of
             expertise and in activities that support service and economic
             development throughout the world.

Table of Contents

0. Organization of the student lab workbook
1. Installation of Python, NLTK, Jupyter and preliminaries
2. Implementation of Word Tokenizer
3. Implementation of Sentence Tokenizer
4. Implementation of Paragraph Tokenizer
5. Implement various actions in Corpora
6. Implement probabilistic parsing
7. Learning Grammar
8. Conditional frequency distribution
9. Lexical Analyzer
10. WordNet
11. Context-Free Grammar
12. Named Entity Recognition


0: Organization of the student lab workbook

The laboratory framework includes a creative element but shifts the time-intensive
aspects outside of the two-hour closed laboratory period. Within this structure, each
laboratory includes three parts: Pre-Lab, In-Lab, and Post-Lab.
a. Pre-Lab
The Pre-Lab exercise is a homework assignment that links the lecture with the laboratory
period and typically takes 2 hours to complete. The goal is for students to synthesize the
information they learn in lecture with material from their textbook to produce a working
piece of software. Students attending a two-hour closed laboratory are expected to make
a good-faith effort to complete the Pre-Lab exercise before coming to the lab. Their work
need not be perfect, but their effort must be real (roughly 80 percent correct).
b. In-Lab
The In-Lab section takes place during the actual laboratory period. The first hour of the
laboratory period can be used to resolve any problems the students might have
experienced in completing the Pre-Lab exercises. The intent is to give constructive
feedback so that students leave the lab with working Pre-Lab software, a significant
accomplishment on their part. During the second hour, students complete the In-Lab
exercise to reinforce the concepts learned in the Pre-Lab. Students leave the lab having
received feedback on their Pre-Lab and In-Lab work.
c. Post-Lab
The last phase of each laboratory is a homework assignment that is done following the
laboratory period. In the Post-Lab, students analyse the efficiency or utility of a given
NLP technique. Each Post-Lab exercise should take roughly 120 minutes to complete.


2023-24 EVEN SEMESTER LAB CONTINUOUS EVALUATION

Columns: Sl No | Date | Experiment Name | Pre-Lab (5M) | In-Lab: Logic (10M),
Execution (10M), Result (10M), Analysis (5M) | Post-Lab (5M) | Viva Voce (5M) |
Total (50M) | Faculty Signature

Sl No  Experiment Name
1      Installation of Python, NLTK, Jupyter and preliminaries
2      Implementation of Word Tokenizer
3      Implementation of Sentence Tokenizer
4      Implementation of Paragraph Tokenizer
5      Implement various actions in Corpora
6      Implement probabilistic parsing
7      Learning Grammar
8      Conditional frequency distribution
9      Lexical Analyzer
10     WordNet
11     Context-Free Grammar
12     Named Entity Recognition


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


SUBJECT CODE: 21AD3202
NATURAL LANGUAGE PROCESSING WORKBOOK

1. Installation of Python, NLTK, Jupyter and preliminaries.


Date of the Session: / / Time of the Session: to

Prerequisite:
• Basic Knowledge on Natural Language Processing.
• Importing NLTK package.
• Methods of NLTK.

Pre-Lab Task:

1) What is Natural Language Processing?


Ans:-

2) What is NLTK?
Ans:-

3) Mention some methods available in NLTK package and explain them.


Ans:-


In Lab Task:
1) Basic Steps for Installation of NLTK in python.

PROCEDURE:

Step 1: Download the latest version of Python for Windows.

Step 2: Click on the downloaded .exe file to run it.

Step 3: Select “Customize installation”.

Step 4: Check all the features, especially “pip”, as it is needed to install NLTK, and
click on Next.

Step 5: Select the advanced options, choose the installation path and click on Install.

Step 6: Once the installation is successful, close the window.

Step 7: To work with the commands interactively, install Jupyter Notebook:

pip install notebook

Step 8: Copy the path of the Scripts folder to install NLTK in the same folder.

Step 9: To install NLTK, open the command prompt and type the command:

pip install nltk

Step 10: To run Jupyter, change to the appropriate folder in the command prompt and give
the following command:

jupyter notebook
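
Once the installation steps are done, a quick check from Python can confirm that the packages are visible to the interpreter. The sketch below uses only the standard library; the helper name `is_installed` is illustrative and is not part of NLTK or Jupyter.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the interpreter can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "nltk" and "notebook" are the packages installed with pip in the steps above.
    for name in ("nltk", "notebook"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If either package is reported as missing, re-running the corresponding `pip install` command from the same Python installation is usually the first thing to try.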

Writing space for the Problem: (For Student’s use only)


Post Lab Task:-


1) Steps for downloading datasets from NLTK.
After successfully installing NLTK, you can import it and download its
corpora with the following commands.

Command:
import nltk
nltk.download()

The NLTK downloader opens a window for downloading the datasets. The datasets are
large, so the download takes time. To test whether the datasets are installed properly,
try importing a dataset and using it.

Command:
from nltk.corpus import brown
brown.words()

2) Write the processing of NLTK.


Ans:-

3) Mention the uses of NLTK.


Ans:-


4) Although there are many other tools for natural language processing,
why is NLTK mostly preferred?

Ans:-

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: out of

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


2. Implementation of Word Tokenizer


Date of the Session: / / Time of the Session: to

Prerequisite:
• NLTK and jupyter notebook.
• Basic knowledge on methods available in NLTK
Pre-Lab Task:
1) Explain Natural Language processing in your own words?
Ans:-

2) List out the techniques of NLP?


Ans:-

3) List out the real life examples of NLP?


Ans:-


4) Define tokenization?
Ans:-

5) Define stemming ?
Ans:-

6) Explain about stop words?

Ans:-

7) Which tokenizer is used for languages other than English?
Ans:-

In Lab Task:

1) Chinnu wants to help his sister Pravallika pass an aptitude exam, so he wants to
test her skills in English. He decides to give her some sentences and ask for the
root word of each word in them. Implement Python code that helps Pravallika get
the root words of the given sentences. (Note: use tokenization and stemming.)

Sample Input:
Maximum students are Suffering from Insominia.
Sample Output:
Maximum;max
Students;Student
Are;are
Suffering;suffer
From;from
Insominia;somni
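
A minimal sketch of tokenization followed by stemming, assuming NLTK is installed. It uses `wordpunct_tokenize` and `PorterStemmer` because neither needs a separate data download (`word_tokenize` would need the punkt resource). The function name `stem_words` is illustrative, and the stems a real stemmer produces may differ from the hand-written sample output.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize


def stem_words(sentence):
    """Tokenize a sentence and pair each alphabetic token with its Porter stem."""
    stemmer = PorterStemmer()
    tokens = wordpunct_tokenize(sentence)
    # Punctuation tokens such as "." are skipped; only words are stemmed.
    return [(tok, stemmer.stem(tok)) for tok in tokens if tok.isalpha()]


if __name__ == "__main__":
    for word, stem in stem_words("Maximum students are Suffering from Insominia."):
        print(f"{word};{stem}")
```

Other stemmers in `nltk.stem`, such as `SnowballStemmer`, can be swapped in the same way if a different stemming behaviour is wanted.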

Writing space for the Problem: (For Student’s use only)


2) Chinnu is a comic editor at a company. One day, while editing a script, he became
enthusiastic about its story, but he has no time to read the whole script. He decides
to understand the story line by learning about the characters, so he wants to
separate the words in each sentence to identify them. Implement a Python program
that splits the words of a given sentence and displays both the split words and
the word count, using a tokenizer function.

Sample input:
Chinnu is the one who protected lilly.
Sample output:
['chinnu', 'is', 'the', 'one', 'who', 'protected', 'lilly']
Count: 7
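
The splitting and counting can be sketched without any library, as below. This is a plain regular-expression tokenizer, not NLTK's; in the lab an NLTK tokenizer would normally take its place. The name `tokenize_words` is illustrative.

```python
import re


def tokenize_words(sentence):
    """Return the lowercase word tokens of a sentence, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


if __name__ == "__main__":
    words = tokenize_words("Chinnu is the one who protected lilly.")
    print(words)
    print("Count:", len(words))
```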

Writing space for the Problem: (For Student’s use only)

Post Lab Task:

1) Albert is an English teacher in a school. He wants to teach the students about
connectors and prepositions and their usage in sentence formation. To help them
understand, he plans to give the students different sentences and have them remove
the connectors and prepositions from each sentence. Implement Python code to help
the students remove the connectors and prepositions. (Note: connectors and
prepositions are the stop words; read them from a text file to produce the
required output.)

Sample input:
There is a change in a medium.
Sample output:
Before
There is a change in a medium.
After
['there', 'change', 'medium', '.']
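
One way to sketch stop-word removal in plain Python is shown below. The stop-word set and the helper names are assumptions made for illustration; the exercise expects the list to come from a text file, which `load_stop_words` shows for a hypothetical file with one word per line.

```python
import re

# Illustrative stop-word list; the exercise asks for these to be read from a text file.
STOP_WORDS = {"a", "an", "the", "is", "in", "and", "of", "to"}


def load_stop_words(path):
    """Read one stop word per line from a text file (file layout is an assumption)."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def remove_stop_words(sentence, stop_words=STOP_WORDS):
    """Lowercase and tokenize a sentence, then drop tokens found in stop_words."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    return [tok for tok in tokens if tok not in stop_words]


if __name__ == "__main__":
    sentence = "There is a change in a medium."
    print("Before")
    print(sentence)
    print("After")
    print(remove_stop_words(sentence))
```

NLTK also ships a stop-word corpus (`nltk.corpus.stopwords`), which needs a one-time download and could replace the hand-written set.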

Writing space for the Problem: (For Student’s use only)


(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: out of

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


3. Implementation of Sentence Tokenizer


Date of the Session: / / Time of the Session: to

Prerequisite:
• Linguistic rules and how to implement them.
• Sentence tokenize.
• Chunking.
Pre-Lab Task:

1) What is sentence tokenization?

Ans:-

2) What is sentence tokenization?

Ans:-

3) What is sentence tokenization?

Ans:-


4) What is shallow parsing and what is the minimum number of levels?


Ans:-

5) Define the punkt and averaged_perceptron_tagger resources in NLTK.

Ans:-


In Lab Task:
1) Dany is struggling with an assignment given by her tuition master. She has to
read a paragraph and generate tokens from it using a sentence tokenizer. She should
also find the part of speech of each word in the individual tokens that she has
generated, using Python.
Sample Input:
Notice of a bid advertisement shall be published in at least one local newspaper
and in one trade publication at least 30 days in advance of sale. If applicable, the
notice must identify the reservation within which the tracts to be leased are found.
Specific descriptions of the tracts shall be available at the office of the
superintendent. The complete text of the advertisement shall be mailed to each
person listed on the appropriate agency mailing list.
Sample Output:
Notice – NNP, bid – NN, advertisement – NN, and so on…

“Government of India is taking all necessary steps to ensure that we are prepared
well to face the challenge and threat posed by the growing pandemic of COVID-19
the Corona Virus.”

Writing space for the Problem:(For Student’s use only)


2) Sheena is working on her literature assignment and needs to draw a parse tree
for a given sentence that satisfies a given grammar rule. Draw the parse tree in
Python using chunk parsing.
Sample Input:
Sentence = The little mouse ate the fresh cheese
Rule = r"NP: {<DT>?<JJ>*<NN>}"
Sample Output:
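
The chunk rule in the sample input can be tried out with NLTK's `RegexpParser`, as in the sketch below. It assumes NLTK is installed, and the part-of-speech tags are written by hand so that no tagger model has to be downloaded; in the lab the tags would normally come from `nltk.pos_tag`.

```python
import nltk

# Hand-tagged tokens, so the sketch does not depend on downloading a tagger model.
tagged = [
    ("The", "DT"), ("little", "JJ"), ("mouse", "NN"),
    ("ate", "VBD"),
    ("the", "DT"), ("fresh", "JJ"), ("cheese", "NN"),
]

# NP chunk: optional determiner, any number of adjectives, then a noun.
rule = r"NP: {<DT>?<JJ>*<NN>}"
chunker = nltk.RegexpParser(rule)
tree = chunker.parse(tagged)

# Collect the words of every NP chunk found in the sentence.
noun_phrases = [
    " ".join(word for word, _tag in subtree.leaves())
    for subtree in tree.subtrees()
    if subtree.label() == "NP"
]
print(tree)
print(noun_phrases)
# tree.draw() opens a window with the parse tree when a display is available.
```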

Writing space for the Problem:(For Student’s use only)

Post Lab Task
1) Mary wants to prepare a list of events being held in her country, for which she
needs the name, location and time of each event, the name of the organization,
etc. She has all the details of the events; all she needs is for the paragraph to
be modified so that the required details are highlighted in each sentence. Help
her find the required details using Python.
Sample Input:
Larry Page and Sergey Brin, two students at Stanford University, USA, started
Backrub in early 1996. They made it into a company, Google Inc., on September
7, 1998 at a friend's garage in Menlo Park, California. In February 1999, the
company moved to 165 University Ave., Palo Alto, California, and then moved
to another place called the Googleplex. In September 2001, Google's rating
system (PageRank, for saying which information is more helpful) got a U.S.
Patent. The patent was to Stanford University, with Lawrence (Larry) Page as
the inventor (the person who first had the idea). Google makes a percentage of
its money through America Online and InterActiveCorp. It has a special group
known as the Partner Solutions Organization (PSO) which helps make contracts,
helps making accounts better, and gives engineering help.

Writing space for the Problem:(For Student’s use only)


(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: out of

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


4. Implementation of Paragraph Tokenizer


Date of the Session: / / Time of the Session: to
Prerequisite:
• NLTK and jupyter notebook.

Pre-Lab Task:
1) What is latent Dirichlet allocation?

Ans:-

2) What is NLTK and how is it useful in NLP text analysis?

Ans:-

3) Mention the classes and methods in nltk.tokenize?

Ans:-


4) Write down the syntax for:

a) Importing nltk library.


Ans:-

b) Download the packages in nltk

Ans:-

c) Importing sentence tokenizer

Ans:-

d) Importing word tokenizer

Ans:-

In Lab Task:
1) Chinky is very naughty in class and is known for troubling her teacher.
One fine day her teacher decided to punish her: she gave her a set of
paragraphs and told her to count the number of sentences, words and
paragraphs in the given book. Write a Python program using the NLTK
tokenizers to help her complete the task.
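
The three counts can be sketched with simple regular expressions, as below. This is a naive, library-free approximation: it assumes paragraphs are separated by blank lines and that sentences end in `.`, `!` or `?`, so abbreviations such as "U.S." would be split incorrectly. In the lab, NLTK's `sent_tokenize` and `word_tokenize` would normally replace the regular expressions. The name `count_units` is illustrative.

```python
import re


def count_units(text):
    """Return (paragraphs, sentences, words) using simple regular expressions."""
    # Paragraphs: blocks of text separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences: split after ., ! or ? when followed by whitespace.
    sentences = [
        s
        for p in paragraphs
        for s in re.split(r"(?<=[.!?])\s+", p.strip())
        if s
    ]
    # Words: runs of letters, digits or underscores.
    words = re.findall(r"\w+", text)
    return len(paragraphs), len(sentences), len(words)


if __name__ == "__main__":
    sample = "NLTK is a toolkit. It has tokenizers.\n\nThis is a second paragraph!"
    print(count_units(sample))
```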

Writing space for the Problem: (For Student’s use only)


2) Write Python code using Latent Dirichlet Allocation (LDA) for topic modelling
(i.e. apply LDA to a set of documents and split them into topics).
Writing space for the Problem:(For Student’s use only)

Post Lab Task:
1) Using Latent Dirichlet Allocation, answer the following:
a) From the dataset mentioned above, identify the three most
probable words for the first topic.
b) Analyze the dataset thoroughly and find the sum of the
probabilities assigned to the top 50 words in the 3rd topic.

Writing space for the Problem: (For Student’s use only)


(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: out of _____

Full Name of the Evaluator:

Signature of the Evaluator


Date of Evaluation:


5. Implement various actions in Corpora


Date of the Session: / / Time of the Session: to

Prerequisite:
• Understanding corpora and its purpose.
• Understanding types of corpora and data contained in them.

Pre-Lab Task:
1. Define corpus. What are different types of corpora and explain them?
Ans:-

2) Austin is fond of poems. He wants to perform a few mathematical
operations on his favourite ‘blake-poems’.
a) Import Gutenberg corpus and print all the file identifiers in this corpus.

Ans:-

b) Find the number of words in this poem.

Ans:-


c) Find the number of times the word ‘blake’ occurs in the poem.

Ans:-

In Lab Task:
1) Henry wants to train a text-to-speech engine with large amounts of English
words based on genres. He decides to use the Brown corpus. To do this he needs
to understand the Brown corpus. So, help him with importing the Brown corpus and
performing the following actions.
a. Print all the genres present in it.
b. Select ‘editorial’ genre and print the different types of words present in
it in sorted order.
c. Select the range [3493:3499] in the above sorted list and find their
count in ‘fiction’, ‘lore’, ‘belles_lettres’, ’government’, ‘learned’
genres.

Writing space for the Problem:(For Student’s use only)

2) John wants to build a machine learning model to identify a speech given by
the president of the USA. To train the model, he wants to use the inaugural
corpus. To help him understand the inaugural corpus, implement the following
actions.
a) Print all the file identifiers.
b) Plot a graph where the y-axis has the count of the words ‘president’ and
‘congress’ in each file and the x-axis has the year of the presidential address.

Writing space for the Problem:(For Student’s use only)

Post Lab Task
1) Import reuters corpus and perform following actions.
a) Print all the categories present in corpus.
b) Print the file identifiers which has category named ‘coffee’.

Writing space for the Problem: (For Student’s use only)

2) Create your own corpus with the NLP lab manual in a text file. Print the
number of sentences in the text file.
Writing space for the Problem: (For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: out of

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


6. Implement probabilistic parsing


Date of the Session: / / Time of the Session: to

Prerequisite:
• Understanding Probabilistic Parsing and its purpose.
• Understanding the types of parsing: top-down and bottom-up.
Pre-Lab Task:
1) Define ambiguity in a parse tree and draw all possible parse trees
for the given expression P=Q+R*P.
Ans:-

2) Distinguish between Top-down and bottom up parsing with an example.


Ans:-

3) Apply the rules of parsing to the following expression: “A boy can do
nothing for girl but be anything for her.”

Ans:-

In Lab Task:

1) Implement the Cocke-Younger-Kasami (CYK) algorithm for generating parse
trees for a given input sentence in Python.
Sample Input:
S -> NP VP
PP -> P NP
NP -> Det N
NP -> Det N PP
NP -> 'I'
VP -> V NP
VP -> VP PP
Det -> 'an'
Det -> 'my'
N -> 'elephant'
N -> 'pajamas'
V -> 'shot'
P -> 'in'
Sample Output:
[S [NP 'I'][VP [V 'shot'][NP [NP0 [Det 'an'][N 'elephant']][PP [P 'in'][NP [Det 'my'][N 'pajamas']]]]]]

Writing space for the Problem:(For Student’s use only)

Post Lab Task:
1) Write the variables for all length-6 substrings when deciding membership in a
CFL using the CKY (Cocke-Younger-Kasami) algorithm. Consider the grammar
S -> ε | AB | XB
T -> AB | XB
X -> AT
A -> a
B -> b
Is w = aaabbb in L(G)?
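
The table-filling step of CKY for this grammar can be sketched in plain Python as follows. The grammar is already in Chomsky normal form apart from the ε rule, which is handled as a special case; the names `cky_table` and `in_language` are illustrative. `table[(i, j)]` holds the variables that derive the substring `word[i:j]`, so `table[(0, 6)]` is the entry for the only length-6 substring of `aaabbb`.

```python
from itertools import product

# Grammar from the exercise: binary rules keyed by their right-hand side.
BINARY_RULES = {
    ("A", "B"): {"S", "T"},
    ("X", "B"): {"S", "T"},
    ("A", "T"): {"X"},
}
# Terminal rules: A -> a, B -> b.
LEXICAL_RULES = {"a": {"A"}, "b": {"B"}}


def cky_table(word):
    """Fill table[(i, j)] with the variables that derive word[i:j]."""
    n = len(word)
    table = {}
    for i, symbol in enumerate(word):
        table[(i, i + 1)] = set(LEXICAL_RULES.get(symbol, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            # Try every split point k of the span word[i:j].
            for k in range(i + 1, j):
                for left, right in product(table[(i, k)], table[(k, j)]):
                    cell |= BINARY_RULES.get((left, right), set())
            table[(i, j)] = cell
    return table


def in_language(word, start="S"):
    """Return True if the start symbol derives the whole word."""
    if word == "":
        return True  # covered by the rule S -> epsilon
    return start in cky_table(word)[(0, len(word))]


if __name__ == "__main__":
    w = "aaabbb"
    print(sorted(cky_table(w)[(0, len(w))]))
    print(in_language(w))
```

For `aaabbb` the top cell contains S, so the word is in L(G); a word such as `aabbb` leaves the top cell empty.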

Writing space for the Problem: (For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: out of

Full Name of the Evaluator:

Signature of the Evaluator

Date of Evaluation:


7. Learning Grammar
Date of the Session: / / Time of the Session: to

Prerequisite:
Concepts Based on:
➢ Parsing
➢ PCFG
➢ CFG
➢ Viterbi Parser

Pre-Lab Task:

1) What are the types of parsing and their disadvantages?


Ans:-

2) What is Viterbi PCFG parsing?

Ans:-

3) Explain about top-down and bottom-up parsing and give examples.

Ans:-

In Lab Task:

1) Vicky wants to know the difference between the two toy Probabilistic
Context-Free Grammars and the productions each one provides for each word in a
given sentence. He also wants the probability of the given sentence when it is
parsed, using Python:
a. Print the words present in toy_pcfg1 and the number of productions.
b. Print the words present in toy_pcfg2 and the number of productions.
c. Find the probability of the given sentence using Viterbi PCFG parsing; all
the words in the sentence must be present in toy_pcfg1 or toy_pcfg2.

Sample Input:
Jack ate the cookie

Sample Output:
(S (NP (Name Jack)) (VP (V ate) (NP (Det the) (N cookie)))) (p=0.000883756)
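
A minimal sketch of Viterbi PCFG parsing, assuming NLTK is installed. The grammar below is a hypothetical toy grammar with made-up probabilities; it is not NLTK's `toy_pcfg1` or `toy_pcfg2`, so the probability it prints differs from the sample output.

```python
from nltk import PCFG
from nltk.parse import ViterbiParser

# Hypothetical toy grammar; the probabilities for each left-hand side sum to 1.
grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Name [0.4] | Det N [0.6]
    VP -> V NP [1.0]
    Name -> 'Jack' [1.0]
    Det -> 'the' [1.0]
    N -> 'cookie' [1.0]
    V -> 'ate' [1.0]
""")

parser = ViterbiParser(grammar)
tokens = "Jack ate the cookie".split()

# The Viterbi parser returns the single most probable parse.
best_tree = next(parser.parse(tokens))
print(len(grammar.productions()), "productions")
print(best_tree)
print(best_tree.prob())
```

The probability of the tree is the product of the probabilities of the productions used, here 0.4 for `NP -> Name` times 0.6 for `NP -> Det N`, with every other rule contributing 1.0.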

Writing space for the Problem: (For Student’s use only)

Post Lab Task

1) Nicky is trying to learn how to parse the words in a given sentence so that she
can win a bet she has made with her friend. She has to parse the words in the
given sentence using bottom-up parsing in Python, with a Context-Free Grammar.

Sample Input:
the cat chased the dog

Sample Output:
(S (NP the (N cat)) (VP (V chased) (NP the (N dog))))
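
Bottom-up parsing can be sketched as a small shift-reduce loop in plain Python. The grammar is a hypothetical one chosen to reproduce the sample output, and the parser is greedy: it always reduces when it can and never backtracks, which is enough for this sentence but not for ambiguous grammars. NLTK's `ShiftReduceParser` or `ChartParser` would normally be used in the lab.

```python
# Hypothetical grammar: each right-hand side (tuple of symbols) maps to its left-hand side.
RULES = {
    ("NP", "VP"): "S",
    ("the", "N"): "NP",
    ("V", "NP"): "VP",
    ("cat",): "N",
    ("dog",): "N",
    ("chased",): "V",
}


def label(node):
    """A tree is a tuple (label, children...); a bare string is a word."""
    return node if isinstance(node, str) else node[0]


def try_reduce(stack):
    """Replace the longest right-hand side found on top of the stack, if any."""
    for size in range(len(stack), 0, -1):
        rhs = tuple(label(node) for node in stack[-size:])
        if rhs in RULES:
            children = stack[-size:]
            del stack[-size:]
            stack.append((RULES[rhs], *children))
            return True
    return False


def shift_reduce_parse(sentence):
    """Return the parse tree, or None if the sentence cannot be reduced to S."""
    stack, buffer = [], sentence.split()
    while True:
        while try_reduce(stack):
            pass
        if not buffer:
            break
        stack.append(buffer.pop(0))  # shift the next word onto the stack
    return stack[0] if len(stack) == 1 and label(stack[0]) == "S" else None


def bracketed(node):
    """Format a tree in the bracketed style used by the sample output."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [bracketed(c) for c in node[1:]]) + ")"


if __name__ == "__main__":
    print(bracketed(shift_reduce_parse("the cat chased the dog")))
```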

Writing space for the Problem: (For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: out of

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


8. Conditional frequency distribution


Date of the Session:____/ _____/ _____ Time of the Session: _____ to _____

Prerequisite:
• Understanding conditional frequency distribution and its
applications.
• Understanding concept of bigrams.
Pre-Lab Task:

1) Define Conditional Frequency Distributions and its features.

Ans:-

2) What are Bigrams and write python code to display bigrams by taking any
sentence of length 15.
Ans:-


In Lab Task:
1) Steven has a task to count the number of words in a text given by his friends,
but the problem is to find the counts per genre using a conditional frequency
distribution. Help him by using the Brown corpus: take the genres ['fiction',
'lore'] and print the most common words in both conditions to complete his
task successfully.
Writing space for the Problem:(For Student’s use only)


2) Find the number of words with frequency [2,4,7,9] and conditions
['Chickasaw', 'English', 'German_Deutsch'] using the Universal Declaration of
Human Rights corpus and tabulate the results.
Writing space for the Problem:(For Student’s use only)

POST-LAB TASK:-
1. Write Python code to generate a sentence of 25 words from the most likely
following words of a given word, using a conditional frequency distribution
and bigrams.
Use the following data to build the model:
Corpus: “genesis” Text: “english-web.txt” Word: “darkness”
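
The idea behind the generator can be sketched in plain Python: count, for every word, which words follow it (a conditional frequency distribution over bigrams), then repeatedly append the most frequent follower. The short word list below is an assumption used so the sketch runs without any corpus download; in the lab the words would come from `nltk.corpus.genesis.words("english-web.txt")`, and `nltk.bigrams` with `nltk.ConditionalFreqDist` would replace the helper functions.

```python
from collections import Counter, defaultdict


def build_cfd(words):
    """Map each word to a Counter of the words that follow it (bigram counts)."""
    cfd = defaultdict(Counter)
    for first, second in zip(words, words[1:]):
        cfd[first][second] += 1
    return cfd


def generate(cfd, seed, length=25):
    """Start from seed and repeatedly append the most frequent follower."""
    output = [seed]
    # Stop early if the last word was never seen with a follower.
    while len(output) < length and cfd[output[-1]]:
        output.append(cfd[output[-1]].most_common(1)[0][0])
    return " ".join(output)


if __name__ == "__main__":
    # Illustrative stand-in text, not the genesis corpus itself.
    text = ("and darkness was upon the face of the deep and the spirit "
            "moved upon the face of the waters")
    cfd = build_cfd(text.split())
    print(generate(cfd, "darkness", length=8))
```

Because the most frequent follower is always chosen, the output quickly falls into a loop; this is expected behaviour for this simple model.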
Writing space for the Problem:(For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: out of

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


9. Lexical Analyzer
Date of the Session:_____/_____/_______ Time of the Session:_____ to ______

Prerequisite:
• Toolbox and text files.
Pre-Lab Task:
1) What is the purpose of lexicon?
Ans:-

2) What is the role of lexical analyzer in NLP?


Ans:-

3) What is a corpus? How to create wordlist corpus?


Ans:-

4) Implement a python program to create a wordlist corpus?

Ans:-

5) List out few tags for the parts of speech?


Ans:-

In Lab Task:
1) Gowtham is an English lecturer who plans to conduct an exam for all the
students in the college. In the exam, students have to identify the parts of
speech in a given paragraph. Because there are only a few English lecturers to
evaluate all the students, he asks some other lecturers to help, but they do not
have enough knowledge of the exam requirements. To evaluate properly, they decide
to use a wordlist corpus to help identify the parts of speech. They have to
remove the stop words from the paragraph and identify the parts of speech
in the paragraph. Implement Python code to remove stop words and identify
parts of speech.
Sample input:-
"gowtham, viswa and hari are my good friends. " "viswa is getting married next
year. " "hari is the innocent among three." "gowtham is intelligent among them."
Sample output:-
[('gowtham', 'NN'), (',', ','), ('viswa', 'NN'), ('and', 'CC'), ('hari', 'NN'),
('are', 'VBP'), ('my', 'PRP$'), ('good', 'JJ'), ('friends', 'NNS'), ('.', '.')]
[('viswa', 'NN'), ('is', 'VBZ'), ('getting', 'VBG'), ('married', 'VBN'),
('next', 'JJ'), ('year', 'NN'), ('.', '.')]
[('hari', 'NN'), ('is', 'VBZ'), ('the', 'DT'), ('innocent', 'JJ'), ('among', 'IN'),
('three.gowtham', 'NN'), ('is', 'VBZ'), ('intelligent', 'JJ'), ('among', 'IN'),
('them', 'PRP'), ('.', '.')]

Writing space for the Problem:(For Student’s use only)

Post Lab Task:
1) An analyst wants to analyse the complete script of a new movie to make sure
there are no mistakes in it, so that the document is clean and understandable
to the editors. The analyst wants the script converted to lower case, with all
non-word characters and all punctuation removed, and divided into words
accordingly. Implement Python code in toolbox using BoW (bag of words).

Sample Input:
Beans. I was trying to explain to somebody as we were flying in, that’s corn. That’s
beans. And they were very impressed at my agricultural knowledge. Please give it up
for Amaury once again for that outstanding introduction. I have a bunch of good
friends here today, including somebody who I served with, who is one of the finest
senators in the country, and we’re lucky to have him, your Senator, Dick Durbin is
here. I also noticed, by the way, former Governor Edgar here, who I haven’t seen in
a long time, and somehow he has not aged and I have. And it’s great to see you,
Governor. I want to thank President Killeen and everybody at the U of I System for
making it possible for me to be here today. And I am deeply honored at the Paul
Douglas Award that is being given to me. He is somebody who set the path for so
much outstanding public service here in Illinois.

Sample Output:-
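
The cleaning steps named in the task (lower-casing, removing non-word characters and punctuation, then counting words) can be sketched in plain Python as below. The function name `bag_of_words` and the choice to drop apostrophes, so that "That’s" becomes "thats", are assumptions made for illustration.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, strip non-word characters, and count each word."""
    lowered = text.lower()
    # Drop straight and curly apostrophes so contractions stay as one token.
    no_apostrophes = re.sub(r"['’]", "", lowered)
    # Replace every remaining non-alphanumeric character with a space.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", no_apostrophes)
    return Counter(cleaned.split())


if __name__ == "__main__":
    bag = bag_of_words("Beans. That's corn. That’s beans.")
    print(bag.most_common())
```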

Writing space for the Problem: (For Student’s use only)


(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: out of

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


10. WORDNET
Date of the Session: / / Time of the Session: to

Prerequisite:

• NLTK, Importing Word Net


Pre-Lab Task:
1) What are WordNet and Synset?
Ans:-

2) What are hypernyms and hyponyms?


Ans:-

3) For each of the words below, mention its POS (part of speech) using Synset:
a) Silently
b) Lamborghini
c) Beauty
d) Maldives

Ans:-

4) Write a python program to find synonym and antonym using NLTK wordnet
(Given word = ”bad” )
Ans:-

5) What are Lexical Relations, give some examples of lexical relations.


Ans:-

In Lab Task:
1) Write down the syntax for the following:
(a) Import word net, Use the term "hello" to find Synsets
(b) Using Synset find the element in the 0th index, Just the word (using lemmas)
(c) Name, Definition of that first (0th index) Synset and examples of the word.
(d) Discern synonyms and antonyms in Synset
(e) Discern Hypernyms and Hyponyms in Synset

Writing space for the Problem:(For Student’s use only)

2) I. Given two words, calculate the similarity between the words
(a) By using Path Similarity
(b) By using Wu-Palmer Similarity
Word1 = car & Word2 = bar
II. Mention other methods by which the distance between words can be
defined (i.e. other than the methods mentioned above).

Writing space for the Problem:(For Student’s use only)

Post Lab Task:
1) Explain how we can find out the semantic similarity of the sentences (it can
be two or more).
2) Draw the flowchart of wordnet hierarchy.

Writing space for the Problem:(For Student’s use only)


(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: out of

Full Name of the Evaluator:

Signature of the Evaluator

Date of Evaluation:

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SUBJECT CODE: 21AD3202
NATURAL LANGUAGE PROCESSING WORKBOOK

10. Context Free Grammar


Date of the Session:_____/ ____ /_______ Time of the Session: ______to________

Prerequisite:
• NLTK.
• Basic knowledge of context-free grammars.
• Basic knowledge of different types of parsing techniques.
Pre-Lab Task:
1) What is a recursive descent parser?
Ans:-

2) What is a parser? Mention the different types of parsers.


Ans:-

3) What is a shift-reduce parser?


Ans:-


4) Work through In Lab problem 1 manually.


Ans:-

In Lab Task:

1. Kriya, a student at Cambridge University, attends a competition on the theme
“Parsing”. She is given the following explanation of the problem: “The
simplest kind of parser interprets a grammar as a specification of how to
break a high-level goal into several lower-level subgoals. The top-level goal
is to find an S. The S → NP VP production permits the parser to replace this
goal with two subgoals: find an NP, then find a VP. Each of these subgoals
can be replaced in turn by sub-subgoals, using productions that have NP and
VP on their left-hand side. Eventually, this expansion process leads to
subgoals such as: find the word telescope. Such subgoals can be directly
compared against the input sequence, and succeed if the next word is
matched. If there is no match, the parser must back up and try a different
alternative.” The explanation above describes a parser. Help her identify the
parsing technique and write Python code for it. Hint: This is a top-down
parsing technique.
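
The goal/subgoal expansion described in the problem can be sketched in plain Python without NLTK. This is a minimal illustration under stated assumptions, not a reference solution: the toy grammar, the tuple-based tree format, and the function names `expand`, `expand_sequence`, and `parse` are all hypothetical choices made for this example. NLTK offers its own class for this technique, `nltk.RecursiveDescentParser`, which takes a grammar built with `nltk.CFG.fromstring`. The toy grammar deliberately avoids left-recursive productions, since a top-down parser of this kind would loop forever on them.

```python
# Toy grammar (hypothetical): nonterminal -> list of alternative right-hand sides.
# Any symbol that is not a key of GRAMMAR is treated as a word (terminal).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("Det", "N", "PP")],
    "VP": [("V", "NP"), ("V", "NP", "PP")],
    "PP": [("P", "NP")],
    "Det": [("the",), ("a",)],
    "N": [("man",), ("dog",), ("telescope",)],
    "V": [("saw",)],
    "P": [("with",)],
}

def expand(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` can match tokens[pos:]."""
    if symbol not in GRAMMAR:
        # Terminal subgoal: succeed only if the next input word matches.
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    # Nonterminal goal: try each production in turn. Moving on to the next
    # alternative after a failed match is the "back up" step.
    for production in GRAMMAR[symbol]:
        for children, end in expand_sequence(production, tokens, pos):
            yield (symbol, *children), end

def expand_sequence(symbols, tokens, pos):
    """Match the subgoals in `symbols` one after another, left to right."""
    if not symbols:
        yield (), pos
        return
    for first_tree, mid in expand(symbols[0], tokens, pos):
        for rest_trees, end in expand_sequence(symbols[1:], tokens, mid):
            yield (first_tree, *rest_trees), end

def parse(tokens, start="S"):
    """Return every tree for `start` that covers the whole token list."""
    return [tree for tree, end in expand(start, tokens, 0) if end == len(tokens)]

for tree in parse("the man saw a dog with a telescope".split()):
    print(tree)
```

With this grammar the example sentence yields two trees, because the prepositional phrase can attach either to the noun phrase or to the verb phrase.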

Writing space for the Problem:(For Student’s use only)

2. Implement the parser that tries to find sequences of words and phrases that
correspond to the right-hand side of a grammar production and replaces them
with the left-hand side, until the whole sentence is reduced to an S. Hint: The
parser described above is a bottom-up parser.
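
The shift and reduce steps described above can be sketched in plain Python as follows. The grammar, the tuple tree format, and the names `label` and `shift_reduce` are hypothetical choices for this illustration. The strategy is deliberately simple: reduce whenever the top of the stack matches a right-hand side, otherwise shift the next word. There is no backtracking, so a parser like this can fail on a sentence the grammar actually allows. NLTK provides `nltk.ShiftReduceParser` for the same technique.

```python
# Toy grammar (hypothetical) as (left-hand side, right-hand side) pairs.
PRODUCTIONS = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("Det", ("the",)),
    ("Det", ("a",)),
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("V", ("chased",)),
]

def label(item):
    """Stack items are words (str) or trees (tuples whose first element is the label)."""
    return item if isinstance(item, str) else item[0]

def shift_reduce(tokens, start="S"):
    """Return a tree for `tokens`, or None if this greedy strategy gets stuck."""
    stack, remaining = [], list(tokens)
    while True:
        # REDUCE: replace a right-hand side on top of the stack by its left-hand side.
        for lhs, rhs in PRODUCTIONS:
            n = len(rhs)
            if len(stack) >= n and tuple(label(x) for x in stack[-n:]) == rhs:
                stack[-n:] = [(lhs, *stack[-n:])]
                break
        else:
            # No reduction applied: SHIFT the next word, or stop when input is used up.
            if not remaining:
                break
            stack.append(remaining.pop(0))
    if len(stack) == 1 and label(stack[0]) == start:
        return stack[0]
    return None

print(shift_reduce("the dog chased a cat".split()))
```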

Writing space for the Problem:(For Student’s use only)

Post Lab Task:
1) Implement the recursive descent and shift-reduce parsing techniques for a
large context-free grammar (the ATIS grammar).

Writing space for the Problem:(For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: out of

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SUBJECT CODE: 21AD3202
NATURAL LANGUAGE PROCESSING WORKBOOK

11. Named Entity Recognition

Date of the Session:_____/_____/______Time of the Session: ______ to ________

Prerequisite:
• Basic idea on Named Entity Recognition.
• Standard Libraries to use Named Entity Recognition.

Pre-Lab Task:
1) What is a context-free grammar?

Ans:-

2) What is a chart parser?

Ans:-

3) What are the different types of chart parsers?

Ans:-

In Lab Task:
1) Mr. P, a student at Purdue University, is interested in attending a
specialization-based competition. The competition has a preliminary round in
which Mr. P is given the following statement: “Find the tool which uses the
algorithmic technique of dynamic programming to derive the parses of an
ambiguous sentence more efficiently and try to implement this tool with the help
of NLTK.” Putting yourself in Mr. P's place, implement it.
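
The dynamic-programming idea in the statement can be illustrated with a small table-filling sketch in plain Python. Note the assumptions: this uses the CKY algorithm over a hypothetical toy grammar in Chomsky normal form, which is one dynamic-programming technique and not the strategy used by NLTK's chart parsers (NLTK's entry point for those is `nltk.ChartParser`). The names `BINARY`, `LEXICON`, and `count_parses` are invented for this example. Each table cell stores how many trees of each category cover a span, so every sub-result is computed once and reused.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (hypothetical).
# BINARY maps a pair of child categories (B, C) to every parent A with A -> B C.
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],
    ("Det", "N"): ["NP"],
    ("NP", "PP"): ["NP"],
    ("P", "NP"): ["PP"],
}
# LEXICON maps each word to its possible categories.
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"], "a": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def count_parses(tokens, start="S"):
    """Count the distinct parse trees of `tokens` rooted in `start`."""
    n = len(tokens)
    # chart[(i, j)][A] = number of trees rooted in A that span tokens[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for category in LEXICON.get(word, []):
            chart[(i, i + 1)][category] += 1
    # Fill longer spans from shorter ones, so each sub-span is solved only once.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for left, left_count in list(chart[(i, k)].items()):
                    for right, right_count in list(chart[(k, j)].items()):
                        for parent in BINARY.get((left, right), []):
                            chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)][start]

print(count_parses("I saw the man with a telescope".split()))  # 2
```

The example sentence is ambiguous (the telescope can belong to the seeing or to the man), and the table reports both readings without re-parsing the shared phrases.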

Writing space for the Problem:(For Student’s use only)


2) Implement the Bottom-Up Left-Corner and the Left-Corner with Bottom-Up Filter
chart parsers using NLTK.

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: out of

Full Name of the Evaluator:

Signature of the Evaluator

Date of Evaluation:
