
DELHI TECHNOLOGICAL UNIVERSITY

SE-316
NATURAL LANGUAGE PROCESSING

Department of Software Engineering


Delhi Technological University
Bawana Road, Delhi-110042

Submitted by
Prashant Tiwari
Roll Number: 2K20/IT/103
Batch: IT-B

Submitted to: Dr. Divyashikha Sethia


Department of Software Engineering
Delhi Technological University
INDEX

S. No.  Experiment                                                            Date

1.      Import nltk and download the ‘stopwords’ and ‘punkt’ packages         13-01-2023

2.      Import spacy and load the language model.                             13-01-2023

3.      WAP in python to tokenize a given text.                               20-01-2023

4.      WAP in python to get the sentences of a text document.                03-03-2023

5.      WAP in python to tokenize text with stopwords as delimiters.          03-02-2023

6.      WAP in python to add custom stop words in spaCy.                      03-02-2023

7.      WAP to remove punctuations, perform stemming, lemmatize
        given text and extract usernames from emails                          24-02-2023

8.      WAP to do spell correction, extract all nouns, pronouns and
        verbs in a given text                                                 07-03-2023

9.      WAP to find similarity between two words and classify a text
        as positive/negative sentiment                                        31-03-2023
EXPERIMENT - 1
AIM : Import nltk and download the ‘stopwords’ and ‘punkt’
packages

CODE :
import nltk

nltk.download('stopwords')
nltk.download('punkt')

OUTPUT :
EXPERIMENT - 2
AIM : Import spacy and load the language model

CODE :
import spacy

# The models must be downloaded once beforehand:
#   python -m spacy download en_core_web_sm
#   python -m spacy download xx_ent_wiki_sm
nlp_eng = spacy.load('en_core_web_sm')    # English pipeline
nlp_multi = spacy.load('xx_ent_wiki_sm')  # multilingual NER model

OUTPUT :
EXPERIMENT - 3
AIM : WAP in python to tokenize a given text

CODE :
from nltk import word_tokenize

text = ("Last week, the University of Cambridge shared its own research "
        "that shows if everyone wears a mask outside home, dreaded 'second "
        "wave' of the pandemic can be avoided.")
tokens = word_tokenize(text)
for t in tokens:
    print(t)

OUTPUT :
EXPERIMENT - 4
AIM : WAP in python to get the sentences of a text document.

CODE :
# Use a context manager so the file is closed automatically.
with open('04.txt') as file:
    input_text = file.read()

sentences = input_text.split('.')
for sentence in sentences:
    print(sentence, '\n')
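Splitting on a bare '.' also breaks on abbreviations and decimal numbers. A minimal regex-based sketch (standard library only; the helper name split_sentences is my own) that splits only when the terminator is followed by whitespace and a capital letter:

```python
import re

def split_sentences(text):
    # Split after '.', '!' or '?' only when followed by whitespace
    # and an uppercase letter, so tokens like "3.14" stay intact.
    return [s.strip()
            for s in re.split(r'(?<=[.!?])\s+(?=[A-Z])', text)
            if s.strip()]

print(split_sentences("He slept. The value is 3.14 exactly. Done!"))
```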

OUTPUT :
EXPERIMENT - 5
AIM : WAP in python to tokenize text with stopwords as
delimiters.

CODE :
text = ("Walter was feeling anxious. He was diagnosed today. He probably "
        "is the best person I know.")

stop_words_and_delims = ['was', 'is', 'the', '.', ',', '-', '!', '?']

for r in stop_words_and_delims:
    text = text.replace(r, 'DELIM')

words = [t.strip() for t in text.split('DELIM')]
words_filtered = list(filter(lambda a: a != '', words))
for word in words_filtered:
    print(word)

OUTPUT :
EXPERIMENT - 6
AIM : WAP in python to add custom stop words in spaCy.

CODE :
import spacy

nlp = spacy.load('en_core_web_sm')

custom_stop_words = ['was', 'is', 'the', 'JUNK', 'NIL', 'of', 'more',
                     '.', ',', '-', '!', '?', 'a']
for word in custom_stop_words:
    nlp.vocab[word].is_stop = True

doc = nlp("Jonas was a JUNK great guy NIL Adam was evil NIL Martha JUNK "
          "was more of a fool")
for token in doc:
    if not token.is_stop:
        print(token.text, end=" ")

OUTPUT :
EXPERIMENT - 7
AIM : WAP to remove punctuations, perform stemming,
lemmatize given text and extract usernames from emails

CODE :
punctuations = '''!()-[]{};:'"\\,<>./?@#$%^&*_~'''

string = "Jonas!!! great \\guy <> Adam --evil [Martha] ;;fool() ."

ans = ""
for char in string:
    if char not in punctuations:
        ans += char
print(ans)

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

text = ("Dancing is an art. Students should be taught dance as a subject "
        "in schools. I danced in many of my school function. Some people are "
        "always hesitating to dance.")
ans = ""
stemmer = PorterStemmer()
tokens = word_tokenize(text)
for token in tokens:
    ans += stemmer.stem(token) + " "
print(ans)

from nltk.corpus import wordnet
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.tokenize import word_tokenize

lemmatizer = WordNetLemmatizer()
text = ("Dancing is an art. Students should be taught dance as a subject "
        "in schools. I danced in many of my school function. Some people are "
        "always hesitating to dance.")
ans = ""
tokens = word_tokenize(text)
for token in tokens:
    ans += lemmatizer.lemmatize(token, wordnet.VERB) + " "
print(ans)

from nltk.tokenize import word_tokenize

text = ("The new registrations are potter709@gmail.com , "
        "elixir101@gmail.com. If you find any disruptions, kindly contact "
        "granger111@gamil.com or severus77@gamil.com ")

text_list = word_tokenize(text)
usernames = []
for i in range(len(text_list)):
    if text_list[i] == "@":
        usernames.append(text_list[i - 1])
print(usernames)
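The loop above depends on the tokenizer emitting '@' as its own token; if an address survives tokenization as a single token, nothing matches. A regex sketch over the same sample text avoids that dependency (the pattern is a deliberate simplification, not a full RFC 5322 address grammar):

```python
import re

text = ("The new registrations are potter709@gmail.com , "
        "elixir101@gmail.com. If you find any disruptions, kindly contact "
        "granger111@gamil.com or severus77@gamil.com")

# Capture the local part (before the '@') of each email-like token.
usernames = re.findall(r'([\w.+-]+)@[\w-]+\.[\w.-]+', text)
print(usernames)  # ['potter709', 'elixir101', 'granger111', 'severus77']
```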

OUTPUT :
EXPERIMENT - 8
AIM : WAP to do spell correction, extract all nouns, pronouns
and verbs in a given text

CODE :
from textblob import TextBlob

text = "He is a gret person. He beleives in bod"
textb = TextBlob(text)
correct_text = textb.correct()
print(correct_text)

import nltk
from nltk import word_tokenize, pos_tag

text = ("James works at Microsoft. She lives in manchester and likes to "
        "play the flute")
tokens = word_tokenize(text)
parts_of_speech = nltk.pos_tag(tokens)
nouns = list(filter(lambda x: x[1] == "NN" or x[1] == "NNP",
                    parts_of_speech))
for noun in nouns:
    print(noun[0])

from nltk import pos_tag, word_tokenize

text = ("I may bake a cake for my birthday. The talk will introduce "
        "reader about Use of baking")

words = word_tokenize(text)
tagged = pos_tag(words)  # tag once instead of re-tagging on every iteration

verb_phrases = []
for i in range(1, len(words)):
    if tagged[i][1] == 'VB':
        verb_phrases.append(words[i - 1] + ' ' + words[i])

for phrase in verb_phrases:
    print(phrase)
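Matching only the bare VB tag misses inflected forms (VBD, VBG, VBN, VBP, VBZ). A small filter over an already-tagged list illustrates the broader match; the tagged pairs below are hand-written examples, not actual pos_tag output:

```python
def extract_verbs(tagged):
    # Keep any (word, tag) pair whose Penn Treebank tag starts with 'VB':
    # VB, VBD, VBG, VBN, VBP, VBZ.
    return [word for word, tag in tagged if tag.startswith('VB')]

tagged = [("I", "PRP"), ("baked", "VBD"), ("a", "DT"), ("cake", "NN"),
          ("and", "CC"), ("may", "MD"), ("bake", "VB"), ("another", "DT")]
print(extract_verbs(tagged))  # ['baked', 'bake']
```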

OUTPUT :
EXPERIMENT - 9
AIM : WAP to find similarity between two words and classify a
text as positive/negative sentiment

CODE :
import spacy

# en_core_web_md ships word vectors; the small model does not.
nlp = spacy.load('en_core_web_md')
words = "amazing terrible excellent"
tokens = nlp(words)

token1, token2, token3 = tokens[0], tokens[1], tokens[2]

print(f"Similarity between {token1} and {token2} : ",
      token1.similarity(token2))
print(f"Similarity between {token1} and {token3} : ",
      token1.similarity(token3))

from textblob import TextBlob

text = "It was a very pleasant day"
print(TextBlob(text).sentiment)
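TextBlob's sentiment property returns a (polarity, subjectivity) pair, while the AIM asks for a positive/negative label. A small threshold rule over the polarity score can supply that label; the cutoff of 0.0 and the helper name classify_polarity are my own choices, not TextBlob API:

```python
def classify_polarity(polarity, threshold=0.0):
    # TextBlob polarity lies in [-1.0, 1.0]; treat anything above the
    # threshold as positive sentiment, everything else as negative.
    return "positive" if polarity > threshold else "negative"

print(classify_polarity(0.65))   # positive
print(classify_polarity(-0.3))   # negative
```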

OUTPUT :
