Steps to install the NLTK library
Install the latest version of Python (e.g., Python 3.9.6).
Open a command prompt, type `python --version`, and press Enter.
Type `pip --version` and press Enter.
Type `pip install nltk` and press Enter.
Open the IDLE shell, click on File, and select New File.
Click on Save As and give the program a name with the extension .py.
Click on Run and select Run Module.
If any package from the NLTK library is not found, go to the IDLE shell, type the commands below, and press Enter:
import nltk
nltk.download()
Example:
nltk.download('stopwords')
1. Write a Python program to perform following tasks on text
a) Tokenization
Word tokenization:
import nltk

# Sample text to split into word tokens.
# BUG FIX: the string literal was broken across two source lines (extraction
# artifact); reconstructed as a single literal, matching the printed output.
word_data = ("It originated from the idea that there are readers who prefer "
             "learning new skills from the comforts of their drawing rooms")

# word_tokenize splits the text into words and punctuation tokens
# (requires the 'punkt' tokenizer data to be downloaded).
nltk_tokens = nltk.word_tokenize(word_data)
print(nltk_tokens)
Output:
['It', 'originated', 'from', 'the', 'idea', 'that', 'there', 'are', 'readers', 'who', 'prefer', 'learning',
'new', 'skills', 'from', 'the', 'comforts', 'of', 'their', 'drawing', 'rooms']
Sentence tokenization:
import nltk

# Sample text to split into sentence tokens.
# BUG FIX: the string literal was broken across two source lines (extraction
# artifact); reconstructed as a single literal, matching the printed output.
# (The 'Ananlysis' spelling is kept as-is from the original sample data.)
sentence_data = ("The First sentence is about Python. The Second: about Django. "
                 "You can learn Python,Django and Data Ananlysis here. ")

# sent_tokenize splits the text into sentences using the Punkt sentence tokenizer.
nltk_tokens = nltk.sent_tokenize(sentence_data)
print(nltk_tokens)
Output:
['The First sentence is about Python.', 'The Second: about Django.', 'You can learn
Python,Django and Data Ananlysis here.']
Character tokenization
# BUG FIX: the original line `Import nltk` is a NameError (capital I), and
# nltk is not needed here — Python's built-in list() tokenizes by character.
charact_data = " Python programming"

# list() on a string yields one single-character token per character,
# including the space characters.
charact_tokens = list(charact_data)
print(charact_tokens)
Output:
[' ', 'P', 'y', 't', 'h', 'o', 'n', ' ', 'p', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g']
b) Stop word Removal
from nltk.corpus import stopwords

# Build the English stop-word set once for O(1) membership tests.
en_stops = set(stopwords.words('english'))

all_words = ['There', 'is', 'a', 'tree', 'near', 'the', 'river']

# Print only the words that are not English stop words.
# NOTE: the NLTK stop-word list is lowercase, so the capitalised 'There'
# is kept even though 'there' would have been removed.
# (Indentation restored — it was lost in the original extraction.)
for word in all_words:
    if word not in en_stops:
        print(word)
Output:
There
tree
near
river
2) Write a Python program to implement Porter stemmer algorithm for stemming
import nltk
# BUG FIX: the original read `from nltk.stem import Porter Stemmer`
# (space inside the class name) — a SyntaxError.
from nltk.stem import PorterStemmer

nltk.download('punkt')

# One stemmer instance reused for every word.
stemmer = PorterStemmer()
words = ["running", "beautifulness", "rivers", "caresses", "happily", "studies", "banking"]

# Reduce each word to its Porter stem.
stemmed_words = [stemmer.stem(word) for word in words]
print("Original Words:", words)
print("Stemmed Words", stemmed_words)
Output:
Original Words: ['running', 'beautifulness', 'rivers', 'caresses', 'happily', 'studies', 'banking']
Stemmed Words ['run', 'beauti', 'river', 'caress', 'happili', 'studi', 'bank']
3) Write a Python program for
a) Word Analysis
import re
from collections import Counter
def word_analysis(text):
    """Analyse a text and return word statistics.

    Returns a 3-tuple:
      word_freq         -- Counter mapping each lowercase word to its frequency
      word_len          -- dict mapping each word to its character length
      most_common_words -- up to 10 (word, count) pairs, most frequent first

    (Indentation restored — it was lost in the original extraction.)
    """
    # Convert text to lowercase and remove punctuation so 'This' and 'this.' match.
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    # Split text into words on whitespace.
    words = text.split()
    # Count frequency of each word.
    word_freq = Counter(words)
    # Calculate length of each word (duplicates just overwrite with the same value).
    word_len = {word: len(word) for word in words}
    # Identify the (up to) 10 most common words.
    most_common_words = word_freq.most_common(10)
    return word_freq, word_len, most_common_words
# Demonstrate word_analysis on a short sample sentence.
text = "This is an example sentence for word analysis. This sentence is just an example."
word_freq, word_len, most_common_words = word_analysis(text)

# (Loop indentation restored — it was lost in the original extraction.)
print("Word Frequency:")
for word, freq in word_freq.items():
    print(f"{word}: {freq}")

print("\nWord Length:")
for word, length in word_len.items():
    print(f"{word}: {length}")

print("\nMost Common Words:")
for word, freq in most_common_words:
    print(f"{word}: {freq}")
Output:
Word Frequency:
this: 2
is: 2
an: 2
example: 2
sentence: 2
for: 1
word: 1
analysis: 1
just: 1
Word Length:
this: 4
is: 2
an: 2
example: 7
sentence: 8
for: 3
word: 4
analysis: 8
just: 4
Most Common Words:
this: 2
is: 2
an: 2
example: 2
sentence: 2
for: 1
word: 1
analysis: 1
just: 1
b) Word Generation
import random
import nltk
from nltk.corpus import wordnet
nltk.download('wordnet')
def generate_meaningful_words(part_of_speech, num_words):
    """Return num_words random lemma names from WordNet for the given POS.

    part_of_speech -- WordNet POS code: 'n', 'v', 'a', or 'r'
    num_words      -- how many words to sample (duplicates are possible)

    (Indentation restored — it was lost in the original extraction.)
    """
    # Materialise all synsets for this POS so random.choice can index them.
    synsets = list(wordnet.all_synsets(part_of_speech))
    words = []
    for _ in range(num_words):
        # Pick a random synset, then a random lemma within it.
        synset = random.choice(synsets)
        lemma = random.choice(synset.lemmas())
        words.append(lemma.name())
    return words
# Sample 10 random words for each major part of speech.
nouns = generate_meaningful_words('n', 10)
verbs = generate_meaningful_words('v', 10)
adjectives = generate_meaningful_words('a', 10)
adverbs = generate_meaningful_words('r', 10)

# (Loop indentation restored — it was lost in the original extraction.)
print("Nouns:")
for noun in nouns:
    print(noun)
print("\nVerbs:")
for verb in verbs:
    print(verb)
print("\nAdjectives:")
for adjective in adjectives:
    print(adjective)
print("\nAdverbs:")
for adverb in adverbs:
    print(adverb)
Output:
Nouns:
Haastia_pulvinaris
televangelist
genus_Estrilda
E._H._Weber
insidiousness
Evangelical_and_Reformed_Church
garnishee
semigloss
powder_keg
townspeople
Verbs:
encapsulate
remain
salve
cruise
credit
charge
drone_on
up
fume
sandblast
Adjectives:
stipendiary
reportable
stilly
live
adscititious
bindable
upper-class
god-awful
organized
untechnical
Adverbs:
pitty-patty
naturally
managerially
smartly
providently
dumbly
worse
tight
magniloquently
pointlessly
4. Create a sample list of at least 5 words with ambiguous sense and write a python program
to implement WSD.
import nltk
from nltk.corpus import wordnet as wn
from nltk.wsd import lesk
# Ensure that NLTK resources are downloaded
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')
# List of ambiguous words
ambiguous_words = ["bank", "bat", "bark", "pitch", "lead"]

# Example context sentences for each word.
# BUG FIX: the original dict literal was never closed (missing '}'),
# which made everything after it a SyntaxError.
contexts = {
    "bank": "I went to the river bank to relax by the water.",
    "bat": "The bat flew out of the cave at dusk.",
    "bark": "The dog barked loudly at the stranger.",
    "pitch": "He gave a brilliant pitch to the investors.",
    "lead": "He decided to lead the team on the project.",
}
# Function to disambiguate word senses using the Lesk algorithm
def disambiguate_word(word, context):
    """Return the WordNet sense name Lesk picks for word in context.

    Falls back to the string "No sense found" when Lesk returns None.
    (Indentation restored — it was lost in the original extraction.)
    """
    # lesk expects the context as a list of tokens.
    sense = lesk(context.split(), word)
    if sense:
        return sense.name()  # Return the sense (meaning) of the word
    else:
        return "No sense found"
# Iterate over each ambiguous word and print its disambiguated sense based on context.
# (Loop indentation restored — it was lost in the original extraction.)
for word in ambiguous_words:
    context = contexts[word]
    print(f"Word: {word}")
    print(f"Context: {context}")
    print(f"Disambiguated Sense: {disambiguate_word(word, context)}")
    print("-" * 50)
Output:
Word: bank
Context: I went to the river bank to relax by the water.
Disambiguated Sense: bank.v.07
--------------------------------------------------
Word: bat
Context: The bat flew out of the cave at dusk.
Disambiguated Sense: bat.v.03
--------------------------------------------------
Word: bark
Context: The dog barked loudly at the stranger.
Disambiguated Sense: bark.n.04
--------------------------------------------------
Word: pitch
Context: He gave a brilliant pitch to the investors.
Disambiguated Sense: pitch.v.04
--------------------------------------------------
Word: lead
Context: He decided to lead the team on the project.
Disambiguated Sense: spark_advance.n.01
--------------------------------------------------
5. Install NLTK tool kit and perform stemming
import nltk
from nltk.stem import PorterStemmer

# Single stemmer instance shared across all words.
stemmer = PorterStemmer()

# Sample words to reduce to their Porter stems.
words = ['running', 'jumping', 'hiking', 'swimming']
stemmed_words = list(map(stemmer.stem, words))
print(stemmed_words)
Output:
['run', 'jump', 'hike', 'swim']
6. Create Sample list of at least 10 words POS tagging and find the POS for any given word
import nltk
from nltk import pos_tag, word_tokenize
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
def find_pos_tag(word):
    """Return the Penn Treebank POS tag for a single word.

    Tags the word in isolation (no sentence context), so the tag is the
    tagger's best guess for a standalone token.
    (Indentation restored — it was lost in the original extraction.)
    """
    tokens = word_tokenize(word)
    pos_tags = pos_tag(tokens)
    # pos_tags is a list of (token, tag) pairs; take the tag of the first token.
    return pos_tags[0][1]
word = input("Enter a word: ")
# BUG FIX: the original assigned to `pos_tag`, shadowing nltk's pos_tag
# function imported above; use a distinct name instead.
word_tag = find_pos_tag(word)
print("The POS tag for '{}' is '{}'".format(word, word_tag))
Output:
Enter a word: say
The POS tag for 'say' is 'VB'
Enter a word: the
The POS tag for 'the' is 'DT'
Enter a word: karimnagar
The POS tag for 'karimnagar' is 'NN'
Enter a word: good
The POS tag for 'good' is 'JJ'
7. Write a Python program to
a) Perform Morphological Analysis using NLTK library
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from nltk import pos_tag
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
# Single lemmatizer instance reused for every token.
lemmatizer = WordNetLemmatizer()

def morphological_analysis(text):
    """Tokenize text, POS-tag it, and lemmatize each token by its POS.

    Returns the list of lemmatized tokens; tokens whose Penn Treebank tag
    has no WordNet counterpart are returned unchanged.
    (Indentation restored — it was lost in the original extraction.)
    """
    tokens = word_tokenize(text)
    tagged_tokens = pos_tag(tokens)
    lemmatized_tokens = []
    for token, tag in tagged_tokens:
        # Map Penn Treebank tag prefixes to WordNet POS codes.
        if tag.startswith('J'):
            wordnet_tag = 'a'   # adjective
        elif tag.startswith('V'):
            wordnet_tag = 'v'   # verb
        elif tag.startswith('N'):
            wordnet_tag = 'n'   # noun
        elif tag.startswith('R'):
            wordnet_tag = 'r'   # adverb
        else:
            wordnet_tag = ''    # no WordNet POS: leave the token as-is
        if wordnet_tag:
            lemmatized_token = lemmatizer.lemmatize(token, wordnet_tag)
        else:
            lemmatized_token = token
        lemmatized_tokens.append(lemmatized_token)
    return lemmatized_tokens
# Demonstrate morphological analysis on a sample sentence.
text = "The quick brown fox jumps over the lazy dog."
lemmas = morphological_analysis(text)

print("Original Text:")
print(text)
print("\nLemmatized Tokens:")
print(lemmas)
Output:
Original Text:
The quick brown fox jumps over the lazy dog.
Lemmatized Tokens:
['The', 'quick', 'brown', 'fox', 'jump', 'over', 'the', 'lazy', 'dog', '.']
b) Generate n-grams using NLTK N-Grams library