
Natural Language Processing

1) Write a short note on the Yarowsky Algorithm with "one sense per collocation" and "one
sense per discourse".
The Yarowsky Algorithm is a clever approach to word sense disambiguation, or figuring
out which meaning of a word is intended in a given context. Two key principles of this
algorithm are "one sense per collocation" and "one sense per discourse":
One sense per collocation: This means that a word tends to have the same meaning
when used with the same neighboring words (collocations). For example, in "river bank"
vs. "financial bank," "bank" will have a consistent meaning based on its collocates.
One sense per discourse: This principle suggests that a word usually maintains the
same meaning throughout a single piece of text or conversation. So, if "bank" is used to
mean a financial institution at the beginning of a document, it will likely keep that meaning
throughout.
The Yarowsky Algorithm uses these principles to iteratively improve its accuracy in
identifying the correct sense of a word, starting with a few manually-labeled examples and
then expanding its scope as it learns.

2) Write a short note on the Lesk algorithm.


The Lesk algorithm is based on the assumption that words in a given 'neighbourhood' will tend to
share a common topic. In simplified form, the Lesk algorithm compares the dictionary
definition of an ambiguous word with the terms contained in its neighbourhood.
Versions have been adapted to use WordNet. A basic implementation works like this:
• For every sense of the word being disambiguated, count the number of words that occur
both in the neighbourhood of that word and in the dictionary definition of that sense.
• Choose the sense with the largest such count.
Consider an example illustrating this algorithm for the context "pine cone". The dictionary
definitions are:
PINE
• Kind of evergreen tree with needle-shaped leaves.
• Waste away through sorrow or illness.
CONE
• Solid body which narrows to a point.
• Something of this shape, whether solid or hollow.
• Fruit of certain evergreen trees.
The best intersection is pine#1 ∩ cone#3 = 2, since both definitions mention "evergreen" and "tree".
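A minimal Python sketch of this counting step (the definitions are copied from the example above; the normalisation is deliberately crude and only for illustration):

STOP = {"of", "a", "to", "or", "with", "which", "this", "whether", "through"}

def words(defn):
    # crude normalisation: lowercase, drop stopwords, strip a plural "s"
    return {w.rstrip("s") for w in defn.lower().split() if w not in STOP}

pine = {1: "Kind of evergreen tree with needle-shaped leaves",
        2: "Waste away through sorrow or illness"}
cone = {1: "Solid body which narrows to a point",
        2: "Something of this shape whether solid or hollow",
        3: "Fruit of certain evergreen trees"}

best = max(((p, c) for p in pine for c in cone),
           key=lambda pc: len(words(pine[pc[0]]) & words(cone[pc[1]])))
print(best)  # -> (1, 3): pine#1 and cone#3 share "evergreen" and "tree"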
Limitations of Lesk Algorithm
(i) Lesk's approach is very sensitive to the exact wording of definitions.
(ii) The absence of a certain word can change the results considerably.
(iii) The algorithm determines overlaps only among the glosses of the senses being
considered.
3) Describe the steps involved in resolving a pronoun using Hobbs Algorithm.

1) Identify the Pronoun and Sentence: Start with the pronoun you want to resolve and the
sentence it's in.
2) Traverse Left Branches: From the pronoun, traverse up the parse tree to find the leftmost
node dominating the pronoun.
3) Find Potential Antecedents in the Same Sentence: For each node along the traversal,
scan for noun phrases (NPs) within the current sentence to see if they can be the
antecedents.
4) Traverse Sibling Nodes: If no suitable antecedent is found, traverse to the left sibling
nodes of the pronoun in the parse tree and check for noun phrases.
5) Move Up the Parse Tree: If still unresolved, move to the next higher node in the tree and
repeat the process, checking each level for potential antecedents.
6) Proceed to Previous Sentences: If no antecedent is found within the sentence
containing the pronoun, apply the algorithm to the previous sentences in the text, starting
from step 1 for each sentence.
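A highly simplified sketch of one ingredient of the steps above — collecting candidate NP antecedents from a parse tree breadth-first, left to right, which is the order Hobbs prescribes when searching earlier sentences (this is not the full algorithm, and the example tree is invented):

from nltk import Tree

def np_candidates(tree):
    # yield NP subtrees breadth-first, left to right
    queue = [tree]
    while queue:
        node = queue.pop(0)
        if isinstance(node, Tree):
            if node.label() == "NP":
                yield node
            queue.extend(node)

sent = Tree.fromstring("(S (NP (NNP John)) (VP (VBD saw) (NP (DT a) (NN dog))))")
for np in np_candidates(sent):
    print(" ".join(np.leaves()))  # "John", then "a dog"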
4) Explain Word sense disambiguation in detail.

• Recognising the various usage patterns of words in a language is important for many
Natural Language Processing applications.
• Word Sense Disambiguation (WSD) is an important NLP task: it determines which
meaning of a word is intended in a particular context.
• A core problem for NLP systems is to identify words properly and to determine the
specific sense in which a word is used in a particular sentence.
• WSD resolves the ambiguity that arises when the same word carries different
meanings in different situations.
Applications of WSD

• WSD has various applications in text processing.
• WSD can be used in lexicography.
• WSD can be used in text mining and Information Extraction tasks.
• WSD can be used for Information Retrieval purposes.
5) Complete the code
# Note: gensim's summarization module was removed in gensim 4.0,
# so this requires pinning an older release
!pip install "gensim<4.0"
from gensim.summarization import summarize
# Sample text to summarize. Note: gensim's extractive summarizer is designed
# for longer documents; on a text this short it may warn or return an empty summary.
text = """NLP is an interesting and easy subject. It is the most important
subject to study in Machine Learning."""
# Generate a summary using gensim's summarize method
summary = summarize(text, ratio=0.3)  # keep roughly 30% of the original sentences
# Print the summary
print("Original Text:")
print(text)
print("\nSummarized Text:")
print(summary)

6) Define synonymy, homonymy, polysemy and hyponymy.

• Synonymy: A relation between two lexical items that have different forms but
express the same or a very close meaning. Examples are 'author / writer' and 'fate / destiny'.
• Hyponymy: The relationship between a generic term and specific instances of that
generic term. The generic term is called the hypernym and its instances are called
hyponyms. For example, the word colour is a hypernym, and red, green, etc.
are its hyponyms.
• Homonymy: Words having the same spelling or form but different and unrelated
meanings. For example, "bat" is a homonym because a bat can be used to hit a ball,
and a bat is also a flying mammal.
• Polysemy: From the Greek for "many signs", a polysemous word or phrase has
different but related senses: the same spelling, but distinct and related meanings.
For example, the word "bank" is polysemous with the following meanings:
(i) A financial institution
(ii) The building in which such an institution is located.
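These relations can be explored directly in WordNet; a small sketch using NLTK (assuming NLTK is installed and the corpus fetched via nltk.download('wordnet')):

from nltk.corpus import wordnet as wn

# Homonymy/polysemy: the different senses of "bat" live in separate synsets
for s in wn.synsets("bat")[:3]:
    print(s.name(), "-", s.definition())

# Hyponymy: hypernyms() walks from a synset to its more generic term
print(wn.synset("dog.n.01").hypernyms())
# -> [Synset('canine.n.02'), Synset('domestic_animal.n.01')]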
Q7) What is the principle of the Lexical Approach?
• The basic principle of lexical approach is "Language is grammaticalised lexis, not
lexicalised grammar".
• In other words, lexis is central in creating meanings, grammar plays a subsidiary
managerial role.
• The Lexical Approach emphasizes the importance of vocabulary (lexis) in language
learning over the traditional focus on grammar.
• According to this approach, language is made up of chunks of meaningful words and
phrases, rather than just rules and structures. This means learners should focus on
understanding and using these chunks naturally, which helps in creating more fluent and
meaningful communication.
• Grammar is seen more as a supportive framework rather than the core of language
learning.

1) Explain Semantic ambiguity


Semantic Ambiguity
Semantic ambiguity happens when a word or phrase in a sentence can be understood in
more than one way. This type of ambiguity arises because the meaning of the words or
phrases is not clear-cut, leading to multiple possible interpretations.
For example, take the sentence: "The car hit the pole while it was moving." This
sentence can be ambiguous because it's unclear whether "it" refers to the car or the pole.
So, the interpretations can be:
"The car, while moving, hit the pole."
"The car hit the pole while the pole was moving."
These different interpretations showcase how semantic ambiguity can make the intended
meaning of a sentence unclear and open to multiple readings.

2) Explain synsets in the WordNet dictionary.


Synsets (Synonym Sets): Groups of words or phrases that share a common meaning.
Each synset represents a unique concept or idea.
Example: Consider the words "car," "automobile," and "motorcar." These words all share
the same meaning and belong to the same synset in WordNet. They represent the concept
of a motor vehicle.
Synsets help organize language by grouping synonymous words and mapping their
relationships, making it easier to understand how different words relate to each other.
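A quick look at this very synset through NLTK (assuming the wordnet corpus has been downloaded):

from nltk.corpus import wordnet as wn

car = wn.synset("car.n.01")
print(car.lemma_names())  # ['car', 'auto', 'automobile', 'machine', 'motorcar']
print(car.definition())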

3) What is NER (Named Entity Recognition)?


• Named Entity Recognition (NER) is a technique in natural language processing
(NLP) that focuses on identifying entities in text and classifying them into
predefined categories.
• The purpose of NER is to automatically extract structured information from
unstructured text, enabling machines to understand and categorize entities in a
meaningful way for applications such as text summarization, question answering,
and knowledge graph construction.
• An entity is a thing that is consistently talked about or referred to in the text, such as
person names, organizations, locations, time expressions, quantities, percentages
and other predefined categories.
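A minimal NER sketch using spaCy (assuming spaCy and its small English model are installed via pip install spacy and python -m spacy download en_core_web_sm; the example sentence is invented):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Barack Obama was born in Hawaii in 1961.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. "Barack Obama" PERSON, "Hawaii" GPE, "1961" DATE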
4) Draw a neat diagram of the Vauquois triangle.
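A rough text sketch of the triangle (analysis climbs the left side, generation descends the right; the higher the level at which transfer happens, the less transfer work remains, with the interlingua at the apex requiring none):

                     Interlingua
                      /       \
           semantic  /         \  semantic
           analysis /  semantic \ generation
                   /---transfer--\
        syntactic /               \ syntactic
        analysis /    syntactic    \ generation
                /-----transfer------\
   source text ---direct translation--- target text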

5) Explain Discourse processing.


• Discourse refers to any linguistic construction made up of multiple sentences.
Discourse processing is used in understanding and generating natural language.
• A variety of text mining applications can be supported by discourse processing,
which is a collection of Natural Language Processing (NLP) tasks used to
extract linguistic structures from texts at different levels.
• Identifying the conversational discourse's topic structure, coherence structure,
coreference structure, and conversation structure is required for this.
• Together, these structures can guide information extraction, sentiment
analysis, machine translation, question answering, essay scoring, text
summarization, and thread recovery.
6) Write a short note on the Yarowsky algorithm.
In computational linguistics, the Yarowsky algorithm is a semi-supervised (or unsupervised)
learning algorithm for word sense disambiguation. It rests on the observation that words tend
to exhibit only one sense in a given discourse and in a given collocation.
Key Principles:
One Sense per Collocation: A word tends to have the same meaning in the same context
(with similar neighboring words).
One Sense per Discourse: A word typically maintains the same meaning throughout a
single piece of text or conversation.
How It Works:
Initial Training: The algorithm starts with a small, manually-labeled set of examples for
different senses of a word.
Iterative Learning: Using the initial examples, it learns and applies these principles to
identify the correct sense of the word in new contexts.
Expansion: Over iterations, the algorithm refines its sense assignments and gradually
improves its accuracy by incorporating more and more data.
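A toy sketch of this loop (the seeds, mini-corpus and stoplist below are all invented for illustration; a real implementation ranks collocations in a decision list by log-likelihood and also applies the one-sense-per-discourse constraint):

seeds = {"plant/living": {"tree"}, "plant/factory": {"manufacturing"}}
corpus = ["the tree grew near the plant nursery",
          "the manufacturing plant closed and workers left",
          "workers at the plant assembled cars",
          "a flowering plant near the old tree"]
STOP = {"plant", "the", "a", "at", "and", "near", "old"}

labels = {}
for _ in range(3):  # a few bootstrapping rounds
    # label every sentence that contains a known collocation for some sense
    for i, sent in enumerate(corpus):
        for sense, colls in seeds.items():
            if set(sent.split()) & colls:
                labels[i] = sense
    # harvest new collocations from the sentences labelled so far
    for i, sense in labels.items():
        seeds[sense] |= set(corpus[i].split()) - STOP

print(labels)  # "workers", learned in round 1, lets round 2 label sentence 2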
Q14) What is the Lesk Algorithm, and how does it help in word sense disambiguation
(WSD)?
The Lesk algorithm is based on the assumption that words in a given 'neighbourhood' will tend to
share a common topic. In simplified form, the Lesk algorithm compares the dictionary
definition of an ambiguous word with the terms contained in its neighbourhood.
Versions have been adapted to use WordNet. A basic implementation works like this:
1. For every sense of the word being disambiguated, count the number of words that occur
both in the neighbourhood of that word and in the dictionary definition of that sense.
2. Choose the sense with the largest such count.
The Lesk Algorithm is a straightforward approach to word sense disambiguation (WSD), relying
on the idea that words in the same context often share common topics.
• Assumption: Words near each other in text (their 'neighbourhood') tend to be related in
meaning.
• Comparison: Compare the dictionary definitions of the ambiguous word with the words
in its context (neighbourhood).
• Counting: For each potential sense of the ambiguous word, count how many words from
the context appear in the dictionary definition of that sense.
• Selection: The sense with the highest count is chosen as the correct meaning.
By using the surrounding context, the Lesk Algorithm helps determine the most appropriate
meaning of an ambiguous word, making it a valuable tool in natural language processing.
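NLTK ships a simplified-Lesk implementation over WordNet glosses; a quick sketch (needs nltk.download('wordnet') and nltk.download('punkt'), and note that on short contexts the chosen sense can be surprising):

from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

context = word_tokenize("I went to the bank to deposit my money")
sense = lesk(context, "bank")
print(sense, "-", sense.definition())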

15) What is reference resolution in discourse processing, and what are the different
types of reference phenomena encountered in natural language? Provide examples
to illustrate each type.
Reference resolution in discourse processing involves identifying and linking pronouns
and other referring expressions to the correct entities in a text. It's crucial for maintaining
coherence and understanding the flow of a conversation or narrative. Essentially, it
ensures that every "he," "she," "it," or "they" is accurately connected to the right person,
place, or thing mentioned earlier or later in the text. This helps in making the text clear and
understandable.
1) Indefinite Noun Phrases: Refer to non-specific entities.
Example: "A dog barked loudly."
2) Definite Noun Phrases: Refer to specific entities known to the reader or listener.
Example: "The dog barked loudly."
3) Pronouns: Stand in for nouns previously mentioned or easily identifiable.
Example: "She left the room."
4) Demonstratives: Indicate specific items and their relative position to the speaker.
Example: "This book is interesting, but that one is not."
5) One-Anaphora: Use "one" to refer back to a previously mentioned noun.
Example: "I have an old book and a new one."
Q16) Write a short note on sentiment analysis.

• Sentiment analysis, often known as opinion mining, is a technique used in natural
language processing (NLP) to determine the emotional undertone of a document.
• It is a common method used by organisations to identify and group opinions about
a certain product, service, or concept. Text is mined for sentiment and subjective
information using data mining, machine learning, and artificial intelligence (AI).
Challenges in Sentiment Analysis
1) Subjectivity and Tone: Key to distinguish opinion from fact; even objective
writings can imply sentiment.
2) Context and Polarity: Essential for proper sentiment analysis.
3) Irony and Sarcasm: Often mean the opposite of the literal words, making analysis
tricky.
4) Comparison: Requires understanding relational aspects of sentiments.
5) Defining Neutral: Challenging and context-dependent.
Sentiment analysis typically follows five steps: data collection, text preparation,
sentiment detection, sentiment classification, and output presentation.

1) Data Collection: Gathering information from user forums, social media, blogs, and
commercial sites.
2) Text Preparation: Cleaning and preparing the data by removing inappropriate language
and unrelated content to ensure accurate analysis.
3) Sentiment Detection: Using machine learning and NLP to identify the sentiment
expressed in the text, focusing on meaningful phrases and sentences.
4) Sentiment Classification: Classifying the text based on the polarity of opinions (positive,
negative, or neutral).
5) Output Presentation: Presenting the overall results in a usable form, such as scores or
visualisations.
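A minimal sentiment-scoring sketch with NLTK's VADER analyzer (assuming nltk.download('vader_lexicon') has been run; the example sentence is invented):

from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("The product is great but delivery was slow."))
# -> a dict with 'neg', 'neu', 'pos' proportions and a 'compound' score in [-1, 1]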
17) Describe semantic analysis in NLP
Semantic analysis is a subfield of NLP and machine learning. It tries to resolve the ambiguity
in a text and capture its intended meaning, including the emotions inherent in a sentence.
It helps computers extract important information from text with close to human-level accuracy,
and is used in tools like machine translation, chatbots, search engines and text analysis.
It involves:
• Word Sense Disambiguation: Determining which meaning of a word is used.
• Named Entity Recognition: Identifying names, places, dates, etc.
• Semantic Role Labeling: Understanding roles within a sentence (who did what to whom).
• Relationship Extraction: Finding connections between entities.
• Co-reference Resolution: Linking different expressions to the same entity.
• Sentiment Analysis: Identifying the emotional tone
18) Explain the Naïve Bayes approach for WSD.
Word Sense Disambiguation (WSD) is the task of determining which meaning of a word is being
used in a given context, especially when the word has multiple meanings. The Naïve Bayes
approach is a probabilistic method based on Bayes' theorem, which is particularly useful for this
task due to its simplicity and effectiveness.
Steps of the Naïve Bayes Approach:
1. Training Data: Collect a corpus where ambiguous words are labeled with their correct
senses. For example, sentences where "bank" is tagged as "financial institution" or "river bank."
2. Feature Extraction: Identify contextual features around the ambiguous word, such as
neighboring words and part-of-speech tags.
3. Probability Calculation:
• Prior Probability (P(Sense)): The likelihood of each sense based on the training data.
For instance, if "bank" as "financial institution" appears 70% of the time, its prior
probability is 0.7.
• Likelihood (P(Context | Sense)): The probability of the context features given each
sense. For example, the presence of the word "money" near "bank" might strongly
suggest "financial institution."
4. Bayes' Theorem: Calculate the posterior probability using
P(Sense | Context) ∝ P(Context | Sense) × P(Sense).
5. Sense Assignment: Assign the sense with the highest posterior probability to the
ambiguous word.
Example:
Consider the sentence: "He deposited cash at the bank."
• Context Features: {deposit, cash, at}
• Calculate Likelihoods:
o P(deposit | financial institution) = 0.6
o P(cash | financial institution) = 0.7
o P(at | financial institution) = 0.5
o P(deposit | river bank) = 0.1
o P(cash | river bank) = 0.1
o P(at | river bank) = 0.3
• Prior Probabilities:
o P(financial institution) = 0.7
o P(river bank) = 0.3
• Calculate Posterior Probabilities:
o P(financial institution | Context) ∝ 0.6 * 0.7 * 0.5 * 0.7
o P(river bank | Context) ∝ 0.1 * 0.1 * 0.3 * 0.3
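The same arithmetic as a short Python sketch (the probabilities are copied from the example above):

priors = {"financial institution": 0.7, "river bank": 0.3}
likelihoods = {"financial institution": {"deposit": 0.6, "cash": 0.7, "at": 0.5},
               "river bank": {"deposit": 0.1, "cash": 0.1, "at": 0.3}}
context = ["deposit", "cash", "at"]

scores = {}
for sense, prior in priors.items():
    score = prior
    for w in context:
        score *= likelihoods[sense][w]  # naive conditional-independence assumption
    scores[sense] = score

print(scores)  # {'financial institution': ~0.147, 'river bank': ~0.0009}
print(max(scores, key=scores.get))  # -> financial institution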
Given the higher posterior probability, "financial institution" is chosen as the correct sense of
"bank" in this context.
19) Describe the Centering Algorithm for anaphora resolution and compare its approach with the
Hobbs Algorithm.

In the Centering Algorithm, anaphora resolution involves identifying the most prominent "center" (entity)
in a discourse segment and using transition states to link pronouns and other referring expressions to
these centers. This ensures that references like "he," "she," or "it" are correctly linked to the right entities,
maintaining the coherence and flow of the text. The algorithm essentially ranks potential antecedents and
prefers those that keep the discourse clear and logically connected.
1) Centering Algorithm Approach:
Focus: Discourse coherence.
Method: Segments text, ranks potential "centers" (entities), and tracks transitions between
centers (Continue, Retain, Shift) to resolve anaphors, maintaining overall narrative flow.
2) Hobbs Algorithm Approach:
Focus: Syntactic structure.
Method: Traverses parse trees starting from the pronoun, following the leftmost path up the tree,
and searching for potential antecedents, based on syntactic constraints.
Each approach offers a unique method for resolving anaphors, with the Centering Algorithm emphasizing
discourse coherence and the Hobbs Algorithm focusing on syntactic structure.

20) Demonstrate the working of machine translation systems


Machine Translation (MT), or automated translation, is the task in which computer software
translates a text from a source language into its counterpart in a target language without
human intervention.
There are many challenging aspects of MT:
• The large variety of languages, alphabets and grammars;
• Translating a sequence (for example, a sentence) into another sequence is harder
for a computer than working with numbers alone;
• There is no single correct answer (e.g., when translating from a language without
gender-specific pronouns, "he" and "she" can correspond to the same source word).
There are three types of machine translation methods, described here in simple terms:
• Rules-based machine translation uses grammar and language rules that have been
developed by language experts, as well as customized dictionaries
• Statistical machine translation learns how to translate by analyzing a large number of
existing human translations
• Neural machine translation teaches itself how to translate by using a large neural
network. This method is becoming increasingly popular as it often provides the best
results.
Example: Consider the sentence "आप कैसे हैं ?" (Aap kaise hain?) in Hindi, which means
"How are you?" in English:
• Text Preprocessing: Tokenize the input into smaller units like words: ["आप", "कैसे", "हैं "].
• Translation Model: Use the trained neural network to find the English equivalent. Modern
systems like the Transformer model will map each word and its context.
• Neural Networks: The model, trained on Hindi-English text pairs, suggests "How are
you?" as the translation.
• Decoding: Ensure that the translated phrase is coherent and contextually correct.
• Post-processing: Adjust for any grammar or syntax issues to ensure natural-sounding
English.
• Output: Present "How are you?" as the final translated text.
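A minimal neural-MT sketch using the Hugging Face transformers pipeline (assuming transformers is installed; "Helsinki-NLP/opus-mt-hi-en" is one publicly available Hindi-to-English model, downloaded on first use):

from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-hi-en")
print(translator("आप कैसे हैं?")[0]["translation_text"])  # expected: "How are you?"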
