NLP Assign Mod-4,5,6 IramShaikh
1) Write short note on Yarowsky Algorithm with "one sense per collocation" and "one
sense per discourse"
The Yarowsky Algorithm is a semi-supervised (bootstrapping) approach to word sense
disambiguation, i.e. deciding which meaning of a word is intended in a given context. Two key
principles of this algorithm are "one sense per collocation" and "one sense per discourse":
One sense per collocation: This means that a word tends to have the same meaning
when used with the same neighboring words (collocations). For example, in "river bank"
vs. "financial bank," "bank" will have a consistent meaning based on its collocates.
One sense per discourse: This principle suggests that a word usually maintains the
same meaning throughout a single piece of text or conversation. So, if "bank" is used to
mean a financial institution at the beginning of a document, it will likely keep that meaning
throughout.
The Yarowsky Algorithm uses these principles to iteratively improve its accuracy in
identifying the correct sense of a word: it starts from a few manually labelled seed examples
(often seed collocations), trains a classifier, adds the most confidently labelled new examples
to the training set, and repeats until the remaining examples are labelled. A minimal sketch of
this loop is given below.
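The sketch below is only illustrative: it assumes a hypothetical hand-labelled seed set and
unlabelled contexts for the word "bank", uses a Naïve Bayes classifier in place of Yarowsky's
original decision list, and omits the "one sense per discourse" re-labelling step.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical seed data: "one sense per collocation" gives reliable seeds
# such as "river bank" and "deposited cash at the bank".
seed_texts  = ["the river bank was muddy",
               "she deposited cash at the bank"]
seed_labels = ["RIVER", "FINANCE"]

# Unlabelled contexts to be labelled by bootstrapping (invented examples).
unlabelled = ["the muddy bank of the river was slippery",
              "she deposited the cash at the bank counter",
              "he sat on the river bank"]

texts, labels = list(seed_texts), list(seed_labels)
for _ in range(5):                              # a few bootstrapping rounds
    vec = CountVectorizer()
    X = vec.fit_transform(texts)
    clf = MultinomialNB().fit(X, labels)

    still_unlabelled = []
    for ctx in unlabelled:
        probs = clf.predict_proba(vec.transform([ctx]))[0]
        if probs.max() > 0.6:                   # keep only reasonably confident labels
            texts.append(ctx)
            labels.append(clf.classes_[probs.argmax()])
        else:
            still_unlabelled.append(ctx)
    unlabelled = still_unlabelled
    if not unlabelled:
        break

# A full implementation would also enforce "one sense per discourse" by
# making all occurrences of the word in one document share the same label.
print(list(zip(texts, labels)))
```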
Steps of the Hobbs Algorithm for pronoun (anaphora) resolution (a simplified traversal sketch
follows the steps):
1) Identify the Pronoun and Sentence: Start with the pronoun to be resolved and the
sentence it occurs in.
2) Traverse Left Branches: From the pronoun, go up the parse tree, following the leftmost
path, to the first NP or S node that dominates the pronoun.
3) Find Potential Antecedents in the Same Sentence: For each node along this traversal,
scan for noun phrases (NPs) within the current sentence to see whether any of them can
serve as the antecedent.
4) Traverse Sibling Nodes: If no suitable antecedent is found, traverse the left sibling
nodes along the path in the parse tree and check them for noun phrases.
5) Move Up the Parse Tree: If still unresolved, move to the next higher node in the tree and
repeat the process, checking each level for potential antecedents.
6) Proceed to Previous Sentences: If no antecedent is found within the sentence
containing the pronoun, apply the algorithm to the previous sentences in the text, starting
from step 1 for each sentence.
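The sketch below illustrates only the within-sentence part of this search on an nltk.Tree
parse: it walks up from the pronoun and collects NP nodes lying to the left of the path
(steps 2-5 above). A faithful Hobbs implementation adds breadth-first search at each level and
gender/number agreement checks, which are omitted here.

```python
from nltk import Tree

def candidate_antecedents(tree, pronoun_index):
    """Walk up from the pronoun's leaf position and collect NP nodes that
    sit to the left of the path (simplified Hobbs-style search)."""
    path = tree.leaf_treeposition(pronoun_index)   # position of the pronoun leaf
    candidates = []
    # Move up the tree one level at a time (steps 2 and 5).
    for depth in range(len(path) - 1, 0, -1):
        node_pos = path[:depth]
        parent = tree[node_pos[:-1]]
        # Scan the left siblings of the current node (steps 3 and 4).
        for i in range(node_pos[-1]):
            sibling = parent[i]
            if isinstance(sibling, Tree):
                candidates += [st for st in sibling.subtrees()
                               if st.label() == "NP"]
    return candidates

# Parse of "John said that he was tired" (hand-written for illustration).
parse = Tree.fromstring(
    "(S (NP (NNP John)) (VP (VBD said) (SBAR (IN that) "
    "(S (NP (PRP he)) (VP (VBD was) (ADJP (JJ tired)))))))")
pronoun_index = parse.leaves().index("he")
for np in candidate_antecedents(parse, pronoun_index):
    print(" ".join(np.leaves()))        # prints: John
```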
4) Explain Word sense disambiguation in detail.
• Recognising the different usage patterns of words in a language is important for many
Natural Language Processing applications.
• Word Sense Disambiguation (WSD) is an important NLP task: it determines which meaning of
a word is being used in a particular context.
• A central problem for NLP systems is to identify words correctly and to determine the
specific sense in which a word is used in a particular sentence.
• WSD resolves the ambiguity that arises when the same word carries different meanings in
different situations (a small example follows below).
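As a concrete illustration, NLTK ships a simple dictionary-based disambiguator (the Lesk
algorithm); it is only one of several WSD techniques and is shown here just to make the idea
tangible (it assumes the wordnet and punkt NLTK resources have been downloaded):

```python
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

sent1 = "I went to the bank to deposit my money"
sent2 = "The fisherman sat on the bank of the river"

# lesk() returns the WordNet synset whose gloss overlaps most with the context,
# so the same surface word can resolve to different senses.
print(lesk(word_tokenize(sent1), "bank"))
print(lesk(word_tokenize(sent2), "bank"))
```

Because Lesk relies only on gloss overlap, the chosen sense is sometimes counter-intuitive;
supervised methods such as the Naïve Bayes approach described later usually perform better.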
Applications of WSD
• Machine Translation: choosing the correct translation of an ambiguous word.
• Information Retrieval and Search Engines: returning documents that match the intended
sense of a query term.
• Question Answering and Text Mining: extracting facts with the correct interpretation of
ambiguous words.
15) What is reference resolution in discourse processing, and what are the different
types of reference phenomena encountered in natural language? Provide examples
to illustrate each type.
Reference resolution in discourse processing involves identifying and linking pronouns
and other referring expressions to the correct entities in a text. It's crucial for maintaining
coherence and understanding the flow of a conversation or narrative. Essentially, it
ensures that every "he," "she," "it," or "they" is accurately connected to the right person,
place, or thing mentioned earlier or later in the text. This helps in making the text clear and
understandable.
1) Indefinite Noun Phrases: Refer to non-specific entities.
Example: "A dog barked loudly."
2) Definite Noun Phrases: Refer to specific entities known to the reader or listener.
Example: "The dog barked loudly."
3) Pronouns: Stand in for nouns previously mentioned or easily identifiable from context.
Example: "She left the room."
4) Demonstratives: Indicate specific items and their relative position to the speaker.
Example: "This book is interesting, but that one is not."
5) One-Anaphora: Use "one" to refer back to a previously mentioned noun.
Example: "I have an old book and a new one."
Q16) Write short note on Sentiment analysis.
Sentiment analysis (opinion mining) identifies and classifies the opinions or emotions
expressed in a piece of text, typically as positive or negative. It usually involves the
following steps (a small classifier sketch follows the list):
1) Data Collection: Gathering information from user forums, social media, blogs, and
commercial sites.
2) Text Preparation: Cleaning and preparing the data by removing inappropriate language
and unrelated content to ensure accurate analysis.
3) Sentiment Detection: Using machine learning and NLP to identify the sentiment
expressed in the text, focusing on meaningful phrases and sentences.
4) Sentiment Classification: Classifying the text based on the polarity of the opinions
expressed (positive or negative).
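A minimal sketch of steps 2-4 with scikit-learn, using a tiny made-up review set purely for
illustration (a real system would train on a much larger labelled corpus):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny illustrative training set (step 1 would normally collect far more data).
reviews = ["great product, works perfectly",
           "absolutely terrible, waste of money",
           "very happy with the quality",
           "poor build and awful support"]
labels  = ["positive", "negative", "positive", "negative"]

# Steps 2-3: clean the text and turn it into features.
vec = TfidfVectorizer(lowercase=True, stop_words="english")
X = vec.fit_transform(reviews)

# Step 4: train a polarity classifier and classify a new sentence.
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["the quality is great"])))  # expected: ['positive']
```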
17) Describe semantic analysis in NLP
Semantic analysis is a subfield of NLP and machine learning. It tries to work out the meaning
of a text, including the emotions and intent behind a sentence, so that computers can
interpret language with close to human-level accuracy. It is used in tools like machine
translation, chatbots, search engines and text analysis. It involves the following (a short
named entity recognition sketch follows this list):
• Word Sense Disambiguation: Determining which meaning of a word is used.
• Named Entity Recognition: Identifying names, places, dates, etc.
• Semantic Role Labeling: Understanding roles within a sentence (who did what to whom).
• Relationship Extraction: Finding connections between entities.
• Co-reference Resolution: Linking different expressions to the same entity.
• Sentiment Analysis: Identifying the emotional tone of the text.
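For instance, the named entity recognition component can be tried out with spaCy; this
assumes the small English model en_core_web_sm has been installed, and the example sentence
is invented:

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple opened a new office in Mumbai in January 2024.")
for ent in doc.ents:
    # Prints each detected entity with its type, e.g. ORG, GPE, DATE.
    print(ent.text, ent.label_)
```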
18) Explain the Naïve Bayes-based approach for WSD
Word Sense Disambiguation (WSD) is the task of determining which meaning of a word is being
used in a given context, especially when the word has multiple meanings. The Naïve Bayes
approach is a probabilistic method based on Bayes' theorem, which is particularly useful for this
task due to its simplicity and effectiveness.
Steps of the Naïve Bayes Approach:
1. Training Data: Collect a corpus where ambiguous words are labeled with their correct
senses. For example, sentences where "bank" is tagged as "financial institution" or
"river bank."
2. Feature Extraction: Identify contextual features around the ambiguous word, such as the
surrounding words, their part-of-speech tags, and collocations.
3. Probability Estimation: From the training data, estimate the prior probability P(s) of
each sense and the conditional probability P(f_i | s) of each feature given a sense.
4. Classification: For a new occurrence, choose the sense that maximizes
P(s) · P(f_1 | s) · ... · P(f_n | s), naïvely assuming the features are conditionally
independent given the sense. A compact code sketch of this approach is given below.
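The sketch below uses scikit-learn's multinomial Naïve Bayes with a tiny hand-labelled "bank"
corpus invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Step 1: tiny sense-tagged training corpus (invented for illustration).
contexts = ["deposit the cheque at the bank",
            "the bank raised its interest rates",
            "they walked along the bank of the river",
            "the boat drifted towards the grassy bank"]
senses   = ["FINANCE", "FINANCE", "RIVER", "RIVER"]

# Step 2: bag-of-words features from the surrounding context.
vec = CountVectorizer()
X = vec.fit_transform(contexts)

# Steps 3-4: estimate P(sense) and P(word | sense), then pick the best sense.
clf = MultinomialNB().fit(X, senses)
print(clf.predict(vec.transform(["she opened an account at the bank"])))
# expected: ['FINANCE']
```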
Anaphora resolution using the Centering Algorithm:
In the Centering Algorithm, anaphora resolution involves identifying the most prominent
"center" (entity) in a discourse segment and using transition states to link pronouns and
other referring expressions to these centers. This ensures that references like "he," "she,"
or "it" are correctly linked to the right entities, maintaining the coherence and flow of the
text. The algorithm essentially ranks potential antecedents and prefers those that keep the
discourse clear and logically connected. A tiny sketch of the transition test it relies on is
given below.
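In this sketch, Cb is the backward-looking center of an utterance and Cp its highest-ranked
forward-looking center; the three-way Continue/Retain/Shift labelling follows the description
above (fuller formulations also split Shift into smooth and rough variants):

```python
def transition(cb_current, cb_previous, cp_current):
    """Classify the centering transition between two consecutive utterances.
    cb_*: backward-looking centers; cp_current: preferred (highest-ranked)
    forward-looking center of the current utterance."""
    if cb_current == cb_previous:
        return "Continue" if cb_current == cp_current else "Retain"
    return "Shift"

# "John went to the shop. He bought milk."
# The center stays on John, so the transition is a Continue,
# which supports resolving "He" to John.
print(transition(cb_current="John", cb_previous="John", cp_current="John"))
```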
1) Centering Algorithm Approach:
Focus: Discourse coherence.
Method: Segments text, ranks potential "centers" (entities), and tracks transitions between
centers (Continue, Retain, Shift) to resolve anaphors, maintaining overall narrative flow.
2) Hobbs Algorithm Approach:
Focus: Syntactic structure.
Method: Traverses parse trees starting from the pronoun, following the leftmost path up the tree,
and searching for potential antecedents, based on syntactic constraints.
Each approach offers a unique method for resolving anaphors, with the Centering Algorithm emphasizing
discourse coherence and the Hobbs Algorithm focusing on syntactic structure.