Word-sense disambiguation in NLP
In the realm of Natural Language Processing (NLP), a facet of artificial intelligence dedicated
to deciphering and comprehending human language, the intricate dance of words wielding
multiple meanings presents a formidable challenge. While our cognitive faculties navigate
these linguistic ambiguities adeptly, computers struggle with the same task. A word's
meaning hinges on its surroundings, but for machines, capturing that
surrounding context remains a formidable feat. Metaphors, modifiers, negations, and the
myriad subtleties of language intricately interweave, confounding machine learning systems.
The pragmatic applications of NLP underscore the criticality of disentangling word senses,
propelling the emergence of diverse disambiguation techniques. Among these,
contemporary machine learning methodologies include supervised approaches, in which
algorithms learn from a trove of manually annotated words, each tagged with its specific
sense, enabling a classifier to be trained. Nevertheless, this route is beset with challenges:
assembling these tagged datasets proves expensive, time-intensive, and inherently
imperfect, given that even human annotators agree on word senses only 70-95% of the
time (Edmonds, 2006). Counterbalancing this effort, unsupervised approaches come into
play, encompassing methods that endeavor to cluster words based on shared contextual
traits.
Lesk’s Algorithm: A simple method for word-sense disambiguation
Perhaps one of the earliest, and still among the most widely used, methods for word-sense
disambiguation is Lesk's algorithm, proposed by Michael E. Lesk in 1986. It is a
knowledge-based approach built on the idea that words appearing together in text are
related somehow, and that this relationship and the corresponding context can be extracted
from the dictionary definitions of the word of interest as well as the other words used
around it. Developed long before modern machine learning methods, the algorithm aims to
disambiguate a word of interest, usually appearing within a short phrase or sentence, by
finding the pair of dictionary "senses" (i.e., numbered dictionary definitions) with the
greatest word overlap between their definitions. In other words, it compares the context of
the target word to the glosses of the word's different senses in a dictionary; the sense
whose gloss is most similar to the context is taken to be the most likely correct sense.
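As a concrete illustration of what such dictionary senses look like, the snippet below lists a
few WordNet glosses for the ambiguous word "bank". This is a minimal sketch: WordNet is
just one possible sense inventory, and Lesk's original experiments used the
machine-readable dictionaries available in the 1980s rather than WordNet.

# Inspect a few WordNet senses of "bank" using NLTK
# (run nltk.download("wordnet") once beforehand).
from nltk.corpus import wordnet as wn

for synset in wn.synsets("bank")[:4]:
    print(synset.name(), "->", synset.definition())

# Output will look roughly like:
#   bank.n.01 -> sloping land (especially the slope beside a body of water) ...
#   depository_financial_institution.n.01 -> a financial institution that
#   accepts deposits and channels the money into lending activities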
In its most basic essence, Lesk's algorithm operates by tallying the common words
between the dictionary definitions of a target word and those of the neighboring words within
a designated "context window," the span of surrounding terms taken into account. The
algorithm then selects the definition with the highest count of overlaps, while excluding
common stop words like "the," "a," and "and." This chosen definition is posited as the
inferred "sense" of the word.
STEPS:
1. Select Word: Choose the word for which you want to determine the correct meaning.
Let’s call this the “target word.”
2. Define Context Window: Identify a group of nearby words around the target word.
This collection of words is known as the “context window.”
3. Compare Definitions: Compare the dictionary definitions of the target word with the
definitions of the words within the context window.
4. Count Overlaps: Count the number of words that appear in both the target word’s
definitions and the definitions of the words in the context window. Exclude common
words like “the,” “a,” and “and.”
5. Choose Most Overlapping Definition: Select the definition of the target word that
has the highest count of overlapping words from step 4. This chosen definition
represents the most likely meaning of the target word in its current context.
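Below is a minimal sketch of these five steps in Python, using NLTK's WordNet as the
dictionary. The function name, the tiny stop-word list, and the choice to pool the definitions
of every sense of each context word (a common simplification of Lesk's pairwise sense
comparison) are all illustrative assumptions, not a standard API.

# Simplified Lesk sketch; requires nltk and the wordnet corpus
# (nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "to", "it", "my"}

def lesk_disambiguate(target_word, context_words):
    """Steps 1-5: pick the sense of target_word whose definition
    overlaps most with the pooled definitions of the context words."""
    # Steps 2-3: pool the definitions of every sense of each context word,
    # skipping stop words and the target word itself.
    context_signature = set()
    for word in context_words:
        w = word.lower()
        if w in STOP_WORDS or w == target_word.lower():
            continue
        for sense in wn.synsets(w):
            context_signature |= set(sense.definition().lower().split())
    context_signature -= STOP_WORDS

    # Steps 4-5: count overlaps and keep the best-scoring definition.
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(target_word):
        signature = set(sense.definition().lower().split()) - STOP_WORDS
        overlap = len(signature & context_signature)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

sentence = "the bank approved my loan application".split()
sense = lesk_disambiguate("bank", sentence)
if sense is not None:
    print(sense.name(), "->", sense.definition())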
Advantages and Disadvantages
Advantages
1. Simple to implement: Lesk’s algorithm is a relatively simple algorithm that can be
implemented in a few lines of code. This makes it a good choice for beginners and for
applications where speed is important.
2. Applicable in a variety of different contexts: Lesk's algorithm does not rely on
global information or the structure of the sentence. This means that it can be applied
to new words and new contexts without having to be extensively modified. For
example, the algorithm can be used to disambiguate the word "bank" in the
sentences "The bank robber ran down the street" and "The bank of the river is very
steep" (a runnable demonstration follows this list).
3. Easily generalizable: for the same reason, the algorithm generalizes readily to new
words and new contexts, requiring nothing beyond a dictionary with reasonable
coverage.
4. Non-syntactic: Lesk’s algorithm is non-syntactic, which means that it does not rely
on the arrangement of words in a sentence. This makes it more robust to errors in
grammar and punctuation.
5. Reasonable accuracy: Lesk's algorithm has been shown to perform adequately in
many cases, especially when the candidate senses have clearly distinct definitions.
However, the algorithm can be less accurate for words with multiple senses
that are semantically similar.
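As promised above, here is how the two "bank" sentences can be run through a Lesk
implementation. The sketch uses the lesk function that ships with NLTK; which sense it
picks depends on the WordNet glosses, so the results should be inspected rather than
assumed.

# Run NLTK's built-in Lesk implementation on the two example sentences
# (requires the wordnet corpus).
from nltk.wsd import lesk

s1 = "The bank robber ran down the street".lower().split()
s2 = "The bank of the river is very steep".lower().split()

print(lesk(s1, "bank"))  # returns a WordNet Synset (or None)
print(lesk(s2, "bank"))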
Disadvantages
1. Low accuracy: Lesk's algorithm has been shown to have low accuracy, especially for
words with multiple senses that are semantically similar. For example, the word
"bank" can have multiple senses, such as a financial institution, a riverbank, or a
slope, and the definitions of the latter two overlap enough to be easily confused.
2. Low recall: Lesk's algorithm also has low recall, meaning that it fails to disambiguate
many words. Dictionary definitions are short, so for many target words no candidate
definition shares any content words with the context, leaving the algorithm with no
basis for a decision (the sketch after this list shows one way to detect this case).
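To make the recall problem concrete, the hedged variant below abstains instead of guessing
when every candidate definition has zero overlap with the context; the function name and
stop-word list are illustrative.

# Abstain (return None) when no sense overlaps the context at all.
from nltk.corpus import wordnet as wn

STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "to", "it"}

def lesk_or_abstain(target_word, context_words):
    context = {w.lower() for w in context_words} - STOP_WORDS
    scored = [(len(set(s.definition().lower().split()) & context), s)
              for s in wn.synsets(target_word)]
    if not scored:
        return None  # word not in the dictionary at all
    best_overlap, best_sense = max(scored, key=lambda pair: pair[0])
    return best_sense if best_overlap > 0 else None  # None = cannot disambiguate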
Overall, Lesk's algorithm is a simple and versatile algorithm for word sense
disambiguation, and a good choice for applications where speed and ease of
implementation matter most. However, it has some limitations, such as its low accuracy
and low recall. These limitations can be addressed by using a better dictionary, weighting
the matched terms, and adjusting the context window size, as discussed below.
Unanswered questions
Lesk’s algorithm leaves several questions unanswered, such as:
1. What dictionary should be used? The Lesk algorithm does not specify which
dictionary should be used. This can affect the accuracy of the algorithm, as different
dictionaries may have different definitions for the same word.
2. Should all matched terms be considered equally, or should they be weighted by the
length of the dictionary definition? The Lesk algorithm does not specify how to weight
the matched terms. This can also affect the accuracy of the algorithm, as some terms
may be more informative than others.
3. How wide should the context window be? The Lesk algorithm does not specify how
wide the context window should be. This can also affect the accuracy of the
algorithm, as a wider context window may capture more information about the word,
but it may also include irrelevant information.
Exploring these questions helps us find the best fit for our use case and tune the algorithm
to our needs.
One way to improve the accuracy of Lesk's algorithm is to use a better dictionary, one with
more accurate and complete definitions for words. Another is to weight the matched terms:
some terms are more informative than others, and weighting them accordingly can improve
the accuracy of the algorithm. A third is to adjust the context window size. A wider context
window may capture more information about the word, but it may also include irrelevant
information; tuning the window size helps capture the relevant information while filtering
out the irrelevant.
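The sketch below illustrates the last two knobs: a tunable context window and an overlap
score normalized by definition length. The function names and the normalization scheme
are illustrative assumptions, not prescriptions from Lesk's paper.

# Tunable context window and length-weighted overlap (requires the
# wordnet corpus).
from nltk.corpus import wordnet as wn

def context_window(tokens, target_index, window=3):
    """Take up to `window` words on each side of the target."""
    left = tokens[max(0, target_index - window):target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    return left + right

def weighted_overlap(sense, context):
    """Normalize the raw overlap by definition length so long glosses
    do not win simply by containing more words."""
    signature = set(sense.definition().lower().split())
    return len(signature & set(context)) / max(len(signature), 1)

tokens = "the steep bank of the river eroded after the flood".split()
i = tokens.index("bank")
ctx = context_window(tokens, i, window=4)
best = max(wn.synsets("bank"), key=lambda s: weighted_overlap(s, ctx))
print(best.name(), "->", best.definition())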