
NLP UNIT 2 Part 2

The document discusses lexical resources, focusing on WordNet and FrameNet, which are databases that organize words and their meanings for applications in natural language processing (NLP). It also covers stemming as a technique to reduce words to their root forms, along with its advantages, disadvantages, and applications in information retrieval. Additionally, the document outlines various part-of-speech taggers used in NLP, emphasizing their methodologies and functionalities.

Uploaded by

pavani20891
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views6 pages

NLP UNIT 2 Part 2

The document discusses lexical resources, focusing on Wordnet and FrameNET, which are databases that organize words and their meanings for applications in natural language processing (NLP). It also covers stemming as a technique to reduce words to their root forms, along with its advantages, disadvantages, and applications in information retrieval. Additionally, the document outlines various part-of-speech taggers used in NLP, emphasizing their methodologies and functionalities.

Uploaded by

pavani20891
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 6

UNIT 2 (PART 2)

LEXICAL RESOURCES

Introduction:-
A lexical resource is a collection of words and/or phrases along with associated information,
such as part-of-speech and sense definitions. Lexical resources are secondary to texts, and are
usually created and enriched with the help of texts. There are four lexical categories: nouns,
verbs, adjectives and adverbs.

WordNet:-
WordNet is a large lexical database of English words. Nouns, verbs, adjectives and adverbs are
grouped into sets of cognitive synonyms called "synsets", each expressing a distinct concept.
Synsets are interlinked through conceptual, semantic and lexical relations such as hyponymy
and antonymy.
The meaning of a word is called a sense.
Noun:
1. read (something that is read): "The article was a very good read."
Verb:
1. read (interpret something that is written or printed):
"Read the advertisement."; "Have you read Salman Rushdie?"
2. read (have or contain a certain wording or form):
"The passage reads as follows."; "What does the law say?"
3. read, scan:
"This dictionary can be read by the computer."
4. learn, study, read, take:
"She is reading for the bar exam."
5. read: "I read you loud and clear!"
The word 'read' therefore has one sense as a noun and five senses as a verb. The figure below
shows some of the relationships that hold between nouns, verbs, adjectives and adverbs.
Nouns and verbs are organized into hierarchies based on the hypernymy/hyponymy
relation, whereas adjectives are organized into clusters based on antonym pairs.
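The noun hierarchy can be pictured as a chain of hypernym ("is-a") links. The sketch below walks such a chain using a small hand-built table; the entries are illustrative only (real WordNet links synsets, not individual words, and is far larger):

```python
# Toy hypernym ("is-a") table in the style of WordNet's noun hierarchy.
# Illustrative data only; real WordNet links synsets, not bare words.
HYPERNYMS = {
    "dog": "canine",
    "canine": "mammal",
    "mammal": "animal",
    "animal": "organism",
    "oak": "tree",
    "tree": "plant",
    "plant": "organism",
}

def hypernym_chain(word):
    """Follow hypernym links from a word up to the top of the hierarchy."""
    chain = [word]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain

print(hypernym_chain("dog"))  # ['dog', 'canine', 'mammal', 'animal', 'organism']
```

Chains like this are what make the hierarchy useful: two words are related if their chains meet at a common ancestor.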
Applications of WordNet:-
WordNet has found numerous applications in problems related to IR and NLP.

Concept Identification in Natural Language:-


WordNet can be used to identify the concepts pertaining to a term, so as to capture the full
semantic richness and complexity of a given information need.

Word Sense Disambiguation:-


WordNet combines features of a number of other resources commonly used in
disambiguation work. It offers sense definitions of words, identifies synsets of synonyms,
defines a number of semantic relations, and is freely available.
Automatic Query Expansion:-
WordNet's semantic relations can be used to expand queries so that the search for a document
is not confined to the pattern-matching of query terms, but also covers synonyms.
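A minimal sketch of this idea, using a hand-coded synonym table in place of real WordNet synset lookups (the table entries are assumptions for illustration):

```python
# Hand-coded stand-in for WordNet synonym lookup (illustrative data only).
SYNONYMS = {
    "car": {"automobile", "auto"},
    "fast": {"quick", "rapid"},
}

def expand_query(terms):
    """Return the query terms plus any known synonyms, so retrieval is not
    confined to exact pattern-matching of the original terms."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded

print(sorted(expand_query(["car", "fast"])))
# ['auto', 'automobile', 'car', 'fast', 'quick', 'rapid']
```

A real system would draw the expansion terms from the synsets (and possibly hypernyms) of each query term.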

Document Structuring and Categorization:-


The semantic information extracted from WordNet, and WordNet's conceptual representation of
knowledge, have been used for text categorization.

Document Summarisation:-
WordNet has found useful applications in text summarization, where systems utilize information
from WordNet to compute lexical chains.
● FrameNet:
FrameNet is a large database of semantically annotated English sentences. It is based
on the principles of frame semantics. It defines a tagset of semantic roles called frame
elements.
FrameNet captures situations through case-frame representations of words
(verbs, adjectives and nouns). The word that evokes a frame is called the target word or
predicate, and the participating entities are described using semantic roles, which are called
frame elements.
Each frame contains a main lexical item as predicate and associated frame-
specific semantic roles, called the frame elements, such as AUTHORITIES, TIME and
SUSPECT in the ARREST frame.
Example:

[AUTHORITIES The police] nabbed [SUSPECT the snatcher].

The above sentence is annotated with the semantic roles AUTHORITIES and SUSPECT.
The target word in the sentence is 'nab', which is a verb in the ARREST frame.
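One way to picture such an annotation is as a small data structure pairing the target predicate with its frame elements. The sketch below is illustrative only, not FrameNet's actual data format:

```python
from dataclasses import dataclass, field

# Sketch of a FrameNet-style annotation: a frame, its target predicate, and
# the text spans filling each frame element (illustrative structure only).
@dataclass
class FrameAnnotation:
    frame: str        # e.g. "ARREST"
    target: str       # the predicate evoking the frame
    elements: dict = field(default_factory=dict)  # frame element -> text span

arrest = FrameAnnotation(
    frame="ARREST",
    target="nab",
    elements={"AUTHORITIES": "the police", "SUSPECT": "the snatcher"},
)
print(arrest.elements["AUTHORITIES"])  # the police
```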

FrameNet Applications:-
The shallow semantic roles obtained from FrameNet can play an important role in information
extraction.
For example, semantic roles make it possible to identify that the theme role played
by 'match' is the same in sentences (1) and (2), even though its grammatical role differs.
(1) The umpire stopped the match.
(2) The match stopped due to bad weather.
In sentence (1) the word 'match' is the object, while it is the subject in sentence (2).

STEMMERS:-
Stemming is an NLP technique used to reduce words to their base form, also known as the
root form. Stemming is used to normalize text and make it easier to process. It is
an important step in text pre-processing, and it is commonly used in information retrieval and
text mining applications.

A stemming algorithm reduces the words "chocolates", "chocolatey" and "choco" to the root
word "chocolate", and "retrieval" and "retrieved" to the stem "retrieve".
There are several different algorithms for stemming, including the Porter stemmer,
Snowball stemmer and Lancaster stemmer.
The Porter stemmer removes common suffixes from words.
The Snowball stemmer is more advanced and is based on the Porter stemmer, but it also
supports several other languages in addition to English.
The Lancaster stemmer is less accurate than the Porter stemmer and Snowball stemmer.
Stemming can be useful for several NLP tasks, such as text classification, information
retrieval and text summarisation.
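The core of any such stemmer is ordered suffix-stripping. The sketch below uses a deliberately tiny rule set of my own (far simpler than the Porter algorithm, which adds many more rules plus conditions on the stem):

```python
# A deliberately tiny suffix-stripping stemmer (illustrative rules only; the
# Porter algorithm uses many more rules plus measure conditions on the stem).
SUFFIX_RULES = [("sses", "ss"), ("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]

def simple_stem(word):
    """Strip the first matching suffix, keeping a stem of at least two letters."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)] + replacement
    return word

print(simple_stem("chocolates"))  # chocolate
print(simple_stem("liking"))     # lik
```

Note how "liking" becomes "lik" rather than "like": a stem need not be a dictionary word, which is exactly where stemming differs from lemmatization.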

Disadvantages:-
● It may reduce the readability of the text.
● It may not always produce the correct root form of a word.
Note: Stemming is different from lemmatization.
Ex: Words stemmed to the root "like" include
"likes", "liked", "likely" and "liking".

Errors in Stemming:-
There are mainly two errors,
1. Over-stemming
2. Under- stemming

Over-stemming occurs when two words with different stems are reduced to the same
root (a false positive).
Under-stemming occurs when two words that should share the same root are not reduced
to it (a false negative).
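Both error types can be seen with a deliberately crude stemmer that keeps only the first five characters of a word (a hypothetical rule, chosen only to make the failures obvious):

```python
# Deliberately crude stemmer: keep the first five characters. This is a
# hypothetical rule used only to demonstrate the two stemming error types.
def crude_stem(word):
    return word[:5]

# Over-stemming: words with different stems collapse to the same root.
print(crude_stem("universe"), crude_stem("university"))  # unive unive

# Under-stemming: forms that share a stem fail to collapse.
print(crude_stem("datum"), crude_stem("data"))  # datum data
```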

Applications of Stemming:-
1. Stemming is used in Information retrieval systems like search engines.
2. It is used to determine domain vocabularies in domain analysis.
3. It is used in document (text) clustering, a method of group analysis applied to textual
materials. This includes topic extraction, automatic document structuring and quick
information retrieval.
PART-OF-SPEECH TAGGER:-
Part-of-speech tagging is used at an early stage of text processing in many NLP
applications such as speech synthesis, machine translation, IR and information extraction.
In IR, part of speech tagging can be used in indexing, extracting phrases and for
disambiguating word senses. The rest of this section presents a number of part-of-speech
taggers that are already in place.
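Before looking at individual taggers, it helps to see the simplest statistical approach: tag each word with the tag it carries most often in a training corpus, a common baseline that the taggers below improve upon. The corpus here is a toy example of my own:

```python
from collections import Counter, defaultdict

# Most-frequent-tag baseline trained on a toy hand-tagged corpus
# (illustrative data; real taggers train on corpora such as the Penn Treebank).
TRAIN = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ"),
         ("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]

tag_counts = defaultdict(Counter)
for word, tag in TRAIN:
    tag_counts[word][tag] += 1

def baseline_tag(sentence):
    """Assign each word its most frequent training tag; unknowns default to NN."""
    return [(w, tag_counts[w].most_common(1)[0][0] if w in tag_counts else "NN")
            for w in sentence]

print(baseline_tag(["the", "cat", "barks"]))
# [('the', 'DT'), ('cat', 'NN'), ('barks', 'VBZ')]
```

This baseline ignores context entirely; the taggers described next add contextual information in different ways.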

Stanford Log-linear part-of-speech (POS) Tagger:-


1. It makes explicit use of both the preceding and following tag context via a
dependency network representation.
2. It uses a broad range of lexical features.
3. Utilizes priors in conditional log linear models.

A part-of-speech Tagger for English:-


This tagger uses a bi-directional inference algorithm for part-of-speech tagging. It is
based on Maximum Entropy Markov Models (MEMM).
The algorithm can enumerate all possible decomposition structures and find the
highest probability sequence together with the corresponding decomposition structure in
polynomial time.

TNT Taggers:-
Trigrams 'n' Tags (TnT) is an efficient statistical part-of-speech tagger. This tagger is
based on Hidden Markov Models (HMM) and uses optimisation techniques for
smoothing and handling unknown words.
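The HMM idea behind TnT can be sketched as a Viterbi search for the most probable tag sequence. The version below is reduced to a bigram model with hand-made probabilities (TnT itself uses trigrams, smoothing and suffix-based unknown-word handling; every number here is an assumption for illustration):

```python
import math

# Bigram HMM tagger with hand-made probabilities (illustrative only; TnT
# uses trigrams plus smoothing and suffix analysis for unknown words).
TAGS = ["DT", "NN", "VB"]
START = {"DT": 0.8, "NN": 0.15, "VB": 0.05}          # P(tag | sentence start)
TRANS = {"DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},  # P(tag_i | tag_{i-1})
         "NN": {"DT": 0.1, "NN": 0.2, "VB": 0.7},
         "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1}}
EMIT = {"DT": {"the": 0.9},                          # P(word | tag)
        "NN": {"dog": 0.4, "walk": 0.1},
        "VB": {"dog": 0.05, "walk": 0.4}}

def lp(p):
    """Log probability; zero maps to -infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(words):
    """Most probable tag sequence under the toy HMM (computed in log space)."""
    scores = [{t: lp(START[t]) + lp(EMIT[t].get(words[0], 1e-6)) for t in TAGS}]
    back = []
    for w in words[1:]:
        col, ptr = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: scores[-1][p] + lp(TRANS[p][t]))
            col[t] = scores[-1][prev] + lp(TRANS[prev][t]) + lp(EMIT[t].get(w, 1e-6))
            ptr[t] = prev
        scores.append(col)
        back.append(ptr)
    tags = [max(TAGS, key=lambda t: scores[-1][t])]
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    return tags[::-1]

print(viterbi(["the", "dog", "walk"]))  # ['DT', 'NN', 'VB']
```

A trigram model extends this by making each state a pair of tags, which is what TnT does.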

Brill Tagger:-
Brill described a trainable rule-based tagger that obtained performance comparable to
that of a stochastic tagger. It uses transformation-based learning to automatically induce
rules.
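Applying one such learned transformation can be sketched as follows; the rule "change NN to VB when the previous tag is TO" is a classic illustrative example, not a claim about Brill's actual learned rule list:

```python
# Apply one Brill-style contextual transformation rule to an initial tagging.
def apply_rule(tagged, from_tag, to_tag, prev_tag):
    """Change from_tag to to_tag wherever the previous token's tag is prev_tag."""
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == from_tag and out[i - 1][1] == prev_tag:
            out[i] = (word, to_tag)
    return out

initial = [("to", "TO"), ("run", "NN")]          # naive initial tagging
fixed = apply_rule(initial, "NN", "VB", "TO")
print(fixed)  # [('to', 'TO'), ('run', 'VB')]
```

Training consists of repeatedly picking the rule that most reduces tagging errors on the annotated corpus and adding it to the rule list.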

CLAWS part-of-speech Tagger for English:-


Constituent Likelihood Automatic Word-tagging System (CLAWS) is one of the earliest
probabilistic taggers for English. It can be easily adapted to different types of text in
different input formats.

Tree Tagger:-
It is a probabilistic tagging method. It avoids the problems faced by Markov model
methods when estimating transition probabilities from sparse data.

ACoPOST: A Collection of POS Taggers:-


ACoPOST is a set of freely available POS taggers. The taggers in the set are based on
different frameworks. It consists of four taggers.
Maximum Entropy Tagger (MET):-
It uses an iterative procedure to successively improve parameters for a set of features
that help to distinguish between relevant contexts.

Trigram Tagger:- (T3)


It is based on HMM. The states in the model are tag pairs that emit words.

Error-driven Transformation-Based Tagger (TBT):-


It uses annotated corpora to learn transformation rules, which are then used to change
the assigned tags using contextual information.

Example-based Tagger (ET):-


Example-based models, also called memory-based or instance-based models, rest on the
assumption that words occurring in similar contexts will take the same part-of-speech tag.

POS Tagger for Indian Languages:-


POS tagging of Indian languages adds many more dimensions to the problem, as most of
these languages are agglutinative, morphologically very rich, highly inflected and
sometimes diglossic.
● RESEARCH CORPORA:-
A corpus is a searchable database of language samples for linguistic research. A
corpus may be based on written or spoken language. Some corpora are tagged
or annotated with part-of-speech information; other corpora are plain text.

● Monolingual Corpus:- It consists of texts within a single language.


● Multilingual Corpus:- It contains text written in multiple languages.
● Specialized Corpus:- It is focused on a specific domain or subject area.

A corpus can be made up of everything from newspapers, novels, recipes and radio
broadcasts to television shows, movies and tweets.
In NLP, a Corpus contains text and speech data that can be used to train AI and
machine learning systems.
