02 - Morphological Analysis
Typical Use case ….
"Absolutely loving the new update to the app. Great job!" → Positive review
"Very disappointed with the customer service, not helpful at all." → Negative review
"I noticed the store has extended its hours. Interesting move." → Neutral comment
"Does anyone know if this product is available in blue?" → Enquiry
"Just tried the new cafe downtown, and it's amazing!" → Praise, positive feedback
"I'm having trouble logging into my account, can you assist me?" → Support request
"My order has been delayed for two weeks now, what's going on?" → Complaint
.
.
What are your store hours on weekends?
Can I get more information about the warranty on the laptop models you sell?
Morphological analysis focuses on how the components within a word (stems, root words,
prefixes, suffixes, etc.) are arranged or modified to create different meanings.
Morphology varies greatly between languages. In languages such as Russian, word endings
indicate the role of a word in a sentence. As a result, morphological analysis depends heavily
on the source language, and an understanding of what is supported within that language plays
a vital role in developing an NLP application.
The Natural Language API uses morphological analysis to infer grammatical information
about words.
Types of Morphemes
Some important terms used in morphology are:
Stem:
Is a part of a word responsible for its lexical meaning. It refers to the main part of a word to
which affixes (prefixes, suffixes, infixes, circumfixes) are added. It is the base form that
remains after removing all the affixes that modify its meaning or create new words.
Examples.
In the word "unbelievable," the stem is "believe."
Prefix: "un-" (meaning not)
Stem: "believe" (basic meaning: accept as true)
Suffix: "-able" (meaning able to be)
The word "unbelievable" thus means 'not able to be believed.'

For the word "runner," the stem is "run."
Stem: "run" (basic action)
Suffix: "-er" (one who does the action; the final "n" of the stem is doubled in spelling)
"Runner" refers to 'one who runs.'
Root:
Is the most basic, irreducible part that carries the core meaning of the word. Unlike stems, roots
cannot be broken down into smaller parts and, in their most basic form, have no prefixes,
suffixes, or infixes attached. Roots form the base upon which stems and ultimately full words
are built. In many cases, the root is the same as the stem.
For the word "reaction," the root is "act."
Prefix: "re-" (meaning again or back)
Root: "act" (basic action or doing)
Suffix: "-ion" (denoting the action or condition of)
"Reaction" refers to 'the action of doing something again or in response.'

In "writer," the root is "write."
Root: "write" (basic action: to form letters or words)
Suffix: "-er" (one who does the action)
"Writer" refers to 'one who writes.'
Part of Speech:
Is a category of words in a language that have similar grammatical properties. Common parts
of speech include nouns, verbs, adjectives, adverbs, pronouns, prepositions, conjunctions, and
interjections. Each part of speech plays a specific role in a sentence, contributing to the
sentence's overall meaning and structure. Understanding parts of speech is crucial for
analyzing and constructing sentences effectively.
Nouns: Words that name people, places, things, or ideas.
Example: "Computer," "Paris," "happiness."
Verbs: Words that express actions, occurrences, or states of being.
Example: "run," "is," "think."
Adjectives: Words that describe or modify nouns.
Example: "red," "quick," "intelligent."
Adverbs: Words that modify verbs, adjectives, or other adverbs, often indicating manner, place,
time, or degree.
Example: "quickly," "there," "very."
Pronouns: Words that take the place of nouns.
Example: "he," "they," "it."
Prepositions: Words that show the relationship between a noun (or pronoun) and other words in
a sentence, often indicating time, place, or direction.
Example: "in," "at," "by."
Conjunctions: Words that join words, phrases, or clauses.
Example: "and," "but," "because."
Interjections: Words used to express emotions or sudden bursts of feeling.
Example: "Wow!," "Ouch!," "Hey!"
Inflectional morphology
Adds information to a word consistent with its context within a sentence.
Examples
• Number (singular versus plural): automaton → automata
• Case (nominative versus accusative versus…): he, him, his, …
• walk → walks
Morphology Analysis Approaches
Morphological analysis may be defined as the process of obtaining grammatical information from
tokens, given their suffix information. Morphological analysis can be performed in three ways:
1. Morpheme-based morphology (or an item and arrangement approach),
2. Word-based morphology (or a word and paradigm approach), and
3. Lexeme-based morphology (or an item and process approach).
1. Morpheme-based morphology
Morpheme-based morphology analyzes and describes the structure of words by breaking them
down into their smallest meaningful units, called morphemes. There are two main types of
morphemes in morpheme-based morphology:
Free Morphemes: These can stand alone as words (e.g., "book", "go").
Bound Morphemes: These cannot stand alone and must be attached to a free morpheme (e.g.,
prefixes like "un-", suffixes like "-ing").
Words are formed by combining these morphemes in a linear arrangement.
Word: "Unhappiness"
Structure: [Prefix "Un-"] + [Root "happy"] + [Suffix "-ness"]
This structure shows that the word "unhappiness" is composed of three morphemes: "un-" (a
prefix), "happy" (a root), and "-ness" (a suffix). Each morpheme contributes to the overall
meaning of the word.
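The linear prefix + root + suffix arrangement described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the affix lists `PREFIXES` and `SUFFIXES` and the function `segment` are hypothetical names made up for this example, and the function returns the surface form of the root (for instance "happi" rather than "happy"), because it does not undo spelling changes.

```python
# Toy morpheme segmentation: strip at most one known prefix and one known suffix.
# PREFIXES, SUFFIXES and segment() are illustrative, not part of any library.
PREFIXES = ("un", "re")
SUFFIXES = ("ness", "able", "ing")


def segment(word):
    """Return (prefix, root, suffix) for a word, using the tiny affix lists above."""
    word = word.lower()

    # Pick the first prefix that matches and still leaves at least 3 characters.
    prefix = ""
    for candidate in PREFIXES:
        if word.startswith(candidate) and len(word) - len(candidate) >= 3:
            prefix = candidate
            break
    rest = word[len(prefix):]

    # Pick the first suffix that matches and still leaves at least 3 characters.
    suffix = ""
    for candidate in SUFFIXES:
        if rest.endswith(candidate) and len(rest) - len(candidate) >= 3:
            suffix = candidate
            break
    root = rest[:len(rest) - len(suffix)] if suffix else rest

    return prefix, root, suffix


print(segment("unhappiness"))
print(segment("book"))
```

A free morpheme such as "book" comes back with empty prefix and suffix slots, while "unhappiness" is split into three parts. A real analyzer would need a lexicon to confirm that the remaining root is an actual morpheme.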
2. Word-based morphology
Word-based morphology focuses on words, rather than morphemes, as the central units of
morphological analysis. This approach emphasizes the full forms of words instead of
attempting to segment words into constituent morphemes. It contrasts with morpheme-based
morphology, which breaks words down into the smallest units of meaning. It treats words as
indivisible wholes or as bases to which processes are applied, and it looks at how words change
as whole units through processes like inflection, derivation, and compounding.
There is less focus on dividing the word into prefixes, stems, and suffixes. Instead, the
processes that affect the word as a whole are examined.
- The lexical layer consists of lexemes, which are the abstract, minimal units of meaning
without any inflectional endings or derivational affixes. They represent the set of words
that often appear as "dictionary entries."
- The inflectional layer involves the addition of affixes to lexemes to express
grammatical relationships and features, such as tense, number, gender, etc., without
changing the core meaning or word class (e.g., "walk" to "walked").
[ Lexeme "walk" ] → [ Derivation (N/A in this case) ]
        ↓
[ Inflection ] → [ "walk" (base) | "walks" (3rd person singular) | "walked" (past) | "walking" (progressive) ]
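The lexeme-to-inflected-forms diagram above can be mimicked with a small paradigm generator. This is a minimal sketch that assumes a regular English verb: the function name `regular_verb_paradigm` is hypothetical, and the rules deliberately ignore irregular verbs and consonant doubling (so it would get "run" wrong).

```python
def regular_verb_paradigm(lexeme):
    """Build the inflectional paradigm of a regular English verb (toy rules only)."""
    # Verbs ending in "e" drop it before "-ing" and take only "-d" in the past.
    if lexeme.endswith("e"):
        past = lexeme + "d"
        progressive = lexeme[:-1] + "ing"
    else:
        past = lexeme + "ed"
        progressive = lexeme + "ing"

    # Sibilant endings take "-es" in the 3rd person singular.
    if lexeme.endswith(("s", "sh", "ch", "x", "z")):
        third_singular = lexeme + "es"
    else:
        third_singular = lexeme + "s"

    return {
        "base": lexeme,
        "3rd person singular": third_singular,
        "past": past,
        "progressive": progressive,
    }


for label, form in regular_verb_paradigm("walk").items():
    print(f"{label}: {form}")
```

Treating the whole word as the unit and applying a process to it, rather than concatenating stored morphemes, is the design point the sketch is meant to show.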
Morphology Analysis (contd)
A morphological analyzer is a program that analyzes the morphology of a given input token
and generates morphological information, such as its root, stem, and prefix, as output.
During morphological analysis, each word is analyzed individually and assigned a syntactic
category to resolve ambiguity about the word. Non-word tokens such as punctuation are removed
from the text.
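The step of removing non-word tokens before analysis can be sketched with the Python standard library. The function name `word_tokens` and the regular expression are assumptions chosen for this example; a production tokenizer would handle many more cases.

```python
import re


def word_tokens(text):
    """Tokenize text and keep only tokens that contain a letter or digit."""
    # Match either a word (optionally with an apostrophe part, e.g. "I'm")
    # or a single punctuation character.
    tokens = re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)
    # Drop tokens made up purely of punctuation.
    return [tok for tok in tokens if any(ch.isalnum() for ch in tok)]


print(word_tokens("I'm having trouble logging into my account, can you assist me?"))
```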
Stemming
Stemming algorithms aim to remove the affixes a word carries for, e.g., grammatical role, tense,
or derivational morphology, leaving only the stem of the word. This is a difficult problem due to
irregular words (e.g., common verbs in English), complicated morphological rules, and
part-of-speech and sense ambiguities.
NLTK stemming algorithms
- PorterStemmer
- SnowballStemmer
- LancasterStemmer
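The three NLTK stemmers listed above can be compared side by side. This sketch assumes the `nltk` package is installed; the helper `compare_stemmers` is a hypothetical name for this example, and the word list is arbitrary. The stemmers themselves need no extra data downloads.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer


def compare_stemmers(words):
    """Return {word: {stemmer_name: stem}} for the three NLTK stemmers."""
    stemmers = {
        "porter": PorterStemmer(),
        "snowball": SnowballStemmer("english"),  # Snowball needs a language name
        "lancaster": LancasterStemmer(),
    }
    return {
        word: {name: stemmer.stem(word) for name, stemmer in stemmers.items()}
        for word in words
    }


for word, stems in compare_stemmers(["running", "walked", "studies"]).items():
    print(word, stems)
```

The outputs are stems, not necessarily dictionary words, and the three algorithms can disagree on the same input because they differ in how aggressively they strip suffixes.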
Lemmatization
Lemmatization is another technique used to reduce inflected words to their root word. It
describes the algorithmic process of identifying an inflected word’s “lemma” (dictionary
form) based on its intended meaning.
POS
Part of natural language processing is determining the role of each word or token in a body
of text. In the world of NLP, we call this process part-of-speech (POS) tagging. The NLTK
package comes with a function pos_tag() that makes this job relatively seamless, and gives
us a good starting point.
VB verb, base form – take
VBD verb, past tense – took
VBG verb, gerund/present participle – taking
VBN verb, past participle – taken
VBP verb, singular present, non-3rd person – take
VBZ verb, 3rd person sing. present – takes
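The verb tags in the table above can be attached to the (token, tag) pairs that `pos_tag()` returns. In this sketch, `VERB_TAGS`, `describe_tags`, and `tag_with_nltk` are hypothetical names for illustration. The NLTK call is wrapped in a function that is not run here, because `nltk.pos_tag` needs a tagger model downloaded beforehand, and the resource name varies between NLTK versions.

```python
# Descriptions copied from the verb tag table above.
VERB_TAGS = {
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund/present participle",
    "VBN": "verb, past participle",
    "VBP": "verb, singular present, non-3rd person",
    "VBZ": "verb, 3rd person singular present",
}


def describe_tags(tagged_tokens):
    """Add a readable description to each (token, tag) pair."""
    return [
        (token, tag, VERB_TAGS.get(tag, "not a verb tag in this table"))
        for token, tag in tagged_tokens
    ]


def tag_with_nltk(text):
    """Tag whitespace-separated tokens with NLTK (requires the tagger model)."""
    import nltk  # imported here so the rest of the sketch runs without NLTK

    return nltk.pos_tag(text.split())


# Hand-written (token, tag) pairs, used so the example runs without any downloads.
print(describe_tags([("take", "VB"), ("took", "VBD"), ("takes", "VBZ")]))
```

Once the tagger data is available, `describe_tags(tag_with_nltk("She took the train"))` would apply the same lookup to real tagger output.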
Compared with stemming, lemmatization algorithms attempt to find the base form (lemma) of
inflected words by conducting a fuller morphological analysis. To reduce inflections
accurately, however, a detailed dictionary must be kept so that the algorithm can look up an
inflected word and link it back to its lemma. Lemmatization algorithms sacrifice speed and
efficiency for accuracy, but they tend to produce more meaningful base forms than stemming
algorithms.
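The dictionary lookup described above can be sketched as a table keyed by the inflected word and its part-of-speech tag. `LEMMA_DICT` and `lemmatize` are hypothetical names, and the handful of entries is hand-written for illustration; a real lemmatizer relies on a large lexical resource such as WordNet.

```python
# Tiny hand-written dictionary: (inflected form, POS tag) -> lemma.
LEMMA_DICT = {
    ("took", "VBD"): "take",
    ("taken", "VBN"): "take",
    ("taking", "VBG"): "take",
    ("takes", "VBZ"): "take",
    ("automata", "NNS"): "automaton",
}


def lemmatize(word, pos):
    """Look up the lemma for (word, pos); fall back to the lowercased word."""
    key = (word.lower(), pos)
    return LEMMA_DICT.get(key, word.lower())


print(lemmatize("took", "VBD"))
print(lemmatize("automata", "NNS"))
print(lemmatize("blue", "JJ"))
```

Irregular forms such as "took" are the cases where a lookup succeeds and suffix stripping fails, which is the accuracy gain the dictionary buys at the cost of building and searching it.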
Popular NLP Tools
NLTK
NLTK is a leading platform for building Python programs to work with human language data.
It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet,
along with a suite of text processing libraries for classification, tokenization, stemming,
tagging, parsing, and semantic reasoning, as well as wrappers for industrial-strength NLP
libraries.
Natural Language API
The Natural Language API offers the following features:
- Syntax Analysis
- Sentiment Analysis
- Entity Analysis
- Entity Sentiment Analysis
- Text Classification
The analyzeSyntax method returns details about the linguistic structure of the given text. For
each token in the text, the Natural Language API provides information about its internal
structure (morphology) and its role in the sentence (syntax).
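As a rough sketch of how the analyzeSyntax method might be used over REST, the code below builds a request body and summarizes the per-token fields of a response. It assumes the v1 REST endpoint and JSON field names of the Cloud Natural Language API. The helper names `build_analyze_syntax_request` and `summarize_tokens` are hypothetical, and the code sends nothing over the network, since a real call also needs credentials.

```python
import json

# Assumed v1 REST endpoint for the analyzeSyntax method.
ANALYZE_SYNTAX_URL = "https://language.googleapis.com/v1/documents:analyzeSyntax"


def build_analyze_syntax_request(text):
    """Build the JSON body for an analyzeSyntax call on plain text."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def summarize_tokens(response):
    """Extract (text, lemma, part-of-speech tag) for each token in a response dict."""
    return [
        (tok["text"]["content"], tok["lemma"], tok["partOfSpeech"]["tag"])
        for tok in response.get("tokens", [])
    ]


# Print the request body that would be POSTed to ANALYZE_SYNTAX_URL.
print(json.dumps(build_analyze_syntax_request("What are your store hours on weekends?"), indent=2))
```

The lemma and part-of-speech fields are where the morphological information for each token appears. The dependency information that describes a token's role in the sentence is left out of this summary.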