02 - Morphological Analysis

Unit # 1
Morphological Analysis
Typical Use Case
Classifying incoming customer text into categories:
• "Absolutely loving the new update to the app. Great job!" → Positive review
• "Very disappointed with the customer service, not helpful at all." → Negative review
• "I noticed the store has extended its hours. Interesting move." → Neutral comment
• "Does anyone know if this product is available in blue?" → Enquiry
• "Just tried the new cafe downtown, and it's amazing!" → Praise, positive feedback
• "I'm having trouble logging into my account, can you assist me?" → Support request
• "My order has been delayed for two weeks now, what's going on?" → Complaint
New messages to be classified:
• "What are your store hours on weekends?"
• "Can I get more information about the warranty on the laptop models you sell?"
Target categories: Suggestions, Service, Enquiry, Complaint, Top Mgmt
Example scores shown on the slide: 0.45, 0.72, 0.35, 0.85, 0.15
What is Morphology ?
In linguistics, Morphology is the study of the internal structure of words. It is the study of
words, how they are formed, and their relationship to other words in the same language. It
analyzes the structure of words and parts of words such as stems, root words, prefixes, and
suffixes. Morphology also looks at parts of speech, intonation and stress, and the ways
context can change a word's pronunciation and meaning.
It focuses on how the components within a word (stems, root words, prefixes, suffixes, etc.)
are arranged or modified to create different meanings.
Morphology varies greatly between languages. In languages such as Russian, word endings
indicate the role of a word in a sentence. As a result, morphological analysis depends heavily
on the source language, and an understanding of what is supported within that language plays
a vital role in developing an NLP application.
The Natural Language API uses morphological analysis to infer grammatical information
about words.
Types of Morphemes
Some important terms used in morphology are:
Stem :
The part of a word responsible for its lexical meaning. It is the main part of a word to
which affixes (prefixes, suffixes, infixes, circumfixes) are added: the base form that
remains after removing all the affixes that modify its meaning or create new words.
Examples:
In the word "unbelievable" the stem is "believe."
• Prefix: "un-" (meaning not)
• Stem: "believe" (basic meaning: accept as true)
• Suffix: "-able" (meaning able to be)
The word "unbelievable" thus means 'not able to be believed.'
For the word "runner," the stem is "run."
• Stem: "run" (basic action)
• Suffix: "-er" (one who does the action; the final "n" of the stem is doubled in spelling)
"Runner" refers to 'one who runs.'
Root :
Is the most basic, irreducible part that carries the core meaning of the word. Unlike stems, roots
cannot be broken down into smaller parts and typically do not have prefixes, suffixes, or
infixes attached to them in their most basic form. Roots form the base upon which stems and
ultimately full words are built. In many cases, the root is the same as the stem.
Types of Morphemes (contd)
For the word "reaction," the root is "act."
• Prefix: "re-" (meaning again or back)
• Root: "act" (basic action or doing)
• Suffix: "-ion" (denoting the action or condition of)
"Reaction" refers to 'the action of doing something again or in response.'
In "writer" the root is "write."
• Root: "write" (basic action: to form letters or words)
• Suffix: "-er" (one who does the action)
"Writer" refers to 'one who writes.'
Part of Speech :
Is a category of words in a language that have similar grammatical properties. Common parts
of speech include nouns, verbs, adjectives, adverbs, pronouns, prepositions, conjunctions, and
interjections. Each part of speech plays a specific role in a sentence, contributing to the
sentence's overall meaning and structure. Understanding parts of speech is crucial for
analyzing and constructing sentences effectively.
Nouns: Words that name people, places, things, or ideas.
Example: "Computer," "Paris," "happiness."
Verbs: Words that express actions, occurrences, or states of being.
Example: "run," "is," "think."
Adjectives: Words that describe or modify nouns.
Example: "red," "quick," "intelligent."
Types of Morphemes (contd)
Adverbs: Words that modify verbs, adjectives, or other adverbs, often indicating manner, place,
time, or degree.
Example: "quickly," "there," "very."
Pronouns: Words that take the place of nouns.
Example: "he," "they," "it."
Prepositions: Words that show the relationship between a noun (or pronoun) and other words in
a sentence, often indicating time, place, or direction.
Example: "in," "at," "by."
Conjunctions: Words that join words, phrases, or clauses.
Example: "and," "but," "because."
Interjections: Words used to express emotions or sudden bursts of feeling.
Example: "Wow!," "Ouch!," "Hey!"
Inflectional morphology
Adds information to a word consistent with its context within a sentence.
Examples
• Number (singular versus plural): automaton → automata, walk → walks
• Case (nominative versus accusative versus…): he, him, his, …
Morphology Analysis Approaches
Morphological analysis may be defined as the process of obtaining grammatical information from
tokens, given their suffix information. Morphological analysis can be performed in three ways:
1. Morpheme-based morphology (or an item and arrangement approach),
2. Word-based morphology (or a word and paradigm approach), and
3. Lexeme-based morphology (or an item and process approach).
1. Morpheme-based morphology
Morpheme-based morphology analyzes and describes the structure of words by breaking them
down into their smallest meaningful units, called morphemes. There are two main types of
morphemes in morpheme-based morphology:
• Free Morphemes: These can stand alone as words (e.g., "book", "go").
• Bound Morphemes: These cannot stand alone and must be attached to a free morpheme
(e.g., prefixes like "un-", suffixes like "-ing").
Words are formed by combining these morphemes in a linear arrangement.
Word: "Unhappiness"
Structure: [Prefix "Un-"] + [Root "happy"] + [Suffix "-ness"]
This structure shows that the word "unhappiness" is composed of three morphemes: "un-" (a
prefix), "happy" (a root), and "-ness" (a suffix). Each morpheme contributes to the overall
meaning of the word.
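The linear "item and arrangement" idea can be sketched in a few lines of Python. This is a minimal illustration only: the prefix, suffix, and root inventories and the two spelling-repair rules are toy assumptions made up for this example, not a real lexicon or a production segmenter.

```python
# Toy affix and root inventories: illustrative assumptions, not a real lexicon.
PREFIXES = ("un", "re")
SUFFIXES = ("ness", "able", "ing", "ion", "er")
ROOTS = {"happy", "believe", "act", "write", "book", "go"}


def restore_root(core):
    """Return the listed root that `core` spells, undoing two simple spelling changes."""
    candidates = [core, core + "e"]          # "believ" -> "believe"
    if core.endswith("i"):
        candidates.append(core[:-1] + "y")   # "happi" -> "happy"
    for candidate in candidates:
        if candidate in ROOTS:
            return candidate
    return None


def segment(word):
    """Split a word into [prefix-] + root + [-suffix]; return None if no split fits."""
    word = word.lower()
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if not rest.endswith(suffix):
                continue
            core = rest[: len(rest) - len(suffix)]
            root = restore_root(core)
            if root is not None:
                parts = [prefix + "-"] if prefix else []
                parts.append(root)
                if suffix:
                    parts.append("-" + suffix)
                return parts
    return None


print(segment("unhappiness"))   # ['un-', 'happy', '-ness']
print(segment("unbelievable"))  # ['un-', 'believe', '-able']
print(segment("book"))          # ['book']
```

A free morpheme such as "book" comes back on its own, while bound morphemes only ever appear attached to a root.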
Morphology Analysis Approaches (contd)
2. Word-based morphology
Word-based morphology focuses on words as the central units of morphological analysis
rather than morphemes. This approach emphasizes the full forms of words rather than
attempting to segment words into constituent morphemes. It contrasts with morpheme-based
morphology, which breaks down words into the smallest units of meaning. It treats words as
indivisible wholes or as bases to which processes are applied, and it looks at how words
change as whole units through processes like inflection, derivation, and compounding.
There is less focus on dividing the word into prefixes, stems, and suffixes. Instead, the
processes that affect the word as a whole are examined.
Base Word: "Run" → Past Tense Process → Result: "Ran"
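One way to picture the word-based view is a paradigm table that stores complete word forms and never splits them. The sketch below is a hand-written illustration; the table contents and the names `PARADIGMS`, `apply_process`, and `analyze` are assumptions made for this example.

```python
# Each entry stores whole word forms; nothing is segmented into morphemes.
# The table is a hand-written illustration, not data from a real lexicon.
PARADIGMS = {
    "run": {"base": "run", "third_singular": "runs", "past": "ran", "progressive": "running"},
    "walk": {"base": "walk", "third_singular": "walks", "past": "walked", "progressive": "walking"},
}


def apply_process(base_word, process):
    """Return the whole-word result of applying a process to a base word."""
    return PARADIGMS[base_word][process]


def analyze(word_form):
    """List every (base word, process) pair that yields the given form."""
    return [
        (base, process)
        for base, forms in PARADIGMS.items()
        for process, form in forms.items()
        if form == word_form
    ]


print(apply_process("run", "past"))  # ran
print(analyze("ran"))                # [('run', 'past')]
```

Because forms are looked up as wholes, an irregular form such as "ran" needs no special handling.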

Morphology Analysis Approaches (contd)
3. Lexeme-based morphology
Lexeme-based morphology is a theoretical framework in linguistics, which separates
morphological processes into two layers: the lexical layer and the inflectional layer.
- The lexical layer consists of lexemes, which are the abstract, minimal units of meaning
without any inflectional endings or derivational affixes. They represent the set of words
that are often "dictionary entries."
- The inflectional layer involves the addition of affixes to lexemes to express
grammatical relationships and features, such as tense, number, gender, etc., without
changing the core meaning or word class (e.g., "walk" to "walked").
[ Lexeme "walk" ] → [ Derivation (N/A in this case) ]
        ↓
[ Inflection ] → [ "walk" (base) | "walks" (3rd person singular) | "walked" (past) |
"walking" (progressive) ]
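The two layers can be sketched as an abstract lexeme plus a set of inflection processes applied to it. This is a minimal illustration that assumes regular English verbs only; the `Lexeme` class and the feature names are hypothetical and chosen to mirror the "walk" diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    """Lexical layer: an abstract dictionary entry with no inflection."""
    lemma: str
    word_class: str


# Inflectional layer: each feature is a process applied to the lexeme's base form.
# Assumption: regular verbs only; irregular verbs would need overrides.
INFLECTION_PROCESSES = {
    "base": lambda stem: stem,
    "third_singular": lambda stem: stem + "s",
    "past": lambda stem: stem + "ed",
    "progressive": lambda stem: stem + "ing",
}


def inflect(lexeme, feature):
    """Apply one inflection process; the word class stays unchanged."""
    return INFLECTION_PROCESSES[feature](lexeme.lemma)


def paradigm(lexeme):
    """Return every inflected form of a lexeme, keyed by feature."""
    return {feature: inflect(lexeme, feature) for feature in INFLECTION_PROCESSES}


walk = Lexeme("walk", "verb")
print(paradigm(walk))
```

Unlike a stored table of whole forms, the forms here are generated by processes, which is why irregular verbs would require extra rules.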
Morphology Analysis (contd)
A morphological analyzer may be defined as a program that is responsible for the analysis of
the morphology of a given input token. It analyzes a given token and generates
morphological information, such as root, stem, prefix, and so on, as output.
During morphological analysis, each word is analyzed individually and assigned a syntactic
category to resolve ambiguity about its role. Non-word tokens such as punctuation are removed
from the text.
Stemming
Stemming algorithms aim to remove the affixes a word carries for, e.g., grammatical role,
tense, or derivational morphology, leaving only the stem of the word. This is a difficult
problem due to irregular words (e.g., common verbs in English), complicated morphological
rules, and part-of-speech and sense ambiguities.
NLTK algorithms:
- PorterStemmer
- SnowballStemmer
- LancasterStemmer
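The three NLTK stemmers listed above can be compared side by side. This sketch assumes NLTK is installed (`pip install nltk`); the helper name `stem_with_all` and the sample words are illustrative choices. The stemmers are rule-based and need no extra data downloads.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# One instance of each stemmer; SnowballStemmer needs a language name.
STEMMERS = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}


def stem_with_all(word):
    """Return the stem each algorithm produces for one word."""
    return {name: stemmer.stem(word) for name, stemmer in STEMMERS.items()}


for word in ["running", "studies", "took"]:
    print(word, stem_with_all(word))
```

The outputs are stems rather than dictionary words. For example, the Porter stemmer reduces "studies" to "studi", and an irregular form such as "took" is not mapped back to "take".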
Morphology Analysis (contd)
Lemmatization
Lemmatization is another technique used to reduce inflected words to their root word. It
describes the algorithmic process of identifying an inflected word’s “lemma” (dictionary
form) based on its intended meaning.
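The role of "intended meaning" can be illustrated with a lookup keyed on both the word and its part of speech. The table below is a tiny hand-made assumption for illustration; in practice NLTK's WordNetLemmatizer plays this role, backed by the WordNet data.

```python
# Toy lemma table keyed by (word, part of speech): "n" noun, "v" verb, "a" adjective.
# The entries are illustrative assumptions, not a real dictionary.
LEMMA_TABLE = {
    ("took", "v"): "take",
    ("taken", "v"): "take",
    ("saw", "v"): "see",
    ("saw", "n"): "saw",
    ("better", "a"): "good",
    ("desks", "n"): "desk",
}


def lemmatize(word, pos="n"):
    """Return the dictionary form for a word used with the given part of speech."""
    word = word.lower()
    return LEMMA_TABLE.get((word, pos), word)  # unknown words are returned unchanged


print(lemmatize("saw", "v"))  # see
print(lemmatize("saw", "n"))  # saw
print(lemmatize("took", "v")) # take
```

The same surface form "saw" maps to different lemmas depending on whether it is used as a verb or a noun.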
POS
Part of natural language processing is determining the role of each word or token in a body
of text. In the world of NLP, we call this process part-of-speech (POS) tagging. The NLTK
package comes with a function pos_tag() that makes this job relatively seamless, and gives
us a good starting point.
VB verb, base form – take
VBD verb, past tense – took
VBG verb, gerund/present participle – taking
VBN verb, past participle – taken
VBP verb, singular present, non-3rd person – take
VBZ verb, 3rd person singular present – takes
NN noun, singular – desk
NNS noun, plural – desks
NNP proper noun, singular – America
NNPS proper noun, plural – Americans
RB adverb – very, silently
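A small sketch of how the tags above might be used with NLTK follows. The tag descriptions are copied from the list; the helper names `describe` and `tag_text` are hypothetical. `tag_text` assumes NLTK is installed and that its tokenizer and tagger resources have already been fetched with `nltk.download` (the resource names vary by NLTK version), so NLTK is imported only inside that function.

```python
# Descriptions for the Penn Treebank tags listed above.
PENN_TAGS = {
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund/present participle",
    "VBN": "verb, past participle",
    "VBP": "verb, singular present, non-3rd person",
    "VBZ": "verb, 3rd person singular present",
    "NN": "noun, singular",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "NNPS": "proper noun, plural",
    "RB": "adverb",
}


def describe(tagged_tokens):
    """Attach a readable description to each (token, tag) pair."""
    return [
        (token, tag, PENN_TAGS.get(tag, "(tag not in the list above)"))
        for token, tag in tagged_tokens
    ]


def tag_text(text):
    """Tokenize and tag text with NLTK, then describe each tag."""
    import nltk  # imported here so the tag table is usable without NLTK

    return describe(nltk.pos_tag(nltk.word_tokenize(text)))


print(describe([("took", "VBD"), ("desks", "NNS")]))
```

With NLTK and its resources available, `tag_text("She took the desks")` returns one (token, tag, description) triple per token.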

Stemming Vs Lemmatisation
Stemming and lemmatization are both text-processing techniques that aim to reduce
inflected words to a common base form. Despite sharing this overarching objective,
the two techniques are not the same.
Stemming algorithms attempt to find the common base of various inflections by cutting
off the endings or beginnings of the word. This crude heuristic approach typically makes
them fast and efficient but not always accurate.
Lemmatization algorithms, on the other hand, attempt to find the common base form of
inflected words by conducting a fuller morphological analysis. To reduce inflections
accurately, a detailed dictionary must be kept so the algorithm can look up an inflected
word and link it back to its lemma. Lemmatization algorithms sacrifice speed and efficiency
for accuracy, but they usually produce more meaningful base forms than stemming algorithms.
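The trade-off can be made concrete with two toy functions: a suffix stripper standing in for a stemmer and a dictionary lookup standing in for a lemmatizer. Both are deliberately simplified assumptions for illustration and are not the Porter algorithm or a real lemmatizer.

```python
# Suffixes the crude stemmer strips, checked in this order (an illustrative choice).
SUFFIXES = ("ies", "ing", "ed", "es", "s")

# Tiny hand-made dictionary linking inflected forms to lemmas.
LEMMAS = {"studies": "study", "running": "run", "took": "take", "desks": "desk"}


def crude_stem(word):
    """Cut off the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lookup_lemma(word):
    """Return the dictionary form if the word is known, else the word itself."""
    return LEMMAS.get(word, word)


for word in ["studies", "running", "took", "desks"]:
    print(f"{word:8} stem={crude_stem(word):6} lemma={lookup_lemma(word)}")
```

The stripper needs no dictionary but yields non-words such as "stud" and "runn" and leaves the irregular "took" untouched. The lookup returns real words but only for forms its dictionary contains.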
Popular NLP Tools
NLTK
NLTK is a leading platform for building Python programs to work with human language data.
It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet,
along with a suite of text processing libraries for classification, tokenization, stemming,
tagging, parsing, and semantic reasoning, and wrappers for industrial-strength NLP libraries.
Google Natural Language API

The Google Natural Language API is an easy-to-use interface to a set of powerful NLP models
that have been pre-trained by Google to perform various tasks. As these models have been
trained on very large document corpora, their performance is usually quite good as long
as they are used on datasets that do not use highly idiosyncratic language.
The Natural Language API comprises five different services:
Syntax Analysis
Sentiment Analysis
Entity Analysis
Entity Sentiment Analysis
Text Classification
Popular NLP Tools (contd)
The analyzeSyntax method returns details about the linguistic structure of the given text. For
each token in the text, the Natural Language API provides information about its internal
structure (morphology) and its role in the sentence (syntax).
Google AutoML Natural Language

• If the Natural Language API is not flexible enough for business purposes, then AutoML
Natural Language is the next choice. AutoML is a new Google Cloud Service (still in beta)
that enables the user to create customized machine learning models. In contrast to the
Natural Language API, the AutoML models will be trained on the user’s data and therefore
fit a specific task. The AutoML service requires a bit more effort for the user, mainly
because you have to provide a dataset to train the model.
• The AutoML service covers three use cases. All of these use cases support solely the
English language for now.
1. AutoML Text Classification

2. AutoML Entity Extraction
Thanks