1 - Introduction To NLP
• The work done in this phase focused mainly on machine translation (MT).
• Later, Chomsky published his first book, Syntactic Structures, and claimed
that language is generative in nature.
Introduction:
• (1960-1980) – Second Phase Flavored with Artificial Intelligence (AI)
• In Case Grammar, case roles can be defined to link certain kinds of verbs and
objects.
• For example: "Ram broke the glass with the hammer". In this example, case
grammar identifies Ram as an agent, glass as a theme, and hammer as an
instrument.
• Between 1960 and 1980, key systems such as SHRDLU and LUNAR were developed.
• SHRDLU can handle instructions such as "pick up the green ball" and also answer
questions like "What is inside the black box?"
• Until 1980, natural language processing systems were based on
complex sets of hand-written rules.
• After 1980, machine learning algorithms were introduced for
language processing.
• In the early 1990s, NLP started growing faster and achieved
good processing accuracy, especially for English grammar.
• Also in 1990, electronic text was introduced, which provided a good resource
for training and evaluating natural language programs.
• Other factors may include the availability of computers with fast CPUs and
more memory.
2. Synonyms
4. Ambiguity
7. Domain-specific language
8. Low-resource languages
• The same words and phrases can have different meanings according to the
context of a sentence, and many words, especially in English, have the
exact same pronunciation but totally different meanings.
• For example
• I ran to the store because we ran out of milk.
• Can I run something past you real quick?
• The house is looking really run down.
• Although NLP language models may have learned all of the definitions,
differentiating between them in context can present problems.
• Homonyms (two or more words that are pronounced the same but have
different definitions) can be problematic for question answering and
speech-to-text applications.
• Usage of their and there, for example, is even a common problem for humans.
2. Synonyms
• Some synonyms convey exactly the same meaning, while others differ in
degree or level of complexity (small, little, tiny, minute).
• So, when building NLP systems, it's important to include all of a word's possible
meanings and all possible synonyms.
3. Irony and sarcasm
• Irony and sarcasm present problems for machine learning models because they generally use words
whose literal meaning is the opposite of what is intended.
• Models can be trained with certain cues that frequently accompany ironic or sarcastic phrases.
• Sarcasm is a specific form of verbal irony that involves making a cutting or mocking remark.
• Example: When someone makes a mistake, you might say, "Brilliant move."
4. Ambiguity: Ambiguity in NLP refers to sentences and phrases that potentially have two or
more possible interpretations.
For example:
Train:
For example:
He is a smart guy
• Syntactic ambiguity: Syntactic ambiguity arises from the structure or arrangement
of words and phrases within a sentence.
• Grammatical Errors:
• Spelling Errors:
Colloquialisms and slang are informal expressions and words that are
commonly used in everyday speech, particularly within specific regions or
social groups.
• They often deviate from formal language and may not be suitable for formal writing.
• "Wanna" (short for "want to"), "Gonna" (short for "going to")
• "The party last night was lit!", "Let's just chill at a party."
7. Domain-specific language
• For example
• The more data NLP models are trained on, the better they generally perform.
• All of the problems above will require more research and new techniques in
order to improve on them.
• Advanced practices like artificial neural networks and deep learning allow a
multitude of NLP techniques, algorithms, and models to work progressively,
much like the human mind does.
NLP pipeline
• An NLP pipeline includes the following steps for building an NLP model:
Step 1: Sentence Segmentation
• Sentence segmentation is the first step in building the NLP pipeline. It breaks the
paragraph into separate sentences.
• Example: Consider the following paragraph: "Independence Day is one of the
important festivals for every Indian citizen. It is celebrated on the 15th of August
each year ever since India got independence from the British rule. The day
celebrates independence in the true sense."
• Sentence segmentation produces the following result:
1. "Independence Day is one of the important festivals for every Indian citizen."
2. "It is celebrated on the 15th of August each year ever since India got
independence from the British rule."
3. "The day celebrates independence in the true sense."
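Sentence segmentation can be sketched with a regular expression. This is a minimal illustration, not a production segmenter: the function name `split_sentences` is hypothetical, and the rule (split on whitespace after `.`, `!`, or `?`) is an assumption that would mis-split abbreviations such as "Dr.".

```python
import re

def split_sentences(paragraph):
    """Split a paragraph wherever whitespace follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]

paragraph = (
    "Independence Day is one of the important festivals for every Indian citizen. "
    "It is celebrated on the 15th of August each year ever since India got "
    "independence from the British rule. "
    "The day celebrates independence in the true sense."
)

# Print each sentence with its position in the paragraph.
for number, sentence in enumerate(split_sentences(paragraph), start=1):
    print(number, sentence)
```

Library tokenizers handle abbreviations, quotations, and decimal numbers with trained models or larger rule sets, which this sketch does not attempt.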
Step 2: Word Tokenization
• A word tokenizer breaks a sentence into separate words or tokens.
• For Example:
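As a minimal sketch of word tokenization, the snippet below splits a sentence into word tokens and single punctuation marks. The function name `tokenize_words` and the regular expression are illustrative assumptions; real tokenizers treat contractions, hyphens, and URLs more carefully.

```python
import re

def tokenize_words(sentence):
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one punctuation character as its own token.
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenize_words("It is celebrated on the 15th of August each year.")
print(tokens)
```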
Step 3: Stemming
• Stemming is used to normalize words into their base or root form.
• For example, celebrates, celebrated, and celebrating all originate
from the single root word "celebrate".
• The big problem with stemming is that it sometimes produces a root word that
may not have any meaning.
• For example, intelligence, intelligent, and intelligently all originate
from the single root word "intelligen", which has no meaning.
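The suffix-stripping idea behind stemming can be sketched as follows. The suffix list and the `naive_stem` function are toy assumptions for illustration only; established stemmers such as the Porter stemmer use a much larger, staged rule set and will produce different stems.

```python
# Assumed toy rule list; suffixes are tried in this order.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["celebrates", "celebrated", "celebrating"]:
    print(w, "->", naive_stem(w))
```

With these toy rules all three words reduce to "celebrat", which is not a dictionary word. That is the weakness of stemming described in this step: the stem groups related forms together but need not be meaningful on its own.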
Step 4: Lemmatization
• The main difference between stemming and lemmatization is that lemmatization
produces a root word (lemma) that has a meaning.
• For example: In lemmatization, the words intelligence, intelligent, and
intelligently have the root word "intelligent", which has a meaning.
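A dictionary lookup is one simple way to picture lemmatization. The sketch below assumes a tiny hand-written table (`LEMMA_TABLE`) and a hypothetical `lemmatize` function; real lemmatizers rely on a full vocabulary and usually on the word's part of speech.

```python
# Assumed toy lookup table mapping inflected forms to a meaningful lemma.
LEMMA_TABLE = {
    "celebrates": "celebrate",
    "celebrated": "celebrate",
    "celebrating": "celebrate",
}

def lemmatize(word):
    """Return the lemma if the word is in the table, else the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)

for w in ["celebrates", "celebrated", "celebrating"]:
    print(w, "->", lemmatize(w))
```

Because every output comes from a vocabulary, the result is always a real word, unlike the output of a suffix-stripping stemmer.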
Step 5: Identifying Stop Words
• In English, there are a lot of words that appear very frequently like "is",
"and", "the", and "a". NLP pipelines will flag these words as stop words.
• Stop words might be filtered out before doing any statistical analysis.
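Filtering stop words before analysis might look like the sketch below. The stop word set contains only the four words named in this step, and `remove_stop_words` is a hypothetical helper; practical lists are much longer and depend on the task.

```python
# Only the four example stop words from the text; real lists are longer.
STOP_WORDS = {"is", "and", "the", "a"}

def remove_stop_words(tokens):
    """Keep tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = ["The", "day", "celebrates", "independence", "in", "the", "true", "sense"]
print(remove_stop_words(tokens))
```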
Step 6: Dependency Parsing
• Dependency parsing is used to find how all the words in a sentence
are related to each other (that is, to determine the syntactic relationships
between words).
Step 7: POS tags
• POS stands for parts of speech, which include noun, verb, adverb, and
adjective.
• A word has one or more parts of speech based on the context in which it is
used.
Step 8: Named Entity Recognition (NER)
• Named Entity Recognition (NER) is the process of detecting named entities
such as a person name, movie name, organization name, or location.
• For example: Steve Jobs introduced iPhone at the Macworld Conference in
San Francisco, California.
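The simplest form of entity detection is a lookup against a list of known names (a gazetteer). The sketch below is an illustration under that assumption: the `GAZETTEER` entries, the label names, and the `find_entities` function are all hypothetical, and modern NER systems use trained statistical models rather than fixed lists.

```python
# Assumed toy gazetteer: known entity names and illustrative labels.
GAZETTEER = {
    "Steve Jobs": "PERSON",
    "iPhone": "PRODUCT",
    "Macworld Conference": "EVENT",
    "San Francisco": "LOCATION",
    "California": "LOCATION",
}

def find_entities(text):
    """Return (name, label) pairs for the first occurrence of each known name,
    ordered by where they appear in the text."""
    found = []
    for name, label in GAZETTEER.items():
        start = text.find(name)
        if start != -1:
            found.append((start, name, label))
    found.sort()
    return [(name, label) for _, name, label in found]

sentence = ("Steve Jobs introduced iPhone at the Macworld Conference "
            "in San Francisco, California.")
for name, label in find_entities(sentence):
    print(name, "->", label)
```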
Step 9: Chunking
• Lexical analysis is the process of breaking down a text file into paragraphs,
sentences, phrases, and words.
i. Stop word removal (removing 'and', 'of', 'the', etc. from text)
ii. Tokenization
• Word tokenizer
• Sentence tokenizer
• Tweet tokenizer
iii. Stemming (removing 'ing', 'es', 's' from the tail of the words)
• Syntactic analysis tries to parse the sentence to ensure that the grammar is
correct at the sentence level.
• For example, this sentence does not make sense: "Truck is eating oranges."
i. Dependency Parsing
• It also deals with putting words together to form sentences and extracts the
text’s exact meaning or dictionary definition.
information summary.
• Here, the machine is able to understand that the word “he” in the second sentence is
referring to “John”.
5) Pragmatic Analysis: It is a complex phase where machines should have knowledge not
only about the provided text but also about the real world.
• There can be multiple scenarios where the intent of a sentence can be misunderstood if the
machine doesn’t have real world knowledge.
Example:
(Contains sarcasm)
"Can you share your screen?" (here the context is about a computer's screen)
NLP isn't perfect, however. Although over 7,000 languages are spoken
around the globe, most NLP work covers only a few of them, such as English,
Hindi, Chinese, Urdu, Farsi, Arabic, French, and Spanish.
• Components of NLP
Train:
Syntactic Ambiguity
• Formal Language
e.g., 01110 and 111 are strings from the alphabet B above.
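Checking whether a string is built only from symbols of an alphabet can be sketched as below. The definition of B is not shown in this section, so the code assumes B is the binary alphabet {0, 1}; `is_string_over` is a hypothetical helper name.

```python
# Assumption: B is the binary alphabet; its definition is not shown in this section.
B = {"0", "1"}

def is_string_over(alphabet, s):
    """True if every symbol of s belongs to the alphabet (the empty string qualifies)."""
    return all(symbol in alphabet for symbol in s)

print(is_string_over(B, "01110"))  # True
print(is_string_over(B, "111"))    # True
print(is_string_over(B, "012"))    # False, because '2' is not in B
```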
• Grammar
• Grammar refers to the set of rules that govern the structure and
syntax of a language.
Language and Grammar
• These rules define how words and phrases can be combined to
create grammatically correct sentences.
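To make the idea of grammar rules concrete, the sketch below writes a tiny rule set as a Python dictionary and enumerates every sentence it allows. The rules, the vocabulary, and the `expand` function are invented for illustration and are far smaller than any real grammar; the approach only works because this toy grammar has no recursive rules.

```python
from itertools import product

# Assumed toy grammar: each symbol maps to its possible right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],      # sentence -> noun phrase + verb phrase
    "NP": [["Det", "N"]],     # noun phrase -> determiner + noun
    "VP": [["V", "NP"]],      # verb phrase -> verb + noun phrase
    "Det": [["the"]],
    "N": [["dog"], ["ball"]],
    "V": [["chased"], ["saw"]],
}

def expand(symbol):
    """Return every word sequence derivable from symbol (non-recursive grammars only)."""
    if symbol not in GRAMMAR:  # a terminal word expands to itself
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine all choices.
        for combo in product(*(expand(s) for s in rhs)):
            results.append([word for part in combo for word in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
for s in sentences:
    print(s)
```

Every generated string follows the rules, for example "the dog chased the ball", while a word order such as "dog the chased ball the" can never be produced because no rule allows it.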