Natural language processing notes
Unit I
🧠 Natural Language Processing (NLP)
📌 Definition:
Natural Language Processing (NLP) is a branch of artificial intelligence that enables computers to understand, interpret, and generate human language.
It combines:
Linguistics
Computer Science
Machine Learning
🌍 Scope of NLP
NLP allows machines to interact with human language for a wide variety of tasks, from conversational assistants (e.g., ChatGPT) to analytics:
1. 📱 Customer Support
2. 📈 Business Intelligence
3. 🏥 Healthcare
4. ⚖️ Legal (contract analysis)
5. 🛒 E-Commerce
6. 🎓 Education
⚠️Challenges in NLP:
1. Ambiguity of Language
2. Language Diversity
3. Data Dependency
4. Context Understanding
5. Bias: NLP models can learn and reflect biases present in training data (gender, racial, political, etc.).
🧠 1. Syntax in NLP
Task | Description
Sentence Segmentation | Splits a paragraph into individual sentences.
✅ Example:
"I love NLP. It's amazing!" → ["I love NLP.", "It's amazing!"]
🧠 2. Semantics in NLP
Task | Description
Word Sense Disambiguation (WSD) | Determines which meaning of a word is intended in context (e.g., "bank" as riverbank or financial institution?).
✅ Example:
"He deposited money in the bank." → "bank" = financial institution.
🧠 3. Pragmatics in NLP
Task | Description
Intent / Context Understanding | Interprets what the speaker actually means beyond the literal words.
✅ Example:
Conversation:
User: "It's freezing in here."
Bot: (Understands the context means the user is cold and suggests turning on the heater)
🎯 Summary Table
Level | Focus | Example Tasks
Syntactic | Structure | Sentence segmentation, parsing
Semantic | Meaning | WSD, NER, SRL, Entailment
Pragmatic | Context & intent | Dialogue understanding, intent detection
📘 1. Boolean Model
✅ Definition:
The Boolean model of information retrieval represents documents as sets of terms and queries as Boolean expressions (AND, OR, NOT); a document either matches the query exactly or it does not.
📚 How it Works:
Each document is indexed by the terms it contains; the query is evaluated with set operations (intersection for AND, union for OR, difference for NOT).
🔍 Example:
Query: "AI" AND "Healthcare" NOT "Robotics"
Only documents that contain both "AI" and "Healthcare" but not "Robotics" will be returned.
🧠 Advantages:
Simple, fast, and easy to implement; results are exact and predictable.
⚠️Limitations:
No ranking of results, no partial matching, and no notion of term importance.
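A minimal sketch of Boolean retrieval over a toy corpus (the documents and the inverted-index structure below are purely illustrative):

# Toy corpus and inverted index
docs = {
    1: "AI is transforming healthcare",
    2: "AI and robotics in manufacturing",
    3: "healthcare robotics and AI",
}

index = {}
for doc_id, text in docs.items():
    for term in text.lower().split():
        index.setdefault(term, set()).add(doc_id)   # term -> documents containing it

# Boolean query: "AI" AND "Healthcare" NOT "Robotics"
result = (index.get("ai", set()) & index.get("healthcare", set())) - index.get("robotics", set())
print(result)   # {1} -- only document 1 satisfies the query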
📘 2. Vector Space Model
✅ Definition:
The Vector Space Model (VSM) represents documents and queries as vectors of term weights (typically TF-IDF) and ranks documents by their similarity to the query.
📚 How it Works:
Each term is a dimension; a document's weight for a term reflects how important that term is in the document relative to the whole collection. Relevance is measured with cosine similarity between the query vector and each document vector.
📈 Formula:
w(t, d) = tf(t, d) × log(N / df(t))
sim(q, d) = (q · d) / (|q| × |d|)
Where:
tf(t, d) = frequency of term t in document d
df(t) = number of documents containing t
N = total number of documents
q · d = dot product of the query and document vectors
🔍 Example:
If a query is "Machine Learning" and Document A has those terms with high weights, it will rank higher than Document B which only mentions "Learning".
🧠 Advantages:
Supports ranking of documents.
Allows partial matching and term weighting.
⚠️Limitations:
Assumes terms are independent and ignores word order and deeper meaning.
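A small sketch of vector-space ranking, assuming scikit-learn is available; the two documents and the query are made up to mirror the example above:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "machine learning improves search ranking",   # Document A
    "students enjoy learning new languages",      # Document B
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)             # TF-IDF weights per document
query_vector = vectorizer.transform(["machine learning"])

print(cosine_similarity(query_vector, doc_vectors))      # Document A scores higher than B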
📘 3. Probabilistic Model
✅ Definition:
The probabilistic model ranks documents by the estimated probability that they are relevant to the query.
📚 How it Works:
Relevance is treated as a probability; term statistics (and, when available, relevance feedback) are used to estimate how likely each document is to be relevant, and documents are ranked by that estimate.
🔍 Example Models:
Binary Independence Model (BIM), BM25 (Okapi).
🧠 Advantages:
Theoretically grounded ranking; BM25 remains a strong practical baseline.
⚠️Limitations:
Relies on independence assumptions and on estimating probabilities from limited data.
📊 Comparison Table
Feature | Boolean Model | Vector Space Model | Probabilistic Model
Term Weighting | None (0 or 1) | Yes (TF-IDF) | Yes (based on relevance)
Handles Partial Match | No | Yes | Yes
✅ Summary
The Boolean model gives exact but unranked matches, the vector space model adds term weighting and ranked results, and the probabilistic model ranks documents by estimated relevance.
📜 1. Rule-Based Model
✅ Overview:
Uses hand-written linguistic rules (pattern-action rules, lexicons, grammars) rather than learning from data.
🧩 Example Use:
POS tagging using handcrafted rules (e.g., "if a word ends with 'ly', tag it as an adverb").
🧠 Pros:
Interpretable, deterministic.
⚠️Cons:
Time-consuming to develop.
Brittle: rules do not generalize well to unseen language.
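A sketch of a handcrafted-rule tagger using NLTK's RegexpTagger; the patterns below are illustrative, not a complete rule set:

from nltk.tag import RegexpTagger  # assumes nltk is installed

patterns = [
    (r".*ly$", "RB"),         # rule from the notes: words ending in -ly -> adverb
    (r".*ing$", "VBG"),       # gerunds / present participles
    (r".*ed$", "VBD"),        # past-tense verbs
    (r"^(the|a|an)$", "DT"),  # determiners
    (r".*", "NN"),            # default: tag everything else as a noun
]

tagger = RegexpTagger(patterns)
print(tagger.tag(["the", "dog", "barked", "loudly"]))
# [('the', 'DT'), ('dog', 'NN'), ('barked', 'VBD'), ('loudly', 'RB')]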
📈 2. Statistical Model
✅ Overview:
Learns probabilities from annotated corpora and chooses the most likely analysis (e.g., HMM taggers, n-gram language models).
🧩 Example Use:
POS tagging with a Hidden Markov Model trained on a tagged corpus.
🧠 Pros:
Learns from data; handles ambiguity better than fixed rules.
⚠️Cons:
Requires large annotated corpora; accuracy depends on training data quality.
🧩 Example Use:
🧠 Pros:
⚠️Cons:
✅ Overview:
🧩 Example Use:
🧠 Pros:
⚠️Cons:
📊 Probabilistic Graphical Model (PGM)
✅ Overview:
Represents variables and their dependencies as a graph (e.g., HMMs, CRFs, Bayesian networks) and performs inference over it.
🧩 Example Use:
Sequence labeling tasks such as POS tagging and NER with CRFs.
🧠 Pros:
Principled handling of uncertainty and structured outputs.
⚠️Cons:
Computationally intensive.
Feature | Rule-Based NLP Model | Statistical NLP Model | Rule-Based MT | IR Model | Probabilistic Graphical Model
Learning from data? | ❌ No | ✅ Yes | ❌ No | ✅/❌ (depends) | ✅ Yes
Interpretability | High | Moderate | High | High | Moderate
✅ Final Notes:
Modern NLP systems (like BERT, GPT) use deep learning, which
has largely replaced these classical models, though PGMs and
statistical models are still used in low-resource and explainable NLP
systems.
🔊 1. Phonetics
✅ Definition:
Phonetics is the study of the physical properties of speech sounds: how they are produced, transmitted, and perceived.
Type of Phonetics | Description
Articulatory Phonetics | Studies how speech sounds are produced by the vocal organs.
Acoustic Phonetics | Studies the physical (sound wave) properties of speech.
Auditory Phonetics | Focuses on how listeners perceive speech sounds.
🧩 Applications in NLP:
Speech Recognition Systems: Understanding sound variations (accents, pronunciation).
Text-to-Speech (TTS): Generating natural-sounding pronunciations.
Example:
Recognizing "tomato" correctly whether it is pronounced /təˈmeɪtoʊ/ or /təˈmɑːtoʊ/.
🧠 2. Phonology
✅ Definition:
Phonology is the study of how sounds are organized and function within a particular language, i.e., its abstract sound system and rules.
🔍 Key Concepts:
Concept | Explanation
Phoneme | The smallest sound unit that can change meaning (e.g., /p/ vs. /b/ in pat vs bat).
Minimal Pairs | Pairs of words that differ by only one phoneme (e.g., bit vs. pit).
🧩 Applications in NLP:
Text-to-speech and pronunciation modeling apply phonological rules.
Example: In English, the plural ending "-s" is pronounced differently based on the final sound of the noun:
cats → /s/
dogs → /z/
buses → /ɪz/
Aspect | Phonetics | Phonology
Tools | Spectrograms, waveforms, articulatory diagrams | Phonemic charts, minimal pairs
🎯 Summary
Both are crucial for speech-based NLP tasks and understanding the
linguistic backbone of natural language.
🧱 1. Morphology
✅ Definition:
Morphology is the study of the internal structure of words: how words are built from morphemes (roots, prefixes, suffixes).
Types:
Inflectional morphology (e.g., walk → walked) and derivational morphology (e.g., happy → happiness).
🧩 Applications in NLP:
Spell checking
Machine translation
2. Syntax
✅ Definition:
Syntax is the study of how words combine to form grammatical phrases and sentences.
🔹 Concepts:
Phrase structure rules
Parse trees
🧩 Applications in NLP:
Grammar checking
Sentence parsing
Text generation
💬 3. Semantics
✅ Definition:
Semantics is the study of meaning in language: the meaning of words, phrases, and sentences.
🔹 Examples:
"Big" and "large" express the same meaning; "The cat chased the mouse" and "The mouse was chased by the cat" describe the same event.
🧩 Applications in NLP:
Question answering
Chatbots
Semantic search
🧠 4. Pragmatics
✅ Definition:
Pragmatics is the study of how context, speaker intent, and world knowledge affect the interpretation of meaning.
🔹 Examples:
"Can you pass the salt?" → A request, not a question about ability.
🧩 Applications in NLP:
Conversational AI
Sentiment analysis
Emotion detection
🧭 5. Semiotics
✅ Definition:
Semiotics is the study of signs and symbols and how meaning is created and communicated through them.
🔹 Aspects:
Signs, symbols, icons, and their interpretation.
🧩 Applications:
Symbolic reasoning
Human-computer interaction
📚 6. Discourse Analysis
✅ Definition:
Discourse analysis studies language beyond the single sentence: how sentences connect into coherent texts and conversations.
🔹 Topics:
Cohesion and coherence
Turn-taking in conversation
🧩 Applications in NLP:
Dialogue systems
Summarization
Narrative analysis
🧬 7. Psycholinguistics
✅ Definition:
Psycholinguistics studies how humans acquire, comprehend, and produce language.
🔹 Areas:
Language acquisition
Word recognition
Sentence processing
🧩 Applications in NLP:
Cognitive modeling
Assistive technologies
8. Corpus Linguistics
✅ Definition:
Corpus linguistics studies language through large collections of real-world text (corpora).
🔹 Components:
Frequency analysis
Concordance analysis
🧩 Applications in NLP:
POS tagging
Collocation extraction
Sentiment analysis
🧠 Summary Table:
Field | Focus
Morphology | Word structure
Syntax | Sentence structure
Semantics | Meaning
Pragmatics | Meaning in context
Semiotics | Signs and symbols
Discourse Analysis | Text and conversation structure
Psycholinguistics | Human language processing
Corpus Linguistics | Language in real-world data
🧱 1. Word Formation Processes
These are the different ways in which new words are formed in a language. Understanding these processes helps in morphological analysis, tokenization, and language generation.
🔹 Major Types:
Process | Description | Examples
Clipping | Shortening a word | telephone → phone, influenza → flu
Blending | Combining parts of two words | brunch (breakfast + lunch), smog (smoke + fog)
Acronyms & Initialisms | Forming words from initial letters | NASA, FBI
🧩 2. Morphological Analysis
✅ Definition:
Morphological analysis breaks a word into its morphemes (root, prefixes, suffixes) and identifies their grammatical function.
🔍 Example:
Word: "unhappiness"
o Prefix: un- (negation)
o Root: happy
o Suffix: -ness (noun-forming)
🧠 Uses in NLP:
Stemming
Machine Translation
Spell Checkers
🔧 Tools:
Rule-based analyzers
Morphological dictionaries
⚙️Finite-State Morphological Parsing (FSA/FST)
✅ Definition:
A finite state machine processes a word symbol by symbol; a finite state transducer (FST) additionally maps the surface form onto its morphological analysis.
Input: "running"
Output: run + V + Present-Participle
🔹 Structure:
States (nodes)
Transitions (labeled with input/output symbols)
Accept states
📜 How it works:
The machine starts in the initial state, follows transitions that match the input characters, and accepts the word only if it ends in an accept state, emitting the analysis along the way.
🔍 Example Transition:
State 0:
...
💡 Applications:
Morphological parsing, spell checking, tokenization.
🧪 Example Output:
Input: "walked"
Output: walk + V + Past
Implementation in NLP
Tool | Description
Hunspell | Used in spell checkers (LibreOffice, Firefox)
🎯 Summary
Concept | Description
Word Formation Processes | How new words are created in language
Morphological Analysis | Identifying root words and affixes
🔠 1. Tokenization
✅ Definition:
Tokenization splits text into smaller units (tokens), such as sentences or words.
🔹 Types:
o Sentence tokenization. Example: "I love NLP. It's amazing!" → ["I love NLP.", "It's amazing!"]
o Word tokenization. Example: "I love NLP" → ["I", "love", "NLP"]
🏷️2. POS Tagging
✅ Definition:
POS tagging is the process of labeling each word with its grammatical category (e.g., noun, verb, adjective).
🔹 Example:
"The dog barks" → The/DT dog/NN barks/VBZ
Tag | Meaning
NN | Noun
VB | Verb
JJ | Adjective
RB | Adverb
PRP | Pronoun
Libraries:
NLTK
spaCy
Stanford NLP
TextBlob
🧠 Use Cases:
Grammar checking
Machine Translation
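A sketch of POS tagging with NLTK's built-in tagger (assumes the required NLTK data can be downloaded; exact tags may vary slightly):

import nltk
from nltk import word_tokenize, pos_tag

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

print(pos_tag(word_tokenize("The quick dog barks loudly")))
# roughly: [('The', 'DT'), ('quick', 'JJ'), ('dog', 'NN'), ('barks', 'VBZ'), ('loudly', 'RB')]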
🧬 3. Lemmatization
✅ Definition:
Lemmatization reduces a word to its dictionary base form (lemma) using vocabulary and morphological analysis.
🔹 Example:
"better" → "good", "running" → "run"
💡 Features:
Context- and POS-aware; always produces valid dictionary words.
Tools:
WordNetLemmatizer (NLTK)
spaCy's Token.lemma_
✂️4. Stemming
✅ Definition:
Stemming strips affixes using heuristic rules to reduce a word to its stem, which may not be a valid word.
🔹 Example:
"studies" → "studi", "running" → "run"
💡 Characteristics:
Fast and rule-based, but crude; ignores context and can produce non-words.
🔧 Common Algorithms:
Porter Stemmer
Lancaster Stemmer
Snowball Stemmer
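A short sketch contrasting a stemmer and a lemmatizer in NLTK (assumes the WordNet data is available):

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("studies"))                    # 'studi'  (not a real word)
print(lemmatizer.lemmatize("studies", pos="v"))   # 'study'
print(lemmatizer.lemmatize("better", pos="a"))    # 'good'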
🔍 Comparison Table:
Feature | Tokenization | POS Tagging | Lemmatization | Stemming
Context aware? | ❌ | ✅ | ✅ | ❌
Real words? | ✅ | ✅ | ✅ | ❌ (sometimes)
🏷️1. Named Entity Recognition (NER)
✅ Definition:
NER is the task of identifying and classifying named entities in text into
predefined categories such as:
Person names
Organizations
Locations
Dates, Time
Monetary values
Percentages
🔍 Example:
"Barack Obama was born in Hawaii and served as President of the United
States."
NER Tags:
Barack Obama → PERSON
Hawaii → LOCATION
President → TITLE
United States → LOCATION
Entity Type | Example
LOCATION | India, Himalayas
ORGANIZATION | Google, UN
DATE | 15 August 1947
TIME | 10:30 AM
Libraries:
spaCy (ent.label_)
Stanford NLP
Flair
🧠 Applications:
Information extraction
Question answering
Resume parsing
Chatbots
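A sketch of NER with spaCy, assuming the small English model en_core_web_sm has been installed; note that spaCy labels places as GPE rather than a generic LOCATION:

import spacy  # assumes: pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
doc = nlp("Barack Obama was born in Hawaii and served as President of the United States.")

for ent in doc.ents:
    print(ent.text, "->", ent.label_)
# e.g. Barack Obama -> PERSON, Hawaii -> GPE, the United States -> GPE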
🔍 2. Word Sense Disambiguation (WSD)
✅ Definition:
WSD is the task of determining which sense (meaning) of a word is being used in a given context.
🔍 Example:
Word: "bank"
"He sat by the bank of the river" → riverbank; "She deposited cash at the bank" → financial institution.
🔧 Approaches:
Approach | Description
Supervised | Classifiers trained on sense-labeled data
Dictionary/Knowledge-based | Compares the context with dictionary definitions (e.g., Lesk)
Unsupervised | Clusters word occurrences into senses
Tools:
NLTK + WordNet
🧠 Applications:
Machine translation
Search engines
Semantic analysis
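A sketch of looking up the WordNet senses of "bank" with NLTK (assumes the WordNet data is available):

import nltk
from nltk.corpus import wordnet as wn  # WordNet interface bundled with NLTK

nltk.download("wordnet", quiet=True)

for synset in wn.synsets("bank")[:3]:          # first few senses of "bank"
    print(synset.name(), "-", synset.definition())
# e.g. bank.n.01 - sloping land (especially the slope beside a body of water) ...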
🔡 3. Word Embedding
✅ Definition:
Word embeddings represent words as dense numeric vectors such that words with similar meanings have similar vectors.
🔍 Why?
Machines cannot operate on raw text, and one-hot vectors are sparse and capture no similarity between words.
🧠 Key Ideas:
Distributional hypothesis: words that appear in similar contexts tend to have similar meanings.
Model | Description
Word2Vec | Learns embeddings by predicting context words (Skip-gram) or the center word (CBOW)
GloVe | Learns embeddings from global word co-occurrence statistics
FastText | Extends Word2Vec with subword (character n-gram) information
🌐 Example:
from gensim.models import Word2Vec  # assumes gensim is installed

sentences = [["the", "dog", "barks"], ["the", "cat", "meows"]]   # sample sentences
model = Word2Vec(sentences, vector_size=50, min_count=1)         # train a small Word2Vec model
vector = model.wv["dog"]                                         # dense vector for "dog"
🧠 Applications:
Sentiment analysis
Recommendation systems
Chatbots
Text classification
Similarity detection
🔁 Summary Table:
Concept | Description
NER | Identifies and classifies named entities (persons, places, organizations, dates)
WSD | Selects the correct sense of an ambiguous word from context
Word Embedding | Represents words as dense vectors that capture similarity
1. Rule-Based Tagging
📌 Definition:
Assigns POS tags using hand-written rules (e.g., "a word following a determiner is a noun", "words ending in -ly are adverbs").
🔧 How it works:
A lexicon proposes candidate tags and contextual rules select or correct them.
🧪 Example:
"quickly" ends in -ly → tagged as an adverb (RB).
✅ Pros:
Simple, interpretable
❌ Cons:
Labor-intensive to write and maintain; does not generalize to unseen patterns.
2. Stochastic (Probabilistic) Tagging
📌 Definition:
Assigns tags based on probabilities learned from a tagged corpus (e.g., unigram frequencies, HMMs).
🔧 Methods:
Unigram Tagging: Assigns the most frequent tag for each word.
N-gram/HMM Tagging: Uses previous tags as context to pick the most likely tag sequence.
🧪 Example:
Word: "play"
If "play" is most often tagged as a verb in the training corpus, a unigram tagger will always tag it VB.
✅ Pros:
Learns from data; handles common ambiguity well.
❌ Cons:
Needs annotated corpora; struggles with unseen words.
3. Transformation-Based (Brill) Tagging
📌 Definition:
Combines rule-based and statistical ideas: an initial tagging is iteratively corrected by transformation rules learned from a corpus.
🔧 Key Steps:
1. Initial tagging (e.g., assign each word its most frequent tag)
2. Learn transformation rules that fix the most common errors
3. Apply the learned rules in order
✅ Pros:
High accuracy
Learned rules are human-readable
❌ Cons:
Slower to train
4. Lexical-Based Tagging
📌 Definition:
Uses a lexicon (dictionary) where each word is associated with its possible POS tags based on usage frequency or pre-defined mapping.
🧠 Characteristics:
Ignores surrounding context; relies purely on per-word tag frequencies.
🧪 Example:
"book" is most often a noun in the lexicon, so it is always tagged NN, even in "Please book a ticket."
✅ Pros:
Fast and simple.
❌ Cons:
Cannot resolve context-dependent ambiguity.
🧾 Summary Table:
Approach | Technique | Context-aware? | Learning? | Accuracy
Lexical | Word-tag dictionary | ❌ | ❌ | Low-Med
🔧 Libraries:
NLTK (Python)
spaCy
Stanford CoreNLP
OpenNLP
AllenNLP
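A sketch of a statistical (unigram) tagger trained with NLTK on its Treebank sample; the corpus slice and test sentence are arbitrary:

import nltk
from nltk.corpus import treebank
from nltk.tag import UnigramTagger

nltk.download("treebank", quiet=True)

train_sents = treebank.tagged_sents()[:3000]   # tagged training sentences
tagger = UnigramTagger(train_sents)            # learns the most frequent tag per word
print(tagger.tag("I like to play".split()))    # words unseen in training get tag None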
📊 1. Hidden Markov Model (HMM)
✅ Definition:
An HMM is a statistical model in which a sequence of hidden states (e.g., POS tags) generates a sequence of observations (e.g., words).
🧩 Applications:
POS tagging
Speech Recognition
🧠 Key Concepts:
States, observations, transition probabilities, emission probabilities, initial state probabilities.
🔁 HMM Steps:
1. Define the states and observation symbols.
2. Estimate transition and emission probabilities from data.
3. Decode the most likely hidden state sequence with the Viterbi algorithm.
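A toy Viterbi decoder for a two-state HMM tagger; every probability below is invented purely for illustration:

states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"dogs": 0.5, "bark": 0.1}, "VERB": {"dogs": 0.1, "bark": 0.6}}

def viterbi(words):
    # V[t][s] = probability of the best tag sequence ending in state s at position t
    V = [{s: start_p[s] * emit_p[s].get(words[0], 1e-6) for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max((V[t-1][p] * trans_p[p][s] * emit_p[s].get(words[t], 1e-6), p)
                             for p in states)
            V[t][s], back[t][s] = prob, prev
    # Trace back the best path from the most probable final state
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

print(viterbi(["dogs", "bark"]))   # ['NOUN', 'VERB']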
📊 2. Maximum Entropy (MaxEnt) Model
✅ Definition:
A feature-based discriminative classifier that models the probability of a label given its context, choosing the distribution with maximum entropy consistent with the observed feature constraints.
🧠 Key Idea:
Arbitrary, overlapping context features (capitalization, suffixes, neighboring words) can be combined, which makes it useful for:
POS tagging
NER
Text classification
✅ Pros:
Flexible feature design; no independence assumptions among features.
📊 3. N-grams
✅ Definition:
An n-gram is a contiguous sequence of n tokens; n-gram language models estimate the probability of a word given the previous n−1 words.
Example (sentence: "I love NLP"):
N-gram | Tokens
Unigram (n=1) | I, love, NLP
Bigram (n=2) | I love, love NLP
Trigram (n=3) | I love NLP
📌 Applications:
Language modeling
Text generation
Spelling correction
Auto-suggestions
⚠️Limitation:
N-grams cannot capture long-range dependencies, and data sparsity grows quickly as n increases.
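A minimal sketch of bigram counting and a conditional probability estimate; the toy sentence is made up:

from collections import Counter

tokens = "i love nlp and i love machine learning".split()
bigrams = list(zip(tokens, tokens[1:]))            # adjacent token pairs

bigram_counts = Counter(bigrams)
unigram_counts = Counter(tokens)

# P(nlp | love) = count("love nlp") / count("love")
print(bigram_counts[("love", "nlp")] / unigram_counts["love"])   # 0.5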
🏷️NER in Real-World Applications
✅ Definition:
Using named entity extraction in real domains to pull structured information (people, organizations, products, dates, amounts) out of raw text.
🔍 Examples:
Brands in social media posts, patient and drug names in medical records, entities as answers in question answering.
📊 Detected Using:
Rule-based patterns, statistical sequence models (HMM, MaxEnt, CRF), or neural taggers.
📦 Applications:
Machine translation
Application Area | Description
Question Answering | Identify named entities as answers
Social Media Monitoring | Detect brands, products, or sentiment drivers
Legal & Medical Docs | Extract patient names, drug names, legal terms
🔧 Example:
Input:
"Apple launched the new iPhone in California on September 14."
NER Tags:
iPhone → PRODUCT
California → LOCATION
September 14 → DATE
🧾 Summary Table:
Concept | Description | Applications
MaxEnt | Feature-based discriminative classifier | POS, NER, Sentiment
N-Grams | Sequence of N tokens | Language modeling
NER Applications | Use of entity extraction in real domains | Resume, QA, NLP
✅ Definition:
A Context-Free Grammar (CFG) is a formal grammar in which every production rule rewrites a single non-terminal symbol into a sequence of terminals and/or non-terminals.
CFGs are widely used in NLP to describe the syntax of natural languages in a simplified, yet powerful way.
✨ Components of CFG:
Non-terminals (e.g., S, NP, VP), terminals (words), production rules, and a start symbol (S).
🔧 Example of a CFG:
S → NP VP
NP → Det N
VP → V NP
Det → "the" | "a"
N → "cat" | "dog"
V → "chased" | "saw"
🧪 Example Sentence:
"The cat chased the dog."
Parsed as:
S
o NP (The cat)
o VP (chased the dog) → V (chased) + NP (the dog)
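A sketch of parsing the example sentence with NLTK's chart parser, using the CFG above (including the Det rule):

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'cat' | 'dog'
V -> 'chased' | 'saw'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cat chased the dog".split()):
    print(tree)   # (S (NP (Det the) (N cat)) (VP (V chased) (NP (Det the) (N dog))))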
These are specific rules to describe how words combine to form phrases and sentences in English.
Rule | Description
NP → Det N | A noun phrase is a determiner followed by a noun
NP → Det N PP | A noun phrase may also contain a prepositional phrase
PP → P NP | A prepositional phrase is a preposition followed by a noun phrase
VP → V Adv | A verb phrase can be a verb followed by an adverb
📍 Example:
"the cat on the mat" → NP → Det N PP
🧠 4. Syntactic Parsing
✅ Definition:
Syntactic parsing is the process of analyzing a sentence to determine its grammatical structure according to a grammar.
It answers:
How are the words grouped into phrases, and what role does each word play?
📍 Types of Parsing:
Type | Description
Constituency Parsing | Builds a phrase-structure (parse) tree
Dependency Parsing | Links each word to its head, producing head-dependent relations
Example (constituency tree for "The cat saw a dog"):
S
├── NP → Det (The) + N (cat)
└── VP → V (saw) + NP → Det (a) + N (dog)
Dependency view:
saw → root
cat → subject of saw
dog → object of saw
🧠 Parsing Algorithms:
Algorithm | Description
Top-Down | Expands rules starting from the start symbol S
Bottom-Up | Builds structure upward from the words
CYK / Earley | Chart-based dynamic programming parsers (see below)
✅ Summary:
Concept | Description
Grammar Rules (English) | Rule sets describing English syntax
1. Grammar Formalisms
✅ Definition:
A grammar formalism is a formal system for describing the syntactic structure of a language (e.g., context-free grammar, dependency grammar).
🌳 2. Treebanks
✅ Definition:
A treebank is a corpus in which every sentence is annotated with its syntactic structure (a parse tree or dependency graph).
Type | Description | Example
Hybrid Treebank | Combines features of both constituency and dependency annotation | TIGER Treebank (German)
📌 Notable Treebanks:
Treebank | Language | Formalism | Notes
Universal Dependencies (UD) | Multilingual | Dependency grammar | Cross-linguistic annotations
Prague Dependency Treebank | Czech | Dependency grammar | Rich morphological tagging
Example (Penn-style bracketed tree for "The cat sleeps."):
(S
(NP (DT The) (NN cat))
(VP (VBZ sleeps))
(. .))
🔗 Dependency Version:
sleeps → root
cat → nsubj of sleeps
The → det of cat
🧩 Uses of Treebanks:
Training and evaluating parsers
Linguistic research
📈 Summary Table
Approach | Focus | Example
Constituency Parsing | Phrase structure analysis | NP → Det N
Dependency Parsing | Head-dependent relations | sleeps ← subj ← cat
Example:
S → NP VP
NP → Det N
VP → V NP
🔧 Parsing Algorithms & Challenges:
Parser | How it works | Notes
Top-Down Parsing | Starts from the start symbol (S) and tries to rewrite it using the rules | Not efficient; may lead to infinite loops
Bottom-Up Parsing | Starts from the input words and works upward | More efficient than top-down
Earley Parser | Top-down and bottom-up hybrid; works with all CFGs | O(n³) worst case, O(n²) average
CYK Parser | Builds a parse table where each cell [i, j] stores the non-terminals that can derive the substring from position i to j | Requires the grammar in Chomsky Normal Form
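A compact sketch of CYK recognition for a toy grammar in Chomsky Normal Form; the rules and sentence are illustrative only:

from itertools import product

binary_rules = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
lexical_rules = {"the": "Det", "a": "Det", "cat": "N", "dog": "N", "saw": "V"}

def cyk(words):
    n = len(words)
    # table[i][j] holds the non-terminals that can derive words[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][0].add(lexical_rules[w])
    for span in range(2, n + 1):                  # length of the substring
        for i in range(n - span + 1):             # start position
            for split in range(1, span):          # split point
                for b, c in product(table[i][split - 1], table[i + split][span - split - 1]):
                    if (b, c) in binary_rules:
                        table[i][span - 1].add(binary_rules[(b, c)])
    return "S" in table[0][n - 1]

print(cyk("the cat saw a dog".split()))   # True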
🔹 2. Statistical Parsing
✅ What is it?
Statistical parsing uses probability to choose the most likely parse tree when multiple parses exist.
🔍 How it Works:
Rule probabilities are learned from a treebank; each candidate parse is scored, and the highest-probability tree is returned.
✅ Definition:
A Probabilistic Context-Free Grammar (PCFG) is a CFG in which each production rule carries a probability; the probabilities of all rules with the same left-hand side sum to 1, and the probability of a parse tree is the product of the probabilities of the rules it uses.
✨ Format:
A → B C [p]
Example:
S → NP VP [1.0]
NP → Det N [0.6]
NP → N [0.4]
VP → V NP [0.8]
VP → V [0.2]
Given an ambiguous sentence, two parse trees may be possible. PCFG assigns probabilities to each and selects the more likely one based on training data.
📊 Advantages of PCFG:
Resolves structural ambiguity, ranks alternative parses, and can be estimated directly from a treebank.
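A sketch of picking the most probable parse with NLTK's ViterbiParser; the lexical rule probabilities (Det, N, V) are invented to complete the PCFG from the notes:

import nltk

pcfg = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
NP -> Det N [0.6] | N [0.4]
VP -> V NP [0.8] | V [0.2]
Det -> 'the' [1.0]
N -> 'cat' [0.5] | 'dog' [0.5]
V -> 'chased' [1.0]
""")

parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse("the cat chased the dog".split()):
    print(tree.prob(), tree)   # tree probability = product of the rule probabilities used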
✅ Summary Table
Concept | Description | Key Algorithm
CFG | Defines grammatical structure | CYK, Earley
PCFG | CFG + rule probabilities | Viterbi algorithm
Unit V: Semantic Analysis
1. Requirements for Knowledge Representation in AI
Requirement | Explanation
Representational Adequacy | Should be able to represent all the knowledge needed in the domain
Inferential Adequacy | Should support deriving new knowledge from what is represented
Inferential Efficiency | Should allow for efficient reasoning (fast and scalable)
Acquisitional Efficiency | Should make it easy to add new knowledge
✅ What is FOL?
First-Order Logic (FOL) is a knowledge representation language that extends propositional logic with predicates, variables, quantifiers, and functions, so that statements can be made about objects and their relationships.
✨ Components of FOL:
Component | Example | Meaning
Predicate | Loves(John, IceCream) | Expresses a relationship
Variable | x, y | Placeholder for objects
Quantifiers | ∀x, ∃x | "for all", "there exists"
Examples:
∀x Loves(x, IceCream) → "Everyone loves ice cream."
∃x Teacher(x) → "There is at least one teacher."
🧩 Description Logic (DL)
Description Logic is a decidable fragment of FOL designed for representing ontologies (it underlies OWL).
✨ DL Focuses On:
Concepts (Classes)
Roles (Properties/Relationships)
Individuals (Instances)
Feature | Description
Concepts (C, D) | Classes of objects (e.g., Student, Person)
Roles (r) | Binary relationships between individuals (e.g., hasParent)
Individuals (a, b) | Objects in the domain (e.g., Alice, Car1)
🧠 Example in DL Notation:
Student ⊑ Person
→ All students are persons.
✅ Applications of DL:
Ontologies and the Semantic Web (OWL)
Medical informatics
📌 Summary Table:
Formalism | Strength | Typical Use
FOL | Very expressive | General knowledge representation and inference
DL | Decidable and structured | Ontologies, Semantic Web, medical vocabularies
✅ What is it?
Syntax-driven semantic analysis builds the meaning of a sentence compositionally, guided by its syntactic parse: each grammar rule is paired with a semantic rule.
🧠 Key Principles:
1. Compositional Semantics:
Meaning is derived by combining meanings of subparts.
2. Parse Trees & Grammar Rules:
Semantic rules are attached to grammar productions.
📌 Example:
Grammar rule:
S → NP VP
Semantic rule: S.meaning = VP.meaning(NP.meaning)
Let's say:
NP.meaning = "John"
VP.meaning = λx.sleep(x)
Then:
S.meaning = sleep(John)
🔹 2. Semantic Attachments
A semantic attachment specifies how to compute the meaning of a constituent from the meanings of its children.
For example:
Production:
VP → V NP
Semantic Attachment:
VP.meaning = V.meaning(NP.meaning)
If:
V.meaning = λx.eat(x)
NP.meaning = "apples"
Then:
VP.meaning = eat(apples)
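A tiny sketch of the same composition in Python, representing the λ-expression as an ordinary function (the string encoding of eat(apples) is just an illustration):

# λx.eat(x) written as a Python function; applying it mirrors VP.meaning = V.meaning(NP.meaning)
V_meaning = lambda x: f"eat({x})"
NP_meaning = "apples"

VP_meaning = V_meaning(NP_meaning)   # function application = semantic composition
print(VP_meaning)                    # eat(apples)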
💡 Tools Used:
Lambda calculus, first-order logic, feature structures.
🧠 Application in NLP:
Component | Role
Intermediate Representation | e.g., First-order logic, semantic graphs
✅ Summary
Concept | Description
Syntax-Driven Semantic Analysis | Constructs meaning using syntactic structure
1. Word Senses
A word sense is one distinct meaning of a word; many words are polysemous (have several senses).
📌 Example:
"bank" → financial institution or the side of a river.
Understanding word senses matters for:
Machine translation
Information retrieval
Question answering
Text summarization
🔹 Lexical Relations
Relation | Meaning | Example
Synonymy | Same meaning | big ↔ large
Antonymy | Opposite meaning | hot ↔ cold
Meronymy | Part-whole | wheel is a meronym of car
Holonymy | Whole-part | car is a holonym of wheel
🔹 4. Selectional Restrictions
Selectional restrictions are semantic constraints a word places on its arguments.
📌 Examples:
The verb "eat" expects an edible object: "eat an apple" is fine, but "eat an idea" violates the restriction.
🔹 Word Sense Disambiguation (WSD)
WSD is the task of selecting the correct sense of an ambiguous word from its context.
🔍 Example:
In the sentence "He sat on the bank of the river",
WSD helps the system understand that "bank" refers to riverbank, not a financial institution.
🔹 Approaches to WSD
1. Supervised WSD
Uses labeled training data where words are tagged with the correct sense.
🧠 How it works:
A classifier is trained on context features of each tagged occurrence, using algorithms such as:
o Decision Trees
o Naive Bayes
o SVM
o Neural Networks
✅ Example:
Train a classifier on sentences where "bass" is labeled as fish or as music, then predict the sense for new sentences.
✔️Pros:
High accuracy when enough sense-tagged data is available.
❌ Cons:
Sense-annotated corpora are expensive to create.
2. Dictionary / Knowledge-Based WSD (e.g., Lesk Algorithm)
Chooses the sense whose dictionary definition overlaps most with the words in the surrounding context.
✅ Example:
For "bat": compare the context with the glosses for "flying mammal" and "sports equipment" and pick the sense with more overlapping words.
✔️Pros:
No training data needed; easy to implement.
❌ Cons:
Depends on dictionary quality; usually less accurate than supervised methods.
3. Unsupervised WSD
Groups occurrences of a word into sense clusters without labeled data, using lexical knowledge such as:
Synonyms
Hypernyms
Semantic similarity
🧠 Example methods:
Sense clustering
✔️Pros:
Needs no sense-annotated data.
❌ Cons:
Discovered clusters may not correspond to dictionary senses.
🎯 Applications of WSD
Machine translation, information retrieval, question answering, and semantic search.
✅ Summary Table
Approach | Technique | Requires Training? | Strength
Dictionary-Based | Lesk Algorithm | ❌ No | Easy to implement
Supervised | ML classifiers (Naive Bayes, SVM, neural networks) | ✅ Yes | High accuracy
Unsupervised | Sense clustering | ❌ No | No labeled data needed
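A sketch of dictionary-based WSD using NLTK's built-in simplified Lesk implementation (assumes the WordNet and punkt data can be downloaded; Lesk is heuristic, so the chosen sense can vary):

import nltk
from nltk import word_tokenize
from nltk.wsd import lesk  # NLTK's simplified Lesk algorithm

nltk.download("wordnet", quiet=True)
nltk.download("punkt", quiet=True)

context = word_tokenize("He sat on the bank of the river")
sense = lesk(context, "bank", "n")        # restrict to noun senses
print(sense, "-", sense.definition())     # prints the chosen synset and its gloss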