
CSN-528

Lecture-Feb 20&25, 2025


Text Classification

A task of assigning a label or category to an entire text or document.


- Sentiment Analysis (assigning a positive/negative tag to a piece of text: a phrase, sentence, or document) is a text classification task.
- Applications range from movie, restaurant, and tourism domains to politics.

- Spam Detection (assigning a legitimate or illegitimate tag to an email)


Cues: “online pharmaceutical” or “WITHOUT ANY COST” or “Dear Winner”

- Language Identification in code-mixed text.


- Assigning a topic label (sports/entertainment/education) to a text.
Text Classification

- Even language modeling can be viewed as classification:

- Each word can be thought of as a class, so predicting the next word amounts to classifying the context-so-far into the class corresponding to each possible next word.

- Similarly, POS tagging can be viewed as classification.


Text Classification

Text classification pipeline:

Take a single observation -> Extract (find useful) features -> Present the data in a machine-readable format -> Classify the observation into one of the discrete classes using a classification algorithm.
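A minimal sketch of the feature-extraction step in Python, assuming a bag-of-words representation over whitespace tokens (the function name and tokenization are illustrative assumptions, not from the lecture):

    from collections import Counter

    def extract_features(text):
        # Bag-of-words: lowercase the text, split on whitespace,
        # and record how often each token occurs.
        return Counter(text.lower().split())

    doc = "fun to visit this fantastic place"
    print(extract_features(doc))
    # Counter({'fun': 1, 'to': 1, 'visit': 1, 'this': 1, 'fantastic': 1, 'place': 1})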
Techniques:
1. Handwritten Rules
- Sentiment Analysis using Lexicon and rules.
2. Supervised Machine Learning Based Classifier
- Generative Classification Algorithms
e.g., Naive Bayes
- Discriminative Classification Algorithms
e.g., Logistic Regression, SVM
Text Classification

Generative Classification Algorithms:

We compute P(Observation|Class): we model how an observation is generated from a probable class value.

Discriminative Classification Algorithms:

We compute P(Class|Observation) directly.
Text Classification

Naive Bayes Algorithm for Classification


- This idea of inference using Bayes Rule has been known since the work of Bayes (1763),
and was first applied to text classification by Mosteller and Wallace (1964).
- Use Bayes rule to transform the desired conditional probability P(Class|document) into some other
expression computable from the corpus.

With d a document and c a class, Bayes rule gives:

P(c|d) = P(d|c) * P(c) / P(d)

P(d) is constant for all the class values, so the most probable class is:

c* = argmax_c P(d|c) * P(c)
Text Classification

Naive Bayes Assumptions: f is a set of features representing d.

With d a document, c a class, and P(d) constant for all the class values, the classifier becomes:

c* = argmax_c P(f1, f2, ..., fn | c) * P(c)

1. The order of features doesn't matter; this is the bag-of-words assumption.

2. The probabilities P(fi|c) are independent given the class c and hence can be 'naively' multiplied:

P(f1, f2, ..., fn | c) = P(f1|c) * P(f2|c) * ... * P(fn|c)
Text Classification

Generic Form of NB Classifier (A Linear Classifier) for text classification

How to compute the prior P(c) and the likelihood P(wi|c)?

In log space the classifier is linear in the features:

c* = argmax_c [ log P(c) + sum_i log P(wi|c) ]

Both probabilities are estimated from training counts:

P(c) = N_c / N_doc (the fraction of training documents labelled with class c)

P(wi|c) = count(wi, c) / sum_{w in V} count(w, c)

i.e., the count of word wi in class c, divided by the total count of all vocabulary words in class c.
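A hedged sketch of these two estimates in Python, assuming documents arrive as whitespace-tokenizable strings with one label each (all names here are illustrative):

    from collections import Counter, defaultdict

    def train_nb(docs, labels):
        # Prior: P(c) = N_c / N_doc.
        class_counts = Counter(labels)
        n_docs = len(labels)
        prior = {c: class_counts[c] / n_docs for c in class_counts}

        # Likelihood: P(w|c) = count(w, c) / total word count in class c.
        word_counts = defaultdict(Counter)
        for doc, c in zip(docs, labels):
            word_counts[c].update(doc.lower().split())
        likelihood = {}
        for c, counts in word_counts.items():
            total = sum(counts.values())
            likelihood[c] = {w: n / total for w, n in counts.items()}
        return prior, likelihood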
Text Classification

Data Sparsity
1. Unknown Words in a class

For f = {it's fun to visit this fantastic place}, a zero count of 'fantastic' in some class makes the entire P(c|f) zero for that class.

Add-1 (Laplace) Smoothing:

P(wi|c) = (count(wi, c) + 1) / (sum_{w in V} count(w, c) + |V|)
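A sketch of the smoothed estimate, reusing the per-class word counts from the training sketch above (word_counts maps each class to a Counter of its words; vocab is the shared vocabulary):

    def add_one_likelihood(word_counts, vocab):
        # P(w|c) = (count(w, c) + 1) / (total tokens in c + |V|);
        # every vocabulary word now gets a non-zero probability in every class.
        likelihood = {}
        for c, counts in word_counts.items():
            total = sum(counts.values()) + len(vocab)
            likelihood[c] = {w: (counts[w] + 1) / total for w in vocab}
        return likelihood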
Text Classification

Data Sparsity
2. Unknown words in the test data – ignore that particular word and work with the remaining words.

E.g., P(sentiment class | love to visit this fantastic place) = ? If 'visit' is missing from the training corpus, drop 'visit' and score the remaining words.
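A sketch of prediction under this convention, using log-probabilities to avoid underflow (an implementation detail, not something stated on the slide); test words outside the training vocabulary are simply skipped:

    import math

    def predict(doc, prior, likelihood, vocab):
        # Keep only words seen in training; unknown test words are ignored.
        words = [w for w in doc.lower().split() if w in vocab]
        scores = {}
        for c in prior:
            scores[c] = math.log(prior[c]) + sum(
                math.log(likelihood[c][w]) for w in words)
        return max(scores, key=scores.get)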
Text Classification

Compute P(Class | "predictable with no fun")?

P(+|test) = ((P(predictable|+)*P(with|+)*P(no|+)*P(fun|+)) * P(+))/P(test)


P(-|test) = ((P(predictable|-)*P(with|-)*P(no|-)*P(fun|-)) * P(-))/P(test)
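Carried out numerically with made-up probabilities (the lecture's actual training counts are not reproduced here, so these values are purely hypothetical), and dropping P(test) since it is identical for both classes:

    # Hypothetical likelihoods and priors, for illustration only.
    p_pos = {'predictable': 0.01, 'with': 0.05, 'no': 0.02, 'fun': 0.08}
    p_neg = {'predictable': 0.06, 'with': 0.05, 'no': 0.07, 'fun': 0.01}
    prior = {'+': 0.5, '-': 0.5}

    words = ['predictable', 'with', 'no', 'fun']
    score_pos = prior['+']
    score_neg = prior['-']
    for w in words:
        score_pos *= p_pos[w]
        score_neg *= p_neg[w]

    # P(test) cancels, so the unnormalized scores can be compared directly.
    print('+' if score_pos > score_neg else '-')  # prints '-' with these numbers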
