Lecture-Feb20&25
Take a single observation -> extract useful features -> represent the data in a machine-readable
format -> classify the observation into one of a set of discrete classes using a classification algorithm.
Techniques:
1. Handwritten Rules
- Sentiment Analysis using Lexicon and rules.
2. Supervised Machine Learning Based Classifier
- Generative Classification Algorithms
e.g., Naive Bayes
- Discriminative Classification Algorithms
e.g., Logistic Regression, SVM
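As an illustration of technique 1, the following is a minimal sketch of a lexicon-and-rules sentiment classifier. The word lists and the function name rule_based_sentiment are hypothetical and chosen only for this example; a real system would rely on a curated sentiment lexicon and richer rules (for instance, negation handling).

```python
# Hypothetical toy lexicon, assumed for illustration only.
POSITIVE = {"love", "fantastic", "great", "good"}
NEGATIVE = {"hate", "terrible", "boring", "bad"}


def rule_based_sentiment(text: str) -> str:
    """Classify text with a handwritten rule: compare lexicon hit counts."""
    tokens = text.lower().split()
    # Score = number of positive words minus number of negative words.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = rule_based_sentiment("love to visit this fantastic place")
```

No training data is needed here, which is the main contrast with the supervised classifiers in technique 2.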
Text Classification
d – document
c – class
P(d) is the same for every class value, so it can be dropped when comparing classes.
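The slide's equation did not survive extraction. The standard Bayes-rule derivation that these definitions usually accompany can be written as below; this is a reconstruction under that assumption, and the symbols \hat{c} (the predicted class) and C (the set of all classes) are introduced here for illustration. It assumes amsmath for \operatorname*.

```latex
\[
\begin{aligned}
\hat{c} &= \operatorname*{argmax}_{c \in C} P(c \mid d) \\
        &= \operatorname*{argmax}_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)} \\
        &= \operatorname*{argmax}_{c \in C} P(d \mid c)\,P(c)
\end{aligned}
\]
```

The last step holds because P(d) does not depend on c, so dividing by it does not change which class attains the maximum.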
2. The features fi are assumed to be conditionally independent given the class c, so the probabilities P(fi|c) can be ‘naively’ multiplied.
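Written out, the independence assumption could look like the sketch below. The symbols n (the number of features in the document), \hat{c} (the predicted class), and C (the set of classes) are assumptions made for this example; it requires amsmath for \operatorname*.

```latex
\[
P(f_1, f_2, \ldots, f_n \mid c) = \prod_{i=1}^{n} P(f_i \mid c)
\qquad\Longrightarrow\qquad
\hat{c} = \operatorname*{argmax}_{c \in C} \; P(c) \prod_{i=1}^{n} P(f_i \mid c)
\]
```

In practice the product is often computed as a sum of logarithms, since multiplying many small probabilities can underflow.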
Data Sparsity
1. Unknown words in a class – apply add-1 (Laplace) smoothing so that no P(w|c) is zero.
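A common form of add-1 smoothing is P(w|c) = (count(w, c) + 1) / (total word count in c + |V|), where V is the vocabulary. The sketch below assumes that form; the function name add_one_likelihoods and the toy tokens are hypothetical and used only for illustration.

```python
from collections import Counter


def add_one_likelihoods(class_tokens, vocabulary):
    """Return add-1 smoothed P(w | c) for every word w in the vocabulary."""
    counts = Counter(class_tokens)
    # Total number of in-vocabulary tokens observed for this class.
    total = sum(counts[w] for w in vocabulary)
    denom = total + len(vocabulary)
    return {w: (counts[w] + 1) / denom for w in vocabulary}


# Toy data (assumed): 'boring' never occurs in the positive class.
vocab = {"love", "fantastic", "boring", "place"}
positive_tokens = ["love", "fantastic", "place", "love"]
probs = add_one_likelihoods(positive_tokens, vocab)
# probs["boring"] is (0 + 1) / (4 + 4) = 0.125 instead of 0
# probs["love"]   is (2 + 1) / (4 + 4) = 0.375
```

Without smoothing, a single word unseen in a class would force the whole product for that class to zero.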
2. Unknown words in the test data – ignore the unknown word and work with the remaining words.
P(sentiment class | love to visit this fantastic place) = ? If ‘visit’ is missing from the training
corpus, drop it and score the sentence using only the remaining words.
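The two data-sparsity fixes can be combined in a small Naive Bayes sketch. The training corpus below is a hypothetical toy set, deliberately built so that 'visit' never appears in it, and the function names train_nb, log_posteriors, and classify are illustrative rather than taken from any library.

```python
import math
from collections import Counter

# Hypothetical toy training corpus; 'visit' does not occur anywhere in it.
TRAIN = [
    ("love this fantastic place", "positive"),
    ("fantastic food love to return", "positive"),
    ("hate this boring place", "negative"),
    ("boring food hate to return", "negative"),
]


def train_nb(examples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = {}
    for text, label in examples:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab


def log_posteriors(text, class_docs, word_counts, vocab):
    """Unnormalized log P(c) + sum of log P(w | c), with add-1 smoothing."""
    n_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / n_docs)  # log prior
        for w in text.split():
            if w not in vocab:
                continue  # unknown test word: ignore it
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return scores


def classify(text, class_docs, word_counts, vocab):
    scores = log_posteriors(text, class_docs, word_counts, vocab)
    return max(scores, key=scores.get)


model = train_nb(TRAIN)
prediction = classify("love to visit this fantastic place", *model)
```

Because 'visit' is skipped, the sentence receives exactly the same scores as "love to this fantastic place"; on this toy corpus the positive class scores higher.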