Machine Learning and Web Scraping Lecture 03
Unlike regression, the output of classification is a category rather than a continuous value, such as "green or blue" or "fruit or animal". Classification is a supervised learning technique, so it takes labelled input data: each input comes with its corresponding output.
In a classification algorithm, a discrete output (y) is learned as a function of the input variables (x), i.e. y = f(x).
A classic example of an ML classification task is an email spam detector.
The main goal of a classification algorithm is to identify the category of a given observation; these algorithms are mainly used to predict outputs for categorical data.
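As a sketch of the spam-detector example, here is a minimal Naïve Bayes word-count classifier. The toy messages are invented for illustration and are not from the lecture; real spam filters train on thousands of labelled emails.

```python
from collections import Counter
import math

# Hypothetical toy training data (an assumption for this sketch).
spam = ["win money now", "free money offer", "claim free prize now"]
ham = ["meeting at noon", "project status update", "lunch at noon today"]

def word_counts(msgs):
    c = Counter()
    for m in msgs:
        c.update(m.split())
    return c

spam_counts, ham_counts = word_counts(spam), word_counts(ham)
spam_total, ham_total = sum(spam_counts.values()), sum(ham_counts.values())
vocab = set(spam_counts) | set(ham_counts)

def log_prob(msg, counts, total):
    # Laplace smoothing so an unseen word does not zero out the score.
    return sum(math.log((counts[w] + 1) / (total + len(vocab)))
               for w in msg.split())

def classify(msg):
    # Equal class priors (3 spam, 3 ham messages), so the priors cancel.
    spam_score = log_prob(msg, spam_counts, spam_total)
    ham_score = log_prob(msg, ham_counts, ham_total)
    return "spam" if spam_score > ham_score else "ham"
```

The classifier labels a message by whichever class gives its words the higher (smoothed) likelihood, which is exactly the category-prediction behaviour described above.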
Classification can be visualised with a diagram containing two classes, Class A and Class B. Points within a class have features similar to one another and dissimilar to the points of the other class.
• Lazy Learners: Lazy learners simply store the training dataset and defer the real work until a test instance arrives. Example: K-Nearest Neighbours.
• Eager Learners: Eager learners build a classification model from the training dataset before receiving any test data. Compared with lazy learners, they take more time in learning and less time in prediction. Examples: decision trees, Naïve Bayes, ANN.
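The eager/lazy contrast can be sketched with two toy one-dimensional classifiers (the data and the nearest-centroid model are assumptions for illustration): the eager learner summarises the training set into a model up front, while the lazy learner keeps the raw points and does all its work at prediction time.

```python
# Eager learner: fits a summary (one centroid per class) at training time.
def train_centroids(X, y):
    sums, counts = {}, {}
    for x, label in zip(X, y):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_centroid(model, x):
    # Prediction is cheap: compare against two stored numbers.
    return min(model, key=lambda label: abs(x - model[label]))

# Lazy learner (1-NN): "training" is just storing the data;
# every prediction scans the whole training set.
def predict_1nn(X, y, x):
    i = min(range(len(X)), key=lambda j: abs(x - X[j]))
    return y[i]

X = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
y = ["A", "A", "A", "B", "B", "B"]
model = train_centroids(X, y)  # eager: model exists before any test point
```

Both predict the same labels on this toy data; the difference is where the computation happens.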
Types of ML Classification Algorithms:
Classification algorithms can be divided into two main categories:
• Linear Models
  • Logistic Regression
  • Support Vector Machines
• Non-linear Models
  • K-Nearest Neighbours
  • Kernel SVM
  • Naïve Bayes
  • Decision Tree Classification
  • Random Forest Classification
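As a sketch of the first linear model in the list, here is logistic regression fitted by stochastic gradient descent on a toy one-dimensional dataset (the data, learning rate, and epoch count are assumptions for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    # Stochastic gradient descent on the log-loss for one feature plus bias.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            p = sigmoid(w * x + b)   # predicted probability of class 1
            w -= lr * (p - t) * x    # gradient step on the weight
            b -= lr * (p - t)        # gradient step on the bias
    return w, b

X = [0.5, 1.0, 1.5, 3.5, 4.0, 4.5]  # toy feature values
y = [0, 0, 0, 1, 1, 1]              # two categories

w, b = fit_logistic(X, y)

def predict(x):
    return 1 if sigmoid(w * x + b) >= 0.5 else 0
```

The learned boundary is the single point where w·x + b = 0, i.e. a linear decision rule, which is why logistic regression sits under "Linear Models".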
Evaluating a Classification model:
Once the model is complete, its performance must be evaluated, whether it is a classification or a regression model. A classification model can be evaluated in the following ways:
Confusion Matrix:
• The confusion matrix is a table that describes the performance of the model.
• It is also known as the error matrix.
• It summarises the prediction results, giving the total numbers of correct and incorrect predictions, broken down as in the table below:

                      Actual: Positive       Actual: Negative
Predicted: Positive   True Positive (TP)     False Positive (FP)
Predicted: Negative   False Negative (FN)    True Negative (TN)
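The four cells of the matrix can be computed directly; this minimal sketch uses invented spam/ham predictions (an assumption, not results from the lecture):

```python
def confusion_matrix(actual, predicted, positive="spam"):
    # 2x2 error matrix: counts of each (actual, predicted) combination.
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return tp, fp, fn, tn

actual    = ["spam", "spam", "ham", "ham",  "spam", "ham"]
predicted = ["spam", "ham",  "ham", "spam", "spam", "ham"]

tp, fp, fn, tn = confusion_matrix(actual, predicted)
# Correct predictions sit on the diagonal (TP + TN).
accuracy = (tp + tn) / len(actual)
```

Metrics such as accuracy, precision, and recall are all ratios of these four counts.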
The K-Nearest Neighbours (K-NN) algorithm proceeds as follows:
• First, we choose the number of neighbours; here we choose k = 5.
• Next, we calculate the Euclidean distance between the data points. The Euclidean distance is the straight-line distance between two points, familiar from geometry. For points (x1, y1) and (x2, y2) it can be calculated as:
  d = √((x2 − x1)² + (y2 − y1)²)
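The steps above can be sketched in a few lines; the two point clusters are invented for illustration:

```python
import math
from collections import Counter

def euclidean(p, q):
    # Straight-line distance between two 2-D points.
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def knn_predict(points, labels, query, k=5):
    # Step 1: sort training points by distance to the query.
    nearest = sorted(range(len(points)),
                     key=lambda i: euclidean(points[i], query))
    # Step 2: majority vote among the k nearest neighbours.
    votes = Counter(labels[i] for i in nearest[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters (an assumption for this sketch).
points = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5),
          (8, 8), (8, 9), (9, 8), (9, 9), (8.5, 8.5)]
labels = ["A"] * 5 + ["B"] * 5
```

A query point near a cluster inherits that cluster's label because all k = 5 of its nearest neighbours vote the same way.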
Applications of SVM:
• Face detection
• Image classification
• Text categorization
• …
Types of SVM
• Linear SVM: used for linearly separable data. If a dataset can be divided into two classes by a single straight line, it is termed linearly separable, and the classifier used is called a linear SVM classifier.
• Non-linear SVM: used for non-linearly separable data. If a dataset cannot be divided by a straight line, it is termed non-linear data, and the classifier used is called a non-linear SVM classifier.
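Whether data is linearly separable can be tested with a simple perceptron. This is not an SVM, but like a linear SVM it fits a single straight line w·x + b = 0, and it is guaranteed to find one exactly when the data is linearly separable. The toy datasets are assumptions for illustration:

```python
def perceptron(X, y, epochs=100):
    # Fits a separating line w.x + b = 0; labels t are +1 or -1.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), t in zip(X, y):
            if t * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified
                w[0] += t * x1
                w[1] += t * x2
                b += t
                errors += 1
        if errors == 0:          # a full error-free pass: line found
            return w, b
    return None                  # no straight line separates the data

# Linearly separable: one straight line splits the two clusters.
X_sep = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
y_sep = [-1, -1, -1, 1, 1, 1]

# XOR-style data: no single straight line can split it (non-linear case).
X_xor = [(0, 0), (1, 1), (0, 1), (1, 0)]
y_xor = [-1, -1, 1, 1]
```

The first dataset is the linear SVM case (a separating line exists); the XOR pattern is the classic non-linear case, where a kernel is needed instead.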
So now, SVM will divide the dataset into classes in the following way. Consider the below image: since we are in 3-D space, the separating boundary looks like a plane parallel to the x-axis. If we convert it back to 2-D space at z = 1, it becomes a circle around the inner class.
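The lifted-dimension idea can be sketched numerically. The two rings of points below are toy data invented for illustration, not the lecture's figure: an inner ring (radius 1) and an outer ring (radius 3) that no straight line in the (x, y) plane can separate.

```python
import math

# Inner ring: class A at radius 1; outer ring: class B at radius 3.
inner = [(math.cos(t), math.sin(t)) for t in (0.0, 1.5, 3.0, 4.5)]
outer = [(3 * math.cos(t), 3 * math.sin(t)) for t in (0.7, 2.2, 3.7, 5.2)]

def lift(point):
    # Add a third dimension z = x^2 + y^2 (squared distance from origin).
    x, y = point
    return (x, y, x * x + y * y)

# In 3-D every inner point has z = 1 and every outer point has z = 9,
# so the flat plane z = 5 separates the classes. Cutting that plane
# back to 2-D gives a circular boundary around the inner class.
z_inner = [lift(p)[2] for p in inner]
z_outer = [lift(p)[2] for p in outer]
```

This is the geometric trick the passage describes: a plane in the lifted 3-D space corresponds to a circle in the original 2-D space.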