Classification Algorithms: Inteligência Artificial e Cibersegurança (INACS)
• A method that uses the class labels of the k nearest neighbors to determine the class label of an unknown record
• Typically, this is done by majority voting among the k nearest neighbors of the unknown record
K-Nearest Neighbors
• KNN works as follows:
• There is no real “training” phase: the algorithm simply stores the training data in memory
K-Nearest Neighbors
• KNN works as follows (a minimal sketch of these steps is given below):
1. Compute the distance between the new data point and every record in the training set
2. Select the k training records closest to the new data point
3. Among these k neighbors, count the data points that belong to each label
4. Assign the new data point to the most frequent label among its nearest neighbors
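A minimal from-scratch sketch of these steps, assuming numeric features and Euclidean distance (the function and variable names are illustrative, not taken from the slides):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_new, k=3):
        # Step 1: distance from the new point to every training record (Euclidean)
        distances = np.linalg.norm(X_train - x_new, axis=1)
        # Step 2: indices of the k closest training records
        nearest = np.argsort(distances)[:k]
        # Steps 3-4: majority vote among the labels of those neighbors
        votes = Counter(y_train[nearest])
        return votes.most_common(1)[0][0]

    # Illustrative usage
    X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_predict(X_train, y_train, np.array([4.8, 5.0]), k=3))  # -> 1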
K-Nearest Neighbors
• In practice:
• Sklearn’s implementation (a basic usage sketch is shown below):
• https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
• With Facebook’s faiss, a custom implementation can be up to 300x faster than scikit-learn’s:
• https://towardsdatascience.com/make-knn-300-times-faster-than-scikit-learns-in-20-lines-5e29d74e76bb
• https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/
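A minimal usage sketch of scikit-learn’s KNeighborsClassifier (the toy data is illustrative only):

    from sklearn.neighbors import KNeighborsClassifier

    X_train = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]]
    y_train = [0, 0, 1, 1]

    # k = 3 neighbors; classification is done by majority voting (the default weighting)
    clf = KNeighborsClassifier(n_neighbors=3)
    clf.fit(X_train, y_train)
    print(clf.predict([[4.8, 5.0]]))  # -> [1]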
Decision Trees
How can an algorithm be represented as a tree?
• Processes:
• Splitting
• Pruning
Decision Trees
• Intuition
• Particularities
• How to separate the records?
• How to determine the attribute(s) that lead to the best partition?
• How to specify the test condition on the attribute?
• Different algorithms lead to different decision trees, based on local optimization over the attributes
• Start at the tree root and split the data on the feature that results in the largest information gain (IG)
• The objective function is to maximize the information gain (IG) at each split (see the formula below)
Note: the lower the impurity of the child nodes, the larger the information gain
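For reference, the information gain of a split on feature f can be written in the standard form (I is an impurity measure such as Gini impurity or entropy, Dₚ the parent node, Dⱼ the child nodes):

    IG(Dₚ, f) = I(Dₚ) − Σⱼ (Nⱼ / Nₚ) · I(Dⱼ)

where Nₚ and Nⱼ are the numbers of records in the parent node and in child node j.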
Decision Trees
• Gini Impurity
• A measure of how often a randomly chosen element from the dataset would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset
• Gini = 1 − Σⱼ pⱼ², where j indexes the classes present in the node and pⱼ is the proportion of class j in the node
https://archive.ics.uci.edu/ml/datasets/Heart+Disease
Decision Trees
• Gini Impurity – example (Heart Disease dataset)
• Gini Impurity (Sex = 0) = 0.495
• Gini Impurity (Exang = 1) = 0.24
• Weighted Gini Impurity (Exang) = 0.356
Decision Trees
• Gini Impurity – example
• Gini Impurity per attribute (weighted over branches):
• Sex = 0.399
• Fbs = 0.249
• Exang = 0.356
• Fbs has the lowest weighted Gini Impurity and therefore gives the purest split
• Entropy
• Entropy = −Σⱼ pⱼ · log₂(pⱼ), where j indexes the classes present in the node and pⱼ is the proportion of class j in the node
• The attribute with the highest Information Gain (i.e. the largest reduction in entropy) on the first iteration becomes the Root Node
Information Gain
https://archive.ics.uci.edu/ml/datasets/Heart+Disease
Decision Trees
• Entropy – example
• For each attribute, compute the entropy of each branch and the resulting Information Gain: Entropy(Sex = 0) and Entropy(Sex = 1); Entropy(Fbs = 0) and Entropy(Fbs = 1); Entropy(Exang = 0) and Entropy(Exang = 1)
Decision Trees
• Entropy – example
• Information Gain per attribute:
• Sex = 0.328
• Fbs = 0.389
• Exang = 0.224
• Fbs has the highest Information Gain and is therefore selected as the Root Node
• Example class distributions at three candidate nodes (a worked computation follows below):
• Node 1: C1 = 0, C2 = 6
• Node 2: C1 = 1, C2 = 5
• Node 3: C1 = 3, C2 = 3
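A small sketch computing the Gini impurity and entropy of each of these class distributions (the function names and node labels are illustrative):

    import math

    def gini(counts):
        # Gini = 1 - sum_j p_j^2
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts)

    def entropy(counts):
        # Entropy = -sum_j p_j * log2(p_j), with 0 * log2(0) taken as 0
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

    for label, counts in [("Node 1", (0, 6)), ("Node 2", (1, 5)), ("Node 3", (3, 3))]:
        print(label, round(gini(counts), 3), round(entropy(counts), 3))
    # Node 1: Gini 0.0,   Entropy 0.0   (pure node)
    # Node 2: Gini 0.278, Entropy 0.65
    # Node 3: Gini 0.5,   Entropy 1.0   (maximum impurity for two classes)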
Decision Trees
• Classification and Regression Trees (CART) algorithm
• Step 1: Start at the root node with all training instances
• Step 2: Select an attribute based on splitting criteria (e.g. the Gini index)
• Step 3: Partition the instances according to the selected attribute and recurse on each partition (a scikit-learn sketch follows the reference below)
https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
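In practice, scikit-learn’s DecisionTreeClassifier uses an optimized CART variant; a minimal usage sketch (the integer encoding and toy data are illustrative, not taken from the tutorial above):

    from sklearn.tree import DecisionTreeClassifier

    # Toy training data: [outlook, humidity] encoded as integers (illustrative only)
    X_train = [[0, 1], [0, 1], [1, 0], [2, 0], [2, 1]]
    y_train = ["No", "No", "Yes", "Yes", "No"]

    # criterion="gini" matches the Gini-based splitting used in the example
    tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
    tree.fit(X_train, y_train)
    print(tree.predict([[1, 1]]))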
CART - example
Step 2: Select an attribute on the basis of splitting criteria
• Compute the Gini index of each candidate attribute: Gini(Outlook), Gini(Temp), Gini(Humidity), Gini(Wind)
CART - example
Step 2: Select an attribute on the basis of splitting criteria
Time to decide
• Outlook yields the lowest Gini index and is chosen as the root node
Next decisions:
• Focus on the sub-dataset for the Sunny outlook
• Focus on the sub-dataset for the Rain outlook
CART - example
Focus on the sub-dataset for the Sunny outlook
Step 2: Select an attribute on the basis of splitting criteria
Temp
• All examples have the same values in their attributes (but different classes)
Decision Trees
• Disadvantages
• The model can become unstable due to small variations in the data
• Overfitting happens when the algorithm captures noise in the data
• A highly complex decision tree tends to have low bias and high variance, which makes it generalize poorly to new data
Decision Trees
• Using a Decision Tree Analysis for Intrusion Detection: A How-To Guide
• https://sansorg.egnyte.com/dl/6edQgfwngE
• A Consolidated Decision Tree-Based Intrusion Detection System for Binary and Multiclass Imbalanced Datasets
• https://www.mdpi.com/2227-7390/9/7/751/pdf
Ensemble Learning
• Specific models do well in modeling one aspect of the data, while others do well in modeling another
• Instead of learning a single complex model, learn several simple models and combine their outputs to build the final decision
• The combined strength of the models compensates for the variances and biases of the individual models
• Delivers a combined prediction whose final accuracy is typically better than the accuracy of the individual models
Bagging
• Bagging (Bootstrap Aggregating)
• Random samples (bootstrap subsets) of the training data set are created
• A separate model is trained on each sample and their predictions are aggregated, e.g. by majority voting (a minimal sketch follows below)
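A minimal bagging sketch with scikit-learn’s BaggingClassifier (the synthetic data is illustrative; by default the base learner is a decision tree):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier

    # Illustrative synthetic data
    X, y = make_classification(n_samples=200, n_features=8, random_state=0)

    # 25 base learners (decision trees by default), each trained on a bootstrap sample
    bag = BaggingClassifier(n_estimators=25, random_state=0)
    bag.fit(X, y)
    print(bag.predict(X[:3]))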
Boosting
• If classes are predicted incorrectly by the first learner, a higher weight is given to the misclassified observations
• Being an iterative process, it continues to add learners until a limit on the number of models or on accuracy is reached
• Boosting has shown better predictive accuracy than bagging, but it also tends to overfit the training data (!) (see the sketch below)
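A minimal boosting sketch with scikit-learn’s AdaBoostClassifier (the synthetic data is illustrative; AdaBoost is one concrete boosting algorithm, not the only one):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier

    X, y = make_classification(n_samples=200, n_features=8, random_state=0)

    # Each new weak learner gives more weight to the observations misclassified so far
    boost = AdaBoostClassifier(n_estimators=50, random_state=0)
    boost.fit(X, y)
    print(boost.predict(X[:3]))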
Stacking
• Stacking
• Use a learner to combine the outputs of different learners, learning a function of their predictions (see the sketch below)
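A minimal stacking sketch with scikit-learn’s StackingClassifier (the choice of base learners and meta-learner is illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=8, random_state=0)

    # Base learners; a logistic regression meta-learner combines their predictions
    base_learners = [
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("tree", DecisionTreeClassifier(max_depth=3)),
    ]
    stack = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression())
    stack.fit(X, y)
    print(stack.predict(X[:3]))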
Random Forest
• The “forest” it builds is an ensemble of decision trees, usually trained with the “bagging” method
• Instead of searching for the most important feature while splitting a node, it searches for the best feature among a random subset of features
• This added diversity generally leads to a better model
• It is based on the “Wisdom of Crowds” idea (a minimal sketch follows below)
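A minimal Random Forest sketch with scikit-learn’s RandomForestClassifier (the synthetic data and hyperparameters are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=8, random_state=0)

    # 100 trees; each split considers only a random subset of the features (max_features)
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
    forest.fit(X, y)
    print(forest.predict(X[:3]))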