Entropy and Information Gain for Decision Tree Algorithm

Decision trees are a popular machine learning algorithm for both classification and regression tasks. They work by recursively splitting the dataset into subsets based on feature values, creating a tree-like structure of decisions that leads to predictions. Here’s an overview of decision trees and some commonly used algorithms:

1. Basic Concept of Decision Trees

• Nodes: Each internal node represents a feature (or attribute) on which the data is tested.

• Edges: Each branch from a node represents a decision based on that feature’s value.

• Leaf Nodes: Represent the final output (class or value) after all decisions have been made.

• Root Node: The topmost node in the tree, representing the initial feature or question; a minimal code sketch of this structure follows the list.
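
To make these terms concrete, here is a minimal sketch of such a tree in Python. The feature names and values ("outlook", "windy", and so on) are illustrative only, not taken from any particular dataset:

```python
# Minimal sketch of the node types described above (illustrative names).
# An internal node tests a feature; a leaf node stores the final prediction.
class Leaf:
    def __init__(self, prediction):
        self.prediction = prediction  # class label (or regression value)

class Node:
    def __init__(self, feature, branches):
        self.feature = feature    # feature tested at this node
        self.branches = branches  # edge value -> child Node or Leaf

def predict(node, example):
    """Follow edges from the root until a leaf is reached."""
    while isinstance(node, Node):
        node = node.branches[example[node.feature]]
    return node.prediction

# Root node tests "outlook"; each edge leads to a subtree or a leaf.
root = Node("outlook", {
    "sunny": Leaf("no"),
    "overcast": Leaf("yes"),
    "rainy": Node("windy", {"true": Leaf("no"), "false": Leaf("yes")}),
})
print(predict(root, {"outlook": "rainy", "windy": "false"}))  # -> "yes"
```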

2. Decision Tree Algorithms

a) ID3 (Iterative Dichotomiser 3)

• Developed by Ross Quinlan, ID3 is one of the earliest algorithms.

• Criterion: It uses information gain to decide which feature to split on, favoring splits that result in the greatest reduction in entropy (the defining formulas follow this list).

• Limitations: Prone to overfitting and can’t handle numeric data directly without modification.
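
For a dataset $S$ with classes $c_1, \dots, c_k$ and a candidate feature $A$, the two quantities behind ID3’s criterion are

$$\mathrm{Entropy}(S) = -\sum_{i=1}^{k} p_i \log_2 p_i$$

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

where $p_i$ is the fraction of examples in $S$ belonging to class $c_i$, and $S_v$ is the subset of $S$ in which feature $A$ takes the value $v$. ID3 splits on the feature with the highest gain.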

b) C4.5

• An extension of ID3, also developed by Quinlan.

• Criterion: Uses gain ratio, a normalized form of information gain, and handles continuous and categorical data better than ID3 (see the formula after this list).

• Pruning: Implements pruning to reduce overfitting.

• Handling of Missing Values: C4.5 can handle datasets with missing values more effectively than
ID3.
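
Gain ratio divides information gain by the split information of the feature, which penalizes features that fragment the data into many small subsets:

$$\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)}, \qquad \mathrm{SplitInfo}(S, A) = -\sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}$$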

c) CART (Classification and Regression Trees)

• Developed by Leo Breiman, CART is widely used in both classification and regression.

• Criterion: For classification, CART uses Gini impurity as the splitting criterion, while for regression it uses mean squared error (MSE); both are defined after this list.

• Binary Splits Only: CART splits the data into exactly two branches at each node, creating binary
trees.

• Pruning: Prunes trees based on a cost-complexity parameter to manage overfitting.
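
The two CART criteria, evaluated on the set $S$ of examples reaching a node (with $p_i$ the fraction of class $c_i$, $y_j$ the target values, and $\bar{y}$ their mean):

$$\mathrm{Gini}(S) = 1 - \sum_{i=1}^{k} p_i^2, \qquad \mathrm{MSE}(S) = \frac{1}{|S|} \sum_{j \in S} \left(y_j - \bar{y}\right)^2$$

A Gini impurity of 0 means the node is pure; CART chooses the binary split that most reduces the weighted impurity of the two children.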

d) CHAID (Chi-Square Automatic Interaction Detector)

• CHAID is used for categorical data and is based on the chi-square test.

• Criterion: Uses statistical tests (chi-square for classification, ANOVA F-tests for regression) to determine splits; the chi-square statistic is given after this list.

• Multiway Splits: Unlike CART’s binary trees, CHAID can create more than two branches from a single node.

• Use Cases: Often used for market research and survey analysis.
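
The chi-square statistic CHAID relies on is computed over the contingency table of a candidate feature’s values against the classes:

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ and $E_i$ are the observed and expected counts in cell $i$; larger values indicate a stronger association between the feature and the target.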

3. Advantages of Decision Trees

• Interpretability: Easy to understand and visualize, even for non-experts.

• Non-linearity: Can model non-linear relationships.

• Little Data Preprocessing: Often requires minimal data preparation; normalization and scaling are usually unnecessary.

4. Limitations of Decision Trees

• Overfitting: Decision trees can easily overfit, especially with deep trees.

• Instability: Sensitive to small changes in the data, which can lead to vastly different trees (high variance).

• Preference for Certain Features: Splitting criteria such as information gain tend to favor features with many distinct values (the bias that gain ratio was designed to counteract).

5. Applications of Decision Trees

• Classification tasks (e.g., spam detection, customer churn prediction)

• Regression tasks (e.g., predicting housing prices)

• Feature selection

To calculate entropy and information gain for building a decision tree, we walk through the entropy and information gain calculations on a small dataset, use them to build the tree, and then make a prediction for a given input based on the final decision.
Details:
https://towardsdatascience.com/decision-tree-in-machine-learning-e380942a4c96
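
The linked article’s full walkthrough is not reproduced here, but the sketch below shows the same kind of calculation on a small hypothetical weather dataset (the feature names, values, and rows are illustrative, not from the article). It computes the entropy of the labels, the information gain of each feature, picks the best split, and predicts with a one-level tree:

```python
import math
from collections import Counter

# Small hypothetical dataset (illustrative only): each row is
# (outlook, windy, play). The class label "play" is the last column.
data = [
    ("sunny",    "false", "no"),
    ("sunny",    "true",  "no"),
    ("overcast", "false", "yes"),
    ("rainy",    "false", "yes"),
    ("rainy",    "true",  "no"),
    ("overcast", "true",  "yes"),
    ("sunny",    "false", "no"),
    ("rainy",    "false", "yes"),
]
FEATURES = {"outlook": 0, "windy": 1}
LABEL = 2

def entropy(rows):
    """Shannon entropy of the class labels: -sum(p * log2(p))."""
    counts = Counter(row[LABEL] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, idx):
    """Parent entropy minus the weighted entropy of the subsets
    produced by splitting on the feature in column idx."""
    total = len(rows)
    weighted = 0.0
    for value in {row[idx] for row in rows}:
        subset = [row for row in rows if row[idx] == value]
        weighted += (len(subset) / total) * entropy(subset)
    return entropy(rows) - weighted

print(f"Entropy of labels: {entropy(data):.3f}")  # 1.000 (4 yes / 4 no)
for name, idx in FEATURES.items():
    # Gain(outlook) ~= 0.656, Gain(windy) ~= 0.049
    print(f"Gain({name}) = {information_gain(data, idx):.3f}")

# ID3's choice: the feature with the highest information gain.
best = max(FEATURES, key=lambda name: information_gain(data, FEATURES[name]))
print("Best split:", best)  # outlook

# One-level tree ("decision stump"): split on the best feature and
# predict the majority label within the matching branch.
def predict(example):
    idx = FEATURES[best]
    branch = [row for row in data if row[idx] == example[idx]]
    return Counter(row[LABEL] for row in branch).most_common(1)[0][0]

print(predict(("overcast", "true")))  # -> "yes"
```

Growing a full tree simply repeats this choice recursively on each subset until the labels are pure or no features remain.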
