Decision Tree
A Decision Tree is a popular machine learning algorithm used for both classification and regression tasks.

In regression, the Decision Tree algorithm predicts a continuous target variable by splitting the data into subsets based on feature values. The splits are made to minimize the variance (or mean squared error) in the target variable within each subset.
How Decision Tree Regression Works
Splitting
The algorithm starts at the root node and splits the data into two or more subsets based on the feature that yields the largest reduction in variance (or another criterion); a minimal sketch of this search appears after this list.

Leaf Nodes
The process continues recursively, creating branches until a stopping criterion is met (e.g., maximum depth, minimum samples per leaf).

Prediction
For a new data point, the algorithm traverses the tree from the root to a leaf node, and the prediction is typically the mean of the target values in that leaf node.
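To make the splitting criterion concrete, the sketch below finds the single best threshold on a 1-D feature by minimizing the weighted variance of the two resulting subsets. It is an illustration only; the helper name best_split and the toy data are assumptions, not part of the original notebook.

import numpy as np

def best_split(x, y):
    # Sort by feature value so every candidate split is a prefix/suffix
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_t, best_score = None, np.inf
    for i in range(1, len(x)):
        t = (x[i - 1] + x[i]) / 2  # midpoint between consecutive values
        left, right = y[:i], y[i:]
        # Weighted variance of the two subsets (lower is better)
        score = (len(left) * left.var() + len(right) * right.var()) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy data shaped like the notebook's synthetic example
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 20)
y = np.sin(x) + rng.normal(0, 0.1, 20)
print(best_split(x, y))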
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
# Create a synthetic dataset
np.random.seed(0)
X = np.sort(5 * np.random.rand(80, 1), axis=0) # 80 random points in the range [0, 5]
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0]) # Sine function with noise
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a Decision Tree Regressor
regressor = DecisionTreeRegressor(max_depth=3) # Limiting the depth to avoid overfitting
# Fit the model
regressor.fit(X_train, y_train)
# Make predictions
y_pred = regressor.predict(X_test)
# Visualize the results
plt.figure(figsize=(10, 6))
plt.scatter(X_train, y_train, color='blue', label='Training data')
plt.scatter(X_test, y_test, color='red', label='Test data')
plt.scatter(X_test, y_pred, color='green', label='Predictions', marker='x')
plt.title('Decision Tree Regression')
plt.xlabel('Feature (X)')
plt.ylabel('Target (y)')
plt.legend()
plt.show()
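Since the splits are chosen to minimize mean squared error, a natural follow-up is to report the test MSE alongside the plot above; the short cell below is an addition, not part of the original notebook.

from sklearn.metrics import mean_squared_error

# Quantify the fit on the held-out test set
mse = mean_squared_error(y_test, y_pred)
print(f"Test MSE: {mse:.4f}")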
Iris Dataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data # Features (4 features)
y = iris.target # Target labels (3 classes)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a Decision Tree Classifier
classifier = DecisionTreeClassifier(max_depth=3, random_state=42)
# Fit the model
classifier.fit(X_train, y_train)
# Make predictions
y_pred = classifier.predict(X_test)
# Evaluate the model
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# Visualize the decision boundaries (using only the first two features for 2D visualization)
X_train_2d = X_train[:, :2] # Use only the first two features
X_test_2d = X_test[:, :2] # Use only the first two features
# Create a mesh grid for plotting decision boundaries
x_min, x_max = X_train_2d[:, 0].min() - 1, X_train_2d[:, 0].max() + 1
y_min, y_max = X_train_2d[:, 1].min() - 1, X_train_2d[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
np.arange(y_min, y_max, 0.01))
# Train a new classifier on the 2D data for visualization
classifier_2d = DecisionTreeClassifier(max_depth=3, random_state=42)
classifier_2d.fit(X_train_2d, y_train)
# Predict the class for each point in the mesh grid
Z = classifier_2d.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# Plotting
plt.figure(figsize=(10, 6))
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
plt.scatter(X_train_2d[:, 0], X_train_2d[:, 1], c=y_train, edgecolor='k', marker='o', label='Training data')
plt.scatter(X_test_2d[:, 0], X_test_2d[:, 1], c=y_test, marker='x', label='Test data')  # 'x' is an unfilled marker, so edgecolor does not apply
plt.title('Decision Tree Classifier on Iris Dataset (2D Visualization)')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.legend()
plt.show()
Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
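Beyond the 2-D decision boundaries, the fitted tree itself can be inspected. The cell below is a suggested addition (not part of the original notebook) using scikit-learn's plot_tree to render the splits learned on all four features.

from sklearn.tree import plot_tree

# Render the fitted tree: each node shows its split rule, impurity, and class counts
plt.figure(figsize=(12, 6))
plot_tree(classifier, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.show()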