practical 15 python

The document outlines the process of building a Decision Tree Classifier using the Gini criterion on a dataset, specifically the Iris dataset. It includes code snippets for loading the dataset, training the classifier, hyperparameter tuning with GridSearchCV, and visualizing the decision tree. The conclusion emphasizes the importance of understanding Decision Tree Classifiers for creating accurate machine learning models.
Copyright © All Rights Reserved
Practical-15

Aim:- Write a program to build a Decision Tree Classifier using the Gini criterion on a dataset.

A Decision Tree Classifier is a supervised learning algorithm that uses a tree-like model to
classify data into categories. The algorithm recursively partitions the training data into
smaller subsets based on the values of the input features. Each internal node tests one
feature (for example, whether petal length is below a threshold), each branch follows one
outcome of that test, and each leaf node holds a class label. With the Gini criterion, the
split at each node is chosen to minimize the weighted Gini impurity of the resulting child
nodes. To classify a sample, the tree is traversed from the root node to a leaf, following at
each node the branch that matches the sample's feature value.
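With the Gini criterion, a node whose samples have class proportions p_k has impurity Gini = 1 - sum over k of p_k^2. This value is 0 for a pure node. A candidate split is scored by the size-weighted average impurity of its two children, and the split with the lowest score wins. The pure-Python sketch below illustrates that scoring for a single numeric feature. It is a from-scratch illustration, not scikit-learn's internal implementation, and the function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum over classes k of p_k**2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_gini(left, right):
    """Score of a binary split: each child's Gini weighted by its share of samples."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_threshold(values, labels):
    """Try midpoints between sorted distinct feature values and return
    (threshold, score) for the split 'value <= threshold' with the lowest weighted Gini."""
    distinct = sorted(set(values))
    best = (None, float("inf"))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = weighted_gini(left, right)
        if score < best[1]:
            best = (t, score)
    return best


print(gini_impurity([0, 0, 1, 1]))                                 # 0.5
print(best_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))   # (6.5, 0.0)
```

A real tree applies this search to every feature at every node and recurses into the two children until a stopping rule such as `max_depth` is reached.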

Input:-

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# split the dataset into training (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=99)

# initialize the decision tree classifier with the Gini criterion (scikit-learn's default)
clf = DecisionTreeClassifier(criterion="gini", random_state=1)

# train the classifier
clf.fit(X_train, y_train)

# predict the test set
y_pred = clf.predict(X_test)

# calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

Output:-

Accuracy: 0.9555555555555556
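The description at the start of this practical says a sample is classified by walking from the root to a leaf. The sketch below makes that walk visible. It assumes the same split and Gini classifier as the code above, and uses scikit-learn's `decision_path`, `apply` and the low-level `tree_` arrays to print the tests that one test sample passes through. The printed nodes depend on the fitted tree, so no expected output is given.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data split and Gini tree as in the practical above.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=99)
clf = DecisionTreeClassifier(criterion="gini", random_state=1).fit(X_train, y_train)

sample = X_test[:1]                          # one test flower, kept 2-D
node_indicator = clf.decision_path(sample)   # sparse (1, n_nodes) indicator matrix
path_nodes = node_indicator.indices[node_indicator.indptr[0]:node_indicator.indptr[1]]
leaf_id = clf.apply(sample)[0]               # id of the leaf the sample ends in

t = clf.tree_
for node in path_nodes:
    if node == leaf_id:
        label = iris.target_names[clf.predict(sample)[0]]
        print(f"leaf {node}: predict {label}")
    else:
        feature = t.feature[node]
        value = sample[0, feature]
        # scikit-learn sends a sample to the left child when value <= threshold
        op = "<=" if value <= t.threshold[node] else ">"
        print(f"node {node}: {iris.feature_names[feature]} = {value} {op} {t.threshold[node]:.2f}")
```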

Input:-

from sklearn.model_selection import GridSearchCV

# hyperparameters to fine-tune (9 x 10 x 9 x 2 = 1620 combinations)
param_grid = {
    'max_depth': range(1, 10, 1),
    'min_samples_leaf': range(1, 20, 2),
    'min_samples_split': range(2, 20, 2),
    'criterion': ["entropy", "gini"]
}

# decision tree classifier
tree = DecisionTreeClassifier(random_state=1)

# exhaustive grid search with 5-fold cross-validation on the training set
grid_search = GridSearchCV(estimator=tree, param_grid=param_grid,
                           cv=5, verbose=True)
grid_search.fit(X_train, y_train)

# best mean cross-validated accuracy and the corresponding estimator
print("best accuracy", grid_search.best_score_)
print(grid_search.best_estimator_)

Output:-

Fitting 5 folds for each of 1620 candidates, totalling 8100 fits

best accuracy 0.9714285714285715
DecisionTreeClassifier(criterion='entropy', max_depth=4,
min_samples_leaf=3, random_state=1)
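The output shows that the search preferred `criterion='entropy'`, even though the aim of this practical is a Gini tree. One way to keep the tuned model on the Gini criterion is to give `criterion` a single value in the grid. The sketch below does this. It is an assumption rather than the practical's own method, and dropping `min_samples_split` from the grid is only an illustrative choice to keep the search small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=99)

# Fix the criterion to Gini and tune only the size-related hyperparameters.
param_grid = {
    "criterion": ["gini"],
    "max_depth": range(1, 10),
    "min_samples_leaf": range(1, 20, 2),
}
grid_search_gini = GridSearchCV(
    DecisionTreeClassifier(random_state=1), param_grid=param_grid, cv=5)
grid_search_gini.fit(X_train, y_train)   # 1 x 9 x 10 = 90 candidates, 5 folds each

print("best CV accuracy:", grid_search_gini.best_score_)
print("best params:", grid_search_gini.best_params_)
# refit=True (the default) retrains the best model on all of X_train,
# so score() below is accuracy on the held-out test set
print("test accuracy:", grid_search_gini.score(X_test, y_test))
```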

Visualizing the Decision Tree Classifier

Input:-

from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

# best estimator found by the grid search
tree_clf = grid_search.best_estimator_

# plot the tree; filled=True colours each node by its majority class
plt.figure(figsize=(18, 15))
plot_tree(tree_clf, filled=True, feature_names=iris.feature_names,
          class_names=iris.target_names)
plt.show()

Output:-
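scikit-learn can also render a tree as indented text with `export_text`, in addition to the graphical `plot_tree` view. The text form is handy in a terminal or when the figure is too large to read. The sketch below assumes the same data split. It refits a tree with the hyperparameters reported in the grid-search output above instead of reusing the `grid_search` object, so it runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=99)

# Hyperparameters copied from the best estimator printed by the grid search.
tree_clf = DecisionTreeClassifier(criterion="entropy", max_depth=4,
                                  min_samples_leaf=3, random_state=1)
tree_clf.fit(X_train, y_train)

# One line per node: indentation shows depth, leaves show the predicted class index.
rules = export_text(tree_clf, feature_names=list(iris.feature_names))
print(rules)
```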

Input:-

import pandas as pd
import matplotlib.pyplot as plt

# load the spam email dataset
dataset_link = 'https://media.geeksforgeeks.org/wp-content/uploads/20240620175612/spam_email.csv'
df = pd.read_csv(dataset_link)

# plot the number of ham and spam messages
df['Category'].value_counts().plot.bar(color=["g", "r"])
plt.title('Total number of ham and spam in the dataset')
plt.show()
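The document does not show how the `y_test` and `pred` used in the confusion-matrix step below are produced for the spam data. A decision tree needs numeric features, so one plausible approach is to turn each message into word counts with `CountVectorizer` and then fit a Gini tree. The sketch below follows that approach, which may differ from the original preprocessing. The helper `train_spam_tree` is hypothetical, and the name of the text column (`'Message'`) is an assumption, since only `'Category'` appears in this document.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_spam_tree(texts, labels, test_size=0.3, random_state=99):
    """Hypothetical helper: fit a Gini decision tree on bag-of-words counts.

    Returns (y_test, pred), the true and predicted labels of the held-out split.
    """
    texts_train, texts_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size,
        random_state=random_state, stratify=labels)
    vectorizer = CountVectorizer()
    X_train = vectorizer.fit_transform(texts_train)  # vocabulary learned from training text only
    X_test = vectorizer.transform(texts_test)        # words unseen in training are ignored
    clf = DecisionTreeClassifier(criterion="gini", random_state=1)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    return y_test, pred


# With the dataset loaded above (the 'Message' column name is an assumption):
# y_test, pred = train_spam_tree(df['Message'], df['Category'])
```

The vectorizer is fitted on the training messages only, so no vocabulary from the test set leaks into training.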

Input:-

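import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# confusion matrix; y_test and pred must hold the true and predicted labels of the
# spam test set (the training step that produces them is not shown here).
# Rows are true labels, columns are predictions; labels are sorted ('ham' before 'spam').
cmat = confusion_matrix(y_test, pred)

# plot the confusion matrix as an annotated heatmap
sns.heatmap(cmat, annot=True, cmap='Paired',
            cbar=False, fmt="d", xticklabels=['Not Spam', 'Spam'],
            yticklabels=['Not Spam', 'Spam'])
plt.show()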

Output:-
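To read the heatmap, note the layout when labels are ordered `['ham', 'spam']` and spam is treated as the positive class. scikit-learn's confusion matrix is then `[[TN, FP], [FN, TP]]`, with true labels as rows and predictions as columns. The sketch below derives accuracy, precision and recall from such a matrix. The toy label lists are invented purely to check the arithmetic against scikit-learn and are not results from this practical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score


def metrics_from_confusion(cmat):
    """Return (accuracy, precision, recall) for the positive class from a
    2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    tn, fp, fn, tp = np.asarray(cmat).ravel()
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Invented toy labels, used only to check the arithmetic against scikit-learn.
y_true = ["ham", "ham", "ham", "spam", "spam", "ham", "spam", "ham"]
y_hat = ["ham", "spam", "ham", "spam", "ham", "ham", "spam", "ham"]

cm = confusion_matrix(y_true, y_hat, labels=["ham", "spam"])  # [[4, 1], [1, 2]]
acc, prec, rec = metrics_from_confusion(cm)
print("accuracy", acc, "precision", prec, "recall", rec)       # 0.75, 2/3, 2/3
print("sklearn precision", precision_score(y_true, y_hat, pos_label="spam"),
      "recall", recall_score(y_true, y_hat, pos_label="spam"))
```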

Conclusion:-

In this practical, we built a Decision Tree Classifier with Scikit-Learn on the Iris dataset,
evaluated it on a held-out test set, tuned its hyperparameters with GridSearchCV, and
visualized the resulting tree. We also plotted the class balance and a confusion matrix for a
spam email dataset. Understanding the strengths and limitations of Decision Tree Classifiers
helps us use them to build accurate and interpretable machine learning models.
