24CSR1R01 DSF Assignment 2

Uploaded by

Stutee Pradhan

National Institute of Technology Warangal

Department of Computer Science and Engineering

CS 16035: DATA SCIENCE FUNDAMENTALS
Class Assignment Name: Smaraki Bhaktisudha
PhD 1st Yr. 1st Sem. Roll No: 24CSR1R01

Assignment 2

• Download the Weka tool.

• Try different tasks with different datasets.
• Consider popular evaluation metrics for evaluating your experiments.
• Write your observations about each task (across different datasets) and about each dataset (across different tasks).

Dataset 1- Iris dataset

• The dataset has 150 rows, i.e. 150 iris flowers, and each flower is assigned to one of three classes (species).
• The three classes are ‘Iris-setosa’, ‘Iris-versicolor’, and ‘Iris-virginica’, each containing 50 instances.
• It is a balanced dataset.
• The dataset includes four numeric attributes (sepal length, sepal width, petal length, petal width) and one nominal class attribute.
• It is a labelled dataset, so it is suitable for supervised learning.

Visualising the dataset

• From the numeric attributes (especially ‘sepallength’, ‘sepalwidth’, ‘petalwidth’) it is quite difficult to distinguish between the flowers.
• Looking at ‘petallength’, however, we can note the following:
    • If the petal length of a flower is between 1 and 2.18, the flower is Iris-setosa.
    • If the petal length of a flower is between 2.18 and 3.36, the flower is Iris-versicolor (very few instances, though).
    • If the petal length of a flower is between 5.72 and 6.9, the flower is Iris-virginica (comparatively few instances).
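
The boundaries quoted above (2.18, 3.36, 5.72) are consistent with splitting the petal-length range 1 to 6.9 into equal-width intervals. The sketch below reproduces that arithmetic. The bin count of five is an assumption inferred from the numbers, not something the report states, and `equal_width_edges` is a hypothetical helper name.

```python
def equal_width_edges(lo, hi, bins):
    """Return the bins + 1 edges that split [lo, hi] into equal-width intervals."""
    width = (hi - lo) / bins
    return [round(lo + i * width, 2) for i in range(bins + 1)]

# Petal-length range taken from the observations (1 to 6.9);
# five bins is an assumption inferred from the quoted boundaries.
edges = equal_width_edges(1.0, 6.9, 5)
print(edges)
```

Under that assumption the first, second and last intervals match the three ranges listed in the observations.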

Classifying the dataset

1. Naive Bayes classifier

First I used the Naive Bayes classifier. It is usually treated as a baseline and is usually very fast, although other classifiers can do better in terms of accuracy. From the confusion matrix we can say the following.
Observation:-
• All 50 Iris-setosa are classified correctly, since the ‘b’ and ‘c’ entries in that row are 0.
• For Iris-versicolor there were two errors: two instances were classified as ‘c’ (Iris-virginica).
• For Iris-virginica, 46 instances were classified correctly and 4 were misclassified.
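
The counts above are enough to recompute accuracy and per-class recall by hand. The following is a minimal sketch, not Weka's own code. The observations do not say which class the 4 misclassified Iris-virginica were assigned to, so the matrix assumes they were predicted as Iris-versicolor; neither accuracy nor recall depends on that choice.

```python
# Rows = actual class, columns = predicted class,
# in the order Iris-setosa, Iris-versicolor, Iris-virginica.
# Assumption: the 4 misclassified Iris-virginica went to Iris-versicolor.
nb_matrix = [
    [50, 0, 0],
    [0, 48, 2],
    [0, 4, 46],
]

def accuracy(matrix):
    """Share of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def recall_per_class(matrix):
    """For each actual class, the share of its instances predicted correctly."""
    return [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]

nb_accuracy = accuracy(nb_matrix)
nb_recall = recall_per_class(nb_matrix)
print(f"accuracy = {nb_accuracy:.2%}")
print("recall per class =", nb_recall)
```

With these counts, 144 of the 150 instances lie on the diagonal.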

2. Decision tree (J48) classifier

Observation:-
Here I used the J48 tree. From the confusion matrix we can say the following.
• All 50 Iris-setosa are classified correctly, since the ‘b’ and ‘c’ entries in that row are 0.
• For Iris-versicolor there was one error: one instance was classified as ‘c’ (Iris-virginica).
• For Iris-virginica, 48 instances were classified correctly and 2 were misclassified.
This is the tree that the classifier uses to distinguish between the classes.
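
Besides accuracy, the kappa statistic (which Weka prints in its evaluation summary) measures agreement beyond chance. Below is a minimal sketch of Cohen's kappa computed from the J48 counts listed above. The 2 misclassified Iris-virginica are assumed to have been predicted as Iris-versicolor, which the observations do not state. Because every actual class has 50 instances, the result does not depend on that assumption.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows = actual, columns = predicted)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)

# Order: Iris-setosa, Iris-versicolor, Iris-virginica.
# Assumption: the 2 misclassified Iris-virginica went to Iris-versicolor.
j48_matrix = [
    [50, 0, 0],
    [0, 49, 1],
    [0, 2, 48],
]

j48_accuracy = sum(j48_matrix[i][i] for i in range(3)) / 150
j48_kappa = cohen_kappa(j48_matrix)
print(f"accuracy = {j48_accuracy:.2%}, kappa = {j48_kappa:.2f}")
```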

3. Multilayer Perceptron

Clustering
Observation:
• Four clusters were identified, despite the dataset originally having three classes. This suggests that one or more species of iris exhibit internal variation that the EM algorithm picked up on.
• Cluster 1 corresponds to Iris-setosa, which is easily separable due to its distinct features.
• Cluster 0 primarily corresponds to Iris-versicolor, while Clusters 2 and 3 mostly represent Iris-virginica.
• The presence of four clusters instead of three suggests some variability or overlap between Iris-versicolor and Iris-virginica, which might be better understood with further analysis.
• The EM algorithm's clustering model captures the natural variation within the Iris dataset, providing insights into the subtle differences between the species.
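
Statements such as "Cluster 0 primarily corresponds to Iris-versicolor" come from cross-tabulating cluster assignments against the known class labels. The sketch below shows one way to derive the majority class of each cluster. The function name is hypothetical, and the input lists are made-up placeholders for illustration only; they are not the output of the experiment.

```python
from collections import Counter, defaultdict

def majority_class_per_cluster(cluster_ids, class_labels):
    """Map each cluster id to the class label that occurs most often in it."""
    counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        counts[cluster][label] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in sorted(counts.items())}

# Made-up placeholder data (NOT the experiment's results): ten flowers,
# their cluster ids and their true species.
toy_clusters = [1, 1, 0, 0, 0, 2, 2, 3, 3, 0]
toy_labels = [
    "Iris-setosa", "Iris-setosa",
    "Iris-versicolor", "Iris-versicolor", "Iris-versicolor",
    "Iris-virginica", "Iris-virginica", "Iris-virginica", "Iris-virginica",
    "Iris-virginica",
]

mapping = majority_class_per_cluster(toy_clusters, toy_labels)
print(mapping)
```

In the toy input, one Iris-virginica lands in cluster 0 alongside three Iris-versicolor, which is the kind of overlap between the two species that the observations describe.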

Ranking the Attributes

Observation:
• According to the Ranker, the attribute “sepalwidth” has the least impact on the classification result.
• The screenshots show that sepalwidth has the least effect, so removing it before classifying should not change the result much.

Dataset 2- Diabetes dataset

• The dataset has 768 instances, i.e. 768 patient records, divided into two classes according to the diabetes test result.
• There are 8 numerical input features (attributes) and 1 binary output variable (class - tested_negative / tested_positive).
• It is an unbalanced dataset.
• It is a labelled dataset, so it is suitable for supervised learning.

Visualising the dataset

Classifying the dataset

1. Naive Bayes classifier

2. Decision tree (J48) classifier

This is the tree that the classifier uses to distinguish between the classes.

3. Multilayer Perceptron

Clustering

Observation:
• Three clusters were identified, each representing a different subgroup in the dataset.
• Cluster 0 represents individuals with higher glucose and insulin levels, possibly indicating higher diabetes risk.
• Cluster 1 represents healthier, younger individuals with lower measurements across most features, corresponding mainly to non-diabetic cases.
• Cluster 2 includes older individuals with more pregnancies, moderate glucose levels and possibly missing insulin data, indicating another at-risk group.
• The EM algorithm's clustering provides insights into the underlying structure of the data, potentially guiding further analysis or targeted interventions.

Difference between the observations of using different classifiers on different datasets:
• The Naive Bayes classifier worked quite well on the dataset with fewer instances, i.e. the Iris dataset. On the second dataset, which has more instances, it did not perform as well.
• For the Iris dataset the Multilayer Perceptron performed best at classification, whereas for the Diabetes dataset the Decision tree (J48) classifier performed best.
• For the Iris dataset, the three classes (‘Iris-setosa’, ‘Iris-versicolor’, and ‘Iris-virginica’) contain 50 instances each.
• For the Diabetes dataset, the two classes ‘tested_negative’ and ‘tested_positive’ have 500 and 268 instances respectively.
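
The class counts above also give a useful reference point for the Diabetes results: a classifier that always predicts the majority class (what Weka's ZeroR does) already reaches a certain accuracy on an unbalanced dataset. A minimal sketch of that arithmetic, using only the counts stated above:

```python
# Class counts for the Diabetes dataset, as stated in the observations.
class_counts = {"tested_negative": 500, "tested_positive": 268}

total = sum(class_counts.values())
shares = {label: count / total for label, count in class_counts.items()}

# Always predicting the most frequent class gives this accuracy.
majority_label = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_label] / total

print(f"total instances = {total}")
print(f"majority class = {majority_label}, baseline accuracy = {baseline_accuracy:.1%}")
```

Any classifier accuracy on this dataset is best read against that baseline. For the balanced Iris dataset the same calculation gives only one third.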
