WEKA Manual
This document is a short manual for installing and using the basic functionality of the
machine learning tool WEKA. It covers only a small part of what WEKA can do. More
information can be found on the internet.
Mac users
For people using a Mac laptop: you also have to copy the folder weka-X-X-XX to the
Applications folder.
You can run WEKA by (double) clicking the WEKA icon.
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
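As a rough illustration of how an ARFF file is organized (a header of @RELATION and @ATTRIBUTE lines, followed by comma-separated rows after @DATA), the following Python sketch reads such a file using only the standard library. It is not part of Weka. The function name parse_arff is made up for this example, and the sketch assumes there are no quoted values, no missing values (written as ? in ARFF) and no sparse rows.

```python
def parse_arff(text):
    """Parse a small, simple ARFF document into (relation, attributes, rows)."""
    relation = None
    attributes = []  # (name, kind) pairs in declaration order
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and % comments
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                # numeric attributes become floats, nominal ones stay strings
                row[name] = float(value) if kind == "numeric" else value
            rows.append(row)
            continue
        keyword = line.split(None, 1)[0].lower()  # ARFF keywords are case-insensitive
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # nominal attribute: kind is the list of allowed labels
                kind = [s.strip() for s in spec.strip("{}").split(",")]
            elif spec.lower() in ("real", "numeric", "integer"):
                kind = "numeric"
            else:
                kind = spec.lower()  # e.g. string or date, kept as text here
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


# The iris header and the first two data rows from the listing above.
SAMPLE = """\
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), len(rows))
print(rows[0])
```

The last attribute, class, is nominal: its declaration lists every label that may appear in the data rows. Weka treats such an attribute as the class to predict when it is selected as the class attribute.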
Explorer
One of the basic applications to explore the data and classifiers is the Explorer. Click on
the Explorer button in the Weka GUI Chooser window, see Figure 1. Then the
following screen will appear:
Loading data
Click on the Open file button and select the arff data file you want to analyze. For
this manual we select the iris.arff file which can be found in the Weka data folder. On
my Windows machine this is the folder C:\Program Files (x86)\Weka-3-6\data.
After opening the file the following screen appears, see Figure 4.
A single attribute usually does not have much discriminating power on its own. To explore
the discriminating power of two attributes, click on the Visualize tab of the Explorer
screen, see Figure 4. The following window should appear.
Exploring classifiers
One of the basic classifiers we consider in the course is the Decision Tree. Click on the
Classify tab in the Explorer window. The following window should appear:
Exercise 3
Is the confusion in line with what you observed in the data visualization concerning the
separation of the classes? Explain why!
Exercise 4
Apply a Naive Bayes classifier (in the subfolder bayes) to the above data and give the
performance and the confusion matrix. Which classifier is better, the decision tree J48 or
the Naive Bayes classifier? Explain why! Is there a difference between the two
classifiers? (Hint: look at the confusion matrix).
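To make the terms used in the exercises concrete, the following Python sketch shows how a confusion matrix and the accuracy can be computed from a list of actual and predicted class labels. The labels below are made up for illustration and are not Weka output. The function names are hypothetical, and the layout (rows for the actual class, columns for the predicted class) is chosen to match the "classified as" layout that Weka prints.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a matrix where entry [i][j] counts instances of class
    labels[i] that were predicted as class labels[j]."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


LABELS = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Made-up labels for illustration only; these are not results from Weka.
actual = [
    "Iris-setosa", "Iris-setosa",
    "Iris-versicolor", "Iris-versicolor",
    "Iris-virginica", "Iris-virginica",
]
predicted = [
    "Iris-setosa", "Iris-setosa",
    "Iris-versicolor", "Iris-virginica",
    "Iris-virginica", "Iris-versicolor",
]

matrix = confusion_matrix(actual, predicted, LABELS)
for label, row in zip(LABELS, matrix):
    print(row, "<- actual", label)
print("accuracy:", accuracy(matrix))
```

Two classifiers can have the same accuracy and still differ in which classes they confuse, which is why the off-diagonal entries are worth comparing as well.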
One problem with Weka Explorer is that you have to remember and repeat all the steps
by selecting tabs and clicking buttons. Another option is to use the Knowledge flow
functionality of Weka.
Attribute selection
In some cases, for instance if one wants to classify emails or texts, one has a lot of
features. In that case a classifier can benefit from attribute selection. This can be done
by manually removing attributes in the Preprocess tab; select the attributes you want to
remove and click the remove button, see Figure 4. Another option is to apply an
automatic feature selection method which can be done in the Select attributes tab.
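The two options above can be sketched in a few lines of Python. The sketch is only an illustration under stated assumptions: the toy email data and all function names are hypothetical, and information gain is used as one common way to score nominal attributes, not necessarily the method you will select in the Select attributes tab.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in class entropy obtained by splitting on one nominal attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def rank_attributes(rows, target):
    """Automatic selection: order attributes from most to least informative."""
    names = [n for n in rows[0] if n != target]
    return sorted(names, key=lambda n: information_gain(rows, n, target), reverse=True)


def remove_attributes(rows, to_remove):
    """Manual selection: drop the named attributes from every row."""
    return [{k: v for k, v in r.items() if k not in to_remove} for r in rows]


# Hypothetical toy data, invented for this example.
EMAILS = [
    {"has_link": "yes", "has_greeting": "yes", "class": "spam"},
    {"has_link": "yes", "has_greeting": "no", "class": "spam"},
    {"has_link": "no", "has_greeting": "yes", "class": "ham"},
    {"has_link": "no", "has_greeting": "no", "class": "ham"},
]

ranking = rank_attributes(EMAILS, "class")
print(ranking)
print(remove_attributes(EMAILS, {ranking[-1]})[0])
```

In this toy data has_link separates the two classes perfectly while has_greeting carries no information about the class, so the ranking puts has_link first and the least informative attribute is the one removed.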
Knowledge flow
To start the Knowledge flow, click the KnowledgeFlow button in the Weka GUI
Chooser. The following window should appear:
Figure 13: Extended pipeline, which can be used to compare a Naive Bayes classifier
with a decision tree.