26 Weka
These tutorial exercises introduce WEKA and ask you to try out several machine
learning, visualization, and preprocessing methods using a wide variety of datasets:
• Learners: decision tree learner (J48), instance-based learner (IBk), Naïve
Bayes (NB), Naïve Bayes Multinomial (NBM), support vector machine (SMO),
association rule learner (Apriori)
• Meta-learners: filtered classifier, attribute selected classifiers (CfsSubsetEval
and WrapperSubsetEval)
• Visualization: visualize datasets, decision trees, decision boundaries,
classification errors
• Preprocessing: remove attributes and instances, use supervised and
unsupervised discretization, select features, convert strings to word vectors
• Testing: on training set, on supplied test set, using cross-validation, using TP
and FP rates, ROC area, confidence and support of association rules
• Datasets: weather.nominal, iris, glass (with variants), vehicle (with variants),
kr-vs-kp, waveform-5000, generated, sick, vote, mushroom, letter, ReutersCorn-
Train and ReutersGrain-Train, supermarket
Tutorial 4: Preprocessing
Introduce the datasets sick, vote, mushroom and letter.
Apply discretization:
• explain what discretization is
• load the sick dataset and look at the attributes
• classify using NB, evaluating with cross-validation
• apply the supervised discretization filter and look at the effect (in the Preprocess
panel)
• apply unsupervised discretization with different numbers of bins and look at the
effect
• use the FilteredClassifier with NB and supervised discretization, evaluating with
cross-validation
• repeat using unsupervised discretization with different numbers of bins
• compare and interpret the results.
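To make the idea of unsupervised discretization concrete, here is a minimal sketch of equal-width binning in plain Python. It is not the WEKA API. The function name equal_width_bins and the sample values are assumptions made up for illustration. WEKA's unsupervised Discretize filter offers equal-width binning as one of its modes, whereas the supervised filter chooses cut points using the class labels, which this sketch does not attempt.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no spread, so everything falls in one bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Made-up numeric attribute values, for illustration only.
ages = [1, 5, 12, 18, 25, 33, 41, 58, 64, 90]

for n in (2, 5):
    print(n, equal_width_bins(ages, n))
```

Running it with different values of n_bins shows how a coarse binning merges values that a finer one keeps apart, which is the effect the exercise asks you to observe in the Preprocess panel.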
Apply feature selection using CfsSubsetEval:
• explain what feature selection is
• load the mushroom dataset and apply J48, IBk and NB, evaluating with cross-
validation
• select attributes using CfsSubsetEval and GreedyStepwise search
• interpret the results
• use AttributeSelectedClassifier (with CfsSubsetEval and GreedyStepwise
search) for classifiers J48, IBk and NB, evaluating with cross-validation
• interpret the results.
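The combination of a correlation-based subset score with a greedy forward search can be sketched as follows. This is a simplified illustration in plain Python, not WEKA's implementation. The merit function follows the usual CFS heuristic (reward correlation with the class, penalise correlation between attributes). The attribute names a, b, c and all correlation values are made up, and WEKA estimates the correlations from the data rather than taking them as input.

```python
from math import sqrt


def cfs_merit(subset, class_corr, pair_corr):
    """CFS-style merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean attribute-class correlation.
    r_cf = sum(class_corr[a] for a in subset) / k
    # Mean attribute-attribute correlation over all unordered pairs.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(pair_corr[frozenset(p)] for p in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_forward(attributes, merit):
    """Add the best remaining attribute until no addition improves the merit."""
    selected, best = [], 0.0
    while True:
        scored = [(merit(selected + [a]), a) for a in attributes if a not in selected]
        if not scored:
            break
        score, attr = max(scored)
        if score <= best:
            break
        selected.append(attr)
        best = score
    return selected, best


# Hypothetical correlations: a and b are both predictive but highly redundant.
class_corr = {"a": 0.8, "b": 0.7, "c": 0.5}
pair_corr = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.0,
    frozenset(("b", "c")): 0.0,
}

selected, best = greedy_forward(
    ["a", "b", "c"], lambda s: cfs_merit(s, class_corr, pair_corr)
)
print(selected, round(best, 3))
```

In this toy setting the search keeps a and c and leaves out b: b correlates well with the class, but its overlap with a lowers the merit of any subset containing both.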
Apply feature selection using WrapperSubsetEval:
• load the vote dataset and apply J48, IBk and NB, evaluating with cross-
validation
• select attributes using WrapperSubsetEval (wrapping the J48 classifier) with
RankSearch, which ranks the attributes using InfoGainAttributeEval
• interpret the results
• use AttributeSelectedClassifier (with WrapperSubsetEval and RankSearch,
ranking by InfoGainAttributeEval) with classifiers J48, IBk and NB,
evaluating with cross-validation
• interpret the results.
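The wrapper approach with a rank-based search can be sketched as: rank the attributes by information gain, then score the top-1, top-2, ... subsets with a classifier and keep the best. The sketch below is plain Python, not WEKA code. It uses the small weather.nominal data instead of vote to stay short, and a leave-one-out lookup-table classifier (loo_accuracy, a hypothetical stand-in) where WEKA would use cross-validated J48.

```python
from collections import Counter
from math import log2

ATTRS = ["outlook", "temperature", "humidity", "windy"]
# weather.nominal: four nominal attributes, last column is the class (play).
ROWS = [r.split(",") for r in """
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".split()]


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, col):
    """Class entropy minus the expected entropy after splitting on column col."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - remainder


def loo_accuracy(rows, cols):
    """Leave-one-out accuracy of a majority-vote lookup table (stand-in for J48)."""
    hits = 0
    for i, test in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        key = [test[c] for c in cols]
        votes = Counter(r[-1] for r in train if [r[c] for c in cols] == key)
        if not votes:  # unseen combination: fall back to the overall majority
            votes = Counter(r[-1] for r in train)
        hits += votes.most_common(1)[0][0] == test[-1]
    return hits / len(rows)


def rank_search(rows, n_attrs):
    """Rank attributes by information gain, then pick the best-scoring prefix."""
    ranked = sorted(range(n_attrs), key=lambda c: info_gain(rows, c), reverse=True)
    prefixes = [ranked[:k] for k in range(1, n_attrs + 1)]
    best = max(prefixes, key=lambda cols: loo_accuracy(rows, cols))
    return ranked, best


ranked, best = rank_search(ROWS, len(ATTRS))
print("ranking:", [ATTRS[c] for c in ranked])
print("chosen subset:", [ATTRS[c] for c in best])
```

Unlike the correlation-based score, the wrapper score depends on the classifier used to evaluate each subset, so the chosen attributes can differ between J48, IBk and NB.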
Sampling a dataset:
• load the letter dataset and examine a particular (numeric) attribute
• apply the Resample filter to select half the dataset
• examine the same attribute and comment on the results.
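The effect of taking a 50% random sample can be previewed with a short plain-Python sketch. It is not the WEKA Resample filter. The function resample, its parameters, and the synthetic attribute values are assumptions for illustration; the real letter data should be inspected in WEKA itself.

```python
import random
from statistics import mean, stdev


def resample(values, percent, seed=1, replacement=True):
    """Draw a random subsample whose size is the given percentage of the input."""
    rng = random.Random(seed)
    size = round(len(values) * percent / 100)
    if replacement:
        # Sampling with replacement: the same instance may be drawn more than once.
        return [rng.choice(values) for _ in range(size)]
    return rng.sample(values, size)


# Synthetic stand-in for one numeric attribute (made-up values, not the letter data).
source_rng = random.Random(0)
attribute = [source_rng.randint(0, 15) for _ in range(20000)]

half = resample(attribute, 50)
print("full:", len(attribute), round(mean(attribute), 2), round(stdev(attribute), 2))
print("half:", len(half), round(mean(half), 2), round(stdev(half), 2))
```

With a sample this large, the mean and standard deviation of the attribute should stay close to those of the full dataset even though the number of instances is halved, which is the kind of observation the exercise asks you to comment on.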