WEKA - A. Antony Alex, MCA, Dr G R D College of Science, CBE, Tamil Nadu, India

WEKA is a collection of machine learning algorithms and data preprocessing tools developed at the University of Waikato. It contains tools for data preprocessing, classification, regression, clustering, and association rule mining. It has a graphical user interface called the Explorer that allows users to load data, apply preprocessing techniques, evaluate machine learning models, perform attribute selection, and visualize data. WEKA is open source, written in Java, and runs on multiple platforms including Windows, Mac and Linux.

Copyright
© Attribution Non-Commercial (BY-NC)

WEKA

A. Antony Alex, MCA, Dr G R D College of Science, CBE, Tamil Nadu, India

Waikato Environment for Knowledge Analysis


A collection of open source ML algorithms:
  pre-processing, classifiers, clustering, association rules
It is a data mining/machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand.
Weka is also a bird found only on the islands of New Zealand.

Java based
Routines are implemented as classes and logically arranged in packages
Comes with an extensive GUI

Download and Install WEKA


Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html
Supports multiple platforms (written in Java):
  Windows, Mac OS X and Linux

9/10/2012

Main Features
49 data preprocessing tools
76 classification/regression algorithms
8 clustering algorithms
3 algorithms for finding association rules
15 attribute/subset evaluators + 10 search algorithms for feature selection


Command line interface


Dataset, Classifier, weka.filters, weka.classifiers

java weka.core.converters.CSVLoader data.csv > data.arff
java weka.core.converters.C45Loader c45_filestem > data.arff
java weka.classifiers.rules.ZeroR -t weather.arff
java weka.classifiers.trees.J48 -t weather.arff
java weka.filters.supervised.attribute.Discretize -i data/iris.arff \
  -o iris-nom.arff -c last
java weka.filters.supervised.attribute.Discretize -i data/cpu.arff \
  -o cpu-classvendor-nom.arff -c first

Main GUI
Three graphical user interfaces
  The Explorer (exploratory data analysis)
  The Experimenter (experimental environment)
  The KnowledgeFlow (new process model inspired interface)


Explorer: pre-processing the data

Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL database (using JDBC)
Pre-processing tools in WEKA are called filters
WEKA contains filters for:
  Discretization, normalization, resampling, attribute selection, transforming and combining attributes, ...
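
To make two of the listed filter types concrete, here is a minimal sketch in plain Python (not WEKA's API). It assumes min-max normalization and equal-width binning; the function names and the sample values are illustrative only. The supervised Discretize filter shown in the command line examples is class-aware and works differently.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, bins):
    """Map each numeric value to a bin index 0..bins-1 of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Hypothetical numeric attribute values, for illustration only.
cholesterol = [233, 286, 229, 250]
print(normalize(cholesterol))
print(discretize_equal_width(cholesterol, 3))
```

Both functions take the range from the data itself, so a filter fitted on training data would need to store `lo` and `hi` to treat test data consistently.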

ACCESSING DATABASE
jdbcDriver
jdbcURL

DatabaseUtils.props.hsql - HSQLDB
DatabaseUtils.props.msaccess - MS Access
DatabaseUtils.props.mssqlserver - MS SQL Server
DatabaseUtils.props.mysql - MySQL
DatabaseUtils.props.odbc - ODBC access via ODBC/JDBC bridge
DatabaseUtils.props.oracle - Oracle 10g
DatabaseUtils.props.postgresql - PostgreSQL 7.4
DatabaseUtils.props.sqlite3 - sqlite 3.x

WEKA flat files


@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
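
The structure of the file above (a header of @relation and @attribute lines, then @data rows with ? for a missing value) can be read with a few lines of code. The following is a rough sketch in plain Python, not WEKA's own loader; `parse_arff` is a hypothetical helper that handles only the subset of ARFF used in this example (numeric and nominal attributes, no quoting, no sparse rows).

```python
def parse_arff(text):
    """Parse a simple ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):       # skip blank lines and comments
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)               # missing value
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)              # nominal label
            rows.append(row)
        elif line.lower().startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif line.lower().startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: keep the list of allowed labels.
                kind = [s.strip() for s in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif line.lower().startswith("@data"):
            in_data = True
    return relation, attributes, rows


sample = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(sample)
print(relation, len(attributes), len(rows))
```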


Explorer: building classifiers


Classifiers in WEKA are models for predicting nominal or numeric quantities
Implemented learning schemes include:
  Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, ...

Meta-classifiers include:
  Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, ...
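
As a baseline for any of these schemes, the ZeroR classifier from the command line examples ignores all attributes and always predicts the majority class (for a nominal target) or the mean (for a numeric target). The sketch below shows that idea in plain Python; the function names and the sample labels are hypothetical, and this is not WEKA code.

```python
from collections import Counter
from statistics import mean


def zero_r(targets):
    """Return the constant prediction: mean for numeric targets, mode otherwise."""
    if all(isinstance(t, (int, float)) for t in targets):
        return mean(targets)
    return Counter(targets).most_common(1)[0][0]


def accuracy(prediction, actual):
    """Fraction of instances for which the constant prediction is correct."""
    return sum(1 for a in actual if a == prediction) / len(actual)


# Hypothetical class labels, for illustration only.
train = ["yes", "yes", "no", "yes", "no"]
test = ["yes", "no", "yes"]

model = zero_r(train)
print(model, accuracy(model, test))
```

A learning scheme that cannot beat this baseline on held-out data has not learned anything useful from the attributes.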

Explorer: clustering data


WEKA contains clusterers for finding groups of similar instances in a dataset
Implemented schemes are:
  k-Means, EM, Cobweb, X-means, FarthestFirst

Clusters can be visualized and compared to true clusters
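
For intuition about the first scheme in the list, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not WEKA's implementation: the initial centres are passed in explicitly to keep the run deterministic, and the points are made up for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of floats.

    `centroids` is a list with the initial cluster centres. Returns the
    final centres and the cluster index assigned to each point.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        assignment = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centre
        if new_centroids == centroids:              # converged
            break
        centroids = new_centroids
    return centroids, assignment


# Hypothetical 2-d points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centres, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centres, labels)
```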


Explorer: finding associations


WEKA contains an implementation of the Apriori algorithm for learning association rules
Works only with discrete data

Can identify statistical dependencies between groups of attributes:


milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)

Apriori can compute all rules that have a given minimum support and exceed a given confidence
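
The two numbers attached to a rule can be computed directly from a list of transactions. The sketch below, in plain Python, shows only these two measures, not Apriori's candidate generation; support is counted as a number of transactions, matching the example above. The baskets are hypothetical.

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical market baskets, for illustration only.
baskets = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]

print(support(baskets, {"milk", "butter", "bread", "eggs"}))
print(confidence(baskets, {"milk", "butter"}, {"bread", "eggs"}))
```

A rule is kept only if both values pass the thresholds chosen by the user.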

Explorer: attribute selection


Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
Attribute selection methods contain two parts:
  A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  An evaluation method: correlation-based, wrapper, information gain, chi-squared, ...

Very flexible: WEKA allows (almost) arbitrary combinations of these two
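
One such combination is a ranking search with information gain as the evaluation method. Here is a minimal sketch of that pairing in plain Python (not WEKA's API), restricted to nominal attributes; the function names are hypothetical, and the values are taken from the four data rows of the ARFF example earlier.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy after splitting on a nominal attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(columns, labels):
    """Evaluate each attribute on its own and sort by gain, best first."""
    return sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )


columns = {
    "sex": ["male", "male", "male", "female"],
    "exercise_induced_angina": ["no", "yes", "yes", "no"],
}
classes = ["not_present", "present", "present", "not_present"]

print(rank_attributes(columns, classes))
```

Ranking scores attributes one at a time; subset searches such as best-first instead evaluate groups of attributes together.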



Explorer: data visualization


Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  To do: rotating 3-d visualizations (Xgobi-style)

Color-coded class values
Jitter option to deal with nominal attributes (and to detect hidden data points)
Zoom-in function

Thank you
