Introduction To WEKA: 2. Weka: Download and Installation

Introduction to WEKA
Weka Main Features

A collection of open-source data mining and machine learning algorithms,
including
data pre-processing
classification
clustering
association rule extraction
Created by researchers at the University of Waikato in New Zealand
Java based (also open source).
49 data preprocessing tools

76 classification/regression algorithms

8 clustering algorithms

15 attribute/subset evaluators + 10 search algorithms for feature selection.

3 algorithms for finding association rules

3 graphical user interfaces

The Explorer (exploratory data analysis)

The Experimenter (experimental environment)

The KnowledgeFlow (new process model inspired interface)

2. Weka: Download and Installation

Download Weka (the stable version) from

http://www.cs.waikato.ac.nz/ml/weka/

Choose a self-extracting executable (including Java VM).

After the download is completed, run the self-extracting file to install Weka, and use the default set-ups.

Start Weka

From the Windows desktop, click Start, choose All Programs, then choose Weka 3.6 to start Weka.

The first interface to appear is the Weka GUI Chooser.

WEKA Application Interfaces

Weka Functions and Tools

Pre-processing Filters

Attribute selection

Classification/Regression

Clustering

Association discovery

Visualization

Data can be imported from a file in various formats:

ARFF (Attribute Relation File Format) has two sections:

The header section defines the relation name and the name and type of each attribute.
The data section lists the data records.

CSV: Comma Separated Values (text file)

C4.5: a format used by the C4.5 decision tree induction algorithm; it requires two separate files
Names file: defines the names of the attributes
Data file: lists the records (samples)

Binary

Data can also be read from a URL or from an SQL database (using JDBC)
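
To make the difference between CSV and ARFF concrete, here is a minimal sketch in plain Python (standard library only; it is not Weka's own converter) that wraps CSV text in an ARFF header using the @RELATION, @ATTRIBUTE and @DATA declarations described in the next section. The function name csv_to_arff, the relation name iris_sample, the two sample rows, and the type-guessing rule (a column is NUMERIC if every non-missing value parses as a number, otherwise nominal) are all assumptions made for illustration.

```python
import csv
import io


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text (first row = column names) into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    lines = ["@RELATION " + relation, ""]
    for col, name in enumerate(header):
        # Collect the observed values, ignoring the missing-value marker "?".
        values = [r[col] for r in records if r[col] != "?"]
        if values and all(is_number(v) for v in values):
            datatype = "NUMERIC"
        else:
            # Nominal attribute: list the distinct values in sorted order.
            datatype = "{" + ",".join(sorted(set(values))) + "}"
        lines.append("@ATTRIBUTE " + name + " " + datatype)
    lines += ["", "@DATA"]
    lines += [",".join(r) for r in records]
    return "\n".join(lines)


# Hypothetical sample rows, used only to exercise the sketch.
sample = "sepallength,petalwidth,class\n5.1,0.2,Setosa\n7.0,?,Versicolor\n"
arff_text = csv_to_arff(sample, "iris_sample")
print(arff_text)
```

A nominal attribute built this way lists only the values seen in the file, so a class that never occurs in the CSV would be missing from the declaration; a hand-written ARFF header avoids that.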

ARFF File Format

Requires declarations of @RELATION, @ATTRIBUTE and @DATA

@RELATION declaration associates a name with the dataset

@RELATION <relation-name>

@ATTRIBUTE declaration specifies the name and type of an attribute

@ATTRIBUTE <attribute-name> <datatype>

Datatype can be numeric, nominal, string or date

@ATTRIBUTE sepallength NUMERIC

@ATTRIBUTE petalwidth NUMERIC

@ATTRIBUTE class {Setosa,Versicolor,Virginica}

@DATA declaration is a single line denoting the start of the data segment

Missing values are represented by ?

%
% data related to student
%
@relation 'XYZ'
@attribute name numeric
@attribute qualification numeric
@attribute designation numeric
@attribute addr numeric

@data
1,2,3,4
4,5,?,7
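
The layout above can be read back with a few lines of code. The following is a rough sketch in plain Python, not Weka's ARFF loader: the function name parse_arff is hypothetical, and the sketch assumes simple files only (no quoted attribute names containing spaces, no quoted data values containing commas, no sparse data). It skips % comment lines, collects the declarations, and turns each ? into None.

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows)."""
    relation = None
    attributes = []  # list of (name, datatype) pairs
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and % comments
        if in_data:
            # "?" marks a missing value; store it as None.
            rows.append([None if v.strip() == "?" else v.strip()
                         for v in line.split(",")])
            continue
        # Declarations are case-insensitive, so compare in lower case.
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1].strip("'\"")
        elif keyword == "@attribute":
            _, name, datatype = line.split(None, 2)
            attributes.append((name, datatype))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


example = """%
% data related to student
%
@relation 'XYZ'
@attribute name numeric
@attribute qualification numeric
@attribute designation numeric
@attribute addr numeric

@data
1,2,3,4
4,5,?,7
"""

relation, attributes, rows = parse_arff(example)
print(relation)
print(attributes)
print(rows)
```

Values are kept as strings here; converting them according to the declared datatype is left out to keep the sketch short.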

Predicting whether a loan will be repaid (credit scoring) is an important task for any bank. High
accuracy benefits both the banks and the loan applicants. There are 20 attributes used in judging a
loan applicant. The goal is to classify the applicant into one of two categories, good or bad.
The German credit data set has 20 attributes (7 numerical, 13 categorical).

Attribute description for the German credit data set

Attribute 1: (qualitative)
Status of existing checking account
A11 : ... < 0 DM
A12 : 0 <= ... < 200 DM
A13 : ... >= 200 DM / salary assignments for at least 1 year
A14 : no checking account

Attribute 2: (numerical)
Duration in months

Attribute 3: (qualitative)
Credit history
A30 : no credits taken / all credits paid back duly
A31 : all credits at this bank paid back duly
A32 : existing credits paid back duly till now
A33 : delay in paying off in the past
A34 : critical account / other credits existing (not at this bank)

Attribute 4: (qualitative)
Purpose
A40 : car (new)
A41 : car (used)
A42 : furniture/equipment
A43 : radio/television
A44 : domestic appliances
A45 : repairs
A46 : education
A47 : (vacation - does not exist?)
A48 : retraining
A49 : business
A410 : others

Attribute 5: (numerical)
Credit amount

Attribute 6: (qualitative)
Savings account/bonds
A61 : ... < 100 DM
A62 : 100 <= ... < 500 DM
A63 : 500 <= ... < 1000 DM
A64 : ... >= 1000 DM
A65 : unknown / no savings account

Attribute 7: (qualitative)
Present employment since
A71 : unemployed
A72 : ... < 1 year
A73 : 1 <= ... < 4 years
A74 : 4 <= ... < 7 years
A75 : ... >= 7 years

Attribute 8: (numerical)
Installment rate in percentage of disposable income

Attribute 9: (qualitative)
Personal status and sex
A91 : male : divorced/separated
A92 : female : divorced/separated/married
A93 : male : single
A94 : male : married/widowed
A95 : female : single

Attribute 10: (qualitative)
Other debtors / guarantors
A101 : none
A102 : co-applicant
A103 : guarantor

Attribute 11: (numerical)
Present residence since

Attribute 12: (qualitative)
Property
A121 : real estate
A122 : if not A121 : building society savings agreement / life insurance
A123 : if not A121/A122 : car or other, not in attribute 6
A124 : unknown / no property

Attribute 13: (numerical)
Age in years

Attribute 14: (qualitative)
Other installment plans
A141 : bank
A142 : stores
A143 : none

Attribute 15: (qualitative)
Housing
A151 : rent
A152 : own
A153 : for free

Attribute 16: (numerical)
Number of existing credits at this bank

Attribute 17: (qualitative)
Job
A171 : unemployed / unskilled - non-resident
A172 : unskilled - resident
A173 : skilled employee / official
A174 : management / self-employed / highly qualified employee / officer

Attribute 18: (numerical)
Number of people being liable to provide maintenance for

Attribute 19: (qualitative)
Telephone
A191 : none
A192 : yes, registered under the customer's name

Attribute 20: (qualitative)
Foreign worker
A201 : yes
A202 : no
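
The coded values in the attribute description map naturally onto ARFF nominal attributes. As an illustrative sketch in plain Python (not part of Weka), the code tables for three of the attributes are copied into a dictionary, turned into @ATTRIBUTE declarations, and used to decode a single code. The attribute names checking_status, housing and foreign_worker and the helper names nominal_declaration and decode are hypothetical labels chosen for this example.

```python
# Code tables copied from the attribute description; the dictionary keys
# are hypothetical attribute names chosen for this sketch.
CODES = {
    "checking_status": {
        "A11": "... < 0 DM",
        "A12": "0 <= ... < 200 DM",
        "A13": "... >= 200 DM / salary assignments for at least 1 year",
        "A14": "no checking account",
    },
    "housing": {"A151": "rent", "A152": "own", "A153": "for free"},
    "foreign_worker": {"A201": "yes", "A202": "no"},
}


def nominal_declaration(name, table):
    """Build an ARFF nominal @ATTRIBUTE line from a code table."""
    return "@ATTRIBUTE " + name + " {" + ",".join(table) + "}"


def decode(name, code):
    """Translate a code such as 'A152' into its description."""
    if code == "?":
        return None  # ARFF missing value
    return CODES[name][code]


declarations = [nominal_declaration(n, t) for n, t in CODES.items()]
for line in declarations:
    print(line)
print(decode("housing", "A152"))
```

Keeping the codes in the data file and the descriptions in a separate table keeps the records compact while still allowing readable output when results are reported.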
