
Perceptron Learning & Decision Tree Learning

Artificial Intelligence
CSCI 6751.81, FALL 2001

Instructor Professor Francis Sand

By Kanoksri Sarinnapakorn
(Due date: December 15, 2001)

A.I. Project Fall 2001


1) Write a program to implement the perceptron learning rule.
2) Write a program to implement the decision tree learning rule.
3) Solve Exercise 19.4. The numbers were generated from a kind of majority or voting function in which each input had a different number of votes: 10 for I1, 4 for I2-I4, 2 for I5, and 1 for I6.
4) Generate more examples from this function and test the performance of your trained neural net on these examples (a sketch of such a generator appears below).
5) Draw learning curves for the perceptron and the decision tree learner based on further made-up examples generated with the same function.
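To make task 4 concrete, the following is a minimal sketch (not one of the programs submitted with this project) of how further examples can be generated from the voting function. The exact decision rule is an assumption: the output is taken to be 1 when the weighted votes 10*I1 + 4*I2 + 4*I3 + 4*I4 + 2*I5 + 1*I6 exceed half of the 25 total votes; this rule reproduces all 14 examples given in Exercise 19.4.

// GenData.cpp: generate examples from the weighted voting function
// (illustrative sketch only; the vote threshold of 12.5 is an assumption).
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
   const int NUM_INPUTS = 6;
   const int votes[NUM_INPUTS] = {10, 4, 4, 4, 2, 1};   // votes for I1...I6
   const double HALF_TOTAL = 12.5;                      // half of the 25 total votes

   srand((unsigned) time(NULL));
   printf("EX T I1 I2 I3 I4 I5 I6\n");                  // title line, as in train14.dat
   for (int ex = 1; ex <= 1000; ex++)                   // e.g. 1000 test examples
   {
      int input[NUM_INPUTS], sum = 0;
      for (int j = 0; j < NUM_INPUTS; j++)
      {
         input[j] = rand() % 2;                         // random 0/1 attribute value
         sum += votes[j] * input[j];
      }
      int target = (sum > HALF_TOTAL) ? 1 : 0;          // strict majority of votes

      printf("%d %d", ex, target);
      for (int j = 0; j < NUM_INPUTS; j++)
         printf(" %d", input[j]);
      printf("\n");
   }
   return 0;
}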

Students will be expected to deliver the following items at the completion of the project.
Item 1  Code listing.
Item 2  Documentation of the programs, including explanation of the program logic and the inputs and expected outputs.
Item 3  Printout of the results obtained from running the executable program with the data supplied and the test data.
Item 4  The learning curves and a brief discussion of what you derive from them.


Table of Contents

1. Perceptron Learning Program (PCTLearn)
   1.1 Program logic
   1.2 Input and output format
   1.3 Source code listing
2. Decision Tree Learning Program (DTLearn)
   2.1 Program logic
   2.2 Input and output format
   2.3 Source code listing
3. Exercise 19.4
   3.1 The result from perceptron learning
   3.2 The result from decision tree learning
4. Testing Performance
   4.1 Performance of perceptron learning
   4.2 Performance of decision tree learning
5. Learning Curves
6. Discussion
7. References

1. Perceptron Learning Program (PCTLearn)


The program (PCTLearn.cpp) is written in C++ and compiled with the Microsoft Visual C++ Version 6 compiler.

1.1 Program Logic

1. Set the threshold = 1.0, the learning rate (ALPHA) = 0.1, the number of attributes (n) = 6, and the maximum number of trials = 100.
2. Initialize the weight of each attribute with a random number between 0.0 and 1.0.
3. Open the input file and read all inputs into a training_data array.
4. Train the perceptron. For each example:
   a) Calculate the output unit O from

         O = Step( Σ_{j=0..n} Wj Ij )

      where Ij and Wj are the input and the weight of attribute j (j = 1 to n), W0 = -threshold, and I0 = 1.0. The step function means that if Σ_{j=0..n} Wj Ij >= 0 then O = 1; otherwise O = 0.
   b) Calculate the error between the target (correct) output and the calculated output:

         Err = T - O, where T = target and O = calculated output.

   c) If Err does not equal 0, adjust the weight of each attribute and restart the training pass from the first example:

         Wj <- Wj + ALPHA * Ij * Err,   j = 1, ..., n

   d) If Err equals 0, use the same weights for the next example.
   e) Repeat steps a to d until all the examples are predicted correctly (success) or the maximum number of trials is reached (failure: the perceptron could not learn).
5. Print the final weights.
6. Calculate the percentage of correct predictions on 1000 test examples.

A short worked example of one training step follows.
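For illustration, take made-up numbers: suppose the current weights are W = (0.5, 0.2, 0.3, 0.4, 0.1, 0.1), the threshold is 1.0, and the next example is I = (1, 0, 1, 0, 0, 0) with target T = 1. The weighted sum is 0.5 + 0.3 - 1.0 = -0.2 < 0, so O = 0 and Err = T - O = 1. Each weight whose input is 1 is then increased by ALPHA * Err * Ij = 0.1, giving W1 = 0.6 and W3 = 0.4, and training restarts from the first example.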

1.2 Input and Output Format

Input file
1. Column 1: the example number
2. Column 2: the target value
3. Columns 3-8: the attributes (I1-I6)

Example of input file (train14.dat); the first line is a title line:

EX  T  I1 I2 I3 I4 I5 I6
 1  1   1  0  1  0  0  0
 2  1   1  0  1  1  0  0
 3  1   1  0  1  0  1  0
 4  1   1  1  0  0  1  1
 5  1   1  1  1  1  0  0
 6  1   1  0  0  0  1  1
 7  0   1  0  0  0  1  0
 8  1   0  1  1  1  0  1
 9  0   0  1  1  0  1  1
10  0   0  0  0  1  1  0
11  0   0  1  0  1  0  1
12  0   0  0  0  1  0  1
13  0   0  1  1  0  1  1
14  0   0  1  1  1  0  0

Output

Perceptron Learning

Number of training examples: 14
Number of changing weights: 16
Number of trials: 9

Final weights
W1        W2        W3        W4        W5        W6
0.767931  0.192413  0.254704  0.478561  0.121235  0.111350

Correct rate for 1000 examples is 98.80 %

1.3 Source Code Listing

// ---------------------------------------------------------------------------
// PCTLearn.cpp: Perceptron Learning
// ---------------------------------------------------------------------------
// Class:      Artificial Intelligence (CSCI6751) Fall 2001
//             Fairleigh Dickinson University
// Instructor: Prof. Francis Sand
// ---------------------------------------------------------------------------
// Author:     Kanoksri Sarinnapakorn
// Date:       12/5/2001
// Revisions:  None.
// ---------------------------------------------------------------------------
#include <stdio.h>
#include <string.h>
#include <iostream.h>
#include <fstream.h>
#include <iomanip.h>
#include <time.h>
#include <stdlib.h>

const int   MAX_TRIALS   = 100;          // Maximum trials
const int   MAX_EXAMPLES = 1000;         // Maximum number of the examples
const int   NUM_INPUTS   = 6;            // Number of inputs I1...I6
const float ALPHA        = float(0.1);   // Learning rate
const float THRESHOLD    = 1.0;          // Threshold

int training_data[MAX_EXAMPLES][NUM_INPUTS+1];   // add 1 for target value

void init_weights(float weights[]); void train(int num_examples, float final_weights[]); void get_input(ifstream &fin, int &num_examples); void adjust_weight(float weights[], int input[], int err); void print_weights(float weights[]); void cal_correct_rate(float final_weights[]); int perceptron_output(float weights[], int input[]); int main(int argc, char *argv[]) { float weights[NUM_INPUTS]; int num_examples; char *filename = "train14.dat"; ifstream fin; if(argc < 2) fin.open(filename); else fin.open(argv[1]); if (fin.fail()) cout << "Could not open the file" << endl; else { init_weights (weights); get_input(fin,num_examples); fin.close();

cout << setw(42) << "Perceptron Learning" << endl << endl; cout << "Number of training examples: " << num_examples << endl; train(num_examples,weights); print_weights(weights); cal_correct_rate(weights); } return(0); } void get_input(ifstream &fin, int &num_examples) { const int TITLE_LENGHT = 100; const int NAME_LENGHT = 5; char ex_number[NAME_LENGHT]; char title[TITLE_LENGHT]; int j ; /* counter */ int tmp_target; /* examples number */

   num_examples = 0;
   fin.getline(title,TITLE_LENGHT,'\n');    /* read title line */

fin >> ex_number; while (!fin.eof()) { fin >> tmp_target; for (j = 0; j < NUM_INPUTS; j++) fin >> training_data[num_examples][j]; training_data[num_examples][NUM_INPUTS] = tmp_target; num_examples++; fin >> ex_number; } } void train(int num_examples, float weights[]) { int i,n_ex = 0, n_trial = 0; /* counter */ int input[NUM_INPUTS]; int T, O, Err; // T = targer, O = ouput from perceptron, Err = T-O int trials = 1,changes=0; while (n_ex < num_examples && trials < MAX_TRIALS) { for (i = 0; i < NUM_INPUTS; i++) input[i] = training_data[n_ex][i]; T = training_data[n_ex][NUM_INPUTS]; O = perceptron_output(weights, input); Err = T - O; if (Err != 0) { adjust_weight(weights, input, Err); n_ex = 0; /* start at the beginning again */

   ++changes;                /* Number of changing weights */
   }
   else
      n_ex++;

if (++n_trial > num_examples) { n_trial = 0; ++trials; } } cout << "Number of changing weights: " << changes << endl; cout << "Number of trials: " << trials << endl; } void init_weights (float weights[]) { int i; srand((unsigned) time (NULL)); for (i=0 ; i< NUM_INPUTS ; i++) weights[i] = float(rand())/float(RAND_MAX); } void adjust_weight(float weights[], int input[], int err) { int i; for (i = 0; i < NUM_INPUTS ; i++) weights[i] = weights[i] + ALPHA * err * input[i]; } int perceptron_output(float weights[], int input[]) { int i, output; float sum; sum = -THRESHOLD; for (i=0 ; i < NUM_INPUTS ; i++) sum += float (input[i]) * weights[i]; if (sum >= 0.0) output = 1; else output = 0; return (output); } void print_weights(float weights[]) { int i; cout << endl << setw(40) << "Final weights " << endl << endl << setw(5); for (i = 0; i < NUM_INPUTS; i++) cout << "W" << i+1 << setw(11); cout << endl; cout.setf(ios::showpoint); cout.setf(ios::fixed);

/* 0.0 - 1.0 */

setprecision(6); for (i = 0; i < NUM_INPUTS; i++) cout << weights[i] << setw(12) ; cout << endl << endl; } void cal_correct_rate (float weights[]) { const int TITLE_LEN = 100; const int NAME_LEN = 5; char ex_number[NAME_LEN]; char title[TITLE_LEN]; /* examples number */

int num_examples; int target, data, correct_no, output, i; float sum; char *filename = "testdata.dat"; ifstream fin(filename); if (fin.fail()) cout << "Could not open the testdata.txt" << endl; else { correct_no = 0; num_examples = 0; fin.getline(title,TITLE_LEN,'\n'); /* read title line */ fin >> ex_number; while (!fin.eof()) { fin >> target; sum = -THRESHOLD; for (i = 0; i < NUM_INPUTS; i++) { fin >> data; sum += float(data)* weights[i]; } if (sum >= 0.0) output = 1; else output = 0; if (output == target) correct_no++; num_examples++; fin >> ex_number; } cout << setprecision(2); cout << "Correct rate for " << num_examples << " examples is " << (correct_no * 100.0)/float (num_examples) << " %" << endl; } }

2. Decision Tree Learning Program (DTLearn)


The program (DTLearn) is written in C++ and compiled with the Microsoft Visual C++ Version 6 compiler. It consists of five .cpp files and five header files:

1. DTLearn.cpp and DTLearn.h     - the main program and the decision class
2. Dimension.cpp and Dimension.h - the dimension class, which stores an attribute and its possible values
3. Example.cpp and Example.h     - the example class
4. Node.cpp and Node.h           - the node class for creating the tree
5. Utility.cpp and Utility.h     - the utility functions

DTLearn is modified from a decision tree learning program written by Jon Monsarrat, which runs on the Unix operating system. The original source code can be downloaded from http://yoda.cis.temple.edu:8080/books/dean/05.Learning/C%2b%2bCode/Decision/.

2.1 Program Logic

1. Read the information about classes and attributes from the definition file.
2. Read each example (target value and attribute values) from the input file and keep it in the examples class (a linked list data structure).
3. Build a tree. Start with the whole set of examples at the root node and split the root node using the following algorithm:
   3.1 If the number of examples equals zero, return (skip this node).
   3.2 Print the class distribution.
   3.3 If all examples have the same class, return the class (declare the node a terminal node).
   3.4 Call the splitter function to find the best attribute on which to split the training set at this node. The best attribute is the one with the largest gain. For an attribute A with v distinct values, where the examples with the i-th value consist of pi positive and ni negative examples,

         Gain(A) = I( p/(p+n), n/(p+n) ) - Remainder(A)

         Remainder(A) = Σ_{i=1..v} (pi + ni)/(p + n) * I( pi/(pi+ni), ni/(pi+ni) )

         I( p/(p+n), n/(p+n) ) = - p/(p+n) * log2( p/(p+n) ) - n/(p+n) * log2( n/(p+n) )

       where p and n are the numbers of positive and negative examples at the node (see the sketch after this list).
4. Create a subtree for each non-terminal node using the same algorithm as in step 3. Repeat this process until no non-terminal nodes are left.
5. Return the tree.
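As an illustration of these formulas (this is not part of DTLearn, which equivalently chooses the attribute by minimizing a disorder measure), the following sketch computes the gain of attribute I1 on the 14 training examples, where the value 1 has 6 positive and 1 negative example and the value 0 has 1 positive and 6 negative examples.

// GainDemo.cpp: direct transcription of Gain(A), Remainder(A) and I(.,.)
// for one binary attribute (illustrative sketch only).
#include <stdio.h>
#include <math.h>

double info(double p, double n)            // I( p/(p+n), n/(p+n) )
{
   double t = p + n, result = 0.0;
   if (p > 0) result -= (p / t) * (log(p / t) / log(2.0));
   if (n > 0) result -= (n / t) * (log(n / t) / log(2.0));
   return result;
}

int main()
{
   const int v = 2;                        // I1 has two values: 1 and 0
   double p_i[v] = {6, 1};                 // positive examples for I1 = 1, I1 = 0
   double n_i[v] = {1, 6};                 // negative examples for I1 = 1, I1 = 0
   double p = p_i[0] + p_i[1];             // 7 positive examples in total
   double n = n_i[0] + n_i[1];             // 7 negative examples in total

   double remainder = 0.0;
   for (int i = 0; i < v; i++)
      remainder += ((p_i[i] + n_i[i]) / (p + n)) * info(p_i[i], n_i[i]);

   double gain = info(p, n) - remainder;   // about 0.41 bits for I1
   printf("Gain(I1) = %f\n", gain);
   return 0;
}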

2.2 Input and Output Format

Input

There are two input files, the definition file and the data file.

Definition file (Filedef.dat)
The input format is laid out in this order:
1. The word "classes", followed by the number of classes and their values
2. The word "dimensions", followed by the number of dimensions
3. Each dimension in order, which includes
   a. the name of the attribute
   b. the number of different values of the attribute
   c. all possible values of the attribute

Example of definition file

classes 2 1 0
dimensions 6
I1 2 1 0
I2 2 1 0
I3 2 1 0
I4 2 1 0
I5 2 1 0
I6 2 1 0

Data file (e.g. Train14.dat)
1. Column 1: the example number
2. Column 2: the target value
3. Columns 3-8: the attributes (for 6 attributes)

EX  T  I1 I2 I3 I4 I5 I6
 1  1   1  0  1  0  0  0
 2  1   1  0  1  1  0  0
 3  1   1  0  1  0  1  0
 4  1   1  1  0  0  1  1
 5  1   1  1  1  1  0  0
 6  1   1  0  0  0  1  1
 7  0   1  0  0  0  1  0
 8  1   0  1  1  1  0  1
 9  0   0  1  1  0  1  1
10  0   0  0  0  1  1  0
11  0   0  1  0  1  0  1
12  0   0  0  0  1  0  1
13  0   0  1  1  0  1  1
14  0   0  1  1  1  0  0

Output format

There are two parts in the output. The first part is the class distribution and the chosen attribute during the building of the decision tree. The second part is the resulting decision tree.

Example of output (abridged)

Class distribution is 1: 7 0: 7
Dimension chosen: I1
Class distribution is 1: 6 0: 1
   :
   :
Class distribution is 1: 0 0: 2
Terminal node
Class distribution is 1: 0 0: 3
Terminal node

Decision tree

Split on I1:
---1: Split on I3:
   ---1: [1]
   ---0: Split on I6:
      ---1: [1]
      ---0: [0]
---0: Split on I3:
   ---1: Split on I4:
      ---1: Split on I6:
         ---1: [1]
         ---0: [0]
      ---0: [0]
   ---0: [0]

From the decision tree information, we can draw the decision tree graphically. (Figure: the same tree drawn as a diagram, with I1 at the root.)

2.3 Source Code Listing

// ---------------------------------------------------------------------------
// DTLearn.cpp: The main program which learns and creates the decision tree
// ---------------------------------------------------------------------------
// Class:      Artificial Intelligence (CSCI6751) Fall 2001
//             Fairleigh Dickinson University
// Instructor: Prof. Francis Sand
// ---------------------------------------------------------------------------
// Author:     Kanoksri Sarinnapakorn
// Date:       11/9/2001
// ---------------------------------------------------------------------------
#include <stdlib.h>
#include <iostream.h>
#include <fstream.h>

#include "DTLearn.h"
#include "Node.h"
#include "Example.h"
#include "Dimension.h"

int Decision::num_classes;           // Number of classes for this run
char Decision::classes[MAX_CLASSES][MAX_STRING];   // Names for all classes
int Decision::num_dimensions;        // Number of dimensions for this run
Dimension **Decision::dimensions;    // Array of dimensions

int main(int argc, char *argv[])
{
   ifstream fdef("filedef.dat");     /* fdef (definition file) */
   ifstream fin;                     /* fin (data file) */
   char *filename = "train14.dat";

   if(argc < 2)
      fin.open(filename);
   else
      fin.open(argv[1]);

   if(fin.fail())
      cerr << "Could not open file " << filename << endl;
   else
   {
      Decision decision(fdef,fin);
      decision.build_tree();
      cout << decision << endl;
   }
   fdef.close();
   fin.close();
   return 0;
}

// Construct a decision given the filename for the info file
Decision::Decision(ifstream &fdef, ifstream &fin)
{
   char keyword[MAX_STRING];         // Some scratch space for reading in words
   char title_line[100];


int i; fdef >> keyword; fdef >> num_classes; // Should be "classes" // Read in the number of classes

for(i = 0; i < num_classes; i++) // Read in each class name fdef >> classes[i]; fdef >> keyword; // Should be "dimensions" fdef >> num_dimensions; // Read in the number of dimensions for this run dimensions = new Dimension *[num_dimensions]; for(i = 0; i < num_dimensions; i++) { dimensions[i] = new Dimension(); fdef >> *dimensions[i]; // Read in each dimension } root = new Node; // Create the top of the decision tree fin.getline(title_line,100,'\n'); // get title of variables // Construct a linked list of all examples. // Read in all the examples and give them to the root node while(!fin.eof()) { fin >> keyword; /* examples number */ if(keyword[0] && !fin.eof() && !fin.fail()) // End of file? { Example *example = new Example(keyword, fin); root->add_example(example); } } } // Destroy the decision Decision::~Decision() { for(int i = 0; i < num_dimensions; i++) delete dimensions[i]; delete [] dimensions; delete root; //Deleting the root } // Given a name, look up which attribute this is, and return the index int Decision::lookup_attribute(int i, char temp[]) { return dimensions[i]->lookup(temp); } // Given a name, look up which class this is, and return the index int Decision::lookup_class(char temp[]) { int result = 0; for(int i = 0; i < num_classes && result == 0; i++) if(strcmp(classes[i], temp) == 0) result = i; return result; }


// Run the build_tree() algorithm to create the decision tree void Decision::build_tree() { root->build_tree(); } // Print out the tree ostream& operator<<(ostream &o, Decision &d) { d.root->print(0); return o; }

// ---------------------------------------------------------------------------
// DTLearn.h: Header file of DTLearn.cpp
// ---------------------------------------------------------------------------
// Class:      Artificial Intelligence (CSCI6751) Fall 2001
//             Fairleigh Dickinson University
// Instructor: Prof. Francis Sand
// ---------------------------------------------------------------------------
// Author:     Kanoksri Sarinnapakorn
// Date:       11/9/2001
// Revisions:  None.
// ---------------------------------------------------------------------------
#include <fstream.h>
#include <iostream.h>
#include "Utility.h"

const int MAX_CLASSES = 20;    // The maximum number of classes allowed

class Dimension; class Node; class Decision { friend ostream& operator<<(ostream &o, Decision &d); // Print out the tree public: static int num_classes; // Number of classes for this run static char classes[MAX_CLASSES][MAX_STRING]; // Names for all classes static int lookup_class(char temp[]); // Get the class index given the name static int num_dimensions; // Number of dimensions for this run static Dimension **dimensions; // Array of dimensions static int lookup_attribute(int x, char temp[]); // Get the dimension index Node *root; // The top of the decision tree

   Decision (ifstream &fdef, ifstream &fin);   // Construct a Decision
   ~Decision();                                // Destroy the decision tree
   void build_tree();                          // Build a decision tree
};


// -------------------------------------------------------------------------------
// Dimension.cpp: The dimension class stores an attribute and its possible values
// -------------------------------------------------------------------------------
#include "Utility.h"
#include "Dimension.h"
#include "Example.h"
#include "Node.h"

// Build a dimension Dimension::Dimension() { } // Destroy a dimension Dimension::~Dimension() { } // Given a name, lookup the index of the corresponding value int Dimension::lookup(char temp[]) { int result = 0; for(int i = 0; i < num_values && result == 0; i++) if(strcmp(temp, values[i]) == 0) result = i; return result; } // Return the name of the attribute for this dimension char *Dimension::get_attribute() { return attribute; } // Read in a dimension from a file istream& operator>>(istream &in, Dimension &d) { in >> d.attribute; in >> d.num_values; for(int i = 0; i < d.num_values; i++) in >> d.values[i]; return in; }


// -------------------------------------------------------------------------------
// Dimension.H
// Each example that we want to place in the decision tree has a value for each
// attribute. An attribute together with all the possible values for the
// attribute we call a dimension. The dimension class stores an attribute and
// its possible values, which is handy for various reasons.
// -------------------------------------------------------------------------------
#include <iostream.h>
#include "Utility.h"

const int MAX_VALUES = 40;    // Same as MAX_CHILDREN for a node.

class Node;
class Example;

class Dimension
{
   friend istream& operator>>(istream &i, Dimension &d);
   char attribute[MAX_STRING];
public:
   int num_values;                        // How many values this dimension has
   char values[MAX_VALUES][MAX_STRING];   // The names of all values
   int lookup(char temp[]);               // Lookup the index of a given value
   char *get_attribute();                 // Return the name of the attribute
   Dimension();                           // Construct a Dimension object
   ~Dimension();                          // Destroy this dimension
};

// -------------------------------------------------------------------------------
// Example.cpp: An example is an assignment of values to each possible attribute.
//              An example has a name, and also a classification value.
// -------------------------------------------------------------------------------
#include "DTLearn.h"
#include "Example.h"
#include "Dimension.h"

// Construct an example, given the name of the example and an input stream.
// This input stream is the input from the examples file.
Example::Example(char *_name, istream& in)
{
   strncpy(name, _name, MAX_STRING);
   char class_value[MAX_STRING];
   in >> class_value;
   my_class = Decision::lookup_class(class_value);
   values = new int[Decision::num_dimensions];
   for(int i = 0; i < Decision::num_dimensions; i++)
   {
      in >> class_value;
      values[i] = Decision::lookup_attribute(i, class_value);
   }
}


// Destroy this example. That means destroying the dynamically allocated memory Example::~Example() { delete [] values; } // Get the index of the class this example goes with int Example::get_class() { return my_class; } // Get the value for a certain dimension. int Example::get_value(int dim) { return values[dim]; } // Print out the example ostream& operator<<(ostream &o, Example &e) { o << " [" << Decision::classes[e.my_class] << "] "; /* display each examples use this routine */ //o << e.name << " [" << Decision::classes[e.my_class] << "] "; //for(int i = 0; i < Decision::num_dimensions; i++) //{ // o << "("; // o << Decision::dimensions[i]->get_attribute() << ", "; // o << Decision::dimensions[i]->values[e.values[i]] << ") "; //} return o; } // ------------------------------------------------------------------------------// Example.H // An example is an assignment of values to each possible attribute. // An example has a name, and also a classification value. // ------------------------------------------------------------------------------#include <iostream.h> #include "Utility.h" class Example { friend ostream& operator<<(ostream &o, Example &e); // Print out the example char name[MAX_STRING]; // The name of this example int *values; // Array of values. We use numbers to index // into the symbols in the dimensions. int my_class; // Which of the valid classifications this is. public: Example(char *_name, istream& in); // Construct given a name and in file ~Example(); // Destroy this example int get_class(); // Get the class index of this example int get_value(int dim); // Get the value for a certain dimension. Example *next; }; // The next example in a linked list.


// ---------------------------------------------------------------------------// Node.cpp: Node class // ---------------------------------------------------------------------------#include #include #include #include #include #include <iostream.h> <math.h> "Example.h" "Node.h" "DTLearn.h" "Dimension.h"

// Construct a node. Node::Node() { num_children = 0; num_examples = 0; examples = NULL; } // Destroy a node. This means deleting all the dynamically allocated memory Node::~Node() { // Delete the linked list of examples Example *example = examples; Example *temp; while(example) { temp = example->next; delete example; example = temp; } // Delete all the children nodes for(int i = 0; i < num_children; i++) delete children[i]; } // Add an example to this node. A specific example can only inhabit // one node at any one time. void Node::add_example(Example *example) { example->next = examples; examples = example; num_examples++; }

// Build the decision subtree starting at this node. If this node is the root
// node, then this means building the entire tree! First we see if the examples
// at this node have more than one class. If not, then there is no splitting
// required. If so, then we check all the dimensions using a special metric in
// evaluate() and choose one dimension to split on. This split generates new
// children nodes, and we run the build_tree() algorithm on all the children,
// creating decision subtrees.


void Node::build_tree() { if(num_examples != 0) // Skip this node if no examples { print_distribution(); // Print the class distribution of this node // We do not want to split this node if the examples all have the // same class. Further differentiation is not required, because all // the "answers" are the same. int first_class = examples->get_class(); int different_classes = 0; // Assume not different classes for(Example *example = examples->next; example; example = example->next) if(example->get_class() != first_class) different_classes = 1; if(different_classes) // if different classes { // Find the attribute that maximizes a particular evaluation function // implemented by evaluate(). Which dimension is best to split on? attribute = splitter(); cout << "Dimension chosen: " << Decision::dimensions[attribute]->get_attribute() << endl; partition(); // Actually break up the node. Allocate all examples to kids for(int i = 0; i < num_children; i++) // Now do it on all the subtrees children[i]->build_tree(); } else cout << "Terminal node" << endl; } } // This node has many examples. We build children from this node by creating // new nodes, one each for a value of the attribute specified in "attribute" void Node::partition() { for(int val = 0; val < Decision::dimensions[attribute]->num_values; val++) children[num_children++] = new Node; Example *temp; Example *example = examples; while(example) { temp = example->next; children[example->get_value(attribute)]->add_example(example); example = temp; } examples = NULL; }


// Print out the class distribution of this node void Node::print_distribution() { int i; cout << "Class distribution is "; int class_totals[MAX_CLASSES]; // How many are in each class? for(i = 0; i < Decision::num_classes; i++) class_totals[i] = 0; for(Example *example = examples; example; example = example->next) class_totals[example->get_class()]++; for(i = 0; i < Decision::num_classes; i++) cout << Decision::classes[i] << ": " << class_totals[i] << " "; cout << endl; } // There are many dimensions that we could split on at this node. // Decide which one is most appropriate. Returns the number index. int Node::splitter() { // Go through all the dimensions and find the one with the lowest // disorder value. We get this from the evaluation function. double min_value = evaluate(0); // The lowest evaluation seen so far int min_dim = 0; // Dimension of lowest evaluation double value; for(int dim = 1; dim < Decision::num_dimensions; dim++) { value = evaluate(dim); if(value < min_value) { min_value = value; min_dim = dim; } } return min_dim; } // Assuming that we split this node along the given dimension, there // will be a number of subtrees. Each of these subtrees will have a certain // disorder. Add up the total disorder of the subtrees and return it. double Node::evaluate(int dim) { // Find the numerical distribution of examples along this dimension int dim_totals[MAX_VALUES]; for(int value = 0; value < Decision::dimensions[dim]->num_values; value++) dim_totals[value] = 0; for(Example *example = examples; example; example = example->next) dim_totals[example->get_value(dim)]++; // Tally up the overall total double total = 0; for(value = 0; value < Decision::dimensions[dim]->num_values; value++)


total += dim_totals[value]; double sum = 0; for(value = 0; value < Decision::dimensions[dim]->num_values; value++) sum += disorder(dim, value) * dim_totals[value] / total; return sum; } // Given a dimension to split on, and a particular class, how disordered // would this subtree be, starting from this node? double Node::disorder(int dim, int value) { double result; int class_totals[MAX_CLASSES]; double total = 0, sum = 0; // How many are in each class?

   for(int i = 0; i < Decision::num_classes; i++)
      class_totals[i] = 0;
   for(Example *example = examples; example; example = example->next)
      if(example->get_value(dim) == value)
         class_totals[example->get_class()]++;
   for(i = 0; i < Decision::num_classes; i++)
      total += class_totals[i];             // Should be same as num_examples.

   if(total < 0.1)                          // basically total == 0
      result = 0;
   else
   {
      for(i = 0; i < Decision::num_classes; i++)
         if(class_totals[i] > 0)
            sum += (class_totals[i] / total) * logbase2(class_totals[i]);
      result = logbase2(total) - sum;
   }
   return result;
}

// Print this node with "depth" amount of indentation
void Node::print(int depth)
{
   int i, j;
   if(num_children == 0)
   {
      Example *example = examples;
      for(i = 0; i < depth; i++) cout << " ";
      cout << *example << endl;
   }
   else
   {
      for(i = 0; i < depth; i++) cout << " ";
      cout << "Split on "
           << Decision::dimensions[attribute]->get_attribute() << ":" << endl;


      for(j = 0; j < num_children; j++)
      {
         for(i = 0; i < depth; i++) cout << " ";
         cout << "---" << Decision::dimensions[attribute]->values[j]
              << ": " << endl;
         children[j]->print(depth+1);
      }
   }
}

// ---------------------------------------------------------------------------
// Node.h: Node class.
// ---------------------------------------------------------------------------
const int MAX_CHILDREN = 40;        // Cannot be more than MAX_VALUES for a dimension

class Example;
class Dimension;

class Node
{
   Example *examples;               // Start of a linked list of examples.
   int num_examples;                // The number of examples
   int attribute;
   int num_children;
   Node *children[MAX_CHILDREN];    // The children under this node

   int splitter();                  // Which dimension will node be split on?
   double evaluate(int dim);        // Metric for deciding splitting dimension
   double disorder(int dim, int value);   // Another metric, used by evaluate
   void print_distribution();       // Print out the class distribution here
   void partition();                // Break up the node given the split
public:
   Node();                          // Construct a node
   ~Node();                         // Destroy a node
   void add_example(Example *example);    // Add an example to this node's set
   void build_tree();               // Build the decision subtree from this node
   double disorder(int value);      // Calculate a metric for disorder
   void print(int depth);           // Print the node with "depth" indentation
};


// ---------------------------------------------------------------------------// Utility.cpp: This is a collection of utility functions in DTLearn. // ---------------------------------------------------------------------------#include <math.h> #include "Utility.h" // Compares two strings. If they are equal, return zero. int strcmp(const char *a, const char *b) { int result = 0; while(*b) { if(*a > *b) result = 1; else if(*a < *b) result = -1; a++; b++; } if(*a) result = 1; return result; } // Copy one string to another, but NOT more than n characters. // Assume that "a" points to allocated memory which can be written into. void strncpy(char *a, const char *b, int n) { for(int i=0; *b && i<n; i++) *a++ = *b++; *a = '\0'; } double logbase2(double n) { return (log(n)/log(2.0)); } // // // // // ---------------------------------------------------------------------------Utility.H This is a general string class. We include it for a platform-independent string comparison function. ----------------------------------------------------------------------------

#ifndef _STRING_H_ #define _STRING_H_ #ifndef NULL #define NULL 0 #endif const int MAX_STRING = 20; // The maximum length of any string. extern int strcmp(const char *a, const char *b); extern void strncpy(char *a, const char *b, int n); #endif extern double logbase2(double n);


3. Exercise 19.4
Each example has six inputs and one target output:

I1 I2 I3 I4 I5 I6  T
 1  0  1  0  0  0  1
 1  0  1  1  0  0  1
 1  0  1  0  1  0  1
 1  1  0  0  1  1  1
 1  1  1  1  0  0  1
 1  0  0  0  1  1  1
 1  0  0  0  1  0  0
 0  1  1  1  0  1  1
 0  1  1  0  1  1  0
 0  0  0  1  1  0  0
 0  1  0  1  0  1  0
 0  0  0  1  0  1  0
 0  1  1  0  1  1  0
 0  1  1  1  0  0  0

a) Run the perceptron learning rule on this example set, and show the resulting set of weights.

3.1 The result from perceptron learning

Perceptron Learning

Number of training examples: 14
Number of changing weights: 16
Number of trials: 9

Final weights
W1        W2        W3        W4        W5        W6
0.767931  0.192413  0.254704  0.478561  0.121235  0.111350

Correct rate for 1000 examples is 98.80 %

b) Run the decision tree learning rule, and show the resulting decision tree.

3.2 The result from decision tree learning

Class distribution is 1: 7 0: 7
Dimension chosen: I1
Class distribution is 1: 6 0: 1
Dimension chosen: I3
Class distribution is 1: 4 0: 0
Terminal node
Class distribution is 1: 2 0: 1
Dimension chosen: I6
Class distribution is 1: 2 0: 0
Terminal node
Class distribution is 1: 0 0: 1
Terminal node
Class distribution is 1: 1 0: 6
Dimension chosen: I3
Class distribution is 1: 1 0: 3
Dimension chosen: I4
Class distribution is 1: 1 0: 1
Dimension chosen: I6
Class distribution is 1: 1 0: 0
Terminal node
Class distribution is 1: 0 0: 1
Terminal node
Class distribution is 1: 0 0: 2
Terminal node
Class distribution is 1: 0 0: 3
Terminal node

Split on I1:
---1: Split on I3:
   ---1: [1]
   ---0: Split on I6:
      ---1: [1]
      ---0: [0]
---0: Split on I3:
   ---1: Split on I4:
      ---1: Split on I6:
         ---1: [1]
         ---0: [0]
      ---0: [0]
   ---0: [0]

(Figure: the same decision tree drawn as a diagram, with I1 at the root.)


4. Testing Performance


More examples are generated from the majority function for the data in Exercise 19.4 (votes: 10 for I1, 4 for I2-I4, 2 for I5, 1 for I6). There are eleven training sets, each with a different number of examples: 14, 50, 100, 150, ..., 500. The test data set has 1000 examples.

4.1 Performance of Perceptron Learning

The PCTLearn program (section 1.3) is modified for testing performance.

Output from a single run for each training set size:

Size   W1        W2        W3        W4        W5         W6        Correct rate
 14    0.941713  0.250551  0.240110  0.449168  -0.089596  0.196139  96.20%
 50    0.841713  0.350551  0.240110  0.349168   0.110404  0.196139  96.50%
100    0.841713  0.250551  0.340110  0.449168   0.010404  0.196139  95.70%
150    0.841713  0.250551  0.340110  0.449168   0.110404  0.196139  92.60%
200    0.841804  0.278533  0.285293  0.283505   0.225202  0.115986  96.30%
250    0.941804  0.278533  0.285293  0.283505   0.325202  0.015986  96.30%
300    0.941804  0.278533  0.185293  0.283505   0.325202  0.115986  92.80%
350    0.741804  0.278533  0.285293  0.283505   0.125202  0.115986  95.90%
400    0.841804  0.278533  0.285293  0.283505   0.225202  0.015986  96.30%
450    0.941804  0.278533  0.285293  0.283505   0.325202  0.015986  96.30%
500    0.841804  0.178533  0.385293  0.483505   0.025202  0.115986  94.90%

Run 30 times for each training set size and compute the average and standard deviation of the correct rate (a sketch of this computation follows the table).

Training set size   Average correct rate (%)   Standard deviation of correct rate (%)
 14                 92.20                       0.47
 50                 95.77                       1.78
100                 99.30                       1.01
150                 96.82                       2.64
200                 97.64                       2.94
250                 99.29                       1.83
300                 98.43                       2.65
350                 98.82                       2.40
400                 99.38                       1.40
450                 99.16                       1.71
500                 94.19                       4.89
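A minimal sketch of how the mean and standard deviation over the 30 runs could be computed is shown below. This helper is not listed in the report; the use of the sample standard deviation formula is an assumption, and the rates shown are just placeholders.

// StatDemo.cpp: average and standard deviation of correct rates
// (illustrative sketch; not part of the submitted programs).
#include <stdio.h>
#include <math.h>

int main()
{
   // Hypothetical correct rates (%); in the experiment there are 30 per size.
   double rate[] = {96.2, 96.5, 95.7, 92.6, 96.3};
   const int RUNS = sizeof(rate) / sizeof(rate[0]);

   double sum = 0.0, sumsq = 0.0;
   for (int i = 0; i < RUNS; i++)
   {
      sum   += rate[i];
      sumsq += rate[i] * rate[i];
   }
   double mean = sum / RUNS;
   double sd   = sqrt((sumsq - RUNS * mean * mean) / (RUNS - 1));   // sample SD
   printf("average = %.2f %%   standard deviation = %.2f %%\n", mean, sd);
   return 0;
}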


4.2 Performance of Decision Tree Learning

The decision tree from training set size 14:
Split on I1: ---1: Split on I3: ---1: [1] ---0: Split on I6: ---1: [1] ---0: [0] ---0: Split on I3: ---1: Split on I4: ---1: Split on I6: ---1: [1] ---0: [0] ---0: [0] ---0: [0]
(Figure: the decision tree above drawn as a diagram.)

Correct rate 84.70%.

The decision tree from training set size 50


Split on I1: ---1: Split on I4: ---1: [1] ---0: Split on I2: ---1: [1] ---0: Split on I6: ---1: [1] ---0: Split on I3: ---1: [1] ---0: [0] ---0: Split on I3: ---1: Split on I2: ---1: Split on I4: ---1: [1] ---0: [0] ---0: [0] ---0: [0]

(Figure: the decision tree above drawn as a diagram.)

Correct rate 95.70%.


The decision tree from training set size 100


Split on I1: ---1: Split on I4: ---1: [1] ---0: Split on I3: ---1: [1] ---0: Split on I6: ---1: [1] ---0: Split on I2: ---1: [1] ---0: [0] ---0: Split on I3: ---1: Split on I5: ---1: Split on I2: ---1: Split on I4: ---1: [1] ---0: [0] ---0: [0] ---0: [0] ---0: [0]

(Figure: the decision tree above drawn as a diagram.)

Correct rate 92.20%.

The decision tree from training set size 150:
Split on I1: ---1: Split on I4: ---1: [1] ---0: Split on I3: ---1: [1] ---0: Split on I2: ---1: [1] ---0: Split on I5: ---1: Split on I6: ---1: [1] ---0: [0] ---0: [0] ---0: Split on I5: ---1: Split on I4: ---1: Split on I2: ---1: Split on I3: ---1: [1] ---0: [0] ---0: [0] ---0: [0] ---0: [0]
(Figure: the decision tree above drawn as a diagram.)

Correct rate 96.30%.


The decision tree from training set size 200

(Figure: decision tree diagram for the 200-example training set; the root split is on I1.)

Correct rate 100.00%

The decision tree from training set size 250

(Figure: decision tree diagram for the 250-example training set; the root split is on I1.)

Correct rate 100.00%

The decision trees for the 300, 350, 400 and 450 example training sets have almost the same pattern as those for the 200, 250 and 500 example training sets.

The decision tree from training set size 500

(Figure: decision tree diagram for the 500-example training set; the root split is on I1.)

Correct rate 100.00%


5. Learning Curves

Training set size   Perceptron correct rate (%)   Decision tree correct rate (%)
 14                 92.20                          84.70
 50                 95.77                          95.70
100                 99.30                          92.20
150                 96.82                          96.30
200                 97.64                         100.00
250                 99.29                         100.00
300                 98.43                         100.00
350                 98.82                         100.00
400                 99.38                         100.00
450                 99.16                         100.00
500                 94.19                         100.00

(Figure: "Learning Curve" -- the percentage correct on the 1000-example test set plotted against training set size, for the perceptron and the decision tree.)


6. Discussion
Both perceptron learning and decision tree learning are simple forms of supervised learning.

Perceptron Learning

The perceptron is a single-layer neural network whose weights and biases can be trained to produce the correct target when presented with the corresponding input vector. The perceptron generated great interest due to its ability to generalize from its training vectors and to work with randomly distributed connections. Perceptron networks have several limitations. First, the output of a perceptron can take on only one of two values (True or False). Second, a perceptron can only classify linearly separable sets of vectors. If a straight line or plane can be drawn to separate the input vectors into their correct categories, the input vectors are said to be linearly separable and the perceptron will find a solution. If the vectors are not linearly separable, perceptron learning will never reach a point where all vectors are classified properly.

Decision Tree Learning

Decision tree learning is an efficient method for learning deterministic Boolean functions, and it can also be used for functions with a larger range of outputs. The result of decision tree learning is a decision tree that provides a simple representation of propositional knowledge. It helps us see the important attributes and understand the problem better, and the resulting tree can easily be used for decision making and for classifying objects.

Performance Comparison

Comparing the learning curves from perceptron learning and decision tree learning, we see that both can predict with a good accuracy of 80% or above. When the training set size is below 200, perceptron learning has a slightly higher percentage of correct predictions. As the training set gets bigger, decision tree learning improves and outperforms perceptron learning; in this study, decision tree learning gives 100% correct prediction once the training set size is 200 or more.

In terms of interpretation, a linear threshold classifier such as the one obtained from perceptron learning is harder to understand than a decision tree that splits on a single attribute at each node. Moreover, the weights of the attributes in the linear function can change from one run to another even with the same data, although the accuracy may not be affected much. In decision tree learning, it is possible that not all attributes are used in splitting. Perhaps we can say that decision tree learning can screen out some unimportant attributes and produce a parsimonious classifier. (The attributes that are not selected may indeed be unimportant, or they may be highly correlated with other attributes that are present in the tree.)
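One further observation: the target function used to generate the data is itself a weighted vote, and (assuming the output is 1 exactly when the weighted votes 10*I1 + 4*I2 + 4*I3 + 4*I4 + 2*I5 + 1*I6 exceed half of the 25 total votes) it is a linear threshold function. The examples are therefore linearly separable, which helps explain why the perceptron reaches a high accuracy on this particular problem.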


7. References
[1] Chakrabarty, N., Example of neural networks, [Online document], http://www.thechak.net/~neil/neilchak.htm
[2] Fullér, R., The perceptron learning rule, [Online document], ftp://ftp.abo.fi/pub/iamsr/nfs8.pdf
[3] Russell, S. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach, Prentice Hall, Upper Saddle River, New Jersey.
[4] Thornton, C., Decision tree learning program, [Online document], http://www.cogs.susx.ac.uk/users/christ/crs/sai/lec15.html
[5] Weisman, O. and Pollack, Z., The perceptron, [Online document], http://www.cs.bgu.ac.il/~omri/Perceptron/
[6] Decision tree learning, [Online document], http://www.cs.cornell.edu/home/cardie/tutorial/dtrees.pdf
[7] Example of decision tree programs, [Online document], http://yoda.cis.temple.edu:8080/books/dean/05.Learning/C%2b%2bCode/Decision/

