4 - Training and Testing Classifier Models

The procedure of training and testing classifier models in machine learning typically
involves the following steps:
1. Data Preparation
2. Splitting the Data
3. Model Selection
4. Model Training
5. Hyperparameter tuning
6. Batch normalization
7. Model Evaluation
1. Data Preparation

• Split the dataset into features (independent variables) and labels
(dependent variable).

• Preprocess the data, including handling missing values, scaling features,
and encoding categorical variables (see the sketch after this list).
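As a minimal sketch of this step in Python with pandas and scikit-learn (the toy table, column names, and values are made up for illustration):

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy dataset: a numeric feature with a missing value,
# a categorical feature, and the label.
df = pd.DataFrame({
    "age":   [25.0, 32.0, None, 51.0, 46.0],
    "city":  ["NY", "LA", "NY", "SF", "LA"],
    "label": [0, 1, 0, 1, 1],
})

# Features (independent variables) vs. label (dependent variable).
X = df.drop(columns=["label"])
y = df["label"]

# Numeric columns: fill missing values, then scale; categorical: one-hot encode.
numeric = Pipeline([("impute", SimpleImputer(strategy="mean")),
                    ("scale", StandardScaler())])
preprocess = ColumnTransformer([("num", numeric, ["age"]),
                                ("cat", OneHotEncoder(), ["city"])])

X_prepared = preprocess.fit_transform(X)
print(X_prepared.shape)  # 5 rows: 1 scaled numeric column + 3 one-hot columns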
2. Splitting the Data

• Divide the dataset into three subsets: training, validation and test.

a. Training: Up to 75 percent of the total dataset is used for training. The
model learns on the training set; in other words, this set is used to fit the
weights and biases that go into the model.

b. Validation: Between 15 and 20 percent of the data is used while the
model is being trained to evaluate initial accuracy, observe how the
model learns, and fine-tune hyperparameters. The model sees the validation
data but does not use it to learn weights and biases.

c. Test: Between 5 and 10 percent of the data is used for the final
evaluation. Because the model has never seen this data, the evaluation is
free of bias. (A two-stage split with these proportions is sketched below.)
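One way to realize those proportions is a two-stage split, sketched here with scikit-learn on synthetic data (the sizes and random_state values are illustrative, not prescriptive):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Carve off ~10% as the test set first, then split the remaining 90% so
# that validation ends up at ~15% of the original data (0.15 / 0.90).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15 / 0.90, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 750 150 100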
3. Model Selection

• Choose a classifier model based on the problem requirements and characteristics of the data.

• Common classifier models include the following (one of each is instantiated in the sketch after this list):

1. logistic regression,

2. decision trees,

3. random forests,

4. support vector machines, and

5. neural networks.
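By way of illustration, each family above has a scikit-learn implementation; the settings shown here are starting points only, not recommendations:

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# One candidate per family; defaults or near-defaults throughout.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree":       DecisionTreeClassifier(),
    "random_forest":       RandomForestClassifier(n_estimators=100),
    "svm":                 SVC(),
    "neural_network":      MLPClassifier(hidden_layer_sizes=(32,), max_iter=500),
}

for name, model in candidates.items():
    print(name, type(model).__name__)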
4. Model Training

• Train the selected model on the training set.

• The model learns the patterns and relationships between features and
labels in the training data, as in the sketch below.
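A minimal training sketch, assuming the random-forest candidate from the previous step and synthetic data for self-containment:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# fit() is the training step: the model learns feature-label relationships.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

print(clf.score(X_val, y_val))  # accuracy on held-out validation data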
5. Hyperparameter tuning

Hyperparameters can be imagined as settings that control the behavior of a
training algorithm. The algorithm learns its parameters from the data during
the training phase; human-adjustable hyperparameters govern how that learning
proceeds. The designer sets them based on theoretical considerations or tunes
them automatically (a tuning sketch follows the list below).

In the context of deep learning, examples of hyperparameters are:

1. Learning rate

2. Number of hidden units

3. Convolution kernel width

4. Regularization techniques
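A sketch of automatic tuning over some of the hyperparameters named above (learning rate, number of hidden units, and regularization strength), using scikit-learn's grid search on synthetic data; the grid values are illustrative:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Every combination in the grid is trained and scored with 3-fold
# cross-validation; the best-scoring setting is kept.
param_grid = {
    "learning_rate_init": [1e-3, 1e-2],
    "hidden_layer_sizes": [(16,), (64,)],
    "alpha": [1e-4, 1e-2],  # L2 regularization strength
}
search = GridSearchCV(MLPClassifier(max_iter=1000), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)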
6. Batch normalization

Two techniques, normalization and standardization, both aim to transform the data by
putting all the data points on the same scale in preparation for training.

The normalization process usually consists of scaling the numerical data down to a scale
from zero to one.

Standardization, on the other hand, usually consists of subtracting the dataset's
mean from each data point and then dividing the difference by the dataset's
standard deviation. That forces the standardized data to take on a mean of zero
and a standard deviation of one. Standardization is often referred to as
normalization; both involve putting data on some known or standard scale.
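The two scalings can be compared side by side; a minimal sketch with scikit-learn's MinMaxScaler (normalization) and StandardScaler (standardization) on made-up numbers:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = np.array([[1.0], [5.0], [10.0], [20.0]])

# Normalization: rescale the values to the [0, 1] range.
normalized = MinMaxScaler().fit_transform(data)

# Standardization: subtract the mean, divide by the standard deviation.
standardized = StandardScaler().fit_transform(data)

print(normalized.ravel())                       # values between 0 and 1
print(standardized.mean(), standardized.std())  # ~0.0 and 1.0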
7. Model Evaluation

• Once a model has been trained, performance is gauged with a confusion
matrix and precision/accuracy metrics.
a. Confusion matrix

A confusion matrix describes the performance of a classifier model; for a
binary classifier it is a 2x2 grid of actual versus predicted classes.
Consider a simple classifier that predicts whether a patient has cancer or not. There are four
possible results:

• True positives (TP): The prediction was yes, and the patient does have cancer.
• True negatives (TN): The prediction was no, and the patient does not have cancer.
• False positives (FP): The prediction was yes, but the patient does not have cancer (also
known as a "Type I error").
• False negatives (FN): The prediction was no, but the patient does have cancer (also
known as a "Type II error").
A confusion matrix can also hold more than two classes per axis, with one row and one column per class.
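Sticking to the binary case, the four counts can be read off with scikit-learn's confusion_matrix; a minimal sketch with made-up screening labels (1 = cancer, 0 = no cancer):

from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = cancer, 0 = no cancer.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary 0/1 labels, rows are actual classes and columns are predicted
# classes, so ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1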
b. Precision / Accuracy

It is also useful to calculate precision and accuracy from the classifier's
predictions and the actual values.

Accuracy measures how often the classifier is correct over all observations.
Using the example counts TP = 100, TN = 50, FP = 10, and FN = 5 (165
observations in total), the calculation is (TP+TN)/total = (100+50)/165 = 0.91.

Precision measures how often the actual value is Yes when the prediction is Yes.
In this case, the calculation is TP/predicted yes = 100/(100+10) = 0.91.
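The same arithmetic spelled out in a few lines, using the counts from the worked example above:

# Counts from the worked example: TP=100, TN=50, FP=10, FN=5 (165 total).
tp, tn, fp, fn = 100, 50, 10, 5
total = tp + tn + fp + fn

accuracy = (tp + tn) / total  # (100 + 50) / 165 ≈ 0.91
precision = tp / (tp + fp)    # 100 / (100 + 10) ≈ 0.91
print(round(accuracy, 2), round(precision, 2))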
