Cross-Validation in Machine Learning


What is cross-validation?
• Cross-validation is a resampling procedure used for evaluating a machine
learning model and testing its performance.
• It helps to compare and select an appropriate model for the specific
predictive modeling problem.
• CV is easy to understand, easy to implement, and it tends to have lower
bias than other methods used to estimate a model’s performance score.
• All this makes cross-validation a powerful tool for selecting the best
model for the specific task.
• There are many different techniques that can be used to cross-validate a
model; a basic example is sketched below.
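As a concrete illustration of the idea, here is a minimal sketch, assuming scikit-learn, its built-in iris dataset, and two illustrative classifiers, that scores each candidate model with 5-fold cross-validation and reports the mean score used to compare them:

```python
# Minimal sketch: comparing two candidate models with 5-fold cross-validation.
# Assumes scikit-learn is installed; the dataset and models are illustrative.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

candidates = [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("decision tree", DecisionTreeClassifier(random_state=0)),
]

for name, model in candidates:
    # cross_val_score refits the model on each of the 5 training folds
    # and evaluates it on the corresponding held-out fold.
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f} (std = {scores.std():.3f})")
```

The model with the higher mean cross-validated score would typically be preferred, since the estimate is averaged over several train/test splits rather than based on a single one.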
k-Fold cross-validation
• In k-fold cross-validation, the dataset is split into k folds of roughly equal size; the
model is trained on k−1 folds and evaluated on the remaining fold, and this is repeated
k times so that each fold serves as the test set exactly once.
Configuration of k
• The k value must be chosen carefully for your data sample.
• A poorly chosen value for k may result in a high variance or a high bias.
• Three common tactics for choosing a value for k are as follows:
• Representative: The value for k is chosen such that each train/test group of data
samples is large enough to be statistically representative of the broader dataset.
• k=10: The value for k is fixed to 10, a value that has been found through
experimentation to generally result in a model skill estimate with low bias and a modest
variance.
• k=n: The value for k is fixed to n, the size of the dataset, so that each sample gets
the opportunity to be used once as the held-out test set. This approach is called
leave-one-out cross-validation; both it and k-fold splitting are illustrated in the sketch
after the note below.
Note: The choice of k is usually 5 or 10, but there is no formal rule. As k gets larger, the difference in
size between the training set and the resampling subsets gets smaller. As this difference
decreases, the bias of the technique becomes smaller.
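To make the effect of k concrete, here is a minimal sketch, assuming scikit-learn and an illustrative dataset of n = 20 samples, showing how k = 5, k = 10, and k = n (leave-one-out) change the train/test fold sizes:

```python
# Minimal sketch: how the choice of k changes the train/test split sizes.
# Assumes scikit-learn is installed; the 20-sample dataset is illustrative.
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X = np.arange(20).reshape(-1, 1)  # n = 20 samples

for k in (5, 10):
    kf = KFold(n_splits=k, shuffle=True, random_state=0)
    # Inspect the first of the k splits to see the fold sizes.
    train_idx, test_idx = next(iter(kf.split(X)))
    print(f"k={k}: train size = {len(train_idx)}, test size = {len(test_idx)}")

# k = n is leave-one-out cross-validation: every sample is held out exactly once.
loo = LeaveOneOut()
print(f"k=n: {loo.get_n_splits(X)} folds, each with a single test sample")
```

With k = 5 each test fold holds 4 of the 20 samples, with k = 10 it holds 2, and with k = n each fold holds exactly 1, so the training set gets closer in size to the full dataset as k grows, which is the bias reduction described in the note above.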
