Cross-Validation in Machine Learning
What is cross-validation?
• Cross-validation is a resampling procedure used for evaluating a machine
learning model and testing its performance.
• It helps to compare and select an appropriate model for the specific
predictive modeling problem.
• CV is easy to understand, easy to implement, and it tends to have a lower
bias than other methods used to estimate the model’s performance.
• All this makes cross-validation a powerful tool for selecting the best
model for the specific task.
• There are many different techniques that can be used to cross-validate a
model; a minimal sketch of the basic workflow is shown below.
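As a concrete illustration, the sketch below scores two candidate classifiers with cross-validation on a synthetic dataset. It assumes scikit-learn; the dataset, the two models, and the choice of 5 folds are illustrative assumptions, not prescribed by this text.

# Minimal sketch: comparing models with cross-validation (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset, used here only for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("decision tree", DecisionTreeClassifier(random_state=0))]:
    # cross_val_score resamples the data into 5 folds, fits the model on the
    # training part of each split, and scores it on the held-out part.
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")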
k-Fold cross-validation
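To show the mechanics of k-fold cross-validation, the sketch below splits a toy dataset into k = 5 folds: each fold is held out as the test set exactly once while the remaining k-1 folds form the training set. It assumes scikit-learn's KFold splitter and NumPy; the data and the value k = 5 are illustrative.

# Minimal sketch of k-fold splitting (assumes scikit-learn and NumPy).
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features
y = np.arange(10) % 2              # toy labels

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    # Each of the 5 folds serves as the test set once; the other 4 are used for training.
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")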
Configuration of k
• The k value must be chosen carefully for your data sample.
• A poorly chosen value for k may result in a high variance or a high bias.
• Three common tactics for choosing a value for k are as follows:
• Representative: The value for k is chosen such that each train/test group of data
samples is large enough to be statistically representative of the broader dataset.
• k=10: The value for k is fixed to 10, a value that has been found through
experimentation to generally result in a model skill estimate with low bias and a modest
variance.
• k=n: The value for k is fixed to n, where n is the size of the dataset, so that each data
sample gets an opportunity to be used in the held-out test set. This approach is called
leave-one-out cross-validation (illustrated in the sketch after the note below).
Note: The choice of k is usually 5 or 10, but there is no formal rule. As k gets larger, the difference in
size between the training set and the resampling subsets gets smaller. As this difference
decreases, the bias of the technique becomes smaller.
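To make this concrete, the sketch below (again assuming scikit-learn and NumPy, with a toy dataset of n = 100 samples) shows how the choice of k changes the number of splits and the size of the training and test sets; leave-one-out corresponds to k = n.

# Minimal sketch: how the choice of k affects fold sizes (assumes scikit-learn and NumPy).
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X = np.zeros((100, 3))  # toy dataset with n = 100 samples

for label, splitter in [("k=5", KFold(n_splits=5)),
                        ("k=10", KFold(n_splits=10)),
                        ("k=n (leave-one-out)", LeaveOneOut())]:
    train_idx, test_idx = next(iter(splitter.split(X)))
    # As k grows, each training set approaches the size of the full dataset,
    # which is why the bias of the skill estimate shrinks.
    print(f"{label}: {splitter.get_n_splits(X)} splits, "
          f"train size {len(train_idx)}, test size {len(test_idx)}")

With k = 5 each test fold holds 20 samples, with k = 10 it holds 10, and with leave-one-out it holds a single sample, so the training set is nearly the full dataset on every split.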