Nested cross-validation: why does the proposed example not have a hold-out set? #31510

SamGG · 2025-06-09T18:51:40Z

I am learning Machine Learning and exploring nested cross-validation.

I don't understand the nested cross-validation example given in scikit-learn. The model seems to be trained on the whole dataset, and the evaluation does not seem to be performed on a hold-out set.
scikit-learn documentation
scikit-learn implementation

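import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC
# Setup assumed from the scikit-learn example (not part of the excerpt quoted here)
NUM_TRIALS = 30
X_iris, y_iris = load_iris(return_X_y=True)
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}
svm = SVC(kernel="rbf")
nested_scores = np.zeros(NUM_TRIALS)

# Loop for each trial
for i in range(NUM_TRIALS):
    # Choose cross-validation techniques for the inner and outer loops,
    # independently of the dataset.
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

    # Nested CV with parameter optimization
    clf = GridSearchCV(estimator=svm, param_grid=p_grid, cv=inner_cv)
    nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
    nested_scores[i] = nested_score.mean()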

From what I read in Applied Predictive Modeling by Kuhn & Johnson, the model resulting from the inner loop should be evaluated on the hold-out set of the outer loop. The following post follows the same approach: machinelearningmastery blog
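
To make the comparison concrete, here is a minimal sketch of the explicit outer loop as I understand it from that description, placed next to the `cross_val_score` call from the scikit-learn example. It is only an illustration under my own assumptions: the iris data, an RBF `SVC`, a small `p_grid`, a single trial with `random_state=0`, and the names `manual_scores` and `library_scores` are mine and are not taken from either source.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}  # assumed grid, for illustration

inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)
clf = GridSearchCV(estimator=SVC(kernel="rbf"), param_grid=p_grid, cv=inner_cv)

# Explicit outer loop: tune on the outer training fold only (inner CV runs
# inside GridSearchCV.fit), then score on the outer hold-out fold.
manual_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    search = clone(clf)  # fresh, unfitted copy for each outer fold
    search.fit(X[train_idx], y[train_idx])
    manual_scores.append(search.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

# The call used in the scikit-learn example, with the same outer splits.
library_scores = cross_val_score(clf, X=X, y=y, cv=outer_cv)

print("explicit loop:  ", manual_scores)
print("cross_val_score:", library_scores)
```

If I read the documentation correctly, `cross_val_score` also refits a clone of `clf` on each outer training fold and scores it on the corresponding held-out fold, so I would expect the two arrays to agree. If they do, the difference between the two implementations would be one of presentation rather than of method.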

As I am far from a Python expert, could you tell me the advantages, drawbacks and purposes of both of these implementations?

I read #21621 but I am not sure if it really answers my question. If it does, let me know and I will try to carefully understand it.

Replies: 0 comments
