Get the number of iterations in SVR #18928


Closed
leonardtschora opened this issue Nov 27, 2020 · 9 comments · Fixed by #21408


@leonardtschora

Describe the workflow you want to enable

Hi everyone,

I am manipulating SVR objects inside GridSearchCV. I can access mean_fit_time in cv_results_, but I can't access the number of iterations of the underlying optimization problem.

I would like to have this information to properly set the max_iter parameter for the grid search.

Describe your proposed solution

I have tried the following:

from sklearn.svm import SVR
from sklearn.datasets import load_boston

# Load data
X, y = load_boston(return_X_y=True)

# Model test
model = SVR(verbose=4)
model.fit(X, y)

[LibSVM]*
optimization finished, #iter = 351
obj = -3012.975812, rho = -21.172739
nSV = 499, nBSV = 431
Out[1]: SVR(gamma=1.0, verbose=4)

I am interested in getting the #iter field here. It should be available as an attribute of the fitted model, and the numbers of iterations for all fits should appear somewhere in cv_results_.

Also, please note that this feature should be available for all libsvm-based SVM objects: SVC, SVR, etc.

Additional context

I am running this code on:
Python 3.7.3
scikit-learn 0.23.1

Thanks in advance for your support.

@NicolasHug
Member

NicolasHug commented Nov 27, 2020

I don't know why we don't expose n_iter (it's possible that LibSVM makes it hard for us).

Regarding GridSearch: attributes aren't stored in cv_results_. We can't store fitted models there because that would mean storing n_param_candidates * n_folds models, which can be prohibitively expensive in terms of memory.

You can however access the attributes of the final fitted model from gs.best_estimator_
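A minimal sketch of that pattern (my illustration, not from the issue; X and y are assumed to be loaded already, and support_ stands in for whichever fitted attribute you need, since n_iter_ is not exposed here):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

gs = GridSearchCV(SVR(), param_grid={"C": [1.0, 10.0]})
gs.fit(X, y)

# best_estimator_ is the model refit on the full data with the best
# parameters; its fitted attributes (e.g. support_) are accessible as usual.
print(gs.best_estimator_.support_.shape)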

@NicolasHug
Member

NicolasHug commented Nov 27, 2020

For ref:

def check_non_transformer_estimators_n_iter(name, estimator_orig):
    # Test that estimators that are not transformers with a parameter
    # max_iter, return the attribute of n_iter_ at least 1.
    # These models are dependent on external solvers like
    # libsvm and accessing the iter parameter is non-trivial.
    # SelfTrainingClassifier does not perform an iteration if all samples are
    # labeled, hence n_iter_ = 0 is valid.
    not_run_check_n_iter = ['Ridge', 'SVR', 'NuSVR', 'NuSVC',
                            'RidgeClassifier', 'SVC', 'RandomizedLasso',
                            'LogisticRegressionCV', 'LinearSVC',
                            'LogisticRegression', 'SelfTrainingClassifier']

@glemaitre
Member

Regarding GridSearch: attributes aren't stored in cv_results_. We can't store fitted models there because that would mean storing n_param_candidates * n_folds models, which can be prohibitively expensive in terms of memory.

Actually, we can, by passing return_estimator=True to cross_validate and then inspecting each estimator.
I am working on an example to show how to do so:

PR: #18821

Artifacts: https://125245-843222-gh.circle-artifacts.com/0/doc/auto_examples/inspection/plot_model_inspection_cross_validation.html
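
For illustration, that pattern looks roughly like this (a sketch assuming X and y are loaded; support_vectors_ stands in for whichever fitted attribute you want to inspect):

from sklearn.model_selection import cross_validate
from sklearn.svm import SVR

cv_res = cross_validate(SVR(), X, y, return_estimator=True)

# One fitted estimator per fold is returned under the "estimator" key.
for est in cv_res["estimator"]:
    print(est.support_vectors_.shape)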

@NicolasHug
Member

Yup, but that's for cross_validate; @Leonardbcm mentioned GridSearchCV and cv_results_ in particular. cross_validate might fit @Leonardbcm's needs though

@glemaitre
Member

glemaitre commented Nov 27, 2020 via email

@jnothman
Member

Hacky solution: add a scorer under the grid searcher's scoring parameter, defined like:

def get_n_iter(est, X_test, y_test):
    return est.n_iter_
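
Wired into GridSearchCV via multi-metric scoring, that could look like the sketch below (my illustration; it assumes n_iter_ exists on the fitted estimator, which is exactly what this issue asks for):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

def get_n_iter(est, X_test, y_test):
    # Scorer signature is (estimator, X, y); we ignore the data and
    # report a fitted attribute instead of a score.
    return est.n_iter_

gs = GridSearchCV(
    SVR(),
    param_grid={"C": [1.0, 10.0]},
    scoring={"r2": "r2", "n_iter": get_n_iter},
    refit="r2",
)
gs.fit(X, y)
print(gs.cv_results_["mean_test_n_iter"])  # per-candidate mean iteration counts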

@leonardtschora
Author

Hi, thanks for your answers.

I don't know why we don't expose n_iter (it's possible that LibSVM makes it hard for us).

This is sad but it makes a lot of sense.

Regarding GridSearch: attributes aren't stored in cv_results_. We can't store fitted models there because that would mean storing n_param_candidates * n_folds models, which can be prohibitively expensive in terms of memory.

That also makes sense, but storing estimators could be optional. Metric values are already stored, and they also take n_metrics * n_folds entries per grid-search candidate.

cross_validate might fit @Leonardbcm's needs though

Unfortunately, no. My use case is the following: I run my grid search on 20 CPUs. Most candidates take about 10 s per fold to train and evaluate, while a few others take hours to converge. Since a single model's training can't be dispatched across multiple cores, I end up with 18 CPUs that are done and 2 that block the main script. (Does that make sense?)

What I am trying to do is set the max_iter hyper-parameter for my entire grid search so that the search is not blocked by models that are slow to converge. Accessing the number of iterations actually performed would allow me to:

  • Find a reasonable value for max_iter that lets most models converge while keeping training time low
  • Know which models have finished training but have not yet converged (their number of iterations equals max_iter); see the sketch after this list
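
A workaround sketch for the second point while n_iter_ is unavailable: scikit-learn's libsvm wrapper emits a ConvergenceWarning when fitting stops at max_iter, so catching that warning flags unconverged fits (my illustration, assuming X and y are already loaded):

import warnings

from sklearn.exceptions import ConvergenceWarning
from sklearn.svm import SVR

model = SVR(max_iter=100)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    model.fit(X, y)

# Any recorded ConvergenceWarning means the solver stopped at max_iter.
hit_max_iter = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("hit max_iter:", hit_max_iter)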

return est.n_iter_

The problem is that n_iter_ is neither an attribute nor a property of these models.

Thanks for your help.

@jmloyola
Member

If it is OK, I would like to work on this.
I already have a draft that stores the number of iterations as an attribute on the BaseLibSVM class.
I still need to document the changes and do more testing.

Should we try to expose more attributes from the libsvm code-base?

@glemaitre
Member

Exposing n_iter_ would be enough for the moment.
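
For later readers: the fix referenced above (#21408) exposes the count as a fitted attribute, so on versions that include it the #iter value from the verbose output becomes reachable roughly like this (a sketch, assuming the attribute is named n_iter_ and X, y are loaded):

from sklearn.svm import SVR

model = SVR()
model.fit(X, y)
print(model.n_iter_)  # iterations run by the libsvm solver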
