Parallel computing with nested cross-validation #10232
Comments
Why do you think it must be set to 1?
|
You can probably refer to #3754 which should be highly related to your question.
Good catch :) |
I am closing since this is more related to joblib |
Should this be in the FAQ? I get asked this regularly (and on a webcast two days ago). It's also non-obvious to users how to decide which n_jobs to set and we could give some guidance on that. |
it seems reasonable. |
Can I take this? |
Go for it
…On 13 December 2017 at 16:27, Akash ***@***.***> wrote:
Can I take this?
|
Will have to solve this one. Right? |
If you feel you have the experience, or can experiment/research, in order
to understand what to right, then do it.
…On 13 December 2017 at 17:16, Akash ***@***.***> wrote:
Should this be in the FAQ? I get asked this regularly (and on a webcast
two days ago). It's also non-obvious to users how to decide which n_jobs to
set and we could give some guidance on that.
Will have to solve this one. Right?
|
sorry: what to *write.
…On 13 December 2017 at 17:18, Joel Nothman ***@***.***> wrote:
If you feel you have the experience, or can experiment/research, in order
to understand what to right, then do it.
|
Will figure out |
Has this been done now? Do we still have to specify |
I don't think we have a solution to this yet
|
There already seem to be 2 questions in the FAQ (First and Second) which address Am I supposed to add a new question explaining how to use |
Hi, |
From the last PR linked above, nested parallelism is now enabled by default when you use Unless we want to document this more extensively in a tutorial, I suggest we close this issue. |
I'm going to close this as the original post was addressed: no need to set n_jobs to 1. As to documenting what We have related docs in https://scikit-learn.org/stable/modules/computing.html#parallelism and there is also #14228 which I think will tackle most of it. |
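For readers landing here, a minimal sketch of nested cross-validation with `n_jobs` set on both loops may help illustrate the point above. The synthetic dataset, the `SVC` estimator, the grid over `C`, and the fold and worker counts are all assumptions chosen for illustration, not taken from this thread.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Small synthetic problem (assumed, for illustration only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Inner loop: hyper-parameter search, with n_jobs left different from 1.
inner_cv = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=3,
    n_jobs=2,
)

# Outer loop: one GridSearchCV fit per outer split, also with n_jobs != 1.
outer_scores = cross_val_score(inner_cv, X, y, cv=5, n_jobs=2)

print(outer_scores.shape)
```

How the inner workers are actually scheduled depends on the joblib backend in use, so this only shows that the call runs with both values set, not that it is the fastest configuration.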
Although this is closed, I think an update would be good for users who land here now: what is the current best practice for parallelism in nested cross-validation with sklearn today? Running inside a Jupyter notebook, I am trying to use parallel computation on a server (120 CPU cores) like so:

with parallel_backend('loky', n_jobs=-1):
    innerCV = GridSearchCV(
        pipe,
        params,
        scoring=scoring,
        refit=refit_scorer,
        cv=10,
        verbose=1,
    )
    outerCV = cross_validate(
        innerCV,
        model_X,
        model_y,
        scoring=scoring,
        cv=10,
        return_estimator=True,
        verbose=1,
    )

The It runs without errors; however, I am not sure if it is completely optimised. Some of the time during the fit I see load on all CPUs, but most of the time just 10 of them get to work. I assume this is due to the The times when all CPUs are in use might be when an estimator is tested which has some internal (numpy) parallelisation, I assume? So, is this a viable way to approach nested CV parallelisation in sklearn today? ...Or would it be better to:
Any guidance welcome! |
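One back-of-the-envelope way to reason about the "10 busy CPUs" observation is to count the tasks each loop hands to joblib: `cross_validate` dispatches one task per outer split, and `GridSearchCV` dispatches one task per (candidate, inner split) pair, plus one final refit when `refit` is enabled. The helper name below is hypothetical, and the 20 parameter candidates are an assumed figure, since `params` is not shown in the comment above.

```python
def nested_cv_task_counts(outer_folds, inner_folds, n_candidates, refit=True):
    """Count the fits performed by a grid search nested in an outer CV loop."""
    # The outer loop dispatches one task per outer split.
    outer_tasks = outer_folds
    # Each grid search dispatches one task per (candidate, inner split) pair.
    inner_tasks_per_outer = inner_folds * n_candidates
    # With refit enabled, the best candidate is fitted once more per outer split.
    fits_per_outer = inner_tasks_per_outer + (1 if refit else 0)
    return {
        "outer_tasks": outer_tasks,
        "inner_tasks_per_outer_fold": inner_tasks_per_outer,
        "total_fits": outer_folds * fits_per_outer,
    }


# cv=10 on both loops as in the comment; 20 candidates is an assumption.
counts = nested_cv_task_counts(outer_folds=10, inner_folds=10, n_candidates=20)
print(counts)
```

With `cv=10` on the outer loop there are only 10 outer tasks, which would be consistent with 10 busy cores if the inner searches are not spread across further processes; whether they are depends on the backend.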
Dear sklearn's experts,
Standard use of nested cross-validation within sklearn doesn't allow multi-core computing. As in the example below, n_jobs has to be set to 1 for inner/outer loops:
Would there be a not-too-difficult way to parallelize jobs in nested cross-validation, which would greatly reduce computation time?
Thanks in advance !
Best,
Matthieu