-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Adds documentation for parallelisation of custom scorer #12813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Adds documentation for parallelisation of custom scorer #12813
Conversation
I know this is a restriction if pickle is used but is this still needed now that joblib is using cloudpickle? |
@albertcthomas would it make sense to add this for pre-cloudpickle versions of sklearn ? Could you let me know which version onwards has cloudpickle ? |
I was just wondering. It might be relevant even with cloudpickle. cc @tomMoral |
Thanks. Can you test if the issue persists in 0.20.1? If so, then cloudpickle doesn't help. |
Installed
|
So now that joblib is using cloudpickle I think that it should work out of the box for most of the cases (if the loky backend is used). Instead of the example we can maybe add a remark in the doc saying that if a custom scoring function is used with |
Thanks, @albertcthomas. @amueller @adrinjalali - comments ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need an example of importing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now are we sure this is necessary with current joblib?
@jnothman The new Loky backend takes care of serialising the interactively-defined functions, but I'm not sure if older versions of sklearn will work with this new backend. Can I suggest this to be kept for reasons of backward compatibility ? |
I think it would be better to say that this example is useful as other backends than loky (multiprocessing, custom backends) can be used. |
Roger that, @albertcthomas. |
Well it would be nice if someone could confirm that this is the expected behavior when loky is used and when a different backend is used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my two comments below. Also as rmse
is not defined anywhere in the example and as the example is not run, why do you think of using custom_scoring_function
or custom_scorer
instead of rmse
?
IMO |
We could keep an abstract function name, like 'custom_scoring_function`, probably ? |
Yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fx86! Almost there :)
To make things even clearer I would also suggest putting your addition in a Notes section with a well-chosen title such as
.. notes:: **Using custom scorers in functions where `n_jobs > 1`**
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last comment. Otherwise LGTM! Thanks @fx86
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt this will be a frequent issue with loky/cloudpickle, but I'm okay to merge. Thanks @fx86
This reverts commit d551ff2.
This reverts commit d551ff2.
Reference Issues/PRs
Adds documentation on parallelisation of custom scorers. Without importing from a module,
joblib is unable to run custom scoring functions in parallel.
Prompted from this discussion - #10054 (comment)
ping @jnothman @amueller