Skip to content

DOC Improve docstrings for scikit-learn configuration functions #31486

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

StefanieSenger
Copy link
Contributor

@StefanieSenger StefanieSenger commented Jun 5, 2025

This PR improves the docstrings of set_config, get_config, config_context, to help users understand how to use them.

Specifically:

  • explains the concept of global default configurations
  • better distinguishes between function argument defaults (None) and global defaults

Copy link

github-actions bot commented Jun 5, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: efb3fc3. Link to the linter CI: here

Comment on lines 256 to 258
It is recommended to use this context manager over :func:`set_config` for
experimental features (such as `array_api_dispatch` or `enable_metadata_routing`),
as their support is not yet consistently available across the library.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not true. Using context managers for such settings actually probably results in confusion. I'd remove the paragraph.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Genuine question, I didn't realise this? In the array API docs this is what we seem to suggest: https://scikit-learn.org/dev/modules/array_api.html#pytorch-support

Copy link
Contributor Author

@StefanieSenger StefanieSenger Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in array API, we would suggest users to use config_context and I also remember that we had talked about encouraging users to rather use config_context for metadata routing, because then they cannot forget to disable it and don't get error messages, that are hard to trace back to what caused them. But that was when it was not as widely available as it is now.

Should I maybe only remove its mention, so that metadata routing is not so exposed here? I think it's outgrowing experimental state and that maybe for it that soon wouldn't be the right advise anymore.

Suggested change
It is recommended to use this context manager over :func:`set_config` for
experimental features (such as `array_api_dispatch` or `enable_metadata_routing`),
as their support is not yet consistently available across the library.
It is recommended to use this context manager over :func:`set_config` for
experimental features (such as `array_api_dispatch`),
as their support is not yet consistently available across the library.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issues with using context managers in array API and metadata routing:

  • users might fit with array API enabled, which results in certain private attributes being stored in the same space as the input. Then down the script they forget to enable array API and predict would fail since it would convert input to numpy and try to do computation with numpy on one side, and original space of the input data on the other. We don't have a good solution for this and we just error I guess.
  • with metadata routing, users might set requests via (set_{method}_request) for instance, and then down there forget to enable routing and try a GridSearchCV assuming things get routed.

So in both cases, it's better to either enable it or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I do not necessarily agree, because usage mistakes can happen whether set_config or config_context are used, but it's not wrong to leave this recommendation out either, so I will remove this part.

About different namespaces during fit and predict:

We don't have a good solution for this and we just error I guess.

We don't error currently. I think if we would error, then the error message could point towards enabling array_api_dispatch, and then using config_context would not be a hurdle. This question is currently discussed on the bottom of issue #28668, especially from #28668 (comment) onwards.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also had an issue about it: #26083

Comment on lines 252 to 253
This context manager can be used to apply scikit-learn configuration changes locally
within a specific scope. Once the context exits, the previous global configuration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This context manager can be used to apply scikit-learn configuration changes locally
within a specific scope. Once the context exits, the previous global configuration
This context manager can be used to apply scikit-learn configuration changes locally,
within a context block. Once the context exits, the previous global configuration

not 100% on this.

Copy link
Contributor Author

@StefanieSenger StefanieSenger Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit redundant and I also wanted to educate people who don't know what a context manager is.

What about:

Suggested change
This context manager can be used to apply scikit-learn configuration changes locally
within a specific scope. Once the context exits, the previous global configuration
This context manager can be used to apply scikit-learn configuration changes
locally, limited to the scope of the with statement. Once the context exits, the
previous global configuration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about?

Suggested change
This context manager can be used to apply scikit-learn configuration changes locally
within a specific scope. Once the context exits, the previous global configuration
This context manager can be used to apply scikit-learn configuration changes within
the scope of the with statement. Once the context exits, the previous global configuration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants