Skip to content

DOC Remove scikit-learn-only docs references related to metadata routing #26747

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

DOC Remove scikit-learn-only docs references related to metadata routing #26747

wants to merge 1 commit into from

Conversation

jameslamb
Copy link
Contributor

Reference Issues/PRs

The release of scikit-learn v1.3.0 two days ago (link) included the addition of new public methods for metadata routing, specifically #24027.

That change broke documentation generation in LightGBM (https://github.com/microsoft/LightGBM), and I suspect it might affect other projects which implement scikit-learn-compatible estimators similarly. The LightGBM-specific details are documented in microsoft/LightGBM#5954 (comment), but I've included the relevant points below.

What does this implement/fix? Explain your changes.

Proposes removing reStructuredText :ref: references to scikit-learn's own documentation in public methods of classes which are intended to be inherited from by other libraries.

In LightGBM, we follow the advice in "Developing scikit-learn estimators" (link) and inherit from sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin, and sklearn.base.RegressorMixin to reduce boilerplate and keep up with changes here in scikit-learn.

#24027 introduced reStructuredText references in public methods of those classes which only resolve correctly when rendered inside scikit-learn's own docs, like this:

Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.

In LightGBM, we use sphinx.ext.autodoc to generate documentation of the lightgbm Python package, e.g. https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#. And we set the following in our Sphinx configuration (code link)

autodoc_default_options = {
    "members": True,
    "inherited-members": True,
    "show-inheritance": True,
}

With that combination, sphinx.ext.autodoc automatically includes the rendered docstrings for every class attribute and method which doesn't begin with a _, including those inherited from the class's parent classes. That means as of scikit-learn v1.3.0, Sphinx will try to render method documentation for e.g. lightgbm.sklearn.LGBMClassifier.get_metadata_routing.

Because of the scikit-learn-docs-specific references in scikit-learn's docstrings, that can lead to docs-generation errors like the following:

Warning, treated as error:
/home/runner/work/LightGBM/LightGBM/python-package/lightgbm/dask.py:docstring of lightgbm.dask.DaskLGBMClassifier:1:undefined label: 'metadata_routing'

Any other comments?

I suspect that the setup for LightGBM that I've described is one that other projects implementing scikit-learn-compatible estimators might have for their documentation + strategy for remaining compatible. That's why I think a change to the docs here in scikit-learn is preferable to the following things I could do in LightGBM to work around this:

  • explicitly add all the new APIs like get_metadata_routing and the set_*_request() methods, and define new docstrings on them, in lightgbm's source code
  • create references in lightgbm's Sphinx docs which catch those references like :ref:User Guide <metadata_routing> and re-route out to the relevant page(s) at https://scikit-learn.org/

Since the things in sklearn.base are expected to be imported by other projects creating scikit-learn-compatible estimators, specifically to reduce such boilerplate and duplicate effort across downstream projects, I think they should be free from docs references which assume they're being rendered inside scikit-learn's own documentation.

Thanks very much for your time and consideration.

@github-actions
Copy link

github-actions bot commented Jul 2, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 7ffed24. Link to the linter CI: here

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR and taking the time to explain the problem in details. I think this is a legitimate request.

WDYT @adrinjalali? Shall we use static URLs instead? Or is the text suggested in the PR enough?

@ogrisel ogrisel added this to the 1.3.1 milestone Jul 3, 2023
@ogrisel ogrisel added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label Jul 3, 2023
@thomasjpfan
Copy link
Member

This issue came up during review of metadata_routing. We decided that third party libraries should enable intersphinx to get the links to work. I opened microsoft/LightGBM#5956 to enable intersphinx in LightGBM.

Although, I am okay accepting this PR with static links, so it is easier for third party libraries.

@jameslamb
Copy link
Contributor Author

This issue came up during review of metadata_routing. We decided that third party libraries should enable intersphinx to get the links to work

Ah interesting. I didn't notice that. I guess we just got lucky that we hadn't run into this in LightGBM before, based on the mix of things we import from scikit-learn.

I think the intersphinx approach like what you proposed in microsoft/LightGBM#5956 is a really good one, as it allows for rich linking between docs (making things easier for users) while allowing projects importing from scikit-learn + using something like autodoc to not need to care about the types of issues I described at the top of this PR.

A note in https://scikit-learn.org/stable/developers/develop.html about this might help other developers of downstream libraries avoid stumbling over this the way I did... but I'll leave it to yall whether or not add that. Would totally understand if you want to keep that document free of such details.

Given all this... it's ok with me if you all choose to close this PR without merging it. I'll leave that up to you to decide. Thanks very much for your time and consideration on this issue!

@Remi-Gau
Copy link

Remi-Gau commented Jul 4, 2023

This issue came up during review of metadata_routing. We decided that third party libraries should enable intersphinx to get the links to work. I opened microsoft/LightGBM#5956 to enable intersphinx in LightGBM.

We are getting the same problem in the nilearn doc build: is this recommendation for third party mentioned somewhere in the doc? Sorry if I missed it.

@adrinjalali
Copy link
Member

I don't think we should change them to static links, since it'll be brittle and links might change. Closing this one as the solution is an easy on to implement through intersphinx.

@adrinjalali adrinjalali closed this Jul 5, 2023
@jameslamb jameslamb deleted the docs/metadata-routing branch July 5, 2023 12:59
davidt0x added a commit to davidt0x/brainiak that referenced this pull request Aug 7, 2023
The following warning generates and error in doc build.

Warning, treated as error:
/home/runner/work/brainiak/brainiak/brainiak/factoranalysis/htfa.py:docstring of brainiak.factoranalysis.htfa:1:undefined label: 'metadata_routing'

Seems to be discussed here:

scikit-learn/scikit-learn#26747
@dokato
Copy link
Contributor

dokato commented May 3, 2024

Although that really makes life difficult for people who work on secure cluster where access to external urls via intersphix is limited.

@adrinjalali
Copy link
Member

You can hook into sphinx's resolver and replace the links as you want. But that usecase is not something we can support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation module:metrics module:utils To backport PR merged in master that need a backport to a release branch defined based on the milestone.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants