Skip to content

Conversation

FrancoisPgm
Copy link
Contributor

@FrancoisPgm FrancoisPgm commented Sep 2, 2025

Reference Issues/PRs

Towards #26482
See also #27724 and #26691 which appear to be stalled.
Closes #27724
Closes #26691

What does this implement/fix? Explain your changes.

This runs the common estimator tests on SparseCoder.

To match the behavior expected by the common tests, n_components_ and n_features_in_ are changed from properties to attributes initialized in the fitmethod. validate_data is run in fit.

Specific dictionary arguments for checks are added in PER_ESTIMATOR_CHECK_PARAMS.

To be able to use a specific dictionary in check_set_output_transform, _yield_instances_for_check is used in test_set_output_transform.

Copy link

github-actions bot commented Sep 2, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: fbb9dbf. Link to the linter CI: here

FrancoisPgm and others added 2 commits September 2, 2025 15:42
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
Comment on lines -1397 to -1401
@property
def n_features_in_(self):
"""Number of features seen during `fit`."""
return self.dictionary.shape[1]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: With this modification, n_features_in_ is now only set if fit is called. It makes this estimator follow our API and in line with this discussion #27724 (comment)

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of suggestions for the docstrings. Looks good otherwise. Please also add a changelog entry. You can say that SparseCoder now follows the transformer API of scikit-learn and that now parameter and input validation is executed in fit (you can follow these instructions to write a changelog fragment https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md)

FrancoisPgm and others added 3 commits September 3, 2025 09:59
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks @FrancoisPgm

@jeremiedbb jeremiedbb added the Waiting for Second Reviewer First reviewer is done, need a second one! label Sep 3, 2025
@jeremiedbb
Copy link
Member

maybe @adrinjalali for a second review since you reviewed the previous attempt #27724.

…ancement.rst

Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
@@ -711,6 +716,38 @@
],
},
SkewedChi2Sampler: {"check_dict_unchanged": dict(n_components=1)},
SparseCoder: {
"check_estimators_dtypes": dict(dictionary=rng.normal(size=(5, 5))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a single case where we can set to have it pass all the tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dictionary is not a very friendly parameter because it needs to have a shape compatible with X, but all the checks have different Xs

@@ -356,6 +360,7 @@ def test_set_output_transform(estimator):
f"Skipping check_set_output_transform for {name}: Does not support"
" set_output API"
)
estimator = next(_yield_instances_for_check(check_set_output_transform, estimator))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to be a nice catch, but I think we should then go through all instances for this test, instead of only the first one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed, for the sparsecoder right now it's always one instance but it can be more. I'll add a for loop to go through all instances.

@@ -0,0 +1,2 @@
- :class:`decomposition.SparseCoder` now follows the transformer API of scikit-learn.
In addition, the :meth:`fit` method now validates the input and parameters.
Copy link
Member

@jeremiedbb jeremiedbb Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reference yourself as author of the PR: "By :user:`your name <your handle>`".

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@adrinjalali adrinjalali merged commit 68218f7 into scikit-learn:main Sep 9, 2025
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants