Skip to content

Conversation

glemaitre
Copy link
Member

closes #26482
closes #26691
superseded #26691

While reviewing #26691, we saw that there is a coupling between the parameter dictionary in SparseCoder and X. Therefore, the simplest way to not complicate the common tests is to run them at the level of the module and inherit the SparseCoder to magically set the dictionary attribute.

Copy link

github-actions bot commented Nov 4, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e7e16f3. Link to the linter CI: here

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions for improvement but otherwise, LGTM.

@property
def n_features_in_(self):
"""Number of features seen during `fit`."""
return self.dictionary.shape[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we still expose n_features_in_ when getattr(self, "dictionary", None) is not None?

This would be less disruptive and would not require to change test_sparse_coder_n_features_in (also both cases could be tested instead: with and without calling fit).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I can easily preserve the behaviour with:

    @property
    def n_features_in_(self):
        """Number of features seen during :term:`fit`."""
        return getattr(self, "_n_features_in_", self.dictionary.shape[1])

    @n_features_in_.setter
    def n_features_in_(self, value):
        self._n_features_in_ = value

However, it will break our API and more precisely the check_n_features_in since n_features_in_ should not be set if fit is not called.

I don't know what is best here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense for fit to set n_features_in_, and that would also satisfy our API test, i.e., no property kinda thing.

- |Fix| :class:`decomposition.SparseCoder` now follows the transformer API of
scikit-learn. This class is defined as a stateless transformer meaning that it is not
required to call `fit` before calling `transform`. Note however that parameter
validation only happens at `fit` time as for other scikit-learn estimators.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and input validation (feature names, feature counts)

@property
def n_features_in_(self):
"""Number of features seen during `fit`."""
return self.dictionary.shape[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense for fit to set n_features_in_, and that would also satisfy our API test, i.e., no property kinda thing.

Comment on lines +1033 to +1038
SparseCoder is fundamentally stateless and requires a dictionary parameter
as constructor parameter which makes it challenging to run the usual transformer
checks. This stateful subclass instead creates a random dictionary that is
compatible with the input data dimensions passed to fit to benefit from the
test coverage provided by the usual stateful transformer checks.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O_o, interesting. When we work on allowing custom data generators for tests, this would then probably be void.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we should not have that. I think that now that we have metadata routing, we should instead pass dictionary as a metadata and not as an __init__ parameter. This will avoid this subclassing.

However, we need to have something that generate the necessary metadata thought.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy if we deprecate it on the constructor and move it to the relevant method here as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Run common test for SparseCoder and FeatureUnion
3 participants