-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
FIX make SparseCoder pass the common tests #27724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions for improvement but otherwise, LGTM.
@property | ||
def n_features_in_(self): | ||
"""Number of features seen during `fit`.""" | ||
return self.dictionary.shape[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we still expose n_features_in_
when getattr(self, "dictionary", None) is not None
?
This would be less disruptive and would not require to change test_sparse_coder_n_features_in
(also both cases could be tested instead: with and without calling fit
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I can easily preserve the behaviour with:
@property
def n_features_in_(self):
"""Number of features seen during :term:`fit`."""
return getattr(self, "_n_features_in_", self.dictionary.shape[1])
@n_features_in_.setter
def n_features_in_(self, value):
self._n_features_in_ = value
However, it will break our API and more precisely the check_n_features_in
since n_features_in_
should not be set if fit
is not called.
I don't know what is best here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes more sense for fit
to set n_features_in_
, and that would also satisfy our API test, i.e., no property kinda thing.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
- |Fix| :class:`decomposition.SparseCoder` now follows the transformer API of | ||
scikit-learn. This class is defined as a stateless transformer meaning that it is not | ||
required to call `fit` before calling `transform`. Note however that parameter | ||
validation only happens at `fit` time as for other scikit-learn estimators. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and input validation (feature names, feature counts)
@property | ||
def n_features_in_(self): | ||
"""Number of features seen during `fit`.""" | ||
return self.dictionary.shape[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes more sense for fit
to set n_features_in_
, and that would also satisfy our API test, i.e., no property kinda thing.
SparseCoder is fundamentally stateless and requires a dictionary parameter | ||
as constructor parameter which makes it challenging to run the usual transformer | ||
checks. This stateful subclass instead creates a random dictionary that is | ||
compatible with the input data dimensions passed to fit to benefit from the | ||
test coverage provided by the usual stateful transformer checks. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O_o, interesting. When we work on allowing custom data generators for tests, this would then probably be void.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we should not have that. I think that now that we have metadata routing, we should instead pass dictionary
as a metadata and not as an __init__
parameter. This will avoid this subclassing.
However, we need to have something that generate the necessary metadata thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy if we deprecate it on the constructor and move it to the relevant method here as well.
closes #26482
closes #26691
superseded #26691
While reviewing #26691, we saw that there is a coupling between the parameter
dictionary
inSparseCoder
andX
. Therefore, the simplest way to not complicate the common tests is to run them at the level of the module and inherit theSparseCoder
to magically set thedictionary
attribute.