-
-
Notifications
You must be signed in to change notification settings - Fork 26k
MRG adding test of fit attributes #16286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, a few comments below! Should be move the status to MRG?
'StackingRegressor', 'TfidfVectorizer', 'VotingClassifier', | ||
'VotingRegressor'] | ||
if Estimator.__name__ in IGNORED or Estimator.__name__.startswith('_'): | ||
pytest.xfail( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe pytest.skip
here since realistically we don't intend to make these work in the future.
X_reg -= X_reg.min() | ||
|
||
if is_classifier(est): | ||
X, y = X_classif, y_classif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe call make_classification
/ regression
here only when necessary, and then,
X -= X.min()
once.
est.fit(X, y) | ||
|
||
for attr in attributes: | ||
desc = ' '.join(attr.desc).lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could add a comment what this checks for since it's not very clear after reading the ode.
continue | ||
if attr.startswith('_'): | ||
continue | ||
assert attr in fit_attr_names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here it might be better to the filtered fit_attr
(with removed private attributes and known exceptions), than do,
undocumented_attrs = set(fit_attr_names).difference(fit_attr)
assert not undocumented_attrs, "Undocumented attributes: {}".format(undocumented_attrs)
that way all the undocumented attributes are printed at once, and the user doesn't have to iteratively run this test.
fit_attr = [k for k in est.__dict__.keys() if k.endswith('_')] | ||
fit_attr_names = [attr.name for attr in attributes] | ||
for attr in fit_attr: | ||
if attr in ['X_offset_', 'X_scale_', 'fit_', 'partial_fit_', 'x_mean_', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe let's add a comment that these should be removed from the public API.
…uded in new test for doc of all attributes
…tion of tag requires_positive_X broke test of sklearn/tests/test_common.py
…into doc_check_attributes_fit
thx @judithabk6 for taking over. @rth I addressed your last comments. I think it's good enough from my end. more reviews are welcome |
@judithabk6 @agramfort the failing test is related to #16545 : could you please sync with upstream? This will hopefully solve the issue. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments otherwise LGTM, thanks!
est.k = 2 | ||
|
||
if Estimator.__name__ == 'DummyClassifier': | ||
est.strategy = "stratified" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following might work for a larger number of estimators,
from sklearn.utils.estimator_checks import _construct_instance, _set_checking_parameters
est = _construct_instance(Estimator)
_set_checking_parameters(est)
Not asking to do it now, I can change it in a follow up PR.
'SkewedChi2Sampler'} | ||
if Estimator.__name__ in IGNORED: | ||
pytest.xfail( | ||
reason="Classifier has too many undocumented attributes.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: we can now also put these in _xfail_test
estimator tag for individual estimators (https://scikit-learn.org/dev/developers/develop.html#estimator-tags) but it's not critical. Not asking to do it.
Now tests fails because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, assuming CI passes. Thanks!
Merging +1 as this is fairly low risk (extends an existing test). Thanks! |
Yay! |
What does this implement/fix? Explain your changes.
aims to makes sure that all attributes that appear in the fit are documented and vice versa.