[MRG] run check_estimator on meta-estimators #9741
Conversation
some rebase :-/ I'll fix the history once #9716 is merged.
I guess we need a tag to skip the "modify init parameter" check as long as Pipeline and FeatureUnion violate the API?
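For reference, a minimal sketch of how such a skip could look with the estimator-tag infrastructure that later landed in scikit-learn. The coarse-grained `_skip_test` tag shown here is real in later releases; a tag scoped to just the "modify init parameter" check would be a new addition:

```python
from sklearn.base import BaseEstimator

class MyMetaEstimator(BaseEstimator):
    """Hypothetical meta-estimator opting out of the common tests."""

    def _more_tags(self):
        # `_skip_test` makes the common-test machinery skip this
        # estimator wholesale; a finer-grained tag for one specific
        # check would have to be added to the tag set.
        return {"_skip_test": True}
```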
Commits:
- remove outdated comment
- fix also for FeatureUnion
- [MRG+2] Limiting n_components by both n_features and n_samples instead of just n_features (Recreated PR) (scikit-learn#8742)
- [MRG+1] Remove hard dependency on nose (scikit-learn#9670)
- MAINT Stop vendoring sphinx-gallery (scikit-learn#9403)
- CI upgrade travis to run on new numpy release (scikit-learn#9096)
- CI Make it possible to run doctests in .rst files with pytest (scikit-learn#9697)
  * doc/datasets/conftest.py to implement the equivalent of nose fixtures
  * add conftest.py in root folder to ensure that sklearn local folder is used rather than the package in site-packages
  * test doc with pytest in Travis
  * move custom_data_home definition from nose fixture to .rst file
- [MRG+1] avoid integer overflow by using floats for matthews_corrcoef (scikit-learn#9693)
  * Fix bug#9622: avoid integer overflow by using floats for matthews_corrcoef
  * matthews_corrcoef: cosmetic change requested by jnothman
  * Add test_matthews_corrcoef_overflow for Bug#9622
  * test_matthews_corrcoef_overflow: clean-up and make deterministic
  * matthews_corrcoef: pass dtype=np.float64 to sum & trace instead of using astype
  * test_matthews_corrcoef_overflow: add simple deterministic tests
- TST Platform independent hash collision tests in FeatureHasher (scikit-learn#9710)
- TST More informative error message in test_preserve_trustworthiness_approximately (scikit-learn#9738)
- add some rudimentary tests for meta-estimators
- fix extra whitespace in error message
- add missing if_delegate_has_method in pipeline
- don't test tuple pipeline for now
- only copy list if not list already? doesn't seem to help?
Force-pushed from 5cb5323 to 307c360.
@jnothman do you have an opinion on what to do about the "modifying init param in fit" failure for pipelines? Should we create a work-around or start the worst deprecation cycle yet and introduce steps_?
I don't think we can avoid the "modifying init param in fit" failure for a while.
The deprecation is a 1.0 sort of an issue. As I've said before, it's best enabled by a future parameter, rather than a straightforward deprecation. But first we have to be sure we can handle all common use-cases for pipelines in an intuitive way. Pipeline(other_pipeline.steps[:-1]) is currently straightforward whether or not other_pipeline is fitted, for example.
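As a concrete illustration of that use-case (the names here are just for the sketch):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

other_pipeline = Pipeline([('scale', StandardScaler()),
                           ('clf', LogisticRegression())])
# Reuse everything but the final step; this works today whether or not
# other_pipeline has been fitted, because `steps` holds the live objects.
head = Pipeline(other_pipeline.steps[:-1])
```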
Ok, so skip that test? I'll go back to finishing up the tags then ;)
Depends if you see that as a bug or a feature. If one passes an estimator to a pipeline, naively I would expect that estimator to be fitted when the pipeline is fit, as is happening now, not to have some separate cloned instance created inside the pipeline. At least that's what happens in DL libraries, I think, and it makes more sense to me, aside from the fact that it doesn't respect the scikit-learn API contract. Was that contract designed with pipelines in view, though?
I think there's a separate issue to track that, #8157. Maybe it makes sense to figure out a short-term solution here for the tests?
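A small sketch of the behaviour described two comments up (current semantics without caching: the step instance itself is fitted, no clone is made):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
svc = SVC()
pipe = Pipeline([('scaler', StandardScaler()), ('svc', svc)])
pipe.fit(X, y)

# The very instance passed in was fitted, so the init parameter `steps`
# now holds estimators whose state changed during fit -- which is what
# the common check complains about.
assert pipe.named_steps['svc'] is svc
assert hasattr(svc, 'support_')
```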
# Conflicts:
#   sklearn/pipeline.py
#   sklearn/utils/estimator_checks.py
Ok so I have no idea how to make ...
I'm hacking around the pipeline cloning issue here. The missing-value tags are a bit of an issue, and generally defining tags for pipelines will be tricky, but this is a start. I didn't add ColumnTransformer yet because there are too many issues.
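To make "tricky" concrete, a hypothetical sketch (not the implementation, and the `_get_tags` accessor is the private one from later releases): the obvious composition rule for a missing-value tag is wrong, because an early imputer changes what later steps see.

```python
def pipeline_allows_nan(pipe):
    """Hypothetical tag rule: 'all steps allow NaN' would be too strict,
    since an imputer as the first step removes NaNs before later steps
    ever see them. 'The first step decides' is closer for allow_nan."""
    first = pipe.steps[0][1]
    return first is not None and first._get_tags().get('allow_nan', False)
```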
ping @NicolasHug
```python
# grid-search
GridSearchCV(LogisticRegression(), {'C': [0.1, 1]}, cv=2),
# will fail tragically
# make_pipeline(StandardScaler(), None)
```
Is this failing because None or 'passthrough' does not support _get_tags?
There were multiple issues. The validation and duck typing in Pipeline is a mess, but I think adding at least some tests is good...
Agreed.
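A minimal sketch of one of those issues, as it can be observed directly (nothing here is specific to this PR): a None step constructs fine but exposes no estimator interface for checks to query.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), None)
# The final "estimator" is plain None: no fit/predict, no tags to query,
# so any check that interrogates the last step falls over.
print(pipe.steps[-1])  # ('nonetype', None)
```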
I was just reminded that check_is_fitted does not work with Pipeline.
I would expect the following to work as expected:

```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
```

The unfitted pipeline will error:

```python
from sklearn.utils.validation import check_is_fitted
check_is_fitted(pipe, "n_features_in_")
# NotFittedError: This Pipeline instance is not fitted yet. Call 'fit' with
# appropriate arguments before using this estimator.
```

And the following will not error:

```python
pipe.fit(X, y)
check_is_fitted(pipe, "n_features_in_")
```
Our new ...
I am confused regarding this topic. It comes up from time to time and I don't recall now what we are supposed to do (probably we don't know :)). From what I recall, we basically don't know what to validate in a Pipeline. I was under the impression that ... @adrinjalali, do you recall the latest news regarding this topic?
To me, a ... I'd just set a ...
maybe ...
So should I just submit a PR setting some random attribute in Pipeline for check_is_fitted to work?
I would be happy to review. It would be nice to have a review from @jnothman, who always thinks of side effects that I am not aware of.
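One hedged sketch of the "mark it fitted" idea, using the __sklearn_is_fitted__ protocol that check_is_fitted consults in later scikit-learn versions. This subclass is hypothetical, not the actual patch:

```python
from sklearn.exceptions import NotFittedError
from sklearn.pipeline import Pipeline
from sklearn.utils.validation import check_is_fitted

class FittedAwarePipeline(Pipeline):
    """Hypothetical: report fitted-ness by delegating to the last step."""

    def __sklearn_is_fitted__(self):
        # check_is_fitted(self) calls this method instead of scanning
        # for trailing-underscore attributes, so no dummy attribute is
        # needed on the pipeline itself.
        try:
            check_is_fitted(self.steps[-1][1])
            return True
        except NotFittedError:
            return False
```

With something like this, check_is_fitted(pipe) works without naming an explicit attribute such as "n_features_in_".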
Let's go for it. I ran into the pipeline + check_is_fitted issue.
The other PR only checks for ...
Indeed, sorry. It needs an update, though. Let's reopen so we don't forget.
@adrinjalali We already run the common tests on many meta-estimators: see sklearn/tests/test_common.py, line 150 and line 358 (at commit 006ccdb). I think this PR can be closed.
Fixes #9443.
After #9716 I think it's time to add these checks.
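For context, a sketch of what "these checks" amount to, assuming check_estimator is called on a constructed instance (as later releases require):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.utils.estimator_checks import check_estimator

# Run the common API checks on a meta-estimator instance, the same way
# they are run on plain estimators.
check_estimator(GridSearchCV(LogisticRegression(), {'C': [0.1, 1]}, cv=2))
```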