[WIP] Test determinism of estimators #7270

Closed
wants to merge 5 commits

Conversation

@betatim
Member

betatim commented Aug 27, 2016

Reference Issue

Fixes #7139

What does this implement/fix? Explain your changes.

Fit two instances of the same estimator on a toy problem, use both to predict on an unseen subset of the data, and compare the predictions. If an estimator has a random_state argument it is set explicitly; otherwise it is left untouched.
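A minimal sketch of the idea, using a single illustrative estimator and toy dataset rather than the actual loop over all_estimators in this PR:

    from numpy.testing import assert_array_almost_equal
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=200, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.5, random_state=0)

    # Two independently constructed instances, seeded identically because
    # this estimator exposes a random_state parameter.
    est1 = RandomForestClassifier(random_state=42).fit(X_train, y_train)
    est2 = RandomForestClassifier(random_state=42).fit(X_train, y_train)

    # A deterministic estimator must produce (numerically) identical
    # predictions on data it has never seen.
    assert_array_almost_equal(est1.predict(X_test), est2.predict(X_test))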

Any other comments?

There are a few estimators that I am skipping at the moment because they fail the test. Most are blatantly non-deterministic, but two or so fail with a different error.

I'm unsure about the location: should this be a check in utils/estimator_checks.py instead?

  • move to check_deterministic in utils/estimator_checks.py
  • transformers
  • unsupervised algorithms
  • why do HuberRegressor, LogisticRegressionCV, LinearRegression, RANSACRegressor fail?
  • why does RadiusNeighborsClassifier fail? Failure mode is different to the above

Created a test that fits two copies of each estimator and compares their predictions on unseen data.

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5)

for name, Estimator in all_estimators(type_filter=['classifier',
                                                   'regressor']):
Member Author

transformers and unsupervised still to do

est1.fit(X_train, y_train)
est2.fit(X_train, y_train)

assert_array_almost_equal(est1.predict(X_test),
                          est2.predict(X_test))
Member
@TomDLT Aug 29, 2016

Shouldn't this be assert_array_equal instead of assert_array_almost_equal?

Member Author

This works for classifiers and regressors; for the latter we need almost equal.

Member

But can we call a method deterministic if it does not always give exactly the same result?

Member

On a machine with finite floating-point precision? I would.
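As an aside, the practical difference between the two assertions (a generic NumPy illustration, not code from this PR):

    import numpy as np
    from numpy.testing import assert_array_almost_equal, assert_array_equal

    a = np.array([0.1 + 0.2])        # 0.30000000000000004
    b = np.array([0.3])

    assert_array_almost_equal(a, b)  # passes: equal to 6 decimals by default

    try:
        assert_array_equal(a, b)     # exact comparison trips over the last bits
    except AssertionError:
        print("not bitwise equal")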

@TomDLT
Member

TomDLT commented Aug 29, 2016

Note that for CV estimators (ElasticNetCV, ...), when no cv parameter is given, then the random state is not passed to the cross-validation split, which may lead to non-deterministic behaviors.
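For context, the splitting can be pinned down explicitly by passing a seeded cv object; this is an illustrative workaround, not something the PR does:

    from sklearn.linear_model import ElasticNetCV
    from sklearn.model_selection import KFold

    # An explicit, seeded splitter removes any ambiguity about how the
    # internal cross-validation folds are generated.
    model = ElasticNetCV(cv=KFold(n_splits=5, shuffle=True, random_state=0))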

@betatim
Member Author

betatim commented Aug 29, 2016

Note that for CV estimators (ElasticNetCV, ...), when no cv parameter is given, then the random state is not passed to the cross-validation split, which may lead to non-deterministic behaviors.

Interesting, it seems that except for LogisticRegressionCV the CV estimators are passing this test. Should I consider that a bug in the test?

@jnothman
Member

jnothman commented Aug 29, 2016

Do we have good reason to believe there are multiple estimators that should fail such a test in scikit-learn?

In terms of the specifics, equivalence of predict is going to be a very weak test. Classifiers, for instance, may have subtly different boundaries, but produce the same predictions. The assert_model_equal helper proposed at #4841 might be more robust as a test.

You might also consider whether some datasets are more likely to elicit variation than others. For example, I'd consider using a dataset with a random target, so that a meaningful classifier is unlikely to be learnable.
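A sketch of the stronger comparison being suggested, on a dataset with a random target; the estimator is a placeholder and the assert_model_equal helper from #4841 is not used here:

    import numpy as np
    from numpy.testing import assert_array_almost_equal
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    X = rng.randn(100, 5)
    y = rng.randint(0, 2, size=100)   # random target: nothing meaningful to learn

    clf1 = LogisticRegression(random_state=0).fit(X, y)
    clf2 = LogisticRegression(random_state=0).fit(X, y)

    # decision_function is stricter than predict: two fits can have slightly
    # different boundaries yet still agree on every predicted label.
    assert_array_almost_equal(clf1.decision_function(X),
                              clf2.decision_function(X))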

y_train = np.reshape(y_train, (-1, 1))
y_test = np.reshape(y_test, (-1, 1))

needs_state = 'random_state' in signature(Estimator.__init__).parameters
Member

use get_params()

Member Author

You end up with something like needs_state = 'random_state' in Estimator().get_params(). Is that really nicer than inspecting the __init__ arguments?

Member

there's a helper set_random_state in the estimator_checks.
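For reference, a minimal use of that helper; its import path has moved over time (sklearn.utils.testing at the time of this PR, sklearn.utils._testing in current releases):

    from sklearn.svm import SVC
    from sklearn.utils._testing import set_random_state  # sklearn.utils.testing in 2016

    est = SVC(probability=True)
    # Sets est.random_state only if the estimator actually exposes that
    # parameter, so the check does not need to inspect __init__ itself.
    set_random_state(est, random_state=0)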

@jnothman
Member

I should add: without something like assert_model_equal, decision_function remains a more powerful test for classifiers than predict.

@betatim
Member Author

betatim commented Aug 29, 2016

Do we have good reason to believe there are multiple estimators that should fail such a test in scikit-learn?

Given @TomDLT's comment about the *CV estimators not passing on random_state, I would expect all of them to fail this test. Without a fixed CV split they should give statistically equivalent but not identical estimators. IMHO this test should test for identity, not statistical equivalence.

I like the idea of the random dataset.

@jnothman
Member

The other thing that comes to mind here is that we may have false negatives if, as is often the case, a parameter controls the use of randomness. This is a limitation of check_estimator altogether, which assumes an estimator class is homogeneous in its behaviour regardless of parameters.


@TomDLT
Member

TomDLT commented Aug 29, 2016

The other thing that comes to mind here is that we may have false negatives if, as is often the case, a parameter controls the use of randomness.

Starting with the different solvers that may exist in one estimator.

@amueller
Member

On 08/29/2016 06:05 AM, Tom Dupré la Tour wrote:

Note that for CV estimators (ElasticNetCV, ...), when no cv parameter is given, then the random state is not passed to the cross-validation split, which may lead to non-deterministic behaviors.

All the default ones use deterministic splitting.
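A quick illustration of that point (generic example): with the default shuffle=False, KFold returns identical folds on every call.

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(20).reshape(10, 2)
    cv = KFold(n_splits=5)  # shuffle=False by default

    folds_a = [test for _, test in cv.split(X)]
    folds_b = [test for _, test in cv.split(X)]
    assert all(np.array_equal(a, b) for a, b in zip(folds_a, folds_b))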

@amueller
Member

On 08/29/2016 08:08 AM, Joel Nothman wrote:

The other thing that comes to mind here is that we may have false negatives if, as is often the case, a parameter controls the use of randomness. This is a limitation of check_estimator altogether, which assumes an estimator class is homogeneous in its behaviour regardless of parameters.

Yeah, we could remedy that with stronger annotations. If an estimator has several "modes" (like solvers) we could provide dictionaries that enumerate them. That's sort of heavy-handed, though. The definition would basically be "the set of parameters such that trying all of them explores all paths". Though there could still be data-dependent paths in the code that are harder to explore.

If we "just" described the legal space of all parameters in a more formal way (like Python 3 type annotations) and assumed that only categorical variables change the semantics of a model, then we could also try all combinations. That would probably not be a unit test, though.
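A hypothetical sketch of such an annotation; the DETERMINISM_MODES dictionary and the helper below are invented for illustration and do not exist in scikit-learn:

    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical: enumerate the parameter settings that exercise distinct
    # code paths ("modes") of each estimator.
    DETERMINISM_MODES = {
        LogisticRegression: [{"solver": "liblinear"}, {"solver": "saga"}],
        DecisionTreeClassifier: [{"splitter": "best"}, {"splitter": "random"}],
    }

    def iter_estimator_modes():
        """Yield one seeded estimator instance per annotated setting."""
        for Estimator, settings in DETERMINISM_MODES.items():
            for params in settings:
                yield Estimator(random_state=0, **params)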

@betatim
Member Author

betatim commented Aug 30, 2016

IMHO having this test is useful, even if it only covers the default argument case (perfection being the enemy of good or some such).

LinearRegressionCV uses StratifiedKFold without passing on random_state. ElasticNetCV (and friends) use KFold without shuffling (-> deterministic). Does someone remember the reasoning behind this choice? Can we either change check_cv to pass through a random_state, or change LinearRegressionCV to use KFold (without shuffling)?
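For reference, check_cv is the helper that selects the default splitter; whether it returns StratifiedKFold or KFold depends on the classifier flag and the target (a generic illustration; the default number of folds has changed across versions):

    import numpy as np
    from sklearn.model_selection import check_cv

    y_clf = np.array([0, 1, 0, 1, 0, 1])
    y_reg = np.array([0.1, 1.2, 0.3, 1.4, 0.5, 1.6])

    print(check_cv(cv=3, y=y_clf, classifier=True))    # StratifiedKFold(n_splits=3, ...)
    print(check_cv(cv=3, y=y_reg, classifier=False))   # KFold(n_splits=3, ...)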

@jnothman
Member

It's pretty weird that Regression uses stratified CV...? And certainly random_state should be passed on. I'd be interested to see if checking pickle equality gives the same results though.

@amueller
Member

LinearRegressionCV uses stratified? That's a bug.
Not using shuffling by default was a choice made to show users if they have correlations in their data.

@amueller amueller added this to the 0.19 milestone Nov 16, 2016
@jnothman jnothman modified the milestones: 0.20, 0.19 Jun 13, 2017
@glemaitre glemaitre removed this from the 0.20 milestone Jun 13, 2018
@glemaitre glemaitre added this to the 0.21 milestone Jun 13, 2018
@jnothman
Member

I think several comments here are red herrings.

Note that for CV estimators (ElasticNetCV, ...), when no cv parameter is given, then the random state is not passed to the cross-validation split, which may lead to non-deterministic behaviors.

Our default CV splitters (KFold, StratifiedKFold) are not randomised.

Do we want to continue on with this? Marking as Stalled.

@jjerphan
Member

Do we want to continue on with this?

@jnothman: I am interested in carrying on if this is still relevant.

Base automatically changed from master to main January 22, 2021 10:49
@cmarmo cmarmo added the Needs Decision label and removed the help wanted label Aug 15, 2022
@betatim betatim closed this by deleting the head repository Sep 2, 2022

Successfully merging this pull request may close these issues.

add common test: fixing random_state makes algorithms deterministic.
7 participants