[MRG] Better error message for MiniBatchKMeans partial_fit when number of features changes #12431

Closed

Conversation

jeremiedbb
Member

Fixes #12430.
It's all in the title. I don't think a test is needed for this.
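
The fix is roughly of the following shape (an illustrative sketch, not the actual diff from this PR; the toy class and the `cluster_centers_`-based guard are assumptions about how such a check can be written):

```python
import numpy as np

class _PartialFitSketch:
    """Toy stand-in for the guard added to MiniBatchKMeans.partial_fit."""

    def partial_fit(self, X):
        X = np.asarray(X)
        # On any call after the first, the number of columns must match
        # what the already-fitted centers expect; fail with an explicit
        # message instead of letting euclidean_distances complain later.
        if hasattr(self, "cluster_centers_") and \
                X.shape[1] != self.cluster_centers_.shape[1]:
            raise ValueError(
                "Number of features %d does not match previous data %d."
                % (X.shape[1], self.cluster_centers_.shape[1])
            )
        # Stand-in for the real fitting logic.
        self.cluster_centers_ = np.zeros((3, X.shape[1]))
        return self
```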

@eamanu
Contributor

eamanu commented Oct 22, 2018

Aren't there any tests that assert this error is raised?

@jeremiedbb
Member Author

Yes, they are in the common tests.

@amueller
Member

I don't understand the statement about tests. If this is tested, how did the tests pass before?

@jeremiedbb
Member Author

There is a common test that checks that any estimator with a partial_fit method raises an error when the number of features changes between calls.
An error is actually raised, so the test passes (see #12430). My point was just that the message could be clearer, because the actual message comes either from numpy (a broadcasting failure) or from a function called inside partial_fit, with an error message that refers to the parameters of that function.

My comment about not needing a test was unclear: I meant that we don't need a new test for the change I made, because this behavior is already covered by the common tests.
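
For context, the common test being referred to follows roughly this pattern (an editor's sketch; the real check lives in sklearn/utils/estimator_checks.py and is more involved, and check_partial_fit_n_features is a made-up name):

```python
import numpy as np
import pytest
from sklearn.cluster import MiniBatchKMeans

def check_partial_fit_n_features(estimator):
    rng = np.random.RandomState(0)
    X = rng.rand(50, 10)
    estimator.partial_fit(X)  # first call establishes 10 features
    with pytest.raises(ValueError):
        # A second call with one column dropped must raise *some* error;
        # the message itself is not checked here.
        estimator.partial_fit(X[:, :-1])

check_partial_fit_n_features(MiniBatchKMeans(n_clusters=3))
```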

@amueller
Member

Can we change the common test to also check for a good error message?

@jeremiedbb
Member Author

jeremiedbb commented Oct 23, 2018

> Can we change the common test to also check for a good error message?

I totally agree. Below are the error messages we get when the number of features changes between calls to partial_fit:

MultinomialNB:  "Number of features 9 does not match previous data 10."
BernoulliNB:  "Number of features 9 does not match previous data 10."
Perceptron:  "Number of features 9 does not match previous data 10."
SGDClassifier:  "Number of features 9 does not match previous data 10."
PassiveAggressiveClassifier:  "Number of features 9 does not match previous data 10."
MLPClassifier:  "shapes (100,9) and (10,100) not aligned: 9 (dim 1) != 10 (dim 0)"
SGDRegressor:  "Number of features 9 does not match previous data 10."
MLPRegressor:  "shapes (100,9) and (10,100) not aligned: 9 (dim 1) != 10 (dim 0)"
PassiveAggressiveRegressor:  "Number of features 9 does not match previous data 10."
MiniBatchKMeans:  "Incompatible dimension for X and Y matrices: X.shape[1] == 9 while Y.shape[1] == 10"
Birch:  "Training data and predicted data do not have same number of features."
MiniBatchDictionaryLearning:  "shapes (9,10) and (9,100) not aligned: 10 (dim 1) != 9 (dim 0)"
IncrementalPCA:  "operands could not be broadcast together with shapes (10,) (9,)"
LatentDirichletAllocation:  "The provided data has 9 dimensions while the model was trained with feature size 10."
StandardScaler:  "operands could not be broadcast together with shapes (10,) (9,)"
MinMaxScaler:  "operands could not be broadcast together with shapes (10,) (9,)"
MaxAbsScaler:  "operands could not be broadcast together with shapes (10,) (9,)"

About two thirds have an informative message, because it comes directly from the partial_fit method. For the others, the message comes from a function called inside partial_fit (e.g. MiniBatchKMeans's partial_fit calls euclidean_distances) or even from a numpy broadcasting error.

It would be nice to have a consistent message across all estimators. The way I see it is to add a function in utils/validation, something like check_X_partial_fit, that checks that the number of features is consistent between calls, and to use that check inside all partial_fit methods.

Do you think that's the right way to do it?
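
A rough sketch of what such a helper could look like (check_X_partial_fit is only the name proposed above; this exact function never landed in scikit-learn, and the n_features_ bookkeeping attribute below is hypothetical):

```python
def check_X_partial_fit(X, estimator):
    """Illustrative sketch: raise a consistent error if the number of
    features differs from previous calls to partial_fit."""
    n_features = X.shape[1]
    expected = getattr(estimator, "n_features_", None)  # hypothetical attribute
    if expected is not None and n_features != expected:
        raise ValueError(
            "Number of features %d does not match previous data %d."
            % (n_features, expected)
        )
    estimator.n_features_ = n_features  # remember for the next call
```

Each partial_fit method would then call check_X_partial_fit(X, self) right after validating X.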

@amueller
Member

Yes, consistent messages would be nice and would allow us to test for them.
I'm not entirely sure a helper function is the right way to go if it's a single line, but it would be a way to make sure the message stays consistent.
So maybe implement the helper function and change the common tests to check for it.
It would also be nice if Birch actually showed the numbers in its message.
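
The corresponding change to the common test could then pin the wording, not just the exception type (a sketch assuming the message produced by the helper above; check_partial_fit_n_features_message is a made-up name):

```python
import numpy as np
import pytest

def check_partial_fit_n_features_message(estimator):
    rng = np.random.RandomState(0)
    X = rng.rand(50, 10)
    estimator.partial_fit(X)
    # Estimators that have not adopted the consistent message would
    # fail here, which is exactly what the test is meant to enforce.
    with pytest.raises(ValueError, match="does not match previous data"):
        estimator.partial_fit(X[:, :-1])
```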

@jeremiedbb
Member Author

jeremiedbb commented Oct 26, 2018

@amueller I implemented that in #12465. We can move the discussion there.
I'll close this PR if the other one turns out to be a good solution.

@amueller added the Superseded (PR has been replaced by a newer PR) label on Aug 5, 2019
@jeremiedbb
Member Author

Closing. Superseded by n_features_in.
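
For reference on the superseding mechanism: since scikit-learn 0.23, estimators record the number of features seen at the first fit or partial_fit in an n_features_in_ attribute and validate later calls against it, which yields the consistent message this PR was after. A quick illustration (exact wording depends on the version):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)
km = MiniBatchKMeans(n_clusters=3)
km.partial_fit(rng.rand(50, 10))
print(km.n_features_in_)         # 10, recorded at the first call
km.partial_fit(rng.rand(50, 9))  # ValueError: X has 9 features, but
                                 # MiniBatchKMeans is expecting 10 ...
```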

@jeremiedbb jeremiedbb closed this Jul 20, 2020
Labels: module:cluster, Superseded

Linked issue: MiniBatchKMeans partial_fit uninformative error message when number of features changes (#12430)