[MRG] Better error message for MiniBatchKMeans partial_fit when number of features changes #12431
Conversation
Aren't there any tests that assert on this raise?
Yes, they are in the common tests.
I don't understand the statement about tests. If this is tested, how did the tests pass before?
There is a common test checking that any estimator that has a partial_fit method raises an error when the number of features changes between calls. My comment about not needing a test was unclear: I meant that the change I made doesn't need a dedicated test because it's already covered by the common tests.
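The common test described above can be sketched roughly as follows. This is a hypothetical simplification, not the actual scikit-learn check (which lives in the common-test machinery and handles many more cases): fit on 10 features, then verify that a second partial_fit call with 9 features raises ValueError.

```python
import numpy as np

def check_partial_fit_n_features(estimator):
    # Hypothetical sketch of the common test: a second partial_fit call
    # with a different number of features must raise ValueError.
    # (Unsupervised estimators would be called without y; omitted here.)
    rng = np.random.RandomState(0)
    X = rng.rand(20, 10)
    y = rng.randint(0, 2, size=20)

    estimator.partial_fit(X, y)

    try:
        estimator.partial_fit(X[:, :-1], y)  # one feature dropped
    except ValueError:
        return True  # expected: the estimator detected the mismatch
    raise AssertionError("partial_fit accepted a changed number of features")
```

The PR discussion below is about *which* ValueError message each estimator produces, not whether one is raised at all.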
Can we change the common test to check for a good error message?
I totally agree. Below are the error messages we get when the number of features changes between calls to partial_fit:

- MultinomialNB: "Number of features 9 does not match previous data 10."
- BernoulliNB: "Number of features 9 does not match previous data 10."
- Perceptron: "Number of features 9 does not match previous data 10."
- SGDClassifier: "Number of features 9 does not match previous data 10."
- PassiveAggressiveClassifier: "Number of features 9 does not match previous data 10."
- MLPClassifier: "shapes (100,9) and (10,100) not aligned: 9 (dim 1) != 10 (dim 0)"
- SGDRegressor: "Number of features 9 does not match previous data 10."
- MLPRegressor: "shapes (100,9) and (10,100) not aligned: 9 (dim 1) != 10 (dim 0)"
- PassiveAggressiveRegressor: "Number of features 9 does not match previous data 10."
- MiniBatchKMeans: "Incompatible dimension for X and Y matrices: X.shape[1] == 9 while Y.shape[1] == 10"
- Birch: "Training data and predicted data do not have same number of features."
- MiniBatchDictionaryLearning: "shapes (9,10) and (9,100) not aligned: 10 (dim 1) != 9 (dim 0)"
- IncrementalPCA: "operands could not be broadcast together with shapes (10,) (9,)"
- LatentDirichletAllocation: "The provided data has 9 dimensions while the model was trained with feature size 10."
- StandardScaler: "operands could not be broadcast together with shapes (10,) (9,)"
- MinMaxScaler: "operands could not be broadcast together with shapes (10,) (9,)"
- MaxAbsScaler: "operands could not be broadcast together with shapes (10,) (9,)"

About 2/3 have an informative message because it comes directly from the partial_fit method. For the others, the error comes from a function called deeper inside partial_fit (e.g. in MiniBatchKMeans). It would be nice to have a consistent message across all estimators. The way I see that is to add a function in utils/validation. Do you think that's the right way to do it?
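The shared validator proposed above could look something like this. The name and signature are purely illustrative (the comment elides the actual name being suggested); the point is that every estimator with partial_fit would call one helper and emit one consistent message.

```python
def check_n_features_consistency(X, n_features_fitted, estimator_name):
    # Illustrative sketch of the helper proposed for utils/validation
    # (name and signature are assumptions, not an actual scikit-learn API):
    # raise one consistent error message whenever the number of features
    # changes between calls to partial_fit.
    n_features = X.shape[1]
    if n_features != n_features_fitted:
        raise ValueError(
            "%s was previously fitted with %d features, but partial_fit "
            "got %d features." % (estimator_name, n_features_fitted, n_features)
        )
```

With such a helper, the common test could then assert on the message itself rather than just on the exception type.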
Yes, consistent messages would be nice and would allow us to test for them.
Closing. Superseded by n_features_in_.
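For context, the n_features_in_ mechanism that superseded this PR works along these lines. This is a simplified sketch with a made-up class name, not scikit-learn's actual implementation (which performs this check centrally in the estimator base class): the first fitting call records n_features_in_, and later calls compare against it and raise one consistent message.

```python
class PartialFitEstimatorSketch:
    # Simplified sketch of the n_features_in_ mechanism (class name is
    # hypothetical). The first call records the number of features seen;
    # subsequent calls with reset=False validate against it.
    def _check_n_features(self, X, reset):
        n_features = X.shape[1]
        if reset:
            self.n_features_in_ = n_features
        elif n_features != self.n_features_in_:
            raise ValueError(
                "X has %d features, but %s is expecting %d features as input."
                % (n_features, type(self).__name__, self.n_features_in_)
            )

    def partial_fit(self, X):
        # Reset only on the very first call; later calls must match.
        self._check_n_features(X, reset=not hasattr(self, "n_features_in_"))
        # ... actual incremental fitting would go here ...
        return self
```

Because the check lives in one place, every estimator gets the same message for free, which is exactly the consistency this PR was asking for.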
Fixes #12430
It's all in the title. I don't think a test is needed for this.