-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] Improve the error message for some metrics when the shape of sample_weight is inappropriate #9903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
That's pretty nice you did it via the common tests, well done! LGTM. A small tip for PEP8, you should spend some time configuring your editor to have on-the-fly flake8 checks, this saves some time in the long run! Another small tip you can link to a given comment in an issue e.g. #9786 (comment) (I edited your message accordingly). This makes it a lot easier to find the comment you are referring to. Note what I had originally in mind was something a bit more generic than just doing it for metrics i.e. checking that each time a function signature had |
@lesteve Thanks a lot for the detailed instruction :)
I think at least it won't be difficult if we go through all the public functions. I'll do this in a couple of days and open PR/issue if necessary. |
Indeed ! Thanks @qinhanmin2014 |
…sample_weight is inappropriate (#9903)
ping @lesteve I tried my best to go through all the public methods (still hard to guarantee the completeness).
We do not always use check_consistent_length to check the shape of sample_weight. Sometimes, we use an if statement, it enables us to raise even more informative error message than check_sample_weight, e.g., Shapes of X and sample_weight do not match. Overall, I do not find any public function that can run through with inappropriate shape of sample_weight, but there are indeed some functions which do not check the shape of sample_weight, thus resulting in unclear error message. These include DummyClassifier
KernelRidge
LinearRegression, RANSACRegressor, Ridge, RidgeClassifier
MultinomialNB, ComplementNB, BernoulliNB Below are the modules I go through which seems related to sample_weight. Some modules, e.g., sklearn.ensemble, seems to pass sample_weight directly to the underlying estimators, thus are not included. sklearn.cluster, sklearn.dummy, sklearn.isotonic
sklearn.kernel_ridge, sklearn.linear_mode, sklearn.naive_bayes
sklearn.svm, sklearn.tree WDYT? Does it worth a fix? Thanks :) Update: I have opened #9926 for further discussion. |
…sample_weight is inappropriate (scikit-learn#9903)
…sample_weight is inappropriate (scikit-learn#9903)
Reference Issue
proposed by @lesteve in #9786 (comment).
Fixes #9870
What does this implement/fix? Explain your changes.
Currently, many metrics do not explicitly check the shape of sample_weight. Instead, they rely on certain statement to block the code from running through, this may cause:
(1)Users can't get meaningful error message (e.g., now you may get
Axis must be specified when shapes of a and weights differ
or evenoperands could not be broadcast together with shapes (2,1) (3,1)
)(2)Sometimes all the statements fail to block the code and you even can't get an erorr (e.g., roc_auc_score previously)
The PR fixes the problem and improves the common test to ensure that meaningful error message is raised by all metrics with sample_weight.
Any other comments?
cc @jnothman @lesteve