Skip to content

[MRG+1] Improve the error message for some metrics when the shape of sample_weight is inappropriate #9903

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 11, 2017
Merged

Conversation

qinhanmin2014
Copy link
Member

@qinhanmin2014 qinhanmin2014 commented Oct 11, 2017

Reference Issue

proposed by @lesteve in #9786 (comment).

in this PR we spotted a place where check_consistent_lengths(X, y) was used where check_consistent_lengths(X, y, sample_weight) should have called it would be good to double-check that this error is not present in some other places in our codebase.

Fixes #9870

What does this implement/fix? Explain your changes.

Currently, many metrics do not explicitly check the shape of sample_weight. Instead, they rely on certain statement to block the code from running through, this may cause:
(1)Users can't get meaningful error message (e.g., now you may get Axis must be specified when shapes of a and weights differ or even operands could not be broadcast together with shapes (2,1) (3,1))
(2)Sometimes all the statements fail to block the code and you even can't get an erorr (e.g., roc_auc_score previously)
The PR fixes the problem and improves the common test to ensure that meaningful error message is raised by all metrics with sample_weight.

Any other comments?

cc @jnothman @lesteve

@lesteve
Copy link
Member

lesteve commented Oct 11, 2017

That's pretty nice you did it via the common tests, well done! LGTM.

A small tip for PEP8, you should spend some time configuring your editor to have on-the-fly flake8 checks, this saves some time in the long run!

Another small tip you can link to a given comment in an issue e.g. #9786 (comment) (I edited your message accordingly). This makes it a lot easier to find the comment you are referring to.

Note what I had originally in mind was something a bit more generic than just doing it for metrics i.e. checking that each time a function signature had sample_weight, there was a check_consistent_length(..., ..., sample_weight) call within its body. Not sure how easy this though.

@lesteve lesteve changed the title [MRG] Improve the error message for some metrics when the shape of sample_weight is inappropriate [MRG+1] Improve the error message for some metrics when the shape of sample_weight is inappropriate Oct 11, 2017
@qinhanmin2014
Copy link
Member Author

@lesteve Thanks a lot for the detailed instruction :)

Note what I had originally in mind was something a bit more generic than just doing it for metrics i.e. checking that each time a function signature had sample_weight, there was a check_consistent_length(..., ..., sample_weight) call within its body. Not sure how easy this though.

I think at least it won't be difficult if we go through all the public functions. I'll do this in a couple of days and open PR/issue if necessary.

@TomDLT
Copy link
Member

TomDLT commented Oct 11, 2017

That's pretty nice you did it via the common tests, well done! LGTM.

Indeed ! Thanks @qinhanmin2014

TomDLT pushed a commit that referenced this pull request Oct 11, 2017
@TomDLT TomDLT merged commit 6e75058 into scikit-learn:master Oct 11, 2017
@qinhanmin2014 qinhanmin2014 deleted the metric_err_message branch October 11, 2017 09:20
@qinhanmin2014
Copy link
Member Author

qinhanmin2014 commented Oct 15, 2017

ping @lesteve I tried my best to go through all the public methods (still hard to guarantee the completeness).

Note what I had originally in mind was something a bit more generic than just doing it for metrics i.e. checking that each time a function signature had sample_weight, there was a check_consistent_length(..., ..., sample_weight) call within its body.

We do not always use check_consistent_length to check the shape of sample_weight. Sometimes, we use an if statement, it enables us to raise even more informative error message than check_sample_weight, e.g., Shapes of X and sample_weight do not match.

Overall, I do not find any public function that can run through with inappropriate shape of sample_weight, but there are indeed some functions which do not check the shape of sample_weight, thus resulting in unclear error message. These include

DummyClassifier
KernelRidge
LinearRegression, RANSACRegressor, Ridge, RidgeClassifier
MultinomialNB, ComplementNB, BernoulliNB

Below are the modules I go through which seems related to sample_weight. Some modules, e.g., sklearn.ensemble, seems to pass sample_weight directly to the underlying estimators, thus are not included.

sklearn.cluster, sklearn.dummy, sklearn.isotonic
sklearn.kernel_ridge, sklearn.linear_mode, sklearn.naive_bayes
sklearn.svm, sklearn.tree

WDYT? Does it worth a fix? Thanks :)

Update: I have opened #9926 for further discussion.

maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Not clear error message for some metrics when the shape of sample_weight is inappropriate
3 participants