-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG + 1] Clarified error msg in plot_partial_dependence #7673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG + 1] Clarified error msg in plot_partial_dependence #7673
Conversation
@@ -306,8 +306,8 @@ def convert_feature(fx): | |||
l.append(feature_names[i]) | |||
names.append(l) | |||
except IndexError: | |||
raise ValueError('features[i] must be in [0, n_features) ' | |||
'but was %d' % i) | |||
raise ValueError('features[i] must be in [0, len(feature_names)={0} ) ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like there's an extra space after {0}
raise ValueError('features[i] must be in [0, n_features) ' | ||
'but was %d' % i) | ||
raise ValueError('features[i] must be in [0, len(feature_names)={0} ) ' | ||
'but was {1}'.format(len(feature_names), i)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it a bit odd that you specify that "features[i] must be in a range", but then show the user the value of just "i" in the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't show features[i]
in this case, since i
is out of range of the array. However, i
is the value from features[i]
that caused the error because of the looping above.
It was a bit of a tricky issue here; perhaps the error message needs further revision. Just to make sure I understand correctly, features
is an array of indices for the features whose names correspond to the elements of feature_names
at features[i]
. My goal here is to work out a way to show a message such that a user who hasn't fully read the documentation would understand it. Along those lines, showing the value for len(feature_names)
is also a bit confusing, since both seem like they should contain at most the n_features
elements (with the further restriction that features[i]
is also bound to [0, n_features)
). If a user didn't specify a name for every feature, then it seems this value would be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see. perhaps @tetraptych would have an opinion about this, since he had to go through the process of debugging it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bit of history here that's missing, because I picked this up from an abandoned PR. The error message that I have here is, more or less, the same as that proposed in the previous PR, #7609 .
I think the original error message is (partially) correct, we just need to add some more description to it about what's going on here. It seems to me that there are 2 conditions that features[i]
has to satisfy:
- 0 <=
features[i]
<n_features
because you can't enumerate more features than exist - 'features[i]' <
len(feature_names)
becausefeatures[i]
contains indices tofeature_names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for linking that to me. yeah, I agree that the error message is mostly there. It just seemed extremely ambiguous to me when I first read it (which made me question its value to a user), but it's obvious in hindsight after I've seen the code in question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's slightly confusing that the i
in features[i]
is not i
as far as I can tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not features contains {} which is too large for feature_names
? or something like that?
Maybe "All entries of features must be <len(feature_names)={}, got {}".format(len(feature_names), features[i])"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing something, after the features
is converted by the prior code, the looping below causes i
to be a value from the original features
array that was passed in (if features
containing integer values).
for fxs in features:
l = []
# explicit loop so "i" is bound for exception below
for i in fxs:
l.append(feature_names[i])
names.append(l)
except IndexError:
...
From a clarity standpoint, part of the issue may come from reassigning the features
parameter that was passed in to tmp_features
. After that, features[i]
in the code takes a different meaning than what a user thinks of when reading features[i]
in an error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why my error message doesn't contain features[i]
;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh sorry about that... I had a forum update issue and didn't see your last message : D. I'll push out another update later today.
Can you please fix the documentation for |
@amueller I updated the error message with your suggestion, but left off I updated the docs to cover all of the ways to specify Is there a place under the doc folder that should be updated as well? The partial dependence section in ensemble.rst didn't go into parameters for the specific plotting function, and I haven't found mention of |
Sorry it should have been |
Thanks, lgtm! |
Thanks for the patch! |
…n#7673) * Clarified error msg in plot_partial_dependence * Changed err msg for feature[i] out of range. Updated docs. * Error message shows invalid value.
…n#7673) * Clarified error msg in plot_partial_dependence * Changed err msg for feature[i] out of range. Updated docs. * Error message shows invalid value.
…n#7673) * Clarified error msg in plot_partial_dependence * Changed err msg for feature[i] out of range. Updated docs. * Error message shows invalid value.
…n#7673) * Clarified error msg in plot_partial_dependence * Changed err msg for feature[i] out of range. Updated docs. * Error message shows invalid value.
…n#7673) * Clarified error msg in plot_partial_dependence * Changed err msg for feature[i] out of range. Updated docs. * Error message shows invalid value.
…n#7673) * Clarified error msg in plot_partial_dependence * Changed err msg for feature[i] out of range. Updated docs. * Error message shows invalid value.
Reference Issue
#7600
What does this implement/fix? Explain your changes.
Clarified error message in plot_partial_dependence when an
index greater than the number of features is specified in
features
Any other comments?
This PR picks up on a previously abandoned PR, #7609 .