Skip to content

[MRG + 1] Clarified error msg in plot_partial_dependence #7673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 25, 2016

Conversation

kgilliam125
Copy link
Contributor

@kgilliam125 kgilliam125 commented Oct 14, 2016

Reference Issue

#7600

What does this implement/fix? Explain your changes.

Clarified error message in plot_partial_dependence when an
index greater than the number of features is specified in
features

Any other comments?

This PR picks up on a previously abandoned PR, #7609 .

@kgilliam125 kgilliam125 changed the title [WIP] Clarified error msg in plot_partial_dependence [MRG] Clarified error msg in plot_partial_dependence Oct 15, 2016
@@ -306,8 +306,8 @@ def convert_feature(fx):
l.append(feature_names[i])
names.append(l)
except IndexError:
raise ValueError('features[i] must be in [0, n_features) '
'but was %d' % i)
raise ValueError('features[i] must be in [0, len(feature_names)={0} ) '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like there's an extra space after {0}

raise ValueError('features[i] must be in [0, n_features) '
'but was %d' % i)
raise ValueError('features[i] must be in [0, len(feature_names)={0} ) '
'but was {1}'.format(len(feature_names), i))
Copy link
Contributor

@nelson-liu nelson-liu Oct 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit odd that you specify that "features[i] must be in a range", but then show the user the value of just "i" in the error message.

Copy link
Contributor Author

@kgilliam125 kgilliam125 Oct 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't show features[i] in this case, since i is out of range of the array. However, i is the value from features[i] that caused the error because of the looping above.

It was a bit of a tricky issue here; perhaps the error message needs further revision. Just to make sure I understand correctly, features is an array of indices for the features whose names correspond to the elements of feature_names at features[i]. My goal here is to work out a way to show a message such that a user who hasn't fully read the documentation would understand it. Along those lines, showing the value for len(feature_names) is also a bit confusing, since both seem like they should contain at most the n_features elements (with the further restriction that features[i] is also bound to [0, n_features)). If a user didn't specify a name for every feature, then it seems this value would be wrong.

Copy link
Contributor

@nelson-liu nelson-liu Oct 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I see. perhaps @tetraptych would have an opinion about this, since he had to go through the process of debugging it?

Copy link
Contributor Author

@kgilliam125 kgilliam125 Oct 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bit of history here that's missing, because I picked this up from an abandoned PR. The error message that I have here is, more or less, the same as that proposed in the previous PR, #7609 .

I think the original error message is (partially) correct, we just need to add some more description to it about what's going on here. It seems to me that there are 2 conditions that features[i] has to satisfy:

  1. 0 <= features[i] < n_features because you can't enumerate more features than exist
  2. 'features[i]' < len(feature_names) because features[i] contains indices to feature_names

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for linking that to me. yeah, I agree that the error message is mostly there. It just seemed extremely ambiguous to me when I first read it (which made me question its value to a user), but it's obvious in hindsight after I've seen the code in question.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's slightly confusing that the i in features[i] is not i as far as I can tell.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not features contains {} which is too large for feature_names? or something like that?
Maybe "All entries of features must be <len(feature_names)={}, got {}".format(len(feature_names), features[i])"

Copy link
Contributor Author

@kgilliam125 kgilliam125 Oct 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, after the features is converted by the prior code, the looping below causes i to be a value from the original features array that was passed in (if features containing integer values).

for fxs in features:
            l = []
            # explicit loop so "i" is bound for exception below
            for i in fxs:
                l.append(feature_names[i])
            names.append(l)
    except IndexError:
    ...

From a clarity standpoint, part of the issue may come from reassigning the features parameter that was passed in to tmp_features. After that, features[i] in the code takes a different meaning than what a user thinks of when reading features[i] in an error message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why my error message doesn't contain features[i] ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh sorry about that... I had a forum update issue and didn't see your last message : D. I'll push out another update later today.

@amueller
Copy link
Member

amueller commented Oct 17, 2016

Can you please fix the documentation for features to say that it can also include strings that are feature names? Maybe convert_feature should be feature_name_to_index?

@kgilliam125
Copy link
Contributor Author

kgilliam125 commented Oct 18, 2016

@amueller I updated the error message with your suggestion, but left off features[i] I can't use it in this case because i won't always be in the range [0 len(features)) either if both features and feature_names are specified.

I updated the docs to cover all of the ways to specify features, so hopefully that at least helps clear up confusion for future use. I kept the error message topical to the situation which could cause the error to occur, rather than explaining all usages of features there as well.

Is there a place under the doc folder that should be updated as well? The partial dependence section in ensemble.rst didn't go into parameters for the specific plotting function, and I haven't found mention of plot_partial_dependence elsewhere.

@amueller
Copy link
Member

Sorry it should have been "got {}".format(i) i was definitely contained in features which is what was passed.

@amueller
Copy link
Member

Thanks, lgtm!

@amueller amueller changed the title [MRG] Clarified error msg in plot_partial_dependence [MRG + 1] Clarified error msg in plot_partial_dependence Oct 24, 2016
@NelleV
Copy link
Member

NelleV commented Oct 25, 2016

Thanks for the patch!

@NelleV NelleV merged commit 4ddb744 into scikit-learn:master Oct 25, 2016
amueller pushed a commit to amueller/scikit-learn that referenced this pull request Oct 25, 2016
…n#7673)

* Clarified error msg in plot_partial_dependence

* Changed err msg for feature[i] out of range. Updated docs.

* Error message shows invalid value.
amueller pushed a commit to amueller/scikit-learn that referenced this pull request Oct 27, 2016
…n#7673)

* Clarified error msg in plot_partial_dependence

* Changed err msg for feature[i] out of range. Updated docs.

* Error message shows invalid value.
sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
…n#7673)

* Clarified error msg in plot_partial_dependence

* Changed err msg for feature[i] out of range. Updated docs.

* Error message shows invalid value.
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
…n#7673)

* Clarified error msg in plot_partial_dependence

* Changed err msg for feature[i] out of range. Updated docs.

* Error message shows invalid value.
trevorstephens added a commit to trevorstephens/scikit-learn that referenced this pull request Jul 26, 2017
trevorstephens added a commit to trevorstephens/scikit-learn that referenced this pull request Aug 12, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…n#7673)

* Clarified error msg in plot_partial_dependence

* Changed err msg for feature[i] out of range. Updated docs.

* Error message shows invalid value.
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…n#7673)

* Clarified error msg in plot_partial_dependence

* Changed err msg for feature[i] out of range. Updated docs.

* Error message shows invalid value.
trevorstephens added a commit to trevorstephens/scikit-learn that referenced this pull request May 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants