Skip to content

FIX solve wrong indexing in plots with ICE when subsampling #18359

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Oct 30, 2020

Conversation

glemaitre
Copy link
Member

@glemaitre glemaitre commented Sep 8, 2020

This PR proposes the following changes:

@glemaitre glemaitre changed the title [WIP] FIX solve wrong indexing in plots with ICE when subsampling FIX solve wrong indexing in plots with ICE when subsampling Sep 8, 2020
@glemaitre
Copy link
Member Author

@madhuracj
Copy link
Contributor

😞 Yes, the indexing seems to be quite off. I am relieved this was found out before the feature was released. Thanks, @glemaitre for the quick fix.

@glemaitre
Copy link
Member Author

I am relieved this was found out before the feature was released

Don't worry we already released bugs in the past :)

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the code quite complex to review. I am not even sure what was the actual indexing bug to look for. Could you please describe it in two words? Would it happen in the "both" case? I am not sure I can see the difference visually on the rendered HTML example.

Anyways, here are some comment to try to refactor this code to make it easier to follow by introducing sub-methods:

@glemaitre
Copy link
Member Author

I find the code quite complex to review. I am not even sure what was the actual indexing bug to look for. Could you please describe it in two words? Would it happen in the "both" case? I am not sure I can see the difference visually on the rendered HTML example.

Our plotting was bug resilient :)
The reason is that we are generating the plot and storing the line into self.lines_[...] and it is this indexing which is wrong.
So the element in self.lines_ are not properly indexed but we are not using them to plot. However, if one want to use this attribute for some reason, it is wrongly indexed. A small example to show the bug:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.inspection import plot_partial_dependence
    X, y = make_classification()
    model = LogisticRegression().fit(X, y)
    disp = plot_partial_dependence(
        model, X, features=[0, 1, 2, 3], kind="both",
        subsample=3
    )

The axis will be organized on a grid of (2, 3)

disp.axes_.shape
(2, 3)

You can access the line of a plot with disp.lines_[0, 0] which is the index of the sub-axis. On our case, we have 4 lines (3 ICE and one mean.

disp.lines_[0, 1].shape
(4,)

The array disp.lines_ was pre-allocated with None. So now we can see the error of indexing:

disp.lines_[0, 1]
[<matplotlib.lines.Line2D object at 0x7fc12c0143a0> None
 <matplotlib.lines.Line2D object at 0x7fc12bff69a0> None]

So we have 2 values at None because we did not place the line at the right place.

My non-regression test check that there are no None values in this array.

@glemaitre
Copy link
Member Author

I think that we should use np.ravel_multi_index to simplify the indexing. It will make things much more readable.

@glemaitre
Copy link
Member Author

I refactor the code with a 1-way private function calling the ICE and PDP plotting function and a 2-way plotting function.
We have some slight redundancy to decorate the axis but I think that the code will be more readable as you suggested @ogrisel

I still have to document and maybe add a couple of comments to clarify the indexing or some parameters.

@glemaitre
Copy link
Member Author

@ogrisel The code is ready for another review. A bit less easy to review but I think the refactoring makes it easier to go through the code, hopefully :).

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more nitpicks but otherwise this looks good enough :)

@glemaitre
Copy link
Member Author

ping @thomasjpfan I think that it should be ready now.

@glemaitre
Copy link
Member Author

pinging @NicolasHug and @adrinjalali

@glemaitre glemaitre added this to the 0.24 milestone Oct 25, 2020
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The black formatting + refactoring did make this a little harder to review. But having looked through it all, this LGTM!

This still needs a whats new for the random_state parameter and the bug fix.

@glemaitre
Copy link
Member Author

@thomasjpfan no need for the what's new because we did not release it yet.
So I am merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants