[WIP] Add plotting module with heatmaps for confusion matrix and grid search results #16287

Closed
wants to merge 111 commits into from

chbrandt
Contributor

@chbrandt chbrandt commented Jan 29, 2020

Reference Issues/PRs

Closes #9173

Addressing this in the context of the Paris Sprint of the Decade.

What does this implement/fix? Explain your changes.

Merged current master into thismlguy@01a63a7:

  • Resolved the conflicts in favor of the latest (current master).
  • Added a test for GridSearchCV with the training score, since the default behaviour changed in the meantime.

Ping @amueller @thomasjpfan

  • test check GridSearch Display class
  • test check params with more than 2 (not implemented) values
  • test check params with wrong values (right dimensionality, but non-existent value)
  • Use NotImplementedError when params=None and cv_results dim > 2
  • Review/adjust examples

@thomasjpfan
Member

Thank you for the PR @chbrandt. I think this issue is not a good first issue for the following reasons:

  1. Grid search plotting will need some prototyping to figure out what we want to do with more than three search parameters.
  2. One needs to create a Display object that separates computation and visualizations.

I would say point 1 makes this more difficult, since we still need to discuss what the grid search plotting should show. We will most likely need the same "space stealing" logic as in partial dependence plotting.
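
Point 2 can be illustrated with a minimal sketch, assuming a display object that only stores already-computed values and touches matplotlib solely inside `plot`. `HeatmapDisplay` and its attribute names are hypothetical and chosen for illustration; this is not an existing scikit-learn class.

```python
class HeatmapDisplay:
    """Hypothetical display object: holds precomputed scores, draws on demand."""

    def __init__(self, grid, x_labels, y_labels):
        # grid[i][j] is the score for y_labels[i] and x_labels[j].
        self.grid = grid
        self.x_labels = list(x_labels)
        self.y_labels = list(y_labels)

    def plot(self, ax=None):
        # matplotlib is imported lazily: only the drawing step needs it.
        import matplotlib.pyplot as plt

        if ax is None:
            _, ax = plt.subplots()
        self.im_ = ax.imshow(self.grid)
        ax.set_xticks(list(range(len(self.x_labels))))
        ax.set_xticklabels([str(v) for v in self.x_labels])
        ax.set_yticks(list(range(len(self.y_labels))))
        ax.set_yticklabels([str(v) for v in self.y_labels])
        # Keep handles so the caller can restyle the figure afterwards.
        self.ax_ = ax
        self.figure_ = ax.figure
        return self


# Computation happens once, up front (toy numbers, not real results).
display = HeatmapDisplay([[0.6, 0.8], [0.7, 0.9]], x_labels=[1, 10], y_labels=[0.1, 1.0])
```

Calling `display.plot()` repeatedly, or with different axes, would redraw without recomputing anything.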

@glemaitre
Member

> Grid search plotting will need some prototyping to figure out what we want to do with more than three search parameters.

Heatmap will be the default choice for 2 parameters. We could probably go incrementally.

We used parallel-coordinates plot in some course: https://plot.ly/python/parallel-coordinates-plot/
which is even better when it is interactive.

So it might be interesting to restrict the scope of this PR to the heatmap implementation for 2 parameters and then build on it in a follow-up PR? @thomasjpfan WDYT? Though it might be risky if we think we could make a mistake in the API.

@chbrandt
Contributor Author

Thanks for the note @thomasjpfan. @glemaitre pointed out some actions I would have to take to bring this old thread up to the current state of plotting in the library and, as you also said, to use Display. I will have a further look into Display and see how/if I can manage it. Nevertheless, if you think it is not worth it at all, please feel free to hard close it and I will move on to other stuff ;)

@thomasjpfan
Member

As long as we can reduce the scope it would be nice.

> We used parallel-coordinates plot in some course

The parallel-coordinates plot for hyperparameters may work in our case, if we allow the metric range to be specified when the plot is created. Maybe "highlight the top 5% of scores from the search" and have this as a keyword. The user can plot over and over again, since all the computation is already done in the GridSearchCV object. In this case, though, the std is lost in the visualization. The std is lost in the 2-parameter heatmap version as well.
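
The "highlight the top 5%" idea could be sketched as a mask over the mean test scores, which a parallel-coordinates plot could then use to color lines. The helper name `top_fraction_mask` and its `fraction` keyword are assumptions for illustration, and the scores are toy values.

```python
import math


def top_fraction_mask(scores, fraction=0.05):
    """Return a list of booleans marking the best `fraction` of scores."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Always highlight at least one candidate.
    n_keep = max(1, math.ceil(fraction * len(scores)))
    # Indices of the n_keep highest scores.
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_keep]
    mask = [False] * len(scores)
    for i in best:
        mask[i] = True
    return mask


# Toy stand-in for cv_results_["mean_test_score"] (values invented).
mean_test_score = [0.71, 0.80, 0.93, 0.65]
highlight = top_fraction_mask(mean_test_score, fraction=0.5)
```

Because the mask is derived only from stored scores, changing `fraction` and re-plotting needs no refitting.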

I am okay with a 2 parameter heatmap version, but I would want a clear plan forward from there (API-wise).

For 1 parameter, we can have a scatter/line plot, with error bars to represent the std.
For 2 parameters, we can have a single heatmap.
For 3 parameters, we can have multiple scatter/line plots? What if a user wants to do this for 2 parameters?
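
For the two-parameter case, the heatmap input could be built by arranging the mean test scores on a grid. This is a sketch assuming a plain dict shaped like `GridSearchCV.cv_results_` (its `params` and `mean_test_score` entries); `scores_to_grid` is a hypothetical helper and the numbers are invented.

```python
def scores_to_grid(cv_results, x_param, y_param):
    """Arrange mean test scores on a (y, x) grid for a two-parameter search."""
    params = cv_results["params"]
    scores = cv_results["mean_test_score"]
    # Unique values in first-seen order, so no ordering of the values is required.
    x_values = list(dict.fromkeys(p[x_param] for p in params))
    y_values = list(dict.fromkeys(p[y_param] for p in params))
    # Cells stay None for combinations that were never evaluated.
    grid = [[None] * len(x_values) for _ in y_values]
    for p, score in zip(params, scores):
        grid[y_values.index(p[y_param])][x_values.index(p[x_param])] = score
    return x_values, y_values, grid


# Toy results for a 2 x 2 search (values invented).
cv_results = {
    "params": [
        {"C": 1, "gamma": 0.1},
        {"C": 1, "gamma": 1.0},
        {"C": 10, "gamma": 0.1},
        {"C": 10, "gamma": 1.0},
    ],
    "mean_test_score": [0.6, 0.7, 0.8, 0.9],
}
x_values, y_values, grid = scores_to_grid(cv_results, "C", "gamma")
```

For a full grid, the resulting nested list could be handed to matplotlib's `imshow`, with `x_values` and `y_values` as tick labels.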

Alternatively, we could disallow searches with more than 2 parameters. In this case, how useful is this plotting function?

@chbrandt
Contributor Author

Yes, so I will adapt the old function to the current plot/display structure. And I will focus on the two-parameter case to have this thread closed.
For simplicity, the user can hand a high-dimensional set of parameters (>2) to the plot function, along with a (size-2) list of the parameters to visualize (when the grid space has more than 2 dimensions).

From my understanding of GridSearchCV.cv_results_ so far, such implementation decisions would have internals along these lines:

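def plot_gridsearch_results(cv_results, params_list=None):
    # Names of the searched hyperparameters, read from the first candidate
    params_in_results = list(cv_results['params'][0].keys())
    if len(params_in_results) > 2:
        # With more than two searched parameters, the caller must choose an (x, y) pair
        assert params_list is not None and len(params_list) == 2, "GridSearch results visualization supports only two parameters at a time. Please choose an (x, y) pair from your parameters: {!s}".format(params_in_results)
        # Every chosen parameter must be one of the searched parameters
        assert all(param in params_in_results for param in params_list)
(...)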

And move on from there.
Along the same lines, a 1D plot would be supported, I guess. I still have to understand where to get the errors/std from, but the user interface could be similar.
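
On the errors/std question: `GridSearchCV.cv_results_` stores a `std_test_score` entry next to `mean_test_score`, so a 1D plot could read both directly. A minimal sketch, assuming a plain dict with those keys; `one_d_series` is a hypothetical helper and the numbers are invented.

```python
def one_d_series(cv_results, param):
    """Return parallel lists (values, means, stds) for one searched parameter."""
    values = [p[param] for p in cv_results["params"]]
    means = list(cv_results["mean_test_score"])
    stds = list(cv_results["std_test_score"])
    return values, means, stds


# Toy results for a search over a single parameter (values invented).
cv_results = {
    "params": [{"C": 0.1}, {"C": 1}, {"C": 10}],
    "mean_test_score": [0.70, 0.85, 0.80],
    "std_test_score": [0.05, 0.02, 0.04],
}
values, means, stds = one_d_series(cv_results, "C")
```

The three lists map onto matplotlib's `Axes.errorbar(values, means, yerr=stds)` for a line plot with error bars.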

For the higher-dimensional cases, as suggested, we would support it with the parallel-coordinates plot (e.g., https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#parallel-coordinates, https://www.scikit-yb.org/en/latest/api/features/pcoords.html#parallel-coordinates). But again, I would focus now on the simplest cases (1D, 2D), and have the higher-dimensional cases in a new issue.
How do you feel about that?

@thomasjpfan
Member

For the first iteration, I may even go as far as restricting the plot to only grid searches with two hyperparameters.

Let's say a grid search has a cross product of three hyperparameters and a user passes in a params_list with 2 parameters. The hyperparameter not passed in will have to be aggregated somehow to form the heatmap.
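
One way to picture that aggregation is to group candidates by the two chosen parameters and reduce the scores in each group. The thread leaves the aggregation rule open, so the `max` default here is only an assumption, as are the helper name `aggregate_scores` and the toy values.

```python
from collections import defaultdict


def aggregate_scores(cv_results, params_list, reduce=max):
    """Collapse hyperparameters not in `params_list` using `reduce`."""
    x_param, y_param = params_list
    buckets = defaultdict(list)
    for p, score in zip(cv_results["params"], cv_results["mean_test_score"]):
        # Candidates that differ only in the unlisted parameters share a cell.
        buckets[(p[x_param], p[y_param])].append(score)
    return {key: reduce(vals) for key, vals in buckets.items()}


# Toy three-parameter search (values invented).
cv_results = {
    "params": [
        {"C": 1, "gamma": 0.1, "kernel": "rbf"},
        {"C": 1, "gamma": 0.1, "kernel": "poly"},
        {"C": 10, "gamma": 0.1, "kernel": "rbf"},
        {"C": 10, "gamma": 0.1, "kernel": "poly"},
    ],
    "mean_test_score": [0.6, 0.7, 0.9, 0.8],
}
best_per_cell = aggregate_scores(cv_results, ["C", "gamma"])
worst_per_cell = aggregate_scores(cv_results, ["C", "gamma"], reduce=min)
```

Swapping `reduce` changes what each heatmap cell means, which is part of the API question raised above.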

@cmarmo
Contributor

cmarmo commented Sep 1, 2020

Hi @chbrandt, are you still interested in working on this PR? Thanks.

Base automatically changed from master to main January 22, 2021 10:51
@chbrandt chbrandt closed this by deleting the head repository Nov 8, 2022
7 participants