[WIP] Add plotting module with heatmaps for confusion matrix and grid search results #16287
…cikit-learn into plot_confusion_matrix
Conflicts were resolved by choosing the latest code from master. At the top level, the list of sub-packages conflicted; the merged 'plot' stayed as one of the __all__ modules.
Thank you for the PR @chbrandt. I think this issue is not a good first issue for the following reasons:
I would say point 1 makes this more difficult, since we still need to discuss what the grid search plotting should show. We will most likely need the same "space stealing" logic as in partial dependence plotting.
A heatmap will be the default choice for 2 parameters. We could probably go incrementally. We used a parallel-coordinates plot in some course: https://plot.ly/python/parallel-coordinates-plot/ So it might be interesting to restrict the scope of the PR to the heatmap implementation for 2 parameters and then build on it in a following PR. @thomasjpfan WDYT? Or would that be dangerous, if we think that we might make a mistake in the API?
Thanks for the note @thomasjpfan. @glemaitre pointed out some actions I would have to take to bring this old thread to the current state of plotting in the library: as you also said, to use Display. I will have a further look into Display and see how/if I can manage it. Nevertheless, if you think it is not worth it at all, please feel free to hard close it and I will move on to other stuff ;)
As long as we can reduce the scope it would be nice.
The parallel-coordinates plot for hyperparameters may work in our case, if we allow the metric range to be specified when the plot is created. Maybe "highlight the top 5% of scores from the search" and have this as a keyword. The user can plot over and over again, since all the computation is already done in the

I am okay with a 2-parameter heatmap version, but I would want a clear plan forward from there (API-wise). For 1 parameter, we can have a scatter/line plot, with error bars to represent the std. Alternatively, we could disallow searches with more than 2 parameters. In that case, how useful is this plotting function?
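As a rough illustration of the "highlight the top 5% of scores" keyword idea, the selection step could look like the sketch below. The function name `top_fraction_mask` and the `fraction` keyword are hypothetical and not part of this PR or of scikit-learn; the input is assumed to be a plain list of mean test scores (such as the `mean_test_score` entry of a search's results) where higher is better.

```python
import math


def top_fraction_mask(mean_scores, fraction=0.05):
    """Mark the best `fraction` of scores (hypothetical helper, higher is better)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Always keep at least one candidate so the highlight is never empty.
    n_top = max(1, math.ceil(fraction * len(mean_scores)))
    # Indices ordered from best to worst score.
    order = sorted(range(len(mean_scores)),
                   key=lambda i: mean_scores[i], reverse=True)
    top = set(order[:n_top])
    return [i in top for i in range(len(mean_scores))]
```

Because the mask only reads scores that are already computed, it can be re-evaluated with a different `fraction` on every call to the plot without re-running the search.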
Yes, so I will adapt the old function to the current plot/display structure. And I will focus on the 2-parameter case to have this thread closed. From my understanding of

def plot_gridsearch_results(cv_results, params_list=None):
    # Parameter names are taken from the first entry of the search results.
    params_in_results = cv_results['params'][0].keys()
    if len(params_in_results) > 2:
        assert params_list is not None and len(params_list) == 2, (
            "GridSearch results visualization supports only two parameters "
            "at a time. Please choose an (x, y) pair from your parameters: "
            "{!s}".format(params_in_results))
        assert all(param in params_in_results for param in params_list)

(...)

And move on from there. For the higher-dimensional cases, as suggested, we would support it with the parallel lines plot (e.g., https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#parallel-coordinates, https://www.scikit-yb.org/en/latest/api/features/pcoords.html#parallel-coordinates). But again, I would focus now on the simplest cases (1D, 2D), and have the higher-dimensional cases in a new issue.
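For the two-parameter case, the core step is turning the flat search results into a 2D grid of scores. The sketch below is only one possible way to do it and is not the code from this PR: the function name `scores_to_grid` is hypothetical, parameter values are assumed to be hashable, and averaging over any remaining parameters is an assumption made here for illustration. It relies only on the `params` and `mean_test_score` entries of the results dictionary.

```python
def scores_to_grid(cv_results, param_x, param_y):
    """Arrange mean test scores on a (param_y, param_x) grid (hypothetical helper)."""
    params = cv_results["params"]
    scores = cv_results["mean_test_score"]
    # Unique values in order of first appearance (values must be hashable).
    x_values = list(dict.fromkeys(p[param_x] for p in params))
    y_values = list(dict.fromkeys(p[param_y] for p in params))
    sums = [[0.0] * len(x_values) for _ in y_values]
    counts = [[0] * len(x_values) for _ in y_values]
    for p, score in zip(params, scores):
        row = y_values.index(p[param_y])
        col = x_values.index(p[param_x])
        sums[row][col] += score
        counts[row][col] += 1
    # Assumption: candidates sharing the same (x, y) pair are averaged;
    # cells with no candidate are left as None.
    grid = [[s / c if c else None for s, c in zip(sum_row, count_row)]
            for sum_row, count_row in zip(sums, counts)]
    return x_values, y_values, grid
```

The returned `grid` (rows follow `y_values`, columns follow `x_values`) could then be handed to a heatmap routine such as matplotlib's `imshow`, with the two value lists used as tick labels.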
TODO: Examples must be revised
For the first iteration, I may even go as far as restricting the plot to only grid searches with two hyperparameters. Let's say a grid search has a cross product of three hyperparameters and a user passes in a
Hi @chbrandt, are you still interested in working on this PR? Thanks.
Reference Issues/PRs
Closes #9173
Addressing this in the context of Paris Sprint of the Decade.
What does this implement/fix? Explain your changes.
Merged current master into thismlguy@01a63a7 :
Ping @amueller @thomasjpfan
params with more than 2 (not implemented) values
params with wrong values (right dimensionality, but non-existent value)
NotImplementedError when params=None and cv_results dim > 2
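The three cases listed above could be checked along the following lines. This is a sketch under assumptions rather than the PR's actual code: the helper name `check_plot_params` is hypothetical, and raising `ValueError` for non-existent parameter names is a choice made here, since the description only names `NotImplementedError` explicitly.

```python
def check_plot_params(cv_results, params=None):
    """Validate the parameter pair to plot (hypothetical helper)."""
    available = list(cv_results["params"][0].keys())
    if params is None:
        if len(available) > 2:
            # params=None and the search has more than 2 dimensions.
            raise NotImplementedError(
                "Searches over more than 2 parameters need an explicit "
                "pair; available parameters: {}".format(available))
        return available
    params = list(params)
    if len(params) > 2:
        # More than 2 parameters requested: not implemented.
        raise NotImplementedError("Only up to 2 parameters can be plotted.")
    unknown = [name for name in params if name not in available]
    if unknown:
        # Assumption: wrong names are reported with a ValueError.
        raise ValueError(
            "Unknown parameters {}; available: {}".format(unknown, available))
    return params
```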