-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Add a sklearn.plot Module #13448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for opening this issue, ping @jnothman @amueller @GaelVaroquaux according to gitter
Good idea, but not consistent with existing functions (plot_tree and plot_partial_dependence), right? |
There are cases where you need to output/modify a figure, such as with
multiple subplots (see seaborn's facet plots etc, and upsetplot for
example). Can you give reasons why you want to limit it to axes?
…On Fri., 15 Mar. 2019, 2:19 am Hanmin Qin, ***@***.***> wrote:
Thanks for opening this issue, ping @jnothman
<https://github.com/jnothman> @amueller <https://github.com/amueller>
@GaelVaroquaux <https://github.com/GaelVaroquaux> according to gitter
#8425 <#8425> is not
related to Decision Regions of Classifiers,?
I prefer to move plot_tree and plot_partial_dependence to sklearn.plot and
solve #13335 <#13335>
in 0.21 (maybe introduce a function to plot the decision boundary, since
it's important and not easy for beginners IMO). What do others think?
To help control the scope of sklearn.plot, I propose we only do plotting
on the axes level and not the figure level. The user would pass in the axes
as a keyword. As a convenience, the default of axes will be None. Only in
this case, will the plotting function generate a axes/figure to plot on.
Good idea, but not consistent with existing functions (plot_tree and
plot_partial_dependence), right?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#13448 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6y4ZcL4WftNY92wCoz19vqtXL9Njks5vWmiCgaJpZM4b0Oiz>
.
|
@qinhanmin2014
@jnothman Seaborn has documentation that clearly separates "figure-level" plot and "axes-level" plots. If we can properly document this behavior in scikit-learn, it is possible to have these "figure-level" plots. My biggest concern with "figure-level" plots, is they are harder to maintain and test. There will be elements from one axes that may overlap with another axes. Although, we can work around this by structuring figures in such a way that overlapping happens less often. In terms of testing, we can go the way of seaborn and test directly matplotlib objects or the yellowbrick way where we do pixel level testing. I tend to favor the testing the matplotlib objects. |
My 2 cents:
|
I agree but I also think that showing the users how to plot results such as decision boundaries is one of the goals of the examples. If I want a quick first plot of a result, plotting functions are great, especially for complicated plots such as plotting a tree, but I very often like to tweak the plot to my needs and for that reason I prefer to rely on existing examples and modify the code. |
Regarding the name of the module, IMO
|
no strong opinion, will vote +1 for both name. Maybe plot is more straightforward? Again I'm interested in creating the new module before 0.21 and move plot_tree and plot_partial_dependence there. Ideally we should also reach consensus on the API (e.g., axes level/ figure level). |
Other point in favor of We may want inspect tools that offer plotting as an option. For example, print the characteristics of a tree (number of nodes, leaves, split points, etc.) and optionally plot it with matplotlib. I would be in favor of using axes instead of figures, as suggested (sigh, I'll need to change PDPs again). It's easier for us to support and test. It's a little more work for the users, but it also allows for more control. |
> IMO inspect is more versatile than plot
no strong opinion, will vote +1 for both name. Maybe plot is more straightforward?
"inspect" is loaded in Python (it's a module in the standard library). I
would avoid using the same name.
Again I'm interested in creating the new module before 0.21 and move plot_tree and plot_partial_dependence there. Ideally we should also reach consensus on the API (e.g., axes level/ figure level).
This shouldn't delay 0.21. Our goal is to release early, and hopefully
release early again.
|
I propose |
We might want to inspect something that's not a model (encoder, preprocessor, grid search results...)
|
Those things are also models :)
Gael's suggestion of a public plot submodule on each subpackage is worth
considering.
|
FWIW, I'd also prefer |
If so, where shall we put Regarding #12599, @NicolasHug I doubt whether |
If so, where shall we put plot_decision_boundary?
sklearn.plot ?
I don't want to push too hard for this solution. However, I agree with
the feeling that "plot" may be easier to discover for end users.
|
I don't understand what you mean. #12599 deprecates I'm fine with
But so far all the plotting tools that are proposed (PDPs, Calibration, Confusion matrix and decision region) aren't specific to a single module. |
Apologies I haven't looked at that PR in detail. Maybe |
Maybe inspect.partial_dependence + plot.plot_partial_dependence?
I like a clear separation between computing the values and plotting it.
It's an model/view like separation, and it should help increase the
reusability.
|
Wasn't Gaël earlier proposing sklearn.inspect.partial_dependence and
sklearn.inspect.plot.plot_partial_dependence (Substitute some other name
for inspect if appropriate)? I don't mind this.
|
Yes, but I asked him where shall we put |
FYI, I updated the PDP PR #12599 following the recommendations above:
The docs are here https://53182-843222-gh.circle-artifacts.com/0/doc/_changed.html Note that the user guide only includes the (This is the kind of overlap that I was worried about) Of course if you think it's still better to have separate user guides for |
+1
+1 So we've decided to use the name sklearn.plot? |
I find sklearn.plot importing dependencies from across sklearn to be a bit weird when we have avoided putting everyone in the root namespace. |
So you'd prefer No |
I think I'd prefer that. Not yet certain how it generalises.
|
As long as we can figure out an appropriate place to put things like |
Does this need a slep? we don't seem to be making much progress EDIT ugh, sleepy me read Joel's comment as I don't think I'd prefer that, sorry |
I'm fine with either solution ( |
|
Interesting idea, I'll vote +1. Maybe it's not so good to throw all the things to sklearn.plot (https://github.com/rasbt/mlxtend put all the plotting functions in a single module). So we'll support |
Not sure it needs a slep but might be worth inviting opinions on the
mailing list
|
Not sure it needs a slep but might be worth inviting opinions on the mailing list
+1. It does not fall in the criteria of SLEPs, it seems.
|
As I said on the mailing list, I think we should really also consider "where the work happens" or what the interface will be like. That was already quite unclear for partial dependence. So we could either
|
The main consideration I discusses with @NicolasHug is that having plot_X call compute_X is convenient for the user - as long as they only want to plot the thing once. If they don't like the plot and want to change something, they need to compute_X again, which is potentially a waste.
+1000. It's a common problem that I see in research code.
From a design problem, it violates an MVC separation (slightly pedantic,
sorry).
In the various solutions that you propose, would you consider taking a
fitted model as an approach? I feel that it would mitigate the problem of
remembering the order of parameters. But maybe there are additional
problems.
|
I'm not sure what you mean by fitted model. Often the output of computation is not a fitted model. We could define objects for all of the computation results, so that Oh the reason I bring this up now: without this decision I have no idea what user code will look like, and I don't like making naming / API decisions without being able to write down examples ;) |
returning an object would be pretty un-sklearn-like, imho. But it might solve the location problem: it could have a |
Often the output of computation is not a fitted model. We could define objects for all of the computation results, so that partial_dependence returns a PartialDependence object. Or a bunch. But it doesn't return an estimator.
Point taken.
One option (not saying that it is the best one) would be to have all the
compute functions return named tuples and all the corresponding compute
functions take this. It would be somewhat consistent with modern
additions to the Python standard library.
and I don't like making naming / API decisions without being able to write down examples ;)
I'm like you.
|
So if the computation is expensive, we can do the computation outside the function. If so, we return an object and the plotting function take this object as its input, right? I'll vote +1. And maybe we need another issue to discuss this :) |
The benefit of @GaelVaroquaux's suggestion is that it might not require users to change their code because of tuple unpacking. But that wouldn't work if there's a single returned object like in @qinhanmin2014 if we return arbitrary objects we have to deprecate the return type for a function every time we create a plotting helper. That seems like a mess. I had one idea, and then a second better idea:
cm = ConfusionMatrix(y, y_pred)
cm.plot() That would solve the problem, but would duplicate some interface and it would be a bit unclear. Actually the same principle can be done in a way that I think is more intuitive: plot_confusion_matrix(y, y_pred) would therefor just plot, and return a While this exposes an additional class for the more complex use-cases, I think it's a reasonable trade-off because it has a nice answer to all the situations. |
Why do users need to plot the same data again? @amueller adjust the format? |
@qinhanmin2014 yes, make the font bigger, change the colors, plot it with something else together in the same plot, add labels, ... |
I doubt whether it’s worthwhile to consider these formatting issues here. Users can start will a small part of the dataset? |
And @amueller we’ll support passing axes, so users can easily adjust the plot after they call the plotting functions? |
@qinhanmin2014 no, many things can not easily be changed afterwards, and we don't need to think about all the formatting necessarily ourselves, but we should allow users to plot something again. It will basically happen any time you do a plot. And having to subsample datasets every time I want to do a plot is a bit annoying. And if I later change my mind I will still need to recompute. |
Yes, I know it might be useful, but the main issue here is that we don't have a clean way to support this functionality, and I guess most plotting functions do not require much calculation? |
I like that we're discussing this with a little more grounding, but tbh I'm
still not entirely convinced that we even need to have plot in the import
path at all. After all, we seem to have plot_ as a prefix for the
functions. The question also relates to plot_tree: why *should* it be
separated from other export and textual visualisation code?
|
@qinhanmin2014 I don't think "we don't have a good API yet" is a good reason. |
@jnothman I think @GaelVaroquaux wanted to keep the matplotlib dependency confined to a module and that was one of the main motivations? I don't really have very coherent thoughts about this yet. |
Thanks, I now realized that I'm wrong :)
Yes this can also be an option. If so, we can mention that all the functions which start with |
Going over this discussion, I agree with not adding a For example, in #12599, |
Ok, unless someone disagrees with this in the following days, I'm going to update the PDP PR and:
|
Can we make the final decision here? |
Sleeping over it I think it’s simple enough to explain and it avoids the difficulty to think of a shared plotting api between different modules.
|
Yes, let's do that. Make a helper function to provide a more helpful
ImportError
|
FYI, I'm adding def check_matplotlib_support(caller_name):
try:
import matplotlib
except ImportError as e:
raise ImportError(
"{} requires matplotlib. You can install matplotlib with "
"`pip install matplotlib`".format(caller_name)
) from e |
That's great! Thanks. |
Here is an overview of the work done so far relating to plotting:
To help control the scope of
sklearn.plot
, I propose we only do plotting on the axes level and not the figure level. The user would pass in the axes as a keyword. As a convenience, the default ofaxes
will beNone
. Only in this case, will the plotting function generate a axes/figure to plot on.The text was updated successfully, but these errors were encountered: