-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
[ENH]: Adapt 2-dim histogram axis limits to content #26288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
|
I actually need the plot to be transparent when bin-counts are too low, so hopefully that will still be possible if you change it to colormaps' "under" color. |
I very much disagree that the second example is "wrong". From a library point of view we should show the user the data that was passed to us, not the data the user passed us that we might want. Put another way, I think wrapping that code up in a helper like def shrink_wrap_hist_and_hide_existinace_of_outliers(h, xedges, yedges, pc):
... or just wrapping the call to is the best option and I am very 👎🏻 on adding a flag to It looks like setting the under values to From a strictly visual view: diff --git a/lib/matplotlib/axes/_axes.py b/lib/matplotlib/axes/_axes.py
index 4eff4ec1b5..3a3dae0711 100644
--- a/lib/matplotlib/axes/_axes.py
+++ b/lib/matplotlib/axes/_axes.py
@@ -7190,15 +7190,11 @@ such objects
h, xedges, yedges = np.histogram2d(x, y, bins=bins, range=range,
density=density, weights=weights)
- if cmin is not None:
- h[h < cmin] = None
- if cmax is not None:
- h[h > cmax] = None
-
- pc = self.pcolormesh(xedges, yedges, h.T, **kwargs)
+ pc = self.pcolormesh(xedges, yedges, h.T, vmin=cmin, **kwargs)
self.set_xlim(xedges[0], xedges[-1])
self.set_ylim(yedges[0], yedges[-1])
-
+ if cmin is not None and 'cmap' not in kwargs:
+ pc.cmap.set_under((0, 0, 0, 0))
return h, xedges, yedges, pc
@_preprocess_data(replace_names=["x", "weights"], label_namer="x") is enough for many cases (the edge case of the user passed in Setting the "under" values to As near as I can tell we do not actually use So, on net, although I am with @jklymak in being surprised by the behavior of treating out-of-bounds data as missing data, it is a reasonable, documented, and long standing behavior so it is not a candidate for deprecation or change. |
@tacaswell Thanks very much for your extensive comment! I think our disagreement boils down to what you said here:
But we are using First we have Second we have Finally we have This is a very typical example of what I experience with Because we are using (Apart from this issue, I think Matplotlib has made some beautiful plots here!) |
Part of the core problem here is that One other aspect of the missing data problem is if you treat it as "there should be data here but we found nothing" vs "there is nothing to see here, not even the possibility that there could have been data". In the first case it is thing like in a map showing the outline of areas where there is missing data (because it is interesting information that state X has no data) vs auto scaling to the data given (because while there is other possible data .... we don't have it). I think that given this ambiguity here I the default should be "it is interesting that this data is missing". I do not have a good sense of how common the second case is for If we were to add a way to tell |
There's not many people referring to I use My plots are generated from simulated data. The data doesn't come from a data-file. So the user doesn't know what data "should" be there. The user simply specifies whether they want to remove outliers or not. I also use another traditional method of removing extreme outliers that are too far from the center of the distribution, but that cannot remove "2-dim" outliers, which So I really need the The docs describe what
It's a fairly isolated feature to add an Allow me to say, that I think we are both wasting too much time on this. Especially as someone has already taken my code above and turned it into a PR. I don't mean to be rude as you guys are doing a great job with Matplotlib, but it does feel rather difficult to get something accepted into the code-base, and that is unfortunate in an open-source project. Whether this feature should be implemented as I've done above, or in some of the other ways you suggest, I can't answer. But unless you or someone else wants to implement it that way, then I hope you would just move ahead with the working solution we have here. The person who opened the PR is apparently a young student who wants to contribute to open-source, so it might be a bit demoralizing for them to get shut down like that. I will give the person a few comments on the PR, and it might be a good investment for you guys to give the person a few helpful comments as well, so they might come back and contribute more in the future. If we still disagree on whether this feature belongs in Matplotlib, what is the procedure for "conflict resolution" here? Do you vote? Or are the core developers the "ruling junta" so they just get to decide? :-) |
This is a huge codebase, and we tend to be very conservative adding new behaviour. A new change needs to argue that it cannot be easily done outside the library. In this case, it's just a case of using numpy.histogram2d and trimming the data before sending to pcolormesh.
It sounds like an isolated feature, but if it only goes into
Thats one way to express it. The more charitable way to say it is that the core developers have many years of experience put into the project and have to make judgement calls, particularly as they will likely assume the maintenance burden of any new features. OTOH, the ruling junta is relatively permissive with inviting new members who show interest with PRs and contributions. |
The solution in the OP also uses only public API. Though I think it might benefit from the hint in #25962. |
https://snarky.ca/setting-expectations-for-open-source-participation/ (longer) and https://textual.textualize.io/blog/2023/07/29/pull-requests-are-cake-or-puppies/ (shorter) are good posts about the sort of reasoning that maintainers go through when evaluating PRs and feature requests. Our rules for merging are documented here: https://matplotlib.org/devdocs/devel/coding_guide.html#merging |
Problem
Plotting a 2-dim histogram using the
hist2d
function can leave a lot of excess space in the plot when using the argumentcmin
to exclude bins with insufficient numbers of data-points. (There's presumably a similar problem when using the argumentcmax
).Proposed solution
The following is a quick solution that seems to work. But it may be possible to make a better solution when integrating it into the
hist2d
function. It should be made as an optional argument such ashist2d(adapt_lim=True, ...)
to adapt both x and y-axis, or specify a given axis such ashist2d(adapt_lim='x', ...)
I don't really have time to add this to Matplotlib's code-base myself, and make sure everything is made according to your standards etc. So hopefully someone else can do it, if you like the feature. Thanks!
The text was updated successfully, but these errors were encountered: