Skip to content

Fixes #12926 - inconsistency upon passing C in hexbin #26113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 13, 2023

Conversation

krooijers
Copy link
Contributor

@krooijers krooijers commented Jun 12, 2023

PR summary

Closes #12926

Around line 4998 in _axes.py two code paths diverge, depending on whether parameter C is given. The strict inequality between accum and mincnt in L5003 when C == None is not symmetric to the strict inequality at L5017, when C != None. This causes data points to become "hidden", when the number of data points (accum) is equal to mincnt.

'Tis but a scratch: changing > mincnt to >= mincnt was enough to get consistent hexbin plots.

hexbin result before

image

hexbin result after

image

PR checklist

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for opening your first PR into Matplotlib!

If you have not heard from us in a week or so, please leave a new comment below and that should bring it to our attention. Most of our reviewers are volunteers and sometimes things fall through the cracks.

You can also join us on gitter for real-time discussion.

For details on testing, writing docs, and our review process, please see the developer guide

We strive to be a welcoming and open project. Please follow our Code of Conduct.

@oscargus
Copy link
Member

Thank you! I think this makes sense from what you described (and the results).

Can you please add a test for this? For example, a check_figures_equal-test for two of the cases that were earlier different, see the following for an example:

@check_figures_equal(extensions=["png"])
def test_scatter_single_point(self, fig_test, fig_ref):
ax = fig_test.subplots()
ax.scatter(1, 1, c=1)
ax = fig_ref.subplots()
ax.scatter([1], [1], c=[1])

@oscargus oscargus added this to the v3.8.0 milestone Jun 13, 2023
@krooijers
Copy link
Contributor Author

Thanks for taking a look, @oscargus

I've added a test test_hexbin_mincnt_behavior_upon_C_parameter and verified that indeed it does fail without the fix, and passes with the fix.

Copy link
Member

@oscargus oscargus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks!

@ksunden
Copy link
Member

ksunden commented Jun 13, 2023

Anyone can merge on green CI

@oscargus oscargus merged commit 584cda0 into matplotlib:main Jun 13, 2023
@oscargus
Copy link
Member

Thanks @krooijers and congrats on your first merged PR in Matplotlib. Hope to see you around!

@anntzer
Copy link
Contributor

anntzer commented Jul 14, 2023

@oscargus @ksunden This issue was discussed at length in #16865 and #21381. There's at least one issue with the fix there: this makes the reducer (defaulting to np.mean) be called for empty arrays too, and that triggers a warning ("Mean of empty slice") as soon as a cell is empty (e.g., hexbin(randn(1000), randn(1000), randn(1000))). You can easily imagine that there are reducers which would instead throw on empty inputs instead of returning nan, which would make things even worse. At the very least one should add an API change note stating that the reducer needs to be able to handle empty inputs (unless one always sets mincnt to 1), and suppress the warning in the case of np.mean.

As argued in the linked threads, I think we should switch the default reduce_C_function to np.sum (because it usually makes more sense than np.mean), and possibly also deprecate support for mincnt when C is not passed (because in that case mincnt is more simply and more generally expressed as vmin (plus possibly setting a custom "under" value on the colormap)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

Inconsistent behavior of hexbins mincnt parameter, depending on C parameter
5 participants