Possible issue in hexbin using weights #20877

Closed
Enzoupi opened this issue Aug 23, 2021 · 7 comments

@Enzoupi

Enzoupi commented Aug 23, 2021

I am facing a problem in Matplotlib when using hexbin with weights (the C= argument).

When I fill in the C argument, not only does the output color change but also the shape. Here is a very simple code snippet reproducing the behavior:

  import numpy as np
  import matplotlib.pyplot as plt

  npts = int(2e5)
  
  # Create a random distribution and another one as its perturbation
  x1 = np.random.randint(0, 3000, npts)
  x2 = x1 + np.random.normal(loc=0, scale=75, size=npts)
  
  # Same for the second axis variable
  y1 = np.random.randint(280, 310, npts)
  y2 = y1 + np.random.normal(loc=0, scale=2, size=npts)
  
  # Plot: top panel counts points per hexagon, bottom panel colors by the mean of C=x1
  fig, axes = plt.subplots(2, 1)
  ax = axes[0]
  ax.hexbin(y2 - y1, x1, mincnt=1, extent=[-8, 8, 0, 3000], gridsize=150)
  
  ax = axes[1]
  res = ax.hexbin(y2 - y1, x1, C=x1, mincnt=1, extent=[-8, 8, 0, 3000], gridsize=150)
  
  fig.savefig("hexbin_weights.png")
  plt.show()

And the corresponding output:
[Image: snippet output]

To my understanding, since the x and y coordinates of the points are identical in both calls, only the colors of the two plots should differ. However, that is clearly not the case. Is another parameter changing without my being aware of it? Is C not doing what I expect it to do?

Thanks in advance for any help!

@jklymak
Member

jklymak commented Aug 23, 2021

Try setting mincnt to 1e-6 or something else suitably small, and I think you will figure out the issue.

I'll close because I don't think there is an error here. Please discuss at https://discourse.matplotlib.org if you would like more explanation. Thanks!

@jklymak jklymak closed this as completed Aug 23, 2021
@jklymak jklymak added the Community support Users in need of help. label Aug 23, 2021
@Enzoupi
Author

Enzoupi commented Aug 23, 2021

Thanks for your answer! However, I believe there might still be more to say here, @jklymak. The hexbin documentation says:

mincnt > 0, default: None
    If not None, only display cells with more than mincnt number of points in the cell.

Hence, to my understanding of the docstring, mincnt should not depend on any of the variables (not x, y, or C). Shouldn't it be the minimum number of points landing in the hexagon? This might be an English-related problem, though...
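The docstring reading above can be made concrete with a toy model in plain NumPy. It is a simplified sketch, not Matplotlib's implementation: integer bin ids stand in for hexagons, and `visible_cells` and `cell_values` are hypothetical helpers. Under the documented semantics, the set of displayed cells is decided by the point count alone ("more than mincnt"), and C only changes the value reduced inside each kept cell.

```python
import numpy as np


def visible_cells(bin_ids, mincnt):
    """Hypothetical helper: indices of cells holding *more than* mincnt points."""
    counts = np.bincount(bin_ids)
    return np.flatnonzero(counts > mincnt)


def cell_values(bin_ids, C, mincnt, reduce=np.mean):
    """Hypothetical helper: reduce C inside each visible cell.

    Which cells are visible depends only on the point counts, never on C.
    """
    return {int(c): float(reduce(C[bin_ids == c])) for c in visible_cells(bin_ids, mincnt)}


# Six points in three cells: cell 0 holds 2 points, cell 1 holds 1, cell 2 holds 3.
bin_ids = np.array([0, 0, 1, 2, 2, 2])
weights = np.array([0.1, 0.2, 5.0, 0.3, 0.3, 0.3])  # deliberately small C values

print(visible_cells(bin_ids, mincnt=1))                    # [0 2]
print(sorted(cell_values(bin_ids, weights, mincnt=1)))     # [0, 2]
print(sorted(cell_values(bin_ids, np.ones(6), mincnt=1)))  # [0, 2]
```

Swapping `weights` for any other array of the same length changes the reduced values but not the set of cells, which is the behavior the original report expected from hexbin.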

@jklymak
Member

jklymak commented Aug 23, 2021

Right, maybe the documentation could be clearer. Apparently, when C is provided it modifies the count (or weight) of each data point. In this case you are making those counts less than 1, so they fall below the threshold.

@RebeccaWPerry
Copy link
Contributor

OK if I pick this one up? I can see how an improvement to the mincnt part of the docstring would help here.

@jklymak
Member

jklymak commented Oct 18, 2021

@RebeccaWPerry we don't assign PRs, so if an issue doesn't have a linked PR, anyone is welcome to work on it. If there is a linked PR, please try and work in the confines of that PR.

@RebeccaWPerry
Contributor

I looked into this more and realized it wasn't actually a documentation-only issue. I included more details in PR #21381, but for the sake of completeness, here is an example showing that the bug persists even when "scaling" by 1 and reducing with sum.

import numpy as np
import matplotlib.pyplot as plt

npts = int(2e4)

# Random integer values for the y axis
y = np.random.randint(0, 100, npts)
# Normally distributed values for the x axis
x = np.random.normal(loc=0, scale=2, size=npts)

# Plot
fig, axes = plt.subplots(2, 1)

# C not specified
ax = axes[0]
res1 = ax.hexbin(x, y, mincnt=1, extent=[-8, 8, 0, 100], gridsize=150)

# Adding weights of 1 and reducing with np.sum should be identical to no weights
ax = axes[1]
res2 = ax.hexbin(x, y, C=np.ones(len(x)), mincnt=1, extent=[-8, 8, 0, 100], gridsize=150,
                 reduce_C_function=np.sum)

# Hexagon values: point counts in plot 1, sums of C in plot 2
c1 = res1.get_array()
c2 = res2.get_array()

print('points binned in plot1: ', sum(c1))
print('points binned in plot2: ', sum(c2))

fig.savefig("hexbin_weights.png")
plt.show()
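One plausible mechanism for the two sums disagreeing is an off-by-one in the threshold: if one code path keeps cells with `count >= mincnt` and the other keeps cells with `count > mincnt`, cells holding exactly `mincnt` points appear in one plot and vanish from the other. The sketch below is a toy model of that assumption in NumPy (integer bin ids stand in for hexagons, and `binned_total` is a hypothetical helper); it illustrates the idea and does not reproduce Matplotlib's code.

```python
import numpy as np


def binned_total(bin_ids, mincnt, inclusive):
    """Toy model: total number of points in cells that pass the mincnt threshold."""
    counts = np.bincount(bin_ids)
    keep = counts >= mincnt if inclusive else counts > mincnt
    return int(counts[keep].sum())


# Cell 0 holds 2 points, cell 1 holds exactly 1 point, cell 2 holds 3 points.
bin_ids = np.array([0, 0, 1, 2, 2, 2])

total_inclusive = binned_total(bin_ids, mincnt=1, inclusive=True)   # keeps cells 0, 1, 2
total_exclusive = binned_total(bin_ids, mincnt=1, inclusive=False)  # drops cell 1
print(total_inclusive, total_exclusive)  # 6 5
```

Under this model, only the cells holding exactly `mincnt` points differ between the two plots, which also explains why lowering mincnt to a tiny value makes the difference disappear.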

@rcomer
Member

rcomer commented Dec 12, 2023

With v3.8.0 and main, the code from the OP gives me:

[Image: output of the OP's snippet]

So I think this is fixed, probably by #26113.
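To check the fix on a given installation, one can rerun the comparison from Rebecca's example off-screen and compare both the number of drawn hexagons (`get_offsets()`) and the binned totals (`get_array()`). This is a sketch assuming the Agg backend and an arbitrary random seed; per this thread, the two numbers are expected to match from v3.8.0 on and may differ on older versions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
npts = 20_000
y = rng.integers(0, 100, npts)
x = rng.normal(loc=0, scale=2, size=npts)

fig, (ax1, ax2) = plt.subplots(2, 1)
kwargs = dict(mincnt=1, extent=[-8, 8, 0, 100], gridsize=150)

# Plain counts vs. C=1 reduced with np.sum: both should describe the same cells.
res1 = ax1.hexbin(x, y, **kwargs)
res2 = ax2.hexbin(x, y, C=np.ones(npts), reduce_C_function=np.sum, **kwargs)

# Number of drawn hexagons and total number of binned points per panel.
n1, n2 = len(res1.get_offsets()), len(res2.get_offsets())
s1, s2 = float(res1.get_array().sum()), float(res2.get_array().sum())
print(f"Matplotlib {matplotlib.__version__}: hexagons {n1} vs {n2}, points {s1:.0f} vs {s2:.0f}")
plt.close(fig)
```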

@rcomer rcomer closed this as completed Dec 12, 2023
@rcomer rcomer added this to the v3.8.0 milestone Dec 12, 2023

4 participants