plot.scatter: negative or null values lead to strange behavior on log scale #11898

Closed
ivankeller opened this issue Aug 20, 2018 · 13 comments
Labels: API: consistency, status: inactive (marked by the “Stale” GitHub Action)

Comments

@ivankeller

Bug report

Related issues have already been reported but I think a simpler case is worth mentioning:

Bug summary

With negative or null values, the log scale fails (although it does not crash), discarding (masking) more points than just those corresponding to the invalid negative or null values. At least a WARNING would be welcome.

Code for reproduction

Normal behavior: only positive values in y:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 10, 50, 100]
plt.scatter(x, y)
plt.yscale('log')

Actual outcome: all four points are displayed. [screenshot]

Bug: with the value 0 in y (same with negative values):

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [0, 10, 50, 100]
plt.scatter(x, y)
plt.yscale('log')

Actual outcome: only the last point is displayed.


Expected outcome
I understand that the default behavior is to mask non-positive values. But it seems that more values than those are masked. I tried changing the nonposy parameter, plt.yscale('log', nonposy='clip'), but the behavior is the same.
At least a WARNING should be displayed about the presence of negative or null values, which are not compatible with a log scale.
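
For illustration, a user-side guard along these lines (warn_if_nonpositive is a hypothetical helper, not existing Matplotlib API) shows the kind of warning being requested:

import warnings
import numpy as np

def warn_if_nonpositive(values, axis_name="y"):
    # Hypothetical helper: count the values a log scale would mask.
    values = np.asarray(values)
    n_bad = int((values <= 0).sum())
    if n_bad:
        warnings.warn(f"{n_bad} non-positive value(s) in {axis_name} are "
                      "incompatible with a log scale and will be masked.")

warn_if_nonpositive([0, 10, 50, 100])  # warns about the single zero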

Matplotlib version

  • Operating system: Ubuntu 14.04.3 LTS
  • Matplotlib version: 2.2.2
  • Matplotlib backend (print(matplotlib.get_backend())): module://ipykernel.pylab.backend_inline
  • Python version: 3.6.5
  • Jupyter version (if applicable): 5.5.0
  • Other libraries:

Matplotlib installed from the conda default channel.

@anntzer (Contributor) commented Aug 20, 2018

Actually this looks like a bug in the way the autolimits are computed (panning the window down shows that the "missing" points are actually plotted, just below the default lower y bound).
Silently ignoring nans/infs (which is what log(0) effectively is) is a longstanding behavior and unlikely to ever be changed.
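
A quick way to confirm this observation (a sketch, using the data from the report):

import matplotlib.pyplot as plt

plt.scatter([1, 2, 3, 4], [0, 10, 50, 100])
plt.yscale('log')
# Widening the limits by hand reveals the "missing" points: they were
# drawn all along; only the autolimits were wrong.
plt.ylim(1, 200)
plt.show()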

jklymak added this to the v3.1 milestone Aug 20, 2018
@jklymak (Member) commented Aug 20, 2018

If you use plt.plot instead of plt.scatter, this works fine.
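
For example, with the same data as the report:

import matplotlib.pyplot as plt

# With plot() the autolimits are correct: the points at 10 and 50 are
# visible, and the zero is silently dropped by the log transform.
plt.plot([1, 2, 3, 4], [0, 10, 50, 100], 'o')
plt.yscale('log')
plt.show()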

@jklymak (Member) commented Feb 12, 2019

I spent a while on this; it has to do with _minpos handling, which seems far too mysterious to me. If you run this with plot, the call self.axis.get_minpos() returns 10, which is the right answer. If you run it with scatter, it returns 100.008928, because...? I guess something in src/_path_wrapper.cpp?
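
The observation can be reproduced like this (a sketch; the exact scatter value depends on figure and marker size):

import matplotlib.pyplot as plt

x, y = [1, 2, 3, 4], [0, 10, 50, 100]
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(x, y)
ax2.scatter(x, y)
print(ax1.yaxis.get_minpos())  # 10.0, the right answer
print(ax2.yaxis.get_minpos())  # ~100.009, the buggy value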

@hershen (Contributor) commented Feb 12, 2019

I don't completely understand what's going on, so I hope this is at least on the right track. The issue seems to be related to the fact that scatter adds a collection.
The collection's datalim is used to set the axes.dataLim:

self.update_datalim(collection.get_datalim(self.transData))

But for some reason, the collection's datalim is a bit wider than the minimum and maximum values of x and y (the inputs to scatter).

For the OP's problematic script, the collection created by scatter has a datalim:

In[3]: collection.get_datalim(self.transData)
Out[3]: Bbox([[0.9939516129032259, -0.008116883116883133], [4.006048387096774, 100.00811688311691]])

And the y component of _minpos is set to the minimum positive value, 100.00811.

plot adds lines instead of collections, and then a path (which I don't fully understand) is used to set the axes.dataLim:

self.dataLim.update_from_path(data_path,
                              self.ignore_existing_data_limits,
                              updatex=updatex,
                              updatey=updatey)

For the same script, but with plot instead of scatter:

In[2]: data_path
Out[2]: 
Path(array([[  1.,   0.],
       [  2.,  10.],
       [  3.,  50.],
       [  4., 100.]]), None)

And the correct _minpos is set.

I wasn't able to follow how the collection's datalim is determined, but a possible solution might be to set the axes.dataLim before adding the collection to the axes, and then add the collection with autolim=False, as is done in fill_between. If that's a valid solution, I can open a PR.
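
A sketch of that approach (simplified; it assumes np is numpy and that collection, x, and y are the local names inside Axes.scatter):

        # Skip the pixel-space datalim computation done by add_collection...
        self.add_collection(collection, autolim=False)
        # ...and bound the data limits from the raw x/y instead, as
        # fill_between does.
        xy = np.column_stack([x, y])
        self.dataLim.update_from_data_xy(xy, self.ignore_existing_data_limits,
                                         updatex=True, updatey=True)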

BTW, this also happens with some other plot types that add collections:
plt.hlines:

plt.hlines([-1, 1, 2, 3, 100], 0, 1)
plt.yscale('log')


plt.eventplot:

x = [-1, 10, 20, 100]
plt.eventplot(x)
plt.xscale('log')


@jklymak (Member) commented Feb 12, 2019

I think it'd be better to fix what seems to be a bug in _path_wrapper.cpp

@jklymak (Member) commented Feb 13, 2019

OK, update_datalim calls collection.get_datalim, which calls path.get_path_collection_extents, where the extents get turned into a four-tuple. But this routine works in pixels, and the result is wanted in data units. collection.get_datalim transforms back to data units, but the logic that finds the smallest positive data value is wrong, because even negative data values can have positive pixel values.

This is all complicated because, of course, the paths are in pixels and the offsets are in data units, so it's not trivial to just pass in different transforms and expect it all to work. It seems a shame not to use the fast C code, but it's also not trivial to see how this can be fixed.
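
A quick demonstration of why pixel-space logic cannot find the smallest positive data value:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_ylim(-10, 10)
# A negative data value still lands at a positive pixel coordinate, so
# scanning for the "smallest positive value" in pixel space says nothing
# about the data.
print(ax.transData.transform((0.5, -5)))  # both pixel coordinates positive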

@jklymak (Member) commented Feb 13, 2019

@hershen I think your solution is the only one that will work. If you wanted to submit a PR, that would be good.

We may also want to deprecate the autolim kwarg for add_collection, since it doesn't do the right thing.

@hershen (Contributor) commented Feb 13, 2019

Sure, I'll be happy to submit a PR.

Do you mind if I play around with it a bit more? Perhaps the fact that the problem doesn't occur with plot means that add_collection can be changed to use the same mechanism? I think that would be a neater solution.

I can also submit a PR fixing the problem for scatter and the other plot types that add collections, and then submit another PR if I find a better solution later, if that's preferred.

@jklymak (Member) commented Feb 14, 2019

        self.add_collection(collection, autolim=False)
        xy = np.vstack((x, y)).T
        self.dataLim.update_from_data_xy(xy, self.ignore_existing_data_limits,
                                         updatex=True, updatey=True)

"fixes" it for me. However that fails all the image tests for scatter. The fundamental problem is that this doesn't take into account large markers, which should theoretically get a (slightly) larger extent to encompass the marker.

I'd argue that this extra extent to take into account the markers is not worth the extra complexity. But someone put some effort into almost making this work, so someone cared about it.

Another approach is to try to figure out the transform logic and write a proper data-space version of collection.get_dataLim.
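
A minimal data-space version might look like this (a hypothetical helper, not existing API; it trades the marker-extent padding for correct minpos handling):

import numpy as np
from matplotlib.transforms import Bbox

def collection_datalim_dataspace(collection):
    # Bound only the offsets, i.e. the x/y data of the collection;
    # marker sizes (which live in pixel space) are deliberately ignored.
    offsets = np.asarray(collection.get_offsets())
    bbox = Bbox.null()
    bbox.update_from_data_xy(offsets, ignore=True)  # also tracks minpos
    return bbox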

@jklymak (Member) commented Feb 14, 2019

Need to mask and ignore existing data lims. Still fails image tests with finite-sized markers...

        self.add_collection(collection, autolim=False)
        xy = np.vstack((x[~x.mask], y[~y.mask])).T
        self.ignore_existing_data_limits = False
        self.dataLim.update_from_data_xy(xy, self.ignore_existing_data_limits,
                                         updatex=True, updatey=True)
        self.autoscale_view()

@anntzer (Contributor) commented Aug 7, 2019

This was mostly fixed by (I think) #13642, but the resulting autolimits remain non-optimal:
[screenshot of the resulting plot]
Symmetric margins would seem more normal.

@github-actions (bot) commented May 13, 2023

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

github-actions bot added the status: inactive label May 13, 2023
@anntzer (Contributor) commented May 13, 2023

This seems to have been fixed.

anntzer closed this as completed May 13, 2023