Skip to content

Default rstride and cstride for plot_surface changed to 1 from 10. #3142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

tobin
Copy link

@tobin tobin commented Jun 20, 2014

It doesn't seem like a sensible default to throw away 90% of the data.

It doesn't seem like a sensible default to throw away 90% of the data.
@tobin
Copy link
Author

tobin commented Jun 20, 2014

Correction: the old defaults (rstride=10, cstride=10) throw away 99% of the data.

@tacaswell
Copy link
Member

@WeatherGod comments?

I am skeptical of changing defaults as a rule (contra @pelson ). I think the default is what it is to make plotting smooth, large data sets snappy/fury in memory.

@tobin
Copy link
Author

tobin commented Jun 20, 2014

I agree that changing a default value is a bit awkward, as some users' scripts will break. However, I strongly believe that the current default (10) is not sensible.

For example, I encountered this issue because I was very confused why surface_plot was not plotting the data that I gave it, which I had already sampled at the level of mesh refinement necessary to generate a nice plot. I had to go into the matplotlib source code to discover that matplotlib was internally discarding 99% of the data that I wanted to plot.

I strongly believe that the only sensible default is to plot all of the data that the user asks to be plotted, and only discard some of it if explicitly asked.

@WeatherGod
Copy link
Member

This issue has come up from time to time. There are reasons for the defaults, namely that the performance absolutely goes to crap if you have too much data, as well as the plot looking very poor at even moderate densities (due to edgelines crowding out).

With respect to defaults, for many of the discussions of default settings in matplotlib, it comes down to stylistic differences. And the only reason why we keep them is to keep from breaking existing codebases. We are moving towards a very nice solution using "stylesheets" where we would provide a number of stylesheets that let's developers explicitly state that they want a particular version or the latest, etc.

This, however, is not one of those situations. Unlike most other default choices, this one particular set of defaults actually have material impact on the resulting plot, and a material impact on the performance of the code. Changing a default like that -- either in the codebase, or in stylesheets -- is simply not an option.

That all being said... what should definitely happen here is that the r/cstride parameters should be much better documented (here and in the wireframe function). Currently, the options are described, but it may not be clear what it means. Also, the default values are not stated, nor are they in the call signature. At the absolute least, that must be fixed.

@tacaswell tacaswell modified the milestones: proposed next point release, next point release Feb 19, 2015
@tacaswell
Copy link
Member

Closing this as it as 7mo old and two devs are 👎 on it.

This resulted in #3185 which documents the defaults more prominently.

@tacaswell tacaswell closed this Feb 19, 2015
@anntzer
Copy link
Contributor

anntzer commented Aug 30, 2016

@tacaswell @WeatherGod Any chance we can revisit this choice for 2.0?
I agree with the OP that the current defaults are not sensible. Not all datasets are large; and averaging may or may not be the correct way of downsampling.

@tacaswell
Copy link
Member

I would be willing to change this for 2.0, but will defer to @WeatherGod on this.

@tacaswell tacaswell modified the milestones: 2.0 (style change major release), v1.5.0 Aug 30, 2016
@tacaswell tacaswell reopened this Aug 30, 2016
@tacaswell
Copy link
Member

tacaswell commented Sep 12, 2016

ping @WeatherGod

@WeatherGod
Copy link
Member

I honestly just do not think that this default should change. If it
overdecimates your data, then it will be obvious to the user and they
should adjust. If it under-decimate your data, then you are easily at risk
of swamping your memory and cpu, which would require kill -9 to get out of,
potentially losing other work in an interactive session.

Yes, this is true for several of matplotlib's functions, but plot_surface
seems especially sensitive to this problem (partly because of its
inefficient algorithm).

On Mon, Sep 12, 2016 at 3:39 PM, Thomas A Caswell notifications@github.com
wrote:

pning @WeatherGod https://github.com/WeatherGod


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#3142 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AARy-ArPplUoVy92j9k1rnSRPQ8PCV2pks5qpap-gaJpZM4CGKq6
.

@anntzer
Copy link
Contributor

anntzer commented Sep 12, 2016

I don't want to sound rude but... I think this argument holds no water.

There are currently a ton of ways of fatal-abort'ing matplotlib (e.g. #6983, #7052 for recently fixed issues), and imshow itself has such bad memory performance that it can also easily lead to an out of memory situation (#6952). Still I don't think anyone would think it would make sense to auto-downsample arguments to imshow just in case the user was going to do something stupid.

Moreover, the 10x 10x downsampling is just a random shot in the dark. You have no way of knowing whether that's going to be enough or too much downsampling when the goal is to prevent an out-of-memory situation. If we really want to do something about this, we should measure how much memory is currently available and downsample to fit in that amount. Not that I think we should bother doing this, but at least it would be useful, rather than being an annoyance.

See also #5602 for another automatic downsampling approach which may make sense.

@efiring
Copy link
Member

efiring commented Sep 12, 2016

How about a simpler approach: a new default of None for rstride and cstride, in which case the strides are chosen to keep the number of points under some maximum, say 100. This maximum could be controlled via another kwarg (documented with a warning that setting it to a larger value risks hanging the machine). I think this would reduce shotgun blasts to users' feet without depriving them of any of the control they have now. It would be a better default because it would do the expected thing--no decimation--when given small input arrays.

@efiring
Copy link
Member

efiring commented Sep 12, 2016

A fancier downscaling operation could be added in some future release, but it shouldn't delay 2.0 given that a simple approach such as I outlined yields major progress.

@WeatherGod
Copy link
Member

Hmmm, I would be willing to accept something like that.

On Mon, Sep 12, 2016 at 6:08 PM, Eric Firing notifications@github.com
wrote:

A fancier downscaling operation could be added in some future release, but
it shouldn't delay 2.0 given that a simple approach such as I outlined
yields major progress.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#3142 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AARy-LC8IdF_gYKu94em8y5olRKVgXoFks5qpc1xgaJpZM4CGKq6
.

@WeatherGod
Copy link
Member

I have implemented something that I think would satisfy everybody. See #7164.

@tacaswell
Copy link
Member

re-closing in favor of #7164

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants