Skip to content

FIX: make sure nans are dealt with in path #4526

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

tacaswell
Copy link
Member

With gcc 5.1.0 it seems that the inner loop of the fast-path in
PathNanRemover gets optimized out of existence and the first non-finite
entry in the path prevents any further drawing.

This PR (rather hackishly) adds explicit de-references to x and y
to the loop to make sure the compiler does not decide it is unneeded.

Closes #4525

attn @efiring @mdboom @ianthomas23 I am in a bit over my head here (not that that has ever stopped me before).

@tacaswell tacaswell added this to the next point release milestone Jun 15, 2015
@efiring
Copy link
Member

efiring commented Jun 15, 2015

Never having studied C++, I'm lost in this code. I am guessing, however, that there is a cleaner solution involving a change in declarations somewhere--maybe 'volatile' modifiers or something like that.

@mdboom
Copy link
Member

mdboom commented Jun 15, 2015

Wow. Seems like a pretty bad compiler bug. It can't possibly be optimizing that away on purpose.

You could try volatile, but I don't think that's related -- volatile is generally to indicate that a value may be altered by another thread (or interrupt handler) -- and the compiler won't let you alias the value with a non-volatile variable. We're not multithreaded. register might make sense, because it would force its loading into the CPU register, but my understanding is that gcc has ignored that for years on x86 anyway. Though in the presence of an unknown bug, anything is possible, I suppose. I think this fix is probably better than either of those options.

I can't reproduce on the 5.1.1 (which is what Fedora currently has -- it's actually a prerelease, since 5.1.1 hasn't been released yet), suggesting perhaps that the bug has already been fixed in the compiler and it's only a matter of time before others get this fix. But it's not like your fix here is terribly horrible -- maybe add a comment that it's to get around an apparent compiler bug in gcc-5.1.0, since I'd be likely to come across this naively and think "um, no" and remove it.

BTW: It seems #4252 is the wrong bug, perhaps?

@mdboom
Copy link
Member

mdboom commented Jun 15, 2015

Actually -- I take this back. I can reproduce on 5.1.1-1 (Fedora prerelease) with Python 3.4.3. I can't reproduce on 2.7.9. So that's another data point (and also somewhat disturbing...) I'll play with this for a bit.

@tacaswell
Copy link
Member Author

should be #4525.

I am glad you can reproduce this. I spent most of my evening yesterday and was getting worried I was making some really obvious mistake.

That it works in 2.7, but not with 3.4 is rather baffling.

With gcc 5.1.0 it seems that the inner loop of the fast-path in
PathNanRemover gets optimized out of existence and the first non-finite
entry in the path prevents any further drawing.

This PR (rather hackishly) adds explicit de-references to x and y
to the loop to make sure the compiler does not decide it is unneeded.

Closes matplotlib#4525
@tacaswell tacaswell force-pushed the fix_prevent_loop_optomization branch from 5f747ca to ed7a3dc Compare June 15, 2015 14:01
@tacaswell
Copy link
Member Author

@mdboom fyi, force pushed to fix the commit message.

This looks like a bad idea, but this is working around a complier
bug.
@mdboom
Copy link
Member

mdboom commented Jun 15, 2015

I think I've narrowed this down to the use of the -fno-strict-aliasing flag. My Fedora-provided system Python 2.7.9 passes this flag to gcc when compiling extensions, and my self-compiled Python 3.4 does not. (And adding the flag fixes it everywhere). So the difference is not so much in the Python version but who it was configured/compiled.

I'm looking into whether we can fix our breaking of strict aliasing rules (the compile with -fno-strict-aliasing is raising some warnings), or we just forcibly add the flag (which gets tricky for non-gcc compilers). Anyway, things are starting to make more sense, at least...

@tacaswell
Copy link
Member Author

On both of my arch boxes there are 6-7 test failures that I have been ignoring because tests pass on travis, I should probably track those down to make sure none of these are due to similar things.

@mdboom
Copy link
Member

mdboom commented Jun 15, 2015

The issue is here in mpl_isnan.h. This macro is the core of all of the MPL_isnan64, MPL_isfinite64 etc. macros. Below, u is a double, and it gets recast as an int64 in order to look at the individual bits and determine what special flags are set.

#if !defined(MPL_U64)
#define MPL_U64(u) (* (MPL_Int64 *) &(u) )
#endif /* MPL_U64 */

This kind of thing has had "undefined behavior" in C for a long time (forever?) but we could get away with it when compilers were less aggressive about optimizing. And gcc's -fno-strict-aliasing has been a way to get the old behavior.

I think what we should do is move all of these to use the C++ stdlib's isfinite, isnan etc. They should be just as fast, and inlined etc. There may have been a portability issue with them back when John wrote matplotlib's macros (just guessing there) but I truly doubt that's an issue anymore.

I should have a net-negative-lines-of-code PR shortly.

@tacaswell
Copy link
Member Author

Closing in favor of #4527

@tacaswell tacaswell closed this Jun 15, 2015
@tacaswell tacaswell deleted the fix_prevent_loop_optomization branch July 22, 2015 02:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants