Skip to content

FIX: Remove type checking for strings in '_validate_linestyle' #8165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 12, 2017

Conversation

afvincent
Copy link
Contributor

This PR fixes #8145 .

The type checking of strings was raising errors when non Unicode strings were passed as arguments to _validate_linestyle (that was introduced in #8040), which can easily occur under Python 2.7 for example. This PR replaces this type checking with a simple try: … except: … approach. It also adds a couple of test cases to explicitly test the situation of non Unicode arguments.

@afvincent afvincent added this to the 2.0.1 (next bug fix release) milestone Feb 27, 2017
@@ -926,7 +930,7 @@ def _validate_linestyle(ls):
# (called inside the instance of validate_nseq_float).
pass

raise ValueError("linestyle must be a string or " +
raise ValueError("linestyle must be a valid string or " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the "+" here?

Copy link
Contributor Author

@afvincent afvincent Feb 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do that. Edit: done in 278913d.

@afvincent
Copy link
Contributor Author

Well Travis is really angry at me for the b'dotted' test case ^^! (Too bad, the tests pass locally on my computer with Python 2.7…) I am going to investigate this.

@afvincent
Copy link
Contributor Author

Ok, it looks like Python 2 (, which I am working with) is less strict about string-like object comparisons:

# Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40) 

In [1]: b'dotted' in [u'dotted']
Out[1]: True

In [2]: 'dotted'.encode('ascii') in [u'dotted']
Out[2]: True

than Python 3:

# Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 12:22:00) 

In [1]: b'dotted' in ['dotted']
Out[1]: False

In [2]: b'dotted'.decode() in ['dotted']
Out[2]: True

In [3]: 'dotted'.encode('ascii').decode() in ['dotted']
Out[3]: True

The commit 973dd5e should fix the previous failures, by using the decode method if it is available.

@afvincent
Copy link
Contributor Author

Hum, Travis seems a bit happier with Python 3, but I just tried locally _validate_linestyle('dotted'.encode('utf-16') on Python 3, and now I understand better @tacaswell's comment in #8145 😞… It raises a nice UnicodeDecodeError as the enconding is not UTF-8.

What is the usual way to deal with this kind of encoding problem?

PS: I'll take take of the PEP8 issue, do not worry.

@afvincent
Copy link
Contributor Author

Ok, I had a look at how matplotlib.text.Text handles the problem of encoding. It accepts “string or anything printable with '%s' conversion”. So actually, I wonder if it would not be better to do something similar in _validate_linestyle if we want to avoid avoid raising a UnicodeDecodeError exception when _validate_linestyle is given a “string” with an “exotic” encoding. For example

    try:
        return _validate_named_linestyle("{:s}".format(ls))
    except KeyError:
        # ls is not a valid name of line style.
        pass

should avoid raising a UnicodeDecodeError with UTF-16 & Co. Or would it be better to really raise a UnicodeDecodeErrorexception? In this case,

    try:
        return _validate_named_linestyle(six.text_type(ls))
    except KeyError:
        # ls is not a valid name of line style.
        pass

may be more consistent with the other string checking done in rcsetup rather than trying to call the decode method if it is available (like in 973dd5e).

@afvincent
Copy link
Contributor Author

Removing the need_review flag for the moment because handling encoding has appear to be a bit harder than I expected and the proper way to do it has still to be chosen.

Copy link
Member

@tacaswell tacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove attempts to decode.

@@ -342,6 +342,8 @@ def generate_validator_testcases(valid):
('', ''), (' ', ' '),
('None', 'none'), ('none', 'none'),
('DoTtEd', 'dotted'),
(b'dotted', 'dotted'), # binary string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should only pass on python2 (where str and bytes are conflated). In python 3 this is correctly failing asb'dotted' != 'dotted' because to do the comparison you would have to either decode the bytes (which we can't do because we don't know the encoding) or encode the string into the same encoding as the bytes (which we can't do because we don't know the encoding).

I suggest putting this in another test with a skip if python3 mark?

@tacaswell
Copy link
Member

Just pass the user input through, in 3, users passing bytes in is wrong and should fail, on 2 it happens to work due to the warts of string v unicode on python2.

@afvincent
Copy link
Contributor Author

Ok thanks for the comment @tacaswell . I'll do this in a moment.

Note to myself: go back to before the decode commit, and play with six.PY3 in the test.

@afvincent afvincent force-pushed the fix_issue_8145_ls_validation branch from 973dd5e to 9ae7265 Compare March 1, 2017 10:35
@afvincent
Copy link
Contributor Author

Rebased! I squashed some similar commits and dropped the former one that attempted to decode the arguments.

@tacaswell No more attempt to decode the arguments, and I modified the tests to take care of the difference between Python 2 and Python 3. TBH, I struggled a bit to find a clean way to do the latter: I hope the style of the new version of the tests is fine. On my local computer with Python 2, the tests are passing (and the doc is building): hopefully Travis with Python 3 will agree.

@anntzer
Copy link
Contributor

anntzer commented Mar 2, 2017

The new approach may not play nicely (as in, cause a warning) due to the same issue as #7658? I haven't checked, but you may want to have a look.

@afvincent
Copy link
Contributor Author

@anntzer Thanks, I'll have a look this weekend.

@afvincent
Copy link
Contributor Author

Based on @anntzer's remark, it appears that indeed

np.array([1, 2]) in ['dotted', 'solid']

raises a FutureWarningbecause one day the comparison that is done under the hood by in will be element-wise (the warning will then become a ValueError exception, which does not belong to the ones that are currently captured in this PR). Actually, in the precise case of this PR, this FutureWarning is not raised because we ignore the case, which calls .lower() before any in operation, and thus raises an AttributeError error that is correctly caught if a Numpy array is given as on-off sequence.

With 97234b5, I reintroduce some early type checking to avoid any comparison of a Numpy array with strings. I agree it is kind of a nuke for a problem that does not even currently exist, but some (super) thin reasons to support this may be:

  • it makes the exception catching clearer (one can remove the AttributeError that may be raised by .lower())
  • if one day one want not to ignore the case of the argument, it will still work properly… (eventhough this API break is not very likely to happen ;) )
  • well, it works as well as the other solution (at least on Python 2 the tests pass locally, I hope Travis will agree on Python 3 ^^).

@afvincent
Copy link
Contributor Author

@tacaswell Ping if you want to have a look when Travis will have run (TL; DR: decode is no more 🐑)

@anntzer I hope you will notice the effort of not using is_string_like, despite the urge of doing so on first thought 😁. (ref: #8011)

if isinstance(ls, six.string_types):
try:
return _validate_named_linestyle(ls)
except (KeyError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no parentheses

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed… 844115f takes care of it.

@anntzer
Copy link
Contributor

anntzer commented Mar 8, 2017

@afvincent Congratulations, you've earned un bon point. (Dix points = une image. Dropping Python2 support = many many points :-))

@afvincent
Copy link
Contributor Author

@anntzer You know, I had a few teachers that were doing so in primary school 😄.

Abouy dropping Python2, well… Let's say that I made a “mistake” when I started to learn Python at the beginning of my Ph.D. But I promise, I'll switch for my next big research project (2020 is coming close anyway…). We'll see if the bounty points lasts until then ;). If it doesn't, at least I'll have access to all the wonderful things like @, f-strings & Co!

@anntzer
Copy link
Contributor

anntzer commented Mar 8, 2017

You know, I grew up in France... (in fact, pretty close to where you did your PhD...)

@afvincent
Copy link
Contributor Author

Rahh, Travis does not seem to be happy (@anntzer you may want to take your bon point back). I think I understand why (if I am right, I was lucky about exceptions catching before…): I'll push a commit when I will have a fix.

@afvincent afvincent force-pushed the fix_issue_8145_ls_validation branch from 844115f to d4bd0cf Compare March 9, 2017 20:33
@afvincent
Copy link
Contributor Author

Rebased (and squashed) with a new version! The former spirit is still there but now the exception catching should be more robust, both on Python 2 (a local pytest runs smoothly with test_rcparams.py and the docs are building) and on Python 3 (manual testing in an interactive session, I hope Travis will agree with me this time).

For the record, previously on both Python versions, the exception handling was flaky or error-prone. For example on Python 2, a UnicodeDecodeError was not caught with arguments like 'dotted'.encode('utf-16'), causing a short exit that was not expected. Unfortunately, this exception was then caught by pytest, which was expecting a ValueError and was thus marking the test as fine… On Python 3, there was (at least) a funny doubly wrong behavior: byte-like ls argument that were of even-length were passed to the instance of validate_nseq_float, which was converting them into funny on-off ink sequences (for example validate_nseq_float()('dotted'.encode('utf-16')) corresponds to [255.0, 254.0, 100.0, 0.0, 111.0, 0.0, 116.0, 0.0, 116.0, 0.0, 101.0, 0.0, 100.0, 0.0] 😄 )

@afvincent
Copy link
Contributor Author

Ok, now Travis stays calm :)! Ping to @tacaswell if you want to have a look, now that Travis is peacefull. And @anntzer because there have been quite a few changes since you had a look yesterday (sorry for that 🐑 ).

Copy link
Contributor

@anntzer anntzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor fix.

# 'solid'.encode('utf-16'), may raise a unicode error.
raise ValueError("the linestyle string is not a valid string.")

if hasattr(ls, 'decode'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isinstance(ls, bytes) seems to be more explicit? (in the Py2 case this will already have been covered by the case above).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I guess it should be isinstance(ls, (bytes, bytearray)) if one want to avoid jokes (and keep an identical behavior). I'll try to test that during the day.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's memoryview on Py3, and it doesn't have a decode attribute. I think we should just not worry about that case (people can always pass in objects with arbitrarily messed up __len__/__iter__/...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, on Python 3.6, I get

In [16]: hello = bytearray("Hello".encode('utf-16'))

In [17]: hasattr(hello, 'decode')
Out[17]: True

so this looks like bytearray instances do have a decode method, don't they?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops sorry, was confusing with buffer/memoryview. Ignore what I said :-)

except (UnicodeDecodeError, KeyError):
# On Python 2, string-like *ls*, like for example
# 'solid'.encode('utf-16'), may raise a unicode error.
raise ValueError("the linestyle string is not a valid string.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For debugging purpose of our users, could you add here what the value passed is?
raise ValueError("the linestyle string %r is not a valid string." % ls) should work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure @anntzer will support the idea of using old string formatting ^^. Joke apart, the idea to return argument that was passed seems OK to me: I'll do that during the day too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait doesn't everyone use f"the linestyle string {ls} is not a valid string." these days? :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however you prefer the string formatting to be :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we let master branch go 3-only we probably can target 3.6+ if we go back to supporting the 'last 2' python minor versions and are targeting a 3 only release for 2018-07 (py3.7 is scheduled for 2018-06-15).....

@NelleV NelleV changed the title FIX: Remove type checking for strings in '_validate_linestyle' [MRG+1] FIX: Remove type checking for strings in '_validate_linestyle' Mar 11, 2017
@afvincent afvincent force-pushed the fix_issue_8145_ls_validation branch from d4bd0cf to ed04d93 Compare March 11, 2017 10:42
@afvincent
Copy link
Contributor Author

Re-rebased: squashed to 2 single commits and reordered them (1. code changes, 2. update of the related test).

The only difference with the previous commits should be:

  • hasattr(ls, 'decode') <- isinstance(ls, (bytes, bytearray)), because “Explicit is better than implicit;”
  • more useful exception messages for debugging by adding the “representation” of the ls argument (done with "Yada yada {!r} yada".format(ls), because 2020 is far away and I like format :) ).

@NelleV
Copy link
Member

NelleV commented Mar 11, 2017

This looks good to me.

@NelleV NelleV changed the title [MRG+1] FIX: Remove type checking for strings in '_validate_linestyle' [MRG+2] FIX: Remove type checking for strings in '_validate_linestyle' Mar 11, 2017
@anntzer anntzer merged commit 3c77c28 into matplotlib:master Mar 12, 2017
@anntzer
Copy link
Contributor

anntzer commented Mar 12, 2017

Thanks!

@QuLogic QuLogic changed the title [MRG+2] FIX: Remove type checking for strings in '_validate_linestyle' FIX: Remove type checking for strings in '_validate_linestyle' Mar 12, 2017
@afvincent
Copy link
Contributor Author

Thanks to all who reviewed the PR :). It was a windy journey, but an instructive one from my side ^^.

@tacaswell
Copy link
Member

You have hit on both why the bytes / unicode changes were made in python3 and why the transition has been a bit rough 😈 .

@QuLogic
Copy link
Member

QuLogic commented Mar 27, 2017

There does not appear to be a _validate_linestyle in v2.0.x; @afvincent can you verify if this needs backporting?

@afvincent
Copy link
Contributor Author

@QuLogic Indeed, backporting does not seem required. #8040 (which is fixed by this PR) was not backported but milestoned for 2.1 (by @tacaswell) instead: that is why there is no _validate_linestyle in v2.0.x branch ⇒ removing the need_backport flag.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warning treated as error while generating docs
5 participants