FIX: Remove type checking for strings in '_validate_linestyle' #8165

afvincent · 2017-02-27T17:28:55Z

This PR fixes #8145 .

The type checking of strings (introduced in #8040) raised errors when non-Unicode strings were passed to _validate_linestyle, which can easily happen under Python 2.7, for example. This PR replaces the type checking with a simple try: … except: … approach. It also adds a couple of test cases that explicitly exercise non-Unicode arguments.

anntzer · 2017-02-27T17:56:22Z

lib/matplotlib/rcsetup.py

@@ -926,7 +930,7 @@ def _validate_linestyle(ls):
        # (called inside the instance of validate_nseq_float).
        pass

-    raise ValueError("linestyle must be a string or " +
+    raise ValueError("linestyle must be a valid string or " +


Remove the "+" here?

I'll do that. Edit: done in 278913d.

afvincent · 2017-02-27T18:24:36Z

Well Travis is really angry at me for the b'dotted' test case ^^! (Too bad, the tests pass locally on my computer with Python 2.7…) I am going to investigate this.

afvincent · 2017-02-27T20:20:57Z

Ok, it looks like Python 2 (which I am working with) is less strict about string-like object comparisons:

# Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40) 

In [1]: b'dotted' in [u'dotted']
Out[1]: True

In [2]: 'dotted'.encode('ascii') in [u'dotted']
Out[2]: True

than Python 3:

# Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 12:22:00) 

In [1]: b'dotted' in ['dotted']
Out[1]: False

In [2]: b'dotted'.decode() in ['dotted']
Out[2]: True

In [3]: 'dotted'.encode('ascii').decode() in ['dotted']
Out[3]: True

The commit 973dd5e should fix the previous failures, by using the decode method if it is available.
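
A minimal sketch of the "decode if available" idea described above, written for Python 3. The helper name `to_text` and the `names` list are hypothetical and are not the code in 973dd5e; the sketch only illustrates the technique and relies on `decode()` defaulting to UTF-8.

```python
def to_text(ls):
    """Return *ls* as text, decoding it first when it has a ``decode`` method.

    Hypothetical helper, for illustration only.
    """
    if hasattr(ls, 'decode'):
        # bytes.decode() defaults to UTF-8 on Python 3.
        return ls.decode()
    return ls


names = ['solid', 'dashed', 'dashdot', 'dotted']

# On Python 3, bytes never compare equal to str, so this lookup fails.
print(b'dotted' in names)

# After decoding, the lookup succeeds.
print(to_text(b'dotted') in names)
```

Objects without a `decode` method, such as `str` on Python 3, pass through unchanged.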

afvincent · 2017-02-27T20:34:14Z

Hum, Travis seems a bit happier with Python 3, but I just tried _validate_linestyle('dotted'.encode('utf-16')) locally on Python 3, and now I understand @tacaswell's comment in #8145 better 😞… It raises a nice UnicodeDecodeError because the encoding is not UTF-8.

What is the usual way to deal with this kind of encoding problem?

PS: I'll take care of the PEP8 issue, do not worry.
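
The failure described above can be reproduced without Matplotlib. This is a small Python 3 sketch; the variable names are illustrative only.

```python
# UTF-16 output starts with a byte-order mark, which is not valid UTF-8.
payload = 'dotted'.encode('utf-16')

try:
    payload.decode()  # no argument means UTF-8 on Python 3
    outcome = 'decoded'
except UnicodeDecodeError:
    outcome = 'UnicodeDecodeError'

print(outcome)

# Decoding works only when the right encoding is named explicitly,
# which a validator receiving arbitrary bytes cannot know.
print(payload.decode('utf-16'))
```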

afvincent · 2017-02-28T07:57:46Z

Ok, I had a look at how matplotlib.text.Text handles the problem of encoding. It accepts “string or anything printable with '%s' conversion”. So actually, I wonder if it would not be better to do something similar in _validate_linestyle if we want to avoid raising a UnicodeDecodeError exception when _validate_linestyle is given a “string” with an “exotic” encoding. For example

    try:
        return _validate_named_linestyle("{:s}".format(ls))
    except KeyError:
        # ls is not a valid name of line style.
        pass

should avoid raising a UnicodeDecodeError with UTF-16 & Co. Or would it be better to really raise a UnicodeDecodeError exception? In this case,

    try:
        return _validate_named_linestyle(six.text_type(ls))
    except KeyError:
        # ls is not a valid name of line style.
        pass

may be more consistent with the other string checking done in rcsetup rather than trying to call the decode method if it is available (like in 973dd5e).
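
For reference, here is how the two conversions behave on Python 3 when given bytes. The comment above was written from a Python 2 setup, so this sketch only shows the Python 3 side, where `six.text_type` is `str`; plain `str` is used here to stay self-contained.

```python
ls = b'dotted'

# str() on bytes returns the repr-like form, not the decoded text.
as_text = str(ls)
print(as_text)

# "{:s}" passes a non-empty format spec to bytes, which does not support it.
try:
    "{:s}".format(ls)
    format_outcome = 'formatted'
except TypeError:
    format_outcome = 'TypeError'

print(format_outcome)
```

Neither conversion yields the text 'dotted' on Python 3, so a name lookup on the result would still fail.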

afvincent · 2017-02-28T08:01:13Z

Removing the need_review flag for the moment because handling encoding has turned out to be a bit harder than I expected and the proper way to do it has yet to be chosen.

tacaswell

Remove attempts to decode.

tacaswell · 2017-03-01T05:30:30Z

lib/matplotlib/tests/test_rcparams.py

@@ -342,6 +342,8 @@ def generate_validator_testcases(valid):
                     ('', ''), (' ', ' '),
                     ('None', 'none'), ('none', 'none'),
                     ('DoTtEd', 'dotted'),
+                     (b'dotted', 'dotted'),  # binary string


These should only pass on python2 (where str and bytes are conflated). In python 3 this is correctly failing as b'dotted' != 'dotted' because to do the comparison you would have to either decode the bytes (which we can't do because we don't know the encoding) or encode the string into the same encoding as the bytes (which we can't do because we don't know the encoding).

I suggest putting this in another test with a skip if python3 mark?
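
One way to make the expectations depend on the interpreter is sketched below. Instead of a skip mark, it branches on the Python version when building the case lists. `toy_validate` is a hypothetical stand-in, not Matplotlib's validator, and the Python 2 branch is an assumption that has not been exercised here.

```python
import sys

PY3 = sys.version_info[0] >= 3
NAMES = ('solid', 'dashed', 'dashdot', 'dotted')


def toy_validate(ls):
    """Hypothetical stand-in: case-insensitive lookup of a linestyle name."""
    try:
        name = ls.lower()
    except AttributeError:
        raise ValueError("not a string: {!r}".format(ls))
    if name not in NAMES:
        raise ValueError("unknown linestyle: {!r}".format(ls))
    return name


success_cases = [('dotted', 'dotted'), ('DoTtEd', 'dotted')]
fail_cases = ['aardvark']

if PY3:
    # bytes and str never compare equal on Python 3: expect a failure.
    fail_cases.append(b'dotted')
else:
    # On Python 2, str and bytes are the same type: expect a success.
    success_cases.append((b'dotted', 'dotted'))

for arg, expected in success_cases:
    assert toy_validate(arg) == expected

for arg in fail_cases:
    try:
        toy_validate(arg)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for {!r}".format(arg))
```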

tacaswell · 2017-03-01T05:32:41Z

Just pass the user input through. On Python 3, users passing bytes in is wrong and should fail; on Python 2 it happens to work due to the warts of string vs. unicode.

afvincent · 2017-03-01T06:48:28Z

Ok thanks for the comment @tacaswell . I'll do this in a moment.

Note to myself: go back to before the decode commit, and play with six.PY3 in the test.

afvincent · 2017-03-01T10:45:44Z

Rebased! I squashed some similar commits and dropped the former one that attempted to decode the arguments.

@tacaswell No more attempt to decode the arguments, and I modified the tests to take care of the difference between Python 2 and Python 3. TBH, I struggled a bit to find a clean way to do the latter: I hope the style of the new version of the tests is fine. On my local computer with Python 2, the tests are passing (and the doc is building): hopefully Travis with Python 3 will agree.

anntzer · 2017-03-02T01:49:16Z

The new approach may not play nicely (as in, cause a warning) due to the same issue as #7658? I haven't checked, but you may want to have a look.

afvincent · 2017-03-02T16:42:01Z

@anntzer Thanks, I'll have a look this weekend.

afvincent · 2017-03-08T20:16:46Z

Based on @anntzer's remark, it appears that indeed

np.array([1, 2]) in ['dotted', 'solid']

raises a FutureWarning because one day the comparison done under the hood by in will be element-wise (the warning will then become a ValueError exception, which is not among the ones currently caught in this PR). Actually, in the precise case of this PR, this FutureWarning is not raised because we ignore case: .lower() is called before any in operation, so a Numpy array given as an on-off sequence raises an AttributeError that is correctly caught.

With 97234b5, I reintroduce some early type checking to avoid any comparison of a Numpy array with strings. I agree it is kind of a nuke for a problem that does not even currently exist, but some (super) thin reasons to support this may be:

it makes the exception catching clearer (one can remove the AttributeError that may be raised by .lower())
if one day one wants to stop ignoring the case of the argument, it will still work properly… (even though this API break is not very likely to happen ;) )
well, it works as well as the other solution (at least on Python 2 the tests pass locally, I hope Travis will agree on Python 3 ^^).
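
The early type check argued for above can be sketched as follows for Python 3, where `six.string_types` reduces to `str`. `toy_validate` and `NAMES` are hypothetical; this is not the code in 97234b5, and a plain list stands in for a Numpy array.

```python
NAMES = ('solid', 'dashed', 'dashdot', 'dotted')


def toy_validate(ls):
    """Hypothetical validator: a named style, else an on-off ink sequence."""
    if isinstance(ls, str):
        # Only real strings ever reach the `in` comparison, so a sequence
        # or array is never compared against the list of names.
        name = ls.lower()
        if name in NAMES:
            return name
        raise ValueError("{!r} is not a valid linestyle name".format(ls))

    # Everything else is treated as an on-off ink sequence.
    try:
        return [float(x) for x in ls]
    except (TypeError, ValueError):
        raise ValueError("{!r} is not a valid on-off sequence".format(ls))


print(toy_validate('DoTtEd'))
print(toy_validate([1, 2]))
```

With the guard in place, no AttributeError from `.lower()` needs to be caught, because `.lower()` is only called on strings.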

afvincent · 2017-03-08T20:29:21Z

@tacaswell Ping if you want to have a look once Travis has run (TL;DR: decode is no more 🐑)

@anntzer I hope you will notice the effort of not using is_string_like, despite the urge of doing so on first thought 😁. (ref: #8011)

anntzer · 2017-03-08T20:41:19Z

lib/matplotlib/rcsetup.py

+    if isinstance(ls, six.string_types):
+        try:
+            return _validate_named_linestyle(ls)
+        except (KeyError):


no parentheses

Indeed… 844115f takes care of it.

anntzer · 2017-03-08T20:43:26Z

@afvincent Congratulations, you've earned un bon point. (Dix points = une image. Dropping Python2 support = many many points :-))

afvincent · 2017-03-08T23:49:36Z

@anntzer You know, I had a few teachers that were doing so in primary school 😄.

About dropping Python2, well… Let's say that I made a “mistake” when I started to learn Python at the beginning of my Ph.D. But I promise, I'll switch for my next big research project (2020 is coming close anyway…). We'll see if the bounty points last until then ;). If they don't, at least I'll have access to all the wonderful things like @, f-strings & Co!

anntzer · 2017-03-08T23:51:57Z

You know, I grew up in France... (in fact, pretty close to where you did your PhD...)

afvincent · 2017-03-09T17:57:02Z

Rahh, Travis does not seem to be happy (@anntzer you may want to take your bon point back). I think I understand why (if I am right, I was lucky about exception catching before…): I'll push a commit once I have a fix.

afvincent · 2017-03-09T20:33:18Z

Rebased (and squashed) with a new version! The former spirit is still there but now the exception catching should be more robust, both on Python 2 (a local pytest runs smoothly with test_rcparams.py and the docs are building) and on Python 3 (manual testing in an interactive session, I hope Travis will agree with me this time).

For the record, previously on both Python versions, the exception handling was flaky or error-prone. For example on Python 2, a UnicodeDecodeError was not caught with arguments like 'dotted'.encode('utf-16'), causing an early exit that was not expected. Unfortunately, this exception was then caught by pytest, which was expecting a ValueError and was thus marking the test as fine… On Python 3, there was (at least) a funny doubly wrong behavior: bytes-like ls arguments of even length were passed to the instance of validate_nseq_float, which was converting them into funny on-off ink sequences (for example validate_nseq_float()('dotted'.encode('utf-16')) corresponds to [255.0, 254.0, 100.0, 0.0, 111.0, 0.0, 116.0, 0.0, 116.0, 0.0, 101.0, 0.0, 100.0, 0.0] 😄 )
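
The Python 3 behavior described above comes from the fact that iterating over bytes yields integers. The sketch below shows only that mechanism; it does not call validate_nseq_float, and the two leading values depend on the byte-order mark emitted on the platform.

```python
raw = 'dotted'.encode('utf-16')

# Each item of a bytes object is an int on Python 3, so float() accepts it.
as_floats = [float(b) for b in raw]

# 2 bytes of byte-order mark + 2 bytes per character: an even length,
# so the result looks like a list of on-off pairs.
print(len(raw) % 2 == 0)
print(as_floats)
```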

afvincent · 2017-03-09T22:57:32Z

Ok, now Travis stays calm :)! Ping to @tacaswell if you want to have a look, now that Travis is peaceful. And @anntzer because there have been quite a few changes since you had a look yesterday (sorry for that 🐑 ).

anntzer

Minor fix.

anntzer · 2017-03-10T00:32:49Z

lib/matplotlib/rcsetup.py

+            # 'solid'.encode('utf-16'), may raise a unicode error.
+            raise ValueError("the linestyle string is not a valid string.")
+
+    if hasattr(ls, 'decode'):


isinstance(ls, bytes) seems to be more explicit? (in the Py2 case this will already have been covered by the case above).

Then I guess it should be isinstance(ls, (bytes, bytearray)) if one wants to avoid jokes (and keep an identical behavior). I'll try to test that during the day.

It's memoryview on Py3, and it doesn't have a decode attribute. I think we should just not worry about that case (people can always pass in objects with arbitrarily messed up __len__/__iter__/...)

Huh, on Python 3.6, I get

In [16]: hello = bytearray("Hello".encode('utf-16'))

In [17]: hasattr(hello, 'decode')
Out[17]: True

so this looks like bytearray instances do have a decode method, don't they?

Oops sorry, was confusing with buffer/memoryview. Ignore what I said :-)
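
To summarize the exchange above, this Python 3 sketch compares the two checks on the types that were mentioned. The `candidates` list is illustrative only.

```python
candidates = [b'solid', bytearray(b'solid'), memoryview(b'solid'), 'solid']

results = {}
for obj in candidates:
    # (duck-typed check, explicit type check)
    results[type(obj).__name__] = (
        hasattr(obj, 'decode'),
        isinstance(obj, (bytes, bytearray)),
    )

for type_name, checks in results.items():
    print(type_name, checks)
```

On Python 3 the two checks agree for all four types: bytes and bytearray have a decode method, while memoryview and str do not.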

NelleV · 2017-03-10T03:07:35Z

lib/matplotlib/rcsetup.py

+        except (UnicodeDecodeError, KeyError):
+            # On Python 2, string-like *ls*, like for example
+            # 'solid'.encode('utf-16'), may raise a unicode error.
+            raise ValueError("the linestyle string is not a valid string.")


For debugging purpose of our users, could you add here what the value passed is?
raise ValueError("the linestyle string %r is not a valid string." % ls) should work.

I am not sure @anntzer will support the idea of using old string formatting ^^. Joking aside, the idea of reporting the argument that was passed seems OK to me: I'll do that during the day too.

Wait doesn't everyone use f"the linestyle string {ls} is not a valid string." these days? :-)

👍 to this.

however you prefer the string formatting to be :)
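
For comparison, the three formatting styles mentioned above produce the same message when the repr conversion is requested. The f-string needs Python 3.6 or later, and uses `!r` here to match the `%r` suggestion. Wrapping the argument in a one-element tuple for `%` is an addition of this sketch: it keeps the call safe if *ls* is itself a tuple.

```python
ls = b'dotted'

# Old-style formatting; the one-element tuple avoids surprises when ls
# is a tuple, which % would otherwise unpack as several arguments.
old_style = "the linestyle string %r is not a valid string." % (ls,)

# str.format with an explicit repr conversion.
with_format = "the linestyle string {!r} is not a valid string.".format(ls)

# f-string (Python 3.6+) with the same conversion.
with_fstring = f"the linestyle string {ls!r} is not a valid string."

print(old_style)
print(old_style == with_format == with_fstring)
```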

When we let master branch go 3-only we probably can target 3.6+ if we go back to supporting the 'last 2' python minor versions and are targeting a 3 only release for 2018-07 (py3.7 is scheduled for 2018-06-15).....

afvincent · 2017-03-11T10:53:06Z

Re-rebased: squashed into 2 commits and reordered them (1. code changes, 2. update of the related test).

The only difference with the previous commits should be:

hasattr(ls, 'decode') is replaced by isinstance(ls, (bytes, bytearray)), because “Explicit is better than implicit”;
more useful exception messages for debugging by adding the “representation” of the ls argument (done with "Yada yada {!r} yada".format(ls), because 2020 is far away and I like format :) ).
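
A minimal sketch of how the two changes listed above fit together on Python 3: an explicit bytes check whose error message includes the repr of the argument. `reject_bytes` and the message wording are hypothetical, not the text used in the PR.

```python
def reject_bytes(ls):
    """Hypothetical guard: refuse bytes-like input with a descriptive error."""
    if isinstance(ls, (bytes, bytearray)):
        # {!r} puts the repr of the offending value into the message.
        raise ValueError(
            "linestyle {!r} is bytes-like, not a string".format(ls))
    return ls


try:
    reject_bytes(bytearray(b'solid'))
except ValueError as err:
    message = str(err)

print(message)
```

Because the repr shows the type as well as the content, the message makes it clear whether a `bytes` or a `bytearray` object was passed.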

NelleV · 2017-03-11T20:11:42Z

This looks good to me.

anntzer · 2017-03-12T17:29:30Z

Thanks!

afvincent · 2017-03-12T19:04:36Z

Thanks to all who reviewed the PR :). It was a windy journey, but an instructive one from my side ^^.

tacaswell · 2017-03-12T19:33:01Z

You have hit on both why the bytes / unicode changes were made in python3 and why the transition has been a bit rough 😈 .

QuLogic · 2017-03-27T04:56:24Z

There does not appear to be a _validate_linestyle in v2.0.x; @afvincent can you verify if this needs backporting?

afvincent · 2017-03-28T08:23:59Z

@QuLogic Indeed, backporting does not seem required. #8040 (which is fixed by this PR) was not backported but milestoned for 2.1 (by @tacaswell) instead: that is why there is no _validate_linestyle in v2.0.x branch ⇒ removing the need_backport flag.

afvincent added status: needs review topic: rcparams labels Feb 27, 2017

afvincent added this to the 2.0.1 (next bug fix release) milestone Feb 27, 2017

afvincent mentioned this pull request Feb 27, 2017

Warning treated as error while generating docs #8145

Closed

tacaswell approved these changes Feb 27, 2017

anntzer reviewed Feb 27, 2017

afvincent added In Progress and removed status: needs review labels Feb 28, 2017

tacaswell requested changes Mar 1, 2017

tacaswell reviewed Mar 1, 2017

afvincent force-pushed the fix_issue_8145_ls_validation branch from 973dd5e to 9ae7265 Compare March 1, 2017 10:35

afvincent mentioned this pull request Mar 8, 2017

Reduce number of CI builds #8232

Closed

afvincent added status: needs review and removed In Progress labels Mar 8, 2017

anntzer requested changes Mar 8, 2017

anntzer approved these changes Mar 8, 2017

afvincent force-pushed the fix_issue_8145_ls_validation branch from 844115f to d4bd0cf Compare March 9, 2017 20:33

afvincent mentioned this pull request Mar 9, 2017

changes in travis's build environment #8239

Merged

anntzer requested changes Mar 10, 2017

NelleV reviewed Mar 10, 2017

tacaswell approved these changes Mar 10, 2017

NelleV changed the title ~~FIX: Remove type checking for strings in '_validate_linestyle'~~ [MRG+1] FIX: Remove type checking for strings in '_validate_linestyle' Mar 11, 2017

afvincent added 2 commits March 11, 2017 11:38

More robust type checking in '_validate_linestyle', on both Py2 and Py3

0a1473e

tests now depend on python version to check cases with bytes args

ed04d93

afvincent force-pushed the fix_issue_8145_ls_validation branch from d4bd0cf to ed04d93 Compare March 11, 2017 10:42

NelleV approved these changes Mar 11, 2017

NelleV changed the title ~~[MRG+1] FIX: Remove type checking for strings in '_validate_linestyle'~~ [MRG+2] FIX: Remove type checking for strings in '_validate_linestyle' Mar 11, 2017

anntzer approved these changes Mar 12, 2017

anntzer merged commit 3c77c28 into matplotlib:master Mar 12, 2017

QuLogic added the Needs backport label Mar 12, 2017

QuLogic changed the title ~~[MRG+2] FIX: Remove type checking for strings in '_validate_linestyle'~~ FIX: Remove type checking for strings in '_validate_linestyle' Mar 12, 2017

QuLogic removed the status: needs review label Mar 20, 2017

afvincent removed the Needs backport label Mar 28, 2017
