|
| 1 | +Writing code for Python 2 and 3 |
| 2 | +------------------------------- |
| 3 | + |
| 4 | +As of matplotlib 1.4, the `six <http://pythonhosted.org/six/>`_ |
| 5 | +library is used to support Python 2 and 3 from a single code base. |
| 6 | +The `2to3` tool is no longer used. |
| 7 | + |
| 8 | +This document describes some of the issues with that approach and some |
| 9 | +recommended solutions. It is not a complete guide to Python 2 and 3 |
| 10 | +compatibility. |
| 11 | + |
| 12 | +Welcome to the ``__future__`` |
| 13 | +----------------------------- |
| 14 | + |
| 15 | +The top of every `.py` file should include the following:: |
| 16 | + |
| 17 | + from __future__ import absolute_import, division, print_function, unicode_literals |
| 18 | + |
| 19 | +This will make the Python 2 interpreter behave as much like Python 3 as |
| 20 | +possible. |
| 21 | + |
| 22 | +All matplotlib files should also import `six`, whether they are using |
| 23 | +it or not, just to make moving code between modules easier, as `six` |
| 24 | +gets used *a lot*:: |
| 25 | + |
| 26 | + import six |
| 27 | + |
| 28 | +Finding places to use six |
| 29 | +------------------------- |
| 30 | + |
| 31 | +The only way to make sure code works on both Python 2 and 3 is to make sure it |
| 32 | +is covered by unit tests. |
| 33 | + |
| 34 | +However, the `2to3` command-line tool can also be used to locate places |
| 35 | +that require special handling with `six`. |
| 36 | + |
| 37 | +(The `modernize <https://pypi.python.org/pypi/modernize>`_ tool may |
| 38 | +also be handy, though I've never used it personally). |
| 39 | + |
| 40 | +The `six <http://pythonhosted.org/six/>`_ documentation serves as a |
| 41 | +good reference for the sorts of things that need to be updated. |
| 42 | + |
| 43 | +The dreaded ``\u`` escapes |
| 44 | +-------------------------- |
| 45 | + |
| 46 | +When `from __future__ import unicode_literals` is used, all string |
| 47 | +literals (not preceded with a `b`) will become unicode literals. |
| 48 | + |
| 49 | +Normally, one would use "raw" string literals to encode strings that |
| 50 | +contain a lot of backslashes that we don't want Python to interpret as |
| 51 | +special characters. A common example in matplotlib is when it deals |
| 52 | +with TeX and has to represent things like ``r"\usepackage{foo}"``. |
| 53 | +Unfortunately, on Python 2 there is no way to represent `\u` in a raw |
| 54 | +unicode string literal, since it will always be interpreted as the |
| 55 | +start of a unicode character escape, such as `\u20af`. The only |
| 56 | +solution is to use a regular (non-raw) string literal and double every |
| 57 | +backslash, e.g. ``"\\usepackage{foo}"``. |
| 58 | + |
| 59 | +The following shows the problem on Python 2:: |
| 60 | + |
| 61 | + >>> ur'\u' |
| 62 | + File "<stdin>", line 1 |
| 63 | + SyntaxError: (unicode error) 'rawunicodeescape' codec can't decode bytes in |
| 64 | + position 0-1: truncated \uXXXX |
| 65 | + >>> ur'\\u' |
| 66 | + u'\\\\u' |
| 67 | + >>> u'\u' |
| 68 | + File "<stdin>", line 1 |
| 69 | + SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in |
| 70 | + position 0-1: truncated \uXXXX escape |
| 71 | + >>> u'\\u' |
| 72 | + u'\\u' |
| 73 | + |
| 74 | +This bug has been fixed in Python 3. However, we can't take advantage |
| 75 | +of that and still support Python 2:: |
| 76 | + |
| 77 | + >>> r'\u' |
| 78 | + '\\u' |
| 79 | + >>> r'\\u' |
| 80 | + '\\\\u' |
| 81 | + >>> '\u' |
| 82 | + File "<stdin>", line 1 |
| 83 | + SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in |
| 84 | + position 0-1: truncated \uXXXX escape |
| 85 | + >>> '\\u' |
| 86 | + '\\u' |
| 87 | + |
| 88 | +Iteration |
| 89 | +--------- |
| 90 | + |
| 91 | +The behavior of the methods for iterating over the items, values and |
| 92 | +keys of a dictionary has changed in Python 3. Additionally, other |
| 93 | +built-in functions such as `zip`, `range` and `map` have changed to |
| 94 | +return iterators rather than temporary lists. |
| 95 | + |
| 96 | +In many cases, the performance implications of iterating vs. creating |
| 97 | +a temporary list won't matter, so it's tempting to use the form that |
| 98 | +is simplest to read. However, that results in code that behaves |
| 99 | +differently on Python 2 and 3, leading to subtle bugs that may not be |
| 100 | +detected by the regression tests. Therefore, unless the loop in |
| 101 | +question is provably simple and doesn't call into other code, the |
| 102 | +`six` versions that ensure the same behavior on both Python 2 and 3 |
| 103 | +should be used. The following table shows the mapping of equivalent |
| 104 | +semantics between Python 2, 3 and six for `dict.items()`: |
| 105 | + |
| 106 | +============================== ============================== ============================== |
| 107 | +Python 2                       Python 3                       six |
| 108 | +============================== ============================== ============================== |
| 109 | +``d.items()``                  ``list(d.items())``            ``list(six.iteritems(d))`` |
| 110 | +``d.iteritems()``              ``d.items()``                  ``six.iteritems(d)`` |
| 111 | +============================== ============================== ============================== |
| 112 | + |
| 113 | +Numpy-specific things |
| 114 | +--------------------- |
| 115 | + |
| 116 | +When specifying dtypes, all strings must be byte strings on Python 2 |
| 117 | +and unicode strings on Python 3. The best way to handle this is to |
| 118 | +convert them explicitly using `str()`. The same is true of format |
| 119 | +strings in the `struct` standard library module. |
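
As a minimal sketch of the ``str()`` conversion described above, the following uses only the standard library ``struct`` module. The format string ``"<HI"`` and the variable names are illustrative assumptions, not taken from the matplotlib code base, and the explicit ``u`` prefix stands in for the effect of ``unicode_literals``.

```python
import struct

# Hypothetical format string; the u prefix mimics what unicode_literals
# does to a plain literal on Python 2.
fmt = u"<HI"  # little-endian: one unsigned short, one unsigned int

# str() yields the native string type on each interpreter:
# a byte string on Python 2 and a unicode string on Python 3.
native_fmt = str(fmt)

# Pack and unpack two integers using the converted format string.
packed = struct.pack(native_fmt, 1, 2)
unpacked = struct.unpack(native_fmt, packed)
```

The same wrapping would presumably apply to a dtype string passed to NumPy, which is left out here so the sketch stays free of third-party imports.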