| 1 | +Original: [Better Python Object Serialization](https://hynek.me/articles/serialization/) |
| 2 | + |
| 3 | +--- |
| 4 | + |
| 5 | +The Python standard library is full of underappreciated gems. One of them |
| 6 | +allows for simple and elegant function dispatching based on argument types. |
| 7 | +This makes it perfect for serialization of arbitrary objects – for example to |
| 8 | +JSON in web APIs and structured logs. |
| 9 | + |
| 10 | +Who hasn’t seen it: |
| 11 | + |
| 12 | +```python |
| 13 | + |
| 14 | + TypeError: datetime.datetime(...) is not JSON serializable |
| 15 | + |
| 16 | +``` |
| 17 | + |
| 18 | +While this shouldn’t be a big deal, it is. The `json` module – which inherited |
| 19 | +its API from `simplejson` – offers two ways to serialize objects: |
| 20 | + |
| 21 | + 1. Implement a `default()` _function_ that takes an object and returns something that [`JSONEncoder`](https://docs.python.org/3/library/json.html#json.JSONEncoder) understands. |
| 22 | + 2. Implement or subclass a `JSONEncoder` yourself and pass it as `cls` to the dump methods. You can implement it on your own or just override the `JSONEncoder.default()` _method_. |
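
The two hooks listed above can be sketched roughly as follows. This is a minimal illustration rather than code from the original article: the names `encode_default` and `DatetimeEncoder` are hypothetical, and only `datetime` is handled.

```python
import json
from datetime import datetime


def encode_default(obj):
    """Option 1: a plain function passed as ``default=``."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    # Signal "unsupported" with TypeError, as the json module itself does.
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


class DatetimeEncoder(json.JSONEncoder):
    """Option 2: a JSONEncoder subclass passed as ``cls=``."""

    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        # Let the base class raise TypeError for unknown types.
        return super().default(obj)


payload = {"ts": datetime(2016, 8, 20, 13, 8, 59)}

via_function = json.dumps(payload, default=encode_default)
via_subclass = json.dumps(payload, cls=DatetimeEncoder)
```

Both calls produce the same string; in either case every custom type has to be handled inside that one `default` hook.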
| 23 | + |
| 24 | +And since alternative implementations aim to be drop-in replacements, they |
| 25 | +imitate the `json` module’s API to various degrees[1]. |
| 26 | + |
| 27 | +## Expandability |
| 28 | + |
| 29 | +What both approaches have in common is that they’re not expandable: there is |
| 30 | +no built-in way to add support for new types. Your single `default()` fallback |
| 31 | +has to know about all custom types you want to serialize. Which means you |
| 32 | +end up writing functions like: |
| 33 | + |
| 34 | +```python |
| 35 | +# Assumes: import enum, attr; from datetime import datetime |
| 36 | +def to_serializable(val): |
| 37 | +    if isinstance(val, datetime): |
| 38 | +        return val.isoformat() + "Z" |
| 39 | +    elif isinstance(val, enum.Enum): |
| 40 | +        return val.value |
| 41 | +    elif attr.has(val.__class__): |
| 42 | +        return attr.asdict(val) |
| 43 | +    elif isinstance(val, Exception): |
| 44 | +        return { |
| 45 | +            "error": val.__class__.__name__, |
| 46 | +            "args": val.args, |
| 47 | +        } |
| 48 | +    return str(val) |
| 49 | + |
| 50 | +``` |
| 51 | + |
| 52 | +This is painful because you have to add serialization for all objects in one |
| 53 | +place[2]. |
| 54 | + |
| 55 | +Alternatively you can try to come up with a general solution of your own, as |
| 56 | +Pyramid’s JSON renderer did with [`JSON.add_adapter`](http://docs.pylonsproject.org/projects/pyramid/en/latest/narr/renderers.html#using-the-add-adapter-method-of-a-custom-json-renderer), |
| 57 | +which uses the adapter registry of the widely underappreciated |
| 58 | +`zope.interface`[3]. |
| 60 | + |
| 61 | +Django on the other hand satisfies itself with a `DjangoJSONEncoder` that is a |
| 62 | +subclass of `json.JSONEncoder` and knows how to encode dates, times, UUIDs, |
| 63 | +and promises. But other than that, you’re on your own again. If you want to go |
| 64 | +further with Django and web APIs, you’re probably already using the Django |
| 65 | +REST framework anyway. They came up with a whole [serialization |
| 66 | +system](http://www.django-rest-framework.org/api-guide/serializers/) that does |
| 67 | +a lot more than just making data `json.dumps()`-ready. |
| 68 | + |
| 69 | +Finally for the sake of completeness I feel like I have to mention my own |
| 70 | +solution in [`structlog`](http://www.structlog.org/en/stable/) that I fiercely |
| 71 | +hated from day one: adding a `__structlog__` method to your classes that |
| 72 | +returns a serializable representation in the tradition of `__str__`. Please |
| 73 | +don’t repeat my mistake; hashtag [software clown](https://softwareclown.com). |
| 74 | + |
| 75 | +* * * |
| 76 | + |
| 77 | +Given how prevalent JSON is, it’s surprising that we have only siloed |
| 78 | +solutions so far. What _I_ personally would like to have is a way to register |
| 79 | +serializers in a central place but in a decentralized fashion that doesn’t |
| 80 | +require any changes to my (or worse: third party) classes. |
| 81 | + |
| 82 | +## Enter PEP 443 |
| 83 | + |
| 84 | +Turns out, Python 3.4 came with a nice solution to this problem in the form of |
| 85 | +[PEP 443](https://www.python.org/dev/peps/pep-0443/): |
| 86 | +[`functools.singledispatch`](https://docs.python.org/3/library/functools.html#functools.singledispatch) |
| 87 | +(also available on [PyPI](https://pypi.org/project/singledispatch/) for |
| 88 | +legacy Python versions). |
| 89 | + |
| 90 | +Put simply, you define a default function and then register additional |
| 91 | +versions of that function depending on the type of the first argument: |
| 92 | + |
| 93 | +```python |
| 94 | + |
| 95 | +from datetime import datetime |
| 96 | +from functools import singledispatch |
| 97 | + |
| 98 | +@singledispatch |
| 99 | +def to_serializable(val): |
| 100 | +    """Used by default.""" |
| 101 | +    return str(val) |
| 102 | + |
| 103 | +@to_serializable.register(datetime) |
| 104 | +def ts_datetime(val): |
| 105 | +    """Used if *val* is an instance of datetime.""" |
| 106 | +    return val.isoformat() + "Z" |
| 107 | + |
| 108 | +``` |
| 109 | + |
| 110 | +Now you can call `to_serializable()` on `datetime` instances too and single |
| 111 | +dispatch will pick the correct function: |
| 112 | + |
| 113 | +```python |
| 114 | +>>> import json |
| 115 | +>>> json.dumps({"msg": "hi", "ts": datetime.now()}, |
| 116 | +...            default=to_serializable) |
| 117 | +'{"msg": "hi", "ts": "2016-08-20T13:08:59.153864Z"}' |
| 118 | + |
| 119 | +``` |
| 120 | + |
| 121 | +This gives you the power to put your serializers wherever you want: along with |
| 122 | +the classes, in a separate module, or along with JSON-related code. _You_ |
| 123 | +choose! But your _classes_ stay clean and you don’t have a huge `if-elif-else` |
| 124 | +branch that you cargo-cult between your projects. |
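
To illustrate the decentralized registration described above, here is a minimal sketch that replaces the earlier `if-elif` chain with separate registrations. The `Color` enum is a hypothetical application type, and the comments about where each serializer "could live" are only suggestions.

```python
import enum
import json
import uuid
from functools import singledispatch


@singledispatch
def to_serializable(val):
    """Fallback for types without a registered serializer."""
    return str(val)


# This registration could live next to the enum definitions...
@to_serializable.register(enum.Enum)
def _(val):
    return val.value


# ...while this one could live in an error-handling module.
@to_serializable.register(Exception)
def _(val):
    return {"error": type(val).__name__, "args": val.args}


class Color(enum.Enum):  # hypothetical application type
    RED = "red"


encoded = json.dumps(
    {
        "color": Color.RED,         # dispatched via the Enum registration
        "err": ValueError("boom"),  # dispatched via the Exception registration
        "id": uuid.UUID(int=1),     # nothing registered: falls back to str()
    },
    default=to_serializable,
)
```

Neither `Color` nor `ValueError` had to be modified. The registrations are matched through each class's inheritance chain, so one serializer for `Exception` also covers its subclasses.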
| 125 | + |
| 126 | +## Going Further |
| 127 | + |
| 128 | +Obviously the utility of `@singledispatch` goes far beyond JSON. Binding |
| 129 | +different behaviors to different types in general and object serialization in |
| 130 | +particular are universally useful[4]. Some of my proofreaders mentioned they |
| 131 | +tried a makeshift approximation using `dict`s of classes to callables and other |
| 132 | +similar atrocities. |
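
One way to see where a plain `dict` of classes falls short: an exact-type lookup knows nothing about inheritance, whereas `singledispatch` resolves subclasses through the method resolution order. A minimal sketch, in which `SERIALIZERS`, `dict_serialize`, and `Timestamp` are hypothetical names chosen for illustration:

```python
from datetime import datetime
from functools import singledispatch

# Hand-rolled registry: exact type -> callable.
SERIALIZERS = {datetime: lambda val: val.isoformat()}


def dict_serialize(val):
    # Looks up the exact class only; subclasses miss and hit the str fallback.
    return SERIALIZERS.get(type(val), str)(val)


@singledispatch
def to_serializable(val):
    return str(val)


@to_serializable.register(datetime)
def _(val):
    return val.isoformat()


class Timestamp(datetime):  # hypothetical subclass of datetime
    pass


ts = Timestamp(2016, 8, 20, 13, 8, 59)

from_dict = dict_serialize(ts)       # exact-type lookup misses Timestamp
from_dispatch = to_serializable(ts)  # resolves to the datetime serializer
```

The `dict` version could be patched to walk `type(val).__mro__` by hand, but at that point it is reimplementing what `singledispatch` already provides.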
| 133 | + |
| 134 | +In other words, `@singledispatch` just may be the function that you’ve been |
| 135 | +missing although it was there all along. |
| 136 | + |
| 137 | +P.S. Of course there’s also a `multipledispatch` on |
| 138 | +[PyPI](https://pypi.org/project/multipledispatch/). |
| 139 | + |
| 140 | +## Footnotes |
| 141 | + |
| 142 | +* * * |
| 143 | + |
| 144 | + 1. However, from the popular ones: [UltraJSON](https://github.com/esnme/ultrajson) doesn’t support custom object serialization at all and [`python-rapidjson`](https://github.com/kenrobbins/python-rapidjson) only supports the `default()` function. ↩︎ |
| 145 | + 2. Although as you can see it’s manageable with `attrs`; maybe [you should use `attrs`](https://glyph.twistedmatrix.com/2016/08/attrs.html)! ↩︎ |
| 146 | + 3. Unfortunately the API Pyramid uses is currently [undocumented](https://github.com/zopefoundation/zope.interface/issues/41) after being transplanted from [`zope.component`](https://docs.zope.org/zope.component/). ↩︎ |
| 147 | + 4. I’ve been told the original incentive for adding single dispatch to the standard library was a more elegant reimplementation of [`pprint`](https://docs.python.org/3.5/library/pprint.html) (that never happened). ↩︎ |