extend the Encoder/Decoder state API to be type-aware #13017

thrau · 2025-08-18T11:01:39Z

Motivation

Currently our decoders serve as adapter only to dill pickle. Because they type information is stored in the pickle, you can de-serialize a pickle and get the correct type without having to specify it at deserialization. Unfortunately this process breaks down when you've moved or renamed a class that you want to deserialize into.

So to be able to de-couple code and data serialization, and separate reader/writer schema, we need the decoder to be type-aware. The Encoder has to be type aware for certain operations as well, for example if we pass a TypedDict to encode, we won't know which type to encode since python doesn't retain type information for typed dicts. So this is just to make the encoding API a more flexible.

Changes

state Encoder and Decoder APIs now optionally take a expected_type argument that allows an underlying implementation to be specific about the python type to serialize or deserialize into

github-actions · 2025-08-18T11:04:04Z

Test Results - Preflight, Unit

22 158 tests ±0 20 421 ✅ ±0 6m 23s ⏱️ -25s
1 suites ±0 1 737 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit b30e59a. ± Comparison against base commit 38aecc7.

♻️ This comment has been updated with latest results.

github-actions · 2025-08-18T12:03:39Z

LocalStack Community integration with Pro

2 files ±0 2 suites ±0 1h 45m 15s ⏱️ + 3m 39s
4 632 tests ±0 4 189 ✅ ±0 443 💤 ±0 0 ❌ ±0
4 634 runs ±0 4 189 ✅ ±0 445 💤 ±0 0 ❌ ±0

Results for commit b30e59a. ± Comparison against base commit 38aecc7.

♻️ This comment has been updated with latest results.

github-actions · 2025-08-27T10:35:29Z

Test Results (amd64) - Acceptance

7 tests ±0 5 ✅ ±0 3m 13s ⏱️ +6s
1 suites ±0 2 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit b30e59a. ± Comparison against base commit 38aecc7.

♻️ This comment has been updated with latest results.

github-actions · 2025-08-27T11:07:12Z

Test Results (amd64) - Integration, Bootstrap

5 files 5 suites 2h 19m 56s ⏱️
5 003 tests 4 402 ✅ 601 💤 0 ❌
5 009 runs 4 402 ✅ 607 💤 0 ❌

Results for commit b30e59a.

♻️ This comment has been updated with latest results.

giograno

Thanks for the PR!

I believe we need to account for the writer schema to be provided to the decoder (as mentioned in the comment below). The "weird" failing test I showed you yesterday was due to the fact that we were providing the reader schema as the writer one (still not 100% sure how avro could convert it, though).

giograno · 2025-08-28T11:18:25Z

localstack-core/localstack/state/core.py

@@ -97,42 +97,48 @@ def __str__(self) -> str:


 class Encoder:
-    def encodes(self, obj: Any) -> bytes:
+    def encodes(self, obj: Any, py_type: type = None) -> bytes:


Suggested change

def encodes(self, obj: Any, py_type: type = None) -> bytes:

def encodes(self, obj: Any, py_type: type | None = None) -> bytes:

small nit (applies everywhere)

giograno · 2025-08-28T11:58:58Z

localstack-core/localstack/state/core.py


-    def decode(self, file: IO[bytes]) -> Any:
+    def decode(self, file: IO[bytes], py_type: type = None) -> Any:


Following up on our discussion from yesterday, I believe we need to extend the decoder APIs and provide the writer schema.

The avro DatumReader needs both the writer and the reader schema (see here), and our initial prototype was wrongly providing the reader as the writer. If only the writer schema is provided, avro assumes the reader schema is the same.
Maybe we could also get along with a **kwargs for the initial iteration.

good callout! i think this is not the right level of abstraction to do this. adding **kwargs just obfuscates the fact that it at this point has become a leaky abstraction.

my suggestion would be to find a different way to hide this. for example, we can make sure in the encoding step that the writer schema is placed into the same file, and we read it first, and then read the data.

Good point! We can definitely save everything in a single file. avro supports this out of the box. I could easily implement it.

Since you mentioned leaky abstractions, couldn't we get rid of py_type as well?

how would we derive the reader schema?

ad: i don't think i really fully understand. could you share an example?

For the encoder, we can just type(obj). For the encoder, we can read the type from the writer schema we stored in the binary data. The store class might be renamed, though, and we'd need to provide an alias.

giograno

My main comment was about the writer schema in the decoder signature.
As we discussed, we have been able to store it along with the serialized data in the avro implementation (see companion PR), so that's not valid anymore.

thrau added the semver: patch Non-breaking changes which can be included in patch releases label Aug 18, 2025

thrau requested review from viren-nadkarni and giograno August 18, 2025 11:14

thrau force-pushed the storage-with-avro branch 2 times, most recently from cb430dd to b458e31 Compare August 19, 2025 23:14

thrau added the skip-docs Pull request does not require documentation changes label Aug 27, 2025

thrau force-pushed the storage-with-avro branch from b458e31 to 48c788c Compare August 27, 2025 10:12

extend the Encoder/Decoder state API to be type-aware

b30e59a

thrau force-pushed the storage-with-avro branch from 48c788c to b30e59a Compare August 27, 2025 15:46

thrau marked this pull request as ready for review August 28, 2025 11:15

giograno reviewed Aug 28, 2025

View reviewed changes

giograno self-requested a review August 29, 2025 08:10

giograno approved these changes Aug 29, 2025

View reviewed changes

thrau merged commit ffe3e4f into main Aug 29, 2025
40 checks passed

thrau deleted the storage-with-avro branch August 29, 2025 10:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

extend the Encoder/Decoder state API to be type-aware #13017

extend the Encoder/Decoder state API to be type-aware #13017

Uh oh!

thrau commented Aug 18, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Aug 18, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Aug 18, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Aug 27, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Aug 27, 2025 •

edited

Loading

Uh oh!

giograno left a comment

Uh oh!

giograno Aug 28, 2025

Uh oh!

giograno Aug 28, 2025

Uh oh!

thrau Aug 28, 2025

Uh oh!

giograno Aug 28, 2025 •

edited

Loading

Uh oh!

thrau Aug 28, 2025 •

edited

Loading

Uh oh!

giograno Aug 28, 2025 •

edited

Loading

Uh oh!

giograno left a comment

Uh oh!

Uh oh!

Uh oh!

	def encodes(self, obj: Any, py_type: type = None) -> bytes:
	def encodes(self, obj: Any, py_type: type \| None = None) -> bytes:


		def decode(self, file: IO[bytes]) -> Any:
		def decode(self, file: IO[bytes], py_type: type = None) -> Any:

Uh oh!

extend the Encoder/Decoder state API to be type-aware #13017

extend the Encoder/Decoder state API to be type-aware #13017

Uh oh!

Conversation

thrau commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Changes

Uh oh!

github-actions bot commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Results - Preflight, Unit

Uh oh!

github-actions bot commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

LocalStack Community integration with Pro

Uh oh!

github-actions bot commented Aug 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Results (amd64) - Acceptance

Uh oh!

github-actions bot commented Aug 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Results (amd64) - Integration, Bootstrap

Uh oh!

giograno left a comment

Choose a reason for hiding this comment

Uh oh!

giograno Aug 28, 2025

Choose a reason for hiding this comment

Uh oh!

giograno Aug 28, 2025

Choose a reason for hiding this comment

Uh oh!

thrau Aug 28, 2025

Choose a reason for hiding this comment

Uh oh!

giograno Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thrau Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

giograno Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

giograno left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

thrau commented Aug 18, 2025 •

edited

Loading

github-actions bot commented Aug 18, 2025 •

edited

Loading

github-actions bot commented Aug 18, 2025 •

edited

Loading

github-actions bot commented Aug 27, 2025 •

edited

Loading

github-actions bot commented Aug 27, 2025 •

edited

Loading

giograno Aug 28, 2025 •

edited

Loading

thrau Aug 28, 2025 •

edited

Loading

giograno Aug 28, 2025 •

edited

Loading