Skip to content

gh-132983: Convert dict_content to take Py_buffer #133924

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 26, 2025

Conversation

AA-Turner
Copy link
Member

@AA-Turner AA-Turner commented May 12, 2025

ZstdDict's dict_content is just binary data, we don't need to allocate an entire PyBytesObject for it. This also lets us use the Py_buffer argument clinic converter.

We replace the dict_content member of ZstdDict with dict_buffer and dict_len. ZstdDict.dict_content is still exposed as a bytes object to Python via a new getter.

A

cc @Rogdham

@serhiy-storchaka serhiy-storchaka self-requested a review May 12, 2025 15:01
@emmatyping
Copy link
Member

It would probably be good to run the refleaks buildbots on this PR before merge.

@emmatyping
Copy link
Member

@AA-Turner would you like to pick this up again? I'm happy to take this over if you don't have time. Want to get the last of the zstd items done before b2.

@AA-Turner AA-Turner added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label May 24, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @AA-Turner for commit fe3fb49 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F133924%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label May 24, 2025
@emmatyping
Copy link
Member

Merged main because there were nonsense errors in the tests.

@AA-Turner AA-Turner force-pushed the zstd-dict-content-buffer branch from 90cef43 to 30ac9d5 Compare May 26, 2025 07:55
@AA-Turner AA-Turner added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label May 26, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @AA-Turner for commit 30ac9d5 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F133924%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label May 26, 2025
@AA-Turner
Copy link
Member Author

Apologies for the force-push, my local git state got messy.

Comment on lines 52 to 53
"Zstandard dictionary content must be longer "
"than eight bytes.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can also be exactly 8 bytes, right?

What happens if you remove this check? What error will you get?

Copy link
Member Author

@AA-Turner AA-Turner May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wording error, sorry.

What happens if you remove this check? What error will you get?

From RFC 8878 s.5, a dictionary can either be in Zstandard format or 'raw content' format.

  • "raw content" dictionaries ... must be at least 8 bytes.

  • The Zstandard format has a required 4 byte magic number and 4 byte ID

This check only bites when is_raw is True, as normally we check that the dict is a valid Zstandard directory via ZSTD_getDictID_fromDict() != 0. That function returns 0 if dictSize < 8 or otherwise if the data does not match the Zstandard header.

The Zstandard documentation is quite sparse on 'raw content' dictionaries, but I would conclude that the early error here probably doesn't hurt, and helps ensure we are conformant to the RFC.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no principal difference between keeping binary data as a bytes object or a raw memory buffer. But if the latter looks better to you, LGTM.

@AA-Turner AA-Turner enabled auto-merge (squash) May 26, 2025 14:30
@AA-Turner AA-Turner merged commit f2ce4bb into python:main May 26, 2025
38 checks passed
@miss-islington-app
Copy link

Thanks @AA-Turner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 26, 2025
…()`` (pythonGH-133924)

(cherry picked from commit f2ce4bb)

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
@bedevere-app
Copy link

bedevere-app bot commented May 26, 2025

GH-134723 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label May 26, 2025
AA-Turner added a commit that referenced this pull request May 28, 2025
…t()`` (GH-133924) (#134723)

gh-132983: Convert dict_content to take Py_buffer in ``ZstdDict()`` (GH-133924)
(cherry picked from commit f2ce4bb)

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants