-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
Add set_pledged_input_size to ZstdCompressor #134938
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
In conda-package-handling we are using |
When marked with |
Public APIWith the safeguard of raising an exception if the size is not the right one in the end, I'm +1 on including the method as public API. Initial values
I think you want Catch incorrect usage
I think we don't do that in Too much data
>>> from pyzstd import ZstdCompressor
>>> c = ZstdCompressor()
>>> c._set_pledged_input_size(1)
>>> c.compress(b'aa')
Traceback (most recent call last):
File "<python-input-3>", line 1, in <module>
c.compress(b'aa')
~~~~~~~~~~^^^^^^^
pyzstd.ZstdError: Unable to compress zstd data: Src size is incorrect Not enough data
>>> from pyzstd import ZstdCompressor
>>> c = ZstdCompressor()
>>> c._set_pledged_input_size(3)
>>> c.compress(b'aa')
b''
>>> c.flush()
Traceback (most recent call last):
File "<python-input-4>", line 1, in <module>
c.flush()
~~~~~~~^^
pyzstd.ZstdError: Unable to compress zstd data: Src size is incorrect So I'm not sure if this is needed to implement it in code ourselves (if Not enough data
Based on my last point, we will get an error from Moreover, from the standard specification, the value is expected to be the exact size:
I am personally in favor of raising an exception both if we have too much or not enough data. Behavior and corner casesMy take on other misc points:
|
Great! I'll work on a PR for this a bit later today. |
…onGH-135010) (cherry picked from commit 4b44b34) Co-authored-by: Emma Smith <emma@emmatyping.dev>
Thanks! |
Uh oh!
There was an error while loading. Please reload this page.
Feature or enhancement
Proposal:
pyzstd's ZstdCompressor class had a method
_set_pledged_input_size
, which allowed users to set the amount of data they were going to write into a frame so it would be written into the frame header. We should support this use case incompresison.zstd
.I don't want to add a private API that is unsafe or only for advanced users, so I want to sketch out an implementation that could be used in general and catch incorrect usage:
unsigned long long
memberscurrent_frame_size
andpledged_size
, both initialized toZSTD_CONTENTSIZE_UNKNOWN
set_pledged_size
, the main difference from the pyzstd implementation is that it will updatepledged_size
compress()
andflush()
to track how much data is being written to the compressor, written intocurrent_frame_size
. If the mode isFLUSH_FRAME
then after writing, check thatcurrent_frame_size == pledged_size
, otherwise raise aZstdError
to indicate the failure. Resetpledged_size
andcurrent_frame_size
.I think the one drawback of the above is it will notify the user if something goes wrong but if they are streaming compressed data elsewhere they could still send garbage if they use the API wrong. But that's inherently not something we can really fix.
An open question I have is should we check
current_frame_size <= pledged_size
at the end of writing when the mode isn'tFLUSH_FRAME
? I think probably yes?cc @Rogdham, I'd be interested in your thoughts.
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
https://discuss.python.org/t/pep-784-adding-zstandard-to-the-standard-library/87377/143
Linked PRs
set_pledged_input_size
to ZstdCompressor #135010The text was updated successfully, but these errors were encountered: