Skip to content

gh-132983: Convert dict_content to take Py_buffer #133924

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

AA-Turner
Copy link
Member

@AA-Turner AA-Turner commented May 12, 2025

ZstdDict's dict_content is just binary data, we don't need to allocate an entire PyBytesObject for it. This also lets us use the Py_buffer argument clinic converter.

We replace the dict_content member of ZstdDict with dict_buffer and dict_len. ZstdDict.dict_content is still exposed as a bytes object to Python via a new getter.

A

cc @Rogdham

Comment on lines +68 to +73
self->dict_buffer = PyMem_RawMalloc(dict_content->len);
if (!self->dict_buffer) {
return PyErr_NoMemory();
}
memcpy(self->dict_buffer, dict_content->buf, dict_content->len);
self->dict_len = dict_content->len;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @vstinner, I think RawMalloc is correct here but would be grateful for a review / confirmation please

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you told the GIL in this code, so you can use PyMem_Malloc() instead. The Raw variant is when you don't hold the GIL.

@serhiy-storchaka serhiy-storchaka self-requested a review May 12, 2025 15:01
@@ -60,24 +59,28 @@ _zstd_ZstdDict_new_impl(PyTypeObject *type, PyObject *dict_content,
}

/* Check dict_content's type */
self->dict_content = PyBytes_FromObject(dict_content);
if (self->dict_content == NULL) {
if (dict_content == NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is never NULL.

BTW, overriding arbitrary exception with a TypeError in the current code is wrong.

if (Py_SIZE(self->dict_content) < 8) {
self->dict_buffer = PyMem_RawMalloc(dict_content->len);
if (!self->dict_buffer) {
return PyErr_NoMemory();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self is leaked here.

@@ -124,23 +124,33 @@ PyDoc_STRVAR(ZstdDict_dictid_doc,
"The special value '0' means a 'raw content' dictionary,"
"without any restrictions on format or content.");

PyDoc_STRVAR(ZstdDict_dictcontent_doc,
"The content of a Zstandard dictionary, as a bytes object.");

static PyObject *
ZstdDict_str(PyObject *ob)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is wrong to implement __str__. __repr__ should be implemented.

static PyObject *
ZstdDict_str(PyObject *ob)
{
ZstdDict *dict = ZstdDict_CAST(ob);
return PyUnicode_FromFormat("<ZstdDict dict_id=%u dict_size=%zd>",
dict->dict_id, Py_SIZE(dict->dict_content));
dict->dict_id, dict->dict_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dict->dict_id, dict->dict_len);
(unsigned int)dict->dict_id, dict->dict_len);

self->dict_len = dict_content->len;

/* Both ordinary and "raw content" dictionaries must be 8 bytes minimum */
if (self->dict_len < 8) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be checked at the beginning, before any allocations.

}

self->dict_content = NULL;
self->d_dict = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self->d_dict = NULL;
self->d_dict = NULL;
self->dict_buffer = NULL;

Otherwise self->dict_buffer is not initialized in the destructor.

Comment on lines +67 to 71
char *dict_buffer = self->dict_buffer;
Py_ssize_t dict_len = self->dict_len;
Py_BEGIN_ALLOW_THREADS
self->d_dict = ZSTD_createDDict(dict_buffer,
dict_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just inline them, as in the code below.

Suggested change
char *dict_buffer = self->dict_buffer;
Py_ssize_t dict_len = self->dict_len;
Py_BEGIN_ALLOW_THREADS
self->d_dict = ZSTD_createDDict(dict_buffer,
dict_len);
Py_BEGIN_ALLOW_THREADS
self->d_dict = ZSTD_createDDict(self->dict_buffer, self->dict_len);

Comment on lines +173 to 177
char *dict_buffer = self->dict_buffer;
Py_ssize_t dict_len = self->dict_len;
Py_BEGIN_ALLOW_THREADS
cdict = ZSTD_createCDict(dict_buffer,
dict_len,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
char *dict_buffer = self->dict_buffer;
Py_ssize_t dict_len = self->dict_len;
Py_BEGIN_ALLOW_THREADS
cdict = ZSTD_createCDict(dict_buffer,
dict_len,
Py_BEGIN_ALLOW_THREADS
cdict = ZSTD_createCDict(self->dict_buffer, self->dict_len,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants