-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
dict_set_fromkeys() calculates size of dictionary improperly #132762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Looks like a 3.13 regression. On a debug build, this assertion fails:
I'm pretty sure the regressing PR was gh-114508 (cc @DinoV @colesbury). |
@DinoV - would you please take a look at this? |
I'd like to help with this one. I've done some homework. TL;DR
Repro & scope
Root cause In gh-114508, the --- a/Objects/dictobject.c
+++ b/Objects/dictobject.c
@@ -2341,63 +2341,27 @@ PyObject *_PyDict_FromKeys(PyObject *cls, PyObject *iterable, PyObject *value) {
- if (PyDict_CheckExact(d) && ((PyDictObject*)d)->ma_used == 0) {
+ if (PyDict_CheckExact(d)) {
Now any
Originally posted by @DinoV in #114508 (comment) To accommodate the looser fast path, /* before */
dictresize(interp, mp,
estimate_log2_keysize(PyDict_GET_SIZE(iterable)), unicode);
/* after */
uint8_t new_size = Py_MAX(
estimate_log2_keysize(PyDict_GET_SIZE(iterable)),
DK_LOG_SIZE(mp->ma_keys)
);
dictresize(interp, mp, new_size, unicode); That code prevents accidentally shrinking below the existing table when What's missing is the corresponding dictresize(interp, mp,
estimate_log2_keysize(PySet_GET_SIZE(iterable)), unicode); So it fails to prevent accidental table-shrinking. This is also why the bug is unique to Why It's certainly more intuitive to do this instead: uint8_t new_size = estimate_log2_keysize(mp->ma_used + PySet_GET_SIZE(iterable));
dictresize(interp, mp, new_size, unicode); But this strategy eagerly reserves memory for all old+new entries (plus headroom) upfront, even when the current table already has plenty of headroom. With the My Proposal
/* before */
dictresize(interp, mp, estimate_log2_keysize(PySet_GET_SIZE(iterable)), unicode);
/* after */
uint8_t new_size = Py_MAX(
estimate_log2_keysize(PySet_GET_SIZE(iterable)),
DK_LOG_SIZE(mp->ma_keys)
);
dictresize(interp, mp, new_size, unicode);
Would love a 👍 or any feedback (cc @DinoV @colesbury) before opening a PR. This would be my first CPython contribution, so thanks for your guidance! |
Go for it! You can @ me on the PR to review it |
Bug report
Bug description:
The function dict_set_fromkeys() in the file dictobject.c adds elements of an iterable to an existing dictionary. The size of the expanded dictionary is estimated as PySet_GET_SIZE(iterable) and the size of the existing dictionary is not considered.
This is unlogical.
A more resonable estimation is "mp->ma_used + PySet_GET_SIZE(iterable)". What turns this into a bug is the fact that a large dictionary plus a small iterable can cause the function to loop forever.
The following code is adopted from the official tests, file test_dict.py. Official binaries for Python 3.13.3 compiled with MSVC will loop forever when running this modified test (the official test has range(10)). Other platforms may be affected as well.
A minor error is in the function calculate_log2_keysize(minsize) which has three branches for calculating its result. However, when minsize < 10 the results of the three branches can differ. Since _BitScanReverse64() is Windows-only other platforms might be unaffected.
A final unrelated note. PEP 11 – CPython platform support states that aarch64-unknown-linux-gnu and x86_64-unknown-linux-gnu are Tier 1 AND Tier 2.
CPython versions tested on:
3.13
Operating systems tested on:
Windows
Linked PRs
The text was updated successfully, but these errors were encountered: