You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@@ -3349,36 +3349,38 @@ Let's increase the number of iterations by a factor of 10.
3349
3349
3350
3350
---
3351
3351
3352
-
### ▶ `dict` lookup performance
3352
+
### ▶ Slowing down `dict` lookups
3353
3353
3354
3354
```py
3355
-
>>> some_dict = {str(i): 1 for i in range(1_000_000)}
3355
+
some_dict = {str(i): 1 for i in range(1_000_000)}
3356
+
another_dict = {str(i): 1 for i in range(1_000_000)}
3357
+
```
3358
+
3359
+
**Output:**
3360
+
```py
3356
3361
>>> %timeit some_dict['5']
3357
3362
28.6 ns ± 0.115 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
3358
3363
>>> some_dict[1] = 1
3359
3364
>>> %timeit some_dict['5']
3360
3365
37.2 ns ± 0.265 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
3361
-
# why did it become much slower?
3362
-
```
3363
3366
3364
-
#### 💡 Explanation:
3365
-
+ CPython has a generic dictionary lookup function that handles all types of keys (`str`, `int`, any object ...), and a specialized one for the common case of dictionaries composed of `str`-only keys.
3366
-
+ The specialized function (named `lookdict_unicode` in CPython's sources) knows all existing keys (including the looked-up key) are strings, and uses the faster & simpler string comparison to compare keys, instead of calling the `__eq__` method.
3367
-
+ The first time a `dict` instance is accessed with a non-`str` key, it's modified so future lookups use the generic function.
3368
-
+ This process is not reversible for the particular `dict` instance, and the key doesn't even have to exist in the dictionary - attempting a failed lookup has the same effect:
3369
-
```py
3370
-
>>> some_dict = {str(i): 1 for i in range(1_000_000)}
3371
-
>>> %timeit some_dict['5']
3367
+
>>> %timeit another_dict['5']
3372
3368
28.5 ns ± 0.142 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
3373
-
>>> some_dict[1]
3369
+
>>> another_dict[1] # Trying to access a key that doesn't exist
3374
3370
Traceback (most recent call last):
3375
3371
File "<stdin>", line 1, in <module>
3376
3372
KeyError: 1
3377
-
>>> %timeit some_dict['5']
3373
+
>>> %timeit another_dict['5']
3378
3374
38.5 ns ± 0.0913 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
3379
3375
```
3376
+
Why are same lookups becoming slower?
3377
+
3378
+
#### 💡 Explanation:
3379
+
+ CPython has a generic dictionary lookup function that handles all types of keys (`str`, `int`, any object ...), and a specialized one for the common case of dictionaries composed of `str`-only keys.
3380
+
+ The specialized function (named `lookdict_unicode` in CPython's [source](https://github.com/python/cpython/blob/522691c46e2ae51faaad5bbbce7d959dd61770df/Objects/dictobject.c#L841)) knows all existing keys (including the looked-up key) are strings, and uses the faster & simpler string comparison to compare keys, instead of calling the `__eq__` method.
3381
+
+ The first time a `dict` instance is accessed with a non-`str` key, it's modified so future lookups use the generic function.
3382
+
+ This process is not reversible for the particular `dict` instance, and the key doesn't even have to exist in the dictionary. That's why attempting a failed lookup has the same effect.
3380
3383
3381
-
---
3382
3384
3383
3385
### ▶ Minor Ones *
3384
3386
<!-- Example ID: f885cb82-f1e4-4daa-9ff3-972b14cb1324 --->
0 commit comments