Commit b743451

Add new example: Tricky strings
Closes satwikkansal#54
1 parent 1ec3c5e commit b743451

1 file changed

+39
-2
lines changed

README.md

Lines changed: 39 additions & 2 deletions
@@ -423,14 +423,52 @@ SyntaxError: EOL while scanning string literal
423423
424424
---
425425
426-
### String interning
426+
### Strings can be tricky sometimes
427427
428+
1\.
428429
```py
429430
>>> a = "some_string"
430431
>>> id(a)
431432
140420665652016
432433
>>> id("some" + "_" + "string") # Notice that both the ids are same.
433434
140420665652016
435+
```
436+
437+
2\.
438+
```py
439+
>>> a = "wtf"
440+
>>> b = "wtf"
441+
>>> a is b
442+
True
443+
444+
>>> a = "wtf!"
445+
>>> b = "wtf!"
446+
>>> a is b
447+
False
448+
```
449+
450+
3\.
451+
```py
452+
>>> 'a' * 20 is 'aaaaaaaaaaaaaaaaaaaa'
453+
True
454+
>>> 'a' * 21 is 'aaaaaaaaaaaaaaaaaaaaa'
455+
```
456+
457+
Makes sense, right?
458+
459+
#### 💡 Explanation:
460+
+ Such behavior is due to a CPython optimization (called string interning) that tries to reuse existing immutable objects in some cases rather than creating a new object every time.
461+
+ Once a string is interned, many variables may refer to the same string object in memory (thereby saving memory).
462+
+ In the snippets above, strings are implicitly interned. The decision of when to implicitly intern a string is implementation dependent. There are some facts that can be used to guess whether a string will be interned or not:
463+
* All length 0 and length 1 strings are interned.
464+
  * Strings are interned at compile time (`'wtf'` will be interned but `''.join(['w', 't', 'f'])` will not be interned).
465+
  * Strings that contain characters other than ASCII letters, digits, or underscores are not interned. This explains why `'wtf!'` was not interned: it contains `!`.
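
Implicit interning is an implementation detail, but the standard library also exposes explicit interning through `sys.intern`. The snippet below is a minimal sketch, not part of the original example: the helper `build` is a hypothetical name used only to create the strings at runtime, so that they are not compile-time constants.

```py
import sys

def build(parts):
    # Hypothetical helper: joins at runtime, so the result is not a
    # compile-time constant and is not implicitly interned by the compiler.
    return "".join(parts)

a = build(["wtf", "!"])
b = build(["wtf", "!"])

# The two strings have equal contents. Whether `a is b` holds here is an
# implementation detail, so this sketch does not rely on it.
equal = a == b

# sys.intern returns the canonical object for a given string value, so
# interning two equal strings yields the very same object.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
same_after_intern = a_interned is b_interned
```

Here `equal` and `same_after_intern` are both `True`. In general, prefer `==` for comparing string values and reserve `is` for identity checks.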
466+
467+
---
468+
469+
### `+=` is faster
470+
471+
```py
434472
# using "+", three strings:
435473
>>> timeit.timeit("s1 = s1 + s2 + s3", setup="s1 = ' ' * 100000; s2 = ' ' * 100000; s3 = ' ' * 100000", number=100)
436474
0.25748300552368164
@@ -441,7 +479,6 @@ SyntaxError: EOL while scanning string literal
441479
442480
#### 💡 Explanation:
443481
+ `+=` is faster than `+` for concatenating more than two strings because the first string (for example, `s1` in `s1 += s2 + s3`) is not destroyed while calculating the complete string.
444-
+ Both the strings refer to the same object because of CPython optimization that tries to use existing immutable objects in some cases (implementation specific) rather than creating a new object every time. You can read more about this [here](https://stackoverflow.com/questions/24245324/about-the-changing-id-of-an-immutable-string).
445482
446483
---
447484
