Commit ea29aa0

Update Example: Let's make a giant string

* Add another function `add_bytes_with_plus` actually illustrating quadratic behavior for the `+=` operator.
* Add explanation for linear behavior due to `+=` optimizations in case of strings.
* Change the order of examples (move string interning example just before the giant string example).

Closes satwikkansal#38
1 parent 31d4382 commit ea29aa0

1 file changed: +66 -29 lines changed

README.md

Lines changed: 66 additions & 29 deletions
Original file line number | Diff line number | Diff line change
@@ -35,10 +35,10 @@ So, here ya go...
3535
- [💡 Explanation:](#-explanation-1)
3636
- [Backslashes at the end of string](#backslashes-at-the-end-of-string)
3737
- [💡 Explanation](#-explanation-4)
38-
- [Let's make a giant string!](#lets-make-a-giant-string)
39-
- [💡 Explanation](#-explanation-5)
4038
- [String interning](#string-interning)
4139
- [💡 Explanation:](#-explanation-2)
40+
- [Let's make a giant string!](#lets-make-a-giant-string)
41+
- [💡 Explanation](#-explanation-5)
4242
- [Yes, it exists!](#yes-it-exists)
4343
- [💡 Explanation:](#-explanation-3)
4444
- [`is` is not what it is!](#is-is-not-what-it-is)
@@ -406,6 +406,28 @@ SyntaxError: EOL while scanning string literal
406406
407407
---
408408
409+
### String interning
410+
411+
```py
412+
>>> a = "some_string"
413+
>>> id(a)
414+
140420665652016
415+
>>> id("some" + "_" + "string") # Notice that both the ids are same.
416+
140420665652016
417+
# using "+", three strings:
418+
>>> timeit.timeit("s1 = s1 + s2 + s3", setup="s1 = ' ' * 100000; s2 = ' ' * 100000; s3 = ' ' * 100000", number=100)
419+
0.25748300552368164
420+
# using "+=", three strings:
421+
>>> timeit.timeit("s1 += s2 + s3", setup="s1 = ' ' * 100000; s2 = ' ' * 100000; s3 = ' ' * 100000", number=100)
422+
0.012188911437988281
423+
```
424+
425+
#### 💡 Explanation:
426+
+ `+=` is faster than `+` for concatenating more than two strings because the first string (example, `s1` for `s1 += s2 + s3`) is not destroyed while calculating the complete string.
427+
+ Both the strings refer to the same object because of CPython optimization that tries to use existing immutable objects in some cases (implementation specific) rather than creating a new object every time. You can read more about this [here](https://stackoverflow.com/questions/24245324/about-the-changing-id-of-an-immutable-string).
428+
429+
---
430+
409431
### Let's make a giant string!
410432
411433
This is not a WTF at all, just some nice things to be aware of :)
@@ -417,6 +439,12 @@ def add_string_with_plus(iters):
417439
s += "xyz"
418440
assert len(s) == 3*iters
419441
442+
def add_bytes_with_plus(iters):
443+
s = b""
444+
for i in range(iters):
445+
s += b"xyz"
446+
assert len(s) == 3*iters
447+
420448
def add_string_with_format(iters):
421449
fs = "{}"*iters
422450
s = fs.format(*(["xyz"]*iters))
@@ -437,43 +465,52 @@ def convert_list_to_string(l, iters):
437465
**Output:**
438466
```py
439467
>>> timeit(add_string_with_plus(10000))
440-
100 loops, best of 3: 9.73 ms per loop
468+
1000 loops, best of 3: 972 µs per loop
469+
>>> timeit(add_bytes_with_plus(10000))
470+
1000 loops, best of 3: 815 µs per loop
441471
>>> timeit(add_string_with_format(10000))
442-
100 loops, best of 3: 5.47 ms per loop
472+
1000 loops, best of 3: 508 µs per loop
443473
>>> timeit(add_string_with_join(10000))
444-
100 loops, best of 3: 10.1 ms per loop
474+
1000 loops, best of 3: 878 µs per loop
445475
>>> l = ["xyz"]*10000
446476
>>> timeit(convert_list_to_string(l, 10000))
447-
10000 loops, best of 3: 75.3 µs per loop
477+
10000 loops, best of 3: 80 µs per loop
448478
```
449479
450-
#### 💡 Explanation
451-
- You can read more about [timeit](https://docs.python.org/3/library/timeit.html) from here. It is generally used to measure the execution time of snippets.
452-
- Don't use `+` for generating long strings — In Python, `str` is immutable, so the left and right strings have to be copied into the new string for every pair of concatenations. If you concatenate four strings of length 10, you'll be copying (10+10) + ((10+10)+10) + (((10+10)+10)+10) = 90 characters instead of just 40 characters. Things get quadratically worse as the number and size of the string increases.
453-
- Therefore, it's advised to use `.format.` or `%` syntax (however, they are slightly slower than `+` for short strings).
454-
- Or better, if already you've contents available in the form of an iterable object, then use `''.join(iterable_object)` which is much faster.
455-
456-
---
457-
458-
### String interning
480+
Let's increase the number of iterations by a factor of 10.
459481
460482
```py
461-
>>> a = "some_string"
462-
>>> id(a)
463-
140420665652016
464-
>>> id("some" + "_" + "string") # Notice that both the ids are same.
465-
140420665652016
466-
# using "+", three strings:
467-
>>> timeit.timeit("s1 = s1 + s2 + s3", setup="s1 = ' ' * 100000; s2 = ' ' * 100000; s3 = ' ' * 100000", number=100)
468-
0.25748300552368164
469-
# using "+=", three strings:
470-
>>> timeit.timeit("s1 += s2 + s3", setup="s1 = ' ' * 100000; s2 = ' ' * 100000; s3 = ' ' * 100000", number=100)
471-
0.012188911437988281
483+
>>> timeit(add_string_with_plus(100000)) # Linear increase in execution time
484+
100 loops, best of 3: 9.75 ms per loop
485+
>>> timeit(add_bytes_with_plus(100000)) # Quadratic increase
486+
1000 loops, best of 3: 974 ms per loop
487+
>>> timeit(add_string_with_format(100000)) # Linear increase
488+
100 loops, best of 3: 5.25 ms per loop
489+
>>> timeit(add_string_with_join(100000)) # Linear increase
490+
100 loops, best of 3: 9.85 ms per loop
491+
>>> l = ["xyz"]*100000
492+
>>> timeit(convert_list_to_string(l, 100000)) # Linear increase
493+
1000 loops, best of 3: 723 µs per loop
472494
```
473495
474-
#### 💡 Explanation:
475-
+ `+=` is faster than `+` for concatenating more than two strings because the first string (example, `s1` for `s1 += s2 + s3`) is not destroyed while calculating the complete string.
476-
+ Both the strings refer to the same object because of CPython optimization that tries to use existing immutable objects in some cases (implementation specific) rather than creating a new object every time. You can read more about this [here](https://stackoverflow.com/questions/24245324/about-the-changing-id-of-an-immutable-string).
496+
#### 💡 Explanation
497+
- You can read more about [timeit](https://docs.python.org/3/library/timeit.html) in the official documentation. It is generally used to measure the execution time of code snippets.
498+
- Don't use `+` for generating long strings. In Python, `str` is immutable, so the left and right strings have to be copied into the new string for every pair of concatenations. If you concatenate four strings of length 10, you'll be copying (10+10) + ((10+10)+10) + (((10+10)+10)+10) = 90 characters instead of just 40 characters. The cost grows quadratically as the number and size of the strings increase (as the execution times of the `add_bytes_with_plus` function show).
499+
- Therefore, it's advised to use `.format` or `%` syntax (however, they are slightly slower than `+` for short strings).
500+
- Or better, if the contents are already available as an iterable object, use `''.join(iterable_object)`, which is much faster.
501+
- Unlike `add_bytes_with_plus`, `add_string_with_plus` didn't show a quadratic increase in execution time because of the `+=` optimizations discussed in the previous example. Had the statement been `s = s + "x" + "y" + "z"` instead of `s += "xyz"`, the increase would have been quadratic.
502+
```py
503+
def add_string_with_plus(iters):
504+
s = ""
505+
for i in range(iters):
506+
s = s + "x" + "y" + "z"
507+
assert len(s) == 3*iters
508+
509+
>>> timeit(add_string_with_plus(10000))
510+
100 loops, best of 3: 9.87 ms per loop
511+
>>> timeit(add_string_with_plus(100000)) # Quadratic increase in execution time
512+
1 loops, best of 3: 1.09 s per loop
513+
```
477514
478515
---
479516
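
Note on the copy-count arithmetic in the explanation above: the following is a minimal sketch, not part of the README, that reproduces the 90-versus-40 figure. It assumes every `+` allocates a fresh object and copies both operands (that is, no in-place `+=` optimization applies) and that a join-style build copies each piece exactly once. The helper names `naive_concat_copies` and `join_copies` are hypothetical and exist only for illustration.

```python
def naive_concat_copies(lengths):
    """Count characters copied when pieces are concatenated left to right with `+`.

    Assumption: each `+` builds a new object and copies both operands in full,
    so the running result is re-copied at every step.
    """
    copied = 0
    total = 0
    for index, length in enumerate(lengths):
        if index == 0:
            total = length      # the first piece is only read, nothing is built yet
            continue
        total += length         # size of the newly built result
        copied += total         # both operands are copied into it
    return copied


def join_copies(lengths):
    """Count characters copied when each piece is copied once into the result."""
    return sum(lengths)


# Four strings of length 10, as in the explanation.
print(naive_concat_copies([10] * 4))  # 90
print(join_copies([10] * 4))          # 40

# Ten times more pieces costs roughly a hundred times more copying under this model.
small = naive_concat_copies([3] * 10_000)
large = naive_concat_copies([3] * 100_000)
print(round(large / small))           # 100
```

Under this model, growing the number of pieces by a factor of 10 grows the copied characters by roughly a factor of 100, which is the same shape as the `add_bytes_with_plus` timings in the diff.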
