
Zero to Py

A Comprehensive Guide to Learning the Python Programming Language
Michael Green
This book is for sale at http://leanpub.com/zero-to-py

This version was published on 2023-03-11

This is a Leanpub book. Leanpub empowers authors and publishers
with the Lean Publishing process. Lean Publishing is the act of
publishing an in-progress ebook using lightweight tools and many
iterations to get reader feedback, pivot until you have the right book
and build traction once you do.

© 2023 Michael Green


Tweet This Book!
Please help Michael Green by spreading the word about this book on
Twitter!
The suggested hashtag for this book is #ZeroToPy.
Find out what other people are saying about the book by clicking on
this link to search for this hashtag on Twitter:
#ZeroToPy
Contents

Zero to Py: A Comprehensive Guide to Learning the Python Programming Language . . . . . . . . . . . . . . . . . . . . 1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Part I: A Whirlwind Tour . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Chapter 0: System Setup . . . . . . . . . . . . . . . . . . . . . . . . . 3
Installing Python Natively . . . . . . . . . . . . . . . . . . . . 3
Python Versions . . . . . . . . . . . . . . . . . . . . . . . 4
Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
macOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
What’s in a PATH? . . . . . . . . . . . . . . . . . . . . . 6
Setting up your dev environment . . . . . . . . . . . . . . . 7
Running Python . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
A program that runs programs . . . . . . . . . . . . . 8
One interpreter, many modes . . . . . . . . . . . . . . 9
Chapter 1. Fundamental data types . . . . . . . . . . . . . . . . . . 11
But first, variables . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Primitives and Containers . . . . . . . . . . . . . . . . . . . . 14
Mutability vs Immutability . . . . . . . . . . . . . . . . . . . . 15
Introduction to Python’s data types . . . . . . . . . . . . . . 16
Primitive Types . . . . . . . . . . . . . . . . . . . . . . . . 16
Booleans . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Floats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Complex . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Container Types . . . . . . . . . . . . . . . . . . . . . . . 21
Tuples and Lists . . . . . . . . . . . . . . . . . . . . . . 21
Dictionaries . . . . . . . . . . . . . . . . . . . . . . . . . 24
Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Python Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
References to Objects . . . . . . . . . . . . . . . . . . . . 27
Chapter 2. Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Packing operators . . . . . . . . . . . . . . . . . . . . . . 32
Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Logical . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
What is Truthy? . . . . . . . . . . . . . . . . . . . . . . . 37
Short-Circuits . . . . . . . . . . . . . . . . . . . . . . . . . 37
Logical Assignments . . . . . . . . . . . . . . . . . . . . 37
Membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Bitwise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Identity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Chapter 3. Lexical Structure . . . . . . . . . . . . . . . . . . . . . . . 43
Line Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Explicit and Implicit Line Joining . . . . . . . . . . . . . . . 45
Indentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Chapter 4. Control Flow . . . . . . . . . . . . . . . . . . . . . . . . . 47
if, elif, and else . . . . . . . . . . . . . . . . . . . . . . . . . 48
while . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
break . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
continue . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
for . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
What is an Iterable? . . . . . . . . . . . . . . . . . . . . . 52
for/else and break . . . . . . . . . . . . . . . . . . . . . . . 53
Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . 55
raise from . . . . . . . . . . . . . . . . . . . . . . . . . . 57
else and finally . . . . . . . . . . . . . . . . . . . . . 58
match . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
type checking . . . . . . . . . . . . . . . . . . . . . . . . . 61
guards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Or Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
as . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Chapter 5. Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Function Signatures . . . . . . . . . . . . . . . . . . . . . . . . . 67
Explicitly positional/key-value . . . . . . . . . . . . . . 70
Default Values . . . . . . . . . . . . . . . . . . . . . . . . . 70
Mutable Types as Default Values . . . . . . . . . . . 71
Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Nested Scopes . . . . . . . . . . . . . . . . . . . . . . . . . 74
nonlocal and global . . . . . . . . . . . . . . . . . . . 75
Closures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Anonymous Functions . . . . . . . . . . . . . . . . . . . . . . . 78
Decorators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Chapter 6. Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
data types as classes. . . . . . . . . . . . . . . . . . . . . . . . . 85
__dunder__ methods . . . . . . . . . . . . . . . . . . . . . . . 86
The __init__ method . . . . . . . . . . . . . . . . . . . 86
Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Class Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
A Functional Approach . . . . . . . . . . . . . . . . . . . . . . 89
@staticmethod . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
@classmethod . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Part II. A Deeper Dive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Chapter 7. Expressions, Comprehensions, and Generators . . 93
Generator Expressions . . . . . . . . . . . . . . . . . . . . . . . 93
Generator Functions . . . . . . . . . . . . . . . . . . . . . . . . 94
yield from . . . . . . . . . . . . . . . . . . . . . . . . . . 95
List Comprehensions . . . . . . . . . . . . . . . . . . . . . . . . 96
Dictionary Comprehensions . . . . . . . . . . . . . . . . . . . 97
Expressions and the Walrus Operator . . . . . . . . . . . . . 98
Chapter 8. Python’s Built-in Functions . . . . . . . . . . . . . . . 98
Type Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Mathematical Functions . . . . . . . . . . . . . . . . . . . . . . 105
all() and any() . . . . . . . . . . . . . . . . . . . . . . . . . . 107
dir() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
enumerate() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
eval() and exec() . . . . . . . . . . . . . . . . . . . . . . . . 110
map() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
filter() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
input() and print() . . . . . . . . . . . . . . . . . . . . . . 112
open() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
range() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
sorted() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
reversed() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
zip() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Chapter 9. The Python Data Model . . . . . . . . . . . . . . . . . . 117
Object Creation Using __new__ and __init__ . . . . . . 118
Singletons . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Rich Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Operator Overloading . . . . . . . . . . . . . . . . . . . . . . . 122
String Representations . . . . . . . . . . . . . . . . . . . 125
Emulating Containers . . . . . . . . . . . . . . . . . . . . . . . 126


Emulating Functions . . . . . . . . . . . . . . . . . . . . . . . . 128
Using Slots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Customizing Attribute Access . . . . . . . . . . . . . . . . . . 130
Iterators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Lazy Evaluation . . . . . . . . . . . . . . . . . . . . . . . 134
Context Managers . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Chapter 10. Concepts in Object-Oriented Programming . . . . 142
Inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Calling the Super Class using super() . . . . . . . . 143
Multiple Inheritance and Method Resolution Order 144
Encapsulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Polymorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Chapter 11. Metaclasses . . . . . . . . . . . . . . . . . . . . . . . . . 148
Chapter 12. The Data Types, Revisited. . . . . . . . . . . . . . . . 151
Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Bits and Bytes . . . . . . . . . . . . . . . . . . . . . . . . . 153
Floats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Float Methods . . . . . . . . . . . . . . . . . . . . . . . . . 154
Hex Values . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 155
Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Split and Join . . . . . . . . . . . . . . . . . . . . . . . . . 155
Search and Replace . . . . . . . . . . . . . . . . . . . . . 156
Paddings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Formatting . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Translating . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Prefixes and Suffixes . . . . . . . . . . . . . . . . . . . . 160
Boolean Checks . . . . . . . . . . . . . . . . . . . . . . . . 161


Case Methods . . . . . . . . . . . . . . . . . . . . . . . . . 162
Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Bytes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Hex Values . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
Counting and Indexing . . . . . . . . . . . . . . . . . . . 166
Copying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Mutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Orderings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Dictionaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Iter Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Getter/Setter Methods . . . . . . . . . . . . . . . . . . . 171
Mutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
Creating new Dictionaries . . . . . . . . . . . . . . . . 173
Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
Mutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
Boolean Checks . . . . . . . . . . . . . . . . . . . . . . . . 178
Copying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Chapter 13. Type Hints . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Incorporating Type Hints . . . . . . . . . . . . . . . . . . . . . 180
Union types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Optional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
type|None . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Literal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
Final . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
TypeAlias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
NewType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
TypeVar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Runtime type checking . . . . . . . . . . . . . . . . . . . 193
Generics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
TypedDict . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Chapter 14. Modules, Packages, and Namespaces . . . . . . . . 197
Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Module Attributes . . . . . . . . . . . . . . . . . . . . . . 200
if __name__ == "__main__": . . . . . . . . . . . . 201
Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Imports within packages . . . . . . . . . . . . . . . . . . 204
Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Part III. The Python Standard Library . . . . . . . . . . . . . . . . . . 207
Chapter 15. Virtual Environments . . . . . . . . . . . . . . . . . . . 207
Chapter 16. Copy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
Chapter 17. Itertools . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
Chaining Iterables . . . . . . . . . . . . . . . . . . . . . . . . . . 213
Filtering Iterables . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Cycling Through Iterables . . . . . . . . . . . . . . . . . . . . 215
Creating Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
Slicing Iterables . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
Zip Longest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
Chapter 18. Functools . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
Partials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
Reduce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
Pipes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
Dispatching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
Chapter 19. Enums, NamedTuples, and Dataclasses . . . . . . . 223
Enums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
NamedTuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Dataclasses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
Chapter 20. Multithreading and Multiprocessing . . . . . . . . . 228
Multithreading . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Thread Locks . . . . . . . . . . . . . . . . . . . . . . . . . 231
Multiprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Process and Pool . . . . . . . . . . . . . . . . . . . . . . . 234
Process Locks . . . . . . . . . . . . . . . . . . . . . . . . . 236
Pipes and Queues . . . . . . . . . . . . . . . . . . . . . . 237
concurrent.futures . . . . . . . . . . . . . . . . . . . . . . 239
Chapter 21. Asyncio . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
Coroutines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Tasks and Task Groups . . . . . . . . . . . . . . . . . . . . . . 244
ExceptionGroup and Exception unpacking . . . . 245
Part IV. The Underbelly of the Snake . . . . . . . . . . . . . . . . . . . 248
Chapter 22. Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . 248
pdb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
Other Debuggers . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Chapter 23. Profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
cProfile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
flameprof . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
snakeviz . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
memory_profiler . . . . . . . . . . . . . . . . . . . . . . . . . 264
Chapter 24. C extensions . . . . . . . . . . . . . . . . . . . . . . . . . 265
Hello World . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
hello_world.c . . . . . . . . . . . . . . . . . . . . . . . 267
setup.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
Passing data in and out of Python . . . . . . . . . . . . . . . 274
Memory Management . . . . . . . . . . . . . . . . . . . 278
Parsing Arguments . . . . . . . . . . . . . . . . . . . . . . . . . 279
Parsing Tuple Arguments . . . . . . . . . . . . . . . . . 279


Parsing Keyword Arguments . . . . . . . . . . . . . . . 280
Creating PyObjects . . . . . . . . . . . . . . . . . . . . . . . . . 282
Importing Modules . . . . . . . . . . . . . . . . . . . . . . . . . 284
Defining New Types . . . . . . . . . . . . . . . . . . . . . . . . 285
Stack type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
Debugging C extensions . . . . . . . . . . . . . . . . . . . . . 305
Zero to Py: A Comprehensive Guide to Learning the Python Programming Language
Introduction

Python is a powerful and versatile programming language that is widely
used across many different industries. From data science and machine
learning to web development and automation, Python is a go-to choice
for many developers due to its ease of use and its vast developer
ecosystem.
This book aims to be a comprehensive guide to learning the Python
Programming Language, covering all the essential concepts and topics
that a python developer needs to know. This book has been written
to be accessible to readers of all levels, whether you’re just starting
your journey into programming, or already have some experience with
python. Even more advanced users may find some utility from the later
chapters.
This book is divided into four parts.
In Part I of this book, we start by introducing you to the basics of Python,
building a foundation on the fundamentals as we cover topics such as
data types, operators, control flow, functions, and classes.
In Part II of this book, we build upon the foundation established in Part I
and dive deeper into more advanced features of the Python programming
language. We explore the more advanced features of Python, discussing
topics such as generators, the data object model, metaclasses, etc, with
the explicit aim to help you write more efficient and elegant code. And
finally, we’ll look into Python modules and packaging, so we can see how
to take your python code and construct libraries which can be shared
with the broader python community.
In Part III of this book, we will take a closer look at some of the
“batteries included” sections of the Python standard library. These are the
modules and packages that are included with every Python installation
and provide a wide range of functionality that can be used to solve
a variety of problems. We’ll explore packages like functools, itertools,
dataclasses, etc.
And finally in Part IV we’ll dig into mechanisms for profiling and
debugging python. Furthermore, when we identify specific pain points
in our implementation, we’ll see how to refactor our projects to use
performant C code for a boost in execution speed. We’ll also look into
how to use C debuggers to run the python interpreter, so we can debug
our C extensions.
Part I: A Whirlwind Tour
Chapter 0: System Setup
If you have a working instance of python installed and are comfortable
using it, feel free to skip this chapter.
Python code is executed by a software program called the python
interpreter. The python interpreter reads the code from top to bottom,
line by line, and performs actions corresponding to the directive of the
program. Unlike a compiler, which converts the entire code into machine-
readable code before executing it, an interpreter executes the code as it
is read.
This book is written against CPython 3.11, and many of the features
discussed are version-dependent. If your system comes with a prepackaged
version of python that is older than 3.11, you may consider looking
into a python version manager. Personally I recommend using pyenv to
manage your local and global python versions. My typical workflow is
to use pyenv to switch local versions to a target distribution, and then
use that version to create a local virtual environment using venv. This
however is just one of many opinions, and how you choose to configure
your setup is completely up to you.

Installing Python Natively

Installing the python interpreter is the first step to getting started with
programming in python. The installation process is different for every
operating system. To install the latest version, Python 3.11, you’ll want to
go to the downloads¹ page of the official website for the Python Software
Foundation (PSF), and click the “download” button for the latest release
version. This will take you to a page where you can select a version
specific for your operating system.

Python Versions

Python is versioned by release number, <major>.<minor>.<patch>.


The latest version is the one with the greatest major version, followed
by the greatest minor version, followed by the greatest patch version.
So for example, 3.1.0 is greater than 2.7.18, and 3.11.2 is greater
than 3.10.8 (This is worth noting because in the download page on
python.org, a patch release for an older version may sometimes be
listed above the latest patch release for a newer version).
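This ordering is exactly Python's own tuple comparison, so you can check it for yourself by modelling a version as a (major, minor, patch) tuple. A quick sketch:

```python
# Versions compare like tuples: element by element, left to right.
print((3, 11, 2) > (3, 10, 8))   # True: same major, minor 11 > 10
print((3, 1, 0) > (2, 7, 18))    # True: major 3 > 2
print((3, 11, 2) > (3, 11, 1))   # True: same major and minor, patch 2 > 1

# The running interpreter exposes its own version the same way,
# as the sys.version_info named tuple.
import sys
print(sys.version_info[:3])
```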

Windows

The PSF provides several installation options for windows users. The first
two options are for 32-bit and 64-bit versions of the python interpreter.
Your default choice is likely to be the 64-bit interpreter; the 32-bit
interpreter is only necessary for hardware which doesn’t support 64-bit,
which nowadays is atypical for standard desktops and laptops. The other
two options are for installing offline vs. over the web. If you intend to
hold onto the original installer for archival purposes, the offline version
is what you want. Otherwise, the web installer is what you want.
When you start the installer, a dialogue box will appear. There will be
four clickable items in this installer: an “Install Now” button, a
“Customize installation” button, an “install launcher for all users” checkbox,
and an “Add python to PATH” checkbox. In order to launch python from
the terminal, you will need to check the “Add python to PATH” checkbox.
With that, click “Install Now” to install Python.
¹https://www.python.org/downloads
A secondary option to a windows install is to use Windows Subsystem
for Linux. WSL is a tool for running a linux kernel alongside your
windows installation. While the details of this tool are outside the scope
of this text, using WSL provides you a linux-like experience without
forcing you away from a windows OS. With WSL installed and running
in the terminal, you can read the rest of this book as if you were running
python on linux. And, given that a vast majority of python projects are
web servers, and those web servers run on linux, learning linux alongside
python would certainly be beneficial.

macOS

For macOS, a universal installer is provided in the downloads page
as a .pkg file. Downloading this installer and running through the
installation process will install this version of python to
/usr/local/bin/python3. This install will provide a separate instance of
python as compared to the Apple-controlled installation which comes
with your mac. This is important because you’ll need to take special care
that this new installation does not take precedence over your default
install, as system utilities, libraries, and programs may rely on the default
python install, and overriding the system python preferences may result
in instability.

Linux

For linux users, installing the latest version of python can be done by
compiling the source code. First, download a tarball from the downloads
page and extract into a local directory. Next, install the build dependencies,
and run the ./configure script to configure the workspace. Run
make to compile the source code and finally use make altinstall to
install the python version as python<version> (this is so your latest
version doesn’t conflict with the system installation).

root@f26d333a183e:~# apt-get install wget tar make \
> gcc build-essential gdb lcov pkg-config libbz2-dev \
> libffi-dev libgdbm-dev libgdbm-compat-dev liblzma-dev \
> libncurses5-dev libreadline6-dev libsqlite3-dev \
> libssl-dev lzma lzma-dev tk-dev uuid-dev zlib1g-dev
root@f26d333a183e:~# wget https://www.python.org/ftp/python/3.11.2/Python-3.11.2.tar.xz
root@f26d333a183e:~# file="Python-3.11.2.tar.xz"; \
> tar -xvf $file && rm $file
root@f26d333a183e:~# cd Python-3.11.2
root@f26d333a183e:~# ./configure
root@f26d333a183e:~# make
root@f26d333a183e:~# make altinstall
root@f26d333a183e:~# python3.11
Python 3.11.2 (main, Feb 13 2023, 18:44:04) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

What’s in a PATH?

Before moving forward, it’s worthwhile to talk a bit about the PATH
variable and what it’s used for. When you type a name into the terminal,
python for example, the computer looks for an executable on your
machine that matches this name. But it doesn’t look just anywhere; it
looks in a specific list of folders, and the PATH environment variable is
that list.

root@2711ea43ad26:~# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
root@2711ea43ad26:~# which python3.11
/usr/local/bin/python3.11

So for example, on this linux machine, the $PATH variable is a list of
folders separated by a colon (on Windows the separator is a semicolon,
but the concept is the same). One of those folders is the /usr/local/bin
folder, which is where our python3.11 binary was installed. This means
we can simply type python3.11 into our terminal, and the OS will find
this binary and execute it.
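The same list can be inspected from inside Python: os.environ exposes PATH, and os.pathsep holds the platform’s separator (a colon on linux and macOS, a semicolon on Windows). A minimal sketch; shutil.which performs the same lookup that the shell’s which command does:

```python
import os
import shutil

# PATH is one string of folders joined by a platform-specific separator.
folders = os.environ.get("PATH", "").split(os.pathsep)
print(os.pathsep)   # ":" on linux/macOS, ";" on Windows
print(folders[0])   # the first folder searched

# shutil.which searches those folders just like the shell does.
print(shutil.which("python3"))  # full path to the binary, or None
```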
If python is not in a folder specified in the PATH, or if you’d prefer
not to change your PATH settings at this time, you can always execute
the python executable directly.

root@d9687c856f09:/# /usr/local/bin/python
Python 3.11.1 (main, Jan 23 2023, 21:04:06) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Setting up your dev environment

Many developers have many opinions on what the best setup is for
writing Python, and the debate over the optimal setup is a contentious one. Some
prefer a lightweight text editor like Sublime Text or Neovim, while others
prefer an IDE like PyCharm or Visual Studio Code. Some developers
prefer a minimalist setup with only a terminal and the interpreter itself,
while others prefer a more feature-rich environment with additional
tools built in for debugging and testing. Ultimately, the best setup for
writing Python will vary depending on the individual developer’s needs
and preferences.
With that being said, if you’re new to python and writing software, it
might be best to keep your setup simple and focus on learning the basics
of the language. When it comes down to it, all you really need is an
interpreter and a .py file. This minimal setup allows you to focus on
the core concepts of the programming language without getting bogged
down by the plethora of additional tools and features. As you progress
and gain more experience, you can explore more advanced tools and
setups. But when starting out, keeping things simple will allow you to
quickly start writing and running your own code.

Running Python

At this point, we should have a working python installation on our
system. This means that somewhere in our file system there is a binary
which, if executed, will start the python interpreter.

A program that runs programs

In essence, the python interpreter is a program that runs programs. The
standard interpreter, the cpython interpreter, is primarily written in C
(though there are other implementations of the interpreter written in
different languages, like Jython written in Java and IronPython written
in C#). When a Python script is executed, the interpreter reads the code
line by line and performs specified actions. This allows developers to
write code in a high-level language that is easy to read and understand,
while the interpreter takes care of the low-level details of translating the
code into machine-readable instructions.
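You can ask a running interpreter which implementation it is; the standard library’s platform module reports this directly (on the standard interpreter it reports CPython):

```python
import platform
import sys

print(platform.python_implementation())  # "CPython" on the standard interpreter
print(platform.python_version())         # e.g. "3.11.2"
print(sys.executable)                    # filepath of the interpreter binary
```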

One interpreter, many modes

The python interpreter can be interfaced with in a multitude of ways,
and before getting into the main contents of this book, it’ll be worth
discussing what those ways are, as this book contains many examples
which you may want to run locally yourself.
The first mode is called a REPL. REPL is an acronym that stands for “read,
evaluate, print, loop”. These are the four stages of a cycle which python
can use to collect input from a user, execute that input as code, print any
particular results, and finally loop back to repeat the cycle.
To enter the python REPL, execute the python binary with no arguments.

root@2711ea43ad26:~# python3.11
Python 3.11.2 (main, Feb 14 2023, 05:47:57) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Within the REPL we can write code, submit it, and the python interpreter
will execute that code and update its state accordingly. To exit the REPL,
simply type exit() or press Ctrl-D.
Many examples in this textbook are depicted as code which was written
in the python REPL. If you see three right-pointing angle brackets >>>,
sometimes known as a chevron, this is meant to represent the right-pointing
angle brackets of the python REPL. Furthermore, any blocks of code
which require multiple lines will be continued with three dots ...,
which is the default configuration of the python REPL.

root@2711ea43ad26:~# python3.11
Python 3.11.2 (main, Feb 14 2023, 05:47:57) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> for i in range(2):
...     print(i)
...
0
1
>>>

A second mode is where you write python code in a file, and then
use the interpreter to execute that file. This is referred to as scripting,
where you execute your python script from the terminal using python
./script.py, where the only argument is a filepath to your python
script, which is a file with a .py extension. Many examples in this
textbook are depicted as scripts. They all start with the first line as a
comment # which will depict the relative filepath of the script.

# ./script.py

for i in range(2):
    print(i)

root@2711ea43ad26:~# python3.11 ./script.py
0
1

Occasionally throughout this text you'll encounter examples where a -c flag is passed to the python interpreter, followed by a string. This is a special option in the interpreter where instead of passing in a python program as a filename, we're passing it in as a string.

root@03987ed73c0e:~# python -c "print('Hello World')"
Hello World

There are other ways to interact with the python interpreter and execute
python code, and the choice of which one to use will depend on your
specific needs and preferences. But for just starting off, these two means
of interacting with the interpreter are sufficient for our use cases.

Chapter 1. Fundamental data types

In the Python programming language, the data types are the funda-
mental building blocks for creating and manipulating data structures.
Understanding how to work with different data structures is essential
for any programmer, as it allows you to store, process, and manipulate
data in a variety of ways. In this chapter, we will cover the basic data
types of Python; including integers, floats, strings, booleans, and various
container types. We will explore their properties, how to create and
manipulate them, and how to perform common operations with them.
By the end of this chapter, you will have a solid understanding of the
basic data types in Python and will be able to use them effectively in
your own programs.

But first, variables

But before we talk about data structures, we should first talk about how
we go about referencing a given data structure in Python. This is done by
assigning our data structures to variables using the assignment operator,
=. Variables are references to values, like data structures, and the type of
the variable is determined by the type of the value that is assigned to the
variable. For example, if you assign an integer value to a variable, that
variable will be of the type int. If you assign a string value to a variable,
that variable will be of the type str. You can check the type of a variable
by using the built-in type() function.

>>> x = "Hello!"
>>> x
'Hello!'
>>> type(x)
<class 'str'>

Note: this example using type() is our first encounter of what's called a
“function” in Python. We’ll talk more about functions at a later point in
this book. For now, just know that to use a function, you “call” it using
parentheses, and you pass it arguments. This type() function can take
one variable as an argument, and it returns the “type” of the variable it
was passed - in this case x is a str type, so type(x) returns the str type.
It’s also important to note that, in Python, variables do not have a set data
type; they are simply a name that refers to a value. The data type of a
variable is determined by the type of the value that is currently assigned
to the variable.
Variable names must follow a set of rules in order to be considered valid.
These rules include:

• Variable names can only contain letters, numbers, and underscores. They cannot contain spaces or special characters such as !, @, #, $, etc.
• Variable names cannot begin with a number. They must begin with
a letter or an underscore.
• Variable names are case-sensitive, meaning that MyVariable and
myvariable are considered to be different variables.
• Python has some reserved words that cannot be used as variable
names. These include keywords such as False, None, True, and,
as, etc.

>>> __import__('keyword').kwlist
['False', 'None', 'True', 'and', 'as', 'assert',
 'async', 'await', 'break', 'class', 'continue',
 'def', 'del', 'elif', 'else', 'except', 'finally',
 'for', 'from', 'global', 'if', 'import', 'in',
 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass',
 'raise', 'return', 'try', 'while', 'with', 'yield']

A list of the reserved keywords in python.


Variables are not constant, meaning that their values and types can be changed or reassigned after they are created. In other words, once you have created a variable and assigned a value to it, you can later change that value to something else. For example, you might create a variable x and assign it the value of 9, and later on in your program change the value of x to "message".

>>> x = 9
>>> x
9
>>> x = "message"
>>> x
'message'

The concept of constants does not exist in Python, but it is a common practice to use all uppercase letters for variable names to indicate that the variable is intended to be constant and should not be reassigned.

CONSTANT = 3.14

Primitives and Containers

In Python, data types can be broadly classified into two categories: primitive data types and container data types.
The primitive data types, also referred to as scalar data types, represent a single value and are indivisible. Examples of these data types include the bool, the int, the float, the complex, and to a certain extent, the str and bytes types. Primitive types are atomic, in that they do not divide into smaller units of more primitive types. Because they represent a single value, they cannot be indexed or iterated, with the exception of the string types.
On the other hand, container data types, also known as non-scalar data
types, represent multiple values, and are divisible. Some examples of
container data types include list, tuple, set, and dict. They are used
to store collections of values and allow for indexing and iteration (more
on that later). These types are built on top of the primitive data types
and provide a way to organize and manipulate more primitive data in a
structured way.

Python's str and bytes types are somewhat unique in that they have properties of both primitive and container data types. They can be used like primitive data types, as a single, indivisible value; but they also behave like container types in that they are sequences. They can be indexed and iterated, where each character can be accessed individually. In this sense, these types can be said to be hybrids that combine the properties of both primitive and container data types.

Mutability vs Immutability

One important concept to understand when working with container data types is the difference between a container being mutable vs immutable.
Mutable types can be modified after they are created, while immutable
data types cannot be modified. For example, a list is a mutable data type,
so you can add, remove or change elements in a list after it is created. On
the other hand, a tuple is an immutable data type, so you cannot change
its elements after it is created. This difference in behavior can have a
significant impact on how you work with data in your programs and it’s
important to be aware of it.

>>> x = [1, 2, 3]
>>> x[0] = 0
>>> x
[0, 2, 3]

>>> y = (1, 2, 3)
>>> y[0] = 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

Introduction to Python’s data types

Below is a short introduction to the fundamental data types which are found in Python. We'll cover each data type more in depth as we progress throughout this book.

Primitive Types

Booleans

The python boolean, represented by the bool data type, is a primitive data type that has two possible values: True and False. It is used to represent logical values and is commonly used to check whether a certain condition is met.

>>> x = True
>>> type(x)
<class 'bool'>

Integers

The python integer, represented by the int data type, is a primitive data
type that is used to represent whole numbers. Integers can be positive
or negative and are commonly used in mathematical operations such as
addition, subtraction, multiplication, and division.

>>> x = 5
>>> type(x)
<class 'int'>

Integers are by default written in base 10 (decimal). However, Python integers can also be represented in other number systems, such as hexadecimal, octal, and binary. Hexadecimal representation uses the base 16 number system with digits 0-9 and letters A-F. In Python, you can represent a hexadecimal number by prefixing it with 0x. For example, the decimal number 15 can be represented as 0xF in hexadecimal. Octal representation uses the base 8 number system with digits 0-7. In Python, you can represent an octal number by prefixing it with 0o. For example, the decimal number 8 can be represented as 0o10 in octal. Binary representation uses the base 2 number system with digits 0 and 1. In Python, you can represent a binary number by prefixing it with 0b. For example, the decimal number 5 can be represented as 0b101 in binary.

>>> 0xF
15
>>> 0o10
8
>>> 0b101
5

Floats

The python floating-point number, represented by the float data type, is a primitive data type that is used to represent decimal numbers. Floats are numbers with decimal points, such as 3.14 or 2.718. There are also a few special float values, such as float('inf'), float('-inf'), and float('nan'), which are representations of infinity, negative infinity, and "Not a Number", respectively.

>>> x = 3.14
>>> type(x)
<class 'float'>
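To see how the special float values behave, here's a small sketch you can run yourself; note that nan famously compares unequal to everything, including itself:

```python
inf = float('inf')
ninf = float('-inf')
nan = float('nan')

# Infinity compares greater than any finite number
print(inf > 10**100)   # True

# nan is not equal to anything -- not even itself
print(nan == nan)      # False

# Some operations on infinities produce nan
print(inf - inf)       # nan
```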

Complex

The python complex, represented by the complex data type, is a primitive data type that is used to represent complex numbers. Complex numbers are numbers which map to the complex plane using the notation (x+yj), where x is the real part and y is the imaginary part of the complex number. Complex numbers are often used in mathematical operations such as arithmetic operations, trigonometry, and calculus.

>>> x = (1+2j)
>>> type(x)
<class 'complex'>
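As a brief sketch of what you can do with a complex value: the real and imaginary parts are exposed as attributes, and abs() returns the magnitude:

```python
z = (3+4j)

# The real and imaginary parts are stored as floats
print(z.real)   # 3.0
print(z.imag)   # 4.0

# abs() gives the magnitude (the distance from the origin on the complex plane)
print(abs(z))   # 5.0

# The usual arithmetic operators work on complex numbers
print((1+2j) + (2-1j))   # (3+1j)
```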

Strings

The python string, represented by the str data type, is a primitive data
type that is used to represent sequences of characters, such as "hello"
or "world". Strings are one of the most commonly used data types in
Python and are used to store and manipulate text. Strings can be created
by enclosing a sequence of characters in 'single' or "double" quotes.
They also can be made into multi-line strings using triple quotes (""" or
'''). Strings are immutable, meaning that once they are created, their
value cannot be modified.

>>> x = "Hello"
>>> type(x)
<class 'str'>
>>> _str = """
... this is a
... multiline string.
... """
>>> print(_str)

this is a
multiline string.

>>>

Strings can contain special characters that are represented using the backslash \ as an escape character. These special characters include the newline character \n, the tab character \t, the backspace character \b, the carriage return \r, the form feed character \f, the single quote ', the double quote ", and the backslash character itself \.
By default, Python interprets these escape sequences in a string, meaning
that when an escape character is encountered followed by a special char-
acter, python will replace the sequence with the corresponding character.
So for example, the string "hello\nworld" contains a newline character
that will separate "hello" and "world" on separate lines when printed
to the console or written to a file.
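For example, printing a string containing the newline and tab escape sequences shows how the interpreter replaces them with the corresponding characters:

```python
message = "hello\nworld"
print(message)
# hello
# world

# \t is replaced with a tab character when printed
print("name:\tAda")
```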
Raw strings, represented by the prefix r before the string, are used to
represent strings that should not have special characters interpreted. For
example, a raw string like r"c:\Windows\System32" will include the
backslashes in the string, whereas a normal string would interpret them
as escape characters.

>>> "\n"
'\n'
>>> r"\n"
'\\n'

f-strings, also known as "formatted string literals", allow you to embed expressions inside strings using curly braces {}. They are useful for creating strings that include values. To create one, simply prepend the letter f to the string you're trying to create. For example, an f-string like f"Hello, {name}!" will include the value of the variable name in the string. These f-strings do not require the provided variables to be of type str; python will perform an implicit type conversion during formatting.

>>> name = "The Primeagen"
>>> slogan = f"My name is {name}!"
>>> slogan
'My name is The Primeagen!'

Byte strings, represented by the bytes data type, are used to represent
strings as a sequence of bytes. They are typically used when working
with binary data or when working with encodings that are not Unicode.
Byte strings can be created by prefixing a string with a b, for example,
b"Hello".

>>> type(b"")
<class 'bytes'>
>>>
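As a short sketch of moving between the two string types, str values can be encoded into bytes and decoded back, given an encoding such as UTF-8:

```python
text = "Hello"

# str -> bytes, using an explicit encoding
data = text.encode("utf-8")
print(data)          # b'Hello'
print(type(data))    # <class 'bytes'>

# bytes -> str, decoding with the same encoding
print(data.decode("utf-8"))   # Hello

# Indexing a bytes object yields integers, not one-character strings
print(data[0])       # 72, the byte value of 'H'
```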

Container Types

Tuples and Lists

The python tuple, represented by the tuple data type, is a container data type that is used to store an ordered collection of items. A tuple is immutable; once it is created, its items cannot be modified. Tuples are created by writing a sequence of primitives separated by commas. While technically optional, it's common to also include a set of enclosing parentheses. For example, a tuple of integers can be written as (1, 2, 3) and a tuple of strings ("one", "two", "three"). The primitives contained within a tuple can be of varying types.

>>> my_tuple = (1, "two", 3)
>>> type(my_tuple)
<class 'tuple'>
>>> # parentheses are optional (though recommended)
>>> 4, 5, 6
(4, 5, 6)

The python list, represented by the list data type, is a container data type
that is used to store an ordered collection of items. A list is similar to a
tuple, but it is mutable, meaning that its items in the container can be
modified after it is created. Lists are created by enclosing a sequence of
primitives in square brackets, separated by commas. For example, a list
of integers [1, 2, 3] or a list of strings ["one", "two", "three"].
Again, these primitives can be of varying types.

>>> my_list = [4, "five", 6]
>>> type(my_list)
<class 'list'>

Both tuples and lists in Python can be indexed to access the individual
elements within their collection. Indexing is done by using square
brackets [] and providing the index of the element you want to access.
Indexing starts at 0, so the first element in the tuple or list has an index
of 0, the second element has an index of 1, and so on.
So for example if you have a tuple my_tuple = ("Peter", "Theo", "Sarah"), you can access the first element by using the index my_tuple[0]. Similarly, if you have a list my_list = ["Bob", "Karen", "Steve"], you can access the second element by using the index my_list[1].
Both tuples and lists use this convention for getting items from their respective collections. Lists, given their mutability, also use this convention for setting items.

>>> my_tuple = ("Peter", "Theo", "Sarah")
>>> my_tuple[0]
'Peter'
>>> my_list = ["Bob", "Karen", "Steve"]
>>> my_list[1]
'Karen'
>>> my_list[1] = "Sarah"
>>> my_list
['Bob', 'Sarah', 'Steve']

Negative indexing can also be used to access the elements of the tuple or list in reverse order. For example, given the tuple my_tuple = ("Peter", "Theo", "Sarah"), you can access the last element by using the index my_tuple[-1]. Similarly, if you have a list my_list = ["Bob", "Karen", "Steve"], you can access the second-to-last element by using the index my_list[-2].

>>> my_tuple = ("Peter", "Theo", "Sarah")
>>> my_tuple[-1]
'Sarah'
>>> my_list = ["Bob", "Sarah", "Steve"]
>>> my_list[-3]
'Bob'

A third indexing option is to use slicing to access a range of elements from a tuple or list. Slicing is done by creating a slice object using the : operator inside the square brackets used for indexing. When providing a slice, it should be known that the interval is half-open, including the first number but excluding the last.
As an example, if you have a list my_list = ["Jake", "Karen", "Paul", "Tim", "Greg", "Katie"], you can access the elements from the second to the fourth by using my_list[1:4]. An optional third value can be provided to indicate a step value.

>>> my_list = ["Jake", "Karen", "Paul", "Tim", "Greg", "Katie"]
>>> my_list[1:4]
['Karen', 'Paul', 'Tim']
>>> my_list[1:5:2]
['Karen', 'Tim']
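It's also worth knowing that each part of a slice is optional: an omitted start defaults to the beginning of the sequence, an omitted stop defaults to the end, and a negative step walks the sequence in reverse. A short sketch:

```python
my_list = ["Jake", "Karen", "Paul", "Tim", "Greg", "Katie"]

print(my_list[:3])    # first three elements
print(my_list[3:])    # everything from index 3 onward
print(my_list[::2])   # every other element
print(my_list[::-1])  # a reversed copy of the list
```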

When you try to access an index that is out of bounds of a list or a tuple, Python raises an exception called IndexError (we'll talk more about exceptions later, but for now just consider exceptions as how python indicates that something went wrong). In other words, if you try to access an element at an index that does not exist, Python will raise an IndexError.

>>> my_list = ["Jake", "Karen", "Paul", "Tim", "Greg", "Katie"]
>>> my_list[6]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

Dictionaries

The python dictionary, represented by the dict data type, is a container data type that stores key-value pairs. Dictionaries are created by enclosing a sequence of items in curly braces {}, with each key-value pair separated by a colon. For example, a dictionary of integers {1: "one", 2: "two", 3: "three"} or a dictionary of strings {"apple": "fruit", "banana": "fruit", "cherry": "fruit"}. Empty dictionaries can be created simply using {}.
To get an item from a dictionary, you can once again use the square
bracket notation [] and provide a key as the index. For example, if
you have a dictionary my_dict = {"apple": "fruit", "banana":
"fruit", "cherry": "fruit"}, you can get the value associated
with the key "apple" by using the notation my_dict["apple"], which
would return "fruit". Requesting a value from a dictionary using a key
which is not indexed causes python to raise a KeyError exception.

>>> my_dict = {"apple": "fruit", "banana": "fruit", "cherry": "fruit"}
>>> my_dict["apple"]
'fruit'
>>> my_dict["orange"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'orange'
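Dictionaries are mutable, so the same square bracket notation can be used to add or replace entries; and if you'd rather not risk a KeyError, the .get() method returns a default value for missing keys. A quick sketch:

```python
my_dict = {"apple": "fruit"}

# Assigning to a key adds (or replaces) an entry
my_dict["carrot"] = "vegetable"
print(my_dict)   # {'apple': 'fruit', 'carrot': 'vegetable'}

# .get() returns None (or a provided default) instead of raising KeyError
print(my_dict.get("orange"))             # None
print(my_dict.get("orange", "unknown"))  # unknown
```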

Sets

The python set, represented by the set data type, is a container data type
that is used to store an unordered collection of unique items. Sets are
created by enclosing a sequence of items in curly braces {} and separated
by commas. For example, a set of integers {1, 2, 3} or a set of strings
{"bippity", "boppity", "boop"}. It is important to note that you
can not create an empty set outside of using a constructing function (a
constructor), as {} by default creates a dictionary. The constructor for
creating an empty set is set().
Python also includes a data type called frozenset that is similar to
the set data type, but with one key difference: it is immutable. This
means that once a frozenset is created, its elements cannot be added,
removed, or modified. Frozensets are created by using the frozenset()
constructor; for example, frozenset({1, 2, 3}) or frozenset([4,
5, 6]).

>>> my_set = {"Michael", "Theo", "Michael"}
>>> my_set
{'Theo', 'Michael'}
>>> frozenset(my_set)
frozenset({'Theo', 'Michael'})
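The difference between the two set types, and the {} pitfall mentioned above, can be sketched like so:

```python
# {} creates an empty dict, not an empty set
print(type({}))      # <class 'dict'>
print(type(set()))   # <class 'set'>

# Regular sets are mutable
s = {1, 2, 3}
s.add(4)
print(s)             # {1, 2, 3, 4}

# Frozensets have no mutating methods at all
fs = frozenset(s)
try:
    fs.add(5)
except AttributeError:
    print("frozenset has no add() method")
```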

Python Objects

In Python, everything is an object, which means that each element in the language is an instance of a specific class. This includes the built-in data types such as integers, strings, and lists, as well as more complex data structures like functions and modules.
When a variable is assigned a value in Python, that value is actually an
object. For example, when you create a variable x and assign the value
5 to it, x is not just a number, it’s an instance of the int class, with the
value 5.
This is because in Python, the standard implementation of the language is built on C-based infrastructure, where the basic unit of memory that holds the value of a python object is a C structure called a PyObject. This struct contains information about the object's type, reference count, and actual value. A PyObject* is a pointer to a PyObject, and it can be used as an opaque handle to the python object, allowing the implementation to abstract away the details of these objects and providing python developers a consistent way to interact with any kind of python object, regardless of its type.
In Python, each object has a unique identity that is assigned to it when it
is created. This identity is an integer value that is used to distinguish one
object from another. The built-in function id() can be used to retrieve
the identity of an object.

For example, the following code creates two variables, x and y, and assigns the value 257 to both of them. Even though the values of x and y are the same, they are different objects in memory and therefore have different identities, so id() returns two different values.

>>> x = 257
>>> y = 257
>>> id(x)
140097007292784
>>> id(y)
140097007292496

It is generally the case that variable assignment creates an object that has
a unique identity. However, there are some exceptions to this rule.
There are specific values which Python has elected to make what is called a singleton, meaning that there will only ever be a single instance of the object. For example, the integers -5 through 256 are singletons, because of how common they are in everyday programming. The values None, False, and True are singletons as well.

>>> x = 1
>>> y = 1
>>> id(x)
140097008697584
>>> id(y)
140097008697584

References to Objects

In Python, variables are references to objects, which means that when you create a variable and assign a value to it, the variable does not store the value directly; it instead stores a reference to the object that contains the value.
For example, consider the following:

>>> x = 5
>>> y = x
>>> x = 10

In this example, x is assigned the value 5, which means any reference to x is fundamentally a reference to the object that is the integer 5. The variable y is then assigned the value of x. By the same token, y is fundamentally a reference to the same object which x currently references, the 5. This means that x and y are now both references to the same object in memory. We can subsequently change the object that x references, to say the integer 10, and this will have no effect on y, because y still holds its own reference to the object 5.

x = 5            y = x            x = 10
-----            -----            ------
x -- int(5)      x                x -- int(10)
                  \
                   int(5)
                  /
                 y                y -- int(5)

It’s worth noting that when you work with mutable objects, like lists or
dictionaries, you have to be aware that if you assign a variable to another
variable, both of the variables fundamentally reference the same object
in memory. This means that any modification made to the object from
one variable will be visible to the other variable.

x = {}           y = x            x["fizz"] = "buzz"
------           -----            ------------------
x -- {}          x                x
                  \                \
                   {}               {"fizz": "buzz"}
                  /                /
                 y                y
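The diagram above is easy to reproduce yourself; a minimal sketch with a list shows that a change made through one name is visible through the other:

```python
x = []
y = x              # y now references the same list object as x

x.append("fizz")   # mutate the object through x
print(y)           # ['fizz'] -- the change is visible through y
print(x is y)      # True -- one object, two names
```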

If your intention is to have two different objects which can be operated on independently, you must create a copy of the first object, and assign that copy to the second variable. We'll explore how to do this in later chapters.

Chapter 2. Operators

Operators are special symbols in Python that carry out various types of computation. The value that a given operator operates on is called the operand. For example: in the expression 4 + 5 = 9, 4 and 5 are operands and + is the operator that designates addition. Python supports the following types of operators: arithmetic, assignment, comparison, logical, membership, bitwise, and identity. Each of these operators will be covered in depth in this chapter.

Arithmetic

Arithmetic operators are used to perform mathematical operations. The order of operations for these operators follows the same order of operations as in mathematics: Parentheses, Exponentiation, Multiplication and Division, and Addition and Subtraction. That being said, it's recommended to use parentheses liberally in order to make the code more readable and avoid confusion about the order of operations.
Here are the most commonly used arithmetic operators in Python:

• + - adds two operands
• - - subtracts the right operand from the left operand
• * - multiplies the operands
• / - divides the left operand by the right operand
• // - returns the floor division of the left operand by the right operand
• % - returns the remainder of the division of the left operand by the right operand
• ** - raises the left operand to the power of the right operand

>>> x = 5
>>> y = 2
>>> x + y
7
>>> x - y
3
>>> x * y
10
>>> x / y
2.5
>>> x // y
2
>>> x % y
1
>>> x ** y
25

Assignment

Assignment operators are used to assign values to variables. The most basic assignment operator in Python is the = operator (of which we've made plenty of use so far).
For example:

>>> my_var = "Hello!"

This assigns the string "Hello!" to the variable my_var.


In addition to the basic assignment operator, Python also includes a set
of shorthand assignment operators that can be used to perform the same
operation, plus a mathematical operation, in a shorter amount of code.
These include:

• += - adds the right operand to the left operand and assigns the result to the left operand
• -= - subtracts the right operand from the left operand and assigns the result to the left operand
• *= - multiplies the left operand by the right operand and assigns the result to the left operand
• /= - divides the left operand by the right operand and assigns the result to the left operand
• %= - takes the modulus of the left operand by the right operand and assigns the result to the left operand
• //= - floor divides the left operand by the right operand and assigns the result to the left operand
• **= - raises the left operand to the power of the right operand and assigns the result to the left operand
• := - the "walrus" operator, which expresses a value while also assigning it

For example:

>>> # () can contain expressions: this assigns 2 to y,
>>> # and expresses the 2 which is added to 3 (more on this later)
>>> x = 3 + (y := 2)
>>> y
2
>>> x
5
>>> x += 5  # is equivalent to x = x + 5
>>> x
10
>>> x *= 2  # is equivalent to x = x * 2
>>> x
20

Packing operators

Python allows you to assign the elements of a collection to multiple variables inside a tuple, using a feature called tuple unpacking or tuple assignment. To use tuple unpacking, write a comma-separated tuple with variables on the left side of an assignment statement, and assign to that tuple a sequence with the same number of elements.

>>> (a, b) = [1, 2]  # note that the parentheses are optional
>>> a
1
>>> b
2

Python also defines two unpacking operators which can be used to assign
the elements of a sequence to multiple variables in a single expression.
These include:

• * - iterable unpacking, unpacks a sequence of items
• ** - dictionary unpacking, unpacks dictionaries into key-value pairs

The * operator can be used to unpack a sequence into separate arguments. For example, if you have a tuple my_tuple = (1, 2), you can unpack it within the construction of a separate collection, like a list. The result will be that the items of the tuple are unpacked during the list construction, resulting in a single list of elements.

>>> my_tuple = (1, 2)
>>> my_list = [0, *my_tuple, 3, 4, 5]
>>> my_list
[0, 1, 2, 3, 4, 5]

These two operations can also be combined to deterministically unpack a collection of values into a list. For example, the previous my_list can be unpacked using a tuple assignment into the variables a, b, and c, with variable b using the * operator as a means to collect any and all elements which aren't explicitly assigned to a and c.

>>> (a, *b, c) = my_list
>>> a
0
>>> b
[1, 2, 3, 4]
>>> c
5

The ** operator can also be used for unpacking operations, though in this case the operator expects operands which are dictionaries. For example, we can merge two smaller dictionaries into one large dictionary by unpacking the two dictionaries during construction of the merged dictionary.

>>> dict_one = {1: 2, 3: 4}
>>> dict_two = {5: 6}
>>> {**dict_one, **dict_two}
{1: 2, 3: 4, 5: 6}

Comparison

Comparison operators are used to compare two values and return a Boolean value (True or False) based on the outcome of the comparison. These include:

• == - returns True if the operands are equal, False otherwise
• != - returns True if the operands are not equal, False otherwise
• > - returns True if the left operand is greater than the right operand, False otherwise
• < - returns True if the left operand is less than the right operand, False otherwise
• >= - returns True if the left operand is greater than or equal to the right operand, False otherwise
• <= - returns True if the left operand is less than or equal to the right operand, False otherwise
• is - returns True if the left operand is the same instance as the right operand, False otherwise
• is not - returns True if the left operand is not the same instance as the right operand, False otherwise

Comparison operators are widely used in decision-making statements. They can be used to compare not only numbers but also strings, characters, lists, and other data types.

>>> 1 == 1
True
>>> 1 == 2
False
>>> 1 != 2
True
>>> 1 is 1
True
>>> 5 < 2
False
>>> 5 > 2
True
>>> 5 <= 2
False
>>> 2 >= 2
True
>>> x = int(1)
>>> y = int(1)
>>> x is y  # 1 is a singleton in Python
True
>>> x = int(257)
>>> y = int(257)
>>> x is y  # 257 is not a singleton
False
>>> x is not y
True

Logical

Logical operators are used to combine the results of one or more comparison operations, and can be used to check multiple conditions at the same time. The logical operators include:

• and - returns True if both operands are truthy, False otherwise
• or - returns True if either of the operands is truthy, False otherwise
• not - returns True if the operand is falsy, False if the operand is truthy

>>> x = 4
>>> y = 2
>>> z = 8
>>> (x > 2 or y > 5) and z > 6
True

This example evaluates to True because a) x > 2 is True, b) y > 5 is False, thus (True or False) is True, and finally True and True is True.

What is Truthy?

Objects which evaluate to True in a boolean context are considered "truthy". In conditional statements, a value does not need to be explicitly True or False: being "truthy" or "falsy" is sufficient. Most of the time objects are considered "truthy", so it's easier to list the objects which are inherently "falsy":

• False
• None
• 0
• 0.0
• Empty sequences and collections (e.g. "", b"", [], (), {}, set())
• Objects which define a __bool__() method that returns False
• Objects which define a __len__() method that returns 0
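A quick way to confirm the truthiness of a value is to pass it to the bool() constructor; this sketch checks the falsy values listed above:

```python
# Each of the inherently falsy values converts to False
for value in (False, None, 0, 0.0, "", [], (), {}, set()):
    assert bool(value) is False

# Most other objects are truthy -- even ones that "look" false
print(bool(-1))        # True
print(bool("False"))   # True, a non-empty string
print(bool([0]))       # True, a non-empty list
```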

Short-Circuits

It’s important to note that the and and or operators are short-circuit
operators, which means that they exit their respective operations eagerly.
In the case of and operator, it only evaluates the second operand if the
first operand is True, and in the case of or operator, it only evaluates
the second operand if the first operand is False.
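One way to observe short-circuiting is with a helper function that announces when it runs (the probe name here is just an illustrative choice):

```python
def probe(value):
    # announce that this operand was actually evaluated
    print(f"evaluating {value!r}")
    return value

# `and` stops at the first falsy operand, so probe(1) never runs
result = probe(0) and probe(1)
print(result)   # 0

# `or` stops at the first truthy operand, so probe(3) never runs
result = probe(2) or probe(3)
print(result)   # 2
```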

Logical Assignments

Logical operators can also be used in assignment expressions. The short-circuit nature of Python operators determines the assignment.
Part I: A Whirlwind Tour 38

>>> x = "" or 3
>>> x
3

This example assigns the variable x the value 3 because the empty string
is falsy.

1 >>> x = 0 and 6
2 >>> x
3 0

This example assigns the variable x the value 0 because zero is falsy, and
since and requires the first operand to be truthy, the resulting assignment
is 0.

1 >>> x = 3 and 6
2 >>> x
3 6

This example assigns the variable x the value 6 because 3 is truthy, and
since and requires the first operand to be truthy, the resulting assignment
falls to the second operand.
In instances where we want to explicitly assign a given variable a boolean
representation of the expression, we would need some way to evaluate
the logical operation before the assignment operation. One way to do this
is using an explicit type conversion, by passing the value to the bool()
constructor:

1 >>> x = bool(3 and 6)


2 >>> x
3 True

However, we can also do this using only logical operators, and without
a function call.

1 >>> x = not not (3 and 6)


2 >>> x
3 True

In this example the parentheses evaluate 3 and 6 to 6 using the convention above. Next, not 6 evaluates to False because 6 is truthy, and finally not False evaluates to True.

Membership

Membership operators are used to test whether a value is a member


of a sequence (such as a string, list, or tuple) or a collection (such as
a dictionary or set). The membership operators include:

• in - returns True if the value is found in the sequence, False


otherwise
• not in - returns True if the value is not found in the sequence,
False otherwise

1 >>> collection = {1,2,3}


2 >>> 5 in collection
3 False
4 >>> 3 in collection
5 True
6 >>> "t" in "String"
7 True
8 >>> 1 in {1: 3} # Membership of dictionaries in python is depende\
9 nt on the key
10 True
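The not in operator is simply the negation of the same membership test:

```python
collection = {1, 2, 3}
print(5 not in collection)  # True
print(3 not in collection)  # False
print("z" not in "String")  # True
```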

Bitwise

Bitwise operators are used to perform operations on the binary representations of integers. These operators work on the individual bits of the integers, rather than on the integers as a whole. The bitwise operators include:

• & - performs a bitwise AND operation on the operands


• | - performs a bitwise OR operation on the operands
• ^ - performs a bitwise XOR operation on the operands
• ~ - performs a bitwise NOT operation on the operand

• << - shifts the bits of the operand to the left by the number of places
specified by the second operand
• >> - shifts the bits of the operand to the right by the number of places
specified by the second operand

1 >>> x = 0b101
2 >>> y = 0b011
3 >>> x & y
4 1 # 0b001
5 >>> x | y
6 7 # 0b111
7 >>> x ^ y
8 6 # 0b110
9 >>> ~x
10 -6 # -0b110
11 >>> x << 1
12 10 # 0b1010
13 >>> x >> 1
14 2 # 0b010

It’s important to note that the bitwise NOT operator (~) inverts the bits of the operand; because Python integers are signed, ~x is equal to the two’s complement value -(x + 1).
It’s also worth noting that Python also has support for the analogous
bitwise assignment operators, such as &=, |=, ^=, <<=, and >>=, which
allow you to perform a bitwise operation and assignment in a single
statement.
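For example, a short sketch of the augmented bitwise assignments:

```python
x = 0b101   # 5
x &= 0b011  # equivalent to x = x & 0b011 -> 0b001
print(x)    # 1
x |= 0b100  # 0b001 | 0b100 -> 0b101
print(x)    # 5
x <<= 2     # shift left two places -> 0b10100
print(x)    # 20
x >>= 1     # shift right one place -> 0b1010
print(x)    # 10
```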

Identity

Finally, The identity operator is used to compare the memory addresses


of two objects to determine if they are the same object or not. The identity
operators include:

• is - returns True if the operands are the same object, False otherwise
• is not - returns True if the operands are not the same object, False
otherwise

You can use this identity operator to check against singletons, such as
None, False, True, etc.

1 # ./script.py
2
3 x = None
4 if x is None:
5     print("x is None!")

It’s important to note that two objects can have the same value but
different memory addresses; in this case the == operator can be used
to check if they have the same value and the is operator can be used to
check if they have the same memory address.

1 >>> x = [1, 2, 3]
2 >>> y = [1, 2, 3]
3 >>> z = x
4
5 >>> z is x
6 True
7 >>> x is y
8 False
9 >>> x == y
10 True

This example shows that even though x and y contain the same elements,
they are two different lists, so x is not y returns True. The variables
x and z however point to the same list, so x is z returns True.

Chapter 3. Lexical Structure

Up until now, we’ve been working with Python in the context of single-
line statements. However, in order to proceed to writing full python
scripts and complete programs, we need to take a moment to discuss
the lexical structure which goes into producing valid Python code.
Lexical Structure refers to the smallest units of code that the language is
made up of, and the rules for combining these units into larger structures.
In other words, it is the set of rules that define the syntax and grammar of
the language, and they determine how programs written by developers
will be interpreted.

Line Structure

The python interpreter parses a python program as a sequence of “logical”


lines. To the python interpreter, each logical line is represented by a
NEWLINE token. Statements cannot extend beyond this token, except in
cases where the syntax allows for it, such as in compound statements. A
logical line is created by combining one or more “physical” lines, i.e. the
lines in a file you create by hitting the “enter” key.
A physical line in Python is a sequence of characters that is ended by
an end-of-line sequence, corresponding to hitting the “enter” key. This
end-of-line sequence either takes the form of the Unix line termination
sequence using ASCII LF (linefeed, b'\n'), the Windows sequence using
ASCII CR LF (carriage return followed by linefeed, b'\r\n'), or the
Macintosh sequence using ASCII CR (carriage return, b'\r'), depending
on the platform. The end of a file is also considered an implicit terminator
for the final physical line.

1 # ./script.py
2
3 for i in range(2):
4     print(i)

1 root@a59580cc2448:~# python -c "


2 > print(open('./script.py', 'br').read())
3 > "
4 b'# ./script.py\n\nfor i in range(2):\n    print(i)\n'
5 root@a59580cc2448:~#

In this example, we see how the python script ‘script.py’ looks in our editor. We can contrast that with how the computer interprets the file: as a sequence of characters separated by newline characters (since this is a Linux machine, the newline character is \n).

Comments

A comment in Python begins with the hash character # and continues


until the end of the physical line. A comment indicates the end of
the logical line unless the rules for implicit line joining are applied.
Comments are ignored by the syntax and are not executed by the
interpreter.

1 >>> # this is a comment, and the next


2 >>> # line is executable code.
3 >>> this = True

Explicit and Implicit Line Joining

Two or more physical lines in Python can be joined into a single logical
line using backslash characters \. This is done by placing a backslash at
the end of a physical line, which will be followed by the next physical
line. The backslash and the end-of-line character are removed, resulting
in a single logical line.

1 >>> # you can join multiple physical lines to make a single logic\
2 al line
3 >>> x = 1 + 2 + \
4 ... 3 + 4
5 ...
6 >>> x
7 10

In this example, the logical line “x = 1 + 2 + 3 + 4” is created by combining


two physical lines, separated by the backslash.
It should be noted that a line that ends with a backslash cannot contain
a comment.

1 >>> x = 1 + 2 \ # comment
2 File "<stdin>", line 1
3 x = 1 + 2 \ # comment
4 ^
5 SyntaxError: unexpected character after line continuation charact\
6 er

Expressions in Python that are enclosed in parentheses, square brackets,


or curly braces can be spread across multiple physical lines without the
need to use backslashes. For example:

1 >>> # These are valid ways of splitting an expression over mult\


2 iple
3 >>> # physical lines without using backslashes
4 >>> my_first_variable = (
5 ... 1 + 2 + 3 + 4
6 ... )
7 ...
8 >>> my_second_variable = [
9 ... 1, 2, 3, 4
10 ... ]
11 ...
12 >>> my_final_variable = {
13 ... 'a': 1, 'b': 2, 'c': 3, 'd': 4
14 ... }
15 ...

In this example, all the expressions are spread across multiple physical
lines, but they are still considered as a single logical line, because they
are enclosed in parentheses, square brackets, or curly braces.

Indentation

One important aspect of the lexical structure of Python is the use of


indentation to define scope. In Python, indentation is used to indicate
the scope of a block of code, rather than using curly braces like other
programming languages. This means that any code that has the same
indentation level, say under an if statement or a for loop, is considered
to be within the scope of that particular block. This feature makes the
code more readable and easy to understand, as it is clear at a glance
which lines of code are part of a specific block. However, it also means
that we need to be very careful about how we indent our code, as
improper indentation can lead to syntax errors.

For example, in the following code snippet, the lines of code that are
indented under the if statement are considered to be within the scope
of the if statement and will only be executed if the condition x > 0 is
True.

1 >>> x = 5
2 >>> if x > 0:
3 ...     print("x is positive")
4 ...     x = x - 1
5 ...     print("x is now", x)
6 ...
7 x is positive
8 x is now 4

Here, all three indented lines are inside the if block, so they only execute when x is greater than 0. Since x starts at 5, the condition is met: the first print statement runs, x is decremented, and the second print statement reports the new value of x.
The indentation of the python code is crucial; if it is not correct, it will
raise an error, or worse, will do something you did not expect, leading to
bugs. Finally, the indentation must be consistent throughout the program,
usually 4 spaces or a tab.
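To see this in action, the following sketch compiles a deliberately mis-indented snippet from a string (writing it directly in a file would stop the whole script from running):

```python
source = """
if True:
    print("four spaces")
      print("six spaces")  # deeper indent with no new block
"""

try:
    compile(source, "<example>", "exec")
except IndentationError as err:
    # Inconsistent indentation is rejected before any code runs
    print(type(err).__name__, "-", err.msg)
```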

Chapter 4. Control Flow

Control flow is a fundamental concept in programming that allows a


developer to control the order in which code is executed. In Python,
control flow is achieved through the use of conditional statements (such
as if/else) and looping constructs (such as for and while loops). By
using these constructs, developers can create logical conditions for code
execution and repeat blocks of code as necessary. Understanding control


flow is essential for creating Python programs. In this chapter, we will
cover the basics of control flow, focusing on conditional statements,
loops, and other related concepts.

if, elif, and else

The if statement is a control flow construct that allows a developer to


specify that a block of code should only be executed if a certain condition
is met. The basic syntax of an if statement is as follows:

1 if condition:
2     # code to be executed if condition is truthy

In addition to the if statement, Python also provides the elif (short for
“else if”) and else statements, which can be used to specify additional
blocks of code to be executed if the initial condition is not met.

1 if condition:
2     # code to be executed if `condition` is truthy
3 elif other_condition:
4     # code to be executed if `condition` is falsy
5     # and `other_condition` is truthy
6 else:
7     # code to be executed if both conditions are falsy

Question 1:
What values would trigger the if statement to execute?

1. condition = ["", None] and other_condition = True



2. condition = "" and other_condition = True


3. condition = "" and other_condition = None

Answer 1: 1. is the correct answer, because condition is truthy, even though all of its contents are falsy.
Question 2:
What values would trigger the elif statement to execute?

1. condition = False and other_condition = 0


2. condition = "None" and other_condition = False
3. condition = {} and other_condition = [0]

Answer 2: 3. is the correct answer, because condition is falsy, and other_condition is truthy.

while

The while statement is a control flow construct that allows a developer


to execute a block of code repeatedly, as long as a certain condition is
considered truthy. The basic syntax of a while loop is as follows:

1 while condition:
2     # code to be executed while `condition` is truthy

condition is an expression that is either truthy or falsy. If the condition


is truthy, the code indented under the while statement will be executed.
At the end of the block, the program loops back to the while statement
and the condition will be re-evaluated. If the condition is then falsy, the
loop will exit and the program will continue beyond the while block.

1 >>> condition = 10
2 >>> while condition:
3 ...     condition -= 1 # condition = condition - 1
4 ...
5 >>> condition
6 0 # 0 is falsy

It’s important to note that it’s the programmer’s responsibility to ensure that the condition in the while loop will eventually become falsy, otherwise the loop will run indefinitely. This is known as an infinite loop, and it can cause your program to crash or hang.

break

The break statement is a control flow construct that can be used to


exit loops, including the while loop. When the interpreter encounters
a break statement within a loop, it immediately exits the loop and
continues execution at the next line of code following the loop.

1 >>> condition = 10
2 >>> while condition > 0:
3 ...     if condition == 5:
4 ...         break
5 ...     condition -= 1
6 ...
7 >>> condition
8 5

continue

In instances where you would like to skip an iteration of a loop and move
onto the next, you can use a continue statement. When the interpreter
encounters a continue statement within a loop, it immediately skips the


rest of the code in the current iteration of the block, and continues on to
the next iteration of the loop.

1 >>> i = 0
2 >>> skipped = {2, 4}
3 >>> while i < 5:
4 ...     i += 1
5 ...     if i in skipped:
6 ...         continue
7 ...     print(i)
8 ...
9 1
10 3
11 5

for

The for statement is a control flow construct that allows a developer


to iterate over a collection of items, such as a list or a string. The basic
syntax of a for loop is as follows:

1 for item in iterable:
2     # code to be executed for each item in the sequence of items

Here, the variable item will take on the value of each item in the
sequence in turn, allowing you to operate on each item of the collection
one at a time.

What is an Iterable?

An iterable is an object which can sequentially yield each item of a collection in a predictable order. These items can be any
data type, such as numbers, strings, or even other objects. The items in
an iterable are accessed in a specific order, usually by their index or in
the order in which they were added to a collection.
Examples of built-in iterable objects in Python include lists, tuples, and
strings. For example, a list is an ordered collection of items, where each
item can be any data type and is accessed by its index:

1 >>> numbers = [1, 2, 3, 4, 5]


2 >>> for number in numbers:
3 ...     print(number)
4 ...
5 1
6 2
7 3
8 4
9 5

In this case, numbers is an iterable, where each item is a number, and


the items can be accessed in order by their index.
A string is also a collection of items, where each item is a letter, and the
items can also be accessed by their index.

1 >>> word = "hello"


2 >>> for letter in word:
3 ...     print(letter)
4 ...
5 h
6 e
7 l
8 l
9 o

In this case, word is an iterable where each item is a character, and the
items can be accessed by their position in the string.

for/else and break

The for/else construct is a special combination of the for loop and the
else statement. It allows you to specify a block of code (the else block)
that will be executed after the for loop completes, but only if the loop
completes without a break. This means that if the loop is exited early
using a break statement, the else block will not be executed.

1 >>> numbers = [1, 2, 3, 4, 5]
2 >>> for number in numbers:
3 ...     if number == 4:
4 ...         print("Found 4, exiting loop.")
5 ...         break
6 ...     print(number)
7 ... else:
8 ...     print("Loop completed normally.")
9 ...
10 1
11 2
12 3
13 Found 4, exiting loop.

In this example, the for loop iterates over the numbers in the list. When
it encounters the number 4, it exits the loop using the break statement.
Since the loop was exited early, the else block is not executed, and the
message “Loop completed normally” is not printed.
If the loop completes normally, the else block will be executed:

1 >>> numbers = [1, 2, 3, 4, 5]
2 >>> for number in numbers:
3 ...     if number == 6:
4 ...         print("Found 6, exiting loop.")
5 ...         break
6 ...     print(number)
7 ... else:
8 ...     print("Loop completed normally.")
9 ...
10 1
11 2
12 3
13 4
14 5
15 Loop completed normally.

The for/else construct can be useful in certain situations where you


want to take different actions depending on whether the loop completed
normally or was exited early. However, it’s not a very common construct,
and in most cases, it can be replaced by a simple if statement after the
loop.
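For comparison, here is the same logic rewritten without for/else, tracking the early exit with an explicit flag:

```python
numbers = [1, 2, 3, 4, 5]
found = False
for number in numbers:
    if number == 4:
        print("Found 4, exiting loop.")
        found = True
        break
    print(number)
if not found:
    print("Loop completed normally.")
```

The for/else version avoids the extra variable, but the flag version makes the control flow explicit, which some readers find easier to follow.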

Exception Handling

Exceptions are a way of handling errors and unexpected situations in a


program. When an exception occurs, it causes the normal flow of the
program to be interrupted, and the control flow is transferred to an
exception track.
This is done using the raise keyword, followed by an exception object
or class. The exception object can be any class that inherits from the built-
in Exception type. Raising an exception interrupts the normal flow of
execution and transfers the control flow to the exception track. Exception
classes can generally be instantiated with a single str type argument
that acts as an exception message.

1 >>> x = None
2 >>> if x is None:
3 ...     raise ValueError("value 'x' should not be None")
4 ...
5 Traceback (most recent call last):
6 File "<stdin>", line 2, in <module>
7 ValueError: value 'x' should not be None

The exception track can be caught using an exception handler. Exception


handlers are blocks of code which allow you to gracefully handle an error
or unexpected situation, rather than abruptly terminating the program.
Exception handlers are defined using a try/except statement. The try
block contains the code that may raise an exception, while the except
block contains code that handles the exception. If an exception is raised in
the try block, the interpreter jumps to the except block, and searches
for an exception handler that matches the type of exception that was
raised.

1 try:
2     # code that may raise an exception
3 except Exception:
4     # code to be executed if the exception occurred

Each try block can have one or more except blocks, so as to handle
multiple different types of errors. Each except block can have multiple
exceptions to match against, by using a tuple of exceptions. The exception
in an except block can be assigned to a variable using the as keyword.
When an Exception is raised in the try block, the interpreter checks
each except block in the order which they were defined to see if a
particular exception handler matches the type of exception that was
raised. If a match is found, the code in the corresponding except block
is executed, and the exception is considered handled. If no except block
matches the exception type, the exception continues to bubble up the call
stack until it is either handled elsewhere or the program terminates.

1 >>> try:
2 ...     raise ArithmeticError
3 ... except (KeyError, ValueError):
4 ...     print("handle Key and Value errors")
5 ... except ArithmeticError as err:
6 ...     print(type(err))
7 ...
8 <class 'ArithmeticError'>

In this example, an ArithmeticError is raised in the try block. When


this error is raised, it is checked against each except statement in
the exception handler. The first exception handler is a (KeyError,
ValueError) tuple, and neither the KeyError or the ValueError match
the type of the exception raised, so this except block is passed over. The
second exception handler is an ArithmeticError, which matches the


type of the exception, as shown by the print statement, so this except
block is executed and the error is considered handled.
Most exceptions are derived from the Exception class. As such, a
general except Exception will match most raised exceptions. There
are some instances where this may not be the desired effect; it is generally
better to be precise with your exception handling, so that unexpected
errors aren’t caught accidentally. In a similar vein, it’s worth noting that
not all exceptions need to be handled, and sometimes it may be better
to let the program crash if it encounters an error that it cannot recover
from.

raise from

The raise from idiom can be used to raise a new exception, explicitly chained to an exception which was caught in an except block. This is useful in instances where we want to add more context to an exception, so as to give more meaning to a given exception that was raised.

1 >>> try:
2 ...     x = int("abc")
3 ... except ValueError as e:
4 ...     raise ValueError(f"Variable `x` received invalid input: {\
5 e}") from e
6 ...
7 Traceback (most recent call last):
8 File "<stdin>", line 2, in <module>
9 ValueError: invalid literal for int() with base 10: 'abc'
10
11 The above exception was the direct cause of the following excepti\
12 on:
13
14 Traceback (most recent call last):
15 File "<stdin>", line 4, in <module>
16 ValueError: Variable `x` received invalid input: invalid literal \
17 for int() with base 10: 'abc'

else and finally

An else statement can be added to a try/except block to handle instances where an exception was not raised. If however an exception is raised in the try block, the else statement is not executed. Code that should be executed regardless of whether or not an exception was raised can be placed in a finally block. This block executes regardless of whether the program flow is on the normal track or the exception track.

1 >>> try:
2 ...     pass
3 ... except Exception:
4 ...     print("an exception was raised")
5 ... else:
6 ...     print("no exception was raised")
7 ... finally:
8 ...     print("this is going to be executed regardless")
9 ...
10 no exception was raised
11 this is going to be executed regardless
12 >>> try:
13 ...     raise ValueError
14 ... except ArithmeticError:
15 ...     pass
16 ... finally:
17 ...     print("this is going to be executed regardless")
18 ...
19 this is going to be executed regardless
20 Traceback (most recent call last):
21 File "<stdin>", line 2, in <module>
22 ValueError

match

A match statement allows you to match an expression against a set of


patterns, and execute different code depending on which pattern the
value matches. It’s similar to a series of if/elif/else statements, but
instead of relying on the truthiness of a given value, it relies on structural
pattern matching.

1 match <expression>:
2     case <pattern>:
3         <block>

Structural pattern matching is a programming paradigm that allows you to match both the structure of, and the values in, an expression by comparing it against patterns defined in a series of case statements.
The comparison is done from top to bottom, until an exact match is
confirmed. Once a match is confirmed, the action associated with the
matching pattern is executed. If an exact match is not found, a wildcard
case, defined as case _:, if provided, will be used as the matching case
with no corresponding variable assignment. If an exact match is not
confirmed and a wildcard case does not exist, the entire match block
will be a no-op and no action will be taken.

1 >>> coordinate = (0, 1)
2 >>> match coordinate:
3 ...     case (x, 0):
4 ...         print(f"coordinate is on the y axis, x={x}")
5 ...     case (0, y):
6 ...         print(f"coordinate is on the x axis, y={y}")
7 ...     case _:
8 ...         print("coordinate is on neither axis")
9 ...
10 coordinate is on the x axis, y=1

In this example, a variable coordinate is defined as a tuple with the values (0, 1). The match statement is then used to compare the coordinate
variable against different patterns in a series of case statements.
In the first case statement, the pattern is (x, 0), which matches if the
second value of the tuple is 0, and the first value can be any value,
represented by the variable x. If this pattern is matched, the string
“coordinate is on the y axis, x=<value of x>” is printed, where <value of
x> is replaced with the value of x. This pattern however does not match
the coordinate case, so the case block is passed over.
In the second case, the pattern is (0, y), which matches if the first
value of the tuple is 0, and the second value can be any value, represented
by the variable y. If this pattern is matched, the string “coordinate is on
the x axis, y=<value of y>” is printed, where <value of y> is replaced
with the value of y. This pattern is matched to the coordinate case, so the
variable y is assigned, and the case block executes, printing the formatted
string literal.
In the third case statement, the pattern is simply _, which is the wildcard
pattern that matches any value. This case would be executed if no prior
cases were to match the expression. Since the second case was a match,
this block was passed over.
Matches can be composed of any array of objects, from simple primitive
types to collections and even user-defined types. Furthermore, collec-
tions can make use of unpacking operators to assign values to variables.

1 >>> location = {"city": "Boise", "state": "Idaho", "country": "US\
2 "}
3 >>> match location:
4 ...     case {"country": "US", **rest}:
5 ...         print((
6 ...             f"the city {rest['city']}, {rest['state']}"
7 ...             " is in the United States"
8 ...         ))
9 ...
10 the city Boise, Idaho is in the United States
11 >>> cities = ["Seattle", "Spokane", "Portland", "Boise"]
12 >>> match cities:
13 ...     case ["Seattle", "Spokane", *rest]:
14 ...         print(rest)
15 ...
16 ['Portland', 'Boise']

type checking

Match statements are capable of performing type checks against a given


expression, executing case blocks only if the match expression adheres
to a specified type. The mechanism is called a class pattern, as classes are
the structure Python uses to define types.
To create a class pattern, the type definition is called with patterns as
arguments. If no argument is specified, any expression that matches the
type is considered a match.

1 >>> item = 1
2 >>> match item:
3 ...     case int(0 | -1):
4 ...         print(f"{item} is either 0 or -1")
5 ...     case int():
6 ...         print(f"{item} is an int, not 0 or -1")
7 ...
8 1 is an int, not 0 or -1

In this example, we’re using a match statement to check both the type
and the value of the given item. The first case defines a class pattern
for matching int types if the item matches the pattern of 0 or -1. This
pattern fails to match because item = 1. The second case defines a class
pattern for matching all int types, since a pattern is not provided in the
type call. This pattern matches the item, so the case block is executed.

guards

Additional conditions for pattern matching can be included using guard


statements. A guard is an if statement contained in the case which
checks the truthiness of an expression, and if that expression is truthy,
the pattern is matched.

1 >>> coordinate = (0, 1)
2 >>> match coordinate:
3 ...     case (0, y) if y == 1:
4 ...         print((
5 ...             "coordinate is on the x axis,"
6 ...             f" and on the unit circle at y={y}"
7 ...         ))
8 ...     case (0, y):
9 ...         print(f"coordinate is on the x axis, y={y}")
10 ...
11 coordinate is on the x axis, and on the unit circle at y=1

In this example, the first case matches because both the pattern of the
coordinate matches the case, and the expression y == 1 is True.

Or Patterns

The pipe operator | can be used to combine multiple patterns in a single


case statement. This defines an “or” relationship between patterns, where
the expression can match either the first or second operand. If a value
matches either of the pipe’s operands, the case is matched and the case
block is executed.

1 >>> coordinate = (0, 1)
2 >>> match coordinate:
3 ...     case (0, _) | (_, 0):
4 ...         x, y = coordinate
5 ...         print((
6 ...             "coordinate is on an axis,"
7 ...             f" x={x}, y={y}"
8 ...         ))
9 ...
10 coordinate is on an axis, x=0, y=1

If an or pattern is used for variable binding, each pattern should define the same variable names. If different patterns define different variable names, the case will raise a SyntaxError.

1 >>> coordinate = (0, 1)
2 >>> match coordinate:
3 ...     case (0, y) | (x, 0):
4 ...         print((
5 ...             "coordinate is on an axis,"
6 ...             f" x={x}, y={y}"
7 ...         ))
8 ...
9 File "<stdin>", line 2
10 SyntaxError: alternative patterns bind different names

as

The as keyword can be used to bind to a variable the value matched by a pattern within the expression being matched.

1 >>> coordinate = (0, 1)
2 >>> match coordinate:
3 ...     case (0 as x, y) | (x, 0 as y):
4 ...         print((
5 ...             "coordinate is on an axis,"
6 ...             f" x={x}, y={y}"
7 ...         ))
8 ...
9 coordinate is on an axis, x=0, y=1

In this example, the or pattern is used in the case definition, and each
pattern in the case maps the values of the coordinate to the variables x
and y. Both variables are defined in each pattern, so the case does not
throw a syntax error. The as keyword is used to bind matching values
to variables, which can subsequently be used in the case block when the
pattern is matched.

Chapter 5. Functions

Functions are some of the most fundamental building blocks in programming. They allow you to encapsulate pieces of code and reuse them multiple times throughout your program. This is beneficial, in that it helps you avoid repeating the same code in multiple places, which can lead to bugs and make your code harder to maintain. In short, functions are a powerful tool that allow you to write better, more efficient, and more maintainable code.
In Python, a function is defined using the def keyword, followed by the
name of the function, a set of parentheses which contain the arguments
of the function, and finally a colon :. The code that makes up the
function’s body is indented under the definition line. Finally, any values
that are returned to the function caller are specified by the return
keyword. If a return is not defined, the function implicitly returns None.
For example, the following code defines a simple function called greet
that takes in a single argument, name, and returns a greeting using that
parameter:

1 >>> def greet(name):
2 ...     greeting = f"Hello, {name}!"
3 ...     return greeting
4 ...
5 >>> greet("Ricky")
6 'Hello, Ricky!'

Functions are not executed immediately when they are defined. Instead,
they are executed only when they are called. This means that when
a script or program is running, the interpreter will read the function
definition, but it will not execute the code within the function block until
the program calls the function.
Consider the following script:

1 # ./script.py
2
3 def my_function():
4     print("Function called")
5
6 print("Script start")
7 my_function()
8 print("Script end")

When this script is run, the Python interpreter first reads the function
definition. The function is defined, but the function code is not executed
until the function is called on the line my_function(). As a result, the


output will be:

1 root@b854aeada00a:~/code# python script.py


2 Script start
3 Function called
4 Script end

Once a function is defined, it can be called multiple times, and each time
it will execute the code inside the function.

Function Signatures

A function signature is the combination of the function’s name and its


input parameters. In Python, the function signature includes the name
of the function after the def keyword, followed by the names of the
function parameters contained in parentheses.

  function name
     |
     |  /--/------ function parameters
def add(a, b):

The function name is the identifier that is used to call the function. It
should be chosen to be descriptive and meaningful, so other developers
can ascertain the function’s purpose.
The parameters of a function are variables that are assigned in the scope
of the body of the function when it is called. In Python, the parameters
are defined in parentheses following the function name. Each parameter
has a name, which is used to reference values within the function. It’s
also possible to define a function with no parameters by using empty


parentheses.
These parameters can be assigned to positional arguments or keyword
arguments. A positional argument is a function argument which is
assigned to a parameter name according to the order in which the names
are defined in the function signature. A keyword argument is an argument
which is explicitly assigned to a specific name using a key=value syntax.

>>> def sub(a, b):
...     return a - b
...
>>> sub(1, 2)
-1
>>> sub(2, 1)
1
>>> sub(b=2, a=1)
-1

In this example, we define a sub() function with two arguments, a and


b. a is the first argument in the function signature, and b is the second.
When we call this function using positional arguments only, the first
argument is assigned to the name a and the second argument is assigned
to the name b. In the first function call, we pass the values 1, 2. Since 1 is
the argument passed first, it is assigned to the name a, and subsequently
2 is assigned to b. The function returns a - b, which in this case is
1 - 2, evaluating to -1.
If we reverse the order of the arguments in the second function call,
passing the values 2, 1, the 2 is now assigned to a and the 1 is assigned
to b. The function still returns the result of a - b, but in this case the
expression is 2 - 1, so the function returns 1.
In the final function call, we explicitly assign values to the argument


names using the key=value syntax, by passing the values b=2, a=1.
Keyword arguments are assigned by name rather than by position, so even
though the 2 is passed first in the function call, it is explicitly
assigned to b, while a is explicitly assigned 1. This again results in
evaluating the expression 1 - 2, so the function returns -1.
Function calls can make use of both positional and keyword arguments,
but any positional arguments must be listed first when calling the
function. Given this flexibility, it’s possible to accidentally call a function
with a keyword argument which references a name which was already
assigned a value via a positional argument. Yet multiple arguments
cannot be assigned to the same name, so doing this will cause the
interpreter to raise a TypeError.

>>> def sub_add(a, b, c):
...     return a - b + c
...
>>> sub_add(1, c=2, b=3)
0
>>> sub_add(1, 2, b=3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sub_add() got multiple values for argument 'b'

In this example, we define a sub_add() function with three arguments,


a, b, and c, which returns the value of the expression a - b + c.
In the first function call, we pass 1 as a positional argument, which results
in the value 1 being assigned to a. The next two arguments are keyword
arguments, assigning c the value 2 and b the value 3. This results in the
function returning the value of the expression 1 - 3 + 2, which
evaluates to 0.
In the second example, we pass 1, 2 as positional arguments, which


results in the value 1 being assigned to a and the value 2 being assigned
to b. However we then specify the keyword argument b=3. This results
in two different values being assigned to b, which is not allowed, so the
interpreter raises a TypeError.

Explicitly positional/keyword

Optionally, parameters can be made explicitly positional-only or
keyword-only. A / in a function signature specifies that the parameters
to its left are positional-only. Conversely, a * specifies that the
parameters to its right are keyword-only. Parameters between the two can
be passed in either manner.

>>> def greet(name, /, title, *, punctuation):
...     return f"Hello {name}, {title}{punctuation}"
...
>>> greet("Ricky", "Esq", punctuation="!")
'Hello Ricky, Esq!'
>>> greet("Ricky", "Esq", "?")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: greet() takes 2 positional arguments but 3 were given

Default Values

Function signatures can define default values for parameters by
using the assignment operator =. If a function call does not specify a
value for a given parameter, the default value is used.
>>> def greet(name, punctuation="!"):
...     return f"Hello {name}{punctuation}"
...
>>> greet("Jeff")
'Hello Jeff!'

Mutable Types as Default Values

When assigning a default value to a function parameter, it is generally
not recommended to use a mutable type, such as a list or a dictionary,
as the default
value. When a function is created with a default value, that same object
is used in every function call where the default value is not overridden.
This effectively creates shared state across all function calls, which can
lead to unexpected behavior.

>>> def my_function(value, default=[]):
...     default.append(value)
...     return default
...
>>> x = my_function(0)
>>> y = my_function(1)
>>> y
[0, 1]
>>> x
[0, 1]

Unless this shared state is desired, it’s better to use immutable state for
default values, or a unique object to serve as an identity check.
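The most common form of this pattern (a widely used idiom, though not shown in the listing below) uses None as the sentinel value:

```python
def my_function(value, default=None):
    # None is immutable, so sharing it across calls is harmless
    if default is None:
        default = []  # each call builds a fresh list
    default.append(value)
    return default

x = my_function(0)
y = my_function(1)
print(x, y)  # [0] [1]
```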
>>> def my_function(
...     value,
...     default=(UNDEFINED := object())
... ):
...     if default is UNDEFINED:
...         default = []
...     default.append(value)
...     return default
...
>>> x = my_function(0)
>>> y = my_function(1)
>>> y
[1]
>>> x
[0]

In this example, UNDEFINED is a unique object in the enclosing scope.


If no value is passed by the caller for default, then default is
UNDEFINED evaluates to True and the variable is replaced with a new
list. This guarantees that the returned list is a unique list.

>>> def create_definition(name, dtype, value, codes=()):
...     metadata = {"type": dtype}
...     for item in codes:
...         # some validation code operating per-item
...         # ...
...         if "codes" not in metadata:
...             metadata["codes"] = []
...         metadata["codes"].append(item)
...     obj = {"name": name, "value": value, "metadata": metadata}
...     return obj
...
>>> create_definition("a", "primitive", 1)
{'name': 'a',
 'value': 1,
 'metadata': {'type': 'primitive'}}
>>> create_definition("b", "collection", ['a', 'b', 'c'], [1, 2, 3])
{'name': 'b',
 'value': ['a', 'b', 'c'],
 'metadata': {'type': 'collection', 'codes': [1, 2, 3]}}

In this example, codes defaults to a tuple, which is both immutable and


iterable.

Scope

Up until this point, the code we’ve written has made no sort of distinction
between when access to a variable is considered valid. Once a variable
has been defined, later code is able to access its value and make use of it.
With the introduction of functions, this assumption of a variable being
available at definition is no longer strictly valid. This is because
variables defined within a function block cannot be accessed from
outside that block. The function block is said to have its own local
scope, also referred to as a namespace (the space where a given variable
name is assigned), and that block scope is separate from the top-level
global scope of the interpreter.

def my_function():
    x = 5
    print(x)
In this example, the variable x is defined within the local scope of the
function my_function, and it can only be accessed within the block
scope of the function. If you try to access it outside the function, the
interpreter will raise a NameError.
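Running the snippet in a script makes this concrete:

```python
def my_function():
    x = 5
    print(x)

my_function()  # prints 5; x resolves in the function's local scope

try:
    print(x)  # x is not defined in the global scope
except NameError as err:
    print(err)  # name 'x' is not defined
```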

Nested Scopes

In Python, the block scopes can be nested. A nested scope is a scope that
is defined within another scope. Any scope which is nested can read
variables from scopes which enclose the nested scope.
When the interpreter searches for a variable referenced in a nested scope,
it follows what’s referred to as the LEGB rule for searching scopes. LEGB
is an acronym for Local, Enclosing, Global, and Built-in. The interpreter
first looks in the local scope for a variable. If it is not found, the interpreter
then looks in any enclosing scopes for the variable. If it is still not found,
the interpreter then looks in the top-level global scope for the variable.
If it is still not found, it looks in the built-in scope.

>>> greeting = "Hello"
>>> def outer_function():
...     def inner_function():
...         greeting = "Hey"
...         print(greeting)
...     inner_function()
...     print(greeting)
...
>>> outer_function()
Hey
Hello

In this example, the variable greeting is assigned different values
in two different scopes: the global scope
and the inner_function scope. When the inner function is called, the
interpreter first looks for the variable greeting in the local scope of the
inner function, and finds it with the value “Hey”, which is then printed to
the console. This scope is then exited when the function returns. The next
print(greeting) call looks for the variable greeting. This variable is
not defined in the local scope, so it next checks for any enclosing scopes.
There is no enclosing scope, as outer_function is a top-level function.
So the interpreter then checks the global scope for greeting, where
greeting is defined as “Hello”, which is then printed to the console.

nonlocal and global

By default, assignments are assumed to be in local scope. In order to make


changes to a variable in either an enclosing scope or the global scope,
you need to indicate to the interpreter that you’re looking to implement
nondefault behavior. Python provides two keywords, nonlocal and
global, for this purpose. nonlocal indicates that a variable reference
is in an enclosing scope, and global indicates that a reference is within
the global scope.

>>> global_greet = "Hello"
>>> def outer_function():
...     nonlocal_greet = "Hello"
...     def inner_function():
...         nonlocal nonlocal_greet
...         global global_greet
...         nonlocal_greet = "Hi"
...         global_greet = "Hey"
...     inner_function()
...     print(nonlocal_greet, global_greet)
...
>>> outer_function()
Hi Hey

In this example, the variables global_greet and nonlocal_greet are


defined with different values within different scopes: global_greet
is defined in the global scope, and nonlocal_greet is defined in
the block scope of outer_function. From the context of the block
scope of the inner_function, the scope of the outer_function is
an enclosing scope, and the block scope of the REPL is a global scope.
When inner_function is executed, the nonlocal keyword is used to
specify that the variable nonlocal_greet is bound to the enclosing
scope, and the global keyword is used to specify that the variable
global_greet is bound to the global scope. When the inner_function
then executes its assignment operations, it changes the values to
which both global_greet and nonlocal_greet are referencing. This
substitution is made evident by the print() function, which prints the
new values of global_greet and nonlocal_greet to the console.

Closures

Closures are functions which have access to variables in an enclosing


scope, even when the function is invoked outside of that scope. This
allows the function to “remember” the values of variables from its
enclosing scope, and to continue to access them, even after the enclosing
scope has exited. Closures are useful for a variety of tasks, such as
creating callbacks, implementing decorators, and encapsulating state.
A closure is created when a nested function references a variable from
its enclosing function. For example:
>>> def outer_function(message):
...     def inner_function():
...         print(message)
...     return inner_function
...
>>> my_closure = outer_function("Hello")
>>> my_closure()
Hello

Here, the inner function inner_function references the variable
message from its enclosing function outer_function. When the outer
function is called, a reference to the inner function is returned. The
inner function maintains a reference to the variable message from its
enclosing scope. When the inner_function is invoked from the global
scope, it can still access the value of the message variable, even though
the message variable itself is not accessible from the global scope.
When performing an assignment operation in a closure, by default Python
will create a new local variable with the same name, if one does not
exist. Oftentimes, however, the desired effect in a closure is to modify a
variable from an enclosing scope. This can be done by specifying that
variable using Python’s nonlocal keyword.
>>> def make_counter():
...     count = 0
...     def counter():
...         nonlocal count
...         count += 1
...         return count
...     return counter
...
>>> my_counter = make_counter()
>>> my_counter()
1
>>> my_counter()
2

In this example, the make_counter function defines a counter closure


that increments a count variable and returns the current count. The user
can interact with the counter by calling the closure, which increments
the count value which was defined in the enclosing scope.
Closures are ultimately a mechanism for exposing functionality without
(easily) exposing state. By using closures, you can ensure that the state
of a given variable is only modified in controlled ways. This mechanism
allows you to encapsulate state and control the behavior of the closure,
while still providing a simple but limited interface through the returned
function.

Anonymous Functions

Anonymous functions are functions that are defined without a name.


They are also known as lambda functions. Anonymous functions are
defined using the lambda keyword, followed by one or more arguments,
a colon : and finally an expression that defines the function’s behavior.
For example, the following code defines an anonymous function that


takes one argument and returns its square:

>>> square = lambda x: x**2
>>> square(5)
25

Lambda functions are typically used when you need to define a function
that will be used only once, or when you need to pass a small function
as an argument to another function.
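For example, a lambda is a natural fit as the key argument to the built-in sorted() function (a small illustrative snippet, not from the text):

```python
words = ["banana", "kiwi", "apple"]
# sort by string length rather than alphabetically
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['kiwi', 'apple', 'banana']
```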

Decorators

Decorators are a way to modify the behavior of an object by wrapping


it within a function. The decorator function takes the original function
as an argument and returns a new function that will replace the original
function. Decorators can be applied to functions using the @decorator
syntax, which is placed immediately before the definition of the object
that is being decorated.

>>> def my_decorator(fn):
...     def wrapper():
...         print("entering the function...")
...         fn()
...         print("exiting the function...")
...     return wrapper
...
>>> @my_decorator
... def my_function():
...     print("inside the function...")
...
>>> my_function()
entering the function...
inside the function...
exiting the function...

Decorators are typically two or three functions deep. A three-deep
decorator, often called a decorator factory, allows you to configure the
context of the decorator in a manner not unlike a closure.

>>> def my_closure(value):
...     def my_decorator(fn):
...         def wrapper():
...             print(f"value is: {value}")
...             print("entering the function...")
...             fn()
...             print("exiting the function...")
...         return wrapper
...     return my_decorator
...
>>> @my_closure("this")
... def my_function():
...     print("inside the function.")
...
>>> my_function()
value is: this
entering the function...
inside the function.
exiting the function...

Decorators can also be stacked, allowing you to continuously layer


functionality through the decorator syntax.
>>> def decorator_1(fn):
...     def wrapper():
...         print("entering decorator 1...")
...         fn()
...         print("exiting decorator 1...")
...     return wrapper
...
>>> def decorator_2(fn):
...     def wrapper():
...         print("entering decorator 2...")
...         fn()
...         print("exiting decorator 2...")
...     return wrapper
...
>>> def decorator_3(fn):
...     def wrapper():
...         print("entering decorator 3...")
...         fn()
...         print("exiting decorator 3...")
...     return wrapper
...
>>> @decorator_1
... @decorator_2
... @decorator_3
... def my_function():
...     print("inside the function.")
...
>>> my_function()
entering decorator 1...
entering decorator 2...
entering decorator 3...
inside the function.
exiting decorator 3...
exiting decorator 2...
exiting decorator 1...

It’s important to note that decorators are syntactic sugar, meaning they
make a particular design pattern more elegant, but they do not add any
additional functionality to the language.

>>> def my_decorator(fn):
...     def wrapper():
...         fn()
...     return wrapper
...
>>> # syntactic sugar, because this...
>>> @my_decorator
... def my_function():
...     pass
...
>>> # is functionally equivalent to this...
>>> my_function = my_decorator(my_function)
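The wrappers shown above only decorate functions that take no arguments. In practice (a common convention, not covered in the text above), a wrapper forwards arbitrary arguments with *args and **kwargs, and the standard library's functools.wraps preserves the wrapped function's name and docstring:

```python
import functools

def my_decorator(fn):
    @functools.wraps(fn)  # preserve fn's __name__ and __doc__
    def wrapper(*args, **kwargs):
        # forward whatever arguments the caller supplied
        return fn(*args, **kwargs)
    return wrapper

@my_decorator
def add(a, b):
    """Add two numbers."""
    return a + b

print(add(2, 3))     # 5
print(add.__name__)  # add
```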

Chapter 6. Classes

Classes are an essential concept in object-oriented programming and are


used to define new types of objects. They allow you to abstract the state
of an object and encapsulate it within a single entity, making it easy to
manage and manipulate. By creating classes, you can define the proper-
ties and methods of an object, using them to represent higher-abstraction
entities or concepts in your code, all while hiding the implementation
details of how those properties and methods work. This allows you to
separate the interface of an object, which defines how it can be used,
from its implementation, which defines how it works.
In Python, a class is defined using the class keyword, followed by the
name of the class. The class definition typically starts with an indented
block of code, which is known as the class body.

>>> class MyClass:
...     pass
...

The pass keyword is used as a placeholder and does not do anything, but
it is needed in this case to create an empty class.
Once a class is defined, it can be instantiated by calling the type.

>>> my_object = MyClass()
>>> my_object
<__main__.MyClass object at 0x7fe83b29c040>

Inside the block scope of a class, you can define methods that are
associated with an object or class. Methods are used to define the
behavior of an object and to perform operations on the properties of the
object.
Methods are defined inside a class using the def keyword, followed by
the name of the method, a set of parentheses which contain the method’s
arguments, and finally a colon :. The first parameter of a standard
method is always a variable which references the instance of the class.
By convention, this is typically given the name self.
For example:
>>> class MyClass:
...     def my_method(self):
...         print("Hello from my_method!")
...

Once a method is defined, it can be called on an instance of the class
using dot notation.

>>> my_object = MyClass()
>>> my_object.my_method()
Hello from my_method!

Methods can also take additional parameters and return values, similar
to functions.

>>> class MyClass:
...     def add(self, a, b):
...         return a + b
...
>>> my_object = MyClass()
>>> result = my_object.add(2, 3)
>>> result
5

In fact, to a certain extent methods are functions, bound to what’s


referred to as the namespace of the class. Functionally, there is no
difference between str.join("", ('a', 'b')) and "".join(('a',
'b')).
Finally, classes can subclass other classes, which allows them to derive
functionality from a superclass. This concept is generally known as
inheritance, which will be discussed in more depth at a later time.
>>> class YourClass:
...     def subtract(self, a, b):
...         return a - b
...
>>> class MyClass(YourClass):
...     pass
...
>>> my_object = MyClass()
>>> result = my_object.subtract(2, 3)
>>> result
-1

Data Types as Classes

You might have noticed that earlier in the book when we called the
type() function on an object, the return specified that the type was
oftentimes a class.

>>> x = "Hello!"
>>> x
'Hello!'
>>> type(x)
<class 'str'>

In Python, all of the built-in data types, such as integers, strings, lists,
and dictionaries, are implemented in a manner similar to classes. And
similar to user-defined classes, all of the built-in data types in Python
have associated methods that can be used to manipulate their underlying
data structures. For example, and as we just saw, the str class defines a
method called join(). This join method takes the instance of the calling
object, a string, as self, and a collection of other strings which are joined
using self as the separator.
>>> " ".join(('a', 'b', 'c'))
'a b c'

Given this, calling the class method is equally valid; in this case we’re
simply passing the self parameter explicitly.

>>> str.join(" ", ('a', 'b', 'c'))
'a b c'

__dunder__ methods

“dunder” (short for “double underscore”) methods, also known as “magic


methods”, are special methods that have a specific meaning and are used
to define specific behaviors of objects. They are identified by their double
underscore prefix and suffix, such as __init__ or __len__.
For now we’re only going to focus on one particular method, the
__init__ method, and we’ll return to review the others in later chapters.

The __init__ method

The __init__ method is a special method that is automatically called


after an object is created. It is used to initialize the attributes of the object.
The only thing required to create an __init__ method is to name it as
__init__.
Note: __init__ is not a constructor. More on this later.
Once you have defined the __init__ method, you can create an instance
of the class, and initialize any attributes you wish to assign the instance.
This is done by passing in the values as arguments when you create the
object by calling it. Those values will be passed into the call of the
__init__ method, where they can be assigned to the instance self via an
instance attribute.

>>> class MyClass:
...     def __init__(self, my_value):
...         self.my_value = my_value
...
>>> my_object = MyClass(5)
>>> my_object.my_value
5

Attributes

In Python, attributes are a way to define variables that are associated


with an object or class. They are used to store the state of an object and
can be accessed or modified using the dot notation.
There are different ways to define attributes in Python, but the most
common approach is to use instance variables, which can be defined
inside any class method using the self keyword.
This was demonstrated in the previous example, where the __init__
method defines an attribute called my_value, which is assigned to the
instance self.
You can access, assign, or modify attribute values using the dot notation.
For example, the following code creates an instance of the MyClass class
and sets its my_value attribute to 10:
>>> my_object = MyClass(5)
>>> my_object.my_value
5
>>> my_object.my_value = 10
>>> my_object.my_value
10

Class Attributes

Class attributes are variables that are defined at the class level, rather
than the instance level. They are shared amongst all instances of a class
and can be accessed using either the class name, or an instance of the
class.

>>> class MyClass:
...     class_attribute = "I am a class attribute."
...
>>> obj1 = MyClass()
>>> obj2 = MyClass()
>>> obj1.class_attribute
'I am a class attribute.'
>>> obj2.class_attribute
'I am a class attribute.'
>>> MyClass.class_attribute
'I am a class attribute.'

It’s worth noting that the same precaution about using default function
arguments also applies to class attributes. Since class attributes are
shared across all instances of the class, mutable state should likewise
be avoided.
>>> class One:
...     items = []
...
>>> a = One()
>>> b = One()
>>> a.items.append(1)
>>> b.items
[1]

A Functional Approach

Python provides the functions hasattr(), getattr(), setattr(), and


delattr(), which can be used to work with attributes of objects.
The hasattr() function is used to check if an object has a particular
attribute or not. It takes two arguments: the object, and the name of the
attribute as a string. It returns True if the object has the attribute, and
False otherwise.
The getattr() function retrieves the value of an attribute of an object.
It takes at least two arguments: the object, and the name of the attribute
as a string. It returns the value of the attribute if it exists, and raises
an AttributeError if it does not. You can also provide an optional
third parameter to serve as a default value, which will be returned if
the attribute does not exist.
The setattr() function sets the value of an attribute of an object. It
takes three arguments: the object, the name of the attribute as a string,
and the value to which to set the attribute. If the attribute does not exist,
it will be created.
Finally, the delattr() function deletes an attribute of an object. It takes
two arguments: the object, and the name of the attribute as a string. If
the attribute exists, it will be deleted; if it does not exist the function call
will raise an AttributeError.

>>> class MyClass:
...     def __init__(self, my_value):
...         self.my_value = my_value
...
>>> my_object = MyClass(1)
>>> hasattr(my_object, "my_value")
True
>>> getattr(my_object, "my_value")
1
>>> setattr(my_object, "my_value", 5)
>>> getattr(my_object, "my_value")
5
>>> delattr(my_object, "my_value")
>>> hasattr(my_object, "my_value")
False
>>> getattr(my_object, "my_value", -1)
-1
>>> getattr(my_object, "my_value")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyClass' object has no attribute 'my_value'

@staticmethod

@staticmethod is a decorator that is used to define static methods. A


static method is effectively a function which exists in the namespace of
the class. It does not have access to the class or instance state, meaning
it cannot modify or access any attributes or methods of the class or its
instances.
>>> class MyClass:
...     @staticmethod
...     def static_method():
...         return "I am a static method."
...
>>> MyClass.static_method()
'I am a static method.'
>>> obj = MyClass()
>>> obj.static_method()
'I am a static method.'

@classmethod

@classmethod is a decorator that is used to define class methods. A class


method is a method that is bound to the class and not the instance of the
object. It can be called on the class itself, as well as on any instance of
the class. A class method takes the class as its first argument, typically
aliased as cls.

>>> class MyClass:
...     def __init__(self, value=None):
...         self.value = value
...
...     @classmethod
...     def class_method(cls, value):
...         return cls(value)
...
>>> MyClass.class_method("Hello World!")
<__main__.MyClass object at 0x7fe83b29c040>
>>> obj = MyClass()
>>> obj.class_method("Hello World!")
<__main__.MyClass object at 0x7fe83b39c6d0>
>>> class YourClass(MyClass): pass
...
>>> YourClass.class_method("Hello World!")
<__main__.YourClass object at 0x7fe83bdec3d0>

A common use case for class methods is to define alternative constructors


for a class. For example, you might want to create an object from a
string or from a file. To do this you can define class methods that handle
these cases and return instances of the class. The benefit of using a
@classmethod over a @staticmethod in this scenario is that cls is
bound to the class that calls it, and this includes subclasses.
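As a sketch of this pattern (a hypothetical Point class, not from the text), a from_string class method parses a string and returns an instance of whichever class invokes it, including subclasses:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_string(cls, text):
        # parse an "x,y" string into integer coordinates
        x, y = text.split(",")
        return cls(int(x), int(y))

class LabeledPoint(Point):
    pass

p = Point.from_string("3,4")
q = LabeledPoint.from_string("1,2")
print(type(p).__name__, p.x, p.y)  # Point 3 4
print(type(q).__name__)            # LabeledPoint
```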
Part II. A Deeper Dive
Chapter 7. Expressions,
Comprehensions, and Generators

In Python an expression is a piece of code that evaluates to a value.


They can be used in a variety of contexts, such as in assignment
statements, function calls, and control flow statements. Expressions can
take many forms, including arithmetic expressions, logical expressions,
and function calls. And in certain cases, expressions can be used to create
new objects, some of which we’ll discuss here.

Generator Expressions

A generator expression is a special type of expression that returns an


object, called an iterator, that generates values on-the-fly, one at a time,
rather than eagerly creating a data structure in memory. The iterator
can be used in a for loop or other iteration constructs to iterate over the
values in the expression.
The syntax for a generator expression is similar to that of a normal
expression, but with a single for clause. The expression is enclosed
within parentheses () and followed by one or more for clauses and/or
if/else clauses. The for and if/else clauses are used to filter or map
the data from the iterable and construct the generator.
(item for item in iterable if condition)
(item if condition else other for item in iterable)

Generator expressions are useful when working with large data sets
or when the values in the expression are the result of some expensive
computation. Because they generate values on-the-fly, they can be more
memory-efficient than creating a list in memory.
They can also be used to build powerful and efficient iterator pipelines.
Using iterators in this fashion allows you to perform multiple operations
on data without necessitating that each operation hold intermediary
values in memory.

>>> numbers = range(10)
>>> # generator expression, doesn't
>>> # create intermediary state
>>> even_numbers = (
...     n for n in numbers if n % 2 == 0
... )
>>> # another generator expression, doesn't
>>> # create intermediary state
>>> squared_numbers = (n**2 for n in even_numbers)
>>> # state only materializes in the final list
>>> list(squared_numbers)
[0, 4, 16, 36, 64]

Generator Functions

Generator functions are similar to normal functions, but instead of


returning a value, they return a generator object, similar to generator
expressions, which can be used to iterate over a sequence of values.
A generator function is defined like a normal function, using the def


keyword, but instead of using return to return a value, it uses the yield
keyword to yield a value per-iteration. The generator function yields
multiple values, one at a time, during its execution.

>>> def my_generator():
...     yield 1
...     yield 2
...     yield 3
...

When a generator function is called, it returns a generator object, but


does not start executing the function. The function is only executed when
the generator object is iterated over, which can be done manually using
the next() function, or in the context of iteration, like a for loop. Each
time the generator function yields a value, it is returned to the caller and
the function’s execution is paused. The next time the generator is iterated
over, it resumes executing where it left off, with a saved state.

>>> gen = my_generator()
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
3
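Once every yield has run, the generator is exhausted: a further call to next() raises StopIteration, which is the signal a for loop uses to know when to stop. A minimal sketch:

```python
def my_generator():
    yield 1
    yield 2

gen = my_generator()
# consume both yielded values
values = [next(gen), next(gen)]

# a further next() on an exhausted generator raises StopIteration
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
```

This is why iterating over a generator with a for loop simply ends cleanly rather than raising an error: the loop catches StopIteration internally.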

yield from

The yield from statement can be used to pass the context of an iteration
from one generator to the next. When a generator executes yield from,
the control of an iteration passes to the called generator or iterator.

When that generator is exhausted, control is yielded back to the calling
generator for further iteration.

>>> def _generator():
...     yield 2
...
>>> def generator():
...     yield 1
...     yield from _generator()
...     yield 3
...
>>> for i in generator():
...     print(i)
...
1
2
3
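yield from delegates to any iterable, not only to other generators; a list or tuple works just as well. A small sketch:

```python
def flatten_two(first, second):
    # yield from delegates iteration to any iterable,
    # not just another generator
    yield from first
    yield from second

combined = list(flatten_two([1, 2], (3, 4)))
```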

List Comprehensions

List comprehensions are a built-in way to create a list by applying a single
expression to each item in an existing iterable. The syntax of a list
comprehension is similar to the generator expression, but it is enclosed
within square brackets []. The expression is followed by one or more
for clauses and/or if/else clauses, like a generator expression. The for
clause is used to iterate over the items in the iterable, and the if clause
is used to filter the items.

[item for item in iterable if condition]
[item if condition else other for item in iterable]

List comprehensions are generally faster than the equivalent Python
for loop, because the code that generates the list is fully implemented in
C.

>>> [x**2 for x in range(4)]
[0, 1, 4, 9]
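The second syntax form above, with an if/else conditional expression before the for clause, maps every item to one of two values rather than filtering items out. A minimal sketch:

```python
numbers = range(5)
# the conditional expression produces a value for every item;
# nothing is filtered out
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]
```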

Dictionary Comprehensions

Dictionary comprehensions provide a means for creating dictionaries
from expressions. The syntax for a dictionary comprehension is similar
to that of a list comprehension, but with curly braces {} instead of square
brackets [].
The dictionary comprehension consists of an expression, and one or more
for clauses and/or if/else clauses, which are used to construct the
key-value pairs of the dictionary. Oftentimes the iterable is an iterator
which yields two-element tuples which can be unpacked into key and value
variables.

{key: value for (key, value) in iterable if condition}

This, however, is not strictly necessary; for example, the following
dictionary comprehension creates a dictionary that maps some letters of
the alphabet to their corresponding ASCII values:

>>> {chr(i): i for i in range(97, 100)}
{'a': 97, 'b': 98, 'c': 99}
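A sketch of the tuple-unpacking form: iterating over a dictionary's .items() yields (key, value) tuples, so a comprehension can unpack them directly, here to invert a mapping:

```python
ascii_codes = {'a': 97, 'b': 98, 'c': 99}
# .items() yields (key, value) tuples, which are unpacked
# into the k and v variables of the comprehension
inverted = {v: k for (k, v) in ascii_codes.items()}
```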

Expressions and the Walrus Operator

The walrus operator can be used in a list comprehension to assign a value
to a variable inside the comprehension (for example, in the if clause) and
then use that variable in the expression. This can be useful when you want
to use the value of a variable multiple times in the comprehension, or when
the value of the variable is the result of an expensive computation.

>>> # collects the square of n only if that square is even
>>> [sq for n in range(1, 11) if (sq := n**2) % 2 == 0]
[4, 16, 36, 64, 100]

Chapter 8. Python’s Built-in Functions

Python provides a wide range of built-in functions that we can directly
leverage. These functions are accessible from the built-in namespace, and
we can use them without the need to import any additional libraries.
These functions are built into the Python interpreter and provide a
wide range of functionality, from basic operations such as mathematical
calculations and string manipulation, to more advanced functionality
such as file I/O and error handling. In this chapter, we will explore some
of the most commonly used built-in functions in Python and learn how
to use them.
The list of built-in functions includes:

• abs(x) - Returns the absolute value of a number x
• aiter(i) - Returns an asynchronous iterator from an asynchronous
iterable i
• all(i) - Returns True if all elements in an iterable i are true

• any(i) - Returns True if at least one element in an iterable i is true
• anext(ai) - Retrieves the next item from an asynchronous iterator
ai
• ascii(o) - Returns a string containing a printable representation
of an object o
• bin(x) - Converts an integer x to a binary string
• bool(x) - Converts a value x into a Boolean
• breakpoint() - Drops the runtime into a Python debugging session
• bytearray(*args, **kwargs) - Returns a bytearray object
• bytes(*args, **kwargs) - Returns a bytes object
• callable(o) - Returns True if the object o is callable, False other-
wise
• chr(c) - Returns a string representing the character whose Unicode
code point is the integer c
• classmethod(fn) - Returns a class method for a function fn
• compile(s, *args, **kwargs) - Returns a code object from
source s (string, file, etc.)
• complex(r, i) - Returns a complex number where r is real and i
is imaginary
• delattr(o, n) - Deletes an attribute of name n from an object o
• dict() - Returns a new dictionary
• dir(o) - Returns a list of names in the namespace of object o
• divmod(a, b) - Takes two numbers a and b and returns a pair of
numbers (a tuple) consisting of their quotient and remainder
• enumerate(o[, s]) - Returns an enumerate object, which can be
used to iterate over an iterable o and get the index of each element.
Optional start value s starts the enumeration at that value
• eval(s, *args) - Evaluates a string s as a Python expression
• exec(s, *args, **kwargs) - Used for the dynamic execution of
source s, which is a valid Python program

• filter(fn, i) - Returns an iterator from elements of an iterable
i for which a function fn returns True. Uses lazy evaluation.
• float(x) - Converts a value x to a float
• format(o) - Returns a formatted string version of an object o
• frozenset() - Returns an immutable frozenset object
• getattr(o, n[, d]) - Returns the value of a named attribute n
from object o. Optional default d is returned if the attribute does not
exist.
• globals() - Returns the current global symbol table as a dictionary
• hasattr(o, n) - Returns True if the object o has the given named
attribute n, False otherwise
• hash(o) - Returns the hash value of an object o
• help(o) - Invokes the built-in help system for object o
• hex(i) - Converts an integer i to a hexadecimal string
• id(o) - Returns the identity of an object o
• input(p) - Reads a line from input, after printing the optional
prompt p
• int(x) - Converts a value x to an integer
• isinstance(o, t) - Returns True if the object o is an instance of
the specified type t, False otherwise. If t is a tuple or union, checks
against all types in t.
• issubclass(c, s) - Returns True if a class c is a subclass of a
specified superclass s, False otherwise
• iter(o) - Returns an iterator of object o
• len(o) - Returns the length of an object o
• list() - Returns a new list
• locals() - Returns an updated dictionary of the current namespace
• map(fn, i) - Returns an iterator which applies a function fn to all
items in an iterable i. Uses lazy evaluation.
• max(*args[, key=fn]) - Returns the largest item of two or more
arguments; optional key is a function fn to apply to each item

• memoryview(o) - Returns a memory view object of the given object
o
• min(*args[, key=fn]) - Returns the smallest item of two or more
arguments; optional key is a function fn to apply to each item
• next(i) - Retrieves the next item from an iterator i
• object() - Returns a new featureless object
• oct(i) - Converts an integer i to an octal string
• open(p, m, *args, **kwargs) - Opens a file and returns a file
object
• ord(c) - Given a string representing one Unicode character c,
returns an integer representing the Unicode code point of that
character
• pow(x, y) - Returns the value of x to the power of y (x**y)
• print(*args, **kwargs) - Prints the specified messages args to
the screen
• property(fg[, fs, fd, d]) - Gets, sets, or deletes a property of
an object
• range(*args, **kwargs) - Returns a sequence of numbers
• repr(o) - Returns a string containing a printable representation of
an object o
• reversed(s) - Returns a reversed iterator of a sequence s
• round(n, d) - Rounds a number n to the nearest integer, or to the
specified number of decimals d
• set() - Returns a new set object
• setattr(o, n, v) - Sets the value v of a named attribute n of an
object o
• slice(*args, **kwargs) - Returns a slice object
• sorted(i[, key=fn, reverse=False]) - Returns a sorted list
from the specified iterable i; optional key is a function fn to apply
to each item, and the order is reversed if the keyword reverse is truthy
• staticmethod(fn) - Returns a static method for a function fn

• str() - Returns a string object
• sum(i) - Sums the items of an iterable i
• super() - Returns a temporary object of the superclass
• tuple() - Returns a new tuple object
• type(o) - Returns the type of an object o
• vars(o) - Returns the __dict__ attribute of an object o
• zip(*args[, strict=False]) - Returns an iterator of tuples,
where the first item in each passed iterator is paired together, and
then the second item in each passed iterator is paired together, etc.

We’ll discuss some of the more common built-in synchronous functions
here, and discuss the async functions more in depth later.

Type Conversions

Type conversions are the process of converting one data type to another
data type. Python provides the built-in functions int(), float(),
complex(), str(), bytes(), list(), dict(), set(), and frozenset()
to use for type conversions.
The int() function is used to convert a number or a string into an
integer. It takes a single object and returns a new object of type int if
a conversion is possible, else it throws a ValueError. An optional base
keyword can be provided if the string is in a format other than base 10.

>>> int('123')
123
>>> int('a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'a'
>>> int('101', base=2)
5

The float() function is used to convert a number or string to a
floating-point number. If conversion fails, it throws a ValueError.

>>> float('12.34')
12.34
>>> float('a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 'a'

The complex() function is used to create a complex value of the form
(x+yj).

>>> complex(1, 2)
(1+2j)

The str() function is used to create a string representation of a
given object. An optional encoding argument can be provided if the
object is a bytes-like object that requires decoding into text.

>>> str(123)
'123'
>>> str(b'\xdf', encoding='latin1')
'ß'

The bytes() function is used to create an immutable bytes object from
either a string, an iterable of integers between 0-255, an integer, or
another bytes object. If a string is provided, a second encoding argument
must specify the string encoding for type conversion. If the argument is
a single integer, the function returns a null-value bytes object of length
equal to the integer.

>>> bytes('abc', encoding='utf8')
b'abc'
>>> bytes(2)
b'\x00\x00'
>>> bytes((1, 2, 3))
b'\x01\x02\x03'

The list() function is used to convert iterables into a list. If no
argument is provided, it creates a new, empty list.

>>> list('abcde')
['a', 'b', 'c', 'd', 'e']
>>> list()
[]

The dict() function is used to convert an iterable of paired values
into a dictionary. It can also create new dictionaries from key=value
arguments. If no arguments are provided, it returns a new, empty
dictionary.

>>> dict([('a', 1), ('b', 2), ('c', 3)])
{'a': 1, 'b': 2, 'c': 3}
>>> dict(a=1, b=2, c=3)
{'a': 1, 'b': 2, 'c': 3}

The set() function is used to convert an iterable of values into a set. If
no iterable is provided, it returns a new, empty set.

>>> set([1,2,3,1])
{1, 2, 3}
>>> set()
set()

The frozenset() function is used to convert an iterable of values into
a set that is also immutable. If no iterable is provided, it returns a new,
empty frozenset.

>>> frozenset([1,2,3,1])
frozenset({1, 2, 3})
>>> frozenset()
frozenset()

Mathematical Functions

Python provides several built-in mathematical functions that can be
used to perform various mathematical operations, such as abs(), min(),
max(), sum(), and pow().
The abs() function returns the absolute value of a number.

>>> abs(-5)
5
>>> abs(5)
5

The min() function returns the smallest item in an iterable or the
smallest of two or more arguments. An optional key keyword argument
can be provided, which is a function to be applied to each item before
comparison. If the argument is an iterable, a default keyword argument
can also be provided, in case the iterable is empty.

>>> min([1, 2, 3, 4, 5])
1
>>> min(1, 2, 3, 4, 5)
1
>>> min('abc', key=lambda x: ord(x))
'a'
>>> min((), default=0)
0

The max() function returns the largest item in an iterable or the largest
of two or more arguments. The same optional keyword values from min
apply.

>>> max([1, 2, 3, 4, 5])
5
>>> max(1, 2, 3, 4, 5)
5
>>> max('abc', key=lambda x: ord(x))
'c'
>>> max((), default=0)
0

The sum() function returns the sum of all items in an iterable. It also
accepts an optional second argument which is used as the starting value.

>>> sum([1, 2, 3, 4, 5])
15
>>> sum([1, 2, 3, 4, 5], 10)
25

The pow() function returns the value of x to the power of y (x**y).

>>> pow(2, 3)
8
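pow() also accepts an optional third modulus argument: pow(x, y, z) computes (x ** y) % z, using modular exponentiation rather than building the full power first. A quick sketch:

```python
# pow(x, y, z) is equivalent to (x ** y) % z, but the
# modulus is applied during exponentiation, which is far
# cheaper for large exponents
result = pow(2, 10, 1000)  # 1024 % 1000
```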

all() and any()

The all() and any() functions are used to check if all or any of the
elements in an iterable are truthy. The all() function returns True if
all elements in an iterable are truthy and False otherwise. The any()
function returns True if any of the elements in an iterable are truthy,
and False otherwise.

>>> all([True, True, True])
True
>>> all([True, True, False])
False
>>> all([0, 1, 2])
False
>>> all([])
True

>>> any([True, True, True])
True
>>> any([True, True, False])
True
>>> any([0, 1, 2])
True
>>> any([])
False

all() and any() can also be used with an expression to check if all or
any elements in a sequence meet a certain condition.

>>> my_list = [1, 2, 3, 4]
>>> all(i > 0 for i in my_list)
True
>>> any(i < 0 for i in my_list)
False

dir()

The dir() function is used to find out the attributes and methods of an
object. When called without an argument, dir() returns a list of names in
the current local scope or global scope. When called with an argument,
it returns a list of attribute and method names in the namespace of the
object.

>>> dir()
['__annotations__', '__builtins__', '__doc__',
'__loader__', '__name__', '__package__', '__spec__']
>>> dir(list)
['__add__', '__class__', '__contains__', '__delattr__',
'__delitem__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__',
'__getitem__', '__gt__', '__hash__', '__iadd__',
'__imul__', '__init__', '__init_subclass__',
'__iter__', '__le__', '__len__', '__lt__', '__mul__',
'__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__reversed__', '__rmul__', '__setattr__',
'__setitem__', '__sizeof__', '__str__',
'__subclasshook__', 'append', 'clear', 'copy', 'count',
'extend', 'index', 'insert', 'pop', 'remove',
'reverse', 'sort']

enumerate()

The enumerate() function is used to iterate over an iterable. It returns
an enumerate object, which can be used to access the index and value of
each element in the iterable. It takes an iterable as an argument, and an
optional start value which specifies from which value to start counting.

>>> fruits = ['apple', 'banana', 'orange']
>>> for i, fruit in enumerate(fruits, start=1):
...     print(i, fruit)
...
1 apple
2 banana
3 orange

eval() and exec()

The eval() and exec() functions are built-in Python functions that are
used to evaluate and execute code, respectively.
The eval() function takes at least one argument, which is a string
containing a valid Python expression, and evaluates it, returning the
result of the expression. It also takes optional globals and locals
arguments which act as global and local namespaces. If these values
aren’t provided, the global and local namespaces of the current scope
are used.

>>> x = 1
>>> y = 2
>>> eval("x + y")
3
>>> eval("z + a", {}, {"z": 2, "a": 3})
5

Similarly, the exec() function takes a single argument, which is a string
containing valid Python code, and executes it. It also takes optional
globals and locals arguments.

>>> x = 1
>>> y = 2
>>> exec("result = x + y")
>>> result
3

It’s important to note that eval() and exec() can execute any code
that could be written in a Python script. If the strings passed to these
functions are not properly sanitized, a hacker could use them to execute
arbitrary code with the permissions of the user running the script.
Both eval() and exec() have the potential to introduce security
vulnerabilities in your code, so it’s important to be extremely careful
when using them, and to avoid them if possible.

map()

The map() function applies a given function to all items of an input
iterable. It returns a map object as a lazy iterable, meaning that the values
of the mapping are only produced when the map object is consumed.

>>> numbers = [1, 2, 3, 4, 5]
>>> # the squares are not calculated
>>> # when the map() is created
>>> squared_numbers = map(lambda x: x**2, numbers)
>>> # the squares are only calculated once
>>> # the squared_numbers iterable is consumed,
>>> # in this case during the construction of a list.
>>> list(squared_numbers)
[1, 4, 9, 16, 25]
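map() also accepts more than one iterable; the function then receives one argument from each iterable per step, and the mapping stops when the shortest iterable is exhausted. A small sketch:

```python
a = [1, 2, 3]
b = [10, 20, 30, 40]
# the lambda receives one item from each iterable per step;
# iteration stops with the shortest iterable
sums = list(map(lambda x, y: x + y, a, b))
```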

filter()

The filter() function takes a function f and an iterable, and returns a
filter object. This filter object is a lazy iterable, only yielding values as it
is consumed. The values it yields are all items i of the iterable where the
application of the function f(i) returns a truthy value.

>>> numbers = [1, 2, 3, 4, 5]
>>> even_numbers = filter(lambda x: x%2==0, numbers)
>>> list(even_numbers)
[2, 4]
>>> numbers = (-1, 0, 1)
>>> # 0 is falsy, so filter will drop it
>>> # and keep both -1 and 1
>>> list(filter(lambda x: x, numbers))
[-1, 1]
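Passing None as the function is a built-in shorthand for the identity filter: only truthy items are kept, with no lambda needed. A quick sketch:

```python
values = (-1, 0, 1, '', 'a', [], [0])
# filter(None, iterable) keeps only the truthy items
truthy = list(filter(None, values))
```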

input() and print()

The input() and print() functions are used to read input from the
user and print output to the console, respectively.
The input() function reads a line of text from the standard input
(usually the keyboard) and returns it as a string. A prompt can be passed
as an optional argument that will be displayed to the end user.

>>> name = input("What is your name? ")
What is your name? TJ
>>> name
'TJ'

Conversely, the print() function writes a string to the standard output
(usually the console). The print() function takes one or more objects
to be printed, separated by commas, with an optional separator between
objects, which defaults to a space. The function also takes an optional
end parameter, which is appended after the last object and defaults
to a newline character; an optional file argument, which is the file
to write to, by default sys.stdout; and a flush argument, which defaults
to False and specifies whether to flush the buffer when printing
to the file.

print(
    *objects,
    sep=' ',
    end='\n',
    file=sys.stdout,
    flush=False
)
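A quick sketch of the sep, end, and file parameters together; here output is captured in an io.StringIO buffer (passed as the file argument) so the result can be inspected:

```python
import io

buffer = io.StringIO()
# sep joins the objects; end replaces the trailing newline
print(1, 2, 3, sep='-', end='!\n', file=buffer)
print('done', file=buffer)
output = buffer.getvalue()
```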

open()

The open() function takes the path of a file (or an integer file descriptor)
as an argument and opens it in a specified mode. Different modes can
be selected by passing a second mode argument, or the mode keyword.
Modes include r for read mode (default), w for write mode, a for append
mode, x for exclusive creation mode, b for binary mode, t for text mode
(which is the default), and + for both reading and writing. Modes which
don’t conflict can be used in concert.

>>> file = open("./file.txt", mode="wb")
>>> file.write(b"Hello!\n")
7
>>> file.close()

The open() function takes a number of other optional arguments.

• buffering - the buffering policy for the file. The default is to use
the system default buffering policy (-1). A value of 1 configures the
file object to buffer per-line in text mode. A value greater than 1
configures a buffer size, in bytes, for the file object to use. Finally, a
value of 0 switches buffering off (though this option is only available
with binary mode).
• encoding - the encoding to be used for the file. The default is None,
which sets the file object to use the default encoding for the platform.
Can be any string which python recognizes as a valid codec.
• errors - the error handling policy for the file. The default is None,
which means that errors will be handled in the default way for the
platform.
• newline - configures the file object to anticipate a specific newline
character. The default is None, which means that universal newlines
mode is disabled.
• closefd - whether the file descriptor should be closed when the file
is closed. The default is True.
• opener - a custom opener for opening the file. The default is None,
which means that the built-in opener will be used.
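In practice, open() is usually paired with a with statement, which closes the file automatically even if an error occurs mid-write. A minimal sketch, using a temporary directory so it leaves no files behind:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    # the with statement closes the file automatically
    # when the block is exited, even on error
    with open(path, mode="w", encoding="utf8") as file:
        file.write("Hello!\n")
    with open(path, mode="r", encoding="utf8") as file:
        contents = file.read()
```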

range()

The range() function is used to generate an iterable for yielding a
sequence of numbers. The function takes up to three arguments: if only
one value is provided, it yields a sequence from [0, n), where n is the
stop value passed to range. If two arguments are provided, the range
is [a, b), where a is the start value and b is the stop value. If a third
argument is provided, it acts as a step value, incrementing the count
per iteration by the value of that third step argument.

>>> tuple(range(2))
(0, 1)
>>> list(range(1, 3))
[1, 2]
>>> # range() only accepts positional arguments
>>> for i in range(2, 8, 2):
...     print(i)
...
2
4
6

sorted()

The sorted() function is used to produce a sorted list of elements. It
takes an argument for the sequence of elements to be sorted; an optional
key argument, a function used to extract a comparison key from each
element; and an optional reverse argument, a boolean indicating whether
the elements should be sorted in descending order, which defaults to
False. It returns a new list, instead of mutating the original.

>>> numbers = ["3", "2", "1", "4", "5"]
>>> sorted_numbers = sorted(
...     numbers,
...     key=lambda x: int(x),
...     reverse=True
... )
>>> print(sorted_numbers)
['5', '4', '3', '2', '1']
>>> numbers
['3', '2', '1', '4', '5']

reversed()

The reversed() function is used to create an iterator which iterates
over a collection of items in reverse order.

>>> numbers = [1, 2, 3, 4, 5]
>>> reversed_numbers = list(reversed(numbers))
>>> print(reversed_numbers)
[5, 4, 3, 2, 1]

zip()

The zip() function creates an iterator which consumes iterables. The
first value yielded by the iterator is a tuple containing the first item
yielded by each iterable provided. The second item yielded is a tuple
containing the second item yielded by each iterable. This process of
yielding tuples continues until one of the iterables provided to the
iterator is exhausted. An optional strict keyword can be provided to
zip() which will cause the iterator to raise a ValueError should one
iterable be exhausted before the others.

>>> a, b = (1,2,3), (4,5,6)
>>> for items in zip(a, b):
...     print(items)
...
(1, 4)
(2, 5)
(3, 6)
>>> list(zip([1, 2], [3, 4, 5]))
[(1, 3), (2, 4)]
>>> list(zip([1, 2], [3, 4, 5], strict=True))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: zip() argument 2 is longer than argument 1
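Because zip() pairs up the nth items of its arguments, combining it with * argument unpacking gives a common idiom for transposing rows and columns. A small sketch:

```python
rows = [(1, 2, 3), (4, 5, 6)]
# *rows unpacks each row as a separate argument to zip(),
# so items at the same column index are paired together
columns = list(zip(*rows))
```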

Chapter 9. The Python Data Model

The underlying infrastructure that defines how Python objects interact
with one another is referred to as the Python Data Model. It is the set of
rules and conventions that govern how Python objects can be created,
manipulated, and used.
The functionality of the data model is implemented through the use of
special methods, also known as “dunder methods” or “magic methods”,
which can be used to hook into specific behaviors provided to objects by
the Python interpreter. These methods have a set syntax, where they both
start and end with double underscores, such as __init__ and __str__.
They are called automatically by the Python interpreter whenever specific
conditions are met, such as when an object is created or when it is
used in a specific context.
By leveraging the Python data model, we can create rich interactions
between our Python objects. For example, we can define custom behavior
for mathematical operations such as addition and subtraction, or we
can define how our objects should be represented as a string. This
functionality allows us to create objects which behave in a natural and
intuitive way.

Object Creation Using __new__ and __init__

__new__ and __init__ are two special methods in Python that are used
in the process of creating and initializing new objects.
__new__ is a method that is called when a new object is to be constructed.
It is responsible for creating a new instance of the class. It takes the class
as its first argument, and any additional arguments passed to the class
constructor are passed into the __new__ method. The __new__ method is
then responsible for creating an instance of the class, and returning that
object to the interpreter.
Once an instance of an object has been created by __new__, it is then
passed to the __init__ method, along with the provided initializing
values as positional and keyword arguments. __init__ is then responsible
for initializing the state of the new object, and for performing any other
necessary setup.
In most cases, the __new__ method does not need to be implemented
by the developer. Most of the time it is sufficient to use the base
implementation, which creates a new instance of the class, and then
calls __init__ on the instance, passing the instance as self and further
passing in all the provided arguments.

Singletons

To demonstrate the usage of __new__ and __init__, let’s consider the
singleton pattern. A singleton is a design pattern that ensures that a class
can only have one instance, while providing a global access point to that
instance. This pattern can be implemented in Python by using the
__new__ method.
The basic idea is to override the __new__ method in the singleton class
so that it only creates a new instance if one does not already exist.
If an instance already exists, the __new__ method simply returns that
instance, instead of creating a new one.

>>> class Singleton:
...     _instance = None
...
...     def __new__(cls, *args, **kwargs):
...         if cls._instance is None:
...             cls._instance = object.__new__(cls)
...         return cls._instance
...
...     def __init__(self, my_value):
...         self.my_value = my_value
...
>>> my_object = Singleton(1)
>>> my_object.my_value
1
>>> other_object = Singleton(2)
>>> other_object.my_value
2
>>> my_object.my_value
2

The function signature of __new__ uses a common *args, **kwargs
motif, which leverages tuple and dictionary packing to package the
function arguments and keyword arguments into a tuple/dictionary pair
without explicitly naming which arguments and keyword arguments a
function or method expects.
In this example, the __new__ method first checks if an instance of the
class already exists on the class attribute cls._instance. If it does not
exist, the method creates a new instance using the __new__ method on
object as its constructor, and it assigns the new instance to the
_instance class variable. If an instance already exists on cls._instance,
the __new__ method simply returns the existing instance. This ensures
that the class can only ever have one instance.

Rich Comparisons

In Python, rich comparison methods are special methods that allow you
to define custom behavior for comparison operators, such as <, >, ==, !=,
<=, and >=.
The rich comparison methods are:

• def __lt__(self, other): - Implements the < operator
• def __le__(self, other): - Implements the <= operator
• def __eq__(self, other): - Implements the == operator
• def __ne__(self, other): - Implements the != operator
• def __gt__(self, other): - Implements the > operator
• def __ge__(self, other): - Implements the >= operator

Let’s take a look at an example implementation:



class Money:
    def __init__(self, amount: int, currency: str):
        self.amount = amount
        self.currency = currency

    def __eq__(self, other):
        return (
            self.amount == other.amount
            and self.currency == other.currency
        )

    def __lt__(self, other):
        if self.currency != other.currency:
            raise ValueError(
                "Can't compare money of differing currencies."
            )
        return self.amount < other.amount

    def __le__(self, other):
        if self.currency != other.currency:
            raise ValueError(
                "Can't compare money of differing currencies."
            )
        return self.amount <= other.amount

In this example, the Money class has two attributes: amount, which is
the monetary value, and currency, which is the currency type. The
__eq__ method compares the amount and currency of two Money objects,
and returns True if they are the same, and False otherwise. The __lt__
and __le__ methods compare the amount value between two Money
objects with the same currency, and raise an error if they have different
currencies.

You are typically only required to define half of the rich comparison
methods of any given object, as Python can fall back on the reflected
operation (for example, evaluating a > b using b.__lt__(a)) when a
requested operation is not defined.
With these methods defined, a Money object can be used in comparison
operations such as <, >, ==, !=, <=, and >= in a natural and intuitive
way. For example, Money(10, "USD") < Money(20, "USD") will return
True, and Money(10, "USD") == Money(10, "EUR") will return False.
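The standard library automates this pattern: the functools.total_ordering class decorator fills in the remaining ordering methods given __eq__ and any one of __lt__, __le__, __gt__, or __ge__. A minimal sketch (a condensed variant of the Money class, not the exact class above):

```python
from functools import total_ordering

@total_ordering
class Money:
    def __init__(self, amount: int, currency: str):
        self.amount = amount
        self.currency = currency

    def __eq__(self, other):
        return (self.amount, self.currency) == (other.amount, other.currency)

    def __lt__(self, other):
        if self.currency != other.currency:
            raise ValueError("Can't compare money of differing currencies.")
        return self.amount < other.amount

# __le__, __gt__, and __ge__ are derived automatically
# from __eq__ and __lt__ by the decorator
ordered = Money(10, "USD") <= Money(20, "USD")
```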

Operator Overloading

Operator overloading refers to the ability to define custom behavior for
operators such as +, -, *, /, etc. when they are used with objects of a
certain class. This is achieved by the specific special methods shown below.

• def __add__(self, other): - Implements the + operator.
• def __sub__(self, other): - Implements the - operator.
• def __mul__(self, other): - Implements the * operator.
• def __truediv__(self, other): - Implements the / operator.
• def __floordiv__(self, other): - Implements the // operator.
• def __mod__(self, other): - Implements the % operator.
• def __pow__(self, other): - Implements the ** operator.
• def __and__(self, other): - Implements the & operator.
• def __or__(self, other): - Implements the | operator.
• def __xor__(self, other): - Implements the ^ operator.
• def __lshift__(self, other): - Implements the << operator.
• def __rshift__(self, other): - Implements the >> operator.

By using operator overloading, we can create classes that have natural
and intuitive behavior when used as operands.

1 >>> class Money:


2 ... def __init__(self, amount: int, currency: str):
3 ... self.amount = amount
4 ... self.currency = currency
5 ...
6 ... def __add__(self, other):
7 ... if self.currency != other.currency:
8 ... raise ValueError(
9 ... "Can't add money of differing currencies"
10 ... )
11 ... return Money(
12 ... self.amount + other.amount,
13 ... self.currency
14 ... )
15 ...
16 ... def __sub__(self, other):
17 ... if self.currency != other.currency:
18 ... raise ValueError(
19 ... "Can't subtract money of differing currencies"
20 ... )
21 ... return Money(
22 ... self.amount - other.amount,
23 ... self.currency
24 ... )
25 ...
26 ... def __mul__(self, other):
27 ... if not isinstance(other, int):
28 ... raise ValueError(
29 ... "Can't multiply by non-int value"
30 ... )
31 ... return Money(self.amount * other, self.currency)
32 ...
33 ... def __truediv__(self, other):
34 ... if not isinstance(other, int):
35 ... raise ValueError(
36 ... "Can't divide by non-int value"
37 ... )
38 ... # divmod() is a builtin which does
39 ... # // and % at the same time
40 ... quotient, remainder = divmod(self.amount, other)
41 ... return (
42 ... Money(quotient, self.currency),
43 ... Money(remainder, self.currency)
44 ... )
45 ...

In this example, the Money class has two attributes: amount, which is the
monetary value, and currency, which is the currency type.
The __add__ method overloads the + operator, allowing you to add two
Money objects. It also checks that the operands are of the same currency;
otherwise it raises a ValueError.
The __sub__ method overloads the - operator, allowing you to subtract two
Money objects. It performs the same currency check, raising a ValueError
on a mismatch.
The __mul__ method overloads the * operator, allowing you to multiply a
Money object by an int. If the value is not an int, it raises a ValueError.
The __truediv__ method overloads the / operator, allowing you to divide a
Money object by an int. It returns the quotient and the remainder as
new Money objects in the same currency.

1 >>> my_money = Money(100, "USD")
2 >>> my_money += Money(50, "USD")
3 >>> my_money.amount
4 150
5 >>> q, r = my_money / 4
6 >>> q.amount
7 37
8 >>> r.amount
9 2

String Representations

Humans communicate using text. As such, utilizing the special methods
which render Python objects in string format, such as __str__ and
__repr__, makes it much easier for humans to understand the value of an
object and its purpose. These methods allow us to define a user-friendly
and unambiguous string representation of the object, making it more
intuitive to interact with.

1 >>> class Money:
2 ... def __init__(self, amount: int, currency: str):
3 ... self.amount = amount
4 ... self.currency = currency
5 ...
6 ... def __repr__(self):
7 ... return f"Money({self.currency} {str(self)})"
8 ...
9 ... def __str__(self):
10 ... return f"${round(self.amount/100, 2)}"
11 ...
12 ...
13 >>> Money(1200, "USD")
14 Money(USD $12.0)
15 >>> str(Money(1200, "USD"))
16 '$12.0'

Emulating Containers

Container types such as lists, tuples, and dictionaries have built-in be-
havior for certain operations, such as the ability to iterate over elements,
check if an element is in their collection, and retrieve the length of the
container. We can emulate this behavior using special methods.
The [] operator can be overloaded using the following methods:

• def __getitem__(self, key): - This method is called when the
[] operator is used to retrieve an item from the container. The key
parameter represents the index or key of the item being retrieved.
This method should return the item at the specified key or raise an
IndexError if the key is not found.
• def __setitem__(self, key, value): - This method is called
when the [] operator is used to set an item in the container. The key
parameter represents the index or key of the item being set, and the
value parameter represents the new value of the item. This method
should set the item at the specified key to the specified value.
• def __delitem__(self, key): - This method is called when the
del statement is used to delete an item from the container. The key
parameter represents the index or key of the item being deleted. This
method should remove the item at the specified key.

In addition, the in operator can be overloaded using the following:

• def __contains__(self, item): - This method allows a class
to define its own way of checking if an item is contained in the
container. This method should return a Boolean indicating whether
the item is contained in the container.

Finally, the len() function can be overloaded using the following:

• def __len__(self): - This method is called when the function
len() is called on a container. It should return an integer represent-
ing the length of the object.

By using these methods, we can create classes that have similar behavior
to built-in container types, making our code more efficient and Pythonic.

1 >>> class MutableString:
2 ... def __init__(self, text: str):
3 ... self._text = list(text)
4 ...
5 ... def __getitem__(self, idx):
6 ... return self._text[idx]
7 ...
8 ... def __setitem__(self, idx, value):
9 ... self._text[idx] = value
10 ...
11 ... def __delitem__(self, idx):
12 ... del self._text[idx]
13 ...
14 ... def __str__(self):
15 ... return "".join(self._text)
16 ...
17 ... def __len__(self):
18 ... return len(self._text)
19 ...
20 >>> my_str = MutableString("fizzbuzz")
21 >>> my_str[0] = "F"
22 >>> str(my_str)
23 'Fizzbuzz'
24 >>> len(my_str)
25 8

In this example, the MutableString class has a single attribute
_text, which is a list of characters representing the string. With the
__getitem__, __setitem__ and __delitem__ methods defined, a
MutableString object can be used to get, set, and delete characters in
the string, allowing for string manipulation in a similar way to a list
of characters. The __str__ method then allows a user to compile the
MutableString object into a Python string.

Emulating Functions

A class can emulate a function by defining a __call__ method. The
__call__ method is itself called when an instance of the class is called
using parentheses (i.e., obj()).

1 >>> class MyFunction:
2 ... def __call__(self, x, y):
3 ... return x + y
4 ...
5 >>> my_function = MyFunction()
6 >>> my_function(1, 2)
7 3

Using Slots

__slots__ is a special class attribute that you can define to reserve a
fixed amount of memory for each instance of the class. It is used to
optimize memory usage for classes that have a large number of instances,
and that do not need to add new attributes dynamically.
When you define __slots__ in a class, it creates a fixed-size array for
each instance to store the attributes defined in __slots__. This array is
much smaller than the dictionary that is used by default to store instance
attributes. The result is that __slots__ objects save a considerable
amount of memory, particularly if you have many instances of the class.
Slots also have the benefit of faster attribute access.

1 >>> class SlotsClass:
2 ... __slots__ = ['x', 'y']
3 ... def __init__(self, x, y):
4 ... self.x = x
5 ... self.y = y
6 ...
7 >>> obj = SlotsClass(1, 2)
8 >>> print(obj.x, obj.y)
9 1 2
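The tradeoff is visible when you try to add an attribute the class did not declare: since instances of a class with __slots__ have no per-instance __dict__, assigning any attribute not listed in __slots__ raises an AttributeError. A minimal sketch, reusing the SlotsClass definition from above:

```python
class SlotsClass:
    __slots__ = ['x', 'y']  # only these attribute names are allocated

    def __init__(self, x, y):
        self.x = x
        self.y = y

obj = SlotsClass(1, 2)
obj.x = 10  # attributes named in __slots__ work as usual

# there is no per-instance __dict__ to hold anything else
try:
    obj.z = 3
except AttributeError as exc:
    print("AttributeError:", exc)
```

If a subclass genuinely needs dynamic attributes, it can include '__dict__' in its own __slots__, or simply omit __slots__ entirely.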

Customizing Attribute Access

In Python, we can customize attribute interactions using special methods
such as __getattribute__, __getattr__, __setattr__, and
__delattr__. These methods allow us to control how an object's attributes
are accessed, set, and deleted.
There are two special methods for accessing the attributes of a given
object. The __getattribute__ method is the first attribute accessor
to be called when an attribute is accessed using the dot notation (e.g.,
obj.attribute) or when using the getattr() built-in function. It
takes the attribute name as its parameter and should return the value of
the attribute. The __getattr__ method is called only when an attribute
is not found by __getattribute__, i.e. when __getattribute__
raises an AttributeError. It takes the attribute name as its parameter
and should return the value of the attribute.
The benefit to this dual implementation is that you can get other
known attributes on the instance inside __getattr__ using dot notation,
without running the risk of recursion errors, while still hooking into the
accessor protocol before an object attribute is returned.

1 >>> class ADTRecursion:
2 ... def __init__(self, **kwargs):
3 ... self._data = kwargs
4 ...
5 ... def __getattribute__(self, key):
6 ... return self._data.get(key)
7 ...
8 >>> class AbstractDataType:
9 ... def __init__(self, **kwargs):
10 ... self._data = kwargs
11 ...
12 ... def __getattr__(self, key):
13 ... return self._data.get(key)
14 ...
15 >>> this = ADTRecursion(my_value=1)
16 >>> this.my_value
17 File "<stdin>", line 6, in __getattribute__
18 File "<stdin>", line 6, in __getattribute__
19 File "<stdin>", line 6, in __getattribute__
20 [Previous line repeated 996 more times]
21 RecursionError: maximum recursion depth exceeded
22 >>> that = AbstractDataType(my_value=1)
23 >>> that.my_value
24 1

If it is necessary to override the default __getattribute__ method on
a class, you can use the __getattribute__ method of the object class
in order to avoid the issue of infinite recursion.

1 >>> class ObjectGetAttr:
2 ... def __init__(self, **kwargs):
3 ... self._data = kwargs
4 ...
5 ... def __getattribute__(self, key):
6 ... _data = object.__getattribute__(self, "_data")
7 ... return _data.get(key)
8 ...
9 >>> this = ObjectGetAttr(my_value=1)
10 >>> this.my_value
11 1

The __setattr__ method is called when an attribute is set using the dot
notation (e.g., obj.attribute = value) or when using the setattr()
built-in function. It takes the attribute name and value as its parameters
and should set the attribute to the specified value.
The __delattr__ method is called when an attribute is deleted using the
del statement (e.g., del obj.attribute) or when using the delattr()
built-in function. It takes the attribute name as its parameter and should
delete the attribute.
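As a brief sketch of these two hooks (the Validated class below is a hypothetical example, not from the text): __setattr__ can validate every assignment, and __delattr__ can report deletions. Note that both must delegate to the object base class to do the actual work, for the same recursion reasons discussed above for __getattribute__:

```python
class Validated:
    def __setattr__(self, name, value):
        if value is None:
            raise ValueError(f"{name} must not be None")
        # delegate to object.__setattr__; assigning via self here
        # would call __setattr__ again and recurse forever
        object.__setattr__(self, name, value)

    def __delattr__(self, name):
        print(f"deleting {name}")
        object.__delattr__(self, name)

v = Validated()
v.amount = 10   # passes validation and is stored
del v.amount    # prints "deleting amount"
```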

Iterators

The iterator protocol is a set of methods that allow Python objects
to define their own iteration behavior. The protocol consists of two
methods: __iter__ and __next__.
The __iter__ method is used to create an iterator object. It is called
when the iter() built-in function is used on an object or when a for
loop is used to iterate over an object. It should return an iterator object
that defines a __next__ method.
The __next__ method is used to retrieve the next item from the iterator.
It is called automatically by the for loop or when the next() built-in
function is used on the iterator. It should return the next item or raise a
StopIteration exception when there are no more items.

1 >>> class CountByTwos:
2 ... def __init__(self, start, stop):
3 ... self.internal_value = start
4 ... self.stop = stop
5 ...
6 ... def __iter__(self):
7 ... return self
8 ...
9 ... def __next__(self):
10 ... snapshot = self.internal_value
11 ... if snapshot >= self.stop:
12 ... raise StopIteration
13 ... self.internal_value += 2
14 ... return snapshot
15 ...
16 >>> _iter = CountByTwos(start=5, stop=13)
17 >>> next(_iter)
18 5
19 >>> next(_iter)
20 7
21 >>> for i in _iter: # takes the partially consumed iterator and exhausts it.
23 ... print(i)
24 9
25 11

In this example, the CountByTwos class has an __iter__ method that
returns self, and a __next__ method that lazily generates the next
number in the sequence. Calling the iter() function on our _iter value
simply returns self, as the instance already conforms to the iterator
protocol. When next() is called, either manually or in the context of a
for loop, the iterator yields the next number in the sequence. The class
stops iterating when the internal value reaches the stop value, as at this
point the StopIteration exception is raised.

Lazy Evaluation

Since an iterator only generates the next value when it is requested,
it avoids the need to generate and store all the values at once. This is
particularly useful when working with large datasets that do not fit in
memory, as it allows the program to process the data one piece at a time
without having to load the entire dataset into memory. This sort of lazy
evaluation also allows the developer to avoid unnecessary computation.
For example, if the developer is looking for a specific value in an iterator,
the program can stop generating new values as soon as the value is found,
rather than generating all the remaining values.
A particular example where this might be useful is in the context of
infinite sequences. For example, let's consider a modified version of
CountByTwos which has no internal stopping mechanism.

1 >>> class CountByTwos:
2 ... def __init__(self, start):
3 ... self.internal_value = start
4 ...
5 ... def __iter__(self):
6 ... return self
7 ...
8 ... def __next__(self):
9 ... snapshot = self.internal_value
10 ... self.internal_value += 2
11 ... return snapshot
12 ...

This iterator produces an infinite sequence of values, starting at the
start value. If this were greedily consumed, the program would hang.
But since iterators evaluate lazily, we can use this structure in contexts
where iteration can be stopped, by either a break or a return.

1 >>> for i in CountByTwos(5):
2 ... if i >= 9:
3 ... break
4 ... print(i)
5 ...
6 5
7 7
8 >>> i
9 9
10 >>> _iter = CountByTwos(0)
11 >>> while True:
12 ... val = next(_iter)
13 ... if val >= 6:
14 ... break
15 ... print(val)
16 ...
17 0
18 2
19 4

Context Managers

Context Managers provide a clean and convenient method for managing
resources that need to be acquired and released. The with statement
ensures that resources are acquired before the with block is executed,
and released after the block of code is exited, even if an exception is
raised.

To hook into the context manager protocol, an object should define the
special methods __enter__ and __exit__. The __enter__ method
is called when the context is entered, and can be used to acquire the
resources needed by the block of code. It can return an object that will
be used as the context variable in the as clause of the with statement.
The __exit__ method is called when the context is exited, and it can
be used to release the resources acquired by the __enter__ method.
It takes three arguments: an exception type, an exception value, and
a traceback object. The __exit__ method can use these arguments to
perform cleanup actions, suppress exceptions, or log errors. If you don’t
plan on using the exception values in the object, they can be ignored.

1 >>> class ContextManager(object):
2 ... def __enter__(self):
3 ... print('entering!')
4 ... return self
5 ...
6 ... def __exit__(self, *args, **kwargs):
7 ... print('exiting!')
8 ...
9 ... def print(self):
10 ... print('in context')
11 ...
12 >>> with ContextManager() as cm:
13 ... cm.print()
14 ...
15 entering!
16 in context
17 exiting!
18
19 >>> with ContextManager():
20 ... raise ValueError
21 entering!
22 exiting!
23 Traceback (most recent call last):
24 File "<stdin>", line 7, in <module>
25 raise ValueError
26 ValueError

In this example, when the with statement is executed, an instance of
ContextManager() is created, and the __enter__ method is called,
printing "entering!". The __enter__ method returns the instance self
which is assigned to the variable cm. The block of code inside the with
statement is then executed, where the print() method of the class is
called and it prints “in context”. Finally, the __exit__ method is called,
printing “exiting!”.
Next, the __enter__ method of the ContextManager() class is called,
printing the “entering!” message to indicate that the context has been
entered. Then, the block of code inside the with statement is executed,
where a ValueError exception is raised. Since an exception is raised
within the with block, the interpreter leaves the block, and the
__exit__ method is called, passing the ValueError exception, its value and its
traceback as arguments. The __exit__ method simply prints “exiting!”
and the ValueError exception propagates up to the next level of the call
stack, where it can be handled by an enclosing exception handler.
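The exception arguments also allow __exit__ to suppress an exception: if __exit__ returns a truthy value, the interpreter discards the in-flight exception rather than propagating it. A minimal sketch (a hypothetical Suppress class, similar in spirit to the standard library's contextlib.suppress):

```python
class Suppress:
    def __init__(self, *exc_types):
        self.exc_types = exc_types

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # a truthy return value tells the interpreter to
        # swallow the exception raised inside the block
        return exc_type is not None and issubclass(exc_type, self.exc_types)

with Suppress(ValueError):
    raise ValueError("never propagates")
print("still running")  # reached, because the error was suppressed
```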
A single with statement can execute multiple context managers in
concert:

1 >>> class ContextManager(object):
2 ... def __init__(self, val):
3 ... self.val = val
4 ...
5 ... def __enter__(self):
6 ... print(f'entering {self.val}!')
7 ... return self
8 ...
9 ... def __exit__(self, *args, **kwargs):
10 ... print(f'exiting! {self.val}')
11 ...
12 ... def print(self):
13 ... print(f'in context of {self.val}')
14
15 >>> with (
16 ... ContextManager(1) as cm1,
17 ... ContextManager(2) as cm2,
18 ... ):
19 ... cm1.print()
20 ... cm2.print()
21 ...
22 entering 1!
23 entering 2!
24 in context of 1
25 in context of 2
26 exiting! 2
27 exiting! 1

Descriptors

Descriptors are objects that define one or more of the special methods
__get__, __set__, and __delete__. These methods are used to customize
the behavior of attribute access, such as getting, setting and deleting
attributes respectively.
A descriptor can be a class or an instance of a class that defines one
or more of these special methods. A class or an instance that defines a
descriptor is called a descriptor class or descriptor object. Descriptors are
then assigned as class attributes, and act on objects per-instance.
The __get__(self, instance, owner) method is called when an
attribute is accessed using the dot notation (e.g., instance.attribute)
or using the built-in getattr() function. It takes two arguments: the
instance that the descriptor is an attribute of, and the owner class on
which the descriptor is defined. The __get__ method should return the value of the
attribute.
The __set__(self, instance, value) method is called when an
attribute is set using the dot notation (e.g., instance.attribute =
value) or using the built-in setattr() function. It takes two arguments:
the instance that the descriptor is an attribute of and the value to set. The
__set__ method should set the attribute to the specified value.
The __delete__(self, instance) method is called when an attribute
is deleted using the del statement (e.g., del instance.attribute)
or using the built-in delattr() function. It takes one argument: the
instance that the descriptor is an attribute of. The __delete__ method
should delete the attribute.
Descriptors can be used to define attributes that have custom behavior,
such as computed properties, read-only properties, or properties that
enforce constraints. For example, we can write a descriptor which
requires attributes to be non-falsy.

1 >>> class MyDescriptor:
2 ... def __get__(self, instance, owner):
3 ... return getattr(instance, "_my_attr", None)
4 ...
5 ... def __set__(self, instance, value):
6 ... if not value:
7 ... raise AttributeError(
8 ... "attribute must not be falsy"
9 ... )
10 ... setattr(instance, "_my_attr", value)
11 ...
12 ... def __delete__(self, instance):
13 ... delattr(instance, "_my_attr")
14 ...
15 >>> class MyClass:
16 ... desc = MyDescriptor()
17
18 >>> my_object = MyClass()
19 >>> my_object.desc
20 >>> my_object.desc = 1
21 >>> my_object.desc
22 1
23 >>> my_object.desc = 0
24 Traceback (most recent call last):
25 File "<stdin>", line 14, in <module>
26 AttributeError: attribute must not be falsy
27 >>> del my_object.desc

Python provides a builtin descriptor called property. A property
descriptor is defined using the property() built-in constructor and it can
take several arguments, including fget, fset, and fdel, as functions.

1 >>> class MyClass:
2 ... my_value = property(
3 ... fget=lambda self: getattr(self, "_value", None),
4 ... fset=lambda self, val: setattr(self, "_value", val),
5 ... fdel=lambda self: delattr(self, "_value")
6 ... )
7 ...
8 >>> my_object = MyClass()
9 >>> my_object.my_value = 5
10 >>> my_object.my_value
11 5

While valid, this isn't the typical use case of the property descriptor. You
will most commonly see the property descriptor invoked as a decorator
(we'll talk more about decorators later).

1 >>> class MyClass:
2 ... @property
3 ... def my_value(self):
4 ... return getattr(self, "_my_value", None)
5 ...
6 ... @my_value.setter
7 ... def my_value(self, val):
8 ... self._my_value = val
9 ...
10 ... @my_value.deleter
11 ... def my_value(self):
12 ... del self._my_value
13 ...
14 >>> my_object = MyClass()
15 >>> my_object.my_value = 5
16 >>> my_object.my_value
17 5

Chapter 10. Concepts in Object-Oriented Programming

Object-oriented programming (OOP) is a programming paradigm that
organizes code into objects. Through features such as polymorphism,
encapsulation, and inheritance, we can create objects which have con-
sistent behaviors, hide away implementation details, and can be easily
reused and extended. The ultimate goal of object-oriented programming
is the abstraction of state, and Python provides developers with the tools
necessary to do so.

Inheritance

Inheritance is a feature of object-oriented programming that allows a
new class to inherit the properties and methods of an existing class. The
new class is called the derived class or sub class, and the existing class is
called the base class or super class.
In Python, a class can inherit from another class using parentheses.

1 class SuperClass:
2 def __init__(self, name):
3 self.name = name
4
5 def print_name(self):
6 print(self.name)
7
8
9 class SubClass(SuperClass):
10 pass

Here, the SubClass inherits from the SuperClass, and automatically
has access to all the properties and methods defined in the SuperClass.
A derived class can override or extend the methods of the super class by
redefining them in the derived class.

1 >>> class SuperClass:
2 ... def __init__(self, name):
3 ... self.name = name
4 ...
5 ... def print_name(self):
6 ... print(self.name)
7 ...
8 ... class SubClass(SuperClass):
9 ... def print_name(self):
10 ... print(self.name.upper())
11 ...

Here, the SubClass overrides the print_name method of the
SuperClass, to print the name in uppercase.

Calling the Super Class using super()

The super() function is a built-in function in Python that allows a
derived class to call methods from its super class. It is often used when a
derived class wants to extend or override the functionality of a method
defined in its super class.
For example, let’s say we have a class SuperClass with a method
print_name and a class SubClass that inherits from SuperClass and
wants to extend the functionality of the print_name method.

1 >>> class SuperClass:
2 ... def __init__(self, name):
3 ... self.name = name
4 ...
5 ... def print_name(self):
6 ... print(self.name)
7 ...
8 >>> class SubClass(SuperClass):
9 ... def print_name(self):
10 ... # call the print_name() fn
11 ... # of the parent class
12 ... super().print_name()
13 ... print("child class")
14 ...

Here, the SubClass overrides the print_name method of the
SuperClass, but it also wants to call the print_name method of the parent
class to keep its original functionality. The super().print_name() call
in the SubClass will call the print_name method of the SuperClass.

Multiple Inheritance and Method Resolution Order

In Python, a class can inherit from multiple classes by listing them in the
class definition, separated by commas. For example:

1 >>> class SubClass(SuperClass1, SuperClass2):
2 ... pass

Here, the SubClass inherits from both SuperClass1 and SuperClass2.
When a class inherits from multiple classes, it can potentially have
multiple versions of the same method or property. This is known as the
diamond problem, and can lead to ambiguity about which version of the
method or property to use.
To resolve this ambiguity, Python uses a method resolution order (MRO)
to determine the order in which the classes are searched for a method
or property. The MRO used in Python is the C3 linearization algorithm,
which creates a linearization of the class hierarchy that is guaranteed to
be consistent and predictable.
C3 guarantees the following:

• subclasses take precedence over superclasses
• order of preference for multiple superclasses is left to right
• a class only appears once in MRO

1 >>> class A:
2 ... def method(self):
3 ... print("A")
4 ...
5 ... class B(A):
6 ... pass
7 ...
8 ... class C(A):
9 ... def method(self):
10 ... print("C")
11 ...
12 ... class D(B, C):
13 ... pass
14 ...

Here, class D inherits from both B and C which both inherit from A. If we
create an instance of D and call the method method, C is printed, because
C is a direct superclass, whereas A is a superclass once removed and B
does not implement the method.

1 >>> D().method()
2 C
3 >>> D.__mro__
4 (__main__.D, __main__.B, __main__.C, __main__.A, object)

Encapsulation

Encapsulation is a feature of object-oriented programming that allows
for the hiding of implementation details within an object. It is the
practice of keeping the internal state of an object private, and providing
a public interface for interacting with the object - separating the interac-
tion with the implementation.
In Python, encapsulation is achieved through the use of “private” prop-
erties and methods, which are denoted by a single or double underscore
prefix. It should be noted that this is only a convention, and data is never
truly private in Python.
For example, a method or attribute with a single underscore prefix like
_attribute is considered a private method or attribute, and should only
be accessed by the class and its subclasses internally. An attribute or
method with double underscore prefix like __attribute is also considered
a private method or attribute, and should be accessed only within the
class.
For attributes prefixed with double underscores, Python will "mangle"
the attribute name so it is harder to intuit from outside the class. This
causes the attribute name to be prefixed with an underscore and the class
name, so __mangled on MyClass becomes _MyClass__mangled.

1 >>> class MyClass:
2 ... def __init__(self, value):
3 ... self.__mangled = value
4 ... self._private = value
5 ...
6 >>> my_object = MyClass('string')
7 >>> dir(my_object)
8 ['_MyClass__mangled',
9 '__class__',
10 ...
11 '_private']

It should be noted that, unlike many other languages, Python doesn't
truly keep methods and attributes private. The underscore convention
is merely a convention. Any user of the class can access these methods
with impunity.
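To illustrate, the mangled attribute from the MyClass example above remains reachable once you know the transformed name:

```python
class MyClass:
    def __init__(self, value):
        self.__mangled = value   # stored as _MyClass__mangled
        self._private = value

my_object = MyClass('string')

print(my_object._private)           # single underscore: accessible as-is
print(my_object._MyClass__mangled)  # double underscore: mangled, but reachable
```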

Polymorphism

Polymorphism is a feature of object-oriented programming that allows
objects of different classes to be treated as objects of a common
superclass. This means that objects of different classes can be used
interchangeably, as long as they implement the same methods or properties. This
feature allows for the creation of flexible, reusable and extensible code.
Python is a dynamically-typed language, which allows for a more
flexible approach to polymorphism, known as “duck typing”, which is
based on the idea that “if it quacks like a duck, it’s a duck”.
In other words, the type of an object is irrelevant to the execution of your
program; so long as you can operate on an object through an expected
interface, your code will execute.

1 >>> class Mouse:
2 ... def speak(self):
3 ... return "Squeak"
4 ...
5 ... class Cat:
6 ... def speak(self):
7 ... return "Meow"
8 ...
9 ... class Dog:
10 ... def speak(self):
11 ... return "Woof"
12 ...
13 >>> for animal in (Mouse(), Cat(), Dog()):
14 ... print(animal.speak())
15 Squeak
16 Meow
17 Woof

Chapter 11. Metaclasses

A metaclass in Python is a class that defines the behavior of other
classes; that is, a class that is used to create other classes. All classes
are themselves objects, and those objects are instances of type, the base
metaclass. The type object is itself a class, which can be subclassed to
create custom metaclasses.
When a class is defined, Python automatically creates an instance of the
type class, and assigns it as the metaclass of the new class. This instance
is used to create the class object, and to define its behavior. By default,
the behavior of a class is defined by the methods of the type class, but a
custom metaclass can be used to change this behavior.

The syntax for creating a custom metaclass is to define a new class that
inherits from the type class. From there, we can overload the default
methods of the type class to create custom hooks that run when a class
is defined. This is typically done using the __new__ method to define
the behavior of class creation. The __init__ method is also commonly
overridden to define the behavior of the class initialization.
For example, the following code defines a custom metaclass that adds a
greeting attribute to the class definition, and prints to the console when
this is done:

1 >>> class MyMetaclass(type):
2 ... def __new__(cls, name, bases, attrs):
3 ... attrs["greeting"] = "Hello, World!"
4 ... print(f"creating the class: {name}")
5 ... return type.__new__(cls, name, bases, attrs)
6 ...
7 >>> class MyClass(metaclass=MyMetaclass):
8 ... pass
9 ...
10 creating the class: MyClass
11 >>> MyClass.greeting
12 'Hello, World!'

As we can see, the print() function is executed at the moment that
MyClass is defined. We can also see that the class has a class attribute
greeting which is the string "Hello, World!".
Metaclasses allow you to hook into user code from library code by
providing a way to customize the behavior of class creation. By defining
a custom metaclass and setting it as the metaclass of a user-defined class,
you can change the way that class is created and initialized, as well as
add new attributes and methods to the class.

Let's consider a library that provides a custom metaclass for creating
singletons. The metaclass could override the __call__ method to ensure
that only one instance of the class is ever created, and return that instance
whenever the class is called.

1 >>> class SingletonMetaclass(type):
2 ... def __init__(cls, name, bases, attrs):
3 ... super().__init__(name, bases, attrs)
4 ... cls._instance = None
5 ...
6 ... def __call__(cls, *args, **kwargs):
7 ... if cls._instance is None:
8 ... cls._instance = super().__call__(*args, **kwargs)
9 ... return cls._instance
10 ...
11 >>> class BaseSingleton(metaclass=SingletonMetaclass):
12 ... pass
13 ...
14 >>> class MySingleton(BaseSingleton):
15 ... pass
16 ...
17 >>> id(MySingleton())
18 140486001221456
19 >>> id(MySingleton())
20 140486001221456

Metaclasses can also be used to enforce library constraints or
expectations on user code. Consider another example where a library
expects users to define certain methods on derived classes. You can use
a metaclass to catch such errors at class definition, instead of later
during runtime.

1 >>> class LibraryMetaclass(type):
2 ... def __new__(cls, name, bases, attrs):
3 ... if "my_method" not in attrs:
4 ... raise AttributeError(
5 ... "derived class must define method my_method()"
6 ... )
7 ... return type.__new__(cls, name, bases, attrs)
8 ...
9 >>> class BaseClass(metaclass=LibraryMetaclass):
10 ... my_method = None
11 ...
12 >>> class DerivedClass(BaseClass):
13 ... def my_method(self):
14 ... pass
15 ...
16 >>> class BadClass(BaseClass):
17 ... pass
18 Traceback (most recent call last):
19 File "<stdin>", line 1, in <module>
20 File "<stdin>", line 4, in __new__
21 AttributeError: derived class must define method my_method()

Chapter 12. The Data Types, Revisited.

In this chapter, we're going to circle back and look more in depth at some
of Python's built-in data types. As previously discussed, the built-in data
types are all classes which implement a series of attributes and methods.
In order to inspect these attributes and methods, we can use the built-
in function dir() to see a list of all the names of attributes and
methods that the object has. This can be a useful tool for exploring
and understanding the functionality of a particular object or module in
code. The help() function can also be used in order to show the official
documentation for any method not covered here.

1 >>> help(str.count)
2 Help on method_descriptor:
3
4 count(...)
5 S.count(sub[, start[, end]]) -> int
6
7 Return the number of non-overlapping occurrences
8 of substring sub in string S[start:end]. Optional
9 arguments start and end are interpreted as in slice
10 notation.

Numbers

The three major numeric types in Python are int, float, and complex.
While each data type is uniquely distinct, they share some attributes
to make interoperability easier. For example, the .conjugate() method
returns the complex conjugate of a complex number; for integers and
floats, this is just the number itself. Furthermore, the .real and .imag
attributes hold the real and imaginary parts of a given number. For floats
and ints, the .real part is the number itself, and the .imag part is always
zero.
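This shared interface can be sketched briefly; nothing here is assumed beyond the built-in numeric types:

```python
# ints and floats expose the same interface as complex numbers
x = 7
print(x.real, x.imag)   # 7 0 -- for an int, .real is the value itself
print(x.conjugate())    # 7 -- conjugating a real number returns it unchanged

y = 2.5
print(y.real, y.imag)   # 2.5 0.0

z = 3 - 4j
print(z.real, z.imag)   # 3.0 -4.0
print(z.conjugate())    # (3+4j)
```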

Integers

The int type represents integers, or whole numbers. Integers can be
positive, negative, or zero and have no decimal point. Python supports
arbitrarily large integers, so there is no limit on the size of an integer
value, unlike some other programming languages. CPython only caches
singleton objects for the integers -5 through 256, so when comparing
integer values, be sure to use == instead of is.
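A brief sketch of why this matters. Note that the identity results shown are CPython implementation details, not language guarantees, and int() is used to build the values at runtime so the compiler cannot fold equal constants into one object:

```python
# Within the cached range, equal ints happen to be the same object
a = int("256")
b = int("256")
print(a is b)   # True on CPython: both names reference the cached 256

# Outside the cached range, equal values are distinct objects
c = int("257")
d = int("257")
print(c == d)   # True: the values are equal
print(c is d)   # False on CPython: two separate int objects
```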

Bits and Bytes

The Python int has several methods for convenience when it comes to
bit and byte representations. The int.bit_length() method returns
the number of bits necessary to represent an integer in binary, excluding
the sign and leading zeros. int.bit_count() (added in Python 3.10)
returns the number of bits set to 1 in the binary representation of an
integer. int.from_bytes() and int.to_bytes() convert integer
values to and from a bytestring representation.

1 >>> int(5).bit_count()
2 2
3 >>> int(5).bit_length()
4 3
5 >>> int(5).to_bytes()
6 b'\x05'
7 >>> int.from_bytes(b'\x05')
8 5
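Note that calling to_bytes() and from_bytes() with no arguments relies on defaults (length=1, byteorder='big') that only became optional in Python 3.11. On earlier versions, pass both explicitly, as in this sketch:

```python
n = 1024
print(n.bit_length())                  # 11 bits needed for 1024 (2**10)
print(n.to_bytes(2, byteorder="big"))  # b'\x04\x00'

# Round-trip back to an integer
print(int.from_bytes(b"\x04\x00", byteorder="big"))  # 1024
```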

Floats

The Python floating point value, represented by the float data type, is
used to represent decimal numbers, such as 3.14 or 2.718. There are also
a few special float values, such as float('inf'), float('-inf'), and
float('nan'), which are representations of infinity, negative infinity,
and “Not a Number”, respectively.
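These special values follow IEEE 754 semantics; in particular, NaN compares unequal to everything, including itself, which is why math.isnan() exists. A quick sketch:

```python
import math

inf = float('inf')
nan = float('nan')

print(inf > 1e308)      # True: infinity exceeds any finite float
print(-inf < inf)       # True
print(nan == nan)       # False: NaN is not equal to anything, even itself
print(math.isnan(nan))  # True: the reliable way to test for NaN
```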

Float Methods

The float type defines a method .is_integer() which returns True
if the float represents a whole number (has no fractional part), and
False otherwise.

1 >>> float(5).is_integer()
2 True

float.as_integer_ratio() returns a pair of integers, a numerator and
a denominator, whose ratio is exactly equal to the original float. The
denominator is guaranteed to be positive. If the float is infinite, this
operation raises an OverflowError; if it is NaN, this operation raises
a ValueError.

1 >>> float(1.5).as_integer_ratio()
2 (3, 2)

Hex Values

The float.fromhex() classmethod and float.hex() method are
built-in methods for working with floats in a hexadecimal format.

1 >>> h = float(3.4).hex()
2 >>> h
3 '0x1.b333333333333p+1'
4 >>> float.fromhex(h)
5 3.4

Complex Numbers

A complex number is a value with a real portion and an imaginary
portion, written in the form (x+yj). For complex numbers, the com-
plex.real attribute returns the real portion x, and the complex.imag
attribute returns the imaginary portion y as a float. The conjugate()
method returns the complex conjugate of the complex number.

1 >>> (3-4j).conjugate()
2 (3+4j)

Strings

As stated previously, strings are a built-in data type used to represent
sequences of characters. They are enclosed in either single quotes ' or
double quotes ", and can contain letters, numbers, and symbols. Strings
are immutable, which means that once they are created, their values
cannot be changed.
When calling a method of the str class, the result of the method
call is a new string. This is important to remember in cases where
you have multiple references to the initial string, and when you're doing
comparisons between strings; be sure to use == instead of is.
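A small sketch of this behavior:

```python
s = "hello"
result = s.upper()  # the method returns a new string...
print(result)       # HELLO
print(s)            # hello -- ...while the original is unchanged

# attempting in-place mutation raises a TypeError
try:
    s[0] = "H"
except TypeError as exc:
    print("strings are immutable:", exc)
```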

Split and Join

The str.split() and str.join() methods are methods of the Python
string class that convert strings to and from iterables. str.split()
takes one optional argument, a delimiter, and returns a list of
substrings that are separated by the delimiter. If no delimiter is provided,
the method splits the string on runs of whitespace. str.join() takes
an iterable of strings as an argument and returns a new string that
concatenates the elements of the iterable using the string on which the
method is called as a separator.

1 >>> _str = "a b c d e f g"
2 >>> _list = _str.split()
3 >>> _list
4 ['a', 'b', 'c', 'd', 'e', 'f', 'g']
5 >>> " ".join(_list)
6 'a b c d e f g'

Search and Replace

The str.index(), str.find(), str.rfind(), and str.replace()
methods allow you to search for and replace substrings in a given string.
The str.index(), str.find(), and str.rfind() methods each take one
argument, the substring, and return the index of an occurrence of it.
The differences are that str.index() raises a ValueError if the
substring is not found, while the find() methods simply return -1;
and rfind() returns the index of the last occurrence, whereas find()
returns the index of the first.
str.replace() takes two required arguments: the first is the substring
to search for, and the second is the replacement. It returns a new string
where occurrences of the first substring are replaced with the second.
A third, optional count argument may be provided, which limits the
number of occurrences that are replaced.

1 >>> _str = "Hello World!"
2 >>> _str.index("World")
3 6
4 >>> _str.index("Hey!")
5 Traceback (most recent call last):
6 File "<stdin>", line 1, in <module>
7 ValueError: substring not found
8 >>> _str.find("Hey!")
9 -1
10 >>> "Hello, World. World. World.".rfind("World")
11 21
12 >>> "Hello, World!".replace("World", "Universe")
13 'Hello, Universe!'

Paddings

The str.ljust(), str.lstrip(), str.rjust(), str.rstrip(),
str.zfill(), and str.center() methods allow you to add and
remove padding from the lefthand and righthand sides of a Python
string. The -just() and center() methods take a required argument,
the length of the final string, and an optional fill character which is
used to pad the resulting string (a space by default). The -strip()
methods do the opposite: instead of padding strings, they strip the
provided characters from the lefthand or righthand side of the string.
Finally, the zfill() method left-pads the string with zeros to
a specified width.

1 >>> _str = "Hello World"
2 >>> _str.ljust(15, "-")
3 'Hello World----'
4 >>> _str.rjust(15, "-")
5 '----Hello World'
6 >>> _str = _str.center(19, "-")
7 >>> _str
8 '----Hello World----'
9 >>> _str.lstrip("-")
10 'Hello World----'
11 >>> _str.rstrip("-")
12 '----Hello World'
13 >>> "1".zfill(4)
14 '0001'

Formatting

The str.format() and str.format_map() methods are used to insert
values into a string using placeholders. These methods are used to create
formatted strings, which are useful for displaying data in a specific way
or for creating string templates that can be reused.
str.format() takes any number of arguments, which are used to
replace placeholders in the string. Placeholders are indicated by curly
braces {} in the string, and the position of the argument determines which
placeholder is replaced.
str.format_map() on the other hand takes a single argument, which
is a dictionary. The dictionary contains keys that match the placeholders
in the string, and values that are used to replace the placeholders. In this
case the placeholders are curly braces {} which contain the name of the
corresponding dictionary key.

1 >>> "My name is {} and I am {} years old".format("Jessica", 31)
2 'My name is Jessica and I am 31 years old'
3 >>> data = {"name": "Jessica", "age": 31}
4 >>> "My name is {name} and I am {age} years old".format_map(data)
5 'My name is Jessica and I am 31 years old'

Translating

The str.translate() and str.maketrans() methods are used to
manipulate strings by replacing specific characters or groups of characters.
The str.maketrans() method can take a dictionary mapping
individual characters (or their ordinal values) to a corresponding
replacement string. It returns a translation table that can be used as an
argument to the str.translate() method.

1 >>> _str = "Hello World!"
2 >>> tbl = str.maketrans({"!": "?"})
3 >>> _str.translate(tbl)
4 'Hello World?'

Partitioning

The str.partition() and str.rpartition() methods are used to
split a string into three parts: the part before a specified delimiter, the
delimiter itself, and the part after the delimiter. The str.partition() method
takes one required argument, which is the delimiter to search for. It
returns a tuple containing the part of the string before the delimiter,
the delimiter itself, and the part of the string after the delimiter. If the
delimiter is not found, the tuple contains the original string, followed
by two empty strings. The str.rpartition() method is similar to
the str.partition() method, but it splits at the last occurrence of the
delimiter in the string, instead of the first. If the delimiter is not found,
the tuple contains two empty strings, followed by the original string.

1 >>> "Hello World!".partition(' ')
2 ('Hello', ' ', 'World!')
3 >>> "Hello World World!".rpartition(' ')
4 ('Hello World', ' ', 'World!')

Prefixes and Suffixes

The str.endswith() and str.startswith() methods are used to
check if a string starts or ends with a specific substring. str.endswith()
takes one required argument, which is the suffix to be checked, and
returns True if the string ends with the specified suffix, and False
otherwise. str.startswith() also takes one required argument, which
is the prefix to be checked, and returns True if the string starts with the
specified prefix, and False otherwise.

1 >>> _str = "Hello World!"
2 >>> _str.startswith("Hello")
3 True
4 >>> _str.endswith("world!")
5 False

The str.removeprefix() and str.removesuffix() methods are used
to remove a specific substring from the start or end of a string.
str.removeprefix() takes one required argument, the prefix to be
removed; if the string starts with the specified prefix, it removes the
prefix and returns the modified string. str.removesuffix() likewise
takes one required argument, the suffix to be removed; if the string
ends with the specified suffix, it removes the suffix and returns the
modified string. They are non-failing, so they return a resultant string
regardless of whether or not the substring was present. Both methods
were added in Python 3.9.

1 >>> _str = "Hello, World!"
2 >>> _str.removeprefix("Hello, ")
3 'World!'
4 >>> _str.removesuffix("World!")
5 'Hello, '

Boolean Checks

A number of methods on the string class are checks which return True or
False depending on whether a certain condition holds. Those methods are as
follows:

• str.isalnum() - Returns True if all characters in the string are alphanumeric, False otherwise.
• str.isalpha() - Returns True if all characters in the string are alphabetical, False otherwise.
• str.isascii() - Returns True if all characters in the string are ASCII, False otherwise.
• str.isdecimal() - Returns True if all characters in the string are decimal digits, False otherwise.
• str.isdigit() - Returns True if all characters in the string are digits, False otherwise.
• str.isidentifier() - Returns True if the string is a valid Python identifier, False otherwise.
• str.islower() - Returns True if all characters in the string are lowercase, False otherwise.
• str.isnumeric() - Returns True if all characters in the string are numeric, False otherwise.
• str.isprintable() - Returns True if all characters in the string are printable, False otherwise.
• str.isspace() - Returns True if all characters in the string are whitespace, False otherwise.
• str.istitle() - Returns True if the string is in title case, False otherwise.
• str.isupper() - Returns True if all characters in the string are uppercase, False otherwise.
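A few of these checks in action:

```python
print("abc123".isalnum())       # True: letters and digits only
print("abc123".isalpha())       # False: digits are not alphabetical
print("42".isdigit())           # True
print("my_var".isidentifier())  # True: a valid Python name
print("Hello World".istitle())  # True
print("   ".isspace())          # True
```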

Case Methods

A number of methods on the string class format strings based on the
cases of individual characters. Those methods are as follows:

• str.capitalize() - Returns a copy of the string with its first character capitalized and the rest in lowercase.
• str.casefold() - Returns a version of the string suitable for caseless comparisons.
• str.lower() - Returns a copy of the string with all uppercase characters converted to lowercase.
• str.swapcase() - Returns a copy of the string with uppercase characters converted to lowercase and vice versa.
• str.title() - Returns a copy of the string in title case, where the first letter of each word is capitalized and the rest are lowercase.
• str.upper() - Returns a copy of the string with all lowercase characters converted to uppercase.
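A quick sketch of these methods:

```python
s = "hello world"
print(s.capitalize())       # Hello world
print(s.title())            # Hello World
print(s.upper())            # HELLO WORLD
print("PyThOn".swapcase())  # pYtHoN
```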

It should be noted that when sorting strings in a case-insensitive manner,
it is recommended to use the str.casefold() method instead of
the str.lower() method. The str.casefold() method is a
more aggressive version of the str.lower() method that is specifically
designed for case-insensitive string comparisons.
The str.lower() method simply converts all uppercase characters
in a string to lowercase. However, this method may not work as intended
for certain characters. For example, str.lower() leaves the German ß
unchanged, while str.casefold() converts it to "ss". The
str.casefold() method performs a more thorough case-folding that
is suitable for case-insensitive string comparisons.

1 >>> sorted(["Massive", "Maß"])
2 ['Massive', 'Maß']
3 >>> sorted(["Massive", "Maß"], key=lambda x: x.lower())
4 ['Massive', 'Maß']
5 >>> sorted(["Massive", "Maß"], key=lambda x: x.casefold())
6 ['Maß', 'Massive']
7 >>> 'Maß'.lower()
8 'maß'
9 >>> 'Maß'.upper()
10 'MASS'
11 >>> 'Maß'.casefold()
12 'mass'

Encodings

The str.encode() method is used to convert a string to bytes using
a specific encoding. Encoding is the process of converting a string of
characters into a sequence of bytes, and it's necessary when working
with text data that needs to be stored or transmitted in a specific format.
The str.encode() method takes one optional argument, the name of
the encoding to be used, which defaults to utf-8. Some of the other encodings
are utf-16, ascii, and latin1. It returns a bytes object that contains the
encoded string.

1 >>> "Hello, World!".encode("utf8")
2 b'Hello, World!'

Bytes

So far in this text we've been referring to bytes as related to strings.
And this is with some justification; many of the methods for strings are
available for bytestrings, and the syntax is largely the same, save for the
b'' prefix.
However, str and bytes are two distinct data types. A str is a sequence
of Unicode characters; bytes however are fundamentally a sequence
of integers, each of which is between 0 and 255. They represent binary
data, which is useful when working with data that needs to be stored
or transmitted in a specific format. If the value of an individual byte is
below 128, it may be represented using an ASCII character.

1 >>> "Maß".encode('utf8')
2 b'Ma\xc3\x9f'
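Because bytes are fundamentally sequences of integers, indexing a bytes object yields an int rather than a one-character bytestring, while slicing yields bytes; a bytes object can also be built directly from an iterable of integers. A brief sketch:

```python
b = b"Hi!"
print(b[0])      # 72 -- indexing yields the integer value of the byte
print(list(b))   # [72, 105, 33]
print(b[0:1])    # b'H' -- slicing, by contrast, yields a bytes object
print(bytes([72, 105, 33]))  # b'Hi!'
```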

There are a few distinct methods which are unique to the bytes type,
and we’ll discuss those here.

1 >>> [m for m in dir(bytes)
2 ... if not any((m.startswith("__"), m in dir(str)))]
3 ['decode', 'fromhex', 'hex']

Decoding

The bytes.decode() method is used to convert a bytestring to a
string. The method takes one optional argument, the character
encoding of the bytestring, which defaults to utf-8.

1 >>> b = b'Hello World'
2 >>> b.decode('utf-8')
3 'Hello World'

Hex Values

The bytes.fromhex() and bytes.hex() methods are used to work
with bytes in a hexadecimal format.

1 >>> b = b"Hello World"
2 >>> h = b.hex()
3 >>> h
4 '48656c6c6f20576f726c64'
5 >>> bytes.fromhex(h)
6 b'Hello World'

Tuples

The Python tuple data type has limited functionality, given its nature as
an immutable data type. The two methods it does define however are
tuple.count() and tuple.index().

The .count() method is used to count the number of occurrences of
a specific element in a tuple. The method takes one required argument,
which is the element to count, and returns the number of occurrences of
that element in the tuple.

1 >>> t = (1, 2, 3, 2, 1)
2 >>> t.count(2)
3 2

The .index() method is used to find the index of the first occurrence of
a specific element in a tuple. The method takes one required argument,
which is the element to find, and returns the index of the first occurrence
of that element in the tuple. If the element is not found in the tuple, the
method raises a ValueError.

1 >>> t = (1, 2, 3, 2, 1)
2 >>> t.index(2)
3 1

Lists

The Python list type defines a suite of methods which are used to inspect
and mutate the data structure.

Counting and Indexing

The .count() method is used to count the number of occurrences of
a specific element in a list. The method takes one required argument,
which is the element to count, and returns the number of occurrences of
that element in the list.

1 >>> l = [1, 2, 3, 2, 1]
2 >>> l.count(2)
3 2

The .index() method is used to find the index of the first occurrence
of a specific element in a list. The method takes one required argument,
which is the element to find, and returns the index of the first occurrence
of that element in the list. If the element is not found in the list, the
method raises a ValueError.

1 >>> l = [1, 2, 3, 2, 1]
2 >>> l.index(2)
3 1

Copying

The copy method is used to create a shallow copy of a list. It does not
take any arguments and it returns a new list that is a copy of the original
list.

1 >>> l1 = [object()]
2 >>> l2 = l1.copy()
3 >>> l2
4 [<object at 0x7f910fec4630>]
5 >>> l1[0] is l2[0]
6 True

Mutations

The .append() method is used to add an element to the end of a list.
The method takes one required argument, which is the element to add,
and adds it to the end of the list.

1 >>> l = [0]
2 >>> l.append(1)
3 >>> l
4 [0, 1]

The .extend() method is similar to append, but it is used to add multiple
elements to a list. The method takes one required argument, which is an
iterable, and adds each element of the iterable to the list.

1 >>> l = [1, 2, 3]
2 >>> l.extend((4, 5, 6))
3 >>> l
4 [1, 2, 3, 4, 5, 6]

The .insert() method is used to insert an element at a specific position
in a list. The method takes two required arguments: the first is the index
where the element should be inserted, and the second is the element to
insert. An insert operation does not replace the element at a given index;
it simply shifts the list accordingly.

1 >>> l = [1, 2, 3, 4]
2 >>> l.insert(2, 5)
3 >>> l
4 [1, 2, 5, 3, 4]

The .remove() method is used to remove the first occurrence of a
specific element from a list. The method takes one required argument,
which is the element to remove, and removes the first occurrence of that
element in the list. If the element is not found in the list, the method
raises a ValueError.

1 >>> l = [1, 2, 3, 2, 1]
2 >>> l.remove(2)
3 >>> l
4 [1, 3, 2, 1]

The .pop() method is used to remove an element from a list and return
it. The method takes one optional argument, which is the index of the
element to remove. If no index is provided, the method removes and
returns the last element of the list.

1 >>> l = [1, 2, 3, 4]
2 >>> l.pop(2)
3 3
4 >>> l
5 [1, 2, 4]

Finally, the .clear() method is used to remove all elements from a list.
It does not take any arguments and removes all elements from the list.

1 >>> l = [1, 2, 3, 4]
2 >>> l.clear()
3 >>> l
4 []

Orderings

The .reverse() method is used to reverse the order of elements in a
list. The method is an in-place operation and returns None.

1 >>> l = [1, 2, 3, 4]
2 >>> l.reverse()
3 >>> l
4 [4, 3, 2, 1]

Finally, the .sort() method is used to sort the elements in a list. The
method is an in-place operation and returns None. It takes no positional
arguments, but accepts two optional keyword arguments, key and
reverse. The key argument specifies a function that is used to
extract a comparison value from each element in the list. The reverse
argument is a Boolean value that specifies whether the list should be
sorted in ascending or descending order. The sort() method sorts the
elements in ascending order by default.

1 >>> l = [3, 2, 4, 1]
2 >>> l.sort()
3 >>> l
4 [1, 2, 3, 4]
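A sketch of the key and reverse arguments:

```python
words = ["banana", "Cherry", "apple"]
words.sort()                  # uppercase letters sort before lowercase
print(words)                  # ['Cherry', 'apple', 'banana']

words.sort(key=str.casefold)  # key extracts a case-insensitive sort value
print(words)                  # ['apple', 'banana', 'Cherry']

nums = [3, 1, 4, 1, 5]
nums.sort(reverse=True)       # descending order
print(nums)                   # [5, 4, 3, 1, 1]
```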

Dictionaries

The Python dict type defines a suite of methods which are used to inspect
and mutate the data structure.

Iter Methods

The .keys(), .values(), and .items() methods are the primary
mechanism for iterating over a dictionary. .keys() returns a view object
which yields each of the keys contained in the dictionary. .values()
returns a view object which yields each of the values matched to a key in
the dictionary. Finally, .items() yields each of the key-value pairs as a
tuple.

1 >>> d = {'a': 1, 'b': 2, 'c': 3}
2 >>> list(d.keys())
3 ['a', 'b', 'c']
4 >>> list(d.values())
5 [1, 2, 3]
6 >>> list(d.items())
7 [('a', 1), ('b', 2), ('c', 3)]

Getter/Setter Methods

The .get() method is used to retrieve the value of a specific key in a
dictionary. The method takes one required argument, which is the key to
retrieve, and returns the value of that key. If the key is not found in the
dictionary, the method returns None, or an optional default value passed
as a second argument.

1 >>> d = {'a': 1, 'b': 2, 'c': 3}
2 >>> d.get('b')
3 2
4 >>> d.get('d', -1)
5 -1

The .setdefault() method is used to insert a key-value pair into
a dictionary if the key is not already present. The method takes a
required key argument and an optional default value (None if omitted),
which is set for that key should the key not be present. The method then
returns the value for the key.

1 >>> d = {}
2 >>> d.setdefault("a", []).append(1)
3 >>> d
4 {'a': [1]}

Mutations

The .pop() method is used to remove a specific key-value pair from a
dictionary and return its value. The method takes one required argument,
which is the key to remove; it removes that key-value pair from the
dictionary and returns the value to the caller. If the key is not found
in the dictionary, the method raises a KeyError, unless an optional
second argument is provided as a default return value.

1 >>> d = {'a': 1, 'b': 2, 'c': 3}
2 >>> d.pop('b')
3 2
4 >>> d
5 {'a': 1, 'c': 3}

The .popitem() method is used to remove a key-value pair
from a dictionary and return it as a tuple. The method does not take any
arguments. As of Python 3.7, pairs are removed in LIFO order: the most
recently inserted pair is removed first.

1 >>> d = {'a': 1, 'b': 2, 'c': 3}
2 >>> d.popitem()
3 ('c', 3)
4 >>> d
5 {'a': 1, 'b': 2}

The .clear() method is used to remove all key-value pairs from a
dictionary. It does not take any arguments.

1 >>> d = {'a': 1, 'b': 2, 'c': 3}
2 >>> d.clear()
3 >>> d
4 {}

The .update() method is used to add multiple key-value pairs to
a dictionary. The method takes one required argument, typically
another dictionary, and adds each of its key-value pairs to the
original dictionary. Any naming conflict favors the new value.

1 >>> d = {'a': 1, 'b': 2}
2 >>> d.update({'c': 3, 'd': 4})
3 >>> d
4 {'a': 1, 'b': 2, 'c': 3, 'd': 4}
5 >>> d = dict(a=1, b=2)
6 >>> d.update({'a': 2})
7 >>> d
8 {'a': 2, 'b': 2}

Creating new Dictionaries

The .copy() method is used to create a shallow copy of a dictionary. It
does not take any arguments and it returns a new dictionary that is a
copy of the original dictionary.

1 >>> d = {'a': 1, 'b': 2, 'c': 3}
2 >>> d2 = d.copy()
3 >>> d2
4 {'a': 1, 'b': 2, 'c': 3}

Finally, the .fromkeys() classmethod is used to create a new dictionary. The
method takes one or two arguments; the first is an iterable of keys and
the second is an optional value to set as the default value for each key. If
a default value is not specified, it defaults to None.

1 >>> keys = ('a', 'b', 'c')
2 >>> d = dict.fromkeys(keys, 0)
3 >>> d
4 {'a': 0, 'b': 0, 'c': 0}

Sets

Sets are used to store unique elements. They are useful for operations
such as membership testing, removing duplicates from a sequence, and
mathematical set operations. They are mutable and have a variety of built-in
methods to perform various set operations.

Mutations

The .add() method is used to add an element to a set. The method takes
one required argument, which is the element to add, and adds it to the
set.

1 >>> s = {1, 2, 3}
2 >>> s.add(4)
3 >>> s
4 {1, 2, 3, 4}

The .remove() method is used to remove a specific element from a
set. The method takes one required argument, which is the element to
remove. If the element is not present in the set, it raises a KeyError.

1 >>> s = {1, 2, 3}
2 >>> s.remove(2)
3 >>> s
4 {1, 3}

The .discard() method is used to remove an element from a set if it is
present. The method takes one required argument, which is the element
to remove. This method is similar to the .remove() method, but it does
not raise an error if the element is not present in the set.

1 >>> s = {1, 2, 3}
2 >>> s.discard(2)
3 >>> s
4 {1, 3}
5 >>> s.discard(4)
6 >>> s
7 {1, 3}

The .pop() method is used to remove and return an arbitrary element
from a set. The method does not take any arguments. If the set is
empty, it raises a KeyError.

1 >>> s = {1, 2, 3}
2 >>> s.pop()
3 1
4 >>> s
5 {2, 3}

The .update() method is used to add multiple elements to a set. The
method takes one or more iterable arguments and adds each element
from those iterables to the set.

1 >>> s = {1, 2}
2 >>> s.update([2, 3], (4, 5))
3 >>> s
4 {1, 2, 3, 4, 5}

The .clear() method is used to remove all elements from a set. It does
not take any arguments and removes all elements from the set.

1 >>> s = {1, 2, 3}
2 >>> s.clear()
3 >>> s
4 set()

Set Theory

Set theory is a branch of mathematics which models the relationships
between objects, focusing on the different ways they can be organized
into collections. Python's built-in set type provides a multitude of
methods which allow you to compute relationships between sets.
The .union() method returns a new set that contains all elements from
both the set and an iterable.

1 >>> s1 = {1, 2, 3}
2 >>> s2 = {2, 3, 4}
3 >>> s1.union(s2)
4 {1, 2, 3, 4}

The .intersection() and .intersection_update() methods compute
a set that contains only the elements that are common to both
the set and the iterable. Calling .intersection() returns a new set,
while .intersection_update() does the operation in-place.

1 >>> s1 = {1, 2, 3}
2 >>> s2 = {2, 3, 4}
3 >>> s1.intersection(s2)
4 {2, 3}
5 >>> s1.intersection_update(s2)
6 >>> s1
7 {2, 3}

The .difference() and .difference_update() methods compute
a set that contains all elements that are in the set but not in an
iterable. Calling .difference() returns a new set, while
.difference_update() does the operation in-place.

1 >>> s1 = {1, 2, 3}
2 >>> s2 = {2, 3, 4}
3 >>> s1.difference(s2)
4 {1}
5 >>> s1.difference_update(s2)
6 >>> s1
7 {1}

The .symmetric_difference() and .symmetric_difference_update()
methods compute a set that contains elements that are
in either the set or an iterable, but not in both. Calling
.symmetric_difference() returns a new set, while
.symmetric_difference_update() does the operation in-place.

1 >>> s1 = {1, 2, 3}
2 >>> s2 = {2, 3, 4}
3 >>> s1.symmetric_difference(s2)
4 {1, 4}
5 >>> s1.symmetric_difference_update(s2)
6 >>> s1
7 {1, 4}

Boolean Checks

Several methods available for set objects allow you to check the relation-
ship between a pair of sets.
The .isdisjoint() method returns True if two sets have no common
elements.

1 >>> s1 = {1, 2, 3}
2 >>> s2 = {4, 5, 6}
3 >>> s1.isdisjoint(s2)
4 True

The .issubset() method returns True if all elements of a set are
present in another set.

1 >>> s1 = {1, 2, 3}
2 >>> s2 = {1, 2, 3, 4, 5, 6}
3 >>> s1.issubset(s2)
4 True

The .issuperset() method returns True if a set contains all elements
of another set.

1 >>> s1 = {1, 2, 3, 4, 5, 6}
2 >>> s2 = {1, 2, 3}
3 >>> s1.issuperset(s2)
4 True

Copying

Finally, the .copy() method allows you to create a shallow copy of a
given set. It does not take any arguments and returns a new set that is a
copy of the original set. The references in each set are the same.

1 >>> s1 = {"Hello World",}
2 >>> s2 = s1.copy()
3 >>> s1.pop() is s2.pop()
4 True

Chapter 13. Type Hints

Type hints, also known as type annotations, are an addition
to Python which allows developers to add static type information for
function arguments, return values, and variables. The philosophy behind
type hints is to provide a way to improve code readability and catch
certain types of errors earlier in the development cycle.
Type hints make it easier for other developers to understand the expected
inputs and outputs of functions, without having to manually inspect the
code or drop into a debugger during testing. They also allow static type
checkers, such as mypy, or editors via the Language Server Protocol (LSP),
to analyze code and detect type-related issues before the code is executed.
It's important to note that type hints are optional, and Python's dynamic
typing is still maintained. Type hints do not affect runtime behavior and
are only used for static analysis and documentation purposes. It should
also be explicitly noted that, absent a mechanism for type checking, type
hints are not guaranteed to be accurate by the interpreter. They are hints
in the truest sense of the word.

Incorporating Type Hints

One of the best places to start adding type hints is in existing functions.
Functions typically have explicit inputs and outputs, so adding type
definitions to I/O values is relatively straightforward. Once you have
identified a function that can be type annotated, determine the types of
inputs and outputs for that function. This should be done by inspecting
the code and looking at the context in which each variable of the function
is used. For example, if a function takes two integers as inputs and returns
a single integer as output, you would know that the inputs should have
type hints int and the return type hint should be int as well.
Type hints for functions go in the function signature. The syntax for these
hints is name: type[=default] for function arguments, and -> type:
for the return value.

1 def multiply(a: int, b: int) -> int:
2 return a * b

In this example, the multiply function is given type annotations for the
a and b variables. In this case, those types are specified as integers. Since
the resulting type of multiplying two integers is always an integer, the
return type of this function is also int.
It's worth reiterating that this multiply function will not fail at runtime
if it is passed values which aren't integers. However, static checkers like
mypy will fail if the function is called elsewhere in the code base with
non-int values, and text editors with LSP support will indicate that you
are using the function in error.

1 # ./script.py
2
3 def multiply(a: int, b: int) -> int:
4 return a * b
5
6 print(multiply(1, 2.0))

1 root@77a3d1e5fa49:/code/hints# python script.py
2 2.0
3 root@77a3d1e5fa49:/code/hints# python -m mypy script.py
4 script.py:6: error: Argument 2 to "multiply" has incompatible typ\
5 e "float"; expected "int" [arg-type]
6 Found 1 error in 1 file (checked 1 source file)
7 root@77a3d1e5fa49:/code/hints#

Union types

You can specify multiple possible types for a single argument or return
value by using a union type. This is done either by using the Union type
from the typing module with the possible types enclosed in square brackets,
or by using the bitwise or operator | as of Python 3.10.
1 # ./script.py
2
3 def multiply(a: int|float, b: int|float) -> int|float:
4 return a * b
5
6 print(multiply(1, 2.0))

In this example, we’ve modified our multiply() function to accept both
integers and floats. Running a static type checker
against this new function signature throws no errors.

1 root@77a3d1e5fa49:/code/hints# python -m mypy script.py
2 Success: no issues found in 1 source file
3 root@77a3d1e5fa49:/code/hints#

Optional

The Optional type is a type hint which is used to indicate that an
argument to a function or method is optional, meaning that it does not
have to be provided in order for the function or method to be called. It
is typically paired with a default value which is used when the function
caller does not pass an explicit argument.
1 # ./script.py
2
3 from typing import Optional
4
5 def greet(name: str, title: Optional[str] = None) -> str:
6 if title:
7 return f"Hello, {title} {name}"
8 return f"Hello, {name}"
9
10 print(greet("Justin", "Dr."))
11 print(greet("Cory"))

1 root@77a3d1e5fa49:/code/hints# python -m mypy script.py
2 Success: no issues found in 1 source file
3 root@77a3d1e5fa49:/code/hints# python script.py
4 Hello, Dr. Justin
5 Hello, Cory
6 root@77a3d1e5fa49:/code/hints#

type|None

An optional argument is by definition a union between some type t and
None. In addition to using Optional as an explicit type definition, it
is possible to achieve the same type definition using the | operator as
t|None. Both methods of type annotation serve the same purpose, but
the choice of which to use often comes down to personal preference and
the specific use case.
1 # ./script.py
2
3 def greet(name: str, title: str|None = None) -> str:
4 if title:
5 return f"Hello, {title} {name}"
6 return f"Hello, {name}"
7
8 print(greet("Justin", "Dr."))
9 print(greet("Cory"))

1 root@77a3d1e5fa49:/code/hints# python -m mypy script.py
2 Success: no issues found in 1 source file
3 root@77a3d1e5fa49:/code/hints# python script.py
4 Hello, Dr. Justin
5 Hello, Cory
6 root@77a3d1e5fa49:/code/hints#

Literal

The Literal type allows you to restrict the values that an argument or
variable can take to a specific set of literal values. This can be used for
ensuring that a function or method is only called with a specific correct
argument.
1 # ./script.py
2
3 from typing import Literal
4
5 def color_picker(
6 color: Literal["red", "green", "blue"]
7 ) -> tuple[int, int, int]:
8 match color:
9 case "red":
10 return (255, 0, 0)
11 case "green":
12 return (0, 255, 0)
13 case "blue":
14 return (0, 0, 255)
15
16 print(color_picker("pink"))

1 root@77a3d1e5fa49:/code/hints# python -m mypy script.py
2 script.py:16: error: Argument 1 to "color_picker" has incompatibl\
3 e type "Literal['pink']"; expected "Literal['red', 'green', 'blue\
4 ']" [arg-type]
5 Found 1 error in 1 file (checked 1 source file)
6 root@77a3d1e5fa49:/code/hints#
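As a further illustration, Literal works well for small sets of mode flags with a default value. This sketch uses a hypothetical sort_names helper that is not part of the example above:

```python
from typing import Literal

# Hypothetical helper: `order` may only ever be "asc" or "desc".
def sort_names(names: list[str], order: Literal["asc", "desc"] = "asc") -> list[str]:
    return sorted(names, reverse=(order == "desc"))

print(sort_names(["b", "a", "c"]))          # default ascending order
print(sort_names(["b", "a", "c"], "desc"))  # descending order
# sort_names(["b"], "up") would run at runtime but fail mypy's check
```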

Final

In instances where a value is expected to be constant, its type can
be specified as Final. Final types allow you to provide stronger
guarantees about the behavior of a variable: reassigning a Final
name is flagged, as is overriding a Final attribute in a subclass.
1 # ./script.py
2
3 from typing import Final
4
5 API_KEY: Final = "77d75da2e4a24afb85480c3c61f2eb09"
6 API_KEY = "c25e77071f5a4733bcd453c037adeb3f"
7
8 class User:
9 MAXSIZE: Final = 32
10
11 class NewUser(User):
12 MAXSIZE: Final = 64

1 root@77a3d1e5fa49:/code/hints# python -m mypy script.py
2 script.py:6: error: Cannot assign to final name "API_KEY" [misc]
3 script.py:12: error: Cannot override final attribute "MAXSIZE" (p\
4 reviously declared in base class "User") [misc]
5 Found 2 errors in 1 file (checked 1 source file)
6 root@77a3d1e5fa49:/code/hints#
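Final may also be parameterized with the value's type; when written bare as above, the type is inferred from the assignment. A small sketch with illustrative names:

```python
from typing import Final

# Final may carry an explicit type parameter; bare Final infers it.
MAX_RETRIES: Final[int] = 3

def retries_left(attempt: int) -> int:
    return MAX_RETRIES - attempt

print(retries_left(1))
# MAX_RETRIES = 5 elsewhere would be flagged by a static checker
```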

TypeAlias

In instances where inlining type hints becomes too verbose, you can use
a TypeAlias type to create an alias for a definition based on existing
types.
1 # ./script.py
2
3 from typing import Literal, TypeAlias
4
5 ColorName = Literal["red", "green", "blue"]
6 Color: TypeAlias = tuple[int, int, int]
7
8 def color_picker(color: ColorName) -> Color:
9 match color:
10 case "red":
11 return (255, 0, 0)
12 case "green":
13 return (0, 255, 0)
14 case "blue":
15 return (0, 0, 255)
16
17 print(color_picker("green"))

1 root@77a3d1e5fa49:/code/hints# python -m mypy script.py
2 Success: no issues found in 1 source file
3 root@77a3d1e5fa49:/code/hints# python script.py
4 (0, 255, 0)
5 root@77a3d1e5fa49:/code/hints#

NewType

In contrast to a TypeAlias, the NewType type allows you to define new
types based on existing types. The new type is treated as its own distinct
type, separate from whatever original type it was defined with. Creating
a variable defined as the new type is done by calling the type with values
matching the type’s type definition.
1 # ./script.py
2
3 from typing import Literal, NewType
4
5 ColorName = Literal["red", "green", "blue"]
6 Color = NewType("Color", tuple[int, int, int])
7
8 def color_picker(color: ColorName) -> Color:
9 match color:
10 case "red":
11 return Color((255, 0, 0))
12 case "green":
13 return Color((0, 255, 0))
14 case "blue":
15 return Color((0, 0, 255))
16
17 print(color_picker("green"))

1 root@77a3d1e5fa49:/code/hints# python -m mypy script.py
2 Success: no issues found in 1 source file
3 root@77a3d1e5fa49:/code/hints# python script.py
4 (0, 255, 0)
5 root@77a3d1e5fa49:/code/hints#
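Because the new type is distinct, static checkers reject values of the underlying type that were not explicitly wrapped. A sketch with an illustrative UserId type (not from the example above):

```python
from typing import NewType

UserId = NewType("UserId", int)

def get_user(user_id: UserId) -> str:
    return f"user-{user_id}"

uid = UserId(42)        # at runtime, uid is simply the int 42
print(get_user(uid))    # passes static type checking
print(get_user(42))     # runs, but mypy rejects a bare int here
print(uid + 1)          # NewType adds no runtime wrapper or cost
```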

TypeVar

Generic types can be created as placeholders for specific types that
are to be determined at a later time. They still provide type safety
by enforcing type consistency, but they do not assert what type is
to be expected. Generic types can be constructed using the TypeVar
constructor.
1 # ./script.py
2
3 from typing import TypeVar
4
5 T = TypeVar('T')
6
7 def identity(item: T) -> T:
8 return item
9
10 identity(1)

TypeVars can also be either constrained or bound. A bound TypeVar
can only be instantiated with a specified type or its subtypes. A
constrained TypeVar can only be one of a given set of types.

1 # ./script.py
2
3 from typing import TypeVar
4
5 class Missing(int):
6 _instance = None
7 def __new__(cls, *args, **kwargs):
8 if not cls._instance:
9 cls._instance = super().__new__(cls, -1)
10 return cls._instance
11
12 # True, False, and Missing all satisfy this type
13 # because they are subclasses of int. However,
14 # int types also satisfy.
15 BoundTrinary = TypeVar("BoundTrinary", bound=int)
16
17 def is_falsy(a: BoundTrinary) -> bool:
18 return a is not True
19
20 print(
21 is_falsy(True),
22 is_falsy(False),
23 is_falsy(Missing()),
24 is_falsy(-1)
25 )
26
27 # True, False, and Missing all satisfy this type
28 # because the type is constrained. In this case
29 # however int does not satisfy, and throws a type
30 # error.
31 ConstrainedTrinary = TypeVar("ConstrainedTrinary", bool, Missing)
32
33 def is_truthy(a: ConstrainedTrinary) -> bool:
34 return a is True
35
36 print(
37 is_truthy(True),
38 is_truthy(False),
39 is_truthy(Missing()),
40 is_truthy(-1)
41 )

1 root@a22f8a0c68fb:/code/hints# python -m mypy ./script.py
2 script.py:37: error: Value of type variable "ConstrainedTrinary"\
3 of "is_truthy" cannot be "int" [type-var]
4 Found 1 error in 1 file (checked 1 source file)
5 root@a22f8a0c68fb:/code/hints#

In this example we’re adding a Missing() singleton for representing a
“Missing” state for trinary logic (in comparison to boolean logic which
has 1 and 0, trinary logic has three states, 1, 0, and -1). Since this
new missing state, as well as the True and False singletons, all are
subclasses of int, we can create a TypeVar named BoundTrinary
which is bound to the int type, and this type definition will satisfy
the three states in the trinary. However this type will also match any
variable of int type; if this is undesirable we can instead constrain
the type definition to exclusively the bool, and Missing types as
demonstrated in the ConstrainedTrinary type definition. Static type
checkers will now throw a type error if an int type is passed where a
ConstrainedTrinary is expected.

Protocols

Instead of checking against a specific type, sometimes we only want to


check that an object implements some sort of expected interface. We
can do this using the Protocol type. A protocol describes the minimal
interface that an object must implement in order to be considered a
particular type.

1 # ./script.py
2
3 from typing import Protocol
4
5 class Incrementable(Protocol):
6 def increment(self) -> None: ...
7
8 class Counter:
9 def __init__(self):
10 self.value = 0
11
12 class CountByOnes(Counter):
13 def increment(self):
14 self.value += 1
15
16 class CountByTwos(Counter):
17 def increment(self):
18 self.value += 2
19
20 def increment_n(counter: Incrementable, n: int) -> None:
21 for _ in range(n):
22 counter.increment()
23
24 for c in (CountByOnes(), CountByTwos()):
25 increment_n(c, 10)
26 print(c.value)

1 root@a22f8a0c68fb:/code/hints# python -m mypy script.py
2 Success: no issues found in 1 source file
3 root@a22f8a0c68fb:/code/hints# python script.py
4 10
5 20

In this example, the Incrementable protocol specifies an interface
containing a single method, increment(), which takes
no arguments and returns None. Any object which defines an
increment() method satisfies the type definition of the protocol.
The increment_n function takes an object of type Incrementable and
an integer n and calls the .increment() method n times. The function
does not need to know the specific type of the counter object, as long as it
implements the Incrementable protocol. Since both CountByOnes and
CountByTwos implement this protocol, we can safely pass both objects
to the increment_n function.

Runtime type checking

Protocols by default are not type checkable at runtime. However,
protocols can hook into Python’s runtime type checking mechanisms by
including a typing.runtime_checkable decorator, which adds runtime
infrastructure without interfering with static analysis.

1 # ./script.py
2
3 from typing import Protocol, runtime_checkable
4
5 @runtime_checkable
6 class Incrementable(Protocol):
7 def increment(self) -> None: ...
8
9 def increment_n(counter: Incrementable, n: int) -> None:
10 if not isinstance(counter, Incrementable):
11 raise TypeError
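With the decorator applied, isinstance() performs a structural check against the protocol's method names; note that it does not verify method signatures. A sketch along the lines of the counter classes above:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Incrementable(Protocol):
    def increment(self) -> None: ...

class Counter:
    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> None:
        self.value += 1

# The runtime check only looks for a matching method name.
print(isinstance(Counter(), Incrementable))   # structurally satisfies
print(isinstance("a string", Incrementable))  # str has no increment()
```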

Generics

Generics are used to create type-safe data abstractions while remaining
type agnostic. To do this, we can create classes which subclass the Generic
type from the typing module. When the class is later instantiated, a type
definition can then be provided at the point of object construction using
square brackets.
1 # ./script.py
2
3 from typing import Generic, TypeVar
4
5 T = TypeVar("T")
6
7 class Queue(Generic[T]):
8 def __init__(self) -> None:
9 self._data: list[T] = []
10
11 def push(self, item: T) -> None:
12 self._data.append(item)
13
14 def pop(self) -> T:
15 return self._data.pop(0)
16
17 int_queue = Queue[int]()
18 str_queue = Queue[str]()
19
20 # fails type check, pushing a string to an `int` queue
21 int_queue.push('a')

1 root@a22f8a0c68fb:/code/hints# python -m mypy ./script.py
2 script.py:21: error: Argument 1 to "push" of "Queue" has incompat\
3 ible type "str"; expected "int" [arg-type]
4 Found 1 error in 1 file (checked 1 source file)
5 root@a22f8a0c68fb:/code/hints#

In this example, Queue is a generic data abstraction which defines a
type-safe implementation for pushing and popping items from a collection.
The type is defined at instantiation; when int_queue and str_queue
are created, a type is specified during construction of the queue using
square bracket notation. This makes it so code fails static type checking if
items of incorrect type are pushed and popped from either of the queues.
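Generic classes may also take several type parameters. A sketch with an illustrative Pair class (not from the example above):

```python
from typing import Generic, TypeVar

K = TypeVar("K")
V = TypeVar("V")

class Pair(Generic[K, V]):
    def __init__(self, key: K, value: V) -> None:
        self.key = key
        self.value = value

    # The return annotation swaps the two type parameters.
    def swap(self) -> "Pair[V, K]":
        return Pair(self.value, self.key)

pair = Pair[str, int]("age", 27)
swapped = pair.swap()
print(swapped.key, swapped.value)
```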

TypedDict

The TypedDict class is used to create type-safe dictionaries. With
TypedDict, you can ensure that dictionaries have the right keys and values,
and that those values are type checked at instantiation.

1 # ./script.py
2
3 from typing import TypedDict
4
5 class Person(TypedDict):
6 name: str
7 age: int
8
9 person = Person(name="Chloe", age=27)

Classes which subclass TypedDict can use class variables to specify keys
in a dictionary key-value store. The included type annotation asserts
during static type check that the value associated with a given key is
of a specified type.
At instantiation, TypedDict objects require that all variables are passed
to the constructor as keyword arguments. Failing to pass an expected
variable will cause static type checking to fail. This default behavior
can be changed however, by passing a total=False flag in the class
definition. Setting this flag will make all keys optional by default. We
can further mark a specific key as either Required or NotRequired,
to designate whether that key in the collection requires a value at
instantiation.
1 # ./script.py
2
3 from typing import TypedDict, NotRequired, Required
4
5 class FirstClass(TypedDict, total=False):
6 a: str
7 b: str
8
9 class SecondClass(TypedDict):
10 a: str
11 b: NotRequired[str]
12
13 class ThirdClass(TypedDict, total=False):
14 a: str
15 b: Required[str]
16
17 first = FirstClass() # passes type check, nothing is required
18 second = SecondClass() # fails type check, missing 'a'
19 third = ThirdClass() # fails type check, missing 'b'

1 root@a22f8a0c68fb:/code/hints# python -m mypy ./script.py
2 script.py:18: error: Missing key "a" for TypedDict "SecondClass" \
3 [typeddict-item]
4 script.py:19: error: Missing key "b" for TypedDict "ThirdClass" \
5 [typeddict-item]
6 Found 2 errors in 1 file (checked 1 source file)
7 root@a22f8a0c68fb:/code/hints#

Finally, TypedDict objects can subclass the Generic type. This allows
dictionary values to be collections of types which are specified at
instantiation.
1 # ./script.py
2
3 from typing import TypedDict, Generic, TypeVar
4
5 T = TypeVar("T")
6
7 class Collection(TypedDict, Generic[T]):
8 name: str
9 items: list[T]
10
11 coll = Collection[int](name="uid", items=[1,2,3])
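It is worth noting that a TypedDict exists only for the benefit of the type checker; at runtime its instances are ordinary dictionaries:

```python
from typing import TypedDict

class Person(TypedDict):
    name: str
    age: int

person = Person(name="Chloe", age=27)

# At runtime there is no special class: the constructor returns a
# plain dict, and values are accessed by normal subscription.
print(type(person))
print(person["name"], person["age"])
```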

Chapter 14. Modules, Packages, and Namespaces

Up until now, we have been utilizing the functionality that is built
directly into the Python runtime. We have been using literals, containers,
functions, and classes that are available by default. However, this is only
a small portion of the tools and functionality that are available to Python
developers. To access this additional functionality, we must delve into
the intricacies of Python’s module system and its packaging system.

Modules

Python’s module system allows developers to organize and reuse code in
a structured way. A module is simply a file containing Python definitions
and statements; the module name is the file name without the .py
extension. For example, a file named module.py would be a module
containing Python code that can be imported and used in other parts of
your program.
Consider the following directory structure with two files. The first file is
a module file, and the second file is a script.

1 root@b9ba278f248d:/code# find . -type f
2 ./module.py
3 ./script.py

In the module file, we define the functions add() and subtract().
These functions now exist in what is called the “namespace” of the
module, i.e. the name module is recognized to contain the functions add
and subtract. Everything defined in the top level scope of the module
file is defined under the module namespace.

1 # ./module.py
2
3 def add(a, b):
4 return a + b
5
6 def subtract(a, b):
7 return a - b

In the script file, we use the import statement to import our module.
The import statement will import the entire module and make all of
its definitions and statements available in the current namespace of
the script, under the name module. We can access all of the defined
functionality of the module using the dot syntax, similar to accessing
the attributes of a class.
1 # ./script.py
2
3 import module
4
5 print(module.add(1, 2))
6 print(module.subtract(1, 2))

With this, we can run python ./script.py and the output to the
console is 3, and then -1.

1 root@b9ba278f248d:/# cd code/
2 root@b9ba278f248d:/code# ls
3 module.py script.py
4 root@b9ba278f248d:/code# python script.py
5 3
6 -1
7 root@b9ba278f248d:/code#

If we want to instead import our module under a given alias, we can use
the as keyword in our import statement to rename the module in the
scope of the current namespace.

1 # ./script.py
2
3 import module as m
4
5 print(m.add(1, 2))
6 print(m.subtract(1, 2))

If we are only interested in importing specific objects from the module
namespace, we can use the from keyword to import individual objects.
Multiple objects can be imported via a mechanism similar to tuple
unpacking.
1 # ./script.py
2
3 from module import (
4 add,
5 subtract,
6 )
7
8 print(add(1, 2))
9 print(subtract(1, 2))

In instances where our modules are collected into folders within the root
directory, we can use dot syntax to specify the path to a module.

1 root@b9ba278f248d:/code# find . -type f
2 ./modules/module.py
3 ./script.py

1 # ./script.py
2 import modules.module as m
3
4 print(m.add(1, 2))

Module Attributes

There are several attributes that are automatically created for every
module in Python. Some of the most commonly used module attributes
are:

• __name__ - This attribute contains the name of the module as a
string. If the module is being run as the main program, this attribute
will be set to __main__.
• __doc__ - This attribute contains the module’s documentation
string, which is specified at the top of the module using triple quotes
""".
• __file__ - This attribute contains the file path of the module as a
string.
• __package__ - This attribute contains the package name of the
module as a string, or None if the module is not part of a package.
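These attributes can be inspected on any imported module; for example, using the standard library's json package:

```python
import json

print(json.__name__)      # the module's name
print(json.__package__)   # the package it belongs to
print(json.__file__)      # path to the module's source file
print(json.__doc__.splitlines()[0])  # first line of the docstring
```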

if __name__ == "__main__":

The process of importing a module causes the Python interpreter to run
the module. In order to distinguish between a module being run as the
main program and it being run as an import, the
if __name__ == "__main__" idiom is used. __name__ will only be equal
to __main__ when the file is being executed as the main program;
__name__ will be set to the module name in all other instances.

1 # ./script.py
2
3 import module
4
5 def main():
6 print(module.add(1, 2))
7
8 if __name__ == "__main__":
9 main()

Packages

In addition to modules, a collection of files and folders can be organized
into packages. A package in Python is a way to group together related
modules and the package name serves as a prefix for the modules
it contains. For example, a package named mypackage could contain
modules mypackage.module1, mypackage.module2 and so on.
To create a package, you need to create a directory with the package
name, and within that directory, you can have one or more modules.
The directory should contain an __init__.py file, which is a special
file that tells Python that this directory should be treated as a package.
Outside the directory you should have a setup file. Typically these come
in the form of either a setup.py file or a pyproject.toml file. The
setup.py file is a setup script that the setuptools package uses to
configure the build of a python package. A typical setup.py file will
define a function called setup() that is used to specify build parameters
and metadata. A pyproject.toml file accomplishes much of the same
functionality, but uses a declarative configuration scheme to define the
project’s metadata, dependencies, and build settings. For now, we’ll
just use the pyproject.toml toolchain, and introduce setup.py later
when necessary.
Let’s consider the following package structure:

1 my_package/
2 logging/
3 log.py
4 math/
5 addition.py
6 subtraction.py
7 __init__.py
8 __main__.py
9 pyproject.toml

The pyproject.toml file should, at minimum, define which build
system should be used to install the package. The default build system
for Python packaging is setuptools, together with the wheel package.

1 # ./pyproject.toml
2
3 [build-system]
4 requires = ["setuptools", "wheel"]

Next, inside the my_package directory there should be an __init__.py
file. For this example it is blank, but you can place any number of imports
in this file for objects you wish to be included in the top-level package
namespace. We can also optionally add a __main__.py file, which is
executed when the package is executed from the terminal using the
syntax python -m my_package.

1 # ./my_package/__main__.py
2
3 if __name__ == "__main__":
4 print("hello from main!")

Inside the my_package directory are two folders. The first is a folder called
logging, and inside that folder there is a log.py file. There is only one
function in this file, named fn. This function simply calls the print()
function.

1 # ./my_package/logging/log.py
2
3 def fn(*args, **kwargs):
4 print(*args, **kwargs)

Inside the second folder, math, are two files. One file, addition.py,
contains a function which adds two numbers. This file also calls the fn
from logging.log to print out the two numbers whenever it is called. The
second file, subtraction.py does the same, except it returns the result
of subtracting the two numbers, instead of adding them.

1 # ./my_package/math/addition.py
2
3 from ..logging.log import fn
4
5 def add(a, b):
6 fn(a, b)
7 return a + b

1 # ./my_package/math/subtraction.py
2
3 from my_package.logging.log import fn
4
5 def subtract(a, b):
6 fn(a, b)
7 return a - b

Imports within packages

Packages allow for both relative and absolute imports across the package.
Relative imports allow you to specify the package and module names
relative to the current namespace using dot notation. For example, in
addition.py the log module’s fn is imported relatively. The logging
package sits one level up the folder tree, so the two leading dots in
..logging reference it relative to the current module’s directory. As
another example, from the addition.py file you could import the
subtract function by specifying the relative import from
.subtraction import subtract. Absolute imports on the other hand
specify the full package and module name, as seen in subtraction.py.
This is more explicit and straightforward, but in deeply nested package
directories it can get verbose.

Installation

In order to make use of this package, we need to install it. Python’s
default package installer is pip, which stands for “pip installs packages”.
Pip also allows you to easily download and install packages
from PyPI, the Python Package Index, which is an online repository
of Python packages. Packages are installed from the package index by
running python -m pip install <package_name> in your terminal.
For local packages, you can install from the package directory using
python -m pip install ./. The -e flag can be optionally passed so
that the package is installed as --editable. This links the installed
python package to the local developer instance, so that any changes made
to the local codebase are automatically reflected in the imported module.

1 root@e08d854dfbfe:/code# ls
2 my_package pyproject.toml
3
4 root@e08d854dfbfe:/code# python -m pip install -e .
5 Obtaining file:///code
6 Installing build dependencies ... done
7 Checking if build backend supports build_editable ... done
8 Getting requirements to build editable ... done
9 Preparing editable metadata (pyproject.toml) ... done
10 Building wheels for collected packages: my-package
11 Building editable for my-package (pyproject.toml) ... done
12 Created wheel for my-package: filename=my_package-0.0.0-0.edita\
13 ble-py3-none-any.whl size=2321 sha256=2137df7f84a48acdcb77e9d1ab3\
14 33bf838693910338918fe16870419cb351979
15 Stored in directory: /tmp/pip-ephem-wheel-cache-qqft00bw/wheels\
16 /71/fd/a9/eb23a522d4ed2deb67e9d98937897b0b77b5bf9c1ac50a2378
17 Successfully built my-package
18 Installing collected packages: my-package
19 Successfully installed my-package-0.0.0
20 WARNING: Running pip as the 'root' user can result in broken perm\
21 issions and conflicting behaviour with the system package manager\
22 . It is recommended to use a virtual environment instead: https:/\
23 /pip.pypa.io/warnings/venv
24
25 root@e08d854dfbfe:/code# python
26 Python 3.11.1 (main, Jan 17 2023, 23:30:27) [GCC 10.2.1 20210110]\
27 on linux
28 Type "help", "copyright", "credits" or "license" for more informa\
29 tion.
30 >>> from my_package.math import addition
31 >>> addition.add(1, 2)
32 1 2
33 3
34 >>>
35 >>> from my_package.math import subtraction
36 >>> subtraction.subtract(3, 4)
37 3 4
38 -1
39 >>>
40 root@71c53592c77f:~# python -m my_package
41 hello from main!
Part III. The Python Standard Library
The Python Standard Library is a collection of modules and functions
that are included with every Python installation. It provides a range of
functionality that can be used to accomplish a wide variety of tasks, such
as string and text processing, file and directory manipulation, network-
ing, mathematics and statistics, multithreading and multiprocessing,
date and time operations, compression and archiving, and logging and
debugging. The inclusion of such an expansive standard library by
default means that developers can focus on writing their application
code, rather than worrying about recreating standard implementations
from scratch. It is a big reason as to why Python is often referred to as a
“batteries included” language.

Chapter 15. Virtual Environments

As we start to create packages and integrate other packages
into projects, we’re going to quickly run into the issue where we want
to keep project dependencies separate. This is where Python virtual
environments come in handy. Virtual environments allow us to create
isolated environments with their own independent set of dependencies,
which can be managed separately from other projects and the global
Python environment.
Without virtual environments, different projects might require different
versions of the same package or even completely different packages.
Installing or upgrading a package in one project could affect the behavior
of other projects, potentially causing errors or breaking functionality.
Using virtual environments ensures that each project has access to the
specific versions of packages that it needs, without worrying about
conflicts or breaking other projects. It also allows for easier sharing and
portability of projects, as the environment and its dependencies can be
packaged and distributed together with the project.
To create a virtual environment, start by navigating in your terminal to
a project directory. Next, run the command python -m venv $name,
where $name is the name of your virtual environment (typically just
venv or .venv). This will create a directory in the current working
directory with the folder name $name. To note, if you’re using a version
control tool like git, be sure to add this folder to the .gitignore file so
packages aren’t added to the project VC history.

1 root@aa41f93b0e59:~# python -m venv venv
2 root@aa41f93b0e59:~# ls
3 venv

Inside the newly created venv directory, we’ll see a few folders. First is
the ./venv/bin folder (on Windows this will be the ./venv/Scripts
folder) which is where executables will be placed. The two files within
this folder that are worth knowing about are the python file, which
is a symlink to the binary executable of the global python version,
and the activate script, which for POSIX systems (linux, mac) is
generally ./venv/bin/activate (or an analogous activate.fish
or activate.csh if using either the fish shell or csh/tcsh shells,
respectively), and for Windows is ./venv/Scripts/Activate.ps1 if
using PowerShell. The activate script acts differently depending on
which platform you are using, but in general its responsibility is to add
the executables folder to your shell’s PATH. Doing this ensures that
when the shell goes to execute python, it finds the virtual environment
executable before the global install. The script also sets environment
variables such as VIRTUAL_ENV so that the interpreter knows it is
running in a virtual environment, updates the shell
prompt to reflect that the virtual environment is active, and adds a
deactivate command to restore the shell to its original state once we
are done with the virtual environment.

1 root@cd9ed409ec97:~# which python
2 /usr/local/bin/python
3 root@cd9ed409ec97:~# echo $PATH
4 /usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:\
5 /sbin:/bin
6 root@cd9ed409ec97:~# . venv/bin/activate
7 (venv) root@cd9ed409ec97:~# which python
8 /root/venv/bin/python
9 (venv) root@cd9ed409ec97:~# echo $PATH
10 /root/venv/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr\
11 /sbin:/usr/bin:/sbin:/bin
12 (venv) root@cd9ed409ec97:~#

Apart from the ./venv/bin folder, the venv package also creates a
./venv/lib folder. Inside this folder nested under a python version
folder is a folder called site-packages. This site-packages directory
inside the ./venv/lib/ directory of the virtual environment is where
third-party packages are installed when the virtual environment is
active.
1 (venv) root@959fd3c7f2e4:~# python -m pip install ciso8601
2 Collecting ciso8601
3 Downloading ciso8601-2.3.0-cp311-cp311-manylinux_2_5_x86_64.man\
4 ylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (39\
5 kB)
6 Installing collected packages: ciso8601
7 Successfully installed ciso8601-2.3.0
8
9 [notice] A new release of pip available: 22.3.1 -> 23.0.1
10 [notice] To update, run: pip install --upgrade pip
11 (venv) root@13be81b026f6:~# ls -1a ./venv/lib/python3.11/site-pac\
12 kages/
13 .
14 ..
15 _distutils_hack
16 ciso8601
17 ciso8601-2.3.0.dist-info
18 ciso8601.cpython-311-x86_64-linux-gnu.so
19 distutils-precedence.pth
20 pip
21 pip-22.3.1.dist-info
22 pkg_resources
23 setuptools
24 setuptools-65.5.0.dist-info

In this example, using python -m pip to install the ciso8601 library,
a ciso8601 folder was created in the virtual environment’s site-
packages folder. This library install can now be imported into projects
which run in the context of this virtual environment. This library is
separate from the global installation of python; attempting to import
it outside of the context of the virtual environment will cause an
ImportError.
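One way to check, from inside Python itself, whether a virtual environment is active is to compare sys.prefix against sys.base_prefix; inside a venv the two differ. A small sketch:

```python
import sys

# Inside an active virtual environment, sys.prefix points into the
# venv directory while sys.base_prefix still points at the global
# installation; outside a venv the two values are identical.
in_venv = sys.prefix != sys.base_prefix
print("virtual environment active:", in_venv)
print("packages resolve under:", sys.prefix)
```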
Chapter 16. Copy

In previous sections, we made reference to the idea that, when you
make an assignment to a variable a from a differing variable b, the
assignment operation merely creates a new reference to the value
that b is currently referencing. In cases where this value is a mutable
collection, changes to b will be reflected in a, and vice versa.
If instead we want to make a separate and unique object, where state is
not bound between the two variables, we need to instead make a copy
of that value, and then assign that copy, instead of assigning the original
variable. Python provides a copy module in the standard library for just
this purpose.
The copy module defines two functions, copy and deepcopy. The
function copy.copy(item) creates a shallow copy of a given item
(meaning, only make a copy of the object itself). Comparatively,
copy.deepcopy(item) creates a deeply nested copy (meaning, each
mutable object which is referenced by the object is also copied).
Let's consider an example. Given an object my_list which is a list of
lists, we can call copy.copy on this list, which returns a value we assign
to shallow_copy. The shallow_copy is a copy of the list my_list, so
calling id() on shallow_copy and my_list yields different identifiers,
as they are different lists. However, calling id() on any of the items
of the two lists, such as the item at the zeroth index, yields the same
identifier, as the contents of both lists reference the same items.

>>> my_list = [["a", "b", "c"], [1, 2, 3]]

>>> import copy
>>> shallow_copy = copy.copy(my_list)

>>> id(my_list), id(shallow_copy)
(140097007507264, 140097007504960)  # id's are different

>>> id(my_list[0]), id(shallow_copy[0])
(140097007512368, 140097007512368)  # id's are the same

This can be contrasted against the action of deepcopy. Given the same
object my_list, we can call copy.deepcopy on this list, which returns
a variable we assign to deep_copy. The deep_copy is a copy of the list
my_list, so calling id() on deep_copy and my_list yields different
identifiers, as they are different lists. The difference however is that
now if we call id() on any of the items of the two lists, such as the
item at the zeroth index, the call yields different values. This is because
copy.deepcopy recursively traverses collections, making copies of
each item, until the new copy is completely independent of the original
item.

>>> deep_copy = copy.deepcopy(my_list)
>>> id(my_list), id(deep_copy)
(140097007554944, 140097007504960)  # id's are different

>>> id(my_list[0]), id(deep_copy[0])
(140097007345472, 140097007551616)  # id's are also different
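The practical consequence of this shared structure is easiest to see by mutating a nested item. In the sketch below, a change made through the shallow copy is visible in the original, while the deep copy is fully independent:

```python
import copy

my_list = [["a", "b", "c"], [1, 2, 3]]
shallow_copy = copy.copy(my_list)
deep_copy = copy.deepcopy(my_list)

# The shallow copy shares its inner lists with the original...
shallow_copy[0].append("d")
print(my_list[0])  # ['a', 'b', 'c', 'd'] -- the original sees the change

# ...while the deep copy's inner lists are independent copies.
deep_copy[1].append(4)
print(my_list[1])  # [1, 2, 3] -- the original is unaffected
```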

Chapter 17. Itertools

The itertools library in Python is a collection of tools for working with
iterators. The main philosophy behind it is to provide a set of efficient,
reusable, and easy-to-use functions for common tasks related to iterators.
One of the main reasons for its existence is to make it easier to work
with large datasets or sequences of data without having to write complex
and repetitive code. The library also aims to promote a more functional
programming style, as many of its functions are designed to be used in
combination with other functions to build up more complex operations.
The result is that itertools allows for a more expressive and concise
codebase.

Chaining Iterables

The itertools.chain() function is used to take multiple iterators and
"chain" them together into a single iterator. This can be useful when you
have multiple lists, tuples, or other iterable objects that you want to treat
as one sequence. itertools.chain() is variadic, so you can pass any
number of iterables to the function call, and it’ll return a single iterable.

>>> import itertools
>>> a = [1, 2]
>>> b = [3]
>>> list(itertools.chain(a, b))
[1, 2, 3]

If the iterables are already in an iterable collection, the
itertools.chain.from_iterable() classmethod can be used
instead as a way to iterate over an iterable of iterables.

>>> import itertools
>>> lists = [(1, 2), (3, 4)]
>>> list(itertools.chain.from_iterable(lists))
[1, 2, 3, 4]

Filtering Iterables

The itertools.compress() function is used to filter an iterator based
on a second masking iterator. It returns an iterator that only includes
the elements of the first iterator for which the corresponding element
in the masking iterator is truthy. The two iterators need not be the
same length: the returned iterator stops as soon as either the data or
the masking iterator is exhausted.

>>> import itertools
>>> data = [1, 2, 3, 4, 5, 6]
>>> mask = [True, False, True, False, True, False]
>>> result = itertools.compress(data, mask)
>>> list(result)
[1, 3, 5]
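When the two iterables differ in length, itertools.compress() does not pad or raise an error; it simply stops as soon as the shorter of the two is exhausted. A short demonstration:

```python
import itertools

data = [1, 2, 3, 4, 5]
mask = [True, False, True]  # shorter than data

# Iteration stops when the mask runs out, so 4 and 5 are never considered.
result = list(itertools.compress(data, mask))
print(result)  # [1, 3]
```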

The itertools.filterfalse() function is used to create an iterator
that excludes elements from the input iterator which satisfy the callback.
It has the opposite action of the built-in filter() function.

>>> import itertools
>>> data = range(1, 10)
>>> list(itertools.filterfalse(lambda x: x%2 or x%3, data))
[6]

Finally, if you only wish to filter out the elements of an iterable while
the callback is satisfied, the itertools.dropwhile() function can be
used. This function creates an iterator that drops elements from the
input iterator as long as a given callback is truthy for those elements.
Once the callback returns a falsy value for an element, the remaining
elements are included in the output iterator. There's also an analogous
itertools.takewhile() function, which instead of dropping values,
yields values until the callback returns falsy.

>>> import itertools
>>> data = range(1, 10)
>>> list(itertools.dropwhile(lambda x: x%2 or x%3, data))
[6, 7, 8, 9]
>>> list(itertools.takewhile(lambda x: x < 5, data))
[1, 2, 3, 4]

Cycling Through Iterables

The itertools.cycle() function is used to create an iterator that
repeatedly iterates over the elements of a given iterable. It creates an
infinite iterator that cycles through the elements of the input iterable.

>>> import itertools
>>> data = [1, 2, 3]
>>> result = itertools.cycle(data)
>>> next(result)
1
>>> next(result)
2
>>> next(result)
3
>>> next(result)
1
>>> next(result)
2

Creating Groups

The itertools.groupby() function is used to group elements from
an iterator into groups based on a given callback. The callback takes
one argument, which is an element from the input iterator, and returns
a value that is used to determine which group the element belongs to.
The groupby() function returns an iterator that produces pairs of the
form (key, group), where key is the result of the callback on a given
element, and group is an iterator which continues to yield items from
the iterable so long as callback(item) produces the same key.

>>> import itertools
>>> data = [1, 2, 4, 3, 5, 6]
>>> groups = itertools.groupby(data, lambda x: x % 2)
>>> for k, g in groups:
...     print(k, list(g))
...
1 [1]
0 [2, 4]
1 [3, 5]
0 [6]

>>> data = [1, 2, 3, 4, 5, 6]
>>> groups = itertools.groupby(data, lambda x: x % 2)
>>> c = {}
>>> for k, g in groups:
...     c.setdefault(k, []).extend(g)
...
>>> c
{1: [1, 3, 5], 0: [2, 4, 6]}

Slicing Iterables

The itertools.islice() function is used to create an iterator that
returns selected elements from the input iterator. In its three-argument
form it takes the input iterator, a start index, and a stop index. The
function returns an iterator that starts at the start index and continues
until the stop index, or the end of the input iterator, whichever comes
first. An optional step argument can be provided to skip over elements.

>>> import itertools
>>> data = [1, 2, 3, 4, 5, 6]
>>> list(itertools.islice(data, 2, 5))
[3, 4, 5]
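islice() also accepts a shorter two-argument form, islice(iterable, stop), and the optional step mentioned above as a fourth argument:

```python
import itertools

data = [1, 2, 3, 4, 5, 6]

# Two-argument form: take items from the beginning up to the stop index.
first_three = list(itertools.islice(data, 3))
print(first_three)  # [1, 2, 3]

# Four-argument form: a step of 2 skips every other element.
stepped = list(itertools.islice(data, 0, None, 2))
print(stepped)  # [1, 3, 5]
```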

Zip Longest

The itertools.zip_longest() function allows for iterators of
different lengths. It combines elements from multiple iterators into a single
iterator of tuples. When one of the input iterators is exhausted, it fills in
the remaining values with a specified fillvalue.

>>> import itertools
>>> data = [(1, 2, 3), (4, 5, 6, 7)]
>>> list(itertools.zip_longest(*data, fillvalue=-1))
[(1, 4), (2, 5), (3, 6), (-1, 7)]

Chapter 18. Functools

The functools library in Python is a collection of tools for working
with functions and functional programming. One of the main reasons
for its existence is to make it easier to write efficient, expressive, and
maintainable code by providing a set of higher-order functions designed
to work with other functions. These functions are highly composable and
reusable, making them well-suited for use in functional programming.

Partials

The functools.partial() function is used to create a new function
with some of the arguments pre-filled. It takes a function and any number
of parameters, and returns a new function that, when called, will call
the original function with the pre-filled arguments and any additional
arguments passed to the new function.

>>> import functools
>>> fn = functools.partial(lambda x, y: x + y, 1)
>>> fn(2)
3

An analogous functools.partialmethod() is also available, which
allows you to emulate this behavior on object methods.
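As a sketch of that behavior, the hypothetical Cell class below (modeled on the pattern in the standard library documentation) uses partialmethod to pre-fill the state argument of its set_state method:

```python
import functools

class Cell:
    def __init__(self):
        self._alive = False

    def set_state(self, state):
        self._alive = state

    # partialmethod pre-fills the `state` argument of set_state
    set_alive = functools.partialmethod(set_state, True)
    set_dead = functools.partialmethod(set_state, False)

cell = Cell()
cell.set_alive()
print(cell._alive)  # True
```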

Reduce

The functools.reduce() function is used to apply a binary function to
the elements of an iterable in a cumulative way. The function takes two
arguments: a function and an iterable, and returns a single value. The
function is applied cumulatively to the elements of the iterable, from left
to right, so as to reduce the iterable to a single value. An optional third
value can be passed as the initial value, instead of relying on the first
element of the iterable.

>>> import functools
>>> data = range(10)
>>> functools.reduce(lambda x, y: x+y, data)
45
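With an initial value, the reduction is seeded before the first element is consumed; if the iterable is empty, the initial value is returned as-is:

```python
import functools

# 100 + 1 + 2 + 3
seeded = functools.reduce(lambda x, y: x + y, [1, 2, 3], 100)
print(seeded)  # 106

# With an empty iterable, the initial value comes back unchanged
# (without it, reduce() would raise a TypeError here).
empty = functools.reduce(lambda x, y: x + y, [], 100)
print(empty)  # 100
```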

Pipes

Pipes are a concept in functional programming which allow developers
to chain function calls: the output of one function is passed as
the input to another function, creating a pipeline through which data is
passed.
We can emulate this functionality in Python using
functools.reduce(), where instead of having an iterable of data, we
have an iterable of functions.

>>> import functools
>>> fns = (lambda x: x + 2, lambda x: x ** 2, lambda x: x // 3)
>>> functools.reduce(lambda x, f: f(x), fns, 7)
27

Caching

functools.cache() is a function that is used to cache the results of a
function call, so that the function does not need to be computed again
for the same input. It stores the result of the function call for a specific
input in a cache. If the function is called again with the same input, the
cached result is returned instead of recomputing the result. This can be
useful for reducing the amount of computation required for a function,
and can also be useful for improving the performance of a function that
is called multiple times with the same input.

>>> import functools
>>> def fibonacci(n):
...     if n == 2 or n == 1:
...         return 1
...     return fibonacci(n-1) + fibonacci(n-2)
...
>>> fibonacci(40)  # this takes ~5 seconds on my Ryzen 5800x desktop
>>> fibonacci = functools.cache(fibonacci)
>>> fibonacci(40)  # this operation is almost immediate

In some instances it may be necessary to limit the cache size. In this case,
you can use functools.lru_cache() to set an optional maxsize for
the cache, where the “least recently used” values that exceed the cache
size are dropped. By default this value is 128 items. The cache can be
invalidated using the .cache_clear() method on the cache.

>>> import functools
>>> def fibonacci(n):
...     if n == 2 or n == 1:
...         return 1
...     return fibonacci(n-1) + fibonacci(n-2)
...
>>> fibonacci = functools.lru_cache(maxsize=10)(fibonacci)
>>> fibonacci(40)  # this operation is almost immediate
>>> fibonacci.cache_clear()

Finally, functools.cached_property() is a function that is used to
cache the results of a method call on an object, so that the method does
not need to be computed again for the same object. It stores the result
of the method call for a specific object in a cache, and if the method is
called again with the same object, the cached result is returned instead
of recomputing the value.

>>> import functools
>>> class MyClass:
...     @functools.cached_property
...     def value(self):
...         print("called")
...         return True
...
>>> my_object = MyClass()
>>> my_object.value
called
True
>>> my_object.value
True

It should be noted that cached_property() is thread-safe, but its
underlying implementation is slow in multithreaded workloads. There
is some discussion about reimplementing it at a later date.

Dispatching

The functools.singledispatch() function is used to create a
single-dispatch generic function, which can be used to define
multiple implementations of a function for different types of input. The
appropriate implementation is selected based on the type of the
function's first argument. The analogous method implementation is
functools.singledispatchmethod(). It supports nesting with other
decorators so long as it is the outermost decorator.

>>> import functools
>>> fn = functools.singledispatch(
...     lambda x: print("unknown type:", x)
... )
>>> fn.register(int)(lambda x: print(f"type int: {x}"))
>>> fn.register(float)(lambda x: print(f"type float: {x}"))
>>> fn("a")
unknown type: a
>>> fn(1)
type int: 1
>>> fn(1.0)
type float: 1.0
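The same dispatch table is more commonly built with decorator syntax: @functools.singledispatch wraps the default implementation, and each specialization is added with the generated function's register attribute, which can infer the dispatch type from a type annotation. A sketch using a hypothetical describe function:

```python
import functools

@functools.singledispatch
def describe(x):
    return f"unknown type: {x}"

@describe.register
def _(x: int):  # dispatch type inferred from the annotation
    return f"type int: {x}"

@describe.register(float)  # or the type can be passed explicitly
def _(x):
    return f"type float: {x}"

print(describe("a"))  # unknown type: a
print(describe(1))    # type int: 1
print(describe(1.0))  # type float: 1.0
```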

Chapter 19. Enums, NamedTuples, and Dataclasses

As programs become more complex and interconnected, it becomes
important to limit the variability of state wherever you can. By defining
a set of named values and limiting the possible options, you can reduce
the chance of introducing bugs and errors due to unexpected states. It
also makes your code more readable and understandable, as the meaning
and purpose of specific values are clear and consistent throughout the
program.
In this chapter we're going to look at three separate methods for
enumerating values in your code - Enums, NamedTuples, and Dataclasses.

Enums

Enums (enumerated types) in Python are used to define a set of named
values, which can be used to represent specific states in a program. They
are a way to give more structure and meaning to a set of values. Enums
are defined using the Enum class. The Enum class is used to define the
enumerated type, and each member of the Enum is defined as a class
variable. Each member is given a name and a value, and the value can
be any valid Python object, but it’s common to use integers or strings as
values. If the value is of no consequence to you, you can use any object
with a unique identifier, such as object().
An Enum can also be created using a factory function. In this case, the
first argument is the name of the Enum, and the second argument is
an iterable containing each class variable. The values are automatically
assigned as integers incrementing from 1.

from enum import Enum

class Weekends(Enum):
    Saturday = object()
    Sunday = object()

Weekdays = Enum(
    "Weekdays",
    (
        "Monday",
        "Tuesday",
        "Wednesday",
        "Thursday",
        "Friday",
    ),
)

Once an Enum is defined, you can use the enumerated values as if
they were variables. Enums in Python are immutable, meaning that
their values cannot be changed after they are created, and they are also
comparable, meaning that you can use them in comparison operations.

match current_day:
    case Weekdays.Monday:
        print("Today is Monday")
    case Weekdays.Tuesday:
        print("Today is Tuesday")
    case Weekdays.Wednesday:
        print("Today is Wednesday")
    case Weekdays.Thursday:
        print("Today is Thursday")
    case Weekdays.Friday:
        print("Today is Friday")
    case Weekends.Saturday | Weekends.Sunday:
        print("Weekend!")
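Each member also carries a .name and a .value, members can be looked up by name or by value, and iterating over an Enum yields its members in definition order. A short sketch using a hypothetical Color enum:

```python
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2

print(Color.RED.name)   # RED
print(Color.RED.value)  # 1
print(Color["GREEN"])   # lookup by name -> Color.GREEN
print(Color(2))         # lookup by value -> Color.GREEN
print(list(Color))      # members in definition order
```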

NamedTuples

NamedTuples are a special type of tuple in Python that allows you to give
names to the elements of the tuple. They are designed to be interoperable
with regular tuples, but they also allow you to access elements by name,
in addition to by index.
The NamedTuple class from the typing module can be used to create a
NamedTuple. The derived class defines attributes via type annotations.
Instance variables are defined at instantiation. A NamedTuple can be
instantiated using variadic arguments or keyword arguments; in the
case of using variadic args, the order of attributes defined in the class
determine which indexed value is assigned to the attribute.
A NamedTuple can also be created using the factory function
namedtuple in the collections module. In this case, the first argument is
the name of the NamedTuple, and the second argument is an iterable
containing the name of each attribute.

>>> from typing import NamedTuple
>>> class Point(NamedTuple):
...     x: int | float
...     y: int | float
...
>>> point = Point(1, 2)
>>> point
Point(x=1, y=2)
>>> point.x
1
>>> from collections import namedtuple
>>> Coordinate = namedtuple("Coordinate", ('x', 'y'))
>>> coordinate = Coordinate(1, 2)
>>> coordinate
Coordinate(x=1, y=2)
>>> coordinate.y
2
>>> list(coordinate)
[1, 2]

Dataclasses

Dataclasses are the most feature-rich implementation of data as objects.
They allow you to define classes that are less verbose than traditional
Python classes, while still providing useful functionality such as automatic
generation of default special methods.
The dataclasses.dataclass() decorator automatically generates a
default implementation of special methods like __init__, __repr__,
__eq__, __lt__, etc., so you don't have to write them yourself. This
allows you to focus on the data that the class is holding, rather than the
mechanics of the class.

>>> from dataclasses import dataclass
>>> @dataclass
... class Person:
...     name: str
...     age: int
...     country: str = "USA"
...
>>> Person("Reid", 28)
Person(name='Reid', age=28, country='USA')
>>> Person("Sai", 27, "India")
Person(name='Sai', age=27, country='India')

The dataclass decorator takes multiple optional arguments. They are as
follows:

• init - a boolean indicating whether or not to generate an __init__ method.
• repr - a boolean indicating whether or not to generate a __repr__ method.
• eq - a boolean indicating whether or not to generate an __eq__ method.
• order - a boolean indicating whether or not to generate __lt__, __le__, __gt__, and __ge__ methods.
• frozen - a boolean indicating whether or not to make the class immutable.
• unsafe_hash - a boolean indicating whether or not to force create a hash function. If True, all mutable attributes need to be passed over by including hash=False in the corresponding field factory function.
• slots - a boolean indicating whether or not to prefer a __slots__ architecture to store attributes over a dictionary.
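As a sketch of two of these options, the hypothetical Version class below sets frozen=True and order=True: the comparison methods are derived from the fields in declaration order, and assigning to an attribute after construction raises dataclasses.FrozenInstanceError:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True, order=True)
class Version:
    major: int
    minor: int

v1 = Version(1, 2)
v2 = Version(1, 3)
print(v1 < v2)  # True -- compares (1, 2) against (1, 3)

try:
    v1.minor = 9  # frozen instances reject attribute assignment
except FrozenInstanceError:
    print("immutable")
```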

Mutable default values can be provided using the field function.
field takes an optional default_factory argument that must be a
callable of zero arguments.

>>> from dataclasses import dataclass, field
>>> @dataclass
... class Group:
...     people: list[str] = field(default_factory=list)
...
>>> Group(["Jessie", "Ryan"])
Group(people=['Jessie', 'Ryan'])

Chapter 20. Multithreading and Multiprocessing

Parallelism refers to the concept of running multiple tasks
simultaneously, in order to improve performance and speed up the overall
execution of a program. In Python, parallelism can be achieved through
the use of multithreading and multiprocessing.
Multithreading involves the use of multiple threads, which are small
units of a process that can run independently of each other. The
threading module can be used to create and manage threads. Multithreading
can be useful for tasks that involve a lot of waiting, such as waiting for
input/output operations to complete or waiting for a response from a
network resource.
Multiprocessing, on the other hand, involves the use of multiple
processes, which are separate instances of a program that run concurrently.
The multiprocessing module can be used to create and manage processes.
Multiprocessing can be useful for tasks that involve a lot of computation,
such as data processing or machine learning tasks.
It’s important to note that while both multithreading and multiprocess-
ing can be used to achieve parallelism in Python, they have different
use cases and tradeoffs. Multithreading is typically used for IO-bound
and high-level structured network code, while multiprocessing is used
for CPU-bound and low-level code.

Multithreading

In Python, the threading module provides means through which to create
and manage threads. A thread is a separate execution flow that runs in
parallel with the main program. Threads share the same memory space
as the main program, which allows them to access and share data.
Threading uses a preemptive task switching model, where the interpreter
schedules and switches between threads. The interpreter can interrupt a
running thread at any time, in order to give another thread a chance to
run. Python uses a Global Interpreter Lock (GIL) to achieve this: only
the thread which has acquired the GIL may execute, and there is only
one lock, the global interpreter lock, which any thread can acquire. The
GIL is released when a thread makes a blocking I/O call, such as reading
from a file or waiting for a network response. When the call returns, the
GIL must be acquired again before the thread can continue to execute.
This means that if one thread is blocked on I/O, another thread can take
over and execute Python code. However, if all threads are performing
CPU-bound tasks, they will be effectively running sequentially and not
concurrently.
The threading module in Python provides a way to create and manage
threads. To create a new thread, you can use the Thread class from
the threading module. The Thread class takes a target function as an
argument, which is a function that will be executed by the thread.
The function args and kwargs can be passed through using keyword
arguments of the same name.

>>> import threading
>>> def my_function(*args, **kwargs):
...     print(f"Hello from the thread! {args} {kwargs}")
...
>>> thread = threading.Thread(target=my_function, args=(1,), kwargs={"a": 1})
>>> thread.start()
Hello from the thread! (1,) {'a': 1}

The Thread class can also be subclassed. It has several methods and
attributes that can be used to control and manipulate threads. Some of
the most commonly used methods are as follows:

• Thread.start() - starts the execution of the thread. This method creates a new native thread and calls the run method on it.
• Thread.run() - entry point of the thread. This method is where the code that will be executed by the thread should be placed.
• Thread.join(timeout=None) - waits for the thread to complete. If the optional timeout argument is provided, the method will wait for at most timeout seconds.
• Thread.is_alive() - returns True if the thread is executing, and False otherwise.

If you want to subclass the Thread object, you need to be sure to
explicitly call the parent class's __init__ method inside the child
class's __init__ method, or use a @classmethod for instantiation.
Furthermore, you should not call the .run() method directly. Rather,
use the .start() method to start the thread. The .start() method
from threading.Thread will create a new native thread, and it will
call the .run() method.

>>> import threading, time
>>> class MyThread(threading.Thread):
...     @classmethod
...     def create(cls, value):
...         self = cls()
...         self.value = value
...         self.start()
...         return self
...     def run(self):
...         time.sleep(self.value)
...         print(f"Thread slept {self.value} seconds")
...
>>> threads = [MyThread.create(i) for i in range(5)]
>>> # .join() returns None, so any() will exhaust the expression
>>> any(t.join() for t in threads)
Thread slept 0 seconds
Thread slept 1 seconds
Thread slept 2 seconds
Thread slept 3 seconds
Thread slept 4 seconds
False

Thread Locks

Threads have the ability to operate on shared memory. However, access
to that memory must be "thread safe", i.e. controlled in such a manner
that no two threads are attempting to change shared mutable state at
the same time. Failure to keep shared memory thread safe can lead to
undefined behavior.

>>> import threading, time
>>> value = 0
>>> def increment():
...     global value
...     tmp = value
...     time.sleep(0.1)
...     value = tmp + 1
...
>>> threads = [
...     threading.Thread(target=increment)
...     for _ in range(2)
... ]
>>> for t in threads:
...     t.start()
...
>>> for t in threads:
...     t.join()
...
>>> value  # should be 2
1

The threading.Lock() class provides a way to synchronize access to
shared resources between multiple threads. A lock is a synchronization
object that can be in one of two states: "locked" or "unlocked". When
a thread acquires a lock, it changes the state of the lock to "locked" and
prevents other threads from acquiring the lock. When the thread releases
the lock, it changes the state of the lock to "unlocked" and allows other
threads to acquire the lock.


The Lock() class defines two methods: .acquire() and .release()
that can be used to acquire and release the lock. The .acquire() method
can also take an optional blocking parameter, which defaults to True.
When blocking is True, the .acquire() method blocks the execution
of a thread until the lock can be acquired; when blocking is False,
the .acquire() method returns immediately with a boolean indicating
whether the lock was acquired or not. Additionally, the .acquire()
method takes an optional timeout parameter, which will release the lock
after a certain amount of time, in case of deadlock.
A lock object can also be used as a context manager. The arguments of
the __enter__ method are equivalent to the .acquire() method.

>>> import threading, time
>>> value = 0
>>> lock = threading.Lock()
>>> def increment():
...     with lock:
...         global value
...         tmp = value
...         time.sleep(0.1)
...         value = tmp + 1
...
>>> threads = [
...     threading.Thread(target=increment)
...     for _ in range(2)
... ]
>>> for t in threads:
...     t.start()
...
>>> for t in threads:
...     t.join()
...
>>> value  # with the lock, both increments are observed
2

Multiprocessing

The multiprocessing module in Python provides a way to write
concurrent code using multiple processes, instead of threads. The
multiprocessing module provides several classes and functions for creating and
managing processes, including Process, Queue, Lock, Pool, and Pipe.

Process and Pool

The Process() class can be used to create and manage new processes.
A process is a separate execution environment, with its own memory
space and Python interpreter. This allows you to take advantage of
multiple cores on a machine, and to work around the Global Interpreter
Lock (GIL) that prevents multiple threads from executing Python code
simultaneously. The multiprocessing.Process class has the same
API as the threading.Thread class.

>>> import multiprocessing, time
>>> def f(name):
...     print(f"Hello {name}")
...
>>> multiprocessing.Process(target=f, args=("Peter",)).start()
Hello Peter
>>> class MyProcess(multiprocessing.Process):
...     @classmethod
...     def create(cls, value):
...         self = cls()
...         self.value = value
...         self.start()
...         return self
...     def run(self):
...         time.sleep(self.value)
...         print(f"Process slept {self.value} seconds")
...
>>> ps = [MyProcess.create(i) for i in range(5)]
>>> any(p.join() for p in ps)
Process slept 0 seconds
Process slept 1 seconds
Process slept 2 seconds
Process slept 3 seconds
Process slept 4 seconds
False

The multiprocessing library also provides a Pool class that can be used
to orchestrate multiple tasks in parallel. A pool of worker processes is a
group of processes that can be reused to execute multiple tasks.

>>> import multiprocessing
>>> def double(n):
...     return n * 2
...
>>> with multiprocessing.Pool(processes=4) as pool:
...     results = pool.map(double, range(8))
...     print(results)
...
[0, 2, 4, 6, 8, 10, 12, 14]

In this example, a Pool() object is created with four processes and
assigned to the variable pool. The pool's .map() method is used to apply
double() to multiple inputs in parallel. The .map() method
returns a list of the results in the order of the input.
The Pool() class also provides several other methods for submitting
tasks, such as .imap(), .imap_unordered() and .apply(). The
.imap() and .imap_unordered() methods are similar to .map(),
except these methods return iterators which yield results lazily. The
distinction between the two is that .imap() will yield results in order,
and .imap_unordered() will yield results arbitrarily. Finally, the
.apply() method is used to submit a single task for execution, and it
blocks the main process until the task is completed.
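A sketch of the lazy submission style, using a hypothetical square function: .imap_unordered() yields each result as soon as its worker finishes, so only the set of results is guaranteed, not their order.

```python
import multiprocessing

def square(n):
    return n * n

with multiprocessing.Pool(processes=4) as pool:
    # Results arrive in completion order, not submission order.
    results = list(pool.imap_unordered(square, range(5)))

print(sorted(results))  # [0, 1, 4, 9, 16]
```

In a standalone script, particularly on platforms that start workers by spawning rather than forking, the pool creation should additionally be guarded under an if __name__ == "__main__": block.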

Process Locks

Similar to the threading.Lock() class, the multiprocessing.Lock()
class can be used to synchronize access to shared resources across
multiple processes. When a process acquires a lock, it changes the state
of the lock to “locked” and prevents other processes from acquiring the
lock. When the process releases the lock, it changes the state of the lock
to “unlocked” and allows other processes to acquire the lock.

The multiprocessing.Lock() class has the same API as the
threading.Lock() class. It has similar .acquire() and .release() methods,
and can be used as a context manager.

>>> import multiprocessing, time
>>> lock = multiprocessing.Lock()
>>> def increment(n):
...     with lock:
...         time.sleep(1-n)
...         print(n)
...
>>> ps = [
...     multiprocessing.Process(target=increment, args=(i,))
...     for i in range(2)
... ]
>>> for p in ps:
...     p.start()
...
>>> for p in ps:
...     p.join()
...
0
1

Pipes and Queues

Since processes run in isolated execution environments, it's difficult to
share data between them. You can't just pass a reference to an object
in the main process and expect separate processes to interact with it.
In order to establish interprocess communication, we need to use IPC
mechanisms to send and receive data. The multiprocessing module
provides both pipes and queues for this purpose.
multiprocessing.Pipe() creates a two-ended connection between
two processes, returning a pair of connection objects, one for each end
of the pipe. By default the connection is bidirectional (duplex=True):
both ends can send and receive messages. Passing duplex=False makes
the pipe one-way, where the first connection object may only receive
messages and the second may only send them. There's
also a multiprocessing.Queue() object which creates a queue that
is shared between multiple processes. It allows processes to put and get
messages in a thread-safe and process-safe way.

>>> import multiprocessing
>>> def sender(conn, item):
...     conn.send(item)
...     conn.close()
...
>>> downstream, upstream = multiprocessing.Pipe()
>>> process = multiprocessing.Process(
...     target=sender,
...     args=(upstream, "Hello!")
... )
>>> process.start()
>>> process.join()
>>> downstream.recv()
'Hello!'

In this example, a pipe is created, and then a second process is created.


The second process sends a message through the pipe and the main
process receives it.

1 >>> import multiprocessing


2 >>> def sender(queue, value):
3 ... queue.put(value)
4 ...
5 >>> queue = multiprocessing.Queue()
6 >>> process = multiprocessing.Process(target=sender, args=(queue,\
7 "Hello!"))
8 >>> process.start()
9 >>> process.join()
10 >>> queue.get()
11 'Hello!'

In this example, a queue is created, and then a second process is created.


The second process places a message in the queue for the main process
to get.

concurrent.futures

The concurrent.futures module in Python provides a high-level API


for asynchronously executing callables using threading or
multiprocessing. This module provides several classes and functions that abstract
away the details of thread and process management, and provide a
consistent interface for executing tasks in parallel.
The ThreadPoolExecutor class is used for asynchronously executing
callables using a pool of worker threads. A thread pool is a group
of worker threads that can be reused to execute multiple tasks. The
ThreadPoolExecutor class allows you to submit tasks to be executed
by the worker threads, and to retrieve the results of the tasks when they
are completed.
The ProcessPoolExecutor class is used for asynchronously executing
callables using a pool of worker processes. Similar to a thread pool,

a process pool is a group of worker processes that can be reused to


execute multiple tasks. It provides the same API as ThreadPoolExecutor,
but it is important to note that due to the nature of
processes, tasks will not share the same memory space and you will
therefore have to use interprocess communication mechanisms like
the multiprocessing.Queue() in order to communicate between the
various processes.
When you submit a task to be executed by a ThreadPoolExecutor
or ProcessPoolExecutor, the .submit() method returns a Future
object. You can use this Future object to check the status of the task,
retrieve the result of the task when it is completed, or wait for the task
to complete.
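To make this concrete, here is a minimal sketch of submitting a task and resolving its Future. The slow_square function is our own illustration, not part of the module's API:

```python
import concurrent.futures
import time


def slow_square(n):
    time.sleep(0.1)
    return n * n


with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    # .submit() schedules the callable and immediately returns a Future
    future = executor.submit(slow_square, 4)
    # .done() polls the task's status without blocking; at this point
    # the task is most likely still running
    print(future.done())
    # .result() blocks until the task completes, then returns its value
    value = future.result()

print(value)  # 16
```

Note that .result() also re-raises any exception the callable raised, which makes it a convenient single point for error handling.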
The concurrent.futures module in Python provides two functions
for working with a collection of Future objects: wait() and
as_completed().
The concurrent.futures.wait() function takes an iterable of Future
objects and blocks until all of the futures are completed or the optional
timeout is reached. A return_when parameter controls when the func-
tion returns; it can be set to FIRST_COMPLETED, FIRST_EXCEPTION,
or ALL_COMPLETED (the default value). If FIRST_COMPLETED is
used, the function will return as soon as any future completes. If
FIRST_EXCEPTION is used, the function will return as soon as any
future raises an exception. If ALL_COMPLETED is used, the function
will return only when all futures have completed, regardless of whether
any raised an exception. The function returns a named tuple of two sets,
done and not_done, containing the completed and pending futures.
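As a sketch of the return_when behavior (the work function and the delays here are illustrative):

```python
import concurrent.futures
import time


def work(delay):
    time.sleep(delay)
    return delay


with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(work, d) for d in (0.05, 0.5, 0.5)]
    # Returns as soon as the fastest future finishes; the result is a
    # named tuple of two sets, done and not_done
    done, not_done = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED
    )

print(len(done), len(not_done))  # typically 1 2
```

Here wait() returns after roughly 0.05 seconds with the fastest future in the done set, while the slower futures remain in not_done.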
The concurrent.futures.as_completed() function takes an iterable
of Future objects and returns an iterator that yields Future objects
as they are completed. The timeout parameter can be used to specify

a maximum time to wait for the futures to complete before stopping the
iteration.
It’s common to utilize as_completed when you want to process the
results of the tasks as soon as they are completed, regardless of the order
they were submitted. On the other hand, wait is useful when you want
to block until all the tasks are completed and retrieve the results in the
order they were submitted.

1 >>> import concurrent.futures


2 >>> def double(n):
3 ... return n * 2
4 ...
5 >>> with concurrent.futures.ThreadPoolExecutor() as exec:
6 ... results = [exec.submit(double, i) for i in range(10)]
7 ... values = [
8 ... future.result() for future in
9 ... concurrent.futures.as_completed(results)
10 ... ]
11 ...
12 >>> values
13 [12, 18, 10, 8, 2, 16, 0, 4, 6, 14]
14 >>> with concurrent.futures.ProcessPoolExecutor() as exec:
15 ... jobs = [exec.submit(double, i) for i in range(10)]
16 ... results = concurrent.futures.wait(jobs)
17 ... values = list(future.result() for future in results.done)
18 ...
19 >>> values
20 [0, 18, 6, 14, 2, 12, 16, 4, 8, 10]

Chapter 21. Asyncio

The asyncio library is a redesign of how Python handles I/O-bound and
network-related tasks, which were previously handled with the threading
module. The threading module relies on a preemptive task switching
mechanism for coordination between threads.
In preemptive multitasking, the scheduler is responsible for interrupting
the execution of a task and switching to another task. This means that the
interpreter can interrupt the execution of a thread and switch to another
thread at any point. This can happen when a thread exceeds its time slice
or when a higher-priority thread becomes ready to run.
In contrast, asyncio relies on cooperative task switching. In this
paradigm tasks voluntarily yield control to the scheduler, in this case
an event loop. Each task is individually responsible for releasing control
back to the scheduler, allowing other tasks to execute. In the asyncio
library, tasks are represented by coroutines, and they use an await
keyword to yield control to the event loop. The event loop schedules
the execution of coroutines and switches between them based on their
current states and the availability of resources.
In general, cooperative task switching is more efficient and less error-
prone than preemptive task switching, because it allows tasks to run for
as long as they need without interruption. However, it also requires more
explicit coordination between tasks, and it can lead to issues such as
deadlocks and livelocks if not implemented correctly. On the other hand,
preemptive task switching can be more complex to implement and it can
lead to race conditions and other synchronization issues, but it can also
prevent tasks from monopolizing resources.
asyncio is an extensive topic, and not everything will be covered in this

chapter. Our focus instead will be how developers can use asyncio in
the context of backend systems - framework developers are encouraged
to read the full spec in python’s official documentation.

Coroutines

In asyncio, tasks are represented by coroutines, which are special types


of generator functions that can be paused and resumed. Coroutines are
defined using the async def keyword, and it can use the await keyword
to suspend its execution and allow other tasks to run.

1 >>> import asyncio


2 >>> async def my_coroutine():
3 ... print("Hello ", end="", flush=True)
4 ... await asyncio.sleep(1)
5 ... print("World!")
6 ...
7 >>> asyncio.run(my_coroutine())
8 Hello World!

This example demonstrates a simple asynchronous coroutine in Python
using the asyncio library. It immediately prints “Hello”
to the console. Then the coroutine uses the await keyword to wait for
1 second using the asyncio.sleep(1) function. Finally, the coroutine
prints “World!” after the delay of 1 second.
It’s important to note that when using coroutines, you should avoid
blocking operations, such as waiting for file I/O or performing com-
putations, as it will block the event loop and prevent other tasks from
running. Instead, you should use non-blocking operations, such as
asyncio.sleep() or async with statements.
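When a blocking call is unavoidable, one escape hatch is asyncio.to_thread() (added in Python 3.9), which runs the callable in a worker thread so the event loop stays responsive. A minimal sketch, with an illustrative blocking function of our own:

```python
import asyncio
import time


def blocking_read():
    # Stands in for file I/O or any call that would stall the event loop
    time.sleep(0.1)
    return "data"


async def main():
    # Runs the blocking call in a worker thread; the event loop is free
    # to schedule other coroutines while it waits
    result = await asyncio.to_thread(blocking_read)
    return result


value = asyncio.run(main())
print(value)  # data
```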

Tasks and Task Groups

Multiple coroutines can be aggregated so that they run concurrently. To
do this, you can use the asyncio.create_task() function to schedule
multiple coroutines within a single coroutine, or use an aggregator such
as asyncio.gather(), and then use asyncio.run() to run the aggregate.

1 >>> import asyncio


2 >>> async def first_coroutine():
3 ... await asyncio.sleep(0.25)
4 ... await asyncio.sleep(0.25)
5 ... print("World!")
6 ...
7 >>> async def second_coroutine():
8 ... await asyncio.sleep(0.25)
9 ... print("Hello ", end="", flush=True)
10 ...
11 >>> async def main():
12 ... first_task = asyncio.create_task(first_coroutine())
13 ... second_task = asyncio.create_task(second_coroutine())
14 ... await first_task
15 ... await second_task
16 ...
17 >>> asyncio.run(main())
18 Hello World!
19 >>> async def main():
20 ... await asyncio.gather(first_coroutine(), second_coroutine(\
21 ))
22 ...
23 >>> asyncio.run(main())
24 Hello World!

It should be noted that asyncio.create_task() only holds a weak
reference to the task internally, so if the task object returned by the
function is deleted or goes out of scope, its reference count can drop
to zero and the task may be garbage collected before it finishes. To
avoid this, collect the returned tasks into a collection, and discard
them only when they are done.
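One common pattern, also suggested by the asyncio documentation, is to hold each task in a set and remove it with a done callback. The worker coroutine here is illustrative:

```python
import asyncio

background_tasks = set()


async def worker(n):
    await asyncio.sleep(0.01)
    return n


async def main():
    for i in range(3):
        task = asyncio.create_task(worker(i))
        # Keep a strong reference so the task cannot be garbage collected
        background_tasks.add(task)
        # Drop the reference automatically once the task finishes
        task.add_done_callback(background_tasks.discard)
    # Wait for all registered tasks to finish
    await asyncio.gather(*background_tasks)
    # Give any pending done-callbacks a chance to run
    await asyncio.sleep(0)


asyncio.run(main())
print(len(background_tasks))  # 0
```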
In Python 3.11, an asynchronous context manager was added for managing
groups of asyncio tasks. The asyncio.TaskGroup() class defines a
create_task() method with a signature that matches
asyncio.create_task(). On exit, the context manager awaits all of the
registered tasks.

1 >>> import asyncio


2 >>> async def first_coroutine():
3 ... await asyncio.sleep(0.25)
4 ... await asyncio.sleep(0.25)
5 ... print("World!")
6 ...
7 >>> async def second_coroutine():
8 ... await asyncio.sleep(0.25)
9 ... print("Hello ", end="", flush=True)
10 ...
11 >>> async def main():
12 ... async with asyncio.TaskGroup() as tg:
13 ... first_task = tg.create_task(first_coroutine())
14 ... second_task = tg.create_task(second_coroutine())
15 ...
16 >>> asyncio.run(main())
17 Hello World!

ExceptionGroup and Exception unpacking

In instances where multiple coroutines of a task group raise an exception,


the task group bundles the exceptions into an ExceptionGroup and

raises the group. Exceptions from the group can be selectively handled
using an except* syntax, which is akin to unpacking exceptions from
the group to handle individually.

1 >>> import asyncio


2 >>> async def first_coroutine(n):
3 ... await asyncio.sleep(n)
4 ... print(f"coroutine {n} reporting...")
5 ...
6 >>> async def second_coroutine():
7 ... raise AssertionError("raised AssertionError")
8 ...
9 >>> async def third_coroutine():
10 ... raise ValueError("raise ValueError")
11 ...
12 >>> async def main():
13 ... try:
14 ... async with asyncio.TaskGroup() as tg:
15 ... for i in range(3):
16 ... tg.create_task(first_coroutine(i))
17 ... tg.create_task(second_coroutine())
18 ... tg.create_task(third_coroutine())
19 ... except* AssertionError as err:
20 ... aerr, = err.exceptions
21 ... print(f"AssertionError caught: {aerr}")
22 ... except* ValueError as err:
23 ... verr, = err.exceptions
24 ... print(f"ValueError caught: {verr}")
25 ...
26 >>> asyncio.run(main())
27 coroutine 0 reporting...
28 AssertionError caught: raised AssertionError

29 ValueError caught: raise ValueError

In this example, the try block catches both the AssertionError and the
ValueError within a single iteration of the event loop. Both exceptions
are grouped together into a single ExceptionGroup by the TaskGroup,
and the ExceptionGroup is raised. Tasks which aren’t finished by the
time the ExceptionGroup is raised are cancelled.
The except* handler unpacks the exceptions of the exception group,
allowing you to handle each exception in isolation from the other
exceptions caught by the group. In this case, the AssertionError and the
ValueError are printed to the console, and the main block exits.
Part VI. The Underbelly of the Snake
As Python developers, it’s important to have the skills and tools to find
and fix bugs, and optimize the performance of our code. As the codebase
grows and becomes more complex, it becomes increasingly difficult to
identify and fix issues. Additionally, as our respective platforms start
to gain users, or are deployed in more demanding environments, ensuring
that our code performs well becomes even more critical.
In this section, we’ll cover the various techniques and tools available for
debugging and profiling Python, as well as how to optimize bottlenecks
with C extensions.

Chapter 22. Debugging

Debugging Python can be a somewhat different experience compared to


debugging code in a statically typed language. In a dynamically typed
language like Python, the type of a variable is determined at run-time,
rather than at compile-time. This can make it more difficult to catch
certain types of errors, such as type errors, before they occur.
However, Python also provides a number of tools and features that
can make debugging easier. By taking advantage of these tools and
familiarizing yourself with them, you can become more effective at
finding and fixing bugs.

pdb

The Python Debugger, also known as pdb, is a built-in module that allows
you to debug your code by stepping through it line by line and inspecting
the state of the program at each step. It is a command-line interface that
provides a set of commands for controlling the execution of the program
and for inspecting the program’s state.
When using pdb, you can set breakpoints in your code, which will cause
the interpreter to pause execution at that point, and drop you into a
debugger REPL. You can then use various commands to inspect the state
of the program, such as viewing the values of variables, inspecting the
call stack, and seeing the source code. You can also step through your
code, line by line, to see how each command mutates state.

1 # ./script.py
2
3 def my_closure(value):
4 def my_decorator(fn):
5 def wrapper(*args, **kwargs):
6 _enclosed = (fn, value)
7 breakpoint()
8 return wrapper
9 return my_decorator
10
11 @my_closure("this")
12 def my_function(*args, **kwargs):
13 pass
14
15 my_function(0, kwarg=1)

In this example we’re setting a breakpoint inside the wrapper function



of a decorator. The breakpoint will drop us into the pdb repl when the
interpreter hits this line in execution.
From here, we have a set of commands which we can use to inspect the
state of our program. Here are some of the most common:

• help - Display a list of available commands or get help on a specific


command.
• list or l - List source code for the current file.
• where or w - Display the stack trace and line number of the current
line.
• next or n - Execute the current line and move to the next line of
code.
• step or s - Step into the function call at the current line.
• return or r - Continue execution until the current function returns.
• break or b - Set a breakpoint at the current line or at a specific line.
• clear - Clear all breakpoints, or a breakpoint at a specific line.
• watch or watch expression - Set a watchpoint on a variable, so
that the execution will pause when the value of the variable changes.
• args or a - Display the argument list of the current function.
• print or p - Print the value of an expression.
• up or u - Move up the stack trace.
• down or d - Move down the stack trace.
• quit or q - Quit the debugger and exit.

1 root@e08d854dfbfe:~# ls
2 script.py
3 root@e08d854dfbfe:~# python ./script.py
4 --Return--
5 > /root/script.py(5)wrapper()->None
6 -> breakpoint()
7 (Pdb) help
8
9 Documented commands (type help <topic>):
10 ========================================
11 EOF c d h list q rv \
12 undisplay
13 a cl debug help ll quit s \
14 unt
15 alias clear disable ignore longlist r source \
16 until
17 args commands display interact n restart step \
18 up
19 b condition down j next return tbreak w
20 break cont enable jump p retval u \
21 whatis
22 bt continue exit l pp run unalias \
23 where
24
25 Miscellaneous help topics:
26 ==========================
27 exec pdb
28 (Pdb) help whatis
29 whatis arg
30 Print the type of the argument.

In addition, the pdb REPL can execute arbitrary Python code. This includes

builtin functions, comprehensions, etc.

1 (Pdb) list
2 1 def my_closure(value):
3 2 def my_decorator(fn):
4 3 def wrapper(*args, **kwargs):
5 4 _enclosed = (fn, value)
6 5 -> breakpoint()
7 6 return wrapper
8 7 return my_decorator
9 8
10 9 @my_closure("this")
11 10 def my_function(*args, **kwargs):
12 11 pass
13
14 (Pdb) where
15 /root/script.py(13)<module>()
16 -> my_function(0, kwarg=1)
17 > /root/script.py(5)wrapper()->None
18 -> breakpoint()
19
20 (Pdb) p args
21 (0,)
22
23 (Pdb) p value
24 'this'
25
26 (Pdb) dir(kwargs)
27 ['__class__', '__class_getitem__', '__contains__', '__delattr__',
28 '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__g\
29 e__',
30 '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__h\
31 ash__',

32 '__init__', '__init_subclass__', '__ior__', '__iter__', '__le__',\


33 '__len__',
34 '__lt__', '__ne__', '__new__', '__or__', '__reduce__', '__reduce_\
35 ex__',
36 '__repr__', '__reversed__', '__ror__', '__setattr__', '__setitem_\
37 _',
38 '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fr\
39 omkeys',
40 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update',\
41 'values']
42
43 (Pdb) [x for x in dir(kwargs) if not x.startswith('__')]
44 ['clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'pop\
45 item',
46 'setdefault', 'update', 'values']
47
48 (Pdb) n
49 --Return--
50 > /root/script.py(13)<module>()->None
51 -> my_function(0, kwarg=1)
52
53 (Pdb) l
54 8
55 9 @my_closure("this")
56 10 def my_function(*args, **kwargs):
57 11 pass
58 12
59 13 -> my_function(0, kwarg=1)
60 14

Python programs can be run within the context of the python debugger.
This allows the debugger to catch unexpected exceptions, dropping you

into the debugger REPL at the point where an Exception is raised,


as opposed to python just printing the stack trace on exit. To do this,
run your program via the pdb module by executing python -m pdb
./script.py.

1 # ./script.py
2
3 def my_closure(value):
4 def my_decorator(fn):
5 def wrapper(*args, **kwargs):
6 _enclosed = (fn, value)
7 raise ValueError
8 return wrapper
9 return my_decorator
10
11 @my_closure("this")
12 def my_function(*args, **kwargs):
13 pass
14
15 my_function(0, kwarg=1)

1 root@e08d854dfbfe:~# python script.py


2 Traceback (most recent call last):
3 File "/root/script.py", line 13, in <module>
4 my_function(0, kwarg=1)
5 File "/root/script.py", line 5, in wrapper
6 raise ValueError
7 ValueError
8
9 root@e08d854dfbfe:~# python -m pdb script.py
10 > /root/script.py(1)<module>()
11 -> def my_closure(value):

12
13 (Pdb) c
14 Traceback (most recent call last):
15 File "/usr/local/lib/python3.11/pdb.py", line 1774, in main
16 pdb._run(target)
17 File "/usr/local/lib/python3.11/pdb.py", line 1652, in _run
18 self.run(target.code)
19 File "/usr/local/lib/python3.11/bdb.py", line 597, in run
20 exec(cmd, globals, locals)
21 File "<string>", line 1, in <module>
22 File "/root/script.py", line 13, in <module>
23 my_function(0, kwarg=1)
24 File "/root/script.py", line 5, in wrapper
25 raise ValueError
26 ValueError
27 Uncaught exception. Entering post mortem debugging
28 Running 'cont' or 'step' will restart the program
29 > /root/script.py(5)wrapper()
30 -> raise ValueError
31
32 (Pdb) ll
33 3 def wrapper(*args, **kwargs):
34 4 _enclosed = (fn, value)
35 5 -> raise ValueError

Other Debuggers

There are several third-party python debuggers in the python ecosystem


that are also worth discussing. Pdb++ (pdbpp) is a drop-in replacement
for the built-in pdb, so you can use it in the same way you would use pdb.
pdbpp implements feature enhancements such as syntax highlighting,
tab completion, and smart command parsing. As of this writing though

its main branch is only compatible with python 3.10.

1 root@e08d854dfbfe:~# python -m pip install pdbpp


2 Collecting pdbpp
3 Downloading pdbpp-0.10.3-py2.py3-none-any.whl (23 kB)
4 ...
5
6 root@ab2771e522c8:~# python -m pdb script.py
7 [2] > /root/script.py(3)<module>()
8 -> def my_closure(value):
9 (Pdb++)

A second option is the ipdb debugger. ipdb is similar to


pdb++, but it is built as a standalone library, rather than a drop-in pdb
replacement. The main benefit of ipdb is that it uses the IPython shell,
which is a powerful interactive shell for Python that provides features
such as syntax highlighting, tab completion, and better support for
multi-line input. To use ipdb, instead of placing a breakpoint() in your
code, you need to toggle the debugger directly using an import statement
like __import__('ipdb').set_trace(). Also, to use it as a package,
from the terminal you execute its main script via python -m ipdb
./script.py.

1 root@6b5606b62e20:~# python -m pip install ipdb


2 Collecting ipdb
3 Using cached ipdb-0.13.11-py3-none-any.whl (12 kB)
4 ...
5
6 root@6b5606b62e20:~# python -m ipdb script.py
7 /usr/local/lib/python3.10/runpy.py:126: RuntimeWarning: 'ipdb.__m\
8 ain__' found in sys.modules after import of package 'ipdb', but p\
9 rior to execution of 'ipdb.__main__'; this may result in unpredic\
10 table behaviour
11 warn(RuntimeWarning(msg))
12 > /root/script.py(1)<module>()
13 ----> 1 def my_closure(value):
14 2 def my_decorator(fn):
15 3 def wrapper(*args, **kwargs):
16
17 ipdb> c
18 Traceback (most recent call last):
19 File "/usr/local/lib/python3.10/site-packages/ipdb/__main__.py"\
20 , line 322, in main
21 pdb._runscript(mainpyfile)
22 File "/usr/local/lib/python3.10/pdb.py", line 1592, in _runscri\
23 pt
24 self.run(statement)
25 File "/usr/local/lib/python3.10/bdb.py", line 597, in run
26 exec(cmd, globals, locals)
27 File "<string>", line 1, in <module>
28 File "/root/script.py", line 13, in <module>
29 my_function(0, kwarg=1)
30 File "/root/script.py", line 5, in wrapper
31 raise ValueError
32 ValueError

33 Uncaught exception. Entering post mortem debugging


34 Running 'cont' or 'step' will restart the program
35 > /root/script.py(5)wrapper()
36 4 _enclosed = (fn, value)
37 ----> 5 raise ValueError
38 6 return wrapper
39
40 ipdb> ll
41 3 def wrapper(*args, **kwargs):
42 4 _enclosed = (fn, value)
43 ----> 5 raise ValueError
44 6 return wrapper
45
46 ipdb> l
47 7 return my_decorator
48 8
49 9 @my_closure("this")
50 10 def my_function(*args, **kwargs):
51 11 pass
52 12
53 13 my_function(0, kwarg=1)
54 14
55
56 ipdb>

Chapter 23. Profiling

Profiling is the process of measuring the performance of your code


and identifying bottlenecks. This allows you to optimize your code and
improve its performance. Python provides several built-in libraries and
tools for profiling your code. They are designed to hook into interpreter
events such as function calls and returns, and report how much time
elapsed between those events for a given set of instructions.
libraries available which can analyze other aspects of the runtime, like
memory usage.

cProfile

One of the most commonly used tools for profiling Python code is the
cProfile module. It is a built-in library that generates statistics on the
number of calls and the time spent in each function. This information
can be used to identify which parts of the code are taking the most time
to execute, and make adjustments accordingly.
To use cProfile, you can run your script with the command python -m
cProfile ./script.py and it will output the statistics of the script’s
execution. You can pass an optional -s argument so as to control how
the output is sorted; by default the output is sorted by standard name, but
can be set to cumulative to sort by cumulative time, ncalls to sort
by the call count, etc. You can also pass -o ./file.prof to dump the
results to a file, though -s and -o are mutually exclusive.
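cProfile can also be driven programmatically, and the collected data can be inspected with the standard-library pstats module. A minimal sketch, where the workload function is just a stand-in:

```python
import cProfile
import io
import pstats
import time


def workload():
    time.sleep(0.05)
    return sum(range(1000))


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Feed the collected data into pstats and sort by cumulative time,
# mirroring `python -m cProfile -s cumulative ./script.py`
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats()
report = stream.getvalue()
print("workload" in report)  # True
```

The same pstats API can load a dump file produced with -o, via pstats.Stats("./script.prof").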

1 import time
2
3
4 def slow_mult(a, b):
5 time.sleep(1.1)
6 return a * b
7
8
9 def fast_mult(a, b):
10 time.sleep(0.1)

11 return a * b
12
13
14 def run_mult(a, b):
15 x = slow_mult(a, b)
16 y = fast_mult(a, b)
17 _abs = abs(x - y)
18 return _abs < 0.001
19
20
21 def main():
22 a, b = 1, 2
23 run_mult(a, b)
24
25
26 if __name__ == "__main__":
27 main()

1 root@3668136f44b5:/code# python -m cProfile -s cumulative ./scrip\


2 t.py
3
4 root@3668136f44b5:/code# python -m cProfile -s cumulative ./scrip\
5 t.py
6 10 function calls in 1.200 seconds
7
8 Ordered by: cumulative time
9
10 ncalls tottime percall cumtime percall filename:lineno(fun\
11 ction)
12 1 0.000 0.000 1.200 1.200 {built-in method bu\
13 iltins.exec}
14 1 0.000 0.000 1.200 1.200 script.py:1(<module\

15 >)
16 1 0.000 0.000 1.200 1.200 script.py:19(main)
17 1 0.000 0.000 1.200 1.200 script.py:12(run_mu\
18 lt)
19 2 1.200 0.600 1.200 0.600 {built-in method ti\
20 me.sleep}
21 1 0.000 0.000 1.100 1.100 script.py:3(slow_mu\
22 lt)
23 1 0.000 0.000 0.100 0.100 script.py:7(fast_mu\
24 lt)
25 1 0.000 0.000 0.000 0.000 {method 'disable' o\
26 f '_lsprof.Profiler' objects}
27 1 0.000 0.000 0.000 0.000 {built-in method bu\
28 iltins.abs}

flameprof

The data dump of a cProfile run can be used to generate what’s called a
flame graph. Flame graphs are visual representations of how much time
is spent within the scope of a given function call. Each bar in a flame
graph represents a function and its subfunctions, with the width of the
bar representing the amount of time spent in that function. Functions
that take up more time are represented by wider bars, and functions
that take up less time are represented by narrower bars. The functions
are stacked vertically, with the main function at the bottom and the
subfunctions at the top.
The python library flameprof can be used to generate flame graphs
from the output of cProfile. To generate one, first run cProfile with
the -o argument to dump results to a file. Next, use flameprof to ingest
the dump file. flameprof will use that profile to generate an svg file.
You can open this file in a web browser to see the results.

1 root@3668136f44b5:/code# python -m cProfile -o ./script.prof ./sc\


2 ript.py
3
4 root@3668136f44b5:/code# python -m pip install flameprof
5 Collecting flameprof
6 Downloading flameprof-0.4.tar.gz (7.9 kB)
7 Preparing metadata (setup.py) ... done
8 Building wheels for collected packages: flameprof
9 Building wheel for flameprof (setup.py) ... done
10 Created wheel for flameprof: filename=flameprof-0.4-py3-none-an\
11 y.whl size=8009 sha256=487bbd51bcf377fa2f9e0795db03b46896d1b41adf\
12 719272ea8e187abbd85bb5
13 Stored in directory: /root/.cache/pip/wheels/18/93/7e/afc52a495\
14 a87307d7b93f5e03ee364585b0edf120fb98eff99
15 Successfully built flameprof
16 Installing collected packages: flameprof
17 Successfully installed flameprof-0.4
18 WARNING: Running pip as the 'root' user can result in broken perm\
19 issions and conflicting behaviour with the system package manager\
20 . It is recommended to use a virtual environment instead: https:/\
21 /pip.pypa.io/warnings/venv
22
23 root@3668136f44b5:/code# python -m flameprof ./script.prof > scri\
24 pt.svg
25
26 root@3668136f44b5:/code# ls
27 script.prof script.py script.svg

[flame graph image]

snakeviz

In addition to flameprof, the snakeviz library can be used to visualize


the output of cProfile. snakeviz is a browser-based tool that takes the
profile dump as input and runs a lightweight web server for interactive
analysis. In this webpage you can filter data by module, function, file,
and sort it by different criteria such as the number of calls or cumulative
time spent in a function.

1 root@3668136f44b5:/code# python -m pip install snakeviz


2 Collecting snakeviz
3 Downloading snakeviz-2.1.1-py2.py3-none-any.whl (282 kB)
4 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 282.1/282.1 kB 4.5 \
5 MB/s eta 0:00:00
6 Collecting tornado>=2.0
7 Downloading tornado-6.2-cp37-abi3-manylinux_2_5_x86_64.manylinu\
8 x1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (423 kB)
9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 424.0/424.0 kB 17.4\
10 MB/s eta 0:00:00
11 Installing collected packages: tornado, snakeviz
12 Successfully installed snakeviz-2.1.1 tornado-6.2
13 WARNING: Running pip as the 'root' user can result in broken perm\
14 issions and conflicting behaviour with the system package manager\

15 . It is recommended to use a virtual environment instead: https:/\


16 /pip.pypa.io/warnings/venv
17 root@3668136f44b5:/code# python -m snakeviz script.prof
18 snakeviz web server started on 127.0.0.1:8080; enter Ctrl-C to ex\
19 it
20 http://127.0.0.1:8080/snakeviz/%2Fhome%2Fmichael%2FProjects%2Ftex\
21 tbook%2Fprofiling%2Fscript.prof
22 Opening in existing browser session.

[snakeviz screenshot]

memory_profiler

memory_profiler is a Python library that helps you profile the memory


usage of your Python code. To use memory_profiler, you need to install
the library, import it into your code, and add the @profile decorator
to the functions you want to profile. Then, when you run your code,
memory_profiler will collect and display memory usage data for each
line of the decorated functions.

1 # ./script.py
2
3 @__import__("memory_profiler").profile
4 def main():
5 my_list = [257] * (10**6)
6 return my_list
7
8
9 if __name__ == "__main__":
10 main()

1 root@3668136f44b5:/code# python ./script.py


2 Filename: /code/script.py
3
4 Line # Mem usage Increment Occurrences Line Contents
5 =============================================================
6 3 21.1 MiB 21.1 MiB 1 @__import__("memor\
7 y_profiler").profile
8 4 def main():
9 5 28.5 MiB 7.5 MiB 1 my_list = [257\
10 ] * (10**6)
11 6 28.5 MiB 0.0 MiB 1 return my_list
12
13
14 root@3668136f44b5:/code#
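If installing a third-party package is not an option, the standard library's tracemalloc module offers a similar, though lower-level, view of allocations. This is a minimal sketch rather than a line-by-line report:

```python
import tracemalloc

tracemalloc.start()

before = tracemalloc.take_snapshot()
my_list = [257] * (10**6)  # roughly 8 MB of pointers on a 64-bit build
after = tracemalloc.take_snapshot()

# Compare the snapshots; entries are sorted by the size of the
# difference, so the big list allocation comes first
top = after.compare_to(before, "lineno")
largest = top[0]
print(largest.size_diff > 1_000_000)  # True

tracemalloc.stop()
```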

Chapter 24. C extensions

C extensions are a powerful tool that allow you to write performance-


critical parts of your Python code in C. By writing these performance-
critical parts in C, you can greatly improve the speed and efficiency of

your Python code, which can be especially useful for large or complex
applications.
One of the benefits of using C extensions is that C is much faster than
Python, especially for tasks that are computationally intensive or involve
heavy manipulation of data. With C extensions, you can take advantage
of the performance benefits of C while still being able to use Python for
the higher-level logic and user interface parts of your code.
Additionally, C extensions can also be used to interface with existing
C libraries and APIs. This allows you to leverage existing libraries and
tools that are already available in C, making it easier to build complex
systems and applications.
It is important to note that the use of C extensions in Python requires a
good understanding of programming in C. If you are unfamiliar with C
or are not comfortable with its syntax and concepts, it is recommended
that you first learn C before attempting to use C extensions in Python.
Writing and using C extensions can be complex and requires a strong
understanding of both Python and C, so it is important to have a solid
foundation in both languages before diving into this aspect of Python
development.

Hello World

To start, we’re going to create a new python package with the following
folder structure:

1 my_package/
2 __init__.py
3 hello_world.c
4 setup.py

The __init__.py file is an empty file designating my_package as
a python package. The setup.py file contains build instructions and
metadata for our package. Finally, the hello_world.c file contains our
extension code: a set of static constructs which abstract C functions
into objects with which the python interpreter knows how to interact.

hello_world.c

The following is an example file for the hello_world.c extension:

1 // ./my_package/hello_world.c
2
3 #include <stdio.h>
4
5 #include <Python.h>
6
7 static PyObject* hello_world() {
8 puts("Hello World!");
9 Py_RETURN_NONE;
10 }
11
12 static char HelloWorldFunctionDocs[] =
13 "prints 'Hello World!' to the screen, from C.";
14
15 static PyMethodDef MethodTable[] = {
16 {
17 "hello_world",
18 (PyCFunction) hello_world,
19 METH_NOARGS,
20 HelloWorldFunctionDocs
21 },
22 {NULL, }
23 };
24
25 static char HelloWorldModuleDocs[] =
26 "module documentation for 'Hello World'";
27
28 static struct PyModuleDef HelloWorld = {
29 PyModuleDef_HEAD_INIT,
30 "hello_world",
31 HelloWorldModuleDocs,
32 -1,
33 MethodTable
34 };
35
36 PyMODINIT_FUNC PyInit_hello_world() {
37 return PyModule_Create(&HelloWorld);
38 }

Starting from the bottom of this file: the function declared with the
PyMODINIT_FUNC macro initializes the module and returns the module
object to the interpreter. This initialization function should be the only
item in your extension file that's not defined as static.
PyModuleDef is a structure defined in the C/Python API that
describes a Python module: its name, methods, documentation, and
other details. The PyModuleDef structure is used in the definition of the
module's init function, which is called when the module is imported.

This structure includes the PyModuleDef_HEAD_INIT macro, the
name of the module ("hello_world"), the module's documentation
(HelloWorldModuleDocs), the size of the per-interpreter state of the
module (if the module does not need to support subinterpreters, this can
be set to -1), and the MethodTable array which lists the methods
that the module provides.
The PyMethodDef structure is used in the definition of the module's
PyModuleDef structure to specify the methods that the module provides.
It is a NULL-terminated array of entries, where each entry of the
table defines the function name ("hello_world"), a function pointer
(hello_world), a calling-convention flag (METH_NOARGS), and a docstring.
In this case, our hello_world C function takes no arguments, so the
flag is METH_NOARGS. We'll explore the other calling conventions in a
later section.
The C function hello_world is declared as returning static
PyObject*. Even though this is a no-op function, from the perspective of
the interpreter, functions which return nothing still return the singleton
None. The macro Py_RETURN_NONE handles incrementing the reference
count of the None object that is ultimately returned. The function itself
simply uses the puts() C function to print "Hello World!" to the
console.
Finally, the C extension needs to source the C/Python API. This is done
by including the python header file via #include <Python.h>. Since it
also makes use of the puts() function, we include the standard io header
file via #include <stdio.h>.
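The behavior that Py_RETURN_NONE encodes is easy to observe from pure Python: a function with no return statement still hands back the None singleton, and it is always the same object, which is why its reference count must be managed.

```python
def no_op():
    """A function with no return statement, like our C hello_world."""
    pass

result = no_op()
assert result is None        # "void" functions return the None singleton
assert no_op() is no_op()    # one shared object, hence the refcounting
```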

setup.py

Now that we have our C extension written to a file, we need to tell python
how to compile this extension into a dynamically linked shared object
library. The most straightforward way to do this is to use setuptools
to handle compilation. This is done using the setup.py file, in which
we create a collection of Extension objects.
Each Extension object takes, at minimum, a name which python uses
for import purposes, and a list of C source files to be compiled into
the extension. Since python package imports are relative to the root
of the package, it's typically convenient to have the name mimic the
import path of the extension file, i.e. ./my_package/hello_world.c
maps to the my_package.hello_world import statement. While not
strictly necessary, I find this convention keeps compilation easy to
reason about. The filepaths to these C files are relative to the
setup.py file. The collection is then passed to the setup() function
under the keyword argument ext_modules.

1 # ./setup.py
2
3 import os.path
4 from setuptools import setup, Extension
5
6 extensions = [
7 Extension(
8 'my_package.hello_world',
9 [os.path.join('my_package', 'hello_world.c')]
10 )
11 ]
12
13 setup(
14 name="my_package",
15 ext_modules=extensions
16 )

Finally, to use our extension, install the package and call the module
function in Python.

1 root@edc7d7fa9220:/code# ls
2 my_package setup.py
3
4 root@edc7d7fa9220:/code# python -m pip install -e .
5 Obtaining file:///code
6 Preparing metadata (setup.py) ... done
7 Installing collected packages: my-package
8 Running setup.py develop for my-package
9 Successfully installed my-package-0.0.0
10
11 [notice] A new release of pip available: 22.3.1 -> 23.0
12 [notice] To update, run: pip install --upgrade pip
13
14 root@edc7d7fa9220:/code# python
15 Python 3.11.1 (main, Jan 23 2023, 21:04:06) [GCC 10.2.1 20210110]\
16 on linux
17 Type "help", "copyright", "credits" or "license" for more informa\
18 tion.
19 >>> from my_package import hello_world
20 >>> hello_world.__doc__
21 "module documentation for 'Hello World'"
22 >>> hello_world.hello_world.__doc__
23 "prints 'Hello World!' to the screen, from C."
24 >>> hello_world.hello_world()
25 Hello World!
26 >>>

If you need to make changes to a C extension, you need to recompile. This
can be done using the setup.py script argument build_ext. By default,
building a C extension places the shared object file into a ./build
directory. You can pass the argument --inplace to have the setup script
copy the shared object library from the ./build directory into your
module automatically, placing it next to the location of the .c file.
Suppose, for example, we update the extension to print an extra line:

1 // ./my_package/hello_world.c
2
3 #include <stddef.h>
4 #include <stdio.h>
5
6 #include <Python.h>
7
8 static PyObject* hello_world() {
9 puts("Hello World! \n- With <3 from C");
10 Py_RETURN_NONE;
11 }
12
13 static char HelloWorldFunctionDocs[] =
14 "prints 'Hello World!' to the screen, from C.";
15
16 static PyMethodDef MethodTable[] = {
17 {
18 "hello_world",
19 (PyCFunction) hello_world,
20 METH_NOARGS,
21 HelloWorldFunctionDocs
22 },
23 {NULL, }
24 };
25
26 static char HelloWorldModuleDocs[] =
27 "module documentation for 'Hello World'";
28
29 static struct PyModuleDef HelloWorld = {
30 PyModuleDef_HEAD_INIT,
31 "hello_world",
32 HelloWorldModuleDocs,
33 -1,
34 MethodTable
35 };
36
37 PyMODINIT_FUNC PyInit_hello_world() {
38 return PyModule_Create(&HelloWorld);
39 }

1 root@edc7d7fa9220:/code# python setup.py build_ext --inplace


2 running build_ext
3 building 'my_package.hello_world' extension
4 creating build
5 creating build/temp.linux-x86_64-cpython-311
6 creating build/temp.linux-x86_64-cpython-311/my_package
7 gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -\
8 I/usr/local/include/python3.11 -c my_package/hello_world.c -o bui\
9 ld/temp.linux-x86_64-cpython-311/my_package/hello_world.o
10 creating build/lib.linux-x86_64-cpython-311
11 creating build/lib.linux-x86_64-cpython-311/my_package
12 gcc -pthread -shared build/temp.linux-x86_64-cpython-311/my_packa\
13 ge/hello_world.o -L/usr/local/lib -o build/lib.linux-x86_64-cpyth\
14 on-311/my_package/hello_world.cpython-311-x86_64-linux-gnu.so
15 copying build/lib.linux-x86_64-cpython-311/my_package/hello_world\
16 .cpython-311-x86_64-linux-gnu.so -> my_package
17
18 root@edc7d7fa9220:/code# python
19 Python 3.11.1 (main, Jan 23 2023, 21:04:06) [GCC 10.2.1 20210110]\
20 on linux
21 Type "help", "copyright", "credits" or "license" for more informa\
22 tion.
23 >>> from my_package import hello_world
24 >>> hello_world.hello_world()
25 Hello World!
26 - With <3 from C
27 >>>

Passing data in and out of Python

In order to pass python objects into C functions, we need to set a
flag in the function signature to one of either METH_O (indicating a
single PyObject), METH_VARARGS (indicating positional arguments only),
or METH_VARARGS|METH_KEYWORDS (indicating both positional and key-
word arguments).
All objects in Python are PyObject C structs. In order to make use of
our python objects in C, the data from python needs to be unwrapped
from their PyObject shells and then cast into a corresponding C data
type. The primitive python data types have corresponding Py<Type>_-
<As/From><Type> functions for performing this cast operation.

1 #include <Python.h>
2
3 static PyObject* sum(PyObject* self, PyObject* args) {
4 PyObject* iter = PyObject_GetIter(args);
5 PyObject* item;
6
7 long res_i = 0;
8 double res_f = 0;
9
10 while ((item = PyIter_Next(iter))) {
11 if (PyLong_Check(item)) {
12 long val_i = PyLong_AsLong(item);
13 res_i += val_i;
14 }
15 else if (PyFloat_Check(item)) {
16 double val_f = PyFloat_AsDouble(item);
17 res_f += val_f;
18 }
19 Py_DECREF(item);
20 }
21 Py_DECREF(iter);
22
23 if (res_f) {
24 double result = res_f + res_i;
25 return PyFloat_FromDouble(result);
26 }
27 return PyLong_FromLong(res_i);
28 }
29
30 static PyMethodDef MethodTable[] = {
31 {
32 "sum",
33 (PyCFunction) sum,
34 METH_VARARGS,
35 "returns the sum of a series of numeric types"
36 },
37 {NULL, }
38 };
39
40
41 static struct PyModuleDef MyMathModule = {
42 PyModuleDef_HEAD_INIT,
43 "math",
44 "my math module",
45 -1,
46 MethodTable,
47 };
48
49 PyMODINIT_FUNC PyInit_math() {
50 return PyModule_Create(&MyMathModule);
51 }

In this example, we're creating a sum() function which takes the
variadic arguments of the function call and calculates the sum of the
numeric types. We're parsing the arguments lazily, by creating an
iterator to loop over. Each item of the iterator is cast to either a
long or a double depending on whether its corresponding python type is
an int or a float. Those values are then added to a running total.
The function's return "type" is dependent on the arguments of the
function call. If no float types are passed to the sum function, then
the returned value is of Python type int. If however there were float
types provided, the entire result is cast to a double and then returned as
a float type.
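To make that dispatch concrete, here is a Python-level model of what the extension computes (an illustrative sketch, not the C code itself):

```python
def c_style_sum(*args):
    """Mirror the C extension: accumulate ints and floats separately,
    returning an int unless a nonzero float total forces a float."""
    res_i = 0     # C: long res_i
    res_f = 0.0   # C: double res_f
    for item in args:
        if isinstance(item, int):      # C: PyLong_Check
            res_i += item
        elif isinstance(item, float):  # C: PyFloat_Check
            res_f += item
    if res_f:                          # C: if (res_f)
        return res_f + res_i           # C: PyFloat_FromDouble
    return res_i                       # C: PyLong_FromLong

print(c_style_sum(1, 2, 3))       # → 6
print(c_style_sum(1, 2, 3, 4.0))  # → 10.0
```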

1 root@edc7d7fa9220:/code# python setup.py build_ext --inplace --fo\


2 rce
3 running build_ext
4 building 'my_package.hello_world' extension
5 gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -\
6 I/usr/local/include/python3.11 -c my_package/hello_world.c -o bui\
7 ld/temp.linux-x86_64-cpython-311/my_package/hello_world.o
8 gcc -pthread -shared build/temp.linux-x86_64-cpython-311/my_packa\
9 ge/hello_world.o -L/usr/local/lib -o build/lib.linux-x86_64-cpyth\
10 on-311/my_package/hello_world.cpython-311-x86_64-linux-gnu.so
11 building 'my_package.math' extension
12 gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -\
13 I/usr/local/include/python3.11 -c my_package/math.c -o build/temp\
14 .linux-x86_64-cpython-311/my_package/math.o
15 gcc -pthread -shared build/temp.linux-x86_64-cpython-311/my_packa\
16 ge/math.o -L/usr/local/lib -o build/lib.linux-x86_64-cpython-311/\
17 my_package/math.cpython-311-x86_64-linux-gnu.so
18 copying build/lib.linux-x86_64-cpython-311/my_package/hello_world\
19 .cpython-311-x86_64-linux-gnu.so -> my_package
20 copying build/lib.linux-x86_64-cpython-311/my_package/math.cpytho\
21 n-311-x86_64-linux-gnu.so -> my_package
22
23 root@edc7d7fa9220:/code# python
24 Python 3.11.1 (main, Jan 23 2023, 21:04:06) [GCC 10.2.1 20210110]\
25 on linux
26 Type "help", "copyright", "credits" or "license" for more informa\
27 tion.
28
29 >>> from my_package import math
30 >>> (val := math.sum(1, 2, 3))
31 6
32 >>> type(val)
33 <class 'int'>
34 >>> (val := math.sum(1, 2, 3, 4.0))
35 10.0
36 >>> type(val)
37 <class 'float'>
38 >>>

Memory Management

Reference counting is one of the core concepts of memory management


in Python. It is used to keep track of the number of references to an
object in the program and automatically reclaim memory when an object
is no longer in use, or when the reference count drops to zero. The
rationale behind this approach is to simplify memory management for
the interpreter and reduce the risk of memory leaks and other issues
associated with manual memory management.
In Python, the reference count of an object is increased whenever
a reference to the object is created, and is decremented whenever a
reference to the object is deleted. When the reference count of an
object reaches zero, the object is slated for deallocation so its memory
is freed. In a C extension, it is possible to interact with the reference
counting mechanism of Python objects by using the Py_INCREF and
Py_DECREF functions. Py_INCREF is used to increment the reference
count of an object, indicating that the object is being used in multiple
places. Py_DECREF is used to decrement the reference count of an object,
indicating that the object is no longer needed and can be garbage
collected if its reference count reaches zero.
References in C can be either owned or borrowed. If a function call
returns an owned reference, you are responsible for one of the counts
in the reference count of the object. Failing to properly decrement the
reference count will result in a memory leak. In the previous example,
PyObject_GetIter() and PyIter_Next() both returned owned refer-
ences. Thus, we called Py_DECREF() on these objects when we were done
with them, so that they are properly garbage collected. If however a func-
tion call returns a borrowed reference, you are equally responsible for not
decrementing the reference count of the object. Calling Py_DECREF()
on a borrowed reference will cause the object to be garbage collected
prematurely, leading to dangling pointers.

Parsing Arguments

Positional arguments are passed down to C extension functions in the
form of a tuple, and keyword arguments are passed as a dictionary. As
such, any of the tuple and dictionary functions in the C/Python API can
be used to individually extract PyObject* items from their respective
collections. However, there are several functions designed to batch-parse
these PyObjects into local variables, mainly PyArg_ParseTuple() and
PyArg_ParseTupleAndKeywords(). These functions return borrowed
references, meaning you do not need to increase or decrease the refer-
ence count if you only plan to use these references in the context of a
specific C function.
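This layout is visible in Python itself: variadic positionals arrive as a tuple and keyword arguments as a dict, the very same two PyObjects the C function receives:

```python
def capture(*args, **kwargs):
    # `args` is the tuple and `kwargs` the dict handed to the C function
    return args, kwargs

positional, keywords = capture(1, 2, x=3)
assert positional == (1, 2)
assert keywords == {"x": 3}
```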

Parsing Tuple Arguments

PyArg_ParseTuple() will parse a tuple out to local variables. It takes
a tuple, in this case the args PyObject pointer, a format string that
specifies the expected types of the arguments, and the addresses of the
variables in which to store the parsed values. The format string uses
codes to indicate the type of each argument. For example, i for an int,
l for a long, s for a string, etc. The function returns non-zero if the
arguments are successfully parsed and stored in
the specified variables, and zero otherwise. If the arguments are invalid,
it raises an exception in Python to indicate the error, and the C function
should immediately return NULL.

1 static PyObject* print_int(PyObject* self, PyObject* args) {


2 int val;
3 if (!PyArg_ParseTuple(args, "i", &val))
4 return NULL;
5
6 printf("%d\n", val);
7 Py_RETURN_NONE;
8 }

1 root@cb4fbbf71628:/code# python -c "from my_package import math; \


2 math.print_int(1)"
3 1
4 root@cb4fbbf71628:/code# python -c "from my_package import math; \
5 math.print_int(1.0)"
6 Traceback (most recent call last):
7 File "<string>", line 1, in <module>
8 TypeError: 'float' object cannot be interpreted as an integer

Parsing Keyword Arguments

PyArg_ParseTupleAndKeywords() will parse a tuple/dictionary pair
of args, kwargs pointers out to local variables. It takes a tuple, a
dictionary, a format string, a NULL-terminated array
of keyword names to parse from the function call, and finally the variables
in which to store the parsed values. In the format string, a $ marks the
point after which the remaining arguments are keyword-only. If the
arguments are invalid, it raises an exception in Python to indicate the
error, and the function should immediately return NULL. It should be
noted that the kwlist should include names for all of the arguments,
whether they are positional or keyword.

1 static PyObject* print_ints(PyObject* self, PyObject* args, PyObj\


2 ect* kwargs) {
3 int pos;
4 int kwd;
5 char *kwlist[] = {"pos", "keyword", NULL};
6     if (!PyArg_ParseTupleAndKeywords(args, kwargs, "i$i", kwlist,\
7 &pos, &kwd))
8 return NULL;
9
10     printf("positional: %d\nkeyword: %d\n", pos, kwd);
11 Py_RETURN_NONE;
12 }

1 root@cb4fbbf71628:/code# python setup.py build_ext --inplace --fo\


2 rce
3 running build_ext
4 building 'my_package.hello_world' extension
5 gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -\
6 I/usr/local/include/python3.11 -c my_package/hello_world.c -o bui\
7 ld/temp.linux-x86_64-cpython-311/my_package/hello_world.o
8 gcc -pthread -shared build/temp.linux-x86_64-cpython-311/my_packa\
9 ge/hello_world.o -L/usr/local/lib -o build/lib.linux-x86_64-cpyth\
10 on-311/my_package/hello_world.cpython-311-x86_64-linux-gnu.so
11 building 'my_package.math' extension
12 gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -\
13 I/usr/local/include/python3.11 -c my_package/math.c -o build/temp\
14 .linux-x86_64-cpython-311/my_package/math.o
15 gcc -pthread -shared build/temp.linux-x86_64-cpython-311/my_packa\
16 ge/math.o -L/usr/local/lib -o build/lib.linux-x86_64-cpython-311/\
17 my_package/math.cpython-311-x86_64-linux-gnu.so
18 copying build/lib.linux-x86_64-cpython-311/my_package/hello_world\
19 .cpython-311-x86_64-linux-gnu.so -> my_package
20 copying build/lib.linux-x86_64-cpython-311/my_package/math.cpytho\
21 n-311-x86_64-linux-gnu.so -> my_package
22
23 root@cb4fbbf71628:/code# python -c "from my_package import math; \
24 math.print_ints(pos=1, keyword=2)"
25 positional: 1
26 keyword: 2
27 root@cb4fbbf71628:/code#

Creating PyObjects

In the previous section we looked at how we can take PyObjects and
parse them into C types for computation. We also need to know how
to do the inverse: given a collection of C types, create PyObjects which
we can pass back to the interpreter. As mentioned previously, primitive
data types can be packaged into PyObjects using their corresponding
Py<Type>_<As/From><Type> functions. These functions create owned
references, which can be returned directly from our C functions.
Primitive data types can also be packaged into collections. Each col-
lection type has a corresponding constructor, Py<Type>_New(), which
creates a new object of the specified type. Each data type also has
corresponding functions to operate on its respective PyObjects; these
functions mostly correspond to the methods of the given type available at
the python level - PyList_Append() vs. list.append(), PySet_Add()
vs. set.add(), etc.

1 static PyObject* new_collections() {


2 PyObject* _int = PyLong_FromLong(1);
3 PyObject* _float = PyFloat_FromDouble(1.0);
4
5 PyObject* _set = PySet_New(NULL);
6 PySet_Add(_set, _int);
7 PySet_Add(_set, _float);
8
9 PyObject* _list = PyList_New(0);
10 PyList_Append(_list, _set);
11
12 PyObject* _str = PyUnicode_FromString("data");
13 PyObject* _dict = PyDict_New();
14
15 PyDict_SetItem(_dict, _str, _list);
16
17 PyObject* _tuple = PyTuple_New(1);
18 PyTuple_SetItem(_tuple, 0, _dict);
19
20 return _tuple;
21 }

1 root@cb4fbbf71628:/code# python setup.py build_ext --inplace --fo\


2 rce
3 running build_ext
4 building 'my_package.hello_world' extension
5 gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -\
6 I/usr/local/include/python3.11 -c my_package/hello_world.c -o bui\
7 ld/temp.linux-x86_64-cpython-311/my_package/hello_world.o
8 gcc -pthread -shared build/temp.linux-x86_64-cpython-311/my_packa\
9 ge/hello_world.o -L/usr/local/lib -o build/lib.linux-x86_64-cpyth\
10 on-311/my_package/hello_world.cpython-311-x86_64-linux-gnu.so
11 building 'my_package.math' extension


12 gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -\
13 I/usr/local/include/python3.11 -c my_package/math.c -o build/temp\
14 .linux-x86_64-cpython-311/my_package/math.o
15 gcc -pthread -shared build/temp.linux-x86_64-cpython-311/my_packa\
16 ge/math.o -L/usr/local/lib -o build/lib.linux-x86_64-cpython-311/\
17 my_package/math.cpython-311-x86_64-linux-gnu.so
18 copying build/lib.linux-x86_64-cpython-311/my_package/hello_world\
19 .cpython-311-x86_64-linux-gnu.so -> my_package
20 copying build/lib.linux-x86_64-cpython-311/my_package/math.cpytho\
21 n-311-x86_64-linux-gnu.so -> my_package
22 root@cb4fbbf71628:/code# python -c "from my_package import hello_\
23 world; print(hello_world.new_collections())"
24 ({'data': [{1}]},)
25 root@cb4fbbf71628:/code#

It should be noted that in this example each data type is only being
assigned once. If you wish to assign the same object to multiple
collections, for example appending the _float to the _list as well as
adding it to the _set, you'll need to increase its reference count
accordingly.

Importing Modules

In a C extension, you can import other Python modules using the


PyImport_ImportModule() function from the C/Python API. This
function takes a single argument, a string representing the name of the
module you wish to import. For example, to import the “math” module,
you can assign the module to a variable using PyObject *math_module
= PyImport_ImportModule("math"). Once the module is locally as-
signed, accessing objects from the module namespace can be done using
PyObject_GetAttrString().
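These two C calls map directly onto Python-level machinery: PyImport_ImportModule() corresponds to importlib.import_module(), and PyObject_GetAttrString() to getattr(). A Python-level sketch of the lookup (the my_sqrt wrapper exercised below is presumably built this way, though its C source is not listed here):

```python
import importlib

math_module = importlib.import_module("math")  # ~ PyImport_ImportModule("math")
sqrt = getattr(math_module, "sqrt")            # ~ PyObject_GetAttrString(math_module, "sqrt")

assert sqrt(4) == 2.0
```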

1 root@cb4fbbf71628:/code# python setup.py build_ext --inplace --fo\


2 rce
3 running build_ext
4 building 'my_package.hello_world' extension
5 gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -\
6 I/usr/local/include/python3.11 -c my_package/hello_world.c -o bui\
7 ld/temp.linux-x86_64-cpython-311/my_package/hello_world.o
8 gcc -pthread -shared build/temp.linux-x86_64-cpython-311/my_packa\
9 ge/hello_world.o -L/usr/local/lib -o build/lib.linux-x86_64-cpyth\
10 on-311/my_package/hello_world.cpython-311-x86_64-linux-gnu.so
11 building 'my_package.math' extension
12 gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -\
13 I/usr/local/include/python3.11 -c my_package/math.c -o build/temp\
14 .linux-x86_64-cpython-311/my_package/math.o
15 gcc -pthread -shared build/temp.linux-x86_64-cpython-311/my_packa\
16 ge/math.o -L/usr/local/lib -o build/lib.linux-x86_64-cpython-311/\
17 my_package/math.cpython-311-x86_64-linux-gnu.so
18 copying build/lib.linux-x86_64-cpython-311/my_package/hello_world\
19 .cpython-311-x86_64-linux-gnu.so -> my_package
20 copying build/lib.linux-x86_64-cpython-311/my_package/math.cpytho\
21 n-311-x86_64-linux-gnu.so -> my_package
22
23 root@cb4fbbf71628:/code# python -c "from my_package import math; \
24 print(math.my_sqrt(4))"
25 2.0

Defining New Types

In addition to creating functions, we can create entirely new Python
objects using the C/Python API. These objects can define attributes
and methods which are written in C, and can be invoked from python
code directly. It should be noted that creating Python objects in C is an
exhaustive process and we will undoubtedly be unable to cover all the
nuances. That being said, for the sake of completeness we'll introduce
the fundamental concepts, and we suggest consulting the official python
documentation for the more esoteric aspects of the craft.
To create a new type, you first need to define a struct that represents the
type and initialize it using the PyTypeObject struct. The PyTypeObject
struct contains fields such as: tp_name, which is the name of the type;
tp_basicsize, which is the size of the type in bytes; tp_new, tp_init,
and tp_dealloc, which are functions that are called when an instance
of the type is being allocated, initialized, and destroyed respectively; and
both Member and Method tables for defining the attributes and methods
which are exposed by our object. Once you have defined the struct, you
can create the new type by calling the PyType_Ready() function and
passing it a reference to the struct. This function initializes the type and
makes it available to the interpreter. Finally, to export the type from
the module, you define a module initialization function using the PyMO-
DINIT_FUNC macro and call the PyModule_Create() function to create
a new module. On this module you use the PyModule_AddObject()
function to insert the custom PyTypeObject.
Below is a full working example of a Person class, with first_name
and last_name attributes, and a full_name method for creating a string
that includes both attributes.

1 #include <Python.h>
2 #include "structmember.h"
3
4 typedef struct {
5 PyObject_HEAD
6 PyObject *first_name;
7 PyObject *last_name;
8 } Person;
9
10 static void Person_Destruct(Person* self) {
11 Py_XDECREF(self->first_name);
12 Py_XDECREF(self->last_name);
13 Py_TYPE(self)->tp_free((PyObject*)self);
14 }
15
16 static int Person_Init(Person *self, PyObject *args, PyObject *kw\
17 args) {
18
19 PyObject *first=NULL, *last=NULL;
20 static char *kwlist[] = {"first_name", "last_name", NULL};
21 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO", kwlist, \
22 &first, &last))
23 return -1;
24
25 PyObject* _first = self->first_name;
26 Py_INCREF(first);
27 self->first_name = first;
28 Py_XDECREF(_first);
29
30 PyObject* _last = self->last_name;
31 Py_INCREF(last);
32 self->last_name = last;
33 Py_XDECREF(_last);
34
35 return 0;
36 }
37
38 static PyObject* Person_FullName(Person* self) {
39 if (self->first_name == NULL) {
40 PyErr_SetString(PyExc_AttributeError, "first_name is not \
41 defined");
42 return NULL;
43 }
44
45 if (self->last_name == NULL) {
46 PyErr_SetString(PyExc_AttributeError, "last_name is not d\
47 efined");
48 return NULL;
49 }
50
51 return PyUnicode_FromFormat("%S %S", self->first_name, self->\
52 last_name);
53 }
54
55 static PyMemberDef PersonMembers[] = {
56 {
57 "first_name",
58 T_OBJECT_EX,
59 offsetof(Person, first_name),
60 0,
61 "first name"
62 },
63 {
64 "last_name",
65 T_OBJECT_EX,
66 offsetof(Person, last_name),
67 0,
68 "last name"
69 },
70 {NULL, }
71 };
72
73 static PyMethodDef PersonMethods[] = {
74 {
75 "full_name",
76 (PyCFunction)Person_FullName,
77 METH_NOARGS,
78 "return the full name of the Person"
79 },
80 {NULL, }
81 };
82
83 static PyTypeObject PersonType = {
84 PyVarObject_HEAD_INIT(NULL, 0)
85 "person.Person", /* tp_name */
86 sizeof(Person), /* tp_basicsize */
87 0, /* tp_itemsize */
88 (destructor)Person_Destruct, /* tp_dealloc */
89 0, /* tp_print */
90 0, /* tp_getattr */
91 0, /* tp_setattr */
92 0, /* tp_reserved */
93 0, /* tp_repr */
94 0, /* tp_as_number */
95 0, /* tp_as_sequence */
96 0, /* tp_as_mapping */
97 0, /* tp_hash */
98 0, /* tp_call */
99 0, /* tp_str */
100 0, /* tp_getattro */
101 0, /* tp_setattro */
102 0, /* tp_as_buffer */
103 Py_TPFLAGS_DEFAULT
104 | Py_TPFLAGS_BASETYPE, /* tp_flags */
105 "Person objects", /* tp_doc */
106 0, /* tp_traverse */
107 0, /* tp_clear */
108 0, /* tp_richcompare */
109 0, /* tp_weaklistoffset */
110 0, /* tp_iter */
111 0, /* tp_iternext */
112 PersonMethods, /* tp_methods */
113 PersonMembers, /* tp_members */
114 0, /* tp_getset */
115 0, /* tp_base */
116 0, /* tp_dict */
117 0, /* tp_descr_get */
118 0, /* tp_descr_set */
119 0, /* tp_dictoffset */
120 (initproc)Person_Init, /* tp_init */
121 0, /* tp_alloc */
122 PyType_GenericNew, /* tp_new */
123 };
124
125 static PyModuleDef PersonModule = {
126 PyModuleDef_HEAD_INIT,
127 "person",
128 "Example module for creating Python types in C",
129 -1,
130 NULL
131 };
132
133 PyMODINIT_FUNC PyInit_person() {
134 if (PyType_Ready(&PersonType) < 0)
135 return NULL;
136
137 PyObject* module = PyModule_Create(&PersonModule);
138 if (module == NULL)
139 return NULL;
140
141 Py_INCREF(&PersonType);
142 PyModule_AddObject(module, "Person", (PyObject*)&PersonType);
143 return module;
144 }

Now, this is a lot to take in at once, so let's go over each section
individually, this time from top to bottom.

1 #include <Python.h>
2 #include "structmember.h"

To start, we include the standard Python.h header file. We also need to
include the structmember.h header file, which provides the declarations
needed for creating the PyMemberDef entries, such as the T_OBJECT_EX
type macro.

1 typedef struct {
2 PyObject_HEAD
3 PyObject *first_name;
4 PyObject *last_name;
5 } Person;

Next, we define a Person struct. This struct represents the new type in
Python that can be used to model a person. The struct begins with
the PyObject_HEAD macro, which includes the standard fields common to all
Python objects in C, followed by two custom fields, first_name and last_name,
which are pointers to the Python objects that will represent the first
and last name of our person.

1 static void Person_Destruct(Person* self) {


2 Py_XDECREF(self->first_name);
3 Py_XDECREF(self->last_name);
4 Py_TYPE(self)->tp_free((PyObject*)self);
5 }

Next, we define the function that is responsible for freeing up the resources
of an object when its reference count hits zero. The function takes a
single argument, self, which is a pointer to an instance of the Person
type. Py_XDECREF() decrements the count of the first_name and
last_name PyObjects. It's similar to Py_DECREF(), but does not error
if called on NULL. Lastly, we call the tp_free function, which is a
member of the type struct pointed to by the Person instance at
self->ob_base->ob_type; the Py_TYPE() macro can be used as a
shorthand for this lookup. The tp_free function takes self cast as a
PyObject* and frees the memory of the object.
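At the Python level, Py_TYPE() corresponds to type(): both follow the instance's internal pointer to its type object.

```python
x = 3.14
assert type(x) is float        # ~ Py_TYPE(x) yielding &PyFloat_Type
assert x.__class__ is float    # the same type object, reached via the instance
```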

1 static int Person_Init(Person *self, PyObject *args, PyObject *kw\


2 args) {
3
4     PyObject *first=NULL, *last=NULL;
5 static char *kwlist[] = {"first_name", "last_name", NULL};
6 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO", kwlist, \
7 &first, &last))
8 return -1;
9
10 PyObject* _first = self->first_name;
11 Py_INCREF(first);
12 self->first_name = first;
13 Py_XDECREF(_first);
14
15 PyObject* _last = self->last_name;
16 Py_INCREF(last);
17 self->last_name = last;
18 Py_XDECREF(_last);
19
20 return 0;
21 }

Next, we define the initializer, which is used to initialize instances of
the Person type. The function takes a pointer self to the instance of
a Person type, and PyObject pointers for an args tuple and a kwargs
dictionary. The args and kwargs are parsed out into the PyObject
pointers first and last using the PyArg_ParseTupleAndKeywords()
function. Finally, the first and last PyObjects are assigned to the
Person instance. This is done in a specific order: we first stash the
reference to whatever value self->first_name was originally pointing
to in a temporary; we then increase the reference count of the new
first attribute; next we assign the new attribute to the
self->first_name pointer; and finally, we Py_XDECREF() the temporary,
which may be NULL. This ordering is important: the old and new values
could be the same object, and decrementing the old reference before
taking the new one could drop its count to zero and destroy it while
it is still in use.
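To see why the ordering matters, consider assigning an attribute to the very object it already holds. The sketch below models reference counting with a toy Python class (the Toy class and its incref/decref helpers are invented for illustration; real CPython refcounting happens in C):

```python
class Toy:
    """A toy object with a manual reference count."""
    def __init__(self):
        self.refcnt = 1
        self.freed = False

    def incref(self):
        self.refcnt += 1

    def decref(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True  # stands in for the object being destroyed

def assign_attr(holder, new):
    # Mirrors the ordering used in Person_Init:
    old = holder.get("first_name")  # 1. stash the old reference
    new.incref()                    # 2. own the new value first
    holder["first_name"] = new      # 3. swap the pointer
    if old is not None:
        old.decref()                # 4. only now release the old value

holder = {}
value = Toy()
assign_attr(holder, value)  # the holder takes its own reference
value.decref()              # the caller drops its reference
assign_attr(holder, value)  # re-assigning the same object is safe
assert not value.freed      # decref-before-incref would have "freed" it
```

Swap steps 2 and 4 and the second assign_attr call would drive the count to zero before the new reference is taken, destroying an object that is about to be stored.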

static PyObject* Person_FullName(Person* self) {
    if (self->first_name == NULL) {
        PyErr_SetString(PyExc_AttributeError, "first_name is not defined");
        return NULL;
    }

    if (self->last_name == NULL) {
        PyErr_SetString(PyExc_AttributeError, "last_name is not defined");
        return NULL;
    }

    return PyUnicode_FromFormat("%S %S", self->first_name, self->last_name);
}

Next, we define a C function that builds a Unicode string representing
the full name of our person. The function checks whether both the
first_name and last_name attributes of the Person instance have been
set; if not, it raises an AttributeError via the PyErr_SetString()
function, with the message "first_name is not defined" or "last_name
is not defined". If both attributes are set, it creates a Python
Unicode string using the PyUnicode_FromFormat() function, which takes
a template string followed by variadic arguments (here, PyObjects
formatted with %S).
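Taken together, the C code written so far behaves roughly like the following pure-Python class (a sketch for comparison only; it is not part of the extension):

```python
class Person:
    """Approximate pure-Python equivalent of the C Person type."""

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def full_name(self):
        # Mirror the AttributeError behavior of Person_FullName
        for attr in ("first_name", "last_name"):
            if getattr(self, attr, None) is None:
                raise AttributeError(f"{attr} is not defined")
        return f"{self.first_name} {self.last_name}"

person = Person("John", "Smith")
print(person.full_name())  # John Smith
```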

static PyMemberDef PersonMembers[] = {
    {
        "first_name",
        T_OBJECT_EX,
        offsetof(Person, first_name),
        0,
        "first name"
    },
    {
        "last_name",
        T_OBJECT_EX,
        offsetof(Person, last_name),
        0,
        "last name"
    },
    {NULL, }
};

static PyMethodDef PersonMethods[] = {
    {
        "full_name",
        (PyCFunction)Person_FullName,
        METH_NOARGS,
        "return the full name of the Person"
    },
    {NULL, }
};

Next, we define two arrays which hold definitions of the attributes
and methods of the Person type. PersonMembers is an array of
PyMemberDef structs, where each struct of the array defines an
attribute of the Person type: its name, type, offset, read/write
access, and description. PersonMethods is an array of PyMethodDef
structs, each of which defines a method's name, function pointer,
calling-convention flag, and description. The Person_FullName function
is included in this array as a method named "full_name". Both arrays
are NULL-terminated.

static PyTypeObject PersonType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "person.Person",               /* tp_name */
    sizeof(Person),                /* tp_basicsize */
    0,                             /* tp_itemsize */
    (destructor)Person_Destruct,   /* tp_dealloc */
    0,                             /* tp_print */
    0,                             /* tp_getattr */
    0,                             /* tp_setattr */
    0,                             /* tp_reserved */
    0,                             /* tp_repr */
    0,                             /* tp_as_number */
    0,                             /* tp_as_sequence */
    0,                             /* tp_as_mapping */
    0,                             /* tp_hash */
    0,                             /* tp_call */
    0,                             /* tp_str */
    0,                             /* tp_getattro */
    0,                             /* tp_setattro */
    0,                             /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT
        | Py_TPFLAGS_BASETYPE,     /* tp_flags */
    "Person objects",              /* tp_doc */
    0,                             /* tp_traverse */
    0,                             /* tp_clear */
    0,                             /* tp_richcompare */
    0,                             /* tp_weaklistoffset */
    0,                             /* tp_iter */
    0,                             /* tp_iternext */
    PersonMethods,                 /* tp_methods */
    PersonMembers,                 /* tp_members */
    0,                             /* tp_getset */
    0,                             /* tp_base */
    0,                             /* tp_dict */
    0,                             /* tp_descr_get */
    0,                             /* tp_descr_set */
    0,                             /* tp_dictoffset */
    (initproc)Person_Init,         /* tp_init */
    0,                             /* tp_alloc */
    PyType_GenericNew,             /* tp_new */
};

Each of these constituent elements goes into defining a custom
PyTypeObject named PersonType, which defines a Python type called
person.Person. This struct specifies the methods, members, and other
properties of the type. The type has a tp_basicsize of sizeof(Person),
indicating the size of an instance in bytes; tp_dealloc is set to the
destructor Person_Destruct; and tp_methods and tp_members are set to
the PersonMethods and PersonMembers arrays, respectively. The
PyType_GenericNew function is used to instantiate new objects of the
type. We could provide a custom tp_new function to hook into the
constructor protocol of this type, similar to how you would overload
the __new__ method in Python, but for this example the generic
PyType_GenericNew implementation is sufficient.
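The tp_new/tp_init pairing maps directly onto __new__/__init__ at the Python level. The sketch below shows what that hook looks like in pure Python (the created_by_new attribute is invented purely for illustration):

```python
class Person:
    def __new__(cls, *args, **kwargs):
        # __new__ (like tp_new) allocates and returns the instance
        # before __init__ (like tp_init) runs.
        instance = super().__new__(cls)
        instance.created_by_new = True
        return instance

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

p = Person("John", "Smith")
print(p.created_by_new)  # True
```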

static PyModuleDef PersonModule = {
    PyModuleDef_HEAD_INIT,
    "person",
    "Example module for creating Python types in C",
    -1,
    NULL
};

PyMODINIT_FUNC PyInit_person() {

    PyObject* module = PyModule_Create(&PersonModule);
    if (module == NULL)
        return NULL;

    if (PyType_Ready(&PersonType) < 0)
        return NULL;

    Py_INCREF(&PersonType);
    PyModule_AddObject(module, "Person", (PyObject*)&PersonType);
    return module;
}

Finally, we create a module definition, PersonModule, and instantiate
it using PyModule_Create() in the module initialization function. Once
the module is created, we call PyType_Ready() to finish initializing
our new type, increase its reference count, and add it as an object to
the module under the name "Person" using PyModule_AddObject(). Once
all of this is completed, we return the module.
After writing this to a .c file and adding it to our setup.py, we can
compile the extension and run the interpreter.
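The setup.py in question was introduced earlier in the book; for this module, the relevant entry would look something like the following sketch (the package and file names are assumed from the build output):

```python
# setup.py (sketch) -- assumes person.c lives under my_package/
from setuptools import setup, Extension

setup(
    name="my_package",
    ext_modules=[
        Extension("my_package.person", sources=["my_package/person.c"]),
    ],
)
```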

root@e94757ea01f2:/code/extensions# python setup.py build_ext --inplace
running build_ext
building 'my_package.person' extension
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.11 -c my_package/person.c -o build/temp.linux-x86_64-cpython-311/my_package/person.o
gcc -pthread -shared build/temp.linux-x86_64-cpython-311/my_package/person.o -L/usr/local/lib -o build/lib.linux-x86_64-cpython-311/my_package/person.cpython-311-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-cpython-311/my_package/person.cpython-311-x86_64-linux-gnu.so -> my_package
root@e94757ea01f2:/code/extensions# python
Python 3.11.1 (main, Jan 23 2023, 21:04:06) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from my_package import person
>>> my_person = person.Person(first_name="John", last_name="Smith")
>>> my_person.full_name()
'John Smith'
>>>

Stack type

Let’s look at another example. Here we’re going to be implementing a


basic stack type with the methods push() and pop(). These methods
will operate on an internal memory allocation of PyObject*’s, adding
and removing items as necessary. We’ll also define a length attribute to
show the current size of the data structure.

#include <Python.h>
#include "structmember.h"

typedef struct {
    PyObject_HEAD
    size_t length;
    PyObject* pop;
    PyObject* push;
    PyObject** _data;
} Stack;

static PyObject* Stack_Push(Stack* self, PyObject* item) {
    size_t len = self->length + 1;
    self->_data = realloc(self->_data, len*sizeof(PyObject*));
    Py_INCREF(item);
    self->_data[self->length] = item;
    self->length = len;
    Py_RETURN_NONE;
}

static PyObject* Stack_Pop(Stack* self) {
    if (self->length == 0)
        Py_RETURN_NONE;
    long len = self->length - 1;
    PyObject* item = self->_data[len];
    self->_data = realloc(self->_data, len*sizeof(PyObject*));
    self->length = len;
    return item;
}

static PyObject* Stack_New(PyTypeObject* type, PyObject* args, PyObject* kwargs) {
    Stack* self = (Stack*)type->tp_alloc(type, 0);
    if (!self)
        return NULL;
    self->length = 0;
    self->_data = malloc(0);
    return (PyObject*)self;
}

static void Stack_Destruct(Stack* self) {
    for (size_t i = 0; i < self->length; i++)
        Py_DECREF(self->_data[i]);
    free(self->_data);
    Py_TYPE(self)->tp_free((PyObject*)self);
}

static PyMemberDef StackMembers[] = {
    {"length", T_LONG, offsetof(Stack, length), READONLY, "stack length"},
    {NULL, }
};

static PyMethodDef StackMethods[] = {
    {"push", (PyCFunction)Stack_Push, METH_O, "push item to the stack"},
    {"pop", (PyCFunction)Stack_Pop, METH_NOARGS, "pop an item from the stack"},
    {NULL, }
};

static PyTypeObject StackType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "stack.Stack",                 /* tp_name */
    sizeof(Stack),                 /* tp_basicsize */
    0,                             /* tp_itemsize */
    (destructor)Stack_Destruct,    /* tp_dealloc */
    0,                             /* tp_print */
    0,                             /* tp_getattr */
    0,                             /* tp_setattr */
    0,                             /* tp_reserved */
    0,                             /* tp_repr */
    0,                             /* tp_as_number */
    0,                             /* tp_as_sequence */
    0,                             /* tp_as_mapping */
    0,                             /* tp_hash */
    0,                             /* tp_call */
    0,                             /* tp_str */
    0,                             /* tp_getattro */
    0,                             /* tp_setattro */
    0,                             /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT
        | Py_TPFLAGS_BASETYPE,     /* tp_flags */
    "Stack objects",               /* tp_doc */
    0,                             /* tp_traverse */
    0,                             /* tp_clear */
    0,                             /* tp_richcompare */
    0,                             /* tp_weaklistoffset */
    0,                             /* tp_iter */
    0,                             /* tp_iternext */
    StackMethods,                  /* tp_methods */
    StackMembers,                  /* tp_members */
    0,                             /* tp_getset */
    0,                             /* tp_base */
    0,                             /* tp_dict */
    0,                             /* tp_descr_get */
    0,                             /* tp_descr_set */
    0,                             /* tp_dictoffset */
    0,                             /* tp_init */
    0,                             /* tp_alloc */
    Stack_New,                     /* tp_new */
};

static PyModuleDef StackModule = {
    PyModuleDef_HEAD_INIT,
    "stack",
    "module for custom stack object",
    -1,
    NULL
};

PyMODINIT_FUNC PyInit_stack() {
    if (PyType_Ready(&StackType) < 0)
        return NULL;

    PyObject* module = PyModule_Create(&StackModule);
    if (!module)
        return NULL;

    Py_INCREF(&StackType);
    PyModule_AddObject(module, "Stack", (PyObject*)&StackType);
    return module;
}

It should be noted that this example reallocates the _data buffer on
every push and pop; for the sake of brevity, it is not optimized.
In this example, we define a Stack_New() function instead of using the
standard PyType_GenericNew. This allows us to customize the
instantiation of our Stack struct, setting the default values for the
_data and length fields. Since Stack() won't take any initialization
arguments, we set tp_init to 0. We also use T_LONG as the member type
in the PyMemberDef entry, so the field value is automatically
converted to a Python int when the struct field is requested from the
interpreter, and we mark the attribute READONLY so as to prevent its
value from being reassigned.
Since this object is responsible for its own data collection, it's
worth taking a moment to look at its implementation in finer detail:
specifically, the push() and pop() methods of the struct, as well as
the Stack_Destruct implementation that is called at garbage collection.
When an item is passed to the Stack during a push operation, the
PyObject* arrives as a borrowed reference. Since we are writing a copy
of this reference into _data, we need to explicitly inform the
interpreter that the Stack owns a reference, and that until the Stack
releases this reference, the item should not be garbage collected. We
do this by increasing the reference count.
When an item is popped from the Stack, we can simply return it without
decreasing the reference count, because we are returning an owned
reference to the caller.
Finally, when the Stack is deallocated, every owned reference in its
collection of PyObject*'s must be released; if this is not done, the
reference count of each item still on the stack will never hit zero,
resulting in memory leaks. This is done by iterating over the length
of the _data array and calling Py_DECREF on each of the items. Only
after this is done can we call tp_free to delete the Stack object.
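As a reference point, the observable behavior of the finished type can be modeled in pure Python (a sketch for comparison only; the real type does its bookkeeping in C):

```python
class Stack:
    """Pure-Python model of the C Stack's observable behavior."""

    def __init__(self):
        self._data = []  # the C version grows a PyObject** buffer instead

    @property
    def length(self):  # read-only, like the READONLY T_LONG member
        return len(self._data)

    def push(self, item):
        self._data.append(item)  # the C version also Py_INCREFs here
        return None

    def pop(self):
        # Like Stack_Pop, return None rather than raising when empty
        if not self._data:
            return None
        return self._data.pop()

stack = Stack()
stack.push("a")
stack.push("b")
print(stack.length)  # 2
print(stack.pop())   # b
```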

Debugging C extensions

Similar to pdb, it's possible to stop the execution of the Python
interpreter inside C extensions using a C debugger like gdb. gdb
allows you to find and fix errors in C code by providing information
about the state of the extension while it is running.
To make the best use of this, it's important to first compile
extensions without optimizations. By default, Python compiles
extensions with an -O3 optimization flag; this is good for production,
but can result in objects being optimized out of the extension. To
compile without optimizations, set the CFLAGS environment variable to
-O0 during the extension build step.

root@be45641b03f3:/code/extensions# CFLAGS="-O0" python setup.py build_ext --inplace --force
running build_ext
...

Once the C extension is compiled, use gdb to start the Python
interpreter. Note that gdb requires a binary executable, so modules
(for example pytest) and scripts need to be invoked through the python
executable, either as a module (python -m pytest) or as a script.

root@be45641b03f3:/code/extensions# gdb --args python script.py
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
(gdb)

Once in the gdb shell, we can do things such as set breakpoints, inspect
the call stack, observe variables, and run our program.

(gdb) b Stack_Push
Function "Stack_Push" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (Stack_Push) pending.
(gdb) run
Starting program: /usr/local/bin/python script.py
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, Stack_Push (self=0x7f0d84d2bde0, item=0) at my_package/stack.c:13
13          size_t len = self->length + 1;
(gdb) b 17
Breakpoint 2 at 0x7f0d84adb25c: file my_package/stack.c, line 17.
(gdb) c
Continuing.

Breakpoint 2, Stack_Push (self=0x7f0d84d2bde0, item=0) at my_package/stack.c:17
17          self->length = len;
(gdb) p len
$1 = 1
(gdb) l
12      static PyObject* Stack_Push(Stack* self, PyObject* item) {
13          size_t len = self->length + 1;
14          self->_data = realloc(self->_data, len*sizeof(PyObject*));
15          Py_INCREF(item);
16          self->_data[self->length] = item;
17          self->length = len;
18          Py_RETURN_NONE;
19      }
20
21      static PyObject* Stack_Pop(Stack* self) {
(gdb) p *self
$2 = {ob_base = {ob_refcnt = 2, ob_type = 0x7f0d84ade140 <StackType>}, length = 0,
  _data = 0x563ee09d7000, push = 0x0, pop = 0x0}
(gdb) p *item
$3 = {ob_refcnt = 1000000155, ob_type = 0x7f0d8564b760 <PyLong_Type>}
(gdb) p (long)PyLong_AsLong(item)
$4 = 0
(gdb)
