A Whirlwind Tour of Python
Jake VanderPlas
Beijing
Tokyo
First Edition
978-1-491-96465-1
[LSI]
Introduction
Conceived in the late 1980s as a teaching and scripting language,
Python has since become an essential tool for many programmers,
engineers, researchers, and data scientists across academia and
industry. As an astronomer focused on building and promoting the
free open tools for data-intensive science, I've found Python to be a
near-perfect fit for the types of problems I face day to day, whether
it's extracting meaning from large astronomical datasets, scraping
and munging data sources from the Web, or automating day-to-day
research tasks.
The appeal of Python is in its simplicity and beauty, as well as the
convenience of the large ecosystem of domain-specific tools that
have been built on top of it. For example, most of the Python code
in scientific computing and data science is built around a group of
mature and useful packages:
NumPy provides efficient storage and computation for multidimensional data arrays.
SciPy contains a wide array of numerical tools such as numerical integration and interpolation.
Pandas provides a DataFrame object along with a powerful set of methods to manipulate, filter, group, and transform data.
Matplotlib provides a useful interface for creation of publication-quality plots and figures.
Scikit-Learn provides a uniform toolkit for applying common machine learning algorithms to data.
With the interpreter running, you can begin to type and execute
code snippets. Here we'll use the interpreter as a simple calculator,
performing calculations and assigning values to variables:
>>> 1 + 1
2
>>> x = 5
>>> x * 3
15
Note that just as the input is numbered, the output of each command
is numbered as well. IPython makes available a wide array of
useful features; for some suggestions on where to read more, see
Resources for Further Learning on page 90.
To run this file, we make sure it is in the current directory and type
# shorthand for x = x + 2
Python does not have any syntax for multiline comments, such as
the /* ... */ syntax used in C and C++, though multiline strings
are often used as a replacement for multiline comments (more on
this in String Manipulation and Regular Expressions on page 69).
This shows the example of how the semicolon (;) familiar in C can
be used optionally in Python to put two statements on a single line.
Functionally, this is entirely equivalent to writing:
lower = []
upper = []
>>> if x < 4:              >>> if x < 4:
...     y = x * 2          ...     y = x * 2
...     print(x)           ... print(x)
In the snippet on the left, print(x) is in the indented block, and will
be executed only if x is less than 4. In the snippet on the right,
print(x) is outside the block, and will be executed regardless of the
value of x!
to
x = 10 ** -2
I find the second version with spaces much more easily readable at a
single glance. Most Python style guides recommend using a single
space around binary operators, and no space around unary operators.
We'll discuss Python's operators further in Basic Python
Semantics: Variables and Objects on page 13.
# Python 3 only!
>>> print("first value:", 1)
first value: 1
This may seem straightforward, but if you have the wrong mental
model of what this operation does, the way Python works may seem
confusing. We'll briefly dig into that here.
In many programming languages, variables are best thought of as
containers or buckets into which you put data. So in C, for example,
when you write
// C code
int x = 4;
this dynamic typing is one of the pieces that makes Python so quick
to write and easy to read.
There is a consequence of this "variable as pointer" approach that
you need to be aware of. If we have two variable names pointing to
the same mutable object, then changing one will change the other as
well! For example, let's create and modify a list:
In [2]: x = [1, 2, 3]
        y = x
We've created two variables x and y that both point to the same
object. Because of this, if we modify the list via one of its names,
we'll see that the "other" list will be modified as well:
In [3]: print(y)
[1, 2, 3]
In [4]: x.append(4) # append 4 to the list pointed to by x
        print(y) # y's list is modified as well!
[1, 2, 3, 4]
points. For this reason, the value of y is not affected by the operation.
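The difference between mutating a shared object and rebinding a name can be sketched as follows. This is a minimal illustration, not code from the text; the names a and b and the use of list() to make a shallow copy are assumptions added here.

```python
# Two names bound to the same list object
x = [1, 2, 3]
y = x
x.append(4)                      # mutates the one shared list in place
y_sees_change = (y == [1, 2, 3, 4])

# Rebinding x makes it point at a new object; y still points at the list
x = 'something else'
y_after_rebind = list(y)         # snapshot of what y refers to now

# An explicit shallow copy produces an independent list
# (a and b are hypothetical names used only for this sketch)
a = [1, 2, 3]
b = list(a)
a.append(4)                      # b is unaffected by this change

print(y_sees_change, y_after_rebind, b)
```

If an independent list is what you want, copying explicitly (here with list()) avoids the shared-object surprise.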
Everything Is an Object
Python is an object-oriented programming language, and in Python
everything is an object.
Let's flesh out what this means. Earlier we saw that variables are simply
pointers, and the variable names themselves have no attached
type information. This leads some to claim erroneously that Python
is a type-free language. But this is not the case! Consider the following:
In [7]: x = 4
        type(x)
Out [7]: int
In [8]: x = 'hello'
        type(x)
Out [8]: str
In [9]: x = 3.14159
        type(x)
Out [9]: float
Python has types; however, the types are linked not to the variable
names but to the objects themselves.
In object-oriented programming languages like Python, an object is
an entity that contains data along with associated metadata and/or
functionality. In Python, everything is an object, which means every
entity has some metadata (called attributes) and associated functionality
(called methods). These attributes and methods are accessed via
the dot syntax.
For example, before we saw that lists have an append method, which
adds an item to the list, and is accessed via the dot syntax (.):
In [10]: L = [1, 2, 3]
         L.append(100)
         print(L)
[1, 2, 3, 100]
example, numerical types have a real and imag attribute that return
the real and imaginary part of the value, if viewed as a complex
number:
In [11]: x = 4.5
         print(x.real, "+", x.imag, 'i')
4.5 + 0.0 i
Methods are like attributes, except they are functions that you can
call using a pair of opening and closing parentheses. For example,
floating-point numbers have a method called is_integer that
checks whether the value is an integer:
In [12]: x = 4.5
         x.is_integer()
Out [12]: False
In [13]: x = 4.0
         x.is_integer()
Out [13]: True
In [14]: type(x.is_integer)
Out [14]: builtin_function_or_method
Arithmetic Operations
Python implements seven basic binary arithmetic operators, two of
which can double as unary operators. They are summarized in the
following table:
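Operator   Name             Description
a + b      Addition         Sum of a and b
a - b      Subtraction      Difference of a and b
a * b      Multiplication   Product of a and b
a / b      True division    Quotient of a and b
a // b     Floor division   Quotient of a and b, removing fractional parts
a % b      Modulus          Integer remainder after division of a by b
a ** b     Exponentiation   a raised to the power of b
-a         Negation         The negative of a
+a         Unary plus       a unchanged (rarely used)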
Bitwise Operations
In addition to the standard numerical operations, Python includes
operators to perform bitwise logical operations on integers. These
are much less commonly used than the standard arithmetic operations,
but it's useful to know that they exist. The six bitwise operators
are summarized in the following table:
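Operator   Name              Description
a & b      Bitwise AND       Bits defined in both a and b
a | b      Bitwise OR        Bits defined in a or b or both
a ^ b      Bitwise XOR       Bits defined in a or b but not both
a << b     Bit shift left    Shift bits of a left by b units
a >> b     Bit shift right   Shift bits of a right by b units
~a         Bitwise NOT       Bitwise negation of a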
These bitwise operators only make sense in terms of the binary representation
of numbers, which you can see using the built-in bin
function:
In [4]: bin(10)
Out [4]: '0b1010'
In [5]: bin(4)
Out [5]: '0b100'
Now, using bitwise OR, we can find the number which combines the
bits of 4 and 10:
In [6]: 4 | 10
Out [6]: 14
In [7]: bin(4 | 10)
Out [7]: '0b1110'
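As a quick illustration of the six bitwise operators applied to the same two numbers, here is a small sketch. The variable names are chosen for this example only.

```python
a, b = 10, 4              # 0b1010 and 0b0100

and_bits = a & b          # bits set in both a and b
or_bits = a | b           # bits set in a or b or both
xor_bits = a ^ b          # bits set in exactly one of a and b
shifted_left = a << 2     # shift the bits of a left by 2 places
shifted_right = a >> 1    # shift the bits of a right by 1 place
flipped = ~a              # flip every bit of a

for name, value in [('a & b', and_bits), ('a | b', or_bits),
                    ('a ^ b', xor_bits), ('a << 2', shifted_left),
                    ('a >> 1', shifted_right), ('~a', flipped)]:
    print(name, '=', value, bin(value))
```

Because 10 and 4 share no set bits, a & b is zero here while a | b and a ^ b agree.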
Assignment Operations
We've seen that variables can be assigned with the = operator, and
the values stored for later use. For example:
In [8]: a = 24
        print(a)
24
In [9]: a + 2
Out [9]: 26
We might want to update the variable a with this new value; in this
case, we could combine the addition and the assignment and write
a = a + 2. Because this type of combined operation and assignment
is so common, Python includes built-in update operators for
all of the arithmetic operations:
In [10]: a += 2 # equivalent to a = a + 2
         print(a)
26
a += b    a -= b    a *= b    a /= b
a //= b   a %= b    a **= b   a &= b
a |= b    a ^= b    a <<= b   a >>= b
Comparison Operations
Another type of operation that can be very useful is comparison of
different values. For this, Python implements standard comparison
operators, which return Boolean values True and False. The comparison
operations are listed in the following table:
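Operation   Description
a == b      a equal to b
a != b      a not equal to b
a < b       a less than b
a > b       a greater than b
a <= b      a less than or equal to b
a >= b      a greater than or equal to b
In [11]: # 25 is odd
         25 % 2 == 1
Out [11]: True
In [12]: # 66 is odd
         66 % 2 == 1
Out [12]: False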
And, just to make your head hurt a bit, take a look at this comparison:
In [14]: -1 == ~0
Out [14]: True
Recall that ~ is the bit-flip operator, and evidently when you flip all
the bits of zero you end up with -1. If you're curious as to why this
is, look up the two's complement integer encoding scheme, which is
what Python uses to encode signed integers, and think about what
happens when you start flipping all the bits of integers encoded this way.
Boolean Operations
When working with Boolean values, Python provides operators to
combine the values using the standard concepts of and, or, and
not. Predictably, these operators are expressed using the words
and, or, and not:
In [15]: x = 4
         (x < 6) and (x > 2)
Out [15]: True
In [16]: (x > 10) or (x % 2 == 0)
Out [16]: True
In [17]: not (x < 6)
Out [17]: False
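Operator     Description
a is b       True if a and b are identical objects
a is not b   True if a and b are not identical objects
a in b       True if a is a member of b
a not in b   True if a is not a member of b
In [19]: a = [1, 2, 3]
         b = [1, 2, 3]
In [20]: a == b
Out [20]: True
In [21]: a is b
Out [21]: False
In [22]: a is not b
Out [22]: True
In [23]: a = [1, 2, 3]
         b = a
         a is b
Out [23]: True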
The difference between the two cases here is that in the first, a and b
point to different objects, while in the second they point to the same
object. As we saw in the previous section, Python variables are
pointers. The is operator checks whether the two variables are
pointing to the same container (object), rather than referring to
what the container contains. With this in mind, in most cases that a
beginner is tempted to use is, what they really mean is ==.
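The distinction between identity and equality can be sketched as follows. The describe function and its default argument are hypothetical, added only to show a common place where is is the conventional choice (testing against None).

```python
a = [1, 2, 3]
b = [1, 2, 3]     # equal contents, but a separate object
c = a             # a second name for the same object

equal_values = (a == b)   # compares contents
same_object = (a is b)    # compares identity
alias = (a is c)          # c and a point to the same list


def describe(value=None):
    """Hypothetical helper: `is None` checks for the single None object."""
    if value is None:
        return "no value given"
    return "got {}".format(value)


print(equal_values, same_object, alias)
print(describe(), describe(0))
```

Note that describe(0) is not treated as "no value": the check is on identity with None, not on truthiness.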
Membership operators
Membership operators check for membership within compound
objects. So, for example, we can write:
In [24]: 1 in [1, 2, 3]
Out [24]: True
In [25]: 2 not in [1, 2, 3]
Out [25]: False
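Type       Example      Description
int        x = 1        Integers (i.e., whole numbers)
float      x = 1.0      Floating-point numbers (i.e., real numbers)
complex    x = 1 + 2j   Complex numbers (i.e., numbers with real and imaginary part)
bool       x = True     Boolean: True/False values
str        x = 'abc'    String: characters or text
NoneType   x = None     Special object indicating nulls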
Integers
The most basic numerical type is the integer. Any number without a
decimal point is an integer:
In [1]: x = 1
        type(x)
Out [1]: int
Python integers are actually quite a bit more sophisticated than integers
in languages like C. C integers are fixed-precision, and usually
In [2]: 2 ** 200
Out [2]:
1606938044258990275541962092341162602522202993782792835301376
In [3]: 5 / 2
Out [3]: 2.5
In [4]: 5 // 2
Out [4]: 2
Finally, note that although Python 2.x had both an int and long
type, Python 3 combines the behavior of these two into a single int
type.
Floating-Point Numbers
The floating-point type can store fractional numbers. They can be
defined either in standard decimal notation, or in exponential notation:
In [5]: x = 0.000005
        y = 5e-6
        print(x == y)
True
In [6]: x = 1400000.00
        y = 1.4e6
        print(x == y)
True
In [7]: float(1)
Out [7]: 1.0
Floating-point precision
One thing to be aware of with floating-point arithmetic is that its
precision is limited, which can cause equality tests to be unstable.
For example:
In [8]: 0.1 + 0.2 == 0.3
Out [8]: False
Why is this the case? It turns out that it is not a behavior unique to
Python, but is due to the fixed-precision format of the binary
floating-point storage used by most, if not all, scientific computing
platforms. All programming languages using floating-point numbers
store them in a fixed number of bits, and this leads some numbers
to be represented only approximately. We can see this by
printing the three values to high precision:
In [9]: print("0.1 = {0:.17f}".format(0.1))
        print("0.2 = {0:.17f}".format(0.2))
        print("0.3 = {0:.17f}".format(0.3))
0.1 = 0.10000000000000001
0.2 = 0.20000000000000001
0.3 = 0.29999999999999999
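One common way to work around inexact equality is to compare within a tolerance rather than exactly. The sketch below uses math.isclose from the standard library (available in Python 3.5 and later); the tolerance in the manual check is an arbitrary value chosen for illustration.

```python
import math

total = 0.1 + 0.2

exact = (total == 0.3)                   # exact comparison of two approximations
tolerant = math.isclose(total, 0.3)      # relative-tolerance comparison
manual = abs(total - 0.3) < 1e-9         # hand-rolled absolute tolerance (assumed value)

print(exact, tolerant, manual)
```

Which tolerance is appropriate depends on the magnitudes involved, so treat the defaults as a starting point rather than a rule.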
Complex Numbers
Complex numbers are numbers with real and imaginary (floating-point)
parts. We've seen integers and real numbers before; we can
use these to construct a complex number:
In [10]: complex(1, 2)
Out [10]: (1+2j)
In [11]: 1 + 2j
Out [11]: (1+2j)
In [12]: c = 3 + 4j
In [13]: c.real # real part
Out [13]: 3.0
In [14]: c.imag # imaginary part
Out [14]: 4.0
In [15]: c.conjugate() # complex conjugate
Out [15]: (3-4j)
String Type
Strings in Python are created with single or double quotes:
In [17]: message = "what do you like?"
         response = 'spam'
In [18]: # length of string
         len(response)
Out [18]: 4
In [19]: # concatenation with +
         message + response
Out [19]: 'what do you like?spam'
In [23]:
None Type
Python includes a special type, the NoneType, which has only a single
possible value: None. For example:
In [24]: type(None)
Out [24]: NoneType
You'll see None used in many places, but perhaps most commonly it
is used as the default return value of a function. For example, the
print() function in Python 3 does not return anything, but we can
still catch its value:
In [25]: return_value = print('abc')
abc
In [26]: print(return_value)
None
Boolean Type
The Boolean type is a simple type with two possible values: True and
False, and is returned by comparison operators discussed previously:
In [27]: result = (4 < 5)
         result
Out [27]: True
In [28]: type(result)
Out [28]: bool
Keep in mind that the Boolean values are case-sensitive: unlike some
other languages, True and False must be capitalized!
In [29]: print(True, False)
True False
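In [30]: bool(2014)
Out [30]: True
In [31]: bool(0)
Out [31]: False
In [32]: bool(3.1415)
Out [32]: True
In [33]: bool(None)
Out [33]: False
For strings, bool(s) is False for empty strings and True otherwise:
In [34]: bool("")
Out [34]: False
In [35]: bool("abc")
Out [35]: True
For sequences, which we'll see in the next section, the Boolean representation
is False for empty sequences and True for any other
sequences:
In [36]: bool([1, 2, 3])
Out [36]: True
In [37]: bool([])
Out [37]: False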
Type Name   Example                 Description
list        [1, 2, 3]               Ordered collection
tuple       (1, 2, 3)               Immutable ordered collection
dict        {'a':1, 'b':2, 'c':3}   Unordered (key,value) mapping
set         {1, 2, 3}               Unordered collection of unique values
As you can see, round, square, and curly brackets have distinct
meanings when it comes to the type of collection produced. We'll
take a quick tour of these data structures here.
Lists
Lists are the basic ordered and mutable data collection type in
Python. They can be defined with comma-separated values between
square brackets; here is a list of the first several prime numbers:
In [1]: L = [2, 3, 5, 7]
In [2]: # Length of a list
        len(L)
Out [2]: 4
In [3]:
In addition, there are many more built-in list methods; they are
well-covered in Python's online documentation.
While we've been demonstrating lists containing values of a single
type, one of the powerful features of Python's compound objects is
that they can contain objects of any type, or even a mix of types. For
example:
Python uses zero-based indexing, so we can access the first and second
elements using the following syntax:
In [8]:
L[0]
Out [8]: 2
In [9]:
L[1]
Out [9]: 3
Elements at the end of the list can be accessed with negative num
bers, starting from -1:
In [10]:
L[-1]
Out [10]: 11
In [12]:
L[-2]
Out [12]: 7
L[0:3]
Notice where 0 and 3 lie in the preceding diagram, and how the slice
takes just the values between the indices. If we leave out the first
index, 0 is assumed, so we can equivalently write the following:
In [13]:
L[:3]
Similarly, if we leave out the last index, it defaults to the length of the
list. Thus, the last three elements can be accessed as follows:
In [14]:
L[-3:]
L[::2]
# equivalent to L[0:len(L):2]
L[::-1]
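The slicing forms discussed here can be gathered into one short sketch. The list contents are inferred from the indexing results shown earlier (L[0] is 2, L[-1] is 11), so treat them as an assumption.

```python
L = [2, 3, 5, 7, 11]      # assumed contents, consistent with the outputs above

first_three = L[0:3]      # elements at indices 0, 1, 2
also_first_three = L[:3]  # a missing start index defaults to 0
last_three = L[-3:]       # a missing end index defaults to len(L)
every_other = L[::2]      # step of 2, same as L[0:len(L):2]
reversed_copy = L[::-1]   # a negative step walks the list backward

print(first_three, last_three, every_other, reversed_copy)
```

Each slice produces a new list, so L itself is left unchanged by these expressions.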
Tuples
Tuples are in many ways similar to lists, but they are defined with
parentheses rather than square brackets:
In [19]: t = (1, 2, 3)
Like the lists discussed before, tuples have a length, and individual
elements can be extracted using square-bracket indexing:
In [21]:
len(t)
Out [21]: 3
In [22]:
t[0]
Out [22]: 1
<ipython-input-23-141c76cb54a2> in <module>()
----> 1 t[1] = 4
In [24]: t.append(4)
--------------------------------------------------------AttributeError
<ipython-input-24-e8bd1632f9dd> in <module>()
----> 1 t.append(4)
x = 0.125
x.as_integer_ratio()
The indexing and slicing logic covered earlier for lists works for
tuples as well, along with a host of other methods. Refer to the Data
Structures documentation for a more complete list of these.
Dictionaries
Dictionaries are extremely flexible mappings of keys to values, and
form the basis of much of Python's internal implementation. They
can be created via a comma-separated list of key:value pairs within
curly braces:
In [27]: numbers = {'one':1, 'two':2, 'three':3}
Items are accessed and set via the indexing syntax used for lists and
tuples, except here the index is not a zero-based order but valid key
in the dictionary:
In [28]: numbers['two']
Out [28]: 2
Sets
The fourth basic collection is the set, which contains unordered collections
of unique items. They are defined much like lists and tuples,
except they use the curly brackets of dictionaries:
In [30]: primes = {2, 3, 5, 7}
         odds = {1, 3, 5, 7, 9}
In [34]: # symmetric difference: items appearing in only one set
         primes ^ odds # with an operator
         primes.symmetric_difference(odds) # equivalently with a method
Out [34]: {1, 2, 9}
Many more set methods and operations are available. You've probably
already guessed what I'll say next: refer to Python's online documentation
for a complete reference.
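For reference, the four standard set operations on the two sets defined above can be sketched side by side. The result variable names are chosen for this example.

```python
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

union = primes | odds            # items appearing in either set
intersection = primes & odds     # items appearing in both sets
difference = primes - odds       # items in primes but not in odds
symmetric = primes ^ odds        # items appearing in exactly one set

# Each operator has an equivalent method
same_union = primes.union(odds)

print(union, intersection, difference, symmetric)
```

Sets are unordered, so the printed order of elements is not something to rely on.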
Control Flow
Control flow is where the rubber really meets the road in programming.
Without it, a program is simply a list of statements that are
sequentially executed. With control flow, you can execute certain
code blocks conditionally and/or repeatedly: these basic building
blocks can be combined to create surprisingly sophisticated programs!
In [1]: x = -15
        if x == 0:
            print(x, "is zero")
        elif x > 0:
            print(x, "is positive")
        elif x < 0:
            print(x, "is negative")
        else:
            print(x, "is unlike anything I've ever seen...")
-15 is negative
Note especially the use of colons (:) and whitespace to denote separate
blocks of code.
Python adopts the if and else often used in other languages; its
more unique keyword is elif, a contraction of "else if". In these
conditional clauses, elif and else blocks are optional; additionally,
you can include as few or as many elif statements as you
would like.
for loops
Loops in Python are a way to repeatedly execute some code statement.
So, for example, if we'd like to print each of the items in a list,
we can use a for loop:
In [2]: for N in [2, 3, 5, 7]:
            print(N, end=' ') # print all on same line
2 3 5 7
Note that the range starts at zero by default, and that by convention
the top of the range is not included in the output. Range objects can
also have more complicated values:
In [4]: # range from 5 to 10
        list(range(5, 10))
Out [4]: [5, 6, 7, 8, 9]
In [5]: # range from 0 to 10 by 2
        list(range(0, 10, 2))
Out [5]: [0, 2, 4, 6, 8]
You might notice that the meaning of range arguments is very similar
to the slicing syntax that we covered in Lists on page 31.
Note that the behavior of range() is one of the differences between
Python 2 and Python 3: in Python 2, range() produces a list, while
in Python 3, range() produces an iterable object.
while loops
The other type of loop in Python is a while loop, which iterates until
some condition is met:
In [6]: i = 0
        while i < 10:
            print(i, end=' ')
            i += 1
0 1 2 3 4 5 6 7 8 9
Notice that we use a while True loop, which will loop forever unless
we have a break statement!
The else statement only executes if none of the factors divide the
given number. The else statement works similarly with the while
loop.
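The loop-with-else behavior described here can be sketched with a small prime finder. This is an illustrative version under stated assumptions, not necessarily the listing the text refers to; the function name primes_below is hypothetical.

```python
def primes_below(nmax):
    """Hypothetical helper: collect the primes smaller than nmax."""
    found = []
    for n in range(2, nmax):
        for factor in found:
            if n % factor == 0:
                break            # n has a factor, so it is not prime
        else:
            # Runs only if the inner loop finished without hitting break
            found.append(n)
    return found


print(primes_below(30))
```

It can help to read the else here as "no break": the block is skipped exactly when the loop was exited early.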
Using Functions
Functions are groups of code that have a name and can be called
using parentheses. We've seen functions before. For example, print
in Python 3 is a function:
In [1]: print('abc')
abc
Here print is the function name, and 'abc' is the function's argument.
In addition to arguments, there are keyword arguments that are
specified by name. One available keyword argument for the print()
Defining and Using Functions
Defining Functions
Functions become even more useful when we begin to define our
own, organizing functionality to be used in multiple places. In
Python, functions are defined with the def statement. For example,
we can encapsulate a version of our Fibonacci sequence code from
the previous section as follows:
In [4]: def fibonacci(N):
            L = []
            a, b = 0, 1
            while len(L) < N:
                a, b = b, a + b
                L.append(a)
            return L
In [5]: fibonacci(10)
Out [5]: [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
r, i, c = real_imag_conj(3 + 4j)
print(r, i, c)
3.0 4.0 (3-4j)
fibonacci(10)
But now we can use the function to explore new things, such as the
effect of new starting values:
In [9]: fibonacci(10, 0, 2)
Out [9]: [2, 2, 4, 6, 10, 16, 26, 42, 68, 110]
In [10]: fibonacci(10, b=3, a=1)
Out [10]: [3, 4, 7, 11, 18, 29, 47, 76, 123, 199]
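Calls like these require a version of fibonacci that accepts the starting values as optional arguments. A minimal sketch, assuming the parameters are named a and b and default to 0 and 1:

```python
def fibonacci(N, a=0, b=1):
    """Return N terms of a Fibonacci-style sequence seeded with a and b."""
    L = []
    while len(L) < N:
        a, b = b, a + b
        L.append(a)
    return L


default_run = fibonacci(10)               # uses a=0, b=1
positional = fibonacci(10, 0, 2)          # starting values passed by position
by_keyword = fibonacci(10, b=3, a=1)      # keyword order does not matter

print(default_run)
print(positional)
print(by_keyword)
```

With defaults in place, a single-argument call behaves exactly like the original one-parameter function.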
Here it is not the names args and kwargs that are important, but the
* characters preceding them. args and kwargs are just the variable
names often used by convention, short for "arguments" and "keyword
arguments." The operative difference is the asterisk characters:
a single * before a variable means "expand this as a sequence," while
a double ** before a variable means "expand this as a dictionary." In
fact, this syntax can be used not only with the function definition,
but with the function call as well!
In [14]: inputs = (1, 2, 3)
         keywords = {'pi': 3.14}
         catch_all(*inputs, **keywords)
args = (1, 2, 3)
kwargs = {'pi': 3.14}
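To make the packing and unpacking concrete, here is a small sketch using a hypothetical function, collect_arguments, that simply returns what it received. It is a stand-in for illustration and not the catch_all function from the text.

```python
def collect_arguments(*args, **kwargs):
    """Hypothetical helper: return the packed positional and keyword arguments."""
    return args, kwargs


# Packing: loose arguments are gathered into a tuple and a dict
packed = collect_arguments(1, 2, 3, pi=3.14)

# Unpacking: * and ** spread a sequence and a dict back into arguments
inputs = (1, 2, 3)
keywords = {'pi': 3.14}
unpacked = collect_arguments(*inputs, **keywords)

print(packed)
print(unpacked == packed)
```

Both calls hand the function the same arguments, which is why the two results compare equal.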
In [15]: add = lambda x, y: x + y
         add(1, 2)
Out [15]: 3
So why would you ever want to use such a thing? Primarily, it comes
down to the fact that everything is an object in Python, even functions
themselves! That means that functions can be passed as arguments
to functions.
As an example of this, suppose we have some data stored in a list of
dictionaries:
In [17]: data = [{'first':'Guido', 'last':'Van Rossum', 'YOB':1956},
                 {'first':'Grace', 'last':'Hopper', 'YOB':1906},
                 {'first':'Alan', 'last':'Turing', 'YOB':1912}]
Now suppose we want to sort this data. Python has a sorted function
that does this:
In [18]: sorted([2,4,3,5,1,6])
Out [18]: [1, 2, 3, 4, 5, 6]
But dictionaries are not orderable: we need a way to tell the function
how to sort our data. We can do this by specifying the key function,
a function which given an item returns the sorting key for that item:
In [19]: # sort alphabetically by first name
         sorted(data, key=lambda item: item['first'])
Out [19]:
[{'YOB': 1912, 'first': 'Alan', 'last': 'Turing'},
{'YOB': 1906, 'first': 'Grace', 'last': 'Hopper'},
{'YOB': 1956, 'first': 'Guido', 'last': 'Van Rossum'}]
In [20]: # sort by year of birth
         sorted(data, key=lambda item: item['YOB'])
Out [20]:
[{'YOB': 1906, 'first': 'Grace', 'last': 'Hopper'},
{'YOB': 1912, 'first': 'Alan', 'last': 'Turing'},
{'YOB': 1956, 'first': 'Guido', 'last': 'Van Rossum'}]
Runtime Errors
If you've done any coding in Python, you've likely come across runtime
errors. They can happen in a lot of ways.
For example, if you try to reference an undefined variable:
In [1]: print(Q)
--------------------------------------------------------NameError
<ipython-input-3-e796bdcf24ff> in <module>()
----> 1 print(Q)
<ipython-input-4-aab9e8ede4f7> in <module>()
----> 1 1 + 'abc'
<ipython-input-5-ae0c5d243292> in <module>()
----> 1 2 / 0
<ipython-input-6-06b6eb1b8957> in <module>()
1 L = [1, 2, 3]
----> 2 L[1000]
Note that in each case, Python is kind enough to not simply indicate
that an error happened, but to spit out a meaningful exception that
includes information about what exactly went wrong, along with the
exact line of code where the error happened. Having access to meaningful
errors like this is immensely useful when trying to trace the
root of problems in your code.
Note that the second block here did not get executed: this is because
the first block did not return an error. Let's put a problematic statement
in the try block and see what happens:
In [6]: try:
            print("let's try something:")
            x = 1 / 0 # ZeroDivisionError
        except:
            print("something bad happened!")
let's try something:
something bad happened!
Here we see that when the error was raised in the try statement (in
this case, a ZeroDivisionError), the error was caught, and the
except statement was executed.
One way this is often used is to check user input within a function
or another piece of code. For example, we might wish to have a
function that catches zero-division and returns some other value,
perhaps a suitably large number like 10^100:
In [7]: def safe_divide(a, b):
            try:
                return a / b
            except:
                return 1E100
In [8]: safe_divide(1, 2)
Out [8]: 0.5
In [9]: safe_divide(2, 0)
Out [9]: 1e+100
Dividing an integer and a string raises a TypeError, which our overzealous
code caught and assumed was a ZeroDivisionError! For
this reason, it's nearly always a better idea to catch exceptions
explicitly:
In [11]: def safe_divide(a, b):
             try:
                 return a / b
             except ZeroDivisionError:
                 return 1E100
In [12]: safe_divide(1, 0)
Out [12]: 1e+100
In [13]: safe_divide(1, '2')
--------------------------------------------------------TypeError
<ipython-input-15-2331af6a0acf> in <module>()
----> 1 safe_divide(1, '2')
<ipython-input-13-10b5f0163af8> in safe_divide(a, b)
      1 def safe_divide(a, b):
      2     try:
----> 3         return a / b
      4     except ZeroDivisionError:
      5         return 1E100
We're now catching zero-division errors only, and letting all other
errors pass through unmodified.
<ipython-input-16-c6a4c1ed2f34> in <module>()
----> 1 raise RuntimeError("my error message")
One potential problem here is that the input value could be negative.
This will not currently cause any error in our function, but we might
want to let the user know that a negative N is not supported. Errors
stemming from invalid parameter values, by convention, lead to a
ValueError being raised:
In [16]: def fibonacci(N):
             if N < 0:
                 raise ValueError("N must be non-negative")
             L = []
             a, b = 0, 1
             while len(L) < N:
                 a, b = b, a + b
                 L.append(a)
             return L
In [17]: fibonacci(10)
Out [17]: [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
In [18]: fibonacci(-10)
--------------------------------------------------------ValueError
<ipython-input-20-3d291499cfa7> in <module>()
----> 1 fibonacci(-10)
<ipython-input-18-01d0cf168d63> in fibonacci(N)
      1 def fibonacci(N):
      2     if N < 0:
----> 3         raise ValueError("N must be non-negative")
      4     L = []
      5     a, b = 0, 1
Now the user knows exactly why the input is invalid, and could even
use a try...except block to handle it!
In [19]: N = -10
         try:
             print("trying this...")
             print(fibonacci(N))
         except ValueError:
             print("Bad value: need to do something else")
trying this...
Bad value: need to do something else
With this pattern, you can further customize the exception handling
of your function.
<ipython-input-23-92c36e04a9d0> in <module>()
      2     pass
      3
----> 4 raise MySpecialError("here's the message")
This would allow you to use a try...except block that only catches
this type of error:
Errors and Exceptions
In [22]: try:
             print("do something")
             raise MySpecialError("[informative error message here]")
         except MySpecialError:
             print("do something else")
do something
do something else
You might find this useful as you develop more customized code.
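A custom exception is just a class that inherits from an existing exception type. The sketch below assumes MySpecialError derives from ValueError, and check_positive is a hypothetical function added for illustration.

```python
class MySpecialError(ValueError):
    """Custom exception; the ValueError base class is an assumption here."""


def check_positive(x):
    """Hypothetical helper that signals bad input with the custom exception."""
    if x <= 0:
        raise MySpecialError("expected a positive number, got {}".format(x))
    return x


try:
    check_positive(-1)
except MySpecialError as err:
    message = str(err)        # the message passed to the exception
    print("caught:", message)
```

Because of the inheritance, an except ValueError clause would also catch MySpecialError, which lets callers choose how specific to be.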
try...except...else...finally
In addition to try and except, you can use the else and finally
keywords to further tune your code's handling of exceptions. The
basic structure is this:
In [23]: try:
             print("try something here")
         except:
             print("this happens only if it fails")
         else:
             print("this happens only if it succeeds")
         finally:
             print("this happens no matter what")
try something here
this happens only if it succeeds
this happens no matter what
The utility of else here is clear, but what's the point of finally?
Well, the finally clause really is executed no matter what: I usually
see it used to do some sort of cleanup after an operation completes.
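The cleanup role of finally can be sketched with a function that records what happens on each path. The events list and the risky function are hypothetical names used only for this example.

```python
events = []   # hypothetical log of what the function did


def risky(divisor):
    """Divide 10 by divisor, always recording a cleanup step."""
    events.append("open")
    try:
        return 10 / divisor
    except ZeroDivisionError:
        events.append("handled")
        return None
    finally:
        # Runs on success, on a handled error, and even after a return
        events.append("close")


ok = risky(2)
failed = risky(0)
print(ok, failed, events)
```

Note that the finally block runs even though both the try and except branches return, which is what makes it a reliable place for cleanup.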
Iterators
Often an important piece of data analysis is repeating a similar calculation,
over and over, in an automated fashion. For example, you
may have a table of names that you'd like to split into first and last,
or perhaps of dates that you'd like to convert to some standard format.
One of Python's answers to this is the iterator syntax. We've
seen this already with the range iterator:
In [1]: for i in range(10):
            print(i, end=' ')
0 1 2 3 4 5 6 7 8 9
Here we're going to dig a bit deeper. It turns out that in Python 3,
range is not a list, but is something called an iterator, and learning
how it works is key to understanding a wide class of very useful
Python functionality.
iter([2, 4, 6, 8, 10])
range(10)
iter(range(10))
The benefit of the iterator indirection is that the full list is never
explicitly created! We can see this by doing a range calculation that
would overwhelm our system memory if we actually instantiated it
(note that in Python 2, range creates a list, so running the following
will not lead to good things!):
In [11]: N = 10 ** 12
         for i in range(N):
             if i >= 10: break
             print(i, end=', ')
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
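What a for loop does behind the scenes can be sketched by driving an iterator by hand with the built-in iter and next functions. The variable names below are chosen for this illustration.

```python
I = iter([2, 4, 6])       # ask the list for an iterator

first = next(I)           # each call to next() yields one value
second = next(I)
third = next(I)

try:
    next(I)               # nothing left: the iterator signals exhaustion
    exhausted = False
except StopIteration:
    exhausted = True

# Roughly what `for val in [2, 4, 6]` does internally
collected = []
it = iter([2, 4, 6])
while True:
    try:
        val = next(it)
    except StopIteration:
        break
    collected.append(val)

print(first, second, third, exhausted, collected)
```

Because values are produced one at a time on request, nothing forces the whole sequence to exist in memory at once.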
Useful Iterators
This iterator syntax is used nearly universally in Python built-in
types as well as the more data science-specific objects we'll explore in
later sections. Here we'll cover some of the more useful iterators in
the Python language.
enumerate
Often you need to iterate not only the values in an array, but also
keep track of the index. You might be tempted to do things this way:
In [13]: L = [2, 4, 6, 8, 10]
         for i in range(len(L)):
             print(i, L[i])
0 2
1 4
2 6
3 8
4 10
Although this does work, Python provides a cleaner syntax using the
enumerate iterator:
In [14]: for i, val in enumerate(L):
             print(i, val)
0 2
1 4
2 6
3 8
4 10
This is the more "Pythonic" way to enumerate the indices and values
in a list.
zip
Other times, you may have multiple lists that you want to iterate
over simultaneously. You could certainly iterate over the index as in
the non-Pythonic example we looked at previously, but it is better to
use the zip iterator, which zips together iterables:
In [15]: L = [2, 4, 6, 8, 10]
         R = [3, 6, 9, 12, 15]
         for lval, rval in zip(L, R):
             print(lval, rval)
2 3
4 6
6 9
8 12
10 15
Any number of iterables can be zipped together, and if they are different
lengths, the shortest will determine the length of the zip.
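The truncation to the shortest input can be seen in a short sketch. For contrast it also shows itertools.zip_longest from the standard library, which pads instead of truncating; that function is an addition not discussed in the text.

```python
from itertools import zip_longest

L = [2, 4, 6, 8, 10]
R = [3, 6, 9]             # deliberately shorter than L

truncated = list(zip(L, R))                         # stops when R runs out
padded = list(zip_longest(L, R, fillvalue=None))    # pads the shorter input

print(truncated)
print(padded)
```

Silent truncation is convenient but can hide a length mismatch, so it is worth checking lengths when the inputs are expected to match.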
The filter iterator looks similar, except it only passes through values
for which the filter function evaluates to True:
In [17]: # find values up to 10 for which x % 2 is zero
         is_even = lambda x: x % 2 == 0
         for val in filter(is_even, range(10)):
             print(val, end=' ')
0 2 4 6 8
The map and filter functions, along with the reduce function
(which lives in Python's functools module) are fundamental components
of the functional programming style, which, while not a
dominant programming style in the Python world, has its outspoken
proponents (see, for example, the pytoolz library).
So, for example, we can get tricky and compress the map example
from before into the following:
In [19]: print(*map(lambda x: x ** 2, range(10)))
0 1 4 9 16 25 36 49 64 81
Using this trick lets us answer the age-old question that comes up in
Python learners' forums: why is there no unzip() function that does
the opposite of zip()? If you lock yourself in a dark closet and think
about it for a while, you might realize that the opposite of zip() is
zip()! The key is that zip() can zip together any number of iterators
or sequences. Observe:
In [20]: L1 = (1, 2, 3, 4)
         L2 = ('a', 'b', 'c', 'd')
In [21]: z = zip(L1, L2)
         print(*z)
(1, 'a') (2, 'b') (3, 'c') (4, 'd')
In [22]: z = zip(L1, L2)
         new_L1, new_L2 = zip(*z)
         print(new_L1, new_L2)
(1, 2, 3, 4) ('a', 'b', 'c', 'd')
Ponder this for a while. If you understand why it works, you'll have
come a long way in understanding Python iterators!
Somewhat related is the product iterator, which iterates over all sets
of pairs between two or more iterables:
In [25]: from itertools import product
p = product('ab', range(3))
print(*p)
('a', 0) ('a', 1) ('a', 2) ('b', 0) ('b', 1) ('b', 2)
Many more useful iterators exist in itertools: the full list can be
found, along with some examples, in Python's online documentation.
List Comprehensions
If you read enough Python code, you'll eventually come across the
terse and efficient construction known as a list comprehension. This
is one feature of Python I expect you will fall in love with if you've
not used it before; it looks something like this:
In [1]: [i for i in range(20) if i % 3 > 0]
Out [1]: [1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]
In [2]: L = []
        for n in range(12):
            L.append(n ** 2)
        L
Out [2]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
In [3]: [n ** 2 for n in range(12)]
Out [3]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
As with many Python statements, you can almost read off the meaning
of this statement in plain English: "construct a list consisting of
the square of n for each n up to 12."
This basic syntax, then, is [expr for var in iterable], where
expr is any valid expression, var is a variable name, and iterable is
any iterable Python object.
Multiple Iteration
Sometimes you want to build a list not just from one value, but from
two. To do this, simply add another for expression in the comprehension:
In [4]: [(i, j) for i in range(2) for j in range(3)]
Out [4]: [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
Notice that the second for expression acts as the interior index,
varying the fastest in the resulting list. This type of construction can
be extended to three, four, or more iterators within the comprehension,
though at some point code readability will suffer!
In [5]: [val for val in range(20) if val % 3 > 0]
Out [5]: [1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]
value is not divisible by 3. Once you are comfortable with it, this is
much easier to writeand to understand at a glancethan the
equivalent loop syntax:
In [6]:
L = []
for val in range(20):
    if val % 3:
        L.append(val)
L
Out [6]: [1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]
Python has a single-line conditional expression, which is most often used
within list comprehensions, lambda functions, and other places
where a simple expression is desired:
In [7]:
val = -10
val if val >= 0 else -val
Out [7]: 10
In [8]:
[val if val % 2 else -val
 for val in range(20) if val % 3]
Out [8]: [1, -2, -4, 5, 7, -8, -10, 11, 13, -14, -16, 17, 19]
Note the line break within the list comprehension before the for
expression: this is valid in Python, and is often a nice way to break
up long list comprehensions for greater readability. Look this over:
what we're doing is constructing a list, leaving out multiples of 3,
and negating all multiples of 2.
Once you understand the dynamics of list comprehensions, it's
straightforward to move on to other types of comprehensions. The
syntax is largely the same; the only difference is the type of bracket
you use.
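For example, with curly braces you can create a set with a set
comprehension:
In [9]:
{n ** 2 for n in range(12)}
Out [9]: {0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121}
A set contains no duplicates, so a set comprehension drops any
repeated entries:
In [10]:
{a % 3 for a in range(1000)}
Out [10]: {0, 1, 2}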
With a slight tweak, you can add a colon (:) to create a dict
comprehension:
In [11]:
Finally, if you use parentheses rather than square brackets, you get
what's called a generator expression:
In [12]:
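The bracket variations described in this section can be compared side by side. This is a minimal sketch; the variable names and the choice of squaring the numbers below 6 are assumptions made for illustration.

```python
squares_set = {n ** 2 for n in range(6)}       # curly braces: a set
squares_dict = {n: n ** 2 for n in range(6)}   # curly braces plus a colon: a dict
squares_gen = (n ** 2 for n in range(6))       # parentheses: a generator expression

print(squares_dict)        # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
print(list(squares_gen))   # [0, 1, 4, 9, 16, 25]
print(25 in squares_set)   # True
```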
Generators
Here we'll take a deeper dive into Python generators, including
generator expressions and generator functions.
Generator Expressions
The difference between list comprehensions and generator expressions
is sometimes confusing; here we'll quickly outline the differences
between them.
In [1]:
[n ** 2 for n in range(12)]
Out [1]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
In [2]:
(n ** 2 for n in range(12))
Notice that printing the generator expression does not print the
contents; one way to print the contents of a generator expression is
to pass it to the list constructor:
In [3]:
G = (n ** 2 for n in range(12))
list(G)
Out [3]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
In [7]:
from itertools import count
for i in count():
    print(i, end=' ')
    if i >= 10: break
0 1 2 3 4 5 6 7 8 9 10
The count iterator will go on happily counting forever until you tell
it to stop; this makes it convenient to create generators that will also
go on forever:
In [8]:
factors = [2, 3, 5, 7]
G = (i for i in count() if all(i % n > 0 for n in factors))
for val in G:
    print(val, end=' ')
    if val > 40: break
1 11 13 17 19 23 29 31 37 41
You might see what we're getting at here: if we were to expand the
list of factors appropriately, what we would have the beginnings of is
a prime number generator, using the Sieve of Eratosthenes algorithm.
We'll explore this more momentarily.
In [10]:
G = (n ** 2 for n in range(12))
list(G)
Out [10]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
In [11]:
list(G)
Out [11]: []
One place I've found this useful is when working with collections of
data files on disk; it means that you can quite easily analyze them in
batches, letting the generator keep track of which ones you have yet
to see.
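As a rough sketch of that batching pattern, the following uses itertools.islice to pull a fixed number of items at a time. The file names and the helper `take_batch` are hypothetical; nothing is read from disk here.

```python
from itertools import islice

# Hypothetical file names standing in for a collection of data files on disk.
filenames = (f"data_{i:02d}.csv" for i in range(7))

def take_batch(iterator, size):
    """Return the next `size` items from `iterator` as a list."""
    return list(islice(iterator, size))

first = take_batch(filenames, 3)
second = take_batch(filenames, 3)   # resumes where the first batch stopped
rest = take_batch(filenames, 3)     # only one file is left

print(first)   # ['data_00.csv', 'data_01.csv', 'data_02.csv']
print(second)  # ['data_03.csv', 'data_04.csv', 'data_05.csv']
print(rest)    # ['data_06.csv']
```

Because the generator remembers its position, no separate bookkeeping of "files already processed" is needed.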
G1 = (n ** 2 for n in range(12))
def gen():
    for n in range(12):
        yield n ** 2
G2 = gen()
print(*G1)
print(*G2)
0 1 4 9 16 25 36 49 64 81 100 121
0 1 4 9 16 25 36 49 64 81 100 121
That's all there is to it! While this is certainly not the most
computationally efficient implementation of the Sieve of Eratosthenes, it
illustrates how convenient the generator function syntax can be for
building more complicated sequences.
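For reference, a generator function along these lines might look like the sketch below. The name `gen_primes` and its `limit` parameter are assumptions made here, and strictly speaking the function tests each candidate against the primes found so far (trial division) rather than crossing out multiples as a textbook sieve does.

```python
def gen_primes(limit):
    """Yield the primes below `limit`, one at a time."""
    primes = []
    for n in range(2, limit):
        # n is prime if no previously found prime divides it evenly
        if all(n % p > 0 for p in primes):
            primes.append(n)
            yield n

print(*gen_primes(50))
# 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47
```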
import math
math.cos(math.pi)
import numpy as np
np.cos(np.pi)
In [6]:
sum(range(5), -1)
Out [6]: 9
Now observe what happens if we make the exact same function call
after importing * from numpy:
In [7]:
from numpy import *
In [8]:
sum(range(5), -1)
Out [8]: 10
The result is off by one! The reason for this is that the import *
statement replaces the built-in sum function with the numpy.sum
function, which has a different call signature: in the former, we're
summing range(5) starting at -1; in the latter, we're summing
range(5) along the last axis (indicated by -1). This is the type of
situation that may arise if care is not taken when using import *; for
this reason, it is best to avoid this unless you know exactly what you
are doing.
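The same kind of name clash can be seen with the standard library alone. The sketch below uses pow as a stand-in for sum (an assumption made so that the example runs without NumPy): a `from math import *` would replace the built-in pow with math.pow, whereas a plain `import math` keeps the two names apart.

```python
import math

# The built-in pow and math.pow share a name but behave differently.
print(pow(2, 5))        # 32   (built-in: integers stay integers)
print(math.pow(2, 5))   # 32.0 (math.pow always returns a float)

# The built-in also accepts a third (modulus) argument; math.pow does not.
print(pow(2, 5, 7))     # 4
```

Keeping the module prefix makes it obvious at each call site which of the two functions is being used.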
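os and sys       Tools for interfacing with the operating system, including navigating file directory structures and executing shell commands
math and cmath   Mathematical functions and operations on real and complex numbers
itertools        Tools for constructing and interacting with iterators and generators
functools        Tools that assist with functional programming
random           Tools for generating pseudorandom numbers
pickle           Tools for object persistence: saving objects to and loading objects from disk
urllib           Tools for doing HTTP and other web requests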
You can find information on these, and many more, in the Python
standard library documentation: https://docs.python.org/3/library/.
In [1]:
x = 'a string'
y = "a string"
x == y
With this, let's take a quick tour of some of Python's string
manipulation tools.
To convert the entire string into uppercase or lowercase, you can use
the upper() or lower() methods respectively:
In [4]:
fox.upper()
fox.lower()
fox.title()
fox.capitalize()
fox.swapcase()
line = '
line.strip()
'
line.rstrip()
line.lstrip()
'
To remove characters other than spaces, you can pass the desired
character to the strip() method:
In [12]:
num = "000000000000435"
num.strip('0')
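The stripping methods can be compared on a small example. This is only a sketch: the padded string below is a made-up value, and repr() is used so that the remaining whitespace is visible.

```python
# Hypothetical padded value for illustration.
padded = "   some content   "

print(repr(padded.strip()))    # 'some content'
print(repr(padded.rstrip()))   # '   some content'
print(repr(padded.lstrip()))   # 'some content   '

# Passing characters strips those characters instead of whitespace.
num = "000000000000435"
print(num.strip("0"))          # 435
```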
line.ljust(30)
'
line.rjust(30)
'435'.rjust(10, '0')
'435'.zfill(10)
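In [18]:
line = 'the quick brown fox jumped over a lazy dog'
line.find('fox')
Out [18]: 16
In [19]:
line.index('fox')
Out [19]: 16
In [20]:
line.find('bear')
Out [20]: -1
In [21]: line.index('bear')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-4cbe6ee9b0eb> in <module>()
----> 1 line.index('bear')

ValueError: substring not found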
In [22]:
line.rfind('a')
Out [22]: 35
In [23]:
line.endswith('dog')
Out [23]: True
In [24]:
line.startswith('fox')
Out [24]: False
In [25]:
line.replace('brown', 'red')
Out [25]: 'the quick red fox jumped over a lazy dog'
The replace() function returns a new string, and will replace all
occurrences of the input:
In [26]:
line.replace('o', '--')
Out [26]: 'the quick br--wn f--x jumped --ver a lazy d--g'
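The difference between find() and index() matters mostly when the substring may be absent. The sketch below wraps index() in a small helper; `locate` is a hypothetical name introduced here, not part of the str API.

```python
line = 'the quick brown fox jumped over a lazy dog'

def locate(text, sub):
    """Return the index of `sub` in `text`, or None when it is absent."""
    try:
        return text.index(sub)      # raises ValueError if not found
    except ValueError:
        return None

print(line.find('bear'))     # -1
print(locate(line, 'bear'))  # None
print(locate(line, 'fox'))   # 16

# replace() builds a new string and leaves the original untouched.
red = line.replace('brown', 'red')
print(line == red)           # False
```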
In [27]:
line.partition('fox')
Out [27]: ('the quick brown ', 'fox', ' jumped over a lazy dog')
line.split()
Note that if you would like to undo a split(), you can use the
join() method, which returns a string built from a split-point and
an iterable:
In [30]:
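As a rough illustration of this round trip, the sketch below splits a line on whitespace and joins it back together. The separator strings and sample values are chosen here only for illustration.

```python
line = 'the quick brown fox jumped over a lazy dog'

words = line.split()            # split on runs of whitespace
rejoined = ' '.join(words)      # the string before .join goes between items

print(words[:3])                    # ['the', 'quick', 'brown']
print(rejoined == line)             # True
print('--'.join(['1', '2', '3']))   # 1--2--3
```

Note that join() is called on the separator, not on the list, and that every item in the iterable must already be a string.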
Format Strings
In the preceding methods, we have learned how to extract values
from strings, and to manipulate strings themselves into desired
formats. Another use of string methods is to manipulate string
representations of values of other types. Of course, string representations
can always be found using the str() function; for example:
In [32]:
pi = 3.14159
str(pi)
If you include a string inside the {} marker, it will refer to the key
of any keyword argument:
In [36]:
"""First: {first}. Last: {last}.""".format(last='Z', first='A')
Out [36]: 'First: A. Last: Z.'
Finally, for numerical inputs, you can include format codes that
control how the value is converted to a string. For example, to print a
number as a floating point with three digits after the decimal point,
you can use the following:
In [37]:
"pi = {0:.3f}".format(pi)
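A few more format codes, shown as a minimal sketch. The particular widths and values are assumptions chosen for illustration; the full mini-language is described in Python's documentation on format specifications.

```python
pi = 3.14159

three_digits = "pi = {0:.3f}".format(pi)      # fixed point, 3 decimals
padded = "{0:10.2f}|".format(pi)              # right-aligned in 10 characters
scientific = "{0:.2e}".format(12345.678)      # exponent notation

print(three_digits)   # pi = 3.142
print(padded)         #       3.14|
print(scientific)     # 1.23e+04
```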
import re
regex = re.compile(r'\s+')
regex.split(line)
As with split(), there are similar convenience routines to find the first
match (like str.index() or str.find()) or to find and replace (like
str.replace()). We'll again use the line from before:
In [41]: line = 'the quick brown fox jumped over a lazy dog'
With this, we can see that the regex.search() method operates a lot
like str.index() or str.find():
In [42]:
line.index('fox')
Out [42]: 16
In [43]:
regex = re.compile('fox')
match = regex.search(line)
match.start()
Out [43]: 16
Similarly, the regex.sub() method operates much like str.replace():
In [44]:
line.replace('fox', 'BEAR')
Out [44]: 'the quick brown BEAR jumped over a lazy dog'
In [45]:
regex.sub('BEAR', line)
Out [45]: 'the quick brown BEAR jumped over a lazy dog'
With a bit of thought, other native string operations can also be cast
as regular expressions.
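For instance, here is one possible way to express a few other string operations as regular expressions. This is a sketch, not the only translation; for arbitrary substrings you would also want re.escape() so that special characters are matched literally.

```python
import re

line = 'the quick brown fox jumped over a lazy dog'

# str.startswith: re.match only matches at the start of the string
print(line.startswith('the'), bool(re.match('the', line)))    # True True

# str.endswith: $ anchors the pattern at the end of the string
print(line.endswith('dog'), bool(re.search('dog$', line)))    # True True

# the `in` operator: an unanchored search anywhere in the string
print('fox' in line, bool(re.search('fox', line)))            # True True
```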
(Note that these addresses are entirely made up; there are probably
better ways to get in touch with Guido).
We can do further operations, like replacing these email addresses
with another string, perhaps to hide addresses in the output:
In [48]:
email.sub('--@--.--', text)
Out [48]:
'To email Guido, try --@--.-- or the older address --@--.--.'
Finally, note that if you really want to match any email address, the
preceding regular expression is far too simple. For example, it only
allows addresses made of alphanumeric characters that end in one
of several common domain suffixes. So, for example, the period
used here means that we only find part of the address:
In [49]:
email.findall('barack.obama@whitehouse.gov')
regex = re.compile('ion')
regex.findall('Great Expectations')
regex = re.compile(r'\$')
regex.findall("the cost is $20")
regex = re.compile(r'\w\s\w')
regex.findall('the fox is 9 years old')
Out [54]: ['e f', 'x i', 's 9', 's o']
Character   Description
\s          Match any whitespace
\S          Match any non-whitespace
\w          Match any alphanumeric char
\W          Match any non-alphanumeric char
regex = re.compile('[aeiou]')
regex.split('consequential')
Similarly, you can use a dash to specify a range: for example, [a-z]
will match any lowercase letter, and [1-3] will match any of 1, 2, or
3. For instance, you may need to extract from a document specific
numerical codes that consist of a capital letter followed by a digit.
You could do this as follows:
In [56]:
regex = re.compile('[A-Z][0-9]')
regex.findall('1043879, G2, H6')
regex = re.compile(r'\w{3}')
regex.findall('The quick brown fox')
regex = re.compile(r'\w+')
regex.findall('The quick brown fox')
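Character   Description                                       Example
?           Match zero or one repetitions of preceding        ab? matches a or ab
{n}         Match n repetitions of preceding                  ab{2} matches abb
{m,n}       Match between m and n repetitions of preceding    ab{2,3} matches abb or abbb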
With these basics in mind, let's return to our email address matcher:
In [59]: email = re.compile(r'\w+@\w+\.[a-z]{3}')
In [60]:
email2 = re.compile(r'[\w.]+@\w+\.[a-z]{3}')
email2.findall('barack.obama@whitehouse.gov')
In [61]:
email3 = re.compile(r'([\w.]+)@(\w+)\.([a-z]{3})')
In [62]:
email3.findall(text)
Out [62]:
[('guido', 'python', 'org'), ('guido', 'google', 'com')]
As we see, this grouping actually extracts a list of the subcomponents of the email address.
We can go a bit further and name the extracted components using
the (?P<name> ) syntax, in which case the groups can be extracted
as a Python dictionary:
In [63]:
email4 = re.compile(r'(?P<user>[\w.]+)@(?P<domain>\w+)'\
'\.(?P<suffix>[a-z]{3})')
match = email4.match('guido@python.org')
match.groupdict()
Out [63]: {'domain': 'python', 'suffix': 'org', 'user': 'guido'}
Combining these ideas (as well as some of the powerful regexp syntax
that we have not covered here) allows you to flexibly and quickly
extract information from strings in Python.
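As a sketch of how named groups might be used on a longer string, the following walks over every match with finditer(). The pattern is the named-group pattern from above; the sample text and its addresses are made up for illustration.

```python
import re

# Same named-group pattern as in the text; the sample string is made up.
email4 = re.compile(r'(?P<user>[\w.]+)@(?P<domain>\w+)\.(?P<suffix>[a-z]{3})')
text = "Contact alice.smith@example.com or bob@example.org for details."

found = []
for match in email4.finditer(text):
    parts = match.groupdict()      # {'user': ..., 'domain': ..., 'suffix': ...}
    found.append(parts)
    print(parts['user'], parts['domain'], parts['suffix'])
# alice.smith example com
# bob example org
```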
import numpy as np
x = np.arange(1, 10)
x
x ** 2
Compare this with the much more verbose Python-style list
comprehension for the same result:
In [3]:
[val ** 2 for val in range(1, 10)]
Out [3]: [1, 4, 9, 16, 25, 36, 49, 64, 81]
M = x.reshape((3, 3))
M
M.T
  label  value
0     A      1
1     B      2
2     C      3
3     A      4
4     B      5
5     C      6
In [9]:
df['label']
Out [9]:
0    A
1    B
2    C
3    A
4    B
5    C
Name: label, dtype: object
In [10]:
df['label'].str.lower()
Out [10]:
0    a
1    b
2    c
3    a
4    b
5    c
Name: label, dtype: object
In [11]:
df['value'].sum()
Out [11]: 21
In [12]:
df.groupby('label').sum()
Out [12]:
       value
label
A          5
B          7
C          9
Here in one line we have computed the sum of all objects sharing
the same label, something that is much more verbose (and much less
efficient) using tools provided in NumPy and core Python.
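For comparison, a core-Python version of the same grouped sum might look like the sketch below. The labels and values mirror the DataFrame output shown above; the use of defaultdict is a choice made here for brevity.

```python
from collections import defaultdict

# Same labels and values as in the DataFrame output above.
labels = ['A', 'B', 'C', 'A', 'B', 'C']
values = [1, 2, 3, 4, 5, 6]

totals = defaultdict(int)          # missing labels start at 0
for label, value in zip(labels, values):
    totals[label] += value

print(dict(totals))  # {'A': 5, 'B': 7, 'C': 9}
```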
For more information on using Pandas, see the resources listed in
Resources for Further Learning on page 90.
Now let's create some data (as NumPy arrays, of course) and plot the
results:
In [15]: x = np.linspace(0, 10)
y = np.sin(x)
plt.plot(x, y);
If you run this code live, you will see an interactive plot that lets you
pan, zoom, and scroll to explore the data.
This is the simplest example of a Matplotlib plot; for ideas on the
wide range of plot types available, see Matplotlib's online gallery as
well as other references listed in "Resources for Further Learning"
on page 90.
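scipy.integrate   Numerical integration
scipy.optimize    Numerical optimization of functions
scipy.sparse      Sparse matrix storage and linear algebra
scipy.stats       Statistical analysis routines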