Python Data Science Handbook
by Jake VanderPlas
Copyright © 2017 Jake VanderPlas. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://oreilly.com/safari). For more information, contact our
corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Python Data Science Handbook, the
cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the author disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.
978-1-491-91205-8
[LSI]
Table of Contents
Preface xi
Passing Values to and from the Shell 18
Shell-Related Magic Commands 19
Errors and Debugging 20
Controlling Exceptions: %xmode 20
Debugging: When Reading Tracebacks Is Not Enough 22
Profiling and Timing Code 25
Timing Code Snippets: %timeit and %time 25
Profiling Full Scripts: %prun 27
Line-by-Line Profiling with %lprun 28
Profiling Memory Use: %memit and %mprun 29
More IPython Resources 30
Web Resources 30
Books 31
2. Introduction to NumPy 33
Understanding Data Types in Python 34
A Python Integer Is More Than Just an Integer 35
A Python List Is More Than Just a List 37
Fixed-Type Arrays in Python 38
Creating Arrays from Python Lists 39
Creating Arrays from Scratch 39
NumPy Standard Data Types 41
The Basics of NumPy Arrays 42
NumPy Array Attributes 42
Array Indexing: Accessing Single Elements 43
Array Slicing: Accessing Subarrays 44
Reshaping of Arrays 47
Array Concatenation and Splitting 48
Computation on NumPy Arrays: Universal Functions 50
The Slowness of Loops 50
Introducing UFuncs 51
Exploring NumPy’s UFuncs 52
Advanced Ufunc Features 56
Ufuncs: Learning More 58
Aggregations: Min, Max, and Everything in Between 58
Summing the Values in an Array 59
Minimum and Maximum 59
Example: What Is the Average Height of US Presidents? 61
Computation on Arrays: Broadcasting 63
Introducing Broadcasting 63
Rules of Broadcasting 65
Broadcasting in Practice 68