0% found this document useful (0 votes)
1 views

Numerical Python v0.3

The document provides a tutorial on using Python, particularly focusing on the NumPy library for numerical computing. It covers installation, downloading resources, and key features of NumPy, including data structures, memory layout, and basic operations. The tutorial emphasizes Python's strengths and the importance of specialized libraries for efficient numerical calculations.

Uploaded by

abouqora
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
1 views

Numerical Python v0.3

The document provides a tutorial on using Python, particularly focusing on the NumPy library for numerical computing. It covers installation, downloading resources, and key features of NumPy, including data structures, memory layout, and basic operations. The tutorial emphasizes Python's strengths and the importance of specialized libraries for efficient numerical calculations.

Uploaded by

abouqora
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 52

Numerical

Python
v0.3

Research Computing
Services
IS & T
Run
Spyder
 Start the
Anaconda
Navigator

 Click on
Spyder’s
Launch button

 Be patient…it
takes a while to
start.
Download from the RCS
website
Open a browser and go to:
http://rcs.bu.edu/examples/python/tutorials/

 Click on the rcs_py_tutorials_2020.zip file and download it.

 This ZIP archive containing all of the presentations and


sample Python files for all of the available Python tutorials.
 This is for everyone, regardless of the specific Python tutorial you are taking.
 This ZIP archive must be extracted to a folder to use it with the Python
software.
 Windows: right-click the ZIP file and choose “Extract All”
 OSX: double-click the ZIP file
 Linux: use the unzip utility from a terminal: unzip
rcs_py_tutorials_2020.zip
SCC Python
Tutorials
 Introduction to Python, Part
one
 Introduction to Python, Part
two
 Numerical Python
 Python for Data Analysis
 Data Visualization in Python
 Introduction to Python Scikit-
learn
 Python Optimization
Outlin
e Python lists

 The numpy library

 Speeding up numpy: numba and


numexpr

 Libraries: scipy and opencv

 Alternatives to Python
Python’s
strengths
 Python is a general purpose language.
 In contrast with R or Matlab which started out as specialized languages

 Python lends itself to implementing complex or specialized


algorithms for
solving computational problems.

 It is a highly productive language to work with that’s been


applied to hundreds of subject areas.
Extending its
Capabilities
However…for number crunching some aspects of the
language are not optimal:
 Runtime type checks
 No compiler to analyze a whole program for optimizations
 General purpose built-in data structures are not optimal for numeric
calculations

 “regular” Python code is not competitive with compiled


languages (C, C++,
Fortran) for numeric computing.

 The solution: specialized libraries that extend Python with data


structures and algorithms for numeric computing.
 Keep the good stuff, speed up the parts that are slow!
Outlin
e The numpy library

 Libraries: scipy and opencv

 When numpy / scipy isn’t fast


enough
NumP
y
 NumPy provides optimized data structures and basic
routines for
manipulating multidimensional numerical data.

 Mostly implemented in compiled C code.

 NumPy underlies many other numeric and algorithm


libraries used throughout the Python software ecosystem.
Source: Continuum
Analytics
Ndarray – the basic NumPy data
type
 NumPy ndarray’s are:
 Typed
 Fixed in size
 Fixed in dimensionality
 An ndarray can be constructed from:
 Conversion from a Python list, set, tuple, or similar data
structure
 NumPy initialization routines
 Copies or computations with other ndarray’s
 NumPy-based functions as a return value
ndarray vs
list
 List:  Ndarray:
 General purpose  Intended to store and
 Untyped process (mostly) numeric
 1 dimension data
  Typed
Resizable
 Add/remove elements anywhere  N-dimensions
 Chosen at creation time
 Accessed with [ ] notation
and  Fixed size
integer indices  Chosen at creation time
 Accessed with [ ] notation
and integer indices
List x = ['a','b',3.14]

Review
Operation Syntax Notes

Indexing – starting from 0 x[0]  ‘a’

x[1]  ‘b’
Indexing backwards from -1 x[-1]  3.14

x[-3]  ‘a’
Slicing x[start:end:incr] Slicing produces a COPY
of the original list!
x[0:2]  [‘a’,’b’]

x[-1:-3:-1]  [3.14,’b’]

x[:]  [‘a’,’b’,3.14]
Sorting x.sort()  in-place sort Depending on list
sorted(x)  returns a new sorted list contents a sorting
function might be req’d
Size of a list len(x)
Each has a unique
Computer address

memory
All the bytes in a …
row
 All of the memory in a computer is accessed (and connected
electronically) as a 1D array.Any byte can be read via its unique
address.

 When a program reads data from memory the fastest way to


do so is to read data from adjacent addresses.
 Underlying specialized pools of memory called caches handle this. These are
managed by the computer hardware, not the programmer.

 Let’s contrast how Python lists and ndarrays organize


themselves in
memory…
List x = ['a','b',3.14]
Implementation
 A Python list mimics a linked list data structure when used
 It’s implemented as a resizable array of pointers to Python objects for
performance reasons.
Pointer to a 'a'
Python object
Allocated
Pointer to a
x Python object 'b' anywhere in
memory
Pointer to a
Allocated as a continuous
Python object 3.14
block of memory

 x[1]  get the pointer (memory address) at index 1  resolve


pointer to retrieve the Python object in memory  get the
value from the object  return ‘b’
import numpy as np
# Initialize a NumPy array
NumPy # from a Python list
y = np.array([1,2,3])
ndarray
 The basic data type is a class called ndarray.
 The object has:
 a data that describes the array (data type, number of dimensions, number of
elements, memory
format, etc.)
 A contiguous array in memory containing the data.
Values are
Data description physically
(integer, 3 elements, 1-D) adjacent in
y memory

1 2 3
 y[1]  check the ndarray data type  retrieve the value at offset
1 in the data array  return 2
https://docs.scipy.org/doc/numpy/reference/arrays.html
dtyp

e
Every ndarray has a dtype, the a = np.array([1,2,3])
type of data that it holds. a.dtype 
 This is used to interpret the dtype('int32')
block of
data stored in the ndarray.
c = np.array([-1,4,124],
dtype='int8')
 Can be assigned at creation c.dtype --> dtype('int8')
time:

 Conversion from one type to b = a.astype('float')


b.dtype  dtype('float64')
another is done with the
astype() method:
Ndarray memory

notes
The memory allocated by an ndarray:

 Storage for the data: N elements * bytes-per-element


 4 bytes for 32-bit integers, 8 bytes for 64-bit floats (doubles), 1 byte for 8-bit
characters etc.

 A small amount of memory is used to store info about the ndarray (~few dozen bytes)

 Data storage is compatible with external libraries


 C, C++, Fortran, or other external libraries can use the data allocated in an ndarray
directly without any conversion or copying.
ndarray from numpy
initialization
 There are a number of initialization routines. They are
mostly copies of similar routines in Matlab.
 These share a similar syntax:
function_name([size of dimensions list], opt. dtype…)
 zeros – everything initialized to zero.
 ones – initialize elements to one.
 empty – do not initialize elements
 identity – create a 2D array with ones on the diagonal and zeros elsewhere
 full – create an array and initialize all elements to a specified value
 Read the docs for a complete list and descriptions.
x = [1,2,3]
ndarray from a y = np.array(x)

list
 The numpy function array creates a new array from any data
structure with array like behavior (other ndarrays, lists, sets,
etc.)
 Read the docs!

 Creating an ndarray from a list does not change the list.

 Often combined with a reshape() call to create a multi-


dimensional array.

 Open the file ndarray_basics.py in Spyder so we can check out


some examples.
ndarray oneD = np.array([1,2,3,4])

indexing twoD =
oneD.reshape([2,2])
 ndarray indexing is similar
twoD  array([[1, 2],
to Python lists, strings, [3, 4]])
tuples, etc.
# index from 0
oneD[0]  1
 Index with integers, oneD[3]  4
starting from # -index starts from the end
zero. oneD[-1]  4
oneD[-2]  3

 Indexing N-dimensional # For multiple dimensions


use a comma
arrays, just use commas: # matrix[row,column]
twoD[0,0]  1
array[i,j,k,l] = 42 twoD[1,0]  3
y = np.arange(50,300,50)
ndarray # y --> array([ 50, 100, 150, 200, 250])

 slicing
Syntax for each dimension (same
rules as lists): y[0:3] --> array([ 50, 100, 150])
 start:end:step y[-1:-3:-1] --> array([250, 200])
 start:  from starting index to end
 :end  start from 0 to end (exclusive of x = np.arange(10,130,10).reshape(4,3)
end) # x --> array([[ 10, 20, 30],
 :  all elements. [ 40, 50, 60],
[ 70, 80, 90],
 Slicing an ndarray does not [100, 110, 120]])
make a copy, it creates a view
to the original data. # 1-D returned!
x[:,0] --> array([ 10, 40, 70, 100])
# 2-D returned!
 Slicing a Python list creates a x[2:4,1:3] --> array([[ 80,
copy. 90], [110, 120]])

Look at the file slicing.py


ndarray slicing
assignment
 Slice notation on the left hand side of an = sign overwrites elements of
an ndarray.
Can be combined with right hand slicing.
y = np.arange(50,300,50)
# y --> array([ 50, 100, 150, 200,
250])

y[0:3] = -1
# y --> array([ -1, -1, -1, 200,
250])

y[0:8] = -1
# NO ERROR!
# y --> array([ -1, -1, -1, -1, -1])
y[0:2] = x[3:5]
# =
x y np.array([10,20,30,40,50,60])
--> array([ 40, 50, -1, -1, -1])
ndarray addressing with an
ndarray
 Ndarray’s can be used to a=np.linspace(-1,1,5)
# a --> [-1. , -0.5, 0. , 0.5, 1. ]
address/index another
ndarray. b=np.array([0,1,2])

a[b] # --> array([-1. , -0.5, 0.])


 Use integer or Boolean
c = np.array([True, False, True, True,
values. False])

# Boolean indexing returns elements


 Remember: this still # where True
returns a view. a[c] # --> array([-1. , 0. ,
0.5])
numpy.whe

re
Similar to find in Matlab.
a=np.linspace(-1,1,5)
# a --> [-1. , -0.5, 0. , 0.5, 1. ]

 Syntax: # Returns a TUPLE containing the INDICES where


# the condition is True!
np.where(a <= 0)
numpy.where(condition, [x,y]) # --> (array([0, 1, 2], dtype=int64),)

 Condition: some Boolean np.where(a <= 0, -a, 20*a)


condition # --> array([ 1. , 0.5, -0. , 10. , 20. ])
applied to an ndarray
 x, y: Optional variables to
choose from. x is for
condition == True, y is for
condition == False.
 All three arguments must
apply to ndarray’s.
ndarray memory
layout

X = np.ones([3,5],order='F')
# OR...
The memory layout (C or # Y is C-ordered by default
Fortran order) can be set: Y = np.ones([3,5])
 This can be important when dealing # Z is a F-ordered copy of
with Y
external libraries written in R, Matlab, Z = np.asfortranarray(Y)
etc.

 Row-major order: C, C++, Java,


C#, and others

 Column-major order:
Fortran, R, Matlab, and
others

https://en.wikipedia.org/wiki/Row-_and_column-major_order
ndarray memory
# Y is C-ordered by default

layout
For row-major ordering the
Y = np.ones([2,3,4])
# For loop indexing:
rightmost index accesses total=0.0
values in adjacent memory. for i in range(Y.shape[0]):
for j in range(Y.shape[1]):
for k in range(Y.shape[2]):
 The opposite is true for column- total += Y[i,j,k]
major ordering.
# X is Fortan-ordered
X = np.ones([2,3,4], order='F')
 This can have a significant # For loop indexing:
impact on performance when total=0.0
accessing ndarrays with for for i in range(X.shape[2]):
for j in range(X.shape[1]):
loops or built-in functions like for k in range(X.shape[0]):
numpy.sum() total += X[k,j,i]

Look at the file row_vs_col_timing.py


ndarray
math a = np.array([1,2,3,4])
 By default operators b = np.array([4,5,6,7])
work element-by-
c = a / b
element # c is an ndarray
print(type(c))  <class 'numpy.ndarray'>

 These are executed a * b  array([ 4, 10, 18, 28])


in compiled C a + b  array([ 5, 7, 9, 11])
a - b  array([-3, -3, -3, -3])
code. a / b  array([0.25, 0.4, 0.5, 0.57142857])
-2 * a + b  array([ 2, 1, 0, -1])
 Vectors are applied
a = np.array([2,2,2,2])
row-by-row to
matrices c = np.array([[1,2,3,4],
[4,5,6,7],
 The length of the [1,1,1,1],
[2,2,2,2]])  array([[1, 2, 3, 4],
vector must match [4, 5, 6, 7],
the width of the [1,
[2,
1,
2,
1,
2,
1],
2]])
row. a + c  array([[3, 4, 5, 6],
[6, 7, 8, 9],
[3, 3, 3, 3],
[4, 4, 4, 4]])
Linear algebra
multiplication
 Vector/matrix multiplication
a = np.array([[1, 0], [0, 1]])
can be done using the b = np.array([[4, 1], [2, 2]])
dot(), cross() functions, or @ np.dot(a, b) # --> array([[4, 1],
operator # [2, 2]])
a @ b # --> array([[4, 1],
# [2, 2]])
 There are many other np.cross(a,b)# --> array([ 1, -
linear algebra routines! 2])

https://docs.scipy.org/doc/numpy/reference/routines.linalg.html
NumPy
I/O
 When reading files you can use standard Python, use lists,
allocate
ndarrays and fill them.

 Or use any of NumPy’s I/O routines that will directly generate


ndarrays.

 The best way depends on the structure of your data.

 If dealing with structured numeric data (tables of numbers, etc.)


NumPy is easier and faster.

 Docs: https://docs.scipy.org/doc/numpy/reference/routines.io.html
Numpy
docs
 As numpy is a large library we can only cover the basic
usage here

Let’s look that the official docs:


https://docs.scipy.org/doc/numpy/reference/ind
ex.html

 As an example, computing an average:


https://docs.scipy.org/doc/numpy/reference/generated/nu
mpy.mean.html#numpy.mean
Some numpy file reading
numpy.save # save .npy
options
.npz and .npy file formats (cross- numpy.savez # save .npz
# ditto, with compression
platform numpy.savez_compressed
compatible) :
 .npy files store a single NumPY variable in a numpy.load # load .npy
binary format. numpy.loadz #
 .npz files store multiple NumPy Variables in load .npz
a file.

 h5py is a library that reads HDF5 Tutorial: https://docs.scipy.or


files into ndarrays g/doc/nu
mpy/user/basics.io.html
 The I/O routines allow for flexible
reading from a variety of text file
formats
NumPy I/O
example
Read in a data file from a set of ocean
B0
1
weather
buoys.
 File: buoy_data.csv
 18 columns. 1st column are dates, the rest are
numeric data for different buoys.
 Some rows have dates but are missing data points in
some
columns. A0
 Use the most flexible NumPy file reader, 1

genfromtxt.
 Task: read temperature data into a 2D
ndarray, clean up missing data, plot the
data.
Outlin
e The numpy
library
 Libraries: scipy and opencv

 When numpy / scipy isn’t fast


enough
• physical constants and conversion factors
SciP • hierarchical clustering, vector quantization, K-
means
y SciPy builds on top
• Discrete Fourier Transform algorithms
• numerical integration routines
of NumPy. • interpolation tools
• data input and output
• Python wrappers to external libraries
 Ndarrays are the • linear algebra routines
basic data • miscellaneous utilities (e.g. image
structure used. reading/writing)
• various functions for multi-dimensional image
processing
 Libraries are • optimization algorithms including linear
programming
provided for:
• signal processing tools
• sparse matrix and related algorithms
 Comparable to • KD-trees, nearest neighbors, distance
functions
Matlab toolboxes. • special functions
scipy.i
o
I/O routines support a wide variety of file
formats:Software Format Read? Write?
name
Matlab .mat Yes Yes
IDL .sav Yes No
Matrix Market .mm Yes Yes
Netcdf .nc Yes Yes
Harwell-Boeing .hb Yes Yes
(sparse matrices)
Unformatted Fortran files .anything Yes Yes
Wav (sound) .wav Yes Yes
Arff .arff Yes No
(Attribute-Relation File
Format)
Using
SciPy
Think about your code and what sort of algorithms you’re
using:
 Integration, statistical sampling, linear algebra, image processing, etc.

 See if an appropriate algorithm exists in SciPy before


trying to write your own.

 Read the docs – many functions have large numbers of


optional arguments.

 Understand the algorithms!


Example: Fit a line with
SciPy
There are many ways to fit equation parameters to data in
NumPy and SciPy

 scipy.stats.linregress: Calculate a regression line

 Open the example linregress.py

 This demonstrates calling the function and extracting all


the info it returns.
Example: 𝑦 = 3𝑥2 + 𝑥
−1
scipy.optimize.minimize
 Finds the minimum value of a
function.
 You provide the function as an
argument to minimize.
 Using functions as arguments is a
common pattern in scipy.

 Open scipy_minimize.py

https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html
OpenC
V • Image Processing
 The Open Source • Image file reading and writing
• Video I/O
Computer Vision Library • High-level GUI
• Video Analysis
• Camera Calibration and 3D Reconstruction
 Highly optimized and mature
• 2D Features Framework
C++ library usable from C+ • Object Detection
+, Java, and Python. • Deep Neural Network module
• Machine Learning
• Clustering and Search in Multi-Dimensional Spaces
 Cross platform: Windows, • Computational Photography
Linux, • Image stitching
Mac OSX, iOS, Android
OpenCV vs
SciPy
 For imaging-related operations and many linear algebra functions
there is a lot of overlap between these two libraries.

 OpenCV functions are frequently faster for the same operation


than SciPy,
sometimes significantly so.

 The OpenCV Python API uses NumPy ndarrays, making OpenCV


algorithms compatible with SciPy and other libraries.
OpenCV vs
 ASciPy
simple benchmark: Gaussian and
median filtering a 1024x671 pixel
image of the CAS building.
 Gaussian: radius 5, median: radius 9. See: image_bench.py
 Timing: 2.4 GHz Xeon E5-2680 (Sandybridge)

Operation Function Time (msec) OpenCV speedup

scipy.ndimage.gaussian_f 85.7
Gaussian 3.7x
ilter cv2.GaussianBlur 23.2

scipy.ndimage.median_filter 1,780
Median 22.5x
cv2.medianBlur 79.2
When NumPy and SciPy aren’t fast
enough
 Auto-compile your Python code with the numba and numexpr
libraries

 Use the Intel Python distribution

 Re-code critical paths with Cython

 Combine your own C++ (with SWIG) or Fortran code (with


f2py) and call
from Python
numb
a
 The numba library can translate portions of your Python code
and compile it into machine code on demand.

 Achieves a significant speedup compared with regular Python.

 Compatible with numpy ndarrays.

 Can generate code to execute automatically on GPUs.


numb from numba import jit

a
 The @jit decorator is # This will get compiled when it's
used to indicate which first executed
@jit
functions are compiled.
def average(x,
 Options: y, z):
 GPU code generation return (x
 Parallelization + y + z) /
 Caching of compiled code 3.0

 Can produce faster array


# With type information this one gets
code than pure NumPy # compiled when the file is read.
statements. @jit
(float64(float64,float64,float64))
def average_eager(x, y, z):
return (x + y + z) / 3.0
numex
pr import numpy as np
import numexpr as ne
 Another acceleration
library for Python. a = np.arange(10)
b = np.arange(0, 20, 2)

 Useful for speeding up # Plain NumPy


specific c = 2 * a + 3 * b
ndarray expressions.
 Typically 2-4x faster than plain # Numexpr
NumPy d = ne.evaluate("2*a+3*b")

 Code needs to be edited to


move ndarray expressions
into the numexpr.evaluate
function:
Intel
Python
 Intel now releases a customized build of Python 2.7 and 3.6
based on their optimized libraries.

 Can be installed stand-alone or inside of Anaconda:


https://software.intel.com/en-us/distribution-for-python

 Available on the SCC: module avail python2-intel (or python3-


intel)
Intel
Python
 In RCS testing on various projects the Intel Python build is at
least as fast as the regular Python and Anaconda modules on
the SCC.
 Sometimes it’s a fair amount faster.
 The record so far was one case involving processing several GB’s of XML code.
Intel Python
was 20x faster!

 Easy to try: change environments in Anaconda or load the SCC


module.

 You can use the Intel Thread Building Blocks library which may
improve performance on multithreaded Python programs:
Cytho
n
 Cython is a superset of the Python language.

 The additional syntax allows for C code to be auto-


generated and compiled from Python code.

 This can make mixing Python, Cython, and C code (or


libraries) very straightforward.

 A mature library that is widely used.


You feel the need for
speed…
Auto-compilation systems like numba, numexpr, and Cython:
 all provide access to higher speed code
 minimal to significant code changes
 You’re still working in Python or Python-like code
 Faster than NumPy which is also much faster than plain Python for numeric
calculation

 For the fastest implementation of algorithms, optimized and


well-written C, C++, and Fortran codes are very hard to beat
 Connect C or C++ to Python with SWIG
 Connect Fortran to Python with f2py (part of Numpy)

 Contact RCS for help!


End-of-course Evaluation
Form
 Please visit this page and fill in the evaluation form for this
course.

 Your feedback is highly valuable to the RCS team for the


improvement and development of tutorials.

 If you visit this link later please make sure to select the correct
tutorial – name, time, and location.

http://scv.bu.edu/survey/tutorial_evaluation.html

You might also like