NumPy
Table of contents
    Overview
    Installation
    Other resources
    Importing the NumPy module
    Arrays
    Other ways to create arrays
    Array mathematics
    Array iteration
    Basic array operations
    Comparison operators and value testing
    Array item selection and manipulation
    Vector and matrix mathematics
    Polynomial mathematics
    Statistics
    Random numbers
    Other functions to know about
    Modules available in SciPy
Installation
If you installed the Anaconda distribution, then you should be ready to go. If not, then you will
have to install these add-ons manually after installing Python, in the order of NumPy and then
SciPy. Installation files are available at:
https://numpy.org
https://scipy.org/
Other resources
The NumPy and SciPy development community maintains an extensive online documentation
system, including user guides and tutorials, at:
https://docs.scipy.org/doc/
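Importing the NumPy module
After a plain import statement (import numpy), every NumPy object is accessed as numpy.X.
However, when a program makes many calls to NumPy functions, it can become tedious to write
numpy.X over and over again. Instead, it is common to import under the briefer name np: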
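A minimal sketch of the aliased import, with a quick check that the alias and the full name refer to the same module (the variable names are only illustrative):

```python
import numpy
import numpy as np  # conventional short alias

# Both names are bound to the same module object.
same_module = np is numpy

# Objects are now reachable as np.X instead of numpy.X.
values = np.array([1.0, 2.0, 3.0])
```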
This statement will allow us to access NumPy objects using np.X instead of numpy.X. It is also
possible to import NumPy directly into the current namespace so that we don't have to use dot
notation at all, but rather simply call the functions as if they were built-in:
Arrays
The central feature of NumPy is the array object class. Arrays are similar to lists in Python, except
that every element of an array must be of the same type, typically a numeric type like float or
int. Arrays make operations with large amounts of numeric data very fast and are generally
much more efficient than lists.
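As a small sketch, an array can be created from a list with the array function. The values are chosen to be consistent with the interactive session that follows:

```python
import numpy as np

# Convert a Python list into an array whose elements are all floats.
a = np.array([1, 4, 5, 8], float)

print(a)        # [1. 4. 5. 8.]
print(type(a))  # <class 'numpy.ndarray'>
```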
Here, the function array takes two arguments: the list to be converted into the array and the
type of each member of the list. Array elements are accessed, sliced, and manipulated just like
lists:
>>> a[:2]
array([ 1., 4.])
>>> a[3]
8.0
>>> a[0] = 5.
>>> a
array([ 5., 4., 5., 8.])
Arrays can be multidimensional. Unlike lists, different axes are accessed using commas inside
bracket notation. Here is an example with a two-dimensional array (e.g., a matrix):
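A minimal sketch of comma-separated indexing, assuming an illustrative 2 x 3 array:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], float)

first = a[0, 0]  # row 0, column 0 -> 1.0
other = a[0, 1]  # row 0, column 1 -> 2.0
```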
Array slicing works with multiple dimensions in the same way as usual, applying each slice
specification as a filter to a specified dimension. Use of a single ":" in a dimension indicates the
use of everything along that dimension:
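The slicing rules above might be sketched as follows, again assuming an illustrative 2 x 3 array named a:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], float)

row = a[1, :]       # everything in row 1            -> [4., 5., 6.]
col = a[:, 2]       # everything in column 2         -> [3., 6.]
tail = a[-1:, -2:]  # last row, last two columns     -> [[5., 6.]]
```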
The shape property of an array returns a tuple with the size of each array dimension:
>>> a.shape
(2, 3)
The dtype property tells you what type of values are stored by the array:
>>> a.dtype
dtype('float64')
Here, float64 is a numeric type that NumPy uses to store double-precision (8-byte) real
numbers, similar to the float type in Python.
When used with an array, the len function returns the length of the first axis:
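For instance, with an illustrative 2 x 3 array, len counts rows rather than total elements:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], float)

n_rows = len(a)   # length of the first axis -> 2
n_items = a.size  # total number of elements -> 6
```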
Arrays can be reshaped using tuples that specify new dimensions. In the following example, we
turn a ten-element one-dimensional array into a two-dimensional one whose first axis has five
elements and whose second axis has two elements:
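A minimal sketch of this reshape, assuming the ten elements are simply the values 0.0 through 9.0:

```python
import numpy as np

a = np.arange(10, dtype=float)  # assumed input: 0.0, 1.0, ..., 9.0
b = a.reshape((5, 2))           # five rows of two elements each

# reshape returns a new array object; the shape of a is unchanged.
```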
Keep in mind that Python's name-binding approach still applies to arrays. The copy function can
be used to create a separate copy of an array in memory if needed:
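The difference between binding a second name and making a copy can be sketched like this (the array contents are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3], float)
b = a         # b is a second name for the same array
c = a.copy()  # c is an independent copy in memory

a[0] = 0      # the change is visible through b, but not through c
```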
One can convert the raw data in an array to a binary string (i.e., not in human-readable form)
using the tobytes function; older code calls the same operation tostring, a name that has been
deprecated since NumPy 1.19. The frombuffer function (which replaces the binary mode of the
older fromstring function) then allows an array to be created from this data later on. These
routines are sometimes convenient for saving large amounts of array data in binary files that
can be read later on:
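A hedged sketch of the binary round trip. It uses tobytes and frombuffer, the current names for this operation, on the assumption that a recent NumPy is installed where tostring is deprecated or unavailable:

```python
import numpy as np

a = np.array([1, 2, 3], float)

raw = a.tobytes()                           # 24 bytes: three 8-byte floats
restored = np.frombuffer(raw, dtype=float)  # read-only array backed by raw

# The dtype must be supplied again, because the raw bytes carry no type information.
```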
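Transposed versions of arrays can also be generated. For a two-dimensional array this switches
the rows and columns; in general, the order of all axes is reversed. The result is a view of the
original data rather than a separate copy: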
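A minimal sketch of transposing an illustrative 2 x 3 array:

```python
import numpy as np

a = np.array(range(6), float).reshape((2, 3))  # [[0., 1., 2.], [3., 4., 5.]]
t = a.transpose()                              # shape (3, 2); a.T is shorthand
```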
Two or more arrays can be concatenated together using the concatenate function with a
tuple of the arrays to be joined:
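For example, three one-dimensional arrays of different lengths (illustrative values) can be joined end to end:

```python
import numpy as np

a = np.array([1, 2], float)
b = np.array([3, 4, 5, 6], float)
c = np.array([7, 8, 9], float)

joined = np.concatenate((a, b, c))  # note the tuple of arrays
```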
If an array has more than one dimension, it is possible to specify the axis along which multiple
arrays are concatenated. By default (without specifying the axis), NumPy concatenates along the
first dimension:
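The effect of the axis argument might be sketched with two illustrative 2 x 2 arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], float)
b = np.array([[5, 6], [7, 8]], float)

by_default = np.concatenate((a, b))         # first axis: shape (4, 2)
by_rows = np.concatenate((a, b), axis=0)    # same as the default
by_cols = np.concatenate((a, b), axis=1)    # second axis: shape (2, 4)
```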
Finally, the dimensionality of an array can be increased using the newaxis constant in bracket
notation:
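A minimal sketch, assuming a three-element vector, of the two places newaxis can be put:

```python
import numpy as np

a = np.array([1, 2, 3], float)

row = a[np.newaxis, :]  # shape (1, 3): a single row
col = a[:, np.newaxis]  # shape (3, 1): a single column
```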
Notice here that in each case the new array has two dimensions; the one created by newaxis
has a length of one. The newaxis approach is convenient for generating properly dimensioned
arrays for vector and matrix mathematics.
Other ways to create arrays
The functions zeros and ones create new arrays of specified dimensions filled with these
values. These are perhaps the most commonly used functions to create new arrays:
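A short sketch of both functions; the shapes and types are arbitrary choices for illustration:

```python
import numpy as np

z = np.zeros(7, dtype=int)         # one-dimensional array of seven integer zeros
o = np.ones((2, 3), dtype=float)   # 2 x 3 array of floating-point ones
```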
The zeros_like and ones_like functions create a new array with the same dimensions and
type of an existing one:
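For example, assuming an existing 2 x 3 float array a:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], float)

z = np.zeros_like(a)  # same shape and dtype as a, filled with 0.0
o = np.ones_like(a)   # same shape and dtype as a, filled with 1.0
```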
There are also a number of functions for creating special matrices (2D arrays). To create an
identity matrix of a given size, use the identity function:
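A minimal sketch using the identity function, with the size 4 picked arbitrarily:

```python
import numpy as np

ident = np.identity(4, dtype=float)  # 4 x 4 matrix with ones on the main diagonal
```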
Array mathematics
When standard mathematical operations are used with arrays, they are applied on an element-
by-element basis. This means that the arrays should be the same size during addition,
subtraction, etc.:
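Elementwise arithmetic can be sketched with two illustrative arrays of equal size:

```python
import numpy as np

a = np.array([1, 2, 3], float)
b = np.array([5, 2, 6], float)

total = a + b   # [ 6.,  4.,   9.]
diff = a - b    # [-4.,  0.,  -3.]
prod = a * b    # [ 5.,  4.,  18.]
quot = b / a    # [ 5.,  1.,   2.]
power = a ** b  # [ 1.,  4., 729.]
```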
For two-dimensional arrays, multiplication remains elementwise and does not correspond to
matrix multiplication.
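To make the distinction concrete, this sketch compares the elementwise product with the matrix product from np.dot for two illustrative 2 x 2 arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], float)
b = np.array([[2, 0], [1, 3]], float)

elementwise = a * b     # [[2., 0.], [3., 12.]]
matrix = np.dot(a, b)   # [[4., 6.], [10., 12.]]
```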
However, arrays that do not match in the number of dimensions will be broadcast by NumPy
to perform mathematical operations. This often means that the smaller array will be repeated
as necessary to perform the operation indicated. Consider the following:
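A minimal sketch of broadcasting, assuming a 3 x 2 array a and a two-element array b:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]], float)
b = np.array([-1, 3], float)

# b is applied to every row of a.
result = a + b  # [[0., 5.], [2., 7.], [4., 9.]]
```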
Here, the one-dimensional array b was broadcast to a two-dimensional array that matched
the shape of a. In essence, b was repeated for each row of a, as if it were given by
array([[-1., 3.],
       [-1., 3.],
       [-1., 3.]])
NumPy automatically broadcasts arrays in this manner. Sometimes, however, how we should
broadcast is ambiguous. In these cases, we can use the newaxis constant to specify how we
want to broadcast:
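One way to see the ambiguity is with a square array, where b could be applied along either axis. A sketch with illustrative values:

```python
import numpy as np

a = np.zeros((2, 2), float)
b = np.array([-1, 3], float)

default = a + b                 # b repeated down the rows:  [[-1., 3.], [-1., 3.]]
by_row = a + b[np.newaxis, :]   # explicit form of the default
by_col = a + b[:, np.newaxis]   # b repeated across columns: [[-1., -1.], [3., 3.]]
```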
In addition to the standard operators, NumPy offers a large library of common mathematical
functions that can be applied elementwise to arrays. Among these are the functions: abs,
sign, sqrt, log, log10, exp, sin, cos, tan, arcsin, arccos, arctan,
sinh, cosh, tanh, arcsinh, arccosh, and arctanh.
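As a small sketch, any of these functions is applied to every element at once; sqrt is used here with illustrative values:

```python
import numpy as np

a = np.array([1, 4, 9], float)
roots = np.sqrt(a)  # [1., 2., 3.]
```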
The functions floor, ceil, and rint give the lower, upper, or nearest (rounded) integer:
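A minimal sketch of the three rounding functions on illustrative values:

```python
import numpy as np

a = np.array([1.1, 1.5, 1.9], float)

lower = np.floor(a)   # [1., 1., 1.]
upper = np.ceil(a)    # [2., 2., 2.]
nearest = np.rint(a)  # [1., 2., 2.]
```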
Also included in the NumPy module are two important mathematical constants:
>>> np.pi
3.141592653589793
>>> np.e
2.718281828459045
Array iteration
It is possible to iterate over arrays in a manner similar to that of lists:
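For example, a for loop visits the elements of a one-dimensional array in order (the values are illustrative):

```python
import numpy as np

a = np.array([1, 4, 5], int)

seen = []
for x in a:
    seen.append(int(x))  # each x is a single array element
```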
For multidimensional arrays, iteration proceeds over the first axis such that each loop returns a
subsection of the array:
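A sketch with an illustrative 3 x 2 array, where each loop step yields one row and the row can be unpacked with multiple assignment:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]], float)

rows = [row.tolist() for row in a]            # each row is a one-dimensional array
products = [float(x * y) for (x, y) in a]     # unpack the two items of each row
```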
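Basic array operations
Many functions exist for extracting whole-array properties. For example, the items in an
array, or in a slice of it, can be summed or multiplied:
>>> a = np.array([2, 4, 3], float)
>>> a[1:].sum()
7.0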
In this example, member functions of the arrays were used. Alternatively, standalone functions
in the NumPy module can be accessed:
>>> np.sum(a)
9.0
>>> np.prod(a)
24.0
For most of the routines described below, both standalone and member functions are available.
A number of routines enable computation of statistical quantities in array datasets, such as the
mean (average), variance, and standard deviation:
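A minimal sketch with illustrative data. Note that var and std divide by the number of elements by default (the population form):

```python
import numpy as np

a = np.array([2, 1, 9], float)

mean = a.mean()  # (2 + 1 + 9) / 3 = 4.0
var = a.var()    # (4 + 9 + 25) / 3
std = a.std()    # square root of the variance
```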
It's also possible to find the minimum and maximum element values:
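For instance, with the same kind of illustrative data:

```python
import numpy as np

a = np.array([2, 1, 9], float)

smallest = a.min()  # 1.0
largest = a.max()   # 9.0
```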
The argmin and argmax functions return the array indices of the minimum and maximum
values:
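A short sketch showing that these functions return positions rather than values:

```python
import numpy as np

a = np.array([2, 1, 9], float)

i_min = a.argmin()  # index of the value 1.0 -> 1
i_max = a.argmax()  # index of the value 9.0 -> 2
```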
Values in an array can be "clipped" to be within a prespecified range. This is the same as applying
min(max(x, minval), maxval) to each element x in an array.
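Clipping might be sketched as follows, assuming the range 0 to 5 and illustrative data:

```python
import numpy as np

a = np.array([6, 2, 5, -1, 0], float)

# Values below 0 become 0; values above 5 become 5.
clipped = a.clip(0, 5)  # [5., 2., 5., 0., 0.]
```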
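Comparison operators and value testing
Boolean comparisons can be used to compare arrays of equal size element by element. The
return value is an array of Boolean True / False values (the values of a and b here are
illustrative):
>>> a = np.array([1, 3, 0], float)
>>> b = np.array([0, 3, 2], float)
>>> c = a > b
>>> c
array([ True, False, False])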
The any and all functions can be used to determine whether or not any or all elements of a
Boolean array are true:
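A minimal sketch on an illustrative Boolean array:

```python
import numpy as np

c = np.array([True, False, False], bool)

some_true = np.any(c)  # True, because at least one element is true
every_true = c.all()   # False, because not every element is true
```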
The where function forms a new array from two arrays of equivalent size using a Boolean filter
to choose between elements of the two. Its basic syntax is where(boolarray,
truearray, falsearray):
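As a sketch, where can keep the elements that pass a test and substitute zeros for the rest (the threshold and data are illustrative):

```python
import numpy as np

a = np.array([1, 3, 0], float)

# Where a > 1 take the element of a, otherwise take the matching zero.
result = np.where(a > 1, a, np.zeros_like(a))  # [0., 3., 0.]
```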
It is also possible to test whether or not values are NaN ("not a number") or finite:
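A minimal sketch using the NumPy constants nan and inf to build an illustrative array:

```python
import numpy as np

a = np.array([1, np.nan, np.inf], float)

is_nan = np.isnan(a)        # [False,  True, False]
is_finite = np.isfinite(a)  # [ True, False, False]
```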
Although here we used NumPy constants to add the NaN and infinite values, these can result
from standard mathematical operations.
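Array item selection and manipulation
Unlike lists, arrays also permit selection using other arrays. When a Boolean array such as
a >= 6 is sent to the bracket selection for a, an array with only the elements where the
selector is True is returned. The selector array can also be stored in a variable first: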
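Boolean selection with a >= 6 can be sketched as follows, assuming an illustrative 2 x 2 array:

```python
import numpy as np

a = np.array([[6, 4], [5, 9]], float)

selected = a[a >= 6]  # one-dimensional result: [6., 9.]

sel = a >= 6          # the selector stored in a variable
same = a[sel]
```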
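In addition to Boolean selection, it is possible to select using integer arrays, which list the
indices of the elements to take. With a selection array b = [0, 0, 1, 3, 2, 1], for example, we
take the 0th, 0th, 1st, 3rd, 2nd, and 1st elements of a, in that order. Lists can also be used as
selection arrays: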
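A sketch of integer selection with the index sequence 0, 0, 1, 3, 2, 1; the contents of a are illustrative:

```python
import numpy as np

a = np.array([2, 4, 6, 8], float)
b = np.array([0, 0, 1, 3, 2, 1], int)

picked = a[b]                      # [2., 2., 4., 8., 6., 4.]
from_list = a[[0, 0, 1, 3, 2, 1]]  # a plain list works the same way
```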
For multidimensional arrays, we have to send multiple one-dimensional integer arrays to the
selection bracket, one for each axis. Then, each of these selection arrays is traversed in sequence:
the first element taken has a first axis index taken from the first member of the first selection
array, a second index from the first member of the second selection array, and so on. An
example:
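The pairing of the selection arrays might be sketched like this, with illustrative values:

```python
import numpy as np

a = np.array([[1, 4], [9, 16]], float)
b = np.array([0, 0, 1, 1, 0], int)  # first-axis indices
c = np.array([0, 1, 1, 1, 1], int)  # second-axis indices

# Elements taken: a[0,0], a[0,1], a[1,1], a[1,1], a[0,1]
picked = a[b, c]  # [1., 4., 16., 16., 4.]
```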
A special function take is also available to perform selection with integer arrays. This works in
the same way as bracket selection:
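A minimal sketch comparing take with bracket selection on illustrative data:

```python
import numpy as np

a = np.array([2, 4, 6, 8], float)
b = np.array([0, 0, 1, 3, 2, 1], int)

taken = a.take(b)  # [2., 2., 4., 8., 6., 4.], the same as a[b]
```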
take also provides an axis argument, such that subsections of a multi-dimensional array can be
taken across a given dimension.
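The axis argument can be sketched with an illustrative 2 x 2 array:

```python
import numpy as np

a = np.array([[0, 1], [2, 3]], float)
b = np.array([0, 0, 1], int)

rows = a.take(b, axis=0)  # rows 0, 0, 1    -> [[0., 1.], [0., 1.], [2., 3.]]
cols = a.take(b, axis=1)  # columns 0, 0, 1 -> [[0., 0., 1.], [2., 2., 3.]]
```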
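The opposite of the take function is the put function, which takes values from a source array
and places them at specified indices in the array calling put. If the source array b has more
values than there are indices, for example a third value 7 when only two indices [0, 3] are
specified, the extra value is not used. The source array will be repeated as necessary if it is
shorter than the index list: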
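A hedged sketch of put. The values 9 and 8 in the source array are illustrative; only the unused 7 and the indices [0, 3] come from the text:

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5], float)
b = np.array([9, 8, 7], float)

a.put([0, 3], b)  # modifies a in place; 7 is not used
# a is now [9., 1., 2., 8., 4., 5.]

c = np.array([0, 1, 2, 3, 4, 5], float)
c.put([0, 3], 5)  # the single value 5 is repeated for both indices
# c is now [5., 1., 2., 5., 4., 5.]
```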
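Vector and matrix mathematics
NumPy provides many functions for performing standard vector and matrix operations. It is
possible to generate dot, inner, outer, and cross products of matrices and vectors. For
vectors, note that the inner product is equivalent to the dot product: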
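These products can be sketched with two illustrative three-element vectors:

```python
import numpy as np

a = np.array([1, 4, 0], float)
b = np.array([2, 2, 1], float)

dot = np.dot(a, b)      # 1*2 + 4*2 + 0*1 = 10.0
inner = np.inner(a, b)  # same value as the dot product for vectors
outer = np.outer(a, b)  # [[2., 2., 1.], [8., 8., 4.], [0., 0., 0.]]
cross = np.cross(a, b)  # [4., -1., -6.]
```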
NumPy also comes with a number of built-in routines for linear algebra calculations. These can
be found in the sub-module linalg. Among these are routines for dealing with matrices and
their inverses. The determinant of a matrix can be found:
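A minimal sketch of the determinant. The matrix below is an assumption: it was reconstructed so that it is consistent with the inverse printed later in this section:

```python
import numpy as np

# Reconstructed from the printed inverse; not taken verbatim from the text.
a = np.array([[4, 2, 0], [9, 3, 7], [1, 2, 2]], float)

det = np.linalg.det(a)  # approximately -54, up to floating-point rounding
```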
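The inverse of a matrix can be found with the inv function. The matrix a below is the one
whose inverse is printed here:
>>> a = np.array([[4, 2, 0], [9, 3, 7], [1, 2, 2]], float)
>>> b = np.linalg.inv(a)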
>>> b
array([[ 0.14814815, 0.07407407, -0.25925926],
[ 0.2037037 , -0.14814815, 0.51851852],
[-0.27777778, 0.11111111, 0.11111111]])
>>> np.dot(a, b)
array([[ 1.00000000e+00, 5.55111512e-17, 2.22044605e-16],
[ 0.00000000e+00, 1.00000000e+00, 5.55111512e-16],
[ 1.11022302e-16, 0.00000000e+00, 1.00000000e+00]])
Polynomial mathematics
NumPy supplies methods for working with polynomials. Given a set of roots, it is possible to
show the polynomial coefficients:
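As a sketch, the poly function performs this conversion. The roots -1, 1, 1, and 10 are used because they expand to the polynomial discussed next:

```python
import numpy as np

coeffs = np.poly([-1, 1, 1, 10])  # highest power first: [1, -11, 9, 11, -10]
```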
Here, the return array gives the coefficients corresponding to x^4 - 11x^3 + 9x^2 + 11x - 10.
The opposite operation can be performed: given a set of coefficients, the roots function returns
all of the polynomial roots:
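A minimal sketch using np.roots on an illustrative quadratic, x^2 - 3x + 2, whose roots are 1 and 2:

```python
import numpy as np

r = np.roots([1, -3, 2])  # coefficients, highest power first
r_sorted = np.sort(r)     # sort for a predictable order
```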
The functions polyadd, polysub, polymul, and polydiv also handle proper addition,
subtraction, multiplication, and division of polynomial coefficients, respectively.
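These functions might be sketched with two illustrative polynomials, p = x + 1 and q = x - 1:

```python
import numpy as np

p = [1, 1]    # x + 1
q = [1, -1]   # x - 1

total = np.polyadd(p, q)    # coefficients of 2x
diff = np.polysub(p, q)     # coefficients of the constant 2
product = np.polymul(p, q)  # coefficients of x^2 - 1

# Division returns a (quotient, remainder) pair.
quotient, remainder = np.polydiv(product, q)
```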
Finally, the polyfit function can be used to fit a polynomial of specified order to a set of data
using a least-squares approach:
>>> x = [1, 2, 3, 4, 5, 6, 7, 8]
>>> y = [0, 2, 1, 3, 7, 10, 11, 19]
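A hedged sketch of the fit on the x and y data above. The order 2 is an assumption, and polyval is used only to evaluate the fitted curve:

```python
import numpy as np

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 2, 1, 3, 7, 10, 11, 19]

coeffs = np.polyfit(x, y, 2)    # assumed order: quadratic, highest power first
fitted = np.polyval(coeffs, x)  # fitted values at the original x positions
residuals = np.array(y) - fitted
```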
The return value is a set of polynomial coefficients. More sophisticated regression and
interpolation routines can be found in the SciPy package.
Statistics
In addition to the mean, var, and std functions, NumPy supplies several other methods for
returning statistical features of arrays.
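One such feature is the median, sketched here on illustrative data:

```python
import numpy as np

a = np.array([1, 4, 3, 8, 9, 2, 3], float)

middle = np.median(a)  # sorted data is 1, 2, 3, 3, 4, 8, 9 -> 3.0
```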
The correlation coefficient for multiple variables observed at multiple instances can be found for
arrays of the form [[x1, x2, …], [y1, y2, …], [z1, z2, …], …] where x, y, z are different observables
and the numbers indicate the observation times:
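A minimal sketch using np.corrcoef. The data are an assumption, chosen to be consistent with the covariance output shown later in this section: two observables, each observed four times:

```python
import numpy as np

# Rows are observables; columns are observation times.
a = np.array([[1, 2, 1, 3], [5, 3, 1, 8]], float)

c = np.corrcoef(a)  # 2 x 2 symmetric matrix with ones on the diagonal
```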
Here the return array c[i,j] gives the correlation coefficient for the ith and jth observables.
Similarly, the covariance for data can be found:
>>> a = np.array([[1, 2, 1, 3], [5, 3, 1, 8]], float)
>>> np.cov(a)
array([[ 0.91666667, 2.08333333],
[ 2.08333333, 8.91666667]])
Random numbers
An important part of any simulation is the ability to draw random numbers. For this purpose, we
use NumPy's built-in pseudorandom number generator routines in the sub-module random.
The numbers are pseudorandom in the sense that they are generated deterministically from a
seed number, but their distribution is statistically similar to that of truly random numbers. The
functions used in this section rely on an algorithm called the Mersenne Twister to generate
pseudorandom numbers. The seed can be set explicitly, which makes a sequence of random
numbers reproducible:
>>> np.random.seed(293423)
We can generate an array of random numbers in the half-open interval [0.0, 1.0):
>>> np.random.rand(5)
array([ 0.40783762, 0.7550402 , 0.00919317, 0.01713451, 0.95299583])
We can use the rand function to generate two-dimensional random arrays, or use the reshape
function:
>>> np.random.rand(2,3)
array([[ 0.50431753, 0.48272463, 0.45811345],
[ 0.18209476, 0.48631022, 0.49590404]])
>>> np.random.rand(6).reshape((2,3))
array([[ 0.72915152, 0.59423848, 0.25644881],
[ 0.75965311, 0.52151819, 0.60084796]])
To generate a single random number in [0.0, 1.0):
>>> np.random.random()
0.70110427435769551
To generate random integers in the range [min, max) use randint(min, max):
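A short sketch of randint, assuming the range [5, 10) and an arbitrary seed so the run is repeatable:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for repeatability

value = np.random.randint(5, 10)              # a single integer from 5 to 9
values = np.random.randint(5, 10, size=1000)  # many draws at once
```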
In each of these examples, we drew random numbers from a uniform distribution. NumPy also
includes generators for many other distributions, including the Beta, binomial, chi-square,
Dirichlet, exponential, F, Gamma, geometric, Gumbel, hypergeometric, Laplace, logistic,
log-normal, logarithmic, multinomial, multivariate normal, negative binomial, noncentral
chi-square, noncentral F, normal, Pareto, Poisson, power, Rayleigh, Cauchy, Student's t,
triangular, von Mises, Wald, Weibull, and Zipf distributions. Here we only give examples for two
of these. To draw from the discrete Poisson distribution with λ = 6.0:
>>> np.random.poisson(6.0)
5
To draw from a continuous normal (Gaussian) distribution with mean μ = 1.5 and standard
deviation σ = 4.0:
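A hedged sketch of such a draw, passing the mean and standard deviation as the first two arguments of normal. The seed and sample size are arbitrary choices:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for repeatability

single = np.random.normal(1.5, 4.0)                 # one draw with mean 1.5, std 4.0
samples = np.random.normal(1.5, 4.0, size=100_000)  # many draws to check the parameters

sample_mean = samples.mean()  # close to 1.5
sample_std = samples.std()    # close to 4.0
```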
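To draw from a standard normal distribution (μ = 0, σ = 1), omit the arguments:
>>> np.random.normal()
0.27548716940682932
To draw multiple values, use the optional size argument:
>>> np.random.normal(size=5)
array([-1.67215088, 0.65813053, -0.70150614, 0.91452499, 0.71440557])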
The random module can also be used to randomly shuffle the order of items in a list. This is
sometimes useful if we want to sort a list in random order:
>>> l = list(range(10))
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> np.random.shuffle(l)
>>> l
[4, 9, 5, 0, 2, 7, 6, 8, 1, 3]
Notice that the shuffle function modifies the list in place, meaning it does not return a new list
but rather modifies the original list itself.
Modules available in SciPy
The help function provides useful information on the packages that SciPy offers:
>>> import scipy
>>> help(scipy)
NAME
scipy
DESCRIPTION
SciPy: A scientific computing package for Python
================================================
Contents
--------
SciPy imports all the functions from the NumPy namespace, and in
addition provides:
Subpackages
-----------
Using any of these subpackages requires an explicit import. For example,
``import scipy.cluster``.
::
Utility tools
-------------
::
PACKAGE CONTENTS
__config__
_build_utils (package)
_distributor_init
_lib (package)
cluster (package)
conftest
constants (package)
fft (package)
fftpack (package)
integrate (package)
interpolate (package)
io (package)
linalg (package)
misc (package)
ndimage (package)
odr (package)
optimize (package)
setup
signal (package)
sparse (package)
spatial (package)
special (package)
stats (package)
version
Notice that a number of sub-modules in SciPy require explicit import, as indicated in the help text
above:
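A minimal sketch of an explicit sub-module import, assuming SciPy is installed. The choice of scipy.optimize and its brentq root finder is only an illustration:

```python
import scipy.optimize  # explicit import of the sub-module


def f(x):
    # Illustrative function with a root at the square root of 2.
    return x ** 2 - 2.0


# Find the root of f between 0 and 2.
root = scipy.optimize.brentq(f, 0.0, 2.0)
```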
The functions in each module are well-documented in both the internal docstrings and at the
SciPy documentation website. Many of these functions provide instant access to common
numerical algorithms, and are very easy to implement. Thus, SciPy can save tremendous
amounts of time in scientific computing applications since it offers a library of pre-written, pre-
tested routines.
A large community of developers continually builds new functionality into SciPy. A good rule of
thumb is: if you are thinking about implementing a numerical routine into your code, check the
SciPy documentation website first. Chances are, if it's a common task, someone will have added
it to SciPy.