8/5/2020 Introduction to Numpy.ipynb - Colaboratory
Overview
In this module, we will discuss NumPy (Numerical Python), a widely used library for
scientific computing.
Of all the data structures NumPy provides, the NumPy array (or ndarray) is the one most widely
used by the scientific community.
NumPy arrays are highly optimized for large volumes of data (millions of entries).
As a result, much of the Python-based scientific software ecosystem is built on
NumPy arrays.
Numpy Basics
The statement import numpy as np imports NumPy into a script.
The as np part lets us call NumPy functions and operations as np.foo() instead of
numpy.foo() .
import numpy as np
Numpy Arrays
Creating Arrays
NumPy arrays (also known as ndarrays) handle large volumes of data far more efficiently than
Python lists.
For this reason, scientific experiments with large amounts of data (on the order of millions of
entries) typically use NumPy arrays rather than Python lists.
NumPy provides functions that easily convert Python lists to NumPy arrays.
# create a simple python list
data1 = [9, 8, 7, 1, 2, 3]
numpydata1 = np.array(data1)
numpydata1
array([9, 8, 7, 1, 2, 3])
Numpy has some other ways to create arrays as well. We will discuss them later.
We can build multi-dimensional numpy arrays similarly from python lists.
# create a multi-dimensional python list
data2 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
numpydata2 = np.array(data2)
numpydata2
Numpy Array Properties
Shape
Shape is a NumPy array property that gives the number of elements along each dimension of the
array.
We can read it as array_name.shape (it is an attribute, so no parentheses are needed).
data1 = [9, 8, 7, 1, 2, 3]
numpydata1 = np.array(data1)
numpydata1.shape
data2 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
numpydata2 = np.array(data2)
numpydata2.shape
Each number in the tuple is the number of elements along the corresponding dimension. For
example, (2, 3) means 2 rows and 3 columns.
Number of Dimensions
The ndim property gives the number of dimensions of the array. We can read it as
array_name.ndim .
data1 = [9, 8, 7, 1, 2, 3]
numpydata1 = np.array(data1)
numpydata1.ndim
data2 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
numpydata2 = np.array(data2)
numpydata2.ndim
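As a small illustration of how the two properties relate, the sketch below reuses the values from the examples above; the variable names flat and grid are chosen here only for the example. It shows that ndim is always the length of the shape tuple.

```python
import numpy as np

# Hypothetical variable names, reusing the values from the examples above
flat = np.array([9, 8, 7, 1, 2, 3])
grid = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# ndim counts the dimensions; shape lists the size of each one
print(flat.ndim, flat.shape)   # 1 (6,)
print(grid.ndim, grid.shape)   # 2 (2, 3)

# The two properties are linked: ndim is the length of the shape tuple
same_for_flat = flat.ndim == len(flat.shape)
same_for_grid = grid.ndim == len(grid.shape)
print(same_for_flat, same_for_grid)  # True True
```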
Data Type
The dtype property describes the data type used to store the array's elements. It can be
accessed as array_name.dtype .
data1 = [9, 8, 7, 1, 2, 3]
numpydata1 = np.array(data1)
numpydata1.dtype
data2 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
numpydata2 = np.array(data2)
numpydata2.dtype
The appropriate data type depends on the level of precision required. Note that higher
precision requires more memory and processing power, both of which are constraints when
analyzing data.
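To make the memory trade-off concrete, here is a minimal sketch that stores the same values at two precisions. The sample values and variable names are made up for illustration. It relies on the dtype argument of np.array and the nbytes attribute, which reports the memory used by the array's elements.

```python
import numpy as np

# Made-up sample values, used only for this illustration
values = [1.5, 2.5, 3.5, 4.5]

# Store the same numbers at two different precisions
as_float64 = np.array(values, dtype=np.float64)  # 8 bytes per element
as_float32 = np.array(values, dtype=np.float32)  # 4 bytes per element

bytes_float64 = as_float64.nbytes
bytes_float32 = as_float32.nbytes

print(as_float64.dtype, bytes_float64)  # float64 32
print(as_float32.dtype, bytes_float32)  # float32 16
```

With millions of entries, halving the size of each element halves the memory the array needs, at the cost of fewer significant digits.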
Creating Arrays - Special Functions
An array of all zeros can be created by calling the np.zeros(shape) function.
zeros = np.zeros((2, 3))
zeros
Similarly, we can create an array whose elements are all ones by using the np.ones function.
ones = np.ones((3, 2))
ones
An array of random integers can also be generated using the np.random.randint(start, end, shape)
function. The start value is included in the range and the end value is excluded.
random = np.random.randint(0, 10, (3,3))
random
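The following sketch checks the range of the generated values. The name samples is chosen here for the example, and the call to np.random.seed is an optional addition (not part of the original notebook) that makes the same numbers come out on every run.

```python
import numpy as np

# Optional: fix the seed so the example is reproducible (an addition for this sketch)
np.random.seed(0)

# 3x3 array of random integers drawn from 0 (included) up to 10 (excluded)
samples = np.random.randint(0, 10, (3, 3))

lowest = samples.min()
highest = samples.max()

print(samples.shape)                 # (3, 3)
print(0 <= lowest and highest < 10)  # True
```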
Numpy Array Operations
Reshape
We can change the shape of an array using the reshape operation, by calling
np.reshape(array_name, target_shape) .
np.reshape(numpydata1, (2, 3))
np.reshape(numpydata2, (6,))
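One detail worth seeing in action: the target shape has to hold exactly as many elements as the original array. The sketch below reuses numpydata1 from earlier; the (4, 2) target is picked here only to trigger the failure.

```python
import numpy as np

numpydata1 = np.array([9, 8, 7, 1, 2, 3])

# 6 elements fit exactly into 2 rows x 3 columns
reshaped = np.reshape(numpydata1, (2, 3))
print(reshaped.shape)    # (2, 3)
print(numpydata1.shape)  # (6,)  the original array keeps its shape

# 6 elements cannot fill a 4 x 2 array (8 slots), so NumPy raises ValueError
try:
    np.reshape(numpydata1, (4, 2))
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print("reshape failed:", error)
```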
Sort
np.sort(numpydata1)
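As a brief sketch of what this call does: np.sort returns a new array sorted in ascending order and leaves the input untouched. The name sorted_data is chosen here for the example.

```python
import numpy as np

numpydata1 = np.array([9, 8, 7, 1, 2, 3])

# np.sort returns a sorted copy in ascending order
sorted_data = np.sort(numpydata1)

print(sorted_data)  # [1 2 3 7 8 9]
print(numpydata1)   # [9 8 7 1 2 3]  the original order is preserved
```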
Concatenate
The concatenate operation joins multiple arrays along a dimension using
np.concatenate((array1, array2), axis=axis1) .
By choosing axis=0 , we can concatenate arrays along the first dimension (that is, add rows if we
have 2-dimensional data).
# Create array of shape (2, 2)
a = np.array([[1, 2], [6, 7]])
# Create an array of shape (1, 2)
b = np.array([[4, 5]])
c = np.concatenate((a, b), axis=0)
c.shape
By choosing axis=1 , we can concatenate arrays along the second dimension (that is, add columns
for 2-dimensional data).
x = np.array([[1, 2], [8, 9]])
x.shape
y = np.array([[3], [4]])
y.shape
z = np.concatenate((x, y), axis=1)
z.shape
Note that, for concatenation to work, both arrays must have the same shape in every dimension
other than the one we concatenate along. Try np.concatenate((a, b), axis=1) and
np.concatenate((x, y), axis=0) to see what happens when the shapes do not match.
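For readers who want to handle a shape mismatch rather than just observe it, here is a hedged sketch using the arrays a and b from above. Wrapping the call in try/except is one possible way to deal with it, not something the original notebook prescribes.

```python
import numpy as np

a = np.array([[1, 2], [6, 7]])  # shape (2, 2)
b = np.array([[4, 5]])          # shape (1, 2)

# Along axis=0 only the second dimension has to match (2 == 2), so this works
rows_added = np.concatenate((a, b), axis=0)
print(rows_added.shape)  # (3, 2)

# Along axis=1 the first dimension has to match, but 2 != 1, so NumPy raises ValueError
try:
    np.concatenate((a, b), axis=1)
    mismatch_detected = False
except ValueError as error:
    mismatch_detected = True
    print("cannot concatenate:", error)
```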
Activity
A device with 3 accelerometers has recorded the acceleration it encountered during each second
of a certain period of time. The readings are stored in the variable accelerationdata .
During the same time period, another device has recorded temperature readings inside and outside
the assembly, which are saved in the variable temperaturedata (an inside reading followed by an
outside reading).
Your task is to combine the readings into a single variable combineddata that contains both
acceleration and temperature data.
accelerationdata = np.array([[0.03463151, 0.48593401, 0.29648785],
[0.6746004 , 0.88983019, 0.38779023],
[0.75813463, 0.87322111, 0.05209736],
[0.14376458, 0.95533169, 0.75532094],
[0.17252515, 0.35901729, 0.27063359],
[0.4135009 , 0.86243141, 0.53516819],
[0.80347004, 0.36083334, 0.79639674],
[0.81023186, 0.18515889, 0.64252951],
[0.66539218, 0.20486895, 0.18353906],
[0.54754633, 0.18408961, 0.30367977]])
temperaturedata = np.array([166, 251, 108, 238, 229, 236, 194, 161, 266, 108, 102, 291, 235,
121, 188, 183, 183, 137, 129, 133])
Arithmetic Operations
Addition/ Multiplication/ Subtraction/ Division - By using the +, *, -, / operators with a
single number, we can apply the operation to each element in the array.
a = np.array([1, 2, 4])
a * 4
By using the +, *, -, / operators with another array, we can perform element-wise operations
between two arrays.
a = np.array([1, 2, 3])
b = np.array([3, 2, 1])
a * b
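The sketch below runs all four operators on the same pair of arrays, plus one example with a single number. The result variable names are chosen here for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([3, 2, 1])

# Array with array: each element is combined with the element at the same position
added = a + b        # [4 4 4]
subtracted = a - b   # [-2  0  2]
multiplied = a * b   # [3 4 3]
divided = a / b      # [0.33333333 1.         3.        ]

# Array with a single number: the number is applied to every element
shifted = a + 10     # [11 12 13]

print(added, subtracted, multiplied, divided, shifted)
```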
Comparison - Similarly, we can compare elements using the >, <, <=, >=, == operators against
either a single value or another array.
a = np.array([4, 5, 6])
a >= 5
a = np.array([4, 5, 6])
b = np.array([4, 3, 6])
a == b
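A comparison returns an array of True/False values, one per element. As a small sketch of what can be done with such a result, the example below checks whether any element passed the test and counts how many did. The names mask, any_match and match_count are chosen here for illustration.

```python
import numpy as np

a = np.array([4, 5, 6])

# The comparison yields one boolean per element
mask = a >= 5
print(mask)  # [False  True  True]

# any() tells whether at least one element satisfied the condition
any_match = mask.any()

# True counts as 1 when summed, so sum() gives the number of matches
match_count = mask.sum()

print(any_match, match_count)  # True 2
```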
Activity
The sensor data from the previous stage requires some modifications in this stage. The
accelerationdata values are voltage readings from the sensors and need to be converted
to standard measurement units. To do that, each element needs to be multiplied by 2.5391.
The temperatures also need to be converted to the correct unit by adding 273.
Your task is to store the corrected values in the correctedAccelerationData ,
correctedTemperatureData and correctedCombinedData variables.
Also determine whether there are any outliers or possibly incorrect acceleration readings.
(A corrected value greater than 2.5 is usually considered an outlier.)
# Start your code here
Functions on NumPy Arrays
NumPy provides a set of statistical functions that we can easily apply to a NumPy array.
We call them through np , for example np.sum(array) :
mean(array)
sum(array)
amin(array)
amax(array)
array = np.array([1, 2, 3, -1, 5, 6, 7, 8])
np.amin(array)
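The sketch below applies the statistical functions listed above to the same array. It uses np.min and np.max, the shorter names for np.amin and np.amax; the result variable names are chosen here for the example.

```python
import numpy as np

array = np.array([1, 2, 3, -1, 5, 6, 7, 8])

total = np.sum(array)      # 1 + 2 + 3 - 1 + 5 + 6 + 7 + 8 = 31
average = np.mean(array)   # 31 / 8 = 3.875
smallest = np.min(array)   # same result as np.amin(array)
largest = np.max(array)    # same result as np.amax(array)

print(total, average, smallest, largest)  # 31 3.875 -1 8
```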
NumPy also has mathematical functions such as sine, cosine and tangent. These functions are
applied element-wise to all the elements in the array.
Some of these functions are:
sin(array)
tan(array)
log10(array)
arr = np.array([10, 100, 1000])
np.log10(arr)
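To illustrate the element-wise behaviour, here is a short sketch. The angles array is made up for this example; note that the trigonometric functions expect angles in radians.

```python
import numpy as np

# Angles in radians: 0, 90 and 180 degrees (values chosen for this illustration)
angles = np.array([0.0, np.pi / 2, np.pi])

# np.sin is applied to every element; the last value is only approximately 0
# because pi cannot be represented exactly as a floating-point number
sines = np.sin(angles)

# log10 of powers of ten gives the exponents
arr = np.array([10, 100, 1000])
exponents = np.log10(arr)

print(np.round(sines, 6))
print(exponents)  # [1. 2. 3.]
```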
Activity
Consider the scenario we discussed earlier. Instead of using the inside and outside temperature
readings separately at different points in time, we will use the average of all the readings
for our scientific experiment.
Also, for our experiment, we need the tangent (tan) of the standard acceleration
values.
Your task is to apply the appropriate operations to correctedAccelerationData and
correctedTemperatureData and obtain the relevant readings for the experiment.
np.tan(accelerationdata)
Indexing and Slicing
Indexing
Similar to python lists, we can access individual elements in numpy arrays using the index for a 1D
array.
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
a[0]
In the case of a multi-dimensional array, a single index returns an array (for a 2D array, one
row). To access an individual element, we have to index repeatedly, once per dimension.
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b[1]
b[1][0]
NumPy also offers a more compact notation for this: put all the indices, separated by commas,
inside a single pair of brackets.
b[1, 0]
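The two notations can be compared side by side. This is a small sketch using the array b from above, with result names chosen here for illustration.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A single index on a 2D array returns a whole row
second_row = b[1]     # [4 5 6]

# Both notations pick the element in row 1, column 0
chained = b[1][0]     # index the row, then index inside that row
combined = b[1, 0]    # row and column in one pair of brackets

print(second_row, chained, combined)  # [4 5 6] 4 4
print(b[2, 1])                        # 8
```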
Slicing
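Slicing allows us to select multiple elements using a simple criterion based on indices. A slice
is written as start:stop , where the element at start is included and the element at stop is
excluded.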
a = np.array([1, 2, 3, 4, 5, 6, 7])
a[3:]
a[:5]
a[3:5]
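For reference, the sketch below shows what each of these three slices selects. The result names are chosen here for the example.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])

# From index 3 to the end
from_third = a[3:]     # [4 5 6 7]

# From the beginning up to, but not including, index 5
up_to_fifth = a[:5]    # [1 2 3 4 5]

# From index 3 up to, but not including, index 5
middle = a[3:5]        # [4 5]

print(from_third, up_to_fifth, middle)
```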
Numpy Arrays vs Python lists
To demonstrate the key difference, let's do a simple operation.
1. Create a Python list (A; let's use the numbers from 1 to 5)
2. Slice some part of the list (X; let's take all elements except the first element)
3. Make a change to the slice (set the first element of X to -1)
Then see whether the original list has changed.
# insert your code here
Let's repeat the same exercise with a NumPy array. Name the NumPy array B and the slice we take Y.
# insert your code here
What is your observation?