C1 W2 Lab01 Python Numpy Vectorization Soln
Outline
1.1 Goals
1.2 Useful References
2 Python and NumPy
3 Vectors
3.1 Abstract
3.2 NumPy Arrays
3.3 Vector Creation
3.4 Operations on Vectors
4 Matrices
4.1 Abstract
4.2 NumPy Arrays
4.3 Matrix Creation
4.4 Operations on Matrices
1.1 Goals
In this lab, you will:
Review the features of NumPy and Python that are used in Course 1
3 Vectors
3.1 Abstract
Vectors, as you will use them in this course, are ordered arrays of numbers. In notation, vectors are denoted with lower case bold letters such as 𝐱. The elements of a vector are all the same type. A vector does not, for example, contain both characters and numbers. The number of elements in the array is often referred to as the dimension, though mathematicians may prefer rank. The vector shown has a dimension of 𝑛.
The elements of a vector can be referenced with an index. In math settings, indexes typically run from 1 to n. In computer science and these labs, indexing will typically run from 0 to n-1. In notation, elements of a vector, when referenced individually, will indicate the index in a subscript; for example, the 0th element of the vector 𝐱 is 𝑥₀. Note, the x is not bold in this case.
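As a minimal sketch of the zero-based convention described above (the values are arbitrary and chosen only for illustration), the first and last elements of an n-element vector sit at indexes 0 and n-1:

```python
import numpy as np

# An illustrative vector with n = 4 elements (values are arbitrary)
x = np.array([10.0, 20.0, 30.0, 40.0])
n = x.shape[0]

first = x[0]      # the 0th element, x_0 in the notation above
last = x[n - 1]   # the final element, x_{n-1}

# All elements share one type: mixing ints and floats yields a float array
mixed = np.array([1, 2.5, 3])
print(n, first, last, mixed.dtype)
```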
Data creation routines in NumPy will generally have a first parameter which is the shape of the object. This can either be a single value for a 1-D result or a tuple
(n,m,...) specifying the shape of the result. Below are examples of creating vectors using these routines.
In [6]: import numpy as np
# NumPy routines which allocate memory and fill arrays with value
a = np.zeros(4); print(f"np.zeros(4) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.zeros((4,)); print(f"np.zeros((4,)) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.random.random_sample(4); print(f"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
In [13]: # NumPy routines which allocate memory and fill arrays with value but do not accept shape as input argument
a = np.arange(4.); print(f"np.arange(4.): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.random.rand(4); print(f"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
In [20]: # NumPy routines which allocate memory and fill with user specified values
a = np.array([5,4,3,2]); print(f"np.array([5,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.array([5.,4,3,2]); print(f"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
These have all created a one-dimensional vector a with four elements. a.shape returns the dimensions. Here we see a.shape = (4,) indicating a 1-d array
with 4 elements.
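The following sketch, which is not part of the original lab, contrasts the shape of a 1-D vector with a 2-D array holding the same number of elements, and shows how the data type follows the input values. It uses `dtype.kind` rather than the full type name because the exact integer width can differ between platforms.

```python
import numpy as np

a = np.zeros(4)              # 1-D vector, shape (4,)
row = np.zeros((1, 4))       # 2-D array with one row, shape (1, 4)

print(a.shape, a.ndim)       # (4,) 1
print(row.shape, row.ndim)   # (1, 4) 2

# The dtype follows the input values: one float makes the whole array float
ints = np.array([5, 4, 3, 2])
floats = np.array([5., 4, 3, 2])
print(ints.dtype.kind, floats.dtype.kind)  # i f
```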
3.4.1 Indexing
Elements of vectors can be accessed via indexing and slicing. NumPy provides a very complete set of indexing and slicing capabilities. We will explore only the
basics needed for the course here. Reference Slicing and Indexing (https://NumPy.org/doc/stable/reference/arrays.indexing.html) for more details.
Indexing means referring to an element of an array by its position within the array.
Slicing means getting a subset of elements from an array based on their indices.
NumPy starts indexing at zero, so the 3rd element of a vector 𝐚 is a[2] .
In [ ]: # vector indexing operations on 1-D vectors
a = np.arange(10)
print(a)
# access an element
print(f"a[2].shape: {a[2].shape} a[2] = {a[2]}, Accessing an element returns a scalar")
# access the last element, negative indexes count from the end
print(f"a[-1] = {a[-1]}")
# indexes must be within the range of the vector or they will produce an error
try:
    c = a[10]
except Exception as e:
    print("The error message you'll see is:")
    print(e)
[0 1 2 3 4 5 6 7 8 9]
a[2].shape: () a[2] = 2, Accessing an element returns a scalar
a[-1] = 9
The error message you'll see is:
index 10 is out of bounds for axis 0 with size 10
3.4.2 Slicing
Slicing creates an array of indices using a set of three values ( start:stop:step ). A subset of values is also valid. Its use is best explained by example:
In [37]: a = np.arange(10)
c = a[:]; print(c)
[0 1 2 3 4 5 6 7 8 9]
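A few more slices, sketched here for illustration, show how the three values interact. The stop index is exclusive, and omitted values default to the start of the array, the end of the array, and a step of 1.

```python
import numpy as np

a = np.arange(10)        # [0 1 2 3 4 5 6 7 8 9]

c = a[2:7:1]             # start=2, stop=7 (exclusive), step=1
d = a[2:7:2]             # same range, every second element
e = a[3:]                # from index 3 to the end
f = a[:3]                # everything before index 3

print(c)  # [2 3 4 5 6]
print(d)  # [2 4 6]
print(e)  # [3 4 5 6 7 8 9]
print(f)  # [0 1 2]
```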
In [44]: a = np.array([1,2,3,4])
print(f"a :{a}")
b = a**2
print(f"b : {b}")
a :[1 2 3 4]
b : [ 1 4 9 16]
In [ ]: a = np.array([1,2,3,4])
print(f"a : {a}")
# negate elements of a
b = -a
print(f"b = -a : {b}")
b = np.mean(a)
print(f"b = np.mean(a): {b}")
b = a**2
print(f"b = a**2 : {b}")
Of course, for element-wise operations between two vectors to work correctly, the vectors must be of the same size:
In [46]: a = np.array([1,2,3])
c = np.array([1,2])
a+c
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-46-433a64d462bd> in <module>
1 a = np.array([1,2,3])
2 c = np.array([1,2])
----> 3 a+c
ValueError: operands could not be broadcast together with shapes (3,) (2,)
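One way to avoid this error, sketched below as an illustration rather than as the lab's method, is to compare the shapes before combining the vectors. The variable names `result` and `total` are introduced here only for the example.

```python
import numpy as np

a = np.array([1, 2, 3])
c = np.array([1, 2])

# Compare shapes first; these two 1-D vectors have different lengths
if a.shape == c.shape:
    result = a + c
else:
    result = None
    print(f"shape mismatch: {a.shape} vs {c.shape}")

# Vectors of the same size add element by element
b = np.array([10, 20, 30])
total = a + b
print(total)  # [11 22 33]
```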
In [ ]: a = np.array([1, 2, 3, 4])
# multiply a by a scalar
b = 5 * a
print(f"b = 5 * a : {b}")
b = 5 * a : [ 5 10 15 20]
The dot product multiplies the values in two vectors element-wise and then sums the result. Vector dot product requires the dimensions of the two vectors to be
the same.
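The two steps in that definition, multiply element-wise and then sum, can be checked directly against `np.dot`. This is a small worked sketch using the same vectors as the 1-D test later in this section.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])

products = a * b           # element-wise: [-1  8  9  8]
manual = products.sum()    # -1 + 8 + 9 + 8 = 24
builtin = np.dot(a, b)     # NumPy's dot product of the same vectors

print(products, manual, builtin)
```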
Using a for loop, implement a function which returns the dot product of two vectors. Given inputs 𝑎 and 𝑏, the function should return:
$$ x = \sum_{i=0}^{n-1} a_i b_i $$
Assume both a and b are the same shape.
In [ ]: def my_dot(a, b):
    """
    Compute the dot product of two vectors

    Args:
      a (ndarray (n,)): input vector
      b (ndarray (n,)): input vector with same dimension as a

    Returns:
      x (scalar): the dot product of a and b
    """
    x = 0
    for i in range(a.shape[0]):
        x = x + a[i] * b[i]
    return x
print(np.dot(a,b))
24
In [ ]: # test 1-D
a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])
c = np.dot(a, b)
print(f"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} ")
c = np.dot(b, a)
print(f"NumPy 1-D np.dot(b, a) = {c}, np.dot(b, a).shape = {c.shape} ")
Above, you will note that the results for 1-D matched our implementation.
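In [ ]: import time
np.random.seed(1)
a = np.random.rand(10000000) # very large arrays
b = np.random.rand(10000000)
tic = time.time() # capture start time
c = np.dot(a, b)
toc = time.time() # capture end time
print(f"np.dot(a, b) = {c:.4f}")
print(f"Vectorized version duration: {1000*(toc-tic):.4f} ms ")
tic = time.time() # capture start time
c = my_dot(a, b)
toc = time.time() # capture end time
print(f"my_dot(a, b) = {c:.4f}")
print(f"loop version duration: {1000*(toc-tic):.4f} ms ")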
So, vectorization provides a large speedup in this example. This is because NumPy makes better use of available data parallelism in the underlying hardware. GPUs and modern CPUs implement Single Instruction, Multiple Data (SIMD) pipelines, allowing multiple operations to be issued in parallel. This is critical in Machine Learning, where the data sets are often very large.
Going forward, our examples will be stored in an array, X_train, of dimension (m,n). This will be explained more in context, but here it is important to note that it is a 2-dimensional array or matrix (see the next section on matrices).
w will be a 1-dimensional vector of shape (n,).
We will perform operations by looping through the examples, extracting each example to work on individually by indexing X, for example X[i].
X[i] returns a value of shape (n,), a 1-dimensional vector. Consequently, operations involving X[i] are often vector-vector.
That is a somewhat lengthy explanation, but aligning and understanding the shapes of your operands is important when performing vector operations.
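The pattern described above can be sketched as follows. The data values, the weights, and the result array `f` are hypothetical and chosen only to make the shapes easy to follow; they do not come from the course.

```python
import numpy as np

# Hypothetical training data: m = 3 examples, n = 2 features
X_train = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])
w = np.array([0.5, -1.0])   # shape (n,)

m = X_train.shape[0]
f = np.zeros(m)
for i in range(m):
    x_i = X_train[i]        # shape (n,), a 1-D vector
    f[i] = np.dot(x_i, w)   # vector-vector dot product gives a scalar

print(f)
```

Because both operands of `np.dot` have shape (n,), each iteration produces a single scalar.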
4 Matrices
4.1 Abstract
Matrices are two-dimensional arrays. The elements of a matrix are all of the same type. In notation, matrices are denoted with capital, bold letters such as 𝐗. In this and other labs, m is often the number of rows and n the number of columns. The elements of a matrix can be referenced with a two-dimensional index. In math settings, numbers in the index typically run from 1 to n. In computer science and these labs, indexing will run from 0 to n-1.
[Figure: Generic Matrix Notation, 1st index is row, 2nd is column]
In Course 1, 2-D matrices are used to hold training data. Training data is 𝑚 examples by 𝑛 features creating an (m,n) array. Course 1 does not do operations
directly on matrices but typically extracts an example as a vector and operates on that. Below you will review:
data creation
slicing and indexing
Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension. Notice further that NumPy, when printing, will print one row per line.
In [ ]: a = np.zeros((1, 5))
print(f"a shape = {a.shape}, a = {a}")
a = np.zeros((2, 1))
print(f"a shape = {a.shape}, a = {a}")
a = np.random.random_sample((1, 1))
print(f"a shape = {a.shape}, a = {a}")
One can also manually specify data. Dimensions are specified with additional brackets matching the format in the printing above.
In [ ]: # NumPy routines which allocate memory and fill with user specified values
a = np.array([[5], [4], [3]]); print(f" a shape = {a.shape}, np.array: a = {a}")
a = np.array([[5], # One can also
[4], # separate values
[3]]); #into separate rows
print(f" a shape = {a.shape}, np.array: a = {a}")
4.4.1 Indexing
Matrices include a second index. The two indexes describe [row, column]. Access can either return an element or a row/column. See below:
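In [ ]: # vector indexing operations on matrices
a = np.arange(6).reshape(-1, 2) # reshape is a convenient way to create matrices
print(f"a.shape: {a.shape}, \na= {a}")
# access an element
print(f"\na[2,0].shape: {a[2, 0].shape}, a[2,0] = {a[2, 0]}, type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\n")
# access a row
print(f"a[2].shape: {a[2].shape}, a[2] = {a[2]}, type(a[2]) = {type(a[2])}")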
It is worth drawing attention to the last example. Accessing a matrix by just specifying the row will return a 1-D vector.
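To make the shapes concrete, the sketch below (an illustration, not part of the original lab) compares selecting a row, selecting a column, and slicing a row so that the 2-D structure is kept.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)   # rows: [0 1], [2 3], [4 5]

row = a[2]          # one index selects a row: shape (2,)
col = a[:, 0]       # a full slice plus a column index selects a column: shape (3,)
row_2d = a[2:3]     # slicing the rows keeps the 2-D structure: shape (1, 2)

print(row, row.shape)
print(col, col.shape)
print(row_2d, row_2d.shape)
```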
Reshape
The previous example used reshape (https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) to shape the array.
a = np.arange(6).reshape(-1, 2)
This line of code first created a 1-D Vector of six elements. It then reshaped that vector into a 2-D array using the reshape command. This could have been
written:
a = np.arange(6).reshape(3, 2)
to arrive at the same 3-row, 2-column array. The -1 argument tells the routine to compute the number of rows given the size of the array and the number of columns.
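As a short sketch of how -1 is resolved, the example below reshapes the same six-element vector three ways. The third case, with -1 in the column position, is an extra illustration not used in the lab.

```python
import numpy as np

v = np.arange(6)          # 1-D vector with six elements

a = v.reshape(-1, 2)      # rows inferred: 6 / 2 = 3
b = v.reshape(3, 2)       # rows given explicitly
c = v.reshape(2, -1)      # columns inferred: 6 / 2 = 3

print(a.shape, b.shape, c.shape)   # (3, 2) (3, 2) (2, 3)
print(np.array_equal(a, b))        # True
```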
4.4.2 Slicing
Slicing creates an array of indices using a set of three values ( start:stop:step ). A subset of values is also valid. Its use is best explained by example:
a = [[ 0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]]
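A minimal sketch of 2-D slicing, assuming an array `a` built like the one printed above (2 rows of 10 values), might look like this. Each axis takes its own start:stop:step slice, separated by a comma.

```python
import numpy as np

a = np.arange(20).reshape(-1, 10)   # 2 rows, 10 columns

b = a[0, 2:7:1]     # row 0, columns 2 through 6: 1-D result
c = a[:, 2:7:1]     # all rows, columns 2 through 6: 2-D result
d = a[1, :]         # all of row 1, same as a[1]

print(b, b.shape)   # [2 3 4 5 6] (5,)
print(c.shape)      # (2, 5)
print(d)            # [10 11 12 13 14 15 16 17 18 19]
```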
Congratulations!
In this lab you mastered the features of Python and NumPy that are needed for Course 1.