Unit III Python
UNIT III: (10 hours) NumPy Basics: Arrays and Vectorized Computation - The
NumPy ndarray - Creating ndarrays - Data Types for ndarrays - Arithmetic with
NumPy Arrays - Basic Indexing and Slicing - Boolean Indexing - Transposing Arrays
and Swapping Axes. Universal Functions: Fast Element-Wise Array Functions -
Mathematical and Statistical Methods - Sorting - Unique and Other Set Logic
NumPy:
NumPy, short for Numerical Python, is one of the most important
foundational packages for numerical computing in Python. Here are some of the
things it provides:
ndarray, an efficient multidimensional array providing fast array-oriented
arithmetic operations and flexible broadcasting capabilities.
Mathematical functions for fast operations on entire arrays of data without
having to write loops.
Tools for reading/writing array data to disk and working with memory-
mapped files.
Linear algebra, random number generation, and Fourier transform
capabilities.
A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.
One of the reasons NumPy is so important for numerical computations in
Python is that it is designed for efficiency on large arrays of data. There are a
number of reasons for this:
NumPy internally stores data in a contiguous block of memory,
independent of other built-in Python objects. NumPy’s library of algorithms
written in the C language can operate on this memory without any type checking
or other overhead.
NumPy arrays also use much less memory than built-in Python sequences.
NumPy operations perform complex computations on entire arrays without
the need for Python for loops.
import numpy as np
my_arr = np.arange(1000000)
my_list = list(range(1000000))
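To get a rough feel for the difference, the same operation can be timed on both containers. The following is only a sketch using the standard-library timeit module; the choice of operation (doubling every element) and the repeat count are assumptions, and absolute timings depend on the machine, so none are quoted here.

```python
import timeit
import numpy as np

my_arr = np.arange(1000000)
my_list = list(range(1000000))

# Double every element: one vectorized expression vs. a Python-level loop.
arr_time = timeit.timeit(lambda: my_arr * 2, number=5)
list_time = timeit.timeit(lambda: [x * 2 for x in my_list], number=5)

print("ndarray:", arr_time, "seconds")
print("list   :", list_time, "seconds")

# Both approaches produce the same values.
same = (my_arr * 2).tolist() == [x * 2 for x in my_list]
print("same result:", same)
```

On typical hardware the ndarray version is expected to be considerably faster, but the exact ratio varies.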
List vs. Array
List: A list can have elements of different data types, for example [1, 3.4, 'hello', 'a@'].
Array: All elements of an array are of the same data type, for example an array of floats may be [1.2, 5.4, 2.7].
List: Lists can contain objects of different datatypes, so Python must store the type information for every element along with its element value. Thus lists take more space in memory and are less efficient.
Array: A NumPy array takes up less space in memory than a list because arrays do not need to store the datatype of each element separately.
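import numpy as np
data = np.random.randn(2, 3)   # random values, so the output differs on every run
data = data * 10
print(data)
output: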
[[ 1.02769038 -0.45400781 -1.09134785]
[-0.74483404 -0.89984109 -0.04883344]]
In the first example, all of the elements have been multiplied by 10. In the
second, the corresponding values in each “cell” in the array have been added to
each other.
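The second expression is not shown above. Assuming it is data + data, both operations can be reproduced on a small fixed array; the values below are chosen for illustration and are not the random values printed above.

```python
import numpy as np

# Fixed values (an illustrative assumption) so the result is reproducible.
data = np.array([[1.5, -0.1, 3.0],
                 [0.0, -3.0, 6.5]])

times_ten = data * 10    # every element multiplied by the scalar 10
doubled = data + data    # element-wise sum of two equal-shaped arrays

print(times_ten)
print(doubled)
print(data.shape, data.dtype)
```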
An ndarray is a generic multidimensional container for homogeneous data;
that is, all of the elements must be the same type.
Every array has a shape, a tuple indicating the size of each dimension, and
a dtype, an object describing the data type of the array:
print(data.shape)
(2, 3)
print(data.dtype)
float64
Arrays:
Example:
import numpy as np
a = np.arange(15).reshape(3, 5)
print("array is",a)
print("array size is",a.shape)
print("array dimensions",a.ndim)
print("item size is",a.itemsize)
print("type of array",type(a))
O/P:
array is [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
array size is (3, 5)
array dimensions 2
item size is 4
type of array <class 'numpy.ndarray'>
CREATING NDARRAYS
The easiest way to create an array is to use the array function. This accepts
any sequence-like object (including other arrays) and produces a new NumPy
array containing the passed data.
Example:
import numpy as np
a = np.array([2,3,4])
print("array is",a)
print("data type", a.dtype)
O/p:
array is [2 3 4]
data type int32
array b float64
The type of the array can also be explicitly specified at creation time:
Example:
import numpy as np
c = np.array( [ [1,2], [3,4] ], dtype=complex )
print("complex array",c)
O/P:
complex array [[1.+0.j 2.+0.j]
[3.+0.j 4.+0.j]]
The function zeros creates an array full of zeros, the function ones creates
an array full of ones, and the function empty creates an array whose initial
content is random and depends on the state of the memory. By default, the dtype
of the created array is float64.
Example:
import numpy as np
a=np.zeros( (3,4) )
print("array a is",a)
b=np.ones( (2,3,4), dtype=np.int16 )
print("array b is",b)
O/P:
array a is [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
array b is [[[1 1 1 1]
  [1 1 1 1]
  [1 1 1 1]]

 [[1 1 1 1]
  [1 1 1 1]
  [1 1 1 1]]]
Function             Description
arange               Like the built-in range but returns an ndarray instead of a list
zeros, zeros_like    Like ones and ones_like but producing arrays of 0s instead
empty, empty_like    Create new arrays by allocating new memory, but do not populate with any values like ones and zeros
full, full_like      Produce an array of the given shape and dtype with all values set to the indicated "fill value"; full_like takes another array and produces a filled array of the same shape and dtype
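The creation functions in the table can be tried side by side. This is a minimal sketch; the shapes and fill values are arbitrary choices made for illustration.

```python
import numpy as np

r = np.arange(5)              # like range(5), but an ndarray
z = np.zeros_like(r)          # zeros with the shape and dtype of r
f = np.full((2, 3), 7.5)      # 2x3 array filled with 7.5
g = np.full_like(r, 9)        # shape and dtype of r, filled with 9
e = np.empty((2, 2))          # uninitialized: contents are arbitrary

print(r, z, g)
print(f)
print(e.shape, e.dtype)
```

Only the shape and dtype of the empty array are printed, because its contents are whatever happened to be in memory.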
bool (type code ?): Boolean type storing True and False values
O/P:
int64
Example 2:
import numpy as np
arr = np.array(['apple', 'banana', 'cherry'])
print(arr.dtype)
O/P:
<U6
Creating Arrays With a Defined Data Type:
We use the array() function to create arrays. It takes an optional
argument, dtype, that lets us define the expected data type of the
array elements:
Example:
import numpy as np
arr = np.array([1, 2, 3, 4], dtype='S')
print("array is",arr)
print("array type is",arr.dtype)
O/P:
array is [b'1' b'2' b'3' b'4']
array type is |S1
B V Raju College Page 11
Unit – III Python for Data Science
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr.dtype)
#Output: int64 (int32 on platforms where that is the default integer type)
float_arr = arr.astype(np.float64)
print(float_arr.dtype)
#Output: float64
import numpy as np
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
print(arr)
#Output: [ 3.7 -1.2 -2.6  0.5 12.9 10.1]
# Casting floats to integers truncates the decimal part.
print(arr.astype(np.int32))
#Output: [ 3 -1 -2  0 12 10]
Addition
import numpy as np
output:
[ 7 77 23 130]
Subtraction
import numpy as np
output:
[ 3 67 3 70]
Multiplication
import numpy as np
output:
[ 10 360 130 3000]
Division
import numpy as np
div_ans = a/b
print(div_ans)
output:
[ True True True True]
[ 2.5 36. 6.5 50. ]
[ 25 5184 169 10000]
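The operand arrays for the arithmetic examples above are missing. The sketch below assumes a = [5, 72, 13, 100] and b = [2, 5, 10, 30]; these values are inferred from the printed sums, differences, and products, not taken from the original code. Under the same assumption, the last three output lines above match a > b, a / 2, and a ** 2.

```python
import numpy as np

# Assumed operands, inferred from the printed outputs above.
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

print(a + b)    # element-wise sum: 7, 77, 23, 130
print(a - b)    # element-wise difference: 3, 67, 3, 70
print(a * b)    # element-wise product: 10, 360, 130, 3000
print(a / b)    # true division always yields floats
print(a > b)    # element-wise comparison: all True
print(a / 2)    # scalar division: 2.5, 36.0, 6.5, 50.0
print(a ** 2)   # element-wise square: 25, 5184, 169, 10000
```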
To get specific elements from NumPy arrays, indexing and slicing are used.
Indexing starts from 0, and a slice is expressed in terms of index positions.
There are two types of indexing: basic and advanced. Advanced indexing is
further divided into Boolean and purely integer indexing. Negative index
values count from the end of the array.
Indexing an Array
Indexing is used to access individual elements. It is also possible to extract
entire rows, columns, or planes from multi-dimensional arrays with numpy
indexing. Indexing starts from 0. Let's see an array example below to understand
the concept of indexing:
Element of array:  2  3  11  9  6  4  10  12
Index:             0  1   2  3  4  5   6   7
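The table above can be turned into a short runnable check. This sketch builds the same eight-element array and reads it with positive indices, negative indices, and slices.

```python
import numpy as np

# The array from the table above.
arr = np.array([2, 3, 11, 9, 6, 4, 10, 12])

print(arr[0])     # first element: 2
print(arr[2])     # element at index 2: 11
print(arr[-1])    # negative index counts from the end: 12
print(arr[2:5])   # slice of indices 2, 3, 4: 11, 9, 6
print(arr[-3:])   # last three elements: 4, 10, 12
```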
import numpy as np
arr = np.arange(10)
print(arr[5])
#Output: 5
print(arr[5:8])
#Output: [5 6 7]
arr[5:8] = 12
print(arr)
#Output: [ 0  1  2  3  4 12 12 12  8  9]
As you can see, if you assign a scalar value to a slice, as in arr[5:8] = 12, the
value is propagated (or broadcast, as it will be called from here on) to the
entire selection. An important first distinction from Python’s built-in lists is
that array slices are views on the original array. This means that the data is
not copied, and any modifications to the view will be reflected in the source
array.
arr_slice = arr[5:8]
print( arr_slice)
Now, when I change values in arr_slice, the mutations are reflected in the
original array arr:
arr_slice[1] = 12345
print(arr)
Output:
[    0     1     2     3     4    12 12345    12     8     9]
arr_slice[:] = 64
print(arr)
Output:
[ 0  1  2  3  4 64 64 64  8  9]
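If an independent copy is needed instead of a view, the slice can be copied explicitly with the ndarray copy method. A minimal sketch follows; the variable names are illustrative, and np.shares_memory is used only to make the difference visible.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]                  # a view: shares memory with arr
independent = arr[5:8].copy()    # an independent copy of the same elements

view[:] = 64                     # changes arr as well
independent[:] = -1              # leaves arr untouched

print(arr)                       # elements 5 to 7 are now 64
print(independent)
print(np.shares_memory(arr, view), np.shares_memory(arr, independent))
```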
Indexing in 2 Dimensions
Example:
import numpy as np
arr=np.arange(12)
arr1=arr.reshape(3,4)
print("Array arr1:\n",arr1)
print("Element at 0th row and 0th column of arr1 is:",arr1[0,0])
print("Element at 1st row and 2nd column of arr1 is:",arr1[1,2])
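O/P:
Array arr1:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Element at 0th row and 0th column of arr1 is: 0
Element at 1st row and 2nd column of arr1 is: 6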
Consider the two-dimensional array from before, arr2d. Slicing this array is
a bit different:
import numpy as np
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr2d[:2])
Output:
[[1 2 3]
 [4 5 6]]
As you can see, it has sliced along axis 0, the first axis. A slice, therefore,
selects a range of elements along an axis. It can be helpful to read the
expression arr2d[:2] as “select the first two rows of arr2d.”
import numpy as np
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[:2, 1:]
Output:
array([[2, 3],
[5, 6]])
When slicing like this, you always obtain array views of the same number of
dimensions. By mixing integer indexes and slices, you get lower dimensional
slices.
For example, I can select the second row but only the first two columns like
so:
import numpy as np
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[1, :2]
Note that a colon by itself means to take the entire axis, so you can slice
only higher dimensional axes by doing:
import numpy as np
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print( arr2d[:, :1])
Output:
[[1]
 [4]
 [7]]
Assigning to a slice expression assigns to the whole selection:
import numpy as np
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[:2, 1:] = 0
print( arr2d)
Output:
[[1 0 0]
 [4 0 0]
 [7 8 9]]
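The effect of mixing integer indexes and slices is easiest to see by comparing shapes. The sketch below reuses arr2d; the variable names are illustrative.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_slice = arr2d[1:2, :2]     # slice on both axes: still 2-D, shape (1, 2)
row_mixed = arr2d[1, :2]       # integer + slice: 1-D, shape (2,)
first_col = arr2d[:, :1]       # colon takes the whole axis: shape (3, 1)
first_col_1d = arr2d[:, 0]     # integer on axis 1: 1-D, shape (3,)

for piece in (row_slice, row_mixed, first_col, first_col_1d):
    print(piece.shape, piece.tolist())
```

Each integer index removes one dimension from the result, while a slice keeps the dimension even when it selects a single row or column.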
Indexing in 3 Dimensions
A 3-D array has three dimensions. Suppose we write an index as (i, j, k),
where i stands for the 1st dimension, j stands for the 2nd dimension, and
k stands for the 3rd dimension.
Example:
import numpy as np
arr=np.arange(12)
arr1=arr.reshape(2,2,3)
print("Array arr1:\n",arr1)
print("Element:",arr1[1,0,2])
O/P:
Array arr1:
 [[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]
Element: 8
Slicing a 2D Array
In a 2-D array, we specify start:stop twice: once for the rows and once
for the columns.
Example:
import numpy as np
arr=np.arange(12)
arr1=arr.reshape(3,4)
print("Array arr1:\n",arr1)
print("\n")
print("elements of 1st row and 1st column upto last column :\n",arr1[1:,1:4])
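O/P:
Array arr1:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


elements of 1st row and 1st column upto last column :
 [[ 5  6  7]
 [ 9 10 11]]
The first slice selects rows, so slicing starts from row index 1 and runs to
the last row because no ending index is given. The second slice then selects
column indices 1 to 3 (the stop value 4 is excluded), and the result is printed.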
Boolean Indexing:
Boolean indexing occurs when the index object is a Boolean array, that is, an
array of True and False values, usually produced by a condition.
The elements that satisfy the Boolean expression are returned.
This is used to filter out the desired elements.
Example:
import numpy as np
arr = np.array([11,6,41,10,29,50,55,45])
print(arr[arr>35])
O/P:
[41 50 55 45]
Elements that satisfy the given condition, i.e., greater than 35, are printed
as output.
import numpy as np
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4)
print( names)
print( data)
names == 'Bob'
data[names == 'Bob']
data[names == 'Bob', 2:]
data[names == 'Bob', 3]
Output:
array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')
array([[ 0.0929, 0.2817, 0.769 , 1.2464],
[ 1.0072, -1.2962, 0.275 , 0.2289],
[ 1.3529, 0.8864, -2.0016, -0.3718],
[ 1.669 , -0.4386, -0.5397, 0.477 ],
[ 3.2489, -1.0212, -0.5771, 0.1241],
[ 0.3026, 0.5238, 0.0009, 1.3438],
[-0.7135, -0.8312, -2.3702, -1.8608]])
array([ True, False, False, True, False, False, False])
array([[ 0.0929, 0.2817, 0.769 , 1.2464],
[ 1.669 , -0.4386, -0.5397, 0.477 ]])
array([[ 0.769 , 1.2464],
[-0.5397, 0.477 ]])
array([1.2464, 0.477 ])
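Because data above is random, the selected values differ from run to run. The sketch below uses a small fixed array as a stand-in (an illustrative assumption) and also shows negating a mask with ~ and combining two masks with |, two standard NumPy operators that the example above does not cover.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
# Fixed stand-in for the random data: row i holds 10*i, 10*i + 1, 10*i + 2, 10*i + 3.
data = np.arange(7).reshape(7, 1) * 10 + np.arange(4)

is_bob = names == 'Bob'
print(data[is_bob])                 # rows 0 and 3
print(data[is_bob, 2:])             # same rows, columns 2 and 3
print(data[~is_bob].shape)          # every row except Bob's: (5, 4)

bob_or_will = (names == 'Bob') | (names == 'Will')
print(data[bob_or_will][:, 0])      # first column of rows 0, 2, 3, 4
```

The Python keywords and and or do not work on Boolean arrays; the operators & and | are used instead, with each condition in parentheses.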
Transposing Arrays and Swapping Axes:
Example:
import numpy as np
a= np.arange(6).reshape((2,3))
print("array",a)
b=np.transpose(a)
print("transpose array",b )
O/P:
array [[0 1 2]
 [3 4 5]]
transpose array [[0 3]
 [1 4]
 [2 5]]
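Transposing can also be written with the T attribute, and for arrays with more than two dimensions the swapaxes method exchanges a chosen pair of axes. A minimal sketch; the 3-D array and its shape are illustrative choices.

```python
import numpy as np

a = np.arange(6).reshape((2, 3))

t1 = np.transpose(a)    # function form
t2 = a.T                # attribute form, same result
print(t1.shape)         # (3, 2)
print(np.array_equal(t1, t2))

# swapaxes exchanges two chosen axes of a higher-dimensional array.
cube = np.arange(24).reshape((2, 3, 4))
swapped = cube.swapaxes(1, 2)
print(swapped.shape)                      # (2, 4, 3)
print(cube[1, 2, 3], swapped[1, 3, 2])    # the same element under both index orders
```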
Unary ufuncs:
Function                  Description
log, log10, log2, log1p   Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively
import numpy as np
arr = np.arange(10)
print("array is\n",arr)
print("\n")
a=np.sqrt(arr)
print("square root\n",a)
print("\n")
b=np.exp(arr)
print("exponent is\n",b)
O/p:
Binary ufunc:
Function Description
import numpy as np
arr = np.arange(5)
arr1=np.arange(5,10)
print("arr",arr)
print("\n")
print("arr1",arr1)
print("add is\n",np.add(arr,arr1))
print("div is\n",np.divide(arr,arr1))
O/P:
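Other binary ufuncs follow the same pattern: they take two arrays and return one element-wise result. A brief sketch using np.maximum, np.subtract, np.mod, and np.greater, chosen here only as illustrations with made-up input values:

```python
import numpy as np

x = np.array([1, 8, 3, 10])
y = np.array([4, 2, 6, 5])

print(np.maximum(x, y))    # element-wise larger value: 4, 8, 6, 10
print(np.subtract(x, y))   # same as x - y: -3, 6, -3, 5
print(np.mod(x, y))        # element-wise remainder: 1, 0, 3, 0
print(np.greater(x, y))    # same as x > y: False, True, False, True
```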
Trigonometric Functions
NumPy has standard trigonometric functions which return trigonometric
ratios for a given angle in radians.
FUNCTION DESCRIPTION
Example
import numpy as np
a = np.array([0,30,45,60,90])
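Since the trigonometric functions expect radians, angles given in degrees need to be converted first. A minimal sketch using np.deg2rad (multiplying by np.pi / 180 is equivalent); the rounding to four decimals is only for display.

```python
import numpy as np

a = np.array([0, 30, 45, 60, 90])    # angles in degrees
radians = np.deg2rad(a)              # convert degrees to radians

sines = np.sin(radians)
cosines = np.cos(radians)

# Rounded for display; floating-point results are only approximately exact.
print(np.round(sines, 4))
print(np.round(cosines, 4))
```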
FUNCTION DESCRIPTION
import numpy as np
in_array = [0.5, 1.5, 2.5, 3.5, 4.5, 10.1]
print("Input array : \n", in_array)
round_off_values = np.round(in_array)   # np.round_ was removed in NumPy 2.0
print("\nRounded values : \n", round_off_values)
Output :
Input array :
[0.5, 1.5, 2.5, 3.5, 4.5, 10.1]
Rounded values :
[ 0. 2. 2. 4. 4. 10.]
Input array :
[0.53, 1.54, 0.71]
Rounded values :
[ 1. 2. 1.]
Input array :
[0.5538, 1.33354, 0.71445]
Rounded values :
[ 0.554 1.334 0.714]
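The first output illustrates that NumPy rounds values lying exactly halfway to the nearest even number, so 0.5 becomes 0 and 2.5 becomes 2. The third output matches rounding to three decimal places; the decimals=3 argument below is an assumption, since the call that produced it is not shown. The sketch uses np.round because the np.round_ alias was removed in NumPy 2.0.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5, 4.5, 10.1])
rounded_halves = np.round(halves)            # halfway cases go to the even neighbour
print(rounded_halves)

values = np.array([0.5538, 1.33354, 0.71445])
rounded_values = np.round(values, decimals=3)   # keep three decimal places
print(rounded_values)
```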
Statistics Methods
NumPy provides various statistical functions for statistical data analysis:
Mean
Median
Range (peak to peak)
Standard Deviation
Variance
np.mean
Computes the arithmetic mean of an array along a specified axis. By default, the
mean is taken over the flattened array. For example:
import numpy as np
output:
[[1 2]
[3 4]]
mean 2.5
mean axis [1.5 3.5]
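The code for this example is missing above. The sketch below reproduces the printed output under the assumption that the second value was computed with axis=1 (one mean per row); the variable name is illustrative, and the axis=0 line is added only for comparison.

```python
import numpy as np

array_for_mean = np.array([[1, 2], [3, 4]])

overall = np.mean(array_for_mean)            # all elements: 2.5
per_row = np.mean(array_for_mean, axis=1)    # one mean per row: 1.5, 3.5
per_col = np.mean(array_for_mean, axis=0)    # one mean per column: 2.0, 3.0

print(array_for_mean)
print("mean", overall)
print("mean axis", per_row)
print("mean axis 0", per_col)
```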
np.median
Computes the median of the array along a specified axis. The median is the
middle value in a sorted (ascending/descending) list of numbers.
import numpy as np
array_for_median = np.array([[10, 7, 4], [3, 2, 1]])
x=np.median(array_for_median) #default
print(array_for_median)
print("Median", x)
output:
[[10 7 4]
[ 3 2 1]]
Median 3.5
np.ptp
Measures the range along a specified axis of an array. The range is the
difference between the maximum and minimum values in a matrix/array.
import numpy as np
array_for_range = np.array([[-85, 60, 94, 53],
[3, -12, 54, 14],
[32, 45, -66, 36]])
x=np.ptp(array_for_range)
print(x)
The maximum value is 94 while the minimum value is -85. The range will be:
(94- (-85)) = (94+85) = 179
np.std
The standard deviation is the spread of the values from their mean
value. np.std is the NumPy function which measures the standard deviation
of an array along a specified axis.
import numpy as np
array_for_stddev = np.array([[7, 8, 9], [10, 11, 12]])
x=np.std(array_for_stddev)
print(array_for_stddev)
print("standard deviation",x)
output:
[[ 7 8 9]
[10 11 12]]
standard deviation 1.707825127659933
np.var
np.var is the NumPy function which measures the variance of an array along
a specified axis. Variance is the average of the squared deviations of all
observed values from their mean.
import numpy as np
array_for_variance = np.array([[1, 2, 3], [6, 7, 8]])
x=np.var(array_for_variance) #default
print(array_for_variance)
print("variance",x)
output:
[[1 2 3]
[6 7 8]]
variance 6.916666666666667
Method            Description
argmin, argmax    Indices of minimum and maximum elements, respectively
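As a short illustration of argmin and argmax, the sketch below uses a made-up 2 x 3 array. Without an axis argument the returned index refers to the flattened array; np.unravel_index converts it back to a (row, column) pair.

```python
import numpy as np

scores = np.array([[12, 7, 30],
                   [25, 3, 18]])

print(scores.argmax())         # index in the flattened array: 2
print(scores.argmin())         # index in the flattened array: 4
print(scores.argmax(axis=1))   # column of the largest value in each row: 2, 0

# Convert the flat index of the minimum back to (row, column).
row, col = np.unravel_index(scores.argmin(), scores.shape)
print(row, col)
```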
Sorting:
Like Python’s built-in list type, NumPy arrays can be sorted in place using the sort
method. For comparison, the syntax of the built-in list method is:
Syntax
list.sort(reverse=True|False, key=myFunc)
Parameter Description
Example:
import numpy as np
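The example is cut off above. As a sketch of what sorting looks like for an ndarray: the sort method reorders the array itself, np.sort returns a sorted copy, and a descending order can be obtained by reversing the sorted result, since ndarray.sort has no reverse argument, unlike the list method. The values are illustrative.

```python
import numpy as np

arr = np.array([6, 0, 3, 2, 5, 1])

sorted_copy = np.sort(arr)     # returns a new sorted array; arr is unchanged
print(arr, sorted_copy)

arr.sort()                     # sorts arr in place and returns None
print(arr)
print(arr[::-1])               # descending order via a reversed view

grid = np.array([[3, 1, 2], [9, 7, 8]])
grid.sort(axis=1)              # sort each row independently
print(grid)
```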
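import numpy as np
values = np.array([6, 0, 0, 3, 2, 5, 6])
# np.isin performs the same membership test as np.in1d, which is deprecated in NumPy 2.0
print(np.isin(values, [2, 3, 6]))
O/P:
[ True False False  True  True False  True]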
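The other set operations named in the syllabus work along the same lines. The sketch below uses np.unique, np.intersect1d, np.union1d, and np.setdiff1d on small made-up arrays; each returns a sorted array of unique values.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
unique_names = np.unique(names)          # sorted unique values
print(unique_names)

x = np.array([3, 3, 1, 2, 5])
y = np.array([5, 2, 7])
print(np.intersect1d(x, y))              # values in both: 2, 5
print(np.union1d(x, y))                  # values in either: 1, 2, 3, 5, 7
print(np.setdiff1d(x, y))                # in x but not in y: 1, 3
print(np.isin(x, y))                     # membership test for each element of x
```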