APJ Abdul Kalam Technological University: Syllabus - Study Materials - Textbook PDF - Solved Question Papers
www.keralanotes.com
KTU STUDY MATERIALS
PROGRAMMING IN PYTHON
Module 5
CST 362: PROGRAMMING IN PYTHON
MODULE V
The os and sys modules, NumPy - Basics, Creating arrays, Arithmetic, Slicing,
Matrix Operations, Random numbers. Plotting and visualization. Matplotlib - Basic
plot, Ticks, Labels, and Legends. Working with CSV files. – Pandas - Reading,
Manipulating, and Processing Data. Introduction to Micro services using Flask.
OS Module in Python
The os module in Python provides functions for interacting with the operating
system. It is part of Python's standard utility modules.
1. os.name
This attribute gives the name of the operating-system-dependent module imported.
Python 3 registers the names 'posix', 'nt' and 'java'; older Python 2 releases
also listed 'os2', 'ce' and 'riscos'.
>>>import os
>>>os.name
'nt'
2. os.getcwd()
Returns the current working directory.
>>> os.getcwd()
'C:\\Users\\binuvp\\AppData\\Local\\Programs\\Python\\Python38-32'
3. os.listdir('.')
Returns a list of the names of the entries in the given directory.
>>> os.listdir('.')
['are.py', 'DLLs', 'Doc', 'include', 'Lib', 'libs', 'LICENSE.txt', 'mymodule.py',
'NEWS.txt', 'polygon.py', 'python.exe', 'python3.dll', 'python38.dll', 'pythonw.exe',
's.py', 'Scripts', 'student.py', 't.py', 'tcl', 'test.py', 'Tools', 'vcruntime140.dll',
'__pycache__']
4. os.chdir('..')
Changes the current working directory to the given path.
>>> os.getcwd()
'C:\\Users\\binuvp\\AppData\\Local\\Programs\\Python\\Python38-32'
>>> os.chdir('..')
>>> os.getcwd()
'C:\\Users\\binuvp\\AppData\\Local\\Programs\\Python'
5. os.mkdir(path)
Create a directory.
6. os.rmdir(path)
Remove an empty directory.
7. os.remove(path)
Remove a file.
8. os.rename(old, new)
Rename a file or directory.
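The directory and file functions listed above can be tried out safely inside a temporary directory. The following is a minimal sketch; the names demo, old.txt and new.txt are hypothetical and chosen only for illustration.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing else on disk is touched.
with tempfile.TemporaryDirectory() as base:
    demo_dir = os.path.join(base, "demo")          # hypothetical directory name
    os.mkdir(demo_dir)                             # create a directory

    old_path = os.path.join(demo_dir, "old.txt")   # hypothetical file names
    new_path = os.path.join(demo_dir, "new.txt")
    with open(old_path, "w") as f:
        f.write("hello")

    os.rename(old_path, new_path)                  # rename the file
    after_rename = os.listdir(demo_dir)            # ['new.txt']

    os.remove(new_path)                            # delete the file
    os.rmdir(demo_dir)                             # succeeds only on an empty directory
    dir_exists = os.path.exists(demo_dir)          # False

print(after_rename, dir_exists)
```

Note that os.rmdir() raises an error if the directory still contains files, which is why the file is removed first.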
Sys Module in Python
The sys module provides functions and variables used to manipulate different
parts of the Python runtime environment.
1. sys.argv
A list of command line arguments passed to a Python script. The item at
index 0 in this list is always the name of the script. The rest of the arguments
are stored at the subsequent indices.
2. sys.exit
This causes the script to exit back to either the Python console or the command
prompt. It is generally used to exit the program cleanly, for example after an
exception has been handled.
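3. sys.maxsize
The largest value a variable of type Py_ssize_t can take, usually 2**31 - 1 on a
32-bit platform and 2**63 - 1 on a 64-bit platform. Python integers themselves
can be larger than this value.
4. sys.path
A list of strings that specifies the search path for modules. It is initialized
from the PYTHONPATH environment variable plus an installation-dependent default.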
5. sys.version
This attribute displays a string containing the version number of the current
Python interpreter.
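The attributes above can be inspected with a short script. This is a minimal sketch; the helper run() and its exit-status convention are hypothetical, added only to show how sys.exit would typically be used.

```python
import sys

print(sys.argv[:1])      # script name (first item of the argument list)
print(sys.argv[1:])      # remaining command line arguments, if any
print(sys.maxsize)       # 2**63 - 1 on a typical 64-bit build
print(sys.version)       # version string of the running interpreter
print(sys.path[:3])      # first few module search locations

big = sys.maxsize + 1    # Python integers can exceed sys.maxsize


def run(args):
    """Hypothetical helper: return 0 when arguments were given, else 1."""
    return 0 if args else 1


status = run(sys.argv[1:])
# Calling sys.exit(status) here would end the script with that status code.
```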
NumPy
Fourier transforms and routines for shape manipulation.
Operations related to linear algebra.
NumPy has in-built functions for linear algebra and random number generation.
ndarray Object
Every item in an ndarray takes the same size of block in memory. Each
element in an ndarray is described by a data-type object (called dtype).
Any single item extracted from an ndarray object (by indexing) is represented by a
Python object of one of the array scalar types.
The basic ndarray is created using the array function in NumPy as follows:
numpy.array
import numpy as np
a = np.array([1,2,3,4])
print(a)
>>[1 2 3 4]
c = np.array([(1,2,3),(4,5,6),(7,8,9)])
print(c)
>>[[1 2 3]
[4 5 6]
[7 8 9]]
ndarray Object – Parameters
Some important attributes of ndarray object
(1) ndarray.ndim
ndim is the number of dimensions (axes) of the ndarray.
(2) ndarray.shape
shape is a tuple of integers representing the size of the ndarray in each dimension.
(3) ndarray.size
size is the total number of elements in the ndarray. It is equal to the product of
elements of the shape.
(4) ndarray.dtype
dtype tells the data type of the elements of a NumPy array. In NumPy array, all the
elements have the same data type.
(5) ndarray.itemsize
itemsize returns the size (in bytes) of each element of a NumPy array.
Example:
import numpy as np
a = np.array([[[1,2,3],[4,3,5]],[[3,6,7],[2,1,0]]])
print("The dimension of array a is:", a.ndim)
print("The shape of the array a is: ", a.shape)
print("The total no: of elements in array a is: ", a.size)
print("The datatype of elements in array a is: ", a.dtype)
print("The size of each element in array a is: ", a.itemsize)
Output:
The dimension of array a is: 3
The shape of the array a is: (2, 2, 3)
The total no: of elements in array a is: 12
One-dimensional arrays can be indexed, sliced and iterated over, much like lists
and other Python sequences.
import numpy as np
A=np.arange(10)
print(A)
>>[0 1 2 3 4 5 6 7 8 9]
print(A[0])
>>0
print(A[-1])
>>9
print(A[0:3])
>>[0 1 2]
A[0:3]=100
A[3]=200
print(A)
>>[100 100 100 200 4 5 6 7 8 9]
When we assign a scalar value to a slice, as in A[0:3] = 100, the value is propagated
(or broadcast) to the entire selection. An important first distinction from
lists is that array slices are views on the original array. This means that the data is not
copied, and any modifications to the view will be reflected in the source array:
arr_slice=A[5:9]
print(arr_slice)
>>[5 6 7 8]
arr_slice[:]=200
print(A)
>>[100 100 100 200 4 200 200 200 200 9]
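When an independent array is needed instead of a view, the slice can be copied explicitly. The sketch below contrasts the two cases; the variable names view and copy are chosen only for illustration.

```python
import numpy as np

A = np.arange(10)
view = A[5:9]           # a view: shares memory with A
copy = A[5:9].copy()    # an independent copy of the same elements

view[:] = 200           # also changes A
copy[:] = -1            # leaves A untouched

print(A)                # [  0   1   2   3   4 200 200 200 200   9]
print(np.shares_memory(A, view), np.shares_memory(A, copy))   # True False
```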
B=np.arange(10)
print(B[0:8:2])
>>[0 2 4 6]
print(B[8:0:-2])
>>[8 6 4 2]
print(B[:4])
>>[0 1 2 3]
print(B[5:])
>>[5 6 7 8 9]
print(B[::-1])
>>[9 8 7 6 5 4 3 2 1 0]
Arithmetic operations between two arrays are applied element by element, so the
arrays must have the same shape (or shapes that can be broadcast to a common shape).
Basic operations : with scalars
import numpy as np
a = np.array([1,2,3,4,5])
b = a+1
print(b)
c = 2**a
print(c)
Output:
[2 3 4 5 6]
[ 2 4 8 16 32]
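The same operators work between two arrays of equal shape. A minimal sketch, with sample values chosen only for illustration, that also shows what happens when the shapes do not match:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print(a + b)    # [11 22 33 44]
print(a * b)    # [ 10  40  90 160]
print(b / a)    # [10. 10. 10. 10.]

# Arrays of length 4 and 3 cannot be combined element by element.
try:
    a + np.array([1, 2, 3])
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
    print("shape mismatch:", err)
```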
Matrix operations
Addition
import numpy as np
A = np.array([[2, 4], [5, -6]])
B = np.array([[9, -3], [3, 6]])
C = A + B # element wise addition
print(C)
>>[[11 1]
[ 8 0]]
Subtraction
import numpy as np
A = np.array([[2, 4], [5, -6]])
B = np.array([[9, -3], [3, 6]])
C = A - B # element wise subtraction
print(C)
>>[[ -7 7]
[ 2 -12]]
Multiplication (Element-wise matrix multiplication or the Hadamard product)
import numpy as np
A = np.array([[2, 4], [5, -6]])
print(A*A)
>>[[ 4 16]
[25 36]]
import numpy as np
A = np.array([[1, 2], [3, 4]])
print(A*2)
>>[[2 4]
[6 8]]
Transpose
import numpy as np
A = np.array([[2, 4], [5, -6]])
print(A.T)
print(A.transpose())
>>[[ 2 5]
[ 4 -6]]
>>[[ 2 5]
[ 4 -6]]
Inverse
Matrix inversion is a process that finds another matrix that when multiplied with the
matrix, results in an identity matrix. Not all matrices are invertible. A square matrix
that is not invertible is referred to as singular.
from numpy import array
from numpy.linalg import inv
# define matrix
A = array([[1.0, 2.0],[3.0, 4.0]])
print(A)
B = inv(A) # invert matrix
print(B)
Trace
A trace of a square matrix is the sum of the values on the main diagonal of the matrix
(top-left to bottom-right).
# matrix trace
from numpy import array
from numpy import trace
# define matrix
A = array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
print(A)
# calculate trace
B = trace(A)
print(B)
Determinant
The determinant of a square matrix is a scalar that describes how the matrix scales
volume when it is applied as a linear transformation. It is zero exactly when the
matrix is singular.
from numpy import array
from numpy.linalg import det
# define matrix
A = array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
print(A)
# calculate determinant
B = det(A)
print(B)
Matrix-Matrix Multiplication
It is more complicated than the previous operations and involves a rule, as not
all matrices can be multiplied together.
The rule for matrix multiplication is as follows:
The number of columns (n) in the first matrix (A) must equal the number of
rows (m) in the second matrix (B). The result has as many rows as A and as
many columns as B.
Example:
from numpy import array
# define first matrix
A = array([[1, 2],[3, 4],[5, 6]])
print(A)
# define second matrix
B = array([[1, 2],[3, 4]])
print(B)
# multiply matrices
C = A.dot(B)
print(C)
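The shape rule and the other matrix operations can be checked together. A minimal sketch reusing the matrices A and B from the example above; the @ operator is used here as an equivalent of A.dot(B).

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])    # shape (3, 2)
B = np.array([[1, 2], [3, 4]])            # shape (2, 2)

C = A @ B                                 # same as A.dot(B); shape (3, 2)
print(C)
# [[ 7 10]
#  [15 22]
#  [23 34]]

B_inv = np.linalg.inv(B)
print(np.allclose(B @ B_inv, np.eye(2)))  # True: B times its inverse is the identity
print(np.trace(B))                        # 5
print(round(np.linalg.det(B), 6))         # -2.0

# (2, 2) @ (3, 2) violates the rule: 2 columns versus 3 rows.
try:
    B @ A
    rule_error = None
except ValueError as err:
    rule_error = err
    print("cannot multiply:", err)
```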
Random Numbers
Random means something that cannot be predicted logically. Computers work
on programs, and programs are a definitive set of instructions. So it means there
must be some algorithm to generate a random number as well.
If there is a program to generate a random number, it can be predicted, thus it is
not truly random. Random numbers generated through a generation algorithm
are called pseudo random.
In order to generate a truly random number on our computers we need to get
the random data from some outside source. This outside source is generally our
keystrokes, mouse movements, data on the network, etc.
import numpy as np
x = np.random.randint(100)
print(x)
>>64
The randint() method takes a size parameter where you can specify the shape
of an array. The following commands will generate 5 random integers from 0
to 99 (the upper bound, 100, is excluded).
import numpy as np
x = np.random.randint(100,size=5)
print(x)
>>[25 62 24 81 39]
The following will generate a 2-D array with 3 rows, each row containing 5
random integers from 0 to 99:
import numpy as np
x = np.random.randint(100,size=(3,5))
print(x)
>>[[ 2 96 40 43 85]
[81 81 4 48 29]
[80 31 6 10 24]]
The random module's rand() method returns a random float in the interval [0, 1).
import numpy as np
x = np.random.rand()
print(x)
>>0.2733166576024767
x = np.random.rand(10)
print(x)
>>[0.82536563 0.46789636 0.28863107 0.83941914 0.24424812 0.25816291
0.72567413 0.80770073 0.32845661 0.34451507]
Generate an array with size (3,5)
x = np.random.rand(3,5)
print(x)
The choice() method allows you to get a random value from an array of values.
import numpy as np
x = np.random.choice([3,5,6,7,9,2])
print(x)
>>3
import numpy as np
x = np.random.choice([3,5,6,7,9,2],size=(3,5))
print(x)
>>[[3 2 5 2 6]
[5 9 3 6 9]
[5 6 9 3 3]]
Random Permutations
Shuffling Arrays
Shuffle means changing the arrangement of elements in place, i.e. in the array
itself.
import numpy as np
x=np.array([1,2,3,4,5])
np.random.shuffle(x)
print(x)
>>[4 1 3 5 2]
Generating Permutation of Arrays
The permutation() method returns a re-arranged array (and leaves the original
array un-changed).
import numpy as np
x=np.array([1,2,3,4,5])
y=np.random.permutation(x)
print(y)
>>[3 1 5 2 4]
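Because these numbers are pseudo random, fixing the seed of the generator reproduces the same sequence. A minimal sketch using np.random.seed(); the seed value 42 is an arbitrary choice. It also contrasts permutation() with shuffle().

```python
import numpy as np

np.random.seed(42)                       # arbitrary seed value
first = np.random.randint(100, size=5)

np.random.seed(42)                       # same seed, so the same numbers again
second = np.random.randint(100, size=5)
print(np.array_equal(first, second))     # True

x = np.array([1, 2, 3, 4, 5])
y = np.random.permutation(x)             # new array; x is left unchanged
x_before = x.copy()
np.random.shuffle(x)                     # reorders x itself, returns None
print(x_before, y, x)
```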
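Matplotlib
Matplotlib is one of the most popular Python packages used for data
visualization.
1. Import NumPy and the pyplot module of Matplotlib.
import numpy as np
from matplotlib import pyplot as plt
2. Obtain an ndarray of angles between 0 and 2*pi using the arange() function.
import math
x = np.arange(0, math.pi*2, 0.05)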
3. The ndarray object serves as values on x axis of the graph. The corresponding
sine values of angles in x to be displayed on y axis are obtained by the following
statement
y = np.sin(x)
4. The values from two arrays are plotted using the plot() function.
plt.plot(x,y)
5. You can set the plot title, and labels for x and y axes.
plt.xlabel("angle")
plt.ylabel("sine")
plt.title('sine wave')
6. The Plot viewer window is invoked by the show() function
plt.show()
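Put together, the steps above form one short program. The sketch below assumes an off-screen backend ("Agg") and writes the figure to an in-memory buffer so that it runs without a display; with an interactive backend, plt.show() would be called instead.

```python
import io
import math

import matplotlib
matplotlib.use("Agg")    # assumption: render off-screen; drop this line to open a window
import numpy as np
from matplotlib import pyplot as plt

x = np.arange(0, math.pi * 2, 0.05)   # angles from 0 to 2*pi
y = np.sin(x)                         # sine of each angle

plt.plot(x, y)
plt.xlabel("angle")
plt.ylabel("sine")
plt.title("sine wave")

buf = io.BytesIO()
plt.savefig(buf, format="png")        # stands in for plt.show() in this sketch
```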
Matplotlib - PyLab module
PyLab is a procedural interface to the Matplotlib object-oriented plotting
library.
PyLab is a convenience module that bulk imports matplotlib.pyplot (for
plotting) and NumPy (for mathematics and working with arrays) in a single
name space. The Matplotlib documentation now discourages its use in favour of
importing matplotlib.pyplot and numpy explicitly.
basic plot
from pylab import *
x = linspace(-3, 3, 30)
y = x**2
plot(x, y)
show()
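Color code
In a plot() format string, a single letter selects the line color: 'b' (blue),
'g' (green), 'r' (red), 'c' (cyan), 'm' (magenta), 'y' (yellow), 'k' (black)
and 'w' (white).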
xlabel('x')
ylabel('sin cos -sin')
legend(loc='upper right')
show()
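A minimal sketch of a complete program that ends with the labelling and legend calls shown above. The three curves (sin, cos and -sin), their colors and line styles, and the x range are assumptions inferred from the y-axis label; the off-screen "Agg" backend is used so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")    # assumption: off-screen rendering
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(-3, 3, 30)

plt.figure()
plt.plot(x, np.sin(x), 'r', label='sin')       # red solid line
plt.plot(x, np.cos(x), 'g--', label='cos')     # green dashed line
plt.plot(x, -np.sin(x), 'b:', label='-sin')    # blue dotted line

plt.xlabel('x')
plt.ylabel('sin cos -sin')
legend = plt.legend(loc='upper right')         # legend text comes from the label= arguments

labels = [t.get_text() for t in legend.get_texts()]
print(labels)    # ['sin', 'cos', '-sin']
```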
Creating a bar plot
from matplotlib import pyplot as plt
# x-axis values
x = [5, 2, 9, 4, 7]
# y-axis values
y = [10, 5, 8, 4, 2]
# Function to plot the bar
plt.bar(x,y)
# function to show the plot
plt.show()
Creating a histogram
from matplotlib import pyplot as plt
# x-axis values
x = [5, 2, 9, 4, 7,5,5,5,4,9,9,9,9,9,9,9,9,9]
# Function to plot the histogram
plt.hist(x)
# function to show the plot
plt.show()
Scatter Plot
from matplotlib import pyplot as plt
x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]
# Function to plot scatter
plt.scatter(x, y)
plt.show()
Stem plot
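A stem plot draws a vertical line from a baseline to each data point, with a marker at the tip. A minimal sketch using plt.stem(), assuming the same sample values as the scatter plot and an off-screen backend:

```python
import matplotlib
matplotlib.use("Agg")    # assumption: off-screen rendering
from matplotlib import pyplot as plt

x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]

plt.figure()
container = plt.stem(x, y)    # returns the marker line, the stem lines and the baseline
marker_x = list(container.markerline.get_xdata())
print(marker_x)
```

With an interactive backend, plt.show() would display the figure.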
Pie Plot
data=[20,30,10,50]
from pylab import *
pie(data)
show()
Subplots
from pylab import *
x = arange(0, pi*2, 0.05)
subplot(2,2,1)
plot(x, sin(x),label='sin')
xlabel('x')
ylabel('sin')
legend(loc='upper right')
grid(True)
subplot(2,2,2)
# the plot call for this panel is missing in the source; cos is assumed
plot(x, cos(x), 'r-',label='cos')
xlabel('x')
ylabel('cos')
legend(loc='upper right')
grid(True)
subplot(2,2,4)
xlabel('x')
ylabel('tan')
plot(x, tan(x), 'y-',label='tan')
legend(loc='upper right')
grid(True)
show()
Ticks in Plot
Ticks are the values used to show specific points on the coordinate axis. A tick can
be a number or a string. Whenever we plot a graph, the axes adjust and take the
default ticks.
Matplotlib's default ticks are generally sufficient in common situations but are
in no way optimal for every plot. Here, we will see how to customize these
ticks as per our need.
The following program shows the default ticks and customized ticks
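import numpy as np
from matplotlib import pyplot as plt
# sample data (assumed; the original values are missing from the notes)
x = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
y = [1, 4, 3, 2, 7, 6, 9, 8, 10, 5]
# figure 1: default ticks
plt.figure(1)
plt.plot(x, y, 'b')
plt.xlabel('x')
plt.ylabel('y')
# figure 2: customized ticks
plt.figure(2)
plt.plot(x, y, 'r')
plt.xlabel('x')
plt.ylabel('y')
# 0 is the initial value, 51 is the final value
# (last value is not taken) and 5 is the difference
# of values between two consecutive ticks
plt.xticks(np.arange(0, 51, 5))
plt.yticks(np.arange(0, 11, 1))
plt.tick_params(axis='y',colors='red',rotation=45)
plt.show()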
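Ticks can also carry string labels. The sketch below passes a list of labels as the second argument of plt.xticks(); the names and marks are sample values used only for illustration, and the off-screen backend is an assumption.

```python
import matplotlib
matplotlib.use("Agg")    # assumption: off-screen rendering
from matplotlib import pyplot as plt

names = ['binu', 'faisal', 'ashik']    # sample labels
marks = [45, 48, 35]                   # sample values
positions = [0, 1, 2]

plt.figure()
plt.plot(positions, marks, 'b')
# Place a tick at each position and label it with the matching name.
locs, labels = plt.xticks(positions, names)
plt.yticks(range(0, 51, 10))

tick_texts = [t.get_text() for t in labels]
print(tick_texts)    # ['binu', 'faisal', 'ashik']
```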
Working with CSV Files
CSV is a delimited data format that has fields/columns separated by the comma
character and records/rows terminated by newlines.
A CSV file does not require a specific character encoding, byte order, or line
terminator format.
All records should have the same number of fields, in the same order. Data within
fields is interpreted as a sequence of characters, not as a sequence of bits or bytes.
Fields with double quote characters must be surrounded by double quotes. Each
embedded double quote must be represented by a pair of consecutive quotes.
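The quoting rule can be observed with Python's standard csv module, which applies it automatically when writing. A minimal sketch; the row contents are hypothetical.

```python
import csv
import io

# One header row and one record whose second field contains quotes and a comma.
rows = [["name", "remark"], ["binu", 'said "hello", then left']]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
text = buf.getvalue()
print(text)
# name,remark
# binu,"said ""hello"", then left"

# Reading the text back restores the original fields.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed == rows)    # True
```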
Pandas
Fast and efficient for manipulating and analyzing data.
Data from different file objects can be loaded.
Easy handling of missing data (represented as NaN)
Size mutability: columns can be inserted and deleted from DataFrame and
higher dimensional objects
Pandas generally provides two data structures for manipulating data. They are:
Series
DataFrame
Series
Pandas Series is a one-dimensional labeled array capable of holding data of any
type (integer, string, float, python objects, etc.).
The axis labels are collectively called index. Pandas Series is nothing but a
column in an excel sheet.
The simplest Series is formed from only an array of data.
Example:
import pandas as pd
obj=pd.Series([3,5,-8,7,9])
print(obj)
0 3
1 5
2 -8
3 7
4 9
Often it will be desirable to create a Series with an index identifying each data
point:
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
print(obj2)
d 4
b 7
a -5
c 3
If you have data contained in a Python dict, you can create a Series from it by
passing the dict:
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3=pd.Series(sdata)
print(obj3)
Ohio 35000
Texas 71000
Oregon 16000
Utah 5000
The isnull() and notnull() functions in pandas should be used to detect missing
data:
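A minimal sketch of detecting missing values in a Series. The index list states is hypothetical: it contains one label ('California') that has no entry in the dict, so pandas stores NaN for it.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']    # hypothetical index

obj4 = pd.Series(sdata, index=states)    # 'California' is missing, so its value is NaN
print(obj4)

missing = pd.isnull(obj4)     # True where the value is missing
present = pd.notnull(obj4)    # the opposite mask
print(missing)
print(present)
```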
Pandas DataFrame
Basic operations which can be performed on a Pandas DataFrame:
Creating a DataFrame
Dealing with Rows and Columns
Creating a DataFrame
In the real world, a Pandas DataFrame is usually created by loading a dataset
from existing storage such as an SQL database, a CSV file or an Excel file.
A Pandas DataFrame can also be created from lists, a dictionary, a list of
dictionaries, etc.
import pandas as pd
lst = ['mec', 'minor', 'stud', 'eee', 'bio']
df = pd.DataFrame(lst)
print(df)
0
0 mec
1 minor
2 stud
3 eee
4 bio
import pandas as pd
# initialise data of lists.
data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}
# Create DataFrame
df = pd.DataFrame(data)
print(df)
Name Age
0 Tom 20
1 nick 21
2 krish 19
3 jack 18
Dealing with Rows and Columns
A Data frame is a two-dimensional data structure, i.e., data is aligned in a
tabular fashion in rows and columns.
We can perform basic operations on rows/columns like selecting, deleting,
adding, and renaming.
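Column Selection: To select columns in a Pandas DataFrame, we can access them
by their column names.
import pandas as pd
# sample data (assumed; the original dictionary is missing from the notes)
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print(df)
# select two columns
print(df[['Name', 'Qualification']])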
Row Selection: Pandas provides the DataFrame.loc[] indexer to retrieve rows from a
DataFrame by label. Rows can also be selected by integer position using the iloc[]
indexer.
df = df.set_index('Name') # use the Name column as the row labels
print(df.loc['Jai']) # select a row by label
print(df.iloc[1]) # select a row by integer position
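The other basic operations mentioned above (adding, renaming and deleting) can be sketched as follows. The City values and the new column name Years are hypothetical, used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                   'Age': [20, 21, 19, 18]})

# Add a column (hypothetical values).
df['City'] = ['Kochi', 'Kollam', 'Thrissur', 'Alappuzha']

# Rename a column; rename() returns a new DataFrame.
df = df.rename(columns={'Age': 'Years'})

# Delete a column, then delete the row labelled 3.
df = df.drop(columns=['City'])
df = df.drop(index=3)

print(df)
```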
Missing data can occur when no information is provided for one or more items
or for a whole unit. Missing data is also referred to as NA (Not Available) values
in pandas.
In order to check for missing values in a Pandas DataFrame, we use the functions
isnull() and notnull().
Both functions help in checking whether a value is NaN or not. These functions
can also be used on a Pandas Series in order to find null values in the series.
import pandas as pd
import numpy as np
# dictionary of lists
data = {'First Score':[100, 90, np.nan, 95],
'Second Score': [30, 45, 56, np.nan],
'Third Score':[np.nan, 40, 80, 98]}
df = pd.DataFrame(data)
print(df.isnull())
print(df.notnull())
import pandas as pd
import numpy as np
# dictionary of lists
data = {'First Score':[100, 90, np.nan, 95],
'Second Score': [30, 45, 56, np.nan],
'Third Score':[np.nan, 40, 80, 98]}
df = pd.DataFrame(data)
print(df)
First Score Second Score Third Score
0 100.0 30.0 NaN
1 90.0 45.0 40.0
2 NaN 56.0 80.0
3 95.0 NaN 98.0
# filling the missing values with -1
print(df.fillna(-1))
First Score Second Score Third Score
0 100.0 30.0 -1.0
1 90.0 45.0 40.0
2 -1.0 56.0 80.0
3 95.0 -1.0 98.0
#dropping the rows containing null values
print(df.dropna())
The process of creating or writing a CSV file through Pandas can be a little
more complicated than reading CSV, but it's still relatively simple.
Here is a simple example showing how to export a DataFrame to a CSV file via
to_csv():
# importing pandas as pd
import pandas as pd
# dictionary of lists
data = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
'degree': ["MBA", "BCA", "M.Tech", "MBA"]}
# create the DataFrame and write it to a CSV file (the file name is an assumption)
df = pd.DataFrame(data)
df.to_csv('file1.csv')
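Writing and reading can be tried together without touching the disk. The sketch below writes the DataFrame to an in-memory text buffer with to_csv() and reads it back with read_csv(); the use of io.StringIO and index=False are choices made for this illustration.

```python
import io

import pandas as pd

data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"]}
df = pd.DataFrame(data)

buf = io.StringIO()
df.to_csv(buf, index=False)    # index=False leaves out the row labels
csv_text = buf.getvalue()
print(csv_text)

buf.seek(0)                    # rewind before reading
df2 = pd.read_csv(buf)
same = df2.values.tolist() == df.values.tolist()
print(same)                    # True
```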
The following are the various operations you can perform on this data file
# importing pandas as pd
import pandas as pd
df=pd.read_csv('stud.csv',index_col='rollno')
print("data frame stud")
print(df)
print("statistical info of numerical column")
print(df.describe())
statistical info of numerical column
mark
count 6.000000
mean 33.833333
std 10.590877
min 25.000000
25% 25.000000
50% 30.000000
75% 42.500000
max 48.000000
print("columns")
print(df.columns)
columns
Index(['name', 'place', 'mark'], dtype='object')
print("size")
print(df.size)
size
18
print("data types")
print(df.dtypes)
data types
name object
place object
mark int64
print("shapes")
print(df.shape)
shapes
(6, 3)
print("index and length of index")
print(df.index,len(df.index))
print("statistical functions")
print("sum=",df['mark'].sum())
print("mean=",df['mark'].mean())
print("max=",df['mark'].max())
print("min=",df['mark'].min())
print("var=",df['mark'].var())
print("standard deviation=",df['mark'].std())
# numeric_only=True skips the text columns, which recent pandas versions no longer drop silently
print(df.std(numeric_only=True))
statistical functions
sum= 203
mean= 33.833333333333336
max= 48
min= 25
var= 112.16666666666667
standard deviation= 10.59087657687817
mark 10.590877
print("top 2 rows")
print(df.head(2))
top 2 rows
name place mark
rollno
101 binu ernkulam 45
103 ashik alleppey 35
print("last 2 rows")
print(df.tail(2))
last 2 rows
name place mark
rollno
106 ann thrisur 25
107 padma kylm 25
print(df[df['mark']>40])
print("rows 0,1,2 columns 0,2")
print(df.iloc[0:3,[0,2]])
print("sorting in the descending order of marks")
print(df.sort_values(by='mark',ascending=False))
rollno
102 faisal kollam 48
101 binu ernkulam 45
print(df['mark'].agg(['min','max','mean']))
min 25.000000
max 48.000000
mean 33.833333
Name: mark, dtype: float64
print("median of marks")
print("Median",df.sort_values(by='mark',ascending=False).median(numeric_only=True))
print("mode of marks")
print("Mode",df.sort_values(by='mark',ascending=False)['mark'].mode())
print("count of marks")
print(df['mark'].value_counts())
median of marks
Median mark 30.0
dtype: float64
mode of marks
Mode 0 25
dtype: int64
count of marks
25 3
45 1
35 1
48 1
print("grouping data based on column value")
print(df.groupby('mark')['mark'].mean())
35 35
45 45
48 48
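Grouping is more informative when the key column differs from the column being aggregated. A minimal sketch that groups marks by place; the small DataFrame is hypothetical sample data built in place of stud.csv.

```python
import pandas as pd

# Hypothetical sample data standing in for stud.csv.
df = pd.DataFrame({'place': ['kollam', 'kollam', 'kochi', 'kochi', 'kochi'],
                   'mark': [48, 25, 45, 35, 25]})

by_place = df.groupby('place')['mark'].mean()    # average mark for each place
counts = df.groupby('place')['mark'].count()     # number of students for each place

print(by_place)
print(counts)
```

Here kollam averages (48 + 25) / 2 = 36.5 and kochi averages (45 + 35 + 25) / 3 = 35.0.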
from matplotlib import pyplot as plt
plt.pie(df['mark'])
plt.show()