
06. PT - Chapter 6 - Library in Python

This document is a chapter from a programming course focused on Python libraries, specifically for data processing and visualization. It covers defining libraries, popular libraries like Numpy, Pandas, Matplotlib, and Seaborn, and how to integrate them into applications. The chapter also includes objectives for students to learn about library creation, data handling, and software packaging.

Uploaded by

hihubebe
Copyright
© © All Rights Reserved

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

UNIVERSITY OF ECONOMICS AND LAW


Programming Techniques
Chapter VI
LIBRARY IN PYTHON

Lecturer:
Trần Duy Thanh, PhD.
Email: thanhtd@uel.edu.vn
Blog: https://tranduythanh.com
Objectives

 After completing this chapter, students will know how to define their own
libraries and how to use several popular ones: NumPy for processing array data,
Pandas for organizing and processing tabular data, Matplotlib & Seaborn for
visualizing data, and the Random module.
 Students will know how to integrate Matplotlib & Seaborn into a PyQt6
interface built with Qt Designer.
 Finally, students will be able to package and publish their software.

Faculty of Information Systems
Contents
• 6.1. Libraries in programming.
• 6.2. How to define library.
• 6.3. Some popular libraries
• 6.3.1. Processing array data with Numpy library.
• 6.3.2. Organizing and processing table data with Pandas library.
• 6.3.3. Visualizing data with Matplotlib & Seaborn library.
• 6.3.4. Working with Random modules.
• 6.4. Integrating Matplotlib & Seaborn into PyQt6 interface - Qt Designer
• 6.5. Packaging and publishing software
6.1. Libraries in programming
A library in programming can be simply understood as a place (environment) that
provides ready-made methods that can be reused in many programs, helping to shorten
programming time.

6.1. Libraries in programming

Module? Package? Library?

6.1. Libraries in programming

Module: modules consist of classes, functions, and variables that are declared
and defined in Python files and can be imported into other program scripts.

6.1. Libraries in programming

Package: a set of related modules that work together to provide certain
processing functionality.

Package
    __init__.py

6.1. Libraries in programming

Library: a collection of packages that provide a wide range of processing
functions.

Library
    Package #1: Module #1, Module #2, …
    …
    Package #n: Module #1, Module #2, …

6.1. Libraries in programming
Example: find the greatest common divisor of 2 numbers

Using the gcd function from the “math” standard library
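
A minimal sketch of this example. The two input numbers (48 and 18) are assumptions chosen for illustration; the slide does not give specific values.

```python
import math

# Illustrative inputs (assumed; not taken from the slide)
a, b = 48, 18

# math.gcd returns the greatest common divisor of its integer arguments
result = math.gcd(a, b)
print(result)  # 6
```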

6.1. Libraries in programming

Example: reading excel data

import pandas as pd
df = pd.read_excel('Sales.xlsx')
print(df)

Download the dataset for this example at: https://tranduythanh.com/datasets/Sales.xlsx

6.1. Libraries in programming
Example: data visualization
import seaborn as sns
sns.set(style='darkgrid') # Set style for chart
tips = sns.load_dataset('tips') # Sample dataset used by the chart

sns.relplot(x='total_bill', y='tip', data=tips, hue='smoker')

6.2. How to define library

All the functions, classes, and other code that we build can become libraries
that are reused across software projects.

To learn how, search Google for “Package your code as a pip module”.

Exercise: students create a complete library that other members can install and
use.
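
The idea of turning your own code into an importable package can be sketched as follows. This is only an illustration under stated assumptions: the package name `mygeometry`, the module `shapes`, and the function `rectangle_area` are hypothetical, and the package is written to a temporary folder rather than published with pip.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module content, used only for illustration
module_source = '''
def rectangle_area(width, height):
    """Return the area of a rectangle."""
    return width * height
'''

with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "mygeometry"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")            # marks the folder as a package
    (package_dir / "shapes.py").write_text(module_source)   # a module inside the package

    sys.path.insert(0, tmp)          # make the temporary folder importable
    importlib.invalidate_caches()
    shapes = importlib.import_module("mygeometry.shapes")
    area = shapes.rectangle_area(3, 4)
    sys.path.remove(tmp)

print(area)  # 12
```

Publishing the package so that others can run `pip install` on it requires additional packaging metadata, which this sketch does not cover.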

6.3. Some popular libraries

• 6.3.1. Processing array data with Numpy library


• 6.3.2. Organizing and processing table data with Pandas library
• 6.3.3. Visualizing data with Matplotlib & Seaborn library
• 6.3.4. Working with Random module

6.3. Some popular libraries

Numpy Library

6.3.1. Processing array data with Numpy library

• 6.3.1.1. Introduction to Numpy


• 6.3.1.2. One-dimensional arrays
• 6.3.1.3. Multidimensional arrays

6.3.1.1. Introduction to Numpy

NumPy (Numerical Python) is an open-source library that primarily supports
computation on large one- or multi-dimensional arrays.
Setup: pip install numpy
(in the Anaconda environment, NumPy is already installed)
Usage: import numpy as np

https://numpy.org/

6.3.1.2. One-dimensional arrays
Initialization
 function np.array(): np.array([element 1, element 2,…., element n])

arr1 = np.array([6, 6.5, 4, 5.5, 7, 8.5])


print(arr1)
print(type(arr1))
print(arr1.dtype)
 function np.asarray():
list_sample = [5, 7, 6.5, 8, 9.5]
tuple_sample = (6, 4, 5.5, 8.5)
arr2 = np.asarray(list_sample)
arr3 = np.asarray(tuple_sample)
print(arr2)
print(arr3)
6.3.1.2. One-dimensional arrays - Numpy
Initialization
 functions np.zeros(), np.ones():

arr_zeros = np.zeros(4)
arr_ones = np.ones(3, dtype=int)
print(arr_zeros)
print(arr_ones)

 function np.arange(start, stop, step, dtype):

arr1 = np.arange(2, 10, 1.5)


arr2 = np.arange(6)
print(arr1)
print(arr2)

6.3.1.2. One-dimensional arrays - Numpy

Initialization
 Function np.linspace(start, stop, num, endpoint, dtype):

arr = np.linspace(5, 15, 6)


print(arr)
 Function np.random.rand()/randn()/randint():

arr1 = np.random.rand(3)
arr2 = np.random.randn(4)
arr3 = np.random.randint(10, 45, 5)
print(arr1)
print(arr2)
print(arr3)
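
The two range-style constructors, np.arange and np.linspace, differ in how they treat the stop value. A short sketch using the same arguments as the slides:

```python
import numpy as np

# np.arange: the stop value is excluded; the spacing is fixed by step
a = np.arange(2, 10, 1.5)
print(a.tolist())  # [2.0, 3.5, 5.0, 6.5, 8.0, 9.5]

# np.linspace: the stop value is included by default; the count is fixed by num
b = np.linspace(5, 15, 6)
print(b.tolist())  # [5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
```

Use np.arange when the step matters and np.linspace when the number of points matters.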

6.3.1.2. One-dimensional arrays - Numpy
Initialization
 function np.random.uniform(low, high, size):

arr1 = np.random.uniform(0.0, 5.0, 20)


print(arr1)

 function np.random.normal(loc, scale, size):

arr2 = np.random.normal(5.0, 1.0, 10000)


print(arr2)
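
Random arrays change on every run. A minimal sketch of making them reproducible, assuming a seed value of 42 (chosen only for illustration):

```python
import numpy as np

np.random.seed(42)  # assumed seed value, for illustration only
sample = np.random.normal(5.0, 1.0, 10000)

# With 10,000 draws the sample statistics should be close to loc and scale
print(round(sample.mean(), 1))
print(round(sample.std(), 1))

np.random.seed(42)
repeat = np.random.normal(5.0, 1.0, 10000)
print(np.array_equal(sample, repeat))  # True: same seed, same numbers
```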

6.3.1.2. One-dimensional arrays - Numpy

Access
 For example:

arr = np.random.randint(10, 80, 8)


print(arr)
print(arr[1])
print(arr[-2])
print(arr[[1,3,4]])
print(arr[2:5])
print(arr[arr < 40])

6.3.1.2. One-dimensional arrays - Numpy

Access
 For example:

arr = np.random.randint(10, 80, 8)


print(arr)
indices = np.where(arr>40)
print(indices)
print(arr[indices])
print(np.extract(arr>35,arr))
print(np.extract(np.mod(arr,2)==0, arr))
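
Because the arrays above are random, their output differs on each run. The same conditional access can be followed step by step on a fixed array (the values below are assumed for illustration):

```python
import numpy as np

arr = np.array([12, 55, 31, 78, 40, 63, 27, 44])

mask = arr > 40                # boolean array, one True/False per element
indices = np.where(mask)[0]    # positions where the condition holds
evens = np.extract(arr % 2 == 0, arr)

print(indices.tolist())        # [1, 3, 5, 7]
print(arr[mask].tolist())      # [55, 78, 63, 44]
print(evens.tolist())          # [12, 78, 40, 44]
```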

6.3.1.2. One-dimensional arrays - Numpy
Access
 Some basic statistics:

arr = np.random.randint(5, 45, 7)


print(arr)
print(np.min(arr)) # min value
print(np.argmin(arr)) # min element index
print(np.max(arr)) # max value
print(np.argmax(arr)) # max element index
print(np.mean(arr)) # mean value
print(np.median(arr)) # median value
print(np.std(arr)) # standard deviation

6.3.1.2. One-dimensional arrays - Numpy
Add element
arr = np.array([4, 2, 15, 7, 9, 5, 11, 8])
print(arr)
arr = np.append(arr, [6, 3]) # Add element at the end
print(arr)
arr = np.insert(arr, 2, [6, 1]) # Add at any location
print(arr)

Delete element
arr = np.array([4, 2, 15, 7, 9, 5, 11, 8])
arr = np.delete(arr, [2, 4]) # Delete element by index
print(arr)
6.3.1.2. One-dimensional arrays - Numpy
Sort

arr = np.array([4, 2, 15, 7, 9, 5, 11, 8])


print(np.sort(arr))
print(np.sort(arr)[::-1])
Arithmetic calculation
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(arr1 + arr2)
print(arr1 - arr2)
print(arr1 * arr2)
print(arr1 / arr2)
print(arr1 % arr2)
6.3.1.3. Multidimensional arrays - Numpy

Initialization
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr)

print(arr.shape) # Array size


arr = np.array([[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]])

print(arr)

print(arr.shape) # Array size

6.3.1.3. Multidimensional arrays - Numpy

Initialization
arr = np.zeros([2, 3, 4], dtype=int)
print(arr)

arr = np.ones([2, 3], dtype=int)


print(arr)

6.3.1.3. Multidimensional arrays - Numpy

Access
arr = np.array([[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]])

print(arr)

print(arr[0])
print(arr[1][0][2])
print(arr[1][1][1])

6.3.1.3. Multidimensional arrays - Numpy

Access
arr = np.array([[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]])

print(arr)

 Basic statistics are accessed the same way as for 1-dimensional arrays

print(np.max(arr))
print(np.mean(arr[1]))

print(arr[np.where(arr>8)])

6.3.1.3. Multidimensional arrays - Numpy

Update value
arr = np.array([[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]])

print(arr)

arr[1][0][2] = 10
print(arr[1][0])

6.3.1.3. Multidimensional arrays - Numpy

Sort
arr = np.array([[[1, 3, 2], [6, 4, 5]],
[[7, 9, 8], [12, 10, 11]]])
print(np.sort(arr))
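
By default np.sort orders values along the last axis only. A small sketch on an assumed 2-dimensional array shows how the axis parameter changes the result:

```python
import numpy as np

arr = np.array([[3, 9, 2],
                [6, 1, 8]])

by_row = np.sort(arr)               # default axis=-1: sort within each row
by_col = np.sort(arr, axis=0)       # sort within each column
flat = np.sort(arr, axis=None)      # flatten first, then sort

print(by_row.tolist())  # [[2, 3, 9], [1, 6, 8]]
print(by_col.tolist())  # [[3, 1, 2], [6, 9, 8]]
print(flat.tolist())    # [1, 2, 3, 6, 8, 9]
```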

6.3.1.3. Multidimensional arrays - Numpy

Arithmetic calculation
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([1, 2, 3])

print(arr1 + arr2)

print(arr1 - arr2)

print(arr1 * arr2)

print(arr1 / arr2)

 Similar calculations for other arithmetic operations
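
The operations above combine a (2, 3) array with a (3,) array. NumPy allows this through broadcasting: the shorter array is applied to every row, provided the trailing dimensions match. A minimal sketch, where the mismatched array is an assumed counter-example:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
arr2 = np.array([1, 2, 3])                # shape (3,)

total = arr1 + arr2                       # arr2 is added to each row of arr1
print(total.tolist())  # [[2, 4, 6], [5, 7, 9]]

try:
    arr1 + np.array([1, 2])   # shape (2,) does not match the last axis (3)
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```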


6.3.1.3. Multidimensional arrays - Numpy
Rows ↔ Columns (reshape)
arr1 = np.array(range(12))
print(arr1)

arr_reshape = arr1.reshape(3,4)
print(arr_reshape)

arr2 = arr_reshape.reshape(1,-1)
print(arr2)

arr3 = arr_reshape.flatten()
print(arr3)
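
A short sketch of what the reshaping calls above produce. It checks the resulting shapes and shows that flatten() returns a copy, so editing the flattened array leaves the original untouched:

```python
import numpy as np

arr1 = np.arange(12)
grid = arr1.reshape(3, 4)
row = grid.reshape(1, -1)   # -1 lets NumPy infer the size: 12 / 1 = 12
flat = grid.flatten()       # always returns a copy

print(grid.shape, row.shape, flat.shape)  # (3, 4) (1, 12) (12,)

flat[0] = 99
print(grid[0, 0])  # 0: changing the copy does not affect the original
```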

6.3. Some popular libraries

Pandas Library

6.3.2. Data processing with Pandas library

• 6.3.2.1. Introducing Pandas


• 6.3.2.2. Data Structures in Pandas
• 6.3.2.3. Working with Empty Data

6.3.2.1. Introducing Pandas

Pandas is an open-source library widely used for data processing in many fields,
such as economics, science, and statistics.
Setup: pip install pandas
(in the Anaconda environment, pandas is already installed)
Usage: import pandas as pd

6.3.2.2. Data Structures in Pandas
Series
A Series is a data structure similar to a 1-dimensional array but displayed vertically.
Every element in a Series is assigned an index label.

ser = pd.Series([2, 4, 6, 8])


print(ser)

arr_price = np.array([76.3, 23.1, 102.4])


arr_symbol = np.array(['FPT', 'ACB', 'VNM'])
ser = pd.Series(arr_price, index=arr_symbol)
print(ser)
dic = {'FPT':76.3, 'ACB':23.1,
'VNM':102.4}
ser = pd.Series(dic)
print(ser)
6.3.2.2. Data Structures in Pandas

Series
 Access

print(ser['ACB'])

print(ser.iloc[2]) # By position; ser[2] on a labeled Series is deprecated in recent pandas

print(ser[1:])
print(ser[['FPT', 'VNM']])
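
Label-based and position-based access can be made explicit with .loc and .iloc. A minimal sketch using the three stock prices from the earlier slide:

```python
import pandas as pd

ser = pd.Series({'FPT': 76.3, 'ACB': 23.1, 'VNM': 102.4})

by_label = ser.loc['ACB']                   # access by index label
by_position = ser.iloc[2]                   # access by integer position
tail_labels = ser.iloc[1:].index.tolist()   # positional slice
picked = ser.loc[['FPT', 'VNM']].tolist()   # several labels at once

print(by_label)      # 23.1
print(by_position)   # 102.4
print(tail_labels)   # ['ACB', 'VNM']
print(picked)        # [76.3, 102.4]
```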

6.3.2.2. Data Structures in Pandas
Series
 Access
print(ser.size)
print(len(ser))

print(ser.values)

print(ser.index)

print(ser.axes)

6.3.2.2. Data Structures in Pandas
Series
 Access

dic = {'FPT':76.3, 'ACB':23.1, 'VNM':102.4,


'AGH': 7.8, 'FLC':3.5, 'HTC':24.2}
ser = pd.Series(dic)
print(ser)

print(ser.head(3))

print(ser.tail(2))

6.3.2.2. Data Structures in Pandas
Series
 Access

print(ser.mean())
print(ser.std())
print(ser.describe())

6.3.2.2. Data Structures in Pandas

Series
 Update

ser['FPT'] = 81
ser.iloc[2] = 106 # Update by position; ser[2] on a labeled Series is deprecated in recent pandas
print(ser)

6.3.2.2. Data Structures in Pandas

Series
 Delete element

print(ser.drop(ser.index[[0, 2]]))

print(ser.drop(['FLC', 'AGH']))

6.3.2.2. Data Structures in Pandas

Series
 Arithmetic calculation

print(ser + 2)

print(ser.map(lambda x:x*2))

6.3.2.2. Data Structures in Pandas
DataFrame
DataFrame is a data structure organized in rows and columns.

list_sample = [['PNJ', 180.1, 182], ['VIB', 22.3, 21.2], ['VIC', 46.2,


45.6], ['VNM', 150, 146.1]]
df = pd.DataFrame(list_sample, columns=['Symbol', 'Open', 'Close'])
print(df)

6.3.2.2. Data Structures in Pandas
DataFrame
 Read data from .csv/.xlsx file

import pandas as pd
df = pd.read_csv('employee.csv')
print(df)

Dataset download at: https://tranduythanh.com/datasets/employee.csv

6.3.2.2. Data Structures in Pandas

DataFrame
 Read data from .csv/.xlsx file
$ pip install xlrd       # needed to read .xls files
$ pip install openpyxl   # needed to read .xlsx files

import pandas as pd
df = pd.read_excel('Sales.xlsx')
Dataset download at : https://tranduythanh.com/datasets/Sales.xlsx

6.3.2.2. Data Structures in Pandas
DataFrame
 Read data from .csv/.xlsx file
df = pd.read_csv('./data/TCB_2018_2020.csv')
print(df)
Dataset download at: https://tranduythanh.com/datasets/TCB_2018_2020.csv

6.3.2.2. Data Structures in Pandas

DataFrame
 Read data from .csv/.xlsx file

df = pd.read_csv('./data/TCB_2018_2020.csv', index_col=0)
print(df.head())

6.3.2.2. Data Structures in Pandas
DataFrame
 Retrieve data based on conditions

df = pd.read_csv('./data/TCB_2018_2020.csv', index_col=0)
# Select the rows where the closing price is greater than 98
print(df[df['Close']>98])

6.3.2.2. Data Structures in Pandas

DataFrame
 Retrieve data by column

print(df[["High", "Low"]].tail())

6.3.2.2. Data Structures in Pandas
DataFrame
 Retrieve data by column

df = pd.read_csv('./data/TCB_2018_2020.csv', header=None)
print(df[[0, 2, 3]].tail())

Note*: columns can be selected by integer position in this way only when the
DataFrame has default integer column labels (here, because the file is read with header=None).
6.3.2.2. Data Structures in Pandas

DataFrame
 Retrieve data by row

df = pd.read_csv('./data/TCB_2018_2020.csv', index_col=0)
print(df.loc['2020-06-15'])

6.3.2.2. Data Structures in Pandas

DataFrame
 Retrieve data by row

df = pd.read_csv('./data/TCB_2018_2020.csv', index_col=0)
print(df.loc[['2019-06-10', '2020-06-10']])

6.3.2.2. Data Structures in Pandas
DataFrame
 Retrieve data by row

df = pd.read_csv('./data/TCB_2018_2020.csv', index_col=0)
print(df.iloc[0]) # Get the first row

6.3.2.2. Data Structures in Pandas

DataFrame
 Retrieve data by row

df = pd.read_csv('./data/TCB_2018_2020.csv', index_col=0)
print(df.iloc[[0, 2]]) # Get multiple records

6.3.2.2. Data Structures in Pandas
DataFrame
 Retrieve data by row

df = pd.read_csv('./data/TCB_2018_2020.csv', index_col=0)
print(df.iloc[35:41]) # Get multiple consecutive records

6.3.2.2. Data Structures in Pandas
DataFrame
 Retrieve data by element

df = pd.read_csv('./data/TCB_2018_2020.csv', index_col=0)
# Retrieve closing price on 20-08-2019
print(df.loc['2019-08-20', 'Close'])

print(df.loc['2020-12-25':, 'Open'])

6.3.2.2. Data Structures in Pandas
DataFrame
 Retrieve data by element

df = pd.read_csv('./data/TCB_2018_2020.csv', index_col=0)
# Retrieve the 5th row and first column
print(df.iloc[4, 0])

# Retrieve rows from row 648 with all columns


print(df.iloc[648:, :])

6.3.2.2. Data Structures in Pandas
DataFrame
 Delete data
Dataset download at: https://tranduythanh.com/datasets/SampleData.csv

df = pd.read_csv('./data/SampleData.csv', index_col=0)
print(df)

del df['Price'] # Delete Price column


print(df)

# Delete row with index 2


print(df.drop(df.index[2]))

6.3.2.2. Data Structures in Pandas
DataFrame
 Add data

df = pd.read_csv('./data/SampleData.csv', index_col=0)
print(df)
# Add a USD price column at an exchange rate of 1 USD = 23,000 VND
# (dividing by 23 assumes Price is quoted in thousands of VND)
df['Usd'] = df['Price']/23
print(df)

6.3.2.2. Data Structures in Pandas
DataFrame
 Add data

df = pd.read_csv('./data/SampleData.csv')
print(df)

# Add a row at the end of df


df.loc[df.shape[0]] = ['VCB', 113.6, 23.09]
print(df)

6.3.2.2. Data Structures in Pandas

DataFrame
 Sort data

sales_2020 = pd.DataFrame({'sales': [450, 360, 550, 480]},


index=['Mar', 'Jun', 'Feb', 'Apr'])
print(sales_2020)

print(sales_2020.sort_index())

6.3.2.2. Data Structures in Pandas
DataFrame
 Sort data

sales_2020 = pd.DataFrame({'sales': [450, 360, 550, 480]},


index=['Mar', 'Jun', 'Feb', 'Apr'])
print(sales_2020)
sales_2021 = pd.DataFrame({'sales': [650, 600,
700, 680]}, index=['Feb', 'Mar', 'Apr', 'Jun'])
print(sales_2020.reindex(sales_2021.index))

6.3.2.2. Data Structures in Pandas
DataFrame
 Group data: useful when a column contains repeating values (a categorical
variable).
Dataset download at: https://tranduythanh.com/datasets/SampleData2.csv

df = pd.read_csv('./data/SampleData2.csv')
print(df)

print(df.groupby('Group').mean(numeric_only=True))
# Other aggregations work the same way: sum(), count(), ...
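
A minimal grouping sketch that runs without the dataset file. The table below is hypothetical; the real SampleData2.csv may contain different symbols and values.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    'Symbol': ['AAA', 'BBB', 'CCC', 'DDD'],
    'Price':  [10.0, 20.0, 30.0, 50.0],
    'Group':  ['Bank', 'Bank', 'Tech', 'Tech'],
})

# Rows that share the same 'Group' value are aggregated together
mean_price = df.groupby('Group')['Price'].mean()
counts = df.groupby('Group')['Symbol'].count()

print(mean_price.to_dict())  # {'Bank': 15.0, 'Tech': 40.0}
print(counts.to_dict())      # {'Bank': 2, 'Tech': 2}
```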

6.3.2.2. Data Structures in Pandas
DataFrame
 Merge data

df = pd.read_csv('./data/SampleData2.csv')
df1 = df[['Symbol', 'Price', 'Group']]
df2 = df[['Symbol', 'PE', 'Group']]
print(df1)
print(df2)

6.3.2.2. Data Structures in Pandas

DataFrame
 Merge data

df_concat = pd.concat([df1, df2])


print(df_concat)

6.3.2.2. Data Structures in Pandas

DataFrame
 Merge data

df_concat = pd.concat([df1, df2], join='inner')


print(df_concat)

 Keep only the columns common to both DataFrames with the join='inner'
parameter (default: join='outer')

6.3.2.2. Data Structures in Pandas

DataFrame
 Merge data

df_concat = pd.concat([df1, df2], axis=1)


print(df_concat)

 Merge data by parameter axis = 1 (default axis = 0)

6.3.2.2. Data Structures in Pandas

DataFrame
 Merge data

df_append = pd.concat([df1, df2]) # DataFrame.append() was removed in pandas 2.0
print(df_append)

6.3.2.2. Data Structures in Pandas

DataFrame
 Merge data

df_merge = pd.merge(df1, df2)


print(df_merge)

6.3.2.2. Data Structures in Pandas

DataFrame
 Merge data

df = pd.read_csv('./data/SampleData2.csv')
df1 = df[['Symbol', 'Price', 'Group']]
df1 = df1.drop(df1.index[3]) # Delete FPT
df2 = df[['Symbol', 'PE', 'Group']]
print(df1)
print(df2)

6.3.2.2. Data Structures in Pandas

DataFrame
 Merge data

df_merge = pd.merge(df1, df2)


print(df_merge)

df_merge = pd.merge(df1, df2, how='outer')


print(df_merge)
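
The difference between the default inner merge and how='outer' can be sketched with two small hypothetical tables, where one symbol is missing from the first table:

```python
import pandas as pd

# Hypothetical data, for illustration only
df1 = pd.DataFrame({'Symbol': ['AAA', 'BBB'], 'Price': [10.0, 20.0]})
df2 = pd.DataFrame({'Symbol': ['AAA', 'BBB', 'CCC'], 'PE': [8.5, 12.0, 15.3]})

inner = pd.merge(df1, df2)                # default how='inner', joined on the shared column 'Symbol'
outer = pd.merge(df1, df2, how='outer')   # keeps rows found in either table

missing_prices = int(outer['Price'].isnull().sum())

print(len(inner))        # 2
print(len(outer))        # 3
print(missing_prices)    # 1: 'CCC' has no Price in df1
```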

6.3.2.2. Data Structures in Pandas
Check for empty data
Dataset loaded at : https://tranduythanh.com/datasets/SampleData_NaN.csv
df = pd.read_csv('./data/SampleData_NaN.csv')
print(df)

6.3.2.2. Data Structures in Pandas
Check for empty data
df = pd.read_csv('./data/SampleData_NaN.csv')
print(df.isnull())

6.3.2.2. Data Structures in Pandas

Check for empty data


df = pd.read_csv('./data/SampleData_NaN.csv')
# Check for empty data for each column
print(df.isnull().any())

# Check for empty data in the entire DataFrame
print(df.isnull().values.any())

6.3.2.2. Data Structures in Pandas

Check for empty data


df = pd.read_csv('./data/SampleData_NaN.csv')
# Check the number of empty data for each column
print(df.isnull().sum())
# Count the empty values in the entire DataFrame
print(df.isnull().sum().sum()) # 6 for this dataset

6.3.2.2. Data Structures in Pandas
Handling empty data
Empty data can be handled by deleting or filling in new values.
df = pd.read_csv('./data/SampleData_NaN.csv')
# Delete rows containing empty elements
df_delete_na_by_row = df.dropna(axis=0)
print(df_delete_na_by_row)

6.3.2.2. Data Structures in Pandas
Handling empty data
Empty data can be handled by deleting or filling in new values.

df = pd.read_csv('./data/SampleData_NaN.csv')
# Delete columns containing empty elements
df_delete_na_by_col = df.dropna(axis=1)
print(df_delete_na_by_col)

6.3.2.2. Data Structures in Pandas
Handling empty data
Empty data can be handled by deleting or filling in new values.

df = pd.read_csv('./data/SampleData_NaN.csv')
# Fill value 100 for empty element
df_fill_na_100 = df.fillna(100)
print(df_fill_na_100)

6.3.2.2. Data Structures in Pandas
Handling empty data
Empty data can be handled by deleting or filling in new values.

df = pd.read_csv('./data/SampleData_NaN.csv')
# Fill each empty element with the next valid value below it
df_fill_na_bfill = df.bfill() # fillna(method='bfill') is deprecated in recent pandas
print(df_fill_na_bfill)

6.3.2.2. Data Structures in Pandas
Handling empty data
Empty data can be handled by deleting or filling in new values.

df = pd.read_csv('./data/SampleData_NaN.csv')
# Fill each empty element with the last valid value above it
df_fill_na_ffill = df.ffill() # fillna(method='ffill') is deprecated in recent pandas
print(df_fill_na_ffill)

6.3.2.2. Data Structures in Pandas
Handling empty data
Empty data can be handled by deleting or filling in new values.

df = pd.read_csv('./data/SampleData_NaN.csv')
# Fill empty elements with interpolated values
df_fill_na_interpolate = df.interpolate()
print(df_fill_na_interpolate)

*When there are many consecutive empty elements, the interpolation method should be used.
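
The filling strategies can be compared side by side on a small hypothetical column with two consecutive missing values (this is not the course dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical column with two consecutive missing values
ser = pd.Series([1.0, np.nan, np.nan, 4.0])

n_missing = int(ser.isnull().sum())
forward = ser.ffill().tolist()        # copy the last valid value downward
backward = ser.bfill().tolist()       # copy the next valid value upward
linear = ser.interpolate().tolist()   # linear interpolation between neighbors

print(n_missing)   # 2
print(forward)     # [1.0, 1.0, 1.0, 4.0]
print(backward)    # [1.0, 4.0, 4.0, 4.0]
print(linear)      # [1.0, 2.0, 3.0, 4.0]
```

Forward and backward filling repeat one neighbor across the whole gap, while interpolation spreads the change evenly, which is why it suits longer runs of missing values.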

6.3. Some popular libraries

Matplotlib & Seaborn Library

6.3.3. Visualizing data with Matplotlib & Seaborn

• 6.3.3.1. Introduction to Matplotlib


• 6.3.3.2. Common types of charts with Matplotlib
• 6.3.3.3. Introduction to Seaborn
• 6.3.3.4. Types of charts in Seaborn

6.3.3.1. Introduction to Matplotlib

Matplotlib is a popular Python library used mainly for plotting data; it
supports a wide variety of chart types.
Setup: pip install matplotlib
(in the Anaconda environment, matplotlib is already installed)
Usage: import matplotlib.pyplot as plt # way 1
from matplotlib import pyplot as plt # way 2

https://matplotlib.org/

6.3.3.2. Common types of charts with Matplotlib

Set general configuration parameters for the chart


plt.rcParams['figure.figsize'] = (10,8)
plt.rcParams['figure.dpi'] = 200
plt.rcParams['font.size'] = 13
# plt.rcParams['savefig.dpi'] = 200
# plt.rcParams['legend.fontsize'] = 'large'
# plt.rcParams['figure.titlesize'] = 'medium'
# plt.rcParams["legend.loc"] = 'best'

6.3.3.2. Common types of charts with Matplotlib
Line chart
Dataset download at: https://tranduythanh.com/datasets/NetProfit.csv
df = pd.read_csv('./data/NetProfit.csv')
dat = df[['Year', 'VIC']]
print(dat)

plt.plot('Year', 'VIC', data=dat)


plt.show()

6.3.3.2. Common types of charts with Matplotlib
Line chart
df = pd.read_csv('./data/NetProfit.csv')
print(df)

plt.plot('Year', 'VNM', data=df)


plt.plot('Year', 'PNJ', data=df)
plt.plot('Year', 'VCB', data=df)
plt.plot('Year', 'VIC', data=df)
plt.show()

6.3.3.2. Common types of charts with Matplotlib

Line chart
plt.plot('Year', 'VNM', data=df, color='b', linestyle='-', marker='o')
plt.plot('Year', 'PNJ', data=df, color='g', linestyle='--', marker='s')
plt.plot('Year', 'VCB', data=df, color='#FF0000', linestyle=':', marker='+')
plt.plot('Year', 'VIC', data=df, color='orange', linestyle='-.', marker='*')
plt.title("Profit of VNM, PNJ, VCB, VIC from 2010 to 2020", fontweight='bold')
plt.xlabel("Year", fontweight='bold')
plt.ylabel("Profit", fontweight='bold')
plt.legend()
plt.show()

6.3.3.2. Common types of charts with Matplotlib

Column chart
Dataset download at: https://tranduythanh.com/datasets/PVD_Asset.csv
df = pd.read_csv('./data/PVD_Asset.csv')
print(df)
plt.bar('Year', 'Liabilities', data=df)
plt.title("PVD liabilities by year")
plt.xlabel("Year")
plt.ylabel("Liabilities")
plt.show()

6.3.3.2. Common types of charts with Matplotlib
Column chart
df = pd.read_csv('./data/PVD_Asset.csv')
print(df)
plt.barh('Year', 'Equity', data=df)
plt.title("PVD equity by year")
plt.xlabel("Equity")
plt.ylabel("Year")
plt.show()

6.3.3.2. Common types of charts with Matplotlib
Column chart
plt.bar('Year', 'Liabilities', data=df, color='orange', label="Liabilities")
plt.bar('Year', 'Equity', data=df, bottom='Liabilities', color='darkgreen', label="Equity")
plt.title("PVD assets, 2010-2020")
plt.xlabel("Year")
plt.ylabel("Billion VND")
plt.legend()
plt.show()

6.3.3.2. Common types of charts with Matplotlib
Scatter chart
Dataset download at: https://tranduythanh.com/datasets/Income.csv
df = pd.read_csv('./data/Income.csv')
print(df)
plt.scatter('Income', 'Expenditure', data=df, color='darkgreen')
plt.xlabel('Income', fontweight='bold')
plt.ylabel('Expenditure', fontweight='bold')
plt.show()

6.3.3.2. Common types of charts with Matplotlib

Scatter chart
colors = np.random.rand(df.shape[0]) # One random color value per data point
area = df['Income'].values * 50 # Data point size
plt.scatter('Income', 'Expenditure', data=df, c=colors, s=area, alpha=0.8)
plt.xlabel('Income', fontweight='bold')
plt.ylabel('Expenditure', fontweight='bold')
plt.show()

6.3.3.2. Common types of charts with Matplotlib

Histogram Chart
A histogram is a type of bar chart that shows how often the values of a data set fall
into each range (bin), which reveals the shape of the distribution.

For example: draw a frequency chart of the salaries of 400 employees (mean salary 12
million VND, standard deviation 2 million VND).

dat = np.random.normal(12, 2, 400)

plt.hist(dat, color='darkgreen', edgecolor='orange')
plt.xlabel('Salary')
plt.ylabel('Frequency')
plt.show()

6.3.3.2. Common types of charts with Matplotlib

Pie chart
lbls = ['Real estate sales', 'Real estate leasing', 'Hotel services', 'Hospitals', 'Education', 'Manufacturing', 'Other activities']
income = [71.576, 6.788, 4.869, 2.675, 2.244, 18.007, 4.304]
explode = [0.1, 0.1, 0.2, 0.2, 0.1, 0.1, 0.2]
plt.pie(income, labels=lbls, explode=explode, autopct='%1.1f%%', pctdistance=1.1, labeldistance=1.2)
plt.title('VinGroup revenue structure - 2020', fontweight='bold')
# plt.legend(loc='upper right')
plt.show()

6.3.3.2. Common types of charts with Matplotlib
Boxplot chart
A boxplot is a type of chart that shows the distribution of data: how the data points are
spread out, whether the data is symmetrical, how widely or narrowly it is distributed, its
min-max values, and its outliers.
The data is described by quartiles, which divide it into 4 parts of 25% each:
Q1 (25th percentile), Q2 (the median), and Q3 (75th percentile).

Interquartile Range: IQR = Q3 - Q1
Minimum (lower whisker limit): Q1 - 1.5*IQR
Maximum (upper whisker limit): Q3 + 1.5*IQR
Outliers: points below the minimum or above the maximum
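
The quartile limits can be worked through numerically. The sketch below uses a small assumed data set (not the course data) and NumPy's default percentile interpolation:

```python
import numpy as np

# Assumed sample values, for illustration only
data = np.array([10, 12, 13, 14, 15, 16, 18, 40])

q1, q2, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower = q1 - 1.5 * iqr     # lower limit for the whisker
upper = q3 + 1.5 * iqr     # upper limit for the whisker
outliers = data[(data < lower) | (data > upper)]

print(q1, q2, q3)          # 12.75 14.5 16.5
print(lower, upper)        # 7.125 22.125
print(outliers.tolist())   # [40]
```

When drawing, Matplotlib ends each whisker at the most extreme data point that still lies inside these limits, so the whisker tips are actual data values.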

6.3.3.2. Common types of charts with Matplotlib
Boxplot chart
For example: Consider the following programmer salary survey data:
dat = pd.read_csv('./data/Salary_of_Developer.csv')
print(dat)

plt.boxplot(dat)
plt.ylabel("Salary (million VND)")
plt.title("Boxplot of the distribution of programmer salaries")
plt.show()

Dataset loaded at: https://tranduythanh.com/datasets/Salary_of_Developer.csv


6.3.3.2. Common types of charts with Matplotlib
Boxplot chart
For example: consider the following programmer salary survey data:
orange_square = dict(markerfacecolor='orange', marker='s')
plt.boxplot(dat, notch=True, flierprops=orange_square, vert=False)
plt.xlabel("Salary (million VND)")
plt.title("Boxplot of the distribution of programmer salaries")
plt.show()

6.3.3.2. Common types of charts with Matplotlib

Subplot chart
plt.subplot(rows, columns, index)

For a 2 x 2 grid, the index counts left to right, then top to bottom:
plt.subplot(2, 2, 1)    plt.subplot(2, 2, 2)
plt.subplot(2, 2, 3)    plt.subplot(2, 2, 4)
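
A minimal sketch of a 2 x 2 subplot grid. The plotted curves and titles are assumptions chosen for illustration, and the Agg backend is selected only so the code also runs without a display; replace plt.savefig or add plt.show() as needed in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
curves = [np.sin(x), np.cos(x), x, x ** 2]        # assumed example data
titles = ["sin", "cos", "linear", "square"]

fig = plt.figure(figsize=(8, 6))
for index, (y, title) in enumerate(zip(curves, titles), start=1):
    plt.subplot(2, 2, index)   # index runs 1..4, left to right, top to bottom
    plt.plot(x, y)
    plt.title(title)

plt.tight_layout()
n_axes = len(fig.axes)
print(n_axes)  # 4
```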

6.3. Some popular libraries

Seaborn

6.3.3.3. Introduction to Seaborn
Seaborn is a library built on top of Matplotlib, used mainly for drawing charts
to visualize data.
Setup: pip install seaborn
(in the Anaconda environment, seaborn is already installed)
Usage: import seaborn as sns

https://seaborn.pydata.org/

6.3.3.3. Introduction to Seaborn

General configuration settings


import seaborn as sns
sns.set(style='darkgrid') # Set style for chart
import ssl
# Configure ssl to allow loading sample data via library
ssl._create_default_https_context = ssl._create_unverified_context

# Show sample datasets


sample_datasets = sns.get_dataset_names()
print(sample_datasets)

6.3.3.3. Introduction to Seaborn
Chart with continuous variables
 relplot()
tips = sns.load_dataset("tips")
print(tips.head())

sns.relplot(x='total_bill', y='tip', data=tips)

6.3.3.3. Introduction to Seaborn
Chart with continuous variables
 relplot()

sns.relplot(x='total_bill', y='tip', data=tips, hue='smoker')

6.3.3.4. Types of charts in Seaborn
Chart with continuous variables
 relplot()

sns.relplot(x='total_bill', y='tip', data=tips, hue='smoker', style='sex')

6.3.3.4. Types of charts in Seaborn

Chart with continuous variables


 relplot()

sns.relplot(x='total_bill', y='tip', data=tips, hue='smoker', style='sex', size='size')

6.3.3.4. Types of charts in Seaborn
Chart with continuous variables
 relplot()
Dataset download at: https://tranduythanh.com/datasets/Income.csv
dat = pd.read_csv('./data/Income.csv')
print(dat)
sns.relplot(x='Income', y='Expenditure', data=dat, kind='scatter')

[Figure: the same data drawn with kind='scatter' and with kind='line']
6.3.3.4. Types of charts in Seaborn
Chart with continuous variables
 scatterplot(): equivalent to relplot() with kind='scatter'

Dataset loaded at: https://tranduythanh.com/datasets/Housing.csv

6.3.3.4. Types of charts in Seaborn
Chart with continuous variables
 lineplot(): equivalent to relplot() with kind='line'

6.3.3.4. Types of charts in Seaborn

Chart with categorical variables


 catplot()
Scatter:
    stripplot(): catplot() with kind='strip'
    swarmplot(): catplot() with kind='swarm'
Distribution:
    boxplot(): catplot() with kind='box'
    violinplot(): catplot() with kind='violin'
    boxenplot(): catplot() with kind='boxen'
Estimate:
    pointplot(): catplot() with kind='point'
    barplot(): catplot() with kind='bar'

6.3.3.4. Types of charts in Seaborn
Chart with categorical variables
 catplot()

[Figure: catplot() with kind='strip' and with kind='swarm']

6.3.3.4. Types of charts in Seaborn

Chart with categorical variables


 catplot()

[Figure: catplot() with kind='box', kind='violin', and kind='boxen']

6.3.3.4. Types of charts in Seaborn
Chart with categorical variables
• catplot()

Figure panels: kind='point' and kind='bar'

6.3.3.4. Types of charts in Seaborn
Distribution (frequency) chart
• displot()
np.random.seed(10)  # fix the seed so the random values are reproducible
dat = np.random.normal(12, 2, 400)

# Histogram only
sns.displot(dat)
plt.xlabel('Salary')

# Histogram with a KDE curve, drawn in red
sns.displot(dat, kde=True, color='r')
plt.xlabel('Salary')

6.3.3.4. Types of charts in Seaborn
Regression chart
• regplot()
Dataset available at: https://tranduythanh.com/datasets/Income.csv
df = pd.read_csv('./data/Income.csv')
sns.regplot(x='Income', y='Expenditure', data=df)
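
regplot() draws the scatter points together with a fitted regression line, but it does not return the line's coefficients. The sketch below, on made-up values standing in for Income.csv, plots the chart and separately computes a straight-line least-squares fit with np.polyfit so the slope and intercept can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no window is required

import numpy as np
import pandas as pd
import seaborn as sns

# Made-up values standing in for Income.csv (an assumption for illustration).
df = pd.DataFrame({
    "Income":      [10, 12, 15, 18, 20, 25, 30],
    "Expenditure": [8, 9, 11, 13, 14, 18, 21],
})

# Scatter points plus a fitted line with a shaded confidence band.
ax = sns.regplot(x="Income", y="Expenditure", data=df)

# regplot() only draws; fit the same straight line ourselves to get numbers.
slope, intercept = np.polyfit(df["Income"], df["Expenditure"], deg=1)
print(f"Expenditure = {slope:.2f} * Income + {intercept:.2f}")
```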

6.3.3.4. Types of charts in Seaborn

Heatmap
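• heatmap()

flights_long = sns.load_dataset('flights')
# pandas 2.0+ accepts only keyword arguments in pivot()
flights = flights_long.pivot(index='month', columns='year', values='passengers')

Tables shown: flights_long (long format) and flights (pivoted, with months as rows and years as columns)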

6.3.3.4. Types of charts in Seaborn
Heatmap
• heatmap()
sns.heatmap(flights, annot=True, fmt='d', linewidths=.5)

Figure variant: the same heatmap with cmap='YlGnBu'
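
The two heatmap steps (reshape from long to wide, then draw) can be run offline with the sketch below. It assumes a tiny hand-made table in the same long format as the `flights` dataset; the passenger counts are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no window is required

import pandas as pd
import seaborn as sns

months = ["Jan", "Feb", "Mar"]

# Long format: one row per (year, month) pair. Values are made up.
flights_long = pd.DataFrame({
    "year":       [1949] * 3 + [1950] * 3,
    "month":      months * 2,
    "passengers": [112, 118, 132, 115, 126, 141],
})

# Wide format: months as rows, years as columns.
# pivot() sorts plain string labels alphabetically, so restore calendar order.
flights = flights_long.pivot(
    index="month", columns="year", values="passengers"
).reindex(months)

# annot=True writes each value in its cell; fmt='d' formats it as an integer.
ax = sns.heatmap(flights, annot=True, fmt="d", linewidths=.5, cmap="YlGnBu")
ax.figure.savefig("flights_heatmap.png")
```

In the real `flights` dataset the month column is categorical, so the calendar order is kept without the extra reindex() step.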
6.3.3.4. Types of charts in Seaborn
Combination chart
• jointplot()
Dataset available at: https://tranduythanh.com/datasets/Income.csv
df = pd.read_csv('./data/Income.csv')
sns.jointplot(x='Income', y='Expenditure', data=df, color='orange')

6.3.3.4. Types of charts in Seaborn
Combination chart
• pairplot()
df = pd.read_csv('./data/Income.csv')
sns.pairplot(df[['Income','Expenditure']])

6.3.4. Working with Random module

Random module
Students can read the documentation at the following links:

https://www.w3schools.com/python/module_random.asp

https://docs.python.org/3/library/random.html
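
As a quick orientation before reading those pages, the sketch below exercises a few commonly used functions of the standard random module. The variable names and sample values are chosen only for illustration.

```python
import random

random.seed(10)  # fix the seed so every run produces the same sequence

roll = random.randint(1, 6)       # integer from 1 to 6, both ends included
ratio = random.random()           # float in the range [0.0, 1.0)
price = random.uniform(10, 20)    # float between 10 and 20

colors = ["red", "green", "blue"]
pick = random.choice(colors)            # one element of the list
hand = random.sample(range(1, 50), 6)   # 6 distinct numbers from 1..49
random.shuffle(colors)                  # reorder the list in place

print(roll, ratio, price, pick, hand, colors)
```

Calling random.seed() again with the same value restarts the same sequence, which is useful when a result has to be reproduced.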

6.4. Integrating Matplotlib & Seaborn into PyQt6

6.4. Integrating Matplotlib & Seaborn into PyQt6
Create a project named “TichHopChart” with the folder structure and images as below:

The dataset “NetProfit.csv” is reused from previous slides.


6.4. Integrating Matplotlib & Seaborn into PyQt6
Design the interface “MainWindow.ui” in Qt Designer and name the widgets as shown below.
Then generate the Python code “MainWindow.py”.

6.4. Integrating Matplotlib & Seaborn into PyQt6
Then create “MainWindowEx.py” and write its code.

6.4. Integrating Matplotlib & Seaborn into PyQt6
The configuration function creates a Chart for the interface.

6.4. Integrating Matplotlib & Seaborn into PyQt6
Bar Chart Display Function

6.4. Integrating Matplotlib & Seaborn into PyQt6

Line Chart Display Function

6.4. Integrating Matplotlib & Seaborn into PyQt6
Pie Chart Display Function

The entire source code of the Student project can be found here:
https://tranduythanh.com/software/TichHopChart.rar

6.5. Packaging and publishing software

Students learn about packaging and publishing software in the chapter “Packaging &
Distribution” of the following book:

Martin Fitzpatrick (2022), Create GUI Applications with Python & Qt6. Download link:
https://tranduythanh.com/ebooks/create-gui-applications-pyqt6.pdf
Reviews

• Question 1: Describe how to define your own Python library


• Question 2: Describe some important and commonly used features of Numpy
• Question 3: Describe some important and commonly used features of Pandas
• Question 4: Describe how to visualize data with Matplotlib
• Question 5: Describe how to visualize data with Seaborn
• Question 6: Describe how to integrate Matplotlib and Seaborn into the PyQt6 interface
• Question 7: Describe how to package and publish software

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF ECONOMICS AND LAW

THANK YOU!
