DataAnalytic-04 - NumPy & Pandas

The document discusses NumPy arrays and how to manipulate array shapes in NumPy. It describes how to create arrays using functions like array(), arange(), zeros(), ones(), and random(). It also covers data types in NumPy and how to select array elements by index. Additionally, it shows how to change array shapes using functions like reshape(), ravel(), flatten(), and transpose(). This allows transforming arrays between multi-dimensional and single-dimensional forms.

Uploaded by

kadnan
Copyright
© © All Rights Reserved

1

Data Analytic
Adhi Harmoko Saputro

Data Analytic
2

NumPy & Pandas


Adhi Harmoko Saputro

Data Analytic
3

Content

NumPy
• Create Array
• Manipulating Array Shapes
• Stacking Arrays
• Partitioning Arrays
• Changing Data Type
• Creating Views & Copies
• Slicing Arrays
• Broadcasting Arrays

Pandas
• Creating DataFrames
• Understanding DataFrames
• Reading & Querying
• Describing DataFrames
• Grouping & Joining DataFrame
• Working with Missing Values
• Creating Pivot Tables
• Dealing with Dates

Data Analytic
4

NumPy Arrays
Adhi Harmoko Saputro

Data Analytic
5

Create Array
• Create an array using the array() function with a list of items
• Possible data types are bool, int, float, long, double, and long double

Code:
# Creating an array
import numpy as np

a = np.array([2,4,6,8,10])
print(a)

Output:
[ 2 4 6 8 10]

Data Analytic
6

Create Array
• Create an evenly spaced NumPy array using the arange([start,] stop[, step]) function
• start is the first value of the range (optional, default 0)
• stop is the end of the range (required); the stop value itself is not included
• step is the increment between consecutive values (optional, default 1)
• With integer arguments and a step of 1, the last generated value is one less than stop

Code:
# Creating an array using arange()
import numpy as np

a = np.arange(1,11)
print(a)

Output:
[ 1 2 3 4 5 6 7 8 9 10]
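
The slide only calls arange() with start and stop. A minimal sketch of the step argument and of the default start follows; the variable names are illustrative and not part of the original slides.

```python
import numpy as np

# start=2, stop=11 (excluded), step=2
evens = np.arange(2, 11, 2)
print(evens)

# Only stop is given, so start defaults to 0 and step to 1
first_five = np.arange(5)
print(first_five)
```

In both calls the stop value is excluded from the result.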

Data Analytic
7

Create Array
• The zeros() function creates an array for a given dimension with all zeroes
• The ones() function creates an array for a given dimension with all ones
• The full() function generates an array with constant values
• The eye() function creates an identity matrix
• The random.random() function creates an array of a given shape filled with random values in the range [0, 1)

Data Analytic
8

Create Array
Code:
import numpy as np

# Create an array of all zeros
p = np.zeros((3,3))
print('All Zeros Array')
print(p)

# Create an array of all ones
q = np.ones((2,2))
print('All Ones Array')
print(q)

# Create a constant array
r = np.full((2,2), 4)
print('Constant Array')
print(r)

# Create a 4x4 identity matrix
s = np.eye(4)
print('4x4 Identity Matrix')
print(s)

# Create an array filled with random values
t = np.random.random((3,3))
print('Random Values Array')
print(t)

Data Analytic
9

Create Array
Output:
All Zeros Array
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
All Ones Array
[[1. 1.]
 [1. 1.]]
Constant Array
[[4 4]
 [4 4]]
4x4 Identity Matrix
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
Random Values Array
[[0.52715194 0.21717546 0.77645954]
 [0.6971321  0.28391831 0.88782967]
 [0.25132285 0.17128033 0.22925874]]

Data Analytic
10

Array Data Type


• The type() function returns the type of the container: numpy.ndarray
• The dtype attribute returns the type of the elements
• The shape attribute returns the dimensions of the array as a tuple
• One-dimensional NumPy arrays are also known as vectors
Code:
# Creating an array using arange()
import numpy as np
a = np.arange(1,11)
print(type(a))
print(a.dtype)

# Check the shape of the array
print(a.shape)

Output:
<class 'numpy.ndarray'>
int32
(10,)

The default integer type is 32- or 64-bit depending on the platform, so a.dtype may print int64 instead of int32.

Data Analytic
11

Selecting Array Elements


• Need to specify the index of the matrix as a[m,n]
• m is the row index
• n is the column index

[0,0] [0,1]

[1,0] [1,1]

Data Analytic
12

Selecting Array Elements


Code:
import numpy as np

# Creating an array in matrix 2x2


a = np.array([[5,6],[7,8]])

print(a)
print(a[0,0])
print(a[0,1])
print(a[1,0])
print(a[1,1])

Output:
[[5 6]
[7 8]]
5
6
7
8
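
Beyond single elements, the same a[m,n] notation can select a whole row or column. The sketch below reuses the 2x2 matrix from this slide; the variable names first_row, second_col, and last are illustrative only.

```python
import numpy as np

a = np.array([[5, 6], [7, 8]])

first_row = a[0]        # all columns of row 0
second_col = a[:, 1]    # all rows of column 1
last = a[-1, -1]        # negative indices count from the end

print(first_row)
print(second_col)
print(last)
```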

Data Analytic
13

Numerical Data Types


Data Type   Details

bool        Boolean type that takes True or False values (stored as one byte)
int_        Platform integer; can be either int32 or int64
int8        Byte; stores integers ranging from -128 to 127
int16       Stores integers ranging from -32768 to 32767
int32       Stores integers ranging from -2^31 to 2^31 - 1
int64       Stores integers ranging from -2^63 to 2^63 - 1
uint8       Stores unsigned integers ranging from 0 to 255
uint16      Stores unsigned integers ranging from 0 to 65535
uint32      Stores unsigned integers ranging from 0 to 2^32 - 1
uint64      Stores unsigned integers ranging from 0 to 2^64 - 1
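
The ranges in this table can be checked directly with np.iinfo(), which reports the limits of an integer type. This is a small sketch for verification, not part of the original slides.

```python
import numpy as np

# Print the smallest and largest value each integer type can hold
for dtype in (np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint16):
    info = np.iinfo(dtype)
    print(dtype.__name__, info.min, info.max)
```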

Data Analytic
14

Numerical Data Types


Data Type             Details

half/float16          Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
single (float)        Platform-defined single precision float: typically sign bit, 8 bits exponent, 23 bits mantissa
double                Platform-defined double precision float: typically sign bit, 11 bits exponent, 52 bits mantissa
long double           Platform-defined extended-precision float
float complex         Complex number, represented by two single-precision floats (real and imaginary components)
double complex        Complex number, represented by two double-precision floats (real and imaginary components)
long double complex   Complex number, represented by two extended-precision floats (real and imaginary components)
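
The mantissa widths above show up in the machine epsilon of each type, which np.finfo() exposes as eps. The sketch below assumes the usual IEEE 754 layouts for float16, float32, and float64, where eps equals 2 raised to minus the number of mantissa bits.

```python
import numpy as np

# eps is the gap between 1.0 and the next representable value
for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    print(dtype.__name__, info.bits, float(info.eps))
```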

Data Analytic
15

Numerical Data Types


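• Each data type has a matching conversion function
Code:
import numpy as np

print("float64(21) :", np.float64(21))
print("int8(21.0) :", np.int8(21.0))
print("bool(21) :", np.bool_(21))
print("bool(0) :", np.bool_(0))
print("bool(21.0) :", np.bool_(21.0))
print("single(True) :", np.single(True))
print("single(False):", np.single(False))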

Output:
float64(21) : 21.0
int8(21.0) : 21
bool(21) : True
bool(0) : False
bool(21.0) : True
single(True) : 1.0
single(False): 0.0

Data Analytic
16

Create Array
• Create an array with a specific data type using the dtype argument
Code:
import numpy as np

arr = np.arange(1,11, dtype= np.float32)

print(arr)

Output:
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]

Data Analytic
17

Manipulating Array Shapes


• The reshape() function changes the shape of the array without changing its data
• The flatten() and ravel() functions both transform an n-dimensional array into a one-dimensional array
• The flatten() function always returns a copy of the data
• The ravel() function returns a view of the original array whenever possible
• The ravel() function is usually faster than flatten() because it avoids allocating extra memory
• The transpose() function converts rows into columns and columns into rows
• The resize() function changes the size of the NumPy array
• Similar to reshape(), but the resize() method modifies the original array in place
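
The difference between flatten() and ravel() is easiest to see by modifying their results. The following sketch uses np.shares_memory() to check whether the result still points at the original data; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

flat_copy = arr.flatten()   # independent copy
flat_view = arr.ravel()     # view on the same data (array is contiguous)

print(np.shares_memory(arr, flat_copy))
print(np.shares_memory(arr, flat_view))

# Writing through the view changes the original array
flat_view[0] = 100
print(arr[0, 0])

# Writing to the copy leaves the original untouched
flat_copy[1] = -1
print(arr[0, 1])
```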

Data Analytic
18

Manipulating Array Shapes


Create One-dimensional arrays
Code:
import numpy as np

# Create an array
arr = np.arange(12)

Data Analytic
19

Manipulating Array Shapes


Code:
# Reshape the array dimension
new_arr = arr.reshape(4,3)

print(new_arr)

Output:
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]

Data Analytic
20

Manipulating Array Shapes


Code:
# Reshape the array dimension
new_arr2 = arr.reshape(3,4)

print(new_arr2)

Output:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]

Data Analytic
21

Manipulating Array Shapes


Create Two-dimensional arrays
Code:
import numpy as np

# Create an array
arr = np.arange(1,10).reshape(3,3)
print(arr)

Output:
[[1 2 3]
[4 5 6]
[7 8 9]]

Data Analytic
22

Manipulating Array Shapes


Code:
# flatten the array
print(arr.flatten())

Output:
[1 2 3 4 5 6 7 8 9]

Data Analytic
23

Manipulating Array Shapes


Code:
# Transpose the matrix
print(arr.transpose())

Output:
[[1 4 7]
[2 5 8]
[3 6 9]]

Data Analytic
24

Manipulating Array Shapes


Code:
# resize the matrix
arr.resize(1,9)
print(arr)

Output:
[[1 2 3 4 5 6 7 8 9]]

Data Analytic
25

Stacking Arrays
• Stacking means joining arrays with matching dimensions along a chosen axis
Horizontal Stacking

• Arrays are joined along the horizontal axis using the hstack() function or concatenate() with axis=1

Vertical Stacking

• Arrays are joined along the vertical axis using the vstack() function or concatenate() with axis=0

Depth Stacking

• Arrays are joined along a third axis (depth) using the dstack() function

Column Stacking

• Multiple one-dimensional arrays are stacked as columns into a single two-dimensional array using the column_stack() function

Data Analytic
26

Horizontal Stacking
Code:
# Create one 3x3 array
arr1 = np.arange(1,10).reshape(3,3)
print(arr1)

# Create a second array of the same shape
arr2 = 2*arr1
print(arr2)

# Perform horizontal stacking along the x axis
arr3 = np.hstack((arr1, arr2))
print(arr3)

Output:
1st array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
2nd array:
[[ 2  4  6]
 [ 8 10 12]
 [14 16 18]]
Horizontal stacking array:
[[ 1  2  3  2  4  6]
 [ 4  5  6  8 10 12]
 [ 7  8  9 14 16 18]]

Data Analytic
27

Horizontal Stacking
Code:
# Horizontal stacking using concatenate() function
arr4 = np.concatenate((arr1, arr2), axis=1)
print(arr4)

Generate the horizontal stacking with axis parameter value 1

Output:
Horizontal stacking array:
[[ 1  2  3  2  4  6]
 [ 4  5  6  8 10 12]
 [ 7  8  9 14 16 18]]

Data Analytic
28

Vertical Stacking
Code:
# Vertical stacking: arrays are joined along the vertical axis
arr5 = np.vstack((arr1, arr2))
print(arr5)

# Vertical stacking using concatenate() with axis parameter value 0
arr6 = np.concatenate((arr1, arr2), axis=0)
print(arr6)


Output:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[ 2 4 6]
[ 8 10 12]
[14 16 18]]
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[ 2 4 6]
[ 8 10 12]
[14 16 18]]

Data Analytic
29

Depth Stacking
Code:
# Depth stacking: arrays are joined along a third axis (depth)
arr7 = np.dstack((arr1, arr2))
print(arr7)

Output:
[[[ 1 2]
[ 2 4]
[ 3 6]]

[[ 4 8]
[ 5 10]
[ 6 12]]

[[ 7 14]
[ 8 16]
[ 9 18]]]

Data Analytic
30

Column Stacking
Code:
# Create 1-D array
arr1 = np.arange(4,7)
print(arr1)

# Create a second 1-D array
arr2 = 2 * arr1
print(arr2)

# Create column stack: stacks multiple one-dimensional arrays
# as columns into a single two-dimensional array
arr_col_stack = np.column_stack((arr1,arr2))
print(arr_col_stack)

Output:
[4 5 6]
[ 8 10 12]
[[ 4  8]
 [ 5 10]
 [ 6 12]]

The two one-dimensional arrays are stacked column-wise.

Data Analytic
31

Partitioning Arrays
• Arrays can be partitioned into multiple sub-arrays
• Vertical, horizontal, and depth-wise split functionality
• Split into the same size arrays but can also specify the split location

• Horizontal splitting
• Divided into N equal sub-arrays along the horizontal axis using the hsplit() function
• Vertical splitting
• Divided into N equal subarrays along the vertical axis using the vsplit() and split()
functions

Data Analytic
32

Horizontal Splitting
Code:
# Create an array
arr = np.arange(1,10).reshape(3,3)
print(arr)

# Perform horizontal splitting: divides the array into three sub-arrays,
# each part being a column of the original array
arr_hor_split = np.hsplit(arr, 3)
print(arr_hor_split)

Output:
[[1 2 3]
[4 5 6]
[7 8 9]]
[array([[1],
[4],
[7]]),
array([[2],
[5],
[8]]),
array([[3],
[6],
[9]])]

Data Analytic
33

Vertical Splitting
Code:
# Vertical split: divides the array into three sub-arrays,
# each part being a row of the original array
arr_ver_split = np.vsplit(arr, 3)
print(arr_ver_split)

# split() with axis=0 performs the same operation as vsplit()
arr_split = np.split(arr,3,axis=0)
print(arr_split)

Output:
[array([[1, 2, 3]]), array([[4, 5, 6]]), array([[7, 8, 9]])]
[array([[1, 2, 3]]), array([[4, 5, 6]]), array([[7, 8, 9]])]
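
The partitioning overview mentions that a split location can be specified instead of a number of equal parts. A minimal sketch of that variant follows: passing a list of indices to hsplit() or vsplit() cuts the array at those positions. The names left, right, top, and bottom are illustrative.

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

# Cut before column index 1: first column vs. remaining two columns
left, right = np.hsplit(arr, [1])
print(left)
print(right)

# Cut before row index 2: first two rows vs. last row
top, bottom = np.vsplit(arr, [2])
print(top)
print(bottom)
```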

Data Analytic
34

Changing Data Type


• The astype() function converts the data type of the array
Code:
# Create an array
arr = np.arange(1,10).reshape(3,3)
print("Integer Array:", arr)

# Change datatype of array


arr = arr.astype(float)

# print array
print("Float Array:", arr)

# Check new data type of array


print("Changed Datatype:", arr.dtype)

Output:
Integer Array: [[1 2 3]
[4 5 6]
[7 8 9]]
Float Array: [[1. 2. 3.]
[4. 5. 6.]
[7. 8. 9.]]
Changed Datatype: float64
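
Converting in the other direction, from float to integer, discards the fractional part rather than rounding. The sketch below illustrates this with made-up values; note that astype() returns a new array and leaves the original unchanged.

```python
import numpy as np

values = np.array([1.7, 2.2, -1.7])

# astype(int) truncates toward zero and returns a new array
as_int = values.astype(int)
print(as_int)

# The original array keeps its float data type
print(values.dtype)
```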

Data Analytic
35

Changing Data Type


• The tolist() function converts a NumPy array into a Python list
Code:
# Create an array
arr = np.arange(1,10)

# Convert NumPy array to Python list
# (named py_list to avoid shadowing the built-in list type)
py_list = arr.tolist()
print(py_list)

The result is a Python list object


Output:
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Data Analytic
36

Views & Copies

Views
• A view refers to the original base array and is treated as a shallow copy
• A view uses the same memory content as the original array
• Modifications in a view affect the original data
• Views use the concept of shared memory

Copies
• A copy is a separate object and is treated as a deep copy
• A copy stores the array in another location
• Modifications in a copy do not affect the original array
• Copies require extra space compared to views
• Copies are slower than views

Data Analytic
37

Views & Copies


[Figure: Shallow Copy vs. Deep Copy, showing Data X and Copy of Data X on the heap]

Data Analytic
38

Creating Views & Copies


Code:
# Create NumPy Array
arr = np.arange(1,5).reshape(2,2)
print(arr)
# Create no copy only assignment
arr_no_copy = arr
# Create Deep Copy
arr_copy = arr.copy()
# Create shallow copy using View
arr_view = arr.view()
print("Original Array : ",id(arr))
print("Assignment : ",id(arr_no_copy))
print("Deep Copy : ",id(arr_copy))
print("Shallow Copy(View): ",id(arr_view))

Output:
[[1 2]
 [3 4]]
Original Array : 2231251409616
Assignment : 2231251409616
Deep Copy : 2232047625840
Shallow Copy(View): 2232047625552

The original array and the assigned array have the same object ID, meaning both are pointing to the same object.
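
Object IDs alone do not show which objects share data. The sketch below, under the same setup as the slide, modifies the view and checks the effect on the original and on the deep copy, using np.shares_memory() as an additional check.

```python
import numpy as np

arr = np.arange(1, 5).reshape(2, 2)
arr_copy = arr.copy()   # deep copy: own data
arr_view = arr.view()   # shallow copy: shared data

# A change made through the view is visible in the original
arr_view[0, 0] = 99
print(arr[0, 0])

# The deep copy still holds the old value
print(arr_copy[0, 0])

print(np.shares_memory(arr, arr_view))
print(np.shares_memory(arr, arr_copy))
```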

Data Analytic
39

Slicing Arrays
• Slicing in NumPy is similar to Python lists
• Indexing is used to select a single value
• Slicing is used to select multiple values from an array
• NumPy arrays also support negative indexing and slicing
• The negative sign indicates the opposite direction and indexing starts from the right-hand
side with a starting value of -1

Negative index: -10  -9  -8  -7  -6  -5  -4  -3  -2  -1
Positive index:   0   1   2   3   4   5   6   7   8   9
Array values:     0   1   2   3   4   5   6   7   8   9

Data Analytic
40

Slicing Arrays
Code:
# Create NumPy Array
arr = np.arange(10)
print(arr)

print(arr[3:6])

Use the colon symbol to select the collection of values


Slicing takes three values: start, stop, and step


Output:
[0 1 2 3 4 5 6 7 8 9]
[3 4 5]

Data Analytic
41

Slicing Arrays
Code:
# Create NumPy Array
arr = np.arange(10)
print(arr)

print(arr[3:])

• Only the starting index is given (3 is the starting index)
• This slice operation selects the values from the starting index to the end of the array


Output:
[0 1 2 3 4 5 6 7 8 9]
[3 4 5 6 7 8 9]

Data Analytic
42

Slicing Arrays
Code:
# Create NumPy Array
arr = np.arange(10)
print(arr)

print(arr[-3:])

The slice operation will select values from the third value
from the right side of the array to the end of the array


Output:
[0 1 2 3 4 5 6 7 8 9]
[7 8 9]

Data Analytic
43

Slicing Arrays
Code:
# Create NumPy Array
arr = np.arange(10)
print(arr)

print(arr[2:7:2])

• The start, stop, and step index are 2, 7, and 2, respectively
• The slice operation selects values from index 2 up to index 6 (one less than the stop value) with an increment of 2 in the index value


Output:
[0 1 2 3 4 5 6 7 8 9]
[2 4 6]
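
Negative indices and the step value can be combined in one slice. The sketch below shows a negative start and stop, and a negative step that walks the array backwards; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(10)

# From the 7th value from the right up to (not including) the 2nd from the right
middle = arr[-7:-2]
print(middle)

# A negative step reverses the direction of the slice
reversed_arr = arr[::-1]
print(reversed_arr)

# Every third value
every_third = arr[::3]
print(every_third)
```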

Data Analytic
44

Broadcasting Arrays
• Python lists do not support direct vectorized arithmetic operations
• NumPy offers faster vectorized array operations compared to Python list loop-based operations
• All the looping operations are performed in C instead of Python, which makes them faster
• Broadcasting checks a set of rules for applying binary functions to arrays of different shapes
• Addition
• Subtraction
• Multiplication

Data Analytic
45

Broadcasting Arrays
Code:
import numpy as np

# Create NumPy Array


arr1 = np.arange(1,5).reshape(2,2)
print(arr1)

# Create another NumPy Array


arr2 = np.arange(5,9).reshape(2,2)
print(arr2)

# Add two matrices: the addition of two arrays of the same size
print(arr1+arr2)

Output:
[[1 2]
 [3 4]]
[[5 6]
 [7 8]]
[[ 6  8]
 [10 12]]

Data Analytic
46

Broadcasting Arrays
Code:
# Multiply two matrices
print(arr1*arr2)

# Add a scalar value


print(arr1 + 3)

# Multiply with a scalar value


print(arr1 * 3)

Output:
[[ 5 12]
[21 32]]
[[4 5]
[6 7]]
[[ 3 6]
[ 9 12]]
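
The slides show same-size arrays and scalars. Broadcasting also applies when the shapes differ but are compatible, as in this sketch with a (2, 3) matrix, a (3,) row, and a (2, 1) column; the arrays are illustrative and not taken from the slides.

```python
import numpy as np

matrix = np.arange(1, 7).reshape(2, 3)   # shape (2, 3)
row = np.array([10, 20, 30])             # shape (3,)
col = np.array([[100], [200]])           # shape (2, 1)

# The row is stretched across both rows of the matrix
row_sum = matrix + row
print(row_sum)

# The column is stretched across all three columns of the matrix
col_sum = matrix + col
print(col_sum)
```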

Data Analytic
47

Pandas DataFrames
Adhi Harmoko Saputro

Data Analytic
48

Pandas DataFrames
• The pandas library is designed to work with panel or tabular data
• Fast, highly efficient, and productive tool for manipulating and analyzing string, numeric, datetime, and time-series data
• Provides data structures such as DataFrames and Series

• A pandas DataFrame is a tabular, two-dimensional, labeled, and indexed data structure with a grid of rows and columns
• Its columns can have heterogeneous types
• Has the capability to work with different types of objects, carry out grouping and joining operations, handle missing values, create pivot tables, and deal with dates

Data Analytic
49

Series & DataFrames


• A Series is essentially a column
• A DataFrame is a two-dimensional table made up of a collection of Series

Series        Series         DataFrame

   Apple         Orange         Apple  Orange
0      3      0       0      0      3       0
1      2      1       3      1      2       3
2      0      2       7      2      0       7
3      1      3       2      3      1       2
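
The Apple and Orange figure can be reproduced in a few lines. This is a sketch that assumes the two columns are built as separate Series and then combined with pd.concat(); other constructions would work equally well.

```python
import pandas as pd

# Two Series, each representing one column
apple = pd.Series([3, 2, 0, 1], name='Apple')
orange = pd.Series([0, 3, 7, 2], name='Orange')

# Combine them side by side into one DataFrame
df = pd.concat([apple, orange], axis=1)
print(df)

# Selecting one column gives back a Series
print(type(df['Apple']))
```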

Data Analytic
50

Create Empty DataFrame


Code:
# Import pandas library
import pandas as pd

# Create empty DataFrame


df = pd.DataFrame()

# Header of dataframe.
df.head()

Output:
An empty DataFrame

Data Analytic
51

Create DataFrame
Code:
# Use a dictionary of lists to create a DataFrame
data = {'Name': ['Vijay', 'Sundar', 'Satyam', 'Indira'], 'Age': [23, 45, 46, 52 ]}

# Create the pandas DataFrame
df = pd.DataFrame(data)

# Header of dataframe.
df.head()

Output:
     Name  Age
0   Vijay   23
1  Sundar   45
2  Satyam   46
3  Indira   52

• The keys of the dictionary are equivalent to columns
• Each value is a list holding that column's entries, one per row of the DataFrame

Data Analytic
52

Create DataFrame
Code:
# Pandas DataFrame by lists of dicts.
# Initialise data as a list of dictionaries:
#   each item is a dictionary, each key is the name of a column,
#   and each value is the cell value for a row
data =[ {'Name': 'Vijay', 'Age': 23},{'Name': 'Sundar', 'Age': 25},{'Name': 'Shankar', 'Age': 26}]

# Create the DataFrame using the list of dictionaries
df = pd.DataFrame(data,columns=['Name','Age'])

# Print dataframe header
df.head()

Output:
      Name  Age
0    Vijay   23
1   Sundar   25
2  Shankar   26

Data Analytic
53

Create DataFrame using list of tuples


Code:
# Creating DataFrame using list of tuples:
#   each item is a tuple, and each tuple is equivalent to one row
data = [('Vijay', 23),( 'Sundar', 45), ('Satyam', 46), ('Indira',52)]

# Create the DataFrame using the list of tuples
df = pd.DataFrame(data, columns=['Name','Age'])

# Print dataframe header
df.head()

Output:
     Name  Age
0   Vijay   23
1  Sundar   45
2  Satyam   46
3  Indira   52

Data Analytic
54

Pandas Series
• Pandas Series is a one-dimensional sequential data structure
• Able to handle any type of data, such as string, numeric, datetime, Python lists, and
dictionaries with labels and indexes
• Series is one of the columns of a DataFrame
• Create a Series using a Python dictionary, NumPy array, and scalar value

Data Analytic
55

Pandas Series using Python Dictionary


• Create a dictionary object and pass it to the Series object
Code:
# Creating Pandas Series using Dictionary
dict1 = {0 : 'Ajay', 1 : 'Jay', 2 : 'Vijay'}

# Create Pandas Series


series = pd.Series(dict1)

# Show series
series

Output:
0 Ajay
1 Jay
2 Vijay
dtype: object

Data Analytic
56

Pandas Series using NumPy Array


• Create a NumPy array object and pass it to the Series object
Code:
# load Pandas and NumPy
import pandas as pd
import numpy as np

# Create NumPy array


arr = np.array([51,65,48,59, 68])

# Create Pandas Series


series = pd.Series(arr)
series

Output:
0 51
1 65
2 48
3 59
4 68
dtype: int32

Data Analytic
57

Pandas Series using Single Scalar Value


• To create a pandas Series with a scalar value, pass the scalar value and an index list to a Series object
Code:
# load Pandas and NumPy
import pandas as pd
import numpy as np

# Create Pandas Series


series = pd.Series(10, index=[0, 1, 2, 3, 4, 5])
series

Output:
0 10
1 10
2 10
3 10
4 10
5 10
dtype: int64

Data Analytic
58

Pandas Series from CSV File


Code:
# Import pandas
import pandas as pd

# Read the CSV file using the read_csv() function
df = pd.read_csv("WHO_first9cols.csv")

# Show initial 5 records
df.head()

Data Analytic
59

Pandas Series from CSV File


Code:
# Import pandas
import pandas as pd

# Load data using read_csv()
df = pd.read_csv("WHO_first9cols.csv")

# Show last 5 records
df.tail()

Data Analytic
60

Pandas Series
• The pandas Series data structure shares some of the common attributes of DataFrames and
also has a name attribute
Code:
# Show the shape of DataFrame
print("Shape:", df.shape)

Output:
Shape: (202, 9)

Data Analytic
61

Pandas Series
• Check the column list of a DataFrame:
Code:
# Check the column list of DataFrame
print("List of Columns:", df.columns)

Output:
List of Columns: Index(['Country', 'CountryID', 'Continent', 'Adolescent fertility rate (%)',
'Adult literacy rate (%)',
'Gross national income per capita (PPP international $)',
'Net primary school enrolment ratio female (%)',
'Net primary school enrolment ratio male (%)',
'Population (in thousands) total'],
dtype='object')

Data Analytic
62

Pandas Series
• Check the data types of DataFrame columns
Code:
# Show the datatypes of columns
print("Data types:", df.dtypes)

Output:
Data types: Country object
CountryID int64
Continent int64
Adolescent fertility rate (%) float64
Adult literacy rate (%) float64
Gross national income per capita (PPP international $) float64
Net primary school enrolment ratio female (%) float64
Net primary school enrolment ratio male (%) float64
Population (in thousands) total float64
dtype: object

Data Analytic
63

Slicing Pandas Series


Code:
# Import pandas
import pandas as pd

# Load data using read_csv()


df = pd.read_csv("WHO_first9cols.csv")

# Select a series
country_series=df['Country']

# Pandas Series Slicing


country_series[-5:]

Output:
197 Vietnam
198 West Bank and Gaza
199 Yemen
200 Zambia
201 Zimbabwe
Name: Country, dtype: object

Data Analytic
64

Dealing with Dates


• Dealing with dates is messy and complicated
• Dates come up frequently in time-series datasets
• pandas offers date ranges, resampling of time-series data, and date arithmetic operations
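
Date arithmetic and resampling are named here but not demonstrated on the following slides. The sketch below uses made-up data (the values 0 to 13 on 14 consecutive days) to illustrate both; the 7-day bin size is an arbitrary choice for the example.

```python
import pandas as pd

# Date arithmetic: add 45 days to a timestamp
start = pd.Timestamp('2023-09-01')
end = start + pd.Timedelta(days=45)
print(end)

# Resampling: aggregate 14 daily values into 7-day bins
days = pd.date_range(start, periods=14, freq='D')
daily = pd.Series(range(14), index=days)
weekly = daily.resample('7D').sum()
print(weekly)
```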

Data Analytic
65

Pandas Date Range


Code:
import pandas as pd

# Date range function: generates a sequence of dates with a
# fixed-frequency interval. Here it creates a range of dates starting
# from September 1, 2023, lasting for 45 days
pd.date_range('09-01-2023', periods=45, freq='D')

Output:
DatetimeIndex(['2023-09-01', '2023-09-02', '2023-09-03', '2023-09-04',
'2023-09-05', '2023-09-06', '2023-09-07', '2023-09-08',
'2023-09-09', '2023-09-10', '2023-09-11', '2023-09-12',
'2023-09-13', '2023-09-14', '2023-09-15', '2023-09-16',
'2023-09-17', '2023-09-18', '2023-09-19', '2023-09-20',
'2023-09-21', '2023-09-22', '2023-09-23', '2023-09-24',
'2023-09-25', '2023-09-26', '2023-09-27', '2023-09-28',
'2023-09-29', '2023-09-30', '2023-10-01', '2023-10-02',
'2023-10-03', '2023-10-04', '2023-10-05', '2023-10-06',
'2023-10-07', '2023-10-08', '2023-10-09', '2023-10-10',
'2023-10-11', '2023-10-12', '2023-10-13', '2023-10-14',
'2023-10-15'],
dtype='datetime64[ns]', freq='D')

Data Analytic
66

Create Dates Range


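• date_range() freq parameters
• B for business day frequency
• W for weekly frequency
• H for hourly frequency
• T (or min) for minute frequency; M denotes month-end frequency, not minutes
• S for second frequency
• L for millisecond frequency
• U for microsecond frequency
• Recent pandas releases prefer the lowercase aliases h, min, s, ms, and us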

Pandas Documentation:
https://pandas.pydata.org/docs/user_guide/timeseries.html
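
As a small illustration of the freq parameter, the sketch below generates business days and weekly dates from the same start date. It relies on two facts: September 1, 2023 is a Friday, and the W alias anchors on Sundays by default.

```python
import pandas as pd

# Business days skip Saturday and Sunday
business_days = pd.date_range('2023-09-01', periods=5, freq='B')
print(business_days)

# Weekly frequency yields the Sundays on or after the start date
weekly = pd.date_range('2023-09-01', periods=3, freq='W')
print(weekly)
```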
Data Analytic
67

Converts Date
Code:
# Convert argument to datetime
pd.to_datetime('1/1/1970')

Converts a timestamp string into datetime

Output:
Timestamp('1970-01-01 00:00:00')

Data Analytic
68

Converts Date
Code:
# Convert argument to datetime in specified format
pd.to_datetime(['20200101', '20200102'], format='%Y%m%d')

Convert a timestamp string into a datetime object in the


specified format

Output:
DatetimeIndex(['2020-01-01', '2020-01-02'], dtype='datetime64[ns]', freq=None)

Data Analytic
69

Handling an unknown format string


Code:
# Raises an error because 'not a date' cannot be parsed
pd.to_datetime(['20200101', 'not a date'])

# Handle the error: with errors='coerce', unparseable values become NaT
pd.to_datetime(['20200101', 'not a date'], errors='coerce')

Output:
ParserError: Unknown string format: not a date present at position 1
DatetimeIndex(['2020-01-01', 'NaT'], dtype='datetime64[ns]', freq=None)
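
After coercing, the invalid entries can be located or dropped. The following sketch uses an illustrative list of strings and an explicit format; isna() and dropna() are standard methods on the resulting DatetimeIndex.

```python
import pandas as pd

raw = ['20200101', 'not a date', '20200103']

# Unparseable strings are converted to NaT instead of raising an error
parsed = pd.to_datetime(raw, format='%Y%m%d', errors='coerce')
print(parsed)

# Flag the entries that could not be parsed
print(parsed.isna())

# Keep only the valid dates
valid = parsed.dropna()
print(valid)
```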

Data Analytic
70

Thank You
Adhi Harmoko Saputro

Data Analytic
