Pandas Series.str.decode() Method



The Series.str.decode() method in Pandas converts byte strings into regular (Unicode) strings using the specified encoding. This method is useful when working with encoded text data that needs to be decoded before analysis or processing.

This method is similar to the str.decode() method in Python 2 and the bytes.decode() method in Python 3, providing an easy way to handle encoded text data within a Pandas Series or Index.
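As a rough illustration of that correspondence, the sketch below decodes the same byte strings once with Python's built-in bytes.decode() and once with Series.str.decode(). The sample values and variable names are made up for this illustration.

```python
import pandas as pd

# Sample byte strings (illustrative values); the second one is UTF-8 encoded
raw = [b'pandas', b'caf\xc3\xa9']

# Element-by-element decoding with plain Python
decoded_python = [item.decode('utf-8') for item in raw]

# The same operation applied to a whole Series at once
decoded_pandas = pd.Series(raw).str.decode('utf-8')

print(decoded_python)
print(decoded_pandas.tolist())
```

Both approaches should yield the same strings; the Series version simply applies the decoding to every element without an explicit loop.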

Syntax

Following is the syntax of the Pandas Series.str.decode() method −

Series.str.decode(encoding, errors='strict')

Parameters

The Series.str.decode() method accepts the following parameters −

  • encoding − A string representing the name of the encoding used to decode the bytes.

  • errors − An optional string specifying the error handling scheme. The default is 'strict', which raises a UnicodeDecodeError when a byte sequence cannot be decoded. Other options include 'ignore', 'replace', and 'backslashreplace'.
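The effect of the errors parameter can be sketched as follows. This is a minimal illustration, assuming a Series that contains one byte string with non-ASCII bytes; the sample data and variable names are hypothetical.

```python
import pandas as pd

# b'caf\xc3\xa9' is UTF-8 encoded; its last two bytes are not valid ASCII
ser = pd.Series([b'abc', b'caf\xc3\xa9'])

# 'strict' (the default) raises an exception on undecodable bytes
try:
    ser.str.decode('ascii')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'ignore' silently drops the undecodable bytes
ignored = ser.str.decode('ascii', errors='ignore')

# 'replace' substitutes the replacement character U+FFFD for each bad byte
replaced = ser.str.decode('ascii', errors='replace')

print(strict_failed)
print(ignored.tolist())
print(replaced.tolist())
```

Note that 'ignore' and 'replace' lose information, so they are usually a fallback for data whose encoding is only partly known.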

Return Value

The Series.str.decode() method returns a Series or Index of the same type as the calling object, containing the decoded strings.
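The short sketch below checks the return type for an Index. It also shows a behaviour that is not stated above and should be treated as an observation about typical pandas versions rather than a guarantee: elements that are not bytes (for example, values that are already strings) appear to come back as missing values instead of raising an error. The sample data is hypothetical.

```python
import pandas as pd

# Calling the accessor on an Index gives back an Index
idx = pd.Index([b'a', b'b'])
decoded_idx = idx.str.decode('utf-8')

# A Series mixing a byte string with a value that is already a str
mixed = pd.Series([b'bytes', 'already text'])
decoded_mixed = mixed.str.decode('utf-8')

print(type(decoded_idx).__name__)
print(decoded_mixed)
```

If a decoded column unexpectedly contains missing values, checking whether some of the inputs were already strings is a reasonable first step.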

Example

In this example, we demonstrate the basic usage of the Series.str.decode() method by decoding a Series of byte strings using the 'ascii' encoding.

import pandas as pd

# Create a Series of byte strings
ser = pd.Series([b'Tutorialspoint', b'123', b'$'])

# Decode byte strings using 'ascii' encoding
result = ser.str.decode('ascii')

print("Input Series:")
print(ser)
print("\nSeries after calling str.decode('ascii'):")
print(result)

When we run the above code, it produces the following output −

Input Series:
0    b'Tutorialspoint'
1               b'123'
2                 b'$'
dtype: object

Series after calling str.decode('ascii'):
0    Tutorialspoint
1               123
2                 $
dtype: object

Example

This example demonstrates how to use the Series.str.decode() method to decode a column of byte strings in a DataFrame using the 'utf-8' encoding.

import pandas as pd

# Create a DataFrame with a column of byte strings
df = pd.DataFrame({ 'COLUMN1': [b'\xc2\xa9', b'\xe2\x82\xac', b'\xf0\x9f\x87\x80'] })

# Decode byte strings using 'utf-8' encoding
result = df['COLUMN1'].str.decode("utf-8")

print("Input DataFrame:")
print(df)
print("\nDataFrame column after calling str.decode('utf-8'):")
print(result)

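Following is the output of the above code. The third value decodes to U+1F1C0, an unassigned code point, so it may not render as a visible glyph −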

Input DataFrame:
               COLUMN1
0          b'\xc2\xa9'
1      b'\xe2\x82\xac'
2  b'\xf0\x9f\x87\x80'

DataFrame column after calling str.decode('utf-8'):
0    ©
1    €
2    🇀
Name: COLUMN1, dtype: object
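In practice, the decoded result is often written back to the DataFrame. A minimal sketch, reusing the column name from the example above with a shortened data set:

```python
import pandas as pd

# DataFrame with a column of UTF-8 encoded byte strings
df = pd.DataFrame({'COLUMN1': [b'\xc2\xa9', b'\xe2\x82\xac']})

# Overwrite the bytes column with its decoded text
df['COLUMN1'] = df['COLUMN1'].str.decode('utf-8')

print(df)
```

Assigning to the same column name replaces the byte strings in place; assigning to a new column name would keep both versions.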

Example

This example decodes a Series of UTF-8 encoded byte strings that represent symbol characters (two check marks and a cross).

import pandas as pd

# Create a Series of UTF-8 encoded byte strings representing symbol characters
ser = pd.Series([b'\xe2\x9c\x94', b'\xe2\x9c\x93', b'\xe2\x9c\x9c'])

# Decode byte strings using 'utf-8' encoding
result = ser.str.decode('utf-8')

print("Input Series:")
print(ser)
print("\nSeries after calling str.decode('utf-8'):")
print(result)

Following is the output of the above code −

Input Series:
0    b'\xe2\x9c\x94'
1    b'\xe2\x9c\x93'
2    b'\xe2\x9c\x9c'
dtype: object

Series after calling str.decode('utf-8'):
0    ✔
1    ✓
2    ✜
dtype: object