Python Pandas - CategoricalDtype

Pandas CategoricalDtype

In Pandas, CategoricalDtype defines the data type for categorical data, specifying categories and their ordering. This data type can be useful when working with categorical data in Series, DataFrames, and various Pandas operations.

Using CategoricalDtype provides better control over categorical data by explicitly defining the categories and their order. Because each value is stored as a small integer code that points into the list of categories, it can also reduce memory usage and improve performance on large datasets with many repeated values. This tutorial covers the structure of CategoricalDtype and practical examples of its use.
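The memory claim above can be checked with a small experiment. The following is a minimal sketch, assuming a made-up column of 300,000 labels drawn from only three distinct values; the variable names are illustrative, and the exact byte counts depend on your Pandas version and platform.

```python
import pandas as pd

# Hypothetical data: 300,000 rows containing only three distinct labels
values = ["low", "medium", "high"] * 100_000

# The same data stored as plain Python objects and as a categorical
as_object = pd.Series(values, dtype=object)
as_category = as_object.astype("category")

# deep=True also counts the memory used by the string objects themselves
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print("object dtype  :", object_bytes, "bytes")
print("category dtype:", category_bytes, "bytes")
```

The saving shrinks as the number of distinct values approaches the number of rows, so the categorical type pays off mainly for columns with many repeated values.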

CategoricalDtype Structure

A CategoricalDtype is fully described by −

categories: A sequence of unique values without missing entries.
ordered: A boolean indicating if the categories have an inherent order.
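Both parts of this description are exposed as attributes on the dtype object. The short sketch below uses an illustrative size_type dtype (the name and categories are assumptions chosen for the example) and reads the two attributes back.

```python
from pandas.api.types import CategoricalDtype

# Illustrative dtype with three categories in a meaningful order
size_type = CategoricalDtype(categories=["small", "medium", "large"], ordered=True)

# The categories are kept in a Pandas Index, in the order they were given
print(size_type.categories)

# The ordered flag is a plain boolean
print(size_type.ordered)
```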

Creating CategoricalDtype

You can create a CategoricalDtype using the pandas.api.types.CategoricalDtype class. This class defines a custom data type for categorical data, allowing you to control categories and their order explicitly.

Following is the syntax for creating the CategoricalDtype in Pandas −

from pandas.api.types import CategoricalDtype
cat_type = CategoricalDtype(categories=None, ordered=False)

Here,

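categories: This parameter takes a sequence of unique, non-null values defining the valid categories. The categories are stored in a Pandas Index, and if an Index is passed, its dtype is used. If the parameter is omitted (None), the categories are inferred from the data when the dtype is applied.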
ordered: It takes a boolean value indicating whether the categories have an order. By default it is set to False.
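To illustrate the defaults, the sketch below creates a CategoricalDtype without arguments and applies it to a small, made-up Series. The variable names are illustrative only.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# No arguments: categories=None and ordered=False
default_type = CategoricalDtype()
print(default_type.categories)   # None until the dtype is applied to data
print(default_type.ordered)

# Applying the dtype to data infers the categories from the values
s = pd.Series(["b", "a", "b"], dtype=default_type)
print(list(s.cat.categories))
print(s.cat.ordered)
```

After the dtype is applied, the Series carries its own dtype with the inferred categories, while default_type itself is left unchanged.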

Example: Applying CategoricalDtype to a Series

The following example demonstrates creating a Pandas Series object with the CategoricalDtype.

import pandas as pd
from pandas.api.types import CategoricalDtype

# Define custom CategoricalDtype
cat_type = CategoricalDtype(categories=["low", "medium", "high"], ordered=True)

# Create a Series with a defined categorical type
s = pd.Series(["low", "high", "medium", "low"], dtype=cat_type)

# Display the Series
print("Categorical Series:")
print(s)

Following is the output of the above code −

Categorical Series:
0       low
1      high
2    medium
3       low
dtype: category
Categories (3, object): ['low' < 'medium' < 'high']
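Because the categories above are ordered, sorting, minimum/maximum, and comparisons follow the declared order rather than alphabetical order. The following sketch rebuilds the same Series and shows these operations.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cat_type = CategoricalDtype(categories=["low", "medium", "high"], ordered=True)
s = pd.Series(["low", "high", "medium", "low"], dtype=cat_type)

# Sorting follows the category order: low < medium < high
print(s.sort_values().tolist())

# min() and max() are defined for ordered categoricals
print(s.min(), s.max())

# Comparison against a value that is one of the categories
print((s > "low").tolist())
```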

Example: Applying CategoricalDtype to a DataFrame

The following example shows how to apply CategoricalDtype to a DataFrame column.

import pandas as pd
from pandas.api.types import CategoricalDtype

# Define custom CategoricalDtype
cat_type = CategoricalDtype(categories=["small", "medium", "large"], ordered=True)

# Create a DataFrame
df = pd.DataFrame({"Size": ["large", "small", "medium", "large"]})

# Convert column to CategoricalDtype
df["Size"] = df["Size"].astype(cat_type)

# Display the DataFrame
print("DataFrame with Categorical Data:")
print(df['Size'])

When we run the above program, it produces the following result −

DataFrame with Categorical Data:
0     large
1     small
2    medium
3     large
Name: Size, dtype: category
Categories (3, object): ['small' < 'medium' < 'large']

Usage of CategoricalDtype in Pandas

A CategoricalDtype can be used wherever Pandas expects a dtype, such as −

pandas.read_csv()
DataFrame.astype()
pandas.Series() constructor
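As a sketch of the first item in this list, the code below passes a CategoricalDtype to pandas.read_csv() through its dtype parameter. The CSV text, the column names, and the size_type variable are made up for illustration, and an in-memory io.StringIO buffer stands in for a file on disk.

```python
import io
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical CSV content held in memory instead of a file
csv_text = "item,size\nshirt,large\nsocks,small\nhat,medium\n"

size_type = CategoricalDtype(categories=["small", "medium", "large"], ordered=True)

# Map the column name to the dtype it should be parsed as
df = pd.read_csv(io.StringIO(csv_text), dtype={"size": size_type})

print(df.dtypes)
print(df["size"].cat.categories.tolist())
```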

Example: Using CategoricalDtype with DataFrame.astype()

This example uses a CategoricalDtype with the Pandas DataFrame.astype() method to specify the data type of a DataFrame column.

import pandas as pd
from pandas.api.types import CategoricalDtype

# Creating a DataFrame
data = {'col1': ["duck", "wolf", 'cat']}
df = pd.DataFrame(data)

# Convert column to CategoricalDtype
custom_dtype = CategoricalDtype(categories=["duck", "cat", "wolf"], ordered=True)
df['col1'] = df['col1'].astype(custom_dtype)

# Display the DataFrame
print("DataFrame with Categorical Data:")
print(df['col1'])

While executing the above code we get the following output −

DataFrame with Categorical Data:
0    duck
1    wolf
2     cat
Name: col1, dtype: category
Categories (3, object): ['duck' < 'cat' < 'wolf']

Example: Default String Representation

As a shortcut, you can pass the string 'category' as the dtype instead of a CategoricalDtype instance. This gives the default behavior: unordered categories inferred from the data.

This example uses the 'category' shortcut to apply the categorical data type to a Pandas Series object.

import pandas as pd
from pandas.api.types import CategoricalDtype

# Create a Series using the 'category' shortcut (categories are inferred)
s = pd.Series(["low", "high", "medium", "low"], dtype='category')

# Display the Series
print("Categorical Series:")
print(s)

Following is the output of the above code −

Categorical Series:
0       low
1      high
2    medium
3       low
dtype: category
Categories (3, object): ['high', 'low', 'medium']

Comparing CategoricalDtype Instances

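Two CategoricalDtype instances are equal if they have the same categories and the same ordered setting. When both are unordered, the sequence in which the categories are listed does not matter. Every CategoricalDtype instance also compares equal to the string 'category'.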

Example

This example compares ordered and unordered CategoricalDtype instances to show the equality semantics of the categorical data type.

import pandas as pd
from pandas.api.types import CategoricalDtype

c1 = CategoricalDtype(['a', 'b', 'c'], ordered=False)

# Unordered categories - order does not matter
result1 = (c1 == CategoricalDtype(['b', 'c', 'a'], ordered=False))
print("Equality of two unordered same categories:", result1)

# Same categories, but ordered vs. unordered - considered unequal
result2 = (c1 == CategoricalDtype(['a', 'b', 'c'], ordered=True))
print("Equality of ordered category with an unordered one:", result2)

# Comparison with 'category' shortcut
print(c1 == 'category')

When we run the above program, it produces the following result −

Equality of two unordered same categories: True
Equality of ordered category with an unordered one: False
True
