Python Pandas - Feather File Format



The Feather file format in Pandas provides a fast and efficient way to store and retrieve DataFrame data in a binary format. It is optimized for high-performance I/O operations and is portable across different platforms.

What is the Feather File Format?

Feather is a binary columnar file format designed for efficient data storage and fast retrieval of tabular data. It supports all Pandas data types, including extension types like categorical and timezone-aware datetime types. The format is based on Apache Arrow's memory specification, enabling high-performance I/O operations.

The Feather format is a language-independent binary file format designed for efficient data exchange. It is supported by both Python and R, which makes it easy to share data between the two languages. It also offers fast reading and writing with low memory overhead.

Important Considerations

When working with Feather files in Pandas, you need to keep the following points in mind −

  • Index Storage: Pandas does not store DataFrame indices (Index or MultiIndex) in Feather files. If you need to keep the index, move it into a regular column with the reset_index() method before saving.

  • Column Names: Duplicate column names and non-string column names are not supported.

  • Object Data Types: Columns of object dtype that hold arbitrary Python objects (rather than strings) are not supported and raise an error during serialization.

Saving a Pandas DataFrame as a Feather File

To save a DataFrame to a Feather file, use the DataFrame.to_feather() method, which writes the DataFrame's data to the given path in Feather format.

Note: Before saving or retrieving data from a Feather file, make sure that the 'pyarrow' library is installed. It is an optional dependency of Pandas and can be installed with the following command −

pip install pyarrow
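If you are not sure whether pyarrow is available in the current environment, a quick check like the following sketch can help. It uses only the standard library's importlib.util.find_spec(), which looks up a module without importing it, and imports pyarrow to report its version only when the lookup succeeds.

```python
import importlib.util

# find_spec() returns None when the module cannot be found
has_pyarrow = importlib.util.find_spec("pyarrow") is not None

if has_pyarrow:
    import pyarrow
    print("pyarrow", pyarrow.__version__, "is installed")
else:
    print("pyarrow is missing; install it with: pip install pyarrow")
```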

Example

The following example uses the to_feather() method to save a Pandas DataFrame to a Feather file.

import pandas as pd
import numpy as np

# Create a sample DataFrame
df = pd.DataFrame({
    "a": list("abc"),
    "b": list(range(1, 4)),
    "c": np.arange(3, 6).astype("u1"),
    "d": np.arange(4.0, 7.0),
    "e": [True, False, True],
    "f": pd.Categorical(list("abc")),
    "g": pd.date_range("20240101", periods=3)
})
print("Original DataFrame:")
print(df)

# Save the DataFrame as a feather file
df.to_feather("df_feather_file.feather")

print("\nDataFrame is successfully saved as a feather file.")

When we run the above program, it produces the following result −

Original DataFrame:
   a  b  c    d      e  f          g
0  a  1  3  4.0   True  a 2024-01-01
1  b  2  4  5.0  False  b 2024-01-02
2  c  3  5  6.0   True  c 2024-01-03

DataFrame is successfully saved as a feather file.

Reading a Feather File into Pandas

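To load data from a Feather file into a DataFrame, use the read_feather() method. It accepts several optional parameters for customizing how data is read, such as columns (read only a subset of columns) and use_threads (parallelize reading using multiple threads).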

Example

This example reads a DataFrame from a Feather file using the Pandas read_feather() method.

import pandas as pd
import numpy as np

# Create a sample DataFrame
df = pd.DataFrame({
    "a": list("abc"),
    "b": list(range(1, 4)),
    "c": np.arange(3, 6).astype("u1"),
    "d": np.arange(4.0, 7.0),
    "e": [True, False, True],
    "f": pd.Categorical(list("abc")),
    "g": pd.date_range("20240101", periods=3)
})

# Save the DataFrame as a feather file
df.to_feather("df_feather_file.feather")

# Load the feather file
result = pd.read_feather("df_feather_file.feather")
# Display the DataFrame
print(result)

# Verify data types
print("\nData type of each column:")
print(result.dtypes)

While executing the above code, we get the following output −

   a  b  c    d      e  f          g
0  a  1  3  4.0   True  a 2024-01-01
1  b  2  4  5.0  False  b 2024-01-02
2  c  3  5  6.0   True  c 2024-01-03

Data type of each column:
a            object
b             int64
c             uint8
d           float64
e              bool
f          category
g    datetime64[ns]
dtype: object

Handling Feather Files in Memory

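In-memory files in Python store data in RAM instead of reading from or writing to a disk, which avoids the latency of physical I/O operations. Python and its ecosystem provide several kinds of in-memory file objects, including −

  • Memory-mapped files (the mmap module), which map a file on disk into memory

  • StringIO (io.StringIO, for text data)

  • BytesIO (io.BytesIO, for binary data)

  • MemoryFS (from the third-party PyFilesystem2 package)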

Example

This example saves a DataFrame in Feather format to an in-memory buffer and loads it back, using the to_feather() and read_feather() methods with an io.BytesIO object for in-memory binary storage.

import pandas as pd
import io

# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": range(5, 10)})
print("Original DataFrame:")
print(df)

# Save the DataFrame as In-Memory feather
buf = io.BytesIO()
df.to_feather(buf)

# Rewind the buffer so reading starts from the beginning
buf.seek(0)

# Read the DataFrame from the in-memory buffer
loaded_df = pd.read_feather(buf)
print("\nDataFrame Loaded from In-Memory Feather:")
print(loaded_df)

Following is the output of the above code −

Original DataFrame:
   Col_1  Col_2
0      0      5
1      1      6
2      2      7
3      3      8
4      4      9

DataFrame Loaded from In-Memory Feather:
   Col_1  Col_2
0      0      5
1      1      6
2      2      7
3      3      8
4      4      9