Pandas DataFrame to_xml() Method



The Python Pandas library provides the DataFrame.to_xml() method to convert the contents of a DataFrame into an XML document. The result can be written to a file or buffer, or returned as a string. Its parameters control the structure and format of the output, including the element names, which columns become attributes or child elements, namespaces, and encoding.

XML (Extensible Markup Language) is a widely used format for data interchange and storage. Using this method, you can generate structured and hierarchical representations of tabular data.

Syntax

The syntax of the to_xml() method is as follows −

DataFrame.to_xml(path_or_buffer=None, *, index=True, root_name='data', row_name='row', na_rep=None, attr_cols=None, elem_cols=None, namespaces=None, prefix=None, encoding='utf-8', xml_declaration=True, pretty_print=True, parser='lxml', stylesheet=None, compression='infer', storage_options=None)

Parameters

The Python Pandas to_xml() method accepts the following parameters −

  • path_or_buffer: The file path or buffer to write the XML output to. If None, the XML is returned as a string instead of being saved.

  • index: Boolean that determines whether to include the DataFrame index in the XML. Defaults to True.

  • root_name: The name of the root element in the XML. Defaults to 'data'.

  • row_name: The name of each row element. Defaults to 'row'.

  • na_rep: Representation for missing values (NaN). Defaults to None.

  • attr_cols: List of columns to be written as attributes of the row element.

  • elem_cols: List of columns to be written as child elements of the row element.

  • namespaces: Dictionary of XML namespaces to declare on the root element, with prefixes as keys and URIs as values.

  • prefix: Namespace prefix applied to elements and attributes. It should be one of the keys in namespaces.

  • encoding: Encoding for the XML output. Defaults to 'utf-8'.

  • xml_declaration: Boolean that determines whether to include the XML declaration at the top of the document. Defaults to True.

  • pretty_print: Boolean that determines whether to format the XML with indentation and line breaks. Defaults to True.

  • parser: The parser module used to build the XML tree ('lxml' or 'etree'). Defaults to 'lxml'.

  • stylesheet: An optional XSLT stylesheet used to transform the XML output. It requires the 'lxml' parser.

  • compression: Compression to apply to the output file. Defaults to 'infer', which picks the compression from the file extension.

  • storage_options: Additional options for a storage connection, such as host, port, username, and password.
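
As a rough illustration of the formatting parameters listed above, the sketch below turns off the index, the XML declaration, and pretty-printing to get a compact, single-line document. The sample data is made up for this illustration, and parser='etree' is passed only so the snippet runs with Python's standard library when lxml is not installed.

```python
import pandas as pd

# Small sample DataFrame (illustrative data only)
df = pd.DataFrame({'Name': ['Kiran', 'Priya'], 'Age': [25, 30]})

# parser='etree' relies on Python's built-in XML library, so lxml is not needed here
compact_xml = df.to_xml(
    index=False,            # leave out the <index> elements
    xml_declaration=False,  # skip the <?xml ...?> header line
    pretty_print=False,     # no indentation or line breaks
    parser='etree',
)

print(compact_xml)
```

A compact form like this can be convenient when the XML is embedded in another message or sent over a network, while the pretty-printed default is easier to read.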

Note − The to_xml() method does not support advanced XML features such as DTD, CData, XSD schemas, or processing instructions. Namespaces are supported only at the root level. To change the layout of the output further, pass an XSLT script through the stylesheet parameter.
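
To make the root-level namespace support more concrete, here is a minimal sketch that declares one namespace and applies its prefix to the elements. The prefix 'emp' and the URI 'http://example.com/employees' are hypothetical values chosen for this illustration, and parser='etree' is used only so the snippet does not depend on lxml.

```python
import xml.etree.ElementTree as ET

import pandas as pd

df = pd.DataFrame({'Name': ['Kiran', 'Priya'], 'Age': [25, 30]})

# Hypothetical namespace: the prefix and URI are placeholders, not real resources
ns = {'emp': 'http://example.com/employees'}

# prefix must be one of the keys of the namespaces dictionary
xml_data = df.to_xml(namespaces=ns, prefix='emp', index=False, parser='etree')
print(xml_data)

# Parse the result back to confirm that the elements live in the namespace
root = ET.fromstring(xml_data.encode('utf-8'))
rows = root.findall('emp:row', ns)
first_name = rows[0].find('emp:Name', ns).text
print(root.tag, len(rows), first_name)
```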

Return Value

The to_xml() method returns the XML as a string if path_or_buffer is not specified. Otherwise, it writes the XML to the given file path or buffer and returns None.
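
The two behaviours can be checked with a short sketch like the one below. The file name 'people.xml' and the temporary directory are assumptions made for this illustration, and parser='etree' is passed only to avoid requiring lxml.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Kiran', 'Priya'], 'Age': [25, 30]})

# No path given: the XML document comes back as a string
as_string = df.to_xml(parser='etree')
print(type(as_string))

# Path given: the XML is written to the file and nothing is returned
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'people.xml')  # illustrative file name
    result = df.to_xml(file_path, parser='etree')
    file_exists = os.path.exists(file_path)

print(result, file_exists)
```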

Example: Converting a DataFrame to an XML String

Here is a basic example that demonstrates how the DataFrame.to_xml() method converts a Pandas DataFrame to an XML string.

import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Convert to XML
xml_data = df.to_xml()

print('Output XML String:')
print(xml_data)

Output of the above code is as follows −

Output XML String:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <Name>Kiran</Name>
    <Age>25</Age>
    <City>New Delhi</City>
  </row>
  <row>
    <index>1</index>
    <Name>Priya</Name>
    <Age>30</Age>
    <City>Hyderabad</City>
  </row>
  <row>
    <index>2</index>
    <Name>Naveen</Name>
    <Age>35</Age>
    <City>Chennai</City>
  </row>
</data>
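
Each row in the output above carries an index element because index defaults to True. If the index is not meaningful, it can be left out, as in the following sketch. It reuses the same sample data, and parser='etree' is an assumption made only so that the snippet runs without lxml.

```python
import pandas as pd

data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# index=False drops the <index> element from every row
xml_without_index = df.to_xml(index=False, parser='etree')

print(xml_without_index)
```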

Example: Saving DataFrame to an XML File

The following example shows how to save a Pandas DataFrame as an XML file. Here, the file path is passed as the first argument of the to_xml() method.

import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Save the XML to a file
df.to_xml('output.xml')
print("DataFrame saved to 'output.xml'")

The output of the above code is as follows −

DataFrame saved to 'output.xml'
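
Since the script only prints a confirmation message, one way to check what was written is to load the file back with pd.read_xml(). The sketch below writes to a temporary directory rather than the current folder, which is a choice made for this illustration. It also sets index=False so the restored columns match the original, and uses parser='etree' to avoid requiring lxml.

```python
import os
import tempfile

import pandas as pd

data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'output.xml')

    # Write the DataFrame without the index, then read the file back
    df.to_xml(file_path, index=False, parser='etree')
    df_restored = pd.read_xml(file_path, parser='etree')

print(df_restored)
```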

Example: Customizing Root and Row Names in the XML Output

The following example demonstrates how to specify custom names for the root and row elements of an XML string created from a Pandas DataFrame. To do this, use the root_name and row_name parameters of the to_xml() method.

import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Customizing the root and row names
xml_data = df.to_xml(root_name='employees', row_name='employee')

print('Output XML String with custom root and row names:')
print(xml_data)

When we run the above program, it produces the following result −

Output XML String with custom root and row names:
<?xml version='1.0' encoding='utf-8'?>
<employees>
  <employee>
    <index>0</index>
    <Name>Kiran</Name>
    <Age>25</Age>
    <City>New Delhi</City>
  </employee>
  <employee>
    <index>1</index>
    <Name>Priya</Name>
    <Age>30</Age>
    <City>Hyderabad</City>
  </employee>
  <employee>
    <index>2</index>
    <Name>Naveen</Name>
    <Age>35</Age>
    <City>Chennai</City>
  </employee>
</employees>

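Example: Converting a DataFrame to XML with Missing Values

This example demonstrates how to use the na_rep parameter to represent missing values in the DataFrame when converting it to XML. Note that the Age column contains a missing value, so Pandas stores it as floating-point numbers and the remaining ages appear as 25.0 and 35.0.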

import pandas as pd

# Create a DataFrame
data = {'Name': [None, 'Priya', 'Naveen'], 'Age': [25, None, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Convert to XML with missing value representation
xml_data = df.to_xml(na_rep='NA')

print('Output XML String with missing value representation:')
print(xml_data)

Output of the above code is as follows −

Output XML String with missing value representation:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <Name>NA</Name>
    <Age>25.0</Age>
    <City>New Delhi</City>
  </row>
  <row>
    <index>1</index>
    <Name>Priya</Name>
    <Age>NA</Age>
    <City>Hyderabad</City>
  </row>
  <row>
    <index>2</index>
    <Name>Naveen</Name>
    <Age>35.0</Age>
    <City>Chennai</City>
  </row>
</data>
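
For comparison, the sketch below leaves na_rep at its default of None, in which case missing values are written as empty elements instead of a placeholder string. It reuses the same sample data. The parsing step with xml.etree.ElementTree is added only to inspect the result, and parser='etree' is an assumption made so the snippet runs without lxml.

```python
import xml.etree.ElementTree as ET

import pandas as pd

data = {'Name': [None, 'Priya', 'Naveen'], 'Age': [25, None, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# na_rep is not set, so missing values become empty elements
xml_default = df.to_xml(index=False, parser='etree')
print(xml_default)

# Inspect the parsed document: an empty element has no text
root = ET.fromstring(xml_default.encode('utf-8'))
rows = root.findall('row')
missing_name = rows[0].find('Name').text
missing_age = rows[1].find('Age').text
print(missing_name, missing_age)
```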

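Example: Specifying Columns as Attributes in the XML

This example demonstrates how to convert a Pandas DataFrame to XML with specific columns written as attributes of the row element. Because only attr_cols is given, the columns that are not listed (Age and City) are left out of the output, while the index is still included as an attribute.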

import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Convert columns to attributes
xml_data = df.to_xml(attr_cols=['Name'])

print('Output XML String with specific columns as attributes:')
print(xml_data)

The output of the above code is as follows −

Output XML String with specific columns as attributes:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row index="0" Name="Kiran"/>
  <row index="1" Name="Priya"/>
  <row index="2" Name="Naveen"/>
</data>
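
In the output above, the Age and City columns are missing because they were not listed anywhere. A possible way to keep them is to pass elem_cols alongside attr_cols, as in this sketch. It reuses the same sample data, drops the index for brevity, and uses parser='etree' only so the snippet runs without lxml.

```python
import xml.etree.ElementTree as ET

import pandas as pd

data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Name becomes an attribute; Age and City become child elements
xml_mixed = df.to_xml(
    index=False,
    attr_cols=['Name'],
    elem_cols=['Age', 'City'],
    parser='etree',
)
print(xml_mixed)

# Inspect the first row to see how the columns were split
root = ET.fromstring(xml_mixed.encode('utf-8'))
first_row = root.find('row')
child_tags = [child.tag for child in first_row]
print(first_row.attrib, child_tags)
```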