
Pandas DataFrame to_xml() Method
The Python Pandas library provides the DataFrame.to_xml() method to convert the contents of a DataFrame into an XML document. The result can be written to a file or buffer, or returned as a string, and several parameters control the structure and format of the XML output.
XML (Extensible Markup Language) is a widely used format for data interchange and storage. Using this method, you can generate structured and hierarchical representations of tabular data.
Syntax
The syntax of the to_xml() method is as follows −
DataFrame.to_xml(path_or_buffer=None, *, index=True, root_name='data', row_name='row', na_rep=None, attr_cols=None, elem_cols=None, namespaces=None, prefix=None, encoding='utf-8', xml_declaration=True, pretty_print=True, parser='lxml', stylesheet=None, compression='infer', storage_options=None)
Parameters
The Python Pandas to_xml() method accepts the following parameters −
- path_or_buffer: The file path or buffer to write the XML output to. If None, the XML is returned as a string instead of being saved to a file.
- index: Boolean that determines whether to include the DataFrame index in the XML. Defaults to True.
- root_name: The name of the root element in the XML. Defaults to 'data'.
- row_name: The name of each row element. Defaults to 'row'.
- na_rep: Representation for missing values (NaN). Defaults to None.
- attr_cols: List of columns to be written as attributes in the row elements.
- elem_cols: List of columns to be written as child elements of the row element.
- namespaces: Dictionary defining XML namespaces.
- prefix: Namespace prefix for elements and attributes. It should be one of the keys in namespaces.
- encoding: Encoding for the XML output. Defaults to 'utf-8'.
- xml_declaration: Boolean that determines whether to include the XML declaration. Defaults to True.
- pretty_print: Whether to pretty-print the XML with indentation and line breaks. Defaults to True.
- parser: Specifies the parser module to use ('lxml' or 'etree'). Default is 'lxml'.
- stylesheet: An optional XSLT stylesheet to transform the XML. This requires the 'lxml' parser.
- compression: Compression options for the output file. Defaults to 'infer'.
- storage_options: Additional options for storage connection.
Note − The to_xml() method does not support advanced XML features such as DTD, CData, XSD schemas, or processing instructions. It supports namespaces only at the root level, and the layout of the output can be changed with the stylesheet parameter.
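The namespaces and prefix parameters described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the namespace URI and the 'emp' prefix are placeholders invented for the example, and parser='etree' is chosen only so that the code runs without lxml installed.

```python
import xml.etree.ElementTree as ET
import pandas as pd

df = pd.DataFrame({'Name': ['Kiran', 'Priya'], 'Age': [25, 30]})

# Placeholder namespace, made up for this illustration
namespaces = {'emp': 'https://example.com/employees'}

# prefix must be one of the keys of the namespaces dictionary
xml_data = df.to_xml(index=False, namespaces=namespaces, prefix='emp', parser='etree')
print(xml_data)

# Parse the result to confirm that the elements are namespace-qualified
root = ET.fromstring(xml_data.encode('utf-8'))
child_tags = [child.tag for child in root[0]]
print(root.tag)
print(child_tags)
```

The namespace is declared once on the root element, and the prefix is then applied to the root, row, and column elements.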
Return Value
The to_xml() method returns an XML string if path_or_buffer is not specified. Otherwise, it writes the XML to the given file path or buffer and returns None.
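The two return behaviours can be checked with a short sketch. The file name 'people.xml' and the temporary directory are assumptions made for the illustration, and parser='etree' is used so that lxml is not needed.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'Name': ['Kiran', 'Priya'], 'Age': [25, 30]})

# No path given: the XML document comes back as a string
xml_string = df.to_xml(parser='etree')
print(type(xml_string))

# Path given: the document is written to disk and None is returned
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'people.xml')  # hypothetical file name
    result = df.to_xml(file_path, parser='etree')
    file_exists = os.path.exists(file_path)

print(result, file_exists)
```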
Example: Converting a DataFrame to an XML String
Here is a basic example that demonstrates how the DataFrame.to_xml() method converts a Pandas DataFrame to an XML string.
import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'],
        'Age': [25, 30, 35],
        'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Convert to XML
xml_data = df.to_xml()
print('Output XML String:')
print(xml_data)
Output of the above code is as follows −
Output XML String:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <Name>Kiran</Name>
    <Age>25</Age>
    <City>New Delhi</City>
  </row>
  <row>
    <index>1</index>
    <Name>Priya</Name>
    <Age>30</Age>
    <City>Hyderabad</City>
  </row>
  <row>
    <index>2</index>
    <Name>Naveen</Name>
    <Age>35</Age>
    <City>Chennai</City>
  </row>
</data>
Example: Saving DataFrame to an XML File
The following example shows how to save a Pandas DataFrame as an XML file. Here we pass the file path to the to_xml() method.
import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'],
        'Age': [25, 30, 35],
        'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Save the XML to a file
df.to_xml('output.xml')
print("DataFrame saved to 'output.xml'")
Following is the output of the above code −
DataFrame saved to 'output.xml'
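The compression parameter also applies when saving to a file. The sketch below assumes a file name ending in '.gz', from which the default compression='infer' selects gzip; the temporary directory and file name are illustrative choices, and parser='etree' avoids the lxml dependency.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET
import pandas as pd

data = {'Name': ['Kiran', 'Priya', 'Naveen'],
        'Age': [25, 30, 35],
        'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = os.path.join(tmp_dir, 'output.xml.gz')  # hypothetical file name

    # compression='infer' (the default) picks gzip from the '.gz' suffix
    df.to_xml(gz_path, parser='etree')

    # Decompress the file and parse it to confirm the content is valid XML
    with gzip.open(gz_path, 'rb') as fh:
        root = ET.fromstring(fh.read())

row_count = len(root.findall('row'))
print(root.tag, row_count)
```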
Example: Customizing Root and Row Names while Saving DataFrame as an XML
The following example demonstrates specifying custom names for the root and row elements of an XML string created from the Pandas DataFrame. For this you can use the root_name and row_name parameters of the to_xml() method.
import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'],
        'Age': [25, 30, 35],
        'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Customizing the root and row names
xml_data = df.to_xml(root_name='employees', row_name='employee')
print('Output XML String with custom root and row names:')
print(xml_data)
When we run the above program, it produces the following result −
Output XML String with custom root and row names:
<?xml version='1.0' encoding='utf-8'?>
<employees>
  <employee>
    <index>0</index>
    <Name>Kiran</Name>
    <Age>25</Age>
    <City>New Delhi</City>
  </employee>
  <employee>
    <index>1</index>
    <Name>Priya</Name>
    <Age>30</Age>
    <City>Hyderabad</City>
  </employee>
  <employee>
    <index>2</index>
    <Name>Naveen</Name>
    <Age>35</Age>
    <City>Chennai</City>
  </employee>
</employees>
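One way to check that a customized document is still readable is to load it back with pandas.read_xml(). The sketch below is an illustration under a few assumptions: index=False is passed so that the round trip returns only the original columns, the string is wrapped in StringIO before reading, and parser='etree' is used on both sides to avoid requiring lxml.

```python
from io import StringIO
import pandas as pd

data = {'Name': ['Kiran', 'Priya', 'Naveen'],
        'Age': [25, 30, 35],
        'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Write without the index, using custom root and row names
xml_data = df.to_xml(index=False, root_name='employees',
                     row_name='employee', parser='etree')

# read_xml's default xpath ('./*') selects every child of the root,
# so the custom row name needs no extra argument here
df_back = pd.read_xml(StringIO(xml_data), parser='etree')
print(df_back)
```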
Example: Converting a DataFrame to XML with Missing Values
This example demonstrates using the na_rep parameter to handle missing values in the DataFrame while converting it to XML.
import pandas as pd

# Create a DataFrame
data = {'Name': [None, 'Priya', 'Naveen'],
        'Age': [25, None, 35],
        'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Convert to XML with missing value representation
xml_data = df.to_xml(na_rep='NA')
print('Output XML String with missing value representation:')
print(xml_data)
Output of the above code is as follows −
Output XML String with missing value representation:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <Name>NA</Name>
    <Age>25.0</Age>
    <City>New Delhi</City>
  </row>
  <row>
    <index>1</index>
    <Name>Priya</Name>
    <Age>NA</Age>
    <City>Hyderabad</City>
  </row>
  <row>
    <index>2</index>
    <Name>Naveen</Name>
    <Age>35.0</Age>
    <City>Chennai</City>
  </row>
</data>
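For comparison, the sketch below shows what happens when na_rep is left at its default of None: the missing values are written as empty elements. The smaller DataFrame and the use of parser='etree' are choices made for this illustration only.

```python
import xml.etree.ElementTree as ET
import pandas as pd

df = pd.DataFrame({'Name': [None, 'Priya'], 'Age': [25, None]})

# na_rep is not set, so missing values become empty elements
xml_data = df.to_xml(index=False, parser='etree')
print(xml_data)

# Parse the result: an empty element has no text content
root = ET.fromstring(xml_data.encode('utf-8'))
first_row, second_row = root.findall('row')
print(first_row.find('Name').text)
print(second_row.find('Age').text)
```

The Age column holds a missing value, so it is stored as floating point and the remaining value is written as 25.0, just as in the na_rep example above.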
Example: Specifying Columns as Attributes in the XML
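This example demonstrates converting a Pandas DataFrame to XML with specific columns written as attributes of the row elements. When only attr_cols is given, the columns that are not listed are left out of the output.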
import pandas as pd

# Create a DataFrame
data = {'Name': ['Kiran', 'Priya', 'Naveen'],
        'Age': [25, 30, 35],
        'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Convert columns to attributes
xml_data = df.to_xml(attr_cols=['Name'])
print('Output XML String with specific columns as attributes:')
print(xml_data)
Following is the output of the above code −
Output XML String with specific columns as attributes:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row index="0" Name="Kiran"/>
  <row index="1" Name="Priya"/>
  <row index="2" Name="Naveen"/>
</data>
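The output above contains only the index and Name attributes. To keep the remaining columns, attr_cols can be combined with elem_cols, as in the following sketch. It assumes index=False to drop the index and parser='etree' so that lxml is not required.

```python
import xml.etree.ElementTree as ET
import pandas as pd

data = {'Name': ['Kiran', 'Priya', 'Naveen'],
        'Age': [25, 30, 35],
        'City': ['New Delhi', 'Hyderabad', 'Chennai']}
df = pd.DataFrame(data)

# Name becomes an attribute; Age and City become child elements
xml_data = df.to_xml(index=False, attr_cols=['Name'],
                     elem_cols=['Age', 'City'], parser='etree')
print(xml_data)

# Inspect the first row element of the parsed document
root = ET.fromstring(xml_data.encode('utf-8'))
first_row = root.find('row')
child_tags = [child.tag for child in first_row]
print(first_row.attrib, child_tags)
```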