Syntax:
Let’s look at the commonly used parameters that are passed to the pandas.read_xml() function:
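- path_or_buffer: The XML source. We can specify a file path (for example, a file name with the “.xml” extension), a URL, or a file-like object such that the XML is loaded into the DataFrame. XML that is held in a string should be wrapped in io.StringIO because passing a literal XML string directly is deprecated in recent Pandas versions (2.1 and later).
- xpath (By default = ‘./*’): The XPath expression that selects the nodes which become the rows of the DataFrame. The default selects every direct child of the root element. Specify a path when the row nodes are nested deeper or when only the nodes with a particular name should be read.
- namespaces (By default = None): A dictionary that maps the namespace prefixes to their URIs. It is needed when the xpath expression refers to the prefixed nodes. The namespaces are usually declared on the root element of the XML document.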
- encoding (By default = ‘utf-8’): This parameter refers to the encoding type of the XML document.
- attrs_only (By default = False): XML can be created with attributes/elements. If you want to create the Pandas DataFrame only with the attributes in the specified xpath, set this parameter to “True” (it will not consider the elements).
- elems_only (By default = False): XML can be created with attributes/elements. If you want to create the Pandas DataFrame only with the elements in the specified xpath, set this parameter to “True” (it will not consider the attributes).
Example 1: Read XML as String
Consider the XML document in a string (temperature_xml) with “data” as the root element and two “row” child elements. Each row holds two values: Country and Celsius. Use the pandas.read_xml() function to create the “Temperature” DataFrame from the temperature_xml.
import pandas
from io import StringIO
temperature_xml='''<?xml version='1.0' encoding='utf-8'?>
<data>
<row>
<Country>India</Country>
<Celsius>32</Celsius>
</row>
<row>
<Country>USA</Country>
<Celsius>43</Celsius>
</row>
</data>'''
# Read temperature_xml into Temperature DataFrame
# (StringIO wraps the string because recent Pandas versions deprecate literal XML strings)
Temperature=pandas.read_xml(StringIO(temperature_xml))
print(Temperature)
Output:
The “Temperature” DataFrame is created with two records.
Example 2: Read XML from File
Consider the XML document in a file (temp.xml) with “data” as the root element and two “row” child elements. Each row holds two values: Country and Celsius. Use the pandas.read_xml() function to create the “Temperature” DataFrame from temp.xml.
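import pandas
# Read temp.xml (located in the current working directory) into Temperature DataFrame
Temperature=pandas.read_xml('temp.xml')
print(Temperature)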
Output:
The “Temperature” DataFrame is created with two records.
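Since temp.xml is not provided with this article, the following is a minimal sketch that first writes the two-row document to a temporary file and then reads it back. The file content, the temporary directory, and the parser="etree" argument are assumptions that are made here so that the sketch runs on its own without the optional lxml package. The example above uses the default parser.

```python
import tempfile
from pathlib import Path

import pandas

# Assumed content of temp.xml: the same two-row document as in Example 1.
temperature_xml = """<?xml version='1.0' encoding='utf-8'?>
<data>
<row>
<Country>India</Country>
<Celsius>32</Celsius>
</row>
<row>
<Country>USA</Country>
<Celsius>43</Celsius>
</row>
</data>"""

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the document to a throwaway temp.xml file.
    xml_path = Path(tmp_dir) / "temp.xml"
    xml_path.write_text(temperature_xml, encoding="utf-8")
    # parser="etree" selects the standard library parser instead of lxml.
    Temperature = pandas.read_xml(str(xml_path), parser="etree")

print(Temperature)
```

The DataFrame is built inside the with block, so it remains usable after the temporary directory is removed.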
Example 3: Read XML with Namespaces
Consider the XML document in a string (temperature_xml) that declares the temp="https://temperaturedetails.com" namespace on its root element. Create the DataFrame from this XML.
import pandas
from io import StringIO
temperature_xml='''<?xml version='1.0' encoding='utf-8'?>
<temp:data xmlns:temp="https://temperaturedetails.com">
<temp:row>
<temp:Country>India</temp:Country>
<temp:Celsius>32</temp:Celsius>
</temp:row>
<temp:row>
<temp:Country>USA</temp:Country>
<temp:Celsius>43</temp:Celsius>
</temp:row>
</temp:data>'''
# namespaces parameter
Temperature=pandas.read_xml(StringIO(temperature_xml),namespaces={"temp": "https://temperaturedetails.com"})
print(Temperature)
Output:
The “Temperature” DataFrame is created with two records.
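The namespaces mapping matters most when the xpath expression itself refers to the prefixed nodes. As a minimal sketch, assuming that the document declares xmlns:temp on its root element, the rows can be selected explicitly with the “.//temp:row” path. The parser="etree" argument is an assumption that is added so that the sketch runs without the optional lxml package.

```python
from io import StringIO

import pandas

# Assumed document: the temp prefix is declared on the root element.
temperature_xml = """<?xml version='1.0' encoding='utf-8'?>
<temp:data xmlns:temp="https://temperaturedetails.com">
<temp:row>
<temp:Country>India</temp:Country>
<temp:Celsius>32</temp:Celsius>
</temp:row>
<temp:row>
<temp:Country>USA</temp:Country>
<temp:Celsius>43</temp:Celsius>
</temp:row>
</temp:data>"""

# The prefix used in xpath must be a key of the namespaces dictionary.
Temperature = pandas.read_xml(
    StringIO(temperature_xml),
    xpath=".//temp:row",
    namespaces={"temp": "https://temperaturedetails.com"},
    parser="etree",
)
print(Temperature)
```

The prefix in the dictionary only has to match the one that is used in the xpath expression. The URI is what ties it to the nodes of the document.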
Example 4: Xpath Parameter
Consider the XML document with the row name as “detail”. Use the xpath parameter to specify the path such that the values that are present under detail are inserted into the DataFrame as a row. The path is “.//detail”.
import pandas
from io import StringIO
temperature_xml='''<?xml version='1.0' encoding='utf-8'?>
<data>
<detail>
<Country>India</Country>
<Celsius>32</Celsius>
</detail>
<detail>
<Country>USA</Country>
<Celsius>43</Celsius>
</detail>
</data>'''
# xpath parameter
Temperature=pandas.read_xml(StringIO(temperature_xml),xpath=".//detail")
print(Temperature)
Output:
The “Temperature” DataFrame is created with two records, one for each detail node.
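The xpath parameter is most useful when the row nodes are not direct children of the root element. The following sketch assumes a hypothetical variant of the document in which each detail node is wrapped in a region node. The region element is invented here for illustration only, and parser="etree" is added so that the sketch runs without the optional lxml package.

```python
from io import StringIO

import pandas

# Hypothetical document: the detail rows sit one level deeper, inside region nodes.
nested_xml = """<?xml version='1.0' encoding='utf-8'?>
<data>
<region>
<detail>
<Country>India</Country>
<Celsius>32</Celsius>
</detail>
</region>
<region>
<detail>
<Country>USA</Country>
<Celsius>43</Celsius>
</detail>
</region>
</data>"""

# ".//detail" matches the detail nodes at any depth below the root.
Temperature = pandas.read_xml(StringIO(nested_xml), xpath=".//detail", parser="etree")
print(Temperature)
```

With the default path, the region nodes would be treated as the rows, so the explicit path is what brings the Country and Celsius values into the columns.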
Example 5: Attrs_Only Parameter
Consider the XML document with two rows. Each row has the “season” attribute and two elements which are “Country” and “Celsius”.
- Set the attrs_only parameter to “True” to create the “Temperature” DataFrame with only the attributes.
- Set the attrs_only parameter to “False” to create the “Temperature” DataFrame with all attributes and elements.
import pandas
from io import StringIO
temperature_xml='''<?xml version='1.0' encoding='utf-8'?>
<data>
<detail season="summer">
<Country>India</Country>
<Celsius>32</Celsius>
</detail>
<detail season="winter">
<Country>USA</Country>
<Celsius>43</Celsius>
</detail>
</data>'''
# xpath parameter & attrs_only
Temperature=pandas.read_xml(StringIO(temperature_xml),xpath=".//detail",attrs_only=True)
print(Temperature,"\n")
Temperature=pandas.read_xml(StringIO(temperature_xml),xpath=".//detail",attrs_only=False)
print(Temperature)
Output:
In the first output, the “Temperature” DataFrame is created with only the attributes of the XML (the season column). In the second output, “Temperature” is created with all elements and attributes.
Example 6: Elems_Only Parameter
- Set the elems_only parameter to “True” to create the “Temperature” DataFrame with only the elements.
- Set the elems_only parameter to “False” to create the “Temperature” DataFrame with all attributes and elements.
import pandas
from io import StringIO
temperature_xml='''<?xml version='1.0' encoding='utf-8'?>
<data>
<detail season="summer">
<Country>India</Country>
<Celsius>32</Celsius>
</detail>
<detail season="winter">
<Country>USA</Country>
<Celsius>43</Celsius>
</detail>
</data>'''
# xpath parameter & elems_only
Temperature=pandas.read_xml(StringIO(temperature_xml),xpath=".//detail",elems_only=True)
print(Temperature,"\n")
Temperature=pandas.read_xml(StringIO(temperature_xml),xpath=".//detail",elems_only=False)
print(Temperature)
Output:
In the first output, “Temperature” is created with the elements of the XML. In the second output, “Temperature” is created with all elements and attributes.
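The attrs_only and elems_only parameters are alternatives rather than options to combine. As a hedged sketch that is based on the behavior of current Pandas releases, setting both to “True” is expected to raise a ValueError. The variable names and the parser="etree" argument are assumptions that are made here so that the sketch runs without the optional lxml package.

```python
from io import StringIO

import pandas

temperature_xml = """<?xml version='1.0' encoding='utf-8'?>
<data>
<detail season="summer">
<Country>India</Country>
<Celsius>32</Celsius>
</detail>
<detail season="winter">
<Country>USA</Country>
<Celsius>43</Celsius>
</detail>
</data>"""

# Each flag on its own restricts the columns to attributes or to elements.
attrs_df = pandas.read_xml(StringIO(temperature_xml), xpath=".//detail",
                           attrs_only=True, parser="etree")
elems_df = pandas.read_xml(StringIO(temperature_xml), xpath=".//detail",
                           elems_only=True, parser="etree")
print(list(attrs_df.columns), list(elems_df.columns))

# Both flags together are rejected by Pandas.
both_flags_error = None
try:
    pandas.read_xml(StringIO(temperature_xml), xpath=".//detail",
                    attrs_only=True, elems_only=True, parser="etree")
except ValueError as err:
    both_flags_error = err
    print("ValueError:", err)
```

To get both the attributes and the elements, leave both parameters at their default of “False”.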
Conclusion
We learned how to create the Pandas DataFrame from XML using the pandas.read_xml() function with examples. Each example focused on one aspect of the function: reading the XML from a string or a file, using the namespaces parameter for the documents with prefixed nodes, and using the xpath parameter to select the row nodes. In the last two examples, we created the DataFrame by including/excluding the XML elements and attributes with the attrs_only and elems_only parameters.