pandas.read_xml()

The pandas.read_xml() function reads an XML document into a DataFrame object. The source can be a file or XML that is held in a string. By passing a few parameters, it is possible to create the Pandas DataFrame from only specific nodes of the XML, only its elements, or only its attributes. In this guide, we will discuss these parameters with separate examples.

Syntax:

Let’s see the syntax of the pandas.read_xml() function and the parameters that are used in this guide:

pandas.read_xml(path_or_buffer, xpath, namespaces, elems_only, attrs_only, names, encoding)
  1. path_or_buffer: The XML source. We can specify the file name with the “.xml” extension such that the XML is loaded into the DataFrame. We can also provide XML that is held in a string. Since pandas 2.1, passing the literal string directly is deprecated, so wrap the string in io.StringIO.
  2. xpath (By default = ‘./*’): The XPath expression that selects the nodes which become the rows of the DataFrame. The default selects every child of the root element. Pass a custom path such as “.//detail” to read only the nodes with that name.
  3. namespaces (By default = None): A dictionary that maps each namespace prefix to its URI. The namespaces are declared in the XML document, usually in the root element. Only the prefixes that are used in xpath need to be provided.
  4. elems_only (By default = False): A node can hold its data in child elements, in attributes, or in both. If you want to create the Pandas DataFrame only with the elements at the specified xpath, set this parameter to “True” (it will not consider the attributes).
  5. attrs_only (By default = False): If you want to create the Pandas DataFrame only with the attributes at the specified xpath, set this parameter to “True” (it will not consider the elements).
  6. names (By default = None): A list of column names for the DataFrame. Use it to rename the parsed elements and attributes.
  7. encoding (By default = ‘utf-8’): This parameter refers to the encoding of the XML document.
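
The names parameter appears in the syntax but is not used in the examples of this guide, so here is a minimal sketch of it. The sample XML, the “Nation” and “Temp_C” column names, and the `renamed` variable are made up for illustration. Passing parser="etree" is also an addition: it is chosen only so that the snippet runs without the optional lxml package.

```python
from io import StringIO

import pandas

# Hypothetical sample data, modeled on the examples in this guide.
sample_xml = """<data>
  <row><Country>India</Country><Celsius>32</Celsius></row>
  <row><Country>USA</Country><Celsius>43</Celsius></row>
</data>"""

# names supplies replacement column names, in the order the elements appear.
renamed = pandas.read_xml(
    StringIO(sample_xml),
    xpath="./row",
    names=["Nation", "Temp_C"],
    parser="etree",  # assumption: standard-library parser, so lxml is not needed
)
print(renamed.columns.tolist())
```

The columns of the resulting DataFrame carry the supplied names instead of the element names from the XML.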

Example 1: Read XML as String

Consider an XML document that is held in a string (temperature_xml) with “data” as the root element and two child “row” elements. Each row holds two values – Country and Celsius. Use the pandas.read_xml() function to create the “Temperature” DataFrame from temperature_xml.

import pandas
from io import StringIO

temperature_xml='''<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <Country>India</Country>
    <Celsius>32</Celsius>
  </row>
  <row>
    <Country>USA</Country>
    <Celsius>43</Celsius>
  </row>
</data>'''


# Read temperature_xml into Temperature DataFrame
# (wrapped in StringIO because passing a literal XML string is deprecated since pandas 2.1)
Temperature=pandas.read_xml(StringIO(temperature_xml))
print(Temperature)

Output:

The “Temperature” DataFrame is created with two records.
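
The printed DataFrame does not show how the values were typed. As a small sketch of how this could be checked, the snippet below reads the same XML and inspects the column types. The use of is_integer_dtype and of parser="etree" (chosen so that the snippet runs without the optional lxml package) are additions that are not part of the original example.

```python
from io import StringIO

import pandas
from pandas.api.types import is_integer_dtype

temperature_xml = """<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <Country>India</Country>
    <Celsius>32</Celsius>
  </row>
  <row>
    <Country>USA</Country>
    <Celsius>43</Celsius>
  </row>
</data>"""

# parser="etree" is an assumption made here to avoid the lxml dependency.
Temperature = pandas.read_xml(StringIO(temperature_xml), parser="etree")

# Show the type that pandas inferred for every column.
print(Temperature.dtypes)

# The text values of <Celsius> are converted to integers while parsing.
print(is_integer_dtype(Temperature["Celsius"]))
```

Although every value in an XML document is text, the Celsius column ends up with an integer type, so it can be used in calculations without an extra conversion.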

Example 2: Read XML from File

Consider an XML document that is stored in a file (temp.xml) with “data” as the root element and two child “row” elements. Each row holds two values – Country and Celsius. Use the pandas.read_xml() function to create the “Temperature” DataFrame from temp.xml.

import pandas

# Read temp.xml into Temperature DataFrame
Temperature=pandas.read_xml('temp.xml')
print(Temperature)

Output:

The “Temperature” DataFrame is created with two records.
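
The example above expects that temp.xml already exists in the working directory. The following sketch is one way to try it without preparing the file by hand: it writes the same XML to temp.xml inside a temporary directory and reads it back. The temporary directory, the `xml_path` variable, and parser="etree" (chosen so that the snippet runs without the optional lxml package) are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas

temperature_xml = """<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <Country>India</Country>
    <Celsius>32</Celsius>
  </row>
  <row>
    <Country>USA</Country>
    <Celsius>43</Celsius>
  </row>
</data>"""

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create temp.xml so that there is a file to read.
    xml_path = Path(tmp_dir) / "temp.xml"
    xml_path.write_text(temperature_xml, encoding="utf-8")

    # Read the file while it still exists.
    Temperature = pandas.read_xml(xml_path, parser="etree")

print(Temperature)
```

The DataFrame is fully loaded inside the with block, so it stays usable after the temporary file is removed.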

Example 3: Read XML with Namespaces

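Consider an XML document that is held in a string (temperature_xml) and declares the “temp” prefix for the “https://temperaturedetails.com” namespace. Every element in the document carries this prefix. Create the DataFrame from this XML by passing the namespace to the namespaces parameter.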

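import pandas
from io import StringIO

temperature_xml='''<?xml version='1.0' encoding='utf-8'?>
<temp:data xmlns:temp="https://temperaturedetails.com">
  <temp:row>
    <temp:Country>India</temp:Country>
    <temp:Celsius>32</temp:Celsius>
  </temp:row>
  <temp:row>
    <temp:Country>USA</temp:Country>
    <temp:Celsius>43</temp:Celsius>
  </temp:row>
</temp:data>'''


# namespaces parameter
# (the string is wrapped in StringIO because passing a literal XML string is deprecated since pandas 2.1)
Temperature=pandas.read_xml(StringIO(temperature_xml),namespaces={"temp": "https://temperaturedetails.com"})
print(Temperature)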

Output:

The “Temperature” DataFrame is created with two records.
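
The namespaces dictionary matters most when the xpath expression itself uses a prefix. The following sketch, which is not part of the original example, selects the rows with the “.//temp:row” path. The path and parser="etree" (chosen so that the snippet runs without the optional lxml package) are assumptions made for illustration.

```python
from io import StringIO

import pandas

temperature_xml = """<temp:data xmlns:temp="https://temperaturedetails.com">
  <temp:row>
    <temp:Country>India</temp:Country>
    <temp:Celsius>32</temp:Celsius>
  </temp:row>
  <temp:row>
    <temp:Country>USA</temp:Country>
    <temp:Celsius>43</temp:Celsius>
  </temp:row>
</temp:data>"""

# The "temp" prefix in xpath is resolved through the namespaces dictionary.
Temperature = pandas.read_xml(
    StringIO(temperature_xml),
    xpath=".//temp:row",
    namespaces={"temp": "https://temperaturedetails.com"},
    parser="etree",  # assumption: standard-library parser, so lxml is not needed
)
print(Temperature.columns.tolist())
```

The column names are taken from the element names without the namespace part.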

Example 4: xpath Parameter

Consider the XML document with the row name as “detail”. Use the xpath parameter to specify the path such that the values that are present under detail are inserted into the DataFrame as a row. The path is “.//detail”.

import pandas
from io import StringIO

temperature_xml='''<?xml version='1.0' encoding='utf-8'?>
<data>
  <detail>
    <Country>India</Country>
    <Celsius>32</Celsius>
  </detail>
  <detail>
    <Country>USA</Country>
    <Celsius>43</Celsius>
  </detail>
</data>'''


# xpath parameter
# (the string is wrapped in StringIO because passing a literal XML string is deprecated since pandas 2.1)
Temperature=pandas.read_xml(StringIO(temperature_xml),xpath=".//detail")
print(Temperature)

Output:

The “Temperature” DataFrame is created with two records, one for each “detail” node.
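
The xpath parameter can also narrow down which nodes become rows. As a sketch that goes one step beyond the example, the path below adds a predicate so that only the “detail” node whose Country is “USA” is read. The predicate, the `usa_only` variable, and parser="etree" (chosen so that the snippet runs without the optional lxml package) are assumptions made for illustration.

```python
from io import StringIO

import pandas

temperature_xml = """<?xml version='1.0' encoding='utf-8'?>
<data>
  <detail>
    <Country>India</Country>
    <Celsius>32</Celsius>
  </detail>
  <detail>
    <Country>USA</Country>
    <Celsius>43</Celsius>
  </detail>
</data>"""

# Select only the <detail> nodes that have a <Country> child with the text "USA".
usa_only = pandas.read_xml(
    StringIO(temperature_xml),
    xpath=".//detail[Country='USA']",
    parser="etree",  # assumption: standard-library parser, so lxml is not needed
)
print(usa_only)
```

Only one “detail” node matches the predicate, so the DataFrame holds a single record.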

Example 5: attrs_only Parameter

Consider the XML document with two rows. Each row has the “season” attribute and two elements which are “Country” and “Celsius”.

  1. Set the attrs_only parameter to “True” to create the “Temperature” DataFrame with only the attributes.
  2. Set the attrs_only parameter to “False” to create the “Temperature” DataFrame with all attributes and elements.

import pandas
from io import StringIO

temperature_xml='''<?xml version='1.0' encoding='utf-8'?>
<data>
  <detail season="summer">
    <Country>India</Country>
    <Celsius>32</Celsius>
  </detail>
  <detail season="winter">
    <Country>USA</Country>
    <Celsius>43</Celsius>
  </detail>
</data>'''


# xpath parameter & attrs_only
# (the string is wrapped in StringIO because passing a literal XML string is deprecated since pandas 2.1)
Temperature=pandas.read_xml(StringIO(temperature_xml),xpath=".//detail",attrs_only=True)
print(Temperature,"\n")

Temperature=pandas.read_xml(StringIO(temperature_xml),xpath=".//detail",attrs_only=False)
print(Temperature)

Output:

In the first output, the “Temperature” DataFrame is created with only the “season” attribute of the XML. In the second output, the “Temperature” DataFrame is created with all elements and attributes.
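
To make the difference between the two calls explicit, the following sketch compares the column names that each call produces. The `attrs_df` and `all_df` variables and parser="etree" (chosen so that the snippet runs without the optional lxml package) are assumptions made for illustration.

```python
from io import StringIO

import pandas

temperature_xml = """<?xml version='1.0' encoding='utf-8'?>
<data>
  <detail season="summer">
    <Country>India</Country>
    <Celsius>32</Celsius>
  </detail>
  <detail season="winter">
    <Country>USA</Country>
    <Celsius>43</Celsius>
  </detail>
</data>"""

# Only the attributes of each <detail> node become columns.
attrs_df = pandas.read_xml(
    StringIO(temperature_xml), xpath=".//detail", attrs_only=True, parser="etree"
)

# Attributes and child elements both become columns.
all_df = pandas.read_xml(
    StringIO(temperature_xml), xpath=".//detail", attrs_only=False, parser="etree"
)

print(attrs_df.columns.tolist())
print(all_df.columns.tolist())
```

With attrs_only=True, “season” is the only column. With attrs_only=False, the “Country” and “Celsius” columns are included as well.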

Example 6: elems_only Parameter

  1. Set the elems_only parameter to “True” to create the “Temperature” DataFrame with only the elements.
  2. Set the elems_only parameter to “False” to create the “Temperature” DataFrame with all attributes and elements.

import pandas
from io import StringIO

temperature_xml='''<?xml version='1.0' encoding='utf-8'?>
<data>
  <detail season="summer">
    <Country>India</Country>
    <Celsius>32</Celsius>
  </detail>
  <detail season="winter">
    <Country>USA</Country>
    <Celsius>43</Celsius>
  </detail>
</data>'''


# xpath parameter & elems_only
# (the string is wrapped in StringIO because passing a literal XML string is deprecated since pandas 2.1)
Temperature=pandas.read_xml(StringIO(temperature_xml),xpath=".//detail",elems_only=True)
print(Temperature,"\n")

Temperature=pandas.read_xml(StringIO(temperature_xml),xpath=".//detail",elems_only=False)
print(Temperature)

Output:

In the first output, the “Temperature” DataFrame is created with only the “Country” and “Celsius” elements of the XML. In the second output, the “Temperature” DataFrame is created with all elements and attributes.
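
The elems_only and attrs_only parameters exclude each other. The following sketch shows what happens when both are set to “True”: pandas raises a ValueError instead of returning a DataFrame. The `elems_df` and `error_message` variables and parser="etree" (chosen so that the snippet runs without the optional lxml package) are assumptions made for illustration.

```python
from io import StringIO

import pandas

temperature_xml = """<?xml version='1.0' encoding='utf-8'?>
<data>
  <detail season="summer">
    <Country>India</Country>
    <Celsius>32</Celsius>
  </detail>
  <detail season="winter">
    <Country>USA</Country>
    <Celsius>43</Celsius>
  </detail>
</data>"""

# Only the child elements of each <detail> node become columns.
elems_df = pandas.read_xml(
    StringIO(temperature_xml), xpath=".//detail", elems_only=True, parser="etree"
)
print(elems_df.columns.tolist())

# Asking for only elements and only attributes at the same time is rejected.
error_message = None
try:
    pandas.read_xml(
        StringIO(temperature_xml),
        xpath=".//detail",
        elems_only=True,
        attrs_only=True,
        parser="etree",
    )
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```

To get both elements and attributes, leave both parameters at their default of “False”.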

Conclusion

We learned how to create the Pandas DataFrame from XML using the pandas.read_xml() function with examples. In each example, we specified one parameter that creates a DataFrame from XML with namespaces, row names, etc. In the last two examples, we created the DataFrame by including/excluding the XML elements and attributes.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP, MySQL; published 500+ articles in the computer science domain