How to Parse XML on Linux Command Line

XML is a structured format, and when you print a raw XML file on the command line, it can look unreadable. That is simply how XML is laid out, and the best option is to parse it with the available command line tools. Parsing XML has a reputation for being challenging, but this guide introduces two tools that make it easy on the Linux command line. Let’s take a look.

Two Ways of Parsing XML on Linux Command Line

XML stands for Extensible Markup Language. It is both a markup language and a file format, which makes it useful for storing and transmitting data. Its main drawback is readability: unless you have a clean way of parsing it, even a single line of XML can be confusing.

When an unformatted XML file is printed to the terminal, understanding what it means can be difficult. We will see how to fix that and parse the file using two tools.

Method 1. Using the xmllint Command

xmllint is a reliable XML formatter and validation tool. To use it, you must install the libxml2-utils package, which is the package name on Debian and Ubuntu.
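Before installing anything, it can help to check whether xmllint is already on your system. The following is a minimal sketch, assuming a Debian or Ubuntu machine where packages are installed with apt; the variable name xmllint_status is only used for illustration.

```shell
# Check whether xmllint is already on the PATH.
if command -v xmllint >/dev/null 2>&1; then
    xmllint_status="installed"
else
    xmllint_status="missing"
    # On Debian or Ubuntu, the package can be installed with:
    #   sudo apt install libxml2-utils
fi
echo "xmllint is $xmllint_status"
```

If the status is reported as missing, run the apt command shown in the comment and check again.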

With xmllint installed, let’s proceed and parse our XML file. The syntax for using xmllint is:

$ xmllint [options] xml-file

The first step when parsing your XML file is to check that it is well formed. xmllint does this whenever it reads a file. Optionally add the --noout option so that the XML’s contents are not printed and only errors are reported:

$ xmllint --noout filename

If you have an error in your XML file, xmllint prints an error message on your terminal that points to the problem.
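The check can also be scripted, because xmllint returns a non-zero exit status when parsing fails. Here is a small sketch under stated assumptions: the two sample files, their contents, and the variable names are hypothetical and created only for this demonstration.

```shell
# Hypothetical sample files, created in a temporary directory for this demo.
workdir=$(mktemp -d)
cat > "$workdir/good.xml" <<'EOF'
<customers><customer><name>Ann</name></customer></customers>
EOF
# The closing </name> tag is left out on purpose.
cat > "$workdir/bad.xml" <<'EOF'
<customers><customer><name>Ann</customer></customers>
EOF

good_result="not checked"
bad_result="not checked"
if command -v xmllint >/dev/null 2>&1; then
    # xmllint exits with status 0 when the file parses cleanly.
    if xmllint --noout "$workdir/good.xml"; then
        good_result="well-formed"
    else
        good_result="broken"
    fi
    # Parser errors are written to stderr and stay visible on the terminal.
    if xmllint --noout "$workdir/bad.xml"; then
        bad_result="well-formed"
    else
        bad_result="broken"
    fi
fi
echo "good.xml: $good_result"
echo "bad.xml: $bad_result"
```

For bad.xml, the message printed by xmllint names the line where the mismatched tag was found, which is usually enough to locate the mistake.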

Remember the output that was hard to read before? You can parse it using xmllint and get a pretty output on your command line. For that, use the --format option and note how neatly your XML file gets printed.
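As an illustration, the sketch below writes a one-line XML file and pretty-prints it. The file contents and the variable names xml_file and formatted are assumptions made for this example, not part of the original article's data.

```shell
# Hypothetical one-line XML file used only for this demo.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<customers><customer><name>Ann</name><address>12 Main St</address></customer></customers>
EOF

formatted=""
if command -v xmllint >/dev/null 2>&1; then
    # --format re-indents the document, one element per line.
    formatted=$(xmllint --format "$xml_file")
    printf '%s\n' "$formatted"
else
    echo "xmllint not found; skipping the demo"
fi
```

The original file is not modified. To keep the formatted version, redirect the output to a new file.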

With xmllint, you can also change the indentation of the formatted output and choose how many spaces to use for each indent level. To do so, set the XMLLINT_INDENT environment variable to a string that contains the number of spaces you wish to use.

For instance, if we needed five spaces, the command to export the environment variable would be:

$ export XMLLINT_INDENT="     "

Now, your XML will get formatted with the specified indentation spaces.
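Counting spaces inside quotes is error-prone, so one option is to let printf build the string. This is a sketch under the assumption that a five-space indent is wanted; the sample file and the variable names are hypothetical.

```shell
# Hypothetical sample file used only for this demo.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<customers><customer><name>Ann</name></customer></customers>
EOF

# printf '%5s' '' pads an empty string to a width of five, giving five spaces.
XMLLINT_INDENT=$(printf '%5s' '')
export XMLLINT_INDENT

indented=""
if command -v xmllint >/dev/null 2>&1; then
    # The indent string only takes effect together with --format.
    indented=$(xmllint --format "$xml_file")
    printf '%s\n' "$indented"
else
    echo "xmllint not found; skipping the demo"
fi
```

Each nesting level is indented by one copy of the string, so the customer element starts after five spaces and the name element after ten.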

If you notice your XML file has unnecessary spaces, you can remove them using the --noblanks option, which eliminates even the newlines between elements.

$ xmllint --noblanks filename

Note that removing the whitespace also strips the formatting from your XML, but this option is useful when you need to reduce the size of your XML document.
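To see the effect, you can compare the line count before and after. The sketch below assumes a small indented sample file; the file names and contents are hypothetical.

```shell
# Hypothetical indented XML file used only for this demo.
workdir=$(mktemp -d)
cat > "$workdir/pretty.xml" <<'EOF'
<customers>
  <customer>
    <name>Ann</name>
    <address>12 Main St</address>
  </customer>
</customers>
EOF

original_lines=$(wc -l < "$workdir/pretty.xml" | tr -d ' ')
compact_lines=""
if command -v xmllint >/dev/null 2>&1; then
    # --noblanks drops the whitespace-only text between elements.
    xmllint --noblanks "$workdir/pretty.xml" > "$workdir/compact.xml"
    compact_lines=$(wc -l < "$workdir/compact.xml" | tr -d ' ')
    echo "lines before: $original_lines, after: $compact_lines"
else
    echo "xmllint not found; skipping the demo"
fi
```

Text inside elements, such as the address, is left untouched. Only the whitespace between tags is removed.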

Method 2. Using the XMLStarlet Command

XMLStarlet is another reliable tool for parsing XML via the command line. It has plenty of options that you can use to transform, parse, query, or validate your XML file. You must install the tool before you can use it. On Ubuntu, use the command provided below:

$ sudo apt install xmlstarlet

With XMLStarlet, you can easily extract data from your XML and perform other simple tasks.

Everything you can do with the command line tool is available on its help page. Let’s see several common usage examples.

To view an XML file with clean formatting, use the following command:

$ xmlstarlet format filename

To validate the XML, use the following command:

$ xmlstarlet val filename

To select data with XMLStarlet, use the select option and specify the path to the node. In our XML file, use the following command to select the address:

$ xmlstarlet select --template --value-of /customers/customer/address --nl test3.xml

In the previous command, the --nl option adds a new line after the output. You can modify the path to get any specific data in the XML.
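When a file holds several customers, a variation with the --match option prints one address per line. This is a sketch, not the article's exact command: the sample file below is hypothetical and only mirrors the /customers/customer/address path used above.

```shell
# Hypothetical sample file that follows the customers/customer/address layout.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<customers>
  <customer><name>Ann</name><address>12 Main St</address></customer>
  <customer><name>Ben</name><address>34 Oak Ave</address></customer>
</customers>
EOF

addresses=""
if command -v xmlstarlet >/dev/null 2>&1; then
    # --match loops over every customer; --value-of is evaluated relative to it.
    addresses=$(xmlstarlet select --template \
        --match /customers/customer --value-of address --nl "$xml_file")
    printf '%s\n' "$addresses"
else
    echo "xmlstarlet not found; skipping the demo"
fi
```

Changing address to name in the --value-of expression lists the customer names instead.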

There are various ways to combine the XMLStarlet options when working with your XML file, and the full list of options is on the command’s man page.

Conclusion

Parsing XML files shouldn’t be challenging when using Linux. You will enjoy working with XML files if you have the right command line tools to get the job done. This guide focused on two command line options for parsing XML files. Try them out and see which you find easier to use.

About the author

Denis Kariuki

Denis is a Computer Scientist with a passion for Networking and Cyber Security. I love the terminal, and using Linux is a hobby. I am passionate about sharing tips and ideas about Linux and computing.