Two Ways of Parsing XML on the Linux Command Line
XML stands for Extensible Markup Language. It works as both a file format and a markup language, which makes it useful for transmitting and storing data, among other purposes. The main drawback of XML is that its structure makes it hard to read, and unless you have a clean way of parsing it, even a single line of XML can be confusing.
Take a look at the following image. Understanding what it means can be difficult, but we will see how to fix that and parse it using two tools.
Method 1. Using the xmllint Command
xmllint is a reliable XML formatter and validation tool. On Debian and Ubuntu, it ships in the libxml2-utils package, which you must install before you can use the tool.
With xmllint installed, let’s proceed and parse our XML file. The general syntax is the xmllint command, followed by any options, followed by the name of the XML file.
The first step when parsing your XML file is to check that it is well-formed. For this, run xmllint on the file and optionally add the --noout argument, which validates the XML without printing its contents.
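On Debian or Ubuntu, the installation might look like the following sketch. It assumes the apt package manager and sudo access; other distributions may package xmllint under a different name.

```shell
# Install xmllint only if it is not already on the PATH.
# Assumes a Debian/Ubuntu system with apt and sudo available.
if ! command -v xmllint >/dev/null 2>&1; then
    sudo apt install -y libxml2-utils
fi

# Confirm that the tool is available by printing its location.
command -v xmllint
```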
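As a rough illustration of that syntax, the sketch below creates a small hypothetical file, sample.xml (a stand-in, not the file used in this article), and runs xmllint on it without any options, which parses the file and prints it back to the terminal.

```shell
# General form: xmllint [options] file.xml
# sample.xml is a hypothetical one-line document used only for illustration.
printf '<person><name>Ann</name><address>12 Main St</address></person>\n' > sample.xml

# With no options, xmllint parses the file and prints the document back.
xmllint sample.xml
```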
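A minimal sketch of that check, assuming a hypothetical well-formed file named sample.xml, could look like this. The echo message is only there to make the successful result visible.

```shell
# sample.xml is a hypothetical file used only for illustration.
printf '<person><name>Ann</name><address>12 Main St</address></person>\n' > sample.xml

# --noout suppresses the document output, so only errors would be printed.
# The exit status is 0 when the file is well-formed.
if xmllint --noout sample.xml; then
    echo "sample.xml is well-formed"
fi
```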
If you have an error in your XML file, you will get an error output on your terminal, as shown in the following example:
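To see the failure case for yourself, you could try something like the following sketch. The file broken.xml is a hypothetical example with an unclosed element, and the exact wording of the error message depends on your libxml2 version.

```shell
# broken.xml is a hypothetical file whose <name> element is never closed.
printf '<person><name>Ann</person>\n' > broken.xml

# xmllint reports the parser error on stderr and exits with a non-zero status.
if ! xmllint --noout broken.xml; then
    echo "broken.xml is not well-formed"
fi
```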
Remember the output that was hard to read before? You can parse it using xmllint and get a pretty output on your command line. For that, use the --format argument and note how neatly your XML file gets printed.
With xmllint, you can also change the indentation of the formatted output. To do so, create an environment variable, XMLLINT_INDENT, and set it to the string you want used for each indentation level, such as a run of spaces.
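For example, assuming a hypothetical one-line file named sample.xml, pretty-printing could be sketched as follows. By default, each nested element lands on its own line, indented by two spaces.

```shell
# sample.xml is a hypothetical one-line document used only for illustration.
printf '<person><name>Ann</name><address>12 Main St</address></person>\n' > sample.xml

# --format re-indents the document so each element sits on its own line.
xmllint --format sample.xml
```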
For instance, if we needed five spaces, the command to export the environment variable would be:
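One way to write this is sketched below, again using a hypothetical sample.xml. Note that the variable holds the literal indentation string, so five spaces are typed between the quotes rather than the number 5.

```shell
# sample.xml is a hypothetical one-line document used only for illustration.
printf '<person><name>Ann</name><address>12 Main St</address></person>\n' > sample.xml

# XMLLINT_INDENT holds the literal string used for each indentation level;
# here it is exactly five spaces.
export XMLLINT_INDENT="     "

# Pretty-print again; nested elements are now indented by five spaces.
xmllint --format sample.xml
```

Run unset XMLLINT_INDENT to return to the default indentation.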
Now, your XML will get formatted with the specified indentation spaces.
If you notice your XML file has unnecessary whitespace, you can remove it using the --noblanks argument, which drops ignorable whitespace, including the newlines between elements.
You will note that removing the whitespace flattens the layout of your XML, but you can use this option when you need to reduce the size of your XML document.
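As a sketch, assuming a hypothetical indented file named pretty.xml, the following collapses the document's elements onto a single line after the XML declaration.

```shell
# pretty.xml is a hypothetical indented document used only for illustration.
printf '<person>\n  <name>Ann</name>\n  <address>12 Main St</address>\n</person>\n' > pretty.xml

# --noblanks drops the whitespace-only text between elements while parsing.
xmllint --noblanks pretty.xml
```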
Method 2. Using the XMLStarlet Command
XMLStarlet is another reliable tool for parsing XML on the command line. It has plenty of options that you can use to transform, parse, query, or validate your XML file. You must install the tool before using it; on Ubuntu, it is available as the xmlstarlet package.
With XMLStarlet, you can easily extract data from your XML and perform other simple tasks.
Everything you can do with the command line tool is listed on its help page. Let’s see several common usage examples.
To view an XML file, use the following command:
To validate the XML, use the val (validate) option.
To select data with XMLStarlet, use the select option and specify the path to the node. In our XML file, use the following command to select the address:
In the previous command, the --nl option adds a new line after the output. You can modify the path to get any specific data in the XML.
There are various ways to combine the XMLStarlet options when working with your XML file, and the complete list of options is on the command’s man page.
Conclusion
Parsing XML files shouldn’t be challenging when using Linux. You will enjoy working with XML files if you have the right command line tools to get the job done. This guide focused on two command line options for parsing XML files. Try them out and see which you find easier to use.