How to Parse XML in C++

In this article, we discuss how to parse XML in the C++ programming language. We will walk through several examples to understand how XML parsing works in C++.

What is XML?

XML stands for eXtensible Markup Language. It is a markup language mainly used for storing and transferring data in an organized way. It looks very similar to HTML, but XML is focused entirely on storing and transferring data, whereas HTML is used for displaying data in the browser.

A Sample XML File/XML Syntax

Here is a sample XML file:

<?xml version="1.0" encoding="utf-8"?>

<EmployeeData>

    <Employee student_type="Part-time">

        <Name>Tom</Name>

    </Employee>

    <Employee student_type="Full-time">

        <Name>Drake</Name>

    </Employee>

</EmployeeData>

Like HTML, XML is a tag-oriented markup language. Unlike HTML, however, XML has no fixed set of tags, so we can define our own tags in an XML file. In the above example, we have several user-defined tags such as “<Employee>”. Every opening tag has a corresponding ending tag: “</Employee>” is the ending tag for “<Employee>”. We can define as many user-defined tags as we need to organize the data.
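
The rule that every tag needs a matching ending tag can be illustrated with a small stack-based check. This is only a sketch written with the standard library: the function name tags_balanced is made up for this article, and it assumes that attribute values and comments never contain the “>” character. It is not a real XML parser.

```cpp
#include <stack>
#include <string>

// Hypothetical helper (not part of any library): returns true if every
// opening tag in `xml` is closed by a matching ending tag in the right order.
// Assumption: attribute values and comments do not contain '>'.
bool tags_balanced(const std::string& xml)
{
    const char* ws = " \t\r\n";
    std::stack<std::string> open_tags;
    std::size_t pos = 0;

    while ((pos = xml.find('<', pos)) != std::string::npos) {
        std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) return false;       // unterminated tag
        std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;

        if (tag.empty()) return false;
        if (tag[0] == '?' || tag[0] == '!') continue;      // declaration, comment, doctype
        if (tag[tag.size() - 1] == '/') continue;          // self-closing tag

        // The tag name ends at the first whitespace (attributes follow it)
        std::size_t start = (tag[0] == '/') ? 1 : 0;
        std::string name = tag.substr(start, tag.find_first_of(ws, start) - start);

        if (tag[0] == '/') {                               // ending tag
            if (open_tags.empty() || open_tags.top() != name) return false;
            open_tags.pop();
        } else {                                           // opening tag
            open_tags.push(name);
        }
    }
    return open_tags.empty();                              // nothing left open
}
```

Calling tags_balanced on the sample file above returns true, while a document with a missing or misordered ending tag returns false. The parsing libraries discussed next perform this kind of check, and much more, for us.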

Parsing Libraries in C++

Most high-level programming languages have libraries for parsing XML data, and C++ is no exception. Here are the most popular C++ libraries to parse XML data:

  1. RapidXML
  2. PugiXML
  3. TinyXML

As the name suggests, RapidXML is mainly focused on speed, and it is a DOM-style parsing library. PugiXML supports Unicode conversion, so you may want to use it if you need to convert a UTF-16 document to UTF-8. TinyXML is a bare-minimum library for parsing XML data and is not as fast as the previous two. If you just want to get the job done and don't care about speed, you can choose TinyXML.
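
To make the term “DOM-style” more concrete: a DOM parser reads the whole document into an in-memory tree of nodes, which the program then walks. The sketch below builds such a tree by hand for the sample file shown earlier. The Node type and the functions build_sample and employee_names are hypothetical and exist only for illustration; each real library defines its own node classes.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type, for illustration only.
struct Node {
    std::string name;                                  // tag name
    std::string text;                                  // text content, if any
    std::map<std::string, std::string> attributes;     // attribute name -> value
    std::vector<std::unique_ptr<Node>> children;       // nested elements

    // Append a child element and return a pointer to it
    Node* add_child(const std::string& child_name) {
        children.push_back(std::unique_ptr<Node>(new Node()));
        children.back()->name = child_name;
        return children.back().get();
    }
};

// Build by hand the tree a DOM parser would produce for the sample file
std::unique_ptr<Node> build_sample() {
    std::unique_ptr<Node> root(new Node());
    root->name = "EmployeeData";

    Node* tom = root->add_child("Employee");
    tom->attributes["student_type"] = "Part-time";
    tom->add_child("Name")->text = "Tom";

    Node* drake = root->add_child("Employee");
    drake->attributes["student_type"] = "Full-time";
    drake->add_child("Name")->text = "Drake";
    return root;
}

// Walk the tree: collect the text of each <Name> under each <Employee>
std::vector<std::string> employee_names(const Node& root) {
    std::vector<std::string> names;
    for (const auto& employee : root.children) {
        if (employee->name != "Employee") continue;
        for (const auto& child : employee->children) {
            if (child->name == "Name") names.push_back(child->text);
        }
    }
    return names;
}
```

The examples in the rest of this article follow the same pattern: load the document into a tree, find the root node, then loop over its children and attributes.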

Examples
We now have a basic understanding of XML and XML parsing libraries in C++. Let's look at a few examples of parsing an XML file in C++:

  • Example-1: Parse XML in C++ using RapidXML
  • Example-2: Parse XML in C++ using PugiXML
  • Example-3: Parse XML in C++ using TinyXML

In each of these examples, we will use the respective libraries to parse a sample XML file.

Example-1: Parse XML in C++ using RapidXML

In this example program, we will demonstrate how to parse XML using the RapidXML library in C++. Here is the input XML file (sample.xml):

<?xml version="1.0" encoding="utf-8"?>

<MyStudentsData>

    <Student student_type="Part-time">

        <Name>John</Name>

    </Student>

    <Student student_type="Full-time">

        <Name>Sean</Name>

    </Student>

    <Student student_type="Part-time">

        <Name>Sarah</Name>

    </Student>

</MyStudentsData>

Our goal is to parse the above XML file and print each student's type and name. Here is the C++ program to parse the XML data using RapidXML. RapidXML is a header-only library that you can download from its project website.

#include <iostream>
#include <fstream>
#include <vector>
#include "rapidxml.hpp"

using namespace std;
using namespace rapidxml;


xml_document<> doc;
xml_node<> * root_node = NULL;
   
int main(void)
{
    cout << "\nParsing my students data (sample.xml)....." << endl;
   
    // Read the sample.xml file
    ifstream theFile ("sample.xml");
    vector<char> buffer((istreambuf_iterator<char>(theFile)), istreambuf_iterator<char>());
    buffer.push_back('\0');
   
    // Parse the buffer
    doc.parse<0>(&buffer[0]);
   
    // Find out the root node
    root_node = doc.first_node("MyStudentsData");
    if (!root_node) return 1;   // file missing or not the expected document
   
    // Iterate over the student nodes
    for (xml_node<> * student_node = root_node->first_node("Student");
                student_node; student_node = student_node->next_sibling("Student"))
    {
        cout << "\nStudent Type =   " <<
                student_node->first_attribute("student_type")->value();
        cout << endl;
           
        // Iterate over the student names
        for(xml_node<> * student_name_node = student_node->first_node("Name");
                student_name_node; student_name_node = student_name_node->next_sibling("Name"))
        {
            cout << "Student Name =   " << student_name_node->value();
            cout << endl;
        }
        cout << endl;
    }
   
    return 0;
}
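
RapidXML parses the text in place: its parse function takes a non-const pointer to a zero-terminated buffer and may modify that buffer. That is why the program above copies the file into a vector<char> and appends '\0'. The program does not check whether the file could be opened, though. A minimal sketch of a helper that does, using only the standard library (the name read_xml_buffer is hypothetical):

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file into a mutable, zero-terminated
// buffer, the form an in-place parser expects. Throws if the file cannot
// be opened.
std::vector<char> read_xml_buffer(const std::string& path)
{
    std::ifstream file(path.c_str(), std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }

    // Copy every byte of the file into the buffer
    std::vector<char> buffer((std::istreambuf_iterator<char>(file)),
                             std::istreambuf_iterator<char>());
    buffer.push_back('\0');   // terminator required by the parser
    return buffer;
}
```

The returned buffer can be passed to the parser as &buffer[0]. It must stay alive for as long as the parsed document is used, because the nodes point into it.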

Example-2: Parse XML in C++ using PugiXML

In this example program, we will demonstrate how to parse XML using the PugiXML library in C++. Here is the input XML file (sample.xml):

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

<EmployeesData FormatVersion="1">

    <Employees>

        <Employee Name="John" Type="Part-Time">

        </Employee>
       
        <Employee Name="Sean" Type="Full-Time">

        </Employee>
       
        <Employee Name="Sarah" Type="Part-Time">

        </Employee>

    </Employees>

</EmployeesData>

Here the data is stored in attributes rather than in child elements. The following C++ program uses PugiXML to print every attribute of each employee. You can download the PugiXML library from its project website.

#include <iostream>
#include "pugixml.hpp"

using namespace std;
using namespace pugi;

int main()
{
    cout << "\nParsing employees data (sample.xml).....\n\n";
   
    xml_document doc;
   
    // load the XML file
    if (!doc.load_file("sample.xml")) return -1;

    xml_node employees = doc.child("EmployeesData").child("Employees");

   
    for (xml_node_iterator it = employees.begin(); it != employees.end(); ++it)
    {
        cout << "Employee:";

        for (xml_attribute_iterator ait = it->attributes_begin();
                ait != it->attributes_end(); ++ait)
        {
            cout << " " << ait->name() <<
              "=" << ait->value();
        }

        cout << endl;
    }

    cout << endl;
    return 0;
}

Example-3: Parse XML in C++ using TinyXML

In this example program, we will demonstrate how to parse XML using the TinyXML library in C++. The code uses TinyXML-2 (the tinyxml2 namespace), the successor to the original TinyXML. Here is the input XML file (sample.xml):

<?xml version="1.0" encoding="utf-8"?>

<MyStudentsData>

    <Student> John </Student>

    <Student> Sean </Student>

    <Student> Sarah </Student>

</MyStudentsData>

Here is the C++ program that reads the first and the last student name from the above file using TinyXML-2. You can download the library from its project website.

#include <cstdio>
#include <iostream>
#include "tinyxml2.h"   // compile and link tinyxml2.cpp together with this file

using namespace std;
using namespace tinyxml2;
   

int main(void)
{
  cout << "\nParsing my students data (sample.xml)....." << endl;
   
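  // Load and parse the sample.xml file
  XMLDocument doc;
  if (doc.LoadFile( "sample.xml" ) != XML_SUCCESS) return -1;

  XMLElement* root = doc.FirstChildElement( "MyStudentsData" );
  if (!root || !root->FirstChildElement( "Student" )) return -1;

  // First student: GetText() returns the text inside the element
  const char* title = root->FirstChildElement( "Student" )->GetText();
  printf( "Student Name: %s\n", title ? title : "" );

  // Last student: read the same kind of text through its text node
  XMLNode* lastChild = root->LastChildElement( "Student" )->FirstChild();
  XMLText* textNode = lastChild ? lastChild->ToText() : nullptr;
  printf( "Student Name: %s\n", textNode ? textNode->Value() : "" );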
   
  return 0;
}
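
Note that the names in this sample file are written with spaces around them (“<Student> John </Student>”). Assuming the parser is left in its default whitespace mode, which preserves whitespace in text, the returned strings will include those spaces. A small helper can strip them. This is a sketch that uses only the standard library, and the name trim is our own:

```cpp
#include <string>

// Hypothetical helper: return a copy of `s` without leading and
// trailing whitespace (spaces, tabs, carriage returns, newlines).
std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) {
        return "";                       // the string is all whitespace
    }
    std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}
```

For example, the name can be printed with trim(title).c_str() in place of title in the printf call, after checking that title is not a null pointer.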

Conclusion

In this article, we briefly discussed XML and looked at three examples of how to parse XML in C++. TinyXML is a minimalistic library for parsing XML data, while RapidXML and PugiXML are common choices when parsing speed matters.

About the author

Sukumar Paul

I am a passionate software engineer and blogger. I have done my Masters in Software Engineering from BITS PILANI University, India. I have very good experience in real-time software development and testing using C, C++, and Python. Follow me at thecloudstrap.com.