
Guide to C++ Serialization

Serialization converts an object into a stream of bytes that can be stored on disk or sent to another computer over a network. There are two kinds of objects in C++: fundamental objects (such as an int or a char) and objects instantiated from a class. Note that in C++ a struct is a class whose members are public by default. The name of a struct, like the name of a class, names a type, and objects are instantiated from that type.

Individual fundamental objects are not normally serialized on their own. However, an instantiated object contains fundamental objects as data members, so when the whole object is serialized, those fundamental objects are serialized with it. In C++, standard data structures such as the vector are predefined class templates in the standard library.

Serialization is also called marshalling, and the opposite process is called deserialization or unmarshalling. A serialized object, read from a file on disk or received from the network, can be converted back (resurrected) into an object on the local computer and used by the local C++ application (program).

This article helps you understand C++ serialization libraries better and shows what is involved in writing your own. It focuses on one standard text format for the serialized stream, JSON, which is described below.


Binary and Text Stream

Binary
A compiled C++ program is said to be in binary form. A serialized stream can also be in binary form. However, this article does not cover binary serialization.

Text
The serialized stream can also be in text form. Two widely used text standards are JSON and XML. JSON is generally easier to read and handle than XML, so JSON is used in this article.

Main Goals

The main goals for serialization are that the serialized stream should be both backward compatible and forward compatible. It should also be usable across different operating systems and different computer architectures.

Version
Assume that you have written a program and shipped it to a customer, and the customer is satisfied. Later on, the customer needs a modification, but by now the customer has employed his own programmer. That programmer asks you to add another property (data member) to a class and to send the corresponding objects through the network, so that he can fit them into the program. When you do that, the new serialized stream has to remain backward compatible with the old object.

The specifications of C++ and other languages change over time. A specification sometimes announces changes planned for the next and later versions, but it cannot announce all of them. So your serialized stream should be forward compatible as far as those known future changes are concerned. Forward compatibility has its limits, because not all future changes can be foreseen.

Both forward and backward compatibility are handled by a scheme called versioning.
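The idea of versioning can be sketched as follows. This is only an illustration under stated assumptions: the Person struct, the "version" key, and the Fields map (standing in for an archive that has already been parsed into key/value text) are all hypothetical names, not part of any particular library.

```cpp
#include <map>
#include <string>

// Hypothetical record: "height" was added in format version 2.
struct Person {
    std::string name;
    double height = 0.0;   // default used when reading version 1 data
};

// Assumed layout of an archive after it has been parsed into text fields.
using Fields = std::map<std::string, std::string>;

const int kCurrentVersion = 2;

// Writer: always stamps the archive with the version it was written in.
Fields serialize(const Person& p) {
    return {
        {"version", std::to_string(kCurrentVersion)},
        {"name", p.name},
        {"height", std::to_string(p.height)}
    };
}

// Reader: accepts older archives and tolerates unknown keys from newer ones.
Person deserialize(const Fields& f) {
    Person p;
    // An archive with no version stamp is treated as version 1.
    auto v = f.find("version");
    int version = (v == f.end()) ? 1 : std::stoi(v->second);

    p.name = f.at("name");
    if (version >= 2) {
        // Backward compatibility: only newer archives carry "height".
        auto h = f.find("height");
        if (h != f.end()) p.height = std::stod(h->second);
    }
    // Forward compatibility: keys this reader does not know are ignored.
    return p;
}
```

With this arrangement, an old archive that holds only a name still loads (the height keeps its default), and an archive from a later version with extra keys loads without error.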

JSON Stream

JSON stands for JavaScript Object Notation.

JSON is a text format for storing and transporting data.

JSON is “self-describing”.

JSON is also a long-established standard, so it suits C++ text serialization and deserialization well. To send an instantiated C++ object, convert it to a JSON object and send that. Once the JSON object is ready to be sent, it is called a stream; when it arrives at the receiver, it is still called a stream until it is deserialized.

JSON Syntax

With JSON, a datum is a key/value pair. For example, in

    "name":"Smith"

name is a key, and Smith is the value. An object is delimited by braces, as in:

    {"name" : "Smith", "height" : 1.7}

Data are separated by commas. Any text, whether it is a key or a string value, must be in double quotes. Numbers are written without quotes.

An array is delimited by square brackets as in:

    ["orange", "banana", "pear", "lemon"]

In the following code, there is one datum, identified by the key arr, whose value is an array:

    {"arr" : ["orange", "banana", "pear", "lemon"]}

Note: Objects can be nested in JSON, and each nested object is identified by its key.
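The syntax rules above can be produced from C++ with a few small helpers. The following is a minimal sketch, not a complete JSON writer: the function names quote, toJsonArray, and makePerson are made up for illustration, and quote handles only the most common escapes (other control characters would need \u escapes as well).

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Wrap text in double quotes, escaping a few characters JSON requires.
std::string quote(const std::string& s) {
    std::string out = "\"";
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    return out + "\"";
}

// Write a list of strings as a JSON array: ["a", "b"]
std::string toJsonArray(const std::vector<std::string>& items) {
    std::string out = "[";
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0) out += ", ";
        out += quote(items[i]);
    }
    return out + "]";
}

// Build an object holding a string, a number, and an array value.
std::string makePerson(const std::string& name, double height,
                       const std::vector<std::string>& fruits) {
    std::ostringstream os;
    os << "{" << quote("name") << " : " << quote(name) << ", "
       << quote("height") << " : " << height << ", "   // number: no quotes
       << quote("arr") << " : " << toJsonArray(fruits) << "}";
    return os.str();
}
```

Keys and string values pass through quote, while the number is written bare, which mirrors the rule that only text goes in double quotes.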

JSON Data Value

A JSON value can be:

  • a string
  • a number
  • an object
  • an array
  • a Boolean
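  • null

JSON has no function type. The source text of a function can be carried only as a string (that is, in double quotes), and the receiver must know how to interpret it. Likewise, a C++ date or any other object not in this list must be converted to a string to become a JSON value.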
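As an illustration of turning a non-JSON type into a string value, the sketch below formats a calendar date with the standard std::strftime function. The function name dateToJson and the choice of the YYYY-MM-DD layout are assumptions made for this example; JSON itself does not prescribe a date format.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Format a calendar date as a quoted string such as "2024-03-05".
// The YYYY-MM-DD layout is a convention chosen here, not a JSON rule.
std::string dateToJson(int year, int month, int day) {
    std::tm t = {};
    t.tm_year = year - 1900;   // std::tm counts years from 1900
    t.tm_mon  = month - 1;     // std::tm counts months from 0
    t.tm_mday = day;

    char buf[16];
    std::size_t n = std::strftime(buf, sizeof buf, "%Y-%m-%d", &t);
    return "\"" + std::string(buf, n) + "\"";
}
```

The receiver has to know that this particular string holds a date, which is one reason metadata is useful.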

Comparing C++ and JSON Objects

The following is a simple C++ program with a simple class, whose object is created with the default constructor:

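#include <iostream>
using namespace std;

// A simple class with one data member and one method
class TheCla
    {
        public:
        int num;    // not initialized by the default constructor

        int mthd (int it)
            {
                return it;
            }
    };

int main()
    {
        TheCla obj;    // instantiated with the implicit default constructor
        int no = obj.mthd(3);
        cout << no << endl;    // prints 3

        return 0;
    }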

One way to write the equivalent JSON object, carrying the method as text, is as follows:

    {"obj": {"num" : null, "mthd" : "int mthd (int it) { return it;}"}}

A JSON object is already in serialized (text) form.

Note how the name of the object has been indicated, and how the method has been carried as a string, since JSON has no function type. At the receiving end, the deserializing C++ program would have to turn this text into a C++ class and object and then compile it. It would also have to recognize the function in string form, remove the double quotes, and treat the contents as source text before compiling. When both ends already share the class definition, only the data members need to be sent.
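For comparison, here is a sketch of the simpler case in which both programs already include the TheCla class, so only the data member travels and the method is not sent as text. The names toJson and fromJson are hypothetical, and fromJson is a deliberately tiny reader for this one layout, not a general JSON parser.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

class TheCla {
public:
    int num = 0;
    int mthd(int it) { return it; }
};

// Serialize only the data member; the method is assumed to be compiled
// into any receiver that includes the same class definition.
std::string toJson(const TheCla& obj) {
    return "{\"obj\": {\"num\" : " + std::to_string(obj.num) + "}}";
}

// Tiny reader: finds the "num" key and converts the digits after the colon.
// A non-numeric value such as null leaves num at its default.
TheCla fromJson(const std::string& text) {
    TheCla obj;
    std::size_t key = text.find("\"num\"");
    if (key == std::string::npos) return obj;
    std::size_t colon = text.find(':', key);
    if (colon == std::string::npos) return obj;
    try {
        obj.num = std::stoi(text.substr(colon + 1));
    } catch (const std::invalid_argument&) {
        // value was not a number (for example, null): keep the default
    }
    return obj;
}
```

A production reader would parse the whole text properly; the point here is only that the data, not the code, makes the round trip.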

To facilitate this, metadata should be sent. Metadata is data about data. A C++ map holding the metadata can be sent. A map is itself a C++ object, so it too has to be converted to a JSON object. It is sent first, followed by the JSON object of interest.
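A metadata map and its conversion to JSON might look like the sketch below. The keys and values chosen for TheCla ("class", "num", "mthd") are invented for illustration, and the converter assumes they contain no characters that need escaping.

```cpp
#include <map>
#include <string>

// Convert a map of metadata to a JSON object.
// Assumes keys and values contain no quotes or backslashes.
std::string metadataToJson(const std::map<std::string, std::string>& meta) {
    std::string out = "{";
    bool first = true;
    for (const auto& entry : meta) {
        if (!first) out += ", ";
        out += "\"" + entry.first + "\" : \"" + entry.second + "\"";
        first = false;
    }
    return out + "}";
}

// Hypothetical metadata describing the members of TheCla.
std::map<std::string, std::string> describeTheCla() {
    return {{"class", "TheCla"}, {"num", "int"}, {"mthd", "function"}};
}
```

Because std::map keeps its keys sorted, the entries come out in alphabetical key order regardless of the order in which they were inserted.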

The JSON object is a stream object. After it has been prepared, it should be sent to a C++ ostream object to be saved as a file or sent through the network. At the receiving computer, a C++ istream object will receive the sequence. It will then be taken by the deserialization program, which will reproduce the object in C++ form. ostream and istream are standard library classes; for files, the ofstream and ifstream classes from the fstream header are derived from them.
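The hand-off to ostream and istream can be sketched as follows. The function names sendJson, receiveJson, and roundTrip are made up for this example, and an in-memory std::stringstream stands in for the file or network connection.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Sender side: write the JSON text to any output stream.
void sendJson(std::ostream& out, const std::string& json) {
    out << json;
}

// Receiver side: read everything the input stream holds.
std::string receiveJson(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Stand-in for a file or a network link: an in-memory stream.
std::string roundTrip(const std::string& json) {
    std::stringstream channel;
    sendJson(channel, json);
    return receiveJson(channel);
}
```

Because the functions take std::ostream and std::istream references, a std::ofstream or std::ifstream opened on a file can be passed to them unchanged.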

Note: In JavaScript (ECMAScript), serialization is called stringifying and deserialization is called parsing.

JSON Object and JavaScript Object

A JSON object and a JavaScript object are similar, but a JavaScript object has fewer restrictions than a JSON object. JSON was derived from the JavaScript object syntax, but today it can be used with many other computer languages. JSON is among the most common archive (serialized sequence) formats used to send data between web servers and their clients. Some C++ libraries use JSON, but none of them satisfies most of the goals of producing an archive for C++.

Note: in JavaScript, a function is not a string. A function received as a string remains a string after parsing; the program must convert it explicitly (for example, with the Function constructor) before it can be called.

More to Know

In addition to the above, in order to produce a serialization or deserialization library of your own, you also have to know:

  • how to express C++ pointers-to-objects in JSON format;
  • how to express C++ inheritance in JSON format;
  • how to express C++ polymorphism in JSON format; and
  • more on JSON.
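As a starting point for the first three items, one common approach is to write the pointed-to object by value (never its address) and to add a key that names its dynamic type, so the receiver can rebuild the right derived class. The sketch below assumes a hypothetical Shape hierarchy, a "type" key, and a makeShape factory; none of these names comes from an existing library.

```cpp
#include <memory>
#include <string>
#include <utility>

// Base class: every derived class writes its own type tag.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string toJson() const = 0;
};

struct Circle : Shape {
    double radius = 0.0;
    std::string toJson() const override {
        return "{\"type\" : \"Circle\", \"radius\" : "
               + std::to_string(radius) + "}";
    }
};

struct Square : Shape {
    double side = 0.0;
    std::string toJson() const override {
        return "{\"type\" : \"Square\", \"side\" : "
               + std::to_string(side) + "}";
    }
};

// Receiver side: rebuild an object of the right dynamic type from the tag.
// Returns a null pointer for a tag this program does not know.
std::unique_ptr<Shape> makeShape(const std::string& type, double size) {
    if (type == "Circle") {
        std::unique_ptr<Circle> c(new Circle);
        c->radius = size;
        return std::unique_ptr<Shape>(std::move(c));
    }
    if (type == "Square") {
        std::unique_ptr<Square> s(new Square);
        s->side = size;
        return std::unique_ptr<Shape>(std::move(s));
    }
    return nullptr;
}
```

Calling toJson through a Shape pointer selects the derived version at run time, so the tag always matches the real object, and makeShape shows how the receiving side could use that tag.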

Conclusion

Serialization converts an object into a stream of bytes to be stored on disk or sent to another computer through a network. Deserialization is the reverse process, applied to the serialized stream, which is called the archive.

Both fundamental objects and instantiated objects can be serialized. Single fundamental objects are rarely serialized on their own. However, since an instantiated object contains fundamental objects, they are serialized along with the whole object.

Serialization has one disadvantage: it exposes the private members of the C++ object. Serializing in binary makes the members harder to read, though it does not truly hide them. With text, metadata can be sent to indicate which members are private, but the programmer at the other end can still see them.

You might already have saved a binary or source-code program to disk, or sent one by email, and you might be wondering why anyone would save or send only an object. In C++, a whole library may consist of just one class, possibly with some inheritance, and that class may be longer than many short C++ programs. So one reason for sending objects is that some objects are large enough to be worth handling on their own. Object-Oriented Programming (OOP) involves the interaction of objects, similar to how animals, plants, and tools interact. Another reason is that OOP keeps improving, and programmers prefer to deal with objects rather than with a whole application, which may be too large.

C++ does not yet have a standard archive format for text or binary, though there are libraries for C++ serialization and deserialization. None of them is really satisfactory. JSON is the text archive format of JavaScript, but it can be used with any computer language. So, with the above guide, you should be able to start producing your own library for C++ marshalling and unmarshalling.

About the author

Chrysanthus Forcha

Discoverer of mathematics Integration from First Principles and related series. Master’s Degree in Technical Education, specializing in Electronics and Computer Software. BSc Electronics. I also have knowledge and experience at the Master’s level in Computing and Telecommunications. Out of 20,000 writers, I was the 37th best writer at devarticles.com. I have been working in these fields for more than 10 years.