Serialization Implementation Method Based on Generic Programming

Serialization is a dump-and-recover process: an object is dumped to a temporary buffer or a permanent file, and the contents of that buffer or file are later recovered into an object. It serves two purposes. First, it lets different application programs share and transfer data, which decouples them across applications, languages, and platforms. Second, when an application behaves abnormally or crashes at a customer site, it can immediately save the contents of its data structures to a file; once that file is sent back to the developers, the values can be recovered to help analyze and locate the cause.

Generic programming abstracts the implementation of one function over different data types (the STL source code is an example). The compiler deduces the concrete types and generates the implementation code at compile time. Where a particular type has special properties or optimization needs, a dedicated implementation can be supplied through features such as specialization, partial specialization, and template metaprogramming.
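As a small illustration of these features, the sketch below shows a primary template, a full specialization, a partial specialization, and compile-time type deduction. The names type_name and name_of are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Primary template: one implementation shared by every type.
template <typename T>
struct type_name
{
    static std::string get() { return "unknown"; }
};

// Full specialization for one concrete type.
template <>
struct type_name<int>
{
    static std::string get() { return "int"; }
};

// Partial specialization for a whole family of types.
template <typename T>
struct type_name<std::vector<T>>
{
    static std::string get() { return "vector of " + type_name<T>::get(); }
};

// Function template: T is deduced from the argument at compile time.
template <typename T>
std::string name_of(T const&)
{
    return type_name<T>::get();
}
```

Calling name_of(1) selects the int specialization, while name_of(std::vector<int>()) selects the partial specialization, which in turn reuses the int one.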

Hello World

#include <iostream>
int main(int argc, char* argv[])
{
    std::cout << "Hello World!" << std::endl;
    return 0;
}

Generic programming is actually all around us. Many of the functions and classes in the std namespace and the STL that we use every day are implemented with it. For example, in the code above, std::cout is an object of type std::ostream, and std::ostream is a typedef for std::basic_ostream<char>, a specialization of the class template std::basic_ostream.

namespace std
{
    typedef basic_ostream<char>         ostream;
}
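This relationship can be checked at compile time. The following is a minimal sketch; the helper name cout_is_basic_ostream_char is hypothetical.

```cpp
#include <cassert>
#include <iostream>
#include <type_traits>

// std::ostream is another name for the specialization std::basic_ostream<char>.
static_assert(std::is_same<std::ostream, std::basic_ostream<char>>::value,
              "std::ostream names std::basic_ostream<char>");

// Hypothetical helper: reports whether std::cout is an object of that specialization.
inline bool cout_is_basic_ostream_char()
{
    return std::is_same<decltype(std::cout), std::basic_ostream<char>>::value;
}
```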

Starting with the Standard Input and Output of C++

In addition to std::cout and std::basic_ostream mentioned above, C++ provides other input and output class templates, such as std::basic_istream, std::basic_ifstream, std::basic_ofstream, std::basic_istringstream, and std::basic_ostringstream. They mainly implement input and output for the built-in types, which is why Hello World can write a string literal directly. For input and output of a custom type, however, the operators >> and << must be overloaded, as for the custom class below.

class MyClip 
{
    bool        mValid;
    int         mIn;
    int         mOut; 
    std::string mFilePath; 
};

Using the class as follows produces a series of compilation errors.

MyClip clip;
std::cout << clip;

The errors essentially say that clip does not support the << operator, and that clip cannot be converted to any of the built-in types that std::cout supports, such as void* and int.

To fix the compilation errors, the class MyClip must support the input and output operators >> and <<. An implementation looks like this:

inline std::istream& operator>>(std::istream& st, MyClip& clip)
{
    st >> clip.mValid;
    st >> clip.mIn >> clip.mOut;
    st >> clip.mFilePath;
    return st;
}
inline std::ostream& operator<<(std::ostream& st, MyClip const& clip)
{
    st << clip.mValid << ' ';
    st << clip.mIn << ' ' << clip.mOut << ' ';
    st << clip.mFilePath << ' ';
    return st;
}

To access the private member variables of the class, these serialization and deserialization functions must also be declared as friend functions inside the custom type. (Recall why friend functions are needed rather than member operators: the left operand of >> and << is the stream, not the MyClip object.) For example:

friend std::istream& operator>>(std::istream& st, MyClip& clip);
friend std::ostream& operator<<(std::ostream& st, MyClip const& clip);
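Putting the class, the friend declarations, and the two operators together gives a complete round trip. This is a minimal sketch: the constructors, the equals member, and the dump and recover helpers are hypothetical additions for the example, and the file path is an arbitrary sample value.

```cpp
#include <cassert>
#include <sstream>
#include <string>

class MyClip
{
public:
    MyClip() : mValid(false), mIn(0), mOut(0) {}
    MyClip(bool valid, int in, int out, std::string const& path)
        : mValid(valid), mIn(in), mOut(out), mFilePath(path) {}

    // Hypothetical helper used to compare two clips in this example.
    bool equals(MyClip const& other) const
    {
        return mValid == other.mValid && mIn == other.mIn &&
               mOut == other.mOut && mFilePath == other.mFilePath;
    }

private:
    friend std::istream& operator>>(std::istream& st, MyClip& clip);
    friend std::ostream& operator<<(std::ostream& st, MyClip const& clip);

    bool        mValid;
    int         mIn;
    int         mOut;
    std::string mFilePath;
};

inline std::istream& operator>>(std::istream& st, MyClip& clip)
{
    st >> clip.mValid >> clip.mIn >> clip.mOut >> clip.mFilePath;
    return st;
}

inline std::ostream& operator<<(std::ostream& st, MyClip const& clip)
{
    st << clip.mValid << ' ' << clip.mIn << ' ' << clip.mOut << ' '
       << clip.mFilePath << ' ';
    return st;
}

// Dump a clip into a string.
inline std::string dump(MyClip const& clip)
{
    std::ostringstream ostr;
    ostr << clip;
    return ostr.str();
}

// Recover a clip from a string produced by dump().
inline MyClip recover(std::string const& text)
{
    std::istringstream istr(text);
    MyClip clip;
    istr >> clip;
    return clip;
}
```

Note that this whitespace-separated text format only round-trips file paths that contain no spaces, because operator>> for std::string stops at the first whitespace character.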

This way of implementing serialization is intuitive and easy to understand. Its drawback shows in large projects, where the number of custom types may reach tens of thousands or more: every type needs 2 functions, one to serialize (dump) the data and one to deserialize (recover) it. This increases the amount of code to write, and both functions must be updated whenever the member variables of a class change later.

Now consider more complex custom types, such as classes that use inheritance and have member variables of custom types.

class MyVideo : public MyClip
{
    std::list<MyFilter> mFilters;
};

In the code above, dumping and recovering a MyVideo object is more complicated, because the base class must be dumped and recovered as well. In addition, the member variable combines the STL container template std::list with the custom class MyFilter, so dumping and recovering must also be defined for that combination.
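To make the extra work concrete, here is a sketch of what the stream-operator approach would need for MyVideo. It rests on assumptions: the article does not show the members of MyFilter, so a single hypothetical field mStrength stands in for them, MyClip is reduced to two public fields, and the list format (element count followed by the elements) is one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <sstream>
#include <string>

// Hypothetical stand-ins, simplified to keep the sketch short.
struct MyFilter { int mStrength; };
struct MyClip   { int mIn; int mOut; };
struct MyVideo : MyClip { std::list<MyFilter> mFilters; };

inline std::ostream& operator<<(std::ostream& st, MyFilter const& f) { return st << f.mStrength << ' '; }
inline std::istream& operator>>(std::istream& st, MyFilter& f)       { return st >> f.mStrength; }
inline std::ostream& operator<<(std::ostream& st, MyClip const& c)   { return st << c.mIn << ' ' << c.mOut << ' '; }
inline std::istream& operator>>(std::istream& st, MyClip& c)         { return st >> c.mIn >> c.mOut; }

// The container needs its own pair: write the element count, then each element.
template <typename T>
std::ostream& operator<<(std::ostream& st, std::list<T> const& items)
{
    st << items.size() << ' ';
    for (typename std::list<T>::const_iterator it = items.begin(); it != items.end(); ++it)
        st << *it;
    return st;
}

template <typename T>
std::istream& operator>>(std::istream& st, std::list<T>& items)
{
    std::size_t count = 0;
    st >> count;
    items.clear();
    for (std::size_t i = 0; i < count && st; ++i) {
        T item;
        st >> item;
        items.push_back(item);
    }
    return st;
}

// The derived class dumps the base-class part first, then its own members.
inline std::ostream& operator<<(std::ostream& st, MyVideo const& v)
{
    return st << static_cast<MyClip const&>(v) << v.mFilters;
}

inline std::istream& operator>>(std::istream& st, MyVideo& v)
{
    return st >> static_cast<MyClip&>(v) >> v.mFilters;
}
```

Even in this reduced form, three types and one container already require eight functions, which illustrates how quickly the approach grows.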

Given these problems, is there a way to reduce the amount of code that must be written and modified, while keeping it easy to understand and maintain?

Boost Serialization Library

For the problems encountered with the C++ standard input and output approach, Boost provides a good solution: the dump and recover operations of a type are abstracted into 1 function, which is easy to understand. For the types above, the 2 friend functions are simply replaced with the following 1 friend function.

template<typename Archive> friend void serialize(Archive&, MyClip&, unsigned int const);

The friend function is implemented along these lines:

template <typename A>
void serialize(A& ar, MyClip& clip, unsigned int const ver)
{
    ar & BOOST_SERIALIZATION_NVP(clip.mValid); 
    ar & BOOST_SERIALIZATION_NVP(clip.mIn);
    ar & BOOST_SERIALIZATION_NVP(clip.mOut);
    ar & BOOST_SERIALIZATION_NVP(clip.mFilePath);
}

BOOST_SERIALIZATION_NVP is a macro defined by Boost. It wraps a variable in a name-value pair, so that archive formats that need member names, such as XML, can record them.

Dumping and recovering are then done directly with the operators << and >> on an archive object. For example:

// store
MyClip clip;
...
std::ostringstream ostr; 
boost::archive::text_oarchive oa(ostr);
oa << clip;

// load
std::istringstream istr(ostr.str());
boost::archive::text_iarchive ia(istr);
ia >> clip;

The std::istringstream and std::ostringstream are used here to recover data from the string stream and dump the data of class objects into the string stream, respectively.

The classes MyFilter and MyVideo are handled the same way: each gets its own implementation of the template friend function "serialize". As for the std::list class template, Boost already provides the implementation.

At this point, all we need to do for each class is declare a template friend function inside the class and implement the template function outside it. When the member variables of the class later change, for example when one is added, deleted, or renamed, only that 1 function has to be modified.

The Boost serialization library already works well, but the story is not over!

While developing for the terminal, we ran into several challenges in using the Boost serialization library.

Little documentation exists on compiling Boost for the terminal, and official documentation on it is practically nonexistent. Switching between versions often produces strange compilation errors.
Boost is not compatible enough across different C++ development standards. In particular, many problems occur when compiling and linking against the libc++ standard library.
Boost increases the size of the release package on the terminal.
Boost adds private header information, such as the serialization library name and version number, to each serialized output and parses it again during deserialization, which reduces performance in some scenarios.

Serialization Implementation Method Based on Generic Programming

To solve these problems encountered in using Boost, we think it is necessary to re-implement the serialization library to remove the dependence on Boost and meet the following requirements at the same time:

The Boost serialization library is heavily used in existing projects, so compatibility with the existing code and the habits of developers is the primary goal.
The workload of code modification and refactoring is minimized.
It is compatible with different C++ development standards.
It provides higher performance than the Boost serialization library.
The size of the release package on the terminal is reduced.

To stay compatible with the existing Boost-based code and with developers' habits, and to minimize the refactoring workload, we keep the template function "serialize". Inside that function, the member variables are no longer repackaged; to improve efficiency, the macro is simply defined as follows.

#define BOOST_SERIALIZATION_NVP(value)  value

The dumping and recovering interfaces are called the same way as before, except that the archive classes are replaced with the following:

alivc::text_oarchive oa(ostr);
alivc::text_iarchive ia(istr);

So far, the external interface of the serialization library has been completed, and the rest is the internal work. How should the internal framework of the serialization library be redesigned and implemented to meet the requirements?

First, let's look at the process flow chart of the current design architecture.

For example, the dumping class text_oarchive must support the following interfaces:

explicit text_oarchive(std::ostream& ost, unsigned int version = 0);
template <typename T> text_oarchive& operator<<(T& v);
template <typename T> text_oarchive& operator&(T& v);

When developers call the operator <<, the call is first forwarded to the template function "serialize" of the corresponding type.

template <typename T>
text_oarchive& operator<<(T& v)
{
    serialize(*this, v, mversion);
    return *this;
}

Inside "serialize", each member of the type is then passed to the operator &, and a decision is needed at this point. If the member variable is of a built-in type, it is serialized directly. If it is of a custom type, the call is forwarded to the template function "serialize" of that type.

template <typename T>
text_oarchive& operator&(T& v)
{
    basic_save<T>::invoke(*this, v, mversion);
    return *this;
}

In the code above, basic_save<T>::invoke is resolved through template type deduction at compile time. It either dumps a built-in type directly, or calls the "serialize" function for the type of the member variable, which repeats the above process.

Because the number of built-in types is limited, we make forwarding to the "serialize" function of the corresponding type the default behavior of the class template basic_save.

template <typename T, bool E = false>
struct basic_load_save
{
    template <typename A>
    static void invoke(A& ar, T& v, unsigned int version)
    {
        serialize(ar, v, version);
    }
};

template <typename T> 
struct basic_save : public basic_load_save<T, std::is_enum<T>::value>
{
};

The template above has an additional parameter E, which exists mainly to give enumerated types special handling. The partial specialization is implemented as follows:

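template <typename T>
struct basic_load_save<T, true>
{
    template <typename A>
    static void invoke(A& ar, T& v, unsigned int version)
    {
        // Convert through int; the explicit casts also work for scoped enums (enum class).
        int tmp = static_cast<int>(v);
        ar & tmp;
        v = static_cast<T>(tmp);
    }
};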
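The effect of the parameter E can be tried in isolation. In the sketch below, int_writer is a hypothetical archive that only knows how to write int values, and ClipKind and dump_kind are hypothetical names for the example; basic_load_save and basic_save follow the definitions above, with explicit casts so that a scoped enum also works.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical archive for this sketch: it can only write int values.
struct int_writer
{
    std::ostringstream out;
    int_writer& operator&(int& v) { out << v << ' '; return *this; }
};

// Primary template: the non-enum path, which forwards to serialize().
template <typename T, bool E = false>
struct basic_load_save
{
    template <typename A>
    static void invoke(A& ar, T& v, unsigned int version)
    {
        serialize(ar, v, version);
    }
};

// Partial specialization, selected when E is true, that is, when T is an enum.
template <typename T>
struct basic_load_save<T, true>
{
    template <typename A>
    static void invoke(A& ar, T& v, unsigned int)
    {
        int tmp = static_cast<int>(v);
        ar & tmp;
        v = static_cast<T>(tmp);
    }
};

template <typename T>
struct basic_save : basic_load_save<T, std::is_enum<T>::value>
{
};

enum class ClipKind { Audio = 1, Video = 2 };  // hypothetical enum

// Dumps one enum value through the dispatch above.
inline std::string dump_kind(ClipKind kind)
{
    int_writer ar;
    basic_save<ClipKind>::invoke(ar, kind, 0);
    return ar.out.str();
}
```

Because std::is_enum<ClipKind>::value is true, the compiler picks the specialization, so no "serialize" function for ClipKind is ever required.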

At this point the default behavior of the overloaded operator & is complete: it keeps recursing into the template function "serialize" of each member variable's type. For a built-in type such as int, however, the recursion has to stop.

template <typename T>
struct basic_pod_save
{
    template <typename A>
    static void invoke(A& ar, T const& v, unsigned int)
    {
        ar.save(v);
    }
};

template <>
struct basic_save<int> : public basic_pod_save<int>
{
};

For the int type, the integer value is dumped directly into the output stream. This requires adding a final dump function to text_oarchive.

template <typename T>
void save(T const& v)
{
    most << v << ' ';
}

This save member function is where the value of a specific member variable is finally written to the stream.

Other built-in types are handled in the same way, with the implementation modeled on the source code of std::basic_ostream.
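The pieces described so far can be assembled into one compilable unit. This is a minimal sketch under stated assumptions, not the actual library: the namespace sketch, the dump helper, the MyClip constructor, and the specializations for bool and std::string are additions for the example, and the enum path through basic_load_save is left out for brevity.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Defined as in the article: the member is passed through without repackaging.
#define BOOST_SERIALIZATION_NVP(value) value

namespace sketch {  // hypothetical namespace, used only for this example

// Default behavior: forward to serialize() of the member's own type.
template <typename T>
struct basic_save
{
    template <typename A>
    static void invoke(A& ar, T& v, unsigned int version)
    {
        serialize(ar, v, version);
    }
};

// Built-in types end the recursion: the archive writes the value itself.
template <typename T>
struct basic_pod_save
{
    template <typename A>
    static void invoke(A& ar, T const& v, unsigned int)
    {
        ar.save(v);
    }
};

template <> struct basic_save<bool>        : basic_pod_save<bool> {};
template <> struct basic_save<int>         : basic_pod_save<int> {};
template <> struct basic_save<std::string> : basic_pod_save<std::string> {};

class text_oarchive
{
public:
    explicit text_oarchive(std::ostream& ost, unsigned int version = 0)
        : most(ost), mversion(version) {}

    // Entry point: hand the whole object to its serialize() function.
    template <typename T>
    text_oarchive& operator<<(T& v) { serialize(*this, v, mversion); return *this; }

    // Per-member dispatch, resolved at compile time through basic_save<T>.
    template <typename T>
    text_oarchive& operator&(T& v) { basic_save<T>::invoke(*this, v, mversion); return *this; }

    // Final step: write one built-in value followed by a separator.
    template <typename T>
    void save(T const& v) { most << v << ' '; }

private:
    std::ostream& most;
    unsigned int  mversion;
};

} // namespace sketch

class MyClip
{
public:
    MyClip(bool valid, int in, int out, std::string const& path)
        : mValid(valid), mIn(in), mOut(out), mFilePath(path) {}

private:
    template <typename Archive>
    friend void serialize(Archive&, MyClip&, unsigned int const);

    bool        mValid;
    int         mIn;
    int         mOut;
    std::string mFilePath;
};

template <typename A>
void serialize(A& ar, MyClip& clip, unsigned int const)
{
    ar & BOOST_SERIALIZATION_NVP(clip.mValid);
    ar & BOOST_SERIALIZATION_NVP(clip.mIn);
    ar & BOOST_SERIALIZATION_NVP(clip.mOut);
    ar & BOOST_SERIALIZATION_NVP(clip.mFilePath);
}

// Hypothetical helper: dump one clip into a string.
inline std::string dump(MyClip& clip)
{
    std::ostringstream ostr;
    sketch::text_oarchive oa(ostr);
    oa << clip;
    return ostr.str();
}
```

The "serialize" function of MyClip is unchanged from the Boost version; only the archive class and the macro definition differ.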

Correspondingly, the operation flow of text_iarchive for recovery operation is as follows:
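In code, the recovery side might mirror the dumping side as sketched below. The names basic_load, basic_pod_load, and load are assumptions chosen by analogy with basic_save, basic_pod_save, and save; the article does not show them. The namespace sketch, the recover helper, and the MyClip accessors are likewise hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

#define BOOST_SERIALIZATION_NVP(value) value

namespace sketch {  // hypothetical namespace, used only for this example

// Default behavior: forward to serialize() of the member's own type.
template <typename T>
struct basic_load
{
    template <typename A>
    static void invoke(A& ar, T& v, unsigned int version)
    {
        serialize(ar, v, version);
    }
};

// Built-in types end the recursion: the archive reads the value itself.
template <typename T>
struct basic_pod_load
{
    template <typename A>
    static void invoke(A& ar, T& v, unsigned int)
    {
        ar.load(v);
    }
};

template <> struct basic_load<bool>        : basic_pod_load<bool> {};
template <> struct basic_load<int>         : basic_pod_load<int> {};
template <> struct basic_load<std::string> : basic_pod_load<std::string> {};

class text_iarchive
{
public:
    explicit text_iarchive(std::istream& ist, unsigned int version = 0)
        : mist(ist), mversion(version) {}

    // Entry point: hand the whole object to its serialize() function.
    template <typename T>
    text_iarchive& operator>>(T& v) { serialize(*this, v, mversion); return *this; }

    // Per-member dispatch, resolved at compile time through basic_load<T>.
    template <typename T>
    text_iarchive& operator&(T& v) { basic_load<T>::invoke(*this, v, mversion); return *this; }

    // Final step: read one whitespace-separated built-in value.
    template <typename T>
    void load(T& v) { mist >> v; }

private:
    std::istream& mist;
    unsigned int  mversion;
};

} // namespace sketch

class MyClip
{
public:
    MyClip() : mValid(false), mIn(0), mOut(0) {}

    // Hypothetical accessors so the example can inspect the result.
    bool valid() const { return mValid; }
    int in() const { return mIn; }
    int out() const { return mOut; }
    std::string const& filePath() const { return mFilePath; }

private:
    template <typename Archive>
    friend void serialize(Archive&, MyClip&, unsigned int const);

    bool        mValid;
    int         mIn;
    int         mOut;
    std::string mFilePath;
};

template <typename A>
void serialize(A& ar, MyClip& clip, unsigned int const)
{
    ar & BOOST_SERIALIZATION_NVP(clip.mValid);
    ar & BOOST_SERIALIZATION_NVP(clip.mIn);
    ar & BOOST_SERIALIZATION_NVP(clip.mOut);
    ar & BOOST_SERIALIZATION_NVP(clip.mFilePath);
}

// Hypothetical helper: recover one clip from text.
inline MyClip recover(std::string const& text)
{
    std::istringstream istr(text);
    sketch::text_iarchive ia(istr);
    MyClip clip;
    ia >> clip;
    return clip;
}
```

The same "serialize" function serves both directions, because the operator & means "write" on an output archive and "read" on an input archive.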

Test Results

We have conducted a comparative test on using the Boost serialization library and the reimplemented serialization library. The results are as follows:

The refactoring workload is very small. It is only necessary to delete the Boost header files, replace the Boost namespaces with alivc, and redefine the macros BOOST_SERIALIZATION_FUNCTION and BOOST_SERIALIZATION_NVP.
The size of the release package on Android is reduced by about 500 KB.
In the current message processing framework, the average time to process a message is reduced from 100 µs to 25 µs.
The code implementation is about 300 lines, which is more lightweight.

Future Work

Because the current project does not require it, the re-implemented serialization library does not yet support dumping and recovering the data that a pointer points to. However, the current design framework has taken this extensibility into account, and it may be supported in the future.

Conclusion

Generic programming can greatly improve development efficiency, especially through code reuse. Because its type deduction and code generation are completed at compile time, it does not reduce runtime performance.
Serialization plays an important role in decoupling that relies on dumping and recovering, and in analyzing the causes of exceptions and crashes.
By using C++ language features and templates together with a reasonable architecture, the library is easy to extend while avoiding over-design.

