
An Introduction and Comparison of Several Common Java Serialization Frameworks

This article compares open-source serialization frameworks by universality, usability, scalability, performance, and support for Java data types and syntax.


1. Background

Serialization and deserialization are everyday techniques for data persistence and network transmission. However, the sheer number of serialization frameworks makes it hard to choose the right one for a given scenario. This article compares common open-source serialization frameworks on five criteria: universality, usability, scalability, performance, and support for Java data types and syntax.

  • Universality: Universality indicates whether the serialization framework supports cross-language and cross-platform serialization.
  • Usability: Usability refers to whether the serialization framework is easy to use and debug, which affects development efficiency.
  • Scalability: As the business grows, the transmitted entity may change while older versions of it remain in use. So, the framework must tolerate changes such as added or removed fields.
  • Performance: Serialization performance covers both time overhead and space overhead. Serialized data is usually persisted or sent over the network, so its size is an important indicator. Encoding and decoding time is equally important because it directly affects system throughput.
  • Support for Java Data Types and Syntax: Different serialization frameworks support different data types and syntax. This article tests each framework against common Java data types and language features.

The following parts test and compare JDK Serializable, FST, Kryo, Protobuf, Thrift, Hessian, and Avro.

2. Serialization Frameworks

2.1 JDK Serializable

JDK Serializable is the serialization mechanism built into Java. A class opts in by implementing java.io.Serializable or java.io.Externalizable. Implementing one of these interfaces only marks the class as eligible for serialization. ObjectOutputStream and ObjectInputStream are still required to actually serialize and deserialize objects.

The following demo shows encoding and decoding using JDK Serializable:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

/**
 * Encoding
 */
public static byte[] encoder(Object ob) throws Exception {
    //In-memory buffer that receives the serialized bytes
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    //Serialize the object; try-with-resources flushes and closes the stream
    try (ObjectOutputStream objectOutputStream = new ObjectOutputStream(byteArrayOutputStream)) {
        objectOutputStream.writeObject(ob);
    }
    //Read the bytes only after the object stream has been flushed
    return byteArrayOutputStream.toByteArray();
}
/**
 * Decoding
 */
@SuppressWarnings("unchecked")
public static <T> T decoder(byte[] bytes) throws Exception {
    //Wrap the bytes and read the object back; the stream is closed automatically
    try (ObjectInputStream objectInputStream = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
        return (T) objectInputStream.readObject();
    }
}
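
The demo above relies on java.io.Serializable. For the other interface mentioned earlier, java.io.Externalizable, the class writes and reads its own fields. The following is a minimal sketch of that approach. The Account class and the roundTrip helper are hypothetical names used only for illustration and are not part of the test code in this article.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {
    // Hypothetical class for illustration only.
    public static class Account implements Externalizable {
        String username;
        int age;

        // Externalizable classes need a public no-arg constructor for deserialization.
        public Account() {}

        Account(String username, int age) {
            this.username = username;
            this.age = age;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            // The class decides exactly which fields are written, and in which order.
            out.writeUTF(username);
            out.writeInt(age);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Fields must be read back in the same order they were written.
            username = in.readUTF();
            age = in.readInt();
        }
    }

    // Serializes the object to a byte array and immediately deserializes it.
    static Object roundTrip(Object source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(source);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = (Account) roundTrip(new Account("abcdefg", 27));
        System.out.println(copy.username + " " + copy.age);
    }
}
```

Externalizable gives full control over the wire format, at the cost of maintaining the read and write methods by hand.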
  • Universality

JDK Serializable is a built-in serialization framework of Java. Therefore, cross-language serialization and deserialization are not supported.

  • Usability

JDK Serializable requires no external dependencies. However, it is more cumbersome to use than the open-source frameworks. As the demo above shows, the caller has to wrap ObjectOutputStream and ObjectInputStream around ByteArrayOutputStream and ByteArrayInputStream just to convert an object to and from a byte array.

  • Scalability

JDK Serializable uses serialVersionUID to control the version of a serialized class. If the serialVersionUID recorded in the stream differs from that of the local class, deserialization throws java.io.InvalidClassException:

java.io.InvalidClassException: com.yjz.serialization.java.UserInfo; local class incompatible: stream classdesc serialVersionUID = -5548195544707231683, local class serialVersionUID = -5194320341014913710

The exception above occurs because serialVersionUID was not declared, so the JDK computed it by hashing the class structure. Once the class changed between serialization and deserialization, the computed values no longer matched.

To avoid this problem, declare serialVersionUID explicitly and keep it unchanged across versions. With a fixed serialVersionUID, JDK Serializable supports field extension.

private static final long serialVersionUID = 1L;
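
To check which serialVersionUID the runtime associates with a class, java.io.ObjectStreamClass can be queried. The sketch below compares a class that declares the value with one that leaves it to be computed. The class names Pinned and Unpinned are hypothetical.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidDemo {
    // Hypothetical class that declares its version explicitly.
    static class Pinned implements Serializable {
        private static final long serialVersionUID = 1L;
        String username;
    }

    // Hypothetical class without a declared version: the runtime computes one
    // from the class structure, so it can change whenever the class changes.
    static class Unpinned implements Serializable {
        String username;
    }

    // Returns the serialVersionUID that the serialization runtime uses for the class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Pinned:   " + uidOf(Pinned.class));
        System.out.println("Unpinned: " + uidOf(Unpinned.class));
    }
}
```

The declared value stays at 1 no matter how the class evolves, while the computed value depends on the fields and methods present at compile time.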
  • Performance

Although JDK Serializable is built specifically for Java, its performance is poor. The following test sample is also used for all the other frameworks.

public class MessageInfo implements Serializable {

    private String username;
    private String password;
    private int age;
    private HashMap<String,Object> params;
    ...
    public static MessageInfo buildMessage() {
        MessageInfo messageInfo = new MessageInfo();
        messageInfo.setUsername("abcdefg");
        messageInfo.setPassword("123456789");
        messageInfo.setAge(27);
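        HashMap<String,Object> map = new HashMap<>();
        for(int i = 0; i< 20; i++) {
            map.put(String.valueOf(i),"a");
        }
        //Attach the map to the message (setParams is assumed to be among the elided accessors)
        messageInfo.setParams(map);
        return messageInfo;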
    }
}

The byte size after serialization by JDK Serializable is 432. This number will be compared to the other serialization frameworks.

Now, perform serialization and deserialization on the test sample 10 million times and then calculate the total time consumption:

Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms)
38,952 | 96,508

The results will also be compared with those of other serialization frameworks.
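
The measurement described above can be sketched as a simple timing loop. This is only an illustration and not the harness that produced the numbers in this article. The Sample class is a hypothetical stand-in for MessageInfo, the iteration count is reduced, and there is no JIT warm-up, so absolute numbers will differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

class SerializationTimer {
    // Hypothetical stand-in for the MessageInfo test sample.
    static class Sample implements Serializable {
        private static final long serialVersionUID = 1L;
        String username = "abcdefg";
        String password = "123456789";
        int age = 27;
        HashMap<String, Object> params = new HashMap<>();

        Sample() {
            for (int i = 0; i < 20; i++) {
                params.put(String.valueOf(i), "a");
            }
        }
    }

    static byte[] encode(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object decode(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs the task the given number of times and returns the elapsed milliseconds.
    static long timeMillis(int iterations, Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int iterations = 100_000; // the article uses 10 million
        Sample sample = new Sample();
        byte[] bytes = encode(sample);
        System.out.println("byte size: " + bytes.length);
        System.out.println("serialization (ms): " + timeMillis(iterations, () -> encode(sample)));
        System.out.println("deserialization (ms): " + timeMillis(iterations, () -> decode(bytes)));
    }
}
```

Swapping the bodies of encode and decode for another framework's calls keeps the comparison on equal footing.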

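  • Support for Java Data Types and Syntax

JDK Serializable supports most Java data types and syntax.

[Figure 2: JDK Serializable support for Java data types]

WeakHashMap does not implement java.io.Serializable, so it cannot be serialized.

[Figure 3: JDK Serializable support for Java syntax]

Note 1: Consider serializing the following lambda: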

Runnable runnable = () -> System.out.println("Hello");

Serializing it directly throws a java.io.NotSerializableException that names the lambda class:

com.yjz.serialization.SerializerFunctionTest$$Lambda$1/189568618

A plain Runnable lambda does not implement java.io.Serializable. To make the lambda serializable, cast it to an intersection type as follows:

Runnable runnable = (Runnable & Serializable) () -> System.out.println("Hello");
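
The difference between the two lambdas can be checked with a short, self-contained sketch. The helper names canSerialize and roundTrip are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class LambdaSerializationDemo {
    static Runnable plainLambda() {
        return () -> System.out.println("Hello");
    }

    static Runnable serializableLambda() {
        // The intersection cast makes the compiler emit a serializable lambda.
        return (Runnable & Serializable) () -> System.out.println("Hello");
    }

    static byte[] encode(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    // Returns false when the object is rejected with NotSerializableException.
    static boolean canSerialize(Object value) {
        try {
            encode(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the task and reads it back as a new Runnable.
    static Runnable roundTrip(Runnable task) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(encode(task)))) {
            return (Runnable) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(plainLambda()));        // false
        System.out.println(canSerialize(serializableLambda())); // true
        roundTrip(serializableLambda()).run();                  // prints "Hello"
    }
}
```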

2.2 FST

Fast-serialization (FST) is a Java serialization framework that is fully compatible with the JDK serialization protocol. Its serialization is reported to be up to ten times faster than JDK Serializable, with output roughly one-third the size. The latest FST version at the time of writing is 2.56. FST has supported Android since version 2.17.

The following demo shows how to use FST for serialization. One FSTConfiguration can be shared by multiple threads. However, to avoid a performance bottleneck under frequent calls, ThreadLocal is usually used to give each thread its own FSTConfiguration.

private final ThreadLocal<FSTConfiguration> conf = ThreadLocal.withInitial(() -> {
      FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration();
      return conf;
  });

public byte[] encoder(Object object) {
    return conf.get().asByteArray(object);
}

public <T> T decoder(byte[] bytes) {
    Object ob = conf.get().asObject(bytes);
    return (T)ob;
}
  • Universality

FST is also developed specifically for Java, so it does not support cross-language serialization either.

  • Usability

In terms of usability, FST is much better than JDK Serializable. Its syntax is extremely simple because FSTConfiguration encapsulates most methods.

  • Scalability

FST uses the @Version annotation to keep new fields compatible with old data streams. Every newly added field must be marked with @Version. A field without @Version is treated as version 0.

private String origiField;
@Version(1)
private String addField;

Note:

  • Deleting a field that was added with @Version breaks backward compatibility. Deleting an original (version 0) field does not, as long as no new fields have been added.
  • The @Version annotation does not apply to classes that implement their own readObject and writeObject methods.
  • If a custom serializer is implemented, it must handle versioning itself.

On the whole, FST supports field extension, but doing so is fairly complicated.

  • Performance

Serializing the same test sample with FST produces 172 bytes, about 40% of the JDK Serializable size (432 bytes). The following table shows the time consumption of serialization and deserialization.

Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms)
13,587 | 19,031

FST can be optimized by disabling shared (circular) reference tracking and pre-registering the serialized classes.

private static final ThreadLocal<FSTConfiguration> conf = ThreadLocal.withInitial(() -> {
      FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration();
      conf.registerClass(UserInfo.class);
      conf.setShareReferences(false);
      return conf;
  });

After the optimization above, the time consumption is listed below:

Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms)
7,609 | 17,792

The serialization time consumption has decreased by nearly half, but the byte size has increased to 191.

  • Support for Java Data Types and Syntax

FST is developed based on JDK Serializable. Therefore, they support the same Java data types and syntax.

[Figures 4 and 5: FST support for Java data types and syntax]

2.3 Kryo

Kryo is a fast and efficient Java binary serialization framework. It relies on the ASM library to generate bytecode, so it runs quickly. Kryo aims for fast serialization, small output, and simple APIs. Kryo also supports automatic deep and shallow copying, and it copies directly from object to object (object → object) rather than through an intermediate byte array (object → byte → object).

The following demo shows how to use Kryo for serialization:

private static final ThreadLocal<Kryo> kryoLocal = ThreadLocal.withInitial(() -> {
    Kryo kryo = new Kryo();
    kryo.setRegistrationRequired(false); //No need to pre-register the class
    return kryo;
});

public static byte[] encoder(Object object) {
    //Start with a 1 KB buffer that can grow without limit (-1)
    Output output = new Output(1024, -1);
    //Write the class together with the object so that readClassAndObject can decode it
    kryoLocal.get().writeClassAndObject(output, object);
    output.flush();
    return output.toBytes();
}

public static <T> T decoder(byte[] bytes) {
    Input input = new Input(bytes);
    Object ob = kryoLocal.get().readClassAndObject(input);
    return (T) ob;
}

Note: Each write method must be paired with the matching read method. For example, writeClassAndObject() must be used together with readClassAndObject(), and writeObject() with readObject().

  • Universality

The official website describes Kryo as a Java binary serialization framework. Some articles mention that using Kryo across languages is very complicated, but no implementations in other languages could be found.

  • Usability

The APIs provided by Kryo are also simple and easy to use. Input and Output encapsulate almost all stream operations. Kryo also offers rich and flexible configuration, such as custom serializers and a configurable default serializer, although these advanced options are harder to use.

  • Scalability

Kryo's default serializer, FieldSerializer, does not support field extension. To support field extension, a different default serializer must be configured.

For example:

private static final ThreadLocal<Kryo> kryoLocal = ThreadLocal.withInitial(() -> {
        Kryo kryo = new Kryo();
        kryo.setRegistrationRequired(false);
        kryo.setDefaultSerializer(TaggedFieldSerializer.class);
        return kryo;
    });
  • Performance

With Kryo, the byte size after serialization is 172, which is the same as FST before optimization. The time consumption is listed below:

Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms)
13,550 | 14,315

After disabling circular references and pre-registering the serialized class, the byte size drops to 120 because the class is identified by a number instead of by its class name. The time consumption is listed below:

Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms)
11,799 | 11,584
  • Support for Java Data Types and Syntax

Kryo requires no-arg constructors to serialize classes because no-arg constructors are used to create objects during deserialization.
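
The reason can be illustrated with plain reflection. The following sketch is a simplified illustration of reflective instantiation in general and is not Kryo's actual code. The class names are hypothetical.

```java
import java.lang.reflect.Constructor;

class NoArgConstructorDemo {
    // Hypothetical class that can be created without any arguments.
    static class WithNoArg {
        String name;

        WithNoArg() {}
    }

    // Hypothetical class that only offers a constructor with parameters.
    static class WithoutNoArg {
        final String name;

        WithoutNoArg(String name) {
            this.name = name;
        }
    }

    // Tries to create an empty instance the way a deserializer typically would:
    // look up the no-arg constructor, then fill in the fields afterwards.
    static boolean canInstantiate(Class<?> type) {
        try {
            Constructor<?> constructor = type.getDeclaredConstructor();
            constructor.setAccessible(true);
            constructor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // NoSuchMethodException lands here when no no-arg constructor exists.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiate(WithNoArg.class));    // true
        System.out.println(canInstantiate(WithoutNoArg.class)); // false
    }
}
```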

[Figures 6 and 7: Kryo support for Java data types and syntax]

2.4 Protobuf

Protocol Buffers (Protobuf) is a language-neutral, platform-independent, and extensible serialization framework. Unlike the frameworks above, Protobuf requires the schema to be defined in advance.

The following demo shows how to use Protobuf:

(1) Prepare the .proto description file:

syntax = "proto3";

option java_package = "com.yjz.serialization.protobuf3";

message MessageInfo
{
    string username = 1;
    string password = 2;
    int32 age = 3;
    map<string,string> params = 4;
}

(2) Generate Java code:

protoc --java_out=./src/main/java message.proto

(3) The generated Java code already contains encoding and decoding methods:

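//Build a message with the generated builder
Message.MessageInfo messageInfo = Message.MessageInfo.newBuilder().setUsername("abcdefg").build();
//Encoding
byte[] bytes = messageInfo.toByteArray();
//Decoding
Message.MessageInfo decoded = Message.MessageInfo.parseFrom(bytes);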
  • Universality

Protobuf is designed as a language-independent serialization framework. Currently, it supports Java, Python, C++, Go, and C# and provides third-party packages for many other languages. Therefore, in terms of universality, Protobuf is very powerful.

  • Usability

Protobuf uses interface definition language (IDL) to define the schema description file. After defining the description file, the protoc compiler can be used to generate serialization and deserialization code directly. Therefore, to use Protobuf, users simply need to prepare the description file.

  • Scalability

Scalability is also one of the goals of Protobuf design. The .proto files can be modified easily.

Add fields: To add fields, make sure that the new fields have corresponding default values to interact with the old code. The message generated by the new protocol can be parsed by the old protocol.

Delete fields: After a field is deleted, its name and tag must not be reused in later updates. The reserved keyword can be used to prevent accidental reuse.

message userinfo {
  reserved 3, 7;  //Reserve the tags of deleted fields
  reserved "age", "sex";  //Reserve the names of deleted fields
}

Protobuf also treats several value types as compatible with one another, such as int32, uint32, int64, uint64, and bool. A field can be changed among these types as needed.
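
These types are interchangeable because they all share the varint wire type, and each field is identified on the wire by its tag number rather than by its name. The following is a minimal sketch of that encoding in plain Java. It does not use the Protobuf library, and the method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

class VarintSketch {
    static final int WIRE_TYPE_VARINT = 0;

    // Encodes a value as a base-128 varint: 7 bits per byte, least significant
    // group first, with the high bit set on every byte except the last.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // A field key combines the field number with the wire type.
    static int fieldKey(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    static String hex(byte[] bytes) {
        StringBuilder text = new StringBuilder();
        for (byte b : bytes) {
            text.append(String.format("%02x", b & 0xFF));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // The field "int32 age = 3" with value 27 takes two bytes: key, then value.
        System.out.println(hex(encodeVarint(fieldKey(3, WIRE_TYPE_VARINT)))); // 18
        System.out.println(hex(encodeVarint(27)));                            // 1b
        System.out.println(hex(encodeVarint(300)));                           // ac02
    }
}
```

Because only the tag number and the wire type appear in the output, renaming a field or switching between varint-encoded types leaves existing data readable.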

Protobuf puts considerable effort into scalability, so it supports protocol extension well.

  • Performance

Perform the same serialization operation on the test sample using Protobuf. The byte size after serialization is 192. The following table lists the corresponding time consumption:

Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms)
14,235 | 30,694

The deserialization performance of Protobuf is worse than FST and Kryo.

  • Support for Java Data Types and Syntax

Protobuf does not support defining Java methods because it uses IDL to define schemas. The following table shows which data types Protobuf supports:

[Figure 8: Protobuf support for Java data types]

Note: The List, Set, and Queue collection classes are defined and tested through the "repeated" modifier in Protobuf. Any class that implements the Iterable interface can be used to populate a repeated field.

2.5 Thrift

Thrift is an efficient remote procedure call (RPC) framework that supports multiple languages. It was developed by Facebook and later open-sourced through the Apache Software Foundation. Although Thrift is an RPC framework, it is often used for serialization alone because of its cross-language support.

To perform serialization using Thrift, create the Thrift IDL file first and then compile the file to generate Java code. Next, use TSerializer and TDeserializer to serialize and deserialize objects.

(1) Use IDL to define the .thrift files:

namespace java com.yjz.serialization.thrift

struct MessageInfo{
    1: string username;
    2: string password;
    3: i32 age;
    4: map<string,string> params;
}

(2) Use the compiler of Thrift to generate Java code:

thrift --gen java message.thrift

(3) Use TSerializer and TDeserializer for encoding and decoding:

  public static byte[] encoder(MessageInfo messageInfo) throws Exception{
        TSerializer serializer = new TSerializer();
        return serializer.serialize(messageInfo);
    }
    public static MessageInfo decoder(byte[] bytes) throws Exception{
        TDeserializer deserializer = new TDeserializer();
        MessageInfo messageInfo = new MessageInfo();
        deserializer.deserialize(messageInfo,bytes);
        return messageInfo;
    }
  • Universality

Similar to Protobuf, Thrift also uses IDL to define the description file. This is an effective method to implement cross-language serialization/RPC. Thrift supports C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi, and many other languages. So, Thrift is very universal.

  • Usability

Thrift is similar to Protobuf in terms of usability. Three steps are required for both of them:

  1. Writing description files using IDL
  2. Compiling and generating Java code
  3. Calling serialization and deserialization methods

The generated classes in Protobuf contain built-in serialization and deserialization methods, while Thrift needs to call a built-in serializer to encode and decode the class.

  • Scalability

Thrift supports field extensions. Please note the following issues when extending fields:

  • Modify Field Name: The modification of the field name does not affect serialization or deserialization. The value of deserialization data is assigned to the updated field because the numbers during encoding and decoding processes correspond.
  • Modify Field Type: If the modified field is an optional field, null or 0 (default value of the data type) is returned. If the modified field is a required field, an exception is reported indicating that the field is not found.
  • Add Field: If the new field is required, set a default value for it, or an exception is reported during deserialization. If the new field is optional and has no value, it is neither serialized nor deserialized. If it is a default field, its value after deserialization is null or 0, depending on the data type.
  • Delete Field: Both required and optional fields can be deleted. Deserialization is not affected.

Do not reuse the deleted integer field tag, or deserialization may be affected.
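
The rules above follow from matching fields by their numeric ID instead of by name. The following toy model shows the idea in plain Java. It does not use Thrift, it is not Thrift's wire format, and every name in it is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FieldIdSketch {
    // "Old" writer: values are keyed by numeric field ID, never by field name.
    static Map<Integer, Object> writeOld(String username, int age, String legacyNote) {
        Map<Integer, Object> wire = new LinkedHashMap<>();
        wire.put(1, username);
        wire.put(3, age);
        wire.put(9, legacyNote); // a field that the new reader no longer declares
        return wire;
    }

    // "New" reader: field 1 was renamed, field 9 was deleted, and field 5 was added.
    static class NewMessage {
        String loginName; // ID 1, formerly "username"
        int age;          // ID 3
        String nickname;  // ID 5, absent from old data
    }

    static NewMessage readNew(Map<Integer, Object> wire) {
        NewMessage message = new NewMessage();
        for (Map.Entry<Integer, Object> entry : wire.entrySet()) {
            switch (entry.getKey()) {
                case 1: message.loginName = (String) entry.getValue(); break;
                case 3: message.age = (Integer) entry.getValue(); break;
                case 5: message.nickname = (String) entry.getValue(); break;
                default: break; // unknown ID from a deleted field: skipped
            }
        }
        return message;
    }

    public static void main(String[] args) {
        NewMessage message = readNew(writeOld("abcdefg", 27, "note"));
        // The renamed field keeps its value, and the added field stays at its default.
        System.out.println(message.loginName + " " + message.age + " " + message.nickname);
    }
}
```

Reusing ID 9 for a new field of a different type would make the reader misinterpret old data, which is why deleted tags should stay retired.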

  • Performance

For the test sample, the byte size after serialization using Thrift is 257. The corresponding time consumption is listed below:

Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms)
28,634 | 20,722

The time consumption of Thrift is very close to Protobuf in serialization and deserialization. Protobuf consumes less time in serialization than Thrift, while Thrift is better in deserialization.

  • Support for Java Data Types and Syntax

Thrift uses IDL to define the serialization class. Thrift supports the following Java data types:

  1. Basic data types. Thrift has no float or char type, so double and string are used instead.
  2. Collection types, including List, Set, and Map. Queue is not supported.
  3. Custom types (struct type)
  4. Enumeration types
  5. Byte array

Thrift does not support defining Java methods.

2.6 Hessian

Hessian is a lightweight RPC framework developed by Caucho. It uses HTTP protocol to transmit data and supports binary serialization.

Hessian is often used as a serialization framework because it supports cross-language and efficient binary serialization protocol. The Hessian serialization protocol includes Hessian 1.0 and Hessian 2.0. Hessian 2.0 optimizes the serialization process, and its performance is significantly improved compared with Hessian 1.0.

Serializing objects with Hessian is very simple. Only HessianInput and HessianOutput are needed for Hessian 1.0, or Hessian2Input and Hessian2Output for Hessian 2.0. The following demo shows how to use Hessian 2.0 for serialization:

public static <T> byte[] encoder2(T obj) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    Hessian2Output hessian2Output = new Hessian2Output(bos);
    hessian2Output.writeObject(obj);
    //Hessian2Output buffers internally, so flush before reading the bytes
    hessian2Output.flush();
    return bos.toByteArray();
}

public static <T> T decoder2(byte[] bytes) throws Exception {
    ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
    Hessian2Input hessian2Input = new Hessian2Input(bis);
    Object obj = hessian2Input.readObject();
    return (T) obj;
}
  • Universality

Like Protobuf and Thrift, Hessian supports RPC communication across languages. One of the main advantages of Hessian over other cross-language RPC frameworks is that it does not use IDL to define data and services. Instead, it defines services by self-description. Currently, Hessian supports Java, Flash/Flex, Python, C++, .NET/C#, D, Erlang, PHP, Ruby, and Objective-C.

  • Usability

Hessian does not need IDL to define data and services. The serialized class only needs to implement the Serializable interface. Therefore, Hessian is easier to use than Protobuf and Thrift.

  • Scalability

Although classes must implement the Serializable interface to be serialized by Hessian, Hessian is not affected by serialVersionUID and supports field extension easily.

  1. Modify Field Name: The renamed field is null or 0 (depending on the type) after deserialization.
  2. Add Field: The new field is null or 0 (depending on the type) after deserialization.
  3. Delete Field: Deserialization is not affected.
  4. Modify Field Type: Deserialization is not affected if the field type is compatible. Otherwise, an exception is reported.
  • Performance

The byte size after serialization is 277 using Hessian 1.0 and 178 using Hessian 2.0.

The time consumption of serialization and deserialization is listed below:

Version | Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms)
Hessian 1.0 | 57,648 | 55,261
Hessian 2.0 | 38,823 | 17,682

The results show that Hessian 2.0 is much better than Hessian 1.0 in both byte size and time consumption.

  • Support for Java Data Types and Syntax

As Hessian uses Java self-description to serialize classes, the native data types, collection classes, custom classes, and enumeration types of Java are mostly supported (SynchronousQueue is not supported.) Java syntax is also supported.

2.7 Avro

Avro is a data serialization framework and a sub-project of Apache Hadoop. It was developed by Doug Cutting while he was in charge of Hadoop. Avro is designed to support data-intensive applications and is suitable for remote or local large-scale data exchange and storage.

Use Avro to serialize objects in the following three steps:

(1) Define the avsc file:

{
    "namespace": "com.yjz.serialization.avro",
    "type": "record",
    "name": "MessageInfo",
    "fields": [
        {"name": "username","type": "string"},
        {"name": "password","type": "string"},
        {"name": "age","type": "int"},
        {"name": "params","type": {"type": "map","values": "string"}
        }
    ]
}

(2) Use avro-tools.jar or Maven to compile and generate Java code:

java -jar avro-tools-1.8.2.jar compile schema src/main/resources/avro/Message.avsc ./src/main/java

(3) Use BinaryEncoder and BinaryDecoder for encoding and decoding:

public static  byte[] encoder(MessageInfo obj) throws Exception{
        DatumWriter<MessageInfo> datumWriter = new SpecificDatumWriter<>(MessageInfo.class);
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        BinaryEncoder binaryEncoder = EncoderFactory.get().directBinaryEncoder(outputStream,null);
        datumWriter.write(obj,binaryEncoder);
        return outputStream.toByteArray();
    }

    public static MessageInfo decoder(byte[] bytes) throws Exception{
        DatumReader<MessageInfo> datumReader = new SpecificDatumReader<>(MessageInfo.class);
        BinaryDecoder binaryDecoder = DecoderFactory.get().directBinaryDecoder(new ByteArrayInputStream(bytes),null);
        return datumReader.read(new MessageInfo(),binaryDecoder);
    }
  • Universality

Avro defines the data structure through schemas. Currently, Avro supports Java, C, C++, C#, Python, PHP, and Ruby, so Avro is universal among these languages.

  • Usability

Avro does not need to generate code for dynamic languages. However, for static languages, such as Java, avro-tools.jar is still necessary to compile and generate Java code. It is more complicated to write a Schema in Avro than in Thrift and Protobuf.

  • Scalability

Avro supports schema changes under the following rules:

  1. Set a default value for all fields. If a field does not have a default value, the field cannot be deleted in the future.
  2. To add a new field, a default value must be set.
  3. The field type cannot be modified.
  4. The field name cannot be modified, but an alias can be added.
  • Performance

The byte size after serialization using Avro is 111. The following table lists the time consumption:

Mode | Time Consumed for Serialization (ms) | Time Consumed for Deserialization (ms)
Generated Java code | 26,565 | 45,383
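
One factor that likely contributes to the small output is Avro's binary encoding: record fields are written in schema order without names or tags, and int and long values use variable-length zigzag coding. The following is a minimal sketch of that integer coding in plain Java. It does not use the Avro library, and the method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

class ZigZagVarintSketch {
    // Maps signed values to unsigned ones so that small magnitudes,
    // positive or negative, end up with small codes: 0, -1, 1, -2 -> 0, 1, 2, 3.
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Writes the zigzag value 7 bits at a time, least significant group first,
    // setting the high bit on every byte except the last.
    static byte[] encodeInt(int n) {
        int value = zigZag(n);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder text = new StringBuilder();
        for (byte b : bytes) {
            text.append(String.format("%02x", b & 0xFF));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeInt(27)));  // 36
        System.out.println(hex(encodeInt(-1)));  // 01
        System.out.println(hex(encodeInt(64)));  // 8001
    }
}
```

With this coding, the age value 27 from the test sample occupies a single byte.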
  • Support for Java Data Types and Syntax

Schemas must be written using the data types that Avro supports. Avro supports primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed).

Avro classes are generated from schemas rather than written by hand, so Java methods cannot be defined in serialized classes.

3. Summary

3.1 Universality

The following table compares the universality of different serialization frameworks. It shows that Protobuf is the best because it supports multiple programming languages.

[Figure 9: universality comparison]

3.2 Usability

The following table compares the API usability of different serialization frameworks. All serialization frameworks provide easy-to-use APIs except JDK Serializable.

[Figure 10: usability comparison]

3.3 Scalability

The following table compares the scalability of serialization frameworks. Protobuf handles schema changes most conveniently and naturally. Other serialization frameworks require some configuration and annotations for scalability.

[Figure 11: scalability comparison]

3.4 Performance

  • Comparison of Byte Size after Serialization

The following figure compares the byte size produced by each serialization framework. Kryo with pre-registration (pre-register the serialized class) and Avro both produce very small output. So, if the byte size after serialization is the main constraint, choose Kryo or Avro.

[Figure 12: byte size after serialization]
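
The byte sizes reported in Section 2 can be put side by side with a few lines of arithmetic. The sketch below uses only the numbers from this article. The labels and the percentOfJdk helper are naming choices made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SizeComparison {
    static final int JDK_SIZE = 432;

    // Byte sizes reported in Section 2 for the same test sample.
    static Map<String, Integer> reportedSizes() {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("JDK Serializable", JDK_SIZE);
        sizes.put("FST", 172);
        sizes.put("FST (optimized)", 191);
        sizes.put("Kryo", 172);
        sizes.put("Kryo (pre-registered)", 120);
        sizes.put("Protobuf", 192);
        sizes.put("Thrift", 257);
        sizes.put("Hessian 1.0", 277);
        sizes.put("Hessian 2.0", 178);
        sizes.put("Avro", 111);
        return sizes;
    }

    // Size relative to JDK Serializable, rounded to a whole percent.
    static long percentOfJdk(int size) {
        return Math.round(size * 100.0 / JDK_SIZE);
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> entry : reportedSizes().entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue()
                    + " bytes (" + percentOfJdk(entry.getValue()) + "% of JDK)");
        }
    }
}
```

By this measure, Avro and pre-registered Kryo come in at roughly a quarter of the JDK Serializable size.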

  • Comparison of Time Consumption

The following figure shows the serialization and deserialization time consumption. Kryo with pre-registration and FST with pre-registration both perform excellently. FST has the shortest serialization time, while Kryo takes almost the same time for serialization and deserialization. Therefore, if serialization time consumption is a key metric, choose Kryo or FST.

[Figure 13: serialization and deserialization time consumption]

3.5 Support for Java Data Types and Syntax

Java data types supported by the serialization frameworks:

[Figure 14: Java data types supported by each framework]

Note: The collection class tests cover most corresponding implementation classes.

  1. List: ArrayList, LinkedList, Stack, CopyOnWriteArrayList, and Vector
  2. Set: HashSet, LinkedHashSet, TreeSet, and CopyOnWriteArraySet
  3. Map: HashMap, LinkedHashMap, TreeMap, WeakHashMap, ConcurrentHashMap, and Hashtable
  4. Queue: PriorityQueue, ArrayBlockingQueue, LinkedBlockingQueue, ConcurrentLinkedQueue, SynchronousQueue, ArrayDeque, LinkedBlockingDeque, and ConcurrentLinkedDeque

The following table lists the data types and syntax supported by serialization frameworks.

[Figure 15: Java syntax supported by each framework]

  • Note 1: For static internal classes, the serialization interface must be implemented.
  • Note 2: For external classes, the serialization interface must be implemented.
  • Note 3: Add (IXxx & Serializable) before the Lambda expression.

Protobuf and Thrift use IDL to define class files and then use compilers to generate Java code. IDL does not provide syntax to define the static internal classes or non-static internal classes. Therefore, these functions cannot be tested.


jianzhang.yjz
