This topic describes how X-Engine of ApsaraDB RDS helps reduce the costs of DingTalk and implement online collaboration.

Background information

As a leading enterprise-grade instant messaging (IM) tool in China, DingTalk serves hundreds of millions of users across China. The basic features of DingTalk include project-specific chat groups, video conferences and daily reports. DingTalk Open Platform also provides various office automation (OA) applications to facilitate communication between co-workers.

In 2020, the COVID-19 pandemic led a large number of employees to work from home to reduce the risk of infection in centralized offices. The demand for collaboration tools surged, and DingTalk quickly rose to the top of the App Store download list, which brought sharp increases in traffic. DingTalk is built on the elastic infrastructure of Alibaba Cloud, which allowed it to handle these traffic spikes smoothly.

To serve a large number of users, DingTalk must ensure that messages are sent and received accurately and in a timely manner, and must provide specific features such as read receipts. Unlike individual-oriented IM tools such as WeChat, enterprise-grade IM tools must support the permanent storage of chat history and provide the multi-terminal roaming feature. This feature allows you to receive messages on multiple terminals. As the number of users sharply increases, DingTalk faces two challenges: the cost of permanently storing chat history, and the performance of read and write operations on that chat history.

To address these challenges, DingTalk uses X-Engine as the storage engine for messages. This achieves a balance between costs and performance. X-Engine has the following benefits:

  • The storage required by X-Engine is about 62% less than the storage required by the InnoDB storage engine.
  • Some database features such as transactions and secondary indexes are supported.
  • Code can be migrated without changes to ApsaraDB RDS instances that are powered by X-Engine.
  • X-Engine separates hot and cold data to accelerate the processing of current messages. X-Engine also compresses historical messages at a high compression ratio.

X-Engine storage efficiency was tested on two datasets: LinkBench and an internal Alibaba transaction workload. In the tests, X-Engine required about half the storage of the InnoDB storage engine with compression enabled, and one-fifth to one-third of the storage of the InnoDB storage engine with compression disabled.

Comparison

Low costs achieved by X-Engine

X-Engine adopts the following technologies to ensure low costs:

  • Compact pages

    X-Engine uses the copy-on-write technology to write new data to new pages without updating the original pages. The new pages are read-only and cannot be updated. These pages are stored in a compact manner, and the data is compressed by using algorithms such as prefix encoding. This improves the storage efficiency. The compaction operation clears invalid records so that valid records stay compactly arranged. As a result, X-Engine requires only 10% to 50% of the storage required by conventional storage engines, such as InnoDB.
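
    Prefix encoding can be illustrated with a minimal sketch. It assumes sorted string keys in a hypothetical `msg:<group>:<sequence>` format; the function names and the key layout are illustrative only and do not describe the actual page format of X-Engine. Each key is stored as the number of leading characters it shares with the previous key, plus the remaining suffix.

```python
import os


def prefix_encode(sorted_keys):
    """Encode each key as (shared_prefix_length, remaining_suffix)."""
    encoded = []
    previous = ""
    for key in sorted_keys:
        # Number of leading characters shared with the previous key.
        shared = len(os.path.commonprefix([previous, key]))
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the original keys from (shared_length, suffix) pairs."""
    keys = []
    previous = ""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


# Hypothetical keys: adjacent keys in a sorted page share long prefixes.
keys = ["msg:1001:0001", "msg:1001:0002", "msg:1001:0003", "msg:1002:0001"]
encoded = prefix_encode(keys)

raw_size = sum(len(k) for k in keys)
# Suffix bytes only; the small shared-length integers are not counted here.
encoded_size = sum(len(suffix) for _, suffix in encoded)
print(encoded)
print(f"raw={raw_size} chars, suffixes={encoded_size} chars")
```

    Because the pages are read-only after they are written, an encoding of this kind never has to be patched in place, which is one reason it pairs well with copy-on-write.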

  • Data compression and cleaning of invalid records

    Pages after encoding can be compressed by using general compression algorithms, such as zlib, zstd, and snappy. Data at a low level of a log-structured merge-tree (LSM tree) is compressed by default.

    Data compression trades computing resources for storage space. We recommend that you select compression algorithms that provide a high compression ratio and a high speed of compression and decompression. Based on a large number of comparative tests, X-Engine uses zstd as the default compression algorithm and also supports other compression algorithms.
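
    The trade-off between compression ratio and speed can be observed with a small experiment. The following sketch uses the `zlib` module from the Python standard library as a stand-in, because zstd bindings are not assumed to be installed. The sample page content and the `measure` helper are hypothetical, and the numbers it prints depend on the machine and the data, so they say nothing about X-Engine itself.

```python
import time
import zlib

# Hypothetical page content: repetitive message rows, as an assumption.
page = b"".join(
    b"msg:%06d|sender=user%04d|body=hello team, the report is ready\n" % (i, i % 50)
    for i in range(2000)
)


def measure(level):
    """Return (compression_ratio, seconds) for one zlib compression level."""
    start = time.perf_counter()
    compressed = zlib.compress(page, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: decompression must restore the original page.
    assert zlib.decompress(compressed) == page
    return len(page) / len(compressed), elapsed


# Higher levels spend more CPU time searching for a better ratio.
for level in (1, 6, 9):
    ratio, elapsed = measure(level)
    print(f"level={level} ratio={ratio:.1f}x time={elapsed * 1000:.2f} ms")
```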

    In addition, the compaction operation is introduced to delete invalid records. This way, only valid records are retained. The more frequently the compaction operation is performed, the lower the proportion of invalid records, and the higher the storage efficiency. However, each compaction operation consumes computing and I/O resources. Therefore, you must perform the compaction operation at a suitable frequency.

    The X-Engine team has also developed the field-programmable gate array (FPGA) compaction technology to reduce the computing resources that the compaction operation consumes. This technology uses heterogeneous computing hardware to accelerate the compaction process by streamlining compaction and compression operations on FPGA hardware. On a host without FPGA hardware, X-Engine can use a suitable scheduling algorithm to save storage at a lower performance cost.
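
    The way compaction removes invalid records can be sketched as a merge in which the newest version of each key wins and deletion markers are dropped. This is a simplified illustration, not the X-Engine implementation: the `TOMBSTONE` marker, the `compact` function, and the use of dictionaries as sorted runs are all assumptions. Dropping deletion markers is only safe when every run that may hold the key takes part in the merge, which the sketch assumes.

```python
TOMBSTONE = None  # hypothetical marker for a deleted record


def compact(runs):
    """Merge runs ordered from newest to oldest into one compact run.

    Only the newest version of each key is kept, and deleted keys
    are dropped entirely.
    """
    latest = {}
    for run in runs:
        for key, value in run.items():
            # setdefault keeps the first (newest) version seen for a key.
            latest.setdefault(key, value)
    return {k: v for k, v in sorted(latest.items()) if v is not TOMBSTONE}


newer = {"msg:1": "edited text", "msg:3": TOMBSTONE}
older = {"msg:1": "original text", "msg:2": "hello", "msg:3": "to be deleted"}

compacted = compact([newer, older])
print(compacted)
```

    In this example, five stored records shrink to two valid records: the overwritten version of `msg:1` and both entries for the deleted `msg:3` are invalid and are removed.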

  • Intelligent separation between hot and cold data

    In normal access to a storage system, most access requests are directed to a small portion of the data. This is why caching works. In an LSM tree structure, frequently accessed data is stored at a high level on a fast storage device, such as non-volatile memory (NVM) or dynamic random-access memory (DRAM). Infrequently accessed data is stored at a low level on a slow storage device. This is the hot and cold data separation in X-Engine.

    The separation algorithm completes the following tasks:

    • In the compaction operation, the pages and records that are least likely to be accessed are selected and moved to the bottom of the LSM tree.
    • Current hot data is selected and backfilled to memory caches such as BlockCache and RowCache in the compaction or dump process. This prevents the performance degradation that jitters in cache hit rates would otherwise cause.
    • The AI algorithm recognizes data that may be accessed in the future and pre-reads it into memory. This increases the cache hit rate for data that is accessed for the first time.

    Hot data and cold data are accurately identified to avoid computing resource waste due to invalid compression or decompression. This improves system throughput.
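
    The first two tasks of the separation algorithm can be illustrated with a minimal sketch. It assumes that recent access counts are an adequate signal of how likely a record is to be accessed; the source does not describe the actual selection algorithm, so the `split_hot_cold` function, the `hot_fraction` parameter, and the sample data are hypothetical. The hot set stands in for the records that would be backfilled to a cache, and the cold set for the records that would move to the bottom of the LSM tree.

```python
from collections import Counter


def split_hot_cold(records, access_log, hot_fraction=0.2):
    """Split records into a hot set and a cold set by recent access count."""
    counts = Counter(access_log)
    # Most frequently accessed keys first; unseen keys count as zero.
    ranked = sorted(records, key=lambda key: counts[key], reverse=True)
    hot_count = max(1, int(len(ranked) * hot_fraction))
    hot = {key: records[key] for key in ranked[:hot_count]}
    cold = {key: records[key] for key in ranked[hot_count:]}
    return hot, cold


records = {"msg:1": "a", "msg:2": "b", "msg:3": "c", "msg:4": "d", "msg:5": "e"}
access_log = ["msg:5", "msg:5", "msg:4", "msg:5", "msg:4", "msg:1"]

hot, cold = split_hot_cold(records, access_log, hot_fraction=0.4)
print("hot (backfill to cache):", sorted(hot))
print("cold (move to bottom level):", sorted(cold))
```

    A real system would also need to age the access statistics over time so that data that was hot in the past can become cold again.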

For more information, see Introduction to X-Engine.

Related papers