This topic describes data compression and data block encoding in ApsaraDB for HBase and how to apply them in practice.

Compression Algorithm

ApsaraDB for HBase currently supports the following compression algorithms: LZO, ZSTD, GZ, LZ4, SNAPPY, and NONE. NONE disables compression. The following table compares the compression rates and decompression speeds of LZO, ZSTD, and LZ4 in different business scenarios.

Each algorithm column shows compression rate / decompression speed (MB/s).

Business type         Uncompressed table size   LZO             ZSTD            LZ4
Monitoring            419.75 TB                 5.82 / 372      13.09 / 256     5.19 / 463.8
Logs                  77.26 TB                  4.11 / 333      6.0 / 287       4.16 / 496.1
Risk control          147.83 TB                 4.29 / 297.7    5.93 / 270      4.19 / 441.38
Transaction records   108.04 TB                 5.93 / 316.8    10.51 / 288.3   5.55 / 520.3
Note
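  • We recommend that you use the LZ4 compression algorithm for scenarios that require low response time (RT). LZ4 has the highest decompression speed of the three algorithms in the table.
  • We recommend that you use the ZSTD compression algorithm for scenarios that can tolerate higher RT, such as monitoring and Internet of Things (IoT) scenarios. ZSTD has the highest compression rate of the three algorithms in the table.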
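
As a rough illustration of what the figures in the table imply for storage, the following sketch estimates the compressed size of each table. It assumes that the compression rate is the ratio of uncompressed size to compressed size, which this topic does not state explicitly. The names TABLE and estimated_compressed_tb are illustrative and are not part of any ApsaraDB for HBase API.

```python
# Figures copied from the table above.
# Assumption: compression rate = uncompressed size / compressed size.
TABLE = {
    "Monitoring": {"size_tb": 419.75, "LZO": 5.82, "ZSTD": 13.09, "LZ4": 5.19},
    "Logs": {"size_tb": 77.26, "LZO": 4.11, "ZSTD": 6.0, "LZ4": 4.16},
    "Risk control": {"size_tb": 147.83, "LZO": 4.29, "ZSTD": 5.93, "LZ4": 4.19},
    "Transaction records": {"size_tb": 108.04, "LZO": 5.93, "ZSTD": 10.51, "LZ4": 5.55},
}


def estimated_compressed_tb(size_tb: float, rate: float) -> float:
    """Estimate the compressed size in TB under the ratio assumption."""
    if rate <= 0:
        raise ValueError("compression rate must be positive")
    return size_tb / rate


if __name__ == "__main__":
    for business, row in TABLE.items():
        for algo in ("LZO", "ZSTD", "LZ4"):
            size = estimated_compressed_tb(row["size_tb"], row[algo])
            print(f"{business:20s} {algo:5s} ~{size:.2f} TB")
```

Under this assumption, the estimate makes the trade-off in the recommendations visible: ZSTD yields the smallest estimated footprint for every business type in the table, while LZ4 trades a larger footprint for faster decompression.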

Encoding

ApsaraDB for HBase supports DataBlockEncoding, which reduces the size of stored data by eliminating the duplicate parts of HBase KeyValue pairs. We recommend that you set DATA_BLOCK_ENCODING to DIFF.
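
To show why removing duplicate parts of keys saves space, the following sketch stores each key as the number of leading bytes it shares with the previous key plus the remaining suffix. It is a deliberately simplified prefix-sharing scheme, not the on-disk format of DIFF, which is more involved. The function names and sample keys are hypothetical.

```python
from typing import List, Tuple


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_keys(keys: List[bytes]) -> List[Tuple[int, bytes]]:
    """Encode each key as (bytes shared with the previous key, suffix)."""
    encoded = []
    prev = b""
    for key in keys:
        shared = common_prefix_len(prev, key)
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def decode_keys(encoded: List[Tuple[int, bytes]]) -> List[bytes]:
    """Rebuild the original keys from the (shared, suffix) pairs."""
    keys = []
    prev = b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


if __name__ == "__main__":
    # Hypothetical sorted keys; adjacent keys share long prefixes.
    sample = [b"row0001/f:a", b"row0001/f:b", b"row0002/f:a"]
    for shared, suffix in encode_keys(sample):
        print(shared, suffix)
```

Because keys are stored in sorted order, adjacent keys tend to share long prefixes, so only a short suffix has to be stored for most entries.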

Procedure

  1. Modify the COMPRESSION and DATA_BLOCK_ENCODING properties of the column family. In this example, the table is test and the column family is f.
    alter 'test', {NAME => 'f', COMPRESSION => 'LZ4', DATA_BLOCK_ENCODING => 'DIFF'}
  2. The modifications do not take effect immediately. To apply them, perform a major compaction. A major compaction is time-consuming, so we recommend that you perform it during off-peak hours.
    major_compact 'test'
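
If the procedure has to be repeated for several tables, the two shell commands can be generated instead of typed by hand. The following is a minimal sketch that only builds the command strings and checks the algorithm name against the list in this topic. It does not connect to a cluster, and build_shell_commands is a hypothetical helper, not part of HBase.

```python
from typing import List

# Compression algorithms listed in this topic.
SUPPORTED_COMPRESSION = ("LZO", "ZSTD", "GZ", "LZ4", "SNAPPY", "NONE")


def build_shell_commands(table: str, family: str, compression: str,
                         encoding: str = "DIFF") -> List[str]:
    """Return the alter and major_compact commands for one column family."""
    algo = compression.upper()
    if algo not in SUPPORTED_COMPRESSION:
        raise ValueError(f"unsupported compression algorithm: {compression}")
    alter = (
        f"alter '{table}', {{NAME => '{family}', "
        f"COMPRESSION => '{algo}', DATA_BLOCK_ENCODING => '{encoding}'}}"
    )
    compact = f"major_compact '{table}'"
    return [alter, compact]


if __name__ == "__main__":
    # Prints the two commands from the procedure above for table 'test'.
    for command in build_shell_commands("test", "f", "lz4"):
        print(command)
```

The generated lines can then be pasted into the HBase shell. Run the major_compact command during off-peak hours, as recommended above.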