
Lindorm: Data compression test

Last Updated: Mar 28, 2026

This topic compares the data compression performance of Lindorm against open source HBase, MySQL, and MongoDB across four real-world datasets: orders, Internet of Vehicles (IoV), logs, and user behaviors.

Test environment

Lindorm is a multi-model hyper-converged database service that uses the zstd compression algorithm by default and supports dictionary-based compression, which improves the compression ratio by optimizing dictionary sampling during data encoding.

The following table shows the database versions and compression configurations used in this test.

| Database | Version | Default compression | Notes |
| --- | --- | --- | --- |
| Lindorm | Latest | zstd (optimized) | Dictionary-based compression available |
| Open source HBase | 2.3.4 | Snappy | zstd is supported with later Hadoop versions but is prone to stability issues and core dumps; most deployments use Snappy |
| Open source MySQL | 8.0 | None (disabled) | zlib is available but significantly degrades query performance when enabled |
| Open source MongoDB | 5.0 | Snappy | zstd is available as an alternative |
Important

This test follows only parts of the TPC benchmark specifications. Results are not equivalent to or comparable with results from tests that fully follow TPC benchmark specifications.

Each scenario tests and compares the following configurations:

  • Lindorm with zstd (default)

  • Lindorm with dictionary-based compression enabled

  • Open source HBase with Snappy

  • Open source MySQL with compression disabled

  • Open source MongoDB with Snappy

  • Open source MongoDB with zstd

When to use dictionary-based compression

zstd vs. dictionary-based compression:

| Algorithm | Benefit | Best for |
| --- | --- | --- |
| zstd (default) | Significant storage reduction with no additional configuration | All data types |
| Dictionary-based compression | Further reduction beyond zstd, at the cost of a dictionary training step during data ingestion | Datasets with high row-to-row repetition |

Data that benefits most from dictionary-based compression: datasets with repetitive structure across rows, such as log entries, IoV telemetry fields, and behavioral event records.
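The mechanism can be illustrated with Python's zlib, which also supports preset dictionaries. This is a stand-in for Lindorm's zstd-based implementation, not its actual code path, and the sample records and dictionary bytes below are made up for the demonstration:

```python
import zlib

# Small, repetitive records: every row shares most of its bytes.
rows = [
    f'{{"device":"sensor-{i % 4}","status":"OK","fw":"1.2.3"}}'.encode()
    for i in range(1000)
]

# A preset dictionary of byte sequences that recur across rows,
# analogous to the dictionary Lindorm trains by sampling during ingestion.
zdict = b'{"device":"sensor-","status":"OK","fw":"1.2.3"}'

def compressed_size(data, zdict=None):
    # Each row is compressed independently, as a store compresses small units.
    c = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return len(c.compress(data) + c.flush())

plain = sum(compressed_size(r) for r in rows)
with_dict = sum(compressed_size(r, zdict) for r in rows)
print(plain, with_dict)  # the dictionary-backed run is markedly smaller
```

Compressing each small record on its own gives the compressor little context to exploit; the shared dictionary restores that context, which is why row-to-row repetition is the deciding factor.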

For consolidated results across all scenarios, see Summary.

Orders

Dataset

This scenario uses the TPC-H benchmark dataset, defined by the Transaction Processing Performance Council (TPC) to evaluate analytical query performance.

Download the TPC-H tool: TPC-H_Tools_v3.0.0.zip

Generate 10 GB of test data:

# Unzip and build the data generator
unzip TPC-H_Tools_v3.0.0.zip
cd TPC-H_Tools_v3.0.0/dbgen
cp makefile.suite makefile

# Edit makefile: set the following fields
# CC = gcc
# DATABASE = ORACLE
# MACHINE = LINUX
# WORKLOAD = TPCH
make

# Generate 10 GB of test data
./dbgen -s 10

This generates eight .tbl files. This test uses ORDERS.tbl: 15 million rows, 1.76 GB.

| Field | Type |
| --- | --- |
| O_ORDERKEY | INT |
| O_CUSTKEY | INT |
| O_ORDERSTATUS | CHAR(1) |
| O_TOTALPRICE | DECIMAL(15,2) |
| O_ORDERDATE | DATE |
| O_ORDERPRIORITY | CHAR(15) |
| O_CLERK | CHAR(15) |
| O_SHIPPRIORITY | INT |
| O_COMMENT | VARCHAR(79) |
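dbgen emits pipe-delimited .tbl files with a trailing `|` after the last field on each row. A minimal Python sketch for splitting one ORDERS row into the fields above; the helper and the sample line are illustrative, not part of the TPC-H tooling:

```python
# Column names as listed in the field table above.
ORDERS_COLUMNS = [
    "O_ORDERKEY", "O_CUSTKEY", "O_ORDERSTATUS", "O_TOTALPRICE",
    "O_ORDERDATE", "O_ORDERPRIORITY", "O_CLERK", "O_SHIPPRIORITY",
    "O_COMMENT",
]

def parse_tbl_line(line):
    """Split one dbgen row; dbgen writes a trailing '|' after the last field."""
    values = line.rstrip("\n").rstrip("|").split("|")
    return dict(zip(ORDERS_COLUMNS, values))

# Illustrative row in dbgen's format (not copied from a real run):
sample = "1|3691|O|173665.47|1996-01-02|5-LOW|Clerk#000000951|0|final deposits|\n"
row = parse_tbl_line(sample)
print(row["O_ORDERKEY"], row["O_TOTALPRICE"])  # -> 1 173665.47
```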

Create test tables

HBase

create 'ORDERS', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'SNAPPY', BLOCKSIZE => '32768'}

MySQL

CREATE TABLE ORDERS (
  O_ORDERKEY      INTEGER NOT NULL,
  O_CUSTKEY       INTEGER NOT NULL,
  O_ORDERSTATUS   CHAR(1) NOT NULL,
  O_TOTALPRICE    DECIMAL(15,2) NOT NULL,
  O_ORDERDATE     DATE NOT NULL,
  O_ORDERPRIORITY CHAR(15) NOT NULL,
  O_CLERK         CHAR(15) NOT NULL,
  O_SHIPPRIORITY  INTEGER NOT NULL,
  O_COMMENT       VARCHAR(79) NOT NULL
);

MongoDB

db.createCollection("ORDERS")
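The command above creates the collection with MongoDB's default Snappy block compressor. The MongoDB (zstd) configuration compared in the results is not shown; with the WiredTiger storage engine, a per-collection block compressor can be passed through createCollection. The following is a plausible setup for the zstd run, using standard MongoDB options, though not confirmed as this test's exact command:

```javascript
// Create the collection with zstd instead of the default Snappy
// block compressor (WiredTiger storage engine option).
db.createCollection("ORDERS", {
  storageEngine: {
    wiredTiger: { configString: "block_compressor=zstd" }
  }
})
```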

Lindorm

-- lindorm-cli
CREATE TABLE ORDERS (
  O_ORDERKEY      INTEGER NOT NULL,
  O_CUSTKEY       INTEGER NOT NULL,
  O_ORDERSTATUS   CHAR(1) NOT NULL,
  O_TOTALPRICE    DECIMAL(15,2) NOT NULL,
  O_ORDERDATE     DATE NOT NULL,
  O_ORDERPRIORITY CHAR(15) NOT NULL,
  O_CLERK         CHAR(15) NOT NULL,
  O_SHIPPRIORITY  INTEGER NOT NULL,
  O_COMMENT       VARCHAR(79) NOT NULL,
  PRIMARY KEY(O_ORDERKEY)
);

Compression results

| Database | Table size |
| --- | --- |
| Lindorm (zstd) | 784 MB |
| Lindorm (dictionary-based compression) | 639 MB |
| HBase (Snappy) | 1.23 GB |
| MySQL (no compression) | 2.10 GB |
| MongoDB (Snappy) | 1.63 GB |
| MongoDB (zstd) | 1.32 GB |

IoV

Dataset

This scenario uses the NGSIM (Next Generation Simulation) dataset, collected by the U.S. Federal Highway Administration from vehicle trajectories on U.S. Route 101. NGSIM is widely used in driving behavior research, traffic flow analysis, vehicle trajectory prediction, and autonomous driving decision planning.

Download NGSIM_Data.csv: 11.85 million rows, 1.54 GB, 25 columns per row.

Create test tables

HBase

create 'NGSIM', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'SNAPPY', BLOCKSIZE => '32768'}

MySQL

CREATE TABLE NGSIM (
  ID             INTEGER NOT NULL,
  Vehicle_ID     INTEGER NOT NULL,
  Frame_ID       INTEGER NOT NULL,
  Total_Frames   INTEGER NOT NULL,
  Global_Time    BIGINT NOT NULL,
  Local_X        DECIMAL(10,3) NOT NULL,
  Local_Y        DECIMAL(10,3) NOT NULL,
  Global_X       DECIMAL(15,3) NOT NULL,
  Global_Y       DECIMAL(15,3) NOT NULL,
  v_length       DECIMAL(10,3) NOT NULL,
  v_Width        DECIMAL(10,3) NOT NULL,
  v_Class        INTEGER NOT NULL,
  v_Vel          DECIMAL(10,3) NOT NULL,
  v_Acc          DECIMAL(10,3) NOT NULL,
  Lane_ID        INTEGER NOT NULL,
  O_Zone         CHAR(10),
  D_Zone         CHAR(10),
  Int_ID         CHAR(10),
  Section_ID     CHAR(10),
  Direction      CHAR(10),
  Movement       CHAR(10),
  Preceding      INTEGER NOT NULL,
  Following      INTEGER NOT NULL,
  Space_Headway  DECIMAL(10,3) NOT NULL,
  Time_Headway   DECIMAL(10,3) NOT NULL,
  Location       CHAR(10) NOT NULL,
  PRIMARY KEY(ID)
);

MongoDB

db.createCollection("NGSIM")

Lindorm

-- lindorm-cli
CREATE TABLE NGSIM (
  ID             INTEGER NOT NULL,
  Vehicle_ID     INTEGER NOT NULL,
  Frame_ID       INTEGER NOT NULL,
  Total_Frames   INTEGER NOT NULL,
  Global_Time    BIGINT NOT NULL,
  Local_X        DECIMAL(10,3) NOT NULL,
  Local_Y        DECIMAL(10,3) NOT NULL,
  Global_X       DECIMAL(15,3) NOT NULL,
  Global_Y       DECIMAL(15,3) NOT NULL,
  v_length       DECIMAL(10,3) NOT NULL,
  v_Width        DECIMAL(10,3) NOT NULL,
  v_Class        INTEGER NOT NULL,
  v_Vel          DECIMAL(10,3) NOT NULL,
  v_Acc          DECIMAL(10,3) NOT NULL,
  Lane_ID        INTEGER NOT NULL,
  O_Zone         CHAR(10),
  D_Zone         CHAR(10),
  Int_ID         CHAR(10),
  Section_ID     CHAR(10),
  Direction      CHAR(10),
  Movement       CHAR(10),
  Preceding      INTEGER NOT NULL,
  Following      INTEGER NOT NULL,
  Space_Headway  DECIMAL(10,3) NOT NULL,
  Time_Headway   DECIMAL(10,3) NOT NULL,
  Location       CHAR(10) NOT NULL,
  PRIMARY KEY(ID)
);

Compression results

| Database | Table size |
| --- | --- |
| Lindorm (zstd) | 995 MB |
| Lindorm (dictionary-based compression) | 818 MB |
| HBase (Snappy) | 1.72 GB |
| MySQL (no compression) | 2.51 GB |
| MongoDB (Snappy) | 1.88 GB |
| MongoDB (zstd) | 1.50 GB |

Logs

Dataset

This scenario uses the Online Shopping Store - Web Server Logs dataset (Zaker, Farzin, 2019, Harvard Dataverse, V1).

Download access.log: 10.36 million rows, 3.51 GB. Each row is a single log entry. Example:

54.36.149.41 - - [22/Jan/2019:03:56:14 +0330] "GET /filter/27|13%20%D9%85%DA%AF%D8%A7%D9%BE%DB%8C%DA%A9%D8%B3%D9%84,27|%DA%A9%D9%85%D8%AA%D8%B1%20%D8%A7%D8%B2%205%20%D9%85%DA%AF%D8%A7%D9%BE%DB%8C%DA%A9%D8%B3%D9%84,p53 HTTP/1.1" 200 30577 "-" "Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)" "-"

Log data is structurally repetitive across rows, which is why this scenario shows the highest compression gains from dictionary-based compression.
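One quick way to see this repetitiveness is to run a general-purpose compressor over a block of log lines and check the ratio. The sketch below uses Python's zlib as an illustrative stand-in for the algorithms in this test, on synthetic lines modeled on the format above:

```python
import zlib

# Synthetic access-log lines: the layout repeats, only a few fields vary.
lines = "".join(
    f'10.0.0.{i % 8} - - [22/Jan/2019:03:56:{i % 60:02d} +0330] '
    f'"GET /item/{i} HTTP/1.1" 200 {1000 + i} "-" "Mozilla/5.0"\n'
    for i in range(2000)
).encode()

compressed = zlib.compress(lines)
ratio = len(lines) / len(compressed)
print(f"{len(lines)} -> {len(compressed)} bytes, ratio {ratio:.1f}x")
```

The same check on a sample of your own logs gives a rough sense of how much any block compressor can recover before you commit to a configuration.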

Create test tables

HBase

create 'ACCESS_LOG', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'SNAPPY', BLOCKSIZE => '32768'}

MySQL

CREATE TABLE ACCESS_LOG (
  ID      INTEGER NOT NULL,
  CONTENT VARCHAR(10000),
  PRIMARY KEY(ID)
);

MongoDB

db.createCollection("ACCESS_LOG")

Lindorm

-- lindorm-cli
CREATE TABLE ACCESS_LOG (
  ID      INTEGER NOT NULL,
  CONTENT VARCHAR(10000),
  PRIMARY KEY(ID)
);

Compression results

| Database | Table size |
| --- | --- |
| Lindorm (zstd) | 646 MB |
| Lindorm (dictionary-based compression) | 387 MB |
| HBase (Snappy) | 737 MB |
| MySQL (no compression) | 3.99 GB |
| MongoDB (Snappy) | 1.17 GB |
| MongoDB (zstd) | 893 MB |

User behaviors

Dataset

This scenario uses the Shop Info and User Behavior data from the IJCAI-15 dataset, available on Alibaba Cloud Tianchi.

Download data_format1.zip and use user_log_format1.csv: 54.92 million rows, 1.91 GB.

| Column | Sample values |
| --- | --- |
| user_id | 328862 |
| item_id | 323294, 844400, 575153 |
| cat_id | 833, 1271 |
| seller_id | 2882 |
| brand_id | 2661 |
| time_stamp | 829 |
| action_type | 0 |

Create test tables

HBase

create 'USER_LOG', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'SNAPPY', BLOCKSIZE => '32768'}

MySQL

CREATE TABLE USER_LOG (
  ID          INTEGER NOT NULL,
  USER_ID     INTEGER NOT NULL,
  ITEM_ID     INTEGER NOT NULL,
  CAT_ID      INTEGER NOT NULL,
  SELLER_ID   INTEGER NOT NULL,
  BRAND_ID    INTEGER,
  TIME_STAMP  CHAR(4) NOT NULL,
  ACTION_TYPE CHAR(1) NOT NULL,
  PRIMARY KEY(ID)
);

MongoDB

db.createCollection("USER_LOG")

Lindorm

-- lindorm-cli
CREATE TABLE USER_LOG (
  ID          INTEGER NOT NULL,
  USER_ID     INTEGER NOT NULL,
  ITEM_ID     INTEGER NOT NULL,
  CAT_ID      INTEGER NOT NULL,
  SELLER_ID   INTEGER NOT NULL,
  BRAND_ID    INTEGER,
  TIME_STAMP  CHAR(4) NOT NULL,
  ACTION_TYPE CHAR(1) NOT NULL,
  PRIMARY KEY(ID)
);

Compression results

| Database | Table size |
| --- | --- |
| Lindorm (zstd) | 805 MB |
| Lindorm (dictionary-based compression) | 721 MB |
| HBase (Snappy) | 1.48 GB |
| MySQL (no compression) | 2.90 GB |
| MongoDB (Snappy) | 3.33 GB |
| MongoDB (zstd) | 2.74 GB |

Summary

Lindorm achieves a higher compression ratio than the open source databases even without dictionary-based compression enabled. With dictionary-based compression, Lindorm achieves the highest compression ratio in all four scenarios. Measured against the default configuration of each open source database, the resulting Lindorm tables are:

  • 1–2x smaller than open source HBase (Snappy)

  • 2–4x smaller than open source MongoDB (Snappy or zstd)

  • 3–10x smaller than open source MySQL (uncompressed)

The following table consolidates all test results.

| Dataset | Original size | Lindorm (zstd) | Lindorm (dictionary) | HBase (Snappy) | MySQL | MongoDB (Snappy) | MongoDB (zstd) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Order data (TPC-H) | 1.76 GB | 784 MB | 639 MB | 1.23 GB | 2.10 GB | 1.63 GB | 1.32 GB |
| IoV data (NGSIM) | 1.54 GB | 995 MB | 818 MB | 1.72 GB | 2.51 GB | 1.88 GB | 1.50 GB |
| Log data (web server) | 3.51 GB | 646 MB | 387 MB | 737 MB | 3.99 GB | 1.17 GB | 893 MB |
| User behavior (IJCAI-15) | 1.91 GB | 805 MB | 721 MB | 1.48 GB | 2.90 GB | 3.33 GB | 2.74 GB |
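The multipliers quoted in this summary follow directly from the consolidated results. As a quick check, for the log dataset (sizes taken from the table, with GB converted at 1 GB = 1024 MB):

```python
# Log-dataset table sizes from the consolidated results, in MB.
lindorm_dict = 387          # Lindorm (dictionary-based compression)
hbase_snappy = 737          # HBase (Snappy)
mysql_plain = 3.99 * 1024   # MySQL (no compression), 3.99 GB
mongodb_zstd = 893          # MongoDB (zstd)

print(round(hbase_snappy / lindorm_dict, 1))   # -> 1.9  (vs HBase)
print(round(mongodb_zstd / lindorm_dict, 1))   # -> 2.3  (vs MongoDB zstd)
print(round(mysql_plain / lindorm_dict, 1))    # -> 10.6 (vs MySQL)
```

The log scenario sits at the top of each range because log rows are the most repetitive of the four datasets.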

Choosing between zstd and dictionary-based compression: zstd is enabled by default and reduces storage costs across all data types with no additional configuration. Dictionary-based compression provides a further reduction—most pronounced for log data (387 MB vs. 646 MB) and most modest for numeric-heavy IoV data—at the cost of a dictionary training step during data ingestion.