These benchmark tests compare throughput, response latency, and compression ratio between an Apache HBase cluster and a Lindorm cluster using Yahoo Cloud Serving Benchmark (YCSB).
Test workflow:
1. Create tables in both clusters using the same schema.
2. Load 2 billion rows of data using YCSB.
3. Run throughput tests across four scenarios with the same number of threads.
4. Run response latency tests across the same four scenarios with a fixed OPS target.
5. Run compression ratio tests across four column/size combinations.
Create tables
Both clusters use the same table schema with 200 pre-split partitions based on YCSB data.
For instructions on using Lindorm Shell to create tables, see Use Lindorm Shell to connect to LindormTable.
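Both create statements below pre-split the table into 200 regions using the same 199 split keys. As a sketch (assuming YCSB's usual key layout of "user" followed by a zero-padded number), the SPLITS expression evaluates to evenly spaced keys over the signed 64-bit range:

```ruby
# Reproduce the split points used in the create statements below.
# 199 evenly spaced points over [0, 2**63 - 1] yield 200 regions.
step = (2**63 - 1) / 199                       # integer division: 46348603200275255
splits = (1..199).map { |i| "user#{(i * step).to_s.rjust(19, '0')}" }

puts splits.size    # 199
puts splits.first   # user0046348603200275255
puts splits.last    # user9223372036854775745
```

Because both clusters use identical split points, region distribution is the same on both sides and does not skew the comparison.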
Lindorm cluster — uses INDEX encoding (a Lindorm-exclusive algorithm activated by setting DATA_BLOCK_ENCODING to DIFF) and Zstandard (ZSTD) compression:
create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'ZSTD'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }

Apache HBase cluster — uses DIFF encoding and SNAPPY compression as recommended by Apache HBase:
create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'SNAPPY'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }

Load data
Each table contains 2 billion rows, 20 columns per row, and 20 bytes per column.
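A back-of-the-envelope estimate of the loaded data volume (value bytes only, before row keys, timestamps, and HFile overhead) follows from these numbers:

```ruby
# Raw value payload of the loaded dataset, a rough sketch that ignores
# key bytes and on-disk storage overhead.
rows          = 2_000_000_000
cols          = 20
bytes_per_col = 20

raw_bytes = rows * cols * bytes_per_col
puts raw_bytes                   # 800000000000
puts "#{raw_bytes / 10**9} GB"   # 800 GB of raw values
```

The actual on-disk footprint is smaller after encoding and compression, which is what the compression ratio test at the end measures.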
YCSB profile:
recordcount=2000000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=20
fieldlength=20
readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0
requestdistribution=uniform

Run the following command to load data:
bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s

Throughput test
The throughput test runs each scenario with the same number of threads on both clusters. The four scenarios are independent of each other.
All scenarios run with maxexecutiontime=1200 (a 20-minute test). For the read scenarios, trigger a major compaction and wait for it to complete, then run a 20-minute warm-up before the formal test.
Single-row read
Simulates high-concurrency point lookup. Reads one row at a time from a 10-million-row query range within the 2-billion-row dataset. The query range (recordcount=10000000) is smaller than the total dataset to simulate a hot-spot read pattern against a realistic data volume.
| Parameter | Value |
|---|---|
| Rows in dataset | 2 billion |
| Query range | 10 million rows |
| Columns per row | 20 |
| Column size | 20 bytes |
| Threads | 200 |
| Warm-up | 20 minutes |
| Formal test | 20 minutes |
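The parameters above imply a compact hot set inside a much larger table, which can be sized as follows (a sketch counting value bytes only):

```ruby
# Working set touched by the point-lookup scenario, ignoring key bytes
# and storage overhead.
hot_rows   = 10_000_000
total_rows = 2_000_000_000

value_bytes = hot_rows * 20 * 20                        # 20 columns x 20 bytes
puts "#{value_bytes / 10**9} GB of values in the hot range"
puts format('%.2f%% of the keyspace', 100.0 * hot_rows / total_rows)
```

So the test repeatedly hits roughly 4 GB of values, about 0.5% of the table, which is what makes it a hot-spot read pattern rather than a full-table scan.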
YCSB profile:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=20
fieldlength=20
readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0
requestdistribution=uniform

Stress test command:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200

Range scan
Simulates batch scan workloads. Reads 50 consecutive rows per scan from a 10-million-row query range.
| Parameter | Value |
|---|---|
| Rows in dataset | 2 billion |
| Query range | 10 million rows |
| Columns per row | 20 |
| Column size | 20 bytes |
| Rows per scan | 50 |
| Threads | 100 |
| Warm-up | 20 minutes |
| Formal test | 20 minutes |
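Each scan in this scenario returns a fixed payload, which can be estimated from the table above (value bytes only, excluding keys and RPC framing):

```ruby
# Value bytes returned by a single scan: 50 rows x 20 columns x 20 bytes.
rows_per_scan = 50
payload = rows_per_scan * 20 * 20
puts "#{payload} bytes (~20 KB) per scan"
```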
YCSB profile:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=20
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=1.0
insertproportion=0
requestdistribution=uniform
maxscanlength=50
Lindorm.usepagefilter=false

Stress test command:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200

Single-row insert
Simulates high-frequency single-row write workloads. Each operation inserts one row containing a single 20-byte column.
| Parameter | Value |
|---|---|
| Columns per insert | 1 |
| Column size | 20 bytes |
| Threads | 200 |
| Test duration | 20 minutes |
YCSB profile:
recordcount=2000000000
operationcount=100000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=0
insertproportion=1.0
requestdistribution=uniform

Stress test command:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200

Batch insert
Simulates bulk write workloads. Each operation inserts a batch of 100 rows, each containing a single 20-byte column.
| Parameter | Value |
|---|---|
| Columns per insert | 1 |
| Column size | 20 bytes |
| Batch size | 100 rows |
| Threads | 100 |
| Test duration | 20 minutes |
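Because every operation writes a 100-row batch, row throughput is 100x the OPS figure that YCSB reports, and the configured operation count bounds the total rows written:

```ruby
# Effective write volume of the batch-insert run. The 20-minute
# maxexecutiontime cap means this is an upper bound, not a guarantee.
operationcount = 10_000_000
batchsize      = 100

rows_written = operationcount * batchsize
puts "at most #{rows_written} rows"   # 1000000000 (1 billion rows)
```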
YCSB profile:
recordcount=2000000000
operationcount=10000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
fieldcount=1
fieldlength=20
cyclickey=true
readallfields=false
readproportion=0
updateproportion=0
scanproportion=0
insertproportion=0.0
batchproportion=1.0
batchsize=100
requestdistribution=uniform

Stress test command:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200

Response latency test
The response latency test uses the same workload configurations as the throughput test, but adds a -p target=<N> flag to cap OPS at a fixed value. This keeps the load identical across both clusters, so latency differences reflect cluster performance rather than load variation.
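YCSB's target throttling divides the total target OPS across client threads, and each thread paces itself to a fixed inter-request gap. For the single-row read scenario below, the pacing works out as:

```ruby
# Per-thread pacing implied by -p target=5000 with 200 threads.
target_ops = 5_000
threads    = 200

per_thread = target_ops.to_f / threads
puts "#{per_thread} ops per second per thread"     # 25.0
puts "#{1000.0 / per_thread} ms between requests"  # 40.0
```

At such a low per-thread rate, measured latency reflects the cluster's response time rather than client-side queuing.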
Single-row read
| Parameter | Value |
|---|---|
| Query range | 10 million rows |
| Columns per row | 20 |
| Column size | 20 bytes |
| Threads | 200 |
| Max OPS | 5,000 |
| Warm-up | 20 minutes |
| Formal test | 20 minutes |
YCSB profile: same as single-row read throughput test.
Stress test command:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000

Range scan
| Parameter | Value |
|---|---|
| Query range | 10 million rows |
| Rows per scan | 50 |
| Threads | 100 |
| Max OPS | 5,000 |
| Warm-up | 20 minutes |
| Formal test | 20 minutes |
YCSB profile: same as range scan throughput test.
Stress test command:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000

Single-row insert
| Parameter | Value |
|---|---|
| Columns per insert | 1 |
| Column size | 20 bytes |
| Threads | 200 |
| Max OPS | 50,000 |
| Test duration | 20 minutes |
YCSB profile: same as single-row insert throughput test.
Stress test command:
bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=50000

Batch insert
| Parameter | Value |
|---|---|
| Columns per insert | 1 |
| Column size | 20 bytes |
| Batch size | 100 rows |
| Threads | 100 |
| Max OPS | 2,000 |
| Test duration | 20 minutes |
YCSB profile: same as batch insert throughput test.
Stress test command:
bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=2000

Compression ratio test
Each compression ratio test loads 5 million rows into both clusters, then triggers a flush and major compaction. After compaction completes, compare the on-disk table size between both clusters.
Run this procedure for each of the following column configurations:
| Columns per row | Column size (bytes) |
|---|---|
| 1 | 10 |
| 20 | 10 |
| 20 | 20 |
| 100 | 10 |
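For each configuration, the raw value bytes loaded serve as the baseline against which the post-compaction on-disk size can be compared (a sketch; it ignores key bytes and storage metadata, and "ratio" here is one common convention, not a definition from the test itself):

```ruby
# Raw value bytes per column configuration at 5 million rows. One way to
# express compression ratio is on-disk table size divided by this baseline.
rows = 5_000_000
[[1, 10], [20, 10], [20, 20], [100, 10]].each do |cols, bytes|
  raw = rows * cols * bytes
  puts format('%3d cols x %2d B -> %4d MB raw', cols, bytes, raw / 10**6)
end
```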
YCSB profile (replace <fieldcount> and <fieldlength> with values from the table above):
recordcount=5000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=<fieldcount>
fieldlength=<fieldlength>
readproportion=1.0
requestdistribution=uniform

Load data command:
bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s

After loading, trigger a flush and major compaction manually, then check the table size in both clusters.