The benchmark tests compare the throughput, response latency, and compression ratio of HBase Community Edition with those of ApsaraDB for HBase Performance-enhanced Edition. The throughput tests run both editions with the same number of client threads and compare the throughput that each edition delivers. The response latency tests apply the same workload to both editions and compare their response latencies. The compression ratio tests write the same amount of data into both editions and compare the resulting on-disk sizes.

Prepare data

Create a table in the HBase Community Edition cluster and in the ApsaraDB for HBase Performance-enhanced Edition cluster. The tables used in all test cases share the same schema and are pre-split into 200 partitions based on the key range of the Yahoo Cloud Serving Benchmark (YCSB) data.

The table created in the ApsaraDB for HBase Performance-enhanced Edition cluster uses the exclusive INDEX encoding algorithm and the Zstandard (ZSTD) compression algorithm. If you set the encoding algorithm to DIFF, it is automatically converted to the INDEX encoding algorithm. The following statement creates the table:

create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'ZSTD'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }
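
To verify the column family attributes after the table is created, you can run the describe command in the HBase shell. Whether the encoding is displayed as DIFF or as the converted INDEX value depends on the edition, so treat the output as edition-specific:

# Show the column family attributes, including DATA_BLOCK_ENCODING and COMPRESSION.
describe 'test'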
        

The table created in the HBase Community Edition cluster uses the DIFF encoding algorithm and the SNAPPY compression algorithm, the combination recommended by the Apache HBase community. The following statement creates the table:

create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'SNAPPY'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }
        

Prepare the data that is used by the single-row read and range read tests.

The test dataset contains 2 billion rows. Each row contains 20 columns. The size of each column is 20 bytes.
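In total, this corresponds to roughly 2,000,000,000 rows × 20 columns × 20 bytes = 800 GB of raw column values, not counting row keys, column qualifiers, timestamps, or HBase storage overhead.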

The configuration file of YCSB is as follows:

recordcount=2000000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=false
fieldcount=20
fieldlength=20

readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0

requestdistribution=uniform
        

Run the following command to launch YCSB and load the data:

bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s
        

Test scenarios

Throughput benchmark tests

The throughput benchmark tests compare the throughput of HBase Community Edition with that of ApsaraDB for HBase Performance-enhanced Edition when both run with the same number of threads. The tests consist of four independent scenarios.

Read data from a single row

The test dataset contains 2 billion rows. Each row contains 20 columns, and the size of each column is 20 bytes. The query range covers 10 million rows. After the data is prepared, trigger a major compaction and wait for it to complete. Run a warm-up test for 20 minutes, and then run the formal test for 20 minutes.
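
The major compaction can be triggered from the HBase shell. The following is a minimal sketch; it assumes that the compaction_state command is available in your HBase shell version for checking progress:

# Trigger a major compaction of the test table.
major_compact 'test'
# Poll until the state returns to NONE, which indicates that the compaction is complete.
compaction_state 'test'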

The configuration file of YCSB is as follows:

recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=false
fieldcount=1
fieldlength=20

readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0

requestdistribution=uniform
        

Run the following command to launch YCSB:

bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200
        

Read data within a specified range

The test dataset contains 2 billion rows. Each row contains 20 columns, and the size of each column is 20 bytes. The query range covers 10 million rows, and 50 rows are read in each scan. After the data is prepared, trigger a major compaction and wait for it to complete. Run a warm-up test for 20 minutes, and then run the formal test for 20 minutes.

The configuration file of YCSB is as follows:

recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=false
fieldcount=1
fieldlength=20

readproportion=0.0
updateproportion=0.0
scanproportion=1.0
insertproportion=0

requestdistribution=uniform
maxscanlength=50
hbase.usepagefilter=false
        

Run the following command to launch YCSB:

bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200
        

Write data into a single row

Each insert operation writes one row that contains a single column, and the size of each column is 20 bytes. Run the test for 20 minutes.

The configuration file of YCSB is as follows:

recordcount=2000000000
operationcount=100000000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=false
fieldcount=1
fieldlength=20

readproportion=0.0
updateproportion=0.0
scanproportion=0
insertproportion=1.0

requestdistribution=uniform
        

Run the following command to launch YCSB:

bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200
        

Write data into multiple rows

Each insert operation writes rows that contain a single column, and the size of each column is 20 bytes. Data is written in batches of 100 rows. Run the test for 20 minutes.

The configuration file of YCSB is as follows:

recordcount=2000000000
operationcount=10000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
fieldcount=1
fieldlength=20
cyclickey=true

readallfields=false
readproportion=0
updateproportion=0
scanproportion=0
insertproportion=0.0
batchproportion=1.0
batchsize=100

requestdistribution=uniform
        

Run the following command to launch YCSB:

bin/ycsb run hbase10 -P <workload> -p table=test -threads 100  -p columnfamily=f -p maxexecutiontime=1200
        

Response latency benchmark tests

The response latency benchmark tests compare the response latency of HBase Community Edition with that of ApsaraDB for HBase Performance-enhanced Edition when both process the same number of operations per second (OPS).

Read data from a single row

The test dataset contains 2 billion rows. Each row contains 20 columns, and the size of each column is 20 bytes. The query range covers 10 million rows, and the maximum OPS is 5000. After the data is prepared, trigger a major compaction and wait for it to complete. Run a warm-up test for 20 minutes, and then run the formal test for 20 minutes.

The configuration file of YCSB is as follows:

recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=false
fieldcount=1
fieldlength=20

readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0

requestdistribution=uniform
        

Run the following command to launch YCSB:

bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000
        

Read data within a specified range

The test dataset contains 2 billion rows. Each row contains 20 columns, and the size of each column is 20 bytes. The query range covers 10 million rows, 50 rows are read in each scan, and the maximum OPS is 5000. After the data is prepared, trigger a major compaction and wait for it to complete. Run a warm-up test for 20 minutes, and then run the formal test for 20 minutes.

The configuration file of YCSB is as follows:

recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=false
fieldcount=1
fieldlength=20

readproportion=0.0
updateproportion=0.0
scanproportion=1.0
insertproportion=0

requestdistribution=uniform
maxscanlength=50
hbase.usepagefilter=false
        

Run the following command to launch YCSB:

bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000
        

Write data into a single row

Each insert operation writes one row that contains a single column, and the size of each column is 20 bytes. Run the test for 20 minutes. The maximum OPS is 50000.

The configuration file of YCSB is as follows:

recordcount=2000000000
operationcount=100000000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=false
fieldcount=1
fieldlength=20

readproportion=0.0
updateproportion=0.0
scanproportion=0
insertproportion=1.0

requestdistribution=uniform
        

Run the following command to launch YCSB:

bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=50000
        

Write data into multiple rows

Each insert operation writes rows that contain a single column, and the size of each column is 20 bytes. Data is written in batches of 100 rows. Run the test for 20 minutes. The maximum OPS is 2000.

The configuration file of YCSB is as follows:

recordcount=2000000000
operationcount=10000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
fieldcount=1
fieldlength=20
cyclickey=true

readallfields=false
readproportion=0
updateproportion=0
scanproportion=0
insertproportion=0.0
batchproportion=1.0
batchsize=100

requestdistribution=uniform
        

Run the following command to launch YCSB:

bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=2000
        

Compression ratio benchmark tests

All compression ratio benchmark tests follow the same procedure: insert 5 million rows into the table through YCSB, manually trigger a flush and a major compaction, and then check the size of the table on disk. The following table lists the tested combinations of column count and column size.

Number of columns in each row    Size of each column (bytes)
1                                10
1                                100
20                               10
20                               20

The configuration file of YCSB is as follows:

recordcount=5000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=false
fieldcount=<Number of columns in each row>
fieldlength=<Size of each column>

readproportion=1.0

requestdistribution=uniform
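
For example, for the scenario with 20 columns of 10 bytes each, the two placeholders resolve to the following values:

fieldcount=20
fieldlength=10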
        

Run the following command to insert data:

bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s
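
After the load completes, the flush and major compaction can be triggered from the HBase shell, and the on-disk size of the table can then be read from HDFS. This is a sketch that assumes the default hbase.rootdir (/hbase) and the default namespace; adjust the path to your deployment:

# In the HBase shell: persist the MemStore and rewrite the store files.
flush 'test'
major_compact 'test'

# From a cluster shell: check the on-disk size of the table.
hdfs dfs -du -s -h /hbase/data/default/test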