The benchmark tests compare the throughput and response latency between HBase Community Edition and ApsaraDB for Lindorm.

The throughput benchmark test uses the same number of threads to test the throughput of HBase Community Edition and the throughput of ApsaraDB for Lindorm. The response latency benchmark test uses the same amount of workloads to test the response latency of HBase Community Edition and the response latency of ApsaraDB for Lindorm. The benchmark test for compression ratios writes the same amount of data into HBase Community Edition and ApsaraDB for Lindorm to test their compression ratios.

Create a table

Create tables in the clusters of HBase Community Edition and ApsaraDB for Lindorm. The tables used in all the test cases use the same schema. Create 200 partitions based on the Yahoo Cloud Serving Benchmark (YCSB) data.

The table created in the cluster of ApsaraDB for Lindorm uses the exclusive INDEX encoding and Zstandard compression algorithms. When you set the encoding algorithm to DIFF, the algorithm is automatically updated to the INDEX encoding algorithm.

create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'ZSTD'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }
            

The table created in the cluster of HBase Community Edition uses the DIFF encoding and SNAPPY compression algorithms that are recommended by official HBase. You can execute the following statement to create the table:

create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'SNAPPY'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }
            

Prepare the test data

Prepare the data to be read from a single row and within a specified range.

A single table stores 2 billion rows. Each row stores 20 columns. The size of each column is 20 bytes.

The following code block shows the YCSB profile:

recordcount=2000000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=false
fieldcount=20
fieldlength=20

readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0

requestdistribution=uniform
            

Run the following command to launch YCSB:

bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s
            

Throughput benchmark tests

The throughput benchmark tests compare the throughput of HBase Community Edition with that of ApsaraDB for Lindorm based on the same number of threads. The tests include four test scenarios. The test scenarios are independent of each other.

  • Read data in a single row

    The test dataset stores 2 billion rows. Each row stores 20 columns. The size of each column is 20 bytes. The query range is 10 million rows. After the preceding data is prepared, perform a major compaction and wait for the system to complete the major compaction. Run a warm-up test for 20 minutes, and then run a formal test for 20 minutes.

    The following code block shows the workload configuration in the YCSB profile:

    recordcount=10000000
    operationcount=2000000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    
    readallfields=false
    fieldcount=1
    fieldlength=20
    
    readproportion=1.0
    updateproportion=0.0
    scanproportion=0
    insertproportion=0
    
    requestdistribution=uniform
                        

    Run the following command to launch YCSB:

    bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200
                        
  • Read data within a specified range

    The test dataset stores 2 billion rows. Each row stores 20 columns. The size of each column is 20 bytes. The query range is 10 million rows. Fifty rows are read each time. After the preceding data is prepared, perform a major compaction and wait for the system to complete the major compaction. Run a warm-up test for 20 minutes, and then run a formal test for 20 minutes.

    The following code block shows the workload configuration in the YCSB profile:

    recordcount=10000000
    operationcount=2000000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    
    readallfields=false
    fieldcount=1
    fieldlength=20
    
    readproportion=0.0
    updateproportion=0.0
    scanproportion=1.0
    insertproportion=0
    
    requestdistribution=uniform
    maxscanlength=50
    Lindorm.usepagefilter=false
                        

    Run the following command to launch YCSB:

    bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200
                        
  • Write data into a single row

    Insert one column into the table at a time. The size of each column is 20 bytes. Run the test for 20 minutes.

    The following code block shows the workload configuration in the YCSB profile:

    recordcount=2000000000
    operationcount=100000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    
    readallfields=false
    fieldcount=1
    fieldlength=20
    
    readproportion=0.0
    updateproportion=0.0
    scanproportion=0
    insertproportion=1.0
    
    requestdistribution=uniform
                        

    Run the following command to launch YCSB:

    bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200
                        
  • Write data into multiple rows

    Insert one column into the table at a time. The size of each column is 20 bytes. Write data into 100 rows per batch. Run the test for 20 minutes.

    recordcount=2000000000
    operationcount=10000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    fieldcount=1
    fieldlength=20
    cyclickey=true
    
    readallfields=false
    readproportion=0
    updateproportion=0
    scanproportion=0
    insertproportion=0.0
    batchproportion=1.0
    batchsize=100
    
    requestdistribution=uniform
                        

    Run the following command to launch YCSB:

    bin/ycsb run hbase10 -P <workload> -p table=test -threads 100  -p columnfamily=f -p maxexecutiontime=1200
                        

Response latency benchmark tests

The response latency benchmark tests compare the response latency of HBase Community Edition with that of ApsaraDB for Lindorm based on the same Operations per Second (OPS).

  • Read data in a single row

    The test dataset stores 2 billion rows. Each row stores 20 columns. The size of each column is 20 bytes. The query range is 10 million rows. The maximum OPS is 5000. After the preceding data is prepared, perform a major compaction and wait for the system to complete the major compaction. Run a warm-up test for 20 minutes, and then run a formal test for 20 minutes.

    The following code block shows the workload configuration in the YCSB profile:

    recordcount=10000000
    operationcount=2000000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    
    readallfields=false
    fieldcount=1
    fieldlength=20
    
    readproportion=1.0
    updateproportion=0.0
    scanproportion=0
    insertproportion=0
    
    requestdistribution=uniform
                        

    Run the following command to launch YCSB:

    bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000
                        
  • Read data within a specified range

    The test dataset stores 2 billion rows. Each row stores 20 columns. The size of each column is 20 bytes. The query range is 10 million rows. 50 rows are read at a time. The maximum OPS is 5000. After the preceding data is prepared, perform a major compaction and wait for the system to complete the major compaction. Run a warm-up test for 20 minutes, and then run a formal test for 20 minutes.

    The following code block shows the workload configuration in the YCSB profile:

    recordcount=10000000
    operationcount=2000000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    
    readallfields=false
    fieldcount=1
    fieldlength=20
    
    readproportion=0.0
    updateproportion=0.0
    scanproportion=1.0
    insertproportion=0
    
    requestdistribution=uniform
    maxscanlength=50
    Lindorm.usepagefilter=false
                        

    Run the following command to launch YCSB:

    bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000
                        
  • Write data into a single row

    Insert one column into the table at a time. The size of each column is 20 bytes. Run the test for 20 minutes. The maximum OPS is 50000.

    The following code block shows the workload configuration in the YCSB profile:

    recordcount=2000000000
    operationcount=100000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    
    readallfields=false
    fieldcount=1
    fieldlength=20
    
    readproportion=0.0
    updateproportion=0.0
    scanproportion=0
    insertproportion=1.0
    
    requestdistribution=uniform
                        

    Run the following command to launch YCSB:

    bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=50000
                        
  • Write data into multiple rows

    Insert one column into the table at a time. The size of each column is 20 bytes. Write data into 100 rows per batch. Run the test for 20 minutes. The maximum OPS is 2000.

    recordcount=2000000000
    operationcount=10000000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    fieldcount=1
    fieldlength=20
    cyclickey=true
    
    readallfields=false
    readproportion=0
    updateproportion=0
    scanproportion=0
    insertproportion=0.0
    batchproportion=1.0
    batchsize=100
    
    requestdistribution=uniform
                        

    Run the following command to launch YCSB:

    bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=2000
                        

Benchmark tests for compression ratios

The following compression ratio benchmark tests all follow the same procedure. Manually trigger a flush and major compaction by inserting 5 million rows into the table through YCSB. After the data is inserted into the table, check the size of the table.

Number of columns in each row Size of each column
1 10
1 100
20 10
20 20

The following code block shows the workload configuration in the YCSB profile:

recordcount=5000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=false
fieldcount=<Number of columns in each row>
fieldlength=<Size of each column>

readproportion=1.0

requestdistribution=uniform
            

Run the following command to insert data:

bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s