The benchmark tests compare the throughput, response latency, and compression ratio of HBase Community Edition and ApsaraDB for HBase Performance-enhanced Edition. The throughput tests run both editions with the same number of client threads and compare the throughput that each edition achieves. The response latency tests apply the same workload to both editions and compare their response latencies. The compression ratio tests write the same amount of data into both editions and compare their compression ratios.
Prepare data
Create a table in the cluster of HBase Community Edition and in the cluster of ApsaraDB for HBase Performance-enhanced Edition. All test cases use the same table schema. Pre-split each table into 200 partitions based on the key distribution of the Yahoo Cloud Serving Benchmark (YCSB) data.
The table created in the cluster of ApsaraDB for HBase Performance-enhanced Edition uses the exclusive INDEX encoding algorithm and the Zstandard (ZSTD) compression algorithm. When you set the encoding algorithm to DIFF, the Performance-enhanced Edition automatically upgrades it to the INDEX encoding algorithm. The statement for creating the table is as follows:
create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'ZSTD'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }
The table created in the cluster of HBase Community Edition uses the DIFF encoding algorithm and the SNAPPY compression algorithm, as recommended by the official HBase documentation. The statement for creating the table is as follows:
create 'test', {NAME => 'f', DATA_BLOCK_ENCODING => 'DIFF', COMPRESSION => 'SNAPPY'}, {SPLITS => (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"} }
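The SPLITS expression in both statements pre-splits the table into 200 regions by dividing the signed 64-bit key space into equal intervals. A quick way to sanity-check the 199 boundary keys is to reproduce the computation outside the HBase shell; here is a sketch in Python:

```python
# Reproduce the HBase shell SPLITS expression:
# (1..199).map{|i| "user#{(i * ((2**63-1)/199)).to_s.rjust(19, "0")}"}
MAX_SIGNED_64 = 2**63 - 1
STEP = MAX_SIGNED_64 // 199  # Ruby integer division of Integers == Python //

# :019d zero-pads to width 19, matching rjust(19, "0")
splits = [f"user{(i * STEP):019d}" for i in range(1, 200)]

print(len(splits))  # 199 boundary keys -> 200 regions
print(splits[0])    # first split key
print(splits[-1])   # last split key
```

Because every key is zero-padded to the same width, the lexicographic order that HBase uses for row keys matches the numeric order of the boundaries.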
Prepare the dataset used by the single-row read and range read tests.
The test dataset contains 2 billion rows. Each row contains 20 columns. The size of each column is 20 bytes.
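For a rough sense of scale, the raw value payload implied by these numbers is rows × columns × bytes per column. This back-of-envelope figure counts only cell values, not row keys, column qualifiers, or timestamps:

```python
rows = 2_000_000_000  # recordcount
cols = 20             # fieldcount
col_bytes = 20        # fieldlength

# Raw size of the cell values alone, before encoding and compression
raw_bytes = rows * cols * col_bytes
print(f"{raw_bytes / 10**9:.0f} GB of raw values")  # 800 GB
```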
The configuration file of YCSB is as follows:
recordcount=2000000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=20
fieldlength=20
readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0
requestdistribution=uniform
Run the following command to launch YCSB:
bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s
Test scenarios
Throughput benchmark tests
The throughput benchmark tests compare the throughput of HBase Community Edition with that of ApsaraDB for HBase Performance-enhanced Edition based on the same number of threads. The tests cover four scenarios, which are independent of each other.
Read data in a single row
The test dataset contains 2 billion rows. Each row contains 20 columns. The size of each column is 20 bytes. The query range is 10 million rows. After the data is prepared, trigger a major compaction and wait for it to complete. Run a 20-minute warm-up test, and then run a 20-minute formal test.
The configuration file of YCSB is as follows:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0
requestdistribution=uniform
Run the following command to launch YCSB:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200
Read data within a specified range
The test dataset contains 2 billion rows. Each row contains 20 columns. The size of each column is 20 bytes. The query range is 10 million rows, and each scan reads 50 rows. After the data is prepared, trigger a major compaction and wait for it to complete. Run a 20-minute warm-up test, and then run a 20-minute formal test.
The configuration file of YCSB is as follows:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=1.0
insertproportion=0
requestdistribution=uniform
maxscanlength=50
hbase.usepagefilter=false
Run the following command to launch YCSB:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200
Write data into a single row
Each insert writes a single row that contains one 20-byte column. Run the test for 20 minutes.
The configuration file of YCSB is as follows:
recordcount=2000000000
operationcount=100000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=0
insertproportion=1.0
requestdistribution=uniform
Run the following command to launch YCSB:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200
Write data into multiple rows
Each insert writes rows that contain one 20-byte column each, in batches of 100 rows. Run the test for 20 minutes.
The configuration file of YCSB is as follows:
recordcount=2000000000
operationcount=10000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
fieldcount=1
fieldlength=20
cyclickey=true
readallfields=false
readproportion=0
updateproportion=0
scanproportion=0
insertproportion=0.0
batchproportion=1.0
batchsize=100
requestdistribution=uniform
Run the following command to launch YCSB:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200
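With batchsize=100, each YCSB operation writes 100 rows, so the operationcount above corresponds to the following total number of rows, assuming every operation is a batch insert (batchproportion=1.0) and the run finishes before the maxexecutiontime cap of 1200 seconds:

```python
operationcount = 10_000_000  # YCSB operations (batches)
batchsize = 100              # rows written per operation

rows_written = operationcount * batchsize
print(rows_written)  # 1000000000 rows if the run completes all operations
```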
Response latency benchmark tests
The response latency benchmark tests compare the response latency of HBase Community Edition with that of ApsaraDB for HBase Performance-enhanced Edition at the same rate of operations per second (OPS).
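The rate cap is set through YCSB's target parameter, which specifies the total OPS for the run; YCSB divides this rate evenly among the client threads, each of which throttles itself accordingly. As a quick illustration with the numbers used in the single-row read latency test:

```python
target_ops = 5000  # -p target=5000, total ops/sec for the whole run
threads = 200      # -threads 200

# Rate at which each YCSB client thread throttles itself
per_thread = target_ops / threads
print(per_thread)  # 25.0 ops/sec per thread
```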
Read data in a single row
The test dataset contains 2 billion rows. Each row contains 20 columns. The size of each column is 20 bytes. The query range is 10 million rows. The maximum OPS is 5000. After the data is prepared, trigger a major compaction and wait for it to complete. Run a 20-minute warm-up test, and then run a 20-minute formal test.
The configuration file of YCSB is as follows:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=1.0
updateproportion=0.0
scanproportion=0
insertproportion=0
requestdistribution=uniform
Run the following command to launch YCSB:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000
Read data within a specified range
The test dataset contains 2 billion rows. Each row contains 20 columns. The size of each column is 20 bytes. The query range is 10 million rows, and each scan reads 50 rows. The maximum OPS is 5000. After the data is prepared, trigger a major compaction and wait for it to complete. Run a 20-minute warm-up test, and then run a 20-minute formal test.
The configuration file of YCSB is as follows:
recordcount=10000000
operationcount=2000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=1.0
insertproportion=0
requestdistribution=uniform
maxscanlength=50
hbase.usepagefilter=false
Run the following command to launch YCSB:
bin/ycsb run hbase10 -P <workload> -p table=test -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=5000
Write data into a single row
Each insert writes a single row that contains one 20-byte column. Run the test for 20 minutes. The maximum OPS is 50000.
The configuration file of YCSB is as follows:
recordcount=2000000000
operationcount=100000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=1
fieldlength=20
readproportion=0.0
updateproportion=0.0
scanproportion=0
insertproportion=1.0
requestdistribution=uniform
Run the following command to launch YCSB:
bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 200 -p columnfamily=f -p maxexecutiontime=1200 -p target=50000
Write data into multiple rows
Each insert writes rows that contain one 20-byte column each, in batches of 100 rows. Run the test for 20 minutes. The maximum OPS is 2000.
The configuration file of YCSB is as follows:
recordcount=2000000000
operationcount=10000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
fieldcount=1
fieldlength=20
cyclickey=true
readallfields=false
readproportion=0
updateproportion=0
scanproportion=0
insertproportion=0.0
batchproportion=1.0
batchsize=100
requestdistribution=uniform
Run the following command to launch YCSB:
bin/ycsb run hbase10 -P <workload> -p table=testwrite -threads 100 -p columnfamily=f -p maxexecutiontime=1200 -p target=2000
Compression ratio benchmark tests
All compression ratio benchmark tests follow the same procedure: insert 5 million rows into the table through YCSB, manually trigger a flush and a major compaction (for example, by running flush 'test' and then major_compact 'test' in the HBase shell), and check the on-disk size of the table after the compaction completes. The four scenarios differ only in the number of columns per row and the size of each column:
| Number of columns per row | Size of each column (bytes) |
|---|---|
| 1 | 10 |
| 1 | 100 |
| 20 | 10 |
| 20 | 20 |
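The compression ratio for each scenario is the logical data size divided by the on-disk size measured after the major compaction. The logical value size for the four scenarios above (counting cell values only, not keys or timestamps) works out as follows:

```python
recordcount = 5_000_000

# (columns per row, bytes per column) for the four scenarios
scenarios = [(1, 10), (1, 100), (20, 10), (20, 20)]
for cols, col_bytes in scenarios:
    logical = recordcount * cols * col_bytes
    print(f"{cols} column(s) x {col_bytes} B -> {logical / 10**6:.0f} MB of raw values")
```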
The configuration file of YCSB is as follows:
recordcount=5000000
operationcount=150000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=false
fieldcount=<Number of columns in each row>
fieldlength=<Size of each column>
readproportion=1.0
requestdistribution=uniform
Run the following command to insert data:
bin/ycsb load hbase10 -P <workload> -p table=test -threads 200 -p columnfamily=f -s