Performance white paper

Last Updated: Feb 23, 2021

Preface

This topic describes how to run benchmark tests against ApsaraDB for Cassandra and provides sample results. The results do not necessarily represent optimal performance, and they change as the kernel and the cloud environment are continuously optimized. To estimate the instance scale that suits your business, you can follow the test approach described in this topic. The most reliable method is to run a simulation of your own services on the instance, which gives more precise results than an external test tool.

Test tools

The tests use the industry-standard benchmark tool Yahoo! Cloud Serving Benchmark (YCSB), version 0.15.0 (newly released at the time of testing). For more information, visit https://github.com/brianfrankcooper/YCSB/tree/0.15.0/cassandra.

Test environment

Purchase an ApsaraDB for Cassandra instance for testing.

  • Network: VPC. Deploy the client and the server in the same region and zone.

  • Instance architecture: one data center that contains three nodes.

  • Instance storage: 400 GB of standard SSD per node.

  • Stress testing server type: ecs.c6.2xlarge (8 cores, 16 GB).

  • Instance specifications: all the specifications supported by ApsaraDB for Cassandra.

Test load description

Workloads differ between services, for example in the number of fields and the data volume of each row, and therefore produce different throughput and latency. The tests in this topic are based on the default workloada of YCSB, with the parameters listed below. You can adjust the YCSB parameters to match your business. Most ApsaraDB for Cassandra parameters use their default values. For more information, visit https://github.com/brianfrankcooper/YCSB/tree/0.15.0/cassandra.

Parameters

  • 10 fields per row (default).

  • 1 KB records per row (default).

  • Read/write operation ratio: 95:5.

  • Read/write consistency level: ONE (default).

  • Number of replicas: 2. Two replicas are configured because the instance uses disk storage.

  • Stress testing threads: modified based on the instance specifications. For more information, see the test results.

  • recordcount: the number of rows to import. It is adjusted based on the instance specifications. For more information, see the test results.

  • operationcount: the number of stress testing operations. It is the same as recordcount.

Note: Changing the consistency level affects performance. Specify a consistency level based on your business requirements.
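The parameters above can be written down as YCSB properties. The following is a minimal sketch, assuming the core workload property names (fieldcount, fieldlength, readproportion, updateproportion, recordcount, operationcount) and the Cassandra binding property names (cassandra.readconsistencylevel, cassandra.writeconsistencylevel). The file name ycsb-overrides.properties is hypothetical, and the row count is the one used for the 4-core 8 GB results.

```shell
# Write the benchmark parameters to a separate properties file
# (hypothetical name) so that workloads/workloada itself stays unchanged.
OVERRIDES=ycsb-overrides.properties
ROWS=16000000   # 1600 x 10,000 rows, as in the 4-core 8 GB results

cat > "$OVERRIDES" <<EOF
fieldcount=10
fieldlength=100
readproportion=0.95
updateproportion=0.05
recordcount=$ROWS
operationcount=$ROWS
cassandra.readconsistencylevel=ONE
cassandra.writeconsistencylevel=ONE
EOF

# Show the generated file for review.
cat "$OVERRIDES"
```

Ten fields of 100 bytes each give the 1 KB row size listed above. YCSB accepts the -P option more than once and processes the files in the order given, so such a file could be passed after workloads/workloada. Single values can also be set with -p name=value.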

Test procedure

1. Create a test table

# Replace cn-shanghai-g with the data center ID of the purchased instance. You can view the Data Center Name parameter in the ApsaraDB for Cassandra console.
create keyspace ycsb WITH replication = {'class': 'NetworkTopologyStrategy', 'cn-shanghai-g': 2};
create table ycsb.usertable (y_id varchar primary key, field0 varchar, field1 varchar, field2 varchar, field3 varchar, field4 varchar, field5 varchar, field6 varchar, field7 varchar, field8 varchar, field9 varchar);
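The table needs one column per YCSB field (field0 to field9 for the default of 10 fields). If you change the number of fields, the statement has to change with it. The following sketch generates the statement for a given field count; the function name make_usertable_ddl and the variable FIELD_COUNT are illustrative and not part of YCSB.

```shell
# Build the CREATE TABLE statement for a given number of YCSB fields.
# FIELD_COUNT is an illustrative variable; 10 reproduces the statement above.
FIELD_COUNT=10

make_usertable_ddl() {
  ddl="create table ycsb.usertable (y_id varchar primary key"
  i=0
  while [ "$i" -lt "$1" ]; do
    ddl="$ddl, field$i varchar"
    i=$((i + 1))
  done
  printf '%s);\n' "$ddl"
}

make_usertable_ddl "$FIELD_COUNT"
```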
                        

2. Install the test tool

wget https://github.com/brianfrankcooper/YCSB/releases/download/0.15.0/ycsb-cassandra-binding-0.15.0.tar.gz
tar -zxf ycsb-cassandra-binding-0.15.0.tar.gz
                        

3. Edit the workloads/workloada file

Add the following three properties. Keep each comment on its own line, because text that follows a value on the same line is read as part of the value.

# The endpoint of the instance. You can view the endpoint in the ApsaraDB for Cassandra console.
hosts=cds-xxxxxxxx-core-003.cassandra.rds.aliyuncs.com
# The account of the instance. The account must have permissions to read and write the ycsb keyspace.
cassandra.username=cassandra
# The password of the account. If you forget the password, you can change it in the console.
cassandra.password=123456
                        

4. Prepare data (write-only test)

nohup ./bin/ycsb load cassandra2-cql -threads $THREAD_COUNT -P workloads/workloada -s > $LOG_FILE 2>&1 &
                        

The test result shows the write throughput. To find the maximum throughput, increase the value of $THREAD_COUNT step by step and check whether the throughput still increases. Make sure that the specifications of the stress testing client are not too low. Otherwise, the client becomes the bottleneck.
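The search for the maximum throughput can be scripted. The following sketch runs the load phase with several thread counts and reports the point after which throughput stops growing. The thread counts, the 5% threshold, the file names, and the function name find_plateau are assumptions for illustration. The script also assumes that YCSB prints a summary line of the form "[OVERALL], Throughput(ops/sec), <value>".

```shell
# Print the first thread count after which throughput grows by less than 5%.
# Input: lines of "<threads> <ops>" in increasing thread order.
# The 5% threshold is arbitrary.
find_plateau() {
  awk 'NR > 1 && $2 < prev_ops * 1.05 { print prev_threads; found = 1; exit }
       { prev_threads = $1; prev_ops = $2 }
       END { if (!found) print prev_threads }'
}

# Sweep over thread counts. Skipped unless YCSB is unpacked in the current directory.
if [ -x ./bin/ycsb ]; then
  : > sweep.txt
  for THREAD_COUNT in 16 32 64 100 150; do
    LOG_FILE="load-$THREAD_COUNT.log"
    # Consider emptying ycsb.usertable between iterations so that each run starts equal.
    ./bin/ycsb load cassandra2-cql -threads "$THREAD_COUNT" -P workloads/workloada -s > "$LOG_FILE" 2>&1
    OPS=$(awk -F', ' '$2 == "Throughput(ops/sec)" { print $3 }' "$LOG_FILE")
    echo "$THREAD_COUNT $OPS" >> sweep.txt
  done
  find_plateau < sweep.txt
fi
```

If throughput is still growing at the largest thread count, the function prints that count, which indicates that the sweep should be extended.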

5. Perform stress testing (read and write test)

nohup ./bin/ycsb run cassandra2-cql -threads $THREAD_COUNT -P workloads/workloada -s > $LOG_FILE 2>&1 &
                        

You can view the read and write performance based on the test result.

Test results

The test results are for reference only. Different loads result in different throughput and latency. You can use other parameters, other loads, and larger data volumes (longer test periods) to obtain test results that suit your business. Note: The client specifications affect the test results. Do not use shared instances for the client.

Test result description

  • Load: the data preparation phase (write-only test).

  • Run: the stress testing phase (read and write test).

  • OPS: operations per second, which indicates the throughput of the phase.

  • WAVG: the average write latency. Unit: microsecond.

  • RAVG: the average read latency. Unit: microsecond.

  • RP95, RP99, RP999: the 95th, 99th, and 99.9th percentile read latency. Unit: microsecond.

  • Thread: the number of YCSB threads during data preparation and during stress testing. For example, 100/100 indicates 100 threads in each phase.

Two tests are performed in the stress testing phase: a full load test (80% CPU load) and a normal load test (60% CPU load).
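The metrics described above can be read from the summary that YCSB prints at the end of a run. The following sketch assumes summary lines of the form "[SECTION], Name, value", with writes reported under [UPDATE] in the run phase (and under [INSERT] in the load phase). The function names ycsb_metric and summarize_run are hypothetical. The 99.9th percentile is left out because it is assumed to require additional measurement configuration.

```shell
# Print one value from a YCSB summary, for example:
#   ycsb_metric run.log READ 'AverageLatency(us)'
ycsb_metric() {
  awk -F', ' -v section="[$2]" -v metric="$3" \
    '$1 == section && $2 == metric { print $3 }' "$1"
}

# Map the summary lines of a run-phase log to the column names used in this topic.
summarize_run() {
  echo "OPS=$(ycsb_metric "$1" OVERALL 'Throughput(ops/sec)')"
  echo "WAVG=$(ycsb_metric "$1" UPDATE 'AverageLatency(us)')"
  echo "RAVG=$(ycsb_metric "$1" READ 'AverageLatency(us)')"
  echo "RP95=$(ycsb_metric "$1" READ '95thPercentileLatency(us)')"
  echo "RP99=$(ycsb_metric "$1" READ '99thPercentileLatency(us)')"
}

# Usage: summarize the log written in step 5, if it exists.
if [ -n "$LOG_FILE" ] && [ -f "$LOG_FILE" ]; then
  summarize_run "$LOG_FILE"
fi
```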

80% CPU load

Specification | Thread | Data volume (10,000 rows) | Load OPS | Load WAVG (µs) | Run OPS | Run WAVG (µs) | Run RAVG (µs) | Run RP95 (µs) | Run RP99 (µs) | Run RP999 (µs)
4-core 8 GB | 100/100 | 1600 | 32277 | 3071 | 29745 | 2846 | 3363 | 7795 | 23039 | 43999

60% CPU load

Specification | Thread | Data volume (10,000 rows) | Load OPS | Load WAVG (µs) | Run OPS | Run WAVG (µs) | Run RAVG (µs) | Run RP95 (µs) | Run RP99 (µs) | Run RP999 (µs)
4-core 8 GB | 100/16 | 1600 | 32063 | 3093 | 16721 | 514 | 974 | 1879 | 3047 | 28063
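As a plausibility check on the two result sets: each YCSB thread issues its requests one after another, so in such a closed loop the throughput is roughly the thread count divided by the average latency (Little's law). The following sketch computes the average latency implied by the thread count and the Run OPS values above. The function name implied_latency_us is hypothetical.

```shell
# Average latency in microseconds implied by a closed-loop test: threads / OPS.
implied_latency_us() {
  awk -v threads="$1" -v ops="$2" \
    'BEGIN { printf "%.0f\n", threads / ops * 1000000 }'
}

implied_latency_us 100 29745   # run phase at 80% CPU load
implied_latency_us 16 16721    # run phase at 60% CPU load
```

The results are about 3362 and 957 microseconds, close to the measured average read latencies of 3363 and 974 microseconds. This suggests that, at a fixed thread count, the measured throughput is bounded by per-request latency, which is why the thread count has to be raised to find the maximum throughput.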

Note: This topic lists the test results of an instance that uses standard SSDs. Ultra disks also provide high IOPS. When the data volume and the instance specifications are small, disks are not the bottleneck, and ultra disks perform close to SSDs. Therefore, results for ultra disks are not listed. To measure actual performance, simulate the actual loads of your business. The results that your business observes also depend on the application. For example, garbage collection in Java clients adds latency.
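To judge whether a test data set is small relative to the instance storage, the raw data volume per node can be estimated from the test parameters. The following sketch ignores compression and storage overhead, so it is only a rough guide. The function name raw_gb_per_node is hypothetical, and the inputs are the values used in this topic: 16,000,000 rows of about 1,000 bytes, two replicas, and three nodes.

```shell
# Rough raw data size per node in GB (decimal), ignoring compression and overhead.
# Arguments: rows, bytes per row, replicas, nodes.
raw_gb_per_node() {
  awk -v rows="$1" -v bytes="$2" -v rf="$3" -v nodes="$4" \
    'BEGIN { printf "%.1f\n", rows * bytes * rf / nodes / 1000000000 }'
}

raw_gb_per_node 16000000 1000 2 3
```

Under these assumptions the estimate is about 10.7 GB per node, a small fraction of the 400 GB of storage per node.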