edit-icon download-icon

Precautions on comparing ECS user-created databases and ApsaraDB for RDS instances in terms of performance

Last Updated: Mar 31, 2018

Scenario

Test the performance differences between ECS user-created databases and ApsaraDB for RDS instances.

Background information

Many users compare ECS user-created databases and ApsaraDB for RDS instances in terms of performance. During a performance test, you must make sure that the ECS user-created database and RDS instance for testing are in the same network environment and under the same configuration, and take into account their running mechanisms. Otherwise, the test results are inaccurate and unconvincing.

ApsaraDB for RDS is a public relational database service with the outstanding features of high availability and robust security. High performance is the second consideration because no one uses unstable and non-secure services. ApsaraDB for RDS has the following strengths:

  • Each ApsaraDB for RDS instance runs on a master node and a backup node which may be in different zones of the same region. The master node and backup node of an RDS for MySQL instance may be in different regions. When the master instance is faulty, the backup instance can quickly take over services to guarantee stability.

  • ApsaraDB for RDS adds an intermediate layer for data access. All requests are routed through this layer and requests with SQL injection are intercepted at this layer. ApsaraDB for RDS implements the highest security level for writing data at the underlying layer to prevent data loss when a power failure occurs. This guarantees robust security for databases.

  • The RDS source code team is continually making optimizations to MySQL. Both performance and stability are higher than in the community edition, as evidenced by standard benchmark tests.

Precautions on performance testing

When comparing the performance of an ECS user-created database and an ApsaraDB for RDS instance, pay attention to the following precautions.

Network difference

Zone

RDS instances can be divided into single-zone instances and multi-zone instances. For a single-zone RDS instance, the master database and backup database are in the same data center. For a multi-zone RDS instance, the master database and backup database may be in different data centers. Make sure that the ECS instance and RDS instance for testing are in the same zone.

The path from an ECS instance to an RDS instance contains multiple network links, such as ECS -> DNS -> Server Load Balancer -> Proxy -> DB, as shown in the following figure. The path between ECS user-created databases is ECS -> ECS. An RDS instance has three more links than an ECS user-created database.

RDS access links

Case analysis

Scenario: An e-commerce customer plans to migrate systems to the cloud. During the testing process, it is found that the ApsaraDB for RDS instance has relatively low performance though it has the same application code and database configuration as the ECS user-created database.

Cause: The network delay increases, as shown in the following figure.

Configuration difference

Specification configuration

The specification configuration for an RDS instance mainly includes memory and CPU. When testing the RDS instance, make sure that the ECS instance and RDS instance have the same number of CPU cores.

Parameter settings

  • Security configuration: ApsaraDB for RDS applies the highest protection standards when committing transactions and flushing binary logs to guarantee data security.

    • innodb_flush_log_at_trx_commit indicates the log writing frequency after InnoDB commits transactions. If it is set to 1, the log buffer is written to the log file and flushed to the disk every time a transaction is committed. The default value is 1, which is the safest configuration, but performance loss incurs due to the operation of disk I/O during each transaction commit.

    • sync_binlog indicates the frequency at which MySQL binary logs are synchronized to the disk. MySQL binary logs are flushed to the disk after being written for the times specified by sync_binlog. The value 1 is the safest configuration, whereby binary logs are synchronized once after a statement is executed or a transaction is committed. In this way, only one statement or transaction log is lost in the event of total failure. The trade-off is that this is the slowest configuration.

  • Performance configuration: ApsaraDB for RDS allows you to set parameters except for specification parameters. Most of the parameters have been optimized by the official team, so you can run your RDS instance out of the box without having to adjust any parameters. However, while these parameters are suitable for most use cases, you may need to perform customization for performance purposes.

    tmp_table_size is used to determine the maximum value of the internal temporary memory table which is assigned to each thread. In the actual condition, the minimum values of tmp_table_size and max_heap_table_size take effect. If the temporary memory table exceeds the parameter value, MySQL automatically converts it to a disk-based MyISAM table. Avoid using a temporary table when optimizing query statements. If you need to use a temporary table, make sure that the temporary table is in the memory.

    Symptom: SQL execution takes a longer time if temporary tables are used when a complex SQL statement contains GROUP BY and DISTINCT clauses which cannot be optimized through an index.

    Suggestion: If the application involves many GROUP BY and DISTINCT clauses and the database has enough memory, you can increase the values of tmp_table_size and max_heap_table_size to improve query performance.

    query_cache_size controls the memory size of the MySQL query cache. With the query cache enabled, MySQL locks the query cache when executing every query and then determines whether the query cache contains the queried data. If yes, MySQL returns results directly. If no, MySQL proceeds to engine query and other operations. The INSERT, UPDATE, and DELETE statements can invalidate the query cache and any changes made to schemas and indexes. It costs a lot to maintain the invalid query cache, resulting in great pressure on MySQL. The query cache functions well when the database is not frequently updated. However, when data is frequently written to only several tables of the database, the query cache lock results in frequent lock conflicts and causes the tables to mutually wait for the query cache lock to unlock for reads and writes on a specific table, reducing the efficiency of the SELECT statement.

    Symptom: The database goes through a number of different statuses, including checking query cache for query, waiting for query cache lock, and storing results in query cache.

    Suggestion: ApsaraDB for RDS disables the query cache by default. If your instances enable the query cache, you can disable it when the preceding problem occurs. In some cases, however, you can enable the query cache to solve database performance problems.

Case analysis

Scenario: A customer migrates data from a local business system to the cloud. The services provided over the RDS instance are one time slower than the customer’s local user-created database.

Cause: The user-created database has different parameter settings from the RDS instance.

  • Local parameter settings:

    join_buffer_size = 128M

    read_rnd_buffer_size = 128M

    tmp_table_size = 128M

  • Parameter settings of the RDS instance:

    join_buffer_size = 1M

    read_buffer_size = 1M

    tmp_table_size =256K

Architecture difference

ApsaraDB for RDS adopts master/slave replication for high-availability. It also uses semi-synchronization which is an improvement over asynchronization used by MySQL. After the master database finishes executing a transaction committed by the client, it waits for acknowledgment from the slave database which is responsible for writing the data to the relay log. Only then does the master database send a reply to the client, rather than sending it right away. The semi-synchronization offers data security improvements compared to the asynchronization. However, it also causes a certain degree of latency which is at least equal to the round-trip time for the TCP/IP connection. This means that the semi-synchronization increases the response time of the transaction. See the following figure.

Note: The same problems also exist when the high-availability model of SQL Server uses a mirror.

Semi-synchronous replication

Thank you! We've received your feedback.