Developer Content

Related Tags：1.PolarDB-X
2. PolarDB

Abstract: Secondary indexes are a key difference between relational databases and NoSQL databases. The secondary index must be strongly consistent, so the writing of the index needs to be placed in a transaction with the writing of the primary key. The performance of the transaction is the basis of the performance of the secondary index. This test will focus on the index performance of different distributed databases, especially the performance difference between the global index in the industry and the MySQL index.

Why do you want to do this test?

The secondary index is a key difference between relational databases and NoSQL databases. The secondary index must be strongly consistent. Therefore, the writing of the index and the writing of the primary key must be placed in one transaction. The performance of the transaction is the basis for the performance of the secondary index.

There are several mainstream forms of distributed databases in the market from the perspective of user experience:

1: Pure transparent usage represented by TiDB, CockroachDB, etc. In terms of performance, all tables in this type of database are distributed tables, and no partition key needs to be specified. Its core logic is to use distributed transactions to maintain the global index, and use the global index to completely replace the secondary index in the stand-alone database.

2: Pure manual usage represented by OceanBase. In terms of performance, this type of database exists in the form of a single table without specifying the partition key, and does not have scalability; Creating distributed tables requires a syntax similar to partitioning tables. This type of database also generally provides the ability of global indexing (what is not provided is generally called middleware rather than database). However, unlike the first category, they generally take the global index as an option, which is manually specified and created by users. In addition, they also provide local indexes (local indexes) based on stand-alone transactions.

3: PolarDB-X with the above two usages is provided at the same time. Without specifying the partition key, similar to the first type of database, distributed tables+global indexes are used to provide a transparent distributed experience; It also allows you to manually specify the partition key and use local indexing and other technologies to improve performance.

In the previous article, "Interpretation of PolarDB-X Data Distribution (IV): Transparent vs. Manual", we proposed:

1: Transparent (automatic) ease of use determines the lower limit of the use of distributed databases, but there is a higher cost in performance

2: Manual operation can provide optimal performance, but the threshold for use will be increased

Distributed databases need to provide the ability to use transparently for most scenarios, and also provide the ability to manually tune and eliminate distributed transactions for a few scenarios with high performance requirements.

The important basis of this view is that the pure transparency mode essentially uses distributed transaction+global index to replace the transaction+index in the stand-alone database. However, the performance of distributed transaction+global index is significantly different from that of stand-alone transaction+index.

This test will focus on the index performance of different distributed databases, especially the performance difference between the global index and MySQL index in the industry.

The products tested this time include TiDB, OB, PolarDB-X and CockroachDB. The reasons for selecting these databases are as follows:

1: They all provide strong and consistent global indexing capabilities, and are databases rather than middleware.

2: Both have open source and cloud products. They have a long history and a lot of data. They are not PPT databases, so it is easy to understand the internal principles.

3: These databases are mainly for OLTP.

In addition, we also tested the index performance of MySQL for comparison.

Test method

Because the hardware configuration (for example, OB uses 6 machines (and the tenant settings do not occupy the entire machine), TiDB and TiKV use 5, PolarDB-X and MySQL are directly purchased from the public cloud, etc.), system parameters, and so on, for each database, they are not exactly the same or necessarily optimal, so it is meaningless to directly compare TPS.

We will take the peak value that TPS can reach for each database without index as the baseline (100%), and compare the percentage of performance of different index numbers relative to the baseline.

For example, if Product A can run to 10W TPS without an index and to 5W TPS with an index, then we think it is 50% of the baseline with an index. This percentage may be called TPS percentage.

In the horizontal comparison between products, we will compare the percentage of TPS under the same index number, rather than the absolute value of TPS.

When testing the TPS percentage, we will adjust the concurrency to find the maximum TPS concurrency, and calculate the TPS percentage with the maximum TPS.

In addition to the TPS percentage, we will also test the RT of each product written at a single time under different index numbers. In RT test, we will use single thread to write.

We can see that even if there is only one index in the global indexes implemented by these distributed databases, the performance will fall below 30%. In the case of eight indexes, the performance will basically fall below 10%.

The performance of a stand-alone database such as MySQL remains above 85% with 8 indexes.

This confirms the point we mentioned in PolarDB-X Data Distribution Interpretation (IV): Transparent vs. Manual: "Compared with stand-alone transactions, distributed transactions still have an insurmountable gap in cost (or performance), which is at least three times greater".

Replacing stand-alone database indexes with global indexes will bring high costs. In cost sensitive scenarios, local indexes need to be properly used to reduce the use costs.

In a database that provides local indexes:

● The local index and primary key of PolarDB-X and OB have the affinity of Locality, and can use stand-alone transactions to write the index. Compared with the global index, PolarDB-X and OB maintain very high performance.

● Although TiDB provides local indexes, its indexes and primary keys do not have the affinity on Locality and cannot be bound to the same machine. Therefore, its local indexes still need to be maintained using distributed transactions. There is no significant difference between the performance and the global indexes, and the cost is high.

● The local index of CockroachDB is theoretically similar to the behavior of TiDB, but the partition function of CockroachDB is only provided by the commercial version, so it has not been tested this time

For TiDB and CockroachDB, the situation is embarrassing, because the cost of all the indexes they provide is much higher than that of stand-alone MySQL. As a user, there is no way to eliminate this cost unless you do not use secondary indexes.

From the perspective of RT:

1: Because the network interaction of transactions in a stand-alone database is the least, RT performance is the best, and it has little relationship with the number of indexes.

2: Distributed database transactions require more cross node interactions, so RT is obviously larger than stand-alone databases. However, distributed databases generally adopt the parallel write strategy on multi branch transactions, so the database RT with good performance will not increase linearly with the number of indexes. Overall, RT is acceptable.

3: The RT performance of the CockroachDB global index is the worst, which may be related to the transaction strategy using HLC. Other databases use the TSO scheme.

4: The RT performance of TiDB global indexes is the best. The RTs of 0-8 indexes are almost unchanged, indicating that the parallel optimization is very good.

5; The RT of PolarDB-X global index seems to have room for optimization. Although the increase is limited, it is not completely parallel. We will optimize in subsequent versions. The local index RT is very stable and low.

6: The performance of OB's global index and local index is similar to that of PolarDB-X. Parallel optimization has room for improvement, and local index performs well.

As an additional discovery in the testing process, auto_increment/serial of TiDB, OB, and CockroachDB have serious performance problems, and random alternatives should be used. TiDB and CockroachDB are caused by the hot range brought by the time sequence, and OB may be caused by some internal locks. Compatibility includes function compatibility and performance compatibility. There is a long way to go for performance compatibility.

Evaluation of TiDB, OceanBase, PolarDB-X and CockroachDB secondary index writing performance

Why do you want to do this test?

Test method

From the perspective of RT:

Related Articles

A detailed explanation of Hadoop core architecture HDFS

What Does IOT Mean

6 Optional Technologies for Data Storage

What Is Blockchain Technology

Explore More Special Offers

Short Message Service(SMS) & Mail Service

Sales Support

Technical Support

Connect & Report Abuse