PolarDB-X global secondary index
Indexes are a basic component of databases. As early as the 1970s, System R supported multi-dimensional queries by adding indexes. In a stand-alone database, indexes are classified by purpose and underlying data structure into B-tree indexes, hash indexes, full-text indexes, spatial indexes, and so on. Usually each table contains a primary key index (Primary Index), and all other indexes are collectively referred to as secondary indexes (Secondary Index).
Distributed databases that adopt storage-compute separation and a shared-nothing architecture have good horizontal scalability: data partitioning and stateless compute nodes allow compute and storage to scale independently. Many distributed databases use this architecture (Spanner, CockroachDB, YugabyteDB, etc.).
What problem does a global index solve?
The shared-nothing architecture introduces the concept of partitions: data is divided according to a fixed partition key, so queries containing the partition key can quickly locate a specific partition, while all other queries require a full partition scan. The situation is similar to a stand-alone database, where querying by primary key can quickly locate the page containing the data, while querying by a non-primary-key column requires a full table scan.
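The routing behavior described above can be sketched as follows. This is a toy model, assuming hash partitioning; the names and the hash scheme are illustrative, not PolarDB-X internals.

```python
# Toy sketch of partition routing: a query that carries the partition key is
# pruned to one partition; any other query must scan them all.
PARTITIONS = 4

def route(partition_key):
    """Map a partition-key value to its partition via hash partitioning."""
    return hash(partition_key) % PARTITIONS

def partitions_to_scan(partition_key=None):
    if partition_key is not None:
        return [route(partition_key)]   # prune to a single partition
    return list(range(PARTITIONS))      # full partition scan
```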
Unlike in a stand-alone database, a full partition scan in a distributed database not only increases the number of slow queries and reduces system throughput, it may also cost the system its linear scalability. Consider the following example:
Before scaling out: two storage nodes (Data Node, DN) and two data partitions. Assuming a single DN can sustain a physical QPS of 3, the overall physical QPS is 6. If every query is a full partition scan, the logical QPS is 6 / 2 = 3.
After scaling out: three storage nodes and three data partitions, for an overall physical QPS of 9. With every query still a full partition scan, the logical QPS is 9 / 3 = 3. Machine cost has increased by 50%, yet query performance has not improved at all!
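The arithmetic in this example can be written out directly:

```python
# Sketch of the scalability arithmetic above: when every query is a full
# partition scan, logical QPS never rises no matter how many DNs are added.
def logical_qps(dn_count, physical_qps_per_dn=3):
    total_physical_qps = dn_count * physical_qps_per_dn
    # each logical query consumes one physical query on every DN
    return total_physical_qps / dn_count

before, after = logical_qps(2), logical_qps(3)  # both 3: 50% more cost, no gain
```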
A stand-alone database avoids full table scans with secondary indexes. Specifically, a secondary index uses a non-primary-key column as its key, and the value part stores the primary key (or possibly a reference to the row record; the specific implementation does not change the idea). A query using the secondary index first locates the page by the indexed column, reads the primary key value, and then goes back to the primary key index to fetch the entire row (a step called the table return). In essence, the secondary index avoids full table scans through redundant data, a standard application of the "trade space for time" optimization.
To eliminate full partition scans in a distributed database, a similar idea applies: store a redundant copy of the data as an index, with the index using a different partition key from the main table. A query first locates a partition by the index's partition key, reads the main table's partition key and primary key from that partition, and then returns to the main table to fetch the complete row. The whole process scans only a fixed number of partitions (for a point query, for example, at most two partitions).
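The two-step lookup above can be sketched with a toy in-memory model (all names, routing rules, and data are illustrative): one scan on an index partition, one return-table scan on a main-table partition.

```python
# GSI: partitioned by the indexed column, mapping index value -> primary key.
gsi = {"p0": {"alice": 101}, "p1": {"zoe": 202}}
# Main table: partitioned by primary key, holding full rows.
main = {"p0": {202: {"id": 202, "name": "zoe"}},
        "p1": {101: {"id": 101, "name": "alice"}}}

def route_gsi(name):          # toy range routing on the index key
    return "p0" if name < "m" else "p1"

def route_main(pk):           # toy routing on the main table's key
    return "p0" if pk % 2 == 0 else "p1"

def point_query(name):
    pk = gsi[route_gsi(name)][name]      # partition 1: index lookup
    return main[route_main(pk)][pk]      # partition 2: return table
```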
An index partitioned along a different dimension from the main table is called a global secondary index (Global Secondary Index, GSI, often simply a global index); correspondingly, an index partitioned along the same dimension as the main table is called a local index (Local Secondary Index, LSI).
Why is a global index necessary?
As noted above, full partition scans make the system unscalable. So if users can strictly guarantee that all SQL statements contain the partition key, is a global index unnecessary?
Indeed, in that case it is not needed, but the complexity of real-world workloads makes this a low-probability event. More common scenarios are:
● The user table needs to support login by both mobile phone number and user ID. Which should be the partition key?
● In an e-commerce system, orders need to be queried by both buyer ID and seller ID. How should the partition key of the order table be chosen?
● The existing business code was written by an outsourcing company, and large-scale SQL rewrites are unrealistic. What then?
For more scenario analysis, refer to TPC-C and transparent distribution. The conclusion: to provide a "transparent distributed" experience comparable to a stand-alone database, global indexes must be supported.
What kind of global index user experience do users want?
Indexes in stand-alone databases are very commonly used components with high user acceptance. If a global index can deliver an experience similar to a stand-alone index, it can be called a "transparent" index experience. From the user's perspective, four key features affect the experience of using an index.
Meeting these four characteristics is not easy; the read, write, and schema change processes all need to be designed accordingly. Related issues such as distributed transactions, CBO index selection, how to implement Asynchronous Online Schema Change, how to handle columns with the ON UPDATE CURRENT_TIMESTAMP attribute, and how to keep affected rows compatible with MySQL behavior must all be considered, while high performance must also be guaranteed.
The following sections introduce the technical explorations PolarDB-X made while implementing a global secondary index whose usage experience is compatible with MySQL indexes.
Global secondary index implementation
For a global index in an OLTP system, the first requirement is that index data remain strongly consistent with the main table. Solving this requires distributed transactions, logical multi-write, and Asynchronous Online Schema Change (AOSC).
Consistency of data writing
When data is written, since the data of the main table and the GSI may reside in different partitions, distributed transactions are needed to guarantee atomic commit. Concurrent writes also introduce write-write conflicts that must be handled. For a table without a global index, a DML statement can simply be routed to the partition where the data resides, with concurrency control completed by the DN. For a table with a GSI, however, an update must first read and lock the rows to be changed, then update the main table and indexes by primary key. This read-before-write approach is called logical multi-write.
Read-before-write does not sound hard to implement — just SELECT + UPDATE/DELETE — but reality is more complicated. First, the DML implementation of early DRDS relied entirely on push-down execution and lacked the corresponding logical plans. MySQL has roughly 13 DML syntax variants, each of which needs support, while push-down execution plans are still reserved for scenarios that can be pushed down. Second, many detailed MySQL behaviors are not covered in the official documentation and must be matched one by one against the code, such as type conversion, affected_rows, and implicit default values. In addition, supporting globally unique indexes requires a conflict detection step, which adds four more execution modes for the INSERT statement. The figure above shows the execution flow of logical multi-write; for details, refer to the source code interpretation.
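A minimal sketch of logical multi-write for an UPDATE (illustrative only, not the PolarDB-X implementation): read and lock the affected rows first, then maintain the main table and each index by primary key.

```python
def logical_update(main_table, indexes, predicate, assignments):
    """Read-before-write: SELECT ... FOR UPDATE, then write by primary key."""
    # 1. locate and lock the rows matching the predicate (snapshot them)
    locked = [dict(r) for r in main_table.values() if predicate(r)]
    for old in locked:
        new = {**old, **assignments}
        # 2. update the main table by primary key
        main_table[old["pk"]] = new
        # 3. maintain every index: remove the old entry, add the new one
        for col, idx in indexes.items():
            idx.pop((old[col], old["pk"]), None)
            idx[(new[col], new["pk"])] = new["pk"]
    return len(locked)  # affected rows

main = {1: {"pk": 1, "seller": "s1"}, 2: {"pk": 2, "seller": "s1"}}
idx_seller = {("s1", 1): 1, ("s1", 2): 2}
affected = logical_update(main, {"seller": idx_seller},
                          lambda r: r["seller"] == "s1", {"seller": "s2"})
```

In the real system steps 2 and 3 hit different partitions and are wrapped in one distributed transaction; here they are plain dict writes.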
Data Consistency for Index Creation
The second aspect of data consistency is consistency while an index is being created. In a distributed setting, multiple nodes may perceive metadata changes at different times. As in the figure on the left below, one node knows the index exists, so it writes to both the main table and the index table; another node does not know the index exists, so it deletes only from the main table and leaves the index table untouched, producing an orphan row in the index.
To solve this problem, PolarDB-X follows Google F1's approach and introduces multiple mutually compatible stages so that the metadata transition is smooth; for the detailed implementation, refer to this article. Since Schema Change increases the number of metadata version switches, we also optimized metadata version evolution on a single CN so that DDL does not affect read and write execution at all; for details, refer to this article.
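The F1-style staged transition can be sketched as a small state machine. The stage names follow the F1 schema-change paper in spirit; this is a simplified illustration, not the exact PolarDB-X state set.

```python
# Each stage is compatible with its neighbors, so two nodes whose metadata
# versions differ by one step never corrupt the index.
STATES = ["absent", "delete_only", "write_only", "backfill", "public"]

def next_state(state):
    i = STATES.index(state)
    return STATES[min(i + 1, len(STATES) - 1)]

def allowed_ops(state):
    """Which index operations a node at this stage may perform."""
    return {
        "absent":      set(),
        "delete_only": {"delete"},
        "write_only":  {"delete", "insert", "update"},
        "backfill":    {"delete", "insert", "update"},  # existing rows copied in
        "public":      {"delete", "insert", "update", "select"},
    }[state]
```

The key invariant: a node one stage behind never performs a write the stage ahead cannot see, which is exactly what prevents the orphan-row scenario above.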
With these techniques in place, the DDL framework can create global indexes without blocking. It is worth mentioning that MySQL has supported atomic DDL since version 8.0; PolarDB-X has its own implementation in this regard as well, described in this article.
Data Consistency for Index Scans
During writes, concurrency requires handling write-write conflicts; likewise, during reads, concurrent reads and writes require handling read-write conflicts. Modern databases generally resolve read-write conflicts through MVCC: before a query starts, a version number is obtained from the timestamp issuer, and it is used to decide whether the latest version of a data row is visible to the current transaction, so that reads satisfy the specified isolation level. PolarDB-X supports a TSO-based MVCC implementation, which ensures the index table and the main table read the same snapshot during the table return. For the MVCC implementation, refer to this article.
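A minimal TSO-style visibility check (illustrative; real MVCC also tracks uncommitted versions and active-transaction lists): a version is visible when it committed at or before the snapshot timestamp, and reading the index and main table with the same snapshot timestamp yields a consistent view.

```python
def visible(version, snapshot_ts):
    return version["commit_ts"] <= snapshot_ts

def read(versions, snapshot_ts):
    """Return the newest version visible at snapshot_ts, or None."""
    vis = [v for v in versions if visible(v, snapshot_ts)]
    return max(vis, key=lambda v: v["commit_ts"], default=None)

# version chain of one row, oldest to newest
history = [{"commit_ts": 10, "val": "a"}, {"commit_ts": 20, "val": "b"}]
```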
Index selection

The core goal of index selection is to free users from manually specifying indexes when using GSI; the solution is automatic, CBO-based index selection. The implementation concerns how the optimizer evaluates and chooses index scans, especially scans on secondary indexes (commonly named IndexScan, IndexSeek, and so on; hereafter collectively referred to as IndexScan plans). A stand-alone database replaces TableScan with IndexScan and, if the index does not cover the required columns, adds a table return operation. IndexScan optimization mainly consists of column pruning and predicate pushdown, with dedicated algorithms estimating the cost of the IndexScan and the table return.
A key issue in cost evaluation is estimating the cost of the table return. Since the GSI is itself a logical table, the table return is equivalent to joining the index table with the main table on the primary key. We therefore made an engineering optimization: the index table return is modeled as a join plus a project, so the cost evaluation of the index fits into the cost evaluation of an ordinary query plan.
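A toy cost model makes the trade-off concrete. All constants here are made up for illustration; the point is only that the table return is costed like a join, and that its per-row network cost dominates in a distributed database.

```python
SCAN_COST_PER_ROW = 1.0
NET_COST_PER_LOOKUP = 5.0   # the table return crosses the network

def cost_full_scan(table_rows):
    return table_rows * SCAN_COST_PER_ROW

def cost_index_scan(matching_rows, covering=False):
    cost = matching_rows * SCAN_COST_PER_ROW
    if not covering:
        # table return modeled as a join back to the main table on the PK
        cost += matching_rows * NET_COST_PER_LOOKUP
    return cost
```

A selective predicate (10 matches out of 1000 rows) makes the IndexScan plan win despite the expensive table return; a covering index wins by an even wider margin.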
To include plans containing IndexScan in plan enumeration, the index scan and table return operators are adapted to the existing CBO framework, as shown in the figure above. Execution plans using a GSI are generated through AccessPathRule, and the cheapest plan is selected by cost comparison in subsequent iterations; refer to this article on the CBO framework. Moreover, because a table return in a distributed database requires network IO, it is more expensive than in a stand-alone database. PolarDB-X therefore also supports executing operations such as Join/Limit before the table return, pushing them down to the DN together with the index scan, so as to reduce the amount of data returned and the network IO; refer to this article for details.
Covering index

A covering index is a special index that stores additional columns, so that more queries find all referenced columns in the index itself and avoid the table return where possible. Covering indexes are a common optimization in stand-alone databases; SQL Server, for example, has long supported them to improve query performance.
For a distributed database, the table return can also hurt horizontal scalability. In the example in the figure above, the order table is partitioned by buyer_id, so querying by seller_id requires a full partition scan. Suppose a GSI is created on seller_id. By default the index table contains only the index key, the main table's partition key, and the primary key (no content columns), so a table return is needed. As the number of a seller's orders grows, the table return touches more and more partitions, eventually degenerating back into a full partition scan, defeating the purpose of the index. To avoid this, PolarDB-X supports "covering indexes": the COVERING syntax adds specified columns to the GSI, making it easier for the GSI to cover a query.
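The degradation can be sketched as follows. The partition count and routing rule are illustrative; the point is that each return-table lookup may land on a different main-table partition, so with enough matching rows the query touches every partition anyway, while a covering index stays at one.

```python
PARTITIONS = 8

def partitions_touched(matching_pks, covering=False):
    if covering:
        return 1  # all needed columns live in the single index partition
    # +1 for the index partition itself; each PK may route to a different
    # main-table partition on the return table
    return len({pk % PARTITIONS for pk in matching_pks}) + 1

small_seller = partitions_touched(range(3))      # few orders: cheap
big_seller = partitions_touched(range(1000))     # many orders: full scan again
```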
Besides missing columns, missing historical versions can also force a table return. For example, MySQL keeps no version information in secondary indexes, only the transaction id of the last write in the header of each index page, so queries for historical versions must go back to the main table. During writes, PolarDB-X records undo-log for the GSI separately, so historical versions of the index can be read directly, no extra table return is incurred by historical-version queries, and Flashback Query can be pushed down to the GSI for execution.
Write performance optimization

Writing data must use distributed transactions and logical multi-write, which carry extra overhead, so write performance needs optimization to guarantee system throughput. Specifically, a distributed transaction relies on two-phase commit to guarantee atomicity, adding a prepare phase and a commit-point write compared with a stand-alone transaction; it also relies on TSO for commit timestamps, and TSO service throughput may become a bottleneck. For distributed-transaction optimizations, including one-phase commit optimization, single-machine multi-partition optimization, and TSO Grouping, refer to the articles on distributed transaction implementation and global timestamp service design.
Logical multi-write needs to read data to the CN first, for two reasons. First, PolarDB-X is compatible with MySQL's pessimistic transaction behavior, so write operations use current reads: for UPDATE, DELETE, and other statements whose update scope is determined by a predicate, the rows to be modified must first be queried and locked, avoiding inconsistent snapshots across branch transactions. Second, for an INSERT statement, if the target table has a unique constraint, the data must be read for unique constraint conflict detection.
A similar process exists in a stand-alone database. When MySQL executes DML, the server layer first queries InnoDB for the rows to be modified and locks them, then calls ha_innobase::write_row to write the data; MySQL's unique constraint implementation likewise requires a uniqueness check before INSERT. The difference is that the interaction between the MySQL server layer and the InnoDB layer happens inside a single machine and only involves memory and disk IO, which is cheap, whereas CN and DN in a distributed database interact over the network, which is far more expensive.
When executing DML, PolarDB-X prefers push-down execution. For scenarios that must use logical multi-write, it applies engineering optimizations to both "querying and locking the rows to be modified" and "unique constraint conflict detection":
● Read-write parallelism: the idea is simple — turn the serial "read, cache, write" flow into many small batches of parallel reads and writes. A key obstacle is that MySQL transactions are bound to connections: creating multiple read connections inside one transaction causes data visibility problems. To solve this, we introduce the concept of a transaction group, allowing multiple connections to share the same ReadView. This removes the binding between read and write connections within a transaction, so different batches of reads and writes can run in parallel.
● Push-down of unique constraint conflict detection: this mainly addresses performance in data import scenarios. To make imports resumable from a breakpoint, statements such as INSERT IGNORE are typically used, yet in practice almost none of the inserted rows conflict, so checking every row is not cost-effective. Our optimization adopts optimistic handling: skip the per-row check and compensate through the RETURNING statement. In data import scenarios this makes INSERT IGNORE perform the same as plain INSERT.
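The read-write parallelism idea can be sketched as a batched pipeline. The shared-ReadView "transaction group" is modeled here as a single snapshot object handed to every reader; names and structure are illustrative, not the PolarDB-X implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def batches(rows, size):
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def read_batch(snapshot, batch):
    # every reader shares the same snapshot => consistent visibility
    return [r for r in batch if r["ts"] <= snapshot["read_ts"]]

def pipeline(rows, snapshot, write, batch_size=2):
    """Overlap reads of later batches with writes of earlier ones."""
    written = 0
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(read_batch, snapshot, b)
                   for b in batches(rows, batch_size)]
        for f in futures:  # each write starts as soon as its read finishes
            written += write(f.result())
    return written
```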
DDL compatibility

Good DDL compatibility is a must for a "transparent" global index. Imagine having to drop every global index referencing a column before changing that column's type, then rebuilding the index afterwards — what a headache that would be. PolarDB-X is fully compatible with MySQL DDL statements: statements that change tables, columns, and partitions maintain the GSI automatically. DDL execution algorithms are divided into Instant DDL and Online DDL. Instant DDL is mainly used for adding columns; Online DDL is based on AOSC, with refined designs for each DDL statement. Below are two representative implementations: ADD COLUMN and CHANGE COLUMN.
PolarDB-X supports globally clustered secondary indexes (CSI), which always maintain the same structure as the main table, guaranteeing that no query needs a table return. Adding a column to the main table therefore also adds the column to the CSI. Normally the AOSC process keeps index data consistent while the column is being added, but a new column with the ON UPDATE CURRENT_TIMESTAMP attribute can cause trouble. For example, in the following scenario, the physical tables have gained the column, but the CN does not yet know it exists, so the DNs fill values in the main table and index table independently, producing inconsistent data. To solve this, after all CNs have perceived the metadata update, a backfill process refreshes the new column's values on the index, ensuring the index matches the main table.
Changing a column's type is the most complex DDL operation and has a relatively large impact on writes; MySQL 8.0, for example, still locks the table while changing a column type. PolarDB-X supports Online Modify Column (OMC): by "adding a column, copying data, then swapping the metadata mapping", combined with Instant Add Column, it achieves lock-free column type changes that also cover GSIs.
The figure above shows CHANGE COLUMN on a table without a GSI, divided into seven stages: first add a user-invisible column COL_B; during writes, fill COL_A and COL_B with the same value; then backfill existing rows, copying COL_A's value into COL_B; finally swap the metadata mappings of COL_A and COL_B and drop COL_A, completing the type change. A table with a GSI follows the same process, with the GSI treated identically at each step.
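The flow can be summarized as a stage list plus a toy data transformation. Stage names are paraphrased from the description above, not the exact PolarDB-X identifiers, and the "type change" here is a stand-in VARCHAR-to-INT conversion.

```python
OMC_STAGES = [
    "add hidden column COL_B",        # invisible to users
    "double-write COL_A and COL_B",   # new writes fill both columns
    "backfill COL_B from COL_A",      # copy existing rows
    "verify COL_A == COL_B",          # consistency check
    "swap metadata mapping",          # COL_B becomes the visible column
    "stop writing COL_A",
    "drop COL_A",
]

def run_omc(table):
    """Apply the backfill and swap to a toy table; a GSI would be
    walked through the same steps."""
    for row in table:
        row["COL_B"] = int(row["COL_A"])   # backfill with the new type
    for row in table:
        row["COL_A"] = row.pop("COL_B")    # swap mapping, drop the old column
    return table
```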
DDL compatibility builds on the same underlying techniques used to create a GSI (AOSC, data backfill, logical multi-write, the asynchronous DDL engine, etc.), but the implementation must respect the semantics of each DDL statement as well as MySQL's detailed behaviors. For example, a column type change backfills data from the old column to the new one via UPDATE, but MySQL's type conversion logic in ALTER TABLE differs from that in UPDATE, so we implemented special conversion logic to simulate ALTER TABLE behavior inside UPDATE. In short, DDL compatibility looks like merely supporting some syntax, but the workload behind it is substantial.
Performance

The impact of a global index on read and write performance depends heavily on the business scenario; in essence, it sacrifices some write performance in exchange for a substantial improvement in read performance. The figure below uses the Sysbench scenario as an example to show the GSI's impact on read and write throughput.
Summary

Distributed databases based on storage-compute separation and a shared-nothing architecture need global secondary indexes to eliminate full partition scans and preserve linear scalability. Stand-alone databases introduced secondary indexes long ago, and users accept them readily; a good global index experience should match the stand-alone one. That requires strong data consistency, creation through DDL statements, and automatic index selection, and the presence of an index must not block other DDL statements. PolarDB-X guarantees strong index-data consistency through distributed transactions and logical multi-write; supports Online Schema Change, so index creation can proceed in parallel with writes; supports covering indexes to solve full partition scans caused by table returns; and carefully handles DDL compatibility for tables with GSIs — providing users with a "transparent" index experience and lowering the barrier to using a distributed database.
Knowledge Base Team