PolarDBX split key recommendation
Preface
PolarDBX 2.0 provides transparent distribution. By default, tables are split by primary key, so users can migrate from a standalone database to a distributed one without noticing the difference. The choice of split key has long been studied in academia and industry. One important decision is whether to prioritize TP or AP; it is hard to serve both. TP-first approaches [1] aim to reduce distributed transactions and prefer filter columns, so that SQL crosses shards as little as possible. AP-first approaches [2] aim to exploit pushdown and reduce network traffic, prioritizing equi-joins. PolarDBX's primary key splitting is friendly to TP queries but not to AP queries. Therefore, if the default splitting turns out not to meet the AP part of the workload after migrating to PolarDBX, split keys can be recommended from the actual workload, and adjusted intelligently using PolarDBX's repartitioning capability.
There are two main approaches to recommending AP-first split keys. Using the optimizer [3] gives more accurate results but runs far too long; whether the optimizer is treated as a black box or deeply modified, the speed remains unsatisfactory. Skipping the optimizer [4] is comparatively fast, but it still enumerates all possibilities and its cost estimates are less accurate. Optimizer-based algorithms are hard to speed up because the value of the objective function cannot be predicted, so no effective pruning is possible. To produce recommendations as quickly as possible, PolarDBX's split key recommendation does not use the optimizer and relies on only a small amount of statistics. An efficient recommendation algorithm is designed on top of the problem defined by Amazon Redshift [4].
Problem definition
This section follows the problem definition in the Amazon Redshift paper [4].
Split key recommendation
The problem is defined as choosing one split column per table so as to maximize the network cost that can be saved. A join can be pushed down exactly when the split keys of its two tables are the two columns of that join edge. In figure (b) below, tables A, B, C, D, E, and F use a, b1, c, d1, c, and c as split keys respectively; the joins whose columns match these keys can then be pushed down, and the cost saved by this splitting is 6.
Join MultiGraph
Given n queries, the i-th query is treated as a collection of m binary joins $\{j_{i1}, j_{i2}, \ldots, j_{im}\}$. Each $j_k$ is a five-tuple $j_k = \langle t1_k, a1_k, t2_k, a2_k, w_k \rangle$, representing an equi-join between column $a1_k$ of table $t1_k$ and column $a2_k$ of table $t2_k$; $w_k$ is the network cost incurred if the join is not pushed down. Tuples whose first four components are identical, $\langle t1, a1, t2, a2, w' \rangle$ and $\langle t1, a1, t2, a2, w'' \rangle$, are merged into $\langle t1, a1, t2, a2, w' + w'' \rangle$.
The weight w in $\langle t1, a1, t2, a2, w \rangle$ is defined as follows: let r1 and r2 be the row counts of tables t1 and t2. Then $w = \min(r1 + r2, \min(r1, r2) \times shardCnt) \times count$. The formula means that if the join is not pushed down, the resulting network cost is either pulling both tables to the CN for a hash join or broadcasting the smaller table to every shard, multiplied by the number of occurrences of this equi-join condition.
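The formula above can be sketched directly; this is a minimal illustration of the cost model as stated in the text, with illustrative inputs (the names `r1`, `r2`, `shard_cnt`, `count` follow the prose, not PolarDBX source code):

```python
def edge_weight(r1, r2, shard_cnt, count):
    """Network cost of NOT pushing down one equi-join condition.

    Either pull both tables to the CN for a hash join (r1 + r2 rows
    moved), or broadcast the smaller table to every shard
    (min(r1, r2) * shard_cnt rows moved); the cheaper option is taken,
    multiplied by how often the condition appears in the workload.
    """
    return min(r1 + r2, min(r1, r2) * shard_cnt) * count

# Example: a 1,000,000-row table joined with a 500-row table on 8 shards,
# with the condition appearing 3 times in the workload.
w = edge_weight(1_000_000, 500, 8, 3)
print(w)  # broadcast wins: min(1000500, 500 * 8) * 3 = 12000
```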
To build the Join Multi Graph, each table is a vertex and each $j_k$ is an edge carrying the two column names and the cost. This graph can have parallel edges; for example, there are two edges between B and D in figure (a) below, because the column names differ.
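A minimal sketch of this construction, merging five-tuples that agree on the first four components while keeping parallel edges with different column pairs separate (table and column names are illustrative only):

```python
from collections import defaultdict

def build_join_multigraph(joins):
    """joins: iterable of five-tuples (t1, a1, t2, a2, w).

    Tuples sharing the first four components have their weights summed;
    edges between the same tables on different columns remain distinct,
    which is what makes this a *multi* graph.
    """
    edges = defaultdict(int)
    for t1, a1, t2, a2, w in joins:
        # Normalize direction so (A, x, B, y) and (B, y, A, x) merge too.
        key = (t1, a1, t2, a2) if (t1, a1) <= (t2, a2) else (t2, a2, t1, a1)
        edges[key] += w
    return dict(edges)

g = build_join_multigraph([
    ("B", "b1", "D", "d1", 2),
    ("D", "d1", "B", "b1", 3),   # same edge, opposite direction: merged
    ("B", "b2", "D", "d2", 4),   # different columns: a parallel edge
])
print(g)  # {('B', 'b1', 'D', 'd1'): 5, ('B', 'b2', 'D', 'd2'): 4}
```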
Inapproximability
This problem cannot be approximated within $2^{\log^{1-\epsilon} n}$, where n is the number of columns involved in the graph. This is a very negative result: in theory the problem admits essentially no non-trivial approximation ratio.
The Redshift algorithm formulates an integer linear program; on timeout it falls back to four randomized algorithms and takes the best result. This algorithm has some problems, such as the instability of the randomized algorithms and the lack of control once a timeout occurs. Given these shortcomings, PolarDBX designed an exhaustive algorithm based on branch and bound.
Algorithm design
Exhaustive algorithm based on branch and bound
For a scheme $Shard = \{P, col\}$ that has been enumerated down to table level, compute upperBound, an upper bound on the weight of every splitting scheme that extends it. If the upper bound is less than the best solution found so far, i.e. $upperBound(Shard) \ge OPT$ does not hold, the scheme can be discarded immediately.
The upper bound can be calculated in four parts:
1: The weight already realized by the current prefix P.
2: The weight contributed by col.
3: The weight of the best solution over tables level+1 through $|V|$.
4: For tables level+1 through $|V|$, only the best weight obtainable from edges back to Shard, i.e. the best column is chosen for each table independently (an interval sum).
The schemes in 3 and 4 need not be the same scheme, but $1+2+3+4$ is certainly an upper bound.
Computationally, 3 falls out directly from the reverse iteration of the exactSearch function. 1, 2, and 4 can be maintained incrementally in the search function while enumerating Shard: a Fenwick tree (binary indexed tree) maintains the interval sum in 4, and the per-table maxima are recorded in an array.
It seems that the search space of exactSearch is larger than that of naiveSearch, but this is the key to pruning.
The following figure is a pruning example. The candidate split keys are $\{a, b\}$ for T1, $\{c, d\}$ for T2, $\{e, f\}$ for T3, and $\{g, h\}$ for T4. The four subgraphs correspond to the four iterations of the loop in exactSearch (line 27): the best split key for T4; for T3, T4; for T2, T3, T4; and for T1, T2, T3, T4. Red nodes mark the best splitting scheme for the current table set; subgraph 3 shows that the best split keys for T2, T3, T4 are {d, f, h}. Blue marks the nodes visited during the search, and branches not traversed to the end are the pruned parts. In subgraph 4, once $T1=a, T2=d$ has been enumerated and $upperBound(T1=a, T2=d)$ falls below the current OPT, the whole subtree beneath it is pruned.
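The pruning idea can be sketched in a few dozen lines. This is a simplified, self-contained illustration, not PolarDBX's implementation: the bound used here (realized weight of the fixed prefix plus the total weight of every edge touching an undecided table) is cruder than the four-part bound above, but it is still a valid upper bound, so discarding a branch when it cannot beat the current optimum is safe. All names are illustrative.

```python
def search(tables, candidates, edges):
    """tables: ordered list of table names.
    candidates: {table: [candidate split columns]}.
    edges: {((t1, c1), (t2, c2)): w} -- the join is pushed down iff
    both tables pick exactly these columns as their split keys."""
    best = {"weight": -1, "plan": None}

    def weight_of(assign):
        # Saved cost of edges fully realized by the current assignment.
        return sum(w for ((t1, c1), (t2, c2)), w in edges.items()
                   if assign.get(t1) == c1 and assign.get(t2) == c2)

    def upper_bound(assign, level):
        # Optimistically assume every edge touching an undecided table
        # can still be realized.
        undecided = set(tables[level:])
        optimistic = sum(w for ((t1, _), (t2, _)), w in edges.items()
                         if t1 in undecided or t2 in undecided)
        return weight_of(assign) + optimistic

    def dfs(assign, level):
        if level == len(tables):
            w = weight_of(assign)
            if w > best["weight"]:
                best["weight"], best["plan"] = w, dict(assign)
            return
        if upper_bound(assign, level) <= best["weight"]:
            return  # prune: this branch cannot beat the current optimum
        t = tables[level]
        for col in candidates[t]:
            assign[t] = col
            dfs(assign, level + 1)
            del assign[t]

    dfs({}, 0)
    return best["plan"], best["weight"]

plan, w = search(
    ["T1", "T2"],
    {"T1": ["a", "b"], "T2": ["c", "d"]},
    {(("T1", "a"), ("T2", "c")): 5, (("T1", "b"), ("T2", "d")): 3},
)
print(plan, w)  # {'T1': 'a', 'T2': 'c'} 5
```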
Pruning power is determined by how close the upperBound function is to OPT. It is strengthened as follows.
Optimizing the enumeration order
Enumeration order has a great impact on the speed of brute-force search; compare the various heuristics for the clique enumeration problem [5]. Consider three tables with {1, 9, 10} candidate split keys. Enumerated from the back, the order {10, 9, 1} visits $1 + 9 \times 1 + 10 \times 9 \times 1 = 100$ partial assignments in total, while {1, 9, 10} visits $10 + 9 \times 10 + 1 \times 9 \times 10 = 190$. The order also affects the value of the upperBound function.
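The counting example above can be reproduced mechanically. Interpreting the numbers as per-table candidate counts and processing tables back-to-front (as exactSearch iterates in reverse), the visit count is the sum over suffixes of the product of candidate counts in that suffix:

```python
def enumeration_count(sizes):
    """Total partial assignments visited when tables are processed
    back-to-front: sum over every suffix of the product of candidate
    counts in that suffix."""
    total = 0
    for i in range(len(sizes) - 1, -1, -1):
        prod = 1
        for s in sizes[i:]:
            prod *= s
        total += prod
    return total

print(enumeration_count([10, 9, 1]))  # 1 + 9*1 + 10*9*1 = 100
print(enumeration_count([1, 9, 10]))  # 10 + 9*10 + 1*9*10 = 190
```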
An intuitive heuristic is to sort points in descending order of degree; the {1, 9, 10} example above illustrates the idea.
Note that part 4 of the bound matters most for pruning: if connected points sit close together in the list, part 4 shrinks earlier. PolarDBX therefore adopts an enumeration order based on global minimum cut. Viewed globally, the graph is divided into two blocks so that the edge weight between them is minimized; the block with more points is placed first, and the two blocks are then processed recursively. Each division is a global minimum cut problem, and calling the Stoer-Wagner algorithm n times gives a total complexity of $O(n^2 m \log n)$. The idea of minimum cut is also used in [6], though for a different problem.
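The ordering idea can be illustrated on a toy graph. For clarity this sketch brute-forces the global minimum cut over all bipartitions, which only works for tiny graphs; PolarDBX uses the Stoer-Wagner algorithm instead, which scales. All names here are illustrative.

```python
from itertools import combinations

def min_cut_order(nodes, edges):
    """nodes: list of table names; edges: {(u, v): w}.
    Split the graph into two blocks with minimum crossing weight, put
    the larger block first, and recurse into each block."""
    if len(nodes) <= 1:
        return list(nodes)
    best_cut, best_split = None, None
    for r in range(1, len(nodes)):
        for left in combinations(nodes, r):
            left_set = set(left)
            cut = sum(w for (u, v), w in edges.items()
                      if (u in left_set) != (v in left_set))
            if best_cut is None or cut < best_cut:
                best_cut, best_split = cut, left_set
    a = [n for n in nodes if n in best_split]
    b = [n for n in nodes if n not in best_split]
    if len(a) < len(b):
        a, b = b, a  # larger block goes first

    def sub(ns):
        # Induced subgraph: keep only edges internal to the block.
        s = set(ns)
        return {e: w for e, w in edges.items() if e[0] in s and e[1] in s}

    return min_cut_order(a, sub(a)) + min_cut_order(b, sub(b))

# Two tightly joined pairs linked by one light edge: the light edge is
# the minimum cut, so each pair ends up adjacent in the ordering.
order = min_cut_order(
    ["A", "B", "C", "D"],
    {("A", "B"): 10, ("C", "D"): 10, ("B", "C"): 1},
)
print(order)  # ['A', 'B', 'C', 'D']
```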
OPT optimization
Computing a relatively large OPT early strengthens pruning.
At the start of each iteration of exactSearch, every splittable column of the level-th table is combined with the optimal scheme of the subsequent tables to initialize a large OPT.
In the search function, combining Shard with Lopt[level+1] yields a larger OPT.
These two optimizations directly produce complete splitting schemes; in fact they are where the final scheme is expected to be found, rather than at the last table.
Adaptive approximation
The pruning strategies above cannot change the NP-hard nature of the problem, so approximation is used to accelerate pruning: the pruning condition is relaxed to $upperBound(Shard) \ge OPT \times ratio$. The advantage of this approximation is that ratio bounds the approximation ratio, and ratio is increased gradually as the search runs. Note that when saving a result in exactSearch, the weight must be stored as weight × ratio; otherwise the upperBound computation becomes inconsistent and the approximation guarantee is lost.
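A small sketch of the relaxed condition. How ratio grows over time is an assumption here (a linear ramp toward an assumed cap); the text only says it increases gradually with search time:

```python
def should_prune(upper_bound, opt, ratio):
    """Relaxed pruning test: with ratio > 1 the search may discard
    schemes within a factor of ratio of the optimum, trading accuracy
    for speed. ratio == 1.0 recovers exact pruning."""
    return upper_bound < opt * ratio

def current_ratio(elapsed_fraction, max_ratio=2.0):
    """Assumed schedule: ratio ramps linearly from 1.0 to max_ratio as
    the search time budget is consumed (max_ratio is illustrative)."""
    return 1.0 + (max_ratio - 1.0) * min(1.0, max(0.0, elapsed_fraction))

print(should_prune(10, 8, 1.0))  # False: 10 >= 8, keep the branch
print(should_prune(10, 8, 1.5))  # True: 10 < 12, prune approximately
```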
Other
The algorithm may produce a scheme in which some tables get no pushdown benefit. Such tables, together with tables that never participate in joins, use their filter column as the split key; failing that, they fall back to primary key splitting.
In addition, tables with fewer rows than the threshold value will be directly recommended as broadcast tables and will not participate in the process of split key recommendation.
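The fallback rules above can be summarized as a small decision sketch. The threshold value and field names are hypothetical placeholders, not PolarDBX's actual configuration:

```python
BROADCAST_THRESHOLD = 10_000  # illustrative row-count threshold

def choose_key(table):
    """table: dict with 'rows', 'recommended_join_key' (or None),
    'filter_column' (or None), and 'primary_key'.
    Returns a (strategy, column) pair."""
    if table["rows"] < BROADCAST_THRESHOLD:
        return ("broadcast", None)  # small table: replicate everywhere
    if table["recommended_join_key"]:
        return ("shard", table["recommended_join_key"])
    if table["filter_column"]:  # no pushdown benefit, or never joined
        return ("shard", table["filter_column"])
    return ("shard", table["primary_key"])  # final fallback
```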
Because the whole algorithm does not use the optimizer, the generated plan is not guaranteed to be better, so a final what-if step asks the optimizer to compare SQL costs before and after the splitting change. If the evaluation shows no cost improvement, the system abandons the recommendation.
Summary
From the above experiments, we can see that even for TP queries, a wrong split key introduces large numbers of distributed transactions and hurts performance badly. PolarDBX's "manual plus automatic" approach places certain demands on split key selection; split key recommendation can choose split keys automatically from the workload, reducing the manual burden.