PolarDB-X split key recommendation
Introduction: PolarDB-X 2.0 provides transparent distributed capabilities. By default, tables are hash-partitioned on the primary key, allowing users to migrate from a single-machine database to a distributed one without noticing the change. The choice of split key is a long-standing problem in both academia and industry: an important decision is whether to prioritize TP or AP, and it is difficult to serve both.
Foreword
PolarDB-X 2.0 provides transparent distributed capabilities and splits tables on the primary key by default, allowing users to migrate from a single-machine database to a distributed one without noticing the change. The choice of split key is a long-standing problem in both academia and industry; an important decision is whether to prioritize TP or AP, and it is difficult to serve both. TP-first approaches [1] aim to reduce distributed transactions and give priority to filters, so that SQL statements cross shards as little as possible. AP-first approaches [2] aim to exploit pushdown and network optimization, giving priority to equi-joins. The primary-key split of PolarDB-X is friendly to TP queries but not to AP queries. Therefore, if the default split method cannot meet some AP needs after migrating to PolarDB-X, split keys can be recommended based on the actual workload, and intelligent split-key adjustments can be made using the partition-change capabilities of PolarDB-X.
There are two main approaches to recommending AP-first split keys. One relies on the optimizer [3]: treating the optimizer as a black box gives accurate results but runs far too long, while deeply modifying the optimizer is faster but still not fast enough. The other works without the optimizer [4]: it is relatively fast, but still has to enumerate all possibilities, and its cost is hard to bound. Optimizer-based algorithms are hard to speed up because the value of the objective function cannot be predicted, so no effective pruning can be applied. To make recommendations as fast as possible, the split-key recommendation of PolarDB-X does not rely on the optimizer, uses little statistical information, and implements an efficient recommendation algorithm for the problem defined by Amazon Redshift.
Problem definition
The content of this section is adapted from the Amazon Redshift paper [4].
Split key recommendation
The problem is to choose one split column per table so that the total network cost saved is maximized. Pushing down a join requires the split keys of both tables to be exactly the two columns of the corresponding edge. In figure (b) below, if A, B, C, D, E, and F use a, b1, c, d1, c, and c as split keys respectively, then the joins whose equi-join columns match these split keys can be pushed down, and the cost savings of this split scheme is 6.
The weight w of an edge is defined as follows: if the two tables t1 and t2 have r1 and r2 rows respectively, then $w=\min(r_1+r_2,\min(r_1,r_2)\times shardCnt)\times count$. The formula means that if the join is not pushed down, the resulting network cost is either pulling both tables to the CN for a hash join or broadcasting the smaller table, multiplied by the number of occurrences of this equi-join condition.
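As a quick illustration, the weight formula can be sketched as follows (the function name is ours for illustration, not a PolarDB-X API):

```python
# Sketch of the edge weight w = min(r1 + r2, min(r1, r2) * shardCnt) * count.
# The first term models pulling both tables to the CN for a hash join;
# the second models broadcasting the smaller table to every shard.
def edge_weight(r1: int, r2: int, shard_cnt: int, count: int) -> int:
    return min(r1 + r2, min(r1, r2) * shard_cnt) * count
```

For example, with r1=1000, r2=10, shardCnt=8 and a condition that occurs 3 times, broadcasting the small table is cheaper: w = min(1010, 80) * 3 = 240.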
Build a join multigraph: each table is a vertex, and each join key pair (jk) is an edge carrying the two column names and a cost; the top edge in figure (a) below is one example. The graph can have parallel edges; for example, there are two edges between B and D (because the column names differ).
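A minimal sketch of such a multigraph (an assumed representation for illustration, not the actual PolarDB-X structure): per table pair, we keep one edge for each distinct column pair.

```python
from collections import defaultdict

def build_join_graph(join_conditions):
    """join_conditions: iterable of (t1, col1, t2, col2, w) equi-join predicates.

    Returns a multigraph: (t1, t2) -> [(col1, col2, w), ...]. Two edges between
    the same tables with different column pairs stay as parallel edges."""
    graph = defaultdict(list)
    for t1, c1, t2, c2, w in join_conditions:
        if t2 < t1:  # normalize the pair so both join directions share one key
            t1, c1, t2, c2 = t2, c2, t1, c1
        graph[(t1, t2)].append((c1, c2, w))
    return graph
```

With two B-D join conditions on different columns, `graph[('B', 'D')]` holds two parallel edges, matching the figure.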
Inapproximability
The inapproximability ratio of this problem is $2^{\log^{1-\epsilon}n}$, where n is the number of columns involved in the graph. This is a very negative result: essentially no useful theoretical approximation guarantee exists for this problem.
Redshift's algorithm is an integer linear program; if it times out, four randomized algorithms are called and the best result is taken. This approach suffers from the instability of randomized algorithms and offers no control over running time. Given these shortcomings, PolarDB-X designs an exhaustive algorithm based on branch and bound.
Exhaustive algorithm based on branch and bound
For a scheme $Shard=\{P,col\}$ enumerated up to table level, compute an upperBound on the weight of all split schemes extending it. If this upper bound is smaller than the best solution found so far, that is, if $upperBound(Shard)\ge OPT$ does not hold, the scheme can be discarded directly.
The upper bound can be computed as the sum of four parts:
1: the weight contributed by the current P.
2: the weight contributed by col.
3: the weight of the best solution for tables level+1 through V.
4: for tables level+1 through V, the weight contributed by Shard alone, that is, each table selects its best column (an interval sum).
Parts 3 and 4 may not come from the same scheme, but the value $1+2+3+4$ is certainly an upper bound.
In terms of computation, part 3 is computed iteratively by the reverse search in the exactSearch function. Parts 1, 2, and 4 can be maintained in the search function while shards are enumerated; the interval sum of part 4 is maintained with a tree array (binary indexed tree), and the maximum value for each table is recorded in an array.
Compared with naiveSearch, exactSearch appears to have a larger search space, but this is precisely the key to pruning.
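To make the branch-and-bound idea concrete, here is a minimal sketch. All names are illustrative, and the upper bound is deliberately simple (current gain plus the total weight of every edge still touching an unassigned table), whereas the real implementation maintains the four parts above incrementally:

```python
def recommend_split_keys(tables, candidates, edges):
    """tables: list of names; candidates[t]: candidate split columns of t;
    edges: list of (t1, c1, t2, c2, w) equi-join conditions with weight w."""
    best = {"opt": 0, "plan": {}}

    def gain(assign):
        # an edge is pushed down iff both tables picked exactly its columns
        return sum(w for t1, c1, t2, c2, w in edges
                   if assign.get(t1) == c1 and assign.get(t2) == c2)

    def upper_bound(assign, level):
        # optimistic: assume every edge touching an unassigned table pushes down
        unassigned = set(tables[level:])
        optimistic = sum(w for t1, c1, t2, c2, w in edges
                         if t1 in unassigned or t2 in unassigned)
        return gain(assign) + optimistic

    def dfs(level, assign):
        if level == len(tables):
            g = gain(assign)
            if g > best["opt"]:
                best["opt"], best["plan"] = g, dict(assign)
            return
        if upper_bound(assign, level) <= best["opt"]:
            return  # prune: this branch cannot beat the current optimum
        for col in candidates[tables[level]]:
            assign[tables[level]] = col
            dfs(level + 1, assign)
        del assign[tables[level]]

    dfs(0, {})
    return best["opt"], best["plan"]
```

On a toy instance with edges A.a=B.b (weight 3) and B.c=C.c (weight 5), table B must choose between b and c, and the search returns the c-based plan with total savings 5.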
The four subgraphs correspond to the four iterations of the loop at line 27 of the exactSearch function: choose the best split key for T4; for T3 and T4; for T2, T3, and T4; and for T1, T2, T3, and T4. The red nodes form the best splitting scheme for the current table set; figure 3 shows that the best split keys for T2, T3, and T4 are {d,f,h}. Blue nodes are those visited during the search, and nodes not traversed to the end are the pruned part. In figure 4, after enumerating $T1=a, T2=d$, it is found that $upperBound(T1=a, T2=d) < OPT$, so this branch is pruned.
Optimizing the exhaustive order
The exhaustive order has a great impact on the speed of brute-force search, as shown by the various heuristics for the clique enumeration problem [5]. Consider a degree list {1,9,10}: the total number of enumerations for the order {10,9,1} is $1+9\times1+10\times9\times1=100$, while for the order {1,9,10} it is $10+9\times10+1\times9\times10=190$. In addition, the order also affects the value of the upperBound function.
An intuitive method is to sort the vertices in descending order of degree; the idea can be explained with the {1,9,10} example above.
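The counting above can be reproduced with a few lines (assuming, as in exactSearch, that the search expands from the last table backwards, so the totals are suffix products):

```python
# Total enumerations for a given order of per-table candidate counts,
# summed as suffix products: d_n + d_{n-1}*d_n + ... + d_1*...*d_n.
def enumeration_count(degrees):
    total, prod = 0, 1
    for d in reversed(degrees):
        prod *= d
        total += prod
    return total
```

This reproduces the example: `enumeration_count([10, 9, 1])` gives 100, while `enumeration_count([1, 9, 10])` gives 190, so placing high-degree vertices first is cheaper.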
Note that part 4 has the greatest influence on pruning; if connected vertices are close together in the list, part 4 shrinks earlier. PolarDB-X therefore adopts an exhaustive order based on global minimum cuts: the graph is split into two blocks so that the edge weight between them is minimized, the block with more vertices is placed first, and the two blocks are then processed recursively. Each split is a global minimum cut problem, which requires calling the Stoer-Wagner algorithm n times, for a total complexity of $O(n^2 m \log n)$. The idea of minimum cut is also used in [6], but for a different problem.
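The recursive min-cut ordering can be sketched as follows. This is an illustrative toy: it brute-forces every bipartition (feasible only for tiny graphs), whereas the real system would use the Stoer-Wagner algorithm for each global minimum cut.

```python
from itertools import combinations

def order_by_min_cut(nodes, weight):
    """nodes: list of table names; weight[(u, v)]: total edge weight between u and v.

    Recursively split into two blocks with minimum crossing weight,
    placing the larger block first, so tightly connected tables end up adjacent."""
    if len(nodes) <= 1:
        return list(nodes)
    node_set = set(nodes)
    # only edges with both endpoints inside the current block matter
    local = [(u, v, w) for (u, v), w in weight.items()
             if u in node_set and v in node_set]
    best_cut, best_side = None, None
    # brute-force every bipartition; keep the one with minimum crossing weight
    for k in range(1, len(nodes)):
        for side in combinations(nodes, k):
            s = set(side)
            cut = sum(w for u, v, w in local if (u in s) != (v in s))
            if best_cut is None or cut < best_cut:
                best_cut, best_side = cut, s
    a = [n for n in nodes if n in best_side]
    b = [n for n in nodes if n not in best_side]
    if len(a) < len(b):  # larger block goes first
        a, b = b, a
    return order_by_min_cut(a, weight) + order_by_min_cut(b, weight)
```

On a graph where A-B and C-D are heavy (weight 10) and B-C is light (weight 1), the first cut severs B-C, so A stays adjacent to B and C stays adjacent to D in the resulting order.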
OPT optimization
Computing a relatively large OPT early strengthens pruning.
At the beginning of each iteration of exactSearch, all splittable columns of the level table are exhaustively combined with the optimal solution of the subsequent tables, initializing a larger OPT.
In the search function, shard is combined with Lopt[level+1] to obtain a larger OPT.
These two optimizations yield the splitting scheme directly; this is where the expected splitting scheme is actually computed, rather than by enumerating all the way to the last table.