After you find a slow Structured Query Language (SQL) query that needs to be tuned, you can execute the EXPLAIN statement to view the execution plan. Then, you can optimize the SQL statement by pushing down more computations to the storage layer MySQL, appropriately adding indexes, and optimizing the execution plan.

Push down more computations

DRDS pushes down more computations to the storage layer MySQL. Computation pushdown reduces the amount of transferred data and the overheads at the network layer and the DRDS layer and improves the execution efficiency of SQL statements.DRDS can push down almost all the operators that include the following items:

  • Filter conditions, such as the conditions in WHERE or HAVING.
  • Aggregation operators, such as COUNT and GROUP BY. Aggregation operators are divided into two phases for aggregation.
  • Sorting operators, such as ORDER BY.
  • JOIN and subqueries: The JOIN Keys and the sharding modes on both sides must be the same or one side is a broadcast table.

The following example shows how to push down more computations to MySQL to accelerate the execution.

> EXPLAIN select * from customer, nation where c_nationkey = n_nationkey and n_regionkey = 3;

Project(c_custkey="c_custkey", c_name="c_name", c_address="c_address", c_nationkey="c_nationkey", c_phone="c_phone", c_acctbal="c_acctbal", c_mktsegment="c_mktsegment", c_comment="c_comment", n_nationkey="n_nationkey", n_name="n_name", n_regionkey="n_regionkey", n_comment="n_comment")
  BKAJoin(condition="c_nationkey = n_nationkey", type="inner")
    Gather(concurrent=true)
      LogicalView(tables="nation", shardCount=2, sql="SELECT * FROM `nation` AS `nation` WHERE (`n_regionkey` = ?)")
    Gather(concurrent=true)
      LogicalView(tables="customer_[0-7]", shardCount=8, sql="SELECT * FROM `customer` AS `customer` WHERE (`c_nationkey` IN ('?'))")

BKAJOIN appears in the execution plan. BKAJOIN assembles an IN query to retrieve the rows associated with the right table and performs the JOIN operation each time BKAJOIN retrieves a batch of data from the left table. The left table contains a large amount of data. As a result, data must be retrieved for many times to complete the query. This slows down the execution.

The JOIN operation cannot be pushed down because nations are grouped by the n_nationkey primary key, but the JOIN Key for this query is c_custkey. This difference causes the pushdown failure.

The nation table contains a small amount of data and few modifications are made on the table. Therefore, you can rebuild the table as the following broadcast table:

--- After the modification ---
CREATE TABLE `nation` (
  `n_nationkey` int(11) NOT NULL,
  `n_name` varchar(25) NOT NULL,
  `n_regionkey` int(11) NOT NULL,
  `n_comment` varchar(152) DEFAULT NULL,
  PRIMARY KEY (`n_nationkey`)
) BROADCAST; --- Declare the table as a broadcast table.

After the modification, JOIN does not appear in the execution plan and almost all the computations are pushed down to the storage layer MySQL and then executed in LogicalView. The upper layer only gathers and returns the results to users by using the Gather operator. This significantly improves the execution performance.

> EXPLAIN select * from customer, nation where c_nationkey = n_nationkey and n_regionkey = 3;

Gather(concurrent=true)
  LogicalView(tables="customer_[0-7],nation", shardCount=8, sql="SELECT * FROM `customer` AS `customer` INNER JOIN `nation` AS `nation` ON ((`nation`.`n_regionkey` = ?) AND (`customer`.`c_nationkey` = `nation`.`n_nationkey`))")

For more information about pushdown principles and optimization, see SQL rewrite and pushdown.

Add indexes

If (physical) slow SQL statements appear in the SQL statements that are pushed down, you can add indexes to table shardings. Details are not described here.

DRDS has supported Global secondary indexes since the 5.4.1 version. You can add a global secondary index (GSI) to enable a logical table to be sharded in multiple dimensions.

The following example shows how to use a GSI to push down more operators. A slow SQL statement is used in this example.

> EXPLAIN select o_orderkey, c_custkey, c_name from orders, customer
          where o_custkey = c_custkey and o_orderdate = '2019-11-11' and o_totalprice > 100;

Project(o_orderkey="o_orderkey", c_custkey="c_custkey", c_name="c_name")
  HashJoin(condition="o_custkey = c_custkey", type="inner")
    Gather(concurrent=true)
      LogicalView(tables="customer_[0-7]", shardCount=8, sql="SELECT `c_custkey`, `c_name` FROM `customer` AS `customer`")
    Gather(concurrent=true)
      LogicalView(tables="orders_[0-7]", shardCount=8, sql="SELECT `o_orderkey`, `o_custkey` FROM `orders` AS `orders` WHERE ((`o_orderdate` = ?) AND (`o_totalprice` > ?))")

In the execution plan, orders is sharded by o_orderkey and customer is sharded by c_custkey. The JOIN operator cannot be pushed down due to different sharding dimensions.

A large number of orders, each with a total price of more than CNY 100, are generated on November 11, 2019 and cross-shard JOIN operations consume a long time. Therefore, you must create a GSI on the orders table so that the JOIN operator can be pushed down.

The o_orderkey, o_custkey, o_orderdate, and o_totalprice columns in the orders table are used for the query. In the four columns, o_orderkey is the shard key of the primary table and o_custkey is the shard key of the index table. o_orderdate and o_totalprice are used as covering columns in the index to avoid table access.

> create global index i_o_custkey on orders(`o_custkey`) covering(`o_orderdate`, `o_totalprice`)
        DBPARTITION BY HASH(`o_custkey`) TBPARTITION BY HASH(`o_custkey`) TBPARTITIONS 4;

After the GSI is added and force index(i_o_custkey) is executed to forcibly use the GSI, the cross-shard JOIN operation becomes a local JOIN operation in IndexScan on MySQL. In addition, covering columns are used to avoid table access. This improves the query performance.

> EXPLAIN select o_orderkey, c_custkey, c_name from orders force index(i_o_custkey), customer
          where o_custkey = c_custkey and o_orderdate = '2019-11-11' and o_totalprice > 100;

Gather(concurrent=true)
  IndexScan(tables="i_o_custkey_[0-7],customer_[0-7]", shardCount=8, sql="SELECT `i_o_custkey`.`o_orderkey`, `customer`.`c_custkey`, `customer`.`c_name` FROM `i_o_custkey` AS `i_o_custkey` INNER JOIN `customer` AS `customer` ON (((`i_o_custkey`.`o_orderdate` = ?) AND (`i_o_custkey`.`o_custkey` = `customer`.`c_custkey`)) AND (`i_o_custkey`.`o_totalprice` > ?))")

For more information about how to use GSIs, see Use global secondary indexes.

Optimize execution plans

Note The following content applies to DRDS 5.3.12 or later.

In most cases,the query optimizer of DRDS can automatically generate the best execution plan. However, in a few cases, the generated execution plan may not be as expected due to missing or invalid statistics. In this case, you can use Hints to intervene in the behavior of the optimizer to enable the optimizer to produce a better execution plan.

The following example shows how to optimize the execution plan:

In the following query,the query optimizer of DRDS compares the costs of both JOIN sides.

> EXPLAIN select o_orderkey, c_custkey, c_name from orders, customer
          where o_custkey = c_custkey and o_orderdate = '2019-11-15' and o_totalprice < 10;

Project(o_orderkey="o_orderkey", c_custkey="c_custkey", c_name="c_name")
  HashJoin(condition="o_custkey = c_custkey", type="inner")
    Gather(concurrent=true)
      LogicalView(tables="customer_[0-7]", shardCount=8, sql="SELECT `c_custkey`, `c_name` FROM `customer` AS `customer`")
    Gather(concurrent=true)
      LogicalView(tables="orders_[0-7]", shardCount=8, sql="SELECT `o_orderkey`, `o_custkey` FROM `orders` AS `orders` WHERE ((`o_orderdate` = ?) AND (`o_totalprice` < ?))")

However, on November 15, 2019, each of only a few orders has a total price of less than CNY 10. In this case, BKAJOIN is a better choice than Hash JOIN. For more information about BKAJOIN and Hash JOIN, see Optimize and execute JOINs and subqueries.

Use the /*+TDDL:BKA_JOIN(orders, customer)*/ Hint to force the optimizer to use BKAJOIN (LookupJOIN):

> EXPLAIN /*+TDDL:BKA_JOIN(orders, customer)*/ select o_orderkey, c_custkey, c_name from orders, customer
          where o_custkey = c_custkey and o_orderdate = '2019-11-15' and o_totalprice < 10;

Project(o_orderkey="o_orderkey", c_custkey="c_custkey", c_name="c_name")
  BKAJoin(condition="o_custkey = c_custkey", type="inner")
    Gather(concurrent=true)
      LogicalView(tables="orders_[0-7]", shardCount=8, sql="SELECT `o_orderkey`, `o_custkey` FROM `orders` AS `orders` WHERE ((`o_orderdate` = ?) AND (`o_totalprice` < ?))")
    Gather(concurrent=true)
      LogicalView(tables="customer_[0-7]", shardCount=8, sql="SELECT `c_custkey`, `c_name` FROM `customer` AS `customer` WHERE (`c_custkey` IN ('?'))")

You can perform a query that contains the following hint:

/*+TDDL:BKA_JOIN(orders, customer)*/ select o_orderkey, c_custkey, c_name from orders, customer where o_custkey = c_custkey and o_orderdate = '2019-11-15' and o_totalprice < 10;

The preceding operation speeds up the SQL query. To make Hints work, you can add Hints to the SQL statement in the application. Alternatively, the more convenient method is to use the Plan Management feature to fix the execution plan of the SQL statement. The following code provides the specific operations.

BASELINE FIX SQL /*+TDDL:BKA_JOIN(orders, customer)*/ select o_orderkey, c_custkey, c_name from orders, customer where o_custkey = c_custkey and o_orderdate = '2019-11-15';

In this way,DRDS uses the preceding fixed execution plan for this SQL statement for which the parameters can be different.

For more information about Plan Management, see Manage execution plans.