In most cases, index optimization requires O&M personnel or developers to have in-depth understanding on how queries are executed and optimized in database engines. To optimize user experience and reduce the threshold of optimizing indexes, DRDS offers the index recommendation feature that uses a cost-based optimizer. This feature can analyze and recommend indexes for query statements. This feature helps you reduce the time that is consumed to run queries and improve database performance.

Notes

The index recommendation feature analyzes and recommends indexes for only the SQL query statements that you specify. Before you create an index based on the recommendation, evaluate the impact of this index on the other queries.

Environment description

TPC Benchmark-H (TPC-H) is a benchmark that is widely used in the industry. It is formulated and released by the Transaction Processing Performance Council (TPC) to evaluate the database capabilities of processing analytical queries. The TPC-H benchmark involves 8 tables and 22 complex SQL queries (Q1 to Q22). The following figure shows the information that is returned for the Q17 statement. In the TPC-H benchmark, Q17 represents Small-Quantity-Order Revenue Query. The returned information shows that 28.76 seconds was consumed to execute the query statement. In this topic, the intelligent index recommendation feature is used to accelerate the query.

Note For more information about the query statements in TCP-H, see TCP-H documentation.

This implementation of TPC-H is derived from the TPC-H Benchmark and is not comparable to published TPC-H Benchmark results, as this implementation does not comply with all the requirements of the TPC-H Benchmark.

Figure 1. Before intelligent index optimization
Before intelligent index optimization

Step 1: Query the information about an intelligent index recommendation

To query the information about an intelligent index recommendation for a query statement, you need only to add the EXPLAIN ADVISOR statement at the start of the query statement. The following code block shows an example:

EXPLAIN ADVISOR
SELECT sum(l_extendedprice) / 7.0 AS avg_yearly
FROM lineitem,
     part
WHERE p_partkey = l_partkey
  AND p_brand = 'Brand#23'
  AND p_container = 'MED BOX'
  AND l_quantity <
    (SELECT 0.2 * avg(`l_quantity`)
     FROM lineitem
     WHERE l_partkey = p_partkey);

After you execute the preceding statement,DRDS returns the information about the index recommendation, including the statement used to create the recommended index and the costs before and after the index is created. The following code block shows the returned information and the related comments:

Note
  • In this example, the number of disk I/O operations is reduced by 3024.7%. This indicates that the recommended index can bring huge benefits.
  • When DRDS cannot recommend an index, it returns a message that recommends you to execute the Analyze Table statement during off-peak hours of your business. The statement refreshes the statistics of the table for which you want to create an index. This operation consumes a large number of I/O resources. After the statistics are refreshed, implement the index recommendation feature again to obtain a more accurate index.
IMPROVE_VALUE: 2465.3%        # The estimated percent value by which the total cost is reduced.
  IMPROVE_CPU: 59377.4%       # The estimated percent value by which the CPU utilization is reduced.
  IMPROVE_MEM: 0.4%           # The estimated percent value by which the consumed memory is reduced.
   IMPROVE_IO: 3024.7%        # The estimated percent value by which the number of disk I/O operations is reduced.
  IMPROVE_NET: 2011.1%        # The estimated percent value by which the amount of data that is transferred over the network is reduced.
 BEFORE_VALUE: 4.711359845E8  # The total cost before the index is created.
   BEFORE_CPU: 1.19405577E7   # The estimated CPU utilization before the index is created.
   BEFORE_MEM: 426811.2       # The estimated memory that is consumed before the index is created.
    BEFORE_IO: 44339          # The estimated number of disk I/O operations before the index is created.
   BEFORE_NET: 47.5           # The estimated amount of data that is transferred over the network before the index is created.
  AFTER_VALUE: 1.83655008E7   # The total cost after the index is created.
    AFTER_CPU: 20075.8        # The estimated CPU utilization after the index is created.
    AFTER_MEM: 425016         # The estimated memory that is consumed after the index is created.
     AFTER_IO: 1419           # The estimated number of disk I/O operations after the index is created.
    AFTER_NET: 2.2            # The estimated amount of data that is transferred over the network after the index is created.
 ADVISE_INDEX: ALTER TABLE `lineitem` ADD  INDEX `__advise_index_lineiteml_partkey`(`l_partkey`);
/* ADVISE_INDEX indicates the statement that is used to create the recommended index. */
     NEW_PLAN:                # The estimated execution plan after the index is created.
Project(avg_yearly="$f0 / ? 0")
  HashAgg($f0="SUM(l_extendedprice)")
    Filter(condition="l_quantity < $16 * f17w0$o0")
      SortWindow(p_partkey="p_partkey", l_partkey="l_partkey", l_quantity="l_quantity", l_extendedprice="l_extendedprice", $16="$16", f5w0$o0="window#0AVG($2)", Reference Windows="window#0=window(partition {1} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED PRECEDING aggs [AVG($2)])")
        MemSort(sort="l_partkey ASC")
          BKAJoin(condition="l_partkey = p_partkey", type="inner")
            Gather(concurrent=true)
              LogicalView(tables="[0000,0001].part", shardCount=2, sql="SELECT `p_partkey` FROM `part` AS `part` WHERE ((`p_brand` = ?) AND (`p_container` = ?))")
            Gather(concurrent=true)
              LogicalView(tables="[0000,0001].lineitem", shardCount=2, sql="SELECT `l_partkey`, `l_quantity`, `l_extendedprice`, ? AS `$16` FROM `lineitem` AS `lineitem` WHERE (`l_partkey` IN (...))")

         INFO: LOCAL_INDEX    # The other information.

Step 2: Create the index based on the recommendation

  1. Evaluate the benefits of creating the index. Then, create the index by executing the returned SQL statement that is indicated by ADVISE_INDEX.
    ALTER TABLE `lineitem` ADD  INDEX `__advise_index_lineiteml_partkey`(`l_partkey`);
  2. Execute the Q17 statement again. In the TPC-H benchmark, Q17 represents Small-Quantity-Order Revenue Query. The time that is consumed to execute the statement is reduced to 1.41 seconds. The query efficiency is improved in a significant way.
    Figure 2. After intelligent index optimization
    After intelligent index optimization