All Products
Search
Document Center

Cost-based path selection

Last Updated: Jun 18, 2021

If multiple paths are available after the rule-based path selection, OceanBase Database calculates the cost of each path and selects the one with the lowest cost.

The OceanBase cost model takes into account the CPU cost, for example, the CPU utilization for processing a predicate, and the I/O costs, such as the costs for reading macroblocks and microblocks sequentially or randomly. The CPU and I/O costs are summed to get a total cost.

OceanBase Database shows the cost of each access path in the execution plan. Example:

obclient>CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT, INDEX k1(b));
Query OK, 0 rows affected (0.35 sec)

/*Cost of primary access path*/
obclient>EXPLAIN SELECT/*+INDEX(t1 PRIMARY)*/ * FROM t1 WHERE b < 10;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |200      |622 |
===================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c]), filter([t1.b < 10]),
      access([t1.b], [t1.a], [t1.c]), partitions(p0)

/*Cost of path k1*/
obclient> EXPLAIN SELECT/*+INDEX(t1 k1)*/ * FROM t1 WHERE b < 10;
+--------------------------------------------------------------------+
| Query Plan                                                                                   |
+--------------------------------------------------------------------+
| =====================================
|ID|OPERATOR  |NAME  |EST. ROWS|COST|
-------------------------------------
|0 |TABLE SCAN|t1(k1)|200      |1114|
=====================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c]), filter(nil),
      access([t1.b], [t1.a], [t1.c]), partitions(p0)

The cost of an access path includes two major parts: the cost of path scanning and the cost of table access by index primary key. The latter part is zero if table access by index primary key is not required.

In OceanBase Database, the cost of an access path depends on many factors, such as the number of scanned rows, rows returned by TABLE ACCESS BY INDEX PRIMARY KEY operations, projected columns, and predicates. Among them, the cost largely depends on the number of rows. The following example analyzes these two major parts of cost in terms of the number of rows.

  • The cost of scanning the access path

    The cost is proportional to the number of scanned rows. Technically, an execution takes more time if more rows are scanned. For an access path, the query range determines the scan range, or the number of rows to be scanned. The query range also indicates that the scanning is performed sequentially.

  • The cost of table access by index primary key

    The cost is also positively correlated to the number of rows returned by TABLE ACCESS BY INDEX PRIMARY KEY operations. The more the rows, the longer the execution. Here, the rows returned by TABLE ACCESS BY INDEX PRIMARY KEY operations are rows that satisfy all predicates executable on the index. Because the scan for TABLE ACCESS BY INDEX PRIMARY KEY operations is performed randomly, the cost is much higher compared with scanning by query range.

To evaluate the performance of an access path, you can calculate the total number of rows returned by the preceding two operations. You can execute an SQL statement to get the number of rows.

In the following example, for the query SELECT * FROM t1 WHERE c2 > 20 AND c2 < 800 AND c3 < 200, the access path by index k1 first needs execution plan display to get the predicates for extracting the query range. The predicate c2 > 20 AND c2 < 800 is used to extract the query range and the predicate c3 < 200 is taken as the one executed before the TABLE ACCESS BY INDEX PRIMARY KEY operation. So, you can use the following two queries to check the number of rows extracted for the query range and the number of rows returned by the TABLE ACCESS BY INDEX PRIMARY KEY operation.

obclient>CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT, c3 INT, c4 INT, c5 INT, INDEX k1(c2,c3));
Query OK, 0 rows affected (0.26 sec)

obclient>EXPLAIN EXTENDED_NOADDR SELECT/*+INDEX(t1 k1)*/ * FROM t1 WHERE 
      c2 > 20 AND c2 < 800 AND c3 < 200;
+--------------------------------------------------------------+
| Query Plan                                                                          |
+--------------------------------------------------------------+
| =====================================
|ID|OPERATOR  |NAME  |EST. ROWS|COST|
-------------------------------------
|0 |TABLE SCAN|t1(k1)|156      |1216|
=====================================

Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2], [t1.c3], [t1.c4], [t1.c5]), filter([t1.c3 < 200]),
      access([t1.c2], [t1.c3], [t1.c1], [t1.c4], [t1.c5]), partitions(p0),
      is_index_back=true, filter_before_indexback[true],
      range_key([t1.c2], [t1.c3], [t1.c1]), range(20,MAX,MAX ; 800,MIN,MIN),
      range_cond([t1.c2 > 20], [t1.c2 < 800])

/*The number of rows scanned with query range*/
obclient>SELECT/*+INDEX(t1 k1)*/ COUNT(*) FROM t1 WHERE c2 > 20 AND c2 < 800;
+----------+
| count(*) |
+----------+
|      779 |
+----------+
1 row in set (0.02 sec)

/*The number of rows returned by the TABLE ACCESS BY INDEX PRIMARY KEY operation*/
obclient> SELECT/*+INDEX(t1 k1)*/ COUNT(*) FROM t1 WHERE c2 > 20 AND c2 < 800
      AND c3 < 200;
+----------+
| count(*) |
+----------+
|      179 |
+----------+
1 row in set (0.01 sec)