All Products
Search
Document Center

Fast parameterization

Last Updated: Jun 18, 2021

Parameterization refers to a process of converting constants in an SQL query into variables.

An SQL statement may use different parameters in each execution. These parameters are parameterized to generate a parameter-irrelevant SQL string. This SQL string is used as a plan caching key to retrieve an execution plan from the plan cache. This design ensures that SQL statements with different parameters can share the same plan.

Typically, the syntax tree in a conventional database is parameterized to generate keys for retrieving plans from the plan cache. Different from these databases, OceanBase Database uses a lexical parameterization process to directly parameterize text strings as the keys of the plan cache. That is why this process is called fast parameterization.

OceanBase Database also supports Adaptive Cursor Sharing to enable the selection of plans with different parameters.

The following figure shows the process of retrieving an execution plan based on fast parameterization:

1

Example:

obclient>SELECT * FROM T1 WHERE c1 = 5 AND c2 ='oceanbase';

The following syntax shows the parameterization result of the SQL query in the preceding example. Constant 5 and oceanbase are parameterized into variables @1 and @2:

obclient>SELECT * FROM T1 WHERE c1 = @1 AND c2 = @2;

However, not all constants can be parameterized. Constants after ORDER BY, for example, cannot be parameterized as they indicate sorting by a specific column in the projected columns in the SELECT statement.

In the following example, table t1 contains columns c1 and c2. The results of the SQL query are sorted by column c1, which is the primary key column. Sorting is not necessary because column c1 is ordered.

obclient>CREATE TABLE t1(c1 INT PRIMARY KEY,c2 INT);
Query OK, 0 rows affected (0.06 sec)

obclient>INSERT INTO t1 VALUES (1,2);
Query OK, 1 row affected (0.01 sec)

obclient>INSERT INTO t1 VALUES (2,1);
Query OK, 1 row affected (0.01 sec)

obclient>INSERT INTO t1 VALUES (3,1);
Query OK, 1 row affected (0.01 sec)

obclient>SELECT c1, c2 FROM t1 ORDER BY 1;

+----+------+
| C1 | C2   |
+----+------+
|  1 |    2 |
|  2 |    1 |
|  3 |    1 |
+----+------+
3 rows in set (0.00 sec)

obclient>EXPLAIN SELECT c1, c2 FROM t1 ORDER BY 1\G;
*************************** 1. row ***************************
Query Plan: 
| ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1000     |1381|
===================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)

However, if you run the following command:

obclient>SELECT c1, c2 FROM t1 ORDER BY 2;

+----+------+
| C1 | C2   |
+----+------+
|  2 |    1 |
|  3 |    1 |
|  1 |    2 |
+----+------+
3 rows in set (0.00 sec)

The results need to be sorted by column c2. So, a sorting operation is required. The execution plan is shown in the following example:

obclient>EXPLAIN SELECT c1, c2 FROM t1 ORDER BY 2\G;
*************************** 1. row ***************************
Query Plan: 
| ====================================
|ID|OPERATOR   |NAME|EST. ROWS|COST|
------------------------------------
|0 |SORT       |    |1000     |1886|
|1 | TABLE SCAN|t1  |1000     |1381|
====================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C1], [T1.C2]), filter(nil), sort_keys([T1.C2, ASC])
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)

Therefore, if constants after ORDER BY are parameterized, different values of ORDER BY are assimilated to have the same parameterized SQL statements, resulting in the retrieval of wrong plans. In addition, the following constants are constraints for parameterization and cannot be parameterized:

  • All constants after ORDER BY, such as ORDER BY 1,2;

  • All constants after GROUP BY, such as GROUP BY 1,2;

  • Constants after LIMIT, such as LIMIT 5;

  • A string constant in a format string, such as %d in SELECT DATE_FORMAT('2006-06-00', '%d');

  • Constants, among input parameters of a function, that affect the result of the function and eventually the execution plan, such as NUMBER(2,1) in CAST(999.88 as NUMBER(2,1)) , or 1 and 2 in SUBSTR('abcd', 1, 2)

  • Constants, among input parameters of a function, that carry implicit information and eventually affect the execution plan, such as '2015-11-13 10:20:19.012' in SELECT UNIX_TIMESTAMP('2015-11-13 10:20:19.012'); It not only specifies input timestamp but also implies that the precision of the function is set to milliseconds.

To solve the possible mismatch problems, the system parameterizes SQL queries by analyzing their syntax trees in the process of generating the execution plan through hard parsing and obtaining the corresponding distinctive information. Assume that a statement conveys this information: "The third item of the fast parameterization parameter array must be 3". This information is considered a "constraint".

For query Q1 in the following example:

Q1:
obclient>SELECT c1, c2, c3 FROM t1 
      WHERE c1 = 1 AND c2 LIKE 'senior%' ORDER BY 3;

After lexical analysis of the syntax tree, the SQL statement is parameterized into the following string:

obclient>SELECT c1, c2, c3 FROM t1
           WHERE c1 = @1 AND c2 LIKE @2 ORDER BY @3 ;
/*The parameterized parameter array is {1, 'senior%', 3}*/

If constants after ORDER BY are different, they cannot share the same execution plan. Therefore, another result is generated by parameterizing the SQL statement:

obclient>SELECT c1, c2, c3 FROM t1
           WHERE c1 = @1 AND c2 LIKE @2 ORDER BY 3 ;

/*The parameterized array is {1, 'senior'}.
The constraint is "The third item of the fast parameterization parameter array must be 3"*/

The new parameterized string, constraint, and the execution plan for query Q1 are stored in the plan cache.

When you execute the following Q2 query:

Q2:
obclient>SELECT c1, c2, c3 FROM t1 
      WHERE c1 = 1 AND c2 LIKE 'senior%' ORDER BY 2;

The result after fast parameterization is:

obclient>SELECT c1, c2, c3 FROM t1 
      WHERE c1 = @1 and c2 like @2 ORDER BY @3;
/*The parameterized array is {1, 'senior%', 2}*/

The parameterized string is the same as that of the SQL statement for Q1. However, their constraints do not match. Therefore, query Q2 cannot match the plan for Q1. In this case, the optimizer generates a new execution plan and constraint "The third item of the fast parameterization parameter array must be 2" for Q2 through hard parsing, and adds the new plan and constraint to the cache. The next time Q1 and Q2 are executed, the correct execution plans are retrieved from the cache.

Based on fast parameterization, the plan caching of OceanBase Database has the following benefits:

  • Simplified parsing process.

  • When you search for hash map, you can perform the HASH and MEMCMP operations on the text string, instead of the HASH and COMPARISON operations on the parameterized syntax tree. This approach provides higher execution efficiency.