All Products
Search
Document Center

PolarDB:Convert IN predicates into joins

Last Updated:Mar 30, 2026

When a query has a large IN list, the optimizer analyzes every value as a range condition — a process that can consume significant CPU and memory. PolarDB for MySQL automatically converts large IN lists into semijoin-based joins when the element count reaches a configurable threshold, bypassing range analysis and reducing optimization overhead.

The optimization is enabled by default with a threshold of 5000 elements. Set loose_in_predicate_conversion_threshold to 0 to disable it.

Prerequisites

Before you begin, make sure you have:

  • A PolarDB for MySQL 8.0 cluster at revision version 8.0.2.2.10 or later. To check your cluster version, see Engine versions 5.6, 5.7, and 8.0

  • The number of elements in the IN list reaches the number specified in the loose_in_predicate_conversion_threshold parameter

  • The [NOT]IN condition at the top level of a WHERE or ON clause — nested conditions are not eligible for conversion

How it works

When an IN list exceeds the threshold, the optimizer rewrites the predicate as a semijoin against a temporary table. The original form:

column [NOT] IN (const1, const2, ...)

is converted to an equivalent subquery that materializes the IN list:

column [NOT] IN (SELECT ... FROM temporary_table)

This avoids the CPU and memory cost of range analysis. The optimizer then uses index lookups against the temporary table instead of scanning all range candidates.

Trade-offs to consider:

  • If the table has no usable index, the conversion may result in a full table scan rather than range-based access.

  • Row count estimates for the converted query are less precise than for range analysis.

Configure the threshold

Use the loose_in_predicate_conversion_threshold parameter to control when conversion takes effect. For instructions on setting cluster parameters, see Specify cluster and node parameters.

Parameter Level Description
loose_in_predicate_conversion_threshold Global Minimum number of elements in an IN list that triggers conversion to a join. Valid values: 0–18446744073709551615. Default value: 5000. Set to 0 to disable the optimization.

Verify the optimization

Use EXPLAIN to confirm whether a query is using IN predicate conversion.

Before conversion (default threshold of 5000, query has 5 elements):

mysql> EXPLAIN SELECT * FROM t WHERE a IN (1,2,3,5,5);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  160 |    50.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN format=tree SELECT * FROM t WHERE a IN (1,2,3,5,5);
+------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                              |
+------------------------------------------------------------------------------------------------------+
| -> Filter: (t.a IN (1,2,3,5,5))  (cost=16.25 rows=80)
    -> TABLE scan ON t  (cost=16.25 rows=160)
 |
+------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

The plan shows a simple TABLE scan with a filter — no conversion has occurred.

After lowering the threshold to 5:

mysql> SET in_predicate_conversion_threshold=5;
mysql> EXPLAIN SELECT * FROM t WHERE a IN (1,2,3,5,5);
+----+-------------+------------+------------+--------+---------------------+---------------------+---------+----------+------+----------+--------------------------+
| id | select_type | table      | partitions | type   | possible_keys       | key                 | key_len | ref      | rows | filtered | Extra                    |
+----+-------------+------------+------------+--------+---------------------+---------------------+---------+----------+------+----------+--------------------------+
|  1 | PRIMARY     | t          | NULL       | ALL    | NULL                | NULL                | NULL    | NULL     |  160 |   100.00 | Using where              |
|  1 | PRIMARY     | <derived3> | NULL       | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 8       | test.t.a |    1 |   100.00 | Using where; Using index |
|  3 | DERIVED     | NULL       | NULL       | NULL   | NULL                | NULL                | NULL    | NULL     | NULL |     NULL | IN-list Converted        |
+----+-------------+------------+------------+--------+---------------------+---------------------+---------+----------+------+----------+--------------------------+
mysql> EXPLAIN format=tree SELECT * FROM t1 WHERE a IN (1,2,3,5,5);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                    |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop semijoin
    -> Filter: (t1.a IS NOT NULL)  (cost=0.55 rows=3)
        -> TABLE scan ON t1  (cost=0.55 rows=3)
    -> Filter: (t1.a = tvc_0._col_1)
        -> Index lookup ON tvc_0 using <auto_key0> (_col_1=t1.a)
            -> Materialize
                -> scan ON in-list: 5 rows

The Extra column shows IN-list Converted, and the tree-format plan shows a Nested loop semijoin with Materialize — confirming the IN list has been converted.

Related topics