When a query has a large IN list, the optimizer analyzes every value as a range condition — a process that can consume significant CPU and memory. PolarDB for MySQL automatically converts large IN lists into semijoin-based joins when the element count reaches a configurable threshold, bypassing range analysis and reducing optimization overhead.
The optimization is enabled by default with a threshold of 5000 elements. Set loose_in_predicate_conversion_threshold to 0 to disable it.
Prerequisites
Before you begin, make sure you have:
-
A PolarDB for MySQL 8.0 cluster at revision version 8.0.2.2.10 or later. To check your cluster version, see Engine versions 5.6, 5.7, and 8.0
-
The number of elements in the IN list reaches the number specified in the
loose_in_predicate_conversion_thresholdparameter -
The
[NOT]INcondition at the top level of aWHEREorONclause — nested conditions are not eligible for conversion
How it works
When an IN list exceeds the threshold, the optimizer rewrites the predicate as a semijoin against a temporary table. The original form:
column [NOT] IN (const1, const2, ...)
is converted to an equivalent subquery that materializes the IN list:
column [NOT] IN (SELECT ... FROM temporary_table)
This avoids the CPU and memory cost of range analysis. The optimizer then uses index lookups against the temporary table instead of scanning all range candidates.
Trade-offs to consider:
-
If the table has no usable index, the conversion may result in a full table scan rather than range-based access.
-
Row count estimates for the converted query are less precise than for range analysis.
Configure the threshold
Use the loose_in_predicate_conversion_threshold parameter to control when conversion takes effect. For instructions on setting cluster parameters, see Specify cluster and node parameters.
| Parameter | Level | Description |
|---|---|---|
loose_in_predicate_conversion_threshold |
Global | Minimum number of elements in an IN list that triggers conversion to a join. Valid values: 0–18446744073709551615. Default value: 5000. Set to 0 to disable the optimization. |
Verify the optimization
Use EXPLAIN to confirm whether a query is using IN predicate conversion.
Before conversion (default threshold of 5000, query has 5 elements):
mysql> EXPLAIN SELECT * FROM t WHERE a IN (1,2,3,5,5);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | t | NULL | ALL | NULL | NULL | NULL | NULL | 160 | 50.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN format=tree SELECT * FROM t WHERE a IN (1,2,3,5,5);
+------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------------+
| -> Filter: (t.a IN (1,2,3,5,5)) (cost=16.25 rows=80)
-> TABLE scan ON t (cost=16.25 rows=160)
|
+------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
The plan shows a simple TABLE scan with a filter — no conversion has occurred.
After lowering the threshold to 5:
mysql> SET in_predicate_conversion_threshold=5;
mysql> EXPLAIN SELECT * FROM t WHERE a IN (1,2,3,5,5);
+----+-------------+------------+------------+--------+---------------------+---------------------+---------+----------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+---------------------+---------------------+---------+----------+------+----------+--------------------------+
| 1 | PRIMARY | t | NULL | ALL | NULL | NULL | NULL | NULL | 160 | 100.00 | Using where |
| 1 | PRIMARY | <derived3> | NULL | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 8 | test.t.a | 1 | 100.00 | Using where; Using index |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | IN-list Converted |
+----+-------------+------------+------------+--------+---------------------+---------------------+---------+----------+------+----------+--------------------------+
mysql> EXPLAIN format=tree SELECT * FROM t1 WHERE a IN (1,2,3,5,5);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop semijoin
-> Filter: (t1.a IS NOT NULL) (cost=0.55 rows=3)
-> TABLE scan ON t1 (cost=0.55 rows=3)
-> Filter: (t1.a = tvc_0._col_1)
-> Index lookup ON tvc_0 using <auto_key0> (_col_1=t1.a)
-> Materialize
-> scan ON in-list: 5 rows
The Extra column shows IN-list Converted, and the tree-format plan shows a Nested loop semijoin with Materialize — confirming the IN list has been converted.