All Products
Search
Document Center

PolarDB:Specify join reorder for the IMCI optimizer

Last Updated:Mar 28, 2026

Complex analytical queries that join many tables are sensitive to join order: a suboptimal order can produce large intermediate result sets, causing every downstream operator to do more work. The IMCI optimizer addresses this by combining query transformation rules with per-column statistics to evaluate multiple equivalent execution plans and select the lowest-cost join order automatically. This page explains how the optimizer works, how to enable it, and what to do when the optimizer cannot find the optimal plan.

How it works

SQL is a declarative language: a single query can be executed in many ways, all producing the same correct result. The query optimizer explores this space of equivalent query plans and picks the one with the lowest estimated cost.

For example, the following query on four tables can be executed in at least two orders:

SELECT * FROM t0, t1, t2, t3
WHERE t0.a = t1.a AND t1.a = t2.a AND t2.a = t3.a AND t3.b = t1.b;
Two equivalent execution plans

Plan A and Plan B produce the same result. The optimizer generates Plan B by applying a query transformation rule — for example, rewriting t1 INNER JOIN t2 as t2 INNER JOIN t1. It then estimates the cost of each plan and selects the one with the lower cost.

Optimizer workflow

  1. Parse the SQL statement to produce an initial query plan.

  2. Apply query transformation rules to generate equivalent query plans.

  3. Use statistics and cost models to estimate the cost of each plan, then submit the lowest-cost plan to the execution engine.

Statistics

The IMCI optimizer uses three types of statistics for cardinality estimation and cost calculation:

  • Histograms: show the value distribution of each column, used to estimate value ranges and the selection rate of equality predicates.

  • Distinct value counts: used to estimate the number of groups in GROUP BY and to assist with selection rate estimation.

  • Column constraints: whether a column has a unique index or foreign key relationships.

Cost calculation

The cost of each operator in a query plan depends on:

  • The total number of rows the operator processes (estimated from statistics).

  • The algorithmic complexity of the operator.

The total execution cost equals the sum of all operator costs. For hash join, the formula is:

Cost(join) = Card(inner) + Card(outer)

Applying this to Plan A and Plan B:

PlanCalculationTotal cost
Plan A10000+1+1000+100+10000+1021,111
Plan B10000+1+100+10+1000+1011,121

Plan B has a lower cost, so the optimizer selects it.

Supported versions

Your PolarDB cluster must be running one of the following versions:

  • PolarDB for MySQL 8.0.1, revision 8.0.1.1.31 or later

  • PolarDB for MySQL 8.0.2, revision 8.0.2.2.12 or later

Enable IMCI query optimization

Step 1: Collect statistics

Statistics are required before the optimizer can estimate costs accurately. Choose one of the following approaches:

  • Periodic `ANALYZE TABLE`: run ANALYZE TABLE on tables that use IMCI query optimization to keep statistics current.

  • Initial collection + auto-update (recommended for tables under active load): after creating an In-Memory Column Index (IMCI) on a table, run ANALYZE TABLE on the read-only node to build initial statistics. Then set loose_imci_auto_update_statistic to ASYNC so the optimizer refreshes statistics automatically as data changes, without manual intervention. ASYNC mode is suitable for active tables because re-collection happens in the background and does not block ongoing queries.

Step 2: Enable the feature

In the PolarDB console, set loose_imci_optimizer_switch to use_imci_card_est=ON,use_imci_join_reorder=ON.

Join reorder does not take effect unless use_imci_card_est is also set to ON and the table has statistics.

For instructions on configuring parameters, see Specify cluster and node parameters.

Step 3: Run your queries

After the feature is enabled and statistics are collected, run your analytical queries. The optimizer automatically evaluates join orders and selects the lowest-cost plan.

Parameters

Configure the following parameters in the PolarDB console to control IMCI query optimization.

ParameterDescriptionDefault
loose_imci_optimizer_switchControls IMCI query optimization sub-features. Set use_imci_card_est to ON to enable cardinality estimation and cost calculation. Set use_imci_join_reorder to ON to enable join reorder. Both sub-options must be enabled together for join reorder to take effect.OFF
loose_imci_auto_update_statisticControls how the IMCI optimizer refreshes stale statistics. ASYNC: re-collects statistics asynchronously in the background — suitable for tables under active load. SYNC: re-collects statistics synchronously before the query runs. OFF: statistics are not automatically refreshed.ASYNC
loose_imci_max_enum_join_pairsThe maximum number of equivalent execution plans the optimizer evaluates when join reorder is enabled. Increase this value to explore more plans for queries with many joins; decrease it to reduce planning overhead. Valid values: 0–4,294,967,295.2000

Limitations and workarounds

The following conditions can cause large errors in cardinality estimation, leading the optimizer to pick a suboptimal plan. When you encounter one of these conditions, use a JOIN_ORDER hint to explicitly specify the join order.

ConditionExampleWorkaround
Predicates that compare two columns in the same tablet1.c1 > t1.c2Use JOIN_ORDER hint
Predicates with operators that cannot use statisticst1.c1 MOD 2 = 1, t1.c2 LIKE '%ABC%'Use JOIN_ORDER hint
Predicates with expressions the optimizer cannot evaluatet1.c1 + t1.c3 > 100Use JOIN_ORDER hint
Columns without statistics for estimating predicate selection ratesSELECT a, SUM(b) FROM t1 HAVING SUM(b) > 10Use JOIN_ORDER hint
Multiple predicates combined with ANDt1.c1 > 10 AND t1.c3 < 5Use JOIN_ORDER hint
Queries with many nested layersUse JOIN_ORDER hint
Queries joining many tablesAdjust loose_imci_max_enum_join_pairs to expand the search space, or use JOIN_ORDER hint

Applying a `JOIN_ORDER` hint: add the hint directly in the SELECT statement to override the optimizer's join order. For example:

SELECT /*+ JOIN_ORDER(t0, t1, t2, t3) */ *
FROM t0, t1, t2, t3
WHERE t0.a = t1.a AND t1.a = t2.a AND t2.a = t3.a AND t3.b = t1.b;

For the full list of supported join order hints, see MySQL optimizer hints for join order.

Performance results

The following example uses TPC-H Q8, a multi-table query with aggregate functions representative of complex OLAP workloads.

SELECT
  o_year,
  SUM(
    CASE
      WHEN nation = 'BRAZIL' THEN volume
      ELSE 0
    END
  ) / SUM(volume) AS mkt_share
FROM
  (
    SELECT
      EXTRACT(year FROM o_orderdate) AS o_year,
      l_extendedprice * (1 - l_discount) AS volume,
      n2.n_name AS nation
    FROM
      lineitem, orders, part, supplier, customer, nation n1, nation n2, region
    WHERE
      p_partkey = l_partkey
      AND s_suppkey = l_suppkey
      AND l_orderkey = o_orderkey
      AND o_custkey = c_custkey
      AND c_nationkey = n1.n_nationkey
      AND n1.n_regionkey = r_regionkey
      AND r_name = 'AMERICA'
      AND s_nationkey = n2.n_nationkey
      AND o_orderdate BETWEEN DATE '1995-01-01' AND DATE '1996-12-31'
      AND p_type = 'ECONOMY ANODIZED STEEL'
  ) AS all_nations
GROUP BY o_year
ORDER BY o_year;

Test environment: 32-core cluster, TPC-H SF100 dataset.

Without IMCI query optimization

Query plan without IMCI optimization

The query plan produces large intermediate result sets across multiple joins, increasing the rows processed by each operator. Query duration: 7,017 ms.

With IMCI query optimization

Query plan with IMCI optimization

The optimizer reorders joins to reduce intermediate result set sizes to the order of millions of rows, cutting the work for downstream operators. Query duration: 1,900 ms — a 73% reduction.

Related topics