How to improve JOIN query performance using primary and foreign key constraints - AnalyticDB

When tables are joined but a query reads columns from only one of them, AnalyticDB for MySQL can prove the join is redundant and remove it from the execution plan — reducing latency without any change to your SQL. This is called join elimination, and it relies on declared primary key (PK) and foreign key (FK) constraint relationships.

Important

FK and PK constraints in AnalyticDB for MySQL are informational only — the system does not enforce data integrity. You are responsible for ensuring that all FK values reference valid PK values. If your data violates the declared constraints, queries may return incorrect results after join elimination.

Prerequisites

Before you begin, ensure that you have:

An AnalyticDB for MySQL cluster running V3.1.10 or later

To check the minor version of a Data Lakehouse Edition cluster, run SELECT adb_version();. To upgrade the minor version, contact technical support.

How join elimination works

The optimizer uses FK-PK constraint information to determine whether a join changes the result set. If a query joins table A (FK side) to table B (PK side) on a declared FK-PK column pair, and the query does not select any columns from table B, then:

Each row in A matches at most one row in B (because B's column is a PK).
The join does not filter or multiply rows from A.
The join is therefore redundant — the result is identical to scanning A alone.

When these conditions are met, the optimizer replaces the join with a single TableScan on the FK table.

Enable or disable this behavior per query using the /*+ PK_FK_DEPENDENCY_ENABLED*/ hint:

Hint	Effect
`/+ PK_FK_DEPENDENCY_ENABLED = true/`	Enables FK-PK-based join elimination
`/+ PK_FK_DEPENDENCY_ENABLED = false/`	Disables FK-PK-based join elimination

Declare foreign keys

Declare FKs when creating a table (CREATE TABLE) or add and remove them later (ALTER TABLE). For CREATE TABLE syntax details, see CREATE TABLE. For ALTER TABLE syntax details, see ALTER TABLE.

Limitations

Limitation	Details
Single-column FKs only	A FK can reference only one column. Multi-column FKs such as `FOREIGN KEY (sr_item_sk, sr_ticket_number) REFERENCES store_sales(ss_item_sk, d_date_sk)` are not supported.
No data constraint enforcement	AnalyticDB for MySQL does not verify FK-PK data integrity. Ensure your data satisfies the declared constraints before relying on join elimination.
External tables	FK constraints cannot be added to external tables.
DROP TABLE behavior	When you use the `DROP TABLE` statement to remove foreign keys, AnalyticDB for MySQL does not check this operation or report an error.
One FK per ALTER TABLE statement	To add multiple FKs with `ALTER TABLE`, run a separate statement for each FK.

Declare FKs with CREATE TABLE

Same-database FK

Create a database and the primary table:

CREATE DATABASE db;
USE db;
CREATE TABLE item
(
  i_item_sk bigint NOT NULL,
  i_current_price bigint,
  PRIMARY KEY(i_item_sk)
)
DISTRIBUTED BY HASH(i_item_sk);

Create a table with a FK referencing item:

When symbol is not specified, the parser automatically names the constraint <column_name>_fk. In this example, the constraint is named sr_item_sk_fk.
```
CREATE TABLE store_returns
(
  sr_sale_id bigint,
  sr_store_sk bigint,
  sr_item_sk bigint NOT NULL,
  FOREIGN KEY (sr_item_sk) REFERENCES item (i_item_sk)
);
```

Cross-database FK

To reference a table in another database, prefix the table name with the database name:

CREATE DATABASE db2;
USE db2;
CREATE TABLE store_sales
(
  ss_sale_id bigint,
  ss_store_sk bigint,
  ss_item_sk bigint not null,
  CONSTRAINT fk_constraint FOREIGN KEY (ss_item_sk) REFERENCES db.item (i_item_sk)
);

Multiple FKs on a single table

Include multiple FOREIGN KEY clauses in one CREATE TABLE statement:

Create the referenced tables:

USE db;
CREATE TABLE customer
(
  i_customer_sk bigint NOT NULL,
  i_current_price bigint,
  PRIMARY KEY(i_customer_sk)
)
DISTRIBUTED BY HASH(i_customer_sk);

CREATE TABLE vendor
(
  id bigint primary key,
  name varchar(5) not null
);

Create a table with two FKs:

CREATE TABLE store_product
(
  sr_sale_id bigint,
  sr_store_sk bigint,
  sr_customer_sk bigint NOT NULL,
  FOREIGN KEY (sr_sale_id) REFERENCES vendor (id),
  FOREIGN KEY (sr_customer_sk) REFERENCES customer (i_customer_sk)
);

Declare FKs with ALTER TABLE

To add or remove a FK on an existing table, use ALTER TABLE. Each statement handles one FK at a time; run multiple statements to manage multiple FKs.

Add a FK:

USE db;
ALTER TABLE store_returns ADD CONSTRAINT sr_item_fk FOREIGN KEY (sr_item_sk) REFERENCES item (i_item_sk);

Remove a FK:

USE db;
ALTER TABLE store_returns DROP FOREIGN KEY sr_item_sk_fk;

Add a FK that references a table in another database:

-- Step 1: Create the referenced table in db2
USE db2;
CREATE TABLE store
(
  id bigint primary key,
  name varchar(5) not null
);

-- Step 2: Add a FK from db.store_returns to db2.store
ALTER TABLE db.store_returns ADD FOREIGN KEY (sr_store_sk) REFERENCES store(id);

Inspect declared FKs

Use SHOW CREATE TABLE to view FK definitions on a table:

USE db;
SHOW CREATE TABLE store_returns;

Sample output (other statements omitted):

-- CONSTRAINT `sr_item_sk_fk` FOREIGN KEY (`sr_item_sk`) REFERENCES `db`.`item`(`i_item_sk`)

Apply join elimination

Once FKs are declared, add the /*+ PK_FK_DEPENDENCY_ENABLED = true*/ hint to your query. The optimizer removes any join where:

The join condition matches a declared FK-PK relationship, and
The query does not select columns from the joined PK table.

The resulting execution plan contains a TableScan on the FK table only — no InnerJoin operator appears.

Two tables in the same database

store_returns.sr_item_sk has a FK referencing item.i_item_sk. The query selects only columns from store_returns. Because i_item_sk is a PK, each sr_item_sk value matches at most one row in item, so the join does not change the result set. The optimizer eliminates it.

/*+ PK_FK_DEPENDENCY_ENABLED = true*/
EXPLAIN
SELECT
  s.sr_sale_id,
  s.sr_store_sk,
  s.sr_item_sk
FROM
  store_returns s,
  item
WHERE
  sr_item_sk = i_item_sk;

The execution plan shows a single TableScan — no InnerJoin operator:

+---------------+
| Plan Summary  |
+---------------+
 1- Output[ Query plan ] {Est rowCount: 1.0}
 2    -> Exchange[GATHER] {Est rowCount: 1.0}
 3        - TableScan {table: store_returns, Est rowCount: 1.0}

Two tables across databases

The same logic applies to cross-database FK relationships. db2.store_sales.ss_item_sk references db.item.i_item_sk. The query selects only store_sales columns, so the join on db.item is redundant:

USE db2;
/*+ PK_FK_DEPENDENCY_ENABLED = true*/
EXPLAIN
SELECT
  s.ss_sale_id,
  s.ss_item_sk
FROM
  store_sales s,
  db.item
WHERE
  ss_item_sk = i_item_sk;

Sample result:

+---------------+
| Plan Summary  |
+---------------+
 1- Output[ Query plan ] {Est rowCount: 1.0}
 2    -> Exchange[GATHER] {Est rowCount: 1.0}
 3        - TableScan {table: store_sales, Est rowCount: 1.0}

Multiple tables joined

When all joins in a query are redundant, the optimizer eliminates them in a single pass. Here, store_returns has FKs referencing both item and db2.store, and the query reads only store_returns columns:

USE db;
/*+ PK_FK_DEPENDENCY_ENABLED = true*/
EXPLAIN
SELECT
  s.sr_sale_id,
  s.sr_store_sk,
  s.sr_item_sk
FROM
  store_returns s,
  item,
  db2.store
WHERE
  sr_item_sk = i_item_sk
  AND sr_store_sk = id;

Sample result:

+---------------+
| Plan Summary  |
+---------------+
 1- Output[ Query plan ] {Est rowCount: 1.0}
 2    -> Exchange[GATHER] {Est rowCount: 1.0}
 3        - TableScan {table: store_returns, Est rowCount: 1.0}

Joins in views

Join elimination also works when a view is defined over multiple tables. Create a view that joins store_returns and item:

CREATE VIEW sr_item_v AS
SELECT
  s.sr_store_sk AS store_name,
  s.sr_sale_id AS sale_id,
  s.sr_item_sk AS sr_item_id,
  item.i_current_price AS item_price,
  item.i_item_sk as item_id
FROM
  store_returns s,
  item
WHERE
  sr_item_sk = i_item_sk;

A query against the view that does not reference item_price (or any other column sourced exclusively from item) triggers join elimination. The optimizer replaces the reference to item.i_item_sk with store_returns.sr_item_sk and scans only store_returns:

/*+ PK_FK_DEPENDENCY_ENABLED = true*/
EXPLAIN
SELECT store_name, sr_item_id, sale_id
FROM sr_item_v;

Sample result:

+---------------+
| Plan Summary  |
+---------------+
 1- Output[ Query plan ] {Est rowCount: 1.0}
 2    -> Exchange[GATHER] {Est rowCount: 1.0}
 3        - TableScan {table: store_returns, Est rowCount: 1.0}

When elimination does not apply: if the query selects a column that can only be resolved from item (such as item_price), the join is necessary and the optimizer retains it:

/*+ PK_FK_DEPENDENCY_ENABLED = true*/
EXPLAIN
SELECT store_name, sr_item_id, sale_id, item_price
FROM sr_item_v;

Sample result — InnerJoin[Hash Join] appears in the plan:

+---------------+
| Plan Summary  |
+---------------+
 1- Output[ Query plan ] {Est rowCount: 1.0}
 2    -> Exchange[GATHER] {Est rowCount: 1.0}
 3        -> InnerJoin[Hash Join] {Est rowCount: 1.0}
 4            -> Project {Est rowCount: 1.0}
 5                -> Exchange[REPARTITION] {Est rowCount: 1.0}
 6                    - TableScan {table: store_returns, Est rowCount: 1.0}
 7            -> LocalExchange[HASH] {Est rowCount: 1.0}
 8                -> ScanProject {table: item, Est rowCount: 1.0}
 9                    - TableScan {table: item, Est rowCount: 1.0}