
PolarDB:Best practices and optimizations for high-concurrency scenarios

Last Updated:Mar 28, 2026

In Online Transaction Processing (OLTP) workloads, point queries are the most performance-sensitive access pattern. PolarDB-X is optimized for point query response time (RT) and throughput, making point queries the recommended design target for high-concurrency read scenarios.

This topic explains what point queries are, how to verify that a query qualifies as one, and how to use indexes to optimize queries that don't.

What is a point query?

A point query scans a small amount of data — not just returns a small amount. The distinction matters: a query like select * from t1 order by c1 limit 1 returns one row, but if c1 has no index, the database must scan and sort every row in the table to find it. That is not a point query.

In a single-node database, the classic point query retrieves data by primary key (PK). The primary key index lets the database find a row in log(n) steps, where n is the total number of rows. For other access patterns, you can add a local secondary index (LSI). The database scans the LSI first to get the primary key, then retrieves the full row. If the LSI includes all columns the query needs, the full-row retrieval step is skipped entirely.
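As an illustration, the following single-node sketch (the table and index names are hypothetical) shows a covering LSI that lets an `ORDER BY ... LIMIT 1` query be served from the index alone:

```sql
-- Hypothetical single-node table; i_c1_c2 is a covering LSI for the query below.
CREATE TABLE t1 (
  id INT NOT NULL PRIMARY KEY,
  c1 INT NOT NULL,
  c2 VARCHAR(32) NOT NULL,
  INDEX i_c1_c2 (c1, c2)  -- includes every column the query reads
);

-- Reads a single entry from i_c1_c2; the full-row retrieval step is skipped.
SELECT c1, c2 FROM t1 ORDER BY c1 LIMIT 1;
```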

PolarDB-X is a distributed database. Data is distributed across multiple data nodes (DNs) using partitioned tables. When you create a table, you choose one or more columns as partitioning keys — these columns determine which partition each row belongs to, and each partition is stored on a specific DN.

This adds a new performance dimension: in a distributed database, a point query must also scan a small number of partitions. A query that reads one row but touches all partitions is not a point query.

PolarDB-X uses the primary key as the partition key by default. A primary key query locates the target partition first, then uses the primary key index on that DN — achieving optimal performance. For other access patterns, you can create a global secondary index (GSI). A GSI works similarly to an LSI: the database retrieves the primary key from the index, then looks up the full row. The key difference is that a GSI is itself a partitioned table, stored on different DNs than the primary table, so the lookup step requires a network transfer. To eliminate that overhead, PolarDB-X supports clustered indexes, which store all primary table columns in the index — no lookup needed, and performance equivalent to a primary key query.

Detect point queries

Use EXPLAIN and EXPLAIN EXECUTE together to verify that a query is a point query.

EXPLAIN shows the logical execution plan — specifically, how many partitions the query touches. A point query targets a single partition. EXPLAIN EXECUTE shows the physical execution plan on the DN, which follows MySQL execution plan format. For field descriptions, see the MySQL documentation.

The following example checks whether a primary key query is a point query.

  1. Check the logical execution plan.

    explain select c_custkey, c_name, c_address from customer where c_custkey = 42;

    Output:

    +------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | LOGICAL EXECUTIONPLAN                                                                                                                                            |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | LogicalView(tables="TEST1_000002_GROUP.customer_IVgG_10", sql="SELECT `c_custkey`, `c_name`, `c_address` FROM `customer` AS `customer` WHERE (`c_custkey` = ?)") |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------------+

    The LogicalView targets a single partition (TEST1_000002_GROUP.customer_IVgG_10). This confirms the query touches exactly one shard.

  2. Check the physical execution plan on the DN.

    explain execute select c_custkey, c_name, c_address from customer where c_custkey = 42;

    Output:

    +----+-------------+----------+------------+-------+---------------+---------+---------+-------+------+----------+-----------------+
    | id | select_type | table    | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra           |
    +----+-------------+----------+------------+-------+---------------+---------+---------+-------+------+----------+-----------------+
    | 1  | SIMPLE      | customer | NULL       | const | PRIMARY       | PRIMARY | 4       | const | 1    | 100      | Using pk access |
    +----+-------------+----------+------------+-------+---------------+---------+---------+-------+------+----------+-----------------+

    Key fields to check:

    Field   Value             Meaning
    ------  ----------------  -------------------------------------------------------------------------------
    type    const             At most one row is retrieved using the primary key, the best possible access type
    key     PRIMARY           The primary key index is used
    rows    1                 Exactly one row is scanned
    Extra   Using pk access   Primary key access on the DN

Optimize queries to point queries

Not every query can be a point query. Full-table scans (select * from t1), deep-offset pagination (select * from t1 where c1 = 1 limit 100000, 10), and IN queries with an unbounded number of parameters cannot be optimized this way.

Queries that can be optimized fall into two categories:

  • Fixed-range scans: equality conditions, small BETWEEN AND ranges, or IN with a fixed number of parameters

  • TopN queries: queries with a fixed result set size, such as select * from t1 where c1 > 42 limit 10 or select * from t1 order by c1 limit 10
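
Both categories can be sketched with queries like the following (`t1` and `c1` are illustrative names; assume `c1` is indexed):

```sql
-- Fixed-range scans: the number of matching index entries is bounded.
SELECT * FROM t1 WHERE c1 = 42;               -- equality
SELECT * FROM t1 WHERE c1 BETWEEN 40 AND 50;  -- small range
SELECT * FROM t1 WHERE c1 IN (1, 2, 3);       -- fixed parameter count

-- TopN queries: LIMIT fixes the result set size.
SELECT * FROM t1 WHERE c1 > 42 ORDER BY c1 LIMIT 10;
```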

The optimization strategy is to add an appropriate index so the query scans a single partition and uses an index on that partition. The following example walks through the process using a GSI and then a clustered index.

Example: optimize a phone number lookup

  1. Create the customer table.

    CREATE TABLE `customer` (
      `c_custkey` int(11) NOT NULL,
      `c_name` varchar(25) NOT NULL,
      `c_address` varchar(40) NOT NULL,
      `c_nationkey` int(11) NOT NULL,
      `c_phone` varchar(15) NOT NULL,
      `c_acctbal` decimal(15,2) NOT NULL,
      `c_mktsegment` varchar(10) NOT NULL,
      `c_comment` varchar(117) NOT NULL,
      PRIMARY KEY (`c_custkey`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 dbpartition by hash(`c_custkey`) tbpartition by hash(`c_custkey`) tbpartitions 4;

    The table is partitioned by c_custkey. Queries that filter on c_phone must scan all shards.

  2. Verify the problem: check the execution plan for a phone number query.

    explain select * from customer where c_phone = '11';

    Output:

    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | LOGICAL EXECUTIONPLAN                                                                                                                                                                                                                            |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Gather(concurrent=true)                                                                                                                                                                                                                          |
    |   LogicalView(tables="[000000-000003].customer_[00-15]", shardCount=16, sql="SELECT `c_custkey`, `c_name`, `c_address`, `c_nationkey`, `c_phone`, `c_acctbal`, `c_mktsegment`, `c_comment` FROM `customer` AS `customer` WHERE (`c_phone` = ?)") |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

    shardCount=16 confirms the query scans all 16 shards — a full-table scan across the entire distributed table.

  3. Add a GSI on c_phone.

    create global index g_i_phone on customer(c_phone) dbpartition by hash(c_phone);
  4. Check the execution plan again.

    explain select * from customer where c_phone = '11';

    Output:

    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | LOGICAL EXECUTIONPLAN                                                                                                                                                                                                                                                  |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Project(c_custkey="c_custkey", c_name="c_name", c_address="c_address", c_nationkey="c_nationkey", c_phone="c_phone", c_acctbal="c_acctbal", c_mktsegment="c_mktsegment", c_comment="c_comment")                                                                        |
    |   BKAJoin(condition="c_custkey = c_custkey", type="inner")                                                                                                                                                                                                             |
    |     IndexScan(tables="TEST1_000000_GROUP.g_i_phone_2CSp", sql="SELECT `c_custkey`, `c_phone` FROM `g_i_phone` AS `g_i_phone` WHERE (`c_phone` = ?)")                                                                                                                   |
    |     Gather(concurrent=true)                                                                                                                                                                                                                                            |
    |       LogicalView(tables="[000000-000003].customer_[00-15]", shardCount=16, sql="SELECT `c_custkey`, `c_name`, `c_address`, `c_nationkey`, `c_acctbal`, `c_mktsegment`, `c_comment` FROM `customer` AS `customer` WHERE ((`c_phone` = ?) AND (`c_custkey` IN (...)))") |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

    The query now uses the GSI (IndexScan on g_i_phone) to find the matching primary key, then looks up the full row on a single shard. The LogicalView still shows shardCount=16 because the exact shard cannot be determined at EXPLAIN time — it depends on the primary key value returned from the index.

  5. Replace the GSI with a clustered index to eliminate the lookup step.

    drop index g_i_phone on customer;
    create clustered index g_i_phone on customer(c_phone) dbpartition by hash(c_phone);
  6. Check the execution plan one more time.

    explain select * from customer where c_phone = '11';

    Output:

    +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | LOGICAL EXECUTIONPLAN                                                                                                                                                                                                            |
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | IndexScan(tables="TEST1_000000_GROUP.g_i_phone_fHmZ", sql="SELECT `c_custkey`, `c_name`, `c_address`, `c_nationkey`, `c_phone`, `c_acctbal`, `c_mktsegment`, `c_comment` FROM `g_i_phone` AS `g_i_phone` WHERE (`c_phone` = ?)") |
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

    The clustered index stores all primary table columns, so no lookup is needed. The plan is a single IndexScan on one shard — a true point query with performance equivalent to a primary key query.

GSI vs clustered index

How it works

  • GSI: stores only the indexed columns and the primary key; a full-row lookup is required

  • Clustered index: stores all primary table columns; no lookup is needed

Read performance

  • GSI: single-shard index scan plus one lookup (involves a network transfer)

  • Clustered index: single-shard index scan with no lookup

Write overhead

  • GSI: all writes run as distributed transactions; write RT increases 2–3x

  • Clustered index: every write must maintain all columns in the index; evaluate the write performance impact before use

Use when

  • GSI: queries select only a few columns, or storage is a concern

  • Clustered index: queries select all columns, or read latency is critical

Usage notes

Write performance impact of GSIs

All writes to a table with a GSI run as distributed transactions by default, ensuring strong consistency between the GSI and the primary table. Compared to tables without a GSI, write RT increases 2–3 times. Because the index and primary tables are written to in parallel, the chance of distributed deadlocks also increases under high-concurrency writes.

Limit GSIs to no more than 3 per logical table.

Choosing the right column for the index

The key to point query optimization is matching the index partition key to the query's filter column. Add an index only when the query filter is selective enough to justify the write overhead.
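
As a rough pre-check (a sketch using the customer table from the example above), compare the number of distinct values in the candidate column to the total row count; a ratio close to 1 indicates high selectivity:

```sql
-- High selectivity (ratio near 1) justifies a GSI partitioned on c_phone.
SELECT COUNT(DISTINCT c_phone) / COUNT(*) AS selectivity FROM customer;
```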
