PolarDB-X supports ultra-large transactions—transactions that span multiple shards, write large volumes of data, or execute a large number of SQL statements. This page describes how PolarDB-X behaves under each of these conditions, the system limits that apply, and how to size your shards accordingly.
Key limits
Two MySQL-level parameters govern the maximum transaction size in PolarDB-X:
| Parameter | Default | Maximum | Modifiable |
|---|---|---|---|
| MAX_ALLOWED_PACKET | 16 MB | — | Yes, via Parameter settings |
| MAX_BINLOG_CACHE_SIZE | 2 GB (2,147,483,648 bytes) | 4 GB (MySQL limit) | No |
These limits apply per shard, not per transaction. The total data a transaction can write is:
- INSERT-only transactions: number of shards × MAX_BINLOG_CACHE_SIZE. The binlog records inserted values only.
- UPDATE-only transactions: number of shards × (MAX_BINLOG_CACHE_SIZE ÷ 2). The binlog records both the before-image and the after-image, doubling the binlog footprint.
To increase transaction capacity, split your table into more shards.
MAX_BINLOG_CACHE_SIZE defaults to 2 GB and cannot be modified in PolarDB-X. Even in standard MySQL, the value cannot exceed 4 GB due to MySQL's own limits.
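The capacity rules above reduce to simple arithmetic. A minimal sketch in Python, assuming the default 2 GiB per-shard cap; the constant and function names are illustrative, not PolarDB-X configuration keys:

```python
# Per-transaction capacity implied by the per-shard binlog cap.
MAX_BINLOG_CACHE_SIZE = 2 * 1024**3  # 2 GiB per shard (default, not modifiable)

def insert_capacity(shards: int) -> int:
    """INSERT-only: the binlog stores each inserted value once."""
    return shards * MAX_BINLOG_CACHE_SIZE

def update_capacity(shards: int) -> int:
    """UPDATE-only: the binlog stores before- and after-images, halving capacity."""
    return shards * (MAX_BINLOG_CACHE_SIZE // 2)
```

For example, `insert_capacity(8)` gives 16 GiB and `update_capacity(8)` gives 8 GiB, matching the scenario results below.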
Test environment
All scenarios on this page use the following setup:
| Component | Specification |
|---|---|
| PolarDB-X version | polarx-kernel_5.4.11-16301083_xcluster-20210805 |
| Node specification | 4 CPU cores, 16 GB memory |
| Number of nodes | 4 |
| Client (Elastic Compute Service (ECS) instance) | 4 CPU cores, 16 GB memory, same CIDR block and vSwitch as the PolarDB-X instance |
The test table tb has the following schema. The data type of column c and the number of shards vary by scenario.
```sql
CREATE TABLE `tb` (
  `id` BIGINT(20) NOT NULL,
  `c` LONGBLOB/CHAR(255)/CHAR(1),
  PRIMARY KEY (`id`)
);
```
Shard configurations tested:
- 1 shard: PolarDB-X optimizes the transaction commit as a one-phase commit.
- 8 shards: split on the `id` column, 2 shards per data node.
- 16 shards: split on the `id` column, 4 shards per data node.
Scenario 1: Moderate number of statements, large data per statement
This scenario tests transactions that write a large total volume of data across a moderate number of INSERT statements.
| Parameter | Value |
|---|---|
| SQL statements per transaction | 2,048 |
| Data per statement | 256 KB–8 MB |
| Total data written | 512 MB–16 GB |
| Rows written | 2,048 |
| Column c type | LONGBLOB |
All statements follow this format:
```sql
INSERT INTO `tb` VALUES (id, c)
```
Each statement must fit within MAX_ALLOWED_PACKET. If you connect via JDBC and send a packet larger than this limit, the following error is reported:
```
CommunicationsException: Communications link failure
```
When all 2,048 INSERTs run in a single non-sharded transaction and the total LONGBLOB data exceeds 2 GB, binlog capacity is exhausted:
```
ERR-CODE: [TDDL-4614][ERR_EXECUTE_ON_MYSQL] Error occurs when execute on GROUP ...:
Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage;
increase this mysqld variable and try again
```
In a distributed transaction, each branch transaction runs on one shard. The binlog limit applies per shard, so the effective total INSERT capacity is number of shards × MAX_BINLOG_CACHE_SIZE. For 8 shards, the limit is 16 GB.
Conclusions:

- Each SQL statement must fit within MAX_ALLOWED_PACKET, regardless of whether a transaction is active.
- The data written to binlogs per shard per transaction cannot exceed MAX_BINLOG_CACHE_SIZE.
- To write more data in a single transaction, split the table into more shards.
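One practical consequence of the per-statement limit is that large batches must be split client-side so that no single INSERT exceeds MAX_ALLOWED_PACKET. A sketch under that assumption; `chunk_rows` is a hypothetical helper, not a PolarDB-X or JDBC API, and row sizes are approximate encoded sizes you supply yourself:

```python
MAX_ALLOWED_PACKET = 16 * 1024**2  # 16 MB default

def chunk_rows(row_sizes, packet_limit=MAX_ALLOWED_PACKET):
    """Group row indices so each INSERT's payload stays under one packet."""
    chunks, current, current_size = [], [], 0
    for i, size in enumerate(row_sizes):
        if size > packet_limit:
            # A single oversized row can never be sent; JDBC reports a
            # communications failure rather than a clean error.
            raise ValueError(f"row {i} ({size} bytes) exceeds max_allowed_packet")
        if current_size + size > packet_limit:
            chunks.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

For instance, five 8 MB rows split into batches of two, two, and one under the 16 MB default.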
Scenario 2: Single statement, large number of rows updated
This scenario tests transactions that execute a single UPDATE touching a large number of rows.
| Parameter | Value |
|---|---|
| SQL statements per transaction | 1 |
| Data per row | ~256 bytes |
| Total data updated | 256 MB–8 GB |
| Rows updated | 1,048,576 (2^20) to 33,554,432 (2^25) |
| Column c type | CHAR(255) |
The table is pre-loaded with 33,554,432 rows. The id column contains sequential values from 0 to 2^25−1; the c column contains 255 consecutive a characters. Each transaction executes:
```sql
UPDATE `tb` SET c = 'BBBBB...BBB' WHERE id < ?
```
Setting ? to x updates x rows, 256 bytes each, for a total of x × 256 bytes. In this test, x ranges from 1,048,576 to 33,554,432.
Because UPDATE statements write both the before-image and after-image to the binlog, each 256-byte update generates 512 bytes of binlog. On a non-sharded table, the binlog limit is reached when about 1 GB of data is updated. With 8 shards, the limit rises to 8 GB.
When the limit is exceeded, the same binlog overflow error from Scenario 1 is reported.
Conclusions:
- For UPDATE transactions, the data updated per shard cannot exceed MAX_BINLOG_CACHE_SIZE ÷ 2 (1 GB by default), because binlogs record both the before-image and after-image.
- Non-sharded table: maximum UPDATE volume is ~1 GB. 8-shard table: ~8 GB.
- To update more data in a single transaction, split the table into more shards.
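These conclusions can be turned into a rough pre-flight check. The sketch below assumes updated rows spread evenly across shards and approximates binlog usage as twice the updated row bytes (before-image plus after-image); real binlog entries also carry event headers and metadata, so treat this as an estimate, not an exact size:

```python
MAX_BINLOG_CACHE_SIZE = 2 * 1024**3  # 2 GiB per shard

def update_fits(rows: int, row_bytes: int, shards: int) -> bool:
    """Estimate whether an UPDATE stays under the per-shard binlog cap."""
    binlog_per_shard = rows * row_bytes * 2 // shards  # 2x: before + after image
    return binlog_per_shard <= MAX_BINLOG_CACHE_SIZE
```

Plugging in the test figures: about 1 GB of 256-byte rows fits on a non-sharded table, 2 GB does not, and 8 GB fits across 8 shards.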
Scenario 3: Large number of statements, small data per statement
This scenario tests transactions that execute a very large number of small statements.
| Parameter | Value |
|---|---|
| SQL statements per transaction | 64,000–1,280,000 |
| Data per statement | Several bytes |
| Total data updated | 32 bytes |
| Rows in table | 32 |
| Column c type | CHAR(1) |
The table contains 32 rows with sequential id values (0–31) and c = 'a'. Each iteration runs the following 64 statements:
```sql
UPDATE `tb` SET c = 'b' WHERE id = 0;
UPDATE `tb` SET c = 'a' WHERE id = 0;
UPDATE `tb` SET c = 'b' WHERE id = 1;
UPDATE `tb` SET c = 'a' WHERE id = 1;
...
UPDATE `tb` SET c = 'b' WHERE id = 31;
UPDATE `tb` SET c = 'a' WHERE id = 31;
```
Conclusion:
PolarDB-X does not hit a performance bottleneck even at 1.28 million SQL statements per transaction, regardless of shard count. In practice, transaction performance is more likely to be limited by data volume than by statement count.
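The workload above can be reproduced with a small generator: each iteration flips all 32 rows to 'b' and back to 'a', so repeating it N times yields a 64 × N-statement transaction. A sketch (the function name is illustrative):

```python
def iteration_statements(rows: int = 32) -> list[str]:
    """One scenario-3 iteration: flip each row to 'b', then back to 'a'."""
    stmts = []
    for i in range(rows):
        stmts.append(f"UPDATE `tb` SET c = 'b' WHERE id = {i};")
        stmts.append(f"UPDATE `tb` SET c = 'a' WHERE id = {i};")
    return stmts
```

Running 1,000 to 20,000 iterations inside one transaction reproduces the 64,000 to 1,280,000 statement counts tested.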
Scenario 4: Effect of shard count on transaction performance
This scenario isolates how the number of shards affects transaction running time, independent of data volume.
| Parameter | Value |
|---|---|
| SQL statements per transaction | 1 |
| Data per row | 1 byte |
| Total data updated | 8 KB–8 MB |
| Shards tested | 1–2,048 |
| Column c type | CHAR(1) |
Each transaction runs:
```sql
UPDATE `tb` SET c = 'b' WHERE id < x
```
This updates x rows and x bytes of data.
Conclusions:
- Small data volume (8 KB): transaction commit time dominates. Running time increases with shard count, roughly linearly above 1,024 shards. Below 64 shards, DML execution time still contributes, so the increase is not strictly linear.
- Large data volume (8 MB): DML execution time dominates. Shard count has little effect on running time.
- More shards increase total transaction capacity but also increase commit overhead. For transactions that operate on small amounts of data across many shards, commit time becomes the bottleneck. Size your shard count based on the data volume your transactions actually process.
Sizing guidance:
| Data volume per transaction | Recommended approach |
|---|---|
| Large (hundreds of MB or more) | Increase shard count to raise capacity; commit overhead is negligible |
| Small (KB range) | Avoid excessive shard counts; commit time dominates and degrades performance |
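The trade-off behind this guidance can be illustrated with a toy cost model: commit time grows with shard count while DML time grows with data volume. The per-shard and per-MB constants below are invented for illustration only, not measured PolarDB-X numbers:

```python
def txn_time(shards: int, data_bytes: int,
             commit_ms_per_shard: float = 0.1,   # assumed commit overhead
             dml_ms_per_mb: float = 100.0) -> float:  # assumed DML cost
    """Toy model: total time = commit term (shards) + DML term (data volume)."""
    return shards * commit_ms_per_shard + (data_bytes / 1024**2) * dml_ms_per_mb
```

With these assumed constants, an 8 KB transaction slows down dramatically between 16 and 2,048 shards (the commit term dominates), while an 8 MB transaction barely changes (the DML term dominates), mirroring the scenario-4 observations.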
Scenario 5: High compute node workload
This scenario tests PolarDB-X stability when a single transaction drives high memory pressure on compute nodes through a large hash join.
The logical table tb is split into 16 shards and contains 67,108,864 (2^26) rows of CHAR(255) data (~16 GB). A second table tb2 has the same schema, data, and size. A temporary table tmp shares the same schema.
Each transaction runs:
```sql
INSERT INTO tmp
SELECT tb.id, tb.c
FROM tb, tb2
WHERE tb.c = tb2.c AND tb.id > x AND tb2.id > x
```
At x = 0, the hash join processes all data from both tables—32 GB in total on compute nodes with 16 GB of memory each.
The execution plan:
```
LogicalInsert(table="tmp", columns=RecordType(BIGINT id, CHAR c))
  Project(id="id", c="c")
    HashJoin(condition="c = c", type="inner")
      Gather(concurrent=true)
        LogicalView(tables="[000000-000015].tb_VjuI", shardCount=16,
          sql="SELECT `id`, `c` FROM `tb` AS `tb` WHERE (`id` > ?)")
      Gather(concurrent=true)
        LogicalView(tables="[000000-000015].tb2_IfrZ", shardCount=16,
          sql="SELECT `c` FROM `tb2` AS `tb2` WHERE (`id` > ?)")
```
Conclusion:
PolarDB-X handles high compute node workloads from large single transactions without errors. Transaction running time scales linearly with the amount of data processed.
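A back-of-envelope check makes the memory pressure concrete: at x = 0 the join reads nearly all rows from both ~16 GB tables, roughly 32 GB against 16 GB of memory per compute node. The helper below is an illustrative estimate using the test's ~256-byte rows, not a measurement of actual operator memory:

```python
def join_input_bytes(rows_per_table: int, row_bytes: int, x: int) -> int:
    """Bytes fed to the hash join when both tables are filtered by id > x
    (ids are sequential from 0 to rows_per_table - 1)."""
    matching = rows_per_table - 1 - x  # rows with id > x
    return 2 * matching * row_bytes    # both join inputs
```

At x = 0 with 2^26 rows of ~256 bytes each, the input is about 32 GB, double the memory of a single compute node; raising x shrinks the input linearly.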
Summary
| Rule | Detail |
|---|---|
| Per-statement size limit | Each SQL statement must fit within MAX_ALLOWED_PACKET (default 16 MB). This applies with or without an active transaction. |
| Per-shard binlog limit | Binlog data per shard per transaction cannot exceed MAX_BINLOG_CACHE_SIZE (default 2 GB, max 4 GB). |
| INSERT capacity | Total insertable data = number of shards × MAX_BINLOG_CACHE_SIZE. For INSERT-only transactions, the binlog records the inserted value only. |
| UPDATE capacity | Total updatable data = number of shards × (MAX_BINLOG_CACHE_SIZE ÷ 2). Binlog records both before-image and after-image, doubling the footprint. |
| Statement count | No performance bottleneck observed at up to 1.28 million statements per transaction. Data volume is the practical limit. |
| Shard count and commit overhead | More shards increase total capacity but also increase transaction commit time. For small-data transactions on many shards, commit time dominates. Size shards to match your actual data volumes. |
| Compute node stability | PolarDB-X delivers stable, linear performance under high compute node workloads caused by large single transactions. |