Use and limits of distributed sequences - PolarDB - Alibaba Cloud Documentation Center

Generating a unique identifier for each row in a table is a common requirement in database design, similar to the AUTO_INCREMENT feature in MySQL. In PolarDB for PostgreSQL Distributed Edition, you can use sequence objects with types such as SERIAL or BIGSERIAL to implement this feature. However, ensuring that IDs generated across all nodes are globally unique is a challenge in a distributed environment. PolarDB provides different implementation mechanisms to address this challenge. The behavior and limitations of these mechanisms differ based on the sequence type that you choose, which is primarily determined by its integer size.

Sequence type differences

Before you choose a sequence type for a distributed table, review the core differences in the following table to make an informed decision.

Attribute	64-bit sequences (`BIGSERIAL`)	16-bit/32-bit sequences (`SERIAL`, `SMALLSERIAL`)
Recommended use	Highly recommended for all distributed tables.	Not recommended for distributed tables, unless you can accept its limitations.
Implementation	Distributed generation: The ID consists of a `16-bit node ID` and a `48-bit sequence value`.	Centralized generation: ID generation requests must be routed to the primary compute node (primary CN).
ID generation	- Can be generated on any compute node (CN). - Globally unique, but not guaranteed to be consecutive.	- Can only be generated on the primary CN. - Consecutive when generated on the primary CN.
Core limitations	Generated ID values have large gaps when they are generated on different source nodes.	Attempting to generate an ID on any node other than the primary CN results in an error.

Detailed behavior and examples

64-bit sequences

This is the recommended way to generate unique IDs for distributed tables.

How it works

To safely generate IDs on all nodes, PolarDB for PostgreSQL Distributed Edition splits the 64-bit integer space. The high-order 16 bits embed the unique ID of the current node. The low-order 48 bits are used for the incrementing sequence value on that node. This structure ensures that IDs generated from different nodes never conflict.

Behavior analysis

Advantage: You can run INSERT operations on any compute node, such as a primary or secondary CN. This provides high availability and high throughput.
Note: Because the ID contains node information, inserting data on different nodes causes large jumps in the ID values. The values are not numerically consecutive.

Example

-- Create a distributed table that uses BIGSERIAL
CREATE TABLE student (id BIGSERIAL, name TEXT);
SELECT create_distributed_table('student', 'id');

-- (Connect to the primary CN) Insert two rows of data
INSERT INTO student (name) VALUES ('Alice'); -- id: 1
INSERT INTO student (name) VALUES ('Bob');   -- id: 2 (consecutive)

-- (Switch connection to a secondary CN) Insert another row of data
INSERT INTO student (name) VALUES ('Charlie'); -- id: 281474976710657 (a jump occurs)

-- View the results
=> SELECT * FROM student ORDER BY id;
        id         |  name
-------------------+---------
                 1 | Alice
                 2 | Bob
 281474976710657   | Charlie
(3 rows)

The ID 281474976710657 appears to be a very large number. Its binary representation reveals its structure: a 16-bit node ID (0000000000000001) followed by a 48-bit sequence value (000...0001). This represents the first ID generated on node 1.

16-bit/32-bit sequences

Because the integer space is limited, these types cannot embed a node ID for distributed generation. Therefore, they use a centralized generation strategy.

How it works

To ensure global uniqueness, all ID generation requests for 16-bit or 32-bit sequences must be routed to and processed by the primary compute node (primary CN).

Behavior analysis

Advantage: IDs generated on the primary CN are consecutive. This matches the behavior of a standalone database.
Limitation: You cannot run INSERT operations that trigger ID generation on any node other than the primary CN, such as a secondary CN or a data node (DN). This action causes an operation is not allowed on this node error.

Example

-- Create a distributed table that uses SERIAL
CREATE TABLE animal (id SERIAL, name TEXT);
SELECT create_distributed_table('animal', 'id');

-- (Connect to the primary CN) Insert data. The operation succeeds.
INSERT INTO animal (name) VALUES ('cat'); -- id: 1
SELECT * FROM animal;
 id | name
----+------
  1 | cat
(1 row)

-- (Switch connection to a secondary CN) Try to insert data. The operation fails.
INSERT INTO animal (name) VALUES ('dog');
ERROR:  nextval(sequence) calls in worker nodes are not supported for column defaults of type int or smallint
HINT:  If the command was issued from a worker node, try issuing it from the coordinator node instead.

The error message clearly states that calling nextval (to generate a sequence value) for int or smallint columns is not supported on worker nodes, such as a secondary CN.

Summary and best practices

Always use BIGSERIAL. When you design auto-incrementing IDs for distributed tables, choose BIGSERIAL or an IDENTITY column that is based on BIGINT.
Accept non-consecutive IDs. When you use BIGSERIAL, your business logic must not rely on strictly consecutive IDs. The uniqueness and monotonic increase of IDs are guaranteed on a single node.
Use SERIAL with caution. Use SERIAL or SMALLSERIAL only if your business scenario can tolerate all write operations being routed through the primary CN and if write availability on secondary CNs is not a concern.