Generating a unique identifier for each row in a table is a common requirement in database design, similar to the AUTO_INCREMENT feature in MySQL. In PolarDB for PostgreSQL Distributed Edition, you can use sequence objects with types such as SERIAL or BIGSERIAL to implement this feature. However, ensuring that IDs generated across all nodes are globally unique is a challenge in a distributed environment. PolarDB provides different implementation mechanisms to address this challenge. The behavior and limitations of these mechanisms differ based on the sequence type that you choose, which is primarily determined by its integer size.
Sequence type differences
Before you choose a sequence type for a distributed table, review the core differences in the following table to make an informed decision.
Attribute | 64-bit sequences ( | 16-bit/32-bit sequences ( |
Recommended use | Highly recommended for all distributed tables. | Not recommended for distributed tables, unless you can accept its limitations. |
Implementation | Distributed generation: The ID consists of a | Centralized generation: ID generation requests must be routed to the primary compute node (primary CN). |
ID generation | - Can be generated on any compute node (CN). - Globally unique, but not guaranteed to be consecutive. | - Can only be generated on the primary CN. - Consecutive when generated on the primary CN. |
Core limitations | Generated ID values have large gaps when they are generated on different source nodes. | Attempting to generate an ID on any node other than the primary CN results in an error. |
Detailed behavior and examples
64-bit sequences
This is the recommended way to generate unique IDs for distributed tables.
How it works
To safely generate IDs on all nodes, PolarDB for PostgreSQL Distributed Edition splits the 64-bit integer space. The high-order 16 bits embed the unique ID of the current node. The low-order 48 bits are used for the incrementing sequence value on that node. This structure ensures that IDs generated from different nodes never conflict.
Behavior analysis
Advantage: You can run
INSERToperations on any compute node, such as a primary or secondary CN. This provides high availability and high throughput.Note: Because the ID contains node information, inserting data on different nodes causes large jumps in the ID values. The values are not numerically consecutive.
Example
-- Create a distributed table that uses BIGSERIAL
CREATE TABLE student (id BIGSERIAL, name TEXT);
SELECT create_distributed_table('student', 'id');
-- (Connect to the primary CN) Insert two rows of data
INSERT INTO student (name) VALUES ('Alice'); -- id: 1
INSERT INTO student (name) VALUES ('Bob'); -- id: 2 (consecutive)
-- (Switch connection to a secondary CN) Insert another row of data
INSERT INTO student (name) VALUES ('Charlie'); -- id: 281474976710657 (a jump occurs)
-- View the results
=> SELECT * FROM student ORDER BY id;
id | name
-------------------+---------
1 | Alice
2 | Bob
281474976710657 | Charlie
(3 rows)The ID 281474976710657 appears to be a very large number. Its binary representation reveals its structure: a 16-bit node ID (0000000000000001) followed by a 48-bit sequence value (000...0001). This represents the first ID generated on node 1.
16-bit/32-bit sequences
Because the integer space is limited, these types cannot embed a node ID for distributed generation. Therefore, they use a centralized generation strategy.
How it works
To ensure global uniqueness, all ID generation requests for 16-bit or 32-bit sequences must be routed to and processed by the primary compute node (primary CN).
Behavior analysis
Advantage: IDs generated on the primary CN are consecutive. This matches the behavior of a standalone database.
Limitation: You cannot run
INSERToperations that trigger ID generation on any node other than the primary CN, such as a secondary CN or a data node (DN). This action causes anoperation is not allowed on this nodeerror.
Example
-- Create a distributed table that uses SERIAL
CREATE TABLE animal (id SERIAL, name TEXT);
SELECT create_distributed_table('animal', 'id');
-- (Connect to the primary CN) Insert data. The operation succeeds.
INSERT INTO animal (name) VALUES ('cat'); -- id: 1
SELECT * FROM animal;
id | name
----+------
1 | cat
(1 row)
-- (Switch connection to a secondary CN) Try to insert data. The operation fails.
INSERT INTO animal (name) VALUES ('dog');
ERROR: nextval(sequence) calls in worker nodes are not supported for column defaults of type int or smallint
HINT: If the command was issued from a worker node, try issuing it from the coordinator node instead.The error message clearly states that calling nextval (to generate a sequence value) for int or smallint columns is not supported on worker nodes, such as a secondary CN.
Summary and best practices
Always use
BIGSERIAL. When you design auto-incrementing IDs for distributed tables, chooseBIGSERIALor anIDENTITYcolumn that is based onBIGINT.Accept non-consecutive IDs. When you use
BIGSERIAL, your business logic must not rely on strictly consecutive IDs. The uniqueness and monotonic increase of IDs are guaranteed on a single node.Use
SERIALwith caution. UseSERIALorSMALLSERIALonly if your business scenario can tolerate all write operations being routed through the primary CN and if write availability on secondary CNs is not a concern.