Custom data replication key - PolarDB - Alibaba Cloud Documentation Center

By default, IMCI uses the primary key to synchronize data updates and deletions. However, this strategy is inefficient if a table lacks a primary key or has an unordered one, such as a UUID. These scenarios cause excessive random I/O during data synchronization, which degrades performance. This guide explains how to define a custom replication key to improve data synchronization performance in these scenarios.

Prerequisites

The table must have a column with monotonically increasing values in physical storage.

Considerations

Before you begin, consider the following:

Must be based on a unique index: A custom replication key must be based on a unique index, specified with the USING INDEX clause.
Index columns must be non-null: All columns in the unique index for the replication key must be defined as NOT NULL.
Use an ordered index for optimal performance: For optimal performance, we strongly recommend building the unique index on a monotonically increasing column, such as an auto-incrementing column of the SERIAL or BIGSERIAL type.

How it works

`REPLICA IDENTITY` mechanism

The REPLICA IDENTITY property determines what information is written to the WAL (write-ahead log) during logical replication to identify changed rows. IMCI uses this mechanism to locate and update data in the columnar store. REPLICA IDENTITY supports the following four modes:

Mode	Description	Use case
`DEFAULT`	Uses the primary key as the replication identity.	The default and recommended mode. No change is needed if the table has a primary key.
`USING INDEX`	Uses a specified unique index as the replication identity.	Specify an ordered unique index to improve efficiency when the table lacks a primary key or has an unordered primary key.
`FULL`	Writes the values of all columns in the row to the WAL.	Use only when a table has neither a primary key nor a unique index but must support UPDATE or DELETE operations.
`NOTHING`	Records no identifying information.	Use only for insert-only tables that do not support updates or deletes.

Ordered unique index as replication key

The core principle is to use the USING INDEX mode to set the replication key to a unique index on an ordered column, such as an auto-incrementing column.

The workflow is as follows:

Create an ordered unique index: Create a unique index on a physically ordered column, such as an auto-incrementing ID of the SERIAL type.
Change the replication identity: Run the ALTER TABLE ... REPLICA IDENTITY USING INDEX ... command to switch the table's replication identity to the newly created unique index.
Efficient logical decoding: During logical replication, the system efficiently decodes the ordered index key values from the WAL.
Accelerate data synchronization: IMCI uses these ordered key values to locate and update data in the columnar store. Because the key values are ordered, operations become efficient sequential I/O, which significantly boosts data synchronization performance.

Procedure

Step 1: Create a unique index

Create a unique index on the ordered auto-incrementing column.

-- Assuming table t has an auto-incrementing column named serial_id
CREATE UNIQUE INDEX idx_t_serial_id ON t(serial_id);

Step 2: Change the replication identity
Switch the table's REPLICA IDENTITY to the unique index created in the previous step.
```
ALTER TABLE t REPLICA IDENTITY USING INDEX idx_t_serial_id;
```
Step 3: Create or rebuild the columnar store index
Ensure the columnar store index (CSI) includes the column used for the replication key. If a columnar store index already exists, we recommend rebuilding it for optimal performance.
```
-- Create a columnar store index that includes all columns
CREATE INDEX ON t USING csi;
```
Step 4: Compare performance
We conducted an experiment to verify the performance improvement.
- Test setup:
  1. Create a table with an unordered primary key (random_id) and an ordered auto-incrementing column (serial_id).
```
create table t (random_id text primary key, serial_id serial, a text);
```
  2. Insert 10 million rows of data.
```
insert into t(random_id, a)
  select md5(i::text), i::text
    from generate_series(1, 10000000) i;
```
  3. Create a columnar store index and an ordered unique index, idx_a, on the serial_id column.
```
create unique index idx_a on t(serial_id);
create index on t using csi;
```
- Test procedure:
  1. Default mode test (using the unordered primary key random_id as the replication key):
```
-- Update 10,000 rows
UPDATE t SET a = '0' WHERE serial_id <= 10000;
```
  2. Custom mode test (switching the replication key to the ordered unique index idx_a):
```
ALTER TABLE t REPLICA IDENTITY USING INDEX idx_a;
-- Update another 10,000 rows
UPDATE t SET a = '1' WHERE serial_id <= 10000;
```
- Test results:
  Replication key type
  Default (unordered primary key)
  Custom (ordered unique index)
  Synchronization time (10,000 rows)
  ~0.5 seconds
  ~0.01 seconds

Conclusion

The approximately 50-fold improvement in performance demonstrates that a custom, ordered replication key is critical for optimizing IMCI data synchronization efficiency.

Replication key type	Default (unordered primary key)	Custom (ordered unique index)
Synchronization time (10,000 rows)	~0.5 seconds	~0.01 seconds