The wal2json plugin decodes write-ahead log (WAL) changes into JSON format, making it straightforward to consume database change events in data pipelines, change data capture (CDC) systems, auditing tools, and event-driven services.
Prerequisites
Before you begin, make sure that you have:
A PolarDB for PostgreSQL cluster running a supported minor engine version
The
wal_levelparameter set tological. Set this in the console under Set cluster parameters. This change requires a cluster restart.
Applicability
The following minor engine versions support wal2json:
| PostgreSQL version | Minimum minor engine version |
|---|---|
| PostgreSQL 18 | 2.0.18.1.1.0 |
| PostgreSQL 17 | 2.0.17.7.5.0 |
| PostgreSQL 16 | 2.0.16.6.2.0 |
| PostgreSQL 15 | 2.0.15.12.4.0 |
| PostgreSQL 14 | 2.0.14.5.1.0 |
| PostgreSQL 11 | 2.0.11.9.29.0 |
To check your version, run SHOW polardb_version; or view it in the console. If your version is not supported, upgrade the minor engine version.
How it works
wal2json is a logical decoding output plugin for PostgreSQL. Rather than using the default pgoutput format, it converts WAL entries into JSON, which makes WAL changes easy to integrate with systems that consume JSON data.
For each transaction, wal2json generates one JSON object containing all changed tuples (rows). You can include additional metadata in the output — such as transaction IDs, timestamps, schemas, and data types — using the parameters described below.
wal2json supports two ways to consume changes:
SQL API — Use
pg_logical_slot_get_changes()to pull changes on demand. This approach works well for batch processing and one-off queries.Streaming protocol — Use logical replication slots with a streaming consumer (for example,
pg_recvlogical). This approach suits real-time CDC pipelines.
The plugin handles the following DML operations: INSERT, UPDATE, DELETE, and TRUNCATE (format version 2 only).
Usage notes
Full row logging by default
PolarDB for PostgreSQL sets REPLICA_IDENTITY_FULL on all tables by default. This means every UPDATE and DELETE writes the full row to the WAL — not just the changed columns. To log only changed columns, disable the polar_create_table_with_full_replica_identity parameter. This parameter cannot be changed in the console. Contact us for assistance.
Tables without a primary key
For tables without a primary key, wal2json relies on REPLICA_IDENTITY to determine what row information is included in UPDATE and DELETE output. With the default REPLICA_IDENTITY_FULL setting on PolarDB for PostgreSQL, full row data is logged for all tables, including those without a primary key.
Retrieve changes using SQL
wal2json is not installed with CREATE EXTENSION. Load it through a logical replication slot instead.
The following example uses an orders table to show how wal2json captures INSERT and UPDATE operations:
Create the table and a logical replication slot:
-- Create a sample table CREATE TABLE orders ( order_id SERIAL PRIMARY KEY, product_name VARCHAR(100), quantity INT, status TEXT ); -- Create a logical replication slot using the wal2json plugin SELECT 'init' FROM pg_create_logical_replication_slot('orders_slot', 'wal2json');Make changes to the table:
BEGIN; INSERT INTO orders (product_name, quantity, status) VALUES ('Laptop', 2, 'pending'); INSERT INTO orders (product_name, quantity, status) VALUES ('Mouse', 5, 'pending'); UPDATE orders SET status = 'confirmed' WHERE product_name = 'Laptop'; COMMIT;Retrieve the changes as JSON:
SELECT data FROM pg_logical_slot_get_changes('orders_slot', NULL, NULL, 'pretty-print', '1');The output looks like this:
{ "change": [ { "kind": "insert", "schema": "public", "table": "orders", "columnnames": ["order_id", "product_name", "quantity", "status"], "columntypes": ["integer", "character varying(100)", "integer", "text"], "columnvalues": [1, "Laptop", 2, "pending"] }, { "kind": "insert", "schema": "public", "table": "orders", "columnnames": ["order_id", "product_name", "quantity", "status"], "columntypes": ["integer", "character varying(100)", "integer", "text"], "columnvalues": [2, "Mouse", 5, "pending"] }, { "kind": "update", "schema": "public", "table": "orders", "columnnames": ["order_id", "product_name", "quantity", "status"], "columntypes": ["integer", "character varying(100)", "integer", "text"], "columnvalues": [1, "Laptop", 2, "confirmed"], "oldkeys": { "keynames": ["order_id"], "keytypes": ["integer"], "keyvalues": [1] } } ] }When you are done, drop the replication slot:
-- Returns 'stop' on success SELECT 'stop' FROM pg_drop_replication_slot('orders_slot');
Parameters
All parameters are passed as key-value pairs to pg_logical_slot_get_changes(). For example:
SELECT data FROM pg_logical_slot_get_changes(
'my_slot', NULL, NULL,
'pretty-print', '1',
'include-xids', '1'
);Output content
| Parameter | Default | Description |
|---|---|---|
include-xids | false | Add the transaction ID (xid) to each changeset |
include-timestamp | false | Add a timestamp to each changeset |
include-lsn | false | Add the next LSN (nextlsn) to each changeset |
include-schemas | true | Add the schema name to each change |
include-types | true | Add data types to each change |
include-typmod | true | Add type modifiers — for example, varchar(20) instead of varchar |
include-type-oids | false | Add type OIDs to each change |
include-not-null | false | Add not null constraint information as columnoptionals |
pretty-print | false | Format JSON with whitespace and indentation |
write-in-chunks | false | Emit output after each change rather than after each full changeset |
Filtering
| Parameter | Default | Description |
|---|---|---|
filter-tables | (none) | Exclude specific tables. Separate multiple entries with commas, and include the schema name (for example, public.orders). Use *.foo to match foo in all schemas, or bar.* to match all tables in schema bar. Escape special characters (space, single quote, comma, period, asterisk) with a backslash. Schema and table names are case-sensitive. For example, the table Foo bar in the public schema is referenced as public.Foo\ bar. |
add-tables | all tables | Decode only the specified tables. Same syntax as filter-tables. |
filter-msg-prefixes | (none) | Exclude rows with the specified message prefixes. Separate multiple prefixes with commas. Typically used with pg_logical_slot_peek_changes(). |
add-msg-prefixes | all | Include only rows with the specified message prefixes. Apply filter-msg-prefixes before this parameter. |
Format and operations
| Parameter | Default | Description |
|---|---|---|
format-version | 1 | Output format version. Version 1 produces one JSON object per transaction — use this for batch processing. Version 2 produces one JSON object per row with optional transaction markers ({"action":"B"} / {"action":"C"}) — use this when you need to process changes row by row. |
actions | all | Operations to include in output: INSERT, UPDATE, DELETE, TRUNCATE. TRUNCATE is only available in format version 2. |
wal2json parameter glossary
| Term | Description |
|---|---|
change | A single WAL entry for one DML operation (INSERT, UPDATE, DELETE, or TRUNCATE) |
changeset | A collection of change entries belonging to one transaction |
Example: using include-xids
This example shows how to pass parameters to retrieve transaction IDs alongside change data.
Create a table, a replication slot, and insert a row:
DROP TABLE IF EXISTS tbl; CREATE TABLE tbl (id int); SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'wal2json'); INSERT INTO tbl VALUES (1);Query the slot with
include-xidsenabled and verify that exactly one distinct transaction ID appears:SELECT count(*) = 1, count(distinct ((data::json)->'xid')::text) = 1 FROM pg_logical_slot_get_changes( 'regression_slot', NULL, NULL, 'format-version', '1', 'include-xids', '1' );
What's next
Set cluster parameters — enable
wal_level = logicalwal2json on GitHub — upstream documentation and design principles