Hologres binary logging (binlog) records every data change event at the table level, letting you replicate data to downstream systems, synchronize databases, or feed change events into stream-processing pipelines. When combined with Flink, binlog consumption enables end-to-end real-time data transformation across warehouse layers—from the operation data store (ODS) to the data warehouse dimension (DWD) layer and beyond—at lower latency than traditional batch-based approaches.
This topic covers how to enable binlogs on a table, understand the binlog record structure, query binlog data directly, and consume binlog events in real time.
How binlogs work in Hologres
Hologres binary logging is modeled after MySQL's binlog mechanism but adapted for a distributed real-time data warehouse:
-
Table-level control: Enable or disable binlogs per table, with an independent Time to Live (TTL) per table.
-
DML-only recording: Binlogs capture INSERT, UPDATE, and DELETE events. DDL operations are not logged.
-
Distributed binlogs: Because Hologres is a distributed system, binlog data is spread across shards. The log sequence number (LSN) increases monotonically within a shard but is not guaranteed to be continuous, unique, or ordered across shards. See Distribution Key.
-
Directly queryable: Query binlog data with SQL—useful for auditing, debugging, and offset inspection.
-
Not used for high availability: Unlike traditional databases, Hologres binlogs are not used for primary-replica synchronization or data restoration.
Prerequisites
Before you begin, make sure that:
-
Your Hologres instance is V0.9 or later. If it is earlier, contact Hologres support. See How do I get more online support?.
-
Your account has the permissions required to consume binlogs:
| Version | Method | Required permission |
|---|---|---|
| V1.3.14 / V1.1.82 and later | Flink | SELECT on the target table |
| V1.3.14 / V1.1.82 and later | Java Database Connectivity (JDBC) | Replication Role |
| Earlier versions | Any | Superuser only. Accounts with fewer permissions encounter: permission denied for table hg_replication_slot_properties |
Limitations
| Limitation | Details |
|---|---|
| Existing tables (V0.9 and V0.10) | Cannot enable binlogs on existing tables by modifying properties. Recreate the table. In V1.1 and later, you can enable binlogs on existing tables without recreating them. |
| Partitioned tables | Consume binlogs from child tables or standard (non-partitioned) tables only. Consumption from a parent table is not supported. In V1.3.24 and later, you can set a binlog TTL per child table. A child table with no TTL set inherits the parent table's TTL. |
| TTL precision | The TTL is not precise. The system does not delete expired binlogs immediately—deletion occurs at some point after expiration. |
| Foreign tables | Only Hologres internal tables support binlogs. Foreign tables do not. |
| Blink data types | When using Realtime Compute (Blink) to consume binlogs, the TIMESTAMP type is not supported. Use TIMESTAMPTZ when creating tables. SMALLINT is also not supported. |
| Column-oriented tables with frequent updates | Enabling binlogs on a column-oriented table has higher overhead than on a row-oriented table for write-heavy workloads. For tables that also serve online analytical processing (OLAP) queries, use the row-column hybrid storage format. See Table storage formats. |
Binlog support matrix
| Consumer | Row-oriented table | Column-oriented table | Row-column hybrid table (V1.1+) |
|---|---|---|---|
| Realtime Compute (Blink) | Supported | Supported | Supported |
| Fully Managed Flink | Supported | Supported | Supported |
| Open source Flink | Not supported | Not supported | Not supported |
| JDBC | Supported (V1.1+) | Supported (V1.1+) | Supported (V1.1+) |
Enable binlogs
Binlogs are disabled by default. Enable them by setting the binlog.level and binlog.ttl table properties when you create a table, or on an existing table in V1.1 and later.
Create a table with binlogs enabled
V2.1 and later (property names updated to binlog_level and binlog_ttl):
CREATE TABLE test_message_src (
id int PRIMARY KEY,
title text NOT NULL,
body text
)
WITH (
orientation = 'row',
clustering_key = 'id',
binlog_level = 'replica',
binlog_ttl = '86400' -- TTL in seconds (24 hours)
);
All versions:
BEGIN;
CREATE TABLE test_message_src (
id int PRIMARY KEY,
title text NOT NULL,
body text
);
CALL set_table_property('test_message_src', 'orientation', 'row'); -- Row-oriented storage
CALL set_table_property('test_message_src', 'clustering_key', 'id'); -- Clustered index on id
CALL set_table_property('test_message_src', 'binlog.level', 'replica'); -- Enable binlogs
CALL set_table_property('test_message_src', 'binlog.ttl', '86400'); -- TTL in seconds (24 hours)
COMMIT;
Parameters:
| Parameter | Description |
|---|---|
binlog_level / binlog.level |
Whether to enable binlogs. replica = enabled; none = disabled. |
binlog_ttl / binlog.ttl |
Retention period for binlog data, in seconds. Default: 2,592,000 seconds (30 days). |
For full CREATE TABLE syntax, see CREATE TABLE.
Enable or disable binlogs on an existing table
These operations require V1.1 or later.
Enable binlogs:
-- Enable binlogs
BEGIN;
CALL set_table_property('<table_name>', 'binlog.level', 'replica');
COMMIT;
-- Set binlog TTL (in seconds)
BEGIN;
CALL set_table_property('<table_name>', 'binlog.ttl', '2592000');
COMMIT;
Disable binlogs:
BEGIN;
CALL set_table_property('<table_name>', 'binlog.level', 'none');
COMMIT;
Update binlog TTL:
CALL set_table_property('<table_name>', 'binlog.ttl', '8640000'); -- Unit: seconds
In V1.3.24 and later, you can set a TTL on a child table of a partitioned table. A child table with no TTL set inherits the parent table's TTL.
Replace <table_name> with the actual table name.
Understand the binlog record format
Each binlog record contains three system fields followed by all user-defined table columns, in the same column order as the original table's DDL.
| Field | Type | Description |
|---|---|---|
hg_binlog_lsn |
BIGINT | Binlog serial number. Increases monotonically within a shard. Not guaranteed to be continuous, unique, or ordered across shards. |
hg_binlog_event_type |
BIGINT | Type of change event. See event type values below. |
hg_binlog_timestamp_us |
BIGINT | System timestamp of the change event, in microseconds. |
| User table columns | User-defined | All columns from the source table, in DDL order. |
hg_binlog_event_type values
| Value | Name | Meaning |
|---|---|---|
| 5 | INSERT | A row was inserted. |
| 2 | DELETE | A row was deleted. |
| 3 | BEFORE_UPDATE | The row state before an UPDATE. |
| 7 | AFTER_UPDATE | The row state after an UPDATE. |
How UPDATE events are recorded
An UPDATE operation produces two consecutive binlog records: the BEFORE_UPDATE record comes first, immediately followed by the AFTER_UPDATE record. The binlog subscription feature guarantees these two records are consecutive.
BEFORE_UPDATE (3) and AFTER_UPDATE (7) appear only when an UPDATE is executed as a plain SQL statement. When UPDATE is performed through a Hologres connector—such as Holo Client, Flink Connector, or Data Integration—the connector translates BEFORE_UPDATE to DELETE (2) and AFTER_UPDATE to INSERT (5) to ensure eventual consistency.
Structurally, a Hologres binlog behaves like a row-oriented table where hg_binlog_lsn is the key, and hg_binlog_event_type, hg_binlog_timestamp_us, and all original table columns form the value. Because binlog fields are fixed and strongly typed, row-oriented or row-column hybrid tables deliver better read performance when binlogs are enabled.
Query binlogs
Query binlog records directly
Combine the three system fields with the table's original columns in a SELECT statement:
SELECT hg_binlog_lsn, hg_binlog_event_type, hg_binlog_timestamp_us, *
FROM test_message_src;
The following figure shows a sample result.
Query the earliest or latest binlog on a shard
Use hg_get_binlog_cursor to retrieve the first or last binlog entry on a specific shard:
-- Earliest binlog on shard 0
SELECT * FROM hg_get_binlog_cursor('test_message_src', 'OLDEST', 0);
-- Latest binlog on shard 0
SELECT * FROM hg_get_binlog_cursor('test_message_src', 'LATEST', 0);
Syntax:
SELECT * FROM hg_get_binlog_cursor('<table_name>', 'OLDEST' | 'LATEST', <shard_id>);
The following figure shows a sample result for the OLDEST query.
Query binlog position by LSN
Use hg_get_binlog_cursor_by_lsn to find the first binlog record whose LSN is greater than or equal to the specified value. If no such record exists, hg_binlog_timestamp_us in the result returns the current time.
SELECT * FROM hg_get_binlog_cursor_by_lsn('test_message_src', 152, 0);
Syntax:
SELECT * FROM hg_get_binlog_cursor_by_lsn('<table_name>', <lsn>, <shard_id>);
-- <lsn>: BIGINT
The following figure shows a sample result.
Query binlog position by timestamp
Use hg_get_binlog_cursor_by_timestamp to find the first binlog record with a timestamp greater than or equal to the specified time. If the specified time is later than the latest binlog entry, hg_binlog_timestamp_us returns the current time and hg_binlog_lsn returns the LSN that will be assigned to the next inserted row.
If the provided timestamp is later than the current time returned bynow(), the query throws aGet binlog cursor in future timeexception.
SELECT *, to_timestamp(hg_binlog_timestamp_us / 1000000.0)
FROM hg_get_binlog_cursor_by_timestamp('test_message_src', '2024-05-20 19:34:53.791+08', 0);
Syntax:
SELECT * FROM hg_get_binlog_cursor_by_timestamp('<table_name>', <timestamp>, <shard_id>);
The following figure shows a sample result.
Consume binlogs in real time
Choose a consumption method based on your environment:
| Method | When to use |
|---|---|
| Flink or Realtime Compute (Blink) | Stream-processing pipelines. Supported for row-oriented, column-oriented, and row-column hybrid tables. |
| JDBC (including Holo Client) | Custom consumers and application-level change data capture (CDC). Requires V1.1+ and the Replication Role. |
-
For Flink or Blink, see Consume Hologres binlogs in real time using Flink or Blink.
-
For JDBC, see Consume Hologres binlogs using JDBC.
Manage binlog tables
View tables with binlogs enabled
Run the following query to list all tables with binlogs enabled in the current database:
SELECT *
FROM hologres.hg_table_properties
WHERE property_key = 'binlog.level'
AND property_value = 'replica';
The following figure shows a sample result.
View binlog storage size
-
All versions: Use
pg_relation_sizeto get total table storage, which includes binlog storage. See View the storage size of a table. -
V2.1 and later: Use
hologres.hg_relation_sizeto view a storage breakdown by type—data, binlogs, and other components. See View table storage details.
Suppress binlog generation for a session
To run DML statements without generating binlog records, set the following Grand Unified Configuration (GUC) parameter at the session level before executing the DML statement:
SET hg_experimental_generate_binlog = off;
This setting applies only to the current session and must be set before each DML statement for which you want to suppress binlog output.