How to configure Hologres data synchronization - DataWorks

Supported sync task types

Task type	Supported sources	Configuration guide
Single-table offline	All Data Integration sources	Codeless UI / Script Mode
Single-table real-time	DataHub, Hologres, Kafka, LogHub	Configure a real-time synchronization task for a single table
Full-database offline	AnalyticDB for MySQL 3.0, ClickHouse, Doris, Hologres, Oracle, PolarDB, SQL Server	Configure a full-database offline synchronization task
Full-database real-time	AnalyticDB for OceanBase, MongoDB, MySQL, Oracle, PolarDB, PolarDB-X 2.0, PostgreSQL	Configure a real-time synchronization task for an entire database
Serverless full-database real-time	MySQL	Configure a Serverless synchronization task

For frequently asked questions about real-time sync, see FAQ about real-time synchronization to Hologres.

Limitations

Offline read and write

The Hologres writer does not support writing to Hologres foreign tables.
The endpoint selection logic depends on whether the Hologres instance is in the same region as the task:
- Same region: any Tunnel > Single Tunnel > Public
- Different region: Public > Single Tunnel

Single-table real-time read

Hologres version 2.1 or later is required.
Incremental synchronization is not supported for partitioned tables.
Data Definition Language (DDL) change messages from Hologres tables are not supported.
Enable Hologres Binlog on the source table before configuring a real-time sync task. See Subscribe to Hologres Binlog.
Supported data types for incremental sync: INTEGER, BIGINT, TEXT, CHAR(n), VARCHAR(n), REAL, JSON, SERIAL, OID, INT4[], INT8[], FLOAT8[], BOOLEAN[], TEXT[].

Full-database real-time write

Tables without a primary key are not supported.
When syncing from MySQL to Hologres, write data only to child tables of a partitioned table, not the parent table.

Supported field types

UUID is not supported for offline read, offline write, or real-time write. All other types listed below are fully supported across all three operations.

Type	Offline read	Offline write	Real-time write
UUID	Not supported	Not supported	Not supported
CHAR	Supported	Supported	Supported
NCHAR	Supported	Supported	Supported
VARCHAR	Supported	Supported	Supported
LONGVARCHAR	Supported	Supported	Supported
NVARCHAR	Supported	Supported	Supported
LONGNVARCHAR	Supported	Supported	Supported
CLOB	Supported	Supported	Supported
NCLOB	Supported	Supported	Supported
SMALLINT	Supported	Supported	Supported
TINYINT	Supported	Supported	Supported
INTEGER	Supported	Supported	Supported
BIGINT	Supported	Supported	Supported
NUMERIC	Supported	Supported	Supported
DECIMAL	Supported	Supported	Supported
FLOAT	Supported	Supported	Supported
REAL	Supported	Supported	Supported
DOUBLE	Supported	Supported	Supported
TIME	Supported	Supported	Supported
DATE	Supported	Supported	Supported
TIMESTAMP	Supported	Supported	Supported
BINARY	Supported	Supported	Supported
VARBINARY	Supported	Supported	Supported
BLOB	Supported	Supported	Supported
LONGVARBINARY	Supported	Supported	Supported
BOOLEAN	Supported	Supported	Supported
BIT	Supported	Supported	Supported
JSON	Supported	Supported	Supported
JSONB	Supported	Supported	Supported

How it works

Offline read

The Hologres reader uses PSQL to read data from a Hologres table. It launches multiple concurrent SELECT tasks based on the table's shard count, filtering rows per shard using the built-in hg_shard_id column.

The shard count is set when the table is created:

CALL set_table_property('table_name', 'shard_count', 'xx');

Run this command within the same CREATE TABLE transaction. If not specified, the table inherits the database's default shard count, which depends on your instance configuration.

Offline write

The Hologres writer uses the conflictMode parameter to resolve primary key conflicts during writes. conflictMode applies only to tables with a primary key.

`conflictMode` value	Behavior
`Replace`	Overwrites the entire row. Columns not mapped in the task are set to `NULL`.
`Update`	Updates only the mapped columns. Unmapped columns retain their existing values.
`Ignore`	Discards the incoming row if a conflict exists.

For details on write performance, see Technical principles.

Appendix: Script Mode parameters and examples

Use Script Mode to configure a batch synchronization task with the code editor. For general Script Mode setup, see Use the code editor.

Reader script

Both examples use the holo step type with the Reader category. The key difference is the partition parameter, which is required for partitioned tables.

Non-partitioned table

{
    "transform": false,
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "holo",
            "parameter": {
                "datasource": "holo_db",
                "envType": 1,
                "column": [
                    "tag",
                    "id",
                    "title",
                    "body"
                ],
                "where": "",
                "table": "holo_reader_basic_src"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {
                "print": false,
                "fieldDelimiter": ","
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "executeMode": null,
        "failoverEnable": null,
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "concurrent": 2,
            "throttle": false
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Corresponding Hologres DDL:

BEGIN;
DROP TABLE IF EXISTS holo_reader_basic_src;
CREATE TABLE holo_reader_basic_src(
  tag text NOT NULL,
  id int NOT NULL,
  title text NOT NULL,
  body text,
  PRIMARY KEY (tag, id));
  CALL set_table_property('holo_reader_basic_src', 'orientation', 'column');
  CALL set_table_property('holo_reader_basic_src', 'shard_count', '3');
COMMIT;

Partitioned table

Set the partition parameter to specify which child table to read from. The format is column=value and must match the partition key defined in the DDL.

{
    "transform": false,
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "holo",
            "parameter": {
                "selectedDatabase": "public",
                "partition": "tag=foo",
                "datasource": "holo_db",
                "envType": 1,
                "column": [
                    "tag",
                    "id",
                    "title",
                    "body"
                ],
                "tableComment": "",
                "where": "",
                "table": "public.holo_reader_basic_part_src"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Corresponding Hologres DDL:

BEGIN;
DROP TABLE IF EXISTS holo_reader_basic_part_src;
CREATE TABLE holo_reader_basic_part_src(
  tag text NOT NULL,
  id int NOT NULL,
  title text NOT NULL,
  body text,
  PRIMARY KEY (tag, id))
  PARTITION BY LIST(tag);
  CALL set_table_property('holo_reader_basic_part_src', 'orientation', 'column');
  CALL set_table_property('holo_reader_basic_part_src', 'shard_count', '3');
COMMIT;

CREATE TABLE holo_reader_basic_part_src_1583161774228
  PARTITION OF holo_reader_basic_part_src FOR VALUES IN ('foo');

Reader script parameters

Parameter	Required	Default	Description
`database`	Yes	—	The Hologres database name.
`table`	Yes	—	The table name. For a partitioned table, specify the parent table name.
`column`	Yes	—	The columns to read. Use `["*"]` to read all columns.
`partition`	No	Empty (non-partitioned)	The partition filter in `column=value` format. Hologres supports only list partitioning; the partition key must be a single `INT4` or `TEXT` column. The specified child table must exist and contain data.

Writer script

Both examples write to Hologres using the holo step type with the Writer category. The key difference is the partition parameter for partitioned tables.

Non-partitioned table

This example writes from a MySQL database to a non-partitioned Hologres table using JDBC mode.

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "mysql",
            "parameter": {
                "envType": 0,
                "useSpecialSecret": false,
                "column": [
                    "<column1>",
                    "<column2>",
                    "<columnN>"
                ],
                "tableComment": "",
                "connection": [
                    {
                        "datasource": "<mysql_source_name>",
                        "table": [
                            "<mysql_table_name>"
                        ]
                    }
                ],
                "where": "",
                "splitPk": "",
                "encoding": "UTF-8"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "holo",
            "parameter": {
                "selectedDatabase": "public",
                "schema": "public",
                "maxConnectionCount": 9,
                "truncate": true,
                "datasource": "<holo_sink_name>",
                "conflictMode": "ignore",
                "envType": 0,
                "column": [
                    "<column1>",
                    "<column2>",
                    "<columnN>"
                ],
                "tableComment": "",
                "table": "<holo_table_name>",
                "reShuffleByDistributionKey": false
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "executeMode": null,
        "errorLimit": {
            "record": "0"
        },
        "locale": "zh_CN",
        "speed": {
            "concurrent": 2,
            "throttle": false
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Replace the placeholders with actual values:

Placeholder	Description
`<mysql_source_name>`	The name of the MySQL data source in DataWorks.
`<mysql_table_name>`	The source MySQL table name.
`<holo_sink_name>`	The name of the Hologres data source in DataWorks.
`<holo_table_name>`	The destination Hologres table name.
`<column1>` ... `<columnN>`	The column names to sync.

Corresponding Hologres DDL:

BEGIN;
DROP TABLE IF EXISTS mysql_to_holo_test;
CREATE TABLE mysql_to_holo_test(
  tag text NOT NULL,
  id int NOT NULL,
  body text NOT NULL,
  birth date,
  PRIMARY KEY (tag, id));
  CALL set_table_property('mysql_to_holo_test', 'orientation', 'column');
  CALL set_table_property('mysql_to_holo_test', 'distribution_key', 'id');
  CALL set_table_property('mysql_to_holo_test', 'clustering_key', 'birth');
COMMIT;

Partitioned table

Hologres supports only list partitioning. The partition key must be a single INT4 or TEXT column. The partition value must match the partition configuration in the table's DDL.

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "mysql",
            "parameter": {
                "envType": 0,
                "useSpecialSecret": false,
                "column": [
                    "<column1>",
                    "<column2>",
                    "<columnN>"
                ],
                "tableComment": "",
                "connection": [
                    {
                        "datasource": "<mysql_source_name>",
                        "table": [
                            "<mysql_table_name>"
                        ]
                    }
                ],
                "where": "",
                "splitPk": "<mysql_pk>",
                "encoding": "UTF-8"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "holo",
            "parameter": {
                "selectedDatabase": "public",
                "maxConnectionCount": 9,
                "partition": "<partition_key>",
                "truncate": "false",
                "datasource": "<holo_sink_name>",
                "conflictMode": "ignore",
                "envType": 0,
                "column": [
                    "<column1>",
                    "<column2>",
                    "<columnN>"
                ],
                "tableComment": "",
                "table": "<holo_table_name>",
                "reShuffleByDistributionKey": false
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "executeMode": null,
        "failoverEnable": null,
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "concurrent": 2,
            "throttle": false
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Replace the placeholders with actual values:

Placeholder	Description
`<mysql_source_name>`	The name of the MySQL data source in DataWorks.
`<mysql_table_name>`	The source MySQL table name.
`<mysql_pk>`	The primary key column of the MySQL table, used for data splitting.
`<partition_key>`	The partition key column of the Hologres table.
`<holo_sink_name>`	The name of the Hologres data source in DataWorks.
`<holo_table_name>`	The destination Hologres parent table name.
`<column1>` ... `<columnN>`	The column names to sync.

Corresponding Hologres DDL:

BEGIN;
CREATE TABLE public.hologres_parent_table(
  a text,
  b int,
  c timestamp,
  d text,
  ds text,
  PRIMARY KEY(ds, b))
  PARTITION BY LIST(ds);
CALL set_table_property('public.hologres_parent_table', 'orientation', 'column');
CREATE TABLE public.holo_child_1 PARTITION OF public.hologres_parent_table FOR VALUES IN('20201215');
CREATE TABLE public.holo_child_2 PARTITION OF public.hologres_parent_table FOR VALUES IN('20201216');
CREATE TABLE public.holo_child_3 PARTITION OF public.hologres_parent_table FOR VALUES IN('20201217');
COMMIT;

Writer script parameters

Parameter	Required	Default	Description
`database`	Yes	—	The Hologres database name.
`table`	Yes	—	The table name. Include the schema if needed: `schema_name.table_name`.
`conflictMode`	Yes	—	How to handle primary key conflicts: `Replace`, `Update`, or `Ignore`. See Offline write for details.
`column`	Yes	—	The destination columns to write to. Must include all primary key columns. Use `["*"]` to write to all columns.
`partition`	No	Empty (non-partitioned)	The partition filter in `column=value` format. Hologres supports only list partitioning; the partition key must be a single `INT4` or `TEXT` column.
`reShuffleByDistributionKey`	No	`false`	Routes data to specific shards based on the distribution key, enabling lock-free concurrent writes. Available only with a Serverless resource group.
`truncate`	No	`false`	Whether to truncate the destination table before writing. Set to `true` to truncate. For statically partitioned tables, only the specified child table is truncated; the parent table is not affected. Truncating dynamically partitioned tables is not supported and causes the task to fail.