The ClickHouse data source in DataWorks lets you read from and write to ClickHouse using batch synchronization tasks. This topic covers supported versions, limitations, data types, and script parameters.
Capabilities
| Capability | Supported | Notes |
|---|---|---|
| Batch read | Yes | Via JDBC statements only |
| Batch write | Yes | INSERT mode; max 1,000 TPS recommended |
| Column filtering and reordering | Yes | |
| Incremental synchronization | Yes | Use the where parameter |
| Full synchronization | Yes | Default when where is not set |
| Serverless Resource Groups | Yes | |
| Single-table batch synchronization | Yes | ApsaraDB for ClickHouse only |
| Full-database batch synchronization | Yes | |
Supported versions
The following table lists the supported ApsaraDB for ClickHouse kernel versions and their compatible Java Database Connectivity (JDBC) driver versions.
| JDBC driver version | ApsaraDB for ClickHouse kernel version |
|---|---|
| 0.2.4 | 20.8, 21.8 |
| 0.4.0, 0.4.2 | 22.8, 23.8, 25.3 |
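To confirm which JDBC driver version applies to your instance, you can check the kernel version directly. A minimal query, run from any ClickHouse SQL client:

```sql
-- Returns the ClickHouse kernel version, e.g. 23.8.x.x;
-- match the major version against the table above to pick a driver.
SELECT version();
```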
Limitations
- Only batch read and write operations are supported. Streaming and real-time modes are not available.
- Connections use JDBC; data reading is limited to JDBC statements.
- Single-table batch synchronization is supported only for ApsaraDB for ClickHouse.
- When you use ClickHouse Writer in INSERT mode, set the maximum throughput to 1,000 transactions per second (TPS) to avoid overloading ClickHouse.
Supported data types
The table below lists the ApsaraDB for ClickHouse data types supported by ClickHouse Reader and Writer. Data types not listed here are not supported. For the complete list of ApsaraDB for ClickHouse data types, see Data types. For the complete open-source list, see ClickHouse data types.
| Type | Reader | Writer | Notes |
|---|---|---|---|
| Int8 | Supported | Supported | |
| Int16 | Supported | Supported | |
| Int32 | Supported | Supported | |
| Int64 | Supported | Supported | |
| UInt8 | Supported | Supported | |
| UInt16 | Supported | Supported | |
| UInt32 | Supported | Supported | |
| UInt64 | Supported | Supported | |
| Float32 | Supported | Supported | |
| Float64 | Supported | Supported | |
| Decimal | Supported | Supported | |
| String | Supported | Supported | |
| FixedString | Supported | Supported | |
| Date | Supported | Supported | |
| DateTime | Supported | Supported | |
| DateTime64 | Supported | Supported | |
| Boolean | Supported | Supported | ClickHouse has no native Boolean type. Use UInt8 or Int8 as an alternative. |
| Array | Partially supported | Supported | Reader supports arrays of integer, floating-point, string, or DateTime64 (millisecond precision) elements only. |
| Tuple | Supported | Supported | |
| Domain (IPv4, IPv6) | Supported | Supported | |
| Enum8 | Supported | Supported | |
| Enum16 | Supported | Supported | |
| Nullable | Supported | Supported | |
| Nested | Partially supported | Supported | Reader supports nested types of integer, floating-point, string, or DateTime64 (millisecond precision) only. |
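As an illustration only, the following hypothetical DDL combines several of the supported types; the table and column names are placeholders, and UInt8 stands in for Boolean as noted above:

```sql
-- Hypothetical table using types supported by both Reader and Writer.
CREATE TABLE example_events (
    id UInt64,
    name String,
    score Nullable(Float64),
    is_active UInt8,            -- Boolean alternative (no native Boolean type)
    tags Array(String),         -- Reader supports arrays of basic element types
    created_at DateTime64(3)    -- millisecond precision
) ENGINE = MergeTree()
ORDER BY id;
```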
Add a data source
Add the ClickHouse data source to DataWorks before creating a synchronization task. Follow the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.
Data synchronization
Single-table batch synchronization
- To configure a task in the Codeless UI or the Code Editor, see Configure a task in the Codeless UI and Configure a task in the Code Editor.
- For the complete parameter reference and script samples, see Appendix: Script samples and parameters.
Full-database batch synchronization
Appendix: Script samples and parameters
The following sections provide script samples and parameter descriptions for configuring ClickHouse Reader and Writer in the Code Editor. For general Code Editor usage, see Use the code editor.
Reader script sample
```json
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "clickhouse",
      "parameter": {
        "datasource": "example",
        "table": "source_table",
        "column": [
          "id",
          "name"
        ],
        "where": "",
        "splitPk": "",
        "fetchSize": 1024
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
```
Reader parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| datasource | Yes | None | The name of the ClickHouse data source created in DataWorks. |
| table | Yes | None | The source table to read from. Must be specified in the connection configuration block. |
| column | Yes | None | The columns to read, specified as a JSON array, for example: ["id", "name", "age"]. Cannot be empty. |
| jdbcUrl | Yes | None | The JDBC connection string for the source database. Specified in the connection block; only one value per database. The format follows the official ClickHouse JDBC format, for example: jdbc:clickhouse://localhost:8123/test?user=default&password=. |
| username | Yes | None | The username for the data source. |
| password | Yes | None | The password for the data source. |
| fetchSize | No | 1024 | The number of records fetched per batch. A larger value reduces network round trips and improves extraction performance, but setting it too high can cause an out-of-memory (OOM) error. Increase the value gradually based on resource usage and the ClickHouse workload. Required when splitPk is configured. |
| splitPk | No | None | The shard key for parallel reading. When set, DataWorks shards the data by this field and runs concurrent tasks to improve throughput. fetchSize must be configured when this parameter is used. |
| where | No | None | A filter condition for incremental synchronization, for example: gmt_create>$bizdate. Without a where condition, a full synchronization runs. |
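The where, splitPk, and fetchSize parameters combine for incremental, parallel reads. A minimal sketch of such a reader step follows; the data source, table, column, and shard-key names are placeholders:

```json
{
  "stepType": "clickhouse",
  "parameter": {
    "datasource": "example",
    "table": "source_table",
    "column": ["id", "name", "gmt_create"],
    "where": "gmt_create>$bizdate",
    "splitPk": "id",
    "fetchSize": 1024
  },
  "name": "Reader",
  "category": "reader"
}
```

Here splitPk enables sharded, concurrent reads, the where condition restricts each run to the business date, and fetchSize (required alongside splitPk) controls the batch size per fetch.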
Writer script sample
```json
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "clickhouse",
      "parameter": {
        "datasource": "example_ch_writer",
        "column": [
          "id",
          "name"
        ],
        "table": "target_table",
        "preSql": [
          "TRUNCATE TABLE IF EXISTS target_table"
        ],
        "postSql": [
          "OPTIMIZE TABLE target_table FINAL"
        ],
        "batchSize": 1024,
        "writeMode": "insert"
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
```
Writer parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| datasource | Yes | None | The name of the ClickHouse data source created in DataWorks. |
| table | Yes | None | The target table to write to. Must be specified in the connection configuration block. |
| column | Yes | None | The target columns to write to, specified as a JSON array, for example: ["id", "name", "age"]. Cannot be empty. |
| jdbcUrl | Yes | None | The JDBC connection string for the target database. Specified in the connection block; only one value per database. For example: jdbc:clickhouse://127.0.0.1:8123/database. |
| password | Yes | None | The password for the data source. |
| preSql | No | None | One or more SQL statements, as a JSON array, to run before the write task starts. For example: TRUNCATE TABLE IF EXISTS target_table. |
| postSql | No | None | One or more SQL statements, as a JSON array, to run after the write task completes. For example: OPTIMIZE TABLE target_table FINAL. |
| batchSize | No | 1024 | The number of records written per batch. A larger value reduces the number of network interactions with ClickHouse and improves overall throughput. |
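The table and jdbcUrl parameters are described above as belonging to a connection configuration block. As a sketch only — the exact shape of this block can vary by DataWorks version, so verify it against a script generated in the console — a writer step with an explicit connection block might look like the following (data source details are placeholders):

```json
{
  "stepType": "clickhouse",
  "parameter": {
    "username": "default",
    "password": "********",
    "column": ["id", "name"],
    "connection": [
      {
        "jdbcUrl": "jdbc:clickhouse://127.0.0.1:8123/database",
        "table": ["target_table"]
      }
    ],
    "batchSize": 1024,
    "writeMode": "insert"
  },
  "name": "Writer",
  "category": "writer"
}
```

When a named datasource is configured instead, the connection details are resolved from the data source definition in DataWorks and the connection block is not written by hand.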