The ClickHouse data source in DataWorks lets you read from and write to ClickHouse using batch synchronization tasks. This topic covers supported versions, limitations, data types, and script parameters.
Capabilities
| Capability | Supported | Notes |
|---|---|---|
| Batch read | Yes | Via JDBC statements only |
| Batch write | Yes | INSERT mode; max 1,000 TPS recommended |
| Column filtering and reordering | Yes | |
| Incremental synchronization | Yes | Use the where parameter |
| Full synchronization | Yes | Default when where is not set |
| Serverless Resource Groups | Yes | |
| Single-table batch synchronization | Yes | ApsaraDB for ClickHouse only |
| Full-database batch synchronization | Yes | |
Supported versions
The following table lists the supported ApsaraDB for ClickHouse kernel versions and their compatible Java Database Connectivity (JDBC) driver versions.
| JDBC driver version | ApsaraDB for ClickHouse kernel version |
|---|---|
| 0.2.4 | 20.8, 21.8 |
| 0.4.0, 0.4.2 | 22.8, 23.8, 25.3 |
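To confirm which JDBC driver version applies to your instance, you can check the kernel version directly. A minimal query, run from any ClickHouse SQL client:

```sql
-- Returns the ClickHouse kernel version, e.g. 23.8.x.x;
-- match the major version against the table above to pick a driver.
SELECT version();
```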
Limitations
- Only batch read and write operations are supported. Streaming and real-time modes are not available.
- Connections use JDBC; data reading is limited to JDBC statements.
- Single-table batch synchronization is supported only for ApsaraDB for ClickHouse.
- When you use ClickHouse Writer in INSERT mode, set the maximum throughput to 1,000 transactions per second (TPS) to avoid overloading ClickHouse.
Supported data types
The table below lists the ApsaraDB for ClickHouse data types supported by ClickHouse Reader and Writer. Data types not listed here are not supported. For the complete list of ApsaraDB for ClickHouse data types, see Data types. For the complete open-source list, see ClickHouse data types.
| Type | Reader | Writer | Notes |
|---|---|---|---|
| Int8 | Supported | Supported | |
| Int16 | Supported | Supported | |
| Int32 | Supported | Supported | |
| Int64 | Supported | Supported | |
| UInt8 | Supported | Supported | |
| UInt16 | Supported | Supported | |
| UInt32 | Supported | Supported | |
| UInt64 | Supported | Supported | |
| Float32 | Supported | Supported | |
| Float64 | Supported | Supported | |
| Decimal | Supported | Supported | |
| String | Supported | Supported | |
| FixedString | Supported | Supported | |
| Date | Supported | Supported | |
| DateTime | Supported | Supported | |
| DateTime64 | Supported | Supported | |
| Boolean | Supported | Supported | ClickHouse has no native Boolean type. Use UInt8 or Int8 as an alternative. |
| Array | Partially supported | Supported | Reader supports arrays of integer, floating-point, string, or DateTime64 (millisecond precision) elements only. |
| Tuple | Supported | Supported | |
| Domain (IPv4, IPv6) | Supported | Supported | |
| Enum8 | Supported | Supported | |
| Enum16 | Supported | Supported | |
| Nullable | Supported | Supported | |
| Nested | Partially supported | Supported | Reader supports nested types of integer, floating-point, string, or DateTime64 (millisecond precision) only. |
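As an illustration only, the following hypothetical DDL combines several of the supported types; the table and column names are placeholders, and UInt8 stands in for Boolean as noted above:

```sql
-- Hypothetical table using types supported by both Reader and Writer.
CREATE TABLE example_events (
    id UInt64,
    name String,
    score Nullable(Float64),
    is_active UInt8,            -- Boolean alternative (no native Boolean type)
    tags Array(String),         -- Reader supports arrays of basic element types
    created_at DateTime64(3)    -- millisecond precision
) ENGINE = MergeTree()
ORDER BY id;
```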
Add a data source
Add the ClickHouse data source to DataWorks before creating a synchronization task. Follow the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.
Data synchronization
Single-table batch synchronization
- To configure a task in the Codeless UI or the Code Editor, see Configure a task in the Codeless UI and Configure a task in the Code Editor.
- For the complete parameter reference and script samples, see Appendix: Script samples and parameters.
Full-database batch synchronization
Appendix: Script samples and parameters
The following sections provide script samples and parameter descriptions for configuring ClickHouse Reader and Writer in the Code Editor. For general Code Editor usage, see Use the code editor.
Reader script sample
```json
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "clickhouse",
      "parameter": {
        "datasource": "example",
        "table": "source_table",
        "column": [
          "id",
          "name"
        ],
        "where": "",
        "splitPk": "",
        "fetchSize": 1024
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
```
Reader parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| datasource | Yes | None | The name of the ClickHouse data source created in DataWorks. |
| table | Yes | None | The source table to read from. Must be specified in the connection configuration block. |
| column | Yes | None | The columns to read, specified as a JSON array, for example: ["id", "name", "age"]. Cannot be empty. |
| jdbcUrl | Yes | None | The JDBC connection string for the source database. Specified in the connection block; only one value per database. The format follows the official ClickHouse JDBC format, for example: jdbc:clickhouse://localhost:8123/test?user=default&password=. |
| username | Yes | None | The username for the data source. |
| password | Yes | None | The password for the data source. |
| fetchSize | No | 1024 | The number of records fetched per batch. A larger value reduces network round trips and improves extraction performance, but setting it too high can cause an out-of-memory (OOM) error. Increase the value gradually based on resource usage and the ClickHouse workload. Required when splitPk is configured. |
| splitPk | No | None | The shard key for parallel reading. When set, DataWorks shards the data by this field and runs concurrent tasks to improve throughput. fetchSize must be configured when this parameter is used. |
| where | No | None | A filter condition for incremental synchronization, for example: gmt_create>$bizdate. Without a where condition, a full synchronization runs. |
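The where, splitPk, and fetchSize parameters combine for incremental, parallel reads. A minimal sketch of such a reader step follows; the data source, table, column, and shard-key names are placeholders:

```json
{
  "stepType": "clickhouse",
  "parameter": {
    "datasource": "example",
    "table": "source_table",
    "column": ["id", "name", "gmt_create"],
    "where": "gmt_create>$bizdate",
    "splitPk": "id",
    "fetchSize": 1024
  },
  "name": "Reader",
  "category": "reader"
}
```

Here splitPk enables sharded, concurrent reads, the where condition restricts each run to the business date, and fetchSize (required alongside splitPk) controls the batch size per fetch.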
Writer script sample
```json
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "clickhouse",
      "parameter": {
        "datasource": "example_ch_writer",
        "column": [
          "id",
          "name"
        ],
        "table": "target_table",
        "preSql": [
          "TRUNCATE TABLE IF EXISTS target_table"
        ],
        "postSql": [
          "OPTIMIZE TABLE target_table FINAL"
        ],
        "batchSize": 1024,
        "writeMode": "insert"
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
```
Writer parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| datasource | Yes | None | The name of the ClickHouse data source created in DataWorks. |
| table | Yes | None | The target table to write to. Must be specified in the connection configuration block. |
| column | Yes | None | The target columns to write to, specified as a JSON array, for example: ["id", "name", "age"]. Cannot be empty. |
| jdbcUrl | Yes | None | The JDBC connection string for the target database. Specified in the connection block; only one value per database. For example: jdbc:clickhouse://127.0.0.1:8123/database. |
| password | Yes | None | The password for the data source. |
| preSql | No | None | One or more SQL statements, as a JSON array, to run before the write task starts. For example: TRUNCATE TABLE IF EXISTS target_table. |
| postSql | No | None | One or more SQL statements, as a JSON array, to run after the write task completes. For example: OPTIMIZE TABLE target_table FINAL. |
| batchSize | No | 1024 | The number of records written per batch. A larger value reduces the number of network interactions with ClickHouse and improves overall throughput. |
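The table and jdbcUrl parameters are described above as belonging to a connection configuration block. As a sketch only — the exact shape of this block can vary by DataWorks version, so verify it against a script generated in the console — a writer step with an explicit connection block might look like the following (data source details are placeholders):

```json
{
  "stepType": "clickhouse",
  "parameter": {
    "username": "default",
    "password": "********",
    "column": ["id", "name"],
    "connection": [
      {
        "jdbcUrl": "jdbc:clickhouse://127.0.0.1:8123/database",
        "table": ["target_table"]
      }
    ],
    "batchSize": 1024,
    "writeMode": "insert"
  },
  "name": "Writer",
  "category": "writer"
}
```

When a named datasource is configured instead, the connection details are resolved from the data source definition in DataWorks and the connection block is not written by hand.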