ClickHouse is an open source column-oriented database management system (DBMS) for online analytical processing (OLAP) of queries. This topic describes how ClickHouse Writer works, the parameters that are supported by ClickHouse Writer, and how to configure ClickHouse Writer by using the codeless user interface (UI) and code editor.
Limits
- Only Alibaba Cloud ApsaraDB for ClickHouse is supported.
- ClickHouse Writer supports only exclusive resource groups for Data Integration, but not the shared resource group or custom resource groups for Data Integration. For more information, see Create and use an exclusive resource group for Data Integration, Use a shared resource group, and Create a custom resource group for Data Integration.
- ClickHouse Writer connects to a ClickHouse database by using Java Database Connectivity (JDBC) and can write data to a destination table in the ClickHouse database only by using JDBC Statement.
- ClickHouse Writer allows you to specify the columns to which you want to write data. You can specify the columns in an order different from the order specified by the schema of the destination table.
- If ClickHouse Writer writes data in INSERT mode, we recommend that you throttle the transactions per second (TPS) to 1,000 to prevent high workloads on ClickHouse.
- After ClickHouse Writer writes all required data, ClickHouse Writer performs a single-process POST Flush operation to update the data records in the ClickHouse database.
- You must make sure that the driver version is compatible with your ClickHouse database.
ClickHouse Writer supports only the following version of the ClickHouse database driver:
<dependency> <groupId>ru.yandex.clickhouse</groupId> <artifactId>clickhouse-jdbc</artifactId> <version>0.2.4.ali2-SNAPSHOT</version> </dependency>
Background information
ClickHouse Writer writes data to ClickHouse databases. ClickHouse Writer connects
to a remote ClickHouse database by using JDBC and executes the INSERT INTO
statement to write data to the ClickHouse database.
ClickHouse Writer is designed for extract, transform, load (ETL) developers to import data in data warehouses to ClickHouse databases. ClickHouse Writer can also be used as a data migration tool by users such as database administrators.
ClickHouse Writer obtains data from a reader, generates an INSERT INTO statement based on your configurations, and executes the INSERT INTO statement to write data to ClickHouse databases.
Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
jdbcUrl | The JDBC URL of the ClickHouse database. This parameter is included in the connection parameter.
|
Yes | No default value |
username | The username that you can use to connect to the database. | Yes | No default value |
password | The password that you can use to connect to the database. | Yes | No default value |
table | The name of the table to which you want to write data. Specify the name in a JSON
array.
Note The table parameter must be included in the connection parameter.
|
Yes | No default value |
column | The names of the columns to which you want to write data. Separate the names with
commas (,), such as "column": ["id", "name", "age"] .
Note The column parameter cannot be left empty.
|
Yes | No default value |
preSql | The SQL statement that you want to execute before the synchronization node is run. | No | No default value |
postSql | The SQL statement that you want to execute after the synchronization node is run. | No | No default value |
batchSize | The number of data records to write at a time. Set this parameter to an appropriate value based on your business requirements. This greatly reduces the interactions between Data Integration and ClickHouse and increases throughput. If you set this parameter to an excessively large value, an out of memory (OOM) error may occur during data synchronization. | No | 1,024 |
Configure ClickHouse Writer by using the codeless UI
This method is not supported.
Configure ClickHouse Writer by using the code editor
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"clickhouse",// The writer type.
"parameter":{
"username": "",
"password": "",
"column": [// The names of the columns to which you want to write data.
"id",
"name"
],
"connection": [
{
"table": [// The name of the table to which you want to write data.
"ClickHouse_table"
],
"jdbcUrl": "jdbc:clickhouse://ip:port/database"
}
],
"preSql": [ // The SQL statement that you want to execute before the synchronization node is run.
"delete from table where db_id = -1"
],
"postSql": [// The SQL statement that you want to execute after the synchronization node is run.
"update table set db_modify_time = now() where db_id = 1"
],
"batchSize": "1024",
"batchByteSize": "67108864",
"writeMode": "insert"
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The maximum number of dirty data records allowed.
},
"speed":{
"throttle":true,// Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
"concurrent":1, // The maximum number of parallel threads.
"mbps":"12"// The maximum transmission rate.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}