ClickHouse is an open source column-oriented database management system (DBMS) for online analytical processing (OLAP) of queries. This topic describes how ClickHouse Writer works, the parameters that are supported by ClickHouse Writer, and how to configure ClickHouse Writer by using the codeless user interface (UI) and code editor.

Limits

  • ClickHouse Writer supports only exclusive resource groups for Data Integration, but not the shared resource group or custom resource groups for Data Integration. For more information, see Create and use an exclusive resource group for Data Integration, Use a shared resource group, and Create a custom resource group for Data Integration.
  • ClickHouse Writer connects to a ClickHouse database by using Java Database Connectivity (JDBC) and can write data to a destination table in the ClickHouse database only by using JDBC Statement.
  • ClickHouse Writer allows you to specify the columns to which you want to write data. You can specify the columns in an order different from the order specified by the schema of the destination table.
  • If ClickHouse Writer writes data in INSERT mode, we recommend that you throttle the transactions per second (TPS) to 1,000 to prevent high workloads on ClickHouse.
  • After ClickHouse Writer writes all required data, ClickHouse Writer performs a single-process POST Flush operation to update the data records in the ClickHouse database.
  • You must make sure that the driver version is compatible with your ClickHouse database. ClickHouse Writer supports only the following version of the ClickHouse database driver:
    <dependency>
        <groupId>ru.yandex.clickhouse</groupId>
        <artifactId>clickhouse-jdbc</artifactId>
        <version>0.2.4.ali2-SNAPSHOT</version>
    </dependency>

Background information

ClickHouse Writer writes data to ClickHouse databases. ClickHouse Writer connects to a remote ClickHouse database by using JDBC and executes the INSERT INTO statement to write data to the ClickHouse database.

ClickHouse Writer is designed for extract, transform, load (ETL) developers to import data in data warehouses to ClickHouse databases. ClickHouse Writer can also be used as a data migration tool by users such as database administrators.

ClickHouse Writer obtains data from a reader, generates an INSERT INTO statement based on your configurations, and executes the INSERT INTO statement to write data to ClickHouse databases.

Parameters

Parameter Description Required Default value
jdbcUrl The JDBC URL of the ClickHouse database. This parameter is included in the connection parameter.
  • You can configure only one JDBC URL for a database.
  • The value format of the jdbcUrl parameter must follow official ClickHouse specifications. You can also specify the information of the attachment facility. Example: jdbc:clickhouse://127.0.0.1:3306/database.
Yes No default value
username The username that you can use to connect to the database. Yes No default value
password The password that you can use to connect to the database. Yes No default value
table The name of the table to which you want to write data. Specify the name in a JSON array.
Note The table parameter must be included in the connection parameter.
Yes No default value
column The names of the columns to which you want to write data. Separate the names with commas (,), such as "column": ["id", "name", "age"].
Note The column parameter cannot be left empty.
Yes No default value
preSql The SQL statement that you want to execute before the synchronization node is run. No No default value
postSql The SQL statement that you want to execute after the synchronization node is run. No No default value
batchSize The number of data records to write at a time. Set this parameter to an appropriate value based on your business requirements. This greatly reduces the interactions between Data Integration and ClickHouse and increases throughput. If you set this parameter to an excessively large value, an out of memory (OOM) error may occur during data synchronization. No 1,024

Configure ClickHouse Writer by using the codeless UI

This method is not supported.

Configure ClickHouse Writer by using the code editor

For more information about how to configure a synchronization node by using the code editor, see Create a sync node by using the code editor.
Note Delete the comments from the following code before you run the code.
In the following code, a synchronization node is configured to write data to a ClickHouse database:
{
    "type":"job",
    "version":"2.0",// The version number. 
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"clickhouse",// The writer type. 
            "parameter":{
                "username": "",
                "password": "",
                "column": [// The names of the columns to which you want to write data. 
                    "id",
                    "name"
                ],
                "connection": [
                    {
                        "table": [// The name of the table to which you want to write data. 
                            "ClickHouse_table"
                        ],
                        "jdbcUrl": "jdbc:clickhouse://ip:port/database"
                    }
                ],
                "preSql": [ // The SQL statement that you want to execute before the synchronization node is run. 
                    "delete from table where db_id = -1"
                ],
                "postSql": [// The SQL statement that you want to execute after the synchronization node is run. 
                    "update table set db_modify_time = now() where db_id = 1"
                ],
                "batchSize": "1024",
                "batchByteSize": "67108864",
                "writeMode": "insert"
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed. 
        },
        "speed":{
            "throttle":true,// Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1, // The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}