ApsaraDB for ClickHouse: Sync with DataWorks

Last Updated: Mar 28, 2026

Use DataWorks batch data synchronization to load data from MaxCompute into ApsaraDB for ClickHouse. The steps below use MaxCompute as the source, but DataWorks supports many other data source types.

Prerequisites

Before you begin, ensure that you have:

Constraints

  • ApsaraDB for ClickHouse supports only exclusive resource groups for Data Integration. Shared resource groups are not supported.

  • To re-synchronize a table that was previously synced, clear the existing data first by running TRUNCATE TABLE <table_name>; in the ClickHouse database.

Sync data from MaxCompute to ApsaraDB for ClickHouse

Step 1: Add data sources

Add data sources for both MaxCompute and ApsaraDB for ClickHouse in DataWorks.

For instructions, see Associate a MaxCompute computing resource and Associate a ClickHouse computing resource.

Step 2: Create a MaxCompute table

  1. Log on to the DataWorks console.

  2. In the left-side navigation pane, click Workspaces.

  3. In the top navigation bar, select the region where your workspace is located. On the Workspaces page, find your workspace and choose Shortcuts > Data Development in the Actions column.

  4. On the DataStudio page, move the pointer over the Create icon and choose Create Table > MaxCompute > Table.

  5. In the Create Table dialog box, select a path from the Path drop-down list and enter a table name. This example uses odptabletest1. Click Create.

  6. In the General section, configure the table properties.

    Configuration item | Description
    Display name | The display name of the table.
    Theme | The theme (subject) of the table. A theme acts as a folder: define level-1 and level-2 folders to classify the table by business purpose. If no theme exists, create one. See Define table subjects.

  7. Click DDL in the toolbar.

  8. In the DDL dialog box, enter the following statement and click Generate Table Schema:

    CREATE TABLE IF NOT EXISTS odptabletest1
    (
      v1  TINYINT,
      v2  STRING
    );

  9. Click Commit to Development Environment, then click Commit to Production Environment.

Step 3: Write data to the MaxCompute table

  1. On the DataStudio page, click Ad Hoc Query in the left-side navigation pane.

  2. Move the pointer over the Create icon and choose Create > ODPS SQL.

  3. In the Create Node dialog box, select a path from the Path drop-down list and enter a name for the node. Click Confirm.

  4. In the node editor, enter the following statement to insert data into the MaxCompute table:

    INSERT INTO odptabletest1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');

  5. Click the Run icon in the toolbar.

  6. In the Estimate MaxCompute Computing Cost dialog box, click Run.
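
Before moving on, it can help to confirm that the rows landed in the source table. The following is a minimal sketch to run in the same ODPS SQL node. It assumes the table name odptabletest1 from Step 2 and the column names v1 and v2; the LIMIT value is arbitrary and is included because MaxCompute, by default, expects a LIMIT clause together with ORDER BY.

```sql
-- Show the column names and types of the source table.
DESC odptabletest1;

-- List the inserted rows in key order (LIMIT accompanies ORDER BY).
SELECT v1, v2 FROM odptabletest1 ORDER BY v1 LIMIT 10;

-- Count the rows; this should match the number of rows inserted in this step.
SELECT COUNT(*) AS row_count FROM odptabletest1;
```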

Step 4: Create an ApsaraDB for ClickHouse table

Create a destination table whose column types match the MaxCompute source table.

  1. Log on to the ApsaraDB for ClickHouse console.

  2. In the top navigation bar, select the region where your cluster is deployed.

  3. On the Clusters page, click the tab for your cluster edition and click the cluster ID.

  4. On the Cluster Information page, click Log On to Database in the upper-right corner.

  5. In the Log on to Database Instance dialog box, enter your database account credentials and click Login.

  6. Run the following statement to create the destination table:

    create table default.dataworktest ON CLUSTER default (
      v1 Int,
      v2 String
    ) ENGINE = MergeTree ORDER BY v1;

    Note: The data type of each column in the ApsaraDB for ClickHouse table must be compatible with the type of the corresponding column in the MaxCompute table.
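
To compare the destination schema with the source, you can inspect the column types that ClickHouse actually stored. This is a small sketch that assumes the database default and the table dataworktest created above. In ClickHouse, Int is an alias for Int32, so v1 is reported as Int32.

```sql
-- List column names and resolved types of the destination table.
DESCRIBE TABLE default.dataworktest;

-- The same information from the system catalog, restricted to this table.
SELECT name, type
FROM system.columns
WHERE database = 'default' AND table = 'dataworktest'
ORDER BY position;
```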

Step 5: Create a workflow

Skip this step if you already have a workflow.

  1. On the DataStudio page, click Scheduled Workflow in the left-side navigation pane.

  2. Move the pointer over the Create icon and select Create Workflow.

  3. In the Create Workflow dialog box, enter a workflow name.

    Important

    The name must be 1–128 characters and can contain letters, digits, underscores (_), and periods (.).

  4. Click Create.

Step 6: Create a batch synchronization node

  1. Click the workflow, then right-click Data Integration.

  2. Choose Create Node > Offline synchronization.

  3. In the Create Node dialog box, enter a node name and select a path.

    Important

    The node name must be 1–128 characters and can contain letters, digits, underscores (_), and periods (.).

  4. Click Confirm.

Step 7: Configure the source and destination

  1. Source: Select a data source type. This example uses MaxCompute.

    Parameter | Description
    Connection | The type and name of the data source.
    Production project name | The name of the project in the production environment. Read-only.
    Table | The source table to synchronize.
    Partition key column | The partition to read for daily incremental data. For example, set to ${bizdate} for date-based partitions. DataWorks cannot automatically map fields in partitioned MaxCompute tables, so specify each partition manually when configuring MaxCompute Reader. See MaxCompute Reader.

  2. Target: Select ClickHouse.

    Parameter | Description
    Connection | The type and name of the data source. Select ClickHouse.
    Table | The destination table in ApsaraDB for ClickHouse.
    Primary key or unique key conflict handling | Set to insert into (Insert).
    Pre sql | SQL statement to run before the synchronization task starts.
    Post sql | SQL statement to run after the synchronization task completes.
    Batch insert byte size | Maximum number of bytes per insert batch.
    Number of batches | Number of records to insert per batch.

  3. (Optional) Mappings: Select field mappings. Each field on the left (source) maps to a field on the right (destination). For parameter details, see Configure mappings between source fields and destination fields.

  4. (Optional) Channel: Configure the maximum transmission rate and dirty data check rules. For parameter details, see Configure channel control policies.
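
The Pre sql field is a natural place for the cleanup that the Constraints section requires before a re-synchronization. The following is a hedged sketch of a value for that field. It assumes the destination table default.dataworktest from Step 4; the ON CLUSTER default clause mirrors the CREATE TABLE statement there and is an assumption about how your table was created.

```sql
-- Candidate Pre sql value: empty the destination table before each run
-- so that a repeated synchronization does not append duplicate rows.
TRUNCATE TABLE IF EXISTS default.dataworktest ON CLUSTER default;
```

Because this statement removes every row in the table on each run, it suits full reloads rather than incremental loads by partition.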

Step 8: Configure the resource group

On the right panel, click Resource Group configuration and select a group from the Exclusive Resource Group drop-down list.

Step 9: Save and run the synchronization task

  1. Click the Save icon in the toolbar to save the task.

  2. Click the Run icon to run the task.

Verify the sync result

  1. Log on to the ApsaraDB for ClickHouse console.

  2. In the top navigation bar, select the region where your cluster is deployed.

  3. On the Clusters page, click the tab for your cluster edition and click the cluster ID.

  4. On the Cluster Information page, click Log On to Database in the upper-right corner.

  5. In the Log on to Database Instance dialog box, enter your credentials and click Login.

  6. Run the following query and click Execute (F8):

    SELECT * FROM dataworktest;

    If the synchronization succeeded, the query returns the four rows inserted in Step 3.
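
Beyond reading the full result, two narrower queries can make the check more explicit. This is a sketch that assumes the table default.dataworktest and the four sample rows from Step 3, synchronized once into an empty table.

```sql
-- Row count; expected to equal the number of source rows (4 in this example).
SELECT count() AS row_count FROM default.dataworktest;

-- Rows in key order, for a side-by-side comparison with the MaxCompute table.
SELECT v1, v2 FROM default.dataworktest ORDER BY v1;
```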