This topic describes how to migrate data between two ClickHouse clusters.

Prerequisites

For security purposes, the backend server of an ApsaraDB for ClickHouse cluster can access only the servers within the same virtual private cloud (VPC). The backend server cannot access the Internet. Before you migrate data, verify that the source and destination ClickHouse clusters are able to communicate with each other.
  • If the two ClickHouse clusters are deployed within the same VPC on Alibaba Cloud, you can call the remote function to migrate data.
  • If the two ClickHouse clusters belong to different VPCs on Alibaba Cloud, connect the VPCs. For more information, see Connect VPCs. After the two VPCs are connected, call the remote function to migrate data.
  • To migrate data from a ClickHouse cluster that is not deployed on Alibaba Cloud, connect the deployment environment of the cluster to Alibaba Cloud first. For example, if the source ClickHouse cluster is deployed in a self-managed data center, connect the data center to the VPC in which the destination cluster is deployed. For more information, see Connect a data center to a VPC. After the data center is connected to the VPC, call the remote function to migrate data.
  • If the self-managed data center cannot be connected to the VPC, we recommend that you export the data of the source ClickHouse cluster as a file, and then use clickhouse-client to import the file to the destination ApsaraDB for ClickHouse cluster.
  • If the self-managed data center cannot be connected to the VPC but infrastructure such as Apache Spark or Apache Flink is available, you can create Apache Spark or Apache Flink jobs to read data from the source ClickHouse cluster and then write the data to the destination ApsaraDB for ClickHouse cluster.

Migrate metadata

You can migrate ClickHouse metadata by exporting the DDL statements from the source cluster and executing them on the destination cluster.

Before you start, download and install clickhouse-client. For more information about how to install and use the client, see Community documentation.

Step 1: View the database list of the source ClickHouse cluster
clickhouse-client --host="<old host>" --port="<old port>" --user="<old user name>" --password="<old password>" --query="SHOW databases"  > database.list

The following table describes the parameters.

Parameter Description
old host The endpoint of the host on which the source ClickHouse database is deployed.
old port The port of the host on which the source ClickHouse database is deployed.
old user name The username that is used to log on to the source ClickHouse cluster. A user with this username must have read and write permissions on the cluster.
old password The password that is used to log on to the source ClickHouse cluster.
Note system is the built-in system database. You do not need to migrate this database.
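The exported database.list therefore contains the system database, which should not be migrated. The following sketch removes it before further processing (the sample file contents are illustrative, not real cluster output):

```shell
# Sample database.list as produced in Step 1 (illustrative data):
printf 'default\nsystem\nsales\n' > database.list

# Drop the built-in system database; the remaining entries are the
# databases that need to be migrated.
grep -v '^system$' database.list > database.migrate.list
cat database.migrate.list
```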
Step 2: View the table list of the source ClickHouse cluster
clickhouse-client --host="<old host>" --port="<old port>" --user="<old user name>" --password="<old password>" --query="SHOW tables from <database_name>"  > table.list

<database_name> specifies the database that you want to view. For information about other parameters, see Step 1: View the database list of the source ClickHouse cluster.

You can also use the system table to view the names of all databases and tables in the cluster.

select distinct database, name from system.tables where database != 'system';
Note The returned tables whose names start with .inner. are internal representations of materialized views. You do not need to migrate these tables.
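The exported table list can be filtered the same way. A minimal sketch (with illustrative sample data) that excludes the internal materialized-view tables:

```shell
# Sample table.list as produced in Step 2 (illustrative data):
printf 'orders\n.inner.daily_mv\nusers\n' > table.list

# Tables whose names start with ".inner." are internal representations of
# materialized views and are rebuilt automatically when the views are
# recreated on the destination cluster, so exclude them.
grep -v '^\.inner\.' table.list > table.migrate.list
cat table.migrate.list
```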
Step 3: Export the DDL statement used to create tables from the source ClickHouse cluster
clickhouse-client --host="<old host>" --port="<old port>" --user="<old user name>" --password="<old password>" --query="SHOW CREATE TABLE <database_name>.<table_name>"  > table.sql
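This command exports the DDL of one table at a time. The loop below is a sketch that builds one export command for each table in table.list; the database name and table names are hypothetical, and the connection placeholders are the same as in Step 1. It only prints the commands, so you can review them before piping the loop's output to sh:

```shell
DB="mydb"                              # hypothetical database name
printf 'orders\nusers\n' > table.list  # sample table list for illustration

# Build one SHOW CREATE TABLE export command per table. Review the
# printed commands, then pipe this loop to `sh` to execute them.
while read -r TABLE; do
  echo "clickhouse-client --host=\"<old host>\" --port=\"<old port>\" --user=\"<old user name>\" --password=\"<old password>\" --query=\"SHOW CREATE TABLE ${DB}.${TABLE}\" > ${TABLE}.sql"
done < table.list
```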
Step 4: Import the DDL statement used to create tables to the destination ApsaraDB for ClickHouse cluster
clickhouse-client --host="<new host>" --port="<new port>" --user="<new user name>" --password="<new password>"  < table.sql
Parameter Description
new host The endpoint of the host on which the destination ApsaraDB for ClickHouse database is deployed.
new port The port of the host on which the destination ApsaraDB for ClickHouse database is deployed.
new user name The username that is used to log on to the destination ApsaraDB for ClickHouse cluster. A user with this username must have read and write permissions on the cluster.
new password The password that is used to log on to the destination ApsaraDB for ClickHouse cluster.
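Step 4 can likewise be repeated for every DDL file exported in Step 3. The following sketch prints one import command per .sql file (the sample DDL file is illustrative, and the connection placeholders are the same as in the table above); pipe the loop's output to sh to execute the commands:

```shell
# Sample DDL file as exported in Step 3 (illustrative content):
echo "CREATE TABLE mydb.orders (id UInt64) ENGINE = MergeTree ORDER BY id" > orders.sql

# Build one import command per exported DDL file.
for FILE in *.sql; do
  echo "clickhouse-client --host=\"<new host>\" --port=\"<new port>\" --user=\"<new user name>\" --password=\"<new password>\" < ${FILE}"
done
```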

Use the remote function to migrate data

  1. Add the CIDR block of the destination ApsaraDB for ClickHouse cluster to the whitelist of the source ClickHouse cluster.

    In the ApsaraDB for ClickHouse console, view the VPC to which the destination ApsaraDB for ClickHouse cluster belongs, and add the CIDR block of that VPC to the whitelist of the source ClickHouse cluster. If you want to narrow the whitelist to reduce potential security risks, execute the following SQL statement on the destination cluster to query the IP addresses of its backend servers, and add only these IP addresses to the whitelist of the source ClickHouse cluster.

    select * from system.clusters;
  2. In the destination ApsaraDB for ClickHouse cluster, execute the following SQL statement to migrate data:
    insert into <new_database>.<new_table> select * from remote('old_endpoint', <old_database>.<old_table>, '<username>', '<password>');
    If the cluster runs ClickHouse 20.8, we recommend that you use the remoteRaw function to migrate data. If the migration fails, you can apply for a minor version upgrade of the cluster.
    insert into <new_database>.<new_table> select * from remoteRaw('old_endpoint', <old_database>.<old_table>, '<username>', '<password>');
    Parameter Description
    new_database The name of the database in the destination ApsaraDB for ClickHouse cluster.
    new_table The name of the table in the destination ApsaraDB for ClickHouse cluster.
    old_endpoint The endpoint of the source ClickHouse cluster. Enter the VPC endpoint instead of the public endpoint.
    old_database The name of the database in the source ClickHouse cluster.
    old_table The name of the table in the source ClickHouse cluster.
    username The username that is used to log on to the source ClickHouse cluster.
    password The password that is used to log on to the source ClickHouse cluster.
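The statements above migrate a single table. The sketch below generates one INSERT ... SELECT statement per table from a table list; new_db, old_db, the endpoint, and the table names are all placeholders. Execute each printed statement on the destination cluster with clickhouse-client:

```shell
OLD_ENDPOINT="<old_endpoint>"          # VPC endpoint of the source cluster
printf 'orders\nusers\n' > table.list  # sample table list for illustration

# Emit one migration statement per table; run each statement on the
# destination ApsaraDB for ClickHouse cluster.
while read -r TABLE; do
  echo "INSERT INTO new_db.${TABLE} SELECT * FROM remote('${OLD_ENDPOINT}', old_db.${TABLE}, '<username>', '<password>');"
done < table.list
```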

Import and export a file to migrate data

You can export a file from the source ClickHouse cluster and import the file into the destination cluster.

Import and export a CSV file
  1. Export data in a CSV file from the source ClickHouse cluster.
    clickhouse-client --host="<old host>" --port="<old port>" --user="<old user name>" --password="<old password>"  --query="select * from <database_name>.<table_name> FORMAT CSV"  > table.csv
  2. Import the CSV file to the destination ApsaraDB for ClickHouse cluster.
    clickhouse-client --host="<new host>" --port="<new port>" --user="<new user name>" --password="<new password>"  --query="insert into <database_name>.<table_name> FORMAT CSV"  < table.csv
Use a pipe in Linux to export and import streaming data
clickhouse-client --host="<old host>" --port="<old port>" --user="<user name>" --password="<password>"  --query="select * from <database_name>.<table_name> FORMAT CSV" | 
clickhouse-client --host="<new host>" --port="<new port>" --user="<user name>" --password="<password>"   --query="INSERT INTO <database_name>.<table_name> FORMAT CSV"
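The pipe above streams a single table. The following sketch extends it to every table in a list; the hosts, credentials, and database name are placeholders, and the paired commands are only printed so they can be reviewed first. Pipe the loop's output to sh to run the migrations one table at a time:

```shell
printf 'orders\nusers\n' > table.list  # sample table list for illustration

# Print the paired export | import pipeline for each table.
while read -r TABLE; do
  echo "clickhouse-client --host=\"<old host>\" --port=\"<old port>\" --user=\"<user name>\" --password=\"<password>\" --query=\"select * from db.${TABLE} FORMAT CSV\" | clickhouse-client --host=\"<new host>\" --port=\"<new port>\" --user=\"<user name>\" --password=\"<password>\" --query=\"INSERT INTO db.${TABLE} FORMAT CSV\""
done < table.list
```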