This topic describes how to migrate table data from a self-managed ClickHouse cluster to an ApsaraDB for ClickHouse cluster.
Prerequisites
- The source cluster is a self-managed ClickHouse cluster that is deployed on Alibaba Cloud, the destination cluster is an ApsaraDB for ClickHouse cluster, and the source and the destination clusters are deployed in the same VPC. In this case, you can call the remote function to migrate data between the clusters.
- The source cluster is a self-managed ClickHouse cluster that is deployed on Alibaba Cloud, the destination cluster is an ApsaraDB for ClickHouse cluster, and the source and the destination clusters are deployed in different VPCs. In this case, you must first connect the VPCs. For more information, see Connect VPCs. After the two VPCs are connected, call the remote function to migrate data.
- To migrate data from a ClickHouse cluster that is deployed outside Alibaba Cloud, connect the deployment environment of the cluster to Alibaba Cloud. For example, if the source ClickHouse cluster is deployed in a data center, you must first connect the data center to the VPC that is connected to the destination ApsaraDB for ClickHouse cluster. For more information, see Connect a data center to a VPC. After the data center is connected to the VPC, call the remote function to migrate data.
- If the data center cannot connect to the VPC, we recommend that you export the data in the source ClickHouse cluster as a file. Then, use clickhouse-client to import the file to the destination ApsaraDB for ClickHouse cluster.
- If the data center cannot connect to the VPC, but infrastructure such as Spark or Flink is available, you can create Spark or Flink jobs to read data from the source cluster and then write the data to the destination ApsaraDB for ClickHouse cluster.
Migrate metadata
To migrate the metadata of a ClickHouse cluster, export the DDL statements that are used to create tables in the source ClickHouse cluster and import them into the destination ApsaraDB for ClickHouse cluster.
If you need to install clickhouse-client, make sure that its version is consistent with the version of the destination ApsaraDB for ClickHouse cluster. For more information about how to apply for a public endpoint, see Apply for a public endpoint.
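As a quick compatibility check, you can compare the client and server versions down to the major and minor components. The following is a minimal sketch with hypothetical version strings; in practice, take the values from `clickhouse-client --version` and from `SELECT version()` on the destination cluster.

```shell
# Hypothetical version strings; obtain the real values from
# `clickhouse-client --version` and `SELECT version()` on the destination cluster.
client_ver="21.8.10.19"
server_ver="21.8.12.29"

# Compare only the major.minor components.
client_mm=$(echo "$client_ver" | cut -d. -f1,2)
server_mm=$(echo "$server_ver" | cut -d. -f1,2)

if [ "$client_mm" = "$server_mm" ]; then
  echo "client and server versions are compatible"
else
  echo "version mismatch: client $client_mm, server $server_mm"
fi
```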
- Step 1: Query the databases in the source ClickHouse cluster
clickhouse-client --host="<old host>" --port="<old port>" --user="<old user name>" --password="<old password>" --query="SHOW databases" > database.list
The following table describes the parameters.
| Parameter | Description |
| --- | --- |
| old host | The endpoint of the source ClickHouse cluster. |
| old port | The port that is used by the source ClickHouse cluster. |
| old user name | The username that is used to log on to the source ClickHouse cluster. The user has DML permissions, including read, write, and set permissions. The user can also be granted DDL permissions. |
| old password | The password of the user that is used to connect to the source ClickHouse cluster. |

Note system stands for the system database. You do not need to migrate this database.
- Step 2: Query the tables in the source ClickHouse cluster
clickhouse-client --host="<old host>" --port="<old port>" --user="<old user name>" --password="<old password>" --query="SHOW tables from <database_name>" > table.list
<database_name> specifies the database that you want to query. For more information about other parameters, see the "Step 1: Query the databases in the source ClickHouse cluster" section.
You can also query the system.tables system table to obtain the names of all databases and tables in the cluster.
select distinct database, name from system.tables where database != 'system';
Note Table names that start with .inner. are the internal representations of materialized views. You can skip these tables during migration.
- Step 3: Export the DDL statements that are executed to create tables in the source ClickHouse cluster
clickhouse-client --host="<old host>" --port="<old port>" --user="<old user name>" --password="<old password>" --query="SHOW CREATE TABLE <database_name>.<table_name>" > table.sql
- Step 4: Import the DDL statements that are executed to create tables in the source ClickHouse cluster to the destination ApsaraDB for ClickHouse cluster
clickhouse-client --host="<new host>" --port="<new port>" --user="<new user name>" --password="<new password>" < table.sql
| Parameter | Description |
| --- | --- |
| new host | The endpoint of the destination ApsaraDB for ClickHouse cluster. |
| new port | The port that is used by the destination ApsaraDB for ClickHouse cluster. |
| new user name | The username that is used to log on to the destination ApsaraDB for ClickHouse cluster. The user has DML permissions, including read, write, and set permissions. The user can also be granted DDL permissions. |
| new password | The password of the user that is used to log on to the destination ApsaraDB for ClickHouse cluster. |
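Step 3 exports the DDL of one table at a time. If the source database contains many tables, you can loop over the table list produced in Step 2 and collect one SHOW CREATE TABLE statement per table. The following is a minimal sketch with a hypothetical database name and table list; the clickhouse-client call is shown commented out so that the loop structure stands on its own.

```shell
# Hypothetical database name and table list; in practice, table.list is the
# output of Step 2.
DB="my_database"
printf 'events\nusers\n' > table.list

# Build one SHOW CREATE TABLE query per table. Uncomment the clickhouse-client
# line to run the export against the source cluster.
: > ddl_queries.sql
while read -r TABLE; do
  echo "SHOW CREATE TABLE ${DB}.${TABLE};" >> ddl_queries.sql
  # clickhouse-client --host="<old host>" --port="<old port>" \
  #   --user="<old user name>" --password="<old password>" \
  #   --query="SHOW CREATE TABLE ${DB}.${TABLE}" >> table.sql
done < table.list
```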
Use the remote function to migrate data
- Add the CIDR block of the destination ApsaraDB for ClickHouse cluster to the whitelist of the source ClickHouse cluster.
You can view the VPC that is connected to the destination ApsaraDB for ClickHouse cluster in the ApsaraDB for ClickHouse console and add the CIDR block of the VPC to the whitelist of the source ClickHouse cluster. If you want to narrow the whitelisted CIDR block range to reduce potential security risks, execute the following SQL statement to query the backend server IP addresses of the destination ApsaraDB for ClickHouse cluster, and add only these IP addresses to the whitelist of the source ClickHouse cluster.
select * from system.clusters;
- In the destination ApsaraDB for ClickHouse cluster, execute the following SQL statement to migrate data:
insert into <new_database>.<new_table> select * from remote('old_endpoint', <old_database>.<old_table>, '<username>', '<password>');
For ApsaraDB for ClickHouse V20.8, we recommend that you use the remoteRaw function to migrate data. If data fails to be migrated, you can apply for a minor version upgrade.

insert into <new_database>.<new_table> select * from remoteRaw('old_endpoint', <old_database>.<old_table>, '<username>', '<password>');
| Parameter | Description |
| --- | --- |
| new_database | The name of the database in the destination ApsaraDB for ClickHouse cluster. |
| new_table | The name of the table in the destination ApsaraDB for ClickHouse cluster. |
| old_endpoint | The endpoint of the source ClickHouse cluster. Specify a VPC endpoint instead of a public endpoint. |
| old_database | The name of the database in the source ClickHouse cluster. |
| old_table | The name of the table in the source ClickHouse cluster. |
| username | The username that is used to log on to the source ClickHouse cluster. |
| password | The password that is used to log on to the source ClickHouse cluster. |
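The INSERT ... SELECT statement above migrates one table. If you need to migrate many tables, you can generate one statement per table from a table list and then run the generated statements on the destination cluster, for example through clickhouse-client. The following is a minimal sketch with hypothetical database and table names; the endpoint and credentials are left as placeholders.

```shell
# Hypothetical table names; in practice, reuse the table list produced during
# the metadata migration. new_db/old_db and the placeholders are assumptions.
printf 'events\nusers\n' > table.list

# Generate one INSERT ... SELECT ... remote(...) statement per table.
: > migrate.sql
while read -r TABLE; do
  echo "insert into new_db.${TABLE} select * from remote('old_endpoint', old_db.${TABLE}, '<username>', '<password>');" >> migrate.sql
done < table.list
```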
Export and import a file to migrate data
You can export a file from the source ClickHouse cluster and import the file to the destination ApsaraDB for ClickHouse cluster.
- Export and import a CSV file
- Export data to a CSV file from the source ClickHouse cluster.
clickhouse-client --host="<old host>" --port="<old port>" --user="<old user name>" --password="<old password>" --query="select * from <database_name>.<table_name> FORMAT CSV" > table.csv
- Import the CSV file to the destination ApsaraDB for ClickHouse cluster.
clickhouse-client --host="<new host>" --port="<new port>" --user="<new user name>" --password="<new password>" --query="insert into <database_name>.<table_name> FORMAT CSV" < table.csv
- Use a pipe in Linux to export and import streaming data
clickhouse-client --host="<old host>" --port="<old port>" --user="<user name>" --password="<password>" --query="select * from <database_name>.<table_name> FORMAT CSV" | clickhouse-client --host="<new host>" --port="<new port>" --user="<user name>" --password="<password>" --query="insert into <database_name>.<table_name> FORMAT CSV"
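If you export to an intermediate file and transfer it between hosts, compressing the CSV stream can reduce transfer time and disk usage considerably. The following is a minimal sketch of the pattern; `printf` stands in for the exporting clickhouse-client invocation so the plumbing can be shown end to end.

```shell
# `printf` stands in for the exporting clickhouse-client invocation; in
# practice, replace it with something like:
#   clickhouse-client --host="<old host>" ... --query="select * from <db>.<table> FORMAT CSV"
printf '1,"alice"\n2,"bob"\n' | gzip > table.csv.gz

# On the destination side, decompress and feed the stream to clickhouse-client:
#   gunzip -c table.csv.gz | clickhouse-client --host="<new host>" ... \
#     --query="insert into <db>.<table> FORMAT CSV"
gunzip -c table.csv.gz > table.csv
```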