You can use the BACKUP and RESTORE commands to back up data from ClickHouse to Object Storage Service (OSS) for low-cost long-term archiving. You can also use these commands to migrate a self-managed ClickHouse cluster to ApsaraDB for ClickHouse. These built-in ClickHouse SQL commands back up and restore objects, such as databases and tables. This topic describes the syntax of the BACKUP and RESTORE commands and provides usage examples.
Notes
You cannot back up and restore data between the Community-compatible Edition and the Enterprise Edition because their database and table engines are incompatible.
During backup and restore operations, the cluster's CPU utilization increases. You can adjust parameters such as backup_threads and restore_threads to limit resource consumption. Memory usage is not significantly affected.
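For example, the user-level max_backup_bandwidth parameter can be changed per session without a restart. A minimal sketch, assuming the default.data table from the examples in this topic and an illustrative limit of about 100 MB/s (the value is in bytes per second):

```sql
-- Illustrative value: cap this session's backup jobs at ~100 MB/s.
SET max_backup_bandwidth = 104857600;

-- Backups started in this session are throttled accordingly.
BACKUP TABLE default.data TO S3('https://<yourBucketName>.<yourEndpoint>/data/', 'yourAccessKeyID', 'yourAccessKeySecret');
```

Server-level parameters such as backup_threads must instead be changed in the server configuration and require a restart.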
Prerequisites
For OSS:
You have granted ClickHouse access to the OSS bucket. For more information, see Common examples of bucket policies.
You have obtained the Endpoint information for your region.
Syntax
BACKUP|RESTORE
TABLE [db.]table_name [AS [db.]table_name_in_backup]
[PARTITION[S] partition_expr [,...]] |
DICTIONARY [db.]dictionary_name [AS [db.]name_in_backup] |
DATABASE database_name [AS database_name_in_backup]
[EXCEPT TABLES ...] |
TEMPORARY TABLE table_name [AS table_name_in_backup] |
VIEW view_name [AS view_name_in_backup] |
ALL [EXCEPT {TABLES|DATABASES}...] [,...]
[ON CLUSTER 'cluster_name']
TO|FROM File('<path>/<filename>') | Disk('<disk_name>', '<path>/') | S3('<S3 endpoint>/<path>', '<Access key ID>', '<Access key Secret>')
[SETTINGS base_backup = File('<path>/<filename>') | Disk(...) | S3('<S3 endpoint>/<path>', '<Access key ID>', '<Access key Secret>')]
Community-compatible Edition
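The base_backup setting shown in the syntax enables incremental backups: only data that is not already in the base backup is written. A hedged sketch, assuming an earlier full backup already exists under the full/ prefix (the bucket name and paths are placeholders):

```sql
-- Incremental backup: store only the parts missing from the earlier full backup.
BACKUP TABLE default.data
    TO S3('https://<yourBucketName>.<yourEndpoint>/data/incr/', 'yourAccessKeyID', 'yourAccessKeySecret')
    SETTINGS base_backup = S3('https://<yourBucketName>.<yourEndpoint>/data/full/', 'yourAccessKeyID', 'yourAccessKeySecret');
```

When you restore from the incremental location, ClickHouse resolves the base backup from the backup metadata, so both locations must remain available.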
The BACKUP and RESTORE commands support backup and recovery of instances, databases, and tables. This section uses a table as an example to describe the backup process.
Back up data to OSS
Upload directly to OSS
BACKUP TABLE default.data ON CLUSTER default TO S3('https://<yourBucketName>.<yourEndpoint>/data/', 'yourAccessKeyID', 'yourAccessKeySecret')
Back up to a local disk and then upload to OSS
ApsaraDB for ClickHouse does not support backing up data to a local disk.
Configure the backup destination.
You can add the /etc/clickhouse-server/config.d/backup_disk.xml file to specify the backup destination. In the following example, the <storage_configuration> tag defines a custom disk named backups that uses the physical path /backups/. The <backups> tag specifies that only the backups disk can be used for backups and that the backup files are stored in the /backups/ folder.
<clickhouse>
    <storage_configuration>
        <disks>
            <backups>
                <type>local</type>
                <path>/backups/</path>
            </backups>
        </disks>
    </storage_configuration>
    <backups>
        <allowed_disk>backups</allowed_disk>
        <allowed_path>/backups/</allowed_path>
    </backups>
</clickhouse>
Back up data to the local disk.
BACKUP TABLE test.table TO Disk('backups', 'data_1.zip')
Use ossutil to upload the backup to OSS.
ossutil cp data_1.zip oss://<yourBucketName>/data/data_1.zip -i <yourAccessKeyID> -k <yourAccessKeySecret> -e <yourEndpoint>
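The reverse direction works the same way on a self-managed cluster: download the archive back into the /backups/ folder with ossutil cp, then restore it from the configured backups disk. A sketch, assuming the backup_disk.xml configuration shown earlier:

```sql
-- Assumes data_1.zip has been downloaded back into /backups/ on the node.
RESTORE TABLE test.table FROM Disk('backups', 'data_1.zip');
```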
Restore data from OSS
-- Restore data to a single node from a ZIP file.
RESTORE TABLE default.data FROM S3('https://<yourBucketName>.<yourEndpoint>/data/data_1.zip', 'yourAccessKeyID', 'yourAccessKeySecret')
-- Restore data to all nodes from a directory.
RESTORE TABLE default.data ON CLUSTER default FROM S3('https://<yourBucketName>.<yourEndpoint>/data/', 'yourAccessKeyID', 'yourAccessKeySecret')
Enterprise Edition
Back up data to OSS
BACKUP TABLE default.data TO S3('https://<yourBucketName>.<yourEndpoint>/data/data_1.zip', 'yourAccessKeyID', 'yourAccessKeySecret')
Restore data from OSS
RESTORE TABLE default.data FROM S3('https://<yourBucketName>.<yourEndpoint>/data/data_1.zip', 'yourAccessKeyID', 'yourAccessKeySecret')
Performance optimization parameters
How to view
-- User-level parameters
SELECT * FROM system.settings WHERE name LIKE '%backup%' OR name LIKE '%restore%';
-- Server-level parameters
SELECT * FROM system.server_settings WHERE name LIKE '%backup%' OR name LIKE '%restore%';
Parameter descriptions
backup

Server-level parameters:
backup_threads: Restart required. Limits the number of threads for a backup. The default maximum is 16.
max_backup_bandwidth_for_server: Restart required. Limits the total bandwidth for all backups on a single server.
max_backups_io_thread_pool_size: Restart required. Limits the number of threads for backup I/O operations.
max_backups_io_thread_pool_free_size: Restart required. Limits the number of idle threads in the thread pool for backup I/O operations.

User-level parameter:
max_backup_bandwidth: Limits the bandwidth for a single backup job.
restore

Server-level parameter:
restore_threads: Restart required. Limits the number of threads for a restore. The default maximum is 16.
FAQ
How do I resolve the "Not found backup engine S3" error?
This error occurs because the instance is running an earlier version that does not support backing up data to OSS. To resolve this issue, upgrade the instance to version 23.8 or later. For a self-managed ClickHouse cluster, you can back up data to a local disk and then upload the backup to OSS.
How do I resolve the "Using archives with backups on clusters is disabled" error?
Cluster backups for multi-node deployments do not support the ZIP archive format. Use a folder for the S3 path, as shown in the following example.
BACKUP TABLE default.data ON CLUSTER default TO S3('https://<yourBucketName>.<yourEndpoint>/data/', 'yourAccessKeyID', 'yourAccessKeySecret')
What should I do if the connection between clickhouse-client and ClickHouse is disconnected after I run the BACKUP command?
The backup task runs in the background and is not affected if the connection is disconnected. You can run the SELECT * FROM system.backups; statement to view the progress of the backup job.
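A more targeted query can help when many jobs exist. A sketch based on the open-source system.backups schema (column names may vary across versions):

```sql
-- Show the most recent backup/restore jobs with their status and any error.
SELECT name, status, error, start_time
FROM system.backups
ORDER BY start_time DESC
LIMIT 5;
```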
To run the task asynchronously, add ASYNC to the BACKUP command.
BACKUP TABLE default.data ON CLUSTER default TO S3('https://<yourBucketName>.<yourEndpoint>/data/data_1.zip', 'yourAccessKeyID', 'yourAccessKeySecret') ASYNC;
What is the speed of the BACKUP and RESTORE commands?
The speed of the backup and recovery process is affected by CPU, disk throughput, network throughput, and OSS bandwidth. The bottleneck component determines the actual speed.
For ApsaraDB for ClickHouse, you can upgrade the cluster specifications to increase CPU, disk throughput, and network throughput.
For OSS: For more information, see Limits and performance metrics.
How do I back up and restore nodes one by one in ApsaraDB for ClickHouse?
Query the node IP addresses.
SELECT * FROM system.clusters;
Connect directly to the instance node by using clickhouse-client and disable the mandatory ON CLUSTER default setting.
Note: This setting applies only to Alibaba Cloud ApsaraDB for ClickHouse. Do not configure it for self-managed ClickHouse clusters.
SET enforce_on_cluster_default_for_ddl = 0;
Run the BACKUP command.
BACKUP TABLE default.data TO S3('https://<yourBucketName>.<yourEndpoint>/data/data_1.zip', 'yourAccessKeyID', 'yourAccessKeySecret')
Run the RESTORE command.
RESTORE TABLE default.data FROM S3('https://<yourBucketName>.<yourEndpoint>/data/data_1.zip', 'yourAccessKeyID', 'yourAccessKeySecret')
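After the per-node restore completes, a quick sanity check is to compare row counts between the source table and the restored table. A minimal sketch, assuming the default.data table from the examples above:

```sql
-- The count on the restored node should match the source table at backup time.
SELECT count() FROM default.data;
```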