You can use the BACKUP and RESTORE commands to back up data from ClickHouse to Object Storage Service (OSS) for low-cost long-term archiving. You can also use these commands to migrate a self-managed ClickHouse cluster to ApsaraDB for ClickHouse. These built-in ClickHouse SQL commands back up and restore objects, such as databases and tables. This topic describes the syntax of the BACKUP and RESTORE commands and provides usage examples.
Notes
You cannot back up and restore data between the Community-compatible Edition and the Enterprise Edition because their database and table engines are incompatible.
During backup and restore operations, the cluster's CPU utilization increases. You can adjust parameters such as backup_threads and restore_threads to limit resource consumption. Memory usage is not significantly affected.
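For example, the user-level max_backup_bandwidth parameter can be changed per session without a restart. A minimal sketch, assuming the default.data table from the examples in this topic and an illustrative limit of about 100 MB/s (the value is in bytes per second):

```sql
-- Illustrative value: cap this session's backup jobs at ~100 MB/s.
SET max_backup_bandwidth = 104857600;

-- Backups started in this session are throttled accordingly.
BACKUP TABLE default.data TO S3('https://<yourBucketName>.<yourEndpoint>/data/', 'yourAccessKeyID', 'yourAccessKeySecret');
```

Server-level parameters such as backup_threads must instead be changed in the server configuration and require a restart.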
Prerequisites
For OSS:
You have granted ClickHouse access to the OSS bucket. For more information, see Common examples of bucket policies.
You have obtained the Endpoint information for your region.
Syntax
BACKUP|RESTORE
TABLE [db.]table_name [AS [db.]table_name_in_backup]
[PARTITION[S] partition_expr [,...]] |
DICTIONARY [db.]dictionary_name [AS [db.]name_in_backup] |
DATABASE database_name [AS database_name_in_backup]
[EXCEPT TABLES ...] |
TEMPORARY TABLE table_name [AS table_name_in_backup] |
VIEW view_name [AS view_name_in_backup] |
ALL [EXCEPT {TABLES|DATABASES}...] [,...]
[ON CLUSTER 'cluster_name']
TO|FROM File('<path>/<filename>') | Disk('<disk_name>', '<path>/') | S3('<S3 endpoint>/<path>', '<Access key ID>', '<Access key Secret>')
[SETTINGS base_backup = File('<path>/<filename>') | Disk(...) | S3('<S3 endpoint>/<path>', '<Access key ID>', '<Access key Secret>')]
Community-compatible Edition
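The base_backup setting shown in the syntax enables incremental backups: only data that is not already in the base backup is written. A hedged sketch, assuming an earlier full backup already exists under the full/ prefix (the bucket name and paths are placeholders):

```sql
-- Incremental backup: store only the parts missing from the earlier full backup.
BACKUP TABLE default.data
    TO S3('https://<yourBucketName>.<yourEndpoint>/data/incr/', 'yourAccessKeyID', 'yourAccessKeySecret')
    SETTINGS base_backup = S3('https://<yourBucketName>.<yourEndpoint>/data/full/', 'yourAccessKeyID', 'yourAccessKeySecret');
```

When you restore from the incremental location, ClickHouse resolves the base backup from the backup metadata, so both locations must remain available.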
The BACKUP and RESTORE commands support backup and recovery of instances, databases, and tables. This section uses a table as an example to describe the backup process.
Back up data to OSS
Upload directly to OSS
BACKUP TABLE default.data ON CLUSTER default TO S3('https://<yourBucketName>.<yourEndpoint>/data/', 'yourAccessKeyID', 'yourAccessKeySecret')
Back up to a local disk and then upload to OSS
ApsaraDB for ClickHouse does not support backing up data to a local disk.
Configure the backup destination.
You can add the /etc/clickhouse-server/config.d/backup_disk.xml file to specify the backup destination. In the following example, the <storage_configuration> tag defines a custom disk named backups that uses the physical path /backups/. The <backups> tag specifies that only the backups disk can be used for backups and that the backup files are stored in the /backups/ folder.
<clickhouse>
    <storage_configuration>
        <disks>
            <backups>
                <type>local</type>
                <path>/backups/</path>
            </backups>
        </disks>
    </storage_configuration>
    <backups>
        <allowed_disk>backups</allowed_disk>
        <allowed_path>/backups/</allowed_path>
    </backups>
</clickhouse>
Back up data to the local disk.
BACKUP TABLE test.table TO Disk('backups', 'data_1.zip')
Use ossutil to upload the backup to OSS.
ossutil cp data_1.zip oss://<yourBucketName>/data/data_1.zip -i <yourAccessKeyID> -k <yourAccessKeySecret> -e <yourEndpoint>
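The reverse direction works the same way on a self-managed cluster: download the archive back into the /backups/ folder with ossutil cp, then restore it from the configured backups disk. A sketch, assuming the backup_disk.xml configuration shown earlier:

```sql
-- Assumes data_1.zip has been downloaded back into /backups/ on the node.
RESTORE TABLE test.table FROM Disk('backups', 'data_1.zip');
```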
Restore data from OSS
-- Restore data to a single node from a ZIP file.
RESTORE TABLE default.data FROM S3('https://<yourBucketName>.<yourEndpoint>/data/data_1.zip', 'yourAccessKeyID', 'yourAccessKeySecret')
-- Restore data to all nodes from a directory.
RESTORE TABLE default.data ON CLUSTER default FROM S3('https://<yourBucketName>.<yourEndpoint>/data/', 'yourAccessKeyID', 'yourAccessKeySecret')
Enterprise Edition
Back up data to OSS
BACKUP TABLE default.data TO S3('https://<yourBucketName>.<yourEndpoint>/data/data_1.zip', 'yourAccessKeyID', 'yourAccessKeySecret')
Restore data from OSS
RESTORE TABLE default.data FROM S3('https://<yourBucketName>.<yourEndpoint>/data/data_1.zip', 'yourAccessKeyID', 'yourAccessKeySecret')
Performance optimization parameters
How to view
-- User-level parameters
SELECT * FROM system.settings WHERE name LIKE '%backup%' OR name LIKE '%restore%';
-- Server-level parameters
SELECT * FROM system.server_settings WHERE name LIKE '%backup%' OR name LIKE '%restore%';
Parameter descriptions
backup

Server-level parameters:
backup_threads: Restart required. Limits the number of threads for a backup. The default maximum is 16.
max_backup_bandwidth_for_server: Restart required. Limits the total bandwidth for all backups on a single server.
max_backups_io_thread_pool_size: Restart required. Limits the number of threads for backup I/O operations.
max_backups_io_thread_pool_free_size: Restart required. Limits the number of idle threads in the thread pool for backup I/O operations.

User-level parameter:
max_backup_bandwidth: Limits the bandwidth for a single backup job.
restore

Server-level parameter:
restore_threads: Restart required. Limits the number of threads for a restore. The default maximum is 16.
FAQ
How do I resolve the "Not found backup engine S3" error?
This error occurs because the instance is running an earlier version that does not support backing up data to OSS. To resolve this issue, upgrade the instance to version 23.8 or later. For a self-managed ClickHouse cluster, you can back up data to a local disk and then upload the backup to OSS.
How do I resolve the "Using archives with backups on clusters is disabled" error?
Cluster backups for multi-node deployments do not support the ZIP archive format. Use a folder for the S3 path, as shown in the following example.
BACKUP TABLE default.data ON CLUSTER default TO S3('https://<yourBucketName>.<yourEndpoint>/data/', 'yourAccessKeyID', 'yourAccessKeySecret')
What should I do if the connection between clickhouse-client and ClickHouse is disconnected after I run the BACKUP command?
The backup task runs in the background and is not affected if the connection is disconnected. You can run the SELECT * FROM system.backups; statement to view the progress of the backup job.
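A more targeted query can help when many jobs exist. A sketch based on the open-source system.backups schema (column names may vary across versions):

```sql
-- Show the most recent backup/restore jobs with their status and any error.
SELECT name, status, error, start_time
FROM system.backups
ORDER BY start_time DESC
LIMIT 5;
```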
To run the task asynchronously, add ASYNC to the BACKUP command.
BACKUP TABLE default.data ON CLUSTER default TO S3('https://<yourBucketName>.<yourEndpoint>/data/data_1.zip', 'yourAccessKeyID', 'yourAccessKeySecret') ASYNC;
What is the speed of the BACKUP and RESTORE commands?
The speed of the backup and recovery process is affected by CPU, disk throughput, network throughput, and OSS bandwidth. The bottleneck component determines the actual speed.
For ApsaraDB for ClickHouse, you can upgrade the cluster specifications to increase CPU, disk throughput, and network throughput.
For OSS: For more information, see Limits and performance metrics.
How do I back up and restore nodes one by one in ApsaraDB for ClickHouse?
Query the node IP addresses.
SELECT * FROM system.clusters;
Connect directly to the instance node by using clickhouse-client and disable the mandatory ON CLUSTER default setting.
Note: This setting applies only to Alibaba Cloud ApsaraDB for ClickHouse. Do not configure it for self-managed ClickHouse clusters.
SET enforce_on_cluster_default_for_ddl = 0;
Run the BACKUP command.
BACKUP TABLE default.data TO S3('https://<yourBucketName>.<yourEndpoint>/data/data_1.zip', 'yourAccessKeyID', 'yourAccessKeySecret')
Run the RESTORE command.
RESTORE TABLE default.data FROM S3('https://<yourBucketName>.<yourEndpoint>/data/data_1.zip', 'yourAccessKeyID', 'yourAccessKeySecret')
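After the per-node restore completes, a quick sanity check is to compare row counts between the source table and the restored table. A minimal sketch, assuming the default.data table from the examples above:

```sql
-- The count on the restored node should match the source table at backup time.
SELECT count() FROM default.data;
```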