This topic provides answers to some frequently asked questions about the backup center.
Table of contents
- What do I do if the migrate-controller component in a cluster that uses FlexVolume cannot be launched?
- What do I do if the status of the backup, restore, or snapshot conversion task is Failed and the ack backup location status is not ok: XXX error is returned?
- What do I do if the backup, restore, or snapshot conversion task remains in the Inprogress state for a long period of time?
- What do I do if the console displays the "Failed to retrieve the data. Refresh and try again. 404 page not found" error?
- What do I do if the system prompts that no backup file can be selected when the system initializes the backup vault to restore an application across clusters?
- How do I modify a backup vault?
- What do I do if the status of the backup task is Failed and the velero backup: XXX failed error is returned?
What do I do if the migrate-controller component in a cluster that uses FlexVolume cannot be launched?
The migrate-controller component does not support clusters that use FlexVolume. If you want to use the backup center feature, you can use one of the following methods to switch from FlexVolume to Container Storage Interface (CSI):
- Upgrade from FlexVolume to CSI for clusters where no data is stored
- Use CSI to take over the statically provisioned NAS volumes that are managed by FlexVolume
- Use CSI to take over the statically provisioned OSS volumes that are managed by FlexVolume
- To switch from FlexVolume to CSI in other scenarios, join the DingTalk group 35532895.
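Before you choose one of the methods above, it can help to list the volumes that are still managed by FlexVolume. The following Python sketch is illustrative only. It assumes that you saved the output of kubectl get pv -o json to a file, and it treats any PersistentVolume whose spec contains a flexVolume source as FlexVolume-managed. The function names are hypothetical.

```python
import json


def list_flexvolume_pvs(pv_list: dict) -> list[tuple[str, str]]:
    """Return (pv_name, driver) pairs for PVs that use a FlexVolume source.

    pv_list is the parsed JSON from `kubectl get pv -o json`.
    """
    result = []
    for pv in pv_list.get("items", []):
        flex = pv.get("spec", {}).get("flexVolume")
        if flex is not None:
            # The driver field names the FlexVolume plugin that manages this PV.
            result.append((pv["metadata"]["name"], flex.get("driver", "")))
    return result


def main(path: str) -> None:
    """Print one line per FlexVolume PV found in a saved `kubectl get pv -o json` file."""
    with open(path) as f:
        for name, driver in list_flexvolume_pvs(json.load(f)):
            print(f"{name}\t{driver}")
```

For example, run kubectl get pv -o json > pvs.json and then call main("pvs.json"). The driver value usually indicates the storage type, which helps you decide whether the NAS or OSS takeover method applies.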
What do I do if the status of the backup, restore, or snapshot conversion task is Failed and the ack backup location status is not ok: XXX error is returned?
Issue
The status of the backup, restore, or snapshot conversion task is Failed and the ack backup location status is not ok: XXX error is returned.
Causes
- The specified Object Storage Service (OSS) bucket does not exist.
- The cluster does not have permissions to access OSS.
- The OSS bucket cannot be reached over the network.
Solution
- Log on to the OSS console and check whether the OSS bucket associated with the backup vault exists. If the OSS bucket does not exist, create one and associate it with the backup vault. For more information, see Create buckets.
- Check whether the cluster has permissions to access OSS.
- Container Service for Kubernetes (ACK) Pro clusters: No OSS permissions are required. Make sure that the name of the OSS bucket associated with the backup vault matches the cnfs-oss-** format.
- ACK dedicated clusters and registered clusters: OSS permissions are required. For more information, see Install migrate-controller and grant permissions.
Note: You cannot create a backup vault that uses the same name as a deleted backup vault. You cannot associate a backup vault with an OSS bucket whose name does not match cnfs-oss-**. If your backup vault is already associated with an OSS bucket whose name does not match cnfs-oss-**, create a new backup vault with a different name and associate it with an OSS bucket whose name meets the requirement.
- Check the network configuration of the cluster. If the cluster and the OSS bucket are deployed in different regions, the backup vault must access the OSS bucket over the Internet.
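The checklist above can be condensed into a pre-flight function. This Python sketch encodes only the rules stated in this section and interprets cnfs-oss-** as "the bucket name starts with cnfs-oss-". The cluster-type strings and the function name are assumptions. The sketch does not call any Alibaba Cloud API, so you still need to confirm in the consoles that the bucket exists and that the required permissions are granted.

```python
def vault_checklist(cluster_type: str, bucket_name: str,
                    cluster_region: str, bucket_region: str) -> list[str]:
    """Return action items for a backup vault, based on the rules in this FAQ.

    cluster_type is assumed to be "pro", "dedicated", or "registered".
    """
    actions = []
    if cluster_type == "pro":
        # ACK Pro: no OSS permissions are needed, but the bucket name must match
        # cnfs-oss-** (interpreted here as a "cnfs-oss-" prefix).
        if not bucket_name.startswith("cnfs-oss-"):
            actions.append(
                f"Bucket {bucket_name!r} does not match cnfs-oss-**: create a backup vault "
                "with a different name and associate it with a compliant bucket.")
    elif cluster_type in ("dedicated", "registered"):
        actions.append("Grant OSS permissions when you install migrate-controller.")
    else:
        raise ValueError(f"unknown cluster type: {cluster_type!r}")
    if cluster_region != bucket_region:
        actions.append("The cluster and bucket are in different regions: "
                       "the backup vault must access the bucket over the Internet.")
    return actions
```

An empty list means that none of the rules in this section is violated, not that the backup vault is guaranteed to work.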
What do I do if the backup, restore, or snapshot conversion task remains in the Inprogress state for a long period of time?
Cause 1: The components in the csdr namespace do not run as expected
Check the status of the components and identify the cause of the anomaly.
- Run the following command to check whether the components in the csdr namespace are restarted or cannot be launched:
kubectl get pod -n csdr
- Run the following command to identify the cause of the restart or launch failure.
kubectl describe pod <pod-name> -n csdr
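- If the components are restarted due to an out-of-memory (OOM) error, run the following command to increase the memory limit of the corresponding Deployment. If the pod is named csdr-controller-***, set <deploy-name> to csdr-controller. If the pod is named csdr-velero-***, set <deploy-name> to csdr-velero. Set <container-name> to the name of the container in the Deployment and <new-limit-memory> to the new memory limit.
kubectl patch deploy <deploy-name> -n csdr -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container-name>","resources":{"limits":{"memory":"<new-limit-memory>"}}}]}}}}'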
- If the components cannot be launched due to insufficient HBR permissions, perform the following steps:
- Check whether Hybrid Backup Recovery (HBR) is activated for the cluster.
- If HBR is not activated, activate the service. For more information, see HBR.
- If HBR is activated, proceed with the next step.
- Check whether the ACK Pro cluster or registered cluster has HBR permissions.
- If the cluster does not have permissions, grant HBR permissions to the cluster. For more information, see Install migrate-controller and grant permissions.
- If the cluster has permissions, proceed with the next step.
- Run the following command to check whether the token required by the HBR client exists:
kubectl describe pod <hbr-client-***> -n csdr
If an event with the message couldn't find key HBR_TOKEN is generated, the token does not exist. Perform the following steps to resolve the issue:
- Run the following command to query the node that hosts the hbr-client-*** pod:
kubectl get pod <hbr-client-***> -n csdr -o wide
- Run the following command to change the value of the csdr.alibabacloud.com/agent-enable label of the node from true to false:
kubectl label node <node-name> csdr.alibabacloud.com/agent-enable=false --overwrite
Important:
- When the system reruns the backup or restore task, the system automatically creates a token and launches hbr-client.
- You cannot launch hbr-client by copying a token from another cluster to the current cluster. If you copied a token, delete the copied token and the corresponding hbr-client-*** pod, and then repeat the preceding steps.
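The checks in Cause 1 can be partly scripted. The following Python sketch assumes that you saved the output of kubectl get pod -n csdr -o json and kubectl get events -n csdr -o json to files and parsed them with json.load. It reports containers that restarted or are stuck waiting, together with the last termination reason (for example, OOMKilled). It also lists hbr-client-*** pods that have a couldn't find key HBR_TOKEN event. Reading events in bulk is an alternative to running kubectl describe pod on each pod. The function names are hypothetical.

```python
def summarize_csdr_pods(pod_list: dict) -> list[dict]:
    """List containers in csdr pods that restarted or are waiting to start.

    pod_list is the parsed JSON from `kubectl get pod -n csdr -o json`.
    """
    findings = []
    for pod in pod_list.get("items", []):
        for cs in pod.get("status", {}).get("containerStatuses", []):
            restarts = cs.get("restartCount", 0)
            # Why the container last stopped, for example "OOMKilled".
            last_reason = cs.get("lastState", {}).get("terminated", {}).get("reason")
            # Why the container is not running now, for example "CreateContainerConfigError".
            waiting_reason = cs.get("state", {}).get("waiting", {}).get("reason")
            if restarts > 0 or waiting_reason:
                findings.append({
                    "pod": pod["metadata"]["name"],
                    "container": cs.get("name"),
                    "restarts": restarts,
                    "last_terminated_reason": last_reason,
                    "waiting_reason": waiting_reason,
                })
    return findings


def pods_missing_hbr_token(event_list: dict) -> list[str]:
    """Return hbr-client-*** pods that have a "couldn't find key HBR_TOKEN" event.

    event_list is the parsed JSON from `kubectl get events -n csdr -o json`.
    """
    pods = []
    for event in event_list.get("items", []):
        obj = event.get("involvedObject", {})
        name = obj.get("name", "")
        if (obj.get("kind") == "Pod"
                and name.startswith("hbr-client-")
                and "couldn't find key HBR_TOKEN" in event.get("message", "")
                and name not in pods):  # events repeat; report each pod once
            pods.append(name)
    return pods
```

For each pod that pods_missing_hbr_token returns, you still need to query its node and relabel the node as described above.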
Cause 2: No permissions are granted to use disk snapshots in disk backup scenarios
If you back up the disk volume that is mounted to your application but the backup task remains in the Inprogress state for a long period of time, run the following command to query the newly created VolumeSnapshots in the cluster:
kubectl get volumesnapshot -n <backup-namespace>
Expected output:
NAME READYTOUSE SOURCEPVC SOURCESNAPSHOTCONTENT ...
<volumesnapshot-name> true <volumesnapshotcontent-name> ...
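To check the READYTOUSE column without reading the table by eye, you can parse the JSON form of the same command. This is a minimal Python sketch. It assumes that you saved the output of kubectl get volumesnapshot -n <backup-namespace> -o json, reads the status.readyToUse field of each VolumeSnapshot, and treats a snapshot that has no status yet as not ready. The function names are hypothetical.

```python
def snapshots_not_ready(snapshot_list: dict) -> list[str]:
    """Return names of VolumeSnapshots whose status.readyToUse is not true.

    snapshot_list is the parsed JSON from
    `kubectl get volumesnapshot -n <backup-namespace> -o json`.
    A snapshot that has no status yet counts as not ready.
    """
    return [
        snap["metadata"]["name"]
        for snap in snapshot_list.get("items", [])
        if snap.get("status", {}).get("readyToUse") is not True
    ]


def all_snapshots_unready(snapshot_list: dict) -> bool:
    """Return True if there is at least one VolumeSnapshot and none of them is ready."""
    items = snapshot_list.get("items", [])
    return bool(items) and len(snapshots_not_ready(snapshot_list)) == len(items)
```

The symptom is that snapshots stay unready for a long time, so run the check several times over a period of minutes instead of relying on a single result.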
If the READYTOUSE value of all VolumeSnapshots remains false for a long period of time, perform the following steps:
- Log on to the Elastic Compute Service (ECS) console and check whether the disk snapshot feature is enabled.
- If the feature is disabled, enable the feature in the corresponding region. For more information, see Activate ECS Snapshot.
- If the feature is enabled, proceed with the next step.
- Check whether the permissions to use disk snapshots are granted.
- Log on to the ACK console and click Clusters in the left-side navigation pane.
- On the Clusters page, click the name of a cluster and click Cluster Information in the left-side navigation pane.
- On the Cluster Information page, click the Cluster Resources tab and click the hyperlink to the right of Master RAM Role to go to the permission management page.
- On the Policies page, check whether the permissions to use disk snapshots are granted.
- If the k8sMasterRolePolicy-Csi-*** policy exists and grants the disk snapshot permissions listed in the following policy, such as ecs:CreateSnapshot and ecs:DeleteSnapshot, the required permissions are granted. In this case, submit a ticket.
- If the k8sMasterRolePolicy-Csi-*** policy does not exist, attach the following policy to the master RAM role to grant the permissions to use disk snapshots. For more information, see Create a custom policy and Grant permissions to a RAM role.
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "ecs:DescribeDisks",
        "ecs:DescribeInstances",
        "ecs:DescribeAvailableResource",
        "ecs:DescribeInstanceTypes",
        "nas:DescribeFileSystems",
        "ecs:AttachDisk",
        "ecs:CreateDisk",
        "ecs:CreateSnapshot",
        "ecs:DeleteDisk",
        "ecs:DeleteSnapshot",
        "ecs:DetachDisk"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    }
  ]
}
- If the issue persists after you perform the preceding steps, submit a ticket.
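If the master RAM role already has a custom policy, you can compare it with the disk snapshot policy above programmatically. The following Python sketch takes a policy document as a JSON string and returns the required actions that no Allow statement grants. It is a simplification: it matches wildcards such as ecs:Describe* with fnmatch, compares action names case-insensitively, and ignores Deny statements and the Resource and Condition elements. Treat an empty result as a hint, not as proof. The function and constant names are hypothetical.

```python
import fnmatch
import json

# Actions listed in the disk snapshot policy in this topic.
REQUIRED_ACTIONS = {
    "ecs:DescribeDisks", "ecs:DescribeInstances", "ecs:DescribeAvailableResource",
    "ecs:DescribeInstanceTypes", "nas:DescribeFileSystems", "ecs:AttachDisk",
    "ecs:CreateDisk", "ecs:CreateSnapshot", "ecs:DeleteDisk",
    "ecs:DeleteSnapshot", "ecs:DetachDisk",
}


def missing_actions(policy_document: str) -> set[str]:
    """Return required actions that no Allow statement in the policy grants.

    Simplifications: Deny statements, Resource, and Condition are ignored, and
    action names are compared case-insensitively with shell-style wildcards.
    """
    policy = json.loads(policy_document)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # tolerate a single statement given as an object
        statements = [statements]
    patterns = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        patterns.extend(a.lower() for a in actions)
    return {
        action for action in REQUIRED_ACTIONS
        if not any(fnmatch.fnmatchcase(action.lower(), p) for p in patterns)
    }
```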
What do I do if the console displays the "Failed to retrieve the data. Refresh and try again. 404 page not found" error?
Issue
The console displays Failed to retrieve the data. Refresh and try again. 404 page not found.
Causes
The related custom resource definitions (CRDs) failed to be deployed.
Solution
- Check whether the cluster contains nodes. If the cluster has no nodes, the backup center cannot be deployed.
- Check whether the cluster uses FlexVolume. If it does, switch to CSI. For more information, see What do I do if the migrate-controller component in a cluster that uses FlexVolume cannot be launched?
- If you use the command-line tool to access the backup center, check whether the YAML content contains errors. For more information, see Send kubelet requests to use the backup and restore features.
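The node check can be scripted against the node list. The following Python sketch assumes that you parsed the output of kubectl get nodes -o json. In addition to the existence check described above, it also counts nodes whose Ready condition is True. That extra requirement is an assumption of this sketch, not a documented requirement of the backup center. The function names are hypothetical.

```python
def node_readiness(node_list: dict) -> tuple[int, int]:
    """Return (total_nodes, ready_nodes) from `kubectl get nodes -o json` output."""
    items = node_list.get("items", [])
    ready = 0
    for node in items:
        for cond in node.get("status", {}).get("conditions", []):
            # A node counts as ready when its Ready condition has status "True".
            if cond.get("type") == "Ready" and cond.get("status") == "True":
                ready += 1
                break
    return len(items), ready


def backup_center_can_deploy(node_list: dict) -> bool:
    """Require at least one node and, as a stricter assumption, at least one Ready node."""
    total, ready = node_readiness(node_list)
    return total > 0 and ready > 0
```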
What do I do if the system prompts that no backup file can be selected when the system initializes the backup vault to restore an application across clusters?
Issue
The system prompts that no backup file can be selected when the system initializes the backup vault to restore an application across clusters.
Cause
The backup vault that you select is not associated with your cluster. The system initializes the backup vault and synchronizes the basic information about the backup vault, including the OSS bucket information, to the cluster. Then, the system initializes the backup files from the backup vault in the cluster. You can select a backup file from the backup vault to restore the application only after the backup vault is initialized.
Solution
In the Create Restoration Task panel, click Initialize Backup Vault to the right of Backup Vaults, wait until the backup vault is initialized, and then select a backup file.
How do I modify a backup vault?
You cannot modify backup vaults in the backup center. If you want to modify a backup vault, delete the current backup vault and create a new one.
Backup vaults are shared resources. Existing backup vaults may be in the Backing Up or Restoring state. If you modify a parameter of the backup vault, the system may fail to find the required data when backing up or restoring an application. Therefore, backup vaults cannot be modified.
- If a backup vault has never been used before, you can delete the backup vault and then create a new one with the same name.
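- If the backup vault has been used to back up or restore data, you cannot delete it and then create a new backup vault with the same name. Create a backup vault with a different name instead. This restriction ensures that backup and restore tasks that depend on the original backup vault can be completed without errors.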
What do I do if the status of the backup task is Failed and the velero backup: XXX failed error is returned?
Issue
The backup task failed and the velero backup: XXX failed error is returned.
Cause
During the backup process, the csdr-velero-**** pod in the csdr namespace encounters an error, such as OOMKilled. If you back up a large number of objects, csdr-velero may reach its memory limit and be terminated with the OOMKilled error.
Solution
For more information about how to resolve the issue, see What do I do if the backup, restore, or snapshot conversion task remains in the Inprogress state for a long period of time?
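When csdr-velero is terminated with OOMKilled, the fix is to raise the memory limit of its Deployment, as described in the Inprogress FAQ. The following Python sketch finds OOMKilled containers in the parsed output of kubectl get pod -n csdr -o json and builds the corresponding kubectl patch command. It assumes the pod naming described in this topic (csdr-controller-*** and csdr-velero-***). You supply the memory value yourself because this topic does not specify one. The command is returned as an argument list and is not executed. The function names are hypothetical.

```python
import json

# Deployments in the csdr namespace whose pods are named <deployment>-***.
DEPLOYMENTS = ("csdr-controller", "csdr-velero")


def deployment_for_pod(pod_name: str) -> str:
    """Map a pod name such as csdr-velero-*** to its Deployment name."""
    for deploy in DEPLOYMENTS:
        if pod_name.startswith(deploy + "-"):
            return deploy
    raise ValueError(f"{pod_name} does not belong to {DEPLOYMENTS}")


def oom_killed(pod_list: dict) -> list[tuple[str, str]]:
    """Return (pod_name, container_name) pairs whose last termination was OOMKilled.

    pod_list is the parsed JSON from `kubectl get pod -n csdr -o json`.
    """
    hits = []
    for pod in pod_list.get("items", []):
        for cs in pod.get("status", {}).get("containerStatuses", []):
            reason = cs.get("lastState", {}).get("terminated", {}).get("reason")
            if reason == "OOMKilled":
                hits.append((pod["metadata"]["name"], cs["name"]))
    return hits


def memory_patch_command(pod_name: str, container_name: str, memory: str) -> list[str]:
    """Build (but do not run) a kubectl patch that raises one container's memory limit.

    Containers are a list under spec.template.spec and are merged by name,
    so the strategic merge patch must include the container name.
    """
    patch = {"spec": {"template": {"spec": {"containers": [
        {"name": container_name, "resources": {"limits": {"memory": memory}}}
    ]}}}}
    return ["kubectl", "patch", "deploy", deployment_for_pod(pod_name),
            "-n", "csdr", "-p", json.dumps(patch)]
```

After you review a generated command, you can run it with subprocess.run(cmd, check=True) or run the equivalent command in a shell.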