This topic provides answers to some frequently asked questions about the backup center.

Note If you want to use the command-line interface to access the backup center, we recommend that you update the backup component migrate-controller to the latest version before you perform troubleshooting. The update does not affect the existing backups.

What do I do if the migrate-controller component in a cluster that uses FlexVolume cannot be launched?

The migrate-controller component does not support clusters that use FlexVolume. To use the backup center feature, switch the cluster from FlexVolume to Container Storage Interface (CSI).

What do I do if the status of the backup, restore, or snapshot conversion task is Failed and the ack backup location status is not ok: XXX error is returned?

Issue

The status of the backup, restore, or snapshot conversion task is Failed and the ack backup location status is not ok: XXX error is returned.

Causes

  • The specified Object Storage Service (OSS) bucket does not exist.
  • The cluster does not have permissions to access OSS.
  • The network of the OSS bucket is unreachable.

Solution

  1. Log on to the OSS console. Check whether the OSS bucket associated with the backup vault exists.
    If the OSS bucket does not exist, create one and associate it with the backup vault. For more information, see Create buckets.
  2. Check whether the cluster has permissions to access OSS.
    • Container Service for Kubernetes (ACK) Pro clusters: No OSS permissions are required. Make sure that the name of the OSS bucket associated with the backup vault matches cnfs-oss-**.
    • ACK dedicated clusters and registered clusters: OSS permissions are required. For more information, see Install migrate-controller and grant permissions.
    Note You cannot create a backup vault that uses the same name as a deleted one. You cannot associate a backup vault with an OSS bucket whose name does not match cnfs-oss-**. If your backup vault is already associated with such a bucket, create a backup vault that uses a different name and associate it with an OSS bucket whose name meets the requirement.
  3. Check the network configuration of the cluster. If the cluster and OSS bucket are deployed in different regions, the backup vault needs to access the OSS bucket over the Internet.
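
The naming requirement in step 2 can be checked before you associate a bucket with a backup vault. The following shell sketch assumes that cnfs-oss-** means a name that starts with cnfs-oss- followed by at least one more character. The function name is_cnfs_oss_bucket and the bucket name cnfs-oss-demo are hypothetical and used only for illustration.

```shell
# Hypothetical helper: succeeds if the bucket name starts with "cnfs-oss-"
# and has at least one character after that prefix (assumed meaning of cnfs-oss-**).
is_cnfs_oss_bucket() {
  case "$1" in
    cnfs-oss-?*) return 0 ;;
    *)           return 1 ;;
  esac
}

# Example with a placeholder bucket name.
bucket="cnfs-oss-demo"
if is_cnfs_oss_bucket "$bucket"; then
  echo "$bucket: name matches cnfs-oss-**"
else
  echo "$bucket: name does not match cnfs-oss-**"
fi
```

The check is purely textual. It does not confirm that the bucket exists or that the cluster can reach it, so steps 1 and 3 still apply.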

What do I do if the backup, restore, or snapshot conversion task remains in the Inprogress state for a long period of time?

Cause 1: The components in the csdr namespace cannot run normally

Check the status of the components and identify the cause of the anomaly.

  1. Run the following command to check whether the components in the csdr namespace are restarted or cannot be launched:
    kubectl get pod -n csdr
  2. Run the following command to identify the cause of the restart or launch failure:
    kubectl describe pod <pod-name> -n csdr
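  • If the components are restarted due to an out-of-memory (OOM) error, perform the following steps:

    Run the following command to increase the memory limit of the Deployment. For a csdr-controller-*** pod, set <deploy-name> to csdr-controller. For a csdr-velero-*** pod, set <deploy-name> to csdr-velero. The command assumes that the affected container is the first container in the pod template and that it already has a memory limit.

    kubectl patch deploy <deploy-name> -n csdr --type=json -p '[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"<new-limit-memory>"}]'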
  • If the components cannot be launched due to insufficient HBR permissions, perform the following steps:
    1. Check whether Hybrid Backup Recovery (HBR) is activated for the cluster.
      • If HBR is not activated, activate the service. For more information, see HBR.
      • If HBR is activated, proceed with the next step.
    2. Check whether the ACK Pro cluster or registered cluster has HBR permissions.
    3. Run the following command to check whether the token required by the HBR client exists:
      kubectl describe pod <hbr-client-***> -n csdr
      If a couldn't find key HBR_TOKEN event is generated, the token does not exist. Perform the following steps to resolve the issue:
      1. Run the following command to query the node that hosts hbr-client-***:
        kubectl get pod <hbr-client-***> -n csdr -owide
      2. Run the following command to change the value of the csdr.alibabacloud.com/agent-enable label on the node from true to false:
        kubectl label node <node-name> csdr.alibabacloud.com/agent-enable=false --overwrite
        Important
        • When the system reruns the backup or restore task, the system automatically creates a token and launches hbr-client.
        • You cannot launch hbr-client by copying a token from another cluster to the current cluster. You need to delete the copied token and the corresponding hbr-client-*** pod and repeat the preceding steps.
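
The triage in Cause 1 can be sketched as two small shell filters that read kubectl output from standard input. This is a minimal sketch, not part of the backup center tooling: the function names unhealthy_pods and classify_pod_failure are hypothetical, and the filters assume the default column layout of kubectl get pod (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Print the names of pods that are not Running or that have restarted.
# Reads `kubectl get pod -n csdr` output on stdin.
unhealthy_pods() {
  awk 'NR > 1 && ($3 != "Running" || $4 + 0 > 0) { print $1 }'
}

# Map `kubectl describe pod <pod-name> -n csdr` output (stdin) to one of
# the two causes described above, or "unknown" if neither marker appears.
classify_pod_failure() {
  desc=$(cat)
  case "$desc" in
    *OOMKilled*)              echo "oom" ;;
    *"find key HBR_TOKEN"*)   echo "missing-hbr-token" ;;
    *)                        echo "unknown" ;;
  esac
}

# Usage against a live cluster:
#   kubectl get pod -n csdr | unhealthy_pods
#   kubectl describe pod <pod-name> -n csdr | classify_pod_failure
```

The classifier only looks for the two markers named in this topic. An "unknown" result means that the describe output needs to be read manually.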

Cause 2: No permissions are granted to use disk snapshots in disk backup scenarios

If you back up the disk volume that is mounted to your application but the backup task remains in the Inprogress state for a long period of time, run the following command to query the newly created VolumeSnapshots in the cluster:

kubectl get volumesnapshot -n <backup-namespace>

Expected output:

NAME                    READYTOUSE      SOURCEPVC         SOURCESNAPSHOTCONTENT         ...
<volumesnapshot-name>   true                              <volumesnapshotcontent-name>  ...
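
When a namespace contains many VolumeSnapshots, the output above can be filtered down to the ones that are not ready. The following sketch assumes that READYTOUSE is the second column, as in the sample output; the function name not_ready_snapshots is hypothetical.

```shell
# Print the names of VolumeSnapshots whose READYTOUSE column is "false".
# Reads `kubectl get volumesnapshot -n <backup-namespace>` output on stdin.
not_ready_snapshots() {
  awk 'NR > 1 && $2 == "false" { print $1 }'
}

# Usage against a live cluster:
#   kubectl get volumesnapshot -n <backup-namespace> | not_ready_snapshots
```

An empty result means that every listed VolumeSnapshot reports READYTOUSE as true.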

If the READYTOUSE state of all VolumeSnapshots remains false for a long period of time, perform the following steps.

  1. Log on to the Elastic Compute Service (ECS) console and check whether the disk snapshot feature is enabled.
    • If the feature is disabled, enable the feature in the corresponding region. For more information, see Activate ECS Snapshot.
    • If the feature is enabled, proceed with the next step.
  2. Check whether the permissions to use disk snapshots are granted.
    1. Log on to the ACK console and click Clusters in the left-side navigation pane.
    2. On the Clusters page, click the name of a cluster and click Cluster Information in the left-side navigation pane.
    3. On the Cluster Information page, click the Cluster Resources tab and click the hyperlink to the right of Master RAM Role to go to the permission management page.
    4. On the Policies page, check whether the permissions to use disk snapshots are granted.
      • If the k8sMasterRolePolicy-Csi-*** policy exists and grants the disk snapshot permissions, such as ecs:CreateSnapshot and ecs:DeleteSnapshot, the required permissions are granted. In this case, submit a ticket.
      • If the k8sMasterRolePolicy-Csi-*** policy does not exist, attach the following policy to the master RAM role to grant the permissions to use disk snapshots. For more information, see Create a custom policy and Grant permissions to a RAM role.
        {
            "Version": "1",
            "Statement": [
                {
                    "Action": [
                        "ecs:DescribeDisks",
                        "ecs:DescribeInstances",
                        "ecs:DescribeAvailableResource",
                        "ecs:DescribeInstanceTypes",
                        "nas:DescribeFileSystems",
                        "ecs:AttachDisk",
                        "ecs:CreateDisk",
                        "ecs:CreateSnapshot",
                        "ecs:DeleteDisk",
                        "ecs:DeleteSnapshot",
                        "ecs:DetachDisk"
                    ],
                    "Resource": [
                        "*"
                    ],
                    "Effect": "Allow"
                }
            ]
        }
    5. If the issue persists after you perform the preceding steps, submit a ticket.
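
To compare an existing policy document with the policy shown in step 4, a plain text check is often enough. The following sketch assumes that the policy JSON has been saved to a local file; the file name policy.json and the function name missing_policy_actions are hypothetical. The action list is copied from the policy above.

```shell
# Actions listed in the policy shown in step 4.
REQUIRED_ACTIONS="ecs:DescribeDisks ecs:DescribeInstances ecs:DescribeAvailableResource
ecs:DescribeInstanceTypes nas:DescribeFileSystems ecs:AttachDisk ecs:CreateDisk
ecs:CreateSnapshot ecs:DeleteDisk ecs:DeleteSnapshot ecs:DetachDisk"

# Print every required action that does not appear, as a quoted string,
# in the policy JSON read from stdin.
missing_policy_actions() {
  policy=$(cat)
  for action in $REQUIRED_ACTIONS; do
    case "$policy" in
      *"\"$action\""*) ;;              # action is present, nothing to report
      *) echo "$action" ;;
    esac
  done
}

# Usage with a hypothetical local copy of the policy:
#   missing_policy_actions < policy.json
```

The check is textual only. It does not evaluate the Effect or Resource fields, and it reports an action as missing if the policy grants it through a wildcard such as ecs:*.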

What do I do if the console displays the Failed to retrieve the data. Refresh and try again. 404 page not found error?

Issue

The console displays Failed to retrieve the data. Refresh and try again. 404 page not found.

Causes

The relevant custom resource definitions (CRDs) fail to be deployed.
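
One way to confirm this cause is to list the CRDs in the cluster and count the ones that belong to the backup center. The following sketch is a diagnostic aid only. It assumes that the backup center CRDs belong to the csdr.alibabacloud.com API group, which is inferred from the csdr.alibabacloud.com/agent-enable label key used in this topic and is not confirmed here. The function name count_crds_in_group is hypothetical.

```shell
# Count CRDs whose name contains the given API group string.
# Reads `kubectl get crd` output on stdin; the first line is the header.
count_crds_in_group() {
  awk -v group="$1" 'NR > 1 && index($1, group) > 0 { n++ } END { print n + 0 }'
}

# Usage against a live cluster (the group name is an assumption):
#   kubectl get crd | count_crds_in_group "csdr.alibabacloud.com"
```

A count of 0 would be consistent with the CRDs not being deployed.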

Solution

What do I do if the system prompts that no backup file can be selected when the system initializes the backup vault to restore an application across clusters?

Issue

The system prompts that no backup file can be selected when the system initializes the backup vault to restore an application across clusters.

Cause

The backup vault that you select is not associated with your cluster. The system initializes the backup vault and synchronizes the basic information about the backup vault, including the OSS bucket information, to the cluster. Then, the system initializes the backup files from the backup vault in the cluster. You can select a backup file from the backup vault to restore the application only after the backup vault is initialized.

Solution

In the Create Restoration Task panel, click Initialize Backup Vault to the right of Backup Vaults, wait until the backup vault is initialized, and then select a backup file.

How do I modify a backup vault?

You cannot modify backup vaults in the backup center. If you want to modify a backup vault, delete the current backup vault and create a new one.

Backup vaults are shared resources. Existing backup vaults may be in the Backing Up or Restoring state. If you modify a parameter of the backup vault, the system may fail to find the required data when backing up or restoring an application. Therefore, backup vaults cannot be modified.

Important
  • If a backup vault has never been used before, you can delete the backup vault and then create a new one with the same name.
  • If the backup vault has been used to back up or restore data, the preceding method does not apply. This ensures that the backup or restore task can be completed without errors.

What do I do if the status of the backup task is Failed and the velero backup: XXX failed error is returned?

Issue

The backup task failed and the velero backup: XXX failed error is returned.

Cause

During the backup, the csdr-velero-**** pod in the csdr namespace encounters an error, such as OOMKilled. csdr-velero can reach its memory limit during application backup, especially when a large number of objects are backed up.

Solution

For more information about how to resolve the issue, see Cause 1 in "What do I do if the backup, restore, or snapshot conversion task remains in the Inprogress state for a long period of time?"