Container Service for Kubernetes: Use Managed Service for Prometheus to monitor the backup center and configure alerting

Last Updated: Mar 26, 2026

You can integrate the backup center with Managed Service for Prometheus to monitor backup vaults and backup tasks in real time, and configure alerts for backup failures.

Prerequisites

Before you begin, ensure that you have:

Billing

The migrate-controller component sends metrics to Managed Service for Prometheus. These metrics are classified as custom metrics and incur additional fees.

Read Metrics before enabling the backup center monitoring feature to understand the billing rules for custom metrics. Fees vary based on cluster size and number of applications. You can also view resource usage in Managed Service for Prometheus.

Integrate the backup center with Managed Service for Prometheus

  1. Log on to the ARMS console.

  2. In the left-side navigation pane, click Integration Center. On the Infrastructure tab, search for and click Ack Backup Center.

  3. On the Start Integration tab, select the ACK cluster that has the backup center installed, then click OK.

After the integration is complete, view the dashboards in the ACK console or the ARMS console.

View backup center dashboards

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, find your cluster and click its name. In the left navigation pane, choose Operations > Prometheus Monitoring.

  3. On the Prometheus Monitoring page, click the Others tab and view the backup center dashboards under the ACK BackupCenter tab.

To view the backup center dashboards in the ARMS console, see View a dashboard.

Three dashboards are available: Backup Locations (backup vault information), Backup Operation Status (backup task information), and Addon Status (working component information).

Backup Locations

This dashboard shows the status of backup vaults associated with the current cluster (Backuplocation Detail).

A backup vault stores backup files and represents the association between the backup center and an Object Storage Service (OSS) bucket. The backup center can run backup, snapshot, and restore tasks only after a backup vault enters the Available state.

Metric | Description
Backuplocation | The name of the backup vault.
OSS Bucket | The name of the OSS bucket associated with the backup vault.
Region | The region of the OSS bucket, such as cn-hangzhou.
NetworkPolicy | The network connection type between the backup vault and the OSS bucket. Valid values: internal (internal network) or Public (Internet).
Phase | The status of the backup vault. Valid values: InProgress (initialization in progress while OSS connectivity is checked; this state lasts only a short time), Available (OSS connectivity is normal, backup tasks can run), or Unavailable (disconnected from OSS, backup tasks cannot run).

If a backup vault is in the Unavailable state, run the following command to view the error details. Replace <unavailable-backuplocation-name> with the name of the affected backup vault.

kubectl -n csdr describe backuplocation <unavailable-backuplocation-name>

For more troubleshooting information, see Backup center FAQ.
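To check every backup vault at once instead of describing them one by one, a small helper along the following lines may be useful. This is a minimal sketch: the function name is hypothetical, and it assumes that the vault phase is exposed at .status.phase on the backuplocation resource, which this page does not confirm.

```shell
# Sketch: list backup vaults in the csdr namespace whose phase is not Available.
# Assumption: the phase is stored at .status.phase (not confirmed by this page).
list_unhealthy_vaults() {
  kubectl -n csdr get backuplocation \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase --no-headers |
    awk '$2 != "Available" { print $1, $2 }'
}
```

Calling list_unhealthy_vaults prints one "name phase" pair per affected vault and prints nothing when every vault is Available. Each name it prints can then be passed to the describe command above.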

Backup Operation Status

This dashboard shows the status of backup tasks, including an overview of all backup tasks (Backup Overview) and details of failed tasks (Failed Backup Detail).

Backup Overview displays a histogram of backup tasks per vault in the current cluster, covering both instant and scheduled backup tasks. The X axis shows backup vault names and the Y axis shows the number of tasks per vault.

Metric | Description
Backup (Failed) | The red bar shows the number of failed backup tasks.
Backup (Completed) | The green bar shows the number of successful backup tasks.

Failed Backup Detail shows basic information about failed backup tasks in the current cluster.

Metric | Description
Backup | The name of the backup task.
Backuplocation | The name of the backup vault the task belongs to.
BackupType | The backup mode. Valid values: AppBackup (YAML-only backups) or AppAndPvBackup (YAML files and persistent volume (PV) data).
DataType | The type of data backup. Valid values: snapshot (disk PVs), hbr (file system PVs, including HostPath local volumes, NAS volumes, and OSS volumes), all (both disk and file system PVs), or none (data backup is enabled but no PV is used in the specified namespace).
FromSchedule | The backup task type. Empty indicates an instant backup task. Not empty indicates a scheduled backup task and displays the backup plan name.

If a backup task fails, investigate using the CLI or the ACK console.

CLI

Run the following command to view the error details. Replace <failed-applicationbackup-name> with the name of the failed backup task.

kubectl -n csdr describe applicationbackup <failed-applicationbackup-name>

ACK console

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. Find your cluster and click its name. In the left navigation pane, choose Operations > Application Backup.

  3. On the Application Backup page, click the Backup Records tab, find the failed backup task, and click Failed in the Status column.

For more troubleshooting information, see Backup center FAQ.

Addon Status

This dashboard shows the status of the csdr-controller and csdr-velero working components. Both components must run normally for the backup center to perform backup, snapshot, and restore tasks.

After migrate-controller is installed, it runs a precheck on the cluster and then deploys csdr-controller and csdr-velero in the csdr namespace.

Both components run as Deployment pods.

Metric | Description
Age | The uptime of the working component.
Status | The health status. Health: the pod is running normally. UnHealth: the pod failed to start or probing failed.
Pods | Detailed information about the working component pod.
Memory Request | Memory resources reserved for the working component.
CPU Request | CPU resources reserved for the working component.
Memory Limit | The memory upper limit of the working component.
CPU Limit | The CPU upper limit of the working component.

If a working component shows UnHealth or has a non-zero restart count, see Troubleshooting below.
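The same component status can be cross-checked from the command line. The following is a minimal sketch with a hypothetical function name. It relies on the default column layout of kubectl get deployment (NAME, READY, UP-TO-DATE, AVAILABLE, AGE) and reports any Deployment in the csdr namespace that has fewer ready replicas than desired.

```shell
# Sketch: print csdr Deployments whose ready replica count is below the desired count.
# Relies on the default `kubectl get deployment` columns: NAME READY UP-TO-DATE AVAILABLE AGE.
list_unready_components() {
  kubectl -n csdr get deployment --no-headers |
    awk '{ split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1, $2 }'
}
```

An empty result suggests that csdr-controller and csdr-velero are both ready. Restart counts are not covered by this check; kubectl -n csdr get pods shows them in the RESTARTS column.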

Configure alerting for backup task failures

Backup task failure alerts are event-based. When a backup task fails, the corresponding applicationbackups custom resource in the csdr.alibabacloud.com API group generates a WARN event.

To query WARN events from failed backup tasks, run:

kubectl -n csdr get events --field-selector='type!=Normal'

Expected output:

VaultError: backup vault is unavailable: oss: service returned error: StatusCode=403, ErrorCode=AccessDenied, ErrorMessage="The bucket you access does not belong to you.", RequestId=668516BC35F915******

The message that follows VaultError shows the cause of the failure.
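When the namespace produces many events, it can help to keep only the lines that carry a VaultError message. The sketch below wraps the query above. Both function names are hypothetical, and the filter matches only the VaultError: prefix shown in the sample output, so other failure types would need their own pattern.

```shell
# Sketch: keep only event lines that carry a VaultError message.
# Reads `kubectl get events` output on stdin, so it can be tried without a cluster.
filter_vault_errors() {
  grep 'VaultError:' || true   # no match is not treated as an error
}

# Hypothetical wrapper: query non-Normal events in csdr and filter them.
backup_vault_errors() {
  kubectl -n csdr get events --field-selector='type!=Normal' | filter_vault_errors
}
```

Running backup_vault_errors prints the matching event lines, or nothing if no vault-related failure has been recorded.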

To configure alert rules that trigger on these events, use the ACK cluster alerting feature. For more information, see Container Service alert management.

Troubleshooting

Working component not found or in the UnHealth state

Symptom: After the backup center is installed, the working component cannot be found or is repeatedly deployed.

Run the following command to check the status of migrate-controller:

kubectl -n kube-system get pod -l app=migrate-controller

If the pod is in CrashLoopBackOff or keeps restarting, the cluster failed the precheck. This typically occurs because the cluster uses FlexVolume or the registered cluster lacks the required permissions. For more information, see Backup center FAQ and Registered cluster.
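The check described above can be scripted so that only problematic pods are reported. This is a minimal sketch with a hypothetical function name. It relies on the default kubectl get pod columns (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Sketch: report migrate-controller pods that are crash-looping or have restarted.
# Relies on the default `kubectl get pod` columns: NAME READY STATUS RESTARTS AGE.
list_restarting_migrate_controller() {
  kubectl -n kube-system get pod -l app=migrate-controller --no-headers |
    awk '$3 == "CrashLoopBackOff" || $4 + 0 > 0 { print $1, $3, $4 }'
}
```

For any pod it reports, kubectl -n kube-system logs <pod-name> --previous shows the output of the last terminated container, which is where a precheck failure would most likely be logged.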

Symptom: The UnHealth state persists and the pod dashboard shows no data.

The working component pod failed to start. For more information, see Pod troubleshooting.

Symptom: The working component shows Health but has a non-zero restart count.

The csdr-velero component can experience memory spikes during backup, which may trigger Out-of-Memory (OOM) errors and cause the component to exit unexpectedly. Increase the memory limit to resolve this.

If a working component pod exits during a backup, the backup task will fail or remain in the InProgress state for an extended period.
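One possible way to apply the memory increase suggested above is kubectl set resources. The sketch below rests on assumptions that this page does not confirm: that the Deployment is named csdr-velero, and that migrate-controller does not revert a manual change. The function name and the 2Gi default are illustrative only; see Backup center FAQ for the supported procedure.

```shell
# Sketch: raise the memory limit of the csdr-velero Deployment.
# Assumptions: the Deployment is named csdr-velero, and a manual change is not
# reverted by migrate-controller. The 2Gi default is an arbitrary example value.
raise_velero_memory_limit() {
  limit="${1:-2Gi}"
  kubectl -n csdr set resources deployment csdr-velero --limits="memory=${limit}"
}
```

For example, raise_velero_memory_limit 4Gi sets the limit to 4Gi on every container in the Deployment. Changing the pod template causes the pod to be re-created, so avoid running it while a backup is in progress.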

Backup vault in the Unavailable state

Run the following command to view the error message. Replace <unavailable-backuplocation-name> with the name of the affected backup vault.

kubectl -n csdr describe backuplocation <unavailable-backuplocation-name>

For more troubleshooting information, see Backup center FAQ.