You can use the backup center to migrate applications from a cluster that uses the FlexVolume plug-in to a cluster that uses the Container Storage Interface (CSI) plug-in. You can also migrate applications from an older Kubernetes cluster to a new one. The backup center resolves issues that arise during application migration across clusters with different storage plug-ins or Kubernetes versions. For example, it can back up cluster-level resources that are not used by applications and automatically use an API version compatible with the restore cluster. This topic describes how to use the backup center to migrate applications by using an example of migrating an application from a cluster that uses FlexVolume and runs Kubernetes 1.16 to a cluster that uses CSI and runs Kubernetes 1.28.
Notes
The backup cluster and the restore cluster must be in the same region. The backup cluster must run Kubernetes 1.16 or later. To avoid API version compatibility issues, we recommend that you do not use the backup center to migrate applications from a newer to an older Kubernetes version.
The backup center does not back up resources that are being deleted.
To restore data to a Network Attached Storage (NAS) volume managed by Container Network File System (CNFS), you must create a StorageClass if you plan to set the StorageClass parameter to alibabacloud-cnfs-nas during restoration. For more information, see Manage NAS file systems by using CNFS.
When an application is restored, resources are preferentially restored using the recommended apiVersion for the Kubernetes version of the restore cluster. If a resource does not have an apiVersion that is supported by both cluster versions, you must manually deploy that resource. For example:
A Deployment in a Kubernetes 1.16 cluster supports the extensions/v1beta1, apps/v1beta1, apps/v1beta2, and apps/v1 API versions. When restored to a Kubernetes 1.28 cluster, it is restored by using apps/v1.
An Ingress in a Kubernetes 1.16 cluster supports the extensions/v1beta1 and networking.k8s.io/v1beta1 API versions. You cannot directly restore it to a cluster that runs Kubernetes 1.22 or later.
For more information about API changes across Kubernetes versions, see ACK release notes and Deprecated API Migration Guide.
Important: In a Kubernetes 1.16 cluster, API groups such as apps and rbac.authorization.k8s.io already support v1. When you migrate applications to a Kubernetes 1.28 cluster, you must manually restore resources such as Ingress and CronJob.
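Before a migration, it can help to find the manifests that fall into the manual-restore category described above. The following is a minimal sketch, not part of the backup center: the function name `flag_manual_restore` is hypothetical, it assumes multi-document YAML with unquoted top-level `apiVersion:` and `kind:` keys, and the CronJob rule (`batch/v1beta1`) is taken from the upstream Deprecated API Migration Guide rather than from this topic.

```shell
#!/bin/sh
# flag_manual_restore (hypothetical helper): read multi-document YAML on stdin
# and print "KIND APIVERSION" for each document that uses a beta API version
# with no automatic restore path to Kubernetes 1.28.
flag_manual_restore() {
  awk '
    function check() {
      if ((kind == "Ingress" && (api == "extensions/v1beta1" || api == "networking.k8s.io/v1beta1")) ||
          (kind == "CronJob" && api == "batch/v1beta1"))
        print kind, api
      kind = ""; api = ""
    }
    /^---/         { check(); next }   # document separator: evaluate the previous document
    /^apiVersion:/ { api = $2 }        # only top-level (unindented) keys are considered
    /^kind:/       { kind = $2 }
    END            { check() }
  '
}
```

For example, `flag_manual_restore < app-manifests.yaml` (an illustrative file name) lists the Ingress and CronJob documents to redeploy by hand. Deployments that use extensions/v1beta1 are deliberately not flagged because they are restored as apps/v1.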
Use cases
Application migration across storage plug-ins
ACK clusters that run Kubernetes 1.20 or later no longer support the FlexVolume storage plug-in. You can use the backup center to migrate stateful applications from a FlexVolume cluster to a CSI cluster.
Note: You can migrate applications from clusters that use either the FlexVolume or CSI storage plug-in, but the restore cluster must use the CSI storage plug-in.
Switching between clusters with large Kubernetes version gaps
In some scenarios, you may need to migrate services from an older Kubernetes cluster (1.16 or later) to a new cluster. For example, you might switch the network plug-in from Flannel to Terway. The backup center supports application migration across large version gaps and automatically adjusts basic configurations, such as the apiVersion in application templates, to match the new Kubernetes version.
Prerequisites
Activate Cloud Backup. When backing up NAS, OSS, or local disk persistent volumes, and in hybrid cloud scenarios, the backup center needs to use Cloud Backup for File Backup.
The restore cluster is created. To ensure that you can use Elastic Compute Service (ECS) snapshots to restore disk data, we recommend that the restore cluster run Kubernetes 1.18 or later. For more information, see Create an ACK managed cluster, Create an ACK dedicated cluster (discontinued), or Create an ACK One registered cluster.
Important: The restore cluster must use the Container Storage Interface (CSI) plug-in. Application restoration is not supported in clusters that use FlexVolume or use csi-compatible-controller and FlexVolume.
The backup center is used to back up and restore applications. Before you run a restore task, you must install and configure system components in the restore cluster. Example:
aliyun-acr-credential-helper: You need to grant permissions to the restore cluster and configure acr-configuration.
alb-ingress-controller: You need to configure an ALBConfig.
The migrate-controller backup service component is installed and its permissions are configured. For more information, see Install the migrate-controller backup service component and configure permissions.
To back up volumes by using cloud disk snapshots, you must install CSI plug-in v1.1.0 or later. For more information, see Install and upgrade CSI components.
Migration workflow
The migration workflow varies based on the storage plug-in that the backup cluster uses. The following figures show the details.
Backup cluster with no storage applications
Backup cluster using FlexVolume
Backup cluster using CSI
Procedure
This section uses an example of migrating applications, configurations, and volume data from an ACK cluster that uses FlexVolume and runs Kubernetes 1.16 to an ACK cluster that uses CSI and runs Kubernetes 1.28. The migration uses either a data source change or an unchanged data source. If you want to migrate applications that do not use storage or if your backup cluster uses the CSI storage plug-in, you can skip the steps that are marked as Optional.
If you use the unchanged data source method, you must change the reclaim policy of the PersistentVolume (PV) in the backup cluster to Retain. This prevents data from being deleted when the volume is deleted.

    kubectl patch pv/<pv-name> --type='json' -p '[{"op":"replace","path":"/spec/persistentVolumeReclaimPolicy","value":"Retain"}]'

Method | Description | Use case |
Data source change | This method backs up data from volumes in the backup cluster and creates a new copy of the data for applications in the restore cluster. This method creates two completely independent storage sets. The data restore process uses dynamic mounting, which allows you to change the storage type by converting the StorageClass. For example, you can convert NAS storage to disk storage. | |
Unchanged data source | This method uses static mounting during the restore process to reuse the original data source, such as a disk ID or an OSS bucket, based on the backed-up PersistentVolumeClaim (PVC) and PersistentVolume (PV). If you are migrating applications from a FlexVolume cluster to a CSI cluster, you must manually create a static PVC and PV because the YAML templates are not compatible. | The application cannot be paused for data writes during the backup and restore process, and requires strong data consistency. |
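When the unchanged data source method involves many PVs, patching them one by one is error-prone. The sketch below generates the same `kubectl patch` command shown above for every PV name on stdin, so that the commands can be reviewed before they are run. The function name `print_retain_patches` is hypothetical.

```shell
#!/bin/sh
# print_retain_patches (hypothetical helper): read PV names from stdin, one per
# line, and print a "kubectl patch" command that sets the reclaim policy to
# Retain. Nothing is executed; the output is meant to be reviewed first.
print_retain_patches() {
  while IFS= read -r pv; do
    [ -n "$pv" ] || continue   # skip blank lines
    printf "kubectl patch pv/%s --type=json -p '%s'\n" "$pv" \
      '[{"op":"replace","path":"/spec/persistentVolumeReclaimPolicy","value":"Retain"}]'
  done
}
```

One possible way to feed it is `kubectl get pv --no-headers -o custom-columns=NAME:.metadata.name | print_retain_patches`. After checking the printed commands, pipe them to `sh` to apply them.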
Prepare the environment
Item | Backup cluster | Restore cluster |
Cluster version | 1.16.9-aliyun.1 | 1.28.3-aliyun.1 |
Runtime version | Docker 19.03.5 | containerd 1.6.20 |
Storage component version | FlexVolume: v1.14.8.109-649dc5a-aliyun | CSI: v1.26.5-56d1e30-aliyun |
Other | | The csi-plugin and csi-provisioner storage components are installed. For more information, see Manage components. |
Step 1: Deploy a test application
Run the following command to deploy a dynamically provisioned disk volume.
Replace alicloud-disk-topology with the name of the default disk StorageClass for the FlexVolume storage plug-in in your cluster.

    cat << EOF | kubectl apply -f -
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: disk-essd
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: alicloud-disk-topology
      resources:
        requests:
          storage: 20Gi
    EOF

Run the following command to deploy a statically provisioned NAS volume.
Replace the value of server with the mount target of the NAS file system in your account.

    cat << EOF | kubectl apply -f -
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-nas
    spec:
      capacity:
        storage: 5Gi
      storageClassName: nas
      accessModes:
        - ReadWriteMany
      flexVolume:
        driver: "alicloud/nas"
        options:
          server: "1758axxxxx-xxxxx.cn-beijing.nas.aliyuncs.com"
          vers: "3"
          options: "nolock,tcp,noresvport"
    ---
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: pvc-nas
    spec:
      accessModes:
        - ReadWriteMany
      storageClassName: nas
      resources:
        requests:
          storage: 5Gi
    EOF

Run the following command to deploy the application. This application mounts both the disk and NAS volumes from the previous steps.
The apiVersion in the following code is extensions/v1beta1. This apiVersion is not served in Kubernetes 1.28 clusters.

    cat << EOF | kubectl apply -f -
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx
            ports:
            - containerPort: 80
            volumeMounts:
            - name: nas
              mountPath: /cold
            - name: disk
              mountPath: /hot
          volumes:
          - name: nas
            persistentVolumeClaim:
              claimName: pvc-nas
          - name: disk
            persistentVolumeClaim:
              claimName: disk-essd
    EOF

Run the following command to verify that the deployed application has started.

    kubectl get pod -l app=nginx

Expected output:

    NAME                    READY   STATUS    RESTARTS   AGE
    nginx-5ffbc895b-xxxxx   1/1     Running   0          2m28s
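The readiness check above can also be scripted, which is convenient when the same check is repeated in the restore cluster later. This is a minimal sketch with a hypothetical function name, `all_pods_ready`. It assumes the default column layout of `kubectl get pod --no-headers` (NAME, READY, STATUS, ...) and treats any pod that is not Running as not ready, so pods of finished Jobs would be reported as failures.

```shell
#!/bin/sh
# all_pods_ready (hypothetical helper): read "kubectl get pod --no-headers"
# output on stdin. Succeeds only if at least one pod is listed and every pod
# is Running with all of its containers ready (READY column "n/n").
all_pods_ready() {
  awk '
    { n++; split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") bad++ }
    END { if (n == 0 || bad > 0) exit 1 }
  '
}
```

For example: `kubectl get pod -l app=nginx --no-headers | all_pods_ready && echo "nginx is ready"`.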
Step 2: Install the backup center
In the backup cluster, install the migrate-controller backup service component.
Note: For clusters that run Kubernetes 1.16 or later, you can directly install the backup service component of V1.7.6 or later from the component marketplace.
If your backup cluster is an ACK Dedicated Cluster or a registered cluster, or uses a storage plugin other than CSI (such as FlexVolume), you need to configure additional permissions. For more information, see Registered cluster.
(Optional) If your cluster is a FlexVolume cluster, run the following command to confirm that the required permissions are configured.

    kubectl -n csdr get secret alibaba-addon-secret

(Optional) If your cluster is a FlexVolume cluster, run the following command to add the USE_FLEXVOLUME environment variable for the migrate-controller in the kube-system namespace.

Important: In a FlexVolume cluster, after the migrate-controller backup service component is installed, the migrate-controller pod exits unexpectedly, and opening the Application Backup page of the cluster returns a 404 error. In this case, you must edit the component's YAML file to add the USE_FLEXVOLUME environment variable.

    kubectl -n kube-system patch deployment migrate-controller --type json -p '[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"USE_FLEXVOLUME","value":"true"}}]'

Run the following commands to confirm that the backup service component is running properly.

    kubectl -n kube-system get pod -l app=migrate-controller
    kubectl -n csdr get pod

Expected output:

    NAME                                  READY   STATUS    RESTARTS   AGE
    migrate-controller-6c8b9c6cbf-967x7   1/1     Running   0          3m55s

    NAME                               READY   STATUS    RESTARTS   AGE
    csdr-controller-69787f6dc8-f886h   1/1     Running   0          3m39s
    csdr-velero-58494f6bf4-52mv6       1/1     Running   0          3m37s
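The JSON patch above appends to the existing `env` array, so it may fail if the container has no `env` array and adds a duplicate entry if it is run twice. As a possible alternative, `kubectl set env` creates or updates the variable idempotently. The sketch below wraps it in a hypothetical function, `enable_flexvolume_mode`. The `KUBECTL` variable is an assumption added only so that the command can be dry-run. Unlike the patch, `kubectl set env` targets all containers in the Deployment unless you restrict it with `-c`.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the command.
KUBECTL=${KUBECTL:-kubectl}

# enable_flexvolume_mode (hypothetical helper): set USE_FLEXVOLUME=true on the
# migrate-controller Deployment. Re-running it does not create duplicates.
enable_flexvolume_mode() {
  $KUBECTL set env deployment/migrate-controller USE_FLEXVOLUME=true --namespace kube-system
}
```

Run `KUBECTL=echo; enable_flexvolume_mode` to print the command without contacting the cluster, then run it with the default setting to apply the change.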
Step 3: Create a backup
In the same region as the backup cluster, create an OSS bucket named in the cnfs-oss-* format to store backups. For more information, see Create a bucket.

Note: By default, ACK managed clusters have permissions for OSS buckets whose names start with cnfs-oss-. If your bucket name does not follow this format, you must configure additional permissions. For more information, see Install components and configure permissions in the console.

Run the following command to create an immediate backup task.
For information about how to configure backup settings in the ACK console, see Back up and restore applications in a cluster. This step provides recommended configuration for the example scenario. Adjust the settings for your specific scenario.

    cat << EOF | kubectl apply -f -
    apiVersion: csdr.alibabacloud.com/v1beta1
    kind: ApplicationBackup
    metadata:
      annotations:
        csdr.alibabacloud.com/backuplocations: '{"name":"<your-backup-vault-name>","region":"<your-region-id>","bucket":"<your-oss-bucket-name>","provider":"alibabacloud"}'
      labels:
        csdr/schedule-name: fake-name
      name: <your-backup-name>
      namespace: csdr
    spec:
      excludedNamespaces:
      - csdr
      - kube-system
      - kube-public
      - kube-node-lease
      excludedResources:
      - storageclasses
      - clusterroles
      - clusterrolebindings
      - events
      - persistentvolumeclaims
      - persistentvolumes
      includeClusterResources: true
      pvBackup:
        defaultPvBackup: true
      storageLocation: <your-backup-vault-name>
      ttl: 720h0m0s
    EOF

Parameter | Description |
excludedNamespaces | The namespaces to exclude from the backup. We recommend that you exclude the following namespaces. csdr: the working namespace for the backup center. The backup center has inter-cluster synchronization logic, so do not manually back up the objects in the csdr namespace, such as backup or restore tasks. Doing so may cause unexpected behavior. kube-system, kube-public, and kube-node-lease: namespaces that exist by default in ACK clusters. They cannot be easily restored between clusters due to differences in cluster parameters and configurations. |
excludedResources | The resources to exclude. You can configure this parameter based on your business requirements. |
includeClusterResources | Specifies whether to back up cluster-level resources, such as StorageClasses, CRDs, and webhooks. true: Backs up all cluster-level resources. false: Backs up only cluster-level resources that are referenced by namespace-level resources in the selected namespaces. For example, when you back up a Pod, if the referenced ServiceAccount is authorized by a ClusterRole, the ClusterRole is automatically backed up. When you back up a CR, the CRD is automatically backed up. Note: By default, includeClusterResources is set to false for backup tasks created in the ACK console. |
defaultPvBackup | Specifies whether to back up volume data. true: Backs up the application and the data in volumes used by running Pods. false: Backs up the application only. |
Important: For clusters in which both the Kubernetes version and the CSI version are 1.18 or later, the backup center uses ECS snapshots to back up disk data by default. For other storage types, or for disk data in clusters that run Kubernetes 1.16 or later but earlier than 1.18, Cloud Backup is used.
For volumes that are not used by running pods, you can only use the unchanged data source method. This requires manually creating a static PV and PVC in the new cluster and specifying the original data source, such as a disk ID or an OSS bucket.
If your application requires strong data consistency, pause data writes during the backup period. Alternatively, you can choose the unchanged data source method and back up only the application.
Run the following command to query the status of the backup task.

    kubectl -n csdr describe applicationbackup <your-backup-name>

In the expected output, the Phase parameter of Status changes to Completed, which indicates that the backup task completed successfully.

Run the following commands to confirm the resource list for this backup.

    kubectl -n csdr get pod | grep csdr-velero
    kubectl -n csdr exec -it <csdr-velero-pod-name> -- /velero describe backup <your-backup-name> --details

Check the list for resources that were not backed up. If any are missing, adjust the backup configuration and run the backup again.

    Resource List:
      apiextensions.k8s.io/v1/CustomResourceDefinition:
        - volumesnapshots.snapshot.storage.k8s.io
      v1/Endpoints:
        - default/kubernetes
      v1/Namespace:
        - default
      v1/PersistentVolume:
        - d-2ze88915lz1il01v1yeq
        - pv-nas
      v1/PersistentVolumeClaim:
        - default/disk-essd
        - default/pvc-nas
      v1/Secret:
        - default/default-token-n7jss
        - default/oss-secret
        - default/osssecret
      v1/Service:
        - default/kubernetes
      v1/ServiceAccount:
        - default/default
      ...
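Instead of re-running the describe command by hand, the status query in this step can be polled until the task finishes. The sketch below is a generic helper with a hypothetical name, `wait_for_phase`. It re-runs any command that prints the current phase. The `.status.phase` JSON path in the usage note is inferred from the Phase field of Status described above, and the assumption that failure phases contain the word `Failed` is not confirmed by this topic.

```shell
#!/bin/sh
# wait_for_phase WANT TRIES INTERVAL CMD... (hypothetical helper)
# Re-runs CMD until it prints WANT. Returns 0 on success, 1 if TRIES attempts
# pass without success, and 2 if the printed phase contains "Failed"
# (an assumed naming convention for failure phases).
wait_for_phase() {
  want=$1; tries=$2; interval=$3; shift 3
  i=0
  while [ "$i" -lt "$tries" ]; do
    phase=$("$@" 2>/dev/null)
    [ "$phase" = "$want" ] && return 0
    case $phase in
      *Failed*) echo "task ended in phase: $phase" >&2; return 2 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```

For example, to wait up to 10 minutes for the backup: `wait_for_phase Completed 60 10 kubectl -n csdr get applicationbackup <your-backup-name> -o jsonpath='{.status.phase}'`. The same helper can be pointed at an applicationrestore object during restoration.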
Step 4: Install the backup center
Install the backup center in the restore cluster. For more information, see Step 2: Install the backup center.
Associate the backup vault that you created with the restore cluster.
Log on to the ACK console.
On the Clusters page, click the name of the target cluster. In the left-side navigation pane, choose .
On the Application Backup page, click Restore.
Select the Backup Vault that was used for the backup, click Initialize Backup Vault, and wait for the backup to synchronize to this cluster.
(Optional) Step 5: Manually create PVCs and PVs
In most scenarios, you can simply follow Step 6 to create a restore task directly in the restore cluster. The backup center component then automatically generates storage claims and volumes based on the backup.
When the backup center executes a restore task, it skips the restoration of PVCs and PVs with the same names to protect existing data. This means it neither rebuilds them nor overwrites the data within the volumes. Therefore, in the following scenarios, you can pre-create the PVCs and PVs before the restore task for more flexible recovery:
You backed up volumes, but some volumes contain data that does not need to be migrated, such as logs. You can pre-create empty volumes.
You backed up volumes, but volumes in the backup cluster that are not used by running pods also need to be migrated to the restore cluster.
You did not back up volumes, and the excludedResources list includes persistentvolumeclaims and persistentvolumes, or the migration involves moving an application from a FlexVolume cluster to a CSI cluster.
The following are the specific steps:
Disks cannot be mounted across availability zones. If you switch to a different availability zone in the restore cluster, choose one of the following methods:
Synchronize data using the data source change method.
Log on to the ECS Management Console, manually create a single snapshot for the disk, and use the snapshot to create a disk in a new availability zone. For more information, see Create a disk from a snapshot. In the following outputfile.txt YAML file, replace the disk ID and the availability zone ID in nodeAffinity.
(Optional) If your backup cluster is a FlexVolume cluster, you can use a command-line tool to batch convert YAML files because the YAML for PVs and PVCs is different between FlexVolume and CSI. For more information, see Use the FlexVolume2CSI command-line tool to batch convert YAML files.
Run the following command to deploy the CSI YAML file obtained from FlexVolume2CSI, where outputfile.txt is the YAML output of the command-line tool.

    kubectl apply -f outputfile.txt

Run the following command to confirm that the PVC in the restore cluster is in the Bound state.

    kubectl get pvc

Expected output:

    NAME        STATUS   VOLUME                   CAPACITY   ACCESS MODES   STORAGECLASS         AGE
    disk-essd   Bound    d-2ze88915lz1il0xxxxxx   20Gi       RWO            alicloud-disk-essd   29m
    pvc-nas     Bound    pv-nas                   5Gi        RWX            nas                  29m
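With many pre-created claims, scanning the table by eye is easy to get wrong. The following sketch lists only the PVCs that are not yet Bound. The function name `unbound_pvcs` is hypothetical, and the sketch assumes the default column layout of `kubectl get pvc --no-headers`, in which the second column is the status.

```shell
#!/bin/sh
# unbound_pvcs (hypothetical helper): read "kubectl get pvc --no-headers"
# output on stdin and print the name of every PVC whose STATUS is not Bound.
unbound_pvcs() {
  awk '$2 != "Bound" { print $1 }'
}
```

For example, `kubectl get pvc --no-headers | unbound_pvcs` prints nothing when every claim in the namespace is ready for the restore task.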
Step 6: Create a restore task
If a resource with the same name already exists in the restore cluster, the restore task skips that resource.
The backup center focuses on the backup and restore of business applications. Before you start a restore task, you must install and configure the required system components in the restore cluster. For example:
ACR secret-free component: You need to re-authorize the restore cluster and configure acr-configuration.
ALB Ingress component: You need to configure ALBConfig and other related resources in advance.
When restoring Service resources, the backup center adapts them based on the Service type:
NodePort Service: When restoring across clusters, the backup center retains the port number by default.
LoadBalancer Service: When ExternalTrafficPolicy is set to Local, the HealthCheckNodePort uses a random port number by default. To preserve the port number, set spec.preserveNodePorts: true when you create a restore task.
If a Service in the backup cluster uses a specified existing SLB instance, the restored Service uses the original SLB instance and disables forced listening by default. You need to configure the listener in the SLB console.
If an SLB instance for a Service in the backup cluster is managed by CCM, a new SLB instance is created by CCM upon restoration. For more information, see Load balancer configuration notes for a Service.
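Because NodePort numbers are retained across clusters, a port that is already allocated in the restore cluster causes the Service restore to fail. The sketch below compares two lists of node ports, one per cluster, and prints the ports that appear in both. The function name `nodeport_conflicts` and the file-based interface are assumptions made for illustration.

```shell
#!/bin/sh
# nodeport_conflicts BACKUP_PORTS RESTORE_PORTS (hypothetical helper)
# Each argument is a file with one node port per line. Blank and non-numeric
# lines are ignored. Prints the ports that are present in both files.
nodeport_conflicts() {
  left=$(mktemp); right=$(mktemp)
  grep -E '^[0-9]+$' "$1" | sort -u > "$left"
  grep -E '^[0-9]+$' "$2" | sort -u > "$right"
  comm -12 "$left" "$right"   # lines common to both sorted lists
  rm -f "$left" "$right"
}
```

One possible way to produce each input file is to run `kubectl get svc --all-namespaces -o jsonpath='{range .items[*]}{range .spec.ports[*]}{.nodePort}{"\n"}{end}{end}'` against the corresponding cluster and redirect the output to a file.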
If you backed up volumes when creating the backup, you are using the data source change method for backup and restore. You can change the storage type by using StorageClass conversion (convertedarg). For example, you can convert NAS storage to disk storage. You can select the target StorageClass based on your requirements.
In this example, because the backup cluster is a v1.16 FlexVolume cluster and disk volume backups use Cloud Backup, you can select alicloud-disk as the target StorageClass for the disk-essd storage claim (which is converted to a CSI disk class and defaults to alicloud-disk-topology-alltype). If your backup cluster is a v1.18 or later CSI cluster, you do not need to perform related configurations for disk volumes.
This example also converts the FlexVolume NAS volume to a CNFS-managed isolated NAS volume by selecting the target StorageClass alibabacloud-cnfs-nas for the pvc-nas PVC. If your cluster does not have the alibabacloud-cnfs-nas StorageClass, see Manage NAS file systems by using CNFS.
The following are the specific steps:
Run the following command to create a restore task.
For information about how to configure a restore task in the ACK console, see Restore applications and data volumes. This step provides recommended configuration for the example scenario. Adjust the settings for your specific scenario.

    cat << EOF | kubectl apply -f -
    apiVersion: csdr.alibabacloud.com/v1beta1
    kind: ApplicationRestore
    metadata:
      annotations:
        csdr.alibabacloud.com/backuplocations: >-
          '{"name":"<your-backup-vault-name>","region":"<your-region-id>","bucket":"<your-oss-bucket-name>","provider":"alibabacloud"}'
      name: <your-restore-name>
      namespace: csdr
    spec:
      backupName: <your-backup-name>
      excludedNamespaces:
      - arms-prom
      excludedResources:
      - secrets
      appRestoreOnly: false
      convertedarg:
      - convertToStorageClassType: alicloud-disk-topology-alltype
        namespace: default
        persistentVolumeClaim: disk-essd
      - convertToStorageClassType: alibabacloud-cnfs-nas
        namespace: default
        persistentVolumeClaim: pvc-nas
      namespaceMapping:
        <backupNamespace>: <restoreNamespace>
    EOF

Parameter | Description |
excludedNamespaces | The namespaces to exclude. You can exclude unwanted namespaces from the backup resource list. |
excludedResources | The resources to exclude. You can exclude unwanted resource types from the backup resource list. |
appRestoreOnly | For a backup that includes volumes, specifies whether to restore only the application and skip the volumes. true: Volumes and storage claims are not created. You must manually deploy static volumes in advance. false: Creates dynamic volumes and storage claims that point to a new data source during restoration, as in this example. Note: Typically, set this to false for a data source change and to true for an unchanged data source. |
convertedarg | The StorageClass conversion list. For volumes of the FileSystem type, such as OSS, NAS, CPFS, and local volumes, you can configure this parameter to convert the StorageClasses of their PVCs to the specified StorageClass during the restoration process. For example, you can convert NAS volumes to disk volumes. convertToStorageClassType: the desired StorageClass. Make sure that the StorageClass exists in the current cluster. You can specify only a disk or NAS StorageClass. namespace: the namespace of the PVC. persistentVolumeClaim: the name of the PVC. All three fields are required for the StorageClass conversion feature. |
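When many PVCs need a StorageClass conversion, writing the convertedarg list by hand invites typos. The sketch below renders the list from whitespace-separated triples. The function name `render_convertedarg` and the input format are hypothetical, and the output is indented so that it can be pasted under `spec:` in an ApplicationRestore manifest like the one above.

```shell
#!/bin/sh
# render_convertedarg (hypothetical helper): read "NAMESPACE PVC STORAGECLASS"
# triples from stdin and print a convertedarg list indented for use under spec:.
render_convertedarg() {
  printf '  convertedarg:\n'
  while read -r ns pvc sc; do
    [ -n "$sc" ] || continue   # skip blank or incomplete lines
    printf '  - convertToStorageClassType: %s\n    namespace: %s\n    persistentVolumeClaim: %s\n' \
      "$sc" "$ns" "$pvc"
  done
}
```

For example, `printf 'default pvc-nas alibabacloud-cnfs-nas\n' | render_convertedarg` produces the second entry of the example manifest. The helper does not check that the target StorageClass exists in the restore cluster.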
Run the following command to query the status of the restore task.

    kubectl -n csdr describe applicationrestore <your-restore-name>

In the expected output, the Phase of Status changes to Completed, which indicates that the restore task completed successfully.

Run the following commands to check for any resources that failed to restore and to find the cause of the failure.

    kubectl -n csdr get pod | grep csdr-velero
    kubectl -n csdr exec -it <csdr-velero-pod-name> -- /velero describe restore <your-restore-name> --details

Expected output:

    Warnings:
      Velero:     <none>
      Cluster:    could not restore, ClusterRoleBinding "kubernetes-proxy" already exists. Warning: the in-cluster version is different than the backed-up version.
      Namespaces:
        demo-ns:  could not restore, ConfigMap "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version.
                  could not restore, Endpoints "kubernetes" already exists. Warning: the in-cluster version is different than the backed-up version.
                  could not restore, Service "kubernetes" already exists. Warning: the in-cluster version is different than the backed-up version.

    Errors:
      Velero:     <none>
      Cluster:    <none>
      Namespaces:
        demo-ns:  error restoring endpoints/xxxxxx/kubernetes: Endpoints "kubernetes" is invalid: subsets[0].addresses[0].ip: Invalid value: "169.254.128.9": may not be in the link-local range (169.xxx.0.0/16, fe80::/10)
                  error restoring endpointslices.discovery.k8s.io/demo-ns/kubernetes: EndpointSlice.discovery.k8s.io "kubernetes" is invalid: endpoints[0].addresses[0]: Invalid value: "169.xxx.128.9": may not be in the link-local range (169.xxx.0.0/16, fe80::/10)
                  error restoring services/xxxxxx/kubernetes-extranet: Service "kubernetes-extranet" is invalid: spec.ports[0].nodePort: Invalid value: 31882: provided port is already allocated

The preceding output shows which resources were not restored in the restore cluster. For example, the Warnings indicate that a resource already exists and was skipped. The Errors indicate a NodePort conflict, because the original port is retained during a cross-cluster restore.

Confirm that the restored application is running properly.
After the application is restored, check whether any resources are in an abnormal state due to application constraints, container runtime exceptions, or other reasons. If so, fix them manually.
After the restore completes, the apiVersion of the Nginx application is automatically adjusted to apps/v1, the version recommended for Kubernetes 1.28 clusters.
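To follow up on the restore errors reported by `velero describe restore`, it can be useful to reduce the Errors section to a plain list of affected resources. This is a minimal sketch with a hypothetical function name, `failed_restore_resources`. It assumes that each failure line contains the text `error restoring <resource>:` as in the example output in this step.

```shell
#!/bin/sh
# failed_restore_resources (hypothetical helper): read the output of
# "velero describe restore --details" on stdin and print the resource path
# from every "error restoring <resource>: ..." line.
failed_restore_resources() {
  sed -n 's/.*error restoring \([^:]*\):.*/\1/p'
}
```

For example, append `| failed_restore_resources` to the describe command used in this step to get one resource per line, such as `services/<namespace>/kubernetes-extranet`, and then fix or redeploy those resources manually.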