Cluster resource availability changes over time. Because the scheduler's placement decisions are based on a point-in-time snapshot of cluster resources, replicas that were successfully scheduled can later become unschedulable when nodes fail or resources are exhausted. By default, ACK One Fleet handles this automatically: it distributes Deployment, StatefulSet, and Job replicas across associated clusters using PropagationPolicy, checks for unschedulable replicas every 2 minutes, and triggers descheduling if any replica remains unschedulable for more than 30 seconds.
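The timing described above can be sketched as a simple control loop. This is an illustrative sketch only, not ACK One's actual implementation; the interval and threshold values come from the behavior described above, while the function and parameter names are hypothetical.

```python
import time

CHECK_INTERVAL_S = 120    # the Fleet checks for unschedulable replicas every 2 minutes
PENDING_THRESHOLD_S = 30  # replicas unschedulable longer than this are descheduled


def replicas_to_deschedule(pending_since: dict[str, float], now: float) -> list[str]:
    """Return replica names that have been unschedulable for more than the threshold.

    pending_since maps a replica name to the timestamp when it became unschedulable.
    """
    return [name for name, since in pending_since.items()
            if now - since > PENDING_THRESHOLD_S]


def control_loop(list_pending, deschedule):
    """Hypothetical loop: every CHECK_INTERVAL_S, evict replicas stuck Pending.

    list_pending() returns the pending_since mapping; deschedule(name) evicts
    a replica so the scheduler can place it on another cluster.
    """
    while True:
        now = time.time()
        for replica in replicas_to_deschedule(list_pending(), now):
            deschedule(replica)
        time.sleep(CHECK_INTERVAL_S)
```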
Prerequisites
Before you begin, ensure that you have:
- Fleet management enabled
- A Fleet instance with multiple associated clusters
- The AliyunAdcpFullAccess permission granted to your RAM user
- The AMC command-line tool installed
Step 1: Create an application in the Fleet
Create a file named web-demo.yaml with the following content:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  replicas: 5
  selector:
    matchLabels:
      app: web-demo
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: nginx
        image: registry-cn-hangzhou.ack.aliyuncs.com/acs/web-demo:0.5.0
        ports:
        - containerPort: 80
```

Deploy the application:

```shell
kubectl apply -f web-demo.yaml
```
Step 2: Create a distribution policy
Create a dynamic weight-based distribution policy. Setting dynamicWeight: AvailableReplicas tells the Fleet to automatically adjust replica allocation ratios based on the available resources across all nodes in each associated cluster.

```yaml
apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: PropagationPolicy
metadata:
  name: web-demo
spec:
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: web-demo
  placement:
    clusterAffinity:
      clusterNames:
      - ${cluster1-id} # Your cluster ID.
      - ${cluster2-id}
    replicaScheduling:
      replicaSchedulingType: Divided
      replicaDivisionPreference: Weighted
      weightPreference:
        dynamicWeight: AvailableReplicas
```

Apply the policy with kubectl apply, then check the application distribution status:
```shell
kubectl amc get deploy web-demo -M
```

The expected output is similar to the following (results vary based on the available resources in each associated cluster):

```
NAME       CLUSTER      READY   UP-TO-DATE   AVAILABLE   AGE   ADOPTION
web-demo   cxxxxxxxx1   2/2     2            2           11s   Y
web-demo   cxxxxxxxx2   3/3     3            3           11s   Y
```
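To illustrate how an AvailableReplicas-based weight can split a replica count, here is a minimal sketch that apportions replicas across clusters in proportion to each cluster's available capacity, using largest-remainder rounding. The function name and the exact rounding rule are assumptions for illustration, not ACK One's scheduler code.

```python
def divide_replicas(total: int, available: dict[str, int]) -> dict[str, int]:
    """Split `total` replicas across clusters proportionally to `available`
    capacity, handing rounding leftovers to the largest fractional shares."""
    capacity = sum(available.values())
    shares = {c: total * a / capacity for c, a in available.items()}
    result = {c: int(s) for c, s in shares.items()}  # floor of each share
    leftover = total - sum(result.values())
    # distribute the remaining replicas to the clusters with the largest remainders
    for c in sorted(shares, key=lambda c: shares[c] - result[c], reverse=True)[:leftover]:
        result[c] += 1
    return result
```

For example, with 5 replicas and clusters whose available capacities are 4 and 6, this yields a 2/3 split, matching the shape of the output above.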
Step 3: Verify descheduling
Simulate a scenario where insufficient resources cause replicas to become unschedulable by tainting all nodes in one cluster and then restarting the workload.
Taint all nodes in Cluster1 with NoSchedule:

```shell
kubectl --kubeconfig=<cluster1.config> taint nodes foo=bar:NoSchedule --all=true
```

Restart the workload. Because all nodes in Cluster1 are tainted, the restarted pods cannot be scheduled and enter a Pending state:

```shell
kubectl --kubeconfig=<cluster1.config> rollout restart deploy web-demo
```

Confirm that the pods in Cluster1 are in a Pending state:

```shell
kubectl --kubeconfig=<cluster1.config> get pods
```

The pods appear as Pending. This is expected: wait for the descheduler to detect and reschedule them. The Fleet checks for unschedulable replicas every 2 minutes and triggers rescheduling after they remain unschedulable for more than 30 seconds.

After about 3 minutes, check the scheduling results:
```shell
kubectl amc get deploy web-demo -M
```

Expected output:

```
NAME       CLUSTER      READY   UP-TO-DATE   AVAILABLE   AGE   ADOPTION
web-demo   cxxxxxxxx2   5/5     5            5           11s   Y
```

All replicas from Cluster1 have been rescheduled to Cluster2.
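The Pending behavior in this step follows from standard Kubernetes taint semantics: a pod can only land on a node if it tolerates every NoSchedule taint on that node. A minimal sketch of that check, simplified to exact key/value/effect matching (real toleration matching also supports Exists operators and other effects):

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified: a toleration matches a taint when key, value, and effect agree."""
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint.get("value")
            and toleration.get("effect") == taint["effect"])


def schedulable(pod_tolerations: list[dict], node_taints: list[dict]) -> bool:
    """A pod fits a node only if every NoSchedule taint on it is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in node_taints
        if taint["effect"] == "NoSchedule"
    )
```

The restarted web-demo pods carry no toleration for the foo=bar:NoSchedule taint, so no node in Cluster1 can accept them. To undo the simulation afterwards, remove the taint with kubectl taint nodes foo=bar:NoSchedule- --all.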