The descheduler component optimizes pod scheduling by evicting pods that run on unsuitable nodes so that they can be rescheduled. This avoids resource waste and improves resource utilization in Container Service for Kubernetes (ACK) clusters. This topic describes how to configure and use the descheduler.
Prerequisites
- An ACK cluster that runs Kubernetes 1.14 or later is created. For more information, see Create an ACK managed cluster.
- A kubectl client is connected to the cluster. For more information, see Connect to ACK clusters by using kubectl.
Install ack-descheduler
- Log on to the ACK console.
- In the left-side navigation pane of the ACK console, choose Marketplace > App Catalog.
- On the Marketplace page, click the App Catalog tab. Find and click ack-descheduler.
- On the ack-descheduler page, click Deploy.
- In the Deploy wizard, select a cluster and namespace, and then click Next.
- On the Parameters wizard page, set the parameters and click OK. After ack-descheduler is installed, a CronJob is automatically created in the kube-system namespace. By default, this CronJob runs every 2 minutes. After ack-descheduler is installed, you are directed to the ack-descheduler-default page. If all the relevant resources are created, as shown in the following figure, the component is installed.
Use ack-descheduler to optimize pod scheduling
- Check the DeschedulerPolicy setting of the ack-descheduler-default ConfigMap.
kubectl describe cm ack-descheduler-default -n kube-system
Expected output:
Name:         descheduler
Namespace:    kube-system
Labels:       app.kubernetes.io/instance=descheduler
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=descheduler
              app.kubernetes.io/version=0.20.0
              helm.sh/chart=descheduler-0.20.0
Annotations:  meta.helm.sh/release-name: descheduler
              meta.helm.sh/release-namespace: kube-system

Data
====
policy.yaml:
----
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":
    enabled: true
  "RemovePodsViolatingInterPodAntiAffinity":
    enabled: true
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          "cpu": 20
          "memory": 20
          "pods": 20
        targetThresholds:
          "cpu": 50
          "memory": 50
          "pods": 50
  "RemovePodsHavingTooManyRestarts":
    enabled: true
    params:
      podsHavingTooManyRestarts:
        podRestartThreshold: 100
        includingInitContainers: true

Events:  <none>
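To make the LowNodeUtilization numbers above concrete, the following is a minimal, illustrative sketch (not part of ack-descheduler) of how that strategy classifies nodes. It assumes the upstream descheduler semantics: a node is underutilized only when usage of every resource is below thresholds, and overutilized when usage of any resource exceeds targetThresholds.

```python
# Illustrative sketch of LowNodeUtilization node classification.
# The percentages mirror the defaults in the ConfigMap above.
THRESHOLDS = {"cpu": 20, "memory": 20, "pods": 20}         # lower bounds
TARGET_THRESHOLDS = {"cpu": 50, "memory": 50, "pods": 50}  # upper bounds

def classify_node(usage):
    """usage maps a resource name to its utilization percentage."""
    # Underutilized: every resource is below its lower threshold.
    if all(usage[r] < THRESHOLDS[r] for r in THRESHOLDS):
        return "underutilized"    # candidate destination for evicted pods
    # Overutilized: any resource exceeds its target threshold.
    if any(usage[r] > TARGET_THRESHOLDS[r] for r in TARGET_THRESHOLDS):
        return "overutilized"     # pods here are candidates for eviction
    return "properly utilized"

print(classify_node({"cpu": 10, "memory": 15, "pods": 5}))   # underutilized
print(classify_node({"cpu": 80, "memory": 30, "pods": 40}))  # overutilized
print(classify_node({"cpu": 30, "memory": 30, "pods": 30}))  # properly utilized
```

The descheduler evicts pods from overutilized nodes only while underutilized nodes exist to receive them; this sketch covers just the classification step.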
The following list describes the scheduling policies returned in the preceding output. For more information about the policy settings in the strategies section, see Descheduler.
- RemoveDuplicates: Removes duplicate pods so that at most one pod associated with the same ReplicaSet, ReplicationController, StatefulSet, or Job runs on a node.
- RemovePodsViolatingInterPodAntiAffinity: Deletes pods that violate inter-pod anti-affinity rules.
- LowNodeUtilization: Finds underutilized nodes, evicts pods from other nodes, and recreates the pods on the underutilized nodes. The parameters of this policy are configured in the nodeResourceUtilizationThresholds section.
- RemovePodsHavingTooManyRestarts: Deletes pods that have been restarted more than the specified number of times.
- Verify pod scheduling before the scheduling policy is modified.
- Create a Deployment to test the scheduling. Create an nginx.yaml file and copy the following content to the file:
apiVersion: apps/v1 # for versions before 1.8.0 use apps/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment-basic
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9 # Replace with the image that you want to use. The value must be in the <image_name:tags> format.
        ports:
        - containerPort: 80
Run the following command to create a Deployment with the nginx.yaml file:
kubectl apply -f nginx.yaml
Expected output:
deployment.apps/nginx-deployment-basic created
- Wait 2 minutes and run the following command to check the nodes to which the pods are scheduled:
kubectl get pod -o wide | grep nginx
Expected output:
NAME                         READY   STATUS    RESTARTS   AGE   IP               NODE                         NOMINATED NODE   READINESS GATES
nginx-deployment-basic-**1   1/1     Running   0          36s   172.25.XXX.XX1   cn-hangzhou.172.16.XXX.XX2   <none>           <none>
nginx-deployment-basic-**2   1/1     Running   0          11s   172.25.XXX.XX2   cn-hangzhou.172.16.XXX.XX3   <none>           <none>
nginx-deployment-basic-**3   1/1     Running   0          36s   172.25.XXX.XX3   cn-hangzhou.172.16.XXX.XX3   <none>           <none>
The output shows that pods nginx-deployment-basic-**2 and nginx-deployment-basic-**3 are scheduled to the same node cn-hangzhou.172.16.XXX.XX3.
Note: If you use the default settings of the ack-descheduler-default ConfigMap, the scheduling result varies based on the actual conditions of the cluster.
- Modify the scheduling policy. If you use multiple scheduling policies, unexpected scheduling results may occur. To prevent this issue, modify the ConfigMap in Step 1 to retain only the RemoveDuplicates policy, and save the modified ConfigMap to a file named newPolicy.yaml.
Note: The RemoveDuplicates policy ensures that pods managed by replication controllers are evenly distributed across different nodes.
apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler
  namespace: kube-system
  labels:
    app.kubernetes.io/instance: descheduler
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: descheduler
    app.kubernetes.io/version: 0.20.0
    helm.sh/chart: descheduler-0.20.0
  annotations:
    meta.helm.sh/release-name: descheduler
    meta.helm.sh/release-namespace: kube-system
data:
  policy.yaml: |-
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemoveDuplicates":    # Retain only the RemoveDuplicates policy.
        enabled: true
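The RemoveDuplicates behavior retained above can be sketched as follows. This is an illustrative simplification (not ack-descheduler code): for each owner, such as a ReplicaSet, at most one of its pods should run on any given node, and extra copies become eviction candidates.

```python
# Illustrative sketch of the RemoveDuplicates idea: find pods that share
# both an owner and a node with another pod, so they can be evicted and
# rescheduled elsewhere.
from collections import defaultdict

def duplicate_pods(pods):
    """pods: list of (pod_name, owner, node) tuples. Returns pods to evict."""
    seen = defaultdict(int)  # (owner, node) -> number of pods already kept
    evict = []
    for name, owner, node in pods:
        if seen[(owner, node)]:  # a pod of this owner already runs on node
            evict.append(name)
        seen[(owner, node)] += 1
    return evict

pods = [
    ("nginx-1", "nginx-rs", "node-a"),
    ("nginx-2", "nginx-rs", "node-b"),
    ("nginx-3", "nginx-rs", "node-b"),  # second nginx-rs pod on node-b
]
print(duplicate_pods(pods))  # ['nginx-3']
```

After the duplicate is evicted, the default Kubernetes scheduler places the replacement pod, which is why the test pods below end up on different nodes.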
- Verify pod scheduling after the scheduling policy is modified.
- Run the following command to apply the new scheduling policy:
kubectl apply -f newPolicy.yaml
Expected output:
configmap/descheduler created
- Wait 2 minutes and run the following command to check the nodes to which the pods are scheduled:
kubectl get pod -o wide | grep nginx
Expected output:
NAME                         READY   STATUS    RESTARTS   AGE     IP               NODE                         NOMINATED NODE   READINESS GATES
nginx-deployment-basic-**1   1/1     Running   0          8m26s   172.25.XXX.XX1   cn-hangzhou.172.16.XXX.XX2   <none>           <none>
nginx-deployment-basic-**2   1/1     Running   0          8m1s    172.25.XXX.XX2   cn-hangzhou.172.16.XXX.XX1   <none>           <none>
nginx-deployment-basic-**3   1/1     Running   0          8m26s   172.25.XXX.XX3   cn-hangzhou.172.16.XXX.XX3   <none>           <none>
The output shows that pod nginx-deployment-basic-**2 is rescheduled to cn-hangzhou.172.16.XXX.XX1 by the descheduler. Each of the three test pods now runs on a different node, which balances pod scheduling across multiple nodes.