Container Compute Service: Use gang scheduling

Last Updated: Aug 07, 2025

Alibaba Cloud Container Compute Service (ACS) provides the gang scheduling feature, which fulfills all-or-nothing requirements in job scheduling scenarios. This topic describes how to use gang scheduling.

Prerequisites

  • kube-scheduler is installed and its version meets the following requirements.

    | ACS cluster version | Scheduler version              |
    |---------------------|--------------------------------|
    | 1.31                | v1.31.0-aliyun-1.2.0 and later |
    | 1.30                | v1.30.3-aliyun-1.1.1 and later |
    | 1.28                | v1.28.9-aliyun-1.1.0 and later |

  • Gang scheduling supports only the high-performance network GPU (gpu-hpn) compute type. For more information, see Definition of computing types.

  • The Enable Custom Labels And Schedulers For GPU-HPN Nodes setting is disabled. For more information, see Component configuration.
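
To verify the version prerequisite, you can query the cluster's Kubernetes version from the command line. The scheduler component version itself is typically displayed on the cluster's component management page in the console; the exact console path may vary.

    kubectl version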

Feature introduction

When a job creates multiple pods, the pods must start and run in a coordinated manner. Resources must be allocated to the group of pods as a batch to ensure that all pods can request resources at the same time. If the scheduling requirements of a single pod are not met, the scheduling for the entire group of pods fails. The scheduler provides these all-or-nothing scheduling semantics to help prevent resource deadlocks caused by resource competition among multiple jobs.

The built-in scheduler of ACS provides the gang scheduling feature to implement these all-or-nothing semantics: a job either has all of its pods scheduled at the same time or has none of them scheduled, so a partially scheduled job does not hold resources idle.

Important

The group of pods for which the gang scheduling feature is configured must belong to the same compute class.

Usage

The gang scheduling feature provided by ACS is compatible with the PodGroup custom resource in Kubernetes. The corresponding API version is scheduling.sigs.k8s.io/v1alpha1. Before you submit a job, create a PodGroup instance in the job's namespace and specify the minimum number of pods (`minMember`) required for the job to run. Then, when you create the job's pods, associate them with the PodGroup instance by adding the `pod-group.scheduling.sigs.k8s.io` label. During scheduling, ACS allocates resources as a batch to all pods that share the same PodGroup label.

  1. Create a PodGroup custom resource.

    apiVersion: scheduling.sigs.k8s.io/v1alpha1
    kind: PodGroup
    metadata: 
      name: demo-job-podgroup
      namespace: default
    spec: 
      scheduleTimeoutSeconds: 10 # The timeout period for scheduling the pod group, in seconds.
      minMember: 3 # The minimum number of pods that must be scheduled at the same time.
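
    Assuming the PodGroup CRD is installed in the cluster along with the scheduler component, you can verify that the instance was created; the output columns depend on the CRD version.

    kubectl get podgroup demo-job-podgroup -n default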
  2. Create a job and associate it with the PodGroup.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: gang-job
      namespace: default
    spec:
      parallelism: 3 # The number of pods must be greater than or equal to minMember in the PodGroup object.
      template:
        metadata:
          labels:
            alibabacloud.com/compute-class: "gpu" # Specify the compute class as gpu or gpu-hpn.
            alibabacloud.com/gpu-model-series: "example-model" # The GPU compute class requires you to specify a GPU model.
            pod-group.scheduling.sigs.k8s.io: demo-job-podgroup # Associate with the PodGroup instance demo-job-podgroup.
        spec:
          containers:
          - name: demo-job
            image: registry.cn-hangzhou.aliyuncs.com/acs/stress:v1.0.4
            args:
              - 'infinity'
            command:
              - sleep
            resources:
              requests:
                cpu: "1"
                memory: "1Gi"
                nvidia.com/gpu: "1"
              limits:
                cpu: "1"
                memory: "1Gi"
                nvidia.com/gpu: "1"
          restartPolicy: Never
      backoffLimit: 4
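
    After you create the Job, you can list the pods that belong to the pod group by using the label selector:

    kubectl get pods -n default -l pod-group.scheduling.sigs.k8s.io=demo-job-podgroup
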
Important

Make sure that the number of associated pods is greater than or equal to the `minMember` value configured for the PodGroup instance. Otherwise, the pods cannot be scheduled.

Examples

This example demonstrates both successful and failed scheduling outcomes when you use gang scheduling for a job.

  1. Run the following command to create the test-gang namespace.

    kubectl create ns test-gang
  2. Run the following command to create a ResourceQuota in the test-gang namespace to demonstrate how gang scheduling behaves when resources are insufficient.

    cat << EOF | kubectl apply -f -
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: object-counts
      namespace: test-gang
    spec:
      hard:
        pods: "2"
    EOF
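
    You can check the quota and its current usage:

    kubectl describe resourcequota object-counts -n test-gang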
  3. Run the following command to create a PodGroup object. In the object, minMember is set to 3, which specifies that at least 3 associated pods must be schedulable at the same time. If fewer than 3 pods can be created or scheduled, all pods in the group remain in the Pending state.

    cat << EOF | kubectl apply -f -
    apiVersion: scheduling.sigs.k8s.io/v1alpha1
    kind: PodGroup
    metadata: 
      name: demo-job-podgroup
      namespace: test-gang
    spec: 
      minMember: 3 # The minimum number of pods that must be scheduled at the same time.
    EOF
  4. Use the following YAML content to create a gang-job.yaml file. This file defines a Job object that specifies four pod replicas and is associated with the PodGroup object.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: gang-job
      namespace: test-gang
    spec:
      parallelism: 4 # The number of pods must be greater than or equal to minMember in the PodGroup object.
      template:
        metadata:
          labels:
            alibabacloud.com/compute-class: "gpu" # Specify the compute class as gpu or gpu-hpn.
            alibabacloud.com/gpu-model-series: "example-model" # The GPU compute class requires you to specify a GPU model.
            pod-group.scheduling.sigs.k8s.io: demo-job-podgroup # Associate with the PodGroup instance demo-job-podgroup.
        spec:
          containers:
          - name: demo-job
            image: registry.cn-hangzhou.aliyuncs.com/acs/stress:v1.0.4
            args:
              - 'infinity'
            command:
              - sleep
            resources:
              requests:
                cpu: "1"
                memory: "1Gi"
                nvidia.com/gpu: "1"
              limits:
                cpu: "1"
                memory: "1Gi"
                nvidia.com/gpu: "1"
          restartPolicy: Never
      backoffLimit: 4
  5. Run the following command to deploy the gang-job job to the cluster.

    kubectl apply -f gang-job.yaml
  6. Run the following command to view the pod status.

    kubectl get pod -n test-gang

    Expected output:

    NAME             READY   STATUS    RESTARTS   AGE
    gang-job-hrnc6   0/1     Pending   0          23s
    gang-job-wthnq   0/1     Pending   0          23s

    The ResourceQuota limits the namespace to two pods, so the Job controller can create only two of the four pods. Two is less than the `minMember` value of 3 specified in the PodGroup. Therefore, both pods remain in the Pending state and are not scheduled.
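
    To see why a pod is held back, you can describe it. The exact event message depends on the scheduler version, but it typically indicates that the pod group has not assembled enough pods (the pod name below is from the sample output above):

    kubectl describe pod gang-job-hrnc6 -n test-gang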

  7. Run the following command to delete the ResourceQuota and remove the limit on the number of pods.

    kubectl delete resourcequota -n test-gang object-counts
  8. Run the following command to view the pod status.

    kubectl get pod -n test-gang

    Expected output:

    NAME             READY   STATUS    RESTARTS   AGE
    gang-job-24cz9   1/1     Running   0          96s
    gang-job-mmkxl   1/1     Running   0          96s
    gang-job-msr8v   1/1     Running   0          96s
    gang-job-qnclz   1/1     Running   0          96s

    The output indicates that all four pods are scheduled successfully and running.
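
  9. Optional. Run the following command to delete the test-gang namespace. This removes the Job, the pods, and the PodGroup created in this example.

    kubectl delete namespace test-gang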