
Container Compute Service: Use gang scheduling

Last Updated: Mar 17, 2026

Gang scheduling provides all-or-nothing scheduling for multi-pod jobs in Alibaba Cloud Container Compute Service (ACS). The scheduler holds all pods until it can place the minimum required number simultaneously, preventing resource deadlocks in distributed workloads such as AI training jobs, MPI tasks, and multi-role inference pipelines.

How gang scheduling works

Many distributed jobs require all of their pods to start together. Gang scheduling ensures that resources are allocated to the entire group at once: if the minimum number of pods cannot be scheduled simultaneously, none of the pods are scheduled. This prevents resource deadlocks in which multiple jobs each acquire part of the resources they need and block one another.

Gang scheduling in ACS is implemented using the PodGroup custom resource (API version scheduling.sigs.k8s.io/v1alpha1). You create a PodGroup to define the group constraint, then associate job pods with it using the pod-group.scheduling.sigs.k8s.io label.
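As a minimal sketch of this association (the names my-podgroup and my-pod are illustrative, not part of the steps below), the mechanism has two halves: a PodGroup that states the group constraint, and a matching label on each pod in the gang:

```yaml
# Minimal sketch; "my-podgroup" and "my-pod" are illustrative names.
apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: my-podgroup
spec:
  minMember: 3 # Schedule at least 3 associated pods together, or none at all.
---
# Each pod that belongs to the gang carries a matching label:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  labels:
    pod-group.scheduling.sigs.k8s.io: my-podgroup
spec:
  containers:
  - name: main
    image: registry.k8s.io/pause:3.9
```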

Important

All pods configured for gang scheduling must belong to the same compute class.

Prerequisites

  • kube-scheduler is installed and its version meets the following requirements.

    ACS cluster version    Scheduler component version
    -------------------    ------------------------------
    1.31                   v1.31.0-aliyun-1.2.0 and later
    1.30                   v1.30.3-aliyun-1.1.1 and later
    1.28                   v1.28.9-aliyun-1.1.0 and later

  • Gang scheduling supports only the high-performance network GPU (gpu-hpn) compute type. For more information, see Definition of computing types.

  • The Enable Custom Labels And Schedulers For GPU-HPN Nodes setting must be disabled. For more information, see Component configuration.

Configure gang scheduling

  1. Create a PodGroup custom resource. The minMember field sets the minimum number of pods that must be scheduled simultaneously. The scheduleTimeoutSeconds field sets how long the scheduler waits before marking the attempt as failed.

    apiVersion: scheduling.sigs.k8s.io/v1alpha1
    kind: PodGroup
    metadata:
      name: demo-job-podgroup
      namespace: default
    spec:
      scheduleTimeoutSeconds: 10
      minMember: 3 # The minimum number of pods that must be scheduled together.
  2. Create a job and associate it with the PodGroup. Save the following content to gang-job.yaml. The label pod-group.scheduling.sigs.k8s.io: demo-job-podgroup on the pod template associates every pod with the named PodGroup.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: gang-job
      namespace: default
    spec:
      parallelism: 3 # The number of pods must be greater than or equal to minMember in the PodGroup object.
      template:
        metadata:
          labels:
            alibabacloud.com/compute-class: "gpu-hpn" # Specify the compute class as gpu-hpn.
            alibabacloud.com/gpu-model-series: "example-model" # A GPU model must be specified for the GPU compute class.
            pod-group.scheduling.sigs.k8s.io: demo-job-podgroup # Associate with the demo-job-podgroup PodGroup instance.
        spec:
          containers:
          - name: demo-job
            image: registry.cn-hangzhou.aliyuncs.com/acs/stress:v1.0.4
            command:
              - sleep
            args:
              - 'infinity'
            resources:
              requests:
                cpu: "1"
                memory: "1Gi"
                nvidia.com/gpu: "1"
              limits:
                cpu: "1"
                memory: "1Gi"
                nvidia.com/gpu: "1"
          restartPolicy: Never
      backoffLimit: 4
  3. Deploy the job to the cluster.

    kubectl apply -f gang-job.yaml
  4. Verify that the pods are scheduled. When scheduling succeeds, all pods transition from the Pending state to the Running state simultaneously.

    kubectl get podgroup -n default
    kubectl get pods -n default -l pod-group.scheduling.sigs.k8s.io=demo-job-podgroup
Important

Make sure that the number of associated pods is greater than or equal to the `minMember` value configured for the PodGroup instance. Otherwise, the pods cannot be scheduled.
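This relationship can be sanity-checked locally before you deploy. The following is a small shell sketch (no cluster required; the values mirror this example's PodGroup and Job):

```shell
# Local sanity check: a Job whose parallelism is below the PodGroup's
# minMember can never satisfy the gang constraint.
min_member=3   # minMember from the PodGroup
parallelism=3  # parallelism from the Job
if [ "$parallelism" -ge "$min_member" ]; then
  echo "ok: parallelism ($parallelism) satisfies minMember ($min_member)"
else
  echo "error: parallelism ($parallelism) is below minMember ($min_member)"
fi
```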

Examples

This example demonstrates both successful and failed scheduling outcomes when you use gang scheduling for a job.

  1. Run the following command to create the test-gang namespace.

    kubectl create ns test-gang
  2. Run the following command to create a ResourceQuota in the test-gang namespace to demonstrate how gang scheduling behaves when resources are insufficient.

    cat << EOF | kubectl apply -f -
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: object-counts
      namespace: test-gang
    spec:
      hard:
        pods: "2"
    EOF
  3. Run the following command to create a PodGroup object. In the object, minMember is set to 3, which specifies that at least 3 associated pods must be scheduled successfully at the same time. If one of the pods fails to be created or scheduled, all pods in the group remain in the Pending state.

    cat << EOF | kubectl apply -f -
    apiVersion: scheduling.sigs.k8s.io/v1alpha1
    kind: PodGroup
    metadata:
      name: demo-job-podgroup
      namespace: test-gang
    spec:
      minMember: 3 # The minimum number of pods that must be scheduled together.
    EOF
  4. Use the following YAML content to create a gang-job.yaml file. This file defines a Job object that specifies four pod replicas and is associated with the PodGroup object.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: gang-job
      namespace: test-gang
    spec:
      parallelism: 4 # The number of pods must be greater than or equal to minMember in the PodGroup object.
      template:
        metadata:
          labels:
            alibabacloud.com/compute-class: "gpu-hpn" # Specify the compute class as gpu-hpn.
            alibabacloud.com/gpu-model-series: "example-model" # A GPU model must be specified for the GPU compute class.
            pod-group.scheduling.sigs.k8s.io: demo-job-podgroup # Associate with the demo-job-podgroup PodGroup instance.
        spec:
          containers:
          - name: demo-job
            image: registry.cn-hangzhou.aliyuncs.com/acs/stress:v1.0.4
            command:
              - sleep
            args:
              - 'infinity'
            resources:
              requests:
                cpu: "1"
                memory: "1Gi"
                nvidia.com/gpu: "1"
              limits:
                cpu: "1"
                memory: "1Gi"
                nvidia.com/gpu: "1"
          restartPolicy: Never
      backoffLimit: 4
  5. Run the following command to deploy the gang-job job to the cluster.

    kubectl apply -f gang-job.yaml
  6. Run the following command to view the pod status.

    kubectl get pod -n test-gang

    Expected output:

    NAME             READY   STATUS    RESTARTS   AGE
    gang-job-hrnc6   0/1     Pending   0          23s
    gang-job-wthnq   0/1     Pending   0          23s

    The ResourceQuota limits the namespace to two pods, so only two pods are created for this job. Because two is less than the `minMember` value specified in the PodGroup, both pods remain in the Pending state and are not scheduled.

  7. Run the following command to delete the ResourceQuota and remove the limit on the number of pods.

    kubectl delete resourcequota -n test-gang object-counts
  8. Run the following command to view the pod status.

    kubectl get pod -n test-gang

    Expected output:

    NAME             READY   STATUS    RESTARTS   AGE
    gang-job-24cz9   1/1     Running   0          96s
    gang-job-mmkxl   1/1     Running   0          96s
    gang-job-msr8v   1/1     Running   0          96s
    gang-job-qnclz   1/1     Running   0          96s

    The output indicates that after the quota is removed, all four pods are scheduled and enter the Running state at the same time.
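When you are finished, you can optionally remove the example resources. The following commands assume the names used in this walkthrough (deleting the namespace also removes the Job and PodGroup it contains):

```shell
# Clean up the example resources created in this walkthrough.
kubectl delete job gang-job -n test-gang
kubectl delete podgroup demo-job-podgroup -n test-gang
kubectl delete namespace test-gang
```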