
Container Service for Kubernetes: Automate RayJobs in ACK clusters

Last Updated: Dec 17, 2025

A primary challenge in managing cluster resources is balancing a high volume of jobs with limited resources. To address this, organizations must prioritize resource allocation for critical teams or individuals while maintaining the flexibility to make adjustments on the fly. This guide demonstrates how to improve cluster resource utilization by using a unified job management platform to automate the processing of numerous RayJobs from different departments. This approach supports job preemption and dynamic priority adjustments, ensuring that high-priority jobs receive resources first.

Prerequisites

Important

Solution for Docker Hub pull failures

Due to network instability, such as issues with carrier networks, image pulls from Docker Hub may fail. We recommend using images that rely on Docker Hub with caution in production environments. This example uses the official Ray image rayproject/ray:2.36.1. If you cannot pull this image, use one of the following solutions:

Resource quotas

Use the ElasticQuotaTree feature together with RayJob in an ACK cluster to automate job scheduling and manage computing resources more flexibly. RayJob workloads are then scheduled within the defined resource limits, so each team can fully use its allocated resources while avoiding waste and excessive competition.

You can configure ElasticQuotaTree resource quotas by department or individual to define how many resources each team can use. Each node in the tree specifies the minimum and maximum resource quota available to the corresponding team or department. When a RayJob is submitted, the system automatically checks whether the quota bound to the job's namespace has enough remaining resources to meet the job's requirements. The RayJob starts executing on the appropriate compute resources only after its quota is confirmed. This ensures both effective resource utilization and proper job prioritization.

ElasticQuotaTree defines quota information within the cluster, including the quota hierarchy, the amount of resources associated with each quota, and the namespaces bound to them. When a job is submitted in one of these namespaces, it is automatically counted against the resource quota of the corresponding namespace. Refer to the following example to set a resource quota with ElasticQuotaTree.

  1. To build a resource quota system that meets your organization's needs, submit the following ElasticQuotaTree configuration to the cluster.

    View sample code

    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: video 
    
    
    ---
    
    apiVersion: scheduling.sigs.k8s.io/v1beta1
    kind: ElasticQuotaTree
    metadata:
      name: elasticquotatree # Only one ElasticQuotaTree is supported.
      namespace: kube-system # The elastic quota group takes effect only if it is created in the kube-system namespace.
    spec:
      root:
        name: root 
        min:
          cpu: 100
          memory: 50Gi
          nvidia.com/gpu: 16
        max:
          cpu: 100
          memory: 50Gi
          nvidia.com/gpu: 16
        children:
    
        - name: algorithm  
          min:
            cpu: 50
            memory: 25Gi
            nvidia.com/gpu: 10 
          max:
            cpu: 80
            memory: 50Gi
            nvidia.com/gpu: 14 
          children:
          - name: video 
            min:
              cpu: 12
              memory: 12Gi
              nvidia.com/gpu: 2 
            max:
              cpu: 14
              memory: 14Gi
              nvidia.com/gpu: 4 
            namespaces: # Configure the namespace.
            - video 
     

    ElasticQuotaTree is a tree structure in which each node defines its maximum resource usage through the max field and its minimum guaranteed amount through the min field. If a quota's minimum guarantee cannot be met, the scheduler attempts to reclaim resources from other quotas that are using more than their minimum guarantee in order to run the job. For quotas whose guaranteed amount is set to 0, such as intern-text and intern-video quotas in a fuller tree than the abbreviated sample above, this means that if an algorithm team member submits a job that requires immediate processing while an intern's job is running, the system can preempt the resources used by the intern's job to prioritize the algorithm team's job, ensuring that high-priority jobs can proceed smoothly.

  2. View the ElasticQuotaTree settings that have taken effect in the kube-system namespace.

    kubectl -n kube-system get elasticquotatree elasticquotatree -o yaml
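
    To inspect a single leaf node instead of the full tree, you can also use a jsonpath expression (a convenience command, not part of the original procedure). The index path below assumes the sample tree above, where video is the first child of algorithm, which is the first child of root; adjust the indexes for your own tree.

    kubectl -n kube-system get elasticquotatree elasticquotatree -o jsonpath='{.spec.root.children[0].children[0]}'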

Job queues

The Queue feature assigns RayJobs from different departments and teams to their respective queues. After the ElasticQuotaTree is submitted, ack-kube-queue automatically creates corresponding queues in the cluster for job queuing; the resource quota of each leaf node corresponds to a separate Queue. When a RayJob is submitted, ack-kube-queue associates it with the corresponding Queue based on the RayJob's namespace and places it in that queue, and whether it dequeues is determined by the queuing policy and the available quota. For more information, see ack-kube-queue manages job queues.

As shown in the preceding example, the video team is associated with the video namespace. Resources are allocated through the min and max configurations, and kube-queue automatically creates a queue for this quota: root-algorithm-video. When a RayJob with the .spec.suspend field set to true is submitted in the video namespace, a corresponding QueueUnit resource object is automatically created and enters the root-algorithm-video queue. For a RayJob, kube-queue calculates the total required resources by adding the head Pod's resource requests to the total for each worker group (replicas × the resource request of a single Pod). If the total resource request fits within the currently available quota, the RayJob dequeues from root-algorithm-video and enters the scheduling logic.
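
For the sample RayJob created later in this topic, the head Pod requests 4 CPU and 4 G of memory, and the worker group runs 2 replicas that each request 4 CPU and 4 G. A worked calculation based on those sample manifests:

Head Pod:                        4 CPU, 4 G
Worker group: 2 × (4 CPU, 4 G) = 8 CPU, 8 G
Total:                          12 CPU, 12 G

This total fits within the video quota's max (cpu: 14, memory: 14Gi), so one such RayJob can dequeue, while a second identical RayJob submitted at the same time would exceed the remaining quota and stay queued.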


After the ElasticQuotaTree is created, Kube Queue automatically creates a corresponding Queue for each leaf node based on the ElasticQuotaTree configuration.

  1. For example, for the algorithm department/video team, the queue root-algorithm-video is automatically created.

    kubectl get queue -n kube-queue root-algorithm-video-k42kq -o yaml
    
    apiVersion: scheduling.x-k8s.io/v1alpha1
    kind: Queue
    metadata:
      annotations:
        kube-queue/parent-quota-fullname: algorithm
        kube-queue/quota-fullname: root/algorithm/video
      creationTimestamp: "2025-01-09T03:32:27Z"
      generateName: root-algorithm-video-
      generation: 1
      labels:
        create-by-kubequeue: "true"
      name: root-algorithm-video-k42kq
      namespace: kube-queue
      resourceVersion: "18282630"
      uid: 5606059e-acf5-4f92-b11a-48a02ef53cdf
    spec:
      queuePolicy: Round
    status:
      queueItemDetails:
        active: []
        backoff: []

    Note

    active: Jobs awaiting scheduling, showing their priority and position in the queue.

    backoff: Jobs that failed to schedule, typically due to insufficient resources, and are waiting before retrying.
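
    To watch jobs move between the active and backoff lists in real time, you can add the --watch flag (a convenience command, not part of the original procedure):

    kubectl -n kube-queue get queue root-algorithm-video-k42kq -o yaml --watch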

  2. View the queues.

    kubectl get queue -n kube-queue

    The output appears as follows.

    NAME                               AGE
    root-algorithm-n54fm               51s
    root-algorithm-text-hgbvz          51s
    root-algorithm-video-k42kq         51s
    root-devops-2zccw                  51s
    root-infrastructure-devops-d6zqq   51s
    root-infrastructure-vbpkt          51s
    root-k8htb                         51s

Create a RayJob

  1. Define the RayJob resource in the video namespace. This automatically associates the job with the root-algorithm-video queue and the corresponding video resource quota, which is defined in the ElasticQuotaTree as follows:

          - name: video 
            min:
              cpu: 12
              memory: 12Gi
              nvidia.com/gpu: 2 
            max:
              cpu: 14
              memory: 14Gi
              nvidia.com/gpu: 4 
            namespaces: # Configure the namespace.
            - video 
    Note

    • Minimum guaranteed resources: cpu: 12, memory: 12Gi, nvidia.com/gpu: 2.

    • Maximum available resources: cpu: 14, memory: 14Gi, nvidia.com/gpu: 4.

  2. Create a ConfigMap to define the Python code that the RayJob will execute in the RayCluster. The sample code creates an actor using the ray.remote decorator and calls the actor's inc() and get_counter() methods.

    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: rayjob-video
      namespace: video 
    data:
      sample_code.py: |
        import ray
        import os
        import requests
    
        ray.init()
    
        @ray.remote
        class Counter:
            def __init__(self):
                # Used to verify runtimeEnv
                self.name = os.getenv("counter_name")
                assert self.name == "test_counter"
                self.counter = 0
    
            def inc(self):
                self.counter += 1
    
            def get_counter(self):
                return "{} got {}".format(self.name, self.counter)
    
        counter = Counter.remote()
    
        for _ in range(2):
            ray.get(counter.inc.remote())
            print(ray.get(counter.get_counter.remote()))
    
        # Verify that the correct runtime env was used for the job.
        assert requests.__version__ == "2.26.0"
  3. Configure the sample RayJob YAML.

    apiVersion: ray.io/v1
    kind: RayJob
    metadata:
      labels:
        job-type: video
      generateName: rayjob-video-
      namespace: video 
    spec:
      entrypoint: python /home/ray/samples/sample_code.py
      runtimeEnvYAML: |
        pip:
          - requests==2.26.0
          - pendulum==2.1.2
        env_vars:
          counter_name: "test_counter"
      
      ttlSecondsAfterFinished: 10
      # If suspend is set to true, shutdownAfterJobFinishes should also be set to true.
      shutdownAfterJobFinishes: true
      # Suspend specifies whether the RayJob controller should create a RayCluster instance.
      # If a job is applied with the suspend field set to true, the RayCluster will not be created, and the controller waits for the field to change to false.
      # If the RayCluster is already created, it will be deleted. When the field changes to false, a new RayCluster will be created.
      suspend: true
    
      rayClusterSpec:
        rayVersion: '2.36.1' # should match the Ray version in the image of the containers
        headGroupSpec:
          rayStartParams:
            dashboard-host: '0.0.0.0'
            num-cpus: "0"
          template:
            spec:
              containers:
                - name: ray-head
                  image: rayproject/ray:2.36.1
                  ports:
                    - containerPort: 6379
                      name: gcs-server
                    - containerPort: 8265 # Ray dashboard
                      name: dashboard
                    - containerPort: 10001
                      name: client
                  resources:
                    limits:
                      cpu: "4"
                      memory: 4G
                    requests:
                      cpu: "4"
                      memory: 4G
                  volumeMounts:
                    - mountPath: /home/ray/samples
                      name: code-sample
              volumes:
                # You set volumes at the Pod level, then mount them into containers inside that Pod
                - name: code-sample
                  configMap:
                    # Provide the name of the ConfigMap you want to mount.
                    name: rayjob-video
                    # An array of keys from the ConfigMap to create as files
                    items:
                      - key: sample_code.py
                        path: sample_code.py
        workerGroupSpecs:
          - replicas: 2 
            groupName: small-group
            rayStartParams: {}
            template:
              spec:
                containers:
                  - name: ray-worker 
                    image: rayproject/ray:2.36.1
                    lifecycle:
                      preStop:
                        exec:
                          command: [ "/bin/sh","-c","ray stop" ]
                    resources:
                      limits:
                        cpu: "4"
                        memory: 4G
                      requests:
                        cpu: "4"
                        memory: 4G
    
    

    View RayJob configuration details

    namespace: video
        Sets the namespace to video.

    submissionMode: K8sJobMode
        K8sJobMode is the KubeRay default submission mode and applies when the field is omitted, as in the sample above. Only jobs with suspend: true are managed by kube-queue and are queued instead of running immediately.

    ttlSecondsAfterFinished: 10
        The number of seconds to wait after the job completes before deleting it.

    shutdownAfterJobFinishes: true
        Shuts down the RayCluster after the job finishes and the ttlSecondsAfterFinished period has passed.
        Note: This prevents resource leaks and must be set to true for queued jobs.

    suspend: true
        Only RayJobs with suspend set to true enter the queue.

  4. Use kubectl create -f to create two RayJobs, then list them to check their status.
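
    For example, assuming the RayJob manifest above is saved as rayjob-video.yaml (an illustrative file name), run the create command twice; because the manifest uses generateName, each run creates a RayJob with a unique name:

    kubectl create -f rayjob-video.yaml
    kubectl create -f rayjob-video.yaml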

    kubectl get rayjob -n video
    NAME                 JOB STATUS   DEPLOYMENT STATUS   START TIME             END TIME   AGE
    rayjob-video-g2lvn                Initializing        2025-01-10T01:36:24Z              6s
    rayjob-video-h4x2q                Suspended           2025-01-10T01:36:25Z              5s

    rayjob-video-g2lvn is dequeued and is in the Initializing state, while rayjob-video-h4x2q remains queued in the Suspended state.

  5. Check the enqueue and dequeue times for rayjob-video-g2lvn in its annotations using kube-queue/job-enqueue-timestamp and kube-queue/job-dequeue-timestamp.

    kubectl -n video get rayjob rayjob-video-g2lvn -o yaml
    
    apiVersion: ray.io/v1
    kind: RayJob
    metadata:
      annotations:
        kube-queue/job-dequeue-timestamp: 2025-01-10 01:36:24.641181026 +0000 UTC m=+132100.596012828
        kube-queue/job-enqueue-timestamp: 2025-01-10 01:36:24.298639916 +0000 UTC m=+132100.253471714
      creationTimestamp: "2025-01-10T01:36:24Z"
      

    For rayjob-video-h4x2q, the annotations only show the enqueue time (kube-queue/job-enqueue-timestamp), with no dequeue time. This indicates that the RayJob has not yet been dequeued for scheduling.

     kubectl -n video get rayjob rayjob-video-h4x2q -o yaml
     
    apiVersion: ray.io/v1
    kind: RayJob
    metadata:
      annotations:
        kube-queue/job-enqueue-timestamp: 2025-01-10 01:36:25.505182364 +0000 UTC m=+132101.460014182
      creationTimestamp: "2025-01-10T01:36:25Z"
  6. View the Pods. Currently, only the Pods for rayjob-video-g2lvn have started scheduling.

     kubectl -n video get pod
    NAME                                                           READY   STATUS    RESTARTS   AGE
    rayjob-video-g2lvn-9gz66                                       1/1     Running   0          28s
    rayjob-video-g2lvn-raycluster-v8tfh-head-6trq5                 1/1     Running   0          49s
    rayjob-video-g2lvn-raycluster-v8tfh-small-group-worker-hkt7m   1/1     Running   0          49s
    rayjob-video-g2lvn-raycluster-v8tfh-small-group-worker-rbzjn   1/1     Running   0          49s
  7. Check the properties of the queue. The second job, rayjob-video-h4x2q, is now in the backoff list, waiting for resources.

    kubectl -n kube-queue get queue root-algorithm-video-k42kq -o yaml
    
    apiVersion: scheduling.x-k8s.io/v1alpha1
    kind: Queue
    metadata:
      annotations:
        kube-queue/parent-quota-fullname: algorithm
        kube-queue/quota-fullname: root/algorithm/video
      creationTimestamp: "2025-01-09T08:34:57Z"
      generateName: root-algorithm-video-
      generation: 1
      labels:
        create-by-kubequeue: "true"
      name: root-algorithm-video-k42kq
      namespace: kube-queue
      resourceVersion: "19070012"
      uid: 5606059e-acf5-4f92-b11a-48a02ef53cdf
    spec:
      queuePolicy: Round
    status:
      queueItemDetails:
        active: []
        backoff:
        - name: rayjob-video-h4x2q-ray-qu
          namespace: video
          position: 1

Set up gang scheduling

Use gang scheduling (or co-scheduling) with Ray for distributed tasks that require multiple nodes to start simultaneously. This is useful in the following scenarios:

  • Large-scale machine learning training: When dealing with very large datasets or complex models, a single machine may not provide sufficient computing resources. In this case, a group of containers needs to work together. Gang scheduling ensures that these containers are scheduled simultaneously, avoiding resource contention and deadlocks, thereby improving training efficiency.

  • MPI computing framework: Parallel computing with multiple threads under the MPI framework requires master and slave processes to work together. Gang scheduling ensures that these processes are scheduled simultaneously, reducing communication latency and improving computational efficiency.

  • Data processing and analysis: For applications that need to process massive amounts of data, such as log analysis and real-time stream processing, multiple jobs may need to run simultaneously to complete complex analysis tasks. Gang scheduling ensures that these jobs are scheduled at the same time, improving overall processing speed.

  • Custom distributed application development: Implement player matchmaking services in a game server architecture, or coordinate data collection and processing from thousands of devices in an Internet of Things (IoT) project.

To enable gang scheduling for a RayJob in an ACK Pro or ACK Lingjun cluster, add the ray.io/scheduler-name: kube-scheduler label to its metadata. After submitting the job, the Ray Operator will automatically inject the necessary labels for gang scheduling when creating the Pods.

View sample code

apiVersion: ray.io/v1
kind: RayJob
metadata:
  generateName: rayjob-sample-
  namespace: algorithm-text
  labels:
    # Use ray.io/scheduler-name to specify gang scheduling.
    ray.io/scheduler-name: kube-scheduler
    # Use quota.scheduling.alibabacloud.com/name to specify a quota.
    quota.scheduling.alibabacloud.com/name: algorithm-video
spec:
  entrypoint: python /home/ray/samples/sample_code.py
  runtimeEnvYAML: |
    pip:
      - requests==2.26.0
      - pendulum==2.1.2
    env_vars:
      counter_name: "test_counter"
  shutdownAfterJobFinishes: true
  # Suspend specifies whether the RayJob controller should create a RayCluster instance.
  # If a job is applied with the suspend field set to true, the RayCluster will not be created and we will wait for the transition to false.
  # If the RayCluster is already created, it will be deleted. When the field changes to false, a new RayCluster will be created.
  suspend: true

  rayClusterSpec:
    rayVersion: '2.9.0' # should match the Ray version in the image of the containers
    headGroupSpec:
      rayStartParams:
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265 # Ray dashboard
                  name: dashboard
                - containerPort: 10001
                  name: client
              resources:
                limits:
                  cpu: "1"
                requests:
                  cpu: "1"
              volumeMounts:
                - mountPath: /home/ray/samples
                  name: code-sample
          volumes:
            # You set volumes at the Pod level, then mount them into containers inside that Pod
            - name: code-sample
              configMap:
                # Provide the name of the ConfigMap you want to mount.
                name: ray-job-code-sample
                # An array of keys from the ConfigMap to create as files
                items:
                  - key: sample_code.py
                    path: sample_code.py
    workerGroupSpecs:
      - replicas: 30
        groupName: small-group
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name' or '123-abc')
                image: rayproject/ray:2.9.0
                lifecycle:
                  preStop:
                    exec:
                      command: [ "/bin/sh","-c","ray stop" ]
                resources:
                  limits:
                    cpu: "1"
                  requests:
                    cpu: "1"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-job-code-sample
  namespace: algorithm-text
data:
  sample_code.py: |
    import ray
    import os
    import requests

    ray.init()

    @ray.remote
    class Counter:
        def __init__(self):
            # Used to verify runtimeEnv
            self.name = os.getenv("counter_name")
            assert self.name == "test_counter"
            self.counter = 0

        def inc(self):
            self.counter += 1

        def get_counter(self):
            return "{} got {}".format(self.name, self.counter)

    counter = Counter.remote()

    for _ in range(5):
        ray.get(counter.inc.remote())
        print(ray.get(counter.get_counter.remote()))

    # Verify that the correct runtime env was used for the job.
    assert requests.__version__ == "2.26.0"

    import time
    time.sleep(30)

When creating Pods, the Ray Operator injects labels to facilitate identification, grouping, and gang scheduling operations.

View sample code

apiVersion: v1
kind: Pod
metadata:
  annotations:
    ray.io/ft-enabled: "false"
  creationTimestamp: "2024-10-10T02:38:29Z"
  generateName: rayjob-sample-hhbdr-raycluster-ljj69-small-group-worker-
  labels:
    app.kubernetes.io/created-by: kuberay-operator
    app.kubernetes.io/name: kuberay
    # Add the Coscheduling label recognized by ACK.
    pod-group.scheduling.sigs.k8s.io/min-available: "31"
    pod-group.scheduling.sigs.k8s.io/name: rayjob-sample-hhbdr-raycluster-ljj69
    # Add the Quota label recognized by ACK.
    quota.scheduling.alibabacloud.com/name: algorithm-video
    ray.io/cluster: rayjob-sample-hhbdr-raycluster-ljj69
    ray.io/group: small-group
    ray.io/identifier: rayjob-sample-hhbdr-raycluster-ljj69-worker
    ray.io/is-ray-node: "yes"
    ray.io/node-type: worker
    scheduling.x-k8s.io/pod-group: rayjob-sample-hhbdr-raycluster-ljj69
  name: rayjob-sample-hhbdr-raycluster-ljj69-small-group-worker-xnzjh
  namespace: algorithm-text
  ownerReferences:
  - apiVersion: ray.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: RayCluster
    name: rayjob-sample-hhbdr-raycluster-ljj69
    uid: 74259f20-86fd-4777-b826-73d201065931
  resourceVersion: "7482744"
  uid: 4f666efe-a25d-4620-824f-cab3f4fa0ce7
spec:
  containers:
  - args:
    - 'ulimit -n 65536; ray start  --address=rayjob-sample-hhbdr-raycluster-ljj69-head-svc.algorithm-text.svc.cluster.local:6379  --metrics-export-port=8080  --block  --dashboard-agent-listen-port=52365  --num-cpus=1 '
    command:
    - /bin/bash
    - -lc
    - --
    env:
    - name: FQ_RAY_IP
      value: rayjob-sample-hhbdr-raycluster-ljj69-head-svc.algorithm-text.svc.cluster.local
    - name: RAY_IP
      value: rayjob-sample-hhbdr-raycluster-ljj69-head-svc
    - name: RAY_CLUSTER_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels['ray.io/cluster']
    - name: RAY_CLOUD_INSTANCE_ID
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: RAY_NODE_TYPE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels['ray.io/group']
    - name: KUBERAY_GEN_RAY_START_CMD
      value: 'ray start  --address=rayjob-sample-hhbdr-raycluster-ljj69-head-svc.algorithm-text.svc.cluster.local:6379  --metrics-export-port=8080  --block  --dashboard-agent-listen-port=52365  --num-cpus=1 '
    - name: RAY_PORT
      value: "6379"
    - name: RAY_ADDRESS
      value: rayjob-sample-hhbdr-raycluster-ljj69-head-svc.algorithm-text.svc.cluster.local:6379
    - name: RAY_USAGE_STATS_KUBERAY_IN_USE
      value: "1"
    - name: REDIS_PASSWORD
    - name: RAY_DASHBOARD_ENABLE_K8S_DISK_USAGE
      value: "1"
    image: rayproject/ray:2.9.0
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - ray stop
    livenessProbe:
      exec:
        command:
        - bash
        - -c
        - wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep
          success
      failureThreshold: 120
      initialDelaySeconds: 30
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 2
    name: ray-worker
    ports:
    - containerPort: 8080
      name: metrics
      protocol: TCP
    readinessProbe:
      exec:
        command:
        - bash
        - -c
        - wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep
          success
      failureThreshold: 10
      initialDelaySeconds: 10
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 2
    resources:
      limits:
        cpu: "1"
      requests:
        cpu: "1"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /dev/shm
      name: shared-mem
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-rq67v
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - args:
    - "\n\t\t\t\t\tSECONDS=0\n\t\t\t\t\twhile true; do\n\t\t\t\t\t\tif (( SECONDS
      <= 120 )); then\n\t\t\t\t\t\t\tif ray health-check --address rayjob-sample-hhbdr-raycluster-ljj69-head-svc.algorithm-text.svc.cluster.local:6379
      > /dev/null 2>&1; then\n\t\t\t\t\t\t\t\techo \"GCS is ready.\"\n\t\t\t\t\t\t\t\tbreak\n\t\t\t\t\t\t\tfi\n\t\t\t\t\t\t\techo
      \"$SECONDS seconds elapsed: Waiting for GCS to be ready.\"\n\t\t\t\t\t\telse\n\t\t\t\t\t\t\tif
      ray health-check --address rayjob-sample-hhbdr-raycluster-ljj69-head-svc.algorithm-text.svc.cluster.local:6379;
      then\n\t\t\t\t\t\t\t\techo \"GCS is ready. Any error messages above can be safely
      ignored.\"\n\t\t\t\t\t\t\t\tbreak\n\t\t\t\t\t\t\tfi\n\t\t\t\t\t\t\techo \"$SECONDS
      seconds elapsed: Still waiting for GCS to be ready. For troubleshooting, refer
      to the FAQ at https://github.com/ray-project/kuberay/blob/master/docs/guidance/FAQ.md.\"\n\t\t\t\t\t\tfi\n\t\t\t\t\t\tsleep
      5\n\t\t\t\t\tdone\n\t\t\t\t"
    command:
    - /bin/bash
    - -lc
    - --
    env:
    - name: FQ_RAY_IP
      value: rayjob-sample-hhbdr-raycluster-ljj69-head-svc.algorithm-text.svc.cluster.local
    - name: RAY_IP
      value: rayjob-sample-hhbdr-raycluster-ljj69-head-svc
    image: rayproject/ray:2.9.0
    imagePullPolicy: IfNotPresent
    name: wait-gcs-ready
    resources:
      limits:
        cpu: 200m
        memory: 256Mi
      requests:
        cpu: 200m
        memory: 256Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-rq67v
      readOnly: true
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir:
      medium: Memory
    name: shared-mem
  - name: kube-api-access-rq67v
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-10-10T02:38:29Z"
    message: '0/7 nodes are available: 3 node(s) had untolerated taint {virtual-kubelet.io/provider:
      alibabacloud}, 4 Insufficient cpu. failed to get current scheduling unit not
      found, preemption: 0/7 nodes are available: 1 No victims found on node cn-hongkong.10.0.118.181
      for preemptor pod rayjob-sample-hhbdr-raycluster-ljj69-small-group-worker-xnzjh,
      1 No victims found on node cn-hongkong.10.1.0.52 for preemptor pod rayjob-sample-hhbdr-raycluster-ljj69-small-group-worker-xnzjh,
      1 No victims found on node cn-hongkong.10.2.0.24 for preemptor pod rayjob-sample-hhbdr-raycluster-ljj69-small-group-worker-xnzjh,
      1 No victims found on node cn-hongkong.10.2.0.5 for preemptor pod rayjob-sample-hhbdr-raycluster-ljj69-small-group-worker-xnzjh,
      3 Preemption is not helpful for scheduling., '
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable

When resources are insufficient, get detailed information about scheduling failures by inspecting Kubernetes events. Use --field-selector='type=Warning,reason=GangFailedScheduling' to filter for events related to gang scheduling failures. The event message, which may include cycle xx, provides details about a specific scheduling attempt and explains why the Pod could not be successfully scheduled in that round. The following is an example.

kubectl get events -n algorithm-text --field-selector='type=Warning,reason=GangFailedScheduling' | grep "cycle 1"
5m48s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-89mlq   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-89mlq in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-8fwmr   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-8fwmr in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-8g5wv   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-8g5wv in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m46s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-8tn4w   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-8tn4w in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-97gpk   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-97gpk in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m46s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-9xsgw   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-9xsgw in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-gwxhg   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-gwxhg in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-jzw6k   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-jzw6k in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-kb55s   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-kb55s in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-lbvk7   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-lbvk7 in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m46s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-ms96b   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-ms96b in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-sgr9g   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-sgr9g in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m46s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-svt6g   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-svt6g in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s       Warning   GangFailedScheduling   pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-wm5c6   rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-wm5c6 in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
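
If the event list is long, you can narrow it down, for example by sorting events by timestamp and keeping only the most recent entries (a convenience pipeline, not part of the original example):

kubectl get events -n algorithm-text --field-selector='type=Warning,reason=GangFailedScheduling' --sort-by=.lastTimestamp | tail -n 5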