Enterprises usually need to run large numbers of jobs with limited resources. To resolve this issue, they must prioritize key departments or individuals when allocating resources while keeping resource allocation flexible. This topic describes how to improve cluster resource utilization and automate RayJobs created by different departments on a job management platform. You can queue RayJobs and dynamically adjust their priorities to ensure that resources are preferentially allocated to high-priority RayJobs.
Why do we need to automate RayJobs in ACK clusters?
Resource management and optimization
You can use a combination of Container Service for Kubernetes (ACK) and Ray to automate job scheduling so that you can manage and allocate compute resources more flexibly. For example, different numbers of vCPUs may need to be allocated to different teams, such as a text content team and a video content team. Ray enables you to efficiently schedule jobs with limited resources and ensures that each team has sufficient resources. This helps avoid resource waste and resource contention.
Priority control
Jobs created by algorithm engineers usually have higher priorities. You can create policies in ACK and use the scheduling mechanism of Ray to prioritize RayJobs based on user roles. This allows you to complete a variety of low-priority jobs with limited resources while ensuring the success of key projects.
Auto scaling
You may need to adjust the compute resources that each department can use to meet development requirements. You can use the auto scaling feature of ACK to dynamically scale nodes based on loads and use Ray to seamlessly balance workloads to obtain the optimal performance.
The following figure shows the allocation of vCPUs.
To manage quotas and define the priorities of jobs, you need to develop a resource quota mechanism and job scheduling system in the cluster.
You can create resource quotas to specify the maximum amount of resources or the maximum number of machines that each department or individual can use. Before the system starts a job, it checks whether the resource quota of the job can meet the requirements of the job. The system allocates compute resources to the job and starts the job only after confirming that the resource quota can fulfill the resource request. This improves resource utilization and ensures that important jobs are prioritized.
Preparations
An ACK Pro cluster is created.
For more information about how to create a cluster, see Create an ACK managed cluster.
For more information about how to update the Kubernetes version of a cluster, see Manually update ACK clusters.
The cluster components are installed.
The version of the Kuberay Operator component is v1.2.1.2.gfebe140 or later. For more information, see Install Kuberay Operator.
The version of Kube Queue is v1.21.4 or later. For more information, see Use ack-kube-queue to manage job queues.
A kubectl client is connected to the cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Configure queues and resources
The job queue and orchestration system must be capable of submitting jobs created by different departments or employees to different queues so that jobs can be managed based on the priorities of the departments or individuals. In addition, the system must support priority-based job scheduling instead of the first in, first out (FIFO) mode to ensure that urgent jobs can be handled promptly.
By using the ACK scheduler, Kube Queue, and ElasticQuotaTree, ACK clusters can support quota limits, elastic quota management, various queueing policies, and queue resource reclaim. This all-in-one solution can meet various job queue requirements.
The following figure shows the relationship among the administrator, researcher, jobs, Kube Queue, ACK scheduler, and controller.
The administrator creates an ElasticQuotaTree in the cluster to define resource quotas and queues. After the ElasticQuotaTree is created, researchers of different departments can submit jobs to the cluster. The jobs are assigned to different quota groups based on the predefined resource quotas. The jobs in each queue are executed sequentially based on the queueing policy of the queue. The controller manages the status of dequeued jobs, and the ACK scheduler is in charge of pod scheduling. If a dequeued job fails to start after a long period of time, Kube Queue automatically reclaims the job and inserts it back into the queue.
Set up a resource quota system
The ElasticQuotaTree defines the quota information of the cluster, including the layered structure of quotas, the amount of resources of each quota, and the namespaces associated with quotas. Jobs submitted in these namespaces are automatically assigned the resource quotas of the namespaces.
To set up the desired resource quota system, you can submit the following ElasticQuotaTree to the cluster.
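The following is a minimal sketch of such an ElasticQuotaTree, based on the ACK capacity scheduling CRD. The four leaf quotas and the 40-vCPU limit on the text group match the quota messages shown later in this topic; the remaining max and min values are illustrative assumptions that you should adjust to your own cluster capacity.
apiVersion: scheduling.sigs.k8s.io/v1beta1
kind: ElasticQuotaTree
metadata:
  name: elasticquotatree    # Created in the kube-system namespace.
  namespace: kube-system
spec:
  root:
    name: root
    max:
      cpu: 80
    min:
      cpu: 80
    children:
    - name: group-text
      max:
        cpu: 40
      min:
        cpu: 20
      children:
      - name: algorithm-text
        max:
          cpu: 40
        min:
          cpu: 20
        namespaces:          # Jobs submitted in this namespace consume this quota.
        - algorithm-text
      - name: intern-text
        max:
          cpu: 20
        min:
          cpu: 0             # No resources are guaranteed for intern jobs.
        namespaces:
        - intern-text
    - name: group-video
      max:
        cpu: 40
      min:
        cpu: 20
      children:
      - name: algorithm-video
        max:
          cpu: 40
        min:
          cpu: 20
        namespaces:
        - algorithm-video
      - name: intern-video
        max:
          cpu: 20
        min:
          cpu: 0
        namespaces:
        - intern-video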
An ElasticQuotaTree is a tree structure. The max field defines the maximum amount of resources that a quota can use, and the min field defines the minimum amount of resources that must be guaranteed for the quota. When the minimum resources of a quota cannot be guaranteed, the scheduling system attempts to reclaim resources from quotas that are using more resources than their min values. No resources are guaranteed for intern-text and intern-video jobs. If the algorithm team submits an urgent job while a low-priority job is running, the urgent job can preempt the resources of the low-priority job.
Set the priorities of queues
After the ElasticQuotaTree is submitted, Kube Queue creates queues in the cluster. The quota of each leaf node corresponds to a separate queue. After a job is submitted to the cluster, the system places the job into a queue based on the namespace to which the job belongs and the namespace information in the resource quota configuration. To specify a quota explicitly, add the quota.scheduling.alibabacloud.com/name label to the RayJob or RayCluster resource and set its value to the name of the quota. The default priority of a queue is 0. You can modify the Queue resources in the kube-queue namespace to specify the priority of each queue. Kube Queue traverses all queues in a round-robin manner and attempts to execute the job at the head of each queue. During this process, queues with higher priorities are processed before other queues and therefore have a better chance to use idle resources.
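For example, the following fragment binds a hypothetical RayJob to the algorithm-text quota by adding the label to its metadata (the spec is omitted here; see the full example later in this topic).
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample
  namespace: algorithm-text
  labels:
    # Bind the job to the algorithm-text quota defined in the ElasticQuotaTree.
    quota.scheduling.alibabacloud.com/name: algorithm-text
spec:
  # ... head group and worker group configuration omitted.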
You can also modify the Spec of a Queue resource to manage the priority of the queue.
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: Queue
metadata:
  annotations:
    kube-queue/quota-fullname: root/group-video/algorithm-video
  creationTimestamp: "2024-10-08T06:43:07Z"
  generation: 1
  labels:
    create-by-kubequeue: "true"
  name: root-group-video-algorithm-video-algorithm-video
  namespace: kube-queue
  resourceVersion: "5766258"
  uid: 01342987-3dad-401b-8509-ef7250683377
spec:
  queuePolicy: Round
  # For example, add priority: 2 to the Spec of a queue to set its priority to 2.
  priority: 2
The following figure shows the relationship between resource quotas and queues. By default, queues are scheduled in a round-robin manner. For more information about other queue scheduling methods, see Use ack-kube-queue to manage job queues.
After you configure resource quotas, you can submit a job to a queue by creating it in the suspended state. For RayJobs, you only need to set the .spec.suspend field to true. For other types of jobs that support queues, use a similar method to submit them to queues. For more information, see Use ack-kube-queue to manage job queues. After a RayJob is submitted, the system calculates the total amount of resources that the job requires: the resource request of the head pod plus the resources of each WorkerGroup (resource request of each pod × number of replicas).
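The following is a minimal sketch of a suspended RayJob that requests 31 vCPUs (1 vCPU for the head pod and 30 × 1 vCPU for the workers). The entrypoint and the quota label are illustrative assumptions; replace them with your own workload and quota.
apiVersion: ray.io/v1
kind: RayJob
metadata:
  generateName: rayjob-sample-
  namespace: algorithm-text
  labels:
    quota.scheduling.alibabacloud.com/name: algorithm-text
spec:
  suspend: true                    # Submit the job to a queue instead of starting it immediately.
  entrypoint: python -c "import ray; ray.init(); print(ray.cluster_resources())"
  rayClusterSpec:
    rayVersion: "2.9.0"
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
            resources:
              requests:
                cpu: "1"
              limits:
                cpu: "1"
    workerGroupSpecs:
    - groupName: small-group
      replicas: 30                 # Total request: 1 vCPU (head) + 30 × 1 vCPU (workers) = 31 vCPUs.
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-worker
            image: rayproject/ray:2.9.0
            resources:
              requests:
                cpu: "1"
              limits:
                cpu: "1"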
Example on how to use Kube Queue to manage queues
In the following example, multiple RayJobs are submitted to demonstrate how to use Kube Queue. When a RayJob enters the Succeeded state, the job is completed. In this case, Kube Queue allows the subsequent job to start. Take note that pod scheduling in a newly created RayCluster must wait until the RayCluster associated with the previous RayJob is deleted and the resources occupied by the job are released. The following code block is an example.
Two jobs are submitted. Each job requires 31 vCPUs. In this case, resources are allocated to only one job. The other job remains in the Suspended state. You can run the following commands to query the Status field to view the queueing information.
# The output indicates that only one job is in the Initializing state. The other job is in the Suspended state.
➜ kubequeue-doc kubectl get rayjob -n algorithm-text
NAME                  JOB STATUS   DEPLOYMENT STATUS   START TIME             END TIME   AGE
rayjob-sample-j87d8                                                                      4s
rayjob-sample-rhm9s                Initializing        2024-10-08T07:56:31Z              7s
# Query queues in the cluster.
➜ kubequeue-doc kubectl -n kube-queue get queue
NAME                                               AGE
root-group-text-algorithm-text-algorithm-text      73m
root-group-text-intern-text-intern-text            73m
root-group-video-algorithm-video-algorithm-video   73m
root-group-video-intern-video-intern-video         73m
# Query the queue to which the job is submitted. The Status field indicates that rayjob-sample-j87d8 is in the backoff queue. Its sequence number is 1, which means that the job is immediately executed after the current job is completed.
➜ kubequeue-doc kubectl -n kube-queue get queue root-group-text-algorithm-text-algorithm-text -oyaml
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: Queue
metadata:
  annotations:
    kube-queue/quota-fullname: root/group-text/algorithm-text
  creationTimestamp: "2024-10-08T06:43:07Z"
  generation: 1
  labels:
    create-by-kubequeue: "true"
  name: root-group-text-algorithm-text-algorithm-text
  namespace: kube-queue
  resourceVersion: "5802083"
  uid: 83a3bf55-cd96-4405-9629-7e37512ac4b6
spec:
  queuePolicy: Round
status:
  queueItemDetails:
    active: []
    backoff:
    - name: rayjob-sample-j87d8-ray-qu # Specify the same name in the following command.
      namespace: algorithm-text
      position: 1
# Query the status of the QueueUnit that corresponds to the job. The cause of the failure is that starting the job would cause the resource quota of root/group-text/algorithm-text to exceed its predefined upper limit.
➜ kubequeue-doc kubectl get queue -n algorithm-text rayjob-sample-j87d8-ray-qu -oyaml
...
status:
  lastUpdateTime: "2024-10-08T08:01:46Z"
  message: 'Insufficient quota(cpu) in quota root/group-text/algorithm-text: request
    31, max 40, used 31, oversellreate 1. Wait for running jobs to complete'
  phase: Enqueued
If a job with a lower priority is submitted, the job is also queued. The status message indicates that the low-priority job is suspended because the resource quota of root/group-text has reached its upper limit. Run the following command to query the queueing details.
➜ kubequeue-doc kubectl get queue -n intern-text
NAME                         AGE
rayjob-sample-n5gzf-ray-qu   9s    # Specify the same name in the following command.
➜ kubequeue-doc kubectl get queue -n intern-text rayjob-sample-n5gzf-ray-qu -oyaml
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: QueueUnit
metadata:
  creationTimestamp: "2024-10-08T08:07:48Z"
  generation: 1
  name: rayjob-sample-n5gzf-ray-qu
  namespace: intern-text
  ownerReferences:
  - apiVersion: ray.io/v1
    kind: RayJob
    name: rayjob-sample-n5gzf
    uid: d44af490-b595-4876-9463-bd22ff826848
  resourceVersion: "5807774"
  uid: b9f7e900-ccf9-47cb-817b-8d4e61575369
spec:
  consumerRef:
    apiVersion: ray.io/v1
    kind: RayJob
    name: rayjob-sample-n5gzf
    namespace: intern-text
  podSet:
  - count: 1
    name: head
    template:
      metadata: {}
      spec:
        containers:
        - image: rayproject/ray:2.9.0
          name: ray-head
          ports:
          - containerPort: 6379
            name: gcs-server
            protocol: TCP
          - containerPort: 8265
            name: dashboard
            protocol: TCP
          - containerPort: 10001
            name: client
            protocol: TCP
          resources:
            limits:
              cpu: "1"
            requests:
              cpu: "1"
          volumeMounts:
          - mountPath: /home/ray/samples
            name: code-sample
        volumes:
        - configMap:
            items:
            - key: sample_code.py
              path: sample_code.py
            name: ray-job-code-sample
          name: code-sample
  - count: 30
    name: small-group
    template:
      metadata: {}
      spec:
        containers:
        - image: rayproject/ray:2.9.0
          lifecycle:
            preStop:
              exec:
                command:
                - /bin/sh
                - -c
                - ray stop
          name: ray-worker
          resources:
            limits:
              cpu: "1"
            requests:
              cpu: "1"
  resource:
    cpu: "31"
status:
  lastUpdateTime: "2024-10-08T08:08:03Z"
  message: 'Insufficient quota(cpu) in parent quota root/group-text: request 31, max
    40, used 31, oversellreate 1. Wait for running jobs to complete'
  phase: Enqueued
The queue of the algorithm team has a higher priority. After the rayjob-sample-rhm9s job is completed, resources are preferentially allocated to the jobs in the queue of the algorithm team. Consequently, these jobs start earlier than jobs in other queues.
# Delete a job to simulate the completion of a job.
➜ kubequeue-doc kubectl delete rayjob -n algorithm-text rayjob-sample-rhm9s
rayjob.ray.io "rayjob-sample-rhm9s" deleted
# Jobs in the queue of the algorithm team start.
➜ kubequeue-doc kubectl get rayjob -n algorithm-text
NAME                  JOB STATUS   DEPLOYMENT STATUS   START TIME             END TIME   AGE
rayjob-sample-j87d8                Initializing        2024-10-08T08:24:09Z              27m
# The low-priority job is still suspended.
➜ kubequeue-doc kubectl get rayjob -n intern-text
NAME                  JOB STATUS   DEPLOYMENT STATUS   START TIME             END TIME   AGE
rayjob-sample-n5gzf                                                                      16m
Specify a node pool for job scheduling
Ray allows you to specify a node pool for job scheduling. This way, you can optimize resource usage, improve performance, or meet specific security and isolation requirements. You may need to specify a node pool for job scheduling in the following scenarios.
Fulfill resource requests
If some of your jobs have hardware requirements, such as GPU computing, you can add the matching nodes to a separate node pool and schedule only jobs that have hardware requirements to the node pool.
Security and isolation
You may want to run data-sensitive jobs in a secure and controllable environment. In this scenario, you can create a dedicated node pool and ensure that only authorized services can access the nodes in the node pool. This way, the node pool is isolated from other services.
Performance optimization
To ensure the optimal performance of certain workloads, you may need to deploy them on a specific type of infrastructure. For example, I/O-intensive jobs are more suitable for servers with high-speed network connections and SSDs.
Cost control
You can create node pools that consist of instances belonging to different price tiers based on the pricing policies of the cloud service provider to better control costs. For example, you can run unimportant jobs on low-performance but cost-effective instances.
Multi-tenant environment
A cluster is shared by multiple teams. Each team wants to manage and optimize their workloads separately. In this scenario, you can create a node pool for each team so that the teams can plan and adjust their own resources.
To schedule pods to a specific node pool, you can add node affinity rules to the configuration of the head pod and WorkerGroup pods. In addition, you can create a resource policy to define the order in which different types of resources are used for pod scheduling. For example, you can preferentially schedule pods to Elastic Compute Service (ECS) instances; when ECS instances are out of stock, pods are scheduled to elastic container instances or Alibaba Cloud Container Compute Service (ACS) nodes. The following code block is an example.
Create a resource policy to schedule jobs to serverless resources when the node pool does not have sufficient resources.
Currently, gang scheduling does not support ACS resources.
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: rayjob-sample-tnj27-raycluster-s6t8b-small-group
  namespace: algorithm-text
spec:
  selector:
    # Prioritize a WorkerGroup of a RayCluster. ray.io/group specifies the name of a WorkerGroup.
    # ray.io/cluster specifies a RayCluster.
    ray.io/group: small-group
    ray.io/cluster: rayjob-sample-tnj27-raycluster-s6t8b
    ray.io/node-type: worker
  strategy: prefer
  units:
  - nodeSelector:
      alibabacloud.com/nodepool-id: np9c9b663eb55d44d0943009d5c3d32781
    resource: ecs
  - resource: acs
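If you only need to pin pods to the node pool and do not need the serverless fallback provided by the resource policy, a node affinity rule on the pod templates is sufficient. The following is a minimal sketch of the WorkerGroup portion of a rayClusterSpec, assuming the same node pool ID as above; add the same affinity rule to the head group if the head pod must also run in the node pool.
    workerGroupSpecs:
    - groupName: small-group
      replicas: 30
      rayStartParams: {}
      template:
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: alibabacloud.com/nodepool-id   # Node pool label used in the resource policy above.
                    operator: In
                    values:
                    - np9c9b663eb55d44d0943009d5c3d32781
          containers:
          - name: ray-worker
            image: rayproject/ray:2.9.0
            resources:
              requests:
                cpu: "1"
              limits:
                cpu: "1"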
Configure gang scheduling
In the following scenarios, where multiple workers must collaborate closely, you can configure gang scheduling for Ray to reduce programming complexity.
Large-scale machine learning model training
A single server cannot provide sufficient compute resources to process large datasets or complex models. In this case, a group of containers must collaborate. Gang scheduling ensures that these containers do not compete for resources, avoids deadlocks, and improves training efficiency.
MPI framework
Parallel computing in the MPI framework requires the primary and secondary processes to collaborate. Gang scheduling can schedule these processes concurrently, reduce communication latency, and improve computing efficiency.
Big data processing and analysis
For applications that need to process large amounts of data, such as log analysis and real-time stream processing, you may need to run multiple jobs at the same time to complete complex tasks. Gang scheduling can ensure that these jobs are scheduled concurrently to improve the overall efficiency.
Custom distributed application development
You may need to build player matching in a gaming server architecture or handle data collected from a large number of IoT devices.
To submit a gang scheduling request in a RayJob, you only need to declare ray.io/scheduler-name=kube-scheduler in the metadata of the RayJob in an ACK Pro or ACK Lingjun cluster. After the job is submitted, the KubeRay Operator injects the gang scheduling label into the pods. The following code block is an example.
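A minimal sketch, assuming the declaration is added as a metadata label and reusing the suspended RayJob from the earlier queueing example; the rest of the spec is unchanged and omitted here.
apiVersion: ray.io/v1
kind: RayJob
metadata:
  generateName: rayjob-sample-
  namespace: algorithm-text
  labels:
    # Request gang scheduling for all pods of this RayJob.
    ray.io/scheduler-name: kube-scheduler
    quota.scheduling.alibabacloud.com/name: algorithm-text
spec:
  suspend: true
  # The entrypoint, head group, and worker group configuration are the same as in the earlier example.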
You can also add pod labels, which are used to identify, filter, and manage pods. The following code block is an example.
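A sketch of the WorkerGroup portion of a rayClusterSpec, assuming hypothetical team and workload-type labels. Labels added to the pod template metadata are applied to every worker pod and can be used with label selectors such as kubectl get pods -l team=algorithm-text.
    workerGroupSpecs:
    - groupName: small-group
      replicas: 30
      rayStartParams: {}
      template:
        metadata:
          labels:
            # Hypothetical labels for identifying and filtering pods.
            team: algorithm-text
            workload-type: ray-worker
        spec:
          containers:
          - name: ray-worker
            image: rayproject/ray:2.9.0
            resources:
              requests:
                cpu: "1"
              limits:
                cpu: "1"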
When resources are insufficient, you can query events to locate the cause of scheduling failures. Use --field-selector='type=Warning,reason=GangFailedScheduling' to find failure records related to gang scheduling. The "cycle xx" part of the event message distinguishes scheduling failures that occur in different scheduling rounds. The following code block is an example.
➜ kubequeue-doc kubectl get events -n algorithm-text --field-selector='type=Warning,reason=GangFailedScheduling' | grep "cycle 1"
5m48s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-89mlq rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-89mlq in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-8fwmr rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-8fwmr in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-8g5wv rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-8g5wv in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m46s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-8tn4w rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-8tn4w in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-97gpk rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-97gpk in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m46s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-9xsgw rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-9xsgw in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-gwxhg rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-gwxhg in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-jzw6k rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-jzw6k in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-kb55s rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-kb55s in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-lbvk7 rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-lbvk7 in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m46s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-ms96b rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-ms96b in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-sgr9g rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-sgr9g in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m46s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-svt6g rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-svt6g in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.
5m48s Warning GangFailedScheduling pod/rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-wm5c6 rayjob-sample-dtmtl-raycluster-r9jc7-small-group-worker-wm5c6 in gang failed to be scheduled in cycle 1: 0/0 nodes are available: 3 Insufficient cpu.