
Container Service for Kubernetes: Deploy a Ray Cluster in ACK

Last Updated: Nov 17, 2025

Ray is an open-source framework for building scalable AI and Python applications and is widely used in machine learning. This guide shows how to deploy a Ray Cluster in Alibaba Cloud Container Service for Kubernetes (ACK).

Create a cluster

Create an ACK Managed Cluster Pro that meets the following requirements. For instructions, see Create an ACK managed cluster. To upgrade an existing cluster, see Manually upgrade a cluster.

  • Cluster version: v1.24 or later.

  • Instance type: At least one node with a minimum of 8 vCPUs and 32 GB of memory.

  • The minimum specifications above are recommended for test environments. For production environments, use specifications that match your actual workloads. If your workloads require GPU acceleration, add GPU-accelerated nodes.

    For more information about supported ECS instance types, see Instance family.

  • kubectl is installed on your local machine and connected to the cluster. For more information, see Obtain the KubeConfig file of a cluster and connect to the cluster by using kubectl.
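The prerequisites above can be checked from a shell before you continue. The following is a minimal sketch, assuming a POSIX shell and a kubectl context that already points at the target cluster. The helpers `version_at_least` and `check_prerequisites` are hypothetical names used only for illustration, and the version string format `v1.24.6-aliyun.1` is an assumed example.

```shell
#!/bin/sh
# Sketch: check the cluster prerequisites listed above.
# version_at_least and check_prerequisites are hypothetical helpers,
# not part of ACK or kubectl. Assumes kubectl points at the target cluster.

# Succeeds if Kubernetes version $1 (for example "v1.24.6-aliyun.1")
# is at least the major.minor version $2 (for example "1.24").
version_at_least() {
  local have want have_major want_major have_minor want_minor
  have=${1#v}
  want=$2
  have_major=${have%%.*}
  want_major=${want%%.*}
  have_minor=${have#*.}
  want_minor=${want#*.}
  # Keep only the leading digits: "24.6-aliyun.1" -> "24".
  have_minor=${have_minor%%[!0-9]*}
  want_minor=${want_minor%%[!0-9]*}
  [ "${have_major:-0}" -gt "${want_major:-0}" ] || {
    [ "${have_major:-0}" -eq "${want_major:-0}" ] &&
      [ "${have_minor:-0}" -ge "${want_minor:-0}" ]
  }
}

check_prerequisites() {
  local server_version
  # GET /version returns the API server version as JSON; extract gitVersion.
  server_version=$(kubectl get --raw /version |
    sed -n 's/.*"gitVersion": *"\([^"]*\)".*/\1/p')
  if version_at_least "$server_version" "1.24"; then
    echo "OK: cluster version ${server_version}"
  else
    echo "Upgrade required: cluster version '${server_version}' is earlier than v1.24" >&2
    return 1
  fi
  # Print each node's total vCPUs and memory (memory is typically shown in Ki).
  # At least one node should report 8 or more CPUs and about 32 GB of memory.
  kubectl get nodes \
    -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory
}
```

For example, source these functions in a shell and run `check_prerequisites`. A non-zero exit status means the cluster version is earlier than v1.24; the node table still needs to be read manually against the 8 vCPU and 32 GB requirement.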

(Optional) Create an ApsaraDB for Tair instance

To provide fault tolerance and high availability for the Ray Cluster, create an Alibaba Cloud ApsaraDB for Tair (Redis-compatible) instance that meets the following requirements.

Install the Kuberay-Operator component

Log on to the Container Service for Kubernetes (ACK) console. In the left-side navigation pane, click Clusters. On the Clusters page, click the name of the target cluster. Choose Operations > Add-ons > Manage Applications, find Kuberay-Operator, and click Install.
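After the installation finishes, you can confirm from the command line that the operator is ready before you deploy a cluster. The following is a minimal sketch, assuming kubectl access to the cluster. It relies on `rayclusters.ray.io`, the CRD name used by the upstream KubeRay project; the namespace in which ACK runs the operator is not stated in this guide, so the pod lookup searches all namespaces. `check_kuberay` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: confirm that the Kuberay-Operator component is ready.
# check_kuberay is a hypothetical helper, not part of ACK or KubeRay.
check_kuberay() {
  # Wait until the RayCluster CRD is registered and accepted by the API server.
  kubectl wait --for=condition=Established crd/rayclusters.ray.io --timeout=120s || return 1
  # The operator namespace is not assumed here, so search all namespaces.
  # grep exits non-zero (and so does this function) if no operator pod is found.
  kubectl get pods --all-namespaces | grep -i kuberay
}
```

If `check_kuberay` fails, confirm in the console that the Kuberay-Operator installation completed before you apply the RayCluster resource.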


Deploy the Ray Cluster

Important

Solution for Docker Hub pull failures

Image pulls from Docker Hub may fail because of network instability, such as carrier network issues. Use images that depend on Docker Hub with caution in production environments. This example uses the official Ray image rayproject/ray:2.36.1. If you cannot pull this image, use one of the following solutions:

Run the following commands to create a Ray Cluster named myfirst-ray-cluster and check its deployment status.

  1. Run the following command to create the Ray Cluster resource.


    cat <<EOF | kubectl apply -f -
    apiVersion: ray.io/v1
    kind: RayCluster
    metadata:
      name: myfirst-ray-cluster
      namespace: default
    spec:
      suspend: false
      autoscalerOptions:
        env: []
        envFrom: []
        idleTimeoutSeconds: 60
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 2000m
            memory: 2024Mi
          requests:
            cpu: 2000m
            memory: 2024Mi
        securityContext: {}
        upscalingMode: Default
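      # autoscalerOptions take effect only when this field is set to true.
      enableInTreeAutoscaling: false
      headGroupSpec:
        rayStartParams:
          # Bind the dashboard to all interfaces so it is reachable through the head Service.
          dashboard-host: 0.0.0.0
          # Advertise 0 CPUs on the head pod so that CPU-requiring tasks and actors run on worker pods.
          num-cpus: "0"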
        serviceType: ClusterIP
        template:
          spec:
            containers:
            - image: rayproject/ray:2.36.1
              imagePullPolicy: Always
              name: ray-head
              resources:
                limits:
                  cpu: "4"
                  memory: 4G
                requests:
                  cpu: "1"
                  memory: 1G
      workerGroupSpecs:
      - groupName: work1
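        # When autoscaling is enabled, the worker count stays between minReplicas and maxReplicas.
        maxReplicas: 1000
        minReplicas: 0
        numOfHosts: 1
        rayStartParams: {}
        # Number of worker pods to create for this group.
        replicas: 1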
        template:
          spec:
            containers:
            - image: rayproject/ray:2.36.1
              imagePullPolicy: Always
              name: ray-worker
              resources:
                limits:
                  cpu: "4"
                  memory: 4G
                requests:
                  cpu: "4"
                  memory: 4G
    EOF
  2. Run the following commands to check the deployment status.

    1. Check the status of the Ray Cluster.

      kubectl get raycluster

      Expected output:

      NAME                  DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
      myfirst-ray-cluster   1                 1                   5      5G       0      ready    4m19s
    2. Check the pods for the Ray Cluster.

      kubectl get pod

      Expected output:

      NAME                                     READY   STATUS    RESTARTS   AGE
      myfirst-ray-cluster-head-5q2hk           1/1     Running   0          4m37s
      myfirst-ray-cluster-work1-worker-zkjgq   1/1     Running   0          4m31s
    3. Check the services for the Ray Cluster.

      kubectl get svc

      Expected output:

      NAME                           TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                                         AGE
      kubernetes                     ClusterIP   192.168.0.1   <none>        443/TCP                                         21d
      myfirst-ray-cluster-head-svc   ClusterIP   None          <none>        10001/TCP,8265/TCP,8080/TCP,6379/TCP,8000/TCP   6m57s
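Instead of re-running the status commands until the cluster is ready, you can block on the RayCluster status and then run a quick check inside the head pod. The following is a minimal sketch under these assumptions: kubectl v1.23 or later (for `--for=jsonpath`), the `ray.io/cluster` and `ray.io/node-type` pod labels that KubeRay applies, and the `.status.state` field that backs the STATUS column shown above. `wait_for_raycluster` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: wait for a RayCluster to become ready, then run a smoke test in its head pod.
# wait_for_raycluster is a hypothetical helper; the defaults match the example above.
wait_for_raycluster() {
  local name ns head_pod
  name=${1:-myfirst-ray-cluster}
  ns=${2:-default}

  # Block until .status.state (the STATUS column of "kubectl get raycluster") is "ready".
  # --for=jsonpath requires kubectl v1.23 or later.
  kubectl wait "raycluster/${name}" -n "${ns}" \
    --for=jsonpath='{.status.state}'=ready --timeout=600s || return 1

  # Look up the head pod by the labels KubeRay sets on the pods it creates.
  head_pod=$(kubectl get pods -n "${ns}" \
    -l "ray.io/cluster=${name},ray.io/node-type=head" \
    -o jsonpath='{.items[0].metadata.name}') || return 1
  [ -n "${head_pod}" ] || return 1

  # Attach to the running Ray cluster from inside the head pod and print its total resources.
  kubectl exec -n "${ns}" "${head_pod}" -- \
    python -c 'import ray; ray.init(address="auto"); print(ray.cluster_resources())'
}
```

To open the Ray Dashboard locally, forward port 8265 of the head Service listed above, for example `kubectl port-forward svc/myfirst-ray-cluster-head-svc 8265:8265`, and then browse to http://localhost:8265. If a pod stays in the ErrImagePull or ImagePullBackOff state, `kubectl describe pod <pod-name>` shows the pull error in its Events section.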