
Container Service for Kubernetes: Deploy Ray Cluster in ACK

Last Updated: Jun 13, 2025

Ray is an open-source unified framework for scaling AI and Python applications, and it is widely used in machine learning. This topic describes how to deploy a Ray Cluster on an ACK cluster.

1. Create a cluster

To create a cluster, see Create an ACK managed cluster. To upgrade an existing cluster, see Manually upgrade a cluster. The ACK managed cluster Pro used in this topic must meet the following requirements:

  • The Kubernetes version of the cluster is v1.24 or later.

  • Node specifications: The cluster contains at least one node that provides 8 or more vCPUs and 32 GB or more of memory.

  • The preceding specifications are the recommended minimum for a test environment. In a production environment, choose specifications based on your actual workloads. If your workloads require GPUs, add GPU nodes to the cluster.

    For more information about the instance types supported by ECS, see Instance family.

  • kubectl is installed on your computer and is connected to the cluster. For more information, see Obtain the KubeConfig file of a cluster and connect to the cluster by using kubectl.
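The requirements above can also be checked from the command line. The following Bash sketch assumes that kubectl is already connected to the cluster; the function name `check_node` is hypothetical. It reads each node's kubelet version, CPU capacity, and memory capacity through a kubectl jsonpath query and compares them with the thresholds listed above. The kubelet version is only a proxy for the cluster version, and because nodes usually report slightly less memory than their nominal size, the sketch treats 32 GB as 32 × 10^9 bytes (31,250,000 KiB). The cluster query runs only when kubectl can reach a cluster.

```shell
#!/usr/bin/env bash
# Sketch: compare each node with the minimum requirements listed above.
# check_node NAME KUBELET_VERSION CPU_CAPACITY MEMORY_CAPACITY
# Example input (values are illustrative): node-a v1.30.1-aliyun.1 8 32614664Ki

MIN_MAJOR=1
MIN_MINOR=24
MIN_CPU=8
MIN_MEM_KI=31250000   # 32 GB (32 x 10^9 bytes) expressed in KiB

check_node() {
  local name="$1" version="$2" cpu="$3" mem="$4" ok=1
  local ver="${version#v}"       # e.g. "1.30.1-aliyun.1"
  local major="${ver%%.*}"       # e.g. "1"
  local minor="${ver#*.}"        # e.g. "30.1-aliyun.1"
  minor="${minor%%[!0-9]*}"      # keep leading digits only, e.g. "30"

  if (( major < MIN_MAJOR || (major == MIN_MAJOR && minor < MIN_MINOR) )); then
    echo "$name: Kubernetes $version is older than v$MIN_MAJOR.$MIN_MINOR"; ok=0
  fi
  # Whole cores only; a millicore value such as "7500m" is reported as a failure.
  if [[ ! "$cpu" =~ ^[0-9]+$ ]] || (( cpu < MIN_CPU )); then
    echo "$name: CPU capacity $cpu is below $MIN_CPU"; ok=0
  fi
  # Node memory capacity is normally reported in KiB, e.g. "32614664Ki".
  if [[ "$mem" =~ ^([0-9]+)Ki$ ]]; then
    if (( BASH_REMATCH[1] < MIN_MEM_KI )); then
      echo "$name: memory capacity $mem is below 32 GB"; ok=0
    fi
  else
    echo "$name: unrecognized memory format $mem"; ok=0
  fi

  if (( ok )); then echo "$name: OK"; return 0; fi
  return 1
}

# Query the cluster only when kubectl is installed and can reach it.
if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info --request-timeout=5s >/dev/null 2>&1; then
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.kubeletVersion}{" "}{.status.capacity.cpu}{" "}{.status.capacity.memory}{"\n"}{end}' |
    while read -r name version cpu mem; do
      check_node "$name" "$version" "$cpu" "$mem"
    done
fi
```

Each node prints either `NAME: OK` or the requirement it misses; the prerequisite is met if at least one node prints `OK`.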

(Optional) Create an ApsaraDB for Redis instance

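This topic uses a Redis instance to provide fault tolerance and high availability for the Ray Cluster: Ray can persist the metadata of its Global Control Service (GCS) in external Redis, which allows the Ray Cluster to recover when the head Pod fails. This step is optional; create the instance only if you need these capabilities. The Tair (Redis OSS-compatible) instance must meet the following requirements: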

  • The Tair (Redis OSS-compatible) instance is deployed in the same region and virtual private cloud (VPC) as the ACK managed cluster Pro that is used in this topic. For more information, see Step 1: Create an instance.

  • Configure a whitelist that allows access from the CIDR block of the VPC in which the cluster resides. For more information, see Step 2: Configure a whitelist.

  • Obtain the endpoint of the ApsaraDB for Redis instance. We recommend that you use the VPC endpoint. For more information, see View connection addresses.

  • Obtain the password of the ApsaraDB for Redis instance. For more information, see Change or reset the password.
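Before you use the instance with Ray, it can help to confirm that the cluster can reach it and to keep the password out of plain-text manifests. The following Bash sketch is one way to do this, assuming that you export the endpoint and password obtained above as `REDIS_HOST` and `REDIS_PASSWORD`. The Secret name `redis-password-secret`, the Pod name `redis-ping`, and the `redis:7` client image are illustrative choices, not part of this topic. `redis:7` is pulled from Docker Hub, so substitute a reachable image if pulls fail. Port 6379 is the default Redis port; change `REDIS_PORT` if your instance uses another.

```shell
#!/usr/bin/env bash
# Sketch: store the Redis password in a Secret and test connectivity from
# inside the cluster. All names below are illustrative assumptions.

SECRET_NAME="${SECRET_NAME:-redis-password-secret}"   # hypothetical Secret name
NAMESPACE="${NAMESPACE:-default}"                     # namespace of the Ray Cluster
REDIS_PORT="${REDIS_PORT:-6379}"                      # default Redis port

# Create or update the Secret idempotently: render it client-side, then apply.
create_redis_secret() {
  kubectl create secret generic "$SECRET_NAME" -n "$NAMESPACE" \
    --from-literal=password="$REDIS_PASSWORD" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Run a one-off redis-cli Pod that is deleted when it exits.
# A reachable instance with a valid password answers "PONG".
ping_redis() {
  kubectl run redis-ping -n "$NAMESPACE" --rm -i --restart=Never --image=redis:7 \
    -- redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" -a "$REDIS_PASSWORD" --no-auth-warning ping
}

# Run both steps only when the endpoint and password have been exported.
if [[ -n "${REDIS_HOST:-}" && -n "${REDIS_PASSWORD:-}" ]]; then
  create_redis_secret && ping_redis
fi
```

For example, run `REDIS_HOST=<VPC endpoint> REDIS_PASSWORD=<password> bash redis-check.sh` (the script name is hypothetical). A timeout may indicate that the whitelist or VPC settings do not allow access. Note that the password passed to `redis-cli` is visible in the temporary Pod's spec while it runs.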

2. Install the Kuberay-Operator component

Log on to the Container Service for Kubernetes (ACK) console. In the left-side navigation pane, click Clusters. On the Clusters page, click the name of your cluster. On the cluster details page, choose Operations > Add-ons. In the Application Management section, find Kuberay-Operator and click Install to install the component in the cluster.
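After the installation finishes, you can confirm from the command line that the RayCluster custom resource definition (CRD) is registered before you create a Ray Cluster. The following is a minimal Bash sketch, assuming that kubectl is connected to the cluster; the helper name `has_raycluster_crd` is hypothetical. It reads the output of `kubectl api-resources --api-group=ray.io -o name`, which lists the resources of the `ray.io` API group in `resource.group` form, for example `rayclusters.ray.io`. The namespace of the operator Pod depends on the add-on, so the sketch simply filters all Pods by name.

```shell
#!/usr/bin/env bash
# Sketch: verify that Kuberay-Operator has registered the RayCluster CRD.

# Reads "kubectl api-resources -o name" output on stdin and succeeds if
# rayclusters.ray.io appears as a whole line.
has_raycluster_crd() {
  grep -qx 'rayclusters\.ray\.io'
}

# Query the cluster only when kubectl is installed and can reach it.
if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info --request-timeout=5s >/dev/null 2>&1; then
  if kubectl api-resources --api-group=ray.io -o name | has_raycluster_crd; then
    echo "RayCluster CRD found"
    # Show the operator Pod(s); the exact name and namespace are not assumed.
    kubectl get pods -A | grep -i kuberay
  else
    echo "RayCluster CRD not found; check the Kuberay-Operator add-on" >&2
  fi
fi
```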


3. Deploy a Ray Cluster

Important

Docker Hub image pull failures

Because of unstable network conditions, such as those of carrier networks, image accelerators may fail to pull specific versions of container images from Docker Hub. Use container images that depend on Docker Hub with caution in production environments. This example uses the official Ray image rayproject/ray:2.36.1. If you cannot pull this image, replace it in the manifest with a subscribed image address.
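If you save the manifest used in the next step to a file instead of piping it directly to kubectl, you can swap the image before you apply it. The following Bash sketch assumes a saved manifest file and a replacement image address that you supply; the function name `replace_ray_image` is hypothetical. It rewrites every occurrence of `rayproject/ray:2.36.1`, which covers both the head and the worker containers.

```shell
#!/usr/bin/env bash
# Sketch: replace the Docker Hub Ray image in a saved manifest with another address.

# Image used in this topic, with dots escaped so that sed matches them literally.
OLD_IMAGE_RE='rayproject/ray:2\.36\.1'

# replace_ray_image FILE NEW_IMAGE
replace_ray_image() {
  local file="$1" new_image="$2" new_esc
  # Escape characters that are special in a sed replacement (&, \) and the | delimiter.
  new_esc=$(printf '%s' "$new_image" | sed 's/[&|\\]/\\&/g')
  # Write to a temporary file instead of using "sed -i", whose syntax differs
  # between GNU sed and BSD sed.
  sed "s|${OLD_IMAGE_RE}|${new_esc}|g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example, after sourcing the sketch, run `replace_ray_image ray-cluster.yaml <subscribed image address>` and then `kubectl apply -f ray-cluster.yaml` (the file name is hypothetical).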

Run the following commands to create a Ray Cluster named myfirst-ray-cluster and check the deployment status.

  1. Run the following command to create a Ray Cluster resource.


    cat <<EOF | kubectl apply -f -
    apiVersion: ray.io/v1
    kind: RayCluster
    metadata:
      name: myfirst-ray-cluster
      namespace: default
    spec:
      suspend: false
      autoscalerOptions:
        env: []
        envFrom: []
        idleTimeoutSeconds: 60
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 2000m
            memory: 2024Mi
          requests:
            cpu: 2000m
            memory: 2024Mi
        securityContext: {}
        upscalingMode: Default
      enableInTreeAutoscaling: false  # autoscalerOptions above take effect only when this is true
      headGroupSpec:
        rayStartParams:
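          dashboard-host: 0.0.0.0  # bind the dashboard to all interfaces so the head Service can expose it
          num-cpus: "0"  # head advertises no CPUs, so CPU-requesting tasks and actors run on workers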
        serviceType: ClusterIP
        template:
          spec:
            containers:
            - image: rayproject/ray:2.36.1
              imagePullPolicy: Always
              name: ray-head
              resources:
                limits:
                  cpu: "4"
                  memory: 4G
                requests:
                  cpu: "1"
                  memory: 1G
      workerGroupSpecs:
      - groupName: work1
        maxReplicas: 1000
        minReplicas: 0
        numOfHosts: 1
        rayStartParams: {}
        replicas: 1  # with autoscaling disabled, KubeRay keeps this many worker Pods in the group
        template:
          spec:
            containers:
            - image: rayproject/ray:2.36.1
              imagePullPolicy: Always
              name: ray-worker
              resources:
                limits:
                  cpu: "4"
                  memory: 4G
                requests:
                  cpu: "4"
                  memory: 4G
    EOF
  2. Run the following commands to check the deployment status.

    1. Check the Ray Cluster deployment status.

      kubectl get raycluster

      Expected results:

      NAME                  DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
      myfirst-ray-cluster   1                 1                   5      5G       0      ready    4m19s
    2. Check the pods of the Ray Cluster.

      kubectl get pod

      Expected results:

      NAME                                     READY   STATUS    RESTARTS   AGE
      myfirst-ray-cluster-head-5q2hk           1/1     Running   0          4m37s
      myfirst-ray-cluster-work1-worker-zkjgq   1/1     Running   0          4m31s
    3. Check the services of the Ray Cluster.

      kubectl get svc

      Expected results:

      NAME                           TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                                         AGE
      kubernetes                     ClusterIP   192.168.0.1   <none>        443/TCP                                         21d
      myfirst-ray-cluster-head-svc   ClusterIP   None          <none>        10001/TCP,8265/TCP,8080/TCP,6379/TCP,8000/TCP   6m57s
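The checks above can also be scripted instead of being read by eye. The following Bash sketch is one way to do it, assuming the labels `ray.io/cluster` and `ray.io/node-type` that KubeRay adds to Ray Pods; the function name `all_pods_ready` is hypothetical. It waits for the RayCluster to reach the `ready` state, parses the default `kubectl get pod` table (READY and STATUS columns), and, if every Pod is Running with all containers ready, runs `ray status` inside the `ray-head` container to print the nodes and resources that Ray reports.

```shell
#!/usr/bin/env bash
# Sketch: scripted version of the status checks above.
CLUSTER="myfirst-ray-cluster"
NAMESPACE="default"

# all_pods_ready: reads a "kubectl get pod" table on stdin and succeeds only if
# at least one Pod is listed and every Pod is Running with READY equal to "n/n".
all_pods_ready() {
  awk 'NR > 1 {
         n++
         split($2, r, "/")
         if ($3 != "Running" || r[1] != r[2]) bad++
       }
       END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}

# Query the cluster only when kubectl is installed and can reach it.
if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info --request-timeout=5s >/dev/null 2>&1; then
  # Wait until the RayCluster reports the "ready" state shown by "kubectl get raycluster".
  kubectl wait -n "$NAMESPACE" --for=jsonpath='{.status.state}'=ready \
    "raycluster/$CLUSTER" --timeout=300s

  # Select the Ray Pods by the KubeRay cluster label.
  if kubectl get pod -n "$NAMESPACE" -l "ray.io/cluster=$CLUSTER" | all_pods_ready; then
    head_pod=$(kubectl get pod -n "$NAMESPACE" -l "ray.io/cluster=$CLUSTER,ray.io/node-type=head" \
      -o jsonpath='{.items[0].metadata.name}')
    # "ray status" prints the nodes and logical resources that Ray itself sees.
    kubectl exec -n "$NAMESPACE" "$head_pod" -c ray-head -- ray status
  else
    echo "Not all Pods of $CLUSTER are Running and ready" >&2
  fi
fi
```

Because the head Pod sets `num-cpus: "0"`, the CPU total in `ray status` is expected to come from the worker Pod only. This can differ from the CPUS column of `kubectl get raycluster`, which in the expected output above matches the sum of the Pod CPU requests (1 + 4 = 5).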