
Container Service for Kubernetes:Install KubeRay in ACK

Last Updated: Dec 03, 2025

This topic describes how to install the KubeRay Operator in an ACK managed Pro cluster and how to enable Simple Log Service and Managed Service for Prometheus for KubeRay to improve log management, observability, and availability. After the operator is installed, you can create Kubernetes custom resources to manage Ray clusters and applications.

Prerequisites

Create an ACK managed Pro cluster that meets the following requirements. For more information, see Create an ACK managed cluster. To upgrade an existing cluster, see Manually upgrade a cluster.

  • Cluster version: v1.24 or later.

  • Instance Type: Requires at least one node with a minimum of 8 vCPUs and 32 GB of memory.

  • The recommended minimum specifications are for a test environment. For production environments, use specifications that match your actual workload. If you require GPU acceleration, configure GPU-accelerated nodes.

    For more information about supported ECS instance types, see Instance family.

  • You have kubectl installed on your local machine and are connected to your Kubernetes cluster. For more information, see Obtain the KubeConfig file of a cluster and connect to the cluster by using kubectl.
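
You can quickly check these prerequisites from your terminal. The following commands are a minimal sketch and assume that your kubeconfig already points to the target cluster:

    # Confirm connectivity and the cluster version (the VERSION column should show v1.24 or later).
    kubectl get nodes
    # Confirm that at least one node provides about 8 vCPUs and 32 GB of allocatable memory.
    kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory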

Install KubeRay

Log on to the ACK console. In the left-side navigation pane, click Clusters. Click the name of the cluster that you created. On the cluster details page, choose Operations > Add-ons > Manage Applications to install Kuberay-Operator.

Important

Kuberay-Operator is in invitational preview. To use Kuberay-Operator, submit a ticket.

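After the installation completes, you can verify from the command line that the operator components are in place. This is a sketch; the namespace and workload names depend on the add-on version that ACK installs:

    # Look for the KubeRay operator workload across all namespaces.
    kubectl get deployments --all-namespaces | grep -i kuberay
    # Confirm that the Ray custom resource definitions are registered.
    kubectl get crd | grep ray.io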

Enable log collection for KubeRay

  1. Go to the cluster details page and choose Operations > Log Center > Control Plane Component Logs > Enable Component Log Collection.

  2. Select kuberay-operator from the drop-down list.
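
As a quick sanity check, you can also inspect the operator logs directly with kubectl before or after they are shipped to Simple Log Service. The namespace and Deployment name below are assumptions; adjust them to match your installation:

    # Find the operator pod, then tail its logs. The namespace and Deployment name may differ in your cluster.
    kubectl get pods --all-namespaces | grep -i kuberay
    kubectl logs -n kube-system deployment/kuberay-operator --tail=50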

Enable log collection for Ray clusters

You can integrate Simple Log Service with a Ray cluster to persist logs.

  1. Run the following command to create a global AliyunLogConfig object. This configuration enables the Logtail component in the ACK cluster to collect the logs generated by the pods of Ray clusters and deliver them to a Simple Log Service project.


    cat <<EOF | kubectl apply -f -
    apiVersion: log.alibabacloud.com/v1alpha1
    kind: AliyunLogConfig
    metadata:
      name: rayclusters
      namespace: kube-system
    spec:
      # The name of the Logstore. If the specified Logstore does not exist, Simple Log Service automatically creates one. 
      logstore: rayclusters
      # Configure Logtail. 
      logtailConfig:
        # The type of data source. If you want to collect text logs, you must set the value to file. 
        inputType: file
        # The name of the Logtail configuration. The name must be the same as the resource name that is specified in metadata.name. 
        configName: rayclusters
        inputDetail:
          # Configure Logtail to collect text logs in simple mode. 
          logType: common_reg_log
          # The path of the log file. 
          logPath: /tmp/ray/session_*-*-*_*/logs
          # The name of the log file. You can use wildcard characters such as asterisks (*) and question marks (?) when you specify the log file name. Example: log_*.log. 
          filePattern: "*.*"
          # If you want to collect container text logs, you must set dockerFile to true. 
          dockerFile: true
          # The filter conditions of containers. 
          advanced:
            k8s:
              IncludeK8sLabel:
                ray.io/is-ray-node: "yes"
              ExternalK8sLabelTag:
                ray.io/cluster: "_raycluster_name_"
                ray.io/node-type: "_node_type_"
    EOF

    Key parameters:

    • logPath: Collects all logs in the /tmp/ray/session_*-*-*_*/logs directory of the pods. You can specify a custom path.

    • advanced.k8s.ExternalK8sLabelTag: Adds tags to the collected logs to facilitate retrieval. By default, the _raycluster_name_ and _node_type_ tags are added.

    For more information about the AliyunLogConfig parameters, see Use CRDs to collect container logs in DaemonSet mode. Simple Log Service is a paid service. For more information, see Billing overview.
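
    After you apply the configuration, you can check that it was accepted. This is a minimal sketch; the exact status fields reported by the Logtail component may vary by version:

    # Inspect the AliyunLogConfig object and its status.
    kubectl get aliyunlogconfigs rayclusters -n kube-system -o yaml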

  2. View logs collected from Ray clusters.

    Log on to the ACK console. In the left-side navigation pane, click Clusters. Click the name of the cluster that you want to manage. On the cluster details page, choose Cluster Information > Basic Information > Cluster Resources. Then, click the hyperlink to the right of Log Service Project to go to the details page of the Simple Log Service project.

  3. Select the Logstore that corresponds to rayclusters and view the log content.

    You can view the logs of different Ray clusters based on tags, such as _raycluster_name_.


Enable monitoring for Ray clusters

You can enable Managed Service for Prometheus for a Ray cluster. For more information about Managed Service for Prometheus, see Connect to and configure Managed Service for Prometheus. For more information about the billing of Managed Service for Prometheus, see Managed Service for Prometheus instance billing.

Create a PodMonitor and ServiceMonitor to collect metrics from the Ray cluster.

  1. Run the following command to create a PodMonitor:

    cat <<'EOF' | kubectl apply -f -
    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      annotations:
        arms.prometheus.io/discovery: 'true'
        arms.prometheus.io/resource: arms
      name: ray-workers-monitor
      namespace: arms-prom
      labels:
        # `release: $HELM_RELEASE`: Prometheus can only detect PodMonitor with this label.
        release: prometheus
        #ray.io/cluster: raycluster-kuberay # $RAY_CLUSTER_NAME: "kubectl get rayclusters.ray.io"
    spec:
      namespaceSelector:
        any: true
      jobLabel: ray-workers
      # Only select Kubernetes Pods with "matchLabels".
      selector:
        matchLabels:
          ray.io/node-type: worker
      # A list of endpoints allowed as part of this PodMonitor.
      podMetricsEndpoints:
      - port: metrics
        relabelings:
        - action: replace
          regex: (.+)
          replacement: $1
          separator: ;
          sourceLabels:
            - __meta_kubernetes_pod_label_ray_io_cluster
          targetLabel: ray_io_cluster
    EOF
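
    The PodMonitor selects pods by the ray.io/node-type: worker label that KubeRay adds to Ray worker pods. If a Ray cluster is already running, you can confirm that the label is present before relying on the selector; this assumes at least one RayCluster exists:

    # List Ray worker pods and their labels across all namespaces.
    kubectl get pods --all-namespaces -l ray.io/node-type=worker --show-labels
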
  2. Run the following command to create a ServiceMonitor:

    cat <<'EOF' | kubectl apply -f -
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      annotations:
        arms.prometheus.io/discovery: 'true'
        arms.prometheus.io/resource: arms
      name: ray-head-monitor
      namespace: arms-prom
      labels:
        # `release: $HELM_RELEASE`: Prometheus can only detect ServiceMonitor with this label.
        release: prometheus
    spec:
      namespaceSelector:
        any: true
      jobLabel: ray-head
      # Only select Kubernetes Services with "matchLabels".
      selector:
        matchLabels:
          ray.io/node-type: head
      # A list of endpoints allowed as part of this ServiceMonitor.
      endpoints:
        - port: metrics
          path: /metrics
      targetLabels:
      - ray.io/cluster
    EOF
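
    Optionally, verify that both monitor objects were created and that the Ray head Service carries the label that the ServiceMonitor selects. This sketch assumes that the PodMonitor and ServiceMonitor custom resource definitions are already registered in the cluster by the Prometheus agent:

    # Confirm that the monitor objects exist in the arms-prom namespace.
    kubectl get podmonitors,servicemonitors -n arms-prom
    # Confirm that the Ray head Service carries the expected label.
    kubectl get svc --all-namespaces -l ray.io/node-type=head
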
  3. Log on to the Application Real-Time Monitoring Service (ARMS) console and view resource integration information.

    1. Log on to the ARMS console. In the left-side navigation pane, click Integration Center. Use the search bar to find Ray and select it from the results. In the Ray panel, select the cluster that you created and click OK.


    2. After the ACK cluster is integrated with Managed Service for Prometheus, click Integration Management in the left-side navigation pane and then click the name of the target environment. On the Component Management tab, click Dashboards in the Addon Type section, and then click Ray Cluster.

    3. Specify Namespace, RayClusterName, and SessionName to filter the monitoring data of the tasks running in the Ray clusters.