Install the KubeRay Operator on an ACK Managed Cluster Pro and connect it to Simple Log Service and Managed Service for Prometheus. This gives you centralized log storage and metrics dashboards for Ray clusters running on your cluster.
Prerequisites
Before you begin, ensure that you have:
An ACK Managed Cluster Pro running Kubernetes v1.24 or later, with at least one node that has 8 vCPUs and 32 GB of memory. For production workloads, size nodes to match your actual workload. For GPU acceleration, configure GPU-accelerated nodes. See Instance family for supported ECS instance types.
kubectl installed locally and connected to the cluster. See Obtain the KubeConfig file of a cluster and connect to the cluster by using kubectl.
If you need to create or upgrade a cluster, see Create an ACK managed cluster or Manually upgrade a cluster.
Install KubeRay Operator
KubeRay Operator is in invitational preview. To get access, submit a ticket.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
Click the name of your cluster.
On the cluster details page, choose Operations > Add-ons > Manage Applications.
Find and install Kuberay-Operator.

Verify: After installation, run the following command to confirm the operator pod is running:
    kubectl get pods -A | grep kuberay
The operator pod should appear with a Running status. If the pod stays in Pending, run kubectl describe pod <pod-name> -n <namespace> to check for scheduling errors, such as insufficient node resources.
Enable log collection for KubeRay Operator
Collect control-plane logs from the KubeRay Operator itself.
On the cluster details page, choose Operations > Log Center > Control Plane Component Logs.
Click Enable Component Log Collection.
Select kuberay-operator from the drop-down list.
Enable log collection for Ray clusters
Collect logs from Ray cluster pods and forward them to Simple Log Service. The following steps configure Logtail (via the AliyunLogConfig custom resource) to watch the Ray log directory across all Ray pods and tag each log entry with the cluster name and node type.
Simple Log Service is a paid service. See Billing overview for pricing details.
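A manifest of this kind might look like the following sketch. It is not the original manifest from this page. The apiVersion, the object and Logstore name rayclusters (taken from the Logstore named later in this section), the logPath glob, the filePattern, and the ray.io/is-ray-node pod label filter are all assumptions based on the AliyunLogConfig CRD and the labels KubeRay normally sets on Ray pods, so check each one against your cluster before applying it.

```yaml
# Sketch only: every value marked "assumption" should be verified before use.
apiVersion: log.alibabacloud.com/v1alpha1   # assumption: CRD group/version
kind: AliyunLogConfig
metadata:
  name: rayclusters                         # assumption: matches the Logstore name
  namespace: kube-system
spec:
  # Logstore that receives the collected entries.
  logstore: rayclusters
  logtailConfig:
    # Collect text log files rather than stdout.
    inputType: file
    configName: rayclusters
    inputDetail:
      logType: common_reg_log
      # Assumption: glob that matches every Ray session directory under /tmp/ray/.
      logPath: /tmp/ray/session_*-*-*_*/logs
      # Assumption: collect every file in the log directory.
      filePattern: "*.*"
      # Read the files from inside containers.
      dockerFile: true
      advanced:
        k8s:
          # Assumption: KubeRay labels Ray pods with ray.io/is-ray-node=yes.
          IncludeK8sLabel:
            ray.io/is-ray-node: "yes"
          # Copy pod labels into log tags for filtering in Simple Log Service.
          ExternalK8sLabelTag:
            ray.io/cluster: "_raycluster_name_"
            ray.io/node-type: "_node_type_"
```

Saved to a file, a manifest like this would typically be applied with kubectl apply -f <file>.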
Apply a manifest that creates an AliyunLogConfig object in the kube-system namespace.
Key parameters:
logPath: The log directory on each Ray pod. Matches all session directories under /tmp/ray/. Specify a custom path if your Ray cluster uses a different log location.
advanced.k8s.ExternalK8sLabelTag: Adds the _raycluster_name_ and _node_type_ tags to each log entry for filtering in Simple Log Service.
For the full list of AliyunLogConfig parameters, see Use CRDs to collect container logs in DaemonSet mode.
View logs in the ACK console:
Log on to the ACK console. In the left-side navigation pane, click Clusters.
Click the name of your cluster.
Choose Cluster Information > Basic Information > Cluster Resources.
Click the link next to Log Service Project to open the Simple Log Service project.

Select the Logstore named rayclusters to view log entries. Filter by the _raycluster_name_ tag to isolate logs from a specific Ray cluster.
Enable monitoring for Ray clusters
Collect Prometheus metrics from Ray cluster pods using a PodMonitor for worker nodes and a ServiceMonitor for the head node. Worker nodes are monitored with a PodMonitor because they run as independent pods rather than as replicas managed by a ReplicaSet, so grouping them through a Kubernetes Service is not reliable. The head node exposes metrics through a stable Service endpoint, which makes a ServiceMonitor the appropriate choice.
Managed Service for Prometheus is a paid service. See Managed Service for Prometheus instance billing for pricing details. For setup instructions, see Connect to and configure Managed Service for Prometheus.
Create a PodMonitor to collect metrics from Ray worker pods:
    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      annotations:
        arms.prometheus.io/discovery: 'true'
        arms.prometheus.io/resource: arms
      name: ray-workers-monitor
      namespace: arms-prom
      labels:
        # `release: $HELM_RELEASE`: Prometheus can only detect PodMonitor with this label.
        release: prometheus
        #ray.io/cluster: raycluster-kuberay # $RAY_CLUSTER_NAME: "kubectl get rayclusters.ray.io"
    spec:
      namespaceSelector:
        any: true
      jobLabel: ray-workers
      # Only select Kubernetes Pods with "matchLabels".
      selector:
        matchLabels:
          ray.io/node-type: worker
      # A list of endpoints allowed as part of this PodMonitor.
      podMetricsEndpoints:
      - port: metrics
        relabelings:
        - action: replace
          regex: (.+)
          replacement: $1
          separator: ;
          sourceLabels:
          - __meta_kubernetes_pod_label_ray_io_cluster
          targetLabel: ray_io_cluster
Create a ServiceMonitor to collect metrics from the Ray head node:
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      annotations:
        arms.prometheus.io/discovery: 'true'
        arms.prometheus.io/resource: arms
      name: ray-head-monitor
      namespace: arms-prom
      labels:
        # `release: $HELM_RELEASE`: Prometheus can only detect ServiceMonitor with this label.
        release: prometheus
    spec:
      namespaceSelector:
        any: true
      jobLabel: ray-head
      # Only select Kubernetes Services with "matchLabels".
      selector:
        matchLabels:
          ray.io/node-type: head
      # A list of endpoints allowed as part of this ServiceMonitor.
      endpoints:
      - port: metrics
        path: /metrics
      targetLabels:
      - ray.io/cluster
Integrate with Application Real-Time Monitoring Service (ARMS) to view dashboards:
Log on to the ARMS console. In the left-side navigation pane, click Integration Center.
Search for Ray, then select it from the results.
In the Ray panel, select your cluster and click OK.

Click Integration Management in the left-side navigation pane, then click the target environment name.
On the Component Management tab, find Dashboards in the Addon Type section, then click Ray Cluster.

Set Namespace, RayClusterName, and SessionName to filter the monitoring data for a specific task.

Verify: After applying the PodMonitor and ServiceMonitor, confirm that Prometheus is scraping your targets. In the ARMS console, both ray-workers-monitor and ray-head-monitor should appear as active scrape targets in the integrated Prometheus instance. If a target does not appear, check that the release: prometheus label is present on the PodMonitor and ServiceMonitor, and that the namespace selector matches the namespace where your Ray cluster is running.
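If the worker monitor should scrape a single Ray cluster instead of every namespace, its selectors can be narrowed. The fragment below is a sketch of the spec portion of ray-workers-monitor only. The namespace ray-demo is hypothetical, and raycluster-kuberay is the example cluster name from the commented-out label in the PodMonitor manifest, so substitute the values from your own cluster.

```yaml
# Sketch only: a narrowed spec for the ray-workers-monitor PodMonitor.
spec:
  # Restrict discovery to one namespace instead of "any: true".
  namespaceSelector:
    matchNames:
    - ray-demo                            # hypothetical namespace of the Ray cluster
  jobLabel: ray-workers
  selector:
    matchLabels:
      ray.io/node-type: worker
      ray.io/cluster: raycluster-kuberay  # example name; list yours with "kubectl get rayclusters.ray.io"
  podMetricsEndpoints:
  - port: metrics
    # Keep the relabelings from the full manifest so ray_io_cluster is still set.
    relabelings:
    - action: replace
      sourceLabels:
      - __meta_kubernetes_pod_label_ray_io_cluster
      targetLabel: ray_io_cluster
```

A namespaceSelector that lists the wrong namespace is one way a target can silently fail to appear, so it is worth comparing matchNames against the namespace that actually holds the Ray pods.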
What's next
Deploy a RayCluster custom resource to start running distributed Ray workloads on your ACK cluster.
Use the _raycluster_name_ tag in Simple Log Service to correlate logs across multiple Ray clusters.
Set up alerting rules in ARMS based on the Ray cluster metrics collected by Managed Service for Prometheus.
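As a starting point for the first item, a minimal RayCluster resource might look like the sketch below. The apiVersion, Ray image tag, cluster name, group name, replica counts, and resource sizes are all assumptions chosen for illustration, not values from this guide. The container port named metrics is included because the PodMonitor and ServiceMonitor in this guide select endpoints by that port name; port 8080 is assumed to be the metrics export port.

```yaml
# Sketch only: a small RayCluster with one head pod and one worker pod.
apiVersion: ray.io/v1                  # assumption: older KubeRay releases use ray.io/v1alpha1
kind: RayCluster
metadata:
  name: raycluster-kuberay             # example name, reused from the PodMonitor comment
spec:
  rayVersion: "2.9.0"                  # assumption: should match the image tag below
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.9.0  # assumption: any Ray image available to the cluster
          ports:
          - containerPort: 6379
            name: gcs-server
          - containerPort: 8265
            name: dashboard
          - containerPort: 10001
            name: client
          - containerPort: 8080
            name: metrics              # port name that the monitors select
          resources:
            requests:
              cpu: "2"                 # placeholder sizing
              memory: 4Gi
            limits:
              cpu: "2"
              memory: 4Gi
  workerGroupSpecs:
  - groupName: workers                 # hypothetical group name
    replicas: 1
    minReplicas: 1
    maxReplicas: 3
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.9.0
          ports:
          - containerPort: 8080
            name: metrics
          resources:
            requests:
              cpu: "1"                 # placeholder sizing
              memory: 2Gi
            limits:
              cpu: "1"
              memory: 2Gi
```

Once the pods are running, they should carry the ray.io/cluster and ray.io/node-type labels that the log collection and monitoring configuration in this guide rely on.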