Container Service for Kubernetes: Install and use the HistoryServer component

Last Updated: Mar 26, 2026

The native Ray Dashboard is available only while a cluster is running. Once the cluster is terminated, all historical logs and monitoring data are gone. HistoryServer solves this by collecting node logs in real time during cluster operation and storing them persistently in Object Storage Service (OSS), so you can query historical records even after the cluster is recycled.

HistoryServer requires six configuration steps. Complete them in order, as each step depends on the previous one.

Step                             What you do
1. Enable RRSA                   Enable the RAM Roles for Service Accounts (RRSA) OpenID Connect (OIDC) feature on your cluster
2. Create an RRSA role           Create a RAM role with ARMS read and OSS full-access permissions
3. Create an OAuth application   Register an OAuth app and create a Kubernetes Secret with its credentials
4. Configure KubeRay             Install KubeRay and set the HistoryServer parameters in the Operator
5. Create a RayCluster           Submit a RayCluster with the ray.alibabacloud.com/enable-historyserver: "true" annotation
6. Access HistoryServer          Connect via port-forward or configure public internet access

Prerequisites

Before you begin, ensure that you have:

  • KubeRay version later than 1.2.1.5 installed. For more information, see Install KubeRay in ACK.

  • (If you use a custom postStart hook) When HistoryServer is enabled, it overwrites the postStart hook of pods created by RayCluster. To keep your hook behavior, append the following script to it. The script writes the Ray node ID to /tmp/ray/init.log for the HistoryServer Collector sidecar to read.

    GetNodeId(){
      while true;
      do
        nodeid=$(ps -ef | grep raylet | grep node_id | grep -v grep | grep -oP '(?<=--node_id=)[^ ]*' | tr -d '\n')
        if [ -n "$nodeid" ]; then
          echo "$(date) raylet started: \"$(ps -ef | grep raylet | grep node_id | grep -v grep | grep -oP '(?<=--node_id=)[^ ]*')\" => ${nodeid}" >> /tmp/ray/init.log
          echo $nodeid > /tmp/ray/alibabacloud_raylet_node_id
          break
        else
          echo "$(date) raylet not start" >> /tmp/ray/init.log
          sleep 1
        fi
      done
    }
    GetNodeId
  • (If you use a custom ServiceAccount) When HistoryServer is enabled, it replaces the ServiceAccount of pods created by RayCluster. The new ServiceAccount follows the naming convention ServiceAccountPrefix-RayClusterName. Make sure your custom ServiceAccount configuration matches this naming rule.
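
As a rough illustration of the ServiceAccountPrefix-RayClusterName convention, the following shell sketch composes a ServiceAccount name from a prefix and a RayCluster name. The helper name sa_name and the sample values (ray-historyserver, my-raycluster) are placeholders chosen for this example, not names defined by HistoryServer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join a ServiceAccountPrefix and a RayCluster name
# following the ServiceAccountPrefix-RayClusterName convention.
sa_name() {
  local prefix="$1" cluster="$2"
  printf '%s-%s\n' "$prefix" "$cluster"
}

# Sample values (assumptions for illustration only).
sa_name "ray-historyserver" "my-raycluster"
```

Comparing the printed name with the ServiceAccount that your workloads reference is a quick way to spot a mismatch before you create the cluster.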

Step 1: Enable RRSA

  1. Log in to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the name of the cluster you want to manage. In the left-side pane, click Cluster Information.

  3. On the cluster details page, go to Basic Information > Security and Auditing, and click Enable next to RRSA OIDC. For more information, see Enable during cluster creation.

Step 2: Create an RRSA role

Create the role

  1. Log in to the RAM console as a RAM administrator. In the left-side navigation pane, choose Identities > Roles.

  2. On the Roles page, click Create Role and select Identity Provider as the trusted principal type.

  3. Add a principal: select the cluster for which you enabled RRSA OIDC.

  4. Add a condition to associate a specific ServiceAccount with this role. By default, add the following condition:

    Field      Value
    Key        oidc:sub
    Operator   StringLike
    Value      system:serviceaccount:*:ray-historyserver*

    The asterisk (*) is a wildcard. ray-historyserver must match the ServiceAccountPrefix you specify when installing HistoryServer.

    If you use a custom ServiceAccount, click Add statement to add two principals to the same RRSA role, one for each of the following conditions. The two service accounts are system:serviceaccount:kuberay:ray-historyserver and system:serviceaccount:*:rhs*, where rhs is a customizable part.

    Principal 1 (Identity Provider)

    Field      Value
    Key        oidc:sub
    Operator   StringEquals
    Value      system:serviceaccount:kuberay:ray-historyserver

    Principal 2 (Identity Provider)

    Field      Value
    Key        oidc:sub
    Operator   StringLike
    Value      system:serviceaccount:*:rhs*
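
To get a feel for which ServiceAccounts a StringLike condition on oidc:sub covers, the sketch below approximates the wildcard match with a shell glob. It is only an approximation for the * wildcard used in this guide; RAM evaluates the real condition. The function name sub_matches and the sample namespaces and cluster names are assumptions.

```shell
#!/usr/bin/env bash
# Approximate a RAM StringLike check on oidc:sub with shell glob matching.
# Returns 0 when the subject matches the pattern, 1 otherwise.
sub_matches() {
  local subject="$1" pattern="$2"
  # $pattern is left unquoted on purpose so that * acts as a wildcard.
  case "$subject" in
    $pattern) return 0 ;;
    *)        return 1 ;;
  esac
}

pattern='system:serviceaccount:*:ray-historyserver*'

# Sample subjects (hypothetical namespaces and names).
for sub in \
  'system:serviceaccount:kuberay:ray-historyserver' \
  'system:serviceaccount:default:ray-historyserver-my-raycluster' \
  'system:serviceaccount:default:other-sa'
do
  if sub_matches "$sub" "$pattern"; then
    echo "match:    $sub"
  else
    echo "no match: $sub"
  fi
done
```

Under these assumptions, a ServiceAccount whose name does not start with the configured prefix would not be able to assume the role.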

Add permissions

Add the following permissions to the role:

  1. Add AliyunARMSReadOnlyAccess for read-only access to Application Real-Time Monitoring Service (ARMS).

  2. Add AliyunOSSFullAccess for managing OSS. The steps are the same as above.

    Important

    This guide grants the role full OSS permissions. In production, use precise authorization to limit the permission scope.
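
As one possible starting point for narrower authorization, the sketch below writes a custom RAM policy that is limited to a single bucket and directory. The bucket name, the directory, and especially the list of actions are assumptions made for illustration; this guide does not state which OSS actions HistoryServer requires, so verify them before replacing AliyunOSSFullAccess.

```shell
#!/usr/bin/env bash
# Sketch of a narrower RAM policy for a single bucket prefix.
# ASSUMPTIONS: the bucket name, the directory, and the action list are
# placeholders; confirm the actions HistoryServer needs before using this.
BUCKET="my-historyserver-bucket"   # hypothetical bucket name
ROOT_DIR="historyserver"           # hypothetical OSSHistoryServerRootDir value

cat > oss-historyserver-policy.json <<EOF
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["oss:ListObjects"],
      "Resource": ["acs:oss:*:*:${BUCKET}"]
    },
    {
      "Effect": "Allow",
      "Action": ["oss:GetObject", "oss:PutObject"],
      "Resource": ["acs:oss:*:*:${BUCKET}/${ROOT_DIR}/*"]
    }
  ]
}
EOF

cat oss-historyserver-policy.json
```

The generated file can be pasted into the RAM console as a custom policy and attached to the RRSA role in place of the full-access policy.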

Step 3: Create an OAuth application

Create and configure the OAuth application

Important

To connect to HistoryServer over the internet, see Configure internet access.

  1. Create an OAuth enterprise application. Set the callback address to http://localhost:8080/auth/callback, where:

    • localhost:8080 is the HistoryServer domain, which corresponds to CallbackServiceName in the KubeRay configuration.

    • /auth/callback is a fixed path suffix.

  2. Add the following OAuth scopes:

    • aliuid — retrieves the Alibaba Cloud UID (RAM user or Alibaba Cloud account ID).

    • profile: retrieves the username. For an Alibaba Cloud account, this is the login name. For a RAM user, this is the user principal name and display name.

  3. Create and save the OAuth application Secret.

    Important

    Record the Application ID and AppSecretValue. You need them in the next step.

Create a Kubernetes Secret

Connect to your ACK cluster (see Connect to a cluster) and run the following commands:

kubectl create ns kuberay
kubectl create secret -n kuberay generic webapp-secret \
  --from-literal=webapp-id="yours-AppID" \
  --from-literal=webapp-secret="yours-AppSecretValue"

Replace the placeholders:

Placeholder            Description
yours-AppID            The OAuth application ID
yours-AppSecretValue   The AppSecretValue of the OAuth key

webapp-secret is the Secret name and is customizable.
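
The same Secret can also be expressed declaratively. The sketch below writes a manifest that should be equivalent to the command above, assuming you prefer to manage the Secret with kubectl apply. The file name webapp-secret.yaml is an arbitrary choice, and the placeholder values must still be replaced.

```shell
#!/usr/bin/env bash
# Write a declarative equivalent of the kubectl create secret command.
# stringData accepts plain-text values; Kubernetes stores them base64-encoded.
cat > webapp-secret.yaml <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: webapp-secret
  namespace: kuberay
type: Opaque
stringData:
  webapp-id: "yours-AppID"
  webapp-secret: "yours-AppSecretValue"
EOF

# Apply it once the placeholders are filled in (requires cluster access):
#   kubectl apply -f webapp-secret.yaml
```

Keep a file like this out of version control once it contains the real AppSecretValue.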

Step 4: Configure KubeRay parameters

  1. Install the KubeRay component. For more information, see Install KubeRay.

  2. Configure the following parameters in the KubeRay Operator:

    Parameter                 Description
    Enable HistoryServer      Select to enable HistoryServer.
    CallbackServiceName       The callback domain for HistoryServer OAuth authentication. Must match the domain in the OAuth application callback address. For example, if the callback address is http://xx.com/auth/callback, set this to xx.com.
    CloudRoleName             The name of the RRSA role associated with HistoryServer.
    OSSBucket                 The OSS bucket name used by HistoryServer.
    OSSEndPoint               The endpoint of the OSS bucket.
    OSSHistoryServerRootDir   The OSS directory where HistoryServer stores logs and metadata.
    OSSRegion                 The OSS region, such as cn-hangzhou or ap-southeast-1.
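
Because CallbackServiceName must match the domain in the OAuth callback address, it can help to derive one from the other rather than typing both by hand. The following sketch does that with shell parameter expansion; the function name callback_domain is hypothetical.

```shell
#!/usr/bin/env bash
# Derive the value expected for CallbackServiceName from a callback address
# by stripping the scheme and everything from the first "/" onward.
callback_domain() {
  local url="$1"
  url="${url#*://}"            # drop "http://" or "https://"
  printf '%s\n' "${url%%/*}"   # keep everything before the first "/"
}

callback_domain "http://xx.com/auth/callback"           # prints xx.com
callback_domain "http://localhost:8080/auth/callback"   # prints localhost:8080
```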

Step 5: Create a RayCluster

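Add the ray.alibabacloud.com/enable-historyserver: "true" annotation to your RayCluster manifest to enable HistoryServer. The following is a sample manifest; replace the image: xxxx placeholders with your Ray image. Because the manifest uses generateName, submit it with kubectl create -f rather than kubectl apply -f.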

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  annotations:
    ray.alibabacloud.com/enable-historyserver: "true"
  labels:
    ray.io/cluster: wukun
  generateName: wukun-ray240-
  namespace: default
spec:
  suspend: false
  autoscalerOptions:
    env: []
    envFrom: []
    idleTimeoutSeconds: 60
    imagePullPolicy: Always
    resources:
      limits:
        cpu: 200m
        memory: 200Mi
      requests:
        cpu: 200m
        memory: 200Mi
    securityContext: {}
    upscalingMode: Default
  enableInTreeAutoscaling: false
  headGroupSpec:
    rayStartParams:
      dashboard-host: 0.0.0.0
      num-cpus: "0"
    serviceType: ClusterIP
    template:
      metadata:
        labels:
          test: wukun
      spec:
        affinity: {}
        containers:
        - env: []
          image: xxxx
          imagePullPolicy: Always
          name: ray-head
          resources:
            limits:
              cpu: "5"
              memory: 10G
            requests:
              cpu: "1"
              memory: 1G
        tolerations:
        - key: ray
          operator: Equal
          value: cpu
  workerGroupSpecs:
  - groupName: cpu
    maxReplicas: 1000
    minReplicas: 0
    numOfHosts: 1
    rayStartParams: {}
    replicas: 2
    template:
      metadata:
        labels:
          test: wukun
      spec:
        imagePullSecrets: []
        containers:
        - env: []
          image: xxxx
          imagePullPolicy: Always
          name: ray-worker
          resources:
            limits:
              cpu: "1"
              memory: 1G
            requests:
              cpu: "1"
              memory: 1G
          volumeMounts: []
        tolerations:
        - key: ray
          operator: Equal
          value: cpu
        volumes: []
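
Before submitting a manifest, a quick pre-flight check can confirm that the annotation is present and set to "true". The sketch below is one way to do this with grep; the function name has_historyserver_annotation and the file name raycluster.yaml are assumptions.

```shell
#!/usr/bin/env bash
# Return 0 if the manifest carries the HistoryServer annotation set to "true".
has_historyserver_annotation() {
  grep -Eq '^[[:space:]]*ray\.alibabacloud\.com/enable-historyserver:[[:space:]]*"true"[[:space:]]*$' "$1"
}

# Usage (raycluster.yaml is an assumed file name):
#   has_historyserver_annotation raycluster.yaml && kubectl create -f raycluster.yaml
# kubectl create is used here because the sample manifest sets generateName,
# which kubectl apply does not support.
```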

Step 6: Access HistoryServer

Access via localhost

By default, use kubectl port-forward to access HistoryServer. Run the following command in a terminal window:

kubectl -n kuberay port-forward svc/ray-history-server --address 0.0.0.0 8080:80

After running this command, open http://localhost:8080 in your browser. Monitoring data is not yet available at this point. To view it, run the following additional port-forward command in a separate terminal window:

kubectl -n kuberay port-forward svc/ray-history-server --address 0.0.0.0 3000:3000
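
If you would rather keep both forwards in one terminal, a small wrapper along the following lines may help. This is a sketch, not part of the product: the function name forward_historyserver is hypothetical, and it omits --address 0.0.0.0 so that the ports are bound to the local machine only (add the flag back if other hosts need access).

```shell
#!/usr/bin/env bash
# Run both port-forwards together and stop them on Ctrl+C.
# The body runs in a subshell, so the trap does not leak into your shell.
forward_historyserver() (
  kubectl -n kuberay port-forward svc/ray-history-server 8080:80 &
  ui_pid=$!
  kubectl -n kuberay port-forward svc/ray-history-server 3000:3000 &
  metrics_pid=$!
  # Stop both background forwards when the wrapper is interrupted.
  trap "kill $ui_pid $metrics_pid 2>/dev/null" INT TERM
  wait "$ui_pid" "$metrics_pid"
)

# Usage (requires kubectl access to the cluster):
#   forward_historyserver
```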

Configure internet access

Important

This example is for demonstration purposes. Enable the Access Control feature in production environments to protect your application data.

Log in to the ACK console and navigate to the cluster details page. Configure an internet-facing service for HistoryServer, then set the OAuth application callback address to http://${externalIP}/auth/callback, where ${externalIP} is the external IP address of that service. Because CallbackServiceName must match the domain in the callback address, update it to the same value. For the OAuth application settings, see Step 3: Create an OAuth application.
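
The callback address follows a fixed pattern, so it can be generated from the external address. The sketch below assumes a sample address from the documentation-only range 203.0.113.0/24; the function name callback_url and the Service name in the commented command are placeholders, not values taken from this guide.

```shell
#!/usr/bin/env bash
# Build the OAuth callback address for a given external address.
callback_url() {
  printf 'http://%s/auth/callback\n' "$1"
}

# 203.0.113.10 is a documentation-only sample address (an assumption).
externalIP="203.0.113.10"
callback_url "$externalIP"

# With a LoadBalancer Service, the address could be read like this
# (the Service name is a placeholder):
#   externalIP=$(kubectl -n kuberay get svc <your-service> \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
```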