The native Ray Dashboard is available only while the cluster is running. After the cluster is terminated, you can no longer access its historical logs and monitoring data. This topic describes the RayCluster HistoryServer feature, which collects node logs in real time while the cluster runs and persists them to Object Storage Service (OSS), so that you can query historical records even after the cluster is released.
Prerequisites
After HistoryServer is enabled, the PostStartHook of the pods created by the RayCluster is overwritten. If you want to use a custom PostStartHook, add the following script to it. The script waits for the raylet process to start, writes the Ray node ID to the /tmp/ray/alibabacloud_raylet_node_id file, and logs its progress to the /tmp/ray/init.log file, so that the HistoryServer Collector sidecar can read and use it.
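GetNodeId(){
  # Poll until the raylet process is running, then extract the value of its --node_id argument.
  while true;
  do
    nodeid=$(ps -ef | grep raylet | grep node_id | grep -v grep | grep -oP '(?<=--node_id=)[^ ]*' | tr -d '\n')
    if [ -n "$nodeid" ]; then
      # Log the detected node ID and record it in a dedicated file.
      echo "$(date) raylet started: \"$(ps -ef | grep raylet | grep node_id | grep -v grep | grep -oP '(?<=--node_id=)[^ ]*')\" => ${nodeid}" >> /tmp/ray/init.log
      echo "$nodeid" > /tmp/ray/alibabacloud_raylet_node_id
      break
    else
      # raylet is not up yet: append a message to the log (the redirect must be outside the quotes) and retry after 1 second.
      echo "$(date) raylet not started" >> /tmp/ray/init.log
      sleep 1
    fi
  done
}
GetNodeId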
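To see what the extraction step in the script above returns, you can run an equivalent filter against a sample raylet command line. The sketch below wraps it in a hypothetical helper, extract_node_id, and uses POSIX sed instead of grep -P, which can help if a container image ships a grep without PCRE support (for example, BusyBox). The sample command line is made up for illustration and is not real Ray output.

```shell
#!/bin/sh
# Hypothetical helper: read process listings on stdin and print the value of
# the first --node_id=<value> argument found. Similar to
#   grep -oP '(?<=--node_id=)[^ ]*'
# but relies only on POSIX sed.
extract_node_id() {
  sed -n 's/.*--node_id=\([^ ]*\).*/\1/p' | head -n 1
}

# Illustrative, made-up raylet command line in ps -ef style.
sample='ray 42 1 0 10:00 ? 00:00:01 /usr/bin/raylet --raylet_socket_name=/tmp/ray/raylet --node_id=abc123def --metrics_export_port=8080'
printf '%s\n' "$sample" | extract_node_id
```

An empty result means that no line contained --node_id=, which corresponds to the "raylet not started" branch of the script.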
After HistoryServer is enabled, the ServiceAccount of the pods created by the RayCluster is replaced. The new ServiceAccount is named ServiceAccountPrefix-RayClusterName. If you want to use a custom ServiceAccount, make sure that it is consistent with this naming rule.
The installed KubeRay version must be later than 1.2.1.5. For more information, see Install KubeRay in ACK.
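The ServiceAccount naming rule above, and the OIDC subject that the RRSA role conditions are later matched against, can be sketched with two hypothetical helpers. The subject format system:serviceaccount:<namespace>:<serviceaccount> is the standard Kubernetes ServiceAccount token subject; the prefix, cluster name, and namespace in the example are illustrative values only.

```shell
#!/bin/sh
# Hypothetical helper for the naming rule:
# ServiceAccount name = <ServiceAccountPrefix>-<RayClusterName>
sa_name() {
  # $1: ServiceAccountPrefix, $2: RayCluster name
  printf '%s-%s\n' "$1" "$2"
}

# Kubernetes ServiceAccount tokens carry a subject of the form
# system:serviceaccount:<namespace>:<serviceaccount>; this is the value
# that the oidc:sub condition of the RRSA role is evaluated against.
oidc_subject() {
  # $1: namespace, $2: ServiceAccount name
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Example: a RayCluster named "demo" in the "default" namespace.
sa=$(sa_name ray-historyserver demo)
oidc_subject default "$sa"
```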
1. Enable the RRSA feature for the cluster
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, click Cluster Information.
Enable the RAM Roles for Service Accounts (RRSA) OpenID Connect (OIDC) feature for the cluster. On the cluster page, choose Cluster Information > Basic Information > Security and Auditing, and click Enable next to the RRSA OIDC parameter. For more information about the operation, see Enable during cluster creation.

2. Create an RRSA role
Create an RRSA role.
Log on to the RAM console as a RAM administrator. In the left-side navigation pane, go to the Roles page. Click Create Role and select Identity Provider as the trusted principal type.
Add a principal.
Identity Provider Type: Select the cluster for which RRSA OIDC is enabled.

Add conditions.
The condition associates the RRSA role with specific ServiceAccounts in the cluster.

Set the condition as follows:
Key: oidc:sub
Operator: StringLike
Value: system:serviceaccount:*:ray-historyserver*
Note: The asterisk (*) is a wildcard character. ray-historyserver is a customizable part and must be the same as the ServiceAccountPrefix that you specify when you install HistoryServer.
Note: If you use a custom ServiceAccount, click Add statement to add two principals to the same RRSA role, and add the following condition for each principal:
Principal 1: Identity Provider
Key: oidc:sub
Operator: StringEquals
Value: system:serviceaccount:kuberay:ray-historyserver
Principal 2: Identity Provider
Key: oidc:sub
Operator: StringLike
Value: system:serviceaccount:*:rhs*
The two ServiceAccounts are system:serviceaccount:kuberay:ray-historyserver and system:serviceaccount:*:rhs*, where rhs is a customizable part.
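Before you save the role, you may want to check locally whether a given ServiceAccount subject satisfies the conditions above. The sketch below is a rough approximation, not RAM's evaluator: it assumes that StringEquals behaves as an exact comparison and that StringLike behaves like a shell glob in which * matches any sequence of characters. The helper name matches_condition is hypothetical.

```shell
#!/bin/sh
# Rough local approximation of the RAM condition operators used above:
#   StringEquals -> exact string comparison
#   StringLike   -> glob match in which '*' matches any sequence of characters
# RAM performs the authoritative check; this only mimics the wildcard behavior.
matches_condition() {
  # $1: operator (StringEquals | StringLike), $2: condition value, $3: oidc:sub value
  case "$1" in
    StringEquals) [ "$3" = "$2" ] ;;
    StringLike)
      # $2 is deliberately unquoted in the pattern so that '*' acts as a wildcard.
      case "$3" in
        $2) return 0 ;;
        *) return 1 ;;
      esac
      ;;
    *) echo "unsupported operator: $1" >&2; return 2 ;;
  esac
}

sub='system:serviceaccount:default:ray-historyserver-demo'
if matches_condition StringLike 'system:serviceaccount:*:ray-historyserver*' "$sub"; then
  echo "match"
else
  echo "no match"
fi
```

For the custom ServiceAccount setup, a subject is accepted if it satisfies either principal's condition, so you would call the helper once per principal.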
Add permissions to the RRSA role.
Add the AliyunARMSReadOnlyAccess permission to the role to grant read-only access to Application Real-Time Monitoring Service (ARMS).
Add the AliyunOSSFullAccess permission to the role to manage OSS. The steps are the same as those for the previous permission.
Important: This topic grants the role full OSS permissions. In actual scenarios, we recommend that you use precise authorization to limit the scope of permissions.
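As one illustration of the precise authorization recommended above, a custom RAM policy can restrict OSS access to a single bucket instead of granting AliyunOSSFullAccess. The action list in this sketch is an assumption: this topic does not state which OSS operations HistoryServer needs, so verify and adjust it before use. The helper name and bucket name are hypothetical.

```shell
#!/bin/sh
# Print a RAM policy document that limits OSS access to one bucket.
# ASSUMPTION: the actions below (list, read, write, delete objects) are a guess
# at what log collection and viewing require; they are not taken from the
# HistoryServer documentation.
oss_bucket_policy() {
  # $1: bucket name
  cat <<EOF
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "oss:ListObjects",
        "oss:GetObject",
        "oss:PutObject",
        "oss:DeleteObject"
      ],
      "Resource": [
        "acs:oss:*:*:$1",
        "acs:oss:*:*:$1/*"
      ]
    }
  ]
}
EOF
}

# Example with a hypothetical bucket name.
oss_bucket_policy my-historyserver-bucket
```

The first Resource entry covers bucket-level operations such as listing objects, and the second covers the objects inside the bucket. You could create the resulting policy in the RAM console and attach it to the RRSA role in place of AliyunOSSFullAccess.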
3. Create an OAuth application
Create and configure an OAuth enterprise application.
Important: For more information about how to connect to HistoryServer over the Internet, see Configure Internet access.
Note: The callback address is http://localhost:8080/auth/callback. localhost:8080 is the domain name of HistoryServer. It corresponds to CallbackServiceName in subsequent operations and must be specified when you install HistoryServer. /auth/callback is a fixed path suffix.
Configure the OAuth application.
Add the following OAuth scopes:
aliuid: Obtains the Alibaba Cloud UID (the ID of the RAM user or Alibaba Cloud account).
profile: Obtains profile information such as the username. If you log on with an Alibaba Cloud account, the logon name is obtained. If you log on as a RAM user, the user principal name and display name are obtained.
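The relationship between the callback address and CallbackServiceName described in the note above (callback address = http://<CallbackServiceName>/auth/callback) can be sketched with two hypothetical helpers, which may help avoid mismatches between the OAuth application and the KubeRay parameters.

```shell
#!/bin/sh
# Hypothetical helpers for: callback address = http://<CallbackServiceName>/auth/callback
callback_url() {
  # $1: CallbackServiceName, for example localhost:8080
  printf 'http://%s/auth/callback\n' "$1"
}

callback_service_name() {
  # $1: callback address. Strip the scheme, then the fixed /auth/callback suffix.
  host="${1#http://}"
  host="${host#https://}"
  case "$host" in
    */auth/callback) printf '%s\n' "${host%/auth/callback}" ;;
    *) echo "callback address must end with /auth/callback: $1" >&2; return 1 ;;
  esac
}

callback_url localhost:8080
callback_service_name http://localhost:8080/auth/callback
```

A callback address that does not end with /auth/callback is rejected, because that suffix is fixed.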

Create an application secret for the OAuth application and save it.
Important: Record the application ID and the AppSecretValue. You need them to create a Secret in the kuberay namespace in subsequent operations.
Connect to the cluster and create a Secret in the kuberay namespace.
For more information about how to connect to a Container Service for Kubernetes (ACK) cluster, see Connect to a cluster.
kubectl create ns kuberay
kubectl create secret -n kuberay generic webapp-secret --from-literal=webapp-id="yours-AppID" --from-literal=webapp-secret="yours-AppSecretValue"
The parameters are described as follows:
webapp-secret (Secret name): The name of the Secret to create. You can customize it.
webapp-id: The ID of the OAuth application.
webapp-secret (key): The AppSecretValue of the OAuth application secret.
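If you prefer to apply the Secret declaratively, the sketch below prints a manifest that has the same name, namespace, and keys as the kubectl create secret command above. Values under data in a Kubernetes Secret must be base64-encoded; printf is used so that no trailing newline is encoded. The helper name is hypothetical.

```shell
#!/bin/sh
# Print a Secret manifest matching the kubectl create secret command above.
# Values under "data" must be base64-encoded; tr removes any line wrapping
# that base64 may add for long values.
webapp_secret_manifest() {
  # $1: application ID, $2: AppSecretValue
  id_b64=$(printf '%s' "$1" | base64 | tr -d '\n')
  secret_b64=$(printf '%s' "$2" | base64 | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: webapp-secret
  namespace: kuberay
type: Opaque
data:
  webapp-id: ${id_b64}
  webapp-secret: ${secret_b64}
EOF
}

webapp_secret_manifest yours-AppID yours-AppSecretValue
```

For example, webapp_secret_manifest "$APP_ID" "$APP_SECRET" | kubectl apply -f - would create or update the Secret, where APP_ID and APP_SECRET are shell variables that you set yourself.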
4. Configure KubeRay parameters
Install the KubeRay component.
For more information, see Install KubeRay.
Configure the following KubeRay-Operator parameters:
Enable HistoryServer: Select this option to enable HistoryServer.
CallbackServiceName: The callback domain name for HistoryServer OAuth authentication. It must be the same as the domain name in the callback address of the OAuth application. For example, if the callback address is http://xx.com/auth/callback, set this parameter to xx.com.
CloudRoleName: The name of the RRSA role associated with HistoryServer.
OSSBucket: The name of the OSS bucket used by HistoryServer.
OSSEndPoint: The endpoint of the OSS bucket used by HistoryServer.
OSSHistoryServerRootDir: The directory where HistoryServer stores logs and metadata.
OSSRegion: The OSS region used by HistoryServer, such as cn-hangzhou or ap-southeast-1.
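OSSEndPoint and OSSRegion should refer to the same region. The sketch below derives an endpoint from a region ID by using the standard OSS endpoint formats, oss-<region>.aliyuncs.com (public) and oss-<region>-internal.aliyuncs.com (internal, reachable from VPCs in the same region), and checks that a given endpoint matches a region. The helper names are hypothetical. This topic does not specify whether HistoryServer expects a scheme prefix or a particular endpoint type, so treat that as an assumption to verify.

```shell
#!/bin/sh
# Derive an OSS endpoint from a region ID.
#   public:   oss-<region>.aliyuncs.com
#   internal: oss-<region>-internal.aliyuncs.com
# ASSUMPTION: HistoryServer accepts a bare host name without a scheme.
oss_endpoint() {
  # $1: region ID such as cn-hangzhou; $2: "public" (default) or "internal"
  region="$1"
  kind="${2:-public}"
  case "$kind" in
    public) printf 'oss-%s.aliyuncs.com\n' "$region" ;;
    internal) printf 'oss-%s-internal.aliyuncs.com\n' "$region" ;;
    *) echo "unknown endpoint type: $kind" >&2; return 1 ;;
  esac
}

# Simple string check that an endpoint belongs to the given region.
endpoint_matches_region() {
  # $1: endpoint, $2: region ID
  case "$1" in
    "oss-$2.aliyuncs.com"|"oss-$2-internal.aliyuncs.com") return 0 ;;
    *) return 1 ;;
  esac
}

oss_endpoint cn-hangzhou
oss_endpoint ap-southeast-1 internal
```

The internal endpoint avoids public network traffic but only works when the ACK cluster and the bucket are in the same region; the public endpoint is the more general default.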
5. Create a RayCluster
To enable the HistoryServer feature for a RayCluster, add the ray.alibabacloud.com/enable-historyserver: "true" annotation to the RayCluster metadata when you submit it. The following is a YAML configuration example.
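A minimal sketch of such a manifest is shown below, generated by a hypothetical shell helper so that it can be piped to kubectl apply -f -. The ray.io/v1 API version, the Ray image tag, the group names, and the resource sizes are illustrative assumptions; adapt them to your KubeRay and Ray versions. Kubernetes annotation values must be strings, which is why "true" is quoted.

```shell
#!/bin/sh
# Print a minimal RayCluster manifest with the HistoryServer annotation.
# ASSUMPTIONS: API version, image tag, and resource sizes are illustrative only.
raycluster_manifest() {
  # $1: RayCluster name
  cat <<EOF
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: $1
  annotations:
    ray.alibabacloud.com/enable-historyserver: "true"
spec:
  rayVersion: "2.9.0"
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
            resources:
              requests:
                cpu: "1"
                memory: 2Gi
              limits:
                cpu: "1"
                memory: 2Gi
  workerGroupSpecs:
    - groupName: workers
      replicas: 1
      minReplicas: 1
      maxReplicas: 1
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
              resources:
                requests:
                  cpu: "1"
                  memory: 2Gi
                limits:
                  cpu: "1"
                  memory: 2Gi
EOF
}

raycluster_manifest raycluster-demo
```

For example, raycluster_manifest raycluster-demo | kubectl apply -f - submits the cluster. With HistoryServer enabled, its pods use a ServiceAccount named according to the ServiceAccountPrefix-RayClusterName rule.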
6. Connect to HistoryServer
Connect to HistoryServer by using localhost
By default, HistoryServer is accessed through port forwarding. Open a terminal window and run the following command:
kubectl -n kuberay port-forward svc/ray-history-server --address 0.0.0.0 8080:80
You can then access HistoryServer by visiting localhost:8080 in your browser. Monitoring data is not yet visible at this point. To view monitoring data, run an additional port-forward command in another terminal window:
kubectl -n kuberay port-forward svc/ray-history-server --address 0.0.0.0 3000:3000
Configure Internet access
This example is for demonstration purposes. For the security of your application data, we recommend that you also enable the Access Control feature in a production environment.
Log on to the ACK console. In the left-side navigation pane, click Clusters. Click the name of the cluster that you want to manage to go to the cluster details page. Configure an Internet-facing Service for HistoryServer by following the numbered steps in the following figure. Then set the callback address of the OAuth application to this Service in the http://${externalIP}/auth/callback format, and make sure that CallbackServiceName is set to the same ${externalIP}. For more information about the OAuth application settings, see 3. Create an OAuth application.
