Platform for AI: View EAS events in CloudMonitor

Last Updated: Jun 27, 2025

Elastic Algorithm Service (EAS) defines two types of events in CloudMonitor: Service events and ServiceInstance events. The EAS event controller pushes both types of events to CloudMonitor in real time. In the CloudMonitor console or by calling API operations, you can view events, perform O&M operations, audit events, and configure alert rules for events.

View EAS events

Use the console

Perform the following steps to view EAS events in the CloudMonitor console.

  1. Log on to the CloudMonitor console.

  2. In the left-side navigation pane, choose Event Center > System Event.

  3. On the Event Monitoring tab, select PAI from the product drop-down list, and click Search.

  4. Click Details in the Actions column of the target event to view the event details.

    Parameters

    • Product: The service code. For example, the code of Platform for AI (PAI) is learn.

    • Name: The event name. For valid values, see the Name column in Appendix: Supported EAS events.

    • Level: The event level. Valid values: INFO, WARN, and CRITICAL.

    • Status: The event status. For valid values, see the Status column in Appendix: Supported EAS events.

    • RegionId: The ID of the region in which the service resides. For example, the ID of the China (Shanghai) region is cn-shanghai.

    • ResourceId: The resource ID. For more information, see Policy description.

    • InstanceName: The service name or service instance name.

    • Time: The time at which the event occurred. The value is a UNIX timestamp in milliseconds.

    • GroupId: The CloudMonitor application group to which the EAS service belongs. By default, this parameter is empty.

    • Content: The core content of the event. Its fields are listed below.

    Fields of the Content parameter

    • serviceName: The name of the service to which the instance belongs.

    • serviceId: The ID of the service to which the instance belongs.

    • serviceGroup: The service group to which the instance belongs.

    • resourceType: The type of the resource group to which the instance belongs. Valid values: PublicResource (public resource group) and DedicatedResource (dedicated resource group).

    • instanceType: The instance type.

    • cpu: The number of CPUs used by the instance.

    • memory: The amount of memory used by the instance. Unit: MB.

    • gpu: The number of GPUs used by the instance.

    • gpuMemory: The amount of GPU memory used by the instance. Unit: GB.

    • nvidiaName: The name of the GPU used by the instance.

    • role: The service role of the instance. Valid values: Queue (queue service), DataLoader (offline service), and Standard (standard service).

    • isBurst: Indicates whether auto scaling is enabled for the resource group of the instance. Valid values: true and false.

    • isSpot: Indicates whether the instance is a preemptible instance. Valid values: true and false.

    • callerUid: The UID of the Alibaba Cloud account that is used to deploy the EAS service.

    • timestamp: The time at which the container was last started.

    • restartCount: The number of times the instance has been restarted.

    • exitCode: The exit status code of the instance. By default, this parameter is empty.

    • status: The status of the instance. For valid values, see the Status column in Appendix: Supported EAS events.

    • reason: The reason why the event occurred.

    • message: The event message.

Use the API

Call the DescribeSystemEventAttribute operation to query EAS events.

Create and enable alert rules

Use the console

  1. Create a system event-triggered alert rule. Configure the following key parameters:

    • Product Type: Select PAI.

    • Event Type: Select ServiceInstance or Service.

    • Event Level: Select one or more event levels.

    • Event Name: Select one or more event names to monitor. Valid names are listed in the Name column in Appendix: Supported EAS events.

    • Keyword Filtering: Specify keywords that are matched against the event content to filter the subscribed events.

  2. Configure callbacks for system event-triggered alerts (old).
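To reason about which events a rule will catch, the selection logic of the key parameters in step 1 can be approximated locally. This Python sketch is not how CloudMonitor evaluates rules: the AlertRuleFilter class is hypothetical, it ignores Product Type, and its keyword semantics (the event passes if any keyword appears anywhere in the serialized Content) are an assumption. It relies only on the EAS:<Type>:<Status> naming pattern visible in the appendix.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class AlertRuleFilter:
    """Hypothetical local model of the key alert-rule parameters."""
    event_type: str                                    # "ServiceInstance" or "Service"
    levels: Set[str]                                   # for example {"WARN", "CRITICAL"}
    names: Set[str] = field(default_factory=set)       # empty set: accept any name
    keywords: List[str] = field(default_factory=list)  # empty list: no keyword filter

    def matches(self, event: Dict[str, Any]) -> bool:
        # Event names follow EAS:<Type>:<Status>, so the type is the middle segment.
        parts = event["Name"].split(":")
        if len(parts) != 3 or parts[1] != self.event_type:
            return False
        if event["Level"] not in self.levels:
            return False
        if self.names and event["Name"] not in self.names:
            return False
        if not self.keywords:
            return True
        content = event.get("Content", "")
        text = content if isinstance(content, str) else json.dumps(content)
        # Assumption: the event passes if any keyword appears in the content text.
        return any(keyword in text for keyword in self.keywords)


rule = AlertRuleFilter(
    event_type="ServiceInstance",
    levels={"WARN", "CRITICAL"},
    keywords=["demo-service"],  # hypothetical service name
)
event = {
    "Name": "EAS:ServiceInstance:CrashLoopBackOff",
    "Level": "CRITICAL",
    "Content": {"serviceName": "demo-service", "restartCount": 5},
}
print(rule.matches(event))  # True
```

With this rule, an INFO-level Pending event or any Service-level event would be rejected, which mirrors narrowing Event Type and Event Level in the console.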

Use the API

Call the CloudMonitor API to create an event-triggered alert rule and enable it.

FAQ

Does a service instance refer to an inference service or a pod instance?

The Service event type represents service-level events. The ServiceInstance event type represents service instance-level events. In this context, a service instance refers to a pod instance.

Appendix: Supported EAS events

EAS defines the following service instance-level (ServiceInstance) and service-level (Service) events.

Type             Name                                  Event level  Event status
ServiceInstance  EAS:ServiceInstance:Running           INFO         Running
ServiceInstance  EAS:ServiceInstance:Pending           INFO         Pending
ServiceInstance  EAS:ServiceInstance:Completed         INFO         Completed
ServiceInstance  EAS:ServiceInstance:Terminating       INFO         Terminating
ServiceInstance  EAS:ServiceInstance:Terminated        INFO         Terminated
ServiceInstance  EAS:ServiceInstance:Unknown           WARN         Unknown
ServiceInstance  EAS:ServiceInstance:Evicted           WARN         Evicted
ServiceInstance  EAS:ServiceInstance:ErrImagePull      WARN         ErrImagePull
ServiceInstance  EAS:ServiceInstance:ImagePullBackOff  WARN         ImagePullBackOff
ServiceInstance  EAS:ServiceInstance:CrashLoopBackOff  CRITICAL     CrashLoopBackOff
ServiceInstance  EAS:ServiceInstance:Error             CRITICAL     Error
ServiceInstance  EAS:ServiceInstance:Failed            CRITICAL     Failed
ServiceInstance  EAS:ServiceInstance:SpotToBeReleased  WARN         SpotToBeReleased
Service          EAS:Service:ReplicasChanged           INFO         ReplicasChanged
Service          EAS:Service:StatusChanged             INFO         StatusChanged
Service          EAS:Service:Unavailable               CRITICAL     Unavailable
Service          EAS:Service:UpdateFailed              CRITICAL     UpdateFailed
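When choosing values for Event Level and Event Name in an alert rule, it can help to keep this table as data. The following Python sketch transcribes the table and lists the event names at or above a given level, optionally for one event type. The EAS_EVENTS and event_names names are hypothetical, and the INFO, WARN, CRITICAL severity ordering is an assumption for illustration.

```python
from typing import List, Optional, Tuple

# (type, name, level, status) rows transcribed from the table above.
EAS_EVENTS: List[Tuple[str, str, str, str]] = [
    ("ServiceInstance", "EAS:ServiceInstance:Running", "INFO", "Running"),
    ("ServiceInstance", "EAS:ServiceInstance:Pending", "INFO", "Pending"),
    ("ServiceInstance", "EAS:ServiceInstance:Completed", "INFO", "Completed"),
    ("ServiceInstance", "EAS:ServiceInstance:Terminating", "INFO", "Terminating"),
    ("ServiceInstance", "EAS:ServiceInstance:Terminated", "INFO", "Terminated"),
    ("ServiceInstance", "EAS:ServiceInstance:Unknown", "WARN", "Unknown"),
    ("ServiceInstance", "EAS:ServiceInstance:Evicted", "WARN", "Evicted"),
    ("ServiceInstance", "EAS:ServiceInstance:ErrImagePull", "WARN", "ErrImagePull"),
    ("ServiceInstance", "EAS:ServiceInstance:ImagePullBackOff", "WARN", "ImagePullBackOff"),
    ("ServiceInstance", "EAS:ServiceInstance:CrashLoopBackOff", "CRITICAL", "CrashLoopBackOff"),
    ("ServiceInstance", "EAS:ServiceInstance:Error", "CRITICAL", "Error"),
    ("ServiceInstance", "EAS:ServiceInstance:Failed", "CRITICAL", "Failed"),
    ("ServiceInstance", "EAS:ServiceInstance:SpotToBeReleased", "WARN", "SpotToBeReleased"),
    ("Service", "EAS:Service:ReplicasChanged", "INFO", "ReplicasChanged"),
    ("Service", "EAS:Service:StatusChanged", "INFO", "StatusChanged"),
    ("Service", "EAS:Service:Unavailable", "CRITICAL", "Unavailable"),
    ("Service", "EAS:Service:UpdateFailed", "CRITICAL", "UpdateFailed"),
]

# Assumption: severity increases from INFO to WARN to CRITICAL.
LEVEL_RANK = {"INFO": 0, "WARN": 1, "CRITICAL": 2}


def event_names(min_level: str, event_type: Optional[str] = None) -> List[str]:
    """Return event names at or above min_level, optionally limited to one type."""
    return [
        name
        for typ, name, level, _status in EAS_EVENTS
        if LEVEL_RANK[level] >= LEVEL_RANK[min_level]
        and (event_type is None or typ == event_type)
    ]


print(event_names("CRITICAL", "Service"))
# ['EAS:Service:Unavailable', 'EAS:Service:UpdateFailed']
```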