Platform For AI: DSW monitoring and alerts

Last Updated: Jan 19, 2026

Monitor the resource usage of PAI-Data Science Workshop (DSW) instances using detailed metrics for CPU, memory, and GPU. You can also configure alerts and notifications by using CloudMonitor or Application Real-Time Monitoring Service (ARMS). When a resource metric is abnormal, such as when GPU utilization exceeds a threshold, the system triggers an alert, helping you track the instance's load status in real time.

Applicable scope

Monitoring currently supports DSW instances created from Subscription Resources (General-purpose Computing Resources and Lingjun Intelligent Computing Resources). It does not support DSW instances created from Public Resources.

Overview of monitoring methods

DSW provides several monitoring methods. Choose the method that best suits your needs and use case.

Monitoring method: View monitoring data in the DSW console

  Features:

  • Easy to use and integrated with the DSW console. No extra configuration is required.

  • Provides basic features. Does not support alert configuration or historical data queries.

  Scenarios: View the real-time resource status of a single instance during development or debugging.

  Billing information: Free of charge

Monitoring method: Use CloudMonitor to configure monitoring and alerts

  Features:

  • Supports flexible alert rule configurations and notifications through multiple channels.

  • Supports subscribing to metric data through an API for custom development and system integration.

  Scenarios: Perform routine, automated monitoring and set up anomaly alerts for DSW instances in a production environment.

  Billing information: CloudMonitor is billed separately. For more information about billing, see CloudMonitor billing information.

Monitoring method: Use ARMS to configure monitoring and alerts

  Features:

  • A professional, cloud-native observability platform with rich visualization capabilities.

  • Provides a built-in Grafana dashboard for out-of-the-box monitoring views.

  • Offers more flexible and powerful alert configuration capabilities based on Prometheus.

  Scenarios: Customize monitoring dashboards, perform multi-dimensional data analysis, or integrate DSW monitoring into a unified ARMS observability system.

  Billing information: ARMS is billed separately. For more information about billing, see ARMS billing information.

View the monitoring page in the DSW console

On the DSW instance list page, click an instance name to open its details page. Click the Monitoring tab. This page shows the instance's resource status, including CPU, memory, and GPU.

Use CloudMonitor to configure monitoring and alerts

CloudMonitor is a service that monitors Alibaba Cloud resources and internet applications. It provides a one-stop, enterprise-grade monitoring solution for cloud users. Log on to the CloudMonitor console to view DSW instance monitoring data and configure alert notifications. You can also subscribe to metric data through the CloudMonitor API to build your own monitoring systems and dashboards.
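
As a rough illustration of subscribing to metric data through the API, the following sketch builds the parameters for a metric query and reads the newest datapoint from a response. It makes no network call. Namespace, MetricName, Period, and Dimensions are parameter names used by CloudMonitor metric queries, but the namespace value, metric name, dimension keys, and the sample response shown here are placeholders and assumptions, not documented DSW values.

```python
import json


def build_metric_query(namespace, metric_name, dimensions, period_seconds=60):
    """Build the parameter dict for a CloudMonitor metric query."""
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "Period": str(period_seconds),
        # Dimensions is passed as a JSON-encoded list of objects.
        "Dimensions": json.dumps([dimensions], sort_keys=True),
    }


def latest_average(response):
    """Return the Average of the most recent datapoint, or None if empty."""
    # Assumption: Datapoints is a JSON string of objects that carry
    # "timestamp" and "Average" fields.
    points = json.loads(response.get("Datapoints") or "[]")
    if not points:
        return None
    newest = max(points, key=lambda p: p["timestamp"])
    return newest["Average"]


# Placeholder values: replace with the real namespace, metric name, and IDs.
params = build_metric_query(
    namespace="<dsw-namespace>",
    metric_name="<gpu-utilization-metric>",
    dimensions={"workspaceId": "<workspace-id>", "instanceId": "<instance-id>"},
)

# Hypothetical response used only to exercise the parser.
sample_response = {
    "Datapoints": json.dumps([
        {"timestamp": 1700000000000, "Average": 41.5},
        {"timestamp": 1700000060000, "Average": 87.0},
    ])
}
print(latest_average(sample_response))
```

In a real integration, the params dict would be handed to an Alibaba Cloud SDK client or a signed HTTP request, and the parser would run on the returned body.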

View monitoring data

Log on to the CloudMonitor console. In the navigation pane on the left, click Cloud Service Monitoring. Search for and select PAI-Interactive Modeling DSW, then select the workspace and instance you want to query.

To obtain the workspace ID, see Manage workspaces. To obtain the instance ID, see Access and manage DSW in the console.

Configure monitoring alerts

You can use the monitoring and alert feature to monitor the resource usage of DSW instances and configure flexible alert rules.

Configure alert contacts

  1. Log on to the CloudMonitor console.

  2. In the left-side navigation pane, choose Alerts > Alert Contacts.

  3. On the Alert Contacts tab, click Create Alert Contact.

    1. In the Set Alert Contact panel, enter the name, email address, and webhook URL of the alert contact.

    2. Click OK.

  4. On the Alert Contact Group tab, click Create Alert Contact Group.

    1. In the Create Alert Contact Group panel, enter a name for the alert contact group and add alert contacts to the alert contact group.

    2. Click Confirm.
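
If an alert contact has a webhook URL, the endpoint behind that URL needs to decode the alert that arrives. The following is a minimal sketch of that decoding step only, assuming the callback body is form-encoded and carries fields named alertName, alertState, and curValue. These field names and the sample body are assumptions for illustration; check the CloudMonitor callback documentation for the actual payload.

```python
from urllib.parse import parse_qs


def parse_alert_callback(body: bytes) -> dict:
    """Decode a form-encoded alert callback body into a flat dict."""
    fields = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in fields.items()}


def summarize(alert: dict) -> str:
    """Format a one-line summary, tolerating missing fields."""
    name = alert.get("alertName", "unknown rule")
    state = alert.get("alertState", "unknown state")
    value = alert.get("curValue", "n/a")
    return f"[{state}] {name}: current value {value}"


# Hypothetical callback body used only to exercise the parser.
sample_body = b"alertName=dsw-gpu-high&alertState=ALERT&curValue=93.5"
print(summarize(parse_alert_callback(sample_body)))
```

A real receiver would call parse_alert_callback from the POST handler of whatever HTTP server hosts the webhook URL.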

Configure alert rules

  1. In the navigation pane on the left of the CloudMonitor console, click Cloud Resource Monitoring > Cloud Service Monitoring. Then, search for and select PAI-Interactive Modeling DSW.

  2. On the PAI-Interactive Modeling DSW page, select the service region and click Create Alert Rule.

  3. On the Create Alert Rule page, configure the following key parameters and click Confirm. For more information about the parameters, see Create an alert rule.

    Product: Select PAI-Data Science Workshop (DSW).

    Resource Range: The scope to which the alert rule applies:

      • All Resources: The alert rule applies to all DSW resources.

      • Application Group: The alert rule applies to all resources within a specified DSW application group.

      • Instance: The alert rule applies to instances under a specified DSW workspace.

    Rule Description: The conditions that trigger the alert. For instructions on how to set the rule description, see Create an alert rule.

    Mute Period: The interval between repeated notifications for an unresolved alert.

    Effective Period: The period when the alert rule is active. The rule only checks if an alert needs to be triggered during this period.

    Alert Contact Group: The contact group to which alerts are sent. Ensure the group contains alert contacts.

  4. On the PAI-Data Science Workshop (DSW) page, click View Alert Rules to see details of created alert rules, alert history, and more.
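
To make the interaction between the threshold condition, Mute Period, and Effective Period concrete, here is a small local simulation. It is a sketch of the semantics described above, not CloudMonitor's implementation. The AlertRule class, the notifications function, and the sample data are hypothetical, and the choice to notify immediately on the first breach after a recovery is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class AlertRule:
    threshold: float        # alert when the metric exceeds this value
    mute_period: timedelta  # minimum gap between repeated notifications
    effective_start: time   # rule is evaluated only inside this window
    effective_end: time


def notifications(rule, samples):
    """Return the timestamps at which a notification would be sent."""
    sent = []
    last_sent = None
    for ts, value in samples:
        in_window = rule.effective_start <= ts.time() <= rule.effective_end
        if not in_window:
            continue  # outside the Effective Period: rule is not checked
        if value <= rule.threshold:
            last_sent = None  # recovered: the next breach notifies at once
            continue
        if last_sent is None or ts - last_sent >= rule.mute_period:
            sent.append(ts)
            last_sent = ts
    return sent


rule = AlertRule(
    threshold=90.0,
    mute_period=timedelta(minutes=30),
    effective_start=time(9, 0),
    effective_end=time(18, 0),
)
start = datetime(2026, 1, 19, 8, 50)
# One GPU-utilization sample every 10 minutes, all above the threshold.
samples = [(start + timedelta(minutes=10 * i), 95.0) for i in range(8)]
for ts in notifications(rule, samples):
    print(ts.strftime("%H:%M"))
```

With these samples, the 08:50 breach is ignored because it falls outside the Effective Period, and the sustained breach then notifies at 09:00, 09:30, and 10:00, once per Mute Period.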

Use ARMS to configure monitoring and alerts

Application Real-Time Monitoring Service (ARMS) is an Alibaba Cloud-native observability platform. With ARMS, you can customize a Grafana dashboard for DSW and use Prometheus to configure flexible alert rules.

Provision monitoring data

Follow these steps:

  1. Log on to the ARMS console. In the navigation pane on the left, click Integration Center > AI, and then click Alibaba Cloud PAI-Interactive Modeling (DSW).

  2. In the Start Integration tab of the panel that appears, select a region for storage, configure an Access Name, and then click OK.

    Provisioning DSW takes one to two minutes. You can also switch to the Preview, Collect Metrics, or Alert Rule Template tab to view the monitoring dashboard, supported metrics, or alert rule details, respectively.

  3. After provisioning is complete, in the navigation pane on the left, choose Provisioning > Integration Management to see the Alibaba Cloud PAI-Interactive Modeling (DSW) component.

View the Grafana dashboard

  1. Log on to the ARMS console. In the navigation pane on the left, click Integration Management > Integrated Addons > Alibaba Cloud PAI-Interactive Modeling (DSW) > Dashboard. Click a dashboard name to view the built-in Grafana dashboard.

  2. On the Grafana dashboard page, filter by instance ID to view the status of a specific DSW instance.

Configure Prometheus alerts

  1. Log on to the ARMS console. In the navigation pane on the left, click Integration Management > Integrated Addons > Alibaba Cloud PAI-Interactive Modeling (DSW) > Alerts to view the built-in alert rules.

  2. The built-in alert rules generate alert events but do not send notifications. You can configure notifications to be sent to email or other platforms using one of the following two methods:

    • Configure notifications by editing an alert rule. In the Actions column of an alert rule, click Edit to open the Prometheus alert rule configuration page. Configure alert conditions, content, and notifications as needed. For more information about alert configuration, see Create a Prometheus alert rule.

    • Set up a notification policy to define matching rules for alert events. When an event matches a rule, the system sends an alert to the specified notification object using your chosen method. For more information, see Notification policies.
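
Because the ARMS alerts are based on Prometheus, the condition behind a rule can be checked outside the console with a PromQL query. The sketch below builds a URL for the standard Prometheus instant-query endpoint (/api/v1/query) and parses a response in the standard Prometheus result format. The base URL, the metric name dsw_gpu_utilization, the instance_id label, and the sample response are hypothetical. Take the real names from the Collect Metrics tab of the integration.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def breaching_instances(response_text: str, label: str = "instance_id"):
    """Return (label value, sample value) pairs from an instant-query result."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    return [
        # Each vector sample is [unix_timestamp, "value-as-string"].
        (series["metric"].get(label, ""), float(series["value"][1]))
        for series in payload["data"]["result"]
    ]


# Hypothetical endpoint and metric name, for illustration only.
url = instant_query_url("https://prometheus.example.com/", "dsw_gpu_utilization > 90")
print(url)

# Hypothetical response in the standard instant-query format.
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance_id": "dsw-123"}, "value": [1700000000, "93.5"]},
        ],
    },
})
print(breaching_instances(sample_response))
```

The same PromQL expression can serve as the alert condition when you edit a rule, so testing it as a query first is a cheap way to confirm that it matches the instances you expect.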