Platform For AI: Notification rules

Last Updated: Apr 01, 2026

Configure notification rules that alert you through DingTalk, email, or other channels when a DLC job fails, times out, or is preempted.

Use cases

  • Job failure alerts: Receive immediate notifications when a job fails so that you can check logs and restart.

  • Timeout monitoring: Detect when a job exceeds its maximum queuing or runtime duration.

  • Preemption handling: Receive alerts when spot or idle instance jobs are preempted so that you can reschedule.

  • Workspace-wide oversight: Monitor all jobs in a workspace to keep team training runs on track.

Prerequisites

Before you begin, make sure that you have:

  • A PAI workspace with at least one DLC job.

  • Access to Workspace Configuration in your workspace.

Configure notifications

  1. On the Workspace Details page, choose Workspace Configuration > Event Notification Configuration and click Create Event Rule.

  2. In the Create Event Rule panel, configure the following parameters and click Submit.

    To receive Job Timeout notifications, configure timeout rules in Scheduling Configuration first. Without timeout rules, timeout events do not trigger. See Configure timeout alert rules.
    | Parameter | Description |
    | --- | --- |
    | Rule name | Custom name for the rule. |
    | Event type | Set Event Source to DLC Job and select the event types to monitor. See the supported event types table below. |
    | Event scope | Created by me: only DLC jobs that you created. All in current workspace: all DLC jobs in the workspace. |
    | Event target | Notification channel: DingTalk, WeCom, Lark, voice call, text message, or email. |

    Supported event types

    | Category | Event type | Trigger condition |
    | --- | --- | --- |
    | Job progress | Enters Queue | Job is queued. |
    | Job progress | Start Bidding | Job enters the Bidding state. |
    | Job progress | Starts Environment Preparation | Job starts preparing the environment. |
    | Job progress | Starts Running | Job starts running. |
    | Job progress | Retained on Success | Job is retained after successful completion. |
    | Job progress | Retained on Failure | Job is retained after failure. |
    | Job progress | Job Fails | Job execution fails. |
    | Job progress | Job Finishes (Success or Failure) | Job execution completes, regardless of outcome. |
    | Automatic fault tolerance | Automatic Fault Tolerance | DLC job encounters an error and triggers automatic fault tolerance. |
    | Job timeout | Queue Timeout | Queuing duration exceeds the configured maximum. |
    | Job timeout | Environment Preparation Timeout | Environment preparation duration exceeds the configured maximum. |
    | Job timeout | Wait Timeout | Wait duration from job creation to execution exceeds the configured maximum. |
    | Job timeout | Run Timeout | Runtime exceeds the configured maximum, which triggers an automatic stop. |
    | Other events | Job Is Preempted | Idle or spot instance job is preempted. |
    | Other events | Job Is Manually Stopped | Job is manually stopped. |
    | Other events | Job Priority Is Adjusted | Job priority is adjusted. |

After the rule is created, the system sends notifications to the preset contacts when a job triggers the rule. To investigate a job, go to the Deep Learning Containers (DLC) page and check the monitoring status and logs. For more information, see View training details.
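
As a planning aid, the rule parameters and event types described above can be modeled in a few lines of Python. This is only a sketch for checking a rule on paper before you create it in the console: the names `SUPPORTED_EVENTS`, `EventScope`, and `EventRule` are hypothetical and are not part of any PAI SDK or API. Only the event type, scope, and channel strings come from this page.

```python
from dataclasses import dataclass
from enum import Enum

# Event types copied from the supported event types table, keyed by category.
# This dictionary is illustrative; it is not read from or sent to PAI.
SUPPORTED_EVENTS = {
    "Job progress": [
        "Enters Queue", "Start Bidding", "Starts Environment Preparation",
        "Starts Running", "Retained on Success", "Retained on Failure",
        "Job Fails", "Job Finishes (Success or Failure)",
    ],
    "Automatic fault tolerance": ["Automatic Fault Tolerance"],
    "Job timeout": [
        "Queue Timeout", "Environment Preparation Timeout",
        "Wait Timeout", "Run Timeout",
    ],
    "Other events": [
        "Job Is Preempted", "Job Is Manually Stopped", "Job Priority Is Adjusted",
    ],
}


class EventScope(Enum):
    CREATED_BY_ME = "Created by me"
    ALL_IN_WORKSPACE = "All in current workspace"


@dataclass
class EventRule:
    """Hypothetical in-memory mirror of the Create Event Rule panel."""
    name: str
    event_types: list[str]
    scope: EventScope
    targets: list[str]  # notification channels, e.g. ["DingTalk", "email"]

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means nothing is missing."""
        known = {e for events in SUPPORTED_EVENTS.values() for e in events}
        problems = []
        if not self.name.strip():
            problems.append("rule name is empty")
        if not self.event_types:
            problems.append("no event type selected")
        problems += [f"unknown event type: {e}"
                     for e in self.event_types if e not in known]
        if not self.targets:
            problems.append("no event target selected")
        return problems

    def needs_timeout_rules(self) -> bool:
        """True if any selected event is in the Job timeout category."""
        return any(e in SUPPORTED_EVENTS["Job timeout"] for e in self.event_types)
```

For example, `EventRule("train-alerts", ["Job Fails", "Queue Timeout"], EventScope.CREATED_BY_ME, ["email"]).needs_timeout_rules()` returns `True`, which is a reminder to configure timeout rules in Scheduling Configuration before relying on that rule.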

Cases where notifications are not sent

In the following cases, no notification is sent even though a rule exists. Check them first when an expected alert does not arrive:

  • Timeout events without timeout rules: If you select a Job Timeout event type but have not configured timeout rules in Scheduling Configuration, timeout events never trigger and no notification is sent.

  • Out-of-scope jobs: A rule scoped to Created by me does not trigger for jobs created by other workspace members.
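
The two cases above can be written as a small troubleshooting checklist. The following Python sketch is an assumption-laden model, not the service's actual matching logic: `Rule`, `why_not_sent`, and the `timeout_rules_configured` flag are hypothetical names, and the check that the event type is selected in the rule is added for completeness.

```python
from dataclasses import dataclass

# Event types in the Job timeout category, as listed on this page.
TIMEOUT_EVENTS = {
    "Queue Timeout", "Environment Preparation Timeout",
    "Wait Timeout", "Run Timeout",
}


@dataclass
class Rule:
    """Hypothetical summary of an event rule."""
    owner: str        # user who created the rule
    scope: str        # "Created by me" or "All in current workspace"
    event_types: set  # event types selected in the rule


def why_not_sent(rule, event_type, job_creator, timeout_rules_configured):
    """Return a reason if no notification is expected, or None if one is."""
    if event_type not in rule.event_types:
        return "event type not selected in the rule"
    if event_type in TIMEOUT_EVENTS and not timeout_rules_configured:
        # Without timeout rules in Scheduling Configuration,
        # timeout events never trigger.
        return "no timeout rules configured, so the event never triggers"
    if rule.scope == "Created by me" and job_creator != rule.owner:
        return "job was created by another workspace member"
    return None
```

A `None` result means that none of the documented exclusions apply, so a missing alert is more likely caused by the notification channel or contact settings.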

Configure timeout alert rules

Timeout alert rules define the maximum allowed duration for each phase of a DLC job. Configure these rules before you enable job timeout notifications.

  1. On the Workspace Configuration page, click the Scheduling Configuration tab. In the DLC section, configure the timeout rules. To add multiple timeout rules, click Add.

    | Parameter | Description |
    | --- | --- |
    | Resource quota | Resource scope for the timeout rule: Public resource group or a specific Resource quota attached to the workspace. |
    | Timeout rule configuration | Maximum duration for each phase: Job waiting duration (queuing + environment preparation), Queuing duration, or Environment preparation duration. |
  2. Click Save.

After saving, go to Event Notification Configuration, select the DLC Job event source, and configure the corresponding timeout event notification. If no timeout notification is configured, you do not receive alerts when a timeout occurs.
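
To see how the three phase limits relate to the timeout event types, consider the following Python sketch. It assumes that durations are measured in seconds and that one rule can hold several limits at once. Neither assumption is stated on this page, and `TimeoutRule` and `timeout_events` are hypothetical names rather than PAI APIs. Run Timeout is left out because this page does not list a runtime limit among the Scheduling Configuration timeout rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeoutRule:
    """Hypothetical limits in seconds; None means no maximum is configured."""
    max_queuing: Optional[float] = None
    max_env_prep: Optional[float] = None
    max_waiting: Optional[float] = None  # queuing + environment preparation


def timeout_events(rule, queuing_s, env_prep_s):
    """Return the timeout event types that the observed durations would exceed."""
    events = []
    if rule.max_queuing is not None and queuing_s > rule.max_queuing:
        events.append("Queue Timeout")
    if rule.max_env_prep is not None and env_prep_s > rule.max_env_prep:
        events.append("Environment Preparation Timeout")
    # Waiting duration is the sum of the two phases before the job runs.
    if rule.max_waiting is not None and queuing_s + env_prep_s > rule.max_waiting:
        events.append("Wait Timeout")
    return events
```

With `TimeoutRule(max_queuing=600, max_waiting=900)`, a job that queues for 500 seconds and prepares its environment for 450 seconds stays under the queuing limit but exceeds the combined waiting limit, so only Wait Timeout applies in this model.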

What's next