A PAI Deep Learning Containers (DLC) job is considered compliant if AIMaster-based fault tolerance monitoring is enabled. This rule does not apply if no training jobs exist.
Risk level
The default risk level is High.
You can change the risk level as needed.
Detection logic
A PAI Deep Learning Containers (DLC) job is considered compliant if AIMaster-based fault tolerance monitoring is enabled.
If no training jobs exist, this rule does not apply.
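The detection logic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DlcJob` record, its `fault_tolerance_enabled` field, and the roll-up of per-job results into a single workspace status are hypothetical and do not reflect the actual PAI API schema or the rule's internal implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    COMPLIANT = "COMPLIANT"
    NON_COMPLIANT = "NON_COMPLIANT"
    NOT_APPLICABLE = "NOT_APPLICABLE"


@dataclass
class DlcJob:
    # Hypothetical record: field names are illustrative, not the PAI API schema.
    job_id: str
    fault_tolerance_enabled: bool


def evaluate_job(job: DlcJob) -> Status:
    # A job is compliant when AIMaster-based fault tolerance monitoring is enabled.
    if job.fault_tolerance_enabled:
        return Status.COMPLIANT
    return Status.NON_COMPLIANT


def evaluate_workspace(jobs: List[DlcJob]) -> Status:
    # The rule does not apply when the workspace has no training jobs.
    if not jobs:
        return Status.NOT_APPLICABLE
    # Assumption: the workspace is compliant only if every job is compliant.
    if all(evaluate_job(job) is Status.COMPLIANT for job in jobs):
        return Status.COMPLIANT
    return Status.NON_COMPLIANT
```

For example, `evaluate_workspace([])` yields `Status.NOT_APPLICABLE`, while a workspace that contains at least one job without fault tolerance monitoring is reported as `Status.NON_COMPLIANT` under the roll-up assumption stated above.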
Rule details
Parameter | Description |
Rule name | Enable AIMaster-based fault tolerance monitoring for PAI distributed training |
Rule identifier | |
Tag | [PAIWorkspace] |
Automatic remediation | Not supported |
Rule trigger | Periodic, every 24 hours |
Supported resource types | [ACS::PAIWorkspace::Workspace] |
Input parameters | None |
Remediation guide
For more information about remediation, see AIMaster: Elastic Automatic Fault Tolerance Engine.