Elastic Compute Service: Comprehensive instance diagnosis

Last Updated: Nov 19, 2025

The comprehensive instance diagnosis feature checks the system, network, and disk status of an instance. This helps you understand the health of your instance and promptly identify and resolve common issues.

Prerequisites

  • When you use the Instance Fee and Security Behavior Audit diagnosis feature, the system checks whether the current account has the AliyunServiceRoleForECSSelfService service-linked role. If the role does not exist, a prompt appears. After you confirm the prompt, the system creates the role automatically.

    The AliyunServiceRoleForECSSelfService role includes the AliyunServiceRolePolicyForECSSelfService system access policy. You cannot add, modify, or delete the permissions granted by this policy.

    Expand to view the AliyunServiceRolePolicyForECSSelfService policy document

    {
      "Version": "1",
      "Statement": [
        {
          "Action": [
            "ecs:StartInstance",
            "ecs:StopInstance",
            "ecs:DescribeInstances",
            "ecs:CreateSnapshot",
            "ecs:DescribeSnapshots",
            "ecs:DeleteSnapshot",
            "ecs:DescribeDisks",
            "ecs:DescribeDisksFullStatus",
            "ecs:ResetDisk",
            "ecs:DescribeInvocationResults",
            "ecs:DescribeInvocations",
            "ecs:RunCommand",
            "ecs:CreateDiagnosticReport",
            "oos:StartExecution",
            "oos:ListExecutions",
            "oos:ListExecutionLogs",
            "oos:ListTaskExecutions",
            "oos:CancelExecution",
            "actiontrail:LookupEvents"
          ],
          "Resource": "*",
          "Effect": "Allow"
        },
        {
          "Action": "ram:DeleteServiceLinkedRole",
          "Resource": "*",
          "Effect": "Allow",
          "Condition": {
            "StringEquals": {
              "ram:ServiceName": "selfservice.ecs.aliyuncs.com"
            }
          }
        }
      ]
    }

    If you use a Resource Access Management (RAM) user to run Instance Fee and Security Behavior Audit diagnostics, contact the Alibaba Cloud account owner to grant the RAM user permission to create service-linked roles. For more information, see Create custom policies in edit mode and Grant permissions to a RAM user.

    The following policy document grants a RAM user permission to create the service-linked role for self-service instance troubleshooting. Replace <account ID> with the UID of your Alibaba Cloud account.

    {
        "Statement": [
            {
                "Action": [
                    "ram:CreateServiceLinkedRole"
                ],
                "Resource": "acs:ram:*:<account ID>:role/*",
                "Effect": "Allow",
                "Condition": {
                    "StringEquals": {
                        "ram:ServiceName": [
                            "selfservice.ecs.aliyuncs.com"
                        ]
                    }
                }
            }
        ],
        "Version": "1"
    }
  • If you are running a comprehensive diagnosis or diagnosing a network anomaly, ensure that the instance meets the following conditions:

    • Instance type: The instance belongs to an instance family that is available for purchase. For more information, see Instance families.

      Note

      Discontinued instance families do not support the instance health diagnosis feature.

    • Instance status: The instance is in the Running state.

    • Operating system: If the selected scenario involves checking configurations within the instance's operating system, ensure that the operating system meets the conditions in the following table.

      System architecture: x86 64-bit

      Operating system version:

      • Windows Server 2008 and later

      • Alibaba Cloud Linux 2/3

      • AlmaLinux 8.x and later

      • Anolis OS 7.x/8.x

      • CentOS 7.x/8.x

      • CentOS Stream 8 and later

      • Debian 8.x and later

      • Fedora 33/34

      • OpenSUSE 15.x/42.x

      • Rocky Linux 8.x and later

      • SUSE Linux Enterprise Server 12.x/15.x

      • Ubuntu 16.04/18.04/20.04/24.04

      Note

      Operating system distributions that are not listed are not supported, and diagnostic results on them are not guaranteed.

  • If the scenario is Instance fails to start, ensure that the instance meets the following conditions:

    • Instance status: The instance is in the Stopped state.

    • Operating system: The selected scenario involves checking configurations within the instance's operating system. Ensure that the operating system meets the conditions in the following table.

      System architecture: x86 64-bit

      Operating system version:

      • Windows Server 2008 and later

      • Alibaba Cloud Linux 2/3

      • AlmaLinux 8.x and later

      • Anolis OS 7.x/8.x

      • CentOS 7.x/8.x

      • CentOS Stream 8 and later

      • Debian 8.x and later

      • Fedora 33/34

      • OpenSUSE 15.x/42.x

      • Rocky Linux 8.x and later

      • SUSE Linux Enterprise Server 12.x/15.x

      • Ubuntu 16.04/18.04/20.04/24.04

      Note

      Operating system distributions that are not listed are not supported, and diagnostic results on them are not guaranteed.
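
The RAM user policy in the first prerequisite contains an <account ID> placeholder that must be replaced before the policy is attached. The following minimal Python sketch shows one way to generate the filled-in document. The function name build_slr_creation_policy and the sample UID are hypothetical, and the numeric check assumes that account UIDs consist of digits only.

```python
import json


def build_slr_creation_policy(account_id: str) -> dict:
    """Return the RAM policy from this section with <account ID> filled in."""
    # Assumption: an Alibaba Cloud account UID is a string of digits.
    if not account_id.isdigit():
        raise ValueError("account_id must be a numeric UID")
    return {
        "Statement": [
            {
                "Action": ["ram:CreateServiceLinkedRole"],
                "Resource": f"acs:ram:*:{account_id}:role/*",
                "Effect": "Allow",
                "Condition": {
                    "StringEquals": {
                        "ram:ServiceName": ["selfservice.ecs.aliyuncs.com"]
                    }
                },
            }
        ],
        "Version": "1",
    }


if __name__ == "__main__":
    # Placeholder UID for illustration only; use your own account UID.
    print(json.dumps(build_slr_creation_policy("1234567890123456"), indent=4))
```

The printed JSON can be pasted into the policy editor when you create the custom policy. Rejecting non-numeric input is a small safeguard against attaching a policy that still contains the literal placeholder.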

Scenarios

Use the comprehensive instance diagnosis feature in the following scenarios to understand the health of your instance:

  • Troubleshoot issues: Run targeted diagnostics to find solutions for problems you encounter, such as a failed network connection.

  • Perform regular checks: Understand the overall health of your instance during routine operations and maintenance (O&M). This helps you promptly detect and handle issues to prevent business disruptions.

Note

The instance health diagnosis feature provides problem descriptions and recommended solutions for each diagnostic item. For more information, see Diagnostic items and results.

Procedure

ECS console

Create an instance diagnosis

  1. Log on to the ECS console.

  2. In the navigation pane on the left, choose Maintenance & Monitoring > Troubleshooting.

  3. In the upper-left corner of the top menu bar, select a region.

  4. Select a time and an instance ID, and then click Start.

    Note

    Only one diagnostic task can be in progress for an instance at a time. The interval between two consecutive diagnoses must be more than 5 minutes.

    Problem type and description:

    • Instance Performance Issues: Diagnose issues such as high CPU load, high memory usage, high bandwidth usage, high disk BPS or IOPS, or degraded performance on an ECS instance.

    • Instance Connection Errors or Startup Exceptions: Diagnose issues such as failed remote connections over the Secure Shell Protocol (SSH) or VNC, an instance that is down, or an instance's operating system failing to start.

    • Network Issues: Diagnose issues such as degraded network performance or ping failures on an ECS instance.

    • Ineffective Instance Operation: Diagnose issues where an operation on an ECS instance did not take effect, such as a disk expansion that was not applied.

    • Insufficient Resource Quota: Diagnose issues that occur because an ECS resource quota is reached. Examples include an insufficient disk capacity quota, an insufficient image quota, or reaching the maximum number of Elastic Network Interfaces (ENIs) or security groups.

    • Check for Security Risks: Diagnose security risks on an ECS instance, such as system vulnerabilities, security alerts, or malicious processes.

    • Instance Billing and Security Audit: Audit and trace operations related to ECS instance status, instance fees, and security groups.

      Note

      To use the instance fee and security behavior audit feature, you must have the service-linked role and permissions for self-service instance troubleshooting. For more information, see Service-linked role AliyunServiceRoleForECSSelfService.

    • Instance Device Check: Check whether devices such as GPUs on an instance are running properly.

    • Others: Enter the issue details, the instance ID, and the time range to troubleshoot.

    The actual diagnostic items may vary. In the diagnostic report, click the tabs under Diagnostic Item Details to view the items and their progress. The diagnosis takes a few minutes. You can view the progress on the current page or close the dialog box and check the diagnostic task list for the progress and the report.

  5. View the diagnostic report.

    The diagnostic report contains the following information:

    • Basic Information: Includes the diagnosis time range, resource ID, report ID, and diagnosis time.

    • Diagnosis Result: If all checks are normal, the result is No exceptions are detected on the instance. If any abnormal items are found, the specific items are displayed with recommended solutions. You can follow the recommendations to resolve the issues.

    • Diagnostic Item Details: Includes the results for each diagnostic item, with severity levels of Critical, Warning, and Passed.

    Note

    When you use the instance fee and security behavior audit feature, you can also obtain more information in the following ways:

    You can use the diagnostic report to resolve issues.
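
If diagnoses are started from automation rather than from the console, the limits in the note under step 4 (one task in progress per instance, more than 5 minutes between consecutive diagnoses) can be checked on the client before a request is sent. The following Python sketch is illustrative only. The class DiagnosisThrottle is hypothetical, and it assumes that the interval is measured from the start of the previous diagnosis, which this document does not specify.

```python
import time
from typing import Dict, Optional, Set

# "More than 5 minutes" between two consecutive diagnoses.
MIN_INTERVAL_SECONDS = 5 * 60


class DiagnosisThrottle:
    """Tracks, per instance, whether a new diagnosis may be started."""

    def __init__(self) -> None:
        self._last_start: Dict[str, float] = {}
        self._in_progress: Set[str] = set()

    def can_start(self, instance_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Only one diagnostic task may be in progress for an instance.
        if instance_id in self._in_progress:
            return False
        last = self._last_start.get(instance_id)
        # Assumption: the interval counts from the previous start time.
        return last is None or (now - last) > MIN_INTERVAL_SECONDS

    def mark_started(self, instance_id: str, now: Optional[float] = None) -> None:
        self._last_start[instance_id] = time.time() if now is None else now
        self._in_progress.add(instance_id)

    def mark_finished(self, instance_id: str) -> None:
        self._in_progress.discard(instance_id)


if __name__ == "__main__":
    throttle = DiagnosisThrottle()
    instance = "i-uf653eye7pkftni****"  # masked ID from the sample response
    if throttle.can_start(instance):
        throttle.mark_started(instance)
        print("diagnosis may be started")
```

The service still enforces its own limits, so a check like this only avoids sending requests that are likely to be rejected.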

View diagnostic history

To review the historical health status of an instance, you can view its diagnostic history.

  1. Log on to the ECS console.

  2. View the instance's diagnostic history.

    1. In the navigation pane on the left, choose Maintenance & Monitoring > Troubleshooting.

    2. In the top navigation bar, select a region.

    3. On the Instance Troubleshooting tab, click View History.

    4. On the Check History page, click the Instance Health Diagnosis tab, enter a resource ID or report ID, and then click the search icon.

    Note

    In the diagnostic history report list, you can click the Filter icon to the left of Actions and select a status to filter the list.

  3. For a single diagnostic history entry, you can click View Report to view the detailed report, or click Re-diagnose to start a new diagnosis.

OpenAPI

  1. Query diagnostic metrics.

    Call DescribeDiagnosticMetrics to query diagnostic metrics. For a list of available diagnostic metrics, see Diagnostic items and results.

  2. Manage diagnostic metric collections.

    There are two types of diagnostic metric collections. You can use them to create diagnostic reports.

    • Public diagnostic metric collections: These collections are based on common user issues and simplify the diagnosis process.

      Alibaba Cloud maintains public diagnostic metric collections, and you cannot modify them. Call DescribeDiagnosticMetricSets to query them. The following public diagnostic metric collection is currently supported:

      • Metric name: dms-instancedefault

      • Description: Default diagnostic collection

      • Scenario: Used for a comprehensive check of an ECS instance.

    • Custom diagnostic metric collections: If you want to check only specific diagnostic metrics, you can call CreateDiagnosticMetricSet to create a custom diagnostic metric collection. After the collection is created, you can call DescribeDiagnosticMetricSets to query it.

      The following sample response indicates that a custom diagnostic metric collection named test has been created.

      {
        "RequestId": "6AF68D67-601A-5278-AB10-4195CCA7****",
        "MetricSets": [
          {
            "Type": "User",
            "MetricIds": [
              "Instance.ControllerError",
              "Instance.CPUException",
              "Instance.CPUSplitLock"
            ],
            "MetricSetId": "dms-uf6ck3iljpbft15i****",
            "ResourceType": "instance",
            "MetricSetName": "test"
          }
        ]
      }
  3. Create a diagnostic report.

    Call CreateDiagnosticReport to create a diagnostic report from a custom or public diagnostic metric collection.

    The following sample response indicates that the diagnostic report was successfully created.

    {
      "RequestId": "A1283ACE-2F19-54B9-9464-401EBD1A****",
      "ReportId": "dr-uf6aacg5g2fjp64i****"
    }
  4. Query a diagnostic report.

    Call DescribeDiagnosticReports to query the details of a diagnostic report. The response returns the diagnosis result for each diagnostic metric in the collection. For more information about the results of diagnostic items, see Diagnostic items and results.

    The following sample response indicates that the diagnosis finished and found no issues.

    {
      "RequestId": "20381C19-C31B-52AE-AC9B-8AD672E4****",
      "NextToken": "",
      "Reports": [
        {
          "Status": "Finished",
          "EndTime": "2022-09-07T15:36Z",
          "ResourceId": "i-uf653eye7pkftni****",
          "MetricSetId": "dms-uf6ck3iljpbft15i****",
          "Issues": [],
          "StartTime": "2022-09-05T15:36Z",
          "CreationTime": "2022-09-07T15:36Z",
          "ReportId": "dr-uf6aacg5g2fjp64i****",
          "ResourceType": "instance",
          "Severity": "Normal",
          "FinishedTime": "2022-09-07T15:36Z"
        }
      ]
    }
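
The response handling in steps 2 to 4 can be sketched in Python as follows. This is a minimal sketch that works only on parsed JSON shaped like the sample responses above; it does not call the ECS API. The helper names (find_metric_set_id, summarize_report, wait_for_report) and the fetch callable are hypothetical, and the only status value relied on is Finished, taken from the sample response.

```python
import time
from typing import Callable, Optional


def find_metric_set_id(response: dict, name: str) -> Optional[str]:
    """Return the MetricSetId of the collection called `name`, if present."""
    for metric_set in response.get("MetricSets", []):
        if metric_set.get("MetricSetName") == name:
            return metric_set.get("MetricSetId")
    return None


def summarize_report(report: dict) -> str:
    """Describe one entry of the `Reports` list in a single line."""
    report_id = report.get("ReportId")
    if report.get("Status") != "Finished":
        return f"{report_id}: not finished (status {report.get('Status')})"
    issues = report.get("Issues", [])
    severity = report.get("Severity")
    if not issues:
        return f"{report_id}: finished, no issues (severity {severity})"
    return f"{report_id}: finished, {len(issues)} issue(s) (severity {severity})"


def wait_for_report(
    fetch: Callable[[str], dict],
    report_id: str,
    interval: float = 10.0,
    attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll `fetch` until the report is Finished; return None if it never is.

    `fetch` is a hypothetical callable that wraps DescribeDiagnosticReports
    and returns the parsed response for the given report ID.
    """
    for _ in range(attempts):
        response = fetch(report_id)
        for report in response.get("Reports", []):
            if report.get("ReportId") == report_id and report.get("Status") == "Finished":
                return report
        sleep(interval)
    return None


if __name__ == "__main__":
    # Trimmed copy of the sample DescribeDiagnosticReports response above.
    sample = {
        "Reports": [
            {
                "Status": "Finished",
                "Issues": [],
                "ReportId": "dr-uf6aacg5g2fjp64i****",
                "Severity": "Normal",
            }
        ]
    }
    report = wait_for_report(lambda _id: sample, "dr-uf6aacg5g2fjp64i****", interval=0)
    if report is not None:
        print(summarize_report(report))
```

Passing the fetch function as a parameter keeps the polling logic independent of any particular SDK, so the sketch can be exercised with canned responses before it is connected to real API calls.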
