When you go to the Application Overview page in the Enterprise Distributed Application Service (EDAS) console, you can specify a time range for a diagnostics test. The system then automatically diagnoses the status of your application within that time range. If an issue is found, a red shield icon (the diagnosis report icon) appears in the upper part of the Application Overview page. Click the icon to open the diagnosis report. You can then identify and fix the issue based on the fault definition and root cause analysis in the report.

Common scenarios of automatic fault diagnostics

Sudden increase in the RT
  • If a downstream application causes a sudden increase in the response time (RT) of an application, contact the owner of the downstream application for troubleshooting.
  • If an application change causes the sudden increase in the RT, review the specific changes for troubleshooting.
  • If a service of the application causes the sudden increase in the RT, perform the following checks:
    • Check whether the service threw exceptions during that period.
    • Check whether the upstream service that calls the service has a long RT.
    • Check whether the RT of a service that is called by the service is long.
  • If a single node causes the sudden increase in the RT, the cause may be one of the following issues on that node:
    • Full thread pool. A time series curve chart for the number of threads is provided in the diagnosis report.
    • Full garbage collection (GC) on the single node.
    • Disk read and write errors on the single node.
    • Out of memory (OOM) issues on the single node.
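To make the idea of a "sudden increase in the RT" concrete, the following is a minimal sketch of one way to flag such a spike in a series of RT samples. It is an illustration only and does not describe how the EDAS diagnostic model works. The function name `find_rt_spikes`, the sliding-window baseline, and the `window` and `factor` parameters are all assumptions made for this example.

```python
from statistics import mean


def find_rt_spikes(rt_ms, window=5, factor=2.0):
    """Return the indexes of samples whose RT exceeds `factor` times
    the mean of the preceding `window` samples.

    Hypothetical helper: the baseline method and thresholds are
    assumptions for illustration, not EDAS behavior.
    """
    spikes = []
    for i in range(window, len(rt_ms)):
        baseline = mean(rt_ms[i - window:i])
        if baseline > 0 and rt_ms[i] > factor * baseline:
            spikes.append(i)
    return spikes


# RT samples in milliseconds; the sixth sample jumps from about 100 ms to 400 ms.
samples = [100, 110, 95, 105, 100, 400, 102]
print(find_rt_spikes(samples))  # → [5]
```

A flagged index only tells you when the RT jumped. The causes listed above (downstream applications, application changes, a specific service, or a single node) still have to be checked in the diagnosis report.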

High proportion of error requests or a large number of requests
  • The number of error requests for a service of the application suddenly increases. This results in a high proportion of error requests.
  • A large number of requests and responses occur in a specific period. As a result, serialization and deserialization take a long time.
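The "proportion of error requests" is the number of error requests divided by the total number of requests in a period. The sketch below shows that arithmetic. The function names and the 5% threshold are hypothetical values chosen for illustration. The threshold that the EDAS diagnostic model uses is not documented here.

```python
def error_ratio(total_requests, error_requests):
    """Return the fraction of requests that failed in a period."""
    if total_requests <= 0:
        return 0.0
    return error_requests / total_requests


def is_error_ratio_high(total_requests, error_requests, threshold=0.05):
    """Return True if the error ratio exceeds `threshold`.

    The default threshold of 5% is an assumption for this example.
    """
    return error_ratio(total_requests, error_requests) > threshold


# 30 errors out of 200 requests is a 15% error ratio.
print(error_ratio(200, 30))          # → 0.15
print(is_error_ratio_high(200, 30))  # → True
```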

Excessive load on a host

Excessive load on the host reduces the capacity of the container to provide services.

Network issues

When the application is running, an exception occurs due to a network failure in a data center.

View the report of automatic fault diagnostics

  1. Log on to the EDAS console.
  2. Go to the Application Overview page by using one of the following methods:
    • In the left-side navigation pane, choose Resource Management > Container Service Kubernetes Clusters or Resource Management > Serverless Kubernetes Clusters. In the top navigation bar, select a region. In the upper part of the page, select a microservice namespace. On the Container Service Kubernetes Clusters or Serverless Kubernetes Clusters page, click a cluster ID. In the Applications section of the Cluster Details page, click the name of the application that you want to manage.
    • In the left-side navigation pane, click Applications. In the top navigation bar, select a region. In the upper part of the page, select a microservice namespace. Select Container Service or Serverless Kubernetes Cluster from the Cluster Type drop-down list and click the name of the application that you want to manage.
  3. On the Overall Analysis tab of the Application Overview page, specify a time range for a diagnostics test in the upper-right corner.
    Notice: If an exception is detected in the application within the specified time range, a red shield icon (the diagnosis report icon) appears on the right side of the application name in the upper part of the page. If no fault is detected, this does not mean that the application has no potential issues.
  4. In the upper part of the Application Overview page, click the diagnosis report icon on the right side of the application name.
  5. View the fault symptom and cause analysis in the diagnosis report.

Composition of a diagnosis report

The diagnosis report consists of four parts: diagnosis details, fault definition, root cause analysis, and data support.
  • Diagnosis details: This part includes the application that is diagnosed, the diagnostic time, and the fault symptom.
  • Fault definition: This part contains the surface-level causes of application failures that are inferred by the diagnostic model. Generally, the causes fall into the following three types:
    • An instance error of the application causes an overall failure.
    • An API or service error of the application causes an overall failure.
    • A failure of a downstream application causes the application to fail.
  • Root cause analysis: This part contains the underlying causes that are inferred by the diagnostic model. Many underlying causes are possible, and they vary based on the actual situation.
  • Data support: This part contains the data that supports the inference. The diagnosis reports of different faults contain different analysis data.
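If you record findings from diagnosis reports in your own tooling, the four parts described above map naturally onto a small data structure. The following sketch is a hypothetical model for that purpose. The class names, field names, and sample values are assumptions made for this example and do not correspond to an EDAS API or export format.

```python
from dataclasses import dataclass, field


@dataclass
class DiagnosisDetails:
    application: str      # the application that is diagnosed
    diagnostic_time: str  # the time range of the diagnostics test
    fault_symptom: str    # for example, a sudden increase in the RT


@dataclass
class DiagnosisReport:
    details: DiagnosisDetails
    fault_definition: str     # surface-level cause inferred by the model
    root_cause_analysis: str  # underlying cause inferred by the model
    data_support: dict = field(default_factory=dict)  # varies by fault

    def summary(self):
        """Return a one-line summary for logs or tickets."""
        return (f"{self.details.application}: "
                f"{self.details.fault_symptom} ({self.fault_definition})")


# Sample values are invented for illustration.
report = DiagnosisReport(
    details=DiagnosisDetails("demo-app", "10:00-10:30", "sudden RT increase"),
    fault_definition="instance error",
    root_cause_analysis="full thread pool on a single node",
    data_support={"thread_count_peak": 200},
)
print(report.summary())  # → demo-app: sudden RT increase (instance error)
```

The `data_support` field is left as a free-form dictionary because the reports of different faults contain different analysis data.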