When you go to the Application Overview page in the Enterprise Distributed Application Service (EDAS) console, the system automatically diagnoses the running status of your application for the time range that you specify. If an issue is found, a red shield icon (the diagnosis report icon) flashes at the top of the Application Overview page. Click the icon to open the diagnosis report. You can then locate and fix the issue based on the fault definition and root cause analysis in the report.

Common scenarios of automatic fault diagnostics

Sudden increase in response time (RT)
  • If a downstream business causes the sudden increase in RT of an application, contact the owner of the downstream business for troubleshooting.
  • If an application change causes the sudden increase in RT, review the specific changes for troubleshooting.
  • If a service of the application causes the sudden increase in RT, check the following items:
    • Check whether the service threw an exception at that time.
    • Check whether the upstream service that calls the service has a high RT.
    • Check whether a downstream service that is called by the service has a high RT.
  • If an issue on a single node causes the sudden increase in RT, the cause may be one of the following:
    • A full thread pool on the single node. In this case, the diagnosis report provides a time series chart of the number of threads.
    • Full garbage collection (GC) on the single node.
    • Disk read and write errors on the single node.
    • Out of memory (OOM) issues on the single node.
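To make "sudden increase in RT" concrete, the following minimal sketch flags RT samples that exceed a multiple of the recent median. It is not the EDAS diagnostic model, which this topic does not describe. The function name find_rt_spikes, the window size, and the factor of 2.0 are assumptions chosen only for illustration.

```python
from statistics import median


def find_rt_spikes(rt_ms, window=5, factor=2.0):
    """Return the indexes of samples that look like a sudden RT increase.

    Hypothetical rule (an assumption, not the EDAS algorithm): a sample is a
    spike if it is greater than `factor` times the median of the `window`
    samples that precede it.
    """
    spikes = []
    for i in range(window, len(rt_ms)):
        baseline = median(rt_ms[i - window:i])
        if baseline > 0 and rt_ms[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Illustrative RT samples in milliseconds (made-up values).
samples = [100, 105, 98, 102, 101, 99, 350, 340, 103]
print(find_rt_spikes(samples))  # [6, 7]
```

The median is used as the baseline so that one or two slow samples do not immediately raise the baseline and hide the samples that follow them.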
High proportion of error requests or a large number of requests
  • The number of error requests for a service of the application suddenly increases. This results in a high proportion of error requests.
  • Requests and responses are unusually numerous in a specific period. As a result, serialization and deserialization take a long time.
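The proportion of error requests is the number of error requests divided by the total number of requests. The following sketch computes it per service and lists the services that exceed a threshold. The function names, the sample service names, and the 5% threshold are assumptions for illustration. This topic does not state the threshold that EDAS uses.

```python
def error_ratio(total_requests, error_requests):
    """Return the proportion of error requests, or 0.0 if there is no traffic."""
    if total_requests <= 0:
        return 0.0
    return error_requests / total_requests


def services_over_threshold(counts, threshold=0.05):
    """Return the sorted names of services whose error ratio exceeds `threshold`.

    `counts` maps a service name to a (total_requests, error_requests) tuple.
    The default threshold is an assumed value, not an EDAS setting.
    """
    return sorted(
        name
        for name, (total, errors) in counts.items()
        if error_ratio(total, errors) > threshold
    )


# Illustrative counters (made-up service names and values).
counts = {
    "order-service": (1000, 120),
    "user-service": (2000, 10),
    "idle-service": (0, 0),
}
print(services_over_threshold(counts))  # ['order-service']
```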
Excessive loads on a host

Excessive loads on the host reduce the capability of the container to provide services.

Network issues

A running exception occurs in the application due to a network failure in a data center.

View the report of automatic fault diagnostics

  1. Log on to the EDAS console.
  2. Perform one of the following operations as needed to go to the application details page:
    • In the left-side navigation pane, choose Resource Management > Container Service Kubernetes Clusters or Resource Management > Serverless Kubernetes Clusters. In the top navigation bar, select a region. In the upper part of the page, select a namespace. On the Container Service Kubernetes Clusters or Serverless Kubernetes Clusters page, click a cluster ID. In the Applications section of the Cluster Details page, click the name of the application that you want to manage.
    • In the left-side navigation pane, click Applications. In the top navigation bar, select a region. In the upper part of the page, select a namespace. Select Container Service or Serverless Kubernetes Cluster from the Cluster Type drop-down list, and then click the name of the application that you want to manage.
  3. On the Overall Analysis tab of the Application Overview page, specify the diagnostic time range in the upper-right corner.
    Notice: If a running exception is diagnosed within the specified time range, the red shield icon (the diagnosis report icon) appears to the right of the application name in the upper part of the page. If no fault is diagnosed, this does not mean that the application has no potential problems.
    1. In the upper part of the Application Overview page, click the diagnosis report icon to the right of the application name.
    2. View the fault symptom and cause analysis in the diagnosis report.

Composition of a diagnosis report

The diagnosis report consists of four parts: diagnosis details, fault definition, root cause analysis, and data support.
  • Diagnosis details: This part lists the diagnosed application, the diagnostic time, and the fault symptom.
  • Fault definition: This part contains the surface-level causes of the application failure that the diagnostic model infers. These causes generally fall into the following three types:
    • An instance error of the application causes an overall failure.
    • An interface or service error of the application causes an overall failure.
    • A failure of a downstream application causes the application to fail.
  • Root cause analysis: This part contains the underlying causes that the diagnostic model infers. Many underlying causes are possible, and they vary based on the actual situation.
  • Data support: This part contains the data that supports the inference. The diagnosis reports of different faults contain different analysis data.
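
If you record the findings of a diagnosis report outside the console, for example in an incident ticket, the four parts described above can be modeled as a small data structure. The following is a minimal sketch. All class names, field names, and sample values are hypothetical. EDAS displays the report in the console, and this topic does not define an export format or schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class FaultDefinition(Enum):
    """The three general fault definitions listed in this topic."""
    INSTANCE_ERROR = "An instance error of the application"
    SERVICE_ERROR = "An interface or service error of the application"
    DOWNSTREAM_FAILURE = "A failure of a downstream application"


@dataclass
class DiagnosisDetails:
    application: str
    diagnostic_time: str
    fault_symptom: str


@dataclass
class DiagnosisReport:
    """Hypothetical record that mirrors the four parts of a diagnosis report."""
    details: DiagnosisDetails
    fault_definition: FaultDefinition
    root_causes: list = field(default_factory=list)   # free-text underlying causes
    data_support: dict = field(default_factory=dict)  # varies by fault type


def summarize(report):
    """Return a one-line summary that is suitable for a ticket title."""
    d = report.details
    return f"{d.application}: {d.fault_symptom} ({report.fault_definition.name})"


# Made-up sample values, not real EDAS output.
report = DiagnosisReport(
    details=DiagnosisDetails("demo-app", "10:00-10:30", "Sudden increase in RT"),
    fault_definition=FaultDefinition.INSTANCE_ERROR,
    root_causes=["Full GC on a single node"],
)
print(summarize(report))  # demo-app: Sudden increase in RT (INSTANCE_ERROR)
```

The data_support field is left as an untyped dictionary because the diagnosis reports of different faults contain different analysis data.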