On the Application Overview page of the Enterprise Distributed Application Service
(EDAS) console, you can specify a time range for a diagnostics test. The system then
automatically diagnoses the status of your application within that time range.
If an issue is found, a red shield icon appears in the upper part of the Application
Overview page. Click the icon to open the diagnosis report. You can identify and fix
the issue based on the fault definition and root cause analysis in the report.
Common scenarios of automatic fault diagnostics
Sudden increase in the RT
- If downstream business causes a sudden increase in the response time (RT) of an application, you can contact the person in charge of the downstream business for troubleshooting.
- If an application change causes the sudden increase in the RT, you can view the specific changes for troubleshooting.
- If a service of the application causes a sudden increase in the RT, check the following items:
  - Check whether the service threw an exception at that time.
  - Check whether the service that calls this service has a long RT.
  - Check whether a service that is called by this service has a long RT.
- If a single node causes the sudden increase in the RT, check for the following issues on that node:
  - Full thread pool. The diagnosis report provides a time series chart of the number of threads.
  - Full garbage collection (GC).
  - Disk read and write errors.
  - Out of memory (OOM) issues.
- The number of error requests for a service of the application suddenly increases. This results in a high proportion of error requests.
- A large number of requests and responses that occur in a specific period account for a high proportion. As a result, serialization and deserialization consume a long time.
Excessive loads on the host reduce the capability of the container to provide services.
Network issues
When the application is running, an exception occurs due to a network failure in a data center.
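The troubleshooting paths for a sudden RT increase described in this section can be kept as a simple lookup, for example in a runbook script. The following Python sketch is only an illustration. `RtCause`, `NEXT_STEPS`, and `next_steps` are hypothetical names and are not part of EDAS. The cause itself must still be read from the diagnosis report.

```python
from enum import Enum
from typing import Dict, List


class RtCause(Enum):
    """Hypothetical labels for the causes of a sudden RT increase (not EDAS identifiers)."""
    DOWNSTREAM = "downstream business"
    APP_CHANGE = "application change"
    SERVICE = "service of the application"
    SINGLE_NODE = "single node"


# Follow-up actions, paraphrased from the scenarios in this section.
NEXT_STEPS: Dict[RtCause, List[str]] = {
    RtCause.DOWNSTREAM: [
        "Contact the person in charge of the downstream business.",
    ],
    RtCause.APP_CHANGE: [
        "Review the specific application changes.",
    ],
    RtCause.SERVICE: [
        "Check whether the service threw an exception at that time.",
        "Check whether the service that calls this service has a long RT.",
        "Check whether a service called by this service has a long RT.",
    ],
    RtCause.SINGLE_NODE: [
        "Check for a full thread pool (see the thread count chart in the report).",
        "Check for full GC on the node.",
        "Check for disk read and write errors on the node.",
        "Check for OOM issues on the node.",
    ],
}


def next_steps(cause: RtCause) -> List[str]:
    """Return the follow-up checks for a diagnosed cause of an RT increase."""
    return NEXT_STEPS[cause]
```

Calling `next_steps(RtCause.SINGLE_NODE)` returns the four node-level checks in the order listed above.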
View the report of automatic fault diagnostics
Composition of a diagnosis report
- Diagnosis details: This part consists of the application that is diagnosed, the diagnostic time, and the fault symptom.
- Fault definition: This part contains the surface-level causes of application failures that
are inferred by the diagnostic model. Generally, one of the following three causes is reported:
  - An instance error of the application causes an overall failure.
  - An API or service error of the application causes an overall failure.
  - A failure of a downstream application causes the application to fail.
- Root cause analysis: This part contains the underlying causes that are inferred by the diagnostic model. Many underlying causes are possible, and they vary based on the actual situation.
- Data support: This part contains the data that supports the inference. The diagnosis reports of different faults contain different analysis data.
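If you copy diagnosis results into your own incident notes, the four parts of the report map naturally onto a small record type. The following Python sketch is an assumption for illustration only. EDAS shows the report in the console, and the class names, field names, and `summarize` helper here are hypothetical, not an EDAS API or export format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class FaultDefinition(Enum):
    """The three surface-level causes listed under "Fault definition"."""
    INSTANCE_ERROR = "instance error"
    API_OR_SERVICE_ERROR = "API or service error"
    DOWNSTREAM_FAILURE = "downstream application failure"


@dataclass
class DiagnosisDetails:
    """Diagnosis details: what was diagnosed, when, and the observed symptom."""
    application: str
    diagnostic_time: str
    fault_symptom: str


@dataclass
class DiagnosisReport:
    """Hypothetical container mirroring the four parts of a diagnosis report."""
    details: DiagnosisDetails
    fault_definition: FaultDefinition
    # Root causes vary by situation, so they are kept as free-form text.
    root_causes: List[str] = field(default_factory=list)
    # Supporting data differs per fault, so it is kept as an open mapping.
    data_support: Dict[str, object] = field(default_factory=dict)


def summarize(report: DiagnosisReport) -> str:
    """Build a one-line summary for an incident note."""
    causes = "; ".join(report.root_causes) or "not determined"
    return (
        f"{report.details.application}: "
        f"{report.fault_definition.value} (root cause: {causes})"
    )
```

Root causes and supporting data are left loosely typed on purpose, because the report contents differ from fault to fault.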