
Well-Architected Framework: Observability

Last Updated: Jul 15, 2025

Change observability is the ability to detect, in real time and while a change is still being executed, any unexpected anomaly in online business that the change triggers, as surfaced by monitoring, alerts, logs, and similar signals. It is one of the most effective ways for change executors to discover problems proactively and promptly and to reduce the blast radius of major failures. Change observability is a basic tool for change executors and the most fundamental requirement of any change system.

Three Principles of Change Observability

  • Effective observation during change execution: The change system progressively enforces strong controls over each change, and observation starts with the first execution batch.

  • Observation during every gray-release batch: Changes must be observed throughout their execution, and the current batch must be confirmed free of anomalies before the next batch begins.

  • Sufficient observation interval for each batch: Each business can set an observation interval suited to its own characteristics and experience, so that no batch is promoted before it has been observed long enough.
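The three principles can be illustrated with a minimal sketch of a batch-gated gray release, assuming the change system exposes a way to apply one batch and a health check backed by monitoring data. The names `run_gray_release`, `apply_batch`, and `is_healthy` are hypothetical placeholders, not part of any Alibaba Cloud API. Observation starts with the first batch, lasts a configurable interval per batch, and the rollout stops at the first batch that shows an anomaly.

```python
from __future__ import annotations

import math
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class BatchResult:
    batch_index: int
    healthy: bool


def run_gray_release(
    batches: Sequence[Sequence[str]],
    apply_batch: Callable[[Sequence[str]], None],  # hypothetical: pushes the change to one batch
    is_healthy: Callable[[Sequence[str]], bool],   # hypothetical: queries monitoring/alerts for the batch
    observation_seconds: float,                    # per-business observation interval
    poll_seconds: float,
    sleep: Callable[[float], None] = time.sleep,   # injectable so the sketch can be tested quickly
) -> list[BatchResult]:
    """Apply batches in order; observe each one before promoting the next."""
    results: list[BatchResult] = []
    # Number of health checks needed to cover the whole observation interval.
    checks = max(1, math.ceil(observation_seconds / poll_seconds))
    for index, batch in enumerate(batches):
        apply_batch(batch)
        healthy = True
        # Observation begins immediately, including for the very first batch.
        for _ in range(checks):
            sleep(poll_seconds)
            if not is_healthy(batch):
                healthy = False
                break
        results.append(BatchResult(index, healthy))
        if not healthy:
            # Do not proceed to the next batch while the current one is abnormal.
            break
    return results
```

Keeping the interval and polling period as parameters reflects the idea that each business chooses its own observation length; in practice `is_healthy` would wrap whatever alerting or metric query the team already relies on.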

Levels of Observability

Observability coverage can be assessed by combining what is monitored with how it is monitored, and it is divided into four layers:

  • Infrastructure monitoring: Focuses on the operation of data centers, networks, and other infrastructure. In a cloud Kubernetes environment, it also covers performance monitoring of host nodes and basic network components. This coverage can be achieved with Alibaba Cloud CloudMonitor, for example by viewing node load and CPU, memory, and network usage.

  • System application monitoring: Focuses on the operation of instances, middleware, and other basic services. This coverage can also be achieved with CloudMonitor, and Alibaba Cloud's managed Prometheus service (ARMS Prometheus) can meet the observability requirements of cloud-native metrics.

  • Business monitoring: Business status data, such as the number of interface requests, success rate, and response time, is collected inside the application and turned into business-level metrics that reflect the health of the business. Alibaba Cloud ARMS provides a non-intrusive, visual way to define business requests, along with rich performance metrics and diagnostic capabilities tailored to the business. Alibaba Cloud Log Service (SLS) can also serve as an observation solution for custom metrics: users can define the content and format of their application logs, collect them through Log Service, and configure business dashboards in Log Service to observe their business or perform system audits.

  • User feedback monitoring: Collects user feedback on feature availability through public opinion, customer complaints, and similar channels.
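To make the business monitoring layer concrete, here is a minimal sketch that derives request count, success rate, and average response time from per-request records and flags a regression against a pre-change baseline. The `Request` record shape, the `summarize` and `regressed` helpers, and the default thresholds are all hypothetical; in a real setup the records would come from whatever ARMS or Log Service data the business already collects.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Request:
    success: bool       # hypothetical: whether the business request succeeded
    latency_ms: float   # hypothetical: response time in milliseconds


@dataclass(frozen=True)
class BusinessMetrics:
    request_count: int
    success_rate: float
    avg_latency_ms: float


def summarize(requests: Iterable[Request]) -> BusinessMetrics:
    """Aggregate raw request records into business-level metrics."""
    reqs = list(requests)
    if not reqs:
        # No traffic means no evidence either way; let the caller decide.
        raise ValueError("no requests to summarize")
    n = len(reqs)
    successes = sum(1 for r in reqs if r.success)
    total_latency = sum(r.latency_ms for r in reqs)
    return BusinessMetrics(n, successes / n, total_latency / n)


def regressed(
    baseline: BusinessMetrics,
    current: BusinessMetrics,
    max_success_drop: float = 0.01,   # hypothetical threshold: 1 percentage point
    max_latency_ratio: float = 1.2,   # hypothetical threshold: 20% slower
) -> bool:
    """Return True if the current metrics are noticeably worse than the baseline."""
    success_dropped = current.success_rate < baseline.success_rate - max_success_drop
    slower = current.avg_latency_ms > baseline.avg_latency_ms * max_latency_ratio
    return success_dropped or slower
```

Comparing against a baseline captured before the change, rather than a fixed absolute target, keeps the check tied to the business's own normal behavior; the thresholds would need tuning per service.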