Cloud-Native Observability Suite: Building Ubiquitous Observability Infrastructure

This article explains how observability makes cloud computing easier to use and more efficient while maximizing business stability, security, and economic value.

Observability Becomes the Strongest Support for Data-Driven Decision-Making

Recently, Gartner released a report entitled Top Ten Strategic Technology Trends in 2023. The report focuses on optimization, expansion, and development, and applied observability is once again one of the featured trends.

Frances Karamouzis (Vice President of Gartner) said, “To enhance their organization's financial position during times of economic turbulence, CIOs and IT executives must look beyond cost savings to new forms of operational excellence while continuing to accelerate digital transformation. Applied observability feeds observable artifacts back in a highly orchestrated and integrated approach to accelerate organizational decision-making. When planned strategically and executed successfully, applied observability is the most powerful source of data-driven decision-making.”

However, as IT evolves rapidly, enterprises tend to run into three obstacles when adopting observability. First, the open-source and commercial observability ecosystem is booming while traditional enterprise monitoring systems are increasingly unable to meet cloud-native O&M needs, which leaves old and new tools, and data and tools, disconnected from one another. CTOs and CIOs must decide what to choose and how to strike a balance. Second, as microservices and distributed architectures spread through enterprise businesses, the computing and storage costs of typical observability data (such as logs) increase exponentially, so at a time when industry conditions are tightening, the cost of observability is both high and unpredictable. Third, observability is often applied only to single-point troubleshooting or basic monitoring alerts, so the return on building observability infrastructure is unclear. Taken together, these points make it hard to convince CTOs and CIOs to commit increasingly tight O&M budgets and staff to observability.

Alibaba Cloud, which has invested heavily in the field of observability, launched the cloud-native observability suite (ACOS) in June to address these problems. The suite consists of the Alibaba Cloud Prometheus service, the Alibaba Cloud Grafana service, and OpenTelemetry-based Tracing Analysis. Prometheus, Grafana, and OpenTelemetry, the three de facto standards with the highest open-source popularity and the widest ecosystem integration, form the core of ACOS. The suite aims to connect all Alibaba Cloud products and standardize full-link data on open standards. It also connects to an enterprise's existing observability data assets and integrates with the Alibaba Cloud application management platform.

The suite covers scenarios such as user experience (UEM), application observability (APM), cloud service observability, cost management, and emergency collaboration efficiency. It helps enterprises efficiently build an open, high-quality, low-cost, and unified observability system.


Unique Value of Cloud-Native ACOS

Compared with other commercial or open-source observability solutions, the cloud-native observability suite is fully compatible with, and optimized for, open-source standards in six areas: collection, storage, computing, alerting, query, and visualization. It also packages the observability experience that Alibaba Group and Alibaba Cloud have gained from serving a large number of users into the product, including preset templates for operational metrics, dashboards, and alert rules for more than 50 mainstream Alibaba Cloud services. As a result, full-link, high-quality observability is available from the first day of access, covering infrastructure, containers, applications, user experience, cost analysis, and O&M performance analysis.

Cloud-Native ACOS Upgrade

Three major components of ACOS also received important upgrades during the Apsara Conference 2022.


First, Alibaba Cloud Prometheus, the de facto standard for container observability, extends its scope from containers alone to full-stack observability. It has become the default observability infrastructure for more than 50 Alibaba Cloud services, helping more enterprises build a unified observability system. It is integrated with the APM, eBPF, and OpenTelemetry metrics of the Application Real-Time Monitoring Service (ARMS). It also aggregates Prometheus instances from ECS (non-Kubernetes) environments, Kubernetes clusters, and non-Alibaba Cloud clusters, so enterprises can build a unified observability center across global, heterogeneous architectures.

While serving external customers, Alibaba Cloud Prometheus has also been refined through internal scenarios. It currently supports container observability at the scale of tens of millions of cores, and time series storage and computing at the scale of billions of time series. The core technical difficulties of time series monitoring (such as divergence and convergence of high-cardinality series, long-range queries, and false or missed alerts under bursty traffic) have been optimized, making Prometheus a ubiquitous observability infrastructure that is ready for large-scale production use.
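To make the high-cardinality problem concrete: in the Prometheus data model, every distinct combination of label values on a metric is its own time series, so a single unbounded label can multiply the series count. The following is a minimal Python sketch of that arithmetic, not Alibaba Cloud Prometheus code. The metric name, the label names, and the `cardinality_report` helper are all hypothetical.

```python
from collections import defaultdict


def cardinality_report(samples):
    """Count distinct series and distinct values per label.

    `samples` is an iterable of (metric_name, labels_dict) pairs.
    A series is identified by its metric name plus its full label set.
    """
    series = set()
    values_per_label = defaultdict(set)
    for name, labels in samples:
        # Sort label pairs so that dict ordering cannot create fake series.
        series.add((name, tuple(sorted(labels.items()))))
        for key, value in labels.items():
            values_per_label[key].add(value)
    return {
        "series": len(series),
        "label_values": {k: len(v) for k, v in values_per_label.items()},
    }


# Hypothetical samples: 2 methods x 3 status codes x 50 user IDs.
samples = [
    ("http_requests_total", {"method": m, "status": s, "user_id": str(u)})
    for m in ("GET", "POST")
    for s in ("200", "404", "500")
    for u in range(50)
]

report = cardinality_report(samples)
print(report["series"])        # 300 series in total
print(report["label_values"])  # user_id contributes 50 of the distinct values
```

In this example, dropping or bucketing the `user_id` label would shrink the metric from 300 series to 6, which is one common way to bring a diverging metric back under control.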

Alongside these capabilities, Alibaba Cloud Prometheus offers a new subscription billing method, which reduces the average cost by 60% compared with self-built deployments. It meets the observability requirements of businesses of different scales and minimizes enterprise O&M costs.


Second, the Alibaba Cloud Grafana service, which serves as the observability interface, is being upgraded to version 9.0. The new Prometheus and Loki query builders and the enhanced Explore feature lower the barrier to querying and analyzing data and to creating visualized dashboards and alerts. In addition, Grafana is integrated with over 20 observability storage services to handle a richer set of heterogeneous observability data sources (such as Log Service and Elasticsearch) and help enterprises build a unified interface for O&M and business observability. Enterprise features are also enhanced, such as one-click import and export of user-created instances, automatic data export reports, one-click data backup and recovery, and ActionTrail auditing of user operations.


Finally, ARMS receives a major upgrade that gives enterprises a multi-dimensional view of their cloud applications. For data collection, the OpenTelemetry SDK is fully supported. In addition, metric data can be stored and computed using the Prometheus standard, so teams can add business-specific and custom component instrumentation points. This avoids vendor lock-in while adding observability dimensions. TraceExplorer also provides unified querying of traces from multiple sources.
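Storing custom business metrics using the Prometheus standard means, in practice, exposing them in a form that Prometheus can scrape. As an illustration only, the following dependency-free Python sketch renders a counter in the Prometheus text exposition format. The metric name `orders_created_total`, its labels, and the `render_counter` helper are hypothetical, and a real service would normally rely on an OpenTelemetry or Prometheus client library instead of formatting the text by hand.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render a counter in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs. `help_text` is
    assumed to contain no backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        # Sort labels so the output is stable across runs.
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical business metric with two labels.
text = render_counter(
    "orders_created_total",
    "Number of orders created.",
    [({"channel": "web", "region": "cn-hangzhou"}, 42)],
)
print(text)
```

Because the output is plain Prometheus text, any Prometheus-compatible backend can ingest it, which is the portability argument behind avoiding vendor lock-in.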

The Alibaba Cloud Observability Team is also actively exploring eBPF and Continuous Profiling, two of the most active areas in the field. During the Apsara Conference 2022, the team released a preview of lightweight application monitoring based on eBPF, which helps enterprises quickly obtain non-intrusive application monitoring for all languages and see the global topology of a cluster in a timely manner.

In addition, the Continuous Profiling feature was launched jointly with the Alibaba Dragonwell Team. It continuously analyzes code-level performance overhead at very low cost and covers details that traditional traces, metrics, and logs cannot. It makes it possible to locate code-level performance problems in production and to run active analysis around the clock, giving application observability more perspectives and finer granularity.


While exploring more observability scenarios to serve Alibaba Group and a large number of enterprise users, the team has been recognized by industry organizations in China and abroad for its complete product capabilities, strong ecosystem integration, and cost advantages. In 2022, ARMS was among the first batch of products to obtain advanced-level certification for observability products from the China Academy of Information and Communications Technology. Alibaba Cloud has also been included in the Gartner Magic Quadrant for APM and Observability for two consecutive years, and in 2022 it was the only Chinese vendor selected.

In an era when everything can be moved to the cloud, observability makes cloud computing easier to use and more efficient, maximizing business stability, security, and economic value. Observability skills have become an essential competency for every IT practitioner. Observability helps enterprises analyze, gain insight, and achieve high-quality decision-making and business innovation. Alibaba Cloud will continue to promote the evolution and implementation of observability technologies to help enterprises obtain the most cost-effective observability and realize high-quality digital transformation and innovation.
