From System Monitoring to Business Insights: A Comprehensive Analysis of the Custom Metric Collection Feature of ARMS

This article introduces Alibaba Cloud ARMS' custom metric collection capability.

Introduction

In the wave of digital transformation, application performance monitoring (APM) has become a cornerstone of stable system operation. However, traditional APM systems provide only system-level performance data and offer little visibility into the business itself. The custom metric collection feature of Alibaba Cloud Application Real-Time Monitoring Service (ARMS) removes this limitation so that monitoring can directly support business growth.

1. Why Is Custom Metric Collection Required?

1.1 Monitoring Blind Spots of Traditional APM Systems

Traditional APM systems generally focus on the following system-level metrics:

● CPU utilization and memory usage

● Request response time and throughput

● Database query performance

● API call success rate

These metrics are designed to diagnose system performance issues, errors, and slow responses. They rarely reflect how the business itself is running. As a result, monitoring blind spots occur in business scenarios such as the following:

Scenario 1: E-commerce sales promotions

During sales promotions such as Double 11, the CPU and memory metrics of the system may look normal. However, business issues such as a sudden drop in the order conversion rate or an anomaly in the payment success rate often cannot be detected in time from system metrics alone.

Scenario 2: E-commerce system operation

For an e-commerce system, the key business metrics include:

● Real-time order quantity and order amount

● Product inventory

● Conversion rate from the shopping cart

● Coupon usage

● Refund rate

These business metrics directly reflect business health and operational efficiency. However, traditional APM systems cannot collect them.

Scenario 3: Financial risk control system

A financial system needs to monitor the following metrics in real time:

● Number of transactions and transaction amount

● Risk blocking rate

● Percentage of abnormal transactions

● Capital turnover speed

These metrics are critical to business decisions. However, traditional APM systems cannot collect them either.

1.2 Value of Custom Metric Collection

ARMS provides the custom metric collection feature, which brings the following core value:

Business observability: Business metrics and system metrics are monitored in a unified manner to form a complete observability system.

Quick issue identification: When a business anomaly occurs, it can be correlated with system metrics to quickly and accurately locate the root cause.

Data-driven decision-making: Real-time business metrics provide data support for operations and product decisions.

End-to-end tracing: The combination of business metrics and traces enables end-to-end business process monitoring.

2. Comparison of Common Metric Definition Frameworks in Java

In the Java ecosystem, there are multiple mature metric collection frameworks. Understanding their characteristics helps you choose an appropriate technical solution.

2.1 Micrometer

Introduction: Micrometer is a vendor-neutral metrics facade, similar to what SLF4J is for logging, and is the default metrics library in Spring Boot.

Core features:

● Provides a unified API and supports multiple monitoring system backends, such as Prometheus, InfluxDB, and Datadog.

● Deeply integrates with Spring Boot.

● Supports dimensional metrics, such as tags or labels.

Sample code:

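import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
// OrderService is an illustrative class name.
@Service
public class OrderService {
    @Autowired
    MeterRegistry registry;
    public void processOrder(Order order) {
        // register() returns the existing counter when this name and tag set is already registered.
        Counter.builder("orders.processed")
            .tag("status", order.getStatus())
            .tag("channel", order.getChannel())
            .register(registry)
            .increment();
    }
}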

Advantages:

● ✅ Supports multiple backends. One set of code can be compatible with multiple monitoring systems.

● ✅ Supports automatic configurations of Spring Boot, enabling out-of-the-box use.

● ✅ Supports dimensional metrics for flexible queries.

● ✅ Active in the community and continuously updated.

Disadvantages:

● ❌ Most of its out-of-the-box convenience depends on the Spring ecosystem.

● ❌ Does not support distributed tracing and logging.

● ❌ Complex configurations.

● ❌ Lacks unified observability standards.

Scenarios: Spring Boot microservices applications

2.2 Prometheus clients

Introduction: Prometheus clients are Java client libraries provided by Prometheus. A Prometheus client can be directly connected to the Prometheus ecosystem and is a preferred solution for many components in the Kubernetes ecosystem to expose metrics.

Core features:

Native integration: seamlessly integrates with the Prometheus monitoring system.

Pull model: the Prometheus server actively scrapes metrics from the application, so the application does not need to push them.

Powerful query: supports powerful query and aggregation capabilities of Prometheus Query Language (PromQL).

Rich ecosystem: supports the Grafana visualization tool and Alertmanager alerts.

Sample code:

import io.prometheus.client.Counter;
import io.prometheus.client.Gauge;
import io.prometheus.client.Histogram;
public class OrderMetrics {
    // Define a counter to record the total number of orders.
    private static final Counter orderCounter = Counter.build()
        .name("orders_total")
        .help("Total number of orders")
        .labelNames("status", "channel")  // Define labels
        .register();
    // Define a gauge to record the number of orders that are being processed.
    private static final Gauge processingOrders = Gauge.build()
        .name("orders_processing")
        .help("Number of orders currently processing")
        .register();
    // Define a histogram to record the statistics on order amount distribution.
    private static final Histogram orderAmount = Histogram.build()
        .name("order_amount")
        .help("Order amount distribution")
        .buckets(50, 100, 200, 500, 1000, 5000)  // Custom buckets
        .register();
    public void processOrder(Order order) {
        // Number of total orders + 1, with labels.
        orderCounter.labels(order.getStatus(), order.getChannel()).inc();
        // Record the order amount.
        orderAmount.observe(order.getAmount());
        // Number of orders that are being processed + 1.
        processingOrders.inc();
        try {
            // Order processing logic...
        } finally {
            // Processing completed. Gauge - 1.
            processingOrders.dec();
        }
    }
}

Maven dependencies:

<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient</artifactId>
    <version>0.16.0</version>
</dependency>
<!-- Used to expose an HTTP endpoint. -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_servlet</artifactId>
    <version>0.16.0</version>
</dependency>

Expose the metric endpoint (Spring Boot):

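import io.prometheus.client.exporter.MetricsServlet;
import org.springframework.boot.web.servlet.ServletRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class PrometheusConfig {
    // Register the servlet from simpleclient_servlet at the /metrics path.
    @Bean
    public ServletRegistrationBean<MetricsServlet> metricsServlet() {
        return new ServletRegistrationBean<>(
            new MetricsServlet(), "/metrics"
        );
    }
}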

Visit http://localhost:8080/metrics to view metric data in the Prometheus format.
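For orientation, the text returned by that endpoint follows the Prometheus text exposition format: a `# HELP` line, a `# TYPE` line, and then one sample line per label combination. The following JDK-only sketch mimics that layout for a counter. It is not the client library's implementation, it skips label-value escaping, and `ExpositionSketch` and `renderCounter` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that mimics the layout of the Prometheus text format.
// The real client library generates this output for you.
final class ExpositionSketch {

    // Renders one counter family: HELP line, TYPE line, then one sample per label set.
    static String renderCounter(String name, String help,
                                Map<Map<String, String>, Double> samples) {
        StringBuilder out = new StringBuilder();
        out.append("# HELP ").append(name).append(' ').append(help).append('\n');
        out.append("# TYPE ").append(name).append(" counter\n");
        for (Map.Entry<Map<String, String>, Double> sample : samples.entrySet()) {
            StringJoiner labels = new StringJoiner(",", "{", "}");
            sample.getKey().forEach((k, v) -> labels.add(k + "=\"" + v + "\""));
            out.append(name).append(labels).append(' ')
               .append(sample.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("status", "paid");   // illustrative label values
        labels.put("channel", "app");
        Map<Map<String, String>, Double> samples = new LinkedHashMap<>();
        samples.put(labels, 42.0);
        System.out.print(renderCounter("orders_total", "Total number of orders", samples));
    }
}
```

Each distinct label combination becomes its own line, which is why label design directly determines how many time series Prometheus stores.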

Advantages:

● ✅ Natively integrates with the Prometheus ecosystem.

● ✅ Supports the pull model. Applications do not need to actively push metrics.

● ✅ Supports powerful query features and complex aggregation and calculation of PromQL.

● ✅ Seamlessly connects to visualization tools such as Grafana.

● ✅ Supports flexible label mechanisms for multi-dimensional queries.

● ✅ Lightweight framework and low performance overhead.

Disadvantages:

● ❌ Only metrics can be collected. Distributed tracing and logging are not supported.

● ❌ The deployment of the pull model is complex in some network environments (port exposure required).

● ❌ Integration with non-Prometheus monitoring systems requires additional configurations.

● ❌ Data persistence depends on Prometheus servers. Prometheus clients do not store historical data.

● ❌ The automatic instrumentation capability is not provided. All metrics must be manually defined.

Scenarios:

● Teams that use the Prometheus monitoring system

● Cloud-native applications in Kubernetes environments

● Monitoring scenarios that require powerful query capabilities

● Projects with preferred open source solutions

Prometheus advantages compared to other frameworks:

1.  Pull model:

  • You do not need to configure a data push address for your application. This reduces coupling.
  • Prometheus can detect the health of applications. A failed scrape is itself a signal that the application may be down or unreachable.
  • The pull model facilitates service discovery and dynamic monitoring.

2.  Powerful PromQL:

# Calculate the order growth rate.
rate(orders_total[5m])
# Collect the statistics by channel group.
sum by(channel) (orders_total)
# Query the P99 order amount over the last 5 minutes.
histogram_quantile(0.99, sum by(le) (rate(order_amount_bucket[5m])))
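To make the `histogram_quantile` query less of a black box: the function estimates a quantile from cumulative bucket counts by locating the bucket that contains the target rank and interpolating linearly inside it. The sketch below reproduces that idea in plain Java using the bucket bounds from the earlier `order_amount` histogram. It is a simplified approximation of the PromQL behavior, not its actual implementation, and `HistogramQuantileSketch` is a hypothetical name.

```java
// Simplified sketch of quantile estimation from cumulative histogram buckets.
final class HistogramQuantileSketch {

    // upperBounds: finite bucket bounds in ascending order.
    // cumulativeCounts: one count per finite bucket plus a final +Inf bucket.
    static double quantile(double q, double[] upperBounds, long[] cumulativeCounts) {
        if (q <= 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in (0, 1]");
        }
        if (upperBounds.length == 0 || cumulativeCounts.length != upperBounds.length + 1) {
            throw new IllegalArgumentException("expected one count per bound plus +Inf");
        }
        long total = cumulativeCounts[cumulativeCounts.length - 1];
        if (total == 0) {
            return Double.NaN;
        }
        double rank = q * total;
        int i = 0;
        while (i < upperBounds.length && cumulativeCounts[i] < rank) {
            i++;
        }
        if (i == upperBounds.length) {
            // The rank falls into the +Inf bucket: report the highest finite bound.
            return upperBounds[upperBounds.length - 1];
        }
        double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
        long below = (i == 0) ? 0L : cumulativeCounts[i - 1];
        long inBucket = cumulativeCounts[i] - below;
        // Linear interpolation inside the bucket that contains the rank.
        return lower + (upperBounds[i] - lower) * (rank - below) / inBucket;
    }

    public static void main(String[] args) {
        double[] bounds = {50, 100, 200, 500, 1000, 5000};
        long[] counts = {10, 30, 60, 90, 98, 100, 100}; // made-up sample data
        System.out.println(quantile(0.99, bounds, counts));
    }
}
```

With the made-up counts above, the 99th observation out of 100 lands in the 1000 to 5000 bucket, so the estimate is about 3000. The wide bucket is the reason the estimate is coarse, which is worth keeping in mind when choosing bucket boundaries.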

3.  Cloud-native standards:

  • Kubernetes natively supports metric data in the Prometheus format.
  • A large number of open source components provide the /metrics endpoint.
  • Monitoring as Code (MaC) and configuration version management are supported.

2.3 OpenTelemetry

Introduction: OpenTelemetry is a Cloud Native Computing Foundation (CNCF) observability standard, which is the result of a merger between OpenTracing and OpenCensus.

Core features:

Diverse data types: supports traces, metrics, and logs.

Vendor neutral: supports standard data models and protocols.

Automatic instrumentation: automatically collects framework metrics using a Java agent.

Flexible extension: provides a comprehensive plug-in ecosystem.

Sample code:

OpenTelemetry openTelemetry = GlobalOpenTelemetry.get();
Meter meter = openTelemetry.getMeter("order-service");
LongCounter orderCounter = meter.counterBuilder("orders.total")
    .setUnit("1")
    .setDescription("Total number of orders")
    .build();
orderCounter.add(1, Attributes.of(
    AttributeKey.stringKey("status"), "success",
    AttributeKey.stringKey("payment_method"), "alipay"));

Advantages:

● ✅ Cloud-native standard with wide support.

● ✅ Provides a unified observability system that integrates traces, metrics, and logs.

● ✅ Supports automatic instrumentation. OpenTelemetry can collect framework metrics without the need to write code.

● ✅ Provides rich context information and supports associations of metrics and traces.

● ✅ Active in the community and supported by major cloud service providers.

Disadvantages:

● ❌ The learning curve is steep.

● ❌ Additional collector deployment is required.

● ❌ Some features are still evolving.

● ❌ Configuration is relatively complex.

Scenarios: Cloud-native microservices, distributed systems, and scenarios that require unified observability

2.4 Framework Comparison

| Feature | Micrometer | Prometheus clients | OpenTelemetry |
| --- | --- | --- | --- |
| Standardization | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Multiple backends | ✅ | ❌ (Prometheus only) | ✅ |
| Distributed tracing | ❌ | ❌ | ✅ |
| Automatic instrumentation | Partially supported | ❌ | ✅ |
| Spring integration | Natively supported | Manual integration | Configuration required |
| Learning cost | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| Cloud-native support | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Community activity | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Query capabilities | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ (PromQL) | ⭐⭐⭐⭐ |
| Data transmission model | Push | Pull | Push or pull |
| Visualization ecosystem | Rich | Excellent (Grafana) | Rich |

Recommended framework selection:

● Spring Boot applications: We recommend that you select Micrometer.

● Prometheus systems: We recommend that you select a Prometheus client.

● Cloud-native or distributed systems: We recommend that you select OpenTelemetry.

● Existing Grafana dashboards: We recommend that you select a Prometheus client or Micrometer.

Deep comparison between Prometheus clients and OpenTelemetry:

For cloud-native applications, a Prometheus client or OpenTelemetry is a common choice. Prometheus clients and OpenTelemetry have the following differences:

| Dimension | Prometheus clients | OpenTelemetry |
| --- | --- | --- |
| Positioning | Focuses on metric collection. | Provides a complete observability solution. |
| Data type | Supports only metrics. | Supports traces, metrics, and logs. |
| Data transmission | Supports the pull model (/metrics endpoint). | Supports the push model (OTLP protocol). |
| Backend binding | Bound to Prometheus. | Supports multiple backends. |
| Metric association | Associated by label. | Natively supports trace associations. |
| Learning curve | Gentle | Steep |
| Scenario | Kubernetes and standard Prometheus stacks. | Multi-cloud, hybrid cloud, and scenarios that require tracing analysis. |

Common combinations:

  1. Prometheus-only stack: Prometheus client + Prometheus + Grafana
  2. Hybrid solution: OpenTelemetry collection + Metric data export in the Prometheus format + Grafana

3. Best Practices for Using ARMS to Collect Custom Metrics

The preceding comparisons show that different metric definition frameworks have their advantages and disadvantages. ARMS can deeply integrate with OpenTelemetry. Compared with open source solutions, ARMS greatly simplifies the process of defining metrics, collecting metrics, and configuring dashboards and alerts by using the OpenTelemetry SDK technology stack. In the future, ARMS will support quick collection of Micrometer and Prometheus metrics. The following example shows how to use ARMS to collect custom metrics in a flash sale scenario.

3.1 Scenario Introduction

You want to monitor a flash sale system and need to track the following key metrics in real time:

● Number of successful flash sale requests: The statistics information is classified by success or failure.

● Current inventory: the real-time inventory.

● Flash sale success rate: used for alerts and dashboard display.
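The success rate in the last bullet is a derived value. Because a counter only ever grows, the rate over a time window comes from the difference between two readings of the success and failure series. The arithmetic is sketched below in plain Java. `SuccessRateSketch` and its parameters are hypothetical names, and in practice this calculation would usually be done at query time rather than in application code.

```java
// Sketch of deriving a success rate from two snapshots of cumulative counters.
final class SuccessRateSketch {

    // Each argument is a cumulative counter reading at the start or end of the window.
    static double overWindow(long successStart, long successEnd,
                             long failedStart, long failedEnd) {
        long success = successEnd - successStart;
        long failed = failedEnd - failedStart;
        long total = success + failed;
        // No requests in the window: the rate is undefined rather than 0 or 1.
        return total == 0 ? Double.NaN : (double) success / total;
    }

    public static void main(String[] args) {
        // 30 successes and 170 failures during the window (made-up numbers).
        System.out.println(overWindow(100, 130, 900, 1070));
    }
}
```

Returning NaN for an empty window is a deliberate choice in this sketch: an alert on "success rate below X" should not fire simply because no traffic arrived.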

3.2 Step 1: Add Dependencies

Add the OpenTelemetry dependency to the pom.xml file of your project.

<dependencies>
    <!-- OpenTelemetry API -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-api</artifactId>
    </dependency>
    <!-- OpenTelemetry SDK (Optional. Used for local testing.) -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-sdk</artifactId>
    </dependency>
</dependencies>
<!-- Unified version management -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-bom</artifactId>
            <version>1.32.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

Note:

● The ARMS Java agent automatically initializes an OpenTelemetry instance.

● The application code needs to depend only on opentelemetry-api.

● You do not need to configure an exporter. Data is automatically reported to ARMS.

3.3 Step 2: Define Custom Metrics

Create a flash sale service and define business metrics.

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.api.metrics.ObservableLongGauge;
import org.springframework.stereotype.Service;
import javax.annotation.PreDestroy;
import java.util.concurrent.atomic.AtomicInteger;
@Service
public class SeckillService {
    // Inventory counter (thread-safe)
    private final AtomicInteger stock = new AtomicInteger(0);
    // Counter for calculating flash sale requests
    private final LongCounter seckillCounter;
    // Inventory gauge
    private final ObservableLongGauge stockGauge;
    // Metric dimension keys
    private static final AttributeKey<String> RESULT_KEY = AttributeKey.stringKey("result");
    private static final AttributeKey<String> PRODUCT_KEY = AttributeKey.stringKey("product_id");
    public SeckillService() {
        // Obtain the OpenTelemetry instance initialized by the ARMS Java agent.
        OpenTelemetry openTelemetry = GlobalOpenTelemetry.get();
        // Create a meter whose namespace is seckill.
        Meter meter = openTelemetry.getMeter("seckill");
        // Define a counter to record the number of flash sale requests (cumulative value)
        seckillCounter = meter.counterBuilder("product_seckill_count")
                .setUnit("1")
                .setDescription("The number of flash sale requests. The statistics information is classified by success or failure.")
                .build();
        // Define a gauge to record the current inventory (instantaneous value)
        stockGauge = meter.gaugeBuilder("product_current_stock")
                .ofLongs()
                .setDescription("The current product inventory.")
                .buildWithCallback(measurement -> {
                    // Execute a callback upon each collection to report the current inventory.
                    measurement.record(stock.get());
                });
    }
    /**
     * Initialize the inventory.
     */
    public void initStock(int count) {
        stock.set(count);
    }
    /**
     * Flash sale product
     */
    public String seckill(String productId, String userId) {
        int currentStock = stock.get();
        // The inventory is insufficient. The flash sale request fails.
        if (currentStock <= 0) {
            // Record the number of failed flash sale requests.
            seckillCounter.add(1, Attributes.of(
                RESULT_KEY, "failed",
                PRODUCT_KEY, productId
            ));
            return "The flash sale request fails. The product is sold out.";
        }
        // Try to deduct the inventory. decrementAndGet is atomic, which ensures thread safety; a negative result is rolled back below.
        if (stock.decrementAndGet() >= 0) {
            // The flash sale request is successful.
            seckillCounter.add(1, Attributes.of(
                RESULT_KEY, "success",
                PRODUCT_KEY, productId
            ));
            return "Congratulations. The flash sale request is successful. Remaining inventory:" + stock.get();
        } else {
            // The inventory is insufficient in the concurrency situation. Roll back.
            stock.incrementAndGet();
            seckillCounter.add(1, Attributes.of(
                RESULT_KEY, "failed",
                PRODUCT_KEY, productId
            ));
            return "The flash sale request fails. The product is sold out.";
        }
    }
    /**
     * Destroy resources.
     */
    @PreDestroy
    public void destroy() {
        // Disable the gauge and stop collection
        stockGauge.close();
    }
}

Key code analysis:

1.  Meter naming: "seckill" in getMeter("seckill") is the namespace, which needs to be configured in the ARMS console.

2.  Counter and gauge comparison:

  • Counter: used to record a cumulative value (can be increased but not decreased), such as the total number of flash sale requests.
  • Gauge: used to record an instantaneous value (can be increased or decreased), such as the current inventory.

3.  Dimension design: You can use Attributes to add dimensions and use result (success or failed) and product_id to perform multi-dimensional analysis.

4.  Thread safety: Use AtomicInteger to ensure data accuracy in high-concurrency scenarios.
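One detail of the sample is worth noting. `decrementAndGet` followed by a rollback is atomic per call, but the inventory value can dip below zero for a moment under contention, and a gauge callback that fires in that moment would report a negative inventory. An alternative, sketched here under the assumption that the inventory lives in a single in-process `AtomicInteger`, is an explicit compare-and-swap loop. `StockDeductor` and `tryDeduct` are hypothetical names and are not part of the ARMS sample.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of inventory deduction with an explicit CAS loop.
final class StockDeductor {
    private final AtomicInteger stock;

    StockDeductor(int initialStock) {
        this.stock = new AtomicInteger(initialStock);
    }

    // Returns true if one unit was reserved. The stock never becomes negative.
    boolean tryDeduct() {
        while (true) {
            int current = stock.get();
            if (current <= 0) {
                return false; // sold out
            }
            if (stock.compareAndSet(current, current - 1)) {
                return true;
            }
            // Another thread changed the stock in between; read again and retry.
        }
    }

    int remaining() {
        return stock.get();
    }
}
```

A caller would record the `success` or `failed` attribute on the counter based on the boolean result, exactly as the sample does, while the gauge callback reads `remaining()`.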

3.4 Step 3: Configure Custom Metric Collection in the ARMS Console

1.  Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Application List. On the Application List page, click the name of an application. On the page that appears, click the Configuration tab and select Custom Configurations.

2.  Enable custom metric collection. In the Probe switch settings section of the Configuration tab, configure the metrics to be collected.

[Figure 1]

3.  Configuration description:

  • meters parameter: Enter the name of the meter (seckill) defined in Step 2.
  • You can configure multiple meters. Separate multiple meters with commas (,). Example: seckill,order,payment.

3.5 Step 4: View Metric Data

1.  Go to the Instances page of the ARMS console. In the top navigation bar, select the region in which the application resides. The instance whose type is Prometheus Instance for Application Monitoring is the storage instance of APM metrics and custom metrics of all ARMS applications in the current region, as shown in the following figure.

[Figure 2]

2.  Click Shared Edition in the Grafana Workspace column of the instance to go to the Grafana page. Click Explore and select the Prometheus instance from the previous step as the data source.

[Figure 3]

3.  Use PromQL to query the metrics that you defined in the code, as shown in the following figure. You can also create a custom dashboard in Grafana.

[Figure 4]

3.6 Step 5: Configure an Alert Rule

Go to the Prometheus Alert Rules page of the ARMS console. In the top navigation bar, select the region in which the application resides. Click Create Prometheus Alert Rule and configure the rule, as shown in the following figure.

Alert: inventory alert

[Figure 5]

For more information about alert rules, see Create an alert rule for a Prometheus instance.

3.7 Recommended Best Practices

Metric naming conventions

<namespace>_<metric_name>
Examples:
- order_created_count  // The number of created orders.
- payment_success_rate // The payment success rate.
- user_login_duration  // The logon duration.
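A convention like this is easier to keep when it is checked mechanically, for example in a unit test. The sketch below assumes the rule is "lowercase snake_case with at least two segments", which is one reading of the `<namespace>_<metric_name>` pattern and not an ARMS requirement. `MetricNameCheck` is a hypothetical name.

```java
import java.util.regex.Pattern;

// Sketch of a naming check for custom metrics.
final class MetricNameCheck {
    // Assumed rule: lowercase letters and digits, segments joined by underscores,
    // at least two segments (namespace plus metric name).
    private static final Pattern NAME = Pattern.compile("[a-z][a-z0-9]*(_[a-z0-9]+)+");

    static boolean followsConvention(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(followsConvention("order_created_count"));
        System.out.println(followsConvention("orderCreatedCount"));
    }
}
```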

Dimension design principles

● The cardinality of a dimension should not be too large (to prevent excessive dimension data).

● An enumeration type dimension is preferred, such as status (success or failed).

● We recommend that you do not use high-cardinality dimensions, such as userId or orderId.

Invalid example:

// ❌ Invalid: The cardinality of userId is too large.
counter.add(1, Attributes.of(
    AttributeKey.stringKey("user_id"), userId
));

Valid example:

// ✅ Valid: Use an enumeration type dimension.
counter.add(1, Attributes.of(
    AttributeKey.stringKey("user_type"), "vip"
));
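The reason cardinality matters is multiplicative: every distinct combination of dimension values becomes a separate time series, so the worst case for one metric is the product of the distinct value counts of its dimensions. The following sketch makes that arithmetic explicit. The value counts are made-up assumptions, and `SeriesEstimator` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of estimating the worst-case number of time series for one metric.
final class SeriesEstimator {

    // Upper bound: product of the number of distinct values per dimension.
    static long maxSeries(Map<String, Long> distinctValuesPerDimension) {
        long product = 1L;
        for (long count : distinctValuesPerDimension.values()) {
            product = Math.multiplyExact(product, count); // fails loudly on overflow
        }
        return product;
    }

    public static void main(String[] args) {
        Map<String, Long> enumerated = new LinkedHashMap<>();
        enumerated.put("result", 2L);     // success, failed
        enumerated.put("user_type", 3L);  // assumed: three user types
        System.out.println(maxSeries(enumerated));   // 6

        Map<String, Long> withUserId = new LinkedHashMap<>(enumerated);
        withUserId.put("user_id", 1_000_000L);       // assumed user base
        System.out.println(maxSeries(withUserId));   // 6000000
    }
}
```

Adding a single high-cardinality dimension turns six series into six million, which affects both storage cost and query latency.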

Performance optimization

● Create metric objects in advance. This prevents frequent metric object creation.

● Use the batch API to reduce overheads.

● Keep the logic of the gauge callback function simple.

Metric type selection

| Scenario | Metric type | Example |
| --- | --- | --- |
| Cumulative value | Counter | Total number of orders and requests |
| Instantaneous value | Gauge | Number of current online users and queue length |
| Distribution statistics | Histogram | Order amount distribution and response time distribution |
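Of the three types, the histogram is the least intuitive, because Prometheus-style buckets are cumulative: an observation is counted in every bucket whose upper bound is greater than or equal to the value, plus a final +Inf bucket. The JDK-only sketch below illustrates that bookkeeping. It is a teaching aid with the hypothetical name `BucketHistogram`, not a replacement for a metrics library.

```java
// Sketch of cumulative histogram buckets.
final class BucketHistogram {
    private final double[] upperBounds; // finite bounds, ascending
    private final long[] cumulative;    // one extra slot for the +Inf bucket

    BucketHistogram(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.cumulative = new long[upperBounds.length + 1];
    }

    // Counts the value in every bucket whose upper bound covers it, and in +Inf.
    synchronized void observe(double value) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (value <= upperBounds[i]) {
                cumulative[i]++;
            }
        }
        cumulative[upperBounds.length]++;
    }

    // Returns a copy so callers cannot modify the internal state.
    synchronized long[] snapshot() {
        return cumulative.clone();
    }
}
```

For example, with bounds 50, 100, and 200, observing 30, 75, 150, and 900 yields the cumulative counts 1, 2, 3, and 4, where the last count is the +Inf bucket and equals the total number of observations.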

4. Core Benefits of Custom ARMS Metrics

4.1 Seamless, Zero-cost Integration

● ✅ Automatic injection: The ARMS Java agent is used. You do not need to manually configure OpenTelemetry.

● ✅ Non-intrusive collection: Framework metrics can be automatically collected, and business metrics can be defined as required.

● ✅ Unified reporting: Metrics can be automatically reported to ARMS without the need to deploy a collector.

4.2 Associations of Metrics and Traces

The core advantage of ARMS is that it associates custom metrics with distributed traces.

Request trace:
Frontend -> Gateway -> Order service -> Payment service
         ↓
  Custom metric: An order is created.
         ↓
  Trace: the complete trace of the order.

Value: If an order metric is abnormal, you can jump to a specific trace with one click to quickly locate the issue.

4.3 Powerful Data Visualization Capabilities

● 📊 Multi-dimensional aggregation queries

● 📈 Trend comparison analysis

● 🎯 Custom dashboards

● 🔔 Flexible alert rules

4.4 Enterprise-class Features

● 🔒 Secure data isolation

● 📦 Long-term data storage

● ⚡ High-performance queries

● 🌐 Cross-region deployment

5. Summary and Future Outlook

The custom metric collection feature is a key step for APM systems to move from monitoring to observability. Alibaba Cloud ARMS deeply integrates with OpenTelemetry to provide users with the following features:

Standardization: supports cloud-native standards to prevent vendor lock-in.

Simplification: requires only one line of configuration, enabling out-of-the-box use.

Visualization: supports metrics, traces, and logs.

Intelligence: supports AI-powered anomaly detection and root cause analysis.

Scenarios:

● E-commerce systems: order, payment, and inventory monitoring

● Financial systems: transaction volume and risk control metrics

● Game systems: number of online users and top-up amount

● IoT systems: online rate of devices and number of messages

Future outlook:

ARMS will continue to deepen its custom metric capabilities. Planned additions include:

● Support for the Micrometer and Prometheus frameworks.

● Support for quantile and histogram metric types.

Try the custom metric collection feature of ARMS now and let monitoring truly serve business growth.

[Try now] 👉 https://www.alibabacloud.com/en/product/arms


This article is presented by the Alibaba Cloud ARMS team.
