In the wave of digital transformation, application performance monitoring (APM) has become an important cornerstone to ensure the stable operation of systems. However, traditional APM systems can only provide system-level performance data and cannot go deep into the business core. The custom metric collection feature of Alibaba Cloud Application Real-Time Monitoring Service (ARMS) breaks through this limitation and enables monitoring to become a real booster for business growth.
Traditional APM systems generally focus on the following system-level metrics:
● CPU utilization and memory usage
● Request response time and throughput
● Database query performance
● API call success rate
These metrics are designed to diagnose system performance issues, errors, and slow responses, and can hardly reflect business operations directly. As a result, monitoring blind spots occur in business scenarios such as the following:
During sales promotions such as Double 11, the CPU and memory metrics of the system may run as expected. However, business issues such as a sudden drop in the order conversion rate or an anomaly in the payment success rate often cannot be detected in time from system metrics alone.
For an e-commerce system, the key business metrics include:
● Real-time order quantity and order amount
● Product inventory
● Conversion rate from the shopping cart
● Coupon usage
● Refund rate
These business metrics directly reflect business health and operational efficiency. However, these business metrics cannot be collected by traditional APM systems.
A financial system needs to monitor the following metrics in real time:
● Number of transactions and transaction amount
● Risk blocking rate
● Percentage of abnormal transactions
● Capital turnover speed
These metrics are critical to business decisions. However, the metrics cannot be collected by traditional APM systems.
ARMS provides the custom metric collection feature, which brings the following core value:
✅ Business observability: Business metrics and system metrics are monitored in a unified manner to form a complete observability system.
✅ Quick issue identification: If a business exception occurs, system metrics can be quickly associated and the root cause of the issue can be accurately located.
✅ Data-driven decision-making: Real-time business metrics provide data support for operations and product decisions.
✅ End-to-end tracing: The combination of business metrics and traces enables end-to-end business process monitoring.
In the Java ecosystem, there are multiple mature metric collection frameworks. Understanding their characteristics helps you choose an appropriate technical solution.
Introduction: Micrometer is a metrics facade for the Spring ecosystem, similar to SLF4J for logging.
Core features:
● Provides a unified API and supports multiple monitoring system backends, such as Prometheus, InfluxDB, and Datadog.
● Deeply integrates with Spring Boot.
● Supports dimensional metrics, such as tags or labels.
Sample code:
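import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class OrderService {
    @Autowired
    MeterRegistry registry;

    public void processOrder(Order order) {
        // register() returns the existing counter if one with the same name and tags is already registered.
        Counter.builder("orders.processed")
                .tag("status", order.getStatus())
                .tag("channel", order.getChannel())
                .register(registry)
                .increment();
    }
}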
Advantages:
● ✅ Supports multiple backends. One set of code can be compatible with multiple monitoring systems.
● ✅ Supports automatic configurations of Spring Boot, enabling out-of-the-box use.
● ✅ Supports dimensional metrics for flexible queries.
● ✅ Active in the community and continuously updated.
Disadvantages:
● ❌ Highly dependent on the Spring ecosystem.
● ❌ Does not support distributed tracing and logging.
● ❌ Complex configurations.
● ❌ Lacks unified observability standards.
Scenarios: Spring Boot microservices applications
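To make the idea of dimensional metrics concrete, the following is a minimal plain-Java sketch of how a tagged counter keeps one running total per tag combination. TaggedCounter and its methods are hypothetical names used only for illustration. They are not part of the Micrometer API, which manages this bookkeeping inside MeterRegistry.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for a tag-aware counter; not the Micrometer API. */
final class TaggedCounter {
    // One independent running total per distinct tag combination.
    private final Map<Map<String, String>, LongAdder> series = new ConcurrentHashMap<>();

    /** Adds 1 to the series identified by the given status and channel tags. */
    void increment(String status, String channel) {
        Map<String, String> tags = new TreeMap<>();
        tags.put("status", status);
        tags.put("channel", channel);
        series.computeIfAbsent(tags, k -> new LongAdder()).increment();
    }

    /** Sums every series whose tag `key` equals `value`, like a query filtered on one tag. */
    long sumWhere(String key, String value) {
        return series.entrySet().stream()
                .filter(e -> value.equals(e.getKey().get(key)))
                .mapToLong(e -> e.getValue().sum())
                .sum();
    }

    /** Number of distinct tag combinations seen so far. */
    int seriesCount() {
        return series.size();
    }
}
```

Because each tag combination is stored separately, the same data can be sliced by status, by channel, or by both at query time, which is what makes dimensional metrics flexible to query.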
Introduction: Prometheus clients are Java client libraries provided by Prometheus. A Prometheus client can be directly connected to the Prometheus ecosystem and is a preferred solution for many components in the Kubernetes ecosystem to expose metrics.
Core features:
● Native integration: seamlessly integrates with the Prometheus monitoring system.
● Pull model: the Prometheus server actively pulls (scrapes) metrics from an endpoint exposed by the application. Applications do not need to actively push metrics.
● Powerful query: supports powerful query and aggregation capabilities of Prometheus Query Language (PromQL).
● Rich ecosystem: supports the Grafana visualization tool and Alertmanager alerts.
Sample code:
import io.prometheus.client.Counter;
import io.prometheus.client.Gauge;
import io.prometheus.client.Histogram;

public class OrderMetrics {
    // Define a counter to record the total number of orders.
    private static final Counter orderCounter = Counter.build()
            .name("orders_total")
            .help("Total number of orders")
            .labelNames("status", "channel") // Define labels
            .register();

    // Define a gauge to record the number of orders that are being processed.
    private static final Gauge processingOrders = Gauge.build()
            .name("orders_processing")
            .help("Number of orders currently processing")
            .register();

    // Define a histogram to record the statistics on order amount distribution.
    private static final Histogram orderAmount = Histogram.build()
            .name("order_amount")
            .help("Order amount distribution")
            .buckets(50, 100, 200, 500, 1000, 5000) // Custom buckets
            .register();

    public void processOrder(Order order) {
        // Number of total orders + 1, with labels.
        orderCounter.labels(order.getStatus(), order.getChannel()).inc();
        // Record the order amount.
        orderAmount.observe(order.getAmount());
        // Number of orders that are being processed + 1.
        processingOrders.inc();
        try {
            // Order processing logic...
        } finally {
            // Processing completed. Gauge - 1.
            processingOrders.dec();
        }
    }
}
Maven dependencies:
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient</artifactId>
    <version>0.16.0</version>
</dependency>
<!-- Used to expose an HTTP endpoint. -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_servlet</artifactId>
    <version>0.16.0</version>
</dependency>
Expose the metric endpoint (Spring Boot):
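import io.prometheus.client.exporter.MetricsServlet;
import org.springframework.boot.web.servlet.ServletRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class PrometheusConfig {
    // Register the Prometheus servlet so that metrics are served at /metrics.
    @Bean
    public ServletRegistrationBean<MetricsServlet> metricsServlet() {
        return new ServletRegistrationBean<>(
                new MetricsServlet(), "/metrics"
        );
    }
}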
Visit http://localhost:8080/metrics to view metric data in the Prometheus format.
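For readers who have not seen the Prometheus text format, the sketch below builds by hand roughly what such an endpoint returns for a counter: a HELP line, a TYPE line, and one sample line per label combination. ExpositionSketch and render are hypothetical names for illustration only. In practice, MetricsServlet generates this output, and the label values shown here are made-up sample data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical renderer for one counter family; real clients generate this output for you. */
final class ExpositionSketch {
    /**
     * @param name            metric name, for example orders_total
     * @param help            human-readable description
     * @param samplesByLabels pre-formatted label pairs mapped to the current counter value
     */
    static String render(String name, String help, Map<String, Double> samplesByLabels) {
        StringBuilder out = new StringBuilder();
        out.append("# HELP ").append(name).append(' ').append(help).append('\n');
        out.append("# TYPE ").append(name).append(" counter\n");
        for (Map.Entry<String, Double> e : samplesByLabels.entrySet()) {
            // One line per time series: name{labels} value
            out.append(name).append('{').append(e.getKey()).append("} ")
               .append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> samples = new LinkedHashMap<>();
        samples.put("status=\"paid\",channel=\"app\"", 3.0);   // illustrative values
        samples.put("status=\"failed\",channel=\"web\"", 1.0); // illustrative values
        System.out.print(render("orders_total", "Total number of orders", samples));
    }
}
```

Each distinct label combination appears as its own line, which is why label design directly determines how many time series the Prometheus server has to store.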
Advantages:
● ✅ Natively integrates with the Prometheus ecosystem.
● ✅ Supports the pull model. The Prometheus server scrapes metrics, so applications do not need to actively push them.
● ✅ Supports powerful query features and complex aggregation and calculation of PromQL.
● ✅ Seamlessly connects to visualization tools such as Grafana.
● ✅ Supports flexible label mechanisms for multi-dimensional queries.
● ✅ Lightweight framework and low performance overhead.
Disadvantages:
● ❌ Only metrics can be collected. Distributed tracing and logging are not supported.
● ❌ The deployment of the pull model is complex in some network environments (port exposure required).
● ❌ Integration with non-Prometheus monitoring systems requires additional configurations.
● ❌ Data persistence depends on Prometheus servers. Prometheus clients do not store historical data.
● ❌ The automatic instrumentation capability is not provided. All metrics must be manually defined.
Scenarios:
● Teams that use the Prometheus monitoring system
● Cloud-native applications in Kubernetes environments
● Monitoring scenarios that require powerful query capabilities
● Projects with preferred open source solutions
Prometheus advantages compared to other frameworks:
1. Pull model:
2. Powerful PromQL:
# Calculate the order growth rate.
rate(orders_total[5m])
# Collect the statistics by channel.
sum by (channel) (orders_total)
# Query the P99 order amount.
histogram_quantile(0.99, sum by (le) (rate(order_amount_bucket[5m])))
3. Cloud-native standards:
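As a rough illustration of what a query such as rate(orders_total[5m]) computes, the sketch below derives a per-second rate from the first and last counter samples in a window. This is a deliberate simplification: the real PromQL rate() function also extrapolates to the window boundaries and handles resets between every pair of samples. RateSketch and perSecondRate are hypothetical names.

```java
/** Simplified per-second rate between two counter samples; not the exact PromQL algorithm. */
final class RateSketch {
    /**
     * @param firstValue    counter value at the start of the window
     * @param lastValue     counter value at the end of the window
     * @param windowSeconds elapsed seconds between the two samples
     */
    static double perSecondRate(double firstValue, double lastValue, double windowSeconds) {
        if (windowSeconds <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        // A counter only decreases when the process restarts; treat that as a reset to zero.
        double increase = lastValue >= firstValue ? lastValue - firstValue : lastValue;
        return increase / windowSeconds;
    }
}
```

For example, a counter that grows from 1000 to 1600 over 300 seconds yields 2 orders per second under this simplified model.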
Introduction: OpenTelemetry is a Cloud Native Computing Foundation (CNCF) observability standard, which is the result of a merger between OpenTracing and OpenCensus.
Core features:
● Diverse data types: supports traces, metrics, and logs.
● Vendor neutral: supports standard data models and protocols.
● Automatic instrumentation: automatically collects framework metrics using a Java agent.
● Flexible extension: provides a comprehensive plug-in ecosystem.
Sample code:
OpenTelemetry openTelemetry = GlobalOpenTelemetry.get();
Meter meter = openTelemetry.getMeter("order-service");
LongCounter orderCounter = meter.counterBuilder("orders.total")
        .setUnit("1")
        .setDescription("Total number of orders")
        .build();
orderCounter.add(1, Attributes.of(
        AttributeKey.stringKey("status"), "success",
        AttributeKey.stringKey("payment_method"), "alipay"));
Advantages:
● ✅ Cloud-native standard with wide support.
● ✅ Provides a unified observability system that integrates traces, metrics, and logs.
● ✅ Supports automatic instrumentation. OpenTelemetry can collect framework metrics without the need to write code.
● ✅ Provides rich context information and supports associations of metrics and traces.
● ✅ Active in the community and supported by major cloud service providers.
Disadvantages:
● ❌ The learning curve is steep.
● ❌ Additional collector deployment is required.
● ❌ Some features are still evolving.
● ❌ Configuration is relatively complex.
Scenarios: Cloud-native microservices, distributed systems, and scenarios that require unified observability
| Feature | Micrometer | Prometheus clients | OpenTelemetry |
|---|---|---|---|
| Standardization | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Multiple backends | ✅ | ❌ (Prometheus only) | ✅ |
| Distributed tracing | Via Micrometer Tracing (separate project) | ❌ | ✅ |
| Automatic instrumentation | Partially supported | ❌ | ✅ |
| Spring integration | Natively supported | Manual integration | Configuration required |
| Learning cost | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| Cloud-native support | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Community activity | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Query capabilities | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ (PromQL) | ⭐⭐⭐⭐ |
| Data model | Push | Pull | Push or pull |
| Visualization ecosystem | Rich | Excellent (Grafana) | Rich |
Recommended framework selection:
● Spring Boot applications: We recommend that you select Micrometer.
● Prometheus systems: We recommend that you select a Prometheus client.
● Cloud-native or distributed systems: We recommend that you select OpenTelemetry.
● Existing Grafana dashboards: We recommend that you select a Prometheus client or Micrometer.
Deep comparison between Prometheus clients and OpenTelemetry:
For cloud-native applications, a Prometheus client or OpenTelemetry is a common choice. Prometheus clients and OpenTelemetry have the following differences:
| Dimension | Prometheus clients | OpenTelemetry |
|---|---|---|
| Positioning | Focuses on metric collection. | Provides a complete observability solution. |
| Data type | Supports only metrics. | Supports traces, metrics, and logs. |
| Data transmission | Supports the pull model (/metrics endpoint). | Supports the push model (OTLP protocol). |
| Backend binding | Bound to Prometheus. | Supports multiple backends. |
| Metric association | Associated by label. | Natively supports trace associations. |
| Learning curve | Gentle | Steep |
| Scenario | Kubernetes and standard Prometheus stacks. | Multi-cloud, hybrid cloud, and scenarios that require tracing analysis. |
Common combinations:
The preceding comparisons show that different metric definition frameworks have their advantages and disadvantages. ARMS can deeply integrate with OpenTelemetry. Compared with open source solutions, ARMS greatly simplifies the process of defining metrics, collecting metrics, and configuring dashboards and alerts by using the OpenTelemetry SDK technology stack. In the future, ARMS will support quick collection of Micrometer and Prometheus metrics. The following example shows how to use ARMS to collect custom metrics in a flash sale scenario.
You want to monitor a flash sale system and need to track the following key metrics in real time:
● Number of successful flash sale requests: The statistics information is classified by success or failure.
● Current inventory: the real-time inventory.
● Flash sale success rate: used for alerts and dashboard display.
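The success rate in the list above does not need its own instrument. It can be derived from the request counter once the counter is split by result. The sketch below shows the arithmetic under that assumption. SuccessRateSketch is a hypothetical helper, and in practice the same ratio would typically be computed by a dashboard or alert query rather than in application code.

```java
/** Hypothetical helper that derives a success rate from two counter totals. */
final class SuccessRateSketch {
    /** Returns success / (success + failed), or 0.0 when no requests have been recorded. */
    static double successRate(long success, long failed) {
        long total = success + failed;
        // Guard against division by zero before any traffic arrives.
        return total == 0 ? 0.0 : (double) success / total;
    }
}
```

Deriving the rate from counters keeps the raw counts available, so the same data also answers questions such as how many requests failed in total.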
Add the OpenTelemetry dependency to the pom.xml file of your project.
<dependencies>
    <!-- OpenTelemetry API -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-api</artifactId>
    </dependency>
    <!-- OpenTelemetry SDK (Optional. Used for local testing.) -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-sdk</artifactId>
    </dependency>
</dependencies>
<!-- Unified version management -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-bom</artifactId>
            <version>1.32.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
Note:
● The ARMS Java agent automatically initializes an OpenTelemetry instance.
● The application code needs to only depend on opentelemetry-api.
● You do not need to configure an exporter. Data is automatically reported to ARMS.
Create a flash sale service and define business metrics.
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.api.metrics.ObservableLongGauge;
import org.springframework.stereotype.Service;
// Use jakarta.annotation.PreDestroy instead on Spring Boot 3 or later.
import javax.annotation.PreDestroy;
import java.util.concurrent.atomic.AtomicInteger;

@Service
public class SeckillService {
    // Inventory counter (thread-safe)
    private final AtomicInteger stock = new AtomicInteger(0);
    // Counter for flash sale requests
    private final LongCounter seckillCounter;
    // Inventory gauge
    private final ObservableLongGauge stockGauge;
    // Metric dimension keys
    private static final AttributeKey<String> RESULT_KEY = AttributeKey.stringKey("result");
    private static final AttributeKey<String> PRODUCT_KEY = AttributeKey.stringKey("product_id");

    public SeckillService() {
        // Obtain the OpenTelemetry instance initialized by the ARMS Java agent.
        OpenTelemetry openTelemetry = GlobalOpenTelemetry.get();
        // Create a meter whose namespace is seckill.
        Meter meter = openTelemetry.getMeter("seckill");
        // Define a counter to record the number of flash sale requests (cumulative value).
        seckillCounter = meter.counterBuilder("product_seckill_count")
                .setUnit("1")
                .setDescription("The number of flash sale requests. The statistics information is classified by success or failure.")
                .build();
        // Define a gauge to record the current inventory (instantaneous value).
        stockGauge = meter.gaugeBuilder("product_current_stock")
                .ofLongs()
                .setDescription("The current product inventory.")
                .buildWithCallback(measurement -> {
                    // Execute a callback upon each collection to report the current inventory.
                    measurement.record(stock.get());
                });
    }

    /**
     * Initialize the inventory.
     */
    public void initStock(int count) {
        stock.set(count);
    }

    /**
     * Process a flash sale request for a product.
     */
    public String seckill(String productId, String userId) {
        int currentStock = stock.get();
        // The inventory is insufficient. The flash sale request fails.
        if (currentStock <= 0) {
            // Record the number of failed flash sale requests.
            seckillCounter.add(1, Attributes.of(
                    RESULT_KEY, "failed",
                    PRODUCT_KEY, productId
            ));
            return "The flash sale request fails. The product is sold out.";
        }
        // Try to deduct the inventory. decrementAndGet() is atomic, so two concurrent
        // requests can never both obtain the last item.
        int remaining = stock.decrementAndGet();
        if (remaining >= 0) {
            // The flash sale request is successful.
            seckillCounter.add(1, Attributes.of(
                    RESULT_KEY, "success",
                    PRODUCT_KEY, productId
            ));
            return "Congratulations. The flash sale request is successful. Remaining inventory: " + remaining;
        } else {
            // The inventory ran out under concurrency. Roll back the deduction.
            stock.incrementAndGet();
            seckillCounter.add(1, Attributes.of(
                    RESULT_KEY, "failed",
                    PRODUCT_KEY, productId
            ));
            return "The flash sale request fails. The product is sold out.";
        }
    }

    /**
     * Destroy resources.
     */
    @PreDestroy
    public void destroy() {
        // Close the gauge to stop collection.
        stockGauge.close();
    }
}
Key code analysis:
1. Meter naming: "seckill" in getMeter("seckill") is the namespace, which needs to be configured in the ARMS console.
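2. Counter and gauge comparison: The counter (product_seckill_count) records a cumulative value that only increases, whereas the gauge (product_current_stock) reports an instantaneous value through a callback that runs upon each collection.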
3. Dimension design: You can use Attributes to add dimensions and use result (success or failed) and product_id to perform multi-dimensional analysis.
4. Thread safety: Use AtomicInteger to ensure data accuracy in high-concurrency scenarios.
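The thread-safety point can be checked with a small stand-alone experiment. The sketch below replays the same deduct-then-roll-back pattern on an AtomicInteger from many threads and counts the outcomes. It leaves out the OpenTelemetry calls so that it runs with the JDK alone. OversellCheck, run, and the pool size of 8 are assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical harness: verifies that atomic deduct-and-roll-back never oversells. */
final class OversellCheck {
    /** Fires `requests` concurrent purchase attempts; returns {success, failed, finalStock}. */
    static long[] run(int initialStock, int requests) {
        AtomicInteger stock = new AtomicInteger(initialStock);
        LongAdder success = new LongAdder();
        LongAdder failed = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers at the same time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (stock.decrementAndGet() >= 0) {
                    success.increment();
                } else {
                    stock.incrementAndGet(); // roll back the failed deduction
                    failed.increment();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {success.sum(), failed.sum(), stock.get()};
    }
}
```

With 100 items and 1,000 requests, the number of successes always equals the initial stock and the final stock returns to zero. A failed deduction can only happen after the stock is already exhausted, and every failed deduction is rolled back.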
1. Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Application List. On the Application List page, click the name of an application. On the page that appears, click the Configuration tab and select Custom Configurations.
2. Enable custom metric collection. In the Probe switch settings section of the Configuration tab, configure the metrics to be collected.

3. Configuration description:
meters parameter: Enter the name of the meter (seckill) defined in Step 2. Separate multiple meter names with commas (,). Example: seckill,order,payment.
1. Go to the Instances page of the ARMS console. In the top navigation bar, select the region in which the application resides. The instance whose type is Prometheus Instance for Application Monitoring is the storage instance of APM metrics and custom metrics of all ARMS applications in the current region, as shown in the following figure.

2. Click Shared Edition in the Grafana Workspace column of the instance to go to the Grafana page. Click Explore and select the Prometheus instance from the previous step as the data source.

3. Use PromQL to query the metrics that you defined in the code, as shown in the following figure. You can also create a custom dashboard in Grafana.

Go to the Prometheus Alert Rules page of the ARMS console. In the top navigation bar, select the region in which the application resides. Click Create Prometheus Alert Rule and configure the rule, as shown in the following figure.
Alert: inventory alert

For more information about alert rules, see Create an alert rule for a Prometheus instance.
✅ Metric naming conventions
<namespace>_<metric_name>
Examples:
- order_created_count // The number of created orders.
- payment_success_rate // The payment success rate.
- user_login_duration // The logon duration.
✅ Dimension design principles
● The cardinality of a dimension should not be too large (to prevent excessive dimension data).
● An enumeration type dimension is preferred, such as status (success or failed).
● We recommend that you do not use high-cardinality dimensions, such as userId or orderId.
Invalid example:
// ❌ Invalid: The cardinality of userId is too large.
counter.add(1, Attributes.of(
        AttributeKey.stringKey("user_id"), userId
));
Valid example:
// ✅ Valid: Use an enumeration type dimension.
counter.add(1, Attributes.of(
        AttributeKey.stringKey("user_type"), "vip"
));
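The cost of a high-cardinality dimension is easier to see with numbers. In the worst case, one metric produces a separate time series for every combination of dimension values, so the upper bound is the product of the distinct values per dimension. The sketch below computes that bound. CardinalitySketch and the example figures in the usage note are illustrative assumptions, not measurements.

```java
/** Hypothetical helper that estimates the worst-case number of time series for one metric. */
final class CardinalitySketch {
    /** Upper bound on time series: the product of distinct values per dimension. */
    static long maxSeries(long... distinctValuesPerDimension) {
        long total = 1;
        for (long n : distinctValuesPerDimension) {
            total = Math.multiplyExact(total, n); // throws ArithmeticException on overflow
        }
        return total;
    }
}
```

For example, a result dimension with 2 values and a product_id dimension with 50 values give at most 100 series, whereas adding a user_id dimension with 1,000,000 values raises the bound to 100,000,000.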
✅ Performance optimization
● Create metric objects in advance. This prevents frequent metric object creation.
● Use the batch API to reduce overheads.
● Keep the logic of the gauge callback function simple.
✅ Metric type selection
| Scenario | Metric type | Example |
|---|---|---|
| Cumulative value | Counter | Total number of orders and requests |
| Instantaneous value | Gauge | Number of current online users and queue length |
| Distribution statistics | Histogram | Order amount distribution and response time distribution |
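Of the three types in the table, the histogram is the least intuitive, so the following plain-Java sketch shows the bucket bookkeeping behind it. Each observation is counted in the first bucket whose upper bound is not smaller than the value, and queries read cumulative counts. BucketHistogram is a hypothetical class for illustration and is not the API of any metrics library.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical fixed-bucket histogram illustrating "less than or equal" bucket semantics. */
final class BucketHistogram {
    private final double[] upperBounds;   // must be sorted in ascending order
    private final AtomicLongArray counts; // one extra slot for values above the last bound

    BucketHistogram(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.counts = new AtomicLongArray(upperBounds.length + 1);
    }

    /** Counts the value in the first bucket whose upper bound is >= value. */
    void observe(double value) {
        int i = 0;
        while (i < upperBounds.length && value > upperBounds[i]) {
            i++;
        }
        counts.incrementAndGet(i);
    }

    /** Cumulative number of observations that fall in buckets 0..index. */
    long cumulativeCount(int index) {
        long sum = 0;
        for (int i = 0; i <= index; i++) {
            sum += counts.get(i);
        }
        return sum;
    }
}
```

Because only per-bucket counts are stored, memory use stays constant regardless of traffic. The trade-off is that quantiles read from the buckets are estimates whose precision depends on how the bounds are chosen.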
● ✅ Automatic injection: The ARMS Java agent is used. You do not need to manually configure OpenTelemetry.
● ✅ Non-intrusive collection: Framework metrics can be automatically collected, and business metrics can be defined as required.
● ✅ Unified reporting: Metrics can be automatically reported to ARMS without the need to deploy a collector.
The core advantage of ARMS is to associate custom metrics with distributed traces.
Request trace:
Frontend -> Gateway -> Order service -> Payment service
↓
Custom metric: An order is created.
↓
Trace: the complete trace of the order.
Value: If an order metric is abnormal, go to a specific trace with one click to quickly locate the issue.
● 📊 Multi-dimensional aggregation queries
● 📈 Trend comparison analysis
● 🎯 Custom dashboards
● 🔔 Flexible alert rules
● 🔒 Secure data isolation
● 📦 Long-term data storage
● ⚡ High-performance queries
● 🌐 Cross-region deployment
The custom metric collection feature is a key step for APM systems to move from monitoring to observability. Alibaba Cloud ARMS deeply integrates with OpenTelemetry to provide users with the following features:
✨ Standardization: supports cloud-native standards to prevent vendor lock-in.
✨ Simplification: requires only one line of configuration, enabling out-of-the-box use.
✨ Visualization: supports metrics, traces, and logs.
✨ Intelligence: supports AI-powered anomaly detection and root cause analysis.
Scenarios:
● E-commerce systems: order, payment, and inventory monitoring
● Financial systems: transaction volume and risk control metrics
● Game systems: number of online users and top-up amount
● IoT systems: online rate of devices and number of messages
Future outlook:
ARMS will continue to deepen its custom metric capabilities and support custom metric collection for more frameworks and metric types:
● Support for the Micrometer and Prometheus frameworks.
● Support for quantile and histogram metric types.
Try the custom metric collection feature of ARMS now, enabling monitoring to truly serve business growth.
● Official documentation for custom metric collection in ARMS
● OpenTelemetry official website
[Try now] 👉 https://www.alibabacloud.com/en/product/arms
This article is presented by the Alibaba Cloud ARMS team.