Realtime Compute for Apache Flink provides job monitoring and alerting through one of two services: CloudMonitor, a free monitoring service, or Managed Service for Prometheus, which is part of Application Real-Time Monitoring Service (ARMS). This topic compares the two services to help you choose the one that best suits your needs.
Feature comparison
Category | Features | ARMS | CloudMonitor |
Service availability | - | The service availability for both Prometheus monitoring and alerting is at least 99.9%. For more information, see the Prometheus Service Level Agreement. | The availability of monitoring metrics is not covered by the Service-Level Agreement (SLA). The alerting service has an availability of at least 99.9%. For more information, see the CloudMonitor Service Level Agreement. |
Costs | Monitoring and alerting fees | Metric collection and the alerting service for Prometheus are pay-as-you-go. | No subscription fees. Fully managed. Monitoring and alert management are free. For notification channels, only text messages and voice calls incur a small fee based on usage. |
| Data storage duration | Includes a monthly free quota of 50 GB. Data is stored for 90 days by default. For more information, see Prometheus Instance Billing. | Storage is free. Data is stored for 30 days by default. |
Monitoring metrics | Metric display | Metrics are displayed in monitoring charts in the Flink development console. In the ARMS console, you can also query metrics with Prometheus Query Language (PromQL) and display them on Grafana dashboards. | Pre-aggregated metrics are displayed in monitoring charts in the Realtime Compute for Apache Flink console and the CloudMonitor console. This method has some limitations compared to ARMS. For more information, see What are the limitations of CloudMonitor alerting compared to ARMS? |
Number of monitoring metrics |
| Monitoring metrics for Flink jobs. | |
Alert management | Configuration method |
| Go to the CloudMonitor console from the Flink development console to configure or subscribe. |
Number of alert metrics |
| Provides alerting for 20 monitoring metrics and lets you subscribe to event alerts. Event alerts include job failure alerts, ECS breakdown post-processing, and ECS proactive O&M impacts. | |
Single-metric/multi-metric support |
| Go to the CloudMonitor console from the Flink development console to set single-metric or multi-metric alert rules and subscribe to job failure event alerts. | |
Configure alert rules | |||
Configure alert templates | |||
Event alerting | Does not support event alerting. Only job failure alerts are supported. |
| |
Alert notifications | Notification methods |
| Sends alert notifications to contacts by phone, text message, email, DingTalk, WeCom, Lark, and webhook. On-call scheduling is supported. For more information, see Alert Contact. |
Alert notification policies |
| Supports dynamic threshold alerting, merging alert notifications, and blacklist policies. | |
Alert callback | Supported | Supported | |
One-click alerting | Not supported | Supported | |
OpenAPI | Monitoring and alerting OpenAPI |
Note: For details about the metrics, see Flink metrics.
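The comparison above mentions that metrics stored in ARMS can be queried with PromQL. The following is a minimal sketch of what such a query could look like from a script, assuming your Prometheus instance exposes the standard Prometheus HTTP API (`/api/v1/query`). The base URL, the metric name `flink_jobmanager_job_numRestarts`, and the `job_name` label are illustrative assumptions, not values taken from this topic. Check Flink metrics and your instance settings for the actual names.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the standard Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> dict:
    """Map each returned series (as a sorted tuple of labels) to its sample value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    results = {}
    for series in payload["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # the value arrives as a string
        results[labels] = float(value)
    return results


# Hypothetical endpoint and metric name, used only for illustration.
url = build_query_url(
    "https://prometheus.example.invalid",
    "max(flink_jobmanager_job_numRestarts) by (job_name)",
)
```

The sketch only builds the request URL and parses a response body. Sending the request and authenticating depend on how your instance is exposed, so they are left out.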
Switch monitoring and alerting services
The platform lets you switch between the two monitoring services to meet different business needs.
In the Realtime Compute for Apache Flink console, find the target workspace. In the Actions column, click More to switch to the other monitoring service.
Read the notes on switching service types carefully. You can proceed with the configuration only after you select the confirmation checkbox.
For more configuration details, see Configure monitoring and alerting.