This topic describes how monitoring and alerting work in Realtime Compute for Apache Flink and how to create alert rules for your jobs.

Introduction to CloudMonitor

CloudMonitor collects monitoring metrics for cloud resources as well as custom monitoring metrics, checks service availability, and lets you configure alerts based on these metrics. You can use CloudMonitor to view cloud resource usage, business information, and service health status. When an alert is triggered, you receive a notification so that you can respond promptly and keep your applications running properly.

Create alert rules

For more information about how to create an alert rule, see Configure alert rules.
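
To illustrate the general idea behind a threshold-based alert rule, the following is a minimal conceptual sketch in Python. It is not the CloudMonitor API or its actual evaluation logic. The `AlertRule` class, the `should_alert` function, the threshold of 300 seconds, and the "breach for N consecutive periods" condition are all hypothetical and chosen only for illustration. The metric name `inputDelay` is taken from the monitoring items listed in this topic.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical alert rule: fire when the averaged metric stays above
    the threshold for a number of consecutive statistical periods."""
    metric: str
    threshold: float
    consecutive_periods: int = 3

    def __post_init__(self) -> None:
        if self.consecutive_periods < 1:
            raise ValueError("consecutive_periods must be at least 1")


def should_alert(rule: AlertRule, averages: Sequence[float]) -> bool:
    """Return True if the most recent periods all breach the threshold.

    `averages` holds one averaged metric value per period, oldest first.
    """
    if len(averages) < rule.consecutive_periods:
        return False  # not enough data points to decide
    recent = averages[-rule.consecutive_periods:]
    return all(value > rule.threshold for value in recent)


# Assumed example: alert if the service delay exceeds 300 s for 3 periods.
rule = AlertRule(metric="inputDelay", threshold=300.0, consecutive_periods=3)
print(should_alert(rule, [120.0, 310.0, 450.0, 600.0]))  # True
print(should_alert(rule, [310.0, 450.0, 120.0, 600.0]))  # False
```

Requiring several consecutive breaches, rather than a single one, is a common way to avoid alerting on short spikes. The actual options available for a rule are described in Configure alert rules.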

Monitoring items of Realtime Compute for Apache Flink

| Monitoring item | Unit | Metric | Dimensions | Statistics |
| --- | --- | --- | --- | --- |
| Service delay | s | inputDelay | userId, regionId, projectName, and jobName | Average |
| Read records per second (RPS) | RPS | ParserTpsRate | userId, regionId, projectName, and jobName | Average |
| Write RPS | RPS | SinkOutTpsRate | userId, regionId, projectName, and jobName | Average |
| Failover rate | % | TaskFailoverRate | userId, regionId, projectName, and jobName | Average |
| Processing delay | s | FetchedDelay | userId, regionId, projectName, and jobName | Average |

Note: The failover rate is the average number of failovers per second in the last minute. For example, if one failover occurred in the last minute, the failover rate is 0.01667 (1/60 ≈ 0.01667).
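
The failover rate arithmetic described in the note for this table can be checked with a short Python sketch. The function name `failover_rate` and its parameters are hypothetical and used only for illustration. The 60-second window corresponds to "the last minute" in the note.

```python
def failover_rate(failovers: int, window_seconds: int = 60) -> float:
    """Average number of failovers per second over the given window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return failovers / window_seconds


# One failover in the last minute, as in the example in the note.
print(round(failover_rate(1), 5))  # 0.01667
```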

View monitoring metrics

  1. Log on to the Realtime Compute development platform.
  2. In the top navigation bar, click Administration.
  3. On the Administration page, click the name of the job for which you want to view monitoring metrics.
  4. In the upper-right corner of the page, choose More > Monitor.
  5. On the page that appears, view the monitoring metrics of the job.