Configure an alert rule for a cluster - ApsaraDB for ClickHouse

ApsaraDB for ClickHouse lets you set alert rules on key cluster metrics. When a metric value falls outside the range you define, the system sends a notification to your alert contact group so you can respond quickly.

Alert rules are powered by Application Real-Time Monitoring Service (ARMS). Two check types are available:

Static threshold — select a preset metric and define a numeric threshold.
Custom PromQL — write a PromQL expression to alert on any custom metric.
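A static threshold is also evaluated as PromQL: the Data Preview field shows the PromQL statement derived from your condition. The sketch below shows one way that mapping could work. The helper name `staticThresholdToPromQL`, the metric name `clickhouse_cpu_usage`, and the `cluster_id` label are hypothetical, and the statement ARMS actually generates may be more elaborate.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// staticThresholdToPromQL is a hypothetical helper that turns a static threshold
// (metric, optional label filters, comparison operator, threshold) into a PromQL
// comparison such as: metric{label="value"} > 80.
func staticThresholdToPromQL(metric string, filters map[string]string, op string, threshold float64) (string, error) {
	switch op {
	case ">", ">=", "<", "<=", "==", "!=":
		// Valid PromQL comparison operators.
	default:
		return "", fmt.Errorf("unsupported operator %q", op)
	}

	// Sort label names so the generated expression is deterministic.
	keys := make([]string, 0, len(filters))
	for k := range filters {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	matchers := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q yields a double-quoted, escaped string, valid for ordinary label values.
		matchers = append(matchers, fmt.Sprintf("%s=%q", k, filters[k]))
	}

	value := strconv.FormatFloat(threshold, 'f', -1, 64)
	return fmt.Sprintf("%s{%s} %s %s", metric, strings.Join(matchers, ","), op, value), nil
}

func main() {
	// Metric and label names are hypothetical examples.
	expr, err := staticThresholdToPromQL("clickhouse_cpu_usage",
		map[string]string{"cluster_id": "cc-bp1lxbo89u95****"}, ">", 80)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(expr)
}
```

With no filters, the helper produces the same shape as the Custom PromQL example in this topic, for example `clickhouse_http_conn_usage_count{} > 1000`.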

Prerequisites

Before you begin, ensure that you have:

An ApsaraDB for ClickHouse Community-compatible Edition or Enterprise Edition cluster
A RAM user that has been granted the AliyunARMSFullAccess policy for ARMS. For instructions, see Grant permissions to a RAM user.

Note: Monitoring information for Enterprise Edition clusters is not available in the CloudMonitor console.

Create an alert rule

If your cluster meets either of the following conditions, follow the steps in Create an alert rule in the old console instead:

Your cluster was created before December 1, 2021.
Your cluster is deployed in the China (Qingdao) or China (Hohhot) region.

Log on to the ApsaraDB for ClickHouse console.
In the top navigation bar, select the region where your cluster is deployed.
On the Clusters page, click the Clusters of Community-compatible Edition tab or the Enterprise Edition Clusters tab, then click the cluster ID.
In the left-side navigation pane, click Monitoring and Alerting.
Click Cluster Alerting.
Click Create ClickHouse Alert Rule (for Community-compatible Edition) or Create Enterprise Edition ClickHouse Alert Rule (for Enterprise Edition).

On the rule creation page, set Check type to Static Threshold or Custom PromQL, then configure the parameters described below.

Static threshold

Use this check type to monitor a preset metric against a fixed numeric threshold.

Parameter	Description	Example
Alert rule name	A name that identifies the alert rule.	CPU utilization alert
Check type	Set to Static Threshold.	Static Threshold
Cluster	The cluster to monitor.	cc-bp1lxbo89u95****
Alert contact group	The group that receives alert notifications. Available groups vary by Prometheus instance type.	ClickHouse
Alert metric	The metric to monitor. Available metrics vary by alert contact group.	cpu_usage
Alert condition	The condition that triggers an alert event.	When cpu usage > 80%, trigger alert
Filter conditions	Restricts the alert rule to a specific resource scope.	No Filter
Data Preview	Displays the PromQL statement derived from your alert condition, along with a time series graph of the metric. Use it to verify that the condition behaves as expected before you save the rule. The threshold appears as a red line; data points above the threshold appear in dark red, and data points below it appear in blue. Hover over the curve to inspect the value at a specific point in time, or select a time range on the curve to view the curve for that period only.	—
Duration	Controls when an alert event is generated. If the alert condition is met: an alert event is generated as soon as the threshold is reached. If the alert condition is met continuously for N minutes: an alert event is generated only after the condition has been met for at least N consecutive minutes.	1
Alert level	Severity of the alert. Valid values: Default, P4, P3, P2, P1 (ascending severity).	P2
Alert message	The notification message sent to contacts. Supports Go template syntax and custom variables.	`node: {{$labels.pod_name}} CPU usage {{$labels.metrics_params_opt_label_value}} {{$labels.metrics_params_value}}%, current value {{ printf "%.2f" $value }}%`
Alert notification — Simple mode	Configure three settings. Notification objects: the recipients of the alert. Notification period: only alerts generated within this time window trigger notifications. Whether to resend notifications: controls whether and how notifications are resent.	—
Alert notification — Standard mode	Select a notification policy from the Specify notification policy drop-down list, or click Create notification policy to create one. For details, see Create and manage a notification policy. Important: After you select a notification policy, alert events generated by this rule can be matched by that policy, which then generates alerts. The same events may also be matched by other notification policies that use fuzzy matching. One or more alert events can be matched by one or more notification policies.	Do Not Specify Notification Policy
Alert check cycle	How often the rule checks whether the alert condition is met. Default: 1 minute. Minimum: 1 minute.	1 minute
Check after the data is complete	Whether to run the check only after data is fully collected, transmitted, and written to storage. Default: Yes.	Yes
Tags	Tags attached to the alert rule for matching notification policies.	—
Annotations	Additional metadata for the alert rule.	—

Custom PromQL

Use this check type to write a PromQL expression and alert on any metric, including custom ones.

Parameter	Description	Example
Alert rule name	A name that identifies the alert rule.	Pod CPU utilization exceeds 8%
Check type	Set to Custom PromQL.	Custom PromQL
Cluster	The cluster to monitor.	cc-bp1lxbo89u95****
Reference alert contact group	The group that receives alert notifications. Available groups vary by Prometheus instance type.	ClickHouse
Reference metrics	Optional. Select a common metric to pre-fill the Custom PromQL statements field. Available metrics vary by Prometheus instance type.	http_conn_usage_count
Custom PromQL statements	The PromQL expression that defines the alert condition.	`clickhouse_http_conn_usage_count{} > 1000`
Data Preview	Displays a time series graph of the metric based on your PromQL expression. Use this to confirm the expression returns the expected data before saving.	—
Duration	Controls when an alert event is generated. If the alert condition is met: an alert event is generated as soon as the PromQL condition is met. If the alert condition is met continuously for N minutes: an alert event is generated only after the condition has been met for at least N consecutive minutes.	1
Alert level	Severity of the alert. Valid values: Default, P4, P3, P2, P1 (ascending severity).	Default
Alert message	The notification message sent to contacts. Supports Go template syntax and custom variables.	`Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / The utilization of the {{$labels.device}} disk exceeds 90%. Current value: {{ printf "%.2f" $value }}%`
Alert notification — Simple mode	Configure three settings. Notification objects: the recipients of the alert. Notification period: only alerts generated within this time window trigger notifications. Whether to resend notifications: controls whether and how notifications are resent.	—
Alert notification — Standard mode	Select a notification policy from the Specify notification policy drop-down list. Click Create notification policy to create a new one. For details, see Create and manage a notification policy.	Do Not Specify Notification Policy
Alert check cycle	How often the rule checks whether the alert condition is met. Default: 1 minute. Minimum: 1 minute.	1 minute
Check after the data is complete	Whether to run the check only after data is fully collected, transmitted, and written to storage. Default: Yes.	Yes
Tags	Tags attached to the alert rule for matching notification policies.	—
Annotations	Additional metadata for the alert rule.	—
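The Alert message field accepts Go template syntax. The sketch below renders the example message from this table with Go's standard `text/template` package. It assumes, as Prometheus alerting templates do, that `$labels` and `$value` are defined before your message is evaluated; the `alertData` type, the `renderAlertMessage` helper, and the label values are hypothetical, and ARMS may expose additional variables.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// alertData is a hypothetical stand-in for what the alerting engine passes to the template.
type alertData struct {
	Labels map[string]string
	Value  float64
}

// renderAlertMessage defines $labels and $value (Prometheus-style) and then
// executes the user-supplied message against data.
func renderAlertMessage(message string, data alertData) (string, error) {
	const prelude = "{{$labels := .Labels}}{{$value := .Value}}"
	tmpl, err := template.New("alert").Parse(prelude + message)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Message taken from the Custom PromQL example above; label values are hypothetical.
	msg := `Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / The utilization of the {{$labels.device}} disk exceeds 90%. Current value: {{ printf "%.2f" $value }}%`
	out, err := renderAlertMessage(msg, alertData{
		Labels: map[string]string{"namespace": "default", "pod_name": "example-pod-0", "device": "vda1"},
		Value:  93.456,
	})
	if err != nil {
		fmt.Println("template error:", err)
		return
	}
	fmt.Println(out)
}
```

This prints `Namespace: default / Pod: example-pod-0 / The utilization of the vda1 disk exceeds 90%. Current value: 93.46%`. Rendering a message locally like this is a quick way to catch template syntax errors before you paste it into the console.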

Click Completed.

Create an alert rule using CloudMonitor

Log on to the CloudMonitor console.
In the left-side navigation pane, choose Alerts > Alert Rules.
On the Alert Rules page, click Create Alert Rule.
In the Create Alert Rule panel, configure the following parameters.

Parameter	Description
Product	Select based on when your Community-compatible Edition cluster was purchased: ClickHouse for clusters purchased before December 1, 2021; ClickHouse Community-compatible Edition for clusters purchased after December 1, 2021.
Resource range	The scope of resources the rule applies to: All resources, Application groups, or Instances.
Rule description	The alert condition. Click Add Rule, enter a rule name, and set Metric type to Single metric, Multiple metrics, or Dynamic threshold. For complex conditions, see Alert rule expressions. Note: The dynamic threshold feature is in invitational preview. To use it, submit a ticket.
Mute for	How long CloudMonitor waits before resending a notification if the alert is not cleared. Valid values: 5 minutes, 15 minutes, 30 minutes, 60 minutes, 3 hours, 6 hours, 12 hours, 24 hours.
Effective period	The time window during which the rule is active. Notifications are sent only within this window; outside it, alert records are still logged but no notifications are sent.
Alert contact group	The group that receives alert notifications. A group can contain one or more alert contacts. To create contacts and groups, see Create an alert contact or alert contact group.
Tag	Tags for the alert rule. Maximum: 6 tags.
Advanced settings — Alert callback	A publicly accessible HTTP URL. CloudMonitor sends POST requests to this URL when an alert fires. To test connectivity, click Test next to the URL. For setup instructions, see Use the alert callback feature.
Advanced settings — Auto Scaling	If enabled, triggers a scaling rule when an alert fires. Requires Region, ESS group, and ESS rule. See Manage scaling groups and Configure scaling rules.
Advanced settings — Simple Log Service	If enabled, writes alert information to a Logstore when an alert fires. Requires Region, ProjectName, and Logstore. See Getting started.
Advanced settings — Simple Message Queue (formerly MNS) - topic	If enabled, sends alert information to an MNS topic when an alert fires. Requires Region and topicName. See Create a topic.
Advanced settings — Function Compute	If enabled, sends alert notifications to Function Compute for processing when an alert fires. Requires Region, Service, and Function. See Quickly create a function.
Method to handle alerts when no monitoring data is found	Specifies behavior when no data is available: Do not do anything (default), Send alert notifications, or Treated as normal.

Click OK. The alert rule takes effect immediately.

Create an alert rule in the old console

Follow these steps if your cluster meets either of the following conditions:

Your cluster was created before December 1, 2021.
Your cluster is deployed in the China (Qingdao) or China (Hohhot) region.

Log on to the ApsaraDB for ClickHouse console.
In the top navigation bar, select the region where your cluster is deployed.
On the Clusters page, click the Default Instances tab, find the cluster, and click the cluster ID.
In the left-side navigation pane, click Monitoring Details.
In the upper-right corner, click Alert Monitoring. This opens the CloudMonitor console.
In the left-side navigation pane of the CloudMonitor console, choose Alerts > Alert Rules.
On the Threshold Value Alert tab, click Create Alert Rule.

On the Create Alert Rule page, configure the parameters in the Relate resource section. Then configure the alert rule and notification method. For details, see Create an alert rule.

Create an alert contact group before configuring the notification method. See Create an alert contact or alert contact group.

Parameter	Description
Product	Select ClickHouse.
Resource range	All resources: applies the rule to all clusters. Cluster: applies the rule to selected clusters only.
Region	Required when Resource range is set to Cluster. Select the region of the cluster.
Cluster	Required when Resource range is set to Cluster. Select one or more cluster IDs.

Click Confirm. The alert rule automatically takes effect.

What's next

To view and manage alert rules configured through the ApsaraDB for ClickHouse console, see Manage alert rules.

References

What is CloudMonitor?