ApsaraDB for ClickHouse lets you set alert rules on key cluster metrics. When a metric value falls outside the range you define, the system sends a notification to your alert contact group so you can respond quickly.
Alert rules are powered by Application Real-Time Monitoring Service (ARMS). Two check types are available:
Static threshold — select a preset metric and define a numeric threshold.
Custom PromQL — write a PromQL expression to alert on any custom metric.
Prerequisites
Before you begin, ensure that you have:
An ApsaraDB for ClickHouse Community-compatible Edition or Enterprise Edition cluster
A RAM user with the AliyunARMSFullAccess permission granted for ARMS. For instructions, see Grant permissions to a RAM user.
You cannot view the monitoring information about Enterprise Edition clusters in the CloudMonitor console.
Create an alert rule
If your cluster meets either of the following conditions, follow the steps in Create an alert rule in the old console instead:
Your cluster was created before December 1, 2021.
Your cluster is deployed in the China (Qingdao) or China (Hohhot) region.
Log on to the ApsaraDB for ClickHouse console.
In the top navigation bar, select the region where your cluster is deployed.
On the Clusters page, click the Clusters of Community-compatible Edition tab or the Enterprise Edition Clusters tab, then click the cluster ID.
In the left-side navigation pane, click Monitoring and Alerting.
Click Cluster Alerting.
Click Create ClickHouse Alert Rule (for Community-compatible Edition) or Create Enterprise Edition ClickHouse Alert Rule (for Enterprise Edition).
On the rule creation page, set Check type to Static Threshold or Custom PromQL, then configure the parameters described below.
Static threshold
Use this check type to monitor a preset metric against a fixed numeric threshold.
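The Data Preview field in the table below shows the PromQL statement that the console derives from a static threshold. The exact statement ARMS generates is not documented here. The following is a minimal sketch, assuming the condition reduces to a metric selector compared against a number. The function `build_threshold_expr`, the metric name `clickhouse_cpu_usage`, and the label `cluster_id` are hypothetical placeholders, so compare the output with what Data Preview shows for your own cluster.

```python
from __future__ import annotations

import re

# Comparison operators PromQL supports between an instant vector and a scalar.
PROMQL_OPERATORS = {">", ">=", "<", "<=", "==", "!="}


def build_threshold_expr(metric: str, op: str, threshold: float,
                         labels: dict[str, str] | None = None) -> str:
    """Build a 'metric{labels} op threshold' PromQL expression (hypothetical helper).

    Metric and label names are supplied by the caller; this sketch does not
    know the real ARMS metric names for ApsaraDB for ClickHouse.
    """
    if op not in PROMQL_OPERATORS:
        raise ValueError(f"unsupported operator: {op}")
    if not re.fullmatch(r"[a-zA-Z_:][a-zA-Z0-9_:]*", metric):
        raise ValueError(f"invalid metric name: {metric}")
    matchers = ""
    if labels:
        parts = []
        for name, value in sorted(labels.items()):
            # Escape backslashes and double quotes inside label values.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{name}="{escaped}"')
        matchers = "{" + ",".join(parts) + "}"
    return f"{metric}{matchers} {op} {threshold}"


# Hypothetical metric and label names, for illustration only.
print(build_threshold_expr("clickhouse_cpu_usage", ">", 80,
                           {"cluster_id": "cc-bp1lxbo89u95****"}))
# clickhouse_cpu_usage{cluster_id="cc-bp1lxbo89u95****"} > 80
```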
| Parameter | Description | Example |
|---|---|---|
| Alert rule name | A name that identifies the alert rule. | CPU utilization alert |
| Check type | Set to Static Threshold. | Static Threshold |
| Cluster | The cluster to monitor. | cc-bp1lxbo89u95**** |
| Alert contact group | The group that receives alert notifications. Available groups vary by Prometheus instance type. | ClickHouse |
| Alert metric | The metric to monitor. Available metrics vary by alert contact group. | cpu_usage |
| Alert condition | The condition that triggers an alert event. | When cpu usage > 80%, trigger alert |
| Filter conditions | Restricts the alert rule to a specific resource scope. | No Filter |
| Data Preview | Displays the PromQL statement derived from your alert condition, along with a time series graph of the metric. Use it to verify that the condition behaves as expected before you save the rule: the threshold appears as a red line, data points above the threshold appear in dark red, and data points below it appear in blue. Hover over the curve to inspect the value at a specific point in time, or select a period on the curve to view the curve for that period only. | N/A |
| Duration | Controls when an alert event is generated. If the alert condition is met: an alert event is generated as soon as the threshold is reached. If the alert condition is met continuously for N minutes: an alert event is generated only after the threshold has been exceeded for at least N consecutive minutes. | 1 |
| Alert level | The severity of the alert. Valid values: Default, P4, P3, P2, P1, in ascending order of severity. | P2 |
| Alert message | The notification message sent to alert contacts. Supports Go template syntax and custom variables. | `node: {{$labels.pod_name}} CPU usage {{$labels.metrics_params_opt_label_value}} {{$labels.metrics_params_value}}%, current value {{ printf "%.2f" $value }}%` |
| Alert notification (Simple mode) | Notification objects: the recipients. Notification period: only alerts generated within this time window are sent. Whether to resend notifications: controls resend behavior. | N/A |
| Alert notification (Standard mode) | Select a notification policy from the Specify notification policy drop-down list, or click Create notification policy to create one. For details, see Create and manage a notification policy. Important: After you select a notification policy, alert events generated by this rule can be matched by that policy and trigger alerts. The same events may also be matched by other notification policies that use fuzzy match and trigger alerts there as well, because one alert event can be matched by more than one notification policy. | Do Not Specify Notification Policy |
| Alert check cycle | How often the rule checks whether the alert condition is met. Default: 1 minute. Minimum: 1 minute. | 1 minute |
| Check after the data is complete | Whether to run the check only after the data is fully collected, transmitted, and written to storage. Default: Yes. | Yes |
| Tags | Tags attached to the alert rule, used to match notification policies. | N/A |
| Annotations | Additional metadata for the alert rule. | N/A |

Custom PromQL
Use this check type to write a PromQL expression and alert on any metric, including custom ones.
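Before you save a Custom PromQL rule, it can help to check which series the expression actually returns. In PromQL, a comparison such as `> 1000` drops every series that does not satisfy it, so each series left in the result would trigger the alert. The following sketch assumes your Prometheus instance exposes a Prometheus-compatible HTTP API; the endpoint URL is a hypothetical placeholder and authentication is omitted. It only builds the standard `/api/v1/query` URL and parses a response of the standard shape, so it runs offline.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode


def build_query_url(base_url: str, expr: str) -> str:
    """Return an instant-query URL for the Prometheus HTTP API (GET /api/v1/query)."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": expr})


def firing_series(response_body: str) -> list[tuple[dict, float]]:
    """Return (labels, value) for each series in an instant-vector query result.

    A comparison such as 'expr > 1000' drops every series that does not
    satisfy it, so each series returned here met the condition at query time.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample is a [unix_timestamp, "value_as_string"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


# Hypothetical endpoint: substitute the HTTP API address of your Prometheus instance.
url = build_query_url("https://prometheus.example.com/api/prom",
                      "clickhouse_http_conn_usage_count{} > 1000")
print(url)

# A made-up response with the standard shape, so the sketch runs offline.
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"pod_name": "node-1"}, "value": [1700000000, "1250"]},
    ]},
})
print(firing_series(sample))  # [({'pod_name': 'node-1'}, 1250.0)]
```

An empty result means no series currently meets the condition, which is what you would expect to see in Data Preview during normal operation.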
| Parameter | Description | Example |
|---|---|---|
| Alert rule name | A name that identifies the alert rule. | Pod CPU utilization exceeds 8% |
| Check type | Set to Custom PromQL. | Custom PromQL |
| Cluster | The cluster to monitor. | cc-bp1lxbo89u95**** |
| Reference alert contact group | The group that receives alert notifications. Available groups vary by Prometheus instance type. | ClickHouse |
| Reference metrics | Optional. Select a common metric to pre-fill the Custom PromQL statements field. Available metrics vary by Prometheus instance type. | http_conn_usage_count |
| Custom PromQL statements | The PromQL expression that defines the alert condition. | `clickhouse_http_conn_usage_count{} > 1000` |
| Data Preview | Displays a time series graph of the metric based on your PromQL expression. Use it to confirm that the expression returns the expected data before you save the rule. | N/A |
| Duration | Controls when an alert event is generated. If the alert condition is met: an alert event is generated immediately. If the alert condition is met continuously for N minutes: an alert event is generated only after the condition has held for at least N consecutive minutes. | 1 |
| Alert level | The severity of the alert. Valid values: Default, P4, P3, P2, P1, in ascending order of severity. | Default |
| Alert message | The notification message sent to alert contacts. Supports Go template syntax and custom variables. | `Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / The utilization of the {{$labels.device}} disk exceeds 90%. Current value: {{ printf "%.2f" $value }}%` |
| Alert notification (Simple mode) | Notification objects: the recipients. Notification period: only alerts generated within this time window are sent. Whether to resend notifications: controls resend behavior. | N/A |
| Alert notification (Standard mode) | Select a notification policy from the Specify notification policy drop-down list, or click Create notification policy to create one. For details, see Create and manage a notification policy. | Do Not Specify Notification Policy |
| Alert check cycle | How often the rule checks whether the alert condition is met. Default: 1 minute. Minimum: 1 minute. | 1 minute |
| Check after the data is complete | Whether to run the check only after the data is fully collected, transmitted, and written to storage. Default: Yes. | Yes |
| Tags | Tags attached to the alert rule, used to match notification policies. | N/A |
| Annotations | Additional metadata for the alert rule. | N/A |

Click Completed.
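Both check types use the Duration parameter to decide when a breach becomes an alert event. The following is a minimal sketch of that behavior for a `>` condition, assuming one check per Alert check cycle and ignoring data-completeness delays. It counts N minutes from the first breaching check, as the Prometheus `for` clause does; whether ARMS counts in exactly the same way is an assumption. The function name `firing_minutes` is hypothetical.

```python
from __future__ import annotations


def firing_minutes(samples: list[float], threshold: float,
                   duration_minutes: int = 0,
                   check_cycle_minutes: int = 1) -> list[int]:
    """Return the minute offsets at which the rule starts firing.

    samples[i] is the metric value seen at check i (one check per cycle).
    duration_minutes=0 models "If the alert condition is met";
    duration_minutes=N models "met continuously for N minutes", measured from
    the first breaching check (assumed Prometheus-style 'for' semantics).
    """
    fired_at = []
    breach_start = None  # minute at which the current run of breaches began
    firing = False
    for i, value in enumerate(samples):
        now = i * check_cycle_minutes
        if value > threshold:
            if breach_start is None:
                breach_start = now
            # Record the transition into the firing state once per run.
            if not firing and now - breach_start >= duration_minutes:
                firing = True
                fired_at.append(now)
        else:
            # Any check that does not breach resets the run.
            breach_start = None
            firing = False
    return fired_at


samples = [70, 85, 90, 75, 82, 88, 91]  # CPU usage (%) at minutes 0..6
print(firing_minutes(samples, 80))                      # [1, 4]
print(firing_minutes(samples, 80, duration_minutes=1))  # [2, 5]
print(firing_minutes(samples, 80, duration_minutes=2))  # [6]
```

A longer Duration filters out short spikes: the two-minute breach at minutes 1 and 2 never fires with a Duration of 2, while the sustained breach from minute 4 onward does.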
Create an alert rule using CloudMonitor
Log on to the CloudMonitor console.
In the left-side navigation pane, choose Alerts > Alert Rules.
On the Alert Rules page, click Create Alert Rule.
In the Create Alert Rule panel, configure the following parameters.
| Parameter | Description |
|---|---|
| Product | Select based on when your Community-compatible Edition cluster was purchased: ClickHouse for clusters purchased before December 1, 2021; ClickHouse Community-compatible Edition for clusters purchased after December 1, 2021. |
| Resource range | The scope of resources the rule applies to: All resources, Application groups, or Instances. |
| Rule description | The alert condition. Click Add Rule, enter a rule name, and set Metric type to Single metric, Multiple metrics, or Dynamic threshold. For complex conditions, see Alert rule expressions. Note: The dynamic threshold feature is in invitational preview. To use it, submit a ticket. |
| Mute for | How long CloudMonitor waits before resending a notification if the alert is not cleared. Valid values: 5 minutes, 15 minutes, 30 minutes, 60 minutes, 3 hours, 6 hours, 12 hours, 24 hours. |
| Effective period | The time window during which the rule is active. Notifications are sent only within this window. Alert records are still logged when the rule is inactive. |
| Alert contact group | The group that receives alert notifications. A group can contain one or more alert contacts. To create contacts and groups, see Create an alert contact or alert contact group. |
| Tag | Tags for the alert rule. Maximum: 6 tags. |
| Advanced settings — Alert callback | A publicly accessible HTTP URL. CloudMonitor sends POST requests to this URL when an alert fires. To test connectivity, click Test next to the URL. For setup instructions, see Use the alert callback feature. |
| Advanced settings — Auto Scaling | If enabled, triggers a scaling rule when an alert fires. Requires Region, ESS group, and ESS rule. See Manage scaling groups and Configure scaling rules. |
| Advanced settings — Simple Log Service | If enabled, writes alert information to a Logstore when an alert fires. Requires Region, ProjectName, and Logstore. See Getting started. |
| Advanced settings — Simple Message Queue (formerly MNS) - topic | If enabled, sends alert information to an MNS topic when an alert fires. Requires Region and topicName. See Create a topic. |
| Advanced settings — Function Compute | If enabled, sends alert notifications to Function Compute for processing when an alert fires. Requires Region, Service, and Function. See Quickly create a function. |
| Method to handle alerts when no monitoring data is found | Specifies behavior when no data is available: Do not do anything (default), Send alert notifications, or Treated as normal. |
Click OK. The alert rule takes effect immediately.
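The Mute for and Effective period settings together decide whether an alert that has not been cleared produces another notification. The following sketch models only the behavior described in the table above: notifications are sent only inside the effective period, and a repeat notification waits until the mute interval has elapsed since the last one. The function names and the inclusive window boundaries are assumptions for illustration, not CloudMonitor's implementation.

```python
from __future__ import annotations

from datetime import datetime, time, timedelta

# Valid Mute for values for CloudMonitor alert rules, in minutes.
MUTE_OPTIONS_MINUTES = {5, 15, 30, 60, 180, 360, 720, 1440}


def in_effective_period(now: datetime, start: time, end: time) -> bool:
    """True if now's wall-clock time is in [start, end]; handles windows past midnight."""
    t = now.time()
    if start <= end:
        return start <= t <= end
    return t >= start or t <= end


def should_notify(now: datetime, last_sent: datetime | None,
                  mute_minutes: int, start: time, end: time) -> bool:
    """Decide whether an uncleared alert sends a notification at `now` (hypothetical).

    Outside the effective period nothing is sent, although CloudMonitor still
    logs the alert record. Inside it, the first notification is sent at once
    and repeats wait for the mute interval.
    """
    if mute_minutes not in MUTE_OPTIONS_MINUTES:
        raise ValueError(f"unsupported Mute for value: {mute_minutes} minutes")
    if not in_effective_period(now, start, end):
        return False
    if last_sent is None:
        return True
    return now - last_sent >= timedelta(minutes=mute_minutes)


window = (time(8, 0), time(20, 0))
t0 = datetime(2024, 1, 1, 9, 0)
print(should_notify(t0, None, 5, *window))                           # True
print(should_notify(t0 + timedelta(minutes=3), t0, 5, *window))      # False
print(should_notify(t0 + timedelta(minutes=5), t0, 5, *window))      # True
print(should_notify(datetime(2024, 1, 1, 22, 0), None, 5, *window))  # False
```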
Create an alert rule in the old console
Follow these steps if your cluster meets either of the following conditions:
Your cluster was created before December 1, 2021.
Your cluster is deployed in the China (Qingdao) or China (Hohhot) region.
Log on to the ApsaraDB for ClickHouse console.
In the top navigation bar, select the region where your cluster is deployed.
On the Clusters page, click the Default Instances tab, find the cluster, and click the cluster ID.
In the left-side navigation pane, click Monitoring Details.
In the upper-right corner, click Alert Monitoring. This opens the CloudMonitor console.
In the left-side navigation pane of the CloudMonitor console, choose Alerts > Alert Rules.
On the Threshold Value Alert tab, click Create Alert Rule.
On the Create Alert Rule page, configure the parameters in the Relate resource section. Then configure the alert rule and notification method. For details, see Create an alert rule.
Create an alert contact group before configuring the notification method. See Create an alert contact or alert contact group.
| Parameter | Description |
|---|---|
| Product | Select Clickhouse. |
| Resource range | All resources: applies the rule to all clusters. Cluster: applies the rule to selected clusters only. |
| Region | Required when Resource range is set to Cluster. Select the region of the cluster. |
| Cluster | Required when Resource range is set to Cluster. Select one or more cluster IDs. |

Click Confirm. The alert rule automatically takes effect.
What's next
To view and manage alert rules configured through the ApsaraDB for ClickHouse console, see Manage alert rules.