The `alicloud_arms_prometheus_monitoring` Terraform resource lets you define ServiceMonitors, PodMonitors, custom scrape jobs, and health check agents (Probes) as code, so you can version-control, review, and reproduce monitoring configurations across Prometheus instances.
Prerequisites
Before you begin, make sure that you have:
- A Prometheus instance for Container Service or ECS.
- Terraform 0.12.28 or later. Run `terraform --version` to check your version. Cloud Shell includes Terraform by default, with your account pre-configured. To install Terraform locally, see Install and configure Terraform.
- Alibaba Cloud credentials configured through one of the methods described in the following section.
Configure credentials
To improve the flexibility and security of permission management, we recommend that you create a Resource Access Management (RAM) user named Terraform. Then, create an AccessKey pair for the RAM user and grant permissions to the RAM user. For more information, see Create a RAM user and Grant permissions to a RAM user.
Method 1: Environment variables
```shell
export ALICLOUD_ACCESS_KEY="<your-access-key>"
export ALICLOUD_SECRET_KEY="<your-secret-key>"
export ALICLOUD_REGION="cn-beijing"
```

Method 2: Provider block
```hcl
provider "alicloud" {
  access_key = "<your-access-key>"
  secret_key = "<your-secret-key>"
  region     = "cn-beijing"
}
```

Replace the following placeholders with your actual values:
| Placeholder | Description | Example |
|---|---|---|
| `<your-access-key>` | The AccessKey ID of your RAM user | LTAI5tXxx |
| `<your-secret-key>` | The AccessKey secret of your RAM user | xXxXxXx |
Specify the region based on your business requirements. For example, cn-beijing.
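To keep the AccessKey pair out of version control, you can also read the credentials from Terraform input variables instead of hard-coding them in the provider block. The following is a minimal sketch; the variable names `access_key` and `secret_key` are illustrative, not required by the provider:

```hcl
# Pass values with the TF_VAR_access_key / TF_VAR_secret_key environment
# variables, or with a .tfvars file that is excluded from version control.
# Note: "sensitive = true" requires Terraform 0.14 or later.
variable "access_key" {
  type      = string
  sensitive = true
}

variable "secret_key" {
  type      = string
  sensitive = true
}

provider "alicloud" {
  access_key = var.access_key
  secret_key = var.secret_key
  region     = "cn-beijing"
}
```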
Resource Orchestration Service (ROS) is a native infrastructure-as-code (IaC) service provided by Alibaba Cloud. It also supports the integration of Terraform templates. By using Terraform with ROS, you can define and manage resources in Alibaba Cloud, Amazon Web Services (AWS), or Microsoft Azure, specify resource parameters, and configure dependency relationships for the resources. See Create a Terraform template and Create a Terraform stack.
Supported monitoring types
The monitoring types available depend on your Prometheus instance type:
| Instance type | Supported monitoring types |
|---|---|
| Prometheus for Container Service | ServiceMonitor, PodMonitor, custom jobs, health check agents |
| Prometheus for ECS | Custom jobs and health check agents only |
Health check agent constraints:
- The `status` parameter is not supported for Probe resources.
- Name format: `<custom-name>-{tcp|http|ping}-blackbox`. For example, `name1-tcp-blackbox` indicates a TCP health check.
- For ECS instances (fully managed), the namespace must be empty or follow the format `<vpc-id>-<user-id>`. For example, `vpc-0jl4q1q2of2tagvwxxxx-11032353609xxxx`.
Argument reference
All monitoring types use the `alicloud_arms_prometheus_monitoring` resource with the following arguments:
| Argument | Required | Description |
|---|---|---|
| `cluster_id` | Yes | The ID of the Prometheus instance |
| `type` | Yes | The monitoring type: `serviceMonitor`, `podMonitor`, `customJob`, or `probe` |
| `config_yaml` | Yes | The YAML configuration for the monitoring resource (heredoc format) |
| `status` | No | The run status. Set to `run` to activate. Not supported for the `probe` type |
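Put together, a resource block follows this shape. This is a skeleton sketch; the placeholder values are illustrative:

```hcl
resource "alicloud_arms_prometheus_monitoring" "example" {
  cluster_id  = "<prometheus-instance-id>" # Required.
  type        = "customJob"                # Required: serviceMonitor, podMonitor, customJob, or probe.
  status      = "run"                      # Optional; omit for the probe type.
  config_yaml = <<-EOT
    # YAML configuration matching the chosen type goes here.
  EOT
}
```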
Deploy a monitoring resource
All monitoring types share the same Terraform workflow. Each section below provides the config_yaml for a specific type.
1. Create a working directory and add a `main.tf` file with the provider block:

   ```hcl
   provider "alicloud" {
   }
   ```

2. Initialize Terraform:

   ```shell
   terraform init
   ```

   Expected output:

   ```
   Initializing the backend...

   Initializing provider plugins...
   - Checking for available provider plugins...
   - Downloading plugin for provider "alicloud" (hashicorp/alicloud) 1.90.1...
   ...

   You may now begin working with Terraform. Try running "terraform plan" to see
   any changes that are required for your infrastructure. All Terraform commands
   should now work.

   If you ever set or change modules or backend configuration for Terraform,
   rerun this command to reinitialize your working directory. If you forget, other
   commands will detect it and remind you to do so if necessary.
   ```

3. Add the resource configuration for your monitoring type to `main.tf`. See the configuration examples in the following sections.

4. Preview the changes:

   ```shell
   terraform plan
   ```

   The output shows the resources that Terraform will create. For example:

   ```
   Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
     + create

   Terraform will perform the following actions:
   ...
   Plan: 1 to add, 0 to change, 0 to destroy.
   ```

5. Apply the configuration:

   ```shell
   terraform apply
   ```

   The output shows the execution plan and prompts for confirmation:

   ```
   ...
   Plan: 1 to add, 0 to change, 0 to destroy.

   Do you want to perform these actions?
     Terraform will perform the actions described above.
     Only 'yes' will be accepted to approve.

     Enter a value: yes
   ```

   After you enter `yes`, the monitoring resource is created for the current Prometheus instance.
Add a ServiceMonitor
Add the following resource block to main.tf:
```hcl
resource "alicloud_arms_prometheus_monitoring" "myServiceMonitor1" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # The ID of the Prometheus instance.
  status      = "run"
  type        = "serviceMonitor"
  config_yaml = <<-EOT
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: tomcat-demo
      namespace: default
    spec:
      endpoints:
        - interval: 30s
          path: /metrics
          port: tomcat-monitor
      namespaceSelector:
        any: true
      selector:
        matchLabels:
          app: tomcat
  EOT
}
```

Key fields in `config_yaml`:
| Field | Description |
|---|---|
| `metadata.name` | The name of the ServiceMonitor |
| `metadata.namespace` | The namespace in which the ServiceMonitor is created |
| `spec.endpoints[].interval` | The scrape interval (for example, `30s`) |
| `spec.endpoints[].path` | The metrics endpoint path (typically `/metrics`) |
| `spec.endpoints[].port` | The named port to scrape |
| `spec.namespaceSelector.any` | Set to `true` to match Services in all namespaces |
| `spec.selector.matchLabels` | The label selector that matches target Services |
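For reference, a Service that the example ServiceMonitor would select might look like the following. This is a hypothetical sketch; what matters is that the label matches `spec.selector.matchLabels` and the port name matches `spec.endpoints[].port`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: tomcat-demo-svc      # Hypothetical name.
  namespace: default
  labels:
    app: tomcat              # Matches spec.selector.matchLabels.
spec:
  selector:
    app: tomcat
  ports:
    - name: tomcat-monitor   # Matches spec.endpoints[].port.
      port: 8080
      targetPort: 8080
```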
Verify the ServiceMonitor
1. Log on to the ARMS console.
2. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
3. Click the name of your Prometheus instance to open the Integration Center.
4. In the Installed section, click the custom component. On the Service Discovery Configurations tab, verify that the ServiceMonitor appears.
Add a PodMonitor
Add the following resource block to main.tf:
```hcl
resource "alicloud_arms_prometheus_monitoring" "myPodMonitor1" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # The ID of the Prometheus instance.
  status      = "run"
  type        = "podMonitor"
  config_yaml = <<-EOT
    apiVersion: "monitoring.coreos.com/v1"
    kind: "PodMonitor"
    metadata:
      name: "podmonitor-demo"
      namespace: "default"
    spec:
      namespaceSelector:
        any: true
      podMetricsEndpoints:
        - interval: "30s"
          path: "/metrics"
          port: "tomcat-monitor"
      selector:
        matchLabels:
          app: "nginx2-exporter"
  EOT
}
```

Key fields in `config_yaml`:
| Field | Description |
|---|---|
| `metadata.name` | The name of the PodMonitor |
| `metadata.namespace` | The namespace in which the PodMonitor is created |
| `spec.podMetricsEndpoints[].interval` | The scrape interval (for example, `30s`) |
| `spec.podMetricsEndpoints[].path` | The metrics endpoint path |
| `spec.podMetricsEndpoints[].port` | The named port to scrape |
| `spec.namespaceSelector.any` | Set to `true` to match Pods in all namespaces |
| `spec.selector.matchLabels` | The label selector that matches target Pods |
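A Pod that the example PodMonitor would select might look like the following hypothetical sketch; the label must match `spec.selector.matchLabels` and the container port name must match `spec.podMetricsEndpoints[].port`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx2-exporter-demo         # Hypothetical name.
  namespace: default
  labels:
    app: nginx2-exporter             # Matches spec.selector.matchLabels.
spec:
  containers:
    - name: exporter
      image: example/exporter:latest # Hypothetical image.
      ports:
        - name: tomcat-monitor       # Matches spec.podMetricsEndpoints[].port.
          containerPort: 9113
```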
Verify the PodMonitor
1. Log on to the ARMS console.
2. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
3. Click the name of your Prometheus instance to open the Integration Center.
4. In the Installed section, click the custom component. On the Service Discovery Configurations tab, verify that the PodMonitor appears.
Add a custom job
Add the following resource block to main.tf:
```hcl
resource "alicloud_arms_prometheus_monitoring" "myCustomJob1" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # The ID of the Prometheus instance.
  status      = "run"
  type        = "customJob"
  config_yaml = <<-EOT
    scrape_configs:
      - job_name: prometheus1
        honor_timestamps: false
        honor_labels: false
        scheme: http
        metrics_path: /metric
        static_configs:
          - targets:
              - 127.0.0.1:9090
  EOT
}
```

Key fields in `config_yaml`:
| Field | Description |
|---|---|
| `scrape_configs[].job_name` | The name of the scrape job |
| `scrape_configs[].scheme` | The protocol to use (`http` or `https`) |
| `scrape_configs[].metrics_path` | The path to the metrics endpoint |
| `scrape_configs[].static_configs[].targets` | The list of `host:port` targets to scrape |
| `scrape_configs[].honor_timestamps` | When `true`, uses timestamps from scraped metrics instead of server time |
| `scrape_configs[].honor_labels` | When `true`, keeps scraped labels that conflict with server-attached labels |
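As a variant, a job that scrapes several exporters over HTTPS might look like the following sketch; the job name and targets are illustrative:

```hcl
resource "alicloud_arms_prometheus_monitoring" "myCustomJob2" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # The ID of the Prometheus instance.
  status      = "run"
  type        = "customJob"
  config_yaml = <<-EOT
    scrape_configs:
      - job_name: node-exporters     # Hypothetical job name.
        scheme: https
        metrics_path: /metrics
        static_configs:
          - targets:                 # Illustrative host:port targets.
              - 192.168.0.10:9100
              - 192.168.0.11:9100
  EOT
}
```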
Verify the custom job
1. Log on to the ARMS console.
2. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
3. Click the name of your Prometheus instance to open the Integration Center.
4. In the Installed section, click the custom component. On the Service Discovery Configurations tab, verify that the custom job appears.
Add a health check agent
Add the following resource block to main.tf:
```hcl
resource "alicloud_arms_prometheus_monitoring" "myProbe1" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # The ID of the Prometheus instance.
  type        = "probe"
  config_yaml = <<-EOT
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: name1-tcp-blackbox
      namespace: arms-prom
    spec:
      interval: 30s
      jobName: blackbox
      module: tcp_connect
      prober:
        path: /blackbox/probe
        scheme: http
        url: 'localhost:9335'
      targets:
        staticConfig:
          static:
            - 'arms-prom-admin.arms-prom:9335'
  EOT
}
```

Do not set the `status` parameter for Probe resources. It is not supported.
Key fields in config_yaml:
| Field | Description |
|---|---|
| `metadata.name` | The agent name. Must follow the format `<custom-name>-{tcp|http|ping}-blackbox` |
| `metadata.namespace` | The namespace. For ECS instances, leave blank or use the format `<vpc-id>-<user-id>` |
| `spec.interval` | The health check interval (for example, `30s`) |
| `spec.jobName` | Keep the default value `blackbox` |
| `spec.module` | The probe module: `tcp_connect`, `http_2xx`, or `icmp` |
| `spec.prober` | The prober endpoint configuration. Keep the default values |
| `spec.targets.staticConfig.static` | The list of `host:port` targets to check |
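An HTTP health check follows the same pattern. Note the `-http-blackbox` name suffix and the `http_2xx` module; the resource name and target URL below are illustrative:

```hcl
resource "alicloud_arms_prometheus_monitoring" "myProbe2" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # The ID of the Prometheus instance.
  type        = "probe"                           # Do not set status for probes.
  config_yaml = <<-EOT
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: name2-http-blackbox    # Follows <custom-name>-http-blackbox.
      namespace: arms-prom
    spec:
      interval: 30s
      jobName: blackbox
      module: http_2xx             # Expects an HTTP 2xx response.
      prober:
        path: /blackbox/probe
        scheme: http
        url: 'localhost:9335'
      targets:
        staticConfig:
          static:
            - 'http://example.com' # Illustrative target URL.
  EOT
}
```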
Verify the health check agent
1. Log on to the ARMS console.
2. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
3. Click the name of your Prometheus instance to open the Integration Center.
4. In the Installed section, click the Blackbox component. On the Health Check tab, verify that the agent appears.
Delete monitoring resources
To remove all monitoring resources managed by the current Terraform configuration, run:
```shell
terraform destroy
```

Enter `yes` when prompted to confirm. Expected output:

```
...
Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes
...
Destroy complete! Resources: 1 destroyed.
```

Verify the deletion
1. Log on to the ARMS console.
2. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
3. Click the name of your Prometheus instance to open the Integration Center.
4. In the Installed section, click the custom or Blackbox component. On the Service Discovery Configurations or Health Check tab, confirm that the monitoring settings have been removed.