The `alicloud_arms_prometheus_monitoring` Terraform resource lets you define ServiceMonitors, PodMonitors, custom scrape jobs, and health check agents (Probes) as code, so you can version-control, review, and reproduce monitoring configurations across Prometheus instances.
Prerequisites
Before you begin, make sure that you have:
- A Prometheus instance for Container Service or ECS.
- Terraform 0.12.28 or later. Run `terraform --version` to check your version. Cloud Shell includes Terraform by default, with your account pre-configured. To install Terraform locally, see Install and configure Terraform.
- Alibaba Cloud credentials configured through one of the methods described in the following section.
Configure credentials
To improve the flexibility and security of permission management, we recommend that you create a Resource Access Management (RAM) user named Terraform. Then, create an AccessKey pair for the RAM user and grant permissions to the RAM user. For more information, see Create a RAM user and Grant permissions to a RAM user.
Method 1: Environment variables
```shell
export ALICLOUD_ACCESS_KEY="<your-access-key>"
export ALICLOUD_SECRET_KEY="<your-secret-key>"
export ALICLOUD_REGION="cn-beijing"
```

Method 2: Provider block
```hcl
provider "alicloud" {
  access_key = "<your-access-key>"
  secret_key = "<your-secret-key>"
  region     = "cn-beijing"
}
```

Replace the following placeholders with your actual values:
| Placeholder | Description | Example |
|---|---|---|
| `<your-access-key>` | The AccessKey ID of your RAM user | LTAI5tXxx |
| `<your-secret-key>` | The AccessKey secret of your RAM user | xXxXxXx |
Specify the region based on your business requirements. For example, cn-beijing.
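To keep the AccessKey pair out of version control, you can also read the credentials from Terraform input variables instead of hard-coding them in the provider block. The following is a minimal sketch; the variable names `access_key` and `secret_key` are illustrative, not required by the provider:

```hcl
# Pass values with the TF_VAR_access_key / TF_VAR_secret_key environment
# variables, or with a .tfvars file that is excluded from version control.
# Note: "sensitive = true" requires Terraform 0.14 or later.
variable "access_key" {
  type      = string
  sensitive = true
}

variable "secret_key" {
  type      = string
  sensitive = true
}

provider "alicloud" {
  access_key = var.access_key
  secret_key = var.secret_key
  region     = "cn-beijing"
}
```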
Resource Orchestration Service (ROS) is a native infrastructure-as-code (IaC) service provided by Alibaba Cloud. It also supports the integration of Terraform templates. By using Terraform with ROS, you can define and manage resources in Alibaba Cloud, Amazon Web Services (AWS), or Microsoft Azure, specify resource parameters, and configure dependency relationships for the resources. See Create a Terraform template and Create a Terraform stack.
Supported monitoring types
The monitoring types available depend on your Prometheus instance type:
| Instance type | Supported monitoring types |
|---|---|
| Prometheus for Container Service | ServiceMonitor, PodMonitor, custom jobs, health check agents |
| Prometheus for ECS | Custom jobs and health check agents only |
Health check agent constraints:
- The `status` parameter is not supported for Probe resources.
- Name format: `<custom-name>-{tcp|http|ping}-blackbox`. For example, `name1-tcp-blackbox` indicates a TCP health check.
- For ECS instances (fully managed), the namespace must be empty or follow the format `<vpc-id>-<user-id>`. For example, `vpc-0jl4q1q2of2tagvwxxxx-11032353609xxxx`.
Argument reference
All monitoring types use the `alicloud_arms_prometheus_monitoring` resource with the following arguments:
| Argument | Required | Description |
|---|---|---|
| `cluster_id` | Yes | The ID of the Prometheus instance |
| `type` | Yes | The monitoring type: `serviceMonitor`, `podMonitor`, `customJob`, or `probe` |
| `config_yaml` | Yes | The YAML configuration for the monitoring resource (heredoc format) |
| `status` | No | The run status. Set to `run` to activate. Not supported for the `probe` type |
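Put together, a resource block follows this shape. This is a skeleton sketch; the placeholder values are illustrative:

```hcl
resource "alicloud_arms_prometheus_monitoring" "example" {
  cluster_id  = "<prometheus-instance-id>" # Required.
  type        = "customJob"                # Required: serviceMonitor, podMonitor, customJob, or probe.
  status      = "run"                      # Optional; omit for the probe type.
  config_yaml = <<-EOT
    # YAML configuration matching the chosen type goes here.
  EOT
}
```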
Deploy a monitoring resource
All monitoring types share the same Terraform workflow. Each section below provides the config_yaml for a specific type.
1. Create a working directory and add a `main.tf` file with the provider block:

   ```hcl
   provider "alicloud" {
   }
   ```

2. Initialize Terraform:

   ```shell
   terraform init
   ```

   Expected output:

   ```
   Initializing the backend...

   Initializing provider plugins...
   - Checking for available provider plugins...
   - Downloading plugin for provider "alicloud" (hashicorp/alicloud) 1.90.1...
   ...

   You may now begin working with Terraform. Try running "terraform plan" to see
   any changes that are required for your infrastructure. All Terraform commands
   should now work.

   If you ever set or change modules or backend configuration for Terraform,
   rerun this command to reinitialize your working directory. If you forget, other
   commands will detect it and remind you to do so if necessary.
   ```

3. Add the resource configuration for your monitoring type to `main.tf`. See the configuration examples in the following sections.

4. Preview the changes:

   ```shell
   terraform plan
   ```

   The output shows the resources that Terraform will create. For example:

   ```
   Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
     + create

   Terraform will perform the following actions:
   ...
   Plan: 1 to add, 0 to change, 0 to destroy.
   ```

5. Apply the configuration:

   ```shell
   terraform apply
   ```

   The output shows the execution plan and prompts for confirmation:

   ```
   ...
   Plan: 1 to add, 0 to change, 0 to destroy.

   Do you want to perform these actions?
     Terraform will perform the actions described above.
     Only 'yes' will be accepted to approve.

     Enter a value: yes
   ```

   After you enter `yes`, the monitoring resource is created for the current Prometheus instance.
Add a ServiceMonitor
Add the following resource block to main.tf:
```hcl
resource "alicloud_arms_prometheus_monitoring" "myServiceMonitor1" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # The ID of the Prometheus instance.
  status      = "run"
  type        = "serviceMonitor"
  config_yaml = <<-EOT
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: tomcat-demo
      namespace: default
    spec:
      endpoints:
        - interval: 30s
          path: /metrics
          port: tomcat-monitor
      namespaceSelector:
        any: true
      selector:
        matchLabels:
          app: tomcat
  EOT
}
```

Key fields in `config_yaml`:
| Field | Description |
|---|---|
| `metadata.name` | The name of the ServiceMonitor |
| `metadata.namespace` | The namespace in which the ServiceMonitor is created |
| `spec.endpoints[].interval` | The scrape interval (for example, `30s`) |
| `spec.endpoints[].path` | The metrics endpoint path (typically `/metrics`) |
| `spec.endpoints[].port` | The named port to scrape |
| `spec.namespaceSelector.any` | Set to `true` to match Services in all namespaces |
| `spec.selector.matchLabels` | The label selector that matches target Services |
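For reference, a Service that the example ServiceMonitor would select might look like the following. This is a hypothetical sketch; what matters is that the label matches `spec.selector.matchLabels` and the port name matches `spec.endpoints[].port`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: tomcat-demo-svc      # Hypothetical name.
  namespace: default
  labels:
    app: tomcat              # Matches spec.selector.matchLabels.
spec:
  selector:
    app: tomcat
  ports:
    - name: tomcat-monitor   # Matches spec.endpoints[].port.
      port: 8080
      targetPort: 8080
```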
Verify the ServiceMonitor
1. Log on to the ARMS console.
2. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
3. Click the name of your Prometheus instance to open the Integration Center.
4. In the Installed section, click the custom component. On the Service Discovery Configurations tab, verify that the ServiceMonitor appears.
Add a PodMonitor
Add the following resource block to main.tf:
```hcl
resource "alicloud_arms_prometheus_monitoring" "myPodMonitor1" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # The ID of the Prometheus instance.
  status      = "run"
  type        = "podMonitor"
  config_yaml = <<-EOT
    apiVersion: "monitoring.coreos.com/v1"
    kind: "PodMonitor"
    metadata:
      name: "podmonitor-demo"
      namespace: "default"
    spec:
      namespaceSelector:
        any: true
      podMetricsEndpoints:
        - interval: "30s"
          path: "/metrics"
          port: "tomcat-monitor"
      selector:
        matchLabels:
          app: "nginx2-exporter"
  EOT
}
```

Key fields in `config_yaml`:
| Field | Description |
|---|---|
| `metadata.name` | The name of the PodMonitor |
| `metadata.namespace` | The namespace in which the PodMonitor is created |
| `spec.podMetricsEndpoints[].interval` | The scrape interval (for example, `30s`) |
| `spec.podMetricsEndpoints[].path` | The metrics endpoint path |
| `spec.podMetricsEndpoints[].port` | The named port to scrape |
| `spec.namespaceSelector.any` | Set to `true` to match Pods in all namespaces |
| `spec.selector.matchLabels` | The label selector that matches target Pods |
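A Pod that the example PodMonitor would select might look like the following hypothetical sketch; the label must match `spec.selector.matchLabels` and the container port name must match `spec.podMetricsEndpoints[].port`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx2-exporter-demo         # Hypothetical name.
  namespace: default
  labels:
    app: nginx2-exporter             # Matches spec.selector.matchLabels.
spec:
  containers:
    - name: exporter
      image: example/exporter:latest # Hypothetical image.
      ports:
        - name: tomcat-monitor       # Matches spec.podMetricsEndpoints[].port.
          containerPort: 9113
```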
Verify the PodMonitor
1. Log on to the ARMS console.
2. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
3. Click the name of your Prometheus instance to open the Integration Center.
4. In the Installed section, click the custom component. On the Service Discovery Configurations tab, verify that the PodMonitor appears.
Add a custom job
Add the following resource block to main.tf:
```hcl
resource "alicloud_arms_prometheus_monitoring" "myCustomJob1" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # The ID of the Prometheus instance.
  status      = "run"
  type        = "customJob"
  config_yaml = <<-EOT
    scrape_configs:
      - job_name: prometheus1
        honor_timestamps: false
        honor_labels: false
        scheme: http
        metrics_path: /metric
        static_configs:
          - targets:
              - 127.0.0.1:9090
  EOT
}
```

Key fields in `config_yaml`:
| Field | Description |
|---|---|
| `scrape_configs[].job_name` | The name of the scrape job |
| `scrape_configs[].scheme` | The protocol to use (`http` or `https`) |
| `scrape_configs[].metrics_path` | The path to the metrics endpoint |
| `scrape_configs[].static_configs[].targets` | The list of `host:port` targets to scrape |
| `scrape_configs[].honor_timestamps` | When `true`, uses timestamps from scraped metrics instead of server time |
| `scrape_configs[].honor_labels` | When `true`, keeps scraped labels that conflict with server-attached labels |
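As a variant, a job that scrapes several exporters over HTTPS might look like the following sketch; the job name and targets are illustrative:

```hcl
resource "alicloud_arms_prometheus_monitoring" "myCustomJob2" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # The ID of the Prometheus instance.
  status      = "run"
  type        = "customJob"
  config_yaml = <<-EOT
    scrape_configs:
      - job_name: node-exporters     # Hypothetical job name.
        scheme: https
        metrics_path: /metrics
        static_configs:
          - targets:                 # Illustrative host:port targets.
              - 192.168.0.10:9100
              - 192.168.0.11:9100
  EOT
}
```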
Verify the custom job
1. Log on to the ARMS console.
2. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
3. Click the name of your Prometheus instance to open the Integration Center.
4. In the Installed section, click the custom component. On the Service Discovery Configurations tab, verify that the custom job appears.
Add a health check agent
Add the following resource block to main.tf:
```hcl
resource "alicloud_arms_prometheus_monitoring" "myProbe1" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # The ID of the Prometheus instance.
  type        = "probe"
  config_yaml = <<-EOT
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: name1-tcp-blackbox
      namespace: arms-prom
    spec:
      interval: 30s
      jobName: blackbox
      module: tcp_connect
      prober:
        path: /blackbox/probe
        scheme: http
        url: 'localhost:9335'
      targets:
        staticConfig:
          static:
            - 'arms-prom-admin.arms-prom:9335'
  EOT
}
```

Do not set the `status` parameter for Probe resources. It is not supported.
Key fields in config_yaml:
| Field | Description |
|---|---|
| `metadata.name` | The agent name. Must follow the format `<custom-name>-{tcp|http|ping}-blackbox` |
| `metadata.namespace` | The namespace. For ECS instances, leave blank or use the format `<vpc-id>-<user-id>` |
| `spec.interval` | The health check interval (for example, `30s`) |
| `spec.jobName` | Keep the default value `blackbox` |
| `spec.module` | The probe module: `tcp_connect`, `http_2xx`, or `icmp` |
| `spec.prober` | The prober endpoint configuration. Keep the default values |
| `spec.targets.staticConfig.static` | The list of `host:port` targets to check |
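An HTTP health check follows the same pattern. Note the `-http-blackbox` name suffix and the `http_2xx` module; the resource name and target URL below are illustrative:

```hcl
resource "alicloud_arms_prometheus_monitoring" "myProbe2" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # The ID of the Prometheus instance.
  type        = "probe"                           # Do not set status for probes.
  config_yaml = <<-EOT
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: name2-http-blackbox    # Follows <custom-name>-http-blackbox.
      namespace: arms-prom
    spec:
      interval: 30s
      jobName: blackbox
      module: http_2xx             # Expects an HTTP 2xx response.
      prober:
        path: /blackbox/probe
        scheme: http
        url: 'localhost:9335'
      targets:
        staticConfig:
          static:
            - 'http://example.com' # Illustrative target URL.
  EOT
}
```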
Verify the health check agent
1. Log on to the ARMS console.
2. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
3. Click the name of your Prometheus instance to open the Integration Center.
4. In the Installed section, click the Blackbox component. On the Health Check tab, verify that the agent appears.
Delete monitoring resources
To remove all monitoring resources managed by the current Terraform configuration, run:
```shell
terraform destroy
```

Enter `yes` when prompted to confirm. Expected output:

```
...
Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes
...
Destroy complete! Resources: 1 destroyed.
```

Verify the deletion
1. Log on to the ARMS console.
2. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
3. Click the name of your Prometheus instance to open the Integration Center.
4. In the Installed section, click the custom or Blackbox component. On the Service Discovery Configurations or Health Check tab, confirm that the monitoring settings have been removed.