Write metrics to Managed Service for Prometheus with the OpenTelemetry Collector

This topic describes how to send metrics to Managed Service for Prometheus from applications that are deployed in Container Service for Kubernetes (ACK) clusters and instrumented with the OpenTelemetry SDK. You can unify the collection of native OpenTelemetry (OTel) metrics, custom business metrics, and metrics converted from trace spans to achieve full application observability. Converting trace data into metrics can significantly reduce data ingestion volume.

OpenTelemetry overview

OpenTelemetry (OTel) is an open-source observability framework that provides a unified set of APIs, SDKs, and tools to generate, collect, and export telemetry data from distributed systems, including metrics, traces, and logs. Its core goal is to resolve the fragmentation of observability data across different tools and systems.

OpenTelemetry metrics are structured data that quantify system behavior and are used to monitor system performance and health. The source of metrics depends on the SDK or agent instrumentation for a specific language. For a Java application, metrics typically include standard Java Virtual Machine (JVM) metrics, custom user-instrumented metrics, and metrics derived from trace data.

OpenTelemetry metrics are designed to be compatible with Prometheus. Using the OpenTelemetry Collector, you can convert metrics to the Prometheus format for seamless integration with Managed Service for Prometheus.

Role of the OpenTelemetry Collector

The OpenTelemetry Collector is an extensible data processing pipeline that collects telemetry data from sources such as applications and services, and converts the data into the format required by a target system such as Prometheus.

Data collection

Applications use the OpenTelemetry SDK to generate metric data and send it to the Collector using the OpenTelemetry Protocol (OTLP) or other protocols, such as the native Prometheus protocol.

Data conversion

Community Collector extensions provide two exporters to convert OTel metrics to the Prometheus format.

The Prometheus Exporter converts OpenTelemetry metrics to the Prometheus format and provides an endpoint for a Prometheus agent to scrape data.
1. Metric name mapping: Converts OpenTelemetry metric names to a Prometheus-compatible format.
2. Label handling: Preserves or renames labels to comply with Prometheus naming rules.
3. Data type conversion:
  - gauge → Prometheus gauge
  - sum → Prometheus counter or gauge (based on the Monotonic property)
  - histogram → Prometheus histogram (exposed as the _bucket, _sum, and _count series)
  The following configuration example exposes a metric scraping endpoint on port 1234.
```
exporters:
  prometheus:
    endpoint: "0.0.0.0:1234"
    namespace: "acs"
    const_labels:
      label1: value1
    send_timestamps: true
    metric_expiration: 5m
    enable_open_metrics: true
    add_metric_suffixes: false
    resource_to_telemetry_conversion:
      enabled: true
```
The Prometheus Remote Write Exporter converts OpenTelemetry metrics to the Prometheus format and writes them directly to the target Prometheus service by using the Remote Write protocol.
Similar to the Prometheus Exporter, this exporter also converts the data format. The following code provides a configuration example:
```
exporters:
  prometheusremotewrite:
    endpoint: http://<Prometheus Endpoint>/api/v1/write
    namespace: "acs"
    resource_to_telemetry_conversion:
      enabled: true
    timeout: 10s   
    headers:
      Prometheus-Remote-Write-Version: "0.1.0"
    external_labels:
      data-mode: metrics
```
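
Both exporters apply the same kind of name and type mapping before the data reaches Prometheus. The following shell sketch illustrates the rules listed above under simplifying assumptions. The function names `prom_metric_name` and `prom_metric_type` are hypothetical helpers and are not part of the Collector, and the actual exporters handle more cases (such as units and suffixes) than this sketch does.

```shell
# Hypothetical helper: prefix the namespace and replace characters that are
# not valid in a Prometheus metric name with underscores.
prom_metric_name() {
  namespace=$1
  otel_name=$2
  sanitized=$(printf '%s' "$otel_name" | tr -c 'A-Za-z0-9_:' '_')
  printf '%s_%s\n' "$namespace" "$sanitized"
}

# Hypothetical helper: map an OTel metric type to a Prometheus type.
# The second argument is the Monotonic property of a sum (true or false).
prom_metric_type() {
  otel_type=$1
  monotonic=${2:-false}
  case "$otel_type" in
    gauge) echo gauge ;;
    sum)
      if [ "$monotonic" = true ]; then echo counter; else echo gauge; fi ;;
    histogram) echo histogram ;;
    *) echo "unsupported type: $otel_type" >&2; return 1 ;;
  esac
}

prom_metric_name acs http.server.request.duration   # acs_http_server_request_duration
prom_metric_type sum true                            # counter
prom_metric_type sum false                           # gauge
```

The `acs` prefix corresponds to the `namespace` setting in the exporter examples above.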

Best practices for applications in ACK clusters

Step 1: Prepare Managed Service for Prometheus

Prometheus monitoring is enabled for the cluster

If Prometheus monitoring is enabled for your ACK cluster, a Prometheus instance already exists. Log on to the CloudMonitor console. In the left-side navigation pane, choose Managed Service for Prometheus > Instances. Find the Prometheus instance that has the same name as your ACK cluster and confirm that the instance is in the Running state in the specified region.

Prometheus monitoring is not enabled for the cluster

Log on to the ACK console. Click the name of the target cluster and go to the Add-ons page. On the Logging and Monitoring tab, find and install the ack-arms-prometheus component. After installation, Prometheus monitoring is automatically enabled for the cluster. Verify that the component is installed.

Step 2: Deploy the Collector in sidecar mode

For accurate metric calculations, all metrics and traces from a single pod must be sent to the same Collector. Deploying in gateway mode requires complex load balancing configurations. Therefore, we recommend that you deploy the Collector in sidecar mode.

Prometheus Exporter mode

Note

With this method, you do not need to configure authentication for the Prometheus write path. You can also adjust the metric collection interval by changing the scrape configuration.

Architecture diagram

Deployment configuration example

```
# Kubernetes Deployment example
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    metadata:
      labels:
        # Add a specific label to the pod. The label is usually named after the application so that the scrape configuration can select the pod.
        observability: opentelemetry-collector
    spec:
      volumes:
      - name: otel-config-volume
        configMap:
          # This ConfigMap is created from the Collector configuration example that follows.
          name: otel-config
      containers:
        - name: app
          image: your-app:latest
          env:
            # Send OTLP data to the sidecar Collector in the same pod.
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://localhost:4317
        - name: otel-collector
          # You can use the Collector image that we provide. The image includes Prometheus-related extensions.
          # Replace <regionId> in the image name with the ID of your region.
          image: registry-<regionId>.ack.aliyuncs.com/acs/otel-collector:v0.128.0-7436f91
          args: ["--config=/etc/otel/config/otel-config.yaml"]
          ports:
            - containerPort: 1234  # Prometheus endpoint
              name: metrics
          volumeMounts:
          - name: otel-config-volume
            mountPath: /etc/otel/config
```

Collector configuration example

Note

Configure the resource limits (CPU and memory) for the Collector based on your application's request volume to ensure that it can process all data.
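
One way to keep the Collector's `memory_limiter` settings consistent with the container memory limit is to derive them from it. The sketch below applies the 75% and 25% ratios mentioned in the comments of the Collector configuration. The 2048 MiB container limit and the function name `memory_limiter_settings` are assumptions for illustration, not recommended values.

```shell
# Hypothetical helper: derive memory_limiter values (in MiB) from the
# container memory limit, using the 75% / 25% ratios from the config comments.
memory_limiter_settings() {
  container_limit_mib=$1
  limit_mib=$(( container_limit_mib * 75 / 100 ))
  spike_limit_mib=$(( container_limit_mib * 25 / 100 ))
  printf 'limit_mib: %d\nspike_limit_mib: %d\n' "$limit_mib" "$spike_limit_mib"
}

# Assumed container memory limit of 2048 MiB.
memory_limiter_settings 2048
```

With a 2048 MiB limit, this yields the `limit_mib: 1536` and `spike_limit_mib: 512` values used in the configuration example.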

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-config
  namespace: <app-namespace>
data:
  otel-config.yaml: |
    extensions:
      zpages:
        endpoint: localhost:55679
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
      memory_limiter:
        # 75% of maximum memory up to 2 GB
        limit_mib: 1536
        # 25% of limit up to 2 GB
        spike_limit_mib: 512
        check_interval: 5s
      # Delete resource attributes that are rarely needed, to limit the volume of metric data.
      resource:
        attributes:
          - key: process.runtime.description
            action: delete
          - key: process.command_args
            action: delete
          - key: telemetry.distro.version
            action: delete
          - key: telemetry.sdk.name
            action: delete
          - key: telemetry.sdk.version
            action: delete
          - key: service.instance.id
            action: delete
          - key: process.runtime.name
            action: delete
          - key: process.pid
            action: delete
          - key: process.executable.path
            action: delete
          - key: process.command.args
            action: delete
          - key: os.description
            action: delete
          - key: instance
            action: delete
          - key: container.id
            action: delete
    connectors:
      # Convert span statistics into metrics.
      spanmetrics:
        histogram:
          explicit:
            buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10]
        dimensions:
          - name: http.method
            default: "GET"
          - name: http.response.status_code
          - name: http.route
            # Custom attribute
          - name: user.id
        metrics_flush_interval: 15s
        exclude_dimensions:
        metrics_expiration: 3m
        events:
          enabled: true
          dimensions:
          - name: default
            default: "GET"
    exporters:
      debug:
        verbosity: detailed
      prometheus:
        endpoint: "0.0.0.0:1234"
        namespace: "acs"
        const_labels:
          label1: value1
        send_timestamps: true
        metric_expiration: 5m
        enable_open_metrics: true
        add_metric_suffixes: false
        resource_to_telemetry_conversion:
          enabled: true
    service:
      pipelines:
        logs:
          receivers: [otlp]
          exporters: [debug]
        # Spans are sent to the spanmetrics connector, which feeds the metrics/2 pipeline.
        traces:
          receivers: [otlp]
          processors: [resource]
          exporters: [spanmetrics]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [prometheus]
        metrics/2:
          receivers: [spanmetrics]
          exporters: [prometheus]
      extensions: [zpages]
```

This configuration processes incoming metrics and traces. The resource processor discards environment attributes that are typically not of interest, which keeps the volume of metric data from growing too large. The spanmetrics connector converts key span statistics into metrics.
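
To make the span-to-metric conversion more concrete, the following shell sketch shows how a set of span durations could be aggregated into cumulative histogram buckets, using the bucket boundaries from the configuration above. It is only an illustration of the idea, not the connector's implementation. The function name `bucket_counts` and the sample durations are hypothetical, and the durations are assumed to be expressed in the same unit as the boundaries.

```shell
# Hypothetical helper: read one span duration per line from stdin and print
# cumulative bucket counts in the style of a Prometheus histogram.
bucket_counts() {
  awk -v bounds="0.001 0.005 0.01 0.05 0.1 0.5 1 5 10" '
    BEGIN { n = split(bounds, le, " ") }
    {
      total++
      # A duration is counted in every bucket whose upper bound it does not exceed.
      for (i = 1; i <= n; i++) if ($1 + 0 <= le[i] + 0) count[i]++
    }
    END {
      for (i = 1; i <= n; i++) printf "le=%s %d\n", le[i], count[i]
      printf "le=+Inf %d\n", total
    }'
}

# Five sample span durations (assumed values).
printf '%s\n' 0.004 0.03 0.03 0.7 12 | bucket_counts
```

Five spans collapse into ten bucket counters regardless of how many spans arrive, which is why deriving metrics from traces reduces the ingested data volume.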

Scraping task configuration

```
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: opentelemetry-collector-podmonitor
  namespace: default
  annotations:
    arms.prometheus.io/discovery: "true"
spec:
  selector:
    matchLabels:
      observability: opentelemetry-collector
  podMetricsEndpoints:
  - port: metrics
    interval: 15s
    scheme: http
    path: /metrics
```
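
Before relying on the scraping task, it can help to check that the Collector endpoint exposes samples under the configured `acs` namespace. The sketch below counts such samples in Prometheus text-format output. The function name `count_namespaced_samples` and the sample metric names are hypothetical and used only for illustration.

```shell
# Hypothetical helper: read Prometheus text-format output from stdin and count
# the sample lines whose metric name starts with "<prefix>_".
count_namespaced_samples() {
  prefix=$1
  awk -v p="${prefix}_" 'index($0, p) == 1 { n++ } END { print n + 0 }'
}

# Sample exposition text with made-up metric names.
count_namespaced_samples acs <<'EOF'
# HELP acs_demo_requests Hypothetical counter.
# TYPE acs_demo_requests counter
acs_demo_requests{job="app"} 3
acs_demo_latency_bucket{le="0.1"} 2
other_metric 1
EOF
```

Against a running pod, the same function could be fed from the exporter endpoint, for example by running `kubectl port-forward pod/<pod-name> 1234:1234` in one terminal and `curl -s http://localhost:1234/metrics | count_namespaced_samples acs` in another, where `<pod-name>` is a placeholder for your application pod.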

Prometheus Remote Write Exporter mode

Note

This method is suitable for scenarios with large data volumes or unstable scraping, where the Collector writes directly to the Prometheus instance.
The configuration is more complex because you need to configure the data write path.

Architecture diagram

Data write path preparation

In this mode, the Collector writes data directly to the Prometheus instance. You must first obtain the Prometheus endpoint and authentication information.

Obtain the endpoint

Log on to the CloudMonitor console. In the left-side navigation pane, choose Managed Service for Prometheus > Instances and find the Prometheus instance that corresponds to your cluster. The Prometheus instance ID and name typically match the ACK cluster ID and name. Click the name of the target instance. On the Settings page, find the Remote Write URL and copy the internal network address for later use.

Obtain authentication credentials

Choose one of the following methods:
- V2 Prometheus instances support a password-free policy that allows password-free writes from within the cluster's virtual private cloud (VPC).
- Create a RAM user for writing metric data, grant the AliyunPrometheusMetricWriteAccess system policy to the RAM user, and then obtain its AccessKey pair. The AccessKey ID is used as the username and the AccessKey secret is used as the password for writing data.

Deployment configuration example

The Collector deployment configuration is the same as that in the Prometheus Exporter mode. For more information, see the deployment configuration example for the Prometheus Exporter mode.

Collector configuration example

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-config
  namespace: <app-namespace>
data:
  otel-config.yaml: |
    extensions:
      zpages:
        endpoint: localhost:55679
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
      memory_limiter:
        # 75% of maximum memory up to 2 GB
        limit_mib: 1536
        # 25% of limit up to 2 GB
        spike_limit_mib: 512
        check_interval: 5s
      # Delete resource attributes that are rarely needed, to limit the volume of metric data.
      resource:
        attributes:
          - key: process.runtime.description
            action: delete
          - key: process.command_args
            action: delete
          - key: telemetry.distro.version
            action: delete
          - key: telemetry.sdk.name
            action: delete
          - key: telemetry.sdk.version
            action: delete
          - key: service.instance.id
            action: delete
          - key: process.runtime.name
            action: delete
          - key: process.pid
            action: delete
          - key: process.executable.path
            action: delete
          - key: process.command.args
            action: delete
          - key: os.description
            action: delete
          - key: instance
            action: delete
          - key: container.id
            action: delete
    connectors:
      # Convert span statistics into metrics.
      spanmetrics:
        histogram:
          explicit:
            buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10]
        dimensions:
          - name: http.method
            default: "GET"
          - name: http.response.status_code
          - name: http.route
            # Custom attribute
          - name: user.id
        metrics_flush_interval: 15s
        exclude_dimensions:
        metrics_expiration: 3m
        events:
          enabled: true
          dimensions:
          - name: default
            default: "GET"
    exporters:
      debug:
        verbosity: detailed
      prometheusremotewrite:
        # Replace this with the internal network address of the Prometheus Remote Write endpoint.
        endpoint: http://<Endpoint>/api/v3/write
        namespace: "acs"
        resource_to_telemetry_conversion:
          enabled: true
        timeout: 10s
        headers:
          Prometheus-Remote-Write-Version: "0.1.0"
          # This header is required if the password-free policy is not enabled.
          Authorization: Basic <base64-encoded-username-password>
        external_labels:
          data-mode: metrics
    service:
      pipelines:
        logs:
          receivers: [otlp]
          exporters: [debug]
        # Spans are sent to the spanmetrics connector, which feeds the metrics/2 pipeline.
        traces:
          receivers: [otlp]
          processors: [resource]
          exporters: [spanmetrics]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [prometheusremotewrite]
        metrics/2:
          receivers: [spanmetrics]
          exporters: [prometheusremotewrite]
      extensions: [zpages]
```

Replace the Prometheus Remote Write endpoint URL in the configuration with the URL you obtained.
To generate the value for <base64-encoded-username-password>, run the following command. Replace AK with the AccessKey ID and SK with the AccessKey secret of the RAM user:
```
echo -n 'AK:SK' | base64
```
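
If you prefer to build the complete header value in a script, a minimal sketch could look like the following. The function name `basic_auth_header` is hypothetical, and the literal `AK` and `SK` arguments stand in for your own AccessKey ID and AccessKey secret. `printf` is used instead of `echo -n` for portability, and `tr` removes any line breaks that `base64` may insert into long output.

```shell
# Hypothetical helper: build the value of the Authorization header from an
# AccessKey ID (username) and an AccessKey secret (password).
basic_auth_header() {
  access_key_id=$1
  access_key_secret=$2
  encoded=$(printf '%s:%s' "$access_key_id" "$access_key_secret" | base64 | tr -d '\n')
  printf 'Basic %s\n' "$encoded"
}

# Placeholder credentials; prints "Basic QUs6U0s=".
basic_auth_header 'AK' 'SK'
```

The printed value can be used as the `Authorization` header in the `prometheusremotewrite` exporter configuration.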
