
Cloud Monitor: Modeling MetricSets for Prometheus

Last Updated: Nov 19, 2025

Overview

  • A MetricSet is a data structure in UModel that defines a collection of related metrics that share common attributes and configuration properties. MetricSets provide a general modeling capability for various monitoring scenarios, including CPU, memory, network, and business metrics.

  • This topic describes how to use UModel to model Prometheus metric data.

Case study materials

Download the materials for this use case: umodel-enrich-dev.zip.

Core concepts

Definition and purpose of MetricSet

As a core component of metric modeling, a MetricSet serves several key functions:

  • Metric organization: Groups related metrics into logical collections for simplified management and querying.

  • Label management: Defines common labels and filter rules to enable consistent metric dimensions.

  • Query optimization: Offers a unified interface for efficient and optimized querying.

  • Semantic expression: Includes rich metadata and supports multiple languages for enhanced semantic clarity.

MetricSet structure overview

Observable Data System
├── EntitySet                     # Entity set
└── TelemetryDataSet              # Telemetry dataset (metrics, logs, traces)
    └── MetricSet                 # Metric set
        ├── labels                # Label definitions
        │   ├── keys: Field[]     # Label field list
        │   ├── dynamic: boolean  # Dynamic label generation
        │   └── filter: string    # Label filter
        └── metrics: Metric[]     # Metric list
            ├── name              # Metric name
            ├── generator         # Query generator
            ├── aggregator        # Aggregation method
            └── data_format       # Format method

MetricSet structure specification

The core configuration properties of a MetricSet include the following:

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| labels | object | No | Label configuration. Defines the dimensional information of metrics. |
| metrics | array | Yes | A list of metrics. Must contain at least one metric. |
| query_type | enum | No | The query syntax type. Valid values: prom, spl, and cms. |
| needs_processing | boolean | No | Specifies whether the metric requires secondary calculation and processing. Default value: false. |
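Taken together, these properties form the skeleton below. This is a minimal sketch for orientation only: the metric name and generator are illustrative placeholders, not part of the case study.

```yaml
kind: metric_set
spec:
  query_type: prom          # Prometheus query syntax
  needs_processing: false   # no secondary calculation (default)
  labels:                   # optional: shared dimensions
    dynamic: true
    keys:
      - name: namespace
        type: string
  metrics:                  # required: at least one metric
    - name: example_metric  # illustrative name
      generator: up{}       # illustrative PromQL expression
      data_format: KMB
```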

Modeling labels

Label design principles

Labels define the dimensional properties of metrics. To ensure effective metric modeling, follow these principles:

  • Generality: MetricSet-level labels should represent dimensions shared across all metrics in the set.

  • Dynamic generation: Use dynamic methods to automatically generate labels and avoid hard coding.

  • Efficient filtering: Design labels to support efficient indexing and filtering for fast queries.

  • Cardinality control: Avoid using high-cardinality labels because they can introduce performance issues.
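The cardinality principle can be illustrated with the following sketch. The commented-out label names are hypothetical examples of what to avoid, not fields from the case study.

```yaml
labels:
  dynamic: true
  keys:
    - name: namespace   # low cardinality: a bounded set of values, safe as a label
      type: string
    - name: deployment  # low cardinality: bounded per namespace
      type: string
    # Avoid labels such as request_id or client_ip (hypothetical examples):
    # their per-request values multiply the number of time series and
    # degrade query performance.
```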

Label property configuration

| Property | Type | Default value | Description | Recommended operation |
| --- | --- | --- | --- | --- |
| keys | array | - | A list of label fields. For more information about the format, see Field definition. | Define key dimension fields. |
| dynamic | boolean | false | Specifies whether labels are dynamically generated. | We strongly recommend setting this to true. |
| filter | string | - | A label filter using Prometheus query syntax. | Use in conjunction with dynamic labels. |

Label field definition

Each label field inherits all properties of a general Field. Pay special attention to the following properties:

| Property | Recommended value | Description | Notes |
| --- | --- | --- | --- |
| filterable | true | Supports filter queries. | Labels should typically support filtering. |
| analysable | true | Supports aggregation analysis. | Labels should typically support aggregation analysis. |
| orderable | true | Supports sorting. | Labels should typically support sorting. |
| pattern | .* | Regular expression pattern. | No restrictions on value. |

Label configuration examples

Kubernetes scenario

labels:
  dynamic: true
  filter: 'kube_deployment_spec_replicas'
  keys:
    - name: namespace
      display_name:
        zh_cn: 命名空间
        en_us: Namespace
      type: string
      filterable: true
      analysable: true
      pattern: ".*"
    - name: deployment  
      display_name:
        zh_cn: 部署名称
        en_us: Deployment
      type: string
      filterable: true
      analysable: true
      pattern: ".*"

APM scenario

labels:
  dynamic: true
  filter: 'arms_app_requests_count_ign_destid_endpoint_parent_ppid_prpc{callKind=~"http|rpc|custom_entry|server|consumer|schedule"}'
  keys:
    - name: service
      display_name:
        zh_cn: 服务名称
        en_us: Service
      type: string
      filterable: true
      analysable: true
    - name: rpc
      display_name:
        zh_cn: 接口名称  
        en_us: Operation
      type: string
      filterable: true
      analysable: true

Modeling metrics

Metric design principles

As core components of a MetricSet, each metric represents a distinct, queryable monitoring dimension. When designing metrics, consider the following principles:

  • Business semantics: Metric names should clearly reflect their business meaning.

  • Complete calculation: The generator must contain the complete calculation logic, not just reference a raw metric.

  • Clear units: Configure the data_format and unit properties correctly.

  • Appropriate aggregation: Select the correct aggregator based on the metric's properties.

Core metric properties

A Metric extends the basic field system with monitoring-specific properties:

| Property | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| generator | string | No | Prometheus Query Language (PromQL) expression. | rate(cpu_usage[5m]) * 100 |
| aggregator | string | No | The aggregation method. Examples: sum, avg, max, and min. | sum, avg |
| golden_metric | boolean | No | Specifies whether the metric is a golden metric. Default value: false. | true |
| interval_us | integer/array | No | Collection interval in microseconds. | [15000000] |
| type | string | No | The metric type. Default value: gauge. | gauge |
| query_mode | enum | No | The recommended query mode. Valid values: range, instant, and both. | both |

Data format specification

Metrics are often visualized in charts, making the data_format field important. Whenever possible, use the enumeration values defined in the schema and avoid using custom formats.

For a list of available format options, see Field system.
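As a quick reference, the following sketch lists the data_format values that appear in the examples in this topic; it is not the full enumeration, which is defined in the schema.

```yaml
# data_format values used in the examples in this topic:
#   KMB      - large counts, abbreviated with K/M/B suffixes (e.g., request counts)
#   percent  - ratios scaled to 0-100 (e.g., availability, CPU usage)
#   byte     - memory and storage sizes (e.g., memory requests)
#   s        - durations in seconds (e.g., average latency)
data_format: percent
```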

Aggregator selection guide

Choosing the correct aggregator ensures query accuracy:

| Metric type | Recommended aggregator | Reason | Example |
| --- | --- | --- | --- |
| Count metrics | sum | Cumulative; sum is meaningful. | Number of replicas, total requests, total faults |
| Ratio metrics | Not set | Let the system handle aggregation logic automatically (the generator must contain aggregation logic). | CPU usage, memory usage |
| Capacity metrics | sum | Resources need summing. | Total memory, total CPU cores |

Note: For ratio metrics (for example, usage/limit × 100) where the generator already calculates the ratio, do not set the aggregator. UModel automatically selects the aggregation policy when needed.

The following are common generator and aggregator combinations:

| Metric type | Generator | Aggregator | Description |
| --- | --- | --- | --- |
| Count metrics | sum_over_time_lcro(arms_app_requests_count[60s]) | sum | Cumulative metrics, such as total requests, require summation during aggregation. |
| Replica count metrics | kube_deployment_spec_replicas{} | sum | The number of deployment replicas. Namespace-level aggregation requires summation. |
| CPU utilization | cpu_usage_rate{} | avg | CPU utilization. Use the avg aggregation method. |
| Absolute value metrics | sum by (pod, namespace) (container_memory_usage_bytes{pod!~"POD"}) | Not set | The generator already contains the complete aggregation logic. |
| Rate metrics | sum by (pod, namespace) (rate(container_network_receive_bytes_total[5m])) | Not set | The generator already contains the complete aggregation logic. |
| Ratio metrics (aggregated) | sum(usage) / sum(requests) * 100 | Not set | The generator already contains the complete aggregation logic. |
| Average latency metrics | sum(total_seconds) / sum(request_count) | Not set | The generator already contains the complete aggregation logic. |

Specific configuration examples

1. Count metric: APM request count.

- name: request_count
  generator: 'sum_over_time_lcro(arms_app_requests_count_ign_destid_endpoint_parent_ppid_prpc{callKind=~"http|rpc|custom_entry|server|consumer|schedule"}[60s])'
  aggregator: sum
  data_format: KMB

2. Replica count metric: Desired number of deployment replicas.

- name: deployment_desired_replicas
  generator: kube_deployment_spec_replicas{}
  aggregator: sum
  data_format: KMB

3. Ratio metric (no aggregator): Deployment availability.

- name: deployment_availability_rate
  generator: (sum by (namespace, deployment) (kube_deployment_status_replicas_ready{}) / (sum by (namespace, deployment) (kube_deployment_spec_replicas{}))!=0) * 100
  # Do not set an aggregator
  data_format: percent

4. Average latency metric: APM average response time.

- name: avg_request_latency_seconds
  generator: 'sum(sum_over_time_lcro(arms_app_requests_seconds_ign_destid_endpoint_parent_ppid_prpc{callKind=~"http|rpc|custom_entry|server|consumer|schedule"}[60s])) / sum(sum_over_time_lcro(arms_app_requests_count_ign_destid_endpoint_parent_ppid_prpc{callKind=~"http|rpc|custom_entry|server|consumer|schedule"}[60s]))'
  # Do not set an aggregator
  data_format: s

Golden metric identification

Golden metrics are the most critical metrics in a MetricSet:

  • Quantity limit: 3–5 per MetricSet; do not exceed 8.

  • Selection criteria: Core metrics that directly indicate system health.

  • Scenarios: Alert rules, dashboards, and automated O&M tasks.

Metrics configuration examples

  • CPU usage metric.

    - name: deployment_cpu_usage_vs_requests
      display_name:
        zh_cn: CPU使用率相对于请求值
        en_us: CPU Usage vs. Requests
      description:
        zh_cn: Deployment 的 CPU 使用率相对于请求值的百分比
        en_us: The CPU usage of the deployment as a percentage of its requested CPU.
      generator: |
        sum by (namespace, deployment) (
          rate(container_cpu_usage_seconds_total{pod!~"POD"}[5m])
        ) / sum by (namespace, deployment) (
          kube_pod_container_resource_requests_cpu_cores{pod!~"POD"}
        ) * 100
      data_format: percent
      unit: ''
      golden_metric: true
      interval_us: [15000000]
      type: gauge
  • Request count metric.

    - name: request_count
      display_name:
        zh_cn: 请求次数
        en_us: Request Count
      description:
        zh_cn: 请求次数指的是对一个特定应用或接口发起调用的总次数
        en_us: The total number of calls to a specific application or API.
      generator: 'sum_over_time_lcro(arms_app_requests_count_ign_destid_endpoint_parent_ppid_prpc{callKind=~"http|rpc|custom_entry|server|consumer|schedule"}[60s])'
      aggregator: sum
      data_format: KMB
      golden_metric: true
      interval_us: 15000000

Advanced configuration: Variable substitution and label association

Variable substitution mechanism

  • In the UModel system, MetricSets that use Prometheus syntax support built-in label filtering and injection. When you filter on a label value defined in labels, the system automatically injects the filter's conditional expression into the generator.

  • The generator in a Metric supports join operations on multiple metrics. However, not all metrics support all label injections. For example, some advanced Kubernetes metrics require multi-metric joins for data enrichment. To address these differences in label fields, you can use the variable substitution mechanism.

Scenarios:

  • Missing labels: Some labels exist only in specific metrics.

  • Label name changes: The label names of some metrics do not match the configured names.

  • Multi-metric association queries: You need to associate different metrics using join operations.

Variable substitution syntax

Use ${{value|default_value}} to replace a label value in a metric. If the label value does not exist, the default value is used.

# Example: The node label exists only in kube_pod_info
kube_pod_info{node=~"${{node|.*}}"}
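To illustrate how the substitution behaves at query time, the sketch below traces the example above through both cases. The filter value node-1 is an illustrative placeholder.

```yaml
# Configured generator:
generator: kube_pod_info{node=~"${{node|.*}}"}

# When a user filters on node="node-1" (illustrative value),
# the executed query becomes:
#   kube_pod_info{node=~"node-1"}

# When no filter is applied, the default after "|" is substituted,
# matching all nodes:
#   kube_pod_info{node=~".*"}
```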

Label association example

Associating Kubernetes pod labels with deployment labels

In a Kubernetes scenario, to calculate the total memory requests of a deployment, you need to associate pods and deployments through a ReplicaSet. This involves joining across three metrics:

  1. kube_pod_container_resource_requests_memory_bytes: The amount of memory requested by each container in a pod.

  2. kube_pod_info: Pod information.

  3. kube_replicaset_owner: The deployment to which the ReplicaSet belongs.

The labels for the deployment are configured as namespace and deployment. You need to use the variable substitution mechanism to handle the differences in label fields.

  1. namespace: All metrics have this label, so no substitution is needed.

  2. deployment: Only kube_replicaset_owner has this label, and its actual label name is owner_name. You need to use the variable substitution mechanism to handle this.

# Memory resource-related metrics that associate pods and deployments through a ReplicaSet
- aggregator: sum
  data_format: byte
  description:
      en_us: Total memory requested by all pods in the deployment.
      zh_cn: Deployment 所有 Pod 的内存请求总量
  display_name:
      en_us: Deployment Memory Requests Total
      zh_cn: Deployment 内存请求总量
  generator: sum by (namespace, deployment) (kube_pod_container_resource_requests_memory_bytes{pod!~"POD"} * on (pod, namespace) group_left (deployment) (max by (pod, namespace, deployment) (label_replace(kube_pod_info{created_by_kind="ReplicaSet"}, "replicaset", "$1", "created_by_name", "(.*)") * on (namespace, replicaset) group_left (deployment) label_replace(kube_replicaset_owner{owner_kind="Deployment", owner_name=~"${{deployment|.*}}"}, "deployment", "$1", "owner_name", "(.*)"))))
  golden_metric: false
  interval_us:
  - 15000000
  launch_stage: ga
  name: deployment_memory_requests_total
  type: gauge
  unit: ''

MetricSet best practices

Design principles

  1. Clear semantics: The metric name should clearly express its business meaning.

  2. Reasonable labels: Avoid high-cardinality labels to maintain query performance.

  3. Complete calculation: The generator should contain the complete business logic.

  4. Standard format: Use only the data_format enumeration values defined in the schema and avoid using custom formats.

  5. Correct aggregation: Select the appropriate aggregator based on the metric's properties.

Performance optimization

  1. Label filtering: Apply label filters early in the query process.

  2. Aggregation optimization: Use functions such as sum by to reduce the number of time series.

  3. Dynamic labels: Use dynamic: true to improve label management efficiency.
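The first two principles can be combined in a single generator, as in this sketch. It reuses the CPU metric from the earlier Kubernetes examples; the namespace value prod is an illustrative placeholder.

```yaml
# Filter early inside the selector (pod!~"POD", namespace="prod"),
# then aggregate with sum by to reduce the number of returned series:
generator: |
  sum by (namespace, deployment) (
    rate(container_cpu_usage_seconds_total{pod!~"POD", namespace="prod"}[5m])
  )
```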

Complete configuration example

Kubernetes deployment MetricSet.

kind: metric_set
schema:
  url: umodel.aliyun.com
  version: v0.1.0
metadata:
  name: k8s.metric.high_level_metric_deployment
  display_name:
    zh_cn: Kubernetes 高阶 Deployment 指标
    en_us: Kubernetes High-Level Deployment Metrics
  description:
    zh_cn: 用于评估 Kubernetes Deployment 工作负载健康状况、可用性及调度效率的指标
    en_us: Metrics to evaluate the health, availability, and scheduling efficiency of Deployment workloads in Kubernetes.
  domain: k8s
spec:
  labels:
    dynamic: true
    filter: kube_deployment_spec_replicas
    keys:
      - name: namespace
        display_name:
          zh_cn: 命名空间
          en_us: Namespace
        type: string
        filterable: true
        analysable: true
        pattern: ".*"
      - name: deployment
        display_name:
          zh_cn: Deployment 名称
          en_us: Deployment
        type: string
        filterable: true
        analysable: true
        pattern: ".*"
  metrics:
    - name: deployment_desired_replicas
      display_name:
        zh_cn: Deployment 期望副本数
        en_us: Deployment Desired Replicas
      description:
        zh_cn: Deployment 期望的副本数
        en_us: The desired number of replicas for the deployment.
      generator: kube_deployment_spec_replicas{}
      aggregator: sum
      data_format: KMB
      golden_metric: true
      interval_us: [15000000]
      type: gauge
    - name: deployment_availability_rate
      display_name:
        zh_cn: Deployment 可用性
        en_us: Deployment Availability Rate
      description:
        zh_cn: Deployment 可用性百分比(就绪副本数/期望副本数)
        en_us: Deployment availability as a percentage (ready replicas / desired replicas).
      generator: |
        (sum by (namespace, deployment) (kube_deployment_status_replicas_ready{}) / 
         sum by (namespace, deployment) (kube_deployment_spec_replicas{})) * 100
      data_format: percent
      unit: ''
      golden_metric: true
      interval_us: [15000000]
      type: gauge