Overview
A MetricSet is a data structure in UModel that defines a collection of related metrics that share common attributes and configuration properties. MetricSets provide a general modeling capability for various monitoring scenarios, including CPU, memory, network, and business metrics.
This topic describes how to use UModel to model Prometheus metric data.
Case study materials
Download the materials for this use case: umodel-enrich-dev.zip.
Core concepts
Definition and purpose of MetricSet
As a core component of metric modeling, a MetricSet serves several key functions:
Metric organization: Groups related metrics into logical collections for simplified management and querying.
Label management: Defines common labels and filter rules to enable consistent metric dimensions.
Query optimization: Offers a unified interface for efficient and optimized querying.
Semantic expression: Includes rich metadata and supports multiple languages for enhanced semantic clarity.
MetricSet structure overview
Observable Data System
├── EntitySet                 # Entity set
├── TelemetryDataSet          # Telemetry dataset (metrics, logs, traces)
└── MetricSet                 # Metric set
    ├── labels                # Label definitions
    │   ├── keys: Field[]     # Label field list
    │   ├── dynamic: boolean  # Dynamic label generation
    │   └── filter: string    # Label filter
    └── metrics: Metric[]     # Metric list
        ├── name              # Metric name
        ├── generator         # Query generator
        ├── aggregator        # Aggregation method
        └── data_format       # Format method
MetricSet structure specification
The core configuration properties of a MetricSet include the following:
Property | Type | Required | Description |
labels | object | No | Label configuration. Defines the dimensional information of metrics. |
metrics | array | Yes | A list of metrics. Must contain at least one metric. |
| enum | No | The query syntax type. Valid values: |
| boolean | No | Specifies whether the metric requires secondary calculation and processing. Default value: false. |
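As a rough illustration of the structure described above, the following Python sketch mirrors the labels and metrics properties with dataclasses. The class names and the simplified field types (for example, label keys as plain strings) are assumptions made for illustration; they are not part of UModel or of any SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Labels:
    keys: List[str] = field(default_factory=list)  # label field names (simplified)
    dynamic: bool = False                          # documented default value
    filter: Optional[str] = None                   # label filter in Prometheus syntax


@dataclass
class Metric:
    name: str
    generator: Optional[str] = None    # PromQL expression
    aggregator: Optional[str] = None   # for example "sum"; left unset for ratio metrics
    data_format: Optional[str] = None


@dataclass
class MetricSet:
    metrics: List[Metric]
    labels: Optional[Labels] = None    # labels are optional

    def __post_init__(self) -> None:
        # The specification requires at least one metric.
        if not self.metrics:
            raise ValueError("a MetricSet must contain at least one metric")


replicas = Metric(
    name="deployment_desired_replicas",
    generator="kube_deployment_spec_replicas{}",
    aggregator="sum",
    data_format="KMB",
)
metric_set = MetricSet(
    metrics=[replicas],
    labels=Labels(
        keys=["namespace", "deployment"],
        dynamic=True,
        filter="kube_deployment_spec_replicas",
    ),
)
```

Constructing a MetricSet with an empty metrics list raises ValueError in this sketch, which reflects the "at least one metric" requirement.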
Modeling labels
Label design principles
Labels define the dimensional properties of metrics. To ensure effective metric modeling, follow these principles:
Generality: MetricSet-level labels should represent dimensions shared across all metrics in the set.
Dynamic generation: Use dynamic methods to automatically generate labels and avoid hard coding.
Efficient filtering: Design labels to support efficient indexing and filtering for fast queries.
Cardinality control: Avoid using high-cardinality labels because they can introduce performance issues.
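To make the cardinality principle concrete, the sketch below estimates an upper bound on the number of time series as the product of the distinct values of each label. The function name, the label counts, and the request_id label are hypothetical and serve only to show how one high-cardinality label dominates the total.

```python
from math import prod
from typing import Dict


def max_series_count(distinct_values: Dict[str, int]) -> int:
    """Upper bound on time series: product of distinct values per label."""
    return prod(distinct_values.values())


# Two low-cardinality labels (hypothetical counts).
low = max_series_count({"namespace": 20, "deployment": 50})

# The same labels plus one hypothetical high-cardinality label.
high = max_series_count({"namespace": 20, "deployment": 50, "request_id": 100_000})

print(low)   # 1000
print(high)  # 100000000
```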
Label property configuration
Property | Type | Default value | Description | Recommended operation |
keys | array | - | A list of label fields. For more information about the format, see Field definition. | Define key dimension fields. |
dynamic | boolean | false | Specifies whether labels are dynamically generated. | We strongly recommend setting this to true. |
filter | string | - | A label filter using Prometheus query syntax. | Use in conjunction with dynamic labels. |
Label field definition
Each label field inherits all properties of a general Field. Pay special attention to the following properties:
Property | Recommended value | Description | Notes |
filterable | true | Supports filter queries. | Labels should typically support filtering. |
analysable | true | Supports aggregation analysis. | Labels should typically support aggregation analysis. |
| true | Supports sorting. | Labels should typically support sorting. |
pattern | ".*" | Regular expression pattern. | No restrictions on value. |
Label configuration examples
Kubernetes scenario
labels:
  dynamic: true
  filter: 'kube_deployment_spec_replicas'
  keys:
    - name: namespace
      display_name:
        zh_cn: 命名空间
        en_us: Namespace
      type: string
      filterable: true
      analysable: true
      pattern: ".*"
    - name: deployment
      display_name:
        zh_cn: 部署名称
        en_us: Deployment
      type: string
      filterable: true
      analysable: true
      pattern: ".*"
APM scenario
labels:
  dynamic: true
  filter: 'arms_app_requests_count_ign_destid_endpoint_parent_ppid_prpc{callKind=~"http|rpc|custom_entry|server|consumer|schedule"}'
  keys:
    - name: service
      display_name:
        zh_cn: 服务名称
        en_us: Service
      type: string
      filterable: true
      analysable: true
    - name: rpc
      display_name:
        zh_cn: 接口名称
        en_us: Operation
      type: string
      filterable: true
      analysable: true
Modeling metrics
Metric design principles
Metrics are the core components of a MetricSet. Each metric represents a distinct, queryable monitoring dimension. When designing metrics, consider the following principles:
Business semantics: Metric names should clearly reflect their business meaning.
Complete calculation: The generator must contain the complete calculation logic, not just reference a raw metric.
Clear units: Configure the data_format and unit properties correctly.
Appropriate aggregation: Select the correct aggregator based on the metric's properties.
Core metric properties
A Metric extends the basic field system with monitoring-specific properties:
Property | Type | Required | Description | Example |
generator | string | No | Prometheus Query Language (PromQL) expression. | |
aggregator | string | No | The aggregation method. Example: sum. | |
golden_metric | boolean | No | Specifies whether the metric is a golden metric. Default value: false. | true |
interval_us | integer/array | No | Collection interval in microseconds. | |
type | string | No | The metric type. Default value: | |
| enum | No | The recommended query mode. Valid values: | |
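Because the collection interval is given in microseconds and may be either a single integer or an array, a small helper can make the values easier to read. The following is a minimal sketch; the function name interval_seconds is hypothetical and not part of UModel.

```python
from typing import List, Union


def interval_seconds(interval_us: Union[int, List[int]]) -> List[float]:
    """Normalize an interval_us value (integer or array) to a list of seconds."""
    values = [interval_us] if isinstance(interval_us, int) else list(interval_us)
    # 1 second = 1,000,000 microseconds
    return [v / 1_000_000 for v in values]


print(interval_seconds(15000000))    # [15.0]
print(interval_seconds([15000000]))  # [15.0]
```

Both forms of the value 15000000 used in this topic therefore correspond to a 15-second collection interval.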
Data format specification
Metrics are often visualized in charts, making the data_format field important. Whenever possible, use the enumeration values defined in the schema and avoid using custom formats.
For a list of available format options, see Field system.
Aggregator selection guide
Choosing the correct aggregator ensures query accuracy:
Metric type | Recommended aggregator | Reason | Example |
Count metrics | sum | Cumulative; sum is meaningful | Number of replicas, total requests, total faults |
Ratio metrics | Not set | Let the system handle aggregation logic automatically (the generator must contain aggregation logic) | CPU usage, memory usage |
Capacity metrics | sum | Resources need summing | Total memory, total CPU cores |
Note: For ratio metrics (for example, usage/limit × 100) where the generator already calculates the ratio, do not set the aggregator. UModel automatically selects the aggregation policy when needed.
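The reason a sum aggregator does not suit ratio metrics can be checked with simple arithmetic. The sketch below uses two containers with made-up usage and limit values (assumptions for illustration only) and compares summing precomputed ratios with dividing the summed usage by the summed limit.

```python
# Hypothetical usage and limit values for two containers (CPU cores).
usage = [1.0, 1.0]
limit = [2.0, 8.0]

# Per-container ratio, computed as usage / limit * 100.
ratios = [100 * u / l for u, l in zip(usage, limit)]

# Summing already-computed ratios gives a figure with no physical meaning.
sum_of_ratios = sum(ratios)

# Aggregating numerator and denominator first, then dividing, gives the
# actual combined utilization.
ratio_of_sums = 100 * sum(usage) / sum(limit)

print(ratios)         # [50.0, 12.5]
print(sum_of_ratios)  # 62.5
print(ratio_of_sums)  # 20.0
```

The combined utilization is 20%, not 62.5%, which is why the aggregation logic for a ratio metric belongs inside the generator (sum of usage divided by sum of limit) rather than in a separate aggregator.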
The following are common Generator and Aggregator combinations:
Metric type | Generator | Aggregator | Description |
Count metrics | | sum | Cumulative metrics, such as total requests, require summation during aggregation. |
Replica count metrics | | sum | The number of deployment replicas. Namespace-level aggregation requires summation. |
CPU utilization | | | CPU utilization. Use the |
Absolute value metrics | | Not set | The generator already contains the complete aggregation logic. |
Rate metrics | | Not set | The generator already contains the complete aggregation logic. |
Ratio metrics (aggregated) | | Not set | The generator already contains the complete aggregation logic. |
Average latency metrics | | Not set | The generator already contains the complete aggregation logic. |
Specific configuration examples
1. Count metric: APM request count.
- name: request_count
  generator: 'sum_over_time_lcro(arms_app_requests_count_ign_destid_endpoint_parent_ppid_prpc{callKind=~"http|rpc|custom_entry|server|consumer|schedule"}[60s])'
  aggregator: sum
  data_format: KMB
2. Replica count metric: Desired number of deployment replicas.
- name: deployment_desired_replicas
  generator: kube_deployment_spec_replicas{}
  aggregator: sum
  data_format: KMB
3. Ratio metric (no aggregator): Deployment availability.
- name: deployment_availability_rate
  generator: (sum by (namespace, deployment) (kube_deployment_status_replicas_ready{}) / (sum by (namespace, deployment) (kube_deployment_spec_replicas{}))!=0) * 100
  # Do not set an aggregator
  data_format: percent
4. Average latency metric: APM average response time.
- name: avg_request_latency_seconds
  generator: 'sum(sum_over_time_lcro(arms_app_requests_seconds_ign_destid_endpoint_parent_ppid_prpc{callKind=~"http|rpc|custom_entry|server|consumer|schedule"}[60s])) / sum(sum_over_time_lcro(arms_app_requests_count_ign_destid_endpoint_parent_ppid_prpc{callKind=~"http|rpc|custom_entry|server|consumer|schedule"}[60s]))'
  # Do not set an aggregator
  data_format: s
Golden metric identification
Golden metrics are the most critical metrics in a MetricSet. Apply the following guidelines when you mark a metric as golden:
Quantity limit: 3–5 per MetricSet; do not exceed 8.
Selection criteria: Core metrics that directly indicate system health.
Scenarios: Alert rules, dashboards, and automated O&M tasks.
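The quantity guideline above is easy to check mechanically. The following sketch counts the metrics flagged with golden_metric in a list of metric definitions and classifies the result; the function name and the returned strings are hypothetical, and the metric definitions are assumed to be plain dictionaries loaded from YAML.

```python
from typing import Dict, List


def check_golden_metrics(metrics: List[Dict]) -> str:
    """Classify a MetricSet by its number of golden metrics."""
    # golden_metric defaults to false when it is not set.
    count = sum(1 for m in metrics if m.get("golden_metric", False))
    if count > 8:
        return "exceeds limit"
    if 3 <= count <= 5:
        return "recommended"
    return "outside recommended range"


example = [
    {"name": "deployment_desired_replicas", "golden_metric": True},
    {"name": "deployment_availability_rate", "golden_metric": True},
    {"name": "deployment_memory_requests_total", "golden_metric": False},
]
print(check_golden_metrics(example))  # outside recommended range
```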
Metrics configuration examples
CPU usage metric.
- name: deployment_cpu_usage_vs_requests
  display_name:
    zh_cn: CPU使用率相对于请求值
    en_us: CPU Usage vs. Requests
  description:
    zh_cn: Deployment 的 CPU 使用率相对于请求值的百分比
    en_us: The CPU usage of the deployment as a percentage of its requested CPU.
  generator: |
    sum by (namespace, deployment) (
      rate(container_cpu_usage_seconds_total{pod!~"POD"}[5m])
    )
    /
    sum by (namespace, deployment) (
      kube_pod_container_resource_requests_cpu_cores{pod!~"POD"}
    ) * 100
  data_format: percent
  unit: ''
  golden_metric: true
  interval_us: [15000000]
  type: gauge
Request count metric.
- name: request_count
  display_name:
    zh_cn: 请求次数
    en_us: Request Count
  description:
    zh_cn: 请求次数指的是对一个特定应用或接口发起调用的总次数
    en_us: The total number of calls to a specific application or API.
  generator: 'sum_over_time_lcro(arms_app_requests_count_ign_destid_endpoint_parent_ppid_prpc{callKind=~"http|rpc|custom_entry|server|consumer|schedule"}[60s])'
  aggregator: sum
  data_format: KMB
  golden_metric: true
  interval_us: 15000000
Advanced configuration: Variable substitution and label association
Variable substitution mechanism
In the UModel system, MetricSets that use Prometheus syntax support built-in label filtering and injection. When you filter label values in Labels, the system automatically injects the filter's conditional expression into the generator.
The generator in a Metric supports join operations on multiple metrics. However, not all metrics support all label injections. For example, some advanced Kubernetes metrics require multi-metric joins for data enrichment. To address these differences in label fields, you can use the variable substitution mechanism.
Scenarios:
Missing labels: Some labels exist only in specific metrics.
Label name changes: The label names of some metrics do not match the configured names.
Multi-metric association queries: You need to associate different metrics using join operations.
Variable substitution syntax
Use ${{value|default_value}} to replace a label value in a metric. If the label value does not exist, the default value is used.
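The substitution rule can be sketched in a few lines of Python. This is only an illustration of the documented behavior (use the selected label value if present, otherwise the default); the substitute function and the regular expression are assumptions, and the real UModel implementation may handle escaping and edge cases differently.

```python
import re
from typing import Mapping

# Matches ${{name|default}} and captures the label name and the default value.
_PLACEHOLDER = re.compile(r"\$\{\{(\w+)\|(.*?)\}\}")


def substitute(generator: str, selected: Mapping[str, str]) -> str:
    """Replace each placeholder with the selected label value or its default."""

    def repl(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        return selected.get(name, default)

    return _PLACEHOLDER.sub(repl, generator)


template = 'kube_pod_info{node=~"${{node|.*}}"}'
print(substitute(template, {}))                    # kube_pod_info{node=~".*"}
print(substitute(template, {"node": "worker-1"}))  # kube_pod_info{node=~"worker-1"}
```

With no node value selected, the placeholder falls back to .* and the selector matches every node.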
# Example: The node label exists only in kube_pod_info
kube_pod_info{node=~"${{node|.*}}"}
Label association example
Associating Kubernetes pod labels with deployment labels
In a Kubernetes scenario, to calculate the total memory requests of a deployment, you need to associate pods and deployments through a ReplicaSet. This involves joining across three metrics:
kube_pod_container_resource_requests_memory_bytes: The amount of memory requested by each container in a pod.
kube_pod_info: Pod information.
kube_replicaset_owner: The deployment to which the ReplicaSet belongs.
The labels for the deployment are configured as namespace and deployment. You need to use the variable substitution mechanism to handle the differences in label fields.
namespace: All metrics have this label, so no substitution is needed.
deployment: Only kube_replicaset_owner has this label, and its actual label name is owner_name. You need to use the variable substitution mechanism to handle this.
# Memory resource-related metrics that associate pods and deployments through a ReplicaSet
- aggregator: sum
  data_format: byte
  description:
    en_us: Total memory requested by all pods in the deployment.
    zh_cn: Deployment 所有 Pod 的内存请求总量
  display_name:
    en_us: Deployment Memory Requests Total
    zh_cn: Deployment 内存请求总量
  generator: sum by (namespace, deployment) (kube_pod_container_resource_requests_memory_bytes{pod!~"POD"} * on (pod, namespace) group_left (deployment) (max by (pod, namespace, deployment) (label_replace(kube_pod_info{created_by_kind="ReplicaSet"}, "replicaset", "$1", "created_by_name", "(.*)") * on (namespace, replicaset) group_left (deployment) label_replace(kube_replicaset_owner{owner_kind="Deployment", owner_name=~"${{deployment|.*}}"}, "deployment", "$1", "owner_name", "(.*)"))))
  golden_metric: false
  interval_us:
    - 15000000
  launch_stage: ga
  name: deployment_memory_requests_total
  type: gauge
  unit: ''
MetricSet best practices
Design principles
Clear semantics: The metric name should clearly express its business meaning.
Reasonable labels: Avoid high-cardinality labels to maintain query performance.
Complete calculation: The generator should contain the complete business logic.
Standard format: Use only the data_format enumeration values defined in the schema and avoid using custom formats.
Correct aggregation: Select the appropriate aggregator based on the metric's properties.
Performance optimization
Label filtering: Apply label filters early in the query process.
Aggregation optimization: Use functions such as sum by to reduce the number of time series.
Dynamic labels: Use dynamic: true to improve label management efficiency.
Complete configuration example
Kubernetes deployment MetricSet.
kind: metric_set
schema:
  url: umodel.aliyun.com
  version: v0.1.0
metadata:
  name: k8s.metric.high_level_metric_deployment
  display_name:
    zh_cn: Kubernetes 高阶 Deployment 指标
    en_us: Kubernetes High-Level Deployment Metrics
  description:
    zh_cn: 用于评估 Kubernetes Deployment 工作负载健康状况、可用性及调度效率的指标
    en_us: Metrics to evaluate the health, availability, and scheduling efficiency of Deployment workloads in Kubernetes.
  domain: k8s
spec:
  labels:
    dynamic: true
    filter: kube_deployment_spec_replicas
    keys:
      - name: namespace
        display_name:
          zh_cn: 命名空间
          en_us: Namespace
        type: string
        filterable: true
        analysable: true
        pattern: ".*"
      - name: deployment
        display_name:
          zh_cn: Deployment 名称
          en_us: Deployment
        type: string
        filterable: true
        analysable: true
        pattern: ".*"
  metrics:
    - name: deployment_desired_replicas
      display_name:
        zh_cn: Deployment 期望副本数
        en_us: Deployment Desired Replicas
      description:
        zh_cn: Deployment 期望的副本数
        en_us: The desired number of replicas for the deployment.
      generator: kube_deployment_spec_replicas{}
      aggregator: sum
      data_format: KMB
      golden_metric: true
      interval_us: [15000000]
      type: gauge
    - name: deployment_availability_rate
      display_name:
        zh_cn: Deployment 可用性
        en_us: Deployment Availability Rate
      description:
        zh_cn: Deployment 可用性百分比(就绪副本数/期望副本数)
        en_us: Deployment availability as a percentage (ready replicas / desired replicas).
      generator: |
        (sum by (namespace, deployment) (kube_deployment_status_replicas_ready{}) /
        sum by (namespace, deployment) (kube_deployment_spec_replicas{})) * 100
      data_format: percent
      unit: ''
      golden_metric: true
      interval_us: [15000000]
      type: gauge