
Cloud Monitor:Connect a custom MetricStore to Cloud Monitor 2.0

Last Updated:Dec 18, 2025

This topic describes how to connect a custom Simple Log Service MetricStore to the Cloud Monitor 2.0 UModel system. This connection lets you uniformly model, query, and manage metric data. Using UModel MetricSet modeling, you can associate disparate time series metric data with business entities to build a complete observability data system.

  • Unified data model: Structure metric data using MetricSet modeling to provide a consistent data access experience.

  • Entity association: Enhance the observability of existing entities by associating metric data with them.

  • Unified query: Perform unified queries using Structured Process Language (SPL) and custom analysis queries, without needing to know the underlying storage details.

Precautions

  • Domain naming conventions: Do not use the same names as built-in domains, such as apm, k8s, or acs, when you configure new UModel domains. This practice prevents accidental deletion during a system upgrade.

  • Permission requirements: Ensure that the current user has read permissions for the MetricStore. Otherwise, queries fail and return a permission error.

  • UModel uniqueness: Ensure that the <kind, domain, name> of a new node is globally unique in UModel.

Prerequisites

  1. Simple Log Service MetricStore: You have created a Simple Log Service MetricStore and are writing metric data to it.

  2. Metric data: You understand the metric names and label structure in the MetricStore.

Procedure

Step 1: Define the Simple Log Service MetricStore storage

Create an sls_metricstore.yaml file to define the physical storage location of the metric data:

kind: sls_metricstore
schema:
  url: "umodel.aliyun.com"
  version: "v0.1.0"
metadata:
  name: "custom.metrics.storage"
  display_name:
    en_us: "Custom Metrics Storage"
  description:
    en_us: "Physical storage configuration for custom business metrics"
  domain: custom
spec:
  region: "cn-hangzhou"           # The region where the Simple Log Service project is located.
  project: "my-observability"     # The name of the Simple Log Service project.
  store: "business-metrics"       # The name of the MetricStore.

Parameter description:

| Parameter | Type | Required | Description | Example |
|-----------|------|----------|-------------|---------|
| kind | string | Yes | Storage class identifier. Fixed value. | sls_metricstore |
| region | string | Yes | The region where the Simple Log Service project resides. | cn-hangzhou |
| project | string | Yes | The name of the Simple Log Service project. | my-observability |
| store | string | Yes | The name of the MetricStore. | business-metrics |
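
If you manage these YAML files in a pipeline, a quick structural check before upload can catch a missing required field early. The following Python sketch is a hypothetical helper (validate_metricstore is not a Cloud Monitor API); it only enforces the required fields from the table above:

```python
# Hypothetical pre-upload check for an sls_metricstore definition.
# It verifies the fixed kind value and the required spec fields.

REQUIRED_SPEC_FIELDS = ("region", "project", "store")

def validate_metricstore(doc: dict) -> list:
    """Return a list of problems found in the definition; empty means OK."""
    problems = []
    if doc.get("kind") != "sls_metricstore":
        problems.append("kind must be the fixed value 'sls_metricstore'")
    spec = doc.get("spec", {})
    for field in REQUIRED_SPEC_FIELDS:
        if not spec.get(field):
            problems.append(f"spec.{field} is required")
    return problems

# The definition from Step 1, expressed as a dict.
doc = {
    "kind": "sls_metricstore",
    "metadata": {"name": "custom.metrics.storage", "domain": "custom"},
    "spec": {"region": "cn-hangzhou", "project": "my-observability",
             "store": "business-metrics"},
}
print(validate_metricstore(doc))  # []
```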

Step 2: Define the MetricSet

Choose one of the following two methods:

  • Associate an existing MetricSet: If a suitable MetricSet definition already exists in the system, you can directly associate it with your MetricStore in Step 3.

  • Create a new MetricSet: To customize the metric structure, create a metric_set.yaml file to define the structure, labels, and query method for the metric data:

    kind: metric_set
    schema:
      url: "umodel.aliyun.com"
      version: "v0.1.0"
    metadata:
      name: "custom.metric.business"
      display_name:
        en_us: "Business Monitoring Metrics"
      description:
        en_us: "Custom business monitoring metrics including request count, response time, success rate"
      domain: custom
    spec:
      query_type: prom             # Query syntax: prom (MetricStore uses PromQL)
      labels:
        dynamic: true              # Specifies whether to dynamically generate labels. We recommend that you set this to true.
        filter: 'business_request_total'  # The metric used for dynamic label generation.
        keys: # A list of label fields that define all required dimension fields. 
          - name: service_name
            display_name:
              en_us: Service Name
            type: string
            filterable: true
            analysable: true
            pattern: ".*"
          - name: endpoint
            display_name:
              en_us: Endpoint
            type: string
            filterable: true
            analysable: true
            pattern: ".*"
          - name: region
            display_name:
              en_us: Region
            type: string
            filterable: true
            analysable: true
            pattern: ".*"
      metrics:
        # Request count metric
        - name: request_count # The name of the metric, which must be unique.
          display_name:
            en_us: Request Count
          description:
            en_us: Total number of requests received by the service
          generator: 'sum(increase(business_request_total{}[1m]))' # The PromQL expression that defines the metric.
          aggregator: sum # The operator for aggregation by dimension (label). Examples: sum, avg, max, and min.
          data_format: KMB # Data format. Examples: KMB, percent, byte, ms, and s.
          golden_metric: true # Specifies whether the metric is a golden metric. We recommend that you specify 3 to 5 golden metrics.
          interval_us: [60000000] # The collection interval in microseconds.
          type: gauge
    
        # Average response time metric
        - name: avg_response_time
          display_name:
            en_us: Average Response Time
          description:
            en_us: Average response time of service requests
          generator: |
            sum(rate(business_response_seconds_sum{}[1m])) / 
            sum(rate(business_response_seconds_count{}[1m]))
          data_format: ms
          unit: 'ms'
          golden_metric: true
          interval_us: [60000000]
          type: gauge
    
        # Success rate metric
        - name: success_rate
          display_name:
            en_us: Success Rate
          description:
            en_us: Success rate percentage of service requests
          generator: |
            (sum(increase(business_request_total{status="success"}[1m])) / 
             sum(increase(business_request_total{}[1m]))) * 100
          data_format: percent
          unit: '%'
          golden_metric: true
          interval_us: [60000000]
          type: gauge
    
        # Error count metric
        - name: error_count
          display_name:
            en_us: Error Count
          description:
            en_us: Number of failed requests
          generator: 'sum(increase(business_request_total{status="error"}[1m]))'
          aggregator: sum
          data_format: KMB
          golden_metric: false
          interval_us: [60000000]
          type: gauge

    Labels configuration:

    | Property | Type | Description | Recommended value |
    |----------|------|-------------|-------------------|
    | dynamic | boolean | Specifies whether to dynamically generate labels. | true (highly recommended) |
    | filter | string | The metric used to filter series for dynamic label generation. | A representative metric name. |
    | keys | array | The list of label fields. | Define all required dimension fields. |
    Metrics configuration:

    | Property | Type | Required | Description |
    |----------|------|----------|-------------|
    | name | string | Yes | The name of the metric. Must be unique. |
    | generator | string | Yes | The PromQL expression that generates the metric. |
    | aggregator | string | No | The operator for aggregation by dimension (label), such as sum, avg, max, or min. |
    | data_format | string | Yes | The data format: KMB, percent, byte, ms, or s. |
    | golden_metric | boolean | No | Specifies whether the metric is a golden metric. We recommend 3 to 5 golden metrics. |
    | interval_us | array | No | The collection interval, in microseconds. |
    For more information, see Metrics modeling.
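
The conventions above lend themselves to a mechanical pre-upload check. The following sketch is a hypothetical lint (check_metric_set is not part of the product); it verifies that metric names are unique, that 3 to 5 metrics are marked golden, and that each interval_us value is a whole number of seconds expressed in microseconds (60000000 µs corresponds to the 1m PromQL window used above):

```python
def check_metric_set(metrics: list) -> list:
    """Hypothetical pre-upload lint for a metric_set spec; returns warnings."""
    warnings = []
    names = [m["name"] for m in metrics]
    if len(names) != len(set(names)):
        warnings.append("metric names must be unique")
    golden = sum(1 for m in metrics if m.get("golden_metric"))
    if not 3 <= golden <= 5:
        warnings.append(f"{golden} golden metrics; 3 to 5 are recommended")
    for m in metrics:
        for us in m.get("interval_us", []):
            # interval_us is in microseconds; 60_000_000 us = 60 s = 1 minute.
            if us % 1_000_000 != 0:
                warnings.append(f"{m['name']}: interval {us} us is not whole seconds")
    return warnings

# The four metrics from metric_set.yaml, reduced to the checked fields.
metrics = [
    {"name": "request_count", "golden_metric": True, "interval_us": [60_000_000]},
    {"name": "avg_response_time", "golden_metric": True, "interval_us": [60_000_000]},
    {"name": "success_rate", "golden_metric": True, "interval_us": [60_000_000]},
    {"name": "error_count", "golden_metric": False, "interval_us": [60_000_000]},
]
print(check_metric_set(metrics))  # []
```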

Step 3: Create a Storage Link

Create a storage_link.yaml file to associate the MetricSet with the SLS MetricStore:

kind: storage_link
schema:
  url: "umodel.aliyun.com"
  version: "v0.1.0"
metadata:
  name: "custom.metric.business_storage_link"
  display_name:
    en_us: "Business Metrics Storage Link"
  description:
    en_us: "Link business metrics to SLS MetricStore"
  domain: custom
spec:
  src:
    domain: custom
    kind: metric_set
    name: custom.metric.business
  dest:
    domain: custom
    kind: sls_metricstore
    name: custom.metrics.storage
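
A storage link resolves only if its src and dest triples exactly match the <kind, domain, name> of the nodes defined in Steps 1 and 2. The following Python sketch is a hypothetical consistency check (link_resolves is not a product API) for catching a mismatched name before upload:

```python
def link_resolves(link: dict, nodes: list) -> bool:
    """Check that both endpoints of a storage_link refer to known UModel nodes,
    identified by their <kind, domain, name> triple."""
    known = {(n["kind"], n["metadata"]["domain"], n["metadata"]["name"])
             for n in nodes}
    for end in ("src", "dest"):
        e = link["spec"][end]
        if (e["kind"], e["domain"], e["name"]) not in known:
            return False
    return True

# The nodes from Steps 1 and 2, and the link from Step 3, as dicts.
nodes = [
    {"kind": "metric_set",
     "metadata": {"domain": "custom", "name": "custom.metric.business"}},
    {"kind": "sls_metricstore",
     "metadata": {"domain": "custom", "name": "custom.metrics.storage"}},
]
link = {"spec": {
    "src": {"domain": "custom", "kind": "metric_set",
            "name": "custom.metric.business"},
    "dest": {"domain": "custom", "kind": "sls_metricstore",
             "name": "custom.metrics.storage"},
}}
print(link_resolves(link, nodes))  # True
```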

Step 4: Upload the configuration

Upload the configuration files to the UModel system.

  1. Log on to the Cloud Monitor 2.0 console.

  2. Navigate to the target workspace and choose UModel Explorer from the navigation pane on the left.

  3. On the UModel Explorer page, click the image icon in the upper-right corner and click Upload UModel YAML/JSON.

  4. In the Batch Upload UModel dialog box, click the image icon or drag the files into the upload box to select the following files: sls_metricstore.yaml, metric_set.yaml, and storage_link.yaml.

  5. After the files are uploaded, click Import.

  6. After the import is successful, click Submit in the upper-right corner of the page.

  7. On the submission preview page, review the content. If the content is correct, click Confirm that the content is as expected and execute the change. In the Change Details dialog box, click OK.

  8. On the UModel Explorer page, confirm that the MetricSet, StorageLink, and SLS MetricStore have been created.

Step 5: Verify the configuration

Go to Entity Explorer, click SPL, and enter the following SPL statements to verify the configuration:

1. Verify that the MetricSet was created successfully

.umodel | where kind = 'metric_set' and json_extract_scalar(metadata, '$.name') = 'custom.metric.business'

2. Verify that the data can be queried

.metric_set with(domain='custom', name='custom.metric.business') | limit 10

Data query

After you connect a MetricStore to UModel, you can query data using SPL, a unified query language provided by Simple Log Service that lets you query the metric data in a MetricSet directly.

To query data, go to the Entity Explorer page, click SPL, and enter a search statement in the input box.

Basic query syntax

-- Basic syntax for querying a MetricSet
.metric_set with(
    domain='Domain name', 
    name='Dataset name', 
    source='metrics', 
    metric='Metric name', 
    [Other optional parameters]
)

Examples:

  • Basic query: Query the api_request_duration metric. This query does not aggregate and returns all matching time series; the step size is determined automatically.

    .metric_set with(
        domain='rum', 
        name='rum.metric.api', 
        source='metrics', 
        metric='api_request_duration',
        aggregate=false
    )
  • Aggregate query: Query the request_count metric, aggregate the data by the service and operation dimensions, filter the data where service_id = "hwx28v3j7p@9949e3dbf79e9a082105c", and set the step size to 1 minute.

    .metric_set with(
        domain='apm', 
        name='apm.metric.apm.operation', 
        source='metrics', 
        metric='request_count',
        query_type='range',
        step='1m',
        aggregate=true,
        aggregate_labels=['service', 'operation'],
        query='service_id = "hwx28v3j7p@9949e3dbf79e9a082105c"'
    )
  • Post-aggregation analysis: Query the error_rate metric, perform a unified aggregation, and then call the anomaly detection SPL for further analysis.

    .metric_set with(
        domain='apm',
        name='apm.metric.apm.service',
        source='metrics',
        metric='error_rate',
        step='1m',
        aggregate=true
    )
    | extend slice_index = find_first_index(__ts__, x -> x > 1756904640000000000)
    | extend len = cardinality(__ts__)
    | extend ret = series_cnn_anomalies(__value__)
    | extend anomalies_score_series = ret.anomalies_score_series, anomalies_type_series = ret.anomalies_type_series, error_msg = ret.error_msg
    | project __labels__, __name__, __ts__, __value__, anomalies_score_series, anomalies_type_series, error_msg, len, slice_index
    | extend __ts__ = slice(__ts__, slice_index, len - slice_index), __value__ = slice(__value__, slice_index, len - slice_index), anomalies_score_series = slice(anomalies_score_series, slice_index, len - slice_index), anomalies_type_series = slice(anomalies_type_series, slice_index, len - slice_index)
    | extend anomaly_cnt = cardinality(filter(anomalies_score_series, x -> x > 0.5)), anomaly_score = array_sum(filter(anomalies_score_series, x -> x > 0.5))
    | extend sort_score = anomaly_score / cast(anomaly_cnt as double)
    | sort sort_score desc, anomaly_cnt desc
    
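The slicing steps in the example above trim each series to the points after a cutoff timestamp: find_first_index locates the first timestamp past the cutoff, and slice keeps the tails of the timestamp and value arrays from that index on. As a rough Python equivalent of just the windowing logic (assuming 0-based indexing; the anomaly scoring itself is a built-in SPL function and is not reproduced here):

```python
def trim_series(ts, values, cutoff):
    """Keep only the points whose timestamp is strictly greater than cutoff,
    mirroring the find_first_index + slice steps of the SPL pipeline."""
    # find_first_index(__ts__, x -> x > cutoff)
    slice_index = next((i for i, t in enumerate(ts) if t > cutoff), len(ts))
    # slice(arr, slice_index, len - slice_index) keeps the tail from slice_index
    return ts[slice_index:], values[slice_index:]

ts = [100, 200, 300, 400]
vals = [1.0, 2.0, 3.0, 4.0]
print(trim_series(ts, vals, 250))  # ([300, 400], [3.0, 4.0])
```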

EntitySet-based data query

  • Query the request_count metric where service_id is order-service.

    .entity_set with(domain='apm', name='apm.service', query='service_id = "order-service"')
    | entity-call get_metric('apm', 'apm.metric.apm.service', 'request_count', 'range', '1m')
  • Query the request_count metric, perform a unified aggregation, and call the anomaly detection SPL for further analysis.

    .entity_set with(domain='apm', name='apm.service', query='service_id = "order-service"')
    | entity-call get_metric('apm', 'apm.metric.apm.service', 'request_count', 'range', '1m')
    | extend slice_index = find_first_index(__ts__, x -> x > 1756904640000000000)
    | extend len = cardinality(__ts__)
    | extend ret = series_cnn_anomalies(__value__)
    | extend anomalies_score_series = ret.anomalies_score_series, anomalies_type_series = ret.anomalies_type_series, error_msg = ret.error_msg
    | project __labels__, __name__, __ts__, __value__, anomalies_score_series, anomalies_type_series, error_msg, len, slice_index
    | extend __ts__ = slice(__ts__, slice_index, len - slice_index), __value__ = slice(__value__, slice_index, len - slice_index), anomalies_score_series = slice(anomalies_score_series, slice_index, len - slice_index), anomalies_type_series = slice(anomalies_type_series, slice_index, len - slice_index)
    | extend anomaly_cnt = cardinality(filter(anomalies_score_series, x -> x > 0.5)), anomaly_score = array_sum(filter(anomalies_score_series, x -> x > 0.5))
    | extend sort_score = anomaly_score / cast(anomaly_cnt as double)
    | sort sort_score desc, anomaly_cnt desc