
Configure SLO for Application Service in Alibaba Cloud Service Mesh (3): Configure SLOs for Applications in ASM

Part 3 of this series explains how to configure SLOs for applications in ASM.

By Xining Wang (xining.wxn@alibaba-inc.com)

This is the 3rd article in the series:

  1. Configure SLO for Application Service in Alibaba Cloud Service Mesh (1): SLO Overview
  2. Configure SLO for Application Service in Alibaba Cloud Service Mesh (2): SLO Definition in ASM
  3. Configure SLO for Application Service in Alibaba Cloud Service Mesh (3): Configure SLOs for Applications in ASM
  4. Configure SLO for Application Service in Alibaba Cloud Service Mesh (4): Import the Generated Rules to Prometheus to Execute SLO
  5. Configure SLO for Application Service in Alibaba Cloud Service Mesh (5): Use Grafana to View SLO

You can configure service level objectives (SLOs) manually based on Prometheus metrics, but the process is cumbersome. Alibaba Cloud Service Mesh (ASM) simplifies it: you declare SLOs and their associated alert rules in a custom resource written in YAML, and ASM generates the corresponding Prometheus rules. This article explains how to configure SLOs for applications in ASM.

Prerequisites

  • An ASM instance of version 1.15.3 or later has been created. For more information, see Create an ASM Instance.

Configure an SLO

In this example, an availability SLO is configured for the httpbin application in the default namespace. The objective is 99%, and the SLO period is 30 days. Alerts are configured at two severity levels: pageAlert and ticketAlert.
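A 99% objective over 30 days leaves an error budget of 1% of requests. As a quick sanity check, the following Python sketch converts the objective into an error budget ratio and, assuming a complete outage with traffic spread evenly over the period, into equivalent downtime. The function name error_budget is hypothetical and not part of ASM.

```python
from datetime import timedelta


def error_budget(objective_percent: float, period: timedelta) -> tuple[float, timedelta]:
    """Return the error budget as a ratio and as equivalent full-outage time.

    The time figure assumes every request fails during the outage and that
    traffic is spread evenly across the period.
    """
    budget_ratio = (100 - objective_percent) / 100  # 99% objective -> 1% budget
    return budget_ratio, period * budget_ratio


ratio, downtime = error_budget(99, timedelta(days=30))
print(f"error budget ratio: {ratio:.2f}")        # error budget ratio: 0.01
print(f"allowed full-outage time: {downtime}")  # allowed full-outage time: 7:12:00
```

Under these assumptions, httpbin could be completely unavailable for about 7.2 hours in a 30-day period before the SLO is violated.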

Save the following configuration as prometheusservicelevel.yaml. Then, using the kubeconfig of the ASM instance, run the following kubectl command to deploy it to the mesh:

kubectl apply -f prometheusservicelevel.yaml

The content of prometheusservicelevel.yaml is as follows:
apiVersion: istio.alibabacloud.com/v1beta1
kind: ServiceLevelObjective
metadata:
  name: asm-slo-default-httpbin
  namespace: default # Namespace to which the custom resource belongs
spec:
  service: httpbin # Name of the application to which the SLO applies
  period: 30d # Period of time during which the SLO takes effect
  slos:
  - name: asm-slo # Name of the SLO
    objective: "99" # Objective, as a percentage (99%)
    sli:
      plugin:
        id: availability # SLI plug-in type; availability counts 5xx and 429 responses as errors
    alerting:
      name: asm-alert # Name of the alert rule
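If the same SLO is needed for several services, the manifest can be generated rather than copied by hand. The following Python sketch builds the ServiceLevelObjective above as a dictionary and prints it as JSON, which kubectl apply -f also accepts. The helper name slo_manifest and the asm-slo-<namespace>-<service> naming pattern are assumptions taken from this example, not ASM requirements; only the fields shown in the YAML above are used.

```python
import json


def slo_manifest(namespace: str, service: str, objective: str = "99",
                 period: str = "30d") -> dict:
    """Build a ServiceLevelObjective manifest mirroring the YAML example.

    The asm-slo-<namespace>-<service> name follows this article's example;
    it is a convention here, not a documented requirement.
    """
    return {
        "apiVersion": "istio.alibabacloud.com/v1beta1",
        "kind": "ServiceLevelObjective",
        "metadata": {"name": f"asm-slo-{namespace}-{service}", "namespace": namespace},
        "spec": {
            "service": service,
            "period": period,
            "slos": [{
                "name": "asm-slo",
                "objective": objective,  # kept as a string, as in the YAML example
                "sli": {"plugin": {"id": "availability"}},
                "alerting": {"name": "asm-alert"},
            }],
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON so it can be redirected to a file for kubectl apply -f.
    print(json.dumps(slo_manifest("default", "httpbin"), indent=2))
```

For example, save the script as gen_slo.py, run python gen_slo.py > httpbin-slo.json, and then run kubectl apply -f httpbin-slo.json (both file names are hypothetical).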

Alternatively, you can define the SLO in the ASM console, as shown in the following figure.

(Figure: Defining the SLO in the ASM console)

Automatically Generated Prometheus Rules

Run the following command to view the results:

# In asm-slo-default-httpbin, default is the namespace where the application resides and httpbin is the name of the application.
kubectl get prometheusservicelevel asm-slo-default-httpbin -n default -o yaml

The status field of the command output is similar to the following:

status:
  ......
  status: success
  prometheusRules: # Automatically generated YAML configuration of the Prometheus rules

The prometheusRules field contains the generated Prometheus rules in YAML format. The following is an example:

groups:
- name: asm-slo-sli-recordings-httpbin-asm-slo
  rules:
  - record: slo:sli_error:ratio_rate5m
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[5m])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[5m])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 5m
  - record: slo:sli_error:ratio_rate30m
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[30m])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[30m])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 30m
  - record: slo:sli_error:ratio_rate1h
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[1h])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[1h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1h
  - record: slo:sli_error:ratio_rate2h
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[2h])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[2h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 2h
  - record: slo:sli_error:ratio_rate6h
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[6h])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[6h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 6h
  - record: slo:sli_error:ratio_rate1d
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[1d])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[1d])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1d
  - record: slo:sli_error:ratio_rate3d
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[3d])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[3d])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 3d
  - record: slo:sli_error:ratio_rate30d
    expr: |
      sum_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
      / ignoring (slo_window)
      count_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
    labels:
      slo_window: 30d
- name: asm-slo-meta-recordings-httpbin-asm-slo
  rules:
  - record: slo:objective:ratio
    expr: vector(0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:error_budget:ratio
    expr: vector(1-0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:time_period:days
    expr: vector(30)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:current_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate30d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_error_budget_remaining:ratio
    expr: 1 - slo:period_burn_rate:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo",
      slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: asm_slo_info
    expr: vector(1)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_mode: cli-gen-prom
      slo_objective: "99"
      slo_service: httpbin
      slo_spec: prometheus/v1
      slo_version: dev
- name: asm-slo-alerts-httpbin-asm-slo
  rules:
  - alert: asm-alert
    expr: |
      (
          (slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate1h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
      )
      or ignoring (slo_window)
      (
          (slo:sli_error:ratio_rate30m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
      )
    labels:
      slo_severity: page
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is over expected.'
      title: (page) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is too fast.
  - alert: asm-alert
    expr: |
      (
          (slo:sli_error:ratio_rate2h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate1d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
      )
      or ignoring (slo_window)
      (
          (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate3d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
      )
    labels:
      slo_severity: ticket
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is over expected.'
      title: (ticket) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget
        burn rate is too fast.
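The generated rules follow a fixed pattern. Each slo:sli_error:ratio_rate<window> rule divides the rate of 5xx and 429 responses by the rate of all requests and falls back to 0 when there is no traffic; slo:sli_error:ratio_rate30d is the plain average of the 5-minute ratios (sum_over_time divided by count_over_time), not a request-weighted ratio. The burn rate divides an error ratio by the error budget (0.01), and each alert fires when both a short and a long window exceed a burn-rate factor times the budget: 14.4 and 6 for page, 3 and 1 for ticket. These factors correspond to the multiwindow, multi-burn-rate alerting approach described in the Google SRE Workbook. The following Python sketch reproduces that arithmetic offline for illustration only; the function names and sample ratios are hypothetical, and it works on precomputed ratios rather than querying Prometheus.

```python
ERROR_BUDGET = 0.01  # slo:error_budget:ratio, i.e. 1 - 0.99

# (short window, long window, burn-rate factor), copied from the generated alert rules
PAGE_CONDITIONS = [("5m", "1h", 14.4), ("30m", "6h", 6)]
TICKET_CONDITIONS = [("2h", "1d", 3), ("6h", "3d", 1)]


def error_ratio(error_rate: float, total_rate: float) -> float:
    """Like slo:sli_error:ratio_rate<window>: 5xx/429 rate over total rate, 0 without traffic."""
    return error_rate / total_rate if total_rate > 0 else 0.0


def period_error_ratio(five_minute_ratios: list[float]) -> float:
    """Like slo:sli_error:ratio_rate30d: the plain mean of the 5m ratio samples."""
    return sum(five_minute_ratios) / len(five_minute_ratios)


def burn_rate(ratio: float, budget: float = ERROR_BUDGET) -> float:
    """Like slo:current_burn_rate:ratio: 1.0 means the budget lasts exactly one period."""
    return ratio / budget


def budget_remaining(period_ratio: float, budget: float = ERROR_BUDGET) -> float:
    """Like slo:period_error_budget_remaining:ratio: 1 - period burn rate."""
    return 1 - burn_rate(period_ratio, budget)


def fires(ratios: dict[str, float], conditions: list[tuple[str, str, float]],
          budget: float = ERROR_BUDGET) -> bool:
    """An alert fires if, for any pair, BOTH windows exceed factor * budget."""
    return any(
        ratios[short] > factor * budget and ratios[long] > factor * budget
        for short, long, factor in conditions
    )


# Hypothetical error ratios per window during a sharp, recent error spike
ratios = {"5m": 0.20, "30m": 0.16, "1h": 0.15, "2h": 0.09,
          "6h": 0.04, "1d": 0.012, "3d": 0.005}
print("page:", fires(ratios, PAGE_CONDITIONS))      # page: True
print("ticket:", fires(ratios, TICKET_CONDITIONS))  # ticket: False
```

In this sample, the spike trips the page alert (the 5m and 1h ratios both exceed 0.144) while the ticket conditions stay quiet because the 1d and 3d ratios are still low. Requiring both windows to exceed the threshold is what keeps a brief blip from paging and lets an alert clear soon after the errors stop.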

Save the generated rules. Part 4 of this series describes how to import them into Prometheus to execute the SLO.
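This step can also be scripted. The following Python sketch runs the kubectl get command from this article with JSON output, checks that status.status is success, and writes status.prometheusRules to a file. It assumes that prometheusRules holds the rules as a single YAML string, as the output above suggests; the helper names and the output file name are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def fetch_resource(name: str, namespace: str = "default") -> dict:
    """Run the kubectl get command from this article with JSON output and parse it."""
    result = subprocess.run(
        ["kubectl", "get", "prometheusservicelevel", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def extract_rules(resource: dict) -> str:
    """Return status.prometheusRules, refusing to proceed unless status.status is success.

    Assumes prometheusRules holds the generated rule file as a YAML string.
    """
    status = resource.get("status", {})
    if status.get("status") != "success":
        raise RuntimeError(f"rule generation status is {status.get('status')!r}, not 'success'")
    rules = status.get("prometheusRules")
    if not isinstance(rules, str) or not rules.strip():
        raise RuntimeError("status.prometheusRules is missing or not a YAML string")
    return rules


def save_rules(name: str, namespace: str, path: str) -> None:
    """Fetch the resource, extract the generated rules, and write them to a file."""
    Path(path).write_text(extract_rules(fetch_resource(name, namespace)))
```

For example, save_rules("asm-slo-default-httpbin", "default", "httpbin-slo-rules.yaml") writes the rules in the groups: format shown above, ready to be imported into Prometheus.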
