Alibaba Cloud Service Mesh: Configure SLOs for applications in ASM

Last Updated: Jan 30, 2024

To manage and monitor the service level of an application, you can configure service level objectives (SLOs) and alert rules in the Service Mesh (ASM) console. When the service level of the application becomes equal to or lower than the preset threshold, ASM raises alerts at different severity levels based on how serious the fault is. This helps you manage the service level of the application more efficiently and handle issues more quickly.

Prerequisites

Step 1: Deploy the HTTPBin application

  1. Create an httpbin.yaml file that contains the following content:

    ##################################################################################################
    # httpbin service
    ##################################################################################################
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: httpbin
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: httpbin
      labels:
        app: httpbin
        service: httpbin
    spec:
      ports:
      - name: http
        port: 8000
        targetPort: 80
      selector:
        app: httpbin
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: httpbin
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: httpbin
          version: v1
      template:
        metadata:
          labels:
            app: httpbin
            version: v1
        spec:
          serviceAccountName: httpbin
          containers:
          - image: docker.io/kennethreitz/httpbin
            imagePullPolicy: IfNotPresent
            name: httpbin
            ports:
            - containerPort: 80
  2. Use kubectl to connect to the ACK cluster and run the following command to deploy the HTTPBin application.

    For more information about how to use kubectl to connect to the ACK cluster, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.

    kubectl apply -f httpbin.yaml

Step 2: Create a virtual service and an Istio gateway

  1. Create an httpbin-gateway.yaml file that contains the following content:

    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: httpbin-gateway
    spec:
      selector:
        istio: ingressgateway
      servers:
      - port:
          number: 80
          name: http
          protocol: HTTP
        hosts:
        - "*"
    ---
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: httpbin
    spec:
      hosts:
      - "*"
      gateways:
      - httpbin-gateway
      http:
      - route:
        - destination:
            host: httpbin
            port:
              number: 8000
  2. Use kubectl to connect to the ASM instance and run the following command to deploy the virtual service and Istio gateway.

    For more information about how to use kubectl to connect to the ASM instance, see Use kubectl on the control plane to access Istio resources.

    kubectl apply -f httpbin-gateway.yaml
  3. In the address bar of your browser, enter http://{IP address of the ingress gateway}.

    For more information about how to obtain the IP address of the ingress gateway, see Use Istio resources to route traffic to different versions of a service. If you can view the page of the HTTPBin application, the HTTPBin application is successfully deployed.
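
The browser check in the last step can also be scripted. The following is a minimal Python sketch and not part of ASM. The function names `status_of` and `gateway_is_serving` are hypothetical, `203.0.113.10` is a placeholder address, and the sketch assumes that the HTTPBin image exposes its usual `/status/{code}` endpoint.

```python
import urllib.error
import urllib.request


def status_of(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to url.

    Connection failures (urllib.error.URLError) are not caught here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx and 5xx responses.
        return err.code


def gateway_is_serving(gateway_ip: str, fetch=status_of) -> bool:
    """True if HTTPBin answers 200 on /status/200 through the ingress gateway."""
    return fetch(f"http://{gateway_ip}/status/200") == 200


# Usage (replace the placeholder with the IP address of your ingress gateway):
#   gateway_is_serving("203.0.113.10")
```

The `fetch` parameter exists only so that the check can be exercised without a live gateway. Requesting `/status/500` in the same way would likely be a simple way to produce error responses when you later want to see the SLO react.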

Step 3: Configure an SLO

In this example, an availability SLO is configured for the HTTPBin application in the default namespace. The objective is 99%, and the SLO is evaluated over a period of 30 days. Two severity levels of alerts are configured: Page and Ticket. For more information about SLO-related concepts, see SLO overview.
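
To make these numbers concrete, the following minimal Python sketch derives the error budget implied by a 99% objective over 30 days. The `Slo` class and its method names are hypothetical and exist only for illustration. The outage figure assumes that traffic is spread evenly over the period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical container for the two values set in this example."""

    objective_percent: float  # value of the Objective field, for example 99
    period_days: int          # value of the Duration field, for example 30 (30d)

    @property
    def error_budget_ratio(self) -> float:
        # Fraction of requests that may fail without violating the SLO.
        return (100.0 - self.objective_percent) / 100.0

    def budget_minutes_if_fully_down(self) -> float:
        # Minutes of total outage that would use up the whole budget,
        # assuming evenly distributed traffic.
        return self.error_budget_ratio * self.period_days * 24 * 60


slo = Slo(objective_percent=99, period_days=30)
print(f"error budget: {slo.error_budget_ratio:.2%}")
print(f"full-outage minutes: {slo.budget_minutes_if_fully_down():.0f}")
```

With these inputs, the budget is 1% of requests, which corresponds to about 432 minutes (7.2 hours) of complete unavailability in 30 days.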

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Observability Management Center > SLO Configuration.

  3. In the upper part of the SLO Configuration page, select the default namespace from the Namespace drop-down list and click Create in the Actions column of the httpbin service.

  4. In the Basic Information section of the Create page, set Duration to 30d.

  5. Click the SLO rule tab. Set Name to asm-slo, Plugin type to availability, and Objective to 99. Turn on Enable alerting rules and set Alerting rules name to asm-alert. Turn on Enable alerting rule with Ticket level and Enable alerting rule with Page level.

  6. (Optional) In the lower part of the page, click Preview to view the configurations. Confirm that the configurations are correct and click Submit.

    For more information about the fields in the configuration file, see Description of SLO CRD fields.

  7. In the lower part of the page, click Create.

Step 4: View the automatically generated Prometheus rule

After the SLO is configured, find the httpbin service on the SLO Configuration page and click View Prometheus rules in the Actions column to view the automatically generated Prometheus rule.

The following is a sample Prometheus rule:

groups:
- name: asm-slo-sli-recordings-httpbin-asm-slo
  rules:
  - record: slo:sli_error:ratio_rate5m
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[5m])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[5m])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 5m
  - record: slo:sli_error:ratio_rate30m
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[30m])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[30m])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 30m
  - record: slo:sli_error:ratio_rate1h
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[1h])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[1h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1h
  - record: slo:sli_error:ratio_rate2h
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[2h])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[2h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 2h
  - record: slo:sli_error:ratio_rate6h
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[6h])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[6h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 6h
  - record: slo:sli_error:ratio_rate1d
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[1d])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[1d])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1d
  - record: slo:sli_error:ratio_rate3d
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[3d])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[3d])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 3d
  - record: slo:sli_error:ratio_rate30d
    expr: |
      sum_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
      / ignoring (slo_window)
      count_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
    labels:
      slo_window: 30d
- name: asm-slo-meta-recordings-httpbin-asm-slo
  rules:
  - record: slo:objective:ratio
    expr: vector(0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:error_budget:ratio
    expr: vector(1-0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:time_period:days
    expr: vector(30)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:current_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate30d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_error_budget_remaining:ratio
    expr: 1 - slo:period_burn_rate:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo",
      slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: asm_slo_info
    expr: vector(1)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_mode: cli-gen-prom
      slo_objective: "99"
      slo_service: httpbin
      slo_spec: prometheus/v1
      slo_version: dev
- name: asm-slo-alerts-httpbin-asm-slo
  rules:
  - alert: asm-alert
    expr: |
      (
          (slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate1h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
      )
      or ignoring (slo_window)
      (
          (slo:sli_error:ratio_rate30m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
      )
    labels:
      slo_severity: page
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is over expected.'
      title: (page) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is too fast.
  - alert: asm-alert
    expr: |
      (
          (slo:sli_error:ratio_rate2h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate1d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
      )
      or ignoring (slo_window)
      (
          (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate3d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
      )
    labels:
      slo_severity: ticket
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is over expected.'
      title: (ticket) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget
        burn rate is too fast.
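
The logic of the sample rule can be mirrored in a few lines of Python. The following is a sketch for reading the rule, not how ASM or Prometheus evaluates it. All function and variable names are hypothetical, and the per-window error ratios are invented inputs. The window pairs and factors (14.4, 6, 3, 1) and the error pattern `(5..|429)` are taken from the sample rule above.

```python
import re
from typing import Dict, Optional

ERROR_CODE = re.compile(r"5..|429")  # same pattern as response_code=~"(5..|429)"
ERROR_BUDGET = 0.01                  # 1 - 0.99, as in slo:error_budget:ratio
WINDOWS = ("5m", "30m", "1h", "2h", "6h", "1d", "3d")


def error_ratio(requests_by_code: Dict[str, int]) -> float:
    """Share of requests whose response code counts as an error.

    Returns 0.0 when there is no traffic, like `OR on() vector(0)`.
    """
    total = sum(requests_by_code.values())
    if total == 0:
        return 0.0
    # PromQL regex matchers are fully anchored, hence fullmatch.
    errors = sum(n for code, n in requests_by_code.items()
                 if ERROR_CODE.fullmatch(code))
    return errors / total


def burning(ratios: Dict[str, float], short: str, long: str, factor: float) -> bool:
    """True when both windows exceed factor * error budget."""
    threshold = factor * ERROR_BUDGET
    return ratios[short] > threshold and ratios[long] > threshold


def severity(ratios: Dict[str, float]) -> Optional[str]:
    """Most severe alert level that would fire, or None."""
    if burning(ratios, "5m", "1h", 14.4) or burning(ratios, "30m", "6h", 6):
        return "page"
    if burning(ratios, "2h", "1d", 3) or burning(ratios, "6h", "3d", 1):
        return "ticket"
    return None


def budget_remaining(period_error_ratio: float) -> float:
    """Mirror of slo:period_error_budget_remaining:ratio (1 - period burn rate)."""
    return 1 - period_error_ratio / ERROR_BUDGET


quiet = dict.fromkeys(WINDOWS, 0.0)
sustained = {**quiet, "5m": 0.2, "1h": 0.15}  # both windows above 14.4 * 0.01
spike = {**quiet, "5m": 0.5, "1h": 0.02}      # short spike, long window still calm

print(severity(sustained))  # page
print(severity(spike))      # None
```

Requiring both a short and a long window to exceed the threshold is what keeps a brief spike from alerting on its own. Unlike this sketch, which reports only the most severe level, the two Prometheus alerts are evaluated independently.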

What to do next

For the SLO to take effect, import the generated Prometheus rule into the Prometheus system. You can then use Grafana to view SLO-related metrics. For more information, see Import the generated Prometheus rule to the Prometheus system for the SLOs to take effect and Use Grafana to view SLOs.