Alibaba Cloud Service Mesh: Configure service level objectives (SLOs) for an application

Updated: Jan 13, 2025

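When you need to manage and monitor an application's service level, you can configure service level objectives (SLOs) and the corresponding alert rules in the ASM console to keep the application running at the expected service level. When the application's service level crosses a preset threshold, ASM sends alerts of different levels based on the severity of the failure as soon as it occurs, which makes service level management more efficient and responsive.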

Prerequisites

Step 1: Deploy the httpbin sample application

  1. Create a file named httpbin.yaml with the following content.

    ##################################################################################################
    # httpbin service
    ##################################################################################################
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: httpbin
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: httpbin
      labels:
        app: httpbin
        service: httpbin
    spec:
      ports:
      - name: http
        port: 8000
        targetPort: 80
      selector:
        app: httpbin
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: httpbin
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: httpbin
          version: v1
      template:
        metadata:
          labels:
            app: httpbin
            version: v1
        spec:
          serviceAccountName: httpbin
          containers:
          - image: docker.io/kennethreitz/httpbin
            imagePullPolicy: IfNotPresent
            name: httpbin
            ports:
            - containerPort: 80

  2. Use kubectl to connect to the ACK cluster, and then run the following command to deploy httpbin in the cluster.

    For more information about how to connect to an ACK cluster by using kubectl, see Connect to a cluster by using kubectl.

    kubectl apply -f httpbin.yaml
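
Before moving on, you may want to confirm that the httpbin Deployment has finished rolling out. The following is a minimal Python sketch, assuming kubectl is on your PATH and already points at the ACK cluster; the helper names rollout_status_cmd and wait_for_rollout are hypothetical. It only wraps the standard kubectl rollout status command.

```python
import subprocess


def rollout_status_cmd(name: str, namespace: str = "default", timeout_s: int = 120) -> list[str]:
    """Build a `kubectl rollout status` command for a Deployment (hypothetical helper)."""
    return ["kubectl", "-n", namespace, "rollout", "status",
            f"deployment/{name}", f"--timeout={timeout_s}s"]


def wait_for_rollout(name: str, namespace: str = "default", timeout_s: int = 120) -> bool:
    """Return True if kubectl reports the rollout finished before the timeout."""
    try:
        result = subprocess.run(rollout_status_cmd(name, namespace, timeout_s),
                                capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # kubectl is not installed or not on PATH.
        return False
    # kubectl exits non-zero on timeout or when the Deployment does not exist.
    print(result.stdout or result.stderr)
    return result.returncode == 0
```

For example, wait_for_rollout("httpbin") blocks for up to 120 seconds and returns True once kubectl reports that the httpbin Deployment has rolled out.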

Step 2: Configure the virtual service and gateway rules

  1. Create a file named httpbin-gateway.yaml with the following content.

    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: httpbin-gateway
    spec:
      selector:
        istio: ingressgateway
      servers:
      - port:
          number: 80
          name: http
          protocol: HTTP
        hosts:
        - "*"
    ---
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: httpbin
    spec:
      hosts:
      - "*"
      gateways:
      - httpbin-gateway
      http:
      - route:
        - destination:
            host: httpbin
            port:
              number: 8000

  2. Use kubectl to connect to the ASM instance, and then run the following command to deploy the virtual service and gateway rules.

    For more information about how to connect to an ASM instance by using kubectl, see Use kubectl on the control plane to access Istio resources.

    kubectl apply -f httpbin-gateway.yaml
  3. In the address bar of your browser, enter http://{ingress gateway IP address}.

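    For more information about how to obtain the gateway IP address, see Obtain the ingress gateway address. If the httpbin page appears, the httpbin application is deployed successfully.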
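
To give the SLO configured in the next steps some data to measure, you can send test traffic through the ingress gateway. The following Python sketch is one way to do that, under stated assumptions: GATEWAY_IP is a placeholder for your ingress gateway address, and probe is a hypothetical helper. It relies on httpbin's /status/{code} endpoint, which returns the requested HTTP status, so requests to /status/500 produce server errors that the availability SLO counts.

```python
import urllib.error
import urllib.request
from collections import Counter

GATEWAY_IP = "203.0.113.10"  # placeholder: replace with your ingress gateway IP


def probe(base_url: str, path: str, n: int, timeout: float = 5.0) -> Counter:
    """Send n GET requests to base_url + path and count the returned status codes."""
    codes: Counter = Counter()
    for _ in range(n):
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
                codes[resp.status] += 1
        except urllib.error.HTTPError as err:
            # urlopen raises HTTPError for 4xx/5xx responses; keep the status code.
            codes[err.code] += 1
            err.close()
        except OSError:
            # Connection errors and timeouts (URLError is a subclass of OSError).
            codes["unreachable"] += 1
    return codes
```

For example, probe(f"http://{GATEWAY_IP}", "/status/200", 20) sends healthy traffic, and probe(f"http://{GATEWAY_IP}", "/status/500", 5) sends requests that return 500.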

Step 3: Define the SLO configuration

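This topic creates an availability SLO for the httpbin service in the default namespace, with a target of 99%, a period of 30 days, and alerts at two levels: Page and Ticket. For more information about SLO concepts, see Overview of service level objectives (SLOs).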
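
The 99% target and 30-day period together determine the error budget, which is the share of requests that may fail during the period. The following Python sketch works through that arithmetic under the simplifying assumption that traffic is spread evenly over the period; the function name error_budget is hypothetical. The budget ratio matches the slo:error_budget:ratio value of 1-0.99 in the rules that ASM generates later.

```python
def error_budget(objective_pct: float, period_days: int) -> tuple[float, float]:
    """Return (budget ratio, equivalent minutes of total outage) for an SLO."""
    budget_ratio = (100 - objective_pct) / 100  # fraction of requests allowed to fail
    period_minutes = period_days * 24 * 60
    # Assumes uniform traffic, so a full outage fails 100% of requests each minute.
    return budget_ratio, budget_ratio * period_minutes


ratio, minutes = error_budget(99, 30)
print(f"error budget ratio: {ratio:.2%}")
print(f"equivalent full outage: {minutes:.0f} min ({minutes / 60:.1f} h)")
```

With a 99% target over 30 days, 1% of requests may fail, which under the uniform-traffic assumption corresponds to about 432 minutes (7.2 hours) of complete unavailability.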

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

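  2. On the Mesh Management page, click the name of the target instance. In the left-side navigation pane, choose Observability Management Center > SLO Configuration.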

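  3. At the top of the SLO Configuration page, set Namespace to the namespace of the target service (default in this example), and then click Create to the right of the target service httpbin.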

  4. In the Basic Information section of the Create page, set Period to 30 days.

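  5. Click SLO Rules. Set Name to asm-slo, set Plugin Type to availability, and set Target Value to 99. Turn on the Enable Alert Rule switch, set Alert Rule Name to asm-alert, and then turn on the Enable Critical-Level Alert Rule and Enable Warning-Level Alert Rule switches.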

  6. Optional: At the bottom of the page, click Preview to view the configuration. After you confirm that it is correct, click Confirm.

    For descriptions of the fields in the configuration file, see SLO CRD field descriptions.

  7. After you finish the configuration, click Create at the bottom of the page.

Step 4: Automatically generate Prometheus rules

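After the SLO is configured, go to the SLO Configuration page and click View Prometheus Rules to the right of the target service httpbin to view the generated rules.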

The following is a sample of the generated Prometheus rules:

groups:
- name: asm-slo-sli-recordings-httpbin-asm-slo
  rules:
  - record: slo:sli_error:ratio_rate5m
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[5m])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[5m])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 5m
  - record: slo:sli_error:ratio_rate30m
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[30m])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[30m])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 30m
  - record: slo:sli_error:ratio_rate1h
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[1h])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[1h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1h
  - record: slo:sli_error:ratio_rate2h
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[2h])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[2h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 2h
  - record: slo:sli_error:ratio_rate6h
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[6h])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[6h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 6h
  - record: slo:sli_error:ratio_rate1d
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[1d])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[1d])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1d
  - record: slo:sli_error:ratio_rate3d
    expr: "(\n(\n  sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[3d])) \n  /          \n  (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[3d])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 3d
  - record: slo:sli_error:ratio_rate30d
    expr: |
      sum_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
      / ignoring (slo_window)
      count_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
    labels:
      slo_window: 30d
- name: asm-slo-meta-recordings-httpbin-asm-slo
  rules:
  - record: slo:objective:ratio
    expr: vector(0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:error_budget:ratio
    expr: vector(1-0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:time_period:days
    expr: vector(30)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:current_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate30d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_error_budget_remaining:ratio
    expr: 1 - slo:period_burn_rate:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo",
      slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: asm_slo_info
    expr: vector(1)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_mode: cli-gen-prom
      slo_objective: "99"
      slo_service: httpbin
      slo_spec: prometheus/v1
      slo_version: dev
- name: asm-slo-alerts-httpbin-asm-slo
  rules:
  - alert: asm-alert
    expr: |
      (
          (slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate1h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
      )
      or ignoring (slo_window)
      (
          (slo:sli_error:ratio_rate30m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
      )
    labels:
      slo_severity: page
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is over expected.'
      title: (page) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is too fast.
  - alert: asm-alert
    expr: |
      (
          (slo:sli_error:ratio_rate2h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate1d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
      )
      or ignoring (slo_window)
      (
          (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate3d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
      )
    labels:
      slo_severity: ticket
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is over expected.'
      title: (ticket) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget
        burn rate is too fast.
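
The generated rules follow a multi-window, multi-burn-rate alerting pattern. Each slo:sli_error:ratio_rate* rule is the share of requests with a 5xx or 429 response in one window (0 when there is no traffic), the burn rate is that ratio divided by the 1% error budget, and asm-alert fires only when a short window and a long window both exceed the same multiple of the budget. The following Python sketch reproduces that logic on plain numbers so you can reason about when each severity fires. It is an illustration only, not how ASM or Prometheus evaluates the rules; the function names and the sample ratios are hypothetical.

```python
import re

ERROR_CODE = re.compile(r"5..|429")  # same pattern as response_code=~"(5..|429)"
ERROR_BUDGET = 1 - 0.99              # slo:error_budget:ratio
PERIOD_HOURS = 30 * 24               # 30-day SLO period

# (short window, long window, burn-rate factor) pairs taken from asm-alert.
PAGE = [("5m", "1h", 14.4), ("30m", "6h", 6)]
TICKET = [("2h", "1d", 3), ("6h", "3d", 1)]


def error_ratio(codes: dict[str, int]) -> float:
    """SLI error ratio for one window: errors / total, or 0 with no traffic."""
    total = sum(codes.values())
    if total == 0:
        return 0.0  # mirrors `OR on() vector(0)`
    # Prometheus regex matchers are fully anchored, hence fullmatch.
    errors = sum(n for code, n in codes.items() if ERROR_CODE.fullmatch(code))
    return errors / total


def fires(ratios: dict[str, float], pairs) -> bool:
    """True if, for any pair, both windows exceed factor * error budget."""
    return any(
        ratios[short] > factor * ERROR_BUDGET and ratios[long_] > factor * ERROR_BUDGET
        for short, long_, factor in pairs
    )


def budget_spent(factor: float, window_hours: float) -> float:
    """Fraction of the 30-day budget consumed by burning at `factor` for a window."""
    return factor * window_hours / PERIOD_HOURS


def budget_remaining(ratios_5m: list[float]) -> float:
    """slo:period_error_budget_remaining:ratio from the 5m ratios in the period."""
    period_ratio = sum(ratios_5m) / len(ratios_5m)  # slo:sli_error:ratio_rate30d
    return 1 - period_ratio / ERROR_BUDGET


# Hypothetical error ratios per window during a sharp incident.
ratios = {"5m": 0.20, "30m": 0.18, "1h": 0.15, "2h": 0.08,
          "6h": 0.03, "1d": 0.01, "3d": 0.004}
print("sample ratio:", error_ratio({"200": 95, "500": 4, "429": 1}))
print("page:", fires(ratios, PAGE))  # 5m and 1h both exceed 0.144
print("ticket:", fires(ratios, TICKET))
print(f"14.4x for 1h spends {budget_spent(14.4, 1):.0%} of the budget")
```

The factors are chosen so that each page condition corresponds to a meaningful share of the 30-day budget: burning at 14.4 times the budget rate for 1 hour consumes 2% of it, and 6 times for 6 hours consumes 5%. The ticket conditions catch slower burns, where 3 times for 1 day or 1 times for 3 days each consume 10%.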

Next steps

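You can import the generated Prometheus rules into Prometheus to execute the SLO, and use Grafana to view SLO-related metrics. For more information, see Import the generated rules into Prometheus to execute the SLO and View SLOs in Grafana.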
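
Once the rules are loaded, you can also read the recorded series back through the Prometheus HTTP API instant-query endpoint (GET /api/v1/query). The following Python sketch assumes a hypothetical Prometheus address in PROM_URL. It builds the query URL for slo:period_error_budget_remaining:ratio and parses the documented vector result format, in which each sample value is a [timestamp, "string value"] pair. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

PROM_URL = "http://prometheus.example:9090"  # hypothetical Prometheus address

QUERY = 'slo:period_error_budget_remaining:ratio{slo_id="httpbin-asm-slo"}'


def query_url(base: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(body: dict) -> dict[str, float]:
    """Map slo_id -> value from an instant-vector query response."""
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    # Each sample value is [unix_timestamp, "string value"].
    return {s["metric"].get("slo_id", ""): float(s["value"][1])
            for s in body["data"]["result"]}


def fetch_remaining(base: str = PROM_URL) -> dict[str, float]:
    """Query Prometheus and return the remaining error budget per SLO."""
    with urllib.request.urlopen(query_url(base, QUERY), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

A value of 1 means none of the 30-day error budget has been spent; values at or below 0 mean the budget is exhausted.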