After you add an application to an Alibaba Cloud Service Mesh (ASM) instance, a sidecar proxy is injected into the pod of the application. The sidecar proxy intercepts all requests sent to the application. If you directly enable HTTP or TCP health checks for the application, health check exceptions may occur. For example, the health checks may always fail. This topic describes how to enable HTTP and TCP health checks for applications in an ASM instance by configuring redirection of health check requests.

Background information

The following issues may occur if you directly enable TCP and HTTP health checks for applications in an ASM instance:

  • HTTP health checks
    The kubelet service sends health check requests to pods in a Kubernetes cluster. If you enable mutual TLS (mTLS) for an ASM instance, applications in the ASM instance must use TLS to communicate with each other. The kubelet service is not part of the ASM instance. As a result, the kubelet service cannot provide the TLS certificate issued by ASM. In this case, all HTTP health check requests are rejected by applications, and the health checks always fail.
    Note If you do not enable mTLS for an ASM instance, HTTP health checks can be successfully performed on pods of applications. In this case, you can directly enable HTTP health checks for applications without the need to configure redirection of health check requests.
  • TCP health checks

    Sidecar proxies listen on all ports of pods in an ASM instance to intercept requests. If you enable TCP health checks for an application, the kubelet service checks whether an application is listening on the specified port of the pod. If so, the health check is successful.

    If a sidecar proxy is injected into the pod and works as expected, health checks are always successful regardless of whether the application is healthy. For example, if you configure an invalid port for the application, the health checks should fail, and the pod should not be ready. However, the health checks are always successful for the pod.
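The TCP case can be illustrated with a minimal sketch: a tcpSocket probe only verifies that some process accepts connections on the port, so any listener, such as a sidecar proxy, satisfies the probe even if the application never opened the port. The tcp_probe function below is an illustrative stand-in for the kubelet probe, not actual kubelet code.

```python
import socket

def tcp_probe(host, port, timeout=1.0):
    """Simulate a kubelet tcpSocket probe: succeed if a TCP connection opens."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Simulate a sidecar proxy that listens on a port the application never serves.
sidecar = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sidecar.bind(("127.0.0.1", 0))        # bind an arbitrary free port
sidecar.listen()
port = sidecar.getsockname()[1]

# The probe succeeds even though no application logic is behind the port.
print(tcp_probe("127.0.0.1", port))   # True
sidecar.close()
```

Because the sidecar's listener alone satisfies the probe, the probe result no longer reflects the health of the application container.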

To resolve the preceding issues, you need to configure redirection of health check requests for applications in an ASM instance by using annotations. After you configure redirection of health check requests, the health checks can work as expected.

Enable HTTP health checks for applications in an ASM instance

In this example, an NGINX application is used. After you enable mTLS for the ASM instance in which the NGINX application resides, HTTP health checks for the NGINX application always fail. In this case, you can configure redirection of health check requests for the NGINX application. Then, check the events of the pod of the NGINX application. If no events that indicate a failed health check exist and the pod is ready, HTTP health checks are effective for the application.

Step 1: Enable the global mTLS mode for an ASM instance

  1. Log on to the ASM console.
  2. In the left-side navigation pane, choose Service Mesh > Mesh Management.
  3. On the Mesh Management page, find the ASM instance that you want to configure. Click the name of the ASM instance or click Manage in the Actions column.
  4. On the details page of the ASM instance, choose Zero Trust Security > PeerAuthentication in the left-side navigation pane.
  5. At the top of the PeerAuthentication page, select a namespace from the Namespace drop-down list and click Configure Global mTLS Mode.
  6. On the Configure Global mTLS Mode page, select STRICT -Strictly Enforce mTLS for the mTLS Mode (Namespace-wide) parameter and click Create.
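For reference, the console operation above corresponds roughly to creating an Istio PeerAuthentication resource similar to the following sketch. The resource name and namespace here are illustrative.

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default        # illustrative name
  namespace: default   # the namespace that you selected in the console
spec:
  mtls:
    mode: STRICT       # strictly enforce mTLS for workloads in the namespace
```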

Step 2: Deploy an NGINX application

  1. Connect to the ACK cluster by using kubectl.
  2. Deploy an NGINX application.
    1. Create an http-liveness.yaml file that contains the following code:
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: nginx-deployment
        labels:
          app: nginx
      spec:
        selector:
          matchLabels:
            app: nginx
        replicas: 1
        template:
          metadata:
            labels:
              app: nginx
          spec:
            containers:
            - name: nginx
              image: nginx
              imagePullPolicy: IfNotPresent
              ports:
              - containerPort: 80
              readinessProbe:
                httpGet:
                  path: /index.html
                  port: 80
                  httpHeaders:
                  - name: X-Custom-Header
                    value: hello
                initialDelaySeconds: 5
                periodSeconds: 3

      The httpGet field in the readinessProbe section enables HTTP health checks for the NGINX application.

    2. Run the following command to deploy an NGINX application:
      kubectl apply -f http-liveness.yaml
  3. View the health check result of the NGINX application.
    1. Run the following command to view the name of the pod that runs the NGINX application:
      kubectl get pod | grep nginx
    2. Run the following command to view the events of the pod:
      kubectl describe pod <Pod name>

      Expected output:

      Warning  Unhealthy  45s               kubelet            Readiness probe failed: Get "http://172.23.64.22:80/index.html": read tcp 172.23.64.1:54130->172.23.64.22:80: read: connection reset by peer

      The preceding result indicates that HTTP health checks for the pod fail. As a result, the pod is not ready.

Step 3: Configure redirection of health check requests for the NGINX application

  1. Run the following command to open the http-liveness.yaml file:
    vim http-liveness.yaml
    Add the following annotation to the pod template metadata (the spec.template.metadata section):
    annotations:
      sidecar.istio.io/rewriteAppHTTPProbers: "true"
    The following code shows the content of the http-liveness.yaml file after you add an annotation:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
      labels:
        app: nginx
    spec:
      selector:
        matchLabels:
          app: nginx
      replicas: 1
      template:
        metadata:
          labels:
            app: nginx
          annotations:
            sidecar.istio.io/rewriteAppHTTPProbers: "true"
        spec:
          containers:
          - name: nginx
            image: nginx
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 80
            readinessProbe:
              httpGet:
                path: /index.html
                port: 80
                httpHeaders:
                - name: X-Custom-Header
                  value: hello
              initialDelaySeconds: 5
              periodSeconds: 3
  2. Run the following command to deploy the NGINX application:
    kubectl apply -f http-liveness.yaml

Step 4: Verify that the health check result meets your expectations

  1. View the health check result of the pod.
    1. Run the following command to view the name of the pod that runs the NGINX application:
      kubectl get pod | grep nginx
    2. Run the following command to view the events of the pod:
      kubectl describe pod <Pod name>

      In the command output, no events that indicate a failed health check for the pod exist. The pod is ready. This indicates that HTTP health checks are effective for the application.

  2. Run the following command to view the YAML file of the pod after you configure redirection of health check requests:
    kubectl get pod nginx-deployment-676f85f66b-cbzsx -o yaml

    Expected output:

    apiVersion: v1
    kind: Pod
    metadata:
      ...
      name: nginx-deployment-676f85f66b-cbzsx
      namespace: default
      ...
    spec:
      containers:
        - args:
            - proxy
            - sidecar
            - '--domain'
            - $(POD_NAMESPACE).svc.cluster.local
            - '--proxyLogLevel=warning'
            - '--proxyComponentLogLevel=misc:error'
            - '--log_output_level=default:info'
            - '--concurrency'
            - '2'
          env:
            ...
            - name: ISTIO_KUBE_APP_PROBERS
              value: >-
                null
          ...
        - image: nginx
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            - containerPort: 80
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              httpHeaders:
                - name: X-Custom-Header
                  value: hello
              path: /app-health/nginx/readyz
              port: 15020
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 3
            successThreshold: 1
            timeoutSeconds: 1

    After you configure redirection of health check requests, the health check port is changed from port 80 to port 15020, and the health check path is changed from /index.html to /app-health/nginx/readyz. In addition, an environment variable named ISTIO_KUBE_APP_PROBERS is added to the sidecar container of the pod. The value of this environment variable is the original health check configuration serialized in JSON format.

    For applications deployed in ASM, port 15020 is reserved for the observability features of ASM, and requests that are sent to port 15020 are not intercepted by sidecar proxies. Therefore, health check requests are not subject to the mTLS requirements. After you configure redirection of health check requests, the pilot-agent service that runs in the sidecar container listens on port 15020, receives health check requests from the kubelet service, and redirects these requests to the business container based on the value of the ISTIO_KUBE_APP_PROBERS environment variable. This ensures that HTTP health checks work as expected.
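The lookup that the pilot-agent performs can be sketched as follows. The JSON value of ISTIO_KUBE_APP_PROBERS shown here is hypothetical and hand-written for illustration; in a real pod, the sidecar injector generates it from the original probe configuration, and resolve_probe is an illustrative helper, not part of the pilot-agent API.

```python
import json

# Hypothetical ISTIO_KUBE_APP_PROBERS value, hand-written for illustration.
# In a real pod, the sidecar injector generates this JSON from the original
# probe configuration of each container.
ISTIO_KUBE_APP_PROBERS = json.dumps({
    "/app-health/nginx/readyz": {
        "httpGet": {
            "path": "/index.html",
            "port": 80,
            "scheme": "HTTP",
            "httpHeaders": [{"name": "X-Custom-Header", "value": "hello"}],
        },
        "timeoutSeconds": 1,
    }
})

def resolve_probe(rewritten_path):
    """Illustrative lookup: map a rewritten health check path received on
    port 15020 back to the original probe that should be executed."""
    probers = json.loads(ISTIO_KUBE_APP_PROBERS)
    return probers[rewritten_path]

original = resolve_probe("/app-health/nginx/readyz")
print(original["httpGet"]["path"], original["httpGet"]["port"])  # /index.html 80
```

In this sketch, a request from the kubelet service to /app-health/nginx/readyz resolves to the original probe against /index.html on port 80, which the pilot-agent can then execute inside the pod.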

Enable TCP health checks for applications in an ASM instance

In this example, an NGINX application is used, and an invalid port is configured for the NGINX application. If you directly enable TCP health checks for the NGINX application, TCP health checks that should fail are always successful for the application. After you configure redirection of health check requests, TCP health checks for the pod fail. This indicates that TCP health checks are effective for the NGINX application.

Step 1: Deploy an NGINX application

  1. Connect to the ACK cluster by using kubectl.
  2. Deploy an NGINX application.
    1. Create a tcp-liveness.yaml file that contains the following code:
      Port 2940 is configured as the health check port for the NGINX application. However, the NGINX application does not listen on port 2940. In normal cases, the health checks should fail, and the pod should not be ready.
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: nginx-deployment
        labels:
          app: nginx
      spec:
        selector:
          matchLabels:
            app: nginx
        replicas: 1
        template:
          metadata:
            labels:
              app: nginx
          spec:
            containers:
            - name: nginx
              image: nginx
              imagePullPolicy: IfNotPresent
              ports:
              - containerPort: 80
              readinessProbe:
                tcpSocket:
                  port: 2940
                initialDelaySeconds: 5
                periodSeconds: 3

      The tcpSocket field in the readinessProbe section enables TCP health checks for the NGINX application.

    2. Run the following command to deploy the NGINX application:
      kubectl apply -f tcp-liveness.yaml
  3. View the health check result of the NGINX application.
    1. Run the following command to view the name of the pod that runs the NGINX application:
      kubectl get pod | grep nginx
    2. Run the following command to view the events of the pod:
      kubectl describe pod <Pod name>

      In the command output, no events that indicate a failed health check exist. In this case, the pod is ready. This does not meet your expectations.

Step 2: Configure redirection of health check requests for the NGINX application

  1. Run the following command to open the tcp-liveness.yaml file:
    vim tcp-liveness.yaml
    Add the following annotation to the pod template metadata (the spec.template.metadata section) of the tcp-liveness.yaml file:
    annotations:
      sidecar.istio.io/rewriteAppHTTPProbers: "true"
    The following code shows the content of the tcp-liveness.yaml file after you add an annotation:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
      labels:
        app: nginx
    spec:
      selector:
        matchLabels:
          app: nginx
      replicas: 1
      template:
        metadata:
          labels:
            app: nginx
          annotations:
            sidecar.istio.io/rewriteAppHTTPProbers: "true"
        spec:
          containers:
          - name: nginx
            image: nginx
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 80
            readinessProbe:
              tcpSocket:
                port: 2940
              initialDelaySeconds: 5
              periodSeconds: 3
  2. Run the following command to deploy the NGINX application:
    kubectl apply -f tcp-liveness.yaml

Step 3: Verify that the health check result meets your expectations

  1. View the health check result of the NGINX application.
    1. Run the following command to view the name of the pod that runs the NGINX application:
      kubectl get pod | grep nginx
    2. Run the following command to view the events of the pod:
      kubectl describe pod <Pod name>

      Expected output:

      Warning  Unhealthy  45s               kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500

      The preceding output indicates that the TCP health check for the pod fails, which meets your expectations. The failure is reported as an HTTP probe failure because the rewritten probe converts TCP connection failures into HTTP 500 responses.

  2. Run the following command to view the YAML file of the pod after you configure redirection of health check requests:
    kubectl get pod nginx-deployment-746458cdc9-m9t9q -o yaml

    Expected output:

    apiVersion: v1
    kind: Pod
    metadata:
      ...
      name: nginx-deployment-746458cdc9-m9t9q
      namespace: default
      ...
    spec:
      containers:
        - args:
            - proxy
            - sidecar
            - '--domain'
            - $(POD_NAMESPACE).svc.cluster.local
            - '--proxyLogLevel=warning'
            - '--proxyComponentLogLevel=misc:error'
            - '--log_output_level=default:info'
            - '--concurrency'
            - '2'
          env:
            ...
            - name: ISTIO_KUBE_APP_PROBERS
              value: >-
                null
          ...
        - image: nginx
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            - containerPort: 80
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /app-health/nginx/readyz
              port: 15020
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 3
            successThreshold: 1
            timeoutSeconds: 1
          ...

    After you configure redirection of health check requests, the original TCP probe is converted to an HTTP probe. The health check port is changed from port 2940 to port 15020, and the path /app-health/nginx/readyz is automatically added for HTTP health checks. In addition, an environment variable named ISTIO_KUBE_APP_PROBERS is added to the sidecar container of the pod. The value of this environment variable is the original TCP health check configuration serialized in JSON format.

    Port 15020 receives the converted HTTP health check requests, in the same way as for the redirection of HTTP health check requests. After you configure redirection of health check requests, the pilot-agent service that runs in the sidecar container listens on port 15020. The pilot-agent service receives health check requests from the kubelet service and checks whether the TCP health check port configured for the business container accepts connections, based on the value of the ISTIO_KUBE_APP_PROBERS environment variable. If the TCP health check fails, the pilot-agent service returns an HTTP 500 status code, which indicates a failed health check.
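The conversion that the pilot-agent performs for a TCP probe can be sketched as follows, under the simplified behavior described above: dial the application's original TCP probe port and translate the result into an HTTP status code for the kubelet service. tcp_health_status is an illustrative helper, not actual pilot-agent code.

```python
import socket

def tcp_health_status(host, port, timeout=1.0):
    """Illustrative sketch: dial the original TCP probe port of the business
    container and translate the result into an HTTP status code."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 200   # connection succeeded: report a healthy probe
    except OSError:
        return 500       # connection failed: report a failed probe
```

For the NGINX example above, dialing port 2940 fails because no process listens on that port, so the kubelet service receives an HTTP 500 response, which matches the probe failure shown in the pod events.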