Alibaba Cloud Service Mesh: Why long-running requests fail after sidecar proxy injection

Last Updated: Mar 11, 2026

When a pod with an injected sidecar proxy terminates, long-running requests can be dropped or fail. By default, Istio forcibly stops the sidecar proxy 5 seconds after receiving the termination signal (terminationDrainDuration = 5s). Requests that take longer than this window are terminated regardless of their processing state.

Symptoms

When a pod with an injected sidecar proxy shuts down, two types of failures can occur:

  • Inbound requests are dropped. Long-running requests sent to this pod are lost mid-processing because the proxy shuts down before the requests complete.

  • Outbound requests fail. Requests that the application sends to other services through the proxy fail because the proxy has already stopped.

Root cause

With a sidecar proxy injected, all pod traffic flows through the proxy. When Kubernetes begins terminating a pod, two things happen in parallel:

  1. Kubernetes removes the pod from Service endpoints. New traffic is no longer routed to this pod.

  2. Istio starts draining the sidecar proxy. On receiving the SIGTERM signal, istio-agent tells Envoy to begin graceful draining: it rejects new connections while allowing existing ones to complete. After the terminationDrainDuration elapses, Istio forcibly kills the proxy.

The default terminationDrainDuration is 5 seconds. During this window:

  • No new inbound traffic is accepted.

  • Existing inbound connections continue processing.

  • Outbound connections remain functional.

If any request takes longer than 5 seconds, the proxy is killed and all remaining inbound and outbound connections are dropped.
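For reference, in open-source Istio this window is set by the terminationDrainDuration field in the mesh-wide default proxy configuration. The following excerpt is a minimal sketch of where that default lives; ASM exposes the setting through its console, so treat the snippet as illustrative rather than configuration you apply directly:

    # Istio meshConfig excerpt (illustrative; 5s is the upstream default)
    meshConfig:
      defaultConfig:
        terminationDrainDuration: 5s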

Kubernetes termination lifecycle

Understanding the Kubernetes pod termination sequence is essential for configuring timeouts correctly:

  1. Kubernetes sends a SIGTERM signal to all containers in the pod.

  2. Each container handles the signal according to its own logic (for example, Istio begins draining the proxy).

  3. Kubernetes waits up to terminationGracePeriodSeconds (default: 30 seconds) for all containers to exit.

  4. If any container is still running after the grace period, Kubernetes sends a SIGKILL to force termination.

The key timing constraint is:

preStop hook duration + terminationDrainDuration < terminationGracePeriodSeconds

If the combined time exceeds terminationGracePeriodSeconds, Kubernetes sends a SIGKILL and forcibly terminates all containers, bypassing the graceful drain entirely.
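For example, a 20-second preStop hook plus a 30-second drain duration requires a grace period longer than 50 seconds. A minimal pod-spec sketch (the 60-second value is an illustrative choice, not a recommendation):

    # Pod spec excerpt: preStop (20s) + drain (30s) = 50s, which fits within 60s
    spec:
      terminationGracePeriodSeconds: 60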

Solutions

Solution 1: Extend the termination drain duration

Increase the drain duration to give long-running connections enough time to complete. This approach works best when you can estimate the maximum request duration.

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Data Plane Component Management > Sidecar Proxy Setting.

  3. On the Sidecar Proxy Setting page, click the Namespace tab.

  4. Select a namespace from the Namespace drop-down list. Click Lifecycle Management, select Sidecar Proxy Drain Duration at Pod Termination, enter a value that exceeds your longest expected request duration, and then click Update Settings.

Note The drain duration must be shorter than the pod's terminationGracePeriodSeconds (default: 30 seconds). If the drain duration exceeds the grace period, Kubernetes sends a SIGKILL and forcibly terminates all containers, bypassing the graceful drain.
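If you prefer to configure individual workloads directly, open-source Istio also accepts a per-pod override through the proxy.istio.io/config annotation; confirm that your ASM version honors this annotation before relying on it. An illustrative Deployment excerpt (the 60s value assumes requests finish within a minute):

    # Pod template excerpt (illustrative per-pod override)
    template:
      metadata:
        annotations:
          proxy.istio.io/config: |
            terminationDrainDuration: 60s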

Solution 2: Use a preStop hook to wait for active requests

When request durations are unpredictable, configure a preStop lifecycle hook on the sidecar proxy. The hook polls for active connections and delays proxy shutdown until all requests complete, instead of relying on a fixed timeout.

How it works: The preStop script uses netstat to check once per second for listening TCP sockets that do not belong to Envoy. Once the application has closed all of its sockets, the sidecar proxy exits gracefully after the default 5-second drain.

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Data Plane Component Management > Sidecar Proxy Setting.

  3. On the Sidecar Proxy Setting page, click the Namespace tab.

  4. Select a namespace from the Namespace drop-down list. Click Lifecycle Management, select Lifecycle of Sidecar Proxy, enter the following JSON in the code editor, and then click Update Settings.

    {
      "postStart": {
        "exec": {
          "command": [
            "pilot-agent",
            "wait"
          ]
        }
      },
      "preStop": {
        "exec": {
          "command": [
            "/bin/sh",
            "-c",
            "while [ $(netstat -plunt | grep tcp | grep -v envoy | wc -l | xargs) -ne 0 ]; do sleep 1; done"
          ]
        }
      }
    }

    The two hooks serve different purposes:

    Hook | Purpose
    postStart | Runs pilot-agent wait to make sure the sidecar proxy is fully ready before the application container starts accepting traffic.
    preStop | Polls netstat once per second for listening TCP sockets that do not belong to Envoy. The sidecar proxy shuts down only after the application closes all of its sockets.
Note The total time spent in the preStop hook plus the drain duration must not exceed the pod's terminationGracePeriodSeconds. If your workloads have very long-running requests, increase terminationGracePeriodSeconds at the pod level accordingly.
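Because the preStop script above waits indefinitely while application sockets remain open, the grace period becomes the effective upper bound on shutdown time. A minimal Deployment excerpt showing where to raise it (the 120-second value is an illustrative choice):

    # Deployment excerpt: give the preStop poll plus the 5s drain room to finish
    spec:
      template:
        spec:
          terminationGracePeriodSeconds: 120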

Which solution to choose

Scenario | Recommended solution
Request durations are predictable (for example, under 60 seconds) | Solution 1: Set the drain duration to a value that exceeds your longest expected request.
Request durations vary or are unpredictable | Solution 2: Use a preStop hook to dynamically wait for all connections to close.
Both long-running and short-lived requests coexist | Combine both: set a reasonable drain duration as a safety net, and add the preStop hook for adaptive shutdown.