
Kubernetes Lossy Release Exploration

This article draws on real EDAS customer scenarios and the Kubernetes traffic paths to analyze the causes of lossy releases and provide practical solutions.

By Kuiyu

Issue Raised

Traffic loss is a common issue during application releases, and it usually surfaces in traffic monitoring. As shown in the following figures, service RT suddenly increases during the release process, causing some services to respond slowly. The most direct user experience is lag, or a sudden spike in 500 errors. Service degradation or unavailability may occur, affecting the user experience.

[Figure 1: service RT spikes during a release]
[Figure 2: 500 errors spike during a release]

Because releases are accompanied by traffic loss, we often have to schedule them during business troughs and strictly limit their duration. Even so, the risks cannot be completely avoided, and sometimes a release must be aborted altogether. Enterprise Distributed Application Service (EDAS) is a general-purpose application management system, application release is one of its most basic functions, and Kubernetes applications are the most common application form in EDAS. The following sections draw on real EDAS customer scenarios, starting from the Kubernetes traffic paths, to analyze the causes of lossy releases and provide practical solutions.

Traffic Path Analysis

In Kubernetes, traffic can reach application Pods through the following paths. Each path is different, and so are the causes of traffic loss. We will explore the routing mechanism of each path and how Pod changes affect it.

LB Service Traffic

[Figure 3: LB Service traffic path]

When you use a LoadBalancer Service to access an application, the core components in the traffic path are the LoadBalancer and ipvs/iptables. The LoadBalancer receives traffic from outside the Kubernetes cluster and forwards it to a Node, while ipvs/iptables forwards the traffic received by the Node to a Pod. The actions of both components are driven by cloud-controller-manager (CCM) and kube-proxy, which are responsible for updating the LoadBalancer backend and the ipvs/iptables rules, respectively.

When the application is released, ready Pods are added to the Endpoint backend, and Pods in the Terminating state are removed from the Endpoint. The kube-proxy component updates the ipvs/iptables rules on each node. After the CCM component detects the Endpoint change, it calls the cloud vendor's API to update the Load Balancer backend, writing the Node IPs and ports to the backend list. When inbound traffic arrives, it is forwarded to the corresponding Node based on the listener backend list configured on the Load Balancer and then forwarded to the actual Pod by ipvs/iptables.

A Service supports setting externalTrafficPolicy. Depending on its value, the way kube-proxy updates the ipvs/iptables list and the way CCM updates the Load Balancer backend vary, as described below and illustrated in the configuration sketch that follows the list.

  • Local Mode: CCM adds only the nodes where the target service's Pods reside to the Load Balancer backend address list. After traffic reaches a node, it is forwarded only to Pods on that node.
  • Cluster Mode: CCM adds all nodes to the Load Balancer backend address list. After traffic reaches a node, it may also be forwarded to Pods on other nodes.
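
The following is a minimal Service sketch (the name and selector are hypothetical) showing where externalTrafficPolicy is set; the default value is Cluster:

apiVersion: v1
kind: Service
metadata:
  name: demo-service             # hypothetical name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # or Cluster (the default)
  selector:
    app: demo                    # hypothetical selector
  ports:
    - port: 80
      targetPort: 8080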

Nginx Ingress Traffic

[Figure 4: Nginx Ingress traffic path]

When you access an application through the SLB fronting Nginx Ingress, the core component of the traffic path is the Ingress Controller. It functions as a proxy server that forwards traffic to the backend service's Pods and updates the gateway proxy's routing rules based on the Endpoint.

When an application is released, the Ingress Controller watches for Endpoint changes and updates the routing backend of the Ingress gateway. Inbound traffic is matched to an upstream based on its characteristics, and a backend is selected from the upstream's backend list to serve the request.

By default, after observing an Endpoint change of a Service, the Controller calls Nginx's dynamic configuration backend interface to update the upstream backend list of the Nginx gateway to the Service's Endpoint list, which is the list of Pod IPs and ports. Therefore, traffic that enters the Ingress Controller is forwarded directly to the backend Pod IPs and ports.
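
The following is a minimal Ingress sketch (host, names, and ports are hypothetical). Although the rule references a Service, by default the controller resolves it to the Endpoint list and forwards traffic directly to the Pod IPs rather than the ClusterIP:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress             # hypothetical name
spec:
  ingressClassName: nginx
  rules:
    - host: demo.example.com     # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo-service
                port:
                  number: 80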

Microservice Traffic

[Figure 5: microservice traffic path]

When you access an application through microservice calls, the core component is the registry. After a Provider starts, it registers its service with the registry, and Consumers subscribe to the service's address list from the registry.

When an application is released, a newly started Provider registers its Pod IP and port with the registry, and Pods that are being unpublished are removed from the registry. Consumers that subscribe to the server list are notified of the change and update their cached list of backend Pod IPs and ports. When traffic arrives, the Consumer's load balancer forwards it to the corresponding Provider Pod based on the service address list.

Cause Analysis and General Solutions

An application release is the process of publishing new Pods and unpublishing old Pods. Traffic loss occurs when the update of traffic routing rules is not coordinated with the publishing and unpublishing of application Pods. We can classify the traffic loss caused by application releases as online loss and offline loss. In general, the causes are listed below and discussed in depth later.

  • Online Loss: After a new Pod is published, it is added to the routing backend prematurely, and traffic is routed to a Pod that is not yet ready.
  • Offline Loss: After an old Pod is unpublished, the backend routing rules are not removed in time, and traffic is still routed to the stopped Pod.

Online Loss Analysis and Solutions

The following figure shows the process of publishing a Pod in Kubernetes:

[Figure 6: the process of publishing a Pod in Kubernetes]

If the availability of the services in a Pod is not checked when the Pod is published, the Pod is added to the Endpoint backend as soon as it starts and is then added to gateway routing rules by the gateway controllers. Traffic forwarded to such a Pod can then have its connection refused. The health check is therefore important: we must ensure the Pod has fully started before it is allowed to receive online traffic. Kubernetes provides the readinessProbe to check whether a new Pod is ready. By setting a proper readiness probe, you can check the actual startup status of an application and thus control when it is published to the Service's Endpoint backend.

Endpoint-Based Traffic Scenarios

For scenarios where traffic paths are controlled based on the Endpoint (such as LB Service traffic and Nginx Ingress traffic), configure an appropriate readiness probe so the health check must pass before the new Pod is added to the Endpoint backend and allocated traffic. For example, Spring Boot 2.3.0 and later add the health check interfaces /actuator/health/readiness and /actuator/health/liveness to support configuring readiness and liveness probes for applications deployed in Kubernetes.

readinessProbe:
  # ...
  httpGet:
    path: /actuator/health/readiness
    port: ${server.port}
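
A fuller sketch with both probes follows; the port and timing values are illustrative assumptions that should be tuned to the application's actual startup behavior:

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080              # assumed application port
  initialDelaySeconds: 10   # illustrative timing values
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10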

Microservice Traffic Scenarios

For microservice applications, service registration and discovery are managed by the registry, which has no check mechanism comparable to the Kubernetes readiness probe. In addition, since Java applications usually start slowly, resources may still be initializing after service registration succeeds, such as the database connection pool, thread pools, and JIT compilation. If a large number of microservice requests flood in at this point, exceptions such as high RT or timeouts may occur.

Dubbo provides solutions for delayed registration and service preheating to address the preceding issues. The functions are summarized below:

  • The delayed registration function allows users to specify a waiting time. After the program starts, it waits for the configured period before publishing the service to the registry. During this period, the program has the opportunity to complete initialization, avoiding an immediate influx of service requests.
  • The service preheating function allows users to set a preheating duration. When the Provider registers its services, it also registers the preheating duration and service start time with the registry as metadata. The Consumer subscribes to the service instance list and, combined with the Provider's start time, calculates a call weight based on the preheating duration, so that newly launched instances are allocated less traffic. This low-traffic preheating lets the program complete operations such as class loading and JIT compilation under low load, so new instances can take on a stable share of traffic after preheating.

We can add the following configurations to the program to enable the delayed registration and service preheating functions:

dubbo:
    provider:
        warmup: 120000
        delay: 5000
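
Both values are in milliseconds: with the configuration above, the Provider waits 5 seconds after startup before registering with the registry, and Consumers ramp up the new instance's call weight over a 120-second preheating window.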

After configuring the preceding parameters, we scale out the Provider application by one Pod and observe the QPS curve during the new Pod's startup to verify the traffic preheating effect. The following figure shows the QPS data:

[Figure 7: QPS curve of the new Pod during startup]

The QPS curve of the Pod's received traffic shows that traffic is not distributed evenly as soon as the Pod starts. Instead, the requests processed per second increase roughly linearly over the configured 120-second preheating duration and stabilize after 120 seconds, which matches the expected preheating effect.

Offline Loss Analysis and Solutions

In Kubernetes, the following figure shows the process of Pods being unpublished:

[Figure 8: the process of unpublishing a Pod in Kubernetes]

As shown in the figure, after a Pod is deleted, its status change is observed by both the endpoint-controller and the kubelet, which remove the Endpoint and delete the Pod, respectively. However, these two components act in parallel rather than removing the Endpoint first and then deleting the Pod. It is therefore possible for the Pod to have already received the SIGTERM signal while traffic is still arriving.

Kubernetes provides the preStop hook mechanism for unpublishing Pods. It allows the kubelet to perform pre-stop operations instead of immediately sending the SIGTERM signal to the container when the Pod enters the Terminating state. As a general solution to the preceding problem, you can set a sleep in preStop to delay the SIGTERM delivery to the application for a period. This avoids losing inbound traffic during that window and lets the Pod finish processing the traffic it has already received.
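
A minimal preStop sketch follows; the 30-second sleep is an illustrative value, and terminationGracePeriodSeconds (which defaults to 30 seconds) must be larger than the sleep so the kubelet does not kill the container early:

terminationGracePeriodSeconds: 60
containers:
  - name: app                # hypothetical container name
    # ...
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 30"]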

The preceding section describes the traffic loss that may occur when Pods are unpublished and Endpoints are updated not in the expected order. After an application is connected to multiple types of gateways, the complexity of traffic paths increases, and traffic loss may occur in other routes.

LB Service Traffic Scenarios

When you use a LoadBalancer Service to access an application, externalTrafficPolicy in Local mode avoids a second forwarding hop and preserves the source IP address of request packets.

During the release process, Pods being unpublished are deleted from the nodes' ipvs lists, but the CCM may observe the Endpoint change and call the cloud vendor's API to update the Load Balancer backend with a delay. If the new Pods are scheduled to other nodes, leaving no Pods available on the original node, and the Load Balancer routing rules are not updated in time, the Load Balancer still forwards traffic to the original node. With no backend available on that path, traffic loss occurs.

[Figure 9: LB Service traffic loss while a Pod is being unpublished]

Although configuring a sleep in the Pod's preStop keeps the Pod running normally for a period before the LoadBalancer backend is updated, there is no guarantee that kube-proxy will not delete the node's ipvs/iptables rules before the CCM removes the LoadBalancer backend. As shown in the preceding figure, request path 2 has already been deleted while request path 1 has not yet been updated while the Pod is being unpublished. Even if the Pod can continue serving for a period thanks to the sleep, traffic is dropped before it is forwarded to the Pod because the forwarding rules are incomplete.

Solutions:

  • Set externalTrafficPolicy to Cluster to avoid offline traffic loss. In Cluster mode, all nodes in the cluster are added to the Load Balancer backend, and the ipvs rules on each node maintain the list of all available Pods in the cluster, so if no Pods are available on a node, traffic can be reforwarded to Pods on other nodes. The trade-offs are the loss introduced by a second forwarding hop and the inability to preserve the source IP address.
  • Upgrade Pods in place. By placing a specific label on the Node, you can make new Pods schedule onto the current Node. The release then does not need to call the cloud vendor's API to update the Load Balancer backend, and inbound traffic is forwarded to the new Pods (see the sketch after this list).
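
A minimal in-place scheduling sketch follows; the label name is a hypothetical example of pinning Pods to a labeled Node so that a release does not change the set of backend nodes:

# kubectl label node <node-name> app-node=demo   (label the target Node first)
spec:
  template:
    spec:
      nodeSelector:
        app-node: demo       # hypothetical label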

Nginx Ingress Traffic Scenarios

For Nginx Ingress, traffic is by default forwarded by the gateway directly to the backend Pod IPs rather than to the Service's ClusterIP. During the release process, Pods are unpublished, and there is a delay before the Ingress Controller observes the Endpoint change and updates the Nginx gateway. Traffic may still be forwarded to Pods that have already been unpublished, and traffic loss occurs.

[Figure 10: Nginx Ingress traffic loss while a Pod is being unpublished]

Solutions:

  • The Ingress annotation nginx.ingress.kubernetes.io/service-upstream can be set to true or false; the default is false. When set to true, the routing rule uses the Service's ClusterIP as the Ingress upstream address; when false, it uses the Pod IPs. Because the Service's ClusterIP stays the same, publishing or unpublishing Pods requires no change to the Nginx gateway configuration, and no traffic loss occurs. However, traffic load balancing is then controlled by Kubernetes, and some Ingress Controller features such as session persistence and retry policies become invalid (see the sketch after this list).
  • Set a sleep period in the preStop of the Pod to allow the Pod to wait for some time before receiving the SIGTERM signal. This allows the Pod to receive and process traffic during this period to avoid traffic loss.
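
Reusing the hypothetical demo-ingress from the earlier sketch, the annotation is set in metadata:

metadata:
  name: demo-ingress
  annotations:
    # Route via the Service ClusterIP instead of the Pod IPs
    nginx.ingress.kubernetes.io/service-upstream: "true"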

Microservice Traffic Scenarios

During a release of the Provider application, Pods being unpublished are deregistered from the registry, but the update of the Consumer's subscribed server list is delayed. As a result, if a Consumer has not yet refreshed its serverList when traffic arrives, it may still access Pods that are already offline.

[Figure 11: microservice traffic loss while a Provider Pod is being unpublished]

For the unpublishing of microservice application Pods, service registration and discovery go through the registry and are independent of Endpoints, so removing the Endpoint via the endpoint-controller does not take the Pod IP offline from the registry. A sleep in preStop alone cannot solve the latency of the Consumer's serverList cache refresh. For an old Pod to be unpublished gracefully, it must first be deregistered from the registry while still being able to handle the traffic it has already received, and the Consumer must refresh its cached Provider instance list before the old Pod actually stops. You can achieve this by calling the registry's interface, or an interface provided by the service framework, in preStop. In EDAS, you can directly use http://localhost:54199/offline:

lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - curl http://localhost:54199/offline; sleep 30;
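
The curl call deregisters the instance first, and the subsequent 30-second sleep (an illustrative value) gives Consumers time to refresh their cached Provider list before the Pod receives SIGTERM; make sure terminationGracePeriodSeconds exceeds this delay.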

Enterprise-Level All-in-One Solution

Above, we analyzed the causes of traffic loss on the three common traffic paths during application releases and provided solutions. To ensure lossless traffic, gateway parameters and Pod lifecycle probes and hooks must work together so that traffic path updates match Pod publishing and unpublishing. EDAS provides a non-intrusive solution to these problems: you can implement graceful application start and shutdown from the EDAS console without changing program code or parameter configurations (as shown in the following figures):

  • LB Service supports configuring ExternalTrafficPolicy:

[Figure 12: configuring ExternalTrafficPolicy for an LB Service in the EDAS console]

  • Nginx Ingress supports configuring annotations:

nginx.ingress.kubernetes.io/service-upstream

[Figure 13: configuring the Nginx Ingress annotation in the EDAS console]

  • Microservice applications support configuring graceful start parameters:

[Figure 14: configuring graceful start parameters in the EDAS console]

In addition, EDAS provides a variety of traffic gateway management methods (such as Nginx Ingress, ALB Ingress, and cloud-native gateways), a variety of deployment methods for application release (such as batch release and canary release), and different observability methods (such as Ingress monitoring and application monitoring). You can manage applications on the EDAS platform to implement graceful start and shutdown in multiple deployment scenarios.
