Research on Lossy Releases in K8s

Problem statement

Traffic loss is a common problem during application releases, and it usually surfaces in traffic monitoring, as shown in the figure below. During a release, service RT suddenly rises and some requests respond slowly; the most direct user experience is "lag". Or the number of 500 errors spikes, and users may perceive the service as degraded or unavailable, hurting the user experience.

Because releases are accompanied by traffic loss, we often have to schedule them during off-peak hours and strictly limit their duration. Even so, the risk cannot be eliminated entirely, and sometimes we have no choice but to abort a release. As a general-purpose application management system, application release is one of the most basic capabilities of EDAS, and K8s applications are the most common application form in EDAS. Starting from the traffic paths in K8s, the following sections analyze the causes of lossy releases and give practical solutions, drawing on real scenarios from EDAS customers.

Traffic path analysis

In K8s, traffic usually enters an application Pod through one of the following paths. The paths differ considerably, and so do the causes of traffic loss on each. We will explore the routing mechanism of each path and the impact that Pod changes have on it.

LB Service Traffic

When an application is accessed through a LoadBalancer-type Service, the core components on the traffic path are the LoadBalancer and ipvs/iptables. The LoadBalancer receives traffic from outside the K8s cluster and forwards it to a Node; ipvs/iptables then forwards the traffic received by the Node to a Pod. Both are driven by the CCM (cloud-controller-manager) and kube-proxy components, which update the LoadBalancer backends and the ipvs/iptables rules respectively.

When an application is released, Pods that become Ready are added to the Endpoint backend and Terminating Pods are removed from it. kube-proxy updates the ipvs/iptables rules on each node, while the CCM component, after watching the Endpoint change, calls the cloud provider's API to update the load balancer backends, writing the node IPs and ports into the backend list. Incoming traffic is forwarded to a node according to the listener backend list configured on the load balancer, and the node's ipvs/iptables then forwards it to the actual Pod.

A Service supports setting externalTrafficPolicy. Depending on its value, the way kube-proxy updates a node's ipvs/iptables rules and the way CCM updates the load balancer backends differ:

• Local mode: CCM adds only the nodes hosting the target Pods to the load balancer backend list. Traffic reaching a node is forwarded only to Pods on that node.

• Cluster mode: CCM adds all nodes to the load balancer backend list. Traffic reaching a node may be forwarded to Pods on other nodes. A minimal Service sketch follows this list.
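To make this concrete, here is a minimal sketch of a LoadBalancer-type Service with this field set; the Service name, selector, and ports are illustrative assumptions:

```yaml
# Minimal sketch of a LoadBalancer Service; name, selector, and ports are
# illustrative assumptions. Switching externalTrafficPolicy between Local
# and Cluster yields the two behaviors described above.
apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # or Cluster
  selector:
    app: demo-app
  ports:
    - port: 80
      targetPort: 8080
```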

Nginx Ingress Traffic

When an application is accessed through the gateway provided by Nginx Ingress, the core component on the traffic path is the Ingress Controller. It not only acts as a proxy server that forwards traffic to the backend Service's Pods, but also updates the gateway's routing rules according to the Endpoints.

When an application is released, the Ingress Controller watches the Endpoint changes and updates the Ingress gateway's routing backends. Incoming traffic is matched to an upstream according to its characteristics, and a backend is selected from the upstream's backend list to serve the request.

By default, after watching a change in the Service's Endpoints, the Controller calls Nginx's dynamic backend configuration interface and updates the Nginx gateway's upstream backend list to the Endpoint list, that is, the list of Pod IPs and ports. Traffic entering the Ingress Controller is therefore forwarded directly to a backend Pod IP and port.

Microservice Traffic

When an application is accessed through microservice calls, the core component is the registry. After a provider starts, it registers its services in the registry, and consumers subscribe to the service's address list from the registry.

When an application is released, a newly started provider registers its Pod IP and port with the registry, and Pods going offline are removed from the registry. Consumers subscribe to changes in the server list and update their cached list of backend Pod IPs and ports. When traffic arrives, the consumer's client-side load balancer forwards it to a provider Pod according to the service address list.

Cause analysis and general solutions

An application release is essentially a process in which new Pods come online and old Pods go offline. Traffic loss occurs when the update of traffic routing rules is not properly coordinated with the Pods going online and offline. We can classify release-induced traffic loss into online loss and offline loss. In general, the causes are as follows and are discussed in more detail in the following sections:

• Lossy online: a new Pod is added to the routing backend too early, and traffic is routed to a Pod that is not yet ready.

• Lossy offline: after an old Pod goes offline, the routing rules do not remove the backend in time, and traffic is still routed to the stopped Pod.

Analysis and countermeasures for lossy online

The process of a Pod going online in K8s is shown in the following figure:

If the service inside the Pod is not checked for availability when the Pod comes online, the Pod is added to the Endpoint backend as soon as it starts and is then added to the gateway routing rules by the various gateway controllers; connections are refused once traffic is forwarded to it. Health checks are therefore particularly important: we must make sure the Pod has finished starting before letting it share online traffic. K8s provides readiness probes to verify whether a new Pod is ready. Setting a reasonable readiness probe that checks the actual startup status of the application controls when the Pod is added to the Service's Endpoint backend.

Endpoint-based traffic scenarios

For scenarios where the traffic path is controlled by Endpoints, such as LB Service traffic and Nginx Ingress traffic, configuring an appropriate readiness probe ensures the service passes its health check before it is added to the Endpoint backend and allocated traffic, avoiding loss. For example, Spring Boot 2.3.0 and above adds the health check endpoints /actuator/health/readiness and /actuator/health/liveness to support readiness and liveness probes for deployments in K8s environments:
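A minimal sketch of the corresponding probe configuration follows; the container name, image, port, and timing values are illustrative assumptions:

```yaml
# Sketch of a Deployment's container spec wiring readiness/liveness probes
# to Spring Boot's actuator endpoints. Name, image, port, and timings are
# illustrative assumptions.
containers:
  - name: demo-app
    image: registry.example.com/demo-app:latest
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      initialDelaySeconds: 10   # give the JVM time to start
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
```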

Microservice traffic scenario

For microservice applications, service registration and discovery are managed by the registry, which has no check mechanism like the K8s readiness probe. Moreover, because Java applications usually start slowly, the resources they need may still be initializing after service registration succeeds, such as the database connection pool, thread pools, and JIT compilation. If a large number of microservice requests pours in at this point, exceptions such as high request RT or timeouts are likely.

To solve these problems, Dubbo provides delayed registration and service warm-up. The features are summarized as follows:

• Delayed registration lets the user specify a wait period. After the program starts, it first waits for the configured period and only then publishes the service to the registry. During the wait, the program has a chance to finish initialization, so that service requests do not flood in prematurely.

• Service warm-up lets the user set a warm-up duration. When the provider registers a service with the registry, it records the warm-up duration and the service start time in the registry as metadata. Consumers subscribe to the instance list of the service and, based on each instance's warm-up duration and start time, compute a call weight so that less traffic is allocated to newly started instances. Through this small-traffic warm-up, the program can complete class loading, JIT compilation, and other work under low load, allowing the new instance to take a stable share of traffic once warmed up.

We can enable delayed registration and service warm-up by adding the following configuration to the program:
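With Dubbo's Spring Boot integration, a minimal sketch in application.yaml might look like this; dubbo.provider.delay and dubbo.provider.warmup are standard Dubbo provider options, the delay value here is an illustrative assumption, and the 120-second warm-up matches the experiment below:

```yaml
# Sketch of Dubbo delayed registration and service warm-up settings.
dubbo:
  provider:
    delay: 5000      # wait 5s after startup before registering the service (ms)
    warmup: 120000   # ramp the call weight up linearly over 120s (ms)
```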

After configuring these parameters, we scale the Provider application out by one Pod and observe the QPS curve during the new Pod's startup to verify the warm-up effect. The QPS data is shown in the following figure:

The QPS curve of the traffic received by the Pod shows that after startup the Pod does not immediately take its full share of online traffic; instead, the traffic handled per second grows roughly linearly over the configured 120-second warm-up period and stabilizes after 120 seconds, which matches the expected warm-up behavior.

Analysis and countermeasures for lossy offline

In K8s, the process of a Pod going offline is shown in the following figure:

As the figure shows, after a Pod is deleted, its status change is watched by both the Endpoint controller and the kubelet, which remove the Endpoint and stop the Pod respectively. These two operations run concurrently rather than in the order "remove the Endpoint first, then delete the Pod". It can therefore happen that the Pod has already received SIGTERM while traffic is still arriving.

K8s provides the preStop hook in the Pod offline process: when the kubelet finds a Pod in the Terminating state, it first runs the preStop operations instead of immediately sending SIGTERM to the container. As a general remedy for the problem above, you can sleep in preStop for a period of time, delaying SIGTERM so that traffic arriving during this window is not lost and the Pod can finish processing the requests it has already received.
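A minimal sketch of this approach follows; the sleep length and grace period are illustrative assumptions, and the grace period must exceed the sleep or the kubelet will kill the container early:

```yaml
# Sketch of delaying SIGTERM with a preStop sleep.
spec:
  terminationGracePeriodSeconds: 60   # must be longer than the preStop sleep
  containers:
    - name: demo-app
      image: registry.example.com/demo-app:latest
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 30"]  # keep serving while routing rules converge
```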

The above covers the traffic loss that can occur when Pod offline and Endpoint updates do not happen in the expected order. Once an application is exposed through several types of gateways, the traffic path becomes more complex, and loss can occur on other routing links as well.

LB Service traffic scenario

When a LoadBalancer-type Service is used to access the application, configuring externalTrafficPolicy in Local mode avoids a second forwarding hop and preserves the source IP of request packets.

During a release, a Pod going offline is removed from the node's ipvs list, but there can be a delay before CCM observes the Endpoint change and calls the cloud provider's API to update the load balancer backends. If the new Pod is scheduled to a different node and the original node has no available Pod left, and the load balancer's routing rules are not updated in time, the load balancer still forwards traffic to the original node. That path has no available backend, so the traffic is lost.

Configuring a sleep in the Pod's preStop does let the Pod keep running normally for a while before the LoadBalancer backend is updated, but it cannot guarantee that kube-proxy will not delete the node's ipvs/iptables rules before CCM removes the LoadBalancer backend. In the scenario shown in the figure above, during the Pod's offline process request path 2 has already been deleted while request path 1 has not yet been updated; even though the sleep keeps the Pod serving for a while, the incomplete forwarding rules cause the traffic to be dropped before it ever reaches the Pod.

Solutions:

• Set externalTrafficPolicy to Cluster to avoid offline traffic loss. In Cluster mode, all nodes in the cluster are added to the load balancer backends, and each node's ipvs maintains the list of all available Pods in the cluster; when a node has no available Pod, traffic can be forwarded again to Pods on other nodes. The cost is the overhead of this second forwarding hop, and the source IP can no longer be preserved.

• Upgrade Pods in place. By pinning Pods to nodes carrying a specific label, new Pods are still scheduled onto the same nodes, so the release does not need to call the cloud provider's API to update the load balancer backend, and incoming traffic is forwarded to the new Pods. A minimal sketch follows this list.
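For the in-place option, here is a minimal sketch of pinning Pods to labeled nodes; the label key and value are illustrative assumptions:

```yaml
# Sketch: schedule replacement Pods onto the same labeled nodes so the
# load balancer's backend node list stays unchanged during a release.
spec:
  nodeSelector:
    app-pool: demo-app   # nodes must be labeled app-pool=demo-app beforehand
```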

Nginx Ingress traffic scenario

For Nginx Ingress, by default the gateway forwards traffic directly to the backend Pod IPs rather than to the Service's ClusterIP. During a release, Pods go offline, and there is a delay before the Ingress Controller observes the Endpoint change and updates the Nginx gateway; incoming traffic may still be forwarded to an offline Pod, causing loss.

Solutions:

• The Ingress annotation nginx.ingress.kubernetes.io/service-upstream can be set to "true" or "false" and defaults to "false". When it is set to "true", the routing rule uses the Service's ClusterIP as the Ingress upstream address; when set to "false", the routing rule uses the Pod IPs. Since a Service's ClusterIP stays constant, Pods going online or offline require no change to the Nginx gateway configuration, and the traffic loss described above does not occur. Note, however, that when this feature is used, traffic load balancing is handled by K8s, so some Ingress Controller features no longer take effect, such as session affinity and retry policies. A sketch of the annotation follows this list.

• Configure a sleep in the Pod's preStop so that the Pod waits for a period before receiving SIGTERM, continuing to accept and process traffic during this window and avoiding loss.
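Here is a minimal sketch of the annotation on an Ingress resource; the host, Service name, and port are illustrative assumptions:

```yaml
# Sketch of an Ingress routed through the Service's ClusterIP via the
# service-upstream annotation. Host, Service name, and port are assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress
  annotations:
    nginx.ingress.kubernetes.io/service-upstream: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: demo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo-app
                port:
                  number: 80
```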

Microservice traffic scenario

During a Provider application release, a Pod going offline is deregistered from the registry, but there is a delay before the server list that consumers subscribe to changes. If traffic reaches a Consumer that has not yet refreshed its serverList, it may still be routed to the offline Pod.

For a microservice application Pod going offline, service registration and discovery go through the registry rather than relying on Endpoints, so the Endpoint controller removing the Endpoint does not remove the Pod IP from the registry. Merely sleeping in preStop cannot solve the delay in refreshing the consumers' cached serverList. For the old Pod to go offline gracefully, it needs to deregister from the registry first in preStop while still processing the traffic it receives, and it must ensure that consumers have refreshed the Provider instance list cached by their clients before it stops. The instance going offline can achieve this by calling the registry's API, or by calling an interface provided by the service framework, configured in preStop. In EDAS, this graceful offline capability is built in, as described in the next section.
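As one concrete possibility, a Dubbo provider can be taken offline through Dubbo's QoS endpoint in preStop before sleeping; the default QoS port 22222, the wait length, and the availability of curl in the image are assumptions:

```yaml
# Sketch of a preStop hook that first deregisters the Dubbo provider via the
# QoS 'offline' command, then waits for consumers to refresh their caches.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - "curl -s http://localhost:22222/offline && sleep 30"
```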

Enterprise one-stop solution

Above, we analyzed the causes of traffic loss on three common traffic paths during the application release process and provided solutions. In general, lossless traffic requires close coordination between the traffic path and the Pods going online and offline, through both the gateway parameters and the Pod lifecycle's probes and hooks. EDAS provides a non-invasive solution to these problems: without changing program code or parameter configuration, applications can go online and offline losslessly from the EDAS console, as shown in the figure below:

• LB Service supports configuring the external traffic policy

• Nginx Ingress supports configuring the nginx.ingress.kubernetes.io/service-upstream annotation

• Microservice applications support lossless online parameter configuration

In addition, EDAS provides multiple ways to manage traffic gateways, such as Nginx Ingress, ALB Ingress, and the cloud-native gateway; multiple deployment strategies for application releases, such as batch release and canary release; and observability at different levels, such as Ingress monitoring and application monitoring. Managing applications on the EDAS platform makes lossless online and offline easy to achieve across multiple deployment scenarios.
