Knative's Traffic-Based Gray Release and Autoscaling in Practice
Knative provides traffic-based autoscaling: during peak hours it automatically scales out instances according to the application's request volume, and when requests drop it scales instances back in, saving resource costs automatically. Knative also provides traffic-based gray release, which lets a configurable percentage of traffic be shifted to a new version.
1. Traffic Request Mechanism
Before introducing Knative's gray release and autoscaling, let's first look at the traffic request mechanism in ASK Knative.
As shown in the figure above, the overall traffic request mechanism consists of the following parts:
On the left is the Knative Service's version information, where a traffic percentage can be set for each version. Below that is the routing policy, whose rules are applied to Alibaba Cloud SLB through the Ingress controller.
On the right are the created service versions (Revisions), each of which corresponds to a Deployment. When traffic enters through SLB, it is forwarded directly to the backend Pods according to the forwarding rules.
Besides the traffic request mechanism, the figure also shows the corresponding elastic policies, such as KPA and HPA.
2. Service Lifecycle
A Service is the resource object developers operate on directly; it consists of two sub-resources: Route and Configuration.
As shown in the figure above, users configure the image, related content, and environment variables through the Configuration, which:
Manages the desired state of the container;
Acts like a version controller: every update to the Configuration creates a new version (Revision).
As shown in the figure above, the Configuration looks very similar to the Knative Service's own configuration; it holds the desired resource information for the container.
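As a sketch, the Configuration content lives in the Service's `spec.template`; the service name, image, and environment variable below are illustrative:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld            # hypothetical service name
spec:
  template:                   # this template is the Configuration content
    spec:
      containers:
        - image: registry.example.com/helloworld:v1  # hypothetical image
          env:
            - name: TARGET    # hypothetical environment variable
              value: "World"
```

Each update to `spec.template` stamps out a new Revision.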
The Route controls the distribution of traffic across different versions (Revisions);
It supports percentage-based traffic distribution.
As shown in the figure above, a Route resource contains a traffic section in which each version and its traffic share can be set.
A snapshot of the Configuration;
Version tracking and rollback.
Revision is the resource for version management in a Knative Service: a snapshot of the Configuration. Every time the Configuration is updated, a new Revision is created, and Revisions enable version tracking, gray release, and rollback. In a Revision resource, the configured image information can be seen directly.
3. Traffic-Based Gray Release
As shown in the figure above, suppose we initially created a V1 Revision. When a new version is needed, we only have to update the Configuration in the Service, and a V2 Revision is created accordingly. We then use the Route to set different traffic ratios for V1 and V2 (70% for V1 and 30% for V2 in the figure), so traffic is split between the two versions at 7:3. Once V2 is verified to be problem-free, the gray rollout continues by adjusting the traffic ratio until V2 reaches 100%.
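The 7:3 split above can be expressed in the Service's traffic section; the revision names here are illustrative:

```yaml
traffic:
  - revisionName: helloworld-v1   # hypothetical revision name
    percent: 70
  - revisionName: helloworld-v2
    percent: 30
```

Rolling forward means shifting `percent` toward V2; rolling back means setting V1 back to 100.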
If any abnormality is found in the new version during the rollout, the traffic ratio can be adjusted at any time to roll back. Suppose a problem appears in V2 when its share reaches 30%: we simply set the traffic back to 100% on V1 to complete the rollback.
In addition, we can attach a tag to a Revision in the Route's traffic section. Once tagged, Knative automatically generates a directly accessible URL for that Revision, through which traffic can be sent to that specific version. This makes it possible to debug a particular version.
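A minimal sketch of tagging, assuming a revision named helloworld-v2: the tag gives the Revision its own URL even while it receives no regular traffic:

```yaml
traffic:
  - revisionName: helloworld-v1
    percent: 100
  - revisionName: helloworld-v2
    percent: 0       # receives no regular traffic
    tag: candidate   # Knative generates a dedicated URL such as candidate-helloworld.<namespace>.example.com
```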
4. Autoscaling
In addition to the rich scaling policies that Knative itself provides, ASK Knative extends several scaling mechanisms. The following strategies are introduced below:
Knative Pod Autoscaler (KPA);
Horizontal Pod Autoscaler (HPA);
Scheduled scaling combined with HPA;
Event gateway (precise, request-driven scaling);
Custom scaling plug-ins.
1. Automatic Scaling - KPA
As shown in the figure above, the Route can be understood as a traffic gateway. The Activator handles 0-to-1 scaling in Knative: when there is no request traffic, Knative points the service at the Activator Pods. When the first request arrives, it enters the Activator, which then has the Autoscaler scale up Pods. Once the scale-up completes, the Activator forwards the request to the new Pods. After the Pods are ready, the Route sends traffic directly to them, and the Activator's mission is over.
During the 1-to-N phase, request concurrency metrics are collected inside each Pod by its queue-proxy container. The Autoscaler aggregates these request metrics, computes the required number of Pods, and thereby scales in and out based on traffic.
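The aggregation step can be sketched as follows; this is a simplified stand-in for the KPA calculation, with illustrative names and a per-Pod concurrency target:

```python
import math

def desired_pods(per_pod_concurrency, target_per_pod, min_pods=0):
    """Simplified KPA-style calculation: total observed concurrency,
    aggregated from per-Pod queue-proxy metrics, divided by the
    per-Pod target and rounded up."""
    total = sum(per_pod_concurrency)      # aggregate the pulled metrics
    if total == 0:
        return min_pods                   # no traffic: allow scale to zero
    return max(min_pods, math.ceil(total / target_per_pod))

# Three Pods report concurrencies 80, 90 and 70; the target is 100 per Pod.
print(desired_pods([80, 90, 70], 100))    # 240 / 100 rounds up to 3
```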
2. Horizontal Scaling - HPA
This actually wraps the native Kubernetes HPA: the metrics and policies are configured through the Revision, and the native HPA then supports automatic scaling on CPU and memory.
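Selecting the HPA class for a Revision is done through annotations on the Service's template; the target value below is illustrative:

```yaml
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"  # use the native K8s HPA instead of KPA
        autoscaling.knative.dev/metric: "cpu"
        autoscaling.knative.dev/target: "70"    # e.g. 70% CPU utilization
```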
3. Scheduled Scaling + HPA
Plan capacity in advance to pre-warm resources;
Combine the schedule with CPU and memory metrics.
On top of Knative, we combine scheduled scaling with HPA to plan capacity ahead of time and pre-warm resources. Anyone who has used Kubernetes knows that when scaling through HPA, waiting for a metric to cross its threshold before scaling out can be too slow for urgent spikes. For predictable, periodic workloads, the capacity needed during a given time window can be planned in advance through a schedule.
We also combine this with CPU and memory metrics. For example, if a time window is scheduled for 10 Pods but the current CPU metric calls for 20 Pods, the larger of the two values, 20 Pods, is used for scaling. This is the most basic guarantee of service stability.
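The merge rule described above reduces to taking a maximum; a minimal sketch with illustrative names:

```python
def effective_replicas(scheduled_replicas, metric_replicas):
    """The scheduled plan acts as a floor: the final replica count is the
    larger of the scheduled capacity and the metric-driven (HPA) demand."""
    return max(scheduled_replicas, metric_replicas)

# Window scheduled for 10 Pods while CPU load calls for 20: scale to 20.
print(effective_replicas(10, 20))   # -> 20
# Off-peak, the schedule still guarantees the planned 10 Pods.
print(effective_replicas(10, 4))    # -> 10
```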
4. Event Gateway
Autoscaling based on the number of requests;
One-to-one task distribution.
The event gateway provides precise, request-driven scaling. When events arrive, they first enter the event gateway, and Pods are scaled out based on the current number of incoming requests. After scaling completes, tasks are dispatched to Pods one-to-one. Sometimes a Pod can process only one request at a time; this is exactly the scenario the event gateway solves.
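A toy model of this dispatch logic, assuming one request per Pod (all names are illustrative):

```python
def replicas_for(in_flight_requests, max_replicas):
    """One Pod per in-flight request, capped by an upper replica limit."""
    return min(in_flight_requests, max_replicas)

def assign_one_to_one(requests, pods):
    """Dispatch each request to its own Pod, one request per Pod."""
    return list(zip(requests, pods))

needed = replicas_for(3, max_replicas=10)
pods = [f"pod-{i}" for i in range(needed)]
print(assign_one_to_one(["r1", "r2", "r3"], pods))
# [('r1', 'pod-0'), ('r2', 'pod-1'), ('r3', 'pod-2')]
```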
5. Custom scaling plugin
There are two key points in a custom scaling plug-in: where the metrics come from, and how the number of Pod instances is adjusted.
Where do the metrics come from? Take the traffic-based KPA provided by the Knative community as an example: a scheduled task pulls metrics from each Pod's queue-proxy container, and the controller processes and aggregates them to calculate how many Pods are needed.
How is scaling performed? By adjusting the replica count of the corresponding Deployment.
With metric collection and Pod-count adjustment in place, it is easy to implement a custom scaling plug-in.
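Putting the two key points together, a custom plug-in can be sketched as a reconcile loop; the metric source and the replica setter are illustrative stand-ins for real metric scraping and Kubernetes API calls:

```python
import math

class CustomScaler:
    """Sketch of a custom scaling plug-in: a metric source plus a hook
    that adjusts the Deployment's replica count."""

    def __init__(self, fetch_metrics, set_replicas, target_per_pod):
        self.fetch_metrics = fetch_metrics  # e.g. scrape each Pod's metrics endpoint
        self.set_replicas = set_replicas    # e.g. patch deployment.spec.replicas
        self.target_per_pod = target_per_pod

    def reconcile(self):
        total = sum(self.fetch_metrics())   # 1) collect and aggregate the metrics
        desired = math.ceil(total / self.target_per_pod) if total else 0
        self.set_replicas(desired)          # 2) adjust the number of Pod instances
        return desired

# Stubbed usage: two Pods report 120 and 60 requests, target 100 per Pod.
applied = []
scaler = CustomScaler(lambda: [120, 60], applied.append, target_per_pod=100)
print(scaler.reconcile())   # 180 / 100 rounds up to 2
```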
Knowledge Base Team