Knative on ASK brings you the ultimate serverless experience

The annual survey report released by CNCF shows that serverless technology gained further recognition in 2019: 41% of respondents said they were already using serverless, and another 20% said they planned to adopt it within the next 12-18 months. Among the many open source serverless projects, Knative is the most popular.

As shown in the figure below, Knative has a 34% share, far ahead of second-place OpenFaaS, which makes Knative the first choice for building a serverless platform.

Knative's popularity has a lot to do with the container ecosystem. Unlike the FaaS model, Knative does not require users to make major changes to their applications: as long as an application is containerized, it can be deployed on Knative. Knative also provides a more focused application model on top of Kubernetes, so users do not need to spend effort on application upgrades and grayscale traffic shifting, all of which are handled automatically.

The history of cloud hosting

Before the emergence of cloud computing, an enterprise that wanted to provide services on the Internet first had to rent physical machines from an IDC and then deploy its applications on them. The performance of physical machines has kept pace with Moore's Law over the past ten years, so a single application can no longer fully utilize the resources of an entire physical machine. A technology was therefore needed to improve resource utilization. The simple idea is that if one application is not enough, deploy a few more. However, co-locating multiple applications on the same physical machine brings many problems, such as:

Port conflicts

Resource isolation

System dependencies and difficult operation and maintenance

Virtual machines solved these problems well. With virtual machine technology, multiple hosts can be virtualized on the same physical machine, and each host deploys only one application. In this way, a single physical machine can run multiple applications while the independence of each application is guaranteed.

As an enterprise grows, it may maintain many applications. Each application requires frequent release, upgrade, and rollback operations, and may also need to be deployed in different regions. This creates a heavy O&M burden, and the first thing to bear the brunt is the application's runtime environment. Container technology emerged in response. Through lightweight isolation at the kernel level, containers offer an isolation experience similar to that of VMs and also bring a major innovation: the container image. With a container image, the runtime environment of an application can be copied easily. Developers only need to put all of the application's dependencies into the image, and when the image runs, it serves requests using those built-in dependencies. This solves the runtime environment problems that arise during application release, upgrade, rollback, and multi-region deployment.

When people started to use container technology on a large scale, they found that the burden of maintaining the runtime environment of each instance was greatly reduced. The biggest remaining problem was coordinating multiple instances of an application and coordinating multiple applications with one another. Kubernetes therefore appeared shortly after containers became popular. Unlike the earlier VM and container technologies, Kubernetes is by nature a distributed, desired-state-oriented design rather than a capability of a single machine. Kubernetes abstracts IaaS resource allocation behind friendlier APIs, so users do not need to care about the details of allocation. Kubernetes controllers automatically handle allocation, failover, and load balancing according to the declared desired state. Application developers therefore do not need to care where a specific instance runs, as long as Kubernetes can allocate resources when they are needed.

Whether in the early physical machine model or in the current Kubernetes model, application developers do not want to manage any underlying resources. They just want to run their applications. In the physical machine model, people had to occupy a whole physical machine. In the Kubernetes model, people do not care which physical machine their business processes run on, and in fact they cannot predict it in advance. As long as the application can run, it does not matter where it runs. From physical machine to virtual machine to container to Kubernetes, the whole process has been lowering the threshold for applications to use IaaS resources. A clear thread runs through this evolution: the coupling between IaaS and applications keeps getting lower. The platform only needs to allocate the corresponding IaaS resources when an application needs to run. The application manager is only a user of IaaS and does not need to be aware of the details of IaaS allocation.

Knative Serving

Before introducing Knative, let's use a web application to look at how traffic access and application release are handled in Kubernetes mode. As shown in the figure below, the Kubernetes mode is on the left and the Knative mode is on the right.

In Kubernetes mode

Users need to manage the Ingress Controller themselves

To expose services externally, the relationship between Ingress and Service needs to be maintained

To observe a grayscale release when publishing, you need to rotate between multiple Deployments to complete the upgrade

In Knative mode

Users only need to maintain a Knative Service resource
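
To make the "single resource" point concrete, here is a minimal sketch of what such a Knative Service could look like, built as a plain Python dict and printed as JSON. The service name coffee is borrowed from the demo later in this article; the image URL and the port are placeholders assumed for illustration only.

```python
import json


def knative_service(name, image, port=8080):
    """Return a minimal Knative Service manifest as a plain dict."""
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Placeholder image: replace with your own application image.
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                }
            }
        },
    }


# The image below is hypothetical and does not point to a real registry.
manifest = knative_service("coffee", "registry.example.com/demo/helloworld:latest")
print(json.dumps(manifest, indent=2))
```

In this sketch there is no Ingress, Service, or Deployment to maintain by hand; those concerns are left to Knative, which is the contrast the two lists above describe.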

Of course, Knative cannot completely replace Kubernetes, because Knative is built on Kubernetes capabilities. Besides the difference in the resources that users manage directly, Kubernetes and Knative also differ greatly in concept:

The role of Kubernetes is to decouple IaaS from applications and reduce the cost of IaaS resource allocation, so Kubernetes mainly focuses on orchestrating IaaS resources. Knative leans toward the application layer and provides application orchestration with elasticity at its core.

Knative is a Kubernetes-based serverless orchestration engine whose goal is to define a cloud-native, cross-platform serverless orchestration standard. Knative implements this standard by integrating container builds, workload management (dynamic scaling), and an event model. Serving is the core module for running serverless workloads.

Application hosting

Kubernetes is an abstraction for IaaS management. Deploying applications directly on Kubernetes requires maintaining more resources

With Knative, a single resource, the Knative Service, defines how the application is hosted

Traffic management

Knative receives application traffic through a Gateway and splits it by percentage, which lays the foundation for elasticity, grayscale release, and other capabilities

Grayscale release

Multi-version management is supported, so multiple versions of an application can easily serve online traffic at the same time

Different versions can be given different traffic percentages, which makes grayscale release easy to implement
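
The percentage-based split can be illustrated with a small sketch. It builds a traffic list in the shape used by a Knative Service (revisionName and percent) and then simulates how requests would be distributed. The revision coffee-v2 and the 90/10 split are hypothetical, and the random routing is only a stand-in for what the Gateway does, not its actual implementation.

```python
import random
from collections import Counter


def traffic_block(split):
    """Build a Knative-style `traffic` list from {revision_name: percent}."""
    if sum(split.values()) != 100:
        raise ValueError("traffic percentages must add up to 100")
    return [{"revisionName": rev, "percent": pct} for rev, pct in split.items()]


def pick_revision(traffic, rng):
    """Pick the revision for one request, weighted by traffic percentage."""
    names = [t["revisionName"] for t in traffic]
    weights = [t["percent"] for t in traffic]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical grayscale release: 90% stays on v1, 10% goes to the new v2.
traffic = traffic_block({"coffee-v1": 90, "coffee-v2": 10})

rng = random.Random(0)  # fixed seed so the simulation is repeatable
hits = Counter(pick_revision(traffic, rng) for _ in range(10_000))

print(traffic)
print(dict(hits))
```

Raising the percentage of the new revision step by step, while watching its behavior, is the grayscale release pattern described above.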


Elasticity

Elasticity is the core Knative capability that helps applications save costs: capacity is scaled up automatically when traffic increases and scaled down automatically when traffic decreases

Each grayscale version has its own elasticity policy, and that policy is tied to the traffic allocated to the version. Knative decides whether to scale a version up or down based on the amount of traffic allocated to it
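
As a rough illustration of how a per-version scaling decision can follow the allocated traffic, the sketch below computes a desired Pod count from the concurrency a version receives. It is a deliberately simplified model: the function name, the target of 10 concurrent requests per Pod, and the traffic numbers are assumptions, and the real Knative autoscaler also averages metrics over time windows, which is not modeled here.

```python
import math


def desired_pods(total_concurrency, traffic_percent, target_per_pod,
                 min_scale=0, max_scale=0):
    """Simplified concurrency-based scaling decision for one version.

    total_concurrency: concurrent requests arriving at the whole service.
    traffic_percent:   share of traffic allocated to this version (0-100).
    target_per_pod:    concurrent requests one Pod is expected to handle.
    max_scale == 0 means "no upper limit" in this sketch.
    """
    version_concurrency = total_concurrency * traffic_percent / 100
    wanted = math.ceil(version_concurrency / target_per_pod)
    wanted = max(wanted, min_scale)
    if max_scale > 0:
        wanted = min(wanted, max_scale)
    return wanted


# 200 concurrent requests split 90/10 between two versions, 10 requests per Pod.
print(desired_pods(200, 90, 10))  # 18 Pods for the version with 90% of traffic
print(desired_pods(200, 10, 10))  # 2 Pods for the version with 10% of traffic
print(desired_pods(0, 90, 10))    # 0 Pods: with no traffic, scale to zero
```

Because each version is sized from its own share of traffic, shifting the percentages during a grayscale release changes the Pod count of each version accordingly.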

For a more detailed introduction to Knative, see here or here.


Community Kubernetes requires you to purchase hosts in advance and register them as Kubernetes nodes before Pods can be scheduled. Purchasing hosts in advance does not match how applications think about resources: application developers just want IaaS resources to be allocated when application instances need to run, and they do not want to maintain complex IaaS resources. A Kubernetes that is fully compatible with the community Kubernetes API, but that allocates resources automatically when needed instead of requiring users to operate and maintain complex IaaS resources, fits the way applications use resources much better. ASK follows this idea to bring you the Serverless Kubernetes experience.

The full name of ASK is Serverless Kubernetes; it is a serverless Kubernetes cluster. Users can deploy container applications directly without purchasing nodes. There is no need for node maintenance or capacity planning for the cluster, and you pay on demand according to the amount of CPU and memory configured for the application. ASK clusters provide full Kubernetes compatibility while greatly lowering the barrier to using Kubernetes, so that users can focus on their applications rather than on managing the underlying infrastructure.

That is to say, you can directly create an ASK cluster without preparing ECS resources in advance, and then deploy your own services in the ASK cluster. A more detailed introduction to ASK can be found here.

When we analyzed the history of serverless, we concluded that its main line of development is the ever lower coupling between IaaS and applications. The platform only needs to allocate the corresponding IaaS resources when an application needs to run, and the application manager is only a user of IaaS who does not need to be aware of the details of IaaS allocation. ASK is the platform that allocates IaaS resources at any time. Knative is responsible for sensing the real-time status of applications and automatically "requesting" IaaS resources (Pods) from ASK when they are needed. Together, Knative and ASK can bring you a more complete serverless experience.

For a more in-depth introduction to ASK, please refer to Serverless Kubernetes - Ideal, Reality and Future


SLB-based Gateway

The Knative community supports several Gateway implementations by default, such as Istio, Gloo, Contour, Kourier, and Ambassador. Among them, Istio is the most popular, because besides acting as a Gateway it can also serve as a service mesh. Although these Gateways are fully functional, using them as the Gateway for serverless services runs somewhat against the original intention. First, a Gateway instance must run permanently, and at least two instances are needed as backups for each other to ensure high availability. Second, the control planes of these Gateways also need to run permanently. The IaaS fees and the operation and maintenance of these resident instances are costs that the business has to bear.
To provide users with the ultimate serverless experience, we implemented the Knative Gateway on Alibaba Cloud SLB, which offers all the required functions with cloud-product-level support. Because no resident resources are needed, this saves both IaaS cost and a great deal of operation and maintenance effort.

Low-cost reserved instances

Reserved instances are a unique feature of ASK Knative. By default, community Knative can scale to zero when there is no traffic, but after scaling to zero it is hard to solve the cold start from zero to one. A cold start involves not only IaaS resource allocation, Kubernetes scheduling, and image pulling, but also the startup time of the application itself, which ranges from milliseconds to minutes and is almost impossible to control at the level of a general-purpose platform. Of course, these problems exist in all serverless products. Most traditional FaaS products run different functions on a shared IaaS pool that they maintain. To keep the pool from filling up and to keep cold start times extremely low, most FaaS products impose various restrictions on user functions, for example:

Request timeout: a request that is not completed within this time is treated as a failure

Burst concurrency: every function has a default concurrency limit, and requests beyond this limit are throttled

CPU and memory: a function cannot exceed the upper limits for CPU and memory

ASK Knative addresses this by using low-priced reserved instances to balance cost against cold start. Alibaba Cloud ECI offers many specifications, which differ in computing power and in price. The price comparison between a compute instance and a burstable performance instance, both configured with 2c4g, is shown below.


The comparison shows that the burstable performance instance is 46% cheaper than the compute instance. When there is no traffic, serving from a burstable performance instance therefore solves the cold start problem and also saves considerable cost.

In addition to the price advantage, burstable performance instances have another eye-catching feature: CPU credits. A burstable performance instance continuously earns CPU credits, and when its performance cannot meet the load, it can seamlessly raise its computing performance by consuming accumulated credits, without affecting the environment or the applications deployed on the instance. CPU credits let you allocate computing resources from the perspective of the overall business, seamlessly shifting computing power left over from off-peak periods to peak periods (think of it as a gasoline-electric hybrid ☺️☺️). See here for more details on burstable performance instances.

Therefore, ASK Knative's strategy is to replace standard compute instances with burstable performance instances during business troughs and to switch seamlessly back to standard compute instances when the first request arrives. This reduces your cost during traffic troughs, and the CPU credits earned during troughs can be consumed when business peaks arrive, so not a penny you pay is wasted.
Using burstable performance instances as reserved instances is the default policy, and you can specify other instance types as the reserved instance specification. You can also set a minimum of one standard instance, which turns the reserved instance feature off.
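
The CPU credit mechanism described above can be pictured with a toy model. Everything numeric here is an assumption chosen for illustration (a 10% baseline, one credit equal to one vCPU at full load for one minute, a balance cap of 144 credits); it is not the actual accounting used by Alibaba Cloud, only a sketch of the earn-while-idle, spend-while-busy idea.

```python
def simulate_credits(load_per_minute, vcpus=1, baseline=0.10, max_balance=144.0):
    """Toy CPU-credit model (all parameters are illustrative assumptions).

    load_per_minute: requested CPU utilisation (0.0 to 1.0) for each minute.
    Returns a list of (delivered_utilisation, credit_balance), one per minute.
    """
    earn = baseline * vcpus  # credits earned every minute
    balance = 0.0
    history = []
    for wanted in load_per_minute:
        balance = min(balance + earn, max_balance)
        need = wanted * vcpus        # credits needed to serve this minute in full
        spent = min(need, balance)   # cannot spend more than has been saved
        balance -= spent
        history.append((spent / vcpus, round(balance, 4)))
    return history


# One idle hour followed by a five-minute burst at full load.
burst = simulate_credits([0.0] * 60 + [1.0] * 5)
print(burst[-5:])

# The same burst with no idle time beforehand is throttled to the baseline.
throttled = simulate_credits([1.0] * 3)
print(throttled)
```

In this model the instance that idled through the trough serves the whole burst at full speed, while the instance with no saved credits is held to its baseline, which is why credits earned during troughs are useful when a peak arrives.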

Demo

After the Serverless Kubernetes (ASK) cluster is created, you can apply for activation of the Knative function through the following DingTalk group. Then you can directly use the capabilities provided by Knative in the ASK cluster.

reserved instance

The highlights section above explained that ASK Knative uses reserved instances to address cold start and cost. Next, let's look at how switching between reserved instances and standard instances works.
Some time after the previous stress test, use kubectl get pod to check the number of Pods. You may find only one Pod, named xxx-reserve-xx; reserve marks it as a reserved instance. At this point the reserved instance is the one providing service. When there have been no online requests for a long time, Knative automatically scales up a reserved instance and scales the standard instances down to zero to save costs.

What happens if traffic comes in at this point? Let's verify it. The gif below shows that when traffic arrives, a standard instance is scaled up automatically, and the reserved instance is scaled down once the standard instance is ready.
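
The switching behavior described in this section can be summarized as a small state machine. The sketch below is only a model of the observable behavior (one reserved Pod when idle, standard Pods when busy, reserved Pod removed after the standard one is ready); the class and function names are hypothetical and it does not reflect how ASK Knative is implemented internally.

```python
from dataclasses import dataclass


@dataclass
class RevisionPods:
    reserved: int = 1  # low-cost burstable instance kept during troughs
    standard: int = 0  # standard compute instances serving real traffic


def reconcile(pods, has_traffic, standard_ready):
    """One step of a simplified reserved/standard switching policy."""
    if has_traffic:
        if pods.standard == 0:
            # First request: start a standard instance, reserved keeps serving.
            return RevisionPods(reserved=pods.reserved, standard=1)
        if standard_ready:
            # Standard instance is ready, so the reserved instance can go away.
            return RevisionPods(reserved=0, standard=pods.standard)
        return pods
    # No traffic: fall back to a single reserved instance.
    return RevisionPods(reserved=1, standard=0)


state = RevisionPods()  # idle: only the xxx-reserve-xx Pod exists
state = reconcile(state, has_traffic=True, standard_ready=False)
print(state)  # reserved and standard Pods coexist while standard starts up
state = reconcile(state, has_traffic=True, standard_ready=True)
print(state)  # only the standard Pod remains
state = reconcile(state, has_traffic=False, standard_ready=True)
print(state)  # traffic is gone: back to one reserved Pod
```

The important property is the ordering: the reserved instance is removed only after the standard instance is ready, so requests are served throughout the switch.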

Reserved instances use the specification ecs.t5-lc1m2.small (1c2g) by default. However, some applications, such as those running on the JVM, allocate their memory up front when they start. If an application requires 4 GB of memory, you may need to use ecs.t5-c1m2.large (2c4g) as the reserved instance specification. We therefore also let users specify the reserved instance specification through an annotation when submitting a Knative Service, for example with the value ecs.t5-lc2m1.nano. This configuration means that ecs.t5-lc2m1.nano is used as the reserved instance type.

Two things have been added to the version currently being deployed:

The currently deployed revision is given the name coffee-v1 (if no name is set, one is generated automatically)

The word v1 is set in the environment variable, so that the content returned over HTTP shows that the current service is the v1 version
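
The two additions can be sketched as follows. The original manifest is not reproduced in this article, so this is a reconstruction under assumptions: the Service name coffee is inferred from the revision name, the environment variable name TARGET follows the common Knative hello-world samples, and the image URL is a placeholder.

```python
import json


def coffee_service(version, image):
    """Build a Knative Service whose revision is named coffee-<version>."""
    revision_name = f"coffee-{version}"
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": "coffee"},
        "spec": {
            "template": {
                # Addition 1: an explicit revision name instead of a generated one.
                "metadata": {"name": revision_name},
                "spec": {
                    "containers": [
                        {
                            "image": image,  # placeholder image
                            # Addition 2: the version is exposed through an
                            # environment variable (the name TARGET is assumed).
                            "env": [{"name": "TARGET", "value": revision_name}],
                        }
                    ]
                },
            }
        },
    }


coffee_v1 = coffee_service("v1", "registry.example.com/demo/helloworld:latest")
print(json.dumps(coffee_v1, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON manifests as well as YAML. Calling coffee_service("v2", ...) would produce the manifest for a next revision in the same way.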

Execute the kubectl apply -f coffee-v1.yaml command to deploy the v1 version. After deployment, continue to use curl -H "Host:" to verify.

After a few seconds, the returned content becomes Hello coffee-v1! During this process the service is not interrupted and no manual switching is required. After making a change, you simply submit it, and the switch from old-version instances to new-version instances completes automatically.

Now let's look at the status of the Pod instances again. The Pod instances have been switched: the Pods of the old version have been replaced automatically by Pods of the new version.

For demos of more complex features, see here.


Knative is the most popular serverless orchestration framework in the Kubernetes ecosystem. Community Knative requires a resident Controller and a resident gateway to provide services. Besides their IaaS costs, these resident instances bring a considerable O&M burden, which makes it harder to run applications in a serverless way. We have therefore fully hosted Knative Serving in ASK. It works out of the box, and you pay nothing for these resident instances. In addition to providing the Gateway capability through the SLB cloud product, we also provide reserved specifications based on burstable performance instances, which can greatly reduce the IaaS expenses of your service during traffic troughs. The CPU credits accumulated during troughs can be spent during traffic peaks, so not a penny you pay is wasted.
