By Wang Siyu (Jiuzhu)
OpenKruise is an open-source automated management engine for large-scale applications developed by Alibaba Cloud. In terms of functions, it is similar to Kubernetes-native controllers, such as Deployment and StatefulSet. However, OpenKruise provides many additional features, including graceful in-place upgrades, release priority/dispersion policies, multi-zone workload abstraction management, and unified container injection management of Sidecar. These features are all core capabilities that have been tested by the ultra-large-scale application scenarios of Alibaba. They help Alibaba Cloud address more diverse deployment environments and requirements and provide cluster maintainers and application developers with more flexible deployment and release policies.
Currently, OpenKruise is used for pod deployment and release management for all applications in Alibaba's internal cloud-native environment. Many companies in the industry and users of Alibaba Cloud also use OpenKruise to deploy applications because Kubernetes-native workload controllers, such as Deployment, cannot fully meet the requirements. Alibaba Cloud hopes OpenKruise can enable every Kubernetes developer and Alibaba Cloud user to use the same deployment and release capabilities the Alibaba cloud-native applications use!
OpenKruise v0.7.0 was released on November 16, 2020. It adds several major features along with optimizations and iterations. The following sections provide an overview of this version.
Based on the native StatefulSet, Advanced StatefulSet provides enhanced release capabilities, such as maxUnavailable for parallel release and in-place upgrade.
Official Documentation: https://openkruise.io/en-us/docs/advanced_statefulset.html
In the past, custom workloads provided by OpenKruise were in v1alpha1. As workloads are widely used within Alibaba and by many community members, stable capabilities will be gradually upgraded to later versions. Advanced StatefulSet is the first CRD in v1beta1. Other resources, such as SidecarSet, will be upgraded gradually.
If users have already been using the v1alpha1 Advanced StatefulSet, will upgrading to v1beta1 cause any problems? The answer is clear: no. Existing Advanced StatefulSet objects are automatically converted to v1beta1. Moreover, users can continue to use the v1alpha1 interface and client to operate on objects in this version.
Let's look at the CRD definition in the new-version
kruise-controller-manager node is mounted to the
kruise-webhook-service. The same service is also configured in the
Now, let's look at the conversion procedure shown in the figure above:
When using the v1beta1 interface to perform operations on Advanced StatefulSet, conversion is not required, so the apiserver can interact directly with etcd.
When using the v1alpha1 interface to perform operations on Advanced StatefulSet:
For details of the multi-version conversion logic, please see: https://github.com/openkruise/kruise/blob/master/apis/apps/v1alpha1/statefulset_conversion.go
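As a rough illustration of what a conversion webhook does when the apiserver asks it to convert an object between versions, the sketch below handles a Kubernetes ConversionReview request in Python. It is not OpenKruise's conversion code, which is written in Go and linked above. The helper names convert_object and handle_conversion_review are hypothetical, and the sketch assumes the two versions differ only in their apiVersion string.

```python
import copy


def convert_object(obj, desired_api_version):
    """Return a copy of obj labelled with the desired apiVersion.

    Assumption: both versions share the same schema for the fields used
    here, so only apiVersion changes. A real conversion function may also
    need to map or default individual fields.
    """
    converted = copy.deepcopy(obj)
    converted["apiVersion"] = desired_api_version
    return converted


def handle_conversion_review(review):
    """Build the ConversionReview response for a ConversionReview request."""
    request = review["request"]
    converted = [
        convert_object(obj, request["desiredAPIVersion"])
        for obj in request["objects"]
    ]
    return {
        "apiVersion": review["apiVersion"],
        "kind": "ConversionReview",
        "response": {
            "uid": request["uid"],  # the response must echo the request uid
            "convertedObjects": converted,
            "result": {"status": "Success"},
        },
    }


# Example request: a v1alpha1 object that should be stored as v1beta1.
example_review = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "ConversionReview",
    "request": {
        "uid": "example-uid",  # hypothetical value
        "desiredAPIVersion": "apps.kruise.io/v1beta1",
        "objects": [
            {
                "apiVersion": "apps.kruise.io/v1alpha1",
                "kind": "StatefulSet",
                "spec": {"replicas": 4},
            }
        ],
    },
}

example_response = handle_conversion_review(example_review)
```

The original object is deep-copied, so the request payload is left untouched and the spec is carried over unchanged.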
Generally, both the community-native StatefulSet and Advanced StatefulSet scale out pods and PVCs in ordinal sequence. For example, for a StatefulSet with 4 replicas, the ordinals of the created pods are [0, 1, 2, 3].
However, in some cases, users need to delete the pod with a specific ordinal and want StatefulSet to stop using that ordinal. This is especially true in scenarios where Local PVs are used. When a node is abnormal, deleting the original pod alone does not help: the new pod with the same ordinal reuses the original PVC/PV, so it is scheduled back to the original node.
Starting from Advanced StatefulSet v1beta1 (corresponding to OpenKruise v0.7.0 and later versions), an ordinal reservation function is provided:
apiVersion: apps.kruise.io/v1beta1
kind: StatefulSet
spec:
  # ...
  replicas: 4
  reserveOrdinals:
  - 1
When ordinals are listed in the reserveOrdinals field, Advanced StatefulSet does not create pods with these ordinals. If such pods already exist, they are deleted. Note: spec.replicas is the expected number of running pods, and spec.reserveOrdinals contains the ordinals of pods that will not be created.
Therefore, for an Advanced StatefulSet with 4 replicas and [1] in reserveOrdinals, the ordinals of running pods are [0, 2, 3, 4].
Suppose 3 is then appended to reserveOrdinals, making it [1, 3]. The controller deletes Pod-3 and creates Pod-5, and the ordinals of running pods become [0, 2, 4, 5].
Alternatively, suppose 3 is appended to reserveOrdinals and the replica number is reduced to 3 at the same time. The controller deletes Pod-3, and the ordinals of running pods become [0, 2, 4].
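The ordinal lists in the examples above follow from one simple rule: walk upward from 0, skip every reserved ordinal, and stop once the number of collected ordinals equals spec.replicas. The following Python sketch only illustrates that rule and is not the controller's Go implementation; the function name running_ordinals is hypothetical.

```python
def running_ordinals(replicas, reserve_ordinals):
    """Return the ordinals of the pods expected to run.

    replicas         -- expected number of running pods (spec.replicas)
    reserve_ordinals -- ordinals that must not be used (spec.reserveOrdinals)
    """
    reserved = set(reserve_ordinals)
    ordinals = []
    candidate = 0
    # Keep taking the next free ordinal until enough pods are accounted for.
    while len(ordinals) < replicas:
        if candidate not in reserved:
            ordinals.append(candidate)
        candidate += 1
    return ordinals


print(running_ordinals(4, [1]))     # [0, 2, 3, 4]
print(running_ordinals(4, [1, 3]))  # [0, 2, 4, 5]
print(running_ordinals(3, [1, 3]))  # [0, 2, 4]
```

Comparing the result before and after a change to reserveOrdinals or replicas shows which pods are deleted and which are created.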
The CloneSet controller provides the capability to manage stateless applications efficiently. It is similar to the native Deployment, but it offers many enhanced functions.
Official Documentation: https://openkruise.io/en-us/docs/cloneset.html
In CloneSet, users can use the partition field to control the scope of a gray release. In previous versions, this field could only be set to an absolute value. Starting from v0.7.0, it can also be set to a percentage. Semantically, partition is the number or percentage of pods kept at the old version, and it defaults to 0.
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
spec:
  # ...
  updateStrategy:
    partition: 80% # Only 20% of pods are upgraded to the new version.
    # The partition can also be set to the absolute number of pods reserved in the old version.
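To make the partition semantics concrete, the sketch below computes how many pods move to the new version for a given replica count. It is only an illustration in Python: the function name pods_to_update is hypothetical, and rounding a percentage up to a whole number of old-version pods is an assumption made here, not something the text above specifies.

```python
def pods_to_update(replicas, partition=0):
    """Return how many pods are upgraded to the new version.

    partition is either an absolute number of pods kept at the old
    version (for example 3) or a percentage string (for example "80%").
    """
    if isinstance(partition, str) and partition.endswith("%"):
        percent = int(partition[:-1])
        # Assumption: fractional results are rounded up, so at least the
        # requested share of pods stays at the old version.
        kept_old = (replicas * percent + 99) // 100
    else:
        kept_old = int(partition)
    # The number of old-version pods can never leave the range [0, replicas].
    kept_old = min(max(kept_old, 0), replicas)
    return replicas - kept_old


print(pods_to_update(10, "80%"))  # 2
print(pods_to_update(10, 3))      # 7
print(pods_to_update(10))         # 10
```

With the default partition of 0, no pods are kept at the old version, so every pod is upgraded.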
There are two cases for the setting of the partition value during the release process:
Several bugs in edge scenarios have been fixed, thanks to feedback and contributions from community members:
gracePeriodSeconds mode is used for continuous upgrades is solved.
AdvancedCronJob is a new controller added in v0.7.0. It is an extended version of CronJob, contributed by Rishi Anand from Spectro Cloud!
The native CronJob only allows users to create a Job to execute tasks. AdvancedCronJob allows users to choose between different types of templates. This means users can configure a schedule rule that periodically creates either a Job or a BroadcastJob to execute the task. A BroadcastJob can distribute the Job to all nodes or to specific nodes.
apiVersion: apps.kruise.io/v1alpha1
kind: AdvancedCronJob
spec:
  template:
    # Option 1: use jobTemplate, which is equivalent to original CronJob
    jobTemplate:
      # ...
    # Option 2: use broadcastJobTemplate, which will create a BroadcastJob object when cron schedule triggers
    broadcastJobTemplate:
      # ...
    # Options 3(future): ...
jobTemplate: equivalent to the native CronJob; it creates a Job for task execution.
broadcastJobTemplate: creates a BroadcastJob periodically to execute tasks.
The kruise-controller-manager of OpenKruise contains multiple controllers and webhooks. A webhook needs a complete set of TLS certificates, which the HTTPS service on the webhook server uses when it starts. In addition, the CA certificate needs to be written to the caBundle of the CRD conversion.
How can we generate certificates automatically and configure them to the preceding configuration resources? How can we rewrite the configurations after they are reset? These are the O&M challenges that webhook encounters.
This version of OpenKruise implements a webhook controller that supports self-maintenance for TLS certificates and related configuration resources of OpenKruise. The process is listed below:
The webhook controller writes the CA certificate into the MutatingWebhookConfiguration, ValidatingWebhookConfiguration, and CRD conversion, and continuously lists and watches these resources. The CA certificate is rewritten whenever any of them changes.
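The "rewrite on change" behaviour described above amounts to a small reconcile step: compare the CA bundle stored in each resource with the expected CA certificate and write it back when they differ. The Python sketch below illustrates the idea with in-memory dictionaries. It is not the OpenKruise implementation: reconcile_ca_bundle is a hypothetical name, and the flat caBundle key is a simplification, because real objects keep caBundle inside nested clientConfig fields.

```python
def reconcile_ca_bundle(resources, ca_cert):
    """Write ca_cert into every resource whose caBundle has drifted.

    resources -- mapping of resource name to a dict with a "caBundle" key
                 (a simplified stand-in for the real Kubernetes objects)
    Returns the names of the resources that were rewritten.
    """
    rewritten = []
    for name, resource in resources.items():
        if resource.get("caBundle") != ca_cert:
            resource["caBundle"] = ca_cert
            rewritten.append(name)
    return rewritten


resources = {
    "MutatingWebhookConfiguration": {"caBundle": ""},
    "ValidatingWebhookConfiguration": {"caBundle": ""},
    "CRD conversion": {"caBundle": ""},
}
ca_cert = "example-ca-certificate"  # placeholder, not a real certificate

# Initial pass: every resource receives the CA certificate.
first_pass = reconcile_ca_bundle(resources, ca_cert)

# Simulate a reset of one resource, as observed through a watch event.
resources["CRD conversion"]["caBundle"] = ""
second_pass = reconcile_ca_bundle(resources, ca_cert)  # ["CRD conversion"]
```

Running the same step on every watch event keeps the operation idempotent: resources that already hold the right certificate are left untouched.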
For more information, please see:
In the future, Alibaba Cloud will move these functions into a public repository, so that users writing their own webhooks can easily reuse these self-maintenance capabilities.
OpenKruise will continue to optimize application automation in greater depth. The next version on the roadmap, v0.8.0, was released on March 4, 2021, and you can learn more about this release in this article. Alibaba Cloud will no longer limit itself to workload management capabilities and will invest in more fields, such as risk prevention and control and operator enhancement.
Alibaba Cloud welcomes every cloud-native enthusiast to participate in building OpenKruise. Unlike some other open-source projects, OpenKruise is not a copy of Alibaba's internal code. On the contrary, the OpenKruise GitHub repository is the upstream of Alibaba's internal code repository. Therefore, every line of code you contribute will run in all Kubernetes clusters within Alibaba and will jointly support Alibaba's world-leading cloud-native application scenarios!