OpenKruise v0.7.0: A Controller for Periodic Task Distribution

This article gives an overview of OpenKruise v0.7.0.

By Wang Siyu (Jiuzhu)

Preface

OpenKruise is an open-source automated management engine for large-scale applications developed by Alibaba Cloud. In terms of functions, it is similar to Kubernetes-native controllers, such as Deployment and StatefulSet. However, OpenKruise provides many additional features, including graceful in-place upgrades, release priority/dispersion policies, multi-zone workload abstraction management, and unified container injection management of Sidecar. These features are all core capabilities that have been tested by the ultra-large-scale application scenarios of Alibaba. They help Alibaba Cloud address more diverse deployment environments and requirements and provide cluster maintainers and application developers with more flexible deployment and release policies.

Currently, OpenKruise is used for pod deployment and release management for all applications in Alibaba's internal cloud-native environment. Many companies in the industry and users of Alibaba Cloud also use OpenKruise to deploy applications because Kubernetes-native workload controllers, such as Deployment, cannot fully meet the requirements. Alibaba Cloud hopes OpenKruise can enable every Kubernetes developer and Alibaba Cloud user to use the same deployment and release capabilities the Alibaba cloud-native applications use!

Please see: OpenKruise: The Cloud-Native Platform for the Comprehensive Process of Alibaba's Double 11

Overview of OpenKruise v0.7.0

OpenKruise v0.7.0 was released on November 16, 2020. It adds several major features along with optimizations to existing ones. The following sections provide an overview of this version.

1. Advanced StatefulSet

Based on the native StatefulSet, Advanced StatefulSet provides enhanced release capabilities, such as maxUnavailable for parallel release and in-place upgrade.

Official Documentation: https://openkruise.io/en-us/docs/advanced_statefulset.html

1) First v1beta1 CRD in OpenKruise

In the past, all custom workloads provided by OpenKruise were in v1alpha1. As these workloads are now widely used within Alibaba and by many community members, stable capabilities will be gradually promoted to later versions. Advanced StatefulSet is the first CRD in v1beta1. Other resources, such as CloneSet and SidecarSet, will be promoted gradually.

If users have used the Advanced StatefulSet of v1alpha1 in the past, are there any problems when upgrading it to v1beta1? There is a clear answer: no. The existing Advanced StatefulSet objects are automatically converted to v1beta1. Moreover, users can continue to use the v1alpha1 interface and client to perform operations on objects in this version.

2

Let's look at the CRD definition in the new-version StatefulSet:

  • The conversion field is specified. Conversion is served through the specified kruise-webhook-service, which is backed by kruise-controller-manager. The same service is also configured in the MutatingWebhookConfiguration and ValidatingWebhookConfiguration of OpenKruise.
  • The versions list contains two versions: v1alpha1 and v1beta1. The storage parameter of v1beta1 is set to "true," representing that the version is stored in etcd.

Now, let's look at the conversion procedure shown in the figure above:

  • When using the v1beta1 interface directly to perform operations on Advanced StatefulSet, conversion is not required. So, apiserver can interact directly with etcd.
  • When using the v1alpha1 interface to perform operations on Advanced StatefulSet:

    • Write Operation: apiserver calls the webhook to convert the v1alpha1 object written by the user into a v1beta1 object and then writes that object into etcd.
    • Read Operation: apiserver calls the webhook to convert the v1beta1 object read from etcd into a v1alpha1 object and then returns that object to the user.
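
The read and write paths above can be sketched with a small in-memory model. This is only an illustration, not OpenKruise or apiserver code: the etcd dictionary and the convert, write, and read functions are hypothetical stand-ins, and the conversion here only rewrites apiVersion, whereas a real conversion webhook may also translate fields.

```python
import copy

STORAGE_VERSION = "apps.kruise.io/v1beta1"  # the version persisted in etcd
LEGACY_VERSION = "apps.kruise.io/v1alpha1"

etcd = {}  # hypothetical stand-in for etcd: object name -> stored object


def convert(obj, target_version):
    """Stand-in for the conversion webhook; it only rewrites apiVersion."""
    converted = copy.deepcopy(obj)
    converted["apiVersion"] = target_version
    return converted


def write(obj):
    """Write path: convert to the storage version only when needed."""
    if obj["apiVersion"] != STORAGE_VERSION:
        obj = convert(obj, STORAGE_VERSION)
    etcd[obj["metadata"]["name"]] = copy.deepcopy(obj)


def read(name, requested_version):
    """Read path: convert from the storage version only when needed."""
    stored = etcd[name]
    if requested_version == STORAGE_VERSION:
        return copy.deepcopy(stored)
    return convert(stored, requested_version)


# A client that still uses v1alpha1 writes an object and reads it back.
write({"apiVersion": LEGACY_VERSION, "kind": "StatefulSet",
       "metadata": {"name": "demo"}, "spec": {"replicas": 4}})
```

In this model, a v1alpha1 client never notices that the object is stored as v1beta1, which is why existing clients can keep working after the upgrade.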

For details of the multi-version conversion logic, please see: https://github.com/openkruise/kruise/blob/master/apis/apps/v1alpha1/statefulset_conversion.go

2) Ordinal Reservation

Generally, pods and PVCs are created in ordinal sequence for both the community-native StatefulSet and Advanced StatefulSet. For example, for a StatefulSet with 4 replicas, the ordinals of the created pods are [0, 1, 2, 3].

However, in some cases, users need to delete the pod with a specific ordinal and want the StatefulSet to stop using that ordinal. This is especially true in scenarios where Local PVs are used. If a node becomes abnormal and the original pod is simply deleted, the new pod with the same ordinal reuses the original PVC/PV and is therefore scheduled back to the original node.

Starting from Advanced StatefulSet v1beta1 (corresponding to OpenKruise v0.7.0 and later versions), the ordinal reservation function is provided:

apiVersion: apps.kruise.io/v1beta1
kind: StatefulSet
spec:
  # ...
  replicas: 4
  reserveOrdinals:
  - 1

When reserved ordinals are written to the reserveOrdinals field, Advanced StatefulSet does not create pods with these ordinals. If such pods already exist, they are deleted. Note: spec.replicas is the expected number of running pods, and spec.reserveOrdinals contains the ordinals of pods that will not be created.

Therefore, for an Advanced StatefulSet with 4 replicas and [1] in reserveOrdinals, the ordinals of running pods are [0, 2, 3, 4].

  • If Pod-3 needs to be migrated and its ordinal reserved, add 3 to reserveOrdinals. The controller then deletes Pod-3 and creates Pod-5, and the ordinals of running pods become [0, 2, 4, 5].
  • If users only want to delete Pod-3, add 3 to reserveOrdinals and reduce the replica number to 3. The controller then deletes Pod-3, and the ordinals of running pods become [0, 2, 4].
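
The ordinal arithmetic in these examples can be reproduced with a short sketch. The function name running_ordinals is hypothetical and this is not the controller's code; it only assumes, as the examples above suggest, that the controller picks the lowest non-reserved ordinals until the replica count is met.

```python
def running_ordinals(replicas, reserve_ordinals=()):
    """Return the lowest `replicas` ordinals that are not reserved."""
    reserved = set(reserve_ordinals)
    ordinals = []
    candidate = 0
    while len(ordinals) < replicas:
        if candidate not in reserved:
            ordinals.append(candidate)
        candidate += 1
    return ordinals


print(running_ordinals(4, [1]))     # 4 replicas, ordinal 1 reserved
print(running_ordinals(4, [1, 3]))  # migrate Pod-3: reserve 3, keep 4 replicas
print(running_ordinals(3, [1, 3]))  # delete Pod-3: reserve 3, scale down to 3
```

The three calls correspond to the three cases described above: the initial state, migrating Pod-3, and deleting Pod-3.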

2. CloneSet

The CloneSet controller provides the capability to manage stateless applications efficiently. It is similar to native Deployment, but it offers many enhanced functions.

Official Documentation: https://openkruise.io/en-us/docs/cloneset.html

1) Percentage Supported in Partition Field

In CloneSet, users can use the partition field to control the scope of a gray release. In previous versions, this field could only be set to an absolute number. Starting from v0.7.0, it can also be set to a percentage. The field specifies the number or percentage of pods to keep on the old version, and it defaults to 0.

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
spec:
  # ...
  updateStrategy:
    partition: 80%  # Only 20% of pods are upgraded to the new version. The partition can also be an absolute number of pods to keep on the old version.

There are two cases for the setting of the partition value during the release process:

  • If it is a number, the controller upgrades (replicas - partition) pods to the latest version.
  • If it is a percentage, the controller upgrades (replicas * (100% - partition)) pods to the latest version.
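
As a worked example of the two cases, the following sketch computes how many pods would be upgraded. The function pods_to_update is hypothetical, and the rounding rule for percentages is an assumption made for illustration (the number of reserved pods is rounded up); the source does not state how the controller rounds.

```python
def pods_to_update(replicas, partition=0):
    """Return how many pods are upgraded for a given partition setting."""
    if isinstance(partition, str):
        if not partition.endswith("%"):
            raise ValueError("a string partition must be a percentage such as '80%'")
        percent = int(partition[:-1])
        # Assumption: round the number of reserved (old-version) pods up.
        reserved = (replicas * percent + 99) // 100
    else:
        reserved = partition
    # Keep the reserved count within [0, replicas].
    reserved = max(0, min(replicas, reserved))
    return replicas - reserved


print(pods_to_update(10, "80%"))  # percentage form
print(pods_to_update(10, 3))      # absolute form
print(pods_to_update(10))         # default partition of 0
```

With 10 replicas, a partition of 80% leaves 8 pods on the old version and upgrades 2, which matches the comment in the YAML example above.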

2) Other Optimizations

Thanks to the feedback and contributions of community members, several bugs in edge scenarios have been fixed:

  • The owner reference is automatically removed from the pods that do not meet the matching conditions of the selector.
  • The occasional race condition of resourceVersionExpectation is resolved.
  • The version coverage issue when the gracePeriodSeconds mode is used for continuous upgrades is solved.

3. AdvancedCronJob (New Controller)

AdvancedCronJob is a new controller added in v0.7.0. It is an extended version of CronJob. It was contributed by Rishi Anand from Spectro Cloud!

The native CronJob only allows users to create a Job to execute tasks. AdvancedCronJob allows users to create different types of templates. This means users can configure the schedule rule to create a Job or BroadcastJob periodically to execute the task. BroadcastJob can distribute the Job to all or specific nodes to execute the task.

apiVersion: apps.kruise.io/v1alpha1
kind: AdvancedCronJob
spec:
  template:

    # Option 1: use jobTemplate, which is equivalent to original CronJob
    jobTemplate:
      # ...

    # Option 2: use broadcastJobTemplate, which will create a BroadcastJob object when cron schedule triggers
    broadcastJobTemplate:
      # ...

    # Option 3 (future): ...

  • jobTemplate: It is similar to native CronJob, and it creates a Job for task execution.
  • broadcastJobTemplate: It creates BroadcastJob periodically to execute tasks.
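
The choice between the two templates can be pictured as a small dispatch step. The sketch below is illustrative only: kind_to_create is a hypothetical helper, and treating "neither" or "both" templates as an error is an assumption, not behavior confirmed by the source.

```python
TEMPLATE_KINDS = {
    "jobTemplate": "Job",
    "broadcastJobTemplate": "BroadcastJob",
}


def kind_to_create(template):
    """Return the kind of object to create when the schedule triggers."""
    configured = [key for key in TEMPLATE_KINDS if template.get(key) is not None]
    # Assumption: exactly one template option must be set.
    if len(configured) != 1:
        raise ValueError("exactly one of %s must be set" % sorted(TEMPLATE_KINDS))
    return TEMPLATE_KINDS[configured[0]]


print(kind_to_create({"jobTemplate": {}}))
print(kind_to_create({"broadcastJobTemplate": {}}))
```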

4. Webhook Controller for Self-Maintenance

The kruise-controller-manager of OpenKruise contains multiple controllers and webhooks.

A webhook requires a complete set of TLS certificates, which the webhook server uses to serve HTTPS. In addition, the CA certificate must be written to the caBundle of the MutatingWebhookConfiguration, the ValidatingWebhookConfiguration, and the CRD conversion configuration.

How can we generate certificates automatically and configure them to the preceding configuration resources? How can we rewrite the configurations after they are reset? These are the O&M challenges that webhook encounters.

This version of OpenKruise implements a webhook controller that supports self-maintenance for TLS certificates and related configuration resources of OpenKruise. The process is listed below:

  1. The webhook controller generates certificates automatically and stores them in a Secret.
  2. It writes the certificates locally for the HTTPS service to use.
  3. It writes the CA certificate into the MutatingWebhookConfiguration, the ValidatingWebhookConfiguration, and the CRD conversion configuration, and it continuously lists and watches these resources. The CA certificate is rewritten whenever any of them changes.
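
The three steps can be sketched as an idempotent reconcile pass over an in-memory model. Everything here is a hypothetical stand-in: the cluster and local_files dictionaries, the generate_certs placeholder (which returns opaque tokens rather than real certificates), and the reconcile function are not taken from the OpenKruise code linked below.

```python
import secrets

WATCHED_CONFIGS = (
    "MutatingWebhookConfiguration",
    "ValidatingWebhookConfiguration",
    "CRD conversion",
)


def generate_certs():
    """Placeholder for real certificate generation; returns opaque tokens."""
    return {
        "ca.crt": "ca-" + secrets.token_hex(4),
        "tls.crt": "crt-" + secrets.token_hex(4),
        "tls.key": "key-" + secrets.token_hex(4),
    }


def reconcile(cluster, local_files):
    """Run one pass of the three steps; safe to call on every watch event."""
    # Step 1: generate certificates once and keep them in the Secret.
    if "secret" not in cluster:
        cluster["secret"] = generate_certs()
    certs = cluster["secret"]

    # Step 2: make the serving certificate available to the local HTTPS server.
    local_files["tls.crt"] = certs["tls.crt"]
    local_files["tls.key"] = certs["tls.key"]

    # Step 3: (re)write the CA certificate wherever it is missing or changed.
    for name in WATCHED_CONFIGS:
        config = cluster.setdefault(name, {})
        if config.get("caBundle") != certs["ca.crt"]:
            config["caBundle"] = certs["ca.crt"]


cluster, local_files = {}, {}
reconcile(cluster, local_files)
cluster["MutatingWebhookConfiguration"]["caBundle"] = ""  # simulate a reset
reconcile(cluster, local_files)  # the next pass restores the CA certificate
```

Because each pass only writes what is missing or different, running it after every observed change is enough to keep the configurations in sync without regenerating the certificates.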

For more information, please see:

https://github.com/openkruise/kruise/blob/master/pkg/webhook/util/controller/webhook_controller.go

In the future, Alibaba Cloud plans to move these functions into a public repository so that users writing their own webhooks can easily reuse these self-maintenance capabilities.

Summary

OpenKruise will continue to make deeper optimizations in application automation. The next version of OpenKruise, v0.8.0, was released on March 4, 2021, and you can learn more about it in the corresponding release article. Going forward, Alibaba Cloud will no longer be limited to workload management capabilities and will invest in more fields, such as risk prevention and control and operator enhancement.

Alibaba Cloud welcomes every cloud-native enthusiast to participate in the construction of OpenKruise. Unlike some other open-source projects, OpenKruise is not a copy of Alibaba's internal code. On the contrary, the OpenKruise GitHub repository is the upstream of Alibaba's internal code repository. Therefore, every line of code you contribute will run in all Kubernetes clusters within Alibaba and will jointly support Alibaba's world-leading cloud-native application scenarios!
