Deep Dive into OpenYurt: Edge Autonomy Design

By Xinsheng, Alibaba Cloud technical expert

In the two weeks since OpenYurt was made open source, its non-intrusive architecture design for integrating cloud native and edge computing has attracted attention from a lot of developers. Alibaba Cloud launched the open-source OpenYurt project to share its experience in the cloud-native edge computing field with the open-source community, accelerate the extension of cloud computing to the edge, and work with the community to define unified standards for the future cloud-native edge computing architecture.

To help the community and users better understand OpenYurt, we have published a series of Deep Dive into OpenYurt articles. This article is the second in this series and describes the OpenYurt edge autonomy design. In case you have not done so, do check out the previous article, "OpenYurt Out-of-the-Box Evaluation: Instantly Empower Native Kubernetes Clusters with Edge Computing Capabilities."

Introduction to OpenYurt

OpenYurt was officially released on May 29, 2020. As the core framework of the ACK@Edge service, OpenYurt is used for CDN, audio and video live streaming, IoT, logistics, industrial brain, city brain, and other scenarios, serving Alibaba Cloud LinkEdge, Freshippo (Hema Fresh), Youku, ApsaraVideo, and other businesses and projects. Its open-source capabilities include:

Edge autonomy
One-click conversion of native Kubernetes clusters to edge clusters

You can visit the GitHub page for OpenYurt to participate in the open-source project.

Edge Autonomy

1. Feature Introduction

When Kubernetes is extended to edge computing scenarios, edge nodes are connected to the cloud through public networks. Because of public network instability, cost, and other factors, edge workloads must keep running when the network is unstable or disconnected. Edge autonomy is one of the primary edge computing demands mentioned in Gartner's edge computing report.

In the Kubernetes system architecture, the node agent (kubelet) keeps container information in memory. When the network is disconnected, business data cannot be obtained from the cloud, so if a node or the kubelet restarts, service containers cannot be restored, as shown in the following figure.

2. Issues to Solve for Edge Autonomy

To achieve edge autonomy, the following issues need to be solved in Kubernetes:

Issue 1: When a node fails or restarts, in-memory data is lost. If the network is also disconnected, service containers cannot be restored.
Issue 2: When the network is disconnected for a long time, the cloud controller removes service containers.
Issue 3: When the network recovers after a long disconnection, edge and cloud data may be inconsistent.

Solutions to Issue 1

Solution 1: Kubelet Reconstruction

Reconstruct the kubelet component, or reuse or adapt the kubelet checkpoint feature, to store the business data of containers on a local disk. When the network is disconnected, the locally cached data can be used to recover workloads.
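The idea of checkpointing container data to a local disk can be illustrated with a minimal sketch. It is not the kubelet's actual checkpoint format or code: the PodCheckpointStore class, the file naming scheme, and the pod dictionary layout are all assumptions made for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


class PodCheckpointStore:
    """Hypothetical on-disk store for pod specs (not the kubelet's real format)."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def save(self, pod):
        """Persist one pod spec atomically so a crash never leaves a partial file."""
        name = f"{pod['namespace']}_{pod['name']}.json"
        fd, tmp_path = tempfile.mkstemp(dir=str(self.directory), suffix=".tmp")
        with os.fdopen(fd, "w") as handle:
            json.dump(pod, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # Rename over the final name; readers see either the old or the new file.
        os.replace(tmp_path, self.directory / name)

    def load_all(self):
        """Return every checkpointed pod spec, e.g. after a restart while offline."""
        pods = []
        for path in sorted(self.directory.glob("*.json")):
            with open(path) as handle:
                pods.append(json.load(handle))
        return pods


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        PodCheckpointStore(workdir).save(
            {"namespace": "default", "name": "web", "image": "nginx"}
        )
        # A fresh instance stands in for the agent process after a restart.
        restored = PodCheckpointStore(workdir).load_all()
        print([pod["name"] for pod in restored])
```

Writing to a temporary file and renaming it is one common way to keep the checkpoint readable even if the node loses power mid-write, which matters when the cloud copy is unreachable.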

This solution reconstructs the kubelet to meet the requirements of edge autonomy and has the following advantages:

The reconstructed kubelet can integrate the management capabilities of peer devices.
The reconstructed kubelet is more lightweight. However, this comes at the cost of some native kubelet features.

However, this solution introduces the following disadvantages:

Poor maintainability: Those who are familiar with the kubelet know that it is difficult to keep intrusive modifications to the kubelet core in step with Kubernetes version upgrades. The kubelet component is responsible for computing, storage, and network interaction, so its code structure and logic are complex. The workload for continuously maintaining a heavily modified kubelet is huge.
Poor scalability: The edge autonomy capability is developed in the reconstructed kubelet component. If other components of edge nodes, such as network components, want to reuse edge autonomy, they have to reinvent the wheel.
Deeper scenario coupling: For example, IoT device management is added to the reconstructed kubelet, which results in deep coupling of the kubelet and IoT scenarios.

Solution 2 (Solution used by OpenYurt): YurtHub

We added a web cache and request proxy called YurtHub to edge nodes (it is called edge-hub in commercial products). The edge component (kubelet) communicates with the cloud through YurtHub, which acts as a transparent gateway with a data caching feature. When the connection to the cloud is lost and the node or kubelet restarts, the data of service containers is obtained from YurtHub's cache, which ensures edge autonomy.
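The caching-gateway behavior described above can be sketched as follows. This is a toy model, not YurtHub's implementation: CachingHub, CloudUnreachable, the upstream callable, and the one-file-per-path cache layout are hypothetical names and choices introduced only to show the fallback logic.

```python
import json
import tempfile
from pathlib import Path


class CloudUnreachable(Exception):
    """Raised by the upstream callable when the cloud cannot be reached."""


class CachingHub:
    """Toy stand-in for a YurtHub-style caching proxy (hypothetical)."""

    def __init__(self, upstream, cache_dir):
        self.upstream = upstream  # callable: request path -> JSON-serializable response
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _cache_file(self, path):
        # Flatten the request path into a single file name (good enough for a toy).
        return self.cache_dir / (path.strip("/").replace("/", "_") + ".json")

    def get(self, path):
        """Forward to the cloud when possible; otherwise answer from the disk cache."""
        cache_file = self._cache_file(path)
        try:
            response = self.upstream(path)
        except CloudUnreachable:
            if not cache_file.exists():
                raise  # nothing was cached before the disconnection
            return json.loads(cache_file.read_text())
        # Cloud answered: refresh the cache so a later outage can be served locally.
        cache_file.write_text(json.dumps(response))
        return response


if __name__ == "__main__":
    cloud = {"online": True}

    def fake_cloud(path):
        if not cloud["online"]:
            raise CloudUnreachable(path)
        return {"items": ["web"]}

    with tempfile.TemporaryDirectory() as workdir:
        CachingHub(fake_cloud, workdir).get("/api/v1/pods")
        cloud["online"] = False
        # A new hub instance models a node restart while the network is down.
        print(CachingHub(fake_cloud, workdir).get("/api/v1/pods"))
```

Because the cache lives on disk rather than in memory, a restarted client that talks only to the hub can still recover its workload data during an outage, which is the property the text attributes to YurtHub.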

Compared with solution 1, solution 2 has the following advantages:

No kubelet modification: Native kubelet capabilities are retained, and there is no workload for maintaining the kubelet along with Kubernetes version upgrades.
High scalability: Other node components can easily reuse YurtHub.
Consistency with Kubernetes design concepts: YurtHub can easily be extended with more capabilities.

The OpenYurt solution also has a disadvantage: the native kubelet is not lightweight, which creates major challenges on resource-constrained nodes. In commercial products, the minimum node specification is 2U4G (2 CPU cores and 4 GB of memory).

Solutions to Issues 2 and 3

The solutions for issues 2 and 3 are simple, so I will not describe them in much detail.

Issue 2: The native cloud component kube-controller-manager removes pods.

This issue is solved by Node Controller in the open-source yurt-controller-manager component, as shown in the following figure.
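The article does not spell out how the Node Controller makes this decision, so the following is only a sketch of one plausible rule: a node that has missed heartbeats beyond a timeout would normally have its pods evicted, unless the node is marked as autonomous. The NodeStatus type, the autonomous flag, and the 300-second timeout are illustrative assumptions, not OpenYurt's actual data model.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    name: str
    seconds_since_heartbeat: float
    autonomous: bool  # hypothetical marker: edge autonomy is enabled for this node


def should_evict_pods(node, eviction_timeout=300.0):
    """Decide whether pods on a node should be evicted.

    Native-style rule: evict once heartbeats have been missing past the timeout.
    Edge-aware rule (assumed here): never evict from a node marked autonomous.
    """
    if node.seconds_since_heartbeat <= eviction_timeout:
        return False  # the node is still considered healthy
    return not node.autonomous


if __name__ == "__main__":
    edge = NodeStatus("edge-1", seconds_since_heartbeat=3600, autonomous=True)
    cloud = NodeStatus("cloud-1", seconds_since_heartbeat=3600, autonomous=False)
    print(should_evict_pods(edge), should_evict_pods(cloud))
```

Keeping the rule in the cloud-side controller means nothing on the edge node has to change for its pods to survive a long disconnection.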

Issue 3: Edge and cloud data may be inconsistent when the network recovers.

In Kubernetes, users manage the edge through the cloud, for example, by deploying, upgrading, and scaling applications. When the edge is disconnected from the cloud, edge nodes will not synchronize users' node control operations from the cloud. When a network disconnection occurs, YurtHub only needs to ensure that the locally cached data is consistent with the data that existed when the network was disconnected. This means the data cached on the edge is not updated during the network disconnection. When the network recovers, edge nodes can synchronize the latest data from the cloud.
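A minimal sketch of this freeze-then-resync behavior, assuming the cache is a dictionary from object name to an object that carries a resourceVersion string. The resync function and the change summary it returns are hypothetical helpers used only to make the steps concrete.

```python
def resync(edge_cache, cloud_state):
    """Return the refreshed cache and a summary of changes made while offline.

    Both arguments map object name -> dict with a 'resourceVersion' key.
    The edge cache is assumed to be frozen at the moment of disconnection.
    """
    edge_names, cloud_names = set(edge_cache), set(cloud_state)
    changes = {
        "added": sorted(cloud_names - edge_names),
        "updated": sorted(
            name
            for name in edge_names & cloud_names
            # Versions are compared for equality only, never ordered.
            if edge_cache[name]["resourceVersion"] != cloud_state[name]["resourceVersion"]
        ),
        "removed": sorted(edge_names - cloud_names),
    }
    # The cloud is the source of truth, so its state replaces the stale cache.
    return dict(cloud_state), changes


if __name__ == "__main__":
    frozen = {"web": {"resourceVersion": "10"}, "old": {"resourceVersion": "3"}}
    cloud = {"web": {"resourceVersion": "12"}, "api": {"resourceVersion": "11"}}
    new_cache, changes = resync(frozen, cloud)
    print(changes)
```

The sketch treats the cloud as authoritative on reconnection, which matches the statement that edge nodes synchronize the latest data from the cloud once the network recovers.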

What's Next

OpenYurt has only been available for a short time and has a long way to go. We believe that our cloud-native architecture design will help OpenYurt go further. In addition, OpenYurt's design philosophy of "extending your native Kubernetes to edge" is popular among cloud-native enthusiasts.

References

Gartner report on edge computing
OpenYurt project address: https://github.com/alibaba/openyurt. We invite you to participate in the open-source project.
