Analysis of OpenYurt's YurtHub Data Filtering Framework
YurtHub is essentially a proxy in front of the kube-apiserver, with a cache layer added on top of the proxy. The cache lets edge nodes keep their workloads running from local data when the node is offline, which effectively solves the edge autonomy problem; it also reduces the load that large numbers of list and watch operations would otherwise place on the cloud-side API.
For data filtering, requests from the pods on the node and from the kubelet are sent to the kube-apiserver through YurtHub's load balancer. YurtHub receives the response, performs the data filtering, and then returns the filtered data to the requester. If the node is an edge node, the resources in the response body are also cached locally according to the request type; on a cloud node, given the good network conditions, nothing is cached locally.
Schematic diagram of YurtHub's filtering framework implementation:
YurtHub currently ships with four filters; the user agent, resource, and verb of an addon's request determine which filter is applied to the corresponding response data.
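The dispatch can be pictured as a table lookup keyed by those three attributes. Below is a minimal, self-contained Go sketch of the idea; the filter names and registrations are hypothetical illustrations, not the actual OpenYurt registration code.

package main

import "fmt"

// filterRule describes which requests a filter applies to.
type filterRule struct {
	userAgent string
	resource  string
	verbs     map[string]bool
}

// filterRules maps a filter name to the requests it handles
// (hypothetical registrations, for illustration only).
var filterRules = map[string]filterRule{
	"servicetopology": {userAgent: "kube-proxy", resource: "endpointslices", verbs: map[string]bool{"list": true, "watch": true}},
	"endpoints":       {userAgent: "coredns", resource: "endpoints", verbs: map[string]bool{"list": true, "watch": true}},
}

// findFilter returns the name of the filter that should process the
// response for a request, or "" if no filter matches.
func findFilter(userAgent, resource, verb string) string {
	for name, rule := range filterRules {
		if rule.userAgent == userAgent && rule.resource == resource && rule.verbs[verb] {
			return name
		}
	}
	return ""
}

func main() {
	fmt.Println(findFilter("coredns", "endpoints", "list")) // prints "endpoints"
}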
Functions and implementation of the four filtering rules
ServiceTopologyFilter
Mainly used for filtering EndpointSlice resources. The EndpointSlice feature requires Kubernetes v1.18 or later; on versions below 1.18, the endpointsFilter filter is recommended instead. When a response passes through this filter, the Service that the EndpointSlice belongs to is first located via kubernetes.io/service-name. The filter then checks whether the Service carries the openyurt.io/topologyKeys annotation; if it does, the filtering rule is determined by the annotation's value. Finally, the response data is updated and returned to the addon.
The annotation values fall into two categories:
1. kubernetes.io/hostname: only the endpoint addresses on the same node are kept.
2. openyurt.io/nodepool or kubernetes.io/zone: the annotation is used to look up the node pool that the node belongs to. The filter then traverses the EndpointSlice and, using the kubernetes.io/hostname key in each endpoint's topology field, keeps only the endpoints whose nodes are in that node pool. The remaining endpoints are reassembled into the EndpointSlice, which is returned to the addon (a simplified sketch follows).
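For the node-pool case, the core of the logic is to drop every endpoint whose node is outside the pool. Here is a minimal Go sketch, assuming a hypothetical nodesInPool set of node names; the real OpenYurt implementation resolves that set from the node pool resource and does more bookkeeping.

package sketch

import (
	discoveryv1beta1 "k8s.io/api/discovery/v1beta1"
)

// filterEndpointSliceByNodePool returns a copy of the EndpointSlice that
// keeps only the endpoints whose kubernetes.io/hostname topology value is
// a node belonging to the node pool.
func filterEndpointSliceByNodePool(slice *discoveryv1beta1.EndpointSlice, nodesInPool map[string]struct{}) *discoveryv1beta1.EndpointSlice {
	filtered := slice.DeepCopy()
	kept := make([]discoveryv1beta1.Endpoint, 0, len(filtered.Endpoints))
	for _, ep := range filtered.Endpoints {
		host := ep.Topology["kubernetes.io/hostname"]
		if _, ok := nodesInPool[host]; ok {
			kept = append(kept, ep)
		}
	}
	filtered.Endpoints = kept
	return filtered
}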
EndpointsFilter
Filters Endpoints resources. The filter first checks whether the Endpoints object has a corresponding Service, obtains the node pool through the node's apps.openyurt.io/nodepool label, and then retrieves all nodes in that node pool. It traverses endpoints.Subsets, keeps the Ready and NotReady pod addresses that belong to nodes in the same node pool, reassembles them into new subsets, and returns the result to the addon.
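A minimal Go sketch of that reassembly, again assuming a hypothetical nodesInPool set of node names belonging to the same node pool:

package sketch

import (
	corev1 "k8s.io/api/core/v1"
)

// filterEndpointsByNodePool keeps, in every subset, only the ready and
// not-ready addresses whose NodeName is in the node pool, and drops
// subsets that end up empty.
func filterEndpointsByNodePool(eps *corev1.Endpoints, nodesInPool map[string]struct{}) *corev1.Endpoints {
	filtered := eps.DeepCopy()
	var subsets []corev1.EndpointSubset
	for _, ss := range filtered.Subsets {
		newSS := corev1.EndpointSubset{Ports: ss.Ports}
		for _, addr := range ss.Addresses {
			if addr.NodeName != nil {
				if _, ok := nodesInPool[*addr.NodeName]; ok {
					newSS.Addresses = append(newSS.Addresses, addr)
				}
			}
		}
		for _, addr := range ss.NotReadyAddresses {
			if addr.NodeName != nil {
				if _, ok := nodesInPool[*addr.NodeName]; ok {
					newSS.NotReadyAddresses = append(newSS.NotReadyAddresses, addr)
				}
			}
		}
		if len(newSS.Addresses) > 0 || len(newSS.NotReadyAddresses) > 0 {
			subsets = append(subsets, newSS)
		}
	}
	filtered.Subsets = subsets
	return filtered
}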
MasterServiceFilter
Handles the scenario of replacing the IP and port behind the in-cluster kubernetes Service domain name, mainly so that pods on edge nodes can seamlessly use InClusterConfig to access cluster resources.
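The effect can be illustrated with a small Go sketch; yurthubHost and yurthubPort stand for the address YurtHub listens on, and the choice of which port to rewrite is an assumption made for illustration, not the actual OpenYurt code.

package sketch

import (
	corev1 "k8s.io/api/core/v1"
)

// mutateMasterService rewrites the default/kubernetes Service so that
// clients built from InClusterConfig end up talking to YurtHub.
func mutateMasterService(svc *corev1.Service, yurthubHost string, yurthubPort int32) *corev1.Service {
	if svc.Namespace != "default" || svc.Name != "kubernetes" {
		return svc // leave every other Service untouched
	}
	mutated := svc.DeepCopy()
	mutated.Spec.ClusterIP = yurthubHost
	for i := range mutated.Spec.Ports {
		if mutated.Spec.Ports[i].Name == "https" {
			mutated.Spec.Ports[i].Port = yurthubPort
		}
	}
	return mutated
}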
DiscardCloudService
This filter targets two kinds of Services. The first is LoadBalancer Services: because edge nodes cannot access LoadBalancer resources, Services of this type are filtered out directly. The second is the x-tunnel-server-internal-svc Service in the kube-system namespace, which mainly exists for cloud nodes to access the yurt-tunnel-server; for edge nodes, this Service is filtered out as well.
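A minimal Go sketch of the rule, where returning nil stands for "drop this Service from the response"; the real filter operates on the serialized response, so this only illustrates the decision logic.

package sketch

import (
	corev1 "k8s.io/api/core/v1"
)

// discardCloudService returns nil for Services that an edge node should
// not see: LoadBalancer Services and the yurt-tunnel-server internal
// Service in kube-system.
func discardCloudService(svc *corev1.Service) *corev1.Service {
	if svc.Spec.Type == corev1.ServiceTypeLoadBalancer {
		return nil // edge nodes cannot reach LoadBalancer addresses
	}
	if svc.Namespace == "kube-system" && svc.Name == "x-tunnel-server-internal-svc" {
		return nil // only meaningful for cloud nodes
	}
	return svc
}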
Current status of the filtering framework
The current filtering framework is relatively rigid: the resources to be filtered are hard-coded, so only registered resources can be filtered. To remove this limitation, the filtering framework needs to be made configurable.
Solution
Option 1:
Customize the filtering configuration through startup parameters or environment variables. This has the following drawbacks:
1. The configuration is complex: customized rules have to be written into startup parameters or read from environment variables, in a format such as:
--filter_serviceTopology=coredns/endpointslices#list;watch,kube-proxy/services#list;watch --filter_endpointsFilter=nginx-ingress-controller/endpoints#list;watch
2. No hot updating: every time the configuration is modified, YurtHub has to be restarted for the change to take effect.
Option 2:
1. Provide the filtering configuration through a ConfigMap to reduce configuration complexity. Each entry follows the format user-agent/resource#verb1;verb2, and multiple entries are separated by commas. For example:
filter_endpoints: coredns/endpoints#list;watch,test/endpoints#list;watch
filter_servicetopology: coredns/endpointslices#list;watch
filter_discardcloudservice: ""
filter_masterservice: ""
2. Use the Informer mechanism so that configuration changes take effect in real time, without restarting YurtHub.
Based on the two points above, we chose Option 2 in OpenYurt; a sketch of how such configuration entries can be parsed is given below.
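A minimal Go sketch of parsing one ConfigMap value in the user-agent/resource#verb1;verb2 format above; the type and function names are illustrative, not the actual OpenYurt code.

package sketch

import (
	"fmt"
	"strings"
)

// filterTarget is one parsed entry such as "coredns/endpoints#list;watch".
type filterTarget struct {
	userAgent string
	resource  string
	verbs     []string
}

// parseFilterConfig splits a configmap value like
// "coredns/endpoints#list;watch,test/endpoints#list;watch" into entries.
func parseFilterConfig(value string) ([]filterTarget, error) {
	var targets []filterTarget
	for _, entry := range strings.Split(value, ",") {
		parts := strings.Split(entry, "#")
		if len(parts) != 2 {
			return nil, fmt.Errorf("invalid entry %q", entry)
		}
		agentResource := strings.Split(parts[0], "/")
		if len(agentResource) != 2 {
			return nil, fmt.Errorf("invalid entry %q", entry)
		}
		targets = append(targets, filterTarget{
			userAgent: agentResource[0],
			resource:  agentResource[1],
			verbs:     strings.Split(parts[1], ";"),
		})
	}
	return targets, nil
}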
Problems encountered during the development process
On the edge, the API address that the Informer watches is YurtHub's own proxy address, so YurtHub cannot guarantee that the configmap data has been synced before the proxy port starts serving. If an addon's request arrives after startup but before the configmap data is in place, the response is returned to the addon without filtering, which can lead to many unexpected problems.
To solve this, we need to call WaitForCacheSync to make sure the data synchronization has completed before any filtered data is returned. However, waiting on WaitForCacheSync would also block YurtHub's own watch of the configmap, so a whitelist mechanism has to be added in front of it: when YurtHub itself lists and watches the configmap, no data filtering is performed and the request is not blocked. The corresponding code logic is sketched below.
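This sketch assumes that YurtHub's own configmap list/watch can be recognized by its user agent and request path; those details, and the names used, are assumptions for illustration rather than the actual OpenYurt code.

package sketch

import (
	"net/http"
	"strings"

	"k8s.io/client-go/tools/cache"
)

// isWhitelisted reports whether a request may skip filtering entirely.
// The "yurthub" user agent and the configmap path check are assumptions.
func isWhitelisted(req *http.Request) bool {
	return req.Header.Get("User-Agent") == "yurthub" &&
		strings.Contains(req.URL.Path, "/configmaps")
}

// shouldFilter decides whether a response must go through the filters.
// YurtHub's own configmap list/watch is never blocked or filtered, so the
// informer it feeds can sync; every other request waits for that sync.
func shouldFilter(req *http.Request, stopCh <-chan struct{}, configMapSynced cache.InformerSynced) bool {
	if isWhitelisted(req) {
		return false
	}
	return cache.WaitForCacheSync(stopCh, configMapSynced)
}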
Summary
1. From the extension capabilities described above, we can see that YurtHub is not just a reverse proxy with data caching on edge nodes. It adds a new layer of encapsulation around application lifecycle management on a Kubernetes node and provides the core management and control capabilities that edge computing requires.
2. YurtHub is not only suitable for edge computing scenarios; it can also serve as a standing component on the node side in any scenario that uses Kubernetes. I believe this will also drive YurtHub towards higher performance and stability.