
Alibaba Cloud Service Mesh:Overview of Mesh Optimization Center

Last Updated:May 13, 2024

This topic describes the role of Mesh Optimization Center and its features.

Features

Service Mesh (ASM) is used to manage network communications in a microservices architecture. After sidecar proxies are injected into application workloads, you can perform traffic control, service discovery, and load balancing. ASM provides many powerful features. However, it also has the following performance issues:

  • Increased latency: Requests for each service in ASM must be processed by a set of proxies. This increases the request processing latency.

  • Increased resource usage: Each proxy consumes resources such as CPU cores and memory. Therefore, as the number of services increases, more proxies are required, consuming more resources.

  • Increased TLS overhead: If you enable Transport Layer Security (TLS) to enhance security, proxies in ASM instances must encrypt and decrypt traffic. This also consumes resources.

In real-world applications, you need to optimize the performance and availability of ASM from different aspects based on your business requirements to provide a better user experience. Service Mesh provides Mesh Optimization Center, through which you can optimize the performance and availability of ASM from the following aspects at various levels:

1. Limit the scope of service discovery to improve the efficiency of pushing ASM configurations

2. Use the sidecars that are automatically recommended based on access log analysis to reduce resources consumed by proxies

3. Use adaptive xDS optimization to improve the configuration push efficiency of the control plane

4. Optimize hardware and software performance

5. Configure Container Service for Kubernetes (ACK) resources that can be dynamically overcommitted in a sidecar proxy

1. Limit the scope of service discovery to improve the efficiency of pushing ASM configurations

In Istio, you can use the following methods to limit the scope of service discovery to improve the efficiency of pushing service configurations to sidecar proxies on the data plane from the control plane. Limiting the scope of service discovery helps effectively reduce the CPU resources and memory resources consumed by the control-plane components, as well as the bandwidth resources required by communications between control-plane components and sidecar proxies.

You can configure service discovery selectors to define a set of filtering rules to filter the service discovery configurations that you want to synchronize to the control plane.

  • You can configure service discovery selectors to ensure that the control plane discovers and processes only services in a specified namespace of a Kubernetes cluster on the data plane.

  • Istiod directly reads the configurations of service discovery selectors. This improves the efficiency of the control plane.
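In open source Istio, for example, discovery selectors are declared in MeshConfig. The following sketch assumes that namespaces to be discovered carry a hypothetical `istio-discovery: enabled` label:

```yaml
# MeshConfig fragment: Istiod discovers and processes only
# namespaces that match one of the selectors below.
meshConfig:
  discoverySelectors:
    - matchLabels:
        istio-discovery: enabled   # hypothetical label; replace with your own
```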

In addition, you can configure the exportTo and workloadSelector fields to limit the service discovery scope to an ASM instance or a specific namespace. You can also limit the service discovery scope to a specific service configuration. The following figure shows an example of the configured service discovery scope.

  • exportTo: specifies the scope to which virtual services, destination rules, and service entries apply.

  • workloadSelector: specifies the workloads on which destination rules, service entries, Envoy filters, and sidecars take effect.
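As a sketch, the following virtual service uses the exportTo field to stay visible only within its own namespace; the namespace and service names are assumptions for illustration:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: demo            # hypothetical namespace
spec:
  exportTo:
    - "."                    # visible only within the demo namespace
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
```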

[Figure: example of a configured service discovery scope]

You can also configure the service discovery scope in the ASM console. The following figure shows that you can define the scope of service discovery by selecting or clearing namespaces.

[Figure: selecting or clearing namespaces in the ASM console]

The following figure shows that you can also select services in the corresponding namespaces by editing label selectors. If a namespace of a Kubernetes cluster on the data plane matches all the rules defined in a label selector, the services in that namespace are included in the scope of automatic discovery. For more information, see Use service discovery selectors to improve the efficiency of pushing ASM configurations.

[Figure: editing label selectors to select services in namespaces]

2. Use the sidecars that are automatically recommended based on access log analysis to reduce resources consumed by proxies

This optimization solution is implemented based on X Discovery Service (xDS) and sidecars.

xDS is a set of APIs used to facilitate communications between the control plane and sidecar proxies on the data plane. x refers to different types of APIs, including Listener Discovery Service (LDS) API, Cluster Discovery Service (CDS) API, Endpoint Discovery Service (EDS) API, and Route Discovery Service (RDS) API.

The xDS protocol is a transmission protocol used by sidecar proxies to obtain service configuration information. It is used for communications between Istiod and sidecar proxies. xDS allows you to define service discovery configurations and configuration governance rules in ASM. The size of the xDS data is positively correlated with the mesh size.

By default, xDS distributes service discovery configurations to all sidecar proxies in an ASM instance. This means that all sidecar proxies in the ASM instance store all service discovery configurations of all services in the ASM instance.

In a large-scale cluster, a workload typically communicates with only a few other workloads. You can use the sidecar configuration feature to make the sidecar proxy of a workload focus only on the services that the workload needs to call. This significantly reduces the memory consumed by the sidecar proxy.
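A sidecar configuration of this kind can be sketched with Istio's Sidecar resource as follows; the workload labels and egress hosts are assumptions for illustration:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: productpage
  namespace: demo                                 # hypothetical namespace
spec:
  workloadSelector:
    labels:
      app: productpage                            # applies only to this workload
  egress:
    - hosts:
        - "demo/reviews.demo.svc.cluster.local"   # only the services it calls
        - "istio-system/*"                        # mesh infrastructure
```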

The following section describes how to use the sidecars that are automatically recommended based on access log analysis:

  1. ASM can obtain the call dependencies between services on the data plane by analyzing the access logs generated by sidecars on the data plane. ASM can then automatically recommend a sidecar for each workload on the data plane.

  2. After a sidecar is recommended based on the access log analysis, users can determine whether to accept the recommended sidecar or customize a sidecar based on the configurations of the automatically generated sidecar.

This optimization method applies to scenarios where services generate access logs and the call dependencies between services rarely change. For example, after multiple requests access a Bookinfo application and the microservices in the application call each other, each sidecar proxy generates access logs. A sidecar shown in the following figure is recommended for the productpage microservice.

[Figure: recommended sidecar for the productpage microservice]

To use the automatic sidecar recommendation feature, the following requirement must be met:

Simple Log Service must be activated to collect the access logs generated by all service calls so that the call dependencies between services can be obtained. If the access logs of some service calls in a call chain are not collected, the corresponding call dependencies may be lost. This may lead to inaccurate sidecar definitions and cause calls to services in the call chain to fail.

As shown in the following figure, ASM allows you to use the sidecars that are automatically recommended based on access log analysis to improve the configuration push efficiency. For more information, see Use the sidecars that are automatically recommended based on access log analysis.

[Figure: enabling automatic sidecar recommendation based on access log analysis]

3. Use adaptive xDS optimization to improve the configuration push efficiency of the control plane

To alleviate the limitations in the preceding solution, ASM provides the adaptive xDS optimization feature.

By analyzing the actual call relationships between services, this feature automatically generates optimized sidecar CustomResourceDefinitions (CRDs) for services and pushes required sidecar configurations only for necessary services. After you enable this feature, an egress gateway named istio-axds-egressgateway is deployed in the corresponding Kubernetes cluster and all HTTP traffic is first routed to the egress gateway. This feature automatically analyzes the call dependencies among services in the ASM instance based on the access logs of the egress gateway.

The following figure shows the architecture of this feature.

  • On the ASM control plane, the managed component Adaptive XdsController manages the lifecycle of the AdaptiveXds-EgressGateway component and generates the Envoy filter and bootstrap configurations required for the AdaptiveXds-EgressGateway component.

  • After you enable the adaptive xDS optimization feature, the AdaptiveXds-EgressGateway component reports access logs to Access Log Service (ALS).

  • ALS receives the reported access logs and works together with ALS Analyzer to analyze them. The corresponding sidecar CRDs are then generated based on the call dependencies between services.

[Figure: architecture of the adaptive xDS optimization feature]

You can enable the adaptive xDS optimization feature for services in a cluster by namespace. After you enable the adaptive xDS optimization feature for a namespace, the configuration push efficiency is optimized for all services in the namespace, and sidecar CRDs that contain service dependencies are automatically generated for the services.

You can also add the asm.alibabacloud.com/asm-adaptive-xds: "true" annotation to a service to enable the feature for that service.
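For example, the annotation can be sketched on a service as follows (the service name and port are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: reviews                                     # hypothetical service
  annotations:
    asm.alibabacloud.com/asm-adaptive-xds: "true"   # enable adaptive xDS for this service
spec:
  selector:
    app: reviews
  ports:
    - port: 9080
```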

In a real-world use case, a customer who used ASM and enabled the adaptive xDS optimization feature reduced sidecar proxy configurations by 90% and the consumed memory from 400 MB to 50 MB.

ASM provides the adaptive xDS optimization feature to improve the configuration push efficiency of the control plane and reduce unnecessary configurations of sidecar proxies. For more information, see Use adaptive xDS optimization to improve the configuration push efficiency of the control plane.

4. Optimize hardware and software performance

The data plane comes in various forms, with different Elastic Compute Service (ECS) instance types and operating system (OS) versions running on each node. You can use Node Feature Discovery (NFD) to detect the features of a node and better understand the capabilities that the node supports. For example, you can determine whether extended Berkeley Packet Filter (eBPF)-related features are supported based on the kernel version. You can also determine whether device plugins are provided and whether to enable TLS encryption and decryption based on whether Advanced Vector Extensions (AVX) are supported.

[Figure: detecting node features with NFD]

In other words, you can use NFD to detect the hardware features available on each node in a Kubernetes cluster, including CPUID features and instruction set extensions. The corresponding capabilities can then be dynamically configured based on the detected features, without requiring manual operations from users.

This way, you can make full use of the node environment and dynamically enable the corresponding features to improve performance. ASM allows you to dynamically enable the Multi-Buffer feature based on whether AVX is supported. Enabling the Multi-Buffer feature can improve TLS encryption and decryption performance. For more information, see Accelerate Encrypted Communication Among Application Services with Intel Architecture-based ASM Technology.
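For example, NFD publishes CPU feature labels such as `feature.node.kubernetes.io/cpu-cpuid.AVX512F`, which a pod can require through node affinity. The following is a sketch that assumes NFD is deployed in the cluster:

```yaml
# Pod spec fragment: schedule only onto nodes whose CPU supports
# AVX-512, as detected and labeled by Node Feature Discovery.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: feature.node.kubernetes.io/cpu-cpuid.AVX512F
              operator: In
              values: ["true"]
```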

[Figure: dynamically enabling the Multi-Buffer feature based on node capabilities]

The following section provides more details:

  1. On the ASM control plane, you can extend MeshConfig or CRDs to define a unified declarative configuration.

  2. The configuration of the control plane is delivered to Envoy proxies on the data plane by using the xDS protocol. This feature is an extended capability of ASM.

  3. ASM can schedule the pods of workloads for which the Multi-Buffer feature is enabled and dynamically adjust their configurations. ASM preferentially schedules pods with the Multi-Buffer feature enabled to nodes that support this feature so that the feature can take effect. ASM also provides an adaptive dynamic configuration capability: even if no suitable nodes are available and pods with the feature enabled are scheduled to other nodes, the corresponding features are disabled adaptively and dynamically.

5. Configure ACK resources that can be dynamically overcommitted in a sidecar proxy

In Kubernetes, the kubelet manages the resources that are used by the pods on a node based on the quality of service (QoS) classes of the pods. For example, the kubelet controls the out of memory (OOM) priorities. The QoS class of a pod can be Guaranteed, Burstable, or BestEffort. The QoS classes of pods depend on the requests and limits of CPU and memory resources that are configured for the pods.
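For example, a pod's QoS class follows from its requests and limits; a Guaranteed container sets them to the same values (the figures below are illustrative):

```yaml
# Container spec fragment: requests equal limits for every
# container, so the pod's QoS class is Guaranteed.
resources:
  requests:
    cpu: "1"
    memory: 1Gi
  limits:
    cpu: "1"
    memory: 1Gi
```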

[Figure: QoS classes and the resource requests and limits of pods]

ack-koordinator can dynamically overcommit resources. It monitors the loads on a node in real time and then reschedules the resources that are allocated to pods but are not in use.

Resource Limits and Required Resources can be set to the same value or different values. We recommend that you configure Resource Limits and Required Resources based on the workload type.

  • If the QoS class of a workload is Guaranteed, we recommend that you set both to the same value.

  • For pods of other QoS classes, we recommend that you keep the Required Resources value smaller than the Resource Limits value. This recommendation also applies to regular resources.
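As a sketch, dynamically overcommitted resources in ACK are requested through the extended `kubernetes.io/batch-cpu` and `kubernetes.io/batch-memory` resource names; the values below are illustrative and keep Required Resources smaller than Resource Limits, as recommended for non-Guaranteed workloads:

```yaml
# Container spec fragment: requests stay below limits,
# using ACK dynamically overcommitted (batch) resources.
resources:
  requests:
    kubernetes.io/batch-cpu: "500"       # millicores
    kubernetes.io/batch-memory: 256Mi
  limits:
    kubernetes.io/batch-cpu: "1000"
    kubernetes.io/batch-memory: 512Mi
```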

You can configure ACK resources that can be dynamically overcommitted for the injected sidecar proxy container and the istio-init container. For more information, see Configure sidecar proxies and Configure ACK resources that can be dynamically overcommitted in a sidecar proxy.