Container Service for Kubernetes:ack-koordinator (FKA ack-slo-manager)

Last Updated:Sep 01, 2023

ack-koordinator is a key component that Container Service for Kubernetes (ACK) clusters use to support service level objective (SLO)-aware workload scheduling. It improves resource utilization while ensuring the performance of your applications. This topic introduces ack-koordinator and provides its usage notes and release notes.

Prerequisites

Introduction

History

ack-koordinator is related to the open source project Koordinator. Koordinator is a quality of service (QoS)-based scheduling system for hybrid workload orchestration on Kubernetes. Koordinator is developed based on years of experience of Alibaba Cloud in SLO-aware scheduling. Koordinator aims to improve the runtime efficiency and reliability of both latency-sensitive workloads and batch jobs, simplify resource-related configuration tuning, and increase pod deployment density to improve resource utilization.

ack-koordinator was formerly known as ack-slo-manager. ack-slo-manager provided valuable experience for the incubation of open source Koordinator, and in turn benefits from Koordinator technologies as Koordinator matures and stabilizes. Therefore, ack-koordinator supports both the features of open source Koordinator and the SLO-aware workload scheduling capability of ack-slo-manager.

Architecture

ack-koordinator consists of two control plane components (Koordinator Manager and Koordinator Descheduler) and one DaemonSet component (Koordlet):

  • Koordinator Manager: Koordinator Manager is deployed as a Deployment and runs as a primary pod and a secondary pod to ensure high availability. It includes the following modules:

    • SLO Controller: SLO Controller manages resource overcommitment and dynamically adjusts the amount of resources that are overcommitted. SLO Controller also manages the SLO policies of each node.

    • Recommender: Recommender provides the resource profiling feature and estimates the peak resource demand of workloads. Recommender simplifies the configuration of resource requests and limits for containers.

  • Koordinator Descheduler: Koordinator Descheduler is deployed as a Deployment and is used to conduct rescheduling.

  • Koordlet: Koordlet is deployed as a DaemonSet and is used to support resource overcommitment, fine-grained scheduling, and QoS in hybrid deployment scenarios.

Figure: ack-koordinator architecture (ack-koord-arch)

Version management

In ack-koordinator v1.1.1-ack.1 and later, the version number uses the x.y.z-ackn format.

  • x.y.z: indicates the corresponding open source Koordinator version. This means that ack-koordinator supports all features provided by this open source version.

  • ackn: indicates feature enhancement and optimization based on the open source Koordinator version.
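The version format above can be split mechanically, which is handy when a script needs to know which open source Koordinator release a component corresponds to. The following is a minimal sketch, not an official tool: the function names upstream_version and ack_suffix are hypothetical, and the sketch assumes that the ACK-specific part always follows the last "-ack" in the version string.

```shell
# Print the open source Koordinator version (x.y.z) of an
# ack-koordinator version string such as v1.2.0-ack1.3.
upstream_version() {
  v="${1#v}"                      # drop the optional leading "v"
  printf '%s\n' "${v%%-ack*}"     # drop the "-ack..." suffix if present
}

# Print the ACK-specific suffix (ackn), or an empty line if the
# version has no such suffix (for example, v0.8.0).
ack_suffix() {
  case "$1" in
    *-ack*) printf 'ack%s\n' "${1##*-ack}" ;;
    *)      printf '\n' ;;
  esac
}
```

For example, upstream_version v1.1.1-ack.2 prints 1.1.1, and ack_suffix v1.1.1-ack.2 prints ack.2.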

Overview

The components of ack-koordinator support the features of the corresponding versions of open source Koordinator. When you install and configure the components of ack-koordinator, only basic feature gates are enabled by default. To use advanced features supported by open source Koordinator, you must manually enable the corresponding feature gates for ack-koordinator. For more information about the advanced features supported by open source Koordinator, see Koordinator official documentation.

Category                  Feature (reference topic)                                                   Same as open source Koordinator?
CPU scheduling            Topology-aware CPU scheduling                                               No
CPU scheduling            CPU Burst                                                                   Yes
Load-aware scheduling     Load-aware pod scheduling                                                   Yes
Load-aware scheduling     Hotspot descheduling                                                        Yes
Fine-grained scheduling   Resource profiling                                                          No
Fine-grained scheduling   Dynamic resource overcommitment                                             Yes
Fine-grained scheduling   Elastic resource limit                                                      Yes
Fine-grained scheduling   CPU QoS for containers                                                      Yes
Fine-grained scheduling   Memory QoS for containers                                                   Yes
Fine-grained scheduling   Resource isolation based on the L3 cache and MBA                            Yes
Fine-grained scheduling   Dynamically modify the resource parameters of a pod                         No
Fine-grained scheduling   Use the nearby memory access acceleration feature on multi-NUMA instances   No
Fine-grained scheduling   Use DSA to accelerate data streaming                                        No

All the features of resource-controller are supported by ack-koordinator. resource-controller is discontinued. We recommend that you upgrade from resource-controller to ack-koordinator. For more information, see Upgrade from resource-controller to ack-koordinator.

Component management

ack-koordinator is available on the Add-ons page of the ACK console. On the Add-ons page, you can install, update, and uninstall ack-koordinator. If the version of ack-koordinator that you are using is lower than V0.7, ack-koordinator is installed from the marketplace. Follow the steps described in Migrate ack-koordinator from the marketplace to the Add-ons page to migrate ack-koordinator from the marketplace to the Add-ons page.

Install ack-koordinator

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click More > Manage Components in the Actions column of the cluster.

  3. On the Add-ons page, find and click ack-koordinator. Then, click Install on the ack-koordinator card.

  4. In the Install ack-koordinator dialog box, configure parameters and click OK.

  5. In the left-side navigation pane of the cluster management page, choose Applications > Helm to view the status of ack-koordinator.

    If Deployed is displayed in the Status column, ack-koordinator is installed.
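The status check in the last step can also be approximated from the command line by waiting for the component workloads to finish rolling out. The following is a minimal sketch under stated assumptions: the function name check_rollout is hypothetical, the KUBECTL variable is only a hook for substituting another binary, and the workload names that you pass in must be taken from your own cluster because this topic does not list them.

```shell
# Wait until every listed workload has finished rolling out.
# Usage: check_rollout <namespace> <kind/name>...
# Returns non-zero as soon as one workload fails to become ready.
check_rollout() {
  ns="$1"
  shift
  for workload in "$@"; do
    "${KUBECTL:-kubectl}" rollout status "$workload" -n "$ns" --timeout=120s || return 1
  done
}
```

A call might look like check_rollout kube-system deployment/NAME daemonset/NAME, where each NAME is a placeholder for a Deployment or DaemonSet of ack-koordinator in your cluster.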

Modify ack-koordinator

  1. Click Configuration on the ack-koordinator card of the Add-ons page.

  2. In the ack-koordinator Parameters dialog box, modify the parameters and click OK.

    ACK reinstalls ack-koordinator based on the modified configurations.

Update ack-koordinator

  1. Click Upgrade on the ack-koordinator card of the Add-ons page.

  2. In the ack-koordinator Parameters dialog box, modify the parameters and click OK.

    Important
    • The Upgrade button is displayed only if the component is not updated to the latest version.

    • The update overwrites all the changes that are made to the deployed components of ack-koordinator (including Deployments and DaemonSets) by using other methods.

Uninstall ack-koordinator

The topology-aware CPU scheduling feature creates a topology ConfigMap in the kube-system namespace for each node in an ACK cluster. ack-koordinator 0.5.1 and later can automatically delete the topology ConfigMaps of the nodes that are removed from a cluster. If you uninstall ack-koordinator, the topology ConfigMaps of the existing nodes in the cluster are still retained. The retained topology ConfigMaps do not impact other features but occupy storage space. We recommend that you delete these ConfigMaps at your earliest convenience.

  1. On the ack-koordinator card of the Add-ons page, click Uninstall. In the Uninstall message, click OK.

  2. Delete topology ConfigMaps.

    1. In the left-side navigation pane, choose Configurations > ConfigMaps. In the top navigation bar, select the kube-system namespace.

    2. Enter -numa-info into the Name search box and select the ConfigMaps that match the ${NODENAME}-numa-info naming rule from the list. Then, click Delete in the Actions column to delete the ConfigMaps.

    3. In the Confirm message, click OK.
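The console steps above for deleting topology ConfigMaps can be sketched with kubectl as follows. This is an illustration rather than an official procedure: the function names are hypothetical, the KUBECTL variable is only a hook for substituting another binary, and the sketch assumes that every ConfigMap in kube-system whose name ends in -numa-info is a topology ConfigMap. Review the list before you delete anything.

```shell
# List ConfigMaps in kube-system that match the ${NODENAME}-numa-info rule.
# Output uses the "configmap/<name>" form produced by "-o name".
list_numa_configmaps() {
  "${KUBECTL:-kubectl}" get configmap -n kube-system -o name | grep -e '-numa-info$'
}

# Delete every ConfigMap returned by list_numa_configmaps, one at a time.
delete_numa_configmaps() {
  list_numa_configmaps | while read -r cm; do
    "${KUBECTL:-kubectl}" delete -n kube-system "$cm"
  done
}
```

Run list_numa_configmaps first and check the output, then run delete_numa_configmaps.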

Migrate components

Migrate ack-koordinator from the Marketplace page to the Add-ons page

If you have modified the ConfigMap of ack-koordinator on the Marketplace page, back up the ConfigMap before you update ack-koordinator.

  1. Optional: Use one of the following methods to back up the ConfigMap of ack-koordinator.

    • Method 1: Use kubectl to back up the ConfigMap

      1. Run the kubectl get cm -n kube-system ack-slo-manager-config -o yaml > slo-config.yaml command to save the ConfigMap to the slo-config.yaml file. In this example, the namespace of the ConfigMap is kube-system and the name of the ConfigMap is ack-slo-manager-config. Replace them with the actual values.

      2. Run the vim slo-config.yaml command to edit the file: change the namespace of the ConfigMap to kube-system, change the name of the ConfigMap to ack-slo-config, and delete all annotations and labels in the ConfigMap to prevent the update from automatically overwriting these settings.

      3. Run the kubectl apply -f slo-config.yaml command to apply the modified ConfigMap to the cluster.

    • Method 2: Use the ACK console to back up the ConfigMap

      1. Record the key-value pair in the ConfigMap.

        1. In the left-side navigation pane, choose Configurations > ConfigMaps. In the top navigation bar of the page, select the namespace that is specified when you install ack-koordinator from the marketplace. The default namespace is kube-system.

        2. Enter the ConfigMap name ack-slo-manager-config into the search box, click the name of the ConfigMap that is displayed, and record the key-value pair.

      2. Use the recorded key-value pair to create another ConfigMap.

        1. In the left-side navigation pane, choose Configurations > ConfigMaps. In the top navigation bar, select All Namespaces.

        2. In the upper-right corner of the ConfigMap page, click Create. In the panel that appears, enter ack-slo-config into the ConfigMap Name field, select the kube-system namespace, click + Add, enter the recorded key-value pair, and then click OK.

  2. Update ack-koordinator to the latest version. For more information, see Update ack-koordinator.

    Important

    If you have modified the ConfigMap of ack-koordinator on the Marketplace page, you need to specify the name of the backup ConfigMap that is created in Step 1 in the ack-koordinator Parameters dialog box.
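The manual edit in Method 1 of Step 1 can be sketched as a filter that rewrites the exported manifest. This is a minimal sketch, not a supported tool: the function name rewrite_configmap is hypothetical, and the filter assumes the two-space indentation that kubectl get -o yaml produces. It also drops the server-populated resourceVersion, uid, and creationTimestamp fields, which goes beyond the steps above and is an assumption about what a clean copy needs.

```shell
# Read a ConfigMap manifest on stdin and write a copy on stdout with a new
# name and namespace, without metadata annotations and labels.
# Usage: rewrite_configmap <new-name> <new-namespace>
rewrite_configmap() {
  awk -v name="$1" -v ns="$2" '
    /^[^ #]/ { section = $1; skipping = 0 }        # top-level key such as "metadata:"
    section == "metadata:" {
      if (skipping && (/^    / || /^$/)) next      # still inside a dropped block
      skipping = 0
      if (/^  (annotations|labels):/) { skipping = 1; next }
      if (/^  (resourceVersion|uid|creationTimestamp):/) next
      if (/^  name:/)      { print "  name: " name; next }
      if (/^  namespace:/) { print "  namespace: " ns; next }
    }
    { print }                                      # everything else is kept as-is
  '
}
```

A possible use is to pipe the output of kubectl get cm -n kube-system ack-slo-manager-config -o yaml through rewrite_configmap ack-slo-config kube-system, save the result to slo-config.yaml, inspect it, and then run kubectl apply -f slo-config.yaml. Keys under data are left untouched because the filter only rewrites lines inside the metadata section.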

Upgrade from resource-controller to ack-koordinator

ack-koordinator supports all the features of resource-controller. If resource-controller is installed in your cluster, perform the following steps to upgrade from resource-controller to ack-koordinator:

  1. Update resource-controller to the latest version.

    1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

    2. On the Clusters page, find the cluster that you want to manage and click More > Manage Components in the Actions column of the cluster.

    3. On the Add-ons page, find and click resource-controller and then click Upgrade on the resource-controller card.

    4. In the message that appears, click OK.

  2. Install and configure ack-koordinator.

    1. On the Add-ons page, find and click ack-koordinator. Then, click Install on the ack-koordinator card.

    2. In the Install ack-koordinator dialog box, modify agentFeatureGates (the feature gate switch of ack-slo-agent) based on your business requirement.

      1. Check whether the cluster uses the CPU Burst feature described in Dynamically modify the resource parameters of a pod. This feature allows you to create a custom resource definition (CRD) or add a pod annotation to modify the cgroup configuration file named cpu.cfs_quota_us. If this feature is used, continue with the next substep to obtain the feature-gate configurations. If this feature is not used, skip the remaining substeps and go to the step in which you configure other parameters.

      2. Run the following command to obtain the feature-gate configurations from the YAML file of the ack-slo-agent DaemonSet:

        kubectl get daemonset -n kube-system ack-slo-agent -o yaml |grep feature-gates
        - --feature-gates=AllAlpha=false,AllBeta=false,...,CPUBurst=true,....
      3. Set agentFeatureGates to the feature-gate configurations that you obtained, and disable CPU Burst by specifying CPUBurst=false, as shown in the following example. Separate multiple parameters with commas (,). Keep the other settings unchanged.

        After CPU Burst is disabled, CPU Burst is unavailable for all containers in the cluster. This prevents the cgroup configuration file cpu.cfs_quota_us from being modified by two modules at the same time.

        AllAlpha=false,AllBeta=false,...,CPUBurst=false,....
      4. If you want to dynamically scale the CPU resources of containers, we recommend that you enable the CPU Burst feature. This feature can automatically adjust the CPU limit of pods. For more information, see CPU Burst.

    3. Configure other parameters based on your business requirements and click OK.

    4. In the left-side navigation pane, choose Applications > Helm to view the status of ack-koordinator.

      If Deployed is displayed in the Status column, ack-koordinator is installed.

  3. Uninstall resource-controller.

    1. On the Add-ons page, find and click resource-controller and then click Uninstall on the resource-controller card.

    2. In the Uninstall message, click OK.
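The feature-gate edit in Step 2 above (turning CPUBurst=true into CPUBurst=false while keeping every other gate) can be sketched as a small string helper. The function name set_feature_gate is hypothetical, and the helper only rewrites a comma-separated string; you still enter the result in agentFeatureGates yourself.

```shell
# Set one gate in a comma-separated feature-gate string and keep the others.
# If the gate is not present yet, it is appended.
# Usage: set_feature_gate <gates> <name> <true|false>
set_feature_gate() {
  gates="$1"
  name="$2"
  value="$3"
  case ",$gates," in
    *",$name="*)
      # The gate exists: replace only its value.
      printf '%s\n' "$gates" | sed -E "s/(^|,)$name=[^,]*/\1$name=$value/"
      ;;
    *)
      # The gate is missing: append it, adding a comma only if needed.
      printf '%s\n' "${gates:+$gates,}$name=$value"
      ;;
  esac
}
```

For example, set_feature_gate "AllAlpha=false,AllBeta=false,CPUBurst=true" CPUBurst false prints AllAlpha=false,AllBeta=false,CPUBurst=false. The input string can be copied from the output of the kubectl get daemonset command shown in Step 2.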

FAQ

Component installation error: no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1" ensure CRDs are installed first

Prometheus is not installed in the cluster. Install the Prometheus component by following the steps in Use Alibaba Cloud Prometheus Service to monitor an ACK cluster, or clear Enable Prometheus Metrics for ack-koordinator when you install ack-koordinator.
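Before you install ack-koordinator with Prometheus metrics enabled, you can check whether the ServiceMonitor CRD named in the error message exists in the cluster. The following is a minimal sketch: the function name has_servicemonitor_crd is hypothetical, the KUBECTL variable is only a hook for substituting another binary, and the CRD name servicemonitors.monitoring.coreos.com is derived from the kind and API group in the error message.

```shell
# Succeed if the ServiceMonitor CRD is installed in the current cluster.
has_servicemonitor_crd() {
  "${KUBECTL:-kubectl}" get crd servicemonitors.monitoring.coreos.com >/dev/null 2>&1
}
```

If has_servicemonitor_crd fails, either install the Prometheus component or clear Enable Prometheus Metrics for ack-koordinator, as described above.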

Component installation error: task install-addons-xxx timeout, error install addons map[ack-slo-manager:Can't install release with errors: ... function "lookup" not defined

Update Helm to V3.0 or later. For more information about how to update Helm, see [Component Updates] Update Helm V2 to V3.

Release notes

June 2023

v1.2.0-ack1.3

  • Image address:

    • registry-cn-zhangjiakou.ack.aliyuncs.com/acs/koord-manager:v1.2.0-ack1.3-89a9730-aliyun

    • registry-cn-zhangjiakou.ack.aliyuncs.com/acs/koordlet:v1.2.0-ack1.3-89a9730-aliyun

    • registry-cn-zhangjiakou.ack.aliyuncs.com/acs/koord-descheduler:v1.2.0-ack1.3-89a9730-aliyun

  • Release date: 2023-06-09

  • Description: Internal API operations are optimized.

  • Impact: No impact on workloads

April 2023

v1.2.0-ack1.2

  • Image address:

    • registry.cn-hangzhou.aliyuncs.com/acs/koord-manager:v1.2.0-ack1.2-b675c9a8-aliyun

    • registry.cn-hangzhou.aliyuncs.com/acs/koordlet:v1.2.0-ack1.2-b675c9a8-aliyun

    • registry.cn-hangzhou.aliyuncs.com/acs/koord-descheduler:v1.2.0-ack1.2-b675c9a8-aliyun

  • Release date: 2023-04-25

  • Impact: No impact on workloads

March 2023

v1.1.1-ack.2

  • Image address:

    • registry.cn-hangzhou.aliyuncs.com/acs/koord-manager:v1.1.1-ack.2

    • registry.cn-hangzhou.aliyuncs.com/acs/koordlet:v1.1.1-ack.2

    • registry.cn-hangzhou.aliyuncs.com/acs/koord-descheduler:v1.1.1-ack.2

  • Release date: 2023-03-23

  • Description: Internal API operations are optimized.

  • Impact: No impact on workloads

January 2023

v1.1.1-ack.1

  • Image address:

    • registry.cn-hangzhou.aliyuncs.com/acs/koord-manager:v1.1.1-ack.1

    • registry.cn-hangzhou.aliyuncs.com/acs/koordlet:v1.1.1-ack.1

  • Release date: 2023-01-11

  • Description:

    • The component is renamed as ack-koordinator.

    • Internal API operations are optimized.

  • Impact: No impact on workloads

November 2022

v0.8.0

  • Image address:

    • registry.cn-hangzhou.aliyuncs.com/acs/koord-manager:v0.8.0

    • registry.cn-hangzhou.aliyuncs.com/acs/koordlet:v0.8.0

  • Release date: 2022-11-17

  • Description:

    • Load-aware pod scheduling is updated to support Kubernetes 1.22.

    • Internal API operations are optimized.

  • Impact: If you want to use load-aware pod scheduling after ack-slo-manager is updated, you must update the Kubernetes version of your cluster to 1.22.15-ack-2.0. Other features are not affected.

September 2022

v0.7.2

  • Image address: registry.cn-hangzhou.aliyuncs.com/acs/ack-slo-manager:v0.7.2

  • Release date: 2022-09-16

  • Description: The following issue introduced in v0.7.1 is fixed: topology-aware scheduling does not take effect on pods.

  • Impact: No impact on workloads

v0.7.1

  • Image address: registry.cn-hangzhou.aliyuncs.com/acs/ack-slo-manager:v0.7.1

  • Release date: 2022-09-02

  • Description:

    • The CPU throttling issue that occurs when topology-aware scheduling is used in a CentOS 3 kernel environment is fixed.

    • Prometheus Monitoring can be enabled and disabled.

    • Resource profiling features are supported.

    • ack-slo-manager can no longer be installed from the marketplace in the ACK console.

  • Impact: No impact on workloads

August 2022

v0.7.0

  • Image address: registry.cn-hangzhou.aliyuncs.com/acs/ack-slo-manager:v0.7.0

  • Release date: 2022-08-08

  • Description: ack-slo-manager is migrated from the marketplace to the Add-ons page in the ACK console. If you want to install ack-slo-manager, go to the Add-ons page.

  • Impact: No impact on workloads

July 2022

v0.6.0

  • Image address: registry.cn-hangzhou.aliyuncs.com/acs/ack-slo-manager:v0.6.0

  • Release date: 2022-07-26

  • Description: Internal API operations are optimized and component configurations are simplified.

  • Impact: No impact on workloads

June 2022

v0.5.2

  • Image address: registry.cn-hangzhou.aliyuncs.com/acs/ack-slo-manager:v0.5.2

  • Release date: 2022-06-14

  • Description:

    • Internal API operations are optimized.

    • CPU QoS for containers is optimized.

    • Automatic creation of the ack-slo-manager namespace can be disabled.

  • Impact: No impact on workloads

v0.5.1

  • Image address: registry.cn-hangzhou.aliyuncs.com/acs/ack-slo-manager:v0.5.1

  • Release date: 2022-06-02

  • Description:

    • CPU QoS for containers is supported.

    • Internal API operations are optimized. The topology ConfigMaps of the nodes that are removed from a cluster can be automatically deleted.

  • Impact: No impact on workloads

April 2022

v0.5.0

  • Image address: registry.cn-hangzhou.aliyuncs.com/acs/ack-slo-manager:v0.5.0

  • Release date: 2022-04-29

  • Impact: No impact on workloads

v0.4.1

  • Image address: registry.cn-hangzhou.aliyuncs.com/acs/ack-slo-manager:v0.4.1

  • Release date: 2022-04-14

  • Impact: No impact on workloads

v0.4.0

  • Image address: registry.cn-hangzhou.aliyuncs.com/acs/ack-slo-manager:v0.4.0

  • Release date: 2022-04-11

  • Description: The memory consumption of slo-agent is reduced.

  • Impact: No impact on workloads

February 2022

v0.3.0

  • Image address: registry.cn-hangzhou.aliyuncs.com/acs/ack-slo-manager:v0.3.0

  • Release date: 2022-02-25

  • Impact: No impact on workloads

December 2021

v0.2.0

  • Image address: registry.cn-hangzhou.aliyuncs.com/acs/ack-slo-manager:v0.2.0

  • Release date: 2021-12-10

  • Impact: No impact on workloads

September 2021

v0.1.1

  • Image address: registry.cn-hangzhou.aliyuncs.com/acs/ack-slo-manager:v0.1.1-c2ccefa

  • Release date: 2021-09-02

  • Description: Internal API operations are optimized.

  • Impact: No impact on workloads

July 2021

v0.1.0

  • Image address: registry.cn-hangzhou.aliyuncs.com/acs/ack-slo-manager:v0.1.0-09766de

  • Release date: 2021-07-08

  • Description: Load-aware pod scheduling is supported.

  • Impact: No impact on workloads