The ack-koordinator component provides memory quality of service (QoS) for containers running in ACK clusters. By assigning different QoS classes to containers, ack-koordinator prioritizes memory for latency-sensitive workloads and reduces out-of-memory (OOM) errors during memory contention.
How it works
In Kubernetes, each container can declare a memory request and a memory limit. Two failure modes occur when memory is constrained:
- Container-level pressure: When a container's memory usage (including page cache) approaches its limit, the OS triggers memcg-level direct memory reclamation, blocking the container's processes. If memory is allocated faster than it is reclaimed, an OOM error terminates the pod.
- Node-level pressure: When the sum of container memory limits exceeds the node's physical memory, the OS kernel reclaims memory across containers indiscriminately. This degrades performance and can trigger OOM errors for any pod on the node.
ack-koordinator addresses both failure modes by automatically configuring the memory control group (memcg) for each container. It enables three Alibaba Cloud Linux kernel features:
- Memcg QoS: locks a minimum amount of memory so high-priority containers retain their working set
- Memcg backend asynchronous reclamation: proactively reclaims memory before the limit is reached, avoiding blocking direct reclamation
- Memcg global minimum watermark rating: adjusts the per-container reclamation threshold so latency-sensitive (LS) containers are reclaimed last
The result is fairer memory distribution across containers and lower application latency during overcommitment.
Advantages over open-source Kubernetes memory QoS
The upstream Kubernetes memory QoS feature is supported in Kubernetes 1.22 and later, supports only cgroup v2, and requires manual kubelet configuration. It applies to all pods and nodes in the cluster and does not support fine-grained per-pod or per-namespace configuration.
The ack-koordinator memory QoS feature improves on the upstream implementation in two key ways:
- Broader kernel compatibility: Supports both cgroup v1 and cgroup v2 interfaces, backed by Alibaba Cloud Linux kernel features such as memcg backend asynchronous reclamation and minimum watermark rating. For details, see Overview of kernel features and interfaces.
- Fine-grained configuration: Use pod annotations or ConfigMaps to configure memory QoS independently for a specific pod, namespace, or the entire cluster.
Configuration mechanism
ack-koordinator uses four cgroup parameters to enforce memory QoS policies. The following table shows how each parameter maps to the configuration options described in Advanced parameters:
| cgroup parameter | Controls | Configured by |
|---|---|---|
| memory.limit_in_bytes | Hard upper limit for the container | Kubernetes (from limits.memory) |
| memory.high | Throttling threshold; reclamation starts here | throttlingPercent |
| memory.wmark_high | Async reclamation trigger | wmarkRatio |
| memory.min | Unreclaimable memory floor | minLimitPercent (lowLimitPercent sets the softer memory.low floor) |
Configuration priority
When multiple configuration sources apply to the same pod, ack-koordinator uses the following priority order (highest first):
- Pod annotation (koordinator.sh/memoryQOS)
- Namespace-level ConfigMap (ack-slo-pod-config)
- Cluster-level ConfigMap (ack-slo-config)
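The priority order above can be sketched as a simple first-match lookup. This is an illustration only, not ack-koordinator's actual code: the function name `resolve_memory_qos` and the source keys (`pod_annotation`, `namespace_configmap`, `cluster_configmap`) are hypothetical names introduced for this example.

```python
from typing import Optional, Tuple

# Configuration sources, highest priority first (order taken from the list above).
# The key names are hypothetical labels for this sketch.
PRIORITY = ("pod_annotation", "namespace_configmap", "cluster_configmap")


def resolve_memory_qos(settings: dict) -> Optional[Tuple[str, dict]]:
    """Return (source, value) for the highest-priority source that is set."""
    for source in PRIORITY:
        value = settings.get(source)
        if value is not None:
            return source, value
    return None  # no source configures memory QoS for this pod


# A pod with no annotation falls back to the namespace-level setting.
print(resolve_memory_qos({
    "namespace_configmap": {"enable": True},
    "cluster_configmap": {"enable": False},
}))
```

The sketch does not model the `default` policy value, which tells a pod to inherit the ack-slo-pod-config settings.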
QoS class mapping
If a pod does not have the koordinator.sh/qosClass label, ack-koordinator maps from Kubernetes QoS classes automatically:
| Kubernetes QoS class | koordinator QoS class |
|---|---|
| Guaranteed | Default memory QoS settings |
| Burstable | LS (latency-sensitive) |
| BestEffort | BE (best-effort) |
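As a rough illustration of the mapping table, the following sketch picks the koordinator QoS class for a pod. The function name `effective_qos_class` is hypothetical, and returning `None` for Guaranteed pods is simply this example's way of saying "use the default memory QoS settings".

```python
from typing import Optional

# Fallback mapping from the table above; Guaranteed pods have no LS/BE class here.
K8S_TO_KOORDINATOR = {"Burstable": "LS", "BestEffort": "BE"}


def effective_qos_class(labels: dict, k8s_qos_class: str) -> Optional[str]:
    """Prefer the explicit label; otherwise map from the Kubernetes QoS class."""
    explicit = labels.get("koordinator.sh/qosClass")
    if explicit:
        return explicit
    # None means "default memory QoS settings" (the Guaranteed row).
    return K8S_TO_KOORDINATOR.get(k8s_qos_class)


print(effective_qos_class({}, "Burstable"))
print(effective_qos_class({"koordinator.sh/qosClass": "BE"}, "Burstable"))
```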
Prerequisites
Before you begin, make sure you have:
- An ACK cluster running Kubernetes 1.18 or later. To upgrade, see Manually update ACK clusters.
- Alibaba Cloud Linux as the node OS. Some advanced parameters depend on Alibaba Cloud Linux kernel features. See Advanced parameters for details.
- ack-koordinator 0.8.0 or later installed. See ack-koordinator for installation steps.
Enable memory QoS for a specific pod
Add the following annotation to the pod's metadata:

    annotations:
      # Enable memory QoS with recommended settings
      koordinator.sh/memoryQOS: '{"policy": "auto"}'
      # Disable memory QoS
      # koordinator.sh/memoryQOS: '{"policy": "none"}'
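The annotation value is a JSON object serialized as a string. If you generate manifests from a script, a small helper like the following can build it and reject unknown policies. This is a sketch: the helper name `memory_qos_annotation` is hypothetical, and the allowed values are the three listed for `policy` under Advanced parameters.

```python
import json

# Policy values documented under Advanced parameters.
ALLOWED_POLICIES = {"auto", "default", "none"}


def memory_qos_annotation(policy: str) -> dict:
    """Build the annotations entry for the given memory QoS policy."""
    if policy not in ALLOWED_POLICIES:
        raise ValueError(f"unsupported policy: {policy!r}")
    # The annotation value must be a JSON string, not a nested mapping.
    return {"koordinator.sh/memoryQOS": json.dumps({"policy": policy})}


print(memory_qos_annotation("auto"))
```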
Enable memory QoS for a cluster
Use the ack-slo-config ConfigMap to apply memory QoS to all pods in the cluster.
- Create a file named configmap.yaml with the following content:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: ack-slo-config
        namespace: kube-system
      data:
        resource-qos-config: |-
          {
            "clusterStrategy": {
              "lsClass": {
                "memoryQOS": {
                  "enable": true
                }
              },
              "beClass": {
                "memoryQOS": {
                  "enable": true
                }
              }
            }
          }

- Set the QoS class of each pod using the koordinator.sh/qosClass label:

      apiVersion: v1
      kind: Pod
      metadata:
        name: pod-demo
        labels:
          koordinator.sh/qosClass: 'LS'

- Apply the ConfigMap:
  - If ack-slo-config already exists in kube-system, update it to avoid overwriting unrelated settings:

        kubectl patch cm -n kube-system ack-slo-config --patch "$(cat configmap.yaml)"

  - If it does not exist, create it:

        kubectl apply -f configmap.yaml

- (Optional) Configure advanced parameters.
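If you template the cluster-level ConfigMap from a script, the `resource-qos-config` value can be generated instead of hand-written. A minimal sketch, assuming only the `enable` switch shown in the steps above; the function name `cluster_memory_qos_config` is hypothetical.

```python
import json


def cluster_memory_qos_config(ls_enabled: bool, be_enabled: bool) -> str:
    """Render the JSON body for the resource-qos-config key of ack-slo-config."""
    strategy = {
        "clusterStrategy": {
            "lsClass": {"memoryQOS": {"enable": ls_enabled}},
            "beClass": {"memoryQOS": {"enable": be_enabled}},
        }
    }
    return json.dumps(strategy, indent=2)


# Paste the output under data.resource-qos-config in configmap.yaml.
print(cluster_memory_qos_config(ls_enabled=True, be_enabled=True))
```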
Enable memory QoS for a namespace
Use the ack-slo-pod-config ConfigMap to enable or disable memory QoS for pods in specific namespaces.
- Create a file named ack-slo-pod-config.yaml with the following content:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: ack-slo-pod-config
        namespace: kube-system
      data:
        memory-qos: |
          {
            "enabledNamespaces": ["allow-ns"],
            "disabledNamespaces": ["block-ns"]
          }

  Replace allow-ns and block-ns with the actual namespace names.

- Apply the ConfigMap:

      kubectl patch cm -n kube-system ack-slo-pod-config --patch "$(cat ack-slo-pod-config.yaml)"

- (Optional) Configure advanced parameters.
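To sanity-check a `memory-qos` value before applying it, a script can parse the JSON and report what it says about a given namespace. This is a sketch: the function name `namespace_memory_qos` is hypothetical, and because this page does not say what happens when a namespace appears in both lists, the example treats that as an error rather than guessing.

```python
import json
from typing import Optional


def namespace_memory_qos(namespace: str, memory_qos_json: str) -> Optional[bool]:
    """Return True/False if the namespace is listed, or None if it is not mentioned."""
    cfg = json.loads(memory_qos_json)
    enabled = set(cfg.get("enabledNamespaces", []))
    disabled = set(cfg.get("disabledNamespaces", []))
    overlap = enabled & disabled
    if overlap:
        # Assumption of this sketch: overlapping entries are a configuration mistake.
        raise ValueError(f"namespaces listed as both enabled and disabled: {sorted(overlap)}")
    if namespace in enabled:
        return True
    if namespace in disabled:
        return False
    return None  # not listed; the cluster-level configuration applies


config = '{"enabledNamespaces": ["allow-ns"], "disabledNamespaces": ["block-ns"]}'
print(namespace_memory_qos("allow-ns", config))
print(namespace_memory_qos("other-ns", config))
```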
Example: Redis under memory overcommitment
This example shows how memory QoS reduces Redis latency and increases throughput when the node's memory is overcommitted. The test uses:
- An ACK Pro cluster with two nodes, each with 8 vCPUs and 32 GB of memory
- One node running the Redis workload, the other running the stress test
Run the test
- Create a file named redis-demo.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: redis-demo-config
      data:
        redis-config: |
          appendonly yes
          appendfsync no
      ---
      apiVersion: v1
      kind: Pod
      metadata:
        name: redis-demo
        labels:
          name: redis-demo  # Matches the selector of the redis-demo Service below.
          koordinator.sh/qosClass: 'LS'
        annotations:
          koordinator.sh/memoryQOS: '{"policy": "auto"}'
      spec:
        containers:
        - name: redis
          image: redis:5.0.4
          command:
          - redis-server
          - "/redis-master/redis.conf"
          env:
          - name: MASTER
            value: "true"
          ports:
          - containerPort: 6379
          resources:
            limits:
              cpu: "2"
              memory: "6Gi"
            requests:
              cpu: "2"
              memory: "2Gi"
          volumeMounts:
          - mountPath: /redis-master-data
            name: data
          - mountPath: /redis-master
            name: config
        volumes:
        - name: data
          emptyDir: {}
        - name: config
          configMap:
            name: redis-demo-config
            items:
            - key: redis-config
              path: redis.conf
        nodeName: # Set to the name of the node running Redis.
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: redis-demo
      spec:
        ports:
        - name: redis-port
          port: 6379
          protocol: TCP
          targetPort: 6379
        selector:
          name: redis-demo
        type: ClusterIP

- Deploy Redis:

      kubectl apply -f redis-demo.yaml

- Simulate memory overcommitment using the Stress tool. Create a file named stress-demo.yaml:

      apiVersion: v1
      kind: Pod
      metadata:
        name: stress-demo
        labels:
          koordinator.sh/qosClass: 'BE'
        annotations:
          koordinator.sh/memoryQOS: '{"policy": "auto"}'
      spec:
        containers:
        - args:
          - '--vm'
          - '2'
          - '--vm-bytes'
          - 11G
          - '-c'
          - '2'
          - '--vm-hang'
          - '2'
          command:
          - stress
          image: polinux/stress
          imagePullPolicy: Always
          name: stress
        restartPolicy: Always
        nodeName: # Set to the same node as redis-demo.

- Deploy the stress workload:

      kubectl apply -f stress-demo.yaml

- Verify the global minimum watermark on the node before running the benchmark.

  Important: In memory overcommitment scenarios, a low global minimum watermark causes the OOM killer to run before memory reclamation. For a 32 GB node, set this value to at least 4,000,000 KB.

      cat /proc/sys/vm/min_free_kbytes

  Expected output:

      4000000

- Deploy the memtier-benchmark tool to send requests to the Redis pod:

      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          name: memtier-demo
        name: memtier-demo
      spec:
        containers:
        - command:
          - memtier_benchmark
          - '-s'
          - 'redis-demo'
          - '--data-size'
          - '200000'
          - "--ratio"
          - "1:4"
          image: 'redislabs/memtier_benchmark:1.3.0'
          name: memtier
        restartPolicy: Never
        nodeName: # Set to the name of the node sending requests.

- Check the benchmark results:

      kubectl logs -f memtier-demo

- To compare, disable memory QoS on both pods and repeat the test:

      apiVersion: v1
      kind: Pod
      metadata:
        name: redis-demo
        labels:
          koordinator.sh/qosClass: 'LS'
        annotations:
          koordinator.sh/memoryQOS: '{"policy": "none"}'
      spec:
        ...
      ---
      apiVersion: v1
      kind: Pod
      metadata:
        name: stress-demo
        labels:
          koordinator.sh/qosClass: 'BE'
        annotations:
          koordinator.sh/memoryQOS: '{"policy": "none"}'
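The watermark check from the steps above can also be scripted on the node. A minimal sketch: the helper name `watermark_ok` is hypothetical, and the 4,000,000 KB threshold is the value this example recommends for a 32 GB node, not a general rule.

```python
from pathlib import Path

# Threshold recommended above for a 32 GB node, in KB.
RECOMMENDED_MIN_FREE_KB = 4_000_000


def watermark_ok(raw: str, minimum_kb: int = RECOMMENDED_MIN_FREE_KB) -> bool:
    """Check the contents of /proc/sys/vm/min_free_kbytes against a threshold."""
    return int(raw.strip()) >= minimum_kb


if __name__ == "__main__":
    path = Path("/proc/sys/vm/min_free_kbytes")
    if path.exists():  # only present on Linux hosts
        print("watermark ok:", watermark_ok(path.read_text()))
```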
Test results
The following data is for reference only. Actual results depend on your cluster configuration and workload.
| Metric | Memory QoS disabled | Memory QoS enabled |
|---|---|---|
| Latency-avg | 51.32 ms | 47.25 ms |
| Throughput-avg | 149.0 MB/s | 161.9 MB/s |
Enabling memory QoS reduced Redis latency by 7.9% and increased throughput by 8.7% under memory overcommitment.
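The percentages follow directly from the table. As a quick check, using a hypothetical helper `percent_change`:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


latency = percent_change(51.32, 47.25)      # average latency, ms
throughput = percent_change(149.0, 161.9)   # average throughput, MB/s

print(f"latency change: {latency:.1f}%")        # -7.9%
print(f"throughput change: {throughput:+.1f}%")  # +8.7%
```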
Advanced parameters
The following table lists the advanced parameters you can set in pod annotations or the ack-slo-config ConfigMap. Pod annotations take precedence over ConfigMap settings.
<table>
<thead>
<tr>
<td><p><b>Parameter</b></p></td>
<td><p><b>Type</b></p></td>
<td><p><b>Value range</b></p></td>
<td><p><b>Description</b></p></td>
<td><p><b>Pod annotation</b></p></td>
<td><p><b>ConfigMap</b></p></td>
</tr>
</thead>
<tbody>
<tr>
<td><p><code>enable</code></p></td>
<td><p>Boolean</p></td>
<td>
<ul>
<li><p><code>true</code></p></li>
<li><p><code>false</code></p></li>
</ul></td>
<td>
<ul>
<li><p><code>true</code>: enables memory QoS for all containers in a cluster. The recommended memcg settings for the QoS class of the containers are used.</p></li>
<li><p><code>false</code>: disables memory QoS for all containers in a cluster. The memcg settings are restored to the original settings for the QoS class of the containers.</p></li>
</ul></td>
<td><p><img></p></td>
<td><p><img></p></td>
</tr>
<tr>
<td><p><code>policy</code></p></td>
<td><p>String</p></td>
<td>
<ul>
<li><p><code>auto</code></p></li>
<li><p><code>default</code></p></li>
<li><p><code>none</code></p></li>
</ul></td>
<td>
<ul>
<li><p><code>auto</code>: enables memory QoS for the containers in the pod and uses the recommended settings. The recommended settings take precedence over the settings that are specified in the ack-slo-pod-config ConfigMap.</p></li>
<li><p><code>default</code>: specifies that the pod inherits the settings that are specified in the ack-slo-pod-config ConfigMap.</p></li>
<li><p><code>none</code>: disables memory QoS for the pod. The relevant memcg settings are restored to the original settings. The original settings take precedence over the settings that are specified in the ack-slo-pod-config ConfigMap.</p></li>
</ul></td>
<td><p><img></p></td>
<td><p><img></p></td>
</tr>
<tr>
<td><p><code>minLimitPercent</code></p></td>
<td><p>Int</p></td>
<td><p>0~100</p></td>
<td><p>Unit: %. Default value: <code>0</code>. The default value indicates that this parameter is disabled.</p><p>This parameter specifies the unreclaimable proportion of the memory request of a pod. It suits applications that are sensitive to the page cache: keeping cached files in memory improves read and write performance. For more information, see the Alibaba Cloud Linux topic <a href="https://www.alibabacloud.com/help/en/document_detail/169536.html#concept-2482889">Memcg QoS feature of the cgroup v1 interface</a>.</p><p>The amount of unreclaimable memory is calculated based on the following formula: <code>memory.min = Memory request × minLimitPercent / 100</code>. For example, if you specify <code>Memory Request=100MiB</code> and <code>minLimitPercent=100</code> for a container, the value of <code>memory.min</code> is <code>104857600</code>.</p></td>
<td><p><img></p></td>
<td><p><img></p></td>
</tr>
<tr>
<td><p><code>lowLimitPercent</code></p></td>
<td><p>Int</p></td>
<td><p>0~100</p></td>
<td><p>Unit: %. Default value: <code>0</code>. The default value indicates that this parameter is disabled.</p><p>This parameter specifies the relatively unreclaimable proportion of the memory request of a pod. For more information, see the Alibaba Cloud Linux topic <a href="https://www.alibabacloud.com/help/en/document_detail/169536.html#concept-2482889">Memcg QoS feature of the cgroup v1 interface</a>.</p><p>The amount of relatively unreclaimable memory is calculated based on the following formula: <code>memory.low = Memory request × lowLimitPercent / 100</code>. For example, if you specify <code>Memory Request=100MiB</code> and <code>lowLimitPercent=100</code> for a container, the value of <code>memory.low</code> is <code>104857600</code>.</p></td>
<td><p><img></p></td>
<td><p><img></p></td>
</tr>
<tr>
<td><p><code>throttlingPercent</code></p></td>
<td><p>Int</p></td>
<td><p>0~100</p></td>
<td><p>Unit: %. Default value: <code>0</code>. The default value indicates that this parameter is disabled.</p><p>This parameter specifies the memory throttling threshold as a percentage of the container's memory limit. If the memory usage of a container exceeds the throttling threshold, the memory used by the container is reclaimed. This parameter is suitable for container memory overcommitment scenarios, where it helps prevent cgroups from triggering OOM. For more information, see the Alibaba Cloud Linux topic <a href="https://www.alibabacloud.com/help/en/document_detail/169536.html#concept-2482889">Memcg QoS feature of the cgroup v1 interface</a>.</p><p>The memory throttling threshold is calculated based on the following formula: <code>memory.high = Memory limit × throttlingPercent / 100</code>. For example, if you specify <code>Memory Limit=100MiB</code> and <code>throttlingPercent=80</code> for a container, the value of <code>memory.high</code> is <code>83886080</code> (80 MiB).</p></td>
<td><p><img></p></td>
<td><p><img></p></td>
</tr>
<tr>
<td><p><code>wmarkRatio</code></p></td>
<td><p>Int</p></td>
<td><p>0~100</p></td>
<td><p>Unit: %. Default value: <code>95</code>. A value of <code>0</code> indicates that this parameter is disabled. If the memory usage exceeds the reclamation threshold, the memcg backend asynchronous reclamation feature is triggered.</p><p>This parameter specifies the asynchronous reclamation threshold as a percentage of the memory limit or, if throttling is enabled, of the value of <code>memory.high</code>. For more information, see the Alibaba Cloud Linux topic <a href="https://www.alibabacloud.com/help/en/document_detail/169535.html#task-2487938">Memcg backend asynchronous reclaim</a>.</p><p>If <code>throttlingPercent</code> is disabled, the reclamation threshold is calculated based on the following formula: <code>memory.wmark_high = Memory limit × wmarkRatio / 100</code>. If <code>throttlingPercent</code> is enabled, the reclamation threshold is calculated based on the following formula: <code>memory.wmark_high = memory.high × wmarkRatio / 100</code>. For example, if you specify <code>Memory Limit=100MiB</code>, <code>wmarkRatio=95</code>, and <code>throttlingPercent=80</code> for a container, the throttling threshold <code>memory.high</code> is <code>83886080</code> (80 MiB), the reclamation ratio <code>memory.wmark_ratio</code> is <code>95</code>, and the reclamation threshold <code>memory.wmark_high</code> is <code>79691776</code> (76 MiB).</p></td>
<td><p><img></p></td>
<td><p><img></p></td>
</tr>
<tr>
<td><p><code>wmarkMinAdj</code></p></td>
<td><p>Int</p></td>
<td><p>-25~50</p></td>
<td><p>Unit: %. The default value is <code>-25</code> for the <code>LS</code> QoS class and <code>50</code> for the <code>BE</code> QoS class. A value of <code>0</code> indicates that this parameter is disabled.</p><p>This parameter specifies the adjustment to the global minimum watermark for a container. A negative value decreases the global minimum watermark and therefore postpones memory reclamation for the container. A positive value increases the global minimum watermark and therefore brings memory reclamation forward for the container. For more information, see the Alibaba Cloud Linux topic <a href="https://www.alibabacloud.com/help/en/document_detail/169537.html#task-2492619">Memcg global minimum watermark rating</a>.</p><p>For example, if you create a pod whose QoS class is LS, the default setting of this parameter is <code>memory.wmark_min_adj=-25</code>, which indicates that the minimum watermark is decreased by 25% for the containers in the pod.</p></td>
<td><p><img></p></td>
<td><p><img></p></td>
</tr>
</tbody>
</table>
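The formulas in the table can be checked with a few lines of arithmetic. The sketch below reproduces the worked examples from the table; the function name `memcg_values` is hypothetical, and the use of integer (floor) division is an assumption of this example, since the page does not state how fractional byte counts are rounded.

```python
MIB = 1024 * 1024


def memcg_values(request_bytes: int, limit_bytes: int,
                 min_limit_percent: int = 0, low_limit_percent: int = 0,
                 throttling_percent: int = 0, wmark_ratio: int = 95) -> dict:
    """Compute memcg values from the formulas in the table (0 means disabled)."""
    values = {
        "memory.min": request_bytes * min_limit_percent // 100,
        "memory.low": request_bytes * low_limit_percent // 100,
    }
    if throttling_percent > 0:
        # Throttling enabled: the async reclaim threshold is based on memory.high.
        values["memory.high"] = limit_bytes * throttling_percent // 100
        base = values["memory.high"]
    else:
        # Throttling disabled: the threshold is based on the memory limit.
        base = limit_bytes
    if wmark_ratio > 0:
        values["memory.wmark_high"] = base * wmark_ratio // 100
    return values


# Example from the table: 100 MiB request and limit, throttlingPercent=80, wmarkRatio=95.
print(memcg_values(100 * MIB, 100 * MIB, min_limit_percent=100,
                   throttling_percent=80, wmark_ratio=95))
```

With these inputs the result contains `memory.min` = 104857600, `memory.high` = 83886080, and `memory.wmark_high` = 79691776, matching the values quoted in the table.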
FAQ
Is the memory QoS configuration from ack-slo-manager still valid after upgrading to ack-koordinator?
Yes. ack-koordinator is backward compatible with the annotation-based protocol used in ack-slo-manager 0.8.0 and earlier:
- alibabacloud.com/qosClass: sets the QoS class
- alibabacloud.com/memoryQOS: configures memory QoS
The following table shows which protocols each version supports:
| Component version | alibabacloud.com protocol | koordinator.sh protocol |
|---|---|---|
| ≥ 0.3.0 and < 0.8.0 | ✓ | × |
| ≥ 0.8.0 | ✓ | ✓ |
Compatibility support for the alibabacloud.com protocol ended on July 30, 2023. Migrate your configurations to the koordinator.sh protocol.
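A migration script might rename the legacy keys as sketched below. This is an illustration under a loudly stated assumption: it assumes the values carry over unchanged between the two protocols, which this page does not confirm, so verify the result against the koordinator.sh documentation before applying it. The function name `migrate_metadata` is hypothetical; apply it to a pod's labels or annotations mapping.

```python
LEGACY_PREFIX = "alibabacloud.com/"
NEW_PREFIX = "koordinator.sh/"
# The two legacy keys named in this FAQ.
MIGRATED_KEYS = ("qosClass", "memoryQOS")


def migrate_metadata(entries: dict) -> dict:
    """Rename legacy alibabacloud.com keys to koordinator.sh keys.

    Assumption: values are copied as-is. An existing koordinator.sh key wins
    over the legacy one if both are present.
    """
    migrated = {}
    for key, value in entries.items():
        suffix = key[len(LEGACY_PREFIX):] if key.startswith(LEGACY_PREFIX) else None
        if suffix in MIGRATED_KEYS:
            migrated.setdefault(NEW_PREFIX + suffix, value)
        else:
            migrated[key] = value
    return migrated


print(migrate_metadata({
    "alibabacloud.com/memoryQOS": '{"policy": "auto"}',
    "app": "redis",
}))
```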
Billing
No fee is charged for installing or using the ack-koordinator component. However, costs may apply in the following cases:
- Node resource usage: ack-koordinator is a non-managed component that runs on worker nodes. You can configure the resource requests for each module at install time.
- Prometheus metrics: If you enable Prometheus metrics for ack-koordinator and use Managed Service for Prometheus, the metrics are billed as custom metrics. Before enabling this feature, review the Managed Service for Prometheus billing rules. To monitor usage, see Query the amount of observable data and bills.
What's next
- Overview of kernel features and interfaces: kernel features required by ACK memory QoS
- Enable CPU QoS for containers: limit and evict reclaimed resources to protect latency-sensitive workloads