This guide walks you through enabling colocation policies and deploying a latency-sensitive (LS) application alongside a best-effort (BE) batch workload on the same node.
Prerequisites
Before you begin, ensure that you have:
- ack-koordinator 0.8.0 or later installed (formerly ack-slo-manager)
- (Recommended) ECS bare metal instances running Alibaba Cloud Linux for optimal colocation performance
Key concepts
ack-koordinator uses two independent dimensions—resource priority and QoS class—to control how online and offline workloads share a node. Use them together to define each workload's resource entitlement and isolation behavior.
Resource priorities
Resource priority determines how much of a node's capacity a workload can use.
| Priority | How resources are calculated | Resource name |
|---|---|---|
| Product | Equals the amount of physical resources provided by the node | CPU and memory reported by the node |
| Batch | Dynamically calculated: total physical resources − Product resources currently in use. See Dynamic resource overcommitment. | kubernetes.io/batch-cpu and kubernetes.io/batch-memory (extended resources in node metadata) |
Product resources that are allocated but unused are automatically downgraded to Batch and made available for reclamation.
QoS classes
QoS class determines scheduling priority and isolation behavior when resources are constrained.
| QoS class | Typical workloads | Behavior |
|---|---|---|
| LS (Latency Sensitive) | Web services, microservices, latency-sensitive stream computing | Prioritized in CPU time slice scheduling; prioritized in L3 cache and memory bandwidth allocation; memory is reclaimed from BE workloads first |
| BE (Best Effort) | Batch Spark jobs, MapReduce jobs, AI training jobs, video transcoding | Lower CPU scheduling priority than LS; L3 cache and memory bandwidth are limited; memory is reclaimed from BE workloads before LS workloads |
How resource reclamation works
    Node capacity
    ├── Product limit                        ← Resources requested by LS pods
    │   └── Actual usage                     ← Varies over time (often well below limit)
    │       └── Reclaimable = limit − usage  ← Available for BE pods
    └── BE pods run on reclaimable resources
BE workloads consume resources that would otherwise sit idle, without affecting online service performance.
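The arithmetic behind the diagram can be sketched in a few lines of Python. This is a simplified illustration that follows the formula stated in this guide (Batch capacity = total physical resources − Product resources currently in use). The `LSPod` record, the function names, and the sample numbers are hypothetical, and the calculation that ack-koordinator actually performs may apply additional reservations and thresholds.

```python
from dataclasses import dataclass


@dataclass
class LSPod:
    """Hypothetical record of one LS pod's CPU request and measured usage (millicores)."""
    name: str
    requested_milli: int
    used_milli: int


def reclaimable_milli(pod: LSPod) -> int:
    """Requested-but-idle CPU of one LS pod: request minus usage, never negative."""
    return max(pod.requested_milli - pod.used_milli, 0)


def batch_capacity_milli(node_total_milli: int, pods: list[LSPod]) -> int:
    """Simplified Batch capacity: total physical CPU minus Product CPU in use."""
    used = sum(p.used_milli for p in pods)
    return max(node_total_milli - used, 0)


if __name__ == "__main__":
    # Illustrative numbers only: a 32-core node running two LS pods.
    pods = [LSPod("web", 8000, 2500), LSPod("api", 4000, 1000)]
    for p in pods:
        print(p.name, "reclaimable millicores:", reclaimable_milli(p))
    print("batch capacity millicores:", batch_capacity_milli(32000, pods))
```

Because usage changes over time, the value returned by `batch_capacity_milli` would need to be recomputed on every sample, which is why the Batch resources are described as dynamically calculated.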
Valid combinations
Resource priority and QoS class are independent, but only two combinations are used in practice:
- Product + LS: Online, latency-sensitive applications (web apps, stream computing)
- Batch + BE: Offline, lower-priority applications (Spark jobs, MapReduce jobs, AI training)
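As a rough illustration of these two pairings, the following Python sketch checks a pod manifest (already loaded as a dictionary) for mismatches: an LS pod that requests Batch resources, or a BE pod that requests standard CPU and memory. The `check_pod` helper is hypothetical and is not part of ack-koordinator. It only covers the LS and BE classes discussed in this guide.

```python
QOS_LABEL = "koordinator.sh/qosClass"
BATCH_RESOURCES = {"kubernetes.io/batch-cpu", "kubernetes.io/batch-memory"}
PRODUCT_RESOURCES = {"cpu", "memory"}


def check_pod(pod: dict) -> list[str]:
    """Return a list of problems; an empty list means the pairing looks valid."""
    qos = pod.get("metadata", {}).get("labels", {}).get(QOS_LABEL)
    if qos not in ("LS", "BE"):
        return [f"missing or unsupported {QOS_LABEL} label: {qos!r}"]

    # LS pods should not use Batch resources; BE pods should not use Product ones.
    forbidden = BATCH_RESOURCES if qos == "LS" else PRODUCT_RESOURCES
    problems = []
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            for name in resources.get(section, {}):
                if name in forbidden:
                    problems.append(
                        f"{container.get('name')}: {section}.{name} "
                        f"does not match QoS class {qos}"
                    )
    return problems


if __name__ == "__main__":
    # A BE pod that mistakenly requests standard CPU instead of batch-cpu.
    bad_pod = {
        "metadata": {"labels": {QOS_LABEL: "BE"}},
        "spec": {"containers": [
            {"name": "job", "resources": {"requests": {"cpu": "2"}}},
        ]},
    }
    for problem in check_pod(bad_pod):
        print(problem)
```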
Enable colocation policies
ack-koordinator reads colocation policies from the ack-slo-config ConfigMap in the kube-system namespace.
- Create configmap.yaml with the following content:

      # Example of the ack-slo-config ConfigMap.
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: ack-slo-config
        namespace: kube-system
      data:
        colocation-config: |-
          {
            "enable": true
          }
        resource-qos-config: |-
          {
            "clusterStrategy": {
              "lsClass": {
                "cpuQOS": {
                  "enable": true
                },
                "memoryQOS": {
                  "enable": true
                },
                "resctrlQOS": {
                  "enable": true
                }
              },
              "beClass": {
                "cpuQOS": {
                  "enable": true
                },
                "memoryQOS": {
                  "enable": true
                },
                "resctrlQOS": {
                  "enable": true
                }
              }
            }
          }
        resource-threshold-config: |-
          {
            "clusterStrategy": {
              "enable": true
            }
          }

  The ConfigMap includes three policies:

  | Policy key | What it does |
  |---|---|
  | colocation-config | Enables real-time node load monitoring and identifies resources available for overcommitment. See Dynamic resource overcommitment. |
  | resource-qos-config | Enables fine-grained resource management for LS and BE workloads, including CPU QoS, Memory QoS, and L3 cache and MBA isolation. |
  | resource-threshold-config | Dynamically limits resources available to BE workloads based on node utilization watermarks. See Elastic resource limit. |

- Apply the ConfigMap:

      kubectl apply -f configmap.yaml
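Each policy value in the ConfigMap is a JSON document embedded as a string, so a typo such as a trailing comma is easy to miss in YAML. The sketch below shows one way to sanity-check the three values before applying them. The `parse_policies` helper is hypothetical, it assumes the `data` section has already been loaded into a dictionary of strings, and it checks only JSON syntax and the presence of the three keys used in this guide, not the policy semantics.

```python
import json

EXPECTED_KEYS = (
    "colocation-config",
    "resource-qos-config",
    "resource-threshold-config",
)


def parse_policies(data: dict[str, str]) -> dict[str, dict]:
    """Parse each policy value as JSON; raise ValueError naming the offending key."""
    parsed = {}
    for key in EXPECTED_KEYS:
        if key not in data:
            raise ValueError(f"missing policy key: {key}")
        try:
            parsed[key] = json.loads(data[key])
        except json.JSONDecodeError as exc:
            raise ValueError(f"{key} is not valid JSON: {exc}") from exc
    return parsed


if __name__ == "__main__":
    # Values copied from the example ConfigMap (resource-qos-config shortened).
    data = {
        "colocation-config": '{ "enable": true }',
        "resource-qos-config": '{ "clusterStrategy": { "lsClass": { "cpuQOS": { "enable": true } } } }',
        "resource-threshold-config": '{ "clusterStrategy": { "enable": true } }',
    }
    policies = parse_policies(data)
    print("colocation enabled:", policies["colocation-config"]["enable"])
```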
Deploy workloads
Deploy an LS (online) workload and a BE (offline) workload to the same node. Set the QoS class using the koordinator.sh/qosClass label on each pod.
Deploy the LS workload (NGINX)
- Create nginx-ls-pod.yaml with the following content. The koordinator.sh/qosClass: LS label marks this pod as a latency-sensitive workload:

      ---
      # Nginx application configuration
      apiVersion: v1
      data:
        config: |-
          user nginx;
          worker_processes 80; # The number of Nginx worker processes, which affects concurrency.
          events {
              worker_connections 1024; # Default value is 1024.
          }
          http {
              server {
                  listen 8000;
                  gzip off;
                  gzip_min_length 32;
                  gzip_http_version 1.0;
                  gzip_comp_level 3;
                  gzip_types *;
              }
          }
          #daemon off;
      kind: ConfigMap
      metadata:
        name: nginx-conf
      ---
      # Manifest for the nginx-ls-pod.
      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          koordinator.sh/qosClass: LS
          app: nginx
        name: nginx
      spec:
        containers:
          - image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
            imagePullPolicy: IfNotPresent
            name: nginx
            ports:
              - containerPort: 8000
                hostPort: 8000 # The host port that will receive requests for load testing.
                protocol: TCP
            resources:
              limits:
                cpu: '8'
                memory: 1Gi
              requests:
                cpu: '8'
                memory: 1Gi
            volumeMounts:
              - mountPath: /apps/nginx/conf
                name: config
        hostNetwork: true
        restartPolicy: Never
        volumes:
          - configMap:
              items:
                - key: config
                  path: nginx.conf
              name: nginx-conf
            name: config

- Apply the manifest:

      kubectl apply -f nginx-ls-pod.yaml
Deploy the BE workload (FFmpeg)
- Create ffmpeg-be-pod.yaml with the following content. The koordinator.sh/qosClass: BE label marks this pod as a best-effort workload. Resource requests and limits are specified using kubernetes.io/batch-cpu and kubernetes.io/batch-memory instead of the standard cpu and memory fields:

      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          koordinator.sh/qosClass: BE
        name: be-ffmpeg
      spec:
        containers:
          - command:
              - start-ffmpeg.sh
              - '30'
              - '2'
              - /apps/ffmpeg/input/HD2-h264.ts
              - /apps/ffmpeg/
            image: 'registry.cn-zhangjiakou.aliyuncs.com/acs/ffmpeg-4-4-1-for-slo-test:v0.1'
            imagePullPolicy: Always
            name: ffmpeg
            resources:
              limits:
                # Unit: millicores.
                kubernetes.io/batch-cpu: "70k"
                kubernetes.io/batch-memory: "22Gi"
              requests:
                # Unit: millicores.
                kubernetes.io/batch-cpu: "70k"
                kubernetes.io/batch-memory: "22Gi"

- Apply the manifest:

      kubectl apply -f ffmpeg-be-pod.yaml
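The quantities in the BE manifest are easy to misread: the manifest comments state that kubernetes.io/batch-cpu is expressed in millicores, so "70k" means 70,000 millicores (70 cores), and "22Gi" uses the binary suffix Gi (2^30 bytes). The following sketch makes that conversion explicit. The `parse_quantity` helper is hypothetical and handles only a small subset of Kubernetes quantity suffixes. It does not support the `m` suffix or exponent notation.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes: binary (Ki, Mi, ...) and decimal (k, M, ...).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string such as '70k' or '22Gi' into a plain number."""
    # Try two-character suffixes first so that 'Gi' is not mistaken for another suffix.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


if __name__ == "__main__":
    batch_cpu_milli = parse_quantity("70k")      # millicores, per the manifest comment
    batch_memory_bytes = parse_quantity("22Gi")  # bytes
    print("batch-cpu cores:", batch_cpu_milli / 1000)
    print("batch-memory bytes:", batch_memory_bytes)
```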
What's next
After both pods are running, explore the colocation capabilities available in ACK:
See it in action
- Colocate online services and video transcoding applications: an end-to-end example using the setup from this guide
Resource management
CPU control