Getting started

Use ack-koordinator to quickly set up a colocation environment and run workloads in colocation mode. This topic explains how to enable colocation policies and deploy LS and BE workloads on the same node.

Prerequisites

Ensure you have:

An ACK Pro cluster
ack-koordinator (formerly ack-slo-manager) 0.8.0 or later installed
(Recommended) ECS bare metal instances running Alibaba Cloud Linux

Key concepts

ack-koordinator uses resource priority and QoS class to control how online and offline workloads share a node.

Resource priorities

Resource priority determines how much node capacity a workload can use.

Priority	How resources are calculated	Resource name
Product	Equals the node's physical resources	CPU and memory reported by the node
Batch	Dynamically calculated: total physical resources − Product resources in use. See Dynamic resource overcommitment.	`kubernetes.io/batch-cpu` and `kubernetes.io/batch-memory` (extended resources in node metadata)

Allocated but unused Product resources are automatically downgraded to Batch for reclamation.
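
To see how much Batch capacity a node currently advertises, you can read the two extended resources from the node object. The following is a minimal sketch, not an official procedure: the function name `show_batch_resources` and the node name in the usage comment are placeholders, and it assumes the values are published under `status.allocatable`, which is where Kubernetes reports extended resources.

```shell
# Print the Batch CPU (in millicores) and Batch memory that a node advertises.
# Assumption: ack-koordinator publishes both values under status.allocatable.
show_batch_resources() {
  node_name="$1"  # placeholder: pass the name of one of your nodes
  kubectl get node "$node_name" \
    -o jsonpath='{.status.allocatable.kubernetes\.io/batch-cpu}{"\n"}{.status.allocatable.kubernetes\.io/batch-memory}{"\n"}'
}

# Usage (hypothetical node name):
#   show_batch_resources cn-hangzhou.192.168.0.1
```

Empty output most likely means that colocation is not yet enabled for the node.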

QoS classes

QoS class determines scheduling priority and isolation when resources are constrained.

QoS class	Typical workloads	Behavior
LS (Latency Sensitive)	Web services, microservices, latency-sensitive stream computing	Takes priority in CPU scheduling, L3 cache, and memory bandwidth; under memory pressure, memory is reclaimed from BE pods first
BE (Best Effort)	Batch Spark jobs, MapReduce jobs, AI training jobs, video transcoding	Lower CPU priority than LS; L3 cache and memory bandwidth are limited; under memory pressure, its memory is reclaimed before that of LS pods

How resource reclamation works

```
Node capacity
├── Product limit      ← Resources requested by LS pods
│   └── Actual usage  ← Varies over time (often well below limit)
│       └── Reclaimable = limit − usage ← Available for BE pods
└── BE pods run on reclaimable resources
```

BE workloads consume otherwise-idle resources without affecting online service performance.
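
As a worked example of the `limit − usage` relationship, the sketch below computes the reclaimable CPU for a single LS pod. The function name and the numbers are hypothetical, and the calculation is deliberately simplified: it ignores any safety margin that ack-koordinator may apply when it computes Batch resources.

```shell
# Reclaimable CPU for one LS pod: what it requested minus what it is using.
# The result is clamped at 0 so that a pod using more than it requested
# never yields a negative value.
reclaimable_millicores() {
  requested="$1"  # millicores requested by the LS pod
  used="$2"       # millicores the LS pod is actually consuming
  result=$(( requested - used ))
  if [ "$result" -lt 0 ]; then
    result=0
  fi
  echo "$result"
}

# Hypothetical numbers: the pod requests 8 cores but uses only 2.5 cores.
reclaimable_millicores 8000 2500   # prints 5500
```

In this example, 5.5 cores that the LS pod reserved but does not use can be offered to BE pods.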

Valid combinations

Resource priority and QoS class are independent settings, but only two combinations are used in practice:

Product + LS: Online, latency-sensitive applications (web apps, stream computing)
Batch + BE: Offline, lower-priority applications (Spark jobs, MapReduce jobs, AI training)

Enable colocation policies

ack-koordinator reads colocation policies from the ack-slo-config ConfigMap in the kube-system namespace.

Create configmap.yaml with the following content:

```
# Example of the ack-slo-config ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-slo-config
  namespace: kube-system
data:
  colocation-config: |-
    {
      "enable": true
    }
  resource-qos-config: |-
    {
      "clusterStrategy": {
        "lsClass": {
          "cpuQOS": {
            "enable": true
          },
          "memoryQOS": {
            "enable": true
          },
          "resctrlQOS": {
            "enable": true
          }
        },
        "beClass": {
          "cpuQOS": {
            "enable": true
          },
          "memoryQOS": {
            "enable": true
          },
          "resctrlQOS": {
            "enable": true
          }
        }
      }
    }
  resource-threshold-config: |-
    {
      "clusterStrategy": {
        "enable": true
      }
    }
```

The ConfigMap includes three policies:

Policy key	What it does
`colocation-config`	Enables real-time node load monitoring and identifies overcommittable resources. See Dynamic resource overcommitment.
`resource-qos-config`	Enables fine-grained resource management for LS and BE workloads, including CPU QoS, Memory QoS, and L3 cache and MBA isolation.
`resource-threshold-config`	Dynamically limits BE resources based on node utilization watermarks. See Elastic resource limit.

Apply the ConfigMap:
```
kubectl apply -f configmap.yaml
```
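
If ack-slo-config may already exist in the cluster, for example because other policies were configured earlier, merging the new keys into it can be safer than replacing the object. The sketch below shows one way to do that, under that assumption. The function name `apply_slo_config` is hypothetical, and it relies only on the standard `kubectl get`, `kubectl patch`, and `kubectl apply` commands.

```shell
# Patch ack-slo-config when it already exists; otherwise create it.
apply_slo_config() {
  file="${1:-configmap.yaml}"
  if kubectl get configmap ack-slo-config -n kube-system >/dev/null 2>&1; then
    # The ConfigMap exists: merge the keys from the file into it.
    kubectl patch configmap ack-slo-config -n kube-system --patch "$(cat "$file")"
  else
    # The ConfigMap does not exist yet: create it from the file.
    kubectl apply -f "$file"
  fi
}

# Usage:
#   apply_slo_config configmap.yaml
#   kubectl get configmap ack-slo-config -n kube-system -o yaml   # confirm the three policy keys
```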

Deploy workloads

Deploy an LS (online) workload and a BE (offline) workload on the same node. The koordinator.sh/qosClass pod label assigns each pod its QoS class.

Deploy the LS workload (NGINX)

Create nginx-ls-pod.yaml. The koordinator.sh/qosClass: LS label marks this pod as latency-sensitive:

```
---
# Nginx application configuration
apiVersion: v1
data:
  config: |-
    user  nginx;
    worker_processes  80;  # The number of Nginx worker processes, which affects concurrency.

    events {
        worker_connections  1024;  # Default value is 1024.
    }

    http {
        server {
            listen  8000;

            gzip off;
            gzip_min_length 32;
            gzip_http_version 1.0;
            gzip_comp_level 3;
            gzip_types *;
        }
    }

    #daemon off;
kind: ConfigMap
metadata:
  name: nginx-conf

---
# Manifest for the nginx-ls-pod.
apiVersion: v1
kind: Pod
metadata:
  labels:
    koordinator.sh/qosClass: LS
    app: nginx
  name: nginx
spec:
  containers:
    - image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
      imagePullPolicy: IfNotPresent
      name: nginx
      ports:
        - containerPort: 8000
          hostPort: 8000 # The host port that will receive requests for load testing.
          protocol: TCP
      resources:
        limits:
          cpu: '8'
          memory: 1Gi
        requests:
          cpu: '8'
          memory: 1Gi
      volumeMounts:
        - mountPath: /apps/nginx/conf
          name: config
  hostNetwork: true
  restartPolicy: Never
  volumes:
    - configMap:
        items:
          - key: config
            path: nginx.conf
        name: nginx-conf
      name: config
```

Apply the manifest:
```
kubectl apply -f nginx-ls-pod.yaml
```
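
Before you deploy the BE workload, it can help to confirm that the LS pod is running and carries the expected QoS label. The following is a sketch only: the function name `check_ls_pod` and the 120-second timeout are arbitrary choices, not values from this topic.

```shell
# Wait for the LS pod to become Ready, then print its QoS class label.
check_ls_pod() {
  pod="${1:-nginx}"  # defaults to the pod name used in nginx-ls-pod.yaml
  kubectl wait --for=condition=Ready "pod/$pod" --timeout=120s >/dev/null &&
    kubectl get pod "$pod" -o jsonpath='{.metadata.labels.koordinator\.sh/qosClass}{"\n"}'
}

# Usage:
#   check_ls_pod nginx
```

The function prints the value of the label only when the pod reaches the Ready condition within the timeout.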

Deploy the BE workload (FFmpeg)

Create ffmpeg-be-pod.yaml. The koordinator.sh/qosClass: BE label marks this pod as best-effort. Its resource requests and limits use the Batch extended resources kubernetes.io/batch-cpu and kubernetes.io/batch-memory instead of standard CPU and memory:

```
apiVersion: v1
kind: Pod
metadata:
  labels:
    koordinator.sh/qosClass: BE
  name: be-ffmpeg
spec:
  containers:
    - command:
        - start-ffmpeg.sh
        - '30'
        - '2'
        - /apps/ffmpeg/input/HD2-h264.ts
        - /apps/ffmpeg/
      image: 'registry.cn-zhangjiakou.aliyuncs.com/acs/ffmpeg-4-4-1-for-slo-test:v0.1'
      imagePullPolicy: Always
      name: ffmpeg
      resources:
        limits:
          # batch-cpu is expressed in millicores: 70k = 70,000 millicores = 70 cores.
          kubernetes.io/batch-cpu: "70k"
          kubernetes.io/batch-memory: "22Gi"
        requests:
          # batch-cpu is expressed in millicores: 70k = 70,000 millicores = 70 cores.
          kubernetes.io/batch-cpu: "70k"
          kubernetes.io/batch-memory: "22Gi"
```

Apply the manifest:
```
kubectl apply -f ffmpeg-be-pod.yaml
```
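
The manifests as shown do not pin either pod to a specific node, so in a multi-node cluster it is worth checking that both pods actually landed on the same node. The sketch below is one way to do that; the helper names `pod_node` and `colocated` are hypothetical.

```shell
# Print the name of the node that a pod was scheduled to.
pod_node() {
  kubectl get pod "$1" -o jsonpath='{.spec.nodeName}'
}

# Succeed only when both pods are scheduled and run on the same node.
colocated() {
  first="$(pod_node "$1")"
  second="$(pod_node "$2")"
  [ -n "$first" ] && [ "$first" = "$second" ]
}

# Usage with the pod names from the two manifests:
#   colocated nginx be-ffmpeg && echo "same node" || echo "different nodes"
```

If the pods end up on different nodes, a node selector or the nodeName field in both pod specs can place them together.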

Next steps

After both pods are running, explore ACK colocation capabilities:

See it in action

Colocate online services and video transcoding applications: an end-to-end example using this setup

Resource management

CPU control