ack-koordinator provides service level objective (SLO)-aware workload scheduling that lets you colocate online and offline workloads on the same node — keeping your online service performant while improving overall cluster resource utilization. This topic walks through colocating an NGINX web service and an FFmpeg video transcoding application using ack-koordinator.
Background
Colocating online and offline workloads on the same node makes sense because their resource demands are complementary: online services have variable load and consume resources in bursts, while offline batch jobs run continuously and can tolerate lower resource priority.
| Characteristic | Online workload | Offline workload |
|---|---|---|
| Typical applications | Web services, APIs, microservices | Video transcoding, big data processing, AI training |
| Latency | Sensitive | Insensitive |
| SLO | High | Low |
| Resource usage pattern | Bursty, time-based | Continuous |
| Fault tolerance | Low — requires high availability | High — allows failure and retry |
ack-koordinator uses Quality of Service (QoS) classes to manage resource priority between colocated workloads. The two classes used in this topic are:
| QoS class | Label value | Typical use | CPU priority | Memory priority |
|---|---|---|---|---|
| Latency-sensitive (LS) | koordinator.sh/qosClass: LS | Online services (e.g., NGINX) | High | High |
| Best-effort (BE) | koordinator.sh/qosClass: BE | Offline batch jobs (e.g., FFmpeg) | Low | Low |
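The QoS class is an ordinary pod label, so a label selector is enough to see which pods fall into each class. The helper below is a minimal sketch. Its name, `list_pods_by_qos`, is hypothetical, and it accepts only LS and BE because those are the two classes used in this topic.

```shell
# list_pods_by_qos CLASS: list pods in all namespaces whose
# koordinator.sh/qosClass label equals CLASS (hypothetical helper).
list_pods_by_qos() {
  case "$1" in
    LS|BE) ;;
    *)
      echo "usage: list_pods_by_qos LS|BE" >&2
      return 2
      ;;
  esac
  kubectl get pods --all-namespaces -l "koordinator.sh/qosClass=$1" -o wide
}
```

For example, `list_pods_by_qos BE` should list the FFmpeg pod once it is deployed with the BE label.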
How it works
In this topic, an NGINX service (LS QoS class) and an FFmpeg video transcoding application (BE QoS class) run on the same node. Two colocation features work together to protect NGINX performance:
Resource reuse: BE workloads can use resources that are allocated to LS workloads but are currently idle, improving cluster resource utilization. For more information, see Dynamic resource overcommitment.
Resource isolation: Various mechanisms limit BE workload resource usage and prioritize LS workload resource demand. For more information, see CPU QoS, CPU Suppress, and Resource isolation based on the L3 cache and MBA.
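To see how much idle LS capacity is currently being offered to BE pods, you can read the node's allocatable batch resources. This sketch assumes that the node reports `kubernetes.io/batch-cpu` and `kubernetes.io/batch-memory` under `status.allocatable`, which are the same resource names the BE pod in this topic requests. The helper name is hypothetical.

```shell
# show_batch_allocatable NODE: print the batch CPU and batch memory the node
# currently advertises (hypothetical helper). Dots inside the resource names
# are escaped with a backslash, as kubectl's JSONPath syntax requires.
show_batch_allocatable() {
  if [ -z "$1" ]; then
    echo "usage: show_batch_allocatable NODE_NAME" >&2
    return 2
  fi
  kubectl get node "$1" -o jsonpath='batch-cpu: {.status.allocatable.kubernetes\.io/batch-cpu}{"\n"}batch-memory: {.status.allocatable.kubernetes\.io/batch-memory}{"\n"}'
}
```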
This topic deploys the applications in three modes and compares the results:
| Mode | Description |
|---|---|
| Exclusive deployment (baseline) | Only NGINX runs on the node. |
| Default Kubernetes colocation (control) | NGINX and FFmpeg run on the same node with standard Kubernetes QoS classes — no extended resources or ack-koordinator isolation features. |
| SLO-aware colocation (experimental) | NGINX and FFmpeg run on the same node with ack-koordinator isolation features enabled. |
Prerequisites
Before you begin, ensure you have:
An ACK Pro cluster with two nodes:
Node 1 (tested machine): runs the NGINX service and the FFmpeg application. For optimal colocation performance, use an Elastic Compute Service (ECS) Bare Metal instance with Alibaba Cloud Linux as the operating system.
Node 2 (stress test machine): runs the wrk load testing tool and sends requests to the NGINX service.
For cluster creation steps, see Create an ACK Pro cluster.
ack-koordinator (formerly ack-slo-manager) installed with colocation policies enabled. See Getting started. This topic uses ack-koordinator 0.8.0.
CPU QoS requires Alibaba Cloud Linux as the node OS. Resource isolation based on the L3 cache and MBA requires an ECS Bare Metal instance.
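Before you enable CPU QoS, you can confirm which OS each node runs. The following is a minimal sketch with a hypothetical helper name. Matching on the substring "Alibaba Cloud Linux" is an assumption about how the OS image is reported in `status.nodeInfo.osImage`.

```shell
# check_alinux NODE: print the node's OS image and succeed only if it looks
# like Alibaba Cloud Linux (hypothetical helper; substring match is assumed).
check_alinux() {
  os_image=$(kubectl get node "$1" -o jsonpath='{.status.nodeInfo.osImage}') || return 1
  echo "$1: $os_image"
  case "$os_image" in
    *"Alibaba Cloud Linux"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `check_alinux cn-beijing.192.168.2.93` prints the tested machine's OS image and returns a non-zero status if the image does not look like Alibaba Cloud Linux.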
Deploy the NGINX service and wrk
Deploy the NGINX service on the tested machine and the wrk load testing tool on the stress test machine.
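After NGINX is deployed, you can send a single request from the stress test machine to confirm that the service is reachable on port 8000 before you start wrk. The following is a minimal sketch with a hypothetical helper name. A status code of 200 means NGINX answered, and 000 means curl could not connect.

```shell
# check_nginx NODE_IP [PORT]: print the HTTP status code returned by NGINX
# (hypothetical helper; PORT defaults to 8000, the port the wrk commands use).
check_nginx() {
  curl -sS -o /dev/null --max-time 5 -w '%{http_code}\n' "http://$1:${2:-8000}/"
}
```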
Install wrk on the stress test machine
Run the following commands on Node 2 (the stress test machine) to install wrk 4.2.0:
wget -O wrk-4.2.0.tar.gz https://github.com/wg/wrk/archive/refs/tags/4.2.0.tar.gz && tar -xvf wrk-4.2.0.tar.gz
cd wrk-4.2.0 && make && chmod +x ./wrk
Deploy the FFmpeg application
Deploy the offline FFmpeg video transcoding application on the tested machine. The YAML configuration differs slightly between the default Kubernetes colocation mode and the SLO-aware colocation mode — the relevant comments in the file explain each difference.
Create a file named be-ffmpeg.yaml with the following content:
Deploy the FFmpeg application:
kubectl apply -f be-ffmpeg.yaml
Verify the pod is running:
kubectl get pod -l app=ffmpeg -o wide
Expected output:
NAME        READY   STATUS    RESTARTS   AGE   IP               NODE                      NOMINATED NODE   READINESS GATES
be-ffmpeg   1/1     Running   0          15s   11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>
Run the stress tests
Run tests in each colocation mode and compare the results. The key metrics are:
Response time (RT) percentiles: RT-P90 is the maximum time to process 90% of requests; RT-P99 covers 99% of requests. Lower values indicate better NGINX performance.
Average CPU utilization: measured with kubectl top node.
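To compare modes, it helps to extract the percentile values from wrk's `--latency` output in a single unit. The function below is a sketch with a hypothetical name. It reads wrk output on stdin and prints one percentile in milliseconds, handling the microsecond (`us`), millisecond (`ms`), and second (`s`) suffixes.

```shell
# wrk_percentile P: read `wrk --latency` output on stdin and print the
# latency at percentile P (50, 75, 90 or 99) in milliseconds.
# Hypothetical helper; prints nothing if that percentile is not found.
wrk_percentile() {
  awk -v p="$1%" '
    /Latency Distribution/ { in_dist = 1; next }
    in_dist && $1 == p {
      v = $2
      if (v ~ /us$/)      { sub(/us$/, "", v); v = v / 1000 }   # microseconds
      else if (v ~ /ms$/) { sub(/ms$/, "", v); v = v + 0 }      # milliseconds
      else if (v ~ /s$/)  { sub(/s$/, "", v);  v = v * 1000 }   # seconds
      printf "%.3f\n", v
      exit
    }'
}
```

For example, save a run with `./wrk -t6 -c54 -d60s --latency http://${node_ip}:8000/ | tee wrk-mode1.txt` and then run `wrk_percentile 90 < wrk-mode1.txt`. For the sample Mode 1 output, this prints 0.523.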
Mode 1: Exclusive deployment (baseline)
Only the NGINX service runs on the tested machine.
Deploy NGINX as described in Deploy the NGINX service and wrk.
Send load from the stress test machine:
# Replace node_ip with the IP address of the tested machine.
./wrk -t6 -c54 -d60s --latency http://${node_ip}:8000/
Check CPU utilization:
kubectl top node
Expected output:
NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
cn-beijing.192.168.2.93   29593m       29%    xxxx            xxxx
cn-beijing.192.168.2.94   6874m        7%     xxxx            xxxx
CPU utilization on the tested machine is approximately 29%.
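A single `kubectl top node` reading is a snapshot, but the results table reports average CPU utilization. One way to approximate an average is to sample the tested machine several times during the 60-second wrk run and take the mean of the CPU% column. The helpers below are a sketch with hypothetical names. The sample count and interval are assumptions, and because metrics-server refreshes its data periodically, consecutive samples may repeat.

```shell
# avg_cpu_pct: read lines of `kubectl top node --no-headers` output on stdin
# and print the mean of the CPU% column (third field).
avg_cpu_pct() {
  awk '{ v = $3; sub(/%$/, "", v); sum += v; n++ }
       END { if (n > 0) printf "%.1f\n", sum / n }'
}

# sample_node_cpu NODE COUNT INTERVAL: print COUNT usage samples for NODE,
# INTERVAL seconds apart (hypothetical helper).
sample_node_cpu() {
  i=0
  while [ "$i" -lt "$2" ]; do
    kubectl top node "$1" --no-headers
    i=$((i + 1))
    sleep "$3"
  done
}
```

For example, `sample_node_cpu cn-beijing.192.168.2.93 12 5 | avg_cpu_pct` takes 12 samples 5 seconds apart while wrk is running and prints the mean CPU% of the tested machine.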
After the test completes, review the wrk output. For accurate results, run multiple tests. Expected output:
Running 1m test @ http://192.168.2.94:8000/
  6 threads and 54 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   402.18us    1.07ms  59.56ms   99.83%
    Req/Sec    24.22k     1.12k   30.58k    74.15%
  Latency Distribution
     50%  343.00us
     75%  402.00us
     90%  523.00us
     99%  786.00us
  8686569 requests in 1.00m, 6.88GB read
Requests/sec: 144537.08
Transfer/sec:    117.16MB
The Latency Distribution section shows RT percentile values. In exclusive mode, RT-P50 is 343 microseconds, RT-P90 is 523 microseconds, and RT-P99 is 786 microseconds.
Mode 2: Default Kubernetes colocation (control)
Both NGINX and FFmpeg run on the tested machine without ack-koordinator isolation features.
Deploy NGINX as described in Deploy the NGINX service and wrk, then deploy the FFmpeg application using be-ffmpeg.yaml with the following modifications:
Remove the koordinator.sh/qosClass: BE label.
Remove the kubernetes.io/batch-cpu and kubernetes.io/batch-memory extended resources.
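Applying these two changes to the Mode 3 manifest (besteffort-ffmpeg.yaml) gives a pod like the one below. This is a sketch: the file name ffmpeg-k8s-default.yaml is hypothetical, and the exact contents of be-ffmpeg.yaml may differ. Because no requests or limits remain, Kubernetes assigns the pod its BestEffort QoS class.

```shell
# Write a Mode 2 (default Kubernetes colocation) FFmpeg manifest: the Mode 3
# pod without the koordinator.sh/qosClass label and batch resources.
cat > ffmpeg-k8s-default.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: be-ffmpeg
  labels:
    app: ffmpeg
spec:
  containers:
    - command:
        - start-ffmpeg.sh
        - '30'
        - '2'
        - /apps/ffmpeg/input/HD2-h264.ts
        - /apps/ffmpeg/
      image: 'registry.cn-zhangjiakou.aliyuncs.com/acs/ffmpeg-4-4-1-for-slo-test:v0.1'
      imagePullPolicy: Always
      name: ffmpeg
  hostNetwork: true
  restartPolicy: Never
  nodeName: cn-beijing.192.168.2.93 # Replace with the node name of your tested machine.
EOF
```

Deploy it with `kubectl apply -f ffmpeg-k8s-default.yaml`.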
Run the wrk load test and collect CPU utilization as in Mode 1. In this control configuration, the node CPU utilization reaches approximately 65%.
Mode 3: SLO-aware colocation (experimental)
Both NGINX and FFmpeg run on the tested machine with ack-koordinator isolation features enabled.
Follow the Getting started guide to enable SLO-aware colocation, then configure each feature:
[Dynamic resource overcommitment](https://www.alibabacloud.com/help/en/document_detail/412172.html#task-2190961): Use the default configuration. This allows the system to allocate idle LS pod resources to BE pods as overcommitted batch resources (kubernetes.io/batch-cpu and kubernetes.io/batch-memory).
[CPU Suppress](https://www.alibabacloud.com/help/en/document_detail/268626.html#task-2088911): Set cpuSuppressThresholdPercent to 65. Use defaults for other settings. When node CPU utilization exceeds 65%, this feature throttles BE pod CPU usage to protect LS pod performance.
[CPU QoS](https://www.alibabacloud.com/help/en/document_detail/433810.html#task-2223861): Use the default configuration. This enables the CPU Identity capability on Alibaba Cloud Linux, giving LS pods scheduling priority over BE pods, including when simultaneous multithreading (SMT) runs threads from both pods on the same physical core.
[Resource isolation based on the L3 cache and MBA](https://www.alibabacloud.com/help/en/document_detail/273042.html#task-2093499): Use the default configuration. On ECS Bare Metal instances, this isolates L3 cache (last level cache) and memory bandwidth allocation (MBA) so that LS pods get priority access.
Important: CPU QoS requires Alibaba Cloud Linux as the node OS. L3 cache and MBA isolation requires an ECS Bare Metal instance.
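The CPU Suppress threshold in this step can also be set from the command line. This sketch assumes that ack-koordinator reads the threshold from the resource-threshold-config key of the ack-slo-config ConfigMap in the kube-system namespace. Check the CPU Suppress topic for the exact key and field names for your ack-koordinator version before you use it. The variable and function names are hypothetical.

```shell
# Merge-patch payload that sets the CPU Suppress threshold to 65%.
# Assumption: the ConfigMap name, key, and JSON fields match your version.
CPU_SUPPRESS_PATCH='{"data":{"resource-threshold-config":"{\"clusterStrategy\":{\"enable\":true,\"cpuSuppressThresholdPercent\":65}}"}}'

# apply_cpu_suppress: hypothetical helper that applies the payload. A merge
# patch replaces the whole resource-threshold-config value, so any other
# settings stored under that key are overwritten.
apply_cpu_suppress() {
  kubectl -n kube-system patch configmap ack-slo-config --type merge -p "$CPU_SUPPRESS_PATCH"
}
```

The patch fails if the ConfigMap does not exist yet. In that case, create it first as described in Getting started.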
Deploy NGINX as described in Deploy the NGINX service and wrk.
Create a file named besteffort-ffmpeg.yaml with the following content:
# Pod for the offline FFmpeg video transcoding application (BE QoS class, SLO-aware mode)
apiVersion: v1
kind: Pod
metadata:
  name: besteffort-ffmpeg
  labels:
    app: ffmpeg
    # Set the QoS class to BE for SLO-aware scheduling.
    koordinator.sh/qosClass: BE
spec:
  containers:
    - command:
        - start-ffmpeg.sh
        - '30'
        - '2'
        - /apps/ffmpeg/input/HD2-h264.ts
        - /apps/ffmpeg/
      image: 'registry.cn-zhangjiakou.aliyuncs.com/acs/ffmpeg-4-4-1-for-slo-test:v0.1'
      imagePullPolicy: Always
      name: ffmpeg
      resources:
        # Request dynamically overcommitted resources.
        limits:
          kubernetes.io/batch-cpu: 70k
          kubernetes.io/batch-memory: 22Gi
        requests:
          kubernetes.io/batch-cpu: 70k
          kubernetes.io/batch-memory: 22Gi
  hostNetwork: true
  restartPolicy: Never
  nodeName: cn-beijing.192.168.2.93 # Replace with the node name of your tested machine.
Deploy the FFmpeg application:
kubectl apply -f besteffort-ffmpeg.yaml
Verify the FFmpeg pod is running:
kubectl get pod -l app=ffmpeg -o wide
Expected output:
NAME                READY   STATUS    RESTARTS   AGE   IP               NODE                      NOMINATED NODE   READINESS GATES
besteffort-ffmpeg   1/1     Running   0          15s   11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>
Send load from the stress test machine:
# Replace node_ip with the IP address of the tested machine.
./wrk -t6 -c54 -d60s --latency http://${node_ip}:8000/
Check CPU utilization:
kubectl top node
Expected output:
NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
cn-beijing.192.168.2.93   65424m       63%    xxxx            xxxx
cn-beijing.192.168.2.94   7040m        7%     xxxx            xxxx
CPU utilization on the tested machine is approximately 63%.
After the test completes, review the wrk output and compare with the results from the other modes.
Test results
The following table compares NGINX response time and node CPU utilization across all three modes.
| Metric | Baseline (exclusive) | Control (default Kubernetes) | Experimental (SLO-aware) |
|---|---|---|---|
| NGINX RT-P90 (ms) | 0.533 | 0.574 (+7.7%) | 0.548 (+2.8%) |
| NGINX RT-P99 (ms) | 0.93 | 1.07 (+16%) | 0.96 (+3.2%) |
| Average CPU utilization | 29.6% | 65.1% | 64.8% |
Key observations:
Default Kubernetes colocation vs. baseline: CPU utilization increases from 29.6% to 65.1%, but NGINX RT-P90 rises 7.7% and RT-P99 rises 16%. The latency distribution has a long tail.
SLO-aware colocation vs. baseline: CPU utilization increases from 29.6% to 64.8%, while RT-P90 rises only 2.8% and RT-P99 rises only 3.2%.
SLO-aware colocation vs. default Kubernetes colocation: CPU utilization is similar (about 65%), but NGINX response times are lower (RT-P90 of 0.548 ms vs. 0.574 ms, RT-P99 of 0.96 ms vs. 1.07 ms) and close to the exclusive-deployment baseline.
SLO-aware colocation achieves roughly the same CPU utilization improvement as standard colocation, while keeping NGINX latency much closer to the no-colocation baseline.
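The relative changes in the table compare each colocated value with the baseline. The following minimal sketch (the helper name is hypothetical) reproduces that arithmetic from the table values:

```shell
# pct_change BASE VALUE: print the relative change of VALUE versus BASE,
# in percent, rounded to one decimal place.
pct_change() {
  awk -v b="$1" -v v="$2" 'BEGIN { printf "%.1f%%\n", (v - b) / b * 100 }'
}

pct_change 0.533 0.574   # RT-P90, default Kubernetes colocation: prints 7.7%
pct_change 0.533 0.548   # RT-P90, SLO-aware colocation: prints 2.8%
pct_change 0.93 0.96     # RT-P99, SLO-aware colocation: prints 3.2%
```

The inputs are the rounded values from the table, so results can differ slightly from the reported percentages. For example, `pct_change 0.93 1.07` prints 15.1% for the RT-P99 row that the table lists as +16%.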
FAQ
Why does wrk report "Socket errors: connect 54,"?
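This error means the wrk client failed to open its connections to the NGINX server. A common cause is that the stress test machine cannot reuse local ports that are still held by sockets in the TIME_WAIT state from earlier test runs. Fix it by enabling TCP connection reuse (net.ipv4.tcp_tw_reuse) on the stress test machine (not the tested machine).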
Check whether TCP connection reuse is enabled:
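sudo sysctl -n net.ipv4.tcp_tw_reuse
A return value of 0 (disabled) or 2 (enabled only for loopback traffic) means reuse is not active for the connections that wrk opens to the tested machine.
Enable TCP connection reuse: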
sudo sysctl -w net.ipv4.tcp_tw_reuse=1
Re-run the wrk stress test. If Socket errors: connect 54 no longer appears, the fix worked.
After testing is complete, disable TCP connection reuse to avoid unintended effects on other services: sudo sysctl -w net.ipv4.tcp_tw_reuse=0.
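Instead of resetting the value to 0 afterward, you can record the current setting and restore it after wrk finishes, which returns the stress test machine to its previous configuration. The following is a minimal sketch with a hypothetical helper name:

```shell
# with_tw_reuse CMD [ARGS...]: enable net.ipv4.tcp_tw_reuse, run CMD, then
# restore the previous value and return CMD's exit status (hypothetical
# helper, intended for the stress test machine only).
with_tw_reuse() {
  tw_prev=$(sysctl -n net.ipv4.tcp_tw_reuse) || return 1
  sudo sysctl -w net.ipv4.tcp_tw_reuse=1 >/dev/null || return 1
  tw_rc=0
  "$@" || tw_rc=$?
  sudo sysctl -w "net.ipv4.tcp_tw_reuse=$tw_prev" >/dev/null
  return "$tw_rc"
}
```

For example: `with_tw_reuse ./wrk -t6 -c54 -d60s --latency http://${node_ip}:8000/`.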