Run Video Transcoding Alongside Online Services on ACK with Guaranteed QoS - Container Service for Kubernetes

ack-koordinator provides service level objective (SLO)-aware workload scheduling that lets you colocate online and offline workloads on the same node — keeping your online service performant while improving overall cluster resource utilization. This topic walks through colocating an NGINX web service and an FFmpeg video transcoding application using ack-koordinator.

Background

Colocating online and offline workloads on the same node makes sense because their resource demands are complementary: online services have variable load and consume resources in bursts, while offline batch jobs run continuously and can tolerate lower resource priority.

Characteristic	Online workload	Offline workload
Typical applications	Web services, APIs, microservices	Video transcoding, big data processing, AI training
Latency	Sensitive	Insensitive
SLO	High	Low
Resource usage pattern	Bursty, time-based	Continuous
Fault tolerance	Low — requires high availability	High — allows failure and retry

ack-koordinator uses Quality of Service (QoS) classes to manage resource priority between colocated workloads. The two classes used in this topic are:

QoS class	Label value	Typical use	CPU priority	Memory priority
Latency-sensitive (LS)	`koordinator.sh/qosClass: LS`	Online services (e.g., NGINX)	High	High
Best-effort (BE)	`koordinator.sh/qosClass: BE`	Offline batch jobs (e.g., FFmpeg)	Low	Low

How it works

In this topic, an NGINX service (LS QoS class) and an FFmpeg video transcoding application (BE QoS class) run on the same node. Two colocation features work together to protect NGINX performance:

Resource reuse: BE workloads can use resources that are allocated to LS workloads but are currently idle, improving cluster resource utilization. For more information, see Dynamic resource overcommitment.
Resource isolation: Various mechanisms limit BE workload resource usage and prioritize LS workload resource demand. For more information, see CPU QoS, CPU Suppress, and Resource isolation based on the L3 cache and MBA.

This topic deploys the applications in three modes and compares the results:

Mode	Description
Exclusive deployment (baseline)	Only NGINX runs on the node.
Default Kubernetes colocation (control)	NGINX and FFmpeg run on the same node with standard Kubernetes QoS classes — no extended resources or ack-koordinator isolation features.
SLO-aware colocation (experimental)	NGINX and FFmpeg run on the same node with ack-koordinator isolation features enabled.

Prerequisites

Before you begin, ensure you have:

- An ACK Pro cluster with two nodes. For cluster creation steps, see Create an ACK Pro cluster.
  - Node 1 (tested machine): runs the NGINX service and the FFmpeg application. For optimal colocation performance, use an Elastic Compute Service (ECS) Bare Metal instance with Alibaba Cloud Linux as the operating system.
  - Node 2 (stress test machine): runs the wrk load testing tool and sends requests to the NGINX service.
- ack-koordinator (formerly ack-slo-manager) installed with colocation policies enabled. See Getting started. This topic uses ack-koordinator 0.8.0.

CPU QoS requires Alibaba Cloud Linux as the node OS. Resource isolation based on the L3 cache and MBA requires an ECS Bare Metal instance.

Deploy the NGINX service and wrk

Deploy the NGINX service on the tested machine and the wrk load testing tool on the stress test machine.

Deploy NGINX

Create a file named ls-nginx.yaml with the following content:

---
# NGINX configuration
apiVersion: v1
data:
  config: |-
    user  nginx;
    worker_processes  80; # Number of worker processes — controls concurrent request capacity.

    events {
        worker_connections  1024;  # Maximum connections per worker. Default: 1024.
    }

    http {
        server {
            listen  8000;

            gzip off;
            gzip_min_length 32;
            gzip_http_version 1.0;
            gzip_comp_level 3;
            gzip_types *;
        }
    }

    #daemon off;
kind: ConfigMap
metadata:
  name: nginx-conf

---
# Pod for the online NGINX service (LS QoS class)
apiVersion: v1
kind: Pod
metadata:
  labels:
    koordinator.sh/qosClass: LS
    app: nginx
  name: nginx
spec:
  containers:
    - image: 'koordinatorsh/nginx:v1.18-koord-exmaple'
      imagePullPolicy: IfNotPresent
      name: nginx
      ports:
        - containerPort: 8000
          hostPort: 8000 # Port exposed for load testing.
          protocol: TCP
      resources:
        limits:
          cpu: '80'
          memory: 10Gi
        requests:
          cpu: '80'
          memory: 10Gi
      volumeMounts:
        - mountPath: /apps/nginx/conf
          name: config
  hostNetwork: true
  restartPolicy: Never
  volumes:
    - configMap:
        items:
          - key: config
            path: nginx.conf
        name: nginx-conf
      name: config
  nodeName: cn-beijing.192.168.2.93  # Replace with the node name of your tested machine.

Deploy the NGINX service:
```
kubectl apply -f ls-nginx.yaml
```

Verify the pod is running:

kubectl get pod -l app=nginx -o wide

Expected output:

NAME    READY   STATUS    RESTARTS   AGE    IP               NODE                      NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          43s    11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>

The Running status confirms the NGINX service is up on the tested machine.
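
As an optional extra check, the QoS label that ack-koordinator relies on can be read back from the running pod. The following is a minimal sketch rather than part of the official procedure: the helper names `wait_pod_ready` and `pod_qos_class` are hypothetical, and it assumes `kubectl` is configured for the cluster and the pod lives in the current namespace.

```shell
# Hypothetical helper: block until the pod reports the Ready condition.
# Arguments: pod name, optional timeout (defaults to 120s).
wait_pod_ready() {
  local pod="$1" timeout="${2:-120s}"
  kubectl wait --for=condition=Ready "pod/${pod}" --timeout="${timeout}"
}

# Hypothetical helper: print the koordinator.sh/qosClass label of a pod.
# The dot inside the label key is escaped so JSONPath treats it as one key.
pod_qos_class() {
  local pod="$1"
  kubectl get pod "$pod" \
    -o jsonpath='{.metadata.labels.koordinator\.sh/qosClass}'
}
```

For the NGINX pod defined in ls-nginx.yaml, `pod_qos_class nginx` should print the label value set in the manifest (LS).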

Install wrk on the stress test machine

Run the following commands on Node 2 (the stress test machine) to install wrk 4.2.0:

wget -O wrk-4.2.0.tar.gz https://github.com/wg/wrk/archive/refs/tags/4.2.0.tar.gz && tar -xvf wrk-4.2.0.tar.gz
cd wrk-4.2.0 && make && chmod +x ./wrk

Deploy the FFmpeg application

Deploy the offline FFmpeg video transcoding application on the tested machine. The YAML configuration differs slightly between the default Kubernetes colocation mode and the SLO-aware colocation mode — the relevant comments in the file explain each difference.

Create a file named be-ffmpeg.yaml with the following content:

# Pod for the offline FFmpeg video transcoding application (BE QoS class)
apiVersion: v1
kind: Pod
metadata:
  name: be-ffmpeg
  labels:
    app: ffmpeg
    # Default Kubernetes colocation mode: remove the koordinator.sh/qosClass: BE label.
    # SLO-aware colocation mode: keep the koordinator.sh/qosClass: BE label.
    koordinator.sh/qosClass: BE
spec:
  containers:
    # The first argument is the process count and the second is the number of parallel threads per process.
    # Defaults: 25 processes, each with 2 threads. Adjust the process count to control CPU utilization.
    - command:
        - start-ffmpeg.sh
        - '25'
        - '2'
        - /apps/ffmpeg/input/HD2-h264.ts
        - /apps/ffmpeg/
      image: 'registry.cn-zhangjiakou.aliyuncs.com/acs/ffmpeg-4-4-1-for-slo-test:v0.1'
      imagePullPolicy: Always
      name: ffmpeg
      resources:
        # Default Kubernetes colocation mode: remove the kubernetes.io/batch-cpu and
        # kubernetes.io/batch-memory extended resources.
        # SLO-aware colocation mode: keep them, sized to your node's resource spec.
        limits:
          kubernetes.io/batch-cpu: 70k
          kubernetes.io/batch-memory: 22Gi
        requests:
          kubernetes.io/batch-cpu: 70k
          kubernetes.io/batch-memory: 22Gi
  hostNetwork: true
  restartPolicy: Never
  nodeName: cn-beijing.192.168.2.93  # Replace with the node name of your tested machine.
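
Before deploying in SLO-aware colocation mode, it can help to see how much batch resource the tested machine currently advertises, so that the `kubernetes.io/batch-cpu` and `kubernetes.io/batch-memory` values in the manifest can be sized accordingly. The sketch below is one possible way to do this; the helper name `node_batch_resources` is hypothetical, and it assumes dynamic resource overcommitment is already enabled so that the node reports these extended resources.

```shell
# Hypothetical helper: print the allocatable batch CPU and batch memory of a
# node, one value per line. Dots in the resource names are escaped for JSONPath.
node_batch_resources() {
  local node="$1"
  kubectl get node "$node" -o jsonpath='{.status.allocatable.kubernetes\.io/batch-cpu}{"\n"}{.status.allocatable.kubernetes\.io/batch-memory}{"\n"}'
}

# Example (requires cluster access):
#   node_batch_resources cn-beijing.192.168.2.93
```

The values are printed exactly as the API server returns them, so they may carry quantity suffixes (for example `k` or `Gi`). Empty lines suggest that the node does not report batch resources yet.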

Deploy the FFmpeg application:
```
kubectl apply -f be-ffmpeg.yaml
```

Verify the pod is running:

kubectl get pod -l app=ffmpeg -o wide

Expected output:

NAME        READY   STATUS    RESTARTS   AGE    IP               NODE                      NOMINATED NODE   READINESS GATES
be-ffmpeg   1/1     Running   0          15s    11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>

Run the stress tests

Run tests in each colocation mode and compare the results. The key metrics are:

Response time (RT) percentiles: RT-P90 is the response time within which 90% of requests complete; RT-P99 is the same threshold for 99% of requests. Lower values indicate better NGINX performance.
Average CPU utilization: measured with kubectl top node.

Mode 1: Exclusive deployment (baseline)

Only the NGINX service runs on the tested machine.

Deploy NGINX as described in Deploy the NGINX service and wrk.

Send load from the stress test machine:

# Replace node_ip with the IP address of the tested machine.
./wrk -t6 -c54 -d60s --latency http://${node_ip}:8000/

Check CPU utilization:

kubectl top node

Expected output:

NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
cn-beijing.192.168.2.93   29593m       29%    xxxx            xxxx
cn-beijing.192.168.2.94   6874m        7%     xxxx            xxxx

CPU utilization on the tested machine is approximately 29%.
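
Because a single `kubectl top node` reading is only a snapshot, an average over several samples taken during the wrk run may be more representative. The following sketch shows one way to do that; the helper name `avg_node_cpu_percent`, the sample count, and the interval are all hypothetical choices, and it assumes the CPU% value is the third column of the output, as in the example above.

```shell
# Hypothetical helper: sample `kubectl top node` several times and print the
# mean CPU% for one node.
# Arguments: node name, sample count (default 5), interval in seconds (default 10).
avg_node_cpu_percent() {
  local node="$1" samples="${2:-5}" interval="${3:-10}" i=0
  while [ "$i" -lt "$samples" ]; do
    # Column 3 is CPU%, for example "29%"; strip the percent sign.
    kubectl top node "$node" --no-headers | awk '{ sub(/%/, "", $3); print $3 }'
    i=$((i + 1))
    if [ "$i" -lt "$samples" ]; then sleep "$interval"; fi
  done | awk '{ sum += $1; n++ } END { if (n) printf "%.1f\n", sum / n }'
}

# Example (run while the wrk test is in progress):
#   avg_node_cpu_percent cn-beijing.192.168.2.93 5 10
```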

After the test completes, review the wrk output. For accurate results, run multiple tests. Expected output:

Running 1m test @ http://192.168.2.94:8000/
  6 threads and 54 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   402.18us    1.07ms  59.56ms   99.83%
    Req/Sec    24.22k     1.12k   30.58k    74.15%
  Latency Distribution
     50%  343.00us
     75%  402.00us
     90%  523.00us
     99%  786.00us
  8686569 requests in 1.00m, 6.88GB read
Requests/sec: 144537.08
Transfer/sec:    117.16MB

The Latency Distribution section shows RT percentile values. In exclusive mode: RT-P50 is 343 microseconds, RT-P90 is 523 microseconds, and RT-P99 is 786 microseconds.
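
When comparing several runs, it can be convenient to extract the percentile values from the wrk output automatically. The sketch below is illustrative only: the helper name `wrk_percentile_us` is hypothetical, and it assumes the `Latency Distribution` layout shown above with values suffixed by `us`, `ms`, or `s`.

```shell
# Hypothetical helper: read `wrk --latency` output on stdin and print the
# requested percentile (for example 90 or 99) in microseconds.
wrk_percentile_us() {
  awk -v want="$1%" '
    $1 == want {
      value = $2
      # Check "us" and "ms" before the bare "s" suffix.
      if (value ~ /us$/)      { sub(/us$/, "", value); factor = 1 }
      else if (value ~ /ms$/) { sub(/ms$/, "", value); factor = 1000 }
      else if (value ~ /s$/)  { sub(/s$/, "", value);  factor = 1000000 }
      else next
      printf "%.0f\n", value * factor
      exit
    }'
}

# Example: keep the raw output and print RT-P99 for one run.
#   ./wrk -t6 -c54 -d60s --latency "http://${node_ip}:8000/" | tee run1.txt
#   wrk_percentile_us 99 < run1.txt
```

Normalizing everything to microseconds avoids mixing `us` and `ms` values when results from different modes are placed side by side.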

Mode 2: Default Kubernetes colocation (control)

Both NGINX and FFmpeg run on the tested machine without ack-koordinator isolation features.

Deploy NGINX as described in Deploy the NGINX service and wrk, then deploy the FFmpeg application using be-ffmpeg.yaml with the following modifications:

Remove the koordinator.sh/qosClass: BE label.
Remove the kubernetes.io/batch-cpu and kubernetes.io/batch-memory extended resources.

Run the wrk load test and collect CPU utilization as in Mode 1. In this control configuration, the node CPU utilization reaches approximately 65%.
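
The two manual modifications to be-ffmpeg.yaml can also be scripted. The following is a rough sketch, not an official tool: the helper name `strip_koordinator_fields` is hypothetical, and it assumes the manifest keeps the layout shown earlier, with `resources:` as the last key of the container and `hostNetwork:` as the next pod-level key.

```shell
# Hypothetical helper: read be-ffmpeg.yaml on stdin and print a variant for
# default Kubernetes colocation, without the BE label or batch resources.
strip_koordinator_fields() {
  # 1st expression: drop the koordinator.sh/qosClass: BE label line.
  # Remaining expressions: delete everything from "resources:" up to, but not
  # including, the "hostNetwork:" line.
  sed -e '/^ *koordinator\.sh\/qosClass: BE *$/d' \
      -e '/^ *resources: *$/,/^ *hostNetwork:/{' \
      -e '/^ *hostNetwork:/!d' \
      -e '}'
}

# Example (the output file name is arbitrary):
#   strip_koordinator_fields < be-ffmpeg.yaml > be-ffmpeg-default.yaml
#   kubectl apply --dry-run=client -f be-ffmpeg-default.yaml
```

Reviewing the generated file before applying it is advisable, since the script relies on line patterns rather than on parsing YAML.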

Mode 3: SLO-aware colocation (experimental)

Both NGINX and FFmpeg run on the tested machine with ack-koordinator isolation features enabled.

Follow the Getting started guide to enable SLO-aware colocation, then configure each feature:
- [Dynamic resource overcommitment](https://www.alibabacloud.com/help/en/document_detail/412172.html#task-2190961): Use the default configuration. This allows the system to allocate idle LS pod resources to BE pods as overcommitted batch resources (kubernetes.io/batch-cpu and kubernetes.io/batch-memory).
- [CPU Suppress](https://www.alibabacloud.com/help/en/document_detail/268626.html#task-2088911): Set cpuSuppressThresholdPercent to 65. Use defaults for other settings. When node CPU utilization exceeds 65%, this feature throttles BE pod CPU usage to protect LS pod performance.
- [CPU QoS](https://www.alibabacloud.com/help/en/document_detail/433810.html#task-2223861): Use the default configuration. This enables the CPU Identity capability on Alibaba Cloud Linux, giving LS pods scheduling priority over BE pods — including when simultaneous multithreading (SMT) runs threads from both pods on the same physical core.
- [Resource isolation based on the L3 cache and MBA](https://www.alibabacloud.com/help/en/document_detail/273042.html#task-2093499): Use the default configuration. On ECS Bare Metal instances, this isolates L3 cache (last level cache) and memory bandwidth allocation (MBA) so that LS pods get priority access.

Important: CPU QoS requires Alibaba Cloud Linux as the node OS. L3 cache and MBA isolation requires an ECS Bare Metal instance.

Deploy NGINX as described in Deploy the NGINX service and wrk.
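
For the CPU Suppress setting listed above, the threshold is typically supplied through a ConfigMap. The sketch below only renders a candidate manifest; it does not apply anything. The helper name `render_cpu_suppress_config` is hypothetical, and the ConfigMap name (`ack-slo-config`), namespace (`kube-system`), data key (`resource-threshold-config`), and JSON layout are assumptions that should be verified against the CPU Suppress topic for your ack-koordinator version.

```shell
# Hypothetical helper: print a ConfigMap manifest that sets the CPU Suppress
# threshold. Name, namespace, key, and JSON layout are assumptions; confirm
# them in the CPU Suppress documentation before applying.
# Argument: threshold percentage (default 65).
render_cpu_suppress_config() {
  local percent="${1:-65}"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-slo-config
  namespace: kube-system
data:
  resource-threshold-config: |
    {
      "clusterStrategy": {
        "enable": true,
        "cpuSuppressThresholdPercent": ${percent}
      }
    }
EOF
}

# Example: inspect the manifest and validate it client-side first.
#   render_cpu_suppress_config 65 | kubectl apply --dry-run=client -f -
```

If a ConfigMap with this name already exists in the cluster, merging the setting into it is safer than overwriting it, because it may hold configuration for other colocation features.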

Create a file named besteffort-ffmpeg.yaml with the following content:

# Pod for the offline FFmpeg video transcoding application (BE QoS class, SLO-aware mode)
apiVersion: v1
kind: Pod
metadata:
  name: besteffort-ffmpeg
  labels:
    app: ffmpeg
    # Set the QoS class to BE for SLO-aware scheduling.
    koordinator.sh/qosClass: BE
spec:
  containers:
    - command:
        - start-ffmpeg.sh
        - '30'
        - '2'
        - /apps/ffmpeg/input/HD2-h264.ts
        - /apps/ffmpeg/
      image: 'registry.cn-zhangjiakou.aliyuncs.com/acs/ffmpeg-4-4-1-for-slo-test:v0.1'
      imagePullPolicy: Always
      name: ffmpeg
      resources:
        # Request dynamically overcommitted resources.
        limits:
          kubernetes.io/batch-cpu: 70k
          kubernetes.io/batch-memory: 22Gi
        requests:
          kubernetes.io/batch-cpu: 70k
          kubernetes.io/batch-memory: 22Gi
  hostNetwork: true
  restartPolicy: Never
  nodeName: cn-beijing.192.168.2.93  # Replace with the node name of your tested machine.

Deploy the FFmpeg application:
```
kubectl apply -f besteffort-ffmpeg.yaml
```

Verify the FFmpeg pod is running:

kubectl get pod -l app=ffmpeg -o wide

Expected output:

NAME                READY   STATUS    RESTARTS   AGE    IP               NODE                      NOMINATED NODE   READINESS GATES
besteffort-ffmpeg   1/1     Running   0          15s    11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>

Send load from the stress test machine:

# Replace node_ip with the IP address of the tested machine.
./wrk -t6 -c54 -d60s --latency http://${node_ip}:8000/

Check CPU utilization:

kubectl top node

Expected output:

NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
cn-beijing.192.168.2.93   65424m       63%    xxxx            xxxx
cn-beijing.192.168.2.94   7040m        7%     xxxx            xxxx

CPU utilization on the tested machine is approximately 63%.

After the test completes, review the wrk output and compare with the results from the other modes.

Test results

The following table compares NGINX response time and node CPU utilization across all three modes.

Metric	Baseline (exclusive)	Control (default Kubernetes)	Experimental (SLO-aware)
NGINX RT-P90 (ms)	0.533	0.574 (+7.7%)	0.548 (+2.8%)
NGINX RT-P99 (ms)	0.93	1.07 (+16%)	0.96 (+3.2%)
Average CPU utilization	29.6%	65.1%	64.8%

Key observations:

Default Kubernetes colocation vs. baseline: CPU utilization increases from 29.6% to 65.1%, but NGINX RT-P90 rises 7.7% and RT-P99 rises 16%. The latency distribution has a long tail.
SLO-aware colocation vs. baseline: CPU utilization increases from 29.6% to 64.8%, while RT-P90 rises only 2.8% and RT-P99 rises only 3.2%.
SLO-aware colocation vs. default Kubernetes colocation: CPU utilization is similar (about 65%), but NGINX response times are lower (RT-P90 of 0.548 ms vs. 0.574 ms, RT-P99 of 0.96 ms vs. 1.07 ms) and stay close to the exclusive-deployment baseline.

SLO-aware colocation achieves roughly the same CPU utilization improvement as standard colocation, while keeping NGINX latency much closer to the no-colocation baseline.
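
The percentage changes in the table are relative to the baseline column and can be recomputed for your own measurements. The following is a small worked sketch; the helper name `pct_change` is hypothetical.

```shell
# Hypothetical helper: print the signed percentage change of a measurement
# relative to a baseline, rounded to one decimal place.
# Arguments: baseline value, measured value (same unit).
pct_change() {
  awk -v base="$1" -v cur="$2" 'BEGIN {
    if (base == 0) { print "n/a"; exit }
    printf "%+.1f%%\n", (cur - base) / base * 100
  }'
}

# RT-P90, control vs. baseline:      pct_change 0.533 0.574   # prints +7.7%
# RT-P90, experimental vs. baseline: pct_change 0.533 0.548   # prints +2.8%
```

Recomputing from the rounded table entries can deviate slightly from the published percentages. For example, the RT-P99 control value evaluates to about +15.1% from 0.93 and 1.07, presumably because the table was derived from unrounded measurements.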

FAQ

Why does wrk report "Socket errors: connect 54"?

This error means the wrk client can't establish connections to the NGINX server because the number of connections exceeds the OS limit. Fix it by enabling TCP connection reuse on the stress test machine (not the tested machine).

Check whether TCP connection reuse is enabled:
```
sudo sysctl -n net.ipv4.tcp_tw_reuse
```
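A return value of 0 means the feature is disabled. A return value of 2 enables reuse for loopback traffic only, so it has no effect on connections from the stress test machine to the tested machine.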
Enable TCP connection reuse:
```
sudo sysctl -w net.ipv4.tcp_tw_reuse=1
```
Re-run the wrk stress test. If Socket errors: connect 54 no longer appears, the fix worked.

After testing is complete, disable TCP connection reuse to avoid unintended effects on other services: `sudo sysctl -w net.ipv4.tcp_tw_reuse=0`. If the check above originally returned 2, restore that value instead.
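
To reduce the risk of leaving the setting changed after a test, the enable, run, and restore steps can be wrapped together. The sketch below is one possible approach; the helper name `with_tw_reuse` is hypothetical, and it assumes the stress test machine allows `sudo sysctl`.

```shell
# Hypothetical helper: enable net.ipv4.tcp_tw_reuse, run the given command,
# then restore the value that was set before, whatever it was.
with_tw_reuse() {
  local original status
  # Remember the current value so it can be restored afterwards.
  original="$(sysctl -n net.ipv4.tcp_tw_reuse)" || return 1
  sudo sysctl -w net.ipv4.tcp_tw_reuse=1 >/dev/null || return 1
  "$@"
  status=$?
  sudo sysctl -w "net.ipv4.tcp_tw_reuse=${original}" >/dev/null
  # Propagate the exit status of the wrapped command.
  return "$status"
}

# Example:
#   with_tw_reuse ./wrk -t6 -c54 -d60s --latency "http://${node_ip}:8000/"
```

Restoring the saved value instead of a fixed 0 keeps the machine in its original state, which matters on kernels where the default is not 0.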
