Container Service for Kubernetes: Colocate an online service with a video transcoding application

Last Updated: Mar 25, 2026

ack-koordinator provides service level objective (SLO)-aware workload scheduling that lets you colocate online and offline workloads on the same node — keeping your online service performant while improving overall cluster resource utilization. This topic walks through colocating an NGINX web service and an FFmpeg video transcoding application using ack-koordinator.

Background

Colocating online and offline workloads on the same node makes sense because their resource demands are complementary: online services have variable load and consume resources in bursts, while offline batch jobs run continuously and can tolerate lower resource priority.

| Characteristic | Online workload | Offline workload |
| --- | --- | --- |
| Typical applications | Web services, APIs, microservices | Video transcoding, big data processing, AI training |
| Latency | Sensitive | Insensitive |
| SLO | High | Low |
| Resource usage pattern | Bursty, time-based | Continuous |
| Fault tolerance | Low: requires high availability | High: allows failure and retry |

ack-koordinator uses Quality of Service (QoS) classes to manage resource priority between colocated workloads. The two classes used in this topic are:

| QoS class | Label value | Typical use | CPU priority | Memory priority |
| --- | --- | --- | --- | --- |
| Latency-sensitive (LS) | koordinator.sh/qosClass: LS | Online services (e.g., NGINX) | High | High |
| Best-effort (BE) | koordinator.sh/qosClass: BE | Offline batch jobs (e.g., FFmpeg) | Low | Low |

How it works

In this topic, an NGINX service (LS QoS class) and an FFmpeg video transcoding application (BE QoS class) run on the same node. Two colocation features work together to protect NGINX performance.

This topic deploys the applications in three modes and compares the results:

| Mode | Description |
| --- | --- |
| Exclusive deployment (baseline) | Only NGINX runs on the node. |
| Default Kubernetes colocation (control) | NGINX and FFmpeg run on the same node with standard Kubernetes QoS classes, without extended resources or ack-koordinator isolation features. |
| SLO-aware colocation (experimental) | NGINX and FFmpeg run on the same node with ack-koordinator isolation features enabled. |

Prerequisites

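Before you begin, make sure your environment meets the following requirements:

  • ack-koordinator is installed in the cluster.

  • Two nodes are available: a tested machine that runs NGINX and FFmpeg, and a stress test machine that runs wrk.

  • CPU QoS requires Alibaba Cloud Linux as the node OS.

  • Resource isolation based on the L3 cache and MBA requires an ECS Bare Metal instance.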

Deploy the NGINX service and wrk

Deploy the NGINX service on the tested machine and the wrk load testing tool on the stress test machine.

Deploy NGINX

  1. Create a file named ls-nginx.yaml with the following content:

    ---
    # NGINX configuration
    apiVersion: v1
    data:
      config: |-
        user  nginx;
        worker_processes  80; # Number of worker processes — controls concurrent request capacity.
    
        events {
            worker_connections  1024;  # Maximum connections per worker. Default: 1024.
        }
    
        http {
            server {
                listen  8000;
    
                gzip off;
                gzip_min_length 32;
                gzip_http_version 1.0;
                gzip_comp_level 3;
                gzip_types *;
            }
        }
    
        #daemon off;
    kind: ConfigMap
    metadata:
      name: nginx-conf
    
    ---
    # Pod for the online NGINX service (LS QoS class)
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        koordinator.sh/qosClass: LS
        app: nginx
      name: nginx
    spec:
      containers:
        - image: 'koordinatorsh/nginx:v1.18-koord-exmaple'
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            - containerPort: 8000
              hostPort: 8000 # Port exposed for load testing.
              protocol: TCP
          resources:
            limits:
              cpu: '80'
              memory: 10Gi
            requests:
              cpu: '80'
              memory: 10Gi
          volumeMounts:
            - mountPath: /apps/nginx/conf
              name: config
      hostNetwork: true
      restartPolicy: Never
      volumes:
        - configMap:
            items:
              - key: config
                path: nginx.conf
            name: nginx-conf
          name: config
      nodeName: cn-beijing.192.168.2.93  # Replace with the node name of your tested machine.
  2. Deploy the NGINX service:

    kubectl apply -f ls-nginx.yaml
  3. Verify the pod is running:

    kubectl get pod -l app=nginx -o wide

    Expected output:

    NAME    READY   STATUS    RESTARTS   AGE    IP               NODE                      NOMINATED NODE   READINESS GATES
    nginx   1/1     Running   0          43s    11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>

    The Running status confirms the NGINX service is up on the tested machine.

Install wrk on the stress test machine

Run the following commands on Node 2 (the stress test machine) to install wrk 4.2.0:

wget -O wrk-4.2.0.tar.gz https://github.com/wg/wrk/archive/refs/tags/4.2.0.tar.gz && tar -xvf wrk-4.2.0.tar.gz
cd wrk-4.2.0 && make && chmod +x ./wrk

Deploy the FFmpeg application

Deploy the offline FFmpeg video transcoding application on the tested machine. The YAML configuration differs slightly between the default Kubernetes colocation mode and the SLO-aware colocation mode — the relevant comments in the file explain each difference.

  1. Create a file named be-ffmpeg.yaml with the following content:

    # Pod for the offline FFmpeg video transcoding application (BE QoS class)
    apiVersion: v1
    kind: Pod
    metadata:
      name: be-ffmpeg
      labels:
        app: ffmpeg
        # Default Kubernetes colocation mode: remove the koordinator.sh/qosClass: BE label.
        # SLO-aware colocation mode: keep the koordinator.sh/qosClass: BE label.
        koordinator.sh/qosClass: BE
    spec:
      containers:
        # Increase the process count to control CPU utilization of the transcoding application.
        # Default: 25 processes, each with 2 parallel threads.
        - command:
            - start-ffmpeg.sh
            - '25'
            - '2'
            - /apps/ffmpeg/input/HD2-h264.ts
            - /apps/ffmpeg/
          image: 'registry.cn-zhangjiakou.aliyuncs.com/acs/ffmpeg-4-4-1-for-slo-test:v0.1'
          imagePullPolicy: Always
          name: ffmpeg
          resources:
            # Default Kubernetes colocation mode: remove the kubernetes.io/batch-cpu and
            # kubernetes.io/batch-memory extended resources.
            # SLO-aware colocation mode: keep them, sized to your node's resource spec.
            limits:
              kubernetes.io/batch-cpu: 70k
              kubernetes.io/batch-memory: 22Gi
            requests:
              kubernetes.io/batch-cpu: 70k
              kubernetes.io/batch-memory: 22Gi
      hostNetwork: true
      restartPolicy: Never
      nodeName: cn-beijing.192.168.2.93  # Replace with the node name of your tested machine.
  2. Deploy the FFmpeg application:

    kubectl apply -f be-ffmpeg.yaml
  3. Verify the pod is running:

    kubectl get pod -l app=ffmpeg -o wide

    Expected output:

    NAME        READY   STATUS    RESTARTS   AGE    IP               NODE                      NOMINATED NODE   READINESS GATES
    be-ffmpeg   1/1     Running   0          15s    11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>
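
The kubernetes.io/batch-cpu quantity in be-ffmpeg.yaml is expressed in millicores, so 70k corresponds to 70 cores of overcommitted CPU. As a minimal sketch, the hypothetical helper batch_cpu_to_cores below (it is not part of kubectl or ack-koordinator) converts such a quantity to cores. It assumes the value is either a plain integer or an integer with a k suffix, which covers the values used in this topic but not every Kubernetes quantity format.

```shell
# batch_cpu_to_cores QUANTITY
# Hypothetical helper: converts a kubernetes.io/batch-cpu quantity
# (millicores, optionally with a "k" suffix) to a number of cores.
batch_cpu_to_cores() {
  awk -v q="$1" 'BEGIN {
    milli = q
    # "70k" means 70 * 1000 millicores.
    if (q ~ /k$/) { sub(/k$/, "", milli); milli = milli * 1000 }
    printf "%g\n", milli / 1000
  }'
}

# Example: batch_cpu_to_cores 70k
```

When sizing the request for your own node, the result can be compared against the number of transcoding threads started by start-ffmpeg.sh (25 processes with 2 threads each in the default command).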

Run the stress tests

Run tests in each colocation mode and compare the results. The key metrics are:

  • Response time (RT) percentiles: RT-P90 is the latency within which 90% of requests complete, and RT-P99 is the latency within which 99% of requests complete. Lower values indicate better NGINX performance.

  • Average CPU utilization: measured with kubectl top node.
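
Because wrk prints latency percentiles with varying units (us, ms, or s), comparing runs is easier after normalizing them. The following is a minimal sketch, assuming the output format produced by wrk --latency; wrk_percentile_ms is a hypothetical helper name, not a wrk feature.

```shell
# wrk_percentile_ms PERCENTILE
# Hypothetical helper: reads `wrk --latency` output on stdin and prints
# the requested percentile (for example 90 or 99) in milliseconds.
wrk_percentile_ms() {
  awk -v p="$1%" '
    $1 == p {
      v = $2
      if (v ~ /us$/)      { sub(/us$/, "", v); ms = v / 1000 }
      else if (v ~ /ms$/) { sub(/ms$/, "", v); ms = v + 0 }
      else if (v ~ /s$/)  { sub(/s$/, "", v);  ms = v * 1000 }
      else next
      printf "%.3f\n", ms
      exit
    }'
}

# Example, assuming the wrk output was saved to wrk.log:
#   wrk_percentile_ms 99 < wrk.log
```

Saving each run with something like ./wrk ... | tee wrk.log keeps the raw output available for later comparison across modes.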

Mode 1: Exclusive deployment (baseline)

Only the NGINX service runs on the tested machine.

  1. Deploy NGINX as described in Deploy the NGINX service and wrk.

  2. Send load from the stress test machine:

    # Replace node_ip with the IP address of the tested machine.
    ./wrk -t6 -c54 -d60s --latency http://${node_ip}:8000/
  3. Check CPU utilization:

    kubectl top node

    Expected output:

    NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
    cn-beijing.192.168.2.93   29593m       29%    xxxx            xxxx
    cn-beijing.192.168.2.94   6874m        7%     xxxx            xxxx

    CPU utilization on the tested machine is approximately 29%.

  4. After the test completes, review the wrk output. For accurate results, run multiple tests. Expected output:

    Running 1m test @ http://192.168.2.93:8000/
      6 threads and 54 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   402.18us    1.07ms  59.56ms   99.83%
        Req/Sec    24.22k     1.12k   30.58k    74.15%
      Latency Distribution
         50%  343.00us
         75%  402.00us
         90%  523.00us
         99%  786.00us
      8686569 requests in 1.00m, 6.88GB read
    Requests/sec: 144537.08
    Transfer/sec:    117.16MB

    The Latency Distribution section shows RT percentile values. In exclusive mode: RT-P50 is 343 microseconds, RT-P90 is 523 microseconds, and RT-P99 is 786 microseconds.
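
A single kubectl top node reading is only a snapshot, so the average CPU utilization is better estimated from several readings taken during the test. The sketch below assumes the column layout shown in the expected output above; avg_cpu_pct is a hypothetical helper, and the sampling interval in the comment is an arbitrary choice.

```shell
# avg_cpu_pct NODE
# Hypothetical helper: reads one or more `kubectl top node` outputs on
# stdin and prints the mean CPU% of NODE with one decimal place.
avg_cpu_pct() {
  awk -v node="$1" '
    $1 == node {
      pct = $3
      sub(/%$/, "", pct)   # "29%" -> "29"
      sum += pct
      n++
    }
    END {
      if (n > 0) printf "%.1f\n", sum / n
    }'
}

# Example: take six samples, 10 seconds apart, while wrk is running.
#   for i in 1 2 3 4 5 6; do kubectl top node; sleep 10; done \
#     | avg_cpu_pct cn-beijing.192.168.2.93
```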

Mode 2: Default Kubernetes colocation (control)

Both NGINX and FFmpeg run on the tested machine without ack-koordinator isolation features.

Deploy NGINX as described in Deploy the NGINX service and wrk, then deploy the FFmpeg application using be-ffmpeg.yaml with the following modifications:

  • Remove the koordinator.sh/qosClass: BE label.

  • Remove the kubernetes.io/batch-cpu and kubernetes.io/batch-memory extended resources.

Run the wrk load test and collect CPU utilization as in Mode 1. In this control configuration, the node CPU utilization reaches approximately 65%.

Mode 3: SLO-aware colocation (experimental)

Both NGINX and FFmpeg run on the tested machine with ack-koordinator isolation features enabled.

  1. Follow the Getting started guide to enable SLO-aware colocation, then configure each feature:

    • [Dynamic resource overcommitment](https://www.alibabacloud.com/help/en/document_detail/412172.html#task-2190961): Use the default configuration. This allows the system to allocate idle LS pod resources to BE pods as overcommitted batch resources (kubernetes.io/batch-cpu and kubernetes.io/batch-memory).

    • [CPU Suppress](https://www.alibabacloud.com/help/en/document_detail/268626.html#task-2088911): Set cpuSuppressThresholdPercent to 65. Use defaults for other settings. When node CPU utilization exceeds 65%, this feature throttles BE pod CPU usage to protect LS pod performance.

    • [CPU QoS](https://www.alibabacloud.com/help/en/document_detail/433810.html#task-2223861): Use the default configuration. This enables the CPU Identity capability on Alibaba Cloud Linux, giving LS pods scheduling priority over BE pods — including when simultaneous multithreading (SMT) runs threads from both pods on the same physical core.

    • [Resource isolation based on the L3 cache and MBA](https://www.alibabacloud.com/help/en/document_detail/273042.html#task-2093499): Use the default configuration. On ECS Bare Metal instances, this isolates the L3 cache (last-level cache) and memory bandwidth, the latter through Memory Bandwidth Allocation (MBA), so that LS pods get priority access.

    Important

    CPU QoS requires Alibaba Cloud Linux as the node OS. L3 cache and MBA isolation requires an ECS Bare Metal instance.

  2. Deploy NGINX as described in Deploy the NGINX service and wrk.

  3. Create a file named besteffort-ffmpeg.yaml with the following content:

    # Pod for the offline FFmpeg video transcoding application (BE QoS class, SLO-aware mode)
    apiVersion: v1
    kind: Pod
    metadata:
      name: besteffort-ffmpeg
      labels:
        app: ffmpeg
        # Set the QoS class to BE for SLO-aware scheduling.
        koordinator.sh/qosClass: BE
    spec:
      containers:
        - command:
            - start-ffmpeg.sh
            - '30'
            - '2'
            - /apps/ffmpeg/input/HD2-h264.ts
            - /apps/ffmpeg/
          image: 'registry.cn-zhangjiakou.aliyuncs.com/acs/ffmpeg-4-4-1-for-slo-test:v0.1'
          imagePullPolicy: Always
          name: ffmpeg
          resources:
            # Request dynamically overcommitted resources.
            limits:
              kubernetes.io/batch-cpu: 70k
              kubernetes.io/batch-memory: 22Gi
            requests:
              kubernetes.io/batch-cpu: 70k
              kubernetes.io/batch-memory: 22Gi
      hostNetwork: true
      restartPolicy: Never
      nodeName: cn-beijing.192.168.2.93  # Replace with the node name of your tested machine.
  4. Deploy the FFmpeg application:

    kubectl apply -f besteffort-ffmpeg.yaml
  5. Verify the FFmpeg pod is running:

    kubectl get pod -l app=ffmpeg -o wide

    Expected output:

    NAME                READY   STATUS    RESTARTS   AGE    IP               NODE                      NOMINATED NODE   READINESS GATES
    besteffort-ffmpeg   1/1     Running   0          15s    11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>
  6. Send load from the stress test machine:

    # Replace node_ip with the IP address of the tested machine.
    ./wrk -t6 -c54 -d60s --latency http://${node_ip}:8000/
  7. Check CPU utilization:

    kubectl top node

    Expected output:

    NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
    cn-beijing.192.168.2.93   65424m       63%    xxxx            xxxx
    cn-beijing.192.168.2.94   7040m        7%     xxxx            xxxx

    CPU utilization on the tested machine is approximately 63%.

  8. After the test completes, review the wrk output and compare with the results from the other modes.
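
The CPU Suppress threshold configured in step 1 can be read as a CPU budget for BE pods. The open-source Koordinator documentation describes it roughly as node capacity multiplied by the threshold, minus the CPU used by LS pods, minus the CPU used by the system. The sketch below assumes that formula; ack-koordinator may differ in detail, and the helper name and the numbers in the example are hypothetical.

```shell
# be_cpu_budget_milli TOTAL_MILLI THRESHOLD_PCT LS_USED_MILLI SYS_USED_MILLI
# Hypothetical helper: prints the CPU (in millicores) left for BE pods
# under the assumed CPU Suppress formula, never below 0.
be_cpu_budget_milli() {
  budget=$(( $1 * $2 / 100 - $3 - $4 ))
  if [ "$budget" -lt 0 ]; then
    budget=0
  fi
  echo "$budget"
}

# Example with made-up numbers: a 100-core node, threshold 65,
# LS pods using 29.6 cores, system processes using 2 cores.
#   be_cpu_budget_milli 100000 65 29600 2000
```

Under this reading, the BE budget shrinks automatically as NGINX load rises, which is why FFmpeg is throttled before it can push total node utilization far past the threshold.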

Test results

The following table compares NGINX response time and node CPU utilization across all three modes.

| Metric | Baseline (exclusive) | Control (default Kubernetes) | Experimental (SLO-aware) |
| --- | --- | --- | --- |
| NGINX RT-P90 (ms) | 0.533 | 0.574 (+7.7%) | 0.548 (+2.8%) |
| NGINX RT-P99 (ms) | 0.93 | 1.07 (+16%) | 0.96 (+3.2%) |
| Average CPU utilization | 29.6% | 65.1% | 64.8% |

Key observations:

  • Default Kubernetes colocation vs. baseline: CPU utilization increases from 29.6% to 65.1%, but NGINX RT-P90 rises 7.7% and RT-P99 rises 16%. The latency distribution has a long tail.

  • SLO-aware colocation vs. baseline: CPU utilization increases from 29.6% to 64.8%, while RT-P90 rises only 2.8% and RT-P99 rises only 3.2%.

  • SLO-aware colocation vs. default Kubernetes colocation: CPU utilization is similar (~65%), but NGINX response times are significantly lower and close to the exclusive-deployment baseline.

SLO-aware colocation achieves roughly the same CPU utilization improvement as standard colocation, while keeping NGINX latency much closer to the no-colocation baseline.
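
The percentages in the comparison are relative increases over the exclusive-deployment baseline. As a small sketch of that arithmetic, the hypothetical pct_change helper below recomputes three of them from the rounded values in the table.

```shell
# pct_change BASELINE VALUE
# Hypothetical helper: prints the signed relative change of VALUE against
# BASELINE, in percent with one decimal place.
pct_change() {
  awk -v b="$1" -v v="$2" 'BEGIN { printf "%+.1f%%\n", (v - b) / b * 100 }'
}

pct_change 0.533 0.574   # RT-P90, default Kubernetes colocation: +7.7%
pct_change 0.533 0.548   # RT-P90, SLO-aware colocation:          +2.8%
pct_change 0.93  0.96    # RT-P99, SLO-aware colocation:          +3.2%
```

Recomputing from rounded values can differ slightly from the published figures. For example, the rounded RT-P99 values for default colocation (0.93 and 1.07) give about +15.1%, so the +16% in the table was presumably derived from unrounded measurements.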

FAQ

Why does wrk report "Socket errors: connect 54,"?

This error means the wrk client can't establish connections to the NGINX server because the number of connections exceeds the OS limit. Fix it by enabling TCP connection reuse on the stress test machine (not the tested machine).

  1. Check whether TCP connection reuse is enabled:

    sudo sysctl -n net.ipv4.tcp_tw_reuse

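    A return value of 0 means the feature is disabled, and 2 means it is enabled only for loopback traffic. In either case, enable it as described in the next step.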

  2. Enable TCP connection reuse:

    sudo sysctl -w net.ipv4.tcp_tw_reuse=1
  3. Re-run the wrk stress test. If Socket errors: connect 54 no longer appears, the fix worked.

After testing is complete, restore the previous value to avoid unintended effects on other services. For example, if the check in step 1 returned 0, run sudo sysctl -w net.ipv4.tcp_tw_reuse=0.
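
A minimal sketch of a save-and-restore pattern for this setting follows. It records the value in place before the test and puts it back afterwards, rather than assuming the original value was 0. Both function names are hypothetical, and the wrapper assumes sudo and sysctl are available on the stress test machine.

```shell
# tw_reuse_state VALUE: describe a net.ipv4.tcp_tw_reuse value.
tw_reuse_state() {
  case "$1" in
    0) echo "disabled" ;;
    1) echo "enabled" ;;
    2) echo "loopback-only" ;;
    *) echo "unknown" ;;
  esac
}

# with_tw_reuse CMD [ARGS...]
# Hypothetical wrapper: enables TCP connection reuse, runs CMD, then
# restores whatever value was set before and returns CMD's exit status.
with_tw_reuse() {
  _orig=$(sysctl -n net.ipv4.tcp_tw_reuse) || return 1
  echo "tcp_tw_reuse before test: $_orig ($(tw_reuse_state "$_orig"))" >&2
  sudo sysctl -w net.ipv4.tcp_tw_reuse=1 >/dev/null || return 1
  "$@"
  _rc=$?
  sudo sysctl -w net.ipv4.tcp_tw_reuse="$_orig" >/dev/null
  return "$_rc"
}

# Example:
#   with_tw_reuse ./wrk -t6 -c54 -d60s --latency "http://${node_ip}:8000/"
```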
