
Container Service for Kubernetes: Use P2P acceleration

Last Updated: Mar 26, 2026

The P2P acceleration feature accelerates image pulls to reduce application deployment time. When many nodes in a container cluster need to pull an image simultaneously, you can use P2P acceleration to improve performance. This topic describes how to use the P2P acceleration feature to accelerate image pulls.

How P2P acceleration works

When many nodes pull an image at the same time, the container image registry becomes a bandwidth bottleneck. P2P acceleration distributes image data across compute nodes in the cluster, reducing back-to-origin traffic and speeding up pulls for all nodes simultaneously.

In a 1000-node cluster pulling a 1 GB image over a 10 Gbit/s network, P2P acceleration reduces pull time by more than 95% compared to standard image pulls. The new P2P acceleration mode also improves performance by 30% to 50% over the old mode.

By default, the new P2P acceleration mode is used when Container Registry loads image resources on demand. For details, see Load resources of a container image on demand.

P2P acceleration is supported in the following cluster types:

  • ACK clusters

  • On-premises clusters and clusters of third-party cloud service providers

Prerequisites

Before you begin, ensure that the P2P acceleration agent is installed in your cluster. The agent's webhook performs the image address replacement and image pull secret generation that P2P acceleration relies on.

How the webhook modifies your workload

When P2P acceleration is enabled for a Pod, the P2P acceleration agent's webhook automatically:

  1. Replaces the container image address with a P2P-accelerated endpoint. For example:

    • Original: test****vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest

    • Replaced: test****vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest

  2. Generates an image pull secret for the P2P endpoint, copied from your original image pull secret. The new secret differs only in the domain name.
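The address rewrite follows a simple pattern: insert `distributed` after the first domain label and add port 65001 before the repository path. A minimal shell sketch of that mapping (the sed expressions are an illustration of the pattern, not the webhook's actual implementation):

```shell
# Illustration only: reproduce the address rewrite performed by the webhook.
orig="test-registry-vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest"
# Insert ".distributed" after the first domain label, then ":65001" before the path.
p2p=$(echo "$orig" | sed 's|^\([^.]*\)\.|\1.distributed.|; s|/|:65001/|')
echo "$p2p"
```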

Important

Image pull secret generation and image address replacement are asynchronous. To avoid pull failures, create or issue the image pull secret for the P2P endpoint before you deploy a workload. In the example above, the secret must exist for the test****vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001 domain.
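A sketch of what such a pre-created secret can look like (the secret name and credential payload below are placeholders; in practice, copy the credentials from your original pull secret and point them at the .distributed. domain):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: test-registry-p2p   # placeholder name
type: kubernetes.io/dockerconfigjson
data:
  # Base64-encoded Docker config: same username and password as the
  # original secret, but with the P2P endpoint as the server address.
  .dockerconfigjson: <base64-encoded-docker-config>
```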

Note

If the credentials in your original image pull secret are invalid, the P2P-accelerated image pull will also fail.

Enable P2P acceleration

Add the P2P acceleration label (k8s.aliyun.com/image-accelerate-mode: p2p) to a workload or a namespace. You do not need to modify workload YAML files when using the namespace-level method.

Option 1: Add the label to a workload

This example adds the label to a Deployment. Edit the Deployment:

kubectl edit deploy <Name of the Deployment>

Add the k8s.aliyun.com/image-accelerate-mode: p2p label to the Pod template's labels section:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        # enable P2P
        k8s.aliyun.com/image-accelerate-mode: p2p
        app: nginx
    spec:
      # your ACR instance image pull secret
      imagePullSecrets:
      - name: test-registry
      containers:
      # your ACR instance image
      - image: test-registry-vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest
        name: test
        command: ["sleep", "3600"]

Option 2: Add the label to a namespace

Applying the label at the namespace level enables P2P acceleration for all eligible workloads in that namespace, with no per-workload changes required.

Using the ACK console:

  1. Log on to the ACK console. In the navigation pane on the left, click Clusters.

  2. On the Clusters page, find the cluster you want and click its name. In the left-side navigation pane, click Namespaces and Quotas.

  3. On the Namespace page, find the target namespace and click Edit in the Actions column.

  4. In the Edit Namespace dialog box, click +Labels, set Variable Name to k8s.aliyun.com/image-accelerate-mode and Variable Value to p2p, and then click OK.

Using kubectl:

kubectl label namespaces <YOUR-NAMESPACE> k8s.aliyun.com/image-accelerate-mode=p2p

Verify P2P acceleration

After enabling P2P acceleration, confirm that the webhook has injected the annotation, P2P image address, and P2P image pull secret into the Pod.

Run the following command:

kubectl get po <Name of the pod> -oyaml

Expected output:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    # injected automatically
    k8s.aliyun.com/image-accelerate-mode: p2p
    k8s.aliyun.com/p2p-config: '...'
spec:
  containers:
  # image replaced with P2P endpoint
  - image: test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest
  imagePullSecrets:
  - name: test-registry
  # image pull secret for P2P endpoint
  - name: acr-credential-test-registry-p2p

P2P acceleration is active when all three of the following are present in the output:

  • The k8s.aliyun.com/image-accelerate-mode: p2p annotation

  • The image address replaced with a .distributed. P2P endpoint

  • The P2P image pull secret (acr-credential-<original-secret-name>-p2p)
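The three checks above can be scripted with grep against a saved Pod manifest. A minimal sketch, where the trimmed inline manifest stands in for the output of kubectl get po <Name of the pod> -oyaml:

```shell
# Save the Pod manifest (in practice: kubectl get po <pod> -oyaml > pod.yaml).
# A trimmed inline manifest stands in for it here.
cat > pod.yaml <<'EOF'
metadata:
  annotations:
    k8s.aliyun.com/image-accelerate-mode: p2p
spec:
  containers:
  - image: test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest
  imagePullSecrets:
  - name: test-registry
  - name: acr-credential-test-registry-p2p
EOF
# All three markers must be present for P2P acceleration to be active.
if grep -q 'image-accelerate-mode: p2p' pod.yaml &&
   grep -q '\.distributed\.' pod.yaml &&
   grep -q 'acr-credential-.*-p2p' pod.yaml; then
  echo "P2P acceleration active"
fi
```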

(Optional) Disable loading images on demand and enable P2P acceleration

Use this procedure to configure a single node to use P2P acceleration without on-demand image loading.

Note

These changes apply to a single node only. Subsequent O&M operations on the node may overwrite them. Re-apply the changes if that happens.

  1. Log on to the ACK console. In the navigation pane on the left, click Clusters.

  2. On the Clusters page, click the name of the cluster you want to manage. In the navigation pane on the left, choose Nodes > Nodes.

  3. On the Nodes page, click the instance ID under the IP address of the node you want to manage.

  4. On the instance details page, use Connect to log on to the node.

  5. Run the vi command to edit the p2pConfig field in the /etc/overlaybd/overlaybd.json file. Set enable to false:

    {
        "p2pConfig": {
            "enable": false,
            "address": "https://localhost:6****/accelerator"
        },
        ...
    }
  6. Restart the overlaybd service for the change to take effect:

    service overlaybd-tcmu restart
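If you are scripting this change across nodes rather than editing with vi, a sed one-liner can flip the flag non-interactively. A sketch, assuming the key is formatted as shown above (demonstrated here on a copy of the file rather than the live config):

```shell
# Work on a copy; on a real node, target /etc/overlaybd/overlaybd.json
# and restart overlaybd-tcmu afterwards.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
    "p2pConfig": {
        "enable": true,
        "address": "https://localhost:6****/accelerator"
    }
}
EOF
# Flip p2pConfig.enable from true to false.
sed -i 's/"enable": *true/"enable": false/' "$cfg"
grep '"enable"' "$cfg"
```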

(Optional) Enable acceleration metric collection

Enable metric collection

Enable metric collection when installing the P2P acceleration agent. In the agent's YAML configuration, set exporter.enable to true:

p2p:
  v2:
    # Component for P2P v2
    image: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/dadi-agent
    imageTag: v0.1.2-72276d4-aliyun

    # Concurrency limit on the number of layers each node proxy downloads simultaneously
    proxyConcurrencyLimit: 128

    # Port for communication between P2P nodes
    p2pPort: 65002

    cache:
      # Disk cache capacity in bytes; default 4 GB
      capacity: 4294967296
      # Set to 1 if you are using high-performance disks, such as ESSD PL2/PL3
      aioEnable: 0
    exporter:
      # Set to true to enable metric collection
      enable: true
      port: 65003

    # Downstream throughput limit in MB/s
    throttleLimitMB: 512

Access metrics

The exporter field in the P2P YAML file defines the listening port. Configure it as follows:

ExporterConfig:
  enable: true # Enables the metric collection feature.
  port: 65006 # Listening port.
  standaloneExporterPort: true # Exposes a standalone port. If set to false, throughput is reported over the HTTP service port.

To retrieve metrics, run the following command, replacing $port with the configured exporter port:

curl 127.0.0.1:$port/metrics

Example output:

# HELP DADIP2P_Alive
# TYPE DADIP2P_Alive gauge
DADIP2P_Alive{node="192.168.69.172:65005",mode="agent"} 1.000000 1692156721833

# HELP DADIP2P_Read_Throughtput Bytes / sec
# TYPE DADIP2P_Read_Throughtput gauge
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833

# HELP DADIP2P_QPS
# TYPE DADIP2P_QPS gauge
DADIP2P_QPS{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833

# HELP DADIP2P_MaxLatency us
# TYPE DADIP2P_MaxLatency gauge
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833

# HELP DADIP2P_Count Bytes
# TYPE DADIP2P_Count gauge
DADIP2P_Count{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833

# HELP DADIP2P_Cache
# TYPE DADIP2P_Cache gauge
DADIP2P_Cache{node="192.168.69.172:65005",type="allocated",mode="agent"} 4294967296.000000 1692156721833
DADIP2P_Cache{node="192.168.69.172:65005",type="used",mode="agent"} 4294971392.000000 1692156721833

# HELP DADIP2P_Label
# TYPE DADIP2P_Label gauge
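Individual gauges can be extracted from this dump with standard text tools. A small sketch that pulls out the DADIP2P_Alive value (in practice, pipe the output of the curl command above; here a single sample line stands in for the full dump):

```shell
# One sample line stands in for: curl -s 127.0.0.1:$port/metrics
metrics='DADIP2P_Alive{node="192.168.69.172:65005",mode="agent"} 1.000000 1692156721833'
# Field 1 is the metric name plus labels, field 2 the gauge value.
alive=$(printf '%s\n' "$metrics" | awk '/^DADIP2P_Alive/ {print $2}')
echo "$alive"
```

A value of 1 indicates that the service is alive.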

Metrics reference

Metric names

Metric | Description | Unit
DADIP2P_Alive | Whether the P2P service is alive (1 = alive) | N/A
DADIP2P_Read_Throughtput | P2P service read throughput | bytes/s
DADIP2P_QPS | Queries processed per second | queries/s
DADIP2P_MaxLatency | Maximum request latency | μs
DADIP2P_Count | Cumulative traffic processed | bytes
DADIP2P_Cache | Cache usage per server | bytes

Tags

Tag | Value | Description
node | IP:port | Service address of the P2P agent or root
type | pread | Downstream request processing
type | download | Back-to-origin routing
type | peer | P2P network distribution
type | disk | Disk operations
type | http | HTTP request processing
type | allocated | Cache space allocated
type | used | Cache space in use

Example:

DADIP2P_Count{node="11.238.108.XXX:9877",type="http",mode="agent"} 4248808352.000000 1692157615810
The total HTTP request traffic processed by the agent service: 4248808352 bytes.

DADIP2P_Cache{node="11.238.108.XXX:9877",type="used",mode="agent"} 2147487744.000000 1692157615810
The cache used by the agent: 2147487744 bytes.

Audit logs

Enable audit logs

In the p2p ConfigMap, set logAudit to true:

DeployConfig:
  mode: agent
  logDir: /dadi-p2p/log
  logAudit: true
  logAuditMode: stdout # stdout sends logs to the console. Set to file to write logs to /dadi-p2p/log/audit.log.

Audit log format

Each log entry records the processing time from request receipt to response return. Unit: μs.

2022/08/30 15:44:52|AUDIT|th=00007FBA247C5280|download[pathname=/https://cri-pi840la*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=125829120][size=2097152][latency=267172]
....
2022/08/30 15:44:55|AUDIT|th=00007FBA2EFEAEC0|http:pread[pathname=/https://cri-pi840lacia*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=127467520][size=65536][latency=21]

Each entry contains: a timestamp, the AUDIT marker, a thread pointer, and an operation code followed by [pathname=], [offset=], [size=], and [latency=] fields.

  • Ignore the AUDIT marker and thread pointer — they are for internal use.

  • size: data size for a single request. A negative value indicates an exception.

  • latency: processing latency for a single request, in μs.
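These fields are easy to pull out of a log line with sed. A sketch that extracts the operation code and latency (the log line below is abbreviated from the example above, with the pathname replaced by a placeholder):

```shell
# Abbreviated audit log entry; the pathname is a placeholder.
line='2022/08/30 15:44:55|AUDIT|th=00007FBA2EFEAEC0|http:pread[pathname=/data][offset=127467520][size=65536][latency=21]'
# Operation code: the text between the last "|" and the first "[" that follows it.
op=$(printf '%s\n' "$line" | sed 's/.*|\([^|[]*\)\[.*/\1/')
# Latency in microseconds.
latency=$(printf '%s\n' "$line" | sed 's/.*\[latency=\([0-9-]*\)\].*/\1/')
echo "$op $latency"
```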

Common operation codes:

Code | Description
http:pread | HTTP proxy processing an output data request
rpc:stat | P2P agent getting the file size
rpc:pread | P2P agent processing an output data request
download | P2P agent downloading data from the upstream
filewrite | P2P agent writing a data shard to the cache
fileread | P2P agent reading data shards from the cache

Log example:

download[pathname=mytest][offset=0][size=65536][latency=26461]
  ## The latency when the P2P agent downloads the [0,65536) data of the mytest file from the upstream is 26461 μs.
rpc:pread[pathname=mytest][offset=0][size=65536][latency=2]
  ## The latency when the P2P agent returns the [0,65536) data of the mytest file to the downstream is 2 μs.
http:pread[pathname=mytest][offset=0][size=65536][latency=26461]
  ## The latency when the proxy downloads the [0,65536) data of the mytest file from the upstream is 26461 μs.

Appendix

P2P acceleration performance reference

Test results for 1000 nodes pulling the same image simultaneously, with image decompression after download.

Test environment:

Component | Specification
ACK cluster | 1000 nodes
Elastic Compute Service (ECS) instance | 4 vCPUs, 8 GB memory
Cloud disk | 200 GB PL1 ESSD
P2P agent | 1 vCPU, 1 GB memory, 4 GB cache

Image specifications tested:

  • 4 GB (512 MB × 8 layers)

  • 10 GB (10 GB × 1 layer)

  • 20 GB (4 GB × 5 layers, 10 GB × 2 layers, 512 MB × 40 layers, 20 GB × 1 layer, 2 GB × 10 layers)

Test results (P95 pull time):

Image specification | Pull time | Peak back-to-origin throughput (Gbit/s)
512 MB × 8 layers | 116 seconds | 2
10 GB × 1 layer | 6 minutes 20 seconds | 1.2
4 GB × 5 layers | 9 minutes 15 seconds | 5.1
10 GB × 2 layers | 9 minutes 50 seconds | 6.7
512 MB × 40 layers | 7 minutes 55 seconds | 3.8
20 GB × 1 layer | 11 minutes | 2.5
2 GB × 10 layers | 8 minutes 13 seconds | 3.2