Container Registry: Use the P2P acceleration feature

Last Updated: Feb 21, 2025

The P2P acceleration feature speeds up image pulls to reduce the time that is required to deploy applications. If a large number of nodes in a container cluster need to pull an image at the same time, you can use the P2P acceleration feature to distribute the pull load across the nodes. This topic describes how to use the feature.

Background information

If a large number of nodes in a container cluster need to pull an image at the same time, the network bandwidth of the container image storage may become a performance bottleneck. The P2P acceleration feature uses the bandwidth of compute nodes to distribute image layers across nodes. This reduces the pressure on the container image storage, accelerates image pulls, and shortens application deployment. Test results indicate that when 1,000 nodes pull a 1 GB image at the same time over a network with 10 Gbit/s of bandwidth, the P2P acceleration feature reduces the image pull time by more than 95% compared with the regular pull mode. In addition, the new P2P acceleration mode improves performance by 30% to 50% compared with the old mode. By default, the new P2P acceleration mode is used when Container Registry loads image resources on demand. For more information, see Load resources of a container image on demand.

You can use the P2P acceleration feature in the following types of clusters:

  • Container Service for Kubernetes (ACK) clusters

  • On-premises clusters and clusters of third-party cloud service providers

Prerequisites

A P2P acceleration agent is installed in the cluster.

Limits

After you enable the P2P acceleration feature, the P2P acceleration agent uses a webhook to replace your container image address with the address of the P2P-accelerated image. For example, the webhook replaces the original image address test****vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest with the P2P-accelerated address test****vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest.

In addition, the webhook automatically generates an image pull secret for the P2P-accelerated image by copying the original image pull secret. Because the secret generation and the address replacement are asynchronous, we recommend that you manually create the image pull secret for the P2P domain (in the preceding example, test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001) before you deploy a workload. This prevents image pull failures caused by the address replacement.
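For example, you can pre-create the secret with kubectl. The following command is a minimal sketch; the secret name and the credentials are placeholders that you must replace with your own ACR credentials:

kubectl create secret docker-registry test-registry-p2p \
  --docker-server=test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001 \
  --docker-username=<your-acr-username> \
  --docker-password=<your-acr-password>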

Enable P2P acceleration

You can add the P2P acceleration label to workloads such as pods and Deployments to enable P2P acceleration for the workloads. You can also add the P2P acceleration label to a namespace in your ACK cluster. This way, P2P acceleration is enabled for all workloads that meet acceleration conditions in the namespace. You do not need to modify the YAML files of the workloads to enable P2P acceleration. You can select a method to add the P2P acceleration label based on your business requirements.

Note

The name of the P2P acceleration label is k8s.aliyun.com/image-accelerate-mode and the value is p2p.

  • Add the P2P acceleration label to a workload

    In this example, the P2P acceleration label is added to a Deployment. Run the following command to edit the YAML file of the Deployment:

    kubectl edit deploy <Name of the Deployment>

    Add the k8s.aliyun.com/image-accelerate-mode: p2p label to the YAML file of the Deployment.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: test
      labels:
        app: nginx
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            # enable P2P
            k8s.aliyun.com/image-accelerate-mode: p2p
            app: nginx
        spec:
          # your ACR instance image pull secret
          imagePullSecrets:
          - name: test-registry
          containers:
          # your ACR instance image
          - image: test-registry-vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest
            name: test
            command: ["sleep", "3600"]
  • Add the P2P acceleration label to a namespace

    • Use the ACK console

      1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

      2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, click Namespaces and Quotas.

      3. On the Namespace page, find the namespace to which you want to add the P2P acceleration label and click Edit in the Actions column.

      4. In the Edit Namespace dialog box, set the Variable Key parameter of the Label parameter to k8s.aliyun.com/image-accelerate-mode and the Variable Value parameter of the Label parameter to p2p. Then, click OK.

    • Use kubectl

      kubectl label namespaces <YOUR-NAMESPACE> k8s.aliyun.com/image-accelerate-mode=p2p
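      To verify that the label is added, you can run the following command and check the output for k8s.aliyun.com/image-accelerate-mode=p2p:

      kubectl get namespace <YOUR-NAMESPACE> --show-labels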

Check whether the P2P acceleration feature is enabled

After you enable P2P acceleration for a pod, the P2P agent automatically injects the P2P acceleration annotation, the address of the P2P-accelerated image, and the secret for pulling the P2P-accelerated image into the pod.

Important

The secret for pulling a P2P-accelerated image and the secret for pulling the original image are different only in the domain name of the image repository. Other configurations of the secrets are identical. If the user information in the secret for pulling the original image is invalid, the P2P-accelerated image fails to be pulled.

Run the following command to query pods:

kubectl get po <Name of the pod> -o yaml

Expected output:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    # P2P annotations are injected automatically
    k8s.aliyun.com/image-accelerate-mode: p2p
    k8s.aliyun.com/p2p-config: '...'
spec:
  containers:
   # the image address is replaced with the P2P endpoint
   - image: test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest
  imagePullSecrets:
  - name: test-registry
  # the image pull secret for the P2P endpoint is injected
  - name: acr-credential-test-registry-p2p

In the preceding code, the P2P acceleration annotation, the address of the P2P-accelerated image, and the secret for pulling the P2P-accelerated image are injected into the pod. P2P acceleration is enabled for the pod.
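Alternatively, to check only the injected annotation instead of reading the full YAML, you can query it with a JSONPath expression. The following command is a sketch; the dots in the annotation key must be escaped. The expected output is p2p:

kubectl get po <Name of the pod> -o jsonpath='{.metadata.annotations.k8s\.aliyun\.com/image-accelerate-mode}'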

(Optional) Enable acceleration metric collection on the client


Enable the metric collection feature

Enable the metric collection feature when you install the P2P acceleration agent.

p2p:
  v2:
    # Component image for P2P v2
    image: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/dadi-agent
    imageTag: v0.1.2-72276d4-aliyun

    # Maximum number of layers that each node proxy downloads concurrently
    proxyConcurrencyLimit: 128

    # The server port used to communicate with other P2P nodes
    p2pPort: 65002

    cache:
      # Disk cache capacity in bytes. Default: 4 GB.
      capacity: 4294967296
      # Set to 1 if you use high-performance disks on your ECS instances, for example, ESSD PL2 or PL3 disks
      aioEnable: 0
    exporter:
      # Set to true to collect component metrics
      enable: false
      port: 65003

    # Downstream throughput limit, in MB/s
    throttleLimitMB: 512
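How you apply this configuration depends on how the P2P acceleration agent was installed. If the agent is managed by a Helm chart, a sketch like the following can enable the exporter; the release name, chart reference, and namespace below are placeholders, not the actual names:

helm upgrade --install p2p-agent <p2p-agent-chart> \
  --namespace kube-system \
  --set p2p.v2.exporter.enable=true \
  --set p2p.v2.exporter.port=65003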

Method for accessing the P2P acceleration agent

The exporter parameters in the YAML file of the P2P acceleration agent define the port for collecting metrics.

ExporterConfig:
  enable: true # Specifies whether to enable the metric collection feature.
  port: 65006 # The listening port.
  standaloneExporterPort: true # Specifies whether to expose metrics on a standalone port. If you set this parameter to false, metrics are served over the HTTP service port.

Run the curl 127.0.0.1:$port/metrics command. The following metrics are returned.

# HELP DADIP2P_Alive 
# TYPE DADIP2P_Alive gauge
DADIP2P_Alive{node="192.168.69.172:65005",mode="agent"} 1.000000 1692156721833

# HELP DADIP2P_Read_Throughtput Bytes / sec
# TYPE DADIP2P_Read_Throughtput gauge
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833

# HELP DADIP2P_QPS 
# TYPE DADIP2P_QPS gauge
DADIP2P_QPS{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833

# HELP DADIP2P_MaxLatency us
# TYPE DADIP2P_MaxLatency gauge
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833

# HELP DADIP2P_Count Bytes
# TYPE DADIP2P_Count gauge
DADIP2P_Count{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833

# HELP DADIP2P_Cache 
# TYPE DADIP2P_Cache gauge
DADIP2P_Cache{node="192.168.69.172:65005",type="allocated",mode="agent"} 4294967296.000000 1692156721833
DADIP2P_Cache{node="192.168.69.172:65005",type="used",mode="agent"} 4294971392.000000 1692156721833

# HELP DADIP2P_Label 
# TYPE DADIP2P_Label gauge
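The output is in the Prometheus exposition format, so any Prometheus-compatible collector can scrape it. The following scrape configuration is a minimal sketch; the job name is a placeholder, and the target address assumes the exporter port that you configured above:

scrape_configs:
  - job_name: dadi-p2p-agent
    static_configs:
      - targets:
          # Replace with each node's IP address and the exporter port
          # configured above (for example, 65003 or 65006).
          - <NODE-IP>:<EXPORTER-PORT>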

Metrics description

Metric names

  • DADIP2P_Alive: indicates whether the service is alive.

  • DADIP2P_Read_Throughtput: the throughput of the P2P service. Unit: byte/s.

  • DADIP2P_QPS: the queries per second (QPS).

  • DADIP2P_MaxLatency: the latency statistics. Unit: μs.

  • DADIP2P_Count: the traffic statistics. Unit: bytes.

  • DADIP2P_Cache: the cache that is used by a single server. Unit: bytes.

Tags

  • node: the service IP address and port number of the P2P agent or root.

  • type: the type of the metric.

    • pread: serving downstream read requests.

    • download: back-to-origin downloads.

    • peer: distribution over the P2P network.

    • disk: disk reads and writes.

    • http: serving HTTP requests.

    • allocated: the space that is allocated to the cache.

    • used: the space that is used by the cache.

Metric example

DADIP2P_Count{node="11.238.108.XXX:9877",type="http",mode="agent"} 4248808352.000000 1692157615810
The total HTTP request traffic that is processed by the agent service: 4248808352 bytes. 

DADIP2P_Cache{node="11.238.108.XXX:9877",type="used",mode="agent"} 2147487744.000000 1692157615810
The cache that is used by the agent: 2147487744 bytes.

Audit logs

Enable audit logs

Set the logAudit parameter in the P2P ConfigMap to true.

DeployConfig:
  mode: agent
  logDir: /dadi-p2p/log
  logAudit: true
  logAuditMode: stdout # Logs are printed to the console. If you set this parameter to file, logs are written to the /dadi-p2p/log/audit.log file.

Format of audit logs

The following code shows the format of audit logs. Each entry records the processing time from the receipt of a request to the return of its result. Unit: μs.

2022/08/30 15:44:52|AUDIT|th=00007FBA247C5280|download[pathname=/https://cri-pi840la*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=125829120][size=2097152][latency=267172]
....
2022/08/30 15:44:55|AUDIT|th=00007FBA2EFEAEC0|http:pread[pathname=/https://cri-pi840lacia*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=127467520][size=65536][latency=21]

Each entry consists of the time, the AUDIT keyword, the thread pointer, and an operation code followed by [pathname=][offset=][size=][latency=] fields.

You can ignore the AUDIT keyword and the thread pointer. The size field specifies the amount of data in a single request; a negative value indicates that an exception occurred. The latency field specifies the latency of a single request. Unit: μs.
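For example, to find entries with a negative size in an audit log file (assuming logAuditMode is set to file, so the log is written to the path described above), you can run:

grep -E '\[size=-[0-9]+\]' /dadi-p2p/log/audit.log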

The following list describes common operation codes:

  • http:read: The HTTP proxy processes outbound data requests.

  • rpc:stat: The P2P agent obtains the file size.

  • rpc:pread: The P2P agent processes output data requests.

  • download: The P2P agent downloads data from the upstream.

  • filewrite: The P2P agent writes the current data shard to the cache.

  • fileread: The P2P agent reads data shards from the cache.

Log example

download[pathname=mytest][offset=0][size=65536][latency=26461]
  ## The P2P agent downloads the [0,65536) range of the mytest file from the upstream with a latency of 26461 μs.
rpc:pread[pathname=mytest][offset=0][size=65536][latency=2]
  ## The P2P agent returns the [0,65536) range of the mytest file to the downstream with a latency of 2 μs.
http:pread[pathname=mytest][offset=0][size=65536][latency=26461]
  ## The HTTP proxy returns the [0,65536) range of the mytest file to the downstream with a latency of 26461 μs.

(Optional) Disable loading images on demand and enable P2P acceleration

Note

You can modify the configuration of a single node in a cluster to disable on-demand image loading while keeping P2P acceleration enabled. Subsequent O&M operations on the node may overwrite the modification, in which case you must apply it again. Perform the following operations:

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Nodes > Nodes.

  3. On the Nodes page, click the Instance ID in the Name/IP Address/Instance ID column corresponding to the node that you want to manage.

  4. On the Instance Details tab, click Connect in the upper-right corner to connect to the node.

  5. Use the vi editor to open the /etc/overlaybd/overlaybd.json file and change the value of enable in the p2pConfig field to false.

    {
         "p2pConfig": {
            "enable": false,
            "address": "https://localhost:6****/accelerator"
        },
    ... ...
    }
  6. Run the following command to restart the on-demand loading service so that the change takes effect:

    service overlaybd-tcmu restart
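    To confirm that the change has taken effect, you can print the p2pConfig section of the configuration file again:

    grep -A 3 '"p2pConfig"' /etc/overlaybd/overlaybd.json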

Appendix

Reference for the P2P acceleration effect

Pull images of different specifications

Image specifications that are used in the test

  • 4 GB (512 MB × 8 layers)

  • 10 GB (10 GB × 1 layer)

  • 20 GB (4 GB × 5 layers, 10 GB × 2 layers, 512 MB × 40 layers, 20 GB × 1 layer, 2 GB × 10 layers)

Test environment

  • ACK cluster: 1000 nodes

  • Specification of the Elastic Compute Service (ECS) instance: 4 vCPUs and 8 GB of memory

  • Specification of the cloud disk: 200 GB PL1 enhanced SSD (ESSD)

  • Specification of the P2P agent: 1 vCPU, 1 GB of memory, and 4 GB of cache

Test scenario

One thousand nodes pull the same image, and the image is decompressed after it is downloaded.

Test result (P95 time)

Image specification | Consumed time            | Peak throughput of back-to-origin routing to buckets (Gbit/s)
--------------------|--------------------------|---------------------------------------------------------------
512 MB × 8 layers   | 116 seconds              | 2
10 GB × 1 layer     | 6 minutes and 20 seconds | 1.2
4 GB × 5 layers     | 9 minutes and 15 seconds | 5.1
10 GB × 2 layers    | 9 minutes and 50 seconds | 6.7
512 MB × 40 layers  | 7 minutes and 55 seconds | 3.8
20 GB × 1 layer     | 11 minutes               | 2.5
2 GB × 10 layers    | 8 minutes and 13 seconds | 3.2