Container Service for Kubernetes: Use P2P acceleration

Last Updated: Jan 17, 2025

The P2P acceleration feature enhances the efficiency of image pulling, reducing the time required for application deployment. It is particularly useful when multiple nodes in a container cluster simultaneously pull an image. This topic explains how to utilize P2P acceleration for faster image pulling.

Background information

When numerous nodes in a container cluster concurrently pull an image, the network bandwidth of the container image storage can become a performance bottleneck. P2P acceleration leverages the bandwidth of compute nodes to distribute images, easing the load on container image storage, speeding up pulls, and shortening deployment times. Testing shows that with P2P acceleration, the time to pull a 1 GB image across 1000 nodes on a 10 Gbit/s network is reduced by over 95% compared to standard methods. The latest P2P solution also improves performance by 30% to 50% over previous versions and supports on-demand loading of container images by default. For more information, see on-demand loading of container images.

P2P acceleration is applicable in scenarios such as:

  • ACK clusters

  • On-premises clusters and clusters from third-party cloud service providers

Prerequisites

Install the P2P acceleration kit.

Limits

Once P2P acceleration is enabled, the P2P acceleration kit replaces your container image address with a P2P-accelerated image address through a webhook. For instance, if your original image address is test****vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest, the P2P-accelerated image address becomes test****vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest.

Additionally, the webhook automatically generates an image pull secret for the accelerated image address based on the original image pull secret. Because the P2P image pull secret is created asynchronously from the image address replacement, provide the image pull secret required for your container images before you deploy workloads. Alternatively, manually create an image pull secret for P2P image pulling (with the domain test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001) before you deploy workloads to prevent image pull failures.
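
For example, to manually create a pull secret for the P2P domain before you deploy workloads, a command like the following could be used. The secret name p2p-test-registry, the namespace, and the credentials are placeholders; use the same registry credentials as in your original image pull secret.

kubectl create secret docker-registry p2p-test-registry \
  --docker-server=test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001 \
  --docker-username=<username> \
  --docker-password=<password> \
  --namespace=<namespace>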

Enable P2P acceleration

Enable P2P acceleration by adding labels to workloads such as pods and Deployments, or to a namespace in your ACK cluster. When a namespace is labeled, P2P acceleration is automatically applied to all eligible workloads within it, without needing to alter the workloads' YAML files. Choose the method that best suits your business needs.

Note

The label name is k8s.aliyun.com/image-accelerate-mode, and the value is p2p.

  • Add the P2P acceleration label to a workload.

    For example, to add the P2P acceleration label to a Deployment, edit the Deployment's YAML file with the following command.

    kubectl edit deploy <Deployment name>

    Add the label k8s.aliyun.com/image-accelerate-mode: p2p to the pod template of the Deployment, as shown in the following example.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: test
      labels:
        app: nginx
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            # enable P2P
            k8s.aliyun.com/image-accelerate-mode: p2p
            app: nginx
        spec:
          # your ACR instance image pull secret
          imagePullSecrets:
          - name: test-registry
          containers:
          # your ACR instance image
          - image: test-registry-vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest
            name: test
            command: ["sleep", "3600"]
  • Add the P2P acceleration label to a namespace

    • Add the label through the console.

      1. Log on to the Container Service Management Console. In the left-side navigation pane, select Cluster List.

      2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, click Namespaces and Quotas.

      3. On the Namespace page, click Edit in the Actions column of the target namespace.

      4. In the Edit Namespace dialog box, set the Variable Name of the Label to k8s.aliyun.com/image-accelerate-mode and set the Variable Value of the Label to p2p. Then, click OK.

    • Add the label through the command line.

      kubectl label namespaces <YOUR-NAMESPACE> k8s.aliyun.com/image-accelerate-mode=p2p
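
      To confirm that the label is applied, or to remove it later and disable P2P acceleration for the namespace, you can run, for example:

      kubectl get namespace <YOUR-NAMESPACE> --show-labels
      # The trailing hyphen removes the label
      kubectl label namespaces <YOUR-NAMESPACE> k8s.aliyun.com/image-accelerate-mode-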

Verify P2P acceleration

Once P2P acceleration is enabled, the P2P component automatically injects the P2P-related annotations, the P2P-accelerated image address, and the required image pull secrets into your pods.

Important

The secrets for pulling P2P-accelerated images differ from those for the original images only in the image repository domain name. All other configurations remain the same. If the user information in the original image pull secret is invalid, the P2P-accelerated image pull will fail.
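
If a P2P-accelerated pull fails, first confirm that the original secret contains valid credentials. For example, to decode the secret referenced in the example above (test-registry is the example name), run:

kubectl get secret test-registry -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d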

To query the pods, run the following command:

kubectl get po <Pod name> -oyaml

Expected output:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    # inject p2p-annotations automatically
    k8s.aliyun.com/image-accelerate-mode: p2p
    k8s.aliyun.com/p2p-config: '...'
spec:
  containers:
   # inject image to p2p endpoint
   - image: test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest
  imagePullSecrets:
  - name: test-registry
  # inject image pull secret for p2p endpoint
  - name: acr-credential-test-registry-p2p

Notice that the pod now includes P2P-related annotations, accelerated image addresses for P2P, and the respective image pull secrets, signifying the successful activation of P2P acceleration.
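
You can also check the rewritten image address and the injected pull secrets directly, for example:

kubectl get po <Pod name> -o jsonpath='{.spec.containers[*].image}'
kubectl get po <Pod name> -o jsonpath='{.spec.imagePullSecrets[*].name}'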

(Optional) Enable client metrics collection

P2P metrics explanation

Enable metrics

Enable metrics collection when you install the P2P acceleration agent by setting exporter.enable to true in the following configuration:

p2p:
  v2:
    # Component for P2P v2
    image: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/dadi-agent
    imageTag: v0.1.2-72276d4-aliyun

    # Maximum number of image layers that each node proxy downloads concurrently
    proxyConcurrencyLimit: 128

    # The server port to communicate with P2P nodes
    p2pPort: 65002

    cache:
      # Disk cache capacity in bytes, default 4GB
      capacity: 4294967296
      # Set to 1 if you are using high-performance disks on your ECS, e.g. ESSD PL2/PL3
      aioEnable: 0
    exporter:
      # Set to true if you want to collect component metrics
      enable: false
      port: 65003

    # limit for downstream throughput
    throttleLimitMB: 512

Access methods

The exporter field in the P2P YAML specifies the metrics port.

ExporterConfig:
  enable: true # Specifies whether to enable the metrics exporter
  port: 65006  # Specifies the listening port
  standaloneExporterPort: true # Specifies whether to expose metrics on a standalone port. If false, metrics are exposed through the HTTP service port

Run curl 127.0.0.1:$port/metrics on a node to retrieve metrics similar to the following:

# HELP DADIP2P_Alive 
# TYPE DADIP2P_Alive gauge
DADIP2P_Alive{node="192.168.69.172:65005",mode="agent"} 1.000000 1692156721833

# HELP DADIP2P_Read_Throughtput Bytes / sec
# TYPE DADIP2P_Read_Throughtput gauge
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833

# HELP DADIP2P_QPS 
# TYPE DADIP2P_QPS gauge
DADIP2P_QPS{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833

# HELP DADIP2P_MaxLatency us
# TYPE DADIP2P_MaxLatency gauge
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833

# HELP DADIP2P_Count Bytes
# TYPE DADIP2P_Count gauge
DADIP2P_Count{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833

# HELP DADIP2P_Cache 
# TYPE DADIP2P_Cache gauge
DADIP2P_Cache{node="192.168.69.172:65005",type="allocated",mode="agent"} 4294967296.000000 1692156721833
DADIP2P_Cache{node="192.168.69.172:65005",type="used",mode="agent"} 4294971392.000000 1692156721833

# HELP DADIP2P_Label 
# TYPE DADIP2P_Label gauge
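
If you collect these metrics with Prometheus, a minimal static scrape job might look like the following sketch. The target address and port are assumptions (65003 is the default exporter port in the installation configuration above); adjust them to your nodes and exporter settings.

scrape_configs:
  - job_name: dadi-p2p-agent
    static_configs:
      # Replace with the node IP addresses that run the P2P agent
      - targets:
          - 192.168.69.172:65003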

Metrics explanation

Metric names

  • DADIP2P_Alive: Indicates service status.

  • DADIP2P_Read_Throughtput: Measures P2P service throughput in bytes per second.

  • DADIP2P_QPS: Queries per second (QPS).

  • DADIP2P_MaxLatency: Latency statistics in microseconds (μs).

  • DADIP2P_Count: Traffic statistics in bytes.

  • DADIP2P_Cache: Cache used by a single server in bytes.

Tags

  • node: The service IP address and port number of the P2P agent or root.

  • type: The metric type, which can be one of the following:

    • pread: Processes downstream requests.

    • download: Handles back-to-origin routing.

    • peer: Distributes via P2P network.

    • disk: Processes disk operations.

    • http: Handles HTTP requests.

    • allocated: Allocated cache space.

    • used: Cache space in use.

Metric examples

DADIP2P_Count{node="11.238.108.XXX:9877",type="http",mode="agent"} 4248808352.000000 1692157615810
The total HTTP request traffic that is processed by the agent service: 4248808352 bytes.

DADIP2P_Cache{node="11.238.108.XXX:9877",type="used",mode="agent"} 2147487744.000000 1692157615810
The cache that is used by the current agent: 2147487744 bytes.
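
To spot-check a metric on a node, you could, for example, filter the exporter output directly (port 65003 is assumed from the installation configuration above):

curl -s 127.0.0.1:65003/metrics | grep DADIP2P_Cache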

Audit logs

Enable audit logs

Set the logAudit field in the p2p ConfigMap to true.

DeployConfig:
  mode: agent
  logDir: /dadi-p2p/log
  logAudit: true
  logAuditMode: stdout # Output to the console. If set to file, the log is written to the /dadi-p2p/log/audit.log file

Audit log format

The following example illustrates the audit log format, which records the time taken from receiving a request to returning a result in microseconds (μs).

2022/08/30 15:44:52|AUDIT|th=00007FBA247C5280|download[pathname=/https://cri-pi840la*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=125829120][size=2097152][latency=267172]
....
2022/08/30 15:44:55|AUDIT|th=00007FBA2EFEAEC0|http:pread[pathname=/https://cri-pi840lacia*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=127467520][size=65536][latency=21]

Key fields include the time, the AUDIT keyword, the thread pointer, and the operation code followed by [pathname=][size=][latency=]. The size field indicates the size of a single request; a negative value indicates an exception. The latency field indicates the time taken to process a single request, in microseconds (μs).

Common operation codes are described below:

  • http:pread: Processes output data requests via HTTP proxy.

  • rpc:stat: Retrieves file size information.

  • rpc:pread: Handles output data requests by the P2P agent.

  • download: Downloads data from upstream by the P2P agent.

  • filewrite: Writes current data shards to the cache by the P2P agent.

  • fileread: Reads data shards from the cache by the P2P agent.

Log examples

download[pathname=mytest][offset=0][size=65536][latency=26461]
  ## The latency when the P2P agent downloads the [0,65536) data of the mytest file from the upstream is 26461 μs
rpc:pread[pathname=mytest][offset=0][size=65536][latency=2]
  ## The latency when the P2P agent returns the [0,65536) data of the mytest file to the downstream is 2 μs
http:pread[pathname=mytest][offset=0][size=65536][latency=26461]
  ## The latency when the HTTP proxy returns the [0,65536) data of the mytest file to the downstream is 26461 μs
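
When logAuditMode is set to file, a per-operation latency summary can be produced from the audit log with a small shell sketch like the following (the log path /dadi-p2p/log/audit.log comes from the configuration above):

grep '|AUDIT|' /dadi-p2p/log/audit.log | awk -F'|' '
  {
    op = $4; sub(/\[.*/, "", op)                # operation code, for example download or http:pread
    match($4, /latency=[0-9]+/)                 # extract the latency value in microseconds
    lat = substr($4, RSTART + 8, RLENGTH - 8)
    sum[op] += lat; cnt[op]++
  }
  END {
    for (o in sum)
      printf "%-12s avg latency: %.0f us (%d requests)\n", o, sum[o] / cnt[o], cnt[o]
  }'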

(Optional) Disable on-demand loading of images using P2P acceleration

Note

The following steps describe how to modify the configuration of a single node in a cluster. Before you proceed, consider whether subsequent maintenance operations on the node may overwrite this configuration.

  1. Log on to the Container Service Management Console. In the left-side navigation pane, select Cluster List.

  2. On the Cluster List page, click the name of the target cluster. In the left-side navigation pane, select Node Management > Nodes.

  3. On the Nodes page, click the Instance ID under the IP address of the target node.

  4. On the instance details page, use Remote Connection to log on to the node.

  5. Edit the p2pConfig in the /etc/overlaybd/overlaybd.json file using the vi command and change enable to false.

    {
         "p2pConfig": {
            "enable": false,
            "address": "https://localhost:6****/accelerator"
        },
    ... ...
    }
  6. Run the following command to restart the overlaybd-tcmu service for the change to take effect:

    service overlaybd-tcmu restart
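
To confirm that the change has taken effect, you can, for example, check the configuration file and the service status again:

grep -A 3 '"p2pConfig"' /etc/overlaybd/overlaybd.json
service overlaybd-tcmu status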

Appendix

P2P acceleration effect reference

Image pulling of different specifications

Test image specifications:

  • 4 GB (512 MB × 8 layers)

  • 10 GB (10 GB × 1 layer)

  • 20 GB (4 GB × 5 layers, 10 GB × 2 layers, 512 MB × 40 layers, 20 GB × 1 layer, 2 GB × 10 layers)

Test environment:

  • ACK cluster with 1000 nodes

  • ECS instance specifications: 4 vCPUs and 8 GB of memory

  • Cloud disk specifications: 200 GB PL1 enhanced SSD (ESSD)

  • P2P agent specifications: 1 vCPU, 1 GB of memory, and 4 GB of cache

Test scenario:

One thousand nodes pull the same image, which is decompressed after download.

Test results (P95 time consumption):

Image specifications | Time consumption         | Back-to-origin (Bucket) peak throughput (Gbit/s)
---------------------|--------------------------|-------------------------------------------------
512 MB × 8 layers    | 116 seconds              | 2
10 GB × 1 layer      | 6 minutes and 20 seconds | 1.2
4 GB × 5 layers      | 9 minutes and 15 seconds | 5.1
10 GB × 2 layers     | 9 minutes and 50 seconds | 6.7
512 MB × 40 layers   | 7 minutes and 55 seconds | 3.8
20 GB × 1 layer      | 11 minutes               | 2.5
2 GB × 10 layers     | 8 minutes and 13 seconds | 3.2