The P2P acceleration feature enhances the efficiency of image pulling, reducing the time required for application deployment. It is particularly useful when multiple nodes in a container cluster simultaneously pull an image. This topic explains how to utilize P2P acceleration for faster image pulling.
Background information
When numerous nodes in a container cluster concurrently pull an image, the network bandwidth of the container image storage can become a performance bottleneck. P2P acceleration leverages the bandwidth of compute nodes to distribute images, easing the load on container image storage, speeding up pulls, and shortening deployment times. Testing shows that with P2P acceleration, the time to pull a 1 GB image across 1000 nodes on a 10 Gbit/s network is reduced by over 95% compared to standard methods. The latest P2P solution also improves performance by 30% to 50% over previous versions and supports on-demand loading of container images by default. For more information, see on-demand loading of container images.
P2P acceleration is applicable in scenarios such as:
ACK clusters
On-premises clusters and clusters from third-party cloud service providers
Prerequisites
Install the P2P acceleration kit.
For ACK clusters, refer to installing the P2P acceleration kit in ACK clusters.
For on-premises clusters or those from third-party cloud service providers, refer to installing the P2P acceleration kit in on-premises clusters or clusters of third-party cloud service providers.
Limits
Once P2P acceleration is enabled, the P2P acceleration kit uses a webhook to replace your container image address with a P2P image address. For example, if your original image address is test****vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest, the P2P-accelerated image address becomes test****vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest.
In addition, the webhook automatically generates an image pull secret for the accelerated image address based on the original image pull secret. Because the P2P image pull secret is created and the image address is replaced asynchronously, we recommend that you provide the image pull secret required for your container images before you deploy workloads. Alternatively, manually create an image pull secret for P2P image pulling (with the domain test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001) before you deploy workloads to prevent image pull failures.
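If you choose to create the P2P image pull secret manually, the following sketch shows one way to do it with standard kubectl syntax. The secret name test-registry-p2p and the namespace default are assumptions; replace the placeholder credentials with the username and password of your ACR instance.
# Hypothetical secret name; the docker-server must use the P2P domain with port 65001
kubectl create secret docker-registry test-registry-p2p \
  --namespace default \
  --docker-server=test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001 \
  --docker-username=<YOUR-USERNAME> \
  --docker-password=<YOUR-PASSWORD>
You can then reference this secret in the workload's imagePullSecrets.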
Enable P2P acceleration
Enable P2P acceleration by adding labels to workloads such as pods and Deployments, or to a namespace in your ACK cluster. When a namespace is labeled, P2P acceleration is automatically applied to all eligible workloads within it, without needing to alter the workloads' YAML files. Choose the method that best suits your business needs.
The label key is k8s.aliyun.com/image-accelerate-mode and the value is p2p.
- Add the P2P acceleration label to a workload.
For example, to add the P2P acceleration label to a Deployment, edit the Deployment's YAML file with the following command.
kubectl edit deploy <Deployment name>
Add the label k8s.aliyun.com/image-accelerate-mode: p2p to the Deployment file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        # enable P2P
        k8s.aliyun.com/image-accelerate-mode: p2p
        app: nginx
    spec:
      # your ACR instance image pull secret
      imagePullSecrets:
      - name: test-registry
      containers:
      # your ACR instance image
      - image: test-registry-vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest
        name: test
        command: ["sleep", "3600"]
- Add the P2P acceleration label to a namespace.
  - Add the label through the console.
    - Log on to the Container Service Management Console. In the left-side navigation pane, select Cluster List.
    - On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, click Namespaces and Quotas.
    - On the Namespace page, click Edit in the Actions column of the target namespace.
    - In the Edit Namespace dialog box, set the Variable Name of the Label to k8s.aliyun.com/image-accelerate-mode and the Variable Value of the Label to p2p. Then, click OK.
  - Add the label through the command line.
kubectl label namespaces <YOUR-NAMESPACE> k8s.aliyun.com/image-accelerate-mode=p2p
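To confirm that the label is applied, or to remove it later, you can use standard kubectl label commands. The following is a sketch; <YOUR-NAMESPACE> is a placeholder.
# Show the labels on the namespace and confirm that image-accelerate-mode is set to p2p
kubectl get namespace <YOUR-NAMESPACE> --show-labels
# Remove the label (the trailing hyphen deletes it); new workloads in the namespace will no longer be accelerated
kubectl label namespaces <YOUR-NAMESPACE> k8s.aliyun.com/image-accelerate-mode-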
Verify P2P acceleration
After P2P acceleration is enabled, the P2P component automatically injects P2P-related annotations, P2P-accelerated image addresses, and the image pull secrets required for the accelerated images into pods.
The secrets for pulling P2P-accelerated images differ from those for the original images only in the image repository domain name. All other configurations remain the same. If the user information in the original image pull secret is invalid, the P2P-accelerated image pull will fail.
To query the pods, run the following command:
kubectl get po <Pod name> -oyaml
Expected output:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    # inject p2p-annotations automatically
    k8s.aliyun.com/image-accelerate-mode: p2p
    k8s.aliyun.com/p2p-config: '...'
spec:
  containers:
  # inject image to p2p endpoint
  - image: test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest
  imagePullSecrets:
  - name: test-registry
  # inject image pull secret for p2p endpoint
  - name: acr-credential-test-registry-p2p
Notice that the pod now includes P2P-related annotations, accelerated image addresses for P2P, and the respective image pull secrets, signifying the successful activation of P2P acceleration.
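To check only the relevant fields instead of reading the full YAML, you can use standard kubectl jsonpath queries such as the following sketch (<Pod name> is a placeholder):
# Print the container images; accelerated pulls use the *.distributed.*:65001 domain
kubectl get po <Pod name> -o jsonpath='{.spec.containers[*].image}'
# Print the injected P2P annotation (dots in the key are escaped)
kubectl get po <Pod name> -o jsonpath='{.metadata.annotations.k8s\.aliyun\.com/image-accelerate-mode}'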
(Optional) Enable client metrics collection
P2P metrics explanation
Enable metrics
Activate metrics collection when installing the P2P acceleration agent.
p2p:
  v2:
    # Component for P2P v2
    image: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/dadi-agent
    imageTag: v0.1.2-72276d4-aliyun
    # Concurrency limit number of layers downloading by each node proxy
    proxyConcurrencyLimit: 128
    # The server port to communicate with P2P nodes
    p2pPort: 65002
    cache:
      # Disk cache capacity in bytes, default 4GB
      capacity: 4294967296
      # Set to 1 if you are using high-performance disks on your ECS, e.g. ESSD PL2/PL3
      aioEnable: 0
    exporter:
      # Set to true if you want to collect component metrics
      enable: false
      port: 65003
    # limit for downstream throughput
    throttleLimitMB: 512
Access methods
The exporter field in the P2P YAML specifies the metrics port.
ExporterConfig:
  enable: true                  # Specifies whether to enable metrics collection
  port: 65006                   # Specifies the listening port
  standaloneExporterPort: true  # Specifies whether to expose metrics on a standalone port. If false, metrics are exposed through the HTTP service port
Use curl 127.0.0.1:$port/metrics to retrieve the following metrics:
# HELP DADIP2P_Alive
# TYPE DADIP2P_Alive gauge
DADIP2P_Alive{node="192.168.69.172:65005",mode="agent"} 1.000000 1692156721833
# HELP DADIP2P_Read_Throughtput Bytes / sec
# TYPE DADIP2P_Read_Throughtput gauge
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_QPS
# TYPE DADIP2P_QPS gauge
DADIP2P_QPS{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_MaxLatency us
# TYPE DADIP2P_MaxLatency gauge
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_Count Bytes
# TYPE DADIP2P_Count gauge
DADIP2P_Count{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_Cache
# TYPE DADIP2P_Cache gauge
DADIP2P_Cache{node="192.168.69.172:65005",type="allocated",mode="agent"} 4294967296.000000 1692156721833
DADIP2P_Cache{node="192.168.69.172:65005",type="used",mode="agent"} 4294971392.000000 1692156721833
# HELP DADIP2P_Label
# TYPE DADIP2P_Label gauge
Metrics explanation
Metric names
DADIP2P_Alive: Indicates service status.
DADIP2P_Read_Throughtput: Measures P2P service throughput in bytes per second.
DADIP2P_QPS: Queries per second (QPS).
DADIP2P_MaxLatency: Latency statistics in microseconds (μs).
DADIP2P_Count: Traffic statistics in bytes.
DADIP2P_Cache: Cache used by a single server in bytes.
Tags
node: The service IP address and port number of the P2P agent or root.
type: The metric type, which can be one of the following:
pread: Processes downstream requests.
download: Handles back-to-origin routing.
peer: Distributes via P2P network.
disk: Processes disk operations.
http: Handles HTTP requests.
allocated: Allocated cache space.
used: Cache space in use.
Metric examples
DADIP2P_Count{node="11.238.108.XXX:9877",type="http",mode="agent"} 4248808352.000000 1692157615810
The total HTTP request traffic that is processed by the agent service: 4248808352 bytes.
DADIP2P_Cache{node="11.238.108.XXX:9877",type="used",mode="agent"} 2147487744.000000 1692157615810
The cache that is used by the current agent: 2147487744 bytes.
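If you collect these metrics with Prometheus, a minimal scrape job could look like the following sketch. The job name is hypothetical, and the target must be the node address plus the exporter port configured above (65003 or 65006 in the examples).
scrape_configs:
  # Hypothetical job name; set the target to <node-IP>:<exporter-port>
  - job_name: dadi-p2p-agent
    static_configs:
      - targets:
          - 192.168.69.172:65003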
Audit logs
Enable audit logs
In the p2p ConfigMap, set the logAudit field to true.
DeployConfig:
  mode: agent
  logDir: /dadi-p2p/log
  logAudit: true
  logAuditMode: stdout # Output to the console. If set to file, the audit log is written to the /dadi-p2p/log/audit.log file
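When logAuditMode is stdout, the audit entries appear in the standard output of the P2P agent container. The following is a sketch for viewing them; the namespace and pod name are placeholders that depend on how the P2P acceleration kit was installed.
# Find the P2P agent pods (namespace and naming depend on your installation)
kubectl get pods -A | grep -i p2p
# Stream only the audit entries from one agent pod
kubectl logs -n <namespace> <p2p-agent-pod> | grep AUDIT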
Audit log format
The following example illustrates the audit log format, which records the time taken from receiving a request to returning a result in microseconds (μs).
2022/08/30 15:44:52|AUDIT|th=00007FBA247C5280|download[pathname=/https://cri-pi840la*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=125829120][size=2097152][latency=267172]
....
2022/08/30 15:44:55|AUDIT|th=00007FBA2EFEAEC0|http:pread[pathname=/https://cri-pi840lacia*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=127467520][size=65536][latency=21]
Key fields include the time, AUDIT, the thread pointer, and the operation code followed by [pathname=][size=][latency=]. The size field is the size of a single request; a negative value indicates an exception. The latency field is the time taken by a single request, in microseconds (μs).
Common operation codes are described below:
http:pread: The HTTP proxy serves data requested by downstream clients.
rpc:stat: Retrieves the file size.
rpc:pread: The P2P agent serves data requested by downstream clients.
download: The P2P agent downloads data from the upstream.
filewrite: The P2P agent writes the current data shard to the cache.
fileread: The P2P agent reads a data shard from the cache.
Log examples
download[pathname=mytest][offset=0][size=65536][latency=26461]
## The latency when the P2P agent downloads the [0,65536) data of the mytest file from the upstream is 26461 μs
rpc:pread[pathname=mytest][offset=0][size=65536][latency=2]
## The latency when the P2P agent returns the [0,65536) data of the mytest file to the downstream is 2 μs
http:pread[pathname=mytest][offset=0][size=65536][latency=26461]
## The latency when the proxy downloads the [0,65536) data of the mytest file from the upstream is 26461 μs
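To get a quick summary of where time is spent, a sketch like the following can aggregate download latencies from an audit log file. It assumes the file mode with /dadi-p2p/log/audit.log and the entry format shown above.
# Average latency (in microseconds) of download operations in the audit log
grep 'download\[' /dadi-p2p/log/audit.log \
  | sed -n 's/.*\[latency=\([0-9]*\)\].*/\1/p' \
  | awk '{ sum += $1; n++ } END { if (n > 0) print sum / n " us over " n " requests" }'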
(Optional) Disable on-demand loading of images using P2P acceleration
The following steps change the configuration of a single node and are provided for reference. Before you apply them, consider whether later maintenance operations on the node may overwrite this configuration.
- Log on to the Container Service Management Console. In the left-side navigation pane, select Cluster List.
- On the Cluster List page, click the name of the target cluster. In the left-side navigation pane, choose Nodes.
- On the Nodes page, click the Instance ID under the IP address of the target node.
- On the instance details page, use Remote Connection to log on to the node.
- Use the vi command to edit the /etc/overlaybd/overlaybd.json file and set enable in p2pConfig to false.
{
  "p2pConfig": {
    "enable": false,
    "address": "https://localhost:6****/accelerator"
  },
  ...
  ...
}
- Run the following command to restart the on-demand image loading service so that the change takes effect:
service overlaybd-tcmu restart
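To confirm the change on the node, a sketch such as the following can be used; it assumes jq is installed and the node uses systemd.
# Verify that P2P acceleration is disabled in the overlaybd configuration (should print false)
jq '.p2pConfig.enable' /etc/overlaybd/overlaybd.json
# Confirm that the overlaybd-tcmu service restarted successfully
systemctl status overlaybd-tcmu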
Appendix
P2P acceleration effect reference
Image pulling of different specifications
Test image specifications:
4 GB (512 MB × 8 layers)
10 GB (10 GB × 1 layer)
20 GB (4 GB × 5 layers, 10 GB × 2 layers, 512 MB × 40 layers, 20 GB × 1 layer, 2 GB × 10 layers)
Test environment:
ACK cluster with 1000 nodes
ECS instance specifications: 4 vCPUs and 8 GB of memory
Cloud disk specifications: 200 GB PL1 enhanced SSD (ESSD)
P2P agent specifications: 1 vCPU, 1 GB of memory, and 4 GB of cache
Test scenario:
One thousand nodes pull the same image, which is decompressed after download.
Test results (P95 time consumption):
| Image specifications | Time consumption | Back-to-origin (Bucket) peak throughput (Gbit/s) |
| --- | --- | --- |
| 512 MB × 8 layers | 116 seconds | 2 |
| 10 GB × 1 layer | 6 minutes and 20 seconds | 1.2 |
| 4 GB × 5 layers | 9 minutes and 15 seconds | 5.1 |
| 10 GB × 2 layers | 9 minutes and 50 seconds | 6.7 |
| 512 MB × 40 layers | 7 minutes and 55 seconds | 3.8 |
| 20 GB × 1 layer | 11 minutes | 2.5 |
| 2 GB × 10 layers | 8 minutes and 13 seconds | 3.2 |