The P2P acceleration feature speeds up image pulls to reduce application deployment time. When many nodes in a container cluster need to pull an image simultaneously, you can use P2P acceleration to improve pull performance. This topic describes how to use the P2P acceleration feature to accelerate image pulls.
Background information
When many nodes in a container cluster need to pull an image simultaneously, the network bandwidth of the container image storage can become a performance bottleneck that slows down image pulls. The P2P acceleration feature leverages the bandwidth of compute nodes to distribute images across nodes. This reduces the load on the container image storage, accelerates image pulls, and shortens application deployment time. Test results show that when 1,000 nodes pull a 1 GB image simultaneously over a network with 10 Gbit/s of bandwidth, P2P acceleration reduces the image pull time by more than 95% compared with the regular image pull mode. In addition, the new P2P acceleration mode improves performance by 30% to 50% compared with the old P2P acceleration mode. By default, the new P2P acceleration mode is used when Container Registry loads image resources on demand. For more information, see Load resources of a container image on demand.
You can use the P2P acceleration feature in the following types of clusters:
ACK clusters
On-premises clusters and clusters of third-party cloud service providers
Prerequisites
A P2P acceleration agent is installed in the cluster.
For information about how to install a P2P acceleration agent in an ACK cluster, see Install a P2P acceleration agent in an ACK cluster.
For information about how to install a P2P acceleration agent in an on-premises cluster or a cluster of a third-party cloud service provider, see Install a P2P acceleration agent in an on-premises cluster or a cluster of a third-party cloud service provider.
Limits
After you enable the P2P acceleration feature, the P2P acceleration agent uses a webhook to replace your container image address with the address of the P2P-accelerated image. For example, the P2P acceleration agent replaces your original image address (test****vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest) with the address (test****vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest) of the P2P-accelerated image.
Additionally, the webhook automatically generates an image pull secret for pulling P2P-accelerated images by copying the original image pull secret. The generation of the image pull secret and the replacement of the image address are asynchronous. Therefore, we recommend that you create the image pull secret required for P2P-accelerated pulls before you deploy a workload. In the preceding example, this means creating an image pull secret for the test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001 domain. This prevents image pull failures caused by the image address replacement.
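For example, a minimal sketch of creating such a secret ahead of time with kubectl, assuming the secret name test-registry-p2p and the same credentials as the original secret (both are illustrative):
kubectl create secret docker-registry test-registry-p2p \
  --docker-server=test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001 \
  --docker-username=<username> \
  --docker-password=<password>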
Enable P2P acceleration
You can add the P2P acceleration label to workloads such as pods and Deployments to enable P2P acceleration for the workloads. You can also add the P2P acceleration label to a namespace in your ACK cluster. This way, P2P acceleration is enabled for all workloads that meet acceleration conditions in the namespace. You do not need to modify the YAML files of the workloads to enable P2P acceleration. You can select a method to add the P2P acceleration label as needed.
The name of the P2P acceleration label is k8s.aliyun.com/image-accelerate-mode and the value is p2p.
Add the P2P acceleration label to a workload
In this example, the P2P acceleration label is added to a Deployment. Run the following command to edit the YAML file of the Deployment:
kubectl edit deploy <Name of the Deployment>
Add the k8s.aliyun.com/image-accelerate-mode: p2p label to the pod template of the Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        # enable P2P
        k8s.aliyun.com/image-accelerate-mode: p2p
        app: nginx
    spec:
      # your ACR instance image pull secret
      imagePullSecrets:
      - name: test-registry
      containers:
      # your ACR instance image
      - image: test-registry-vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest
        name: test
        command: ["sleep", "3600"]
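After you save the change, the Deployment recreates its pods with the label applied. As a quick check, the following command (using the app=nginx label from the example above) prints the image that the new pod actually uses:
kubectl get pod -l app=nginx -o jsonpath='{.items[0].spec.containers[0].image}'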
Add the P2P acceleration label to a namespace
Use the ACK console
Log on to the ACK console. In the navigation pane on the left, click Clusters.
On the Clusters page, find the cluster you want and click its name. In the left-side navigation pane, click Namespaces and Quotas.
On the Namespace page, find the namespace to which you want to add the P2P acceleration label and click Edit in the Actions column.
In the Edit Namespace dialog box, click +Namespace Label, set Variable Key to k8s.aliyun.com/image-accelerate-mode and Variable Value to p2p, and then click OK.
Use kubectl
Run the following command to add the P2P acceleration label to a namespace:
kubectl label namespaces <YOUR-NAMESPACE> k8s.aliyun.com/image-accelerate-mode=p2p
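To confirm that the label is applied, you can list the labels of the namespace:
kubectl get namespace <YOUR-NAMESPACE> --show-labels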
Verify P2P acceleration
After you enable P2P acceleration for a pod, the P2P agent automatically injects the P2P acceleration annotation, the address of the P2P-accelerated image, and the secret for pulling the P2P-accelerated image into the pod.
The secret for pulling a P2P-accelerated image and the secret for pulling the original image differ only in the domain name of the image repository. Other configurations of the secrets are identical. If the user information in the secret for pulling the original image is invalid, the P2P-accelerated image will fail to be pulled.
Run the following command to query pods:
kubectl get po <Name of the pod> -o yaml
Expected output:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    # inject p2p-annotations automatically
    k8s.aliyun.com/image-accelerate-mode: p2p
    k8s.aliyun.com/p2p-config: '...'
spec:
  containers:
  # inject image to p2p endpoint
  - image: test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest
  imagePullSecrets:
  - name: test-registry
  # inject image pull secret for p2p endpoint
  - name: acr-credential-test-registry-p2p
In the preceding output, the P2P acceleration annotation, the address of the P2P-accelerated image, and the secret for pulling the P2P-accelerated image are injected into the pod. This indicates that P2P acceleration is enabled for the pod.
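You can also check the injected fields directly. The following commands, based on the example above, print the acceleration annotation of the pod and decode the generated pull secret (the secret name acr-credential-test-registry-p2p comes from the example output):
kubectl get po <Name of the pod> -o jsonpath='{.metadata.annotations.k8s\.aliyun\.com/image-accelerate-mode}'
kubectl get secret acr-credential-test-registry-p2p -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d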
(Optional) Enable acceleration metric collection on the client
Description of P2P acceleration metrics
Enable the metric collection feature
Enable the metric collection feature when you install the P2P acceleration agent.
p2p:
  v2:
    # Component for P2P v2
    image: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/dadi-agent
    imageTag: v0.1.2-72276d4-aliyun
    # Concurrency limit on the number of layers downloaded by each node proxy
    proxyConcurrencyLimit: 128
    # The server port used to communicate with P2P nodes
    p2pPort: 65002
    cache:
      # Disk cache capacity in bytes. Default: 4 GB.
      capacity: 4294967296
      # Set to 1 if you use high-performance disks on your ECS instance, such as ESSD PL2/PL3.
      aioEnable: 0
    exporter:
      # Set to true to collect component metrics.
      enable: false
      port: 65003
    # Limit for downstream throughput
    throttleLimitMB: 512
Access methods
The exporter field in the P2P YAML file defines the port for collecting metrics.
ExporterConfig:
  enable: true                 # Specifies whether to enable metric collection.
  port: 65006                  # Specifies the listening port.
  standaloneExporterPort: true # Specifies whether to expose metrics on a standalone port. If you set this parameter to false, metrics are exposed over the HTTP service port.
Run the curl 127.0.0.1:$port/metrics command. The following metrics are returned:
# HELP DADIP2P_Alive
# TYPE DADIP2P_Alive gauge
DADIP2P_Alive{node="192.168.69.172:65005",mode="agent"} 1.000000 1692156721833
# HELP DADIP2P_Read_Throughtput Bytes / sec
# TYPE DADIP2P_Read_Throughtput gauge
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_QPS
# TYPE DADIP2P_QPS gauge
DADIP2P_QPS{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_MaxLatency us
# TYPE DADIP2P_MaxLatency gauge
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_Count Bytes
# TYPE DADIP2P_Count gauge
DADIP2P_Count{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_Cache
# TYPE DADIP2P_Cache gauge
DADIP2P_Cache{node="192.168.69.172:65005",type="allocated",mode="agent"} 4294967296.000000 1692156721833
DADIP2P_Cache{node="192.168.69.172:65005",type="used",mode="agent"} 4294971392.000000 1692156721833
# HELP DADIP2P_Label
# TYPE DADIP2P_Label gauge
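The metrics are exposed in the Prometheus text format. If you collect them with Prometheus, a minimal scrape job might look like the following sketch; the job name and target address are illustrative, and the port must match the exporter port in the agent configuration:
scrape_configs:
- job_name: dadi-p2p
  static_configs:
  - targets:
    - 192.168.69.172:65003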
Metrics description
Metric names
DADIP2P_Alive: indicates whether the service is alive.
DADIP2P_Read_Throughtput: the throughput of the P2P service. Unit: byte/s.
DADIP2P_QPS: the queries per second (QPS).
DADIP2P_MaxLatency: the latency statistics. Unit: μs.
DADIP2P_Count: the traffic statistics. Unit: bytes.
DADIP2P_Cache: the cache that is used by a single server. Unit: bytes.
Tags
node: the service IP address and port number of the P2P agent or root.
type: the type of the metric.
pread: process downstream requests.
download: back-to-origin routing.
peer: P2P network distribution.
disk: process disks.
http: process HTTP requests.
allocated: the space that is allocated to the cache.
used: the space that is used by the cache.
Metric example
DADIP2P_Count{node="11.238.108.XXX:9877",type="http",mode="agent"} 4248808352.000000 1692157615810
The total HTTP request traffic that is processed by the agent service: 4248808352 bytes.
DADIP2P_Cache{node="11.238.108.XXX:9877",type="used",mode="agent"} 2147487744.000000 1692157615810
The cache that is used by the agent: 2147487744 bytes.
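If these metrics are scraped by Prometheus, and assuming that DADIP2P_Count grows monotonically between scrapes, a query like the following sketch can estimate how much read traffic is served by the P2P network relative to back-to-origin downloads:
sum(rate(DADIP2P_Count{type="peer"}[5m])) / sum(rate(DADIP2P_Count{type="download"}[5m]))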
Audit logs
Enable audit logs
Set the logAudit field in the P2P ConfigMap to true.
DeployConfig:
  mode: agent
  logDir: /dadi-p2p/log
  logAudit: true
  logAuditMode: stdout # Logs are written to the console. If you set this parameter to file, logs are written to the /dadi-p2p/log/audit.log file.
Format of audit logs
The following code shows the format of audit logs. The latency field records the processing time from the receipt of a request to the return of the result. Unit: μs.
2022/08/30 15:44:52|AUDIT|th=00007FBA247C5280|download[pathname=/https://cri-pi840la*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=125829120][size=2097152][latency=267172]
....
2022/08/30 15:44:55|AUDIT|th=00007FBA2EFEAEC0|http:pread[pathname=/https://cri-pi840lacia*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=127467520][size=65536][latency=21]
The major parts of each entry are the time, AUDIT, the thread pointer, and the operation code followed by [pathname=][offset=][size=][latency=].
You can ignore the AUDIT and thread pointer fields. The size parameter specifies the amount of data processed by a single request. A negative value indicates that an exception occurred. The latency parameter specifies the latency of a single request. Unit: μs.
The following list describes common operation codes:
http:pread: The HTTP proxy processes output data requests.
rpc:stat: The P2P agent obtains the file size.
rpc:pread: The P2P agent processes output data requests.
download: The P2P agent downloads data from the upstream.
filewrite: The P2P agent writes the current data shard to the cache.
fileread: The P2P agent reads data shards from the cache.
Log example
download[pathname=mytest][offset=0][size=65536][latency=26461]
## The latency when the P2P agent downloads the [0,65536) data of the mytest file from the upstream is 26461 μs.
rpc:pread[pathname=mytest][offset=0][size=65536][latency=2]
## The latency when the P2P agent returns the [0,65536) data of the mytest file to the downstream is 2 μs.
http:pread[pathname=mytest][offset=0][size=65536][latency=26461]
## The latency when the proxy downloads the [0,65536) data of the mytest file from the upstream is 26461 μs.
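As a troubleshooting aid, the following sketch filters download operations slower than 100 ms (latency greater than 100000 μs) from the audit log; it assumes that logAuditMode is set to file so that entries are written to /dadi-p2p/log/audit.log:
grep 'download\[' /dadi-p2p/log/audit.log | awk -F'latency=' '{v=$2; sub(/\].*/, "", v); if (v+0 > 100000) print}'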
(Optional) Disable loading images on demand and enable P2P acceleration
You can modify the configuration of a single node in a cluster to disable loading images on demand and enable P2P acceleration. Subsequent O&M operations on the node may overwrite your modifications. In that case, you must reapply the modifications to the node.
Log on to the ACK console. In the navigation pane on the left, click Clusters.
On the Clusters page, click the name of the cluster that you want to manage. In the navigation pane on the left, choose Nodes > Nodes.
On the Nodes page, click the Instance ID under the IP address of the node that you want to manage.
On the instance details page, use Remote Connection to log on to the node.
Run the vi command to edit the /etc/overlaybd/overlaybd.json file and change the value of enable in the p2pConfig field to false.
{
  "p2pConfig": {
    "enable": false,
    "address": "https://localhost:6****/accelerator"
  },
  ...
}
Run the following command to reload image resources on demand:
service overlaybd-tcmu restart
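If you prefer to script the change instead of editing the file interactively, a sketch using jq (assuming jq is installed on the node) looks like the following; restart the service afterwards as shown above:
jq '.p2pConfig.enable = false' /etc/overlaybd/overlaybd.json > /tmp/overlaybd.json && mv /tmp/overlaybd.json /etc/overlaybd/overlaybd.json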
Appendix
Reference for the P2P acceleration effect
Pull images of different specifications
Image specifications used in the test
4 GB (512 MB × 8 layers)
10 GB (10 GB × 1 layer)
20 GB (4 GB × 5 layers, 10 GB × 2 layers, 512 MB × 40 layers, 20 GB × 1 layer, 2 GB × 10 layers)
Test environment
ACK cluster: 1000 nodes
Specification of the Elastic Compute Service (ECS) instance: 4 vCPUs and 8 GB of memory
Specification of the cloud disk: 200 GB PL1 enhanced SSD (ESSD)
Specification of the P2P agent: 1 vCPU, 1 GB of memory, and 4 GB of cache
Test scenario
One thousand nodes pull the same image, and the image is decompressed after it is downloaded.
Test result (P95 time)
| Image specification | Time consumed | Peak back-to-origin throughput to the bucket (Gbit/s) |
| --- | --- | --- |
| 512 MB × 8 layers | 116 seconds | 2 |
| 10 GB × 1 layer | 6 minutes and 20 seconds | 1.2 |
| 4 GB × 5 layers | 9 minutes and 15 seconds | 5.1 |
| 10 GB × 2 layers | 9 minutes and 50 seconds | 6.7 |
| 512 MB × 40 layers | 7 minutes and 55 seconds | 3.8 |
| 20 GB × 1 layer | 11 minutes | 2.5 |
| 2 GB × 10 layers | 8 minutes and 13 seconds | 3.2 |