The P2P acceleration feature speeds up image pulls, which reduces application deployment time. When many nodes in a container cluster need to pull an image at the same time, you can use P2P acceleration to improve pull performance. This topic describes how to use P2P acceleration.
How P2P acceleration works
When many nodes pull an image at the same time, the container image registry becomes a bandwidth bottleneck. P2P acceleration distributes image data across compute nodes in the cluster, reducing back-to-origin traffic and speeding up pulls for all nodes simultaneously.
In a 1000-node cluster pulling a 1 GB image over a 10 Gbit/s network, P2P acceleration reduces pull time by more than 95% compared to standard image pulls. The new P2P acceleration mode also improves performance by 30% to 50% over the old mode.
By default, the new P2P acceleration mode is used when Container Registry loads image resources on demand. For details, see Load resources of a container image on demand.
P2P acceleration is supported in the following cluster types:
- ACK clusters
- On-premises clusters and clusters of third-party cloud service providers
Prerequisites
Before you begin, ensure that a P2P acceleration agent is installed in the cluster:
- For ACK clusters: Install a P2P acceleration agent in an ACK cluster
- For on-premises clusters or third-party clusters: Install a P2P acceleration agent in an on-premises cluster or a cluster of a third-party cloud service provider
How the webhook modifies your workload
When P2P acceleration is enabled for a Pod, the P2P acceleration agent's webhook automatically:
- Replaces the container image address with a P2P-accelerated endpoint. For example:
  - Original: test****vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest
  - Replaced: test****vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest
- Generates an image pull secret for the P2P endpoint by copying your original image pull secret. The new secret differs from the original only in the registry domain name.
Image pull secret generation and image address replacement are asynchronous. To avoid pull failures, create the image pull secret for the P2P endpoint before you deploy the workload (see the example below). In the example above, the secret is created for the test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001 domain.
If the credentials in your original image pull secret are invalid, the P2P-accelerated image pull also fails.
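For example, the following sketch pre-creates the pull secret for the P2P endpoint with kubectl; the secret name follows the acr-credential-<original-secret-name>-p2p pattern shown later in this topic, and the username and password are placeholders for your own registry credentials:
kubectl create secret docker-registry acr-credential-test-registry-p2p \
  --docker-server=test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001 \
  --docker-username=<your-registry-username> \
  --docker-password=<your-registry-password>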
Enable P2P acceleration
Add the P2P acceleration label (k8s.aliyun.com/image-accelerate-mode: p2p) to a workload or a namespace. You do not need to modify workload YAML files when using the namespace-level method.
Option 1: Add the label to a workload
This example adds the label to a Deployment. Edit the Deployment:
kubectl edit deploy <Name of the Deployment>
Add the k8s.aliyun.com/image-accelerate-mode: p2p label to the Pod template's labels section:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        # enable P2P
        k8s.aliyun.com/image-accelerate-mode: p2p
        app: nginx
    spec:
      # your ACR instance image pull secret
      imagePullSecrets:
      - name: test-registry
      containers:
      # your ACR instance image
      - image: test-registry-vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest
        name: test
        command: ["sleep", "3600"]
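If you prefer not to open an editor, you can apply the same label with a patch. The following is a sketch assuming the example Deployment named test:
# Add the P2P acceleration label to the Pod template (triggers a rolling update)
kubectl patch deployment test --type merge \
  -p '{"spec":{"template":{"metadata":{"labels":{"k8s.aliyun.com/image-accelerate-mode":"p2p"}}}}}'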
Option 2: Add the label to a namespace
Applying the label at the namespace level enables P2P acceleration for all eligible workloads in that namespace, with no per-workload changes required.
Using the ACK console:
1. Log on to the ACK console. In the left-side navigation pane, click Clusters.
2. On the Clusters page, find the cluster you want and click its name. In the left-side navigation pane, click Namespaces and Quotas.
3. On the Namespace page, find the target namespace and click Edit in the Actions column.
4. In the Edit Namespace dialog box, click +Labels, set Variable Name to k8s.aliyun.com/image-accelerate-mode and Variable Value to p2p, and then click OK.
Using kubectl:
kubectl label namespaces <YOUR-NAMESPACE> k8s.aliyun.com/image-accelerate-mode=p2p
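You can confirm that the label was applied by listing the namespace labels:
kubectl get namespace <YOUR-NAMESPACE> --show-labels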
Verify P2P acceleration
After enabling P2P acceleration, confirm that the webhook has injected the annotation, P2P image address, and P2P image pull secret into the Pod.
Run the following command:
kubectl get po <Name of the pod> -oyaml
Expected output:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    # injected automatically
    k8s.aliyun.com/image-accelerate-mode: p2p
    k8s.aliyun.com/p2p-config: '...'
spec:
  containers:
  # image replaced with P2P endpoint
  - image: test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest
  imagePullSecrets:
  - name: test-registry
  # image pull secret for P2P endpoint
  - name: acr-credential-test-registry-p2p
P2P acceleration is active when all three of the following are present in the output:
- The k8s.aliyun.com/image-accelerate-mode: p2p annotation
- The image address replaced with a .distributed. P2P endpoint
- The P2P image pull secret (acr-credential-<original-secret-name>-p2p)
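For a quicker spot check, the following sketch prints only the container images and image pull secret names of the Pod (the Pod name is a placeholder):
kubectl get po <Name of the pod> \
  -o jsonpath='{.spec.containers[*].image}{"\n"}{.spec.imagePullSecrets[*].name}{"\n"}'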
(Optional) Disable loading images on demand and enable P2P acceleration
Use this procedure to configure a single node to use P2P acceleration without on-demand image loading.
These changes apply to a single node only. Subsequent O&M operations on the node may overwrite them. Re-apply the changes if that happens.
1. Log on to the ACK console. In the left-side navigation pane, click Clusters.
2. On the Clusters page, click the name of the cluster you want to manage. In the left-side navigation pane, choose Nodes > Nodes.
3. On the Nodes page, click the instance ID under the IP address of the node you want to manage.
4. On the instance details page, use Connect to log on to the node.
5. Use vi to edit the p2pConfig field in the /etc/overlaybd/overlaybd.json file and set enable to false:
   {
     "p2pConfig": {
       "enable": false,
       "address": "https://localhost:6****/accelerator"
     },
     ...
   }
6. Restart the overlaybd service for the change to take effect:
service overlaybd-tcmu restart
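To confirm that the service restarted successfully, you can check its status:
service overlaybd-tcmu status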
(Optional) Enable acceleration metric collection
Enable metric collection
Enable metric collection when installing the P2P acceleration agent. In the agent's YAML configuration, set exporter.enable to true:
p2p:
  v2:
    # Component for P2P v2
    image: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/dadi-agent
    imageTag: v0.1.2-72276d4-aliyun
    # Concurrency limit on the number of layers each node proxy downloads simultaneously
    proxyConcurrencyLimit: 128
    # Port for communication between P2P nodes
    p2pPort: 65002
    cache:
      # Disk cache capacity in bytes, default 4 GB
      capacity: 4294967296
      # Set to 1 if you are using high-performance disks, e.g. ESSD PL2/PL3
      aioEnable: 0
    exporter:
      # Set to true to enable metric collection
      enable: false
      port: 65003
    # Downstream throughput limit in MB/s
    throttleLimitMB: 512
Access metrics
The exporter field in the P2P YAML file defines the listening port. Configure it as follows:
ExporterConfig:
  enable: true                  # Enables metric collection.
  port: 65006                   # Listening port.
  standaloneExporterPort: true  # Exposes a standalone port. If set to false, metrics are reported over the HTTP service port.
To retrieve metrics, run:
curl 127.0.0.1:$port/metrics
Example output:
# HELP DADIP2P_Alive
# TYPE DADIP2P_Alive gauge
DADIP2P_Alive{node="192.168.69.172:65005",mode="agent"} 1.000000 1692156721833
# HELP DADIP2P_Read_Throughtput Bytes / sec
# TYPE DADIP2P_Read_Throughtput gauge
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_QPS
# TYPE DADIP2P_QPS gauge
DADIP2P_QPS{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_MaxLatency us
# TYPE DADIP2P_MaxLatency gauge
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_Count Bytes
# TYPE DADIP2P_Count gauge
DADIP2P_Count{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_Cache
# TYPE DADIP2P_Cache gauge
DADIP2P_Cache{node="192.168.69.172:65005",type="allocated",mode="agent"} 4294967296.000000 1692156721833
DADIP2P_Cache{node="192.168.69.172:65005",type="used",mode="agent"} 4294971392.000000 1692156721833
# HELP DADIP2P_Label
# TYPE DADIP2P_Label gauge
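To inspect a single metric family, you can filter the output. This example assumes the standalone exporter port 65006 from the configuration above:
curl -s 127.0.0.1:65006/metrics | grep DADIP2P_Cache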
Metrics reference
Metric names
| Metric | Description | Unit |
|---|---|---|
| DADIP2P_Alive | Whether the service is alive | — |
| DADIP2P_Read_Throughtput | P2P service read throughput | bytes/s |
| DADIP2P_QPS | Queries per second | — |
| DADIP2P_MaxLatency | Maximum request latency | μs |
| DADIP2P_Count | Cumulative traffic processed | bytes |
| DADIP2P_Cache | Cache usage per server | bytes |
Tags
| Tag | Values | Description |
|---|---|---|
| node | IP:port | Service address of the P2P agent or root |
| type | pread | Downstream request processing |
| | download | Back-to-origin routing |
| | peer | P2P network distribution |
| | disk | Disk operations |
| | http | HTTP request processing |
| | allocated | Cache space allocated |
| | used | Cache space in use |
Examples:
DADIP2P_Count{node="11.238.108.XXX:9877",type="http",mode="agent"} 4248808352.000000 1692157615810
The total HTTP request traffic processed by the agent service: 4248808352 bytes.
DADIP2P_Cache{node="11.238.108.XXX:9877",type="used",mode="agent"} 2147487744.000000 1692157615810
The cache used by the agent: 2147487744 bytes.
Audit logs
Enable audit logs
In the p2p ConfigMap, set logAudit to true:
DeployConfig:
  mode: agent
  logDir: /dadi-p2p/log
  logAudit: true
  logAuditMode: stdout  # stdout sends logs to the console. Set to file to write logs to /dadi-p2p/log/audit.log.
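If logAuditMode is set to file, you can follow the audit log directly on the node, for example:
tail -f /dadi-p2p/log/audit.log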
Audit log format
Each log entry records the processing time from request receipt to response return. Unit: μs.
2022/08/30 15:44:52|AUDIT|th=00007FBA247C5280|download[pathname=/https://cri-pi840la*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=125829120][size=2097152][latency=267172]
....
2022/08/30 15:44:55|AUDIT|th=00007FBA2EFEAEC0|http:pread[pathname=/https://cri-pi840lacia*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=127467520][size=65536][latency=21]
Each entry contains: timestamp, AUDIT, thread pointer, and an operation code with [pathname=], [size=], and [latency=] fields.
- Ignore the AUDIT marker and the thread pointer; they are for internal use.
- size: the data size of a single request. A negative value indicates an exception.
- latency: the processing latency of a single request, in μs.
Common operation codes:
| Code | Description |
|---|---|
| http:pread | HTTP proxy processing an output data request |
| rpc:stat | P2P agent getting the file size |
| rpc:pread | P2P agent processing an output data request |
| download | P2P agent downloading data from the upstream |
| filewrite | P2P agent writing a data shard to the cache |
| fileread | P2P agent reading data shards from the cache |
Log example:
download[pathname=mytest][offset=0][size=65536][latency=26461]
## The latency when the P2P agent downloads the [0,65536) data of the mytest file from the upstream is 26461 μs.
rpc:pread[pathname=mytest][offset=0][size=65536][latency=2]
## The latency when the P2P agent returns the [0,65536) data of the mytest file to the downstream is 2 μs.
http:pread[pathname=mytest][offset=0][size=65536][latency=26461]
## The latency when the proxy downloads the [0,65536) data of the mytest file from the upstream is 26461 μs.
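As an illustration, the following sketch scans a file-mode audit log for slow back-to-origin downloads; the 100000 μs threshold and the log path are example values:
grep '|download\[' /dadi-p2p/log/audit.log \
  | awk -F'latency=' '{lat=$2; sub(/\].*/, "", lat); if (lat+0 > 100000) print $0}'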
Appendix
P2P acceleration performance reference
Test results for 1000 nodes pulling the same image simultaneously, with image decompression after download.
Test environment:
| Component | Specification |
|---|---|
| ACK cluster | 1000 nodes |
| Elastic Compute Service (ECS) instance | 4 vCPUs, 8 GB memory |
| Cloud disk | 200 GB PL1 ESSD |
| P2P agent | 1 vCPU, 1 GB memory, 4 GB cache |
Image specifications tested:
- 4 GB (512 MB × 8 layers)
- 10 GB (10 GB × 1 layer)
- 20 GB (4 GB × 5 layers, 10 GB × 2 layers, 512 MB × 40 layers, 20 GB × 1 layer, 2 GB × 10 layers)
Test results (P95 pull time):
| Image specification | Pull time | Peak back-to-origin throughput (Gbit/s) |
|---|---|---|
| 512 MB × 8 layers | 116 seconds | 2 |
| 10 GB × 1 layer | 6 minutes 20 seconds | 1.2 |
| 4 GB × 5 layers | 9 minutes 15 seconds | 5.1 |
| 10 GB × 2 layers | 9 minutes 50 seconds | 6.7 |
| 512 MB × 40 layers | 7 minutes 55 seconds | 3.8 |
| 20 GB × 1 layer | 11 minutes | 2.5 |
| 2 GB × 10 layers | 8 minutes 13 seconds | 3.2 |