The P2P acceleration feature accelerates image pulls and reduces the amount of time that is required to deploy applications. It is especially useful when a large number of nodes in a container cluster need to pull an image at the same time. This topic describes how to use the P2P acceleration feature.
Background information
If a large number of nodes in a container cluster need to pull an image at the same time, the network bandwidth of the container image storage may become a performance bottleneck. The P2P acceleration feature uses the bandwidth of compute nodes to distribute images among nodes. This reduces the pressure on the container image storage, accelerates image pulls, and shortens application deployment. Test results indicate that when 1,000 nodes concurrently pull a 1 GB image over a 10 Gbit/s network, the P2P acceleration feature reduces the image pull time by more than 95% compared with the regular image pull mode. In addition, the new P2P acceleration mode improves performance by 30% to 50% compared with the old P2P acceleration mode. By default, the new P2P acceleration mode is used when Container Registry loads image resources on demand. For more information, see Load resources of a container image on demand.
You can use the P2P acceleration feature in the following types of clusters:
Container Service for Kubernetes (ACK) clusters
On-premises clusters and clusters of third-party cloud service providers
Prerequisites
A P2P acceleration agent is installed in the cluster.
For information about how to install a P2P acceleration agent in an ACK cluster, see Install a P2P acceleration agent in an ACK cluster.
For information about how to install a P2P acceleration agent in an on-premises cluster or a cluster of a third-party cloud service provider, see Install a P2P acceleration agent in an on-premises cluster or a cluster of a third-party cloud service provider.
Limits
After you enable the P2P acceleration feature, the P2P acceleration agent uses a webhook to replace your container image address with the address of the P2P-accelerated image. For example, the webhook replaces the original image address test****vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest with the P2P-accelerated image address test****vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest.
In addition, the webhook automatically generates an image pull secret for P2P-accelerated images by copying the original image pull secret. In the preceding example, the generated secret targets the test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001 domain. The generation of the secret and the replacement of the image address are asynchronous. Therefore, we recommend that you distribute or manually create the image pull secret for the P2P domain before you deploy a workload. This prevents image pull failures caused by the image address replacement.
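For example, you can pre-create the pull secret for the P2P domain with kubectl. This is a minimal sketch: the secret name acr-p2p-secret, the namespace default, and the credential placeholders are illustrative and must be replaced with your own values.
# Pre-create an image pull secret for the P2P-accelerated domain.
# The secret name, namespace, and credentials below are placeholders.
kubectl create secret docker-registry acr-p2p-secret \
  --namespace default \
  --docker-server=test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001 \
  --docker-username=<your-acr-username> \
  --docker-password=<your-acr-password>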
Enable P2P acceleration
You can add the P2P acceleration label to workloads such as pods and Deployments to enable P2P acceleration for the workloads. You can also add the P2P acceleration label to a namespace in your ACK cluster. This way, P2P acceleration is enabled for all workloads that meet acceleration conditions in the namespace. You do not need to modify the YAML files of the workloads to enable P2P acceleration. You can select a method to add the P2P acceleration label based on your business requirements.
The key of the P2P acceleration label is k8s.aliyun.com/image-accelerate-mode and its value is p2p.
Add the P2P acceleration label to a workload
In this example, the P2P acceleration label is added to a Deployment. Run the following command to edit the YAML file of the Deployment:
kubectl edit deploy <Name of the Deployment>
Add the k8s.aliyun.com/image-accelerate-mode: p2p label to the YAML file of the Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        # enable P2P
        k8s.aliyun.com/image-accelerate-mode: p2p
        app: nginx
    spec:
      # your ACR instance image pull secret
      imagePullSecrets:
      - name: test-registry
      containers:
      # your ACR instance image
      - image: test-registry-vpc.cn-hangzhou.cr.aliyuncs.com/docker-builder/nginx:latest
        name: test
        command: ["sleep", "3600"]
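After the Deployment is updated, you can confirm that the label is present in the pod template. The Deployment name test matches the example above:
# Print the pod template labels of the Deployment
kubectl get deploy test -o jsonpath='{.spec.template.metadata.labels}'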
Add the P2P acceleration label to a namespace
Use the ACK console
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, click Namespaces and Quotas.
On the Namespace page, find the namespace to which you want to add the P2P acceleration label and click Edit in the Actions column.
In the Edit Namespace dialog box, set the Variable Key parameter of the Label parameter to k8s.aliyun.com/image-accelerate-mode and the Variable Value parameter to p2p. Then, click OK.
Use kubectl
kubectl label namespaces <YOUR-NAMESPACE> k8s.aliyun.com/image-accelerate-mode=p2p
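To verify that the label is set, or to remove it and disable P2P acceleration for the namespace again, you can run the following commands:
# Verify that the label is set on the namespace
kubectl get namespace <YOUR-NAMESPACE> --show-labels

# Remove the label to disable P2P acceleration for the namespace
kubectl label namespaces <YOUR-NAMESPACE> k8s.aliyun.com/image-accelerate-mode-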
Check whether the P2P acceleration feature is enabled
After you enable P2P acceleration for a pod, the P2P agent automatically injects the P2P acceleration annotation, the address of the P2P-accelerated image, and the secret for pulling the P2P-accelerated image into the pod.
The secret for pulling a P2P-accelerated image and the secret for pulling the original image are different only in the domain name of the image repository. Other configurations of the secrets are identical. If the user information in the secret for pulling the original image is invalid, the P2P-accelerated image fails to be pulled.
Run the following command to query pods:
kubectl get po <Name of the pod> -oyaml
Expected output:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    # inject p2p-annotations automatically
    k8s.aliyun.com/image-accelerate-mode: p2p
    k8s.aliyun.com/p2p-config: '...'
spec:
  containers:
  # inject image to p2p endpoint
  - image: test-registry-vpc.distributed.cn-hangzhou.cr.aliyuncs.com:65001/docker-builder/nginx:latest
  imagePullSecrets:
  - name: test-registry
  # inject image pull secret for p2p endpoint
  - name: acr-credential-test-registry-p2p
In the preceding code, the P2P acceleration annotation, the address of the P2P-accelerated image, and the secret for pulling the P2P-accelerated image are injected into the pod. P2P acceleration is enabled for the pod.
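Instead of reading the full YAML, you can also extract only the image address and the pull secret names, for example:
# Print the container images and the image pull secrets of the pod
kubectl get po <Name of the pod> -o jsonpath='{.spec.containers[*].image}{"\n"}{.spec.imagePullSecrets[*].name}{"\n"}'
If the image address contains the distributed domain with port 65001 and the secret list contains the generated acr-credential-...-p2p secret, P2P acceleration is in effect.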
(Optional) Enable acceleration metric collection on the client
Description of P2P acceleration metrics
Enable the metric collection feature
Enable the metric collection feature when you install the P2P acceleration agent.
p2p:
  v2:
    # Component for P2P v2
    image: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/dadi-agent
    imageTag: v0.1.2-72276d4-aliyun
    # Concurrency limit for the number of layers that each node proxy downloads
    proxyConcurrencyLimit: 128
    # The server port used to communicate with P2P nodes
    p2pPort: 65002
    cache:
      # Disk cache capacity in bytes. Default: 4 GB
      capacity: 4294967296
      # Set to 1 if you are using high-performance disks on your ECS instance, such as ESSD PL2/PL3
      aioEnable: 0
    exporter:
      # Set to true to collect component metrics
      enable: false
      port: 65003
    # Limit for downstream throughput
    throttleLimitMB: 512
Method for accessing the P2P acceleration agent
The exporter-related parameters in the YAML file of the P2P acceleration agent define the port for collecting metrics.
ExporterConfig:
  enable: true                 # Specifies whether to enable the metric collection feature.
  port: 65006                  # Specifies the listening port.
  standaloneExporterPort: true # Specifies whether to expose metrics on a standalone port. If you set this parameter to false, metrics are served over the HTTP service port.
Run the curl 127.0.0.1:$port/metrics command. The following metrics are returned:
# HELP DADIP2P_Alive
# TYPE DADIP2P_Alive gauge
DADIP2P_Alive{node="192.168.69.172:65005",mode="agent"} 1.000000 1692156721833
# HELP DADIP2P_Read_Throughtput Bytes / sec
# TYPE DADIP2P_Read_Throughtput gauge
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Read_Throughtput{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_QPS
# TYPE DADIP2P_QPS gauge
DADIP2P_QPS{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_QPS{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_MaxLatency us
# TYPE DADIP2P_MaxLatency gauge
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_MaxLatency{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_Count Bytes
# TYPE DADIP2P_Count gauge
DADIP2P_Count{node="192.168.69.172:65005",type="pread",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="download",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="peer",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="disk",mode="agent"} 0.000000 1692156721833
DADIP2P_Count{node="192.168.69.172:65005",type="http",mode="agent"} 0.000000 1692156721833
# HELP DADIP2P_Cache
# TYPE DADIP2P_Cache gauge
DADIP2P_Cache{node="192.168.69.172:65005",type="allocated",mode="agent"} 4294967296.000000 1692156721833
DADIP2P_Cache{node="192.168.69.172:65005",type="used",mode="agent"} 4294971392.000000 1692156721833
# HELP DADIP2P_Label
# TYPE DADIP2P_Label gauge
Metrics description
Metric names
DADIP2P_Alive: indicates whether the service is alive.
DADIP2P_Read_Throughtput: the read throughput of the P2P service. Unit: bytes/s.
DADIP2P_QPS: the queries per second (QPS).
DADIP2P_MaxLatency: the latency statistics. Unit: μs.
DADIP2P_Count: the traffic statistics. Unit: bytes.
DADIP2P_Cache: the cache that is used by a single server. Unit: bytes.
Tags
node: the IP address and port number of the P2P agent or root service.
type: the type of the metric.
pread: serves downstream data requests.
download: downloads from the origin (back-to-origin).
peer: distributes data over the P2P network.
disk: reads data from and writes data to the disk.
http: processes HTTP requests.
allocated: the space that is allocated to the cache.
used: the space that is used by the cache.
Metric example
DADIP2P_Count{node="11.238.108.XXX:9877",type="http",mode="agent"} 4248808352.000000 1692157615810
The total HTTP request traffic that is processed by the agent service: 4248808352 bytes.
DADIP2P_Cache{node="11.238.108.XXX:9877",type="used",mode="agent"} 2147487744.000000 1692157615810
The cache that is used by the agent: 2147487744 bytes.
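Because the exporter serves metrics in the Prometheus exposition format, you can scrape the exporter port on every node. The following scrape configuration is a minimal sketch: it assumes Prometheus runs inside the cluster with node discovery, the exporter listens on port 65003 as configured above, and the job name dadi-p2p is arbitrary.
scrape_configs:
- job_name: dadi-p2p
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Rewrite each node target to the exporter port (65003 in the example configuration above)
  - source_labels: [__meta_kubernetes_node_address_InternalIP]
    regex: (.+)
    target_label: __address__
    replacement: '${1}:65003'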
Audit logs
Enable audit logs
Set the logAudit parameter in the P2P ConfigMap to true.
DeployConfig:
  mode: agent
  logDir: /dadi-p2p/log
  logAudit: true
  logAuditMode: stdout # Logs are collected to the console. If you set this parameter to file, logs are written to the /dadi-p2p/log/audit.log file.
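Depending on the logAuditMode value, you can read the audit entries as follows. The agent namespace and pod name are placeholders, because they depend on how you installed the P2P acceleration agent.
# stdout mode: read audit entries from the agent pod logs
kubectl logs -n <agent-namespace> <p2p-agent-pod> | grep AUDIT

# file mode: follow the audit log file on the node
tail -f /dadi-p2p/log/audit.log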
Format of audit logs
The following code shows the format of audit logs. The latency field indicates the processing time from the receipt of a request to the return of the result. Unit: μs.
2022/08/30 15:44:52|AUDIT|th=00007FBA247C5280|download[pathname=/https://cri-pi840la*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=125829120][size=2097152][latency=267172]
....
2022/08/30 15:44:55|AUDIT|th=00007FBA2EFEAEC0|http:pread[pathname=/https://cri-pi840lacia*****-registry.oss-cn-hangzhou.aliyuncs.com/docker/registry/v2/blobs/sha256/dd/dd65726c224b09836aeb6ecebd6baf58c96be727ba86da14e62835569896008a/data][offset=127467520][size=65536][latency=21]
The major fields include the time, AUDIT, the thread pointer, and the operation code with its parameters [pathname=][offset=][size=][latency=].
You can ignore the AUDIT and thread pointer fields. The size parameter specifies the amount of data that is processed in a single request. A negative value indicates that an exception occurred. The latency parameter specifies the latency of a single request. Unit: μs.
The following list describes common operation codes:
http:pread: The HTTP proxy processes output data requests.
rpc:stat: The P2P agent obtains the file size.
rpc:pread: The P2P agent processes output data requests.
download: The P2P agent downloads data from the upstream.
filewrite: The P2P agent writes the current data shard to the cache.
fileread: The P2P agent reads data shards from the cache.
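Because every audit entry carries its latency in microseconds, you can filter slow operations directly from the log file. The following one-liner is a sketch that assumes the file mode log path shown above; the 100000 μs threshold is an arbitrary example.
# Print audit entries whose latency exceeds 100000 us (0.1 s)
awk -F'latency=' '/AUDIT/ {v=$2; sub(/\].*/, "", v); if (v+0 > 100000) print}' /dadi-p2p/log/audit.log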
Log example
download[pathname=mytest][offset=0][size=65536][latency=26461]
## The latency when the P2P agent downloads the [0,65536) data of the mytest file from the upstream is 26461 μs.
rpc:pread[pathname=mytest][offset=0][size=65536][latency=2]
## The latency when the P2P agent returns the [0,65536) data of the mytest file to the downstream is 2 μs.
http:pread[pathname=mytest][offset=0][size=65536][latency=26461]
## The latency when the HTTP proxy serves the [0,65536) data of the mytest file to the downstream is 26461 μs.
(Optional) Disable loading images on demand and enable P2P acceleration
You can modify the configurations of a single node in a cluster to disable loading images on demand and enable P2P acceleration. Subsequent O&M operations on the node may overwrite your modifications, in which case you must modify the configurations again. Perform the following operations:
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Nodes > Nodes.
On the Nodes page, click the Instance ID in the Name/IP Address/Instance ID column corresponding to the node that you want to manage.
On the Instance Details tab, click Connect in the upper-right corner to connect to the node.
Run the vi command to edit the /etc/overlaybd/overlaybd.json file, and change the value of enable in the p2pConfig field to false.
{
  "p2pConfig": {
    "enable": false,
    "address": "https://localhost:6****/accelerator"
  },
  ...
}
Run the following command to restart the service that loads image resources on demand:
service overlaybd-tcmu restart
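You can then confirm that the change took effect, for example:
# Confirm that enable is set to false in the p2pConfig section
grep -A 3 '"p2pConfig"' /etc/overlaybd/overlaybd.json

# Check the status of the on-demand loading service
service overlaybd-tcmu status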
Appendix
Reference for the P2P acceleration effect
Pull images of different specifications
Image specifications that are used in the test
4 GB (512 MB × 8 layers)
10 GB (10 GB × 1 layer)
20 GB (4 GB × 5 layers, 10 GB × 2 layers, 512 MB × 40 layers, 20 GB × 1 layer, 2 GB × 10 layers)
Test environment
ACK cluster: 1000 nodes
Specification of the Elastic Compute Service (ECS) instance: 4 vCPUs and 8 GB of memory
Specification of the cloud disk: 200 GB PL1 enhanced SSD (ESSD)
Specification of the P2P agent: 1 vCPU, 1 GB of memory, and 4 GB of cache
Test scenario
One thousand nodes pull the same image, and the image is decompressed after it is downloaded.
Test result (P95 time)
Image specification | Time consumed | Peak throughput of back-to-origin routing to buckets (Gbit/s) |
512 MB × 8 layers | 116 seconds | 2 |
10 GB × 1 layer | 6 minutes and 20 seconds | 1.2 |
4 GB × 5 layers | 9 minutes and 15 seconds | 5.1 |
10 GB × 2 layers | 9 minutes and 50 seconds | 6.7 |
512 MB × 40 layers | 7 minutes and 55 seconds | 3.8 |
20 GB × 1 layer | 11 minutes | 2.5 |
2 GB × 10 layers | 8 minutes and 13 seconds | 3.2 |