Intel® Data Streaming Accelerator (DSA) is a high-performance data replication and transformation accelerator built into the Intel® Sapphire Rapids processors of eighth-generation SHENLONG architecture ECS instances. When you run ack-koordinator on a DSA-capable node, DSA acceleration is automatically enabled — offloading memory copy and transformation operations from the CPU to the hardware accelerator.
How it works
DSA transfers memory operations from the CPU to the hardware accelerator integrated in the Intel® Sapphire Rapids processor. This reduces CPU load during memory-intensive tasks such as memory balancing and compaction, and improves the performance of the nearby memory access acceleration feature in ack-koordinator.
With DSA enabled, accessing 100,000–1,000,000 memory pages is 30%–200% faster, and CPU utilization drops accordingly. In an approximately 1.7 GB memory migration test, DSA reduced migration time to 31.25% of the baseline and increased bandwidth to 320% of the baseline.
The performance figures above are reference values from test environments. Actual results vary based on workload, instance type, and memory usage.
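The two migration figures describe the same measurement from two angles: for a fixed amount of data, bandwidth scales with the inverse of elapsed time. The following is a minimal sketch of that arithmetic. The helper name `speedup_percent` is hypothetical and is not part of ack-koordinator.

```shell
#!/bin/sh
# speedup_percent TIME_PERCENT
# Given migration time as a percentage of the baseline, print the
# resulting bandwidth as a percentage of the baseline, assuming the
# same amount of data is moved in both runs.
speedup_percent() {
    awk -v t="$1" 'BEGIN { printf "%.0f\n", 100 / (t / 100) }'
}

# Migration time of 31.25% of the baseline, as reported in the test.
speedup_percent 31.25
```

The printed value can be compared with the bandwidth figure quoted for the 1.7 GB test.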
Prerequisites
Before you begin, make sure you have:
- DSA-capable ECS instances: Applications must run on eighth-generation SHENLONG bare metal instances with multiple non-uniform memory access (NUMA) nodes. The following instance types are recommended:
  - ecs.ebmc8i.48xlarge
  - ecs.c8i.32xlarge
  - ecs.g8i.48xlarge

  For the full list of ECS instance types, see ECS instance types.
- ack-koordinator 1.2.0-ack1.2 or later: Install ack-koordinator on your ACK cluster. Alibaba Cloud provides the required DSA drivers based on Alinux 3. See ack-koordinator (FKA ack-slo-manager) for installation steps.
  Note: ack-koordinator supersedes resource-controller. If resource-controller is installed on the cluster, uninstall it before installing ack-koordinator. See Uninstall resource-controller.
- kubectl connected to the cluster: See Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
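The multi-NUMA requirement can be checked on the node itself. The following sketch counts NUMA nodes by listing the `nodeN` directories that Linux exposes under `/sys/devices/system/node`. The function name `count_numa_nodes` and its optional directory argument are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# count_numa_nodes [SYSFS_NODE_DIR]
# Count NUMA nodes by counting nodeN directories under the sysfs node
# directory. The argument defaults to the standard Linux location and
# is overridable only to make the function easy to test.
count_numa_nodes() {
    dir="${1:-/sys/devices/system/node}"
    n=0
    for d in "$dir"/node[0-9]*; do
        # An unmatched glob stays literal and fails the -d test.
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

if [ "$(count_numa_nodes)" -gt 1 ]; then
    echo "multiple NUMA nodes detected"
else
    echo "single NUMA node, or NUMA information unavailable"
fi
```

A count of 1 or 0 suggests that the instance does not meet the multi-NUMA requirement described above.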
Billing
Installing and using ack-koordinator is free. Fees may apply in the following cases:
- Worker node resources: ack-koordinator is a non-managed component that runs on worker nodes. You can configure the amount of resources each module requests during installation.
- Prometheus metrics: ack-koordinator exposes resource profiling and fine-grained scheduling metrics as Prometheus metrics. If you enable these metrics and use Managed Service for Prometheus, they are billed as custom metrics. Review the Managed Service for Prometheus billing page before enabling this feature. To monitor usage and costs, see Query the amount of observable data and bills.
Enable DSA acceleration
ack-koordinator automatically activates DSA on nodes where DSA hardware is present. No additional configuration is required beyond deploying ack-koordinator on a DSA-capable instance.
To confirm that DSA hardware is available on a node, run:
ls /sys/bus/dsa
If the command returns a non-empty directory without errors, DSA hardware is present and acceleration is active.
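For scripted checks, the same criterion (the directory exists and is not empty) can be wrapped in a small function. This is a sketch only. The function name `has_dsa` and its optional directory argument are hypothetical, and the check confirms only that the sysfs directory is populated, not that ack-koordinator is using DSA.

```shell
#!/bin/sh
# has_dsa [SYSFS_DSA_DIR]
# Succeeds when the directory exists and contains at least one entry,
# mirroring the "non-empty directory without errors" criterion.
has_dsa() {
    dir="${1:-/sys/bus/dsa}"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

if has_dsa; then
    echo "DSA hardware present"
else
    echo "DSA hardware not found"
fi
```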
DSA enhances the nearby memory access acceleration feature. To get the most out of DSA, enable nearby memory access acceleration on your cluster. See Enable nearby memory access acceleration for containers.
Verify DSA acceleration
The following procedure uses an ecs.ebmc8i.48xlarge instance to verify that DSA acceleration is active and measure its effect on memory migration performance.
Procedure
- Log on to the node. See Methods for connecting to an ECS instance.
- Confirm that the processor has DSA hardware:
  ls /sys/bus/dsa
  If the command returns a non-empty directory without errors, DSA is integrated into the processor. If the command returns an error or an empty directory, the node does not have DSA hardware. DSA acceleration is only available on eighth-generation SHENLONG bare metal instances.
- Deploy a memory-intensive test application with nearby memory access acceleration enabled. Redis is recommended. See the example for deployment steps and instructions on enabling nearby memory access acceleration for the workload.
Result analysis
The following table shows the results of migrating 26.12 GB of Redis remote memory (based on 1 million memory pages) with and without DSA acceleration:
| Scenario | Migration time (seconds) | CPU utilization | vCore-seconds |
|---|---|---|---|
| DSA acceleration disabled | 9.649 | 1.000 | 9.649 |
| DSA acceleration enabled | 4.928 | 0.668 | 3.292 |
With DSA enabled, migration time drops to 51.1%, CPU utilization drops to 66.8%, and vCore-seconds drop to 34.1% of the baseline values. This shows that DSA both accelerates memory migration and reduces CPU consumption.
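The percentages can be recomputed from the table. The sketch below assumes that vCore-seconds is migration time multiplied by CPU utilization, which is consistent with both table rows but is not stated explicitly. The function names are hypothetical.

```shell
#!/bin/sh
# percent_of_baseline VALUE BASELINE
# Print VALUE as a percentage of BASELINE, rounded to one decimal place.
percent_of_baseline() {
    awk -v v="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * v / b }'
}

# vcore_seconds TIME_SECONDS CPU_UTILIZATION
# Assumption: vCore-seconds = migration time x CPU utilization.
vcore_seconds() {
    awk -v t="$1" -v u="$2" 'BEGIN { printf "%.3f\n", t * u }'
}

vcore_seconds 4.928 0.668          # vCore-seconds for the DSA-enabled row
percent_of_baseline 4.928 9.649    # migration time relative to baseline
percent_of_baseline 0.668 1.000    # CPU utilization relative to baseline
percent_of_baseline 3.292 9.649    # vCore-seconds relative to baseline
```

Substituting your own measurements for the table values gives the equivalent comparison for a different workload or instance type.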
What's next
DSA works best in combination with nearby memory access acceleration. To enable and configure nearby memory access acceleration for your containers, see Enable nearby memory access acceleration for containers.