Intel® Data Streaming Accelerator (DSA) is a high-performance accelerator for data replication and transformation. It is integrated into the Intel Sapphire Rapids (SPR) processors of Elastic Compute Service (ECS) instances that use the eighth-generation SHENLONG architecture. After you install ack-koordinator on nodes whose processors integrate DSA, DSA acceleration is automatically enabled. It speeds up data replication and transformation for dynamic random-access memory (DRAM), persistent memory, and data processing applications. This topic describes how to use DSA to accelerate data streaming.


Prerequisites

  • A Container Service for Kubernetes (ACK) Pro cluster that runs Kubernetes 1.18 or later is created. For more information, see Create an ACK managed cluster.
  • A kubectl client is connected to the cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
  • ack-koordinator 1.1.1-ack.2 or later is installed. ack-koordinator was formerly known as ack-slo-manager. ack-koordinator 1.1.1-ack.2 is in canary release. To use this version, submit a ticket. For more information about how to install ack-koordinator, see ack-koordinator (FKA ack-slo-manager).
    Note ack-koordinator supports all features provided by resource-controller. If you use resource-controller, you must uninstall it before you install ack-koordinator. For more information about how to uninstall a component, see Uninstall resource-controller.
  • Multi-NUMA instances are deployed by using fifth-, sixth-, seventh-, or eighth-generation Elastic Compute Service (ECS) instances of the ecs.ebmc, ecs.ebmg, ecs.ebmgn, ecs.ebmr, ecs.ebmhfc, or ecs.scc instance families.
    Note The nearby memory access acceleration feature performs better on eighth-generation ECS instances of the ecs.ebmc8i.48xlarge, ecs.c8i.32xlarge, or ecs.g8i.48xlarge instance types. For more information about ECS instance families, see Overview of instance families.

Benefits

DSA is integrated into the processors of ECS instances that use the eighth-generation SHENLONG architecture, and Alibaba Cloud provides the required drivers in Alibaba Cloud Linux 3 (Alinux 3). After you install ack-koordinator on ECS instances whose processors integrate DSA, DSA acceleration is automatically enabled and memory operations are offloaded to DSA. This accelerates data replication and transformation and reduces CPU jitter during the process. DSA provides the following benefits:

  • Improves the data processing performance of data-intensive workloads on nodes, optimizes memory operations in the OS kernel such as memory balancing and compaction, and improves the overall memory performance of nodes.
  • Significantly improves how the nearby memory access acceleration feature of ack-koordinator handles individual data requests, which reduces the vCore hours consumed by workloads. The benefit of DSA grows as remote memory usage increases: access to 100,000 to 1,000,000 memory pages can be 30% to 200% faster, with lower CPU utilization. For example, when about 1.7 GB of application memory is migrated to the local NUMA node, the migration time is reduced to 31.25% and the bandwidth is increased to 320.00% of the values measured on processors that are not integrated with DSA.
    Important The test statistics in this topic are theoretical values only. Actual values may vary based on your environment.
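The two figures for the 1.7 GB migration describe the same measurement. For a fixed amount of migrated data, bandwidth is size divided by time, so a migration time of 31.25% implies a bandwidth of 1 / 0.3125 = 3.2 times, that is, 320%. The following minimal shell sketch performs this conversion; the `bandwidth_ratio` helper is hypothetical and exists only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts a migration-time ratio into a bandwidth ratio.
# For the same amount of data, bandwidth = size / time, so the bandwidth
# ratio is the reciprocal of the time ratio.
bandwidth_ratio() {
  local time_ratio="$1"   # DSA time divided by non-DSA time, e.g. 0.3125
  awk -v r="$time_ratio" 'BEGIN { printf "%.2f\n", 1 / r }'
}

bandwidth_ratio 0.3125   # prints 3.20, i.e. 320%
```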

For more information about DSA, see Intel official documentation.

Use DSA acceleration

After you install ack-koordinator on ECS instances that are integrated with DSA, DSA acceleration is automatically enabled. No additional configuration is required. For more information about the nearby memory access acceleration feature of ack-koordinator, see Use the nearby memory access acceleration feature on multi-NUMA instances.

Verify DSA acceleration

The nearby memory access acceleration feature safely migrates the memory of an application that is bound to specific CPU cores from remote non-uniform memory access (NUMA) nodes to the local NUMA node. This improves the hit ratio of local memory access and optimizes memory access for memory-intensive workloads.
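To observe this migration, you can compare how an application's memory is distributed across NUMA nodes before and after the feature takes effect. The `numastat -p <pid>` command from the numactl package reports this directly. The following sketch is an alternative that needs no extra package: a hypothetical `numa_pages_by_node` helper that sums the `N<node>=<pages>` fields in the kernel's `/proc/<pid>/numa_maps` file. Counts are in pages of each mapping's own page size, so treat the totals as approximate when huge pages are in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sums the per-NUMA-node page counts of one process.
# Each line of /proc/<pid>/numa_maps may contain fields such as "N0=123",
# meaning 123 pages of that mapping reside on NUMA node 0.
numa_pages_by_node() {
  local numa_maps="$1"    # for example /proc/<pid>/numa_maps
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^N[0-9]+=[0-9]+$/) {
        split($i, kv, "=")
        pages[kv[1]] += kv[2]
      }
    }
  }
  END { for (node in pages) print node, pages[node] }' "$numa_maps" | sort
}
```

For example, assuming a single redis-server process and sufficient permissions, `numa_pages_by_node "/proc/$(pidof redis-server)/numa_maps"` prints one `N<node> <pages>` line per NUMA node. After migration, the count on the node local to the bound cores is expected to grow.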

Test environment

To test DSA acceleration, you must use multi-NUMA instance types, such as ecs.ebmc8i.48xlarge, ecs.c8i.32xlarge, and ecs.g8i.48xlarge. In this example, ecs.ebmc8i.48xlarge is used.
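To confirm that a node is a multi-NUMA instance, you can check the `NUMA node(s)` line in `lscpu` output, or count the `node<N>` directories that the Linux kernel exposes under `/sys/devices/system/node`. The following minimal sketch does the latter; the `count_numa_nodes` helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts the NUMA nodes that the kernel exposes in sysfs.
count_numa_nodes() {
  local root="${1:-/sys/devices/system/node}"
  local count=0 dir
  for dir in "$root"/node[0-9]*; do
    # If nothing matches, the unexpanded pattern is not a directory and is skipped.
    [ -d "$dir" ] && count=$((count + 1))
  done
  echo "$count"
}
```

A result greater than 1 means that the instance has more than one NUMA node.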

Procedure
  1. Log on to the node and run the following command to check whether the processor of the node is integrated with DSA:
    ls /sys/bus/dsa

    If no error message appears and the returned directory is not empty, the processor is integrated with DSA.

  2. Deploy an application that has the nearby memory access acceleration feature enabled.
    We recommend that you deploy a memory-intensive application such as Redis.
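The check in step 1 only confirms that the `/sys/bus/dsa` directory exists and is not empty. A slightly stricter check is sketched below. It assumes that the Linux `idxd` driver registers DSA devices under `/sys/bus/dsa/devices` with names such as `dsa0`, lists those entries, and returns a non-zero status if none is found. The `list_dsa_devices` helper is hypothetical. If the driver is built as a module, `lsmod | grep idxd` can also show whether it is loaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists DSA device entries on the "dsa" bus in sysfs and
# returns a non-zero status if none is found. Assumes the idxd driver names
# DSA devices dsa0, dsa2, and so on; work queues (wq*), engines, and groups
# on the same bus are ignored.
list_dsa_devices() {
  local root="${1:-/sys/bus/dsa/devices}"
  local dev found=1
  for dev in "$root"/dsa[0-9]*; do
    [ -e "$dev" ] || continue     # skip the literal pattern when nothing matches
    basename "$dev"
    found=0
  done
  return "$found"
}
```

For example, `list_dsa_devices || echo "no DSA device found"` prints the device names or a warning.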
Conclusions

The following table compares the migration time and CPU utilization with DSA acceleration enabled and disabled when the nearby memory access acceleration feature migrates 26.12 GB of remote Redis memory (based on 1,000,000 memory pages).

Scenario                  | Migration time (s) | CPU utilization | vCore time (core-seconds)
DSA acceleration disabled | 9.649              | 1.000           | 9.649
DSA acceleration enabled  | 4.928              | 0.668           | 3.292

Conclusions: With DSA acceleration enabled, the migration time, average CPU utilization, and vCore time are reduced to 51.1%, 66.8%, and 34.1% of the values with DSA acceleration disabled. This indicates that DSA accelerates memory migration and reduces CPU utilization.
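The table values are consistent with the last column being migration time multiplied by CPU utilization (9.649 × 1.000 and 4.928 × 0.668), and each ratio is the enabled value divided by the disabled value. The following minimal shell sketch recomputes these ratios directly from the table values with awk; the `dsa_comparison` helper and its input format are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "scenario migration_time_s cpu_utilization" rows,
# treats the first row as the baseline (DSA disabled), and prints each later
# row as a percentage of the baseline. vCore time is computed here as
# migration time x CPU utilization, which matches the table values.
dsa_comparison() {
  awk '
    NR == 1 { t0 = $2; u0 = $3; c0 = $2 * $3; next }
    {
      c = $2 * $3
      printf "time %.1f%% cpu %.1f%% core_s %.3f (%.1f%%)\n",
             100 * $2 / t0, 100 * $3 / u0, c, 100 * c / c0
    }'
}

# Values from the table: DSA disabled first, then DSA enabled.
printf 'disabled 9.649 1.000\nenabled 4.928 0.668\n' | dsa_comparison
# time 51.1% cpu 66.8% core_s 3.292 (34.1%)
```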