This topic describes how to optimize CPU performance in different scenarios.

Scenario: Nodes of a Container Service for Kubernetes (ACK) cluster run on ECS bare metal instances that have Non-Uniform Memory Access (NUMA) enabled.
Workloads:
  • Compute-intensive workloads
  • Online workloads
  • Databases
Performance:
  • Stable average response time (RT).
  • Minimized CPU throttling.
  • Improved CPU utilization.

Scenario: Nodes of an ACK cluster run on ECS bare metal instances or ECS instances, each of which has 32 vCPUs or more.
Workloads:
  • Big data workloads, such as Spark jobs
  • Machine learning workloads, such as TensorFlow jobs and Message Passing Interface (MPI) jobs
Performance:
  • Maximized use of CPU time fragments, which improves CPU utilization.

Scenario: Nodes of an ACK cluster run on NUMA-enabled ECS bare metal instances or ECS instances, each of which has 32 vCPUs or more.
Workloads:
  • Hybrid deployment of latency-sensitive workloads and BestEffort workloads that can be bound to vCPUs
Performance:
  • Optimized RT, CPU time fragments, and memory reclaim policies for latency-sensitive workloads.

Scenario: Nodes of an ACK cluster run on ECS bare metal instances or ECS instances, each of which has 32 vCPUs or more.
Workloads:
  • Hybrid deployment of multiple workloads that have the CPUShare mode enabled
Performance:
  • Optimized RT for latency-sensitive workloads. The impact of BestEffort workloads on latency-sensitive workloads is within 5%.

Scenario: Nodes of an ACK cluster run on ECS bare metal instances that use the AMD architecture.
Workloads:
  • Compute-intensive workloads
  • Online workloads
  • Redis in-memory databases
Performance:
  • For online applications that use 8 vCPUs or fewer, the RT is reduced by 30% and the throughput is improved by 45%.
  • Stable RT.
  • Minimized CPU throttling.
  • Improved CPU utilization.

Scenario: Nodes of an ACK cluster run on ECS bare metal instances that use the ARM architecture.
Workloads:
  • Compute-intensive workloads
  • Online workloads
Method: Topology-aware CPU scheduling
Performance:
  • For online applications that use 8 vCPUs or fewer, the RT is reduced by 20% and the throughput is improved by 20%.
  • Stable RT.
  • Minimized CPU throttling.

Scenario: Registered clusters that manage the scheduling of on-premises physical nodes.
Workloads:
  • Compute-intensive workloads
  • Self-managed databases
Method: Topology-aware CPU scheduling
Performance:
  • The CPU performance depends on the CPU type.
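
The node conditions in the scenarios above (bare metal or not, NUMA enabled or not, vCPU count, CPU architecture) can be checked mechanically. The following Python sketch is only an illustration of that matching logic. The `Node` class, the `matching_scenarios` function, the architecture labels, and the short scenario names are all hypothetical and are not part of any ACK API. Registered clusters with on-premises nodes are left out of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """Hypothetical description of a cluster node (not an ACK API object)."""
    bare_metal: bool    # True for an ECS bare metal instance
    numa_enabled: bool  # True if NUMA is enabled on the node
    vcpus: int          # number of vCPUs on the node
    arch: str           # "intel", "amd", or "arm"; labels assumed for this sketch


def matching_scenarios(node: Node) -> List[str]:
    """Return short names of the scenarios whose node conditions hold."""
    matches = []
    if node.bare_metal and node.numa_enabled:
        matches.append("bare metal with NUMA")
    if node.vcpus >= 32:
        # Covers both scenarios that only require 32 vCPUs or more.
        matches.append("32 vCPUs or more")
    if node.numa_enabled and node.vcpus >= 32:
        matches.append("NUMA with 32 vCPUs or more")
    if node.bare_metal and node.arch == "amd":
        matches.append("bare metal on AMD")
    if node.bare_metal and node.arch == "arm":
        matches.append("bare metal on ARM")
    return matches


if __name__ == "__main__":
    node = Node(bare_metal=True, numa_enabled=True, vcpus=64, arch="amd")
    for name in matching_scenarios(node):
        print(name)
```

A node can match several scenarios at once, so the function returns a list rather than a single result. Which optimization to apply still depends on the workload type listed for each scenario.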

The following figure shows the improvements in L3 cache efficiency, dynamic memory bandwidth isolation, CCX/CCD affinity scheduling, and topology-aware CPU scheduling after CPU performance is optimized.

[Figure 1]