Elastic High Performance Computing: Improve cluster performance by disabling HT for compute nodes

Last Updated: Jun 14, 2024

Each compute node in an Elastic High Performance Computing (E-HPC) cluster is an Elastic Compute Service (ECS) instance. By default, Hyper-Threading (HT) is enabled for each ECS instance. In some high-performance computing (HPC) scenarios, you can disable HT to improve the performance of instances. This topic describes how to disable HT for compute nodes.

Background information

A central processing unit (CPU) can contain several physical cores. Hyper-Threading (HT) presents each physical core as two virtual processing cores, so that two threads can run concurrently on a single physical core. ECS supports multi-threading based on HT, and each thread is treated as a virtual CPU (vCPU) of the ECS instance. In some HPC scenarios, you can disable HT on compute nodes to improve their performance.
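As a quick illustration of the relationship between threads and physical cores, the following sketch reads lscpu-style text and reports whether HT appears to be enabled. The function names threads_per_core and ht_state are hypothetical helpers introduced only for this example, and the sketch assumes that the output contains a "Thread(s) per core" line.

```shell
#!/bin/bash
# Hypothetical helper: print the "Thread(s) per core" value from
# lscpu-style text on standard input.
threads_per_core() {
    awk -F: '/^Thread\(s\) per core/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Hypothetical helper: report whether HT appears to be enabled.
# Two or more threads per core is treated as "enabled".
ht_state() {
    tpc=$(threads_per_core)
    if [ -z "$tpc" ]; then
        echo "unknown"
    elif [ "$tpc" -ge 2 ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Usage on a live instance:
#   lscpu | ht_state
```

The functions only read text, so they can be tried without root permissions.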

Usage notes

Different types of instances have different limits on disabling HT.

  • HT can be disabled only on some enterprise-level x86 compute-optimized ECS instances. For more information, see Instance type limits.

  • HT cannot be directly disabled on ECS Bare Metal Instances, but can be disabled at the software level.

  • HT is disabled on Super Computing Cluster (SCC) instances by default.

Disable HT on enterprise-level x86 compute-optimized ECS instances

When you add compute nodes to an existing cluster, you can specify whether to enable HT for the new instances.

Important

HT cannot be disabled on enterprise-level x86 compute-optimized ECS instances after they are created.

Manual scale-out

When you manually scale out the cluster, you can specify whether to enable HT in the Quantity And Type of Instances to be Added section. For more information, see Manually scale out an E-HPC cluster.

Disable HT when manually scale out.png

Auto scaling

When you configure auto scaling for the cluster, you can configure whether to enable HT for the instance in the Global Configurations section. For more information, see Configure auto scaling.

Disable HT when in auto scaling.png

Disable HT on ECS Bare Metal Instances

You can disable HT on ECS Bare Metal Instances at the software level, that is, inside the guest operating system, after the instance is created. To do so, either set the nr_cpus kernel parameter or change the vCPU status. By default, each physical core corresponds to two virtual processing cores. Disabling HT inside the guest operating system takes one virtual processing core on each physical core out of use, so that each physical core corresponds to only one virtual processing core.
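To see which vCPUs share a physical core, you can inspect the thread_siblings_list files that the Linux kernel exposes in sysfs. The following is a minimal sketch, not a definitive tool. The function name list_secondary_siblings and its optional directory argument are assumptions made for illustration; the argument exists only so that the function can be tried against a copy of the sysfs tree.

```shell
#!/bin/bash
# Hypothetical helper: print the logical CPU numbers that are the second
# (or later) hyper-thread of a physical core, one per line.
# $1: sysfs CPU directory; defaults to the real one.
list_secondary_siblings() {
    cpu_dir="${1:-/sys/devices/system/cpu}"
    # Each file holds entries such as "0,32". Keep everything after the
    # first entry, split it on commas, and remove duplicates.
    cat "$cpu_dir"/cpu[0-9]*/topology/thread_siblings_list 2>/dev/null \
        | cut -s -d, -f2- \
        | tr ',' '\n' \
        | sort -un
}

# Usage on a live instance:
#   list_secondary_siblings
```

Note that this sketch assumes comma-separated sibling lists. On systems where siblings are numbered consecutively, the kernel reports them as a range such as 0-1, and the pipeline prints nothing.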

Note

To physically disable HT on a bare metal server, you must change the motherboard BIOS settings and restart the hardware, which is complex and risky. Therefore, we recommend that you use the two software-level methods described in this topic, which achieve an effect similar to disabling HT.

The following list compares the two methods.

Method: Set nr_cpus

  • Advantage: Commands such as lscpu or cpuid return only the CPUs that are actually in use, which confirms that HT is disabled.

  • Disadvantage: If nr_cpus is set to half the number of vCPUs of the instance type, half of the vCPUs cannot be used while the instance is running. To restore all vCPUs, you must delete the nr_cpus parameter and then restart the instance.

  • Disadvantage: If nr_cpus is set to half the number of vCPUs of the instance type, we recommend that you delete the parameter before you create a custom image. Otherwise, instances of other instance types that are created from the custom image may recognize only some of their physical cores. To resolve this issue, reset the nr_cpus parameter.

Method: Change vCPU status

  • Advantage: You can run commands to change the status of vCPUs without restarting the instance. After HT is disabled in this way, you can run the echo 1 > /sys/devices/system/cpu/cpu$cpunum/online command to restore all vCPUs, also without restarting the instance.

  • Disadvantage: Commands such as lscpu or cpuid return all vCPUs, including the ones that are not in use.

  • Disadvantage: Some software licenses detect all CPUs, which may incur extra charges.

  • Disadvantage: The settings are lost when the instance restarts and must be applied again.

Important

We cannot guarantee that these two methods have no impact on your business. We recommend that you fully evaluate the impact before you use them in a production environment.

Set nr_cpus

nr_cpus is a kernel parameter that limits the maximum number of CPUs supported by the kernel. Valid values: 2 to 255. To achieve the effect of disabling HT, set nr_cpus to half the number of vCPUs of the instance type. The kernel then uses only half of the vCPUs, so that each physical core corresponds to only one virtual processing core. Commands such as lscpu also return only the number of CPUs that are actually in use.
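The arithmetic is simple, but a small helper can guard against mistakes. The following sketch computes the value to use for nr_cpus from a vCPU count. The function name half_vcpus is hypothetical. When no count is passed, the sketch falls back to nproc --all, which is only meaningful before nr_cpus has been applied; taking the count from the instance type specification is the safer assumption.

```shell
#!/bin/bash
# Hypothetical helper: print half of a vCPU count for use as nr_cpus.
# $1: total number of vCPUs of the instance type; defaults to the
#     count reported by `nproc --all`.
half_vcpus() {
    total="${1:-$(nproc --all)}"
    if [ $((total % 2)) -ne 0 ]; then
        # An odd count suggests that HT is not in effect for this instance.
        echo "vCPU count $total is odd; not halving" >&2
        return 1
    fi
    echo $((total / 2))
}

# Usage for a 64-vCPU instance type:
#   echo "nr_cpus=$(half_vcpus 64)"
```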

This topic uses an ecs.ebmc6me.16xlarge instance as an example. The instance has 64 vCPUs and runs CentOS 7. The following steps describe how to set nr_cpus for the instance.

  1. Connect to the ECS Bare Metal Instance. For more information, see Connect to a Linux instance by using a password or key.

  2. Run the lscpu command to view the vCPU status and check whether HT is enabled for the instance.

    The following figure shows a sample response. The CPU(s) value is the same as the actual number of vCPUs of the instance, and the Thread(s) per core value is 2, which indicates that HT is enabled for the instance.

    Bare Metal CPU.png

  3. Modify the GRUB configuration file.

    vim /boot/grub2/grub.cfg

    Press i to enter the edit mode. Add the nr_cpus parameter to the kernel boot parameters and set it to half the number of vCPUs of the instance type. Example: nr_cpus=32. Press Esc to exit the edit mode, and then enter :wq to save the file and exit.

    Bare metal CPU111..png

  4. Restart the instance.

  5. Check the results.

    1. Run the lscpu command to view the vCPU status.

      The following figure shows a sample response in which the CPU(s) value is 32 and the Thread(s) per core value is 1. The response indicates that HT is disabled for the instance.

      Bare metal CPU1.png

    2. Run the lscpu --extend command to check the vCPU distribution.

      The following figure shows a sample response in which 32 vCPUs are distributed on 32 physical cores. This achieves the effect of disabling HT.

      Bare metal CPU11.png
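In addition to lscpu, you can confirm after the restart that the running kernel was actually booted with the parameter. The following sketch reads the kernel command line from /proc/cmdline and prints the nr_cpus value if one is present. The function name cmdline_nr_cpus and its optional file argument are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the nr_cpus value from a kernel command line,
# or print nothing if the parameter is absent.
# $1: file that holds the command line; defaults to /proc/cmdline.
cmdline_nr_cpus() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put each boot parameter on its own line, then extract the number.
    tr ' ' '\n' < "$cmdline_file" \
        | sed -n 's/^nr_cpus=\([0-9][0-9]*\)$/\1/p' \
        | tail -n 1
}

# Usage on a live instance:
#   cmdline_nr_cpus
```

If the function prints nothing after the restart, the parameter was probably not added to the boot entry that the instance actually used.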

Change the vCPU status

You can run a script that changes the vCPU status to take half of the vCPUs offline, one on each physical core. This achieves the effect of disabling HT.

This topic uses an ecs.ebmc6me.16xlarge instance as an example. The instance has 64 vCPUs and runs CentOS 7. The following steps describe how to change the vCPU status of the instance.

  1. Connect to the ECS Bare Metal Instance. For more information, see Connect to a Linux instance by using a password or key.

  2. Run the lscpu command to view the vCPU status and check whether HT is enabled for the instance.

    The following figure shows a sample response. The CPU(s) value is the same as the actual number of vCPUs of the instance, and the Thread(s) per core value is 2, which indicates that HT is enabled for the instance.

    Bare Metal CPU.png

  3. Create and execute a script to change the vCPU status.

    1. Create a script.

      vim test.sh

      Sample script:

      #!/bin/bash
      # Collect the second and later sibling threads of each physical core,
      # then take each of those vCPUs offline.
      for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -s -d, -f2- | tr ',' '\n' | sort -un)
      do
          echo 0 > /sys/devices/system/cpu/cpu$cpunum/online
      done

    2. Run the script.

      sh test.sh

  4. Check the results.

    1. Run the lscpu command to view the vCPU status.

      The following figure shows a sample response in which 32 vCPUs are offline and the Thread(s) per core value is 1. The response indicates that HT is disabled for the instance.

      Bare metal CPU22.png

    2. Run the lscpu --extend command to check the vCPU distribution.

      The following figure shows a sample response in which 32 online vCPUs are distributed on 32 physical cores. This achieves the effect of disabling HT.

      Bare metal CPU2.png
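To undo the change without restarting the instance, write 1 to the online file of each offline vCPU, as mentioned in the comparison of the two methods. The following is a minimal sketch of such a restore loop. The function name restore_all_cpus and its optional directory argument are hypothetical; the argument exists only so that the function can be tried against a copy of the sysfs tree. On a real instance, the function must be run as the root user.

```shell
#!/bin/bash
# Hypothetical helper: bring every offline logical CPU back online.
# $1: sysfs CPU directory; defaults to the real one.
restore_all_cpus() {
    cpu_dir="${1:-/sys/devices/system/cpu}"
    for online_file in "$cpu_dir"/cpu[0-9]*/online; do
        # cpu0 often has no online file because it cannot be taken offline.
        [ -e "$online_file" ] || continue
        if [ "$(cat "$online_file")" = "0" ]; then
            echo 1 > "$online_file"
        fi
    done
}

# Usage on a live instance (as root):
#   restore_all_cpus
```

After the function returns, you can run lscpu again to check that the Thread(s) per core value is back to 2.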