On Elastic Compute Service (ECS) Bare Metal instances, the kdump service that comes with the operating system may fail to generate crash dump files. This topic describes the cause of and solutions to this issue.

Problem description

The following section describes the known scenarios in which the issue occurs. In these scenarios, crash dump files cannot be generated after hot swapping of devices on ECS instances.
  • Instances of the sixth-generation ECS Bare Metal Instance families (including ebmg6, ebmc6, and ebmr6) use the following images:
    • CentOS 8.3 and earlier
    • Ubuntu 16 and Ubuntu 18
    • Debian 10
    • Alibaba Cloud Linux 2 with a kernel version earlier than 4.19.91-24.al7
  • Instances of the seventh-generation ECS Bare Metal Instance families (including ebmg7, ebmc7, and ebmr7) use Debian 10 images.
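To see whether an Alibaba Cloud Linux 2 instance runs a kernel earlier than 4.19.91-24.al7, you can compare the output of uname -r with that version. The following is a minimal sketch, assuming that GNU sort with the -V (version sort) option is available. The helper name version_lt is hypothetical and is not part of any operating system tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if version $1 sorts strictly
# before version $2. Relies on `sort -V` from GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Threshold taken from the list above (Alibaba Cloud Linux 2).
fixed_kernel="4.19.91-24.al7"
running_kernel="$(uname -r)"

if version_lt "$running_kernel" "$fixed_kernel"; then
    echo "Kernel $running_kernel is earlier than $fixed_kernel"
else
    echo "Kernel $running_kernel is $fixed_kernel or later"
fi
```

The comparison is only meaningful on Alibaba Cloud Linux 2. For the other images in the list, check the distribution release instead, for example in /etc/os-release.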

Possible cause

When the kdump service of an instance is in the crashkernel phase, the pci_resource of the Elastic Block Storage (EBS) device vda cannot be allocated. As a result, crash dump files cannot be generated. The root cause of this issue is that the instance type is incompatible with the operating system. A command output similar to the following one is returned.

[Figure: Kdump]

Solutions

You can use one of the following methods to solve this issue:
  • Method 1: Upgrade the operating system kernel to version 5.10.
  • Method 2: Add the following patches to the operating system and rebuild a kernel:
    Benjamin Herrenschmidt (1):
      PCI: Don't auto-realloc if we're preserving firmware config
    
    Kelsey Skunberg (1):
      PCI: Make pci_hotplug_io_size, mem_size, and bus_size private
    
    Logan Gunthorpe (1):
      PCI: Don't disable bridge BARs when assigning bus resources
    
    Nicholas Johnson (2):
      PCI: Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters
      PCI: Avoid double hpmemsize MMIO window assignment
    Note: Some kernel versions already contain some of the preceding patches. Add only the patches that the kernel does not already contain. For example, kernel version 4.19 of Debian 10 already contains the first and third patches, so the second, fourth, and fifth patches must be added before the operating system can use the kdump service.
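One way to find out which of the listed patches a kernel source tree already contains is to search its git history for the commit subjects. The following is a rough sketch, assuming that the kernel source is a git checkout. The /usr/src/linux path and the patch_status and check_all_patches helper names are placeholders. A distribution may backport a patch under a modified subject line, so treat a "missing" result as a hint to inspect the source, not as proof.

```shell
#!/bin/sh
# Assumption: KERNEL_SRC points at a git checkout of the kernel source
# that you intend to rebuild. The default path is only a placeholder.
KERNEL_SRC="${KERNEL_SRC:-/usr/src/linux}"

# Hypothetical helper: prints "present" if a commit whose message
# contains the text $2 exists in the git tree at $1, else "missing".
patch_status() {
    if git -C "$1" log --oneline --fixed-strings --grep="$2" \
            </dev/null 2>/dev/null | grep -q .; then
        echo "present"
    else
        echo "missing"
    fi
}

# Check every patch subject listed in Method 2 against the tree at $1.
check_all_patches() {
    while IFS= read -r subject; do
        printf '%-8s %s\n' "$(patch_status "$1" "$subject")" "$subject"
    done <<'EOF'
PCI: Don't auto-realloc if we're preserving firmware config
PCI: Make pci_hotplug_io_size, mem_size, and bus_size private
PCI: Don't disable bridge BARs when assigning bus resources
PCI: Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters
PCI: Avoid double hpmemsize MMIO window assignment
EOF
}

check_all_patches "$KERNEL_SRC"
```

To check a different tree, set the variable when you run the script, for example KERNEL_SRC=/path/to/kernel sh check-patches.sh, where check-patches.sh is whatever name you saved the sketch under.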

In addition to upgrading the kernel version and adding patches to the kernel, you must take note of the following items:

For Debian or Ubuntu operating system kernels, you must modify the crashkernel parameter to adjust the amount of memory reserved for the crash kernel. Recommended settings:
crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M
Note: For some operating system images such as Debian 10, you must increase the amount of reserved memory to 384 MB or 512 MB. Otherwise, an out-of-memory (OOM) exception occurs when kdump is in the crashkernel phase, and crash dump files cannot be generated.
The following procedure shows how to set the crashkernel parameter so that 256 MB of memory is reserved when the system has 384 MB of memory or more:
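The range form of the crashkernel parameter maps the total system memory to a reservation size: each comma-separated entry is start-[end]:size, and the first entry whose range contains the system memory is used. The following sketch illustrates that lookup in a simplified way. It assumes that the range start is inclusive and the end is exclusive, and it ignores other forms of the parameter such as a fixed size or an @offset suffix. The helper names to_mib and crashkernel_reserve_mib are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a size such as 192M, 2G, or 512K to MiB
# (integer, rounded down). A value without a suffix is treated as bytes.
to_mib() {
    num="${1%[KMGkmg]}"
    case "$1" in
        *[Kk]) echo $((num / 1024)) ;;
        *[Mm]) echo "$num" ;;
        *[Gg]) echo $((num * 1024)) ;;
        *)     echo $((num / 1024 / 1024)) ;;
    esac
}

# Hypothetical helper: print the reservation in MiB that the range-style
# value $1 selects for a system with $2 MiB of memory (0 if none match).
crashkernel_reserve_mib() {
    spec="$1"; ram="$2"
    old_ifs="$IFS"; IFS=','
    set -- $spec            # split the value into its range:size entries
    IFS="$old_ifs"
    for entry in "$@"; do
        range="${entry%%:*}"; size="${entry#*:}"
        start="${range%%-*}"; end="${range#*-}"   # end is empty for "8G-"
        if [ "$ram" -ge "$(to_mib "$start")" ] &&
           { [ -z "$end" ] || [ "$ram" -lt "$(to_mib "$end")" ]; }; then
            to_mib "$size"
            return 0
        fi
    done
    echo 0
}

# Recommended setting from this topic, evaluated for a 4 GiB instance.
crashkernel_reserve_mib "0M-2G:0M,2G-8G:192M,8G-:256M" 4096   # prints 192
```

With the same setting, an instance with 8 GiB of memory or more falls into the last range and reserves 256 MiB, and an instance with less than 2 GiB reserves nothing.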
  1. Open the /etc/default/grub.d/kdump-tools.cfg file.
    vim /etc/default/grub.d/kdump-tools.cfg
  2. Press the i key to enter the edit mode and change the crashkernel parameter settings to the following content:
    GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=384M-:256M"
    After the parameter is changed, press the Esc key, enter :wq, and press the Enter key to save and close the file.
  3. Update the configurations of GRand Unified Bootloader (GRUB).
    update-grub
  4. Restart the ECS instance for the configurations to take effect.

    We recommend that you restart ECS instances during off-peak hours to reduce the impact on your business operations caused by instance restarts. For more information about how to restart an ECS instance, see Reboot the instance.
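After the restart, you may want to confirm that the new crashkernel value is in effect. The following is a minimal sketch, assuming a Linux kernel built with kexec support that exposes /proc/cmdline and the /sys/kernel/kexec_crash_* files. The helper name crashkernel_value is hypothetical. If crashkernel= appears more than once on the command line, the helper prints the last occurrence, on the assumption that the kernel honors the last one.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last crashkernel= parameter
# found in the kernel command line string passed as $1.
crashkernel_value() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p' | tail -n 1
}

# Show the value that the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    echo "crashkernel on command line: $(crashkernel_value "$(cat /proc/cmdline)")"
fi

# kexec_crash_size:   bytes of memory reserved for the crash kernel
# kexec_crash_loaded: 1 if a crash kernel is currently loaded, else 0
for f in /sys/kernel/kexec_crash_size /sys/kernel/kexec_crash_loaded; do
    if [ -r "$f" ]; then
        echo "$f: $(cat "$f")"
    fi
done
```

A kexec_crash_size of 0 suggests that no memory was reserved, in which case check the crashkernel value against the amount of system memory. On Debian and Ubuntu, the kdump-config show command from the kdump-tools package prints a similar summary.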