What do I do if crash dump files fail to be generated on some ECS instances?

On Elastic Compute Service (ECS) bare metal instances, the kdump service that comes with the operating system may fail to generate crash dump files. This topic describes the cause of and solution to this issue.

Problem description

This section describes the known scenarios in which crash dump files cannot be generated after devices are hot-swapped on the following instances:

- Instances of the sixth-generation ECS Bare Metal Instance families (including ebmg6, ebmc6, and ebmr6) that use the following images:
  - CentOS 8.3 or earlier images
  - Ubuntu 16/18 images
  - Debian 10 images
  - Alibaba Cloud Linux 2 images with a kernel version earlier than 4.19.91-24.al7
- Instances of the seventh-generation ECS Bare Metal Instance families (including ebmg7, ebmc7, and ebmr7) that use Debian 10 images.
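
To check whether an Alibaba Cloud Linux 2 instance runs a kernel earlier than 4.19.91-24.al7, a version comparison along the following lines may help. This is a minimal sketch: `version_lt` is a hypothetical helper, and the script assumes that GNU `sort -V` is available and that `uname -r` may append an `.x86_64` architecture suffix.

```shell
#!/bin/sh
# Returns 0 (true) if version $1 sorts strictly before version $2.
# Relies on `sort -V` (GNU coreutils version sort); this is an assumption
# about the environment, not something the article prescribes.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

fixed="4.19.91-24.al7"        # threshold taken from the list above
running="$(uname -r)"         # for example: 4.19.91-23.al7.x86_64
running="${running%.x86_64}"  # drop the architecture suffix if present (assumed format)

if version_lt "$running" "$fixed"; then
    echo "kernel $running is earlier than $fixed: the instance may be affected"
else
    echo "kernel $running is $fixed or later"
fi
```

The comparison applies only to Alibaba Cloud Linux 2 kernels. For the other images in the list, check the distribution release instead.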

Cause

When the kdump service of an affected instance is in the crashkernel phase, the pci_resource resource cannot be allocated to the vda block device. As a result, crash dump files cannot be generated. The root cause of this issue is that the instance type is incompatible with the operating system. A command output similar to that shown in the following figure is returned.

(Figure: Kdump)

Solution

You can use one of the following methods to resolve this issue:

Method 1: Upgrade the operating system kernel to version 5.10.
Method 2: Add the following patches to the operating system and rebuild a kernel:
```
Benjamin Herrenschmidt (1):
  PCI: Don't auto-realloc if we're preserving firmware config

Kelsey Skunberg (1):
  PCI: Make pci_hotplug_io_size, mem_size, and bus_size private

Logan Gunthorpe (1):
  PCI: Don't disable bridge BARs when assigning bus resources

Nicholas Johnson (2):
  PCI: Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters
  PCI: Avoid double hpmemsize MMIO window assignment
```
Note
Some kernel versions already contain some of the preceding patches, so you need to add only the patches that are missing. For example, kernel version 4.19 of Debian 10 already contains the first and third patches. For the kdump service to work as expected on that kernel, you must add the second, fourth, and fifth patches.
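
One way to find out which of the five patches a kernel source tree lacks is to compare the patch subjects with the commit subjects in the tree. The following is a minimal sketch, not an official procedure: `missing_patches` is a hypothetical helper, and it assumes that the kernel source is managed by Git and that backported commits keep the upstream subject lines unchanged.

```shell
#!/bin/sh
# Prints every required patch subject that does not appear in the commit
# subjects supplied on standard input (one subject per line).
missing_patches() {
    subjects="$(cat)"
    while IFS= read -r patch; do
        # -F: fixed string, -x: match the whole line, -q: no output
        printf '%s\n' "$subjects" | grep -Fxq -- "$patch" || printf '%s\n' "$patch"
    done <<'EOF'
PCI: Don't auto-realloc if we're preserving firmware config
PCI: Make pci_hotplug_io_size, mem_size, and bus_size private
PCI: Don't disable bridge BARs when assigning bus resources
PCI: Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters
PCI: Avoid double hpmemsize MMIO window assignment
EOF
}

# Usage, run inside the kernel source tree (assumed to be a Git checkout):
#   git log --format=%s -- drivers/pci | missing_patches
```

For a tree that matches the Debian 10 example above, the function would print the second, fourth, and fifth subjects. A distribution may rename a patch when it is backported, so treat an entry in the output as a hint to inspect the source rather than as proof that the patch is absent.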

You must also take note of the following items:

For Debian or Ubuntu operating system kernels, you must modify the crashkernel parameter to adjust the memory reserved for kdump. Recommended settings:

crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M

Note

For some operating system images such as Debian 10 images, you must increase the amount of reserved memory to 384 MB or 512 MB. Otherwise, an out-of-memory (OOM) error occurs when the kdump service is in the crashkernel phase, and crash dump files cannot be generated.
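
The range-based crashkernel syntax used above selects one reservation size based on the total system memory. The following sketch illustrates how a value such as `0M-2G:0M,2G-8G:192M,8G-:256M` is interpreted. The helpers `to_mib` and `crashkernel_for` are hypothetical names introduced for illustration. The sketch handles only the M and G suffixes and assumes that each range includes its start and excludes its end, and that nothing is reserved when no range matches.

```shell
#!/bin/sh
# Convert a size such as 192M or 2G to MiB (only M and G suffixes are handled).
to_mib() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Print the reservation that the range list in $1 selects for a machine
# with $2 MiB of memory. An empty range end means "no upper limit".
crashkernel_for() {
    spec="$1"; ram_mib="$2"
    old_ifs="$IFS"; IFS=','
    for entry in $spec; do
        IFS="$old_ifs"
        range="${entry%%:*}"; size="${entry#*:}"
        start="$(to_mib "${range%%-*}")" || return 1
        end="${range#*-}"
        if [ "$ram_mib" -ge "$start" ]; then
            if [ -z "$end" ] || [ "$ram_mib" -lt "$(to_mib "$end")" ]; then
                echo "$size"; return 0
            fi
        fi
    done
    IFS="$old_ifs"
    echo "0M"   # assumption: no matching range means no reservation
}

# Example: a machine with 4 GiB (4096 MiB) of memory
crashkernel_for '0M-2G:0M,2G-8G:192M,8G-:256M' 4096
```

With the recommended settings, the example call falls in the 2G-8G range and prints `192M`. For images that need a larger reservation, the sizes after each colon are the values to raise.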

The following procedure shows how to adjust the crashkernel parameter to reserve 256 MB of memory when the total amount of system memory is 384 MB or more:

Open the /etc/default/grub.d/kdump-tools.cfg file.
```
vim /etc/default/grub.d/kdump-tools.cfg
```
Press the I key to enter Insert mode and modify the following crashkernel parameter settings:
```
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=384M-:256M"
```
After you modify the parameter, press the Esc key, enter :wq, and then press the Enter key to save and close the file.
Update the configurations of GRand Unified Bootloader (GRUB).
```
update-grub
```
Restart the ECS instance for the configurations to take effect.
We recommend that you restart ECS instances during off-peak hours to reduce the impact on your business operations caused by instance restarts. For more information about how to restart an ECS instance, see Restart an instance.
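
After the restart, you may want to confirm that the new setting is active. The following is a minimal sketch that reads the kernel command line from `/proc/cmdline` and the crash kernel load state from `/sys/kernel/kexec_crash_loaded`. The helper name `crashkernel_from_cmdline` is hypothetical, and both files are read only if they exist on the system.

```shell
#!/bin/sh
# Print the crashkernel= value found in the kernel command line string $1,
# or return 1 if the parameter is absent.
crashkernel_from_cmdline() {
    for word in $1; do
        case "$word" in
            crashkernel=*) echo "${word#crashkernel=}"; return 0 ;;
        esac
    done
    return 1
}

# Report the value that the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    if value="$(crashkernel_from_cmdline "$(cat /proc/cmdline)")"; then
        echo "crashkernel=$value"
    else
        echo "crashkernel is not set on the running kernel"
    fi
fi

# A value of 1 means that a crash kernel is loaded and kdump can capture a dump.
if [ -r /sys/kernel/kexec_crash_loaded ]; then
    echo "kexec_crash_loaded=$(cat /sys/kernel/kexec_crash_loaded)"
fi
```

If the script reports the value that you configured and `kexec_crash_loaded=1`, the reservation took effect and the crash kernel is loaded. This check alone does not prove that a dump file can be written on an affected instance type.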