Alibaba Cloud Linux 2 ECS instances fail due to a buffer overflow after the filter feature in Ftrace is enabled

Last Updated: Aug 05, 2024

Problem description

A system failure occurs on Elastic Compute Service (ECS) instances that run Alibaba Cloud Linux 2 and have the following properties:

  • Image: Alibaba Cloud Linux 2.1903 LTS 64-bit

  • Kernel: kernel-4.19.91-23.al7 or earlier

The following call stack is printed when the system failure occurs:

[4017090.993301] general protection fault: 0000 [#1] SMP NOPTI
[4017090.999211] CPU: 69 PID: 24489 Comm: kubelet Kdump: loaded Tainted: G        W  OE     4.19.91-22.2.al7.x86_64 #1
[4017091.010213] Hardware name: Alibaba Alibaba Cloud ECS/Alibaba Cloud ECS, BIOS 1.0.PL.FC.P.028.01 09/22/2020
[4017091.020613] RIP: 0010:kmem_cache_alloc+0x90/0x190
[4017091.025821] Code: 03 05 34 be db 72 4d 8b 38 49 8b 40 10 4d 85 ff 0f 84 d5 00 00 00 48 85 c0 0f 84 cc 00 00 00 41 8b 46 20 48 8d 4a 01 4d 8b 06  8b 1c 07 4c 89 f8 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 41 8b 46
[4017091.045586] RSP: 0018:ffffb2216bebbe20 EFLAGS: 00010282
[4017091.051317] RAX: 0000000000000000 RBX: 00000000006000c0 RCX: 000000002477d008
[4017091.059192] RDX: 000000002477d007 RSI: 00000000006000c0 RDI: ffff95303f004600
[4017091.067065] RBP: 00000000006000c0 R08: 0000000000027170 R09: ffff950f20dadc01
[4017091.074937] R10: ffff950f20dadc00 R11: 0000000000000000 R12: ffffffff8d2d239b
[4017091.082807] R13: ffff95303f004600 R14: ffff95303f004600 R15: e58cbcefaf98e684
[4017091.090678] FS:  00007f0205ffb700(0000) GS:ffff953040740000(0000) knlGS:0000000000000000
[4017091.099512] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4017091.105761] CR2: 00007fd754130000 CR3: 0000005e9e840004 CR4: 00000000003606e0
[4017091.113637] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4017091.121525] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[4017091.129399] Call Trace:
[4017091.132356]  do_epoll_ctl+0x5cb/0xf90
[4017091.136520]  ? do_sys_openat2+0x222/0x260
[4017091.141031]  __se_sys_epoll_ctl+0x4a/0x60
[4017091.145545]  do_syscall_64+0x5b/0x1b0
[4017091.149713]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[4017091.155270] RIP: 0033:0x221fec8
[4017091.158910] Code: 10 c3 8b 7c 24 08 b8 23 01 00 00 0f 05 89 44 24 10 c3 8b 7c 24 08 8b 74 24 0c 8b 54 24 10 4c 8b 54 24 18 b8 e9 00 00 00 0f 05  44 24 20 c3 cc cc cc 8b 7c 24 08 48 8b 74 24 10 8b 54 24 18 44
[4017091.178672] RSP: 002b:000000c006d57e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000e9
[4017091.186978] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 000000000221fec8
[4017091.194857] RDX: 00000000000000b3 RSI: 0000000000000001 RDI: 0000000000000004
[4017091.202733] RBP: 000000c006d57e80 R08: 00007f042a5ea329 R09: 0000000000203000
[4017091.210602] R10: 000000c006d57e74 R11: 0000000000000206 R12: 00000000000000f2
[4017091.218473] R13: 0000000000000000 R14: 000000000684f29c R15: 0000000000000000
[4017091.226351] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache veth binfmt_misc tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag iptable_mangle sch_htb xt_set ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_bitmap_port ip_set dummy iptable_raw xt_CT xt_comment xt_mark xt_conntrack ipt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat_ipv4 nf_nat bpfilter overlay intel_rapl_msr intel_rapl_common iosf_mbi isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul i2c_algo_bit crc32_pclmul ttm ghash_clmulni_intel pcbc drm_kms_helper syscopyarea sysfillrect aesni_intel sysimgblt fb_sys_fops crypto_simd ip_vs_rr
[4017091.300367]  cryptd glue_helper drm lpc_ich ip_vs_sh i2c_i801 ipmi_si pcspkr mei_me ipmi_devintf ip_vs_wrr mousedev ioatdma i2c_core mfd_core mei ipmi_msghandler dca wmi pcc_cpufreq acpi_pad acpi_power_meter ip_vs nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc auth_rpcgss sunrpc ip_tables crc32c_intel ahci libahci libata [last unloaded: AliSecNetFlt64]
[4017091.336060] ---[ end trace b5d2ca91a9c736db ]---
[4017091.459188] RIP: 0010:kmem_cache_alloc+0x90/0x190

Cause

After the filter feature in Ftrace is enabled on an Alibaba Cloud Linux 2 instance, the kernel may calculate the length of struct ring_buffer_event incorrectly, which can cause a buffer overflow. The overflowing data overwrites adjacent memory, and this memory corruption causes a kernel failure such as the general protection fault shown in the preceding call stack.
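
To check whether trace event filters are currently set on an instance, you can list the event filter files that contain an expression. The following shell sketch does this under assumptions that this article does not state: that "the filter feature in Ftrace" refers to trace event filters, that tracefs is mounted at /sys/kernel/tracing or /sys/kernel/debug/tracing, and that a filter file with no filter set reads none. The function name list_event_filters is hypothetical. Run the sketch as root, because tracefs is normally readable only by root.

```shell
#!/bin/sh
# Sketch: list trace events that currently have a filter expression set.
# Assumptions (not from the article): tracefs uses the layout
# <root>/events/<subsystem>/[<event>/]filter, and a filter file with no
# filter set reads "none". Uses "local", which bash and dash support.

# Hypothetical helper: print "<subsystem>[/<event>]: <expression>" for each
# filter that is set under the tracefs root given as $1.
list_event_filters() {
  local root="$1" f name cond
  [ -d "$root/events" ] || return 1
  for f in "$root"/events/*/filter "$root"/events/*/*/filter; do
    [ -f "$f" ] || continue              # skip unmatched glob patterns
    cond=$(cat -- "$f" 2>/dev/null)      # empty if the file is unreadable
    case $cond in '' | none) continue ;; esac
    name=${f#"$root/events/"}            # e.g. sched/sched_switch/filter
    name=${name%/filter}                 # e.g. sched/sched_switch
    printf '%s: %s\n' "$name" "$cond"
  done
  return 0
}

# Try the usual tracefs mount points and report what is found.
found=
for root in /sys/kernel/tracing /sys/kernel/debug/tracing; do
  if out=$(list_event_filters "$root"); then
    found=$root
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
    else
      echo "No event filters are set under $root"
    fi
    break
  fi
done
[ -n "$found" ] || echo "tracefs not found; is it mounted?" >&2
```

An empty result only describes the event filters that are set when the script runs; it does not show whether filters were enabled earlier.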

Solution

Note

Take note of the following items:

  • Before you perform high-risk operations such as modifying instance configurations or data, we recommend that you check the disaster recovery and fault tolerance capabilities of the instances to ensure data security.

  • Before you modify the configurations or data of instances such as ECS instances and ApsaraDB RDS instances, we recommend that you create snapshots or enable RDS log backup.

  • If you have granted permissions to users or submitted sensitive information, such as logon accounts and passwords, in the Alibaba Cloud Management Console, we recommend that you change that information promptly.

Perform the following steps to resolve the problem:

  1. Log on to the ECS instance. For more information, see Connection method overview.

  2. Run the following command to check the kernel version of the instance:

    uname -r

    If the output is similar to the following example, one of the solutions in the next step applies to your kernel version:

    4.19.91-21.al7.x86_64

  3. Select one of the following solutions based on your system kernel version:

    • For kernel versions earlier than 4.19.91-19.1.al7.x86_64, you can perform the following steps:

      1. Run the following command to update the kernel of the operating system to the latest version:

        yum update kernel

      2. Run the following command to restart the instance so that the new kernel version takes effect:

        reboot

      3. If the problem persists, install a hot patch for the kernel by running the yum install command provided for kernel versions from 4.19.91-19.1.al7.x86_64 to 4.19.91-23.al7.x86_64.

    • For kernel versions from 4.19.91-19.1.al7.x86_64 to 4.19.91-23.al7.x86_64, run the following command to install a hot patch for the kernel:

      yum install -y kernel-hotfix-5692820-`uname -r | awk -F"-" '{print $NF}'`
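
The version ranges in step 3 can be expressed as a small shell sketch that reads uname -r, decides which solution applies, and prints the matching command instead of running it. This is not an official Alibaba Cloud tool: the function names ver_lt, classify_kernel, and hotfix_package are hypothetical, and the script assumes that kernel release strings follow the 4.19.91-<release>.al7.x86_64 pattern shown above and that any .al7 kernel later than 4.19.91-23.al7 is outside the affected range described in this article.

```shell
#!/bin/sh
# Sketch: map a kernel release to the solution described in step 3.
# Assumptions (not from the article): releases look like
# 4.19.91-<release>.al7.x86_64, and .al7 kernels later than 4.19.91-23.al7
# are outside the affected range. Prints commands; it does not run them.
# Uses "local", which bash and dash support.

# Hypothetical helper: succeed if dotted numeric version $1 < $2.
ver_lt() {
  local a="$1" b="$2" x y
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*} y=${b%%.*}                # current numeric field
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    [ "${x:-0}" -lt "${y:-0}" ] && return 0
    [ "${x:-0}" -gt "${y:-0}" ] && return 1
  done
  return 1                               # versions are equal
}

# Hypothetical helper: print update, hotfix, not-affected, or unknown.
classify_kernel() {
  local kver="$1" upstream rel v
  case $kver in
    *-*.al7*) ;;
    *) echo unknown; return ;;
  esac
  upstream=${kver%%-*}                   # e.g. 4.19.91
  rel=${kver#*-}                         # e.g. 21.al7.x86_64
  rel=${rel%%.al7*}                      # e.g. 21, 19.1 or 22.2
  for v in "$upstream" "$rel"; do
    case $v in
      '' | .* | *. | *..* | *[!0-9.]*) echo unknown; return ;;
    esac
  done
  if ver_lt "$upstream" 4.19.91; then
    echo update                          # earlier than 4.19.91-19.1.al7
  elif ver_lt 4.19.91 "$upstream"; then
    echo not-affected
  elif ver_lt "$rel" 19.1; then
    echo update                          # earlier than 4.19.91-19.1.al7
  elif ver_lt 23 "$rel"; then
    echo not-affected                    # later than 4.19.91-23.al7
  else
    echo hotfix                          # 4.19.91-19.1.al7 to 4.19.91-23.al7
  fi
}

# Hypothetical helper; same result as: uname -r | awk -F"-" '{print $NF}'
hotfix_package() {
  echo "kernel-hotfix-5692820-${1##*-}"
}

kver=${1:-$(uname -r)}
case $(classify_kernel "$kver") in
  update) echo "$kver: run 'yum update kernel', then 'reboot'" ;;
  hotfix) echo "$kver: run 'yum install -y $(hotfix_package "$kver")'" ;;
  *) echo "$kver: not covered by the version ranges in this article" ;;
esac
```

For example, passing 4.19.91-21.al7.x86_64 as the first argument prints a yum install command for kernel-hotfix-5692820-21.al7.x86_64, which is the same package name that the awk expression in step 3 produces. The sketch compares the numeric fields of the release itself so that releases such as 19.1 and 22.2 are ordered field by field.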

Applicable scope

  • ECS