Alibaba Cloud Linux: What do I do if a memory leak occurs on the Intel SGX driver of an Alibaba Cloud Linux 2 ECS instance in specific cases?

Last Updated: Dec 08, 2023

This topic describes why a memory leak can occur, in specific cases, on the Intel Software Guard Extensions (SGX) driver of an Elastic Compute Service (ECS) instance that runs Alibaba Cloud Linux 2, and how to resolve the issue.

Problem description

In specific cases, a memory leak occurs on the Intel SGX driver of an Alibaba Cloud Linux 2 instance that meets the following conditions:

  • Image: Alibaba Cloud Linux 2.1903 LTS 64-bit

  • Kernel: kernel-4.19.91-23.al7 or earlier

  • Instance type: c7t, r7t, or g7t

The memory leak eventually exhausts system memory. Most of the memory is occupied by the Intel SGX test application process, and out-of-memory (OOM) information similar to the following is displayed:

[   71.938733] systemd-journal invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[   71.938735] systemd-journal cpuset=/ mems_allowed=0
[   71.938738] CPU: 0 PID: 415 Comm: systemd-journal Not tainted 4.19.91-23.al7.x86_64 #1
[   71.938738] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
[   71.938739] Call Trace:
[   71.938746]  dump_stack+0x66/0x8b
[   71.938749]  dump_global_header+0x12/0x10f
[   71.938750]  oom_kill_process+0x2cf/0x310
[   71.938752]  out_of_memory+0xf7/0x4c0
[   71.938754]  __alloc_pages_nodemask+0xf07/0xfd0
[   71.938757]  ? blk_flush_plug_list+0xd7/0x220
[   71.938759]  pagecache_get_page+0x8c/0x350
[   71.938761]  filemap_fault+0x37e/0x6e0
[   71.938764]  ext4_filemap_fault+0x2c/0x3b
[   71.938766]  __do_fault+0x38/0x170
[   71.938768]  do_fault+0x2eb/0x640
[   71.938769]  __handle_mm_fault+0x621/0xa20
[   71.938772]  ? apic_timer_interrupt+0xa/0x20
[   71.938774]  handle_mm_fault+0x106/0x1c0
[   71.938776]  __do_page_fault+0x1ba/0x480
[   71.938778]  do_page_fault+0x32/0x140
[   71.938780]  ? async_page_fault+0x8/0x30
[   71.938781]  async_page_fault+0x1e/0x30
[   71.938782] RIP: 0033:0x55a1ca49516f
[   71.938786] Code: Bad RIP value.
[   71.938787] RSP: 002b:00007ffcd58b22b0 EFLAGS: 00010246
[   71.938788] RAX: 0000000000000000 RBX: 000055a1cbcc4400 RCX: a1fcdcf819d7e1e5
[   71.938788] RDX: 00007f3d4d72a000 RSI: 000055a1cbcc2060 RDI: 000055a1cbcc4400
[   71.938789] RBP: a1fcdcf819d7e1e5 R08: 00007ffcd58b23b0 R09: 00007ffcd58b23a8
[   71.938790] R10: 000055a1ca49a935 R11: 00000000d1ba4319 R12: 000055a1cbcc4400
[   71.938790] R13: 0000000000000011 R14: 000055a1cbcc2060 R15: a1fcdcf819d7e1e5
[   71.938791] Task in / killed as a result of limit of host
[   71.938792] Mem-Info:
[   71.938795] active_anon:85 inactive_anon:410619 isolated_anon:0
 active_file:150 inactive_file:353 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 slab_reclaimable:6038 slab_unreclaimable:17336
 mapped:98 shmem:403568 pagetables:1793 bounce:0
 free:12881 free_pcp:440 free_cma:0
[   71.938797] Node 0 active_anon:340kB inactive_anon:1642476kB active_file:600kB inactive_file:1412kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:392kB dirty:0kB writeback:0kB shmem:1614272kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2048kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[   71.938798] Node 0 DMA free:7408kB min:392kB low:488kB high:584kB active_anon:0kB inactive_anon:8312kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:16kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   71.938800] lowmem_reserve[]: 0 1761 1761 1761 1761
[   71.938801] Node 0 DMA32 free:44116kB min:44660kB low:55824kB high:66988kB active_anon:340kB inactive_anon:1633492kB active_file:688kB inactive_file:1812kB unevictable:0kB writepending:0kB present:1914960kB managed:1826408kB  mlocked:0kB kernel_stack:2208kB pagetables:7156kB bounce:0kB free_pcp:1760kB local_pcp:1396kB free_cma:0kB
[   71.938804] lowmem_reserve[]: 0 0 0 0 0
[   71.938805] Node 0 DMA: 0*4kB 2*8kB (UM) 2*16kB (UE) 0*32kB 1*64kB (E) 3*128kB (UME) 3*256kB (UME) 2*512kB (ME) 3*1024kB (UME) 1*2048kB (E) 0*4096kB = 7408kB
[   71.938810] Node 0 DMA32: 233*4kB (UMEH) 158*8kB (UMEH) 177*16kB (UMEH) 79*32kB (UEH) 34*64kB (UMEH) 16*128kB (UMEH) 6*256kB (E) 3*512kB (UE) 3*1024kB (ME) 3*2048kB (UME) 5*4096kB (M) = 44548kB
[   71.938815] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[   71.938816] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   71.938816] 404127 total pagecache pages
[   71.938817] 0 pages in swap cache
[   71.938818] Swap cache stats: add 0, delete 0, find 0/0
[   71.938818] Free swap  = 0kB
[   71.938819] Total swap = 0kB
[   71.938819] 482739 pages RAM
[   71.938820] 0 pages HighMem/MovableOnly
[   71.938820] 22160 pages reserved
[   71.938820] 0 pages cma reserved
[   71.938821] 0 pages hwpoisoned
[   71.938821] Tasks state (memory values in pages):
[   71.938822] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[   71.938824] [    415]     0   415    11814       85   147456        0             0 systemd-journal
[   71.938826] [    439]     0   439    11430      228   118784        0         -1000 systemd-udevd
[   71.938827] [    550]     0   550    22654      218   212992        0             0 rngd
[   71.938828] [    554]    81   554    15051      155   167936        0          -900 dbus-daemon
[   71.938829] [    573]     0   573    48803      120   180224        0             0 gssproxy
[   71.938830] [    585]     0   585     6598       91    98304        0             0 systemd-logind
[   71.938831] [    587]     0   587     4456      115    61440        0             0 assist_daemon
[   71.938832] [    597]    32   597    17316      135   188416        0             0 rpcbind
[   71.938833] [    601]     0   601    31598      153   106496        0             0 crond
[   71.938834] [    606]   997   606    29454      129   143360        0             0 chronyd
[   71.938835] [    616]     0   616    27553       33    57344        0             0 agetty
[   71.938836] [    819]     0   819    25740      516   221184        0             0 dhclient
[   71.938837] [    887]     0   887   121900      708   430080        0             0 rsyslogd
[   71.938838] [    953]     0   953    10512      391   102400        0             0 AliYunDunUpdate
[   71.938839] [   1078]     0  1078    32317      732   274432        0             0 AliYunDun
[   71.938840] [   1235]     0  1235    28237      261   266240        0         -1000 sshd
[   71.938841] [   1283]     0  1283    39209      337   348160        0             0 sshd
[   71.938842] [   1292]     0  1292    29086      317    90112        0             0 bash
[   71.938843] [   1310]     0  1310    87597      530   311296        0          -900 abrt-dbus
[   71.938844] [   1397]     0  1397    39209      347   348160        0             0 sshd
[   71.938845] [   1399]     0  1399    29080      279    81920        0             0 bash
[   71.938846] [   1430]     0  1430    27028       25    77824        0             0 dmesg
[   71.938847] [   1431]     0  1431  8392985       92  3219456        0             0 app
[   71.938848] [   1432]     0  1432    39209      339   356352        0             0 sshd
[   71.938849] [   1434]     0  1434    29053      276    81920        0             0 bash
[   71.938850] [   1470]     0  1470     2146       23    57344        0             0 systemd-cgroups
[   71.938851] [   1471]     0  1471     2146       23    57344        0             0 systemd-cgroups
[   71.938852] [   1472]     0  1472     2146       23    53248        0             0 systemd-cgroups
[   71.938853] [   1473]     0  1473     2143       15    57344        0             0 systemd-cgroups
[   71.938854] Out of memory: Kill process 1431 (app) score 1 or sacrifice child
[   71.939026] Killed process 1431 (app) total-vm:33571940kB, anon-rss:320kB, file-rss:48kB, shmem-rss:0kB
[   71.942922] oom_reaper: reaped process 1431 (app), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
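
In the log above, most of the memory is reported as shared memory (shmem). To check for the same pattern on a running instance, something like the following sketch may help. It is not part of the official procedure: the shmem_percent helper is a hypothetical name introduced here for illustration, and the sketch assumes the standard /proc/meminfo layout and the dmesg utility.

```shell
#!/bin/sh
# Hypothetical helper (not from the original topic): print Shmem as an
# integer percentage of MemTotal, read from a meminfo-style file.
# Defaults to /proc/meminfo when no file is given.
shmem_percent() {
    awk '/^MemTotal:/ {total = $2}
         /^Shmem:/    {shmem = $2}
         END {if (total > 0) printf "%d\n", shmem * 100 / total}' "${1:-/proc/meminfo}"
}

# Report the current share of memory held as shared memory.
if [ -r /proc/meminfo ]; then
    shmem_percent
fi

# List OOM killer events recorded in the kernel log, if any.
# grep exits non-zero when nothing matches, hence the "|| true".
dmesg | grep -iE 'oom-killer|out of memory' || true
```

A shared-memory percentage that keeps growing while the SGX application runs, together with OOM killer entries, is consistent with the symptom described in this topic, but it does not prove the cause on its own.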

Cause

The sgx_encl_mm_release_deferred function in arch/x86/kernel/cpu/sgx/encl.c fails to properly handle the reference count of the encl structure. When a process that occupies Enclave Page Cache (EPC) memory calls fork(), the reference count of encl never returns to zero, so the EPC memory is leaked. After the physical EPC memory is exhausted, shared memory is used instead of EPC, and the non-EPC memory is then exhausted as well.

Solutions

To resolve the issue, perform the following operations:

  1. Log on to the instance and run the following command to view the kernel version. If the kernel version is later than 4.19.91-23.al7.x86_64, the following operations are not applicable.

    uname -r

    A command output similar to the following one is displayed.

     4.19.91-23.al7.x86_64
  2. Select one of the following solutions based on your kernel version:

    • To resolve the issue for a kernel version earlier than 4.19.91-23.al7.x86_64, perform the following steps:

      1. Update your kernel version to the latest version.

        yum update kernel
      2. Restart the instance for the update to take effect.

        reboot
      3. Check whether the issue persists.

        If the issue still occurs on the latest kernel version, install the kernel hotfix as described in the following solution.

    • To resolve the issue for kernel version 4.19.91-23.al7.x86_64, run the following installation command to install a kernel hotfix:

      yum install -y kernel-hotfix-5577959-`uname -r | awk -F"-" '{print $NF}'`
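
The decision in the steps above can be sketched as a small script. This is an illustrative sketch rather than an official tool: classify_kernel and hotfix_package are hypothetical helper names, and the comparison assumes a sort command that supports the -V (version sort) option, as GNU coreutils does.

```shell
#!/bin/sh
# Kernel release named in this topic as the one that needs the hotfix.
AFFECTED="4.19.91-23.al7.x86_64"

# Hypothetical helper: print which solution applies to a kernel release.
#   hotfix         -> install the kernel hotfix
#   update         -> run "yum update kernel", then reboot
#   not-applicable -> kernel is later than the affected version
classify_kernel() {
    current="$1"
    if [ "$current" = "$AFFECTED" ]; then
        echo "hotfix"
        return
    fi
    # sort -V orders version strings; the first line is the older release.
    oldest=$(printf '%s\n%s\n' "$current" "$AFFECTED" | sort -V | head -n 1)
    if [ "$oldest" = "$current" ]; then
        echo "update"
    else
        echo "not-applicable"
    fi
}

# Hypothetical helper: build the hotfix package name the same way the yum
# command above does, by keeping the text after the last "-" of the release.
hotfix_package() {
    suffix=$(printf '%s\n' "$1" | awk -F"-" '{print $NF}')
    echo "kernel-hotfix-5577959-$suffix"
}

# Classify the kernel of the current instance.
classify_kernel "$(uname -r)"
```

For example, hotfix_package 4.19.91-23.al7.x86_64 prints kernel-hotfix-5577959-23.al7.x86_64, which is the package name that the yum command above resolves to on the affected kernel.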