This topic describes issues that may occur when you use Shared Memory Communications (SMC) and how to resolve them. This topic applies to Alibaba Cloud Linux 3.
SMC does not provide application performance improvements over TCP
Problem description
When you use SMC instead of TCP to accelerate the TCP connection of an application, the application performance is not improved.
Cause and solution
The SMC connection that is established for the application falls back to TCP. In this case, you cannot use Remote Direct Memory Access (RDMA) to accelerate network communication. For information about how to troubleshoot and resolve the fallback issue, see the SMC falls back to TCP and RDMA cannot be used to accelerate communications section in this topic.
The network communication overhead of the application accounts for a small portion of the overall overhead. For example, the application is CPU-intensive and slightly dependent on network communication.
Compared with TCP packets, RDMA packets require additional header space to accommodate RDMA-related information. Given the same amount of available bandwidth, the actual bandwidth that can be achieved for transmission of RDMA packets is slightly lower than that for transmission of TCP packets. To alleviate this issue, use the Jumbo Frames feature. For information about the feature, see Jumbo Frames.
SMC is incompatible with the network communication model of the application. Example scenarios:
Scenarios in which short-lived connections are frequently established and closed. The establishment of SMC connections involves slow-path operations such as creating and requesting RDMA resources. For applications that predominantly use short-lived connections, SMC offers no performance improvements over TCP.
Scenarios in which resources are limited. The resources required for SMC communications are subject to the memory and eRDMA interface (ERI) specifications of an Elastic Compute Service (ECS) instance. If the resources are insufficient, SMC may fall back to TCP. For more information, see Enable and configure SMC.
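The header-overhead cause listed above can be made concrete with a little arithmetic. The following is a minimal sketch, not a measurement: the per-packet header sizes and the jumbo MTU value are hypothetical placeholders chosen only to show why a larger MTU narrows the gap between TCP and RDMA payload efficiency.

```python
def payload_efficiency(mtu: int, header_bytes: int) -> float:
    """Fraction of each MTU-sized packet that carries application payload."""
    if not 0 <= header_bytes < mtu:
        raise ValueError("header_bytes must be in [0, mtu)")
    return (mtu - header_bytes) / mtu

# Hypothetical per-packet header sizes, for illustration only.
TCP_HEADERS = 40    # IPv4 (20 bytes) + TCP (20 bytes), no options
RDMA_HEADERS = 60   # assumed placeholder, NOT a measured eRDMA value

# 1500 is the common default MTU; 8500 is an assumed jumbo-frame MTU.
for mtu in (1500, 8500):
    tcp = payload_efficiency(mtu, TCP_HEADERS)
    rdma = payload_efficiency(mtu, RDMA_HEADERS)
    print(f"MTU {mtu}: TCP {tcp:.4f}, RDMA {rdma:.4f}, gap {tcp - rdma:.4f}")
```

Because the extra header bytes are a fixed cost per packet, the efficiency gap shrinks roughly in proportion to the MTU, which is why Jumbo Frames help.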
Communication fails after SMC is enabled
Problem description
After you enable Shared Memory Communications over Remote Direct Memory Access (SMC-R) for an ECS instance that runs Alibaba Cloud Linux 3, specific addresses such as the addresses of specific Internet-facing services can be pinged but cannot be accessed. After you disable SMC-R, the issue is resolved.
Cause
Some servers do not strictly comply with the TCP specification. When such a server processes TCP options that it does not implement, it may replay the options to the sender instead of ignoring them. As a result, the local end incorrectly concludes that an SMC-incapable peer server supports SMC.
A TCP implementation MUST (MUST-6) ignore without error any TCP Option it does not implement, assuming that the option has a length field. For more information, see RFC 9293.
If the TCP option that is used to indicate support for SMC is replayed, the local end misidentifies the peer server as being SMC-capable. In this case, a handshake error occurs. As a result, requests such as cURL requests fail, but pings over the Internet Control Message Protocol (ICMP) succeed.
You can perform a communication link check to diagnose the issue.
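As one possible way to check a link, you can inspect the TCP options of a captured SYN-ACK from the peer. The sketch below assumes the SMC indication defined in RFC 7609: an experimental TCP option (kind 254, length 6) that carries the 4-byte identifier 0xE2D4C3D9. The function names are hypothetical, and the input is assumed to be the raw options area of a TCP header that you extracted yourself.

```python
TCPOPT_EOL = 0     # end of option list
TCPOPT_NOP = 1     # no-operation padding
TCPOPT_EXP = 254   # experimental option kind
SMC_MAGIC = bytes.fromhex("e2d4c3d9")  # "SMCR" in EBCDIC, per RFC 7609 (assumption stated above)

def iter_tcp_options(options: bytes):
    """Yield (kind, data) pairs from the raw options area of a TCP header."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            return
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            return  # truncated option, stop parsing
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            return  # malformed length, stop parsing
        yield kind, options[i + 2:i + length]
        i += length

def advertises_smc(options: bytes) -> bool:
    """Return True if the options contain the experimental SMC indication."""
    return any(kind == TCPOPT_EXP and data[:4] == SMC_MAGIC
               for kind, data in iter_tcp_options(options))
```

If `advertises_smc` returns True for the SYN-ACK of a peer that is known not to support SMC, the option was most likely replayed somewhere on the path.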
Solution
The TCP option replay issue cannot be fixed on the local end because the TCP options are replayed by intermediate network nodes or peers. We recommend that you use BPF policies to control SMC negotiation and that you do not use SMC on the affected links.
SMC fails to be enabled after the smc_run command is run
Problem description
After you run the smc_run ./foo command to enable SMC for an application, you run the smcr l command to explore SMC-R link groups but the command output indicates that no SMC-R link groups are created. Then, you run the smcss -a command to query SMC sockets, but the command output indicates that no SMC connections exist or that an SMC connection falls back to TCP on one side. For more information about the commands, see Enable and configure SMC.
Cause
The smc_run command transparently enables SMC by using the LD_PRELOAD variable to load a dynamic link library from smc-tools before other libraries. The preloaded library intercepts socket(2) calls and modifies the address families and protocols of the sockets. If an application is not dynamically linked, the preload mechanism does not take effect, and you cannot run the smc_run command to transparently enable SMC for the application.
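The rewrite that a preloaded library applies to socket(2) arguments can be sketched as a pure function. This is an illustration in Python rather than the actual C shim. The constants are assumptions: AF_SMC = 43 matches the "Registered protocol family 43" kernel message, and the SMCPROTO_SMC and SMCPROTO_SMC6 values are taken from the Linux SMC headers as commonly documented. The function name is hypothetical.

```python
import socket

AF_SMC = 43        # assumed value; matches "Registered protocol family 43" in dmesg
SMCPROTO_SMC = 0   # assumed value: SMC over IPv4
SMCPROTO_SMC6 = 1  # assumed value: SMC over IPv6

def rewrite_socket_args(family: int, type_: int, proto: int):
    """Return the (family, type, proto) that a preload shim might pass to socket(2)."""
    is_inet = family in (socket.AF_INET, socket.AF_INET6)
    # Mask off flag bits such as SOCK_NONBLOCK or SOCK_CLOEXEC before comparing.
    is_stream = (type_ & 0xF) == socket.SOCK_STREAM
    is_tcp = proto in (0, socket.IPPROTO_TCP)
    if is_inet and is_stream and is_tcp:
        new_proto = SMCPROTO_SMC if family == socket.AF_INET else SMCPROTO_SMC6
        return AF_SMC, type_, new_proto
    # Anything that is not a TCP stream socket is left untouched.
    return family, type_, proto
```

Only TCP stream sockets are redirected. UDP sockets and other families pass through unchanged, which is why the application needs no code changes.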
Solution
Run the sysctl net.smc.tcp2smc command that is described in Enable and configure SMC to enable SMC.
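To decide whether smc_run can work for a binary or whether the sysctl approach is needed, you can check whether the binary is dynamically linked. The following is a minimal sketch under stated assumptions: it handles only 64-bit little-endian ELF files and treats the presence of a PT_INTERP program header as the indicator of dynamic linking. The function name is hypothetical.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader

def is_dynamically_linked(elf: bytes) -> bool:
    """Return True if a 64-bit little-endian ELF image has a PT_INTERP segment."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if elf[4] != 2 or elf[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled in this sketch")
    # ELF64 header fields: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    e_phoff, = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(e_phnum):
        # p_type is the first 4-byte field of each program header.
        p_type, = struct.unpack_from("<I", elf, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

For example, `is_dynamically_linked(open("./foo", "rb").read())` inspects the application from the problem description. A dynamically linked binary that issues system calls without going through the C library is still unaffected by preloading, so a True result is necessary but not sufficient.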
Specific ports become unusable after SMC is enabled
Problem description
After SMC is loaded, 16 ports within the port range of 65500 to 65515 become unusable. After you make a bind(2) call for the ports, EADDRINUSE is returned.
Cause
SMC-R and eRDMA are used together. SMC modules use ports 65500 to 65515 in the net namespace in which ERIs reside to establish out-of-band (OOB) connections. You can run the dmesg command and view the following information in the command output:
smc: smc: load SMC module with reserve_mode
NET: Registered protocol family 43
smc: netns <netns ID> reserved ports [65500 ~ 65515] for eRDMA OOB
smc: adding ib device erdma_0 with port count 1
smc: ib device erdma_0 port 1 has pnetid
If SMC modules fail to occupy the ports, the SMC modules cannot use eRDMA devices.
Solution
Unload the SMC modules to release the ports. For information about how to unload SMC modules, see the Use SMC in Alibaba Cloud ECS section in the "Use SMC" topic.
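To confirm whether the ports are currently reserved, or released after the modules are unloaded, you can try to bind each port. The following is a minimal sketch. The function name is hypothetical, and the probe binds IPv4 TCP sockets in the current net namespace only.

```python
import errno
import socket

def probe_ports(ports, host="0.0.0.0"):
    """Try to bind each TCP port; map port -> None if free, else the errno name."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, port))
                results[port] = None
            except OSError as exc:
                # For example "EADDRINUSE" when the port is already occupied.
                results[port] = errno.errorcode.get(exc.errno, str(exc.errno))
    return results

if __name__ == "__main__":
    # Ports 65500 to 65515 are the range named in the dmesg output.
    for port, err in probe_ports(range(65500, 65516)).items():
        print(port, err or "free")
```

Each probe socket is closed immediately, so the check does not hold the ports itself.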
SMC falls back to TCP when IPv6 addresses are used
Problem description
After you enable SMC for applications that use IPv6 addresses, the smcss command output indicates that SMC falls back to TCP with the cause code 0x03030000 or 0x09990000.
Cause
SMC falls back to TCP because Alibaba Cloud eRDMA devices and SMC do not support IPv6 addresses.
Solutions
Before you enable SMC for new connections, use one of the following methods:
Method 1: Disable IPv6 addresses.
Run the following commands to disable IPv6 addresses for all network interfaces:
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
Run the following command to disable IPv6 addresses for a specific network interface. Replace <NetInName> with the name of the network interface.
sudo sysctl -w net.ipv6.conf.<NetInName>.disable_ipv6=1
Method 2: Use IPv4-mapped IPv6 addresses for kernel version 5.10.134-17.3 and later.
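For reference, an IPv4-mapped IPv6 address embeds an IPv4 address in the form ::ffff:a.b.c.d. The following sketch uses the Python standard library to build and inspect such an address. The sample IPv4 address is reused from the smcss sample output in this topic, and the helper name is hypothetical.

```python
import ipaddress

def to_ipv4_mapped(v4: str) -> ipaddress.IPv6Address:
    """Return the IPv4-mapped IPv6 form (::ffff:a.b.c.d) of an IPv4 address."""
    v4_int = int(ipaddress.IPv4Address(v4))
    # The mapped form is 80 zero bits, 16 one bits, then the 32-bit IPv4 address.
    return ipaddress.IPv6Address((0xFFFF << 32) | v4_int)

mapped = to_ipv4_mapped("192.168.99.21")
print(mapped)              # textual form of the mapped address
print(mapped.ipv4_mapped)  # the embedded IPv4 address
```

The `ipv4_mapped` attribute is None for an IPv6 address that is not IPv4-mapped, which gives a quick way to tell the two kinds apart.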
SMC performance is lower than TCP performance in extreme PPS conditions
Problem description
When the network load reaches the maximum packets per second (PPS) rate defined by an ECS instance type, applications that use SMC with eRDMA have lower queries per second (QPS) performance than those that use TCP.
TCP traffic: Run the sar -n DEV 1 command to check the number of packets received and sent per second (rxpck/s and txpck/s) on a network interface and see whether the network load has reached the PPS limit.
SMC eRDMA traffic: Run the eadm stat -d <ibdev_name> -l command to check the number of packets transmitted per second on an eRDMA network interface and see whether the network load has reached the PPS limit. For information about eadm, see Use eadm to diagnose and troubleshoot issues in eRDMA.
Cause
RDMA generates more network packets for the same number of network requests than TCP. As a result, the PPS limit of an ECS instance is reached faster, preventing the QPS performance of applications from improving.
Solution
This issue occurs only under extreme network load in which the PPS limit is reached, such as during benchmark stress tests. In actual scenarios, network traffic rarely reaches the PPS limit. If it does, we recommend that you do not use SMC.
SMC falls back to TCP and RDMA cannot be used to accelerate communications
Problem description
After you enable SMC to replace TCP in an application, you run the smcss -a command and the command output indicates that the SMC connection automatically falls back to TCP.
Cause
If an exception causes an SMC connection to fall back to TCP during SMC connection establishment, the SMC connection can still be used for communication, but the application that uses the SMC connection cannot leverage the performance benefits of RDMA. When an SMC-to-TCP fallback occurs, a cause code is returned. You can identify the cause of the fallback based on the code.
Solution
Run the smcss -a command to obtain the cause code of the SMC-to-TCP fallback.
Sample command output:
State   UID    Inode    Local Address         Peer Address         Intf  Mode
ACTIVE  00000  0156721  192.168.99.21:60188   192.168.99.22:8090   0000  TCP 0x03010000
ACTIVE  00000  1202539  172.16.4.189:44780    172.16.4.190:1811    0000  SMCR
In the first entry, TCP in the Intf Mode column indicates that the SMC connection fell back to TCP. The cause code is 0x03010000. In the second entry, SMCR in the Intf Mode column indicates that the SMC-R connection is established. If two cause codes (example: 0x05000000 and 0x03030001) are displayed in the Intf Mode column, the first code indicates the cause for the local host and the second code indicates the cause for the peer host. In most cases, SMC-to-TCP fallbacks are caused by the peer host.
Identify the causes of SMC-to-TCP fallbacks based on the cause codes and resolve the fallbacks.
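When many connections are involved, the fallback entries can be extracted from the smcss -a output with a small script. The following is a minimal sketch that assumes the column layout of the sample output in this topic (state, UID, inode, local address, peer address, interface, mode, then optional cause codes). The function name is hypothetical, and output from other smc-tools versions may need different parsing.

```python
import re

CODE_RE = re.compile(r"0x[0-9a-fA-F]{8}")

def find_fallbacks(smcss_output: str):
    """Return (local, peer, codes) for each connection whose mode field is TCP."""
    fallbacks = []
    for line in smcss_output.splitlines():
        fields = line.split()
        # Assumed layout: the seventh whitespace-separated field is the mode.
        if len(fields) < 7 or fields[6] != "TCP":
            continue
        # One code, or two codes (local host first, peer host second).
        codes = CODE_RE.findall(" ".join(fields[7:]))
        fallbacks.append((fields[3], fields[4], codes))
    return fallbacks

sample = """State UID Inode Local Address Peer Address Intf Mode
ACTIVE 00000 0156721 192.168.99.21:60188 192.168.99.22:8090 0000 TCP 0x03010000
ACTIVE 00000 1202539 172.16.4.189:44780 172.16.4.190:1811 0000 SMCR
"""
for local, peer, codes in find_fallbacks(sample):
    print(local, peer, codes)
```

On a live system, the input string could come from `subprocess.run(["smcss", "-a"], capture_output=True, text=True).stdout`.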
After you enable SMC, data collected by common network O&M tools does not meet expectations
Problem description
After you enable SMC for an ECS instance that runs Alibaba Cloud Linux 3, common network analysis tools such as tcpdump and Wireshark and network monitoring tools such as the Socket Statistics (ss) and netstat utilities collect network traffic data that does not meet expectations or cannot collect expected traffic data.
Cause
SMC-R is a communication protocol that is based on RDMA. Currently, common network O&M tools analyze or monitor only TCP traffic and cannot identify RDMA packets. As a result, the data displayed in network O&M tools does not match actual network data.
Solution
Use RDMA-related O&M tools to analyze or monitor data. For more information, see Monitor and check eRDMA.
The SMC module that is loaded on a GPU-accelerated or Super Computing Cluster (SCC) instance is unusable
Problem description
The SMC module that is loaded on a GPU-accelerated or SCC instance is unusable.
Cause
Mellanox OpenFabrics Enterprise Distribution (OFED) drivers are installed on GPU-accelerated and SCC instances. The SMC module in the OFED stack is automatically loaded but cannot work. After Mellanox OFED drivers are installed, the symbols of RDMA-related functions change. As a result, the SMC module that is included in the kernel fails to be loaded, and an Unknown symbol error is reported.
Solution
The SMC module in Alibaba Cloud Linux 3 cannot be used on GPU-accelerated or SCC instances.
After you enable SMC, some SOL_SOCKET or SOL_TCP level options for setsockopt and getsockopt calls do not work as expected
Problem description
After you enable SMC to replace TCP in applications, some SOL_SOCKET or SOL_TCP level options that were used for the TCP connections cannot be configured by making setsockopt or getsockopt calls or do not work as expected after configuration.
Cause
After you replace the TCP protocol stack with the SMC protocol stack, shared buffers are used to transfer data over SMC links. The protocol stack design and data transfer methods of SMC differ greatly from those of TCP. As a result, some SOL_SOCKET or SOL_TCP level options are inapplicable.
Solution
Take note of the SOL_SOCKET or SOL_TCP level options that are supported or not supported by SMC in Alibaba Cloud Linux 3. The following tables describe the support of SMC for SOL_SOCKET or SOL_TCP level options.
The tables use the values Y, M, and N.
Y: The option is supported by SMC, can be configured and obtained, and works as expected.
M: The option is not supported by SMC. The option can be configured and obtained, but does not work as expected due to the differences in design between SMC and TCP.
N: The option is not supported by SMC and cannot be configured or obtained. A fallback to TCP occurs with the cause code 0x03060000 or 0x03010001.