Alibaba Cloud Linux:SMC issues

Last Updated:Feb 28, 2026

This topic covers common Shared Memory Communications (SMC) issues on Alibaba Cloud Linux 3 and how to resolve them.

SMC does not improve application performance over TCP

SMC connections may fall back to TCP without Remote Direct Memory Access (RDMA) acceleration. Check whether a fallback occurred by running smcss -a. If the output shows TCP in the mode column, see SMC falls back to TCP for the cause code lookup.

Other common reasons SMC may not improve performance:

  • The application is CPU-bound. If the application spends most of its time on computation rather than network I/O, switching to RDMA has little effect.

  • RDMA header overhead. RDMA packets carry additional headers compared to TCP. For the same available bandwidth, RDMA achieves slightly lower throughput. To mitigate this, enable Jumbo Frames.

  • Short-lived connections. SMC connection setup involves slow-path operations such as creating and requesting RDMA resources. For workloads dominated by short-lived connections, this overhead outweighs the benefit.

  • Insufficient resources. SMC requires memory and elastic RDMA interface (ERI) resources tied to the Elastic Compute Service (ECS) instance specifications. When resources run out, SMC falls back to TCP. For resource requirements, see Enable and configure SMC.

Communication fails after SMC is enabled

After enabling SMC-R on an Alibaba Cloud Linux 3 ECS instance, some Internet-facing addresses can be pinged but not accessed. cURL requests fail while ICMP pings succeed. Disabling SMC-R resolves the issue.

This happens when remote servers replay TCP options instead of ignoring unsupported ones (as required by RFC 9293). If a server echoes back the SMC TCP option, the local end incorrectly identifies the peer as SMC-capable. The resulting handshake mismatch causes connection failures.

Run a communication link check to diagnose the problem.

To fix this, configure SMC negotiation control based on BPF policies and disable SMC on the problematic link.

SMC is not enabled after running smc_run

Running smc_run ./foo does not create SMC connections. The smcr l output shows no link groups, and smcss -a shows either no SMC connections or a one-sided TCP fallback.

smc_run uses LD_PRELOAD to inject smc-tools libraries that intercept socket(2) calls and modify socket families and protocols. This mechanism does not work for statically linked applications.

For statically linked applications, enable SMC at the kernel level instead:

sudo sysctl -w net.smc.tcp2smc=1

For details, see Enable and configure SMC.

Ports 65500-65515 become unusable after SMC is enabled

After loading SMC modules, bind(2) calls on ports 65500 through 65515 return EADDRINUSE.

SMC-R with elastic Remote Direct Memory Access (eRDMA) reserves these 16 ports for out-of-band (OOB) connections in the net namespace where ERIs reside. Run dmesg to verify:

smc: smc: load SMC module with reserve_mode
NET: Registered protocol family 43
smc: netns <netns ID> reserved ports [65500 ~ 65515] for eRDMA OOB
smc: adding ib device erdma_0 with port count 1
smc: ib device erdma_0 port 1 has pnetid

If these ports are already in use when SMC modules load, the modules cannot use eRDMA devices.
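
Before loading the SMC modules, it can help to confirm that nothing is already listening on the reserved range. The following is a minimal sketch, not an official tool: the function name smc_port_conflicts is hypothetical, and it only inspects TCP listeners as reported by ss -Htln (local address in the fourth field), so sockets in other states are not covered.

```shell
#!/bin/sh
# Print listening TCP ports that fall inside 65500-65515, the range that
# SMC-R with eRDMA reserves for OOB connections.
# Reads `ss -Htln` output on stdin; field 4 is the local address:port.
smc_port_conflicts() {
    awk '{
        # Take the text after the last colon so that [::]:65501 also works.
        n = split($4, parts, ":")
        port = parts[n] + 0
        if (port >= 65500 && port <= 65515) print port
    }' | sort -nu
}
```

Run it as ss -Htln | smc_port_conflicts. Empty output means no TCP listener occupies the range. After the modules load, dmesg | grep 'reserved ports' should show the reservation line listed above.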

To release the ports, unload the SMC modules. See Use SMC in Alibaba Cloud ECS.

SMC falls back to TCP with IPv6 addresses

After enabling SMC for applications that use IPv6, smcss shows TCP fallback with cause code 0x03030000 or 0x09990000.

Alibaba Cloud eRDMA devices and SMC do not support IPv6. Use one of these workarounds before enabling SMC for new connections:

Disable IPv6 for all interfaces:

sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1

Disable IPv6 for a specific interface:

sudo sysctl -w net.ipv6.conf.<NetInName>.disable_ipv6=1

Replace <NetInName> with the interface name.

Use IPv4-mapped IPv6 addresses on kernel version 5.10.134-17.3 and later.
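
To confirm that either sysctl workaround took effect, one option is to read the disable_ipv6 values back from /proc/sys. This is a rough sketch under assumptions: the function name ipv6_enabled_ifaces is hypothetical, and the optional root argument exists only so the function can be tried against a copy of the tree.

```shell
#!/bin/sh
# List entries under net/ipv6/conf whose disable_ipv6 value is still 0,
# that is, interfaces on which IPv6 remains enabled.
# $1 is an optional sysctl tree root; it defaults to /proc/sys.
ipv6_enabled_ifaces() {
    root=${1:-/proc/sys}
    for f in "$root"/net/ipv6/conf/*/disable_ipv6; do
        # Skip the unexpanded glob when the directory does not exist.
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "0" ]; then
            basename "$(dirname "$f")"
        fi
    done
}
```

The output also includes the special entries all and default when they are set to 0. Note that values set with sysctl -w do not persist across reboots.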

SMC performance is lower than TCP at the PPS limit

When the network load reaches the maximum packets per second (PPS) rate defined by the ECS instance type, applications using SMC with eRDMA show lower queries per second (QPS) than those using TCP.

First check whether the instance has actually reached the PPS limit of its instance type.

RDMA generates more network packets per request than TCP, so the PPS limit is reached sooner. This only occurs under extreme network load such as benchmark stress tests. In production, traffic rarely hits the PPS ceiling.

If the PPS limit is the bottleneck, do not use SMC for that workload.

SMC falls back to TCP and RDMA cannot accelerate communications

After enabling SMC, smcss -a shows that the connection fell back to TCP. The connection still works, but without RDMA acceleration.

Identify the cause code

Run smcss -a to get the fallback cause code:

State          UID   Inode   Local Address           Peer Address            Intf Mode
ACTIVE         00000 0156721 192.168.99.21:60188     192.168.99.22:8090      0000 TCP 0x03010000
ACTIVE         00000 1202539 172.16.4.189:44780      172.16.4.190:1811       0000 SMCR

In the first entry, TCP in the mode column indicates a fallback. The cause code is 0x03010000. In the second entry, SMCR indicates a successful SMC-R connection.

If two cause codes appear (for example, 0x05000000 and 0x03030001), the first is from the local host and the second from the peer. Most fallbacks are caused by the peer.
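
When many connections are listed, filtering the output for fallbacks saves time. The sketch below is illustrative only: the function name smc_fallbacks is hypothetical, and it assumes that a second (peer) cause code, when present, is joined to the first with a slash, as in 0x05000000/0x03030001.

```shell
#!/bin/sh
# Print one line per connection that fell back to TCP:
#   <local address> <peer address> <local cause code> <peer cause code or ->
# Reads `smcss -a` output on stdin so it also works on saved output.
smc_fallbacks() {
    awk '
        # A fallback row has TCP as the second-to-last field, followed by
        # the cause code(s). The header and SMCR rows do not match.
        NF >= 2 && $(NF-1) == "TCP" {
            n = split($NF, codes, "/")
            peer = (n > 1) ? codes[2] : "-"
            print $4, $5, codes[1], peer
        }
    '
}
```

Run it as smcss -a | smc_fallbacks. For the sample output above, it prints only the first entry with cause code 0x03010000.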

Cause code reference

| Cause code | Description | Cause and solution |
| --- | --- | --- |
| 0x01010000 | Insufficient memory for SMC data structures. | Host memory cannot accommodate SMC connection resources. Free memory by stopping unnecessary processes. |
| 0x02010000 | Connection Layer Control (CLC) or Link Layer Control (LLC) message timeout during TCP handshake. | Cause 1: RDMA network interface cards (RNICs) or RDMA links failed, causing LLC message timeouts. Make sure the RNICs work correctly. Cause 2: Ethernet NICs or TCP/IP networks failed, causing CLC message timeouts. Make sure the Ethernet NICs work correctly. |
| 0x02020000 | LLC timeout for RDMA link establishment. | Not in use. |
| 0x03000000 | Cannot obtain correct IP addresses. | The IP address for the CLC socket cannot be retrieved when creating an SMC connection proposal. Make sure the TCP-based CLC connection and corresponding devices work correctly. |
| 0x03010000 | Peer does not support SMC. | The peer does not include SMC TCP option flags in SYN or SYN-ACK packets during the TCP handshake. Check whether the protocol stacks on both sides are replaced with SMC. Run smcss to check the SMC connection status. |
| 0x03020000 | IPsec is not supported by SMC. | Do not use IPsec with SMC connections. |
| 0x03030000 | No SMC-D or SMC-R devices available. | Cause 1: No RDMA devices available. Run smcr d to check. For eRDMA, make sure ERIs are configured in the ECS console and drivers are installed. Cause 2: With multiple NICs, the NIC used for SMC-R is not eRDMA-capable. Run ibv_devinfo to get eRDMA device GUIDs and ip addr to get NIC MAC addresses, then compare them. Cause 3: If RDMA devices run in exclusive mode, SMC only searches the net namespace where RDMA sockets are created. Run rdma system to check. If netns exclusive appears, move the device with rdma dev set <device> netns <namespace>. For RDMA over Converged Ethernet (RoCE) or Internet Wide Area RDMA Protocol (iWARP) devices, also move the Ethernet devices. Cause 4: A client attempted to replace an AF_INET6 connection with SMC. eRDMA uses SMCv2 which does not support AF_INET6. Switch the application to AF_INET. |
| 0x03030001 | No SMC-D devices available. | Alibaba Cloud does not provide SMC-D devices. Contact technical support. |
| 0x03030002 | No SMC-R devices available. | Cause 1: The selected RDMA device became invalid during connection setup. Run smcr d to check. For eRDMA, make sure ERIs are configured and drivers are installed. Cause 2: With multiple NICs, the NIC used for SMC-R is not eRDMA-capable. Run ibv_devinfo and ip addr to compare device GUIDs and NIC MAC addresses. Cause 3: RDMA devices in exclusive mode. Run rdma system to check. Move the device to the correct net namespace with rdma dev set. |
| 0x03030003 | SMC-D devices do not support ISMv2. | Alibaba Cloud does not provide SMC-D devices. Contact technical support. |
| 0x03030004 | Peer does not support SMCv2 extension. | The local host uses SMCv2, but the peer does not support it. eRDMA and RoCE v2 use SMCv2. Make sure both hosts use the same type of RDMA device. Run smcr d to check device types. In the output, the Type column shows values such as RoCE_Express, RoCE_Express2, or 0x107f (Alibaba Cloud eRDMA). |
| 0x03030005 | Peer does not support SMC-D v2 extension. | Alibaba Cloud does not provide SMC-D devices. Contact technical support. |
| 0x03030006 | Peer does not have a system enterprise ID (SEID). | Not in use. |
| 0x03030007 | No SMC-D v2 devices available. | Alibaba Cloud does not provide SMC-D devices. Contact technical support. |
| 0x03030008 | Peer does not have a user-defined enterprise ID (UEID). | SMCv2 requires a UEID. Run smcr ueid {show \| add \| del} to configure the same UEID on both hosts. |
| 0x03030009 | SMC version negotiation failed. | The negotiated SMC version changed during the CLC handshake. Make sure both hosts run the same operating system distribution. |
| 0x0303000a | Max Connections per LGR negotiation failed. | SMCv2.1 negotiates the maximum number of connections per link group (LGR). A fallback occurs if the negotiated value is zero or exceeds the local maximum. Make sure both hosts run the same operating system distribution. |
| 0x0303000b | Max Links per LGR negotiation failed. | SMCv2.1 negotiates the maximum number of links per LGR. A fallback occurs if the negotiated value is zero or exceeds the local maximum. Make sure both hosts run the same operating system distribution. |
| 0x0303000c | SMC vendor feature negotiation failed. | The vendor feature changed during the CLC handshake. Make sure both hosts run the same operating system distribution. On kernel 5.10.134-015, do not change sysctl net.smc.vendor_exp_options during connection establishment. On kernel 5.10.134-016 or later, do not change sysctl net.smc.experiment_vendor_options. |
| 0x03040000 | Local and peer hosts use different SMC device modes (SMC-D vs. SMC-R). | Alibaba Cloud does not provide SMC-D devices. Contact technical support. |
| 0x03050000 | Peer has a remote memory buffer element (RMBE) eyecatcher. | Not in use for Linux. |
| 0x03060000 | MSG_FASTOPEN is not supported by SMC. | Remove the MSG_FASTOPEN flag when creating SMC sockets. |
| 0x03070000 | Different IP prefix or subnet between hosts. | RoCEv1 devices use SMCv1, which only supports same-subnet communication. Make sure both hosts are in the same subnet. eRDMA devices use SMCv2 and are not subject to this restriction. |
| 0x03080000 | Cannot obtain the VLAN ID. | SMC cannot retrieve the VLAN ID for the device during connection setup. Make sure the TCP connection and Ethernet devices work correctly. |
| 0x03090000 | Cannot register the VLAN ID with an Internal Shared Memory (ISM) device. | Alibaba Cloud does not provide SMC-D devices. Contact technical support. |
| 0x030a0000 | No SMC-R RDMA links in the link group. | The connection could not get a link from its LGR. Run smcr d to check RNIC status. For eRDMA, make sure ERIs are configured and drivers are installed. |
| 0x030b0000 | Client cannot find the server's RDMA links. | The client searches for RDMA links using the queue pair number (QPN), global identifier (GID), and MAC address provided by the server. If no matching links are found, the connection cannot use RDMA. Run smcr d to check RNIC status. For eRDMA, make sure ERIs are configured and drivers are installed. |
| 0x030c0000 | SMC version negotiation failed. | The negotiated SMC version is unacceptable. Make sure both hosts run the same operating system distribution. |
| 0x030d0000 | Maximum number of SMC-D DMBs reached. | Alibaba Cloud does not provide SMC-D devices. Contact technical support. |
| 0x030e0000 | SMC-R V2 connection failed. | During SMCv2 connection setup, the client cannot find routing information for the peer IP addresses. Make sure the TCP connection, Ethernet NICs, IP configuration, and routing configuration are correct and reachable. |
| 0x030f0000 | Indirect connection flag mismatch. | During SMCv2 connection setup, the client detects a mismatch between the server's gateway flag and local routing information. Make sure the TCP connection, Ethernet NICs, IP configuration, and routing configuration are correct and use the same network path. |
| 0x04000000 | Server and client do not use the same link group. | The server reuses an LGR, but the client wants to create a new one. Run smcr d to check RNIC status. For eRDMA, make sure ERIs are configured and drivers are installed. |
| 0x05000000 | Peer rejected the handshake. | The peer sent a CLC message rejecting the RDMA connection. Run smcss, find the connection by its quintuple, and check the peer's cause code. |
| 0x09990000 | RDMA resource creation failed. | RDMA resources could not be created or initialized. Use an RDMA monitoring tool to check error statistics. For eRDMA, run eadm stat. |
| 0x09990001 | RDMA RToken failed. | This is an SMC protocol stack issue. Contact technical support. |
| 0x09990002 | RDMA queue pair (QP) initialization failed. | SMC calls InfiniBand (IB) verbs interfaces to initialize the QP, and an error occurred. Run smcr d to check SMC-R devices. For eRDMA, make sure ERIs are configured and drivers are installed. |
| 0x09990003 | Memory region (MR) registration failed. | The number or size of MRs exceeds the RDMA device specifications. Run ibv_devinfo -d <device> -v \| grep max_mr to check limits. max_mr is the maximum number of MRs, and max_mr_size is the maximum size. This usually means the MR count limit was reached. Reduce the number of SMC connections. |
| 0x09990004 | SMC flow control credit initialization failed. | RNICs or RDMA links failed, preventing credit messages from being sent. Make sure RNICs work correctly. |
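
For scripted triage, a few of the codes from the reference above can be mapped to their descriptions with a small lookup. This is only a sketch: the function name smc_cause_hint is hypothetical, and the case statement covers a handful of codes rather than the full table.

```shell
#!/bin/sh
# Map a fallback cause code to the short description from the reference
# table. Only a subset of codes is included; extend the case as needed.
smc_cause_hint() {
    # Normalize hex digits to lowercase so 0x0303000A matches 0x0303000a.
    code=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case "$code" in
        0x03010000) echo "Peer does not support SMC" ;;
        0x03030000) echo "No SMC-D or SMC-R devices available" ;;
        0x03030002) echo "No SMC-R devices available" ;;
        0x0303000a) echo "Max Connections per LGR negotiation failed" ;;
        0x03060000) echo "MSG_FASTOPEN is not supported by SMC" ;;
        0x05000000) echo "Peer rejected the handshake; check the peer's cause code" ;;
        0x09990000) echo "RDMA resource creation failed" ;;
        *) echo "Not in this sketch; see the cause code reference"; return 1 ;;
    esac
}
```

For example, smc_cause_hint 0x03010000 prints the description for the code shown in the sample smcss output. The function returns a non-zero status for codes it does not know.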

Network O&M tools show unexpected data after SMC is enabled

After enabling SMC, tools such as tcpdump, Wireshark, ss, and netstat show network traffic data that does not match expectations, or do not capture expected traffic.

SMC-R is based on RDMA. These tools analyze only TCP traffic and cannot identify RDMA packets.

Use RDMA-specific tools instead. See Monitor and check eRDMA.

SMC module is unusable on GPU-accelerated or SCC instances

The SMC module loaded on a GPU-accelerated or Super Computing Cluster (SCC) instance does not function.

These instance types have Mellanox OpenFabrics Enterprise Distribution (OFED) drivers installed. The OFED stack includes its own SMC module that auto-loads but does not work. After installing Mellanox OFED drivers, RDMA function symbols change, and the kernel SMC module fails to load with an Unknown symbol error.

SMC cannot be used on GPU-accelerated or SCC instances with Alibaba Cloud Linux 3.

Some SOL_SOCKET and SOL_TCP options do not work after enabling SMC

After replacing TCP with SMC, some setsockopt and getsockopt options at the SOL_SOCKET or SOL_TCP level cannot be configured, cannot be retrieved, or do not work as expected.

SMC uses shared buffers and a different protocol stack design than TCP. Some socket options are incompatible with this design.

The support levels are:

  • Y: Fully supported. The option can be set, retrieved, and works as expected.

  • M: Configurable but may not work as expected due to design differences between SMC and TCP.

  • N: Not supported. Using the option causes a TCP fallback with cause code 0x03060000 or 0x03010001.

SOL_SOCKET options

| Option | Support |
| --- | --- |
| SO_DEBUG | Y |
| SO_REUSEADDR | Y |
| SO_TYPE | Y |
| SO_ERROR | Y |
| SO_DONTROUTE | M |
| SO_BROADCAST | M |
| SO_SNDBUF | Y |
| SO_RCVBUF | Y |
| SO_SNDBUFFORCE | Y |
| SO_RCVBUFFORCE | Y |
| SO_KEEPALIVE | M |
| SO_OOBINLINE | M |
| SO_NO_CHECK | M |
| SO_PRIORITY | M |
| SO_LINGER | Y |
| SO_BSDCOMPAT | M |
| SO_REUSEPORT | Y |
| SO_PASSCRED | M |
| SO_PEERCRED | M |
| SO_RCVLOWAT | M |
| SO_SNDLOWAT | M |
| SO_RCVTIMEO_OLD | Y |
| SO_SNDTIMEO_OLD | Y |
| SO_SECURITY_AUTHENTICATION | N |
| SO_SECURITY_ENCRYPTION_TRANSPORT | N |
| SO_SECURITY_ENCRYPTION_NETWORK | N |
| SO_BINDTODEVICE | N |
| SO_ATTACH_FILTER | M |
| SO_DETACH_FILTER | M |
| SO_PEERNAME | Y |
| SO_ACCEPTCONN | M |
| SO_PEERSEC | N |
| SO_PASSSEC | M |
| SO_MARK | M |
| SO_PROTOCOL | Y |
| SO_DOMAIN | Y |
| SO_RXQ_OVFL | M |
| SO_WIFI_STATUS | M |
| SO_PEEK_OFF | N |
| SO_NOFCS | M |
| SO_LOCK_FILTER | Y |
| SO_SELECT_ERR_QUEUE | M |
| SO_BUSY_POLL | M |
| SO_MAX_PACING_RATE | M |
| SO_BPF_EXTENSIONS | Y |
| SO_INCOMING_CPU | M |
| SO_ATTACH_BPF | M |
| SO_ATTACH_REUSEPORT_CBPF | M |
| SO_ATTACH_REUSEPORT_EBPF | N |
| SO_CNX_ADVICE | M |
| SO_MEMINFO | M |
| SO_INCOMING_NAPI_ID | M |
| SO_COOKIE | Y |
| SO_PEERGROUPS | N |
| SO_ZEROCOPY | N |
| SO_TXTIME | M |
| SO_BINDTOIFINDEX | N |
| SO_TIMESTAMP_OLD | M |
| SO_TIMESTAMPNS_OLD | M |
| SO_TIMESTAMPING_OLD | M |
| SO_TIMESTAMP_NEW | M |
| SO_TIMESTAMPNS_NEW | M |
| SO_TIMESTAMPING_NEW | M |
| SO_RCVTIMEO_NEW | Y |
| SO_SNDTIMEO_NEW | Y |
| SO_DETACH_REUSEPORT_BPF | N |

SOL_TCP options

| Option | Support |
| --- | --- |
| TCP_NODELAY | Y |
| TCP_MAXSEG | M |
| TCP_CORK | Y |
| TCP_KEEPIDLE | M |
| TCP_KEEPINTVL | M |
| TCP_KEEPCNT | M |
| TCP_SYNCNT | M |
| TCP_LINGER2 | M |
| TCP_DEFER_ACCEPT | Y |
| TCP_WINDOW_CLAMP | M |
| TCP_INFO | M |
| TCP_QUICKACK | M |
| TCP_CONGESTION | M |
| TCP_MD5SIG | Y |
| TCP_THIN_LINEAR_TIMEOUTS | M |
| TCP_THIN_DUPACK | M |
| TCP_USER_TIMEOUT | M |
| TCP_REPAIR | M |
| TCP_REPAIR_QUEUE | M |
| TCP_QUEUE_SEQ | M |
| TCP_REPAIR_OPTIONS | M |
| TCP_FASTOPEN | N |
| TCP_TIMESTAMP | M |
| TCP_NOTSENT_LOWAT | M |
| TCP_CC_INFO | M |
| TCP_SAVE_SYN | Y |
| TCP_SAVED_SYN | Y |
| TCP_REPAIR_WINDOW | M |
| TCP_FASTOPEN_CONNECT | N |
| TCP_ULP | N |
| TCP_MD5SIG_EXT | Y |
| TCP_FASTOPEN_KEY | N |
| TCP_FASTOPEN_NO_COOKIE | N |
| TCP_ZEROCOPY_RECEIVE | N |
| TCP_CM_INQ/TCP_INQ | M |
| TCP_TX_DELAY | M |
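
One way to find out whether an application uses any option marked N is to trace its socket option calls and scan the trace for those names. The sketch below assumes a trace captured with strace, which prints option names symbolically; the function name smc_unsupported_opts is hypothetical, and the list is copied from the N entries in the two tables above.

```shell
#!/bin/sh
# Report socket options marked N (not supported by SMC) that appear in
# setsockopt/getsockopt calls of an strace log read from stdin.
smc_unsupported_opts() {
    n_opts='SO_SECURITY_AUTHENTICATION|SO_SECURITY_ENCRYPTION_TRANSPORT'
    n_opts="$n_opts|SO_SECURITY_ENCRYPTION_NETWORK|SO_BINDTODEVICE|SO_PEERSEC"
    n_opts="$n_opts|SO_PEEK_OFF|SO_ATTACH_REUSEPORT_EBPF|SO_PEERGROUPS"
    n_opts="$n_opts|SO_ZEROCOPY|SO_BINDTOIFINDEX|SO_DETACH_REUSEPORT_BPF"
    n_opts="$n_opts|TCP_FASTOPEN|TCP_FASTOPEN_CONNECT|TCP_ULP"
    n_opts="$n_opts|TCP_FASTOPEN_KEY|TCP_FASTOPEN_NO_COOKIE|TCP_ZEROCOPY_RECEIVE"
    # Keep only socket option calls, then extract whole-word option names.
    grep -E '(set|get)sockopt\(' | grep -owE "$n_opts" | sort -u
}
```

A trace can be collected with strace -f -e trace=setsockopt,getsockopt -o trace.log ./foo and then checked with smc_unsupported_opts < trace.log. Empty output means none of the listed options were seen in that run.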