The tortuous experience of IoT devices sending MQTT requests to the cloud

To understand the full path that sensor data takes from an IoT device to the cloud, let's first look at the layered network model:

The figure above shows the most common protocols at each layer:

• Application layer: the application packages the data according to the relevant rules (the protocol) and hands it to the transport layer

*MQTT: Message Queuing Telemetry Transport

*CoAP: Constrained Application Protocol

*HTTP: Hypertext Transfer Protocol

*FTP: File Transfer Protocol

*SMTP: Simple Mail Transfer Protocol

• Transport layer: splits the data handed down by the application layer into segments. To ensure that the receiver gets the data complete and in order, each segment is numbered before being handed to the network layer

*TCP: Transmission Control Protocol

*UDP: User Datagram Protocol

• Network layer: delivers the packets handed down by the transport layer to the destination host

*IP: Internet Protocol

*ICMP: Internet Control Message Protocol

*IGMP: Internet Group Management Protocol

• Link layer: sends and receives frames on behalf of the network layer

*ARP: Address Resolution Protocol

*RARP: Reverse Address Resolution Protocol

Encapsulation and demultiplexing

As data passes down through each layer, the corresponding protocol wraps it in its own header; this is encapsulation. At the receiving end, the headers are removed layer by layer; this is demultiplexing.

On the sending side, the application encapsulates the business data collected by the device as an MQTT message. Each layer treats the message from the layer above as its own data block, prepends its own header (which includes a protocol identifier), and passes the result down as that layer's message.

On the receiving side, data flows from the bottom up. Each layer strips its header and uses the protocol identifier to determine which upper-layer protocol should receive the rest, until the application layer finally processes the data.

The application on the device encapsulates the collected business data into MQTT messages, which are written in order, as a byte stream, into an established long-lived TCP connection. TCP splits the stream into small blocks and prepends a TCP header to each one, forming a TCP segment. The segment is passed to the network layer, which follows the IP protocol: IP puts the segment into an IP datagram, fills in the IP header, and sends the datagram through the link layer.

On the cloud side, the data arrives at the link layer and is passed up to the network layer for parsing, then to the transport layer, which checks the order and integrity of the segments. The MQTT message is then extracted from the data blocks and handed to the application layer for processing. In this way the headers are stripped layer by layer until the business data collected by the IoT device is restored.
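The round trip above can be modelled in a few lines. This is a deliberately simplified sketch: headers are Python dicts rather than real binary formats, and the source port 50000 and cloud address 203.0.113.10 are hypothetical. It does use the real identifiers a receiver demultiplexes on: Ethernet type 0x0800 (IPv4), IP protocol number 6 (TCP) and TCP port 1883 (the registered port for unencrypted MQTT).

```python
from dataclasses import dataclass
from typing import Union

ETHERTYPE_IPV4 = 0x0800   # Ethernet "type" value meaning the payload is IPv4
IPPROTO_TCP = 6           # IP "protocol" value meaning the payload is TCP
MQTT_PORT = 1883          # registered port for unencrypted MQTT


@dataclass
class Packet:
    header: dict                       # simplified: real headers are binary, not dicts
    payload: Union["Packet", bytes]


def encapsulate(mqtt_message: bytes) -> Packet:
    """Sending side: each layer wraps the unit from the layer above in its own header."""
    segment = Packet({"src_port": 50000, "dst_port": MQTT_PORT}, mqtt_message)        # TCP
    datagram = Packet({"protocol": IPPROTO_TCP, "dst_ip": "203.0.113.10"}, segment)   # IP
    return Packet({"ethertype": ETHERTYPE_IPV4}, datagram)                             # Ethernet


def demultiplex(frame: Packet) -> bytes:
    """Receiving side: strip headers bottom-up, using each identifier to pick the next layer."""
    if frame.header["ethertype"] != ETHERTYPE_IPV4:
        raise ValueError("frame does not carry IPv4")
    datagram = frame.payload
    if datagram.header["protocol"] != IPPROTO_TCP:
        raise ValueError("datagram does not carry TCP")
    segment = datagram.payload
    if segment.header["dst_port"] != MQTT_PORT:
        raise ValueError("segment is not addressed to the MQTT port")
    return segment.payload
```

`demultiplex(encapsulate(msg))` returns `msg` unchanged, which is exactly the "split the headers layer by layer to restore the data" step described above.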

Application layer MQTT protocol

MQTT is a publish/subscribe messaging protocol for client/server architectures. It is designed to be lightweight, open, simple, standardized and easy to implement. These characteristics make it a good choice for many scenarios, especially constrained environments such as machine-to-machine (M2M) communication and the Internet of Things (IoT).

The MQTT packet format is very simple: it consists of a fixed header, a variable header and a payload.

Fixed header: contains the control packet type, the flags and the remaining length.

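The upper 4 bits (bit7 to bit4) of the first byte of an MQTT packet give the control packet type. Four bits can encode 16 values: in MQTT 3.1.1, values 1 to 14 are used (CONNECT, PUBLISH, SUBSCRIBE, PINGREQ, DISCONNECT and so on), while 0 and 15 are reserved (MQTT 5.0 assigns 15 to AUTH).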

The lower 4 bits (bit3 to bit0) of the first byte hold flags specific to the packet type. Only PUBLISH uses them to carry information (DUP, QoS and RETAIN); for the other packet types the flags have fixed, reserved values.

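Remaining length: starting at the second byte, this field gives the number of bytes remaining in the packet (variable header plus payload). It occupies at least one and at most four bytes. The lower 7 bits of each byte carry the value (0 to 127), and the highest bit is a continuation flag indicating that another length byte follows, so the largest possible remaining length is 268,435,455 bytes.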
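A minimal sketch of reading the fixed header as described above, assuming MQTT 3.1.1 packet type numbering. The function names are illustrative, not part of any MQTT library.

```python
from typing import Tuple

# MQTT 3.1.1 control packet types (0 and 15 are reserved)
PACKET_TYPES = {
    1: "CONNECT", 2: "CONNACK", 3: "PUBLISH", 4: "PUBACK",
    5: "PUBREC", 6: "PUBREL", 7: "PUBCOMP", 8: "SUBSCRIBE",
    9: "SUBACK", 10: "UNSUBSCRIBE", 11: "UNSUBACK",
    12: "PINGREQ", 13: "PINGRESP", 14: "DISCONNECT",
}


def decode_first_byte(b: int) -> dict:
    """Split the first fixed-header byte into packet type (bits 7-4) and flags (bits 3-0)."""
    packet_type, flags = b >> 4, b & 0x0F
    info = {"type": PACKET_TYPES.get(packet_type, "RESERVED"), "flags": flags}
    if packet_type == 3:  # PUBLISH: DUP (bit 3), QoS (bits 2-1), RETAIN (bit 0)
        info.update(dup=bool(flags & 0x08), qos=(flags >> 1) & 0x03,
                    retain=bool(flags & 0x01))
    return info


def encode_remaining_length(n: int) -> bytes:
    """Encode n in 1-4 bytes: 7 value bits per byte, high bit set if more bytes follow."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        digit, n = n % 128, n // 128
        out.append((digit | 0x80) if n else digit)
        if not n:
            return bytes(out)


def decode_remaining_length(data: bytes, start: int = 1) -> Tuple[int, int]:
    """Return (value, bytes used) for the remaining-length field beginning at data[start]."""
    value, multiplier = 0, 1
    for i in range(4):
        byte = data[start + i]
        value += (byte & 0x7F) * multiplier
        if not (byte & 0x80):          # continuation bit clear: this was the last length byte
            return value, i + 1
        multiplier *= 128
    raise ValueError("malformed remaining length: more than 4 bytes")
```

For example, a remaining length of 321 is encoded as `0xC1 0x02` (65 + 2 × 128), and the first byte `0x32` decodes as a PUBLISH with QoS 1.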

Variable header: present in some MQTT packet types; its content depends on the packet type.

Payload: present in some MQTT packet types; in a PUBLISH packet it carries the actual business data of the message.
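Putting the three parts together, the sketch below builds a complete MQTT 3.1.1 PUBLISH packet at QoS 0, where the variable header is just the topic name (QoS 1 and 2 would add a 2-byte packet identifier). The topic `sensors/t1` and the JSON payload are hypothetical examples.

```python
import struct


def encode_remaining_length(n: int) -> bytes:
    # Variable-length encoding: 7 value bits per byte, high bit set if more bytes follow
    out = bytearray()
    while True:
        digit, n = n % 128, n // 128
        out.append((digit | 0x80) if n else digit)
        if not n:
            return bytes(out)


def build_publish_qos0(topic: str, payload: bytes, retain: bool = False) -> bytes:
    """Build an MQTT 3.1.1 PUBLISH packet with QoS 0 (so no packet identifier)."""
    topic_bytes = topic.encode("utf-8")
    # Variable header: topic name as a 2-byte big-endian length followed by UTF-8 bytes
    variable_header = struct.pack("!H", len(topic_bytes)) + topic_bytes
    body = variable_header + payload                     # payload = the business data
    first_byte = 0x30 | (0x01 if retain else 0x00)       # type 3 (PUBLISH), DUP 0, QoS 0
    return bytes([first_byte]) + encode_remaining_length(len(body)) + body


packet = build_publish_qos0("sensors/t1", b'{"t":21.5}')
```

Here the body is 2 + 10 + 10 = 22 bytes, so the packet starts with `0x30 0x16` and is 24 bytes long in total.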

Transport layer TCP protocol

MQTT connections run over TCP connections, which provide reliable data transfer. When an MQTT message is sent, its bytes are written in order, as a stream, into an open TCP connection. TCP divides the received data into small blocks, each carried in a TCP segment.

Because the data is sent in small blocks, reliable delivery comes down to four questions: are the segments complete, are they in order, are they undamaged, and are they free of duplicates? TCP handles these with checksums, sequence numbers, acknowledgements, retransmission, connection management and the sliding window.

TCP is the Transmission Control Protocol, and its control over transmission relies largely on six flags in the header. They describe the state of the segment and tell the sender and receiver what to do with the data; a flag takes effect when its bit is set to 1. For example, when URG is 1, the urgent pointer field in the header is valid.

• URG: the urgent pointer is valid

• ACK: the acknowledgement number is valid

• PSH: the receiver should pass this segment to the application layer as soon as possible

• RST: reset the connection

• SYN: synchronize sequence numbers, used to initiate a connection

• FIN: the sender has finished sending data
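The six flags occupy the low bits of one byte in the TCP header, so checking them is plain bit masking. A small sketch (helper names are illustrative); newer headers also define ECE and CWR in the two high bits of that byte, which this sketch ignores.

```python
TCP_FLAGS = (  # masks within the TCP header's flags byte
    ("URG", 0x20), ("ACK", 0x10), ("PSH", 0x08),
    ("RST", 0x04), ("SYN", 0x02), ("FIN", 0x01),
)


def decode_tcp_flags(flags_byte: int) -> list:
    """Names of the flags whose bit is 1."""
    return [name for name, mask in TCP_FLAGS if flags_byte & mask]


def encode_tcp_flags(*names: str) -> int:
    """Combine flag names into the corresponding flags byte."""
    masks = dict(TCP_FLAGS)
    value = 0
    for name in names:
        value |= masks[name]
    return value
```

For instance, `decode_tcp_flags(0x12)` gives `["ACK", "SYN"]`, the second segment of the three-way handshake.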

Source port and destination port: identify the sending and receiving applications. A TCP connection is uniquely identified by four values: source IP, source port, destination IP and destination port; the two IP addresses are carried in the IP header.

Header length (data offset): the length of the TCP header, counted in 32-bit words. It also tells the receiver where the data begins within the segment.

Sequence number: the sequence number of the first data byte in this segment. Every byte of data has a sequence number; the initial value is chosen when the connection is established (it does not have to be 0), numbers increase by 1 per byte up to 2^32 - 1, and then wrap around to 0.

Acknowledgement number: valid when the ACK flag is 1. After receiving a segment, the receiver sends back an acknowledgement number equal to the sequence number of the last byte received plus 1, that is, the next sequence number it expects.

Checksum: computed by the sender and verified by the receiver. If the checksum does not match, the segment may have been damaged in transit and is discarded without being acknowledged. The receiver's acknowledgements keep carrying the sequence number it is still waiting for (duplicate acknowledgements), and the sender retransmits the missing segment when its retransmission timer expires or, with fast retransmit, after receiving three duplicate acknowledgements.

Urgent pointer: valid when the URG flag is 1, indicating that the sender is transmitting urgent data. The urgent pointer is a positive offset that is added to the segment's sequence number to give the sequence number of the last byte of urgent data. For example, if a segment's sequence number is 1000 and its urgent pointer is 1000, the urgent data runs from byte 1000 to byte 2000. How the urgent data is handled is up to the receiver.

Window size: controls the throughput of the TCP data stream. Note that it is the amount of data the sender of this header is willing to accept from the other side. For example, if a segment's header carries a window size of 1000, its sender can accept at most 1000 more bytes from its peer. The value reflects the sender's receive buffer space and directly affects TCP performance.

Header flag PSH: if the receiver should pass all data to the receiving process immediately, the sender sets PSH to 1. "All data" means the data in this segment plus any data received earlier and not yet delivered. On receiving a segment with PSH set to 1, the receiver hands the data to the receiving process immediately instead of waiting for more to arrive.

Reset flag RST: a value of 1 indicates that the connection is in an abnormal state. The receiver aborts the connection and notifies the application, which must establish a new connection if it wants to continue.
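The fields above sit at fixed positions in the 20-byte base header, so a parser is mostly one `struct.unpack` call. The sketch below also shows how the window field limits a sender. `usable_window` is a hypothetical helper that considers only the receiver's advertised window and ignores congestion control; the modulo handles the 2^32 sequence-number wraparound described above.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Split a raw TCP segment into its base-header fields and payload."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    (src_port, dst_port, seq, ack, offset_byte, flags,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_byte >> 4) * 4          # data offset counts 32-bit words
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": flags,                          # URG=0x20 ACK=0x10 PSH=0x08 RST=0x04 SYN=0x02 FIN=0x01
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
        "payload": segment[header_len:],         # data starts right after the header (and options)
    }


def usable_window(last_ack: int, advertised_window: int, next_seq: int) -> int:
    """Bytes the sender may still send before filling the peer's advertised window.

    last_ack: highest acknowledgement number received from the peer
    next_seq: sequence number of the next byte the sender would send
    """
    in_flight = (next_seq - last_ack) % 2**32    # sent but not yet acknowledged
    return max(advertised_window - in_flight, 0)
```

For example, with `last_ack=5000`, a window of 1000 and `next_seq=5600`, 600 bytes are in flight and only 400 more may be sent.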

SYN: used to establish a connection, in what is known as the TCP three-way handshake.

1: To open a connection, the client sends the server a TCP segment with SYN set to 1 and an initial sequence number, signalling a connection request.

2: If the server accepts the connection, it replies with a segment in which both SYN and ACK are 1. The segment carries the server's own initial sequence number and an acknowledgement number equal to the client's initial sequence number + 1, indicating that the request has been accepted.

3: On receiving this segment, the client sends the server an acknowledgement with ACK set to 1 and an acknowledgement number equal to the server's initial sequence number + 1. When the server receives it, the connection is established.

The acknowledgement segment in step 3 may already carry data.
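The sequence and acknowledgement arithmetic of the handshake can be traced with a small sketch. Segments are represented as plain dicts, and the function name and example initial sequence numbers are illustrative.

```python
MOD = 2**32  # sequence numbers wrap around at 2^32


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the (direction, segment) pairs exchanged when opening a connection."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
               "ack": (client_isn + 1) % MOD}        # the SYN consumes one sequence number
    ack = {"flags": {"ACK"}, "seq": (client_isn + 1) % MOD,
           "ack": (server_isn + 1) % MOD}            # acknowledges the server's SYN
    return [("client -> server", syn),
            ("server -> client", syn_ack),
            ("client -> server", ack)]
```

With `three_way_handshake(1000, 5000)`, the server acknowledges 1001 and the client acknowledges 5001; the client's first data byte will also carry sequence number 1001.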

Connection termination flag FIN: used to close the connection. When one end has finished sending data, it sends a segment with FIN set. Because TCP carries data in both directions (client to server and server to client) and each direction is closed separately with its own FIN and acknowledgement, closing takes four exchanges, known as the four-way handshake (or "four waves").

1: When the client application has finished sending data, the client's TCP sends a FIN, telling the server that no more data will come in this direction.

2: The server replies with an ACK whose acknowledgement number is the FIN's sequence number + 1, and its TCP passes an end-of-file indication to the server application.

3: When the server application closes its side of the connection, the server's TCP sends its own FIN.

4: The client replies with an ACK whose acknowledgement number is the received sequence number + 1, and the connection is fully closed.
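The same kind of sketch for the four-way close. The function name and sequence numbers are illustrative; note that once a connection is established, real stacks set ACK on every segment, including the FIN segments.

```python
MOD = 2**32  # sequence numbers wrap around at 2^32


def four_way_close(client_seq: int, server_seq: int) -> list:
    """client_seq / server_seq: the sequence number each side puts on its FIN."""
    return [
        ("client -> server", {"flags": {"FIN", "ACK"}, "seq": client_seq}),
        ("server -> client", {"flags": {"ACK"}, "ack": (client_seq + 1) % MOD}),  # FIN uses 1 number
        ("server -> client", {"flags": {"FIN", "ACK"}, "seq": server_seq}),
        ("client -> server", {"flags": {"ACK"}, "ack": (server_seq + 1) % MOD}),
    ]
```

When the server has nothing left to send, steps 2 and 3 are often merged into a single FIN+ACK segment, so a capture may show only three segments.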

The sequence and acknowledgement numbers keep the data in order, the checksum protects its integrity, and the urgent pointer lets urgent data be processed promptly. TCP also has timeout retransmission, congestion avoidance and slow start mechanisms, which together ensure that the data reaches the destination complete and in order.

Network layer - IP protocol

If TCP segments are containers packed with goods, IP is the truck that carries them. IP provides connectionless, best-effort delivery between two nodes: it tries to get TCP data from the source to the destination as quickly as possible, but it does not guarantee reliable delivery.

The IP layer encapsulates the TCP segment received from the layer above, adds its own header, and then handles routing, fragmentation and reassembly so the datagram reaches its destination. The IP header plays the central role in this process, so let's look at its structure.

Version: the IP version in use, 4 for IPv4 or 6 for IPv6. A receiver that does not support the version in a datagram discards it.

Header length: the length of the IP header in 32-bit words. A header without options is 20 bytes, and the maximum is 60 bytes.

Type of service (TOS): intended to distinguish types of service, but most IP implementations have never actually used it. The 8-bit field consists of a 3-bit precedence subfield (ignored), a 4-bit TOS subfield and 1 unused bit, which must be 0. Only one of the 4 TOS bits may be set to 1 to select the service type; the four bits correspond to minimize delay, maximize throughput, maximize reliability and minimize cost.

Total length: the total length of the datagram in bytes. Combined with the header length, it gives the size and starting position of the data.

The next three header fields are used for fragmentation and reassembly. Every link limits the maximum size of the frames it can carry (its MTU). When IP routes a datagram, it also checks the MTU of the outgoing interface; if the datagram is larger, it is fragmented, and the fragments are reassembled at the destination using these three fields. Because each link along the route may have a different MTU, fragmentation can happen at any hop.

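Identification: a value that identifies each datagram sent by a host; the IP layer normally increments it by 1 for every datagram it sends. All fragments of the same datagram carry the same identification, which is how the receiver knows they belong together.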

Flags: three bits, named R, D and M here. R is reserved and unused; D (Don't Fragment) and M (More Fragments) control fragmentation. If D is 1, the datagram must not be fragmented; a router that cannot forward it without fragmenting discards it and reports an ICMP error. If M is 1, the datagram is a fragment and more fragments follow; M = 0 means this is the last fragment, or the datagram was not fragmented at all.

Fragment offset: the position of this fragment's data relative to the start of the original datagram's data, in units of 8 bytes. After fragmentation, the total length field of each fragment gives the length of that fragment, not of the whole original datagram.

Time to live (TTL): determines when a datagram is discarded. IP delivers data hop by hop, so a datagram may pass through many routers, and TTL limits how many it can traverse. Each router decrements the value by 1; when it reaches 0, the datagram is discarded and an error message is sent back to the source using ICMP (the component of the IP layer that carries error information). TTL keeps datagrams from circulating forever in a routing loop.

Header checksum: verifies the integrity of the IP header (not the data). The sender computes the ones' complement sum of the header's 16-bit words and stores its complement in the checksum field; the receiver repeats the calculation over the received header. If the results agree, the header arrived intact; otherwise the datagram is discarded.

Protocol: identifies the upper-layer protocol the receiver should hand the data to, such as TCP (6) or UDP (17).

Source IP: the sender's IP address, also used when an error message must be sent back.

Destination IP: the destination address, used by every hop to make its routing decision.
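The sketch below parses the 20-byte base IPv4 header field by field and verifies the header checksum with the ones' complement sum described above. Function names are illustrative; options beyond 20 bytes are included in the checksum but not decoded.

```python
import ipaddress
import struct


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of the header's 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
        total = (total & 0xFFFF) + (total >> 16)      # fold the carry back in
    return ~total & 0xFFFF


def parse_ipv4_header(datagram: bytes) -> dict:
    (ver_ihl, tos, total_len, ident, flags_frag, ttl, proto,
     checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", datagram[:20])
    header_len = (ver_ihl & 0x0F) * 4                 # IHL counts 32-bit words
    return {
        "version": ver_ihl >> 4,
        "header_len": header_len,
        "tos": tos,
        "total_len": total_len,
        "id": ident,
        "df": bool(flags_frag & 0x4000),              # D: don't fragment
        "mf": bool(flags_frag & 0x2000),              # M: more fragments follow
        "frag_offset": (flags_frag & 0x1FFF) * 8,     # field counts 8-byte units
        "ttl": ttl,
        "protocol": proto,                            # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
        # Receiver check: summing a correct header including its checksum yields 0
        "checksum_ok": ipv4_checksum(datagram[:header_len]) == 0,
    }
```

On the sending side, the checksum is computed with the checksum field set to zero; on the receiving side, running `ipv4_checksum` over the whole header, checksum included, returns 0 when nothing was altered.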


The IP header contains only the destination address, not the complete path. When sending data, the IP layer looks up the destination IP in its local routing table to make a routing decision, and the datagram is delivered to the destination hop by hop, with one routing decision per hop.

The IP layer can act either as a router or as a host. With routing enabled, it forwards datagrams that are not addressed to itself; without it (a host), datagrams whose destination IP is not a local address are discarded.

When a router receives a datagram whose destination is not a local address, how does it decide which next hop to forward it to? The answer lies in the routing table. The IP layer maintains a routing table with the following fields:

Destination IP: indicates the network address or host address that the IP datagram will eventually reach or pass through.

Gateway (next-hop address): the address of the neighboring router to which the datagram is handed next

Flags: attributes of the route entry, represented by five flags:

U: This route can be used

G: the next hop is a gateway (router). Without this flag, the destination is on the same network segment as the current device, so datagrams are sent to it directly

H: the route is to a host. With this flag, the entry's destination is a complete host address; without it, the destination is a network address

D: The route is created by the redirect message

M: The route has been modified by the redirect message

Interface: the physical interface used for this route

Each time a datagram is to be sent, the IP layer searches the routing table for its destination IP address. There are three possible outcomes, tried in order:

A route entry matching the full destination IP address is found: the datagram is sent to that entry's next-hop gateway or interface

A route entry matching the network number of the destination IP is found: the datagram is sent to that entry's next-hop gateway or interface

If neither is found, the IP layer checks whether the routing table has a default route entry; if so, the datagram is sent to its designated next-hop gateway

If none of these applies, the datagram cannot be sent. Even when a route exists, the datagram travels hop by hop over links that each have a maximum frame size; if it exceeds the MTU of any link along the way, it is fragmented.
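A sketch of the three-step lookup described above, using Python's standard `ipaddress` module. The `Route` fields mirror the table columns (with only the G and H flags modelled), and the example table and addresses are hypothetical. Where several network routes match, the sketch picks the longest prefix, which is what modern routers do.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    destination: str            # host address, network in CIDR form, or "default"
    gateway: Optional[str]      # next hop; None means directly connected (no G flag)
    interface: str
    host: bool = False          # the H flag: destination is a single host


def lookup(table: List[Route], dst_ip: str) -> Optional[Route]:
    addr = ipaddress.ip_address(dst_ip)
    # 1. A host route for exactly this address
    for route in table:
        if route.host and ipaddress.ip_address(route.destination) == addr:
            return route
    # 2. A network route containing the address (longest prefix wins if several match)
    networks = [r for r in table
                if not r.host and r.destination != "default"
                and addr in ipaddress.ip_network(r.destination)]
    if networks:
        return max(networks, key=lambda r: ipaddress.ip_network(r.destination).prefixlen)
    # 3. The default route, if one exists
    for route in table:
        if route.destination == "default":
            return route
    return None  # no match at all: the datagram cannot be sent


# Hypothetical routing table of an IoT gateway
TABLE = [
    Route("192.168.1.0/24", None, "eth0"),
    Route("10.8.0.0/16", "192.168.1.254", "eth0"),
    Route("10.8.3.7", "192.168.1.253", "eth0", host=True),
    Route("default", "192.168.1.1", "eth0"),
]
```

With this table, traffic to 10.8.3.7 uses the host route, 10.8.9.9 uses the 10.8.0.0/16 network route, 192.168.1.20 is delivered directly, and a cloud address such as 203.0.113.10 falls through to the default gateway.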

The concept of datagram fragmentation

During the TCP handshake, each side announces its maximum segment size (MSS), the largest amount of data it is willing to receive in one segment, derived from the MTU of its own network interface (for Ethernet, 1500 - 20 - 20 = 1460 bytes). TCP then splits the data into segments no larger than the MSS, and each segment is placed in one IP datagram. If a datagram later crosses a link with a smaller MTU, it is fragmented there: every fragment except the last has the M bit of the 3-bit flags field set to 1. The fragment headers are almost identical except for the fragment offset, which the destination uses to reassemble the fragments into the original IP datagram (one TCP segment). IP does not guarantee ordering, so datagrams may arrive out of order, but TCP reorders them using the sequence numbers in its header. If a fragment is lost, the destination cannot reassemble the datagram and discards the fragments it did receive; TCP at the sender then notices the missing acknowledgement and retransmits the whole segment.
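The arithmetic of MSS, fragmentation and reassembly can be checked with a short sketch. It assumes 20-byte IP and TCP headers without options; the function names are illustrative. Each fragment is described by its offset field (in 8-byte units), its data length and its M flag.

```python
from typing import List, Tuple

IP_HEADER_LEN = 20   # assumes no IP options
TCP_HEADER_LEN = 20  # assumes no TCP options

Fragment = Tuple[int, int, bool]  # (offset field in 8-byte units, data length, M flag)


def mss_for_mtu(mtu: int) -> int:
    """MSS a host would announce for an interface with this MTU."""
    return mtu - IP_HEADER_LEN - TCP_HEADER_LEN


def fragment(data_len: int, mtu: int) -> List[Fragment]:
    """Split a datagram carrying data_len bytes so each piece fits the link MTU."""
    max_data = (mtu - IP_HEADER_LEN) // 8 * 8    # non-final fragments: multiples of 8 bytes
    fragments = []
    offset = 0
    while offset < data_len:
        size = min(max_data, data_len - offset)
        more = offset + size < data_len          # M = 1 on every fragment except the last
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments


def can_reassemble(fragments: List[Fragment]) -> bool:
    """True if the fragments (in any arrival order) cover the datagram without gaps."""
    expected = 0
    for offset_field, size, more in sorted(fragments):
        if offset_field * 8 != expected:
            return False                         # a gap: some fragment is missing
        expected += size
        if not more:
            return True                          # reached the last fragment
    return False                                 # the last fragment (M = 0) never arrived
```

For example, a 1500-byte Ethernet MTU gives an MSS of 1460; a datagram carrying 1480 bytes (a 20-byte TCP header plus 1460 data bytes) that crosses a link with a 576-byte MTU becomes `[(0, 552, True), (69, 552, True), (138, 376, False)]`. If the middle fragment is lost, `can_reassemble` returns `False`, and recovery is left to TCP.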

Link layer ARP protocol

Once the IP layer has encapsulated the data, all it has is the IP address of the target. That alone is not enough to send a frame, because every network interface has its own 48-bit MAC address, and frames are delivered by MAC address. So, knowing the target IP, the sender must find the corresponding MAC address, which it does by consulting the routing table together with the link-layer ARP protocol.

ARP maps IP addresses to MAC addresses. Initially the sender knows only the target's IP address, not its MAC address; obtaining the MAC address involves an ARP request and an ARP reply. ARP also has its own packet format, so let's look at that first.

Ethernet destination address: the destination MAC address. When there is no ARP cache entry for the target, this is the broadcast address.

Ethernet source address: MAC address of the sending end.

Frame type: identifies what the frame carries; each frame type has its own number, and the number for ARP is 0x0806.

Hardware type: the link-layer network type; 1 means Ethernet.

Protocol type: the type of address being mapped; 0x0800 means IP addresses, that is, ARP is converting an IP address into an Ethernet address.

Operation type: one of four values: ARP request (1), ARP reply (2), RARP request (3) and RARP reply (4).

Source MAC Address: indicates the MAC address of the sending end.

Source IP Address: indicates the IP address of the sending end.

Destination Ethernet address: indicates the MAC physical address of the target device.

Destination IP Address: indicates the IP address of the target device.
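A sketch of the bytes an ARP request occupies on the wire. Besides the fields listed above, the real ARP format also carries a 1-byte hardware address length (6 for MAC) and a 1-byte protocol address length (4 for IPv4), which the sketch includes. The MAC and IP addresses used with it are hypothetical, and the 42-byte result would still be padded to the Ethernet minimum frame size before transmission.

```python
import ipaddress
import struct

ETH_BROADCAST = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def build_arp_request(src_mac: bytes, src_ip: str, target_ip: str) -> bytes:
    """Ethernet header + ARP request asking 'who has target_ip?'."""
    eth_header = struct.pack("!6s6sH", ETH_BROADCAST, src_mac, ETHERTYPE_ARP)
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                                         # hardware type: Ethernet
        0x0800,                                    # protocol type: IPv4
        6, 4,                                      # MAC and IPv4 address lengths in bytes
        1,                                         # operation: ARP request
        src_mac, ipaddress.IPv4Address(src_ip).packed,
        b"\x00" * 6,                               # target MAC unknown: this is what we ask for
        ipaddress.IPv4Address(target_ip).packed,
    )
    return eth_header + arp_body


request = build_arp_request(bytes.fromhex("020000000001"), "192.168.1.20", "192.168.1.1")
```

The 14-byte Ethernet header is followed by the 28-byte ARP body; a reply uses the same layout with operation 2 and the target MAC filled in.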

Before two devices can exchange frames, the link layer at the source uses ARP to ask for the destination's MAC address by broadcasting a request that every host on the Ethernet receives. The request essentially says: "This is my IP address and MAC address; whoever has this destination IP, please reply with your hardware address." A host that finds its own IP address in the request records the sender's IP and MAC address and sends an ARP reply directly to the source host; hosts that do not own the target IP discard the request. In other words, the request is broadcast, but the reply is sent to the requester alone.

Of course, a request and reply cannot precede every single transmission. After a reply arrives, the IP-to-MAC mapping is stored in the ARP cache, typically for about 20 minutes, so the network layer can encapsulate the next packet directly. The complete process is therefore:

After receiving a TCP segment, the IP layer consults the routing table before encapsulating and sending it:

If the target IP is on the same network segment as the sender, the ARP cache is checked for the target IP's MAC address. If it is there, the datagram is handed to the link layer for encapsulation; if not, an ARP request is broadcast and the resulting MAC address is cached. The IP layer then encapsulates the TCP segment and hands it to the link layer.

If the target IP is on a different network segment, the datagram must be sent to the default gateway. If the ARP cache holds the MAC address for the gateway's IP, the datagram is handed to the link layer; if not, an ARP request is broadcast for the gateway's address and the result is cached. The IP layer then encapsulates the TCP segment and hands it to the link layer.
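The two cases can be sketched as "pick the next-hop IP, then resolve it through a cache". This is a simplified model: `resolve` is a hypothetical stand-in for broadcasting an ARP request and waiting for the reply, the 20-minute lifetime follows the typical value mentioned above, and the `clock` parameter exists only so that expiry can be simulated.

```python
import ipaddress
import time
from typing import Callable, Dict, Tuple

ARP_CACHE_TTL = 20 * 60  # seconds; the "about 20 minutes" mentioned above


class ArpCache:
    """IP-to-MAC cache; `resolve` stands in for an ARP request/reply exchange."""

    def __init__(self, resolve: Callable[[str], bytes], ttl: float = ARP_CACHE_TTL,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}

    def mac_for(self, ip: str) -> bytes:
        entry = self._entries.get(ip)
        if entry is not None and self._clock() - entry[1] < self._ttl:
            return entry[0]                    # cache hit: no broadcast needed
        mac = self._resolve(ip)                # miss or expired: ask via ARP
        self._entries[ip] = (mac, self._clock())
        return mac


def next_hop_ip(dst_ip: str, local_network: str, default_gateway: str) -> str:
    """Same segment: deliver directly to dst_ip. Otherwise: hand the frame to the gateway."""
    if ipaddress.ip_address(dst_ip) in ipaddress.ip_network(local_network):
        return dst_ip
    return default_gateway


def destination_mac(dst_ip: str, local_network: str, default_gateway: str,
                    cache: ArpCache) -> bytes:
    return cache.mac_for(next_hop_ip(dst_ip, local_network, default_gateway))
```

Note that for a cloud server on another network, the frame carries the gateway's MAC address, not the server's: only the IP header holds the final destination.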

Ethernet data frame

With all of this in place, what is actually encapsulated and sent is an Ethernet frame. The Ethernet destination address, Ethernet source address and frame type make up the frame header. A preamble and a start frame delimiter are inserted before the header to let the receiver prepare, and a frame check sequence (FCS) is appended at the end to detect errors in the frame.

Preamble: synchronizes the clock of the receiving adapter with that of the sender.

Start frame delimiter: marks the beginning of the frame, signalling that frame data follows.

Destination address: the MAC address of the network adapter that should receive the frame. The receiver first checks whether it matches its own address and discards the frame if it does not.

Source address: MAC address of the sending end device.

Type: determines which protocol handles the data after the frame is received (for example, 0x0800 for IPv4 and 0x0806 for ARP).

Data: the data handed to the upper layer; in this scenario, an IP datagram.

Frame check sequence: detects errors in the frame. The sender computes a cyclic redundancy check (CRC) over the frame and writes the value into the FCS field. The receiver recalculates the CRC and compares it with the FCS; if they differ, the frame was lost or altered in transit and is discarded. Ethernet itself does not retransmit the frame; recovery is left to upper layers such as TCP.
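A sketch of building and checking a frame, assuming the common observation that the Ethernet FCS equals the standard CRC-32 computed by `zlib.crc32` and is stored least-significant byte first. The preamble and start frame delimiter are added by the hardware and are not covered by the CRC, so they are omitted here. Payloads shorter than 46 bytes are padded, giving the 64-byte minimum frame.

```python
import struct
import zlib

MIN_PAYLOAD = 46  # shorter payloads are padded (64-byte minimum frame including FCS)


def build_frame(dst_mac: bytes, src_mac: bytes, ethertype: int, payload: bytes) -> bytes:
    """Header + (padded) data + FCS; preamble and SFD are left to the hardware."""
    if len(payload) < MIN_PAYLOAD:
        payload += b"\x00" * (MIN_PAYLOAD - len(payload))
    body = struct.pack("!6s6sH", dst_mac, src_mac, ethertype) + payload
    fcs = zlib.crc32(body) & 0xFFFFFFFF
    return body + struct.pack("<I", fcs)    # assumed byte order: least-significant first


def fcs_ok(frame: bytes) -> bool:
    """Receiver side: recompute the CRC over everything except the FCS and compare."""
    body, (received,) = frame[:-4], struct.unpack("<I", frame[-4:])
    return (zlib.crc32(body) & 0xFFFFFFFF) == received
```

A CRC-32 detects every single-bit error, so flipping any one bit of the frame makes `fcs_ok` return `False`, and the receiver drops the frame as described above.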

Transmission and reception


Based on the MTU and the datagram's size, the IP layer decides whether to split the datagram into smaller pieces (IP fragmentation) before handing it to the link layer.

Each datagram (or fragment) is encapsulated in a frame and passed to the hardware, which converts the frame into a bit stream and transmits it.

Devices on the Ethernet receive the frame and check its destination address. If it matches the local address, the frame is processed and passed up layer by layer (demultiplexing).


We have now followed the complete network path: the IoT device collects data from its sensors, the device application packages it into MQTT messages, the network protocols encapsulate the messages layer by layer, and the cloud system strips the headers layer by layer to recover the data. I hope this helps you understand MQTT, TCP, IP and ARP, the network protocols most relevant to the Internet of Things.
