Traditional message queues (such as Kafka and RocketMQ) have been optimized for many years to achieve excellent performance, massive accumulation, and message reliability. However, in IoT scenarios, traditional message queues often face a new set of challenges.
In the IoT field, event messages travel from application servers down to embedded chips: opening the cabinet of a shared power-bank station, a server sending a light-on instruction to a device, or the high-frequency message streams of an industrial gateway. In this transmission process, the greatest value of a queue is to keep the whole message flow stable under uncontrollable environmental factors, because IoT devices can occasionally produce large message peaks due to faults or network jitter.
As a leader and innovator in the IoT field, Alibaba Cloud AIoT has long been deeply engaged with message queues. To help IoT practitioners better understand queues in IoT scenarios, Alibaba Cloud technical expert Lv Jianwen describes in this article how to build a message queue for IoT systems.
1. Differences between IoT queues and common queues
1. Isolate the uplink and downlink data.
In IoT scenarios, we split the required queues into two kinds: uplink queues and downlink queues. After splitting, the uplink and downlink of a device are isolated from each other. For example, after a payment succeeds, the open-cabinet command must be delivered to the device; an uplink problem must never affect this downlink service. In addition, the two directions have very different characteristics: uplink messages (device reports) arrive with very high concurrency, but in many scenarios their reliability and latency requirements are low, while downlink messages (usually device-control instructions) have relatively low concurrency but require a high success rate.
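The isolation described above can be sketched with two independent queues per device, so that a flood on one link never delays the other. This is a minimal illustration, not the actual implementation; the class and method names are hypothetical.

```python
from collections import deque

class DeviceChannel:
    """Hypothetical sketch: uplink and downlink are two independent queues,
    so a backlog on one link never blocks the other."""
    def __init__(self, maxlen=10000):
        self.uplink = deque(maxlen=maxlen)    # device -> server reports
        self.downlink = deque(maxlen=maxlen)  # server -> device commands

    def device_report(self, msg):
        self.uplink.append(msg)

    def send_command(self, cmd):
        self.downlink.append(cmd)

# A flood of uplink telemetry does not delay the downlink "open cabinet" command.
ch = DeviceChannel()
for i in range(5000):
    ch.device_report({"seq": i, "temp": 25})
ch.send_command({"action": "open_cabinet"})
assert ch.downlink[0]["action"] == "open_cabinet"
```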
2. Supports a large number of device-level topics.
The core requirement of a traditional queue is that no matter how much backlog accumulates, performance is not affected. In Kafka, however, a large number of topics turns the broker's original advantage of sequential writes into random writes, losing that advantage, and ZooKeeper is also limited in how many topics it can coordinate. One workaround is to put an external bridge in front of the broker: expose a large number of device topics externally and map them onto a small number of actual Kafka topics. This scheme is feasible to a point, but it provides no isolation between devices, so it treats the symptom rather than the cause.
Figure 1 and Figure 2 make the difference clear: congestion in one queue should have minimal impact on other devices. What we need is to isolate massive numbers of topics from one another as far as possible without hurting overall performance, so that a backlog on device A does not affect device B.
3. Send messages in real time first
Let's take an example. A queue for an express-cabinet business has built up a backlog. At this moment, the user standing next to the cabinet taps the open button on his phone again and again, but the cabinet will not open (even though the back-end system has already recovered). The problem is that there are hundreds of thousands of messages in the queue, and new messages must wait in line behind the earlier ones, regardless of whether those earlier messages are still useful. Therefore, messages generated in real time should be sent first, while accumulated messages enter a degraded mode.
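The "real-time first, backlog degraded" behavior can be sketched as a two-lane queue: new messages go into a fast lane that is always drained before the accumulated backlog. This is a simplified illustration under assumed semantics, not the production design.

```python
from collections import deque

class RealTimeFirstQueue:
    """Hypothetical sketch of 'real-time first': new messages go to a fast lane
    that is always drained before the accumulated backlog."""
    def __init__(self):
        self.realtime = deque()  # newly produced messages
        self.backlog = deque()   # accumulated messages in degraded mode

    def publish(self, msg):
        self.realtime.append(msg)

    def degrade(self):
        # Called when consumers fall behind: demote pending messages to backlog.
        while self.realtime:
            self.backlog.append(self.realtime.popleft())

    def poll(self):
        if self.realtime:
            return self.realtime.popleft()  # real-time messages first
        if self.backlog:
            return self.backlog.popleft()   # then drain the backlog
        return None

q = RealTimeFirstQueue()
for i in range(3):
    q.publish(f"stale-{i}")
q.degrade()                    # backend outage: pending messages become backlog
q.publish("open-cabinet-now")  # user taps the app after recovery
assert q.poll() == "open-cabinet-now"  # served before the 3 stale messages
```

In the express-cabinet example, the fresh open-cabinet command jumps ahead of the hundreds of thousands of stale messages instead of queueing behind them.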
2. IoT Message Queue
1. Design ideas for IoT queues
- Fully follow the open-source ecosystem and stay compatible with traditional queues.
- Degrade ordering guarantees: real-time messages take priority, accumulated messages are degraded, and only real-time messages are kept relatively ordered.
- Support massive topics with multi-tenant isolation.
- Separate connection, computing, and storage.
2. Message mode
The figure is only an excerpt, but it shows the difference in mechanism between the two modes. Neither design is wrong; they simply start from different requirements.
3. Separation of connection, computing, and storage
The broker does not hold device connections; connections terminate at a gateway proxy. The broker only distributes data streams, so it is stateless and scales horizontally. Storage is delegated to a high-throughput NoSQL database.
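The separation of the three roles can be sketched as follows: connection state lives in the gateway, message data lives in the store, and the broker keeps nothing, so any broker instance can serve any request. The interfaces below are hypothetical stand-ins.

```python
class NoSqlStore:
    """Stand-in for the high-throughput NoSQL backend (hypothetical interface)."""
    def __init__(self):
        self.data = {}
    def append(self, topic, msg):
        self.data.setdefault(topic, []).append(msg)

class StatelessBroker:
    """Sketch of the separation: the broker holds no connections and no state;
    it only routes. Connections live in the gateway, messages in the store."""
    def __init__(self, store):
        self.store = store  # all state is external, so brokers scale horizontally
    def dispatch(self, topic, msg):
        self.store.append(topic, msg)

gateway_connections = {"device-42": "conn-handle"}  # the connection gateway's job
store = NoSqlStore()
# Any broker instance can serve any request, because no state lives in it.
for broker in (StatelessBroker(store), StatelessBroker(store)):
    broker.dispatch("device-42/up", {"temp": 25})
assert len(store.data["device-42/up"]) == 2
```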
4. Message strategy-push-pull combination
This should be one of the core difficulties of the queue. Compared with a traditional queue, we operate in a platform (multi-tenant) mode, where dedicated resources per tenant would be too expensive; the trade-off is that consumers are uncontrollable. Therefore a combined mode is used: accumulated messages are pulled only when the consumer is online, the pulling is done by the AMQP queue gateway, and messages are always pushed to the user interface through the onMessage callback.
- The broker does not let consumers connect to it directly; instead a queue gateway sits in between, which is more flexible. For some users, the queue behind the gateway can be switched to ONS, Kafka, or other implementations. (By contrast, Kafka and RocketMQ assign a broker address to the client at connection time.)
- The broker pushes real-time messages to the consumer first; if the push fails, the message falls back into the queue. Delivery is a complete event: until it completes, no acknowledgment is committed back to the producer.
- Asynchronous ACK
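The push-pull combination in the bullets above can be sketched as a gateway that pushes real-time messages, drops failed deliveries into a backlog, and pulls that backlog only while the consumer is online. Class and method names are illustrative assumptions.

```python
from collections import deque

class PushPullGateway:
    """Sketch of the push-pull combination (hypothetical API): real-time
    messages are pushed; on failure they fall into the backlog, which the
    queue gateway pulls only while the consumer is online."""
    def __init__(self, on_message):
        self.on_message = on_message  # consumer callback, always push-style
        self.backlog = deque()
        self.online = False

    def deliver(self, msg):
        try:
            if not self.online:
                raise ConnectionError("consumer offline")
            self.on_message(msg)      # push path: real-time priority
            return True               # only now is the event complete (ack)
        except ConnectionError:
            self.backlog.append(msg)  # a failed push falls back into the queue
            return False

    def drain(self):
        # Pull path: the gateway pulls backlog only when the consumer is online.
        while self.online and self.backlog:
            self.on_message(self.backlog.popleft())

received = []
gw = PushPullGateway(received.append)
gw.deliver("cmd-1")           # consumer offline: falls into the backlog
gw.online = True
gw.deliver("cmd-2")           # pushed immediately
gw.drain()                    # gateway pulls the accumulated message
assert received == ["cmd-2", "cmd-1"]
```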
5. Linear scaling-offline messages
Real-time messages are pushed directly and rarely become a bottleneck; messages that cannot be consumed enter accumulation mode. Because the underlying storage dependency already solves the scaling of core storage for us, the main remaining problem is how to eliminate write and consumption hotspots so that the broker can be completely stateless.
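One common way to spread writes and avoid hotspots is to hash each device topic onto a storage partition, so no broker or partition owns a disproportionate share. This is an assumed scheme for illustration, not necessarily the one used here.

```python
import hashlib

def partition_for(topic: str, num_partitions: int) -> int:
    """Sketch of hotspot avoidance (assumed scheme): hash each device topic to
    a storage partition so writes spread evenly and no broker holds state."""
    digest = hashlib.md5(topic.encode()).hexdigest()
    return int(digest, 16) % num_partitions

# 100,000 device topics spread across 16 partitions with no per-broker state.
counts = [0] * 16
for device_id in range(100000):
    counts[partition_for(f"device-{device_id}/up", 16)] += 1
assert sum(counts) == 100000
assert min(counts) > 5000 and max(counts) < 7500  # roughly uniform spread
```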
3. One thought: how to solve the problem of massive topics?
When facing massive topics, the usual approaches are partitioning, unitization, and group-based isolation and splitting. Here we discuss how to support as many topics as possible in single-instance mode; expecting a single instance to hold 100% of all topics is clearly unrealistic.
Because the broker is decoupled from storage, the broker keeps no per-topic state, no matter which topic the data belongs to; all the broker does is write and distribute.
- Massive topics, each with a limited number of subscriptions: the topic-to-subscriber relationship is kept in a Redis cache or a local cache, and MQTT topic matching uses a topic-tree algorithm, for which HiveMQ provides an implementation.
- Massive subscriptions on a single topic: this scenario is really multicast or broadcast. We do not implement it in the queue itself, but encapsulate a broadcast component at the upper layer to coordinate tasks and send in batches.
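The topic-tree matching mentioned in the first bullet can be sketched as a trie over topic levels, with MQTT's `+` (single-level) and `#` (multi-level) wildcards, so a publish is matched level by level instead of scanning every subscription. This is a minimal sketch; HiveMQ's open-source implementation is far more complete.

```python
class TopicTree:
    """Minimal sketch of an MQTT-style topic tree: subscriptions with '+'
    (one level) and '#' (multi-level) wildcards are matched level by level
    instead of scanning every subscription."""
    def __init__(self):
        self.children = {}
        self.subscribers = set()

    def subscribe(self, topic_filter, subscriber):
        node = self
        for level in topic_filter.split("/"):
            node = node.children.setdefault(level, TopicTree())
        node.subscribers.add(subscriber)

    def match(self, topic):
        result = set()
        self._match(topic.split("/"), result)
        return result

    def _match(self, levels, result):
        if "#" in self.children:                 # '#' matches all remaining levels
            result |= self.children["#"].subscribers
        if not levels:
            result |= self.subscribers
            return
        head, rest = levels[0], levels[1:]
        for key in (head, "+"):                  # exact level, or '+' wildcard
            if key in self.children:
                self.children[key]._match(rest, result)

tree = TopicTree()
tree.subscribe("device/+/up", "uplink-consumer")
tree.subscribe("device/#", "audit-consumer")
assert tree.match("device/42/up") == {"uplink-consumer", "audit-consumer"}
assert tree.match("device/42/down") == {"audit-consumer"}
```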
Currently, the Alibaba Cloud AIoT queue is also called "server subscription", meaning that users' servers subscribe to their devices' messages. To reduce access costs, users can connect with the AMQP 1.0 protocol, which conforms to the open-source ecosystem. It is compatible with both traditional queues and the new queue: users can choose Kafka, MQ, or the IoT queue, or a combined mode, for example routing messages to different queues based on message-feature rules.
In Alibaba Cloud AIoT's scenario-queue practice, in addition to integrating the existing MQ and Kafka queues, a self-built real-time-priority queue implementation has been added, along with a queue gateway proxy, so that both ordinary message queues and lightweight IoT message queues can be chosen.