By Zhaofeng Zhou (Muluo)
IM stands for instant messaging. In the mobile Internet era, IM products have become a necessity of daily life. The most well-known IM products in China are DingTalk, WeChat, and QQ. Among them, WeChat has already developed into an ecosystem, but its core function is still IM. IM is also an integral module of many applications whose core business is not IM. The most typical ones are online games and social networking applications. The IM feature is essential to applications with social attributes.
The IM system came into being in the early days of the Internet. Its basic technology architecture has been updated many times over the past dozen years or so, from the early client-server (CS) and P2P architectures to the current distributed backend systems. It involves technologies in many areas, such as mobile clients, networking, security, and storage. The number of daily active users served by a single IM product has grown from a small number in the early days to as many as 900 million, as recently announced by WeChat.
The core of an IM system is the messaging system, and the core function of the messaging system is the synchronization and storage of messages.
This article mainly describes the messaging system architecture in the IM system. We will introduce the implementation of a Table Store-based message synchronization and storage system architecture. It supports advanced features of the messaging system, such as "multi-terminal synchronization" and "message roaming". In terms of performance and scale, it supports full message storage on the cloud, and message synchronization with millions of TPS and millisecond delays.
This section mainly describes the architecture design of Table Store-based modern IM systems. Before describing the architecture design in detail, we will introduce the timeline logic model to abstract and simplify the understanding of the IM synchronization and storage models. After understanding the timeline model, we will talk about the modeling methods for the synchronization and storage of messages based on the timeline model. We also have to make technical trade-offs in various aspects when implementing message synchronization and storage. For example, we have to compare and choose common pull and push modes for message synchronization, and choose the underlying database based on the characteristics of the timeline model.
The above diagram is a simple comparison between the conventional and modern architectures of a messaging system.
Under a conventional architecture, messages are synchronized first and stored afterward. For online users, messages are synchronized directly to online recipients in real time, and a message that is successfully synchronized is not persisted. For offline users, or for messages that cannot be synchronized in real time, the messages are persisted to an offline database. The recipient pulls all unread messages from the offline database after reconnecting to the Internet, and a message is deleted from the offline database once it has been successfully synchronized to the recipient. The main tasks of the server in a conventional messaging system are to maintain the connection between the sender and the recipient, and to provide online message synchronization and offline message caching. This design ensures that messages can be transmitted from the sender to the recipient under all circumstances. However, the server does not persist messages permanently, so it does not support message roaming.
Under the modern architecture, messages are stored first and then synchronized. The advantage of storing messages before synchronization is that by the time a recipient receives a message, the message has already been saved on the cloud. Messages are saved in two databases: the message storage database and the message synchronization database. The message storage database stores the messages of all sessions to support message roaming. The message synchronization database is mainly used for synchronization to the recipient's multiple terminals. After the sender sends a message, the server receives it and saves it to the message storage database and the message synchronization database. Once the message is persisted, if the recipient is online, the message is pushed to the recipient directly. Online push is the preferred path, but not the only one. For messages that fail to be pushed online, or when the recipient is offline, there is a unified fallback: the recipient actively pulls all unsynchronized messages from the server. The server cannot know when, or from which terminal, the recipient will send a synchronization request, so it must save all messages that need to be synchronized to the recipient. This is what the message synchronization database is designed for. Users of an IM product may also need message roaming when they switch to a new device, and the message storage database is designed to meet that need: all historical messages of any session can be pulled from it.
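The store-first, sync-second flow described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not the article's implementation: the `MessageServer` class, its method names, and the use of a list position as the pull checkpoint are all hypothetical, and plain Python dictionaries stand in for the two databases.

```python
from collections import defaultdict


class MessageServer:
    """Toy in-memory model of the store-first, sync-second flow (hypothetical names)."""

    def __init__(self):
        # Message storage database: all messages of each session, kept for roaming.
        self.storage_db = defaultdict(list)
        # Message synchronization database: messages waiting to reach each recipient.
        self.sync_db = defaultdict(list)
        # Online terminals per user: {user: {terminal_id: inbox list}}.
        self.online = defaultdict(dict)

    def connect(self, user, terminal_id):
        """Register an online terminal and return the inbox that pushes land in."""
        inbox = []
        self.online[user][terminal_id] = inbox
        return inbox

    def send(self, session_id, sender, recipient, text):
        message = {"session": session_id, "from": sender, "text": text}
        # 1. Persist to both databases before any delivery is attempted.
        self.storage_db[session_id].append(message)
        self.sync_db[recipient].append(message)
        position = len(self.sync_db[recipient])  # 1-based position, assumed checkpoint
        # 2. Push to every terminal of the recipient that is currently online.
        for inbox in self.online[recipient].values():
            inbox.append((position, message))
        return position

    def pull(self, recipient, after_position=0):
        """Fallback path: a terminal pulls everything after its own checkpoint."""
        pending = self.sync_db[recipient][after_position:]
        return [(after_position + i + 1, m) for i, m in enumerate(pending)]

    def roam(self, session_id):
        """Message roaming: read the full history of one session."""
        return list(self.storage_db[session_id])


server = MessageServer()
phone_inbox = server.connect("B", "phone")  # B's phone is online
server.send("A:B", "A", "B", "hello")
server.send("A:B", "A", "B", "are you there?")

# B's laptop was offline, so it pulls from the synchronization database later.
laptop_messages = server.pull("B", after_position=0)
# A brand-new device reads the session history from the storage database.
new_device_history = server.roam("A:B")
```

In this sketch the phone receives both messages by push, the laptop receives the same two messages by pull, and the new device can read the whole session, because every message was persisted before delivery was attempted.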
The above is a simple comparison between the conventional and modern IM system architectures. The modern architecture supports multi-terminal synchronization and message roaming without making the entire message synchronization and storage process much more complicated. The core of the modern architecture is the two message databases: the "message synchronization database" and the "message storage database". They are the foundation of message synchronization and storage. The next section of this article will mainly describe the design and implementation of these two databases.
Before analyzing the design and implementation of the "message synchronization database" and "message storage database", we will first introduce the timeline logic model. Understanding the timeline model makes the message synchronization and storage models easier to understand. The design and implementation of a message database is also based on the characteristics and requirements of the timeline model.
The above diagram is an abstract representation of the timeline model. The timeline can be simply understood as a message queue that has the following characteristics:
With these characteristics, message synchronization can be easily implemented with a timeline. In the above diagram, A is the message sender and B is the recipient. B has multiple receiving terminals: B1, B2, and B3. When A sends a message to B, the message needs to be synchronized to all of B's terminals. The messages to be synchronized are exchanged through a timeline. All messages sent by A to B are saved in this timeline, and each receiving terminal of B independently pulls messages from it. After a terminal finishes synchronizing, it records the SeqId of the last synchronized message locally and uses this SeqId as the starting checkpoint of the next synchronization. The server does not have to record the synchronization status of each receiving terminal, and each terminal can pull messages from any point in the timeline.
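As a minimal sketch of this checkpoint-based synchronization, the following code models a timeline as an append-only list and lets each terminal keep its own last SeqId. The `Timeline` and `Terminal` classes and their methods are hypothetical, and the sketch assumes that SeqIds are consecutive integers starting at 1, which is stricter than the model needs.

```python
class Timeline:
    """Append-only message queue; SeqIds are assumed to be 1, 2, 3, ..."""

    def __init__(self):
        self._messages = []  # index i holds the message with SeqId i + 1

    def append(self, message):
        """Add a message at the tail and return its SeqId."""
        self._messages.append(message)
        return len(self._messages)

    def read_after(self, seq_id, limit=100):
        """Return up to `limit` (SeqId, message) pairs with SeqId greater than seq_id."""
        chunk = self._messages[seq_id:seq_id + limit]
        return [(seq_id + i + 1, m) for i, m in enumerate(chunk)]


class Terminal:
    """A receiving terminal that stores its own synchronization checkpoint."""

    def __init__(self, name):
        self.name = name
        self.last_seq_id = 0  # checkpoint lives on the terminal, not on the server
        self.received = []

    def sync(self, timeline):
        """Pull batches until the terminal has caught up with the timeline."""
        while True:
            batch = timeline.read_after(self.last_seq_id)
            if not batch:
                break
            for seq_id, message in batch:
                self.received.append(message)
                self.last_seq_id = seq_id


timeline_b = Timeline()  # holds every message that must reach B's terminals
b1, b2 = Terminal("B1"), Terminal("B2")

for text in ["m1", "m2", "m3"]:
    timeline_b.append(text)
b1.sync(timeline_b)      # B1 catches up to SeqId 3

timeline_b.append("m4")
b1.sync(timeline_b)      # B1 resumes from its checkpoint and reads only m4
b2.sync(timeline_b)      # B2 starts from 0 and reads everything
```

Both terminals end up with the same four messages, even though they synchronized at different times and the timeline kept no per-terminal state.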
Message roaming is implemented based on timeline, too. The only difference between message synchronization and message roaming is that message roaming requires the server to persist all data in the timeline.
Based on the timeline logic model, we can easily understand how to implement message synchronization and storage on the server side, and how to implement advanced functions such as multi-terminal synchronization and message roaming. The main implementation challenges are: How to map the logical model to the physical model? What are the database requirements for implementing the timeline model? Which database should we choose? These are the topics that will be discussed next.
The above diagram illustrates a timeline-based message storage model. Message storage requires each session to have a separate timeline. As shown in the example, A has a session with each of B, C, D, E, and F, and each session has its own timeline. All messages of a session are held in the corresponding timeline, and the server persists each timeline. Because the server persists all messages of every session timeline, it is able to support message roaming.
The message synchronization model is slightly more complicated than the message storage model. Message synchronization is generally implemented in one of two modes, pull and push, which correspond to different physical timeline models.
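The per-session storage model can be illustrated with a short sketch. The `MessageStore` class, the `session_key` helper, and the key format are hypothetical and only serve the example; a dictionary of lists stands in for the persisted timelines.

```python
from collections import defaultdict


def session_key(user_a, user_b):
    """Build the same key regardless of who is the sender (assumed key format)."""
    return ":".join(sorted((user_a, user_b)))


class MessageStore:
    """One timeline per session; a dict of lists stands in for persistent storage."""

    def __init__(self):
        self._timelines = defaultdict(list)

    def save(self, sender, recipient, text):
        """Append a message to the session's timeline and return its SeqId."""
        timeline = self._timelines[session_key(sender, recipient)]
        timeline.append((sender, text))
        return len(timeline)

    def history(self, user_a, user_b, after_seq_id=0, limit=50):
        """Roaming read: page through one session's history by SeqId."""
        timeline = self._timelines.get(session_key(user_a, user_b), [])
        page = timeline[after_seq_id:after_seq_id + limit]
        return [(after_seq_id + i + 1, sender, text)
                for i, (sender, text) in enumerate(page)]


store = MessageStore()
store.save("A", "B", "hi B")
store.save("B", "A", "hi A")
store.save("A", "C", "hi C")

ab_history = store.history("A", "B")  # what a new device of A or B would load
ac_history = store.history("C", "A")  # argument order does not matter
```

Keeping the sessions in separate timelines means a roaming read touches only the session being opened, and it can be paged by SeqId.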
The above diagram illustrates the timeline models of two synchronization modes: pull and push. As shown in the diagram, the message recipient A simultaneously has sessions with B, C, D, E, and F. All new messages in these sessions need to be synchronized to one of A's terminals. Let's take a look at how messages are synchronized in both the pull and push modes.
IM systems usually choose push-mode message synchronization. In the IM scenario, a message is generated only once in a session, but it is read multiple times. It is a typical scenario with more reads than writes: the read/write ratio of messages is about 10:1. With pull-mode synchronization, the read/write ratio of the IM system is amplified to 100:1. A well-optimized system must balance the read and write pressure and avoid bottlenecks on either side. Therefore, IM systems usually use push-mode synchronization to balance reads and writes, which can shift the read/write ratio from 100:1 to 30:30. Of course, push-mode synchronization also has to deal with some extreme scenarios, such as a group chat with over ten thousand participants. For such extreme scenarios, the pull mode may be used instead. A simple IM system usually restricts the creation of such large groups at the product level, whereas an advanced IM system usually blends the pull and push modes to meet the needs of such scenarios.
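The trade-off between the two modes can be made concrete by counting timeline reads and writes in a toy model. This is a sketch under assumptions: the `PullModeSync` and `PushModeSync` classes are hypothetical, every timeline access counts as one read or write, and the counts show only the direction of the amplification, not the ratios quoted above.

```python
from collections import defaultdict


class PullModeSync:
    """Pull mode: one timeline per session; a sync reads every session timeline."""

    def __init__(self):
        self.timelines = defaultdict(list)
        self.reads = 0
        self.writes = 0

    def send(self, session_id, members, sender, text):
        # One write per message, no matter how many members the session has.
        self.timelines[session_id].append((sender, text))
        self.writes += 1

    def sync(self, sessions, checkpoints):
        new_messages = []
        for session_id in sessions:  # one read per session, even if nothing is new
            self.reads += 1
            start = checkpoints.get(session_id, 0)
            new_messages.extend(self.timelines[session_id][start:])
            checkpoints[session_id] = len(self.timelines[session_id])
        return new_messages


class PushModeSync:
    """Push mode: one timeline per recipient; a send writes to every recipient."""

    def __init__(self):
        self.timelines = defaultdict(list)
        self.reads = 0
        self.writes = 0

    def send(self, session_id, members, sender, text):
        for member in members:  # write fan-out grows with the session size
            if member != sender:
                self.timelines[member].append((session_id, sender, text))
                self.writes += 1

    def sync(self, user, checkpoint):
        self.reads += 1  # a single read covers all sessions
        new_messages = self.timelines[user][checkpoint:]
        return new_messages, len(self.timelines[user])


peers = ["B", "C", "D", "E", "F"]
pull, push = PullModeSync(), PushModeSync()
for peer in peers:  # each peer sends one message to A
    pull.send(f"A:{peer}", ["A", peer], peer, "hi")
    push.send(f"A:{peer}", ["A", peer], peer, "hi")

pull_new = pull.sync([f"A:{p}" for p in peers], {})  # 5 reads
push_new, checkpoint = push.sync("A", 0)             # 1 read

# One group message: 1 write in pull mode, 3 writes in push mode.
pull.send("group1", ["A", "B", "C", "D"], "B", "hello all")
push.send("group1", ["A", "B", "C", "D"], "B", "hello all")
```

Pull mode keeps writes cheap but multiplies reads by the number of sessions, while push mode turns that cost into write fan-out that grows with group size. This is why very large groups are the case where pull mode can become the better choice.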
Based on the timeline model and the application of the timeline model in message storage and synchronization, let's take a look at the design of the message synchronization database and the message storage database.
The above illustrates the design of a timeline-based message database.
The message synchronization database and the message storage database have different database requirements. Next, we will discuss database selection.
The message synchronization database and the message storage database are the core databases of the message system. They have different database requirements:
To sum up, the database requirements are:
1. The schema design must be able to meet the functional requirements of the timeline model: it does not have to be a relational model, but it should be able to implement a queue model to enable the generation of auto-incrementing SeqIds.
2. Able to support highly concurrent writes and range reads, with a capacity of 100,000+ TPS.
3. Able to store massive amounts of data, measured in hundreds of TB.
4. Able to define data lifecycle.
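The functional parts of these requirements (a queue-like schema with auto-incrementing SeqIds, range reads, and a data lifecycle) can be sketched with an in-memory table. The `TimelineTable` class and its parameters are hypothetical and do not represent any real database API. Throughput and capacity, requirements 2 and 3, cannot be shown in a sketch like this.

```python
import time
from collections import defaultdict


class TimelineTable:
    """In-memory stand-in for a table keyed by (timeline_id, seq_id)."""

    def __init__(self, ttl_seconds=None, clock=time.time):
        self._rows = defaultdict(list)     # timeline_id -> [(seq_id, written_at, message)]
        self._next_seq = defaultdict(int)  # last SeqId issued per timeline
        self._ttl = ttl_seconds            # None means rows never expire
        self._clock = clock

    def put(self, timeline_id, message):
        """Append a row; the table, not the caller, assigns the SeqId."""
        self._next_seq[timeline_id] += 1
        seq_id = self._next_seq[timeline_id]
        self._rows[timeline_id].append((seq_id, self._clock(), message))
        return seq_id

    def range_read(self, timeline_id, after_seq_id=0, limit=100):
        """Return live rows with SeqId greater than after_seq_id, in SeqId order."""
        now = self._clock()
        result = []
        # A real database would seek on the sorted primary key; the sketch scans.
        for seq_id, written_at, message in self._rows.get(timeline_id, []):
            if seq_id <= after_seq_id:
                continue
            if self._ttl is not None and now - written_at >= self._ttl:
                continue  # expired rows are hidden from readers
            result.append((seq_id, message))
            if len(result) == limit:
                break
        return result


DAY = 24 * 3600
now = [0.0]  # fake clock so the example does not need to sleep
sync_table = TimelineTable(ttl_seconds=7 * DAY, clock=lambda: now[0])

sync_table.put("user:B", "m1")
now[0] += 6 * DAY
sync_table.put("user:B", "m2")
now[0] += 2 * DAY  # m1 is now 8 days old, m2 is 2 days old

live_rows = sync_table.range_read("user:B")
next_seq_id = sync_table.put("user:B", "m3")  # SeqIds are never reused
```

A synchronization table would typically use a short lifecycle like the one above, while a storage table for roaming could pass `ttl_seconds=None` or a much longer value.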
Alibaba Cloud Table Store is a distributed NoSQL database based on an LSM storage engine. It supports highly concurrent reads and writes at millions of TPS, PB-level data storage, and TTL. It satisfies all of the above requirements and also supports auto-increment columns, which makes it well suited as both the design and the physical model of the timeline.
Let the code speak for itself. For the detailed sample code, click here.
This article mainly describes the implementation of the message push and storage architectures in a modern IM system. Based on the timeline logic model, we can clearly understand the message synchronization and storage architectures. Table Store is a natural fit for implementing the timeline model. Its auto-increment feature solves the most critical problem of the timeline model: generating the auto-incrementing SeqId.
Table Store is a professional distributed NoSQL database independently developed by Alibaba Cloud. It is a high-performance, low-cost, scalable, and fully managed semi-structured data storage platform based on shared storage. It supports efficient calculation and analysis of Internet and IoT data. The message push and storage scenario of the IM system is one of the most important applications of Table Store in the social networking field.
The timeline-based message storage and push model can be applied in many scenarios apart from the IM message system, for example, feed streams, real-time message synchronization, and bullet-screen comments in live broadcasts. We have also studied the feed stream field and other scenarios in depth.
We have been constantly improving Table Store to meet the high-availability and high-reliability data requirements of the social networking scenario.