Cloud native event-driven engine

At the recent RocketMQ Summit 2022 Global Developer Summit, we officially open-sourced our new product, the RocketMQ-EventBridge event-driven engine.

RocketMQ has always been a message engine. What is an event-driven engine? Why should we launch the event-driven engine this time? What are the application scenarios and the corresponding technical solutions?

Let's take a look today. The whole article contains three parts:

In the first part, let's look at what an event is.

In the second part, let's take a look at the different "super capabilities" of the event. What can we do with these "super capabilities"?

In the third part, we will talk about the solution of events given by RocketMQ, which is also our open source project this time: RocketMQ-EventBridge.

What is an event

You can think about it in your head. What is an event? We define an event as:

A thing that happens, especially one of importance.

This is easy to understand. For example, I took a nucleic acid test yesterday afternoon; I ate another ice cream this morning. These are all events that occurred in the past. However, if I ask what the difference is between an event and a message, does the definition of an event still seem so clear?

Can the events just mentioned be understood as messages? If Lao Zhang sent me a text message, is that an event or a message? In everyday development, when should we use messages and when should we use events?

However, before answering this question, let's take a look at a typical microservice.

The interaction between a microservice system and external systems can be divided into two parts: the first is receiving external requests (the yellow part in the figure); the second is calling external services (the green part at the bottom of the figure).

There are two ways to receive external requests: one is to provide APIs that receive queries and commands sent from outside; the other is to actively subscribe to external Command messages. After either kind of operation enters the system, we often call other microservices to work together to complete it. When these operations change the system state, an event is generated.

Here, we refer to both the Command messages received from outside and the events generated inside the system as messages.

To sum up, the relationship between messages and events is as follows: messages come in two kinds, Command messages and Event messages.

1. In the left half of the figure, a Command is an operation request sent to the system by an external system;

2. In the right half of the figure, an Event is generated after the system receives the Command and its state changes.

Therefore, events and messages are not the same thing. An event can be understood as a special kind of message, and it is special in four main ways:

Occurred and immutable

An event must have already happened. What does "happened" imply? It is immutable, because we cannot change the past. This property is very important: when we process and analyze events, we can trust them completely. Once an event is received, it reflects real behavior of the system and cannot be modified.

Compare this with Command and Query. A Command is an instruction: it obviously has not happened yet and only expresses an expectation. And we know that what is expected will not necessarily succeed.

For example: turn on the kitchen light, press the doorbell, transfer 100,000 to account A.

These are all Commands, that is, expected behaviors. But did they actually happen in the end? We don't know.

An Event states what has already happened. For example: the kitchen light was turned on, someone rang the doorbell, account A received 100,000.

A Query, by contrast, is a request for the current state of the system. For example: the kitchen light is on, the doorbell is ringing, the account balance is 110,000.
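The distinction between the three can be sketched in a few lines of Python. This is only an illustration: the class names (Command, Event, KitchenLight) and the "turn_on" action are hypothetical and do not come from RocketMQ-EventBridge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """A request to change state; it expresses intent and may have no effect."""
    action: str


@dataclass(frozen=True)
class Event:
    """A fact that has already happened; frozen, so it cannot be modified."""
    fact: str


class KitchenLight:
    """Hypothetical system that accepts commands, answers queries, emits events."""

    def __init__(self):
        self.on = False
        self.events = []

    def handle(self, command):
        # A command produces an event only if it actually changes the state.
        if command.action == "turn_on" and not self.on:
            self.on = True
            event = Event("light_turned_on")
            self.events.append(event)
            return event
        return None

    def query(self):
        # A query reads the current state and changes nothing.
        return self.on
```

Sending the same command twice yields only one event, because the second command changes nothing: the command is an expectation, the event is the record of what really happened.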


No expectation

How should we understand this? An event objectively describes a change in the state or attribute values of something, but carries no expectation about how the event should be handled.

In contrast, Command and Query carry expectations: they want the system to make a change or return a result. An Event is only an objective description of a change in the system.

Take an example: a traffic light changes from green to red. The event itself does not forbid pedestrians or cars from passing; it is the traffic regulations that give the lights that meaning.

Therefore, a system generally does not send events directly to one particular system. Instead, it reports them to an "event center", which holds the events reported by all systems. Each system declares to the event center which events it produces and what format they have.

Other systems that are interested can actively subscribe to these events. It is the event consumers who give an event its real value: a consumer who wants to see what has happened in a system subscribes to its events. Events are therefore consumer-driven.

How do messages differ? The sending and subscribing of Command messages is agreed between the two parties and unknown to outsiders, usually in the form of documentation or code. Both sides send and consume according to the agreed protocol, so messages are producer-driven.

To make an analogy: events are like a market economy. The value of the goods produced depends largely on their consumers, and the events in a system are on display like goods in a shop window. Command messages are more like a planned economy: they are created with a specific purpose and "allocated" to a specific consumer.

Natural order

The third characteristic of events is "natural order". This means that for the same entity, events A and B cannot occur at the same time; one must come before the other. If two events do occur at the same time, they must be events of different types.

For example, the same traffic light cannot turn green and red at the same moment. At any one time, it can change to only one state.

You may notice that this implies a further property of events: because an event is naturally ordered and bound to a specific moment on the timeline, and cannot occur at the same time as another, it must be unique.

If we see two events with the same content, the thing must have happened twice, one after the other. (This is very valuable for handling eventual consistency of data and for analyzing system behavior: we see not only the final result of the system but also the series of intermediate steps that led to it.)


Figurative

The fourth characteristic of an event is that it is "figurative", that is, concrete and detailed.

An event records the "crime scene" as completely as possible. Because it does not know how consumers will use it, it tries to be as detailed as possible, for example:

● Who caused the event? Subject

● What type of event is it? Type

● Who sent the event? Source

● What is the unique sign of the event? Id

● When does it happen? Time

● What is the content of the event? Data

● What is the schema of the event content? Dataschema

Let's take traffic lights as an example:
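As a rough sketch, a traffic light event carrying the fields listed above might look like the following Python dictionary. The attribute names follow the CloudEvents convention; every value (the ID, the source path, the type name, the schema URL, the timestamp) is a made-up placeholder, not taken from a real system.

```python
import json

# Hypothetical traffic light event; all values are illustrative placeholders.
traffic_light_event = {
    "specversion": "1.0",
    "id": "evt-0001",                         # unique ID of this event
    "source": "/traffic/light-system",        # who sent the event
    "type": "traffic.light.changed",          # what type of event it is
    "subject": "light-0042",                  # who caused the event
    "time": "2022-07-21T08:00:00Z",           # when it happened
    "datacontenttype": "application/json",
    "dataschema": "https://example.com/schemas/traffic-light-changed.json",
    "data": {"from": "green", "to": "red"},   # content of the event
}

print(json.dumps(traffic_light_event, indent=2))
```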

Ordinary messages are different: because the upstream and downstream parties are generally fixed, they are kept as simple as possible for performance and transmission efficiency, as long as they meet the needs of the consumer designated under the "planned economy".

To sum up, these four characteristics greatly strengthen events and give them "super capabilities" that ordinary messages lack. Events are often used in four typical scenarios: event notification, event sourcing, inter-system integration, and CQRS.

Let's expand one by one to see these application scenarios.

Typical application scenarios of events

Event notification

Event notification is a very common scenario. For example, when a user places an order, the payment system is notified; when the user pays, the transaction system is notified.

Here, let's return to the traffic light example. When a traffic light changes from red to green, many systems may need this information.

Method 1: The sender actively calls and adapts to the receiver

The simplest way is to call each system in turn and pass the information on. For example, the traffic light system actively calls the API of the map navigation service, the API of the traffic police central control, and the API of the city brain to send out the light change signal.

But we all know this is a poor design, and it becomes a disaster as the number of systems grows. Not only is the development cost high, but if one of the downstream systems has a problem, it may hang the entire service and affect the calls to the other systems.

Method 2: The receiver actively subscribes and adapts to the sender

A natural solution is to send this information to an intermediate message broker. Other systems that need it can actively subscribe to these messages.

Now the traffic light system has no direct call dependency on the other systems. The traffic police central control, map navigation, and city brain services only need to subscribe to the traffic light messages and parse them according to the agreed protocol.

However, there is still a problem: in this architecture, the traffic light is the center. Consumers need to understand the sender's business domain and add an adaptation layer (the white boomerang-shaped part in the figure) to translate the messages into the language of their own business domain. Yet we want every microservice to be highly cohesive and loosely coupled.

If the traffic police central control needs traffic light data for the whole country, but each region uses a different message format, it has to adapt to each region's protocol and add a conversion layer. And what if the formats change later? Just think how high the operation and maintenance cost would be.

Could the traffic police central control require all traffic light systems in the country to provide data in one protocol? Unfortunately not: the map service and the city brain also use this traffic light data, so the format cannot simply be changed.

Method 3: Introduce events, and the broker flexibly adapts to the receiver's protocol

With events, things are different. Because an event carries "no expectation" and is "figurative", it naturally retains as much on-the-scene information as possible and is more standardized. For a consumer (here, the traffic police central control), it is easy to assemble events collected from different provinces into a format that meets its own business requirements.

Moreover, this assembly takes place in the intermediate broker. The traffic police central control only needs to provide an API, designed around its own business domain, to receive events; the broker then actively delivers events to this API. From beginning to end, the traffic police central control system contains not a single line of code that adapts to external business.

Therefore, this method has three obvious advantages:

1. Each system focuses only on its own business domain and needs no code to adapt to external systems;

2. All changes to the system converge on the API, which is the only entry point; the same API may serve both event reception and console operations;

3. Because events are pushed, there is no need to introduce an SDK to connect to the broker and fetch messages, which reduces the complexity of the system.

In this way, our initial picture becomes this: traffic lights generate events and deliver them to the event center. Consumers who need these events subscribe at the event center, and the event center actively delivers the events in the format each consumer expects.
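The delivery flow just described can be sketched as a tiny in-memory event center. This is a minimal illustration under stated assumptions, not the RocketMQ-EventBridge implementation: EventCenter, subscribe, publish, and both consumer functions are hypothetical names, and a real broker would deliver over the network rather than by calling a Python function.

```python
class EventCenter:
    """Hypothetical in-memory event center: filter, transform, then push."""

    def __init__(self):
        self._rules = []

    def subscribe(self, event_filter, transform, target):
        # Each consumer states which events it wants and the shape it expects.
        self._rules.append((event_filter, transform, target))

    def publish(self, event):
        for event_filter, transform, target in self._rules:
            if event_filter(event):
                # The adaptation happens here, not inside the consumer.
                target(transform(event))


police_inbox = []   # stands in for the traffic police central control API
nav_inbox = []      # stands in for the map navigation API

center = EventCenter()
center.subscribe(
    event_filter=lambda e: e["type"] == "traffic.light.changed",
    transform=lambda e: {"light_id": e["subject"], "signal": e["data"]["to"]},
    target=police_inbox.append,
)
center.subscribe(
    event_filter=lambda e: e["data"]["to"] == "red",
    transform=lambda e: {"stop_at": e["subject"]},
    target=nav_inbox.append,
)

center.publish({
    "type": "traffic.light.changed",
    "subject": "light-0042",
    "data": {"from": "green", "to": "red"},
})
```

The producer publishes one event in its own format, and each consumer receives a payload already shaped for its own business domain.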

Let's review the whole process again:

Figure 1: At the beginning, the traffic light system actively sent information to each system through strong dependencies. This picture is centered on the downstream services, and the traffic light system adapts to all of them.

Figure 2: Later, we used traditional messaging to decouple the call chain. The systems on the two sides no longer depend on each other directly, but a business dependency remains. Consumers need to understand the producer's message format and transform and adapt it within their own systems. This is therefore actually producer-centered.

Figure 3: Finally, we introduced event notification. With this method, producers and consumers each only need to pay attention to their own system. Which events a producer produces and which data format a consumer consumes are both determined by their own business, with no need to adapt to each other. This truly achieves the high cohesion and low coupling we talk about, and realizes complete decoupling.

Now, back to the typical microservice model mentioned at the beginning. For some scenarios, we can change it as follows: the change operations of a microservice uniformly converge on the API entry, and the Command message entry is removed. Converging the entry points is often very beneficial for maintaining microservices and ensuring system stability.

Event sourcing

What is event sourcing? Put simply, it lets us return the system to any point in the past. How can a system go back in time? Quite simply: first, all changes in the system are recorded as events; then we can return to any past moment by replaying those events.

Why can only events do this, and not ordinary messages? This goes back to the characteristics of events described above: an event has already happened and is immutable, it is naturally ordered and unique, and it is detailed and concrete, completely recording the scene of what happened. For event sourcing, therefore, events are first-class citizens of the system.

For example, if we collect all kinds of event information on the road, including traffic lights, traffic volume, weather, and congestion, we can "travel back in time", return to the traffic scene, and make a new decision. In an intelligent traffic scenario, when we want to verify a scheduling algorithm, we can replay all the events that occurred at that time to reproduce the scene.

You may think this is amazing, but in fact we use it all the time. Do you know where? In the version control systems we use every day, such as Git repositories on GitHub.

Here you may ask: if a system has many events, will replaying them take a long time? In some trading scenarios, for example, a large number of events are generated every day. What should we do? In such systems, a snapshot is generally taken every night. If the system goes down unexpectedly and needs to return to a certain point, we can take the previous day's snapshot and then replay that day's events to recover. During the day, all events are processed in memory without interacting with the database, so the system is very fast, and only the events are written to disk.
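The snapshot-plus-replay idea can be sketched as follows. This is a minimal illustration, assuming a hypothetical account whose events carry a monotonically increasing sequence number `seq`; the event types "deposited" and "withdrawn" are invented for the example.

```python
def apply_event(balance, event):
    """Fold one event into the current balance."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    return balance


def replay(snapshot, events, until_seq=None):
    """Rebuild state from a snapshot plus the events recorded after it.

    `events` must be ordered by `seq`; `until_seq` stops the replay early,
    which is how we return to a chosen point in the past.
    """
    balance = snapshot["balance"]
    for event in events:
        if event["seq"] <= snapshot["seq"]:
            continue  # already reflected in the snapshot
        if until_seq is not None and event["seq"] > until_seq:
            break
        balance = apply_event(balance, event)
    return balance


event_log = [
    {"seq": 1, "type": "deposited", "amount": 100},
    {"seq": 2, "type": "withdrawn", "amount": 30},
    {"seq": 3, "type": "deposited", "amount": 50},
    {"seq": 4, "type": "withdrawn", "amount": 20},
]
nightly_snapshot = {"seq": 2, "balance": 70}   # state after events 1 and 2

current = replay(nightly_snapshot, event_log)                  # replays 3 and 4
as_of_seq_3 = replay(nightly_snapshot, event_log, until_seq=3)
from_scratch = replay({"seq": 0, "balance": 0}, event_log)     # no snapshot
```

Replaying from the snapshot and replaying the full log from the beginning give the same result; the snapshot only shortens the replay.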

Of course, event sourcing is not suitable for all scenarios. It has advantages and disadvantages. See the figure above for details.

Inter-system integration

The first scenario, event notification, generally involves collaboration between an upstream team and a downstream team; the second scenario, event sourcing, generally involves development within a single team. Inter-system integration, however, often involves collaboration among three business teams. How should we understand this?

In fact, this is also very common: for example, a company has purchased an ERP system, and has also purchased an external attendance system, external marketing services, and so on. These systems have one thing in common: we did not develop them ourselves, but bought them.

What if we want to synchronize personnel information from the ERP system to the attendance system automatically and in real time? This is a bit troublesome, because neither system was developed by us.

1. We cannot modify the code of the ERP system to actively call the attendance system and send the personnel change information;

2. We also cannot modify the code of the attendance system to actively call the API of the external ERP system.

However, through the event bus we can collect the personnel change events generated by the upstream ERP system, using a webhook or standard API, then filter and transform them and push them to the downstream attendance system. Of course, the downstream could also be an internally developed service.

Therefore, the R&D model now becomes: the event center manages the events of all SaaS services as well as all events generated by internally developed systems. We then just need to find the events we need in the event center, subscribe to them, and perform simple service orchestration across SaaS services and internally developed systems to complete the development.


CQRS

The C in CQRS stands for Command, a definite instruction that generally covers Create/Update/Delete. The Q stands for Query. So the essence of CQRS is read/write separation: all write operations are completed in the system on the left of the figure, and the events produced by the resulting changes are synchronized to the query system on the right.

Here, readers may ask: how is this different from database read/write separation? Database read/write separation also provides a write DB and a read DB, with the two kept in sync, right?

A big difference is that database read/write separation is database-centric: the databases on both sides are identical, down to the data storage structure.

CQRS read/write separation, however, is business-centric. The data structures stored on the two sides are often different, and even the databases may differ, with each side choosing the technology that best fits its own read or write logic. For writes, we may use a relational database to guarantee transactions; for reads, we may use NoSQL databases such as Redis or HBase to improve performance.
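A minimal in-memory sketch of this separation is shown below. All names (OrderWriteModel, CustomerSpendingReadModel, the "order.created" event type) are hypothetical, and plain dictionaries stand in for the relational and NoSQL stores mentioned above. The point is only that the two sides keep differently shaped data and are connected by events.

```python
class OrderWriteModel:
    """Command side: stores orders keyed by order ID and emits events."""

    def __init__(self, publish):
        self._orders = {}
        self._publish = publish

    def create_order(self, order_id, customer, amount):
        self._orders[order_id] = {"customer": customer, "amount": amount}
        # The change is announced as an event after the write succeeds.
        self._publish({
            "type": "order.created",
            "order_id": order_id,
            "customer": customer,
            "amount": amount,
        })


class CustomerSpendingReadModel:
    """Query side: keeps a different structure, optimized for one question."""

    def __init__(self):
        self._totals = {}

    def on_event(self, event):
        if event["type"] == "order.created":
            customer = event["customer"]
            self._totals[customer] = self._totals.get(customer, 0) + event["amount"]

    def total_spent(self, customer):
        return self._totals.get(customer, 0)


read_model = CustomerSpendingReadModel()
write_model = OrderWriteModel(publish=read_model.on_event)

write_model.create_order("o-1", "alice", 30)
write_model.create_order("o-2", "alice", 12)
```

Here the read model is updated synchronously for simplicity; in a real deployment the events would travel through a broker and the read side would be updated asynchronously.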

Of course, CQRS is not suitable for all scenarios. It is often suitable for:

● When high-concurrency writes and high-concurrency reads must both be supported;

● When there is a big difference between the write model and the read model;

● When the read/write ratio is very high.

We have just covered four application scenarios for events, but events are not a cure-all. Just as there is no silver bullet in software development, there are many scenarios where events are not a good fit, including:

1. Synchronous call scenario with strong dependence on Response;

2. Scenarios requiring service invocation to maintain strong transaction consistency.

RocketMQ's solution to events

What capabilities are needed?

First, based on the event application scenarios described above, let's sort out which capabilities our system needs in order to do event-driven architecture well.

First, we need an event standard. An event is not meant only for its producer or for one particular party, but for everyone. As we said, an event carries no expectation: it has no designated consumer, and everyone is a potential consumer. We therefore need to standardize the definition of an event so that everyone can understand it at a glance.

Second, we need an event center that holds all the systems and the events they have registered. (This differs from messages: there is no message center, because a message is generally targeted and agreed between producer and consumer, a bit like a planned economy. When a message is produced, it already has a definite purpose and a designated consumer.) The event center is more like a hypermarket in a market economy, stocked with all kinds of events. Anyone can come in and browse, even without buying, and take home any event they may need.

Third, we need to have an event format to describe the specific content of the event. This is equivalent to a sales contract in a market economy. The format of the event sent by the producer must be determined and cannot always be changed; The format in which consumers receive events must also be determined, otherwise the whole market will be in chaos.

Fourth, we need the ability to deliver events to the consumer's target. Before delivery, events can be filtered and transformed so that they match the parameter format of the target API. We call this a subscription rule.

Fifth, we need to have a place to store events, which is the event bus in the middle.

Event standard

For the event standard, the first capability mentioned above, we chose CloudEvents, an open source project under the CNCF that has been widely integrated and has become a de facto standard.

The specification is simple. It defines four required attributes: id, source, type, and specversion; and several optional attributes: subject, time, dataschema, and datacontenttype, plus the event payload in data. The right side of the figure above shows a simple example, which we will not go through in detail here.
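As a small illustration of the required attributes, the following sketch checks whether an event (represented as a Python dictionary) carries all four of them. The function name missing_required is hypothetical, and this is not a full CloudEvents validator: it only checks that each required attribute is present and non-empty.

```python
# The four required CloudEvents context attributes.
REQUIRED_ATTRIBUTES = ("id", "source", "specversion", "type")


def missing_required(event):
    """Return the required attributes that are absent or empty."""
    return [name for name in REQUIRED_ATTRIBUTES if not event.get(name)]


complete = {
    "id": "evt-0001",
    "source": "/traffic/light-system",
    "specversion": "1.0",
    "type": "traffic.light.changed",
}
incomplete = {"id": "evt-0002", "data": {"to": "red"}}
```

For `complete` the function returns an empty list; for `incomplete` it returns the three attributes that are missing.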

In addition, event transport needs a defined protocol so that different systems can communicate. Three HTTP content modes are supported by default: Binary Content Mode, Structured Content Mode, and Batched Content Mode, which can be distinguished by the HTTP Content-Type header. The first two deliver a single event; the third delivers a batch of events.
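The three modes can be sketched as functions that turn an event into HTTP headers and a body. This is a simplified illustration based on the CloudEvents HTTP protocol binding, with assumptions stated plainly: the function names are hypothetical, the payload is assumed to be JSON, and details such as percent-encoding of header values are omitted.

```python
import json


def to_binary_mode(event):
    """Binary mode: attributes go into ce-* headers, data goes into the body."""
    headers = {
        "ce-" + name: str(value)
        for name, value in event.items()
        if name not in ("data", "datacontenttype")
    }
    # The payload's own content type becomes the HTTP Content-Type.
    headers["Content-Type"] = event.get("datacontenttype", "application/json")
    return headers, json.dumps(event.get("data"))


def to_structured_mode(event):
    """Structured mode: the whole event, attributes and data, is the body."""
    return {"Content-Type": "application/cloudevents+json"}, json.dumps(event)


def to_batched_mode(events):
    """Batched mode: a JSON array of structured events in one request."""
    return {"Content-Type": "application/cloudevents-batch+json"}, json.dumps(list(events))


example_event = {
    "specversion": "1.0",
    "id": "evt-0001",
    "source": "/traffic/light-system",
    "type": "traffic.light.changed",
    "datacontenttype": "application/json",
    "data": {"from": "green", "to": "red"},
}
binary_headers, binary_body = to_binary_mode(example_event)
structured_headers, structured_body = to_structured_mode(example_event)
batch_headers, batch_body = to_batched_mode([example_event, example_event])
```

A receiver can tell the modes apart from the Content-Type alone: the two cloudevents media types mark the structured and batched modes, and any other value indicates binary mode.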

Event Schema

The event schema describes the attributes of an event, their meanings, constraints, and other information. At present, we have chosen JSON Schema and OpenAPI 3.0. With the schema description of an event, we can verify the event's validity. Of course, changes to the schema itself must also follow compatibility principles, which we will not expand on here.

Event filtering and transformation

For event filtering and transformation, we provide 7 filtering methods and 4 transformation methods, as shown in the following figure:

Technical architecture

RocketMQ's product for this is called EventBridge, the new product we are open-sourcing this time.

Its architecture can be divided into two parts: the upper part is the control plane, and the lower part is the data plane.

At the top of the control plane, EventSource is the event source registered by each system. Events can be sent to the event bus through the APIGateway, or a SourceRunner can be generated from the configured EventSource to actively pull events from our systems. After events arrive at the event bus (EventBus), we can configure a subscription rule (EventRule), in which we specify how to filter events and how to transform them before they are delivered to the target. Based on the rules created, the system generates a TargetRunner, which pushes events to the specified target.

What are SourceRunner and TargetRunner? Which upstream sources and downstream targets can we connect to?

These can be registered in advance in the SourceRegister and TargetRegister below.

So the data plane of EventBridge is an open architecture. It defines an SPI for event processing, which can have multiple implementations underneath. For example, if we register RocketMQ's HTTP Connector with EventBridge, we can push events to an HTTP server.

If we register Kafka's JDBC Connector with EventBridge, we can push events to a database.

Of course, if your system does not use a general protocol such as HTTP or JDBC, you can develop your own connector to synchronize events to EventBridge in real time or to receive events from EventBridge.

In addition, we also have some additional operation and maintenance capabilities, including event tracking, event playback, event analysis, and event archiving.

RocketMQ-EventBridge and Cloud

Among all open source connectors that integrate with other upstream and downstream systems, we have a special connector, called EventBridgeConnector, through which we can easily integrate with the event bus on Alibaba Cloud. Here are two typical application scenarios:

The first scenario is that events generated inside the IDC system can be used not only for decoupling between internal systems, but also synchronized to the cloud in real time to drive computing services there, such as analyzing internally generated events offline with MaxCompute on the cloud, or driving an image recognition service on the cloud to analyze, in real time, the images referenced in the events.

The second scenario is that if IDC uses self-built MQ internally, we can also synchronize events to the cloud in real time through MQConnector and EventBridgeConnector, and gradually migrate the self-built MQ to the cloud.

Ecological development

As for the future direction of EventBridge, we hope to build, in open source, an event bus ecosystem that supports multi-cloud architecture. How should we understand this? In short, we hope to break down the walls between different cloud vendors, and between cloud vendors and internal IDC systems, and achieve interoperability through events. Although cloud computing has developed rapidly in recent years, some particularly large customers do not want to be tightly bound to a single cloud vendor. This is not only the result of full market competition, but also a way for key customers to reduce risk. How to interact flexibly, and even migrate, between different cloud vendors, and between cloud vendor systems and internal IDC systems, is therefore a very important demand of enterprises.

Of course, this is difficult to achieve. However, if we design and develop the enterprise architecture as an event-driven architecture, in which the interaction between different systems is based on events, it becomes much easier.

Events here are like a common language through which different systems can communicate. For example, events in the IDC system can drive services on Alibaba Cloud, and events on Alibaba Cloud can even drive services running on AWS.

In order to achieve this goal, we need to develop corresponding connectors when integrating with different cloud vendors and different SaaS system service providers.
