Build a CDC application with EventBridge

Introduction

CDC (Change Data Capture) refers to an application scenario in which upstream data changes are monitored and the change information is synchronized to downstream services for further processing. In recent years, event-driven architecture (EDA) has grown steadily in popularity and has become a common choice among architects. EDA is a natural fit as the underlying infrastructure for CDC: data changes are treated as events, and each service drives its own business logic by listening to the events it is interested in. Alibaba Cloud EventBridge is a serverless event bus service that helps users quickly build applications based on EDA. Recently, EventBridge event streams added support for CDC based on the Alibaba Cloud DTS [1] service. This article covers a CDC overview, the application of CDC on EventBridge, and several best practice scenarios to show how to build CDC applications with EventBridge.

CDC Overview

Basic principles and application scenarios

CDC captures incremental data and data schema changes from the source database and, through highly reliable, low-latency data transmission, synchronizes these changes in order to the target database, data lake, or other data analysis services. Mainstream open source CDC tools include Debezium [2], Canal [3], and Maxwell [4].

Image source: https://dbconvert.com

CDC implementations in the industry mainly fall into the following types:

1. Based on timestamp or version number

The timestamp-based approach requires a field in the database table that stores the last update time. Whenever a row is inserted or updated, this field is updated accordingly. The CDC component periodically retrieves the records whose update time is later than the last synchronization time, which captures the data changes made within that cycle. Version-number-based tracking works on the same principle, and requires developers to update the version number whenever they change the data.
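
A minimal sketch of the polling step, assuming a hypothetical `orders` table with an integer `updated_at` column and using SQLite only so that the example is self-contained. The table, column names, and the `fetch_changes` helper are illustrative and do not come from any specific CDC tool.

```python
import sqlite3

def fetch_changes(conn, last_sync):
    """Return rows updated after last_sync, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()
    # The newest timestamp seen becomes the starting point of the next cycle.
    new_mark = rows[-1][2] if rows else last_sync
    return rows, new_mark

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Unpaid", 100), (2, "Unpaid", 105)],
)

# First cycle: everything is new.
first_batch, mark = fetch_changes(conn, 0)

# The application changes a row and bumps its timestamp.
conn.execute("UPDATE orders SET status = 'Paid', updated_at = 110 WHERE id = 1")

# Second cycle: only the row changed since the last cycle is returned.
second_batch, mark = fetch_changes(conn, mark)
```

Note that a query of this kind cannot see deleted rows, which is one reason this approach is usually combined with soft deletes or another mechanism.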

2. Snapshot based

The snapshot-based CDC implementation keeps three copies of the data source at the storage level: the original data, the previous snapshot, and the current snapshot. The data changes between the two snapshots are obtained by comparing them.
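
The comparison step can be sketched as follows, assuming each snapshot is held as an in-memory dictionary keyed by primary key. The `diff_snapshots` helper and the row layout are hypothetical and only illustrate the idea.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots keyed by primary key and list the changes."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("INSERT", key, None, row))
        elif previous[key] != row:
            changes.append(("UPDATE", key, previous[key], row))
    for key, row in previous.items():
        if key not in current:
            changes.append(("DELETE", key, row, None))
    return changes

previous = {1: {"status": "Unpaid"}, 2: {"status": "Unpaid"}}
current = {1: {"status": "Paid"}, 3: {"status": "Unpaid"}}

# Each change is (operation, key, old row, new row).
changes = diff_snapshots(previous, current)
```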

3. Trigger based

The trigger-based CDC implementation creates triggers on the source table to record data change operations (INSERT, UPDATE, DELETE). For example, a dedicated table is created to record user changes, and then three triggers (INSERT, UPDATE, and DELETE) are created to write each user change to this table.
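
A small sketch of this pattern, using SQLite so that it runs without a database server. The `users` and `users_changes` tables and the trigger names are made up for illustration; trigger syntax differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);

-- Dedicated table that records every change made to users.
CREATE TABLE users_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, user_id INTEGER, old_name TEXT, new_name TEXT
);

CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
    INSERT INTO users_changes (op, user_id, old_name, new_name)
    VALUES ('INSERT', NEW.id, NULL, NEW.name);
END;

CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
    INSERT INTO users_changes (op, user_id, old_name, new_name)
    VALUES ('UPDATE', NEW.id, OLD.name, NEW.name);
END;

CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
    INSERT INTO users_changes (op, user_id, old_name, new_name)
    VALUES ('DELETE', OLD.id, OLD.name, NULL);
END;
""")

conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("UPDATE users SET name = 'bob' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")

# A CDC component would read this table in seq order.
change_log = conn.execute(
    "SELECT op, user_id, old_name, new_name FROM users_changes ORDER BY seq"
).fetchall()
```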

4. Log based

The three methods above are all somewhat invasive to the source database, whereas the log-based method is non-invasive. Databases use transaction logs for disaster recovery. For example, MySQL's binlog records all user changes to the database. Log-based CDC continuously monitors the transaction log to obtain database changes in real time.
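
To illustrate the idea only, the sketch below replays an ordered change log into a replica and remembers how far it has read. The log here is a plain Python list with an invented entry layout; it is not the MySQL binlog format, and the `apply_log` helper is hypothetical.

```python
def apply_log(log, replica, start_offset=0):
    """Apply log entries from start_offset onward; return the next offset to read."""
    for entry in log[start_offset:]:
        if entry["op"] == "DELETE":
            replica.pop(entry["key"], None)
        else:
            # INSERT and UPDATE both carry the full new row in this sketch.
            replica[entry["key"]] = entry["row"]
    return len(log)

log = [
    {"op": "INSERT", "key": 1, "row": {"status": "Unpaid"}},
    {"op": "UPDATE", "key": 1, "row": {"status": "Paid"}},
    {"op": "INSERT", "key": 2, "row": {"status": "Unpaid"}},
]
replica = {}
offset = apply_log(log, replica)

# New entries are appended by the source; the reader resumes from its offset.
log.append({"op": "DELETE", "key": 2, "row": None})
offset = apply_log(log, replica, offset)
```

Because the reader only consumes the log, the source tables need no extra columns or triggers.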

CDC has a wide range of application scenarios, including but not limited to database synchronization across data centers in different regions, data synchronization between heterogeneous databases, microservice decoupling, cache updates, and CQRS.

Alibaba Cloud based CDC solution: DTS

DTS (Data Transmission Service) is a real-time data streaming service provided by Alibaba Cloud. It supports data exchange between data sources such as relational databases (RDBMS), non-relational databases (NoSQL), and multi-dimensional data analysis systems (OLAP), and integrates data synchronization, migration, subscription, integration, and processing. The DTS data subscription [5] function helps users obtain real-time incremental data from self-managed MySQL, RDS MySQL, and other databases.

Application of CDC on EventBridge

Alibaba Cloud EventBridge provides event routing services for two different application scenarios: event bus [6] and event stream [7].

The event bus persists events in its underlying layer and can route them to multiple event targets as needed.

The event stream is suitable for end-to-end streaming data processing scenarios. It extracts, transforms, analyzes, and loads events generated at the source into the target in real time, without the need to create an event bus. This makes end-to-end delivery more efficient and easier to use.

To better support users in CDC scenarios, EventBridge supports the data subscription function of Alibaba Cloud DTS on the source side of the event stream. With simple configuration, users can synchronize database changes to an EventBridge event stream.

EventBridge provides a custom DTS source connector built on the DTS SDK. When a user configures an event stream with DTS as the event provider, the source connector pulls DTS record data from the DTS server in real time. After the data is pulled, it is wrapped in an event structure that preserves fields such as id, operationType, topicPartition, beforeImage, and afterImage, and adds the system attributes required for streaming events.

For DTS event samples, refer to the official EventBridge documentation.

EventBridge event streams preserve the order of DTS events, but an event may be delivered more than once. The event id maps one-to-one to a DTS record, so users can use this field to process events idempotently.

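The idempotent processing described above can be sketched as follows. The example assumes that each event is available as a dictionary with an `id` field, as listed earlier; the exact event structure should be taken from the official documentation. The `make_idempotent` wrapper is hypothetical, and the in-memory set stands in for what would likely need to be a durable store in a real deployment.

```python
def make_idempotent(handler):
    """Wrap a handler so that each event id is processed at most once."""
    seen_ids = set()  # stand-in for a durable store of processed ids

    def wrapper(event):
        event_id = event["id"]
        if event_id in seen_ids:
            return False  # duplicate delivery, skip it
        handler(event)
        seen_ids.add(event_id)  # mark as done only after the handler succeeds
        return True

    return wrapper

processed = []
handle = make_idempotent(processed.append)

events = [
    {"id": "rec-1", "operationType": "INSERT"},
    {"id": "rec-2", "operationType": "UPDATE"},
    {"id": "rec-1", "operationType": "INSERT"},  # redelivered duplicate
]
results = [handle(event) for event in events]
```
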
Create an EventBridge event stream with a DTS source

The following shows how to create an event stream with a DTS source in the EventBridge console.

• Prerequisites

1. Activate the EventBridge service;

2. Create a DTS data subscription task;

3. Create a consumer group account for consuming the subscription data.

• Create an event stream



1. Log in to the EventBridge console, select "Event Flow" in the left navigation bar, and click "Create Event Flow" on the event stream list page;

2. In "Basic Information", fill in "Event Flow Name" and "Description" as needed;

3. When selecting the event provider for the event stream, select "Database DTS" from the drop-down box;

4. In the "Data Subscription Task" field, select the DTS data subscription task you created. In the consumer group field, select the consumer group used to consume the subscription data, and fill in the consumer group password and the initial consumption time;

5. Fill in the event stream rules and target as needed, then save and start the event stream. This creates an event stream that uses the DTS data subscription as its event source.

Notes

Note the following points when using a DTS event source:

1. EventBridge uses the SUBSCRIBE consumption mode [8], so make sure that no other client instance is running in the current DTS consumer group. If the configured consumer group has run before, the consumption position passed in is ignored, and consumption continues from the position this consumer group last reached;

2. The consumption position passed in when creating a DTS event source takes effect only the first time a new consumer group runs. When the task is restarted later, consumption continues from the last consumed position;

3. The EventBridge event stream subscribes to DTS data with OperationTypes of INSERT, DELETE, UPDATE, and DDL;

4. Using DTS event sources may result in duplicate messages. Messages are guaranteed not to be lost, but they are not guaranteed to be delivered only once, so users are advised to implement idempotent processing;

5. If users need to ensure sequential consumption, they need to set the exception tolerance policy to "NONE", that is, exceptions are not tolerated. In this case, if the target of the event stream fails to consume a message, the entire event stream pauses until the target returns to normal.
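
The trade-off in the last point can be illustrated with a toy model. This is not how EventBridge is implemented; the `deliver` function, the `tolerate_errors` flag, and the string events are invented purely to show why tolerating failures breaks ordering while pausing preserves it.

```python
def deliver(events, send, tolerate_errors):
    """Deliver events in order; return (delivered, skipped, paused_at)."""
    delivered, skipped = [], []
    for position, event in enumerate(events):
        try:
            send(event)
        except RuntimeError:
            if not tolerate_errors:
                # Strict mode: stop here so later events never overtake this one.
                return delivered, skipped, position
            skipped.append(event)
            continue
        delivered.append(event)
    return delivered, skipped, None

def flaky_target(event):
    """Hypothetical target that rejects one particular event."""
    if event == "order-2":
        raise RuntimeError("target unavailable")

events = ["order-1", "order-2", "order-3"]
strict = deliver(events, flaky_target, tolerate_errors=False)
lenient = deliver(events, flaky_target, tolerate_errors=True)
```

In strict mode, delivery stops at position 1 and `order-3` waits. In lenient mode, `order-3` is delivered even though `order-2` was not, so the target no longer sees the changes in their original order.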

Best Practice Examples

Implementing CQRS based on EventBridge

In the CQRS (Command Query Responsibility Segregation) model, the command model performs write and update operations, and the query model supports efficient read operations. Because the read and write sides use different data models, some mechanism is needed to keep their data in sync. CDC based on EventBridge event streams meets this requirement.

Using cloud services, users can build an EventBridge-based CQRS system with the following steps:

1. The command model writes changes to the database, and the query model reads data from Elasticsearch;

2. Enable a DTS data subscription task to capture DB changes;

3. Configure an EventBridge event stream with the DTS data subscription task as the event provider and Function Compute (FC) as the event target;

4. The function deployed in FC updates the data in Elasticsearch.
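
The projection logic that such a function might run can be sketched as follows. The example makes several assumptions that are not confirmed by this article: events arrive as dictionaries, `beforeImage` and `afterImage` are flat column-to-value mappings, and the primary key column is called `id`. A plain dictionary stands in for the Elasticsearch index so that the sketch runs on its own.

```python
def apply_to_read_model(event, index):
    """Project one change event onto a read model keyed by primary key."""
    op = event["operationType"]
    if op in ("INSERT", "UPDATE"):
        row = event["afterImage"]
        index[row["id"]] = row  # upsert the latest version of the row
    elif op == "DELETE":
        index.pop(event["beforeImage"]["id"], None)
    # DDL and other event types do not change documents in this sketch.

index = {}  # stand-in for the query-side store
apply_to_read_model(
    {"operationType": "INSERT", "beforeImage": None,
     "afterImage": {"id": 7, "status": "Unpaid"}},
    index,
)
apply_to_read_model(
    {"operationType": "UPDATE",
     "beforeImage": {"id": 7, "status": "Unpaid"},
     "afterImage": {"id": 7, "status": "Paid"}},
    index,
)
```

Writing the whole row as an upsert keeps the handler safe to run again if the same event is delivered twice.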

Microservice decoupling

CDC can also be used for microservice decoupling. For example, consider an order processing system for an e-commerce platform. When a new unpaid order is generated, the database sees an INSERT operation, and when the status of an order changes from "Unpaid" to "Paid", the database sees an UPDATE operation. Different backend microservices handle the order depending on how its status changes.

1. When a user places an order/pays, the order system performs business processing and writes data changes to the DB;

2. Create a new DTS subscription task to capture DB data changes;

3. Build an EventBridge event stream with the DTS data subscription task as the event provider and RocketMQ as the event target;

4. When consuming the RocketMQ data, use three consumer groups under the same topic, each representing different business logic:

A. GroupA updates the user cache with the captured DB changes so that users can query the order status;

B. GroupB feeds the downstream financial system, which only handles new orders, that is, events whose DB operation type is INSERT. It discards all other event types;

C. GroupC only cares about events in which the order status changes from "Unpaid" to "Paid". When such an event arrives, it calls the downstream logistics and warehousing systems to process the order further.
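
The per-group logic in step 4 can be sketched as three small functions. The event layout (dictionaries with `operationType`, `beforeImage`, and `afterImage`) and the `order_id` and `status` column names are assumptions made for illustration, and the RocketMQ consumer wiring is left out.

```python
def update_cache(event, cache):
    """GroupA: mirror every captured change into the order cache."""
    if event["operationType"] == "DELETE":
        cache.pop(event["beforeImage"]["order_id"], None)
    else:
        row = event["afterImage"]
        cache[row["order_id"]] = row["status"]

def is_new_order(event):
    """GroupB: only INSERT events matter to the financial system."""
    return event["operationType"] == "INSERT"

def is_payment(event):
    """GroupC: only the Unpaid to Paid transition triggers fulfilment."""
    return (
        event["operationType"] == "UPDATE"
        and event["beforeImage"]["status"] == "Unpaid"
        and event["afterImage"]["status"] == "Paid"
    )

events = [
    {"operationType": "INSERT", "beforeImage": None,
     "afterImage": {"order_id": 1, "status": "Unpaid"}},
    {"operationType": "UPDATE",
     "beforeImage": {"order_id": 1, "status": "Unpaid"},
     "afterImage": {"order_id": 1, "status": "Paid"}},
]

cache = {}
for event in events:
    update_cache(event, cache)
new_orders = [e for e in events if is_new_order(e)]
payments = [e for e in events if is_payment(e)]
```

Each group decides from the data alone whether an event concerns it, so the order system does not need to know which consumers exist.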

With direct interface calls, the order system would have to invoke the cache update interface, the new order interface, and the order payment interface after a user places an order, which couples the services tightly. With CDC, by contrast, data consumers do not need to understand the semantics of the responses returned by the upstream order processing interface. As long as the storage model stays unchanged, they can decide whether and how to handle a data change directly at the data level. The message backlog capability inherent to message queues also helps users smooth out traffic (peak shaving and valley filling) when orders peak.

The messaging products currently supported by EventBridge event streams also include RabbitMQ, Kafka, and MNS, among others. Users can choose according to their own needs.

Database Backup & Heterogeneous Database Synchronization

Database disaster recovery and heterogeneous database data synchronization are also important application scenarios for CDC. Alibaba Cloud EventBridge can also be used to quickly build such applications.

1. Create a new DTS data subscription task to capture changes to the user's MySQL database;

2. Build an EventBridge event stream with the DTS data subscription task as the event provider;

3. Use EventBridge to execute the specified SQL statements on the target database to achieve database backup;

4. Deliver the data change events to Function Compute, where the user's business logic updates the corresponding heterogeneous database based on the change content.
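
As an illustration of turning change events into SQL for a backup database, the sketch below maps each event to a parameterized statement and applies it to an in-memory SQLite database. It assumes flat `beforeImage` and `afterImage` dictionaries and a primary key column named `id`; the `to_sql` helper is hypothetical and is not part of EventBridge.

```python
import sqlite3

def to_sql(event, table):
    """Translate a change event into a parameterized SQL statement."""
    op = event["operationType"]
    if op == "INSERT":
        row = event["afterImage"]
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        return f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(row.values())
    if op == "UPDATE":
        row = event["afterImage"]
        sets = ", ".join(f"{col} = ?" for col in row)
        key = event["beforeImage"]["id"]
        return f"UPDATE {table} SET {sets} WHERE id = ?", list(row.values()) + [key]
    if op == "DELETE":
        return f"DELETE FROM {table} WHERE id = ?", [event["beforeImage"]["id"]]
    raise ValueError(f"unsupported operation: {op}")

events = [
    {"operationType": "INSERT", "beforeImage": None,
     "afterImage": {"id": 1, "status": "Unpaid"}},
    {"operationType": "INSERT", "beforeImage": None,
     "afterImage": {"id": 2, "status": "Unpaid"}},
    {"operationType": "UPDATE",
     "beforeImage": {"id": 1, "status": "Unpaid"},
     "afterImage": {"id": 1, "status": "Paid"}},
    {"operationType": "DELETE",
     "beforeImage": {"id": 2, "status": "Unpaid"}, "afterImage": None},
]

backup = sqlite3.connect(":memory:")
backup.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
for event in events:
    sql, params = to_sql(event, "orders")
    backup.execute(sql, params)
rows = backup.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
```

Values are passed as parameters, but the table and column names are interpolated into the statement, so real code would need to validate them against a known schema.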

Self-built SQL audit

Users who need to build their own SQL audit can also do so easily with EventBridge.

1. Create a new DTS data subscription task to capture database changes;

2. Build an EventBridge event stream, with DTS as the event provider and SLS as the event target;

3. When users need to audit SQL, they can query SLS.

Summary

This article introduced the basic concepts of CDC, its application on EventBridge, and several best practice scenarios. As more products are supported, the EventBridge ecosystem keeps expanding: from messaging to databases, and from logging to big data. EventBridge continues to broaden its application fields and strengthen its position as an event hub on the cloud, and it will keep developing in this direction with deeper technology and a wider ecosystem.
