
Converged Database Ecosystem: Building CDC Applications with EventBridge

This article introduces how to use EventBridge to build CDC applications, covering CDC basics, the application of CDC on EventBridge, and several best-practice scenarios.

By Changfeng

Preface

Change Data Capture (CDC) is an application scenario in which upstream data changes are monitored and the changed information is synchronized to downstream services for further processing. In recent years, event-driven architecture (EDA) has grown in popularity and has become a preferred choice for many project architects. EDA fits naturally as the underlying infrastructure for CDC: data changes are treated as events, and each service drives its business logic by listening to the events it is interested in. EventBridge is a serverless event bus service from Alibaba Cloud that helps users build applications based on EDA. EventBridge event streams recently added CDC support based on the Alibaba Cloud DTS [1] service. This article introduces CDC, the application of CDC on EventBridge, and several best-practice scenarios to show how to easily build CDC applications with EventBridge.

CDC Overview

Basic Principles and Application Scenarios

CDC captures incremental data and schema changes from the source database and synchronizes them, in order and with high reliability and low latency, to a destination database, data lake, or other data analysis service. The mainstream open-source CDC tools in the industry currently include Debezium [2], Canal [3], and Maxwell [4].

(Image source: https://dbconvert.com)

CDC is mainly implemented in the following ways:

1. Timestamp or Version Number-Based

The timestamp-based method requires the database table to have a field that records the last update time. Whenever a row is inserted or updated, this timestamp field is updated accordingly. The CDC component periodically retrieves the records whose timestamp is later than the last synchronization time, which captures the changes made during the current period. Version-based tracking works on the same principle, except that developers must update the row's version number whenever they change the data.
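To make the timestamp-based approach concrete, here is a minimal Python sketch that uses an in-memory SQLite table as a stand-in source database. The `users` table, its `updated_at` column, and the `poll_changes` helper are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical source table with an "updated_at" timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)")
conn.executemany(
    "INSERT INTO users (id, name, updated_at) VALUES (?, ?, ?)",
    [(1, "alice", 100), (2, "bob", 105)],
)

def poll_changes(conn, last_sync):
    """Return rows updated after last_sync, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM users WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()
    new_mark = rows[-1][2] if rows else last_sync
    return rows, new_mark

# The first poll sees both rows; then the application updates bob.
changes, mark = poll_changes(conn, 0)
conn.execute("UPDATE users SET name = 'bobby', updated_at = 110 WHERE id = 2")

# The second poll only returns the row changed since the last sync.
second, mark2 = poll_changes(conn, mark)
print(second)  # [(2, 'bobby', 110)]
```

A pure timestamp scan cannot see rows that were physically deleted, and it can miss a row committed later with the same timestamp as the current high-water mark; these gaps are part of why the log-based approach is attractive.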

2. Snapshot-Based

The snapshot-based CDC implementation keeps three copies of the data source at the storage level: the original data, the previous snapshot, and the current snapshot. The data changes are obtained by comparing the two snapshots.
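The comparison step can be sketched in Python by treating each snapshot as a dictionary keyed by primary key. This is only an illustration of the diffing idea; real snapshot-based tools compare data at the storage level, and the `diff_snapshots` helper is hypothetical.

```python
def diff_snapshots(previous, current):
    """Classify the differences between two snapshots keyed by primary key."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    # Keys present in both snapshots whose row content changed.
    updates = {
        k: (previous[k], current[k])
        for k in current.keys() & previous.keys()
        if previous[k] != current[k]
    }
    return inserts, updates, deletes

prev = {1: {"name": "alice"}, 2: {"name": "bob"}, 3: {"name": "carol"}}
curr = {1: {"name": "alice"}, 2: {"name": "bobby"}, 4: {"name": "dave"}}
ins, upd, removed = diff_snapshots(prev, curr)
```

Because every run compares full copies, the cost grows with table size rather than with the number of changes.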

3. Trigger-Based

The trigger-based CDC implementation creates triggers on the source table that record each data change operation (INSERT, UPDATE, and DELETE). For example, you can create a table to record changes to a user table and create three triggers (one each for INSERT, UPDATE, and DELETE) that write the user changes to it.
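The following Python sketch builds the user-table example in an in-memory SQLite database. The `users` and `user_changes` tables and the trigger names are hypothetical, and trigger syntax differs across database engines, so treat this as an illustration of the pattern rather than MySQL-ready DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
-- Change table that the triggers append to, in commit order.
CREATE TABLE user_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, user_id INTEGER, old_name TEXT, new_name TEXT
);
CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
    INSERT INTO user_changes (op, user_id, new_name) VALUES ('INSERT', NEW.id, NEW.name);
END;
CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
    INSERT INTO user_changes (op, user_id, old_name, new_name)
    VALUES ('UPDATE', NEW.id, OLD.name, NEW.name);
END;
CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
    INSERT INTO user_changes (op, user_id, old_name) VALUES ('DELETE', OLD.id, OLD.name);
END;
""")

# Ordinary application writes; the triggers capture each change.
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.execute("UPDATE users SET name = 'alicia' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")

log = conn.execute(
    "SELECT op, user_id, old_name, new_name FROM user_changes ORDER BY seq"
).fetchall()
```

The triggers run inside every write transaction on the source table, which is the intrusiveness that the log-based method avoids.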

4. Log-Based

The three methods above are intrusive to the source database, whereas the log-based method is non-intrusive. Databases use transaction logs to implement disaster recovery; for example, the MySQL binlog records all user changes to the database. Log-based CDC continuously monitors the transaction logs to capture database changes in real time.

CDC has a wide range of application scenarios, including (but not limited to) database synchronization across remote data centers, heterogeneous database synchronization, microservice decoupling, cache updates, and CQRS.

Alibaba Cloud-Based CDC Solution: DTS

Alibaba Cloud Data Transmission Service (DTS) is a real-time data streaming service. DTS supports data transmission between data sources such as relational, NoSQL, and online analytical processing (OLAP) databases. DTS provides data synchronization, data migration, change tracking, data integration, and data processing features, enabling you to manage data within a secure, scalable, and highly available architecture. A DTS data subscription [5] helps you obtain real-time incremental data from user-created MySQL, ApsaraDB RDS for MySQL, and Oracle databases.

CDC's Application on EventBridge

Alibaba Cloud EventBridge provides event bus [6] and event stream [7] routing services in different application scenarios.

The event bus persists events at its underlying layer and can route them to multiple event targets as needed.

The event stream suits end-to-end streaming data processing scenarios. Events generated at the source are extracted, transformed, and analyzed in real time and then loaded into the destination without creating an event bus, which makes end-to-end delivery more efficient and easier to use.

EventBridge supports the data subscription feature of Alibaba Cloud DTS as an event stream source to meet your needs in CDC scenarios. With a simple configuration, you can synchronize database changes to EventBridge event streams.

EventBridge provides a custom DTS source connector built on the DTS SDK. When you configure an event stream whose event provider is DTS, the source connector pulls DTS records from the DTS server in real time. After pulling the data locally, the connector wraps it in a fixed structure that retains fields such as id, operationType, topicPartition, beforeImage, and afterImage, and it adds the system attributes required for streaming events.
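A downstream consumer typically needs to turn this structure into before and after row states. The Python sketch below assumes a simplified event shape in which the top-level `id` is the event ID and `data` carries `operationType`, `beforeImage`, and `afterImage` as flat column-to-value maps. The real DTS event delivered by EventBridge may nest these fields differently, so check the official sample before relying on this layout; `ChangeRecord` and `parse_event` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ChangeRecord:
    event_id: str
    operation_type: str
    before: Optional[dict[str, Any]]  # None for INSERT (assumed)
    after: Optional[dict[str, Any]]   # None for DELETE (assumed)

    def changed_columns(self) -> set[str]:
        """Columns whose values differ between the before and after images."""
        before = self.before or {}
        after = self.after or {}
        return {c for c in before.keys() | after.keys() if before.get(c) != after.get(c)}

def parse_event(event: dict[str, Any]) -> ChangeRecord:
    """Map the (assumed, simplified) event layout onto a ChangeRecord."""
    data = event["data"]
    return ChangeRecord(
        event_id=event["id"],
        operation_type=data["operationType"],
        before=data.get("beforeImage"),
        after=data.get("afterImage"),
    )

sample = {
    "id": "evt-1",
    "data": {
        "operationType": "UPDATE",
        "beforeImage": {"id": 7, "status": "Unpaid"},
        "afterImage": {"id": 7, "status": "Paid"},
    },
}
record = parse_event(sample)
print(record.operation_type, sorted(record.changed_columns()))  # UPDATE ['status']
```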

Please see the EventBridge official documentation for a sample DTS event.

EventBridge event streaming preserves the order of DTS events, but an event may be delivered more than once. Each EventId maps one-to-one to a DTS record, so you can use this field to process events idempotently.
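A minimal sketch of idempotent processing in Python, assuming each event is a dictionary whose top-level `id` field holds the event ID. The `IdempotentHandler` class is hypothetical and keeps processed IDs in memory; a production consumer would persist them (for example, in a table with a unique key on the event ID) so that deduplication survives restarts.

```python
from typing import Any, Callable

class IdempotentHandler:
    """Runs `process` at most once per event ID (in-memory sketch)."""

    def __init__(self, process: Callable[[dict[str, Any]], None]):
        self._process = process
        self._seen: set[str] = set()

    def handle(self, event: dict[str, Any]) -> bool:
        event_id = event["id"]
        if event_id in self._seen:
            return False  # duplicate delivery: already applied, skip it
        self._process(event)
        self._seen.add(event_id)  # record only after processing succeeded
        return True

applied = []
handler = IdempotentHandler(lambda e: applied.append(e["id"]))
event = {"id": "evt-42", "data": {"operationType": "INSERT"}}
handler.handle(event)
handler.handle(event)  # the redelivered duplicate is skipped
```

Recording the ID only after `process` returns means a failed attempt is retried on redelivery instead of being silently marked as done.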

Create an EventBridge Event Stream with Its Source as DTS

The following shows how to create an event stream whose source is DTS in the EventBridge console.

  • Preparation
  1. Activate the EventBridge service
  2. Create a DTS data subscription task
  3. Create a consumer group and its account information for consuming the subscribed data
  • Create an event stream

1) Log on to the EventBridge console. Click Event Stream on the left-side navigation pane. Click Create Event Stream on the Event Stream page.

2) In the Basic Information section, enter the Event Flow Name and Description as required.

3) When selecting the event provider, choose Database DTS from the drop-down list.

4) In the Data Subscription Tasks column, select the created DTS data subscription task. In the Consumer Group column, select the consumer group to consume subscription data and enter the consumer group password and initial consumption time.

5) Configure the event stream rules and targets as required. Then save and start the event stream. This creates an event stream that uses a DTS data subscription as its event source.

Notes

Note the following points when using DTS as an event source:

  1. EventBridge consumes DTS data in SUBSCRIBE consumption mode [8]. Therefore, make sure that no other client instances are running in the current DTS consumer group. If the specified consumer group has run before, the offset you pass in does not take effect, and consumption continues from the last offset consumed by the consumer group.
  2. The offset passed in when you create a DTS event source takes effect only for the first run. After the task is restarted, consumption continues from the last consumed offset.
  3. The EventBridge event stream subscribes to DTS data whose OperationType is INSERT, DELETE, UPDATE, or DDL.
  4. DTS event sources may deliver duplicate messages: messages are not lost, but exactly-once delivery is not guaranteed. We recommend that you make your event processing idempotent.
  5. To guarantee ordered consumption, set the exception tolerance policy to NONE, which means that no exceptions are tolerated. With this setting, if delivery to the event stream target fails, the entire event stream is suspended until the target returns to normal.
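The offset and ordering rules in these notes can be modeled with a small Python sketch. Here a list stands in for the subscribed change log and list positions stand in for offsets; `run_stream` is a hypothetical helper, not an EventBridge or DTS API. It resumes from the committed offset when one exists, uses the initial offset only on the first run, and stops at the first failed delivery without skipping it, which is how a no-tolerance policy keeps events in order.

```python
from typing import Any, Callable, Optional

def run_stream(
    log: list[dict[str, Any]],
    deliver: Callable[[dict[str, Any]], None],
    committed_offset: Optional[int],
    initial_offset: int,
) -> int:
    """Deliver events in order and return the offset to resume from next time."""
    # The initial offset only matters when nothing has been committed yet.
    offset = committed_offset if committed_offset is not None else initial_offset
    while offset < len(log):
        try:
            deliver(log[offset])
        except Exception:
            break  # halt: the failed event is retried first on the next run
        offset += 1  # advance only after a successful delivery
    return offset

log = [{"id": "e0"}, {"id": "e1"}, {"id": "e2"}]
received = []
fail_once = {"e1"}  # simulate a target that is briefly unavailable for e1

def deliver(event):
    if event["id"] in fail_once:
        fail_once.discard(event["id"])
        raise RuntimeError("target unavailable")
    received.append(event["id"])

first = run_stream(log, deliver, committed_offset=None, initial_offset=0)
second = run_stream(log, deliver, committed_offset=first, initial_offset=0)
```

After the first run stops at `e1`, the second run resumes there, so the target still receives `e0`, `e1`, `e2` in order. A policy that skipped failed events would keep the stream moving but could deliver later changes before earlier ones.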

Best Practice Examples

Implement CQRS Based on EventBridge

In the Command Query Responsibility Segregation (CQRS) model, the command model performs write and update operations, and the query model supports efficient read operations. Because the data models used for reads and writes differ, you need a way to keep them synchronized. CDC based on EventBridge event streams meets this requirement.

Using cloud services, you can build CQRS on EventBridge as follows:

  1. The command model writes changes to the database, and the query model reads data from Elasticsearch.
  2. Enable a DTS data subscription task to capture database changes.
  3. Configure an EventBridge event stream whose event provider is the DTS data subscription task and whose event target is Function Compute (FC).
  4. The service in FC updates the data in Elasticsearch.
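The FC service in step 4 essentially projects each change event onto the read model. The Python sketch below uses a plain dictionary as a stand-in for the Elasticsearch index and assumes a simplified event shape (`operationType`, `beforeImage`, and `afterImage` under `data`) plus a primary key column named `id`. `project` is a hypothetical helper; a real function would call an Elasticsearch client instead of mutating a dictionary.

```python
from typing import Any

def project(read_model: dict[Any, dict[str, Any]], event: dict[str, Any], key: str = "id") -> None:
    """Apply one change event to the query-side read model (document ID -> document)."""
    data = event["data"]
    op = data["operationType"]
    if op in ("INSERT", "UPDATE"):
        doc = data["afterImage"]
        read_model[doc[key]] = dict(doc)  # upsert the latest row state
    elif op == "DELETE":
        read_model.pop(data["beforeImage"][key], None)
    # DDL and any other operation types are ignored in this sketch.

index = {}
project(index, {"id": "e1", "data": {"operationType": "INSERT",
                                     "afterImage": {"id": 1, "status": "Unpaid"}}})
project(index, {"id": "e2", "data": {"operationType": "UPDATE",
                                     "beforeImage": {"id": 1, "status": "Unpaid"},
                                     "afterImage": {"id": 1, "status": "Paid"}}})
```

Writing the full after-image as an upsert makes the projection idempotent: a redelivered event leaves the read model unchanged.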

Microservice Decoupling

CDC can be used for microservice decoupling. Take the order processing system of an e-commerce platform as an example. When a new unpaid order is created, the database performs an INSERT operation. When the status of an order changes from Unpaid to Paid, the database performs an UPDATE operation. Depending on the order status, different backend microservices handle the change.

  1. The user places or pays for an order. The order system then performs the business processing and writes the data changes to the database.
  2. Create a DTS subscription task to capture the database changes.
  3. Build an EventBridge event stream whose event provider is the DTS data subscription task and whose event target is RocketMQ.
  4. When consuming the RocketMQ data, create three consumer groups on the same topic, each representing different business consumption logic:
  • Group A applies the captured database changes to the user cache so that users can query the order status.
  • Group B feeds the downstream financial system and processes only new orders: it handles events whose database operation type is INSERT and discards all other types of events.
  • Group C only cares about events in which the order status changes from Unpaid to Paid. When such an event arrives, it calls the downstream logistics and warehousing system to process the order.
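The filtering logic of the three consumer groups can be expressed as simple predicates. This Python sketch assumes a simplified event shape (`operationType`, `beforeImage`, and `afterImage` under `data`) and a hypothetical `status` column holding `Unpaid` or `Paid`; the function names are illustrative only, and in practice each predicate would run inside the corresponding consumer group's client.

```python
from typing import Any

def group_a_accepts(event: dict[str, Any]) -> bool:
    """Cache updater: every captured change is relevant."""
    return True

def group_b_accepts(event: dict[str, Any]) -> bool:
    """Financial system: only new orders (INSERT events)."""
    return event["data"]["operationType"] == "INSERT"

def group_c_accepts(event: dict[str, Any]) -> bool:
    """Logistics and warehousing: only orders moving from Unpaid to Paid."""
    data = event["data"]
    if data["operationType"] != "UPDATE":
        return False
    before = data.get("beforeImage") or {}
    after = data.get("afterImage") or {}
    return before.get("status") == "Unpaid" and after.get("status") == "Paid"

new_order = {"data": {"operationType": "INSERT",
                      "afterImage": {"id": 1, "status": "Unpaid"}}}
paid_order = {"data": {"operationType": "UPDATE",
                       "beforeImage": {"id": 1, "status": "Unpaid"},
                       "afterImage": {"id": 1, "status": "Paid"}}}
```

Group C compares the before and after images rather than only the current status, so later updates to an already paid order (for example, an address change) do not trigger the logistics system again.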

If the services were integrated through direct interface calls, the order system would have to call the cache update interface, the new order interface, and the order payment interface after each order is placed, which couples the business logic too tightly. With CDC, data consumers do not need to understand the semantics of the content returned by the upstream order processing interfaces. As long as the storage model stays the same, they decide directly from the data changes whether and how to process them. The natural message accumulation capability of a message queue also helps smooth out traffic peaks (peak shaving and valley filling) when orders surge.

EventBridge event streams also support other messaging products, such as RabbitMQ, Kafka, and MNS, so you can choose based on your needs.

Database Backup and Heterogeneous Database Synchronization

Database disaster recovery and heterogeneous database data synchronization are important application scenarios for CDC. You can use Alibaba Cloud EventBridge to build such applications quickly.

  1. Create a DTS data subscription task to capture changes to your MySQL database.
  2. Build an EventBridge event stream whose event provider is the DTS data subscription task.
  3. Use EventBridge to execute the specified SQL statements in the destination database to implement database backup (DBS).
  4. For heterogeneous synchronization, deliver the data change events to Function Compute, where your service updates the corresponding heterogeneous database based on the data changes.
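For the Function Compute path, the service has to translate each change event into a write against the target database. The Python sketch below turns an event into a parameterized SQL statement and applies it to an in-memory SQLite database standing in for the heterogeneous target. The event shape (`operationType`, `beforeImage`, and `afterImage` under `data`), the `id` primary key, and the `to_sql` helper are assumptions. Table and column names are interpolated directly into the SQL text, so they must come from a trusted schema, never from user input.

```python
import sqlite3
from typing import Any

def to_sql(table: str, event: dict[str, Any], key: str = "id") -> tuple[str, tuple[Any, ...]]:
    """Translate one change event into a parameterized SQL statement."""
    data = event["data"]
    op = data["operationType"]
    if op == "INSERT":
        row = data["afterImage"]
        cols = list(row)
        placeholders = ", ".join("?" for _ in cols)
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
        return sql, tuple(row[c] for c in cols)
    if op == "UPDATE":
        row = data["afterImage"]
        cols = [c for c in row if c != key]  # assumes at least one non-key column
        assignments = ", ".join(f"{c} = ?" for c in cols)
        sql = f"UPDATE {table} SET {assignments} WHERE {key} = ?"
        return sql, tuple(row[c] for c in cols) + (data["beforeImage"][key],)
    if op == "DELETE":
        return f"DELETE FROM {table} WHERE {key} = ?", (data["beforeImage"][key],)
    raise ValueError(f"unsupported operation type: {op}")

conn = sqlite3.connect(":memory:")  # stand-in for the heterogeneous target
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
events = [
    {"data": {"operationType": "INSERT", "afterImage": {"id": 1, "status": "Unpaid"}}},
    {"data": {"operationType": "UPDATE",
              "beforeImage": {"id": 1, "status": "Unpaid"},
              "afterImage": {"id": 1, "status": "Paid"}}},
]
for event in events:
    conn.execute(*to_sql("orders", event))
```

Using the before-image key in the WHERE clause means the statement still targets the right row if an UPDATE changes the primary key itself.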

Self-Built SQL Audit

You can also use EventBridge to build your own SQL audit.

  1. Create a DTS data subscription task to capture database changes.
  2. Build an EventBridge event stream whose event provider is DTS and whose event target is Log Service (SLS).
  3. To audit SQL statements, query the data in SLS.
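Log Service entries are essentially sets of key-value fields, so an audit pipeline mainly needs to flatten each change event into strings before it is written. This Python sketch assumes a simplified event shape in which the top-level `id` is the event ID and `data` carries `operationType`, `beforeImage`, and `afterImage`; the `to_audit_entry` helper and its output field names are hypothetical.

```python
import json
from typing import Any

def to_audit_entry(event: dict[str, Any]) -> dict[str, str]:
    """Flatten a change event into string key-value fields for an audit log."""
    data = event["data"]
    return {
        "event_id": str(event["id"]),
        "operation": str(data["operationType"]),
        # Row images are serialized with sorted keys so entries are easy to compare.
        "before": json.dumps(data.get("beforeImage"), sort_keys=True),
        "after": json.dumps(data.get("afterImage"), sort_keys=True),
    }

entry = to_audit_entry({
    "id": "evt-9",
    "data": {"operationType": "DELETE", "beforeImage": {"status": "Paid", "id": 1}},
})
```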

Summary

This article introduced the concepts of CDC, the application of CDC on EventBridge, and several best-practice scenarios. As more products are supported, the EventBridge ecosystem keeps expanding: from messaging to databases and from logs to big data, EventBridge continues to broaden its applicable fields and consolidate its position as an event hub on the cloud. EventBridge will continue to develop in this direction, with deeper technology and a broader ecosystem.

References

[1] DTS:
https://www.alibabacloud.com/product/data-transmission-service

[2] Debezium:
https://debezium.io/

[3] Canal:
https://github.com/alibaba/canal

[4] Maxwell:
https://github.com/zendesk/maxwell

[5] Overview of Change Tracking Scenarios:
https://www.alibabacloud.com/help/en/data-transmission-service/latest/overview-of-change-tracking-scenarios

[6] Event Bus:
https://www.alibabacloud.com/help/en/eventbridge/latest/event-bus-overview

[7] EventStreamings:
https://www.alibabacloud.com/help/en/eventbridge/latest/eventstreamings-overview

[8] SUBSCRIBE Consumption Pattern:
https://www.alibabacloud.com/help/en/data-transmission-service/latest/consume-tracked-data-use-the-sdk-demo-code-to-consume-tracked-data
