Interpreting EventBridge Transformation: Flexible Data Transformation and Processing

This article introduces the transformation capability of Alibaba Cloud EventBridge. It covers an overview of ETL, the Transform (T) capability, and practical scenarios for EventBridge transformation.

By Muze

Alibaba Cloud EventBridge offers a robust and versatile event bus service that seamlessly connects applications, Alibaba Cloud services, and serverless offerings to rapidly establish event-driven architectures (EDAs), fostering interactions between applications as well as between applications and the cloud. Moreover, it serves as a streaming data pipeline to expedite the development of ETL systems between various data warehouses and data processing or analysis tools.

This article describes the transformation capability of Alibaba Cloud EventBridge from the following aspects:

1) An introduction to the basic concepts of ETL.

2) An introduction to the T (transformation) capability.

3) An analysis of EventBridge's transformation capabilities and practical use cases.

1. What is ETL?

ETL stands for Extract, Transform, and Load, and is a crucial component of data integration. The three stages play the following roles:

1.1 Extract

Extracting data from sources, which could be diverse data storage systems like message queues and databases.

1.2 Transform

Transforming the extracted data, which may involve data enrichment, cleansing, aggregation, splitting, and format conversion.

1.3 Load

Loading the transformed data into a destination service, such as a data warehouse, data lake, or BI system. ETL's extensive application helps businesses manage and leverage data for data-driven decision-making and business transformation.

2. Transform (T) Capability

2.1 Application Scenarios of Transformation

The transformation (T) in ETL involves modifying the extracted data and is used in scenarios such as:

2.1.1 Data Enrichment

Enrich original data by calling external services to obtain additional information, enhancing the data's completeness and applicability.

2.1.2 Data Cleansing

Cleanse or verify the original data to remove duplicate, missing, or inaccurate entries and ensure data quality and accuracy, or anonymize the data to maintain security.

2.1.3 Data Aggregation

Aggregate multiple pieces of raw data into a unified view for easier analysis and querying.

2.1.4 Data Splitting

Divide single pieces of raw data into multiple pieces based on business needs.

2.1.5 Data Format Conversion

Transform upstream data into a format suitable for the target service, such as converting raw data from formats like Base64, Avro, and Protocol Buffers (PB) to JSON.

Transformation turns raw data into high-quality data with consistency, accuracy, and security, laying a dependable foundation for subsequent data analysis.
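As a rough illustration of the cleansing and format conversion scenarios above, the following Python sketch decodes Base64-encoded JSON records, drops sensitive and empty fields, and removes duplicates. The field names in `SENSITIVE_FIELDS` and the function names are hypothetical and chosen only for this example; they are not part of EventBridge.

```python
import base64
import json

# Hypothetical field names treated as sensitive in this sketch.
SENSITIVE_FIELDS = {"phone", "id_number"}


def decode_record(raw_b64):
    """Format conversion: Base64-encoded JSON text -> Python dict."""
    return json.loads(base64.b64decode(raw_b64).decode("utf-8"))


def cleanse(record):
    """Data cleansing: drop sensitive fields and fields with missing values."""
    return {
        key: value
        for key, value in record.items()
        if key not in SENSITIVE_FIELDS and value is not None
    }


def transform(raw_records):
    """Decode, cleanse, and de-duplicate a list of Base64-encoded records."""
    seen = set()
    result = []
    for raw in raw_records:
        record = cleanse(decode_record(raw))
        # A canonical JSON string serves as a fingerprint for duplicate detection.
        fingerprint = json.dumps(record, sort_keys=True)
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(record)
    return result
```

In practice the sensitive-field list and the duplicate criterion depend on the business data, so both would likely be configurable rather than hard-coded.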

2.2 Overview of Transformation Architecture in the Industry

Current industry practices for transformation capability include:

2.2.1 Built-in, Out-of-the-box, Simple and Lightweight Transformation Capability

Data cleansing: Removing sensitive fields from data and processing noise.

Data format conversion: Converting specified fields to a particular format.

2.2.2 Built-in Custom Transformation Capability

Allows for user-defined transformation logic. Users can implement an interface based on a custom transformation's specifications, compile the code into a JAR package, and then upload it to the system to use their custom transformation logic.

2.2.3 Remote Custom Transformation Capability

Employ remote calls to an external system for data transformation.

The first two practices are closely tied to system logic and share computational resources, making them suitable only for lightweight, simple tasks. Remote custom transformation, however, decouples the transformation logic from the data path, offering more flexibility.

2.3 Alibaba Cloud EventBridge Transformation Design

Alibaba Cloud EventBridge leverages Alibaba Cloud Function Compute for its custom transformation capability, using remote calls to decouple the transformation business logic from the data path. This enhances transformation flexibility and minimizes the risk of computational resource contention.

2.3.1 Link Architecture

The link architecture of EventBridge using Alibaba Cloud Function Compute for transformation is as follows:

• EventBridge extracts data from a source.

• Extracted and filtered data accumulates in a batch window until the batching condition is met, and is then passed in batches to the next step. During transformation, Function Compute processes the data using user-written function code, and EventBridge waits for and receives the processed results.

• EventBridge loads the transformed data to the sink side.
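The transformation step in this link could look roughly like the following Python function. This is a minimal sketch: `handler(event, context)` is the usual entry point shape for a Python function in Function Compute, but the payload layout (a JSON array of events in, a JSON array out) and the `transform_event` logic are assumptions made for illustration. Check the transformation documentation [1] for the actual contract.

```python
import json


def transform_event(event):
    """Hypothetical business logic: mark each event as processed."""
    enriched = dict(event)
    enriched["processed"] = True
    return enriched


def handler(event, context):
    """Entry point invoked once per batch.

    Assumption: the batch arrives as a JSON array of events (bytes or str),
    and the function returns a JSON array of transformed events.
    """
    batch = json.loads(event)
    return json.dumps([transform_event(item) for item in batch])
```

Because the function receives a whole batch, one invocation handles many records, which is what keeps the number of remote calls low.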

The following sections examine several key aspects of this link:

2.3.2 Batch Aggregation

EventBridge's batch window aggregates multiple data records and pushes them in batches to the next step once the batch criteria are met. Placing the batch step before transformation improves data processing efficiency and throughput and significantly reduces the number of calls to Function Compute. EventBridge controls batch operations based on quantity or time: once either condition is met, it triggers a batch push.

Batch push quantity: The maximum number of data records aggregated at once.

Batch push interval: The frequency of aggregation, with the system pushing aggregated data in batches at set intervals.
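The "quantity or time, whichever comes first" rule can be sketched as a small Python class. This is an illustrative model only, not EventBridge's implementation; the class name, method names, and the injectable `clock` parameter are assumptions introduced for the example.

```python
import time


class BatchWindow:
    """Buffers records and releases them by count or by elapsed time."""

    def __init__(self, max_count, max_wait_seconds, clock=time.monotonic):
        self.max_count = max_count                # batch push quantity
        self.max_wait_seconds = max_wait_seconds  # batch push interval
        self.clock = clock
        self.buffer = []
        self.window_start = None

    def add(self, record):
        """Buffer a record; return a batch if the count limit is reached, else None."""
        if not self.buffer:
            self.window_start = self.clock()  # the window opens on the first record
        self.buffer.append(record)
        if len(self.buffer) >= self.max_count:
            return self._flush()
        return None

    def poll(self):
        """Return the buffered batch if the interval has elapsed, else None."""
        if self.buffer and self.clock() - self.window_start >= self.max_wait_seconds:
            return self._flush()
        return None

    def _flush(self):
        batch, self.buffer = self.buffer, []
        self.window_start = None
        return batch
```

A caller would invoke `add` for each incoming record and `poll` periodically, forwarding any returned batch to the transformation step.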

2.3.3 High Availability

In case of transformation exceptions, to avoid data loss and ensure link stability and availability, EventBridge utilizes retry, dead-letter, and fault tolerance mechanisms.

Retry Mechanism

If a transformation exception occurs because of network issues or system crashes, EventBridge retries according to the user-selected policy. Two policies are currently supported: backoff retry and exponential decay retry.
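To make the difference between the two policies concrete, the sketch below generates the wait times that each style of policy might produce. The retry counts, delay ranges, and cap are illustrative defaults chosen for this example, and the reading of "backoff" as a random delay within a fixed range is an assumption. Refer to the EventBridge documentation for the values the service actually uses.

```python
import random


def backoff_delays(max_retries=3, min_delay=10.0, max_delay=20.0, rng=random):
    """Backoff-style retry: each retry waits a random delay within a fixed range."""
    return [rng.uniform(min_delay, max_delay) for _ in range(max_retries)]


def exponential_decay_delays(max_retries=12, base_delay=1.0, cap=512.0):
    """Exponential-decay-style retry: the delay doubles each attempt up to a cap."""
    return [min(base_delay * 2 ** attempt, cap) for attempt in range(max_retries)]
```

A few short retries suit transient glitches, while a long, slowly growing schedule gives a struggling downstream function time to recover.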

Dead-letter Queue

If data still cannot be transformed after the retry limit is exceeded, it becomes dead-letter data. Users can configure a dead-letter queue to avoid discarding this data, and EventBridge delivers all dead-letter data there. Kafka, RocketMQ, and MNS are supported as dead-letter queue destinations.

Fault Tolerance Policy

EventBridge offers two methods for handling transformation errors:

  • Exceptions allowed: The process is not blocked by a transformation exception; subsequent data is processed, and the abnormal data is retried. If the retry limit is exceeded, EventBridge delivers the data to a dead-letter queue or discards it, depending on the configuration.
  • No fault tolerance: No errors are permitted; if a transformation exception occurs and the retry limit is exceeded, the process is blocked.
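How retries, the dead-letter queue, and the fault tolerance policy interact can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: retries here run immediately and synchronously (the real service retries according to a timed policy while continuing with later data), and `process_batches`, `tolerate_errors`, and `dead_letters` are hypothetical names used only for illustration.

```python
def process_batches(batches, transform, max_retries, tolerate_errors, dead_letters=None):
    """Apply transform to each batch under a retry and fault tolerance policy.

    Returns the transformed batches. With tolerate_errors=False, a batch that
    fails every attempt raises an error, which blocks all subsequent batches.
    """
    results = []
    for batch in batches:
        last_error = None
        for _ in range(max_retries + 1):  # first attempt plus retries
            try:
                results.append(transform(batch))
                last_error = None
                break
            except Exception as error:
                last_error = error
        if last_error is None:
            continue
        if not tolerate_errors:
            # "No fault tolerance": stop here and block the pipeline.
            raise RuntimeError("transformation blocked") from last_error
        if dead_letters is not None:
            dead_letters.append(batch)  # deliver to the dead-letter queue
        # With no dead-letter queue configured, the batch is discarded.
    return results
```

The trade-off is between completeness and liveness: blocking guarantees that no data is skipped, while tolerating errors keeps the stream moving at the cost of handling failed data separately.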

2.3.4 Costs

Function Compute charges for calls and function execution, including invocation fees, resource usage (such as CPU and memory), and outbound internet traffic. To reduce costs, Function Compute offers a fee reduction for invocations from EventBridge: invocations that EventBridge triggers on Function Compute are no longer included in the bill [3,4].

2.4 Product Interaction

Currently, the transformation capability is available within the event stream feature of EventBridge.

EventBridge provides two ways to set up the Alibaba Cloud Function Compute function:

2.4.1 Create a Function Template

You can create a function directly from a provided template. The product includes a simple IDE in which you can write and debug code.

2.4.2 Bind Existing Functions

You can also bind an existing function. For more information, see the transformation help documentation (Reference [1]).

2.5 Advantages of Transformation

2.5.1 Serverless Transformation

EventBridge transformation is built on the serverless Function Compute service and inherits the benefits of serverless, including no O&M, elastic resources, and pay-as-you-go billing:

Elasticity: It supports scaling within 100 milliseconds, meeting the requirements of diverse load scenarios such as peaks and troughs, bursts, and steady continuous traffic.

No O&M: You do not need to manage the runtime environment or resources for transformation.

Pay-as-you-go billing: You pay only for the resources consumed by running functions. More importantly, you are not charged for function invocations triggered by EventBridge.

Flexibility: User-defined functions (UDFs) can meet complex and personalized business requirements.

Multi-language support: Mainstream languages such as Go, Python, Java, and Node.js are supported. You can select a familiar or suitable language for your custom transformation logic.

Architecture decoupling: The architecture of remote transformation decouples the transformation business logic from the system logic to isolate resources and avoid resource contention.

Template support: A variety of transformation function templates are provided at the product level to help users get started.

Efficiency improvement through batch operations: Because data is aggregated in batches, each function invocation receives a batch of messages as its input, which greatly improves message processing efficiency and throughput.

3. Introduction to Customer Use Cases

3.1 Data Format Conversion + Architecture Upgrade

Message (MNS) -> Transform -> Message (RocketMQ)

The customer was upgrading its architecture and wanted to replace the MNS that the system depended on with RocketMQ. However, the system architecture was complex: a large amount of logic depended on MNS, and many R&D personnel were involved, so the entire upgrade was expected to take several months. To ensure data consistency during the upgrade, the customer used EventBridge to synchronize MNS messages from the old architecture to the new RocketMQ instance in real time. In addition, to adapt to the message design of the new architecture, the customer used Function Compute (FC) transformation to convert old messages into the target format before delivering them to RocketMQ.

3.2 Data Cleansing + Data Dumping

Message (RocketMQ) -> Transform -> OSS

The customer delivered user-generated video data, which users could view, to RocketMQ. For file storage, the customer selected OSS, which meets the need for low-cost storage in this write-heavy, read-light scenario. However, the video data contained some sensitive information. Therefore, the customer used FC transformation to remove the sensitive data from the videos before delivering them to OSS.

4. Summary and Outlook

By integrating Function Compute, EventBridge transformation meets the complex and personalized needs of businesses. Customers value its elasticity, freedom from O&M, and pay-as-you-go billing. In the future, EventBridge transformation will unlock more business scenarios by integrating more services, such as CloudFlow and HTTP destinations, to meet diverse requirements.

References:

[1] EventBridge-Event Stream-Event Transformation

[2] EventBridge-Homepage of Event Stream Products

[3] Fee reduction for requests from Alibaba Cloud message services and CloudFlow

[4] Notice on reduction of prices of Function Compute
