RDS offers the Zero-ETL feature, enabling data synchronization to data warehouses such as AnalyticDB for MySQL, AnalyticDB for PostgreSQL, or ClickHouse without the need to establish or maintain synchronization links. These synchronization links are provided free of charge, reducing both data transmission and operational costs.
Supported regions
China (Beijing), China (Hangzhou), China (Shanghai), and China (Shenzhen)
The feature is being introduced to RDS instances in various regions in stages. The information in the ApsaraDB RDS console is authoritative.
Overview of the solution
In the big data era, enterprise data is dispersed across various systems and platforms, posing significant challenges in data management and utilization. Traditionally, enterprises have relied on ETL tools to consolidate scattered data into data warehouses for business decision-making. However, traditional ETL processes often encounter issues such as:
-
Increased system complexity: Users must maintain ETL tools themselves, which adds to operational complexity and distracts from core business activities.
-
Increased resource costs: Different data sources may require distinct ETL tools, leading to higher resource expenses for establishing ETL links.
-
Reduced data timeliness: Certain ETL processes rely on periodic batch updates, which can delay the availability of analysis results in near-real-time scenarios.
To overcome these challenges, Alibaba Cloud ApsaraDB introduces the Zero-ETL feature, which swiftly establishes data synchronization links between OLTP and OLAP systems. By automating data extraction, transformation, and loading, Zero-ETL facilitates the integration of transaction processing and data analysis, enabling enterprises to concentrate on data analytics and enhance efficiency.
Advantages of the feature
-
Simple and easy to use: Users can avoid creating and maintaining complex ETL data pipelines. By simply selecting the source data and the target instance, they can automatically establish real-time data synchronization links. This significantly simplifies data pipeline management and allows users to focus on higher-level application development.
-
Zero cost: Zero-ETL links are provided at no charge, enabling cost-free real-time analysis of input data and helping enterprises save on resources.
-
Multi-source aggregation: The feature supports real-time synchronization of data from multiple instances to a single data warehouse, such as AnalyticDB for MySQL, AnalyticDB for PostgreSQL, or ClickHouse, creating a comprehensive analysis perspective.
NoteReal-time data synchronization from multiple instances to a ClickHouse instance is limited to non-overlapping synchronization objects for different tasks.
Supported links
-
RDS MySQL->ClickHouse
-
RDS MySQL->AnalyticDB MySQL 3.0
-
RDS MySQL->AnalyticDB PostgreSQL
Cost description
The Zero-ETL synchronization links are provided free of charge.
Prerequisites
-
The source RDS MySQL instance must be created.
-
The ClickHouse instance, AnalyticDB MySQL Data Warehouse Edition cluster, or AnalyticDB PostgreSQL instance must be created in the same region as the RDS MySQL instance.
-
An RDS MySQL account, ClickHouse account, AnalyticDB for MySQL cluster account, and AnalyticDB for PostgreSQL database account must be created.
Preparations
Create a service-linked role and grant the necessary management permissions to a RAM user.
RDS MySQL->ClickHouse
-
Create the service-linked role AliyunServiceRoleForClickHouseZeroETL.
NoteThe system will automatically create this role when you establish the link. Manual creation is not necessary.
-
Grant the necessary management permissions to a RAM user.
To enable a RAM user to create Zero-ETL tasks, you must assign the following permissions to the RAM user. For more information, see create a custom policy.
-
Permissions on the source RDS MySQL: AliyunRDSFullAccess.
-
Permissions for the destination ClickHouse: The script below describes the custom policy for ClickHouse.
{ "Version": "1", "Statement": [ { "Action": "clickhouse:*", "Resource": "*", "Effect": "Allow" }, { "Action": "ram:CreateServiceLinkedRole", "Resource": "*", "Effect": "Allow", "Condition": { "StringEquals": { "ram:ServiceName": "clickhouse.aliyuncs.com" } } } ] }
-
Permissions on Data Transmission Service (DTS): The following script describes the custom policy for DTS.
{ "Version": "1", "Statement": [ { "Action": "dts:*", "Resource": "*", "Effect": "Allow" }, { "Action": "ram:PassRole", "Resource": "*", "Effect": "Allow", "Condition": { "StringEquals": { "acs:Service": "dts.aliyuncs.com" } } } ] }
-
RDS MySQL->AnalyticDB MySQL 3.0
-
Create the service-linked role AliyunServiceRoleForAnalyticDBForMySQL.
-
Navigate to the RAM console and select the roles list in the left-side navigation pane. Verify if the service-linked role named AliyunServiceRoleForAnalyticDBForMySQL exists. If not, create the role.
-
Click Create Role in the upper-left corner.
-
In the Create Role dialog box, select Alibaba Cloud Service and click Next.
-
Choose Service-Linked Role as the role type and select AnalyticDB for MySQL.
-
Click Complete to confirm the creation of the service-linked role.
-
-
Grant the necessary management permissions to a RAM user.
To grant a RAM user permissions on RDS MySQL instances and AnalyticDB for MySQL clusters, refer to the following information. For more details, see create a custom policy.
Below are the scripts for custom policies:
Grant permissions on all RDS MySQL instances and AnalyticDB for MySQL clusters
{ "Version": "1", "Statement": [ { "Effect": "Allow", "Action": "dts:*", "Resource": [ "acs:adb:*:*:*", "acs:rds:*:*:*" ] }, { "Effect": "Allow", "Action": [ "dts:DescribeRegions", "dts:DescribeConfigRelations", "dts:DescribeSrcLinkConfig", "dts:DescribeDestLinkConfig", "dts:DescribeLinkConfig", "dts:DescribeConciseJobStatics", "dts:ListUserAuthorizationLogs", "dts:CreateUserAuthorization" ], "Resource": [ "acs:dts:*:*:*" ] } ] }
Grant permissions on specific RDS MySQL instances and AnalyticDB for MySQL clusters
{ "Version": "1", "Statement": [ { "Effect": "Allow", "Action": "dts:*", "Resource": [ "acs:adb:*:*:dbcluster/am-2zeod8ax4b9a****", Use the ID of the AnalyticDB for MySQL cluster to be synchronized "acs:rds:*:*:dbinstance/rm-2ze6fs8ouh43****", Use the ID of the RDS instance to be synchronized ] }, { "Effect": "Allow", "Action": [ "dts:DescribeRegions", "dts:DescribeConfigRelations", "dts:DescribeSrcLinkConfig", "dts:DescribeDestLinkConfig", "dts:DescribeLinkConfig", "dts:DescribeConciseJobStatics", "dts:ListUserAuthorizationLogs", "dts:CreateUserAuthorization" ], "Resource": "acs:dts:*:*:*" } ] }
RDS MySQL->AnalyticDB PostgreSQL
-
Create the service-linked role AliyunServiceRoleForADBPG.
-
Go to the RAM console and check the roles list in the left-side navigation pane for the service-linked role named AliyunServiceRoleForADBPG. If it does not exist, create the role.
-
Click Create Role in the upper-left corner.
-
In the Create Role dialog box, select Alibaba Cloud Service and click Next.
-
Choose Service-Linked Role as the role type and select AnalyticDB for PostgreSQL.
-
Click Complete to ensure the service-linked role is created.
-
-
Grant management permissions to a RAM user.
You can grant a RAM user permissions for all or specific RDS MySQL instances and AnalyticDB for PostgreSQL instances. For more information, see create a custom policy.
The following sections describe the scripts for custom policies:
Grant permissions on all RDS MySQL instances and AnalyticDB for PostgreSQL instances
{ "Version": "1", "Statement": [ { "Effect": "Allow", "Action": "dts:*", "Resource": [ "acs:gpdb:*:*:*", "acs:rds:*:*:*" ] }, { "Effect": "Allow", "Action": [ "dts:DescribeRegions", "dts:DescribeConfigRelations", "dts:DescribeSrcLinkConfig", "dts:DescribeDestLinkConfig", "dts:DescribeLinkConfig" ], "Resource": [ "acs:gpdb:*:*:*" ] } ] }
Grant permissions on specific RDS MySQL instances and AnalyticDB for PostgreSQL instances
{ "Version": "1", "Statement": [ { "Effect": "Allow", "Action": "dts:*", "Resource": [ "acs:gpdb:*:*:dbinstanc/gp-bp1a740l3zx4****", Use the ID of the AnalyticDB for PostgreSQL cluster to be synchronized "acs:rds:*:*:dbinstance/rm-2ze6fs8ouh43****", Use the ID of the RDS instance to be synchronized ] }, { "Effect": "Allow", "Action": [ "dts:DescribeRegions", "dts:DescribeConfigRelations", "dts:DescribeSrcLinkConfig", "dts:DescribeDestLinkConfig", "dts:DescribeLinkConfig" ], "Resource": "acs:dts:*:*:*" } ] }
Procedure
-
Log in to the RDS Management Console and click Data Integration in the left-side navigation pane.
-
On the Zero-ETL tab, click Create Zero-ETL Task.
-
On the Create Zero-ETL Task page, enter the source database information and the destination database information.
-
Enter the source database information:
Source Database Information
Description
Task Name
The name of the zero-ETL task.
Database Type
Select RDS MySQL
Connection Type
Only Cloud Instance Access is supported.
Instance Region
The region of the source instance.
Instance ID
The ID of the RDS instance.
Database Name
The name of the database in the RDS instance.
Database Account
The username of the account that is used to log on to the RDS instance.
Database Password
The password of the account that is used to log on to the RDS instance.
Connection Method
The method that is used to connect to the RDS instance. Valid values: Non-encrypted and SSL-encrypted.
NoteIf you set the connection method to SSL-encrypted, you must use a cloud certificate to enable SSL link encryption in advance.
-
Enter the destination database information:
Destination Database Information
Description
Database Type
The current destination database supports AnalyticDB MySQL 3.0, AnalyticDB PostgreSQL, and ClickHouse.
Connection Type
Only Cloud Instance Access is supported.
Instance Region
The region of the destination database.
Instance ID
The ID of the cluster in which the destination database is created.
Database Account
The username of the account that is used to log on to the cluster in which the destination database is created.
Database Password
The password of the account that is used to log on to the cluster in which the destination database is created.
-
-
After entering the above parameters, click Test Connection to Proceed to the Next Step to navigate to the Zero-ETL configuration page and set the following parameters:
Configuration Item
Description
Synchronization Type
The type of the data synchronization between the source and destination databases.
Valid values: Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization.
NoteFor synchronization tasks, Incremental Data Synchronization is automatically selected and cannot be cleared. Incremental data of the source database is synchronized to the destination database in real time.
If you do not need incremental synchronization, use data migration.
DDL and DML Operations to Be Synchronized
Select the DML operations (insert, update, and delete) and DDL operations (create, alter, drop, rename, and truncate) to be synchronized. All operations are selected by default.
Source Objects and Selected Objects
The source objects and the objects that you want to synchronize.
Advanced Settings (Optional)
The retry time for failed connections between the source and destination databases and the retry time for other issues that occur on the source and destination databases.
-
Once you've set the above parameters, click Configure Table Fields and specify the following information:
Table Field Configuration
Description
Database Name
The name of the selected database.
Table Name
The name of the selected table.
Primary Key Column
The primary key column of the selected table.
Distribution Key
The distribution key column of the selected table.
Type
The type of the selected table, which can be a partitioned table or a replicated table.
Definition Status
The status of the selected table. After you configure the table fields, the status of the table changes from Undefined to Defined.
-
After configuring all the parameters, click Next to Save Task and Precheck.
-
If the precheck passes, click Start to initiate the zero-ETL task.
You can monitor the Name, Source/Destination, Status, and other details of the zero-ETL task on the Free Data Synchronization page.