zero-ETL - ApsaraDB RDS - Alibaba Cloud Documentation Center

ApsaraDB RDS provides the zero-ETL feature to synchronize data from an RDS instance to a data warehouse, such as AnalyticDB for PostgreSQL. You do not need to build or maintain data pipelines. This service is free of charge and reduces data transmission and O&M costs.

Supported regions

China (Beijing), China (Hangzhou), China (Shanghai), and China (Shenzhen)

Note

Support for more regions is being rolled out. For the latest information, see the RDS console.

Overview

In the era of big data, enterprises must use extract-transform-load (ETL) tools to efficiently manage and use large amounts of business data distributed across different systems and platforms.

An ETL tool extracts data from an upper-level business system, transforms the data, and then loads the data to data warehouses. This process incorporates distributed data into data warehouses for further computing, analysis, and business decision-making.

The following section describes the challenges associated with traditional ETL processes:

Increased system complexity: The maintenance of ETL tools increases O&M difficulty and prevents you from focusing on business application development.
Increased resource costs: Different data sources may require different ETL tools, and you are charged additional fees for creating ETL tasks.
Reduced data timeliness: Specific ETL processes involve periodic batch updates. In near-real-time scenarios, analysis results cannot be quickly generated.

To resolve the issues, Alibaba Cloud ApsaraDB provides the zero-ETL feature that allows you to quickly create a data synchronization task between a business system that uses online transaction processing (OLTP) databases and a data warehouse that uses online analytical processing (OLAP) databases. The feature automatically extracts, transforms, cleanses, and loads the data of the business system to the data warehouse. This way, data synchronization and management can be completed in a one-stop manner to integrate transaction processing and data analysis. This allows you to focus on data analysis.

Benefits

Easy to use: You do not need to create and maintain complex data pipelines for ETL operations. You can automatically create a real-time data pipeline by simply selecting the source data and destination instance. This reduces the challenges of building and managing data pipelines and lets you focus on application development.
Zero cost: Zero-ETL pipelines are free of charge. You can analyze input data in your data warehouse without incurring any data pipeline costs.
Multi-source aggregation: You can use zero-ETL pipelines to synchronize data from multiple instances to a single AnalyticDB for PostgreSQL instance in real time. This helps you build a comprehensive view for data analytics.

Supported links

RDS for PostgreSQL to AnalyticDB for PostgreSQL

Billing

Zero-ETL data synchronization is free of charge.

Prerequisites

An AnalyticDB for PostgreSQL instance is created. For more information, see Create an AnalyticDB for PostgreSQL instance.
The RDS for PostgreSQL instance and the AnalyticDB for PostgreSQL instance must be in the same region.
A database is created in the AnalyticDB for PostgreSQL instance to receive data. For more information, see Manage databases in an AnalyticDB for PostgreSQL instance.
A privileged account for the RDS for PostgreSQL instance is created and set as the owner of the destination database. For more information, see Create an account for an ApsaraDB RDS for PostgreSQL instance.
A database account for the AnalyticDB for PostgreSQL instance is created. For more information, see Create a database account for an AnalyticDB for PostgreSQL instance.
Note
You can also use the initial account or an account that has SUPERUSER permissions.
The wal_level parameter of the RDS for PostgreSQL instance is set to logical. For more information about how to modify parameters, see Modify the parameters of an ApsaraDB RDS for PostgreSQL instance.
Logical Replication Slot Failover is enabled for the RDS for PostgreSQL instance. This feature is enabled by default. To check the enabling status of this feature, see Logical Replication Slot Failover.
Note
When this feature is enabled, the sync task can run correctly and logical subscriptions are not interrupted by a primary/secondary failover.

Preparations

Create a service-linked role

Go to the Roles list in the Resource Access Management (RAM) console.
In the list of roles, check for a service-linked role named AliyunServiceRoleForADBPG. If it does not exist, continue to the next step to create it.
Note
You must create the service-linked role before you create a zero-ETL pipeline.
Go to the service-linked role creation page. Set Select Service to AliyunServiceRoleForADBPG.
Click Create Service Linked Role.
Return to the list of roles and verify that the service-linked role has been created.

Grant management permissions to a RAM user

A Resource Access Management (RAM) user needs the following two permissions to create and manage zero-ETL pipelines.

Permission 1: Grant the RAM user management permissions on the destination RDS instance

To create and manage zero-ETL pipelines, the RAM user must have the AliyunGPDBFullAccess permission (full management permissions on AnalyticDB for PostgreSQL) for the destination instance. For more information, see Grant permissions to a RAM user.

Permission 2: Grant the RAM user zero-ETL management permissions

To create and manage zero-ETL pipelines, the RAM user needs permission to create pipelines between the source instance (RDS for PostgreSQL) and the destination instance (AnalyticDB for PostgreSQL). You can create custom policies to authorize pipeline creation for all instances or only for specific instances. Then, grant the custom policies to the RAM user. For more information about how to create custom policies and grant permissions, see Create custom policies.

The following are sample scripts for custom policies:

Grant permissions on all RDS for PostgreSQL and AnalyticDB for PostgreSQL instances

{
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "dts:*",
            "Resource": [
                "acs:gpdb:*:*:*",
                "acs:rds:*:*:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "dts:DescribeRegions",
                "dts:DescribeConfigRelations",
                "dts:DescribeSrcLinkConfig",
                "dts:DescribeDestLinkConfig",
                "dts:DescribeLinkConfig",
                "dts:DescribeConciseJobStatics",
                "dts:ListUserAuthorizationLogs",
                "dts:CreateUserAuthorization"
            ],
            "Resource": [
                "acs:dts:*:*:*"
            ]
        }
    ]
}

Grant permissions on specific RDS for PostgreSQL and AnalyticDB for PostgreSQL instances

{
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "dts:*",
            "Resource": [
                "acs:gpdb:*:*:dbcluster/gp-bp13e375cd8x2****",
                "acs:rds:*:*:dbinstance/pgm-2zeyjzi91g53****"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "dts:DescribeRegions",
                "dts:DescribeConfigRelations",
                "dts:DescribeSrcLinkConfig",
                "dts:DescribeDestLinkConfig",
                "dts:DescribeLinkConfig",
                "dts:DescribeConciseJobStatics",
                "dts:ListUserAuthorizationLogs",
                "dts:CreateUserAuthorization"
            ],
            "Resource": "acs:dts:*:*:*"
        }
    ]
}

Procedure

Log on to the RDS console. In the navigation pane on the left, click Data Integration.
On the Zero-ETL tab, click Create Zero-ETL Task.

On the Create Zero-ETL Task page, configure the source and destination databases.

Configure the source database:

Source database	Description
Task Name	The name of the Zero-ETL task.
Database Type	Select RDS for PostgreSQL.
Access Method	Only Alibaba Cloud Instance is supported.
Instance Region	The region where the source instance resides.
Instance ID	The ID of the RDS for PostgreSQL instance.
Database Name	The name of the database in the RDS for PostgreSQL instance.
Database Account	The database account for the RDS for PostgreSQL instance.
Database Password	The password of the database account.
Encryption	The method used to connect to the RDS for PostgreSQL instance.

Configure the destination database:

Destination database	Description
Database Type	Currently, only AnalyticDB For PostgreSQL is supported as the destination database.
Access Method	Only Alibaba Cloud Instance is supported.
Instance Region	The region where the destination instance resides.
Instance ID	The ID of the destination instance.
Database Name	The name of the database in the destination instance that is used to receive data.
Database Account	The database account for the destination instance.
Database Password	The password of the database account.

After you configure the parameters, click Test Connectivity and Proceed. On the page that appears, configure the following parameters:

Configuration item	Description
DDL and DML Operations to Be Synchronized	Select the DML operations (insert, update, and delete) and DDL operations (create, alter, drop, rename, and truncate) to synchronize. By default, all operations are selected.
Source Objects and Selected Objects	The source objects in the database and the objects to synchronize.
Advanced Settings (Optional)	Set the retry interval for when the source or destination database is unreachable, and the retry interval for other issues that occur on the source or destination database.

After you configure the parameters, select O&M-free Zero-ETL and click Next: Configure Database and Table Fields. Configure the following information:

Table and field configuration	Description
Database Name	Select an existing database.
Table Name	Select an existing data table.
Primary Key Column	The primary key column of the selected data table.
Distribution Key	The distribution key column of the selected data table.
Type	The type of the data table. Valid values: partitioned table and replicated table.
Definition Status	After you configure the table and field settings, the status changes from Undefined to Defined.

After you configure all the parameters, click Next: Save Task Settings and Precheck.
If the precheck is successful, click Start to start the zero-ETL task.

Note

On the Zero-ETL page, you can click the Source tab to view information about the zero-ETL task, such as its ID/Name, Source/Destination, and Status.