ApsaraDB RDS provides the zero-ETL feature to synchronize data from an RDS instance to a data warehouse, such as AnalyticDB for PostgreSQL. You do not need to build or maintain data pipelines. This service is free of charge and reduces data transmission and O&M costs.
Supported regions
China (Beijing), China (Hangzhou), China (Shanghai), and China (Shenzhen)
Support for more regions is being rolled out. For the latest information, see the RDS console.
Overview
In the era of big data, enterprises must use extract-transform-load (ETL) tools to efficiently manage and use large amounts of business data distributed across different systems and platforms.
An ETL tool extracts data from an upper-level business system, transforms the data, and then loads the data to data warehouses. This process incorporates distributed data into data warehouses for further computing, analysis, and business decision-making.
The following section describes the challenges associated with traditional ETL processes:
Increased system complexity: The maintenance of ETL tools increases O&M difficulty and prevents you from focusing on business application development.
Increased resource costs: Different data sources may require different ETL tools, and you are charged additional fees for creating ETL tasks.
Reduced data timeliness: Specific ETL processes involve periodic batch updates. In near-real-time scenarios, analysis results cannot be quickly generated.
To resolve the issues, Alibaba Cloud ApsaraDB provides the zero-ETL feature that allows you to quickly create a data synchronization task between a business system that uses online transaction processing (OLTP) databases and a data warehouse that uses online analytical processing (OLAP) databases. The feature automatically extracts, transforms, cleanses, and loads the data of the business system to the data warehouse. This way, data synchronization and management can be completed in a one-stop manner to integrate transaction processing and data analysis. This allows you to focus on data analysis.
Benefits
Easy to use: You do not need to create and maintain complex data pipelines for ETL operations. You can automatically create a real-time data pipeline by simply selecting the source data and destination instance. This reduces the challenges of building and managing data pipelines and lets you focus on application development.
Zero cost: Zero-ETL pipelines are free of charge. You can analyze input data in your data warehouse without incurring any data pipeline costs.
Multi-source aggregation: You can use zero-ETL pipelines to synchronize data from multiple instances to a single AnalyticDB for PostgreSQL instance in real time. This helps you build a comprehensive view for data analytics.
Supported links
RDS for PostgreSQL to AnalyticDB for PostgreSQL
Billing
Zero-ETL data synchronization is free of charge.
Prerequisites
An AnalyticDB for PostgreSQL instance is created. For more information, see Create an AnalyticDB for PostgreSQL instance.
The RDS for PostgreSQL instance and the AnalyticDB for PostgreSQL instance must be in the same region.
A database is created in the AnalyticDB for PostgreSQL instance to receive data. For more information, see Manage databases in an AnalyticDB for PostgreSQL instance.
A privileged account for the RDS for PostgreSQL instance is created and set as the owner of the destination database. For more information, see Create an account for an ApsaraDB RDS for PostgreSQL instance.
A database account for the AnalyticDB for PostgreSQL instance is created. For more information, see Create a database account for an AnalyticDB for PostgreSQL instance.
NoteYou can also use the initial account or an account that has SUPERUSER permissions.
The
wal_levelparameter of the RDS for PostgreSQL instance is set tological. For more information about how to modify parameters, see Modify the parameters of an ApsaraDB RDS for PostgreSQL instance.Logical Replication Slot Failover is enabled for the RDS for PostgreSQL instance. This feature is enabled by default. To check the enabling status of this feature, see Logical Replication Slot Failover.
NoteWhen this feature is enabled, the sync task can run correctly and logical subscriptions are not interrupted by a primary/secondary failover.
Preparations
Create a service-linked role
Go to the Roles list in the Resource Access Management (RAM) console.
In the list of roles, check for a service-linked role named AliyunServiceRoleForADBPG. If it does not exist, continue to the next step to create it.
NoteYou must create the service-linked role before you create a zero-ETL pipeline.
Go to the service-linked role creation page. Set Trusted Cloud Service to AliyunServiceRoleForADBPG.
Click Create Service-Linked Role.
Return to the list of roles and verify that the service-linked role has been created.
Grant management permissions to a RAM user
A Resource Access Management (RAM) user needs the following two permissions to create and manage zero-ETL pipelines.
Permission 1: Grant the RAM user management permissions on the destination RDS instance
To create and manage zero-ETL pipelines, the RAM user must have the AliyunGPDBFullAccess permission (full management permissions on AnalyticDB for PostgreSQL) for the destination instance. For more information, see Grant permissions to a RAM user.
Permission 2: Grant the RAM user zero-ETL management permissions
To create and manage zero-ETL pipelines, the RAM user needs permission to create pipelines between the source instance (RDS for PostgreSQL) and the destination instance (AnalyticDB for PostgreSQL). You can create custom policies to authorize pipeline creation for all instances or only for specific instances. Then, grant the custom policies to the RAM user. For more information about how to create custom policies and grant permissions, see Create custom policies.
The following are sample scripts for custom policies:
Grant permissions on all RDS for PostgreSQL and AnalyticDB for PostgreSQL instances
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": "dts:*",
"Resource": [
"acs:gpdb:*:*:*",
"acs:rds:*:*:*"
]
},
{
"Effect": "Allow",
"Action": [
"dts:DescribeRegions",
"dts:DescribeConfigRelations",
"dts:DescribeSrcLinkConfig",
"dts:DescribeDestLinkConfig",
"dts:DescribeLinkConfig",
"dts:DescribeConciseJobStatics",
"dts:ListUserAuthorizationLogs",
"dts:CreateUserAuthorization"
],
"Resource": [
"acs:dts:*:*:*"
]
}
]
}Grant permissions on specific RDS for PostgreSQL and AnalyticDB for PostgreSQL instances
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": "dts:*",
"Resource": [
"acs:gpdb:*:*:dbcluster/gp-bp13e375cd8x2****",
"acs:rds:*:*:dbinstance/pgm-2zeyjzi91g53****"
]
},
{
"Effect": "Allow",
"Action": [
"dts:DescribeRegions",
"dts:DescribeConfigRelations",
"dts:DescribeSrcLinkConfig",
"dts:DescribeDestLinkConfig",
"dts:DescribeLinkConfig",
"dts:DescribeConciseJobStatics",
"dts:ListUserAuthorizationLogs",
"dts:CreateUserAuthorization"
],
"Resource": "acs:dts:*:*:*"
}
]
}Procedure
Log on to the RDS console. In the navigation pane on the left, click Data Integration.
On the Zero-ETL tab, click Create Zero-ETL Task.
On the Create Zero-ETL Task page, configure the source and destination databases.
Configure the source database:
Source database
Description
Task name
The name of the zero-ETL task.
Database type
Select RDS for PostgreSQL.
Connection type
Only Cloud Instance is supported.
Instance region
The region where the source instance resides.
Instance ID
The ID of the RDS for PostgreSQL instance.
Database name
The name of the database in the RDS for PostgreSQL instance.
Database account
The database account for the RDS for PostgreSQL instance.
Database password
The password of the database account.
Connection method
The method used to connect to the RDS for PostgreSQL instance.
Configure the destination database:
Destination database
Description
Database type
Currently, only AnalyticDB For PostgreSQL is supported as the destination database.
Connection type
Only Cloud Instance is supported.
Instance region
The region where the destination instance resides.
Instance ID
The ID of the destination instance.
Database name
The name of the database in the destination instance that is used to receive data.
Database account
The database account for the destination instance.
Database password
The password of the database account.
After you configure the parameters, click Test Connection and Go to Next. On the page that appears, configure the following parameters:
Configuration item
Description
DDL and DML operations to synchronize
Select the DML operations (insert, update, and delete) and DDL operations (create, alter, drop, rename, and truncate) to synchronize. By default, all operations are selected.
Source objects and selected objects
The source objects in the database and the objects to synchronize.
Advanced configuration (Optional)
Set the retry interval for when the source or destination database is unreachable, and the retry interval for other issues that occur on the source or destination database.
After you configure the parameters, select Zero-ETL O&M-free Description and click Next: Configure Table and Field Mappings. Configure the following information:
Table and field configuration
Description
Database name
Select an existing database.
Table name
Select an existing data table.
Primary key column
The primary key column of the selected data table.
Distribution key
The distribution key column of the selected data table.
Type
The type of the data table. Valid values: partitioned table and replicated table.
Definition status
After you configure the table and field settings, the status changes from Undefined to Defined.
After you configure all the parameters, click Next: Save Task and Precheck.
If the precheck is successful, click Start to start the zero-ETL task.
On the Zero-ETL page, you can click the Source tab to view information about the zero-ETL task, such as its Name, Source/Destination, and Running Status.