All Products
Search
Document Center

ApsaraDB RDS:Use the zero-ETL feature

Last Updated:Aug 13, 2024

ApsaraDB RDS provides the zero-ETL feature to synchronize data from an ApsaraDB RDS for MySQL instance to a data warehouse. You do not need to create or maintain data synchronization tasks, and you are not charged for data synchronization tasks. This reduces data transmission costs and O&M costs. The data warehouse can be an AnalyticDB for MySQL cluster or an ApsaraDB for ClickHouse cluster.

Supported regions

China (Beijing), China (Hangzhou), China (Shanghai), and China (Shenzhen)

Note

The feature is rolled out for RDS instances in regions in phases. The information in the ApsaraDB RDS console shall prevail.

Overview

In the era of big data, enterprises must use extract-transform-load (ETL) tools to efficiently manage and use large amounts of business data distributed across different systems and platforms.

An ETL tool extracts data from an upper-level business system, transforms the data, and then loads the data to data warehouses. This process incorporates distributed data into data warehouses for further computing, analysis, and business decision-making.

The following section describes the challenges associated with traditional ETL processes:

  • Increased resource costs: Different data sources may require different ETL tools, and you are charged additional fees for creating ETL tasks.

  • Increased system complexity: The maintenance of ETL tools increases O&M difficulty and prevents you from focusing on business application development.

  • Reduced data timeliness: Specific ETL processes involve periodic batch updates. In near-real-time scenarios, analysis results cannot be quickly generated.

To resolve the issues, Alibaba Cloud ApsaraDB provides the zero-ETL feature that allows you to quickly create a data synchronization task between a business system that uses online transaction processing (OLTP) databases and a data warehouse that uses online analytical processing (OLAP) databases. The feature automatically extracts, transforms, cleanses, and loads the data of the business system to the data warehouse. This way, data synchronization and management can be completed in a one-stop manner to integrate transaction processing and data analysis. This allows you to focus on data analysis.

Benefits

  • Ease of use: You do not need to create or maintain complex data pipelines to perform ETL operations. You need to only select the source instance and destination cluster to create real-time data synchronization tasks. This reduces the challenges of building and managing data pipelines and allows you to focus on application development.

  • Zero cost: You can create zero-ETL tasks to synchronize data to a data warehouse free of charge for data analysis.

  • Multi-source aggregation: You can create a zero-ETL task to synchronize data from multiple ApsaraDB RDS for MySQL instances to an AnalyticDB for MySQL cluster or an ApsaraDB for ClickHouse cluster in real time to build a global analysis perspective.

    Note

    If you synchronize data from multiple ApsaraDB RDS for MySQL instances to an ApsaraDB for ClickHouse cluster, make sure that the objects to be synchronized do not overlap within different tasks.

Synchronization links

  • ApsaraDB RDS for MySQL to ApsaraDB for ClickHouse

  • ApsaraDB RDS for MySQL to AnalyticDB for MySQL 3.0

Billing rules

You can use the zero-ETL feature to synchronize data free of charge.

Prerequisites

Procedure

Preparations

Create a service-linked role and grant the management permissions to a RAM user.

ApsaraDB RDS for MySQL to ApsaraDB for ClickHouse

  1. Create the AliyunServiceRoleForClickHouseZeroETL service-linked role.

    Note

    When you create a task and click the instance ID drop-down list, a message indicating that you must create the AliyunServiceRoleForClickHouseZeroETL role is displayed. The system automatically creates the role. No manual operations are required.

  2. Grant a RAM user management permissions.

    To allow a RAM user to create zero-ETL tasks, you must grant the RAM user the following permissions. For more information, see Create custom policies.

    • Permissions on the ApsaraDB RDS for MySQL instance:

      The AliyunRDSFullAccess policy must be attached to the RAM user.

    • Permissions on the ApsaraDB for ClickHouse cluster (custom policy document):

      {
          "Version": "1",
          "Statement": [
              {
                  "Action": "clickhouse:*",
                  "Resource": "*",
                  "Effect": "Allow"
              },
              {
                  "Action": "ram:CreateServiceLinkedRole",
                  "Resource": "*",
                  "Effect": "Allow",
                  "Condition": {
                      "StringEquals": {
                          "ram:ServiceName": "clickhouse.aliyuncs.com"
                      }
                  }
              }
          ]
      }
    • Permissions on zero-ETL (custom policy document):

      {
          "Version": "1",
          "Statement": [
              {
                  "Action": "dts:*",
                  "Resource": "*",
                  "Effect": "Allow"
              },
              {
                  "Action": "ram:PassRole",
                  "Resource": "*",
                  "Effect": "Allow",
                  "Condition": {
                      "StringEquals": {
                          "acs:Service": "dts.aliyuncs.com"
                      }
                  }
              }
          ]
      }

ApsaraDB RDS for MySQL to AnalyticDB for MySQL 3.0

  1. Create the AliyunServiceRoleForAnalyticDBForMySQL service-linked role.

    Note

    Before you create a zero-ETL task, you must create the AliyunServiceRoleForAnalyticDBForMySQL service-linked role.

    1. Log on to the RAM console. In the left-side navigation pane, click Roles.

    2. In the upper-left corner of the page that appears, click Create Role.

    3. In the Select Role Type step, select Alibaba Cloud Service and click Next.

    4. Set the Role Type parameter to Service Linked Role and the Select Service parameter to AnalyticDB for MySQL.

    5. Click OK. Return to the Roles page and check whether the service-linked role is created.

  2. Grant a RAM user management permissions.

    You can grant a RAM user permissions on all or specific ApsaraDB RDS for MySQL instances and AnalyticDB for MySQL clusters. For more information, see Create custom policies.

    • Grant a RAM user permissions on all ApsaraDB RDS for MySQL instances and AnalyticDB for MySQL clusters (custom policy document):

      {
          "Version": "1",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": "dts:*",
                  "Resource": [
                      "acs:adb:*:*:*",
                      "acs:rds:*:*:*"
                  ]
              },
              {
                  "Effect": "Allow",
                  "Action": [
                      "dts:DescribeRegions",
                      "dts:DescribeConfigRelations",
                      "dts:DescribeSrcLinkConfig",
                      "dts:DescribeDestLinkConfig",
                      "dts:DescribeLinkConfig",
                      "dts:DescribeConciseJobStatics",
                      "dts:ListUserAuthorizationLogs",
                      "dts:CreateUserAuthorization"
                  ],
                  "Resource": [
                      "acs:dts:*:*:*"
                  ]
              }
          ]
      }
    • Grant a RAM user permissions on all or specific ApsaraDB RDS for MySQL instances and AnalyticDB for MySQL clusters (custom policy document):

      {
          "Version": "1",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": "dts:*",
                  "Resource": [
                      "acs:adb:*:*:dbcluster/am-2zeod8ax4b9a****", 
                      "acs:rds:*:*:dbinstance/rm-2ze6fs8ouh43****" 
                  ]
              },
              {
                  "Effect": "Allow",
                  "Action": [
                      "dts:DescribeRegions",
                      "dts:DescribeConfigRelations",
                      "dts:DescribeSrcLinkConfig",
                      "dts:DescribeDestLinkConfig",
                      "dts:DescribeLinkConfig",
                      "dts:DescribeConciseJobStatics",
                      "dts:ListUserAuthorizationLogs",
                      "dts:CreateUserAuthorization"
                  ],
                  "Resource": "acs:dts:*:*:*"
              }
          ]
      }

Procedure

  1. Log on to the ApsaraDB RDS console. In the left-side navigation pane, click Data Integration.

  2. On the Zero-ETL tab, click Create Zero-ETL Task.

  3. On the Create Zero-ETL Task page, configure the source and destination databases.

    • The following table describes the parameters of the source database.

      Parameter

      Description

      Task Name

      The name of the zero-ETL task.

      Database Type

      The database type. Select RDS for MySQL.

      Access Method

      The access method. Set the value to Alibaba Cloud Instance.

      Instance Region

      The region of the source instance.

      Instance ID

      The ID of the ApsaraDB RDS for MySQL instance.

      Database Name

      The name of the database in the ApsaraDB RDS for MySQL instance.

      Database Account

      The username of the database account for the ApsaraDB RDS for MySQL instance.

      Database Password

      The password of the database account for the ApsaraDB RDS for MySQL instance.

      Encryption

      The method that is used to connect to the ApsaraDB RDS for MySQL instance. Valid values: Non-encrypted and SSL-encrypted.

      Note

      If you select SSL-encrypted, you must enable SSL encryption for the ApsaraDB RDS for MySQL instance. For more information, see Configure the SSL encryption feature.

    • The following table describes the parameters of the destination database.

      Parameter

      Description

      Database Type

      The database type. Valid values: AnalyticDB for MySQL 3.0 and ClickHouse.

      Access Method

      The access method. Set the value to Alibaba Cloud Instance.

      Instance Region

      The ID of the cluster in which the destination database is created.

      Instance ID

      The ID of the cluster in which the destination database is created.

      Database Account

      The username of the database account for the cluster in which the destination database is created.

      Database Password

      The password of the database account for the cluster in which the destination database is created.

  4. After you configure the preceding parameters, click Test Connectivity and Proceed. In the Configure Zero-ETL step, configure the parameters. The following table describes the parameters.

    Parameter

    Description

    Synchronization Types

    The type of the data synchronization between the source and destination databases.

    Valid values: Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization.

    Note

    For synchronization tasks, Incremental Data Synchronization is automatically selected and cannot be cleared. Incremental data of the source database is synchronized to the destination database in real time.

    If you do not want to synchronize incremental data, use the data migration method.

    DDL and DML Operations to Be Synchronized

    The DML operations and DDL operations that you want to synchronize. The DML operations include INSERT, UPDATE, and DELETE. The DDL operations include CREATE, ALTER, DROP, RENAME, and TRUNCATE. By default, all operations are selected.

    Source Objects and Selected Objects

    The objects in the source database and the objects that you want to synchronize.

    Advanced Settings (Optional)

    The retry time for failed connections between the source and destination databases and the retry time for other issues that occur on the source and destination databases.

  5. After you configure the preceding parameters, click Next: Configure Database and Table Fields. In the Configure Database and Table Fields step, configure the required parameters. The following table describes the parameters.

    Parameter

    Description

    Database Name

    The name of the selected database.

    Table Name

    The name of the selected table.

    Primary Key Column

    The primary key column of the selected table.

    Distribution Key

    The distribution key column of the selected table.

    Type

    The type of the table, which can be a partitioned table or a replicated table.

    Definition Status

    The status of the selected table. After you configure the table fields, the status of the table changes from Undefined to Defined.

  6. After you configure the preceding parameters, click Next: Save Task Settings and Precheck.

  7. If the precheck is successful, click Start to start the zero-ETL task.

Note

On the Zero-ETL tab, you can click the Source tab to view the Name, Source/Destination, and Status parameters of the zero-ETL task.