To synchronize data from PolarDB to MaxCompute, PolarDB serves as the source data source and MaxCompute serves as the destination data source. Before you run a data synchronization node, complete the operations in this topic to prepare the network environment, whitelists, and other configurations for the data sources.

Prerequisites

Before you configure data sources, make sure that the following operations are performed:
  • Prepare data sources: A PolarDB for MySQL cluster and a MaxCompute project are created. In this topic, a PolarDB for MySQL cluster is used as the source data source.
  • Plan and prepare resources: An exclusive resource group for data integration is purchased and configured. For more information, see Plan and configure resources.
  • Evaluate and plan the network environment: Before you perform data integration, connect data sources to exclusive resource groups for data integration based on your business requirements. After data sources and exclusive resource groups for data integration are connected, you can refer to the operations in this topic to configure access settings such as vSwitches and whitelists.
    • If data sources and exclusive resource groups for data integration reside in the same region and virtual private cloud (VPC), they are automatically connected.
    • If data sources and exclusive resource groups for data integration reside in different network environments, you must connect data sources and resource groups by using methods such as a VPN gateway.
  • Prepare the MaxCompute client: The MaxCompute client is installed. You need to use the MaxCompute client to configure attributes for the destination MaxCompute data source. For more information, see MaxCompute client (odpscmd).

Background information

Before you synchronize data from source data sources to destination data sources, make sure that data sources and exclusive resource groups are connected. In addition, you must make sure that exclusive resource groups can be used to access data sources.
  • Configure whitelists for data sources
    If data sources and exclusive resource groups for data integration reside in the same VPC, you need to add the CIDR block of the exclusive resource group to the whitelists of data sources. This ensures that the exclusive resource group for data integration can be used to access data sources.
  • Create an account and authorize the account

    You must create an account that can be used to access data sources, read data from the source data source, and write data to the destination data source in the data integration process.

  • Enable the binary logging feature

    If the source data source is a PolarDB for MySQL cluster, you must enable the binary logging feature for the cluster. PolarDB for MySQL is fully compatible with MySQL, but by default it uses higher-level physical logs instead of binary logs. To integrate PolarDB with the MySQL ecosystem, enable the binary logging feature for the PolarDB cluster.
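
As a rough illustration of how you might confirm that binary logging is on before you run a synchronization node, the following Python sketch queries the log_bin system variable through any DB-API cursor. It assumes that the cluster reports log_bin in the same way as MySQL does. The function name binlog_enabled and the connection details in the comments are hypothetical and are not part of PolarDB or DataWorks.

```python
def binlog_enabled(cursor):
    """Return True if the server reports log_bin = ON.

    `cursor` is any DB-API 2.0 cursor connected to the source cluster.
    Assumption: the cluster exposes the MySQL-style `log_bin` variable.
    """
    cursor.execute("SHOW VARIABLES LIKE 'log_bin'")
    row = cursor.fetchone()
    # Each row has the form (variable_name, value). No row means the
    # variable is not reported, which is treated as "not enabled".
    return row is not None and str(row[1]).upper() == "ON"


# Hypothetical usage with a MySQL driver such as PyMySQL
# (the endpoint, account, and password are placeholders):
#
#   import pymysql
#   conn = pymysql.connect(host="<cluster endpoint>", user="<account>",
#                          password="<password>")
#   with conn.cursor() as cur:
#       print(binlog_enabled(cur))
```

The function accepts a cursor instead of opening a connection itself, so it works with whichever MySQL driver you already use.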

Limits

  • Only PolarDB for MySQL clusters can be used as source data sources. In this topic, PolarDB indicates PolarDB for MySQL data sources.
  • Only data stored on the primary node of the PolarDB cluster can be synchronized.

Configure the source PolarDB data source

  1. Configure a whitelist for the PolarDB for MySQL cluster.
    To add the CIDR block of the VPC where the exclusive resource group for Data Integration resides to a whitelist of the PolarDB for MySQL cluster, perform the following steps:
    1. View and record the elastic IP address (EIP) and CIDR block of the exclusive resource group for Data Integration.
      1. Log on to the DataWorks console.
      2. In the left-side navigation pane, click Resource Groups.
      3. On the Exclusive Resource Groups tab, find the exclusive resource group for Data Integration and click View Information in the Actions column.
      4. In the Exclusive Resource Groups dialog box, view and record the values of the EIPAddress and CIDR Blocks parameters.
      5. On the Exclusive Resource Groups tab, find the exclusive resource group for Data Integration and click Network Settings in the Actions column.
      6. On the VPC Binding tab of the page that appears, view and record the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated.
    2. Add the EIP and CIDR blocks recorded in the preceding steps to the whitelist of the PolarDB for MySQL cluster.
      For more information, see Configure an IP whitelist.
  2. Create a PolarDB for MySQL database account.
    For more information, see Create a database account.
  3. Enable the binary logging feature for the PolarDB cluster.
    For more information, see Enable binary logging.
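
The whitelist step above combines the EIP and the CIDR blocks that you recorded for the exclusive resource group. As a minimal sketch, the following Python function validates those values and joins them into a single string. It assumes that whitelist entries are entered as a comma-separated list. The function name build_whitelist and the sample addresses in the usage note are hypothetical and are not taken from the console.

```python
import ipaddress


def build_whitelist(eip, cidr_blocks):
    """Validate an EIP and CIDR blocks and join them into one whitelist string.

    Assumption: the whitelist accepts comma-separated IP addresses and
    CIDR blocks. Raises ValueError if any value is not a valid address
    or network.
    """
    entries = []
    for value in [eip, *cidr_blocks]:
        # strict=False tolerates host bits in a CIDR block,
        # for example 192.168.1.5/24 is normalized to 192.168.1.0/24.
        network = ipaddress.ip_network(value.strip(), strict=False)
        # Write a single address as a plain IP instead of a /32 block.
        if network.num_addresses == 1:
            text = str(network.network_address)
        else:
            text = str(network)
        # Skip duplicates while keeping the original order.
        if text not in entries:
            entries.append(text)
    return ",".join(entries)
```

For example, build_whitelist("203.0.113.10", ["192.168.1.5/24", "10.0.0.0/16"]) returns the string 203.0.113.10,192.168.1.0/24,10.0.0.0/16, which you can review before you paste it into the whitelist of the cluster.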

Configure the destination MaxCompute data source

  1. Log on to the MaxCompute client by using the account of a project owner.
    For more information, see MaxCompute client (odpscmd).
  2. Enable the atomicity, consistency, isolation, and durability (ACID) feature for the MaxCompute project.
    Run the following command on the MaxCompute client:
    setproject odps.sql.acid.table.enable=true;
  3. Optional: Enable the MaxCompute V2.0 data type edition.
    If you need to use the TIMESTAMP data type in MaxCompute V2.0, run the following command to enable the MaxCompute V2.0 data type edition:
    setproject odps.sql.type.system.odps2=true;
  4. Create an Alibaba Cloud account.
    This account is used to add a data source and access MaxCompute for data synchronization. For more information about how to create an Alibaba Cloud account, see Create an Alibaba Cloud account.

    After the Alibaba Cloud account is created, you can record the AccessKey ID and AccessKey secret of the account for future use.

What to do next

After data sources are configured, the source data source, destination data source, and exclusive resource group for data integration are connected. Then, the exclusive resource group for data integration can be used to access data sources. You can add the source data source and destination data source to DataWorks, and associate them with a data synchronization solution when you create the solution.

For more information about how to add a data source, see Add a data source.