In DataWorks, real-time sync nodes can run only on exclusive resource groups for Data Integration. This topic describes the resources and configurations required to run real-time sync nodes.

Background information

  • Resource planning and preparation

    Before you use a data synchronization node to synchronize data, you must purchase an exclusive resource group for Data Integration and add the resource group to DataWorks for subsequent use.

    For more information about exclusive resource groups for Data Integration, see Exclusive resource groups for Data Integration.

  • Network connections

    An exclusive resource group for Data Integration is essentially a group of Elastic Compute Service (ECS) instances. After you purchase and create such an exclusive resource group, it is isolated from other services. You must associate the resource group with a virtual private cloud (VPC) to ensure network connectivity between the resource group and data sources during subsequent data synchronization.

Purchase exclusive resources for Data Integration

  1. Log on to the DataWorks console.
  2. In the top navigation bar, select a region. In the left-side navigation pane, click Resource Groups.
  3. On the Exclusive Resource Groups tab, click Create Resource Group for Data Integration.
  4. In the Create a dedicated resource group panel, click Purchase next to Order Number. The buy page for DataWorks exclusive resources appears.
  5. On the buy page, configure the parameters based on your business requirements. Then, click Buy Now.
    Note You must set the Type parameter to Exclusive Resource Groups for Data Integration.
  6. On the Confirm Order page, confirm that the order information is correct, read and select DataWorks Exclusive Resources Agreement of Service, and then click Pay.

Create an exclusive resource group for Data Integration

  1. On the Exclusive Resource Groups tab of the Resource Groups page, click Create Resource Group for Data Integration.
  2. In the Create a dedicated resource group panel, configure the parameters.
    Parameter | Description
    Resource Group Type | The type of the resource group. Valid values: Exclusive Resource Groups and Exclusive Resource Groups for Data Integration. A resource group of the Exclusive Resource Groups type can be used to run all types of nodes, and a resource group of the Exclusive Resource Groups for Data Integration type can be used to run only synchronization nodes.
    Resource Group Name | The name of the resource group. The name must be unique within all the resource groups of a tenant.
    Note A tenant refers to an Alibaba Cloud account. Each tenant may have multiple Resource Access Management (RAM) users.
    Resource Group Description | The description of the resource group.
    Order Number | The order number of the exclusive resources that you purchased. If no exclusive resources are purchased, click Purchase next to Order Number to go to the buy page and purchase exclusive resources.
  3. After the parameters are configured, click OK.
    Note The exclusive resource group is initialized within 20 minutes. Wait until the state of the exclusive resource group changes to Running.
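
The wait described in the note above can be sketched as a simple polling loop. This is a minimal sketch, not a DataWorks API call: `get_state` is a hypothetical callable that you supply (for example, a wrapper around whatever tool you use to read the resource group state), and the 20-minute timeout and 30-second interval are assumptions based on the initialization time stated above.

```python
import time
from typing import Callable


def wait_until_running(
    get_state: Callable[[], str],
    timeout_s: float = 20 * 60,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_state() until it returns "Running" or the timeout elapses.

    get_state is a hypothetical, caller-supplied function; this sketch
    does not call any real DataWorks API. sleep and clock are injectable
    so that the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == "Running":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The function returns True as soon as the reported state is Running and False if the state has not changed by the deadline, in which case you would check the resource group in the console.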

Configure network settings

Exclusive resource groups are deployed in the VPC in which DataWorks is hosted and are isolated from other network environments. To use an exclusive resource group, you must configure its network settings to associate it with a VPC that can connect to your data sources. The exclusive resource group can then access the data sources over that VPC.

  1. Find the resource group for which you want to configure network settings, and click Network Settings in the Actions column. The VPC Binding tab appears.
    Note Before you associate the exclusive resource group with a VPC, configure permissions in the RAM console to authorize DataWorks to access your cloud resources.
  2. Associate the exclusive resource group with a VPC.
    1. On the VPC Binding tab, click Add Binding in the upper-left corner. In the Add VPC Binding panel, configure the parameters based on the network environment.
      The following table describes the parameters.
      Parameter | Configuration if your data source and exclusive resource group reside in the same VPC | Configuration if your data source and exclusive resource group reside in different VPCs
      VPC | If your data source is deployed in a VPC, we recommend that you set this parameter to the VPC in which your data source resides. | If your data source is not deployed in a VPC, or your data source and exclusive resource group need to be deployed in different VPCs, you can click Create VPC on the right side of this parameter to create a VPC for the exclusive resource group. After the VPC is created, set this parameter to the newly created VPC.
      VSwitch | If you set the VPC parameter to the VPC in which your data source resides, we recommend that you select the vSwitch to which the data source is connected. | If you set the VPC parameter to another VPC or no vSwitch is available, you can click Create VSwitch on the right side of this parameter to create a vSwitch for the exclusive resource group. After the vSwitch is created, set this parameter to the newly created vSwitch.
      Note After you associate the exclusive resource group with the VPC in which the data source resides and a vSwitch that resides in the VPC, a route is automatically added. The destination of this route is the CIDR block of the VPC. This ensures that the exclusive resource group can access all the data sources in this VPC.
      Security Groups | Security groups allow or deny access to the resources in your exclusive resource group over the Internet or an internal network. You can select an existing security group based on your business requirements, or click Create Security Group on the right side of this parameter to create a security group for the resources in the exclusive resource group. For more information about how to create a security group, see Add security group rules.
    2. Click OK.
  3. Optional: Add host configurations.
    In some cases, you cannot access your data source by IP address and can reach it only by hostname. In this case, perform the following steps to add host configurations. Otherwise, the connectivity test fails when you add the data source by using its hostname.
    1. Click the Hostname-to-IP Mapping tab. Then, click Add in the upper-left corner of the tab. In the Create Hostname-to-IP Mapping dialog box, configure the parameters. The following table describes the parameters.
      Parameter | Description
      IP Address | The actual IP address of the data source.
      The hostname | The hostname that is used to access the data source. If you want to specify multiple hostnames, place each hostname on a separate line.
      Note The domain name can contain digits, letters, hyphens (-), and periods (.). It must start with a letter and end with a letter or digit.
    2. If the data source has multiple IP addresses, click Add to add more host configurations.
      Note
      • The IP address or hostnames that are added in a host configuration must be different from the IP addresses or hostnames in existing host configurations.
      • You can map one IP address to multiple hostnames in a host configuration. However, one hostname can point to only one IP address.
  4. Optional: Add Domain Name System (DNS) configurations.
    In some cases, you cannot access your data source by IP address. For example, the data source is reachable only through the domain name of a Server Load Balancer (SLB) instance, and an internal DNS server resolves the domain name to the IP addresses of your data source. In this case, perform the following steps to add DNS configurations. Otherwise, the connectivity test fails when you add the data source by using its domain name.
    Note If a domain name that is added in a host configuration is also configured in a DNS configuration, the system preferentially uses the host configuration to access the data source.
    1. Click the DNS Configuration tab. Then, click Add in the lower-left corner of the tab. After you configure the parameters for a DNS configuration, click Save. The following table describes the parameters.
      Parameter | Description
      Domain | Optional. If you can use the same second-level domain to access your data sources, set this parameter to the second-level domain.
      For example, the domain name that is used to access Data source 1 is domain1.example.com, and the domain name that is used to access Data source 2 is domain2.example.com. In this example, we recommend that you set this parameter to example.com.
      Note The domain name can contain digits, letters, hyphens (-), and periods (.). It must start with a letter and end with a letter or digit.
      NameServer | Enter the IP address of the DNS server that resolves the domain name of the data source. If you want to specify multiple DNS servers, place the IP address of each DNS server on a separate line.
    2. To modify an existing DNS configuration, click Modify in the lower-left corner.
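
The rules for host configurations in step 3 (allowed characters, one hostname pointing to only one IP address, one IP address allowed to have several hostnames) can be checked before you enter the values in the console. The following is a minimal sketch under stated assumptions: the function name and the `{ip_address: [hostnames]}` input shape are hypothetical and are not part of DataWorks, and the regular expression is only one reading of the naming rule in the note.

```python
import ipaddress
import re
from typing import Dict, List

# Letters, digits, hyphens, and periods; starts with a letter and ends
# with a letter or digit. Single-letter names are accepted in this sketch.
HOSTNAME_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9.-]*[A-Za-z0-9])?")


def validate_host_configs(configs: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems found in {ip_address: [hostnames]}.

    Hypothetical helper for illustration; an empty list means that no
    problem was found.
    """
    problems = []
    owner = {}  # hostname -> first IP address that claimed it
    for ip, hostnames in configs.items():
        try:
            ipaddress.ip_address(ip)
        except ValueError:
            problems.append(f"invalid IP address: {ip}")
        for name in hostnames:
            if not HOSTNAME_RE.fullmatch(name):
                problems.append(f"invalid hostname: {name}")
            elif name in owner and owner[name] != ip:
                problems.append(
                    f"{name} is mapped to both {owner[name]} and {ip}"
                )
            else:
                owner.setdefault(name, ip)
    return problems
```

Keying the input by IP address means that each IP address appears in only one host configuration, while the `owner` dictionary catches a hostname that is listed under two different IP addresses.
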
After you configure network settings for the exclusive resource group for Data Integration, you must add the elastic IP address (EIP) of the exclusive resource group and the elastic network interface (ENI) IP address of the VPC to the whitelist of the data source.
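
The note in step 2 states that the automatically added route targets the CIDR block of the associated VPC. As a quick sanity check, you can test whether the IP address of a data source falls inside that CIDR block. The sketch below uses only the Python standard library; the function name and the sample addresses are assumptions for illustration.

```python
import ipaddress


def covered_by_vpc_route(data_source_ip: str, vpc_cidr: str) -> bool:
    """Return True if data_source_ip falls inside the VPC CIDR block.

    Hypothetical helper: it only compares addresses and does not query
    any route table.
    """
    return ipaddress.ip_address(data_source_ip) in ipaddress.ip_network(vpc_cidr)


# Sample values are illustrative only.
print(covered_by_vpc_route("192.168.1.25", "192.168.0.0/16"))  # True
print(covered_by_vpc_route("10.0.0.5", "192.168.0.0/16"))      # False
```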

What to do next

After you plan and configure resources, you can configure data sources. You must configure network connectivity for the data sources and the permissions that are required to access them so that you can create a real-time sync node. You can synchronize data to an AnalyticDB for MySQL data source only from a PolarDB or MySQL data source. Select a PolarDB or MySQL data source based on your business requirements. For more information about how to configure a PolarDB or MySQL data source, see Configure a data source (PolarDB) or Configure data sources for data synchronization from MySQL.