Which information about DataWorks and its network capabilities do I need to take note of before I configure a data synchronization node?

Before you configure a data synchronization node, take note of the following items:
  • The virtual private cloud (VPC), vSwitch, and region that are used for the data source from which you want to synchronize data, and the region in which your DataWorks workspace resides.
  • Whether the data source and your DataWorks workspace are deployed in different regions or belong to different accounts.
For more information about how to troubleshoot issues that occur when you configure and run a synchronization node, see Supported data source types, Reader plug-ins, and Writer plug-ins.

If you encounter issues when you test the network connectivity of a data source, we recommend that you troubleshoot the issue by referring to Establish a network connection between a resource group and a data source.

If you use an exclusive resource group for Data Integration, we recommend that you perform the following operations before you configure a data synchronization node:
  1. Purchase an exclusive resource group for Data Integration.
  2. Associate the exclusive resource group for Data Integration with the VPC in which the data source resides.
  3. Evaluate whether you need to add a route.
  4. Configure a whitelist for the data source.
  5. Associate the exclusive resource group for Data Integration with a DataWorks workspace.
For more information, see Create and use an exclusive resource group for Data Integration.

How do I ensure network connectivity between a resource group in DataWorks and a self-managed data source that is hosted on an Elastic Compute Service (ECS) instance when I synchronize data from the self-managed data source?

If you want to use an exclusive resource group for Data Integration to access a self-managed data source that is hosted on an ECS instance over an internal network, you must configure network settings for the exclusive resource group for Data Integration. For more information, see Create and use an exclusive resource group for Data Integration. Take note of the following items:
  • If you associate the exclusive resource group for Data Integration with a VPC in which the ECS instance resides, a route that points to the CIDR block of the VPC is automatically added. We recommend that you do not delete the added route. If you delete the added route, you may fail to access other data sources and an error may be reported during data synchronization.
  • You must add the CIDR block of the vSwitch to which the exclusive resource group for Data Integration is bound to the whitelist of the data source. For more information, see Add the EIP or CIDR block of an exclusive resource group for Data Integration to the whitelist of a data source.
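
The whitelist requirement above can be checked offline before you run a node. The following is a minimal sketch that uses only the Python standard library. It assumes that you have already copied the whitelist entries of the data source and the CIDR block of the vSwitch by hand. The function name is_whitelisted and all addresses are hypothetical examples, not values or APIs provided by DataWorks.

```python
import ipaddress


def is_whitelisted(address, whitelist):
    """Return True if `address` (a single IP or a CIDR block) is fully
    covered by at least one entry in `whitelist`."""
    target = ipaddress.ip_network(address, strict=False)
    for entry in whitelist:
        allowed = ipaddress.ip_network(entry, strict=False)
        # subnet_of() raises TypeError for mixed IPv4/IPv6, so compare versions first.
        if target.version == allowed.version and target.subnet_of(allowed):
            return True
    return False


# Hypothetical values: replace them with your own whitelist and vSwitch CIDR block.
example_whitelist = ["192.168.0.0/16", "10.0.0.5"]
print(is_whitelisted("192.168.10.0/24", example_whitelist))  # True
print(is_whitelisted("172.16.0.0/24", example_whitelist))    # False
```

The same check works for a single address such as an EIP, because a bare IP address is treated as a /32 block.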

How do I ensure network connectivity between a resource group in DataWorks and a data source that is deployed in a different region from the resource group when I synchronize data from the data source?

Before you configure and run a data synchronization node, we recommend that you use a network connectivity solution. For more information, see Establish a network connection between a resource group and a data source. Take note of the following item:
  • If you want to synchronize data over the Internet from a data source that is deployed in a region different from that of your DataWorks resource group, you must add the elastic IP address (EIP) and CIDR block of the exclusive resource group for Data Integration to the whitelist of the data source. For more information, see Add the EIP or CIDR block of an exclusive resource group for Data Integration to the whitelist of a data source.
    Note You are charged for the traffic generated over the Internet. For more information, see Billing of Internet traffic.

When I synchronize data from a data source, the account that I use to access the data source is different from the account that I use to access DataWorks. How do I ensure network connectivity between DataWorks and the data source?

Before you configure and run a data synchronization node, we recommend that you use a network connectivity solution. For more information, see Establish a network connection between a resource group and a data source. Take note of the following items:
  • If you want to synchronize data over the Internet, you must add the EIP and CIDR block of the exclusive resource group for Data Integration to the whitelist of the data source. For more information, see Add the EIP or CIDR block of an exclusive resource group for Data Integration to the whitelist of a data source.
    Note You are charged for the traffic generated over the Internet. For more information, see Billing of Internet traffic.
  • If you want to synchronize data from a data source over an internal network and the account that is used to access the data source is different from the account that you use to access DataWorks, you must perform the following operations:
    1. Use a network connectivity service of Alibaba Cloud to establish a connection between the networks of the two Alibaba Cloud accounts. You can use a network connectivity service such as VPN Gateway or Express Connect.
    2. Associate an exclusive resource group for Data Integration with the VPC that is connected to the network of the Alibaba Cloud account that is used to access the data source.
    3. Add a custom route to your data center and add the IP address of the destination data source to your data center.
    4. Add the CIDR block of the vSwitch to which the exclusive resource group for Data Integration is bound to the whitelist of the data source. For more information, see Add the EIP or CIDR block of an exclusive resource group for Data Integration to the whitelist of a data source.

What do I do if the network connectivity test for a data source in a VPC fails?

The data source connectivity test is sometimes successful and sometimes fails. What do I do?

Check whether the shared resource group for Data Integration is used. The network of the shared resource group for Data Integration is unstable. We recommend that you use an exclusive resource group for Data Integration to ensure network connection stability.
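
To confirm that failures are intermittent rather than permanent, you can repeat a simple TCP probe several times and look at the success rate. The following is a minimal sketch that uses only the Python standard library. It measures connectivity from the machine on which the script runs, not from a DataWorks resource group, so treat the result only as a hint. The function names run_probes and success_rate are hypothetical helpers.

```python
import socket


def run_probes(host, port, attempts=5, timeout=3.0, connect=socket.create_connection):
    """Try to open a TCP connection `attempts` times.

    Returns a list of booleans, one per attempt (True = connected).
    `connect` is injectable so that the logic can be exercised without a network.
    """
    results = []
    for _ in range(attempts):
        try:
            conn = connect((host, port), timeout=timeout)
            conn.close()
            results.append(True)
        except OSError:
            # Covers refused connections, timeouts, and name resolution errors.
            results.append(False)
    return results


def success_rate(results):
    """Fraction of successful attempts, or 0.0 if there were no attempts."""
    return sum(results) / len(results) if results else 0.0
```

For example, success_rate(run_probes(host, port, attempts=10)) returns a value between 0.0 and 1.0, where host and port are the endpoint and port of your data source. A value that is neither 0.0 nor 1.0 suggests an unstable path rather than a configuration error.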

I cannot find the exclusive resource group for Data Integration that I purchased when I test network connectivity for a data source or run a data synchronization node. What do I do?

Make sure that you have associated the exclusive resource group for Data Integration with a DataWorks workspace. For more information, see Create and use an exclusive resource group for Data Integration.

How can I determine the type of the resource group on which a data synchronization node is run from a log?

  • If the node is run on the shared resource group, the log contains the following information: running in Pipeline[basecommon_group_xxxxxxxxx].
  • If the node is run on a custom resource group for Data Integration, the log contains the following information: running in Pipeline[basecommon_xxxxxxxxx].
  • If the node is run on an exclusive resource group for Data Integration, the log contains the following information: running in Pipeline[basecommon_S_res_group_xxx].
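
The three log patterns above can be told apart with a short script. The following is a minimal sketch that assumes the log text contains one of the markers exactly as listed. The function name resource_group_type and the returned labels are hypothetical. Spaces inside the brackets are stripped in case the log line was wrapped or copied with stray whitespace.

```python
import re

# Matches the "running in Pipeline[basecommon_...]" marker described above.
PIPELINE_RE = re.compile(r"running in Pipeline\[(basecommon_[^\]]*)\]")


def resource_group_type(log_text):
    """Return "exclusive", "shared", or "custom" based on the Pipeline marker,
    or None if the log text contains no such marker."""
    match = PIPELINE_RE.search(log_text)
    if match is None:
        return None
    name = match.group(1).replace(" ", "")
    # Check the most specific prefixes first: every name starts with "basecommon_".
    if name.startswith("basecommon_S_res_group_"):
        return "exclusive"
    if name.startswith("basecommon_group_"):
        return "shared"
    return "custom"


print(resource_group_type("running in Pipeline[basecommon_S_res_group_xxx]"))  # exclusive
```

The order of the checks matters: the custom resource group pattern is a prefix of the other two, so it is used only as the fallback.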

How do I change the type of the resource group on which a data synchronization node is run?

  • Change the type of the resource group for scheduling and the type of the resource group for Data Integration on which a data synchronization node is run in the production environment in Operation Center.
  • Change the type of the resource group on which a data synchronization node is run in the production environment based on the deployment process on the DataStudio page:
    Note After you perform the following operations to change the type of the resource group on which a data synchronization node is run, click Deploy to apply the change. For a workspace in standard mode, the change takes effect only in the development environment if you click only Submit. If you want to apply the change to an auto triggered node in the production environment, you must also click Deploy. After the auto triggered node is committed and deployed, you can view the type of the resource group on the Cycle Task page in Operation Center.
    1. Change the type of the resource group for scheduling.
    2. Change the type of the resource group for Data Integration.

How do I troubleshoot the issue that a custom resource group for scheduling waits for gateway resources?

Log on to the DataWorks console. In the left-side navigation pane, click Resource Groups. The Custom Resource Groups tab appears by default. Find the resource group that is used to run a node and click Deploy in the Actions column. In the Create Deploy Task dialog box, check whether the server is in the Stopped state and whether the server is occupied by other nodes.

If the issue persists, run the following commands to switch to the admin user and restart the service:
su - admin
/home/admin/alisatasknode/target/alisatasknode/bin/serverctl restart

How do I view the EIP of a resource group and add the EIP of the resource group to the IP address whitelist of the data source from which I want to synchronize data?

If you want to use an exclusive resource group for Data Integration to run a node to synchronize data from a data source over the Internet, add the EIP of the exclusive resource group to an IP address whitelist of the data source. To obtain and add the EIP of the exclusive resource group for Data Integration to an IP address whitelist of the data source, perform the following operations:

On the Exclusive Resource Groups tab of the DataWorks console, find the exclusive resource group for Data Integration whose EIP you want to view and click View Information in the Actions column. In the Exclusive Resource Groups dialog box, copy the EIP. Then, add the copied EIP to the IP address whitelist of the data source.

Why does a message indicate that a node cannot be run due to insufficient resources in a resource group even though the resource group still has resources?

Check the details of the resource group. In most cases, this issue occurs because the remaining resources in the resource group are insufficient to run the node, for example, because other nodes are already waiting for resources in the resource group.