DataWorks: Network connectivity and operations on resource groups

Last Updated: Feb 08, 2024

What information about DataWorks and its network capabilities do I need to understand before I configure a data synchronization task?

Before you configure a data synchronization task, take note of the following items:

  • The virtual private cloud (VPC), vSwitch, and region of the data source from which you want to synchronize data, and the region in which your DataWorks resource group resides.

  • Whether the data source and your DataWorks resource group reside in different regions or belong to different Alibaba Cloud accounts.

For more information about how to troubleshoot issues that occur when you configure and run a data synchronization task, see Supported data source types, readers, and writers.

If the network connectivity test of a data source fails, see Establish a network connection between a resource group and a data source to troubleshoot the issue.

If you want to use an exclusive resource group for Data Integration, you must perform the following operations before you can configure a data synchronization task. For more information, see Create and use an exclusive resource group for Data Integration.

  1. Purchase an exclusive resource group for Data Integration.

  2. Associate the resource group with the VPC in which the data source resides.

  3. Evaluate whether you need to add a route.

  4. Configure the IP address whitelist of the data source.

  5. Associate the resource group with a DataWorks workspace.

How do I ensure network connectivity between a DataWorks resource group and a self-managed data source that is hosted on an ECS instance when I synchronize data from the data source?

If you want to use an exclusive resource group for Data Integration to access a self-managed data source that is hosted on an ECS instance over an internal network, you must configure network settings for the resource group. For more information, see Create and use an exclusive resource group for Data Integration. When you configure the network settings, take note of the following items:

  • If you associate the exclusive resource group for Data Integration with the VPC in which the ECS instance resides, a route that points to the CIDR block of the VPC is automatically added. We recommend that you do not delete this route. If you delete it, the resource group may fail to access other data sources and an error may be reported during data synchronization.

  • You must add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the IP address whitelist of the data source. For more information, see Add the EIP or CIDR block of an exclusive resource group for Data Integration to an IP address whitelist of a data source.
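The whitelist requirement above can be checked offline before you run a task. The following is a minimal sketch that uses only the Python standard library. The function name `whitelist_covers_cidr` and the sample addresses are assumptions for illustration and are not part of DataWorks. The sketch reports whether a single whitelist entry fully contains the vSwitch CIDR block. It does not try to combine several smaller entries into one range.

```python
import ipaddress


def whitelist_covers_cidr(whitelist, vswitch_cidr):
    """Return True if one whitelist entry contains the whole vSwitch CIDR block.

    whitelist: iterable of IP addresses or CIDR blocks, as strings.
    vswitch_cidr: CIDR block of the vSwitch, as a string.
    Hypothetical helper for illustration; not a DataWorks API.
    """
    target = ipaddress.ip_network(vswitch_cidr, strict=False)
    for entry in whitelist:
        # A bare IP address is parsed as a /32 (or /128) network.
        allowed = ipaddress.ip_network(entry, strict=False)
        # subnet_of() raises TypeError for mixed IPv4/IPv6, so compare versions first.
        if allowed.version == target.version and target.subnet_of(allowed):
            return True
    return False


if __name__ == "__main__":
    # Sample values only; replace them with your own whitelist and vSwitch CIDR block.
    print(whitelist_covers_cidr(["192.168.0.0/16"], "192.168.10.0/24"))
```

If the function returns False, the data source may reject connections from some or all of the addresses that the resource group can use in that vSwitch.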

How do I ensure network connectivity between a DataWorks resource group and a data source that resides in a different region from the resource group when I synchronize data from the data source?

Before you configure and run a data synchronization task, we recommend that you select an appropriate network connectivity solution to establish a network connection between the resource group and the data source. For more information, see Establish a network connection between a resource group and a data source. Take note of the following requirement:

If you want to synchronize data from a data source that resides in a region different from your DataWorks resource group over the Internet, you must add the elastic IP address (EIP) of the exclusive resource group for Data Integration to the IP address whitelist of the data source. For more information, see Add the EIP or CIDR block of an exclusive resource group for Data Integration to an IP address whitelist of a data source.

Note

If you synchronize data over the Internet, you are charged for the traffic that is generated during data synchronization. For more information, see Billing of Internet traffic.

How do I ensure network connectivity between a DataWorks resource group and a data source that belongs to a different Alibaba Cloud account from the resource group when I synchronize data from the data source?

Before you configure and run a data synchronization task, we recommend that you select an appropriate network connectivity solution to establish a network connection between the resource group and the data source. For more information, see Establish a network connection between a resource group and a data source.

  • If you want to synchronize data across accounts over the Internet, you must add the EIP of the exclusive resource group for Data Integration to the IP address whitelist of the data source. For more information, see Add the EIP or CIDR block of an exclusive resource group for Data Integration to an IP address whitelist of a data source.

    Note

    If you synchronize data over the Internet, you are charged for the traffic that is generated during data synchronization. For more information, see Billing of Internet traffic.

  • If you want to synchronize data from a data source over an internal network and the data source belongs to a different Alibaba Cloud account from your exclusive resource group for Data Integration, you must perform the following operations:

    1. Establish a network connection between the VPC in which the data source resides and a VPC (referred to as VPC 1) within the Alibaba Cloud account to which the exclusive resource group for Data Integration belongs. You can use a network connection tool such as a VPN gateway or Express Connect circuit to establish the network connection.

    2. Associate your exclusive resource group for Data Integration with VPC 1.

    3. Add a custom route for your exclusive resource group for Data Integration in the DataWorks console. When you add a custom route, set the Destination Type parameter to IDC, the Connection Method parameter to Fixed IP Address, and then enter the IP address of the data source in the Fixed IP Address field.

    4. Add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the IP address whitelist of the data source. For more information, see Add the EIP or CIDR block of an exclusive resource group for Data Integration to an IP address whitelist of a data source.
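The custom route settings in step 3 can be recorded and sanity-checked before you enter them in the console. The following is a minimal sketch, not a DataWorks API. The `CustomRoute` class and the `cross_account_route` function are hypothetical names introduced for illustration. Only the parameter values (IDC, Fixed IP Address) come from the steps above.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomRoute:
    """Hypothetical record of the console fields described in step 3."""
    destination_type: str
    connection_method: str
    fixed_ip_address: str


def cross_account_route(data_source_ip):
    """Build the route settings for a cross-account data source.

    Raises ValueError if data_source_ip is not a valid IP address,
    so that a typo is caught before the value is entered in the console.
    """
    address = ipaddress.ip_address(data_source_ip)
    return CustomRoute(
        destination_type="IDC",
        connection_method="Fixed IP Address",
        fixed_ip_address=str(address),
    )


if __name__ == "__main__":
    # Sample address only; use the IP address of your own data source.
    print(cross_account_route("10.1.2.3"))
```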

What do I do if the network connectivity test for a data source in a VPC fails?

The network connectivity test performed on a data source is sometimes successful and sometimes fails. What do I do?

Check whether the shared resource group for Data Integration is used. The network of the shared resource group for Data Integration is unstable. We recommend that you use an exclusive resource group for Data Integration to ensure network connection stability.

I cannot find the exclusive resource group for Data Integration that I purchased when I test network connectivity for a data source or run a data synchronization task. What do I do?

Make sure that you have associated the exclusive resource group for Data Integration with the DataWorks workspace that you want to use. For more information, see Create and use an exclusive resource group for Data Integration.

How do I view the type of resource group on which a data synchronization task is run from logs?

  • If the data synchronization task is run on the shared resource group for Data Integration, the logs generated for the task contain the following information: running in Pipeline[basecommon_group_xxxxxxxxx].

  • If the data synchronization task is run on a custom resource group for Data Integration, the logs generated for the task contain the following information: running in Pipeline[basecommon_xxxxxxxxx].

  • If the data synchronization task is run on an exclusive resource group for Data Integration, the logs generated for the task contain the following information: running in Pipeline[basecommon_S_res_group_xxx].
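The three log patterns above can be told apart with a short script. The following is a minimal sketch that assumes the log line contains the `running in Pipeline[...]` marker exactly as listed. The function name `resource_group_type` and the returned labels are assumptions for illustration. The sketch tolerates stray whitespace after `basecommon_`, which can appear in copied log text.

```python
import re

# Captures the name inside "running in Pipeline[...]".
_PIPELINE = re.compile(r"running in Pipeline\[(basecommon_[^\]]*)\]")


def resource_group_type(log_line):
    """Return "exclusive", "shared", "custom", or None if no marker is found.

    Hypothetical helper for illustration; the labels are not DataWorks terms.
    """
    match = _PIPELINE.search(log_line)
    if match is None:
        return None
    name = match.group(1)
    # Check the most specific prefix first.
    if name.startswith("basecommon_S_res_group_"):
        return "exclusive"
    if re.match(r"basecommon_\s*group_", name):
        return "shared"
    # Any other basecommon_ name is treated as a custom resource group.
    return "custom"


if __name__ == "__main__":
    # Sample log line only.
    print(resource_group_type("running in Pipeline[basecommon_S_res_group_123]"))
```

The order of the checks matters because every pattern starts with `basecommon_`. The custom resource group pattern is therefore used as the fallback.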

How do I change the type of resource group on which a data synchronization task is run?

  • Change the resource group for scheduling and the resource group for Data Integration on which a data synchronization task is run in Operation Center in the production environment.

  • Change the resource group on which a data synchronization task in the production environment is run on the DataStudio page and deploy the change operation.

    Note

    After you perform the following operations to change the resource group on which a data synchronization task is run, you must deploy the change operation. For a workspace in standard mode, the change takes effect only in the development environment if you only commit the change operation. If you want the change to take effect in the production environment, you must deploy the change operation. After the change operation is deployed, you can check whether the resource group used by the data synchronization task is changed on the Cycle Task page in Operation Center in the production environment.

    1. Change the resource group for scheduling used by a data synchronization task.

    2. Change the resource group for Data Integration used by a data synchronization task.

What do I do if a custom resource group for scheduling is waiting for gateway resources?

Log on to the DataWorks console. In the left-side navigation pane, click Resource Groups. On the Resource Groups page, click the Custom Resource Groups tab. Find the resource group for scheduling and click Server Management in the Actions column. In the dialog box that appears, check whether the server is in the Stopped state and whether the server is occupied by other tasks.

If the issue persists, switch to the admin user and then restart the service by running the following commands:

su - admin
/home/admin/alisatasknode/target/alisatasknode/bin/serverctl restart

How do I view the EIP of a resource group and add the EIP of the resource group to the IP address whitelist of the data source from which I want to synchronize data?

If you want to use an exclusive resource group for Data Integration to run a task to synchronize data from a data source over the Internet, add the EIP of the exclusive resource group to the IP address whitelist of the data source. To obtain and add the EIP of the exclusive resource group for Data Integration to an IP address whitelist of the data source, perform the following operations:

Go to the Exclusive Resource Groups tab of the Resource Groups page in the DataWorks console, find the exclusive resource group for Data Integration whose EIP you want to view, and then click View Information in the Actions column. In the Exclusive Resource Groups dialog box, copy the EIP. Then, add the copied EIP to the IP address whitelist of the data source.
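After you copy the EIP, you can confirm that the whitelist you plan to apply actually admits it. The following is a minimal sketch that uses only the Python standard library. The function name `eip_is_whitelisted` and the sample addresses are assumptions for illustration and are not part of DataWorks.

```python
import ipaddress


def eip_is_whitelisted(eip, whitelist):
    """Return True if the EIP matches any IP address or CIDR block in the whitelist.

    Hypothetical helper for illustration; not a DataWorks API.
    """
    address = ipaddress.ip_address(eip)
    for entry in whitelist:
        # A bare IP address is parsed as a single-address network.
        network = ipaddress.ip_network(entry, strict=False)
        if address.version == network.version and address in network:
            return True
    return False


if __name__ == "__main__":
    # Sample values only; replace them with the copied EIP and your own whitelist.
    print(eip_is_whitelisted("203.0.113.10", ["203.0.113.0/24"]))
```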

Why is a message indicating that a task cannot be run due to insufficient resources in a resource group displayed when the resource group still has resources?

In most cases, this message is displayed because the resources that remain in the resource group are fewer than the task requires, even though the resource group is not fully occupied. In this case, some tasks may be waiting for the resources in the resource group. To resolve this issue, view the details of the resource group.