This topic explains how to connect DataWorks to a data source in a different Alibaba Cloud account. It uses an ApsaraDB RDS for MySQL instance as an example.
Use cases
This solution is recommended if your data source and DataWorks workspace meet all of the following conditions:
The data source is an Alibaba Cloud product.
The data source and the DataWorks workspace belong to different Alibaba Cloud accounts.
How it works
For the different-account scenario, we recommend that you use a VPC (private network) connection. Use a network connectivity tool (Cloud Enterprise Network or VPC Peering Connection) to connect the data source under Account A with the DataWorks workspace resource group under Account B.
Prerequisites
-
You have an Alibaba Cloud data source that is supported by DataWorks.
-
You have created a workspace.
-
You have created a resource group and bound it to the workspace.
-
The data source and DataWorks workspace meet the requirements described in the Use cases section.
-
You have configured cross-account authorization in the Alibaba Cloud account that owns the data source.
Billing
Billing varies depending on the network connectivity tool you choose. For details, see Cloud Enterprise Network billing or VPC peering connection billing.
Using a VPC Peering Connection is free if the data source and DataWorks resource group are in different accounts but the same region.
Configure network connectivity
The following steps provide a general overview of the configuration process to help you understand the core logic. For a detailed walkthrough with specific values, see the Configuration example section.
Step 1: Collect basic information
Data source side
-
Account: This topic uses Account A as an example.
-
Region: This topic uses an ApsaraDB RDS for MySQL instance in the China (Hangzhou) region as an example.
-
VPC and vSwitch information:
Note: This topic uses an ApsaraDB RDS for MySQL instance as an example. For other Alibaba Cloud instances, refer to the product's official documentation for instructions on how to obtain VPC information.
-
Go to the ApsaraDB RDS console, find the target instance, and click its Instance Name to open the Basic Information page.
-
In the left-side navigation pane, click Database Connection to obtain the VPC and vSwitch information for the ApsaraDB RDS for MySQL instance.
On the Database Connection page, find the Network Type row to view the VPC name (for example, Account_A_hangzhou_VPC), its CIDR block (for example, 192.168.0.0/16), and its vSwitch information (for example, Account_A_hangzhou_vSwitch).
DataWorks side
-
Account: This topic uses Account B as an example.
-
Region: This topic uses a DataWorks workspace and resource group in the China (Shanghai) region as an example.
-
VPC and vSwitch information of the resource group:
-
Go to the resource group list page in DataWorks, find the target resource group, and click Network Settings in the Operation column.
-
Under the relevant feature module, view the bound VPC and vSwitch information.
For example, if you need to connect an ApsaraDB RDS for MySQL instance to DataWorks for data synchronization, view the corresponding VPC and vSwitch information under Data Scheduling & Data Integration.
On the Network Settings page of the resource group, click the VPC Binding tab. In the Bound VPC column for the relevant region, find the VPC ID. In the Bound vSwitch column, find the vSwitch ID.
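Before you connect the two VPCs, it can help to sanity-check the values collected above. The following Python sketch uses the standard `ipaddress` module to confirm that the two VPC CIDR blocks do not overlap, because overlapping ranges typically make routes between VPCs ambiguous. This check is an added suggestion and not a step from this topic. The block `192.168.0.0/16` is the example used here; the DataWorks-side block `172.16.0.0/12` is an assumed placeholder, so substitute the value of the VPC bound to your resource group.

```python
import ipaddress

# Values collected in Step 1. The data source VPC CIDR block is the example
# from this topic; the DataWorks VPC CIDR block is an assumed placeholder.
data_source_vpc = ipaddress.ip_network("192.168.0.0/16")
dataworks_vpc = ipaddress.ip_network("172.16.0.0/12")  # assumption


def cidrs_overlap(a, b):
    """Return True if the two networks share at least one address."""
    return a.overlaps(b)


if cidrs_overlap(data_source_vpc, dataworks_vpc):
    print("CIDR blocks overlap: review the network plan before connecting the VPCs.")
else:
    print("CIDR blocks do not overlap.")
```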
Step 2: Establish network connection
To connect VPCs across different accounts, you must use a network connectivity tool. Choose one based on your needs:
-
Cloud Enterprise Network (CEN): Suitable for complex enterprise network environments and for interconnecting multiple VPCs. For configuration details, see Interconnect VPCs that belong to different accounts.
-
VPC Peering Connection: Suitable for point-to-point VPC interconnection. For configuration details, see Use a VPC Peering Connection to achieve private communication between VPCs.
If you encounter issues during network configuration, submit a ticket to contact Alibaba Cloud technical support.
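The choice of tool and the regions involved determine which steps you need. The following Python sketch summarizes that logic as described in this topic. The function name `plan_steps` and the tool keys `"cen"` and `"peering"` are hypothetical labels chosen for illustration; the step descriptions paraphrase the configuration example later in this topic.

```python
def plan_steps(tool, data_source_region, dataworks_region):
    """List the high-level steps from this topic for the chosen tool."""
    if tool == "cen":
        steps = [
            "Create a CEN instance in the DataWorks account",
            "Create a VPC connection for the resource group's VPC",
            "Authorize the data source's VPC for the CEN instance",
            "Create a cross-account VPC connection for the data source's VPC",
        ]
        # An inter-region connection is only needed when the regions differ.
        if data_source_region != dataworks_region:
            steps.append("Create an inter-region connection")
    elif tool == "peering":
        steps = [
            "Create the peering connection from the requester account",
            "Accept the peering connection in the accepter account",
            "Configure route entries on both VPCs",
        ]
    else:
        raise ValueError(f"unknown tool: {tool!r}")

    # These steps apply regardless of the tool.
    steps.append("Add a route to the DataWorks resource group")
    steps.append("Add the resource group's vSwitch CIDR block to the whitelist, if one is used")
    return steps


for number, step in enumerate(plan_steps("cen", "cn-hangzhou", "cn-shanghai"), start=1):
    print(f"{number}. {step}")
```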
Step 3: Add a route to the resource group
When DataWorks accesses a data source in a different account, you must also add a route in the DataWorks resource group that points to the CIDR block of the data source's vSwitch.
-
Go to the resource group list page in DataWorks, find the target resource group, and click Network Settings in the Operation column.
-
Under the relevant feature module, find the bound VPC and click Custom Route in the Operation column.
-
Click Add Route, select CIDR Block for the connection method, and set Destination CIDR Block to the CIDR block of the data source's vSwitch.
Step 4 (Optional): Configure whitelist
If the data source uses a whitelist, add the CIDR block of the vSwitch bound to the resource group to the data source's whitelist to allow access.
This topic uses the IP address whitelist of an ApsaraDB RDS for MySQL instance as an example. In the Whitelist and Security Group settings, add the CIDR block of the vSwitch that is bound to the DataWorks resource group in Account B.
For other Alibaba Cloud instances, refer to the product's official documentation for instructions on how to configure a whitelist.
After the vSwitch CIDR block is added, you can view it in the dataworks group on the Whitelist Settings tab, for example, 172.16.66.0/24.
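To reason about whether a whitelist already covers the resource group, you can compare CIDR blocks offline. The following Python sketch checks whether a vSwitch CIDR block is contained in any whitelist entry. The helper name `whitelist_allows` and the sample whitelist are hypothetical; only `172.16.66.0/24` comes from this topic. The sketch does not query ApsaraDB RDS, so copy the entries from the console yourself.

```python
import ipaddress


def whitelist_allows(whitelist, vswitch_cidr):
    """Return True if vswitch_cidr lies entirely inside some whitelist entry."""
    vswitch = ipaddress.ip_network(vswitch_cidr)
    for entry in whitelist:
        # A bare IP address such as "127.0.0.1" is parsed as a /32 network.
        network = ipaddress.ip_network(entry)
        if network.version == vswitch.version and vswitch.subnet_of(network):
            return True
    return False


# Hypothetical whitelist contents copied from the console.
example_whitelist = ["127.0.0.1", "172.16.66.0/24"]
print(whitelist_allows(example_whitelist, "172.16.66.0/24"))
```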
Verify network connectivity
-
Log on to the DataWorks console. In the target region, click in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Integration.
-
In the left-side navigation pane, click Data Sources. On the data source list page, click Add Data Source, select a data source type, and then configure the connection parameters.
-
In the list of resource groups at the bottom of the page, find the one you configured and click Test Connectivity. A Connected status indicates a successful connection.
Note: If the result is Cannot connect, you can use the Network Connectivity Diagnostic Tool to diagnose the issue. If the resource group still cannot connect to the data source, submit a ticket for assistance.
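If the console test fails, a plain TCP probe can help separate network problems from credential problems. The following Python sketch is a generic reachability check, not a DataWorks feature: it assumes you run it on a host you control that sits in a VPC with the same connectivity to the data source (for example, an ECS instance), and it does not replace Test Connectivity. The function name `can_connect` is hypothetical.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_connect("rm-example.mysql.rds.aliyuncs.com", 3306)` would probe the default MySQL port on a hypothetical internal endpoint. A `False` result points to routing, whitelist, or security group settings rather than to the database account.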
Configuration example
This example walks you through establishing network connectivity between an ApsaraDB RDS for MySQL instance in Account A in the China (Hangzhou) region and a DataWorks workspace in Account B in the China (Shanghai) region.
1. Basic information
| Parameter | Data source (RDS for MySQL) | DataWorks resource group |
| --- | --- | --- |
| Account | Account A | Account B |
| Region | China (Hangzhou) | China (Shanghai) |
| VPC | On the Database Connection page of the ApsaraDB RDS instance Account_A_MySQL_Source, confirm that the network type is VPC and the internal endpoint is | On the resource group details page, go to the Network Settings > VPC Binding tab. In the Data Scheduling & Data Integration section, you can view the bound VPC, vSwitch, vSwitch CIDR block, and security group information. Confirm that cross-account VPC binding is successful. |
2. Establish network connection
This solution supports using Cloud Enterprise Network (CEN) or a VPC Peering Connection to establish network connectivity between the data source and DataWorks. Choose the method that suits your needs.
If you encounter issues during network configuration, submit a ticket to contact Alibaba Cloud technical support.
Cloud Enterprise Network (CEN)
-
Log on to Account B, go to the Cloud Enterprise Network (CEN) console, and click Create CEN Instance. In the dialog box that appears, set a Name for the instance and click Confirm.
Note: As a big data processing platform, DataWorks may need to connect to data sources in different accounts and VPCs. We recommend that you create the CEN instance in the same Alibaba Cloud account as DataWorks for centralized management.
-
In the dialog box, click Create Connection and configure the network information for the DataWorks resource group.
The following table describes the key parameters for this example. Use the default values for any parameters not mentioned.
| Parameter | Configuration and example |
| --- | --- |
| Instance Type | For cross-account VPC interconnection, select VPC. |
| Region | Select the region of the DataWorks resource group, which is China (Shanghai) in this example. |
| Resource Owner UID | Select Current Account. |
| VPC | Select the VPC instance where the DataWorks resource group is located. |
| vSwitch | Select the vSwitch where the resource group is located. In this example, select Account_B_Switch_sh_e. Note: CEN requires at least two vSwitches in different availability zones for disaster recovery. In addition to the resource group's vSwitch, you must add another one from a different availability zone. If you do not have at least two vSwitches, go to the vSwitch console to create another one before proceeding. |
-
Click Create.
-
Authorize the cross-account VPC instance.
-
Log on to Account A, go to the VPC console, and find the VPC instance where the data source is located. In this example, the VPC instance is Account_A_hangzhou_VPC. Click the instance name to go to the Basic Information page.
-
Switch to the Cross-account Authorization tab, click CEN, and then configure the parameters based on the following information.
| Parameter | Configuration and example |
| --- | --- |
| Peer account UID | The UID of the Alibaba Cloud account for Account B. |
| Peer CEN instance ID | The ID of the CEN instance that you created in Step 1. |
| Payer | Select the party responsible for payment. Peer Account UID (default): The connection fee and data transfer fee generated by the VPC instance are billed to the account that owns the CEN instance. VPC Users: The connection fee and data transfer fee generated by the VPC instance are billed to the account that owns the VPC instance. This example uses the default value. |

Important: Choose the payer carefully. Changing the payer later may affect your services. For more information, see Authorize a network instance that belongs to another account.
-
Click OK.
-
Create a cross-account VPC connection.
-
Log on to Account B, go to the Cloud Enterprise Network (CEN) console, and click the ID of the CEN instance you created to go to its Basic Information page.
-
On the Transit Router tab, find the transit router you created. In the Operation column, click Create Connection and configure the network information for the data source.
The following table describes the key parameters for this example. Use the default values for any parameters not mentioned.
| Parameter | Configuration and example |
| --- | --- |
| Instance Type | For cross-account VPC interconnection, select VPC. |
| Region | Select the region of the data source, which is China (Hangzhou) in this example. |
| Resource Owner UID | Select Cross-account and enter the UID of the Alibaba Cloud account for Account A. |
| VPC | Select the VPC instance where the data source is located. |
| vSwitch | Select the vSwitch where the data source is located. In this example, select Account_A_Switch_hz_h. Note: CEN requires at least two vSwitches in different availability zones for disaster recovery. In addition to the data source's vSwitch, you must add another one from a different availability zone. If you do not have at least two vSwitches, go to the vSwitch console to create another one before proceeding. |
-
Click Create.
-
Create an inter-region connection.
Note: In this example, the data source and DataWorks are in different Alibaba Cloud accounts and different regions, so you must also create an inter-region connection. If your data source and DataWorks are in different accounts but in the same region, skip this step.
-
Log on to Account B, go to the Cloud Enterprise Network (CEN) console, and click the ID of the CEN instance you created to go to its Basic Information page.
-
On the Transit Router tab, find the transit router for China (Hangzhou) (the data source's region). In the Operation column, click Create Connection and configure the inter-region connection information.
| Parameter | Configuration and example |
| --- | --- |
| Region | Select China (Hangzhou). |
| Peer region | Select China (Shanghai). |
-
Click Create.
VPC Peering Connection
-
Log on to Account A, go to the VPC Peering Connection console, switch the region to China (Hangzhou) at the top of the page, and then click Create Peering Connection. Configure the relevant parameters.
The following table describes the key parameters for this example. Use the default values for any parameters not mentioned.
| Parameter | Configuration and example |
| --- | --- |
| Peering Connection Name | Enter a custom name. For this example, use Account_A to Account_B. |
| Requester VPC | Select the VPC where the ApsaraDB RDS for MySQL data source in Account A is located. For this example, select Account_A_hangzhou_VPC. |
| Accepter Account Type | This example uses Cross-account. |
| Accepter Alibaba Cloud Account UID | Enter the UID of the Alibaba Cloud account for Account B. |
| Accepter Region Type | This example uses Cross-region. |
| Accepter Region | Select China (Shanghai), the region of the DataWorks workspace and resource group in Account B. |
| Accepter VPC | Manually enter the VPC ID for the DataWorks resource group in Account B (Account_B_shanghai_VPC). |
-
Click OK. The connection is created with a Status of Pending Acceptance.
-
Log on to Account B, go to the VPC Peering Connection console, and switch the region to China (Shanghai) at the top of the page. Find the peering connection initiated from Account A. In the Operation column, click Accept. The connection's Status then changes to Activated.
-
Under Accepter VPC, click Configure Route Entry. In the Configure Route Entry dialog box, enter a custom Name for the route entry and set the Destination CIDR Block to the requester's vSwitch CIDR block. For this example, enter 192.168.6.0/24.
-
Log on to Account A, go to the VPC Peering Connection console, switch the region to China (Hangzhou) at the top of the page, and find the peering connection you created.
-
Under Requester VPC, click Configure Route Entry. In the Configure Route Entry dialog box, enter a custom Name for the route entry and set the Destination CIDR Block to the accepter's vSwitch CIDR block. For this example, enter 172.16.66.0/24.
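The two route entries mirror each other: each VPC needs a route whose destination is the other side's vSwitch CIDR block. The following Python sketch makes that pairing explicit using the example values from this topic. The `RouteEntry` class and the `peering_routes` function are hypothetical illustrations and do not call any Alibaba Cloud API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteEntry:
    configured_on: str     # the VPC whose route table holds the entry
    destination_cidr: str  # traffic for this block goes over the peering connection


def peering_routes(requester_vswitch_cidr, accepter_vswitch_cidr):
    """Return the route entries needed on each side of a peering connection."""
    return [
        # The accepter side routes toward the requester's vSwitch.
        RouteEntry("Accepter VPC (Account B)", requester_vswitch_cidr),
        # The requester side routes toward the accepter's vSwitch.
        RouteEntry("Requester VPC (Account A)", accepter_vswitch_cidr),
    ]


for entry in peering_routes("192.168.6.0/24", "172.16.66.0/24"):
    print(f"{entry.configured_on}: destination {entry.destination_cidr}")
```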
3. Add a route to the resource group
-
Log on to Account B, go to the resource group list page in DataWorks, find the target resource group, and click Network Settings in the Operation column.
-
Under the relevant feature module, find the bound VPC and click Custom Route in the Operation column.
-
Click Add Route, select CIDR Block for the connection method, and set Destination CIDR Block to the CIDR block of the ApsaraDB RDS for MySQL instance's vSwitch in Account A. For this example, use 192.168.6.0/24.
4. Configure whitelist
Log on to Account A and add the vSwitch CIDR Block for the DataWorks resource group to the Whitelist and Security Group settings for the ApsaraDB RDS for MySQL instance. For this example, the CIDR block is 172.16.66.0/24.
After the vSwitch CIDR block is added, you can view it in the dataworks group on the Whitelist Settings tab, for example, 172.16.66.0/24.
5. Test connectivity
Before you perform this step, make sure that you have configured cross-account authorization in the Alibaba Cloud account that owns the data source, which is Account A in this example.
-
Log on to Account B.
-
Log on to the DataWorks console. In the target region, click in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Integration.
-
In the left-side navigation pane, click Data Sources. On the Data Sources page, click Add Connection.
-
Select the MySQL data source type and configure its information.
-
For Configuration Mode, select ApsaraDB for RDS.
-
For Alibaba Cloud Account, select Another Alibaba Cloud Account.
-
For ID of Another Alibaba Cloud Account, enter the UID of Account A.
-
For Name of Role Assigned to RAM User, enter the name of the RAM role that you configured in Account A. For more information, see cross-account authorization.
-
For Region, select China (Hangzhou).
-
For Instance, select the ApsaraDB RDS for MySQL instance that you created in the China (Hangzhou) region of Account A and for which you have configured the network connection.
-
In the Connection Configuration section, click Test Network Connectivity for the resource group bound to the workspace.
Verify that the connectivity status is Connected.
Note: If the test result shows Failed, use the Network Connectivity Diagnostic Tool to troubleshoot. If connectivity still fails, submit a ticket for assistance.
Related documentation
For answers to frequently asked questions about network connectivity, see Resource group operations and network connectivity.
