When you use DataWorks Data Integration to sync data from an RDS, Hive, or Kafka instance that belongs to a different Alibaba Cloud account, you must configure cross-account authorization. This lets DataWorks assume a Resource Access Management (RAM) role in the account that owns the instance and read data from it using temporary credentials.
This topic applies when you add a data source with Data Source Type set to Alibaba Cloud Instance Mode and the instance and the DataWorks workspace belong to different Alibaba Cloud accounts.
Account labels used in this topic
Label | Meaning |
Source account | The Alibaba Cloud account that owns the RDS, Hive, or Kafka instance |
DataWorks account | The Alibaba Cloud account that owns the DataWorks workspace |
Prerequisites
Before you begin, ensure that you have:
A network connection between the VPC of the data source instance and the VPC of the DataWorks resource group, for example through Cloud Enterprise Network (CEN). For available solutions, see Network connection solutions.
How it works
Cross-account authorization uses RAM role assumption (Security Token Service (STS) AssumeRole):
The source account creates a RAM role that trusts the DataWorks account as a principal.
The source account attaches a read-only system policy to the role so that DataWorks can read from the instance.
The DataWorks account adds the data source in Data Integration, specifying the cross-account RAM role.
When the sync task runs, DataWorks assumes the RAM role and reads data from the instance using temporary credentials.
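The trust relationship in the first and last steps can be sanity-checked locally. The following is a minimal Python sketch, not how RAM evaluates policies: the function name `allows_dataworks` is hypothetical, it only looks for an Allow statement for `sts:AssumeRole` that names the `<UID>@cdp.aliyuncs.com` service principal used later in this topic, and it ignores Deny statements and conditions.

```python
def allows_dataworks(trust_policy: dict, dataworks_uid: str) -> bool:
    """Return True if the trust policy lets the DataWorks service principal
    of the given account call sts:AssumeRole (simplified local check)."""
    expected = f"{dataworks_uid}@cdp.aliyuncs.com"
    for stmt in trust_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        # Action and Service may each be a single string or a list.
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        services = stmt.get("Principal", {}).get("Service", [])
        if isinstance(services, str):
            services = [services]
        if expected in services:
            return True
    return False
```

If the check returns False for the role's trust policy, the DataWorks account would likely be unable to assume the role when the sync task runs.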
Step 1: Configure the source account
Perform the following steps in the source account. After you complete all three sub-steps, the RAM role will have read access to the instance and will trust the DataWorks account as a principal.
Create a RAM role
Log on to the Resource Access Management (RAM) console and go to the Roles page.
Create a RAM role. For details, see Create a RAM role for a trusted Alibaba Cloud account. Set the following parameters:
Parameter | Value |
Select Trusted Entity | Alibaba Cloud Account |
RAM Role Name | A custom name of your choice |
Select Account | Other Alibaba Cloud Account |
Account UID field | UID of the DataWorks account |
Attach a system policy
Grant read permissions to the RAM role. For details, see Grant permissions to a RAM role.
Set Authorization Policy to System Policy and select the policy name for your instance type:
Instance type | Policy name |
RDS (MySQL, SQL Server, PostgreSQL, MariaDB) | |
Hive | |
Kafka | |
Update the trust policy
Modify the trust policy of the RAM role so that the DataWorks account can assume it. For details, see Modify the trusted entity of a RAM role.
Replace the existing trust policy with the following:
{
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "<UID of the Alibaba Cloud account that owns the DataWorks workspace>@cdp.aliyuncs.com"
        ]
      }
    }
  ],
  "Version": "1"
}
Replace <UID of the Alibaba Cloud account that owns the DataWorks workspace> with the UID of the Alibaba Cloud account that owns your DataWorks workspace.
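To avoid hand-editing the placeholder, the trust policy can be generated from the UID. The following is a minimal Python sketch: the function name `build_trust_policy` and the sample UID are hypothetical, and the numeric check assumes that account UIDs consist only of digits.

```python
import json


def build_trust_policy(dataworks_uid: str) -> dict:
    """Return the trust policy from this topic with the UID filled in."""
    # Assumption: Alibaba Cloud account UIDs are purely numeric.
    if not dataworks_uid.isdigit():
        raise ValueError("expected a numeric Alibaba Cloud account UID")
    return {
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {
                    "Service": [f"{dataworks_uid}@cdp.aliyuncs.com"]
                },
            }
        ],
        "Version": "1",
    }


if __name__ == "__main__":
    # Placeholder UID for illustration only; use the DataWorks account UID.
    print(json.dumps(build_trust_policy("1234567890123456"), indent=2))
```

The printed JSON can then be pasted into the trust policy editor of the RAM role.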
At this point, the source account configuration is complete. The RAM role is created, has read-only access to the instance, and trusts the DataWorks account as a principal.
Step 2: Add the data source in DataWorks
Perform the following steps in the DataWorks account.
Log on to the DataWorks console. In the top navigation bar, select the region. In the left navigation pane, choose Data Integration > Data Integration. Select the workspace from the drop-down list and click Go to Data Integration.
Add an RDS, Hive, or Kafka data source and set the following parameters:
Note: For Account Of Instance, select Other Cloud Account or Other Alibaba Cloud Account, depending on which option your data source configuration shows.
Parameter | Value |
Data Source Type | Alibaba Cloud Instance Mode |
Account Of Instance | Other Cloud Account or Other Alibaba Cloud Account |
UID of Other Alibaba Cloud Account | UID of the source account |
RAM Role for Authorization | Name of the RAM role created in Step 1 |
Test network connectivity.
What's next
After the data source is added and network connectivity is verified, create sync tasks in Data Integration to read data from the cross-account instance.