This topic describes how to use the RAM authorization mode to configure a connection to an Object Storage Service (OSS) bucket, so as to improve the security of data on Alibaba Cloud.

Prerequisites

The RAM user that you use to configure a connection to a data store in RAM authorization mode must be granted the AliyunDataWorksFullAccess and AliyunRAMFullAccess permissions. If you use your Alibaba Cloud account, skip this prerequisite. To grant the permissions to a RAM user, perform the following steps:
  1. Log on to the RAM console by using your Alibaba Cloud account.
  2. In the left-side navigation pane, click Users.
  3. In the Actions column of the RAM user to which you want to grant permissions, click Add Permissions.
  4. In the Add Permissions dialog box, set Authorization to Alibaba Cloud account all resources. In the Select Policy section, click AliyunDataWorksFullAccess and AliyunRAMFullAccess.
  5. Click OK.

Background information

Connections to data stores are the basis of data synchronization and are crucial to the security of enterprise data in the cloud. DataWorks supports the RAM authorization mode, which provides higher security, for connections to data stores such as OSS, AnalyticDB for MySQL 2.0, LogHub, Tablestore, and Hologres. This mode improves the security of data in the cloud and helps prevent the abuse of data stores and the leak of AccessKey pairs.
You can use either the RAM authorization mode or the AccessKey mode to connect to data stores. This topic uses the RAM authorization mode, but you can use the AccessKey mode if your business requires it. The two modes work as follows:
  • AccessKey mode

    The AccessKey mode is less secure than the RAM authorization mode. An AccessKey pair consists of an AccessKey ID and an AccessKey secret. To configure a connection to a data store, you only need to enter the AccessKey pair of your Alibaba Cloud account or RAM user.

    To configure a connection to an OSS bucket, go to the Add OSS data source dialog box and enter the AccessKey pair of an account that is granted the permission to connect to the OSS bucket.

    When a sync node is run or scheduled, DataWorks uses the AccessKey pair to connect to OSS and read or write data.
    Note In AccessKey mode, if an AccessKey pair is leaked, OSS data may be leaked as well.
  • RAM authorization mode

    The RAM authorization mode is more secure than the AccessKey mode. No AccessKey pairs are used in this mode, so no AccessKey pair can be leaked.

    In RAM authorization mode, you authorize the DataWorks service account to assume a RAM role, and DataWorks uses that role to connect to OSS without an AccessKey pair.

    You can assign permissions on different data stores to different roles to implement fine-grained permission management for enterprise users.

Workflow

The following workflow describes how to configure a connection to a data store in RAM authorization mode, create a sync node based on the connection, and run the sync node. In this workflow, a RAM user must be granted relevant permissions before it can function in the same way as an Alibaba Cloud account.

  1. Go to the DataWorks Data Integration page by using your Alibaba Cloud account or a RAM user that is granted the AliyunDataWorksFullAccess permission, and enable the RAM authorization mode.
  2. Go to the RAM console by using your Alibaba Cloud account or a RAM user that is granted the AliyunRAMFullAccess permission, and define the role to be assumed and the policy to be attached.
    • Role to be assumed: You must define a role for the DataWorks service account to assume. After the DataWorks service account assumes the role, the DataWorks service account can be used to connect to OSS as limited by the permissions granted to the role.
    • Policy to be attached: You must create a policy that includes the PassRole permission and attach the policy to a user so that the user can use the specified role to connect to data stores or run sync nodes.
  3. Go to the RAM console by using your Alibaba Cloud account or a RAM user that is granted the AliyunRAMFullAccess permission, and grant permissions to the RAM user to be used in steps 4 and 6.
    Note If an unauthorized RAM user is used to configure a connection to a data store in RAM authorization mode, all sync nodes created based on the connection will fail.
  4. Go to the DataWorks Data Integration page and configure a connection to a data store in RAM authorization mode. During the execution of sync nodes, the DataWorks service account assumes the specified RAM role to connect to the data store.
    Note You must be authorized in Step 3 before you can perform the operations in this step as a RAM user.
  5. Go to the DataStudio page and create a sync node based on the configured connection.
  6. Run the sync node on the DataStudio or Operation Center page.
    Note You must be authorized in Step 3 before you can perform the operations in this step as a RAM user.

Procedure

  1. Enable the RAM authorization mode.
    Before you use the RAM authorization mode for the first time, you must enable it so that the DataWorks service account can assume the specified role to connect to data stores. You need to enable the mode only once. To enable the RAM authorization mode, perform the following steps:
    1. Go to the Data Source page by using your Alibaba Cloud account or a RAM user that is granted the AliyunDataWorksFullAccess permission.
    2. Click New data source in the upper-right corner.
    3. In the Add data source dialog box, click OSS in the Semi-structured storage section.
    4. In the Add OSS data source dialog box, set Access mode to RAM authorization mode.
    5. In the Warning dialog box, click Enable authorization.
  2. Create a role to be assumed.
    You can create different roles for connecting to different data stores.
    Note Only Alibaba Cloud accounts and RAM users that are granted the AliyunRAMFullAccess permission are allowed to perform the operations in this step.

    This section describes how to create a role in the following scenario:

    An enterprise has 100 OSS buckets that store all the data of the enterprise, and the big data team needs to use the data of only two OSS buckets. If the preset AliyunDataWorksAccessingOSSRole role is used, the other 98 OSS buckets may be accessed by the big data team, causing management risks.

    Therefore, the cloud account owner can create a role named BigDataOSSRole for the big data team, and allow only the big data team to use the role. This helps isolate permissions across teams.

    1. Log on to the RAM console.
    2. In the left-side navigation pane, click RAM Roles.
    3. On the RAM Roles page, click Create RAM Role.
    4. In the Create RAM Role right-side pane, set Trusted entity type to Alibaba Cloud Account in the Select Role Type step and click Next.
    5. In the Configure Role step, set RAM Role Name to BigDataOSSRole and Select Trusted Alibaba Cloud Account to Current Alibaba Cloud Account.
    6. Click OK.
    7. In the Finish step, click Add Permissions to RAM Role.
    8. In the Add Permissions right-side pane, click Create Policy in the Select Policy section. For more information, see Create a custom policy.
      The following policy grants users the read and write permissions on the two OSS buckets:
      {
          "Version": "1",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": [
                      "oss:GetObject",
                      "oss:ListObjects",
                      "oss:GetObjectMetadata",
                      "oss:GetObjectMeta",
                      "oss:GetBucketAcl",
                      "oss:GetBucketInfo",
                      "oss:PutObject",
                      "oss:DeleteObject",
                      "oss:PutBucket"
                  ],
                  "Resource": [
                      "acs:oss:*:*:bucket_name_1",
                      "acs:oss:*:*:bucket_name_1/*",
                      "acs:oss:*:*:bucket_name_2",
                      "acs:oss:*:*:bucket_name_2/*"
                  ]
              }
          ]
      }
    9. On the RAM Roles page, click BigDataOSSRole.
      On the details page of the BigDataOSSRole role, click the Trust Policy Management tab and then click Edit Trust Policy. In the Edit Trust Policy right-side pane, replace the trust policy of BigDataOSSRole with the following script, and click OK. Then, the DataWorks Data Integration service account is allowed to assume the role.
      Notice This step cannot be skipped. Otherwise, this role cannot be used.
      {
          "Statement": [
              {
                  "Action": "sts:AssumeRole",
                  "Effect": "Allow",
                  "Principal": {
                      "Service": [
                          "di.dataworks.aliyuncs.com"
                      ]
                  }
              }
          ],
          "Version": "1"
      }
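The two JSON documents in this step can also be generated instead of edited by hand, which helps when the list of buckets changes. The following Python sketch uses only the standard library. The helper names build_oss_policy and build_trust_policy are hypothetical and are not part of any Alibaba Cloud SDK. The action list is copied from the example policy in this step, and the sketch assumes that object-level actions need the bucket_name/* resource in addition to the bucket itself.

```python
import json

# Actions copied from the example policy in this step.
OSS_ACTIONS = [
    "oss:GetObject",
    "oss:ListObjects",
    "oss:GetObjectMetadata",
    "oss:GetObjectMeta",
    "oss:GetBucketAcl",
    "oss:GetBucketInfo",
    "oss:PutObject",
    "oss:DeleteObject",
    "oss:PutBucket",
]


def build_oss_policy(bucket_names):
    """Return a policy dict limited to the given buckets (hypothetical helper)."""
    if not bucket_names:
        raise ValueError("at least one bucket name is required")
    resources = []
    for name in bucket_names:
        resources.append(f"acs:oss:*:*:{name}")    # the bucket itself
        resources.append(f"acs:oss:*:*:{name}/*")  # objects in the bucket (assumption)
    return {
        "Version": "1",
        "Statement": [
            {"Effect": "Allow", "Action": list(OSS_ACTIONS), "Resource": resources}
        ],
    }


def build_trust_policy(service="di.dataworks.aliyuncs.com"):
    """Return a trust policy that lets the given service assume the role."""
    return {
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {"Service": [service]},
            }
        ],
        "Version": "1",
    }


if __name__ == "__main__":
    # Print both documents so that they can be pasted into the RAM console.
    print(json.dumps(build_oss_policy(["bucket_name_1", "bucket_name_2"]), indent=4))
    print(json.dumps(build_trust_policy(), indent=4))
```

Generating the policy from a list keeps the bucket scope explicit: a bucket that is not in the list does not appear in Resource, so the role cannot reach it.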
  3. Authorize the users that can use specific roles.
    After you determine the roles to be assumed, you must attach a policy that includes the PassRole permission to specific users so that the users can use the roles to connect to data stores and run sync nodes. You can also configure the relationships between users and roles, that is, specify which users are allowed to use which roles.
    • Policy template 1: You can create a policy by referring to the following template. The template allows authorized users to use all roles related to DataWorks Data Integration, so we recommend that you use it with caution.
      {
                  "Action": "ram:PassRole",
                  "Resource": "*",
                  "Effect": "Allow",
                  "Condition": {
                      "StringEquals": {
                          "acs:Service": "di.dataworks.aliyuncs.com"
                      }
                  }
      }
    • Policy template 2: You can customize a policy that includes the PassRole permission, and configure the relationships between users and roles.
      Note Only Alibaba Cloud accounts and RAM users that are granted the AliyunRAMFullAccess permission are allowed to perform the operations in this step.

      In this example, after the BigDataOSSRole role is defined for the big data team, you must specify that only relevant users are allowed to use the role. You can customize the BigDataOSSRoleAllowUse policy to authorize relevant users to use the role.

      To create the BigDataOSSRoleAllowUse policy, perform the following steps:
      1. On the Policies page of the RAM console, click Create Policy.
      2. On the Create Custom Policy page, set Policy Name to BigDataOSSRoleAllowUse, set Configuration Mode to Script, and enter the following script:
        {
            "Version": "1",
            "Statement": [
                {
                    "Action": "ram:PassRole",
                    "Resource": "acs:ram::19122324****:role/BigDataOSSRole",
                    "Effect": "Allow",
                    "Condition": {
                        "StringEquals": {
                            "acs:Service": [
                                "oss.aliyuncs.com",
                                "di.dataworks.aliyuncs.com"
                            ]
                        }
                    }
                }
            ]
        }
        Note Replace the UID (19122324****) in the preceding script with the UID of your Alibaba Cloud account.
      3. Attach the BigDataOSSRoleAllowUse policy to RAM users that are allowed to use the BigDataOSSRole role.

        RAM users to which the BigDataOSSRoleAllowUse policy is attached can use the BigDataOSSRole role as the identity to connect to data stores and run sync nodes.
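If several teams each have their own role, the PassRole policy for each team can be produced from the account UID and the role names. The following Python sketch is one possible way to do this with the standard library only. The helpers role_arn, build_pass_role_policy, and allowed_roles are hypothetical names, the UID 1234567890123456 is a placeholder, and the ARN layout is assumed to follow the example in this step. The allowed_roles function is a simplified reading of the policy for review purposes and does not reproduce how RAM evaluates policies.

```python
import json

DI_SERVICE = "di.dataworks.aliyuncs.com"


def role_arn(account_uid, role_name):
    """Build a role ARN in the layout used by the example policy in this step."""
    return f"acs:ram::{account_uid}:role/{role_name}"


def build_pass_role_policy(account_uid, role_names, service=DI_SERVICE):
    """Return a policy that allows passing only the listed roles to the service."""
    if not role_names:
        raise ValueError("at least one role name is required")
    return {
        "Version": "1",
        "Statement": [
            {
                "Action": "ram:PassRole",
                "Resource": [role_arn(account_uid, name) for name in role_names],
                "Effect": "Allow",
                "Condition": {"StringEquals": {"acs:Service": service}},
            }
        ],
    }


def allowed_roles(policy):
    """List the role ARNs that Allow statements with ram:PassRole name."""
    arns = []
    for statement in policy.get("Statement", []):
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if statement.get("Effect") == "Allow" and "ram:PassRole" in actions:
            resources = statement.get("Resource", [])
            arns.extend([resources] if isinstance(resources, str) else resources)
    return arns


if __name__ == "__main__":
    # Placeholder UID: replace it with the UID of your Alibaba Cloud account.
    policy = build_pass_role_policy("1234567890123456", ["BigDataOSSRole"])
    print(json.dumps(policy, indent=4))
    print(allowed_roles(policy))
```

Listing role ARNs explicitly, as in policy template 2, keeps each user limited to the roles of its own team. A Resource of "*", as in policy template 1, would show up in the output of allowed_roles as a single wildcard entry.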

  4. Configure a connection to a data store.
    After you are granted the required permissions by the cloud account owner, you can configure a connection to a data store.
    1. Go to the Data Source page by using your Alibaba Cloud account or a RAM user that is granted the AliyunDataWorksFullAccess permission.
    2. Click New data source in the upper-right corner.
    3. In the Add data source dialog box, click OSS in the Semi-structured storage section.
    4. In the Add OSS data source dialog box, set Access mode to RAM authorization mode and set related parameters.
      The following list describes the parameters.
      • Data Source Name: The name of the connection. The name can contain letters, digits, and underscores (_), and must start with a letter.
      • Description: The description of the connection. The description can be up to 80 characters in length.
      • Applicable environment: The environment in which the connection is used. Valid values: Development and Production.
        Note This parameter is displayed only when the workspace is in standard mode.
      • Endpoint: The OSS endpoint, in the format of http://oss.aliyuncs.com. The OSS endpoint varies with the region.
        Note If you add the bucket name before the domain name, for example, http://xxx.oss.aliyuncs.com, the connection can pass the connectivity test but data synchronization will fail.
      • Bucket: The name of the OSS bucket. A bucket is a storage space that serves as a container for storing objects. You can create one or more buckets and add one or more objects to each bucket. DataWorks can search for objects only in the bucket specified here during data synchronization.
      • Access mode: The mode that is used to access the data store. In this example, select RAM authorization mode. Then, DataWorks can assume related roles to access data stores by using STS tokens. This ensures higher security.
      • Role: The role that DataWorks assumes. Select a RAM role from the Role drop-down list.
    5. On the Data Integration tab, find the exclusive resource group for Data Integration and click Test connectivity in the Operation column.
      A sync node uses only one resource group. Therefore, you must test the connectivity of all the resource groups for Data Integration that your sync nodes use to connect to the data store so that sync nodes can be properly run. If you need to test the connectivity of multiple resource groups at a time, select the resource groups and click Batch test connectivity. For more information, see Test data store connectivity.
    6. After the connection passes the connectivity test, click Complete.
    After you configure a connection to a data store, you can go to the DataStudio page and create a sync node based on the connection. For more information, see Create a sync node by using the codeless UI.

    Make sure that the user that runs or schedules a sync node on the DataStudio page has been authorized in Step 3. Otherwise, the sync node may fail.
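Because an endpoint that is prefixed with the bucket name passes the connectivity test but fails during synchronization, it can help to check the parameter values before you enter them. The following Python sketch checks the rules stated in the parameter descriptions of this step. The function name check_oss_data_source is hypothetical, the sketch assumes that "letters" means ASCII letters, and it does not call DataWorks or OSS.

```python
import re
from urllib.parse import urlparse


def check_oss_data_source(name, description, endpoint, bucket):
    """Return a list of problems found in the values; an empty list means none."""
    problems = []

    # Data Source Name: letters, digits, and underscores, starting with a letter.
    # Assumption: "letters" is read as ASCII letters only.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        problems.append("Data Source Name must start with a letter and contain "
                        "only letters, digits, and underscores")

    # Description: up to 80 characters.
    if len(description) > 80:
        problems.append("Description must be 80 characters or fewer")

    # Endpoint: a URL such as http://oss.aliyuncs.com, without the bucket prefix.
    host = urlparse(endpoint).hostname or ""
    if not host:
        problems.append("Endpoint must be a URL such as http://oss.aliyuncs.com")
    elif bucket and host.startswith(bucket.lower() + "."):
        problems.append("Endpoint must not be prefixed with the bucket name")

    return problems


if __name__ == "__main__":
    # The bucket-prefixed endpoint is the case described in the Endpoint note.
    for issue in check_oss_data_source(
        "oss_source", "OSS connection", "http://xxx.oss.aliyuncs.com", "xxx"
    ):
        print(issue)
```

A check of this kind only catches formatting mistakes. The connectivity test in the console is still required for every resource group that your sync nodes use.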