This topic describes how to add a data source in RAM role-based authorization mode, which improves the security of data in the cloud. In this topic, an Object Storage Service (OSS) data source is used.

Prerequisites

If you want to log on to the DataWorks console and perform the operations that are described in this topic as a RAM user, you must make sure that the AliyunDataWorksFullAccess and AliyunRAMFullAccess policies are attached to the RAM user. For more information, see Grant permissions to a RAM user.
Note If you want to use an Alibaba Cloud account to log on to the DataWorks console and perform the operations, ignore this prerequisite.
The following figure shows how to attach a policy to a RAM user.
[Figure: Authorization]

Background information

Data synchronization depends on data sources, so the way in which data sources are accessed is crucial to the security of enterprise data in the cloud. DataWorks allows you to use the RAM role-based authorization mode to add and access data sources, such as OSS, AnalyticDB for MySQL 2.0, LogHub, Tablestore, and Hologres data sources. This improves the security of data in the cloud and helps prevent inappropriate use of data sources and leaks of AccessKey pairs.
You can use the AccessKey pair-based authorization mode or the RAM role-based authorization mode to add a data source. In this topic, the RAM role-based authorization mode is used. The two modes work as follows:
  • AccessKey pair-based authorization mode

    The AccessKey pair-based authorization mode provides lower security than the RAM role-based authorization mode. In AccessKey pair-based authorization mode, you need to specify the AccessKey pair of your Alibaba Cloud account or RAM user when you add a data source.

    The following figure shows the parameters that are required to use the AccessKey pair-based authorization mode to add an OSS data source. In the Add OSS data source dialog box, you must set the AccessKey ID and AccessKey Secret parameters to the AccessKey ID and AccessKey secret that can be used to access an OSS bucket.
    [Figure: Add a data source]
    When a synchronization node for the OSS data source runs or is scheduled, DataWorks uses the AccessKey pair to access the data source and read data from or write data to the data source.
    Note In AccessKey pair-based authorization mode, OSS data may be leaked if your AccessKey pair is leaked.
  • RAM role-based authorization mode

    The RAM role-based authorization mode provides higher security than the AccessKey pair-based authorization mode. In RAM role-based authorization mode, AccessKey pairs are not required. This prevents your AccessKey pair from being leaked.

    In RAM role-based authorization mode, you can authorize the DataWorks service account to assume a RAM role to access OSS without using AccessKey pairs.

    In addition, you can create different roles for different data sources based on your business requirements. This allows you to manage permissions in a fine-grained manner.

Process

The RAM role-based authorization mode is used in the following process:
  1. Use your Alibaba Cloud account or a RAM user to which the AliyunDataWorksFullAccess policy is attached to log on to the DataWorks console. Then, go to the Data Integration page and enable the RAM role-based authorization mode.
  2. Use your Alibaba Cloud account or a RAM user to which the AliyunRAMFullAccess policy is attached to log on to the RAM console. Then, create a role to be assumed and a policy to be attached.
    • Role to be assumed: You must create a custom role to be assumed by the DataWorks service account. After the DataWorks service account assumes the role, you can use the DataWorks service account to access OSS based on the permissions that are granted to the role.
    • Policy to be attached: You must create a policy that contains the PassRole permission and attach the policy to a RAM user. This way, the RAM user can use the custom role to add a data source or run a synchronization node for the data source.
  3. Use your Alibaba Cloud account or a RAM user to which the AliyunRAMFullAccess policy is attached to log on to the RAM console. Then, grant permissions to the RAM users that you want to use in Steps 4 and 6.
    Note In RAM role-based authorization mode, if you use an unauthorized RAM user to add a data source, all synchronization nodes for the data source fail to run.
  4. Log on to the DataWorks console by using the Alibaba Cloud account or RAM user that you want to use to add a data source. Then, go to the Data Integration page and use the RAM role-based authorization mode to add a data source. When the synchronization node for the data source runs, the system can use the DataWorks service account that assumes the created RAM role to access the data source.
    Note The Alibaba Cloud account or RAM user can be used to perform operations in this step only after the Alibaba Cloud account or RAM user is granted the required permissions in Step 3.
  5. Go to the DataStudio page by using the Alibaba Cloud account or RAM user that you want to use to create a data synchronization node. Then, create a synchronization node for the data source that you added.
  6. On the DataStudio or Operation Center page, run the data synchronization node by using the Alibaba Cloud account or RAM user that you want to use to run the node.
    Note The Alibaba Cloud account or RAM user can be used to perform operations in this step only after the Alibaba Cloud account or RAM user is granted the required permissions in Step 3.
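The requirements of the six steps can be summarized as a small permission check. The following Python sketch is only a model of the process described above, not a DataWorks or RAM API: the User class, its fields, and can_perform are hypothetical names, and the sketch assumes that the Alibaba Cloud account itself needs no additional grants and that Step 5 requires no specific policy because the process names none.

```python
from dataclasses import dataclass, field

DATAWORKS_FULL = "AliyunDataWorksFullAccess"
RAM_FULL = "AliyunRAMFullAccess"


@dataclass
class User:
    """Hypothetical stand-in for an Alibaba Cloud account or a RAM user."""
    name: str
    is_account: bool = False                           # True for the Alibaba Cloud account itself
    policies: set = field(default_factory=set)         # attached system policies
    passable_roles: set = field(default_factory=set)   # roles covered by a PassRole policy (Step 3)


def can_perform(user: User, step: int, role: str = "") -> bool:
    """Check the requirement that the process states for one step (1 to 6)."""
    if step not in range(1, 7):
        raise ValueError(f"unknown step: {step}")
    if user.is_account:
        return True   # assumption: the account needs no extra grants
    if step == 1:
        return DATAWORKS_FULL in user.policies
    if step in (2, 3):
        return RAM_FULL in user.policies
    if step in (4, 6):
        return role in user.passable_roles
    return True       # Step 5: the process names no specific policy


developer = User("developer", policies={DATAWORKS_FULL}, passable_roles={"BigDataOssRole"})
print(can_perform(developer, 4, "BigDataOssRole"))
```

In this model, a RAM user that was not granted the PassRole permission in Step 3 fails the check for Steps 4 and 6, which mirrors the notes in those steps.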

Procedure

  1. Enable the RAM role-based authorization mode.
    The first time you use an Alibaba Cloud account or a RAM user to which the AliyunDataWorksFullAccess policy is attached to add an OSS data source in RAM role-based authorization mode, you must enable the mode. This way, the DataWorks service account can assume a RAM role and be used to access the data source.
    After you select RAM authorization mode for Access Mode, the Warning dialog box appears. You can click Enable authorization to complete authorization.
    Note For more information about how to add an OSS data source, see Add an OSS data source.
  2. Create a role to be assumed and a custom policy, and attach the policy to the role.
    Note Only an Alibaba Cloud account or a RAM user to which the AliyunRAMFullAccess policy is attached can be used to perform the operations in this step.
    You can create different custom roles for different data sources based on your security requirements. In this example, the following scenario is used:

    An enterprise uses 100 OSS buckets to store all data, and the big data team needs to use data that is stored in only two of the OSS buckets. If the preset role AliyunDataWorksAccessingOSSRole is used, the other 98 OSS buckets may be accessed by the big data team. This may cause data leaks in these buckets.

    In this case, the owner of an Alibaba Cloud account can create a custom role named BigDataOssRole for the big data team and allow only the members of the big data team to use the role. This helps isolate permissions across teams.

    1. Create a custom role.
      In this example, a custom role named BigDataOssRole is created with Alibaba Cloud account as the trusted entity. For more information about how to create a custom role, see Create a RAM role for a trusted Alibaba Cloud account.
    2. Create a custom policy.
      In this example, a policy that allows users to read data from and write data to two specific buckets is created. For more information about how to create a custom policy, see Create a custom policy. The following code shows the document of the policy:
      {
          "Version": "1",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": [
                      "oss:GetObject",
                      "oss:ListObjects",
                      "oss:GetObjectMetadata",
                      "oss:GetObjectMeta",
                      "oss:GetBucketAcl",
                      "oss:GetBucketInfo",
                      "oss:PutObject",
                      "oss:DeleteObject",
                      "oss:PutBucket"
                  ],
                  "Resource": [
                      "acs:oss:*:*:bucket_name_1",
                      "acs:oss:*:*:bucket_name_1/*",
                      "acs:oss:*:*:bucket_name_2",
                      "acs:oss:*:*:bucket_name_2/*"
                  ]
              }
          ]
      }
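If the set of buckets changes over time, the policy document can be generated instead of edited by hand. The following Python sketch only builds the JSON document and does not call any Alibaba Cloud API. build_bucket_policy is a hypothetical helper name. The sketch lists both each bucket and the objects in it (bucket_name/*) as resources, on the assumption that object-level actions such as oss:GetObject are evaluated against object resources.

```python
import json

# Actions copied from the policy document in this step.
OSS_ACTIONS = [
    "oss:GetObject",
    "oss:ListObjects",
    "oss:GetObjectMetadata",
    "oss:GetObjectMeta",
    "oss:GetBucketAcl",
    "oss:GetBucketInfo",
    "oss:PutObject",
    "oss:DeleteObject",
    "oss:PutBucket",
]


def build_bucket_policy(bucket_names):
    """Build a RAM policy document that covers only the named buckets."""
    if not bucket_names:
        raise ValueError("at least one bucket name is required")
    resources = []
    for name in bucket_names:
        resources.append(f"acs:oss:*:*:{name}")     # the bucket itself
        resources.append(f"acs:oss:*:*:{name}/*")   # objects in the bucket (assumption)
    return {
        "Version": "1",
        "Statement": [
            {"Effect": "Allow", "Action": list(OSS_ACTIONS), "Resource": resources}
        ],
    }


policy = build_bucket_policy(["bucket_name_1", "bucket_name_2"])
print(json.dumps(policy, indent=4))
```

The printed document can be pasted into the RAM console when you create the custom policy.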
    3. Attach the policy to the BigDataOssRole role.
      Modify the trust policy of the BigDataOssRole role so that the DataWorks Data Integration service (di.dataworks.aliyuncs.com) can assume the role. Then, attach the created policy to the BigDataOssRole role. This way, DataWorks can read data from and write data to the two specified buckets after it assumes the role.
      Important To use the role, you must perform the operations in this step.
      For more information about how to modify the trust policy of a role, see Edit the trust policy of a RAM role. The following code shows the document of the trust policy:
      {
          "Statement": [
              {
                  "Action": "sts:AssumeRole",
                  "Effect": "Allow",
                  "Principal": {
                      "Service": [
                          "di.dataworks.aliyuncs.com"
                      ]
                  }
              }
          ],
          "Version": "1"
      }
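Because a role whose trust policy is not modified cannot be used, it can be worth checking the trust policy before you add a data source. The following Python sketch inspects a trust policy document that you have copied from the RAM console. It does not call any Alibaba Cloud API, and trusts_service is a hypothetical helper name.

```python
import json

DATAWORKS_DI_SERVICE = "di.dataworks.aliyuncs.com"


def _as_list(value):
    """Policy fields may hold one string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def trusts_service(trust_policy, service=DATAWORKS_DI_SERVICE):
    """Return True if an Allow statement lets the service call sts:AssumeRole."""
    for statement in trust_policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(statement.get("Action", [])):
            continue
        services = _as_list(statement.get("Principal", {}).get("Service", []))
        if service in services:
            return True
    return False


trust_policy = json.loads("""
{
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {"Service": ["di.dataworks.aliyuncs.com"]}
        }
    ],
    "Version": "1"
}
""")
print(trusts_service(trust_policy))
```

A trust policy that still lists only an Alibaba Cloud account as the principal fails this check, which indicates that the trust policy has not been modified yet.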
  3. Assign the role to users.
    After you determine the roles to be assumed, you must create a custom policy that contains the PassRole permission and attach the policy to specific users. This way, the users can use the roles to add data sources and run synchronization nodes for the data sources. You can also establish mappings between users and roles based on your business requirements.
    • You can create a policy based on the following template. Because the Resource element is set to *, a policy that is created based on the template allows authorized users to pass any role to DataWorks Data Integration. Proceed with caution when you use the template to create a policy.
      {
          "Version": "1",
          "Statement": [
              {
                  "Action": "ram:PassRole",
                  "Resource": "*",
                  "Effect": "Allow",
                  "Condition": {
                      "StringEquals": {
                          "acs:Service": "di.dataworks.aliyuncs.com"
                      }
                  }
              }
          ]
      }
    • You can also create a custom policy that contains the PassRole permission. Then, you can establish mappings between users and roles based on your business requirements.
      Note Only an Alibaba Cloud account or a RAM user to which the AliyunRAMFullAccess policy is attached can be used to perform the operations in this step.

      In this example, after you create the BigDataOssRole role for the big data team, you must assign the role to specific users based on your business requirements. You can create a custom policy named BigDataOssRoleAllowUse and attach the policy to specific users. This way, the users can use the BigDataOssRole role.

      Create a policy named BigDataOssRoleAllowUse. For more information, see Create a custom policy. The following code shows the document of the policy:
      {
          "Version": "1",
          "Statement": [
              {
                  "Action": "ram:PassRole",
                  "Resource": "acs:ram::19122324****:role/BigDataOssRole",
                  "Effect": "Allow",
                  "Condition": {
                      "StringEquals": {
                          "acs:Service": [
                              "oss.aliyuncs.com",
                              "di.dataworks.aliyuncs.com"
                          ]
                      }
                  }
              }
          ]
      }
      Note Replace the UID 19122324**** in the preceding code with the UID of your Alibaba Cloud account.

      After you create the BigDataOssRoleAllowUse policy, you can attach the policy to the RAM users who want to use the BigDataOssRole role. This way, the RAM users can use the BigDataOssRole role as the access identity to add data sources and run synchronization nodes for the data sources.
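If you maintain one role per data source, a PassRole policy for each role can be generated from the account UID and the role name. The following Python sketch only builds the JSON document and does not call any Alibaba Cloud API. build_pass_role_policy is a hypothetical helper name, and the sketch assumes by default that only the Data Integration service (di.dataworks.aliyuncs.com) needs to receive the role. Pass more service names in the services parameter if your scenario requires them.

```python
import json


def build_pass_role_policy(account_uid, role_name,
                           services=("di.dataworks.aliyuncs.com",)):
    """Build a policy that lets a user pass exactly one role to the given services."""
    if not services:
        raise ValueError("at least one service is required")
    # One service is written as a string, several services as a list of strings.
    service_value = services[0] if len(services) == 1 else list(services)
    return {
        "Version": "1",
        "Statement": [
            {
                "Action": "ram:PassRole",
                "Resource": f"acs:ram::{account_uid}:role/{role_name}",
                "Effect": "Allow",
                "Condition": {"StringEquals": {"acs:Service": service_value}},
            }
        ],
    }


# 19122324**** is the masked placeholder UID used in this topic.
policy = build_pass_role_policy("19122324****", "BigDataOssRole")
print(json.dumps(policy, indent=4))
```

Unlike the template whose Resource element is set to *, a policy built this way names a single role, so a user to whom the policy is attached cannot pass other roles.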

  4. Add a data source.
    After you are granted the required permissions by the owner of an Alibaba Cloud account, you can add a data source.
    1. Use your Alibaba Cloud account or a RAM user to which the AliyunDataWorksFullAccess policy is attached to add an OSS data source.
      In the Add OSS data source dialog box, select RAM authorization mode for Access Mode and configure other parameters based on your business requirements. The following table describes the parameters.
      Note In this example, an OSS data source is used. The parameters that you need to configure vary based on the data source type. For more information about how to add an OSS data source, see Add an OSS data source.
      OSS data source parameters
      Parameter | Description
      --- | ---
      Data Source Name | The name of the data source. The name can contain only letters, digits, and underscores (_), and must start with a letter.
      Data Source Description | The description of the data source. The description cannot exceed 80 characters in length.
      Environment | The environment in which the data source is used. Valid values: Development and Production. Note: This parameter is displayed only if the workspace is in standard mode.
      Endpoint | The endpoint of OSS. Example: http://oss.aliyuncs.com. The endpoint of OSS varies based on the region. Note: If you add a bucket name before the endpoint of OSS and a period (.) after the bucket name, the data source can pass the connectivity test, but data synchronization will fail. For example, you cannot set this parameter to http://xxx.oss.aliyuncs.com.
      Bucket | The name of the OSS bucket. A bucket is a container that is used to store objects in OSS. You can create one or more buckets and add one or more objects to a bucket. During data synchronization, DataWorks can search for objects only in the bucket that is specified by this parameter.
      Access Mode | The mode that is used to access the data source. In this example, RAM authorization mode is used. Then, DataWorks can assume related roles to access the data source by using STS tokens. This ensures higher security.
      Select role | The role that is assumed by DataWorks. Select a RAM role from the Role drop-down list.
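Some of the parameter rules above can be checked before you open the dialog box. The following Python sketch is a local pre-check only and is not part of DataWorks. validate_oss_data_source is a hypothetical helper name, "letters" is assumed to mean ASCII letters, and a bucket-prefixed endpoint is detected with a heuristic that assumes the host name of an OSS endpoint starts with oss.

```python
import re
from urllib.parse import urlparse

# Starts with a letter, then letters, digits, or underscores (ASCII assumed).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def validate_oss_data_source(name, description, endpoint):
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must start with a letter and contain only "
                        "letters, digits, and underscores")
    if len(description) > 80:
        problems.append("description exceeds 80 characters")
    # Heuristic: the endpoint must include a scheme, and its host must start with "oss".
    host = urlparse(endpoint).hostname or ""
    if not host.startswith("oss"):
        problems.append("endpoint has no scheme or appears to be prefixed with a bucket name")
    return problems


print(validate_oss_data_source("oss_source_1", "OSS data source for the big data team",
                               "http://oss.aliyuncs.com"))
```

The endpoint heuristic cannot detect a bucket whose own name starts with oss, so treat an empty result as "no problem detected" rather than as proof that synchronization will succeed.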
    2. Test the network connectivity.
      On the Data Integration tab, find the required resource group and click Test connectivity in the Actions column.

      A synchronization node can use only one type of resource group. To ensure that your synchronization nodes can run as expected, you must test the connectivity between the data source and every resource group for Data Integration on which your synchronization nodes run. If you want to test the connectivity of multiple resource groups for Data Integration at a time, select the resource groups and click Batch test connectivity. For more information, see Establish a network connection between a resource group and a data source.

    3. If the connectivity test is successful, click Complete.
  5. Create a data synchronization node.
    After you add a data source, you can go to the DataStudio page and create a data synchronization node for the data source. For more information, see Configure a synchronization node.
  6. Run the data synchronization node.
    On the DataStudio or Operation Center page, run the created data synchronization node.
    Note Make sure that you are granted the required permissions in Step 3 before you run nodes on the DataStudio page. Otherwise, the nodes fail to run.