To use compute engines such as MaxCompute and EMR in DataWorks, you must first grant DataWorks the required access permissions. After you grant the permissions, the system automatically creates service-linked roles with predefined access policies for each engine.
Background
When you add or edit a compute engine instance in the DataWorks console, the system prompts you to grant permissions. After authorization, the system automatically creates a service-linked role.
- Only an Alibaba Cloud account or a RAM user with the AliyunDataWorksFullAccess policy can grant these permissions. If a RAM user lacks the AliyunDataWorksFullAccess policy, attach it first. For more information, see Manage RAM user permissions.
- Operations such as Data Source Management trigger the authorization prompt.
- View role details in the RAM console. For more information about service-linked roles, see Service-linked Role.
The following table lists the automatically created roles and links to their details.
| Role | Purpose | Details |
| --- | --- | --- |
| AliyunServiceRoleForDataworksEngine | Grants DataWorks permissions to access MaxCompute. | |
| AliyunServiceRoleForDataworksOnEmr | Obtains metadata from EMR (new data lake) to preview data records in Data Map. | |
| | Manages VPC and security group configurations to connect exclusive resource groups with data sources. | |
| | Lists RAM roles so you can select one when configuring data source access. | |
| | Allows DataWorks to access resources of other cloud products under the current Alibaba Cloud account when you configure data sources, configure tasks, and synchronize data. This includes some management permissions for cloud resources such as RDS, Redis, MongoDB, PolarDB-X, HybridDB for MySQL, AnalyticDB for PostgreSQL, PolarDB, DMS, and DLF. | |
| | Manages EventBridge events to support DataWorks Open Platform messaging and event features. | AliyunServiceRoleForDataWorksOpenPlatform service-linked role |
| | Retrieves DLF metadata and manages metadata permissions, enabling Security Center to handle DLF permission requests and approvals. | |
| | Manages resources in EventBridge and accesses resources of other cloud products such as OSS. | |
The following sections detail the roles for MaxCompute and EMR (new data lake) engines.
Role 1: AliyunServiceRoleForDataworksEngine
- Role name: AliyunServiceRoleForDataworksEngine
- Purpose: A service-linked role for DataWorks to access compute engines (dataworks-engine). The dataworks-engine service uses this role to access your resources in other cloud services.
- Attached policy: AliyunServiceRolePolicyForDataworksEngine
- Policy details:
  {
    "Version": "1",
    "Statement": [
      {
        "Action": "odps:*",
        "Effect": "Allow",
        "Resource": "*"
      },
      {
        "Action": [
          "stream:ActOnBehalfOfAnotherUser",
          "stream:CreateDeployment",
          "stream:StartJobWithParams",
          "stream:ListDeployments",
          "stream:GetDeployment",
          "stream:GetJob",
          "stream:StopJob",
          "stream:DeleteDeployment"
        ],
        "Effect": "Allow",
        "Resource": "*"
      },
      {
        "Action": "dlf-auth:ActOnBehalfOfAnotherUser",
        "Resource": "*",
        "Effect": "Allow"
      },
      {
        "Action": [
          "pai:*",
          "paiplugin:*",
          "eas:*",
          "featurestore:*"
        ],
        "Resource": "*",
        "Effect": "Allow"
      },
      {
        "Effect": "Allow",
        "Action": [
          "emr-serverless-spark:StartSessionCluster",
          "emr-serverless-spark:CreateSqlStatement",
          "emr-serverless-spark:GetSqlStatement",
          "emr-serverless-spark:TerminateSqlStatement",
          "emr-serverless-spark:ListSessionClusters",
          "emr-serverless-spark:ListWorkspaces",
          "emr-serverless-spark:ListWorkspaceQueues",
          "emr-serverless-spark:ListReleaseVersions",
          "emr-serverless-spark:CancelJobRun",
          "emr-serverless-spark:ListJobRuns",
          "emr-serverless-spark:GetJobRun",
          "emr-serverless-spark:StartJobRun",
          "emr-serverless-spark:AddMembers",
          "emr-serverless-spark:GrantRoleToUsers",
          "emr-serverless-spark:ListLogContents",
          "emr-serverless-spark:GetTemplate",
          "emr-serverless-spark:ListKyuubiServices",
          "emr-serverless-spark:GetLivyCompute",
          "emr-serverless-spark:CreateLivyCompute",
          "emr-serverless-spark:UpdateLivyCompute",
          "emr-serverless-spark:ListLivyCompute",
          "emr-serverless-spark:DeleteLivyCompute",
          "emr-serverless-spark:StartLivyCompute",
          "emr-serverless-spark:StopLivyCompute",
          "emr-serverless-spark:CreateLivyComputeToken",
          "emr-serverless-spark:GetLivyComputeToken",
          "emr-serverless-spark:ListLivyComputeToken",
          "emr-serverless-spark:DeleteLivyComputeToken",
          "emr-serverless-spark:RefreshLivyComputeToken"
        ],
        "Resource": "*"
      },
      {
        "Effect": "Allow",
        "Action": [
          "adb:SubmitSparkApp",
          "adb:GetSparkAppState",
          "adb:GetSparkAppLog",
          "adb:GetSparkAppWebUiAddress",
          "adb:ListSparkApps",
          "adb:GetSparkAppInfo",
          "adb:KillSparkApp",
          "adb:DescribeAdbMySqlTables",
          "adb:getDatabaseObjectsByFilter",
          "adb:getTable"
        ],
        "Resource": "*"
      },
      {
        "Effect": "Allow",
        "Action": [
          "lindorm:GetLindormInstanceList",
          "lindorm:GetLindormInstance",
          "lindorm:GetLindormInstanceEngineList",
          "lindorm:GetLindormV2InstanceEngineList",
          "lindorm:ListLdpsComputeGroups",
          "lindorm:RestartLdpsComputeGroup"
        ],
        "Resource": "*"
      },
      {
        "Action": "ram:DeleteServiceLinkedRole",
        "Resource": "*",
        "Effect": "Allow",
        "Condition": {
          "StringEquals": {
            "ram:ServiceName": "engine.dataworks.aliyuncs.com"
          }
        }
      },
      {
        "Action": [
          "searchengine:GetInstance",
          "searchengine:ListInstances",
          "searchengine:GetTable",
          "searchengine:ListTables"
        ],
        "Resource": "*",
        "Effect": "Allow"
      }
    ]
  }
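When auditing a policy like the one above, it can help to check which statements cover a given action. The following Python sketch is only an illustration: it loads an abbreviated excerpt of the policy, treats `*` in an action pattern as a wildcard via the standard-library `fnmatch` module, and returns the matching Allow statements. The helper names `as_list` and `matching_statements` are hypothetical, and the matching rules are a simplification that ignores `Resource`, `Condition` evaluation, and case handling, so the result is not an authoritative RAM authorization decision.

```python
import json
from fnmatch import fnmatchcase

# Abbreviated excerpt of AliyunServiceRolePolicyForDataworksEngine.
# The full policy contains more statements and actions.
POLICY = json.loads("""
{
  "Version": "1",
  "Statement": [
    {"Action": "odps:*", "Effect": "Allow", "Resource": "*"},
    {"Action": ["stream:GetJob", "stream:StopJob"],
     "Effect": "Allow", "Resource": "*"},
    {"Action": "ram:DeleteServiceLinkedRole",
     "Resource": "*",
     "Effect": "Allow",
     "Condition": {"StringEquals":
                   {"ram:ServiceName": "engine.dataworks.aliyuncs.com"}}}
  ]
}
""")


def as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def matching_statements(policy, action):
    """Return the Allow statements whose Action patterns match `action`.

    Simplification (assumption): only the Action field is compared;
    Resource and Condition are not evaluated.
    """
    return [
        stmt
        for stmt in policy["Statement"]
        if stmt.get("Effect") == "Allow"
        and any(fnmatchcase(action, pattern)
                for pattern in as_list(stmt["Action"]))
    ]


if __name__ == "__main__":
    for action in ("odps:CreateTable", "stream:GetJob", "oss:PutObject"):
        hits = matching_statements(POLICY, action)
        print(action, "->", len(hits), "matching statement(s)")
```

A statement returned for `ram:DeleteServiceLinkedRole` still carries its `Condition` block, so a caller would need to evaluate that separately before treating the action as permitted.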
Role 2: AliyunServiceRoleForDataworksOnEmr
Do not modify or delete this role or its access policy. Doing so may break the DataWorks on EMR feature.
- Role name: AliyunServiceRoleForDataworksOnEmr
- Purpose: Allows DataWorks to preview data in Data Map, retrieve metadata from DLF-type EMR clusters, and read EMR cluster configurations.
- Attached policy: AliyunServiceRolePolicyForDataworksOnEmr
- Policy details:
  - EMR access permissions
    {
      "Version": "1",
      "Statement": [
        {
          "Action": [
            "emr:GetCluster",
            "emr:GetOnKubeCluster",
            "emr:GetClusterClientMeta",
            "emr:GetApplicationConfigFile",
            "emr:ListClusters",
            "emr:ListNodes",
            "emr:ListNodeGroups",
            "emr:ListApplications",
            "emr:ListApplicationConfigs",
            "emr:ListApplicationConfigFiles",
            "emr:ListApplicationLinks",
            "emr:ListComponentInstances",
            "emr:DescribeClusterV2",
            "emr:DescribeCluster",
            "emr:DescribeClusterServiceConfig",
            "emr:DescribeFlowAgentToken",
            "emr:DescribeClusterBasicInfo",
            "emr:ListClusterHostComponent"
          ],
          "Resource": "*",
          "Effect": "Allow"
        }
      ]
    }
  - Data Lake Formation (DLF) access permissions
    If an EMR cluster uses DLF for metadata management, the role also includes these DLF permissions to let DataWorks retrieve EMR metadata.
    {
      "Action": [
        "dlf:SubmitQuery",
        "dlf:GetQueryResult",
        "dlf:GetTable",
        "dlf:ListDatabases",
        "dlf:GetTableProfile",
        "dlf:GetCatalogSettings",
        "dlf:BatchGrantPermissions",
        "dlf:ListPartitionsByFilter",
        "dlf:ListPartitions",
        "dlf:GetHudiProperties",
        "dlf:ListCatalogs",
        "dlf:GetDatabase",
        "dlf:GetLifecycleRule",
        "dlf:GetCatalog",
        "dlf:GetIcebergNamespace",
        "dlf:GetIcebergTable"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  - Container Service for Kubernetes (ACK) access permissions
    If the EMR cluster runs on ACK, the role also includes these ACK permissions.
    {
      "Action": [
        "cs:DescribeUserPermission",
        "cs:DescribeClusterDetail",
        "cs:DescribeClusterUserKubeconfig",
        "cs:GetClusters",
        "cs:GrantPermissions",
        "cs:RevokeK8sClusterKubeConfig"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  - Serverless Spark access permissions
    If the EMR cluster is a Serverless Spark cluster, the role also includes these permissions.
    {
      "Effect": "Allow",
      "Action": [
        "emr-serverless-spark:StartSessionCluster",
        "emr-serverless-spark:CreateSqlStatement",
        "emr-serverless-spark:GetSqlStatement",
        "emr-serverless-spark:TerminateSqlStatement",
        "emr-serverless-spark:ListSessionClusters",
        "emr-serverless-spark:ListWorkspaces",
        "emr-serverless-spark:ListWorkspaceQueues",
        "emr-serverless-spark:ListReleaseVersions",
        "emr-serverless-spark:CancelJobRun",
        "emr-serverless-spark:ListJobRuns",
        "emr-serverless-spark:GetJobRun",
        "emr-serverless-spark:StartJobRun",
        "emr-serverless-spark:AddMembers",
        "emr-serverless-spark:GrantRoleToUsers",
        "emr-serverless-spark:ListLogContents",
        "emr-serverless-spark:GetTemplate",
        "emr-serverless-spark:ListKyuubiServices",
        "emr-serverless-spark:GetLivyCompute",
        "emr-serverless-spark:CreateLivyCompute",
        "emr-serverless-spark:UpdateLivyCompute",
        "emr-serverless-spark:ListLivyCompute",
        "emr-serverless-spark:DeleteLivyCompute",
        "emr-serverless-spark:StartLivyCompute",
        "emr-serverless-spark:StopLivyCompute",
        "emr-serverless-spark:CreateLivyComputeToken",
        "emr-serverless-spark:GetLivyComputeToken",
        "emr-serverless-spark:ListLivyComputeToken",
        "emr-serverless-spark:DeleteLivyComputeToken",
        "emr-serverless-spark:RefreshLivyComputeToken",
        "emr-serverless-spark:ListLogContents"
      ],
      "Resource": "*"
    }
    The following OSS permissions allow uploading SQL files and JAR packages or saving temporary query results.
    {
      "Action": [
        "oss:PutObject",
        "oss:GetObject",
        "oss:DeleteObject",
        "oss:DeleteObjectVersion"
      ],
      "Resource": [
        "acs:oss:*:*:*/.dataworks/*",
        "acs:oss:*:*:*/.dlsdata/*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": "oss:PostDataLakeStorageFileOperation",
      "Resource": "*",
      "Effect": "Allow"
    }
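The OSS statement above is the only one in this role that restricts `Resource` to specific path patterns. As a rough way to see which objects those patterns are meant to cover, the following Python sketch builds a resource string for a bucket and object key and compares it with the two patterns using the standard-library `fnmatch` module. The function names, the sample region, account ID, and bucket name are all hypothetical, and treating `*` as "any run of characters" is an assumption about how the patterns are interpreted, not a statement of how RAM evaluates them.

```python
from fnmatch import fnmatchcase

# Resource patterns copied from the OSS statement of the role policy.
RESOURCE_PATTERNS = [
    "acs:oss:*:*:*/.dataworks/*",
    "acs:oss:*:*:*/.dlsdata/*",
]


def oss_resource(bucket, key, region="cn-hangzhou", account_id="1234567890"):
    """Build a resource string in the acs:oss:<region>:<account>:<bucket>/<key> form.

    The default region and account ID are placeholder values.
    """
    return f"acs:oss:{region}:{account_id}:{bucket}/{key}"


def is_covered(bucket, key):
    """Return True if the object matches one of the policy's resource patterns.

    Assumption: `*` matches any sequence of characters, as in fnmatch.
    """
    resource = oss_resource(bucket, key)
    return any(fnmatchcase(resource, pattern) for pattern in RESOURCE_PATTERNS)


if __name__ == "__main__":
    for key in (".dataworks/tmp/query.sql",
                ".dlsdata/result.csv",
                "warehouse/table/part-0.parquet"):
        print(key, "->", is_covered("my-bucket", key))
```

Under these assumptions, only objects stored under a `.dataworks/` or `.dlsdata/` path are covered by the object-level permissions, while other data in the bucket is not.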