DataWorks allows you to associate a MaxCompute compute engine with a workspace on the Computing Engine Information tab of the Workspace page in the DataWorks console. This way, you can use the MaxCompute compute engine as a compute engine instance of the workspace to run MaxCompute nodes in DataWorks. This topic describes how to associate a MaxCompute compute engine with a workspace.
Prerequisites
- The account that you want to use is assigned the Workspace Administrator role. For more information, see the Add a RAM user to a workspace as a member and assign roles to the member section in the Manage permissions on workspace-level services topic.
- A RAM user is added to the workspace as a member. If you want to specify the RAM user as an identity to access the compute engine instance that is associated with the workspace in the production environment, you must add the RAM user to the workspace first. For more information, see the Add a RAM user to a workspace as a member and assign roles to the member section in the Manage permissions on workspace-level services topic.
Background information
Before you associate a compute engine with a workspace, make sure that you are familiar with the information that is provided in the following topics. This ensures that you can perform subsequent data development operations as expected.
Overview
- You can create a MaxCompute project and associate the project with the workspace.
- You can associate an existing MaxCompute project with a workspace in basic mode as a compute engine instance.
- You can associate a lakehouse project with a workspace. For more information, see Lakehouse of MaxCompute.
Precautions
- The permissions of the account that you use to create a DataWorks workspace and associate a MaxCompute compute engine with the workspace depend on the type of the account. The following table provides the detailed information.
Account type | Permission description |
---|---|
Alibaba Cloud account | If you use an Alibaba Cloud account to create a DataWorks workspace and associate a MaxCompute compute engine with the workspace, the account has operation permissions on all objects within the MaxCompute compute engine instance. Only authorized users can access the MaxCompute compute engine instance. |
RAM user | If you use a RAM user to create a DataWorks workspace and associate a MaxCompute compute engine with the workspace, the RAM user and the Alibaba Cloud account to which the RAM user belongs have operation permissions on all objects within the MaxCompute compute engine instance. Only authorized users can access the MaxCompute compute engine instance. To facilitate management, the system automatically assigns the Super_Administrator role of the MaxCompute compute engine instance to the RAM user. Note If the workspace is in standard mode, the RAM user is assigned the Super_Administrator role of the MaxCompute compute engine instance only in the development environment. For information about workspace modes, see Differences between workspaces in basic mode and workspaces in standard mode. |
- The identity that you select to access the MaxCompute compute engine instance in the production environment has the following impacts on data and resources in the compute engine instance.
Note For workspaces in basic mode, fine-grained permission management is not supported.
The following table describes the impacts on data and resources in the production environment.
Item | Description |
---|---|
Impact on ownership of data and resources in the production environment | The data and resources in the production environment of the current workspace belong to the access identity that you selected when you associated a MaxCompute compute engine with the workspace in the production environment. The default access identity is an Alibaba Cloud account. By default, RAM users cannot perform operations on data and resources in the production environment. |
Impact on access control for data and resources in the production environment | If a RAM user is specified as the scheduling access identity of a MaxCompute compute engine that you associate with the workspace in the production environment, the RAM user can perform operations on or access tables in the MaxCompute compute engine. In other scenarios, a RAM user cannot perform operations on or access tables in the MaxCompute compute engine in the production environment even after the RAM user is added to the workspace as a member. If you want the RAM user to perform operations on or access tables in the MaxCompute compute engine in the production environment, the RAM user must apply for the required permissions in Security Center. |
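The access-control rule for the production environment can be summarized as a small decision function. The following Python sketch is only an illustrative model of the rule described in the table; it is not a DataWorks or MaxCompute API. The class name, field names, and user names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionAccess:
    """Hypothetical model of who can reach production tables."""

    # Identity selected as the scheduling access identity for production.
    scheduling_identity: str
    # RAM users whose permission requests were approved in Security Center.
    security_center_grants: frozenset = frozenset()

    def can_access_tables(self, ram_user: str) -> bool:
        # Rule 1: the scheduling access identity can access the tables.
        if ram_user == self.scheduling_identity:
            return True
        # Rule 2: otherwise, workspace membership alone is not enough;
        # the user needs permissions granted through Security Center.
        return ram_user in self.security_center_grants


prod = ProductionAccess(
    scheduling_identity="alibaba_cloud_account",
    security_center_grants=frozenset({"ram_analyst"}),
)
print(prod.can_access_tables("ram_analyst"))  # granted in Security Center
print(prod.can_access_tables("ram_dev"))      # workspace member only
```

The model deliberately ignores workspace membership as an input, because the table states that membership by itself does not grant access to production tables.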
Limits
- Only RAM users to which the Workspace Administrator role is assigned can associate a MaxCompute compute engine with a DataWorks workspace. For more information, see the Add a RAM user to a workspace as a member and assign roles to the member section in the Manage permissions on workspace-level services topic.
- Only RAM users to which the workspace role Workspace Administrator and the MaxCompute role Project Owner or Super_Administrator are assigned can associate existing MaxCompute projects with a DataWorks workspace. For more information about MaxCompute roles, see Role planning.
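The two limits above combine into a single eligibility check. The following Python sketch models that check for illustration only; it does not call any DataWorks API, and the `RamUser` type and `can_associate` function are hypothetical names.

```python
from dataclasses import dataclass, field

WORKSPACE_ADMIN = "Workspace Administrator"
# MaxCompute roles that allow associating an existing project.
PRIVILEGED_MAXCOMPUTE_ROLES = {"Project Owner", "Super_Administrator"}


@dataclass
class RamUser:
    """Hypothetical record of the roles held by a RAM user."""

    name: str
    workspace_roles: set = field(default_factory=set)
    maxcompute_roles: set = field(default_factory=set)


def can_associate(user: RamUser, existing_project: bool) -> bool:
    """Return True if the user meets the limits for the association."""
    # Every association requires the Workspace Administrator role.
    if WORKSPACE_ADMIN not in user.workspace_roles:
        return False
    # An existing project additionally requires a privileged MaxCompute role.
    if existing_project:
        return bool(user.maxcompute_roles & PRIVILEGED_MAXCOMPUTE_ROLES)
    return True


admin_only = RamUser("ram_admin", {WORKSPACE_ADMIN})
admin_owner = RamUser("ram_owner", {WORKSPACE_ADMIN}, {"Project Owner"})
print(can_associate(admin_only, existing_project=False))
print(can_associate(admin_only, existing_project=True))
print(can_associate(admin_owner, existing_project=True))
```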
Go to the Workspace Management page
- Method 1: In the DataWorks console, find the desired workspace, move the pointer over the icon in the Actions column, and then select Workspace Settings. On the Workspace page, click the Computing Engine Information tab.
- Method 2: Find the desired workspace and click the name of a DataWorks module in the Actions column, such as DataStudio or Data Integration. In the top navigation bar of the page that appears, click the icon. On the Workspace page, click the Computing Engine Information tab.
Associate a MaxCompute compute engine with a workspace
- On the Computing Engine Information tab of the Workspace page, click the MaxCompute tab.
- On the MaxCompute tab, click Add Instance.
- In the Add MaxCompute Project dialog box, configure the parameters. DataWorks allows you to create a MaxCompute project and associate the project with a workspace, associate an existing MaxCompute project with a workspace, and associate an external MaxCompute project with a workspace.
Create a MaxCompute project and associate the project with a workspace
If you select Create Project for the Project Source parameter, you can create a MaxCompute project and associate the project with the current workspace.
Parameter | Description |
---|---|
Display Name Of Compute Engine | The display name of the MaxCompute compute engine instance. The display name is used to identify the configuration of the compute engine and is similar to the alias of the MaxCompute compute engine in DataWorks. You can specify a display name based on your business requirements. The display name must be unique. |
Project Source | The default value is Create Project. |
Billing Method | The billing method of the MaxCompute project. Valid values: Pay-as-you-go and Subscription. For more information about the billing rules of MaxCompute, see Overview. Note You cannot associate a MaxCompute project of the developer version with a workspace in standard mode as a compute engine instance. |
Quota Group | The computing resource pool that is used by the MaxCompute project. For more information, see Quota. |
Data Type | The data type edition of the MaxCompute project. Valid values: MaxCompute V2.0 Data Type Edition (Recommended), MaxCompute V1.0 Data Type Edition (for Early MaxCompute Projects), and Hive-Compatible Data Type Edition (Suitable for MaxCompute Projects Migrated from Hadoop). For more information, see Data type editions. |
Encrypt Or Not | Specifies whether to use Key Management Service (KMS) to encrypt data for storage. For more information, see Data encryption. |
Production Environment | In the Production Environment section, you can configure the following parameters: Note A workspace in basic mode provides only the production environment. Therefore, the compute engine instance in the production environment is used for the operations that you perform in DataStudio. For more information about impacts of different workspace modes on development and O&M of nodes in the production environment, see Impacts of different workspace modes on development and O&M of nodes in the production environment. |
Development Environment | In the Development Environment section, you must configure the following parameters: Note A workspace in basic mode provides only the production environment. |
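Before you fill in the dialog box, it can help to write down the intended values and check them against the constraints in the preceding table. The following Python sketch is a hypothetical pre-flight check, not part of DataWorks. The field names are assumptions, and the data type edition labels are shortened forms of the values listed in the table.

```python
from dataclasses import dataclass

BILLING_METHODS = ("Pay-as-you-go", "Subscription")
DATA_TYPE_EDITIONS = (
    "MaxCompute V2.0 Data Type Edition",
    "MaxCompute V1.0 Data Type Edition",
    "Hive-Compatible Data Type Edition",
)


@dataclass
class NewProjectSettings:
    """Hypothetical record of the values planned for the dialog box."""

    display_name: str
    billing_method: str
    quota_group: str
    data_type: str = DATA_TYPE_EDITIONS[0]  # the recommended edition
    encrypt: bool = False
    project_source: str = "Create Project"


def validate(settings, existing_display_names=()):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    if not settings.display_name:
        problems.append("display name is empty")
    elif settings.display_name in existing_display_names:
        problems.append("display name is not unique")
    if settings.billing_method not in BILLING_METHODS:
        problems.append("unknown billing method: " + settings.billing_method)
    if settings.data_type not in DATA_TYPE_EDITIONS:
        problems.append("unknown data type edition: " + settings.data_type)
    return problems


planned = NewProjectSettings("sales_engine", "Pay-as-you-go", "default")
print(validate(planned, existing_display_names={"finance_engine"}))
```

The check covers only the constraints that the table states explicitly: a unique display name and the listed valid values for the billing method and data type edition.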
Associate an existing MaxCompute project with a workspace
- Only RAM users to which the DataWorks role Workspace Administrator and the MaxCompute role Project Owner or Super_Administrator are assigned can associate existing MaxCompute projects with a DataWorks workspace.
- The parameter configurations are the same as the parameter configurations when you select Create Project for the Project Source parameter.
Associate an external MaxCompute project with a workspace
If you select MaxCompute External Project in the Add MaxCompute Project dialog box, you can associate a lakehouse project with the current workspace.
You can set the Source Type parameter to Hadoop HDFS or OSS+Data Lake Formation(DLF) based on your business requirements. For information about the parameter configurations, see Lakehouse of MaxCompute.
Disassociate a MaxCompute compute engine instance from a workspace
If a MaxCompute compute engine instance that is associated with your workspace is no longer required, you can click Disassociate in the upper-right corner of the section that displays the information about the compute engine instance to disassociate the compute engine instance from the workspace. After the compute engine instance is disassociated from the workspace, all nodes that are run based on the compute engine instance in the workspace fail.
- Nodes: The nodes that are scheduled based on the MaxCompute compute engine instance fail. We recommend that you go to the DataStudio page and change the compute engine instance for the affected nodes in a batch. Then, you can commit, deploy, and rerun the nodes.
- Data Integration: The data synchronization nodes that are run based on the MaxCompute compute engine instance fail. We recommend that you change the MaxCompute compute engine instance for the data synchronization nodes on the DataStudio page at the earliest opportunity.
- DataService Studio: The APIs that are related to the MaxCompute compute engine instance fail to be called. We recommend that you change the MaxCompute compute engine instance for the APIs at the earliest opportunity.
- DataAnalysis: You cannot query data that is related to the MaxCompute compute engine instance. We recommend that you change the MaxCompute compute engine instance at the earliest opportunity.
- The information about the MaxCompute compute engine instance is not displayed in the Data Map, Resource Optimization, Comprehensive Data Governance, or Security Center module.
Manage MaxCompute projects
DataWorks allows you to manage MaxCompute projects and view information about the MaxCompute projects. In the DataWorks console, use the left-side navigation pane to open the page that contains the Housekeeper tab, and then click the Housekeeper tab. On the tab, you can perform operations based on your business requirements. For example, you can view the information about a job, view the consumption of resources (including storage resources and CUs), and manage quota groups. For information about how to use MaxCompute Management, see Use MaxCompute Management.
What to do next
After you associate a MaxCompute compute engine with a workspace as a compute engine instance and you are familiar with the information about using a MaxCompute compute engine instance, you can use the compute engine instance to develop data.
- After you assign a RAM user a built-in workspace-level role in your workspace and associate a MaxCompute project with the workspace as a compute engine instance, the RAM user is automatically granted the permissions of the mapped role of the MaxCompute compute engine instance in the development environment. By default, the RAM user does not have the permissions of the MaxCompute compute engine instance in the production environment.
- MaxCompute allows you to query tables across projects. You can query data in a MaxCompute project that you associate with a workspace in the production environment by specifying the project name on the DataStudio page.
- After you understand the preceding instructions, you can go to the DataStudio page to develop data. For more information, see Data development: Developers and Overview.
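A cross-project query, as mentioned in the preceding list, refers to a table by prefixing the table name with the project name. The following Python sketch builds such a statement as a string. The helper names, the project name `sales_prod`, and the table name `orders` are hypothetical, and the identifier check is a deliberately conservative assumption rather than the official naming rule.

```python
import re

# Assumption: accept only names that start with a letter and contain
# letters, digits, and underscores. The real rules may differ.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def qualified_table(project: str, table: str) -> str:
    """Return the table name in project_name.table_name form."""
    for part in (project, table):
        if not _IDENTIFIER.match(part):
            raise ValueError("unexpected identifier: " + repr(part))
    return project + "." + table


def preview_query(project: str, table: str, limit: int = 10) -> str:
    """Build a small SELECT statement against a table in another project."""
    return "SELECT * FROM {} LIMIT {};".format(
        qualified_table(project, table), int(limit)
    )


print(preview_query("sales_prod", "orders"))
# SELECT * FROM sales_prod.orders LIMIT 10;
```

Whether the statement succeeds still depends on permissions. By default, a RAM user does not have permissions on the compute engine instance in the production environment.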