DataWorks allows you to associate a MaxCompute compute engine with a workspace on the Computing Engine Information tab of the Workspace page in the DataWorks console. This way, you can use the MaxCompute compute engine as a compute engine instance of the workspace to run MaxCompute nodes in DataWorks. This topic describes how to associate a MaxCompute compute engine with a workspace.

Prerequisites

Background information

Before you associate a compute engine with a workspace, make sure that you are familiar with the information that is provided in the following topics. This ensures that you can perform subsequent data development operations as expected.

Overview

DataWorks allows you to associate a MaxCompute project with a workspace in the following scenarios:
  • You can create a MaxCompute project and associate the project with the workspace.
  • You can associate an existing MaxCompute project with a workspace in basic mode as a compute engine instance.
  • You can associate a lakehouse project with a workspace. For more information, see Lakehouse of MaxCompute.

Precautions

  • The permissions of the account that you use to create a DataWorks workspace and associate a MaxCompute compute engine with the workspace depend on the type of the account. The following table provides the detailed information.
    Account type | Permission description
    Alibaba Cloud account | If you use an Alibaba Cloud account to create a DataWorks workspace and associate a MaxCompute compute engine with the workspace, the account has operation permissions on all objects within the MaxCompute compute engine instance. Only authorized users can access the MaxCompute compute engine instance.
    RAM user |
    • If you use a RAM user to create a DataWorks workspace and associate a MaxCompute compute engine with the workspace, the RAM user and the Alibaba Cloud account to which the RAM user belongs have operation permissions on all objects within the MaxCompute compute engine instance. Only authorized users can access the MaxCompute compute engine instance.
    • To facilitate management, the system automatically assigns the Super_Administrator role of the MaxCompute compute engine instance to the RAM user.
      Note If the workspace is in standard mode, the RAM user is assigned the Super_Administrator role of the MaxCompute compute engine instance only in the development environment. For information about workspace modes, see Differences between workspaces in basic mode and workspaces in standard mode.
  • The identity that you select to access the MaxCompute compute engine instance in the production environment has the following impacts on data and resources in the compute engine instance.
    Note For workspaces in basic mode, fine-grained permission management is not supported. The following table describes the impacts on data and resources in the production environment.
    Item | Description
    Impact on ownership of data and resources in the production environment | The data and resources in the production environment of the current workspace belong to the access identity that you select when you associate a MaxCompute compute engine with the workspace in the production environment. The default access identity is an Alibaba Cloud account. By default, RAM users cannot perform operations on data and resources in the production environment.
    Impact on access control for data and resources in the production environment | If a RAM user is specified as the scheduling access identity of a MaxCompute compute engine that you associate with the workspace in the production environment, the RAM user can perform operations on or access tables in the MaxCompute compute engine. In other scenarios, a RAM user cannot perform operations on or access tables in the MaxCompute compute engine in the production environment, even after the RAM user is added to the workspace as a member. To perform operations on or access these tables, the RAM user must apply for the required permissions in Security Center.
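
The access rule for the production environment can be sketched as a small check. The following Python example is illustrative only: the function name, its parameters, and the idea of passing in a set of approved users are assumptions made for this sketch and do not correspond to any DataWorks or MaxCompute API.

```python
# Illustrative model of the production access rule described above.
# Function and parameter names are hypothetical, not a DataWorks API.


def can_access_production_tables(ram_user, scheduling_identity, approved_users=()):
    """Return True if ram_user may operate on production tables.

    scheduling_identity: the identity selected as the scheduling access
        identity of the production compute engine instance.
    approved_users: RAM users whose permission requests were approved
        in Security Center.
    """
    if ram_user == scheduling_identity:
        return True
    # Workspace membership alone is not enough; approval is required.
    return ram_user in approved_users
```

For example, a RAM user who is only a workspace member is rejected by this check until the user appears in `approved_users`.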

Limits

  • Only RAM users to which the Workspace Administrator role is assigned can associate a MaxCompute compute engine with a DataWorks workspace. For more information, see the Add a RAM user to a workspace as a member and assign roles to the member section in the Manage permissions on workspace-level services topic.
  • Only RAM users to which the workspace role Workspace Administrator and the MaxCompute role Project Owner or Super_Administrator are assigned can associate existing MaxCompute projects with a DataWorks workspace. For more information about MaxCompute roles, see Role planning.
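
The two limits above can be summarized as a role check. The following Python sketch only models the rules as stated in this topic. The function name `can_associate` and the representation of roles as sets of strings are assumptions for this example, and the code does not call any DataWorks or MaxCompute API.

```python
# Illustrative model of the role requirements listed above.
# All names here are hypothetical; this does not call any DataWorks API.

WORKSPACE_ADMIN = "Workspace Administrator"
MC_ELEVATED_ROLES = {"Project Owner", "Super_Administrator"}


def can_associate(workspace_roles, maxcompute_roles, existing_project):
    """Return True if a RAM user meets the role requirements.

    workspace_roles: DataWorks workspace roles held by the RAM user.
    maxcompute_roles: MaxCompute roles held in the target project.
    existing_project: True when associating an existing MaxCompute project.
    """
    if WORKSPACE_ADMIN not in workspace_roles:
        return False
    if existing_project:
        # An existing project also requires Project Owner or Super_Administrator.
        return bool(MC_ELEVATED_ROLES & set(maxcompute_roles))
    return True
```

In this model, a Workspace Administrator without a MaxCompute role can create and associate a new project but cannot associate an existing one.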

Go to the Workspace Management page

You can use one of the following methods to go to the page on which you can associate a compute engine with a DataWorks workspace:
  • Method 1: In the DataWorks console, find the desired workspace, move the pointer over the More icon in the Actions column, and then select Workspace Settings. On the Workspace page, click the Computing Engine Information tab.
  • Method 2: Find the desired workspace and click the name of a DataWorks module in the Actions column, such as DataStudio or Data Integration. In the top navigation bar of the page that appears, click the Workspace Management icon. On the Workspace page, click the Computing Engine Information tab.

Associate a MaxCompute compute engine with a workspace

  1. On the Computing Engine Information tab of the Workspace page, click the MaxCompute tab.
  2. On the MaxCompute tab, click Add Instance.
  3. In the Add MaxCompute Project dialog box, configure the parameters.
    DataWorks allows you to create a MaxCompute project and associate the project with a workspace, associate an existing MaxCompute project with a workspace, and associate an external MaxCompute project with a workspace.

Create a MaxCompute project and associate the project with a workspace

If you select Create Project for the Project Source parameter, you can create a MaxCompute project and associate the project with the current workspace.

Parameter | Description
Display Name Of Compute Engine | The display name of the MaxCompute compute engine instance. The display name is used to identify the configuration of the compute engine and is similar to the alias of the MaxCompute compute engine in DataWorks. You can specify a display name based on your business requirements. The display name must be unique.
Project Source | The default value is Create Project.
Billing Method | The billing method of the MaxCompute project. Valid values: Pay-as-you-go and Subscription. For more information about the billing rules of MaxCompute, see Overview.
  Note You cannot associate a MaxCompute project of the developer version with a workspace in standard mode as a compute engine instance.
Quota Group | The computing resource pool that is used by the MaxCompute project. For more information, see Quota.
Data Type | The data type edition of the MaxCompute project. Valid values: MaxCompute V2.0 Data Type Edition (Recommended), MaxCompute V1.0 Data Type Edition (for Early MaxCompute Projects), and Hive-Compatible Data Type Edition (Suitable for MaxCompute Projects Migrated from Hadoop). For more information, see Data type editions.
Encrypt Or Not | Specifies whether to use Key Management Service (KMS) to encrypt data for storage. For more information, see Data encryption.
Production Environment | In the Production Environment section, you can configure the following parameters:
  • Project Name: the name of the MaxCompute project that you want to associate with the workspace in the production environment.
  • Scheduled Access Identity: the identity that is used to access the MaxCompute project in the production environment. Valid values: Alibaba Cloud Account, RAM User, and RAM Role. The identity that you specify is used to run nodes in the MaxCompute compute engine instance when the nodes are automatically scheduled in the production environment.
  Note A workspace in basic mode provides only the production environment. Therefore, the compute engine instance in the production environment is used for the operations that you perform in DataStudio. For more information about impacts of different workspace modes on development and O&M of nodes in the production environment, see Impacts of different workspace modes on development and O&M of nodes in the production environment.
Development Environment | In the Development Environment section, you must configure the following parameters:
  • Project Name: the name of the MaxCompute project that you want to associate with the workspace in the development environment. The name is automatically generated based on the project name that you specify for the Project Name parameter in the Production Environment section. The name is suffixed with _dev.
  • Scheduled Access Identity: The value is fixed as Node Executor. In a workspace in standard mode, the current logon user is used to access objects, such as tables and functions, in the MaxCompute project that you associate with the workspace in the development environment when nodes are run in DataStudio. When the nodes are run in Operation Center in the development environment, the account of the node owner is automatically used.
  Note A workspace in basic mode provides only the production environment.
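
The relationship between these parameters can be illustrated with a short Python sketch. It is a hypothetical model for planning purposes only: the class name `NewProjectSettings` and its field names are chosen for this example and are not DataWorks API identifiers. The sketch checks values against the options listed above and derives the development project name by appending `_dev` to the production project name.

```python
from dataclasses import dataclass

# Hypothetical model of the dialog parameters; field names are chosen for
# this example and are not DataWorks API identifiers.

BILLING_METHODS = {"Pay-as-you-go", "Subscription"}
PROD_IDENTITIES = {"Alibaba Cloud Account", "RAM User", "RAM Role"}


@dataclass
class NewProjectSettings:
    display_name: str
    billing_method: str
    prod_project_name: str
    prod_access_identity: str = "Alibaba Cloud Account"  # documented default
    standard_mode: bool = True

    def validate(self):
        """Raise ValueError if a value is outside the documented options."""
        if self.billing_method not in BILLING_METHODS:
            raise ValueError(f"unknown billing method: {self.billing_method}")
        if self.prod_access_identity not in PROD_IDENTITIES:
            raise ValueError(f"unknown access identity: {self.prod_access_identity}")

    @property
    def dev_project_name(self):
        """Development project name: the production name suffixed with _dev.

        Returns None in basic mode, which has no development environment.
        """
        if not self.standard_mode:
            return None
        return f"{self.prod_project_name}_dev"
```

For example, with a production project named `sales` (a made-up name), `dev_project_name` evaluates to `sales_dev` in standard mode.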

Associate an existing MaxCompute project with a workspace

If you have created a MaxCompute project in the MaxCompute console and you use a workspace in basic mode, you can select Existing Project for the Project Source parameter and select the MaxCompute project that you want to associate with the workspace from the Project Name drop-down list.
Note
  • Only RAM users to which the DataWorks role Workspace Administrator and the MaxCompute role Project Owner or Super_Administrator are assigned can associate existing MaxCompute projects with a DataWorks workspace.
  • The parameter configurations are the same as the parameter configurations when you select Create Project for the Project Source parameter.

Associate an external MaxCompute project with a workspace

If you select MaxCompute External Project in the Add MaxCompute Project dialog box, you can associate a lakehouse project with the current workspace.

You can set the Source Type parameter to Hadoop HDFS or OSS+Data Lake Formation(DLF) based on your business requirements. For information about the parameter configurations, see Lakehouse of MaxCompute.

Disassociate a MaxCompute compute engine instance from a workspace

If a MaxCompute compute engine instance that is associated with your workspace is no longer required, you can click Disassociate in the upper-right corner of the section that displays the information about the compute engine instance to disassociate the compute engine instance from the workspace. After the compute engine instance is disassociated from the workspace, all nodes that are run based on the compute engine instance in the workspace fail.
The disassociation of a MaxCompute compute engine instance from a workspace has the following impacts:
  • Nodes: The nodes that are scheduled based on the MaxCompute compute engine instance fail. We recommend that you go to the DataStudio page and change the compute engine instance for all affected nodes in a batch. Then, you can commit, deploy, and rerun the nodes.
  • Data Integration: The data synchronization nodes that are run based on the MaxCompute compute engine instance fail. We recommend that you change the MaxCompute compute engine instance for the data synchronization nodes on the DataStudio page at the earliest opportunity.
  • DataService Studio: The APIs that are related to the MaxCompute compute engine instance fail to be called. We recommend that you change the MaxCompute compute engine instance for the APIs at the earliest opportunity.
  • DataAnalysis: You cannot query data that is related to the MaxCompute compute engine instance. We recommend that you change the MaxCompute compute engine instance at the earliest opportunity.
  • The information about the MaxCompute compute engine instance is not displayed in the Data Map, Resource Optimization, Comprehensive Data Governance, or Security Center module.
Note Only a workspace administrator has permissions to disassociate a compute engine instance from a workspace.

Manage MaxCompute projects

DataWorks allows you to manage MaxCompute projects and view information about the projects. In the left-side navigation pane of the DataWorks console, choose Compute Engines > MaxCompute, and then click the Housekeeper tab. On this tab, you can perform operations based on your business requirements. For example, you can view the information about a job, view the consumption of resources (including storage resources and CUs), and manage quota groups. For information about how to use MaxCompute Management, see Use MaxCompute Management.

What to do next

After you associate a MaxCompute compute engine with a workspace as a compute engine instance, you can use the compute engine instance to develop data. Before you start, take note of the following information:
  1. After you assign a RAM user a built-in workspace-level role in your workspace and associate a MaxCompute project with the workspace as a compute engine instance, the RAM user is automatically granted the permissions of the mapped role of the MaxCompute compute engine instance in the development environment. By default, the RAM user does not have the permissions of the MaxCompute compute engine instance in the production environment.
  2. MaxCompute allows you to query tables across projects. You can query data in a MaxCompute project that you associate with a workspace in the production environment by specifying the project name on the DataStudio page.
  3. After you understand the preceding instructions, you can go to the DataStudio page to develop data. For more information, see Data development: Developers and Overview.
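
The cross-project query mentioned in item 2 relies on the MaxCompute SQL convention of addressing a table as project_name.table_name. The following Python sketch builds such a statement. The helper names, the example project and table names, and the conservative identifier pattern are assumptions made for this illustration. The resulting statement is meant to be pasted into a SQL node and is not executed by the code.

```python
import re

# MaxCompute SQL addresses a table in another project as project_name.table_name.
# The helpers and the identifier pattern below are hypothetical and conservative.

_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def qualified_table(project, table):
    """Return the project-qualified table name, rejecting unexpected input."""
    for name in (project, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"{project}.{table}"


def count_query(project, table):
    """Build a row-count statement against a table in the given project."""
    return f"SELECT COUNT(*) FROM {qualified_table(project, table)};"
```

For example, a node that runs in the development project can read a production table by passing the production project name, such as `count_query("sales", "orders")` for a made-up project `sales` and table `orders`. The account that runs the node still needs the required permissions on the production project.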